On 06/02/2024 16:30, Fabiano Rosas wrote:
External email: Use caution opening links or attachments


Avihai Horon <avih...@nvidia.com> writes:

On 05/02/2024 21:49, Fabiano Rosas wrote:
External email: Use caution opening links or attachments


During multifd channel creation (multifd_send_new_channel_async) when
TLS is enabled, the multifd_channel_connect function is called twice,
once to create the TLS handshake thread and another time after the
asynchrounous TLS handshake has finished.

This creates a slightly confusing call stack where
multifd_channel_connect() is called more times than the number of
channels. It also splits error handling between the two callers of
multifd_channel_connect() causing some code duplication. Lastly, it
gets in the way of having a single point to determine whether all
channel creation tasks have been initiated.

Refactor the code to move the reentrancy one level up at the
multifd_new_send_channel_async() level, de-duplicating the error
handling and allowing for the next patch to introduce a
synchronization point common to all the multifd channel creation,
regardless of TLS.

Signed-off-by: Fabiano Rosas <faro...@suse.de>
---
   migration/multifd.c | 73 +++++++++++++++++++--------------------------
   1 file changed, 30 insertions(+), 43 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index cc10be2c3f..89d39fa67c 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -869,30 +869,7 @@ out:
       return NULL;
   }

-static bool multifd_channel_connect(MultiFDSendParams *p,
-                                    QIOChannel *ioc,
-                                    Error **errp);
-
-static void multifd_tls_outgoing_handshake(QIOTask *task,
-                                           gpointer opaque)
-{
-    MultiFDSendParams *p = opaque;
-    QIOChannel *ioc = QIO_CHANNEL(qio_task_get_source(task));
-    Error *err = NULL;
-
-    if (!qio_task_propagate_error(task, &err)) {
-        trace_multifd_tls_outgoing_handshake_complete(ioc);
-        if (multifd_channel_connect(p, ioc, &err)) {
-            return;
-        }
-    }
-
-    trace_multifd_tls_outgoing_handshake_error(ioc, error_get_pretty(err));
-
-    multifd_send_set_error(err);
-    multifd_send_kick_main(p);
-    error_free(err);
-}
+static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque);

   static void *multifd_tls_handshake_thread(void *opaque)
   {
@@ -900,7 +877,7 @@ static void *multifd_tls_handshake_thread(void *opaque)
       QIOChannelTLS *tioc = QIO_CHANNEL_TLS(p->c);

       qio_channel_tls_handshake(tioc,
-                              multifd_tls_outgoing_handshake,
+                              multifd_new_send_channel_async,
                                 p,
                                 NULL,
                                 NULL);
@@ -936,19 +913,6 @@ static bool multifd_channel_connect(MultiFDSendParams *p,
                                       QIOChannel *ioc,
                                       Error **errp)
   {
-    trace_multifd_set_outgoing_channel(
-        ioc, object_get_typename(OBJECT(ioc)),
-        migrate_get_current()->hostname);
-
-    if (migrate_channel_requires_tls_upgrade(ioc)) {
-        /*
-         * tls_channel_connect will call back to this
-         * function after the TLS handshake,
-         * so we mustn't call multifd_send_thread until then
-         */
-        return multifd_tls_channel_connect(p, ioc, errp);
-    }
-
       migration_ioc_register_yank(ioc);
       p->registered_yank = true;
       p->c = ioc;
@@ -959,20 +923,43 @@ static bool multifd_channel_connect(MultiFDSendParams *p,
       return true;
   }

+/*
+ * When TLS is enabled this function is called once to establish the
+ * TLS connection and a second time after the TLS handshake to create
+ * the multifd channel. Without TLS it goes straight into the channel
+ * creation.
+ */
   static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
   {
       MultiFDSendParams *p = opaque;
       QIOChannel *ioc = QIO_CHANNEL(qio_task_get_source(task));
       Error *local_err = NULL;

+    bool ret;
+
       trace_multifd_new_send_channel_async(p->id);
-    if (!qio_task_propagate_error(task, &local_err)) {
-        qio_channel_set_delay(ioc, false);
-        if (multifd_channel_connect(p, ioc, &local_err)) {
-            return;
-        }
+
+    if (qio_task_propagate_error(task, &local_err)) {
+        ret = false;
+        goto out;
+    }
I think this common error handling for both TLS/non-TLS is a bit
problematic if there is an error in TLS handshake:
multifd_tls_channel_connect() sets p->c = QIO_CHANNEL(tioc).
TLS handshake fails.
multifd_new_send_channel_async() errors and calls
object_unref(OBJECT(ioc)) which will result in freeing the IOC.
Then, multifd_send_terminate_threads() will try to access p->ioc because
it's not NULL, causing a segfault.
Good catch.

I'm not sure the current reference counting is even correct. AFAICS, the
refcount is 2 at new_send_channel_async due to the qio_task taking a
reference and that will be decremented after we return from the
completion callback, which is multifd_new_send_channel_async itself. The
last reference should be dropped when we cleanup the channel.

So I don't really understand the need for that unref there. But there's
no asserts being reached due to an extra decrement, so there might be
some extra increment hiding somewhere.

I think the ref counting is correct, in the non-TLS case we never set p->c = ioc, so the cleanup will just skip destroying this p->c.


Anyway, I'll figure this out and update this patch. Thanks

+
+    qio_channel_set_delay(ioc, false);
Maybe qio_channel_set_delay() should be moved inside
multifd_channel_connect()? It's called two times when TLS is used.

It looks like it could, I'll do that.

+
+    trace_multifd_set_outgoing_channel(ioc, object_get_typename(OBJECT(ioc)),
+                                       migrate_get_current()->hostname);
+
+    if (migrate_channel_requires_tls_upgrade(ioc)) {
+        ret = multifd_tls_channel_connect(p, ioc, &local_err);
+    } else {
+        ret = multifd_channel_connect(p, ioc, &local_err);
+    }
+
+    if (ret) {
+        return;
       }

+out:
       trace_multifd_new_send_channel_async_error(p->id, local_err);
       multifd_send_set_error(local_err);
       multifd_send_kick_main(p);
--
2.35.3


Reply via email to