On Mon, Apr 01, 2024 at 08:41:20PM +0800, Zhu Yangyang wrote:
> Coroutines are not supposed to block. Instead, they should yield.
> 
> Fixes: f95910f ("nbd: implement TLS support in the protocol negotiation")
> Signed-off-by: Zhu Yangyang <zhuyangyan...@huawei.com>
> ---
>  nbd/client.c       |  7 ++++---
>  nbd/common.c       | 19 ++++++++++++++++---
>  nbd/nbd-internal.h |  6 +++---
>  nbd/server.c       | 10 +++++-----
>  4 files changed, 28 insertions(+), 14 deletions(-)
> 
> diff --git a/nbd/client.c b/nbd/client.c
> index 29ffc609a4..1ab91ed205 100644
> --- a/nbd/client.c
> +++ b/nbd/client.c
> @@ -619,18 +619,19 @@ static QIOChannel *nbd_receive_starttls(QIOChannel *ioc,
>          return NULL;
>      }
>      qio_channel_set_name(QIO_CHANNEL(tioc), "nbd-client-tls");
> -    data.loop = g_main_loop_new(g_main_context_default(), FALSE);
>      trace_nbd_receive_starttls_tls_handshake();
>      qio_channel_tls_handshake(tioc,
> -                              nbd_tls_handshake,
> +                              nbd_client_tls_handshake,
>                                &data,
>                                NULL,
>                                NULL);
>  
>      if (!data.complete) {
> +        data.loop = g_main_loop_new(g_main_context_default(), FALSE);
>          g_main_loop_run(data.loop);
> +        g_main_loop_unref(data.loop);
>      }
> -    g_main_loop_unref(data.loop);
> +

Aha - you figured out an elegant way around the client code not being
in coroutine context that had been stumping me, at least for the sake
of a minimal patch.

>      if (data.error) {
>          error_propagate(errp, data.error);
>          object_unref(OBJECT(tioc));
> diff --git a/nbd/common.c b/nbd/common.c
> index 3247c1d618..01ca30a5c4 100644
> --- a/nbd/common.c
> +++ b/nbd/common.c
> @@ -47,14 +47,27 @@ int nbd_drop(QIOChannel *ioc, size_t size, Error **errp)
>  }
>  
>  
> -void nbd_tls_handshake(QIOTask *task,
> -                       void *opaque)
> +void nbd_client_tls_handshake(QIOTask *task, void *opaque)
>  {
>      struct NBDTLSHandshakeData *data = opaque;
>  
>      qio_task_propagate_error(task, &data->error);
>      data->complete = true;
> -    g_main_loop_quit(data->loop);
> +    if (data->loop) {
> +        g_main_loop_quit(data->loop);
> +    }
> +}
> +
> +
> +void nbd_server_tls_handshake(QIOTask *task, void *opaque)
> +{
> +    struct NBDTLSHandshakeData *data = opaque;
> +
> +    qio_task_propagate_error(task, &data->error);
> +    data->complete = true;
> +    if (!qemu_coroutine_entered(data->co)) {
> +        aio_co_wake(data->co);
> +    }
>  }

This matches up with what I was experimenting with last week, in that
server is in coroutine context while client is not.  However, it means
we no longer have a common implementation between the two, so keeping
the code isolated in nbd/common.c makes less sense than just placing
the two specific callbacks in the two specific files right where their
only use case exists (and even making them static at that point).

Or, can we still merge it into one helper?  It looks like we now have
3 viable possibilities:

data->loop data->co
non-NULL   non-NULL    impossible
NULL       NULL        client, qio_task completed right away
non-NULL   NULL        client, qio_task did not complete right away
NULL       non-NULL    server, waking the coroutine depends on if we are in one

With that, we can still get by with one function, but need good
documentation.  I'll post a v3 along those lines, to see what you
think.

>  
>  
> diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
> index dfa02f77ee..99cca9382c 100644
> --- a/nbd/nbd-internal.h
> +++ b/nbd/nbd-internal.h
> @@ -74,13 +74,13 @@ static inline int nbd_write(QIOChannel *ioc, const void 
> *buffer, size_t size,
>  
>  struct NBDTLSHandshakeData {
>      GMainLoop *loop;
> +    Coroutine *co;
>      bool complete;
>      Error *error;
>  };

I had tried to get rid of the GMainLoop *loop member altogether, but
your change has the benefit of a smaller diff than what I was facing
(I got lost in the weeds trying to see if I could convert all of the
client into running in coroutine context).

>  
> -
> -void nbd_tls_handshake(QIOTask *task,
> -                       void *opaque);
> +void nbd_server_tls_handshake(QIOTask *task, void *opaque);
> +void nbd_client_tls_handshake(QIOTask *task, void *opaque);
>  
>  int nbd_drop(QIOChannel *ioc, size_t size, Error **errp);
>  
> diff --git a/nbd/server.c b/nbd/server.c
> index c3484cc1eb..b218512ced 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -777,17 +777,17 @@ static QIOChannel 
> *nbd_negotiate_handle_starttls(NBDClient *client,
>  
>      qio_channel_set_name(QIO_CHANNEL(tioc), "nbd-server-tls");
>      trace_nbd_negotiate_handle_starttls_handshake();
> -    data.loop = g_main_loop_new(g_main_context_default(), FALSE);
> +    data.co = qemu_coroutine_self();
>      qio_channel_tls_handshake(tioc,
> -                              nbd_tls_handshake,
> +                              nbd_server_tls_handshake,
>                                &data,
>                                NULL,
>                                NULL);
>  
> -    if (!data.complete) {
> -        g_main_loop_run(data.loop);
> +    while (!data.complete) {
> +        qemu_coroutine_yield();
>      }
> -    g_main_loop_unref(data.loop);
> +
>      if (data.error) {
>          object_unref(OBJECT(tioc));
>          error_propagate(errp, data.error);
> -- 
> 2.33.0
>

Thanks for the updated patch - it looks like we are heading in a good direction.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org


Reply via email to