On Wed, Apr 10, 2024 at 10:05:28AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > > @@ -755,7 +764,8 @@ struct NBDTLSServerHandshakeData {
> > > >        Coroutine *co;
> > > >    };
> > > > 
> > > > -static void nbd_server_tls_handshake(QIOTask *task, void *opaque)
> > > > +static coroutine_fn void
> > > 
> > > This is not, that's a callback for tls handshake, which is not coroutine 
> > > context as I understand.
> > 
> > The call chain is:
> > 
> > nbd_negotiate() - coroutine_fn before this patch
> >    nbd_negotiate_options() - marked coroutine_fn here
> >      nbd_negotiate_handle_starttls() - marked coroutine_fn here
> >        qio_channel_tls_handshake() - in io/channel-tls.c; not marked 
> > coroutine_fn, but
> >                                      works both in or out of coroutine 
> > context
> >          ...
> >          nbd_server_tls_handshake() - renamed in 1/2 of this series
> > 
> > > without this hunk:
> > 
> > I can drop that particular marking.  There are cases where the
> > callback is reached without having done the qemu_coroutine_yield() in
> > nbd_negotiate_handle_starttls; but there are also cases where the IO
> > took enough time that the coroutine yielded and it is that callback
> > that reawakens it.
> 
> The key thing for me is that in this case (when coroutine yielded), 
> nbd_server_tls_handshake() would finally be called from glib IO handler, set 
> in
> 
>    qio_channel_tls_handshake()
>       qio_channel_tls_handshake_task()
>          qio_channel_add_watch_full()
>             g_source_set_callback(source, (GSourceFunc)func, user_data, 
> notify);   <<<
> 
> , which would definitely not be a coroutine context.
> 
> 
> Do I understand correctly, that "coroutine_fn" means "only call from 
> coroutine context"[1], or does it mean "could be called from coroutine 
> context"[2]?

I'm always fuzzy on that distinction myself.  But reading the docs helps...


> 
> The comment in osdep.h says:
> 
>  * Mark a function that executes in coroutine context                         
>                             |}
>  *                                                                            
>                             |
>  * Functions that execute in coroutine context cannot be called directly from 
>                             |
>  * normal functions. ...
> 
> So I assume, we mean [1].

...[1] sounds more appropriate.  Adding the marker is more of a
promise that "I've audited that this function does not block and is
therefore safe for a function in coroutine context to use", but
sometimes additionally implies "this function assumes a coroutine is
present; if you are not in a coroutine, things might break".  But with
a bit of thought, it is obvious that a coroutine can call functions
that are not marked with coroutine_fn: any non-blocking syscall fits
in this category (since we don't have control over system headers to
add coroutine_fn annotations to those).  So on that grounds, it is
okay for a function marked coroutine_fn to call another function not
marked coroutine_fn - it just makes the auditing harder to ensure that
you aren't violating your promise of no non-blocking calls.

It's too close to 9.0-rc3 for my comfort to include this patch series.
Even though this bug can cause wedged migrations, I'd feel more
comfortable with preparing the pull request for this series (including
your fix for dropping this one annotation) for 9.1 and for qemu-stable
once the release is complete.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org


Reply via email to