On Mon, Aug 30, 2021 at 03:00:37AM +0000, John Johnson wrote:
> > On Aug 24, 2021, at 7:15 AM, Stefan Hajnoczi <stefa...@redhat.com> wrote:
> > On Mon, Aug 16, 2021 at 09:42:37AM -0700, Elena Ufimtseva wrote:
> >> +    proxy->ioc = ioc;
> >> +    proxy->flags = VFIO_PROXY_CLIENT;
> >> +    proxy->state = VFIO_PROXY_CONNECTED;
> >> +    qemu_cond_init(&proxy->close_cv);
> >> +
> >> +    if (vfio_user_iothread == NULL) {
> >> +        vfio_user_iothread = iothread_create("VFIO user", errp);
> >> +    }
> > 
> > Why is a dedicated IOThread needed for VFIO user?
> > 
> 
>       It seemed the best model for inbound message processing.  Most messages
> are replies, so the receiver will either signal a thread waiting for the reply
> or report any errors from the server if there is no waiter.  None of this
> requires the BQL.
> 
>       If the message is a request - which currently only happens if device
> DMA targets guest memory that wasn’t mmap()d by QEMU or if the ’secure-dma’
> option is used - then the receiver acquires the BQL so it can call the
> VFIO code to handle the request.
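
If I follow, the receive-side dispatch you describe boils down to
something like this (my paraphrase with illustrative names, not the
patch code):

  static void vfio_user_dispatch(VFIOProxy *proxy, VFIOUserReply *reply,
                                 bool isreply, char *buf, VFIOUserFDs *reqfds)
  {
      if (isreply) {
          /* wake the thread blocked waiting for this reply; no BQL needed */
          qemu_cond_signal(&reply->cv);
      } else {
          /* a request: take the BQL before calling into the VFIO code */
          qemu_mutex_lock_iothread();
          proxy->request(proxy->reqarg, buf, reqfds);
          qemu_mutex_unlock_iothread();
      }
  }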

QEMU is generally event-driven and its APIs are designed for that style.
Threads in QEMU exist for parallelism or resource control; everything
else is event-driven.

It's not clear to me whether there is a reason the message processing
must be done in a separate thread, or whether it is just done this way
because the code was written in a multi-threaded rather than an
event-driven style.

You mentioned other threads waiting for replies. Which threads are they?

> > Please use true. '1' is shorter but it's less obvious to the reader (I
> > had to jump to the definition to check whether this field was bool or
> > int).
> > 
> 
>       I assume this is also true for the other boolean struct members
> I’ve added.

Yes, please. QEMU uses bool and true/false.

> 
> 
> >> +    aio_bh_schedule_oneshot(iothread_get_aio_context(vfio_user_iothread),
> >> +                            vfio_user_cb, proxy);
> >> +
> >> +    /* drop locks so the iothread can make progress */
> >> +    qemu_mutex_unlock_iothread();
> > 
> > Why does the BQL need to be dropped so vfio_user_iothread can make
> > progress?
> > 
> 
>       See above.  The iothread rarely acquires the BQL, but I have to handle
> the case where a disconnect is concurrent with an incoming request message
> that is waiting for the BQL.  See the proxy state check after the BQL is
> acquired in vfio_user_recv().
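
If I read that right, the request path after the receive is roughly
this (paraphrasing from your description, not the exact patch code):

  qemu_mutex_lock_iothread();
  if (proxy->state != VFIO_PROXY_CONNECTED) {
      /* a disconnect happened while this request was waiting for the BQL */
      qemu_mutex_unlock_iothread();
      return;
  }
  proxy->request(proxy->reqarg, buf, &reqfds);
  qemu_mutex_unlock_iothread();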

Here is how this code looks when written using coroutines (this is from
nbd/server.c):

  static coroutine_fn void nbd_trip(void *opaque)
  {
      ...
      req = nbd_request_get(client);
      ret = nbd_co_receive_request(req, &request, &local_err);
      client->recv_coroutine = NULL;
  
      if (client->closing) {
          /*
           * The client may be closed when we are blocked in
           * nbd_co_receive_request()
           */
          goto done;
      }

It's the same check, but the control flow is inverted: the server reads
the next request using nbd_co_receive_request(), a coroutine function
that can yield while waiting for data on the socket.

This way the network communication code doesn't need to know how
messages will be processed by the client or server. There is no need
for:

  if (isreply) {
      qemu_cond_signal(&reply->cv);
  } else {
      proxy->request(proxy->reqarg, buf, &reqfds);
  }

The callbacks and threads aren't hardcoded into the network
communication code.

This goes back to the question earlier about why a dedicated thread is
necessary here. I suggest writing the network communication code using
coroutines. That way the code is easier to read (no callbacks or
thread synchronization), there are fewer thread-safety issues to worry
about, and users or management tools don't need to know about additional
threads (e.g. CPU/NUMA affinity).
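
To make that concrete, the proxy's receive path could look something
like this sketch (vfio_user_co_recv_request(), vfio_user_handle_request()
and the VFIOUserMsg fields are hypothetical names, just to show the
shape):

  static coroutine_fn void vfio_user_co_trip(void *opaque)
  {
      VFIOProxy *proxy = opaque;
      VFIOUserMsg msg = { 0 };
      Error *local_err = NULL;
      int ret;

      /* can yield while waiting for data on the socket;
       * assumed to set local_err on failure */
      ret = vfio_user_co_recv_request(proxy, &msg, &local_err);

      if (proxy->state != VFIO_PROXY_CONNECTED) {
          /* the proxy may have been closed while we were receiving */
          error_free(local_err);
          return;
      }

      if (ret < 0) {
          error_report_err(local_err);
          return;
      }

      if (msg.isreply) {
          /* wake the coroutine that sent the request; no condition variable */
          aio_co_wake(msg.requester);
      } else {
          /* take the BQL inside the handler only if it is really needed */
          vfio_user_handle_request(proxy, &msg);
      }
  }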

Stefan
