On Wed, Aug 23, 2017 at 3:20 PM, Eric Blake <ebl...@redhat.com> wrote:
> On 08/22/2017 07:51 AM, Stefan Hajnoczi wrote:
>> The following scenario leads to an assertion failure in
>> qio_channel_yield():
>>
>> 1. Request coroutine calls qio_channel_yield() successfully when sending
>>    would block on the socket.  It is now yielded.
>> 2. nbd_read_reply_entry() calls nbd_recv_coroutines_enter_all() because
>>    nbd_receive_reply() failed.
>> 3. Request coroutine is entered and returns from qio_channel_yield().
>>    Note that the socket fd handler has not fired yet so
>>    ioc->write_coroutine is still set.
>> 4. Request coroutine attempts to send the request body with nbd_rwv()
>>    but the socket would still block.  qio_channel_yield() is called
>>    again and assert(!ioc->write_coroutine) is hit.
>>
>> The problem is that nbd_read_reply_entry() does not distinguish between
>> request coroutines that are waiting to receive a reply and those that
>> are not.
>>
>> This patch adds a per-request bool receiving flag so
>> nbd_read_reply_entry() can avoid spurious aio_wake() calls.
>>
>> Reported-by: Dr. David Alan Gilbert <dgilb...@redhat.com>
>> Signed-off-by: Stefan Hajnoczi <stefa...@redhat.com>
>> ---
>> This should fix the issue that Dave is seeing but I'm concerned that
>> there are more problems in nbd-client.c.  We don't have good
>> abstractions for writing coroutine socket I/O code.  Something like Go's
>> channels would avoid manual low-level coroutine calls.  There is
>> currently no way to cancel qio_channel_yield() so requests doing I/O may
>> remain in-flight indefinitely and nbd-client.c doesn't join them...
>
> Is this patch needed for 2.10-rc4, or does Fam's series cover the issue?

Fam's series fixes non-shared storage migration.  This patch addresses
the failure case when the server closes the connection prematurely.

Stefan
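
For readers following the thread, the idea behind the per-request receiving flag can be sketched roughly as below. This is a standalone model, not the code from the patch: RequestSlot, MAX_REQUESTS, wake_all() and wake_receivers() are hypothetical names made up for illustration, and a counter stands in for actually entering a coroutine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_REQUESTS 4  /* hypothetical size, not taken from nbd-client.c */

/* Simplified stand-in for per-request state (assumed layout). */
typedef struct {
    bool in_use;     /* a request coroutine occupies this slot */
    bool receiving;  /* the coroutine is parked waiting for a reply */
    int wakeups;     /* times the reply reader woke it (model only) */
} RequestSlot;

/*
 * Models the behaviour described in the scenario: every in-flight
 * request is woken, including one that is still yielded in the
 * middle of sending.  Returns the number of requests woken.
 */
static int wake_all(RequestSlot *slots, size_t n)
{
    int woken = 0;

    assert(slots != NULL);
    for (size_t i = 0; i < n; i++) {
        if (slots[i].in_use) {
            slots[i].wakeups++;
            woken++;
        }
    }
    return woken;
}

/*
 * Models the behaviour with the flag: only requests that are
 * actually waiting to receive a reply are woken, so a request that
 * is blocked on sending is left alone.
 */
static int wake_receivers(RequestSlot *slots, size_t n)
{
    int woken = 0;

    assert(slots != NULL);
    for (size_t i = 0; i < n; i++) {
        if (slots[i].in_use && slots[i].receiving) {
            slots[i].wakeups++;
            woken++;
        }
    }
    return woken;
}
```

With one slot blocked on sending (receiving == false) and another waiting for a reply (receiving == true), wake_all() touches both, while wake_receivers() touches only the second. In the scenario above, the first kind of wakeup is the spurious one that leads to the second qio_channel_yield() call.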