On Tue, Mar 16, 2021 at 09:37:13PM +0300, Vladimir Sementsov-Ogievskiy wrote: > 16.03.2021 19:08, Roman Kagan wrote: > > On Mon, Mar 15, 2021 at 11:15:44PM +0300, Vladimir Sementsov-Ogievskiy > > wrote: > > > 15.03.2021 09:06, Roman Kagan wrote: > > > > As the reconnect logic no longer interferes with drained sections, it > > > > appears unnecessary to explicitly manipulate the in_flight counter. > > > > > > > > Fixes: 5ad81b4946 ("nbd: Restrict connection_co reentrance") > > > > > > And here you actually allow qemu_aio_coroutine_enter() call in > > > nbd_client_attach_aio_context_bh() to enter connection_co in any yield > > > point which is possible during drained section. The analysis should be > > > done to be sure that all these yield points are safe for reentering by > > > external qemu_aio_coroutine_enter(). (By external I mean not by the > > > actual enter() we are waiting for at the yield() point. For example > > > qemu_channel_yield() supports reentering.. And therefore (as I > > > understand after fast looking through) nbd_read() should support > > > reentering too.. > > > > I'll do a more thorough analysis of how safe it is. > > > > FWIW this hasn't triggered any test failures yet, but that assert in > > patch 3 didn't ever go off either so I'm not sure I can trust the tests > > on this. > > > > Hmm. First, we should consider qemu_coroutine_yield() in > nbd_co_establish_connection(). > > Most of nbd_co_establish_connection_cancel() purpose is to avoid > reentering this yield()..
Unless I'm overlooking something, nbd_co_establish_connection() is fine with spurious entering at this yield point. What does look problematic, though, is your next point: > And I don't know, how to make it safely reenterable: keep in mind bh > that may be already scheduled by connect_thread_func(). And if bh is > already scheduled, can we cancel it? I'm not sure. > > We have qemu_bh_delete(). But is it possible, that BH is near to be > executed and already cannot be removed by qemu_bh_delete()? I don't > know. > > And if we can't safely drop the bh at any moment, we should wait in > nbd_client_detach_aio_context until the scheduled bh enters the > connection_co.. Or something like this So I think it's not the reentry at this yield point per se which is problematic, it's that that bh may have been scheduled before the aio_context switch so once it runs it would wake up connection_co on the old aio_context. I think it may be possible to address by adding a check into connect_bh(). Thanks, Roman.