Am 07.06.2019 um 16:27 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 07.06.2019 16:26, Kevin Wolf wrote:
> > Am 07.06.2019 um 14:02 hat Vladimir Sementsov-Ogievskiy geschrieben:
> >> 07.06.2019 11:06, Kevin Wolf wrote:
> >>> Am 07.06.2019 um 05:17 hat Eric Blake geschrieben:
> >>>> On 4/11/19 12:27 PM, Vladimir Sementsov-Ogievskiy wrote:
> >>>>> +static coroutine_fn void nbd_reconnect_loop(NBDConnection *con)
> >>>>> +{
> >>>>> +    NBDClientSession *s = nbd_get_client_session(con->bs);
> >>>>> +    uint64_t start_time_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
> >>>>> +    uint64_t delay_ns = s->reconnect_delay * 1000000000UL;
> >>>>
> >>>> Do we have a #define constant for nanoseconds in a second to make this
> >>>> more legible than counting '0's?
> >>>>
> >>>>> +    uint64_t timeout = 1000000000UL;     /* 1 sec */
> >>>>> +    uint64_t max_timeout = 16000000000UL; /* 16 sec */
> >>>>
> >>>> 1 * constant, 16 * constant
> >>>>
> >>>>> +
> >>>>> +    nbd_reconnect_attempt(con);
> >>>>> +
> >>>>> +    while (nbd_client_connecting(s)) {
> >>>>> +        if (s->state == NBD_CLIENT_CONNECTING_WAIT &&
> >>>>> +            qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - start_time_ns > delay_ns)
> >>>>> +        {
> >>>>> +            s->state = NBD_CLIENT_CONNECTING_NOWAIT;
> >>>>> +            qemu_co_queue_restart_all(&s->free_sema);
> >>>>> +        }
> >>>>> +
> >>>>> +        bdrv_dec_in_flight(con->bs);
> >>>>> +        qemu_co_sleep_ns(QEMU_CLOCK_REALTIME, timeout);
> >>>>
> >>>> Another place where I'd like someone more familiar with coroutines to
> >>>> also have a look.
> >>>
> >>> What's the exact question you'd like me to answer?
> >>>
> >>> But anyway, bdrv_dec/inc_in_flight() around the sleep looks wrong to me.
> >>> Either the operation must be waited for in drain, then you can't
> >>> decrease the counter even during the sleep. Or drain doesn't have to
> >>> consider it, then why is the counter even increased in the first place?
> >>>
> >>> The way it currently is, drain can return assuming that there are no
> >>> requests, but after the timeout the request automatically comes back
> >>> while the drain caller assumes that there is no more activity. This
> >>> doesn't look right.
> >>
> >> Hmm.
> >>
> >> You added this inc/dec around the whole lifetime of the connection
> >> coroutine in
> >>
> >> commit 5ad81b4946baf948c65cf7e1436d9b74803c1280
> >> Author: Kevin Wolf <kw...@redhat.com>
> >> Date:   Fri Feb 15 16:53:51 2019 +0100
> >>
> >>     nbd: Restrict connection_co reentrance
> >>
> >> And now I try to connect in an endless loop, with qemu_co_sleep_ns()
> >> inside. I need to give bdrv_drain a chance to complete, so I have to
> >> sometimes drop this in_flight request. But I understand your point.
> >
> > Ah, right, I see. I think it's fine to add a second point where we
> > decrease the counter (though I would add a comment telling why we do
> > this) as long as the right conditions are met.
> >
> > The right conditions are probably something like: Once drained, we
> > guarantee that we don't call any callbacks until the drained section
> > ends. In nbd_read_eof() this is true because we can't get an answer if
> > we had no request running.
> >
> > Or actually... This assumes a nice compliant server that doesn't just
> > arbitrarily send unexpected messages. Is the existing code broken if we
> > connect to a malicious server?
> >
> >> Will the following work better?
> >>
> >>     bdrv_dec_in_flight(con->bs);
> >>     qemu_co_sleep_ns(...);
> >>     while (s->drained) {
> >>         s->wait_drained_end = true;
> >>         qemu_coroutine_yield();
> >>     }
> >>     bdrv_inc_in_flight(con->bs);
> >>
> >>     ...
> >>     .drained_begin() {
> >>         s->drained = true;
> >>     }
> >>
> >>     .drained_end() {
> >>         if (s->wait_drained_end) {
> >>             s->wait_drained_end = false;
> >>             aio_co_wake(s->connection_co);
> >>         }
> >>     }
> >
> > This should make sure that we don't call any callbacks before the drain
> > section has ended.
> >
> > We probably still have a problem because nbd_client_attach_aio_context()
> > reenters the coroutine with qemu_aio_coroutine_enter(), which will cause
> > an assertion failure if co->scheduled is set. So this needs to use a
> > version that can cancel a sleep.
> >
> > I see that your patch currently simply ignores attaching a new context,
> > but then the coroutine stays in the old AioContext. Did I miss some
> > additional code that moves it to the new context somehow, or will it just
> > stay where it was if the coroutine happens to be reconnecting when the
> > AioContext was supposed to change?
>
> Hmm. Do you mean "in the latter case we do nothing" in
>
> void nbd_client_attach_aio_context(BlockDriverState *bs,
>                                    AioContext *new_context)
> {
>     NBDClientSession *client = nbd_get_client_session(bs);
>
>     /*
>      * client->connection_co is either yielded from nbd_receive_reply or
>      * from nbd_reconnect_loop(); in the latter case we do nothing
>      */
>     if (client->state == NBD_CLIENT_CONNECTED) {
>         qio_channel_attach_aio_context(QIO_CHANNEL(client->ioc), new_context);
>
>         bdrv_inc_in_flight(bs);
>
>         /*
>          * Need to wait here for the BH to run because the BH must run
>          * while the node is still drained.
>          */
>         aio_wait_bh_oneshot(new_context, nbd_client_attach_aio_context_bh, bs);
>     }
> }
>
> ?
>
> Not sure why I ignored this case. So, I should run the if() body
> unconditionally here and support interrupting the timer-sleeping coroutine
> in nbd_client_attach_aio_context_bh, yes?
Yes, I think so. We now have two places where the coroutine can be yielded:
the old one, which simply yields in a loop and can be reentered without a
problem, and the new one in a sleep. Both need to be supported when we
switch the coroutine to a new AioContext.

Kevin