On Tue 17 May 2016 04:47:57 PM CEST, Kevin Wolf wrote: >> > Right, but if the coroutine yields, we jump back to the caller, which >> > looks like this: >> > >> > co = qemu_coroutine_create(bdrv_flush_co_entry); >> > qemu_coroutine_enter(co, &rwco); >> > while (rwco.ret == NOT_DONE) { >> > aio_poll(aio_context, true); >> > } >> > >> > So this loops until the flush has completed. The only way I can see >> > how something else (job (a)) can run is if aio_poll() calls it. >> >> If I'm not wrong that can happen if job (a) is sleeping. >> >> If job (a) is launched with a rate limit, then the bdrv_drain() call >> will not drain it completely but return when the block job hits the >> I/O limit -> block_job_sleep_ns() -> co_aio_sleep_ns(). > > So timers. Currently, bdrv_drained_begin/end disables the FD handlers, > but not timers. Not sure if that's easily fixable in a generic way, > though, so maybe we need to disable timers one by one. > > Pausing a block job does not actually cause the timer to be cancelled. > Maybe we need to change block_job_sleep_ns() so that the function > returns only when the job isn't paused any more. Maybe something like: > > job->busy = false; > if (!block_job_is_paused(job)) { > co_aio_sleep_ns(bdrv_get_aio_context(job->bs), type, ns); > /* The job could be paused now */ > } > if (block_job_is_paused(job)) { > qemu_coroutine_yield(); > } > job->busy = true; > > Then it should be safe to assume that if block jobs are paused (even > if they were sleeping), they won't issue new requests any more until > they are resumed.
That looks good, I'll give it a try. Berto