Am 18.09.2018 um 16:12 hat Paolo Bonzini geschrieben: > On 18/09/2018 13:34, Kevin Wolf wrote: > >> But then basically the main issue is mirror.c's call to > >> bdrv_drained_begin/end. There are no other calls to > >> bdrv_drained_begin/end inside coroutines IIRC. > > > > Coroutine or not doesn't matter. What matters is that you drain inside > > some (high-level) operation that already needs to be completed itself > > for drain to return. > > Indeed. However, are there any calls to bdrv_drained_begin/end that are > inside inc_in_flight/dec_in_flight, and are not inside coroutines? > (Sorry if I don't see the high-level issue, I'm more familiar with the > low-level functioning...).
inc_in_flight is only one thing that signals pending activity. Essentially, what we're interested in is everything that will make bdrv_drain_poll() return true. For a BDS that is in use by a block job, that BDS will be considered busy as long as the job hasn't paused yet, because otherwise the job can still issue new requests to the BDS (see child_job_drained_poll()). A closely related fix in this series is "blockjob: Lie better in child_job_drained_poll()". Jobs need to make bdrv_drain_poll() return true for their BDSes even between returning from .run (before John's recent series called .start) and between .prepare/.commit/.abort (which used to be the deferred_to_main_loop part) because otherwise drain would return, but the BH would still be pending and change the graph in the middle of a drained section, which is obviously a big problem. This is not purely theoretical, but a crash we observed in practice: bdrv_reopen() suffered use-after-free crashes because after its initial bdrv_subtree_drained_begin(), a node was removed from the graph in a block job completion, but it still was in the ReopenQueue. (IIRC the reopen belonged to one job and the completion to a second job that completed at the same time.) So there is no choice here: The bdrv_subtree_drained_begin() in reopen _must_ make sure that the block job is really quiescent and no callbacks for it are pending any more. At the same time, the completion callbacks of the job (.prepare/.commit/ .abort) use drain themselves because they contain reopens and perform graph reconfigurations. In this case, we don't want to wait for the completion callback to finish because that's a deadlock. Whether we want to wait on the job doesn't depend on where in its code the job currently is. It only depends on whether the drain is issued by the job itself or by someone else. This is what makes it hard to return the right thing in child_job_drained_poll() because I don't see an easy way to know the caller. Just setting busy = false temporarily for calling bdrv_drained_begin() in the job would be wrong: If another thread concurrently drained the same BDS, it would still have to wait! The job being in the exact same state, child_job_drained_poll() has to return different values in the two threads. > It seems to me that if we fixed the mirror bdrv_(co_)drained_begin > case, then it would be good enough. It's only one special case of the problem. If we could solve the problem for the other job types, this one could be fixed the same way. Kevin