13.06.2019 19:03, Max Reitz wrote: > [re-adding the original CCs, why not] > > On 13.06.19 16:30, Vladimir Sementsov-Ogievskiy wrote: >> 13.06.2019 17:21, Max Reitz wrote: >>> On 13.06.19 16:19, Vladimir Sementsov-Ogievskiy wrote: >>>> 13.06.2019 1:08, Max Reitz wrote: >>>>> If the main loop cancels all block jobs while the block layer is not >>>>> drained, this cancelling may not happen instantaneously. We can start a >>>>> drained section before vm_shutdown(), which entails another >>>>> bdrv_drain_all(); this nested bdrv_drain_all() will thus be a no-op, >>>>> basically. >>>>> >>>>> We do not have to end the drained section, because we actually do not >>>>> want any requests to happen from this point on. >>>>> >>>>> Signed-off-by: Max Reitz <mre...@redhat.com> >>>>> --- >>>>> I don't know whether it actually makes sense to never end this drained >>>>> section. It makes sense to me. Please correct me if I'm wrong. >>>>> --- >>>>> vl.c | 11 +++++++++++ >>>>> 1 file changed, 11 insertions(+) >>>>> >>>>> diff --git a/vl.c b/vl.c >>>>> index cd1fbc4cdc..3f8b3f74f5 100644 >>>>> --- a/vl.c >>>>> +++ b/vl.c >>>>> @@ -4538,6 +4538,17 @@ int main(int argc, char **argv, char **envp) >>>>> */ >>>>> migration_shutdown(); >>>>> >>>>> + /* >>>>> + * We must cancel all block jobs while the block layer is drained, >>>>> + * or cancelling will be affected by throttling and thus may block >>>>> + * for an extended period of time. >>>>> + * vm_shutdown() will bdrv_drain_all(), so we may as well include >>>>> + * it in the drained section. >>>>> + * We do not need to end this section, because we do not want any >>>>> + * requests happening from here on anyway. >>>>> + */ >>>>> + bdrv_drain_all_begin(); >>>>> + >>>>> /* No more vcpu or device emulation activity beyond this point */ >>>>> vm_shutdown(); >>>>> >>>>> >>>> >>>> So, actually, the problem is that we may wait for job requests twice: >>>> on drain and then on cancel. >>> >>> We don’t wait on drain. When the throttle node is drained, it will >>> ignore throttling (as noted in the cover letter). >>> >>> We do wait when cancelling a job while the throttle node isn’t drained, >>> though. That’s the problem. >> >> Ah, understand now. >> >> Is it safe to drain_begin before stopping cpus? We may finish up then with >> some queued >> somewhere IO requests.. > > Hm... Aren’t guest devices prohibited from issuing requests to the > block layer while their respective block device is drained?
It's at least a buggy place, I remember Denis Plotnikov sent patch to fix it and had a huge discussion with Kevin. And here it is: https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg00732.html > > Otherwise, I suppose I’ll have to move the bdrv_drain_all_begin() below > the vm_shutdown(). That wouldn’t be too big of a problem. > > Max > -- Best regards, Vladimir