On 6/19/19 9:40 PM, Max Reitz wrote:
> On 14.06.19 11:22, Vladimir Sementsov-Ogievskiy wrote:
>> 13.06.2019 19:03, Max Reitz wrote:
>>> [re-adding the original CCs, why not]
>>>
>>> On 13.06.19 16:30, Vladimir Sementsov-Ogievskiy wrote:
>>>> 13.06.2019 17:21, Max Reitz wrote:
>>>>> On 13.06.19 16:19, Vladimir Sementsov-Ogievskiy wrote:
>>>>>> 13.06.2019 1:08, Max Reitz wrote:
>>>>>>> If the main loop cancels all block jobs while the block layer is not
>>>>>>> drained, this cancelling may not happen instantaneously. We can start
>>>>>>> a drained section before vm_shutdown(), which entails another
>>>>>>> bdrv_drain_all(); this nested bdrv_drain_all() will thus basically be
>>>>>>> a no-op.
>>>>>>>
>>>>>>> We do not have to end the drained section, because we actually do not
>>>>>>> want any requests to happen from this point on.

Your cover description was more helpful for understanding the big picture
than this commit description.

>>>>>>>
>>>>>>> Signed-off-by: Max Reitz <mre...@redhat.com>
>>>>>>> ---
>>>>>>> I don't know whether it actually makes sense to never end this
>>>>>>> drained section. It makes sense to me. Please correct me if I'm
>>>>>>> wrong.
>>>>>>> ---
>>>>>>>  vl.c | 11 +++++++++++
>>>>>>>  1 file changed, 11 insertions(+)
>>>>>>>
>>>>>>> diff --git a/vl.c b/vl.c
>>>>>>> index cd1fbc4cdc..3f8b3f74f5 100644
>>>>>>> --- a/vl.c
>>>>>>> +++ b/vl.c
>>>>>>> @@ -4538,6 +4538,17 @@ int main(int argc, char **argv, char **envp)
>>>>>>>       */
>>>>>>>      migration_shutdown();
>>>>>>>
>>>>>>> +    /*
>>>>>>> +     * We must cancel all block jobs while the block layer is drained,
>>>>>>> +     * or cancelling will be affected by throttling and thus may block
>>>>>>> +     * for an extended period of time.
>>>>>>> +     * vm_shutdown() will bdrv_drain_all(), so we may as well include
>>>>>>> +     * it in the drained section.
>>>>>>> +     * We do not need to end this section, because we do not want any
>>>>>>> +     * requests happening from here on anyway.
>>>>>>> +     */
>>>>>>> +    bdrv_drain_all_begin();

This change looks sane, but I'm no expert on the block layer.

>>>>>>> +
>>>>>>>      /* No more vcpu or device emulation activity beyond this point
>>>>>>>       */
>>>>>>>      vm_shutdown();
>>>>>>
>>>>>> So, actually, the problem is that we may wait for job requests twice:
>>>>>> on drain and then on cancel.
>>>>>
>>>>> We don’t wait on drain. When the throttle node is drained, it will
>>>>> ignore throttling (as noted in the cover letter).
>>>>>
>>>>> We do wait when cancelling a job while the throttle node isn’t
>>>>> drained, though. That’s the problem.
>>>>
>>>> Ah, I understand now.
>>>>
>>>> Is it safe to drain_begin before stopping the CPUs? We may then end up
>>>> with some I/O requests queued somewhere..
>>>
>>> Hm... Aren’t guest devices prohibited from issuing requests to the
>>> block layer while their respective block device is drained?
>>
>> It's at least a buggy place; I remember Denis Plotnikov sent a patch to
>> fix it and had a huge discussion with Kevin. Here it is:
>> https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg00732.html
>
> Ah, I even have that in my inbox... The latest reply I see came in April:
>
> https://lists.nongnu.org/archive/html/qemu-block/2019-04/msg00243.html
>
> Where Kevin asked for an RFC patch in the current state.
>
> I’m not sure whether I should work around a potential bug here, if we
> can agree that it is a bug, and if it isn’t clear whether this place
> would actually be affected.

Hmm, so what is the outcome of this thread?

Apparently this is not a bug for Max, but a potential one for Vladimir?
Vladimir, are you NACKing this patch?

Thanks,

Phil.

>
> Max
>