On Mon, Mar 04, 2024 at 05:15:05PM -0300, Fabiano Rosas wrote:
> Peter Xu <pet...@redhat.com> writes:
> 
> > On Mon, Mar 04, 2024 at 08:53:24PM +0800, Peter Xu wrote:
> >> On Mon, Mar 04, 2024 at 12:42:25PM +0000, Daniel P. Berrangé wrote:
> >> > On Mon, Mar 04, 2024 at 08:35:36PM +0800, Peter Xu wrote:
> >> > > Fabiano,
> >> > > 
> >> > > On Thu, Feb 29, 2024 at 12:29:54PM -0300, Fabiano Rosas wrote:
> >> > > > => guest: 128 GB RAM - 120 GB dirty - 1 vcpu in tight loop
> >> > > > dirtying memory
> >> > > 
> >> > > I'm curious how long the final fdatasync() normally takes for you
> >> > > when you run this test.
> 
> I measured and it takes ~4s for the live migration and ~2s for the
> non-live. I didn't notice this before because the VM goes into
> postmigrate, so it's paused anyway.
> 
> >> > > 
> >> > > I finally got a relatively large system today and gave it a quick
> >> > > shot with a 128G (100G busy dirty) mapped-ram snapshot and 8 multifd
> >> > > channels.  The migration save/load works fine, so I don't think
> >> > > there's anything wrong with the patchset; however, when the save
> >> > > completes (I'll need to stop the workload, as my disk isn't fast
> >> > > enough I guess..) I always hit a very long hang of QEMU in
> >> > > fdatasync() on XFS, during which the main thread is in the
> >> > > UNINTERRUPTIBLE state.
> >> > 
> >> > That isn't very surprising. If you don't have O_DIRECT enabled, then
> >> > all that disk I/O from the migration is going to sit in RAM, and thus
> >> > the fdatasync() is likely to trigger writing out a lot of data.
> >> > 
> >> > Blocking the main QEMU thread though is pretty unhelpful. That suggests
> >> > the data sync needs to be moved to a non-main thread.
> >> 
> >> Perhaps the migration thread itself could also be a candidate, then.
> >> 
> >> > 
> >> > With O_DIRECT meanwhile there should be essentially no hit from
> >> > fdatasync.
> >> 
> >> From a user's point of view, the switch to COMPLETED status is a natural
> >> marker that such a flush has finished.  If that makes sense, maybe we can
> >> do the sync before setting COMPLETED.
> 
> At migration completion I believe the multifd threads will have already
> cleaned up and dropped their reference to the channel, so it might be too
> late by then.
> 
> If the sync happens in the multifd threads, we'll keep wasting (like we do
> today) the extra syscalls after the first sync succeeds.
> 
> >> 
> >> No matter which thread does the sync, it's still a pity that it will go
> >> into the UNINTERRUPTIBLE state during fdatasync(); then whoever wants to
> >> e.g. attach gdb to it to have a look will also hang.
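
As a minimal, self-contained sketch of the "sync from a non-main thread"
idea (plain POSIX threads rather than QEMU's own thread helpers; the file
name and helper names below are made up for illustration), something like
this lets the rest of the process keep making progress while the flush runs
elsewhere:

    #include <fcntl.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    /* State handed to the helper thread: the fd to flush and the result. */
    struct flush_job {
        int fd;
        int ret;
    };

    /* Runs fdatasync() off the latency-sensitive thread. */
    static void *flush_worker(void *opaque)
    {
        struct flush_job *job = opaque;

        job->ret = fdatasync(job->fd);
        return NULL;
    }

    int main(void)
    {
        struct flush_job job = { .fd = open("snapshot.img", O_WRONLY) };
        pthread_t tid;

        if (job.fd < 0) {
            perror("open");
            return 1;
        }

        /* ... write the migration stream to job.fd ... */

        pthread_create(&tid, NULL, flush_worker, &job);

        /* The main thread can keep doing other work here while the worker
         * is blocked inside fdatasync(). */

        pthread_join(tid, NULL);  /* only then report the job as completed */
        if (job.ret < 0) {
            perror("fdatasync");
        }
        close(job.fd);
        return 0;
    }

As noted above, whichever thread does the sync still ends up in
UNINTERRUPTIBLE sleep; the sketch only moves that wait off the main thread.
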
> >
> > Or... would it be nicer to get rid of the fdatasync() and leave that to
> > the upper layers?  QEMU already supports file: migration and has never
> > managed cache behavior for it; thinking about it, this does smell like
> > something that shouldn't be done in QEMU, and mapped-ram is nothing
> > special in this regard.
> >
> > The user should be able to control that either manually (sync), or
> > Libvirt can do it after QEMU quits; after all, Libvirt holds the fd
> > itself.  That would let us get rid of the UNINTERRUPTIBLE / un-debuggable
> > period of QEMU described above.  Another side benefit: rather than
> > holding on to all of QEMU's resources (especially guest RAM) while
> > waiting for a very slow disk flush, Libvirt / the upper layer can do the
> > flush separately, after all the QEMU resources have been released.
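
For what it's worth, a minimal sketch of that "flush from the upper layer"
variant, assuming the management layer re-opens (or still holds) the
migration file after QEMU has quit; the path and helper name are purely
illustrative:

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Hypothetical helper an upper layer could run once QEMU has exited
     * and released guest RAM: flush the already-written file to disk. */
    static int flush_migration_file(const char *path)
    {
        int fd = open(path, O_WRONLY);
        int ret;

        if (fd < 0) {
            perror("open");
            return -1;
        }

        ret = fdatasync(fd);    /* may take a long time on a slow disk */
        if (ret < 0) {
            perror("fdatasync");
        }
        close(fd);
        return ret;
    }

    int main(void)
    {
        /* illustrative path only */
        return flush_migration_file("/tmp/vm.migrate") < 0;
    }

The point being that by the time this runs, all of QEMU's resources are
already gone, so a slow flush no longer pins guest RAM.
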
> 
> I like the idea of QEMU having a self-contained implementation, especially
> since we'll add O_DIRECT support, which is already quite heavy-handed if
> we're talking about managing cache behavior.
> 
> However, it's not trivial to find the right place to add the sync.
> Wherever we put it, there will be implications, such as ensuring the sync
> still works after a migration failure, avoiding concurrent cleanup, etc.
> 
> In any case, I don't think it's correct to have the sync at
> qio_channel_close(), now that we've seen it might block for a long time.
> We could at the very least have a qio_channel_flush()[1] which
> QIOChannelFile implements with fdatasync().  Then the clients can choose
> when to sync.

Yes, I agree with de-coupling it.
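
To make the de-coupling concrete, here is a toy, self-contained analogue of
that split (not the actual QIOChannelFile code; the struct and function
names are made up): flushing and closing become separate operations, so the
caller decides when to pay the fdatasync() cost.

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Toy stand-in for a file-backed channel. */
    struct file_channel {
        int fd;
    };

    static int file_channel_flush(struct file_channel *c)
    {
        /* Data-only sync, mirroring the fdatasync() discussed above. */
        return fdatasync(c->fd);
    }

    static int file_channel_close(struct file_channel *c)
    {
        /* No implicit sync any more: close stays cheap. */
        return close(c->fd);
    }

    int main(void)
    {
        struct file_channel c = {
            .fd = open("vm.mig", O_WRONLY | O_CREAT | O_TRUNC, 0600),
        };

        if (c.fd < 0) {
            perror("open");
            return 1;
        }

        /* ... write the migration stream ... */

        if (file_channel_flush(&c) < 0) {  /* caller picks the sync point */
            perror("fdatasync");
        }
        return file_channel_close(&c) < 0;
    }
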

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

