On Mon, Mar 04, 2024 at 12:42:25PM +0000, Daniel P. Berrangé wrote:
> On Mon, Mar 04, 2024 at 08:35:36PM +0800, Peter Xu wrote:
> > Fabiano,
> > 
> > On Thu, Feb 29, 2024 at 12:29:54PM -0300, Fabiano Rosas wrote:
> > > => guest: 128 GB RAM - 120 GB dirty - 1 vcpu in tight loop dirtying memory
> > 
> > I'm curious normally how much time does it take to do the final fdatasync()
> > for you when you did this test.
> > 
> > I finally got a relatively large system today and gave it a quick shot over
> > 128G (100G busy dirty) mapped-ram snapshot with 8 multifd channels.  The
> > migration save/load does all fine, so I don't think there's anything wrong
> > with the patchset, however when save completes (I'll need to stop the
> > workload as my disk isn't fast enough I guess..) I'll always hit a super
> > long hang of QEMU on fdatasync() on XFS during which the main thread is in
> > UNINTERRUPTIBLE state.
> 
> That isn't very surprising. If you don't have O_DIRECT enabled, then
> all that disk I/O from the migrate is going to be in RAM, and thus the
> fdatasync() is likely to trigger writing out a lot of data.
> 
> Blocking the main QEMU thread though is pretty unhelpful. That suggests
> the data sync needs to be moved to a non-main thread.

Perhaps the migration thread itself could be a candidate, then.

> 
> With O_DIRECT meanwhile there should be essentially no hit from fdatasync.

From a user's point of view, my gut feeling is that the switch to COMPLETED
status is a natural marker that the flush is done.  If that makes sense,
maybe we can do the sync before setting COMPLETED.
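
To make the ordering concrete, here is a rough standalone sketch (plain
pthreads, not QEMU code): a worker thread does the fdatasync() and only
afterwards flips the state to COMPLETED, so the thread that started it never
blocks on the flush.  All names here (snap_sync, snap_sync_start, SNAP_*)
are made up for illustration and are not existing QEMU APIs.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <unistd.h>

/* Hypothetical states; not QEMU's real migration status enum. */
enum snap_state { SNAP_ACTIVE, SNAP_COMPLETED, SNAP_FAILED };

struct snap_sync {
    int fd;             /* file holding the snapshot */
    _Atomic int state;  /* one of enum snap_state */
    int err;            /* errno from fdatasync() on failure */
    pthread_t thread;
};

/* Runs off the main thread: flush first, publish COMPLETED afterwards. */
static void *snap_sync_worker(void *opaque)
{
    struct snap_sync *s = opaque;
    int ret;

    do {
        ret = fdatasync(s->fd);
    } while (ret < 0 && errno == EINTR);

    if (ret < 0) {
        s->err = errno;
        atomic_store(&s->state, SNAP_FAILED);
    } else {
        atomic_store(&s->state, SNAP_COMPLETED);
    }
    return NULL;
}

/* Kick off the flush; returns 0 on success like pthread_create(). */
static int snap_sync_start(struct snap_sync *s, int fd)
{
    s->fd = fd;
    s->err = 0;
    atomic_init(&s->state, SNAP_ACTIVE);
    return pthread_create(&s->thread, NULL, snap_sync_worker, s);
}

/* Wait for the worker and report the final state. */
static int snap_sync_join(struct snap_sync *s)
{
    pthread_join(s->thread, NULL);
    return atomic_load(&s->state);
}
```

The caller can poll the state while the flush is in flight; it stays ACTIVE
until the data is on disk, so an observer never sees COMPLETED early.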

No matter which thread does the sync, it's still a pity that it will go into
UNINTERRUPTIBLE state during fdatasync(): anyone who tries to, e.g., attach
gdb to it to have a look will hang as well.

Thanks,

-- 
Peter Xu

