On 6/7/24 12:42 PM, Fabiano Rosas wrote:
Peter Xu <pet...@redhat.com> writes:

On Thu, May 23, 2024 at 04:05:48PM -0300, Fabiano Rosas wrote:
We've recently added support for direct-io with multifd, which brings
performance benefits, but creates a non-uniform user interface by
coupling direct-io with the multifd capability. This means that users
cannot keep the direct-io flag enabled while disabling multifd.

Libvirt in particular already has support for direct-io and parallel
migration separately from each other, so it would be a regression to
now require both options together. It's relatively simple for QEMU to
add support for direct-io migration without multifd, so let's do this
in order to keep both options decoupled.

We cannot simply enable the O_DIRECT flag, however, because not all IO
performed by the migration thread satisfies the alignment requirements
of O_DIRECT. There are many small reads & writes that add headers and
synchronization flags to the stream, which at the moment are required
to always be present.

Fortunately, due to fixed-ram migration there is a discernible moment
where only RAM pages are written to the migration file. Enable
direct-io during that moment.

Signed-off-by: Fabiano Rosas <faro...@suse.de>
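
For illustration, here is a minimal sketch of the idea described above (not
QEMU code; the file name, layout and helper names are made up): keep buffered
I/O for the small, unaligned stream headers and switch O_DIRECT on, via plain
POSIX fcntl(), only for the window where page-aligned RAM blocks are written.

/*
 * Hedged sketch, not QEMU code: all names and the file layout below
 * are invented. It only illustrates toggling O_DIRECT around the
 * window where page-aligned RAM blocks are written, while the small,
 * unaligned headers keep using buffered I/O.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static int set_direct_io(int fd, int enable)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags < 0) {
        return -1;
    }
    flags = enable ? (flags | O_DIRECT) : (flags & ~O_DIRECT);
    return fcntl(fd, F_SETFL, flags);
}

int main(void)
{
    long page_size = sysconf(_SC_PAGESIZE);
    int fd = open("migration.img", O_WRONLY | O_CREAT | O_TRUNC, 0600);
    void *page;

    if (fd < 0 || posix_memalign(&page, page_size, page_size)) {
        return 1;
    }
    memset(page, 0, page_size);

    /* Small, unaligned stream header: buffered I/O. */
    write(fd, "hdr", 3);

    /*
     * Bulk RAM window: buffer, length and file offset are all page
     * aligned (the fixed-ram layout keeps RAM at aligned offsets), so
     * O_DIRECT can be switched on just for these writes.
     */
    set_direct_io(fd, 1);
    pwrite(fd, page, page_size, page_size);
    set_direct_io(fd, 0);

    /* Trailing, unaligned stream data: back to buffered I/O. */
    lseek(fd, 0, SEEK_END);
    write(fd, "eof", 3);

    free(page);
    close(fd);
    return 0;
}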

Is anyone going to consume this?  How's the performance?

I don't think we have a pre-determined consumer for this. This came up
in an internal discussion about making the interface simpler for libvirt
and in a thread on the libvirt mailing list[1] about using O_DIRECT to
keep the snapshot data out of the caches to avoid impacting the rest of
the system. (I could have described this better in the commit message,
sorry).

Quoting Daniel:

   "Note the reason for using O_DIRECT is *not* to make saving / restoring
    the guest VM faster. Rather it is to ensure that saving/restoring a VM
    does not trash the host I/O / buffer cache, which will negatively impact
    performance of all the *other* concurrently running VMs."

1- https://lore.kernel.org/r/87sez86ztq....@suse.de

About performance: a quick test on a stopped 30G guest shows that
mapped-ram=on,direct-io=on is 12% slower than mapped-ram=on,direct-io=off.


It doesn't look super fast to me if we need to enable/disable dio in each
loop.. then it's a matter of whether we should bother, or whether it would
be easier to simply require multifd when direct-io=on.

AIUI, the issue here is that users are already allowed to specify in
libvirt the equivalent of direct-io and multifd independently of each
other (bypass-cache, parallel). To start requiring both together now in
some situations would be a regression. I confess I don't know the libvirt
code well enough to say whether this can be worked around somehow, but as
I said, it's a relatively simple change from the QEMU side.

Currently, libvirt does not support --parallel with virDomainSave* and
virDomainRestore* APIs. I'll work on that after getting support for
mapped-ram merged. --parallel is supported in virDomainMigrate* APIs, but
obviously those APIs don't accept --bypass-cache.

Regards,
Jim
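
To make the interface point concrete, here is a rough sketch against the
public libvirt C API (the domain name, save path and destination URI are
invented; error reporting is trimmed). It shows how the two knobs currently
travel through disjoint APIs: bypass-cache is a flag of the save/restore
calls, while parallel is a flag plus typed parameter of the migrate calls.

/*
 * Rough sketch using the public libvirt C API. The domain name, file
 * path and destination URI are invented for the example. It only shows
 * that the two knobs currently live in different APIs: bypass-cache on
 * save/restore, parallel on migration.
 */
#include <libvirt/libvirt.h>

/* virsh save --bypass-cache equivalent: O_DIRECT on the save file. */
static int save_bypassing_cache(virDomainPtr dom)
{
    return virDomainSaveFlags(dom, "/var/lib/libvirt/save/demo.save",
                              NULL, VIR_DOMAIN_SAVE_BYPASS_CACHE);
}

/* virsh migrate --p2p --parallel equivalent: multifd with N channels. */
static int migrate_parallel(virDomainPtr dom)
{
    virTypedParameterPtr params = NULL;
    int nparams = 0, maxparams = 0;
    int ret;

    virTypedParamsAddInt(&params, &nparams, &maxparams,
                         VIR_MIGRATE_PARAM_PARALLEL_CONNECTIONS, 4);
    ret = virDomainMigrateToURI3(dom, "qemu+ssh://dst/system",
                                 params, nparams,
                                 VIR_MIGRATE_LIVE | VIR_MIGRATE_PEER2PEER |
                                 VIR_MIGRATE_PARALLEL);
    virTypedParamsFree(params, nparams);
    return ret;
}

int main(int argc, char **argv)
{
    virConnectPtr conn = virConnectOpen("qemu:///system");
    virDomainPtr dom = conn ? virDomainLookupByName(conn, "demo-guest") : NULL;
    int ret = 1;

    (void)argv;
    if (dom) {
        /* Pick one path; the two cannot be combined today. */
        ret = (argc > 1 ? migrate_parallel(dom) : save_bypassing_cache(dom)) < 0;
        virDomainFree(dom);
    }
    if (conn) {
        virConnectClose(conn);
    }
    return ret;
}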


Another option would be for libvirt to keep using multifd, but make it
1 channel only if --parallel is not specified. That might be enough to
solve the interface issues. Of course, it's a different code path
altogether from the usual precopy code that gets executed when
multifd=off; I don't know whether that could be an issue somehow.

