Re: Revisiting parallel save/restore

2024-05-02 Thread Claudio Fontana
On 4/26/24 16:50, Daniel P. Berrangé wrote:
> On Fri, Apr 26, 2024 at 11:44:38AM -0300, Fabiano Rosas wrote:
>> Daniel P. Berrangé  writes:
>>
>>> On Fri, Apr 26, 2024 at 10:03:29AM -0300, Fabiano Rosas wrote:
 Daniel P. Berrangé  writes:

> On Wed, Apr 17, 2024 at 05:12:27PM -0600, Jim Fehlig via Devel wrote:
>> A good starting point on this journey is supporting the new mapped-ram
>> capability in qemu 9.0 [2]. Since mapped-ram is a new on-disk format, I
>> assume we'll need a new QEMU_SAVE_VERSION 3 when using it? Otherwise I'm not
>> sure how to detect if a saved image is in mapped-ram format vs the existing,
>> sequential stream format.
>
> Yes, we'll need to support 'mapped-ram', so that's a good first step.
>
> A question is whether we make that feature mandatory for all save images,
> or implied by another feature (parallel save), or a directly controllable
> feature with opt-in.
>
> The former breaks back compat with existing libvirt, while the latter 2
> options are net new so don't have compat implications.
>
> In terms of actual data blocks written on disk mapped-ram should be the
> same size, or smaller, than the existing format.
>
> In terms of logical file size, however, mapped-ram will almost always be
> larger.
>
> This is because mapped-ram will result in a file whose logical size matches
> the guest RAM size, plus some header overhead, while being sparse so not
> all blocks are written.
>
> If tools handling save images aren't sparse-aware this could come across
> as a surprise and even be considered a regression.
>
> Mapped ram is needed for parallel saves since it lets each thread write
> to a specific region of the file.
>
> Mapped ram is good for non-parallel saves too though, because the mapping
> of RAM into the file is aligned suitably to allow for O_DIRECT to be used.
> Currently libvirt has to tunnel over its iohelper to futz alignment
> needed for O_DIRECT. This makes it desirable to use in general, but back
> compat hurts...

 Note that QEMU doesn't support O_DIRECT without multifd.

 From mapped-ram patch series v4:

 - Dropped support for direct-io with fixed-ram _without_ multifd. This
   is something I said I would do for this version, but I had to drop
   it because performance is really bad. I think the single-threaded
   precopy code cannot cope with the extra latency/synchronicity of
   O_DIRECT.
>>>
>>> Note the reason for using O_DIRECT is *not* to make saving / restoring
>>> the guest VM faster. Rather it is to ensure that saving/restoring a VM
>>> does not trash the host I/O / buffer cache, which will negatively impact
>>> performance of all the *other* concurrently running VMs.

You can absolutely also thrash yourself, not only other VMs.

>>
>> Well, there's surely a performance degradation threshold that negates
>> the benefits of preserving the caches. But maybe it's not as low as I
>> initially thought then.
> 
> I guess you could say that O_DIRECT makes saving/restoring have a
> predictable speed, because it will no longer randomly vary depending
> on how much free RAM happens to be available at a given time. Time
> will be dominated largely by the underlying storage I/O performance.
With fast NVMe disks, my observation is that O_DIRECT without multifd is
bottlenecked on libvirt + QEMU throughput, not on storage I/O performance.

> 
> With regards,
> Daniel

Ciao,

Claudio


Re: Revisiting parallel save/restore

2024-05-01 Thread Jim Fehlig via Devel

On 4/26/24 4:04 AM, Daniel P. Berrangé wrote:

On Wed, Apr 17, 2024 at 05:12:27PM -0600, Jim Fehlig via Devel wrote:

A good starting point on this journey is supporting the new mapped-ram
capability in qemu 9.0 [2]. Since mapped-ram is a new on-disk format, I
assume we'll need a new QEMU_SAVE_VERSION 3 when using it? Otherwise I'm not
sure how to detect if a saved image is in mapped-ram format vs the existing,
sequential stream format.


Yes, we'll need to support 'mapped-ram', so that's a good first step.

A question is whether we make that feature mandatory for all save images,
or implied by another feature (parallel save), or a directly controllable
feature with opt-in.

The former breaks back compat with existing libvirt, while the latter 2
options are net new so don't have compat implications.

In terms of actual data blocks written on disk mapped-ram should be the
same size, or smaller, than the existing format.

In terms of logical file size, however, mapped-ram will almost always be
larger.

This is because mapped-ram will result in a file whose logical size matches
the guest RAM size, plus some header overhead, while being sparse so not
all blocks are written.

If tools handling save images aren't sparse-aware this could come across
as a surprise and even be considered a regression.

Mapped ram is needed for parallel saves since it lets each thread write
to a specific region of the file.

Mapped ram is good for non-parallel saves too though, because the mapping
of RAM into the file is aligned suitably to allow for O_DIRECT to be used.
Currently libvirt has to tunnel over its iohelper to futz alignment
needed for O_DIRECT. This makes it desirable to use in general, but back
compat hurts...


Looking at what we did in the past

First time, we stole an element from 'uint32_t unused[..]' in the
save header, to add the 'compressed' field, and bumped the
version. This prevented old libvirt reading the files. This was
needed as adding compression was a non-backwards compatible
change. We could have carried on using version 1 for non-compressed
files, but we didn't for some reason. It was a hard compat break.


Hmm, libvirt's implementation of compression seems to conflict with mapped-ram. 
AFAIK, mapped-ram requires a seekable fd. Should the two be mutually exclusive?




Next time, we stole an element from 'uint32 unused[..]' in the
save header, to add the 'cookie_len' field, but did NOT bump
the version. 'unused' is always all zeroes, so new libvirt could
detect whether the cookie was present by the len being non-zero.
Old libvirt would still load the image, but would be ignoring
the cookie data. This was largely harmless.

This time mapped-ram is a non-compatible change, so we need to
ensure old libvirt won't try to read the files, which suggests
either a save version bump, or we could abuse the 'compressed'
field to indicate 'mapped-ram' as a form of compression.

If we did a save version bump, we might want to carry on using
v2 for non mapped ram.


IIUC, mapped-ram cannot be used with the existing 'fd:' migration URI and
instead must use 'file:'. Does qemu advertise support for that? I couldn't
find it. If not, 'file:' (available in qemu 8.2) predates mapped-ram, so in
theory we could live without the advertisement.


'mapped-ram' is reported in QMP as a MigrationCapability, so I think we
can probe for it directly.

Yes, it is exclusively for use with 'file:' protocol. If we want to use
FD passing, then we can still do that with 'file:', by using QEMU's
generic /dev/fdset/NNN approach we have with block devices.



It's also not clear when we want to enable the mapped-ram capability. Should
it always be enabled if supported by the underlying qemu? One motivation for
creating the mapped-ram was to support direct-io of the migration stream in
qemu, in which case it could be tied to VIR_DOMAIN_SAVE_BYPASS_CACHE. E.g.
the mapped-ram capability is enabled when user specifies
VIR_DOMAIN_SAVE_BYPASS_CACHE && user-provided path results in a seekable fd
&& qemu supports mapped-ram?


One option is to be lazy and have a /etc/libvirt/qemu.conf for the
save format version, defaulting to latest v3. Release note that
admin/host provisioning apps must set it to v2 if back compat is
needed with old libvirt. If we assume new -> old save image loading
is relatively rare, that's probably good enough.

IOW, we can

  * Bump save version to 3
  * Use v3 by default


Using mapped-ram by default but not supporting compression would be a 
regression, right? E.g. 'virsh save vm-name /some/path' would suddenly fail if 
user's /etc/libvirt/qemu.conf contained 'save_image_format = "lzop"'.


Regards,
Jim


  * Add a SAVE_PARALLEL flag which implies mapped-ram, reject
    if v2
  * Use mapped RAM with BYPASS_CACHE for v3, old approach for v2
  * Steal another unused field to indicate use of mapped-ram,
    or perhaps future proof it by declaring a 'features'
    field. So we don't need to bump version again, just make
    sure that the libvirt loading an image supports all
    set features.

Re: Revisiting parallel save/restore

2024-04-26 Thread Jim Fehlig via Devel

On 4/26/24 4:07 AM, Daniel P. Berrangé wrote:

On Thu, Apr 25, 2024 at 04:41:02PM -0600, Jim Fehlig via Devel wrote:

On 4/17/24 5:12 PM, Jim Fehlig wrote:

Hi All,

While Fabiano has been working on improving save/restore performance in
qemu, I've been tinkering with the same in libvirt. The end goal is to
introduce a new VIR_DOMAIN_SAVE_PARALLEL flag for save/restore, along
with a VIR_DOMAIN_SAVE_PARAM_PARALLEL_CONNECTIONS parameter to specify
the number of concurrent channels used for the save/restore. Recall
Claudio previously posted a patch series implementing parallel
save/restore completely in libvirt, using qemu's multifd functionality
[1].

A good starting point on this journey is supporting the new mapped-ram
capability in qemu 9.0 [2]. Since mapped-ram is a new on-disk format, I
assume we'll need a new QEMU_SAVE_VERSION 3 when using it? Otherwise I'm
not sure how to detect if a saved image is in mapped-ram format vs the
existing, sequential stream format.


While hacking on a POC, I discovered the save data cookie and assume the use
of mapped-ram could be recorded there?


The issue with that is the semantics around old libvirt loading
the new image. Old libvirt won't know to look for 'mapped-ram'
element/attribute in the XML cookie, so will think it is a
traditional image with hilariously predictable results :-)


Haha :-). I need to keep in mind that upstream we aim to support new-to-old
migration. Downstream we limit the support scope, and this type of migration
scenario is one that falls in the unsupported bucket.


Regards,
Jim


Re: Revisiting parallel save/restore

2024-04-26 Thread Jim Fehlig via Devel

On 4/26/24 4:04 AM, Daniel P. Berrangé wrote:

On Wed, Apr 17, 2024 at 05:12:27PM -0600, Jim Fehlig via Devel wrote:

A good starting point on this journey is supporting the new mapped-ram
capability in qemu 9.0 [2]. Since mapped-ram is a new on-disk format, I
assume we'll need a new QEMU_SAVE_VERSION 3 when using it? Otherwise I'm not
sure how to detect if a saved image is in mapped-ram format vs the existing,
sequential stream format.


Yes, we'll need to support 'mapped-ram', so that's a good first step.

A question is whether we make that feature mandatory for all save images,
or implied by another feature (parallel save), or a directly controllable
feature with opt-in.


It feels more like an implementation detail.



The former breaks back compat with existing libvirt, while the latter 2
options are net new so don't have compat implications.

In terms of actual data blocks written on disk mapped-ram should be the
same size, or smaller, than the existing format.

In terms of logical file size, however, mapped-ram will almost always be
larger.


Correct. E.g. from a mostly idle 8G VM

# stat existing-format.sav
  Size: 510046983   Blocks: 996192 IO Block: 4096   regular file

# stat mapped-ram-format.sav
  Size: 8597730739  Blocks: 956200 IO Block: 4096   regular file

The upside is mapped-ram is bounded, unlike the existing stream, which can
result in actual file sizes much greater than RAM size when the VM runs a
memory-intensive workload.



This is because mapped-ram will result in a file whose logical size matches
the guest RAM size, plus some header overhead, while being sparse so not
all blocks are written.

If tools handling save images aren't sparse-aware this could come across
as a surprise and even be considered a regression.


Yes, I already had visions of the phone ringing off the hook asking "why are my
save images suddenly huge?". But maybe it's tolerable once users see how few
blocks are actually allocated, and when combined with parallel saves they could
also be asking "why are saves suddenly so fast?" :-).
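Tools that copy or report on these images just need to look at allocation
rather than apparent size (and copy with e.g. 'cp --sparse=always' to keep the
holes). A rough sketch of the check, Python and stdlib only, with a
hypothetical path:

import os
import sys

def save_image_sizes(path):
    """Return (apparent, allocated) sizes in bytes for a save image."""
    st = os.stat(path)
    apparent = st.st_size            # logical size: ~ guest RAM + header
    allocated = st.st_blocks * 512   # blocks actually written to disk
    return apparent, allocated

if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/lib/libvirt/qemu/save/vm.save"
    apparent, allocated = save_image_sizes(path)
    print("%s: apparent %.1f GiB, allocated %.1f GiB"
          % (path, apparent / 2**30, allocated / 2**30))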




Mapped ram is needed for parallel saves since it lets each thread write
to a specific region of the file.

Mapped ram is good for non-parallel saves too though, because the mapping
of RAM into the file is aligned suitably to allow for O_DIRECT to be used.
Currently libvirt has to tunnel over its iohelper to futz alignment
needed for O_DIRECT. This makes it desirable to use in general, but back
compat hurts...


My POC avoids the use of iohelper with mapped-ram. It provides qemu with two fds 
when direct-io has been requested, one opened with O_DIRECT, one without.
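Not the actual POC code, but the shape of the idea is roughly this (how the
pair of descriptors is handed to QEMU, e.g. bundled into one fdset, is an
assumption here, not necessarily what the POC does):

import os

def open_save_fds(path, bypass_cache):
    # Buffered fd for the small, unaligned writes (header, device state).
    fd_buffered = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    fd_direct = -1
    if bypass_cache:
        # O_DIRECT fd for the bulk RAM pages; mapped-ram keeps those writes
        # block-aligned, which is what makes O_DIRECT workable. Linux-specific,
        # and the writer must honour the device's alignment rules.
        fd_direct = os.open(path, os.O_WRONLY | os.O_DIRECT)
    return fd_buffered, fd_direct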




Looking at what we did in the past

First time, we stole an element from 'uint32_t unused[..]' in the
save header, to add the 'compressed' field, and bumped the
version. This prevented old libvirt reading the files. This was
needed as adding compression was a non-backwards compatible
change. We could have carried on using version 1 for non-compressed
files, but we didn't for some reason. It was a hard compat break.

Next time, we stole an element from 'uint32 unused[..]' in the
save header, to add the 'cookie_len' field, but did NOT bump
the version. 'unused' is always all zeroes, so new libvirt could
detect whether the cookie was present by the len being non-zero.
Old libvirt would still load the image, but would be ignoring
the cookie data. This was largely harmless.

This time mapped-ram is a non-compatible change, so we need to
ensure old libvirt won't try to read the files, which suggests
either a save version bump, or we could abuse the 'compressed'
field to indicate 'mapped-ram' as a form of compression.

If we did a save version bump, we might want to carry on using
v2 for non mapped ram.


IIUC, mapped-ram cannot be used with the existing 'fd:' migration URI and
instead must use 'file:'. Does qemu advertise support for that? I couldn't
find it. If not, 'file:' (available in qemu 8.2) predates mapped-ram, so in
theory we could live without the advertisement.


'mapped-ram' is reported in QMP as a MigrationCapability, so I think we
can probe for it directly.


Yes, mapped-ram is reported. Sorry for not being clear, but I was asking if qemu 
advertised support for the 'file:' migration URI it gained in 8.2? Probably not 
a problem either way since it predates mapped-ram.
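For reference, the raw QMP probe looks roughly like this (a sketch only; the
monitor socket path is hypothetical and error handling is omitted):

import json
import socket

def qmp(sock, cmd):
    sock.sendall(json.dumps({"execute": cmd}).encode())
    return json.loads(sock.recv(1 << 20).decode())

with socket.socket(socket.AF_UNIX) as s:
    s.connect("/var/lib/libvirt/qemu/domain-1-vm/monitor.sock")  # hypothetical
    s.recv(1 << 20)                      # discard the QMP greeting
    qmp(s, "qmp_capabilities")           # leave capabilities-negotiation mode
    reply = qmp(s, "query-migrate-capabilities")
    caps = {c["capability"] for c in reply["return"]}
    print("mapped-ram supported:", "mapped-ram" in caps)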




Yes, it is exclusively for use with 'file:' protocol. If we want to use
FD passing, then we can still do that with 'file:', by using QEMU's
generic /dev/fdset/NNN approach we have with block devices.



It's also not clear when we want to enable the mapped-ram capability. Should
it always be enabled if supported by the underlying qemu? One motivation for
creating the mapped-ram was to support direct-io of the migration stream in
qemu, in which case it could be tied to VIR_DOMAIN_SAVE_BYPASS_CACHE. E.g.
the mapped-ram capability is enabled when user specifies
VIR_DOMAIN_SAVE_BYPASS_CACHE && user-provided path results in a seekable fd
&& qemu supports mapped-ram?

Re: Revisiting parallel save/restore

2024-04-26 Thread Daniel P. Berrangé
On Fri, Apr 26, 2024 at 11:44:38AM -0300, Fabiano Rosas wrote:
> Daniel P. Berrangé  writes:
> 
> > On Fri, Apr 26, 2024 at 10:03:29AM -0300, Fabiano Rosas wrote:
> >> Daniel P. Berrangé  writes:
> >> 
> >> > On Wed, Apr 17, 2024 at 05:12:27PM -0600, Jim Fehlig via Devel wrote:
> >> >> A good starting point on this journey is supporting the new mapped-ram
> >> >> capability in qemu 9.0 [2]. Since mapped-ram is a new on-disk format, I
> >> >> assume we'll need a new QEMU_SAVE_VERSION 3 when using it? Otherwise I'm not
> >> >> sure how to detect if a saved image is in mapped-ram format vs the existing,
> >> >> sequential stream format.
> >> >
> >> > Yes, we'll need to support 'mapped-ram', so that's a good first step.
> >> >
> >> > A question is whether we make that feature mandatory for all save images,
> >> > or implied by another feature (parallel save), or a directly controllable
> >> > feature with opt-in.
> >> >
> >> > The former breaks back compat with existing libvirt, while the latter 2
> >> > options are net new so don't have compat implications.
> >> >
> >> > In terms of actual data blocks written on disk mapped-ram should be the
> >> > same size, or smaller, than the existing format.
> >> >
> >> > In terms of logical file size, however, mapped-ram will almost always be
> >> > larger.
> >> >
> >> > This is because mapped-ram will result in a file whose logical size matches
> >> > the guest RAM size, plus some header overhead, while being sparse so not
> >> > all blocks are written.
> >> >
> >> > If tools handling save images aren't sparse-aware this could come across
> >> > as a surprise and even be considered a regression.
> >> >
> >> > Mapped ram is needed for parallel saves since it lets each thread write
> >> > to a specific region of the file.
> >> >
> >> > Mapped ram is good for non-parallel saves too though, because the mapping
> >> > of RAM into the file is aligned suitably to allow for O_DIRECT to be used.
> >> > Currently libvirt has to tunnel over its iohelper to futz alignment
> >> > needed for O_DIRECT. This makes it desirable to use in general, but back
> >> > compat hurts...
> >> 
> >> Note that QEMU doesn't support O_DIRECT without multifd.
> >> 
> >> From mapped-ram patch series v4:
> >> 
> >> - Dropped support for direct-io with fixed-ram _without_ multifd. This
> >>   is something I said I would do for this version, but I had to drop
> >>   it because performance is really bad. I think the single-threaded
> >>   precopy code cannot cope with the extra latency/synchronicity of
> >>   O_DIRECT.
> >
> > Note the reason for using O_DIRECT is *not* to make saving / restoring
> > the guest VM faster. Rather it is to ensure that saving/restoring a VM
> > does not trash the host I/O / buffer cache, which will negatively impact
> > performance of all the *other* concurrently running VMs.
> 
> Well, there's surely a performance degradation threshold that negates
> the benefits of preserving the caches. But maybe it's not as low as I
> initially thought then.

I guess you could say that O_DIRECT makes saving/restoring have a
predictable speed, because it will no longer randomly vary depending
on how much free RAM happens to be available at a given time. Time
will be dominated largely by the underlying storage I/O performance.

With regards,
Daniel


Re: Revisiting parallel save/restore

2024-04-26 Thread Fabiano Rosas
Daniel P. Berrangé  writes:

> On Fri, Apr 26, 2024 at 10:03:29AM -0300, Fabiano Rosas wrote:
>> Daniel P. Berrangé  writes:
>> 
>> > On Wed, Apr 17, 2024 at 05:12:27PM -0600, Jim Fehlig via Devel wrote:
>> >> A good starting point on this journey is supporting the new mapped-ram
>> >> capability in qemu 9.0 [2]. Since mapped-ram is a new on-disk format, I
>> >> assume we'll need a new QEMU_SAVE_VERSION 3 when using it? Otherwise I'm not
>> >> sure how to detect if a saved image is in mapped-ram format vs the existing,
>> >> sequential stream format.
>> >
>> > Yes, we'll need to support 'mapped-ram', so that's a good first step.
>> >
>> > A question is whether we make that feature mandatory for all save images,
>> > or implied by another feature (parallel save), or a directly controllable
>> > feature with opt-in.
>> >
>> > The former breaks back compat with existing libvirt, while the latter 2
>> > options are net new so don't have compat implications.
>> >
>> > In terms of actual data blocks written on disk mapped-ram should be the
>> > same size, or smaller, than the existing format.
>> >
>> > In terms of logical file size, however, mapped-ram will almost always be
>> > larger.
>> >
>> > This is because mapped-ram will result in a file whose logical size matches
>> > the guest RAM size, plus some header overhead, while being sparse so not
>> > all blocks are written.
>> >
>> > If tools handling save images aren't sparse-aware this could come across
>> > as a surprise and even be considered a regression.
>> >
>> > Mapped ram is needed for parallel saves since it lets each thread write
>> > to a specific region of the file.
>> >
>> > Mapped ram is good for non-parallel saves too though, because the mapping
>> > of RAM into the file is aligned suitably to allow for O_DIRECT to be used.
>> > Currently libvirt has to tunnel over its iohelper to futz alignment
>> > needed for O_DIRECT. This makes it desirable to use in general, but back
>> > compat hurts...
>> 
>> Note that QEMU doesn't support O_DIRECT without multifd.
>> 
>> From mapped-ram patch series v4:
>> 
>> - Dropped support for direct-io with fixed-ram _without_ multifd. This
>>   is something I said I would do for this version, but I had to drop
>>   it because performance is really bad. I think the single-threaded
>>   precopy code cannot cope with the extra latency/synchronicity of
>>   O_DIRECT.
>
> Note the reason for using O_DIRECT is *not* to make saving / restoring
> the guest VM faster. Rather it is to ensure that saving/restoring a VM
> does not trash the host I/O / buffer cache, which will negatively impact
> performance of all the *other* concurrently running VMs.

Well, there's surely a performance degradation threshold that negates
the benefits of preserving the caches. But maybe it's not as low as I
initially thought then. 

The direct-io enablement is now posted to the qemu mailing list; please
take a look when you get the chance. I'll revisit the direct-io
no-parallel approach in the meantime, so let's keep that option open for
now.

>
> With regards,
> Daniel


Re: Revisiting parallel save/restore

2024-04-26 Thread Daniel P. Berrangé
On Fri, Apr 26, 2024 at 10:03:29AM -0300, Fabiano Rosas wrote:
> Daniel P. Berrangé  writes:
> 
> > On Wed, Apr 17, 2024 at 05:12:27PM -0600, Jim Fehlig via Devel wrote:
> >> A good starting point on this journey is supporting the new mapped-ram
> >> capability in qemu 9.0 [2]. Since mapped-ram is a new on-disk format, I
> >> assume we'll need a new QEMU_SAVE_VERSION 3 when using it? Otherwise I'm not
> >> sure how to detect if a saved image is in mapped-ram format vs the existing,
> >> sequential stream format.
> >
> > Yes, we'll need to support 'mapped-ram', so that's a good first step.
> >
> > A question is whether we make that feature mandatory for all save images,
> > or implied by another feature (parallel save), or a directly controllable
> > feature with opt-in.
> >
> > The former breaks back compat with existing libvirt, while the latter 2
> > options are net new so don't have compat implications.
> >
> > In terms of actual data blocks written on disk mapped-ram should be the
> > same size, or smaller, than the existing format.
> >
> > In terms of logical file size, however, mapped-ram will almost always be
> > larger.
> >
> > This is because mapped-ram will result in a file whose logical size matches
> > the guest RAM size, plus some header overhead, while being sparse so not
> > all blocks are written.
> >
> > If tools handling save images aren't sparse-aware this could come across
> > as a surprise and even be considered a regression.
> >
> > Mapped ram is needed for parallel saves since it lets each thread write
> > to a specific region of the file.
> >
> > Mapped ram is good for non-parallel saves too though, because the mapping
> > of RAM into the file is aligned suitably to allow for O_DIRECT to be used.
> > Currently libvirt has to tunnel over its iohelper to futz alignment
> > needed for O_DIRECT. This makes it desirable to use in general, but back
> > compat hurts...
> 
> Note that QEMU doesn't support O_DIRECT without multifd.
> 
> From mapped-ram patch series v4:
> 
> - Dropped support for direct-io with fixed-ram _without_ multifd. This
>   is something I said I would do for this version, but I had to drop
>   it because performance is really bad. I think the single-threaded
>   precopy code cannot cope with the extra latency/synchronicity of
>   O_DIRECT.

Note the reason for using O_DIRECT is *not* to make saving / restoring
the guest VM faster. Rather it is to ensure that saving/restoring a VM
does not trash the host I/O / buffer cache, which will negatively impact
performance of all the *other* concurrently running VMs.

With regards,
Daniel


Re: Revisiting parallel save/restore

2024-04-26 Thread Fabiano Rosas
Daniel P. Berrangé  writes:

> On Wed, Apr 17, 2024 at 05:12:27PM -0600, Jim Fehlig via Devel wrote:
>> A good starting point on this journey is supporting the new mapped-ram
>> capability in qemu 9.0 [2]. Since mapped-ram is a new on-disk format, I
>> assume we'll need a new QEMU_SAVE_VERSION 3 when using it? Otherwise I'm not
>> sure how to detect if a saved image is in mapped-ram format vs the existing,
>> sequential stream format.
>
> Yes, we'll need to support 'mapped-ram', so that's a good first step.
>
> A question is whether we make that feature mandatory for all save images,
> or implied by another feature (parallel save), or a directly controllable
> feature with opt-in.
>
> The former breaks back compat with existing libvirt, while the latter 2
> options are net new so don't have compat implications.
>
> In terms of actual data blocks written on disk mapped-ram should be the
> same size, or smaller, than the existing format.
>
> In terms of logical file size, however, mapped-ram will almost always be
> larger.
>
> This is because mapped-ram will result in a file whose logical size matches
> the guest RAM size, plus some header overhead, while being sparse so not
> all blocks are written.
>
> If tools handling save images aren't sparse-aware this could come across
> as a surprise and even be considered a regression.
>
> Mapped ram is needed for parallel saves since it lets each thread write
> to a specific region of the file.
>
> Mapped ram is good for non-parallel saves too though, because the mapping
> of RAM into the file is aligned suitably to allow for O_DIRECT to be used.
> Currently libvirt has to tunnel over its iohelper to futz alignment
> needed for O_DIRECT. This makes it desirable to use in general, but back
> compat hurts...

Note that QEMU doesn't support O_DIRECT without multifd.

From mapped-ram patch series v4:

- Dropped support for direct-io with fixed-ram _without_ multifd. This
  is something I said I would do for this version, but I had to drop
  it because performance is really bad. I think the single-threaded
  precopy code cannot cope with the extra latency/synchronicity of
  O_DIRECT.


Re: Revisiting parallel save/restore

2024-04-26 Thread Daniel P. Berrangé
On Thu, Apr 25, 2024 at 04:41:02PM -0600, Jim Fehlig via Devel wrote:
> On 4/17/24 5:12 PM, Jim Fehlig wrote:
> > Hi All,
> > 
> > While Fabiano has been working on improving save/restore performance in
> > qemu, I've been tinkering with the same in libvirt. The end goal is to
> > introduce a new VIR_DOMAIN_SAVE_PARALLEL flag for save/restore, along
> > with a VIR_DOMAIN_SAVE_PARAM_PARALLEL_CONNECTIONS parameter to specify
> > the number of concurrent channels used for the save/restore. Recall
> > Claudio previously posted a patch series implementing parallel
> > save/restore completely in libvirt, using qemu's multifd functionality
> > [1].
> > 
> > A good starting point on this journey is supporting the new mapped-ram
> > capability in qemu 9.0 [2]. Since mapped-ram is a new on-disk format, I
> > assume we'll need a new QEMU_SAVE_VERSION 3 when using it? Otherwise I'm
> > not sure how to detect if a saved image is in mapped-ram format vs the
> > existing, sequential stream format.
> 
> While hacking on a POC, I discovered the save data cookie and assume the use
> of mapped-ram could be recorded there?

The issue with that is the semantics around old libvirt loading
the new image. Old libvirt won't know to look for 'mapped-ram'
element/attribute in the XML cookie, so will think it is a
traditional image with hilariously predictable results :-)

Hence I think mapped-ram needs some addition/change to the
save image header that explicitly stops old libvirt trying
to consume it, thus the version bump you mention.


With regards,
Daniel


Re: Revisiting parallel save/restore

2024-04-26 Thread Daniel P. Berrangé
On Wed, Apr 17, 2024 at 05:12:27PM -0600, Jim Fehlig via Devel wrote:
> A good starting point on this journey is supporting the new mapped-ram
> capability in qemu 9.0 [2]. Since mapped-ram is a new on-disk format, I
> assume we'll need a new QEMU_SAVE_VERSION 3 when using it? Otherwise I'm not
> sure how to detect if a saved image is in mapped-ram format vs the existing,
> sequential stream format.

Yes, we'll need to support 'mapped-ram', so that's a good first step.

A question is whether we make that feature mandatory for all save images,
or implied by another feature (parallel save), or a directly controllable
feature with opt-in.

The former breaks back compat with existing libvirt, while the latter 2
options are net new so don't have compat implications.

In terms of actual data blocks written on disk mapped-ram should be the
same size, or smaller, than the existing format.

In terms of logical file size, however, mapped-ram will almost always be
larger.

This is because mapped-ram will result in a file whose logical size matches
the guest RAM size, plus some header overhead, while being sparse so not
all blocks are written.

If tools handling save images aren't sparse-aware this could come across
as a surprise and even be considered a regression.

Mapped ram is needed for parallel saves since it lets each thread write
to a specific region of the file.

Mapped ram is good for non-parallel saves too though, because the mapping
of RAM into the file is aligned suitably to allow for O_DIRECT to be used.
Currently libvirt has to tunnel over its iohelper to futz alignment
needed for O_DIRECT. This makes it desirable to use in general, but back
compat hurts...


Looking at what we did in the past

First time, we stole an element from 'uint32_t unused[..]' in the
save header, to add the 'compressed' field, and bumped the
version. This prevented old libvirt reading the files. This was
needed as adding compression was a non-backwards compatible
change. We could have carried on using version 1 for non-compressed
files, but we didn't for some reason. It was a hard compat break.

Next time, we stole an element from 'uint32 unused[..]' in the
save header, to add the 'cookie_len' field, but did NOT bump
the version. 'unused' is always all zeroes, so new libvirt could
detect whether the cookie was present by the len being non-zero.
Old libvirt would still load the image, but would be ignoring
the cookie data. This was largely harmless.

This time mapped-ram is a non-compatible change, so we need to
ensure old libvirt won't try to read the files, which suggests
either a save version bump, or we could abuse the 'compressed'
field to indicate 'mapped-ram' as a form of compression.

If we did a save version bump, we might want to carry on using
v2 for non mapped ram.

> IIUC, mapped-ram cannot be used with the existing 'fd:' migration URI and
> instead must use 'file:'. Does qemu advertise support for that? I couldn't
> find it. If not, 'file:' (available in qemu 8.2) predates mapped-ram, so in
> theory we could live without the advertisement.

'mapped-ram' is reported in QMP as a MigrationCapability, so I think we
can probe for it directly.

Yes, it is exclusively for use with 'file:' protocol. If we want to use
FD passing, then we can still do that with 'file:', by using QEMU's
generic /dev/fdset/NNN approach we have with block devices.
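i.e. roughly this sequence over QMP (a sketch, not libvirt code; the monitor
socket path and fdset id are illustrative, and any offset handling for
libvirt's own header in the file is left out):

import json
import os
import socket

def qmp(sock, cmd, args=None, fds=()):
    msg = {"execute": cmd}
    if args:
        msg["arguments"] = args
    data = json.dumps(msg).encode()
    if fds:
        socket.send_fds(sock, [data], list(fds))   # Python >= 3.9
    else:
        sock.sendall(data)
    return json.loads(sock.recv(1 << 20).decode())

fd = os.open("/var/lib/libvirt/qemu/save/vm.save", os.O_WRONLY | os.O_CREAT, 0o600)
with socket.socket(socket.AF_UNIX) as s:
    s.connect("/var/lib/libvirt/qemu/domain-1-vm/monitor.sock")  # hypothetical
    s.recv(1 << 20)                                   # QMP greeting
    qmp(s, "qmp_capabilities")
    qmp(s, "migrate-set-capabilities",
        {"capabilities": [{"capability": "mapped-ram", "state": True}]})
    qmp(s, "add-fd", {"fdset-id": 1}, fds=[fd])       # fd joins fdset 1
    qmp(s, "migrate", {"uri": "file:/dev/fdset/1"})   # QEMU writes via the passed fd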

> 
> It's also not clear when we want to enable the mapped-ram capability. Should
> it always be enabled if supported by the underlying qemu? One motivation for
> creating the mapped-ram was to support direct-io of the migration stream in
> qemu, in which case it could be tied to VIR_DOMAIN_SAVE_BYPASS_CACHE. E.g.
> the mapped-ram capability is enabled when user specifies
> VIR_DOMAIN_SAVE_BYPASS_CACHE && user-provided path results in a seekable fd
> && qemu supports mapped-ram?

One option is to be lazy and have a /etc/libvirt/qemu.conf setting for the
save format version, defaulting to latest v3. Release note that
admin/host provisioning apps must set it to v2 if back compat is
needed with old libvirt. If we assume new -> old save image loading
is relatively rare, that's probably good enough.

IOW, we can

 * Bump save version to 3
 * Use v3 by default
 * Add a SAVE_PARALLEL flag which implies mapped-ram, reject
   if v2
 * Use mapped RAM with BYPASS_CACHE for v3, old approach for v2
 * Steal another unused field to indicate use of mapped-ram,
   or perhaps future proof it by declaring a 'features'
   field. So we don't need to bump version again, just make
   sure that the libvirt loading an image supports all
   set features.
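As a sketch of that last option: a hypothetical v3 header could carve a
'features' bitmask out of one of the unused words, so old libvirt is stopped
by the version bump and future libvirt refuses unknown feature bits without
needing yet another bump. The layout and field names below are illustrative,
not the actual libvirt header:

import struct

QEMU_SAVE_MAGIC = b"LibvirtQemudSave"   # 16-byte magic (check qemu_saveimage.h)
FEATURE_MAPPED_RAM = 1 << 0             # hypothetical feature bit

# magic, version, data_len, was_running, compressed, cookie_len, features
HEADER_FMT = "<16sIIIIII"

def check_header(buf):
    magic, version, data_len, was_running, compressed, cookie_len, features = \
        struct.unpack_from(HEADER_FMT, buf)
    if magic != QEMU_SAVE_MAGIC:
        raise ValueError("not a libvirt-QEMU save image")
    unknown = features & ~FEATURE_MAPPED_RAM
    if version >= 3 and unknown:
        # refuse images using features this libvirt does not understand
        raise ValueError("unsupported save image features: 0x%x" % unknown)
    return version, bool(features & FEATURE_MAPPED_RAM)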

> Looking ahead, should the mapped-ram capability be required for supporting
> the VIR_DOMAIN_SAVE_PARALLEL flag? As I understand, parallel save/restore
> was another motivation for creating the mapped-ram feature. It allows
> multifd threads to write exclusively to the offsets provided by mapped-ram.
> Can multiple multifd threads concurrently write to an fd without mapped-ram?

Re: Revisiting parallel save/restore

2024-04-25 Thread Jim Fehlig via Devel

On 4/17/24 5:12 PM, Jim Fehlig wrote:

Hi All,

While Fabiano has been working on improving save/restore performance in qemu, 
I've been tinkering with the same in libvirt. The end goal is to introduce a new 
VIR_DOMAIN_SAVE_PARALLEL flag for save/restore, along with a 
VIR_DOMAIN_SAVE_PARAM_PARALLEL_CONNECTIONS parameter to specify the number of 
concurrent channels used for the save/restore. Recall Claudio previously posted 
a patch series implementing parallel save/restore completely in libvirt, using 
qemu's multifd functionality [1].


A good starting point on this journey is supporting the new mapped-ram 
capability in qemu 9.0 [2]. Since mapped-ram is a new on-disk format, I assume 
we'll need a new QEMU_SAVE_VERSION 3 when using it? Otherwise I'm not sure how 
to detect if a saved image is in mapped-ram format vs the existing, sequential 
stream format.


While hacking on a POC, I discovered the save data cookie and assume the use of 
mapped-ram could be recorded there?


IIUC, mapped-ram cannot be used with the existing 'fd:' migration URI and instead
must use 'file:'. Does qemu advertise support for that? I couldn't find it. If 
not, 'file:' (available in qemu 8.2) predates mapped-ram, so in theory we could 
live without the advertisement.


It's also not clear when we want to enable the mapped-ram capability. Should it 
always be enabled if supported by the underlying qemu? One motivation for 
creating the mapped-ram was to support direct-io of the migration stream in 
qemu, in which case it could be tied to VIR_DOMAIN_SAVE_BYPASS_CACHE. E.g. the 
mapped-ram capability is enabled when user specifies 
VIR_DOMAIN_SAVE_BYPASS_CACHE && user-provided path results in a seekable fd && 
qemu supports mapped-ram?


Comments/suggestions on these topics are much appreciated :-).

Looking ahead, should the mapped-ram capability be required for supporting the 
VIR_DOMAIN_SAVE_PARALLEL flag?


I think the answer is yes, otherwise we'd need something in libvirt like 
Claudio's original series to manage multifd channels writing to fixed offsets in 
the save file.


Regards,
Jim

As I understand, parallel save/restore was 
another motivation for creating the mapped-ram feature. It allows multifd 
threads to write exclusively to the offsets provided by mapped-ram. Can multiple 
multifd threads concurrently write to an fd without mapped-ram?


Regards,
Jim

[1] 
https://lists.libvirt.org/archives/list/devel@lists.libvirt.org/thread/3Y5GMS6A4QS4IXWDKFFV3A2FO5YMCFES/
[2] 
https://gitlab.com/qemu-project/qemu/-/blob/master/docs/devel/migration/mapped-ram.rst?ref_type=heads



Revisiting parallel save/restore

2024-04-17 Thread Jim Fehlig via Devel

Hi All,

While Fabiano has been working on improving save/restore performance in qemu, 
I've been tinkering with the same in libvirt. The end goal is to introduce a new 
VIR_DOMAIN_SAVE_PARALLEL flag for save/restore, along with a 
VIR_DOMAIN_SAVE_PARAM_PARALLEL_CONNECTIONS parameter to specify the number of 
concurrent channels used for the save/restore. Recall Claudio previously posted 
a patch series implementing parallel save/restore completely in libvirt, using 
qemu's multifd functionality [1].


A good starting point on this journey is supporting the new mapped-ram 
capability in qemu 9.0 [2]. Since mapped-ram is a new on-disk format, I assume 
we'll need a new QEMU_SAVE_VERSION 3 when using it? Otherwise I'm not sure how 
to detect if a saved image is in mapped-ram format vs the existing, sequential 
stream format.


IIUC, mapped-ram cannot be used with the existing 'fd:' migration URI and instead
must use 'file:'. Does qemu advertise support for that? I couldn't find it. If 
not, 'file:' (available in qemu 8.2) predates mapped-ram, so in theory we could 
live without the advertisement.


It's also not clear when we want to enable the mapped-ram capability. Should it 
always be enabled if supported by the underlying qemu? One motivation for 
creating the mapped-ram was to support direct-io of the migration stream in 
qemu, in which case it could be tied to VIR_DOMAIN_SAVE_BYPASS_CACHE. E.g. the 
mapped-ram capability is enabled when user specifies 
VIR_DOMAIN_SAVE_BYPASS_CACHE && user-provided path results in a seekable fd && 
qemu supports mapped-ram?
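Sketched out, that condition might look like the following (the helper names
are made up, and the capability argument is a stand-in for whatever probing
libvirt ends up doing against QEMU):

import errno
import os

VIR_DOMAIN_SAVE_BYPASS_CACHE = 1 << 0   # virDomainSaveRestoreFlags bit

def fd_is_seekable(fd):
    try:
        os.lseek(fd, 0, os.SEEK_CUR)
        return True
    except OSError as e:
        return e.errno != errno.ESPIPE   # pipes/sockets are not seekable

def should_enable_mapped_ram(flags, fd, qemu_has_mapped_ram):
    return (bool(flags & VIR_DOMAIN_SAVE_BYPASS_CACHE)
            and fd_is_seekable(fd)
            and qemu_has_mapped_ram)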


Looking ahead, should the mapped-ram capability be required for supporting the 
VIR_DOMAIN_SAVE_PARALLEL flag? As I understand, parallel save/restore was 
another motivation for creating the mapped-ram feature. It allows multifd 
threads to write exclusively to the offsets provided by mapped-ram. Can multiple 
multifd threads concurrently write to an fd without mapped-ram?


Regards,
Jim

[1] 
https://lists.libvirt.org/archives/list/devel@lists.libvirt.org/thread/3Y5GMS6A4QS4IXWDKFFV3A2FO5YMCFES/
[2] 
https://gitlab.com/qemu-project/qemu/-/blob/master/docs/devel/migration/mapped-ram.rst?ref_type=heads
