Re: Revisiting parallel save/restore
On 4/26/24 16:50, Daniel P. Berrangé wrote:
> On Fri, Apr 26, 2024 at 11:44:38AM -0300, Fabiano Rosas wrote:
>> Daniel P. Berrangé writes:
>>
>>> On Fri, Apr 26, 2024 at 10:03:29AM -0300, Fabiano Rosas wrote:
>>>> Daniel P. Berrangé writes:
>>>>
>>>>> On Wed, Apr 17, 2024 at 05:12:27PM -0600, Jim Fehlig via Devel wrote:
>>>>>> A good starting point on this journey is supporting the new mapped-ram
>>>>>> capability in qemu 9.0 [2]. Since mapped-ram is a new on-disk format, I
>>>>>> assume we'll need a new QEMU_SAVE_VERSION 3 when using it? Otherwise I'm
>>>>>> not sure how to detect if a saved image is in mapped-ram format vs the
>>>>>> existing, sequential stream format.
>>>>>
>>>>> Yes, we'll need to be supporting 'mapped-ram', so a good first step.
>>>>>
>>>>> A question is whether we make that feature mandatory for all save images,
>>>>> or implied by another feature (parallel save), or a directly controllable
>>>>> feature with opt-in.
>>>>>
>>>>> The former breaks back compat with existing libvirt, while the latter 2
>>>>> options are net new so don't have compat implications.
>>>>>
>>>>> In terms of actual data blocks written on disk, mapped-ram should be the
>>>>> same size, or smaller, than the existing format.
>>>>>
>>>>> In terms of logical file size, however, mapped-ram will almost always be
>>>>> larger.
>>>>>
>>>>> This is because mapped-ram will result in a file whose logical size
>>>>> matches the guest RAM size, plus some header overhead, while being
>>>>> sparse so not all blocks are written.
>>>>>
>>>>> If tools handling save images aren't sparse-aware this could come across
>>>>> as a surprise and even be considered a regression.
>>>>>
>>>>> Mapped ram is needed for parallel saves since it lets each thread write
>>>>> to a specific region of the file.
>>>>>
>>>>> Mapped ram is good for non-parallel saves too though, because the
>>>>> mapping of RAM into the file is aligned suitably to allow for O_DIRECT
>>>>> to be used. Currently libvirt has to tunnel over its iohelper to futz
>>>>> alignment needed for O_DIRECT. This makes it desirable to use in
>>>>> general, but back compat hurts...
>>>>
>>>> Note that QEMU doesn't support O_DIRECT without multifd.
>>>>
>>>> From mapped-ram patch series v4:
>>>>
>>>>   - Dropped support for direct-io with fixed-ram _without_ multifd. This
>>>>     is something I said I would do for this version, but I had to drop
>>>>     it because performance is really bad. I think the single-threaded
>>>>     precopy code cannot cope with the extra latency/synchronicity of
>>>>     O_DIRECT.
>>>
>>> Note the reason for using O_DIRECT is *not* to make saving / restoring
>>> the guest VM faster. Rather it is to ensure that saving/restoring a VM
>>> does not trash the host I/O / buffer cache, which will negatively impact
>>> performance of all the *other* concurrently running VMs.

You can absolutely also thrash yourself, not only other VMs.

>> Well, there's surely a performance degradation threshold that negates
>> the benefits of preserving the caches. But maybe it's not as low as I
>> initially thought then.
>
> I guess you could say that O_DIRECT makes saving/restoring have a
> predictable speed, because it will no longer randomly vary depending
> on how much free RAM happens to be available at a given time. Time
> will be dominated largely by the underlying storage I/O performance.

With fast nvme disks, my observation is that O_DIRECT without multifd is
bottlenecked on libvirt + QEMU throughput, not on storage I/O performance.

> With regards,
> Daniel

Ciao,
Claudio

___
Devel mailing list -- devel@lists.libvirt.org
To unsubscribe send an email to devel-le...@lists.libvirt.org
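The logical-vs-allocated size behaviour discussed in the quoted thread is plain sparse-file semantics and can be reproduced with a simple truncate. An illustrative sketch (not libvirt code):

```python
import os
import tempfile

# A file's logical size can far exceed its allocated blocks; this is what
# a mapped-ram image looks like to tools that are not sparse-aware.
fd, path = tempfile.mkstemp()
os.ftruncate(fd, 1 << 30)    # logical size: 1 GiB, nothing written
st = os.stat(path)
print(st.st_size)            # 1073741824
print(st.st_blocks * 512)    # bytes actually allocated: typically 0
os.close(fd)
os.unlink(path)
```

A non-sparse-aware copy of such a file (e.g. one that reads every byte) materializes the holes, which is the "regression" concern raised above.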
Re: Revisiting parallel save/restore
On 4/26/24 4:04 AM, Daniel P. Berrangé wrote:
> On Wed, Apr 17, 2024 at 05:12:27PM -0600, Jim Fehlig via Devel wrote:
>> A good starting point on this journey is supporting the new mapped-ram
>> capability in qemu 9.0 [2]. Since mapped-ram is a new on-disk format, I
>> assume we'll need a new QEMU_SAVE_VERSION 3 when using it? Otherwise I'm
>> not sure how to detect if a saved image is in mapped-ram format vs the
>> existing, sequential stream format.
>
> Yes, we'll need to be supporting 'mapped-ram', so a good first step.
>
> A question is whether we make that feature mandatory for all save images,
> or implied by another feature (parallel save), or a directly controllable
> feature with opt-in.
>
> The former breaks back compat with existing libvirt, while the latter 2
> options are net new so don't have compat implications.
>
> In terms of actual data blocks written on disk, mapped-ram should be the
> same size, or smaller, than the existing format.
>
> In terms of logical file size, however, mapped-ram will almost always be
> larger.
>
> This is because mapped-ram will result in a file whose logical size
> matches the guest RAM size, plus some header overhead, while being sparse
> so not all blocks are written.
>
> If tools handling save images aren't sparse-aware this could come across
> as a surprise and even be considered a regression.
>
> Mapped ram is needed for parallel saves since it lets each thread write
> to a specific region of the file.
>
> Mapped ram is good for non-parallel saves too though, because the mapping
> of RAM into the file is aligned suitably to allow for O_DIRECT to be
> used. Currently libvirt has to tunnel over its iohelper to futz alignment
> needed for O_DIRECT. This makes it desirable to use in general, but back
> compat hurts...
>
> Looking at what we did in the past:
>
> First time, we stole an element from 'uint32_t unused[..]' in the save
> header, to add the 'compressed' field, and bumped the version. This
> prevented old libvirt reading the files. This was needed as adding
> compression was a non-backwards compatible change. We could have carried
> on using version 1 for non-compressed fields, but we didn't for some
> reason. It was a hard compat break.

Hmm, libvirt's implementation of compression seems to conflict with
mapped-ram. AFAIK, mapped-ram requires a seekable fd. Should the two be
mutually exclusive?

> Next time, we stole an element from 'uint32 unused[..]' in the save
> header, to add the 'cookie_len' field, but did NOT bump the version.
> 'unused' is always all zeroes, so new libvirt could detect whether the
> cookie was present by the len being non-zero. Old libvirt would still
> load the image, but would be ignoring the cookie data. This was largely
> harmless.
>
> This time mapped-ram is a non-compatible change, so we need to ensure old
> libvirt won't try to read the files, which suggests either a save version
> bump, or we could abuse the 'compressed' field to indicate 'mapped-ram'
> as a form of compression. If we did a save version bump, we might want to
> carry on using v2 for non mapped ram.
>
>> IIUC, mapped-ram cannot be used with the existing 'fd:' migration URI
>> and instead must use 'file:'. Does qemu advertise support for that? I
>> couldn't find it. If not, 'file:' (available in qemu 8.2) predates
>> mapped-ram, so in theory we could live without the advertisement.
>
> 'mapped-ram' is reported in QMP as a MigrationCapability, so I think we
> can probe for it directly.
>
> Yes, it is exclusively for use with 'file:' protocol. If we want to use
> FD passing, then we can still do that with 'file:', by using QEMU's
> generic /dev/fdset/NNN approach we have with block devices.
>
>> It's also not clear when we want to enable the mapped-ram capability.
>> Should it always be enabled if supported by the underlying qemu? One
>> motivation for creating the mapped-ram was to support direct-io of the
>> migration stream in qemu, in which case it could be tied to
>> VIR_DOMAIN_SAVE_BYPASS_CACHE. E.g. the mapped-ram capability is enabled
>> when user specifies VIR_DOMAIN_SAVE_BYPASS_CACHE && user-provided path
>> results in a seekable fd && qemu supports mapped-ram?
>
> One option is to be lazy and have a /etc/libvirt/qemu.conf setting for
> the save format version, defaulting to latest v3. Release note that
> admin/host provisioning apps must set it to v2 if back compat is needed
> with old libvirt. If we assume new -> old save image loading is
> relatively rare, that's probably good enough.
>
> IOW, we can
>
>  * Bump save version to 3
>  * Use v3 by default

Using mapped-ram by default but not supporting compression would be a
regression, right? E.g. 'virsh save vm-name /some/path' would suddenly
fail if user's /etc/libvirt/qemu.conf contained 'save_image_format = "lzop"'.

Regards,
Jim

>  * Add a SAVE_PARALLEL flag which implies mapped-ram, reject if v2
>  * Use mapped RAM with BYPASS_CACHE for v3, old approach for v2
>  * Steal another unused field to indicate use of mapped-ram, or perhaps
>    future proof it by declaring a 'features' field. So we don't need to
>    bump version again, just make
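The 'features' idea floated in the list above can be sketched as follows. The layout, field names, and feature bit are hypothetical, for illustration only, not libvirt's actual on-disk format:

```python
import struct

# Hypothetical v3 header: 16-byte magic, a version word, then a stolen
# 'features' word that a loader must fully understand to proceed.
SAVE_MAGIC = b"LibvirtQemudSave"
SAVE_VERSION = 3
FEATURE_MAPPED_RAM = 1 << 0     # illustrative feature bit

def pack_header(version, features):
    return struct.pack("<16sII", SAVE_MAGIC, version, features)

def features_supported(header, known):
    magic, version, features = struct.unpack("<16sII", header)
    # reject if any feature bit is set that this loader doesn't know about
    return magic == SAVE_MAGIC and (features & ~known) == 0

hdr = pack_header(SAVE_VERSION, FEATURE_MAPPED_RAM)
print(features_supported(hdr, FEATURE_MAPPED_RAM))  # True
print(features_supported(hdr, 0))                   # False
```

With this scheme a later feature only sets a new bit: loaders that know the bit accept the image, older v3 loaders refuse it, and no further version bump is needed.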
Re: Revisiting parallel save/restore
On 4/26/24 4:07 AM, Daniel P. Berrangé wrote:
> On Thu, Apr 25, 2024 at 04:41:02PM -0600, Jim Fehlig via Devel wrote:
>> On 4/17/24 5:12 PM, Jim Fehlig wrote:
>>> Hi All,
>>>
>>> While Fabiano has been working on improving save/restore performance in
>>> qemu, I've been tinkering with the same in libvirt. The end goal is to
>>> introduce a new VIR_DOMAIN_SAVE_PARALLEL flag for save/restore, along
>>> with a VIR_DOMAIN_SAVE_PARAM_PARALLEL_CONNECTIONS parameter to specify
>>> the number of concurrent channels used for the save/restore. Recall
>>> Claudio previously posted a patch series implementing parallel
>>> save/restore completely in libvirt, using qemu's multifd functionality
>>> [1].
>>>
>>> A good starting point on this journey is supporting the new mapped-ram
>>> capability in qemu 9.0 [2]. Since mapped-ram is a new on-disk format, I
>>> assume we'll need a new QEMU_SAVE_VERSION 3 when using it? Otherwise I'm
>>> not sure how to detect if a saved image is in mapped-ram format vs the
>>> existing, sequential stream format.
>>
>> While hacking on a POC, I discovered the save data cookie and assume the
>> use of mapped-ram could be recorded there?
>
> The issue with that is the semantics around old libvirt loading the new
> image. Old libvirt won't know to look for 'mapped-ram'
> element/attribute in the XML cookie, so will think it is a traditional
> image with hilariously predictable results :-)

Haha :-). I need to recall that we aim to support new-to-old migration
upstream. We limit our downstream support scope, and this type of
migration scenario is one that falls in the unsupported bucket.

Regards,
Jim
Re: Revisiting parallel save/restore
On 4/26/24 4:04 AM, Daniel P. Berrangé wrote:
> On Wed, Apr 17, 2024 at 05:12:27PM -0600, Jim Fehlig via Devel wrote:
>> A good starting point on this journey is supporting the new mapped-ram
>> capability in qemu 9.0 [2]. Since mapped-ram is a new on-disk format, I
>> assume we'll need a new QEMU_SAVE_VERSION 3 when using it? Otherwise I'm
>> not sure how to detect if a saved image is in mapped-ram format vs the
>> existing, sequential stream format.
>
> Yes, we'll need to be supporting 'mapped-ram', so a good first step.
>
> A question is whether we make that feature mandatory for all save images,
> or implied by another feature (parallel save), or a directly controllable
> feature with opt-in.

It feels more like an implementation detail.

> The former breaks back compat with existing libvirt, while the latter 2
> options are net new so don't have compat implications.
>
> In terms of actual data blocks written on disk, mapped-ram should be the
> same size, or smaller, than the existing format.
>
> In terms of logical file size, however, mapped-ram will almost always be
> larger.

Correct. E.g. from a mostly idle 8G VM

  # stat existing-format.sav
    Size: 510046983   Blocks: 996192   IO Block: 4096   regular file
  # stat mapped-ram-format.sav
    Size: 8597730739  Blocks: 956200   IO Block: 4096   regular file

The upside is mapped-ram is bounded, unlike the existing stream, which can
result in actual file sizes much greater than RAM size when the VM runs a
memory-intensive workload.

> This is because mapped-ram will result in a file whose logical size
> matches the guest RAM size, plus some header overhead, while being sparse
> so not all blocks are written.
>
> If tools handling save images aren't sparse-aware this could come across
> as a surprise and even be considered a regression.

Yes, I already had visions of the phone ringing off the hook asking "why
are my save images suddenly huge?". But maybe it's tolerable once they
realize the actual blocks used, and when combined with parallel they could
also be asking "why are saves suddenly so fast?" :-).

> Mapped ram is needed for parallel saves since it lets each thread write
> to a specific region of the file.
>
> Mapped ram is good for non-parallel saves too though, because the mapping
> of RAM into the file is aligned suitably to allow for O_DIRECT to be
> used. Currently libvirt has to tunnel over its iohelper to futz alignment
> needed for O_DIRECT. This makes it desirable to use in general, but back
> compat hurts...

My POC avoids the use of iohelper with mapped-ram. It provides qemu with
two fds when direct-io has been requested, one opened with O_DIRECT, one
without.

> Looking at what we did in the past:
>
> First time, we stole an element from 'uint32_t unused[..]' in the save
> header, to add the 'compressed' field, and bumped the version. This
> prevented old libvirt reading the files. This was needed as adding
> compression was a non-backwards compatible change. We could have carried
> on using version 1 for non-compressed fields, but we didn't for some
> reason. It was a hard compat break.
>
> Next time, we stole an element from 'uint32 unused[..]' in the save
> header, to add the 'cookie_len' field, but did NOT bump the version.
> 'unused' is always all zeroes, so new libvirt could detect whether the
> cookie was present by the len being non-zero. Old libvirt would still
> load the image, but would be ignoring the cookie data. This was largely
> harmless.
>
> This time mapped-ram is a non-compatible change, so we need to ensure old
> libvirt won't try to read the files, which suggests either a save version
> bump, or we could abuse the 'compressed' field to indicate 'mapped-ram'
> as a form of compression. If we did a save version bump, we might want to
> carry on using v2 for non mapped ram.
>
>> IIUC, mapped-ram cannot be used with the existing 'fd:' migration URI
>> and instead must use 'file:'. Does qemu advertise support for that? I
>> couldn't find it. If not, 'file:' (available in qemu 8.2) predates
>> mapped-ram, so in theory we could live without the advertisement.
>
> 'mapped-ram' is reported in QMP as a MigrationCapability, so I think we
> can probe for it directly.

Yes, mapped-ram is reported. Sorry for not being clear, but I was asking
if qemu advertised support for the 'file:' migration URI it gained in 8.2?
Probably not a problem either way since it predates mapped-ram.

> Yes, it is exclusively for use with 'file:' protocol. If we want to use
> FD passing, then we can still do that with 'file:', by using QEMU's
> generic /dev/fdset/NNN approach we have with block devices.
>
>> It's also not clear when we want to enable the mapped-ram capability.
>> Should it always be enabled if supported by the underlying qemu? One
>> motivation for creating the mapped-ram was to support direct-io of the
>> migration stream in qemu, in which case it could be tied to
>> VIR_DOMAIN_SAVE_BYPASS_CACHE. E.g. the mapped-ram capability is enabled
>> when user specifies VIR_DOMAIN_SAVE_BYPASS_CACHE && user-provided path
>> results in a seekable fd && qemu
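At the QMP level, the capability probing and fd-passing described above look roughly like this. This is a sketch; exact argument shapes may differ between QEMU versions, and the fd itself travels over the monitor socket via SCM_RIGHTS alongside the add-fd command:

```
# Probe: 'mapped-ram' appears in the capability list on supporting QEMUs
{ "execute": "query-migrate-capabilities" }

# Enable it before starting the migration to a file
{ "execute": "migrate-set-capabilities",
  "arguments": { "capabilities": [
      { "capability": "mapped-ram", "state": true } ] } }

# FD passing: add the pre-opened fd to an fdset
{ "execute": "add-fd", "arguments": { "fdset-id": 1 } }

# Then point the file: protocol at the fdset instead of a path
{ "execute": "migrate",
  "arguments": { "uri": "file:/dev/fdset/1,offset=0" } }
```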
Re: Revisiting parallel save/restore
On Fri, Apr 26, 2024 at 11:44:38AM -0300, Fabiano Rosas wrote:
> Daniel P. Berrangé writes:
>
>> On Fri, Apr 26, 2024 at 10:03:29AM -0300, Fabiano Rosas wrote:
>>> Daniel P. Berrangé writes:
>>>
>>>> On Wed, Apr 17, 2024 at 05:12:27PM -0600, Jim Fehlig via Devel wrote:
>>>>> A good starting point on this journey is supporting the new mapped-ram
>>>>> capability in qemu 9.0 [2]. Since mapped-ram is a new on-disk format, I
>>>>> assume we'll need a new QEMU_SAVE_VERSION 3 when using it? Otherwise
>>>>> I'm not sure how to detect if a saved image is in mapped-ram format vs
>>>>> the existing, sequential stream format.
>>>>
>>>> Yes, we'll need to be supporting 'mapped-ram', so a good first step.
>>>>
>>>> A question is whether we make that feature mandatory for all save
>>>> images, or implied by another feature (parallel save), or a directly
>>>> controllable feature with opt-in.
>>>>
>>>> The former breaks back compat with existing libvirt, while the latter 2
>>>> options are net new so don't have compat implications.
>>>>
>>>> In terms of actual data blocks written on disk, mapped-ram should be
>>>> the same size, or smaller, than the existing format.
>>>>
>>>> In terms of logical file size, however, mapped-ram will almost always
>>>> be larger.
>>>>
>>>> This is because mapped-ram will result in a file whose logical size
>>>> matches the guest RAM size, plus some header overhead, while being
>>>> sparse so not all blocks are written.
>>>>
>>>> If tools handling save images aren't sparse-aware this could come
>>>> across as a surprise and even be considered a regression.
>>>>
>>>> Mapped ram is needed for parallel saves since it lets each thread write
>>>> to a specific region of the file.
>>>>
>>>> Mapped ram is good for non-parallel saves too though, because the
>>>> mapping of RAM into the file is aligned suitably to allow for O_DIRECT
>>>> to be used. Currently libvirt has to tunnel over its iohelper to futz
>>>> alignment needed for O_DIRECT. This makes it desirable to use in
>>>> general, but back compat hurts...
>>>
>>> Note that QEMU doesn't support O_DIRECT without multifd.
>>>
>>> From mapped-ram patch series v4:
>>>
>>>   - Dropped support for direct-io with fixed-ram _without_ multifd. This
>>>     is something I said I would do for this version, but I had to drop
>>>     it because performance is really bad. I think the single-threaded
>>>     precopy code cannot cope with the extra latency/synchronicity of
>>>     O_DIRECT.
>>
>> Note the reason for using O_DIRECT is *not* to make saving / restoring
>> the guest VM faster. Rather it is to ensure that saving/restoring a VM
>> does not trash the host I/O / buffer cache, which will negatively impact
>> performance of all the *other* concurrently running VMs.
>
> Well, there's surely a performance degradation threshold that negates
> the benefits of preserving the caches. But maybe it's not as low as I
> initially thought then.

I guess you could say that O_DIRECT makes saving/restoring have a
predictable speed, because it will no longer randomly vary depending
on how much free RAM happens to be available at a given time. Time
will be dominated largely by the underlying storage I/O performance.

With regards,
Daniel

--
|: https://berrange.com      -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org       -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
Re: Revisiting parallel save/restore
Daniel P. Berrangé writes:

> On Fri, Apr 26, 2024 at 10:03:29AM -0300, Fabiano Rosas wrote:
>> Daniel P. Berrangé writes:
>>
>>> On Wed, Apr 17, 2024 at 05:12:27PM -0600, Jim Fehlig via Devel wrote:
>>>> A good starting point on this journey is supporting the new mapped-ram
>>>> capability in qemu 9.0 [2]. Since mapped-ram is a new on-disk format, I
>>>> assume we'll need a new QEMU_SAVE_VERSION 3 when using it? Otherwise
>>>> I'm not sure how to detect if a saved image is in mapped-ram format vs
>>>> the existing, sequential stream format.
>>>
>>> Yes, we'll need to be supporting 'mapped-ram', so a good first step.
>>>
>>> A question is whether we make that feature mandatory for all save
>>> images, or implied by another feature (parallel save), or a directly
>>> controllable feature with opt-in.
>>>
>>> The former breaks back compat with existing libvirt, while the latter 2
>>> options are net new so don't have compat implications.
>>>
>>> In terms of actual data blocks written on disk, mapped-ram should be the
>>> same size, or smaller, than the existing format.
>>>
>>> In terms of logical file size, however, mapped-ram will almost always be
>>> larger.
>>>
>>> This is because mapped-ram will result in a file whose logical size
>>> matches the guest RAM size, plus some header overhead, while being
>>> sparse so not all blocks are written.
>>>
>>> If tools handling save images aren't sparse-aware this could come across
>>> as a surprise and even be considered a regression.
>>>
>>> Mapped ram is needed for parallel saves since it lets each thread write
>>> to a specific region of the file.
>>>
>>> Mapped ram is good for non-parallel saves too though, because the
>>> mapping of RAM into the file is aligned suitably to allow for O_DIRECT
>>> to be used. Currently libvirt has to tunnel over its iohelper to futz
>>> alignment needed for O_DIRECT. This makes it desirable to use in
>>> general, but back compat hurts...
>>
>> Note that QEMU doesn't support O_DIRECT without multifd.
>>
>> From mapped-ram patch series v4:
>>
>>   - Dropped support for direct-io with fixed-ram _without_ multifd. This
>>     is something I said I would do for this version, but I had to drop
>>     it because performance is really bad. I think the single-threaded
>>     precopy code cannot cope with the extra latency/synchronicity of
>>     O_DIRECT.
>
> Note the reason for using O_DIRECT is *not* to make saving / restoring
> the guest VM faster. Rather it is to ensure that saving/restoring a VM
> does not trash the host I/O / buffer cache, which will negatively impact
> performance of all the *other* concurrently running VMs.

Well, there's surely a performance degradation threshold that negates
the benefits of preserving the caches. But maybe it's not as low as I
initially thought then.

The direct-io enablement is now posted to the qemu mailing list; please
take a look when you get the chance. I'll revisit the direct-io
no-parallel approach in the meantime, so let's keep that option open for
now.

> With regards,
> Daniel
Re: Revisiting parallel save/restore
On Fri, Apr 26, 2024 at 10:03:29AM -0300, Fabiano Rosas wrote:
> Daniel P. Berrangé writes:
>
>> On Wed, Apr 17, 2024 at 05:12:27PM -0600, Jim Fehlig via Devel wrote:
>>> A good starting point on this journey is supporting the new mapped-ram
>>> capability in qemu 9.0 [2]. Since mapped-ram is a new on-disk format, I
>>> assume we'll need a new QEMU_SAVE_VERSION 3 when using it? Otherwise I'm
>>> not sure how to detect if a saved image is in mapped-ram format vs the
>>> existing, sequential stream format.
>>
>> Yes, we'll need to be supporting 'mapped-ram', so a good first step.
>>
>> A question is whether we make that feature mandatory for all save
>> images, or implied by another feature (parallel save), or a directly
>> controllable feature with opt-in.
>>
>> The former breaks back compat with existing libvirt, while the latter 2
>> options are net new so don't have compat implications.
>>
>> In terms of actual data blocks written on disk, mapped-ram should be the
>> same size, or smaller, than the existing format.
>>
>> In terms of logical file size, however, mapped-ram will almost always be
>> larger.
>>
>> This is because mapped-ram will result in a file whose logical size
>> matches the guest RAM size, plus some header overhead, while being
>> sparse so not all blocks are written.
>>
>> If tools handling save images aren't sparse-aware this could come across
>> as a surprise and even be considered a regression.
>>
>> Mapped ram is needed for parallel saves since it lets each thread write
>> to a specific region of the file.
>>
>> Mapped ram is good for non-parallel saves too though, because the
>> mapping of RAM into the file is aligned suitably to allow for O_DIRECT
>> to be used. Currently libvirt has to tunnel over its iohelper to futz
>> alignment needed for O_DIRECT. This makes it desirable to use in
>> general, but back compat hurts...
>
> Note that QEMU doesn't support O_DIRECT without multifd.
>
> From mapped-ram patch series v4:
>
>   - Dropped support for direct-io with fixed-ram _without_ multifd. This
>     is something I said I would do for this version, but I had to drop
>     it because performance is really bad. I think the single-threaded
>     precopy code cannot cope with the extra latency/synchronicity of
>     O_DIRECT.

Note the reason for using O_DIRECT is *not* to make saving / restoring
the guest VM faster. Rather it is to ensure that saving/restoring a VM
does not trash the host I/O / buffer cache, which will negatively impact
performance of all the *other* concurrently running VMs.

With regards,
Daniel
Re: Revisiting parallel save/restore
Daniel P. Berrangé writes:

> On Wed, Apr 17, 2024 at 05:12:27PM -0600, Jim Fehlig via Devel wrote:
>> A good starting point on this journey is supporting the new mapped-ram
>> capability in qemu 9.0 [2]. Since mapped-ram is a new on-disk format, I
>> assume we'll need a new QEMU_SAVE_VERSION 3 when using it? Otherwise I'm
>> not sure how to detect if a saved image is in mapped-ram format vs the
>> existing, sequential stream format.
>
> Yes, we'll need to be supporting 'mapped-ram', so a good first step.
>
> A question is whether we make that feature mandatory for all save images,
> or implied by another feature (parallel save), or a directly controllable
> feature with opt-in.
>
> The former breaks back compat with existing libvirt, while the latter 2
> options are net new so don't have compat implications.
>
> In terms of actual data blocks written on disk, mapped-ram should be the
> same size, or smaller, than the existing format.
>
> In terms of logical file size, however, mapped-ram will almost always be
> larger.
>
> This is because mapped-ram will result in a file whose logical size
> matches the guest RAM size, plus some header overhead, while being sparse
> so not all blocks are written.
>
> If tools handling save images aren't sparse-aware this could come across
> as a surprise and even be considered a regression.
>
> Mapped ram is needed for parallel saves since it lets each thread write
> to a specific region of the file.
>
> Mapped ram is good for non-parallel saves too though, because the mapping
> of RAM into the file is aligned suitably to allow for O_DIRECT to be
> used. Currently libvirt has to tunnel over its iohelper to futz alignment
> needed for O_DIRECT. This makes it desirable to use in general, but back
> compat hurts...

Note that QEMU doesn't support O_DIRECT without multifd.

From mapped-ram patch series v4:

  - Dropped support for direct-io with fixed-ram _without_ multifd. This
    is something I said I would do for this version, but I had to drop
    it because performance is really bad. I think the single-threaded
    precopy code cannot cope with the extra latency/synchronicity of
    O_DIRECT.
Re: Revisiting parallel save/restore
On Thu, Apr 25, 2024 at 04:41:02PM -0600, Jim Fehlig via Devel wrote:
> On 4/17/24 5:12 PM, Jim Fehlig wrote:
>> Hi All,
>>
>> While Fabiano has been working on improving save/restore performance in
>> qemu, I've been tinkering with the same in libvirt. The end goal is to
>> introduce a new VIR_DOMAIN_SAVE_PARALLEL flag for save/restore, along
>> with a VIR_DOMAIN_SAVE_PARAM_PARALLEL_CONNECTIONS parameter to specify
>> the number of concurrent channels used for the save/restore. Recall
>> Claudio previously posted a patch series implementing parallel
>> save/restore completely in libvirt, using qemu's multifd functionality
>> [1].
>>
>> A good starting point on this journey is supporting the new mapped-ram
>> capability in qemu 9.0 [2]. Since mapped-ram is a new on-disk format, I
>> assume we'll need a new QEMU_SAVE_VERSION 3 when using it? Otherwise I'm
>> not sure how to detect if a saved image is in mapped-ram format vs the
>> existing, sequential stream format.
>
> While hacking on a POC, I discovered the save data cookie and assume the
> use of mapped-ram could be recorded there?

The issue with that is the semantics around old libvirt loading the new
image. Old libvirt won't know to look for 'mapped-ram' element/attribute
in the XML cookie, so will think it is a traditional image with
hilariously predictable results :-)

Hence I think mapped-ram needs some addition/change to the save image
header that explicitly stops old libvirt trying to consume it, thus the
version bump you mention.

With regards,
Daniel
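A sketch of why the header version bump works where the XML cookie does not: a loader rejects any image version newer than it understands, whereas unknown cookie data is silently ignored. Constants here are illustrative ("old libvirt" meaning any loader that only knows up to v2):

```python
import struct

MAGIC = b"LibvirtQemudSave"   # 16-byte magic for this sketch
OLD_MAX_VERSION = 2           # what the hypothetical old loader understands

def old_loader_accepts(header):
    magic, version = struct.unpack("<16sI", header)
    # a version bump is a hard stop for the old loader
    return magic == MAGIC and version <= OLD_MAX_VERSION

print(old_loader_accepts(struct.pack("<16sI", MAGIC, 2)))  # True
print(old_loader_accepts(struct.pack("<16sI", MAGIC, 3)))  # False
```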
Re: Revisiting parallel save/restore
On Wed, Apr 17, 2024 at 05:12:27PM -0600, Jim Fehlig via Devel wrote: > A good starting point on this journey is supporting the new mapped-ram > capability in qemu 9.0 [2]. Since mapped-ram is a new on-disk format, I > assume we'll need a new QEMU_SAVE_VERSION 3 when using it? Otherwise I'm not > sure how to detect if a saved image is in mapped-ram format vs the existing, > sequential stream format. Yes, we'll need to be supporting 'mapped-ram', so a good first step. A question is whether we make that feature mandatory for all save images, or implied by another feature (parallel save), or an directly controllable feature with opt-in. The former breaks back compat with existnig libvirt, while the latter 2 options are net new so don't have compat implications. In terms of actual data blocks written on disk mapped-ram should be be the same size, or smaller, than the existing format. In terms of logical file size, however, mapped-ram will almost always be larger. This is because mapped-ram will result in a file whose logical size matches the guest RAM size, plus some header overhead, while being sparse so not all blocks are written. If tools handling save images aren't sparse-aware this could come across as a surprise and even be considered a regression. Mapped ram is needed for parallel saves since it lets each thread write to a specific region of the file. Mapped ram is good for non-parallel saves too though, because the mapping of RAM into the file is aligned suitably to allow for O_DIRECT to be used. Currently libvirt has to tunnnel over its iohelper to futz alignment needed for O_DIRECT. This makes it desirable to use in general, but back compat hurts... Looking at what we did in the past First time, we stole a element from 'uint32_t unused[..]' in the save header, to add the 'compressed' field, and bumped the version. This prevented old libvirt reading the files. This was needed as adding compression was a non-backwards compatible change. 
We could have carried on using version 1 for non-compressd fields, but we didn't for some reason. It was a hard compat break. Next time, we stole a element from 'uint32 unused[..]' in the save header, to add the 'cookie_len' field, but did NOT bump the version. 'unused' is always all zeroes, so new libvirt could detect whether the cookie was present by the len being non-zero. Old libvirt would still load the image, but would be ignoring the cookie data. This was largely harmless. This time mapped-ram is a non-compatible change, so we need to ensure old libvirt won't try to read the files, which suggests either a save version bump, or we could abuse the 'compressed' field to indicate 'mapped-ram' as a form of compression. If we did a save version bump, we might want to carrry on using v2 for non mapped ram. > IIUC, mapped-ram cannot be used with the exiting 'fd:' migration URI and > instead must use 'file:'. Does qemu advertise support for that? I couldn't > find it. If not, 'file:' (available in qemu 8.2) predates mapped-ram, so in > theory we could live without the advertisement. 'mapped-ram' is reported in QMP as a MigrationCapability, so I think we can probe for it directly. Yes, it is exclusively for use with 'file:' protocol. If we want to use FD passing, then we can still do that with 'file:', by using QEMU's generic /dev/fdset/NNN approach we have with block devices. > > It's also not clear when we want to enable the mapped-ram capability. Should > it always be enabled if supported by the underlying qemu? One motivation for > creating the mapped-ram was to support direct-io of the migration stream in > qemu, in which case it could be tied to VIR_DOMAIN_SAVE_BYPASS_CACHE. E.g. > the mapped-ram capability is enabled when user specifies > VIR_DOMAIN_SAVE_BYPASS_CACHE && user-provided path results in a seekable fd > && qemu supports mapped-ram? One option is to be lazy and have a /etc/libvirt/qemu.conf for the save format version, defaulting to latest v3. 
Release note that admin/host provisioning apps must set it to v2 if back compat is needed with old libvirt. If we assume new -> old save image loading is relatively rare, that's probably good enough. IOW, we can:

 * Bump save version to 3
 * Use v3 by default
 * Add a SAVE_PARALLEL flag which implies mapped-ram; reject it if v2
 * Use mapped RAM with BYPASS_CACHE for v3, the old approach for v2
 * Steal another unused field to indicate use of mapped-ram, or perhaps
   future-proof it by declaring a 'features' field. So we don't need to bump
   the version again, just make sure that the libvirt loading an image
   supports all set features.

> Looking ahead, should the mapped-ram capability be required for supporting
> the VIR_DOMAIN_SAVE_PARALLEL flag? As I understand, parallel save/restore
> was another motivation for creating the mapped-ram feature. It allows
> multifd threads to write exclusively to the offsets provided by mapped-ram.
> Can multiple multifd threads concurrently write to an fd without mapped-ram?
Re: Revisiting parallel save/restore
On 4/17/24 5:12 PM, Jim Fehlig wrote:
> Hi All,
>
> While Fabiano has been working on improving save/restore performance in
> qemu, I've been tinkering with the same in libvirt. The end goal is to
> introduce a new VIR_DOMAIN_SAVE_PARALLEL flag for save/restore, along with
> a VIR_DOMAIN_SAVE_PARAM_PARALLEL_CONNECTIONS parameter to specify the
> number of concurrent channels used for the save/restore. Recall Claudio
> previously posted a patch series implementing parallel save/restore
> completely in libvirt, using qemu's multifd functionality [1].
>
> A good starting point on this journey is supporting the new mapped-ram
> capability in qemu 9.0 [2]. Since mapped-ram is a new on-disk format, I
> assume we'll need a new QEMU_SAVE_VERSION 3 when using it? Otherwise I'm
> not sure how to detect if a saved image is in mapped-ram format vs the
> existing, sequential stream format. While hacking on a POC, I discovered
> the save data cookie and assume the use of mapped-ram could be recorded
> there?
>
> IIUC, mapped-ram cannot be used with the existing 'fd:' migration URI and
> instead must use 'file:'. Does qemu advertise support for that? I couldn't
> find it. If not, 'file:' (available in qemu 8.2) predates mapped-ram, so
> in theory we could live without the advertisement.
>
> It's also not clear when we want to enable the mapped-ram capability.
> Should it always be enabled if supported by the underlying qemu? One
> motivation for creating the mapped-ram was to support direct-io of the
> migration stream in qemu, in which case it could be tied to
> VIR_DOMAIN_SAVE_BYPASS_CACHE. E.g. the mapped-ram capability is enabled
> when user specifies VIR_DOMAIN_SAVE_BYPASS_CACHE && user-provided path
> results in a seekable fd && qemu supports mapped-ram?
>
> Comments/suggestions on these topics are much appreciated :-).
>
> Looking ahead, should the mapped-ram capability be required for supporting
> the VIR_DOMAIN_SAVE_PARALLEL flag?

I think the answer is yes, otherwise we'd need something in libvirt like Claudio's original series to manage multifd channels writing to fixed offsets in the save file.

Regards,
Jim

> As I understand, parallel save/restore was another motivation for creating
> the mapped-ram feature. It allows multifd threads to write exclusively to
> the offsets provided by mapped-ram. Can multiple multifd threads
> concurrently write to an fd without mapped-ram?
>
> Regards,
> Jim
>
> [1] https://lists.libvirt.org/archives/list/devel@lists.libvirt.org/thread/3Y5GMS6A4QS4IXWDKFFV3A2FO5YMCFES/
> [2] https://gitlab.com/qemu-project/qemu/-/blob/master/docs/devel/migration/mapped-ram.rst?ref_type=heads

___
Devel mailing list -- devel@lists.libvirt.org
To unsubscribe send an email to devel-le...@lists.libvirt.org