Re: [PATCH V1 19/26] physmem: preserve ram blocks for cpr

2024-06-03 Thread Peter Xu
On Fri, May 31, 2024 at 03:32:11PM -0400, Steven Sistare wrote:
> On 5/30/2024 2:39 PM, Peter Xu wrote:
> > On Thu, May 30, 2024 at 01:12:40PM -0400, Steven Sistare wrote:
> > > On 5/29/2024 3:25 PM, Peter Xu wrote:
> > > > On Wed, May 29, 2024 at 01:31:53PM -0400, Steven Sistare wrote:
> > > > > On 5/28/2024 5:44 PM, Peter Xu wrote:
> > > > > > On Mon, Apr 29, 2024 at 08:55:28AM -0700, Steve Sistare wrote:
> > > > > > > Preserve fields of RAMBlocks that allocate their host memory during CPR so
> > > > > > > the RAM allocation can be recovered.
> > > > > >
> > > > > > This sentence itself did not explain much, IMHO.  QEMU can share memory
> > > > > > using fd based memory already of all kinds, as long as the memory backend
> > > > > > is path-based it can be shared by sharing the same paths to dst.
> > > > > >
> > > > > > This reads very confusing as a generic concept.  I mean, QEMU migration
> > > > > > relies on so many things to work right.  We mostly asks the users to "use
> > > > > > exactly the same cmdline for src/dst QEMU unless you know what you're
> > > > > > doing", otherwise many things can break.  That should also include ramblock
> > > > > > being matched between src/dst due to the same cmdlines provided on both
> > > > > > sides.  It'll be confusing to mention this when we thought the ramblocks
> > > > > > also rely on that fact.
> > > > > >
> > > > > > So IIUC this sentence should be dropped in the real patch, and I'll try to
> > > > > > guess the real reason with below..
> > > > >
> > > > > The properties of the implicitly created ramblocks must be preserved.
> > > > > The defaults can and do change between qemu releases, even when the command-line
> > > > > parameters do not change for the explicit objects that cause these implicit
> > > > > ramblocks to be created.
> > > >
> > > > AFAIU, QEMU relies on ramblocks to be the same before this series.  Do you
> > > > have an example?  Would that already cause issue when migrate?
> > > 
> > > Alignment has changed, and used_length vs max_length changed when
> > > resizeable ramblocks were introduced.  I have dealt with these issues
> > > while supporting cpr for our internal use, and the learned lesson is to
> > > explicitly communicate the creation-time parameters to new qemu.
> > 
> > Why used_length can change?  I'm looking at ram_mig_ram_block_resized():
> > 
> >     if (!migration_is_idle()) {
> >         /*
> >          * Precopy code on the source cannot deal with the size of RAM blocks
> >          * changing at random points in time - especially after sending the
> >          * RAM block sizes in the migration stream, they must no longer change.
> >          * Abort and indicate a proper reason.
> >          */
> >         error_setg(&err, "RAM block '%s' resized during precopy.", rb->idstr);
> >         migration_cancel(err);
> >         error_free(err);
> >     }
> > 
> > We sent used_length upfront of a migration during SETUP phase.  Looks like
> > what you're describing can be something different, though?
> 
> I was imprecise.  used_length did not change; it was introduced as being
> different than max_length when resizeable ramblocks were introduced.
> 
> The max_length is not sent.  It is an implicit property of the implementation,
> and can change.  It is the size of the memfd mapping, so we need to know it
> and preserve it.
> 
> used_length is indeed sent during SETUP.  We could also send max_length
> at that time, and store both in the struct ramblock, and *maybe* that would
> be safe, but that is more fragile and less future proof than setting both
> properties to the correct value when the ramblock struct is created.
> 
> And BTW, the ramblock properties are sent using ad-hoc code in setup.
> I send them using nice clean vmstate.

Right, I agree that's not pretty at all... I wish we had something
better, but that has just been there for years.

When you said max_length can change, could you give an example?  I want to
know whether it means we already have a bug, in which case the fix can even
be done before the rest.

Thinking about it now, maybe max_length is indeed fine to change across
migration?

Consider the fact that only used_length is used on both src/dst for
e.g. migration, dirty tracking, etc.  Basically we assumed that's
the "real size" of RAM irrespective of "how large it used to be before
migration" or "how large it can grow after migration completes", while
max_length is the "possible max value" here but isn't really important for
migration.

E.g., mem resize can allow a larger range after migration if the user
specifies max_length on dest to be larger than src max_length somehow, and
logically migration should still work indeed.  I just don't know whether
there'll be people using it like that.
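
To make the distinction concrete, here is a minimal sketch in plain C, not
QEMU code.  DemoBlock and both helper functions are hypothetical names made
up for illustration; the sketch assumes only what is said above, namely that
used_length is the size migration works with and max_length is merely an
upper bound that may differ between src and dst.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two RAMBlock size fields discussed above. */
typedef struct {
    uint64_t used_length; /* size currently used by the guest and by migration */
    uint64_t max_length;  /* upper bound the block may grow to */
} DemoBlock;

/* Resize within the reserved range; fails if new_size exceeds max_length. */
static bool demo_block_resize(DemoBlock *b, uint64_t new_size)
{
    if (new_size > b->max_length) {
        return false;
    }
    b->used_length = new_size;
    return true;
}

/*
 * The destination accepts the source's used_length as long as it fits in
 * the destination's own max_length, which need not match the source's.
 */
static bool demo_block_accept_incoming(DemoBlock *dst, uint64_t src_used_length)
{
    return demo_block_resize(dst, src_used_length);
}
```

With a dst whose max_length is 8192, demo_block_accept_incoming(&dst, 4096)
succeeds and sets used_length to 4096, while a later resize to 16384 is
rejected and leaves used_length untouched.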

> 
> > Regarding to rb->align:

Re: [PATCH V1 19/26] physmem: preserve ram blocks for cpr

2024-05-31 Thread Steven Sistare via

On 5/30/2024 2:39 PM, Peter Xu wrote:

On Thu, May 30, 2024 at 01:12:40PM -0400, Steven Sistare wrote:

On 5/29/2024 3:25 PM, Peter Xu wrote:

On Wed, May 29, 2024 at 01:31:53PM -0400, Steven Sistare wrote:

On 5/28/2024 5:44 PM, Peter Xu wrote:

On Mon, Apr 29, 2024 at 08:55:28AM -0700, Steve Sistare wrote:

Preserve fields of RAMBlocks that allocate their host memory during CPR so
the RAM allocation can be recovered.


This sentence itself did not explain much, IMHO.  QEMU can share memory
using fd based memory already of all kinds, as long as the memory backend
is path-based it can be shared by sharing the same paths to dst.

This reads very confusing as a generic concept.  I mean, QEMU migration
relies on so many things to work right.  We mostly asks the users to "use
exactly the same cmdline for src/dst QEMU unless you know what you're
doing", otherwise many things can break.  That should also include ramblock
being matched between src/dst due to the same cmdlines provided on both
sides.  It'll be confusing to mention this when we thought the ramblocks
also rely on that fact.

So IIUC this sentence should be dropped in the real patch, and I'll try to
guess the real reason with below..


The properties of the implicitly created ramblocks must be preserved.
The defaults can and do change between qemu releases, even when the command-line
parameters do not change for the explicit objects that cause these implicit
ramblocks to be created.


AFAIU, QEMU relies on ramblocks to be the same before this series.  Do you
have an example?  Would that already cause issue when migrate?


Alignment has changed, and used_length vs max_length changed when
resizeable ramblocks were introduced.  I have dealt with these issues
while supporting cpr for our internal use, and the learned lesson is to
explicitly communicate the creation-time parameters to new qemu.


Why used_length can change?  I'm looking at ram_mig_ram_block_resized():

    if (!migration_is_idle()) {
        /*
         * Precopy code on the source cannot deal with the size of RAM blocks
         * changing at random points in time - especially after sending the
         * RAM block sizes in the migration stream, they must no longer change.
         * Abort and indicate a proper reason.
         */
        error_setg(&err, "RAM block '%s' resized during precopy.", rb->idstr);
        migration_cancel(err);
        error_free(err);
    }

We sent used_length upfront of a migration during SETUP phase.  Looks like
what you're describing can be something different, though?


I was imprecise.  used_length did not change; it was introduced as being
different than max_length when resizeable ramblocks were introduced.

The max_length is not sent.  It is an implicit property of the implementation,
and can change.  It is the size of the memfd mapping, so we need to know it
and preserve it.

used_length is indeed sent during SETUP.  We could also send max_length
at that time, and store both in the struct ramblock, and *maybe* that would
be safe, but that is more fragile and less future proof than setting both
properties to the correct value when the ramblock struct is created.

And BTW, the ramblock properties are sent using ad-hoc code in setup.
I send them using nice clean vmstate.


Regarding to rb->align: isn't that mostly a constant, reflecting the MR's
alignment?  It's set when ramblock is created IIUC:

 rb->align = mr->align;

When will the alignment change?


The alignment specified by the mr to allocate a new block is an implicit property
of the implementation, and has changed before, from one qemu release to another.
Not often, but it did, and could again in the future.  Communicating the alignment
from old qemu to new qemu is future proof.
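
A minimal sketch of that idea in plain C, under the assumption that the old
process hands over a small record of creation-time parameters.  DemoParams
and demo_choose_params are hypothetical names, not part of QEMU or of this
patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of the creation-time parameters discussed above. */
typedef struct {
    uint64_t align;
    uint64_t used_length;
    uint64_t max_length;
} DemoParams;

/*
 * Pick the parameters for a new block: values preserved from the old
 * process take precedence; the new binary's defaults are used only when
 * nothing was preserved.
 */
static DemoParams demo_choose_params(const DemoParams *defaults,
                                     const DemoParams *preserved)
{
    return preserved ? *preserved : *defaults;
}
```

If the new binary defaults to a 4 KiB alignment but the old one created the
block with 2 MiB, the preserved record wins, so the existing mapping keeps
the geometry it was created with.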


These are not an issue for migration because the ramblock is re-created
and the data copied into the new memory.


Mirror the mr->align field in the RAMBlock to simplify the vmstate.
Preserve the old host address, even though it is immediately discarded,
as it will be needed in the future for CPR with iommufd.  Preserve
guest_memfd, even though CPR does not yet support it, to maintain vmstate
compatibility when it becomes supported.


.. It could be about the vfio vaddr update feature that you mentioned and
only for iommufd (as IIUC vfio still relies on iova ranges, then it won't
help here)?

If so, IMHO we should have this patch (or any variance form) to be there
for your upcoming vfio support.  Keeping this around like this will make
the series harder to review.  Or is it needed even before VFIO?


This patch is needed independently of vfio or iommufd.

guest_memfd is independent of vfio or iommufd.  It is a recent addition
which I have not tried to support, but I added this placeholder field
so it can be supported in the future without adding a new field later
and maintaining backwards compatibility.


Is guest_memfd the only user so far, then?  If so, would it be possible we
split it as a separa

Re: [PATCH V1 19/26] physmem: preserve ram blocks for cpr

2024-05-30 Thread Peter Xu
On Thu, May 30, 2024 at 01:12:40PM -0400, Steven Sistare wrote:
> On 5/29/2024 3:25 PM, Peter Xu wrote:
> > On Wed, May 29, 2024 at 01:31:53PM -0400, Steven Sistare wrote:
> > > On 5/28/2024 5:44 PM, Peter Xu wrote:
> > > > On Mon, Apr 29, 2024 at 08:55:28AM -0700, Steve Sistare wrote:
> > > > > Preserve fields of RAMBlocks that allocate their host memory during CPR so
> > > > > the RAM allocation can be recovered.
> > > >
> > > > This sentence itself did not explain much, IMHO.  QEMU can share memory
> > > > using fd based memory already of all kinds, as long as the memory backend
> > > > is path-based it can be shared by sharing the same paths to dst.
> > > >
> > > > This reads very confusing as a generic concept.  I mean, QEMU migration
> > > > relies on so many things to work right.  We mostly asks the users to "use
> > > > exactly the same cmdline for src/dst QEMU unless you know what you're
> > > > doing", otherwise many things can break.  That should also include ramblock
> > > > being matched between src/dst due to the same cmdlines provided on both
> > > > sides.  It'll be confusing to mention this when we thought the ramblocks
> > > > also rely on that fact.
> > > >
> > > > So IIUC this sentence should be dropped in the real patch, and I'll try to
> > > > guess the real reason with below..
> > >
> > > The properties of the implicitly created ramblocks must be preserved.
> > > The defaults can and do change between qemu releases, even when the command-line
> > > parameters do not change for the explicit objects that cause these implicit
> > > ramblocks to be created.
> > 
> > AFAIU, QEMU relies on ramblocks to be the same before this series.  Do you
> > have an example?  Would that already cause issue when migrate?
> 
> Alignment has changed, and used_length vs max_length changed when
> resizeable ramblocks were introduced.  I have dealt with these issues
> while supporting cpr for our internal use, and the learned lesson is to
> explicitly communicate the creation-time parameters to new qemu.

Why can used_length change?  I'm looking at ram_mig_ram_block_resized():

    if (!migration_is_idle()) {
        /*
         * Precopy code on the source cannot deal with the size of RAM blocks
         * changing at random points in time - especially after sending the
         * RAM block sizes in the migration stream, they must no longer change.
         * Abort and indicate a proper reason.
         */
        error_setg(&err, "RAM block '%s' resized during precopy.", rb->idstr);
        migration_cancel(err);
        error_free(err);
    }

We send used_length up front during the SETUP phase of a migration.  Looks
like what you're describing can be something different, though?

Regarding rb->align: isn't that mostly a constant, reflecting the MR's
alignment?  It's set when the ramblock is created IIUC:

    rb->align = mr->align;

When will the alignment change?

> 
> These are not an issue for migration because the ramblock is re-created
> and the data copied into the new memory.
> 
> > > > > Mirror the mr->align field in the RAMBlock to simplify the vmstate.
> > > > > Preserve the old host address, even though it is immediately discarded,
> > > > > as it will be needed in the future for CPR with iommufd.  Preserve
> > > > > guest_memfd, even though CPR does not yet support it, to maintain vmstate
> > > > > compatibility when it becomes supported.
> > > >
> > > > .. It could be about the vfio vaddr update feature that you mentioned and
> > > > only for iommufd (as IIUC vfio still relies on iova ranges, then it won't
> > > > help here)?
> > > >
> > > > If so, IMHO we should have this patch (or any variance form) to be there
> > > > for your upcoming vfio support.  Keeping this around like this will make
> > > > the series harder to review.  Or is it needed even before VFIO?
> > > 
> > > This patch is needed independently of vfio or iommufd.
> > > 
> > > guest_memfd is independent of vfio or iommufd.  It is a recent addition
> > > which I have not tried to support, but I added this placeholder field
> > > > so it can be supported in the future without adding a new field later
> > > and maintaining backwards compatibility.
> > 
> > Is guest_memfd the only user so far, then?  If so, would it be possible we
> > split it as a separate effort on top of the base cpr-exec support?
> 
> I don't understand the question.  I am indeed deferring support for guest_memfd
> to a future time.  For now, I am adding a blocker, and reserving a field for
> it in the preserved ramblock attributes, to avoid adding a subsection later.

I meant I'm wondering whether the new ramblock vmsd may not be required for
the initial implementation.

E.g., IIUC vaddr is required by iommufd, and so far that's not part of the
initial support.

Then I think a major thing is about the fds to be managed that will need to
be shared.  If we put 

Re: [PATCH V1 19/26] physmem: preserve ram blocks for cpr

2024-05-30 Thread Steven Sistare via

On 5/29/2024 3:25 PM, Peter Xu wrote:

On Wed, May 29, 2024 at 01:31:53PM -0400, Steven Sistare wrote:

On 5/28/2024 5:44 PM, Peter Xu wrote:

On Mon, Apr 29, 2024 at 08:55:28AM -0700, Steve Sistare wrote:

Preserve fields of RAMBlocks that allocate their host memory during CPR so
the RAM allocation can be recovered.


This sentence itself did not explain much, IMHO.  QEMU can share memory
using fd based memory already of all kinds, as long as the memory backend
is path-based it can be shared by sharing the same paths to dst.

This reads very confusing as a generic concept.  I mean, QEMU migration
relies on so many things to work right.  We mostly asks the users to "use
exactly the same cmdline for src/dst QEMU unless you know what you're
doing", otherwise many things can break.  That should also include ramblock
being matched between src/dst due to the same cmdlines provided on both
sides.  It'll be confusing to mention this when we thought the ramblocks
also rely on that fact.

So IIUC this sentence should be dropped in the real patch, and I'll try to
guess the real reason with below..


The properties of the implicitly created ramblocks must be preserved.
The defaults can and do change between qemu releases, even when the command-line
parameters do not change for the explicit objects that cause these implicit
ramblocks to be created.


AFAIU, QEMU relies on ramblocks to be the same before this series.  Do you
have an example?  Would that already cause issue when migrate?


Alignment has changed, and used_length vs max_length changed when
resizeable ramblocks were introduced.  I have dealt with these issues
while supporting cpr for our internal use, and the learned lesson is to
explicitly communicate the creation-time parameters to new qemu.

These are not an issue for migration because the ramblock is re-created
and the data copied into the new memory.


Mirror the mr->align field in the RAMBlock to simplify the vmstate.
Preserve the old host address, even though it is immediately discarded,
as it will be needed in the future for CPR with iommufd.  Preserve
guest_memfd, even though CPR does not yet support it, to maintain vmstate
compatibility when it becomes supported.


.. It could be about the vfio vaddr update feature that you mentioned and
only for iommufd (as IIUC vfio still relies on iova ranges, then it won't
help here)?

If so, IMHO we should have this patch (or any variance form) to be there
for your upcoming vfio support.  Keeping this around like this will make
the series harder to review.  Or is it needed even before VFIO?


This patch is needed independently of vfio or iommufd.

guest_memfd is independent of vfio or iommufd.  It is a recent addition
which I have not tried to support, but I added this placeholder field
so it can be supported in the future without adding a new field later
and maintaining backwards compatibility.


Is guest_memfd the only user so far, then?  If so, would it be possible we
split it as a separate effort on top of the base cpr-exec support?


I don't understand the question.  I am indeed deferring support for guest_memfd
to a future time.  For now, I am adding a blocker, and reserving a field for
it in the preserved ramblock attributes, to avoid adding a subsection later.

- Steve



Re: [PATCH V1 19/26] physmem: preserve ram blocks for cpr

2024-05-29 Thread Peter Xu
On Wed, May 29, 2024 at 01:31:53PM -0400, Steven Sistare wrote:
> On 5/28/2024 5:44 PM, Peter Xu wrote:
> > On Mon, Apr 29, 2024 at 08:55:28AM -0700, Steve Sistare wrote:
> > > Preserve fields of RAMBlocks that allocate their host memory during CPR so
> > > the RAM allocation can be recovered.
> > 
> > This sentence itself did not explain much, IMHO.  QEMU can share memory
> > using fd based memory already of all kinds, as long as the memory backend
> > is path-based it can be shared by sharing the same paths to dst.
> > 
> > This reads very confusing as a generic concept.  I mean, QEMU migration
> > relies on so many things to work right.  We mostly asks the users to "use
> > exactly the same cmdline for src/dst QEMU unless you know what you're
> > doing", otherwise many things can break.  That should also include ramblock
> > being matched between src/dst due to the same cmdlines provided on both
> > sides.  It'll be confusing to mention this when we thought the ramblocks
> > also rely on that fact.
> > 
> > So IIUC this sentence should be dropped in the real patch, and I'll try to
> > guess the real reason with below..
> 
> The properties of the implicitly created ramblocks must be preserved.
> The defaults can and do change between qemu releases, even when the command-line
> parameters do not change for the explicit objects that cause these implicit
> ramblocks to be created.

AFAIU, QEMU already relied on ramblocks being the same before this series.
Do you have an example?  Would that already cause issues when migrating?

> 
> > > Mirror the mr->align field in the RAMBlock to simplify the vmstate.
> > > Preserve the old host address, even though it is immediately discarded,
> > > as it will be needed in the future for CPR with iommufd.  Preserve
> > > guest_memfd, even though CPR does not yet support it, to maintain vmstate
> > > compatibility when it becomes supported.
> > 
> > .. It could be about the vfio vaddr update feature that you mentioned and
> > only for iommufd (as IIUC vfio still relies on iova ranges, then it won't
> > help here)?
> > 
> > If so, IMHO we should have this patch (or any variance form) to be there
> > for your upcoming vfio support.  Keeping this around like this will make
> > the series harder to review.  Or is it needed even before VFIO?
> 
> This patch is needed independently of vfio or iommufd.
> 
> guest_memfd is independent of vfio or iommufd.  It is a recent addition
> which I have not tried to support, but I added this placeholder field
> > so it can be supported in the future without adding a new field later
> and maintaining backwards compatibility.

Is guest_memfd the only user so far, then?  If so, would it be possible to
split it out as a separate effort on top of the base cpr-exec support?

-- 
Peter Xu




Re: [PATCH V1 19/26] physmem: preserve ram blocks for cpr

2024-05-29 Thread Steven Sistare via

On 5/28/2024 5:44 PM, Peter Xu wrote:

On Mon, Apr 29, 2024 at 08:55:28AM -0700, Steve Sistare wrote:

Preserve fields of RAMBlocks that allocate their host memory during CPR so
the RAM allocation can be recovered.


This sentence itself did not explain much, IMHO.  QEMU can share memory
using fd based memory already of all kinds, as long as the memory backend
is path-based it can be shared by sharing the same paths to dst.

This reads very confusing as a generic concept.  I mean, QEMU migration
relies on so many things to work right.  We mostly asks the users to "use
exactly the same cmdline for src/dst QEMU unless you know what you're
doing", otherwise many things can break.  That should also include ramblock
being matched between src/dst due to the same cmdlines provided on both
sides.  It'll be confusing to mention this when we thought the ramblocks
also rely on that fact.

So IIUC this sentence should be dropped in the real patch, and I'll try to
guess the real reason with below..


The properties of the implicitly created ramblocks must be preserved.
The defaults can and do change between qemu releases, even when the command-line
parameters do not change for the explicit objects that cause these implicit
ramblocks to be created.


Mirror the mr->align field in the RAMBlock to simplify the vmstate.
Preserve the old host address, even though it is immediately discarded,
as it will be needed in the future for CPR with iommufd.  Preserve
guest_memfd, even though CPR does not yet support it, to maintain vmstate
compatibility when it becomes supported.


.. It could be about the vfio vaddr update feature that you mentioned and
only for iommufd (as IIUC vfio still relies on iova ranges, then it won't
help here)?

If so, IMHO we should have this patch (or any variance form) to be there
for your upcoming vfio support.  Keeping this around like this will make
the series harder to review.  Or is it needed even before VFIO?


This patch is needed independently of vfio or iommufd.

guest_memfd is independent of vfio or iommufd.  It is a recent addition
which I have not tried to support, but I added this placeholder field
so it can be supported in the future without adding a new field later
and maintaining backwards compatibility.


Another thing to ask: does this idea also need to rely on some future
iommufd kernel support?  If there's anything that's not merged in current
Linux upstream, this series needs to be marked as RFC, so it's not target
for merging.  This will also be true if this patch is "preparing" for that
work.  It means if this patch only services iommufd purpose, even if it
doesn't require any kernel header to be referenced, we should only merge it
together with the full iommufd support comes later (and that'll be after
iommufd kernel supports land).


It does not rely on future kernel support.

- Steve



Re: [PATCH V1 19/26] physmem: preserve ram blocks for cpr

2024-05-28 Thread Peter Xu
On Mon, Apr 29, 2024 at 08:55:28AM -0700, Steve Sistare wrote:
> Preserve fields of RAMBlocks that allocate their host memory during CPR so
> the RAM allocation can be recovered.

This sentence by itself does not explain much, IMHO.  QEMU can already share
memory of all kinds using fd-based memory; as long as the memory backend is
path-based, it can be shared by passing the same paths to dst.

This reads as very confusing as a generic concept.  I mean, QEMU migration
relies on so many things to work right.  We mostly ask users to "use
exactly the same cmdline for src/dst QEMU unless you know what you're
doing", otherwise many things can break.  That should also include ramblocks
being matched between src/dst due to the same cmdlines provided on both
sides.  It'll be confusing to mention this when we thought the ramblocks
also rely on that fact.

So IIUC this sentence should be dropped in the real patch, and I'll try to
guess the real reason below..

> Mirror the mr->align field in the RAMBlock to simplify the vmstate.
> Preserve the old host address, even though it is immediately discarded,
> as it will be needed in the future for CPR with iommufd.  Preserve
> guest_memfd, even though CPR does not yet support it, to maintain vmstate
> compatibility when it becomes supported.

.. It could be about the vfio vaddr update feature that you mentioned and
only for iommufd (as IIUC vfio still relies on iova ranges, then it won't
help here)?

If so, IMHO we should have this patch (or any variant of it) be there
for your upcoming vfio support.  Keeping it around like this will make
the series harder to review.  Or is it needed even before VFIO?

Another thing to ask: does this idea also need to rely on some future
iommufd kernel support?  If there's anything that's not merged in current
Linux upstream, this series needs to be marked as RFC, so it's not targeted
for merging.  This will also be true if this patch is "preparing" for that
work.  It means if this patch only serves the iommufd purpose, even if it
doesn't require any kernel header to be referenced, we should only merge it
together with the full iommufd support that comes later (and that'll be after
the iommufd kernel support lands).

Thanks,

-- 
Peter Xu




[PATCH V1 19/26] physmem: preserve ram blocks for cpr

2024-04-29 Thread Steve Sistare
Preserve fields of RAMBlocks that allocate their host memory during CPR so
the RAM allocation can be recovered.  Mirror the mr->align field in the
RAMBlock to simplify the vmstate.  Preserve the old host address, even
though it is immediately discarded, as it will be needed in the future for
CPR with iommufd.  Preserve guest_memfd, even though CPR does not yet
support it, to maintain vmstate compatibility when it becomes supported.

Signed-off-by: Steve Sistare 
---
 include/exec/ramblock.h |  6 ++++++
 system/physmem.c        | 40 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+)

diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
index 61deefe..b492d89 100644
--- a/include/exec/ramblock.h
+++ b/include/exec/ramblock.h
@@ -44,6 +44,7 @@ struct RAMBlock {
     uint64_t fd_offset;
     int guest_memfd;
     size_t page_size;
+    uint64_t align;
     /* dirty bitmap used during migration */
     unsigned long *bmap;
 
@@ -91,5 +92,10 @@ struct RAMBlock {
      */
     ram_addr_t postcopy_length;
 };
+
+#define RAM_BLOCK "RAMBlock"
+
+extern const VMStateDescription vmstate_ram_block;
+
 #endif
 #endif
diff --git a/system/physmem.c b/system/physmem.c
index 36d97ec..3019284 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -1398,6 +1398,7 @@ static void *file_ram_alloc(RAMBlock *block,
         block->mr->align = MAX(block->mr->align, QEMU_VMALLOC_ALIGN);
     }
 #endif
+    block->align = block->mr->align;
 
     if (memory < block->page_size) {
         error_setg(errp, "memory size 0x" RAM_ADDR_FMT " must be equal to "
@@ -1848,6 +1849,7 @@ static void *ram_block_alloc_host(RAMBlock *rb, Error **errp)
                              rb->idstr);
         }
     }
+    rb->align = mr->align;
 
     if (host) {
         memory_try_enable_merging(host, rb->max_length);
@@ -1934,6 +1936,7 @@ static RAMBlock *ram_block_create(MemoryRegion *mr, ram_addr_t size,
     rb->flags = ram_flags;
     rb->page_size = qemu_real_host_page_size();
     rb->mr = mr;
+    rb->align = mr->align;
 
     if (ram_flags & RAM_GUEST_MEMFD) {
         rb->guest_memfd = ram_block_create_guest_memfd(rb, errp);
@@ -2060,6 +2063,26 @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
 }
 #endif
 
+const VMStateDescription vmstate_ram_block = {
+    .name = RAM_BLOCK,
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .precreate = true,
+    .factory = true,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT64(align, RAMBlock),
+        VMSTATE_VOID_PTR(host, RAMBlock),
+        VMSTATE_INT32(fd, RAMBlock),
+        VMSTATE_INT32(guest_memfd, RAMBlock),
+        VMSTATE_UINT32(flags, RAMBlock),
+        VMSTATE_UINT64(used_length, RAMBlock),
+        VMSTATE_UINT64(max_length, RAMBlock),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+vmstate_register_init_factory(vmstate_ram_block, RAMBlock);
+
 static
 RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
                                   void (*resized)(const char*,
@@ -2070,6 +2093,7 @@ RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
 {
     RAMBlock *new_block;
     int align;
+    g_autofree RAMBlock *preserved = NULL;
 
     assert((ram_flags & ~(RAM_SHARED | RAM_RESIZEABLE | RAM_PREALLOC |
                           RAM_NORESERVE | RAM_GUEST_MEMFD)) == 0);
@@ -2086,6 +2110,17 @@ RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
     }
     new_block->resized = resized;
 
+    preserved = vmstate_claim_factory_object(RAM_BLOCK, new_block->idstr, 0);
+    if (preserved) {
+        assert(mr->align <= preserved->align);
+        mr->align = mr->align ?: preserved->align;
+        new_block->align = preserved->align;
+        new_block->fd = preserved->fd;
+        new_block->flags = preserved->flags;
+        new_block->used_length = preserved->used_length;
+        new_block->max_length = preserved->max_length;
+    }
+
     if (!host) {
         host = ram_block_alloc_host(new_block, errp);
         if (!host) {
@@ -2093,6 +2128,10 @@ RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
             g_free(new_block);
             return NULL;
         }
+        if (!(ram_flags & RAM_GUEST_MEMFD)) {
+            vmstate_register_named(new_block->idstr, 0, &vmstate_ram_block,
+                                   new_block);
+        }
     }
 
     new_block->host = host;
@@ -2157,6 +2196,7 @@ void qemu_ram_free(RAMBlock *block)
     }
 
     qemu_mutex_lock_ramlist();
+    vmstate_unregister_named(RAM_BLOCK, block->idstr, 0);
     qemu_ram_unset_idstr(block);
     QLIST_REMOVE_RCU(block, next);
     ram_list.mru_block = NULL;
-- 
1.8.3.1