Re: Live migration fails with "Mismatched RAM page size ram-node0 (local) 2097152 != 1526773257204281392"

2021-02-02 Thread Dr. David Alan Gilbert
* Damir Chanyshev (conflict...@gmail.com) wrote:
> Hello,
> QEMU version 5.1, host OS Debian 10.7.
> Two identical machines (except for RAM size: 380G and 1.5T).
> Live migration fails (from the host with 380G RAM to the one with 1.5T) with errors like this:
> Feb 02 16:26:13 QEMU[12090]: kvm: load of migration failed: Invalid argument
> Feb 02 16:26:13 QEMU[12090]: kvm: error while loading state for instance 0x0 of device 'ram'
> Feb 02 16:26:13 QEMU[12090]: kvm: Mismatched RAM page size ram-node0 (local) 2097152 != 1526773257204281392
> 
> I think it's some overflow issue.

That's a fun error; I've not seen anyone manage to trigger that before.

Could you please post the qemu command line from both the source and the
destination?

My guess here is that the use of huge pages differs between the source
and destination. When the destination is using huge pages (and postcopy
is enabled), it reads the page size of the block from the stream and
compares it to the page size it's using; they should match.

To me it looks like the destination is using 2MB huge pages
(probably explicitly from something like /dev/hugepages)
and maybe the source isn't; the source (because it's not using
hugepages) didn't bother sending the page size, so the destination
then reads some junk off the stream; that junk is probably the name
of the next RAMBlock, and it's probably a PCI device, so that
huge number is hex 15303030303A3030: the first byte, 0x15, is a
name length of 21 bytes, and the remaining seven bytes are the ASCII:
  0000:00

which looks like the start of a PCI address; maybe for video RAM.
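
The decoding above can be checked with a few lines of Python. This is only a
sanity check of the arithmetic; reading the leading byte as a name-length
prefix is the guess made above, not something the code proves.

```python
# Decode the bogus "page size" from the error message as 8 big-endian bytes.
bogus_page_size = 1526773257204281392

raw = bogus_page_size.to_bytes(8, byteorder="big")
hex_str = raw.hex().upper()            # "15303030303A3030"

# Assumption: the first byte is a length prefix and the rest is ASCII text.
name_len = raw[0]                      # 0x15 == 21
name_start = raw[1:].decode("ascii")   # "0000:00"

print(hex_str, name_len, name_start)
```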

Or, as a simple answer: if you've got postcopy enabled, and you're
using hugepages, make sure you use them consistently on source
and destination.

Dave

> -- 
> Thanks,
> Damir Chanyshev
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




Re: Live migration fails with "Mismatched RAM page size ram-node0 (local) 2097152 != 1526773257204281392"

2021-02-02 Thread Peter Maydell
On Tue, 2 Feb 2021 at 14:21, Damir Chanyshev wrote:
>
> On Tue, 2 Feb 2021 at 17:01, Peter Maydell wrote:
> >
> >
> > Are you using exactly the same VM config on both source and
> > destination? The migration error messages are often rather
> > opaque, but generally they mean "the source and the destination
> > don't match". In this case I think the message means that the
> > hugepage size on source and destination hosts is different.
> >
> > thanks
> > -- PMM
>
> Yes, the VM config is exactly the same on both sides. As for hugepages,
> the guest is backed by 2M pages.

Thanks for confirming. Looking again at the message, the
value the destination thinks it got for the hugepages size
(1526773257204281392) is 15303030303A3030 in hex, which is
really suspicious as a value (it's got lots of ASCII-range
values in it). Something has probably got confused about
where in the migration-data-stream it is...
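
One way a reader could lose its place, as a toy model only: suppose each RAM
block record is a one-byte name length, the name, a big-endian 64-bit size,
and optionally a 64-bit page size. The record layout, the helper names
(`write_block`, `read_block`), the sizes, and the second block name
`0000:00:02.0/vga.vram` are illustrative assumptions, not QEMU's actual
stream format or code. If the writer omits the page size but the reader
expects one, the reader consumes the start of the next record instead.

```python
import io
import struct

def write_block(buf, name, size, page_size=None):
    """Append one toy block record; page_size is omitted when None."""
    encoded = name.encode("ascii")
    buf.write(bytes([len(encoded)]))      # one-byte name length
    buf.write(encoded)                    # block name
    buf.write(struct.pack(">Q", size))    # big-endian 64-bit size
    if page_size is not None:
        buf.write(struct.pack(">Q", page_size))

def read_block(buf, expect_page_size):
    """Read one toy block record; returns (name, size, page_size or None)."""
    name_len = buf.read(1)[0]
    name = buf.read(name_len).decode("ascii")
    (size,) = struct.unpack(">Q", buf.read(8))
    page_size = None
    if expect_page_size:
        (page_size,) = struct.unpack(">Q", buf.read(8))
    return name, size, page_size

# Writer: does not send a page size for either block.
stream = io.BytesIO()
write_block(stream, "ram-node0", 4 << 30)
write_block(stream, "0000:00:02.0/vga.vram", 16 << 20)  # hypothetical 21-byte name
stream.seek(0)

# Reader: wrongly expects a page size, so it swallows the next record's header.
name, size, page_size = read_block(stream, expect_page_size=True)
print(name, page_size)
```

With these made-up inputs the reader reports a page size of
1526773257204281392 for ram-node0, the same value as in the error message.
That is consistent with a misaligned read but does not confirm it.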

thanks
-- PMM



Re: Live migration fails with "Mismatched RAM page size ram-node0 (local) 2097152 != 1526773257204281392"

2021-02-02 Thread Jakob Bohm

On 2021-02-02 15:21, Damir Chanyshev wrote:
> On Tue, 2 Feb 2021 at 17:01, Peter Maydell wrote:
> >
> > Are you using exactly the same VM config on both source and
> > destination? The migration error messages are often rather
> > opaque, but generally they mean "the source and the destination
> > don't match". In this case I think the message means that the
> > hugepage size on source and destination hosts is different.
> >
> > thanks
> > -- PMM
>
> Yes, the VM config is exactly the same on both sides. As for hugepages,
> the guest is backed by 2M pages.

Is the large number an ASCII string mistaken for a binary number?

Enjoy

Jakob
--
Jakob Bohm, CIO, Partner, WiseMo A/S.  https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark.  Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded




Re: Live migration fails with "Mismatched RAM page size ram-node0 (local) 2097152 != 1526773257204281392"

2021-02-02 Thread Damir Chanyshev
On Tue, 2 Feb 2021 at 17:01, Peter Maydell wrote:
>
>
> Are you using exactly the same VM config on both source and
> destination? The migration error messages are often rather
> opaque, but generally they mean "the source and the destination
> don't match". In this case I think the message means that the
> hugepage size on source and destination hosts is different.
>
> thanks
> -- PMM

Yes, the VM config is exactly the same on both sides. As for hugepages,
the guest is backed by 2M pages.
-- 
Thanks,
Damir Chanyshev



Re: Live migration fails with "Mismatched RAM page size ram-node0 (local) 2097152 != 1526773257204281392"

2021-02-02 Thread Peter Maydell
On Tue, 2 Feb 2021 at 13:53, Damir Chanyshev wrote:
>
> Hello,
> QEMU version 5.1, host OS Debian 10.7.
> Two identical machines (except for RAM size: 380G and 1.5T).
> Live migration fails (from the host with 380G RAM to the one with 1.5T) with errors like this:
> Feb 02 16:26:13 QEMU[12090]: kvm: load of migration failed: Invalid argument
> Feb 02 16:26:13 QEMU[12090]: kvm: error while loading state for instance 0x0 of device 'ram'
> Feb 02 16:26:13 QEMU[12090]: kvm: Mismatched RAM page size ram-node0 (local) 2097152 != 1526773257204281392

Are you using exactly the same VM config on both source and
destination? The migration error messages are often rather
opaque, but generally they mean "the source and the destination
don't match". In this case I think the message means that the
hugepage size on source and destination hosts is different.

thanks
-- PMM



Live migration fails with "Mismatched RAM page size ram-node0 (local) 2097152 != 1526773257204281392"

2021-02-02 Thread Damir Chanyshev
Hello,
QEMU version 5.1, host OS Debian 10.7.
Two identical machines (except for RAM size: 380G and 1.5T).
Live migration fails (from the host with 380G RAM to the one with 1.5T) with errors like this:
Feb 02 16:26:13 QEMU[12090]: kvm: load of migration failed: Invalid argument
Feb 02 16:26:13 QEMU[12090]: kvm: error while loading state for instance 0x0 of device 'ram'
Feb 02 16:26:13 QEMU[12090]: kvm: Mismatched RAM page size ram-node0 (local) 2097152 != 1526773257204281392

I think it's some overflow issue.
-- 
Thanks,
Damir Chanyshev