Re: Live migration fails with "Mismatched RAM page size ram-node0 (local) 2097152 != 1526773257204281392"
* Damir Chanyshev (conflict...@gmail.com) wrote:
> Hello,
> Qemu version 5.1 host os Debian 10.7
> Two exactly the same machines ( except ram size 380G and 1.5T )
> Live migration fails (from host with 380G ram to 1.5T) with errors like this:
> Feb 02 16:26:13 QEMU[12090]: kvm: load of migration failed: Invalid argument
> Feb 02 16:26:13 QEMU[12090]: kvm: error while loading state for
> instance 0x0 of device 'ram'
> Feb 02 16:26:13 QEMU[12090]: kvm: Mismatched RAM page size ram-node0
> (local) 2097152 != 1526773257204281392
>
> I think it's some overflow issue.

That's a fun error; I've not seen anyone manage to trigger that before.

Could you please post the qemu command line from both the source and the destination?

My guess here is that the use of huge pages is different on the source and destination; when the destination is using huge pages it will read the page size of the block from the stream and compare it to the page size it's using - they should match (if postcopy is enabled).

To me it looks like the destination is using 2MB huge pages (probably explicitly from something like /dev/hugepages) and maybe the source isn't; the source (because it's not using hugepages) didn't bother sending the page size, so the destination then reads some junk off the stream. That junk is probably the name of the next RAMBlock, and it's probably a PCI device: that huge number is hex 15303030303A3030, i.e. a length byte of 0x15 (a name 21 bytes long) followed by the ASCII characters "0000:00", which looks like the start of a PCI address; maybe for video RAM.

Or in a simple answer: if you've got postcopy enabled, and you're using hugepages, make sure you use them consistently on source and destination.

Dave

> --
> Thanks,
> Damir Chanyshev
>
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
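The decoding Dave describes can be checked with a few lines of Python. This is only a sketch: it assumes the stream stores a block name as one length byte followed by the name, and that the page size is read as a big-endian 64-bit integer. The block name "0000:00:02.0/vga.vram" is a hypothetical example chosen because it is 21 bytes long and starts with "0000:00"; nothing in the thread confirms it is the name that was actually on the wire.

```python
import struct
from typing import Tuple

# The bogus "page size" reported in the error message.
BOGUS_PAGE_SIZE = 1526773257204281392


def decode_as_name_prefix(value: int) -> Tuple[int, str]:
    """Split a 64-bit big-endian value into a length byte and 7 ASCII chars."""
    raw = value.to_bytes(8, "big")
    return raw[0], raw[1:].decode("ascii")


def misread_as_u64(name: str) -> int:
    """Read the first 8 bytes of a length-prefixed name as a big-endian u64.

    Assumption: the name is sent as one length byte followed by the name
    itself, and the name is at least 7 bytes long.
    """
    encoded = name.encode("ascii")
    record = bytes([len(encoded)]) + encoded
    return struct.unpack(">Q", record[:8])[0]


length, prefix = decode_as_name_prefix(BOGUS_PAGE_SIZE)
print(hex(BOGUS_PAGE_SIZE), length, prefix)  # 0x15303030303a3030 21 0000:00

# Hypothetical 21-byte block name, used only to illustrate the misread.
example_name = "0000:00:02.0/vga.vram"
print(misread_as_u64(example_name) == BOGUS_PAGE_SIZE)  # True
```

Any 21-byte name beginning with "0000:00" would produce the same number, so this supports the "destination is reading the next block's name as a page size" theory without identifying the exact device.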
Re: Live migration fails with "Mismatched RAM page size ram-node0 (local) 2097152 != 1526773257204281392"
On Tue, 2 Feb 2021 at 14:21, Damir Chanyshev wrote:
>
> On Tue, 2 Feb 2021 at 17:01, Peter Maydell wrote:
> >
> > Are you using exactly the same VM config on both source and
> > destination ? The migration error messages are often rather
> > opaque, but generally they mean "the source and the destination
> > don't match". In this case I think the message means that the
> > hugepage size on source and destination hosts is different.
> >
> > thanks
> > -- PMM
>
> Yes, VM config exactly the same on both sides. On hugepage part, guest
> backed by 2M pages.

Thanks for confirming. Looking again at the message, the value the destination thinks it got for the hugepages size (1526773257204281392) is 15303030303A3030 in hex, which is really suspicious as a value (it's got lots of ascii-range values in it). Something has probably got confused about where in the migration-data-stream it is...

thanks
-- PMM
Re: Live migration fails with "Mismatched RAM page size ram-node0 (local) 2097152 != 1526773257204281392"
On 2021-02-02 15:21, Damir Chanyshev wrote:
> On Tue, 2 Feb 2021 at 17:01, Peter Maydell wrote:
> > Are you using exactly the same VM config on both source and
> > destination ? The migration error messages are often rather
> > opaque, but generally they mean "the source and the destination
> > don't match". In this case I think the message means that the
> > hugepage size on source and destination hosts is different.
> >
> > thanks
> > -- PMM
>
> Yes, VM config exactly the same on both sides. On hugepage part, guest
> backed by 2M pages.

Is the large number an ASCII string mistaken for a binary number?

Enjoy

Jakob
--
Jakob Bohm, CIO, Partner, WiseMo A/S. https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark. Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded
Re: Live migration fails with "Mismatched RAM page size ram-node0 (local) 2097152 != 1526773257204281392"
On Tue, 2 Feb 2021 at 17:01, Peter Maydell wrote:
>
> Are you using exactly the same VM config on both source and
> destination ? The migration error messages are often rather
> opaque, but generally they mean "the source and the destination
> don't match". In this case I think the message means that the
> hugepage size on source and destination hosts is different.
>
> thanks
> -- PMM

Yes, VM config exactly the same on both sides. On hugepage part, guest backed by 2M pages.

--
Thanks,
Damir Chanyshev
Re: Live migration fails with "Mismatched RAM page size ram-node0 (local) 2097152 != 1526773257204281392"
On Tue, 2 Feb 2021 at 13:53, Damir Chanyshev wrote:
>
> Hello,
> Qemu version 5.1 host os Debian 10.7
> Two exactly the same machines ( except ram size 380G and 1.5T )
> Live migration fails (from host with 380G ram to 1.5T) with errors like this:
> Feb 02 16:26:13 QEMU[12090]: kvm: load of migration failed: Invalid argument
> Feb 02 16:26:13 QEMU[12090]: kvm: error while loading state for
> instance 0x0 of device 'ram'
> Feb 02 16:26:13 QEMU[12090]: kvm: Mismatched RAM page size ram-node0
> (local) 2097152 != 1526773257204281392

Are you using exactly the same VM config on both source and destination ? The migration error messages are often rather opaque, but generally they mean "the source and the destination don't match". In this case I think the message means that the hugepage size on source and destination hosts is different.

thanks
-- PMM
Live migration fails with "Mismatched RAM page size ram-node0 (local) 2097152 != 1526773257204281392"
Hello,
Qemu version 5.1 host os Debian 10.7
Two exactly the same machines ( except ram size 380G and 1.5T )
Live migration fails (from host with 380G ram to 1.5T) with errors like this:
Feb 02 16:26:13 QEMU[12090]: kvm: load of migration failed: Invalid argument
Feb 02 16:26:13 QEMU[12090]: kvm: error while loading state for instance 0x0 of device 'ram'
Feb 02 16:26:13 QEMU[12090]: kvm: Mismatched RAM page size ram-node0 (local) 2097152 != 1526773257204281392

I think it's some overflow issue.

--
Thanks,
Damir Chanyshev