On 21 March 2018 at 08:00, Shannon Zhao <zhaoshengl...@huawei.com> wrote:
> On 2018/3/20 19:54, Peter Maydell wrote:
>> Can you still successfully migrate a VM from a QEMU version
>> without this bugfix to one with the bugfix ?
>>
> I've tested this case. I can migrate a VM between these two versions.

Hmm. Looking at the code I can't see how that would work,
except by accident. Let me see if I understand what's happening
here:

In the code in master, we have QEMU data structures
(bitmaps, etc) which have one entry for each of GICV3_MAXIRQ
irqs. That includes the RAZ/WI unused space for the SGIs/PPIs, so
for a 1-bit-per-irq bitmap:
 [0x00000000, irq 32, irq 33, .... ]

When we fill in the values from KVM into these data structures,
we start after the unused space, because the for_each_dist_irq_reg()
macro starts with _irq = GIC_INTERNAL. But we forgot to adjust
the offset value we use for the KVM access, so we start by
reading the RAZ/WI values from KVM, and the data structure
contents end up with:
 [0x00000000, 0x00000000, irq 32, irq 33, ... ]
(and the last irqs wouldn't get transferred).

With this change to the code we will get the offset right and
the data structure will be filled as
 [0x00000000, irq 32, irq 33, .... ]
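
To make the two layouts concrete, here is a toy model of the fill loop.
It is only a sketch: kvm_regs, fill_from_kvm, MAXIRQ and the "buggy"
flag are made up for illustration and are not the QEMU code; the only
thing taken from the real code is that the loop starts at GIC_INTERNAL.

```c
#include <assert.h>
#include <stdint.h>

#define GIC_INTERNAL 32
#define MAXIRQ 128              /* toy value, not QEMU's GICV3_MAXIRQ */
#define NWORDS (MAXIRQ / 32)

/* Toy stand-in for the kernel's 1-bit-per-irq register bank.
 * Word 0 covers irqs 0..31 and reads as zero (RAZ/WI). */
static const uint32_t kvm_regs[NWORDS] = {
    0x00000000, 0x11111111, 0x22222222, 0x33333333
};

/* Copy the kernel state into a zero-initialised QEMU-side bitmap.
 * With buggy != 0 the loop still starts at irq 32, but the kernel
 * offset is not adjusted, so it reads from register word 0. */
static void fill_from_kvm(uint32_t *bmp, int buggy)
{
    int irq;

    for (irq = GIC_INTERNAL; irq < MAXIRQ; irq += 32) {
        int dst = irq / 32;
        int src = buggy ? dst - 1 : dst;

        assert(src >= 0 && src < NWORDS);
        bmp[dst] = kvm_regs[src];
    }
}
```

With buggy set, the bitmap ends up as { 0, 0, 0x11111111, 0x22222222 }
and the last kernel word is never copied; with the offset fixed it is
{ 0, 0x11111111, 0x22222222, 0x33333333 }.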

But for migration from the old version, the data structure
we receive from the migration source will contain the old
broken layout of
 [0x00000000, 0x00000000, irq 32, irq 33, ... ]
so if the new code doesn't do anything special to handle
migration from that old version then it will write zeroes to
irq 32..63, and then write incorrect values for all the irqs
after that, won't it?

That suggests to me that we need to have some code in the
migration post-load routine that identifies that the data
is coming from an old version with this bug, and shifts
all the data down in the arrays so that the code to write
it to the kernel can handle it.
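
Roughly, the shift for a 1-bit-per-irq bitmap could look like the
sketch below. shift_bitmap_down is a hypothetical helper, not an
existing QEMU function, and how the post-load code would detect an
old source is left open here. Fields with more bits per irq would
need a correspondingly larger shift.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: convert a 1-bit-per-irq bitmap from the old
 * layout [0, 0, irq 32.., irq 64.., ...] to the fixed layout
 * [0, irq 32.., irq 64.., ...] by moving every word from index 2
 * onwards down by one. */
static void shift_bitmap_down(uint32_t *bmp, size_t nwords)
{
    assert(nwords >= 2);

    /* Regions overlap, so memmove rather than memcpy. */
    memmove(&bmp[1], &bmp[2], (nwords - 2) * sizeof(bmp[0]));

    /* The old source never transferred the last irqs, so there is
     * no data for the final word; clear it. */
    bmp[nwords - 1] = 0;
}
```

A post-load hook would call this only when it has decided the incoming
data uses the old layout, before the state is written to the kernel.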

thanks
-- PMM
