Hi Peter, On 2/6/26 3:31 PM, Peter Maydell wrote: > On Mon, 26 Jan 2026 at 16:55, Eric Auger <[email protected]> wrote: >> Currently when the number of KVM registers exposed by the source is >> larger than the one exposed on the destination, the migration fails >> with: "failed to load cpu:cpreg_vmstate_array_len" >> >> This gives no information about which registers are causing the trouble. >> >> This patch reworks the target/arm/machine code so that it becomes >> able to handle an input stream with a larger set of registers than >> the destination and print useful information about which registers >> are causing the trouble. The migration outcome is unchanged: >> - unexpected registers still will fail the migration >> - missing ones are printed but will not fail the migration, as done today. > Improving the diagnostics here is a great idea. > >> The input stream can contain MAX_CPREG_VMSTATE_ANOMALIES(10) extra >> registers compared to what exists on the target. >> >> If there are more registers we will still hit the previous >> "load cpu:cpreg_vmstate_array_len" error. >> >> At most, MAX_CPREG_VMSTATE_ANOMALIES missing registers >> and MAX_CPREG_VMSTATE_ANOMALIES unexpected registers are printed. >> >> Example: >> >> qemu-system-aarch64: kvm_arm_cpu_post_load Missing register in input stream: >> 0 0x6030000000160003 fw feat reg 3 >> qemu-system-aarch64: kvm_arm_cpu_post_load Unexpected register in input >> stream: 0 0x603000000013c103 op0:3 op1:0 crn:2 crm:0 op2:3 >> qemu-system-aarch64: kvm_arm_cpu_post_load Unexpected register in input >> stream: 1 0x603000000013c512 op0:3 op1:0 crn:10 crm:2 op2:2 >> qemu-system-aarch64: kvm_arm_cpu_post_load Unexpected register in input >> stream: 2 0x603000000013c513 op0:3 op1:0 crn:10 crm:2 op2:3 >> qemu-system-aarch64: error while loading state for instance 0x0 of device >> 'cpu' >> qemu-system-aarch64: load of migration failed: Operation not permitted >> >> With TCG there is no user friendly formatting of the faulting >> register indexes as with KVM. However the 2 added trace points >> help to identify the culprit indexes. > Could we move kvm_print_register_name() out of kvm.c and into > somewhere that the TCG code can use it? (I did think when I > was reviewing the patch that added that that we might want it > for TCG too eventually.) > >> @@ -990,7 +991,13 @@ static int cpu_pre_load(void *opaque) >> { >> ARMCPU *cpu = opaque; >> CPUARMState *env = &cpu->env; >> + int arraylen = cpu->cpreg_vmstate_array_len + >> MAX_CPREG_VMSTATE_ANOMALIES; >> >> + cpu->cpreg_vmstate_indexes = g_renew(uint64_t, >> cpu->cpreg_vmstate_indexes, >> + arraylen); >> + cpu->cpreg_vmstate_values = g_renew(uint64_t, cpu->cpreg_vmstate_values, >> + arraylen); >> + cpu->cpreg_vmstate_array_len = arraylen; > It seems a bit odd to extend these on cpu_pre_load, especially > since it means we'll do so on every cpu_pre_load call, which I > think can happen if you try an inbound migration, it fails, and > then you retry it. yes in that case g_renew would be called several times with cpu->cpreg_vmstate_array_len increased each time I guess. However with the current way cpreg_vmstate_indexes/values arrays were statically allocated I did not have other solution than using pre_load() I think.
> > I think it ought to be possible to both avoid this reallocation > and the problem noted in the commit message where more than 10 > extra registers results in an unhelpful message, if we can > convert the vmstate fields from VMSTATE_VARRAY_INT32 to > VMSTATE_VARRAY_INT32_ALLOC. (That latter doesn't exist yet but > will be the INT32 equivalent of VMSTATE_VARRAY_UINT32_ALLOC.) VMSTATE_VARRAY_INT32_ALLOC sounds a great idea indeed. I was not aware of its existence. I will try this out. > > If I have read the code correctly, these should work by > having the inbound migration code allocate the buffer for the > array data instead of expecting it to be pre-allocated -- that > means our post_load function can look at all the data it got > without imposing a length limitation. > > I think (but we should check :-)) that the data in the migration > stream is the same in both cases, so this will not be a compat break. OK I will test this. Thank you for the review! Eric > > (Some existing code will need adjustment to avoid a memory leak, > e.g. g_free any existing array in pre_load.) > > thanks > -- PMM >
