Hi Peter,

On 2/6/26 3:31 PM, Peter Maydell wrote:
> On Mon, 26 Jan 2026 at 16:55, Eric Auger <[email protected]> wrote:
>> Currently when the number of KVM registers exposed by the source is
>> larger than the one exposed on the destination, the migration fails
>> with: "failed to load cpu:cpreg_vmstate_array_len"
>>
>> This gives no information about which registers are causing the trouble.
>>
>> This patch reworks the target/arm/machine code so that it becomes
>> able to handle an input stream with a larger set of registers than
>> the destination and print useful information about which registers
>> are causing the trouble. The migration outcome is unchanged:
>> - unexpected registers still will fail the migration
>> - missing ones are printed but will not fail the migration, as done today.
> Improving the diagnostics here is a great idea.
>
>> The input stream can contain MAX_CPREG_VMSTATE_ANOMALIES(10) extra
>> registers compared to what exists on the target.
>>
>> If there are more registers we will still hit the previous
>> "load cpu:cpreg_vmstate_array_len" error.
>>
>> At most, MAX_CPREG_VMSTATE_ANOMALIES missing registers
>> and MAX_CPREG_VMSTATE_ANOMALIES unexpected registers are printed.
>>
>> Example:
>>
>> qemu-system-aarch64: kvm_arm_cpu_post_load Missing register in input stream: 
>> 0 0x6030000000160003 fw feat reg 3
>> qemu-system-aarch64: kvm_arm_cpu_post_load Unexpected register in input 
>> stream: 0 0x603000000013c103 op0:3 op1:0 crn:2 crm:0 op2:3
>> qemu-system-aarch64: kvm_arm_cpu_post_load Unexpected register in input 
>> stream: 1 0x603000000013c512 op0:3 op1:0 crn:10 crm:2 op2:2
>> qemu-system-aarch64: kvm_arm_cpu_post_load Unexpected register in input 
>> stream: 2 0x603000000013c513 op0:3 op1:0 crn:10 crm:2 op2:3
>> qemu-system-aarch64: error while loading state for instance 0x0 of device 
>> 'cpu'
>> qemu-system-aarch64: load of migration failed: Operation not permitted
>>
>> With TCG there is no user friendly formatting of the faulting
>> register indexes as with KVM. However the 2 added trace points
>> help to identify the culprit indexes.
> Could we move kvm_print_register_name() out of kvm.c and into
> somewhere that the TCG code can use it? (I did think when I
> was reviewing the patch that added that that we might want it
> for TCG too eventually.)
>
>> @@ -990,7 +991,13 @@ static int cpu_pre_load(void *opaque)
>>  {
>>      ARMCPU *cpu = opaque;
>>      CPUARMState *env = &cpu->env;
>> +    int arraylen = cpu->cpreg_vmstate_array_len + 
>> MAX_CPREG_VMSTATE_ANOMALIES;
>>
>> +    cpu->cpreg_vmstate_indexes = g_renew(uint64_t, 
>> cpu->cpreg_vmstate_indexes,
>> +                                         arraylen);
>> +    cpu->cpreg_vmstate_values = g_renew(uint64_t, cpu->cpreg_vmstate_values,
>> +                                        arraylen);
>> +    cpu->cpreg_vmstate_array_len = arraylen;
> It seems a bit odd to extend these on cpu_pre_load, especially
> since it means we'll do so on every cpu_pre_load call, which I
> think can happen if you try an inbound migration, it fails, and
> then you retry it.
yes in that case g_renew would be called several times with
cpu->cpreg_vmstate_array_len increased each time I guess. However with
the current way cpreg_vmstate_indexes/values arrays were statically
allocated I did not have other solution than using pre_load() I think.

>
> I think it ought to be possible to both avoid this reallocation
> and the problem noted in the commit message where more than 10
> extra registers results in an unhelpful message, if we can
> convert the vmstate fields from VMSTATE_VARRAY_INT32 to
> VMSTATE_VARRAY_INT32_ALLOC. (That latter doesn't exist yet but
> will be the INT32 equivalent of VMSTATE_VARRAY_UINT32_ALLOC.)

VMSTATE_VARRAY_INT32_ALLOC sounds a great idea indeed. I was not aware of its 
existence. I will try this out.

>
> If I have read the code correctly, these should work by
> having the inbound migration code allocate the buffer for the
> array data instead of expecting it to be pre-allocated -- that
> means our post_load function can look at all the data it got
> without imposing a length limitation.
>
> I think (but we should check :-)) that the data in the migration
> stream is the same in both cases, so this will not be a compat break.

OK I will test this.

Thank you for the review!

Eric
>
> (Some existing code will need adjustment to avoid a memory leak,
> e.g. g_free any existing array in pre_load.)
>
> thanks
> -- PMM
>


Reply via email to