On 03.03.26 at 4:57 PM, Fiona Ebner wrote:
> If cpu->env.has_error_code is true, backwards migration of a VM from
> a QEMU binary with commit 27535e9cca to a QEMU binary without commit
> 27535e9cca will fail:
> 
>> kvm: error while loading state for instance 0x0 of device 'cpu'
> 
> This happens even if error_code == 0. Fix it by only sending the
> error code if it is actually set.
> 
> Cc: [email protected]
> Fixes: 27535e9cca ("target/i386: Add support for save/load of exception error code")
> Signed-off-by: Fiona Ebner <[email protected]>
> ---
>  target/i386/machine.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/target/i386/machine.c b/target/i386/machine.c
> index c913961281..b47b14d819 100644
> --- a/target/i386/machine.c
> +++ b/target/i386/machine.c
> @@ -466,7 +466,7 @@ static bool cpu_errcode_needed(void *opaque)
>  {
>      X86CPU *cpu = opaque;
>  
> -    return cpu->env.has_error_code != 0;
> +    return cpu->env.has_error_code != 0 && cpu->env.error_code != 0;
>  }
>  
>  static const VMStateDescription vmstate_error_code = {

In practice, this is not enough, and we might need proper machine
versioning here.

For a guest with '-smp 4,sockets=1,cores=4,maxcpus=4' running Ubuntu
24.10, I still run into issues: error codes such as 6 and 20 are
sometimes set, which again leads to the following error:
kvm: error while loading state for instance 0x0 of device 'cpu'
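
To illustrate why the one-line change does not help once the error code
is nonzero, here is a simplified model of the two predicates. It is only
a sketch, not QEMU code: ToyCPUState stands in for the relevant
CPUX86State fields, and migrate_error_code is a hypothetical compat flag
of the kind a machine-versioned property could provide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the relevant CPU state fields (assumption). */
typedef struct {
    bool has_error_code;
    uint32_t error_code;
    bool migrate_error_code; /* hypothetical machine-versioned compat flag */
} ToyCPUState;

/* Mirrors the condition from the patch: send only a nonzero error code. */
static bool errcode_needed_patched(const ToyCPUState *env)
{
    return env->has_error_code && env->error_code != 0;
}

/* Hypothetical alternative: gate the subsection on the compat flag. */
static bool errcode_needed_versioned(const ToyCPUState *env)
{
    return env->migrate_error_code && env->has_error_code;
}

static void demo(void)
{
    ToyCPUState s = {
        .has_error_code = true,
        .error_code = 0,
        .migrate_error_code = false,
    };

    /* error_code == 0: the patched predicate skips the subsection. */
    assert(!errcode_needed_patched(&s));

    /* error_code == 6: the subsection is sent again, old binaries fail. */
    s.error_code = 6;
    assert(errcode_needed_patched(&s));

    /* With the flag off (older machine type), it is never sent. */
    assert(!errcode_needed_versioned(&s));

    /* With the flag on (newer machine type), it is sent as before. */
    s.migrate_error_code = true;
    assert(errcode_needed_versioned(&s));
}
```

With such a flag, the subsection would only be sent for machine types
that are known to understand it, independent of the error code value.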

The original commit mentions guest kernel panics:

> commit 27535e9ccae89db5856bfb5e3357f44645812143
> Author: Xin Wang <[email protected]>
> Date:   Tue Aug 19 22:58:34 2025 +0800
> 
>     target/i386: Add support for save/load of exception error code
>     
>     For now, qemu save/load CPU exception info(such as exception_nr and
>     has_error_code), while the exception error_code is ignored. This will
>     cause the dest hypervisor reinject a vCPU exception with error_code(0),
>     potentially causing a guest kernel panic.
>     
>     For instance, if src VM stopped with an user-mode write #PF (error_code 6),
>     the dest hypervisor will reinject an #PF with error_code(0) when vCPU resume,
>     then guest kernel panic as:
>       BUG: unable to handle page fault for address: 00007f80319cb010
>       #PF: supervisor read access in user mode
>       #PF: error_code(0x0000) - not-present page
>       RIP: 0033:0x40115d
>     
>     To fix it, support save/load exception error_code.
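
For reference, the #PF error codes in the commit message can be decoded
from the low three bits of the architectural page-fault error code
(bit 0: present, bit 1: write, bit 2: user). The sketch below is
illustrative only; the macro, type and function names are made up here
and are not QEMU's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Low bits of the x86 page-fault error code (names are illustrative). */
#define PF_PRESENT (1u << 0) /* 0: page not present, 1: protection fault */
#define PF_WRITE   (1u << 1) /* 0: read access, 1: write access */
#define PF_USER    (1u << 2) /* 0: supervisor mode, 1: user mode */

typedef struct {
    bool present;
    bool write;
    bool user;
} PFInfo;

/* Decode only the three low bits; higher bits are ignored here. */
static PFInfo pf_decode(uint32_t error_code)
{
    PFInfo info = {
        .present = (error_code & PF_PRESENT) != 0,
        .write = (error_code & PF_WRITE) != 0,
        .user = (error_code & PF_USER) != 0,
    };
    return info;
}
```

With this, 6 decodes to a user-mode write to a not-present page, while 0
decodes to a supervisor-mode read of a not-present page, which matches
the "supervisor read access" and "not-present page" lines in the quoted
panic.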

Is there some other factor needed for the guest kernel panic to trigger?
Migrating my guest between 10.1.2 and 10.1.2 seems to work without any
noticeable issue within the guest, even though I can see that error_code
is often set at the time of the migration.

Best Regards,
Fiona

