Hi Everyone,

We are facing issues booting Windows VMs with the SapphireRapids CPU definition. 
This happens when multiple cores per vcpu are set and the VM uses UEFI with 
Secure Boot and Credential Guard enabled. So far we have observed this issue on 
Windows 10 and 11.



We did some triaging around this. The SapphireRapids CPU definition raises 
cpuid_level to 0x20, which includes the V2 extended topology leaf (0x1f). QEMU 
returns all zeros for 0x1f when !x86_has_extended_topo() 
<https://github.com/qemu/qemu/blob/58ee924b97d1c0898555647a31820c5a20d55a73/target/i386/kvm/kvm.c#L1834>.
The expectation (also described in 
https://cdrdv2-public.intel.com/775917/intel-64-architecture-processor-topology-enumeration.pdf)
is that a guest seeing this falls back to leaf 0xb. Somehow Windows 10 and 
Windows 11 do not handle this case well and panic on boot.
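
For reference, the fallback we expect from a guest can be sketched roughly as 
below. This is only our reading of the Intel document, not Windows or QEMU 
code. All names (cpuid_fn, pick_topology_leaf, the fake_* helpers) are 
hypothetical, and the fake register values are illustrative only.

```c
#include <stdint.h>

/* Hypothetical container for one CPUID result. */
struct cpuid_regs {
    uint32_t eax, ebx, ecx, edx;
};

/* Hypothetical callback standing in for the CPUID instruction. */
typedef struct cpuid_regs (*cpuid_fn)(uint32_t leaf, uint32_t subleaf);

/*
 * Return the leaf to use for topology enumeration: 0x1f if it is
 * implemented, else 0xb, else 0 (neither is usable). A leaf is treated
 * as implemented when it is within the maximum basic leaf (CPUID.0:EAX)
 * and sub-leaf 0 reports EBX[15:0] != 0.
 */
uint32_t pick_topology_leaf(cpuid_fn cpuid)
{
    uint32_t max_leaf = cpuid(0, 0).eax;

    if (max_leaf >= 0x1f && (cpuid(0x1f, 0).ebx & 0xffff) != 0) {
        return 0x1f;
    }
    if (max_leaf >= 0xb && (cpuid(0xb, 0).ebx & 0xffff) != 0) {
        return 0xb;
    }
    return 0;
}

/* Illustrative leaf 0xb contents: one SMT level with 2 threads. */
struct cpuid_regs fake_leaf_0xb(uint32_t subleaf)
{
    struct cpuid_regs r = { 0, 0, subleaf & 0xff, 0 };

    if (subleaf == 0) {
        r.eax = 1;          /* APIC ID shift for this level */
        r.ebx = 2;          /* logical processors at this level */
        r.ecx |= 1u << 8;   /* level type 1 = thread */
    }
    return r;
}

/* Models the current behaviour: max leaf 0x20, leaf 0x1f all zeros. */
struct cpuid_regs fake_cpuid_zeroed_1f(uint32_t leaf, uint32_t subleaf)
{
    struct cpuid_regs r = { 0, 0, 0, 0 };

    if (leaf == 0) {
        r.eax = 0x20;
    } else if (leaf == 0xb) {
        r = fake_leaf_0xb(subleaf);
    }
    return r;
}

/* Models the proposed behaviour: leaf 0x1f mirrors leaf 0xb. */
struct cpuid_regs fake_cpuid_mirrored_1f(uint32_t leaf, uint32_t subleaf)
{
    if (leaf == 0x1f) {
        return fake_leaf_0xb(subleaf);
    }
    return fake_cpuid_zeroed_1f(leaf, subleaf);
}
```

With the zeroed 0x1f, a guest following this logic ends up on 0xb. With the 
mirrored 0x1f it stays on 0x1f and sees the same data.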



We checked on one of our SapphireRapids nodes with no multi-die topology; this 
is what the CPUID output looks like. The 0x1f output is the same as 0xb.


# cpuid -l 0xb -s 0 -1
CPU:
   x2APIC features / processor topology (0xb):
      extended APIC ID                      = 37
      --- level 0 ---
      level number                          = 0x0 (0)
      level type                            = thread (1)
      bit width of level & previous levels  = 0x1 (1)
      number of logical processors at level = 0x2 (2)

# cpuid -l 0xb -s 1 -1
CPU:
      --- level 1 ---
      level number                          = 0x1 (1)
      level type                            = core (2)
      bit width of level & previous levels  = 0x7 (7)
      number of logical processors at level = 0x28 (40)

# cpuid -l 0xb -s 2 -1
CPU:
      --- level 2 ---
      level number                          = 0x2 (2)
      level type                            = invalid (0)
      bit width of level & previous levels  = 0x0 (0)
      number of logical processors at level = 0x0 (0)

# cpuid -l 0x1f -s 0 -1
CPU:
   V2 extended topology (0x1f):
      x2APIC ID of logical processor = 0x25 (37)
      --- level 0 ---
      level number                          = 0x0 (0)
      level type                            = thread (1)
      bit width of level & previous levels  = 0x1 (1)
      number of logical processors at level = 0x2 (2)

# cpuid -l 0x1f -s 1 -1
CPU:
      --- level 1 ---
      level number                          = 0x1 (1)
      level type                            = core (2)
      bit width of level & previous levels  = 0x7 (7)
      number of logical processors at level = 0x28 (40)

# cpuid -l 0x1f -s 2 -1
CPU:
      --- level 2 ---
      level number                          = 0x2 (2)
      level type                            = invalid (0)
      bit width of level & previous levels  = 0x0 (0)
      number of logical processors at level = 0x0 (0)



We tried a workaround that makes the 0x1f output the same as 0xb when 
!x86_has_extended_topo(), instead of returning all zeros. This seems to work 
fine. Our understanding is that the current QEMU behaviour is not incorrect, 
but does the above workaround still make sense? It matches what bare metal 
reports, so it should not be unreasonable. If so, we will be happy to send a 
patch for it.
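
To make the idea concrete, here is a simplified stand-alone sketch of the 
workaround. It is not the actual patch and does not use QEMU's data 
structures: struct topo, bits_for(), encode_leaf_0xb() and 
encode_leaf_0x1f_no_ext_topo() are hypothetical names, only thread and core 
levels are modelled, and the mirror_0xb flag just selects between the current 
behaviour (all zeros) and the proposed one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct cpuid_regs {
    uint32_t eax, ebx, ecx, edx;
};

/* Hypothetical, simplified topology with thread and core levels only. */
struct topo {
    uint32_t threads_per_core;
    uint32_t cores_per_pkg;
    uint32_t x2apic_id;
};

/* Number of APIC ID bits needed to hold `count` distinct values. */
uint32_t bits_for(uint32_t count)
{
    uint32_t bits = 0;

    while (bits < 31 && (1u << bits) < count) {
        bits++;
    }
    return bits;
}

/* Encode one sub-leaf of leaf 0xb (thread level, core level, then invalid). */
void encode_leaf_0xb(const struct topo *t, uint32_t subleaf,
                     struct cpuid_regs *out)
{
    assert(t && out);

    uint32_t smt_bits = bits_for(t->threads_per_core);
    uint32_t core_bits = smt_bits + bits_for(t->cores_per_pkg);

    out->eax = 0;
    out->ebx = 0;
    out->ecx = subleaf & 0xff;      /* ECX[7:0] = level number */
    out->edx = t->x2apic_id;

    if (subleaf == 0) {
        out->eax = smt_bits;
        out->ebx = t->threads_per_core;
        out->ecx |= 1u << 8;        /* level type 1 = thread */
    } else if (subleaf == 1) {
        out->eax = core_bits;
        out->ebx = t->threads_per_core * t->cores_per_pkg;
        out->ecx |= 2u << 8;        /* level type 2 = core */
    }
    /* Any other sub-leaf keeps level type 0 (invalid). */
}

/*
 * Leaf 0x1f for the !x86_has_extended_topo() case only.
 * mirror_0xb == false: all zeros (current behaviour).
 * mirror_0xb == true:  same contents as leaf 0xb (proposed workaround).
 */
void encode_leaf_0x1f_no_ext_topo(const struct topo *t, uint32_t subleaf,
                                  bool mirror_0xb, struct cpuid_regs *out)
{
    assert(t && out);

    if (mirror_0xb) {
        encode_leaf_0xb(t, subleaf, out);
    } else {
        out->eax = out->ebx = out->ecx = out->edx = 0;
    }
}
```

For example, with 2 threads per core and 4 cores, the mirrored sub-leaf 1 
reports level type core with 8 logical processors, exactly as leaf 0xb does.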


Thanks

Manish Mishra

