Hi Everyone,

We are facing issues booting Windows VMs with the SapphireRapids CPU definition. This happens when multiple cores per vCPU are set and the VM uses UEFI with Secure Boot and Credential Guard enabled. So far we have observed this issue on Windows 10 and 11.

We did some triaging around this. The SapphireRapids CPU definition raises cpuid_level to 0x20, which includes the V2 extended topology leaf (0x1f). QEMU returns all zeros for this leaf when !x86_has_extended_topo() <https://github.com/qemu/qemu/blob/58ee924b97d1c0898555647a31820c5a20d55a73/target/i386/kvm/kvm.c#L1834>. The expectation (also described in https://cdrdv2-public.intel.com/775917/intel-64-architecture-processor-topology-enumeration.pdf) is that a guest seeing this falls back to leaf 0xb. Windows 10 and Windows 11 do not seem to handle this case and panic on boot.

We checked one of our bare-metal SapphireRapids nodes with no multi-die topology. The CPUID output is below; the 0x1f output is the same as 0xb.

    # cpuid -l 0xb -s 0 -1
    CPU:
       x2APIC features / processor topology (0xb):
          extended APIC ID                      = 37
          --- level 0 ---
          level number                          = 0x0 (0)
          level type                            = thread (1)
          bit width of level & previous levels  = 0x1 (1)
          number of logical processors at level = 0x2 (2)

    # cpuid -l 0xb -s 1 -1
    CPU:
          --- level 1 ---
          level number                          = 0x1 (1)
          level type                            = core (2)
          bit width of level & previous levels  = 0x7 (7)
          number of logical processors at level = 0x28 (40)

    # cpuid -l 0xb -s 2 -1
    CPU:
          --- level 2 ---
          level number                          = 0x2 (2)
          level type                            = invalid (0)
          bit width of level & previous levels  = 0x0 (0)
          number of logical processors at level = 0x0 (0)

    # cpuid -l 0x1f -s 0 -1
    CPU:
       V2 extended topology (0x1f):
          x2APIC ID of logical processor        = 0x25 (37)
          --- level 0 ---
          level number                          = 0x0 (0)
          level type                            = thread (1)
          bit width of level & previous levels  = 0x1 (1)
          number of logical processors at level = 0x2 (2)

    # cpuid -l 0x1f -s 1 -1
    CPU:
          --- level 1 ---
          level number                          = 0x1 (1)
          level type                            = core (2)
          bit width of level & previous levels  = 0x7 (7)
          number of logical processors at level = 0x28 (40)

    # cpuid -l 0x1f -s 2 -1
    CPU:
          --- level 2 ---
          level number                          = 0x2 (2)
          level type                            = invalid (0)
          bit width of level & previous levels  = 0x0 (0)
          number of logical processors at level = 0x0 (0)

We tried a workaround that makes the 0x1f output the same as 0xb when !x86_has_extended_topo(), instead of returning all zeros. This seems to work fine. Our understanding is that the current QEMU behaviour is not incorrect, but does the above workaround still make sense? It matches what bare metal reports, so it should not be unreasonable. If so, we will be happy to send a patch for it.

Thanks
Manish Mishra
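
For reference, here is a small Python model of the two behaviours discussed above. It is only a sketch and not QEMU code: the names Regs, cpuid_current, cpuid_workaround, pick_topology_leaf and enumerate_levels are made up for illustration, and the leaf values are copied from the bare-metal dump. The guest-side part follows the fallback as we understand it from the Intel document (prefer 0x1f, use 0xb when CPUID.1F.0:EBX[15:0] is zero). It says nothing about what Windows actually does internally.

```python
from typing import Callable, List, NamedTuple, Optional, Tuple


class Regs(NamedTuple):
    eax: int
    ebx: int
    ecx: int
    edx: int


ZERO = Regs(0, 0, 0, 0)
X2APIC_ID = 37  # value from the bare-metal dump

# subleaf -> (shift width, logical processors, level type), from the dump.
# Level type 1 = thread, 2 = core.
LEAF_0XB_LEVELS = {0: (1, 2, 1), 1: (7, 40, 2)}


def leaf_0xb(subleaf: int) -> Regs:
    """Leaf 0xb as seen on the bare-metal node (invalid levels have type 0)."""
    width, nproc, ltype = LEAF_0XB_LEVELS.get(subleaf, (0, 0, 0))
    # ECX[7:0] echoes the level number, ECX[15:8] is the level type.
    return Regs(width, nproc, (ltype << 8) | (subleaf & 0xFF), X2APIC_ID)


def cpuid_current(leaf: int, subleaf: int) -> Regs:
    """Model of today's behaviour without extended topology: 0x1f is all zeros."""
    return leaf_0xb(subleaf) if leaf == 0xB else ZERO


def cpuid_workaround(leaf: int, subleaf: int) -> Regs:
    """Model of the proposed workaround: 0x1f mirrors 0xb."""
    return leaf_0xb(subleaf) if leaf in (0xB, 0x1F) else ZERO


def pick_topology_leaf(cpuid: Callable[[int, int], Regs]) -> Optional[int]:
    """Prefer 0x1f, fall back to 0xb; a leaf is usable if EBX[15:0] != 0."""
    for leaf in (0x1F, 0xB):
        if cpuid(leaf, 0).ebx & 0xFFFF:
            return leaf
    return None


def enumerate_levels(
    cpuid: Callable[[int, int], Regs], leaf: int
) -> List[Tuple[int, int, int]]:
    """Walk subleaves until the level type (ECX[15:8]) is 0 (invalid)."""
    levels = []
    subleaf = 0
    while True:
        r = cpuid(leaf, subleaf)
        ltype = (r.ecx >> 8) & 0xFF
        if ltype == 0:
            return levels
        # (level type, shift width EAX[4:0], logical processors EBX[15:0])
        levels.append((ltype, r.eax & 0x1F, r.ebx & 0xFFFF))
        subleaf += 1
```

In this model a guest that implements the fallback picks 0xb under the current behaviour and 0x1f under the workaround, and gets the same two levels (thread, core) either way. A guest that walks 0x1f without checking EBX first sees an empty topology under the current behaviour, which is one possible (unconfirmed) way to end up in trouble.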