Re: [PATCH V2 0/3] improve -overcommit cpu-pm=on|off
On Thu, May 30, 2024 at 04:49:33PM +0200, Igor Mammedov wrote: > On Thu, 30 May 2024 21:54:47 +0800 > Zhao Liu wrote: > > > Hi Zide, > > > > On Wed, May 29, 2024 at 10:31:21AM -0700, Chen, Zide wrote: > > > Date: Wed, 29 May 2024 10:31:21 -0700 > > > From: "Chen, Zide" > > > Subject: Re: [PATCH V2 0/3] improve -overcommit cpu-pm=on|off > > > > > > > > > > > > On 5/29/2024 5:46 AM, Igor Mammedov wrote: > > > > On Tue, 28 May 2024 11:16:59 -0700 > > > > "Chen, Zide" wrote: > > > > > > > >> On 5/28/2024 2:23 AM, Igor Mammedov wrote: > > > >>> On Fri, 24 May 2024 13:00:14 -0700 > > > >>> Zide Chen wrote: > > > >>> > > > >>>> Currently, if running "-overcommit cpu-pm=on" on hosts that don't > > > >>>> have MWAIT support, the MWAIT/MONITOR feature is advertised to the > > > >>>> guest and executing MWAIT/MONITOR on the guest triggers #UD. > > > >>> > > > >>> this is missing proper description how do you trigger issue > > > >>> with reproducer and detailed description why guest sees MWAIT > > > >>> when it's not supported by host. > > > >> > > > >> If "overcommit cpu-pm=on" and "-cpu host" are present, as shown in the > > > >> > > > > it's bette to provide full QEMU CLI and host/guest kernels used and what > > > > hardware was used if it's relevant so others can reproduce problem. > > > > > > I ever reproduced this on an older Intel Icelake machine, a > > > Sapphire Rapids and a Sierra Forest, but I believe this is a x86 generic > > > issue, not specific to particular models. 
> > > > > > For the CLI, I think the only command line options that matter are > > > -overcommit cpu-pm=on: to set enable_cpu_pm > > > -cpu host: so that cpu->max_features is set > > > > > > For QEMU version, as long as it's after this commit: 662175b91ff2 > > > ("i386: reorder call to cpu_exec_realizefn") > > > > > > The guest fails to boot: > > > > > > [ 24.825568] smpboot: x86: Booting SMP configuration: > > > [ 24.826377] node #0, CPUs: #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 > > > #13 #14 #15 #17 > > > [ 24.985799] node #1, CPUs: #128 #129 #130 #131 #132 #133 #134 #135 > > > #136 #137 #138 #139 #140 #141 #142 #143 #145 > > > [ 25.136955] invalid opcode: 1 PREEMPT SMP NOPTI > > > [ 25.137790] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.8.0 #2 > > > [ 25.137790] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS > > > rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/04 > > > [ 25.137790] RIP: 0010:mwait_idle+0x35/0x80 > > > [ 25.137790] Code: 6f f0 80 48 02 20 48 8b 10 83 e2 08 75 3e 65 48 8b 15 > > > 47 d6 56 6f 48 0f ba e2 27 72 41 31 d2 48 89 d8 > > > [ 25.137790] RSP: :91403e70 EFLAGS: 00010046 > > > [ 25.137790] RAX: 9140a980 RBX: 9140a980 RCX: > > > > > > [ 25.137790] RDX: RSI: 97f1ade21b20 RDI: > > > 0004 > > > [ 25.137790] RBP: R08: 0005da4709cb R09: > > > 0001 > > > [ 25.137790] R10: 5da4 R11: 0009 R12: > > > > > > [ 25.137790] R13: 98573ff90fc0 R14: 9140a038 R15: > > > 00093ff0 > > > [ 25.137790] FS: () GS:97f1ade0() > > > knlGS: > > > [ 25.137790] CS: 0010 DS: ES: CR0: 80050033 > > > [ 25.137790] CR2: 97d8aa801000 CR3: 0049e9430001 CR4: > > > 00770ef0 > > > [ 25.137790] DR0: DR1: DR2: > > > > > > [ 25.137790] DR3: DR6: 07f0 DR7: > > > 0400 > > > [ 25.137790] PKRU: 5554 > > > [ 25.137790] Call Trace: > > > [ 25.137790] > > > [ 25.137790] ? die+0x37/0x90 > > > [ 25.137790] ? do_trap+0xe3/0x110 > > > [ 25.137790] ? mwait_idle+0x35/0x80 > > > [ 25.137790] ? do_error_trap+0x6a/0x90 > > > [ 25.137790] ? mwait_idle+0x35/0x80 > > > [ 25.137790] ? 
exc_invalid_op+0x52/0x70 > > > [ 25.137790] ? mwait_idle+0x35/0x80 > > > [ 25.137790] ? asm_exc_inval
Re: [PATCH V2 0/3] improve -overcommit cpu-pm=on|off
On 5/30/2024 6:54 AM, Zhao Liu wrote: > Hi Zide, > > On Wed, May 29, 2024 at 10:31:21AM -0700, Chen, Zide wrote: >> Date: Wed, 29 May 2024 10:31:21 -0700 >> From: "Chen, Zide" >> Subject: Re: [PATCH V2 0/3] improve -overcommit cpu-pm=on|off >> >> >> >> On 5/29/2024 5:46 AM, Igor Mammedov wrote: >>> On Tue, 28 May 2024 11:16:59 -0700 >>> "Chen, Zide" wrote: >>> >>>> On 5/28/2024 2:23 AM, Igor Mammedov wrote: >>>>> On Fri, 24 May 2024 13:00:14 -0700 >>>>> Zide Chen wrote: >>>>> >>>>>> Currently, if running "-overcommit cpu-pm=on" on hosts that don't >>>>>> have MWAIT support, the MWAIT/MONITOR feature is advertised to the >>>>>> guest and executing MWAIT/MONITOR on the guest triggers #UD. >>>>> >>>>> this is missing proper description how do you trigger issue >>>>> with reproducer and detailed description why guest sees MWAIT >>>>> when it's not supported by host. >>>> >>>> If "overcommit cpu-pm=on" and "-cpu host" are present, as shown in the >>> it's bette to provide full QEMU CLI and host/guest kernels used and what >>> hardware was used if it's relevant so others can reproduce problem. >> >> I ever reproduced this on an older Intel Icelake machine, a >> Sapphire Rapids and a Sierra Forest, but I believe this is a x86 generic >> issue, not specific to particular models. 
>> >> For the CLI, I think the only command line options that matter are >> -overcommit cpu-pm=on: to set enable_cpu_pm >> -cpu host: so that cpu->max_features is set >> >> For QEMU version, as long as it's after this commit: 662175b91ff2 >> ("i386: reorder call to cpu_exec_realizefn") >> >> The guest fails to boot: >> >> [ 24.825568] smpboot: x86: Booting SMP configuration: >> [ 24.826377] node #0, CPUs: #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 >> #13 #14 #15 #17 >> [ 24.985799] node #1, CPUs: #128 #129 #130 #131 #132 #133 #134 #135 >> #136 #137 #138 #139 #140 #141 #142 #143 #145 >> [ 25.136955] invalid opcode: 1 PREEMPT SMP NOPTI >> [ 25.137790] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.8.0 #2 >> [ 25.137790] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS >> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/04 >> [ 25.137790] RIP: 0010:mwait_idle+0x35/0x80 >> [ 25.137790] Code: 6f f0 80 48 02 20 48 8b 10 83 e2 08 75 3e 65 48 8b 15 >> 47 d6 56 6f 48 0f ba e2 27 72 41 31 d2 48 89 d8 >> [ 25.137790] RSP: :91403e70 EFLAGS: 00010046 >> [ 25.137790] RAX: 9140a980 RBX: 9140a980 RCX: >> >> [ 25.137790] RDX: RSI: 97f1ade21b20 RDI: >> 0004 >> [ 25.137790] RBP: R08: 0005da4709cb R09: >> 0001 >> [ 25.137790] R10: 5da4 R11: 0009 R12: >> >> [ 25.137790] R13: 98573ff90fc0 R14: 9140a038 R15: >> 00093ff0 >> [ 25.137790] FS: () GS:97f1ade0() >> knlGS: >> [ 25.137790] CS: 0010 DS: ES: CR0: 80050033 >> [ 25.137790] CR2: 97d8aa801000 CR3: 0049e9430001 CR4: >> 00770ef0 >> [ 25.137790] DR0: DR1: DR2: >> >> [ 25.137790] DR3: DR6: 07f0 DR7: >> 0400 >> [ 25.137790] PKRU: 5554 >> [ 25.137790] Call Trace: >> [ 25.137790] >> [ 25.137790] ? die+0x37/0x90 >> [ 25.137790] ? do_trap+0xe3/0x110 >> [ 25.137790] ? mwait_idle+0x35/0x80 >> [ 25.137790] ? do_error_trap+0x6a/0x90 >> [ 25.137790] ? mwait_idle+0x35/0x80 >> [ 25.137790] ? exc_invalid_op+0x52/0x70 >> [ 25.137790] ? mwait_idle+0x35/0x80 >> [ 25.137790] ? asm_exc_invalid_op+0x1a/0x20 >> [ 25.137790] ? 
mwait_idle+0x35/0x80 >> [ 25.137790] default_idle_call+0x30/0x100 >> [ 25.137790] cpuidle_idle_call+0x12c/0x170 >> [ 25.137790] ? tsc_verify_tsc_adjust+0x73/0xd0 >> [ 25.137790] do_idle+0x7f/0xd0 >> [ 25.137790] cpu_startup_entry+0x29/0x30 >> [ 25.137790] rest_init+0xcc/0xd0 >> [ 25.137790] start_kernel+0x396/0x5d0 >> [ 25.137790] x86_64_start_reservations+0x18/0x30 >> [ 25.137790] x86_64_start_kernel+0xe7/0xf0 >> [ 25.137790] common_startup_64+0x13e/0x148 >> [ 25.137790] >> [ 25.137790] Modules linked in: >> [ 25.137790] --[ end trace 0
Re: [PATCH V2 0/3] improve -overcommit cpu-pm=on|off
On Thu, May 30, 2024, Igor Mammedov wrote:
> On Thu, 30 May 2024 21:54:47 +0800 Zhao Liu wrote:

...

> > > >> following, CPUID_EXT_MONITOR is set after x86_cpu_filter_features(), so
> > > >> that it doesn't have a chance to check MWAIT against host features and
> > > >> will be advertised to the guest regardless of whether it's supported by
> > > >> the host or not.
> > > >>
> > > >> x86_cpu_realizefn()
> > > >>   x86_cpu_filter_features()
> > > >>   cpu_exec_realizefn()
> > > >>     kvm_cpu_realizefn
> > > >>       host_cpu_realizefn
> > > >>         host_cpu_enable_cpu_pm
> > > >>           env->features[FEAT_1_ECX] |= CPUID_EXT_MONITOR;
> > > >>
> > > >>
> > > >> If it's not supported by the host, executing MONITOR or MWAIT
> > > >> instructions from the guest triggers #UD, no matter MWAIT_EXITING
> > > >> control is set or not.
> > > >
> > > > If I recall right, kvm was able to emulate mwait/monitor.
> > > > So question is why it leads to exception instead?

Because KVM doesn't emulate MONITOR/MWAIT on #UD.

> > > KVM can come to play only iff it can trigger MWAIT/MONITOR VM exits. I
> > > didn't find explicit proof from Intel SDM that #UD exceptions take
> > > precedence over MWAIT/MONITOR VM exits, but this is my speculation.

Yeah, typically #UD takes priority over VM-Exit interception checks. AMD's APM is much more explicit and states that all exceptions are checked on MONITOR/MWAIT before the interception check.

> > > For example, in ancient machines which don't support MWAIT yet, the only
> > > way it can do is #UD, not MWAIT VM exit?

Not really relevant, because such CPUs wouldn't have MWAIT-exiting.

> > For the Host which doesn't support MWAIT, it shouldn't have the VMX
> > control bit for mwait exit either, right?
> >
> > Could you pls check this on your machine? If VMX doesn't support this
> > exit event, then triggering an exception will make sense.
>
> My assumption (probably wrong) was that KVM would emulate mwait if it's
> unavailable,

Nope. In order to limit the attack surface of the emulator on modern CPUs, KVM only emulates select instructions in response to a #UD.

But even if KVM did emulate MONITOR/MWAIT on #UD, this is inarguably a QEMU bug, e.g. QEMU will effectively coerce the guest into using an idle-polling mechanism.
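The precedence described in this reply can be written down as a toy decision function. This is only a sketch of the reasoning in the thread, not KVM or hardware code: the enum, the function name and both parameters are hypothetical, and the ordering of the checks assumes the "#UD is checked before the interception check" behaviour stated above.

```c
#include <stdbool.h>

/* Hypothetical outcomes of a guest executing MONITOR/MWAIT. */
typedef enum {
    GUEST_GETS_UD,    /* instruction faults with #UD inside the guest   */
    VM_EXIT_TO_KVM,   /* MWAIT-exiting intercepts the instruction       */
    RUNS_NATIVELY     /* instruction executes on the CPU without exit   */
} mwait_outcome;

/*
 * Toy model (assumption): the exception check comes first, so on a host
 * without MWAIT the guest sees #UD regardless of the exiting control,
 * and KVM never gets a chance to intercept.
 */
static mwait_outcome guest_mwait_outcome(bool host_has_mwait,
                                         bool mwait_exiting)
{
    if (!host_has_mwait) {
        return GUEST_GETS_UD;
    }
    if (mwait_exiting) {
        return VM_EXIT_TO_KVM;
    }
    return RUNS_NATIVELY;
}
```

In this model the `mwait_exiting` argument has no effect when `host_has_mwait` is false, which matches the observation that the guest faults "no matter MWAIT_EXITING control is set or not".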
Re: [PATCH V2 0/3] improve -overcommit cpu-pm=on|off
On Thu, 30 May 2024 21:54:47 +0800 Zhao Liu wrote: > Hi Zide, > > On Wed, May 29, 2024 at 10:31:21AM -0700, Chen, Zide wrote: > > Date: Wed, 29 May 2024 10:31:21 -0700 > > From: "Chen, Zide" > > Subject: Re: [PATCH V2 0/3] improve -overcommit cpu-pm=on|off > > > > > > > > On 5/29/2024 5:46 AM, Igor Mammedov wrote: > > > On Tue, 28 May 2024 11:16:59 -0700 > > > "Chen, Zide" wrote: > > > > > >> On 5/28/2024 2:23 AM, Igor Mammedov wrote: > > >>> On Fri, 24 May 2024 13:00:14 -0700 > > >>> Zide Chen wrote: > > >>> > > >>>> Currently, if running "-overcommit cpu-pm=on" on hosts that don't > > >>>> have MWAIT support, the MWAIT/MONITOR feature is advertised to the > > >>>> guest and executing MWAIT/MONITOR on the guest triggers #UD. > > >>> > > >>> this is missing proper description how do you trigger issue > > >>> with reproducer and detailed description why guest sees MWAIT > > >>> when it's not supported by host. > > >> > > >> If "overcommit cpu-pm=on" and "-cpu host" are present, as shown in the > > > it's bette to provide full QEMU CLI and host/guest kernels used and what > > > hardware was used if it's relevant so others can reproduce problem. > > > > I ever reproduced this on an older Intel Icelake machine, a > > Sapphire Rapids and a Sierra Forest, but I believe this is a x86 generic > > issue, not specific to particular models. 
> > > > For the CLI, I think the only command line options that matter are > > -overcommit cpu-pm=on: to set enable_cpu_pm > > -cpu host: so that cpu->max_features is set > > > > For QEMU version, as long as it's after this commit: 662175b91ff2 > > ("i386: reorder call to cpu_exec_realizefn") > > > > The guest fails to boot: > > > > [ 24.825568] smpboot: x86: Booting SMP configuration: > > [ 24.826377] node #0, CPUs: #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 > > #13 #14 #15 #17 > > [ 24.985799] node #1, CPUs: #128 #129 #130 #131 #132 #133 #134 #135 > > #136 #137 #138 #139 #140 #141 #142 #143 #145 > > [ 25.136955] invalid opcode: 1 PREEMPT SMP NOPTI > > [ 25.137790] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.8.0 #2 > > [ 25.137790] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS > > rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/04 > > [ 25.137790] RIP: 0010:mwait_idle+0x35/0x80 > > [ 25.137790] Code: 6f f0 80 48 02 20 48 8b 10 83 e2 08 75 3e 65 48 8b 15 > > 47 d6 56 6f 48 0f ba e2 27 72 41 31 d2 48 89 d8 > > [ 25.137790] RSP: :91403e70 EFLAGS: 00010046 > > [ 25.137790] RAX: 9140a980 RBX: 9140a980 RCX: > > > > [ 25.137790] RDX: RSI: 97f1ade21b20 RDI: > > 0004 > > [ 25.137790] RBP: R08: 0005da4709cb R09: > > 0001 > > [ 25.137790] R10: 5da4 R11: 0009 R12: > > > > [ 25.137790] R13: 98573ff90fc0 R14: 9140a038 R15: > > 00093ff0 > > [ 25.137790] FS: () GS:97f1ade0() > > knlGS: > > [ 25.137790] CS: 0010 DS: ES: CR0: 80050033 > > [ 25.137790] CR2: 97d8aa801000 CR3: 0049e9430001 CR4: > > 00770ef0 > > [ 25.137790] DR0: DR1: DR2: > > > > [ 25.137790] DR3: DR6: 07f0 DR7: > > 0400 > > [ 25.137790] PKRU: 5554 > > [ 25.137790] Call Trace: > > [ 25.137790] > > [ 25.137790] ? die+0x37/0x90 > > [ 25.137790] ? do_trap+0xe3/0x110 > > [ 25.137790] ? mwait_idle+0x35/0x80 > > [ 25.137790] ? do_error_trap+0x6a/0x90 > > [ 25.137790] ? mwait_idle+0x35/0x80 > > [ 25.137790] ? exc_invalid_op+0x52/0x70 > > [ 25.137790] ? mwait_idle+0x35/0x80 > > [ 25.137790] ? 
asm_exc_invalid_op+0x1a/0x20 > > [ 25.137790] ? mwait_idle+0x35/0x80 > > [ 25.137790] default_idle_call+0x30/0x100 > > [ 25.137790] cpuidle_idle_call+0x12c/0x170 > > [ 25.137790] ? tsc_verify_tsc_adjust+0x73/0xd0 > > [ 25.137790] do_idle+0x7f/0xd0 > > [ 25.137790] cpu_startup_entry+0x29/0x30 > > [ 25.137790] rest_init+0xcc/0xd0 > > [ 25.137790] start_kernel+0x396/0x5d0 > > [ 25.137790] x86_64_start_reservations+0x18/0x30 > > [ 25.137790] x86_64_start_kernel+0xe7/0xf0 > >
Re: [PATCH V2 0/3] improve -overcommit cpu-pm=on|off
Hi Zide, On Wed, May 29, 2024 at 10:31:21AM -0700, Chen, Zide wrote: > Date: Wed, 29 May 2024 10:31:21 -0700 > From: "Chen, Zide" > Subject: Re: [PATCH V2 0/3] improve -overcommit cpu-pm=on|off > > > > On 5/29/2024 5:46 AM, Igor Mammedov wrote: > > On Tue, 28 May 2024 11:16:59 -0700 > > "Chen, Zide" wrote: > > > >> On 5/28/2024 2:23 AM, Igor Mammedov wrote: > >>> On Fri, 24 May 2024 13:00:14 -0700 > >>> Zide Chen wrote: > >>> > >>>> Currently, if running "-overcommit cpu-pm=on" on hosts that don't > >>>> have MWAIT support, the MWAIT/MONITOR feature is advertised to the > >>>> guest and executing MWAIT/MONITOR on the guest triggers #UD. > >>> > >>> this is missing proper description how do you trigger issue > >>> with reproducer and detailed description why guest sees MWAIT > >>> when it's not supported by host. > >> > >> If "overcommit cpu-pm=on" and "-cpu host" are present, as shown in the > > it's bette to provide full QEMU CLI and host/guest kernels used and what > > hardware was used if it's relevant so others can reproduce problem. > > I ever reproduced this on an older Intel Icelake machine, a > Sapphire Rapids and a Sierra Forest, but I believe this is a x86 generic > issue, not specific to particular models. 
> > For the CLI, I think the only command line options that matter are > -overcommit cpu-pm=on: to set enable_cpu_pm > -cpu host: so that cpu->max_features is set > > For QEMU version, as long as it's after this commit: 662175b91ff2 > ("i386: reorder call to cpu_exec_realizefn") > > The guest fails to boot: > > [ 24.825568] smpboot: x86: Booting SMP configuration: > [ 24.826377] node #0, CPUs: #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 > #13 #14 #15 #17 > [ 24.985799] node #1, CPUs: #128 #129 #130 #131 #132 #133 #134 #135 > #136 #137 #138 #139 #140 #141 #142 #143 #145 > [ 25.136955] invalid opcode: 1 PREEMPT SMP NOPTI > [ 25.137790] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.8.0 #2 > [ 25.137790] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS > rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/04 > [ 25.137790] RIP: 0010:mwait_idle+0x35/0x80 > [ 25.137790] Code: 6f f0 80 48 02 20 48 8b 10 83 e2 08 75 3e 65 48 8b 15 > 47 d6 56 6f 48 0f ba e2 27 72 41 31 d2 48 89 d8 > [ 25.137790] RSP: :91403e70 EFLAGS: 00010046 > [ 25.137790] RAX: 9140a980 RBX: 9140a980 RCX: > > [ 25.137790] RDX: RSI: 97f1ade21b20 RDI: > 0004 > [ 25.137790] RBP: R08: 0005da4709cb R09: > 0001 > [ 25.137790] R10: 5da4 R11: 0009 R12: > > [ 25.137790] R13: 98573ff90fc0 R14: 9140a038 R15: > 00093ff0 > [ 25.137790] FS: () GS:97f1ade0() > knlGS: > [ 25.137790] CS: 0010 DS: ES: CR0: 80050033 > [ 25.137790] CR2: 97d8aa801000 CR3: 0049e9430001 CR4: > 00770ef0 > [ 25.137790] DR0: DR1: DR2: > > [ 25.137790] DR3: DR6: 07f0 DR7: > 0400 > [ 25.137790] PKRU: 5554 > [ 25.137790] Call Trace: > [ 25.137790] > [ 25.137790] ? die+0x37/0x90 > [ 25.137790] ? do_trap+0xe3/0x110 > [ 25.137790] ? mwait_idle+0x35/0x80 > [ 25.137790] ? do_error_trap+0x6a/0x90 > [ 25.137790] ? mwait_idle+0x35/0x80 > [ 25.137790] ? exc_invalid_op+0x52/0x70 > [ 25.137790] ? mwait_idle+0x35/0x80 > [ 25.137790] ? asm_exc_invalid_op+0x1a/0x20 > [ 25.137790] ? 
mwait_idle+0x35/0x80 > [ 25.137790] default_idle_call+0x30/0x100 > [ 25.137790] cpuidle_idle_call+0x12c/0x170 > [ 25.137790] ? tsc_verify_tsc_adjust+0x73/0xd0 > [ 25.137790] do_idle+0x7f/0xd0 > [ 25.137790] cpu_startup_entry+0x29/0x30 > [ 25.137790] rest_init+0xcc/0xd0 > [ 25.137790] start_kernel+0x396/0x5d0 > [ 25.137790] x86_64_start_reservations+0x18/0x30 > [ 25.137790] x86_64_start_kernel+0xe7/0xf0 > [ 25.137790] common_startup_64+0x13e/0x148 > [ 25.137790] > [ 25.137790] Modules linked in: > [ 25.137790] --[ end trace ]-- > [ 25.137790] invalid opcode: 2 PREEMPT SMP NOPTI > [ 25.137790] RIP: 0010:mwait_idle+0x35/0x80 > [ 25.137790] Code: 6f f0 80 48 02 20 48 8b 10 83 e2 08 75 3e 65 48 8b 15 > 47 d6 56 6f 48 0f ba e2 27 72 41 31 d2 48 89 d8 > > > > >> following, CPUID_EXT_MONITOR is set after x86_cpu_filter_features(), so > >> that it doesn't have a chance to check
Re: [PATCH V2 0/3] improve -overcommit cpu-pm=on|off
On 5/29/2024 5:46 AM, Igor Mammedov wrote:
> On Tue, 28 May 2024 11:16:59 -0700
> "Chen, Zide" wrote:
>
>> On 5/28/2024 2:23 AM, Igor Mammedov wrote:
>>> On Fri, 24 May 2024 13:00:14 -0700
>>> Zide Chen wrote:
>>>
>>>> Currently, if running "-overcommit cpu-pm=on" on hosts that don't
>>>> have MWAIT support, the MWAIT/MONITOR feature is advertised to the
>>>> guest and executing MWAIT/MONITOR on the guest triggers #UD.
>>>
>>> this is missing proper description how do you trigger issue
>>> with reproducer and detailed description why guest sees MWAIT
>>> when it's not supported by host.
>>
>> If "overcommit cpu-pm=on" and "-cpu host" are present, as shown in the
> it's better to provide full QEMU CLI and host/guest kernels used and what
> hardware was used if it's relevant so others can reproduce problem.

I have reproduced this on an older Intel Icelake machine, a Sapphire Rapids and a Sierra Forest, but I believe this is an x86-generic issue, not specific to particular models.

For the CLI, I think the only command line options that matter are:
  -overcommit cpu-pm=on: to set enable_cpu_pm
  -cpu host: so that cpu->max_features is set

As for the QEMU version, anything after this commit reproduces it: 662175b91ff2 ("i386: reorder call to cpu_exec_realizefn")

The guest fails to boot:

[ 24.825568] smpboot: x86: Booting SMP configuration:
[ 24.826377] node #0, CPUs: #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 #13 #14 #15 #17
[ 24.985799] node #1, CPUs: #128 #129 #130 #131 #132 #133 #134 #135 #136 #137 #138 #139 #140 #141 #142 #143 #145
[ 25.136955] invalid opcode: 1 PREEMPT SMP NOPTI
[ 25.137790] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.8.0 #2
[ 25.137790] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/04
[ 25.137790] RIP: 0010:mwait_idle+0x35/0x80
[ 25.137790] Code: 6f f0 80 48 02 20 48 8b 10 83 e2 08 75 3e 65 48 8b 15 47 d6 56 6f 48 0f ba e2 27 72 41 31 d2 48 89 d8
[ 25.137790] RSP: :91403e70 EFLAGS: 00010046
[ 25.137790] RAX: 9140a980 RBX: 9140a980 RCX:
[ 25.137790] RDX: RSI: 97f1ade21b20 RDI: 0004
[ 25.137790] RBP: R08: 0005da4709cb R09: 0001
[ 25.137790] R10: 5da4 R11: 0009 R12:
[ 25.137790] R13: 98573ff90fc0 R14: 9140a038 R15: 00093ff0
[ 25.137790] FS: () GS:97f1ade0() knlGS:
[ 25.137790] CS: 0010 DS: ES: CR0: 80050033
[ 25.137790] CR2: 97d8aa801000 CR3: 0049e9430001 CR4: 00770ef0
[ 25.137790] DR0: DR1: DR2:
[ 25.137790] DR3: DR6: 07f0 DR7: 0400
[ 25.137790] PKRU: 5554
[ 25.137790] Call Trace:
[ 25.137790]
[ 25.137790] ? die+0x37/0x90
[ 25.137790] ? do_trap+0xe3/0x110
[ 25.137790] ? mwait_idle+0x35/0x80
[ 25.137790] ? do_error_trap+0x6a/0x90
[ 25.137790] ? mwait_idle+0x35/0x80
[ 25.137790] ? exc_invalid_op+0x52/0x70
[ 25.137790] ? mwait_idle+0x35/0x80
[ 25.137790] ? asm_exc_invalid_op+0x1a/0x20
[ 25.137790] ? mwait_idle+0x35/0x80
[ 25.137790] default_idle_call+0x30/0x100
[ 25.137790] cpuidle_idle_call+0x12c/0x170
[ 25.137790] ? tsc_verify_tsc_adjust+0x73/0xd0
[ 25.137790] do_idle+0x7f/0xd0
[ 25.137790] cpu_startup_entry+0x29/0x30
[ 25.137790] rest_init+0xcc/0xd0
[ 25.137790] start_kernel+0x396/0x5d0
[ 25.137790] x86_64_start_reservations+0x18/0x30
[ 25.137790] x86_64_start_kernel+0xe7/0xf0
[ 25.137790] common_startup_64+0x13e/0x148
[ 25.137790]
[ 25.137790] Modules linked in:
[ 25.137790] --[ end trace ]--
[ 25.137790] invalid opcode: 2 PREEMPT SMP NOPTI
[ 25.137790] RIP: 0010:mwait_idle+0x35/0x80
[ 25.137790] Code: 6f f0 80 48 02 20 48 8b 10 83 e2 08 75 3e 65 48 8b 15 47 d6 56 6f 48 0f ba e2 27 72 41 31 d2 48 89 d8

>
>> following, CPUID_EXT_MONITOR is set after x86_cpu_filter_features(), so
>> that it doesn't have a chance to check MWAIT against host features and
>> will be advertised to the guest regardless of whether it's supported by
>> the host or not.
>>
>> x86_cpu_realizefn()
>>   x86_cpu_filter_features()
>>   cpu_exec_realizefn()
>>     kvm_cpu_realizefn
>>       host_cpu_realizefn
>>         host_cpu_enable_cpu_pm
>>           env->features[FEAT_1_ECX] |= CPUID_EXT_MONITOR;
>>
>>
>> If it's not supported by the host, executing MONITOR or MWAIT
>> instructions from the guest triggers #UD, no matter MWAIT_EXITING
>> control is set or not.
>
> If I recall right, kvm was able to emulate mwait/monitor.
> So question is why it leads to exception instead?

KVM can come into play only if it can trigger MWAIT/MONITOR VM exits. I didn't find explicit proof in the Intel SDM that #UD exceptions take precedence over MWAIT/MONITOR VM exits, but this is my speculation. For example, on ancient machines that don't support MWAIT at all, the only thing the CPU can do is raise #UD, not MWAIT V
Re: [PATCH V2 0/3] improve -overcommit cpu-pm=on|off
On Tue, 28 May 2024 11:16:59 -0700
"Chen, Zide" wrote:

> On 5/28/2024 2:23 AM, Igor Mammedov wrote:
> > On Fri, 24 May 2024 13:00:14 -0700
> > Zide Chen wrote:
> >
> >> Currently, if running "-overcommit cpu-pm=on" on hosts that don't
> >> have MWAIT support, the MWAIT/MONITOR feature is advertised to the
> >> guest and executing MWAIT/MONITOR on the guest triggers #UD.
> >
> > this is missing proper description how do you trigger issue
> > with reproducer and detailed description why guest sees MWAIT
> > when it's not supported by host.
>
> If "overcommit cpu-pm=on" and "-cpu host" are present, as shown in the

It's better to provide the full QEMU CLI, the host/guest kernels used, and the hardware used if it's relevant, so others can reproduce the problem.

> following, CPUID_EXT_MONITOR is set after x86_cpu_filter_features(), so
> that it doesn't have a chance to check MWAIT against host features and
> will be advertised to the guest regardless of whether it's supported by
> the host or not.
>
> x86_cpu_realizefn()
>   x86_cpu_filter_features()
>   cpu_exec_realizefn()
>     kvm_cpu_realizefn
>       host_cpu_realizefn
>         host_cpu_enable_cpu_pm
>           env->features[FEAT_1_ECX] |= CPUID_EXT_MONITOR;
>
>
> If it's not supported by the host, executing MONITOR or MWAIT
> instructions from the guest triggers #UD, no matter MWAIT_EXITING
> control is set or not.

If I recall correctly, KVM was able to emulate mwait/monitor. So the question is: why does it lead to an exception instead?
Re: [PATCH V2 0/3] improve -overcommit cpu-pm=on|off
On 5/28/2024 2:23 AM, Igor Mammedov wrote:
> On Fri, 24 May 2024 13:00:14 -0700
> Zide Chen wrote:
>
>> Currently, if running "-overcommit cpu-pm=on" on hosts that don't
>> have MWAIT support, the MWAIT/MONITOR feature is advertised to the
>> guest and executing MWAIT/MONITOR on the guest triggers #UD.
>
> this is missing proper description how do you trigger issue
> with reproducer and detailed description why guest sees MWAIT
> when it's not supported by host.

If "-overcommit cpu-pm=on" and "-cpu host" are both present, then, as the call chain below shows, CPUID_EXT_MONITOR is set after x86_cpu_filter_features() has already run. The bit therefore never gets checked against the host features, and MWAIT is advertised to the guest regardless of whether the host supports it.

x86_cpu_realizefn()
  x86_cpu_filter_features()
  cpu_exec_realizefn()
    kvm_cpu_realizefn
      host_cpu_realizefn
        host_cpu_enable_cpu_pm
          env->features[FEAT_1_ECX] |= CPUID_EXT_MONITOR;

If the host doesn't support it, executing MONITOR or MWAIT instructions in the guest triggers #UD, whether or not the MWAIT_EXITING control is set.
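The ordering problem in the call chain above can be illustrated with a small standalone model. This is a sketch under stated assumptions, not QEMU code: `filter_features()`, `enable_cpu_pm()`, `realize_filter_first()` and `realize_enable_first()` are hypothetical stand-ins, and host filtering is reduced to a plain bitwise AND. Only the bit position of CPUID_EXT_MONITOR (CPUID.01H:ECX bit 3) is taken from the architecture.

```c
#include <stdint.h>

/* MONITOR/MWAIT feature flag: CPUID.01H:ECX bit 3. */
#define CPUID_EXT_MONITOR (1U << 3)

/* Hypothetical stand-in for x86_cpu_filter_features():
 * drop every guest feature bit the host does not have. */
static uint32_t filter_features(uint32_t guest_ecx, uint32_t host_ecx)
{
    return guest_ecx & host_ecx;
}

/* Hypothetical stand-in for host_cpu_enable_cpu_pm():
 * unconditionally advertise MONITOR/MWAIT. */
static uint32_t enable_cpu_pm(uint32_t guest_ecx)
{
    return guest_ecx | CPUID_EXT_MONITOR;
}

/* Order described in the report: filter first, then set the bit,
 * so the bit is never checked against the host. */
static uint32_t realize_filter_first(uint32_t guest_ecx, uint32_t host_ecx)
{
    guest_ecx = filter_features(guest_ecx, host_ecx);
    return enable_cpu_pm(guest_ecx);
}

/* Reversed order: set the bit first, then filter,
 * so an unsupported bit is removed again. */
static uint32_t realize_enable_first(uint32_t guest_ecx, uint32_t host_ecx)
{
    guest_ecx = enable_cpu_pm(guest_ecx);
    return filter_features(guest_ecx, host_ecx);
}
```

With `host_ecx == 0` (a host without MONITOR/MWAIT), `realize_filter_first()` still returns a value with CPUID_EXT_MONITOR set, while `realize_enable_first()` returns 0. The second ordering corresponds to the direction suggested by the patch title "call cpu_exec_realizefn before x86_cpu_filter_features"; how the actual patch achieves it is not shown in this thread.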
Re: [PATCH V2 0/3] improve -overcommit cpu-pm=on|off
On Fri, 24 May 2024 13:00:14 -0700
Zide Chen wrote:

> Currently, if running "-overcommit cpu-pm=on" on hosts that don't
> have MWAIT support, the MWAIT/MONITOR feature is advertised to the
> guest and executing MWAIT/MONITOR on the guest triggers #UD.

This is missing a proper description of how you trigger the issue, with a reproducer, and a detailed description of why the guest sees MWAIT when it's not supported by the host.

>
> V2:
> - [PATCH 1]: took Thomas' suggestion for more generic fix
> - [PATCH 2/3]: no changes
>
> Zide Chen (3):
>   vl: Allow multiple -overcommit commands
>   target/i386: call cpu_exec_realizefn before x86_cpu_filter_features
>   target/i386: Move host_cpu_enable_cpu_pm into kvm_cpu_realizefn()
>
>  system/vl.c               |  4 ++--
>  target/i386/cpu.c         | 24
>  target/i386/host-cpu.c    | 12
>  target/i386/kvm/kvm-cpu.c | 12 +--
>  4 files changed, 23 insertions(+), 29 deletions(-)
>
[PATCH V2 0/3] improve -overcommit cpu-pm=on|off
Currently, if running "-overcommit cpu-pm=on" on hosts that don't have MWAIT support, the MWAIT/MONITOR feature is advertised to the guest and executing MWAIT/MONITOR on the guest triggers #UD.

V2:
- [PATCH 1]: took Thomas' suggestion for more generic fix
- [PATCH 2/3]: no changes

Zide Chen (3):
  vl: Allow multiple -overcommit commands
  target/i386: call cpu_exec_realizefn before x86_cpu_filter_features
  target/i386: Move host_cpu_enable_cpu_pm into kvm_cpu_realizefn()

 system/vl.c               |  4 ++--
 target/i386/cpu.c         | 24
 target/i386/host-cpu.c    | 12
 target/i386/kvm/kvm-cpu.c | 12 +--
 4 files changed, 23 insertions(+), 29 deletions(-)

--
2.34.1