Re: [PATCH V2 0/3] improve -overcommit cpu-pm=on|off

2024-06-02 Thread Michael S. Tsirkin
On Thu, May 30, 2024 at 04:49:33PM +0200, Igor Mammedov wrote:
> On Thu, 30 May 2024 21:54:47 +0800
> Zhao Liu  wrote:
> 
> > Hi Zide,
> > 
> > On Wed, May 29, 2024 at 10:31:21AM -0700, Chen, Zide wrote:
> > > Date: Wed, 29 May 2024 10:31:21 -0700
> > > From: "Chen, Zide" 
> > > Subject: Re: [PATCH V2 0/3] improve -overcommit cpu-pm=on|off
> > > 
> > > 
> > > 
> > > On 5/29/2024 5:46 AM, Igor Mammedov wrote:  
> > > > On Tue, 28 May 2024 11:16:59 -0700
> > > > "Chen, Zide"  wrote:
> > > >   
> > > >> On 5/28/2024 2:23 AM, Igor Mammedov wrote:  
> > > >>> On Fri, 24 May 2024 13:00:14 -0700
> > > >>> Zide Chen  wrote:
> > > >>> 
> > > >>>> Currently, if running "-overcommit cpu-pm=on" on hosts that don't
> > > >>>> have MWAIT support, the MWAIT/MONITOR feature is advertised to the
> > > >>>> guest and executing MWAIT/MONITOR on the guest triggers #UD.
> > > >>>
> > > >>> this is missing proper description how do you trigger issue
> > > >>> with reproducer and detailed description why guest sees MWAIT
> > > >>> when it's not supported by host.
> > > >>
> > > >> If "overcommit cpu-pm=on" and "-cpu host" are present, as shown in the 
> > > >>  
> > > > > it's better to provide the full QEMU CLI, the host/guest kernels used, and
> > > > > the hardware used if it's relevant, so others can reproduce the problem.
> > > > 
> > > > I have reproduced this on an older Intel Icelake machine, a
> > > > Sapphire Rapids, and a Sierra Forest, but I believe this is a generic x86
> > > > issue, not specific to particular models.
> > > 
> > > For the CLI, I think the only command line options that matter are
> > >  -overcommit cpu-pm=on: to set enable_cpu_pm
> > >  -cpu host: so that cpu->max_features is set
> > > 
> > > For QEMU version, as long as it's after this commit: 662175b91ff2
> > > ("i386: reorder call to cpu_exec_realizefn")
> > > 
> > > The guest fails to boot:
> > > 
> > > [ 24.825568] smpboot: x86: Booting SMP configuration:
> > > [ 24.826377]  node #0, CPUs: #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12
> > > #13 #14 #15 #17
> > > [ 24.985799]  node #1, CPUs: #128 #129 #130 #131 #132 #133 #134 #135
> > > #136 #137 #138 #139 #140 #141 #142 #143 #145
> > > [ 25.136955] invalid opcode:  1 PREEMPT SMP NOPTI
> > > [ 25.137790] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.8.0 #2
> > > [ 25.137790] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
> > > rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/04
> > > [ 25.137790] RIP: 0010:mwait_idle+0x35/0x80
> > > [ 25.137790] Code: 6f f0 80 48 02 20 48 8b 10 83 e2 08 75 3e 65 48 8b 15
> > > 47 d6 56 6f 48 0f ba e2 27 72 41 31 d2 48 89 d8
> > > [ 25.137790] RSP: :91403e70 EFLAGS: 00010046
> > > [ 25.137790] RAX: 9140a980 RBX: 9140a980 RCX:
> > > 
> > > [ 25.137790] RDX:  RSI: 97f1ade21b20 RDI:
> > > 0004
> > > [ 25.137790] RBP:  R08: 0005da4709cb R09:
> > > 0001
> > > [ 25.137790] R10: 5da4 R11: 0009 R12:
> > > 
> > > [ 25.137790] R13: 98573ff90fc0 R14: 9140a038 R15:
> > > 00093ff0
> > > [ 25.137790] FS: () GS:97f1ade0()
> > > knlGS:
> > > [ 25.137790] CS: 0010 DS:  ES:  CR0: 80050033
> > > [ 25.137790] CR2: 97d8aa801000 CR3: 0049e9430001 CR4:
> > > 00770ef0
> > > [ 25.137790] DR0:  DR1:  DR2:
> > > 
> > > [ 25.137790] DR3:  DR6: 07f0 DR7:
> > > 0400
> > > [ 25.137790] PKRU: 5554
> > > [ 25.137790] Call Trace:
> > > [ 25.137790] 
> > > [ 25.137790] ? die+0x37/0x90
> > > [ 25.137790] ? do_trap+0xe3/0x110
> > > [ 25.137790] ? mwait_idle+0x35/0x80
> > > [ 25.137790] ? do_error_trap+0x6a/0x90
> > > [ 25.137790] ? mwait_idle+0x35/0x80
> > > [ 25.137790] ? exc_invalid_op+0x52/0x70
> > > [ 25.137790] ? mwait_idle+0x35/0x80
> > > [ 25.137790] ? asm_exc_inval

Re: [PATCH V2 0/3] improve -overcommit cpu-pm=on|off

2024-05-30 Thread Chen, Zide



On 5/30/2024 6:54 AM, Zhao Liu wrote:
> Hi Zide,
> 
> On Wed, May 29, 2024 at 10:31:21AM -0700, Chen, Zide wrote:
>> Date: Wed, 29 May 2024 10:31:21 -0700
>> From: "Chen, Zide" 
>> Subject: Re: [PATCH V2 0/3] improve -overcommit cpu-pm=on|off
>>
>>
>>
>> On 5/29/2024 5:46 AM, Igor Mammedov wrote:
>>> On Tue, 28 May 2024 11:16:59 -0700
>>> "Chen, Zide"  wrote:
>>>
>>>> On 5/28/2024 2:23 AM, Igor Mammedov wrote:
>>>>> On Fri, 24 May 2024 13:00:14 -0700
>>>>> Zide Chen  wrote:
>>>>>   
>>>>>> Currently, if running "-overcommit cpu-pm=on" on hosts that don't
>>>>>> have MWAIT support, the MWAIT/MONITOR feature is advertised to the
>>>>>> guest and executing MWAIT/MONITOR on the guest triggers #UD.  
>>>>>
>>>>> this is missing proper description how do you trigger issue
>>>>> with reproducer and detailed description why guest sees MWAIT
>>>>> when it's not supported by host.  
>>>>
>>>> If "overcommit cpu-pm=on" and "-cpu host" are present, as shown in the
>>> it's better to provide the full QEMU CLI, the host/guest kernels used, and
>>> the hardware used if it's relevant, so others can reproduce the problem.
>>
>> I have reproduced this on an older Intel Icelake machine, a
>> Sapphire Rapids, and a Sierra Forest, but I believe this is a generic x86
>> issue, not specific to particular models.
>>
>> For the CLI, I think the only command line options that matter are
>>  -overcommit cpu-pm=on: to set enable_cpu_pm
>>  -cpu host: so that cpu->max_features is set
>>
>> For QEMU version, as long as it's after this commit: 662175b91ff2
>> ("i386: reorder call to cpu_exec_realizefn")
>>
>> The guest fails to boot:
>>
>> [ 24.825568] smpboot: x86: Booting SMP configuration:
>> [ 24.826377]  node #0, CPUs: #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12
>> #13 #14 #15 #17
>> [ 24.985799]  node #1, CPUs: #128 #129 #130 #131 #132 #133 #134 #135
>> #136 #137 #138 #139 #140 #141 #142 #143 #145
>> [ 25.136955] invalid opcode:  1 PREEMPT SMP NOPTI
>> [ 25.137790] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.8.0 #2
>> [ 25.137790] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
>> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/04
>> [ 25.137790] RIP: 0010:mwait_idle+0x35/0x80
>> [ 25.137790] Code: 6f f0 80 48 02 20 48 8b 10 83 e2 08 75 3e 65 48 8b 15
>> 47 d6 56 6f 48 0f ba e2 27 72 41 31 d2 48 89 d8
>> [ 25.137790] RSP: :91403e70 EFLAGS: 00010046
>> [ 25.137790] RAX: 9140a980 RBX: 9140a980 RCX:
>> 
>> [ 25.137790] RDX:  RSI: 97f1ade21b20 RDI:
>> 0004
>> [ 25.137790] RBP:  R08: 0005da4709cb R09:
>> 0001
>> [ 25.137790] R10: 5da4 R11: 0009 R12:
>> 
>> [ 25.137790] R13: 98573ff90fc0 R14: 9140a038 R15:
>> 00093ff0
>> [ 25.137790] FS: () GS:97f1ade0()
>> knlGS:
>> [ 25.137790] CS: 0010 DS:  ES:  CR0: 80050033
>> [ 25.137790] CR2: 97d8aa801000 CR3: 0049e9430001 CR4:
>> 00770ef0
>> [ 25.137790] DR0:  DR1:  DR2:
>> 
>> [ 25.137790] DR3:  DR6: 07f0 DR7:
>> 0400
>> [ 25.137790] PKRU: 5554
>> [ 25.137790] Call Trace:
>> [ 25.137790] 
>> [ 25.137790] ? die+0x37/0x90
>> [ 25.137790] ? do_trap+0xe3/0x110
>> [ 25.137790] ? mwait_idle+0x35/0x80
>> [ 25.137790] ? do_error_trap+0x6a/0x90
>> [ 25.137790] ? mwait_idle+0x35/0x80
>> [ 25.137790] ? exc_invalid_op+0x52/0x70
>> [ 25.137790] ? mwait_idle+0x35/0x80
>> [ 25.137790] ? asm_exc_invalid_op+0x1a/0x20
>> [ 25.137790] ? mwait_idle+0x35/0x80
>> [ 25.137790] default_idle_call+0x30/0x100
>> [ 25.137790] cpuidle_idle_call+0x12c/0x170
>> [ 25.137790] ? tsc_verify_tsc_adjust+0x73/0xd0
>> [ 25.137790] do_idle+0x7f/0xd0
>> [ 25.137790] cpu_startup_entry+0x29/0x30
>> [ 25.137790] rest_init+0xcc/0xd0
>> [ 25.137790] start_kernel+0x396/0x5d0
>> [ 25.137790] x86_64_start_reservations+0x18/0x30
>> [ 25.137790] x86_64_start_kernel+0xe7/0xf0
>> [ 25.137790] common_startup_64+0x13e/0x148
>> [ 25.137790] 
>> [ 25.137790] Modules linked in:
>> [ 25.137790] --[ end trace 0

Re: [PATCH V2 0/3] improve -overcommit cpu-pm=on|off

2024-05-30 Thread Sean Christopherson
On Thu, May 30, 2024, Igor Mammedov wrote:
> On Thu, 30 May 2024 21:54:47 +0800 Zhao Liu  wrote:

...

> > > >> following, CPUID_EXT_MONITOR is set after x86_cpu_filter_features(), so
> > > >> that it doesn't have a chance to check MWAIT against host features and
> > > >> will be advertised to the guest regardless of whether it's supported by
> > > >> the host or not.
> > > >>
> > > > >> x86_cpu_realizefn()
> > > > >>   x86_cpu_filter_features()
> > > > >>   cpu_exec_realizefn()
> > > > >>     kvm_cpu_realizefn
> > > > >>       host_cpu_realizefn
> > > > >>         host_cpu_enable_cpu_pm
> > > > >>           env->features[FEAT_1_ECX] |= CPUID_EXT_MONITOR;
> > > >>
> > > >>
> > > >> If it's not supported by the host, executing MONITOR or MWAIT
> > > >> instructions from the guest triggers #UD, no matter MWAIT_EXITING
> > > >> control is set or not.  
> > > > 
> > > > > If I recall correctly, KVM was able to emulate MWAIT/MONITOR.
> > > > > So the question is why it leads to an exception instead?

Because KVM doesn't emulate MONITOR/MWAIT on #UD.

> > > KVM can come into play only if it can trigger MWAIT/MONITOR VM exits. I
> > > didn't find explicit proof in the Intel SDM that #UD exceptions take
> > > precedence over MWAIT/MONITOR VM exits, but this is my speculation.

Yeah, typically #UD takes priority over VM-Exit interception checks.  AMD's APM
is much more explicit and states that all exceptions are checked on MONITOR/MWAIT
before the interception check.

> > > For example, on ancient machines which don't support MWAIT yet, the only
> > > thing the CPU can do is raise #UD, not take an MWAIT VM exit?

Not really relevant, because such CPUs wouldn't have MWAIT-exiting.

> > For the Host which doesn't support MWAIT, it shouldn't have the VMX
> > control bit for mwait exit either, right?
> > 
> > Could you pls check this on your machine? If VMX doesn't support this
> > exit event, then triggering an exception will make sense.
> 
> My assumption (probably wrong) was that KVM would emulate mwait if it's
> unavailable,

Nope.  In order to limit the attack surface of the emulator on modern CPUs, KVM
only emulates select instructions in response to a #UD.

But even if KVM did emulate MONITOR/MWAIT on #UD, this is inarguably a QEMU bug,
e.g. QEMU will effectively coerce the guest into using an idle-polling mechanism.
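
For readers following along: whether a host advertises MONITOR/MWAIT can be read from CPUID.01H:ECX bit 3, the bit QEMU names CPUID_EXT_MONITOR. The following is a minimal sketch, assuming a GCC or Clang toolchain that provides <cpuid.h>; the helper names (ecx_has_monitor, host_has_monitor, monitor_bit_self_check) are hypothetical and are not part of QEMU or KVM. It needs to run on the host, since inside a guest CPUID reports whatever the VMM advertises.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helper header providing __get_cpuid() */
#endif

/* CPUID.01H:ECX bit 3 = MONITOR/MWAIT (CPUID_EXT_MONITOR in QEMU). */
#define MONITOR_ECX_BIT (1U << 3)

/* Hypothetical helper: test the MONITOR bit in a raw ECX value. */
static int ecx_has_monitor(uint32_t ecx)
{
    return (ecx & MONITOR_ECX_BIT) != 0;
}

/*
 * Hypothetical helper: 1 if the CPU this runs on advertises MONITOR/MWAIT,
 * 0 if it does not, -1 if CPUID leaf 1 is unavailable or this is not x86.
 */
static int host_has_monitor(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        return -1;
    }
    return ecx_has_monitor(ecx);
#else
    return -1;
#endif
}

/* Small sanity check of the bit test itself (no hardware dependency). */
static void monitor_bit_self_check(void)
{
    assert(ecx_has_monitor(MONITOR_ECX_BIT) == 1);
    assert(ecx_has_monitor(0) == 0);
}
```

A caller would typically invoke host_has_monitor() once at startup and treat a result of 0 as "do not expose MWAIT to the guest". This only reflects what CPUID reports; it says nothing about the VMX MWAIT-exiting control discussed above.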



Re: [PATCH V2 0/3] improve -overcommit cpu-pm=on|off

2024-05-30 Thread Igor Mammedov
On Thu, 30 May 2024 21:54:47 +0800
Zhao Liu  wrote:

> Hi Zide,
> 
> On Wed, May 29, 2024 at 10:31:21AM -0700, Chen, Zide wrote:
> > Date: Wed, 29 May 2024 10:31:21 -0700
> > From: "Chen, Zide" 
> > Subject: Re: [PATCH V2 0/3] improve -overcommit cpu-pm=on|off
> > 
> > 
> > 
> > On 5/29/2024 5:46 AM, Igor Mammedov wrote:  
> > > On Tue, 28 May 2024 11:16:59 -0700
> > > "Chen, Zide"  wrote:
> > >   
> > >> On 5/28/2024 2:23 AM, Igor Mammedov wrote:  
> > >>> On Fri, 24 May 2024 13:00:14 -0700
> > >>> Zide Chen  wrote:
> > >>> 
> > >>>> Currently, if running "-overcommit cpu-pm=on" on hosts that don't
> > >>>> have MWAIT support, the MWAIT/MONITOR feature is advertised to the
> > >>>> guest and executing MWAIT/MONITOR on the guest triggers #UD.
> > >>>
> > >>> this is missing proper description how do you trigger issue
> > >>> with reproducer and detailed description why guest sees MWAIT
> > >>> when it's not supported by host.
> > >>
> > >> If "overcommit cpu-pm=on" and "-cpu host" are present, as shown in the  
> > > > it's better to provide the full QEMU CLI, the host/guest kernels used, and
> > > > the hardware used if it's relevant, so others can reproduce the problem.
> > > 
> > > I have reproduced this on an older Intel Icelake machine, a
> > > Sapphire Rapids, and a Sierra Forest, but I believe this is a generic x86
> > > issue, not specific to particular models.
> > 
> > For the CLI, I think the only command line options that matter are
> >  -overcommit cpu-pm=on: to set enable_cpu_pm
> >  -cpu host: so that cpu->max_features is set
> > 
> > For QEMU version, as long as it's after this commit: 662175b91ff2
> > ("i386: reorder call to cpu_exec_realizefn")
> > 
> > The guest fails to boot:
> > 
> > [ 24.825568] smpboot: x86: Booting SMP configuration:
> > [ 24.826377]  node #0, CPUs: #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12
> > #13 #14 #15 #17
> > [ 24.985799]  node #1, CPUs: #128 #129 #130 #131 #132 #133 #134 #135
> > #136 #137 #138 #139 #140 #141 #142 #143 #145
> > [ 25.136955] invalid opcode:  1 PREEMPT SMP NOPTI
> > [ 25.137790] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.8.0 #2
> > [ 25.137790] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
> > rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/04
> > [ 25.137790] RIP: 0010:mwait_idle+0x35/0x80
> > [ 25.137790] Code: 6f f0 80 48 02 20 48 8b 10 83 e2 08 75 3e 65 48 8b 15
> > 47 d6 56 6f 48 0f ba e2 27 72 41 31 d2 48 89 d8
> > [ 25.137790] RSP: :91403e70 EFLAGS: 00010046
> > [ 25.137790] RAX: 9140a980 RBX: 9140a980 RCX:
> > 
> > [ 25.137790] RDX:  RSI: 97f1ade21b20 RDI:
> > 0004
> > [ 25.137790] RBP:  R08: 0005da4709cb R09:
> > 0001
> > [ 25.137790] R10: 5da4 R11: 0009 R12:
> > 
> > [ 25.137790] R13: 98573ff90fc0 R14: 9140a038 R15:
> > 00093ff0
> > [ 25.137790] FS: () GS:97f1ade0()
> > knlGS:
> > [ 25.137790] CS: 0010 DS:  ES:  CR0: 80050033
> > [ 25.137790] CR2: 97d8aa801000 CR3: 0049e9430001 CR4:
> > 00770ef0
> > [ 25.137790] DR0:  DR1:  DR2:
> > 
> > [ 25.137790] DR3:  DR6: 07f0 DR7:
> > 0400
> > [ 25.137790] PKRU: 5554
> > [ 25.137790] Call Trace:
> > [ 25.137790] 
> > [ 25.137790] ? die+0x37/0x90
> > [ 25.137790] ? do_trap+0xe3/0x110
> > [ 25.137790] ? mwait_idle+0x35/0x80
> > [ 25.137790] ? do_error_trap+0x6a/0x90
> > [ 25.137790] ? mwait_idle+0x35/0x80
> > [ 25.137790] ? exc_invalid_op+0x52/0x70
> > [ 25.137790] ? mwait_idle+0x35/0x80
> > [ 25.137790] ? asm_exc_invalid_op+0x1a/0x20
> > [ 25.137790] ? mwait_idle+0x35/0x80
> > [ 25.137790] default_idle_call+0x30/0x100
> > [ 25.137790] cpuidle_idle_call+0x12c/0x170
> > [ 25.137790] ? tsc_verify_tsc_adjust+0x73/0xd0
> > [ 25.137790] do_idle+0x7f/0xd0
> > [ 25.137790] cpu_startup_entry+0x29/0x30
> > [ 25.137790] rest_init+0xcc/0xd0
> > [ 25.137790] start_kernel+0x396/0x5d0
> > [ 25.137790] x86_64_start_reservations+0x18/0x30
> > [ 25.137790] x86_64_start_kernel+0xe7/0xf0
> > 

Re: [PATCH V2 0/3] improve -overcommit cpu-pm=on|off

2024-05-30 Thread Zhao Liu
Hi Zide,

On Wed, May 29, 2024 at 10:31:21AM -0700, Chen, Zide wrote:
> Date: Wed, 29 May 2024 10:31:21 -0700
> From: "Chen, Zide" 
> Subject: Re: [PATCH V2 0/3] improve -overcommit cpu-pm=on|off
> 
> 
> 
> On 5/29/2024 5:46 AM, Igor Mammedov wrote:
> > On Tue, 28 May 2024 11:16:59 -0700
> > "Chen, Zide"  wrote:
> > 
> >> On 5/28/2024 2:23 AM, Igor Mammedov wrote:
> >>> On Fri, 24 May 2024 13:00:14 -0700
> >>> Zide Chen  wrote:
> >>>   
> >>>> Currently, if running "-overcommit cpu-pm=on" on hosts that don't
> >>>> have MWAIT support, the MWAIT/MONITOR feature is advertised to the
> >>>> guest and executing MWAIT/MONITOR on the guest triggers #UD.  
> >>>
> >>> this is missing proper description how do you trigger issue
> >>> with reproducer and detailed description why guest sees MWAIT
> >>> when it's not supported by host.  
> >>
> >> If "overcommit cpu-pm=on" and "-cpu host" are present, as shown in the
> > it's better to provide the full QEMU CLI, the host/guest kernels used, and
> > the hardware used if it's relevant, so others can reproduce the problem.
> 
> I have reproduced this on an older Intel Icelake machine, a
> Sapphire Rapids, and a Sierra Forest, but I believe this is a generic x86
> issue, not specific to particular models.
> 
> For the CLI, I think the only command line options that matter are
>  -overcommit cpu-pm=on: to set enable_cpu_pm
>  -cpu host: so that cpu->max_features is set
> 
> For QEMU version, as long as it's after this commit: 662175b91ff2
> ("i386: reorder call to cpu_exec_realizefn")
> 
> The guest fails to boot:
> 
> [ 24.825568] smpboot: x86: Booting SMP configuration:
> [ 24.826377]  node #0, CPUs: #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12
> #13 #14 #15 #17
> [ 24.985799]  node #1, CPUs: #128 #129 #130 #131 #132 #133 #134 #135
> #136 #137 #138 #139 #140 #141 #142 #143 #145
> [ 25.136955] invalid opcode:  1 PREEMPT SMP NOPTI
> [ 25.137790] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.8.0 #2
> [ 25.137790] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/04
> [ 25.137790] RIP: 0010:mwait_idle+0x35/0x80
> [ 25.137790] Code: 6f f0 80 48 02 20 48 8b 10 83 e2 08 75 3e 65 48 8b 15
> 47 d6 56 6f 48 0f ba e2 27 72 41 31 d2 48 89 d8
> [ 25.137790] RSP: :91403e70 EFLAGS: 00010046
> [ 25.137790] RAX: 9140a980 RBX: 9140a980 RCX:
> 
> [ 25.137790] RDX:  RSI: 97f1ade21b20 RDI:
> 0004
> [ 25.137790] RBP:  R08: 0005da4709cb R09:
> 0001
> [ 25.137790] R10: 5da4 R11: 0009 R12:
> 
> [ 25.137790] R13: 98573ff90fc0 R14: 9140a038 R15:
> 00093ff0
> [ 25.137790] FS: () GS:97f1ade0()
> knlGS:
> [ 25.137790] CS: 0010 DS:  ES:  CR0: 80050033
> [ 25.137790] CR2: 97d8aa801000 CR3: 0049e9430001 CR4:
> 00770ef0
> [ 25.137790] DR0:  DR1:  DR2:
> 
> [ 25.137790] DR3:  DR6: 07f0 DR7:
> 0400
> [ 25.137790] PKRU: 5554
> [ 25.137790] Call Trace:
> [ 25.137790] 
> [ 25.137790] ? die+0x37/0x90
> [ 25.137790] ? do_trap+0xe3/0x110
> [ 25.137790] ? mwait_idle+0x35/0x80
> [ 25.137790] ? do_error_trap+0x6a/0x90
> [ 25.137790] ? mwait_idle+0x35/0x80
> [ 25.137790] ? exc_invalid_op+0x52/0x70
> [ 25.137790] ? mwait_idle+0x35/0x80
> [ 25.137790] ? asm_exc_invalid_op+0x1a/0x20
> [ 25.137790] ? mwait_idle+0x35/0x80
> [ 25.137790] default_idle_call+0x30/0x100
> [ 25.137790] cpuidle_idle_call+0x12c/0x170
> [ 25.137790] ? tsc_verify_tsc_adjust+0x73/0xd0
> [ 25.137790] do_idle+0x7f/0xd0
> [ 25.137790] cpu_startup_entry+0x29/0x30
> [ 25.137790] rest_init+0xcc/0xd0
> [ 25.137790] start_kernel+0x396/0x5d0
> [ 25.137790] x86_64_start_reservations+0x18/0x30
> [ 25.137790] x86_64_start_kernel+0xe7/0xf0
> [ 25.137790] common_startup_64+0x13e/0x148
> [ 25.137790] 
> [ 25.137790] Modules linked in:
> [ 25.137790] --[ end trace  ]--
> [ 25.137790] invalid opcode:  2 PREEMPT SMP NOPTI
> [ 25.137790] RIP: 0010:mwait_idle+0x35/0x80
> [ 25.137790] Code: 6f f0 80 48 02 20 48 8b 10 83 e2 08 75 3e 65 48 8b 15
> 47 d6 56 6f 48 0f ba e2 27 72 41 31 d2 48 89 d8
> 
> > 
> >> following, CPUID_EXT_MONITOR is set after x86_cpu_filter_features(), so
> >> that it doesn't have a chance to check 

Re: [PATCH V2 0/3] improve -overcommit cpu-pm=on|off

2024-05-29 Thread Chen, Zide



On 5/29/2024 5:46 AM, Igor Mammedov wrote:
> On Tue, 28 May 2024 11:16:59 -0700
> "Chen, Zide"  wrote:
> 
>> On 5/28/2024 2:23 AM, Igor Mammedov wrote:
>>> On Fri, 24 May 2024 13:00:14 -0700
>>> Zide Chen  wrote:
>>>   
>>>> Currently, if running "-overcommit cpu-pm=on" on hosts that don't
>>>> have MWAIT support, the MWAIT/MONITOR feature is advertised to the
>>>> guest and executing MWAIT/MONITOR on the guest triggers #UD.
>>>
>>> this is missing proper description how do you trigger issue
>>> with reproducer and detailed description why guest sees MWAIT
>>> when it's not supported by host.  
>>
>> If "overcommit cpu-pm=on" and "-cpu host" are present, as shown in the
> it's better to provide the full QEMU CLI, the host/guest kernels used, and
> the hardware used if it's relevant, so others can reproduce the problem.

I have reproduced this on an older Intel Icelake machine, a
Sapphire Rapids, and a Sierra Forest, but I believe this is a generic x86
issue, not specific to particular models.

For the CLI, I think the only command line options that matter are
 -overcommit cpu-pm=on: to set enable_cpu_pm
 -cpu host: so that cpu->max_features is set

For QEMU version, as long as it's after this commit: 662175b91ff2
("i386: reorder call to cpu_exec_realizefn")

The guest fails to boot:

[ 24.825568] smpboot: x86: Booting SMP configuration:
[ 24.826377]  node #0, CPUs: #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12
#13 #14 #15 #17
[ 24.985799]  node #1, CPUs: #128 #129 #130 #131 #132 #133 #134 #135
#136 #137 #138 #139 #140 #141 #142 #143 #145
[ 25.136955] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ 25.137790] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.8.0 #2
[ 25.137790] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/04
[ 25.137790] RIP: 0010:mwait_idle+0x35/0x80
[ 25.137790] Code: 6f f0 80 48 02 20 48 8b 10 83 e2 08 75 3e 65 48 8b 15
47 d6 56 6f 48 0f ba e2 27 72 41 31 d2 48 89 d8
[ 25.137790] RSP: :91403e70 EFLAGS: 00010046
[ 25.137790] RAX: 9140a980 RBX: 9140a980 RCX:

[ 25.137790] RDX:  RSI: 97f1ade21b20 RDI:
0004
[ 25.137790] RBP:  R08: 0005da4709cb R09:
0001
[ 25.137790] R10: 5da4 R11: 0009 R12:

[ 25.137790] R13: 98573ff90fc0 R14: 9140a038 R15:
00093ff0
[ 25.137790] FS: () GS:97f1ade0()
knlGS:
[ 25.137790] CS: 0010 DS:  ES:  CR0: 80050033
[ 25.137790] CR2: 97d8aa801000 CR3: 0049e9430001 CR4:
00770ef0
[ 25.137790] DR0:  DR1:  DR2:

[ 25.137790] DR3:  DR6: 07f0 DR7:
0400
[ 25.137790] PKRU: 5554
[ 25.137790] Call Trace:
[ 25.137790] 
[ 25.137790] ? die+0x37/0x90
[ 25.137790] ? do_trap+0xe3/0x110
[ 25.137790] ? mwait_idle+0x35/0x80
[ 25.137790] ? do_error_trap+0x6a/0x90
[ 25.137790] ? mwait_idle+0x35/0x80
[ 25.137790] ? exc_invalid_op+0x52/0x70
[ 25.137790] ? mwait_idle+0x35/0x80
[ 25.137790] ? asm_exc_invalid_op+0x1a/0x20
[ 25.137790] ? mwait_idle+0x35/0x80
[ 25.137790] default_idle_call+0x30/0x100
[ 25.137790] cpuidle_idle_call+0x12c/0x170
[ 25.137790] ? tsc_verify_tsc_adjust+0x73/0xd0
[ 25.137790] do_idle+0x7f/0xd0
[ 25.137790] cpu_startup_entry+0x29/0x30
[ 25.137790] rest_init+0xcc/0xd0
[ 25.137790] start_kernel+0x396/0x5d0
[ 25.137790] x86_64_start_reservations+0x18/0x30
[ 25.137790] x86_64_start_kernel+0xe7/0xf0
[ 25.137790] common_startup_64+0x13e/0x148
[ 25.137790] 
[ 25.137790] Modules linked in:
[ 25.137790] ---[ end trace 0000000000000000 ]---
[ 25.137790] invalid opcode: 0000 [#2] PREEMPT SMP NOPTI
[ 25.137790] RIP: 0010:mwait_idle+0x35/0x80
[ 25.137790] Code: 6f f0 80 48 02 20 48 8b 10 83 e2 08 75 3e 65 48 8b 15
47 d6 56 6f 48 0f ba e2 27 72 41 31 d2 48 89 d8

> 
>> following, CPUID_EXT_MONITOR is set after x86_cpu_filter_features(), so
>> that it doesn't have a chance to check MWAIT against host features and
>> will be advertised to the guest regardless of whether it's supported by
>> the host or not.
>>
>> x86_cpu_realizefn()
>>   x86_cpu_filter_features()
>>   cpu_exec_realizefn()
>>     kvm_cpu_realizefn
>>       host_cpu_realizefn
>>         host_cpu_enable_cpu_pm
>>           env->features[FEAT_1_ECX] |= CPUID_EXT_MONITOR;
>>
>>
>> If it's not supported by the host, executing MONITOR or MWAIT
>> instructions from the guest triggers #UD, no matter MWAIT_EXITING
>> control is set or not.
> 
> If I recall correctly, KVM was able to emulate MWAIT/MONITOR.
> So the question is why it leads to an exception instead?

KVM can come into play only if it can trigger MWAIT/MONITOR VM exits. I
didn't find explicit proof in the Intel SDM that #UD exceptions take
precedence over MWAIT/MONITOR VM exits, but this is my speculation. For
example, in ancient machines which don't support MWAIT yet, the only way
it can do is #UD, not MWAIT V

Re: [PATCH V2 0/3] improve -overcommit cpu-pm=on|off

2024-05-29 Thread Igor Mammedov
On Tue, 28 May 2024 11:16:59 -0700
"Chen, Zide"  wrote:

> On 5/28/2024 2:23 AM, Igor Mammedov wrote:
> > On Fri, 24 May 2024 13:00:14 -0700
> > Zide Chen  wrote:
> >   
> >> Currently, if running "-overcommit cpu-pm=on" on hosts that don't
> >> have MWAIT support, the MWAIT/MONITOR feature is advertised to the
> >> guest and executing MWAIT/MONITOR on the guest triggers #UD.  
> > 
> > this is missing proper description how do you trigger issue
> > with reproducer and detailed description why guest sees MWAIT
> > when it's not supported by host.  
> 
> If "overcommit cpu-pm=on" and "-cpu host" are present, as shown in the
it's better to provide the full QEMU CLI, the host/guest kernels used, and the
hardware used if it's relevant, so others can reproduce the problem.

> following, CPUID_EXT_MONITOR is set after x86_cpu_filter_features(), so
> that it doesn't have a chance to check MWAIT against host features and
> will be advertised to the guest regardless of whether it's supported by
> the host or not.
> 
> x86_cpu_realizefn()
>   x86_cpu_filter_features()
>   cpu_exec_realizefn()
>     kvm_cpu_realizefn
>       host_cpu_realizefn
>         host_cpu_enable_cpu_pm
>           env->features[FEAT_1_ECX] |= CPUID_EXT_MONITOR;
> 
> 
> If it's not supported by the host, executing MONITOR or MWAIT
> instructions from the guest triggers #UD, no matter MWAIT_EXITING
> control is set or not.

If I recall correctly, KVM was able to emulate MWAIT/MONITOR.
So the question is why it leads to an exception instead?




Re: [PATCH V2 0/3] improve -overcommit cpu-pm=on|off

2024-05-28 Thread Chen, Zide



On 5/28/2024 2:23 AM, Igor Mammedov wrote:
> On Fri, 24 May 2024 13:00:14 -0700
> Zide Chen  wrote:
> 
>> Currently, if running "-overcommit cpu-pm=on" on hosts that don't
>> have MWAIT support, the MWAIT/MONITOR feature is advertised to the
>> guest and executing MWAIT/MONITOR on the guest triggers #UD.
> 
> this is missing proper description how do you trigger issue
> with reproducer and detailed description why guest sees MWAIT
> when it's not supported by host.

If "-overcommit cpu-pm=on" and "-cpu host" are present, then, as shown in the
following call chain, CPUID_EXT_MONITOR is set after x86_cpu_filter_features()
has run. The filter therefore never gets a chance to check MWAIT against the
host features, and MWAIT is advertised to the guest regardless of whether the
host supports it.

x86_cpu_realizefn()
  x86_cpu_filter_features()
  cpu_exec_realizefn()
    kvm_cpu_realizefn
      host_cpu_realizefn
        host_cpu_enable_cpu_pm
          env->features[FEAT_1_ECX] |= CPUID_EXT_MONITOR;


If the host does not support it, executing the MONITOR or MWAIT
instruction in the guest triggers #UD, regardless of whether the
MWAIT_EXITING control is set.
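
The ordering problem described above can be illustrated with a toy model. This is only a sketch, not QEMU code: the toy_* functions are hypothetical stand-ins for x86_cpu_filter_features() and host_cpu_enable_cpu_pm(), and the feature word is reduced to a single 32-bit ECX value. It contrasts the current order (filter, then set the bit) with the order that patch 2/3 of this series aims for (set the bit, then filter).

```c
#include <assert.h>
#include <stdint.h>

/* CPUID.01H:ECX bit 3 = MONITOR/MWAIT (CPUID_EXT_MONITOR in QEMU). */
#define TOY_CPUID_EXT_MONITOR (1U << 3)

/* Hypothetical stand-in for x86_cpu_filter_features():
 * drop every guest feature bit the host does not have. */
static uint32_t toy_filter_features(uint32_t guest_ecx, uint32_t host_ecx)
{
    return guest_ecx & host_ecx;
}

/* Hypothetical stand-in for host_cpu_enable_cpu_pm():
 * unconditionally request MONITOR/MWAIT for the guest. */
static uint32_t toy_enable_cpu_pm(uint32_t guest_ecx)
{
    return guest_ecx | TOY_CPUID_EXT_MONITOR;
}

/* Order described in the report: filter first, then set the bit,
 * so the bit is never checked against the host. */
static uint32_t toy_realize_buggy(uint32_t guest_ecx, uint32_t host_ecx)
{
    guest_ecx = toy_filter_features(guest_ecx, host_ecx);
    return toy_enable_cpu_pm(guest_ecx);
}

/* Reordered variant: set the bit first so the filter can drop it. */
static uint32_t toy_realize_fixed(uint32_t guest_ecx, uint32_t host_ecx)
{
    guest_ecx = toy_enable_cpu_pm(guest_ecx);
    return toy_filter_features(guest_ecx, host_ecx);
}

/* Sanity check for a host whose ECX lacks the MONITOR bit. */
static void toy_self_check(void)
{
    const uint32_t host_without_mwait = 0;

    /* Buggy order: the guest still sees MONITOR. */
    assert(toy_realize_buggy(0, host_without_mwait) & TOY_CPUID_EXT_MONITOR);
    /* Fixed order: the filter removes it. */
    assert(!(toy_realize_fixed(0, host_without_mwait) & TOY_CPUID_EXT_MONITOR));
}
```

With a host that does have the bit, both orders leave MONITOR set, which matches the observation that the failure only shows up on hosts without MWAIT support.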



Re: [PATCH V2 0/3] improve -overcommit cpu-pm=on|off

2024-05-28 Thread Igor Mammedov
On Fri, 24 May 2024 13:00:14 -0700
Zide Chen  wrote:

> Currently, if running "-overcommit cpu-pm=on" on hosts that don't
> have MWAIT support, the MWAIT/MONITOR feature is advertised to the
> guest and executing MWAIT/MONITOR on the guest triggers #UD.

This is missing a proper description of how to trigger the issue, with a
reproducer, and a detailed description of why the guest sees MWAIT
when it's not supported by the host.

> 
> V2:
> - [PATCH 1]: took Thomas' suggestion for more generic fix
> - [PATCH 2/3]: no changes
> 
> Zide Chen (3):
>   vl: Allow multiple -overcommit commands
>   target/i386: call cpu_exec_realizefn before x86_cpu_filter_features
>   target/i386: Move host_cpu_enable_cpu_pm into kvm_cpu_realizefn()
> 
>  system/vl.c   |  4 ++--
>  target/i386/cpu.c | 24 
>  target/i386/host-cpu.c| 12 
>  target/i386/kvm/kvm-cpu.c | 12 +---
>  4 files changed, 23 insertions(+), 29 deletions(-)
> 




[PATCH V2 0/3] improve -overcommit cpu-pm=on|off

2024-05-24 Thread Zide Chen
Currently, if running "-overcommit cpu-pm=on" on hosts that don't
have MWAIT support, the MWAIT/MONITOR feature is advertised to the
guest and executing MWAIT/MONITOR on the guest triggers #UD.

V2:
- [PATCH 1]: took Thomas' suggestion for more generic fix
- [PATCH 2/3]: no changes

Zide Chen (3):
  vl: Allow multiple -overcommit commands
  target/i386: call cpu_exec_realizefn before x86_cpu_filter_features
  target/i386: Move host_cpu_enable_cpu_pm into kvm_cpu_realizefn()

 system/vl.c   |  4 ++--
 target/i386/cpu.c | 24 
 target/i386/host-cpu.c| 12 
 target/i386/kvm/kvm-cpu.c | 12 +---
 4 files changed, 23 insertions(+), 29 deletions(-)

-- 
2.34.1