Re: [PATCH V3 0/2] improve -overcommit cpu-pm=on|off

2024-06-05 Thread Chen, Zide



On 6/5/2024 6:49 AM, Igor Mammedov wrote:
> On Mon,  3 Jun 2024 17:02:20 -0700
> Zide Chen  wrote:
> 
>> Currently, if running "-overcommit cpu-pm=on" on hosts that don't
>> have MWAIT support, the MWAIT/MONITOR feature is advertised to the
>> guest and executing MWAIT/MONITOR on the guest triggers #UD.
>>
>> Typically #UD takes priority over VM-Exit interception checks and
>> KVM doesn't emulate MONITOR/MWAIT. This causes the guest fail to
>> boot.
>>
>> V2:
>> - [PATCH 1]: took Thomas' suggestion for more generic fix
>> - [PATCH 2/3]: no changes
>>
>> V3:
>> - dropped [PATCH 1/3]. Took the simpler approach not to re-order
>>   cpu_exec_realizefn() call.
>> - changed patch title in [PATCH V3 1/2]
>> - don't set CPUID_EXT_MONITOR in kvm_cpu_realizefn() 
> 
> on top of above we should make make
>   -overcommit cpu-pm=on
> to error out if KVM_X86_DISABLE_EXITS_MWAIT is not supported/failed
> 
> if we don't do this user gets false assumption that cpu-pm=on
> works as expected, and instead of effective CPU usage/IPI delivery
> all they get is a storm of mwait exits.
> 

Agree. But seems it's quite complicated. Please refer to the comments on
[PATCH V2 2/3].

We may remove the "-overcommit cpu-pm", and similar to notify-vmexit,
add individual -accel options for flexibility and better cpu-pm control.
But will be over complicated?

KVM_X86_DISABLE_EXITS_MWAIT
KVM_X86_DISABLE_EXITS_HLT
KVM_X86_DISABLE_EXITS_PAUSE
KVM_X86_DISABLE_EXITS_CSTATE


>>
>> Zide Chen (2):
>>   vl: Allow multiple -overcommit commands
>>   target/i386: Advertise MWAIT iff host supports
>>
>>  system/vl.c   |  4 ++--
>>  target/i386/host-cpu.c| 12 
>>  target/i386/kvm/kvm-cpu.c | 11 +--
>>  3 files changed, 11 insertions(+), 16 deletions(-)
>>
> 



Re: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before x86_cpu_filter_features

2024-06-05 Thread Chen, Zide



On 6/5/2024 8:07 AM, Igor Mammedov wrote:
> On Mon, 3 Jun 2024 14:29:50 -0700
> "Chen, Zide"  wrote:
> 
>> On 6/3/2024 2:30 AM, Igor Mammedov wrote:
>>> On Sat, 1 Jun 2024 23:26:55 +0800
>>> Zhao Liu  wrote:
>>>   
>>>> On Fri, May 31, 2024 at 10:13:47AM -0700, Chen, Zide wrote:  
>>>>> Date: Fri, 31 May 2024 10:13:47 -0700
>>>>> From: "Chen, Zide" 
>>>>> Subject: Re: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before
>>>>>  x86_cpu_filter_features
>>>>>
>>>>> On 5/30/2024 11:30 PM, Zhao Liu wrote:
>>>>>> Hi Zide,
>>>>>>
>>>>>> On Fri, May 24, 2024 at 01:00:16PM -0700, Zide Chen wrote:
>>>>>>> Date: Fri, 24 May 2024 13:00:16 -0700
>>>>>>> From: Zide Chen 
>>>>>>> Subject: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before
>>>>>>>  x86_cpu_filter_features
>>>>>>> X-Mailer: git-send-email 2.34.1
>>>>>>>
>>>>>>> cpu_exec_realizefn which calls the accel-specific realizefn may expand
>>>>>>> features.  e.g., some accel-specific options may require extra features
>>>>>>> to be enabled, and it's appropriate to expand these features in accel-
>>>>>>> specific realizefn.
>>>>>>>
>>>>>>> One such example is the cpu-pm option, which may add CPUID_EXT_MONITOR.
>>>>>>>
>>>>>>> Thus, call cpu_exec_realizefn before x86_cpu_filter_features to ensure
>>>>>>> that it won't expose features not supported by the host.
>>>>>>>
>>>>>>> Fixes: 662175b91ff2 ("i386: reorder call to cpu_exec_realizefn")
>>>>>>> Suggested-by: Xiaoyao Li 
>>>>>>> Signed-off-by: Zide Chen 
>>>>>>> ---
>>>>>>>  target/i386/cpu.c | 24 
>>>>>>>  target/i386/kvm/kvm-cpu.c |  1 -
>>>>>>>  2 files changed, 12 insertions(+), 13 deletions(-)
>>>>>>>
>>>>>>> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
>>>>>>> index bc2dceb647fa..a1c1c785bd2f 100644
>>>>>>> --- a/target/i386/cpu.c
>>>>>>> +++ b/target/i386/cpu.c
>>>>>>> @@ -7604,6 +7604,18 @@ static void x86_cpu_realizefn(DeviceState *dev, 
>>>>>>> Error **errp)
>>>>>>>  }
>>>>>>>  }
>>>>>>>  
>>>>>>> +/*
>>>>>>> + * note: the call to the framework needs to happen after feature 
>>>>>>> expansion,
>>>>>>> + * but before the checks/modifications to ucode_rev, mwait, 
>>>>>>> phys_bits.
>>>>>>> + * These may be set by the accel-specific code,
>>>>>>> + * and the results are subsequently checked / assumed in this 
>>>>>>> function.
>>>>>>> + */
>>>>>>> +cpu_exec_realizefn(cs, _err);
>>>>>>> +if (local_err != NULL) {
>>>>>>> +error_propagate(errp, local_err);
>>>>>>> +return;
>>>>>>> +}
>>>>>>> +
>>>>>>>  x86_cpu_filter_features(cpu, cpu->check_cpuid || 
>>>>>>> cpu->enforce_cpuid);
>>>>>>
>>>>>> For your case, which sets cpu-pm=on via overcommit, then
>>>>>> x86_cpu_filter_features() will complain that mwait is not supported.
>>>>>>
>>>>>> Such warning is not necessary, because the purpose of overcommit (from
>>>>>> code) is only to support mwait when possible, not to commit to support
>>>>>> mwait in Guest.
>>>>>>
>>>>>> Additionally, I understand x86_cpu_filter_features() is primarily
>>>>>> intended to filter features configured by the user, 
>>>>>
>>>>> Yes, that's why this patches intends to let x86_cpu_filter_features()
>>>>> filter out the MWAIT bit which is set from the overcommit option.
>>>>
>>>> HMM, but in fact x86_cpu_filter_features() has already checked the MWAIT
>>>> bit set by "-overcommit cpu-pm=on". ;-)
>>>>
>&

Re: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before x86_cpu_filter_features

2024-06-03 Thread Chen, Zide



On 6/3/2024 2:30 AM, Igor Mammedov wrote:
> On Sat, 1 Jun 2024 23:26:55 +0800
> Zhao Liu  wrote:
> 
>> On Fri, May 31, 2024 at 10:13:47AM -0700, Chen, Zide wrote:
>>> Date: Fri, 31 May 2024 10:13:47 -0700
>>> From: "Chen, Zide" 
>>> Subject: Re: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before
>>>  x86_cpu_filter_features
>>>
>>> On 5/30/2024 11:30 PM, Zhao Liu wrote:  
>>>> Hi Zide,
>>>>
>>>> On Fri, May 24, 2024 at 01:00:16PM -0700, Zide Chen wrote:  
>>>>> Date: Fri, 24 May 2024 13:00:16 -0700
>>>>> From: Zide Chen 
>>>>> Subject: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before
>>>>>  x86_cpu_filter_features
>>>>> X-Mailer: git-send-email 2.34.1
>>>>>
>>>>> cpu_exec_realizefn which calls the accel-specific realizefn may expand
>>>>> features.  e.g., some accel-specific options may require extra features
>>>>> to be enabled, and it's appropriate to expand these features in accel-
>>>>> specific realizefn.
>>>>>
>>>>> One such example is the cpu-pm option, which may add CPUID_EXT_MONITOR.
>>>>>
>>>>> Thus, call cpu_exec_realizefn before x86_cpu_filter_features to ensure
>>>>> that it won't expose features not supported by the host.
>>>>>
>>>>> Fixes: 662175b91ff2 ("i386: reorder call to cpu_exec_realizefn")
>>>>> Suggested-by: Xiaoyao Li 
>>>>> Signed-off-by: Zide Chen 
>>>>> ---
>>>>>  target/i386/cpu.c | 24 
>>>>>  target/i386/kvm/kvm-cpu.c |  1 -
>>>>>  2 files changed, 12 insertions(+), 13 deletions(-)
>>>>>
>>>>> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
>>>>> index bc2dceb647fa..a1c1c785bd2f 100644
>>>>> --- a/target/i386/cpu.c
>>>>> +++ b/target/i386/cpu.c
>>>>> @@ -7604,6 +7604,18 @@ static void x86_cpu_realizefn(DeviceState *dev, 
>>>>> Error **errp)
>>>>>  }
>>>>>  }
>>>>>  
>>>>> +/*
>>>>> + * note: the call to the framework needs to happen after feature 
>>>>> expansion,
>>>>> + * but before the checks/modifications to ucode_rev, mwait, 
>>>>> phys_bits.
>>>>> + * These may be set by the accel-specific code,
>>>>> + * and the results are subsequently checked / assumed in this 
>>>>> function.
>>>>> + */
>>>>> +cpu_exec_realizefn(cs, _err);
>>>>> +if (local_err != NULL) {
>>>>> +error_propagate(errp, local_err);
>>>>> +return;
>>>>> +}
>>>>> +
>>>>>  x86_cpu_filter_features(cpu, cpu->check_cpuid || 
>>>>> cpu->enforce_cpuid);  
>>>>
>>>> For your case, which sets cpu-pm=on via overcommit, then
>>>> x86_cpu_filter_features() will complain that mwait is not supported.
>>>>
>>>> Such warning is not necessary, because the purpose of overcommit (from
>>>> code) is only to support mwait when possible, not to commit to support
>>>> mwait in Guest.
>>>>
>>>> Additionally, I understand x86_cpu_filter_features() is primarily
>>>> intended to filter features configured by the user,   
>>>
>>> Yes, that's why this patches intends to let x86_cpu_filter_features()
>>> filter out the MWAIT bit which is set from the overcommit option.  
>>
>> HMM, but in fact x86_cpu_filter_features() has already checked the MWAIT
>> bit set by "-overcommit cpu-pm=on". ;-)
>>
>> (Pls correct me if I'm wrong) Revisiting what cpu-pm did to MWAIT:
>> * Firstly, it set MWAIT bit in x86_cpu_expand_features():
>>   x86_cpu_expand_features()
>>  -> x86_cpu_get_supported_feature_word()
>> -> kvm_arch_get_supported_cpuid()  
>>  This MWAIT is based on Host's MWAIT capability. This MWAIT enablement
>>  is fine for next x86_cpu_filter_features() and x86_cpu_filter_features()
>>  is working correctly here!
>>
>> * Then, MWAIT was secondly set in host_cpu_enable_cpu_pm() regardless
>>   neither Host's support or previous MWAIT enablement result. This is
>>   the root cause of your issue.
>>
>> Therefore, we should make cpu-pm honor his first MWAIT enableme

Re: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before x86_cpu_filter_features

2024-06-03 Thread Chen, Zide



On 6/1/2024 8:26 AM, Zhao Liu wrote:
> On Fri, May 31, 2024 at 10:13:47AM -0700, Chen, Zide wrote:
>> Date: Fri, 31 May 2024 10:13:47 -0700
>> From: "Chen, Zide" 
>> Subject: Re: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before
>>  x86_cpu_filter_features
>>
>> On 5/30/2024 11:30 PM, Zhao Liu wrote:
>>> Hi Zide,
>>>
>>> On Fri, May 24, 2024 at 01:00:16PM -0700, Zide Chen wrote:
>>>> Date: Fri, 24 May 2024 13:00:16 -0700
>>>> From: Zide Chen 
>>>> Subject: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before
>>>>  x86_cpu_filter_features
>>>> X-Mailer: git-send-email 2.34.1
>>>>
>>>> cpu_exec_realizefn which calls the accel-specific realizefn may expand
>>>> features.  e.g., some accel-specific options may require extra features
>>>> to be enabled, and it's appropriate to expand these features in accel-
>>>> specific realizefn.
>>>>
>>>> One such example is the cpu-pm option, which may add CPUID_EXT_MONITOR.
>>>>
>>>> Thus, call cpu_exec_realizefn before x86_cpu_filter_features to ensure
>>>> that it won't expose features not supported by the host.
>>>>
>>>> Fixes: 662175b91ff2 ("i386: reorder call to cpu_exec_realizefn")
>>>> Suggested-by: Xiaoyao Li 
>>>> Signed-off-by: Zide Chen 
>>>> ---
>>>>  target/i386/cpu.c | 24 
>>>>  target/i386/kvm/kvm-cpu.c |  1 -
>>>>  2 files changed, 12 insertions(+), 13 deletions(-)
>>>>
>>>> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
>>>> index bc2dceb647fa..a1c1c785bd2f 100644
>>>> --- a/target/i386/cpu.c
>>>> +++ b/target/i386/cpu.c
>>>> @@ -7604,6 +7604,18 @@ static void x86_cpu_realizefn(DeviceState *dev, 
>>>> Error **errp)
>>>>  }
>>>>  }
>>>>  
>>>> +/*
>>>> + * note: the call to the framework needs to happen after feature 
>>>> expansion,
>>>> + * but before the checks/modifications to ucode_rev, mwait, phys_bits.
>>>> + * These may be set by the accel-specific code,
>>>> + * and the results are subsequently checked / assumed in this 
>>>> function.
>>>> + */
>>>> +cpu_exec_realizefn(cs, _err);
>>>> +if (local_err != NULL) {
>>>> +error_propagate(errp, local_err);
>>>> +return;
>>>> +}
>>>> +
>>>>  x86_cpu_filter_features(cpu, cpu->check_cpuid || cpu->enforce_cpuid);
>>>
>>> For your case, which sets cpu-pm=on via overcommit, then
>>> x86_cpu_filter_features() will complain that mwait is not supported.
>>>
>>> Such warning is not necessary, because the purpose of overcommit (from
>>> code) is only to support mwait when possible, not to commit to support
>>> mwait in Guest.
>>>
>>> Additionally, I understand x86_cpu_filter_features() is primarily
>>> intended to filter features configured by the user, 
>>
>> Yes, that's why this patches intends to let x86_cpu_filter_features()
>> filter out the MWAIT bit which is set from the overcommit option.
> 
> HMM, but in fact x86_cpu_filter_features() has already checked the MWAIT
> bit set by "-overcommit cpu-pm=on". ;-)
> 
> (Pls correct me if I'm wrong) Revisiting what cpu-pm did to MWAIT:
> * Firstly, it set MWAIT bit in x86_cpu_expand_features():
>   x86_cpu_expand_features()
>  -> x86_cpu_get_supported_feature_word()
> -> kvm_arch_get_supported_cpuid()
>  This MWAIT is based on Host's MWAIT capability. This MWAIT enablement
>  is fine for next x86_cpu_filter_features() and x86_cpu_filter_features()
>  is working correctly here!
> 
> * Then, MWAIT was secondly set in host_cpu_enable_cpu_pm() regardless
>   neither Host's support or previous MWAIT enablement result. This is
>   the root cause of your issue.
> 
> Therefore, we should make cpu-pm honor his first MWAIT enablement result
> instead of repeatly and unconditionally setting the MWAIT bit again in
> host_cpu_enable_cpu_pm().

Yes, we don't need to set CPUID_EXT_MONITOR in host_cpu_enable_cpu_pm().

I'll drop this patch though I still think it makes sense to reorder
cpu_exec_realizefn () call.


> 
> Additionally, I think the code in x86_cpu_realizefn():
>   cpu->mwait.ecx |= CPUID_MWAIT_EMX | CPUID_MWAIT_IBE;
> has the similar issue because it also should check MWAIT

Re: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before x86_cpu_filter_features

2024-05-31 Thread Chen, Zide



On 5/30/2024 11:30 PM, Zhao Liu wrote:
> Hi Zide,
> 
> On Fri, May 24, 2024 at 01:00:16PM -0700, Zide Chen wrote:
>> Date: Fri, 24 May 2024 13:00:16 -0700
>> From: Zide Chen 
>> Subject: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before
>>  x86_cpu_filter_features
>> X-Mailer: git-send-email 2.34.1
>>
>> cpu_exec_realizefn which calls the accel-specific realizefn may expand
>> features.  e.g., some accel-specific options may require extra features
>> to be enabled, and it's appropriate to expand these features in accel-
>> specific realizefn.
>>
>> One such example is the cpu-pm option, which may add CPUID_EXT_MONITOR.
>>
>> Thus, call cpu_exec_realizefn before x86_cpu_filter_features to ensure
>> that it won't expose features not supported by the host.
>>
>> Fixes: 662175b91ff2 ("i386: reorder call to cpu_exec_realizefn")
>> Suggested-by: Xiaoyao Li 
>> Signed-off-by: Zide Chen 
>> ---
>>  target/i386/cpu.c | 24 
>>  target/i386/kvm/kvm-cpu.c |  1 -
>>  2 files changed, 12 insertions(+), 13 deletions(-)
>>
>> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
>> index bc2dceb647fa..a1c1c785bd2f 100644
>> --- a/target/i386/cpu.c
>> +++ b/target/i386/cpu.c
>> @@ -7604,6 +7604,18 @@ static void x86_cpu_realizefn(DeviceState *dev, Error 
>> **errp)
>>  }
>>  }
>>  
>> +/*
>> + * note: the call to the framework needs to happen after feature 
>> expansion,
>> + * but before the checks/modifications to ucode_rev, mwait, phys_bits.
>> + * These may be set by the accel-specific code,
>> + * and the results are subsequently checked / assumed in this function.
>> + */
>> +cpu_exec_realizefn(cs, _err);
>> +if (local_err != NULL) {
>> +error_propagate(errp, local_err);
>> +return;
>> +}
>> +
>>  x86_cpu_filter_features(cpu, cpu->check_cpuid || cpu->enforce_cpuid);
> 
> For your case, which sets cpu-pm=on via overcommit, then
> x86_cpu_filter_features() will complain that mwait is not supported.
> 
> Such warning is not necessary, because the purpose of overcommit (from
> code) is only to support mwait when possible, not to commit to support
> mwait in Guest.
> 
> Additionally, I understand x86_cpu_filter_features() is primarily
> intended to filter features configured by the user, 

Yes, that's why this patches intends to let x86_cpu_filter_features()
filter out the MWAIT bit which is set from the overcommit option.

> and the changes of
> CPUID after x86_cpu_filter_features() should by default be regarded like
> "QEMU knows what it is doing".

Sure, we can add feature bits after x86_cpu_filter_features(), but I
think moving cpu_exec_realizefn() before x86_cpu_filter_features() is
more generic, and actually this is what QEMU did before commit 662175b91ff2.

- Less redundant code. Specifically, no need to call
x86_cpu_get_supported_feature_word() again.
- Potentially there could be other features could be added from the
accel-specific realizefn, kvm_cpu_realizefn() for example.  And these
features need to be checked against the host availability.

> 
> I feel adding a check for the CPUID mwait bit in host_cpu_realizefn()
> is enough, after all, this bit should be present if host supports mwait
> and enable_cpu_pm (in kvm_arch_get_supported_cpuid()).

Besides the above reasons, it seems to me expanding env->features in
host-cpu.c is confusing.

> Thanks,
> Zhao
> 



Re: [PATCH V2 3/3] target/i386: Move host_cpu_enable_cpu_pm into kvm_cpu_realizefn()

2024-05-31 Thread Chen, Zide



On 5/30/2024 11:53 PM, Zhao Liu wrote:
> On Fri, May 24, 2024 at 01:00:17PM -0700, Zide Chen wrote:
>> Date: Fri, 24 May 2024 13:00:17 -0700
>> From: Zide Chen 
>> Subject: [PATCH V2 3/3] target/i386: Move host_cpu_enable_cpu_pm into
>>  kvm_cpu_realizefn()
>> X-Mailer: git-send-email 2.34.1
>>
>> It seems not a good idea to expand features in host_cpu_realizefn,
>> which is for host CPU only. 
> 
> It is restricted by max_features and should be part of the "max" CPU,
> and the current target/i386/host-cpu.c should only serve the “host” CPU.

Yes, since this file only serves for "host" CPU, that's why I proposed
to move this code to kvm_cpu_realizefn().

> 
>> Additionally, cpu-pm option is KVM
>> specific, and it's cleaner to put it in kvm_cpu_realizefn(), together
>> with the WAITPKG code.
>>
>> Fixes: f5cc5a5c1686 ("i386: split cpu accelerators from cpu.c, using 
>> AccelCPUClass")
>> Signed-off-by: Zide Chen 
>> ---
>>  target/i386/host-cpu.c| 12 
>>  target/i386/kvm/kvm-cpu.c | 11 +--
>>  2 files changed, 9 insertions(+), 14 deletions(-)
>>
>> diff --git a/target/i386/host-cpu.c b/target/i386/host-cpu.c
>> index 280e427c017c..8b8bf5afeccf 100644
>> --- a/target/i386/host-cpu.c
>> +++ b/target/i386/host-cpu.c
>> @@ -42,15 +42,6 @@ static uint32_t host_cpu_phys_bits(void)
>>  return host_phys_bits;
>>  }
>>  
>> -static void host_cpu_enable_cpu_pm(X86CPU *cpu)
>> -{
>> -CPUX86State *env = >env;
>> -
>> -host_cpuid(5, 0, >mwait.eax, >mwait.ebx,
>> -   >mwait.ecx, >mwait.edx);
>> -env->features[FEAT_1_ECX] |= CPUID_EXT_MONITOR;
>> -}
>> -
>>  static uint32_t host_cpu_adjust_phys_bits(X86CPU *cpu)
>>  {
>>  uint32_t host_phys_bits = host_cpu_phys_bits();
>> @@ -83,9 +74,6 @@ bool host_cpu_realizefn(CPUState *cs, Error **errp)
>>  X86CPU *cpu = X86_CPU(cs);
>>  CPUX86State *env = >env;
>>  
>> -if (cpu->max_features && enable_cpu_pm) {
>> -host_cpu_enable_cpu_pm(cpu);
>> -}
>>  if (env->features[FEAT_8000_0001_EDX] & CPUID_EXT2_LM) {
>>  uint32_t phys_bits = host_cpu_adjust_phys_bits(cpu);
>>  
>> diff --git a/target/i386/kvm/kvm-cpu.c b/target/i386/kvm/kvm-cpu.c
>> index 3adcedf0dbc3..197c892da89a 100644
>> --- a/target/i386/kvm/kvm-cpu.c
>> +++ b/target/i386/kvm/kvm-cpu.c
>> @@ -64,9 +64,16 @@ static bool kvm_cpu_realizefn(CPUState *cs, Error **errp)
>>   *   cpu_common_realizefn() (via xcc->parent_realize)
>>   */
>>  if (cpu->max_features) {
>> -if (enable_cpu_pm && kvm_has_waitpkg()) {
>> -env->features[FEAT_7_0_ECX] |= CPUID_7_0_ECX_WAITPKG;
>> +if (enable_cpu_pm) {
>> +if (kvm_has_waitpkg()) {
>> +env->features[FEAT_7_0_ECX] |= CPUID_7_0_ECX_WAITPKG;
>> +}
> 
> If you agree with my comment in patch 2, here we need a mwait bit check.

I still think that take advantaged of x86_cpu_filter_features() to check
host availability is a better choice.

> 
>> +host_cpuid(5, 0, >mwait.eax, >mwait.ebx,
>> +   >mwait.ecx, >mwait.edx);
>> +env->features[FEAT_1_ECX] |= CPUID_EXT_MONITOR;
>>  }
>> +
>>  if (cpu->ucode_rev == 0) {
>>  cpu->ucode_rev =
>>  kvm_arch_get_supported_msr_feature(kvm_state,
>> -- 
>> 2.34.1
>>
>>



Re: [PATCH V2 0/3] improve -overcommit cpu-pm=on|off

2024-05-30 Thread Chen, Zide



On 5/30/2024 6:54 AM, Zhao Liu wrote:
> Hi Zide,
> 
> On Wed, May 29, 2024 at 10:31:21AM -0700, Chen, Zide wrote:
>> Date: Wed, 29 May 2024 10:31:21 -0700
>> From: "Chen, Zide" 
>> Subject: Re: [PATCH V2 0/3] improve -overcommit cpu-pm=on|off
>>
>>
>>
>> On 5/29/2024 5:46 AM, Igor Mammedov wrote:
>>> On Tue, 28 May 2024 11:16:59 -0700
>>> "Chen, Zide"  wrote:
>>>
>>>> On 5/28/2024 2:23 AM, Igor Mammedov wrote:
>>>>> On Fri, 24 May 2024 13:00:14 -0700
>>>>> Zide Chen  wrote:
>>>>>   
>>>>>> Currently, if running "-overcommit cpu-pm=on" on hosts that don't
>>>>>> have MWAIT support, the MWAIT/MONITOR feature is advertised to the
>>>>>> guest and executing MWAIT/MONITOR on the guest triggers #UD.  
>>>>>
>>>>> this is missing proper description how do you trigger issue
>>>>> with reproducer and detailed description why guest sees MWAIT
>>>>> when it's not supported by host.  
>>>>
>>>> If "overcommit cpu-pm=on" and "-cpu host" are present, as shown in the
>>> it's bette to provide full QEMU CLI and host/guest kernels used and what
>>> hardware was used if it's relevant so others can reproduce problem.
>>
>> I ever reproduced this on an older Intel Icelake machine, a
>> Sapphire Rapids and a Sierra Forest, but I believe this is a x86 generic
>> issue, not specific to particular models.
>>
>> For the CLI, I think the only command line options that matter are
>>  -overcommit cpu-pm=on: to set enable_cpu_pm
>>  -cpu host: so that cpu->max_features is set
>>
>> For QEMU version, as long as it's after this commit: 662175b91ff2
>> ("i386: reorder call to cpu_exec_realizefn")
>>
>> The guest fails to boot:
>>
>> [ 24.825568] smpboot: x86: Booting SMP configuration:
>> [ 24.826377]  node #0, CPUs: #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12
>> #13 #14 #15 #17
>> [ 24.985799]  node #1, CPUs: #128 #129 #130 #131 #132 #133 #134 #135
>> #136 #137 #138 #139 #140 #141 #142 #143 #145
>> [ 25.136955] invalid opcode:  1 PREEMPT SMP NOPTI
>> [ 25.137790] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.8.0 #2
>> [ 25.137790] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
>> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/04
>> [ 25.137790] RIP: 0010:mwait_idle+0x35/0x80
>> [ 25.137790] Code: 6f f0 80 48 02 20 48 8b 10 83 e2 08 75 3e 65 48 8b 15
>> 47 d6 56 6f 48 0f ba e2 27 72 41 31 d2 48 89 d8
>> [ 25.137790] RSP: :91403e70 EFLAGS: 00010046
>> [ 25.137790] RAX: 9140a980 RBX: 9140a980 RCX:
>> 
>> [ 25.137790] RDX:  RSI: 97f1ade21b20 RDI:
>> 0004
>> [ 25.137790] RBP:  R08: 0005da4709cb R09:
>> 0001
>> [ 25.137790] R10: 5da4 R11: 0009 R12:
>> 
>> [ 25.137790] R13: 98573ff90fc0 R14: 9140a038 R15:
>> 00093ff0
>> [ 25.137790] FS: () GS:97f1ade0()
>> knlGS:
>> [ 25.137790] CS: 0010 DS:  ES:  CR0: 80050033
>> [ 25.137790] CR2: 97d8aa801000 CR3: 0049e9430001 CR4:
>> 00770ef0
>> [ 25.137790] DR0:  DR1:  DR2:
>> 
>> [ 25.137790] DR3:  DR6: 07f0 DR7:
>> 0400
>> [ 25.137790] PKRU: 5554
>> [ 25.137790] Call Trace:
>> [ 25.137790] 
>> [ 25.137790] ? die+0x37/0x90
>> [ 25.137790] ? do_trap+0xe3/0x110
>> [ 25.137790] ? mwait_idle+0x35/0x80
>> [ 25.137790] ? do_error_trap+0x6a/0x90
>> [ 25.137790] ? mwait_idle+0x35/0x80
>> [ 25.137790] ? exc_invalid_op+0x52/0x70
>> [ 25.137790] ? mwait_idle+0x35/0x80
>> [ 25.137790] ? asm_exc_invalid_op+0x1a/0x20
>> [ 25.137790] ? mwait_idle+0x35/0x80
>> [ 25.137790] default_idle_call+0x30/0x100
>> [ 25.137790] cpuidle_idle_call+0x12c/0x170
>> [ 25.137790] ? tsc_verify_tsc_adjust+0x73/0xd0
>> [ 25.137790] do_idle+0x7f/0xd0
>> [ 25.137790] cpu_startup_entry+0x29/0x30
>> [ 25.137790] rest_init+0xcc/0xd0
>> [ 25.137790] start_kernel+0x396/0x5d0
>> [ 25.137790] x86_64_start_reservations+0x18/0x30
>> [ 25.137790] x86_64_start_kernel+0xe7/0xf0
>> [ 25.137790] common_startup_64+0x13e/0x148
>> [ 25.137790] 
>> [ 25.137790] Modules linked in:
>> [ 25.137790] --[ end trace  ]--
>&

Re: [PATCH V2 0/3] improve -overcommit cpu-pm=on|off

2024-05-29 Thread Chen, Zide



On 5/29/2024 5:46 AM, Igor Mammedov wrote:
> On Tue, 28 May 2024 11:16:59 -0700
> "Chen, Zide"  wrote:
> 
>> On 5/28/2024 2:23 AM, Igor Mammedov wrote:
>>> On Fri, 24 May 2024 13:00:14 -0700
>>> Zide Chen  wrote:
>>>   
>>>> Currently, if running "-overcommit cpu-pm=on" on hosts that don't
>>>> have MWAIT support, the MWAIT/MONITOR feature is advertised to the
>>>> guest and executing MWAIT/MONITOR on the guest triggers #UD.  
>>>
>>> this is missing proper description how do you trigger issue
>>> with reproducer and detailed description why guest sees MWAIT
>>> when it's not supported by host.  
>>
>> If "overcommit cpu-pm=on" and "-cpu host" are present, as shown in the
> it's bette to provide full QEMU CLI and host/guest kernels used and what
> hardware was used if it's relevant so others can reproduce problem.

I ever reproduced this on an older Intel Icelake machine, a
Sapphire Rapids and a Sierra Forest, but I believe this is a x86 generic
issue, not specific to particular models.

For the CLI, I think the only command line options that matter are
 -overcommit cpu-pm=on: to set enable_cpu_pm
 -cpu host: so that cpu->max_features is set

For QEMU version, as long as it's after this commit: 662175b91ff2
("i386: reorder call to cpu_exec_realizefn")

The guest fails to boot:

[ 24.825568] smpboot: x86: Booting SMP configuration:
[ 24.826377]  node #0, CPUs: #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12
#13 #14 #15 #17
[ 24.985799]  node #1, CPUs: #128 #129 #130 #131 #132 #133 #134 #135
#136 #137 #138 #139 #140 #141 #142 #143 #145
[ 25.136955] invalid opcode:  1 PREEMPT SMP NOPTI
[ 25.137790] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.8.0 #2
[ 25.137790] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/04
[ 25.137790] RIP: 0010:mwait_idle+0x35/0x80
[ 25.137790] Code: 6f f0 80 48 02 20 48 8b 10 83 e2 08 75 3e 65 48 8b 15
47 d6 56 6f 48 0f ba e2 27 72 41 31 d2 48 89 d8
[ 25.137790] RSP: :91403e70 EFLAGS: 00010046
[ 25.137790] RAX: 9140a980 RBX: 9140a980 RCX:

[ 25.137790] RDX:  RSI: 97f1ade21b20 RDI:
0004
[ 25.137790] RBP:  R08: 0005da4709cb R09:
0001
[ 25.137790] R10: 5da4 R11: 0009 R12:

[ 25.137790] R13: 98573ff90fc0 R14: 9140a038 R15:
00093ff0
[ 25.137790] FS: () GS:97f1ade0()
knlGS:
[ 25.137790] CS: 0010 DS:  ES:  CR0: 80050033
[ 25.137790] CR2: 97d8aa801000 CR3: 0049e9430001 CR4:
00770ef0
[ 25.137790] DR0:  DR1:  DR2:

[ 25.137790] DR3:  DR6: 07f0 DR7:
0400
[ 25.137790] PKRU: 5554
[ 25.137790] Call Trace:
[ 25.137790] 
[ 25.137790] ? die+0x37/0x90
[ 25.137790] ? do_trap+0xe3/0x110
[ 25.137790] ? mwait_idle+0x35/0x80
[ 25.137790] ? do_error_trap+0x6a/0x90
[ 25.137790] ? mwait_idle+0x35/0x80
[ 25.137790] ? exc_invalid_op+0x52/0x70
[ 25.137790] ? mwait_idle+0x35/0x80
[ 25.137790] ? asm_exc_invalid_op+0x1a/0x20
[ 25.137790] ? mwait_idle+0x35/0x80
[ 25.137790] default_idle_call+0x30/0x100
[ 25.137790] cpuidle_idle_call+0x12c/0x170
[ 25.137790] ? tsc_verify_tsc_adjust+0x73/0xd0
[ 25.137790] do_idle+0x7f/0xd0
[ 25.137790] cpu_startup_entry+0x29/0x30
[ 25.137790] rest_init+0xcc/0xd0
[ 25.137790] start_kernel+0x396/0x5d0
[ 25.137790] x86_64_start_reservations+0x18/0x30
[ 25.137790] x86_64_start_kernel+0xe7/0xf0
[ 25.137790] common_startup_64+0x13e/0x148
[ 25.137790] 
[ 25.137790] Modules linked in:
[ 25.137790] --[ end trace  ]--
[ 25.137790] invalid opcode:  2 PREEMPT SMP NOPTI
[ 25.137790] RIP: 0010:mwait_idle+0x35/0x80
[ 25.137790] Code: 6f f0 80 48 02 20 48 8b 10 83 e2 08 75 3e 65 48 8b 15
47 d6 56 6f 48 0f ba e2 27 72 41 31 d2 48 89 d8

> 
>> following, CPUID_EXT_MONITOR is set after x86_cpu_filter_features(), so
>> that it doesn't have a chance to check MWAIT against host features and
>> will be advertised to the guest regardless of whether it's supported by
>> the host or not.
>>
>> x86_cpu_realizefn()
>>   x86_cpu_filter_features()
>>   cpu_exec_realizefn()
>> kvm_cpu_realizefn
>>   host_cpu_realizefn
>> host_cpu_enable_cpu_pm
>>   env->features[FEAT_1_ECX] |= CPUID_EXT_MONITOR;
>>
>>
>> If it's not supported by the host, executing MONITOR or MWAIT
>> instructions from the guest triggers #UD, no matter MWAIT_EXITING
>> control is set or not.
> 
> If I recall right, kvm was able to emulate mwait/monitor.
> So question is why it leads to exception instead?

KVM can 

Re: [PATCH V2 0/3] improve -overcommit cpu-pm=on|off

2024-05-28 Thread Chen, Zide



On 5/28/2024 2:23 AM, Igor Mammedov wrote:
> On Fri, 24 May 2024 13:00:14 -0700
> Zide Chen  wrote:
> 
>> Currently, if running "-overcommit cpu-pm=on" on hosts that don't
>> have MWAIT support, the MWAIT/MONITOR feature is advertised to the
>> guest and executing MWAIT/MONITOR on the guest triggers #UD.
> 
> this is missing proper description how do you trigger issue
> with reproducer and detailed description why guest sees MWAIT
> when it's not supported by host.

If "overcommit cpu-pm=on" and "-cpu hpst" are present, as shown in the
following, CPUID_EXT_MONITOR is set after x86_cpu_filter_features(), so
that it doesn't have a chance to check MWAIT against host features and
will be advertised to the guest regardless of whether it's supported by
the host or not.

x86_cpu_realizefn()
  x86_cpu_filter_features()
  cpu_exec_realizefn()
kvm_cpu_realizefn
  host_cpu_realizefn
host_cpu_enable_cpu_pm
  env->features[FEAT_1_ECX] |= CPUID_EXT_MONITOR;


If it's not supported by the host, executing MONITOR or MWAIT
instructions from the guest triggers #UD, no matter MWAIT_EXITING
control is set or not.



Re: [PATCH 1/3] vl: Allow multiple -overcommit commands

2024-05-21 Thread Chen, Zide



On 5/20/2024 10:16 PM, Thomas Huth wrote:
> On 21/05/2024 07.08, Thomas Huth wrote:
>> On 20/05/2024 19.47, Zide Chen wrote:
>>> Both cpu-pm and mem-lock are related to system resource overcommit, but
>>> they are separate from each other, in terms of how they are realized,
>>> and of course, they are applied to different system resources.
>>>
>>> It's tempting to use separate command lines to specify their behavior.
>>> e.g., in the following example, the cpu-pm command is quietly
>>> overwritten, and it's not easy to notice it without careful inspection.
>>>
>>>    --overcommit mem-lock=on
>>>    --overcommit cpu-pm=on
>>>
>>> Fixes: c8c9dc42b7ca ("Remove the deprecated -realtime option")
>>> Signed-off-by: Zide Chen 
>>> ---
>>>   system/vl.c | 8 ++--
>>>   1 file changed, 6 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/system/vl.c b/system/vl.c
>>> index a3eede5fa5b8..ed682643805b 100644
>>> --- a/system/vl.c
>>> +++ b/system/vl.c
>>> @@ -3545,8 +3545,12 @@ void qemu_init(int argc, char **argv)
>>>   if (!opts) {
>>>   exit(1);
>>>   }
>>> -    enable_mlock = qemu_opt_get_bool(opts, "mem-lock",
>>> false);
>>> -    enable_cpu_pm = qemu_opt_get_bool(opts, "cpu-pm",
>>> false);
>>> +
>>> +    /* Don't override the -overcommit option if set */
>>> +    enable_mlock = enable_mlock ||
>>> +    qemu_opt_get_bool(opts, "mem-lock", false);
>>> +    enable_cpu_pm = enable_cpu_pm ||
>>> +    qemu_opt_get_bool(opts, "cpu-pm", false);
>>>   break;
>>>   case QEMU_OPTION_compat:
>>>   {
>>
>> Reviewed-by: Thomas Huth 
> 
> Ah, wait, actually, this is a bad idea, too, since now you cannot
> disable an enabled option anymore. Imagine that you have for example
> enabled the option in the config file, and now you'd like to disable it
> on the command line again - you're stuck with the enabled setting in
> that case.
> 
> I think the better solution is to replace the "false" default value at
> the end:
> 
>     enable_mlock = qemu_opt_get_bool(opts, "mem-lock", enable_mlock);
>     enable_cpu_pm = qemu_opt_get_bool(opts, "cpu-pm", enable_cpu_pm);
> 
> What do you think about this?

This is great! Thank you very much! I will update it in V2.

> 
>  Thomas
> 



Re: [PATCH 1/3] vl: Allow multiple -overcommit commands

2024-05-21 Thread Chen, Zide



On 5/20/2024 10:16 PM, Thomas Huth wrote:
> On 21/05/2024 07.08, Thomas Huth wrote:
>> On 20/05/2024 19.47, Zide Chen wrote:
>>> Both cpu-pm and mem-lock are related to system resource overcommit, but
>>> they are separate from each other, in terms of how they are realized,
>>> and of course, they are applied to different system resources.
>>>
>>> It's tempting to use separate command lines to specify their behavior.
>>> e.g., in the following example, the cpu-pm command is quietly
>>> overwritten, and it's not easy to notice it without careful inspection.
>>>
>>>    --overcommit mem-lock=on
>>>    --overcommit cpu-pm=on
>>>
>>> Fixes: c8c9dc42b7ca ("Remove the deprecated -realtime option")
>>> Signed-off-by: Zide Chen 
>>> ---
>>>   system/vl.c | 8 ++--
>>>   1 file changed, 6 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/system/vl.c b/system/vl.c
>>> index a3eede5fa5b8..ed682643805b 100644
>>> --- a/system/vl.c
>>> +++ b/system/vl.c
>>> @@ -3545,8 +3545,12 @@ void qemu_init(int argc, char **argv)
>>>   if (!opts) {
>>>   exit(1);
>>>   }
>>> -    enable_mlock = qemu_opt_get_bool(opts, "mem-lock",
>>> false);
>>> -    enable_cpu_pm = qemu_opt_get_bool(opts, "cpu-pm",
>>> false);
>>> +
>>> +    /* Don't override the -overcommit option if set */
>>> +    enable_mlock = enable_mlock ||
>>> +    qemu_opt_get_bool(opts, "mem-lock", false);
>>> +    enable_cpu_pm = enable_cpu_pm ||
>>> +    qemu_opt_get_bool(opts, "cpu-pm", false);
>>>   break;
>>>   case QEMU_OPTION_compat:
>>>   {
>>
>> Reviewed-by: Thomas Huth 
> 
> Ah, wait, actually, this is a bad idea, too, since now you cannot
> disable an enabled option anymore. Imagine that you have for example
> enabled the option in the config file, and now you'd like to disable it
> on the command line again - you're stuck with the enabled setting in
> that case.
> 
> I think the better solution is to replace the "false" default value at
> the end:
> 
>     enable_mlock = qemu_opt_get_bool(opts, "mem-lock", enable_mlock);
>     enable_cpu_pm = qemu_opt_get_bool(opts, "cpu-pm", enable_cpu_pm);
> 
> What do you think about this?

Thank you very much! This is a very good idea! I can update it in V2.

>  Thomas
> 



Re: [PATCH 1/3] vl: Allow multiple -overcommit commands

2024-05-21 Thread Chen, Zide



On 5/20/2024 10:16 PM, Thomas Huth wrote:
> On 21/05/2024 07.08, Thomas Huth wrote:
>> On 20/05/2024 19.47, Zide Chen wrote:
>>> Both cpu-pm and mem-lock are related to system resource overcommit, but
>>> they are separate from each other, in terms of how they are realized,
>>> and of course, they are applied to different system resources.
>>>
>>> It's tempting to use separate command lines to specify their behavior.
>>> e.g., in the following example, the cpu-pm command is quietly
>>> overwritten, and it's not easy to notice it without careful inspection.
>>>
>>>    --overcommit mem-lock=on
>>>    --overcommit cpu-pm=on
>>>
>>> Fixes: c8c9dc42b7ca ("Remove the deprecated -realtime option")
>>> Signed-off-by: Zide Chen 
>>> ---
>>>   system/vl.c | 8 ++--
>>>   1 file changed, 6 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/system/vl.c b/system/vl.c
>>> index a3eede5fa5b8..ed682643805b 100644
>>> --- a/system/vl.c
>>> +++ b/system/vl.c
>>> @@ -3545,8 +3545,12 @@ void qemu_init(int argc, char **argv)
>>>   if (!opts) {
>>>   exit(1);
>>>   }
>>> -    enable_mlock = qemu_opt_get_bool(opts, "mem-lock",
>>> false);
>>> -    enable_cpu_pm = qemu_opt_get_bool(opts, "cpu-pm",
>>> false);
>>> +
>>> +    /* Don't override the -overcommit option if set */
>>> +    enable_mlock = enable_mlock ||
>>> +    qemu_opt_get_bool(opts, "mem-lock", false);
>>> +    enable_cpu_pm = enable_cpu_pm ||
>>> +    qemu_opt_get_bool(opts, "cpu-pm", false);
>>>   break;
>>>   case QEMU_OPTION_compat:
>>>   {
>>
>> Reviewed-by: Thomas Huth 
> 
> Ah, wait, actually, this is a bad idea, too, since now you cannot
> disable an enabled option anymore. Imagine that you have for example
> enabled the option in the config file, and now you'd like to disable it
> on the command line again - you're stuck with the enabled setting in
> that case.
> 
> I think the better solution is to replace the "false" default value at
> the end:
> 
>     enable_mlock = qemu_opt_get_bool(opts, "mem-lock", enable_mlock);
>     enable_cpu_pm = qemu_opt_get_bool(opts, "cpu-pm", enable_cpu_pm);
> 

> What do you think about this?

Thank you very much! This is a very good idea, and I can update it in V2.

> 
>  Thomas
> 



Re: [PATCH 1/6] target/i386/kvm: Add feature bit definitions for KVM CPUID

2024-04-26 Thread Chen, Zide



On 4/26/2024 3:07 AM, Zhao Liu wrote:
> Add feature definiations for KVM_CPUID_FEATURES in CPUID (
> CPUID[4000_0001].EAX and CPUID[4000_0001].EDX), to get rid of lots of
> offset calculations.
> 
> Signed-off-by: Zhao Liu 
> ---
> v2: Changed the prefix from CPUID_FEAT_KVM_* to CPUID_KVM_*. (Xiaoyao)
> ---
>  hw/i386/kvm/clock.c   |  5 ++---
>  target/i386/cpu.h | 23 +++
>  target/i386/kvm/kvm.c | 28 ++--
>  3 files changed, 39 insertions(+), 17 deletions(-)
> 
> diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
> index 40aa9a32c32c..ce416c05a3d0 100644
> --- a/hw/i386/kvm/clock.c
> +++ b/hw/i386/kvm/clock.c
> @@ -27,7 +27,6 @@
>  #include "qapi/error.h"
>  
>  #include 
> -#include "standard-headers/asm-x86/kvm_para.h"
>  #include "qom/object.h"
>  
>  #define TYPE_KVM_CLOCK "kvmclock"
> @@ -334,8 +333,8 @@ void kvmclock_create(bool create_always)
>  
>  assert(kvm_enabled());
>  if (create_always ||
> -cpu->env.features[FEAT_KVM] & ((1ULL << KVM_FEATURE_CLOCKSOURCE) |
> -   (1ULL << KVM_FEATURE_CLOCKSOURCE2))) {
> +cpu->env.features[FEAT_KVM] & (CPUID_KVM_CLOCK |
> +   CPUID_KVM_CLOCK2)) {

To achieve this purpose, how about doing the alternative to define an
API similar to KVM's guest_pv_has()?

_has() is simpler and clearer than "features[] & CPUID_x",
additionally, this helps to keep the definitions identical to KVM, more
readable and easier for future maintenance.