On 6/3/2024 2:30 AM, Igor Mammedov wrote:
> On Sat, 1 Jun 2024 23:26:55 +0800
> Zhao Liu <zhao1....@intel.com> wrote:
> 
>> On Fri, May 31, 2024 at 10:13:47AM -0700, Chen, Zide wrote:
>>> Date: Fri, 31 May 2024 10:13:47 -0700
>>> From: "Chen, Zide" <zide.c...@intel.com>
>>> Subject: Re: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before
>>>  x86_cpu_filter_features
>>>
>>> On 5/30/2024 11:30 PM, Zhao Liu wrote:  
>>>> Hi Zide,
>>>>
>>>> On Fri, May 24, 2024 at 01:00:16PM -0700, Zide Chen wrote:  
>>>>> Date: Fri, 24 May 2024 13:00:16 -0700
>>>>> From: Zide Chen <zide.c...@intel.com>
>>>>> Subject: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before
>>>>>  x86_cpu_filter_features
>>>>> X-Mailer: git-send-email 2.34.1
>>>>>
>>>>> cpu_exec_realizefn which calls the accel-specific realizefn may expand
>>>>> features.  e.g., some accel-specific options may require extra features
>>>>> to be enabled, and it's appropriate to expand these features in accel-
>>>>> specific realizefn.
>>>>>
>>>>> One such example is the cpu-pm option, which may add CPUID_EXT_MONITOR.
>>>>>
>>>>> Thus, call cpu_exec_realizefn before x86_cpu_filter_features to ensure
>>>>> that it won't expose features not supported by the host.
>>>>>
>>>>> Fixes: 662175b91ff2 ("i386: reorder call to cpu_exec_realizefn")
>>>>> Suggested-by: Xiaoyao Li <xiaoyao...@intel.com>
>>>>> Signed-off-by: Zide Chen <zide.c...@intel.com>
>>>>> ---
>>>>>  target/i386/cpu.c         | 24 ++++++++++++------------
>>>>>  target/i386/kvm/kvm-cpu.c |  1 -
>>>>>  2 files changed, 12 insertions(+), 13 deletions(-)
>>>>>
>>>>> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
>>>>> index bc2dceb647fa..a1c1c785bd2f 100644
>>>>> --- a/target/i386/cpu.c
>>>>> +++ b/target/i386/cpu.c
>>>>> @@ -7604,6 +7604,18 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
>>>>>          }
>>>>>      }
>>>>>  
>>>>> +    /*
>>>>> +     * note: the call to the framework needs to happen after feature expansion,
>>>>> +     * but before the checks/modifications to ucode_rev, mwait, phys_bits.
>>>>> +     * These may be set by the accel-specific code,
>>>>> +     * and the results are subsequently checked / assumed in this function.
>>>>> +     */
>>>>> +    cpu_exec_realizefn(cs, &local_err);
>>>>> +    if (local_err != NULL) {
>>>>> +        error_propagate(errp, local_err);
>>>>> +        return;
>>>>> +    }
>>>>> +
>>>>>      x86_cpu_filter_features(cpu, cpu->check_cpuid || cpu->enforce_cpuid);
>>>>
>>>> For your case, which sets cpu-pm=on via overcommit, then
>>>> x86_cpu_filter_features() will complain that mwait is not supported.
>>>>
>>>> Such warning is not necessary, because the purpose of overcommit (from
>>>> code) is only to support mwait when possible, not to commit to support
>>>> mwait in Guest.
>>>>
>>>> Additionally, I understand x86_cpu_filter_features() is primarily
>>>> intended to filter features configured by the user,   
>>>
>>> Yes, that's why this patch intends to let x86_cpu_filter_features()
>>> filter out the MWAIT bit which is set from the overcommit option.
>>
>> HMM, but in fact x86_cpu_filter_features() has already checked the MWAIT
>> bit set by "-overcommit cpu-pm=on". ;-)
>>
>> (Pls correct me if I'm wrong) Revisiting what cpu-pm did to MWAIT:
>> * Firstly, it set MWAIT bit in x86_cpu_expand_features():
>>   x86_cpu_expand_features()
>>      -> x86_cpu_get_supported_feature_word()
>>         -> kvm_arch_get_supported_cpuid()  
>>  This MWAIT is based on Host's MWAIT capability. This MWAIT enablement
>>  is fine for next x86_cpu_filter_features() and x86_cpu_filter_features()
>>  is working correctly here!
>>
>> * Then, MWAIT was set a second time in host_cpu_enable_cpu_pm(), regardless
>>   of either the Host's support or the previous MWAIT enablement result.
>>   This is the root cause of your issue.
>>
>> Therefore, we should make cpu-pm honor its first MWAIT enablement result
>> instead of repeatedly and unconditionally setting the MWAIT bit again in
>> host_cpu_enable_cpu_pm().
>>
>> Additionally, I think the code in x86_cpu_realizefn():
>>   cpu->mwait.ecx |= CPUID_MWAIT_EMX | CPUID_MWAIT_IBE;
>> has the similar issue because it also should check MWAIT feature bit.
>>
>> Further, it may be possible to remove cpu->mwait: just check the MWAIT
>> bit in leaf 5 of cpu_x86_cpuid(), and if MWAIT is present, use host's
>> mwait info plus CPUID_MWAIT_EMX | CPUID_MWAIT_IBE.
> 
> Agreed with above analysis,
> we shouldn't have host_cpu_enable_cpu_pm() as kvm_arch_get_supported_cpuid
> gets us MWAIT already.

Yes, I agree we don't need to set CPUID_EXT_MONITOR separately, beyond
what kvm_arch_get_supported_cpuid() already reports.
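
To make the point concrete, here is a minimal standalone sketch (not QEMU
code) of what honoring the supported-CPUID result would look like. The helper
name cpu_pm_apply_monitor() and its parameters are hypothetical;
'supported_ecx' only stands in for what kvm_arch_get_supported_cpuid() would
report for CPUID.1:ECX.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.1:ECX bit 3 (MONITOR/MWAIT). */
#define CPUID_EXT_MONITOR (1U << 3)

/*
 * Hypothetical helper, for illustration only.
 * guest_ecx:     CPUID.1:ECX as currently configured for the guest
 * supported_ecx: stand-in for the host/KVM supported CPUID.1:ECX
 * cpu_pm:        stand-in for "-overcommit cpu-pm=on"
 *
 * The MONITOR bit is only added when the host side reports it, so cpu-pm
 * never advertises MWAIT on a host that cannot back it.
 */
static uint32_t cpu_pm_apply_monitor(uint32_t guest_ecx,
                                     uint32_t supported_ecx,
                                     bool cpu_pm)
{
    if (cpu_pm) {
        guest_ecx |= supported_ecx & CPUID_EXT_MONITOR;
    }
    return guest_ecx;
}
```

With this shape the unconditional OR in host_cpu_enable_cpu_pm() becomes
redundant, which is the reason for dropping it.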

> 
> Filling in cpu->mwait.ecx is a benign mistake which likely doesn't
> trigger anything if CPUID_EXT_MONITOR is not present.
> But for clarity it's better to add an explicit check there as well.

Yes, I agree that without MWAIT available and advertised, it's meaningless
to set the EMX and IBE bits. It seems cleaner to me to remove cpu->mwait
altogether and, in cpu_x86_cpuid(), just read from host_cpuid(5) if
MWAIT is available to the guest. But I don't understand the history of
why QEMU unconditionally advertises these two bits, and I don't know if it
could break something if these two bits are removed.

Even if we want to fix these two bits, we can do it in another separate
patch.
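
For reference, a rough standalone sketch of the gating discussed here, under
the assumption that leaf 5 is simply zeroed when MWAIT is not advertised to
the guest. The function cpuid5_ecx() is hypothetical and is not the existing
cpu_x86_cpuid() logic; whether zeroing is safe for existing guests is exactly
the open question.

```c
#include <stdint.h>

#define CPUID_EXT_MONITOR (1U << 3)  /* CPUID.1:ECX MONITOR/MWAIT */
#define CPUID_MWAIT_EMX   (1U << 0)  /* CPUID.5:ECX enumeration supported */
#define CPUID_MWAIT_IBE   (1U << 1)  /* CPUID.5:ECX interrupts can exit */

/*
 * Hypothetical helper, for illustration only.
 * guest_cpuid1_ecx: CPUID.1:ECX as exposed to the guest
 * host_cpuid5_ecx:  stand-in for the ECX value read via host_cpuid(5)
 *
 * EMX and IBE are only reported when the guest actually sees MWAIT;
 * otherwise the leaf 5 ECX value is left at zero.
 */
static uint32_t cpuid5_ecx(uint32_t guest_cpuid1_ecx, uint32_t host_cpuid5_ecx)
{
    if (!(guest_cpuid1_ecx & CPUID_EXT_MONITOR)) {
        return 0;
    }
    return host_cpuid5_ecx | CPUID_MWAIT_EMX | CPUID_MWAIT_IBE;
}
```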

e737b32a36 (" Core 2 Duo specification (Alexander Graf).")
unconditionally adds "CPUID_MWAIT_EMX | CPUID_MWAIT_IBE" to CPUID.5.ECX
with further explanation.

2266d44311 ("i386/cpu: make -cpu host support monitor/mwait") adds
comment "We always wake on interrupt even if host does not have the
capability" to CPUID_MWAIT_IBE.


> 
>>
>>>> and the changes of
>>>> CPUID after x86_cpu_filter_features() should by default be regarded like
>>>> "QEMU knows what it is doing".  
>>>
>>> Sure, we can add feature bits after x86_cpu_filter_features(), but I
>>> think moving cpu_exec_realizefn() before x86_cpu_filter_features() is
>>> more generic, and actually this is what QEMU did before commit 662175b91ff2.
>>>
>>> - Less redundant code. Specifically, no need to call
>>> x86_cpu_get_supported_feature_word() again.
>>> - Potentially, other features could be added from the
>>> accel-specific realizefn, kvm_cpu_realizefn() for example.  And these
>>> features need to be checked against the host availability.
>>
>> Mainly I don't think this reorder is a direct fix for the problem (I
>> just analyse it above), also in your case x86_cpu_filter_features() will
>> print a WARNING when QEMU boots, which I don't think is cpu-pm's intention.
> 
> There is no problem with warning, I'd even say it's a good thing.

I agree it's good to have the warning as well.

> But you are right reordering just masks the issue.
> 
> As for expected behavior, if user asked for "-overcommit cpu-pm=on"
> there are 2 options:
>    * it's working as expected (mwait exiting is enabled successfully with CPUID MONITOR bit set)
>    * QEMU shall fail to start.

I like the idea that QEMU refuses to launch the guest if the requested CPU
features are not available, which is more user friendly.  But the
problem is, "-overcommit cpu-pm=on" is an umbrella option that intends to
enable all of the following CPUID bits and KVM features where possible.  So,
if QEMU fails the guest in this case, then it needs to fail on a missing
WAITPKG feature as well. Additionally, it may need to provide individual
options to enable these individual features, which I suspect could be too
complicated.

KVM_X86_DISABLE_EXITS_MWAIT
KVM_X86_DISABLE_EXITS_HLT
KVM_X86_DISABLE_EXITS_PAUSE
KVM_X86_DISABLE_EXITS_CSTATE
CPUID.7.0:ECX.WAITPKG
CPUID.1.ECX.MWAIT
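
To illustrate the trade-off, here is a small standalone sketch contrasting the
current best-effort semantics with a strict "fail if anything is missing"
policy. Everything in it is hypothetical: the PM_* flags just give one bit to
each item in the list above and are not QEMU or KVM definitions, and
cpu_pm_resolve() is not an existing function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flags, one per item covered by the cpu-pm umbrella. */
enum {
    PM_DISABLE_EXITS_MWAIT  = 1U << 0,
    PM_DISABLE_EXITS_HLT    = 1U << 1,
    PM_DISABLE_EXITS_PAUSE  = 1U << 2,
    PM_DISABLE_EXITS_CSTATE = 1U << 3,
    PM_CPUID_WAITPKG        = 1U << 4,
    PM_CPUID_MWAIT          = 1U << 5,
    PM_ALL                  = (1U << 6) - 1,
};

/*
 * Hypothetical helper, for illustration only.
 * host_caps: which of the items above the host/KVM can provide
 * strict:    if true, any missing item is treated as a launch failure
 * enabled:   on success, receives the subset that gets enabled
 *
 * Returns false only in strict mode when something is missing.
 */
static bool cpu_pm_resolve(uint32_t host_caps, bool strict, uint32_t *enabled)
{
    uint32_t avail = host_caps & PM_ALL;

    if (strict && avail != PM_ALL) {
        return false;
    }
    *enabled = avail;
    return true;
}
```

In strict mode a host lacking only WAITPKG would already be rejected, which
is why failing on MWAIT alone would be inconsistent.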
