Re: [PATCH V3 0/2] improve -overcommit cpu-pm=on|off
On 6/5/2024 6:49 AM, Igor Mammedov wrote: > On Mon, 3 Jun 2024 17:02:20 -0700 > Zide Chen wrote: > >> Currently, if running "-overcommit cpu-pm=on" on hosts that don't >> have MWAIT support, the MWAIT/MONITOR feature is advertised to the >> guest and executing MWAIT/MONITOR on the guest triggers #UD. >> >> Typically #UD takes priority over VM-Exit interception checks and >> KVM doesn't emulate MONITOR/MWAIT. This causes the guest fail to >> boot. >> >> V2: >> - [PATCH 1]: took Thomas' suggestion for more generic fix >> - [PATCH 2/3]: no changes >> >> V3: >> - dropped [PATCH 1/3]. Took the simpler approach not to re-order >> cpu_exec_realizefn() call. >> - changed patch title in [PATCH V3 1/2] >> - don't set CPUID_EXT_MONITOR in kvm_cpu_realizefn() > > on top of above we should make make > -overcommit cpu-pm=on > to error out if KVM_X86_DISABLE_EXITS_MWAIT is not supported/failed > > if we don't do this user gets false assumption that cpu-pm=on > works as expected, and instead of effective CPU usage/IPI delivery > all they get is a storm of mwait exits. > Agree. But seems it's quite complicated. Please refer to the comments on [PATCH V2 2/3]. We may remove the "-overcommit cpu-pm", and similar to notify-vmexit, add individual -accel options for flexibility and better cpu-pm control. But will be over complicated? KVM_X86_DISABLE_EXITS_MWAIT KVM_X86_DISABLE_EXITS_HLT KVM_X86_DISABLE_EXITS_PAUSE KVM_X86_DISABLE_EXITS_CSTATE >> >> Zide Chen (2): >> vl: Allow multiple -overcommit commands >> target/i386: Advertise MWAIT iff host supports >> >> system/vl.c | 4 ++-- >> target/i386/host-cpu.c| 12 >> target/i386/kvm/kvm-cpu.c | 11 +-- >> 3 files changed, 11 insertions(+), 16 deletions(-) >> >
Re: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before x86_cpu_filter_features
On 6/5/2024 8:07 AM, Igor Mammedov wrote: > On Mon, 3 Jun 2024 14:29:50 -0700 > "Chen, Zide" wrote: > >> On 6/3/2024 2:30 AM, Igor Mammedov wrote: >>> On Sat, 1 Jun 2024 23:26:55 +0800 >>> Zhao Liu wrote: >>> >>>> On Fri, May 31, 2024 at 10:13:47AM -0700, Chen, Zide wrote: >>>>> Date: Fri, 31 May 2024 10:13:47 -0700 >>>>> From: "Chen, Zide" >>>>> Subject: Re: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before >>>>> x86_cpu_filter_features >>>>> >>>>> On 5/30/2024 11:30 PM, Zhao Liu wrote: >>>>>> Hi Zide, >>>>>> >>>>>> On Fri, May 24, 2024 at 01:00:16PM -0700, Zide Chen wrote: >>>>>>> Date: Fri, 24 May 2024 13:00:16 -0700 >>>>>>> From: Zide Chen >>>>>>> Subject: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before >>>>>>> x86_cpu_filter_features >>>>>>> X-Mailer: git-send-email 2.34.1 >>>>>>> >>>>>>> cpu_exec_realizefn which calls the accel-specific realizefn may expand >>>>>>> features. e.g., some accel-specific options may require extra features >>>>>>> to be enabled, and it's appropriate to expand these features in accel- >>>>>>> specific realizefn. >>>>>>> >>>>>>> One such example is the cpu-pm option, which may add CPUID_EXT_MONITOR. >>>>>>> >>>>>>> Thus, call cpu_exec_realizefn before x86_cpu_filter_features to ensure >>>>>>> that it won't expose features not supported by the host. >>>>>>> >>>>>>> Fixes: 662175b91ff2 ("i386: reorder call to cpu_exec_realizefn") >>>>>>> Suggested-by: Xiaoyao Li >>>>>>> Signed-off-by: Zide Chen >>>>>>> --- >>>>>>> target/i386/cpu.c | 24 >>>>>>> target/i386/kvm/kvm-cpu.c | 1 - >>>>>>> 2 files changed, 12 insertions(+), 13 deletions(-) >>>>>>> >>>>>>> diff --git a/target/i386/cpu.c b/target/i386/cpu.c >>>>>>> index bc2dceb647fa..a1c1c785bd2f 100644 >>>>>>> --- a/target/i386/cpu.c >>>>>>> +++ b/target/i386/cpu.c >>>>>>> @@ -7604,6 +7604,18 @@ static void x86_cpu_realizefn(DeviceState *dev, >>>>>>> Error **errp) >>>>>>> } >>>>>>> } >>>>>>> >>>>>>> +/* >>>>>>> + * note: the call to the framework needs to happen after feature >>>>>>> expansion, >>>>>>> + * but before the checks/modifications to ucode_rev, mwait, >>>>>>> phys_bits. >>>>>>> + * These may be set by the accel-specific code, >>>>>>> + * and the results are subsequently checked / assumed in this >>>>>>> function. >>>>>>> + */ >>>>>>> +cpu_exec_realizefn(cs, _err); >>>>>>> +if (local_err != NULL) { >>>>>>> +error_propagate(errp, local_err); >>>>>>> +return; >>>>>>> +} >>>>>>> + >>>>>>> x86_cpu_filter_features(cpu, cpu->check_cpuid || >>>>>>> cpu->enforce_cpuid); >>>>>> >>>>>> For your case, which sets cpu-pm=on via overcommit, then >>>>>> x86_cpu_filter_features() will complain that mwait is not supported. >>>>>> >>>>>> Such warning is not necessary, because the purpose of overcommit (from >>>>>> code) is only to support mwait when possible, not to commit to support >>>>>> mwait in Guest. >>>>>> >>>>>> Additionally, I understand x86_cpu_filter_features() is primarily >>>>>> intended to filter features configured by the user, >>>>> >>>>> Yes, that's why this patches intends to let x86_cpu_filter_features() >>>>> filter out the MWAIT bit which is set from the overcommit option. >>>> >>>> HMM, but in fact x86_cpu_filter_features() has already checked the MWAIT >>>> bit set by "-overcommit cpu-pm=on". ;-) >>>> >&
Re: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before x86_cpu_filter_features
On 6/3/2024 2:30 AM, Igor Mammedov wrote: > On Sat, 1 Jun 2024 23:26:55 +0800 > Zhao Liu wrote: > >> On Fri, May 31, 2024 at 10:13:47AM -0700, Chen, Zide wrote: >>> Date: Fri, 31 May 2024 10:13:47 -0700 >>> From: "Chen, Zide" >>> Subject: Re: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before >>> x86_cpu_filter_features >>> >>> On 5/30/2024 11:30 PM, Zhao Liu wrote: >>>> Hi Zide, >>>> >>>> On Fri, May 24, 2024 at 01:00:16PM -0700, Zide Chen wrote: >>>>> Date: Fri, 24 May 2024 13:00:16 -0700 >>>>> From: Zide Chen >>>>> Subject: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before >>>>> x86_cpu_filter_features >>>>> X-Mailer: git-send-email 2.34.1 >>>>> >>>>> cpu_exec_realizefn which calls the accel-specific realizefn may expand >>>>> features. e.g., some accel-specific options may require extra features >>>>> to be enabled, and it's appropriate to expand these features in accel- >>>>> specific realizefn. >>>>> >>>>> One such example is the cpu-pm option, which may add CPUID_EXT_MONITOR. >>>>> >>>>> Thus, call cpu_exec_realizefn before x86_cpu_filter_features to ensure >>>>> that it won't expose features not supported by the host. >>>>> >>>>> Fixes: 662175b91ff2 ("i386: reorder call to cpu_exec_realizefn") >>>>> Suggested-by: Xiaoyao Li >>>>> Signed-off-by: Zide Chen >>>>> --- >>>>> target/i386/cpu.c | 24 >>>>> target/i386/kvm/kvm-cpu.c | 1 - >>>>> 2 files changed, 12 insertions(+), 13 deletions(-) >>>>> >>>>> diff --git a/target/i386/cpu.c b/target/i386/cpu.c >>>>> index bc2dceb647fa..a1c1c785bd2f 100644 >>>>> --- a/target/i386/cpu.c >>>>> +++ b/target/i386/cpu.c >>>>> @@ -7604,6 +7604,18 @@ static void x86_cpu_realizefn(DeviceState *dev, >>>>> Error **errp) >>>>> } >>>>> } >>>>> >>>>> +/* >>>>> + * note: the call to the framework needs to happen after feature >>>>> expansion, >>>>> + * but before the checks/modifications to ucode_rev, mwait, >>>>> phys_bits. >>>>> + * These may be set by the accel-specific code, >>>>> + * and the results are subsequently checked / assumed in this >>>>> function. >>>>> + */ >>>>> +cpu_exec_realizefn(cs, _err); >>>>> +if (local_err != NULL) { >>>>> +error_propagate(errp, local_err); >>>>> +return; >>>>> +} >>>>> + >>>>> x86_cpu_filter_features(cpu, cpu->check_cpuid || >>>>> cpu->enforce_cpuid); >>>> >>>> For your case, which sets cpu-pm=on via overcommit, then >>>> x86_cpu_filter_features() will complain that mwait is not supported. >>>> >>>> Such warning is not necessary, because the purpose of overcommit (from >>>> code) is only to support mwait when possible, not to commit to support >>>> mwait in Guest. >>>> >>>> Additionally, I understand x86_cpu_filter_features() is primarily >>>> intended to filter features configured by the user, >>> >>> Yes, that's why this patches intends to let x86_cpu_filter_features() >>> filter out the MWAIT bit which is set from the overcommit option. >> >> HMM, but in fact x86_cpu_filter_features() has already checked the MWAIT >> bit set by "-overcommit cpu-pm=on". ;-) >> >> (Pls correct me if I'm wrong) Revisiting what cpu-pm did to MWAIT: >> * Firstly, it set MWAIT bit in x86_cpu_expand_features(): >> x86_cpu_expand_features() >> -> x86_cpu_get_supported_feature_word() >> -> kvm_arch_get_supported_cpuid() >> This MWAIT is based on Host's MWAIT capability. This MWAIT enablement >> is fine for next x86_cpu_filter_features() and x86_cpu_filter_features() >> is working correctly here! >> >> * Then, MWAIT was secondly set in host_cpu_enable_cpu_pm() regardless >> neither Host's support or previous MWAIT enablement result. This is >> the root cause of your issue. >> >> Therefore, we should make cpu-pm honor his first MWAIT enableme
Re: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before x86_cpu_filter_features
On 6/1/2024 8:26 AM, Zhao Liu wrote: > On Fri, May 31, 2024 at 10:13:47AM -0700, Chen, Zide wrote: >> Date: Fri, 31 May 2024 10:13:47 -0700 >> From: "Chen, Zide" >> Subject: Re: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before >> x86_cpu_filter_features >> >> On 5/30/2024 11:30 PM, Zhao Liu wrote: >>> Hi Zide, >>> >>> On Fri, May 24, 2024 at 01:00:16PM -0700, Zide Chen wrote: >>>> Date: Fri, 24 May 2024 13:00:16 -0700 >>>> From: Zide Chen >>>> Subject: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before >>>> x86_cpu_filter_features >>>> X-Mailer: git-send-email 2.34.1 >>>> >>>> cpu_exec_realizefn which calls the accel-specific realizefn may expand >>>> features. e.g., some accel-specific options may require extra features >>>> to be enabled, and it's appropriate to expand these features in accel- >>>> specific realizefn. >>>> >>>> One such example is the cpu-pm option, which may add CPUID_EXT_MONITOR. >>>> >>>> Thus, call cpu_exec_realizefn before x86_cpu_filter_features to ensure >>>> that it won't expose features not supported by the host. >>>> >>>> Fixes: 662175b91ff2 ("i386: reorder call to cpu_exec_realizefn") >>>> Suggested-by: Xiaoyao Li >>>> Signed-off-by: Zide Chen >>>> --- >>>> target/i386/cpu.c | 24 >>>> target/i386/kvm/kvm-cpu.c | 1 - >>>> 2 files changed, 12 insertions(+), 13 deletions(-) >>>> >>>> diff --git a/target/i386/cpu.c b/target/i386/cpu.c >>>> index bc2dceb647fa..a1c1c785bd2f 100644 >>>> --- a/target/i386/cpu.c >>>> +++ b/target/i386/cpu.c >>>> @@ -7604,6 +7604,18 @@ static void x86_cpu_realizefn(DeviceState *dev, >>>> Error **errp) >>>> } >>>> } >>>> >>>> +/* >>>> + * note: the call to the framework needs to happen after feature >>>> expansion, >>>> + * but before the checks/modifications to ucode_rev, mwait, phys_bits. >>>> + * These may be set by the accel-specific code, >>>> + * and the results are subsequently checked / assumed in this >>>> function. >>>> + */ >>>> +cpu_exec_realizefn(cs, _err); >>>> +if (local_err != NULL) { >>>> +error_propagate(errp, local_err); >>>> +return; >>>> +} >>>> + >>>> x86_cpu_filter_features(cpu, cpu->check_cpuid || cpu->enforce_cpuid); >>> >>> For your case, which sets cpu-pm=on via overcommit, then >>> x86_cpu_filter_features() will complain that mwait is not supported. >>> >>> Such warning is not necessary, because the purpose of overcommit (from >>> code) is only to support mwait when possible, not to commit to support >>> mwait in Guest. >>> >>> Additionally, I understand x86_cpu_filter_features() is primarily >>> intended to filter features configured by the user, >> >> Yes, that's why this patches intends to let x86_cpu_filter_features() >> filter out the MWAIT bit which is set from the overcommit option. > > HMM, but in fact x86_cpu_filter_features() has already checked the MWAIT > bit set by "-overcommit cpu-pm=on". ;-) > > (Pls correct me if I'm wrong) Revisiting what cpu-pm did to MWAIT: > * Firstly, it set MWAIT bit in x86_cpu_expand_features(): > x86_cpu_expand_features() > -> x86_cpu_get_supported_feature_word() > -> kvm_arch_get_supported_cpuid() > This MWAIT is based on Host's MWAIT capability. This MWAIT enablement > is fine for next x86_cpu_filter_features() and x86_cpu_filter_features() > is working correctly here! > > * Then, MWAIT was secondly set in host_cpu_enable_cpu_pm() regardless > neither Host's support or previous MWAIT enablement result. This is > the root cause of your issue. > > Therefore, we should make cpu-pm honor his first MWAIT enablement result > instead of repeatly and unconditionally setting the MWAIT bit again in > host_cpu_enable_cpu_pm(). Yes, we don't need to set CPUID_EXT_MONITOR in host_cpu_enable_cpu_pm(). I'll drop this patch though I still think it makes sense to reorder cpu_exec_realizefn () call. > > Additionally, I think the code in x86_cpu_realizefn(): > cpu->mwait.ecx |= CPUID_MWAIT_EMX | CPUID_MWAIT_IBE; > has the similar issue because it also should check MWAIT
Re: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before x86_cpu_filter_features
On 5/30/2024 11:30 PM, Zhao Liu wrote: > Hi Zide, > > On Fri, May 24, 2024 at 01:00:16PM -0700, Zide Chen wrote: >> Date: Fri, 24 May 2024 13:00:16 -0700 >> From: Zide Chen >> Subject: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before >> x86_cpu_filter_features >> X-Mailer: git-send-email 2.34.1 >> >> cpu_exec_realizefn which calls the accel-specific realizefn may expand >> features. e.g., some accel-specific options may require extra features >> to be enabled, and it's appropriate to expand these features in accel- >> specific realizefn. >> >> One such example is the cpu-pm option, which may add CPUID_EXT_MONITOR. >> >> Thus, call cpu_exec_realizefn before x86_cpu_filter_features to ensure >> that it won't expose features not supported by the host. >> >> Fixes: 662175b91ff2 ("i386: reorder call to cpu_exec_realizefn") >> Suggested-by: Xiaoyao Li >> Signed-off-by: Zide Chen >> --- >> target/i386/cpu.c | 24 >> target/i386/kvm/kvm-cpu.c | 1 - >> 2 files changed, 12 insertions(+), 13 deletions(-) >> >> diff --git a/target/i386/cpu.c b/target/i386/cpu.c >> index bc2dceb647fa..a1c1c785bd2f 100644 >> --- a/target/i386/cpu.c >> +++ b/target/i386/cpu.c >> @@ -7604,6 +7604,18 @@ static void x86_cpu_realizefn(DeviceState *dev, Error >> **errp) >> } >> } >> >> +/* >> + * note: the call to the framework needs to happen after feature >> expansion, >> + * but before the checks/modifications to ucode_rev, mwait, phys_bits. >> + * These may be set by the accel-specific code, >> + * and the results are subsequently checked / assumed in this function. >> + */ >> +cpu_exec_realizefn(cs, _err); >> +if (local_err != NULL) { >> +error_propagate(errp, local_err); >> +return; >> +} >> + >> x86_cpu_filter_features(cpu, cpu->check_cpuid || cpu->enforce_cpuid); > > For your case, which sets cpu-pm=on via overcommit, then > x86_cpu_filter_features() will complain that mwait is not supported. > > Such warning is not necessary, because the purpose of overcommit (from > code) is only to support mwait when possible, not to commit to support > mwait in Guest. > > Additionally, I understand x86_cpu_filter_features() is primarily > intended to filter features configured by the user, Yes, that's why this patches intends to let x86_cpu_filter_features() filter out the MWAIT bit which is set from the overcommit option. > and the changes of > CPUID after x86_cpu_filter_features() should by default be regarded like > "QEMU knows what it is doing". Sure, we can add feature bits after x86_cpu_filter_features(), but I think moving cpu_exec_realizefn() before x86_cpu_filter_features() is more generic, and actually this is what QEMU did before commit 662175b91ff2. - Less redundant code. Specifically, no need to call x86_cpu_get_supported_feature_word() again. - Potentially there could be other features could be added from the accel-specific realizefn, kvm_cpu_realizefn() for example. And these features need to be checked against the host availability. > > I feel adding a check for the CPUID mwait bit in host_cpu_realizefn() > is enough, after all, this bit should be present if host supports mwait > and enable_cpu_pm (in kvm_arch_get_supported_cpuid()). Besides the above reasons, it seems to me expanding env->features in host-cpu.c is confusing. > Thanks, > Zhao >
Re: [PATCH V2 3/3] target/i386: Move host_cpu_enable_cpu_pm into kvm_cpu_realizefn()
On 5/30/2024 11:53 PM, Zhao Liu wrote: > On Fri, May 24, 2024 at 01:00:17PM -0700, Zide Chen wrote: >> Date: Fri, 24 May 2024 13:00:17 -0700 >> From: Zide Chen >> Subject: [PATCH V2 3/3] target/i386: Move host_cpu_enable_cpu_pm into >> kvm_cpu_realizefn() >> X-Mailer: git-send-email 2.34.1 >> >> It seems not a good idea to expand features in host_cpu_realizefn, >> which is for host CPU only. > > It is restricted by max_features and should be part of the "max" CPU, > and the current target/i386/host-cpu.c should only serve the “host” CPU. Yes, since this file only serves for "host" CPU, that's why I proposed to move this code to kvm_cpu_realizefn(). > >> Additionally, cpu-pm option is KVM >> specific, and it's cleaner to put it in kvm_cpu_realizefn(), together >> with the WAITPKG code. >> >> Fixes: f5cc5a5c1686 ("i386: split cpu accelerators from cpu.c, using >> AccelCPUClass") >> Signed-off-by: Zide Chen >> --- >> target/i386/host-cpu.c| 12 >> target/i386/kvm/kvm-cpu.c | 11 +-- >> 2 files changed, 9 insertions(+), 14 deletions(-) >> >> diff --git a/target/i386/host-cpu.c b/target/i386/host-cpu.c >> index 280e427c017c..8b8bf5afeccf 100644 >> --- a/target/i386/host-cpu.c >> +++ b/target/i386/host-cpu.c >> @@ -42,15 +42,6 @@ static uint32_t host_cpu_phys_bits(void) >> return host_phys_bits; >> } >> >> -static void host_cpu_enable_cpu_pm(X86CPU *cpu) >> -{ >> -CPUX86State *env = >env; >> - >> -host_cpuid(5, 0, >mwait.eax, >mwait.ebx, >> - >mwait.ecx, >mwait.edx); >> -env->features[FEAT_1_ECX] |= CPUID_EXT_MONITOR; >> -} >> - >> static uint32_t host_cpu_adjust_phys_bits(X86CPU *cpu) >> { >> uint32_t host_phys_bits = host_cpu_phys_bits(); >> @@ -83,9 +74,6 @@ bool host_cpu_realizefn(CPUState *cs, Error **errp) >> X86CPU *cpu = X86_CPU(cs); >> CPUX86State *env = >env; >> >> -if (cpu->max_features && enable_cpu_pm) { >> -host_cpu_enable_cpu_pm(cpu); >> -} >> if (env->features[FEAT_8000_0001_EDX] & CPUID_EXT2_LM) { >> uint32_t phys_bits = host_cpu_adjust_phys_bits(cpu); >> >> diff --git a/target/i386/kvm/kvm-cpu.c b/target/i386/kvm/kvm-cpu.c >> index 3adcedf0dbc3..197c892da89a 100644 >> --- a/target/i386/kvm/kvm-cpu.c >> +++ b/target/i386/kvm/kvm-cpu.c >> @@ -64,9 +64,16 @@ static bool kvm_cpu_realizefn(CPUState *cs, Error **errp) >> * cpu_common_realizefn() (via xcc->parent_realize) >> */ >> if (cpu->max_features) { >> -if (enable_cpu_pm && kvm_has_waitpkg()) { >> -env->features[FEAT_7_0_ECX] |= CPUID_7_0_ECX_WAITPKG; >> +if (enable_cpu_pm) { >> +if (kvm_has_waitpkg()) { >> +env->features[FEAT_7_0_ECX] |= CPUID_7_0_ECX_WAITPKG; >> +} > > If you agree with my comment in patch 2, here we need a mwait bit check. I still think that take advantaged of x86_cpu_filter_features() to check host availability is a better choice. > >> +host_cpuid(5, 0, >mwait.eax, >mwait.ebx, >> + >mwait.ecx, >mwait.edx); >> +env->features[FEAT_1_ECX] |= CPUID_EXT_MONITOR; >> } >> + >> if (cpu->ucode_rev == 0) { >> cpu->ucode_rev = >> kvm_arch_get_supported_msr_feature(kvm_state, >> -- >> 2.34.1 >> >>
Re: [PATCH V2 0/3] improve -overcommit cpu-pm=on|off
On 5/30/2024 6:54 AM, Zhao Liu wrote: > Hi Zide, > > On Wed, May 29, 2024 at 10:31:21AM -0700, Chen, Zide wrote: >> Date: Wed, 29 May 2024 10:31:21 -0700 >> From: "Chen, Zide" >> Subject: Re: [PATCH V2 0/3] improve -overcommit cpu-pm=on|off >> >> >> >> On 5/29/2024 5:46 AM, Igor Mammedov wrote: >>> On Tue, 28 May 2024 11:16:59 -0700 >>> "Chen, Zide" wrote: >>> >>>> On 5/28/2024 2:23 AM, Igor Mammedov wrote: >>>>> On Fri, 24 May 2024 13:00:14 -0700 >>>>> Zide Chen wrote: >>>>> >>>>>> Currently, if running "-overcommit cpu-pm=on" on hosts that don't >>>>>> have MWAIT support, the MWAIT/MONITOR feature is advertised to the >>>>>> guest and executing MWAIT/MONITOR on the guest triggers #UD. >>>>> >>>>> this is missing proper description how do you trigger issue >>>>> with reproducer and detailed description why guest sees MWAIT >>>>> when it's not supported by host. >>>> >>>> If "overcommit cpu-pm=on" and "-cpu host" are present, as shown in the >>> it's bette to provide full QEMU CLI and host/guest kernels used and what >>> hardware was used if it's relevant so others can reproduce problem. >> >> I ever reproduced this on an older Intel Icelake machine, a >> Sapphire Rapids and a Sierra Forest, but I believe this is a x86 generic >> issue, not specific to particular models. >> >> For the CLI, I think the only command line options that matter are >> -overcommit cpu-pm=on: to set enable_cpu_pm >> -cpu host: so that cpu->max_features is set >> >> For QEMU version, as long as it's after this commit: 662175b91ff2 >> ("i386: reorder call to cpu_exec_realizefn") >> >> The guest fails to boot: >> >> [ 24.825568] smpboot: x86: Booting SMP configuration: >> [ 24.826377] node #0, CPUs: #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 >> #13 #14 #15 #17 >> [ 24.985799] node #1, CPUs: #128 #129 #130 #131 #132 #133 #134 #135 >> #136 #137 #138 #139 #140 #141 #142 #143 #145 >> [ 25.136955] invalid opcode: 1 PREEMPT SMP NOPTI >> [ 25.137790] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.8.0 #2 >> [ 25.137790] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS >> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/04 >> [ 25.137790] RIP: 0010:mwait_idle+0x35/0x80 >> [ 25.137790] Code: 6f f0 80 48 02 20 48 8b 10 83 e2 08 75 3e 65 48 8b 15 >> 47 d6 56 6f 48 0f ba e2 27 72 41 31 d2 48 89 d8 >> [ 25.137790] RSP: :91403e70 EFLAGS: 00010046 >> [ 25.137790] RAX: 9140a980 RBX: 9140a980 RCX: >> >> [ 25.137790] RDX: RSI: 97f1ade21b20 RDI: >> 0004 >> [ 25.137790] RBP: R08: 0005da4709cb R09: >> 0001 >> [ 25.137790] R10: 5da4 R11: 0009 R12: >> >> [ 25.137790] R13: 98573ff90fc0 R14: 9140a038 R15: >> 00093ff0 >> [ 25.137790] FS: () GS:97f1ade0() >> knlGS: >> [ 25.137790] CS: 0010 DS: ES: CR0: 80050033 >> [ 25.137790] CR2: 97d8aa801000 CR3: 0049e9430001 CR4: >> 00770ef0 >> [ 25.137790] DR0: DR1: DR2: >> >> [ 25.137790] DR3: DR6: 07f0 DR7: >> 0400 >> [ 25.137790] PKRU: 5554 >> [ 25.137790] Call Trace: >> [ 25.137790] >> [ 25.137790] ? die+0x37/0x90 >> [ 25.137790] ? do_trap+0xe3/0x110 >> [ 25.137790] ? mwait_idle+0x35/0x80 >> [ 25.137790] ? do_error_trap+0x6a/0x90 >> [ 25.137790] ? mwait_idle+0x35/0x80 >> [ 25.137790] ? exc_invalid_op+0x52/0x70 >> [ 25.137790] ? mwait_idle+0x35/0x80 >> [ 25.137790] ? asm_exc_invalid_op+0x1a/0x20 >> [ 25.137790] ? mwait_idle+0x35/0x80 >> [ 25.137790] default_idle_call+0x30/0x100 >> [ 25.137790] cpuidle_idle_call+0x12c/0x170 >> [ 25.137790] ? tsc_verify_tsc_adjust+0x73/0xd0 >> [ 25.137790] do_idle+0x7f/0xd0 >> [ 25.137790] cpu_startup_entry+0x29/0x30 >> [ 25.137790] rest_init+0xcc/0xd0 >> [ 25.137790] start_kernel+0x396/0x5d0 >> [ 25.137790] x86_64_start_reservations+0x18/0x30 >> [ 25.137790] x86_64_start_kernel+0xe7/0xf0 >> [ 25.137790] common_startup_64+0x13e/0x148 >> [ 25.137790] >> [ 25.137790] Modules linked in: >> [ 25.137790] --[ end trace ]-- >&
Re: [PATCH V2 0/3] improve -overcommit cpu-pm=on|off
On 5/29/2024 5:46 AM, Igor Mammedov wrote: > On Tue, 28 May 2024 11:16:59 -0700 > "Chen, Zide" wrote: > >> On 5/28/2024 2:23 AM, Igor Mammedov wrote: >>> On Fri, 24 May 2024 13:00:14 -0700 >>> Zide Chen wrote: >>> >>>> Currently, if running "-overcommit cpu-pm=on" on hosts that don't >>>> have MWAIT support, the MWAIT/MONITOR feature is advertised to the >>>> guest and executing MWAIT/MONITOR on the guest triggers #UD. >>> >>> this is missing proper description how do you trigger issue >>> with reproducer and detailed description why guest sees MWAIT >>> when it's not supported by host. >> >> If "overcommit cpu-pm=on" and "-cpu host" are present, as shown in the > it's bette to provide full QEMU CLI and host/guest kernels used and what > hardware was used if it's relevant so others can reproduce problem. I ever reproduced this on an older Intel Icelake machine, a Sapphire Rapids and a Sierra Forest, but I believe this is a x86 generic issue, not specific to particular models. For the CLI, I think the only command line options that matter are -overcommit cpu-pm=on: to set enable_cpu_pm -cpu host: so that cpu->max_features is set For QEMU version, as long as it's after this commit: 662175b91ff2 ("i386: reorder call to cpu_exec_realizefn") The guest fails to boot: [ 24.825568] smpboot: x86: Booting SMP configuration: [ 24.826377] node #0, CPUs: #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 #13 #14 #15 #17 [ 24.985799] node #1, CPUs: #128 #129 #130 #131 #132 #133 #134 #135 #136 #137 #138 #139 #140 #141 #142 #143 #145 [ 25.136955] invalid opcode: 1 PREEMPT SMP NOPTI [ 25.137790] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.8.0 #2 [ 25.137790] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/04 [ 25.137790] RIP: 0010:mwait_idle+0x35/0x80 [ 25.137790] Code: 6f f0 80 48 02 20 48 8b 10 83 e2 08 75 3e 65 48 8b 15 47 d6 56 6f 48 0f ba e2 27 72 41 31 d2 48 89 d8 [ 25.137790] RSP: :91403e70 EFLAGS: 00010046 [ 25.137790] RAX: 9140a980 RBX: 9140a980 RCX: [ 25.137790] RDX: RSI: 97f1ade21b20 RDI: 0004 [ 25.137790] RBP: R08: 0005da4709cb R09: 0001 [ 25.137790] R10: 5da4 R11: 0009 R12: [ 25.137790] R13: 98573ff90fc0 R14: 9140a038 R15: 00093ff0 [ 25.137790] FS: () GS:97f1ade0() knlGS: [ 25.137790] CS: 0010 DS: ES: CR0: 80050033 [ 25.137790] CR2: 97d8aa801000 CR3: 0049e9430001 CR4: 00770ef0 [ 25.137790] DR0: DR1: DR2: [ 25.137790] DR3: DR6: 07f0 DR7: 0400 [ 25.137790] PKRU: 5554 [ 25.137790] Call Trace: [ 25.137790] [ 25.137790] ? die+0x37/0x90 [ 25.137790] ? do_trap+0xe3/0x110 [ 25.137790] ? mwait_idle+0x35/0x80 [ 25.137790] ? do_error_trap+0x6a/0x90 [ 25.137790] ? mwait_idle+0x35/0x80 [ 25.137790] ? exc_invalid_op+0x52/0x70 [ 25.137790] ? mwait_idle+0x35/0x80 [ 25.137790] ? asm_exc_invalid_op+0x1a/0x20 [ 25.137790] ? mwait_idle+0x35/0x80 [ 25.137790] default_idle_call+0x30/0x100 [ 25.137790] cpuidle_idle_call+0x12c/0x170 [ 25.137790] ? tsc_verify_tsc_adjust+0x73/0xd0 [ 25.137790] do_idle+0x7f/0xd0 [ 25.137790] cpu_startup_entry+0x29/0x30 [ 25.137790] rest_init+0xcc/0xd0 [ 25.137790] start_kernel+0x396/0x5d0 [ 25.137790] x86_64_start_reservations+0x18/0x30 [ 25.137790] x86_64_start_kernel+0xe7/0xf0 [ 25.137790] common_startup_64+0x13e/0x148 [ 25.137790] [ 25.137790] Modules linked in: [ 25.137790] --[ end trace ]-- [ 25.137790] invalid opcode: 2 PREEMPT SMP NOPTI [ 25.137790] RIP: 0010:mwait_idle+0x35/0x80 [ 25.137790] Code: 6f f0 80 48 02 20 48 8b 10 83 e2 08 75 3e 65 48 8b 15 47 d6 56 6f 48 0f ba e2 27 72 41 31 d2 48 89 d8 > >> following, CPUID_EXT_MONITOR is set after x86_cpu_filter_features(), so >> that it doesn't have a chance to check MWAIT against host features and >> will be advertised to the guest regardless of whether it's supported by >> the host or not. >> >> x86_cpu_realizefn() >> x86_cpu_filter_features() >> cpu_exec_realizefn() >> kvm_cpu_realizefn >> host_cpu_realizefn >> host_cpu_enable_cpu_pm >> env->features[FEAT_1_ECX] |= CPUID_EXT_MONITOR; >> >> >> If it's not supported by the host, executing MONITOR or MWAIT >> instructions from the guest triggers #UD, no matter MWAIT_EXITING >> control is set or not. > > If I recall right, kvm was able to emulate mwait/monitor. > So question is why it leads to exception instead? KVM can
Re: [PATCH V2 0/3] improve -overcommit cpu-pm=on|off
On 5/28/2024 2:23 AM, Igor Mammedov wrote: > On Fri, 24 May 2024 13:00:14 -0700 > Zide Chen wrote: > >> Currently, if running "-overcommit cpu-pm=on" on hosts that don't >> have MWAIT support, the MWAIT/MONITOR feature is advertised to the >> guest and executing MWAIT/MONITOR on the guest triggers #UD. > > this is missing proper description how do you trigger issue > with reproducer and detailed description why guest sees MWAIT > when it's not supported by host. If "overcommit cpu-pm=on" and "-cpu hpst" are present, as shown in the following, CPUID_EXT_MONITOR is set after x86_cpu_filter_features(), so that it doesn't have a chance to check MWAIT against host features and will be advertised to the guest regardless of whether it's supported by the host or not. x86_cpu_realizefn() x86_cpu_filter_features() cpu_exec_realizefn() kvm_cpu_realizefn host_cpu_realizefn host_cpu_enable_cpu_pm env->features[FEAT_1_ECX] |= CPUID_EXT_MONITOR; If it's not supported by the host, executing MONITOR or MWAIT instructions from the guest triggers #UD, no matter MWAIT_EXITING control is set or not.
Re: [PATCH 1/3] vl: Allow multiple -overcommit commands
On 5/20/2024 10:16 PM, Thomas Huth wrote: > On 21/05/2024 07.08, Thomas Huth wrote: >> On 20/05/2024 19.47, Zide Chen wrote: >>> Both cpu-pm and mem-lock are related to system resource overcommit, but >>> they are separate from each other, in terms of how they are realized, >>> and of course, they are applied to different system resources. >>> >>> It's tempting to use separate command lines to specify their behavior. >>> e.g., in the following example, the cpu-pm command is quietly >>> overwritten, and it's not easy to notice it without careful inspection. >>> >>> --overcommit mem-lock=on >>> --overcommit cpu-pm=on >>> >>> Fixes: c8c9dc42b7ca ("Remove the deprecated -realtime option") >>> Signed-off-by: Zide Chen >>> --- >>> system/vl.c | 8 ++-- >>> 1 file changed, 6 insertions(+), 2 deletions(-) >>> >>> diff --git a/system/vl.c b/system/vl.c >>> index a3eede5fa5b8..ed682643805b 100644 >>> --- a/system/vl.c >>> +++ b/system/vl.c >>> @@ -3545,8 +3545,12 @@ void qemu_init(int argc, char **argv) >>> if (!opts) { >>> exit(1); >>> } >>> - enable_mlock = qemu_opt_get_bool(opts, "mem-lock", >>> false); >>> - enable_cpu_pm = qemu_opt_get_bool(opts, "cpu-pm", >>> false); >>> + >>> + /* Don't override the -overcommit option if set */ >>> + enable_mlock = enable_mlock || >>> + qemu_opt_get_bool(opts, "mem-lock", false); >>> + enable_cpu_pm = enable_cpu_pm || >>> + qemu_opt_get_bool(opts, "cpu-pm", false); >>> break; >>> case QEMU_OPTION_compat: >>> { >> >> Reviewed-by: Thomas Huth > > Ah, wait, actually, this is a bad idea, too, since now you cannot > disable an enabled option anymore. Imagine that you have for example > enabled the option in the config file, and now you'd like to disable it > on the command line again - you're stuck with the enabled setting in > that case. > > I think the better solution is to replace the "false" default value at > the end: > > enable_mlock = qemu_opt_get_bool(opts, "mem-lock", enable_mlock); > enable_cpu_pm = qemu_opt_get_bool(opts, "cpu-pm", enable_cpu_pm); > > What do you think about this? This is great! Thank you very much! I will update it in V2. > > Thomas >
Re: [PATCH 1/3] vl: Allow multiple -overcommit commands
On 5/20/2024 10:16 PM, Thomas Huth wrote: > On 21/05/2024 07.08, Thomas Huth wrote: >> On 20/05/2024 19.47, Zide Chen wrote: >>> Both cpu-pm and mem-lock are related to system resource overcommit, but >>> they are separate from each other, in terms of how they are realized, >>> and of course, they are applied to different system resources. >>> >>> It's tempting to use separate command lines to specify their behavior. >>> e.g., in the following example, the cpu-pm command is quietly >>> overwritten, and it's not easy to notice it without careful inspection. >>> >>> --overcommit mem-lock=on >>> --overcommit cpu-pm=on >>> >>> Fixes: c8c9dc42b7ca ("Remove the deprecated -realtime option") >>> Signed-off-by: Zide Chen >>> --- >>> system/vl.c | 8 ++-- >>> 1 file changed, 6 insertions(+), 2 deletions(-) >>> >>> diff --git a/system/vl.c b/system/vl.c >>> index a3eede5fa5b8..ed682643805b 100644 >>> --- a/system/vl.c >>> +++ b/system/vl.c >>> @@ -3545,8 +3545,12 @@ void qemu_init(int argc, char **argv) >>> if (!opts) { >>> exit(1); >>> } >>> - enable_mlock = qemu_opt_get_bool(opts, "mem-lock", >>> false); >>> - enable_cpu_pm = qemu_opt_get_bool(opts, "cpu-pm", >>> false); >>> + >>> + /* Don't override the -overcommit option if set */ >>> + enable_mlock = enable_mlock || >>> + qemu_opt_get_bool(opts, "mem-lock", false); >>> + enable_cpu_pm = enable_cpu_pm || >>> + qemu_opt_get_bool(opts, "cpu-pm", false); >>> break; >>> case QEMU_OPTION_compat: >>> { >> >> Reviewed-by: Thomas Huth > > Ah, wait, actually, this is a bad idea, too, since now you cannot > disable an enabled option anymore. Imagine that you have for example > enabled the option in the config file, and now you'd like to disable it > on the command line again - you're stuck with the enabled setting in > that case. > > I think the better solution is to replace the "false" default value at > the end: > > enable_mlock = qemu_opt_get_bool(opts, "mem-lock", enable_mlock); > enable_cpu_pm = qemu_opt_get_bool(opts, "cpu-pm", enable_cpu_pm); > > What do you think about this? Thank you very much! This is a very good idea! I can update it in V2. > Thomas >
Re: [PATCH 1/3] vl: Allow multiple -overcommit commands
On 5/20/2024 10:16 PM, Thomas Huth wrote: > On 21/05/2024 07.08, Thomas Huth wrote: >> On 20/05/2024 19.47, Zide Chen wrote: >>> Both cpu-pm and mem-lock are related to system resource overcommit, but >>> they are separate from each other, in terms of how they are realized, >>> and of course, they are applied to different system resources. >>> >>> It's tempting to use separate command lines to specify their behavior. >>> e.g., in the following example, the cpu-pm command is quietly >>> overwritten, and it's not easy to notice it without careful inspection. >>> >>> --overcommit mem-lock=on >>> --overcommit cpu-pm=on >>> >>> Fixes: c8c9dc42b7ca ("Remove the deprecated -realtime option") >>> Signed-off-by: Zide Chen >>> --- >>> system/vl.c | 8 ++-- >>> 1 file changed, 6 insertions(+), 2 deletions(-) >>> >>> diff --git a/system/vl.c b/system/vl.c >>> index a3eede5fa5b8..ed682643805b 100644 >>> --- a/system/vl.c >>> +++ b/system/vl.c >>> @@ -3545,8 +3545,12 @@ void qemu_init(int argc, char **argv) >>> if (!opts) { >>> exit(1); >>> } >>> - enable_mlock = qemu_opt_get_bool(opts, "mem-lock", >>> false); >>> - enable_cpu_pm = qemu_opt_get_bool(opts, "cpu-pm", >>> false); >>> + >>> + /* Don't override the -overcommit option if set */ >>> + enable_mlock = enable_mlock || >>> + qemu_opt_get_bool(opts, "mem-lock", false); >>> + enable_cpu_pm = enable_cpu_pm || >>> + qemu_opt_get_bool(opts, "cpu-pm", false); >>> break; >>> case QEMU_OPTION_compat: >>> { >> >> Reviewed-by: Thomas Huth > > Ah, wait, actually, this is a bad idea, too, since now you cannot > disable an enabled option anymore. Imagine that you have for example > enabled the option in the config file, and now you'd like to disable it > on the command line again - you're stuck with the enabled setting in > that case. > > I think the better solution is to replace the "false" default value at > the end: > > enable_mlock = qemu_opt_get_bool(opts, "mem-lock", enable_mlock); > enable_cpu_pm = qemu_opt_get_bool(opts, "cpu-pm", enable_cpu_pm); > > What do you think about this? Thank you very much! This is a very good idea, and I can update it in V2. > > Thomas >
Re: [PATCH 1/6] target/i386/kvm: Add feature bit definitions for KVM CPUID
On 4/26/2024 3:07 AM, Zhao Liu wrote: > Add feature definiations for KVM_CPUID_FEATURES in CPUID ( > CPUID[4000_0001].EAX and CPUID[4000_0001].EDX), to get rid of lots of > offset calculations. > > Signed-off-by: Zhao Liu > --- > v2: Changed the prefix from CPUID_FEAT_KVM_* to CPUID_KVM_*. (Xiaoyao) > --- > hw/i386/kvm/clock.c | 5 ++--- > target/i386/cpu.h | 23 +++ > target/i386/kvm/kvm.c | 28 ++-- > 3 files changed, 39 insertions(+), 17 deletions(-) > > diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c > index 40aa9a32c32c..ce416c05a3d0 100644 > --- a/hw/i386/kvm/clock.c > +++ b/hw/i386/kvm/clock.c > @@ -27,7 +27,6 @@ > #include "qapi/error.h" > > #include > -#include "standard-headers/asm-x86/kvm_para.h" > #include "qom/object.h" > > #define TYPE_KVM_CLOCK "kvmclock" > @@ -334,8 +333,8 @@ void kvmclock_create(bool create_always) > > assert(kvm_enabled()); > if (create_always || > -cpu->env.features[FEAT_KVM] & ((1ULL << KVM_FEATURE_CLOCKSOURCE) | > - (1ULL << KVM_FEATURE_CLOCKSOURCE2))) { > +cpu->env.features[FEAT_KVM] & (CPUID_KVM_CLOCK | > + CPUID_KVM_CLOCK2)) { To achieve this purpose, how about doing the alternative to define an API similar to KVM's guest_pv_has()? _has() is simpler and clearer than "features[] & CPUID_x", additionally, this helps to keep the definitions identical to KVM, more readable and easier for future maintenance.