On Thu, Aug 7, 2025 at 5:38 AM Xiaoyao Li <xiaoyao...@intel.com> wrote:
>
> On 8/7/2025 3:18 AM, Daniel P. Berrangé wrote:
> > On Wed, Aug 06, 2025 at 07:57:34PM +0200, Christian Ehrhardt wrote:
> >> On Wed, Aug 6, 2025 at 2:00 PM Daniel P. Berrangé <berra...@redhat.com> wrote:
> >>>
> >>> On Wed, Aug 06, 2025 at 01:52:17PM +0200, Christian Ehrhardt wrote:
> >>>> Hi,
> >>>> I was unsure whether this would be better sent to libvirt or qemu; the
> >>>> issue is somewhere between libvirt modelling CPUs and qemu 10.1
> >>>> behaving differently. I did not want to double post, and fortunately most
> >>>> people are on both lists. Since the problem appears or disappears when
> >>>> switching between qemu 10.0 and 10.1, let me start here. I beg your pardon
> >>>> for not yet having all the answers; I'm sure I could find more with
> >>>> debugging, but I also wanted to report early for your awareness while we
> >>>> are still in the RC phase.
> >>>>
> >>>>
> >>>> # Problem
> >>>>
> >>>> What I found when testing migrations in Ubuntu with qemu 10.1-rc1 was:
> >>>>    error: operation failed: guest CPU doesn't match specification: missing features: pdcm
> >>>>
> >>>> The behavior is the same with libvirt 11.4 and the more recent 11.6.
> >>>> But switching back to qemu 10.0 confirmed that this behavior is new
> >>>> with qemu 10.1-rc.
> >>>
> >>>
> >>>> Without any hard evidence against them yet, I found a few PDCM-related
> >>>> commits between 10.0 and 10.1-rc1:
> >>>>    7ff24fb65 i386/tdx: Don't mask off CPUID_EXT_PDCM
> >>>>    00268e000 i386/cpu: Warn about why CPUID_EXT_PDCM is not available
> >>>>    e68ec2980 i386/cpu: Move adjustment of CPUID_EXT_PDCM before feature_dependencies[] check
> >>>>    0ba06e46d i386/tdx: Add TDX fixed1 bits to supported CPUIDs
> >>>>    0ba06e46d i386/tdx: Add TDX fixed1 bits to supported CPUIDs
> >>>>
> >>>>
> >>>> # Caveat
> >>>>
> >>>> My test environment runs in LXD system containers, which causes issues
> >>>> with power management detection:
> >>>>    libvirtd[406]: error from service: GDBus.Error:System.Error.EROFS: Read-only file system
> >>>>    libvirtd[406]: Failed to get host power management capabilities
> >>>
> >>> That's harmless.
> >>
> >> Yeah, it always was for me - thanks for confirming.
> >>
> >>>> And the resulting host-model on a rather old test server therefore has:
> >>>>    <cpu mode='custom' match='exact' check='full'>
> >>>>      <model fallback='forbid'>Haswell-noTSX-IBRS</model>
> >>>>      <vendor>Intel</vendor>
> >>>>      <feature policy='require' name='vmx'/>
> >>>>      <feature policy='disable' name='pdcm'/>
> >>>>       ...
> >>>>
> >>>> That was fine in the past; save/restore and migrations only started to
> >>>> break now, with the new qemu 10.1-rc.
> >>>>
> >>>> # Next steps
> >>>>
> >>>> I'll soon be swamped by meetings for the rest of the day, but I'd be
> >>>> curious whether anyone has a suggestion on what to look at next for
> >>>> debugging, or a theory about what might be going wrong. If nothing else
> >>>> comes up, I'll try to set up a bisect run tomorrow.
> >>>
> >>> Yeah, git bisect is what I'd start with.
> >>
> >> Bisect complete; it identified this commit:
> >>
> >> commit 00268e00027459abede448662f8794d78eb4b0a4
> >> Author: Xiaoyao Li <xiaoyao...@intel.com>
> >> Date:   Tue Mar 4 00:24:50 2025 -0500
> >>
> >>      i386/cpu: Warn about why CPUID_EXT_PDCM is not available
> >>
> >>      When user requests PDCM explicitly via "+pdcm" without PMU enabled, emit
> >>      a warning to inform the user.
> >>
> >>      Signed-off-by: Xiaoyao Li <xiaoyao...@intel.com>
> >>      Reviewed-by: Zhao Liu <zhao1....@intel.com>
> >>      Link: https://lore.kernel.org/r/20250304052450.465445-3-xiaoyao...@intel.com
> >>      Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
> >>
> >>   target/i386/cpu.c | 3 +++
> >>   1 file changed, 3 insertions(+)
> >>
> >>
> >>
> >> Which is odd, as it should only add a warning, right?
> >
> > No, that commit message is misleading.
> >
> > IIUC mark_unavailable_features() actively blocks usage of the feature,
> > so it is a functional change, not merely emitting a warning.
> >
> > It makes me wonder whether that commit was actually intended to block the
> > feature, or merely to warn? CC'ing those involved in the
> > commit.
>
> The intention was to print a warning telling users that PDCM cannot be
> enabled if PMU is not enabled. However, mark_unavailable_features() does
> have the side effect of setting the bit in cpu->filtered_features[].
>
> But the feature is masked off anyway

Right - it was disabled right from the beginning.
As I reported, libvirt detected it as unavailable and constructed the
CPU definition with it disabled, which translates to
-cpu ...,pdcm=off,...

What is new, and what we need to fix, is that under these conditions
this now somehow breaks save/restore and migration.

As a cross-check, I reverted only 00268e0002 on top of
10.1-rc2, and these use cases work again.
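To make Daniel's point concrete (mark_unavailable_features() is a functional change, not just a warning), here is a simplified, hypothetical C model; it is not QEMU's code. Only the names FEAT_1_ECX, CPUID_EXT_PDCM, features[] and filtered_features[] mirror target/i386; ModelCPU, mask_only() and mark_unavailable() are invented for illustration, and the real features[]/filtered_features[] live in two different structs. It contrasts the pre-existing masking with roughly what mark_unavailable_features() adds.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified model: QEMU keeps features[] in CPUX86State and
 * filtered_features[] in X86CPU; here both live in one struct. */
enum { FEAT_1_EDX, FEAT_1_ECX, FEATURE_WORDS };

#define CPUID_EXT_PDCM (1U << 15) /* CPUID.01H:ECX bit 15 */

typedef struct {
    uint32_t features[FEATURE_WORDS];          /* bits exposed to the guest */
    uint32_t filtered_features[FEATURE_WORDS]; /* bits reported as filtered */
} ModelCPU;

/* Before the commit: the bit is silently cleared from the guest-visible
 * word and nothing else is recorded. */
static void mask_only(ModelCPU *cpu, int w, uint32_t mask)
{
    assert(w >= 0 && w < FEATURE_WORDS);
    cpu->features[w] &= ~mask;
}

/* Roughly what mark_unavailable_features() does: clear the bit, ALSO record
 * it in filtered_features[] (visible to other code), and print a warning. */
static void mark_unavailable(ModelCPU *cpu, int w, uint32_t mask,
                             const char *reason)
{
    assert(w >= 0 && w < FEATURE_WORDS);
    cpu->features[w] &= ~mask;
    cpu->filtered_features[w] |= mask;
    fprintf(stderr, "warning: %s\n", reason);
}
```

In this model both paths leave the guest-visible word identical; the only observable difference is the extra bit in filtered_features[], which is exactly what Xiaoyao asks about below.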

> even without mark_unavailable_features():
>
>      env->features[FEAT_1_ECX] &= ~CPUID_EXT_PDCM;
>
> So is it PDCM being set in cpu->filtered_features[] that causes the problem?
>
> > With regards,
> > Daniel
>
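One way an already-masked bit could still change the outcome, sketched as a purely hypothetical C model (this is NOT libvirt's actual check; FeatureReq, Policy and the count_missing_* helpers are invented for illustration): a consumer that treats every listed feature found in the filtered set as missing would report pdcm even though the guest definition has <feature policy='disable' name='pdcm'/>, whereas a policy-aware comparison would not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical management-layer check; this is NOT libvirt's code. */
typedef enum { POLICY_REQUIRE, POLICY_DISABLE } Policy;

typedef struct {
    const char *name;
    uint32_t bit;   /* bit in CPUID.01H:ECX */
    Policy policy;  /* what the guest CPU definition asks for */
} FeatureReq;

#define CPUID_EXT_VMX  (1U << 5)
#define CPUID_EXT_PDCM (1U << 15)

/* Naive: every listed feature found in the filtered set counts as missing,
 * even one the definition explicitly disables. */
static size_t count_missing_naive(const FeatureReq *reqs, size_t n,
                                  uint32_t filtered)
{
    assert(reqs != NULL || n == 0);
    size_t missing = 0;
    for (size_t i = 0; i < n; i++) {
        if (filtered & reqs[i].bit)
            missing++;
    }
    return missing;
}

/* Policy-aware: a filtered feature only matters if it was required. */
static size_t count_missing_policy_aware(const FeatureReq *reqs, size_t n,
                                         uint32_t filtered)
{
    assert(reqs != NULL || n == 0);
    size_t missing = 0;
    for (size_t i = 0; i < n; i++) {
        if (reqs[i].policy == POLICY_REQUIRE && (filtered & reqs[i].bit))
            missing++;
    }
    return missing;
}
```

With vmx required, pdcm disabled and only PDCM filtered, the naive variant reports one missing feature and the policy-aware one reports none. Whether the real code path behaves like either variant is the open question in this thread; the sketch only illustrates why recording the bit in filtered_features[] could matter.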


-- 
Christian Ehrhardt
Director of Engineering, Ubuntu Server
Canonical Ltd
