> -----Original Message-----
> From: Igor Mammedov <imamm...@redhat.com>
> Sent: Friday, August 28, 2020 3:43 AM
> To: Moger, Babu <babu.mo...@amd.com>
> Cc: Dr. David Alan Gilbert <dgilb...@redhat.com>; ehabk...@redhat.com;
> m...@redhat.com; qemu-devel@nongnu.org; pbonz...@redhat.com;
> r...@twiddle.net
> Subject: Re: [PATCH v5 0/8] Remove EPYC mode apicid decode and use
> generic decode
> 
> On Thu, 27 Aug 2020 17:58:01 -0500
> Babu Moger <babu.mo...@amd.com> wrote:
> 
> > > -----Original Message-----
> > > From: Igor Mammedov <imamm...@redhat.com>
> > > Sent: Thursday, August 27, 2020 4:19 PM
> > > To: Dr. David Alan Gilbert <dgilb...@redhat.com>
> > > Cc: ehabk...@redhat.com; m...@redhat.com; qemu-devel@nongnu.org;
> > > Moger, Babu <babu.mo...@amd.com>; pbonz...@redhat.com;
> > > r...@twiddle.net
> > > Subject: Re: [PATCH v5 0/8] Remove EPYC mode apicid decode and use
> > > generic decode
> > >
> > > On Wed, 26 Aug 2020 15:10:46 +0100
> > > "Dr. David Alan Gilbert" <dgilb...@redhat.com> wrote:
> > >
> > > > * Igor Mammedov (imamm...@redhat.com) wrote:
> > > > > On Tue, 25 Aug 2020 16:25:21 +0100 "Dr. David Alan Gilbert"
> > > > > <dgilb...@redhat.com> wrote:
> > > > >
> > > > > > * Igor Mammedov (imamm...@redhat.com) wrote:
> > > > > > > On Tue, 25 Aug 2020 09:15:04 +0100 "Dr. David Alan Gilbert"
> > > > > > > <dgilb...@redhat.com> wrote:
> > > > > > >
> > > > > > > > * Babu Moger (babu.mo...@amd.com) wrote:
> > > > > > > > > Hi Dave,
> > > > > > > > >
> > > > > > > > > On 8/24/20 1:41 PM, Dr. David Alan Gilbert wrote:
> > > > > > > > > > * Babu Moger (babu.mo...@amd.com) wrote:
> > > > > > > > > >> To support some of the complex topology, we
> > > > > > > > > >> introduced EPYC
> > > mode apicid decode.
> > > > > > > > > >> But, EPYC mode decode is running into problems. Also
> > > > > > > > > >> it can become quite a maintenance problem in the
> > > > > > > > > >> future. So, it was decided to remove that code and
> > > > > > > > > >> use the generic decode which works for majority of
> > > > > > > > > >> the topology. Most of the SPECed configuration would
> > > > > > > > > >> work just fine. With some
> > > non-SPECed user inputs, it will create some sub-optimal configuration.
> > > > > > > > > >> Here is the discussion thread.
> > > > > > > > > >> https://nam11.safelinks.protection.outlook.com/?url=h
> > > > > > > > > >> ttps
> > > > > > > > > >> %3A%2F%2Flore.kernel.org%2Fqemu-devel%2Fc0bcc1a6-
> > > 1d84-a6e
> > > > > > > > > >> 7-e468-
> > > d5b437c1b254%40amd.com%2F&amp;data=02%7C01%7Cbabu.
> > > > > > > > > >>
> > > moger%40amd.com%7C9b15ee395daa4935640408d84acedf13%7C3dd8
> > > > > > > > > >>
> > > 961fe4884e608e11a82d994e183d%7C0%7C0%7C637341599663177545
> > > > > > > > > >>
> > > &amp;sdata=4okYGU%2F8QTYqEOZEd1EBC%2BEsIIrEV59HZrHzpbsR8s
> > > > > > > > > >> U%3D&amp;reserved=0
> > > > > > > > > >>
> > > > > > > > > >> This series removes all the EPYC mode specific apicid
> > > > > > > > > >> changes
> > > and use the generic
> > > > > > > > > >> apicid decode.
> > > > > > > > > >
> > > > > > > > > > Hi Babu,
> > > > > > > > > >   This does simplify things a lot!
> > > > > > > > > > One worry, what happens about a live migration of a VM
> > > > > > > > > > from
> > > an old qemu
> > > > > > > > > > that was using the node-id to a qemu with this new scheme?
> > > > > > > > >
> > > > > > > > > The node_id which we introduced was only used
> > > > > > > > > internally. This
> > > wasn't
> > > > > > > > > exposed outside. I don't think live migration will be an 
> > > > > > > > > issue.
> > > > > > > >
> > > > > > > > Didn't it become part of the APIC ID visible to the guest?
> > > > > > >
> > > > > > > Daniel asked similar question wrt hard error on start up,
> > > > > > > when CLI is not sufficient to create EPYC cpu.
> > > > > > >
> > > > > > >
> > > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%25
> > > > > > > 2Fwww.mail-archive.com%2Fqemu-
> > > devel%40nongnu.org%2Fmsg728536.htm
> > > > > > >
> > >
> l&amp;data=02%7C01%7Cbabu.moger%40amd.com%7C9b15ee395daa49356
> > > 404
> > > > > > >
> > >
> 08d84acedf13%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C63734
> > > 1
> > > > > > >
> > >
> 599663177545&amp;sdata=OnHz23W4F4TdYwlxPZwC%2B8YRY1K3qJ5U9Sfdo
> > > Oc
> > > > > > > GXtw%3D&amp;reserved=0
> > > > > > >
> > > > > > > Migration might fall into the same category.
> > > > > > > Also looking at the history, 5.0 commit
> > > > > > >   247b18c593ec29 target/i386: Enable new apic id encoding
> > > > > > > for EPYC based cpus models silently broke APIC ID (without
> > > > > > > versioning),
> > > for all EPYC models (that's were 1 new and 1 old one).
> > > > > > >
> > > > > > > (I'm not aware of somebody complaining about it)
> > > > > > >
> > > > > > > Another commit ed78467a21459, changed CPUID_8000_001E
> > > > > > > without
> > > versioning as well.
> > > > > > >
> > > > > > >
> > > > > > > With current EPYC apicid code, if all starts align (no numa
> > > > > > > or 1 numa node only on CLI and no -smp dies=) it might
> > > > > > > produce a valid
> > > CPU (apicid+CPUID_8000_001E).
> > > > > > > No numa is gray area, since EPYC spec implies that it has to
> > > > > > > be numa
> > > machine in case of real EPYC cpus.
> > > > > > > Multi-node configs would be correct only if user assigns
> > > > > > > cpus to numa nodes by duplicating internal node_id algorithm
> > > > > > > that this series
> > > removes.
> > > > > > >
> > > > > > > There might be other broken cases that I don't recall
> > > > > > > anymore (should be mentioned in previous versions of this
> > > > > > > series)
> > > > > > >
> > > > > > >
> > > > > > > To summarize from migration pov (ignoring ed78467a21459
> change):
> > > > > > >
> > > > > > >  1) old qemu pre-5.0 ==>  qemu 5.0, 5.1 - broken migration
> > > > > >
> > > > > > Oh ....
> > > > > >
> > > > > > >  2) with this series (lets call it qemu 5.2)
> > > > > > >      pre-5.0 ==> qemu 5.2 - should work as series basically
> > > > > > > rollbacks
> > > current code to pre-5.0
> > > > > > >      qemu 5.0, 5.1 ==> qemu 5.2 - broken
> > > > > > >
> > > > > > > It's all about picking which poison to choose, I'd preffer
> > > > > > > 2nd case as it lets drop a lot of complicated code that
> > > > > > > doesn't work as expected.
> > > > > >
> > > > > > I think that would make our lives easier for other reasons; so
> > > > > > I'm happy to go with that.
> > > > >
> > > > > to make things less painful for users, me wonders if there is a
> > > > > way to block migration if epyc and specific QEMU versions are used?
> > > >
> > > > We have no way to block based on version - and that's a pretty
> > > > painful thing to do; we can block based on machine type.
> > > >
> > > > But before we get there; can we understand in which combinations
> > > > that things break and why exactly - would it break on a 1 or 2
> > > > vCPU guest - or would it only break when we get to the point the
> > > > upper bits start being used for example?  Why exaclty would it
> > > > break - i.e. is it going to change the name of sections in the
> > > > migration stream - or are the values we need actually going to migrate
> OK?
> > >
> > > it's values of APIC ID, where 4.2 and 5.0 QEMU use different values
> > > if numa is enabled.
> > > I'd expect guest to be very confused in when this happens.
> > >
> > > here is an example:
> > > qemu-4.2 -cpu EPYC -smp 8,sockets=1,cores=8 -numa node,cpus=0-3
> > > -numa
> > > node,cpus=4-7
> > >
> > > (QEMU) qom-get path=/machine/unattached/device[8] property=apic-id
> {
> > >     "return": 7
> > > }
> > >
> > > vs
> > >
> > > qemu-5.1 -cpu EPYC -smp 8,sockets=1,cores=8 -numa node,cpus=0-3
> > > -numa
> > > node,cpus=4-7
> > > (QEMU) qom-get path=/machine/unattached/device[8] property=apic-id
> {
> > >     "return": 15
> > > }
> > >
> > > we probably can't do anything based on machine type versions, as
> > > 4.2 and older versions on qemu-5.0 and newer use different algorithm
> > > to calculate apic-id.
> > >
> > > Hence was suggestion to leave 5.0/5.1 with broken apic id and revert
> > > back to
> > > 4.2 algorithm, which should encode APIC ID correctly when '-smp
> > > dies' is used.
> >
> > That is correct. When we revert all the node_id related changes, we
> > will go back to 4.2 algorithm. It will work fine with user passing
> > "-smp dies=n". It also keeps the code simple. That is why I kept the
> > decoding of 0x8000001e like this below. This will also match apicid 
> > decoding.
> >
> > *ecx = ((topo_info->dies_per_pkg - 1) << 8) |  ((cpu->apic_id >>
> > apicid_die_offset(topo_info)) & 0xFF);
> that will work when there is no -numa on CLI, when -numa is used, we
> should use node id that user provided.
> like you did in previous revision
>    "[PATCH v4 1/3] i386: Simplify CPUID_8000_001E for AMD"

This might be a problem in the future with new BIOS options to change the
NPS(Nodes per Socket). Nodes and dies may not match. Then we will end up
with wrong CPUID_8000_001E encoding. That is why I wanted to keep both of
them separate. Users have the option to configure the way it matches their
bios config.


> 
> > Still not clear if we need to add a warning when numa nodes != dies.
> > Worried about adding that check and remove it again later.
> Since there is objection wrt making it error and I'd go with warning for now, 
> it
> makes life of person who have to figure what's wrong a bit easier.
> 
> > What about auto_enable_numa? Do we still need it?
> >
> >
> > I can send the patches tomorrow if these things are clarified.
> > Thanks
> With auto_enable_numa it would be cleaner as you can reuse the same
> numa code to set 0x8000001e.ecx vs hardcodding it as above.
> 
> Maybe post series without auto_enable_numa so we fix migration
> regression ASAP and then switch to auto_enable_numa on top.
> 
> 
> >
> > >
> > >
> > > > Dave
> > > >
> > > >
> > > > > > > PS:
> > > > > > >  I didn't review it yet, but with this series we aren't
> > > > > > > making up internal node_ids that should match user provided
> > > > > > > numa node ids
> > > somehow.
> > > > > > >  It seems series lost the patch that would enforce numa in
> > > > > > > case -smp dies>1,  but otherwise it heads in the right direction.
> > > > > >
> > > > > > Dave
> > > > > >
> > > > > > > >
> > > > > > > > Dave
> > > > > > > >
> > > > > > >
> > > > >
> >


Reply via email to