Re: [PATCH v3 00/18] APIC ID fixes for AMD EPYC CPU models

Igor Mammedov Tue, 04 Feb 2020 00:04:01 -0800

On Mon, 3 Feb 2020 13:31:29 -0600
Babu Moger <babu.mo...@amd.com> wrote:


> On 2/3/20 8:59 AM, Igor Mammedov wrote:
> > On Tue, 03 Dec 2019 18:36:54 -0600
> > Babu Moger <babu.mo...@amd.com> wrote:
> >   
> >> This series fixes APIC ID encoding problems on AMD EPYC CPUs.
> >> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.redhat.com%2Fshow_bug.cgi%3Fid%3D1728166&amp;data=02%7C01%7Cbabu.moger%40amd.com%7C50685202e372472d7b2c08d7a8b9afa6%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637163387802886193&amp;sdata=N%2FaBBZ8G3D1gCNvabVQ%2FraHvINazcVeEc9FWdxQAWmg%3D&amp;reserved=0
> >>
> >> Currently, the APIC ID is decoded based on the sequence
> >> sockets->dies->cores->threads. This works for most standard AMD and other
> >> vendors' configurations, but this decoding sequence does not follow that of
> >> AMD's APIC ID enumeration strictly. In some cases this can cause CPU 
> >> topology
> >> inconsistency.  When booting a guest VM, the kernel tries to validate the
> >> topology, and finds it inconsistent with the enumeration of EPYC cpu 
> >> models.
> >>
> >> To fix the problem we need to build the topology as per the Processor
> >> Programming Reference (PPR) for AMD Family 17h Model 01h, Revision B1
> >> Processors. It is available at 
> >> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.amd.com%2Fsystem%2Ffiles%2FTechDocs%2F55570-B1_PUB.zip&amp;data=02%7C01%7Cbabu.moger%40amd.com%7C50685202e372472d7b2c08d7a8b9afa6%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637163387802886193&amp;sdata=McjyMS3A3x5Jr57VxJmHDyh5jumdybzW%2FwLtE4FAKHQ%3D&amp;reserved=0
> >>
> >> Here is the text from the PPR.
> >> Operating systems are expected to use 
> >> Core::X86::Cpuid::SizeId[ApicIdSize], the
> >> number of least significant bits in the Initial APIC ID that indicate core 
> >> ID
> >> within a processor, in constructing per-core CPUID masks.
> >> Core::X86::Cpuid::SizeId[ApicIdSize] determines the maximum number of cores
> >> (MNC) that the processor could theoretically support, not the actual 
> >> number of
> >> cores that are actually implemented or enabled on the processor, as 
> >> indicated
> >> by Core::X86::Cpuid::SizeId[NC].
> >> Each Core::X86::Apic::ApicId[ApicId] register is preset as follows:
> >> • ApicId[6] = Socket ID.
> >> • ApicId[5:4] = Node ID.
> >> • ApicId[3] = Logical CCX L3 complex ID
> >> • ApicId[2:0]= (SMT) ? {LogicalCoreID[1:0],ThreadId} : 
> >> {1'b0,LogicalCoreID[1:0]}  
> > 
> > 
> > After checking out all patches and some pondering, used here approach
> > looks to me too intrusive for the task at hand especially where it
> > comes to generic code.
> > 
> > (Ignore till ==== to see suggestion how to simplify without reading
> > reasoning behind it first)
> > 
> > Lets look for a way to simplify it a little bit.
> > 
> > So problem we are trying to solve,
> >  1: calculate APIC IDs based on cpu type (to e more specific: for EPYC 
> > based CPUs)
> >  2: it depends on knowing total number of numa nodes.
> > 
> > Externally workflow looks like following:
> >   1. user provides -smp x,sockets,cores,...,maxcpus
> >       that's used by possible_cpu_arch_ids() singleton to build list of
> >       possible CPUs (which is available to user via command 
> > 'hotpluggable-cpus')
> > 
> >       Hook could be called very early and possible_cpus data might be
> >       not complete. It builds a list of possible CPUs which user could
> >       modify later.
> > 
> >   2.1 user uses "-numa cpu,node-id=x,..." or legacy "-numa 
> > node,node_id=x,cpus="
> >       options to assign cpus to nodes, which is one way or another calling
> >       machine_set_cpu_numa_node(). The later updates 'possible_cpus' list
> >       with node information. It happens early when total number of nodes
> >       is not available.
> > 
> >   2.2 user does not provide explicit node mappings for CPUs.
> >       QEMU steps in and assigns possible cpus to nodes in 
> > machine_numa_finish_cpu_init()
> >       (using the same machine_set_cpu_numa_node()) right before calling 
> > boards
> >       specific machine init(). At that time total number of nodes is known.
> > 
> > In 1 -- 2.1 cases, 'arch_id' in 'possible_cpus' list doesn't have to be 
> > defined before
> > boards init() is run.
> > 
> > In 2.2 case it calls get_default_cpu_node_id() -> 
> > x86_get_default_cpu_node_id()
> > which uses arch_id calculate numa node.
> > But then question is: does it have to use APIC id or could it infer 
> > 'pkg_id',
> > it's after, from ms->possible_cpus->cpus[i].props data?  
> 
> Not sure if I got the question right. In this case because the numa
> information is not provided all the cpus are assigned to only one node.
> The apic id is used here to get the correct pkg_id.

apicid was composed from socket/core/thread[/die] tuple which cpus[i].props is.

Question is if we can compose only pkg_id based on the same data without
converting it to apicid and then "reverse engineering" it back
original data?

Or more direct question: is socket-id the same as pkg_id?


> 
> >   
> > With that out of the way APIC ID will be used only during board's init(),
> > so board could update possible_cpus with valid APIC IDs at the start of
> > x86_cpus_init().
> > 
> > ====
> > in nutshell it would be much easier to do following:
> > 
> >  1. make x86_get_default_cpu_node_id() APIC ID in-depended or
> >     if impossible as alternative recompute APIC IDs there if cpu
> >     type is EPYC based (since number of nodes is already known)
> >  2. recompute APIC IDs in x86_cpus_init() if cpu type is EPYC based
> > 
> > this way one doesn't need to touch generic numa code, introduce
> > x86 specific init_apicid_fn() hook into generic code and keep
> > x86/EPYC nuances contained within x86 code only.  
> 
> I was kind of already working in the similar direction in v4.
> 1. We already have split the numa initialization in patch #12(Split the
> numa initialization). This way we know exactly how many numa nodes are
> there before hand.

I suggest to drop that patch, It's the one that touches generic numa
code and adding more legacy based extensions like cpu_indexes.
Which I'd like to get rid of to begin with, so only -numa cpu is left.

I think it's not necessary to touch numa code at all for apicid generation
purpose, as I tried to explain above. We should be able to keep
this x86 only business.

> 2. Planning to remove init_apicid_fn
> 3. Insert the handlers inside X86CPUDefinition.
what handlers do you mean?

> 4. EPYC model will have its own apid id handlers. Everything else will be
> initialized with a default handlers(current default handler).
> 5. The function pc_possible_cpu_arch_ids will load the model definition
> and initialize the PCMachineState data structure with the model specific
> handlers.
I'm not sure what do you mean here.
 
> Does that sound similar to what you are thinking. Thoughts?
If you have something to share and can push it on github,
I can look at, whether it has design issues to spare you a round trip on a list.
(it won't be proper review but at least I can help to pinpoint most problematic 
parts)

> 
> >   
> >> v3:
> >>   1. Consolidated the topology information in structure X86CPUTopoInfo.
> >>   2. Changed the ccx_id to llc_id as commented by upstream.
> >>   3. Generalized the apic id decoding. It is mostly similar to current 
> >> apic id
> >>      except that it adds new field llc_id when numa configured. Removes 
> >> all the
> >>      hardcoded values.
> >>   4. Removed the earlier parse_numa split. And moved the numa node 
> >> initialization
> >>      inside the numa_complete_configuration. This is bit cleaner as 
> >> commented by 
> >>      Eduardo.
> >>   5. Added new function init_apicid_fn inside machine_class structure. This
> >>      will be used to update the apic id handler specific to cpu model.
> >>   6. Updated the cpuid unit tests.
> >>   7. TODO : Need to figure out how to dynamically update the handlers 
> >> using cpu models.
> >>      I might some guidance on that.
> >>
> >> v2:
> >>   
> >> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Fqemu-devel%2F156779689013.21957.1631551572950676212.stgit%40localhost.localdomain%2F&amp;data=02%7C01%7Cbabu.moger%40amd.com%7C50685202e372472d7b2c08d7a8b9afa6%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637163387802886193&amp;sdata=ls1cxA1yh0P05zYsAf3sLXDM11DFHtxZvfWWaar7Mgg%3D&amp;reserved=0
> >>   1. Introduced the new property epyc to enable new epyc mode.
> >>   2. Separated the epyc mode and non epyc mode function.
> >>   3. Introduced function pointers in PCMachineState to handle the
> >>      differences.
> >>   4. Mildly tested different combinations to make things are working as 
> >> expected.
> >>   5. TODO : Setting the epyc feature bit needs to be worked out. This 
> >> feature is
> >>      supported only on AMD EPYC models. I may need some guidance on that.
> >>
> >> v1:
> >>   
> >> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Fqemu-devel%2F20190731232032.51786-1-babu.moger%40amd.com%2F&amp;data=02%7C01%7Cbabu.moger%40amd.com%7C50685202e372472d7b2c08d7a8b9afa6%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637163387802886193&amp;sdata=nT4T9RIL4EeSvB%2Ff9%2BjbU7lldopjglQ2X6uYx13WMPE%3D&amp;reserved=0
> >>
> >> ---
> >>
> >> Babu Moger (18):
> >>       hw/i386: Rename X86CPUTopoInfo structure to X86CPUTopoIDs
> >>       hw/i386: Introduce X86CPUTopoInfo to contain topology info
> >>       hw/i386: Consolidate topology functions
> >>       hw/i386: Introduce initialize_topo_info to initialize X86CPUTopoInfo
> >>       machine: Add SMP Sockets in CpuTopology
> >>       hw/core: Add core complex id in X86CPU topology
> >>       machine: Add a new function init_apicid_fn in MachineClass
> >>       hw/i386: Update structures for nodes_per_pkg
> >>       i386: Add CPUX86Family type in CPUX86State
> >>       hw/386: Add EPYC mode topology decoding functions
> >>       i386: Cleanup and use the EPYC mode topology functions
> >>       numa: Split the numa initialization
> >>       hw/i386: Introduce apicid_from_cpu_idx in PCMachineState
> >>       hw/i386: Introduce topo_ids_from_apicid handler PCMachineState
> >>       hw/i386: Introduce apic_id_from_topo_ids handler in PCMachineState
> >>       hw/i386: Introduce EPYC mode function handlers
> >>       i386: Fix pkg_id offset for epyc mode
> >>       tests: Update the Unit tests
> >>
> >>
> >>  hw/core/machine-hmp-cmds.c |    3 +
> >>  hw/core/machine.c          |   14 +++
> >>  hw/core/numa.c             |   62 +++++++++----
> >>  hw/i386/pc.c               |  132 +++++++++++++++++++---------
> >>  include/hw/boards.h        |    3 +
> >>  include/hw/i386/pc.h       |    9 ++
> >>  include/hw/i386/topology.h |  209 
> >> +++++++++++++++++++++++++++++++-------------
> >>  include/sysemu/numa.h      |    5 +
> >>  qapi/machine.json          |    7 +
> >>  target/i386/cpu.c          |  196 
> >> ++++++++++++-----------------------------
> >>  target/i386/cpu.h          |    9 ++
> >>  tests/test-x86-cpuid.c     |  115 ++++++++++++++----------
> >>  vl.c                       |    4 +
> >>  13 files changed, 455 insertions(+), 313 deletions(-)
> >>
> >> --
> >>  
> >   
>

Re: [PATCH v3 00/18] APIC ID fixes for AMD EPYC CPU models

Reply via email to