On Mon, 3 Feb 2020 13:31:29 -0600 Babu Moger <babu.mo...@amd.com> wrote:
> On 2/3/20 8:59 AM, Igor Mammedov wrote:
> > On Tue, 03 Dec 2019 18:36:54 -0600
> > Babu Moger <babu.mo...@amd.com> wrote:
> >
> >> This series fixes APIC ID encoding problems on AMD EPYC CPUs.
> >> https://bugzilla.redhat.com/show_bug.cgi?id=1728166
> >>
> >> Currently, the APIC ID is decoded based on the sequence
> >> sockets->dies->cores->threads. This works for most standard AMD and other
> >> vendors' configurations, but this decoding sequence does not follow that of
> >> AMD's APIC ID enumeration strictly. In some cases this can cause CPU topology
> >> inconsistency. When booting a guest VM, the kernel tries to validate the
> >> topology, and finds it inconsistent with the enumeration of EPYC cpu models.
> >>
> >> To fix the problem we need to build the topology as per the Processor
> >> Programming Reference (PPR) for AMD Family 17h Model 01h, Revision B1
> >> Processors. It is available at
> >> https://www.amd.com/system/files/TechDocs/55570-B1_PUB.zip
> >>
> >> Here is the text from the PPR.
> >> Operating systems are expected to use Core::X86::Cpuid::SizeId[ApicIdSize], the
> >> number of least significant bits in the Initial APIC ID that indicate core ID
> >> within a processor, in constructing per-core CPUID masks.
> >> Core::X86::Cpuid::SizeId[ApicIdSize] determines the maximum number of cores
> >> (MNC) that the processor could theoretically support, not the actual number of
> >> cores that are actually implemented or enabled on the processor, as indicated
> >> by Core::X86::Cpuid::SizeId[NC].
> >> Each Core::X86::Apic::ApicId[ApicId] register is preset as follows:
> >> • ApicId[6] = Socket ID.
> >> • ApicId[5:4] = Node ID.
> >> • ApicId[3] = Logical CCX L3 complex ID
> >> • ApicId[2:0]= (SMT) ? {LogicalCoreID[1:0],ThreadId} :
> >>   {1'b0,LogicalCoreID[1:0]}
> >
> >
> > After checking out all the patches and some pondering, the approach used
> > here looks too intrusive to me for the task at hand, especially where it
> > comes to generic code.
> >
> > (Ignore till ==== to see suggestion how to simplify without reading
> > reasoning behind it first)
> >
> > Let's look for a way to simplify it a little bit.
> >
> > So the problem we are trying to solve:
> >  1: calculate APIC IDs based on cpu type (to be more specific: for EPYC
> >     based CPUs)
> >  2: it depends on knowing total number of numa nodes.
> >
> > Externally the workflow looks like the following:
> >   1. user provides -smp x,sockets,cores,...,maxcpus
> >      that's used by possible_cpu_arch_ids() singleton to build list of
> >      possible CPUs (which is available to user via command
> >      'hotpluggable-cpus')
> >
> >      Hook could be called very early and possible_cpus data might be
> >      not complete. It builds a list of possible CPUs which user could
> >      modify later.
> >
> >   2.1 user uses "-numa cpu,node-id=x,..." or legacy "-numa node,node_id=x,cpus="
> >      options to assign cpus to nodes, which is one way or another calling
> >      machine_set_cpu_numa_node(). The latter updates 'possible_cpus' list
> >      with node information. It happens early when total number of nodes
> >      is not available.
> >
> >   2.2 user does not provide explicit node mappings for CPUs.
> >      QEMU steps in and assigns possible cpus to nodes in
> >      machine_numa_finish_cpu_init()
> >      (using the same machine_set_cpu_numa_node()) right before calling the
> >      board's specific machine init(). At that time total number of nodes
> >      is known.
> >
> > In 1 -- 2.1 cases, 'arch_id' in 'possible_cpus' list doesn't have to be
> > defined before the board's init() is run.
> >
> > In 2.2 case it calls get_default_cpu_node_id() -> x86_get_default_cpu_node_id()
> > which uses arch_id to calculate the numa node.
> > But then the question is: does it have to use the APIC id, or could it
> > infer the 'pkg_id' it's after from ms->possible_cpus->cpus[i].props data?
>
> Not sure if I got the question right. In this case because the numa
> information is not provided all the cpus are assigned to only one node.
> The apic id is used here to get the correct pkg_id.

apicid was composed from the socket/core/thread[/die] tuple, which is what
cpus[i].props is.

The question is whether we can compose only the pkg_id from the same data,
without converting it to an apicid and then "reverse engineering" it back
to the original data?

Or a more direct question: is socket-id the same as pkg_id?

>
> >
> > With that out of the way APIC ID will be used only during board's init(),
> > so board could update possible_cpus with valid APIC IDs at the start of
> > x86_cpus_init().
> >
> > ====
> > in nutshell it would be much easier to do following:
> >
> >  1. make x86_get_default_cpu_node_id() APIC ID independent or,
> >     if impossible, as an alternative recompute APIC IDs there if cpu
> >     type is EPYC based (since number of nodes is already known)
> >  2. recompute APIC IDs in x86_cpus_init() if cpu type is EPYC based
> >
> > this way one doesn't need to touch generic numa code, introduce
> > x86 specific init_apicid_fn() hook into generic code and keep
> > x86/EPYC nuances contained within x86 code only.
>
> I was kind of already working in the similar direction in v4.
> 1. We already have split the numa initialization in patch #12(Split the
> numa initialization). This way we know exactly how many numa nodes are
> there before hand.

I suggest dropping that patch. It's the one that touches generic numa code
and adds more legacy based extensions like cpu_indexes, which I'd like to
get rid of to begin with, so that only -numa cpu is left.

I think it's not necessary to touch numa code at all for apicid generation
purpose, as I tried to explain above. We should be able to keep this
x86 only business.

> 2. Planning to remove init_apicid_fn
> 3. Insert the handlers inside X86CPUDefinition.
what handlers do you mean?

> 4. EPYC model will have its own apic id handlers. Everything else will be
> initialized with a default handlers(current default handler).
> 5. The function pc_possible_cpu_arch_ids will load the model definition
> and initialize the PCMachineState data structure with the model specific
> handlers.
I'm not sure what you mean here.

> Does that sound similar to what you are thinking. Thoughts?
If you have something to share and can push it on github,
I can look at it and check whether it has design issues, to spare you a
round trip on the list.
(it won't be a proper review but at least I can help to pinpoint the most
problematic parts)

>
> >
> >> v3:
> >>   1. Consolidated the topology information in structure X86CPUTopoInfo.
> >>   2. Changed the ccx_id to llc_id as commented by upstream.
> >>   3. Generalized the apic id decoding. It is mostly similar to current apic id
> >>      except that it adds new field llc_id when numa configured. Removes all the
> >>      hardcoded values.
> >>   4. Removed the earlier parse_numa split. And moved the numa node initialization
> >>      inside the numa_complete_configuration. This is bit cleaner as commented by
> >>      Eduardo.
> >>   5. Added new function init_apicid_fn inside machine_class structure. This
> >>      will be used to update the apic id handler specific to cpu model.
> >>   6. Updated the cpuid unit tests.
> >>   7. TODO : Need to figure out how to dynamically update the handlers using
> >>      cpu models. I might need some guidance on that.
> >>
> >> v2:
> >>   https://lore.kernel.org/qemu-devel/156779689013.21957.1631551572950676212.stgit@localhost.localdomain/
> >>   1. Introduced the new property epyc to enable new epyc mode.
> >>   2. Separated the epyc mode and non epyc mode function.
> >>   3. Introduced function pointers in PCMachineState to handle the
> >>      differences.
> >>   4. Mildly tested different combinations to make sure things are working as
> >>      expected.
> >>   5. TODO : Setting the epyc feature bit needs to be worked out. This feature is
> >>      supported only on AMD EPYC models. I may need some guidance on that.
> >>
> >> v1:
> >>   https://lore.kernel.org/qemu-devel/20190731232032.51786-1-babu.moger@amd.com/
> >>
> >> ---
> >>
> >> Babu Moger (18):
> >>       hw/i386: Rename X86CPUTopoInfo structure to X86CPUTopoIDs
> >>       hw/i386: Introduce X86CPUTopoInfo to contain topology info
> >>       hw/i386: Consolidate topology functions
> >>       hw/i386: Introduce initialize_topo_info to initialize X86CPUTopoInfo
> >>       machine: Add SMP Sockets in CpuTopology
> >>       hw/core: Add core complex id in X86CPU topology
> >>       machine: Add a new function init_apicid_fn in MachineClass
> >>       hw/i386: Update structures for nodes_per_pkg
> >>       i386: Add CPUX86Family type in CPUX86State
> >>       hw/386: Add EPYC mode topology decoding functions
> >>       i386: Cleanup and use the EPYC mode topology functions
> >>       numa: Split the numa initialization
> >>       hw/i386: Introduce apicid_from_cpu_idx in PCMachineState
> >>       hw/i386: Introduce topo_ids_from_apicid handler PCMachineState
> >>       hw/i386: Introduce apic_id_from_topo_ids handler in PCMachineState
> >>       hw/i386: Introduce EPYC mode function handlers
> >>       i386: Fix pkg_id offset for epyc mode
> >>       tests: Update the Unit tests
> >>
> >>
> >>  hw/core/machine-hmp-cmds.c |    3 +
> >>  hw/core/machine.c          |   14 +++
> >>  hw/core/numa.c             |   62 +++++++++----
> >>  hw/i386/pc.c               |  132 +++++++++++++++++++---------
> >>  include/hw/boards.h        |    3 +
> >>  include/hw/i386/pc.h       |    9 ++
> >>  include/hw/i386/topology.h |  209 +++++++++++++++++++++++++++++++-------------
> >>  include/sysemu/numa.h      |    5 +
> >>  qapi/machine.json          |    7 +
> >>  target/i386/cpu.c          |  196 ++++++++++++-----------------------------
> >>  target/i386/cpu.h          |    9 ++
> >>  tests/test-x86-cpuid.c     |  115 ++++++++++++++----------
> >>  vl.c                       |    4 +
> >>  13 files changed, 455 insertions(+), 313 deletions(-)
> >>
> >> --
> >>
> >
>
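
For reference, here is a minimal Python sketch of the fixed bit layout from the
PPR text quoted in this thread (ApicId[6] socket, [5:4] node, [3] CCX, [2:0]
core/thread). The names epyc_apic_id() and epyc_topo_ids() and the range checks
are made up for illustration. This is not QEMU code, and it only covers the
fixed Family 17h Model 01h layout, not the generalized topology handling the
series implements.

```python
# Illustrative only: hypothetical helpers, not QEMU identifiers.
# Fixed layout from the PPR text quoted in this thread:
#   ApicId[6]   = Socket ID
#   ApicId[5:4] = Node ID
#   ApicId[3]   = Logical CCX L3 complex ID
#   ApicId[2:0] = SMT ? {LogicalCoreID[1:0], ThreadId}
#                     : {1'b0, LogicalCoreID[1:0]}


def epyc_apic_id(socket_id, node_id, ccx_id, core_id, thread_id=0, smt=True):
    """Pack topology IDs into an APIC ID using the fixed PPR bit layout."""
    # Field widths follow the bit ranges above; a non-zero thread ID is
    # only accepted when SMT is enabled.
    limits = {
        "socket_id": (socket_id, 2),
        "node_id": (node_id, 4),
        "ccx_id": (ccx_id, 2),
        "core_id": (core_id, 4),
        "thread_id": (thread_id, 2 if smt else 1),
    }
    for name, (value, limit) in limits.items():
        if not 0 <= value < limit:
            raise ValueError(f"{name}={value} does not fit the layout")

    if smt:
        low = (core_id << 1) | thread_id   # {LogicalCoreID[1:0], ThreadId}
    else:
        low = core_id                      # {1'b0, LogicalCoreID[1:0]}
    return (socket_id << 6) | (node_id << 4) | (ccx_id << 3) | low


def epyc_topo_ids(apic_id, smt=True):
    """Unpack an APIC ID built by epyc_apic_id() back into its fields."""
    low = apic_id & 0x7
    return {
        "socket_id": (apic_id >> 6) & 0x1,
        "node_id": (apic_id >> 4) & 0x3,
        "ccx_id": (apic_id >> 3) & 0x1,
        "core_id": (low >> 1) if smt else (low & 0x3),
        "thread_id": (low & 0x1) if smt else 0,
    }
```

In this fixed layout the socket ID that goes in is the value that comes back
out of bit 6, e.g. epyc_topo_ids(epyc_apic_id(1, 2, 1, 3, 1))["socket_id"]
is 1. That says nothing about the general QEMU case, where the pkg_id offset
depends on the configured topology.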