On Wed, 23 Sep 2020 16:34:55 -0300 Daniel Henrique Barboza <danielhb...@gmail.com> wrote:
> QEMU allows the user to set NUMA distances in the command line.
> For ACPI architectures like x86, this means that user input is
> used to populate the SLIT table, and the guest perceives the
> distances as the user chooses to.
>
> PPC64 does not work that way. In the PAPR concept of NUMA,
> associativity relations between the NUMA nodes are provided by
> the device tree, and the guest kernel is free to calculate the
> distances as it sees fit. Given how ACPI architectures works,
> this puts the pSeries machine in a strange spot - users expect
> to define NUMA distances like in the ACPI case, but QEMU does
> not have control over it. To give pSeries users a similar
> experience, we'll need to bring kernel specifics to QEMU
> to approximate the NUMA distances.
>
> The pSeries kernel works with the NUMA distance range 10,
> 20, 40, 80 and 160. The code starts at 10 (local distance) and
> searches for a match in the first NUMA level between the
> resources. If there is no match, the distance is doubled and
> then it proceeds to try to match in the next NUMA level. Rinse
> and repeat for MAX_DISTANCE_REF_POINTS levels.
>
> This patch introduces a spapr_numa_PAPRify_distances() helper

Funky naming but meaningful and funny, for me at least :)

> that translates the user distances to kernel distance, which
> we're going to use to determine the associativity domains for
> the NUMA nodes.
>
> Signed-off-by: Daniel Henrique Barboza <danielhb...@gmail.com>
> ---
>  hw/ppc/spapr_numa.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 44 insertions(+)
>
> diff --git a/hw/ppc/spapr_numa.c b/hw/ppc/spapr_numa.c
> index 36aaa273ee..180800b2f3 100644
> --- a/hw/ppc/spapr_numa.c
> +++ b/hw/ppc/spapr_numa.c
> @@ -37,6 +37,49 @@ static bool spapr_numa_is_symmetrical(MachineState *ms)
>      return true;
>  }
>
> +/*
> + * This function will translate the user distances into
> + * what the kernel understand as possible values: 10
> + * (local distance), 20, 40, 80 and 160. Current heuristic
> + * is:
> + *
> + *  - distances between 11 and 30 -> rounded to 20
> + *  - distances between 31 and 60 -> rounded to 40
> + *  - distances between 61 and 120 -> rounded to 80
> + *  - everything above 120 -> 160

It isn't clear what happens when the distances are exactly
30, 60 or 120...

> + *
> + * This step can also be done in the same time as the NUMA
> + * associativity domains calculation, at the cost of extra
> + * complexity. We chose to keep it simpler.
> + *
> + * Note: this will overwrite the distance values in
> + * ms->numa_state->nodes.
> + */
> +static void spapr_numa_PAPRify_distances(MachineState *ms)
> +{
> +    int src, dst;
> +    int nb_numa_nodes = ms->numa_state->num_nodes;
> +    NodeInfo *numa_info = ms->numa_state->nodes;
> +
> +    for (src = 0; src < nb_numa_nodes; src++) {
> +        for (dst = src; dst < nb_numa_nodes; dst++) {
> +            uint8_t distance = numa_info[src].distance[dst];
> +            uint8_t rounded_distance = 160;
> +
> +            if (distance > 11 && distance < 30) {
> +                rounded_distance = 20;
> +            } else if (distance > 31 && distance < 60) {
> +                rounded_distance = 40;
> +            } else if (distance > 61 && distance < 120) {
> +                rounded_distance = 80;
> +            }

... and this code doesn't convert them to PAPR-friendly values
actually. I guess < should be turned into <=.

> +
> +            numa_info[src].distance[dst] = rounded_distance;
> +            numa_info[dst].distance[src] = rounded_distance;
> +        }
> +    }
> +}
> +
>  void spapr_numa_associativity_init(SpaprMachineState *spapr,
>                                     MachineState *machine)
>  {
> @@ -95,6 +138,7 @@ void spapr_numa_associativity_init(SpaprMachineState
> *spapr,
>          exit(1);
>      }
>
> +    spapr_numa_PAPRify_distances(machine);
>  }
>
>  void spapr_numa_write_associativity_dt(SpaprMachineState *spapr, void *fdt,
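
To make the boundary problem concrete: with the conditions as posted, the
values 11, 30, 31, 60, 61 and 120 match none of the three branches and all
end up as 160, and so does the local distance 10 on the diagonal. Below is a
minimal standalone sketch of what the rounding could look like with
contiguous, inclusive ranges. It is not the patch itself:
papr_round_distance() and papr_round_matrix() are hypothetical names, the
flat row-major matrix stands in for ms->numa_state->nodes, and leaving
distances of 10 or less as local (10) is my assumption rather than something
stated in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: map a user-supplied
 * NUMA distance to one of the values the pSeries kernel works
 * with (10, 20, 40, 80, 160), using inclusive upper bounds so
 * that 30, 60 and 120 land in the documented bucket.
 */
static uint8_t papr_round_distance(uint8_t distance)
{
    if (distance <= 10) {
        /* Assumption: local distance is left as 10. */
        return 10;
    } else if (distance <= 30) {
        return 20;
    } else if (distance <= 60) {
        return 40;
    } else if (distance <= 120) {
        return 80;
    }
    return 160;
}

/*
 * Hypothetical stand-in for the loop in the patch: round every
 * entry of a symmetric n x n distance matrix in place. The
 * matrix is stored row-major in a flat array.
 */
static void papr_round_matrix(uint8_t *dist, size_t n)
{
    size_t src, dst;

    assert(dist != NULL || n == 0);

    for (src = 0; src < n; src++) {
        for (dst = src; dst < n; dst++) {
            uint8_t rounded = papr_round_distance(dist[src * n + dst]);

            /* Write both halves, as the patch does. */
            dist[src * n + dst] = rounded;
            dist[dst * n + src] = rounded;
        }
    }
}
```

Because each branch only checks an upper bound, the ranges cannot leave
gaps between them, which avoids having to keep a lower and an upper
comparison in sync for every bucket.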