On Wed, 21 Feb 2018 06:51:11 -0800 Dan Williams <dan.j.willi...@intel.com> wrote:
> On Wed, Feb 21, 2018 at 5:55 AM, Igor Mammedov <imamm...@redhat.com> wrote:
> > On Tue, 20 Feb 2018 17:17:58 -0800
> > Dan Williams <dan.j.willi...@intel.com> wrote:
> >
> >> On Tue, Feb 20, 2018 at 6:10 AM, Igor Mammedov <imamm...@redhat.com> wrote:
> >> > On Sat, 17 Feb 2018 14:31:35 +0800
> >> > Haozhong Zhang <haozhong.zh...@intel.com> wrote:
> >> >
> >> >> ACPI 6.2A Table 5-129 "SPA Range Structure" requires that the proximity
> >> >> domain of an NVDIMM SPA range match the corresponding entry in the
> >> >> SRAT table.
> >> >>
> >> >> The address ranges of vNVDIMM in QEMU are allocated from the
> >> >> hot-pluggable address space, which is entirely covered by one SRAT
> >> >> memory affinity structure. However, users can set the vNVDIMM
> >> >> proximity domain in the NFIT SPA range structure via the 'node'
> >> >> property of '-device nvdimm' to a value different from the one in the
> >> >> above SRAT memory affinity structure.
> >> >>
> >> >> In order to solve such a proximity domain mismatch, this patch builds
> >> >> one SRAT memory affinity structure for each NVDIMM device with the
> >> >> proximity domain used in NFIT. The remaining hot-pluggable address
> >> >> space is covered by one or multiple SRAT memory affinity structures
> >> >> with the proximity domain of the last node, as before.
> >> >>
> >> >> Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
> >> > If we consider a hotpluggable system, a correctly implemented OS should
> >> > be able to pull the proximity from Device::_PXM and override any value
> >> > from SRAT.
> >> > Do we really have a problem here (anything that breaks if we would use
> >> > _PXM)?
> >> > Maybe we should add a _PXM object to nvdimm device nodes instead of
> >> > massaging SRAT?
> >>
> >> Unfortunately _PXM is an awkward fit. Currently the proximity domain
> >> is attached to the SPA range structure. The SPA range may be
> >> associated with multiple DIMM devices and those individual NVDIMMs may
> >> have conflicting _PXM properties.
> > There shouldn't be any conflict here, as the NVDIMM device's _PXM method
> > should override at runtime any proximity specified by the parent scope
> > (as parent scope I'd also count the boot-time NFIT/SRAT tables).
> >
> > To make it more clear we could clear the 'proximity domain valid' flag
> > in the SPA like this:
> >
> > diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
> > index 59d6e42..131bca5 100644
> > --- a/hw/acpi/nvdimm.c
> > +++ b/hw/acpi/nvdimm.c
> > @@ -260,9 +260,7 @@ nvdimm_build_structure_spa(GArray *structures, DeviceState *dev)
> >       */
> >      nfit_spa->flags = cpu_to_le16(1 /* Control region is strictly for
> >                                         management during hot add/online
> > -                                       operation */ |
> > -                                   2 /* Data in Proximity Domain field is
> > -                                       valid*/);
> > +                                       operation */);
> >
> >      /* NUMA node. */
> >      nfit_spa->proximity_domain = cpu_to_le32(node);
> >
> >> Even if that was unified across
> >> DIMMs it is ambiguous whether a DIMM-device _PXM would relate to the
> >> device's control interface, or the assembled persistent memory SPA
> >> range.
> > I'm not sure what you mean by 'device's control interface';
> > could you clarify where the ambiguity comes from?
>
> There are multiple SPA range types. In addition to the typical
> Persistent Memory SPA range there are also Control Region SPA ranges
> for MMIO registers on the DIMM for Block Apertures and other purposes.
>
> > I read the spec as: _PXM applies to the address range covered by the
> > NVDIMM device it belongs to.
>
> No, an NVDIMM may contribute to multiple SPA ranges and those ranges
> may span sockets.

Isn't an NVDIMM device plugged into a single socket, which belongs to a
single NUMA node? If so, shouldn't the SPAs referencing it also have the
same proximity domain?
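To make the _PXM suggestion above concrete, here is a rough sketch of what
it could look like next to the per-NVDIMM device AML in hw/acpi/nvdimm.c.
The nvdimm_add_pxm() helper is made up for illustration; aml_name_decl(),
aml_int(), aml_append() and PC_DIMM_NODE_PROP are the existing QEMU
interfaces:

/*
 * Illustrative only (not an actual patch): attach a _PXM object to each
 * per-NVDIMM Device node, so a guest OS can pull the proximity domain at
 * runtime instead of relying on the static SPA range structure.
 */
static void nvdimm_add_pxm(Aml *nvdimm_dev, DeviceState *dev)
{
    /* 'node' property of '-device nvdimm', same value used for NFIT */
    int node = object_property_get_int(OBJECT(dev), PC_DIMM_NODE_PROP,
                                       &error_abort);

    /* emits Name(_PXM, <node>) under the NVDIMM device object */
    aml_append(nvdimm_dev, aml_name_decl("_PXM", aml_int(node)));
}

Combined with clearing the 'proximity domain valid' flag in the SPA range
as in the diff above, the guest would then only have the runtime _PXM
value to go by.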
> > As for the assembled SPA, I'd assume that it applies to the interleaved
> > set, and all NVDIMMs within it should be on the same node. It's a
> > somewhat irrelevant question though, as QEMU so far implements only a
> > 1:1:1 /SPA:Region Mapping:NVDIMM Device/ mapping.
> >
> > My main concern with using static configuration tables for the proximity
> > mapping is that we'd miss the hotplug side of the equation. However, if
> > we start from the dynamic side first, we could later complement it with
> > static tables if there really were a need for it.
>
> Especially when you consider the new HMAT table that wants to have
> proximity domains for describing performance characteristics of an
> address range relative to an initiator, the _PXM method on an
> individual NVDIMM device is a poor fit for describing a wider set.
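For comparison, a rough sketch of the static-table approach from the patch
under discussion, i.e. one SRAT memory affinity structure per plugged
NVDIMM. The helpers used here (nvdimm_get_device_list(), acpi_data_push(),
build_srat_memory() and the MEM_AFFINITY_* flags) approximate QEMU's ACPI
build code of that era rather than quote the actual patch:

/*
 * Illustrative sketch (not the actual patch): walk the plugged NVDIMMs
 * and emit one SRAT Memory Affinity structure per device, using the same
 * proximity domain that ends up in the NFIT SPA range structure.
 */
static void nvdimm_build_srat_entries(GArray *table_data)
{
    GSList *device_list = nvdimm_get_device_list();
    GSList *item;

    for (item = device_list; item; item = item->next) {
        DeviceState *dev = DEVICE(item->data);
        uint64_t addr = object_property_get_uint(OBJECT(dev),
                                                 PC_DIMM_ADDR_PROP, NULL);
        uint64_t size = object_property_get_uint(OBJECT(dev),
                                                 PC_DIMM_SIZE_PROP, NULL);
        int node = object_property_get_int(OBJECT(dev),
                                           PC_DIMM_NODE_PROP, NULL);
        AcpiSratMemoryAffinity *numamem = acpi_data_push(table_data,
                                                         sizeof(*numamem));

        build_srat_memory(numamem, addr, size, node,
                          MEM_AFFINITY_ENABLED | MEM_AFFINITY_NON_VOLATILE);
    }
    g_slist_free(device_list);
}

The remaining hot-pluggable address space would still be covered by the
usual memory affinity structures for the last node, as the commit message
describes.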