On Wed, Sep 14, 2016 at 07:03:50AM -0500, Michael Roth wrote:
> Quoting Alexey Kardashevskiy (2016-09-14 04:39:10)
> > On 14/09/16 09:29, Michael Roth wrote:
> > > Quoting Alexey Kardashevskiy (2016-07-27 03:03:38)
> > >> This adds a numa id property to a PHB to allow linking passed PCI device
> > >> to CPU/memory. It is up to the management stack to do CPU/memory pinning
> > >> to the node with the actual PCI device.
> > > 
> > > It looks like x86 relies on PCIBus->numa_node() method (via
> > > pci_bus_numa_node()) to encode similar PCIBus affinities
> > > into ACPI tables, and currently exposes it via
> > > -device pci-[-express]-expander-bus,numa_node=X.
> > 
> > 
> > 
> > Well, until we allow DMA windows per PCI bus (not per PHB as it is now),
> > this won't make much sense for us (unless I am missing something here).
> 
> I didn't consider that it's not a bus-level setting; I think
> you're right that re-using the interface to both store/retrieve doesn't
> make much sense in that case.
> 
> My thought that was that since pci_bus_numa_node() could in theory come
> to be relied upon by common PCI code, that we should use it as well. But
> even if it doesn't make sense for us to use it, wouldn't it make sense to
> still set PCIBus->numa_node (in addition to the PHB-wide value) in the
> off-chance that common code does come to rely on
> pci_bus_numa_node()?

Yes, it would be a good idea to set the PCIBus node value to inherit
the one that's set for the host bridge, just in case any generic code
looks at it in future.

> 
> > 
> > 
> > > Maybe we should implement it using this existing
> > > PCIBus->numa_node() interface?
> > > 
> > > We'd still have to expose numa_node as a spapr-pci-host-bridge
> > > device option though. Not sure if there's a more common way
> > > to expose it that might be easier for libvirt to discover. As it
> > > stands we'd need to add spapr-pci-host-bridge to a libvirt
> > > whitelist that currently only covers pci-expander-bus.
> > > 
> > > Cc'ing Shiva who was looking into the libvirt side.
> > > 
> > > One comment below:
> > > 
> > >>
> > >> Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru>
> > >> ---
> > >>  hw/ppc/spapr_pci.c          | 13 +++++++++++++
> > >>  include/hw/pci-host/spapr.h |  2 ++
> > >>  2 files changed, 15 insertions(+)
> > >>
> > >> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> > >> index 949c44f..af5394a 100644
> > >> --- a/hw/ppc/spapr_pci.c
> > >> +++ b/hw/ppc/spapr_pci.c
> > >> @@ -47,6 +47,7 @@
> > >>  #include "sysemu/device_tree.h"
> > >>  #include "sysemu/kvm.h"
> > >>  #include "sysemu/hostmem.h"
> > >> +#include "sysemu/numa.h"
> > >>
> > >>  #include "hw/vfio/vfio.h"
> > >>
> > >> @@ -1544,6 +1545,7 @@ static Property spapr_phb_properties[] = {
> > >>      DEFINE_PROP_BOOL("ddw", sPAPRPHBState, ddw_enabled, true),
> > >>      DEFINE_PROP_UINT64("pgsz", sPAPRPHBState, page_size_mask,
> > >>                         (1ULL << 12) | (1ULL << 16)),
> > >> +    DEFINE_PROP_UINT32("node", sPAPRPHBState, numa_node, -1),
> > >>      DEFINE_PROP_END_OF_LIST(),
> > >>  };
> > >>
> > >> @@ -1805,6 +1807,11 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb,
> > >>          cpu_to_be32(1),
> > >>          cpu_to_be32(RTAS_IBM_RESET_PE_DMA_WINDOW)
> > >>      };
> > >> +    uint32_t associativity[] = {cpu_to_be32(0x4),
> > >> +                                cpu_to_be32(0x0),
> > >> +                                cpu_to_be32(0x0),
> > >> +                                cpu_to_be32(0x0),
> > >> +                                cpu_to_be32(phb->numa_node)};
> > >>      sPAPRTCETable *tcet;
> > >>      PCIBus *bus = PCI_HOST_BRIDGE(phb)->bus;
> > >>      sPAPRFDT s_fdt;
> > >> @@ -1837,6 +1844,12 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb,
> > >>                           &ddw_extensions, sizeof(ddw_extensions)));
> > >>      }
> > >>
> > >> +    /* Advertise NUMA via ibm,associativity */
> > >> +    if (nb_numa_nodes > 1) {
> > >> +        _FDT(fdt_setprop(fdt, bus_off, "ibm,associativity", 
> > >> associativity,
> > >> +                         sizeof(associativity)));
> > >> +    }
> > > 
> > > LoPAPR 15.3 seems to suggest that ibm,associativity-reference-points is
> > > required alongside ibm,associativity for each DT node it appears in,
> > > and since we hardcode "Form 1" affinity it should be done similarly as
> > > the entry we place in the top-level DT node.
> > 
> > 
> > Hm, okay, I'll add it. There is a question to Bharata - why do we have 4s
> > in spapr_create_fdt_skel()'s refpoints? Just a random pick?
> > 
> > vec5[5]==0x80 means we are doing "Form1" and these 4s are far/near 
> > distances?
> > 
> > 
> > >> +
> > >>      /* Build the interrupt-map, this must matches what is done
> > >>       * in pci_spapr_map_irq
> > >>       */
> > >> diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
> > >> index 5adc603..53c4b2d 100644
> > >> --- a/include/hw/pci-host/spapr.h
> > >> +++ b/include/hw/pci-host/spapr.h
> > >> @@ -75,6 +75,8 @@ struct sPAPRPHBState {
> > >>      bool ddw_enabled;
> > >>      uint64_t page_size_mask;
> > >>      uint64_t dma64_win_addr;
> > >> +
> > >> +    uint32_t numa_node;
> > >>  };
> > >>
> > >>  #define SPAPR_PCI_MAX_INDEX          255
> > >>
> > > 
> > 
> > 
> 

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson

Attachment: signature.asc
Description: PGP signature

Reply via email to