On 07/12/2017 04:25 PM, Andrea Bolognani wrote:
[libvir-list added to the loop]

On Tue, 2017-07-04 at 10:47 +0200, Greg Kurz wrote:
On Tue, 4 Jul 2017 17:29:01 +1000 David Gibson <da...@gibson.dropbear.id.au> 
wrote:
On Mon, Jul 03, 2017 at 06:48:25PM +0200, Greg Kurz wrote:
The sPAPR machine always create a default PHB during initialization, even
if -nodefaults was passed on the command line. This forces the user to
rely on -global if she wants to set properties of the default PHB, such
as numa_node.
This patch introduces a new machine create-default-phb property to control
whether the default PHB must be created or not. It defaults to on in order
to preserve old setups (which is also the motivation to not alter the
current behavior of -nodefaults).
If create-default-phb is set to off, the default PHB isn't created, nor
any other device usually created with it. It is mandatory to provide
a PHB on the command line to be able to use PCI devices (otherwise QEMU
won't start). For example, the following creates a PHB with the same
mappings as the default PHB and also sets the NUMA affinity:
-machine type=pseries,create-default-phb=off \
      -numa node,nodeid=0 -device spapr-pci-host-bridge,index=0,numa_node=0
So, I agree that the distinction between default devices that are
disabled with -nodefaults and default devices that aren't is a big
mess in qemu configuration.  But on the other hand this only addresses
one tiny aspect of that, and in the meantime means we will silently
ignore some other configuration options in some conditions.
So, what's the immediate benefit / use case for this?

Setting numa_node for emulated devices is the benefit for now. On x86, I figured there is no way to set the numa_node for the root controller and the emulated devices sitting there all have numa_node set to -1. Only the devices on the pxb can have a sensible value specified. Does it mean, the emulated devices/drivers don't care about the numa_node they are on?

Would it be fine on PPC to disallow setting the NUMA node for the default PHB because that is where
all the emulated devices sit ?

With the current code base, the only way to set properties of the default
PHB, is to pass -global spapr-pci-host-bridge.prop=value for each property.
The immediate benefit of this patch is to unify the way libvirt passes
PHB description to the command line:
ie, do: -machine type=pseries,create-default-phb=off \
      -device spapr-pci-host-bridge,prop1=a,prop2=b,prop3=c \
      -device spapr-pci-host-bridge,prop1=d,prop2=e,prop3=f
instead of: -machine type=pseries \
      -global spapr-pci-host-bridge.prop1=a \
      -global spapr-pci-host-bridge.prop2=b \
      -global spapr-pci-host-bridge.prop3=c \
      -device spapr-pci-host-bridge,prop1=d,prop2=e,prop3=f
So, I'm thinking about this mostly in terms of NUMA nodes
because that's the use case I'm aware of.

The problem with using -global is not that it requires using
a different syntax to set properties for the default PHB,
but rather that such properties are then inherited by all
other PHBs unless explicitly overridden. Not creating the
default PHB at all would solve the issue.

On the other hand, libvirt would then need to either

   1) only allow setting NUMA nodes for PHBs if QEMU supports
      the new option, leaving QEMU < 2.10 users behind; or

   2) implement handling for both the new and old behavior.

I'm not sure we could get away with 1), and going for 2)
means more work both for QEMU and libvirt developers for
very little actual gain, so I'd be inclined to scrap this
and just build the libvirt glue on top of the existing
interface.

That is, of course, unless

   1) having a random selection of PHBs not assigned to any
      NUMA node is a sensible use case. This is something
      we just can't do reliably with the current interface:
      we can decide to set the NUMA node only for say, PHBs
      1 and 3 leaving 0 and 2 alone, but once we set it for
      the default PHB we *have* to set it for all remaining
      ones as well. libvirt will by default assign emulated
      devices to the default PHB, so I would rather expect
      users to leave that one alone and set a NUMA node for
      all other PHBs; or

   2) there are other properties outside of numa_node we
      might want to deal with; or

   3) it turns out it's okay to require a recent QEMU :)

--
Andrea Bolognani / Red Hat / Virtualization


--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Reply via email to