On Fri, 15 Sep 2023 16:19:29 +0200 Cédric Le Goater <c...@redhat.com> wrote:
> Hello Ankit,
> 
> On 9/15/23 04:45, ank...@nvidia.com wrote:
> > From: Ankit Agrawal <ank...@nvidia.com>
> >
> > For devices which allow the CPU to coherently cache their memory, it
> > is sensible to expose such memory as NUMA nodes separate from the
> > sysmem node. QEMU currently does not provide a mechanism to create
> > NUMA nodes associated with a vfio-pci device.
> >
> > Implement a mechanism to create and associate a set of unique NUMA
> > nodes with a vfio-pci device.
> >
> > The NUMA nodes are created by inserting a series of unique proximity
> > domains (PXM) in the VM SRAT ACPI table. The ACPI tables are read
> > once by the kernel at boot time to determine the NUMA configuration,
> > which cannot be changed afterwards; hence this feature is
> > incompatible with device hotplug. The node range associated with the
> > device is communicated through ACPI DSD and can be fetched by the VM
> > kernel or kernel modules. QEMU's VM SRAT and DSD builder code is
> > modified accordingly.
> >
> > New command line parameters are introduced to give the admin control
> > over the NUMA node assignment.
> 
> This approach seems to bypass the NUMA framework in place in QEMU and
> will be a challenge for the upper layers. QEMU is generally used from
> libvirt when dealing with KVM guests.
> 
> Typically, a command line for a virt machine with NUMA nodes would
> look like:
> 
>    -object memory-backend-ram,id=ram-node0,size=1G \
>    -numa node,nodeid=0,cpus=0-3,memdev=ram-node0 \
>    -object memory-backend-ram,id=ram-node1,size=1G \
>    -numa node,nodeid=1,memdev=ram-node1
> 
> which defines two nodes, one with memory and all CPUs and a second
> with only memory:
> 
>    # numactl -H
>    available: 2 nodes (0-1)
>    node 0 cpus: 0 1 2 3
>    node 0 size: 1003 MB
>    node 0 free: 734 MB
>    node 1 cpus:
>    node 1 size: 975 MB
>    node 1 free: 968 MB
>    node distances:
>    node   0   1
>      0:  10  20
>      1:  20  10
> 
> Could it be a new type of host memory backend? Have you considered
> this approach?

Good idea.  Fundamentally the device should not be creating NUMA
nodes; the VM should be configured with NUMA nodes and the device
memory associated with those nodes.

I think we're also dealing with a lot of very, very device-specific
behavior, so I question whether we shouldn't create a separate device
for this beyond vfio-pci or vfio-pci-nohotplug.  In particular, a PCI
device typically has an association to only a single proximity domain,
so what sense does it make to describe the coherent memory as a PCI
BAR only to then create a confusing mapping where the device has a
proximity domain separate from the resources associated with the
device?

It's seeming like this device should create memory objects that can be
associated as memory backing for command line specified NUMA nodes.

Thanks,
Alex
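
A minimal sketch of the direction discussed above, combining Cédric's
host-memory-backend idea with Alex's device-provided memory objects.
The memory-backend-vfio-device object type and its device= property
are hypothetical, invented here purely for illustration; only the
-device vfio-pci, -object, and -numa node options shown are standard
QEMU:

   # hypothetical: a memory-backend-vfio-device object wraps the
   # device's coherent memory so that a command-line-specified NUMA
   # node can reference it as its memdev
   -device vfio-pci,host=0000:01:00.0,id=dev0 \
   -object memory-backend-vfio-device,id=devmem0,device=dev0,size=16G \
   -numa node,nodeid=1,memdev=devmem0

In such a scheme the NUMA node itself stays under the control of the
existing -numa machinery, and the device only supplies the backing
memory, which would keep libvirt's existing NUMA modelling intact.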