Hi Christian, Thomas,

> Subject: Re: [PATCH v4 1/5] PCI/P2PDMA: Don't enforce ACS check for device
> functions of Intel GPUs
>
> On 25.09.25 12:51, Thomas Hellström wrote:
> >>> In that case I strongly suggest to add a private DMA-buf
> >>> interface for the DMA-bufs exported by vfio-pci which returns
> >>> which BAR and offset the DMA-buf represents.
> >
> > @Christian, Is what you're referring to here the "dma_buf private
> > interconnect" we've been discussing previously, now only between
> > vfio-pci and any interested importers instead of private to a known
> > exporter and importer?
> >
> > If so I have a POC I can post as an RFC on a way to negotiate such
> > an interconnect.

I'll start testing with the RFC patches Thomas posted and see how they
can be improved to make them suitable not only for this use-case but
also for the other (iommufd/kvm) use-cases.
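Just to make sure I read the suggestion correctly, below is the rough
shape of the interface as I understood it, written as a sketch only.
None of these names exist today (the struct and the helper are
placeholders); the final form would come out of the RFC discussion:

#include <linux/dma-buf.h>
#include <linux/pci.h>

/*
 * Placeholder sketch of a private interface between vfio-pci (the
 * exporter) and an interested importer such as xe: report which BAR
 * and offset an exported dma-buf represents.
 */
struct vfio_pci_dmabuf_region {
	struct pci_dev *pdev;	/* VF the dma-buf was exported from */
	u32 bar;		/* BAR index backing the dma-buf */
	u64 offset;		/* offset into that BAR */
	u64 size;		/* length of the exported range */
};

/*
 * Fills @region and returns 0 if @dmabuf was exported by vfio-pci,
 * -ENODEV otherwise so the importer can fall back to the regular
 * dma-buf path.
 */
int vfio_pci_dmabuf_get_region(struct dma_buf *dmabuf,
			       struct vfio_pci_dmabuf_region *region);

The idea being that xe could then validate that region.pdev is a VF it
feels responsible for and do the offset math into VRAM from there.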
> I was just about to write something up as well, but feel free to go
> ahead if you already have something.
>
> >> Does this private dmabuf interface already exist or does it need
> >> to be created from the ground up?
>
> Every driver which supports both exporting and importing DMA-buf has
> code to detect when somebody tries to re-import a buffer previously
> exported from the same device.
>
> Now some drivers like amdgpu and I think XE as well also detect if
> the buffer is from another device handled by the same driver which
> potentially have private interconnects (XGMI or similar).
>
> See function amdgpu_dmabuf_is_xgmi_accessible() in amdgpu_dma_buf.c
> for an example.
>
> >> If it already exists, could you please share an example/reference
> >> of how you have used it with amdgpu or other drivers?
>
> Well what's new is that we need to do this between two drivers
> unrelated to each other.

Right, that is a key difference.

> As far as I know previously that was all inside AMD drivers for
> example, while in this case vfio is a common vendor agnostic driver.
>
> So we should probably make sure to get that right and vendor agnostic
> etc....
>
> >> If it doesn't exist, I was wondering if it should be based on any
> >> particular best practices/ideas (or design patterns) that already
> >> exist in other drivers?
> >
> > @Vivek, another question: Also on the guest side we're exporting
> > dma-mapped addresses that are imported and somehow decoded by the
> > guest virtio-gpu driver? Is something similar needed there?

AFAICS, nothing else is needed because Qemu is the one that decodes or
resolves the dma-mapped addresses (that are imported by virtio-gpu) and
identifies the right memory region (and its owner, which could be a
vfio-dev or system memory). Details can be found in the last patch of
this Qemu series:
https://lore.kernel.org/qemu-devel/[email protected]/

> > Also how would the guest side VF driver know that what is assumed
> > to be a PF on the same device is actually a PF on the same device
> > and not a completely different device with another driver? (In
> > which case I assume it would like to export a system dma-buf)?

Good question. AFAICS, there is no definitive way for the Xe VF driver
to know who is the ultimate consumer of its buffer on the Host side. In
other words, the real question is how it should decide whether to
create the dmabuf from VRAM or migrate the backing object to system
memory and then create the dmabuf. Here are a few options I have tried
so far:

1) If the importer (virtio-gpu) has allow_peer2peer set to true, and if
   Xe is running in VF mode, then assume that the PF of the same device
   is active on the Host side and thus create the dmabuf from VRAM (see
   the sketch below).

2) Rely on the user (or admin) that is launching Qemu to determine
   whether the PF on the Host and the VF are compatible (same device)
   and therefore configure virtio-gpu and the VF device to be virtual
   P2P peers like this:

   qemu-system-x86_64 -m 4096m .... \
    -device ioh3420,id=root_port1,bus=pcie.0 \
    -device x3130-upstream,id=upstream1,bus=root_port1 \
    -device xio3130-downstream,id=downstream1,bus=upstream1,chassis=9 \
    -device xio3130-downstream,id=downstream2,bus=upstream1,chassis=10 \
    -device vfio-pci,host=0000:03:00.1,bus=downstream1 \
    -device virtio-gpu,max_outputs=1,blob=true,xres=1920,yres=1080,bus=downstream2 \
    -display gtk,gl=on

I am sure there may be better ideas, but I think the first option above
is a lot more straightforward.
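To make option 1) a bit more concrete, the exporter-side check I have
in mind boils down to something like the sketch below. The helper is
hypothetical (not actual Xe code); attach->peer2peer is the flag the
dma-buf core derives from the importer's allow_peer2peer at attach
time, and IS_SRIOV_VF() stands in for Xe's VF-mode check:

#include <linux/dma-buf.h>

#include "xe_device.h"
#include "xe_sriov.h"

/*
 * Option 1) sketch: decide whether an exported buffer may stay in
 * VRAM or has to be migrated to system memory first.
 */
static bool xe_dmabuf_can_export_from_vram(struct xe_device *xe,
					   struct dma_buf_attachment *attach)
{
	/* Importer (virtio-gpu) did not opt in to P2P: migrate. */
	if (!attach->peer2peer)
		return false;

	/*
	 * Running as a VF: assume the PF of the same device is active
	 * on the Host side and can reach our VRAM through the BAR.
	 */
	return IS_SRIOV_VF(xe);
}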
However, currently, virtio-gpu's allow_peer2peer is always set to true.
I'd like to set it to false and add a Qemu option to toggle it while
launching the VM. This way the user gets to decide (based on what GPU
device is active on the Host) whether the Xe VF driver should create
the dmabuf from VRAM or from system memory.

> Another question is how is lifetime handled? E.g. does the guest
> know that a DMA-buf exists for its BAR area?

Yes, the Guest VM knows that. The virtio-gpu driver (a dynamic
importer), which imports the scanout buffer from the Xe VF driver,
calls dma_buf_pin(). So the backing object stays pinned until
Host/Qemu signals (via a fence) that it is done accessing (or using)
the Guest's buffer.

Also, note that since virtio-gpu registers a move_notify() callback, it
can let Qemu know of any location changes associated with the backing
store of the imported scanout buffer by sending attach_backing and
detach_backing cmds.
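Reduced to the relevant dma-buf calls, the importer-side flow I am
describing looks roughly like the sketch below. This is a simplified
illustration, not the actual virtio-gpu code: function names are made
up, and error paths plus the actual attach_backing/detach_backing
command submission are left out:

#include <linux/dma-buf.h>
#include <linux/dma-resv.h>
#include <linux/err.h>

/* Called by the dma-buf core when the exporter moves the backing
 * store; this is where the Guest would queue detach_backing followed
 * by attach_backing for the affected object. */
static void scanout_move_notify(struct dma_buf_attachment *attach)
{
	/* ...notify Qemu about the new backing store location... */
}

static const struct dma_buf_attach_ops scanout_attach_ops = {
	/* would become the proposed Qemu-controlled toggle */
	.allow_peer2peer = true,
	.move_notify = scanout_move_notify,
};

/* Import and pin the exporter's scanout buffer; it stays resident
 * until dma_buf_unpin() is called once the Host signals (via a fence)
 * that it is done with the buffer. */
static int scanout_import_and_pin(struct dma_buf *dmabuf,
				  struct device *dev,
				  struct dma_buf_attachment **out)
{
	struct dma_buf_attachment *attach;
	int ret;

	attach = dma_buf_dynamic_attach(dmabuf, dev, &scanout_attach_ops,
					NULL);
	if (IS_ERR(attach))
		return PTR_ERR(attach);

	/* dma_buf_pin() must be called with the reservation lock held. */
	dma_resv_lock(dmabuf->resv, NULL);
	ret = dma_buf_pin(attach);
	dma_resv_unlock(dmabuf->resv);

	if (ret) {
		dma_buf_detach(dmabuf, attach);
		return ret;
	}

	*out = attach;
	return 0;
}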
Thanks,
Vivek

> Regards,
> Christian.
>
> >
> > Thanks,
> > Thomas
> >
> >
> >>
> >>>
> >>> Ideally using the same structure Qemu used to provide the offset
> >>> to the vfio-pci driver, but not a must have.
> >>>
> >>> This way the driver for the GPU PF (XE) can leverage this
> >>> interface, validates that the DMA-buf comes from a VF it feels
> >>> responsible for and do the math to figure out in which parts of
> >>> the VRAM needs to be accessed to scanout the picture.
> >> Sounds good. This is definitely a viable path forward and it looks
> >> like we are all in agreement with this idea.
> >>
> >> I guess we can start exploring how to implement the private dmabuf
> >> interface mechanism right away.
> >>
> >> Thanks,
> >> Vivek
> >>
> >>>
> >>> This way this private vfio-pci interface can also be used by
> >>> iommufd for example.
> >>>
> >>> Regards,
> >>> Christian.
> >>>
> >>>>
> >>>> Thanks,
> >>>> Vivek
> >>>>
> >>>>>
> >>>>> Regards,
> >>>>> Christian.
> >>>>>
> >>>>>>
> >>>>>>> What Simona agreed on is exactly what I proposed as well,
> >>>>>>> that you get a private interface for exactly that use case.
> >>>>>>
> >>>>>> A "private" interface to exchange phys_addr_t between at
> >>>>>> least VFIO/KVM/iommufd - sure no complaint with that.
> >>>>>>
> >>>>>> Jason
> >>>>
> >>
> >