Hi Christian,

>
> Hi guys,
>
> On 22.09.25 08:59, Kasireddy, Vivek wrote:
> > Hi Jason,
> >
> >> Subject: Re: [PATCH v4 1/5] PCI/P2PDMA: Don't enforce ACS check for
> >> device functions of Intel GPUs
> >>
> >> On Fri, Sep 19, 2025 at 06:22:45AM +0000, Kasireddy, Vivek wrote:
> >>>> In this case messing with ACS is completely wrong. If the intention is
> >>>> to convey some kind of "private" address representing the physical
> >>>> VRAM then you need to use a DMABUF mechanism to do that, not
> >>>> deliver a P2P address that the other side cannot access.
> >>
> >>> I think using a PCI BAR address works just fine in this case because
> >>> the Xe driver bound to the PF on the Host can easily determine that it
> >>> belongs to one of the VFs and translate it into a VRAM address.
> >>
> >> That isn't how the P2P or ACS mechanism works in Linux; it is about
> >> the actual address used for DMA.
> > Right, but this is not dealing with P2P DMA access between two random,
> > unrelated devices. Instead, this is a special situation involving a GPU
> > PF trying to access the VRAM of a VF that it provisioned and holds a
> > reference on (note that the backing object for the VF's VRAM is pinned
> > by Xe on the Host as part of resource provisioning). But it gets treated
> > as regular P2P DMA because the exporters rely on pci_p2pdma_distance()
> > or pci_p2pdma_map_type() to determine P2P compatibility.
> >
> > In other words, I am trying to look at this problem differently: how can
> > the PF be allowed to access the VF's resource that it provisioned,
> > particularly when the VF itself requests the PF to access it and when a
> > hardware path (via the PCIe fabric) is not required/supported or doesn't
> > exist at all?
>
> Well, what exactly is happening here? You have a PF assigned to the host
> and a VF passed through to a guest, correct?
Yes, correct.
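And just to be explicit about that relationship: the PF side can tell on
its own that a given VF is one it owns. A minimal sketch of the kind of
check I mean (the helper name is made up; this is not the actual Xe code):

#include <linux/pci.h>

/*
 * Minimal sketch (not the actual Xe code): confirm that a given VF was
 * created by, and therefore belongs to, this PF. pci_physfn() returns the
 * owning PF for a VF, or the device itself if it is not a VF.
 */
static bool pf_owns_vf(struct pci_dev *pf_pdev, struct pci_dev *vf_pdev)
{
	return vf_pdev->is_virtfn && pci_physfn(vf_pdev) == pf_pdev;
}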
>
> And now the PF (from the host side) wants to access a BAR of the VF?
Yes, that is indeed the use-case, except that the PF cannot access a
buffer located in the VF's VRAM portion via the BAR, because this path is
likely not supported by our hardware. Therefore, my proposal (via this
patch series) is to translate the BAR addresses into VRAM addresses in the
Xe driver (on the Host).

Here are some more details about the use-case (copied from an earlier
reply to Jason):
- The Xe graphics driver, bound to the GPU PF on the Host, provisions its
  resources, including VRAM, among all the VFs.
- A GPU VF device is bound to vfio-pci and assigned to a Linux VM which is
  launched via Qemu.
- The Xe graphics driver running inside the Linux VM creates a buffer
  (Gnome Wayland compositor's framebuffer) in the VF's portion (or share)
  of the VRAM, and this buffer is shared with Qemu. Qemu then requests the
  vfio-pci driver to create a dmabuf associated with this buffer.
- Next, Qemu (UI layer) requests the GPU PF (via the Xe driver) to import
  the dmabuf (for display purposes) located in the VF's portion of the
  VRAM.

This is where two problems occur:
1) The exporter (the vfio-pci driver in this case) calls
   pci_p2pdma_map_type() to determine the mapping type (or check P2P
   compatibility) between both devices (GPU VF and PF), but it fails due
   to the ACS enforcement check because the PCIe upstream bridge is not
   whitelisted, which is a common problem on workstations/desktops/laptops.
2) Assuming that pci_p2pdma_map_type() did not fail (likely on server
   systems with whitelisted PCIe bridges), based on my experiments, the
   GPU PF is unable to access the buffer located in the VF's VRAM portion
   directly because it is represented using PCI BAR addresses. (Note that
   the PCI BAR address is the DMA address here, which seems to be common
   practice among GPU drivers, including Xe and amdgpu, when exporting
   dmabufs to other devices.) The only way this seems to work at the
   moment is if the BAR addresses are translated into VRAM addresses that
   the GPU PF understands (this is done inside the Xe driver on the Host
   using provisioning data). Note that this buffer is accessible by the
   CPU using BAR addresses, but it is very slow.

So, in summary, given that the GPU PF does not need to use the PCIe fabric
in order to access the buffer located in the GPU VF's portion of the VRAM
in this use-case, I figured adding a quirk (to not enforce the ACS check)
would solve 1) and implementing the BAR to VRAM address translation in the
Xe driver on the Host would solve 2) above.
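To make 2) a bit more concrete, the translation I have in mind on the Xe
PF side is roughly the following. The struct and helper names below are
made up purely for illustration (the real provisioning data is kept by the
Xe PF driver); the arithmetic is the whole point:

#include <linux/errno.h>
#include <linux/types.h>

/*
 * Illustrative sketch only -- not the actual Xe implementation. The PF
 * driver knows, from its own provisioning data, where each VF's share of
 * VRAM starts, both in the VF's BAR space and in device VRAM, so a PCI BAR
 * address belonging to that VF can be turned back into a VRAM address that
 * the PF's engines understand.
 */
struct vf_vram_provisioning {
	resource_size_t bar_start;  /* host physical start of the VF's VRAM BAR */
	resource_size_t bar_size;   /* size of the VF's VRAM BAR */
	u64 vram_offset;            /* start of the VF's share in device VRAM */
};

static int bar_addr_to_vram_addr(const struct vf_vram_provisioning *vf,
				 resource_size_t bar_addr, u64 *vram_addr)
{
	if (bar_addr < vf->bar_start ||
	    bar_addr >= vf->bar_start + vf->bar_size)
		return -EINVAL; /* not within this VF's VRAM BAR */

	*vram_addr = vf->vram_offset + (bar_addr - vf->bar_start);
	return 0;
}

This only works because the PF driver is the entity that provisioned the
VF's share in the first place and keeps the backing object pinned.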
Also, Jason suggested that using the dmabuf private address mechanism
would help with my use-case. Could you please share details about how it
can be used here?

Thanks,
Vivek

>
> Regards,
> Christian.
>
> >
> > Furthermore, note that on a server system with a whitelisted PCIe
> > upstream bridge, this quirk would not be needed at all, as
> > pci_p2pdma_map_type() would not have failed and this would have been a
> > purely Xe driver specific problem to solve that would have required just
> > the translation logic and no further changes anywhere. But my goal is to
> > fix it across systems like workstations/desktops that do not typically
> > have whitelisted PCIe upstream bridges.
> >
> >> You can't translate a dma_addr_t to anything in the Xe PF driver
> >> anyhow, once it goes through the IOMMU the necessary information is lost.
> > Well, I already tested this path (via the IOMMU, with your earlier
> > vfio-pci + dmabuf patch that used dma_map_resource() and also with
> > Leon's latest version) and found that I could still do the translation
> > in the Xe PF driver after first calling iommu_iova_to_phys().
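To expand on that last point for Christian and Simona, the recovery step I
tested is roughly the following (a simplified sketch of the idea, not the
actual code; the helper name is made up and error handling is omitted):

#include <linux/device.h>
#include <linux/iommu.h>

/*
 * Simplified sketch (not the actual Xe code): when the dmabuf was mapped
 * through the IOMMU, the IOVA handed to the PF can still be resolved back
 * to the physical (BAR) address it maps; the PF driver can then translate
 * that BAR address into a VRAM address using its provisioning data.
 */
static phys_addr_t dma_addr_to_bar_addr(struct device *dev, dma_addr_t addr)
{
	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);

	if (domain)
		return iommu_iova_to_phys(domain, addr);

	/* No IOMMU translation in use; the DMA address is the BAR address. */
	return (phys_addr_t)addr;
}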
> >
> >> This is a fundamentally broken design to dma map something and
> >> then try to reverse engineer the dma_addr_t back to something with
> >> meaning.
> > IIUC, I don't think this is a new or radical idea. I think the concept
> > is slightly similar to using bounce buffers to address hardware DMA
> > limitations, except that there are no memory copies and the CPU is not
> > involved. And I don't see any other way to do this, because I don't
> > believe the exporter can provide a DMA address that the importer can use
> > directly without any translation, which seems unavoidable in this case.
> >
> >>>> Christian told me dmabuf has such a private address mechanism, so
> >>>> please figure out a way to use it..
> >>>
> >>> Even if such a mechanism exists, we still need a way to prevent
> >>> pci_p2pdma_map_type() from failing when invoked by the exporter (vfio-pci).
> >>> Does it make sense to move this quirk into the exporter?
> >>
> >> When you export a private address through dmabuf the VFIO exporter
> >> will not call p2pdma paths when generating it.
> > I have cc'd Christian and Simona. Hopefully, they can help explain how
> > the dmabuf private address mechanism can be used to address my
> > use-case. And I sincerely hope that it will work; otherwise I don't see
> > any viable path forward for what I am trying to do other than using this
> > quirk and translation. Note that the main reason why I am doing this is
> > that I am seeing at least a ~35% performance gain when running light
> > 3D/Gfx workloads.
> >
> >>> Also, AFAICS, translating a BAR address to a VRAM address can only be
> >>> done by the Xe driver bound to the PF because it has access to
> >>> provisioning data. In other words, vfio-pci would not be able to share
> >>> any address other than the BAR address because it wouldn't know how to
> >>> translate it to a VRAM address.
> >>
> >> If you have a vfio variant driver then the VF vfio driver could call
> >> the Xe driver to create a suitable dmabuf using the private
> >> addressing. This is probably what is required here if this is what you
> >> are trying to do.
> > Could this not be done via the vendor agnostic vfio-pci (+ dmabuf)
> > driver instead of having to use a separate VF/vfio variant driver?
> >
> >>>> No, don't, it is completely wrong to mess with ACS flags for the
> >>>> problem you are trying to solve.
> >>
> >>> But I am not messing with any ACS flags here. I am just adding a quirk
> >>> to sidestep the ACS enforcement check given that the PF to VF access
> >>> does not involve the PCIe fabric in this case.
> >>
> >> Which is completely wrong. These are all based on fabric capability,
> >> not based on code in drivers to wrongly "translate" the dma_addr_t.
> > I am not sure why you consider translation to be wrong in this case,
> > given that it is done by a trusted entity (the Xe PF driver) that is
> > bound to the GPU PF and provisioned the resource that it is trying to
> > access. What limitations do you see with this approach?
> >
> > Also, the quirk being added in this patch is indeed meant to address a
> > specific case (GPU PF to VF access) to work around a potential hardware
> > limitation (the non-existence of a direct PF to VF DMA access path via
> > the PCIe fabric). Isn't that one of the main ideas behind using quirks
> > -- to address hardware limitations?
> >
> > Thanks,
> > Vivek
> >
> >>
> >> Jason
