On 23.09.25 14:15, Jason Gunthorpe wrote:
> On Tue, Sep 23, 2025 at 09:52:04AM +0200, Christian König wrote:
>> For example the ISP driver part of amdgpu provides the V4L2
>> interface and when we interchange a DMA-buf with it we recognize that
>> it is actually the same device we work with.
>
> One of the issues here is the mis-use of dma_map_resource() to create
> dma_addr_t for PCI devices. This was never correct.
That is not a mis-use at all, but rather exactly what dma_map_resource()
was created for. If dma_map_resource() is not ACS aware then we should
add that. (A minimal sketch of that usage pattern is appended at the end
of this mail.)

> VFIO is using a new correct ACS aware DMA mapping API that I would
> expect all the DMABUF world to slowly migrate to. This API prevents
> mappings in cases that don't work in HW.
>
> So a design where you have to DMA map something then throw away the
> DMA map after doing some "shortcut" check isn't going to work.
>
> We need some way for the importer/exporter to negotiate what kind of
> address they want to exchange without forcing a dma mapping.

That is already in place. We don't DMA map anything in those use cases.

>>>> I've read through this thread—Jason, correct me if I'm wrong—but I
>>>> believe what you're suggesting is that instead of using PCIe P2P
>>>> (dma_map_resource) to communicate the VF's VRAM offset to the PF, we
>>>> should teach dma-buf to natively understand a VF's VRAM offset. I don't
>>>> think this is currently built into dma-buf, but it probably should be,
>>>> as it could benefit other use cases as well (e.g., UALink, NVLink,
>>>> etc.).
>>>>
>>>> In both examples above, the PCIe P2P fabric is used for communication,
>>>> whereas in the VF→PF case, it's only using the PCIe P2P address to
>>>> extract the VF's VRAM offset, rather than serving as a communication
>>>> path. I believe that's Jason's objection. Again, Jason, correct me if
>>>> I'm misunderstanding here.
>
> Yes, this is my point.
>
> We have many cases now where a dma_addr_t is not the appropriate way
> to exchange addressing information from importer/exporter and we need
> more flexibility.
>
> I also consider the KVM and iommufd use cases that must have a
> phys_addr_t in this statement.

Abusing phys_addr_t is also the completely wrong approach here. When you
want to communicate addresses in a device-specific address space, you
need a device-specific type for that and not an abused phys_addr_t. (See
the illustrative sketch at the end of this mail.)

>> What you can do is to either export the DMA-buf from the driver who
>> feels responsible for the PF directly (that's what we do in amdgpu
>> because the VRAM is actually not fully accessible through the BAR).
>
> Again, considering security somehow as there should not be uAPI to
> just give uncontrolled access to VRAM.
>
> From a security side having the VF create the DMABUF is better as you
> get that security proof that it is permitted to access the VRAM.

Well, the VF is basically just a window into the HW of the PF. The real
question is: where does VFIO get the necessary information about which
parts of the BAR to expose?

> From this thread I think if VFIO had the negotiated option to export a
> CPU phys_addr_t then the Xe PF driver can reliably convert that to a
> VRAM offset.
>
> We need to add a CPU phys_addr_t option for VFIO to iommufd and KVM
> anyhow, those cases can't use dma_addr_t.

Clear NAK to using CPU phys_addr_t. This is just a horrible idea.

Regards,
Christian.

>
>>>> I'd prefer to leave the provisioning data to the PF if possible. I
>>>> haven't fully wrapped my head around the flow yet, but it should be
>>>> feasible for the VF → VFIO → PF path to pass along the initial VF
>>>> scatter-gather (SG) list in the dma-buf, which includes VF-specific
>>>> PFNs. The PF can then use this, along with its provisioning information,
>>>> to resolve the physical address.
>>
>> Well don't put that into the sg_table but rather into an xarray or
>> similar, but in general that's the correct idea.
>
> Yes, please lets move away from re-using dma_addr_t to represent
> things that are not created by the DMA API.
>
> Jason
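
For reference, a minimal sketch of the dma_map_resource() usage pattern
referred to at the top of this mail: mapping a peer PCI device's BAR (a
raw phys_addr_t with no struct page behind it) into the importing
device's DMA address space. The helper name and the BAR choice are
illustrative only, and the ACS/P2P routing checks discussed above are
not shown.

#include <linux/dma-mapping.h>
#include <linux/pci.h>

/* Illustrative helper, not taken from the patch set under discussion. */
static dma_addr_t map_peer_bar(struct device *importer,
                               struct pci_dev *peer, int bar,
                               size_t *out_size)
{
        phys_addr_t phys = pci_resource_start(peer, bar);
        size_t size = pci_resource_len(peer, bar);
        dma_addr_t addr;

        /* dma_map_resource() consumes a raw MMIO phys_addr_t */
        addr = dma_map_resource(importer, phys, size,
                                DMA_BIDIRECTIONAL, 0);
        if (dma_mapping_error(importer, addr))
                return DMA_MAPPING_ERROR;

        *out_size = size;
        return addr;
}

/*
 * Teardown mirrors the map:
 * dma_unmap_resource(importer, addr, size, DMA_BIDIRECTIONAL, 0);
 */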

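Purely as an illustration of the "device-specific type instead of
phys_addr_t" point above (none of these names exist today, they are made
up for this sketch), this is the kind of descriptor an exporter could
hand to an importer so that the address space the offset lives in stays
explicit:

#include <linux/device.h>
#include <linux/types.h>

/* Hypothetical descriptor, not an existing dma-buf interface. */
struct p2p_vram_region {
        struct device *provider; /* device owning the address space */
        u64 offset;              /* offset inside that device's VRAM */
        u64 len;                 /* length of the region in bytes */
};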
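The last quoted exchange suggests keeping the VF-specific PFNs in an
xarray rather than encoding them into an sg_table. A minimal
illustrative sketch of that bookkeeping (struct and function names are
made up, error handling trimmed):

#include <linux/xarray.h>

struct vf_vram_map {
        struct xarray pfns;     /* index: page within the BO, value: PFN */
};

static void vf_vram_map_init(struct vf_vram_map *m)
{
        xa_init(&m->pfns);
}

static int vf_vram_map_add(struct vf_vram_map *m, unsigned long idx,
                           unsigned long pfn)
{
        /* PFNs fit into an xarray value entry, no per-entry allocation */
        return xa_err(xa_store(&m->pfns, idx, xa_mk_value(pfn),
                               GFP_KERNEL));
}

static unsigned long vf_vram_map_lookup(struct vf_vram_map *m,
                                        unsigned long idx)
{
        void *entry = xa_load(&m->pfns, idx);

        return xa_is_value(entry) ? xa_to_value(entry) : 0;
}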