On 1/21/26 17:01, Jason Gunthorpe wrote:
> On Wed, Jan 21, 2026 at 04:28:17PM +0100, Christian König wrote:
>> On 1/21/26 14:31, Jason Gunthorpe wrote:
>>> On Wed, Jan 21, 2026 at 10:20:51AM +0100, Christian König wrote:
>>>> On 1/20/26 15:07, Leon Romanovsky wrote:
>>>>> From: Leon Romanovsky <[email protected]>
>>>>>
>>>>> dma-buf invalidation is performed asynchronously by hardware, so VFIO must
>>>>> wait until all affected objects have been fully invalidated.
>>>>>
>>>>> Fixes: 5d74781ebc86 ("vfio/pci: Add dma-buf export support for MMIO
>>>>> regions")
>>>>> Signed-off-by: Leon Romanovsky <[email protected]>
>>>>
>>>> Reviewed-by: Christian König <[email protected]>
>>>>
>>>> Please also keep in mind that while this waits for all fences for
>>>> correctness, you also need to keep the mapping valid until
>>>> dma_buf_unmap_attachment() was called.
>>>
>>> Can you elaborate on this more?
>>>
>>> I think what we want for dma_buf_attach_revocable() is the strong
>>> guarantee that the importer stops doing all access to the memory once
>>> this sequence is completed and the exporter can rely on it. I don't
>>> think this works any other way.
>>>
>>> This is already true for dynamic move capable importers, right?
>>
>> Not quite, no.
>
> :(
>
> It is kind of shocking to hear these APIs work like this with such a
> loose lifetime definition. Leon, can you include some of these details
> in the new comments?

Yeah, when the API was designed we intentionally said that waiting for
the fences means waiting for all operations to finish. But then came
reality: hardware just does things like speculative read-ahead, and
with that all the nice design goes into the trash bin.
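For reference, the kind of sequence we are talking about looks roughly
like this on the exporter side (just a sketch, not Leon's actual patch;
"dmabuf" stands for whatever struct dma_buf the exporter is revoking,
only the dma-buf/dma-resv calls are the real API):

	/*
	 * Tell all importers to invalidate their mappings, then wait
	 * for *every* fence on the reservation object, because the
	 * invalidation itself can run asynchronously on the importing
	 * hardware.
	 */
	dma_resv_lock(dmabuf->resv, NULL);
	dma_buf_move_notify(dmabuf);

	/* BOOKKEEP is the widest usage class and covers all fences. */
	dma_resv_wait_timeout(dmabuf->resv, DMA_RESV_USAGE_BOOKKEEP,
			      false, MAX_SCHEDULE_TIMEOUT);
	dma_resv_unlock(dmabuf->resv);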
>>>> In other words you can only redirect the DMA-addresses previously
>>>> given out into nirvana (or a dummy memory or similar), but you still
>>>> need to avoid re-using them for something else.
>>>
>>> Does any driver do this? If you unload/reload a GPU driver it is
>>> going to re-use the addresses handed out?
>>
>> I never fully read through all the source code, but if I'm not
>> completely mistaken that is enforced for all GPU drivers through the
>> DMA-buf and DRM layer lifetime handling, and I think even in other
>> in-kernel frameworks like V4L, ALSA etc...
>
>> What roughly happens is that each DMA-buf mapping through a couple
>> of hoops keeps a reference on the device, so even after a hotplug
>> event the device can only fully go away after all housekeeping
>> structures are destroyed and buffers freed.
>
> A simple reference on the device means nothing for these kinds of
> questions. It does not stop unloading and reloading a driver.

Well, as far as I know it stops the PCIe address space from being
re-used. So when you do an "echo 1 > remove" and then a re-scan on the
upstream bridge, that works, but you get different addresses for your
MMIO BARs!

> Obviously if the driver is loaded fresh it will reallocate.
>
> To do what you are saying the DRM drivers would have to block during
> driver remove until all unmaps happen.

Oh, well, I never looked too deeply into that. As far as I know it
doesn't block, but rather the last drm_dev_put() just cleans things up.
And we have a CI test system which exercises that stuff over and over
again, because we have a big customer depending on it.

>> Background is that a lot of devices still make reads even after you
>> have invalidated a mapping, but then discard the result.
>
> And they also don't insert fences to conclude that?

Nope, that is just speculative read-ahead from other operations which
actually doesn't have anything to do with our buffer.

>> So when you don't have such a grace period you end up with PCI AER,
>> warnings from the IOMMU, random accesses to PCI BARs which just
>> happen to be in the old location of something, etc...
>
> Yes, definitely. It is very important to have a definitive point in
> the API where all accesses stop. While "read but discard" seems
> harmless on the surface, there are corner cases where it is not OK.
>
> Am I understanding right that these devices must finish their reads
> before doing unmap?

Yes, and that is a big one. Otherwise we basically lose any chance of
sanely handling this.

>> I would rather like to keep those semantics even for forceful
>> shootdowns, since they proved to be rather reliable.
>
> We can investigate making unmap the barrier point if this is the case.

I mean, when you absolutely can't do it any other way, just make sure
that a speculative read doesn't result in any form of error message,
triggered action or similar. That approach works as well.
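To illustrate what I mean by redirecting into dummy memory, here is a
rough, untested sketch. It assumes the exporter controls an IOMMU
domain for the attachment; redirect_to_dummy() and its parameters are
made up for illustration, only the iommu_* calls are the real API:

	static int redirect_to_dummy(struct iommu_domain *domain,
				     unsigned long iova, size_t size,
				     struct page *dummy_page)
	{
		size_t off;
		int ret;

		/* Drop the real mappings for the revoked range. */
		iommu_unmap(domain, iova, size);

		/*
		 * Point every page of the old IOVA range at the same
		 * scratch page, so that stray speculative reads complete
		 * harmlessly instead of triggering PCI AER or IOMMU
		 * faults. The range itself stays reserved until
		 * dma_buf_unmap_attachment() was called.
		 */
		for (off = 0; off < size; off += PAGE_SIZE) {
			ret = iommu_map(domain, iova + off,
					page_to_phys(dummy_page),
					PAGE_SIZE, IOMMU_READ, GFP_KERNEL);
			if (ret)
				return ret;
		}
		return 0;
	}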
And yes, we absolutely have to document all those findings and that
behavior in the DMA-buf API.

Regards,
Christian.