On 12/11/25 13:33, Philipp Stanner wrote:
> On Thu, 2025-12-11 at 13:16 +0100, Christian König wrote:
>> Hi everyone,
>>
>> dma_fences have always lived under the tyranny dictated by the module
>> lifetime of their issuer, leading to crashes should anybody still be holding
>> a reference to a dma_fence when the module of the issuer was unloaded.
>>
>> The basic problem is that when buffers are shared between drivers
>> dma_fence objects can leak into external drivers and stay there even
>> after they are signaled. The dma_resv object for example only lazily releases
>> dma_fences.
>>
>> So what happens is that when the module which originally created the dma_fence
>> unloads the dma_fence_ops function table becomes unavailable as well and so
>> any attempt to release the fence crashes the system.
>>
>> Previously various approaches have been discussed, including changing the
>> locking semantics of the dma_fence callbacks (by me) as well as using the
>> drm scheduler as intermediate layer (by Sima) to disconnect dma_fences
>> from their actual users, but none of them actually solves all problems.
>>
>> Tvrtko did some really nice prerequisite work by protecting the returned
>> strings of the dma_fence_ops by RCU. This way dma_fence creators were
>> able to just wait for an RCU grace period after fence signaling before
>> it would be safe to free those data structures.
>>
>> Now this patch set here goes a step further and protects the whole
>> dma_fence_ops structure by RCU, so that after the fence signals the
>> pointer to the dma_fence_ops is set to NULL when there is no wait nor
>> release callback given. All functionality which uses the dma_fence_ops
>> reference is put inside an RCU critical section, except for the
>> deprecated issuer specific wait and of course the optional release
>> callback.
>>
>> In addition to the RCU changes, the lock protecting the dma_fence state
>> previously had to be allocated externally. This set here now changes the
>> functionality to make that external lock optional and allows dma_fences
>> to use an inline lock and be self contained.
>>
>> v4:
>>
>> Rebases the whole set on upstream changes, especially the cleanup
>> from Philipp in patch "drm/amdgpu: independence for the amdkfd_fence!".
>>
>> Adding two patches which bring the DMA-fence self tests up to date.
>> The first selftest change removes the mock_wait and so actually starts
>> testing the default behavior instead of some hacky implementation in the
>> test. This one should probably go upstream independent of this set.
>> The second drops the mock_fence as well and tests the new RCU and inline
>> spinlock functionality.
>>
>> Especially the first patch still needs a Reviewed-by, apart from that I
>> think I've addressed all review comments.
>>
>> The plan is to push the core DMA-buf changes to drm-misc-next and then the
>> driver specific changes through the driver channels as appropriate.
>
> This does not apply to drm-misc-next (unless I'm screwing up badly).
>
> Where can I apply it? I'd like to test the drm_sched changes before
> this gets merged.

drm-tip from a few days ago, otherwise the xe changes won't work.

Regards,
Christian.

>
> P.
>
>>
>> Please review and comment,
>> Christian.
>>
>>
>
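
For anyone following along, the detach step described in the quoted cover letter can be sketched roughly as below. This is only a userspace model under stated assumptions: C11 atomics stand in for RCU, and every name in it (struct fence, struct fence_ops, fence_init, fence_signal, fence_driver_name, the demo_* helpers and the "detached" string) is hypothetical and not the kernel's dma_fence API or the code in this series.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct fence;

/* Hypothetical stand-in for an issuer's ops table; not the kernel struct. */
struct fence_ops {
	const char *(*get_driver_name)(struct fence *f);
	bool (*wait)(struct fence *f);     /* optional, issuer specific */
	void (*release)(struct fence *f);  /* optional */
};

struct fence {
	_Atomic(const struct fence_ops *) ops;
	atomic_bool signaled;
};

static void fence_init(struct fence *f, const struct fence_ops *ops)
{
	atomic_init(&f->ops, ops);
	atomic_init(&f->signaled, false);
}

/*
 * Signal the fence. If the issuer supplied neither a wait nor a release
 * callback, nothing needs the ops table afterwards, so detach it.
 */
static void fence_signal(struct fence *f)
{
	const struct fence_ops *ops = atomic_load(&f->ops);

	atomic_store(&f->signaled, true);
	if (ops && !ops->wait && !ops->release)
		atomic_store(&f->ops, NULL);
}

/*
 * Readers have to tolerate a NULL ops pointer. In the kernel this load
 * would sit inside an RCU read-side critical section; here it is a plain
 * atomic load (assumption of this model).
 */
static const char *fence_driver_name(struct fence *f)
{
	const struct fence_ops *ops = atomic_load(&f->ops);

	return ops ? ops->get_driver_name(f) : "detached";
}

/* Hypothetical issuer callbacks used only for illustration. */
static const char *demo_name(struct fence *f)
{
	(void)f;
	return "demo-driver";
}

static void demo_release(struct fence *f)
{
	(void)f;
}

/* No wait, no release: the ops pointer is dropped on signal. */
static const struct fence_ops demo_ops = {
	.get_driver_name = demo_name,
};

/* Has a release callback: the ops pointer has to stay. */
static const struct fence_ops demo_ops_release = {
	.get_driver_name = demo_name,
	.release = demo_release,
};
```

A fence initialised with demo_ops loses its ops pointer once fence_signal() runs, while one initialised with demo_ops_release keeps it. The model does not show the grace period the issuer still has to wait for before its module can go away.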
