On 3/6/26 15:55, Boris Brezillon wrote:
> On Fri, 6 Mar 2026 13:54:15 +0100
> Christian König <[email protected]> wrote:
>
>> On 3/6/26 13:36, Philipp Stanner wrote:
>>>>>> (which is the thing that would be attached to the HW ringbuf.
>>>>>> The reason is: we don't want to leave unsignalled fences behind,
>>>>>
>>>>> Not only do we not "want to", we actually *cannot*. We have to
>>>>> make sure all fences are signaled, because only this way can the
>>>>> C backend plus RCU protect the Rust code against UAF as well.
>>>>>
>>>>>> and if the HW ring is gone, there's nothing that can signal it.
>>>>>> Mind explaining why you think this shouldn't be done? Because I
>>>>>> originally interpreted your suggestion as exactly the opposite.
>>>>>
>>>>> I also don't get it. All fences must always get signaled; that's
>>>>> one of the most fundamental fence rules. Thus, if the last
>>>>> accessor to a fence drops, you do want to signal it with
>>>>> -ECANCELED.
>>>>
>>>> All fences must always signal because the HW operation must always
>>>> complete or be terminated by a timeout.
>>>>
>>>> If a fence signals only because it runs out of scope, then that
>>>> means you have a huge potential for data corruption, and that is
>>>> even worse than not signaling a fence.
>>>>
>>>> In other words, not signaling a fence can leave the system in a
>>>> deadlock state, but signaling it incorrectly usually results in
>>>> random data corruption.
>>>
>>> It all stands and falls with the question of whether a fence can
>>> drop by accident in Rust, or whether it will only ever drop when
>>> the HW ring is closed.
>>>
>>> What do you believe is the right thing to do when a driver unloads?
>>
>> Do a dma_fence_wait() to make sure that all HW operations have
>> completed and all fences have signaled.
>>
>>> Ideally we could design it in a way that the driver closes its
>>> rings, and the pending fences drop and get signaled with -ECANCELED.
>>
>> No, exactly that is a really bad idea.
>>
>> Just do it the other way around: use the dma_fence to wait for the
>> HW operation to be completed.
>
> But in practice, most of the time you don't just wait for the HW to
> finish. You instruct the HW to stop processing stuff, and then wait
> for it to acknowledge that it has indeed stopped.
And how does the HW acknowledge that it has indeed stopped? Maybe by
sending an interrupt which signals a DMA fence?

The point here is that every acknowledgement from the HW that a DMA
operation was indeed stopped, independent of whether it's the "normal
operation completed" case or the "I have aborted" case, *must* always
take the same HW and SW path.

It is *not* sufficient to do something like busy-waiting for a bit in
a register to flip in the abort path while relying on a DMA memory
write in the normal completion path. That's also why an MMU/VM inside
a device is usually not sufficient to prevent freed memory from being
written to; you need an IOMMU for that, i.e. one close to the
CPU/memory and without caches behind it in the HW path.

I should probably write that down in some documentation.

Regards,
Christian.

> And the HWRing object will only be dropped when that happens. There's
> of course a fallback for the case where the STOP operation fails
> (with reset, etc.), which would just delay the drop of the HWRing.
> But the point is: when the HWRing is dropped, you should be
> guaranteed that the HW is no longer executing any of the possibly
> pending jobs. Now, of course you can decide that it's the driver's
> responsibility to walk the list of jobs that were still pending after
> a STOP has been acked and manually signal all fences, or you can
> assume that the HWRing being dropped is what provides this guarantee.
> And because the HWRing embeds your DmaFenceCtx, which auto-signals on
> drop, you don't have to do anything.
>
>> Then wait for an RCU grace period to make sure that nobody is still
>> inside your DMA fence ops.
>>
>> And then you can continue with unloading the module.
>>
>>> Your concern seems to be a driver accidentally dropping a fence
>>> while the hardware is still processing the associated job.
>>>
>>> (How's that dangerous, though? Shouldn't parties waiting for the
>>> fence detect the error?
>>> ECANCELED ⇒ you must not access the associated memory)
>>
>> The dma_fence is the SW object which represents the HW operation.
>>
>> And that HW operation is doing DMA, i.e. accessing and potentially
>> writing into memory. That's where the name Direct Memory Access
>> comes from.
>>
>> So when that is messed up, the memory which gets written to is
>> potentially re-used, with the absolutely dire consequences we have
>> seen so many times.
>
> But then I'd argue that the HWRing (and the underlying FenceCtx
> keeping track of emitted fences on this ring) should live at least as
> long as the HW is assumed to be running commands. That's IMHO the
> guarantee we need to provide: don't drop your HWRing object until
> you're sure the HW has stopped pulling stuff from there. I can think
> of the following two cases:
>
> 1. The HW has a way to prematurely stop stuff that is currently
> executing. First we ensure the HW can't pull anything new, then we
> issue a STOP and wait for an ACK. When this ACK is received, we
> proceed with the rest of the cleanup. If the ACK doesn't come in
> time, we flag the HWRing unusable, schedule a reset, and wait for
> this reset to be effective before dropping the HWRing.
>
> 2. The HW can't stop what it has started doing. We make sure nothing
> new can be pushed to the HWRing and wait for the in-flight ops to
> land. If it's taking too long, the timeout handler takes over,
> schedules a reset, and drops the faulty HWRing only after the reset
> is effective.
>
>> Keep in mind that this framework is not only used by GPUs, where at
>> least modern ones have VM protection, but also by old ones and stuff
>> like V4L, where such things are just not present in any way.
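The two teardown cases above can be sketched as a toy model. This is
plain userspace Rust, not kernel code; `HwRing`, `StopResult`, and the
method names are hypothetical and only illustrate the ordering
invariant being discussed (the ring may only be dropped once the HW is
known to be quiescent):

```rust
#[derive(Debug, PartialEq)]
enum StopResult {
    /// HW acked the STOP (case 1) or all in-flight ops landed (case 2).
    Stopped,
    /// No ack / ops still in flight after the timeout: a reset is needed.
    TimedOut,
}

struct HwRing {
    /// Seqnos of fences emitted on this ring that have not signaled yet.
    pending: Vec<u64>,
    /// Set once the HW is known to no longer pull from this ring.
    hw_quiescent: bool,
}

impl HwRing {
    fn new() -> Self {
        HwRing { pending: Vec::new(), hw_quiescent: false }
    }

    fn emit_fence(&mut self, seqno: u64) {
        assert!(!self.hw_quiescent, "ring is shut down");
        self.pending.push(seqno);
    }

    /// Case 1: block new submissions, issue STOP, wait for the ACK.
    /// Case 2 looks the same minus the STOP: just wait for in-flight
    /// ops to land. `hw_acked` stands in for the real wait.
    fn shutdown(&mut self, hw_acked: bool) -> StopResult {
        if hw_acked {
            self.hw_quiescent = true;
            StopResult::Stopped
        } else {
            // Timeout path: the ring stays alive until a reset makes
            // the HW quiescent; only then may it be dropped.
            StopResult::TimedOut
        }
    }

    /// Fallback: called once the scheduled reset is effective.
    fn reset_done(&mut self) {
        self.hw_quiescent = true;
    }
}

impl Drop for HwRing {
    fn drop(&mut self) {
        // The invariant under discussion: the ring must never be
        // dropped while the HW might still execute pending jobs.
        assert!(self.hw_quiescent || self.pending.is_empty(),
                "HwRing dropped while HW may still be running jobs");
        // Only here could an embedded fence context safely signal the
        // leftover fences, because the HW is done with the memory.
    }
}
```

The point of the `assert!` in `Drop` is that the "don't signal before
the HW is done" rule is enforced by the teardown sequencing, not by
each individual fence.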
>
> I'm not questioning the fact that signaling fences prematurely can
> lead to terrible security holes which are worse than deadlocks; I'm
> questioning whether this "don't-signal-before-HW-is-done" guarantee
> needs to be provided at the fence level, rather than at the fence
> emitter level.
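The "guarantee at the fence emitter level" idea can be sketched as
follows: the emitter (the fence context embedded in the ring) owns the
pending fences and completes any leftovers with -ECANCELED when it is
dropped, while the surrounding teardown code guarantees the context
only drops after the HW is quiescent. Again plain userspace Rust with
hypothetical names (`FenceCtx`, `FenceState`), not the kernel
dma_fence API:

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Linux errno value for ECANCELED; fences carry -ECANCELED on abort.
const ECANCELED: i32 = 125;

#[derive(Clone, Copy, Debug, PartialEq)]
enum FenceState {
    Pending,
    Signaled(i32), // 0 = success, -ECANCELED = aborted
}

type Fence = Rc<RefCell<FenceState>>;

struct FenceCtx {
    emitted: Vec<Fence>,
}

impl FenceCtx {
    fn new() -> Self {
        FenceCtx { emitted: Vec::new() }
    }

    fn emit(&mut self) -> Fence {
        let f = Rc::new(RefCell::new(FenceState::Pending));
        self.emitted.push(f.clone());
        f
    }

    /// Normal completion path, e.g. driven by the completion IRQ.
    fn signal(&mut self, f: &Fence, error: i32) {
        *f.borrow_mut() = FenceState::Signaled(error);
    }
}

impl Drop for FenceCtx {
    fn drop(&mut self) {
        // The emitter, not each fence, upholds the rule that every
        // fence eventually signals: anything still pending completes
        // with -ECANCELED. This is only safe because the caller must
        // not drop the context before the HW has stopped.
        for f in &self.emitted {
            if *f.borrow() == FenceState::Pending {
                *f.borrow_mut() = FenceState::Signaled(-ECANCELED);
            }
        }
    }
}
```

Whether this auto-signal-on-drop behavior is acceptable is exactly the
disputed question in this thread: it is only sound if the drop is
sequenced after HW quiescence, as in the HWRing teardown Boris
describes.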
