Re: dma_fence: force users to take the lock manually

Boris Brezillon Fri, 06 Mar 2026 04:48:48 -0800

On Fri, 06 Mar 2026 12:57:00 +0100
Philipp Stanner <[email protected]> wrote:


> On Fri, 2026-03-06 at 12:24 +0100, Boris Brezillon wrote:
> > On Fri, 6 Mar 2026 12:03:19 +0100
> > Christian König <[email protected]> wrote:
> >   
> > > On 3/6/26 11:37, Boris Brezillon wrote:  
> > > > On Fri, 6 Mar 2026 10:58:07 +0100
> > > > Christian König <[email protected]> wrote:
> > > >     
> > > > > On 3/6/26 10:46, Boris Brezillon wrote:    
> > > > > > On Fri, 6 Mar 2026 09:10:52 +0100
> > > > > > Christian König <[email protected]> wrote:      
> > > > > > > Well as I wrote above you either have super reliable locking in
> > > > > > > your signaling path or you will need that for error handling.     
> > > > > > >  
> > > > > > 
> > > > > > Not really. With rust's ownership model, you can make it so only
> > > > > > one thread gets to own the DriverFence (the signal-able fence
> > > > > > object), and the DriverFence::signal() method consumes this
> > > > > > object. This implies that only one path gets to signal the
> > > > > > DriverFence, and after that it vanishes, so no one else can
> > > > > > signal it anymore. Just to clarify, by vanishes, I mean that the
> > > > > > signal-able view disappears, but the observable object (Fence)
> > > > > > can stay around, so it can be monitored (and only monitored) by
> > > > > > others. With this model, it doesn't matter that _set_error() is
> > > > > > set under a dma_fence locked section or not, because the
> > > > > > concurrency is addressed at a higher level.      
> > > > > 
> > > > > That whole approach won't work. You have at least the IRQ handler
> > > > > which signals completion and the timeout handler which signals
> > > > > completion with an error.    
> > > > 
> > > > From a pure rust standpoint, and assuming both path (IRQ handler and
> > > > timeout handler) are written in rust, the compiler won't let you
> > > > signal concurrently if we design the thing properly, that's what
> > > > I'm trying to say. Just to be clear, it doesn't mean you can't have
> > > > one worker (in a workqueue context) that can signal a fence and an
> > > > IRQ handler that can signal the same fence. It just means that rust
> > > > won't let you do that unless you have proper locking in place, and
> > > > rust will also guarantee you won't be able to signal a fence that
> > > > has already been signaled, because as soon as it's signaled, the
> > > > signal-able fence should be consumed.    
> > > 
> > > Ah got it! I've worked a lot with OCaml in the past which has some
> > > similarities, but doesn't push things that far.
> > >   
> > > > > 
> > > > > We have documented that this handling is mandatory for DMA-fences
> > > > > since so many driver implementations got it wrong.    
> > > > 
> > > > Again, I'm just talking about the rust implementation we're aiming
> > > > for. If you start mixing C and rust in the same driver, you're back
> > > > to the original problem you described.    
> > > 
> > > The key point is the Rust implementation should not repeat the
> > > mistakes we made in the C implementation.
> > > 
> > > For example blocking that multiple threads can't signal a DMA-fence
> > > is completely irrelevant.  
> > 
> > From a correctness standpoint, I think it's important to ensure no more
> > than one thread gets to signal the object.  
> 
> If you have two paths that can signal a fence, that will result
> effectively in you in Rust having to use yet another lock for a fence,
> and likely some mechanism for revoking the access.
> 
> I would at least consider whether it isn't much easier to have the
> signalling-function ignore multiple signal attempts.
> 
> AFAIU in Rust we originaly ended up at signal() consuming the fence
> because of the code UAF problem with data: T.

That's true, but it does prevent the multiple _signal() call on the same
fence at the same time. Anyway, we're running in circle here. Silently
ignoring when _signal() is called on an already signaled fence is
perfectly fine.

> 
> >   
> > > 
> > > What we need to guarantee is correct timeout handling and that
> > > DMA-fence can only signal from something delivered from a HW event,
> > > e.g. a HW interrupt or interrupt worker or similar.  
> > 
> > We've mostly focused on coming up with a solution that would annotate
> > signaling paths in an automated way, and making sure dma_fence_signal()
> > is never called outside of a non-annotated path:
> > - creation of DmaFenceWorkqueue/DmaFence[Delayed]Work that guarantees
> >   all works are executed in a dma_fence_signalling_{begin,end}()
> >   section, so we can properly detect deadlocks (through lockdep)
> > - creation of a DmaFenceIrqHandler for the same reason
> > - we'll need variants for each new deferred mechanism drivers might
> >   want to use (kthread_worker?)
> > 
> > But there's currently no restriction on calling dma_fence_signal() in a
> > user thread context (IOCTL()). I guess that shouldn't be too hard to
> > add (is_user_task() to the rescue).
> >   
> > > 
> > > A DMA-fence should *never* signal because of an IOCTL  
> > 
> > Okay, that's understandable.
> >   
> > > or because some
> > > object runs out of scope. E.g. when you cleanup a HW ring buffer, FW
> > > queue, etc...  
> > 
> > We were actually going in the opposite direction:
> > auto-signal(ECANCELED) on DriverFenceTimeline object destruction (which
> > is the thing that would be attached to the HW ringbuf. The reason is:
> > we don't want to leave unsignalled fences behind,
> >   
> 
> Not only do we not "want to", we actually *cannot*. We have to make
> sure all fences are signaled because only this way the C backend plus
> RCU can protect also the Rust code against UAF.

There's a bit of that, yes. But also, even if we were assuming the
dma_fence_ops and all data associated to the fence which are needed
for any of the fence ops to function correctly were staying around,
ensuring all DriverFences are signaled before being dropped is key if
we want to ensure deadlock-free implementations, otherwise you'll block
observers on a fence that will never be signaled.

> 
> >  and if the HW ring is
> > gone, there's nothing that can signal it. Mind explaining why you think
> > this shouldn't be done, because I originally interpreted your
> > suggestion as exactly the opposite.  
> 
> I also don't get it. All fences must always get signaled, that's one of
> the most fundamental fence rules. Thus, if the last accessor to a fence
> drops, you do want to signal it with -ECANCELED

Yep, I'm pretty much on the same page here.

Re: dma_fence: force users to take the lock manually

Reply via email to