On Fri, 2026-03-06 at 13:31 +0100, Christian König wrote:
> On 3/6/26 12:57, Philipp Stanner wrote:
> > On Fri, 2026-03-06 at 12:24 +0100, Boris Brezillon wrote:
> > > On Fri, 6 Mar 2026 12:03:19 +0100
> > > Christian König <[email protected]> wrote:
> > > 
> > > > On 3/6/26 11:37, Boris Brezillon wrote:
> > > > > On Fri, 6 Mar 2026 10:58:07 +0100
> > > > > Christian König <[email protected]> wrote:
> > > > >   
> > > > > > On 3/6/26 10:46, Boris Brezillon wrote:  
> > > > > > > On Fri, 6 Mar 2026 09:10:52 +0100
> > > > > > > Christian König <[email protected]> wrote:    
> > > > > > > > Well as I wrote above you either have super reliable locking in
> > > > > > > > your signaling path or you will need that for error handling.   
> > > > > > > >  
> > > > > > > 
> > > > > > > Not really. With rust's ownership model, you can make it so only
> > > > > > > one thread gets to own the DriverFence (the signal-able fence
> > > > > > > object), and the DriverFence::signal() method consumes this
> > > > > > > object. This implies that only one path gets to signal the
> > > > > > > DriverFence, and after that it vanishes, so no one else can
> > > > > > > signal it anymore. Just to clarify, by vanishes, I mean that the
> > > > > > > signal-able view disappears, but the observable object (Fence)
> > > > > > > can stay around, so it can be monitored (and only monitored) by
> > > > > > > others. With this model, it doesn't matter whether _set_error()
> > > > > > > is called under the dma_fence lock or not, because the
> > > > > > > concurrency is addressed at a higher level.
> > > > > > 
> > > > > > That whole approach won't work. You have at least the IRQ handler
> > > > > > which signals completion and the timeout handler which signals
> > > > > > completion with an error.  
> > > > > 
> > > > > From a pure Rust standpoint, and assuming both paths (IRQ handler and
> > > > > timeout handler) are written in Rust, the compiler won't let you
> > > > > signal concurrently if we design the thing properly; that's what
> > > > > I'm trying to say. Just to be clear, it doesn't mean you can't have
> > > > > one worker (in a workqueue context) that can signal a fence and an
> > > > > IRQ handler that can signal the same fence. It just means that rust
> > > > > won't let you do that unless you have proper locking in place, and
> > > > > rust will also guarantee you won't be able to signal a fence that
> > > > > has already been signaled, because as soon as it's signaled, the
> > > > > signal-able fence should be consumed.  
> > > > 
> > > > Ah got it! I've worked a lot with OCaml in the past which has some
> > > > similarities, but doesn't push things that far.
> > > > 
> > > > > > 
> > > > > > We have documented that this handling is mandatory for DMA-fences
> > > > > > since so many driver implementations got it wrong.  
> > > > > 
> > > > > Again, I'm just talking about the rust implementation we're aiming
> > > > > for. If you start mixing C and rust in the same driver, you're back
> > > > > to the original problem you described.  
> > > > 
> > > > The key point is that the Rust implementation should not repeat the
> > > > mistakes we made in the C implementation.
> > > > 
> > > > For example, preventing multiple threads from signaling a DMA-fence
> > > > is completely irrelevant.
> > > 
> > > From a correctness standpoint, I think it's important to ensure no more
> > > than one thread gets to signal the object.
> > 
> > If you have two paths that can signal a fence, that will effectively
> > force you, in Rust, to use yet another lock for the fence, and likely
> > some mechanism for revoking access.
> > 
> > I would at least consider whether it isn't much easier to have the
> > signalling function ignore multiple signal attempts.
> > 
> > AFAIU in Rust we originally ended up at signal() consuming the fence
> > because of the code-UAF problem with data: T.
> 
> +1
> 
> > > > 
> > > > What we need to guarantee is correct timeout handling and that
> > > > DMA-fence can only signal from something delivered from a HW event,
> > > > e.g. a HW interrupt or interrupt worker or similar.
> > > 
> > > We've mostly focused on coming up with a solution that would annotate
> > > signaling paths in an automated way, and making sure dma_fence_signal()
> > > is never called outside of an annotated path:
> > > - creation of DmaFenceWorkqueue/DmaFence[Delayed]Work that guarantees
> > >   all works are executed in a dma_fence_signalling_{begin,end}()
> > >   section, so we can properly detect deadlocks (through lockdep)
> > > - creation of a DmaFenceIrqHandler for the same reason
> > > - we'll need variants for each new deferred mechanism drivers might
> > >   want to use (kthread_worker?)
> > > 
> > > But there's currently no restriction on calling dma_fence_signal() in a
> > > user thread context (IOCTL()). I guess that shouldn't be too hard to
> > > add (is_user_task() to the rescue).
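
FWIW, wrapping every work item in a signalling section can be modeled with
a plain RAII guard. The sketch below is only a userspace toy to illustrate
the idea: SignallingSection and DmaFenceWorkqueue are made-up names, and a
thread-local counter stands in for lockdep's tracking (the real C API is
dma_fence_begin_signalling()/dma_fence_end_signalling()):

```rust
use std::cell::Cell;

thread_local! {
    // Toy stand-in for lockdep's tracking of fence-signalling sections.
    static SIGNALLING_DEPTH: Cell<u32> = Cell::new(0);
}

/// RAII guard modeling a fence-signalling critical section: entering
/// increments the per-thread depth, and dropping the guard decrements it.
pub struct SignallingSection;

impl SignallingSection {
    pub fn enter() -> SignallingSection {
        SIGNALLING_DEPTH.with(|d| d.set(d.get() + 1));
        SignallingSection
    }
}

impl Drop for SignallingSection {
    fn drop(&mut self) {
        SIGNALLING_DEPTH.with(|d| d.set(d.get() - 1));
    }
}

pub fn in_signalling_section() -> bool {
    SIGNALLING_DEPTH.with(|d| d.get() > 0)
}

/// Toy DmaFenceWorkqueue: every work item is automatically wrapped in a
/// signalling section, so the work's author cannot forget the annotation.
pub struct DmaFenceWorkqueue;

impl DmaFenceWorkqueue {
    pub fn run<F: FnOnce()>(&self, work: F) {
        let _section = SignallingSection::enter();
        work();
        // _section dropped here: the section ends when the work returns.
    }
}
```

The point being that the section is entered and left by the workqueue
wrapper itself, not by the individual work items.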
> > > 
> > > > 
> > > > A DMA-fence should *never* signal because of an IOCTL
> > > 
> > > Okay, that's understandable.
> > > 
> > > > or because some
> > > > object runs out of scope. E.g. when you cleanup a HW ring buffer, FW
> > > > queue, etc...
> > > 
> > > We were actually going in the opposite direction:
> > > auto-signal(ECANCELED) on DriverFenceTimeline object destruction
> 
> Absolutely clear NAK to that; we have iterated over that many times before
> on the C side as well.
> 
> See below for the explanation of the background.
> 
> > > (which
> > > is the thing that would be attached to the HW ringbuf). The reason is:
> > > we don't want to leave unsignalled fences behind,
> > > 
> > 
> > Not only do we not "want to", we actually *cannot*. We have to make
> > sure all fences are signaled, because only this way can the C backend
> > plus RCU also protect the Rust code against UAF.
> > 
> > >  and if the HW ring is
> > > gone, there's nothing that can signal it. Mind explaining why you think
> > > this shouldn't be done? I originally interpreted your
> > > suggestion as exactly the opposite.
> > 
> > I also don't get it. All fences must always get signaled, that's one of
> > the most fundamental fence rules. Thus, if the last accessor to a fence
> > drops, you do want to signal it with -ECANCELED
> 
> All fences must always signal because the HW operation must always complete 
> or be terminated by a timeout.
> 
> If a fence signals only because it runs out of scope, then that means you 
> have a huge potential for data corruption, and that is even worse than not 
> signaling a fence.
> 
> In other words, not signaling a fence can leave the system in a deadlock 
> state, but signaling it incorrectly usually results in random data corruption.

It all stands and falls with the question of whether a fence can be
dropped by accident in Rust, or whether it will only ever be dropped when
the hw-ring is closed.

What do you believe is the right thing to do when a driver unloads?

Ideally we could design it in a way that the driver closes its rings,
the pending fences drop and get signaled with ECANCELED.
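
As a rough sketch of how that could look (a userspace toy with hypothetical
names, Fence and DriverFence, not the actual kernel abstraction): the
signal-able view is consumed by signal(), and its Drop impl signals
-ECANCELED if, and only if, the fence is still pending:

```rust
use std::sync::atomic::{AtomicI32, Ordering};
use std::sync::Arc;

const ECANCELED: i32 = 125; // Linux errno value

/// Observable side of the fence: can only be inspected, never signaled.
/// State: 0 = pending, 1 = signaled successfully, negative = -errno.
pub struct Fence {
    state: AtomicI32,
}

impl Fence {
    pub fn is_signaled(&self) -> bool {
        self.state.load(Ordering::Acquire) != 0
    }

    /// Some(errno) if the fence was signaled with an error.
    pub fn error(&self) -> Option<i32> {
        match self.state.load(Ordering::Acquire) {
            s if s < 0 => Some(-s),
            _ => None,
        }
    }
}

/// Signal-able view. It is not Clone, so exactly one owner exists at any
/// time, and signal() takes `self` by value: after signaling, the view is
/// gone and nothing can signal the fence a second time.
pub struct DriverFence {
    fence: Arc<Fence>,
}

impl DriverFence {
    pub fn new() -> (DriverFence, Arc<Fence>) {
        let fence = Arc::new(Fence { state: AtomicI32::new(0) });
        (DriverFence { fence: Arc::clone(&fence) }, fence)
    }

    /// Consume the signal-able view; Ok = success, Err(errno) = error.
    pub fn signal(self, result: Result<(), i32>) {
        let state = match result {
            Ok(()) => 1,
            Err(errno) => -errno,
        };
        self.fence.state.store(state, Ordering::Release);
        // Drop runs next, but its compare_exchange sees a non-zero
        // state and leaves it alone.
    }
}

/// If the ring (and with it the signal-able view) goes away before the
/// fence was signaled, e.g. on driver unload, signal -ECANCELED so no
/// waiter is left hanging.
impl Drop for DriverFence {
    fn drop(&mut self) {
        // Only auto-cancel if the fence wasn't signaled explicitly.
        let _ = self.fence.state.compare_exchange(
            0, -ECANCELED, Ordering::AcqRel, Ordering::Acquire);
    }
}
```

With this shape a fence can never signal twice: signal() takes the view by
value, and the Drop fallback only fires while the state is still pending.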

Your concern seems to be a driver accidentally dropping a fence while the
hardware is still processing the associated job.

(how's that dangerous, though? Shouldn't parties waiting for the fence
detect the error? ECANCELED ⇒ you must not access the associated
memory)


P.


> 
> That said, we could potentially make dma_fence_release() more resilient to 
> ref-counting issues.
> 
> Regards,
> Christian.
> 
> > 
> > 
> > P.
> 
