On 3/6/26 12:57, Philipp Stanner wrote:
> On Fri, 2026-03-06 at 12:24 +0100, Boris Brezillon wrote:
>> On Fri, 6 Mar 2026 12:03:19 +0100
>> Christian König <[email protected]> wrote:
>>
>>> On 3/6/26 11:37, Boris Brezillon wrote:
>>>> On Fri, 6 Mar 2026 10:58:07 +0100
>>>> Christian König <[email protected]> wrote:
>>>>
>>>>> On 3/6/26 10:46, Boris Brezillon wrote:
>>>>>> On Fri, 6 Mar 2026 09:10:52 +0100
>>>>>> Christian König <[email protected]> wrote:
>>>>>>> Well, as I wrote above, you either have super reliable locking in
>>>>>>> your signaling path, or you will need it for error handling.
>>>>>>
>>>>>> Not really. With rust's ownership model, you can make it so only
>>>>>> one thread gets to own the DriverFence (the signal-able fence
>>>>>> object), and the DriverFence::signal() method consumes this
>>>>>> object. This implies that only one path gets to signal the
>>>>>> DriverFence, and after that it vanishes, so no one else can
>>>>>> signal it anymore. Just to clarify, by vanishes, I mean that the
>>>>>> signal-able view disappears, but the observable object (Fence)
>>>>>> can stay around, so it can be monitored (and only monitored) by
>>>>>> others. With this model, it doesn't matter whether _set_error() is
>>>>>> called under the dma_fence lock or not, because the
>>>>>> concurrency is addressed at a higher level.
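For the archive, a minimal userspace sketch of that consume-on-signal model; all names here are hypothetical and plain std types stand in for the kernel primitives:

```rust
use std::sync::{
    atomic::{AtomicBool, Ordering},
    Arc,
};

// Observable side: can be cloned and polled, but never signaled.
#[derive(Clone)]
struct Fence(Arc<AtomicBool>);

impl Fence {
    fn is_signaled(&self) -> bool {
        self.0.load(Ordering::Acquire)
    }
}

// Signal-able side: deliberately not Clone, so exactly one owner exists.
struct DriverFence(Arc<AtomicBool>);

impl DriverFence {
    fn new() -> (DriverFence, Fence) {
        let state = Arc::new(AtomicBool::new(false));
        (DriverFence(state.clone()), Fence(state))
    }

    // Taking `self` by value consumes the signal-able view: after this
    // call, the compiler rejects any further signal attempt as a use of
    // a moved value.
    fn signal(self) {
        self.0.store(true, Ordering::Release);
    }
}
```

Once signal() has run, the observable Fence clones can still be monitored, but no code path can signal again.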
>>>>>
>>>>> That whole approach won't work. You have at least the IRQ handler
>>>>> which signals completion and the timeout handler which signals
>>>>> completion with an error.
>>>>
>>>> From a pure Rust standpoint, and assuming both paths (IRQ handler and
>>>> timeout handler) are written in Rust, the compiler won't let you
>>>> signal concurrently if we design the thing properly, that's what
>>>> I'm trying to say. Just to be clear, it doesn't mean you can't have
>>>> one worker (in a workqueue context) that can signal a fence and an
>>>> IRQ handler that can signal the same fence. It just means that Rust
>>>> won't let you do that unless you have proper locking in place, and
>>>> Rust will also guarantee you won't be able to signal a fence that
>>>> has already been signaled, because as soon as it's signaled, the
>>>> signal-able fence should be consumed.
>>>
>>> Ah got it! I've worked a lot with OCaml in the past which has some
>>> similarities, but doesn't push things that far.
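The two-paths case (timeout worker plus IRQ handler, both potentially signaling) then needs exactly the "proper locking" mentioned above: for instance, a shared slot from which whichever path runs first can take() the consumable fence. A hypothetical sketch with std types:

```rust
use std::sync::{Arc, Mutex};

// Hypothetical slot shared between a timeout worker and an IRQ handler.
// Whichever path runs first takes the fence out of the Option and
// signals it; the other path finds None and does nothing. The Mutex is
// the locking Rust forces you to have before sharing the fence at all.
struct SignalSlot<T> {
    slot: Mutex<Option<T>>,
}

impl<T> SignalSlot<T> {
    fn new(fence: T) -> Arc<Self> {
        Arc::new(SignalSlot {
            slot: Mutex::new(Some(fence)),
        })
    }

    // Hands out the fence exactly once across all callers.
    fn take(&self) -> Option<T> {
        self.slot.lock().unwrap().take()
    }
}
```

The Option::take() makes double signaling structurally impossible: the second caller simply has nothing left to signal.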
>>>
>>>>>
>>>>> We have documented that this handling is mandatory for DMA-fences
>>>>> since so many driver implementations got it wrong.
>>>>
>>>> Again, I'm just talking about the rust implementation we're aiming
>>>> for. If you start mixing C and rust in the same driver, you're back
>>>> to the original problem you described.
>>>
>>> The key point is the Rust implementation should not repeat the
>>> mistakes we made in the C implementation.
>>>
>>> For example, preventing multiple threads from signaling a DMA-fence
>>> is completely irrelevant.
>>
>> From a correctness standpoint, I think it's important to ensure no more
>> than one thread gets to signal the object.
>
> If you have two paths that can signal a fence, that effectively means
> that in Rust you have to use yet another lock for the fence, and
> likely some mechanism for revoking access.
>
> I would at least consider whether it isn't much easier to have the
> signalling-function ignore multiple signal attempts.
>
> AFAIU in Rust we originally ended up at signal() consuming the fence
> because of the UAF problem with data: T.
+1
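For comparison, an idempotent signal() that ignores repeated attempts, as Philipp describes, could hypothetically look like this (userspace sketch, std types only):

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Hypothetical idempotent variant: signal() takes &self, so multiple
// paths may call it concurrently; compare_exchange guarantees only the
// first call actually performs the transition and reports success.
struct IdempotentFence {
    signaled: AtomicBool,
}

impl IdempotentFence {
    fn new() -> Self {
        IdempotentFence {
            signaled: AtomicBool::new(false),
        }
    }

    // Returns true only for the call that actually signaled the fence;
    // later attempts are silently ignored and return false.
    fn signal(&self) -> bool {
        self.signaled
            .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
    }

    fn is_signaled(&self) -> bool {
        self.signaled.load(Ordering::Acquire)
    }
}
```

The trade-off versus the consuming model: double signaling becomes a silent runtime no-op instead of a compile error, but no extra lock or revocation mechanism is needed.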
>>>
>>> What we need to guarantee is correct timeout handling, and that a
>>> DMA-fence can only signal in response to a HW event,
>>> e.g. a HW interrupt, an interrupt worker or similar.
>>
>> We've mostly focused on coming up with a solution that would annotate
>> signaling paths in an automated way, and making sure dma_fence_signal()
>> is never called outside of an annotated path:
>> - creation of DmaFenceWorkqueue/DmaFence[Delayed]Work that guarantees
>> all works are executed in a dma_fence_signalling_{begin,end}()
>> section, so we can properly detect deadlocks (through lockdep)
>> - creation of a DmaFenceIrqHandler for the same reason
>> - we'll need variants for each new deferred mechanism drivers might
>> want to use (kthread_worker?)
>>
>> But there's currently no restriction on calling dma_fence_signal() in a
>> user thread context (IOCTL()). I guess that shouldn't be too hard to
>> add (is_user_task() to the rescue).
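A hypothetical shape for such a workqueue wrapper, with an RAII guard and a plain counter standing in for the real dma_fence_signalling_{begin,end}() lockdep annotation (all names invented for illustration):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Stand-in for the lockdep annotation; in the kernel the guard would
// call dma_fence_signalling_begin()/end() instead of counting depth.
static DEPTH: AtomicUsize = AtomicUsize::new(0);

struct SignallingSection;

impl SignallingSection {
    fn enter() -> Self {
        DEPTH.fetch_add(1, Ordering::SeqCst);
        SignallingSection
    }
}

impl Drop for SignallingSection {
    fn drop(&mut self) {
        DEPTH.fetch_sub(1, Ordering::SeqCst);
    }
}

// Hypothetical shape of how a DmaFenceWorkqueue could run work items:
// every item is forced through the annotated section, so a driver
// cannot forget the annotation on its signaling path.
fn run_work<R>(work: impl FnOnce() -> R) -> R {
    let _guard = SignallingSection::enter();
    work()
}
```

Because the section is entered by the wrapper rather than by driver code, lockdep coverage of signaling paths comes for free with the abstraction.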
>>
>>>
>>> A DMA-fence should *never* signal because of an IOCTL
>>
>> Okay, that's understandable.
>>
>>> or because some
>>> object goes out of scope, e.g. when you clean up a HW ring buffer, FW
>>> queue, etc.
>>
>> We were actually going in the opposite direction:
>> auto-signal(ECANCELED) on DriverFenceTimeline object destruction
Absolutely clear NAK to that; we have gone over this many times before on the C
side as well.
See below for the explanation of the background.
>> (which
>> is the thing that would be attached to the HW ringbuf). The reason is:
>> we don't want to leave unsignalled fences behind,
>>
>
> Not only do we not "want to", we actually *cannot*. We have to make
> sure all fences are signaled because only this way the C backend plus
> RCU can protect also the Rust code against UAF.
>
>> and if the HW ring is
>> gone, there's nothing that can signal it. Mind explaining why you think
>> this shouldn't be done? I originally interpreted your
>> suggestion as exactly the opposite.
>
> I also don't get it. All fences must always get signaled, that's one of
> the most fundamental fence rules. Thus, if the last accessor to a fence
> drops, you do want to signal it with -ECANCELED.
All fences must always signal because the HW operation must always complete or
be terminated by a timeout.
If a fence signals only because it runs out of scope, then that means you have
a huge potential for data corruption, and that is even worse than not
signaling a fence.
In other words, not signaling a fence can leave the system in a deadlock
state, but signaling it incorrectly usually results in random data corruption.
That said, we could potentially make dma_fence_release() more resilient to
ref-counting issues.
Regards,
Christian.
>
>
> P.