On Fri, 2026-03-06 at 13:31 +0100, Christian König wrote:
> On 3/6/26 12:57, Philipp Stanner wrote:
> > On Fri, 2026-03-06 at 12:24 +0100, Boris Brezillon wrote:
> > > On Fri, 6 Mar 2026 12:03:19 +0100
> > > Christian König <[email protected]> wrote:
> > > 
> > > > On 3/6/26 11:37, Boris Brezillon wrote:
> > > > > On Fri, 6 Mar 2026 10:58:07 +0100
> > > > > Christian König <[email protected]> wrote:
> > > > > 
> > > > > > On 3/6/26 10:46, Boris Brezillon wrote:
> > > > > > > On Fri, 6 Mar 2026 09:10:52 +0100
> > > > > > > Christian König <[email protected]> wrote:
> > > > > > > 
> > > > > > > > Well, as I wrote above, you either have super reliable
> > > > > > > > locking in your signaling path or you will need that for
> > > > > > > > error handling.
> > > > > > > 
> > > > > > > Not really. With Rust's ownership model, you can make it so
> > > > > > > only one thread gets to own the DriverFence (the signal-able
> > > > > > > fence object), and the DriverFence::signal() method consumes
> > > > > > > this object. This implies that only one path gets to signal
> > > > > > > the DriverFence, and after that it vanishes, so no one else
> > > > > > > can signal it anymore. Just to clarify: by "vanishes" I mean
> > > > > > > that the signal-able view disappears, but the observable
> > > > > > > object (Fence) can stay around, so it can be monitored (and
> > > > > > > only monitored) by others. With this model, it doesn't
> > > > > > > matter whether _set_error() is called under a dma_fence
> > > > > > > locked section or not, because the concurrency is addressed
> > > > > > > at a higher level.
> > > > > > 
> > > > > > That whole approach won't work. You have at least the IRQ
> > > > > > handler, which signals completion, and the timeout handler,
> > > > > > which signals completion with an error.
> > > > > 
> > > > > From a pure Rust standpoint, and assuming both paths (IRQ
> > > > > handler and timeout handler) are written in Rust, the compiler
> > > > > won't let you signal concurrently if we design the thing
> > > > > properly; that's what I'm trying to say. Just to be clear, it
> > > > > doesn't mean you can't have one worker (in a workqueue context)
> > > > > that can signal a fence and an IRQ handler that can signal the
> > > > > same fence. It just means that Rust won't let you do that
> > > > > unless you have proper locking in place, and Rust will also
> > > > > guarantee you won't be able to signal a fence that has already
> > > > > been signaled, because as soon as it's signaled, the
> > > > > signal-able fence should be consumed.
> > > > 
> > > > Ah, got it! I've worked a lot with OCaml in the past, which has
> > > > some similarities, but doesn't push things that far.
> > > > 
> > > > > > We have documented that this handling is mandatory for
> > > > > > DMA-fences since so many driver implementations got it wrong.
> > > > > 
> > > > > Again, I'm just talking about the Rust implementation we're
> > > > > aiming for. If you start mixing C and Rust in the same driver,
> > > > > you're back to the original problem you described.
> > > > 
> > > > The key point is that the Rust implementation should not repeat
> > > > the mistakes we made in the C implementation.
> > > > 
> > > > For example, preventing multiple threads from signaling a
> > > > DMA-fence is completely irrelevant.
> > > 
> > > From a correctness standpoint, I think it's important to ensure no
> > > more than one thread gets to signal the object.
> > 
> > If you have two paths that can signal a fence, that will effectively
> > force you, in Rust, to use yet another lock for the fence, and
> > likely some mechanism for revoking the access.
> > 
> > I would at least consider whether it isn't much easier to have the
> > signaling function ignore multiple signal attempts.
> > 
> > AFAIU, in Rust we originally ended up at signal() consuming the
> > fence because of the code UAF problem with data: T.
> 
> +1
> 
> > > > 
> > > > What we need to guarantee is correct timeout handling, and that
> > > > a DMA-fence can only signal from something delivered by a HW
> > > > event, e.g. a HW interrupt, an interrupt worker, or similar.
> > > 
> > > We've mostly focused on coming up with a solution that would
> > > annotate signaling paths in an automated way, and on making sure
> > > dma_fence_signal() is never called outside of an annotated path:
> > > - creation of DmaFenceWorkqueue/DmaFence[Delayed]Work, which
> > >   guarantees all works are executed in a
> > >   dma_fence_signalling_{begin,end}() section, so we can properly
> > >   detect deadlocks (through lockdep)
> > > - creation of a DmaFenceIrqHandler for the same reason
> > > - we'll need variants for each new deferred mechanism drivers
> > >   might want to use (kthread_worker?)
> > > 
> > > But there's currently no restriction on calling dma_fence_signal()
> > > in a user thread context (IOCTL). I guess that shouldn't be too
> > > hard to add (is_user_task() to the rescue).
> > > 
> > > > A DMA-fence should *never* signal because of an IOCTL
> > > 
> > > Okay, that's understandable.
> > > 
> > > > or because some object runs out of scope, e.g. when you clean up
> > > > a HW ring buffer, FW queue, etc.
> > > 
> > > We were actually going in the opposite direction:
> > > auto-signal(ECANCELED) on DriverFenceTimeline object destruction
> 
> Absolutely clear NAK to that; we have iterated on that many times
> before on the C side as well.
> 
> See below for the explanation of the background.
> 
> > > (which is the thing that would be attached to the HW ringbuf).
> > > The reason is: we don't want to leave unsignalled fences behind,
> > 
> > Not only do we not "want to", we actually *cannot*.
> > We have to make sure all fences are signaled, because only this way
> > can the C backend plus RCU protect the Rust code against UAF as
> > well.
> > 
> > > and if the HW ring is gone, there's nothing that can signal it.
> > > Mind explaining why you think this shouldn't be done? I
> > > originally interpreted your suggestion as exactly the opposite.
> > 
> > I also don't get it. All fences must always get signaled; that's
> > one of the most fundamental fence rules. Thus, if the last accessor
> > to a fence drops, you do want to signal it with -ECANCELED.
> 
> All fences must always signal because the HW operation must always
> complete or be terminated by a timeout.
> 
> If a fence signals only because it runs out of scope, then that means
> you have a huge potential for data corruption, and that is even worse
> than not signaling a fence.
> 
> In other words, not signaling a fence can leave the system in a
> deadlock state, but signaling it incorrectly usually results in
> random data corruption.
It all stands and falls with the question whether a fence can drop by
accident in Rust, or whether it will only ever drop when the HW ring
is closed.

What do you believe is the right thing to do when a driver unloads?
Ideally, we could design it in a way that the driver closes its rings,
and the pending fences drop and get signaled with ECANCELED.

Your concern seems to be a driver accidentally dropping a fence while
the hardware is still processing the associated job. (How is that
dangerous, though? Shouldn't the parties waiting for the fence detect
the error? ECANCELED ⇒ you must not access the associated memory.)

P.

> Saying that, we could potentially make dma_fence_release() more
> resilient to ref-counting issues.
> 
> Regards,
> Christian.
> 
> > P.
