On 3/9/26 09:16, Boris Brezillon wrote:
> On Fri, 06 Mar 2026 20:02:58 +0100
> Philipp Stanner <[email protected]> wrote:
> 
>> On Fri, 2026-03-06 at 16:43 +0100, Boris Brezillon wrote:
>>> On Fri, 6 Mar 2026 16:25:25 +0100
>>> Boris Brezillon <[email protected]> wrote:
>>>   
>>>> On Fri, 6 Mar 2026 15:37:31 +0100
>>>> Christian König <[email protected]> wrote:
>>>>   
>>>>> On 3/6/26 14:36, Philipp Stanner wrote:    
>>>>>>>>>> In other words not signaling a fence can leave the system in a
>>>>>>>>>> deadlock state, but signaling it incorrectly usually results in
>>>>>>>>>> random data corruption.      
>>>>>>>>
>>>>>>>> Well, not signaling it results in a potential deadlock of the
>>>>>>>> whole kernel, whereas wrongly signaling it is "only" a functional
>>>>>>>> bug.      
>>>>>>>
>>>>>>> No, that results in random memory corruption, which is easily an
>>>>>>> order of magnitude worse than just a kernel deadlock.
>>>>>>>
>>>>>>> We have seen such bugs numerous times with suspend and resume on
>>>>>>> laptops in different subsystems, not only GPU. And because of that
>>>>>>> experience I'm absolutely clearly rejecting any attempt to signal
>>>>>>> DMA fences when an object runs out of scope.
>>>>>>
>>>>>> But you're aware that both in C and Rust you could experience UAF
>>>>>> bugs if fences drop unsignaled and the driver unloads?
>>>>>>
>>>>>> Though I tentatively agree that memory corruptions on a large scale
>>>>>> are probably the worst error you can have on a computer.      
>>>>>
>>>>> Yeah, of course I'm aware of the UAF issue we have.
>>>>>
>>>>> But those are relatively harmless compared to the random memory
>>>>> corruption issues.
>>>>>
>>>>> Linux has the unfortunate habit of re-using memory directly after
>>>>> freeing it because that means caches are usually hotter.
>>>>>
>>>>> So rogue DMA operations have a tendency to end up anywhere, and
>>>>> tools like KASAN can't find anything wrong.
>>>>>
>>>>> The only protection you sometimes have is the IOMMU, but even that
>>>>> is usually not able to catch everything.
>>>>>     
>>>>>>>      
>>>>>>>>> It all stands and falls with the question whether a fence can
>>>>>>>>> drop by accident in Rust, or if it will only ever drop when the
>>>>>>>>> hw-ring is closed.
>>>>>>>>>
>>>>>>>>> What do you believe is the right thing to do when a driver
>>>>>>>>> unloads?      
>>>>>>>>
>>>>>>>> The fence has to be signaled -- ideally after shutting down all
>>>>>>>> queues, but it has to be signaled.      
>>>>>>>
>>>>>>> Yeah, well, this shutting down of all queues (and making sure that
>>>>>>> no write operation is pending in caches, etc.) is what people
>>>>>>> usually don't get right.
>>>>>>>
>>>>>>> What you can do is things like setting your timeout to zero,
>>>>>>> immediately terminating the HW operation and resetting the
>>>>>>> device.
>>>>>>>
>>>>>>> This will then use the same code path as the mandatory timeout
>>>>>>> functionality for DMA operations, and that usually works reliably.
>>>>>>
>>>>>> Why is all that even an issue? The driver controls the hardware and
>>>>>> must "switch it off" before it unloads. Then the hardware will not
>>>>>> access any memory anymore for sure.      
>>>>>
>>>>> Well, exactly that is usually really complicated. Let me try to
>>>>> explain it with a HW example.
>>>>>
>>>>> Between a shader and the actual system memory you usually have
>>>>> different IP blocks or stages where a memory access needs to go
>>>>> through:
>>>>>
>>>>> Shader -> device VM -> device cache -> PCI bus -> CPU cache -> memory
>>>>>
>>>>> Now when you want to terminate some shader or make some memory
>>>>> inaccessible because it has been freed, drivers update their page
>>>>> tables and issue the equivalent of a TLB invalidation on a CPU.
>>>>>
>>>>> The problem is now that this will only invalidate the translation in
>>>>> the device VM. It doesn't affect the device cache nor any ongoing
>>>>> memory transaction on the bus which waits to snoop the CPU cache.
>>>>>
>>>>> To make sure that you don't corrupt system memory you actually need
>>>>> to wait for a cache flush event to be signaled, and *not* just update
>>>>> the VM page tables and tell the HW to invalidate its TLB.
>>>>>
>>>>> So what is needed is usually a fence operation: in other words, a
>>>>> memory value written over the PCIe bus into system memory. The
>>>>> background is that PCIe memory writes are ordered, so this one
>>>>> arrives after all previous memory writes of the device and is
>>>>> therefore in the correct order.
>>>>>
>>>>> Only when the CPU sees this memory write can you be sure that your
>>>>> operation has completed.
>>>>>
>>>>> This memory write is then often implemented using an MSI interrupt,
>>>>> which in turn signals the DMA fence.
>>>>>
>>>>>
>>>>> So the right thing to do is to wait for the DMA fence to signal
>>>>> through its normal signaling path which includes both HW and SW
>>>>> functionality and *not* just tell the HW to stop some ring and then
>>>>> just assume that this is also sufficient to signal the DMA fence
>>>>> associated with the HW operation.    
>>>>
>>>> Ultimately this
>>>> "stop-HW-and-make-sure-all-outcomes-are-visible-even-for-partially-executed-jobs"
>>>> is something you'll have to do, no matter what. But whether that means
>>>> having to wait for each pending fence, I'm not so sure. What about the
>>>> case where jobs/ops further down the HW ring were never even considered
>>>> for execution, because the STOP operation prevented them from being
>>>> dequeued? I'd expect that the only event we'll get for those is
>>>> "HW queue is properly stopped now". So at this point it's a matter of
>>>> signalling everything that's left, no? I mean, that's basically what
>>>> Panthor does:
>>>>
>>>> 1. it stops the queue
>>>> 2. waits for all executing ops to land (with all the cache
>>>> maintenance, etc., you described)
>>>> 3. signals (ECANCELED) what's left (things not picked up by the HW by
>>>> the time the STOP was effective).
>>>>
>>>> It's currently done manually, but does it have to?  
>>>
>>> All this being said, I'm also a pragmatic guy, so if you tell us "no
>>> way!" even after these arguments, I'd rather give up on this
>>> auto-signaling feature and have Rust drivers be forced to manually
>>> signal stuff than have the whole Fence abstraction blocked. We can add
>>> a warn_on!(!is_signaled()) in the DriverFence::drop() path for the time
>>> being, so we can at least catch cases where the driver didn't signal
>>> the fence before dropping the signal-able object.
>>
>>
>> In past discussions with my team members we were also undecided on
>> whether we want autosignal or not.
>>
>> The only thing I'm concerned about is the RCU vs. UAF feature. We
>> already invested a lot of time and pain into that feature, so we most
>> probably want it.
> 
> Right, there's this UAF situation too. I guess if auto-signal is not
> an option, we could add an _ORPHAN_BIT (or _DEAD_BIT) flag, and have
> it tested along the _SIGNALED_BIT one in paths where we check if
> dma_fence::ops are usable.

You mean to protect driver unload? Yeah that's a pretty good point.

But I think I have an idea how to tackle that sanely even on the C side.

Give me a day to go over the rest of my mails and hack something together. I 
have another DMA-fence cleanup I wanted to run by Tvrtko and you anyway.

Regards,
Christian.
