On 3/6/26 15:55, Boris Brezillon wrote:
> On Fri, 6 Mar 2026 13:54:15 +0100
> Christian König <[email protected]> wrote:
> 
>> On 3/6/26 13:36, Philipp Stanner wrote:
>>>>>> (which
>>>>>> is the thing that would be attached to the HW ringbuf. The
>>>>>> reason is: we don't want to leave unsignalled fences behind,
>>>>>>  
>>>>>
>>>>> Not only do we not "want to", we actually *cannot*. We have to
>>>>> make sure all fences are signaled because only this way the C
>>>>> backend plus RCU can protect also the Rust code against UAF.
>>>>>  
>>>>>>  and if the HW ring is
>>>>>> gone, there's nothing that can signal it. Mind explaining why
>>>>>> you think this shouldn't be done, because I originally
>>>>>> interpreted your suggestion as exactly the opposite.  
>>>>>
>>>>> I also don't get it. All fences must always get signaled, that's
>>>>> one of the most fundamental fence rules. Thus, if the last
>>>>> accessor to a fence drops, you do want to signal it with
>>>>> -ECANCELED  
>>>>
>>>> All fences must always signal because the HW operation must always
>>>> complete or be terminated by a timeout.
>>>>
>>>> If a fence signals only because it runs out of scope then that
>>>> means that you have a huge potential for data corruption and that
>>>> is even worse than not signaling a fence.
>>>>
>>>> In other words not signaling a fence can leave the system in a
>>>> deadlock state, but signaling it incorrectly usually results in
>>>> random data corruption.  
>>>
>>> It all stands and falls with the question whether a fence can drop
>>> by accident in Rust, or if it will only ever drop when the hw-ring
>>> is closed.
>>>
>>> What do you believe is the right thing to do when a driver unloads?
>>>  
>>
>> Do a dma_fence_wait() to make sure that all HW operations have
>> completed and all fences signaled.
>>
>>> Ideally we could design it in a way that the driver closes its
>>> rings, the pending fences drop and get signaled with ECANCELED.  
>>
>> No, exactly that is a really bad idea.
>>
>> Just do it the other way around, use the dma_fence to wait for the HW
>> operation to be completed.
> 
> But in practice you don't just wait for the HW to finish most of the
> time. You instruct the HW to stop processing stuff, and then wait for it
> to acknowledge that it indeed stopped.

And how does the HW acknowledge that it has indeed stopped? Maybe by raising 
an interrupt which signals a DMA fence?

The point here is that every acknowledgement from the HW that a DMA operation 
was indeed stopped, regardless of whether it's the "operation completed normally" 
case or the "I have aborted" case, *must* always take the same HW and SW path.

It is *not* sufficient to do something like busy-waiting for a bit in a 
register to flip in the abort path while relying on a DMA memory write in the 
normal completion path.

That's why an MMU/VM inside a device is usually not sufficient to prevent freed 
memory from being written to. You need an IOMMU for that, i.e. one sitting close 
to the CPU/memory, with no caches behind it on the HW path.

I should probably write that down in some documentation.

Regards,
Christian.

> And the HWRing object will only
> be dropped when that happens. There's of course a fallback for the case
> where the STOP operation fails (with reset, etc), which would just
> delay the drop of the HWRing. But the point is, when the HWRing is
> dropped, you should be guaranteed that the HW is no longer executing
> any of the possibly pending jobs. Now, of course you can decide that
> it's the driver responsibility to walk the list of jobs that were
> pending after a STOP has been acked and manually signal all fences, or
> you can assume that the HWRing being dropped is what provides this
> guarantee. And because the HWRing embeds your DmaFenceCtx, which
> auto-signals on drop, you don't have to do anything.
> 
>>
>> Then wait for an RCU grace period to make sure that nobody is still
>> inside your DMA fence ops.
>>
>> And then you can continue with unloading the module.
>>
>>> Your concern seems to be a driver by accident dropping a fence while
>>> the hardware is still processing the associated job.
>>>
>>> (how's that dangerous, though? Shouldn't parties waiting for the
>>> fence detect the error? ECANCELED ⇒ you must not access the
>>> associated memory)  
>>
>> The dma_fence is the SW object which represents the HW operation.
>>
>> And that HW operation is doing DMA, e.g. accessing and potentially
>> writing into memory. That's where the name Direct Memory Access comes
>> from.
>>
>> So when that is messed up the memory which gets written to is
>> potentially re-used with the absolutely dire consequences we have
>> seen so many times.
> 
> But then, I'd argue that the HWRing (and the underlying FenceCtx
> keeping track of emitted fences on this ring) should live at least as
> long as the HW is assumed to be running commands. That's IMHO the
> guarantee we need to provide: don't drop your HWRing object until
> you're sure the HW has stopped pulling stuff from there. I can think
> of the following two cases:
> 
> 1. The HW has a way to prematurely stop stuff that is currently
> executing. First thing we do is ensure the HW can't pull anything new,
> then we issue a STOP and wait for an ACK. When this ACK is received, we
> proceed with the rest of the cleanup. If the ACK doesn't come in time,
> we flag the HWRing unusable, schedule a reset, and wait for this reset
> to be effective before dropping the HWRing.
> 
> 2. The HW can't stop what it started doing. We make sure nothing new
> can be pushed to the HWRing, we wait for the in-flight ops to land. If
> it's taking too long, the timeout handler will take over, schedule a
> reset, and drop the faulty HWRing only after the reset is effective.
> 
>>
>> Keep in mind that this framework is not only used by GPUs, where at
>> least modern ones have VM protection, but also by old ones and stuff
>> like V4L where such things are just not present in any way.
> 
> I'm not questioning the fact that signaling fences prematurely can lead
> to terrible security holes which are worse than deadlocks, I'm questioning
> the fact that this "dont-signal-before-HW-is-done" guarantee needs to
> happen at the fence level, rather than at the fence emitter level.
