On 3/6/26 13:36, Philipp Stanner wrote:
>>>> (which
>>>> is the thing that would be attached to the HW ringbuf. The reason is:
>>>> we don't want to leave unsignalled fences behind,
>>>>
>>>
>>> Not only do we not "want to", we actually *cannot*. We have to make
>>> sure all fences are signaled, because only then can the C backend plus
>>> RCU also protect the Rust code against UAF.
>>>
>>>>  and if the HW ring is
>>>> gone, there's nothing that can signal it. Mind explaining why you think
>>>> this shouldn't be done, because I originally interpreted your
>>>> suggestion as exactly the opposite.
>>>
>>> I also don't get it. All fences must always get signaled, that's one of
>>> the most fundamental fence rules. Thus, if the last accessor to a fence
>>> drops, you do want to signal it with -ECANCELED.
>>
>> All fences must always signal because the HW operation must always complete 
>> or be terminated by a timeout.
>>
>> If a fence signals only because it goes out of scope, then there is a
>> huge potential for data corruption, and that is even worse than not
>> signaling a fence.
>>
>> In other words, not signaling a fence can leave the system in a deadlock
>> state, but signaling it incorrectly usually results in random data
>> corruption.
> 
> It all hinges on whether a fence can be dropped by accident in Rust,
> or whether it will only ever be dropped when the HW ring is closed.
> 
> What do you believe is the right thing to do when a driver unloads?

Do a dma_fence_wait() to make sure that all HW operations have completed and 
all fences signaled.

> Ideally we could design it in a way that the driver closes its rings,
> the pending fences drop and get signaled with ECANCELED.

No, that is exactly what would be a really bad idea.

Just do it the other way around: use the dma_fence to wait for the HW operation
to complete.

Then wait for an RCU grace period to make sure that nobody is still inside your 
DMA fence ops.

And then you can continue with unloading the module.
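To make that ordering concrete, here is a rough userspace model of it in Rust. This is not kernel code: the `Fence` type, its `signal()`/`wait()` methods and `unload_driver()` are hypothetical stand-ins, a std `Mutex`/`Condvar` pair plays the role of a dma_fence, and joining the thread that signals the fence is only a crude substitute for the RCU grace period (in the kernel that would be something like synchronize_rcu()). The point is just the order: wait for completion, make sure nobody is still running fence code, then tear down.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Hypothetical userspace stand-in for a dma_fence: `None` while pending,
/// `Some(status)` once signaled (0 on success, a negative errno on error).
struct Fence {
    status: Mutex<Option<i32>>,
    cond: Condvar,
}

impl Fence {
    fn new() -> Arc<Self> {
        Arc::new(Fence { status: Mutex::new(None), cond: Condvar::new() })
    }

    /// Called by the "hardware" side exactly once, when the job has completed.
    fn signal(&self, status: i32) {
        let mut s = self.status.lock().unwrap();
        assert!(s.is_none(), "fence signaled twice");
        *s = Some(status);
        self.cond.notify_all();
    }

    /// Blocks until the fence is signaled (the role dma_fence_wait() plays).
    fn wait(&self) -> i32 {
        let mut s = self.status.lock().unwrap();
        while s.is_none() {
            s = self.cond.wait(s).unwrap();
        }
        (*s).expect("loop only exits once the fence is signaled")
    }
}

/// Models the unload order described above; returns the last fence status.
fn unload_driver() -> i32 {
    let fence = Fence::new();
    let hw_fence = Arc::clone(&fence);

    // Stand-in for the hardware/IRQ path that signals on completion.
    let hw = thread::spawn(move || {
        // ... the device would perform its DMA here ...
        hw_fence.signal(0);
    });

    // 1. Wait until the HW operation has completed and the fence signaled.
    let status = fence.wait();

    // 2. Crude stand-in for an RCU grace period: make sure the signaling
    //    context has finished and no longer holds a reference to the fence.
    hw.join().expect("hw thread panicked");
    assert_eq!(Arc::strong_count(&fence), 1);

    // 3. Only now is it safe to tear down driver state.
    drop(fence);
    status
}
```

The fence is never signaled from a destructor here; it is only ever signaled by the side that knows the operation is finished.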

> Your concern seems to be a driver accidentally dropping a fence while
> the hardware is still processing the associated job.
> 
> (how's that dangerous, though? Shouldn't parties waiting for the fence
> detect the error? ECANCELED ⇒ you must not access the associated
> memory)

The dma_fence is the SW object which represents the HW operation.

And that HW operation is doing DMA, e.g. accessing and potentially writing into 
memory. That's where the name Direct Memory Access comes from.

So when that is messed up, the memory which gets written to can be re-used while
the device is still writing to it, with the dire consequences we have seen so
many times.

Keep in mind that this framework is not only used by GPUs, where at least modern
ones have VM protection, but also by old ones and by things like V4L, where such
protection is simply not present at all.
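A deterministic toy model of that failure, again in Rust and again with hypothetical names (`DmaJob`, `device_complete()`, `run()`). It assumes the waiter is something like a memory manager that recycles a buffer as soon as the fence has signaled, whatever its error code. With signal-on-drop the slot is handed to a new user while the "device" still has a write pending, and the late DMA write clobbers the new data.

```rust
/// Linux errno value for ECANCELED (asm-generic numbering).
const ECANCELED: i32 = 125;

/// Hypothetical fence state: pending, or signaled with 0 / a negative errno.
#[derive(Clone, Copy, PartialEq, Debug)]
enum FenceState {
    Pending,
    Signaled(i32),
}

/// Hypothetical DMA job: the device will write `value` to `mem[dst]`.
struct DmaJob {
    dst: usize,
    value: u8,
    fence: FenceState,
}

/// The device finishes the job. It performs its write no matter what the
/// fence says, and signals success only if nobody signaled the fence yet.
fn device_complete(mem: &mut [u8], job: &mut DmaJob) {
    mem[job.dst] = job.value;
    if job.fence == FenceState::Pending {
        job.fence = FenceState::Signaled(0);
    }
}

/// Runs one job and returns what ends up in the recycled slot.
/// `signal_on_drop` models signaling -ECANCELED when the last handle drops.
fn run(signal_on_drop: bool) -> u8 {
    let mut mem = [0u8; 4];
    let mut job = DmaJob { dst: 2, value: 0xAA, fence: FenceState::Pending };

    if signal_on_drop {
        // The last handle is dropped while the device is still busy.
        job.fence = FenceState::Signaled(-ECANCELED);
    } else {
        // Correct: the fence signals only once the HW operation completed.
        device_complete(&mut mem, &mut job);
    }

    // The memory manager only checks that the fence has signaled, then
    // hands the slot to a new user who writes fresh data into it.
    assert!(matches!(job.fence, FenceState::Signaled(_)));
    mem[job.dst] = 0x55;

    if signal_on_drop {
        // The DMA was never stopped, so it lands now and corrupts the slot.
        device_complete(&mut mem, &mut job);
    }
    mem[job.dst]
}
```

`run(false)` leaves the new data (0x55) in place, while `run(true)` ends with the stale DMA write (0xAA) on top of it. The -ECANCELED error code does not help, because the write that corrupts the memory comes from the device, not from the waiter.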

Regards,
Christian.

> 
> 
> P.
> 
> 
>>
>> That said, we could potentially make dma_fence_release() more resilient
>> to ref-counting issues.
>>
>> Regards,
>> Christian.
>>
>>>
>>>
>>> P.
>>
> 
