On 4/4/2026 9:02 AM, Gary Guo wrote:
> On Thu Apr 2, 2026 at 10:56 PM BST, Joel Fernandes wrote:
>> Hi Gary,
>>
>> On 4/2/2026 11:24 AM, Gary Guo wrote:
>>> From: Gary Guo <[email protected]>
>>>
>>> Currently, in the GSP->CPU messaging path, the current code misses a read
>>> barrier before data read. The barrier after read is updated to a DMA
>>> barrier (with release ordering desired), instead of the existing (Rust)
>>> SeqCst SMP barrier; the location of barrier is also moved to the beginning
>>> of function, because the barrier is needed to synchronizing between data
>>> and ring-buffer pointer, the RMW operation does not internally need a
>>> barrier (nor it has to be atomic, as CPU pointers are updated by CPU only).
>>>
>>> In the CPU->GSP messaging path, the current code misses a write barrier
>>> after data write and before updating the CPU write pointer. Barrier is not
>>> needed before data write due to control dependency, this fact is documented
>>> explicitly. This could be replaced with an acquire barrier if needed.
>>>
>>> Signed-off-by: Gary Guo <[email protected]>
>>> ---
>>>  drivers/gpu/nova-core/gsp/cmdq.rs | 19 +++++++++++++++++++
>>>  drivers/gpu/nova-core/gsp/fw.rs   | 12 ------------
>>>  2 files changed, 19 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs 
>>> b/drivers/gpu/nova-core/gsp/cmdq.rs
>>> index 2224896ccc89..7e4315b13984 100644
>>> --- a/drivers/gpu/nova-core/gsp/cmdq.rs
>>> +++ b/drivers/gpu/nova-core/gsp/cmdq.rs
>>> @@ -19,6 +19,12 @@
>>>      prelude::*,
>>>      sync::{
>>>          aref::ARef,
>>> +        barrier::{
>>> +            dma_mb,
>>> +            Read,
>>> +            Release,
>>> +            Write, //
>>> +        },
>>>          Mutex, //
>>>      },
>>>      time::Delta,
>>> @@ -258,6 +264,9 @@ fn new(dev: &device::Device<device::Bound>) -> 
>>> Result<Self> {
>>>          let tx = self.cpu_write_ptr() as usize;
>>>          let rx = self.gsp_read_ptr() as usize;
>>>  
>>> +        // ORDERING: control dependency provides necessary LOAD->STORE 
>>> ordering.
>>> +        // `dma_mb(Acquire)` may be used here if we don't want to rely on 
>>> control dependency.
>>
>> Just checking, does control dependency on CPU side really apply to ordering 
>> for
>> IO (what the device perceives?). IOW, the loads are stores might be ordered 
>> on
>> the CPU side, but the device might be seeing these operations out of order. 
>> If
>> that is the case, perhaps the control dependency comment is misleading.
> 
> Given that CPU cannot speculate store, I don't see how dependency ordering can
> be broken even if the other side is a bus-mastering device.
> 
> For this to be broken, the device would be able to see the dependency-ordered
> STORE to be propagated to it before it does it own STORE that propagates to 
> the
> CPU to trigger the CPU STORE in the first place? That'll just break casuality.
> 
> I'll say that control dependency is sufficient here. I'm not worried that the
> compiler will break this particular control dependency given that if
> `gsp_read_ptr == cpu_write_ptr` will imply command allocation failure and this
> condition cannot possibly be optimized out by the compiler.
Yep, ok, fair enough.

thanks,

--
Joel Fernandes

Reply via email to