On 1/28/26 14:57, Lazar, Lijo wrote:
> On 28-Jan-26 11:53 AM, Perry Yuan wrote:
>> Add a full memory barrier after clearing no_hw_access in
>> amdgpu_device_mode1_reset() so subsequent PCI state restore
>> access cannot observe stale state on other CPUs.
>>
> 
> Just want to reiterate that this approach masks the original logical errors 
> within amdgpu.

Yeah, completely agree. A single, unpaired smp_mb() is actually forbidden by 
upstreaming rules.

So absolutely clear NAK from my side to this patch here.
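
To illustrate what pairing means here, below is a minimal userspace sketch in C11, not amdgpu code. The struct fake_dev, its fields, and the function names are hypothetical stand-ins. It assumes the general rule that a barrier on the writer side only helps if the reader side has a matching one; in the kernel the rough equivalents would be smp_store_release() and smp_load_acquire().

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a device; NOT the real struct amdgpu_device. */
struct fake_dev {
    int hw_state;              /* plain data published by the writer */
    atomic_bool no_hw_access;  /* flag that guards access to hw_state */
};

/* Writer side: update the data, then clear the flag with release ordering. */
static void *reset_thread(void *arg)
{
    struct fake_dev *dev = arg;

    dev->hw_state = 42;
    /* Release: the hw_state store above cannot be reordered after this. */
    atomic_store_explicit(&dev->no_hw_access, false, memory_order_release);
    return NULL;
}

/* Reader side: the acquire load is the half that pairs with the release. */
static int read_hw_state(struct fake_dev *dev)
{
    /* Spin until the writer re-enables access. */
    while (atomic_load_explicit(&dev->no_hw_access, memory_order_acquire))
        ;
    /* Ordered after the acquire load, so this observes the writer's store. */
    return dev->hw_state;
}

/* Runs one writer and one reader; returns the value seen, or -1 on error. */
int run_pairing_demo(void)
{
    struct fake_dev dev = { .hw_state = 0 };
    pthread_t writer;
    int seen;

    atomic_init(&dev.no_hw_access, true);
    if (pthread_create(&writer, NULL, reset_thread, &dev) != 0)
        return -1;
    seen = read_hw_state(&dev);
    pthread_join(writer, NULL);
    return seen;
}
```

Calling run_pairing_demo() from a small main() (built with -pthread) returns the value written by the writer thread. Without the acquire on the reader side, the release on the writer side alone would guarantee nothing, which is the problem with adding a lone barrier.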

Regards,
Christian.

> 
> For example, this is one such issue that would not have been caught in the 
> first place with shortcuts like these:
> 
> 12caf3b76150 drm/amdkfd: Handle GPU reset and drain retry fault race
> 
> Thanks,
> Lijo
> 
>> Fixes: 91ae0045130b ("drm/amd/pm: Disable MMIO access during SMU Mode 1 reset")
>> Signed-off-by: Perry Yuan <[email protected]>
>> Reviewed-by: Yifan Zhang <[email protected]>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 +++
>>   1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index b2deb6a74eb2..e69ab8a923e3 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -5735,6 +5735,9 @@ int amdgpu_device_mode1_reset(struct amdgpu_device *adev)
>>       /* enable mmio access after mode 1 reset completed */
>>       adev->no_hw_access = false;
>> +    /* ensure no_hw_access is updated before we access hw */
>> +    smp_mb();
>> +
>>       amdgpu_device_load_pci_state(adev->pdev);
>>       ret = amdgpu_psp_wait_for_bootloader(adev);
>>       if (ret)
> 
