On 1/28/26 14:57, Lazar, Lijo wrote:
> On 28-Jan-26 11:53 AM, Perry Yuan wrote:
>> Add a full memory barrier after clearing no_hw_access in
>> amdgpu_device_mode1_reset() so subsequent PCI state restore
>> access cannot observe stale state on other CPUs.
>>
>
> Just want to reiterate that this approach masks the original logical errors
> within amdgpu.

Yeah, completely agree. A single smp_mb() is actually forbidden by upstreaming
rules.

So this patch gets an absolutely clear NAK from my side.
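
For illustration only, here is a minimal userspace sketch of what barrier
pairing means, using C11 atomics and pthreads rather than kernel primitives.
Nothing in it is amdgpu code: hw_state, writer(), reader() and run_handoff()
are hypothetical names, and the flag merely borrows the name no_hw_access.
The point it tries to show is that the ordering on the writer side only helps
because the reader does a matching acquire load. In kernel code the rough
analogue would be smp_store_release() paired with smp_load_acquire().

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration, not amdgpu code. */
static int hw_state;                     /* plain data published by the writer */
static atomic_bool no_hw_access = true;  /* gate flag guarding hw_state */

/* Writer: publish the data, then open the gate with release ordering. */
static void *writer(void *arg)
{
	(void)arg;
	hw_state = 42;
	atomic_store_explicit(&no_hw_access, false, memory_order_release);
	return NULL;
}

/* Reader: the acquire load pairs with the release store in writer(). */
static void *reader(void *arg)
{
	int *seen = arg;

	while (atomic_load_explicit(&no_hw_access, memory_order_acquire))
		; /* spin until the gate opens */
	*seen = hw_state; /* ordered after the acquire load above */
	return NULL;
}

/* Runs one hand-off and returns the value the reader observed. */
static int run_handoff(void)
{
	pthread_t w, r;
	int seen = 0;

	hw_state = 0;
	atomic_store(&no_hw_access, true);
	pthread_create(&r, NULL, reader, &seen);
	pthread_create(&w, NULL, writer, NULL);
	pthread_join(w, NULL);
	pthread_join(r, NULL);
	return seen;
}
```

run_handoff() returns 42 because the acquire load that sees the flag cleared
also makes the earlier plain store to hw_state visible. With ordering on the
writer side alone and a plain read in reader(), the language gives no such
guarantee.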

Regards,
Christian.

>
> For example, this is one such error, which would not have been caught in the
> first place with shortcuts like these:
>
> 12caf3b76150 drm/amdkfd: Handle GPU reset and drain retry fault race
>
> Thanks,
> Lijo
>
>> Fixes: 91ae0045130b ("drm/amd/pm: Disable MMIO access during SMU Mode 1 reset")
>> Signed-off-by: Perry Yuan <[email protected]>
>> Reviewed-by: Yifan Zhang <[email protected]>
>> ---
>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 +++
>> 1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index b2deb6a74eb2..e69ab8a923e3 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -5735,6 +5735,9 @@ int amdgpu_device_mode1_reset(struct amdgpu_device *adev)
>>  	/* enable mmio access after mode 1 reset completed */
>>  	adev->no_hw_access = false;
>> +	/* ensure no_hw_access is updated before we access hw */
>> +	smp_mb();
>> +
>>  	amdgpu_device_load_pci_state(adev->pdev);
>>  	ret = amdgpu_psp_wait_for_bootloader(adev);
>>  	if (ret)
>