Different interrupts may use different timestamp sources, and timestamps
from different sources shouldn't be compared.
Comparing the timestamps of retry faults to the timestamps of other
interrupts may result in all retry fault interrupts being filtered out
when the two come from different timestamp sources.

This issue was observed on Strix Halo.

Fix it by storing the timestamp of the last processed page fault
interrupt and comparing retry faults only against that value.

Signed-off-by: Timur Kristóf <[email protected]>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 5 ++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h | 1 +
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
index 13bec8461cde..52258f1341c2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -437,9 +437,12 @@ bool amdgpu_gmc_filter_faults(struct amdgpu_device *adev,
 	uint32_t hash;
 
 	/* Stale retry fault if timestamp goes backward */
-	if (amdgpu_ih_ts_after(timestamp, ih->processed_timestamp))
+	if (timestamp == adev->gmc.processed_fault_timestamp ||
+	    amdgpu_ih_ts_after(timestamp, adev->gmc.processed_fault_timestamp))
 		return true;
 
+	adev->gmc.processed_fault_timestamp = MAX(timestamp, adev->gmc.processed_fault_timestamp);
+
 	/* If we don't have space left in the ring buffer return immediately */
 	stamp = max(timestamp, AMDGPU_GMC_FAULT_TIMEOUT + 1) -
 		AMDGPU_GMC_FAULT_TIMEOUT;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
index 676e3aaa1f27..77eb15380284 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
@@ -361,6 +361,7 @@ struct amdgpu_gmc {
 	u64 noretry_flags;
 
 	u64 init_pte_flags;
+	u64 processed_fault_timestamp;
 
 	bool flush_tlb_needs_extra_type_0;
 	bool flush_tlb_needs_extra_type_2;
-- 
2.54.0
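
For illustration only, here is a minimal userspace sketch of the filtering idea. It is not the driver code: `ts_after()`, `struct fault_filter` and `filter_stale()` are hypothetical names, and the 48-bit wraparound comparison is an assumption modelled on how `amdgpu_ih_ts_after()` is used in this patch. It shows why a fault timestamp compared against a value from an unrelated, faster-running source always looks stale, while comparing against the last accepted fault timestamp does not.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true when t2 is later than t1.
 * Assumes 48-bit timestamps, so both values are shifted up by 16 bits
 * to make a wraparound show up in the sign of the difference.
 */
static bool ts_after(uint64_t t1, uint64_t t2)
{
	return (int64_t)((t2 << 16) - (t1 << 16)) > 0;
}

/* Hypothetical state: only page fault timestamps are ever stored here. */
struct fault_filter {
	uint64_t last_fault_ts;
};

/* Returns true when the fault is stale and should be dropped. */
static bool filter_stale(struct fault_filter *f, uint64_t ts)
{
	/* Stale if the timestamp repeats or goes backward. */
	if (ts == f->last_fault_ts || ts_after(ts, f->last_fault_ts))
		return true;

	/* ts is strictly newer at this point, so it becomes the reference. */
	f->last_fault_ts = ts;
	return false;
}
```

With a reference value of 1000000 taken from a different source, `ts_after(100, 1000000)` is true, so a fault stamped 100 would be dropped. With the per-fault reference, faults stamped 100 and then 200 are accepted, and a later fault stamped 150 or 200 is dropped as stale.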
