Re: [PATCH] drm/amdgpu: enable 48-bit IH timestamp counter
Yes, exactly. Is the timer guaranteed to increase monotonically?

I strongly suspect yes, and then a simple "if (old > new) ++upper_32_bits;"
should be sufficient.

Regards,
Christian.

On 13.11.20 at 18:15, Felix Kuehling wrote:
> I'd feel better with wrap-around handling. I think having a system up
> for that long is not likely but not impossible. Having a known hard
> limit on uptime is probably a bad thing. Imagine someone trying to
> reproduce the problem ...
>
> Regards,
>   Felix
>
> On 2020-11-16 at 6:31 a.m., Christian König wrote:
>> Feel free to keep my rb for this, but is 455 days enough in general or
>> should we add wrap around handling?
>>
>> Christian.
>>
>> On 10.11.20 at 18:57, Sierra Guiza, Alejandro (Alex) wrote:
>>> [AMD Public Use]
>>>
>>> I just added support for vega10_ih too.
>>>
>>> Regards,
>>> Alex
>>>
>>>> -----Original Message-----
>>>> From: Sierra Guiza, Alejandro (Alex)
>>>> Sent: Tuesday, November 10, 2020 11:55 AM
>>>> To: amd-gfx@lists.freedesktop.org
>>>> Cc: Koenig, Christian; Kuehling, Felix; Sierra Guiza, Alejandro (Alex)
>>>> Subject: [PATCH] drm/amdgpu: enable 48-bit IH timestamp counter
>>>>
>>>> By default this timestamp is based on a 32-bit counter.
>>>> It is used by amdgpu_gmc_filter_faults to avoid processing the same
>>>> interrupt again in the retry configuration.
>>>> Apparently there's a problem when the timestamp coming from the IH
>>>> overflows and is compared against the timestamp coming from the hash table.
>>>> This patch only extends the overflow period from 10 minutes to approx.
>>>> 455 days.
>>>>
>>>> Signed-off-by: Alex Sierra
>>>> ---
>>>>  drivers/gpu/drm/amd/amdgpu/navi10_ih.c | 6 ++++++
>>>>  drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 6 ++++++
>>>>  2 files changed, 12 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/navi10_ih.c b/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
>>>> index 837769fcb35b..bda916f33805 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
>>>> @@ -94,6 +94,8 @@ static void navi10_ih_enable_interrupts(struct amdgpu_device *adev)
>>>>
>>>>  	ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, RB_ENABLE, 1);
>>>>  	ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, ENABLE_INTR, 1);
>>>> +	ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL,
>>>> +				   RB_GPU_TS_ENABLE, 1);
>>>>  	if (amdgpu_sriov_vf(adev) && adev->asic_type < CHIP_NAVI10) {
>>>>  		if (psp_reg_program(&adev->psp, PSP_REG_IH_RB_CNTL, ih_rb_cntl)) {
>>>>  			DRM_ERROR("PSP program IH_RB_CNTL failed!\n");
>>>> @@ -109,6 +111,8 @@ static void navi10_ih_enable_interrupts(struct amdgpu_device *adev)
>>>>  		ih_rb_cntl = RREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING1);
>>>>  		ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING1,
>>>>  					   RB_ENABLE, 1);
>>>> +		ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING1,
>>>> +					   RB_GPU_TS_ENABLE, 1);
>>>>  		if (amdgpu_sriov_vf(adev) && adev->asic_type < CHIP_NAVI10) {
>>>>  			if (psp_reg_program(&adev->psp, PSP_REG_IH_RB_CNTL_RING1,
>>>>  						ih_rb_cntl)) {
>>>> @@ -125,6 +129,8 @@ static void navi10_ih_enable_interrupts(struct amdgpu_device *adev)
>>>>  		ih_rb_cntl = RREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING2);
>>>>  		ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING2,
>>>>  					   RB_ENABLE, 1);
>>>> +		ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING2,
>>>> +					   RB_GPU_TS_ENABLE, 1);
>>>>  		if (amdgpu_sriov_vf(adev) && adev->asic_type < CHIP_NAVI10) {
>>>>  			if (psp_reg_program(&adev->psp, PSP_REG_IH_RB_CNTL_RING2,
>>>>  						ih_rb_cntl)) {
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
>>>> index 407c6093c2ec..35d68bc5d95e 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
>>>> @@ -50,6 +50,8 @@ static void vega10_ih_enable_interrupts(struct amdgpu_device *adev)
>>>>
>>>>  	ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, RB_ENABLE, 1);
>>>>  	ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, ENABLE_INTR, 1);
>>>> +	ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL,
>>>> +				   RB_GPU_TS_ENABLE, 1);
>>>>  	if (amdgpu_sriov_vf(adev)) {
>>>>  		if (psp_reg_program(&adev->psp, PSP_REG_IH_RB_CNTL, ih_rb_cntl)) {
>>>>  			DRM_ERROR("PSP program IH_RB_CNTL failed!\n");
>>>> @@ -64,6 +66,8 @@ static void vega10_ih_enable_interrupts(struct amdgpu_device *adev)
>>>>  		ih_rb_cntl = RREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING1);
>>>>  		ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING1,
>>>>  					   RB_ENABLE, 1);
>>>> +		ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING1,
>>>> +					   RB_GPU_TS_ENABLE, 1);
>>>>  		if (amdgpu_sriov_vf(adev)) {
>>>>  			if (psp_reg_program(&adev->psp, PSP_REG_IH_RB_CNTL_RING1,
>>>>  						ih_rb_cntl)) {
>>>> @@ -80,6 +84,8 @@ static void vega10_ih_enable_interrupts(struct amdgpu_device *adev)
>>>>  		ih_rb_cntl = RREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING2);
>>>>  		ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RI
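The check Christian suggests in this message, "if (old > new) ++upper_32_bits;", can be sketched in plain C as follows. This is only an illustration under stated assumptions, not driver code: `struct ts_extender` and `ts_extend` are hypothetical names, and the sketch assumes the hardware counter is monotonic and that at most one wrap occurs between two consecutive samples.

```c
#include <stdint.h>

/* Hypothetical software state used to extend a 32-bit hardware counter. */
struct ts_extender {
	uint32_t last_lo;	/* last raw 32-bit value seen */
	uint32_t upper;		/* software-maintained upper 32 bits */
};

/*
 * Extend a raw 32-bit timestamp to 64 bits.
 * Assumes the counter is monotonic and wraps at most once between calls.
 */
static uint64_t ts_extend(struct ts_extender *s, uint32_t lo)
{
	if (s->last_lo > lo)	/* old > new: the counter wrapped */
		++s->upper;
	s->last_lo = lo;
	return ((uint64_t)s->upper << 32) | lo;
}
```

If two samples can arrive out of order, or more than one wrap can pass between samples, this simple check miscounts, which is why the question of monotonicity has to be answered first.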
Re: [PATCH] drm/amdgpu: enable 48-bit IH timestamp counter
I'd feel better with wrap-around handling. I think having a system up
for that long is not likely but not impossible. Having a known hard
limit on uptime is probably a bad thing. Imagine someone trying to
reproduce the problem ...

Regards,
  Felix

On 2020-11-16 at 6:31 a.m., Christian König wrote:
> Feel free to keep my rb for this, but is 455 days enough in general or
> should we add wrap around handling?
>
> Christian.
>
> On 10.11.20 at 18:57, Sierra Guiza, Alejandro (Alex) wrote:
>> [AMD Public Use]
>>
>> I just added support for vega10_ih too.
>>
>> Regards,
>> Alex
>>
>>> -----Original Message-----
>>> From: Sierra Guiza, Alejandro (Alex)
>>> Sent: Tuesday, November 10, 2020 11:55 AM
>>> To: amd-gfx@lists.freedesktop.org
>>> Cc: Koenig, Christian; Kuehling, Felix; Sierra Guiza, Alejandro (Alex)
>>> Subject: [PATCH] drm/amdgpu: enable 48-bit IH timestamp counter
>>>
>>> By default this timestamp is based on a 32-bit counter.
>>> It is used by amdgpu_gmc_filter_faults to avoid processing the same
>>> interrupt again in the retry configuration.
>>> Apparently there's a problem when the timestamp coming from the IH
>>> overflows and is compared against the timestamp coming from the hash table.
>>> This patch only extends the overflow period from 10 minutes to approx.
>>> 455 days.
RE: [PATCH] drm/amdgpu: enable 48-bit IH timestamp counter
[AMD Public Use]

This gives us time for the rest of the enablement we're doing. However, we
should fix the fundamental problem in the near future.

Regards,
Alejandro S.

> -----Original Message-----
> From: amd-gfx On Behalf Of Christian König
> Sent: Monday, November 16, 2020 5:31 AM
> To: Sierra Guiza, Alejandro (Alex); amd-g...@lists.freedesktop.org; Koenig, Christian
> Cc: Kuehling, Felix
> Subject: Re: [PATCH] drm/amdgpu: enable 48-bit IH timestamp counter
>
> Feel free to keep my rb for this, but is 455 days enough in general or should
> we add wrap around handling?
>
> Christian.
>
> On 10.11.20 at 18:57, Sierra Guiza, Alejandro (Alex) wrote:
> > [AMD Public Use]
> >
> > I just added support for vega10_ih too.
> >
> > Regards,
> > Alex
> >
> >> -----Original Message-----
> >> From: Sierra Guiza, Alejandro (Alex)
> >> Sent: Tuesday, November 10, 2020 11:55 AM
> >> To: amd-gfx@lists.freedesktop.org
> >> Cc: Koenig, Christian; Kuehling, Felix; Sierra Guiza, Alejandro (Alex)
> >> Subject: [PATCH] drm/amdgpu: enable 48-bit IH timestamp counter
> >>
> >> By default this timestamp is based on a 32-bit counter.
> >> It is used by amdgpu_gmc_filter_faults to avoid processing the
> >> same interrupt again in the retry configuration.
> >> Apparently there's a problem when the timestamp coming from the IH
> >> overflows and is compared against the timestamp coming from the hash table.
> >> This patch only extends the overflow period from 10 minutes to approx.
> >> 455 days.
Re: [PATCH] drm/amdgpu: enable 48-bit IH timestamp counter
Feel free to keep my rb for this, but is 455 days enough in general or
should we add wrap-around handling?

Christian.

On 10.11.20 at 18:57, Sierra Guiza, Alejandro (Alex) wrote:
> [AMD Public Use]
>
> I just added support for vega10_ih too.
>
> Regards,
> Alex
>
>> -----Original Message-----
>> From: Sierra Guiza, Alejandro (Alex)
>> Sent: Tuesday, November 10, 2020 11:55 AM
>> To: amd-gfx@lists.freedesktop.org
>> Cc: Koenig, Christian; Kuehling, Felix; Sierra Guiza, Alejandro (Alex)
>> Subject: [PATCH] drm/amdgpu: enable 48-bit IH timestamp counter
>>
>> By default this timestamp is based on a 32-bit counter.
>> It is used by amdgpu_gmc_filter_faults to avoid processing the same
>> interrupt again in the retry configuration.
>> Apparently there's a problem when the timestamp coming from the IH
>> overflows and is compared against the timestamp coming from the hash table.
>> This patch only extends the overflow period from 10 minutes to approx.
>> 455 days.
>>
>> Signed-off-by: Alex Sierra
>> ---
>>  drivers/gpu/drm/amd/amdgpu/navi10_ih.c | 6 ++++++
>>  drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 6 ++++++
>>  2 files changed, 12 insertions(+)
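The two figures in the commit message are consistent with each other: widening the counter from 32 to 48 bits multiplies the wrap period by 2^16. A small sketch of that arithmetic follows. `widened_period_days` is a hypothetical helper, and the 10-minute starting point is taken from the commit message rather than from a known clock rate.

```c
#include <stdint.h>

/*
 * Scale a wrap period (given in minutes) by the factor gained when a
 * counter grows from old_bits to new_bits, and return the result in days.
 */
static double widened_period_days(double old_minutes,
				  unsigned old_bits, unsigned new_bits)
{
	uint64_t factor = UINT64_C(1) << (new_bits - old_bits);

	return old_minutes * (double)factor / (60.0 * 24.0);
}
```

Calling `widened_period_days(10.0, 32, 48)` gives 10 * 65536 / 1440, or roughly 455.1 days, which matches the "approx. 455 days" quoted in the patch description.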
Re: [PATCH] drm/amdgpu: enable 48-bit IH timestamp counter
Hi Alex,

Please update vega10_ih.c as well.

Thanks,
Philip

On 2020-11-09 10:20 p.m., Alex Sierra wrote:
> By default this timestamp is based on a 32-bit counter.
> It is used by amdgpu_gmc_filter_faults to avoid processing the same
> interrupt again in the retry configuration.
> Apparently there's a problem when the timestamp coming from the IH
> overflows and is compared against the timestamp coming from the hash table.
> This patch only extends the overflow period from 10 minutes to approx.
> 455 days.
>
> Signed-off-by: Alex Sierra
> ---
>  drivers/gpu/drm/amd/amdgpu/navi10_ih.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/navi10_ih.c b/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
> index 837769fcb35b..bda916f33805 100644
> --- a/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
> +++ b/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
> @@ -94,6 +94,8 @@ static void navi10_ih_enable_interrupts(struct amdgpu_device *adev)
>
>  	ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, RB_ENABLE, 1);
>  	ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, ENABLE_INTR, 1);
> +	ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL,
> +				   RB_GPU_TS_ENABLE, 1);
>  	if (amdgpu_sriov_vf(adev) && adev->asic_type < CHIP_NAVI10) {
>  		if (psp_reg_program(&adev->psp, PSP_REG_IH_RB_CNTL, ih_rb_cntl)) {
>  			DRM_ERROR("PSP program IH_RB_CNTL failed!\n");
> @@ -109,6 +111,8 @@ static void navi10_ih_enable_interrupts(struct amdgpu_device *adev)
>  		ih_rb_cntl = RREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING1);
>  		ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING1,
>  					   RB_ENABLE, 1);
> +		ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING1,
> +					   RB_GPU_TS_ENABLE, 1);
>  		if (amdgpu_sriov_vf(adev) && adev->asic_type < CHIP_NAVI10) {
>  			if (psp_reg_program(&adev->psp, PSP_REG_IH_RB_CNTL_RING1,
>  						ih_rb_cntl)) {
> @@ -125,6 +129,8 @@ static void navi10_ih_enable_interrupts(struct amdgpu_device *adev)
>  		ih_rb_cntl = RREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING2);
>  		ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING2,
>  					   RB_ENABLE, 1);
> +		ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING2,
> +					   RB_GPU_TS_ENABLE, 1);
>  		if (amdgpu_sriov_vf(adev) && adev->asic_type < CHIP_NAVI10) {
>  			if (psp_reg_program(&adev->psp, PSP_REG_IH_RB_CNTL_RING2,
>  						ih_rb_cntl)) {
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
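Each hunk in the patch is a read-modify-write of one bit field in an IH ring-buffer control register. The stand-alone sketch below shows the general mask-and-shift pattern behind a field update such as `REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, RB_GPU_TS_ENABLE, 1)`. It is not the kernel macro: `set_field`, `DEMO_TS_ENABLE_SHIFT` and `DEMO_TS_ENABLE_MASK` are hypothetical names, and the bit position is made up purely for illustration.

```c
#include <stdint.h>

/* Hypothetical field layout; the real bit position lives in the register headers. */
#define DEMO_TS_ENABLE_SHIFT 9u
#define DEMO_TS_ENABLE_MASK  (UINT32_C(1) << DEMO_TS_ENABLE_SHIFT)

/* Clear the field in reg, then OR in the new value, leaving all other bits untouched. */
static uint32_t set_field(uint32_t reg, uint32_t mask, unsigned shift,
			  uint32_t val)
{
	return (reg & ~mask) | ((val << shift) & mask);
}
```

The point of the pattern is that bits already set in the register value, such as the ring enable and interrupt enable bits set just above each added line, survive the update.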
RE: [PATCH] drm/amdgpu: enable 48-bit IH timestamp counter
[AMD Public Use]

I just added support for vega10_ih too.

Regards,
Alex

> -----Original Message-----
> From: Sierra Guiza, Alejandro (Alex)
> Sent: Tuesday, November 10, 2020 11:55 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Koenig, Christian; Kuehling, Felix; Sierra Guiza, Alejandro (Alex)
> Subject: [PATCH] drm/amdgpu: enable 48-bit IH timestamp counter
>
> By default this timestamp is based on a 32-bit counter.
> It is used by amdgpu_gmc_filter_faults to avoid processing the same
> interrupt again in the retry configuration.
> Apparently there's a problem when the timestamp coming from the IH overflows
> and is compared against the timestamp coming from the hash table.
> This patch only extends the overflow period from 10 minutes to approx. 455 days.
>
> Signed-off-by: Alex Sierra
> ---
>  drivers/gpu/drm/amd/amdgpu/navi10_ih.c | 6 ++++++
>  drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 6 ++++++
>  2 files changed, 12 insertions(+)
Re: [PATCH] drm/amdgpu: enable 48-bit IH timestamp counter
On 10.11.20 at 04:20, Alex Sierra wrote:
> By default this timestamp is based on a 32-bit counter.
> It is used by amdgpu_gmc_filter_faults to avoid processing the same
> interrupt again in the retry configuration.
> Apparently there's a problem when the timestamp coming from the IH
> overflows and is compared against the timestamp coming from the hash table.
> This patch only extends the overflow period from 10 minutes to approx.
> 455 days.

Good catch, I wasn't aware of that limitation. The documentation from
the IH suggested that it is a 64-bit value.

> Signed-off-by: Alex Sierra

In the long term we probably need some wrap-around handling, but for now

Reviewed-by: Christian König

> ---
>  drivers/gpu/drm/amd/amdgpu/navi10_ih.c | 6 ++++++
>  1 file changed, 6 insertions(+)