Re: [PATCH] drm/amdgpu: enable 48-bit IH timestamp counter

2020-11-14 Thread Christian König

Yes, exactly.

Is the timer guaranteed to increment monotonically? I strongly suspect yes,
and then a simple "if (old > new) ++upper_32_bits;" should be sufficient.
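
As a rough illustration of that idea (a userspace sketch only, not amdgpu
code): the struct and function names below are hypothetical, and it assumes
the raw timestamps are 48 bits wide, arrive in order, and are sampled at
least once per wrap period.

```c
#include <assert.h>
#include <stdint.h>

#define TS_BITS 48
#define TS_MASK ((UINT64_C(1) << TS_BITS) - 1)

/* Hypothetical bookkeeping, not taken from the driver. */
struct ts_extender {
    uint64_t last_raw; /* previous raw 48-bit hardware timestamp */
    uint64_t wraps;    /* number of wrap-arounds observed so far */
};

/*
 * Extend a wrapping 48-bit timestamp in software.
 * If the new raw value is smaller than the previous one, the counter
 * must have wrapped (given a monotonic source), so bump the upper bits.
 */
static uint64_t ts_extend(struct ts_extender *s, uint64_t raw)
{
    assert(s);
    raw &= TS_MASK;
    if (raw < s->last_raw)
        s->wraps++;
    s->last_raw = raw;
    return (s->wraps << TS_BITS) | raw;
}
```

The extended value only stays correct while the assumptions hold; a missed
wrap (no sample for a whole period) would go unnoticed.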


Regards,
Christian.

Am 13.11.20 um 18:15 schrieb Felix Kuehling:

I'd feel better with wrap-around handling. I think having a system up
for that long is not likely but not impossible. Having a known hard
limit on uptime is probably a bad thing. Imagine someone trying to
reproduce the problem ...

Regards,
   Felix

Am 2020-11-16 um 6:31 a.m. schrieb Christian König:

Feel free to keep my rb for this, but is 455 days enough in general or
should we add wrap around handling?

Christian.

Am 10.11.20 um 18:57 schrieb Sierra Guiza, Alejandro (Alex):

[AMD Public Use]

I just added support for vega10_ih too.

Regards,
Alex


-Original Message-
From: Sierra Guiza, Alejandro (Alex) 
Sent: Tuesday, November 10, 2020 11:55 AM
To: amd-gfx@lists.freedesktop.org
Cc: Koenig, Christian ; Kuehling, Felix
; Sierra Guiza, Alejandro (Alex)

Subject: [PATCH] drm/amdgpu: enable 48-bit IH timestamp counter

By default, this timestamp is based on a 32-bit counter.
It is used by amdgpu_gmc_filter_faults to avoid processing the same
interrupt in the retry configuration.
Apparently there is a problem when the timestamp coming from the IH
overflows and is compared against the timestamp stored in the hash table.
This patch only extends the time to overflow from 10 minutes to
approximately 455 days.
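
The 455-day figure follows from the counter widths alone: at the same clock,
a 48-bit counter has 2^16 times the range of a 32-bit one, so 10 minutes x
65536 = 655,360 minutes, or roughly 455.1 days. A small sketch of that
arithmetic (the helper name is made up for illustration, and the 10-minute
input is the approximate figure quoted above, not a measured value):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Wrap period, in days, of a wider counter, given the wrap period in
 * minutes of a narrower counter driven by the same clock.
 */
static double wrap_days(double narrow_minutes, unsigned narrow_bits,
                        unsigned wide_bits)
{
    assert(wide_bits >= narrow_bits && wide_bits - narrow_bits < 64);
    uint64_t factor = UINT64_C(1) << (wide_bits - narrow_bits);
    return narrow_minutes * (double)factor / (60.0 * 24.0);
}
```

Calling wrap_days(10.0, 32, 48) gives a value a little above 455, which
matches the number in the description.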

Signed-off-by: Alex Sierra 
---
   drivers/gpu/drm/amd/amdgpu/navi10_ih.c | 6 ++
drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 6 ++
   2 files changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
b/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
index 837769fcb35b..bda916f33805 100644
--- a/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
@@ -94,6 +94,8 @@ static void navi10_ih_enable_interrupts(struct
amdgpu_device *adev)

   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, RB_ENABLE, 1);
   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, ENABLE_INTR,
1);
+    ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL,
+   RB_GPU_TS_ENABLE, 1);
   if (amdgpu_sriov_vf(adev) && adev->asic_type < CHIP_NAVI10) {
   if (psp_reg_program(&adev->psp, PSP_REG_IH_RB_CNTL,
ih_rb_cntl)) {
   DRM_ERROR("PSP program IH_RB_CNTL failed!\n");
@@ -109,6 +111,8 @@ static void navi10_ih_enable_interrupts(struct
amdgpu_device *adev)
   ih_rb_cntl = RREG32_SOC15(OSSSYS, 0,
mmIH_RB_CNTL_RING1);
   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING1,
  RB_ENABLE, 1);
+    ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING1,
+   RB_GPU_TS_ENABLE, 1);
   if (amdgpu_sriov_vf(adev) && adev->asic_type <
CHIP_NAVI10) {
   if (psp_reg_program(&adev->psp,
PSP_REG_IH_RB_CNTL_RING1,
   ih_rb_cntl)) {
@@ -125,6 +129,8 @@ static void navi10_ih_enable_interrupts(struct
amdgpu_device *adev)
   ih_rb_cntl = RREG32_SOC15(OSSSYS, 0,
mmIH_RB_CNTL_RING2);
   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING2,
  RB_ENABLE, 1);
+    ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING2,
+   RB_GPU_TS_ENABLE, 1);
   if (amdgpu_sriov_vf(adev) && adev->asic_type <
CHIP_NAVI10) {
   if (psp_reg_program(&adev->psp,
PSP_REG_IH_RB_CNTL_RING2,
   ih_rb_cntl)) {
diff --git a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
index 407c6093c2ec..35d68bc5d95e 100644
--- a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
@@ -50,6 +50,8 @@ static void vega10_ih_enable_interrupts(struct
amdgpu_device *adev)

   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, RB_ENABLE, 1);
   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, ENABLE_INTR,
1);
+    ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL,
+   RB_GPU_TS_ENABLE, 1);
   if (amdgpu_sriov_vf(adev)) {
   if (psp_reg_program(&adev->psp, PSP_REG_IH_RB_CNTL,
ih_rb_cntl)) {
   DRM_ERROR("PSP program IH_RB_CNTL failed!\n");
@@ -64,6 +66,8 @@ static void vega10_ih_enable_interrupts(struct
amdgpu_device *adev)
   ih_rb_cntl = RREG32_SOC15(OSSSYS, 0,
mmIH_RB_CNTL_RING1);
   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING1,
  RB_ENABLE, 1);
+    ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING1,
+   RB_GPU_TS_ENABLE, 1);
   if (amdgpu_sriov_vf(adev)) {
   if (psp_reg_program(&adev->psp,
PSP_REG_IH_RB_CNTL_RING1,
   ih_rb_cntl)) {
@@ -80,6 +84,8 @@ static void vega10_ih_enable_interrupts(struct
amdgpu_device *adev)
   ih_rb_cntl = RREG32_SOC15(OSSSYS, 0,
mmIH_RB_CNTL_RING2);
   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RI

Re: [PATCH] drm/amdgpu: enable 48-bit IH timestamp counter

2020-11-13 Thread Felix Kuehling
I'd feel better with wrap-around handling. I think having a system up
for that long is not likely but not impossible. Having a known hard
limit on uptime is probably a bad thing. Imagine someone trying to
reproduce the problem ...

Regards,
  Felix

Am 2020-11-16 um 6:31 a.m. schrieb Christian König:
> Feel free to keep my rb for this, but is 455 days enough in general or
> should we add wrap around handling?
>
> Christian.
>
> Am 10.11.20 um 18:57 schrieb Sierra Guiza, Alejandro (Alex):
>> [AMD Public Use]
>>
>> I just added support for vega10_ih too.
>>
>> Regards,
>> Alex
>>
>>> -Original Message-
>>> From: Sierra Guiza, Alejandro (Alex) 
>>> Sent: Tuesday, November 10, 2020 11:55 AM
>>> To: amd-gfx@lists.freedesktop.org
>>> Cc: Koenig, Christian ; Kuehling, Felix
>>> ; Sierra Guiza, Alejandro (Alex)
>>> 
>>> Subject: [PATCH] drm/amdgpu: enable 48-bit IH timestamp counter
>>>
>>> By default, this timestamp is based on a 32-bit counter.
>>> It is used by amdgpu_gmc_filter_faults to avoid processing the same
>>> interrupt in the retry configuration.
>>> Apparently there is a problem when the timestamp coming from the IH
>>> overflows and is compared against the timestamp stored in the hash table.
>>> This patch only extends the time to overflow from 10 minutes to
>>> approximately 455 days.
>>>
>>> Signed-off-by: Alex Sierra 
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/navi10_ih.c | 6 ++
>>> drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 6 ++
>>>   2 files changed, 12 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
>>> b/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
>>> index 837769fcb35b..bda916f33805 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
>>> @@ -94,6 +94,8 @@ static void navi10_ih_enable_interrupts(struct
>>> amdgpu_device *adev)
>>>
>>>   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, RB_ENABLE, 1);
>>>   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, ENABLE_INTR,
>>> 1);
>>> +    ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL,
>>> +   RB_GPU_TS_ENABLE, 1);
>>>   if (amdgpu_sriov_vf(adev) && adev->asic_type < CHIP_NAVI10) {
>>>   if (psp_reg_program(&adev->psp, PSP_REG_IH_RB_CNTL,
>>> ih_rb_cntl)) {
>>>   DRM_ERROR("PSP program IH_RB_CNTL failed!\n");
>>> @@ -109,6 +111,8 @@ static void navi10_ih_enable_interrupts(struct
>>> amdgpu_device *adev)
>>>   ih_rb_cntl = RREG32_SOC15(OSSSYS, 0,
>>> mmIH_RB_CNTL_RING1);
>>>   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING1,
>>>  RB_ENABLE, 1);
>>> +    ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING1,
>>> +   RB_GPU_TS_ENABLE, 1);
>>>   if (amdgpu_sriov_vf(adev) && adev->asic_type <
>>> CHIP_NAVI10) {
>>>   if (psp_reg_program(&adev->psp,
>>> PSP_REG_IH_RB_CNTL_RING1,
>>>   ih_rb_cntl)) {
>>> @@ -125,6 +129,8 @@ static void navi10_ih_enable_interrupts(struct
>>> amdgpu_device *adev)
>>>   ih_rb_cntl = RREG32_SOC15(OSSSYS, 0,
>>> mmIH_RB_CNTL_RING2);
>>>   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING2,
>>>  RB_ENABLE, 1);
>>> +    ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING2,
>>> +   RB_GPU_TS_ENABLE, 1);
>>>   if (amdgpu_sriov_vf(adev) && adev->asic_type <
>>> CHIP_NAVI10) {
>>>   if (psp_reg_program(&adev->psp,
>>> PSP_REG_IH_RB_CNTL_RING2,
>>>   ih_rb_cntl)) {
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
>>> b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
>>> index 407c6093c2ec..35d68bc5d95e 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
>>> @@ -50,6 +50,8 @@ static void vega10_ih_enable_interrupts(struct
>>> amdgpu_device *adev)
>>>
>>>   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, RB_ENABLE, 1);
>>>   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, ENABLE_INTR,
>>> 1);
>>> +    ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL,
>>> +   RB_GPU_TS_ENABLE, 1);
>>>   if (amdgpu_sriov_vf(adev)) {
>>>   if (psp_reg_program(&adev->psp, PSP_REG_IH_RB_CNTL,
>>> ih_rb_cntl)) {
>>>   DRM_ERROR("PSP program IH_RB_CNTL failed!\n");
>>> @@ -64,6 +66,8 @@ static void vega10_ih_enable_interrupts(struct
>>> amdgpu_device *adev)
>>>   ih_rb_cntl = RREG32_SOC15(OSSSYS, 0,
>>> mmIH_RB_CNTL_RING1);
>>>   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING1,
>>>  RB_ENABLE, 1);
>>> +    ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING1,
>>> +   RB_GPU_TS_ENABLE, 1);
>>>   if (amdgpu_sriov_vf(adev)) {
>>>   if (psp_reg_program(&adev->psp,
>>> PSP_REG_IH_RB_CNTL_RING1,
>>>   ih_rb_cntl)) {
>>> @@ -80,6 +84,8 @@ static void vega10_ih_enable_interrupts(struct
>>> amdg

RE: [PATCH] drm/amdgpu: enable 48-bit IH timestamp counter

2020-11-13 Thread Sierra Guiza, Alejandro (Alex)
[AMD Public Use]

This gives us time for the rest of the enablement we're doing. However, we
should fix the fundamental problem in the near future.

Regards,
Alejandro S.

> -Original Message-
> From: amd-gfx  On Behalf Of
> Christian König
> Sent: Monday, November 16, 2020 5:31 AM
> To: Sierra Guiza, Alejandro (Alex) ; amd-
> g...@lists.freedesktop.org; Koenig, Christian 
> Cc: Kuehling, Felix 
> Subject: Re: [PATCH] drm/amdgpu: enable 48-bit IH timestamp counter
> 
> Feel free to keep my rb for this, but is 455 days enough in general or should
> we add wrap around handling?
> 
> Christian.
> 
> Am 10.11.20 um 18:57 schrieb Sierra Guiza, Alejandro (Alex):
> > [AMD Public Use]
> >
> > I just added support for vega10_ih too.
> >
> > Regards,
> > Alex
> >
> >> -Original Message-
> >> From: Sierra Guiza, Alejandro (Alex) 
> >> Sent: Tuesday, November 10, 2020 11:55 AM
> >> To: amd-gfx@lists.freedesktop.org
> >> Cc: Koenig, Christian ; Kuehling, Felix
> >> ; Sierra Guiza, Alejandro (Alex)
> >> 
> >> Subject: [PATCH] drm/amdgpu: enable 48-bit IH timestamp counter
> >>
> >> By default, this timestamp is based on a 32-bit counter.
> >> It is used by amdgpu_gmc_filter_faults to avoid processing the
> >> same interrupt in the retry configuration.
> >> Apparently there is a problem when the timestamp coming from the IH
> >> overflows and is compared against the timestamp stored in the hash
> >> table.
> >> This patch only extends the time to overflow from 10 minutes to
> >> approximately 455 days.
> >>
> >> Signed-off-by: Alex Sierra 
> >> ---
> >>   drivers/gpu/drm/amd/amdgpu/navi10_ih.c | 6 ++
> >> drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 6 ++
> >>   2 files changed, 12 insertions(+)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
> >> b/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
> >> index 837769fcb35b..bda916f33805 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
> >> @@ -94,6 +94,8 @@ static void navi10_ih_enable_interrupts(struct
> >> amdgpu_device *adev)
> >>
> >>ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, RB_ENABLE, 1);
> >>ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, ENABLE_INTR,
> >> 1);
> >> +  ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL,
> >> + RB_GPU_TS_ENABLE, 1);
> >>if (amdgpu_sriov_vf(adev) && adev->asic_type < CHIP_NAVI10) {
> >>if (psp_reg_program(&adev->psp, PSP_REG_IH_RB_CNTL,
> >> ih_rb_cntl)) {
> >>DRM_ERROR("PSP program IH_RB_CNTL failed!\n");
> @@ -109,6 +111,8
> >> @@ static void navi10_ih_enable_interrupts(struct
> >> amdgpu_device *adev)
> >>ih_rb_cntl = RREG32_SOC15(OSSSYS, 0,
> mmIH_RB_CNTL_RING1);
> >>ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING1,
> >>   RB_ENABLE, 1);
> >> +  ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING1,
> >> + RB_GPU_TS_ENABLE, 1);
> >>if (amdgpu_sriov_vf(adev) && adev->asic_type <
> >> CHIP_NAVI10) {
> >>if (psp_reg_program(&adev->psp,
> >> PSP_REG_IH_RB_CNTL_RING1,
> >>ih_rb_cntl)) {
> >> @@ -125,6 +129,8 @@ static void navi10_ih_enable_interrupts(struct
> >> amdgpu_device *adev)
> >>ih_rb_cntl = RREG32_SOC15(OSSSYS, 0,
> mmIH_RB_CNTL_RING2);
> >>ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING2,
> >>   RB_ENABLE, 1);
> >> +  ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING2,
> >> + RB_GPU_TS_ENABLE, 1);
> >>if (amdgpu_sriov_vf(adev) && adev->asic_type <
> >> CHIP_NAVI10) {
> >>if (psp_reg_program(&adev->psp,
> >> PSP_REG_IH_RB_CNTL_RING2,
> >>ih_rb_cntl)) {
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
> >> b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
> >> index 407c6093c2ec..35d68bc5d95e 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
> >> @@

Re: [PATCH] drm/amdgpu: enable 48-bit IH timestamp counter

2020-11-13 Thread Christian König
Feel free to keep my rb for this, but is 455 days enough in general or 
should we add wrap around handling?


Christian.

Am 10.11.20 um 18:57 schrieb Sierra Guiza, Alejandro (Alex):

[AMD Public Use]

I just added support for vega10_ih too.

Regards,
Alex


-Original Message-
From: Sierra Guiza, Alejandro (Alex) 
Sent: Tuesday, November 10, 2020 11:55 AM
To: amd-gfx@lists.freedesktop.org
Cc: Koenig, Christian ; Kuehling, Felix
; Sierra Guiza, Alejandro (Alex)

Subject: [PATCH] drm/amdgpu: enable 48-bit IH timestamp counter

By default, this timestamp is based on a 32-bit counter.
It is used by amdgpu_gmc_filter_faults to avoid processing the same
interrupt in the retry configuration.
Apparently there is a problem when the timestamp coming from the IH overflows
and is compared against the timestamp stored in the hash table.
This patch only extends the time to overflow from 10 minutes to
approximately 455 days.

Signed-off-by: Alex Sierra 
---
  drivers/gpu/drm/amd/amdgpu/navi10_ih.c | 6 ++
drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 6 ++
  2 files changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
b/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
index 837769fcb35b..bda916f33805 100644
--- a/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
@@ -94,6 +94,8 @@ static void navi10_ih_enable_interrupts(struct
amdgpu_device *adev)

ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, RB_ENABLE, 1);
ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, ENABLE_INTR,
1);
+   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL,
+  RB_GPU_TS_ENABLE, 1);
if (amdgpu_sriov_vf(adev) && adev->asic_type < CHIP_NAVI10) {
if (psp_reg_program(&adev->psp, PSP_REG_IH_RB_CNTL,
ih_rb_cntl)) {
DRM_ERROR("PSP program IH_RB_CNTL failed!\n");
@@ -109,6 +111,8 @@ static void navi10_ih_enable_interrupts(struct
amdgpu_device *adev)
ih_rb_cntl = RREG32_SOC15(OSSSYS, 0,
mmIH_RB_CNTL_RING1);
ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING1,
   RB_ENABLE, 1);
+   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING1,
+  RB_GPU_TS_ENABLE, 1);
if (amdgpu_sriov_vf(adev) && adev->asic_type <
CHIP_NAVI10) {
if (psp_reg_program(&adev->psp,
PSP_REG_IH_RB_CNTL_RING1,
ih_rb_cntl)) {
@@ -125,6 +129,8 @@ static void navi10_ih_enable_interrupts(struct
amdgpu_device *adev)
ih_rb_cntl = RREG32_SOC15(OSSSYS, 0,
mmIH_RB_CNTL_RING2);
ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING2,
   RB_ENABLE, 1);
+   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING2,
+  RB_GPU_TS_ENABLE, 1);
if (amdgpu_sriov_vf(adev) && adev->asic_type <
CHIP_NAVI10) {
if (psp_reg_program(&adev->psp,
PSP_REG_IH_RB_CNTL_RING2,
ih_rb_cntl)) {
diff --git a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
index 407c6093c2ec..35d68bc5d95e 100644
--- a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
@@ -50,6 +50,8 @@ static void vega10_ih_enable_interrupts(struct
amdgpu_device *adev)

ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, RB_ENABLE, 1);
ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, ENABLE_INTR,
1);
+   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL,
+  RB_GPU_TS_ENABLE, 1);
if (amdgpu_sriov_vf(adev)) {
if (psp_reg_program(&adev->psp, PSP_REG_IH_RB_CNTL,
ih_rb_cntl)) {
DRM_ERROR("PSP program IH_RB_CNTL failed!\n");
@@ -64,6 +66,8 @@ static void vega10_ih_enable_interrupts(struct
amdgpu_device *adev)
ih_rb_cntl = RREG32_SOC15(OSSSYS, 0,
mmIH_RB_CNTL_RING1);
ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING1,
   RB_ENABLE, 1);
+   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING1,
+  RB_GPU_TS_ENABLE, 1);
if (amdgpu_sriov_vf(adev)) {
if (psp_reg_program(&adev->psp,
PSP_REG_IH_RB_CNTL_RING1,
ih_rb_cntl)) {
@@ -80,6 +84,8 @@ static void vega10_ih_enable_interrupts(struct
amdgpu_device *adev)
ih_rb_cntl = RREG32_SOC15(OSSSYS, 0,
mmIH_RB_CNTL_RING2);
ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING2,
   RB_ENABLE, 1);
+   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING2,
+  RB_GPU_TS_ENA

Re: [PATCH] drm/amdgpu: enable 48-bit IH timestamp counter

2020-11-10 Thread philip yang

  
Hi Alex,
Please update vega10_ih.c as well.
Thanks,

Philip

On 2020-11-09 10:20 p.m., Alex Sierra
  wrote:


By default, this timestamp is based on a 32-bit counter.
It is used by amdgpu_gmc_filter_faults to avoid processing
the same interrupt in the retry configuration.
Apparently there is a problem when the timestamp coming from
the IH overflows and is compared against the timestamp stored
in the hash table.
This patch only extends the time to overflow from 10 minutes to
approximately 455 days.

Signed-off-by: Alex Sierra 
---
 drivers/gpu/drm/amd/amdgpu/navi10_ih.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/navi10_ih.c b/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
index 837769fcb35b..bda916f33805 100644
--- a/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
@@ -94,6 +94,8 @@ static void navi10_ih_enable_interrupts(struct amdgpu_device *adev)
 
 	ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, RB_ENABLE, 1);
 	ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, ENABLE_INTR, 1);
+	ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL,
+   RB_GPU_TS_ENABLE, 1);
 	if (amdgpu_sriov_vf(adev) && adev->asic_type < CHIP_NAVI10) {
 		if (psp_reg_program(&adev->psp, PSP_REG_IH_RB_CNTL, ih_rb_cntl)) {
 			DRM_ERROR("PSP program IH_RB_CNTL failed!\n");
@@ -109,6 +111,8 @@ static void navi10_ih_enable_interrupts(struct amdgpu_device *adev)
 		ih_rb_cntl = RREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING1);
 		ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING1,
 	   RB_ENABLE, 1);
+		ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING1,
+	   RB_GPU_TS_ENABLE, 1);
 		if (amdgpu_sriov_vf(adev) && adev->asic_type < CHIP_NAVI10) {
 			if (psp_reg_program(&adev->psp, PSP_REG_IH_RB_CNTL_RING1,
 		ih_rb_cntl)) {
@@ -125,6 +129,8 @@ static void navi10_ih_enable_interrupts(struct amdgpu_device *adev)
 		ih_rb_cntl = RREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING2);
 		ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING2,
 	   RB_ENABLE, 1);
+		ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING2,
+	   RB_GPU_TS_ENABLE, 1);
 		if (amdgpu_sriov_vf(adev) && adev->asic_type < CHIP_NAVI10) {
 			if (psp_reg_program(&adev->psp, PSP_REG_IH_RB_CNTL_RING2,
 		ih_rb_cntl)) {


  

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH] drm/amdgpu: enable 48-bit IH timestamp counter

2020-11-10 Thread Sierra Guiza, Alejandro (Alex)
[AMD Public Use]

I just added support for vega10_ih too.

Regards,
Alex

> -Original Message-
> From: Sierra Guiza, Alejandro (Alex) 
> Sent: Tuesday, November 10, 2020 11:55 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Koenig, Christian ; Kuehling, Felix
> ; Sierra Guiza, Alejandro (Alex)
> 
> Subject: [PATCH] drm/amdgpu: enable 48-bit IH timestamp counter
> 
> By default, this timestamp is based on a 32-bit counter.
> It is used by amdgpu_gmc_filter_faults to avoid processing the same
> interrupt in the retry configuration.
> Apparently there is a problem when the timestamp coming from the IH overflows
> and is compared against the timestamp stored in the hash table.
> This patch only extends the time to overflow from 10 minutes to
> approximately 455 days.
> 
> Signed-off-by: Alex Sierra 
> ---
>  drivers/gpu/drm/amd/amdgpu/navi10_ih.c | 6 ++
> drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 6 ++
>  2 files changed, 12 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
> b/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
> index 837769fcb35b..bda916f33805 100644
> --- a/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
> +++ b/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
> @@ -94,6 +94,8 @@ static void navi10_ih_enable_interrupts(struct
> amdgpu_device *adev)
> 
>   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, RB_ENABLE, 1);
>   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, ENABLE_INTR,
> 1);
> + ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL,
> +RB_GPU_TS_ENABLE, 1);
>   if (amdgpu_sriov_vf(adev) && adev->asic_type < CHIP_NAVI10) {
>   if (psp_reg_program(&adev->psp, PSP_REG_IH_RB_CNTL,
> ih_rb_cntl)) {
>   DRM_ERROR("PSP program IH_RB_CNTL failed!\n");
> @@ -109,6 +111,8 @@ static void navi10_ih_enable_interrupts(struct
> amdgpu_device *adev)
>   ih_rb_cntl = RREG32_SOC15(OSSSYS, 0,
> mmIH_RB_CNTL_RING1);
>   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING1,
>  RB_ENABLE, 1);
> + ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING1,
> +RB_GPU_TS_ENABLE, 1);
>   if (amdgpu_sriov_vf(adev) && adev->asic_type <
> CHIP_NAVI10) {
>   if (psp_reg_program(&adev->psp,
> PSP_REG_IH_RB_CNTL_RING1,
>   ih_rb_cntl)) {
> @@ -125,6 +129,8 @@ static void navi10_ih_enable_interrupts(struct
> amdgpu_device *adev)
>   ih_rb_cntl = RREG32_SOC15(OSSSYS, 0,
> mmIH_RB_CNTL_RING2);
>   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING2,
>  RB_ENABLE, 1);
> + ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING2,
> +RB_GPU_TS_ENABLE, 1);
>   if (amdgpu_sriov_vf(adev) && adev->asic_type <
> CHIP_NAVI10) {
>   if (psp_reg_program(&adev->psp,
> PSP_REG_IH_RB_CNTL_RING2,
>   ih_rb_cntl)) {
> diff --git a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
> b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
> index 407c6093c2ec..35d68bc5d95e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
> +++ b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
> @@ -50,6 +50,8 @@ static void vega10_ih_enable_interrupts(struct
> amdgpu_device *adev)
> 
>   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, RB_ENABLE, 1);
>   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, ENABLE_INTR,
> 1);
> + ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL,
> +RB_GPU_TS_ENABLE, 1);
>   if (amdgpu_sriov_vf(adev)) {
>   if (psp_reg_program(&adev->psp, PSP_REG_IH_RB_CNTL,
> ih_rb_cntl)) {
>   DRM_ERROR("PSP program IH_RB_CNTL failed!\n");
> @@ -64,6 +66,8 @@ static void vega10_ih_enable_interrupts(struct
> amdgpu_device *adev)
>   ih_rb_cntl = RREG32_SOC15(OSSSYS, 0,
> mmIH_RB_CNTL_RING1);
>   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING1,
>  RB_ENABLE, 1);
> + ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING1,
> +RB_GPU_TS_ENABLE, 1);
>   if (amdgpu_sriov_vf(adev)) {
>   if (psp_reg_program(&adev->psp,
> PSP_REG_IH_RB_CNTL_RING1,
>   ih_rb_cntl)) {
> @@ -80,6 +84,8 @@ static void vega10_ih_enable_interrupts(struct
> amdgpu_device *adev)
>   ih_rb_cntl = RREG32_SOC15(OSSSYS, 0,
> mmIH_RB_CNTL_RING2);
>   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING2,
>  RB_ENABLE, 1);
> + ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING2,
> +RB_GPU_TS_ENABLE, 1);
>   if (amdgpu_sriov_vf(adev)) {
>  

Re: [PATCH] drm/amdgpu: enable 48-bit IH timestamp counter

2020-11-10 Thread Christian König

Am 10.11.20 um 04:20 schrieb Alex Sierra:

By default, this timestamp is based on a 32-bit counter.
It is used by amdgpu_gmc_filter_faults to avoid processing
the same interrupt in the retry configuration.
Apparently there is a problem when the timestamp coming from
the IH overflows and is compared against the timestamp stored
in the hash table.
This patch only extends the time to overflow from 10 minutes to
approximately 455 days.


Good catch, I wasn't aware of that limitation. The IH documentation
suggested that it is a 64-bit value.



Signed-off-by: Alex Sierra 


In the long term we probably need some wrap around handling, but for now 
Reviewed-by: Christian König 



---
  drivers/gpu/drm/amd/amdgpu/navi10_ih.c | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/navi10_ih.c 
b/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
index 837769fcb35b..bda916f33805 100644
--- a/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/navi10_ih.c
@@ -94,6 +94,8 @@ static void navi10_ih_enable_interrupts(struct amdgpu_device 
*adev)
  
  	ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, RB_ENABLE, 1);

ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL, ENABLE_INTR, 1);
+   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL,
+  RB_GPU_TS_ENABLE, 1);
if (amdgpu_sriov_vf(adev) && adev->asic_type < CHIP_NAVI10) {
if (psp_reg_program(&adev->psp, PSP_REG_IH_RB_CNTL, 
ih_rb_cntl)) {
DRM_ERROR("PSP program IH_RB_CNTL failed!\n");
@@ -109,6 +111,8 @@ static void navi10_ih_enable_interrupts(struct 
amdgpu_device *adev)
ih_rb_cntl = RREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING1);
ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING1,
   RB_ENABLE, 1);
+   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING1,
+  RB_GPU_TS_ENABLE, 1);
if (amdgpu_sriov_vf(adev) && adev->asic_type < CHIP_NAVI10) {
if (psp_reg_program(&adev->psp, 
PSP_REG_IH_RB_CNTL_RING1,
ih_rb_cntl)) {
@@ -125,6 +129,8 @@ static void navi10_ih_enable_interrupts(struct 
amdgpu_device *adev)
ih_rb_cntl = RREG32_SOC15(OSSSYS, 0, mmIH_RB_CNTL_RING2);
ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING2,
   RB_ENABLE, 1);
+   ih_rb_cntl = REG_SET_FIELD(ih_rb_cntl, IH_RB_CNTL_RING2,
+  RB_GPU_TS_ENABLE, 1);
if (amdgpu_sriov_vf(adev) && adev->asic_type < CHIP_NAVI10) {
if (psp_reg_program(&adev->psp, 
PSP_REG_IH_RB_CNTL_RING2,
ih_rb_cntl)) {

