RE: [PATCH] drm/amdgpu: Fix L1 policy violations (PSP) on sienna cichlid SRIOV

2020-09-21 Thread Zhang, Hawking
[AMD Public Use]

Thanks, Monk.

Take a look at this patch: it excludes almost all of the RLC callback functions
from the guest SRIOV sequence. That's why I would suggest that we don't
initialize any RLC callback functions for SRIOV at all, and instead check the
function pointer before calling it.
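
A minimal sketch of that suggestion, using stand-in types rather than the real amdgpu structures: the callback table is left NULL for an SRIOV guest, and a single helper checks the pointer before each call. `struct demo_device`, `struct demo_rlc_funcs`, `demo_rlc_init_funcs()` and `demo_rlc_stop()` are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures; not the amdgpu API. */
struct demo_device;

struct demo_rlc_funcs {
	void (*stop)(struct demo_device *dev);
	void (*start)(struct demo_device *dev);
};

struct demo_device {
	bool is_sriov_vf;                  /* true when running as an SRIOV guest */
	const struct demo_rlc_funcs *rlc;  /* NULL when RLC is off limits */
	int rlc_writes;                    /* counts simulated register writes */
};

static void demo_bare_metal_stop(struct demo_device *dev)  { dev->rlc_writes++; }
static void demo_bare_metal_start(struct demo_device *dev) { dev->rlc_writes++; }

static const struct demo_rlc_funcs demo_bare_metal_rlc_funcs = {
	.stop  = demo_bare_metal_stop,
	.start = demo_bare_metal_start,
};

/* Decide once, at init time, whether this device gets any RLC callbacks. */
void demo_rlc_init_funcs(struct demo_device *dev)
{
	dev->rlc = dev->is_sriov_vf ? NULL : &demo_bare_metal_rlc_funcs;
}

/* Callers check the pointer, so the callbacks need no SRIOV test of their own. */
void demo_rlc_stop(struct demo_device *dev)
{
	if (dev->rlc && dev->rlc->stop)
		dev->rlc->stop(dev);
}
```

With this shape the SRIOV decision lives in one place (`demo_rlc_init_funcs()`), and the bare-metal callbacks stay free of guest-specific branches.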

Regarding the CG/PG features, if we don't need them in the SRIOV guest, just
remove the corresponding mask for SRIOV. It's really not good practice to add a
check in every function.
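
One way to read the mask suggestion, sketched with made-up flag names: clear the clock-gating feature mask once when the device is an SRIOV guest, so the feature code is skipped without a per-function check. This assumes the feature code touches hardware only when its flag is set, which may not hold for every update function in the real driver. `DEMO_CG_SUPPORT_*`, `struct demo_device`, `demo_init_cg_flags()` and `demo_update_mgcg()` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits; the real driver uses AMD_CG_SUPPORT_* flags. */
#define DEMO_CG_SUPPORT_GFX_MGCG (1u << 0)
#define DEMO_CG_SUPPORT_GFX_CGCG (1u << 1)

struct demo_device {
	bool is_sriov_vf;
	uint32_t cg_flags;  /* clock-gating features this device may program */
	int cg_writes;      /* counts simulated clock-gating register writes */
};

/* Set the feature mask once; an SRIOV guest ends up with an empty mask. */
void demo_init_cg_flags(struct demo_device *dev)
{
	dev->cg_flags = DEMO_CG_SUPPORT_GFX_MGCG | DEMO_CG_SUPPORT_GFX_CGCG;
	if (dev->is_sriov_vf)
		dev->cg_flags = 0;
}

/* No SRIOV check here: the mask alone decides whether hardware is touched. */
void demo_update_mgcg(struct demo_device *dev, bool enable)
{
	if (!(dev->cg_flags & DEMO_CG_SUPPORT_GFX_MGCG))
		return;
	(void)enable;       /* a real driver would program enable or disable here */
	dev->cg_writes++;
}
```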

Regards,
Hawking
-----Original Message-----
From: Liu, Monk
Sent: Tuesday, September 22, 2020 10:57
To: Liu, Monk; Zhang, Hawking; Khaire, Rohit; amd-gfx@lists.freedesktop.org
Cc: Xiao, Jack; Xu, Feifei; Wang, Kevin(Yang); Li, Rong (Zero); Min, Frank;
Yuan, Xiaojie
Subject: RE: [PATCH] drm/amdgpu: Fix L1 policy violations (PSP) on sienna
cichlid SRIOV

I think the main problem is that RLCG is not an independent IP block but is
embedded in the GFX block, which makes it hard to cut it out of amdgpu for SRIOV.

But I think we still have a chance to introduce one more layer between GFX and
RLCG and make the RLCG functions redundant for SRIOV (except for the part where
some L1-blocked registers are accessed through the RLCG path).



-----Original Message-----
From: amd-gfx On Behalf Of Liu, Monk
Sent: September 22, 2020 10:54
To: Zhang, Hawking; Khaire, Rohit; amd-gfx@lists.freedesktop.org
Cc: Xiao, Jack; Xu, Feifei; Wang, Kevin(Yang); Li, Rong (Zero); Min, Frank;
Yuan, Xiaojie
Subject: RE: [PATCH] drm/amdgpu: Fix L1 policy violations (PSP) on sienna
cichlid SRIOV

Yeah, let's have a deeper discussion about the RLCG logic.

-----Original Message-----
From: Zhang, Hawking
Sent: September 22, 2020 10:04
To: Zhang, Hawking; Khaire, Rohit; amd-gfx@lists.freedesktop.org; Liu, Monk
Cc: Xiao, Jack; Xu, Feifei; Wang, Kevin(Yang); Li, Rong (Zero); Min, Frank;
Yuan, Xiaojie
Subject: RE: [PATCH] drm/amdgpu: Fix L1 policy violations (PSP) on sienna
cichlid SRIOV

[AMD Public Use]

Adding @Liu, Monk in case he has a more reasonable approach.

Regards,
Hawking

-----Original Message-----
From: amd-gfx On Behalf Of Zhang, Hawking
Sent: Tuesday, September 22, 2020 10:02
To: Khaire, Rohit; amd-gfx@lists.freedesktop.org
Cc: Xiao, Jack; Xu, Feifei; Wang, Kevin(Yang); Li, Rong (Zero); Min, Frank;
Yuan, Xiaojie
Subject: RE: [PATCH] drm/amdgpu: Fix L1 policy violations (PSP) on sienna
cichlid SRIOV

[AMD Public Use]

This is really not a sustainable approach: adding an amdgpu_sriov_vf(adev) check
to every callback function.

If the guest is not allowed to access RLC, we should not initialize
gfx.rlc.funcs for the SRIOV guest at all, and should check the function pointer
before invoking the function.

I think we really need to rethink the approach we are using to support the
SRIOV guest. I'm afraid that, with the current approach, more and more functions
will have to add an amdgpu_sriov_vf(adev) check.

Regards,
Hawking

-----Original Message-----
From: Khaire, Rohit
Sent: Tuesday, September 22, 2020 05:16
To: amd-gfx@lists.freedesktop.org
Cc: Xiao, Jack; Zhang, Hawking; Xu, Feifei; Wang, Kevin(Yang); Yuan, Xiaojie;
Li, Rong (Zero); Min, Frank
Subject: RE: [PATCH] drm/amdgpu: Fix L1 policy violations (PSP) on sienna
cichlid SRIOV

[AMD Public Use]

Adding more reviewers to cc.

Rohit

RE: [PATCH] drm/amdgpu: Fix L1 policy violations (PSP) on sienna cichlid SRIOV

2020-09-21 Thread Zhang, Hawking
[AMD Public Use]

If most of the bare-metal sequence is bypassed on the guest side, why don't we
create an SRIOV-specific IP block and initialize it in the set_ip_block phase?
That way, we can have a clean code base for both bare metal and the SRIOV guest.

Right now amdgpu_sriov_vf is almost everywhere in amdgpu.
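
A rough sketch of what such a split could look like, under the assumption that the driver registers a list of IP block descriptors at setup time. Every name here (`struct demo_ip_block`, `demo_set_ip_blocks()`, `demo_hw_init_all()`, the two `hw_init` functions) is hypothetical and far simpler than the real amdgpu IP block interface.

```c
#include <stdbool.h>
#include <stddef.h>

struct demo_device;

/* Hypothetical, simplified IP block descriptor. */
struct demo_ip_block {
	const char *name;
	int (*hw_init)(struct demo_device *dev);
};

#define DEMO_MAX_IP_BLOCKS 8

struct demo_device {
	bool is_sriov_vf;
	const struct demo_ip_block *blocks[DEMO_MAX_IP_BLOCKS];
	size_t num_blocks;
	int rlc_programmed;  /* set when the bare-metal path programs RLC */
};

static int demo_gfx_hw_init_bare_metal(struct demo_device *dev)
{
	dev->rlc_programmed = 1;  /* full sequence, including RLC setup */
	return 0;
}

static int demo_gfx_hw_init_sriov(struct demo_device *dev)
{
	(void)dev;                /* guest sequence: RLC is left to the host */
	return 0;
}

static const struct demo_ip_block demo_gfx_block = {
	"gfx", demo_gfx_hw_init_bare_metal
};
static const struct demo_ip_block demo_gfx_sriov_block = {
	"gfx_sriov", demo_gfx_hw_init_sriov
};

/* The only SRIOV test: choose which block to register. */
int demo_set_ip_blocks(struct demo_device *dev)
{
	if (dev->num_blocks >= DEMO_MAX_IP_BLOCKS)
		return -1;
	dev->blocks[dev->num_blocks++] =
		dev->is_sriov_vf ? &demo_gfx_sriov_block : &demo_gfx_block;
	return 0;
}

/* Generic init loop with no SRIOV knowledge. */
int demo_hw_init_all(struct demo_device *dev)
{
	for (size_t i = 0; i < dev->num_blocks; i++) {
		int r = dev->blocks[i]->hw_init(dev);
		if (r)
			return r;
	}
	return 0;
}
```

The trade-off is some duplicated sequence code between the two blocks in exchange for keeping guest checks out of the shared paths.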

Regards,
Hawking

RE: [PATCH] drm/amdgpu: Fix L1 policy violations (PSP) on sienna cichlid SRIOV

2020-09-21 Thread Zhang, Hawking
[AMD Public Use]

Adding @Liu, Monk in case he has a more reasonable approach.

Regards,
Hawking

RE: [PATCH] drm/amdgpu: Fix L1 policy violations (PSP) on sienna cichlid SRIOV

2020-09-21 Thread Zhang, Hawking
[AMD Public Use]

This is really not a sustainable approach: adding an amdgpu_sriov_vf(adev) check
to every callback function.

If the guest is not allowed to access RLC, we should not initialize
gfx.rlc.funcs for the SRIOV guest at all, and should check the function pointer
before invoking the function.

I think we really need to rethink the approach we are using to support the
SRIOV guest. I'm afraid that, with the current approach, more and more functions
will have to add an amdgpu_sriov_vf(adev) check.

Regards,
Hawking

RE: [PATCH] drm/amdgpu: Fix L1 policy violations (PSP) on sienna cichlid SRIOV

2020-09-21 Thread Khaire, Rohit
[AMD Public Use]

Adding more reviewers to cc.

Rohit

-----Original Message-----
From: Khaire, Rohit
Sent: September 3, 2020 5:50 PM
To: amd-gfx@lists.freedesktop.org
Cc: Khaire, Rohit
Subject: [PATCH] drm/amdgpu: Fix L1 policy violations (PSP) on sienna cichlid
SRIOV

Signed-off-by: Rohit Khaire 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c   | 49
 drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c | 64
 drivers/gpu/drm/amd/amdgpu/mmhub_v2_0.c  | 42
 3 files changed, 95 insertions(+), 60 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index d502e30f67d9..4bafbd453e53 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -4808,14 +4808,23 @@ static int gfx_v10_0_init_csb(struct amdgpu_device *adev)
 
 void gfx_v10_0_rlc_stop(struct amdgpu_device *adev)
 {
-	u32 tmp = RREG32_SOC15(GC, 0, mmRLC_CNTL);
+	u32 tmp;
 
+	/* For SRIOV, don't touch RLC_G */
+	if (amdgpu_sriov_vf(adev))
+		return;
+
+	tmp = RREG32_SOC15(GC, 0, mmRLC_CNTL);
 	tmp = REG_SET_FIELD(tmp, RLC_CNTL, RLC_ENABLE_F32, 0);
 	WREG32_SOC15(GC, 0, mmRLC_CNTL, tmp);
 }
 
 static void gfx_v10_0_rlc_reset(struct amdgpu_device *adev)
 {
+	/* For SRIOV, don't touch RLC_G */
+	if (amdgpu_sriov_vf(adev))
+		return;
+
 	WREG32_FIELD15(GC, 0, GRBM_SOFT_RESET, SOFT_RESET_RLC, 1);
 	udelay(50);
 	WREG32_FIELD15(GC, 0, GRBM_SOFT_RESET, SOFT_RESET_RLC, 0);
@@ -4846,6 +4855,10 @@ static void gfx_v10_0_rlc_smu_handshake_cntl(struct amdgpu_device *adev,
 
 static void gfx_v10_0_rlc_start(struct amdgpu_device *adev)
 {
+	/* For SRIOV, don't touch RLC_G */
+	if (amdgpu_sriov_vf(adev))
+		return;
+
 	/* TODO: enable rlc & smu handshake until smu
 	 * and gfxoff feature works as expected */
 	if (!(amdgpu_pp_feature_mask & PP_GFXOFF_MASK))
@@ -4859,6 +4872,10 @@ static void gfx_v10_0_rlc_enable_srm(struct amdgpu_device *adev)
 {
 	uint32_t tmp;
 
+	/* For SRIOV, don't touch RLC_G */
+	if (amdgpu_sriov_vf(adev))
+		return;
+
 	/* enable Save Restore Machine */
 	tmp = RREG32(SOC15_REG_OFFSET(GC, 0, mmRLC_SRM_CNTL));
 	tmp |= RLC_SRM_CNTL__AUTO_INCR_ADDR_MASK;
@@ -4872,6 +4889,10 @@ static int gfx_v10_0_rlc_load_microcode(struct amdgpu_device *adev)
 	const __le32 *fw_data;
 	unsigned i, fw_size;
 
+	/* For SRIOV, don't touch RLC_G */
+	if (amdgpu_sriov_vf(adev))
+		return 0;
+
 	if (!adev->gfx.rlc_fw)
 		return -EINVAL;
 
@@ -4906,8 +4927,7 @@ static int gfx_v10_0_rlc_resume(struct amdgpu_device *adev)
 
 		gfx_v10_0_init_csb(adev);
 
-		if (!amdgpu_sriov_vf(adev)) /* enable RLC SRM */
-			gfx_v10_0_rlc_enable_srm(adev);
+		gfx_v10_0_rlc_enable_srm(adev);
 	} else {
 		if (amdgpu_sriov_vf(adev)) {
 			gfx_v10_0_init_csb(adev);
@@ -6990,7 +7010,6 @@ static int gfx_v10_0_hw_fini(void *handle)
 	if (amdgpu_gfx_disable_kcq(adev))
 		DRM_ERROR("KCQ disable failed\n");
 	if (amdgpu_sriov_vf(adev)) {
-		gfx_v10_0_cp_gfx_enable(adev, false);
 		/* Program KIQ position of RLC_CP_SCHEDULERS during destroy */
 		tmp = RREG32_SOC15(GC, 0, mmRLC_CP_SCHEDULERS);
 		tmp &= 0xff00;
@@ -7272,6 +7291,10 @@ static void gfx_v10_0_update_medium_grain_clock_gating(struct amdgpu_device *ade
 {
 	uint32_t data, def;
 
+	/* For SRIOV, guest VM should not touch CGCG and PG stuff */
+	if (amdgpu_sriov_vf(adev))
+		return;
+
 	/* It is disabled by HW by default */
 	if (enable && (adev->cg_flags & AMD_CG_SUPPORT_GFX_MGCG)) {
 		/* 0 - Disable some blocks' MGCG */
@@ -7339,6 +7362,10 @@ static void gfx_v10_0_update_3d_clock_gating(struct amdgpu_device *adev,
 {
 	uint32_t data, def;
 
+	/* For SRIOV, guest VM should not touch CGCG and PG stuff */
+	if (amdgpu_sriov_vf(adev))
+		return;
+
 	/* Enable 3D CGCG/CGLS */
 	if (enable && (adev->cg_flags & AMD_CG_SUPPORT_GFX_3D_CGCG)) {
 		/* write cmd to clear cgcg/cgls ov */
@@ -7381,6 +7408,10 @@ static void gfx_v10_0_update_coarse_grain_clock_gating(struct amdgpu_device *ade
 {
 	uint32_t def, data;
 
+	/* For SRIOV, guest VM should not touch CGCG and PG stuff */
+	if (amdgpu_sriov_vf(adev))
+		return;
+
 	if (enable && (adev->cg_flags & AMD_CG_SUPPORT_GFX_CGCG)) {
 		def = data = RREG32_SOC15(GC, 0, mmRLC_CGTT_MGCG_OVERRIDE);
 		/* unset CGCG override */
@@ -7422,6 +7453,10 @@ static void gfx_v10_0_update_coarse_grain_clock_gating(struct amdgpu_device *ade
 static int