RE: [PATCH] drm/amd/amdgpu: Fix the CGCG setting is overwritten for SRIOV.

2020-05-22 Thread Wan, Gavin
Fixed it as Monk and Hawking suggestion. Now it only has one checking in 
function gfx_v10_0_enable_gui_idle_interrupt.

BTW, I update the commit, but it send out an another email.

Thanks,
Gavin

-Original Message-
From: Zhang, Hawking  
Sent: Friday, May 22, 2020 2:17 AM
To: Liu, Monk ; Chen, Guchun ; Wan, 
Gavin ; amd-gfx@lists.freedesktop.org
Cc: Wan, Gavin 
Subject: RE: [PATCH] drm/amd/amdgpu: Fix the CGCG setting is overwritten for 
SRIOV.

[AMD Public Use]

Or make it in more reasonable place.

Regards,
Hawking

-Original Message-
From: Zhang, Hawking 
Sent: Friday, May 22, 2020 14:16
To: Liu, Monk ; Chen, Guchun ; Wan, 
Gavin ; amd-gfx@lists.freedesktop.org
Cc: Wan, Gavin 
Subject: RE: [PATCH] drm/amd/amdgpu: Fix the CGCG setting is overwritten for 
SRIOV.

[AMD Public Use]

Yes, please try best effort to not introduce guest/one_vf/mult_vf check.

Regards,
Hawking
-Original Message-
From: Liu, Monk  
Sent: Friday, May 22, 2020 14:12
To: Liu, Monk ; Zhang, Hawking ; Chen, 
Guchun ; Wan, Gavin ; 
amd-gfx@lists.freedesktop.org
Cc: Wan, Gavin 
Subject: RE: [PATCH] drm/amd/amdgpu: Fix the CGCG setting is overwritten for 
SRIOV.

Gavin

Looks the only place you need to change is the part of avoid touching 
"CP_INT_CNTL_RING0" which is handled by GIM now 

Others looks not needed at all

_
Monk Liu|GPU Virtualization Team |AMD


-Original Message-
From: amd-gfx  On Behalf Of Liu, Monk
Sent: Friday, May 22, 2020 1:52 PM
To: Zhang, Hawking ; Chen, Guchun ; 
Wan, Gavin ; amd-gfx@lists.freedesktop.org
Cc: Wan, Gavin 
Subject: RE: [PATCH] drm/amd/amdgpu: Fix the CGCG setting is overwritten for 
SRIOV.

Sounds a good idea

@Wan, Gavin can you try hawking's advice ?

_
Monk Liu|GPU Virtualization Team |AMD


-Original Message-
From: amd-gfx  On Behalf Of Zhang, 
Hawking
Sent: Friday, May 22, 2020 1:09 PM
To: Chen, Guchun ; Wan, Gavin ; 
amd-gfx@lists.freedesktop.org
Cc: Wan, Gavin 
Subject: RE: [PATCH] drm/amd/amdgpu: Fix the CGCG setting is overwritten for 
SRIOV.

[AMD Public Use]

Can we leverage existing CG flags to control this rather than add 
amdgpu_sriov_vf(adev) check everywhere?

If GC CG feature is programmed by host. We can just mask out the following 
flags for guest driver case (amdgpu_sriov_vf(adev)).

AMD_CG_SUPPORT_GFX_MGCG |
 AMD_CG_SUPPORT_GFX_CGLS |
 AMD_CG_SUPPORT_GFX_CGCG |
 AMD_CG_SUPPORT_GFX_CGLS |
 AMD_CG_SUPPORT_GFX_3D_CGCG |
 AMD_CG_SUPPORT_GFX_3D_CGLS

There are too many amdgpu_sriov_vf(adev) Check in amdgpu driver, which actually 
add unnecessary sustaining effort.

Regards,
Hawking

-Original Message-
From: amd-gfx  On Behalf Of Chen, Guchun
Sent: Friday, May 22, 2020 11:47
To: Wan, Gavin ; amd-gfx@lists.freedesktop.org
Cc: Wan, Gavin 
Subject: RE: [PATCH] drm/amd/amdgpu: Fix the CGCG setting is overwritten for 
SRIOV.

[AMD Public Use]

Please see one comment below.

Regards,
Guchun

-Original Message-
From: amd-gfx  On Behalf Of Gavin Wan
Sent: Friday, May 22, 2020 3:53 AM
To: amd-gfx@lists.freedesktop.org
Cc: Wan, Gavin 
Subject: [PATCH] drm/amd/amdgpu: Fix the CGCG setting is overwritten for SRIOV.

For SRIOV, since the CGCG is set on host side. The Guest should not program 
CGCG again.

The patch ignores setting CGCG for SRIOV.

Change-Id: Ic336fab3b23b8371c9e9e192182a3ba14a8db8e1
Signed-off-by: Gavin Wan 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index bd5dd4f64311..52b6e4759cf3 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -4558,6 +4558,9 @@ static void gfx_v10_0_constants_init(struct amdgpu_device 
*adev)  static void gfx_v10_0_enable_gui_idle_interrupt(struct amdgpu_device 
*adev,
   bool enable)
 {
+   if (amdgpu_sriov_vf(adev))
+   return;
+
[Guchun]This coding style is not correct. You should put the check after the 
declare of 'u32 tmp'.
Maybe it's better to split below line to declare and execution parts 
respectively.

u32 tmp = RREG32_SOC15(GC, 0, mmCP_INT_CNTL_RING0);
 
tmp = REG_SET_FIELD(tmp, CP_INT_CNTL_RING0, CNTX_BUSY_INT_ENABLE, @@ 
-6842,6 +6845,9 @@ static void 
gfx_v10_0_update_medium_grain_clock_gating(struct amdgpu_device *ade  {
uint32_t data, def;
 
+   if (amdgpu_sriov_vf(adev))
+   return;
+
/* It is disabled by HW by default */
if (enable && (adev->cg_flags & AMD_CG_SUPPORT_GFX_MGCG)) {
/* 0 - Disable some blocks' MGCG */
@@ -6911,6 +6917,9 @@ static void gfx_v10_0_update_3d_clock_gating(struct 
amdgpu_device *adev,  {
uint32_t data, def;
 
+   if (amdgpu_sriov_vf(adev))
+

RE: [PATCH] drm/amd/amdgpu: Fix the CGCG setting is overwritten for SRIOV.

2020-05-22 Thread Wan, Gavin
HI Alex,

I fixed it as your suggestion.

Thanks,
Gavin

-Original Message-
From: Alex Deucher  
Sent: Friday, May 22, 2020 3:11 PM
To: Wan, Gavin 
Cc: amd-gfx list 
Subject: Re: [PATCH] drm/amd/amdgpu: Fix the CGCG setting is overwritten for 
SRIOV.

On Fri, May 22, 2020 at 2:20 PM Gavin Wan  wrote:
>
> For SRIOV, since the CP_INT_CNTL_RING0 is programed on host side.
> The Guest should not program CP_INT_CNTL_RING0 again.
>
> Change-Id: Ic336fab3b23b8371c9e9e192182a3ba14a8db8e1
> Signed-off-by: Gavin Wan 
> ---
>  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> index bd5dd4f64311..39275bf79448 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> @@ -4558,6 +4558,9 @@ static void gfx_v10_0_constants_init(struct 
> amdgpu_device *adev)  static void gfx_v10_0_enable_gui_idle_interrupt(struct 
> amdgpu_device *adev,
>bool enable)  {
> +   if (amdgpu_sriov_vf(adev))
> +   return;
> +

This needs to be below the stack variable declarations or you'll get a warning.

Alex

> u32 tmp = RREG32_SOC15(GC, 0, mmCP_INT_CNTL_RING0);
>
> tmp = REG_SET_FIELD(tmp, CP_INT_CNTL_RING0, 
> CNTX_BUSY_INT_ENABLE,
> --
> 2.25.1
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flist
> s.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&data=02%7C01%7CGa
> vin.Wan%40amd.com%7C23c1770b248841c7032a08d7fe83d940%7C3dd8961fe4884e6
> 08e11a82d994e183d%7C0%7C0%7C637257714548695750&sdata=PaWH5hLNb3N1Z
> lalw4GsjeeB46xzCVxXDWBROndcKsk%3D&reserved=0
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amd/amdgpu: fix SDMA IRQ client ID <-> req mapping.

2024-07-08 Thread Wan, Gavin
[AMD Official Use Only - AMD Internal Distribution Only]

Hi Hawking,

Fixed the typo and a new email is sent.

Thanks,
Gavin

From: Zhang, Hawking 
Sent: Monday, July 8, 2024 10:23 PM
To: Wan, Gavin ; amd-gfx@lists.freedesktop.org 

Cc: Wan, Gavin 
Subject: RE: [PATCH] drm/amd/amdgpu: fix SDMA IRQ client ID <-> req mapping.

[AMD Official Use Only - AMD Internal Distribution Only]

Please correct the typo in description
CLIENTID_SDMA2 and CLIENTID_SDMA2

With that fixed, the patch is

Reviewed-by: Hawking Zhang 

Regards,
Hawking

-Original Message-
From: amd-gfx  On Behalf Of Gavin Wan
Sent: Tuesday, July 9, 2024 01:20
To: amd-gfx@lists.freedesktop.org
Cc: Wan, Gavin 
Subject: [PATCH] drm/amd/amdgpu: fix SDMA IRQ client ID <-> req mapping.

sdma has 2 instances in SRIOV cpx mode. Odd numbered VFs have
sdma0/sdma1 instances. Even numbered vfs have sdma2/sdma3. For Even numbered 
vfs, the sdma2 & sdma3 (irq srouce id
CLIENTID_SDMA2 and CLIENTID_SDMA2) shoud map to irq seq 0 & 1.

Signed-off-by: Gavin Wan 
Change-Id: Ie850114932ca98ea3c9176370dde5dd393ddf7e7
---
 drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 20 +---
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c 
b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
index fec5a3d1c4bc..4516cb0b3ae8 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
@@ -82,7 +82,7 @@ static unsigned sdma_v4_4_2_seq_to_irq_id(int seq_num)
}
 }

-static int sdma_v4_4_2_irq_id_to_seq(unsigned client_id)
+static int sdma_v4_4_2_irq_id_to_seq(struct amdgpu_device *adev,
+unsigned client_id)
 {
switch (client_id) {
case SOC15_IH_CLIENTID_SDMA0:
@@ -90,9 +90,15 @@ static int sdma_v4_4_2_irq_id_to_seq(unsigned client_id)
case SOC15_IH_CLIENTID_SDMA1:
return 1;
case SOC15_IH_CLIENTID_SDMA2:
-   return 2;
+   if (amdgpu_sriov_vf(adev) && (adev->gfx.xcc_mask == 0x1))
+   return 0;
+   else
+   return 2;
case SOC15_IH_CLIENTID_SDMA3:
-   return 3;
+   if (amdgpu_sriov_vf(adev) && (adev->gfx.xcc_mask == 0x1))
+   return 1;
+   else
+   return 3;
default:
return -EINVAL;
}
@@ -1541,7 +1547,7 @@ static int sdma_v4_4_2_process_trap_irq(struct 
amdgpu_device *adev,
uint32_t instance, i;

DRM_DEBUG("IH: SDMA trap\n");
-   instance = sdma_v4_4_2_irq_id_to_seq(entry->client_id);
+   instance = sdma_v4_4_2_irq_id_to_seq(adev, entry->client_id);

/* Client id gives the SDMA instance in AID. To know the exact SDMA
 * instance, interrupt entry gives the node id which corresponds to the 
AID instance.
@@ -1584,7 +1590,7 @@ static int sdma_v4_4_2_process_ras_data_cb(struct 
amdgpu_device *adev,
if (amdgpu_ras_is_supported(adev, AMDGPU_RAS_BLOCK__SDMA))
goto out;

-   instance = sdma_v4_4_2_irq_id_to_seq(entry->client_id);
+   instance = sdma_v4_4_2_irq_id_to_seq(adev, entry->client_id);
if (instance < 0)
goto out;

@@ -1603,7 +1609,7 @@ static int sdma_v4_4_2_process_illegal_inst_irq(struct 
amdgpu_device *adev,

DRM_ERROR("Illegal instruction in SDMA command stream\n");

-   instance = sdma_v4_4_2_irq_id_to_seq(entry->client_id);
+   instance = sdma_v4_4_2_irq_id_to_seq(adev, entry->client_id);
if (instance < 0)
return 0;

@@ -1647,7 +1653,7 @@ static int sdma_v4_4_2_print_iv_entry(struct 
amdgpu_device *adev,
struct amdgpu_task_info task_info;
u64 addr;

-   instance = sdma_v4_4_2_irq_id_to_seq(entry->client_id);
+   instance = sdma_v4_4_2_irq_id_to_seq(adev, entry->client_id);
if (instance < 0 || instance >= adev->sdma.num_instances) {
dev_err(adev->dev, "sdma instance invalid %d\n", instance);
return -EINVAL;
--
2.40.1



RE: [PATCH] drm/amd/amdgpu: Remove static from variable in RLCG Reg RW.

2022-04-14 Thread Wan, Gavin
[AMD Official Use Only]

Dear Paul,

Thanks for you comments. Let me do it.

Thanks,
Gavin

-Original Message-
From: Paul Menzel 
Sent: Thursday, April 14, 2022 1:59 PM
To: Wan, Gavin 
Cc: amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amd/amdgpu: Remove static from variable in RLCG Reg RW.

Dear Gavin,


Thank you for your patch.

Am 13.04.22 um 17:26 schrieb Gavin Wan:

Should you re-roll your patch (v2), please remove the dot/period from the end 
of the git commit message summary (subject).

> [why] These static variables saves the RLC Scratch registers address.

s/saves/save/

>When we installed multiple GPUs (for example: XGMI setting) and

s/installed/install/

>multiple GPUs call the function at same time. The RLC Scratch

… same time, the RLC …

>registers address are changed each other. Then it caused

s/caused/causes/

>reading/writing to wrong GPU.

I see from other patches posted here, that [why] is put on a separate line, so 
you do not need to indent the text.

[why]

These static …

>
> [fix] Removed the static from the variables. The variables are
>in stack.

Same here, though *how* instead of *fix* seems more common.

s/Removed/Remove/
s/in stack/on the stack/

>
> Signed-off-by: Gavin Wan 
> Change-Id: Iee78849291d4f7a9688ecc5165bec70ee85cdfbe

Without the Gerrit URL that line is useless.


Kind regards.

Paul


> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 10 +-
>   1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> index d5eea031c3e3..d18a05a20566 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> @@ -868,11 +868,11 @@ static u32 amdgpu_virt_rlcg_reg_rw(struct amdgpu_device 
> *adev, u32 offset, u32 v
>   uint32_t timeout = 5;
>   uint32_t i, tmp;
>   uint32_t ret = 0;
> - static void *scratch_reg0;
> - static void *scratch_reg1;
> - static void *scratch_reg2;
> - static void *scratch_reg3;
> - static void *spare_int;
> + void *scratch_reg0;
> + void *scratch_reg1;
> + void *scratch_reg2;
> + void *scratch_reg3;
> + void *spare_int;
>
>   if (!adev->gfx.rlc.rlcg_reg_access_supported) {
>   dev_err(adev->dev,


RE: amdgpu_vf_errors

2017-09-28 Thread Wan, Gavin
Hi Dave,

This feature is used on Virtualization environment to send errors of amdgpu 
initialization fail from Guest side to Host side via mailbox (function 
xgpu_ai_mailbox_trans_msg in file mxgpu_ai.c).

In Virtualization environment, it should not use multi GPUs. So I just created 
a global array not to impacts amdgpu structure. The array size is 16, operated 
as ring buffer. It is bigger enough to save the errors of amdgpu initialization 
fail.

You are right, I should move struct amdgpu_vf_error_buffer to struct 
amdgpu_device OR struct amdgpu_virt. Which one is better?

Yes, there are a lot typoes adm->amd and I will fix them.

Thanks your very much.

Best Regards,
Gavin

-Original Message-
From: Dave Airlie [mailto:airl...@gmail.com] 
Sent: Wednesday, September 27, 2017 7:17 PM
To: amd-gfx mailing list ; Wan, Gavin 

Subject: Re: amdgpu_vf_errors

On 28 September 2017 at 09:14, Dave Airlie  wrote:
> I've no idea what this is used for, virtual function errors?
>
> but why does it have no locking, and why is there a global array, (not per 
> gpu?)
>
> Alex, you reviewed it, please rewrite/remove as necessary.

Oh it also typoes adm->amd in lots of places.

Dave.

>
> Dave.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx