RE: [PATCH 1/2] drm/amdgpu: optimize redundant code in umc_v8_10

2023-04-03 Thread Chai, Thomas
[AMD Official Use Only - General]




-
Best Regards,
Thomas

-Original Message-
From: Zhou1, Tao  
Sent: Monday, April 3, 2023 11:45 AM
To: Chai, Thomas ; amd-gfx@lists.freedesktop.org
Cc: Zhang, Hawking ; Li, Candice ; 
Yang, Stanley 
Subject: RE: [PATCH 1/2] drm/amdgpu: optimize redundant code in umc_v8_10

[AMD Official Use Only - General]



> -Original Message-
> From: Chai, Thomas 
> Sent: Monday, April 3, 2023 9:59 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Chai, Thomas ; Zhang, Hawking 
> ; Zhou1, Tao ; Li, Candice 
> ; Yang, Stanley ; Chai, 
> Thomas 
> Subject: [PATCH 1/2] drm/amdgpu: optimize redundant code in umc_v8_10
> 
> Optimize redundant code in umc_v8_10
> 
> Signed-off-by: YiPeng Chai 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c |  31 
>  drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h |   7 +
>  drivers/gpu/drm/amd/amdgpu/umc_v8_10.c  | 197 +---
>  3 files changed, 115 insertions(+), 120 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c
> index 9e2e97207e53..734442315cf6 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c
> @@ -302,3 +302,34 @@ void amdgpu_umc_fill_error_record(struct ras_err_data *err_data,
> 
>   err_data->err_addr_cnt++;
>  }
> +
> +int amdgpu_umc_scan_all_umc_channels(struct amdgpu_device *adev,
> + umc_func func, void *data)
> +{
> + uint32_t node_inst   = 0;
> + uint32_t umc_inst= 0;
> + uint32_t ch_inst = 0;
> + int ret = 0;
> +
> + if (adev->umc.node_inst_num) {
> + LOOP_UMC_EACH_NODE_INST_AND_CH(node_inst, umc_inst, ch_inst) {
> + ret = func(adev, node_inst, umc_inst, ch_inst, data);
> + if (ret) {
> + dev_err(adev->dev, "Node %d umc %d ch %d
> func returns %d\n",
> + node_inst, umc_inst, ch_inst, ret);
> + return ret;
> + }
> + }
> + } else {
> + LOOP_UMC_INST_AND_CH(umc_inst, ch_inst) {

> [Tao] for an ASIC which doesn't support nodes, can we set its node_inst_num to 1
> and retire the macro LOOP_UMC_INST_AND_CH?

[Thomas] I am afraid not.

" #define LOOP_UMC_NODE_INST(node_inst) \
for_each_set_bit((node_inst), &(adev->umc.active_mask), adev->umc.node_inst_num) "

The node instance loop of LOOP_UMC_EACH_NODE_INST_AND_CH supports node
harvesting, so node_inst_num is not the actual number of active node instances.
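
To illustrate with hypothetical values (not from the patch): if
node_inst_num is 4 but active_mask is 0xB, node 2 is harvested and the
node loop must visit only the set bits:

	unsigned long active_mask = 0xB;	/* nodes 0, 1, 3 active; node 2 harvested */
	uint32_t node_inst;

	/* visits node_inst = 0, 1, 3 -- what LOOP_UMC_NODE_INST expands to */
	for_each_set_bit(node_inst, &active_mask, 4)
		pr_info("scanning UMC node %u\n", node_inst);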


> + ret = func(adev, 0, umc_inst, ch_inst, data);
> + if (ret) {
> + dev_err(adev->dev, "Umc %d ch %d func
> returns %d\n",
> + umc_inst, ch_inst, ret);
> + return ret;
> + }
> + }
> + }
> +
> + return 0;
> +}
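
A usage sketch, assuming only the umc_func signature added to
amdgpu_umc.h below (the callback name here is hypothetical; a non-zero
return aborts the scan and is propagated to the caller):

	static int umc_count_channels_cb(struct amdgpu_device *adev,
			uint32_t node_inst, uint32_t umc_inst,
			uint32_t ch_inst, void *data)
	{
		uint32_t *count = data;	/* caller-provided accumulator */

		(*count)++;	/* a real handler would read per-channel ECC registers */
		return 0;
	}

	uint32_t count = 0;
	amdgpu_umc_scan_all_umc_channels(adev, umc_count_channels_cb, &count);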
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h
> index d7f1229ff11f..f279c8057f96 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h
> @@ -47,6 +47,10 @@
>  #define LOOP_UMC_EACH_NODE_INST_AND_CH(node_inst, umc_inst, ch_inst) \
>   LOOP_UMC_NODE_INST((node_inst)) LOOP_UMC_INST_AND_CH((umc_inst), (ch_inst))
> 
> +
> +typedef int (*umc_func)(struct amdgpu_device *adev, uint32_t node_inst,
> + uint32_t umc_inst, uint32_t ch_inst, void *data);
> +
>  struct amdgpu_umc_ras {
>   struct amdgpu_ras_block_object ras_block;
>   void (*err_cnt_init)(struct amdgpu_device *adev);
> @@ -104,4 +108,7 @@ int amdgpu_umc_process_ras_data_cb(struct amdgpu_device *adev,
>   struct amdgpu_iv_entry *entry);
>  int amdgpu_umc_page_retirement_mca(struct amdgpu_device *adev,
>   uint64_t err_addr, uint32_t ch_inst, uint32_t umc_inst);
> +
> +int amdgpu_umc_scan_all_umc_channels(struct amdgpu_device *adev,
> + umc_func func, void *data);
>  #endif
> diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v8_10.c b/drivers/gpu/drm/amd/amdgpu/umc_v8_10.c
> index fb55e8cb9967..6dff313ac04c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/umc_v8_10.c
> +++ b/drivers/gpu/drm/amd/amdgpu/umc_v8_10.c
> @@ -76,10 +76,13 @@ static inline uint32_t get_umc_v8_10_reg_offset(struct amdgpu_device *adev,
>   UMC_8_NODE_DIST * node_inst;
>  }
> 
> -static void umc_v8_10_clear_error_count_per_channel(struct amdgpu_device *adev,
> - uint32_t umc_reg_offset)
> +static int umc_v8_10_clear_error_count_per_channel(struct amdgpu_device *adev,
> + uint32_t node_inst, uint32_t umc_inst,
> + uint32_t ch_inst, void *data)
>  {
>   uint32_t ecc_err_cnt_addr

RE: [PATCH 1/2] drm/amdgpu: optimize redundant code in umc_v8_10

2023-04-03 Thread Zhou1, Tao
[AMD Official Use Only - General]



> -Original Message-
> From: Chai, Thomas 
> Sent: Monday, April 3, 2023 3:00 PM
> To: Zhou1, Tao ; amd-gfx@lists.freedesktop.org
> Cc: Zhang, Hawking ; Li, Candice
> ; Yang, Stanley 
> Subject: RE: [PATCH 1/2] drm/amdgpu: optimize redundant code in umc_v8_10
> 
> [...]
> 
> 
> >[Tao] for ASIC which doesn't support node, can we set its node_inst_num to 1
> and retire the macro LOOP_UMC_INST_AND_CH?
> 
> [Thomas] I am afraid not.
> 
>   " #define LOOP_UMC_NODE_INST(node_inst) \
>   for_each_set_bit((node_inst), &(adev->umc.active_mask),
> adev->umc.node_inst_num) "
> 
>   The node instance loop of LOOP_UMC_EACH_NODE_INST_AND_CH
> supports node harvest, so node_inst_num is not the real node instance number.

[Tao] we can set both node_inst_num and active_mask to 1, but either way is
fine for me.
BTW, I think amdgpu_umc_loop_channels is simpler than
amdgpu_umc_scan_all_umc_channels. With this fixed, the series is:

Reviewed-by: Tao Zhou 


RE: [PATCH 1/2] drm/amdgpu: optimize redundant code in umc_v8_10

2023-04-03 Thread Chai, Thomas
[AMD Official Use Only - General]

OK, Will do.


-
Best Regards,
Thomas

-Original Message-
From: Zhou1, Tao  
Sent: Monday, April 3, 2023 3:21 PM
To: Chai, Thomas ; amd-gfx@lists.freedesktop.org
Cc: Zhang, Hawking ; Li, Candice ; 
Yang, Stanley 
Subject: RE: [PATCH 1/2] drm/amdgpu: optimize redundant code in umc_v8_10

[AMD Official Use Only - General]



> [...]
> [Tao] we can set both node_inst_num and active_mask to 1, but either way is
> fine for me.
> BTW, I think amdgpu_umc_loop_channels is simpler than
> amdgpu_umc_scan_all_umc_channels. With this fixed, the series is:
> 
> Reviewed-by: Tao Zhou 


[PATCH v2 2/2] drm/amdgpu: Add MES KIQ clear to tell RLC that KIQ is dequeued

2023-04-03 Thread Yifan Zha
[Why]
As the MES KIQ is dequeued, tell the RLC that the KIQ is inactive.

[How]
Clear the RLC_CP_SCHEDULERS active bit, which the RLC checks for KIQ status.
In addition, the driver can then halt MES under SR-IOV when unloading the driver.

v2:
Use scheduler0 mask to clear KIQ portion of RLC_CP_SCHEDULERS

Signed-off-by: Yifan Zha 
---
 drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
index 4f0166a33732..67f7557d545d 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
@@ -1138,6 +1138,16 @@ static void mes_v11_0_kiq_setting(struct amdgpu_ring *ring)
WREG32_SOC15(GC, 0, regRLC_CP_SCHEDULERS, tmp);
 }
 
+static void mes_v11_0_kiq_clear(struct amdgpu_device *adev)
+{
+   uint32_t tmp;
+
+   /* tell RLC that the KIQ is dequeued */
+   tmp = RREG32_SOC15(GC, 0, regRLC_CP_SCHEDULERS);
+   tmp &= ~RLC_CP_SCHEDULERS__scheduler0_MASK;
+   WREG32_SOC15(GC, 0, regRLC_CP_SCHEDULERS, tmp);
+}
+
 static int mes_v11_0_kiq_hw_init(struct amdgpu_device *adev)
 {
int r = 0;
@@ -1182,10 +1192,10 @@ static int mes_v11_0_kiq_hw_fini(struct amdgpu_device *adev)
 
if (amdgpu_sriov_vf(adev)) {
mes_v11_0_kiq_dequeue(&adev->gfx.kiq.ring);
+   mes_v11_0_kiq_clear(adev);
}
 
-   if (!amdgpu_sriov_vf(adev))
-   mes_v11_0_enable(adev, false);
+   mes_v11_0_enable(adev, false);
 
return 0;
 }
-- 
2.25.1



Re: [pull] amdgpu, amdkfd, radeon drm-next-6.4

2023-04-03 Thread Daniel Vetter
On Fri, Mar 31, 2023 at 06:19:55PM -0400, Alex Deucher wrote:
> Hi Dave, Daniel,
> 
> More new stuff for 6.4.
> 
> The following changes since commit d36d68fd1925d33066d52468b7c7c6aca6521248:
> 
>   Merge tag 'drm-habanalabs-next-2023-03-20' of 
> https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux into drm-next 
> (2023-03-22 10:35:46 +1000)
> 
> are available in the Git repository at:
> 
>   https://gitlab.freedesktop.org/agd5f/linux.git 
> tags/amd-drm-next-6.4-2023-03-31
> 
> for you to fetch changes up to feae1bd80ec69a3a0011ba1fb88994785f705e3e:
> 
>   drm/amd/pm: enable sysfs node vclk1 and dclk1 for NV3X (2023-03-31 11:18:55 
> -0400)

Merged, thanks

> 
> 
> amd-drm-next-6.4-2023-03-31:
> 
> amdgpu:
> - Misc code cleanups
> - S4 fixes
> - MES fixes
> - SR-IOV fixes
> - Link DC backlight to connector device rather than PCI device
> - W=1 fixes
> - ASPM quirk
> - RAS fixes
> - DC dynamic split fixes and enablement for remaining chips
> - Navi1x SMU fix
> - Initial NBIO 7.9 support
> - Initial GC 9.4.3 support
> - Initial GFXHUB 1.2 support
> - Initial MMHUB 1.8 support
> - DCN 3.1.5 fixes
> - Initial DC FAMs infrastructure
> - Add support for 6.75Gbps link rates
> - Add sysfs nodes for secondary VCN clocks
> 
> amdkfd:
> - Initial support for GC 9.4.3
> 
> radeon:
> - Convert to client-based fbdev emulation
> 
> 
> Alex Deucher (3):
>   drm/amdgpu: drop the extra sign extension
>   Revert "drm/amdgpu/display: change pipe policy for DCN 2.0"
>   drm/amd/pm: enable TEMP_DEPENDENT_VMIN for navi1x
> 
> Alex Hung (1):
>   drm/amd/display: remove outdated 8bpc comments
> 
> Alvin Lee (6):
>   drm/amd/display: Enable FPO for configs that could reduce vlevel
>   drm/amd/display: Update FCLK change latency
>   drm/amd/display: Use per pipe P-State force for FPO
>   drm/amd/display: Only keep cursor p-state force for FPO
>   drm/amd/display: Enable FPO optimization
>   drm/amd/display: Uncomment assignments after HW headers are promoted
> 
> Amber Lin (2):
>   drm/amdkfd: Set noretry/xnack for GC 9.4.3
>   drm/amdkfd: Set TG_CHUNK_SIZE for GC 9.4.3
> 
> Anthony Koo (1):
>   drm/amd/display: [FW Promotion] Release 0.0.160.0
> 
> Aric Cyr (2):
>   drm/amd/display: 3.2.228
>   drm/amd/display: Promote DAL to 3.2.229
> 
> Artem Grishin (2):
>   drm/amd/display: Add support for 6.75 GBps link rate
>   drm/amd/display: Conditionally enable 6.75 GBps link rate
> 
> Ayush Gupta (1):
>   drm/amd/display: fixed dcn30+ underflow issue
> 
> Bill Liu (1):
>   drm/amdgpu: Adding CAP firmware initialization
> 
> Caio Novais (2):
>   drm/amd/display: Remove unused variable 'scl_enable'
>   drm/amd/display: Mark function 
> 'optc3_wait_drr_doublebuffer_pending_clear' as static
> 
> Charlene Liu (4):
>   drm/amd/display: update dio for two pixel per container case
>   drm/amd/display: Add CRC and DMUB test support
>   drm/amd/display: add missing code change init pix_per_cycle
>   drm/amd/display: update dig enable sequence
> 
> Christophe JAILLET (1):
>   drm/amd/display: Slightly optimize dm_dmub_outbox1_low_irq()
> 
> Dmytro Laktyushkin (1):
>   drm/amd/display: w/a for dcn315 inconsistent smu clock table
> 
> Hans de Goede (6):
>   drm/amd/display/amdgpu_dm: Fix backlight_device_register() error 
> handling
>   drm/amd/display/amdgpu_dm: Refactor register_backlight_device()
>   drm/amd/display/amdgpu_dm: Add a bl_idx to amdgpu_dm_connector
>   drm/amd/display/amdgpu_dm: Move most backlight setup into 
> setup_backlight_device()
>   drm/amd/display/amdgpu_dm: Make amdgpu_dm_register_backlight_device() 
> take an amdgpu_dm_connector
>   drm/amd/display/amdgpu_dm: Pass proper parent for backlight device 
> registration v3
> 
> Hawking Zhang (14):
>   drm/amdgpu: Initialize umc ras callback
>   drm/amdgpu: Add fatal error handling in nbio v4_3
>   drm/amdgpu: add nbio v7_9_0 ip headers
>   drm/amdgpu: add nbio v7_9 support
>   drm/amdgpu: init nbio v7_9 callbacks
>   drm/amdgpu: Set family for GC 9.4.3
>   drm/amdgpu: add athub v1_8_0 ip headers
>   drm/amdgpu: add osssys v4_4_2 ip headers
>   drm/amdgpu: add gc v9_4_3 ip headers
>   drm/amdgpu: add gmc ip block support for GC 9.4.3
>   drm/amdgpu: add mmhub v1_8_0 ip headers
>   drm/amdgpu: add GMC ip block for GC 9.4.3
>   drm/amdgpu: Correct xgmi_wafl block name
>   drm/amdkfd: Add GC 9.4.3 KFD support
> 
> Hersen Wu (3):
>   drm/amd/display: align commit_planes_for_stream to latest dc code
>   drm/amd/display: fix wrong index used in dccg32_set_dpstreamclk
>   drm/amd/display: Set dcn32 caps.seamless_odm
> 
> Jack Xiao (1):
>   drm/amd/amdgpu: limit one queue per gang
> 
> Jane Jian (2):
>   drm/amdgpu/gfx: set cg flags to enter/exit

RE: [PATCH v2 2/2] drm/amdgpu: Add MES KIQ clear to tell RLC that KIQ is dequeued

2023-04-03 Thread Chen, Horace
[AMD Official Use Only - General]

Reviewed-By: Horace Chen 

-Original Message-
From: Yifan Zha 
Sent: Monday, April 3, 2023 3:35 PM
To: amd-gfx@lists.freedesktop.org; Deucher, Alexander 
; Chen, Horace ; Zhang, Hawking 
; Chang, HaiJun 
Cc: Zha, YiFan(Even) 
Subject: [PATCH v2 2/2] drm/amdgpu: Add MES KIQ clear to tell RLC that KIQ is 
dequeued

[...]



RE: [PATCH 00/10] DC Patches Apr 3rd, 2023

2023-04-03 Thread Wheeler, Daniel
[Public]

Hi all,
 
This week this patchset was tested on the following systems:
 
Lenovo Thinkpad T14s Gen2, with AMD Ryzen 5 5650U 
Lenovo Thinkpad T13s Gen4 with AMD Ryzen 5 6600U
Reference AMD RX6800
 
These systems were tested on the following display types: 
eDP, (1080p 60hz [5650U]) (1920x1200 60hz [6600U]) (2560x1600 120hz[6600U])
VGA and DVI (1680x1050 60HZ [DP to VGA/DVI, USB-C to DVI/VGA])
DP/HDMI/USB-C (1440p 170hz, 4k 60hz, 4k 144hz [Includes USB-C to DP/HDMI 
adapters])
 
MST tested with Startech MST14DP123DP and 2x 4k 60Hz displays
DSC tested with Cable Matters 101075 (DP to 3x DP), and 201375 (USB-C to 3x DP) 
with 3x 4k60 displays
HP Hook G2 with 1 and 2 4k60 Displays
 
The testing is a mix of automated and manual tests. Manual testing includes 
(but is not limited to):
Changing display configurations and settings
Benchmark testing
Feature testing (Freesync, etc.)
 
Automated testing includes (but is not limited to):
Script testing (scripts to automate some of the manual checks)
IGT testing
 
The patchset consists of the amd-staging-drm-next branch (Head commit - 
705a9d96f697 drm/amd/display: Promote DAL to 3.2.229) with new patches added on 
top of it. This branch is used for both Ubuntu and Chrome OS testing (ChromeOS 
on a bi-weekly basis).
 
 
Tested on Ubuntu 22.04.2
 
Tested-by: Daniel Wheeler 
 
Thank you,

Dan Wheeler
Sr. Technologist  |  AMD
SW Display
--
1 Commerce Valley Dr E, Thornhill, ON L3T 7X6


-Original Message-
From: Zhuo, Qingqing (Lillian)  
Sent: March 30, 2023 4:57 AM
To: amd-gfx@lists.freedesktop.org
Cc: Wentland, Harry ; Li, Sun peng (Leo) 
; Lakha, Bhawanpreet ; Siqueira, 
Rodrigo ; Pillai, Aurabindo 
; Zhuo, Qingqing (Lillian) ; 
Li, Roman ; Lin, Wayne ; Wang, Chao-kai 
(Stylon) ; Chiu, Solomon ; Kotarac, 
Pavle ; Gutierrez, Agustin ; 
Wheeler, Daniel 
Subject: [PATCH 00/10] DC Patches Apr 3rd, 2023

This DC patchset brings improvements in multiple areas. In summary, we 
highlight:
- FW Release 0.0.161.0
- Improvements on FPO/FAMS
- Correction to DML calculation
- Fix to multiple clock related issues

Cc: Daniel Wheeler 

---

Alvin Lee (3):
  drm/amd/display: Clear FAMS flag if FAMS doesn't reduce vlevel
  drm/amd/display: Add FPO + VActive support
  drm/amd/display: On clock init, maintain DISPCLK freq

Anthony Koo (1):
  drm/amd/display: [FW Promotion] Release 0.0.161.0

Aric Cyr (1):
  drm/amd/display: 3.2.230

Charlene Liu (1):
  drm/amd/display: add dscclk instance offset check

Hamza Mahfooz (1):
  drm/amd/display: prep work for root clock optimization enablement for
DCN314

Michael Strauss (1):
  drm/amd/display: Improve robustness of FIXED_VS link training at DP1
rates

Paul Hsieh (1):
  drm/amd/display: Correct DML calculation to follow HW SPEC

Zhikai Zhai (1):
  drm/amd/display: add scaler control for dcn32

 .../display/dc/clk_mgr/dcn32/dcn32_clk_mgr.c  |  18 +
 drivers/gpu/drm/amd/display/dc/dc.h   |   6 +-
 drivers/gpu/drm/amd/display/dc/dc_dmub_srv.c  |  20 +
 .../gpu/drm/amd/display/dc/dcn20/dcn20_dccg.h |   8 +
 .../gpu/drm/amd/display/dc/dcn31/dcn31_dccg.c |  18 +
 .../drm/amd/display/dc/dcn31/dcn31_resource.c |   2 +
 .../drm/amd/display/dc/dcn314/dcn314_dccg.c   |  28 +-
 .../drm/amd/display/dc/dcn314/dcn314_dccg.h   |  10 +
 .../drm/amd/display/dc/dcn32/dcn32_hwseq.c|  26 +-
 .../drm/amd/display/dc/dcn32/dcn32_resource.c |   3 +
 .../drm/amd/display/dc/dcn32/dcn32_resource.h |   4 +
 .../display/dc/dcn32/dcn32_resource_helpers.c | 156 
 .../amd/display/dc/dcn321/dcn321_resource.c   |   3 +
 .../dc/dml/dcn30/display_mode_vba_30.c|   2 +-
 .../dc/dml/dcn31/display_mode_vba_31.c|   2 +-
 .../dc/dml/dcn314/display_mode_vba_314.c  |   2 +-
 .../drm/amd/display/dc/dml/dcn32/dcn32_fpu.c  |  85 +++-
 .../drm/amd/display/dc/dml/dcn32/dcn32_fpu.h  |   4 +
 .../dc/dml/dcn32/display_mode_vba_32.c|   2 +-
 .../gpu/drm/amd/display/dc/inc/hw/clk_mgr.h   |   3 +
 .../gpu/drm/amd/display/dc/link/link_dpms.c   |   8 +-
 .../dc/link/protocols/link_dp_training.c  |   5 +-
 .../link_dp_training_fixed_vs_pe_retimer.c| 378 +-
 .../link_dp_training_fixed_vs_pe_retimer.h|   5 +
 .../gpu/drm/amd/display/dmub/inc/dmub_cmd.h   |  28 +-
 25 files changed, 807 insertions(+), 19 deletions(-)

-- 
2.34.1


Re: [PATCH] drm/amd: Fix an out of bounds error in BIOS parser

2023-04-03 Thread Harry Wentland
On 4/2/23 18:08, Mario Limonciello wrote:
> The array is hardcoded to 8 in atomfirmware.h, but firmware sometimes
> provides a bigger one. Dereferencing the larger array causes an
> out-of-bounds error.
> 
> commit 4fc1ba4aa589 ("drm/amd/display: fix array index out of bound error
> in bios parser") fixed some of this, but there are two other cases
> not covered by it.  Fix those as well.
> 
> Reported-by: erhar...@mailbox.org
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=214853
> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2473
> Signed-off-by: Mario Limonciello 

Reviewed-by: Harry Wentland 

Harry

> ---
>  drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c | 7 ++-
>  1 file changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c b/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c
> index e381de2429fa..ae3783a7d7f4 100644
> --- a/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c
> +++ b/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c
> @@ -515,11 +515,8 @@ static enum bp_result get_gpio_i2c_info(
>   info->i2c_slave_address = record->i2c_slave_addr;
>  
>   /* TODO: check how to get register offset for en, Y, etc. */
> - info->gpio_info.clk_a_register_index =
> - le16_to_cpu(
> - header->gpio_pin[table_index].data_a_reg_index);
> - info->gpio_info.clk_a_shift =
> - header->gpio_pin[table_index].gpio_bitshift;
> + info->gpio_info.clk_a_register_index = le16_to_cpu(pin->data_a_reg_index);
> + info->gpio_info.clk_a_shift = pin->gpio_bitshift;
>  
>   return BP_RESULT_OK;
>  }
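
To illustrate the pattern with hypothetical types (the real layout lives
in atomfirmware.h): the declared array bound and the entry count the
firmware table reports can disagree, so the lookup is validated once and
the resulting pointer is reused, as the fix does with `pin`:

	struct pin {
		unsigned short data_a_reg_index;
		unsigned char  gpio_bitshift;
	};

	struct pin_table {
		unsigned short number_of_pins;	/* count reported by firmware */
		struct pin     gpio_pin[8];	/* hardcoded declared bound */
	};

	static const struct pin *get_pin(const struct pin_table *t, unsigned int idx)
	{
		/* validate in one place; callers reuse the returned pointer
		 * instead of re-indexing the fixed-size array at every use
		 */
		if (idx >= t->number_of_pins)
			return NULL;
		return &t->gpio_pin[idx];
	}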



Re: [PATCH] drm/amdgpu: simplify amdgpu_ras_eeprom.c

2023-04-03 Thread Luben Tuikov
On 2023-03-31 15:30, Alex Deucher wrote:
> On Tue, Mar 28, 2023 at 12:30 PM Luben Tuikov  wrote:
>>
>> On 2023-03-27 20:11, Alex Deucher wrote:
>>> All chips that support RAS also support IP discovery, so
>>> use the IP versions rather than a mix of IP versions and
>>> asic types.
>>>
>>> Signed-off-by: Alex Deucher 
>>> Cc: Luben Tuikov 
>>> ---
>>>  .../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c| 72 ++-
>>>  1 file changed, 20 insertions(+), 52 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
>>> index 3106fa8a15ef..c2ef2b1456bc 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
>>> @@ -106,48 +106,13 @@
>>>  #define to_amdgpu_device(x) (container_of(x, struct amdgpu_ras, eeprom_control))->adev
>>>
>>>  static bool __is_ras_eeprom_supported(struct amdgpu_device *adev)
>>> -{
>>> - if (adev->asic_type == CHIP_IP_DISCOVERY) {
>>> - switch (adev->ip_versions[MP1_HWIP][0]) {
>>> - case IP_VERSION(13, 0, 0):
>>> - case IP_VERSION(13, 0, 10):
>>> - return true;
>>> - default:
>>> - return false;
>>> - }
>>> - }
>>> -
>>> - return  adev->asic_type == CHIP_VEGA20 ||
>>> - adev->asic_type == CHIP_ARCTURUS ||
>>> - adev->asic_type == CHIP_SIENNA_CICHLID ||
>>> - adev->asic_type == CHIP_ALDEBARAN;
>>> -}
>>> -
>>> -static bool __get_eeprom_i2c_addr_arct(struct amdgpu_device *adev,
>>> -struct amdgpu_ras_eeprom_control *control)
>>> -{
>>> - struct atom_context *atom_ctx = adev->mode_info.atom_context;
>>> -
>>> - if (!control || !atom_ctx)
>>> - return false;
>>> -
>>> - if (strnstr(atom_ctx->vbios_version,
>>> - "D342",
>>> - sizeof(atom_ctx->vbios_version)))
>>> - control->i2c_address = EEPROM_I2C_MADDR_0;
>>> - else
>>> - control->i2c_address = EEPROM_I2C_MADDR_4;
>>> -
>>> - return true;
>>> -}
>>> -
>>> -static bool __get_eeprom_i2c_addr_ip_discovery(struct amdgpu_device *adev,
>>> -struct amdgpu_ras_eeprom_control *control)
>>>  {
>>>   switch (adev->ip_versions[MP1_HWIP][0]) {
>>> + case IP_VERSION(11, 0, 2): /* VEGA20 and ARCTURUS */
>>> + case IP_VERSION(11, 0, 7):
>>>   case IP_VERSION(13, 0, 0):
>>> + case IP_VERSION(13, 0, 2):
>>>   case IP_VERSION(13, 0, 10):
>>
>> I'd add the rest of the proper names here which are being deleted by this 
>> change,
>> so as to not lose this information by this commit: Sienna Cichlid and 
>> Aldebaran,
>> the rest can be left blank as per the current state of the code.
> 
> Fixed.
> 
>>
>>> - control->i2c_address = EEPROM_I2C_MADDR_4;
>>>   return true;
>>>   default:
>>>   return false;
>>> @@ -178,29 +143,32 @@ static bool __get_eeprom_i2c_addr(struct amdgpu_device *adev,
>>>   return true;
>>>   }
>>>
>>> - switch (adev->asic_type) {
>>> - case CHIP_VEGA20:
>>> - control->i2c_address = EEPROM_I2C_MADDR_0;
>>> + switch (adev->ip_versions[MP1_HWIP][0]) {
>>> + case IP_VERSION(11, 0, 2):
>>> + /* VEGA20 and ARCTURUS */
>>> + if (adev->asic_type == CHIP_VEGA20)
>>> + control->i2c_address = EEPROM_I2C_MADDR_0;
>>> + else if (strnstr(atom_ctx->vbios_version,
>>
>> In the code this is qualified with atom_ctx != NULL; and if it is,
>> then we return false. So, this is fine, iff we can guarantee that
>> "atom_ctx" will never be NULL. If, OTOH, we cannot guarantee that,
>> then we need to add,
>> else if (!atom_ctx)
>> return false;
>> else if (strnstr(...
>>
>> Although, I do recognize that for Aldebaran below, we do not qualify
>> atom_ctx, so we should probably qualify there too.
> 
> This function is called after the vbios is initialized so I think we
> can drop the check.  vbios is fetched in amdgpu_device_ip_early_init()
> and ras is initialized in amdgpu_device_ip_init() which is called much
> later.

Okay, so we can guarantee that atom_ctx is not NULL at this point.
Add my,

Reviewed-by: Luben Tuikov 

And if in the wild we see that it is, it'll be an easy fix.

Regards,
Luben


> 
> Alex
> 
>>
>>> +  "D342",
>>> +  sizeof(atom_ctx->vbios_version)))
>>> + control->i2c_address = EEPROM_I2C_MADDR_0;
>>> + else
>>> + control->i2c_address = EEPROM_I2C_MADDR_4;
>>>   return true;
>>> -
>>> - case CHIP_ARCTURUS:
>>> - return __get_eeprom_i2c_addr_arct(adev, control);
>>> -
>>> - case CHIP_SIENNA_CICHLID:
>>> + case IP_VERSION(

Re: [PATCH] drm/amdgpu: simplify amdgpu_ras_eeprom.c

2023-04-03 Thread Luben Tuikov
This patch is,

Reviewed-by: Luben Tuikov 

Regards,
Luben

On 2023-03-31 15:54, Alex Deucher wrote:
> All chips that support RAS also support IP discovery, so
> use the IP versions rather than a mix of IP versions and
> asic types.  Checking the validity of the atom_ctx pointer
> is not required as the vbios is already fetched at this
> point.
> 
> v2: add comments to id asic types based on feedback from Luben
> 
> Signed-off-by: Alex Deucher 
> Cc: Luben Tuikov 
> ---
>  .../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c| 72 ++-
>  1 file changed, 20 insertions(+), 52 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
> index 3106fa8a15ef..c2c2a7718613 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
> @@ -106,48 +106,13 @@
>  #define to_amdgpu_device(x) (container_of(x, struct amdgpu_ras, 
> eeprom_control))->adev
>  
>  static bool __is_ras_eeprom_supported(struct amdgpu_device *adev)
> -{
> - if (adev->asic_type == CHIP_IP_DISCOVERY) {
> - switch (adev->ip_versions[MP1_HWIP][0]) {
> - case IP_VERSION(13, 0, 0):
> - case IP_VERSION(13, 0, 10):
> - return true;
> - default:
> - return false;
> - }
> - }
> -
> - return  adev->asic_type == CHIP_VEGA20 ||
> - adev->asic_type == CHIP_ARCTURUS ||
> - adev->asic_type == CHIP_SIENNA_CICHLID ||
> - adev->asic_type == CHIP_ALDEBARAN;
> -}
> -
> -static bool __get_eeprom_i2c_addr_arct(struct amdgpu_device *adev,
> -struct amdgpu_ras_eeprom_control *control)
> -{
> - struct atom_context *atom_ctx = adev->mode_info.atom_context;
> -
> - if (!control || !atom_ctx)
> - return false;
> -
> - if (strnstr(atom_ctx->vbios_version,
> - "D342",
> - sizeof(atom_ctx->vbios_version)))
> - control->i2c_address = EEPROM_I2C_MADDR_0;
> - else
> - control->i2c_address = EEPROM_I2C_MADDR_4;
> -
> - return true;
> -}
> -
> -static bool __get_eeprom_i2c_addr_ip_discovery(struct amdgpu_device *adev,
> -struct amdgpu_ras_eeprom_control *control)
>  {
>   switch (adev->ip_versions[MP1_HWIP][0]) {
> + case IP_VERSION(11, 0, 2): /* VEGA20 and ARCTURUS */
> + case IP_VERSION(11, 0, 7): /* Sienna cichlid */
>   case IP_VERSION(13, 0, 0):
> + case IP_VERSION(13, 0, 2): /* Aldebaran */
>   case IP_VERSION(13, 0, 10):
> - control->i2c_address = EEPROM_I2C_MADDR_4;
>   return true;
>   default:
>   return false;
> @@ -178,29 +143,32 @@ static bool __get_eeprom_i2c_addr(struct amdgpu_device *adev,
>   return true;
>   }
>  
> - switch (adev->asic_type) {
> - case CHIP_VEGA20:
> - control->i2c_address = EEPROM_I2C_MADDR_0;
> + switch (adev->ip_versions[MP1_HWIP][0]) {
> + case IP_VERSION(11, 0, 2):
> + /* VEGA20 and ARCTURUS */
> + if (adev->asic_type == CHIP_VEGA20)
> + control->i2c_address = EEPROM_I2C_MADDR_0;
> + else if (strnstr(atom_ctx->vbios_version,
> +  "D342",
> +  sizeof(atom_ctx->vbios_version)))
> + control->i2c_address = EEPROM_I2C_MADDR_0;
> + else
> + control->i2c_address = EEPROM_I2C_MADDR_4;
>   return true;
> -
> - case CHIP_ARCTURUS:
> - return __get_eeprom_i2c_addr_arct(adev, control);
> -
> - case CHIP_SIENNA_CICHLID:
> + case IP_VERSION(11, 0, 7):
>   control->i2c_address = EEPROM_I2C_MADDR_0;
>   return true;
> -
> - case CHIP_ALDEBARAN:
> + case IP_VERSION(13, 0, 2):
>   if (strnstr(atom_ctx->vbios_version, "D673",
>   sizeof(atom_ctx->vbios_version)))
>   control->i2c_address = EEPROM_I2C_MADDR_4;
>   else
>   control->i2c_address = EEPROM_I2C_MADDR_0;
>   return true;
> -
> - case CHIP_IP_DISCOVERY:
> - return __get_eeprom_i2c_addr_ip_discovery(adev, control);
> -
> + case IP_VERSION(13, 0, 0):
> + case IP_VERSION(13, 0, 10):
> + control->i2c_address = EEPROM_I2C_MADDR_4;
> + return true;
>   default:
>   return false;
>   }



[PATCH] drm/amdkfd: Fix dmabuf's redundant eviction when unmapping

2023-04-03 Thread Eric Huang
The dmabuf is allocated/mapped in the GTT domain. When dma-unmapping the
dmabuf, changing its placement to CPU triggers memory eviction in
ttm_bo_validate, and the eviction causes a performance drop.
Keeping the correct domain solves the issue.

Signed-off-by: Eric Huang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index a3b09edfd1bf..17b708acb447 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -642,7 +642,7 @@ kfd_mem_dmaunmap_dmabuf(struct kfd_mem_attachment *attachment)
struct ttm_operation_ctx ctx = {.interruptible = true};
struct amdgpu_bo *bo = attachment->bo_va->base.bo;
 
-   amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_CPU);
+   amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_GTT);
ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
 }
 
-- 
2.34.1



[RFC PATCH 0/4] uapi, drm: Add and implement RLIMIT_GPUPRIO

2023-04-03 Thread Joshua Ashton
Hello all!

I would like to propose a new API for allowing processes to control
the priority of GPU queues similar to RLIMIT_NICE/RLIMIT_RTPRIO.

The main reason for this is for compositors such as Gamescope and
SteamVR vrcompositor to be able to create realtime async compute
queues on AMD without the need of CAP_SYS_NICE.

The current situation is bad for a few reasons, one being that in order
to setcap the executable, typically one must run as root, which involves
a pretty high privilege escalation in order to achieve one small feat:
a realtime async compute queue for VR or a compositor.
The executable cannot be setcap'ed inside a
container, nor can the setcap'ed executable be run in a container with
NO_NEW_PRIVS.

I go into more detail in the description in
`uapi: Add RLIMIT_GPUPRIO`.

My initial proposal here is to add a new RLIMIT, `RLIMIT_GPUPRIO`,
which seems to make most initial sense to me to solve the problem.

I am definitely not set that this is the best formulation, however, or
whether this should be linked to DRM (in terms of its scheduler
priority enum/definitions) in any way, and would really like other
people's opinions across the stack on this.

One initial concern is that potentially this RLIMIT could out-live
the lifespan of DRM. It sounds crazy saying it right now, something
that definitely popped into my mind when touching `resource.h`. :-)

Anyway, please let me know what you think!
Definitely open to any feedback and advice you may have. :D

Thanks!
 - Joshie

Joshua Ashton (4):
  drm/scheduler: Add DRM_SCHED_PRIORITY_VERY_HIGH
  drm/scheduler: Split out drm_sched_priority to own file
  uapi: Add RLIMIT_GPUPRIO
  drm/amd/amdgpu: Check RLIMIT_GPUPRIO in priority permissions

 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 13 ++--
 drivers/gpu/drm/msm/msm_gpu.h   |  2 +-
 fs/proc/base.c  |  1 +
 include/asm-generic/resource.h  |  3 +-
 include/drm/drm_sched_priority.h| 41 +
 include/drm/gpu_scheduler.h | 14 +
 include/uapi/asm-generic/resource.h |  3 +-
 7 files changed, 58 insertions(+), 19 deletions(-)
 create mode 100644 include/drm/drm_sched_priority.h

-- 
2.40.0



[RFC PATCH 2/4] drm/scheduler: Split out drm_sched_priority to own file

2023-04-03 Thread Joshua Ashton
This allows it to be used by other parts of the codebase without fear
of a circular include dependency being introduced.

Signed-off-by: Joshua Ashton 
---
 include/drm/drm_sched_priority.h | 41 
 include/drm/gpu_scheduler.h  | 15 +---
 2 files changed, 42 insertions(+), 14 deletions(-)
 create mode 100644 include/drm/drm_sched_priority.h

diff --git a/include/drm/drm_sched_priority.h b/include/drm/drm_sched_priority.h
new file mode 100644
index ..85a7bb011e27
--- /dev/null
+++ b/include/drm/drm_sched_priority.h
@@ -0,0 +1,41 @@
+/*
+ * Copyright 2015 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#ifndef _DRM_SCHED_PRIORITY_H_
+#define _DRM_SCHED_PRIORITY_H_
+
+/* These are often used as an (initial) index
+ * to an array, and as such should start at 0.
+ */
+enum drm_sched_priority {
+   DRM_SCHED_PRIORITY_MIN,
+   DRM_SCHED_PRIORITY_NORMAL,
+   DRM_SCHED_PRIORITY_HIGH,
+   DRM_SCHED_PRIORITY_VERY_HIGH,
+   DRM_SCHED_PRIORITY_KERNEL,
+
+   DRM_SCHED_PRIORITY_COUNT,
+   DRM_SCHED_PRIORITY_UNSET = -2
+};
+
+#endif
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index a62071660602..9228ff0d515e 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define MAX_WAIT_SCHED_ENTITY_Q_EMPTY msecs_to_jiffies(1000)
 
@@ -48,20 +49,6 @@ struct drm_gem_object;
 struct drm_gpu_scheduler;
 struct drm_sched_rq;
 
-/* These are often used as an (initial) index
- * to an array, and as such should start at 0.
- */
-enum drm_sched_priority {
-   DRM_SCHED_PRIORITY_MIN,
-   DRM_SCHED_PRIORITY_NORMAL,
-   DRM_SCHED_PRIORITY_HIGH,
-   DRM_SCHED_PRIORITY_VERY_HIGH,
-   DRM_SCHED_PRIORITY_KERNEL,
-
-   DRM_SCHED_PRIORITY_COUNT,
-   DRM_SCHED_PRIORITY_UNSET = -2
-};
-
 /* Used to chose between FIFO and RR jobs scheduling */
 extern int drm_sched_policy;
 
-- 
2.40.0



[RFC PATCH 1/4] drm/scheduler: Add DRM_SCHED_PRIORITY_VERY_HIGH

2023-04-03 Thread Joshua Ashton
This allows AMDGPU scheduler priority above normal to be expressed
using the DRM_SCHED_PRIORITY enum.

Signed-off-by: Joshua Ashton 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 2 +-
 drivers/gpu/drm/msm/msm_gpu.h   | 2 +-
 include/drm/gpu_scheduler.h | 1 +
 3 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index d2139ac12159..8ec255091c4a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -79,7 +79,7 @@ amdgpu_ctx_to_drm_sched_prio(int32_t ctx_prio)
return DRM_SCHED_PRIORITY_HIGH;
 
case AMDGPU_CTX_PRIORITY_VERY_HIGH:
-   return DRM_SCHED_PRIORITY_HIGH;
+   return DRM_SCHED_PRIORITY_VERY_HIGH;
 
/* This should not happen as we sanitized userspace provided priority
 * already, WARN if this happens.
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index fc1c0d8611a8..e3495712b236 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -336,7 +336,7 @@ struct msm_gpu_perfcntr {
  * DRM_SCHED_PRIORITY_KERNEL priority level is treated specially in some
  * cases, so we don't use it (no need for kernel generated jobs).
  */
-#define NR_SCHED_PRIORITIES (1 + DRM_SCHED_PRIORITY_HIGH - DRM_SCHED_PRIORITY_MIN)
+#define NR_SCHED_PRIORITIES (1 + DRM_SCHED_PRIORITY_VERY_HIGH - DRM_SCHED_PRIORITY_MIN)
 
 /**
  * struct msm_file_private - per-drm_file context
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 9935d1e2ff69..a62071660602 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -55,6 +55,7 @@ enum drm_sched_priority {
DRM_SCHED_PRIORITY_MIN,
DRM_SCHED_PRIORITY_NORMAL,
DRM_SCHED_PRIORITY_HIGH,
+   DRM_SCHED_PRIORITY_VERY_HIGH,
DRM_SCHED_PRIORITY_KERNEL,
 
DRM_SCHED_PRIORITY_COUNT,
-- 
2.40.0



[RFC PATCH 3/4] uapi: Add RLIMIT_GPUPRIO

2023-04-03 Thread Joshua Ashton
Introduce a new RLIMIT that allows the user to set a runtime limit on
the GPU scheduler priority for tasks.

This avoids the need for leased compositors such as SteamVR's
vrcompositor to be launched via a setcap'ed executable with
CAP_SYS_NICE.

This is required for SteamVR as it doesn't run as a DRM master, but
rather on a DRM lease using the HMD's connector.

The current situation is bad for a few reasons, one being that in order
to setcap the executable, typically one must run as root, which involves
a pretty high privilege escalation in order to achieve one small feat:
a realtime async compute queue for VR or a compositor.
The executable cannot be setcap'ed inside a
container, nor can the setcap'ed executable be run in a container with
NO_NEW_PRIVS.

Even in cases where one may think the DRM master check to be useful,
such as Gamescope where it is the DRM master, the part of the compositor
that runs as the DRM master is entirely separate to the Vulkan device
with it's own DRM device fd doing the GPU work that demands the
realtime priority queue. Additionally, Gamescope can also run nested
in a traditional compositor where there is no DRM master, but having a
realtime queue is still advantageous.

By adding RLIMIT_GPUPRIO, a process outside of a container (or e.g.
rtkit) could call `prlimit` on the process inside to allow it to make
a realtime queue and solve these problems. As a sketch (assuming the
resource number proposed below and DRM_SCHED_PRIORITY_VERY_HIGH = 3
from patch 1/4):
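
	#define _GNU_SOURCE
	#include <stdio.h>
	#include <stdlib.h>
	#include <sys/resource.h>

	#ifndef RLIMIT_GPUPRIO
	#define RLIMIT_GPUPRIO 16	/* value proposed by this patch */
	#endif

	int main(int argc, char **argv)
	{
		/* allow the target task to create queues up to VERY_HIGH (= 3) */
		struct rlimit lim = { .rlim_cur = 3, .rlim_max = 3 };

		if (argc != 2) {
			fprintf(stderr, "usage: %s <pid>\n", argv[0]);
			return 1;
		}

		if (prlimit((pid_t)atoi(argv[1]), RLIMIT_GPUPRIO, &lim, NULL)) {
			perror("prlimit");
			return 1;
		}
		return 0;
	}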

Signed-off-by: Joshua Ashton 
---
 fs/proc/base.c  | 1 +
 include/asm-generic/resource.h  | 3 ++-
 include/uapi/asm-generic/resource.h | 3 ++-
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 5e0e0ccd47aa..a5c9a9f23f08 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -589,6 +589,7 @@ static const struct limit_names lnames[RLIM_NLIMITS] = {
[RLIMIT_NICE] = {"Max nice priority", NULL},
[RLIMIT_RTPRIO] = {"Max realtime priority", NULL},
[RLIMIT_RTTIME] = {"Max realtime timeout", "us"},
+   [RLIMIT_GPUPRIO] = {"Max DRM GPU priority", NULL},
 };
 
 /* Display limits for a process */
diff --git a/include/asm-generic/resource.h b/include/asm-generic/resource.h
index 8874f681b056..cefee1a8d9db 100644
--- a/include/asm-generic/resource.h
+++ b/include/asm-generic/resource.h
@@ -3,7 +3,7 @@
 #define _ASM_GENERIC_RESOURCE_H
 
 #include 
-
+#include 
 
 /*
  * boot-time rlimit defaults for the init task:
@@ -26,6 +26,7 @@
[RLIMIT_NICE]   = { 0, 0 }, \
[RLIMIT_RTPRIO] = { 0, 0 }, \
[RLIMIT_RTTIME] = {  RLIM_INFINITY,  RLIM_INFINITY },   \
+   [RLIMIT_GPUPRIO]= { DRM_SCHED_PRIORITY_NORMAL, DRM_SCHED_PRIORITY_NORMAL }, \
 }
 
 #endif
diff --git a/include/uapi/asm-generic/resource.h 
b/include/uapi/asm-generic/resource.h
index f12db7a0da64..85027b07a420 100644
--- a/include/uapi/asm-generic/resource.h
+++ b/include/uapi/asm-generic/resource.h
@@ -46,7 +46,8 @@
   0-39 for nice level 19 .. -20 */
 #define RLIMIT_RTPRIO  14  /* maximum realtime priority */
 #define RLIMIT_RTTIME  15  /* timeout for RT tasks in us */
-#define RLIM_NLIMITS   16
+#define RLIMIT_GPUPRIO 16  /* maximum GPU priority */
+#define RLIM_NLIMITS   17
 
 /*
  * SuS says limits have to be unsigned.
-- 
2.40.0



[RFC PATCH 4/4] drm/amd/amdgpu: Check RLIMIT_GPUPRIO in priority permissions

2023-04-03 Thread Joshua Ashton
Add support for the new RLIMIT_GPUPRIO when doing the priority
checks creating an amdgpu_ctx.

Signed-off-by: Joshua Ashton 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index 8ec255091c4a..4ac645455bc1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -28,6 +28,8 @@
 #include "amdgpu_sched.h"
 #include "amdgpu_ras.h"
 #include 
+#include 
+#include 
 
 #define to_amdgpu_ctx_entity(e)\
container_of((e), struct amdgpu_ctx_entity, entity)
@@ -94,11 +96,16 @@ amdgpu_ctx_to_drm_sched_prio(int32_t ctx_prio)
 static int amdgpu_ctx_priority_permit(struct drm_file *filp,
  int32_t priority)
 {
+   enum drm_sched_priority in_drm_priority, rlim_drm_priority;
+
if (!amdgpu_ctx_priority_is_valid(priority))
return -EINVAL;
 
-   /* NORMAL and below are accessible by everyone */
-   if (priority <= AMDGPU_CTX_PRIORITY_NORMAL)
+   /* Check priority against RLIMIT to see what is allowed. */
+   in_drm_priority = amdgpu_ctx_to_drm_sched_prio(priority);
+   rlim_drm_priority = (enum drm_sched_priority)rlimit(RLIMIT_GPUPRIO);
+
+   if (in_drm_priority <= rlim_drm_priority)
return 0;
 
if (capable(CAP_SYS_NICE))
-- 
2.40.0



Re: [RFC PATCH 1/4] drm/scheduler: Add DRM_SCHED_PRIORITY_VERY_HIGH

2023-04-03 Thread Christian König

Am 03.04.23 um 21:40 schrieb Joshua Ashton:

This allows AMDGPU scheduler priority above normal to be expressed
using the DRM_SCHED_PRIORITY enum.


That was rejected before, I just don't remember why exactly. Need to dig 
that up again.


Christian.



[...]


Re: [RFC PATCH 0/4] uapi, drm: Add and implement RLIMIT_GPUPRIO

2023-04-03 Thread Christian König

Am 03.04.23 um 21:40 schrieb Joshua Ashton:

Hello all!

I would like to propose a new API for allowing processes to control
the priority of GPU queues similar to RLIMIT_NICE/RLIMIT_RTPRIO.

[...]

Well, the basic problem is that higher priority queues can be used to
starve low priority queues.

This starvation in turn is very, very bad for memory management, since the
dma_fences the GPU scheduler deals with have very strong restrictions.

Even exposing this under CAP_SYS_NICE is questionable, so we will most
likely have to NAK this.


Regards,
Christian.







[PATCH] drm/amdkfd: On GFX11 check PCIe atomics support and set CP_HQD_HQ_STATUS0[29]

2023-04-03 Thread Sreekant Somasekharan
On GFX11, the CP_HQD_HQ_STATUS0[29] bit will be used by CPFW to acknowledge
whether PCIe atomics are supported. The default value of this bit is 0.
The driver will check whether PCIe atomics are supported and set the
bit to 1 if they are, which forces CPFW to use real atomic ops.
If the bit is not set, CPFW will default to read/modify/write using the
firmware itself.

This is applicable only to RS64-based GFX11 with MEC FW greater than or
equal to version 509. If the MEC FW is older than 509, PCIe atomics need
to be supported; otherwise the device will be skipped.

This commit also involves moving amdgpu_amdkfd_device_probe() function
call after per-IP early_init loop in amdgpu_device_ip_early_init()
function so as to check for RS64 enabled device.

Signed-off-by: Sreekant Somasekharan 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c   |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_device.c  | 11 +++
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c |  9 +
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 7116119ed038..b3a754ca0923 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2150,7 +2150,6 @@ static int amdgpu_device_ip_early_init(struct amdgpu_device *adev)
adev->has_pr3 = parent ? pci_pr3_present(parent) : false;
}
 
-   amdgpu_amdkfd_device_probe(adev);
 
adev->pm.pp_feature = amdgpu_pp_feature_mask;
if (amdgpu_sriov_vf(adev) || sched_policy == KFD_SCHED_POLICY_NO_HWS)
@@ -2206,6 +2205,7 @@ static int amdgpu_device_ip_early_init(struct amdgpu_device *adev)
if (!total)
return -ENODEV;
 
+   amdgpu_amdkfd_device_probe(adev);
adev->cg_flags &= amdgpu_cg_mask;
adev->pg_flags &= amdgpu_pg_mask;
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 521dfa88aad8..64a295a35d37 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -204,6 +204,17 @@ static void kfd_device_info_init(struct kfd_dev *kfd,
/* Navi1x+ */
if (gc_version >= IP_VERSION(10, 1, 1))
kfd->device_info.needs_pci_atomics = true;
+   } else if (gc_version < IP_VERSION(12, 0, 0)) {
+   /* On GFX11 running on RS64, MEC FW version must be greater than
+* or equal to version 509 to support acknowledging whether
+* PCIe atomics are supported. Before MEC version 509, PCIe
+* atomics are required. After that, the FW's use of atomics
+* is controlled by CP_HQD_HQ_STATUS0[29].
+* This will fail on GFX11 when PCIe atomics are not supported
+* and MEC FW version < 509 for RS64 based CPFW.
+*/
+   kfd->device_info.needs_pci_atomics = true;
+   kfd->device_info.no_atomic_fw_version = kfd->adev->gfx.rs64_enable ? 509 : 0;
}
} else {
kfd->device_info.doorbell_size = 4;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c
index 4a9af800b1f1..c5ea594abbf6 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c
@@ -143,6 +143,15 @@ static void init_mqd(struct mqd_manager *mm, void **mqd,
1 << CP_HQD_QUANTUM__QUANTUM_SCALE__SHIFT |
1 << CP_HQD_QUANTUM__QUANTUM_DURATION__SHIFT;
 
+   /*
+* If PCIe atomics are supported, set CP_HQD_HQ_STATUS0[29] == 1
+* to force CPFW to use atomics. This is supported only on RS64
+* based CPFW with MEC FW version >= 509. On older FW, platforms
+* running GFX11 must support PCIe atomics or the device is skipped.
+*/
+   if (amdgpu_amdkfd_have_atomics_support(mm->dev->adev))
+   m->cp_hqd_hq_status0 |= 1 << 29;
+
if (q->format == KFD_QUEUE_FORMAT_AQL) {
m->cp_hqd_aql_control =
1 << CP_HQD_AQL_CONTROL__CONTROL0__SHIFT;
-- 
2.25.1



Re: [RFC PATCH 0/4] uapi, drm: Add and implement RLIMIT_GPUPRIO

2023-04-03 Thread Joshua Ashton




On 4/3/23 20:54, Christian König wrote:

On 03.04.23 at 21:40, Joshua Ashton wrote:

Hello all!

I would like to propose a new API for allowing processes to control
the priority of GPU queues similar to RLIMIT_NICE/RLIMIT_RTPRIO.

The main reason for this is for compositors such as Gamescope and
SteamVR vrcompositor to be able to create realtime async compute
queues on AMD without the need of CAP_SYS_NICE.
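
For context, the path being gated today is amdgpu's context-priority
check, which refuses anything above NORMAL unless the caller holds
CAP_SYS_NICE. A rough sketch of what a compositor currently has to do
(ioctl and struct names from include/uapi/drm/amdgpu_drm.h as I
understand them; illustrative only, not authoritative):

/* Sketch: allocate a high-priority amdgpu context; today this fails
 * with EACCES unless the process has CAP_SYS_NICE.
 */
#include <string.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <drm/amdgpu_drm.h>   /* via the libdrm headers */

int alloc_high_prio_ctx(int drm_fd)
{
	union drm_amdgpu_ctx args;

	memset(&args, 0, sizeof(args));
	args.in.op = AMDGPU_CTX_OP_ALLOC_CTX;
	args.in.priority = AMDGPU_CTX_PRIORITY_HIGH; /* above NORMAL */

	if (ioctl(drm_fd, DRM_IOCTL_AMDGPU_CTX, &args) < 0) {
		perror("DRM_IOCTL_AMDGPU_CTX");
		return -1;
	}
	return args.out.alloc.ctx_id;
}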

The current situation is bad for a few reasons. One is that in order
to setcap the executable, one typically must run as root, which is a
fairly large privilege escalation to achieve one small feat: a realtime
async compute queue for VR or a compositor. The executable also cannot
be setcap'ed inside a container, nor can the setcap'ed executable be
run in a container with NO_NEW_PRIVS.

I go into more detail in the description in
`uapi: Add RLIMIT_GPUPRIO`.

My initial proposal here is to add a new RLIMIT, `RLIMIT_GPUPRIO`,
which seems to make most initial sense to me to solve the problem.
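
To make the shape of that concrete, here is a sketch of what opting in
could look like from userspace. Note that RLIMIT_GPUPRIO does not exist
yet, so the resource number and the "limit caps the highest allowed
priority" semantics below are purely assumptions for illustration:

/* Hypothetical sketch: RLIMIT_GPUPRIO is not merged uapi; the value 16
 * and the semantics are assumed here for illustration only.
 */
#include <stdio.h>
#include <sys/resource.h>

#ifndef RLIMIT_GPUPRIO
#define RLIMIT_GPUPRIO 16 /* assumed value for this sketch */
#endif

int main(void)
{
	struct rlimit rl;

	/* A launcher (e.g. a systemd unit) raised our hard limit;
	 * raise the soft limit to match before creating GPU queues.
	 */
	if (getrlimit(RLIMIT_GPUPRIO, &rl) == 0) {
		rl.rlim_cur = rl.rlim_max;
		if (setrlimit(RLIMIT_GPUPRIO, &rl) != 0)
			perror("setrlimit(RLIMIT_GPUPRIO)");
	}

	/* ...then request queue priorities up to the limit... */
	return 0;
}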

I am definitely not set on this being the best formulation, however,
nor on whether it should be linked to DRM (in terms of its scheduler
priority enum/definitions) in any way, and would really like other
people's opinions across the stack on this.

One initial concern is that this RLIMIT could potentially outlive
the lifespan of DRM. It sounds crazy saying it right now, but it is
something that definitely popped into my mind when touching `resource.h`. :-)

Anyway, please let me know what you think!
Definitely open to any feedback and advice you may have. :D


Well the basic problem is that higher priority queues can be used to 
starve low priority queues.


This starvation in turn is very, very bad for memory management, since
the dma_fences the GPU scheduler deals with have very strong
restrictions: once a fence is published it must signal in bounded time,
and an indefinitely starved queue breaks that guarantee.


Even exposing this under CAP_SYS_NICE is questionable, so we will most 
likely have to NAK this.


This is already exposed with CAP_SYS_NICE and is relied on by SteamVR 
for async reprojection and Gamescope's composite path on Steam Deck.


Having a high priority async compute queue is really really important 
and advantageous for these tasks.


The majority of usecases for something like this is going to be a 
compositor which does some really tiny amount of work per-frame but is 
incredibly latency dependent (as it depends on latching onto buffers 
just before vblank to do its work).


Starving and surpassing work on other queues is kind of the entire 
point. Gamescope and SteamVR do it on ACE as well so GFX work can run 
alongside it.


- Joshie 🐸✨



Regards,
Christian.



Thanks!
  - Joshie

Joshua Ashton (4):
   drm/scheduler: Add DRM_SCHED_PRIORITY_VERY_HIGH
   drm/scheduler: Split out drm_sched_priority to own file
   uapi: Add RLIMIT_GPUPRIO
   drm/amd/amdgpu: Check RLIMIT_GPUPRIO in priority permissions

  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 13 ++--
  drivers/gpu/drm/msm/msm_gpu.h   |  2 +-
  fs/proc/base.c  |  1 +
  include/asm-generic/resource.h  |  3 +-
  include/drm/drm_sched_priority.h    | 41 +
  include/drm/gpu_scheduler.h | 14 +
  include/uapi/asm-generic/resource.h |  3 +-
  7 files changed, 58 insertions(+), 19 deletions(-)
  create mode 100644 include/drm/drm_sched_priority.h







[linux-next:master] BUILD REGRESSION 31bd35b66249699343d2416658f57e97314a433a

2023-04-03 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
branch HEAD: 31bd35b66249699343d2416658f57e97314a433a  Add linux-next specific files for 20230403

Error/Warning reports:

https://lore.kernel.org/oe-kbuild-all/202303082135.njdx1bij-...@intel.com
https://lore.kernel.org/oe-kbuild-all/202303161521.jbgbafjj-...@intel.com
https://lore.kernel.org/oe-kbuild-all/202304040401.imxt7ubi-...@intel.com

Error/Warning: (recently discovered and may have been fixed)

drivers/gpu/drm/amd/amdgpu/../display/dc/link/link_validation.c:351:13: warning: variable 'bw_needed' set but not used [-Wunused-but-set-variable]
drivers/gpu/drm/amd/amdgpu/../display/dc/link/link_validation.c:352:25: warning: variable 'link' set but not used [-Wunused-but-set-variable]
drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu13/smu_v13_0_6_ppt.c:309:17: sparse: int
drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu13/smu_v13_0_6_ppt.c:309:17: sparse: void
drivers/net/wireless/legacy/ray_cs.c:628:17: warning: 'strncpy' specified bound 32 equals destination size [-Wstringop-truncation]

Unverified Error/Warning (likely false positive, please contact us if interested):

drivers/acpi/property.c:985 acpi_data_prop_read_single() error: potentially dereferencing uninitialized 'obj'.
drivers/cdx/cdx.c:393:20: error: initialization of 'ssize_t (*)(const struct bus_type *, const char *, size_t)' {aka 'long int (*)(const struct bus_type *, const char *, long unsigned int)'} from incompatible pointer type 'ssize_t (*)(struct bus_type *, const char *, size_t)' {aka 'long int (*)(struct bus_type *, const char *, long unsigned int)'} [-Werror=incompatible-pointer-types]
drivers/pinctrl/pinctrl-mlxbf3.c:162:20: sparse: sparse: symbol 'mlxbf3_pmx_funcs' was not declared. Should it be static?
drivers/soc/fsl/qe/tsa.c:140:26: sparse: sparse: incorrect type in argument 2 (different address spaces)
drivers/soc/fsl/qe/tsa.c:150:27: sparse: sparse: incorrect type in argument 1 (different address spaces)
drivers/soc/fsl/qe/tsa.c:189:26: sparse: sparse: dereference of noderef expression
drivers/soc/fsl/qe/tsa.c:663:22: sparse: sparse: incorrect type in assignment (different address spaces)
drivers/soc/fsl/qe/tsa.c:673:21: sparse: sparse: incorrect type in assignment (different address spaces)
drivers/usb/typec/ucsi/ucsi_glink.c:248:20: sparse: sparse: restricted __le32 degrades to integer
drivers/usb/typec/ucsi/ucsi_glink.c:81:23: sparse: sparse: incorrect type in assignment (different base types)
drivers/usb/typec/ucsi/ucsi_glink.c:82:22: sparse: sparse: incorrect type in assignment (different base types)
drivers/usb/typec/ucsi/ucsi_glink.c:83:24: sparse: sparse: incorrect type in assignment (different base types)
include/linux/gpio/consumer.h: linux/err.h is included more than once.
include/linux/gpio/driver.h: asm/bug.h is included more than once.
io_uring/io_uring.c:432 io_prep_async_work() error: we previously assumed 'req->file' could be null (see line 425)
io_uring/kbuf.c:221 __io_remove_buffers() warn: variable dereferenced before check 'bl->buf_ring' (see line 219)

Error/Warning ids grouped by kconfigs:

gcc_recent_errors
|-- alpha-allyesconfig
|   |-- drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-bw_needed-set-but-not-used
|   |-- drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-link-set-but-not-used
|   `-- drivers-net-wireless-legacy-ray_cs.c:warning:strncpy-specified-bound-equals-destination-size
|-- alpha-randconfig-s051-20230403
|   |-- drivers-gpu-drm-amd-amdgpu-..-pm-swsmu-smu13-smu_v13_0_6_ppt.c:sparse:int
|   |-- drivers-gpu-drm-amd-amdgpu-..-pm-swsmu-smu13-smu_v13_0_6_ppt.c:sparse:sparse:incompatible-types-in-conditional-expression-(different-base-types):
|   |-- drivers-gpu-drm-amd-amdgpu-..-pm-swsmu-smu13-smu_v13_0_6_ppt.c:sparse:void
|   `-- drivers-pinctrl-pinctrl-mlxbf3.c:sparse:sparse:symbol-mlxbf3_pmx_funcs-was-not-declared.-Should-it-be-static
|-- arc-allyesconfig
|   |-- drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-bw_needed-set-but-not-used
|   `-- drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-link-set-but-not-used
|-- arm-allmodconfig
|   |-- drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-bw_needed-set-but-not-used
|   `-- drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-link-set-but-not-used
|-- arm-allyesconfig
|   |-- drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-bw_needed-set-but-not-used
|   `-- drivers-gpu-drm-amd-amdgpu-..-display-dc-link-link_validation.c:warning:variable-link-set-but-not-used
|-- arm64-allyesconfig
|   |-- drivers-cdx-cdx.c:error:initialization-of-ssize_t-(-)(const-struct-bus_type-const-ch