Re: [PATCH] drm/amdgpu: Always align dumb buffer at PAGE_SIZE

2022-09-22 Thread Christian König

On 23.09.22 at 07:28, lepton wrote:

On Thu, Sep 22, 2022 at 10:14 PM Christian König
 wrote:

On 23.09.22 at 01:04, Lepton Wu wrote:

Since size has been aligned to PAGE_SIZE already, just set the alignment
to PAGE_SIZE so later the buffer can be used as a texture in mesa
after
https://cgit.freedesktop.org/mesa/mesa/commit/?id=f7a4051b8
Otherwise, si_texture_create_object will fail at the line
"buf->alignment < tex->surface.alignment"

I don't think that those Mesa checks are a good idea in the first place.

The alignment value is often specified as zero when it doesn't matter
because the minimum alignment can never be less than the page size.

Are you suggesting changing those mesa checks?


Yes, the minimum alignment of allocations is always 4096 because that's 
the page size of the GPU.



While that can be
done, I still think a kernel side "fix" is useful, since it doesn't hurt
and fixes the issue for some versions of mesa.


No, we have tons of places where we don't specify an alignment for 
buffers because it never mattered. I certainly don't want to fix all of 
those.


Regards,
Christian.


Christian.


Signed-off-by: Lepton Wu 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 8ef31d687ef3b..8dca0c920d3ce 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -928,7 +928,7 @@ int amdgpu_mode_dumb_create(struct drm_file *file_priv,
   args->size = ALIGN(args->size, PAGE_SIZE);
   domain = amdgpu_bo_get_preferred_domain(adev,
   amdgpu_display_supported_domains(adev, flags));
- r = amdgpu_gem_object_create(adev, args->size, 0, domain, flags,
+ r = amdgpu_gem_object_create(adev, args->size, PAGE_SIZE, domain, flags,
ttm_bo_type_device, NULL, &gobj);
   if (r)
   return -ENOMEM;




Re: [PATCH] drm/amdgpu: Always align dumb buffer at PAGE_SIZE

2022-09-22 Thread Christian König

On 23.09.22 at 01:04, Lepton Wu wrote:

Since size has been aligned to PAGE_SIZE already, just set the alignment
to PAGE_SIZE so later the buffer can be used as a texture in mesa
after https://cgit.freedesktop.org/mesa/mesa/commit/?id=f7a4051b8
Otherwise, si_texture_create_object will fail at the line
"buf->alignment < tex->surface.alignment"


I don't think that those Mesa checks are a good idea in the first place.

The alignment value is often specified as zero when it doesn't matter 
because the minimum alignment can never be less than the page size.


Christian.



Signed-off-by: Lepton Wu 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 8ef31d687ef3b..8dca0c920d3ce 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -928,7 +928,7 @@ int amdgpu_mode_dumb_create(struct drm_file *file_priv,
args->size = ALIGN(args->size, PAGE_SIZE);
domain = amdgpu_bo_get_preferred_domain(adev,
amdgpu_display_supported_domains(adev, flags));
-   r = amdgpu_gem_object_create(adev, args->size, 0, domain, flags,
+   r = amdgpu_gem_object_create(adev, args->size, PAGE_SIZE, domain, flags,
 ttm_bo_type_device, NULL, &gobj);
if (r)
return -ENOMEM;




Re: [PATCH v1] drivers/amd/kv_dpm: check the return value of amdgpu_kv_smc_bapm_enable

2022-09-22 Thread Lazar, Lijo




On 9/23/2022 1:36 AM, Li Zhong wrote:

Check the return value of amdgpu_kv_smc_bapm_enable() and log the error
when it fails.

Signed-off-by: Li Zhong 
---
  drivers/gpu/drm/amd/pm/legacy-dpm/kv_dpm.c | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/pm/legacy-dpm/kv_dpm.c 
b/drivers/gpu/drm/amd/pm/legacy-dpm/kv_dpm.c
index 8fd0782a2b20..d392256effe2 100644
--- a/drivers/gpu/drm/amd/pm/legacy-dpm/kv_dpm.c
+++ b/drivers/gpu/drm/amd/pm/legacy-dpm/kv_dpm.c
@@ -1384,13 +1384,16 @@ static int kv_dpm_enable(struct amdgpu_device *adev)
  static void kv_dpm_disable(struct amdgpu_device *adev)
  {
struct kv_power_info *pi = kv_get_pi(adev);
+   int err;
  
	amdgpu_irq_put(adev, &adev->pm.dpm.thermal.irq,
		       AMDGPU_THERMAL_IRQ_LOW_TO_HIGH);
	amdgpu_irq_put(adev, &adev->pm.dpm.thermal.irq,
		       AMDGPU_THERMAL_IRQ_HIGH_TO_LOW);
  
-	amdgpu_kv_smc_bapm_enable(adev, false);
+	err = amdgpu_kv_smc_bapm_enable(adev, false);
+	if (ret)
+		DRM_ERROR("amdgpu_kv_smc_bapm_enable failed\n");


Return code is captured in 'err' but the check is on the 'ret' variable.

BTW, does this code compile?

Thanks,
Lijo

  
  	if (adev->asic_type == CHIP_MULLINS)

kv_enable_nb_dpm(adev, false);



Re: [PATCH v3] drm/amdgpu: Enable SA software trap.

2022-09-22 Thread Felix Kuehling

On 2022-09-22 at 17:33, David Belanger wrote:

Enables support for software trap for MES >= 4.
Adapted from an implementation by Jay Cornwall.

v2: Add IP version check in conditions.
v3: Remove debugger code changes.

Signed-off-by: Jay Cornwall 
Signed-off-by: David Belanger 
Reviewed-by: Felix Kuehling 
Acked-by: Alex Deucher 


Looks good to submit to amd-staging-drm-next.

Thanks,
  Felix



---
  drivers/gpu/drm/amd/amdgpu/mes_v11_0.c|   6 +-
  .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 771 +-
  .../amd/amdkfd/cwsr_trap_handler_gfx10.asm|  21 +
  3 files changed, 413 insertions(+), 385 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
index b64cd46a159a..cbc506b958b1 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
@@ -185,7 +185,11 @@ static int mes_v11_0_add_hw_queue(struct amdgpu_mes *mes,
mes_add_queue_pkt.trap_handler_addr = input->tba_addr;
mes_add_queue_pkt.tma_addr = input->tma_addr;
mes_add_queue_pkt.is_kfd_process = input->is_kfd_process;
-   mes_add_queue_pkt.trap_en = 1;
+
+   if (!(((adev->mes.sched_version & AMDGPU_MES_VERSION_MASK) >= 4) &&
+ (adev->ip_versions[GC_HWIP][0] >= IP_VERSION(11, 0, 0)) &&
+ (adev->ip_versions[GC_HWIP][0] <= IP_VERSION(11, 0, 3))))
+   mes_add_queue_pkt.trap_en = 1;
  
  	return mes_v11_0_submit_pkt_and_poll_completion(mes,

&mes_add_queue_pkt, sizeof(mes_add_queue_pkt),
diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h 
b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
index 60a81649cf12..c7118843db05 100644
--- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
+++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
@@ -742,7 +742,7 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
0xbf88fffe, 0x877aff7f,
0x0400, 0x8f7a857a,
0x886d7a6d, 0xb97b02dc,
-   0x8f7b997b, 0xb97a2a05,
+   0x8f7b997b, 0xb97a3a05,
0x807a817a, 0xbf0d997b,
0xbf850002, 0x8f7a897a,
0xbf820001, 0x8f7a8a7a,
@@ -819,7 +819,7 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
0xbefe037c, 0xbefc0370,
0xf4611c7a, 0xf800,
0x80708470, 0xbefc037e,
-   0xb9702a05, 0x80708170,
+   0xb9703a05, 0x80708170,
0xbf0d9973, 0xbf850002,
0x8f708970, 0xbf820001,
0x8f708a70, 0xb97a1e06,
@@ -1069,7 +1069,7 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
0xb9f9f816, 0x876f7bff,
0xf800, 0x906f8b6f,
0xb9efa2c3, 0xb9f3f801,
-   0xb96e2a05, 0x806e816e,
+   0xb96e3a05, 0x806e816e,
0xbf0d9972, 0xbf850002,
0x8f6e896e, 0xbf820001,
0x8f6e8a6e, 0xb96f1e06,
@@ -2114,7 +2114,7 @@ static const uint32_t cwsr_trap_gfx10_hex[] = {
0x007a, 0x7e000280,
0xbefe037a, 0xbeff037b,
0xb97b02dc, 0x8f7b997b,
-   0xb97a2a05, 0x807a817a,
+   0xb97a3a05, 0x807a817a,
0xbf0d997b, 0xbf850002,
0x8f7a897a, 0xbf820001,
0x8f7a8a7a, 0xb97b1e06,
@@ -2157,7 +2157,7 @@ static const uint32_t cwsr_trap_gfx10_hex[] = {
0x0100, 0xe0704100,
0x705d0100, 0xe0704200,
0x705d0200, 0xe0704300,
-   0x705d0300, 0xb9702a05,
+   0x705d0300, 0xb9703a05,
0x80708170, 0xbf0d9973,
0xbf850002, 0x8f708970,
0xbf820001, 0x8f708a70,
@@ -2189,7 +2189,7 @@ static const uint32_t cwsr_trap_gfx10_hex[] = {
0xbefe03ff, 0x,
0xbeff0380, 0xe0704000,
0x705d0200, 0xbefe03c1,
-   0xb9702a05, 0x80708170,
+   0xb9703a05, 0x80708170,
0xbf0d9973, 0xbf850002,
0x8f708970, 0xbf820001,
0x8f708a70, 0xb97a1e06,
@@ -2475,7 +2475,7 @@ static const uint32_t cwsr_trap_gfx10_hex[] = {
0xb9ef4803, 0x876f7bff,
0xf800, 0x906f8b6f,
0xb9efa2c3, 0xb9f3f801,
-   0xb96e2a05, 0x806e816e,
+   0xb96e3a05, 0x806e816e,
0xbf0d9972, 0xbf850002,
0x8f6e896e, 0xbf820001,
0x8f6e8a6e, 0xb96f1e06,
@@ -2494,438 +2494,441 @@ static const uint32_t cwsr_trap_gfx10_hex[] = {
0xbf9f, 0xbf9f,
0xbf9f, 0x,
  };
-
  static const uint32_t cwsr_trap_gfx11_hex[] = {
-   0xbfa1, 0xbfa0021b,
+   0xbfa1, 0xbfa0021e,
0xb0804006, 0xb8f8f802,
-   0x91788678, 0xb8fbf803,
-   0x8b6eff78, 0x2000,
-   0xbfa10009, 0x8b6eff6d,
-   0x00ff, 0xbfa2001e,
-   0x8b6eff7b, 0x0400,
-   0xbfa20041, 0xbf830010,
-   0xb8fbf803, 0xbfa0fffa,
-   0x8b6eff7b, 0x0900,
-   0xbfa20015, 0x8b6eff7b,
-   0x71ff, 0xbfa10008,
-   0x8b6fff7b, 0x7080,
-   0xbfa10001, 0xbeee1287,
-   0xb8eff801, 0x846e8c6e,
-   0x8b6e6f6e, 0xbfa2000a,
+   0x9178ff78, 0x00020006,
+   0xb8fbf803, 0xbf0d9f6d,
+   0xbfa20006, 0x8b6eff78,
+   0x2000, 0xbfa10009,
0x8b6eff6d, 

[PATCH v3] drm/amdgpu: Enable SA software trap.

2022-09-22 Thread David Belanger
Enables support for software trap for MES >= 4.
Adapted from an implementation by Jay Cornwall.

v2: Add IP version check in conditions.
v3: Remove debugger code changes.

Signed-off-by: Jay Cornwall 
Signed-off-by: David Belanger 
Reviewed-by: Felix Kuehling 
Acked-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/mes_v11_0.c|   6 +-
 .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 771 +-
 .../amd/amdkfd/cwsr_trap_handler_gfx10.asm|  21 +
 3 files changed, 413 insertions(+), 385 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
index b64cd46a159a..cbc506b958b1 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
@@ -185,7 +185,11 @@ static int mes_v11_0_add_hw_queue(struct amdgpu_mes *mes,
mes_add_queue_pkt.trap_handler_addr = input->tba_addr;
mes_add_queue_pkt.tma_addr = input->tma_addr;
mes_add_queue_pkt.is_kfd_process = input->is_kfd_process;
-   mes_add_queue_pkt.trap_en = 1;
+
+   if (!(((adev->mes.sched_version & AMDGPU_MES_VERSION_MASK) >= 4) &&
+ (adev->ip_versions[GC_HWIP][0] >= IP_VERSION(11, 0, 0)) &&
+ (adev->ip_versions[GC_HWIP][0] <= IP_VERSION(11, 0, 3))))
+   mes_add_queue_pkt.trap_en = 1;
 
return mes_v11_0_submit_pkt_and_poll_completion(mes,
&mes_add_queue_pkt, sizeof(mes_add_queue_pkt),
diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h 
b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
index 60a81649cf12..c7118843db05 100644
--- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
+++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
@@ -742,7 +742,7 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
0xbf88fffe, 0x877aff7f,
0x0400, 0x8f7a857a,
0x886d7a6d, 0xb97b02dc,
-   0x8f7b997b, 0xb97a2a05,
+   0x8f7b997b, 0xb97a3a05,
0x807a817a, 0xbf0d997b,
0xbf850002, 0x8f7a897a,
0xbf820001, 0x8f7a8a7a,
@@ -819,7 +819,7 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
0xbefe037c, 0xbefc0370,
0xf4611c7a, 0xf800,
0x80708470, 0xbefc037e,
-   0xb9702a05, 0x80708170,
+   0xb9703a05, 0x80708170,
0xbf0d9973, 0xbf850002,
0x8f708970, 0xbf820001,
0x8f708a70, 0xb97a1e06,
@@ -1069,7 +1069,7 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
0xb9f9f816, 0x876f7bff,
0xf800, 0x906f8b6f,
0xb9efa2c3, 0xb9f3f801,
-   0xb96e2a05, 0x806e816e,
+   0xb96e3a05, 0x806e816e,
0xbf0d9972, 0xbf850002,
0x8f6e896e, 0xbf820001,
0x8f6e8a6e, 0xb96f1e06,
@@ -2114,7 +2114,7 @@ static const uint32_t cwsr_trap_gfx10_hex[] = {
0x007a, 0x7e000280,
0xbefe037a, 0xbeff037b,
0xb97b02dc, 0x8f7b997b,
-   0xb97a2a05, 0x807a817a,
+   0xb97a3a05, 0x807a817a,
0xbf0d997b, 0xbf850002,
0x8f7a897a, 0xbf820001,
0x8f7a8a7a, 0xb97b1e06,
@@ -2157,7 +2157,7 @@ static const uint32_t cwsr_trap_gfx10_hex[] = {
0x0100, 0xe0704100,
0x705d0100, 0xe0704200,
0x705d0200, 0xe0704300,
-   0x705d0300, 0xb9702a05,
+   0x705d0300, 0xb9703a05,
0x80708170, 0xbf0d9973,
0xbf850002, 0x8f708970,
0xbf820001, 0x8f708a70,
@@ -2189,7 +2189,7 @@ static const uint32_t cwsr_trap_gfx10_hex[] = {
0xbefe03ff, 0x,
0xbeff0380, 0xe0704000,
0x705d0200, 0xbefe03c1,
-   0xb9702a05, 0x80708170,
+   0xb9703a05, 0x80708170,
0xbf0d9973, 0xbf850002,
0x8f708970, 0xbf820001,
0x8f708a70, 0xb97a1e06,
@@ -2475,7 +2475,7 @@ static const uint32_t cwsr_trap_gfx10_hex[] = {
0xb9ef4803, 0x876f7bff,
0xf800, 0x906f8b6f,
0xb9efa2c3, 0xb9f3f801,
-   0xb96e2a05, 0x806e816e,
+   0xb96e3a05, 0x806e816e,
0xbf0d9972, 0xbf850002,
0x8f6e896e, 0xbf820001,
0x8f6e8a6e, 0xb96f1e06,
@@ -2494,438 +2494,441 @@ static const uint32_t cwsr_trap_gfx10_hex[] = {
0xbf9f, 0xbf9f,
0xbf9f, 0x,
 };
-
 static const uint32_t cwsr_trap_gfx11_hex[] = {
-   0xbfa1, 0xbfa0021b,
+   0xbfa1, 0xbfa0021e,
0xb0804006, 0xb8f8f802,
-   0x91788678, 0xb8fbf803,
-   0x8b6eff78, 0x2000,
-   0xbfa10009, 0x8b6eff6d,
-   0x00ff, 0xbfa2001e,
-   0x8b6eff7b, 0x0400,
-   0xbfa20041, 0xbf830010,
-   0xb8fbf803, 0xbfa0fffa,
-   0x8b6eff7b, 0x0900,
-   0xbfa20015, 0x8b6eff7b,
-   0x71ff, 0xbfa10008,
-   0x8b6fff7b, 0x7080,
-   0xbfa10001, 0xbeee1287,
-   0xb8eff801, 0x846e8c6e,
-   0x8b6e6f6e, 0xbfa2000a,
+   0x9178ff78, 0x00020006,
+   0xb8fbf803, 0xbf0d9f6d,
+   0xbfa20006, 0x8b6eff78,
+   0x2000, 0xbfa10009,
0x8b6eff6d, 0x00ff,
-   0xbfa20007, 0xb8eef801,
-   0x8b6eff6e, 0x0800,
-   0xbfa20003, 0x8b6eff7b,
-   

[PATCH] drm/amdgpu/vcn: update vcn4 fw shared data structure

2022-09-22 Thread Ruijing Dong
Update VF_RB_SETUP_FLAG, add SMU_DPM_INTERFACE_FLAG,
and make the corresponding change in VCN4.

Signed-off-by: Ruijing Dong 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h | 8 +++-
 drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c   | 4 
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
index 80b7a6cfd026..253ea6b159df 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h
@@ -161,7 +161,8 @@
 #define AMDGPU_VCN_SW_RING_FLAG(1 << 9)
 #define AMDGPU_VCN_FW_LOGGING_FLAG (1 << 10)
 #define AMDGPU_VCN_SMU_VERSION_INFO_FLAG (1 << 11)
-#define AMDGPU_VCN_VF_RB_SETUP_FLAG (1 << 12)
+#define AMDGPU_VCN_SMU_DPM_INTERFACE_FLAG (1 << 11)
+#define AMDGPU_VCN_VF_RB_SETUP_FLAG (1 << 14)
 
 #define AMDGPU_VCN_IB_FLAG_DECODE_BUFFER   0x0001
 #define AMDGPU_VCN_CMD_FLAG_MSG_BUFFER 0x0001
@@ -171,6 +172,9 @@
 #define VCN_CODEC_DISABLE_MASK_HEVC (1 << 2)
 #define VCN_CODEC_DISABLE_MASK_H264 (1 << 3)
 
+#define AMDGPU_VCN_SMU_DPM_INTERFACE_DGPU (0)
+#define AMDGPU_VCN_SMU_DPM_INTERFACE_APU (1)
+
 enum fw_queue_mode {
FW_QUEUE_RING_RESET = 1,
FW_QUEUE_DPG_HOLD_OFF = 2,
@@ -335,7 +339,9 @@ struct amdgpu_vcn4_fw_shared {
struct amdgpu_fw_shared_unified_queue_struct sq;
uint8_t pad1[8];
struct amdgpu_fw_shared_fw_logging fw_log;
+   uint8_t pad2[20];
struct amdgpu_fw_shared_rb_setup rb_setup;
+   struct amdgpu_fw_shared_smu_interface_info smu_dpm_interface;
 };
 
 struct amdgpu_vcn_fwlog {
diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c 
b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
index b6f73b87c47e..897a5ce9c9da 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
@@ -150,6 +150,10 @@ static int vcn_v4_0_sw_init(void *handle)
	fw_shared->present_flag_0 = cpu_to_le32(AMDGPU_FW_SHARED_FLAG_0_UNIFIED_QUEUE);
	fw_shared->sq.is_enabled = 1;
 
+	fw_shared->present_flag_0 |= cpu_to_le32(AMDGPU_VCN_SMU_DPM_INTERFACE_FLAG);
+	fw_shared->smu_dpm_interface.smu_interface_type = (adev->flags & AMD_IS_APU) ?
+		AMDGPU_VCN_SMU_DPM_INTERFACE_APU : AMDGPU_VCN_SMU_DPM_INTERFACE_DGPU;
+
	if (amdgpu_sriov_vf(adev))
		fw_shared->present_flag_0 |= cpu_to_le32(AMDGPU_VCN_VF_RB_SETUP_FLAG);
 
-- 
2.25.1



Re: [PATCH] drm/amdgpu: Enable SA software trap.

2022-09-22 Thread Felix Kuehling

On 2022-09-22 at 13:57, Belanger, David wrote:

[AMD Official Use Only - General]




-Original Message-
From: Kuehling, Felix 
Sent: Thursday, September 22, 2022 1:14 PM
To: Belanger, David ; amd-
g...@lists.freedesktop.org
Cc: Cornwall, Jay 
Subject: Re: [PATCH] drm/amdgpu: Enable SA software trap.

On 2022-09-22 at 12:17, David Belanger wrote:

Enables support for software trap for MES >= 4.
Adapted from implementation from Jay Cornwall.

v2: Add IP version check in conditions.

Signed-off-by: Jay Cornwall 
Signed-off-by: David Belanger 
Reviewed-by: Felix Kuehling 
---
   drivers/gpu/drm/amd/amdgpu/mes_v11_0.c|   6 +-
   .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 771 +-
   .../amd/amdkfd/cwsr_trap_handler_gfx10.asm|  21 +
   .../gpu/drm/amd/amdkfd/kfd_int_process_v11.c  |  26 +-
   4 files changed, 437 insertions(+), 387 deletions(-)

[snip]

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v11.c
b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v11.c
index a6fcbeeb7428..4e03d19e9333 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v11.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v11.c
@@ -358,13 +358,35 @@ static void event_interrupt_wq_v11(struct

kfd_dev *dev,

break;
case SQ_INTERRUPT_WORD_ENCODING_ERROR:
			print_sq_intr_info_error(context_id0, context_id1);
+			sq_int_priv = REG_GET_FIELD(context_id0,
+					SQ_INTERRUPT_WORD_WAVE_CTXID0, PRIV);
			sq_int_errtype = REG_GET_FIELD(context_id0,
					SQ_INTERRUPT_WORD_ERROR_CTXID0, TYPE);
-			if (sq_int_errtype != SQ_INTERRUPT_ERROR_TYPE_ILLEGAL_INST &&
-			    sq_int_errtype != SQ_INTERRUPT_ERROR_TYPE_MEMVIOL) {
+
+			switch (sq_int_errtype) {
+			case SQ_INTERRUPT_ERROR_TYPE_EDC_FUE:
+			case SQ_INTERRUPT_ERROR_TYPE_EDC_FED:
				event_interrupt_poison_consumption_v11(
						dev, pasid, source_id);
				return;
+			case SQ_INTERRUPT_ERROR_TYPE_ILLEGAL_INST:
+				/*if (!(((adev->mes.sched_version & AMDGPU_MES_VERSION_MASK) >= 4) &&
+				      (adev->ip_versions[GC_HWIP][0] >= IP_VERSION(11, 0, 0)) &&
+				      (adev->ip_versions[GC_HWIP][0] <= IP_VERSION(11, 0, 3)))
+				    && sq_int_priv)
+					kfd_set_dbg_ev_from_interrupt(dev, pasid, -1,
+						KFD_EC_MASK(EC_QUEUE_WAVE_ILLEGAL_INSTRUCTION),
+						NULL, 0);*/
+				return;
+			case SQ_INTERRUPT_ERROR_TYPE_MEMVIOL:
+				/*if (!(((adev->mes.sched_version & AMDGPU_MES_VERSION_MASK) >= 4) &&
+				      (adev->ip_versions[GC_HWIP][0] >= IP_VERSION(11, 0, 0)) &&
+				      (adev->ip_versions[GC_HWIP][0] <= IP_VERSION(11, 0, 3)))
+				    && sq_int_priv)
+					kfd_set_dbg_ev_from_interrupt(dev, pasid, -1,
+						KFD_EC_MASK(EC_QUEUE_WAVE_MEMORY_VIOLATION),
+						NULL, 0);*/

Which branch is this for? kfd_set_dbg_ev_from_interrupt shouldn't exist on
the upstream branch yet. That code is still under review for upstream.


My understanding is that it is for the amd-staging-drm-next branch, to make
its way upstream.
The code that calls that function is commented out. There are other
pre-existing instances of that function in that file on the
amd-staging-drm-next branch that are commented out as well.
Please advise if I should remove it from the patch for now or keep it
commented out.


I'd prefer not to check in commented-out code to the upstream branch. 
Please work with Jon to make sure he includes this in his rocm-gdb patch 
series, where these changes belong. And you can submit them to the DKMS 
branch as a separate patch in the interim.


Thanks,
  Felix




Thanks,
David B.


Regards,
    Felix



+   return;
}
break;
default:


RE: [PATCH] drm/amdgpu: Enable SA software trap.

2022-09-22 Thread Sider, Graham
[Public]



> -Original Message-
> From: Belanger, David 
> Sent: Thursday, September 22, 2022 2:49 PM
> To: Sider, Graham ; amd-
> g...@lists.freedesktop.org
> Cc: Cornwall, Jay ; Kuehling, Felix
> 
> Subject: RE: [PATCH] drm/amdgpu: Enable SA software trap.
> 
> [Public]
> 
> 
> 
> > -Original Message-
> > From: Sider, Graham 
> > Sent: Thursday, September 22, 2022 1:56 PM
> > To: Belanger, David ; amd-
> > g...@lists.freedesktop.org
> > Cc: Cornwall, Jay ; Kuehling, Felix
> > ; Belanger, David 
> > Subject: RE: [PATCH] drm/amdgpu: Enable SA software trap.
> >
> > [Public]
> >
> > > -Original Message-
> > > From: amd-gfx  On Behalf Of
> > > David Belanger
> > > Sent: Thursday, September 22, 2022 12:17 PM
> > > To: amd-gfx@lists.freedesktop.org
> > > Cc: Cornwall, Jay ; Kuehling, Felix
> > > ; Belanger, David
> 
> > > Subject: [PATCH] drm/amdgpu: Enable SA software trap.
> > >
> > > Caution: This message originated from an External Source. Use proper
> > > caution when opening attachments, clicking links, or responding.
> > >
> > >
> > > Enables support for software trap for MES >= 4.
> > > Adapted from implementation from Jay Cornwall.
> > >
> > > v2: Add IP version check in conditions.
> > >
> > > Signed-off-by: Jay Cornwall 
> > > Signed-off-by: David Belanger 
> > > Reviewed-by: Felix Kuehling 
> > > ---
> > >  drivers/gpu/drm/amd/amdgpu/mes_v11_0.c|   6 +-
> > >  .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 771 +---
> --
> > >  .../amd/amdkfd/cwsr_trap_handler_gfx10.asm|  21 +
> > >  .../gpu/drm/amd/amdkfd/kfd_int_process_v11.c  |  26 +-
> > >  4 files changed, 437 insertions(+), 387 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
> > > b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
> > > index b64cd46a159a..cbc506b958b1 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
> > > @@ -185,7 +185,11 @@ static int mes_v11_0_add_hw_queue(struct
> > > amdgpu_mes *mes,
> > > mes_add_queue_pkt.trap_handler_addr = input->tba_addr;
> > > mes_add_queue_pkt.tma_addr = input->tma_addr;
> > > mes_add_queue_pkt.is_kfd_process = input->is_kfd_process;
> > > -   mes_add_queue_pkt.trap_en = 1;
> > > +
> > > +   if (!(((adev->mes.sched_version & AMDGPU_MES_VERSION_MASK) >= 4) &&
> > > + (adev->ip_versions[GC_HWIP][0] >= IP_VERSION(11, 0, 0)) &&
> > > + (adev->ip_versions[GC_HWIP][0] <= IP_VERSION(11, 0, 3))))
> > > +   mes_add_queue_pkt.trap_en = 1;
> >
> > I think the value for trap_en here is backwards. It should be set to 0
> > for this condition and default to 1 otherwise.
> >
> > Best,
> > Graham
> 
> Note that the condition is reversed with the "!" operator.
> 
> David B.
> 

Ah, I read it too quickly. Looks good.

Best,
Graham

> >
> > >
> > > return mes_v11_0_submit_pkt_and_poll_completion(mes,
> > > _add_queue_pkt,
> > > sizeof(mes_add_queue_pkt), diff --git
> > > a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
> > > b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
> > > index 60a81649cf12..c7118843db05 100644
> > > --- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
> > > +++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
> > > @@ -742,7 +742,7 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
> > > 0xbf88fffe, 0x877aff7f,
> > > 0x0400, 0x8f7a857a,
> > > 0x886d7a6d, 0xb97b02dc,
> > > -   0x8f7b997b, 0xb97a2a05,
> > > +   0x8f7b997b, 0xb97a3a05,
> > > 0x807a817a, 0xbf0d997b,
> > > 0xbf850002, 0x8f7a897a,
> > > 0xbf820001, 0x8f7a8a7a,
> > > @@ -819,7 +819,7 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
> > > 0xbefe037c, 0xbefc0370,
> > > 0xf4611c7a, 0xf800,
> > > 0x80708470, 0xbefc037e,
> > > -   0xb9702a05, 0x80708170,
> > > +   0xb9703a05, 0x80708170,
> > > 0xbf0d9973, 0xbf850002,
> > > 0x8f708970, 0xbf820001,
> > > 0x8f708a70, 0xb97a1e06,
> > > @@ -1069,7 +1069,7 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
> > > 0xb9f9f816, 0x876f7bff,
> > > 0xf800, 0x906f8b6f,
> > > 0xb9efa2c3, 0xb9f3f801,
> > > -   0xb96e2a05, 0x806e816e,
> > > +   0xb96e3a05, 0x806e816e,
> > > 0xbf0d9972, 0xbf850002,
> > > 0x8f6e896e, 0xbf820001,
> > > 0x8f6e8a6e, 0xb96f1e06,
> > > @@ -2114,7 +2114,7 @@ static const uint32_t cwsr_trap_gfx10_hex[] = {
> > > 0x007a, 0x7e000280,
> > > 0xbefe037a, 0xbeff037b,
> > > 0xb97b02dc, 0x8f7b997b,
> > > -   0xb97a2a05, 0x807a817a,
> > > +   0xb97a3a05, 0x807a817a,
> > > 0xbf0d997b, 0xbf850002,
> > > 0x8f7a897a, 0xbf820001,
> > > 0x8f7a8a7a, 0xb97b1e06,
> > > @@ -2157,7 +2157,7 @@ static const uint32_t cwsr_trap_gfx10_hex[] = {
> > > 0x0100, 0xe0704100,
> > > 

Re: [PATCH v2] drm/amdgpu: pass queue size and is_aql_queue to MES

2022-09-22 Thread Felix Kuehling

On 2022-09-22 at 14:02, Graham Sider wrote:

Update mes_v11_api_def.h add_queue API with is_aql_queue parameter. Also
re-use gds_size for the queue size (unused for KFD). MES requires the
queue size in order to compute the actual wptr offset within the queue
RB since it increases monotonically for AQL queues.

v2: Make is_aql_queue assign clearer

Signed-off-by: Graham Sider 


Reviewed-by: Felix Kuehling 



---
  drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h   | 2 ++
  drivers/gpu/drm/amd/amdgpu/mes_v11_0.c| 4 
  drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 2 ++
  drivers/gpu/drm/amd/include/mes_v11_api_def.h | 3 ++-
  4 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
index 7b46f6bf4187..ad980f4b66e1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
@@ -222,6 +222,8 @@ struct mes_add_queue_input {
uint64_ttba_addr;
uint64_ttma_addr;
uint32_tis_kfd_process;
+   uint32_tis_aql_queue;
+   uint32_tqueue_size;
  };
  
  struct mes_remove_queue_input {

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
index b64cd46a159a..5581e03fc956 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
@@ -187,6 +187,10 @@ static int mes_v11_0_add_hw_queue(struct amdgpu_mes *mes,
mes_add_queue_pkt.is_kfd_process = input->is_kfd_process;
mes_add_queue_pkt.trap_en = 1;
  
+	/* For KFD, gds_size is re-used for queue size (needed in MES for AQL queues) */

+   mes_add_queue_pkt.is_aql_queue = input->is_aql_queue;
+   mes_add_queue_pkt.gds_size = input->queue_size;
+
return mes_v11_0_submit_pkt_and_poll_completion(mes,
&mes_add_queue_pkt, sizeof(mes_add_queue_pkt),
offsetof(union MESAPI__ADD_QUEUE, api_status));
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index e83725a28106..007a3db69df1 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -205,6 +205,8 @@ static int add_queue_mes(struct device_queue_manager *dqm, 
struct queue *q,
}
  
  	queue_input.is_kfd_process = 1;

+   queue_input.is_aql_queue = (q->properties.format == 
KFD_QUEUE_FORMAT_AQL);
+   queue_input.queue_size = q->properties.queue_size >> 2;
  
  	queue_input.paging = false;

queue_input.tba_addr = qpd->tba_addr;
diff --git a/drivers/gpu/drm/amd/include/mes_v11_api_def.h 
b/drivers/gpu/drm/amd/include/mes_v11_api_def.h
index 50bfa513cb35..7e85cdc5bd34 100644
--- a/drivers/gpu/drm/amd/include/mes_v11_api_def.h
+++ b/drivers/gpu/drm/amd/include/mes_v11_api_def.h
@@ -269,7 +269,8 @@ union MESAPI__ADD_QUEUE {
uint32_t map_kiq_utility_queue  : 1;
uint32_t is_kfd_process : 1;
uint32_t trap_en: 1;
-   uint32_t reserved   : 21;
+   uint32_t is_aql_queue   : 1;
+   uint32_t reserved   : 20;
};
struct MES_API_STATUS   api_status;
uint64_ttma_addr;


RE: [PATCH] drm/amdgpu: Enable SA software trap.

2022-09-22 Thread Belanger, David
[Public]



> -Original Message-
> From: Sider, Graham 
> Sent: Thursday, September 22, 2022 1:56 PM
> To: Belanger, David ; amd-
> g...@lists.freedesktop.org
> Cc: Cornwall, Jay ; Kuehling, Felix
> ; Belanger, David 
> Subject: RE: [PATCH] drm/amdgpu: Enable SA software trap.
> 
> [Public]
> 
> > -Original Message-
> > From: amd-gfx  On Behalf Of
> > David Belanger
> > Sent: Thursday, September 22, 2022 12:17 PM
> > To: amd-gfx@lists.freedesktop.org
> > Cc: Cornwall, Jay ; Kuehling, Felix
> > ; Belanger, David 
> > Subject: [PATCH] drm/amdgpu: Enable SA software trap.
> >
> > Caution: This message originated from an External Source. Use proper
> > caution when opening attachments, clicking links, or responding.
> >
> >
> > Enables support for software trap for MES >= 4.
> > Adapted from implementation from Jay Cornwall.
> >
> > v2: Add IP version check in conditions.
> >
> > Signed-off-by: Jay Cornwall 
> > Signed-off-by: David Belanger 
> > Reviewed-by: Felix Kuehling 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/mes_v11_0.c|   6 +-
> >  .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 771 +-
> >  .../amd/amdkfd/cwsr_trap_handler_gfx10.asm|  21 +
> >  .../gpu/drm/amd/amdkfd/kfd_int_process_v11.c  |  26 +-
> >  4 files changed, 437 insertions(+), 387 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
> > b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
> > index b64cd46a159a..cbc506b958b1 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
> > @@ -185,7 +185,11 @@ static int mes_v11_0_add_hw_queue(struct
> > amdgpu_mes *mes,
> > mes_add_queue_pkt.trap_handler_addr = input->tba_addr;
> > mes_add_queue_pkt.tma_addr = input->tma_addr;
> > mes_add_queue_pkt.is_kfd_process = input->is_kfd_process;
> > -   mes_add_queue_pkt.trap_en = 1;
> > +
> > +   if (!(((adev->mes.sched_version & AMDGPU_MES_VERSION_MASK) >= 4) &&
> > + (adev->ip_versions[GC_HWIP][0] >= IP_VERSION(11, 0, 0)) &&
> > + (adev->ip_versions[GC_HWIP][0] <= IP_VERSION(11, 0, 3))))
> > +   mes_add_queue_pkt.trap_en = 1;
> 
> I think the value for trap_en here is backwards. It should be set to 0 for 
> this
> condition and default to 1 otherwise.
> 
> Best,
> Graham

Note that the condition is reversed with the "!" operator.

David B.

> 
> >
> > return mes_v11_0_submit_pkt_and_poll_completion(mes,
> > _add_queue_pkt, sizeof(mes_add_queue_pkt),
> > diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
> > b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
> > index 60a81649cf12..c7118843db05 100644
> > --- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
> > +++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
> > @@ -742,7 +742,7 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
> > 0xbf88fffe, 0x877aff7f,
> > 0x0400, 0x8f7a857a,
> > 0x886d7a6d, 0xb97b02dc,
> > -   0x8f7b997b, 0xb97a2a05,
> > +   0x8f7b997b, 0xb97a3a05,
> > 0x807a817a, 0xbf0d997b,
> > 0xbf850002, 0x8f7a897a,
> > 0xbf820001, 0x8f7a8a7a,
> > @@ -819,7 +819,7 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
> > 0xbefe037c, 0xbefc0370,
> > 0xf4611c7a, 0xf800,
> > 0x80708470, 0xbefc037e,
> > -   0xb9702a05, 0x80708170,
> > +   0xb9703a05, 0x80708170,
> > 0xbf0d9973, 0xbf850002,
> > 0x8f708970, 0xbf820001,
> > 0x8f708a70, 0xb97a1e06,
> > @@ -1069,7 +1069,7 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
> > 0xb9f9f816, 0x876f7bff,
> > 0xf800, 0x906f8b6f,
> > 0xb9efa2c3, 0xb9f3f801,
> > -   0xb96e2a05, 0x806e816e,
> > +   0xb96e3a05, 0x806e816e,
> > 0xbf0d9972, 0xbf850002,
> > 0x8f6e896e, 0xbf820001,
> > 0x8f6e8a6e, 0xb96f1e06,
> > @@ -2114,7 +2114,7 @@ static const uint32_t cwsr_trap_gfx10_hex[] = {
> > 0x007a, 0x7e000280,
> > 0xbefe037a, 0xbeff037b,
> > 0xb97b02dc, 0x8f7b997b,
> > -   0xb97a2a05, 0x807a817a,
> > +   0xb97a3a05, 0x807a817a,
> > 0xbf0d997b, 0xbf850002,
> > 0x8f7a897a, 0xbf820001,
> > 0x8f7a8a7a, 0xb97b1e06,
> > @@ -2157,7 +2157,7 @@ static const uint32_t cwsr_trap_gfx10_hex[] = {
> > 0x0100, 0xe0704100,
> > 0x705d0100, 0xe0704200,
> > 0x705d0200, 0xe0704300,
> > -   0x705d0300, 0xb9702a05,
> > +   0x705d0300, 0xb9703a05,
> > 0x80708170, 0xbf0d9973,
> > 0xbf850002, 0x8f708970,
> > 0xbf820001, 0x8f708a70,
> > @@ -2189,7 +2189,7 @@ static const uint32_t cwsr_trap_gfx10_hex[] = {
> > 0xbefe03ff, 0x,
> > 0xbeff0380, 0xe0704000,
> > 0x705d0200, 0xbefe03c1,
> > -   0xb9702a05, 0x80708170,
> > +   0xb9703a05, 0x80708170,
> > 0xbf0d9973, 0xbf850002,
> > 0x8f708970, 0xbf820001,
> > 0x8f708a70, 0xb97a1e06,

Re: [REGRESSION] Graphical issues on Lenovo Yoga 7 14ARB7 laptop since v6.0-rc1 (bisected)

2022-09-22 Thread August Wikerfors

Hi Alex,

On 2022-09-22 15:59, Alex Deucher wrote:

On Thu, Sep 22, 2022 at 8:54 AM Thorsten Leemhuis
 wrote:


Hi, this is your Linux kernel regression tracker. Top-posting for once,
to make this easily accessible to everyone.

@amdgpu developers, what's up here? August afaics didn't even get a single
reply for his report that even identifies the change that's causing the
problem. We're already late in the development cycle, so it would be
good if someone could take a closer look into this before it's too late
for 6.0.


Been a busy week.  Haven't had a chance to look into this yet.  Does
the issue still happen with this patch:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=66f99628eb24409cb8feb5061f78283c8b65f820
Yes, the issue still happens with that patch, and also with the current 
git master (dc164f4fb00a0abebdfff132f8bc7291a28f5401).


Regards,
August Wikerfors


[PATCH v2] drm/amdgpu: pass queue size and is_aql_queue to MES

2022-09-22 Thread Graham Sider
Update mes_v11_api_def.h add_queue API with an is_aql_queue parameter. Also
re-use gds_size for the queue size (unused for KFD). MES requires the
queue size in order to compute the actual wptr offset within the queue
RB, since the wptr increases monotonically for AQL queues.

v2: Make is_aql_queue assign clearer

Signed-off-by: Graham Sider 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h   | 2 ++
 drivers/gpu/drm/amd/amdgpu/mes_v11_0.c| 4 
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 2 ++
 drivers/gpu/drm/amd/include/mes_v11_api_def.h | 3 ++-
 4 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
index 7b46f6bf4187..ad980f4b66e1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
@@ -222,6 +222,8 @@ struct mes_add_queue_input {
uint64_ttba_addr;
uint64_ttma_addr;
uint32_tis_kfd_process;
+   uint32_tis_aql_queue;
+   uint32_tqueue_size;
 };
 
 struct mes_remove_queue_input {
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
index b64cd46a159a..5581e03fc956 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
@@ -187,6 +187,10 @@ static int mes_v11_0_add_hw_queue(struct amdgpu_mes *mes,
mes_add_queue_pkt.is_kfd_process = input->is_kfd_process;
mes_add_queue_pkt.trap_en = 1;
 
+   /* For KFD, gds_size is re-used for queue size (needed in MES for AQL queues) */
+   mes_add_queue_pkt.is_aql_queue = input->is_aql_queue;
+   mes_add_queue_pkt.gds_size = input->queue_size;
+
return mes_v11_0_submit_pkt_and_poll_completion(mes,
			&mes_add_queue_pkt, sizeof(mes_add_queue_pkt),
offsetof(union MESAPI__ADD_QUEUE, api_status));
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index e83725a28106..007a3db69df1 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -205,6 +205,8 @@ static int add_queue_mes(struct device_queue_manager *dqm, 
struct queue *q,
}
 
queue_input.is_kfd_process = 1;
+   queue_input.is_aql_queue = (q->properties.format == KFD_QUEUE_FORMAT_AQL);
+   queue_input.queue_size = q->properties.queue_size >> 2;
 
queue_input.paging = false;
queue_input.tba_addr = qpd->tba_addr;
diff --git a/drivers/gpu/drm/amd/include/mes_v11_api_def.h 
b/drivers/gpu/drm/amd/include/mes_v11_api_def.h
index 50bfa513cb35..7e85cdc5bd34 100644
--- a/drivers/gpu/drm/amd/include/mes_v11_api_def.h
+++ b/drivers/gpu/drm/amd/include/mes_v11_api_def.h
@@ -269,7 +269,8 @@ union MESAPI__ADD_QUEUE {
uint32_t map_kiq_utility_queue  : 1;
uint32_t is_kfd_process : 1;
uint32_t trap_en: 1;
-   uint32_t reserved   : 21;
+   uint32_t is_aql_queue   : 1;
+   uint32_t reserved   : 20;
};
struct MES_API_STATUS   api_status;
uint64_ttma_addr;
-- 
2.25.1



RE: [PATCH] drm/amdgpu: Enable SA software trap.

2022-09-22 Thread Belanger, David
[AMD Official Use Only - General]



> -Original Message-
> From: Kuehling, Felix 
> Sent: Thursday, September 22, 2022 1:14 PM
> To: Belanger, David ; amd-
> g...@lists.freedesktop.org
> Cc: Cornwall, Jay 
> Subject: Re: [PATCH] drm/amdgpu: Enable SA software trap.
> 
> Am 2022-09-22 um 12:17 schrieb David Belanger:
> > Enables support for software trap for MES >= 4.
> > Adapted from implementation from Jay Cornwall.
> >
> > v2: Add IP version check in conditions.
> >
> > Signed-off-by: Jay Cornwall 
> > Signed-off-by: David Belanger 
> > Reviewed-by: Felix Kuehling 
> > ---
> >   drivers/gpu/drm/amd/amdgpu/mes_v11_0.c|   6 +-
> >   .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 771 +-
> >   .../amd/amdkfd/cwsr_trap_handler_gfx10.asm|  21 +
> >   .../gpu/drm/amd/amdkfd/kfd_int_process_v11.c  |  26 +-
> >   4 files changed, 437 insertions(+), 387 deletions(-)
> [snip]
> > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v11.c
> > b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v11.c
> > index a6fcbeeb7428..4e03d19e9333 100644
> > --- a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v11.c
> > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v11.c
> > @@ -358,13 +358,35 @@ static void event_interrupt_wq_v11(struct
> kfd_dev *dev,
> > break;
> > case SQ_INTERRUPT_WORD_ENCODING_ERROR:
> > print_sq_intr_info_error(context_id0,
> context_id1);
> > +   sq_int_priv = REG_GET_FIELD(context_id0,
> > +
>   SQ_INTERRUPT_WORD_WAVE_CTXID0, PRIV);
> > sq_int_errtype =
> REG_GET_FIELD(context_id0,
> >
>   SQ_INTERRUPT_WORD_ERROR_CTXID0, TYPE);
> > -   if (sq_int_errtype !=
> SQ_INTERRUPT_ERROR_TYPE_ILLEGAL_INST &&
> > -   sq_int_errtype !=
> SQ_INTERRUPT_ERROR_TYPE_MEMVIOL) {
> > +
> > +   switch (sq_int_errtype) {
> > +   case SQ_INTERRUPT_ERROR_TYPE_EDC_FUE:
> > +   case SQ_INTERRUPT_ERROR_TYPE_EDC_FED:
> >
>   event_interrupt_poison_consumption_v11(
> > dev, pasid,
> source_id);
> > return;
> > +   case
> SQ_INTERRUPT_ERROR_TYPE_ILLEGAL_INST:
> > +   /*if (!(((adev->mes.sched_version &
> AMDGPU_MES_VERSION_MASK) >= 4) &&
> > + (adev-
> >ip_versions[GC_HWIP][0] >= IP_VERSION(11, 0, 0)) &&
> > + (adev-
> >ip_versions[GC_HWIP][0] <= IP_VERSION(11, 0, 3)))
> > +   && sq_int_priv)
> > +
>   kfd_set_dbg_ev_from_interrupt(dev, pasid, -1,
> > +
>   KFD_EC_MASK(EC_QUEUE_WAVE_ILLEGAL_INSTRUCTION),
> > +   NULL, 0);*/
> > +   return;
> > +   case
> SQ_INTERRUPT_ERROR_TYPE_MEMVIOL:
> > +   /*if (!(((adev->mes.sched_version &
> AMDGPU_MES_VERSION_MASK) >= 4) &&
> > + (adev-
> >ip_versions[GC_HWIP][0] >= IP_VERSION(11, 0, 0)) &&
> > + (adev-
> >ip_versions[GC_HWIP][0] <= IP_VERSION(11, 0, 3)))
> > +   && sq_int_priv)
> > +
>   kfd_set_dbg_ev_from_interrupt(dev, pasid, -1,
> > +
>   KFD_EC_MASK(EC_QUEUE_WAVE_MEMORY_VIOLATION),
> > +   NULL, 0);*/
> 
> Which branch is this for? kfd_set_dbg_ev_from_interrupt shouldn't exist on
> the upstream branch yet. That code is still under review for upstream.
> 

My understanding is that it is for the amd-staging-drm-next branch, to make its
way upstream.
The code that calls that function is commented out.  There are other
pre-existing instances in that file on the amd-staging-drm-next branch that are
also commented out with that function.
Please advise whether I should remove it from the patch for now or keep it
commented out.

Thanks,
David B.

> Regards,
>    Felix
> 
> 
> > +   return;
> > }
> > break;
> > default:


RE: [PATCH] drm/amdgpu: Enable SA software trap.

2022-09-22 Thread Sider, Graham
[Public]

> -Original Message-
> From: amd-gfx  On Behalf Of
> David Belanger
> Sent: Thursday, September 22, 2022 12:17 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Cornwall, Jay ; Kuehling, Felix
> ; Belanger, David 
> Subject: [PATCH] drm/amdgpu: Enable SA software trap.
> 
> 
> Enables support for software trap for MES >= 4.
> Adapted from implementation from Jay Cornwall.
> 
> v2: Add IP version check in conditions.
> 
> Signed-off-by: Jay Cornwall 
> Signed-off-by: David Belanger 
> Reviewed-by: Felix Kuehling 
> ---
>  drivers/gpu/drm/amd/amdgpu/mes_v11_0.c|   6 +-
>  .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 771 +-
>  .../amd/amdkfd/cwsr_trap_handler_gfx10.asm|  21 +
>  .../gpu/drm/amd/amdkfd/kfd_int_process_v11.c  |  26 +-
>  4 files changed, 437 insertions(+), 387 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
> b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
> index b64cd46a159a..cbc506b958b1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
> @@ -185,7 +185,11 @@ static int mes_v11_0_add_hw_queue(struct
> amdgpu_mes *mes,
> mes_add_queue_pkt.trap_handler_addr = input->tba_addr;
> mes_add_queue_pkt.tma_addr = input->tma_addr;
> mes_add_queue_pkt.is_kfd_process = input->is_kfd_process;
> -   mes_add_queue_pkt.trap_en = 1;
> +
> +   if (!(((adev->mes.sched_version & AMDGPU_MES_VERSION_MASK) >=
> 4) &&
> + (adev->ip_versions[GC_HWIP][0] >= IP_VERSION(11, 0, 0)) &&
> + (adev->ip_versions[GC_HWIP][0] <= IP_VERSION(11, 0, 3
> +   mes_add_queue_pkt.trap_en = 1;

I think the value for trap_en here is backwards. It should be set to 0 for this 
condition and default to 1 otherwise.

Best,
Graham

> 
> return mes_v11_0_submit_pkt_and_poll_completion(mes,
> &mes_add_queue_pkt, sizeof(mes_add_queue_pkt),
> diff --git
> a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
> b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
> index 60a81649cf12..c7118843db05 100644
> --- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
> +++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
> @@ -742,7 +742,7 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
> 0xbf88fffe, 0x877aff7f,
> 0x0400, 0x8f7a857a,
> 0x886d7a6d, 0xb97b02dc,
> -   0x8f7b997b, 0xb97a2a05,
> +   0x8f7b997b, 0xb97a3a05,
> 0x807a817a, 0xbf0d997b,
> 0xbf850002, 0x8f7a897a,
> 0xbf820001, 0x8f7a8a7a,
> @@ -819,7 +819,7 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
> 0xbefe037c, 0xbefc0370,
> 0xf4611c7a, 0xf800,
> 0x80708470, 0xbefc037e,
> -   0xb9702a05, 0x80708170,
> +   0xb9703a05, 0x80708170,
> 0xbf0d9973, 0xbf850002,
> 0x8f708970, 0xbf820001,
> 0x8f708a70, 0xb97a1e06,
> @@ -1069,7 +1069,7 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
> 0xb9f9f816, 0x876f7bff,
> 0xf800, 0x906f8b6f,
> 0xb9efa2c3, 0xb9f3f801,
> -   0xb96e2a05, 0x806e816e,
> +   0xb96e3a05, 0x806e816e,
> 0xbf0d9972, 0xbf850002,
> 0x8f6e896e, 0xbf820001,
> 0x8f6e8a6e, 0xb96f1e06,
> @@ -2114,7 +2114,7 @@ static const uint32_t cwsr_trap_gfx10_hex[] = {
> 0x007a, 0x7e000280,
> 0xbefe037a, 0xbeff037b,
> 0xb97b02dc, 0x8f7b997b,
> -   0xb97a2a05, 0x807a817a,
> +   0xb97a3a05, 0x807a817a,
> 0xbf0d997b, 0xbf850002,
> 0x8f7a897a, 0xbf820001,
> 0x8f7a8a7a, 0xb97b1e06,
> @@ -2157,7 +2157,7 @@ static const uint32_t cwsr_trap_gfx10_hex[] = {
> 0x0100, 0xe0704100,
> 0x705d0100, 0xe0704200,
> 0x705d0200, 0xe0704300,
> -   0x705d0300, 0xb9702a05,
> +   0x705d0300, 0xb9703a05,
> 0x80708170, 0xbf0d9973,
> 0xbf850002, 0x8f708970,
> 0xbf820001, 0x8f708a70,
> @@ -2189,7 +2189,7 @@ static const uint32_t cwsr_trap_gfx10_hex[] = {
> 0xbefe03ff, 0x,
> 0xbeff0380, 0xe0704000,
> 0x705d0200, 0xbefe03c1,
> -   0xb9702a05, 0x80708170,
> +   0xb9703a05, 0x80708170,
> 0xbf0d9973, 0xbf850002,
> 0x8f708970, 0xbf820001,
> 0x8f708a70, 0xb97a1e06,
> @@ -2475,7 +2475,7 @@ static const uint32_t cwsr_trap_gfx10_hex[] = {
> 0xb9ef4803, 0x876f7bff,
> 0xf800, 0x906f8b6f,
> 0xb9efa2c3, 0xb9f3f801,
> -   0xb96e2a05, 0x806e816e,
> +   0xb96e3a05, 0x806e816e,
> 0xbf0d9972, 0xbf850002,
> 0x8f6e896e, 0xbf820001,
> 0x8f6e8a6e, 0xb96f1e06,
> @@ -2494,438 +2494,441 @@ static const uint32_t cwsr_trap_gfx10_hex[] = {
> 0xbf9f, 0xbf9f,
> 0xbf9f, 0x,
>  };
> -
>  static const uint32_t cwsr_trap_gfx11_hex[] = {
> -   0xbfa1, 0xbfa0021b,

Re: [PATCH] drm/amdgpu: Fix VRAM eviction issue

2022-09-22 Thread Christian König




Am 22.09.22 um 17:42 schrieb Arunpravin Paneer Selvam:



On 9/22/2022 9:01 PM, Christian König wrote:

Am 22.09.22 um 17:14 schrieb Arunpravin Paneer Selvam:

A user reported that when he starts a game (MTGA) with wine,
he observes the error message "failed to pin framebuffer with error -12".
Found an issue with the condition logic of the VRAM mem type eviction
decision. This patch fixes the broken if condition.

Gitlab bug link:
https://gitlab.freedesktop.org/drm/amd/-/issues/2159

Fixes: ded910f368a5 ("drm/amdgpu: Implement intersect/compatible functions")
Signed-off-by: Arunpravin Paneer Selvam 



Reviewed-by: Christian König 

Can you push it yourself now?

No, I didn't get the commit rights yet.


Pushed to drm-misc-next. Please make sure the bug is closed.

Thanks,
Christian.



Thanks,
Arun



---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c

index d1a2619fa89f..73a517bcf5c1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -746,7 +746,7 @@ static bool amdgpu_vram_mgr_intersects(struct ttm_resource_manager *man,

  (amdgpu_vram_mgr_block_size(block) >> PAGE_SHIFT);
    if (place->fpfn < lpfn &&
-    (place->lpfn && place->lpfn > fpfn))
+    (!place->lpfn || place->lpfn > fpfn))
  return true;
  }








Re: [PATCH v2] drm/amdgpu: Fix VRAM BO swap issue

2022-09-22 Thread Christian König




Am 22.09.22 um 19:14 schrieb Christian König:



Am 22.09.22 um 17:26 schrieb Arunpravin Paneer Selvam:

The DRM buddy manager allocates contiguous memory requests in
a single block or in multiple blocks. So for the ttm move operation
(in case of low vram memory) we should consider all the blocks to
compute the total memory size, which is compared with the struct
ttm_resource num_pages in order to verify that the blocks are
contiguous for the eviction process.

v2: Added a Fixes tag

Fixes: c9cad937c0c5 ("drm/amdgpu: add drm buddy support to amdgpu")
Signed-off-by: Arunpravin Paneer Selvam 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 20 ++--
  1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c

index b1c455329023..b1223c8e30c9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -426,6 +426,7 @@ static bool amdgpu_mem_visible(struct amdgpu_device *adev,

  {
  uint64_t mem_size = (u64)mem->num_pages << PAGE_SHIFT;
  struct amdgpu_res_cursor cursor;
+    u64 start, size, total_size = 0;
    if (mem->mem_type == TTM_PL_SYSTEM ||
  mem->mem_type == TTM_PL_TT)
@@ -435,8 +436,23 @@ static bool amdgpu_mem_visible(struct amdgpu_device *adev,

	amdgpu_res_first(mem, 0, mem_size, &cursor);
  -    /* ttm_resource_ioremap only supports contiguous memory */
-    if (cursor.size != mem_size)
+    do {
+    start = cursor.start;
+    size = cursor.size;
+
+    total_size += size;
+
+    amdgpu_res_next(&cursor, cursor.size);
+
+    if (!cursor.remaining)
+    break;
+
+    /* ttm_resource_ioremap only supports contiguous memory */
+    if (start + size != cursor.start)
+    return false;
+    } while (1);
+
+    if (total_size != mem_size)
  return false;


I would completely drop this extra check.


  return cursor.start + cursor.size <= adev->gmc.visible_vram_size;


Instead of this you should be able to do all of this in one go.

Something like this here should work:

amdgpu_res_first(...
end = cursor.start + cursor.size;
do {
    amdgpu_res_next(
    if (end != cursor.start)
        return false;
    end = cursor.start + cursor.size;
} while (cursor.remaining);


My fault, this should have been

while (cursor.remaining) {
...
}

Regards,
Christian.



return end <= visible_vram_size;

Saves a bit of extra calculations and variables.

Regards,
Christian.




Re: [PATCH] drm/amdgpu: pass queue size and is_aql_queue to MES

2022-09-22 Thread Felix Kuehling

Am 2022-09-20 um 10:29 schrieb Graham Sider:

Update mes_v11_api_def.h add_queue API with an is_aql_queue parameter. Also
re-use gds_size for the queue size (unused for KFD). MES requires the
queue size in order to compute the actual wptr offset within the queue
RB, since the wptr increases monotonically for AQL queues.

Signed-off-by: Graham Sider 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h   | 2 ++
  drivers/gpu/drm/amd/amdgpu/mes_v11_0.c| 4 
  drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 2 ++
  drivers/gpu/drm/amd/include/mes_v11_api_def.h | 3 ++-
  4 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
index 7b46f6bf4187..ad980f4b66e1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
@@ -222,6 +222,8 @@ struct mes_add_queue_input {
uint64_ttba_addr;
uint64_ttma_addr;
uint32_tis_kfd_process;
+   uint32_tis_aql_queue;
+   uint32_tqueue_size;
  };
  
  struct mes_remove_queue_input {

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
index b64cd46a159a..5581e03fc956 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
@@ -187,6 +187,10 @@ static int mes_v11_0_add_hw_queue(struct amdgpu_mes *mes,
mes_add_queue_pkt.is_kfd_process = input->is_kfd_process;
mes_add_queue_pkt.trap_en = 1;
  
+	/* For KFD, gds_size is re-used for queue size (needed in MES for AQL queues) */

+   mes_add_queue_pkt.is_aql_queue = input->is_aql_queue;
+   mes_add_queue_pkt.gds_size = input->queue_size;
+
return mes_v11_0_submit_pkt_and_poll_completion(mes,
_add_queue_pkt, sizeof(mes_add_queue_pkt),
offsetof(union MESAPI__ADD_QUEUE, api_status));
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index e83725a28106..8644b212cf4b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -205,6 +205,8 @@ static int add_queue_mes(struct device_queue_manager *dqm, 
struct queue *q,
}
  
  	queue_input.is_kfd_process = 1;

+   queue_input.is_aql_queue = q->properties.format;


This would be clearer:

queue_input.is_aql_queue = (q->properties.format == KFD_QUEUE_FORMAT_AQL);

Regards,
  Felix



+   queue_input.queue_size = q->properties.queue_size >> 2;
  
  	queue_input.paging = false;

queue_input.tba_addr = qpd->tba_addr;
diff --git a/drivers/gpu/drm/amd/include/mes_v11_api_def.h 
b/drivers/gpu/drm/amd/include/mes_v11_api_def.h
index 50bfa513cb35..7e85cdc5bd34 100644
--- a/drivers/gpu/drm/amd/include/mes_v11_api_def.h
+++ b/drivers/gpu/drm/amd/include/mes_v11_api_def.h
@@ -269,7 +269,8 @@ union MESAPI__ADD_QUEUE {
uint32_t map_kiq_utility_queue  : 1;
uint32_t is_kfd_process : 1;
uint32_t trap_en: 1;
-   uint32_t reserved   : 21;
+   uint32_t is_aql_queue   : 1;
+   uint32_t reserved   : 20;
};
struct MES_API_STATUS   api_status;
uint64_ttma_addr;


Re: [PATCH] drm/amdkfd: fix MQD init for GFX11 in init_mqd

2022-09-22 Thread Felix Kuehling

Am 2022-09-21 um 14:47 schrieb Graham Sider:

Set remaining compute_static_thread_mgmt_se* accordingly.

Signed-off-by: Graham Sider 


Reviewed-by: Felix Kuehling 



---
  drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c
index d982c154537e..26b53b6d673e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c
@@ -126,6 +126,10 @@ static void init_mqd(struct mqd_manager *mm, void **mqd,
m->compute_static_thread_mgmt_se1 = 0x;
m->compute_static_thread_mgmt_se2 = 0x;
m->compute_static_thread_mgmt_se3 = 0x;
+   m->compute_static_thread_mgmt_se4 = 0x;
+   m->compute_static_thread_mgmt_se5 = 0x;
+   m->compute_static_thread_mgmt_se6 = 0x;
+   m->compute_static_thread_mgmt_se7 = 0x;
  
  	m->cp_hqd_persistent_state = CP_HQD_PERSISTENT_STATE__PRELOAD_REQ_MASK |

0x55 << CP_HQD_PERSISTENT_STATE__PRELOAD_SIZE__SHIFT;


Re: [PATCH v2] drm/amdgpu: Fix VRAM BO swap issue

2022-09-22 Thread Christian König




Am 22.09.22 um 17:26 schrieb Arunpravin Paneer Selvam:

The DRM buddy manager allocates contiguous memory requests in
a single block or in multiple blocks. So for the ttm move operation
(in case of low vram memory) we should consider all the blocks to
compute the total memory size, which is compared with the struct
ttm_resource num_pages in order to verify that the blocks are
contiguous for the eviction process.

v2: Added a Fixes tag

Fixes: c9cad937c0c5 ("drm/amdgpu: add drm buddy support to amdgpu")
Signed-off-by: Arunpravin Paneer Selvam 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 20 ++--
  1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index b1c455329023..b1223c8e30c9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -426,6 +426,7 @@ static bool amdgpu_mem_visible(struct amdgpu_device *adev,
  {
uint64_t mem_size = (u64)mem->num_pages << PAGE_SHIFT;
struct amdgpu_res_cursor cursor;
+   u64 start, size, total_size = 0;
  
  	if (mem->mem_type == TTM_PL_SYSTEM ||

mem->mem_type == TTM_PL_TT)
@@ -435,8 +436,23 @@ static bool amdgpu_mem_visible(struct amdgpu_device *adev,
  
	amdgpu_res_first(mem, 0, mem_size, &cursor);
  
-	/* ttm_resource_ioremap only supports contiguous memory */

-   if (cursor.size != mem_size)
+   do {
+   start = cursor.start;
+   size = cursor.size;
+
+   total_size += size;
+
+   amdgpu_res_next(&cursor, cursor.size);
+
+   if (!cursor.remaining)
+   break;
+
+   /* ttm_resource_ioremap only supports contiguous memory */
+   if (start + size != cursor.start)
+   return false;
+   } while (1);
+
+   if (total_size != mem_size)
return false;


I would completely drop this extra check.


return cursor.start + cursor.size <= adev->gmc.visible_vram_size;


Instead of this you should be able to do all of this in one go.

Something like this here should work:

amdgpu_res_first(...
end = cursor.start + cursor.size;
do {
    amdgpu_res_next(
    if (end != cursor.start)
        return false;
    end = cursor.start + cursor.size;
} while (cursor.remaining);
return end <= visible_vram_size;

Saves a bit of extra calculations and variables.

Regards,
Christian.


Re: [PATCH] drm/amdgpu: Enable SA software trap.

2022-09-22 Thread Felix Kuehling

Am 2022-09-22 um 12:17 schrieb David Belanger:

Enables support for software trap for MES >= 4.
Adapted from implementation from Jay Cornwall.

v2: Add IP version check in conditions.

Signed-off-by: Jay Cornwall 
Signed-off-by: David Belanger 
Reviewed-by: Felix Kuehling 
---
  drivers/gpu/drm/amd/amdgpu/mes_v11_0.c|   6 +-
  .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 771 +-
  .../amd/amdkfd/cwsr_trap_handler_gfx10.asm|  21 +
  .../gpu/drm/amd/amdkfd/kfd_int_process_v11.c  |  26 +-
  4 files changed, 437 insertions(+), 387 deletions(-)

[snip]

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v11.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v11.c
index a6fcbeeb7428..4e03d19e9333 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v11.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v11.c
@@ -358,13 +358,35 @@ static void event_interrupt_wq_v11(struct kfd_dev *dev,
break;
case SQ_INTERRUPT_WORD_ENCODING_ERROR:
print_sq_intr_info_error(context_id0, 
context_id1);
+   sq_int_priv = REG_GET_FIELD(context_id0,
+   SQ_INTERRUPT_WORD_WAVE_CTXID0, 
PRIV);
sq_int_errtype = REG_GET_FIELD(context_id0,
SQ_INTERRUPT_WORD_ERROR_CTXID0, 
TYPE);
-   if (sq_int_errtype != 
SQ_INTERRUPT_ERROR_TYPE_ILLEGAL_INST &&
-   sq_int_errtype != 
SQ_INTERRUPT_ERROR_TYPE_MEMVIOL) {
+
+   switch (sq_int_errtype) {
+   case SQ_INTERRUPT_ERROR_TYPE_EDC_FUE:
+   case SQ_INTERRUPT_ERROR_TYPE_EDC_FED:
event_interrupt_poison_consumption_v11(
dev, pasid, source_id);
return;
+   case SQ_INTERRUPT_ERROR_TYPE_ILLEGAL_INST:
+   /*if (!(((adev->mes.sched_version & 
AMDGPU_MES_VERSION_MASK) >= 4) &&
+ (adev->ip_versions[GC_HWIP][0] >= 
IP_VERSION(11, 0, 0)) &&
+ (adev->ip_versions[GC_HWIP][0] 
<= IP_VERSION(11, 0, 3)))
+   && sq_int_priv)
+   
kfd_set_dbg_ev_from_interrupt(dev, pasid, -1,
+   
KFD_EC_MASK(EC_QUEUE_WAVE_ILLEGAL_INSTRUCTION),
+   NULL, 0);*/
+   return;
+   case SQ_INTERRUPT_ERROR_TYPE_MEMVIOL:
+   /*if (!(((adev->mes.sched_version & 
AMDGPU_MES_VERSION_MASK) >= 4) &&
+ (adev->ip_versions[GC_HWIP][0] >= 
IP_VERSION(11, 0, 0)) &&
+ (adev->ip_versions[GC_HWIP][0] 
<= IP_VERSION(11, 0, 3)))
+   && sq_int_priv)
+   
kfd_set_dbg_ev_from_interrupt(dev, pasid, -1,
+   
KFD_EC_MASK(EC_QUEUE_WAVE_MEMORY_VIOLATION),
+   NULL, 0);*/


Which branch is this for? kfd_set_dbg_ev_from_interrupt shouldn't exist 
on the upstream branch yet. That code is still under review for upstream.


Regards,
  Felix



+   return;
}
break;
default:


Re: [PATCH] drm/amdgpu: Enable SA software trap.

2022-09-22 Thread Alex Deucher
On Thu, Sep 22, 2022 at 12:18 PM David Belanger  wrote:
>
> Enables support for software trap for MES >= 4.
> Adapted from implementation from Jay Cornwall.
>
> v2: Add IP version check in conditions.
>
> Signed-off-by: Jay Cornwall 
> Signed-off-by: David Belanger 
> Reviewed-by: Felix Kuehling 

Acked-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/amdgpu/mes_v11_0.c|   6 +-
>  .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 771 +-
>  .../amd/amdkfd/cwsr_trap_handler_gfx10.asm|  21 +
>  .../gpu/drm/amd/amdkfd/kfd_int_process_v11.c  |  26 +-
>  4 files changed, 437 insertions(+), 387 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c 
> b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
> index b64cd46a159a..cbc506b958b1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
> @@ -185,7 +185,11 @@ static int mes_v11_0_add_hw_queue(struct amdgpu_mes *mes,
> mes_add_queue_pkt.trap_handler_addr = input->tba_addr;
> mes_add_queue_pkt.tma_addr = input->tma_addr;
> mes_add_queue_pkt.is_kfd_process = input->is_kfd_process;
> -   mes_add_queue_pkt.trap_en = 1;
> +
> +   if (!(((adev->mes.sched_version & AMDGPU_MES_VERSION_MASK) >= 4) &&
> + (adev->ip_versions[GC_HWIP][0] >= IP_VERSION(11, 0, 0)) &&
> + (adev->ip_versions[GC_HWIP][0] <= IP_VERSION(11, 0, 3
> +   mes_add_queue_pkt.trap_en = 1;
>
> return mes_v11_0_submit_pkt_and_poll_completion(mes,
> &mes_add_queue_pkt, sizeof(mes_add_queue_pkt),
> diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h 
> b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
> index 60a81649cf12..c7118843db05 100644
> --- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
> +++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
> @@ -742,7 +742,7 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
> 0xbf88fffe, 0x877aff7f,
> 0x0400, 0x8f7a857a,
> 0x886d7a6d, 0xb97b02dc,
> -   0x8f7b997b, 0xb97a2a05,
> +   0x8f7b997b, 0xb97a3a05,
> 0x807a817a, 0xbf0d997b,
> 0xbf850002, 0x8f7a897a,
> 0xbf820001, 0x8f7a8a7a,
> @@ -819,7 +819,7 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
> 0xbefe037c, 0xbefc0370,
> 0xf4611c7a, 0xf800,
> 0x80708470, 0xbefc037e,
> -   0xb9702a05, 0x80708170,
> +   0xb9703a05, 0x80708170,
> 0xbf0d9973, 0xbf850002,
> 0x8f708970, 0xbf820001,
> 0x8f708a70, 0xb97a1e06,
> @@ -1069,7 +1069,7 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
> 0xb9f9f816, 0x876f7bff,
> 0xf800, 0x906f8b6f,
> 0xb9efa2c3, 0xb9f3f801,
> -   0xb96e2a05, 0x806e816e,
> +   0xb96e3a05, 0x806e816e,
> 0xbf0d9972, 0xbf850002,
> 0x8f6e896e, 0xbf820001,
> 0x8f6e8a6e, 0xb96f1e06,
> @@ -2114,7 +2114,7 @@ static const uint32_t cwsr_trap_gfx10_hex[] = {
> 0x007a, 0x7e000280,
> 0xbefe037a, 0xbeff037b,
> 0xb97b02dc, 0x8f7b997b,
> -   0xb97a2a05, 0x807a817a,
> +   0xb97a3a05, 0x807a817a,
> 0xbf0d997b, 0xbf850002,
> 0x8f7a897a, 0xbf820001,
> 0x8f7a8a7a, 0xb97b1e06,
> @@ -2157,7 +2157,7 @@ static const uint32_t cwsr_trap_gfx10_hex[] = {
> 0x0100, 0xe0704100,
> 0x705d0100, 0xe0704200,
> 0x705d0200, 0xe0704300,
> -   0x705d0300, 0xb9702a05,
> +   0x705d0300, 0xb9703a05,
> 0x80708170, 0xbf0d9973,
> 0xbf850002, 0x8f708970,
> 0xbf820001, 0x8f708a70,
> @@ -2189,7 +2189,7 @@ static const uint32_t cwsr_trap_gfx10_hex[] = {
> 0xbefe03ff, 0x,
> 0xbeff0380, 0xe0704000,
> 0x705d0200, 0xbefe03c1,
> -   0xb9702a05, 0x80708170,
> +   0xb9703a05, 0x80708170,
> 0xbf0d9973, 0xbf850002,
> 0x8f708970, 0xbf820001,
> 0x8f708a70, 0xb97a1e06,
> @@ -2475,7 +2475,7 @@ static const uint32_t cwsr_trap_gfx10_hex[] = {
> 0xb9ef4803, 0x876f7bff,
> 0xf800, 0x906f8b6f,
> 0xb9efa2c3, 0xb9f3f801,
> -   0xb96e2a05, 0x806e816e,
> +   0xb96e3a05, 0x806e816e,
> 0xbf0d9972, 0xbf850002,
> 0x8f6e896e, 0xbf820001,
> 0x8f6e8a6e, 0xb96f1e06,
> @@ -2494,438 +2494,441 @@ static const uint32_t cwsr_trap_gfx10_hex[] = {
> 0xbf9f, 0xbf9f,
> 0xbf9f, 0x,
>  };
> -
>  static const uint32_t cwsr_trap_gfx11_hex[] = {
> -   0xbfa1, 0xbfa0021b,
> +   0xbfa1, 0xbfa0021e,
> 0xb0804006, 0xb8f8f802,
> -   0x91788678, 0xb8fbf803,
> -   0x8b6eff78, 0x2000,
> -   0xbfa10009, 0x8b6eff6d,
> -   0x00ff, 0xbfa2001e,
> -   0x8b6eff7b, 0x0400,
> -   0xbfa20041, 0xbf830010,
> -   0xb8fbf803, 0xbfa0fffa,
> -   0x8b6eff7b, 0x0900,
> -   0xbfa20015, 0x8b6eff7b,
> -   0x71ff, 0xbfa10008,
> -   0x8b6fff7b, 0x7080,
> -   0xbfa10001, 0xbeee1287,
> -   

[PATCH] drm/amdgpu: Enable SA software trap.

2022-09-22 Thread David Belanger
Enables support for software trap for MES >= 4.
Adapted from implementation from Jay Cornwall.

v2: Add IP version check in conditions.

Signed-off-by: Jay Cornwall 
Signed-off-by: David Belanger 
Reviewed-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdgpu/mes_v11_0.c|   6 +-
 .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 771 +-
 .../amd/amdkfd/cwsr_trap_handler_gfx10.asm|  21 +
 .../gpu/drm/amd/amdkfd/kfd_int_process_v11.c  |  26 +-
 4 files changed, 437 insertions(+), 387 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
index b64cd46a159a..cbc506b958b1 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
@@ -185,7 +185,11 @@ static int mes_v11_0_add_hw_queue(struct amdgpu_mes *mes,
mes_add_queue_pkt.trap_handler_addr = input->tba_addr;
mes_add_queue_pkt.tma_addr = input->tma_addr;
mes_add_queue_pkt.is_kfd_process = input->is_kfd_process;
-   mes_add_queue_pkt.trap_en = 1;
+
+   if (!(((adev->mes.sched_version & AMDGPU_MES_VERSION_MASK) >= 4) &&
+ (adev->ip_versions[GC_HWIP][0] >= IP_VERSION(11, 0, 0)) &&
+ (adev->ip_versions[GC_HWIP][0] <= IP_VERSION(11, 0, 3
+   mes_add_queue_pkt.trap_en = 1;
 
return mes_v11_0_submit_pkt_and_poll_completion(mes,
			&mes_add_queue_pkt, sizeof(mes_add_queue_pkt),
diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h 
b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
index 60a81649cf12..c7118843db05 100644
--- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
+++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
@@ -742,7 +742,7 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
0xbf88fffe, 0x877aff7f,
0x0400, 0x8f7a857a,
0x886d7a6d, 0xb97b02dc,
-   0x8f7b997b, 0xb97a2a05,
+   0x8f7b997b, 0xb97a3a05,
0x807a817a, 0xbf0d997b,
0xbf850002, 0x8f7a897a,
0xbf820001, 0x8f7a8a7a,
@@ -819,7 +819,7 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
0xbefe037c, 0xbefc0370,
0xf4611c7a, 0xf800,
0x80708470, 0xbefc037e,
-   0xb9702a05, 0x80708170,
+   0xb9703a05, 0x80708170,
0xbf0d9973, 0xbf850002,
0x8f708970, 0xbf820001,
0x8f708a70, 0xb97a1e06,
@@ -1069,7 +1069,7 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
0xb9f9f816, 0x876f7bff,
0xf800, 0x906f8b6f,
0xb9efa2c3, 0xb9f3f801,
-   0xb96e2a05, 0x806e816e,
+   0xb96e3a05, 0x806e816e,
0xbf0d9972, 0xbf850002,
0x8f6e896e, 0xbf820001,
0x8f6e8a6e, 0xb96f1e06,
@@ -2114,7 +2114,7 @@ static const uint32_t cwsr_trap_gfx10_hex[] = {
0x007a, 0x7e000280,
0xbefe037a, 0xbeff037b,
0xb97b02dc, 0x8f7b997b,
-   0xb97a2a05, 0x807a817a,
+   0xb97a3a05, 0x807a817a,
0xbf0d997b, 0xbf850002,
0x8f7a897a, 0xbf820001,
0x8f7a8a7a, 0xb97b1e06,
@@ -2157,7 +2157,7 @@ static const uint32_t cwsr_trap_gfx10_hex[] = {
0x0100, 0xe0704100,
0x705d0100, 0xe0704200,
0x705d0200, 0xe0704300,
-   0x705d0300, 0xb9702a05,
+   0x705d0300, 0xb9703a05,
0x80708170, 0xbf0d9973,
0xbf850002, 0x8f708970,
0xbf820001, 0x8f708a70,
@@ -2189,7 +2189,7 @@ static const uint32_t cwsr_trap_gfx10_hex[] = {
0xbefe03ff, 0x,
0xbeff0380, 0xe0704000,
0x705d0200, 0xbefe03c1,
-   0xb9702a05, 0x80708170,
+   0xb9703a05, 0x80708170,
0xbf0d9973, 0xbf850002,
0x8f708970, 0xbf820001,
0x8f708a70, 0xb97a1e06,
@@ -2475,7 +2475,7 @@ static const uint32_t cwsr_trap_gfx10_hex[] = {
0xb9ef4803, 0x876f7bff,
0xf800, 0x906f8b6f,
0xb9efa2c3, 0xb9f3f801,
-   0xb96e2a05, 0x806e816e,
+   0xb96e3a05, 0x806e816e,
0xbf0d9972, 0xbf850002,
0x8f6e896e, 0xbf820001,
0x8f6e8a6e, 0xb96f1e06,
@@ -2494,438 +2494,441 @@ static const uint32_t cwsr_trap_gfx10_hex[] = {
0xbf9f, 0xbf9f,
0xbf9f, 0x,
 };
-
 static const uint32_t cwsr_trap_gfx11_hex[] = {
-   0xbfa1, 0xbfa0021b,
+   0xbfa1, 0xbfa0021e,
0xb0804006, 0xb8f8f802,
-   0x91788678, 0xb8fbf803,
-   0x8b6eff78, 0x2000,
-   0xbfa10009, 0x8b6eff6d,
-   0x00ff, 0xbfa2001e,
-   0x8b6eff7b, 0x0400,
-   0xbfa20041, 0xbf830010,
-   0xb8fbf803, 0xbfa0fffa,
-   0x8b6eff7b, 0x0900,
-   0xbfa20015, 0x8b6eff7b,
-   0x71ff, 0xbfa10008,
-   0x8b6fff7b, 0x7080,
-   0xbfa10001, 0xbeee1287,
-   0xb8eff801, 0x846e8c6e,
-   0x8b6e6f6e, 0xbfa2000a,
+   0x9178ff78, 0x00020006,
+   0xb8fbf803, 0xbf0d9f6d,
+   0xbfa20006, 0x8b6eff78,
+   0x2000, 0xbfa10009,
0x8b6eff6d, 0x00ff,
-   0xbfa20007, 0xb8eef801,
-   0x8b6eff6e, 0x0800,
-   0xbfa20003, 0x8b6eff7b,
-   

Re: [PATCH v4] drm/sched: Add FIFO sched policy to run queue v3

2022-09-22 Thread Andrey Grodzovsky



On 2022-09-22 11:03, Luben Tuikov wrote:

The title of this patch has "v3", but "v4" in the title prefix.
If you're using "-v" to git-format-patch, please remove the "v3" from the title.

Inlined:

On 2022-09-21 14:28, Andrey Grodzovsky wrote:

When many entities competing for same run queue on
the same scheduler When many entities have  unacceptably long wait
time for some jobs waiting stuck in the run queue before being picked
up are observed (seen using  GPUVis).

Use this as your opening:

"When many entities are competing for the same run queue on the same scheduler,
we observe unusually long wait times and some jobs get starved. This has
been observed on GPUVis."


The issue is due to the Round Robin policy used by schedulers
to pick up the next entity's job queue for execution. Under stress
of many entities and long job queues within entity some
jobs could be stack for very long time in it's entity's

"stuck", not "stack".


queue before being popped from the queue and executed
while for other entities with smaller job queues a job
might execute earlier even though that job arrived later
then the job in the long queue.

Fix:

Add FIFO selection policy to entities in run queue, chose next entity
on run queue in such order that if job on one entity arrived
earlier then job on another entity the first job will start
executing earlier regardless of the length of the entity's job
queue.

v2:

Switch to rb tree structure for entities based on TS of
oldest job waiting in the job queue of an entity. Improves next
entity extraction to O(1). Entity TS update
O(log N) where N is the number of entities in the run-queue

Drop default option in module control parameter.


v3:
Various cosmetical fixes and minor refactoring of fifo update function. (Luben)

v4:
Switch drm_sched_rq_select_entity_fifo to in order search (Luben)

Signed-off-by: Andrey Grodzovsky 

Tested-by: Li Yunxiang (Teddy) 
---
  drivers/gpu/drm/scheduler/sched_entity.c |  26 +-
  drivers/gpu/drm/scheduler/sched_main.c   | 107 ++-
  include/drm/gpu_scheduler.h  |  32 +++
  3 files changed, 159 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 6b25b2f4f5a3..f3ffce3c9304 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -73,6 +73,7 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
entity->priority = priority;
entity->sched_list = num_sched_list > 1 ? sched_list : NULL;
entity->last_scheduled = NULL;
+   RB_CLEAR_NODE(&entity->rb_tree_node);
  
  	if(num_sched_list)

entity->rq = &sched_list[0]->sched_rq[entity->priority];
@@ -417,14 +418,16 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
  
sched_job = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));

if (!sched_job)
-   return NULL;
+   goto skip;
  
  	while ((entity->dependency =

drm_sched_job_dependency(sched_job, entity))) {
trace_drm_sched_job_wait_dep(sched_job, entity->dependency);
  
-		if (drm_sched_entity_add_dependency_cb(entity))

-   return NULL;
+   if (drm_sched_entity_add_dependency_cb(entity)) {
+   sched_job = NULL;
+   goto skip;
+   }
}
  
  	/* skip jobs from entity that marked guilty */

@@ -443,6 +446,16 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
smp_wmb();
  
spsc_queue_pop(&entity->job_queue);

+
+   /*
+* It's when head job is extracted we can access the next job (or empty)
+* queue and update the entity location in the min heap accordingly.
+*/
+skip:
+   if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
+   drm_sched_rq_update_fifo(entity,
+(sched_job ? sched_job->submit_ts : 
ktime_get()));
+
return sched_job;
  }
  
@@ -502,11 +515,13 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)

  {
struct drm_sched_entity *entity = sched_job->entity;
bool first;
+   ktime_t ts =  ktime_get();
  
  	trace_drm_sched_job(sched_job, entity);

atomic_inc(entity->rq->sched->score);
WRITE_ONCE(entity->last_user, current->group_leader);
first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node);
+   sched_job->submit_ts = ts;
  
  	/* first job wakes up scheduler */

if (first) {
@@ -518,8 +533,13 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
DRM_ERROR("Trying to push to a killed entity\n");
return;
}
+
drm_sched_rq_add_entity(entity->rq, entity);
spin_unlock(&entity->rq_lock);
+
+   if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
+ 
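
The FIFO selection rule described in this patch — pick the entity whose oldest pending job has the earliest submit timestamp — can be sketched in user space. This is a minimal illustration only: the kernel keeps entities in an rb-tree for O(1) peek and O(log N) update, and the struct and function names below are hypothetical stand-ins, not the drm_sched API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of a run queue entry: each entity is keyed by the submit
 * timestamp of the head job in its queue. */
struct entity {
	uint64_t oldest_ts;	/* submit_ts of the oldest pending job */
	int has_job;		/* non-zero if the job queue is non-empty */
};

static struct entity *pick_entity_fifo(struct entity *rq, size_t n)
{
	struct entity *best = NULL;

	for (size_t i = 0; i < n; i++) {
		if (!rq[i].has_job)
			continue;
		/* earliest-arrived job wins, regardless of queue length */
		if (!best || rq[i].oldest_ts < best->oldest_ts)
			best = &rq[i];
	}
	return best;
}
```

The linear scan makes the ordering rule obvious; the v2 changelog's switch to an rb-tree keyed on the same timestamp is purely a complexity optimization of this selection.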

Re: [PATCH] drm/amdgpu: Fix VRAM eviction issue

2022-09-22 Thread Arunpravin Paneer Selvam




On 9/22/2022 9:01 PM, Christian König wrote:

Am 22.09.22 um 17:14 schrieb Arunpravin Paneer Selvam:

A user reported that when he starts a game (MTGA) with wine,
he observed an error msg failed to pin framebuffer with error -12.
Found an issue with the VRAM mem type eviction decision condition
logic. This patch will fix the if condition code error.

Gitlab bug link:
https://gitlab.freedesktop.org/drm/amd/-/issues/2159

Fixes: ded910f368a5 ("drm/amdgpu: Implement intersect/compatible 
functions")
Signed-off-by: Arunpravin Paneer Selvam 



Reviewed-by: Christian König 

Can you push it yourself now?

No, I didn't get the commit rights yet.

Thanks,
Arun



---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c

index d1a2619fa89f..73a517bcf5c1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -746,7 +746,7 @@ static bool amdgpu_vram_mgr_intersects(struct ttm_resource_manager *man,

  (amdgpu_vram_mgr_block_size(block) >> PAGE_SHIFT);
    if (place->fpfn < lpfn &&
-    (place->lpfn && place->lpfn > fpfn))
+    (!place->lpfn || place->lpfn > fpfn))
  return true;
  }






Re: [PATCH] drm/amdgpu: Fix VRAM eviction issue

2022-09-22 Thread Christian König

Am 22.09.22 um 17:14 schrieb Arunpravin Paneer Selvam:

A user reported that when he starts a game (MTGA) with wine,
he observed an error msg failed to pin framebuffer with error -12.
Found an issue with the VRAM mem type eviction decision condition
logic. This patch will fix the if condition code error.

Gitlab bug link:
https://gitlab.freedesktop.org/drm/amd/-/issues/2159

Fixes: ded910f368a5 ("drm/amdgpu: Implement intersect/compatible functions")
Signed-off-by: Arunpravin Paneer Selvam 


Reviewed-by: Christian König 

Can you push it yourself now?


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index d1a2619fa89f..73a517bcf5c1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -746,7 +746,7 @@ static bool amdgpu_vram_mgr_intersects(struct ttm_resource_manager *man,
(amdgpu_vram_mgr_block_size(block) >> PAGE_SHIFT);
  
  		if (place->fpfn < lpfn &&

-   (place->lpfn && place->lpfn > fpfn))
+   (!place->lpfn || place->lpfn > fpfn))
return true;
}
  




[PATCH v2] drm/amdgpu: Fix VRAM BO swap issue

2022-09-22 Thread Arunpravin Paneer Selvam
DRM buddy manager allocates the contiguous memory requests in
a single block or multiple blocks. So for the ttm move operation
(in case of low vram memory) we should consider all the blocks to
compute the total memory size, which is compared with the struct
ttm_resource num_pages in order to verify that the blocks are
contiguous for the eviction process.

v2: Added a Fixes tag

Fixes: c9cad937c0c5 ("drm/amdgpu: add drm buddy support to amdgpu")
Signed-off-by: Arunpravin Paneer Selvam 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 20 ++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index b1c455329023..b1223c8e30c9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -426,6 +426,7 @@ static bool amdgpu_mem_visible(struct amdgpu_device *adev,
 {
uint64_t mem_size = (u64)mem->num_pages << PAGE_SHIFT;
struct amdgpu_res_cursor cursor;
+   u64 start, size, total_size = 0;
 
if (mem->mem_type == TTM_PL_SYSTEM ||
mem->mem_type == TTM_PL_TT)
@@ -435,8 +436,23 @@ static bool amdgpu_mem_visible(struct amdgpu_device *adev,
 
amdgpu_res_first(mem, 0, mem_size, &cursor);
 
-   /* ttm_resource_ioremap only supports contiguous memory */
-   if (cursor.size != mem_size)
+   do {
+   start = cursor.start;
+   size = cursor.size;
+
+   total_size += size;
+
+   amdgpu_res_next(&cursor, cursor.size);
+
+   if (!cursor.remaining)
+   break;
+
+   /* ttm_resource_ioremap only supports contiguous memory */
+   if (start + size != cursor.start)
+   return false;
+   } while (1);
+
+   if (total_size != mem_size)
return false;
 
return cursor.start + cursor.size <= adev->gmc.visible_vram_size;
-- 
2.25.1
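
The cursor walk in the patch above amounts to: sum the sizes of all blocks backing the resource, and require each block to begin exactly where the previous one ended. A minimal user-space sketch of that check — the struct and function names are illustrative stand-ins for the amdgpu_res_cursor walk, not kernel API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one allocated block backing a resource. */
struct block {
	uint64_t start;
	uint64_t size;
};

static bool blocks_contiguous(const struct block *blocks, size_t n,
			      uint64_t mem_size)
{
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++) {
		total += blocks[i].size;
		/* contiguity: the next block must begin at start + size */
		if (i + 1 < n &&
		    blocks[i].start + blocks[i].size != blocks[i + 1].start)
			return false;
	}
	/* the blocks must also cover the full resource size */
	return total == mem_size;
}
```

The pre-patch code only compared the first cursor segment against mem_size, so a contiguous allocation split across multiple buddy blocks was wrongly rejected.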



Re: [PATCH] drm/amdgpu: Fix VRAM BO swap issue

2022-09-22 Thread Arunpravin Paneer Selvam




On 9/22/2022 7:04 PM, Alex Deucher wrote:

On Thu, Sep 22, 2022 at 9:19 AM Arunpravin Paneer Selvam
 wrote:

DRM buddy manager allocates the contiguous memory requests in
a single block or multiple blocks. So for the ttm move operation
(incase of low vram memory) we should consider all the blocks to
compute the total memory size which compared with the struct
ttm_resource num_pages in order to verify that the blocks are
contiguous for the eviction process.

Signed-off-by: Arunpravin Paneer Selvam 

Please add a Fixes line and any relevant bug report links.

Hi Alex,
There is no bug created for this issue, I will add Fixes line.

Thanks,
Arun


Alex


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 20 ++--
  1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index b1c455329023..b1223c8e30c9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -426,6 +426,7 @@ static bool amdgpu_mem_visible(struct amdgpu_device *adev,
  {
 uint64_t mem_size = (u64)mem->num_pages << PAGE_SHIFT;
 struct amdgpu_res_cursor cursor;
+   u64 start, size, total_size = 0;

 if (mem->mem_type == TTM_PL_SYSTEM ||
 mem->mem_type == TTM_PL_TT)
@@ -435,8 +436,23 @@ static bool amdgpu_mem_visible(struct amdgpu_device *adev,

 amdgpu_res_first(mem, 0, mem_size, &cursor);

-   /* ttm_resource_ioremap only supports contiguous memory */
-   if (cursor.size != mem_size)
+   do {
+   start = cursor.start;
+   size = cursor.size;
+
+   total_size += size;
+
+   amdgpu_res_next(&cursor, cursor.size);
+
+   if (!cursor.remaining)
+   break;
+
+   /* ttm_resource_ioremap only supports contiguous memory */
+   if (start + size != cursor.start)
+   return false;
+   } while (1);
+
+   if (total_size != mem_size)
 return false;

 return cursor.start + cursor.size <= adev->gmc.visible_vram_size;
--
2.25.1





[PATCH] drm/amdgpu: Fix VRAM eviction issue

2022-09-22 Thread Arunpravin Paneer Selvam
A user reported that when he starts a game (MTGA) with Wine,
he observed the error message "failed to pin framebuffer with error -12".
Found an issue with the VRAM mem type eviction decision condition
logic. This patch fixes the if condition code error.

Gitlab bug link:
https://gitlab.freedesktop.org/drm/amd/-/issues/2159

Fixes: ded910f368a5 ("drm/amdgpu: Implement intersect/compatible functions")
Signed-off-by: Arunpravin Paneer Selvam 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index d1a2619fa89f..73a517bcf5c1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -746,7 +746,7 @@ static bool amdgpu_vram_mgr_intersects(struct ttm_resource_manager *man,
(amdgpu_vram_mgr_block_size(block) >> PAGE_SHIFT);
 
if (place->fpfn < lpfn &&
-   (place->lpfn && place->lpfn > fpfn))
+   (!place->lpfn || place->lpfn > fpfn))
return true;
}
 
-- 
2.25.1
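
The fix changes the placement/block intersection test so that a placement with no upper bound (lpfn == 0, as with ttm_place) is treated as overlapping any block. A stand-alone sketch of the corrected predicate, under the assumption that a block spans page frames [block_fpfn, block_lpfn) — names here are illustrative, not the kernel's:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Does the requested placement range intersect a block's range?
 * place_lpfn == 0 means "no upper limit".  The pre-fix condition
 * required place_lpfn to be non-zero, so unbounded placements were
 * wrongly reported as non-intersecting, breaking eviction decisions. */
static bool place_intersects(uint32_t place_fpfn, uint32_t place_lpfn,
			     uint32_t block_fpfn, uint32_t block_lpfn)
{
	return place_fpfn < block_lpfn &&
	       (!place_lpfn || place_lpfn > block_fpfn);
}
```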



Re: [PATCH v4] drm/sched: Add FIFO sched policy to run queue v3

2022-09-22 Thread Luben Tuikov
The title of this patch has "v3", but "v4" in the title prefix.
If you're using "-v" to git-format-patch, please remove the "v3" from the title.

Inlined:

On 2022-09-21 14:28, Andrey Grodzovsky wrote:
> When many entities competing for same run queue on
> the same scheduler When many entities have  unacceptably long wait
> time for some jobs waiting stuck in the run queue before being picked
> up are observed (seen using  GPUVis).

Use this as your opening:

"When many entities are competing for the same run queue on the same scheduler,
we observe unusually long wait times and some jobs get starved. This has
been observed on GPUVis."

> The issue is due to the Round Robin policy used by schedulers
> to pick up the next entity's job queue for execution. Under stress
> of many entities and long job queues within entity some
> jobs could be stack for very long time in it's entity's

"stuck", not "stack".

> queue before being popped from the queue and executed
> while for other entities with smaller job queues a job
> might execute earlier even though that job arrived later
> then the job in the long queue.
>    
> Fix:
> Add FIFO selection policy to entities in run queue, chose next entity
> on run queue in such order that if job on one entity arrived
> earlier then job on another entity the first job will start
> executing earlier regardless of the length of the entity's job
> queue.
>    
> v2:
> Switch to rb tree structure for entities based on TS of
> oldest job waiting in the job queue of an entity. Improves next
> entity extraction to O(1). Entity TS update
> O(log N) where N is the number of entities in the run-queue
>    
> Drop default option in module control parameter.
> 
> v3:
> Various cosmetical fixes and minor refactoring of fifo update function. 
> (Luben)
> 
> v4:
> Switch drm_sched_rq_select_entity_fifo to in order search (Luben)
>    
> Signed-off-by: Andrey Grodzovsky 
> Tested-by: Li Yunxiang (Teddy) 
> ---
>  drivers/gpu/drm/scheduler/sched_entity.c |  26 +-
>  drivers/gpu/drm/scheduler/sched_main.c   | 107 ++-
>  include/drm/gpu_scheduler.h  |  32 +++
>  3 files changed, 159 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> index 6b25b2f4f5a3..f3ffce3c9304 100644
> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> @@ -73,6 +73,7 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
>   entity->priority = priority;
>   entity->sched_list = num_sched_list > 1 ? sched_list : NULL;
>   entity->last_scheduled = NULL;
> + RB_CLEAR_NODE(&entity->rb_tree_node);
>  
>   if(num_sched_list)
>   entity->rq = &sched_list[0]->sched_rq[entity->priority];
> @@ -417,14 +418,16 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
>  
>   sched_job = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
>   if (!sched_job)
> - return NULL;
> + goto skip;
>  
>   while ((entity->dependency =
>   drm_sched_job_dependency(sched_job, entity))) {
>   trace_drm_sched_job_wait_dep(sched_job, entity->dependency);
>  
> - if (drm_sched_entity_add_dependency_cb(entity))
> - return NULL;
> + if (drm_sched_entity_add_dependency_cb(entity)) {
> + sched_job = NULL;
> + goto skip;
> + }
>   }
>  
>   /* skip jobs from entity that marked guilty */
> @@ -443,6 +446,16 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
>   smp_wmb();
>  
>   spsc_queue_pop(&entity->job_queue);
> +
> + /*
> +  * It's when head job is extracted we can access the next job (or empty)
> +  * queue and update the entity location in the min heap accordingly.
> +  */
> +skip:
> + if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
> + drm_sched_rq_update_fifo(entity,
> +  (sched_job ? sched_job->submit_ts : 
> ktime_get()));
> +
>   return sched_job;
>  }
>  
> @@ -502,11 +515,13 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
>  {
>   struct drm_sched_entity *entity = sched_job->entity;
>   bool first;
> + ktime_t ts =  ktime_get();
>  
>   trace_drm_sched_job(sched_job, entity);
>   atomic_inc(entity->rq->sched->score);
>   WRITE_ONCE(entity->last_user, current->group_leader);
>   first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node);
> + sched_job->submit_ts = ts;
>  
>   /* first job wakes up scheduler */
>   if (first) {
> @@ -518,8 +533,13 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
>   DRM_ERROR("Trying to push to a killed entity\n");
>   return;
>   }
> +
>   

Re: [PATCH v2] drivers/amd/pm: check the return value of amdgpu_bo_kmap

2022-09-22 Thread Alex Deucher
Applied.  Thanks!

On Thu, Sep 22, 2022 at 2:49 AM Christian König
 wrote:
>
> Am 22.09.22 um 06:17 schrieb Li Zhong:
> > amdgpu_bo_kmap() returns error when fails to map buffer object. Add the
> > error check and propagate the error.
> >
> > Signed-off-by: Li Zhong 
>
> We usually use "r" as return and error variables, but that's just a nit.
>
> Reviewed-by: Christian König 
>
> > ---
> >   drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c | 5 -
> >   1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c b/drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c
> > index 1eb4e613b27a..ec055858eb95 100644
> > --- a/drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c
> > +++ b/drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c
> > @@ -1485,6 +1485,7 @@ static int pp_get_prv_buffer_details(void *handle, 
> > void **addr, size_t *size)
> >   {
> >   struct pp_hwmgr *hwmgr = handle;
> >   struct amdgpu_device *adev = hwmgr->adev;
> > + int err;
> >
> >   if (!addr || !size)
> >   return -EINVAL;
> > @@ -1492,7 +1493,9 @@ static int pp_get_prv_buffer_details(void *handle, void **addr, size_t *size)
> >   *addr = NULL;
> >   *size = 0;
> >   if (adev->pm.smu_prv_buffer) {
> > - amdgpu_bo_kmap(adev->pm.smu_prv_buffer, addr);
> > + err = amdgpu_bo_kmap(adev->pm.smu_prv_buffer, addr);
> > + if (err)
> > + return err;
> >   *size = adev->pm.smu_prv_buffer_size;
> >   }
> >
>


Re: [PATCH v2 0/6] Add support for atomic async page-flips

2022-09-22 Thread André Almeida

Hi Simon,

On 8/30/22 14:29, Simon Ser wrote:

This series adds support for DRM_MODE_PAGE_FLIP_ASYNC for atomic
commits, aka. "immediate flip" (which might result in tearing).
The feature was only available via the legacy uAPI, however for
gaming use-cases it may be desirable to enable it via the atomic
uAPI too.

- v1: https://patchwork.freedesktop.org/series/107683/
- User-space patch: https://github.com/Plagman/gamescope/pull/595
- IGT patch: https://patchwork.freedesktop.org/series/107681/

Main changes in v2: add docs, fail atomic commit if async flip isn't
possible.

Tested on an AMD Picasso iGPU.



I tested on an AMD vangogh GPU and worked with the IGT tests.

This series is Reviewed-by: André Almeida 


Simon Ser (6):
   amd/display: only accept async flips for fast updates
   drm: document DRM_MODE_PAGE_FLIP_ASYNC
   drm: introduce drm_mode_config.atomic_async_page_flip_not_supported
   drm: allow DRM_MODE_PAGE_FLIP_ASYNC for atomic commits
   drm: introduce DRM_CAP_ATOMIC_ASYNC_PAGE_FLIP
   amd/display: indicate support for atomic async page-flips on DC

  .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c |  8 ++
  .../amd/display/amdgpu_dm/amdgpu_dm_crtc.c| 10 +++
  drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c  |  1 +
  drivers/gpu/drm/drm_atomic_uapi.c | 28 +--
  drivers/gpu/drm/drm_ioctl.c   |  5 
  drivers/gpu/drm/i915/display/intel_display.c  |  1 +
  drivers/gpu/drm/nouveau/nouveau_display.c |  1 +
  drivers/gpu/drm/vc4/vc4_kms.c |  1 +
  include/drm/drm_mode_config.h | 11 
  include/uapi/drm/drm.h| 10 ++-
  include/uapi/drm/drm_mode.h   | 11 
  11 files changed, 83 insertions(+), 4 deletions(-)



Re: [PATCH 23/31] Add debug option for exiting idle optimizations on cursor updates

2022-09-22 Thread Pillai, Aurabindo
[AMD Official Use Only - General]

Hi Jas,

Please add drm/amd/display prefix to the patch title

--

Regards,
Jay

From: Dhillon, Jasdeep 
Sent: Wednesday, September 21, 2022 8:18 PM
To: amd-gfx@lists.freedesktop.org 
Cc: Wentland, Harry ; Li, Sun peng (Leo) 
; Lakha, Bhawanpreet ; Siqueira, 
Rodrigo ; Pillai, Aurabindo 
; Zhuo, Qingqing (Lillian) ; 
Li, Roman ; Lin, Wayne ; Wang, Chao-kai 
(Stylon) ; Chiu, Solomon ; Kotarac, 
Pavle ; Gutierrez, Agustin ; 
Syu, Brandon ; Cyr, Aric ; Dhillon, 
Jasdeep 
Subject: [PATCH 23/31] Add debug option for exiting idle optimizations on 
cursor updates

From: Brandon Syu 

[Description]
- Have option to exit idle opt on cursor updates
for debug and optimizations purposes

Reviewed-by: Aric Cyr 
Acked-by: Jasdeep Dhillon 
Signed-off-by: Brandon Syu
---
 drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c   | 3 ++-
 drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c | 1 +
 drivers/gpu/drm/amd/display/dc/dcn302/dcn302_resource.c | 3 ++-
 drivers/gpu/drm/amd/display/dc/dcn303/dcn303_resource.c | 1 +
 4 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c
index f6f3878c99b8..3a3b2ac791c7 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c
@@ -724,7 +724,8 @@ static const struct dc_debug_options debug_defaults_drv = {
 .dwb_fi_phase = -1, // -1 = disable,
 .dmub_command_table = true,
 .disable_psr = false,
-   .use_max_lb = true
+   .use_max_lb = true,
+   .exit_idle_opt_for_cursor_updates = true
 };

 static const struct dc_debug_options debug_defaults_diags = {
diff --git a/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c b/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c
index 0c2b15a0f3a7..559e563d5bc1 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c
@@ -700,6 +700,7 @@ static const struct dc_debug_options debug_defaults_drv = {
 .dwb_fi_phase = -1, // -1 = disable
 .dmub_command_table = true,
 .use_max_lb = false,
+   .exit_idle_opt_for_cursor_updates = true
 };

 static const struct dc_debug_options debug_defaults_diags = {
diff --git a/drivers/gpu/drm/amd/display/dc/dcn302/dcn302_resource.c b/drivers/gpu/drm/amd/display/dc/dcn302/dcn302_resource.c
index 4fab537e822f..b925b6ddde5a 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn302/dcn302_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn302/dcn302_resource.c
@@ -93,7 +93,8 @@ static const struct dc_debug_options debug_defaults_drv = {
 .underflow_assert_delay_us = 0x,
 .dwb_fi_phase = -1, // -1 = disable,
 .dmub_command_table = true,
-   .use_max_lb = true
+   .use_max_lb = true,
+   .exit_idle_opt_for_cursor_updates = true
 };

 static const struct dc_debug_options debug_defaults_diags = {
diff --git a/drivers/gpu/drm/amd/display/dc/dcn303/dcn303_resource.c b/drivers/gpu/drm/amd/display/dc/dcn303/dcn303_resource.c
index d97076648acb..527d5c902878 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn303/dcn303_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn303/dcn303_resource.c
@@ -77,6 +77,7 @@ static const struct dc_debug_options debug_defaults_drv = {
 .underflow_assert_delay_us = 0x,
 .dwb_fi_phase = -1, // -1 = disable,
 .dmub_command_table = true,
+   .exit_idle_opt_for_cursor_updates = true,
 .disable_idle_power_optimizations = false,
 };

--
2.25.1



Re: [PATCH 21/31] Add ABM control to panel_config struct.

2022-09-22 Thread Pillai, Aurabindo
[AMD Official Use Only - General]

Hi Jas,

Please add drm/amd/display prefix to the patch title.

--

Regards,
Jay

From: Dhillon, Jasdeep 
Sent: Wednesday, September 21, 2022 8:18 PM
To: amd-gfx@lists.freedesktop.org 
Cc: Wentland, Harry ; Li, Sun peng (Leo) 
; Lakha, Bhawanpreet ; Siqueira, 
Rodrigo ; Pillai, Aurabindo 
; Zhuo, Qingqing (Lillian) ; 
Li, Roman ; Lin, Wayne ; Wang, Chao-kai 
(Stylon) ; Chiu, Solomon ; Kotarac, 
Pavle ; Gutierrez, Agustin ; 
Chen, Ian ; Pavic, Josip ; Dhillon, 
Jasdeep 
Subject: [PATCH 21/31] Add ABM control to panel_config struct.

From: Ian Chen 

Reviewed-by: Josip Pavic 
Acked-by: Jasdeep Dhillon 
Signed-off-by: Ian Chen 
---
 drivers/gpu/drm/amd/display/dc/dc_link.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/dc/dc_link.h 
b/drivers/gpu/drm/amd/display/dc/dc_link.h
index 6e49ec262487..bf5f9e2773bc 100644
--- a/drivers/gpu/drm/amd/display/dc/dc_link.h
+++ b/drivers/gpu/drm/amd/display/dc/dc_link.h
@@ -127,6 +127,12 @@ struct dc_panel_config {
 unsigned int extra_t12_ms;
 unsigned int extra_post_OUI_ms;
 } pps;
+   // ABM
+   struct varib {
+   unsigned int varibright_feature_enable;
+   unsigned int def_varibright_level;
+   unsigned int abm_config_setting;
+   } varib;
 // edp DSC
 struct dsc {
 bool disable_dsc_edp;
--
2.25.1



Re: [REGRESSION] Graphical issues on Lenovo Yoga 7 14ARB7 laptop since v6.0-rc1 (bisected)

2022-09-22 Thread Alex Deucher
On Thu, Sep 22, 2022 at 8:54 AM Thorsten Leemhuis
 wrote:
>
> Hi, this is your Linux kernel regression tracker. Top-posting for once,
> to make this easily accessible to everyone.
>
> @amdgpu developers, what up here? August afaics didn't even get a single
> reply for his report that even identifies the change that's causing the
> problem. We're already late in the development cycle, so it would be
> good if someone could take a closer look into this before it's too late
> for 6.0.

Been a busy week.  Haven't had a chance to look into this yet.  Does
the issue still happen with this patch:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=66f99628eb24409cb8feb5061f78283c8b65f820

Alex

>
> Ciao, Thorsten
>
> #regzbot poke
>
> On 17.09.22 18:57, August Wikerfors wrote:
> > Hi,
> > with every kernel version since v6.0-rc1, including the latest git
> > master, there are constant graphical issues on this laptop, such as
> > heavy stuttering (this is especially noticeable while typing on the
> > keyboard), parts of the screen showing random noise, and the entire
> > desktop environment freezing.
> >
> > I bisected the issue which showed that this is the first bad commit:
> >
> >> commit 7cc191ee7621b7145c6cc9c18a4e1929bb5f136e
> >> Author: Leo Li 
> >> Date:   Wed Mar 30 12:45:09 2022 -0400
> >>
> >> drm/amd/display: Implement MPO PSR SU
> >> [WHY]
> >> For additional power savings, PSR SU (also referred to as
> >> PSR2) can be
> >> enabled on eDP panels with PSR SU support.
> >> PSR2 saves more power compared to PSR1 by allowing more
> >> opportunities
> >> for the display hardware to be shut down. In comparison to PSR1, Shut
> >> down can now occur in-between frames, as well as in display regions
> >> where there is no visible update. In otherwords, it allows for some
> >> display hw components to be enabled only for a **selectively
> >> updated**
> >> region of the visible display. Hence PSR SU.
> >> [HOW]
> >> To define the SU region, support from the OS is required. OS
> >> needs to
> >> inform driver of damaged regions that need to be flushed to the eDP
> >> panel. Today, such support is lacking in most compositors.
> >> Therefore, an in-between solution is to implement PSR SU for
> >> MPO and
> >> cursor scenarios. The plane bounds can be used to define the damaged
> >> region to be flushed to panel. This is achieved by:
> >> * Leveraging dm_crtc_state->mpo_requested flag to identify
> >> when MPO is
> >>   enabled.
> >> * If MPO is enabled, only add updated plane bounds to dirty region.
> >>   Determine plane update by either:
> >> * Existence of drm damaged clips attached to the plane (added
> >> by a
> >>   damage-aware compositor)
> >> * Change in fb id (flip)
> >> * Change in plane bounds (position and dimensions)
> >> * If cursor is enabled, the old_pos and new_pos of cursor plus cursor
> >>   size is used as damaged regions(*).
> >> (*) Cursor updates follow a different code path through DC.
> >> PSR SU for
> >> cursor is already implemented in DC, and the only thing required to
> >> enable is to set DC_PSR_VERSION_SU_1 on the eDP link. See
> >> dcn10_dmub_update_cursor_data().
> >> Signed-off-by: Leo Li 
> >> Acked-by: Leo Li 
> >> Reviewed-by: Harry Wentland 
> >> Signed-off-by: Alex Deucher 
> >
> > #regzbot introduced: 7cc191ee7621b7145c6cc9c18a4e1929bb5f136e
> >
> > Note that while bisecting I also needed to apply commit
> > 9946e39fe8d0a5da9eb947d8e40a7ef204ba016e as the keyboard doesn't work
> > without it.
> >
> > Laptop model: Lenovo Yoga 7 14ARB7
> > CPU: AMD Ryzen 5 6600U
> > Kernel config:
> > https://raw.githubusercontent.com/archlinux/svntogit-packages/aa564cf7088b1d834ef4cda9cb48ff0283fde5c5/trunk/config
> > Distribution: Arch Linux
> > Desktop environment: KDE Plasma 5.25.5
> >
> > lspci:
> >> $ lspci -nn
> >> 00:00.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device
> >> [1022:14b5] (rev 01)
> >> 00:00.2 IOMMU [0806]: Advanced Micro Devices, Inc. [AMD] Device
> >> [1022:14b6]
> >> 00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device
> >> [1022:14b7] (rev 01)
> >> 00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device
> >> [1022:14b7] (rev 01)
> >> 00:02.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device
> >> [1022:14ba]
> >> 00:02.4 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device
> >> [1022:14ba]
> >> 00:02.5 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device
> >> [1022:14ba]
> >> 00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device
> >> [1022:14b7] (rev 01)
> >> 00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device
> >> [1022:14cd]
> >> 00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device
> >> [1022:14b7] (rev 01)
> >> 00:08.0 Host 

Re: [PATCH] drm/amdgpu: Fix VRAM BO swap issue

2022-09-22 Thread Alex Deucher
On Thu, Sep 22, 2022 at 9:19 AM Arunpravin Paneer Selvam
 wrote:
>
> DRM buddy manager allocates the contiguous memory requests in
> a single block or multiple blocks. So for the ttm move operation
> (in case of low VRAM memory) we should consider all the blocks to
> compute the total memory size, which is compared with the struct
> ttm_resource num_pages in order to verify that the blocks are
> contiguous for the eviction process.
>
> Signed-off-by: Arunpravin Paneer Selvam 

Please add a Fixes line and any relevant bug report links.

Alex

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 20 ++--
>  1 file changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index b1c455329023..b1223c8e30c9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -426,6 +426,7 @@ static bool amdgpu_mem_visible(struct amdgpu_device *adev,
>  {
> uint64_t mem_size = (u64)mem->num_pages << PAGE_SHIFT;
> struct amdgpu_res_cursor cursor;
> +   u64 start, size, total_size = 0;
>
> if (mem->mem_type == TTM_PL_SYSTEM ||
> mem->mem_type == TTM_PL_TT)
> @@ -435,8 +436,23 @@ static bool amdgpu_mem_visible(struct amdgpu_device 
> *adev,
>
> amdgpu_res_first(mem, 0, mem_size, );
>
> -   /* ttm_resource_ioremap only supports contiguous memory */
> -   if (cursor.size != mem_size)
> +   do {
> +   start = cursor.start;
> +   size = cursor.size;
> +
> +   total_size += size;
> +
> +   amdgpu_res_next(, cursor.size);
> +
> +   if (!cursor.remaining)
> +   break;
> +
> +   /* ttm_resource_ioremap only supports contiguous memory */
> +   if (start + size != cursor.start)
> +   return false;
> +   } while (1);
> +
> +   if (total_size != mem_size)
> return false;
>
> return cursor.start + cursor.size <= adev->gmc.visible_vram_size;
> --
> 2.25.1
>


[PATCH] drm/amdgpu: Fix VRAM BO swap issue

2022-09-22 Thread Arunpravin Paneer Selvam
DRM buddy manager allocates the contiguous memory requests in
a single block or multiple blocks. So for the ttm move operation
(in case of low VRAM memory) we should consider all the blocks to
compute the total memory size, which is compared with the struct
ttm_resource num_pages in order to verify that the blocks are
contiguous for the eviction process.

Signed-off-by: Arunpravin Paneer Selvam 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 20 ++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index b1c455329023..b1223c8e30c9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -426,6 +426,7 @@ static bool amdgpu_mem_visible(struct amdgpu_device *adev,
 {
uint64_t mem_size = (u64)mem->num_pages << PAGE_SHIFT;
struct amdgpu_res_cursor cursor;
+   u64 start, size, total_size = 0;
 
if (mem->mem_type == TTM_PL_SYSTEM ||
mem->mem_type == TTM_PL_TT)
@@ -435,8 +436,23 @@ static bool amdgpu_mem_visible(struct amdgpu_device *adev,
 
amdgpu_res_first(mem, 0, mem_size, );
 
-   /* ttm_resource_ioremap only supports contiguous memory */
-   if (cursor.size != mem_size)
+   do {
+   start = cursor.start;
+   size = cursor.size;
+
+   total_size += size;
+
+   amdgpu_res_next(, cursor.size);
+
+   if (!cursor.remaining)
+   break;
+
+   /* ttm_resource_ioremap only supports contiguous memory */
+   if (start + size != cursor.start)
+   return false;
+   } while (1);
+
+   if (total_size != mem_size)
return false;
 
return cursor.start + cursor.size <= adev->gmc.visible_vram_size;
-- 
2.25.1
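
The cursor walk this patch adds can be modeled in plain userspace C. The sketch below is hedged: struct block and blocks_contiguous() are hypothetical stand-ins for the drm-buddy blocks and the amdgpu_res_cursor walk in amdgpu_mem_visible(), not driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one drm-buddy allocation block. */
struct block {
	uint64_t start;
	uint64_t size;
};

/* Mirrors the walk added to amdgpu_mem_visible(): sum the block sizes,
 * bail out on any gap between consecutive blocks, and finally require
 * the total to match the resource size. */
static bool blocks_contiguous(const struct block *b, size_t n,
			      uint64_t mem_size)
{
	uint64_t total_size = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		total_size += b[i].size;

		/* ttm_resource_ioremap only supports contiguous memory */
		if (i + 1 < n && b[i].start + b[i].size != b[i + 1].start)
			return false;
	}

	return total_size == mem_size;
}
```

A two-block allocation passes only when the second block starts exactly where the first one ends and the sizes add up to the resource size.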



Re: [PATCH] gpu: dc: fix enum conversion in display_mode_vba

2022-09-22 Thread Zeng Heng

I have just corrected the subject line and resent the patch mail.

Please refer to:

[PATCH resend v2] drm/amdgpu: fix enum conversion in display_mode_vba


On 2022/9/19 15:44, Christian König wrote:

Am 19.09.22 um 03:41 schrieb Zeng Heng:

Fix the compile warnings below, which appear when the
enum-conversion check is enabled (compiled with -Wenum-conversion):

drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn20/display_mode_vba_20.c: 


In function ‘dml20_ModeSupportAndSystemConfigurationFull’:
drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn20/display_mode_vba_20.c:3900:44: 

error: implicit conversion from ‘enum <anonymous>’ to ‘enum 
odm_combine_mode’ [-Werror=enum-conversion]

  3900 | locals->ODMCombineEnablePerState[i][k] = false;
   |    ^
drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn20/display_mode_vba_20.c:3904:46: 

error: implicit conversion from ‘enum <anonymous>’ to ‘enum 
odm_combine_mode’ [-Werror=enum-conversion]

  3904 |   locals->ODMCombineEnablePerState[i][k] = true;
   |  ^
drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn20/display_mode_vba_20.c:3907:46: 

error: implicit conversion from ‘enum <anonymous>’ to ‘enum 
odm_combine_mode’ [-Werror=enum-conversion]

  3907 |   locals->ODMCombineEnablePerState[i][k] = true;
   |  ^
drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn20/display_mode_vba_20.c:3960:45: 

error: implicit conversion from ‘enum <anonymous>’ to ‘enum 
odm_combine_mode’ [-Werror=enum-conversion]

  3960 |  locals->ODMCombineEnablePerState[i][k] = false;

Use the proper value from the right enumerated type,
dm_odm_combine_mode_disabled & dm_odm_combine_mode_2to1,
so there is no more implicit conversion.

The numerical values of dm_odm_combine_mode_disabled
& false and dm_odm_combine_mode_2to1 & true
happen to be the same, so there is no change in
behavior.


In the subject line the correct prefix is "drm/amdgpu: ", but 
apart from that it looks good to me as well.


But our DC team has to take a closer look.

Thanks,
Christian.



Signed-off-by: Zeng Heng 
---
  .../amd/display/dc/dml/dcn20/display_mode_vba_20.c   |  8 
  .../amd/display/dc/dml/dcn20/display_mode_vba_20v2.c | 10 +-
  .../amd/display/dc/dml/dcn21/display_mode_vba_21.c   | 12 ++--
  3 files changed, 15 insertions(+), 15 deletions(-)

diff --git 
a/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_mode_vba_20.c 
b/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_mode_vba_20.c

index d3b5b6fedf04..6266b0788387 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_mode_vba_20.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_mode_vba_20.c
@@ -3897,14 +3897,14 @@ void 
dml20_ModeSupportAndSystemConfigurationFull(struct display_mode_lib 
*mode_l
mode_lib->vba.PlaneRequiredDISPCLKWithODMCombine = 
mode_lib->vba.PixelClock[k] / 2
  * (1 + 
mode_lib->vba.DISPCLKDPPCLKDSCCLKDownSpreading / 100.0);

  -    locals->ODMCombineEnablePerState[i][k] = false;
+    locals->ODMCombineEnablePerState[i][k] = 
dm_odm_combine_mode_disabled;
  mode_lib->vba.PlaneRequiredDISPCLK = 
mode_lib->vba.PlaneRequiredDISPCLKWithoutODMCombine;

  if (mode_lib->vba.ODMCapability) {
  if 
(locals->PlaneRequiredDISPCLKWithoutODMCombine > 
mode_lib->vba.MaxDispclkRoundedDownToDFSGranularity) {

- locals->ODMCombineEnablePerState[i][k] = true;
+ locals->ODMCombineEnablePerState[i][k] = dm_odm_combine_mode_2to1;
  mode_lib->vba.PlaneRequiredDISPCLK = 
mode_lib->vba.PlaneRequiredDISPCLKWithODMCombine;
  } else if (locals->HActive[k] > 
DCN20_MAX_420_IMAGE_WIDTH && locals->OutputFormat[k] == dm_420) {

- locals->ODMCombineEnablePerState[i][k] = true;
+ locals->ODMCombineEnablePerState[i][k] = dm_odm_combine_mode_2to1;
  mode_lib->vba.PlaneRequiredDISPCLK = 
mode_lib->vba.PlaneRequiredDISPCLKWithODMCombine;

  }
  }
@@ -3957,7 +3957,7 @@ void 
dml20_ModeSupportAndSystemConfigurationFull(struct display_mode_lib 
*mode_l

  locals->RequiredDISPCLK[i][j] = 0.0;
  locals->DISPCLK_DPPCLK_Support[i][j] = true;
  for (k = 0; k <= mode_lib->vba.NumberOfActivePlanes 
- 1; k++) {

-    locals->ODMCombineEnablePerState[i][k] = false;
+    locals->ODMCombineEnablePerState[i][k] = 
dm_odm_combine_mode_disabled;
  if (locals->SwathWidthYSingleDPP[k] <= 
locals->MaximumSwathWidth[k]) {

  locals->NoOfDPP[i][j][k] = 1;
  locals->RequiredDPPCLK[i][j][k] = 
locals->MinDPPCLKUsingSingleDPP[k]
diff --git 
a/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_mode_vba_20v2.c 
b/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_mode_vba_20v2.c

index edd098c7eb92..989d83ee3842 100644
--- 

Re: [REGRESSION] Graphical issues on Lenovo Yoga 7 14ARB7 laptop since v6.0-rc1 (bisected)

2022-09-22 Thread Thorsten Leemhuis
Hi, this is your Linux kernel regression tracker. Top-posting for once,
to make this easily accessible to everyone.

@amdgpu developers, what's up here? August afaics didn't even get a single
reply for his report that even identifies the change that's causing the
problem. We're already late in the development cycle, so it would be
good if someone could take a closer look into this before it's too late
for 6.0.

Ciao, Thorsten

#regzbot poke

On 17.09.22 18:57, August Wikerfors wrote:
> Hi,
> with every kernel version since v6.0-rc1, including the latest git
> master, there are constant graphical issues on this laptop, such as
> heavy stuttering (this is especially noticeable while typing on the
> keyboard), parts of the screen showing random noise, and the entire
> desktop environment freezing.
> 
> I bisected the issue which showed that this is the first bad commit:
> 
>> commit 7cc191ee7621b7145c6cc9c18a4e1929bb5f136e
>> Author: Leo Li 
>> Date:   Wed Mar 30 12:45:09 2022 -0400
>>
>>     drm/amd/display: Implement MPO PSR SU
>>         [WHY]
>>         For additional power savings, PSR SU (also referred to as
>> PSR2) can be
>>     enabled on eDP panels with PSR SU support.
>>         PSR2 saves more power compared to PSR1 by allowing more
>> opportunities
>>     for the display hardware to be shut down. In comparison to PSR1, Shut
>>     down can now occur in-between frames, as well as in display regions
>>     where there is no visible update. In other words, it allows for some
>>     display hw components to be enabled only for a **selectively
>> updated**
>>     region of the visible display. Hence PSR SU.
>>         [HOW]
>>         To define the SU region, support from the OS is required. OS
>> needs to
>>     inform driver of damaged regions that need to be flushed to the eDP
>>     panel. Today, such support is lacking in most compositors.
>>         Therefore, an in-between solution is to implement PSR SU for
>> MPO and
>>     cursor scenarios. The plane bounds can be used to define the damaged
>>     region to be flushed to panel. This is achieved by:
>>         * Leveraging dm_crtc_state->mpo_requested flag to identify
>> when MPO is
>>   enabled.
>>     * If MPO is enabled, only add updated plane bounds to dirty region.
>>   Determine plane update by either:
>>     * Existence of drm damaged clips attached to the plane (added
>> by a
>>   damage-aware compositor)
>>     * Change in fb id (flip)
>>     * Change in plane bounds (position and dimensions)
>>     * If cursor is enabled, the old_pos and new_pos of cursor plus cursor
>>   size is used as damaged regions(*).
>>         (*) Cursor updates follow a different code path through DC.
>> PSR SU for
>>     cursor is already implemented in DC, and the only thing required to
>>     enable is to set DC_PSR_VERSION_SU_1 on the eDP link. See
>>     dcn10_dmub_update_cursor_data().
>>         Signed-off-by: Leo Li 
>>     Acked-by: Leo Li 
>>     Reviewed-by: Harry Wentland 
>>     Signed-off-by: Alex Deucher 
> 
> #regzbot introduced: 7cc191ee7621b7145c6cc9c18a4e1929bb5f136e
> 
> Note that while bisecting I also needed to apply commit
> 9946e39fe8d0a5da9eb947d8e40a7ef204ba016e as the keyboard doesn't work
> without it.
> 
> Laptop model: Lenovo Yoga 7 14ARB7
> CPU: AMD Ryzen 5 6600U
> Kernel config:
> https://raw.githubusercontent.com/archlinux/svntogit-packages/aa564cf7088b1d834ef4cda9cb48ff0283fde5c5/trunk/config
> Distribution: Arch Linux
> Desktop environment: KDE Plasma 5.25.5
> 
> lspci:
>> $ lspci -nn
>> 00:00.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device
>> [1022:14b5] (rev 01)
>> 00:00.2 IOMMU [0806]: Advanced Micro Devices, Inc. [AMD] Device
>> [1022:14b6]
>> 00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device
>> [1022:14b7] (rev 01)
>> 00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device
>> [1022:14b7] (rev 01)
>> 00:02.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device
>> [1022:14ba]
>> 00:02.4 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device
>> [1022:14ba]
>> 00:02.5 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device
>> [1022:14ba]
>> 00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device
>> [1022:14b7] (rev 01)
>> 00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device
>> [1022:14cd]
>> 00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device
>> [1022:14b7] (rev 01)
>> 00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device
>> [1022:14b7] (rev 01)
>> 00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device
>> [1022:14b9] (rev 10)
>> 00:08.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device
>> [1022:14b9] (rev 10)
>> 00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus
>> Controller [1022:790b] (rev 71)
>> 00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC
>> Bridge [1022:790e] (rev 51)
>> 00:18.0 Host bridge 

[PATCH resend v2] drm/amdgpu: fix enum conversion in display_mode_vba

2022-09-22 Thread Zeng Heng
Fix the compile warnings below, which appear when the
enum-conversion check is enabled (compiled with -Wenum-conversion):

drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn20/display_mode_vba_20.c:
In function ‘dml20_ModeSupportAndSystemConfigurationFull’:
drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn20/display_mode_vba_20.c:3900:44:
error: implicit conversion from ‘enum <anonymous>’ to ‘enum odm_combine_mode’ 
[-Werror=enum-conversion]
 3900 | locals->ODMCombineEnablePerState[i][k] = false;
  |^
drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn20/display_mode_vba_20.c:3904:46:
error: implicit conversion from ‘enum <anonymous>’ to ‘enum odm_combine_mode’ 
[-Werror=enum-conversion]
 3904 |   locals->ODMCombineEnablePerState[i][k] = true;
  |  ^
drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn20/display_mode_vba_20.c:3907:46:
error: implicit conversion from ‘enum <anonymous>’ to ‘enum odm_combine_mode’ 
[-Werror=enum-conversion]
 3907 |   locals->ODMCombineEnablePerState[i][k] = true;
  |  ^
drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn20/display_mode_vba_20.c:3960:45:
error: implicit conversion from ‘enum <anonymous>’ to ‘enum odm_combine_mode’ 
[-Werror=enum-conversion]
 3960 |  locals->ODMCombineEnablePerState[i][k] = false;

Use the proper value from the right enumerated type,
dm_odm_combine_mode_disabled & dm_odm_combine_mode_2to1,
so there is no more implicit conversion.

The numerical values of dm_odm_combine_mode_disabled
& false and dm_odm_combine_mode_2to1 & true
happen to be the same, so there is no change in
behavior.
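
The value equivalence the last paragraph relies on can be checked with a small model. The enum below is a hypothetical stand-in for the DML definition, not the driver's actual header; it only demonstrates that the named enumerators carry the same numeric values as false and true.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the DML enum involved in the warning. */
enum odm_combine_mode {
	dm_odm_combine_mode_disabled,	/* value 0, same as false */
	dm_odm_combine_mode_2to1,	/* value 1, same as true  */
};

/* With -Wenum-conversion, returning a bool from a function with this
 * return type would be flagged as an implicit conversion; the named
 * enumerators are the clean spelling. */
static enum odm_combine_mode odm_combine_for(bool need_combine)
{
	return need_combine ? dm_odm_combine_mode_2to1
			    : dm_odm_combine_mode_disabled;
}
```

Since the enumerators evaluate to 0 and 1, swapping them in for false/true changes nothing at runtime, which is exactly the claim in the commit message.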

Signed-off-by: Zeng Heng 
---
 .../amd/display/dc/dml/dcn20/display_mode_vba_20.c   |  8 
 .../amd/display/dc/dml/dcn20/display_mode_vba_20v2.c | 10 +-
 .../amd/display/dc/dml/dcn21/display_mode_vba_21.c   | 12 ++--
 3 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_mode_vba_20.c 
b/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_mode_vba_20.c
index d3b5b6fedf04..6266b0788387 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_mode_vba_20.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_mode_vba_20.c
@@ -3897,14 +3897,14 @@ void dml20_ModeSupportAndSystemConfigurationFull(struct 
display_mode_lib *mode_l

mode_lib->vba.PlaneRequiredDISPCLKWithODMCombine = mode_lib->vba.PixelClock[k] 
/ 2
* (1 + 
mode_lib->vba.DISPCLKDPPCLKDSCCLKDownSpreading / 100.0);
 
-   locals->ODMCombineEnablePerState[i][k] = false;
+   locals->ODMCombineEnablePerState[i][k] = 
dm_odm_combine_mode_disabled;
mode_lib->vba.PlaneRequiredDISPCLK = 
mode_lib->vba.PlaneRequiredDISPCLKWithoutODMCombine;
if (mode_lib->vba.ODMCapability) {
if 
(locals->PlaneRequiredDISPCLKWithoutODMCombine > 
mode_lib->vba.MaxDispclkRoundedDownToDFSGranularity) {
-   
locals->ODMCombineEnablePerState[i][k] = true;
+   
locals->ODMCombineEnablePerState[i][k] = dm_odm_combine_mode_2to1;

mode_lib->vba.PlaneRequiredDISPCLK = 
mode_lib->vba.PlaneRequiredDISPCLKWithODMCombine;
} else if (locals->HActive[k] > 
DCN20_MAX_420_IMAGE_WIDTH && locals->OutputFormat[k] == dm_420) {
-   
locals->ODMCombineEnablePerState[i][k] = true;
+   
locals->ODMCombineEnablePerState[i][k] = dm_odm_combine_mode_2to1;

mode_lib->vba.PlaneRequiredDISPCLK = 
mode_lib->vba.PlaneRequiredDISPCLKWithODMCombine;
}
}
@@ -3957,7 +3957,7 @@ void dml20_ModeSupportAndSystemConfigurationFull(struct 
display_mode_lib *mode_l
locals->RequiredDISPCLK[i][j] = 0.0;
locals->DISPCLK_DPPCLK_Support[i][j] = true;
for (k = 0; k <= 
mode_lib->vba.NumberOfActivePlanes - 1; k++) {
-   locals->ODMCombineEnablePerState[i][k] 
= false;
+   locals->ODMCombineEnablePerState[i][k] 
= dm_odm_combine_mode_disabled;
if (locals->SwathWidthYSingleDPP[k] <= 
locals->MaximumSwathWidth[k]) {
locals->NoOfDPP[i][j][k] = 1;
locals->RequiredDPPCLK[i][j][k] 
= locals->MinDPPCLKUsingSingleDPP[k]
diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_mode_vba_20v2.c 

Re: [PATCH AUTOSEL 5.4 3/5] drm/amdgpu: use dirty framebuffer helper

2022-09-22 Thread Michel Dänzer
On 2022-09-21 17:54, Sasha Levin wrote:
> From: Hamza Mahfooz 
> 
> [ Upstream commit 66f99628eb24409cb8feb5061f78283c8b65f820 ]
> 
> Currently, we aren't handling DRM_IOCTL_MODE_DIRTYFB. So, use
> drm_atomic_helper_dirtyfb() as the dirty callback in the amdgpu_fb_funcs
> struct.
> 
> Signed-off-by: Hamza Mahfooz 
> Acked-by: Alex Deucher 
> Signed-off-by: Alex Deucher 
> Signed-off-by: Sasha Levin 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_display.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
> index b588e0e409e7..d8687868407d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
> @@ -35,6 +35,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -495,6 +496,7 @@ bool amdgpu_display_ddc_probe(struct amdgpu_connector 
> *amdgpu_connector,
>  static const struct drm_framebuffer_funcs amdgpu_fb_funcs = {
>   .destroy = drm_gem_fb_destroy,
>   .create_handle = drm_gem_fb_create_handle,
> + .dirty = drm_atomic_helper_dirtyfb,
>  };
>  
>  uint32_t amdgpu_display_supported_domains(struct amdgpu_device *adev,

This patch has issues, see https://patchwork.freedesktop.org/patch/503749/ .


-- 
Earthling Michel Dänzer|  https://redhat.com
Libre software enthusiast  | Mesa and Xwayland developer



RE: [PATCH] drm/amd/amdgpu: solve the issue of allocate continuous pages under xen dom0

2022-09-22 Thread Xiao, Shane
[AMD Official Use Only - General]

Hi Christian,

So the real question is why are you trying to use userptr with XEN dom0?
[Shane]
The KFD test KFDMemoryTest.MMBench uses userptr, and the test fails on Xen dom0 
while succeeding on bare metal.
We can also use GTT to allocate coherent buffers by setting 
HSA_USERPTR_FOR_PAGED_MEM to 0 on Xen dom0.

As you mentioned, "using userptrs in our driver stack [is] fundamentally 
incompatible with swiotlb"; it seems that the IOMMU is not enabled on Xen dom0, 
so the KFD test case falls back into xen_swiotlb when using userptr.
I would really appreciate any advice you could give on this issue, given that 
the IOMMU is not enabled on Xen dom0.

Best Regards,
Shane


> -Original Message-
> From: Koenig, Christian 
> Sent: Thursday, September 22, 2022 5:09 PM
> To: Xiao, Shane ; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH] drm/amd/amdgpu: solve the issue of allocate
> continuous pages under xen dom0
> 
> Well this patch is a pretty clear NAK.
> 
> Using xen_swiotlb with userptr in the first place sounds illegal to me since 
> we
> need coherent buffers which are incompatible with swiotlb.
> 
> In other words using userptrs in our driver stack are fundamentally
> incompatible with swiotlb, so it doesn't make sense to try to get this working
> using those workarounds.
> 
> So the real question is why are you trying to use userptr with XEN dom0?
> That won't work correctly at all.
> 
> Regards,
> Christian.
> 
> Am 22.09.22 um 11:00 schrieb Xiao, Shane:
> > [AMD Official Use Only - General]
> >
> > Hi Christian,
> >
> > Why should the resulting sg table be too large?
> > [Shane]
> > sg_alloc_table_from_pages will set max_segment field as default value
> UINT_MAX to sg_alloc_table_from_pages_segment. The field
> max_segment works as follows:
> > “Contiguous ranges of the pages are squashed into a single scatterlist node
> up to the maximum size specified in @max_segment.”
> > If we don't set the max_segment field, the sg_alloc_table_from_pages
> may allocate 2M or more contiguous ranges of pages.
> >
> >
> > For what too large?
> > [Shane]
> > However, these pages are called pseudo-physical pages on xen dom0,
> which means that the actual machine pages are not necessarily contiguous.
> > When this happens, the xen_swiotlb will use bounce buffer to do dma
> operation by swiotlb_tbl_map_single.
> > But, the xen_swiotlb only allows IO_TLB_SEGSIZE*IO_TLB_SHIFT (256K)
> contiguous pages, and allocating 2M or more of contiguous pages
> will cause the "swiotlb buffer is full" error.
> >
> > BTW: intel uses the same method to allocate page tables in
> i915_gem_userptr_get_pages.
> >
> > Best Regards,
> > Shane
> >
> >> -Original Message-
> >> From: Koenig, Christian 
> >> Sent: Thursday, September 22, 2022 3:19 PM
> >> To: Xiao, Shane ; amd-gfx@lists.freedesktop.org
> >> Subject: Re: [PATCH] drm/amd/amdgpu: solve the issue of allocate
> >> continuous pages under xen dom0
> >>
> >> Am 22.09.22 um 09:11 schrieb Shane Xiao:
> >>> [Why]
> >>> sg_alloc_table_from_pages allocates too-large contiguous PFN ranges under
> >> xen dom0.
> >>
> >> Well that sentence doesn't make much sense. Why should the resulting
>> sg table be too large? For what too large?
> >>
> >> Regards,
> >> Christian.
> >>
> >>> However, xen should check contiguous MFN pages in
> >> range_straddles_page_boundary.
> >>> When range_straddles_page_boundary returns false, some cases fall
> >>> back into the swiotlb path and the contiguous allocatable pages are
> >>> not enough.
> >>>
> >>> [How]
> >>> In fact, xen swiotlb set max_segment default value as UINT_MAX and
> >>> xen_swiotlb_init_early already change the value to PAGE_SIZE under
> >>> xen
> >> dom0.
> >>> However amdgpu driver doesn't use the value, which may cause issue
> >>> such as swiotlb buffer full. Add amd_sg_segment_size according to
> >>> iommu setting, the details are as follows:
> >>>   iommu setting   |   amd_sg_segment_size
> >>> ---
> >>>   iommu=on|   UINT_MAX
> >>>   iommu=off && swiotlb on |   IO_TLB_DEFAULT_SIZE(64M)
> >>>   xen_swiotlb on  |   PAGE_SIZE(4K)
> >>> 
> >>> --
> >>> -
> >>>
> >>> Signed-off-by: Shane Xiao 
> >>> ---
> >>>drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 22
> >> --
> >>>1 file changed, 20 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> >>> index 134575a3893c..d081fcd22d6b 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> >>> @@ -80,6 +80,23 @@ static int amdgpu_ttm_init_on_chip(struct
> >> amdgpu_device *adev,
> >>> false, size_in_page);
> >>>}
> >>>
> >>> +static inline unsigned 

Re: [PATCH] drm/amd/amdgpu: solve the issue of allocate continuous pages under xen dom0

2022-09-22 Thread Christian König

Well this patch is a pretty clear NAK.

Using xen_swiotlb with userptr in the first place sounds illegal to me 
since we need coherent buffers which are incompatible with swiotlb.


In other words using userptrs in our driver stack are fundamentally 
incompatible with swiotlb, so it doesn't make sense to try to get this 
working using those workarounds.


So the real question is why are you trying to use userptr with XEN dom0? 
That won't work correctly at all.


Regards,
Christian.

Am 22.09.22 um 11:00 schrieb Xiao, Shane:

[AMD Official Use Only - General]

Hi Christian,

Why should the resulting sg table be too large?
[Shane]
sg_alloc_table_from_pages will set max_segment field as default value UINT_MAX  
to sg_alloc_table_from_pages_segment. The field max_segment works as follows:
“Contiguous ranges of the pages are squashed into a single scatterlist node up 
to the maximum size specified in @max_segment.”
If we don't set the max_segment field, sg_alloc_table_from_pages may 
allocate 2M or more contiguous ranges of pages.


For what too large?
[Shane]
However, these pages are called pseudo-physical pages on xen dom0, which means 
that the actual machine pages are not necessarily contiguous.
When this happens, the xen_swiotlb will use bounce buffer to do dma operation 
by swiotlb_tbl_map_single.
But xen_swiotlb only allows IO_TLB_SEGSIZE*IO_TLB_SHIFT (256K) of contiguous 
pages, and allocating 2M or more of contiguous pages will cause the "swiotlb 
buffer is full" error.

BTW: intel uses the same method to allocate page tables in 
i915_gem_userptr_get_pages.

Best Regards,
Shane


-Original Message-
From: Koenig, Christian 
Sent: Thursday, September 22, 2022 3:19 PM
To: Xiao, Shane ; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amd/amdgpu: solve the issue of allocate
continuous pages under xen dom0

Am 22.09.22 um 09:11 schrieb Shane Xiao:

[Why]
sg_alloc_table_from_pages allocates too-large contiguous PFN ranges under

xen dom0.

Well that sentence doesn't make much sense. Why should the resulting sg
table be too large? For what too large?

Regards,
Christian.


However, xen should check contiguous MFN pages in

range_straddles_page_boundary.

When range_straddles_page_boundary returns false, some cases fall back
into the swiotlb path and the contiguous allocatable pages are not enough.

[How]
In fact, xen swiotlb sets the max_segment default value to UINT_MAX, and
xen_swiotlb_init_early already changes the value to PAGE_SIZE under xen

dom0.

However, the amdgpu driver doesn't use the value, which may cause issues
such as a full swiotlb buffer. Add amdgpu_sg_segment_size according to the
iommu setting; the details are as follows:
	iommu setting            |  amdgpu_sg_segment_size
	-------------------------+---------------------------
	iommu=on                 |  UINT_MAX
	iommu=off && swiotlb on  |  IO_TLB_DEFAULT_SIZE (64M)
	xen_swiotlb on           |  PAGE_SIZE (4K)

Signed-off-by: Shane Xiao 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 22

--

   1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 134575a3893c..d081fcd22d6b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -80,6 +80,23 @@ static int amdgpu_ttm_init_on_chip(struct

amdgpu_device *adev,

  false, size_in_page);
   }

+static inline unsigned int amdgpu_sg_segment_size(void) {
+   unsigned int size = swiotlb_max_segment();
+
+   /* size=0 when amd iommu enabled */
+   if (size == 0)
+   size = UINT_MAX;
+
+   size = rounddown(size, PAGE_SIZE);
+   /* swiotlb_max_segment_size can return 1 byte when it means one

page. */

+   if (size < PAGE_SIZE)
+   size = PAGE_SIZE;
+
+   return size;
+}
+
+
   /**
* amdgpu_evict_flags - Compute placement flags
*
@@ -760,9 +777,10 @@ static int amdgpu_ttm_tt_pin_userptr(struct

ttm_device *bdev,

int r;

/* Allocate an SG array and squash pages into it */
-   r = sg_alloc_table_from_pages(ttm->sg, ttm->pages, ttm-
num_pages, 0,
- (u64)ttm->num_pages << PAGE_SHIFT,
+   r = sg_alloc_table_from_pages_segment(ttm->sg, ttm->pages, ttm-
num_pages, 0,
+ (u64)ttm->num_pages << PAGE_SHIFT,
+amdgpu_sg_segment_size(),
  GFP_KERNEL);
+
if (r)
goto release_sg;
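
The clamping in the proposed amdgpu_sg_segment_size() can be exercised as a userspace model. The function below is a hypothetical re-implementation under the assumption that the kernel's swiotlb_max_segment() result is passed in as a plain parameter; it is a sketch of the proposed logic, not the driver code itself.

```c
#include <assert.h>
#include <limits.h>

#define MODEL_PAGE_SIZE 4096u

/* Userspace model of the proposed amdgpu_sg_segment_size(). */
static unsigned int sg_segment_size(unsigned int swiotlb_max)
{
	unsigned int size = swiotlb_max;

	/* size == 0 when the AMD IOMMU is enabled: no swiotlb limit */
	if (size == 0)
		size = UINT_MAX;

	/* round down to a whole number of pages */
	size -= size % MODEL_PAGE_SIZE;

	/* swiotlb_max_segment() can return 1 byte to mean one page */
	if (size < MODEL_PAGE_SIZE)
		size = MODEL_PAGE_SIZE;

	return size;
}
```

With the three inputs from the table in the patch, the model yields a page-rounded UINT_MAX for the IOMMU-on case, the swiotlb default for the swiotlb-on case, and a single page when xen_swiotlb reports 1.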





RE: [PATCH] drm/amd/amdgpu: solve the issue of allocate continuous pages under xen dom0

2022-09-22 Thread Xiao, Shane
[AMD Official Use Only - General]

Hi Christian,

Why should the resulting sg table be too large?
[Shane]
sg_alloc_table_from_pages will set max_segment to the default value UINT_MAX 
for sg_alloc_table_from_pages_segment. The field max_segment works as follows:
“Contiguous ranges of the pages are squashed into a single scatterlist node up 
to the maximum size specified in @max_segment.”
If we don't set the max_segment field, sg_alloc_table_from_pages may 
allocate 2M or more contiguous ranges of pages.


For what too large?
[Shane]
However, these pages are called pseudo-physical pages on Xen dom0, which means 
that the actual machine pages are not necessarily contiguous. 
When this happens, xen_swiotlb will use a bounce buffer for the DMA operation 
via swiotlb_tbl_map_single. 
But xen_swiotlb only allows IO_TLB_SEGSIZE*IO_TLB_SHIFT (256K) of contiguous 
pages, and allocating 2M or more of contiguous pages will cause the 
"swiotlb buffer is full" error.

BTW: intel uses the same method to allocate page tables in 
i915_gem_userptr_get_pages.

Best Regards,
Shane
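
Shane's point about node size can be reduced to simple arithmetic. The sketch below is a hypothetical model of how sg_alloc_table_from_pages_segment() squashes a contiguous pinned range into nodes of at most max_segment bytes; it is not the scatterlist API itself, just the counting argument.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u

/* Model: a contiguous range of total_bytes is split into scatterlist
 * nodes no larger than max_segment, so the node count is the ceiling
 * of total_bytes / max_segment. */
static uint64_t sg_node_count(uint64_t total_bytes, uint64_t max_segment)
{
	return (total_bytes + max_segment - 1) / max_segment;
}
```

Capping max_segment at PAGE_SIZE, as xen_swiotlb_init_early does, keeps every node small enough for the 256K swiotlb bounce limit Shane describes, at the cost of many more nodes per range.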

> -Original Message-
> From: Koenig, Christian 
> Sent: Thursday, September 22, 2022 3:19 PM
> To: Xiao, Shane ; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH] drm/amd/amdgpu: solve the issue of allocate
> continuous pages under xen dom0
> 
> Am 22.09.22 um 09:11 schrieb Shane Xiao:
> > [Why]
> > sg_alloc_table_from_pages allocates too-large contiguous PFN ranges under
> xen dom0.
> 
> Well that sentence doesn't make much sense. Why should the resulting sg
> table be too large? For what too large?
> 
> Regards,
> Christian.
> 
> > However, xen should check contiguous MFN pages in
> range_straddles_page_boundary.
> > When range_straddles_page_boundary returns false, some cases fall back
> > into the swiotlb path and the contiguous allocatable pages are not enough.
> >
> > [How]
> > In fact, xen swiotlb set max_segment default value as UINT_MAX and
> > xen_swiotlb_init_early already change the value to PAGE_SIZE under xen
> dom0.
> > However amdgpu driver doesn't use the value, which may cause issue
> > such as swiotlb buffer full. Add amd_sg_segment_size according to
> > iommu setting, the details are as follows:
> > iommu setting   |   amd_sg_segment_size
> > ---
> > iommu=on|   UINT_MAX
> >  iommu=off && swiotlb on|   IO_TLB_DEFAULT_SIZE(64M)
> > xen_swiotlb on  |   PAGE_SIZE(4K)
> > --
> > -
> >
> > Signed-off-by: Shane Xiao 
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 22
> --
> >   1 file changed, 20 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> > index 134575a3893c..d081fcd22d6b 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> > @@ -80,6 +80,23 @@ static int amdgpu_ttm_init_on_chip(struct
> amdgpu_device *adev,
> >   false, size_in_page);
> >   }
> >
> > +static inline unsigned int amdgpu_sg_segment_size(void) {
> > +   unsigned int size = swiotlb_max_segment();
> > +
> > +   /* size=0 when amd iommu enabled */
> > +   if (size == 0)
> > +   size = UINT_MAX;
> > +
> > +   size = rounddown(size, PAGE_SIZE);
> > +   /* swiotlb_max_segment_size can return 1 byte when it means one
> page. */
> > +   if (size < PAGE_SIZE)
> > +   size = PAGE_SIZE;
> > +
> > +   return size;
> > +}
> > +
> > +
> >   /**
> >* amdgpu_evict_flags - Compute placement flags
> >*
> > @@ -760,9 +777,10 @@ static int amdgpu_ttm_tt_pin_userptr(struct
> ttm_device *bdev,
> > int r;
> >
> > /* Allocate an SG array and squash pages into it */
> > -   r = sg_alloc_table_from_pages(ttm->sg, ttm->pages, ttm->num_pages, 0,
> > - (u64)ttm->num_pages << PAGE_SHIFT,
> > +   r = sg_alloc_table_from_pages_segment(ttm->sg, ttm->pages, ttm->num_pages, 0,
> > + (u64)ttm->num_pages << PAGE_SHIFT, amdgpu_sg_segment_size(),
> >   GFP_KERNEL);
> > +
> > if (r)
> > goto release_sg;
> >


Re: [PATCH v1] drivers:amdgpu: check the return value of amdgpu_bo_kmap

2022-09-22 Thread Li Zhong
On Wed, Sep 21, 2022 at 7:11 PM Chen, Guchun  wrote:
>
> Perhaps you need to update the prefix of patch subject to 'drm/amd/pm: check 
> return value ...'.
>
> With above addressed, it's: Acked-by: Guchun Chen 
>
> Regards,
> Guchun
>
> -Original Message-
> From: Li Zhong 
> Sent: Thursday, September 22, 2022 9:27 AM
> To: dri-de...@lists.freedesktop.org; amd-gfx@lists.freedesktop.org
> Cc: jiapeng.ch...@linux.alibaba.com; Powell, Darren ; 
> Chen, Guchun ; Limonciello, Mario 
> ; Quan, Evan ; Lazar, Lijo 
> ; dan...@ffwll.ch; airl...@linux.ie; Pan, Xinhui 
> ; Koenig, Christian ; Deucher, 
> Alexander ; Li Zhong 
> Subject: [PATCH v1] drivers:amdgpu: check the return value of amdgpu_bo_kmap
>
> amdgpu_bo_kmap() returns an error when it fails to map a buffer object. Add
> the error check and propagate the error.
>
> Signed-off-by: Li Zhong 
> ---
>  drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c 
> b/drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c
> index 1eb4e613b27a..ec055858eb95 100644
> --- a/drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c
> +++ b/drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c
> @@ -1485,6 +1485,7 @@ static int pp_get_prv_buffer_details(void *handle, void **addr, size_t *size)
>  {
> struct pp_hwmgr *hwmgr = handle;
> struct amdgpu_device *adev = hwmgr->adev;
> +   int err;
>
> if (!addr || !size)
> return -EINVAL;
> @@ -1492,7 +1493,9 @@ static int pp_get_prv_buffer_details(void *handle, void **addr, size_t *size)
> *addr = NULL;
> *size = 0;
> if (adev->pm.smu_prv_buffer) {
> -   amdgpu_bo_kmap(adev->pm.smu_prv_buffer, addr);
> +   err = amdgpu_bo_kmap(adev->pm.smu_prv_buffer, addr);
> +   if (err)
> +   return err;
> *size = adev->pm.smu_prv_buffer_size;
> }
>
> --
> 2.25.1
>

Thanks for your reply! It's updated in the v2 patch.


[PATCH v1] drivers:amdgpu: check the return value of amdgpu_bo_kmap

2022-09-22 Thread Li Zhong
amdgpu_bo_kmap() returns an error when it fails to map a buffer object. Add the
error check and propagate the error.

Signed-off-by: Li Zhong 
---
 drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c 
b/drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c
index 1eb4e613b27a..ec055858eb95 100644
--- a/drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c
+++ b/drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c
@@ -1485,6 +1485,7 @@ static int pp_get_prv_buffer_details(void *handle, void **addr, size_t *size)
 {
struct pp_hwmgr *hwmgr = handle;
struct amdgpu_device *adev = hwmgr->adev;
+   int err;
 
if (!addr || !size)
return -EINVAL;
@@ -1492,7 +1493,9 @@ static int pp_get_prv_buffer_details(void *handle, void **addr, size_t *size)
*addr = NULL;
*size = 0;
if (adev->pm.smu_prv_buffer) {
-   amdgpu_bo_kmap(adev->pm.smu_prv_buffer, addr);
+   err = amdgpu_bo_kmap(adev->pm.smu_prv_buffer, addr);
+   if (err)
+   return err;
*size = adev->pm.smu_prv_buffer_size;
}
 
-- 
2.25.1



[PATCH v2] drivers/amd/pm: check the return value of amdgpu_bo_kmap

2022-09-22 Thread Li Zhong
amdgpu_bo_kmap() returns an error when it fails to map a buffer object. Add the
error check and propagate the error.

Signed-off-by: Li Zhong 
---
 drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c 
b/drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c
index 1eb4e613b27a..ec055858eb95 100644
--- a/drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c
+++ b/drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c
@@ -1485,6 +1485,7 @@ static int pp_get_prv_buffer_details(void *handle, void **addr, size_t *size)
 {
struct pp_hwmgr *hwmgr = handle;
struct amdgpu_device *adev = hwmgr->adev;
+   int err;
 
if (!addr || !size)
return -EINVAL;
@@ -1492,7 +1493,9 @@ static int pp_get_prv_buffer_details(void *handle, void **addr, size_t *size)
*addr = NULL;
*size = 0;
if (adev->pm.smu_prv_buffer) {
-   amdgpu_bo_kmap(adev->pm.smu_prv_buffer, addr);
+   err = amdgpu_bo_kmap(adev->pm.smu_prv_buffer, addr);
+   if (err)
+   return err;
*size = adev->pm.smu_prv_buffer_size;
}
 
-- 
2.25.1
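To make the change easier to follow outside the diff, here is a plain userspace sketch of the control flow the patch introduces. fake_bo_kmap(), the 4096-byte size, and the errno values are illustrative stand-ins, not the real amdgpu API:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for amdgpu_bo_kmap(): fills *addr on success,
 * returns a negative errno on failure. */
static int fake_bo_kmap(int should_fail, void **addr)
{
	static char backing[16];

	if (should_fail)
		return -ENOMEM;
	*addr = backing;
	return 0;
}

/* Mirrors the patched pp_get_prv_buffer_details() flow: the mapping error
 * is now propagated, so a caller can no longer see a NULL *addr paired
 * with a nonzero *size. */
static int get_prv_buffer_details(int map_fails, void **addr, size_t *size)
{
	int err;

	if (!addr || !size)
		return -EINVAL;

	*addr = NULL;
	*size = 0;

	err = fake_bo_kmap(map_fails, addr);
	if (err)
		return err;	/* before the patch, this error was dropped */

	*size = 4096;	/* stand-in for adev->pm.smu_prv_buffer_size */
	return 0;
}
```

The point of the fix is visible in the failure path: the function returns the kmap error while *addr and *size are still in their zeroed state.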



Re: [PATCH] drm/amd/amdgpu: solve the issue of allocate continuous pages under xen dom0

2022-09-22 Thread Christian König

Am 22.09.22 um 09:11 schrieb Shane Xiao:

[Why]
sg_alloc_table_from_pages() builds overly large contiguous PFN segments under xen dom0.


Well, that sentence doesn't make much sense. Why should the resulting sg
table be too large? Too large for what?


Regards,
Christian.


However, xen has to check for contiguous MFN pages in range_straddles_page_boundary().
When range_straddles_page_boundary() returns false, some cases fall back into
the swiotlb path, and the pool of contiguous allocatable pages is not enough.

[How]
In fact, xen swiotlb set max_segment default value as UINT_MAX and
xen_swiotlb_init_early already change the value to PAGE_SIZE under xen dom0.
However amdgpu driver doesn't use the value, which may cause issue such
as swiotlb buffer full. Add amd_sg_segment_size according to iommu setting,
the details are as follows:
iommu setting           |   amd_sg_segment_size
---------------------------------------------------
iommu=on                |   UINT_MAX
iommu=off && swiotlb on |   IO_TLB_DEFAULT_SIZE(64M)
xen_swiotlb on          |   PAGE_SIZE(4K)
---------------------------------------------------

Signed-off-by: Shane Xiao 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 22 --
  1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 134575a3893c..d081fcd22d6b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -80,6 +80,23 @@ static int amdgpu_ttm_init_on_chip(struct amdgpu_device 
*adev,
  false, size_in_page);
  }
  
+static inline unsigned int amdgpu_sg_segment_size(void)
+{
+   unsigned int size = swiotlb_max_segment();
+
+   /* size=0 when amd iommu enabled */
+   if (size == 0)
+   size = UINT_MAX;
+
+   size = rounddown(size, PAGE_SIZE);
+   /* swiotlb_max_segment_size can return 1 byte when it means one page. */
+   if (size < PAGE_SIZE)
+   size = PAGE_SIZE;
+
+   return size;
+}
+
+
  /**
   * amdgpu_evict_flags - Compute placement flags
   *
@@ -760,9 +777,10 @@ static int amdgpu_ttm_tt_pin_userptr(struct ttm_device *bdev,
int r;
  
  	/* Allocate an SG array and squash pages into it */

-   r = sg_alloc_table_from_pages(ttm->sg, ttm->pages, ttm->num_pages, 0,
- (u64)ttm->num_pages << PAGE_SHIFT,
+   r = sg_alloc_table_from_pages_segment(ttm->sg, ttm->pages, ttm->num_pages, 0,
+ (u64)ttm->num_pages << PAGE_SHIFT, amdgpu_sg_segment_size(),
  GFP_KERNEL);
+
if (r)
goto release_sg;
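For context on what the new max_segment argument changes in the hunk above: sg_alloc_table_from_pages() merges physically contiguous pages into one scatterlist entry without bound, while the _segment variant cuts a new entry once max_segment bytes are reached. The following is a toy userspace model of that splitting, not the kernel implementation; the PFN values are made up for illustration:

```c
#include <stddef.h>

#define PAGE_SZ 4096u	/* assumes 4K pages */

/* Toy model of scatterlist segment coalescing: walk a list of page
 * frame numbers, merging physically contiguous pages, but start a new
 * segment whenever the current one would exceed max_segment bytes.
 * Returns the number of segments produced. */
static unsigned int count_segments(const unsigned long *pfns, unsigned int n,
				   unsigned int max_segment)
{
	unsigned int nsegs = 0;
	unsigned int seg_bytes = 0;

	for (unsigned int i = 0; i < n; i++) {
		int contiguous = i > 0 && pfns[i] == pfns[i - 1] + 1;

		if (i == 0 || !contiguous || seg_bytes + PAGE_SZ > max_segment) {
			nsegs++;	/* cut a new segment */
			seg_bytes = 0;
		}
		seg_bytes += PAGE_SZ;
	}
	return nsegs;
}
```

With max_segment = PAGE_SIZE (the xen dom0 row of the table) every page lands in its own segment, which is what prevents a single segment from spanning machine frames that are not actually contiguous.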
  




RE: [PATCH] drm/amd/amdgpu: solve the issue of allocate continuous pages under xen dom0

2022-09-22 Thread Liu, Aaron
[AMD Official Use Only - General]

Reviewed-by: Aaron Liu 

--
Best Regards
Aaron Liu

> -Original Message-
> From: amd-gfx  On Behalf Of
> Shane Xiao
> Sent: Thursday, September 22, 2022 3:12 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Xiao, Shane 
> Subject: [PATCH] drm/amd/amdgpu: solve the issue of allocate continuous
> pages under xen dom0
> 
> [Why]
> sg_alloc_table_from_pages() builds overly large contiguous PFN segments
> under xen dom0.
> However, xen has to check for contiguous MFN pages in
> range_straddles_page_boundary().
> When range_straddles_page_boundary() returns false, some cases fall back
> into the swiotlb path, and the pool of contiguous allocatable pages is not
> enough.
> 
> [How]
> In fact, xen swiotlb sets the max_segment default value to UINT_MAX, and
> xen_swiotlb_init_early already changes the value to PAGE_SIZE under xen
> dom0.
> However, the amdgpu driver doesn't use this value, which may cause issues
> such as the swiotlb buffer running full. Add amd_sg_segment_size according
> to the iommu setting; the details are as follows:
>   iommu setting           |   amd_sg_segment_size
> ---------------------------------------------------
>   iommu=on                |   UINT_MAX
>   iommu=off && swiotlb on |   IO_TLB_DEFAULT_SIZE(64M)
>   xen_swiotlb on          |   PAGE_SIZE(4K)
> ---------------------------------------------------
> 
> Signed-off-by: Shane Xiao 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 22 --
>  1 file changed, 20 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index 134575a3893c..d081fcd22d6b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -80,6 +80,23 @@ static int amdgpu_ttm_init_on_chip(struct
> amdgpu_device *adev,
> false, size_in_page);
>  }
> 
> +static inline unsigned int amdgpu_sg_segment_size(void)
> +{
> + unsigned int size = swiotlb_max_segment();
> +
> + /* size=0 when amd iommu enabled */
> + if (size == 0)
> + size = UINT_MAX;
> +
> + size = rounddown(size, PAGE_SIZE);
> + /* swiotlb_max_segment_size can return 1 byte when it means one page. */
> + if (size < PAGE_SIZE)
> + size = PAGE_SIZE;
> +
> + return size;
> +}
> +
> +
>  /**
>   * amdgpu_evict_flags - Compute placement flags
>   *
> @@ -760,9 +777,10 @@ static int amdgpu_ttm_tt_pin_userptr(struct ttm_device *bdev,
>   int r;
> 
>   /* Allocate an SG array and squash pages into it */
> - r = sg_alloc_table_from_pages(ttm->sg, ttm->pages, ttm->num_pages, 0,
> -   (u64)ttm->num_pages << PAGE_SHIFT,
> + r = sg_alloc_table_from_pages_segment(ttm->sg, ttm->pages, ttm->num_pages, 0,
> +   (u64)ttm->num_pages << PAGE_SHIFT, amdgpu_sg_segment_size(),
> GFP_KERNEL);
> +
>   if (r)
>   goto release_sg;
> 
> --
> 2.25.1


[PATCH] drm/amd/amdgpu: solve the issue of allocate continuous pages under xen dom0

2022-09-22 Thread Shane Xiao
[Why]
sg_alloc_table_from_pages() builds overly large contiguous PFN segments under xen dom0.
However, xen has to check for contiguous MFN pages in range_straddles_page_boundary().
When range_straddles_page_boundary() returns false, some cases fall back into
the swiotlb path, and the pool of contiguous allocatable pages is not enough.

[How]
In fact, xen swiotlb sets the max_segment default value to UINT_MAX, and
xen_swiotlb_init_early already changes the value to PAGE_SIZE under xen dom0.
However, the amdgpu driver doesn't use this value, which may cause issues
such as the swiotlb buffer running full. Add amd_sg_segment_size according to
the iommu setting; the details are as follows:
iommu setting           |   amd_sg_segment_size
---------------------------------------------------
iommu=on                |   UINT_MAX
iommu=off && swiotlb on |   IO_TLB_DEFAULT_SIZE(64M)
xen_swiotlb on          |   PAGE_SIZE(4K)
---------------------------------------------------

Signed-off-by: Shane Xiao 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 22 --
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 134575a3893c..d081fcd22d6b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -80,6 +80,23 @@ static int amdgpu_ttm_init_on_chip(struct amdgpu_device 
*adev,
  false, size_in_page);
 }
 
+static inline unsigned int amdgpu_sg_segment_size(void)
+{
+   unsigned int size = swiotlb_max_segment();
+
+   /* size=0 when amd iommu enabled */
+   if (size == 0)
+   size = UINT_MAX;
+
+   size = rounddown(size, PAGE_SIZE);
+   /* swiotlb_max_segment_size can return 1 byte when it means one page. */
+   if (size < PAGE_SIZE)
+   size = PAGE_SIZE;
+
+   return size;
+}
+
+
 /**
  * amdgpu_evict_flags - Compute placement flags
  *
@@ -760,9 +777,10 @@ static int amdgpu_ttm_tt_pin_userptr(struct ttm_device *bdev,
int r;
 
/* Allocate an SG array and squash pages into it */
-   r = sg_alloc_table_from_pages(ttm->sg, ttm->pages, ttm->num_pages, 0,
- (u64)ttm->num_pages << PAGE_SHIFT,
+   r = sg_alloc_table_from_pages_segment(ttm->sg, ttm->pages, ttm->num_pages, 0,
+ (u64)ttm->num_pages << PAGE_SHIFT, amdgpu_sg_segment_size(),
  GFP_KERNEL);
+
if (r)
goto release_sg;
 
-- 
2.25.1
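As a sanity check of the table in the commit message, the selection logic can be modeled in plain userspace C. sg_segment_size() below is an illustrative stand-in that takes the raw swiotlb_max_segment() result as an argument; it is not the kernel code, and PAGE_SZ assumes 4K pages:

```c
#include <limits.h>

#define PAGE_SZ 4096u	/* assumes 4K pages, as in the table */

/* Userspace model of the patch's amdgpu_sg_segment_size().  `raw` stands
 * in for swiotlb_max_segment(): 0 when the AMD IOMMU is enabled, 1
 * (meaning "one page") under xen swiotlb, IO_TLB_DEFAULT_SIZE otherwise. */
static unsigned int sg_segment_size(unsigned int raw)
{
	unsigned int size = raw;

	/* size == 0 when the amd iommu is enabled: no segment size limit */
	if (size == 0)
		size = UINT_MAX;

	/* rounddown(size, PAGE_SIZE) */
	size -= size % PAGE_SZ;

	/* a 1-byte return means "one page", so clamp up to PAGE_SIZE */
	if (size < PAGE_SZ)
		size = PAGE_SZ;

	return size;
}
```

Note that the iommu=on row effectively yields UINT_MAX rounded down to a page multiple rather than UINT_MAX itself, since the rounddown happens after the 0 check.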



Re: [PATCH v2] drivers/amd/pm: check the return value of amdgpu_bo_kmap

2022-09-22 Thread Christian König

Am 22.09.22 um 06:17 schrieb Li Zhong:

amdgpu_bo_kmap() returns an error when it fails to map a buffer object. Add the
error check and propagate the error.

Signed-off-by: Li Zhong 


We usually use "r" as return and error variables, but that's just a nit.

Reviewed-by: Christian König 


---
  drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c 
b/drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c
index 1eb4e613b27a..ec055858eb95 100644
--- a/drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c
+++ b/drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c
@@ -1485,6 +1485,7 @@ static int pp_get_prv_buffer_details(void *handle, void **addr, size_t *size)
  {
struct pp_hwmgr *hwmgr = handle;
struct amdgpu_device *adev = hwmgr->adev;
+   int err;
  
	if (!addr || !size)
		return -EINVAL;
@@ -1492,7 +1493,9 @@ static int pp_get_prv_buffer_details(void *handle, void **addr, size_t *size)
*addr = NULL;
*size = 0;
if (adev->pm.smu_prv_buffer) {
-   amdgpu_bo_kmap(adev->pm.smu_prv_buffer, addr);
+   err = amdgpu_bo_kmap(adev->pm.smu_prv_buffer, addr);
+   if (err)
+   return err;
*size = adev->pm.smu_prv_buffer_size;
}