[PATCH] drm/amd/display: Remove redundant condition in dcn35_calc_blocks_to_gate()

2024-02-23 Thread Srinivasan Shanmugam
pipe_ctx->plane_res.mpcc_inst is of a type that can only hold values
between 0 and 255, so it's always greater than or equal to 0.

Thus the condition 'pipe_ctx->plane_res.mpcc_inst >= 0' was always true
and has been removed.

Fixes the below:
drivers/gpu/drm/amd/amdgpu/../display/dc/hwss/dcn35/dcn35_hwseq.c:1023 
dcn35_calc_blocks_to_gate() warn: always true condition 
'(pipe_ctx->plane_res.mpcc_inst >= 0) => (0-255 >= 0)'

Fixes: 6f8b7565cca4 ("drm/amd/display: Add DCN35 HWSEQ")
Cc: Qingqing Zhuo 
Cc: Harry Wentland 
Cc: Rodrigo Siqueira 
Cc: Roman Li 
Cc: Aurabindo Pillai 
Cc: Tom Chung 
Signed-off-by: Srinivasan Shanmugam 
---
 drivers/gpu/drm/amd/display/dc/hwss/dcn35/dcn35_hwseq.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn35/dcn35_hwseq.c 
b/drivers/gpu/drm/amd/display/dc/hwss/dcn35/dcn35_hwseq.c
index 4b92df23ff0d..3dbbf6ea2603 100644
--- a/drivers/gpu/drm/amd/display/dc/hwss/dcn35/dcn35_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn35/dcn35_hwseq.c
@@ -1019,8 +1019,7 @@ void dcn35_calc_blocks_to_gate(struct dc *dc, struct 
dc_state *context,
if (pipe_ctx->plane_res.dpp)

update_state->pg_pipe_res_update[PG_DPP][pipe_ctx->plane_res.hubp->inst] = 
false;
 
-   if ((pipe_ctx->plane_res.dpp || pipe_ctx->stream_res.opp) &&
-   pipe_ctx->plane_res.mpcc_inst >= 0)
+   if (pipe_ctx->plane_res.dpp || pipe_ctx->stream_res.opp)

update_state->pg_pipe_res_update[PG_MPCC][pipe_ctx->plane_res.mpcc_inst] = 
false;
 
if (pipe_ctx->stream_res.dsc)
-- 
2.34.1



[PATCH] drm/amd/display: Fix logical operator in get_meta_and_pte_attr()

2024-02-23 Thread Srinivasan Shanmugam
In a condition check in the get_meta_and_pte_attr() function, the
bitwise AND operator '&' was used in an 'if' statement where the
logical AND operator '&&' was likely intended.

The condition check was changed from:
if (!surf_linear & (log2_dpte_req_height_ptes == 0) & surf_vert)
to the below:
if (!surf_linear && (log2_dpte_req_height_ptes == 0) && surf_vert)

This ensures the branch is taken only when all three conditions are
met, and that evaluation short-circuits as expected in an 'if' statement.

Fixes the below:
drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn20/display_rq_dlg_calc_20.c:656 
get_meta_and_pte_attr() warn: maybe use && instead of &
drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn20/display_rq_dlg_calc_20v2.c:656
 get_meta_and_pte_attr() warn: maybe use && instead of &
drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn21/display_rq_dlg_calc_21.c:662 
get_meta_and_pte_attr() warn: maybe use && instead of &
drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn30/display_rq_dlg_calc_30.c:627 
get_meta_and_pte_attr() warn: maybe use && instead of &
drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn31/display_rq_dlg_calc_31.c:622 
get_meta_and_pte_attr() warn: maybe use && instead of &
drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn314/display_rq_dlg_calc_314.c:710
 get_meta_and_pte_attr() warn: maybe use && instead of &

Cc: Rodrigo Siqueira 
Cc: Roman Li 
Cc: Aurabindo Pillai 
Cc: Tom Chung 
Signed-off-by: Srinivasan Shanmugam 
---
 .../gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20.c  | 3 ++-
 .../drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20v2.c| 3 ++-
 .../gpu/drm/amd/display/dc/dml/dcn21/display_rq_dlg_calc_21.c  | 3 ++-
 .../gpu/drm/amd/display/dc/dml/dcn30/display_rq_dlg_calc_30.c  | 3 ++-
 .../gpu/drm/amd/display/dc/dml/dcn31/display_rq_dlg_calc_31.c  | 3 ++-
 .../drm/amd/display/dc/dml/dcn314/display_rq_dlg_calc_314.c| 3 ++-
 6 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20.c 
b/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20.c
index 548cdef8a8ad..e689afdf9e7b 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20.c
@@ -653,7 +653,8 @@ static void get_meta_and_pte_attr(struct display_mode_lib 
*mode_lib,
 
// the dpte_group_bytes is reduced for the specific case of vertical
// access of a tile surface that has dpte request of 8x1 ptes.
-   if (!surf_linear & (log2_dpte_req_height_ptes == 0) & surf_vert) 
//reduced, in this case, will have page fault within a group
+   if (!surf_linear && log2_dpte_req_height_ptes == 0 &&
+   surf_vert) //reduced, in this case, will have page fault within a 
group
rq_sizing_param->dpte_group_bytes = 512;
else
//full size
diff --git 
a/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20v2.c 
b/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20v2.c
index 0fc9f3e3ffae..6d105a8bd4c7 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20v2.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20v2.c
@@ -653,7 +653,8 @@ static void get_meta_and_pte_attr(struct display_mode_lib 
*mode_lib,
 
// the dpte_group_bytes is reduced for the specific case of vertical
// access of a tile surface that has dpte request of 8x1 ptes.
-   if (!surf_linear & (log2_dpte_req_height_ptes == 0) & surf_vert) 
//reduced, in this case, will have page fault within a group
+   if (!surf_linear && log2_dpte_req_height_ptes == 0 &&
+   surf_vert) //reduced, in this case, will have page fault within a 
group
rq_sizing_param->dpte_group_bytes = 512;
else
//full size
diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn21/display_rq_dlg_calc_21.c 
b/drivers/gpu/drm/amd/display/dc/dml/dcn21/display_rq_dlg_calc_21.c
index 618f4b682ab1..5d604f52a948 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn21/display_rq_dlg_calc_21.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn21/display_rq_dlg_calc_21.c
@@ -659,7 +659,8 @@ static void get_meta_and_pte_attr(
if (hostvm_enable)
rq_sizing_param->dpte_group_bytes = 512;
else {
-   if (!surf_linear & (log2_dpte_req_height_ptes == 0) & 
surf_vert) //reduced, in this case, will have page fault within a group
+   if (!surf_linear && log2_dpte_req_height_ptes == 0 &&
+   surf_vert) //reduced, in this case, will have page fault 
within a group
rq_sizing_param->dpte_group_bytes = 512;
else
//full size
diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_rq_dlg_calc_30.c 
b/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_rq_dlg_calc_30.c
index 0497a5d74a62..6f6de729b195 100644
--- a/drivers/gpu/dr

[PATCH] drm/amd/display: Improve 'dml32_TruncToValidBPP()' function

2024-02-23 Thread Srinivasan Shanmugam
Refactors the dml32_TruncToValidBPP function by removing a
redundant return statement.

The function previously had a return statement at the end that was
never executed, because every execution path through the function
already returned before reaching it.

Fixes the below:
drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/display_mode_vba_util_32.c:1680
 dml32_TruncToValidBPP() warn: ignoring unreachable code.

Fixes: dda4fb85e433 ("drm/amd/display: DML changes for DCN32/321")
Cc: Rodrigo Siqueira 
Cc: Roman Li 
Cc: Aurabindo Pillai 
Cc: Tom Chung 
Signed-off-by: Srinivasan Shanmugam 
---
 .../gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_util_32.c | 2 --
 1 file changed, 2 deletions(-)

diff --git 
a/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_util_32.c 
b/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_util_32.c
index 80fccd4999a5..54ac8242f7b0 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_util_32.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn32/display_mode_vba_util_32.c
@@ -1678,8 +1678,6 @@ double dml32_TruncToValidBPP(
}
 
*RequiredSlots = dml_ceil(DesiredBPP / MaxLinkBPP * 64, 1);
-
-   return BPP_INVALID;
 } // TruncToValidBPP
 
 double dml32_RequiredDTBCLK(
-- 
2.34.1



[PATCH] drm/amdgpu: Fix missing break in ATOM_ARG_IMM Case of atom_get_src_int()

2024-02-23 Thread Srinivasan Shanmugam
The ATOM_ARG_IMM case of a switch statement in atom_get_src_int()
ended with an unreachable 'return 0;' where a break was intended.
Add the missing break statement, ensuring that the function's control
flow is as intended.

Fixes the below:
drivers/gpu/drm/amd/amdgpu/atom.c:323 atom_get_src_int() warn: ignoring 
unreachable code.

Fixes: d38ceaf99ed0 ("drm/amdgpu: add core driver (v4)")
Cc: Jammy Zhou 
Cc: Christian König 
Cc: Alex Deucher 
Signed-off-by: Srinivasan Shanmugam 
---
 drivers/gpu/drm/amd/amdgpu/atom.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/atom.c 
b/drivers/gpu/drm/amd/amdgpu/atom.c
index b888613f653f..72362df352f6 100644
--- a/drivers/gpu/drm/amd/amdgpu/atom.c
+++ b/drivers/gpu/drm/amd/amdgpu/atom.c
@@ -320,7 +320,7 @@ static uint32_t atom_get_src_int(atom_exec_context *ctx, 
uint8_t attr,
DEBUG("IMM 0x%02X\n", val);
return val;
}
-   return 0;
+   break;
case ATOM_ARG_PLL:
idx = U8(*ptr);
(*ptr)++;
-- 
2.34.1



[PATCH] drm/amd/amdgpu: Fix potential ioremap() memory leaks in amdgpu_device_init()

2024-02-23 Thread Srinivasan Shanmugam
This ensures that the memory mapped by ioremap() for adev->rmmio is
properly handled in amdgpu_device_init(). If the function exits early
due to an error, the memory is unmapped; if the function completes
successfully, the mapping remains in place.

Fixes the below:
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:4337 amdgpu_device_init() warn: 
'adev->rmmio' from ioremap() not released on lines: 
4035,4045,4051,4058,4068,4337

Cc: Christian König 
Cc: Alex Deucher 
Signed-off-by: Srinivasan Shanmugam 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 18 ++
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 1ef892bea488..68bf5e910cb8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4031,8 +4031,10 @@ int amdgpu_device_init(struct amdgpu_device *adev,
 * early on during init and before calling to RREG32.
 */
adev->reset_domain = amdgpu_reset_create_reset_domain(SINGLE_DEVICE, 
"amdgpu-reset-dev");
-   if (!adev->reset_domain)
+   if (!adev->reset_domain) {
+   iounmap(adev->rmmio);
return -ENOMEM;
+   }
 
/* detect hw virtualization here */
amdgpu_detect_virtualization(adev);
@@ -4042,20 +4044,25 @@ int amdgpu_device_init(struct amdgpu_device *adev,
r = amdgpu_device_get_job_timeout_settings(adev);
if (r) {
dev_err(adev->dev, "invalid lockup_timeout parameter syntax\n");
+   iounmap(adev->rmmio);
return r;
}
 
/* early init functions */
r = amdgpu_device_ip_early_init(adev);
-   if (r)
+   if (r) {
+   iounmap(adev->rmmio);
return r;
+   }
 
amdgpu_device_set_mcbp(adev);
 
/* Get rid of things like offb */
r = drm_aperture_remove_conflicting_pci_framebuffers(adev->pdev, 
&amdgpu_kms_driver);
-   if (r)
+   if (r) {
+   iounmap(adev->rmmio);
return r;
+   }
 
/* Enable TMZ based on IP_VERSION */
amdgpu_gmc_tmz_set(adev);
@@ -4064,8 +4071,10 @@ int amdgpu_device_init(struct amdgpu_device *adev,
/* Need to get xgmi info early to decide the reset behavior*/
if (adev->gmc.xgmi.supported) {
r = adev->gfxhub.funcs->get_xgmi_info(adev);
-   if (r)
+   if (r) {
+   iounmap(adev->rmmio);
return r;
+   }
}
 
/* enable PCIE atomic ops */
@@ -4334,6 +4343,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
 failed:
amdgpu_vf_error_trans_all(adev);
 
+   iounmap(adev->rmmio);
return r;
 }
 
-- 
2.34.1



Re: [PATCH] drm/amdkfd: Increase the size of the memory reserved for the TBA

2024-02-23 Thread Felix Kuehling



On 2024-02-23 14:05, Laurent Morichetti wrote:

In a future commit, the cwsr trap handler code size for gfx10.1 will
increase to slightly above the one page mark. Since the TMA does not
need to be page aligned, and only 2 pointers are stored in it, push
the TMA offset by 2 KiB and keep the TBA+TMA reserved memory size
at two pages.

Signed-off-by: Laurent Morichetti 


Reviewed-by: Felix Kuehling 



---
  drivers/gpu/drm/amd/amdkfd/kfd_device.c | 23 ---
  drivers/gpu/drm/amd/amdkfd/kfd_priv.h   |  6 +++---
  2 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 4d399c0c8a57..041ec3de55e7 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -466,34 +466,43 @@ static void kfd_cwsr_init(struct kfd_dev *kfd)
  {
if (cwsr_enable && kfd->device_info.supports_cwsr) {
if (KFD_GC_VERSION(kfd) < IP_VERSION(9, 0, 1)) {
-   BUILD_BUG_ON(sizeof(cwsr_trap_gfx8_hex) > PAGE_SIZE);
+   BUILD_BUG_ON(sizeof(cwsr_trap_gfx8_hex)
+> KFD_CWSR_TMA_OFFSET);
kfd->cwsr_isa = cwsr_trap_gfx8_hex;
kfd->cwsr_isa_size = sizeof(cwsr_trap_gfx8_hex);
} else if (KFD_GC_VERSION(kfd) == IP_VERSION(9, 4, 1)) {
-   BUILD_BUG_ON(sizeof(cwsr_trap_arcturus_hex) > 
PAGE_SIZE);
+   BUILD_BUG_ON(sizeof(cwsr_trap_arcturus_hex)
+> KFD_CWSR_TMA_OFFSET);
kfd->cwsr_isa = cwsr_trap_arcturus_hex;
kfd->cwsr_isa_size = sizeof(cwsr_trap_arcturus_hex);
} else if (KFD_GC_VERSION(kfd) == IP_VERSION(9, 4, 2)) {
-   BUILD_BUG_ON(sizeof(cwsr_trap_aldebaran_hex) > 
PAGE_SIZE);
+   BUILD_BUG_ON(sizeof(cwsr_trap_aldebaran_hex)
+> KFD_CWSR_TMA_OFFSET);
kfd->cwsr_isa = cwsr_trap_aldebaran_hex;
kfd->cwsr_isa_size = sizeof(cwsr_trap_aldebaran_hex);
} else if (KFD_GC_VERSION(kfd) == IP_VERSION(9, 4, 3)) {
-   BUILD_BUG_ON(sizeof(cwsr_trap_gfx9_4_3_hex) > 
PAGE_SIZE);
+   BUILD_BUG_ON(sizeof(cwsr_trap_gfx9_4_3_hex)
+> KFD_CWSR_TMA_OFFSET);
kfd->cwsr_isa = cwsr_trap_gfx9_4_3_hex;
kfd->cwsr_isa_size = sizeof(cwsr_trap_gfx9_4_3_hex);
} else if (KFD_GC_VERSION(kfd) < IP_VERSION(10, 1, 1)) {
-   BUILD_BUG_ON(sizeof(cwsr_trap_gfx9_hex) > PAGE_SIZE);
+   BUILD_BUG_ON(sizeof(cwsr_trap_gfx9_hex)
+> KFD_CWSR_TMA_OFFSET);
kfd->cwsr_isa = cwsr_trap_gfx9_hex;
kfd->cwsr_isa_size = sizeof(cwsr_trap_gfx9_hex);
} else if (KFD_GC_VERSION(kfd) < IP_VERSION(10, 3, 0)) {
-   BUILD_BUG_ON(sizeof(cwsr_trap_nv1x_hex) > PAGE_SIZE);
+   BUILD_BUG_ON(sizeof(cwsr_trap_nv1x_hex)
+> KFD_CWSR_TMA_OFFSET);
kfd->cwsr_isa = cwsr_trap_nv1x_hex;
kfd->cwsr_isa_size = sizeof(cwsr_trap_nv1x_hex);
} else if (KFD_GC_VERSION(kfd) < IP_VERSION(11, 0, 0)) {
-   BUILD_BUG_ON(sizeof(cwsr_trap_gfx10_hex) > PAGE_SIZE);
+   BUILD_BUG_ON(sizeof(cwsr_trap_gfx10_hex)
+> KFD_CWSR_TMA_OFFSET);
kfd->cwsr_isa = cwsr_trap_gfx10_hex;
kfd->cwsr_isa_size = sizeof(cwsr_trap_gfx10_hex);
} else {
+   /* The gfx11 cwsr trap handler must fit inside a single
+  page. */
BUILD_BUG_ON(sizeof(cwsr_trap_gfx11_hex) > PAGE_SIZE);
kfd->cwsr_isa = cwsr_trap_gfx11_hex;
kfd->cwsr_isa_size = sizeof(cwsr_trap_gfx11_hex);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 80320b8603fc..42d40560cd30 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -99,11 +99,11 @@
  /*
   * Size of the per-process TBA+TMA buffer: 2 pages
   *
- * The first page is the TBA used for the CWSR ISA code. The second
- * page is used as TMA for user-mode trap handler setup in daisy-chain mode.
+ * The first chunk is the TBA used for the CWSR ISA code. The second
+ * chunk is used as TMA for user-mode trap handler setup in daisy-chain mode.
   */
  #define KFD_CWSR_TBA_TMA_SIZE (PAGE_SIZE * 2)
-#define KFD_CWSR_TMA_OFFSET PAGE_SIZE
+#define KFD_CWSR_TMA_OFFSET (

[PATCH] drm/amdkfd: Use SQC when TCP would fail in gfx10.1 context save

2024-02-23 Thread Laurent Morichetti
Similarly to gfx9, gfx10.1 drops vector stores when an xnack error is
raised. To work around this issue, use scalar stores instead of vector
stores when trapsts.xnack_error == 1.

Signed-off-by: Laurent Morichetti 
---
 .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 543 --
 .../amd/amdkfd/cwsr_trap_handler_gfx10.asm| 156 -
 2 files changed, 530 insertions(+), 169 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h 
b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
index 2e9b64edb8d2..5a0308d26b53 100644
--- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
+++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
@@ -678,7 +678,7 @@ static const uint32_t cwsr_trap_gfx9_hex[] = {
 };
 
 static const uint32_t cwsr_trap_nv1x_hex[] = {
-   0xbf820001, 0xbf8201f5,
+   0xbf820001, 0xbf820394,
0xb0804004, 0xb978f802,
0x8a78ff78, 0x00020006,
0xb97bf803, 0x876eff78,
@@ -769,13 +769,90 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
0x877c817c, 0xbf06817c,
0xbf850002, 0xbeff0380,
0xbf820002, 0xbeff03c1,
-   0xbf82000b, 0xbef603ff,
-   0x0100, 0xe0704000,
-   0x705d, 0xe0704080,
-   0x705d0100, 0xe0704100,
-   0x705d0200, 0xe0704180,
-   0x705d0300, 0xbf82000a,
-   0xbef603ff, 0x0100,
+   0xbf820058, 0xbef603ff,
+   0x0100, 0xb97af803,
+   0x8a7a7aff, 0x1000,
+   0xbf850049, 0xbe840380,
+   0xd760, 0x0900,
+   0x80048104, 0xd761,
+   0x0900, 0x80048104,
+   0xd762, 0x0900,
+   0x80048104, 0xd763,
+   0x0900, 0x80048104,
+   0xf469003a, 0xe000,
+   0x80709070, 0xbf06a004,
+   0xbf84ffef, 0xbe840380,
+   0xd760, 0x0901,
+   0x80048104, 0xd761,
+   0x0901, 0x80048104,
+   0xd762, 0x0901,
+   0x80048104, 0xd763,
+   0x0901, 0x80048104,
+   0xf469003a, 0xe000,
+   0x80709070, 0xbf06a004,
+   0xbf84ffef, 0xbe840380,
+   0xd760, 0x0902,
+   0x80048104, 0xd761,
+   0x0902, 0x80048104,
+   0xd762, 0x0902,
+   0x80048104, 0xd763,
+   0x0902, 0x80048104,
+   0xf469003a, 0xe000,
+   0x80709070, 0xbf06a004,
+   0xbf84ffef, 0xbe840380,
+   0xd760, 0x0903,
+   0x80048104, 0xd761,
+   0x0903, 0x80048104,
+   0xd762, 0x0903,
+   0x80048104, 0xd763,
+   0x0903, 0x80048104,
+   0xf469003a, 0xe000,
+   0x80709070, 0xbf06a004,
+   0xbf84ffef, 0xbf820060,
+   0xe0704000, 0x705d,
+   0xe0704080, 0x705d0100,
+   0xe0704100, 0x705d0200,
+   0xe0704180, 0x705d0300,
+   0xbf820057, 0xbef603ff,
+   0x0100, 0xb97af803,
+   0x8a7a7aff, 0x1000,
+   0xbf850049, 0xbe840380,
+   0xd760, 0x0900,
+   0x80048104, 0xd761,
+   0x0900, 0x80048104,
+   0xd762, 0x0900,
+   0x80048104, 0xd763,
+   0x0900, 0x80048104,
+   0xf469003a, 0xe000,
+   0x80709070, 0xbf06c004,
+   0xbf84ffef, 0xbe840380,
+   0xd760, 0x0901,
+   0x80048104, 0xd761,
+   0x0901, 0x80048104,
+   0xd762, 0x0901,
+   0x80048104, 0xd763,
+   0x0901, 0x80048104,
+   0xf469003a, 0xe000,
+   0x80709070, 0xbf06c004,
+   0xbf84ffef, 0xbe840380,
+   0xd760, 0x0902,
+   0x80048104, 0xd761,
+   0x0902, 0x80048104,
+   0xd762, 0x0902,
+   0x80048104, 0xd763,
+   0x0902, 0x80048104,
+   0xf469003a, 0xe000,
+   0x80709070, 0xbf06c004,
+   0xbf84ffef, 0xbe840380,
+   0xd760, 0x0903,
+   0x80048104, 0xd761,
+   0x0903, 0x80048104,
+   0xd762, 0x0903,
+   0x80048104, 0xd763,
+   0x0903, 0x80048104,
+   0xf469003a, 0xe000,
+   0x80709070, 0xbf06c004,
+   0xbf84ffef, 0xbf820008,
0xe0704000, 0x705d,
0xe0704100, 0x705d0100,
0xe0704200, 0x705d0200,
@@ -855,9 +932,9 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
0xbf850002, 0xbeff0380,
0xbf820001, 0xbeff03c1,
0xb97b4306, 0x877bc17b,
-   0xbf840044, 0xbf8a,
+   0xbf840086, 0xbf8a,
0x877aff6d, 0x8000,
-   0xbf840040, 0x8f7b867b,
+   0xbf840082, 0x8f7b867b,
0x8f7b827b, 0xbef6037b,
0xb9703a05, 0x80708170,
0xbf0d9973, 0xbf850002,
@@ -871,16 +948,49 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
0xd766, 0x000200c1,
0x1684, 0x907c9973,
0x877c817c, 0xbf06817c,
-   0xbefc0380, 0xbf850012,
-   0xbe8303ff, 0x0080,
+   0xbefc0380, 0xbf850033,
+   0xb97af803, 0x8a7a7aff,
+   0x1000, 0xbf85001d,
+   0xd8d8, 0x0100,
+   0xbf8c, 0xbe840380,
+   0xd760, 0x0901,
+   0x80048104, 0xd761,
+   0x00

[PATCH] drm/amdkfd: Increase the size of the memory reserved for the TBA

2024-02-23 Thread Laurent Morichetti
In a future commit, the cwsr trap handler code size for gfx10.1 will
increase to slightly above the one page mark. Since the TMA does not
need to be page aligned, and only 2 pointers are stored in it, push
the TMA offset by 2 KiB and keep the TBA+TMA reserved memory size
at two pages.

Signed-off-by: Laurent Morichetti 
---
 drivers/gpu/drm/amd/amdkfd/kfd_device.c | 23 ---
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h   |  6 +++---
 2 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 4d399c0c8a57..041ec3de55e7 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -466,34 +466,43 @@ static void kfd_cwsr_init(struct kfd_dev *kfd)
 {
if (cwsr_enable && kfd->device_info.supports_cwsr) {
if (KFD_GC_VERSION(kfd) < IP_VERSION(9, 0, 1)) {
-   BUILD_BUG_ON(sizeof(cwsr_trap_gfx8_hex) > PAGE_SIZE);
+   BUILD_BUG_ON(sizeof(cwsr_trap_gfx8_hex)
+> KFD_CWSR_TMA_OFFSET);
kfd->cwsr_isa = cwsr_trap_gfx8_hex;
kfd->cwsr_isa_size = sizeof(cwsr_trap_gfx8_hex);
} else if (KFD_GC_VERSION(kfd) == IP_VERSION(9, 4, 1)) {
-   BUILD_BUG_ON(sizeof(cwsr_trap_arcturus_hex) > 
PAGE_SIZE);
+   BUILD_BUG_ON(sizeof(cwsr_trap_arcturus_hex)
+> KFD_CWSR_TMA_OFFSET);
kfd->cwsr_isa = cwsr_trap_arcturus_hex;
kfd->cwsr_isa_size = sizeof(cwsr_trap_arcturus_hex);
} else if (KFD_GC_VERSION(kfd) == IP_VERSION(9, 4, 2)) {
-   BUILD_BUG_ON(sizeof(cwsr_trap_aldebaran_hex) > 
PAGE_SIZE);
+   BUILD_BUG_ON(sizeof(cwsr_trap_aldebaran_hex)
+> KFD_CWSR_TMA_OFFSET);
kfd->cwsr_isa = cwsr_trap_aldebaran_hex;
kfd->cwsr_isa_size = sizeof(cwsr_trap_aldebaran_hex);
} else if (KFD_GC_VERSION(kfd) == IP_VERSION(9, 4, 3)) {
-   BUILD_BUG_ON(sizeof(cwsr_trap_gfx9_4_3_hex) > 
PAGE_SIZE);
+   BUILD_BUG_ON(sizeof(cwsr_trap_gfx9_4_3_hex)
+> KFD_CWSR_TMA_OFFSET);
kfd->cwsr_isa = cwsr_trap_gfx9_4_3_hex;
kfd->cwsr_isa_size = sizeof(cwsr_trap_gfx9_4_3_hex);
} else if (KFD_GC_VERSION(kfd) < IP_VERSION(10, 1, 1)) {
-   BUILD_BUG_ON(sizeof(cwsr_trap_gfx9_hex) > PAGE_SIZE);
+   BUILD_BUG_ON(sizeof(cwsr_trap_gfx9_hex)
+> KFD_CWSR_TMA_OFFSET);
kfd->cwsr_isa = cwsr_trap_gfx9_hex;
kfd->cwsr_isa_size = sizeof(cwsr_trap_gfx9_hex);
} else if (KFD_GC_VERSION(kfd) < IP_VERSION(10, 3, 0)) {
-   BUILD_BUG_ON(sizeof(cwsr_trap_nv1x_hex) > PAGE_SIZE);
+   BUILD_BUG_ON(sizeof(cwsr_trap_nv1x_hex)
+> KFD_CWSR_TMA_OFFSET);
kfd->cwsr_isa = cwsr_trap_nv1x_hex;
kfd->cwsr_isa_size = sizeof(cwsr_trap_nv1x_hex);
} else if (KFD_GC_VERSION(kfd) < IP_VERSION(11, 0, 0)) {
-   BUILD_BUG_ON(sizeof(cwsr_trap_gfx10_hex) > PAGE_SIZE);
+   BUILD_BUG_ON(sizeof(cwsr_trap_gfx10_hex)
+> KFD_CWSR_TMA_OFFSET);
kfd->cwsr_isa = cwsr_trap_gfx10_hex;
kfd->cwsr_isa_size = sizeof(cwsr_trap_gfx10_hex);
} else {
+   /* The gfx11 cwsr trap handler must fit inside a single
+  page. */
BUILD_BUG_ON(sizeof(cwsr_trap_gfx11_hex) > PAGE_SIZE);
kfd->cwsr_isa = cwsr_trap_gfx11_hex;
kfd->cwsr_isa_size = sizeof(cwsr_trap_gfx11_hex);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 80320b8603fc..42d40560cd30 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -99,11 +99,11 @@
 /*
  * Size of the per-process TBA+TMA buffer: 2 pages
  *
- * The first page is the TBA used for the CWSR ISA code. The second
- * page is used as TMA for user-mode trap handler setup in daisy-chain mode.
+ * The first chunk is the TBA used for the CWSR ISA code. The second
+ * chunk is used as TMA for user-mode trap handler setup in daisy-chain mode.
  */
 #define KFD_CWSR_TBA_TMA_SIZE (PAGE_SIZE * 2)
-#define KFD_CWSR_TMA_OFFSET PAGE_SIZE
+#define KFD_CWSR_TMA_OFFSET (PAGE_SIZE + 2048)
 
 #define KFD_MAX_NUM_OF_QUEUES_PER_DEVICE   \
(KFD_MAX

RE: [PATCH 2/4] drm/amdgpu: Do not program SQ_TIMEOUT_CONFIG in SRIOV

2024-02-23 Thread Luo, Zhigang
[AMD Official Use Only - General]

Reviewed-by: Zhigang Luo <zhigang@amd.com>


From: Lu, Victor Cheng Chi (Victor) 
Sent: Friday, February 16, 2024 1:50 PM
To: Luo, Zhigang 
Subject: Fw: [PATCH 2/4] drm/amdgpu: Do not program SQ_TIMEOUT_CONFIG in SRIOV


[AMD Official Use Only - General]



From: Lu, Victor Cheng Chi (Victor) <victorchengchi...@amd.com>
Sent: Tuesday, January 2, 2024 12:30 PM
To: amd-gfx@lists.freedesktop.org
Cc: Chander, Vignesh <vignesh.chan...@amd.com>; Lu, Victor Cheng Chi (Victor) <victorchengchi...@amd.com>
Subject: [PATCH 2/4] drm/amdgpu: Do not program SQ_TIMEOUT_CONFIG in SRIOV

VF should not program this register.

Signed-off-by: Victor Lu <victorchengchi...@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
index 00b21ece081f..30cc155f20d4 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
@@ -3888,6 +3888,9 @@ static void gfx_v9_4_3_inst_enable_watchdog_timer(struct 
amdgpu_device *adev,
 uint32_t i;
 uint32_t data;

+   if (amdgpu_sriov_vf(adev))
+   return;
+
 data = RREG32_SOC15(GC, GET_INST(GC, 0), regSQ_TIMEOUT_CONFIG);
 data = REG_SET_FIELD(data, SQ_TIMEOUT_CONFIG, TIMEOUT_FATAL_DISABLE,
  amdgpu_watchdog_timer.timeout_fatal_disable ? 1 : 
0);
--
2.34.1


Re: [PATCH v3 2/3] drm/amdgpu: implement TLB flush fence

2024-02-23 Thread Philip Yang

  


On 2024-02-23 08:42, Shashank Sharma wrote:

From: Christian König

The problem is that when (for example) 4k pages are replaced
with a single 2M page we need to wait for change to be flushed
out by invalidating the TLB before the PT can be freed.

Solve this by moving the TLB flush into a DMA-fence object which
can be used to delay the freeing of the PT BOs until it is signaled.

V2: (Shashank)
- rebase
- set dma_fence_error only in case of error
- add tlb_flush fence only when PT/PD BO is locked (Felix)
- use vm->pasid when f is NULL (Mukul)

Cc: Christian Koenig 
Cc: Felix Kuehling 
Cc: Rajneesh Bhardwaj 
Cc: Alex Deucher 
Signed-off-by: Christian König 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/Makefile   |   3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c|  10 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h|   4 +
 .../gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c  | 106 ++
 4 files changed, 122 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
index 4c989da4d2f3..fdbb3d770c7b 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -70,7 +70,8 @@ amdgpu-y += amdgpu_device.o amdgpu_doorbell_mgr.o amdgpu_kms.o \
 	amdgpu_cs.o amdgpu_bios.o amdgpu_benchmark.o \
 	atombios_dp.o amdgpu_afmt.o amdgpu_trace_points.o \
 	atombios_encoders.o amdgpu_sa.o atombios_i2c.o \
-	amdgpu_dma_buf.o amdgpu_vm.o amdgpu_vm_pt.o amdgpu_ib.o amdgpu_pll.o \
+	amdgpu_dma_buf.o amdgpu_vm.o amdgpu_vm_pt.o amdgpu_vm_tlb_fence.o \
+	amdgpu_ib.o amdgpu_pll.o \
 	amdgpu_ucode.o amdgpu_bo_list.o amdgpu_ctx.o amdgpu_sync.o \
 	amdgpu_gtt_mgr.o amdgpu_preempt_mgr.o amdgpu_vram_mgr.o amdgpu_virt.o \
 	amdgpu_atomfirmware.o amdgpu_vf_error.o amdgpu_sched.o \
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 0960e0a665d3..67c690044b97 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -932,6 +932,15 @@ int amdgpu_vm_update_range(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 	if (r)
 		goto error_unlock;
 
+	/* Prepare a TLB flush fence to be attached to PTs */
+	if (!unlocked && params.needs_flush && vm->is_compute_context) {
+		amdgpu_vm_tlb_fence_create(adev, vm, fence);
+
+		/* Makes sure no PD/PT is freed before the flush */
+		dma_resv_add_fence(vm->root.bo->tbo.base.resv, *fence,
+   DMA_RESV_USAGE_BOOKKEEP);
+	}
+
 	amdgpu_res_first(pages_addr ? NULL : res, offset,
 			 (last - start + 1) * AMDGPU_GPU_PAGE_SIZE, &cursor);
 	while (cursor.remaining) {
@@ -2237,6 +2246,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 
 	mutex_init(&vm->eviction_lock);
 	vm->evicting = false;
+	vm->tlb_fence_context = dma_fence_context_alloc(1);
 
 	r = amdgpu_vm_pt_create(adev, vm, adev->vm_manager.root_level,
 false, &root, xcp_id);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index ac9380afcb69..8e6fd25d07b7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -332,6 +332,7 @@ struct amdgpu_vm {
 	atomic64_t		tlb_seq;
 	uint64_t		tlb_seq_va;
 	uint64_t		*tlb_seq_cpu_addr;
+	uint64_t		tlb_fence_context;
 
 	atomic64_t		kfd_last_flushed_seq;
 
@@ -585,5 +586,8 @@ void amdgpu_vm_update_fault_cache(struct amdgpu_device *adev,
   uint64_t addr,
   uint32_t status,
   unsigned int vmhub);
+void amdgpu_vm_tlb_fence_create(struct amdgpu_device *adev,
+ struct amdgpu_vm *vm,
+ struct dma_fence **fence);
 
 #endif
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c
new file mode 100644
index ..569681badd7c
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c
@@ -0,0 +1,106 @@
+// SPDX-License-Identifier: GPL-2.0 OR MIT
+/*
+ * Copyright 2023 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY C

Re: [PATCH 1/3] drm/amdgpu: Refuse to create a KMS FB for non-P2P exported dma-bufs

2024-02-23 Thread Michel Dänzer
On 2024-02-23 11:04, Michel Dänzer wrote:
> On 2024-02-23 10:34, Christian König wrote:
>> Am 23.02.24 um 09:11 schrieb Michel Dänzer:
>>> On 2024-02-23 08:06, Christian König wrote:
 Am 22.02.24 um 18:28 schrieb Michel Dänzer:
> From: Michel Dänzer 
>
> Pinning the BO storage to VRAM for scanout would make it inaccessible
> to non-P2P dma-buf importers.
 Thinking more about it I don't think we can do this.

 Using the BO in a ping/pong fashion for scanout and DMA-buf is actually 
 valid, you just can't do both at the same time.

 And if I'm not completely mistaken we actually have use cases for this at 
 the moment,
>>> Those use cases don't have P2P & CONFIG_DMABUF_MOVE_NOTIFY?
>>
>> Nope, we are basically talking about unit tests and examples for inter 
>> device operations.
> 
> Sounds like the "no user-space regressions" rule might not apply then.

To clarify what I mean by that:

"We can't fix this issue, because it would break some unit tests and examples" 
is similar to saying "We can't fix this KMS bug, because there are IGT tests 
expecting the buggy behaviour". In practice, the latter do get fixed, along 
with the IGT tests.


-- 
Earthling Michel Dänzer|  https://redhat.com
Libre software enthusiast  | Mesa and Xwayland developer



Re: [PATCH] drm/amdgpu/pm: Fix the power1_min_cap value

2024-02-23 Thread Alex Deucher
On Fri, Feb 23, 2024 at 4:20 AM Ma Jun  wrote:
>
> It's unreasonable to use 0 as the power1_min_cap when
> OD is disabled. So, use the same lower limit as the value
> used when OD is enabled.
>
> Signed-off-by: Ma Jun 

Fixes: 1958946858a6 ("drm/amd/pm: Support for getting power1_cap_min value")

Acked-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c   | 9 -
>  drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c | 9 -
>  drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c | 9 -
>  drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c| 9 -
>  drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_7_ppt.c| 9 -
>  5 files changed, 20 insertions(+), 25 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c 
> b/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c
> index 4cd43bbec910..bcad42534da4 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c
> @@ -1303,13 +1303,12 @@ static int arcturus_get_power_limit(struct 
> smu_context *smu,
> if (default_power_limit)
> *default_power_limit = power_limit;
>
> -   if (smu->od_enabled) {
> +   if (smu->od_enabled)
> od_percent_upper = 
> le32_to_cpu(powerplay_table->overdrive_table.max[SMU_11_0_ODSETTING_POWERPERCENTAGE]);
> -   od_percent_lower = 
> le32_to_cpu(powerplay_table->overdrive_table.min[SMU_11_0_ODSETTING_POWERPERCENTAGE]);
> -   } else {
> +   else
> od_percent_upper = 0;
> -   od_percent_lower = 100;
> -   }
> +
> +   od_percent_lower = 
> le32_to_cpu(powerplay_table->overdrive_table.min[SMU_11_0_ODSETTING_POWERPERCENTAGE]);
>
> dev_dbg(smu->adev->dev, "od percent upper:%d, od percent lower:%d 
> (default power: %d)\n",
> od_percent_upper, 
> od_percent_lower, power_limit);
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c 
> b/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
> index 8d1d29ffb0f1..ed189a3878eb 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
> @@ -2357,13 +2357,12 @@ static int navi10_get_power_limit(struct smu_context 
> *smu,
> *default_power_limit = power_limit;
>
> if (smu->od_enabled &&
> -   navi10_od_feature_is_supported(od_settings, 
> SMU_11_0_ODCAP_POWER_LIMIT)) {
> +   navi10_od_feature_is_supported(od_settings, 
> SMU_11_0_ODCAP_POWER_LIMIT))
> od_percent_upper = 
> le32_to_cpu(powerplay_table->overdrive_table.max[SMU_11_0_ODSETTING_POWERPERCENTAGE]);
> -   od_percent_lower = 
> le32_to_cpu(powerplay_table->overdrive_table.min[SMU_11_0_ODSETTING_POWERPERCENTAGE]);
> -   } else {
> +   else
> od_percent_upper = 0;
> -   od_percent_lower = 100;
> -   }
> +
> +   od_percent_lower = 
> le32_to_cpu(powerplay_table->overdrive_table.min[SMU_11_0_ODSETTING_POWERPERCENTAGE]);
>
> dev_dbg(smu->adev->dev, "od percent upper:%d, od percent lower:%d 
> (default power: %d)\n",
> od_percent_upper, od_percent_lower, 
> power_limit);
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c 
> b/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
> index f2f401f00ed1..a405424dd699 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
> @@ -640,13 +640,12 @@ static int sienna_cichlid_get_power_limit(struct 
> smu_context *smu,
> if (default_power_limit)
> *default_power_limit = power_limit;
>
> -   if (smu->od_enabled) {
> +   if (smu->od_enabled)
> od_percent_upper = 
> le32_to_cpu(powerplay_table->overdrive_table.max[SMU_11_0_7_ODSETTING_POWERPERCENTAGE]);
> -   od_percent_lower = 
> le32_to_cpu(powerplay_table->overdrive_table.min[SMU_11_0_7_ODSETTING_POWERPERCENTAGE]);
> -   } else {
> +   else
> od_percent_upper = 0;
> -   od_percent_lower = 100;
> -   }
> +
> +   od_percent_lower = 
> le32_to_cpu(powerplay_table->overdrive_table.min[SMU_11_0_7_ODSETTING_POWERPERCENTAGE]);
>
> dev_dbg(smu->adev->dev, "od percent upper:%d, od percent lower:%d 
> (default power: %d)\n",
> od_percent_upper, od_percent_lower, 
> power_limit);
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c 
> b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c
> index 67f44f851f59..9649484f11c0 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c
> @@ -2372,13 +2372,12 @@ static int smu_v13_0_0_get_power_limit(struct 
> smu_context *smu,
> 

Re: [PATCH] drm/amdgpu: disable ring_muxer if mcbp is off

2024-02-23 Thread Alex Deucher
On Fri, Feb 23, 2024 at 4:48 AM Pierre-Eric Pelloux-Prayer
 wrote:
>
> Using the ring_muxer without preemption adds overhead for no
> reason since mcbp cannot be triggered.
>
> Moving back to a single queue in this case also helps when
> high priority apps are used: in this case the gpu_scheduler
> priority handling will work as expected - much better than
> ring_muxer with its 2 independent schedulers competing for
> the same hardware queue.
>
> This change requires moving amdgpu_device_set_mcbp above
> amdgpu_device_ip_early_init because we use adev->gfx.mcbp.
>
> Signed-off-by: Pierre-Eric Pelloux-Prayer 

Reviewed-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  4 ++--
>  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  | 21 -
>  2 files changed, 14 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index d534e192e260..40516d24026c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -4054,13 +4054,13 @@ int amdgpu_device_init(struct amdgpu_device *adev,
> return r;
> }
>
> +   amdgpu_device_set_mcbp(adev);
> +
> /* early init functions */
> r = amdgpu_device_ip_early_init(adev);
> if (r)
> return r;
>
> -   amdgpu_device_set_mcbp(adev);
> -
> /* Get rid of things like offb */
> r = drm_aperture_remove_conflicting_pci_framebuffers(adev->pdev, 
> &amdgpu_kms_driver);
> if (r)
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> index 169d45268ef6..f682f830f7f6 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> @@ -2080,7 +2080,7 @@ static int gfx_v9_0_sw_init(void *handle)
> ring->doorbell_index = adev->doorbell_index.gfx_ring0 << 1;
>
> /* disable scheduler on the real ring */
> -   ring->no_scheduler = true;
> +   ring->no_scheduler = adev->gfx.mcbp;
> ring->vm_hub = AMDGPU_GFXHUB(0);
> r = amdgpu_ring_init(adev, ring, 1024, &adev->gfx.eop_irq,
>  AMDGPU_CP_IRQ_GFX_ME0_PIPE0_EOP,
> @@ -2090,7 +2090,7 @@ static int gfx_v9_0_sw_init(void *handle)
> }
>
> /* set up the software rings */
> -   if (adev->gfx.num_gfx_rings) {
> +   if (adev->gfx.mcbp && adev->gfx.num_gfx_rings) {
> for (i = 0; i < GFX9_NUM_SW_GFX_RINGS; i++) {
> ring = &adev->gfx.sw_gfx_ring[i];
> ring->ring_obj = NULL;
> @@ -2181,7 +2181,7 @@ static int gfx_v9_0_sw_fini(void *handle)
> int i;
> struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>
> -   if (adev->gfx.num_gfx_rings) {
> +   if (adev->gfx.mcbp && adev->gfx.num_gfx_rings) {
> for (i = 0; i < GFX9_NUM_SW_GFX_RINGS; i++)
> amdgpu_ring_fini(&adev->gfx.sw_gfx_ring[i]);
> amdgpu_ring_mux_fini(&adev->gfx.muxer);
> @@ -5910,11 +5910,14 @@ static int gfx_v9_0_eop_irq(struct amdgpu_device 
> *adev,
>
> switch (me_id) {
> case 0:
> -   if (adev->gfx.num_gfx_rings &&
> -   !amdgpu_mcbp_handle_trailing_fence_irq(&adev->gfx.muxer)) 
> {
> -   /* Fence signals are handled on the software rings*/
> -   for (i = 0; i < GFX9_NUM_SW_GFX_RINGS; i++)
> -   
> amdgpu_fence_process(&adev->gfx.sw_gfx_ring[i]);
> +   if (adev->gfx.num_gfx_rings) {
> +   if (!adev->gfx.mcbp) {
> +   amdgpu_fence_process(&adev->gfx.gfx_ring[0]);
> +   } else if 
> (!amdgpu_mcbp_handle_trailing_fence_irq(&adev->gfx.muxer)) {
> +   /* Fence signals are handled on the software 
> rings*/
> +   for (i = 0; i < GFX9_NUM_SW_GFX_RINGS; i++)
> +   
> amdgpu_fence_process(&adev->gfx.sw_gfx_ring[i]);
> +   }
> }
> break;
> case 1:
> @@ -7051,7 +7054,7 @@ static void gfx_v9_0_set_ring_funcs(struct 
> amdgpu_device *adev)
> for (i = 0; i < adev->gfx.num_gfx_rings; i++)
> adev->gfx.gfx_ring[i].funcs = &gfx_v9_0_ring_funcs_gfx;
>
> -   if (adev->gfx.num_gfx_rings) {
> +   if (adev->gfx.mcbp && adev->gfx.num_gfx_rings) {
> for (i = 0; i < GFX9_NUM_SW_GFX_RINGS; i++)
> adev->gfx.sw_gfx_ring[i].funcs = 
> &gfx_v9_0_sw_ring_funcs_gfx;
> }
> --
> 2.40.1
>


RE: [PATCH] drm/amdkfd: Add partition id field to location_id

2024-02-23 Thread Kim, Jonathan
[Public]

> -Original Message-
> From: Lazar, Lijo 
> Sent: Thursday, February 22, 2024 10:49 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Zhang, Hawking ; Deucher, Alexander
> ; Kim, Jonathan ;
> Poag, Charis ; Cheung, Donald
> ; Yat Sin, David ;
> Chaudhary, Jatin Jaikishan 
> Subject: [PATCH] drm/amdkfd: Add partition id field to location_id
>
> On devices which have multi-partition nodes, keep partition id in
> location_id[31:28].
>
> Signed-off-by: Lijo Lazar 

Reviewed-by: Jonathan Kim 

> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> index aee2fcab241f..0da747d52975 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> @@ -1997,8 +1997,9 @@ int kfd_topology_add_device(struct kfd_node
> *gpu)
>   HSA_CAP_ASIC_REVISION_MASK);
>
>   dev->node_props.location_id = pci_dev_id(gpu->adev->pdev);
> - if (KFD_GC_VERSION(dev->gpu->kfd) == IP_VERSION(9, 4, 3))
> - dev->node_props.location_id |= dev->gpu->node_id;
> + /* On multi-partition nodes, node id = location_id[31:28] */
> + if (gpu->kfd->num_nodes > 1)
> + dev->node_props.location_id |= (dev->gpu->node_id << 28);
>
>   dev->node_props.domain = pci_domain_nr(gpu->adev->pdev->bus);
>   dev->node_props.max_engine_clk_fcompute =
> --
> 2.25.1



Re: [PATCH] drm/amdgpu: Enable gpu reset for S3 abort cases on Raven series

2024-02-23 Thread Alex Deucher
On Thu, Feb 22, 2024 at 9:48 PM Prike Liang  wrote:
>
> GPU reset can now be performed successfully on the Raven series,
> and it is required for the S3 suspend abort case. So enable GPU
> reset for the S3 abort cases on the Raven series.
>
> Signed-off-by: Prike Liang 

Acked-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/amdgpu/soc15.c | 45 +-
>  1 file changed, 25 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c 
> b/drivers/gpu/drm/amd/amdgpu/soc15.c
> index e4012f53632b..dec81ccf6240 100644
> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
> @@ -574,11 +574,34 @@ soc15_asic_reset_method(struct amdgpu_device *adev)
> return AMD_RESET_METHOD_MODE1;
>  }
>
> +static bool soc15_need_reset_on_resume(struct amdgpu_device *adev)
> +{
> +   u32 sol_reg;
> +
> +   sol_reg = RREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_81);
> +
> +   /* Will reset for the following suspend abort cases.
> +* 1) Only reset limit on APU side, dGPU hasn't checked yet.
> +* 2) S3 suspend abort and TOS already launched.
> +*/
> +   if (adev->flags & AMD_IS_APU && adev->in_s3 &&
> +   !adev->suspend_complete &&
> +   sol_reg)
> +   return true;
> +
> +   return false;
> +}
> +
>  static int soc15_asic_reset(struct amdgpu_device *adev)
>  {
> /* original raven doesn't have full asic reset */
> -   if ((adev->apu_flags & AMD_APU_IS_RAVEN) ||
> -   (adev->apu_flags & AMD_APU_IS_RAVEN2))
> +   /* On the latest Raven, the GPU reset can be performed
> +* successfully. So now, temporarily enable it for the
> +* S3 suspend abort case.
> +*/
> +   if (((adev->apu_flags & AMD_APU_IS_RAVEN) ||
> +   (adev->apu_flags & AMD_APU_IS_RAVEN2)) &&
> +   !soc15_need_reset_on_resume(adev))
> return 0;
>
> switch (soc15_asic_reset_method(adev)) {
> @@ -1298,24 +1321,6 @@ static int soc15_common_suspend(void *handle)
> return soc15_common_hw_fini(adev);
>  }
>
> -static bool soc15_need_reset_on_resume(struct amdgpu_device *adev)
> -{
> -   u32 sol_reg;
> -
> -   sol_reg = RREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_81);
> -
> -   /* Will reset for the following suspend abort cases.
> -* 1) Only reset limit on APU side, dGPU hasn't checked yet.
> -* 2) S3 suspend abort and TOS already launched.
> -*/
> -   if (adev->flags & AMD_IS_APU && adev->in_s3 &&
> -   !adev->suspend_complete &&
> -   sol_reg)
> -   return true;
> -
> -   return false;
> -}
> -
>  static int soc15_common_resume(void *handle)
>  {
> struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> --
> 2.34.1
>


[PATCH v3 3/3] drm/amdgpu: sync page table freeing with tlb flush

2024-02-23 Thread Shashank Sharma
This patch:
- adds a new list in amdgpu_vm to hold the VM PT entries being freed
- waits for the TLB flush using the vm->tlb_flush_fence
- actually frees the PT BOs

V2: rebase
V3: Do not attach the tlb_fence to the entries, rather add the entries
to a list and delay their freeing (Christian)

Cc: Christian König 
Cc: Alex Deucher 
Cc: Felix Kuehling 
Cc: Rajneesh Bhardwaj 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c|  6 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h|  6 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 51 ---
 3 files changed, 58 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 67c690044b97..eebb73f2c2ef 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -939,6 +939,10 @@ int amdgpu_vm_update_range(struct amdgpu_device *adev, 
struct amdgpu_vm *vm,
/* Makes sure no PD/PT is freed before the flush */
dma_resv_add_fence(vm->root.bo->tbo.base.resv, *fence,
   DMA_RESV_USAGE_BOOKKEEP);
+
+   mutex_lock(&vm->tlb_fence_lock);
+   vm->tlb_fence_last = *fence;
+   mutex_unlock(&vm->tlb_fence_lock);
}
 
amdgpu_res_first(pages_addr ? NULL : res, offset,
@@ -2212,6 +2216,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct 
amdgpu_vm *vm,
INIT_LIST_HEAD(&vm->freed);
INIT_LIST_HEAD(&vm->done);
INIT_LIST_HEAD(&vm->pt_freed);
+   INIT_LIST_HEAD(&vm->tlb_flush_waitlist);
INIT_WORK(&vm->pt_free_work, amdgpu_vm_pt_free_work);
INIT_KFIFO(vm->faults);
 
@@ -2244,6 +2249,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct 
amdgpu_vm *vm,
vm->last_unlocked = dma_fence_get_stub();
vm->generation = 0;
 
+   mutex_init(&vm->tlb_fence_lock);
mutex_init(&vm->eviction_lock);
vm->evicting = false;
vm->tlb_fence_context = dma_fence_context_alloc(1);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 8e6fd25d07b7..77f10ed80973 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -334,6 +334,10 @@ struct amdgpu_vm {
uint64_t*tlb_seq_cpu_addr;
uint64_ttlb_fence_context;
 
+   struct mutextlb_fence_lock;
+   struct dma_fence*tlb_fence_last;
+   struct list_headtlb_flush_waitlist;
+
atomic64_t  kfd_last_flushed_seq;
 
/* How many times we had to re-generate the page tables */
@@ -379,6 +383,8 @@ struct amdgpu_vm {
 
/* cached fault info */
struct amdgpu_vm_fault_info fault_info;
+
+   int count_bos;
 };
 
 struct amdgpu_vm_manager {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
index 95dc0afdaffb..57ea95c5c085 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
@@ -643,13 +643,13 @@ static void amdgpu_vm_pt_free(struct amdgpu_vm_bo_base 
*entry)
if (!entry->bo)
return;
 
-   entry->bo->vm_bo = NULL;
shadow = amdgpu_bo_shadowed(entry->bo);
if (shadow) {
ttm_bo_set_bulk_move(&shadow->tbo, NULL);
amdgpu_bo_unref(&shadow);
}
ttm_bo_set_bulk_move(&entry->bo->tbo, NULL);
+   entry->bo->vm_bo = NULL;
 
spin_lock(&entry->vm->status_lock);
list_del(&entry->vm_status);
@@ -657,6 +657,38 @@ static void amdgpu_vm_pt_free(struct amdgpu_vm_bo_base 
*entry)
amdgpu_bo_unref(&entry->bo);
 }
 
+static void amdgpu_vm_pt_flush_waitlist(struct amdgpu_vm *vm)
+{
+   struct amdgpu_vm_bo_base *entry, *next;
+   LIST_HEAD(tlb_flush_waitlist);
+
+   if (!vm || list_empty(&vm->tlb_flush_waitlist))
+   return;
+
+   /* Wait for pending TLB flush before freeing PT BOs */
+   mutex_lock(&vm->tlb_fence_lock);
+   if (vm->tlb_fence_last && !dma_fence_is_signaled(vm->tlb_fence_last)) {
+   if (dma_fence_wait_timeout(vm->tlb_fence_last, false,
+  MAX_SCHEDULE_TIMEOUT) <= 0) {
+   DRM_ERROR("Timedout waiting for TLB flush, not freeing 
PT BOs\n");
+   mutex_unlock(&vm->tlb_fence_lock);
+   return;
+   }
+
+   vm->tlb_fence_last = NULL;
+   }
+
+   /* Save the waitlist locally and reset the flushlist */
+   list_splice_init(&vm->tlb_flush_waitlist, &tlb_flush_waitlist);
+   mutex_unlock(&vm->tlb_fence_lock);
+
+   /* Now free the entries */
+   list_for_each_entry_safe(entry, next, &tlb_flush_waitlist, vm_status) {
+   if (entry)
+   amdgpu_vm_pt_free(entry);
+   }
+}
+
 void amdgp

[PATCH v3 2/3] drm/amdgpu: implement TLB flush fence

2024-02-23 Thread Shashank Sharma
From: Christian König 

The problem is that when (for example) 4k pages are replaced
with a single 2M page we need to wait for the change to be flushed
out by invalidating the TLB before the PT can be freed.

Solve this by moving the TLB flush into a DMA-fence object which
can be used to delay the freeing of the PT BOs until it is signaled.

V2: (Shashank)
- rebase
- set dma_fence_error only in case of error
- add tlb_flush fence only when PT/PD BO is locked (Felix)
- use vm->pasid when f is NULL (Mukul)

Cc: Christian Koenig 
Cc: Felix Kuehling 
Cc: Rajneesh Bhardwaj 
Cc: Alex Deucher 
Signed-off-by: Christian König 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/Makefile   |   3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c|  10 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h|   4 +
 .../gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c  | 106 ++
 4 files changed, 122 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile
index 4c989da4d2f3..fdbb3d770c7b 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -70,7 +70,8 @@ amdgpu-y += amdgpu_device.o amdgpu_doorbell_mgr.o 
amdgpu_kms.o \
amdgpu_cs.o amdgpu_bios.o amdgpu_benchmark.o \
atombios_dp.o amdgpu_afmt.o amdgpu_trace_points.o \
atombios_encoders.o amdgpu_sa.o atombios_i2c.o \
-   amdgpu_dma_buf.o amdgpu_vm.o amdgpu_vm_pt.o amdgpu_ib.o amdgpu_pll.o \
+   amdgpu_dma_buf.o amdgpu_vm.o amdgpu_vm_pt.o amdgpu_vm_tlb_fence.o \
+   amdgpu_ib.o amdgpu_pll.o \
amdgpu_ucode.o amdgpu_bo_list.o amdgpu_ctx.o amdgpu_sync.o \
amdgpu_gtt_mgr.o amdgpu_preempt_mgr.o amdgpu_vram_mgr.o amdgpu_virt.o \
amdgpu_atomfirmware.o amdgpu_vf_error.o amdgpu_sched.o \
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 0960e0a665d3..67c690044b97 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -932,6 +932,15 @@ int amdgpu_vm_update_range(struct amdgpu_device *adev, 
struct amdgpu_vm *vm,
if (r)
goto error_unlock;
 
+   /* Prepare a TLB flush fence to be attached to PTs */
+   if (!unlocked && params.needs_flush && vm->is_compute_context) {
+   amdgpu_vm_tlb_fence_create(adev, vm, fence);
+
+   /* Makes sure no PD/PT is freed before the flush */
+   dma_resv_add_fence(vm->root.bo->tbo.base.resv, *fence,
+  DMA_RESV_USAGE_BOOKKEEP);
+   }
+
amdgpu_res_first(pages_addr ? NULL : res, offset,
 (last - start + 1) * AMDGPU_GPU_PAGE_SIZE, &cursor);
while (cursor.remaining) {
@@ -2237,6 +2246,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct 
amdgpu_vm *vm,
 
mutex_init(&vm->eviction_lock);
vm->evicting = false;
+   vm->tlb_fence_context = dma_fence_context_alloc(1);
 
r = amdgpu_vm_pt_create(adev, vm, adev->vm_manager.root_level,
false, &root, xcp_id);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index ac9380afcb69..8e6fd25d07b7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -332,6 +332,7 @@ struct amdgpu_vm {
atomic64_t  tlb_seq;
uint64_ttlb_seq_va;
uint64_t*tlb_seq_cpu_addr;
+   uint64_ttlb_fence_context;
 
atomic64_t  kfd_last_flushed_seq;
 
@@ -585,5 +586,8 @@ void amdgpu_vm_update_fault_cache(struct amdgpu_device 
*adev,
  uint64_t addr,
  uint32_t status,
  unsigned int vmhub);
+void amdgpu_vm_tlb_fence_create(struct amdgpu_device *adev,
+struct amdgpu_vm *vm,
+struct dma_fence **fence);
 
 #endif
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c
new file mode 100644
index ..569681badd7c
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c
@@ -0,0 +1,106 @@
+// SPDX-License-Identifier: GPL-2.0 OR MIT
+/*
+ * Copyright 2023 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and th

[PATCH v3 1/3] drm/amdgpu: replace TLB seq callback with HW seq

2024-02-23 Thread Shashank Sharma
From: Christian König 

The callback we installed for the SDMA update was actually pretty
horrible. Since we now have seq64, use that one and HW seq writes
instead.

V2:(Shashank)
 - rebased on amd-drm-staging-next
 - changed amdgpu_seq64_gpu_addr

Cc: Christian König 
Cc: Alex Deucher 
Cc: Felix Kuehling 
Cc: Rajneesh Bhardwaj 
Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c   | 14 
 drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.h   |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c  | 79 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h  | 27 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c  |  3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c   |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c |  5 ++
 7 files changed, 42 insertions(+), 89 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
index 3d0d56087d41..300dc79fa2b9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
@@ -199,6 +199,20 @@ void amdgpu_seq64_free(struct amdgpu_device *adev, u64 va)
__clear_bit(bit_pos, adev->seq64.used);
 }
 
+/**
+ * amdgpu_seq64_gpu_addr - Calculate GPU addr from va
+ *
+ * @adev: amdgpu_device pointer
+ * @va: virtual address in client address space
+ *
+ * Calculate the GART address for a VA.
+ */
+u64 amdgpu_seq64_gpu_addr(struct amdgpu_device *adev, u64 va)
+{
+   return va - amdgpu_seq64_get_va_base(adev) +
+   amdgpu_bo_gpu_offset(adev->seq64.sbo);
+}
+
 /**
  * amdgpu_seq64_fini - Cleanup seq64 driver
  *
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.h
index 4203b2ab318d..63e8ac0a2057 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.h
@@ -43,6 +43,7 @@ void amdgpu_seq64_free(struct amdgpu_device *adev, u64 
gpu_addr);
 int amdgpu_seq64_map(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 struct amdgpu_bo_va **bo_va);
 void amdgpu_seq64_unmap(struct amdgpu_device *adev, struct amdgpu_fpriv 
*fpriv);
+u64 amdgpu_seq64_gpu_addr(struct amdgpu_device *adev, u64 va);
 
 #endif
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index ed4a8c5d26d7..0960e0a665d3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -111,21 +111,6 @@ struct amdgpu_prt_cb {
struct dma_fence_cb cb;
 };
 
-/**
- * struct amdgpu_vm_tlb_seq_struct - Helper to increment the TLB flush sequence
- */
-struct amdgpu_vm_tlb_seq_struct {
-   /**
-* @vm: pointer to the amdgpu_vm structure to set the fence sequence on
-*/
-   struct amdgpu_vm *vm;
-
-   /**
-* @cb: callback
-*/
-   struct dma_fence_cb cb;
-};
-
 /**
  * amdgpu_vm_set_pasid - manage pasid and vm ptr mapping
  *
@@ -862,23 +847,6 @@ int amdgpu_vm_update_pdes(struct amdgpu_device *adev,
return r;
 }
 
-/**
- * amdgpu_vm_tlb_seq_cb - make sure to increment tlb sequence
- * @fence: unused
- * @cb: the callback structure
- *
- * Increments the tlb sequence to make sure that future CS execute a VM flush.
- */
-static void amdgpu_vm_tlb_seq_cb(struct dma_fence *fence,
-struct dma_fence_cb *cb)
-{
-   struct amdgpu_vm_tlb_seq_struct *tlb_cb;
-
-   tlb_cb = container_of(cb, typeof(*tlb_cb), cb);
-   atomic64_inc(&tlb_cb->vm->tlb_seq);
-   kfree(tlb_cb);
-}
-
 /**
  * amdgpu_vm_update_range - update a range in the vm page table
  *
@@ -911,7 +879,6 @@ int amdgpu_vm_update_range(struct amdgpu_device *adev, 
struct amdgpu_vm *vm,
   struct dma_fence **fence)
 {
struct amdgpu_vm_update_params params;
-   struct amdgpu_vm_tlb_seq_struct *tlb_cb;
struct amdgpu_res_cursor cursor;
enum amdgpu_sync_mode sync_mode;
int r, idx;
@@ -919,12 +886,6 @@ int amdgpu_vm_update_range(struct amdgpu_device *adev, 
struct amdgpu_vm *vm,
if (!drm_dev_enter(adev_to_drm(adev), &idx))
return -ENODEV;
 
-   tlb_cb = kmalloc(sizeof(*tlb_cb), GFP_KERNEL);
-   if (!tlb_cb) {
-   r = -ENOMEM;
-   goto error_unlock;
-   }
-
/* Vega20+XGMI where PTEs get inadvertently cached in L2 texture cache,
 * heavy-weight flush TLB unconditionally.
 */
@@ -942,6 +903,7 @@ int amdgpu_vm_update_range(struct amdgpu_device *adev, 
struct amdgpu_vm *vm,
params.immediate = immediate;
params.pages_addr = pages_addr;
params.unlocked = unlocked;
+   params.needs_flush = flush_tlb;
params.allow_override = allow_override;
 
/* Implicitly sync to command submissions in the same VM before
@@ -955,7 +917,7 @@ int amdgpu_vm_update_range(struct amdgpu_device *adev, 
struct amdgpu_vm *vm,
amdgpu_vm_eviction_lock(vm);
if (vm->evicting) {
 

Re: [PATCH 1/3] drm/amdgpu: Refuse to create a KMS FB for non-P2P exported dma-bufs

2024-02-23 Thread Michel Dänzer
On 2024-02-23 10:34, Christian König wrote:
> Am 23.02.24 um 09:11 schrieb Michel Dänzer:
>> On 2024-02-23 08:06, Christian König wrote:
>>> Am 22.02.24 um 18:28 schrieb Michel Dänzer:
 From: Michel Dänzer 

 Pinning the BO storage to VRAM for scanout would make it inaccessible
 to non-P2P dma-buf importers.
>>> Thinking more about it I don't think we can do this.
>>>
>>> Using the BO in a ping/pong fashion for scanout and DMA-buf is actually 
>>> valid, you just can't do both at the same time.
>>>
>>> And if I'm not completely mistaken we actually have use cases for this at 
>>> the moment,
>> Those use cases don't have P2P & CONFIG_DMABUF_MOVE_NOTIFY?
> 
> Nope, we are basically talking about unit tests and examples for inter device 
> operations.

Sounds like the "no user-space regressions" rule might not apply then.


> Those render into a shared buffer and then display it to check if the content 
> was rendered/transferred correctly.

That can be fixed by dropping the dma-buf attachments from other GPUs before 
creating the KMS FB.

Conversely, tests / examples which do scanout first can be fixed by destroying 
KMS FBs before sharing the BO with another GPU.


> I'm not sure if we still do those test cases, the last time I looked into it 
> was before P2P was even supported, but I also can't rule it out.

Sounds too vague to block this series.


>>> So rejecting things during CS and atomic commit is the best thing we can do.
>> It's problematic for a Wayland compositor:
>>
>> The CS ioctl failing is awkward. With GL, I'm pretty sure it means the 
>> compositor would have to destroy the context and create a new one. Not sure 
>> about Vulkan, but I suspect it's painful there as well.
>>
>> Similarly for the KMS atomic commit ioctl. The compositor can't know why 
>> exactly it failed, all it gets is an error code.
> 
> Yeah, but that is not because the kernel is doing anything wrong.
> 
> Sharing, rendering and then doing an atomic commit is a perfectly valid use 
> case.
> 
> You just can't do scanout and sharing at the same time.

Per my later follow-up, Xwayland can't really avoid it.


>> And there's no other way for the compositor to detect when both things can 
>> actually work concurrently.
> 
> That I totally agree with. And IIRC we already have at least a query for the 
> buffer placement. E.g. you can already check if the BO is in GTT or VRAM and 
> shared.
> 
> What's missing is exposing if the device can scanout from GTT or not.

Requiring Wayland compositors to have driver-specific knowledge like that baked 
in isn't great either.


-- 
Earthling Michel Dänzer|  https://redhat.com
Libre software enthusiast  | Mesa and Xwayland developer



Re: [PATCH] drm/amdgpu/pm: Fix the power1_min_cap value

2024-02-23 Thread Christian König

Am 23.02.24 um 10:19 schrieb Ma Jun:

It's unreasonable to use 0 as the power1_min_cap when
OD is disabled. So, use the same lower limit as the value
used when OD is enabled.

Signed-off-by: Ma Jun 


Acked-by: Christian König 


---
  drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c   | 9 -
  drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c | 9 -
  drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c | 9 -
  drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c| 9 -
  drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_7_ppt.c| 9 -
  5 files changed, 20 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c
index 4cd43bbec910..bcad42534da4 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c
@@ -1303,13 +1303,12 @@ static int arcturus_get_power_limit(struct smu_context 
*smu,
if (default_power_limit)
*default_power_limit = power_limit;
  
-	if (smu->od_enabled) {

+   if (smu->od_enabled)
od_percent_upper = 
le32_to_cpu(powerplay_table->overdrive_table.max[SMU_11_0_ODSETTING_POWERPERCENTAGE]);
-   od_percent_lower = 
le32_to_cpu(powerplay_table->overdrive_table.min[SMU_11_0_ODSETTING_POWERPERCENTAGE]);
-   } else {
+   else
od_percent_upper = 0;
-   od_percent_lower = 100;
-   }
+
+   od_percent_lower = 
le32_to_cpu(powerplay_table->overdrive_table.min[SMU_11_0_ODSETTING_POWERPERCENTAGE]);
  
[PATCH] drm/amdgpu: disable ring_muxer if mcbp is off

2024-02-23 Thread Pierre-Eric Pelloux-Prayer
Using the ring_muxer without preemption adds overhead for no
reason since mcbp cannot be triggered.

Moving back to a single queue in this case also helps when
high priority apps are used: in this case the gpu_scheduler
priority handling will work as expected - much better than
ring_muxer with its 2 independent schedulers competing for
the same hardware queue.

This change requires moving amdgpu_device_set_mcbp above
amdgpu_device_ip_early_init because we use adev->gfx.mcbp.

Signed-off-by: Pierre-Eric Pelloux-Prayer 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  4 ++--
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  | 21 -
 2 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index d534e192e260..40516d24026c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4054,13 +4054,13 @@ int amdgpu_device_init(struct amdgpu_device *adev,
return r;
}
 
+   amdgpu_device_set_mcbp(adev);
+
/* early init functions */
r = amdgpu_device_ip_early_init(adev);
if (r)
return r;
 
-   amdgpu_device_set_mcbp(adev);
-
/* Get rid of things like offb */
r = drm_aperture_remove_conflicting_pci_framebuffers(adev->pdev, 
&amdgpu_kms_driver);
if (r)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 169d45268ef6..f682f830f7f6 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -2080,7 +2080,7 @@ static int gfx_v9_0_sw_init(void *handle)
ring->doorbell_index = adev->doorbell_index.gfx_ring0 << 1;
 
/* disable scheduler on the real ring */
-   ring->no_scheduler = true;
+   ring->no_scheduler = adev->gfx.mcbp;
ring->vm_hub = AMDGPU_GFXHUB(0);
r = amdgpu_ring_init(adev, ring, 1024, &adev->gfx.eop_irq,
 AMDGPU_CP_IRQ_GFX_ME0_PIPE0_EOP,
@@ -2090,7 +2090,7 @@ static int gfx_v9_0_sw_init(void *handle)
}
 
/* set up the software rings */
-   if (adev->gfx.num_gfx_rings) {
+   if (adev->gfx.mcbp && adev->gfx.num_gfx_rings) {
for (i = 0; i < GFX9_NUM_SW_GFX_RINGS; i++) {
ring = &adev->gfx.sw_gfx_ring[i];
ring->ring_obj = NULL;
@@ -2181,7 +2181,7 @@ static int gfx_v9_0_sw_fini(void *handle)
int i;
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
 
-   if (adev->gfx.num_gfx_rings) {
+   if (adev->gfx.mcbp && adev->gfx.num_gfx_rings) {
for (i = 0; i < GFX9_NUM_SW_GFX_RINGS; i++)
amdgpu_ring_fini(&adev->gfx.sw_gfx_ring[i]);
amdgpu_ring_mux_fini(&adev->gfx.muxer);
@@ -5910,11 +5910,14 @@ static int gfx_v9_0_eop_irq(struct amdgpu_device *adev,
 
switch (me_id) {
case 0:
-   if (adev->gfx.num_gfx_rings &&
-   !amdgpu_mcbp_handle_trailing_fence_irq(&adev->gfx.muxer)) {
-   /* Fence signals are handled on the software rings*/
-   for (i = 0; i < GFX9_NUM_SW_GFX_RINGS; i++)
-   amdgpu_fence_process(&adev->gfx.sw_gfx_ring[i]);
+   if (adev->gfx.num_gfx_rings) {
+   if (!adev->gfx.mcbp) {
+   amdgpu_fence_process(&adev->gfx.gfx_ring[0]);
+   } else if 
(!amdgpu_mcbp_handle_trailing_fence_irq(&adev->gfx.muxer)) {
+   /* Fence signals are handled on the software 
rings*/
+   for (i = 0; i < GFX9_NUM_SW_GFX_RINGS; i++)
+   
amdgpu_fence_process(&adev->gfx.sw_gfx_ring[i]);
+   }
}
break;
case 1:
@@ -7051,7 +7054,7 @@ static void gfx_v9_0_set_ring_funcs(struct amdgpu_device 
*adev)
for (i = 0; i < adev->gfx.num_gfx_rings; i++)
adev->gfx.gfx_ring[i].funcs = &gfx_v9_0_ring_funcs_gfx;
 
-   if (adev->gfx.num_gfx_rings) {
+   if (adev->gfx.mcbp && adev->gfx.num_gfx_rings) {
for (i = 0; i < GFX9_NUM_SW_GFX_RINGS; i++)
adev->gfx.sw_gfx_ring[i].funcs = 
&gfx_v9_0_sw_ring_funcs_gfx;
}
-- 
2.40.1



Re: [PATCH 1/3] drm/amdgpu: Refuse to create a KMS FB for non-P2P exported dma-bufs

2024-02-23 Thread Christian König

On 2024-02-23 09:11, Michel Dänzer wrote:

On 2024-02-23 08:06, Christian König wrote:

On 2024-02-22 18:28, Michel Dänzer wrote:

From: Michel Dänzer 

Pinning the BO storage to VRAM for scanout would make it inaccessible
to non-P2P dma-buf importers.

Thinking more about it I don't think we can do this.

Using the BO in a ping/pong fashion for scanout and DMA-buf is actually valid, 
you just can't do both at the same time.

And if I'm not completely mistaken we actually have use cases for this at the 
moment,

Those use cases don't have P2P & CONFIG_DMABUF_MOVE_NOTIFY?


Nope, we are basically talking about unit tests and examples for inter-device 
operations.


Those render into a shared buffer and then display it to check if the 
content was rendered/transferred correctly.


I'm not sure if we still do those test cases, the last time I looked 
into it was before P2P was even supported, but I also can't rule it out.



(As discussed on the GitLab issue, AFAICT P2P could be made to work even 
without CONFIG_DMABUF_MOVE_NOTIFY, by pinning to VRAM instead of GTT for 
dma-buf sharing)


Longer story but that is something intentionally not done.


only as fallback but it would still break existing userspace and that is a 
no-go.

I'm obviously aware of this general rule. There are exceptions though, and this 
might be one.



So rejecting things during CS and atomic commit is the best thing we can do.

It's problematic for a Wayland compositor:

The CS ioctl failing is awkward. With GL, I'm pretty sure it means the 
compositor would have to destroy the context and create a new one. Not sure 
about Vulkan, but I suspect it's painful there as well.

Similarly for the KMS atomic commit ioctl. The compositor can't know why 
exactly it failed, all it gets is an error code.


Yeah, but that is not because the kernel is doing anything wrong.

Sharing, rendering and then doing an atomic commit is a perfectly valid 
use case.


You just can't do scanout and sharing at the same time.


And there's no other way for the compositor to detect when both things can 
actually work concurrently.


That I totally agree with. And IIRC we already have at least a query for 
the buffer placement. E.g. you can already check if the BO is in GTT or 
VRAM and shared.


What's missing is exposing if the device can scanout from GTT or not.

It's just that blocking a valid use case because a special combination 
doesn't work is not going to fly I think.



Together, this means the compositor always has to assume the worst case, and do 
workarounds such as using the scanout GPU to copy from the scanout buffer to 
another buffer for access from another GPU. Even when direct access from the 
other GPU would actually work fine.


Yeah, I think we can avoid that. I'm just not sure how to do it in a 
driver agnostic way.


Regards,
Christian.


[PATCH] drm/amdgpu/pm: Fix the power1_min_cap value

2024-02-23 Thread Ma Jun
It's unreasonable to use 0 as the power1_min_cap when
OD is disabled. So, use the same lower limit as the value
used when OD is enabled.

Signed-off-by: Ma Jun 
---
 drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c   | 9 -
 drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c | 9 -
 drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c | 9 -
 drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c| 9 -
 drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_7_ppt.c| 9 -
 5 files changed, 20 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c
index 4cd43bbec910..bcad42534da4 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c
@@ -1303,13 +1303,12 @@ static int arcturus_get_power_limit(struct smu_context 
*smu,
if (default_power_limit)
*default_power_limit = power_limit;
 
-   if (smu->od_enabled) {
+   if (smu->od_enabled)
od_percent_upper = 
le32_to_cpu(powerplay_table->overdrive_table.max[SMU_11_0_ODSETTING_POWERPERCENTAGE]);
-   od_percent_lower = 
le32_to_cpu(powerplay_table->overdrive_table.min[SMU_11_0_ODSETTING_POWERPERCENTAGE]);
-   } else {
+   else
od_percent_upper = 0;
-   od_percent_lower = 100;
-   }
+
+   od_percent_lower = 
le32_to_cpu(powerplay_table->overdrive_table.min[SMU_11_0_ODSETTING_POWERPERCENTAGE]);
 
dev_dbg(smu->adev->dev, "od percent upper:%d, od percent lower:%d 
(default power: %d)\n",
od_percent_upper, 
od_percent_lower, power_limit);
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
index 8d1d29ffb0f1..ed189a3878eb 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
@@ -2357,13 +2357,12 @@ static int navi10_get_power_limit(struct smu_context 
*smu,
*default_power_limit = power_limit;
 
if (smu->od_enabled &&
-   navi10_od_feature_is_supported(od_settings, 
SMU_11_0_ODCAP_POWER_LIMIT)) {
+   navi10_od_feature_is_supported(od_settings, 
SMU_11_0_ODCAP_POWER_LIMIT))
od_percent_upper = 
le32_to_cpu(powerplay_table->overdrive_table.max[SMU_11_0_ODSETTING_POWERPERCENTAGE]);
-   od_percent_lower = 
le32_to_cpu(powerplay_table->overdrive_table.min[SMU_11_0_ODSETTING_POWERPERCENTAGE]);
-   } else {
+   else
od_percent_upper = 0;
-   od_percent_lower = 100;
-   }
+
+   od_percent_lower = 
le32_to_cpu(powerplay_table->overdrive_table.min[SMU_11_0_ODSETTING_POWERPERCENTAGE]);
 
dev_dbg(smu->adev->dev, "od percent upper:%d, od percent lower:%d 
(default power: %d)\n",
od_percent_upper, od_percent_lower, 
power_limit);
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
index f2f401f00ed1..a405424dd699 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
@@ -640,13 +640,12 @@ static int sienna_cichlid_get_power_limit(struct 
smu_context *smu,
if (default_power_limit)
*default_power_limit = power_limit;
 
-   if (smu->od_enabled) {
+   if (smu->od_enabled)
od_percent_upper = 
le32_to_cpu(powerplay_table->overdrive_table.max[SMU_11_0_7_ODSETTING_POWERPERCENTAGE]);
-   od_percent_lower = 
le32_to_cpu(powerplay_table->overdrive_table.min[SMU_11_0_7_ODSETTING_POWERPERCENTAGE]);
-   } else {
+   else
od_percent_upper = 0;
-   od_percent_lower = 100;
-   }
+
+   od_percent_lower = 
le32_to_cpu(powerplay_table->overdrive_table.min[SMU_11_0_7_ODSETTING_POWERPERCENTAGE]);
 
dev_dbg(smu->adev->dev, "od percent upper:%d, od percent lower:%d 
(default power: %d)\n",
od_percent_upper, od_percent_lower, 
power_limit);
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c
index 67f44f851f59..9649484f11c0 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c
@@ -2372,13 +2372,12 @@ static int smu_v13_0_0_get_power_limit(struct 
smu_context *smu,
if (default_power_limit)
*default_power_limit = power_limit;
 
-   if (smu->od_enabled) {
+   if (smu->od_enabled)
od_percent_upper = 
le32_to_cpu(powerplay_table->overdrive_table.max[SMU_13_0_0_ODSETTING_POWERPERCENTAGE]);
-   od_percent_lower = 
le32_to_cpu(powerplay_table->overdrive_table.min[SMU_13_0_0

RE: [PATCH] drm/amdgpu: reserve more memory for MES runtime DRAM

2024-02-23 Thread Zhang, Yifan
[AMD Official Use Only - General]

This patch is :

Reviewed-by: Yifan Zhang 

Best Regards,
Yifan

-Original Message-
From: Huang, Tim 
Sent: Friday, February 23, 2024 2:38 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Zhang, Yifan 
; Huang, Tim 
Subject: [PATCH] drm/amdgpu: reserve more memory for MES runtime DRAM

This patch fixes a MES firmware boot failure issue when backdoor loading the 
MES firmware.

The MES firmware runtime DRAM size was changed to 512K, so the driver needs to 
reserve this amount of memory in FB; otherwise, adjacent memory will be 
overwritten by the MES firmware startup code.

Signed-off-by: Tim Huang 
---
 drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
index 26d71a22395d..36127e204dfe 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
@@ -56,6 +56,7 @@ static int mes_v11_0_kiq_hw_init(struct amdgpu_device *adev); 
 static int mes_v11_0_kiq_hw_fini(struct amdgpu_device *adev);

 #define MES_EOP_SIZE   2048
+#define GFX_MES_DRAM_SIZE  0x80000

static void mes_v11_0_ring_set_wptr(struct amdgpu_ring *ring)  {
@@ -475,7 +476,13 @@ static int mes_v11_0_allocate_ucode_data_buffer(struct amdgpu_device *adev,
   le32_to_cpu(mes_hdr->mes_ucode_data_offset_bytes));
fw_size = le32_to_cpu(mes_hdr->mes_ucode_data_size_bytes);

-   r = amdgpu_bo_create_reserved(adev, fw_size,
+   if (fw_size > GFX_MES_DRAM_SIZE) {
+   dev_err(adev->dev, "PIPE%d ucode data fw size (%d) is greater 
than dram size (%d)\n",
+   pipe, fw_size, GFX_MES_DRAM_SIZE);
+   return -EINVAL;
+   }
+
+   r = amdgpu_bo_create_reserved(adev, GFX_MES_DRAM_SIZE,
  64 * 1024,
  AMDGPU_GEM_DOMAIN_VRAM |
  AMDGPU_GEM_DOMAIN_GTT,
@@ -611,8 +618,8 @@ static int mes_v11_0_load_microcode(struct amdgpu_device 
*adev,
WREG32_SOC15(GC, 0, regCP_MES_MDBASE_HI,
 upper_32_bits(adev->mes.data_fw_gpu_addr[pipe]));

-   /* Set 0x3FFFF (256K-1) to CP_MES_MDBOUND_LO */
-   WREG32_SOC15(GC, 0, regCP_MES_MDBOUND_LO, 0x3FFFF);
+   /* Set 0x7FFFF (512K-1) to CP_MES_MDBOUND_LO */
+   WREG32_SOC15(GC, 0, regCP_MES_MDBOUND_LO, 0x7FFFF);

if (prime_icache) {
/* invalidate ICACHE */
--
2.39.2



[PATCH] drm/amd/display: Remove duplicated function signature from dcn3.01 DCCG

2024-02-23 Thread David Tadokoro
In the header file dc/dcn301/dcn301_dccg.h, the function dccg301_create
is declared twice, so remove the duplicate declaration.

Signed-off-by: David Tadokoro 
---
 drivers/gpu/drm/amd/display/dc/dcn301/dcn301_dccg.h | 6 --
 1 file changed, 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_dccg.h 
b/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_dccg.h
index 73db962dbc03..067e49cb238e 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_dccg.h
+++ b/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_dccg.h
@@ -56,10 +56,4 @@ struct dccg *dccg301_create(
const struct dccg_shift *dccg_shift,
const struct dccg_mask *dccg_mask);
 
-struct dccg *dccg301_create(
-   struct dc_context *ctx,
-   const struct dccg_registers *regs,
-   const struct dccg_shift *dccg_shift,
-   const struct dccg_mask *dccg_mask);
-
 #endif //__DCN301_DCCG_H__
-- 
2.39.2



[INFO] mutter is a mess, wayland_xwayland

2024-02-23 Thread __- -__
Hi,

MUTTER_SYNC=0/1 complains, client/server and both?

example:
xorg_xfree_server((mutter_sync[0]) +
steam_linux_client(listening_local_server)) + game(x)
xorg_xfree_server((mutter_sync[1]) +
steam_linux_client(listening_local_server)) + game (x)

wayland_mutter_sync((0) steam_linux_client(listening_local_server))
wayland_mutter_sync((1) steam_linux_client(listening_local_server))

... sorry, not continuous ...

I think it is simpler because:

 cases,
MUTTER_SYNC=client_sync_focus
MUTTER_SYNC=server_sync_focus
MUTTER_SYNC=csc (clients, servers[wayland_xwayland] and clients) <- this is
what I expect most ...

What do you think?


Re: [PATCH 1/3] drm/amdgpu: Refuse to create a KMS FB for non-P2P exported dma-bufs

2024-02-23 Thread Michel Dänzer
On 2024-02-23 09:11, Michel Dänzer wrote:
> On 2024-02-23 08:06, Christian König wrote:
>>
>> So rejecting things during CS and atomic commit is the best thing we can do.
> 
> It's problematic for a Wayland compositor:
> 
> The CS ioctl failing is awkward. With GL, I'm pretty sure it means the 
> compositor would have to destroy the context and create a new one. Not sure 
> about Vulkan, but I suspect it's painful there as well.
> 
> Similarly for the KMS atomic commit ioctl. The compositor can't know why 
> exactly it failed, all it gets is an error code.
> 
> And there's no other way for the compositor to detect when both things can 
> actually work concurrently.
> 
> Together, this means the compositor always has to assume the worst case, and 
> do workarounds such as using the scanout GPU to copy from the scanout buffer 
> to another buffer for access from another GPU. Even when direct access from 
> the other GPU would actually work fine.

It's worse for Xwayland:

It can't know if/when the Wayland compositor uses a buffer for scanout. It 
would have to assume the worst case whenever a buffer is owned by the 
compositor.

Except Xwayland can't know which GPU a buffer is originally from. So it can't 
know when the workaround is actually needed, or which GPU it has to use for the 
workaround.


-- 
Earthling Michel Dänzer|  https://redhat.com
Libre software enthusiast  | Mesa and Xwayland developer



[PATCH 5/5] drm/amdgpu: skip GFX FED error in page fault handling

2024-02-23 Thread Tao Zhou
Let the KFD interrupt handler process it.

v2: return 0 instead of 1 for fed error.
drop the usage of strcmp in interrupt handler.

Signed-off-by: Tao Zhou 
---
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 001e96d89cd7..09364817ae97 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -552,7 +552,7 @@ static int gmc_v9_0_process_interrupt(struct amdgpu_device 
*adev,
 {
bool retry_fault = !!(entry->src_data[1] & 0x80);
bool write_fault = !!(entry->src_data[1] & 0x20);
-   uint32_t status = 0, cid = 0, rw = 0;
+   uint32_t status = 0, cid = 0, rw = 0, fed = 0;
struct amdgpu_task_info task_info;
struct amdgpu_vmhub *hub;
const char *mmhub_cid;
@@ -663,6 +663,14 @@ static int gmc_v9_0_process_interrupt(struct amdgpu_device 
*adev,
status = RREG32(hub->vm_l2_pro_fault_status);
cid = REG_GET_FIELD(status, VM_L2_PROTECTION_FAULT_STATUS, CID);
rw = REG_GET_FIELD(status, VM_L2_PROTECTION_FAULT_STATUS, RW);
+   fed = REG_GET_FIELD(status, VM_L2_PROTECTION_FAULT_STATUS, FED);
+
+   /* for gfx fed error, kfd will handle it, return directly */
+   if (fed && amdgpu_ras_is_poison_mode_supported(adev) &&
+   (amdgpu_ip_version(adev, GC_HWIP, 0) >= IP_VERSION(9, 4, 2)) &&
+   (vmhub < AMDGPU_MMHUB0_START))
+   return 0;
+
WREG32_P(hub->vm_l2_pro_fault_cntl, 1, ~1);
 #ifdef HAVE_STRUCT_XARRAY
amdgpu_vm_update_fault_cache(adev, entry->pasid, addr, status, vmhub);
-- 
2.34.1



[PATCH 4/5] amd/amdkfd: get node id for query_utcl2_poison_status

2024-02-23 Thread Tao Zhou
Obtain it from the ring entry.

v2: replace node id with logical xcc id.

Signed-off-by: Tao Zhou 
---
 drivers/gpu/drm/amd/amdkfd/kfd_int_process_v10.c | 14 --
 drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c  | 14 --
 2 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v10.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v10.c
index 9a06c6fb6605..a8e76287dde0 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v10.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v10.c
@@ -367,10 +367,20 @@ static void event_interrupt_wq_v10(struct kfd_node *dev,
   client_id == SOC15_IH_CLIENTID_UTCL2) {
struct kfd_vm_fault_info info = {0};
uint16_t ring_id = SOC15_RING_ID_FROM_IH_ENTRY(ih_ring_entry);
+   uint32_t node_id = SOC15_NODEID_FROM_IH_ENTRY(ih_ring_entry);
+   uint32_t vmid_type = 
SOC15_VMID_TYPE_FROM_IH_ENTRY(ih_ring_entry);
+   int xcc_id = 0;
struct kfd_hsa_memory_exception_data exception_data;
 
-   if (client_id == SOC15_IH_CLIENTID_UTCL2 &&
-   
amdgpu_amdkfd_ras_query_utcl2_poison_status(dev->adev)) {
+   if (!vmid_type && dev->adev->gfx.funcs->ih_node_to_logical_xcc) 
{
+   xcc_id = 
dev->adev->gfx.funcs->ih_node_to_logical_xcc(dev->adev,
+   node_id);
+   if (xcc_id < 0)
+   xcc_id = 0;
+   }
+
+   if (client_id == SOC15_IH_CLIENTID_UTCL2 && !vmid_type &&
+   amdgpu_amdkfd_ras_query_utcl2_poison_status(dev->adev, 
xcc_id)) {
event_interrupt_poison_consumption(dev, pasid, 
client_id);
return;
}
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
index 91dd5e045b51..ff7392336795 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
@@ -413,10 +413,20 @@ static void event_interrupt_wq_v9(struct kfd_node *dev,
   client_id == SOC15_IH_CLIENTID_UTCL2) {
struct kfd_vm_fault_info info = {0};
uint16_t ring_id = SOC15_RING_ID_FROM_IH_ENTRY(ih_ring_entry);
+   uint32_t node_id = SOC15_NODEID_FROM_IH_ENTRY(ih_ring_entry);
+   uint32_t vmid_type = 
SOC15_VMID_TYPE_FROM_IH_ENTRY(ih_ring_entry);
+   int xcc_id = 0;
struct kfd_hsa_memory_exception_data exception_data;
 
-   if (client_id == SOC15_IH_CLIENTID_UTCL2 &&
-   amdgpu_amdkfd_ras_query_utcl2_poison_status(dev->adev)) {
+   if (!vmid_type && dev->adev->gfx.funcs->ih_node_to_logical_xcc) 
{
+   xcc_id = 
dev->adev->gfx.funcs->ih_node_to_logical_xcc(dev->adev,
+   node_id);
+   if (xcc_id < 0)
+   xcc_id = 0;
+   }
+
+   if (client_id == SOC15_IH_CLIENTID_UTCL2 && !vmid_type &&
+   amdgpu_amdkfd_ras_query_utcl2_poison_status(dev->adev, 
xcc_id)) {
event_interrupt_poison_consumption_v9(dev, pasid, 
client_id);
return;
}
-- 
2.34.1



[PATCH 3/5] drm/amdgpu: retire gfx ras query_utcl2_poison_status

2024-02-23 Thread Tao Zhou
Replace it with related interface in gfxhub functions.

v2: replace node id with xcc id.

Signed-off-by: Tao Zhou 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c |  7 ---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h |  3 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h|  1 -
 drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c| 12 
 4 files changed, 6 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index f759a42def59..817464ee6a01 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -776,10 +776,11 @@ int amdgpu_amdkfd_send_close_event_drain_irq(struct 
amdgpu_device *adev,
return 0;
 }
 
-bool amdgpu_amdkfd_ras_query_utcl2_poison_status(struct amdgpu_device *adev)
+bool amdgpu_amdkfd_ras_query_utcl2_poison_status(struct amdgpu_device *adev,
+   int xcc_id)
 {
-   if (adev->gfx.ras && adev->gfx.ras->query_utcl2_poison_status)
-   return adev->gfx.ras->query_utcl2_poison_status(adev);
+   if (adev->gfxhub.funcs->query_utcl2_poison_status)
+   return adev->gfxhub.funcs->query_utcl2_poison_status(adev, 
xcc_id);
else
return false;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index b2990470d1c6..47d30f67f2b5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -404,7 +404,8 @@ void amdgpu_amdkfd_ras_poison_consumption_handler(struct 
amdgpu_device *adev,
 bool amdgpu_amdkfd_bo_mapped_to_dev(struct amdgpu_device *adev, struct kgd_mem 
*mem);
 void amdgpu_amdkfd_block_mmu_notifications(void *p);
 int amdgpu_amdkfd_criu_resume(void *p);
-bool amdgpu_amdkfd_ras_query_utcl2_poison_status(struct amdgpu_device *adev);
+bool amdgpu_amdkfd_ras_query_utcl2_poison_status(struct amdgpu_device *adev,
+   int xcc_id);
 int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
uint64_t size, u32 alloc_flag, int8_t xcp_id);
 void amdgpu_amdkfd_unreserve_mem_limit(struct amdgpu_device *adev,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
index a36d153b2ff7..9fa580b85417 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
@@ -268,7 +268,6 @@ struct amdgpu_cu_info {
 struct amdgpu_gfx_ras {
struct amdgpu_ras_block_object  ras_block;
void (*enable_watchdog_timer)(struct amdgpu_device *adev);
-   bool (*query_utcl2_poison_status)(struct amdgpu_device *adev);
int (*rlc_gc_fed_irq)(struct amdgpu_device *adev,
struct amdgpu_irq_src *source,
struct amdgpu_iv_entry *entry);
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c
index 0d5b4133fdf7..e3ed568eaacc 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c
@@ -1909,18 +1909,7 @@ static void gfx_v9_4_2_reset_sq_timeout_status(struct 
amdgpu_device *adev)
mutex_unlock(&adev->grbm_idx_mutex);
 }
 
-static bool gfx_v9_4_2_query_uctl2_poison_status(struct amdgpu_device *adev)
-{
-   u32 status = 0;
-   struct amdgpu_vmhub *hub;
-
-   hub = &adev->vmhub[AMDGPU_GFXHUB(0)];
-   status = RREG32(hub->vm_l2_pro_fault_status);
-   /* reset page fault status */
-   WREG32_P(hub->vm_l2_pro_fault_cntl, 1, ~1);
 
-   return REG_GET_FIELD(status, VM_L2_PROTECTION_FAULT_STATUS, FED);
-}
 
 struct amdgpu_ras_block_hw_ops  gfx_v9_4_2_ras_ops = {
.query_ras_error_count = &gfx_v9_4_2_query_ras_error_count,
@@ -1934,5 +1923,4 @@ struct amdgpu_gfx_ras gfx_v9_4_2_ras = {
.hw_ops = &gfx_v9_4_2_ras_ops,
},
.enable_watchdog_timer = &gfx_v9_4_2_enable_watchdog_timer,
-   .query_utcl2_poison_status = gfx_v9_4_2_query_uctl2_poison_status,
 };
-- 
2.34.1



[PATCH 1/5] drm/amdgpu: add new bit definitions for GC 9.0 PROTECTION_FAULT_STATUS

2024-02-23 Thread Tao Zhou
Add UCE and FED bit definitions.

Signed-off-by: Tao Zhou 
---
 drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_sh_mask.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_sh_mask.h 
b/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_sh_mask.h
index efc16ddf274a..2dfa0e5b1aa3 100644
--- a/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_sh_mask.h
+++ b/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_sh_mask.h
@@ -6822,6 +6822,8 @@
 #define VM_L2_PROTECTION_FAULT_STATUS__VMID__SHIFT 
   0x14
 #define VM_L2_PROTECTION_FAULT_STATUS__VF__SHIFT   
   0x18
 #define VM_L2_PROTECTION_FAULT_STATUS__VFID__SHIFT 
   0x19
+#define VM_L2_PROTECTION_FAULT_STATUS__UCE__SHIFT  
   0x1d
+#define VM_L2_PROTECTION_FAULT_STATUS__FED__SHIFT  
   0x1e
 #define VM_L2_PROTECTION_FAULT_STATUS__MORE_FAULTS_MASK
   0x0001L
 #define VM_L2_PROTECTION_FAULT_STATUS__WALKER_ERROR_MASK   
   0x000EL
 #define VM_L2_PROTECTION_FAULT_STATUS__PERMISSION_FAULTS_MASK  
   0x00F0L
@@ -6832,6 +6834,8 @@
 #define VM_L2_PROTECTION_FAULT_STATUS__VMID_MASK   
   0x00F0L
 #define VM_L2_PROTECTION_FAULT_STATUS__VF_MASK 
   0x0100L
 #define VM_L2_PROTECTION_FAULT_STATUS__VFID_MASK   
   0x1E00L
+#define VM_L2_PROTECTION_FAULT_STATUS__UCE_MASK
   0x2000L
+#define VM_L2_PROTECTION_FAULT_STATUS__FED_MASK
   0x4000L
 //VM_L2_PROTECTION_FAULT_ADDR_LO32
 #define VM_L2_PROTECTION_FAULT_ADDR_LO32__LOGICAL_PAGE_ADDR_LO32__SHIFT
   0x0
 #define VM_L2_PROTECTION_FAULT_ADDR_LO32__LOGICAL_PAGE_ADDR_LO32_MASK  
   0xL
-- 
2.34.1



[PATCH 2/5] drm/amdgpu: add utcl2 poison query for gfxhub

2024-02-23 Thread Tao Zhou
Implement it for gfxhub 1.0 and 1.2.

v2: input logical xcc id for poison query interface.

Signed-off-by: Tao Zhou 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfxhub.h |  2 ++
 drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c   | 17 +
 drivers/gpu/drm/amd/amdgpu/gfxhub_v1_2.c   | 15 +++
 3 files changed, 34 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfxhub.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfxhub.h
index c7b44aeb671b..103a837ccc71 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfxhub.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfxhub.h
@@ -38,6 +38,8 @@ struct amdgpu_gfxhub_funcs {
void (*mode2_save_regs)(struct amdgpu_device *adev);
void (*mode2_restore_regs)(struct amdgpu_device *adev);
void (*halt)(struct amdgpu_device *adev);
+   bool (*query_utcl2_poison_status)(struct amdgpu_device *adev,
+   int xcc_id);
 };
 
 struct amdgpu_gfxhub {
diff --git a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
index 22175da0e16a..d200310d1731 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
@@ -443,6 +443,22 @@ static void gfxhub_v1_0_init(struct amdgpu_device *adev)
mmVM_INVALIDATE_ENG0_ADDR_RANGE_LO32;
 }
 
+static bool gfxhub_v1_0_query_utcl2_poison_status(struct amdgpu_device *adev,
+   int xcc_id)
+{
+   u32 status = 0;
+   struct amdgpu_vmhub *hub;
+
+   if (amdgpu_ip_version(adev, GC_HWIP, 0) != IP_VERSION(9, 4, 2))
+   return false;
+
+   hub = &adev->vmhub[AMDGPU_GFXHUB(0)];
+   status = RREG32(hub->vm_l2_pro_fault_status);
+   /* reset page fault status */
+   WREG32_P(hub->vm_l2_pro_fault_cntl, 1, ~1);
+
+   return REG_GET_FIELD(status, VM_L2_PROTECTION_FAULT_STATUS, FED);
+}
 
 const struct amdgpu_gfxhub_funcs gfxhub_v1_0_funcs = {
.get_mc_fb_offset = gfxhub_v1_0_get_mc_fb_offset,
@@ -452,4 +468,5 @@ const struct amdgpu_gfxhub_funcs gfxhub_v1_0_funcs = {
.set_fault_enable_default = gfxhub_v1_0_set_fault_enable_default,
.init = gfxhub_v1_0_init,
.get_xgmi_info = gfxhub_v1_1_get_xgmi_info,
+   .query_utcl2_poison_status = gfxhub_v1_0_query_utcl2_poison_status,
 };
diff --git a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_2.c 
b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_2.c
index 49aecdcee006..77df8c9cbad2 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_2.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_2.c
@@ -620,6 +620,20 @@ static int gfxhub_v1_2_get_xgmi_info(struct amdgpu_device 
*adev)
return 0;
 }
 
+static bool gfxhub_v1_2_query_utcl2_poison_status(struct amdgpu_device *adev,
+   int xcc_id)
+{
+   u32 fed, status;
+
+   status = RREG32_SOC15(GC, GET_INST(GC, xcc_id), 
regVM_L2_PROTECTION_FAULT_STATUS);
+   fed = REG_GET_FIELD(status, VM_L2_PROTECTION_FAULT_STATUS, FED);
+   /* reset page fault status */
+   WREG32_P(SOC15_REG_OFFSET(GC, GET_INST(GC, xcc_id),
+   regVM_L2_PROTECTION_FAULT_STATUS), 1, ~1);
+
+   return fed;
+}
+
 const struct amdgpu_gfxhub_funcs gfxhub_v1_2_funcs = {
.get_mc_fb_offset = gfxhub_v1_2_get_mc_fb_offset,
.setup_vm_pt_regs = gfxhub_v1_2_setup_vm_pt_regs,
@@ -628,6 +642,7 @@ const struct amdgpu_gfxhub_funcs gfxhub_v1_2_funcs = {
.set_fault_enable_default = gfxhub_v1_2_set_fault_enable_default,
.init = gfxhub_v1_2_init,
.get_xgmi_info = gfxhub_v1_2_get_xgmi_info,
+   .query_utcl2_poison_status = gfxhub_v1_2_query_utcl2_poison_status,
 };
 
 static int gfxhub_v1_2_xcp_resume(void *handle, uint32_t inst_mask)
-- 
2.34.1



Re: [PATCH 1/3] drm/amdgpu: Refuse to create a KMS FB for non-P2P exported dma-bufs

2024-02-23 Thread Michel Dänzer
On 2024-02-23 08:06, Christian König wrote:
> On 2024-02-22 18:28, Michel Dänzer wrote:
>> From: Michel Dänzer 
>>
>> Pinning the BO storage to VRAM for scanout would make it inaccessible
>> to non-P2P dma-buf importers.
> 
> Thinking more about it I don't think we can do this.
> 
> Using the BO in a ping/pong fashion for scanout and DMA-buf is actually 
> valid, you just can't do both at the same time.
> 
> And if I'm not completely mistaken we actually have use cases for this at the 
> moment,

Those use cases don't have P2P & CONFIG_DMABUF_MOVE_NOTIFY?

(As discussed on the GitLab issue, AFAICT P2P could be made to work even 
without CONFIG_DMABUF_MOVE_NOTIFY, by pinning to VRAM instead of GTT for 
dma-buf sharing)


> only as fallback but it would still break existing userspace and that is a 
> no-go.

I'm obviously aware of this general rule. There are exceptions though, and this 
might be one.


> So rejecting things during CS and atomic commit is the best thing we can do.

It's problematic for a Wayland compositor:

The CS ioctl failing is awkward. With GL, I'm pretty sure it means the 
compositor would have to destroy the context and create a new one. Not sure 
about Vulkan, but I suspect it's painful there as well.

Similarly for the KMS atomic commit ioctl. The compositor can't know why 
exactly it failed, all it gets is an error code.

And there's no other way for the compositor to detect when both things can 
actually work concurrently.

Together, this means the compositor always has to assume the worst case, and do 
workarounds such as using the scanout GPU to copy from the scanout buffer to 
another buffer for access from another GPU. Even when direct access from the 
other GPU would actually work fine.


-- 
Earthling Michel Dänzer|  https://redhat.com
Libre software enthusiast  | Mesa and Xwayland developer