RE: [PATCH] drm/amdgpu: fix a compiling error in old kernels

2022-09-26 Thread Chen, Guchun
Reviewed-by: Guchun Chen 

Regards,
Guchun

-Original Message-
From: Song, Asher  
Sent: Tuesday, September 27, 2022 11:38 AM
To: Chen, Guchun ; Cui, Flora ; Shi, 
Leslie ; Ma, Jun ; 
amd-gfx@lists.freedesktop.org
Cc: Song, Asher 
Subject: [PATCH] drm/amdgpu: fix a compiling error in old kernels

This patch fixes the following compiling error that occurs in some old
kernels.

error: ‘for’ loop initial declarations are only allowed in C99 mode
  for (int i = 0; i < dc->res_pool->res_cap->num_dsc; i++) {

Signed-off-by: Asher Song 
---
 drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c 
b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c
index 772ad200c5da..c91fcde7a16c 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c
@@ -1405,8 +1405,9 @@ void dcn32_update_dsc_pg(struct dc *dc,
bool safe_to_disable)
 {
struct dce_hwseq *hws = dc->hwseq;
+   int i;
 
-   for (int i = 0; i < dc->res_pool->res_cap->num_dsc; i++) {
+   for (i = 0; i < dc->res_pool->res_cap->num_dsc; i++) {
struct display_stream_compressor *dsc = dc->res_pool->dscs[i];
bool is_dsc_ungated = hws->funcs.dsc_pg_status(hws, dsc->inst);
 
--
2.25.1



[PATCH] drm/amdgpu: fix a compiling error in old kernels

2022-09-26 Thread Asher Song
This patch fixes the following compiling error that occurs in some
old kernels.

error: ‘for’ loop initial declarations are only allowed in C99 mode
  for (int i = 0; i < dc->res_pool->res_cap->num_dsc; i++) {

Signed-off-by: Asher Song 
---
 drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c 
b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c
index 772ad200c5da..c91fcde7a16c 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c
@@ -1405,8 +1405,9 @@ void dcn32_update_dsc_pg(struct dc *dc,
bool safe_to_disable)
 {
struct dce_hwseq *hws = dc->hwseq;
+   int i;
 
-   for (int i = 0; i < dc->res_pool->res_cap->num_dsc; i++) {
+   for (i = 0; i < dc->res_pool->res_cap->num_dsc; i++) {
struct display_stream_compressor *dsc = dc->res_pool->dscs[i];
bool is_dsc_ungated = hws->funcs.dsc_pg_status(hws, dsc->inst);
 
-- 
2.25.1



RE: [PATCH 4/4] drm/amdgpu: MCBP based on DRM scheduler (v6)

2022-09-26 Thread Zhu, Jiadong
[AMD Official Use Only - General]

>I need more time for an in-depth review of this, but from the one-mile-high
>view it looks correct to me now.

>Can we do some pre-commit qa testing with this?

I changed the drm test "Command submission Test (GFX)" to send high-priority IBs
while running Manhattan or Unigine Heaven on screen in the foreground, checking
that mcbp/resubmit is triggered via cat /sys/kernel/debug/dri/0/amdgpu_fence_info.

I have kept this scenario running for two days and one night; no hangs have
happened yet (many hangs were fixed by the previous patches).

I will ask the QA team to do more testing.

Thanks,
Jiadong

-Original Message-
From: Christian König 
Sent: Monday, September 26, 2022 2:49 PM
To: Zhu, Jiadong ; amd-gfx@lists.freedesktop.org
Cc: Tuikov, Luben ; Koenig, Christian 
; Grodzovsky, Andrey 
Subject: Re: [PATCH 4/4] drm/amdgpu: MCBP based on DRM scheduler (v6)

Caution: This message originated from an External Source. Use proper caution 
when opening attachments, clicking links, or responding.


Am 23.09.22 um 15:16 schrieb jiadong@amd.com:
> From: "Jiadong.Zhu" 
>
> Trigger Mid-Command Buffer Preemption according to the priority of the
> software rings and the hw fence signalling condition.
>
> The muxer saves the locations of the indirect buffer frames from the
> software ring together with the fence sequence number in its fifo
> queue, and pops out those records when the fences are signalled. The
> locations are used to resubmit packages in preemption scenarios by copying the
> chunks from the software ring.
>
> v2: Update comment style.
> v3: Fix conflict caused by previous modifications.
> v4: Remove unnecessary prints.
> v5: Fix corner cases for resubmission cases.
> v6: Refactor functions for resubmission, calling fence_process in irq handler.
>
> Cc: Christian Koenig 
> Cc: Luben Tuikov 
> Cc: Andrey Grodzovsky 
> Acked-by: Luben Tuikov 
> Signed-off-by: Jiadong.Zhu 

I need more time for an in-depth review of this, but from the one-mile-high
view it looks correct to me now.

Can we do some pre-commit qa testing with this?

Thanks,
Christian.

> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c   |   2 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c |  13 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |   3 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c | 323 ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h |  30 ++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.c  |  26 ++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c   |   2 +
>   drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c|  10 +-
>   8 files changed, 368 insertions(+), 41 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> index 258cffe3c06a..af86d87e2f3b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> @@ -211,6 +211,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned 
> num_ibs,
>   }
>   }
>
> + amdgpu_ring_ib_begin(ring);
>   if (job && ring->funcs->init_cond_exec)
>   patch_offset = amdgpu_ring_init_cond_exec(ring);
>
> @@ -285,6 +286,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned 
> num_ibs,
>   ring->hw_prio == AMDGPU_GFX_PIPE_PRIO_HIGH)
>   ring->funcs->emit_wave_limit(ring, false);
>
> + amdgpu_ring_ib_end(ring);
>   amdgpu_ring_commit(ring);
>   return 0;
>   }
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> index 13db99d653bd..84b0b3c7d40f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> @@ -33,6 +33,7 @@
>
>   #include 
>   #include "amdgpu.h"
> +#include "amdgpu_sw_ring.h"
>   #include "atom.h"
>
>   /*
> @@ -569,3 +570,15 @@ int amdgpu_ring_init_mqd(struct amdgpu_ring
> *ring)
>
>   return mqd_mgr->init_mqd(adev, ring->mqd_ptr, &prop);
>   }
> +
> +void amdgpu_ring_ib_begin(struct amdgpu_ring *ring)
> +{
> +	if (ring->is_sw_ring)
> +		amdgpu_sw_ring_ib_begin(ring);
> +}
> +
> +void amdgpu_ring_ib_end(struct amdgpu_ring *ring)
> +{
> +	if (ring->is_sw_ring)
> +		amdgpu_sw_ring_ib_end(ring);
> +}
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> index e90d327a589e..6fbc1627dab7 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> @@ -312,6 +312,9 @@ struct amdgpu_ring {
>   #define amdgpu_ring_preempt_ib(r) (r)->funcs->preempt_ib(r)
>
>   int amdgpu_ring_alloc(struct amdgpu_ring *ring, unsigned ndw);
> +void amdgpu_ring_ib_begin(struct amdgpu_ring *ring);
> +void amdgpu_ring_ib_end(struct amdgpu_ring *ring);
> +
>   void amdgpu_ring_insert_nop(struct amdgpu_ring *ring, uint32_t count);
>   void amdgpu_ring_generic_pad_ib(struct amdgpu_ring *ring, struct amdgpu_ib 
> *ib);
>   void amdgpu_ring_commit(struct amdgpu_ring *ring);
> diff --git

RE: [PATCH] drm/amdgpu: add rlc_sr_cntl_list to firmware array

2022-09-26 Thread Gao, Likun
[AMD Official Use Only - General]

Reviewed-by: Likun Gao 

Regards,
Likun

-Original Message-
From: Zhang, Hawking  
Sent: Tuesday, September 27, 2022 10:31 AM
To: amd-gfx@lists.freedesktop.org; Gao, Likun ; Deucher, 
Alexander 
Cc: Zhang, Hawking 
Subject: [PATCH] drm/amdgpu: add rlc_sr_cntl_list to firmware array

To allow uploading the list via PSP.

Signed-off-by: Hawking Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c
index 13675b3aa218..792333206362 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c
@@ -359,6 +359,14 @@ static void amdgpu_gfx_rlc_init_microcode_v2_1(struct 
amdgpu_device *adev)
le32_to_cpu(rlc_hdr->reg_list_format_direct_reg_list_length);
 
if (adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) {
+   if (adev->gfx.rlc.save_restore_list_cntl_size_bytes) {
+   info = &adev->firmware.ucode[AMDGPU_UCODE_ID_RLC_RESTORE_LIST_CNTL];
+   info->ucode_id = AMDGPU_UCODE_ID_RLC_RESTORE_LIST_CNTL;
+   info->fw = adev->gfx.rlc_fw;
+   adev->firmware.fw_size +=
+   ALIGN(adev->gfx.rlc.save_restore_list_cntl_size_bytes, PAGE_SIZE);
+   }
+
if (adev->gfx.rlc.save_restore_list_gpm_size_bytes) {
info = &adev->firmware.ucode[AMDGPU_UCODE_ID_RLC_RESTORE_LIST_GPM_MEM];
info->ucode_id = AMDGPU_UCODE_ID_RLC_RESTORE_LIST_GPM_MEM;
-- 
2.17.1


RE: [PATCH] drm/amdgpu: make rlc fw loading backward compatible with fw

2022-09-26 Thread Gao, Likun
[AMD Official Use Only - General]

Reviewed-by: Likun Gao 

Regards,
Likun

-Original Message-
From: Zhang, Hawking  
Sent: Tuesday, September 27, 2022 10:30 AM
To: amd-gfx@lists.freedesktop.org; Gao, Likun ; Deucher, 
Alexander 
Cc: Zhang, Hawking 
Subject: [PATCH] drm/amdgpu: make rlc fw loading backward compatible with fw

Make the kernel change "drm/amdgpu/gfx10: switch to amdgpu_gfx_rlc_init_microcode"
backward compatible with old RLC firmware.

Signed-off-by: Hawking Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c
index 792333206362..012b72d00e04 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c
@@ -537,9 +537,9 @@ int amdgpu_gfx_rlc_init_microcode(struct amdgpu_device 
*adev,
amdgpu_gfx_rlc_init_microcode_v2_1(adev);
if (version_minor >= 2)
amdgpu_gfx_rlc_init_microcode_v2_2(adev);
-   if (version_minor >= 3)
+   if (version_minor == 3)
amdgpu_gfx_rlc_init_microcode_v2_3(adev);
-   if (version_minor >= 4)
+   if (version_minor == 4)
amdgpu_gfx_rlc_init_microcode_v2_4(adev);
 
return 0;
--
2.17.1


[PATCH] drm/amdgpu: add rlc_sr_cntl_list to firmware array

2022-09-26 Thread Hawking Zhang
To allow uploading the list via PSP.

Signed-off-by: Hawking Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c
index 13675b3aa218..792333206362 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c
@@ -359,6 +359,14 @@ static void amdgpu_gfx_rlc_init_microcode_v2_1(struct 
amdgpu_device *adev)
le32_to_cpu(rlc_hdr->reg_list_format_direct_reg_list_length);
 
if (adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) {
+   if (adev->gfx.rlc.save_restore_list_cntl_size_bytes) {
+   info = &adev->firmware.ucode[AMDGPU_UCODE_ID_RLC_RESTORE_LIST_CNTL];
+   info->ucode_id = AMDGPU_UCODE_ID_RLC_RESTORE_LIST_CNTL;
+   info->fw = adev->gfx.rlc_fw;
+   adev->firmware.fw_size +=
+   ALIGN(adev->gfx.rlc.save_restore_list_cntl_size_bytes, PAGE_SIZE);
+   }
+
if (adev->gfx.rlc.save_restore_list_gpm_size_bytes) {
info = &adev->firmware.ucode[AMDGPU_UCODE_ID_RLC_RESTORE_LIST_GPM_MEM];
info->ucode_id = AMDGPU_UCODE_ID_RLC_RESTORE_LIST_GPM_MEM;
-- 
2.17.1



[PATCH] drm/amdgpu: make rlc fw loading backward compatible with fw

2022-09-26 Thread Hawking Zhang
Make the kernel change "drm/amdgpu/gfx10: switch to amdgpu_gfx_rlc_init_microcode"
backward compatible with old RLC firmware.

Signed-off-by: Hawking Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c
index 792333206362..012b72d00e04 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c
@@ -537,9 +537,9 @@ int amdgpu_gfx_rlc_init_microcode(struct amdgpu_device 
*adev,
amdgpu_gfx_rlc_init_microcode_v2_1(adev);
if (version_minor >= 2)
amdgpu_gfx_rlc_init_microcode_v2_2(adev);
-   if (version_minor >= 3)
+   if (version_minor == 3)
amdgpu_gfx_rlc_init_microcode_v2_3(adev);
-   if (version_minor >= 4)
+   if (version_minor == 4)
amdgpu_gfx_rlc_init_microcode_v2_4(adev);
 
return 0;
-- 
2.17.1



RE: [PATCH v3 2/5] drm/amdgpu: add new functions to set GPU power profile

2022-09-26 Thread Quan, Evan
[AMD Official Use Only - General]



> -Original Message-
> From: amd-gfx  On Behalf Of
> Shashank Sharma
> Sent: Tuesday, September 27, 2022 5:40 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Somalapuram,
> Amaranath ; Koenig, Christian
> ; Sharma, Shashank
> 
> Subject: [PATCH v3 2/5] drm/amdgpu: add new functions to set GPU power
> profile
> 
> This patch adds new functions which will allow a user to
> change the GPU power profile based on a GPU workload hint
> flag.
> 
> Cc: Alex Deucher 
> Signed-off-by: Shashank Sharma 
> ---
>  drivers/gpu/drm/amd/amdgpu/Makefile   |  2 +-
>  .../gpu/drm/amd/amdgpu/amdgpu_ctx_workload.c  | 97
> +++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c|  1 +
>  .../gpu/drm/amd/include/amdgpu_ctx_workload.h | 54 +++
>  drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h   |  5 +
>  5 files changed, 158 insertions(+), 1 deletion(-)
>  create mode 100644
> drivers/gpu/drm/amd/amdgpu/amdgpu_ctx_workload.c
>  create mode 100644
> drivers/gpu/drm/amd/include/amdgpu_ctx_workload.h
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile
> b/drivers/gpu/drm/amd/amdgpu/Makefile
> index 5a283d12f8e1..34679c657ecc 100644
> --- a/drivers/gpu/drm/amd/amdgpu/Makefile
> +++ b/drivers/gpu/drm/amd/amdgpu/Makefile
> @@ -50,7 +50,7 @@ amdgpu-y += amdgpu_device.o amdgpu_kms.o \
>   atombios_dp.o amdgpu_afmt.o amdgpu_trace_points.o \
>   atombios_encoders.o amdgpu_sa.o atombios_i2c.o \
>   amdgpu_dma_buf.o amdgpu_vm.o amdgpu_vm_pt.o amdgpu_ib.o
> amdgpu_pll.o \
> - amdgpu_ucode.o amdgpu_bo_list.o amdgpu_ctx.o amdgpu_sync.o \
> + amdgpu_ucode.o amdgpu_bo_list.o amdgpu_ctx.o
> amdgpu_ctx_workload.o amdgpu_sync.o \
>   amdgpu_gtt_mgr.o amdgpu_preempt_mgr.o amdgpu_vram_mgr.o
> amdgpu_virt.o \
>   amdgpu_atomfirmware.o amdgpu_vf_error.o amdgpu_sched.o \
>   amdgpu_debugfs.o amdgpu_ids.o amdgpu_gmc.o \
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx_workload.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx_workload.c
> new file mode 100644
> index ..a11cf29bc388
> --- /dev/null
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx_workload.c
> @@ -0,0 +1,97 @@
> +/*
> + * Copyright 2022 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the
> "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO
> EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM,
> DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
> OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR
> THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + *
> + */
> +#include 
> +#include "kgd_pp_interface.h"
> +#include "amdgpu_ctx_workload.h"
> +
> +static enum PP_SMC_POWER_PROFILE
> +amdgpu_workload_to_power_profile(uint32_t hint)
> +{
> + switch (hint) {
> + case AMDGPU_CTX_WORKLOAD_HINT_NONE:
> + default:
> + return PP_SMC_POWER_PROFILE_BOOTUP_DEFAULT;
> +
> + case AMDGPU_CTX_WORKLOAD_HINT_3D:
> + return PP_SMC_POWER_PROFILE_FULLSCREEN3D;
> + case AMDGPU_CTX_WORKLOAD_HINT_VIDEO:
> + return PP_SMC_POWER_PROFILE_VIDEO;
> + case AMDGPU_CTX_WORKLOAD_HINT_VR:
> + return PP_SMC_POWER_PROFILE_VR;
> + case AMDGPU_CTX_WORKLOAD_HINT_COMPUTE:
> + return PP_SMC_POWER_PROFILE_COMPUTE;
> + }
> +}
> +
> +int amdgpu_set_workload_profile(struct amdgpu_device *adev,
> + uint32_t hint)
> +{
> + int ret = 0;
> + enum PP_SMC_POWER_PROFILE profile =
> + amdgpu_workload_to_power_profile(hint);
> +
> + if (adev->pm.workload_mode == hint)
> + return 0;
> +
> + mutex_lock(&adev->pm.smu_workload_lock);
> +
> + if (adev->pm.workload_mode == hint)
> + goto unlock;
[Quan, Evan] This seems redundant with the code above. I saw you dropped this in
patch 4, but I kind of feel this is the one that should be kept.
> +
> + ret = amdgpu_dpm_switch_power_profile(adev, profile, 1);
> + if (!ret)
> + adev->pm.workload_mode = hint;
> + atomic_inc(&adev->pm.workload_switch_ref);
> +
> +unlock:
> + mutex_unlock(&adev->pm.smu_workload_lock);
> + return ret;
> +}
> +

Re: [PATCH 6/7] nouveau/dmem: Evict device private memory during release

2022-09-26 Thread Felix Kuehling



On 2022-09-26 17:35, Lyude Paul wrote:

On Mon, 2022-09-26 at 16:03 +1000, Alistair Popple wrote:

When the module is unloaded or a GPU is unbound from the module it is
possible for device private pages to be left mapped in currently running
processes. This leads to a kernel crash when the pages are either freed
or accessed from the CPU because the GPU and associated data structures
and callbacks have all been freed.

Fix this by migrating any mappings back to normal CPU memory prior to
freeing the GPU memory chunks and associated device private pages.

Signed-off-by: Alistair Popple 

---

I assume the AMD driver might have a similar issue. However I can't see
where device private (or coherent) pages actually get unmapped/freed
during teardown as I couldn't find any relevant calls to
devm_memunmap(), memunmap(), devm_release_mem_region() or
release_mem_region(). So it appears that ZONE_DEVICE pages are not being
properly freed during module unload, unless I'm missing something?

I've got no idea, will poke Ben to see if they know the answer to this


I guess we're relying on devm to release the region. Isn't the whole 
point of using devm_request_free_mem_region that we don't have to 
remember to explicitly release it when the device gets destroyed? I 
believe we had an explicit free call at some point by mistake, and that 
caused a double-free during module unload. See this commit for reference:


commit 22f4f4faf337d5fb2d2750aff13215726814273e
Author: Philip Yang 
Date:   Mon Sep 20 17:25:52 2021 -0400

drm/amdkfd: fix svm_migrate_fini warning

Device manager releases device-specific resources when a driver
disconnects from a device, devm_memunmap_pages and
devm_release_mem_region calls in svm_migrate_fini are redundant.

It causes below warning trace after patch "drm/amdgpu: Split
amdgpu_device_fini into early and late", so remove function
svm_migrate_fini.

BUG: https://gitlab.freedesktop.org/drm/amd/-/issues/1718

WARNING: CPU: 1 PID: 3646 at drivers/base/devres.c:795
devm_release_action+0x51/0x60
Call Trace:
? memunmap_pages+0x360/0x360
svm_migrate_fini+0x2d/0x60 [amdgpu]
kgd2kfd_device_exit+0x23/0xa0 [amdgpu]
amdgpu_amdkfd_device_fini_sw+0x1d/0x30 [amdgpu]
amdgpu_device_fini_sw+0x45/0x290 [amdgpu]
amdgpu_driver_release_kms+0x12/0x30 [amdgpu]
drm_dev_release+0x20/0x40 [drm]
release_nodes+0x196/0x1e0
device_release_driver_internal+0x104/0x1d0
driver_detach+0x47/0x90
bus_remove_driver+0x7a/0xd0
pci_unregister_driver+0x3d/0x90
amdgpu_exit+0x11/0x20 [amdgpu]

Signed-off-by: Philip Yang 

Reviewed-by: Felix Kuehling 
Signed-off-by: Alex Deucher 

Furthermore, I guess we are assuming that nobody is using the GPU when 
the module is unloaded. As long as any processes have /dev/kfd open, you 
won't be able to unload the module (except by force-unload). I suppose 
with ZONE_DEVICE memory, we can have references to device memory pages 
even when user mode has closed /dev/kfd. We do have a cleanup handler 
that runs in an MMU-free-notifier. In theory that should run after all 
the pages in the mm_struct have been freed. It releases all sorts of 
other device resources and needs the driver to still be there. I'm not 
sure if there is anything preventing a module unload before the 
free-notifier runs. I'll look into that.


Regards,
  Felix





---
  drivers/gpu/drm/nouveau/nouveau_dmem.c | 48 +++-
  1 file changed, 48 insertions(+)

diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c 
b/drivers/gpu/drm/nouveau/nouveau_dmem.c
index 66ebbd4..3b247b8 100644
--- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
@@ -369,6 +369,52 @@ nouveau_dmem_suspend(struct nouveau_drm *drm)
	mutex_unlock(&drm->dmem->mutex);
  }
  
+/*

+ * Evict all pages mapping a chunk.
+ */
+void
+nouveau_dmem_evict_chunk(struct nouveau_dmem_chunk *chunk)
+{
+   unsigned long i, npages = range_len(&chunk->pagemap.range) >> PAGE_SHIFT;
+   unsigned long *src_pfns, *dst_pfns;
+   dma_addr_t *dma_addrs;
+   struct nouveau_fence *fence;
+
+   src_pfns = kcalloc(npages, sizeof(*src_pfns), GFP_KERNEL);
+   dst_pfns = kcalloc(npages, sizeof(*dst_pfns), GFP_KERNEL);
+   dma_addrs = kcalloc(npages, sizeof(*dma_addrs), GFP_KERNEL);
+
+   migrate_device_range(src_pfns, chunk->pagemap.range.start >> PAGE_SHIFT,
+   npages);
+
+   for (i = 0; i < npages; i++) {
+   if (src_pfns[i] & MIGRATE_PFN_MIGRATE) {
+   struct page *dpage;
+
+   /*
+* __GFP_NOFAIL because the GPU is going away and there
+* is nothing sensible we can do if we can't copy the
+* data back.
+*/

You'll have to excuse me for a moment since this area 

[PATCH v2] drm/amdkfd: Fix UBSAN shift-out-of-bounds warning

2022-09-26 Thread Felix Kuehling
This was fixed in initialize_cpsch before, but not in initialize_nocpsch.
Factor sdma bitmap initialization into a helper function to apply the
correct implementation in both cases without duplicating it.

v2: Added a range check

Reported-by: Ellis Michael 
Signed-off-by: Felix Kuehling 
---
 .../drm/amd/amdkfd/kfd_device_queue_manager.c | 45 +--
 1 file changed, 21 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 007a3db69df1..ecb4c3abc629 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -1242,6 +1242,24 @@ static void init_interrupts(struct device_queue_manager 
*dqm)
dqm->dev->kfd2kgd->init_interrupts(dqm->dev->adev, i);
 }
 
+static void init_sdma_bitmaps(struct device_queue_manager *dqm)
+{
+   unsigned int num_sdma_queues =
+   min_t(unsigned int, sizeof(dqm->sdma_bitmap)*8,
+ get_num_sdma_queues(dqm));
+   unsigned int num_xgmi_sdma_queues =
+   min_t(unsigned int, sizeof(dqm->xgmi_sdma_bitmap)*8,
+ get_num_xgmi_sdma_queues(dqm));
+
+   if (num_sdma_queues)
+   dqm->sdma_bitmap = GENMASK_ULL(num_sdma_queues-1, 0);
+   if (num_xgmi_sdma_queues)
+   dqm->xgmi_sdma_bitmap = GENMASK_ULL(num_xgmi_sdma_queues-1, 0);
+
+   dqm->sdma_bitmap &= ~get_reserved_sdma_queues_bitmap(dqm);
+   pr_info("sdma_bitmap: %llx\n", dqm->sdma_bitmap);
+}
+
 static int initialize_nocpsch(struct device_queue_manager *dqm)
 {
int pipe, queue;
@@ -1270,11 +1288,7 @@ static int initialize_nocpsch(struct 
device_queue_manager *dqm)
 
memset(dqm->vmid_pasid, 0, sizeof(dqm->vmid_pasid));
 
-   dqm->sdma_bitmap = ~0ULL >> (64 - get_num_sdma_queues(dqm));
-   dqm->sdma_bitmap &= ~(get_reserved_sdma_queues_bitmap(dqm));
-   pr_info("sdma_bitmap: %llx\n", dqm->sdma_bitmap);
-
-   dqm->xgmi_sdma_bitmap = ~0ULL >> (64 - get_num_xgmi_sdma_queues(dqm));
+   init_sdma_bitmaps(dqm);
 
return 0;
 }
@@ -1452,9 +1466,6 @@ static int set_sched_resources(struct 
device_queue_manager *dqm)
 
 static int initialize_cpsch(struct device_queue_manager *dqm)
 {
-   uint64_t num_sdma_queues;
-   uint64_t num_xgmi_sdma_queues;
-
pr_debug("num of pipes: %d\n", get_pipes_per_mec(dqm));
 
	mutex_init(&dqm->lock_hidden);
@@ -1463,24 +1474,10 @@ static int initialize_cpsch(struct device_queue_manager 
*dqm)
dqm->active_cp_queue_count = 0;
dqm->gws_queue_count = 0;
dqm->active_runlist = false;
-
-   num_sdma_queues = get_num_sdma_queues(dqm);
-   if (num_sdma_queues >= BITS_PER_TYPE(dqm->sdma_bitmap))
-   dqm->sdma_bitmap = ULLONG_MAX;
-   else
-   dqm->sdma_bitmap = (BIT_ULL(num_sdma_queues) - 1);
-
-   dqm->sdma_bitmap &= ~(get_reserved_sdma_queues_bitmap(dqm));
-   pr_info("sdma_bitmap: %llx\n", dqm->sdma_bitmap);
-
-   num_xgmi_sdma_queues = get_num_xgmi_sdma_queues(dqm);
-   if (num_xgmi_sdma_queues >= BITS_PER_TYPE(dqm->xgmi_sdma_bitmap))
-   dqm->xgmi_sdma_bitmap = ULLONG_MAX;
-   else
-   dqm->xgmi_sdma_bitmap = (BIT_ULL(num_xgmi_sdma_queues) - 1);
-
	INIT_WORK(&dqm->hw_exception_work, kfd_process_hw_exception);
 
+   init_sdma_bitmaps(dqm);
+
return 0;
 }
 
-- 
2.32.0



Re: [REGRESSION] Graphical issues on Lenovo Yoga 7 14ARB7 laptop since v6.0-rc1 (bisected)

2022-09-26 Thread Leo Li



Hi August, thanks for the log.

It seems the eDP panel does not fully satisfy the amdgpu requirements
for enabling PSR SU, but we're enabling it anyway.

I suspect it may be due to the "DP_FORCE_PSRSU_CAPABILITY" bit being set
in its DPCD registers; I'm checking with some devs to see if that is
expected.

In the meantime, can you give these two hacks a spin? Let me know if
this helps with the glitches and system hangs:
https://gitlab.freedesktop.org/-/snippets/7076

Also the dmesg, in particular this line:

[drm] PSR support 1, DC PSR ver 1, sink PSR ver 3 DPCD caps 0x70 su_y_granularity 4 force_psrsu_cap **X**


Thanks,
Leo

On 2022-09-23 16:26, August Wikerfors wrote:

Hi Leo,

On 2022-09-23 20:41, Leo Li wrote:

Hi August,

Can you provide a dmesg log with drm.debug=0x16 enabled in kernel 
cmdline?
Log is available here:
https://files.augustwikerfors.se/dmesg.2022-09-23.txt


This is what I did during that log:
1. Boot the system
2. Type into the password field in SDDM (this is when the problem occurs)
3. Switch to a TTY to save the log (the problem doesn't happen there)

Regards,
August Wikerfors


Re: [PATCH v3 1/1] drm/amdkfd: Track unified memory when switching xnack mode

2022-09-26 Thread Felix Kuehling



On 2022-09-26 15:40, Philip Yang wrote:

Unified memory usage with xnack off is tracked to avoid oversubscribing
system memory. When switching xnack mode from off to on, subsequent
frees of ranges allocated with xnack off will not unreserve memory when
xnack is on, causing unbalanced memory accounting.


Here you describe half the problem.




When switching xnack mode from on to off, we need to reserve memory for
already allocated svm ranges, because subsequent frees will unreserve
memory with xnack off.


But here you describe the _other_ half of the solution. That's a bit 
confusing. Maybe state the other half of the problem first and then 
explain the solution that fixes both halves of the problem.





Signed-off-by: Philip Yang 
---
  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 30 
  drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 20 +++-
  drivers/gpu/drm/amd/amdkfd/kfd_svm.h |  2 ++
  3 files changed, 46 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 56f7307c21d2..938095478707 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1594,16 +1594,36 @@ static int kfd_ioctl_set_xnack_mode(struct file *filep,
if (args->xnack_enabled >= 0) {
if (!list_empty(>pqm.queues)) {
pr_debug("Process has user queues running\n");
-   mutex_unlock(&p->mutex);
-   return -EBUSY;
+   r = -EBUSY;
+   goto out_unlock;
}
-   if (args->xnack_enabled && !kfd_process_xnack_mode(p, true))
+
+   if (p->xnack_enabled == args->xnack_enabled)
+   goto out_unlock;
+
+   if (args->xnack_enabled && !kfd_process_xnack_mode(p, true)) {
r = -EPERM;
-   else
-   p->xnack_enabled = args->xnack_enabled;
+   goto out_unlock;
+   }
+
+   pr_debug("switching xnack from %d to %d\n", p->xnack_enabled,
+args->xnack_enabled);
+
+   mutex_lock(&p->svms.lock);


It would be cleaner to do the locking inside svm_range_list_unreserve_mem.



+
+   /* Switching to XNACK on/off, unreserve/reserve memory of all
+* svm ranges. Change xnack must be inside svms lock, to avoid
+* race with svm_range_deferred_list_work unreserve memory.
+*/
+   p->xnack_enabled = args->xnack_enabled;
+   svm_range_list_unreserve_mem(p, p->xnack_enabled);
+
+   mutex_unlock(&p->svms.lock);
} else {
args->xnack_enabled = p->xnack_enabled;
}
+
+out_unlock:
	mutex_unlock(&p->mutex);
  
  	return r;

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index cf5b4005534c..5a82d5660470 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -278,7 +278,7 @@ static void svm_range_free(struct svm_range *prange, bool 
update_mem_usage)
svm_range_free_dma_mappings(prange);
  
  	if (update_mem_usage && !p->xnack_enabled) {

-   pr_debug("unreserve mem limit: %lld\n", size);
+   pr_debug("unreserve prange 0x%p size: 0x%llx\n", prange, size);
amdgpu_amdkfd_unreserve_mem_limit(NULL, size,
KFD_IOC_ALLOC_MEM_FLAGS_USERPTR);
}
@@ -2956,6 +2956,24 @@ svm_range_restore_pages(struct amdgpu_device *adev, 
unsigned int pasid,
return r;
  }
  
+void svm_range_list_unreserve_mem(struct kfd_process *p, bool unreserve)

+{
+   struct svm_range *prange;
+   uint64_t size;
+
+   list_for_each_entry(prange, &p->svms.list, list) {
+   size = (prange->last - prange->start + 1) << PAGE_SHIFT;
+   pr_debug("svms 0x%p %s prange 0x%p size 0x%llx\n", &p->svms,
+unreserve ? "unreserve" : "reserve", prange, size);
+   if (unreserve)
+   amdgpu_amdkfd_unreserve_mem_limit(NULL, size,
+   KFD_IOC_ALLOC_MEM_FLAGS_USERPTR);
+   else
+   amdgpu_amdkfd_reserve_mem_limit(NULL, size,
+   KFD_IOC_ALLOC_MEM_FLAGS_USERPTR);


You are assuming that this will succeed. If it fails, you will end up 
with unbalanced accounting. You'll need to detect failures and roll back 
any reservations when a failure happens. Then fail the entire XNACK mode 
change.




+   }
+}
+
  void svm_range_list_fini(struct kfd_process *p)
  {
struct svm_range *prange;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.h
index 012c53729516..05a2135cd56e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.h
+++ 

Re: [PATCH] drm/amdgpu: Add amdgpu suspend-resume code path under SRIOV

2022-09-26 Thread Alex Deucher
On Mon, Sep 26, 2022 at 4:21 PM Bokun Zhang  wrote:
>
> - Under SRIOV, we need to send REQ_GPU_FINI to the hypervisor
>   during the suspend time. Furthermore, we cannot request a
>   mode 1 reset under SRIOV as VF. Therefore, we will skip it
>   as it is called in suspend_noirq() function.
>
> - In the resume code path, we need to send REQ_GPU_INIT to the
>   hypervisor and also resume PSP IP block under SRIOV.
>
> Signed-off-by: Bokun Zhang 

Reviewed-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c   |  4 
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 27 +-
>  2 files changed, 30 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> index c5fad52c649d..a5aee19ca30e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> @@ -1057,6 +1057,10 @@ bool amdgpu_acpi_should_gpu_reset(struct amdgpu_device 
> *adev)
>  {
> if (adev->flags & AMD_IS_APU)
> return false;
> +
> +   if (amdgpu_sriov_vf(adev))
> +   return false;
> +
>  #ifdef HAVE_PM_SUSPEND_TARGET_STATE
> return pm_suspend_target_state != PM_SUSPEND_TO_IDLE;
>  #else
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index fb09dc32b4c0..c5c94ebd3d57 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -3176,7 +3176,8 @@ static int amdgpu_device_ip_resume_phase1(struct 
> amdgpu_device *adev)
> continue;
> if (adev->ip_blocks[i].version->type == 
> AMD_IP_BLOCK_TYPE_COMMON ||
> adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_GMC 
> ||
> -   adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_IH) 
> {
> +   adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_IH 
> ||
> +   (adev->ip_blocks[i].version->type == 
> AMD_IP_BLOCK_TYPE_PSP && amdgpu_sriov_vf(adev))) {
>
> r = adev->ip_blocks[i].version->funcs->resume(adev);
> if (r) {
> @@ -4120,12 +4121,20 @@ static void amdgpu_device_evict_resources(struct 
> amdgpu_device *adev)
>  int amdgpu_device_suspend(struct drm_device *dev, bool fbcon)
>  {
> struct amdgpu_device *adev = drm_to_adev(dev);
> +   int r = 0;
>
> if (dev->switch_power_state == DRM_SWITCH_POWER_OFF)
> return 0;
>
> adev->in_suspend = true;
>
> +   if (amdgpu_sriov_vf(adev)) {
> +   amdgpu_virt_fini_data_exchange(adev);
> +   r = amdgpu_virt_request_full_gpu(adev, false);
> +   if (r)
> +   return r;
> +   }
> +
> if (amdgpu_acpi_smart_shift_update(dev, AMDGPU_SS_DEV_D3))
> DRM_WARN("smart shift update failed\n");
>
> @@ -4153,6 +4162,9 @@ int amdgpu_device_suspend(struct drm_device *dev, bool 
> fbcon)
>
> amdgpu_device_ip_suspend_phase2(adev);
>
> +   if (amdgpu_sriov_vf(adev))
> +   amdgpu_virt_release_full_gpu(adev, false);
> +
> return 0;
>  }
>
> @@ -4171,6 +4183,12 @@ int amdgpu_device_resume(struct drm_device *dev, bool 
> fbcon)
> struct amdgpu_device *adev = drm_to_adev(dev);
> int r = 0;
>
> +   if (amdgpu_sriov_vf(adev)) {
> +   r = amdgpu_virt_request_full_gpu(adev, true);
> +   if (r)
> +   return r;
> +   }
> +
> if (dev->switch_power_state == DRM_SWITCH_POWER_OFF)
> return 0;
>
> @@ -4185,6 +4203,13 @@ int amdgpu_device_resume(struct drm_device *dev, bool 
> fbcon)
> }
>
> r = amdgpu_device_ip_resume(adev);
> +
> +   /* no matter what r is, always need to properly release full GPU */
> +   if (amdgpu_sriov_vf(adev)) {
> +   amdgpu_virt_init_data_exchange(adev);
> +   amdgpu_virt_release_full_gpu(adev, true);
> +   }
> +
> if (r) {
> dev_err(adev->dev, "amdgpu_device_ip_resume failed (%d).\n", 
> r);
> return r;
> --
> 2.25.1
>


[PATCH v3 5/5] drm/amdgpu: switch workload context to/from compute

2022-09-26 Thread Shashank Sharma
This patch switches the GPU workload mode to/from
compute mode, while submitting compute workload.

Signed-off-by: Alex Deucher 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index 5e53a5293935..1caed319a448 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -34,6 +34,7 @@
 #include "amdgpu_ras.h"
 #include "amdgpu_umc.h"
 #include "amdgpu_reset.h"
+#include "amdgpu_ctx_workload.h"
 
 /* Total memory size in system memory and all GPU VRAM. Used to
  * estimate worst case amount of memory to reserve for page tables
@@ -703,9 +704,16 @@ int amdgpu_amdkfd_submit_ib(struct amdgpu_device *adev,
 
 void amdgpu_amdkfd_set_compute_idle(struct amdgpu_device *adev, bool idle)
 {
-   amdgpu_dpm_switch_power_profile(adev,
-   PP_SMC_POWER_PROFILE_COMPUTE,
-   !idle);
+   int ret;
+
+   if (idle)
+   ret = amdgpu_clear_workload_profile(adev, AMDGPU_CTX_WORKLOAD_HINT_COMPUTE);
+   else
+   ret = amdgpu_set_workload_profile(adev, AMDGPU_CTX_WORKLOAD_HINT_COMPUTE);
+
+   if (ret)
+   drm_warn(&adev->ddev, "Failed to %s power profile to compute mode\n",
+idle ? "reset" : "set");
 }
 
 bool amdgpu_amdkfd_is_kfd_vmid(struct amdgpu_device *adev, u32 vmid)
-- 
2.34.1



[PATCH v3 4/5] drm/amdgpu: switch GPU workload profile

2022-09-26 Thread Shashank Sharma
This patch switches the GPU workload profile based on the
workload hint information saved in the workload context.
The workload profile is reset to NONE when the job is done.

Signed-off-by: Alex Deucher 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   |  2 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx_workload.c |  4 
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c  | 15 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.h  |  3 +++
 4 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index b7bae833c804..de906a42144f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -237,6 +237,8 @@ static int amdgpu_cs_parser_init(struct amdgpu_cs_parser 
*p, union drm_amdgpu_cs
goto free_all_kdata;
}
 
+   p->job->workload_mode = p->ctx->workload_mode;
+
if (p->uf_entry.tv.bo)
p->job->uf_addr = uf_offset;
kvfree(chunk_array);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx_workload.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx_workload.c
index a11cf29bc388..625114804121 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx_workload.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx_workload.c
@@ -55,15 +55,11 @@ int amdgpu_set_workload_profile(struct amdgpu_device *adev,
 
	mutex_lock(&adev->pm.smu_workload_lock);
 
-   if (adev->pm.workload_mode == hint)
-   goto unlock;
-
	ret = amdgpu_dpm_switch_power_profile(adev, profile, 1);
	if (!ret)
	adev->pm.workload_mode = hint;
	atomic_inc(&adev->pm.workload_switch_ref);
 
-unlock:
	mutex_unlock(&adev->pm.smu_workload_lock);
return ret;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index c2fd6f3076a6..9300e86ee7c5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -30,6 +30,7 @@
 #include "amdgpu.h"
 #include "amdgpu_trace.h"
 #include "amdgpu_reset.h"
+#include "amdgpu_ctx_workload.h"
 
 static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job)
 {
@@ -144,6 +145,14 @@ void amdgpu_job_free_resources(struct amdgpu_job *job)
 static void amdgpu_job_free_cb(struct drm_sched_job *s_job)
 {
struct amdgpu_job *job = to_amdgpu_job(s_job);
+   struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched);
+
+   if (job->workload_mode != AMDGPU_CTX_WORKLOAD_HINT_NONE) {
+   if (amdgpu_clear_workload_profile(ring->adev, job->workload_mode))
+   DRM_WARN("Failed to come out of workload profile %s\n",
+amdgpu_workload_profile_name(job->workload_mode));
+   job->workload_mode = AMDGPU_CTX_WORKLOAD_HINT_NONE;
+   }
 
drm_sched_job_cleanup(s_job);
 
@@ -256,6 +265,12 @@ static struct dma_fence *amdgpu_job_run(struct 
drm_sched_job *sched_job)
DRM_ERROR("Error scheduling IBs (%d)\n", r);
}
 
+   if (job->workload_mode != AMDGPU_CTX_WORKLOAD_HINT_NONE) {
+   if (amdgpu_set_workload_profile(ring->adev, job->workload_mode))
+   DRM_WARN("Failed to set workload profile to %s\n",
+amdgpu_workload_profile_name(job->workload_mode));
+   }
+
job->job_run_counter++;
amdgpu_job_free_resources(job);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
index babc0af751c2..573e8692c814 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
@@ -68,6 +68,9 @@ struct amdgpu_job {
/* job_run_counter >= 1 means a resubmit job */
uint32_tjob_run_counter;
 
+   /* workload mode hint for pm */
+   uint32_tworkload_mode;
+
uint32_tnum_ibs;
struct amdgpu_ibibs[];
 };
-- 
2.34.1



[PATCH v3 3/5] drm/amdgpu: set GPU workload via ctx IOCTL

2022-09-26 Thread Shashank Sharma
This patch adds new IOCTL flags to the amdgpu_ctx IOCTL to set the
GPU workload profile. These calls allow a user to switch to a
GPU power profile which might be better suited to its
workload type. The currently supported workload types are:
"None": Default workload profile
"3D": Workload profile for 3D rendering work
"Video": Workload profile for Media/Encode/Decode work
"VR": Workload profile for VR rendering work
"Compute": Workload profile for Compute work

The workload hint flag is saved in the GPU context, and then
applied when we actually run the job.

V3: Create only set_workload interface, there is no need for
get_workload (Christian)
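As a rough illustration, a userspace caller would pack the hint into the ioctl flags as below. The numeric values are taken from the UAPI patch in this series (patch 1/5), and the struct is only a minimal stand-in for drm_amdgpu_ctx_in; the actual DRM_IOCTL_AMDGPU_CTX call needs an open render node and is omitted here:

```c
#include <assert.h>
#include <stdint.h>

/* Values reproduced from patch 1/5 of this series for a
 * self-contained illustration. */
#define AMDGPU_CTX_OP_SET_WORKLOAD_PROFILE 7
#define AMDGPU_CTX_WORKLOAD_HINT_SHIFT     8
#define AMDGPU_CTX_WORKLOAD_HINT_MASK      (0xff << AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
#define AMDGPU_CTX_WORKLOAD_HINT_COMPUTE   (4 << AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
#define AMDGPU_CTX_WORKLOAD_INDEX(n)       (n >> AMDGPU_CTX_WORKLOAD_HINT_SHIFT)

/* Minimal stand-in for the drm_amdgpu_ctx_in input block. */
struct ctx_in {
	uint32_t op;
	uint32_t flags;
	uint32_t ctx_id;
	int32_t  priority;
};

/* Fill the argument the way a user would before issuing the
 * DRM_IOCTL_AMDGPU_CTX ioctl on an existing context. */
static struct ctx_in make_set_workload_args(uint32_t ctx_id, uint32_t hint)
{
	struct ctx_in in = {0};

	in.op = AMDGPU_CTX_OP_SET_WORKLOAD_PROFILE;
	in.ctx_id = ctx_id;
	in.flags = hint & AMDGPU_CTX_WORKLOAD_HINT_MASK;
	return in;
}
```

The kernel side masks and range-checks the flags exactly as in the amdgpu_ctx_ioctl() hunk below.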

Cc: Alex Deucher 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 42 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h |  1 +
 2 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index 8ee4e8491f39..937c294f8d84 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -27,6 +27,7 @@
 #include "amdgpu.h"
 #include "amdgpu_sched.h"
 #include "amdgpu_ras.h"
+#include "amdgpu_ctx_workload.h"
 #include 
 
 #define to_amdgpu_ctx_entity(e)\
@@ -328,7 +329,7 @@ static int amdgpu_ctx_init(struct amdgpu_ctx_mgr *mgr, 
int32_t priority,
return r;
 
ctx->stable_pstate = current_stable_pstate;
-
+   ctx->workload_mode = AMDGPU_CTX_WORKLOAD_HINT_NONE;
return 0;
 }
 
@@ -633,11 +634,34 @@ static int amdgpu_ctx_stable_pstate(struct amdgpu_device 
*adev,
return r;
 }
 
+static int amdgpu_ctx_set_workload_profile(struct amdgpu_device *adev,
+  struct amdgpu_fpriv *fpriv, uint32_t id,
+  u32 workload_hint)
+{
+   struct amdgpu_ctx *ctx;
+   struct amdgpu_ctx_mgr *mgr;
+
+   if (!fpriv)
+   return -EINVAL;
+
+   mgr = &fpriv->ctx_mgr;
+   mutex_lock(&mgr->lock);
+   ctx = idr_find(&mgr->ctx_handles, id);
+   if (!ctx) {
+   mutex_unlock(&mgr->lock);
+   return -EINVAL;
+   }
+
+   ctx->workload_mode = workload_hint;
+   mutex_unlock(&mgr->lock);
+   return 0;
+}
+
 int amdgpu_ctx_ioctl(struct drm_device *dev, void *data,
 struct drm_file *filp)
 {
int r;
-   uint32_t id, stable_pstate;
+   uint32_t id, stable_pstate, wl_hint;
int32_t priority;
 
union drm_amdgpu_ctx *args = data;
@@ -681,6 +705,20 @@ int amdgpu_ctx_ioctl(struct drm_device *dev, void *data,
return -EINVAL;
		r = amdgpu_ctx_stable_pstate(adev, fpriv, id, true, &stable_pstate);
break;
+   case AMDGPU_CTX_OP_SET_WORKLOAD_PROFILE:
+   if (args->in.flags & ~AMDGPU_CTX_WORKLOAD_HINT_MASK)
+   return -EINVAL;
+   wl_hint = args->in.flags & AMDGPU_CTX_WORKLOAD_HINT_MASK;
+   if (wl_hint > AMDGPU_CTX_WORKLOAD_HINT_MAX)
+   return -EINVAL;
+   r = amdgpu_ctx_set_workload_profile(adev, fpriv, id, wl_hint);
+   if (r)
+   DRM_ERROR("Failed to set workload profile to %s\n",
+   amdgpu_workload_profile_name(wl_hint));
+   else
+   DRM_DEBUG_DRIVER("Workload profile set to %s\n",
+   amdgpu_workload_profile_name(wl_hint));
+   break;
default:
return -EINVAL;
}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
index cc7c8afff414..6c8032c3291a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
@@ -58,6 +58,7 @@ struct amdgpu_ctx {
unsigned long   ras_counter_ce;
unsigned long   ras_counter_ue;
uint32_tstable_pstate;
+   uint32_tworkload_mode;
 };
 
 struct amdgpu_ctx_mgr {
-- 
2.34.1



[PATCH v3 1/5] drm/amdgpu: add UAPI for workload hints to ctx ioctl

2022-09-26 Thread Shashank Sharma
Allow the user to specify a workload hint to the kernel.
We can use these to tweak the dpm heuristics to better match
the workload for improved performance.

V3: Create only set() workload UAPI (Christian)

Signed-off-by: Alex Deucher 
Signed-off-by: Shashank Sharma 
---
 include/uapi/drm/amdgpu_drm.h | 17 +
 1 file changed, 17 insertions(+)

diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index c2c9c674a223..23d354242699 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -212,6 +212,7 @@ union drm_amdgpu_bo_list {
 #define AMDGPU_CTX_OP_QUERY_STATE2 4
 #define AMDGPU_CTX_OP_GET_STABLE_PSTATE5
 #define AMDGPU_CTX_OP_SET_STABLE_PSTATE6
+#define AMDGPU_CTX_OP_SET_WORKLOAD_PROFILE 7
 
 /* GPU reset status */
 #define AMDGPU_CTX_NO_RESET0
@@ -252,6 +253,17 @@ union drm_amdgpu_bo_list {
 #define AMDGPU_CTX_STABLE_PSTATE_MIN_MCLK  3
 #define AMDGPU_CTX_STABLE_PSTATE_PEAK  4
 
+/* GPU workload hints, flag bits 8-15 */
+#define AMDGPU_CTX_WORKLOAD_HINT_SHIFT 8
+#define AMDGPU_CTX_WORKLOAD_HINT_MASK  (0xff << 
AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
+#define AMDGPU_CTX_WORKLOAD_HINT_NONE  (0 << 
AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
+#define AMDGPU_CTX_WORKLOAD_HINT_3D(1 << 
AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
+#define AMDGPU_CTX_WORKLOAD_HINT_VIDEO (2 << 
AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
+#define AMDGPU_CTX_WORKLOAD_HINT_VR(3 << 
AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
+#define AMDGPU_CTX_WORKLOAD_HINT_COMPUTE   (4 << 
AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
+#define AMDGPU_CTX_WORKLOAD_HINT_MAX  AMDGPU_CTX_WORKLOAD_HINT_COMPUTE
+#define AMDGPU_CTX_WORKLOAD_INDEX(n)  (n >> AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
+
 struct drm_amdgpu_ctx_in {
/** AMDGPU_CTX_OP_* */
__u32   op;
@@ -281,6 +293,11 @@ union drm_amdgpu_ctx_out {
__u32   flags;
__u32   _pad;
} pstate;
+
+   struct {
+   __u32   flags;
+   __u32   _pad;
+   } workload;
 };
 
 union drm_amdgpu_ctx {
-- 
2.34.1



[PATCH v3 2/5] drm/amdgpu: add new functions to set GPU power profile

2022-09-26 Thread Shashank Sharma
This patch adds new functions which allow a user to
change the GPU power profile based on a GPU workload hint
flag.
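The pattern these functions use — a cheap unlocked check, a re-check under the mutex, and a reference count so that only the last user resets the profile — can be sketched in userspace. This is a simplified model with hypothetical names (the kernel version uses adev->pm.smu_workload_lock and workload_switch_ref, and differs in detail, e.g. this sketch takes a reference even on the fast path):

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

enum profile { PROFILE_NONE, PROFILE_COMPUTE };

static pthread_mutex_t workload_lock = PTHREAD_MUTEX_INITIALIZER;
static enum profile current_mode = PROFILE_NONE;
static atomic_int switch_ref;       /* users currently holding the profile */
static int hw_switch_count;         /* counts simulated SMU transitions */

static int set_profile(enum profile hint)
{
	if (current_mode == hint)       /* cheap unlocked fast path */
		goto ref;

	pthread_mutex_lock(&workload_lock);
	if (current_mode != hint) {     /* re-check under the lock */
		hw_switch_count++;      /* stand-in for amdgpu_dpm_switch_power_profile() */
		current_mode = hint;
	}
	pthread_mutex_unlock(&workload_lock);
ref:
	atomic_fetch_add(&switch_ref, 1);
	return 0;
}

static int clear_profile(enum profile hint)
{
	/* Do not reset the profile while other users still hold it. */
	if (atomic_fetch_sub(&switch_ref, 1) - 1 > 0)
		return 0;

	pthread_mutex_lock(&workload_lock);
	if (current_mode == hint) {
		hw_switch_count++;
		current_mode = PROFILE_NONE;
	}
	pthread_mutex_unlock(&workload_lock);
	return 0;
}
```

Two overlapping users trigger exactly one switch in and one switch out; the hardware profile is only restored when the last reference drops.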

Cc: Alex Deucher 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/Makefile   |  2 +-
 .../gpu/drm/amd/amdgpu/amdgpu_ctx_workload.c  | 97 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c|  1 +
 .../gpu/drm/amd/include/amdgpu_ctx_workload.h | 54 +++
 drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h   |  5 +
 5 files changed, 158 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx_workload.c
 create mode 100644 drivers/gpu/drm/amd/include/amdgpu_ctx_workload.h

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile
index 5a283d12f8e1..34679c657ecc 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -50,7 +50,7 @@ amdgpu-y += amdgpu_device.o amdgpu_kms.o \
atombios_dp.o amdgpu_afmt.o amdgpu_trace_points.o \
atombios_encoders.o amdgpu_sa.o atombios_i2c.o \
amdgpu_dma_buf.o amdgpu_vm.o amdgpu_vm_pt.o amdgpu_ib.o amdgpu_pll.o \
-   amdgpu_ucode.o amdgpu_bo_list.o amdgpu_ctx.o amdgpu_sync.o \
+   amdgpu_ucode.o amdgpu_bo_list.o amdgpu_ctx.o amdgpu_ctx_workload.o 
amdgpu_sync.o \
amdgpu_gtt_mgr.o amdgpu_preempt_mgr.o amdgpu_vram_mgr.o amdgpu_virt.o \
amdgpu_atomfirmware.o amdgpu_vf_error.o amdgpu_sched.o \
amdgpu_debugfs.o amdgpu_ids.o amdgpu_gmc.o \
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx_workload.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx_workload.c
new file mode 100644
index ..a11cf29bc388
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx_workload.c
@@ -0,0 +1,97 @@
+/*
+ * Copyright 2022 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+#include 
+#include "kgd_pp_interface.h"
+#include "amdgpu_ctx_workload.h"
+
+static enum PP_SMC_POWER_PROFILE
+amdgpu_workload_to_power_profile(uint32_t hint)
+{
+   switch (hint) {
+   case AMDGPU_CTX_WORKLOAD_HINT_NONE:
+   default:
+   return PP_SMC_POWER_PROFILE_BOOTUP_DEFAULT;
+
+   case AMDGPU_CTX_WORKLOAD_HINT_3D:
+   return PP_SMC_POWER_PROFILE_FULLSCREEN3D;
+   case AMDGPU_CTX_WORKLOAD_HINT_VIDEO:
+   return PP_SMC_POWER_PROFILE_VIDEO;
+   case AMDGPU_CTX_WORKLOAD_HINT_VR:
+   return PP_SMC_POWER_PROFILE_VR;
+   case AMDGPU_CTX_WORKLOAD_HINT_COMPUTE:
+   return PP_SMC_POWER_PROFILE_COMPUTE;
+   }
+}
+
+int amdgpu_set_workload_profile(struct amdgpu_device *adev,
+   uint32_t hint)
+{
+   int ret = 0;
+   enum PP_SMC_POWER_PROFILE profile =
+   amdgpu_workload_to_power_profile(hint);
+
+   if (adev->pm.workload_mode == hint)
+   return 0;
+
+   mutex_lock(&adev->pm.smu_workload_lock);
+
+   if (adev->pm.workload_mode == hint)
+   goto unlock;
+
+   ret = amdgpu_dpm_switch_power_profile(adev, profile, 1);
+   if (!ret)
+   adev->pm.workload_mode = hint;
+   atomic_inc(&adev->pm.workload_switch_ref);
+
+unlock:
+   mutex_unlock(&adev->pm.smu_workload_lock);
+   return ret;
+}
+
+int amdgpu_clear_workload_profile(struct amdgpu_device *adev,
+ uint32_t hint)
+{
+   int ret = 0;
+   enum PP_SMC_POWER_PROFILE profile =
+   amdgpu_workload_to_power_profile(hint);
+
+   if (hint == AMDGPU_CTX_WORKLOAD_HINT_NONE)
+   return 0;
+
+   /* Do not reset GPU power profile if another reset is coming */
+   if (atomic_dec_return(&adev->pm.workload_switch_ref) > 0)
+   return 0;
+
+   mutex_lock(&adev->pm.smu_workload_lock);
+
+   if (adev->pm.workload_mode != hint)
+   goto unlock;
+
+   ret = amdgpu_dpm_switch_power_profile(adev, profile, 0);
+   if (!ret)
+

[PATCH v3 0/5] GPU workload hints for better performance

2022-09-26 Thread Shashank Sharma
AMDGPU SOCs support dynamic workload-based power profiles, which can
provide fine-tuned performance for a particular type of workload.
This patch series adds an interface to set/reset these power profiles
based on workload type hints. A user can set a hint for the type of
workload being submitted to the GPU, and the driver can dynamically
switch to the power profile best suited to that kind of workload.

Currently supported workload profiles are:
"None", "3D", "Video", "VR", "Compute"

V2: This version addresses the review comment from Christian about
changing the design to set the workload mode more dynamically,
rather than during context creation.

V3: Addressed review comment from Christian: removed the get_workload()
calls from the UAPI, keeping only the set_workload() call.

Shashank Sharma (5):
  drm/amdgpu: add UAPI for workload hints to ctx ioctl
  drm/amdgpu: add new functions to set GPU power profile
  drm/amdgpu: set GPU workload via ctx IOCTL
  drm/amdgpu: switch GPU workload profile
  drm/amdgpu: switch workload context to/from compute

 drivers/gpu/drm/amd/amdgpu/Makefile   |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c| 14 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c|  2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c   | 42 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h   |  1 +
 .../gpu/drm/amd/amdgpu/amdgpu_ctx_workload.c  | 93 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c|  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c   | 15 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.h   |  3 +
 .../gpu/drm/amd/include/amdgpu_ctx_workload.h | 54 +++
 drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h   |  5 +
 include/uapi/drm/amdgpu_drm.h | 17 
 12 files changed, 243 insertions(+), 6 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx_workload.c
 create mode 100644 drivers/gpu/drm/amd/include/amdgpu_ctx_workload.h

-- 
2.34.1



Re: [PATCH 6/7] nouveau/dmem: Evict device private memory during release

2022-09-26 Thread Lyude Paul
On Mon, 2022-09-26 at 16:03 +1000, Alistair Popple wrote:
> When the module is unloaded or a GPU is unbound from the module it is
> possible for device private pages to be left mapped in currently running
> processes. This leads to a kernel crash when the pages are either freed
> or accessed from the CPU because the GPU and associated data structures
> and callbacks have all been freed.
> 
> Fix this by migrating any mappings back to normal CPU memory prior to
> freeing the GPU memory chunks and associated device private pages.
> 
> Signed-off-by: Alistair Popple 
> 
> ---
> 
> I assume the AMD driver might have a similar issue. However I can't see
> where device private (or coherent) pages actually get unmapped/freed
> during teardown as I couldn't find any relevant calls to
> devm_memunmap(), memunmap(), devm_release_mem_region() or
> release_mem_region(). So it appears that ZONE_DEVICE pages are not being
> properly freed during module unload, unless I'm missing something?

I've got no idea, will poke Ben to see if they know the answer to this

> ---
>  drivers/gpu/drm/nouveau/nouveau_dmem.c | 48 +++-
>  1 file changed, 48 insertions(+)
> 
> diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c 
> b/drivers/gpu/drm/nouveau/nouveau_dmem.c
> index 66ebbd4..3b247b8 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
> @@ -369,6 +369,52 @@ nouveau_dmem_suspend(struct nouveau_drm *drm)
>   mutex_unlock(&drm->dmem->mutex);
>  }
>  
> +/*
> + * Evict all pages mapping a chunk.
> + */
> +void
> +nouveau_dmem_evict_chunk(struct nouveau_dmem_chunk *chunk)
> +{
> + unsigned long i, npages = range_len(&chunk->pagemap.range) >> PAGE_SHIFT;
> + unsigned long *src_pfns, *dst_pfns;
> + dma_addr_t *dma_addrs;
> + struct nouveau_fence *fence;
> +
> + src_pfns = kcalloc(npages, sizeof(*src_pfns), GFP_KERNEL);
> + dst_pfns = kcalloc(npages, sizeof(*dst_pfns), GFP_KERNEL);
> + dma_addrs = kcalloc(npages, sizeof(*dma_addrs), GFP_KERNEL);
> +
> + migrate_device_range(src_pfns, chunk->pagemap.range.start >> PAGE_SHIFT,
> + npages);
> +
> + for (i = 0; i < npages; i++) {
> + if (src_pfns[i] & MIGRATE_PFN_MIGRATE) {
> + struct page *dpage;
> +
> + /*
> +  * _GFP_NOFAIL because the GPU is going away and there
> +  * is nothing sensible we can do if we can't copy the
> +  * data back.
> +  */

You'll have to excuse me for a moment since this area of nouveau isn't one of
my strongpoints, but are we sure about this? IIRC __GFP_NOFAIL means infinite
retry, in the case of a GPU hotplug event I would assume we would rather just
stop trying to migrate things to the GPU and just drop the data instead of
hanging on infinite retries.

> + dpage = alloc_page(GFP_HIGHUSER | __GFP_NOFAIL);
> + dst_pfns[i] = migrate_pfn(page_to_pfn(dpage));
> + nouveau_dmem_copy_one(chunk->drm,
> + migrate_pfn_to_page(src_pfns[i]), dpage,
> + &dma_addrs[i]);
> + }
> + }
> +
> + nouveau_fence_new(chunk->drm->dmem->migrate.chan, false, &fence);
> + migrate_device_pages(src_pfns, dst_pfns, npages);
> + nouveau_dmem_fence_done(&fence);
> + migrate_device_finalize(src_pfns, dst_pfns, npages);
> + kfree(src_pfns);
> + kfree(dst_pfns);
> + for (i = 0; i < npages; i++)
> + dma_unmap_page(chunk->drm->dev->dev, dma_addrs[i], PAGE_SIZE, 
> DMA_BIDIRECTIONAL);
> + kfree(dma_addrs);
> +}
> +
>  void
>  nouveau_dmem_fini(struct nouveau_drm *drm)
>  {
> @@ -380,8 +426,10 @@ nouveau_dmem_fini(struct nouveau_drm *drm)
>   mutex_lock(&drm->dmem->mutex);
>  
>   list_for_each_entry_safe(chunk, tmp, &drm->dmem->chunks, list) {
> + nouveau_dmem_evict_chunk(chunk);
>   nouveau_bo_unpin(chunk->bo);
>   nouveau_bo_ref(NULL, >bo);
> + WARN_ON(chunk->callocated);
>   list_del(&chunk->list);
>   memunmap_pages(&chunk->pagemap);
>   release_mem_region(chunk->pagemap.range.start,

-- 
Cheers,
 Lyude Paul (she/her)
 Software Engineer at Red Hat



Re: [PATCH 5/7] nouveau/dmem: Refactor nouveau_dmem_fault_copy_one()

2022-09-26 Thread Lyude Paul
On Mon, 2022-09-26 at 16:03 +1000, Alistair Popple wrote:
> nouveau_dmem_fault_copy_one() is used during handling of CPU faults via
> the migrate_to_ram() callback and is used to copy data from GPU to CPU
> memory. It is currently specific to fault handling, however a future
> patch implementing eviction of data during teardown needs similar
> functionality.
> 
> Refactor out the core functionality so that it is not specific to fault
> handling.
> 
> Signed-off-by: Alistair Popple 
> ---
>  drivers/gpu/drm/nouveau/nouveau_dmem.c | 59 +--
>  1 file changed, 29 insertions(+), 30 deletions(-)
> 
> diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c 
> b/drivers/gpu/drm/nouveau/nouveau_dmem.c
> index f9234ed..66ebbd4 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
> @@ -139,44 +139,25 @@ static void nouveau_dmem_fence_done(struct 
> nouveau_fence **fence)
>   }
>  }
>  
> -static vm_fault_t nouveau_dmem_fault_copy_one(struct nouveau_drm *drm,
> - struct vm_fault *vmf, struct migrate_vma *args,
> - dma_addr_t *dma_addr)
> +static int nouveau_dmem_copy_one(struct nouveau_drm *drm, struct page *spage,
> + struct page *dpage, dma_addr_t *dma_addr)
>  {
>   struct device *dev = drm->dev->dev;
> - struct page *dpage, *spage;
> - struct nouveau_svmm *svmm;
> -
> - spage = migrate_pfn_to_page(args->src[0]);
> - if (!spage || !(args->src[0] & MIGRATE_PFN_MIGRATE))
> - return 0;
>  
> - dpage = alloc_page_vma(GFP_HIGHUSER, vmf->vma, vmf->address);
> - if (!dpage)
> - return VM_FAULT_SIGBUS;
>   lock_page(dpage);
>  
>   *dma_addr = dma_map_page(dev, dpage, 0, PAGE_SIZE, DMA_BIDIRECTIONAL);
>   if (dma_mapping_error(dev, *dma_addr))
> - goto error_free_page;
> + return -EIO;
>  
> - svmm = spage->zone_device_data;
> - mutex_lock(&svmm->mutex);
> - nouveau_svmm_invalidate(svmm, args->start, args->end);
>   if (drm->dmem->migrate.copy_func(drm, 1, NOUVEAU_APER_HOST, *dma_addr,
> - NOUVEAU_APER_VRAM, nouveau_dmem_page_addr(spage)))
> - goto error_dma_unmap;
> - mutex_unlock(&svmm->mutex);
> +  NOUVEAU_APER_VRAM,
> +  nouveau_dmem_page_addr(spage))) {
> + dma_unmap_page(dev, *dma_addr, PAGE_SIZE, DMA_BIDIRECTIONAL);
> + return -EIO;
> + }

Feel free to just align this with the starting (, as long as it doesn't go
above 100 characters it doesn't really matter imho and would look nicer that
way.

Otherwise:

Reviewed-by: Lyude Paul 

Will look at the other patch in a moment

>  
> - args->dst[0] = migrate_pfn(page_to_pfn(dpage));
>   return 0;
> -
> -error_dma_unmap:
> - mutex_unlock(&svmm->mutex);
> - dma_unmap_page(dev, *dma_addr, PAGE_SIZE, DMA_BIDIRECTIONAL);
> -error_free_page:
> - __free_page(dpage);
> - return VM_FAULT_SIGBUS;
>  }
>  
>  static vm_fault_t nouveau_dmem_migrate_to_ram(struct vm_fault *vmf)
> @@ -184,9 +165,11 @@ static vm_fault_t nouveau_dmem_migrate_to_ram(struct 
> vm_fault *vmf)
>   struct nouveau_drm *drm = page_to_drm(vmf->page);
>   struct nouveau_dmem *dmem = drm->dmem;
>   struct nouveau_fence *fence;
> + struct nouveau_svmm *svmm;
> + struct page *spage, *dpage;
>   unsigned long src = 0, dst = 0;
>   dma_addr_t dma_addr = 0;
> - vm_fault_t ret;
> + vm_fault_t ret = 0;
>   struct migrate_vma args = {
>   .vma= vmf->vma,
>   .start  = vmf->address,
> @@ -207,9 +190,25 @@ static vm_fault_t nouveau_dmem_migrate_to_ram(struct 
> vm_fault *vmf)
>   if (!args.cpages)
>   return 0;
>  
> - ret = nouveau_dmem_fault_copy_one(drm, vmf, &args, &dma_addr);
> - if (ret || dst == 0)
> + spage = migrate_pfn_to_page(src);
> + if (!spage || !(src & MIGRATE_PFN_MIGRATE))
> + goto done;
> +
> + dpage = alloc_page_vma(GFP_HIGHUSER, vmf->vma, vmf->address);
> + if (!dpage)
> + goto done;
> +
> + dst = migrate_pfn(page_to_pfn(dpage));
> +
> + svmm = spage->zone_device_data;
> + mutex_lock(&svmm->mutex);
> + nouveau_svmm_invalidate(svmm, args.start, args.end);
> + ret = nouveau_dmem_copy_one(drm, spage, dpage, &dma_addr);
> + mutex_unlock(&svmm->mutex);
> + if (ret) {
> + ret = VM_FAULT_SIGBUS;
>   goto done;
> + }
>  
>   nouveau_fence_new(dmem->migrate.chan, false, &fence);
>   migrate_vma_pages(&args);

-- 
Cheers,
 Lyude Paul (she/her)
 Software Engineer at Red Hat



[PATCH] drm/amdgpu: Add amdgpu suspend-resume code path under SRIOV

2022-09-26 Thread Bokun Zhang
- Under SRIOV, we need to send REQ_GPU_FINI to the hypervisor
  during the suspend time. Furthermore, we cannot request a
  mode 1 reset under SRIOV as VF. Therefore, we will skip it
  as it is called in suspend_noirq() function.

- In the resume code path, we need to send REQ_GPU_INIT to the
  hypervisor and also resume PSP IP block under SRIOV.
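The invariant in this patch's resume path — full GPU access requested from the hypervisor must always be handed back, even when IP resume fails part way — can be modelled with a toy sketch (hypothetical names; the real calls are amdgpu_virt_request_full_gpu() and amdgpu_virt_release_full_gpu()):

```c
#include <assert.h>

static int full_gpu_refs;   /* outstanding full-GPU grants from the hypervisor */

static int request_full_gpu(void)
{
	full_gpu_refs++;
	return 0;
}

static void release_full_gpu(void)
{
	full_gpu_refs--;
}

/* Simplified resume flow: ip_resume_err stands in for the result of
 * amdgpu_device_ip_resume(). Regardless of that result, the grant is
 * released before the error is propagated, so the VF never keeps
 * exclusive access after a failed resume. */
static int device_resume(int ip_resume_err)
{
	int r;

	if (request_full_gpu())
		return -1;

	r = ip_resume_err;

	/* no matter what r is, always properly release the full GPU */
	release_full_gpu();

	return r;
}
```

Both the success and failure paths leave the request/release pairing balanced, mirroring the "no matter what r is" comment in the patch.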

Signed-off-by: Bokun Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c   |  4 
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 27 +-
 2 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
index c5fad52c649d..a5aee19ca30e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
@@ -1057,6 +1057,10 @@ bool amdgpu_acpi_should_gpu_reset(struct amdgpu_device 
*adev)
 {
if (adev->flags & AMD_IS_APU)
return false;
+
+   if (amdgpu_sriov_vf(adev))
+   return false;
+
 #ifdef HAVE_PM_SUSPEND_TARGET_STATE
return pm_suspend_target_state != PM_SUSPEND_TO_IDLE;
 #else
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index fb09dc32b4c0..c5c94ebd3d57 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3176,7 +3176,8 @@ static int amdgpu_device_ip_resume_phase1(struct 
amdgpu_device *adev)
continue;
if (adev->ip_blocks[i].version->type == 
AMD_IP_BLOCK_TYPE_COMMON ||
adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_GMC ||
-   adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_IH) {
+   adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_IH ||
+   (adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_PSP 
&& amdgpu_sriov_vf(adev))) {
 
r = adev->ip_blocks[i].version->funcs->resume(adev);
if (r) {
@@ -4120,12 +4121,20 @@ static void amdgpu_device_evict_resources(struct 
amdgpu_device *adev)
 int amdgpu_device_suspend(struct drm_device *dev, bool fbcon)
 {
struct amdgpu_device *adev = drm_to_adev(dev);
+   int r = 0;
 
if (dev->switch_power_state == DRM_SWITCH_POWER_OFF)
return 0;
 
adev->in_suspend = true;
 
+   if (amdgpu_sriov_vf(adev)) {
+   amdgpu_virt_fini_data_exchange(adev);
+   r = amdgpu_virt_request_full_gpu(adev, false);
+   if (r)
+   return r;
+   }
+
if (amdgpu_acpi_smart_shift_update(dev, AMDGPU_SS_DEV_D3))
DRM_WARN("smart shift update failed\n");
 
@@ -4153,6 +4162,9 @@ int amdgpu_device_suspend(struct drm_device *dev, bool 
fbcon)
 
amdgpu_device_ip_suspend_phase2(adev);
 
+   if (amdgpu_sriov_vf(adev))
+   amdgpu_virt_release_full_gpu(adev, false);
+
return 0;
 }
 
@@ -4171,6 +4183,12 @@ int amdgpu_device_resume(struct drm_device *dev, bool 
fbcon)
struct amdgpu_device *adev = drm_to_adev(dev);
int r = 0;
 
+   if (amdgpu_sriov_vf(adev)) {
+   r = amdgpu_virt_request_full_gpu(adev, true);
+   if (r)
+   return r;
+   }
+
if (dev->switch_power_state == DRM_SWITCH_POWER_OFF)
return 0;
 
@@ -4185,6 +4203,13 @@ int amdgpu_device_resume(struct drm_device *dev, bool 
fbcon)
}
 
r = amdgpu_device_ip_resume(adev);
+
+   /* no matter what r is, always need to properly release full GPU */
+   if (amdgpu_sriov_vf(adev)) {
+   amdgpu_virt_init_data_exchange(adev);
+   amdgpu_virt_release_full_gpu(adev, true);
+   }
+
if (r) {
dev_err(adev->dev, "amdgpu_device_ip_resume failed (%d).\n", r);
return r;
-- 
2.25.1



[PATCH v3 1/1] drm/amdkfd: Track unified memory when switching xnack mode

2022-09-26 Thread Philip Yang
Unified memory usage with xnack off is tracked to avoid oversubscribing
system memory. When switching xnack mode from off to on, subsequent
frees of ranges that were allocated with xnack off will not unreserve
memory while xnack is on, which leaves the memory accounting unbalanced.

When switching xnack mode from on to off, we need to reserve memory for
the already allocated svm ranges, because subsequent frees of those
ranges will unreserve memory with xnack off.

Signed-off-by: Philip Yang 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 30 
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 20 +++-
 drivers/gpu/drm/amd/amdkfd/kfd_svm.h |  2 ++
 3 files changed, 46 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 56f7307c21d2..938095478707 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1594,16 +1594,36 @@ static int kfd_ioctl_set_xnack_mode(struct file *filep,
if (args->xnack_enabled >= 0) {
if (!list_empty(>pqm.queues)) {
pr_debug("Process has user queues running\n");
-   mutex_unlock(>mutex);
-   return -EBUSY;
+   r = -EBUSY;
+   goto out_unlock;
}
-   if (args->xnack_enabled && !kfd_process_xnack_mode(p, true))
+
+   if (p->xnack_enabled == args->xnack_enabled)
+   goto out_unlock;
+
+   if (args->xnack_enabled && !kfd_process_xnack_mode(p, true)) {
r = -EPERM;
-   else
-   p->xnack_enabled = args->xnack_enabled;
+   goto out_unlock;
+   }
+
+   pr_debug("switching xnack from %d to %d\n", p->xnack_enabled,
+args->xnack_enabled);
+
+   mutex_lock(>svms.lock);
+
+   /* Switching to XNACK on/off, unreserve/reserve memory of all
+* svm ranges. Change xnack must be inside svms lock, to avoid
+* race with svm_range_deferred_list_work unreserve memory.
+*/
+   p->xnack_enabled = args->xnack_enabled;
+   svm_range_list_unreserve_mem(p, p->xnack_enabled);
+
+   mutex_unlock(>svms.lock);
} else {
args->xnack_enabled = p->xnack_enabled;
}
+
+out_unlock:
mutex_unlock(>mutex);
 
return r;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index cf5b4005534c..5a82d5660470 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -278,7 +278,7 @@ static void svm_range_free(struct svm_range *prange, bool 
update_mem_usage)
svm_range_free_dma_mappings(prange);
 
if (update_mem_usage && !p->xnack_enabled) {
-   pr_debug("unreserve mem limit: %lld\n", size);
+   pr_debug("unreserve prange 0x%p size: 0x%llx\n", prange, size);
amdgpu_amdkfd_unreserve_mem_limit(NULL, size,
KFD_IOC_ALLOC_MEM_FLAGS_USERPTR);
}
@@ -2956,6 +2956,24 @@ svm_range_restore_pages(struct amdgpu_device *adev, 
unsigned int pasid,
return r;
 }
 
+void svm_range_list_unreserve_mem(struct kfd_process *p, bool unreserve)
+{
+   struct svm_range *prange;
+   uint64_t size;
+
+   list_for_each_entry(prange, >svms.list, list) {
+   size = (prange->last - prange->start + 1) << PAGE_SHIFT;
+   pr_debug("svms 0x%p %s prange 0x%p size 0x%llx\n", >svms,
+unreserve ? "unreserve" : "reserve", prange, size);
+   if (unreserve)
+   amdgpu_amdkfd_unreserve_mem_limit(NULL, size,
+   
KFD_IOC_ALLOC_MEM_FLAGS_USERPTR);
+   else
+   amdgpu_amdkfd_reserve_mem_limit(NULL, size,
+   
KFD_IOC_ALLOC_MEM_FLAGS_USERPTR);
+   }
+}
+
 void svm_range_list_fini(struct kfd_process *p)
 {
struct svm_range *prange;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.h
index 012c53729516..05a2135cd56e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.h
@@ -203,10 +203,12 @@ void svm_range_list_lock_and_flush_work(struct 
svm_range_list *svms, struct mm_s
 void svm_range_bo_unref_async(struct svm_range_bo *svm_bo);
 
 void svm_range_set_max_pages(struct amdgpu_device *adev);
+void svm_range_list_unreserve_mem(struct kfd_process *p, bool unreserve);
 
 #else
 
 struct kfd_process;
static inline void svm_range_list_unreserve_mem(struct kfd_process *p, bool unreserve) { }
 
 static inline int svm_range_list_init(struct kfd_process *p)
 {
-- 
2.35.1
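The accounting rule the patch above enforces can be modeled in a few lines; this is a toy sketch of the bookkeeping, not the KFD code, and the helper names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Memory is reserved at allocation and unreserved at free only while
 * xnack is off, so toggling xnack must reserve/unreserve all still
 * allocated ranges to keep the counter balanced. */
static int64_t reserved;
static bool xnack_on;

static void alloc_range(int64_t size) { if (!xnack_on) reserved += size; }
static void free_range(int64_t size)  { if (!xnack_on) reserved -= size; }

/* live_bytes: total size of ranges still allocated at switch time */
static void set_xnack(bool enable, int64_t live_bytes)
{
	if (xnack_on == enable)
		return;
	/* off -> on: unreserve live ranges; on -> off: reserve them */
	reserved += enable ? -live_bytes : live_bytes;
	xnack_on = enable;
}
```

Without the adjustment in set_xnack(), a range allocated with xnack off and freed with xnack on would leave `reserved` permanently inflated.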



RE: [PATCH] drm/amdkfd: Fix UBSAN shift-out-of-bounds warning

2022-09-26 Thread Sider, Graham
[AMD Official Use Only - General]

> -Original Message-
> From: amd-gfx  On Behalf Of Felix
> Kuehling
> Sent: Wednesday, September 21, 2022 6:30 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Ellis Michael 
> Subject: [PATCH] drm/amdkfd: Fix UBSAN shift-out-of-bounds warning
> 
> 
> 
> This was fixed in initialize_cpsch before, but not in initialize_nocpsch.
> Factor sdma bitmap initialization into a helper function to apply the correct
> implementation in both cases without duplicating it.
> 
> Reported-by: Ellis Michael 
> Signed-off-by: Felix Kuehling 
> ---
>  .../drm/amd/amdkfd/kfd_device_queue_manager.c | 41 --
> -
>  1 file changed, 17 insertions(+), 24 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index e83725a28106..f88ec6a11ad2 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -1240,6 +1240,20 @@ static void init_interrupts(struct
> device_queue_manager *dqm)
> dqm->dev->kfd2kgd->init_interrupts(dqm->dev->adev, 
> i);  }
> 
> +static void init_sdma_bitmaps(struct device_queue_manager *dqm) {
> +   uint64_t num_sdma_queues = get_num_sdma_queues(dqm);
> +   uint64_t num_xgmi_sdma_queues =
> get_num_xgmi_sdma_queues(dqm);
> +
> +   if (num_sdma_queues)
> +   dqm->sdma_bitmap = GENMASK_ULL(num_sdma_queues-1, 0);
> +   if (num_xgmi_sdma_queues)
> +   dqm->xgmi_sdma_bitmap =
> + GENMASK_ULL(num_xgmi_sdma_queues-1, 0);

I think we still want a safeguard here in case we ever get into a situation 
where num_sdma_queues is > 64. Otherwise we could hit an unsigned wraparound 
in __GENMASK_ULL: (~ULL(0) >> (BITS_PER_LONG_LONG - 1 - (h))) would wrap, and 
a shift wider than the width of the type would trigger a warning if h > 63.

Something along the lines of

dqm->sdma_bitmap = GENMASK_ULL(min(num_sdma_queues, BITS_PER_LONG_LONG) - 1, 0);

Could work as a safeguard. Same for xgmi_sdma_bitmap.
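For illustration, the clamped computation suggested here can be sketched as follows; the helper name is hypothetical and GENMASK_ULL is simplified from the kernel macro:

```c
#include <assert.h>
#include <stdint.h>

/* Clamp the queue count before building the mask so GENMASK_ULL never
 * sees a high bit >= 64, which would be a shift wider than the type. */
#define BITS_PER_LONG_LONG 64
#define GENMASK_ULL(h, l) \
	((~0ULL << (l)) & (~0ULL >> (BITS_PER_LONG_LONG - 1 - (h))))

static uint64_t sdma_bitmap(unsigned int num_queues)
{
	if (num_queues == 0)
		return 0;
	if (num_queues > BITS_PER_LONG_LONG)
		num_queues = BITS_PER_LONG_LONG;	/* the safeguard */
	return GENMASK_ULL(num_queues - 1, 0);
}
```

With the clamp, any count of 64 or more simply yields an all-ones bitmap instead of undefined behaviour.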

Best,
Graham

> +
> +   dqm->sdma_bitmap &= ~get_reserved_sdma_queues_bitmap(dqm);
> +   pr_info("sdma_bitmap: %llx\n", dqm->sdma_bitmap); }
> +
>  static int initialize_nocpsch(struct device_queue_manager *dqm)  {
> int pipe, queue;
> @@ -1268,11 +1282,7 @@ static int initialize_nocpsch(struct
> device_queue_manager *dqm)
> 
> memset(dqm->vmid_pasid, 0, sizeof(dqm->vmid_pasid));
> 
> -   dqm->sdma_bitmap = ~0ULL >> (64 - get_num_sdma_queues(dqm));
> -   dqm->sdma_bitmap &= ~(get_reserved_sdma_queues_bitmap(dqm));
> -   pr_info("sdma_bitmap: %llx\n", dqm->sdma_bitmap);
> -
> -   dqm->xgmi_sdma_bitmap = ~0ULL >> (64 -
> get_num_xgmi_sdma_queues(dqm));
> +   init_sdma_bitmaps(dqm);
> 
> return 0;
>  }
> @@ -1450,9 +1460,6 @@ static int set_sched_resources(struct
> device_queue_manager *dqm)
> 
>  static int initialize_cpsch(struct device_queue_manager *dqm)  {
> -   uint64_t num_sdma_queues;
> -   uint64_t num_xgmi_sdma_queues;
> -
> pr_debug("num of pipes: %d\n", get_pipes_per_mec(dqm));
> 
> mutex_init(>lock_hidden);
> @@ -1461,24 +1468,10 @@ static int initialize_cpsch(struct
> device_queue_manager *dqm)
> dqm->active_cp_queue_count = 0;
> dqm->gws_queue_count = 0;
> dqm->active_runlist = false;
> -
> -   num_sdma_queues = get_num_sdma_queues(dqm);
> -   if (num_sdma_queues >= BITS_PER_TYPE(dqm->sdma_bitmap))
> -   dqm->sdma_bitmap = ULLONG_MAX;
> -   else
> -   dqm->sdma_bitmap = (BIT_ULL(num_sdma_queues) - 1);
> -
> -   dqm->sdma_bitmap &= ~(get_reserved_sdma_queues_bitmap(dqm));
> -   pr_info("sdma_bitmap: %llx\n", dqm->sdma_bitmap);
> -
> -   num_xgmi_sdma_queues = get_num_xgmi_sdma_queues(dqm);
> -   if (num_xgmi_sdma_queues >= BITS_PER_TYPE(dqm-
> >xgmi_sdma_bitmap))
> -   dqm->xgmi_sdma_bitmap = ULLONG_MAX;
> -   else
> -   dqm->xgmi_sdma_bitmap = (BIT_ULL(num_xgmi_sdma_queues) -
> 1);
> -
> INIT_WORK(>hw_exception_work, kfd_process_hw_exception);
> 
> +   init_sdma_bitmaps(dqm);
> +
> return 0;
>  }
> 
> --
> 2.32.0


Re: [PATCH 2/7] mm: Free device private pages have zero refcount

2022-09-26 Thread Jason Gunthorpe
On Mon, Sep 26, 2022 at 04:03:06PM +1000, Alistair Popple wrote:
> Since 27674ef6c73f ("mm: remove the extra ZONE_DEVICE struct page
> refcount") device private pages have no longer had an extra reference
> count when the page is in use. However before handing them back to the
> owning device driver we add an extra reference count such that free
> pages have a reference count of one.
> 
> This makes it difficult to tell if a page is free or not because both
> free and in use pages will have a non-zero refcount. Instead we should
> return pages to the drivers page allocator with a zero reference count.
> Kernel code can then safely use kernel functions such as
> get_page_unless_zero().
> 
> Signed-off-by: Alistair Popple 
> ---
>  arch/powerpc/kvm/book3s_hv_uvmem.c   | 1 +
>  drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 1 +
>  drivers/gpu/drm/nouveau/nouveau_dmem.c   | 1 +
>  lib/test_hmm.c   | 1 +
>  mm/memremap.c| 5 -
>  mm/page_alloc.c  | 6 ++
>  6 files changed, 10 insertions(+), 5 deletions(-)

I think this is a great idea, but I'm surprised no dax stuff is
touched here?

Jason
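
The refcount scheme under discussion can be pictured with a toy model; this is a sketch of the invariant, not the mm code:

```c
#include <assert.h>
#include <stdbool.h>

/* Free device private pages sit at refcount 0, so a conditional get
 * that refuses zero (like get_page_unless_zero()) can never pin a
 * free page. */
struct page { int refcount; };

static bool try_get_page(struct page *p)
{
	if (p->refcount == 0)
		return false;	/* page is free: do not pin */
	p->refcount++;
	return true;
}
```

With the old scheme, free pages held a refcount of one, so the same conditional get could not distinguish free pages from in-use ones.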


Re: [PATCH v2 1/5] drm/amdgpu: add UAPI for workload hints to ctx ioctl

2022-09-26 Thread Sharma, Shashank




On 9/26/2022 5:19 PM, Christian König wrote:

Am 26.09.22 um 17:14 schrieb Sharma, Shashank:


Hello Christian,

On 9/26/2022 5:10 PM, Christian König wrote:

Am 26.09.22 um 17:02 schrieb Shashank Sharma:

Allow the user to specify a workload hint to the kernel.
We can use these to tweak the dpm heuristics to better match
the workload for improved performance.

Signed-off-by: Alex Deucher 
Signed-off-by: Shashank Sharma 
---
  include/uapi/drm/amdgpu_drm.h | 18 ++
  1 file changed, 18 insertions(+)

diff --git a/include/uapi/drm/amdgpu_drm.h 
b/include/uapi/drm/amdgpu_drm.h

index c2c9c674a223..da5019a6e233 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -212,6 +212,8 @@ union drm_amdgpu_bo_list {
  #define AMDGPU_CTX_OP_QUERY_STATE2    4
  #define AMDGPU_CTX_OP_GET_STABLE_PSTATE    5
  #define AMDGPU_CTX_OP_SET_STABLE_PSTATE    6
+#define AMDGPU_CTX_OP_GET_WORKLOAD_PROFILE    7


Do we really have a use case for getting the profile or is that just 
added for completeness?
To be honest, there is no direct use case for a get(). We have a 
self-reset in the kernel to clear the profile, so this is just for the 
sake of completeness.


The problem is we usually need a userspace use case to justify 
upstreaming of a UAPI.


We already had a couple of cases where we only upstreamed UAPI for the 
sake of completeness (set/get pair for example) and then years later 
found out that the getter had been completely broken for years because 
nobody used it.


So if we don't really need it I would try to avoid it.

Christian.


Makes sense, I can remove the get API from the UAPI and just keep it kernel internal.

- Shashank






At base minimum we need a test-case for both to exercise the UAPI.



Agree, I have already added some test cases in libdrm/tests/amdgpu for 
my local testing, I will start publishing them once we have an 
agreement on the UAPI and kernel design.


- Shashank


Regards,
Christian.


+#define AMDGPU_CTX_OP_SET_WORKLOAD_PROFILE    8
  /* GPU reset status */
  #define AMDGPU_CTX_NO_RESET    0
@@ -252,6 +254,17 @@ union drm_amdgpu_bo_list {
  #define AMDGPU_CTX_STABLE_PSTATE_MIN_MCLK  3
  #define AMDGPU_CTX_STABLE_PSTATE_PEAK  4
+/* GPU workload hints, flag bits 8-15 */
+#define AMDGPU_CTX_WORKLOAD_HINT_SHIFT 8
+#define AMDGPU_CTX_WORKLOAD_HINT_MASK  (0xff << 
AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
+#define AMDGPU_CTX_WORKLOAD_HINT_NONE  (0 << 
AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
+#define AMDGPU_CTX_WORKLOAD_HINT_3D    (1 << 
AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
+#define AMDGPU_CTX_WORKLOAD_HINT_VIDEO (2 << 
AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
+#define AMDGPU_CTX_WORKLOAD_HINT_VR    (3 << 
AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
+#define AMDGPU_CTX_WORKLOAD_HINT_COMPUTE   (4 << 
AMDGPU_CTX_WORKLOAD_HINT_SHIFT)

+#define AMDGPU_CTX_WORKLOAD_HINT_MAX AMDGPU_CTX_WORKLOAD_HINT_COMPUTE
+#define AMDGPU_CTX_WORKLOAD_INDEX(n)   (n >> 
AMDGPU_CTX_WORKLOAD_HINT_SHIFT)

+
  struct drm_amdgpu_ctx_in {
  /** AMDGPU_CTX_OP_* */
  __u32    op;
@@ -281,6 +294,11 @@ union drm_amdgpu_ctx_out {
  __u32    flags;
  __u32    _pad;
  } pstate;
+
+    struct {
+    __u32    flags;
+    __u32    _pad;
+    } workload;
  };
  union drm_amdgpu_ctx {






Re: [PATCH v2 1/5] drm/amdgpu: add UAPI for workload hints to ctx ioctl

2022-09-26 Thread Christian König

Am 26.09.22 um 17:14 schrieb Sharma, Shashank:


Hello Christian,

On 9/26/2022 5:10 PM, Christian König wrote:

Am 26.09.22 um 17:02 schrieb Shashank Sharma:

Allow the user to specify a workload hint to the kernel.
We can use these to tweak the dpm heuristics to better match
the workload for improved performance.

Signed-off-by: Alex Deucher 
Signed-off-by: Shashank Sharma 
---
  include/uapi/drm/amdgpu_drm.h | 18 ++
  1 file changed, 18 insertions(+)

diff --git a/include/uapi/drm/amdgpu_drm.h 
b/include/uapi/drm/amdgpu_drm.h

index c2c9c674a223..da5019a6e233 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -212,6 +212,8 @@ union drm_amdgpu_bo_list {
  #define AMDGPU_CTX_OP_QUERY_STATE2    4
  #define AMDGPU_CTX_OP_GET_STABLE_PSTATE    5
  #define AMDGPU_CTX_OP_SET_STABLE_PSTATE    6
+#define AMDGPU_CTX_OP_GET_WORKLOAD_PROFILE    7


Do we really have a use case for getting the profile or is that just 
added for completeness?
To be honest, there is no direct use case for a get(). We have a 
self-reset in the kernel to clear the profile, so this is just for the 
sake of completeness.


The problem is we usually need a userspace use case to justify 
upstreaming of a UAPI.


We already had a couple of cases where we only upstreamed UAPI for the 
sake of completeness (set/get pair for example) and then years later 
found out that the getter had been completely broken for years because 
nobody used it.


So if we don't really need it I would try to avoid it.

Christian.




At base minimum we need a test-case for both to exercise the UAPI.



Agree, I have already added some test cases in libdrm/tests/amdgpu for 
my local testing, I will start publishing them once we have an 
agreement on the UAPI and kernel design.


- Shashank


Regards,
Christian.


+#define AMDGPU_CTX_OP_SET_WORKLOAD_PROFILE    8
  /* GPU reset status */
  #define AMDGPU_CTX_NO_RESET    0
@@ -252,6 +254,17 @@ union drm_amdgpu_bo_list {
  #define AMDGPU_CTX_STABLE_PSTATE_MIN_MCLK  3
  #define AMDGPU_CTX_STABLE_PSTATE_PEAK  4
+/* GPU workload hints, flag bits 8-15 */
+#define AMDGPU_CTX_WORKLOAD_HINT_SHIFT 8
+#define AMDGPU_CTX_WORKLOAD_HINT_MASK  (0xff << 
AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
+#define AMDGPU_CTX_WORKLOAD_HINT_NONE  (0 << 
AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
+#define AMDGPU_CTX_WORKLOAD_HINT_3D    (1 << 
AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
+#define AMDGPU_CTX_WORKLOAD_HINT_VIDEO (2 << 
AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
+#define AMDGPU_CTX_WORKLOAD_HINT_VR    (3 << 
AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
+#define AMDGPU_CTX_WORKLOAD_HINT_COMPUTE   (4 << 
AMDGPU_CTX_WORKLOAD_HINT_SHIFT)

+#define AMDGPU_CTX_WORKLOAD_HINT_MAX AMDGPU_CTX_WORKLOAD_HINT_COMPUTE
+#define AMDGPU_CTX_WORKLOAD_INDEX(n)   (n >> 
AMDGPU_CTX_WORKLOAD_HINT_SHIFT)

+
  struct drm_amdgpu_ctx_in {
  /** AMDGPU_CTX_OP_* */
  __u32    op;
@@ -281,6 +294,11 @@ union drm_amdgpu_ctx_out {
  __u32    flags;
  __u32    _pad;
  } pstate;
+
+    struct {
+    __u32    flags;
+    __u32    _pad;
+    } workload;
  };
  union drm_amdgpu_ctx {






Re: [PATCH v2 1/5] drm/amdgpu: add UAPI for workload hints to ctx ioctl

2022-09-26 Thread Sharma, Shashank



Hello Christian,

On 9/26/2022 5:10 PM, Christian König wrote:

Am 26.09.22 um 17:02 schrieb Shashank Sharma:

Allow the user to specify a workload hint to the kernel.
We can use these to tweak the dpm heuristics to better match
the workload for improved performance.

Signed-off-by: Alex Deucher 
Signed-off-by: Shashank Sharma 
---
  include/uapi/drm/amdgpu_drm.h | 18 ++
  1 file changed, 18 insertions(+)

diff --git a/include/uapi/drm/amdgpu_drm.h 
b/include/uapi/drm/amdgpu_drm.h

index c2c9c674a223..da5019a6e233 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -212,6 +212,8 @@ union drm_amdgpu_bo_list {
  #define AMDGPU_CTX_OP_QUERY_STATE2    4
  #define AMDGPU_CTX_OP_GET_STABLE_PSTATE    5
  #define AMDGPU_CTX_OP_SET_STABLE_PSTATE    6
+#define AMDGPU_CTX_OP_GET_WORKLOAD_PROFILE    7


Do we really have a use case for getting the profile or is that just 
added for completeness?
To be honest, there is no direct use case for a get(). We have a 
self-reset in the kernel to clear the profile, so this is just for the 
sake of completeness.



At base minimum we need a test-case for both to exercise the UAPI.



Agree, I have already added some test cases in libdrm/tests/amdgpu for 
my local testing, I will start publishing them once we have an agreement 
on the UAPI and kernel design.


- Shashank


Regards,
Christian.


+#define AMDGPU_CTX_OP_SET_WORKLOAD_PROFILE    8
  /* GPU reset status */
  #define AMDGPU_CTX_NO_RESET    0
@@ -252,6 +254,17 @@ union drm_amdgpu_bo_list {
  #define AMDGPU_CTX_STABLE_PSTATE_MIN_MCLK  3
  #define AMDGPU_CTX_STABLE_PSTATE_PEAK  4
+/* GPU workload hints, flag bits 8-15 */
+#define AMDGPU_CTX_WORKLOAD_HINT_SHIFT 8
+#define AMDGPU_CTX_WORKLOAD_HINT_MASK  (0xff << 
AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
+#define AMDGPU_CTX_WORKLOAD_HINT_NONE  (0 << 
AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
+#define AMDGPU_CTX_WORKLOAD_HINT_3D    (1 << 
AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
+#define AMDGPU_CTX_WORKLOAD_HINT_VIDEO (2 << 
AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
+#define AMDGPU_CTX_WORKLOAD_HINT_VR    (3 << 
AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
+#define AMDGPU_CTX_WORKLOAD_HINT_COMPUTE   (4 << 
AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
+#define AMDGPU_CTX_WORKLOAD_HINT_MAX   
AMDGPU_CTX_WORKLOAD_HINT_COMPUTE
+#define AMDGPU_CTX_WORKLOAD_INDEX(n)   (n >> 
AMDGPU_CTX_WORKLOAD_HINT_SHIFT)

+
  struct drm_amdgpu_ctx_in {
  /** AMDGPU_CTX_OP_* */
  __u32    op;
@@ -281,6 +294,11 @@ union drm_amdgpu_ctx_out {
  __u32    flags;
  __u32    _pad;
  } pstate;
+
+    struct {
+    __u32    flags;
+    __u32    _pad;
+    } workload;
  };
  union drm_amdgpu_ctx {




Re: [PATCH v2 1/5] drm/amdgpu: add UAPI for workload hints to ctx ioctl

2022-09-26 Thread Christian König

Am 26.09.22 um 17:02 schrieb Shashank Sharma:

Allow the user to specify a workload hint to the kernel.
We can use these to tweak the dpm heuristics to better match
the workload for improved performance.

Signed-off-by: Alex Deucher 
Signed-off-by: Shashank Sharma 
---
  include/uapi/drm/amdgpu_drm.h | 18 ++
  1 file changed, 18 insertions(+)

diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index c2c9c674a223..da5019a6e233 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -212,6 +212,8 @@ union drm_amdgpu_bo_list {
  #define AMDGPU_CTX_OP_QUERY_STATE24
  #define AMDGPU_CTX_OP_GET_STABLE_PSTATE   5
  #define AMDGPU_CTX_OP_SET_STABLE_PSTATE   6
+#define AMDGPU_CTX_OP_GET_WORKLOAD_PROFILE 7


Do we really have a use case for getting the profile or is that just 
added for completeness?


At base minimum we need a test-case for both to exercise the UAPI.

Regards,
Christian.


+#define AMDGPU_CTX_OP_SET_WORKLOAD_PROFILE 8
  
  /* GPU reset status */

  #define AMDGPU_CTX_NO_RESET   0
@@ -252,6 +254,17 @@ union drm_amdgpu_bo_list {
  #define AMDGPU_CTX_STABLE_PSTATE_MIN_MCLK  3
  #define AMDGPU_CTX_STABLE_PSTATE_PEAK  4
  
+/* GPU workload hints, flag bits 8-15 */

+#define AMDGPU_CTX_WORKLOAD_HINT_SHIFT 8
+#define AMDGPU_CTX_WORKLOAD_HINT_MASK  (0xff << 
AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
+#define AMDGPU_CTX_WORKLOAD_HINT_NONE  (0 << 
AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
+#define AMDGPU_CTX_WORKLOAD_HINT_3D(1 << 
AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
+#define AMDGPU_CTX_WORKLOAD_HINT_VIDEO (2 << 
AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
+#define AMDGPU_CTX_WORKLOAD_HINT_VR(3 << 
AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
+#define AMDGPU_CTX_WORKLOAD_HINT_COMPUTE   (4 << 
AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
+#define AMDGPU_CTX_WORKLOAD_HINT_MAX  AMDGPU_CTX_WORKLOAD_HINT_COMPUTE
+#define AMDGPU_CTX_WORKLOAD_INDEX(n)  (n >> AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
+
  struct drm_amdgpu_ctx_in {
/** AMDGPU_CTX_OP_* */
__u32   op;
@@ -281,6 +294,11 @@ union drm_amdgpu_ctx_out {
__u32   flags;
__u32   _pad;
} pstate;
+
+   struct {
+   __u32   flags;
+   __u32   _pad;
+   } workload;
  };
  
  union drm_amdgpu_ctx {




Re: [PATCH v3] drm/amdgpu: Fix VRAM BO swap issue

2022-09-26 Thread Christian König

Am 26.09.22 um 17:04 schrieb Arunpravin Paneer Selvam:



On 9/26/2022 12:01 PM, Christian König wrote:

Am 26.09.22 um 07:25 schrieb Arunpravin Paneer Selvam:

The DRM buddy manager allocates contiguous memory requests in a
single block or in multiple blocks. So for the ttm move operation
(in case of low vram memory) we should consider all the blocks to
compute the total memory size, which is compared with the struct
ttm_resource num_pages in order to verify that the blocks are
contiguous for the eviction process.

v2: Added a Fixes tag
v3: Rewrite the code to save a bit of calculations and
 variables (Christian)

Fixes: c9cad937c0c5 ("drm/amdgpu: add drm buddy support to amdgpu")
Signed-off-by: Arunpravin Paneer Selvam 



Reviewed-by: Christian König 


Shall I push this patch into amd-staging-drm-next?


If it applies cleanly then I think so, yes.

Another possibility would be through drm-misc-fixes, but since it is a 
pure amdgpu fix we should try to avoid that.


Christian.



Thanks,
Arun.



---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 17 -
  1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c

index b1c455329023..dc262d2c2925 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -424,8 +424,9 @@ static int amdgpu_move_blit(struct 
ttm_buffer_object *bo,

  static bool amdgpu_mem_visible(struct amdgpu_device *adev,
 struct ttm_resource *mem)
  {
-    uint64_t mem_size = (u64)mem->num_pages << PAGE_SHIFT;
+    u64 mem_size = (u64)mem->num_pages << PAGE_SHIFT;
  struct amdgpu_res_cursor cursor;
+    u64 end;
    if (mem->mem_type == TTM_PL_SYSTEM ||
  mem->mem_type == TTM_PL_TT)
@@ -434,12 +435,18 @@ static bool amdgpu_mem_visible(struct 
amdgpu_device *adev,

  return false;
    amdgpu_res_first(mem, 0, mem_size, );
+    end = cursor.start + cursor.size;
+    while (cursor.remaining) {
+    amdgpu_res_next(, cursor.size);
  -    /* ttm_resource_ioremap only supports contiguous memory */
-    if (cursor.size != mem_size)
-    return false;
+    /* ttm_resource_ioremap only supports contiguous memory */
+    if (end != cursor.start)
+    return false;
+
+    end = cursor.start + cursor.size;
+    }
  -    return cursor.start + cursor.size <= 
adev->gmc.visible_vram_size;

+    return end <= adev->gmc.visible_vram_size;
  }
    /*
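
The cursor walk in the patch above boils down to a simple contiguity check; here is an illustrative model of the logic (the struct and helper are hypothetical, simplified from amdgpu_res_cursor):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Given the (start, size) of each buddy block, the allocation is
 * contiguous only if every block begins exactly where the previous
 * one ended. */
struct block { uint64_t start, size; };

static bool blocks_contiguous(const struct block *b, int n)
{
	uint64_t end = b[0].start + b[0].size;

	for (int i = 1; i < n; i++) {
		if (b[i].start != end)
			return false;
		end = b[i].start + b[i].size;
	}
	return true;
}
```

The old code only compared the first block's size against the total, so a multi-block contiguous allocation was wrongly rejected.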








Re: [PATCH v3] drm/amdgpu: Fix VRAM BO swap issue

2022-09-26 Thread Arunpravin Paneer Selvam




On 9/26/2022 12:01 PM, Christian König wrote:

Am 26.09.22 um 07:25 schrieb Arunpravin Paneer Selvam:

The DRM buddy manager allocates contiguous memory requests in a
single block or in multiple blocks. So for the ttm move operation
(in case of low vram memory) we should consider all the blocks to
compute the total memory size, which is compared with the struct
ttm_resource num_pages in order to verify that the blocks are
contiguous for the eviction process.

v2: Added a Fixes tag
v3: Rewrite the code to save a bit of calculations and
 variables (Christian)

Fixes: c9cad937c0c5 ("drm/amdgpu: add drm buddy support to amdgpu")
Signed-off-by: Arunpravin Paneer Selvam 



Reviewed-by: Christian König 


Shall I push this patch into amd-staging-drm-next?

Thanks,
Arun.



---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 17 -
  1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c

index b1c455329023..dc262d2c2925 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -424,8 +424,9 @@ static int amdgpu_move_blit(struct 
ttm_buffer_object *bo,

  static bool amdgpu_mem_visible(struct amdgpu_device *adev,
 struct ttm_resource *mem)
  {
-    uint64_t mem_size = (u64)mem->num_pages << PAGE_SHIFT;
+    u64 mem_size = (u64)mem->num_pages << PAGE_SHIFT;
  struct amdgpu_res_cursor cursor;
+    u64 end;
    if (mem->mem_type == TTM_PL_SYSTEM ||
  mem->mem_type == TTM_PL_TT)
@@ -434,12 +435,18 @@ static bool amdgpu_mem_visible(struct 
amdgpu_device *adev,

  return false;
    amdgpu_res_first(mem, 0, mem_size, );
+    end = cursor.start + cursor.size;
+    while (cursor.remaining) {
+    amdgpu_res_next(, cursor.size);
  -    /* ttm_resource_ioremap only supports contiguous memory */
-    if (cursor.size != mem_size)
-    return false;
+    /* ttm_resource_ioremap only supports contiguous memory */
+    if (end != cursor.start)
+    return false;
+
+    end = cursor.start + cursor.size;
+    }
  -    return cursor.start + cursor.size <= adev->gmc.visible_vram_size;
+    return end <= adev->gmc.visible_vram_size;
  }
    /*






[PATCH v2 5/5] drm/amdgpu: switch workload context to/from compute

2022-09-26 Thread Shashank Sharma
This patch switches the GPU workload mode to/from compute mode
while submitting a compute workload.

Signed-off-by: Alex Deucher 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index 5e53a5293935..1caed319a448 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -34,6 +34,7 @@
 #include "amdgpu_ras.h"
 #include "amdgpu_umc.h"
 #include "amdgpu_reset.h"
+#include "amdgpu_ctx_workload.h"
 
 /* Total memory size in system memory and all GPU VRAM. Used to
  * estimate worst case amount of memory to reserve for page tables
@@ -703,9 +704,16 @@ int amdgpu_amdkfd_submit_ib(struct amdgpu_device *adev,
 
 void amdgpu_amdkfd_set_compute_idle(struct amdgpu_device *adev, bool idle)
 {
-   amdgpu_dpm_switch_power_profile(adev,
-   PP_SMC_POWER_PROFILE_COMPUTE,
-   !idle);
+   int ret;
+
+   if (idle)
+   ret = amdgpu_clear_workload_profile(adev, 
AMDGPU_CTX_WORKLOAD_HINT_COMPUTE);
+   else
+   ret = amdgpu_set_workload_profile(adev, 
AMDGPU_CTX_WORKLOAD_HINT_COMPUTE);
+
+   if (ret)
+   drm_warn(>ddev, "Failed to %s power profile to compute 
mode\n",
+idle ? "reset" : "set");
 }
 
 bool amdgpu_amdkfd_is_kfd_vmid(struct amdgpu_device *adev, u32 vmid)
-- 
2.34.1



[PATCH v2 1/5] drm/amdgpu: add UAPI for workload hints to ctx ioctl

2022-09-26 Thread Shashank Sharma
Allow the user to specify a workload hint to the kernel.
We can use these to tweak the dpm heuristics to better match
the workload for improved performance.

Signed-off-by: Alex Deucher 
Signed-off-by: Shashank Sharma 
---
 include/uapi/drm/amdgpu_drm.h | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index c2c9c674a223..da5019a6e233 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -212,6 +212,8 @@ union drm_amdgpu_bo_list {
 #define AMDGPU_CTX_OP_QUERY_STATE2 4
 #define AMDGPU_CTX_OP_GET_STABLE_PSTATE5
 #define AMDGPU_CTX_OP_SET_STABLE_PSTATE6
+#define AMDGPU_CTX_OP_GET_WORKLOAD_PROFILE 7
+#define AMDGPU_CTX_OP_SET_WORKLOAD_PROFILE 8
 
 /* GPU reset status */
 #define AMDGPU_CTX_NO_RESET0
@@ -252,6 +254,17 @@ union drm_amdgpu_bo_list {
 #define AMDGPU_CTX_STABLE_PSTATE_MIN_MCLK  3
 #define AMDGPU_CTX_STABLE_PSTATE_PEAK  4
 
+/* GPU workload hints, flag bits 8-15 */
+#define AMDGPU_CTX_WORKLOAD_HINT_SHIFT 8
+#define AMDGPU_CTX_WORKLOAD_HINT_MASK  (0xff << 
AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
+#define AMDGPU_CTX_WORKLOAD_HINT_NONE  (0 << 
AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
+#define AMDGPU_CTX_WORKLOAD_HINT_3D(1 << 
AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
+#define AMDGPU_CTX_WORKLOAD_HINT_VIDEO (2 << 
AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
+#define AMDGPU_CTX_WORKLOAD_HINT_VR(3 << 
AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
+#define AMDGPU_CTX_WORKLOAD_HINT_COMPUTE   (4 << 
AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
+#define AMDGPU_CTX_WORKLOAD_HINT_MAX  AMDGPU_CTX_WORKLOAD_HINT_COMPUTE
+#define AMDGPU_CTX_WORKLOAD_INDEX(n)  (n >> AMDGPU_CTX_WORKLOAD_HINT_SHIFT)
+
 struct drm_amdgpu_ctx_in {
/** AMDGPU_CTX_OP_* */
__u32   op;
@@ -281,6 +294,11 @@ union drm_amdgpu_ctx_out {
__u32   flags;
__u32   _pad;
} pstate;
+
+   struct {
+   __u32   flags;
+   __u32   _pad;
+   } workload;
 };
 
 union drm_amdgpu_ctx {
-- 
2.34.1
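
The flag layout proposed above can be exercised from userspace terms as follows; this is a sketch with shortened macro names mirroring the patch, not part of the UAPI itself:

```c
#include <assert.h>
#include <stdint.h>

/* The workload hint lives in bits 8-15 of the ctx ioctl flags;
 * values mirror the AMDGPU_CTX_WORKLOAD_* macros in the patch. */
#define HINT_SHIFT	8
#define HINT_MASK	(0xffu << HINT_SHIFT)
#define HINT_COMPUTE	(4u << HINT_SHIFT)
#define HINT_INDEX(n)	((n) >> HINT_SHIFT)

/* Replace the hint field without disturbing the low flag bits. */
static uint32_t set_hint(uint32_t flags, uint32_t hint)
{
	return (flags & ~HINT_MASK) | hint;
}
```

Packing the hint into its own byte keeps it orthogonal to the existing flag bits 0-7.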



[PATCH v2 4/5] drm/amdgpu: switch GPU workload profile

2022-09-26 Thread Shashank Sharma
This patch switches the GPU workload profile based on the
workload hint information saved in the workload context.
The workload profile is reset to NONE when the job is done.

Signed-off-by: Alex Deucher 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c   |  2 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx_workload.c |  4 
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c  | 15 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.h  |  3 +++
 4 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index b7bae833c804..de906a42144f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -237,6 +237,8 @@ static int amdgpu_cs_parser_init(struct amdgpu_cs_parser 
*p, union drm_amdgpu_cs
goto free_all_kdata;
}
 
+   p->job->workload_mode = p->ctx->workload_mode;
+
if (p->uf_entry.tv.bo)
p->job->uf_addr = uf_offset;
kvfree(chunk_array);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx_workload.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx_workload.c
index a11cf29bc388..625114804121 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx_workload.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx_workload.c
@@ -55,15 +55,11 @@ int amdgpu_set_workload_profile(struct amdgpu_device *adev,
 
mutex_lock(>pm.smu_workload_lock);
 
-   if (adev->pm.workload_mode == hint)
-   goto unlock;
-
ret = amdgpu_dpm_switch_power_profile(adev, profile, 1);
if (!ret)
adev->pm.workload_mode = hint;
atomic_inc(>pm.workload_switch_ref);
 
-unlock:
mutex_unlock(>pm.smu_workload_lock);
return ret;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index c2fd6f3076a6..9300e86ee7c5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -30,6 +30,7 @@
 #include "amdgpu.h"
 #include "amdgpu_trace.h"
 #include "amdgpu_reset.h"
+#include "amdgpu_ctx_workload.h"
 
 static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job)
 {
@@ -144,6 +145,14 @@ void amdgpu_job_free_resources(struct amdgpu_job *job)
 static void amdgpu_job_free_cb(struct drm_sched_job *s_job)
 {
struct amdgpu_job *job = to_amdgpu_job(s_job);
+   struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched);
+
+   if (job->workload_mode != AMDGPU_CTX_WORKLOAD_HINT_NONE) {
+   if (amdgpu_clear_workload_profile(ring->adev, 
job->workload_mode))
+   DRM_WARN("Failed to come out of workload profile %s\n",
+   
amdgpu_workload_profile_name(job->workload_mode));
+   job->workload_mode = AMDGPU_CTX_WORKLOAD_HINT_NONE;
+   }
 
drm_sched_job_cleanup(s_job);
 
@@ -256,6 +265,12 @@ static struct dma_fence *amdgpu_job_run(struct 
drm_sched_job *sched_job)
DRM_ERROR("Error scheduling IBs (%d)\n", r);
}
 
+   if (job->workload_mode != AMDGPU_CTX_WORKLOAD_HINT_NONE) {
+   if (amdgpu_set_workload_profile(ring->adev, job->workload_mode))
+   DRM_WARN("Failed to set workload profile to %s\n",
+   amdgpu_workload_profile_name(job->workload_mode));
+   }
+
job->job_run_counter++;
amdgpu_job_free_resources(job);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
index babc0af751c2..573e8692c814 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
@@ -68,6 +68,9 @@ struct amdgpu_job {
/* job_run_counter >= 1 means a resubmit job */
uint32_tjob_run_counter;
 
+   /* workload mode hint for pm */
+   uint32_tworkload_mode;
+
uint32_tnum_ibs;
struct amdgpu_ibibs[];
 };
-- 
2.34.1



[PATCH v2 2/5] drm/amdgpu: add new functions to set GPU power profile

2022-09-26 Thread Shashank Sharma
This patch adds new functions which allow a user to change
the GPU power profile based on a GPU workload hint flag.

Cc: Alex Deucher 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/Makefile   |  2 +-
 .../gpu/drm/amd/amdgpu/amdgpu_ctx_workload.c  | 97 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c|  1 +
 .../gpu/drm/amd/include/amdgpu_ctx_workload.h | 54 +++
 drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h   |  5 +
 5 files changed, 158 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx_workload.c
 create mode 100644 drivers/gpu/drm/amd/include/amdgpu_ctx_workload.h

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile
index 5a283d12f8e1..34679c657ecc 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -50,7 +50,7 @@ amdgpu-y += amdgpu_device.o amdgpu_kms.o \
atombios_dp.o amdgpu_afmt.o amdgpu_trace_points.o \
atombios_encoders.o amdgpu_sa.o atombios_i2c.o \
amdgpu_dma_buf.o amdgpu_vm.o amdgpu_vm_pt.o amdgpu_ib.o amdgpu_pll.o \
-   amdgpu_ucode.o amdgpu_bo_list.o amdgpu_ctx.o amdgpu_sync.o \
+   amdgpu_ucode.o amdgpu_bo_list.o amdgpu_ctx.o amdgpu_ctx_workload.o amdgpu_sync.o \
amdgpu_gtt_mgr.o amdgpu_preempt_mgr.o amdgpu_vram_mgr.o amdgpu_virt.o \
amdgpu_atomfirmware.o amdgpu_vf_error.o amdgpu_sched.o \
amdgpu_debugfs.o amdgpu_ids.o amdgpu_gmc.o \
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx_workload.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx_workload.c
new file mode 100644
index ..a11cf29bc388
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx_workload.c
@@ -0,0 +1,97 @@
+/*
+ * Copyright 2022 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+#include 
+#include "kgd_pp_interface.h"
+#include "amdgpu_ctx_workload.h"
+
+static enum PP_SMC_POWER_PROFILE
+amdgpu_workload_to_power_profile(uint32_t hint)
+{
+   switch (hint) {
+   case AMDGPU_CTX_WORKLOAD_HINT_NONE:
+   default:
+   return PP_SMC_POWER_PROFILE_BOOTUP_DEFAULT;
+
+   case AMDGPU_CTX_WORKLOAD_HINT_3D:
+   return PP_SMC_POWER_PROFILE_FULLSCREEN3D;
+   case AMDGPU_CTX_WORKLOAD_HINT_VIDEO:
+   return PP_SMC_POWER_PROFILE_VIDEO;
+   case AMDGPU_CTX_WORKLOAD_HINT_VR:
+   return PP_SMC_POWER_PROFILE_VR;
+   case AMDGPU_CTX_WORKLOAD_HINT_COMPUTE:
+   return PP_SMC_POWER_PROFILE_COMPUTE;
+   }
+}
+
+int amdgpu_set_workload_profile(struct amdgpu_device *adev,
+   uint32_t hint)
+{
+   int ret = 0;
+   enum PP_SMC_POWER_PROFILE profile =
+   amdgpu_workload_to_power_profile(hint);
+
+   if (adev->pm.workload_mode == hint)
+   return 0;
+
+   mutex_lock(&adev->pm.smu_workload_lock);
+
+   if (adev->pm.workload_mode == hint)
+   goto unlock;
+
+   ret = amdgpu_dpm_switch_power_profile(adev, profile, 1);
+   if (!ret)
+   adev->pm.workload_mode = hint;
+   atomic_inc(&adev->pm.workload_switch_ref);
+
+unlock:
+   mutex_unlock(&adev->pm.smu_workload_lock);
+   return ret;
+}
+
+int amdgpu_clear_workload_profile(struct amdgpu_device *adev,
+ uint32_t hint)
+{
+   int ret = 0;
+   enum PP_SMC_POWER_PROFILE profile =
+   amdgpu_workload_to_power_profile(hint);
+
+   if (hint == AMDGPU_CTX_WORKLOAD_HINT_NONE)
+   return 0;
+
+   /* Do not reset GPU power profile if another reset is coming */
+   if (atomic_dec_return(&adev->pm.workload_switch_ref) > 0)
+   return 0;
+
+   mutex_lock(&adev->pm.smu_workload_lock);
+
+   if (adev->pm.workload_mode != hint)
+   goto unlock;
+
+   ret = amdgpu_dpm_switch_power_profile(adev, profile, 0);
+   if (!ret)
+

[PATCH v2 3/5] drm/amdgpu: set GPU workload via ctx IOCTL

2022-09-26 Thread Shashank Sharma
This patch adds new IOCTL flags in the amdgpu context IOCTL to get
and set the GPU workload profile. These calls allow a user to
switch to a GPU power profile which might be better suited to
its workload type. The currently supported workload types are:
"None": Default workload profile
"3D": Workload profile for 3D rendering work
"Video": Workload profile for Media/Encode/Decode work
"VR": Workload profile for VR rendering work
"Compute": Workload profile for Compute work

The workload hint flag is saved in the GPU context, and then it is
applied when we actually run the job.

Cc: Alex Deucher 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 56 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h |  1 +
 2 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index 8ee4e8491f39..5a116e5bb6a9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -27,6 +27,7 @@
 #include "amdgpu.h"
 #include "amdgpu_sched.h"
 #include "amdgpu_ras.h"
+#include "amdgpu_ctx_workload.h"
 #include 
 
 #define to_amdgpu_ctx_entity(e)\
@@ -328,7 +329,7 @@ static int amdgpu_ctx_init(struct amdgpu_ctx_mgr *mgr, 
int32_t priority,
return r;
 
ctx->stable_pstate = current_stable_pstate;
-
+   ctx->workload_mode = AMDGPU_CTX_WORKLOAD_HINT_NONE;
return 0;
 }
 
@@ -633,11 +634,41 @@ static int amdgpu_ctx_stable_pstate(struct amdgpu_device 
*adev,
return r;
 }
 
+static int amdgpu_ctx_workload_profile(struct amdgpu_device *adev,
+  struct amdgpu_fpriv *fpriv, uint32_t id,
+  bool set, u32 *profile)
+{
+   struct amdgpu_ctx *ctx;
+   struct amdgpu_ctx_mgr *mgr;
+   u32 workload_hint;
+
+   if (!fpriv)
+   return -EINVAL;
+
+   mgr = &fpriv->ctx_mgr;
+   mutex_lock(&mgr->lock);
+   ctx = idr_find(&mgr->ctx_handles, id);
+   if (!ctx) {
+   mutex_unlock(&mgr->lock);
+   return -EINVAL;
+   }
+
+   if (set) {
+   workload_hint = *profile;
+   ctx->workload_mode = workload_hint;
+   } else {
+   *profile = adev->pm.workload_mode;
+   }
+
+   mutex_unlock(&mgr->lock);
+   return 0;
+}
+
 int amdgpu_ctx_ioctl(struct drm_device *dev, void *data,
 struct drm_file *filp)
 {
int r;
-   uint32_t id, stable_pstate;
+   uint32_t id, stable_pstate, wl_hint;
int32_t priority;
 
union drm_amdgpu_ctx *args = data;
@@ -681,6 +712,27 @@ int amdgpu_ctx_ioctl(struct drm_device *dev, void *data,
return -EINVAL;
		r = amdgpu_ctx_stable_pstate(adev, fpriv, id, true, &stable_pstate);
break;
+   case AMDGPU_CTX_OP_GET_WORKLOAD_PROFILE:
+   if (args->in.flags & ~AMDGPU_CTX_WORKLOAD_HINT_MASK)
+   return -EINVAL;
		r = amdgpu_ctx_workload_profile(adev, fpriv, id, false, &wl_hint);
+   if (!r)
+   args->out.workload.flags = wl_hint;
+   break;
+   case AMDGPU_CTX_OP_SET_WORKLOAD_PROFILE:
+   if (args->in.flags & ~AMDGPU_CTX_WORKLOAD_HINT_MASK)
+   return -EINVAL;
+   wl_hint = args->in.flags & AMDGPU_CTX_WORKLOAD_HINT_MASK;
+   if (wl_hint > AMDGPU_CTX_WORKLOAD_HINT_MAX)
+   return -EINVAL;
		r = amdgpu_ctx_workload_profile(adev, fpriv, id, true, &wl_hint);
+   if (r)
+   DRM_ERROR("Failed to set workload profile to %s\n",
+   amdgpu_workload_profile_name(wl_hint));
+   else
+   DRM_DEBUG_DRIVER("Workload profile set to %s\n",
+   amdgpu_workload_profile_name(wl_hint));
+   break;
default:
return -EINVAL;
}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
index cc7c8afff414..6c8032c3291a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
@@ -58,6 +58,7 @@ struct amdgpu_ctx {
unsigned long   ras_counter_ce;
unsigned long   ras_counter_ue;
uint32_tstable_pstate;
+   uint32_tworkload_mode;
 };
 
 struct amdgpu_ctx_mgr {
-- 
2.34.1



[PATCH v2 0/5] GPU workload hints for better performance

2022-09-26 Thread Shashank Sharma
AMDGPU SOCs support dynamic workload-based power profiles, which can
provide fine-tuned performance for a particular type of workload.
This patch series adds an interface to set/reset these power profiles
based on workload type hints. A user can set a hint for the type of
workload being submitted to the GPU, and the driver can dynamically
switch to the power profile best suited to that kind of workload.

Currently supported workload profiles are:
"None", "3D", "Video", "VR", "Compute"

V2: This version addresses the review comment from Christian about
changing the design to set the workload mode in a more dynamic way
than during context creation.

Shashank Sharma (5):
  drm/amdgpu: add UAPI for workload hints to ctx ioctl
  drm/amdgpu: add new functions to set GPU power profile
  drm/amdgpu: set GPU workload via ctx IOCTL
  drm/amdgpu: switch GPU workload profile
  drm/amdgpu: switch workload context to/from compute

 drivers/gpu/drm/amd/amdgpu/Makefile   |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c| 14 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c|  2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c   | 56 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h   |  1 +
 .../gpu/drm/amd/amdgpu/amdgpu_ctx_workload.c  | 93 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c|  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c   | 15 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.h   |  3 +
 .../gpu/drm/amd/include/amdgpu_ctx_workload.h | 54 +++
 drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h   |  5 +
 include/uapi/drm/amdgpu_drm.h | 18 
 12 files changed, 258 insertions(+), 6 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx_workload.c
 create mode 100644 drivers/gpu/drm/amd/include/amdgpu_ctx_workload.h

-- 
2.34.1



Re: [PATCH] drm/amdgpu: fix a compiling error

2022-09-26 Thread Christian König

On 26.09.22 at 11:22, Asher Song wrote:

This patch fixes the following compile error:

error: ‘for’ loop initial declarations are only allowed in C99 mode
   for (int i = 0; i < dc->res_pool->res_cap->num_dsc; i++) {


Newer kernels are compiled in C99 mode, so this is now legal.

We should probably still fix that issue since we often port back things 
to older kernels.




Signed-off-by: Asher Song 
---
  drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c 
b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c
index 772ad200c5da..d58c5085600a 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c
@@ -1405,8 +1405,8 @@ void dcn32_update_dsc_pg(struct dc *dc,
bool safe_to_disable)
  {
struct dce_hwseq *hws = dc->hwseq;
-
-   for (int i = 0; i < dc->res_pool->res_cap->num_dsc; i++) {
+   int i;
+   for (i = 0; i < dc->res_pool->res_cap->num_dsc; i++) {


An empty line is required between declarations and code. Try to use 
checkpatch.pl before sending out the patch.


Regards,
Christian.


struct display_stream_compressor *dsc = dc->res_pool->dscs[i];
bool is_dsc_ungated = hws->funcs.dsc_pg_status(hws, dsc->inst);
  




[PATCH] drm/amdgpu: fix a compiling error

2022-09-26 Thread Asher Song
This patch fixes the following compile error:

error: ‘for’ loop initial declarations are only allowed in C99 mode
  for (int i = 0; i < dc->res_pool->res_cap->num_dsc; i++) {

Signed-off-by: Asher Song 
---
 drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c 
b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c
index 772ad200c5da..d58c5085600a 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c
@@ -1405,8 +1405,8 @@ void dcn32_update_dsc_pg(struct dc *dc,
bool safe_to_disable)
 {
struct dce_hwseq *hws = dc->hwseq;
-
-   for (int i = 0; i < dc->res_pool->res_cap->num_dsc; i++) {
+   int i;
+   for (i = 0; i < dc->res_pool->res_cap->num_dsc; i++) {
struct display_stream_compressor *dsc = dc->res_pool->dscs[i];
bool is_dsc_ungated = hws->funcs.dsc_pg_status(hws, dsc->inst);
 
-- 
2.25.1



Re: [PATCH] drm/amdgpu: Always align dumb buffer at PAGE_SIZE

2022-09-26 Thread Christian König

I was thinking about that as well, yes.

Might be a good idea to just change the alignment check in 
amdgpu_bo_create():


    /* Memory should be aligned at least to a page size. */
    page_align = ALIGN(bp->byte_align, PAGE_SIZE) >> PAGE_SHIFT;



Something like ALIGN(bp->byte_align ?: 1, PAGE_SIZE) should already do it.

Christian.

On 23.09.22 at 08:23, Marek Olšák wrote:

The kernel could report the true alignment from the ioctl instead of 0.

Marek

On Fri, Sep 23, 2022 at 1:31 AM Christian König 
 wrote:


Am 23.09.22 um 07:28 schrieb lepton:
> On Thu, Sep 22, 2022 at 10:14 PM Christian König
>  wrote:
>> Am 23.09.22 um 01:04 schrieb Lepton Wu:
>>> Since size has been aligned to PAGE_SIZE already, just align it
>>> to PAGE_SIZE so later the buffer can be used as a texture in mesa
>>> after

https://cgit.freedesktop.org/mesa/mesa/commit/?id=f7a4051b8


>>> Otherwise, si_texture_create_object will fail at line
>>> "buf->alignment < tex->surface.alignment"
>> I don't think that those Mesa checks are a good idea in the
first place.
>>
>> The alignment value is often specified as zero when it doesn't
matter
>> because the minimum alignment can never be less than the page size.
> Are you suggesting to change those mesa checks?

Yes, the minimum alignment of allocations is always 4096 because
that's
the page size of the GPU.

> While that can be
> done, I still think a kernel side "fix" is still
> useful since it doesn't hurt while can fix issues for some
versions of mesa.

No, we have tons of places where we don't specify an alignment for
buffers because it never mattered. I certainly don't want to fix
all of
those.

Regards,
Christian.

>> Christian.
>>
>>> Signed-off-by: Lepton Wu 
>>> ---
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 +-
>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>> index 8ef31d687ef3b..8dca0c920d3ce 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>> @@ -928,7 +928,7 @@ int amdgpu_mode_dumb_create(struct
drm_file *file_priv,
>>>        args->size = ALIGN(args->size, PAGE_SIZE);
>>>        domain = amdgpu_bo_get_preferred_domain(adev,
>>> amdgpu_display_supported_domains(adev, flags));
>>> -     r = amdgpu_gem_object_create(adev, args->size, 0, domain, flags,
>>> +     r = amdgpu_gem_object_create(adev, args->size, PAGE_SIZE, domain, flags,
>>>  ttm_bo_type_device, NULL, &gobj);
>>>        if (r)
>>>                return -ENOMEM;



[PATCH 4/7] mm/migrate_device.c: Add migrate_device_range()

2022-09-26 Thread Alistair Popple
Device drivers can use the migrate_vma family of functions to migrate
existing private anonymous mappings to device private pages. These pages
are backed by memory on the device with drivers being responsible for
copying data to and from device memory.

Device private pages are freed via the pgmap->page_free() callback when
they are unmapped and their refcount drops to zero. Alternatively they
may be freed indirectly via migration back to CPU memory in response to
a pgmap->migrate_to_ram() callback called whenever the CPU accesses
an address mapped to a device private page.

In other words drivers cannot control the lifetime of data allocated on
the devices and must wait until these pages are freed from userspace.
This causes issues when memory needs to be reclaimed on the device, either
because the device is going away due to a ->release() callback or
because another user needs to use the memory.

Drivers could use the existing migrate_vma functions to migrate data off
the device. However this would require them to track the mappings of
each page which is both complicated and not always possible. Instead
drivers need to be able to migrate device pages directly so they can
free up device memory.

To allow that this patch introduces the migrate_device family of
functions which are functionally similar to migrate_vma but which skip
the initial lookup based on mapping.

Signed-off-by: Alistair Popple 
---
 include/linux/migrate.h |  7 +++-
 mm/migrate_device.c | 89 ++
 2 files changed, 89 insertions(+), 7 deletions(-)

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 82ffa47..582cdc7 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -225,6 +225,13 @@ struct migrate_vma {
 int migrate_vma_setup(struct migrate_vma *args);
 void migrate_vma_pages(struct migrate_vma *migrate);
 void migrate_vma_finalize(struct migrate_vma *migrate);
+int migrate_device_range(unsigned long *src_pfns, unsigned long start,
+   unsigned long npages);
+void migrate_device_pages(unsigned long *src_pfns, unsigned long *dst_pfns,
+   unsigned long npages);
+void migrate_device_finalize(unsigned long *src_pfns,
+   unsigned long *dst_pfns, unsigned long npages);
+
 #endif /* CONFIG_MIGRATION */
 
 #endif /* _LINUX_MIGRATE_H */
diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index ba479b5..824860a 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -681,7 +681,7 @@ static void migrate_vma_insert_page(struct migrate_vma 
*migrate,
*src &= ~MIGRATE_PFN_MIGRATE;
 }
 
-static void migrate_device_pages(unsigned long *src_pfns,
+static void __migrate_device_pages(unsigned long *src_pfns,
unsigned long *dst_pfns, unsigned long npages,
struct migrate_vma *migrate)
 {
@@ -703,6 +703,9 @@ static void migrate_device_pages(unsigned long *src_pfns,
if (!page) {
unsigned long addr;
 
+   if (!(src_pfns[i] & MIGRATE_PFN_MIGRATE))
+   continue;
+
/*
 * The only time there is no vma is when called from
 * migrate_device_coherent_page(). However this isn't
@@ -710,8 +713,6 @@ static void migrate_device_pages(unsigned long *src_pfns,
 */
VM_BUG_ON(!migrate);
addr = migrate->start + i*PAGE_SIZE;
-   if (!(src_pfns[i] & MIGRATE_PFN_MIGRATE))
-   continue;
if (!notified) {
notified = true;
 
@@ -767,6 +768,22 @@ static void migrate_device_pages(unsigned long *src_pfns,
 }
 
 /**
+ * migrate_device_pages() - migrate meta-data from src page to dst page
+ * @src_pfns: src_pfns returned from migrate_device_range()
+ * @dst_pfns: array of pfns allocated by the driver to migrate memory to
+ * @npages: number of pages in the range
+ *
+ * Equivalent to migrate_vma_pages(). This is called to migrate struct page
+ * meta-data from source struct page to destination.
+ */
+void migrate_device_pages(unsigned long *src_pfns, unsigned long *dst_pfns,
+   unsigned long npages)
+{
+   __migrate_device_pages(src_pfns, dst_pfns, npages, NULL);
+}
+EXPORT_SYMBOL(migrate_device_pages);
+
+/**
  * migrate_vma_pages() - migrate meta-data from src page to dst page
  * @migrate: migrate struct containing all migration information
  *
@@ -776,12 +793,22 @@ static void migrate_device_pages(unsigned long *src_pfns,
  */
 void migrate_vma_pages(struct migrate_vma *migrate)
 {
-   migrate_device_pages(migrate->src, migrate->dst, migrate->npages, 
migrate);
+   __migrate_device_pages(migrate->src, migrate->dst, migrate->npages, 
migrate);
 }
 EXPORT_SYMBOL(migrate_vma_pages);
 
-static void 

[PATCH 0/7] Fix several device private page reference counting issues

2022-09-26 Thread Alistair Popple
This series aims to fix a number of page reference counting issues in drivers
dealing with device private ZONE_DEVICE pages. These result in use-after-free
type bugs, either from accessing a struct page which no longer exists because it
has been removed or accessing fields within the struct page which are no longer
valid because the page has been freed.

During normal usage it is unlikely these will cause any problems. However
without these fixes it is possible to crash the kernel from userspace. These
crashes can be triggered either by unloading the kernel module or unbinding the
device from the driver prior to a userspace task exiting. In modules such as
Nouveau it is also possible to trigger some of these issues by explicitly
closing the device file-descriptor prior to the task exiting and then accessing
device private memory.

This involves changes to both PowerPC and AMD GPU code. Unfortunately I lack the
hardware to test on either of these so would appreciate it if someone with
access could test those.

Alistair Popple (7):
  mm/memory.c: Fix race when faulting a device private page
  mm: Free device private pages have zero refcount
  mm/migrate_device.c: Refactor migrate_vma and migrate_device_coherent_page()
  mm/migrate_device.c: Add migrate_device_range()
  nouveau/dmem: Refactor nouveau_dmem_fault_copy_one()
  nouveau/dmem: Evict device private memory during release
  hmm-tests: Add test for migrate_device_range()

 arch/powerpc/kvm/book3s_hv_uvmem.c   |  16 +-
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c |  18 +-
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.h |   2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c |  11 +-
 drivers/gpu/drm/nouveau/nouveau_dmem.c   | 108 +++
 include/linux/migrate.h  |  15 ++-
 lib/test_hmm.c   | 127 ++---
 lib/test_hmm_uapi.h  |   1 +-
 mm/memory.c  |  16 +-
 mm/memremap.c|   5 +-
 mm/migrate.c |  34 +--
 mm/migrate_device.c  | 239 +---
 mm/page_alloc.c  |   6 +-
 tools/testing/selftests/vm/hmm-tests.c   |  49 +-
 14 files changed, 487 insertions(+), 160 deletions(-)

base-commit: 088b8aa537c2c767765f1c19b555f21ffe555786
-- 
git-series 0.9.1


[PATCH 7/7] hmm-tests: Add test for migrate_device_range()

2022-09-26 Thread Alistair Popple
Signed-off-by: Alistair Popple 
---
 lib/test_hmm.c | 119 +-
 lib/test_hmm_uapi.h|   1 +-
 tools/testing/selftests/vm/hmm-tests.c |  49 +++-
 3 files changed, 148 insertions(+), 21 deletions(-)

diff --git a/lib/test_hmm.c b/lib/test_hmm.c
index 2bd3a67..d2821dd 100644
--- a/lib/test_hmm.c
+++ b/lib/test_hmm.c
@@ -100,6 +100,7 @@ struct dmirror {
 struct dmirror_chunk {
struct dev_pagemap  pagemap;
struct dmirror_device   *mdevice;
+   bool remove;
 };
 
 /*
@@ -192,11 +193,15 @@ static int dmirror_fops_release(struct inode *inode, 
struct file *filp)
return 0;
 }
 
+static struct dmirror_chunk *dmirror_page_to_chunk(struct page *page)
+{
+   return container_of(page->pgmap, struct dmirror_chunk, pagemap);
+}
+
 static struct dmirror_device *dmirror_page_to_device(struct page *page)
 
 {
-   return container_of(page->pgmap, struct dmirror_chunk,
-   pagemap)->mdevice;
+   return dmirror_page_to_chunk(page)->mdevice;
 }
 
 static int dmirror_do_fault(struct dmirror *dmirror, struct hmm_range *range)
@@ -1219,6 +1224,84 @@ static int dmirror_snapshot(struct dmirror *dmirror,
return ret;
 }
 
+static void dmirror_device_evict_chunk(struct dmirror_chunk *chunk)
+{
+   unsigned long start_pfn = chunk->pagemap.range.start >> PAGE_SHIFT;
+   unsigned long end_pfn = chunk->pagemap.range.end >> PAGE_SHIFT;
+   unsigned long npages = end_pfn - start_pfn + 1;
+   unsigned long i;
+   unsigned long *src_pfns;
+   unsigned long *dst_pfns;
+
+   src_pfns = kcalloc(npages, sizeof(*src_pfns), GFP_KERNEL);
+   dst_pfns = kcalloc(npages, sizeof(*dst_pfns), GFP_KERNEL);
+
+   migrate_device_range(src_pfns, start_pfn, npages);
+   for (i = 0; i < npages; i++) {
+   struct page *dpage, *spage;
+
+   spage = migrate_pfn_to_page(src_pfns[i]);
+   if (!spage || !(src_pfns[i] & MIGRATE_PFN_MIGRATE))
+   continue;
+
+   if (WARN_ON(!is_device_private_page(spage) &&
+   !is_device_coherent_page(spage)))
+   continue;
+   spage = BACKING_PAGE(spage);
+   dpage = alloc_page(GFP_HIGHUSER_MOVABLE | __GFP_NOFAIL);
+   lock_page(dpage);
+   copy_highpage(dpage, spage);
+   dst_pfns[i] = migrate_pfn(page_to_pfn(dpage));
+   if (src_pfns[i] & MIGRATE_PFN_WRITE)
+   dst_pfns[i] |= MIGRATE_PFN_WRITE;
+   }
+   migrate_device_pages(src_pfns, dst_pfns, npages);
+   migrate_device_finalize(src_pfns, dst_pfns, npages);
+   kfree(src_pfns);
+   kfree(dst_pfns);
+}
+
+/* Removes free pages from the free list so they can't be re-allocated */
+static void dmirror_remove_free_pages(struct dmirror_chunk *devmem)
+{
+   struct dmirror_device *mdevice = devmem->mdevice;
+   struct page *page;
+
+   for (page = mdevice->free_pages; page; page = page->zone_device_data)
+   if (dmirror_page_to_chunk(page) == devmem)
+   mdevice->free_pages = page->zone_device_data;
+}
+
+static void dmirror_device_remove_chunks(struct dmirror_device *mdevice)
+{
+   unsigned int i;
+
+   mutex_lock(&mdevice->devmem_lock);
+   if (mdevice->devmem_chunks) {
+   for (i = 0; i < mdevice->devmem_count; i++) {
+   struct dmirror_chunk *devmem =
+   mdevice->devmem_chunks[i];
+
+   spin_lock(&mdevice->lock);
+   devmem->remove = true;
+   dmirror_remove_free_pages(devmem);
+   spin_unlock(&mdevice->lock);
+
+   dmirror_device_evict_chunk(devmem);
+   memunmap_pages(&devmem->pagemap);
+   if (devmem->pagemap.type == MEMORY_DEVICE_PRIVATE)
+   release_mem_region(devmem->pagemap.range.start,
+  range_len(&devmem->pagemap.range));
+   kfree(devmem);
+   }
+   mdevice->devmem_count = 0;
+   mdevice->devmem_capacity = 0;
+   mdevice->free_pages = NULL;
+   kfree(mdevice->devmem_chunks);
+   }
+   mutex_unlock(&mdevice->devmem_lock);
+}
+
 static long dmirror_fops_unlocked_ioctl(struct file *filp,
unsigned int command,
unsigned long arg)
@@ -1273,6 +1356,11 @@ static long dmirror_fops_unlocked_ioctl(struct file 
*filp,
		ret = dmirror_snapshot(dmirror, &cmd);
break;
 
+   case HMM_DMIRROR_RELEASE:
+   dmirror_device_remove_chunks(dmirror->mdevice);
+   ret = 0;
+   break;
+
default:
return -EINVAL;
}
@@ -1327,9 +1415,13 @@ static void 

[PATCH 3/7] mm/migrate_device.c: Refactor migrate_vma and migrate_device_coherent_page()

2022-09-26 Thread Alistair Popple
migrate_device_coherent_page() reuses the existing migrate_vma family of
functions to migrate a specific page without providing a valid mapping
or vma. This looks a bit odd because it means we are calling
migrate_vma_*() without setting a valid vma, however it was considered
acceptable at the time because the details were internal to
migrate_device.c and there was only a single user.

One of the reasons the details could be kept internal was that this was
strictly for migrating device coherent memory. Such memory can be copied
directly by the CPU without intervention from a driver. However this
isn't true for device private memory, and a future change requires
similar functionality for device private memory. So refactor the code
into something more sensible for migrating device memory without a vma.

Signed-off-by: Alistair Popple 
---
 mm/migrate_device.c | 150 +
 1 file changed, 85 insertions(+), 65 deletions(-)

diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index f756c00..ba479b5 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -345,26 +345,20 @@ static bool migrate_vma_check_page(struct page *page, 
struct page *fault_page)
 }
 
 /*
- * migrate_vma_unmap() - replace page mapping with special migration pte entry
- * @migrate: migrate struct containing all migration information
- *
- * Isolate pages from the LRU and replace mappings (CPU page table pte) with a
- * special migration pte entry and check if it has been pinned. Pinned pages 
are
- * restored because we cannot migrate them.
- *
- * This is the last step before we call the device driver callback to allocate
- * destination memory and copy contents of original page over to new page.
+ * Unmaps pages for migration. Returns number of unmapped pages.
  */
-static void migrate_vma_unmap(struct migrate_vma *migrate)
+static unsigned long migrate_device_unmap(unsigned long *src_pfns,
+ unsigned long npages,
+ struct page *fault_page)
 {
-   const unsigned long npages = migrate->npages;
unsigned long i, restore = 0;
bool allow_drain = true;
+   unsigned long unmapped = 0;
 
lru_add_drain();
 
for (i = 0; i < npages; i++) {
-   struct page *page = migrate_pfn_to_page(migrate->src[i]);
+   struct page *page = migrate_pfn_to_page(src_pfns[i]);
struct folio *folio;
 
if (!page)
@@ -379,8 +373,7 @@ static void migrate_vma_unmap(struct migrate_vma *migrate)
}
 
if (isolate_lru_page(page)) {
-   migrate->src[i] &= ~MIGRATE_PFN_MIGRATE;
-   migrate->cpages--;
+   src_pfns[i] &= ~MIGRATE_PFN_MIGRATE;
restore++;
continue;
}
@@ -394,34 +387,54 @@ static void migrate_vma_unmap(struct migrate_vma *migrate)
try_to_migrate(folio, 0);
 
if (page_mapped(page) ||
-   !migrate_vma_check_page(page, migrate->fault_page)) {
+   !migrate_vma_check_page(page, fault_page)) {
if (!is_zone_device_page(page)) {
get_page(page);
putback_lru_page(page);
}
 
-   migrate->src[i] &= ~MIGRATE_PFN_MIGRATE;
-   migrate->cpages--;
+   src_pfns[i] &= ~MIGRATE_PFN_MIGRATE;
restore++;
continue;
}
+
+   unmapped++;
}
 
for (i = 0; i < npages && restore; i++) {
-   struct page *page = migrate_pfn_to_page(migrate->src[i]);
+   struct page *page = migrate_pfn_to_page(src_pfns[i]);
struct folio *folio;
 
-   if (!page || (migrate->src[i] & MIGRATE_PFN_MIGRATE))
+   if (!page || (src_pfns[i] & MIGRATE_PFN_MIGRATE))
continue;
 
folio = page_folio(page);
remove_migration_ptes(folio, folio, false);
 
-   migrate->src[i] = 0;
+   src_pfns[i] = 0;
folio_unlock(folio);
folio_put(folio);
restore--;
}
+
+   return unmapped;
+}
+
+/*
+ * migrate_vma_unmap() - replace page mapping with special migration pte entry
+ * @migrate: migrate struct containing all migration information
+ *
+ * Isolate pages from the LRU and replace mappings (CPU page table pte) with a
+ * special migration pte entry and check if it has been pinned. Pinned pages 
are
+ * restored because we cannot migrate them.
+ *
+ * This is the last step before we call the device driver callback to allocate
+ * destination memory and copy contents of original 

Re: [PATCH v7 0/9] dyndbg: drm.debug adaptation

2022-09-26 Thread Greg KH
On Sun, Sep 11, 2022 at 11:28:43PM -0600, Jim Cromie wrote:
> hi Greg, Dan, Jason, DRM-folk,
> 
> heres follow-up to V6:
>   rebased on driver-core/driver-core-next for -v6 applied bits (thanks)
>   rework drm_debug_enabled{_raw,_instrumented,} per Dan.
> 
> It excludes:
>   nouveau parts (immature)
>   tracefs parts (I missed --to=Steve on v6)
>   split _ddebug_site and de-duplicate experiment (way unready)
> 
> IOW, its the remaining commits of V6 on which Dan gave his Reviewed-by.
> 
> If these are good to apply, I'll rebase and repost the rest separately.

All now queued up, thanks.

greg k-h


[PATCH v2] drivers/amd/pm: check the return value of amdgpu_bo_kmap

2022-09-26 Thread Li Zhong
amdgpu_bo_kmap() returns an error when it fails to map the buffer object. Add
the error check and propagate the error.

Signed-off-by: Li Zhong 
---

v2: revise the compile error

 drivers/gpu/drm/amd/pm/legacy-dpm/kv_dpm.c   | 5 -
 drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c | 5 -
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/legacy-dpm/kv_dpm.c 
b/drivers/gpu/drm/amd/pm/legacy-dpm/kv_dpm.c
index 8fd0782a2b20..f5e08b60f66e 100644
--- a/drivers/gpu/drm/amd/pm/legacy-dpm/kv_dpm.c
+++ b/drivers/gpu/drm/amd/pm/legacy-dpm/kv_dpm.c
@@ -1384,13 +1384,16 @@ static int kv_dpm_enable(struct amdgpu_device *adev)
 static void kv_dpm_disable(struct amdgpu_device *adev)
 {
struct kv_power_info *pi = kv_get_pi(adev);
+   int err;
 
	amdgpu_irq_put(adev, &adev->pm.dpm.thermal.irq,
		       AMDGPU_THERMAL_IRQ_LOW_TO_HIGH);
	amdgpu_irq_put(adev, &adev->pm.dpm.thermal.irq,
		       AMDGPU_THERMAL_IRQ_HIGH_TO_LOW);
 
-   amdgpu_kv_smc_bapm_enable(adev, false);
+   err = amdgpu_kv_smc_bapm_enable(adev, false);
+   if (err)
+   DRM_ERROR("amdgpu_kv_smc_bapm_enable failed\n");
 
if (adev->asic_type == CHIP_MULLINS)
kv_enable_nb_dpm(adev, false);
diff --git a/drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c 
b/drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c
index 1eb4e613b27a..ec055858eb95 100644
--- a/drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c
+++ b/drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c
@@ -1485,6 +1485,7 @@ static int pp_get_prv_buffer_details(void *handle, void 
**addr, size_t *size)
 {
struct pp_hwmgr *hwmgr = handle;
struct amdgpu_device *adev = hwmgr->adev;
+   int err;
 
if (!addr || !size)
return -EINVAL;
@@ -1492,7 +1493,9 @@ static int pp_get_prv_buffer_details(void *handle, void 
**addr, size_t *size)
*addr = NULL;
*size = 0;
if (adev->pm.smu_prv_buffer) {
-   amdgpu_bo_kmap(adev->pm.smu_prv_buffer, addr);
+   err = amdgpu_bo_kmap(adev->pm.smu_prv_buffer, addr);
+   if (err)
+   return err;
*size = adev->pm.smu_prv_buffer_size;
}
 
-- 
2.25.1



Re: [PATCH v1] drivers/amd/kv_dpm: check the return value of amdgpu_kv_smc_bapm_enable

2022-09-26 Thread Li Zhong
On Thu, Sep 22, 2022 at 8:04 PM Lazar, Lijo  wrote:
>
>
>
> On 9/23/2022 1:36 AM, Li Zhong wrote:
> > Check the return value of amdgpu_kv_smc_bapm_enable() and log the error
> > when it fails.
> >
> > Signed-off-by: Li Zhong 
> > ---
> >   drivers/gpu/drm/amd/pm/legacy-dpm/kv_dpm.c | 5 -
> >   1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/amd/pm/legacy-dpm/kv_dpm.c 
> > b/drivers/gpu/drm/amd/pm/legacy-dpm/kv_dpm.c
> > index 8fd0782a2b20..d392256effe2 100644
> > --- a/drivers/gpu/drm/amd/pm/legacy-dpm/kv_dpm.c
> > +++ b/drivers/gpu/drm/amd/pm/legacy-dpm/kv_dpm.c
> > @@ -1384,13 +1384,16 @@ static int kv_dpm_enable(struct amdgpu_device *adev)
> >   static void kv_dpm_disable(struct amdgpu_device *adev)
> >   {
> >   struct kv_power_info *pi = kv_get_pi(adev);
> > + int err;
> >
> >   amdgpu_irq_put(adev, &adev->pm.dpm.thermal.irq,
> >  AMDGPU_THERMAL_IRQ_LOW_TO_HIGH);
> >   amdgpu_irq_put(adev, &adev->pm.dpm.thermal.irq,
> >  AMDGPU_THERMAL_IRQ_HIGH_TO_LOW);
> >
> > - amdgpu_kv_smc_bapm_enable(adev, false);
> > + err = amdgpu_kv_smc_bapm_enable(adev, false);
> > + if (ret)
> > + DRM_ERROR("amdgpu_kv_smc_bapm_enable failed\n");
>
> Return code is captured in 'err' and the check is for the 'ret' variable.
>
> BTW, does this code compile?
>
> Thanks,
> Lijo
>
> >
> >   if (adev->asic_type == CHIP_MULLINS)
> >   kv_enable_nb_dpm(adev, false);
> >

Thanks for your reply. So sorry for submitting the wrong commit. Now it's
fixed in v2 patch.


Re: [PATCH v4 09/21] drm/etnaviv: Prepare to dynamic dma-buf locking specification

2022-09-26 Thread cco
To whom it may concern: Deepest sorrow to inform you that my husband Helmut 
passed away on July 21. This access will be cleared in a few days. Irene
 Ursprüngliche Nachricht Von: Christian König 
 Datum: 01.09.22  08:50  (GMT+01:00) An: Dmitry 
Osipenko , David Airlie , Gerd 
Hoffmann , Gurchetan Singh , 
Chia-I Wu , Daniel Vetter , Daniel Almeida 
, Gert Wollny , 
Gustavo Padovan , Daniel Stone 
, Tomeu Vizoso , Maarten 
Lankhorst , Maxime Ripard 
, Thomas Zimmermann , Rob Clark 
, Sumit Semwal , "Pan, Xinhui" 
, Thierry Reding , Tomasz Figa 
, Marek Szyprowski , Mauro 
Carvalho Chehab , Alex Deucher , 
Jani Nikula , Joonas Lahtinen 
, Rodrigo Vivi , 
Tvrtko Ursulin , Thomas Hellström 
, Qiang Yu , Srinivas Kandagatla 
, Amol Maheshwari , 
Jason Gunthorpe , Leon Romanovsky , Juergen 
Gross , Stefano Stabellini , Oleksandr 
Tyshchenko , Tomi Valkeinen , 
Russell King , Lucas Stach , 
Christian Gmeiner  Cc: 
dri-de...@lists.freedesktop.org, linux-ker...@vger.kernel.org, Dmitry Osipenko 
, linux-me...@vger.kernel.org, 
linaro-mm-...@lists.linaro.org, amd-gfx@lists.freedesktop.org, 
intel-...@lists.freedesktop.org, ker...@collabora.com, 
virtualizat...@lists.linux-foundation.org, linux-r...@vger.kernel.org, 
linux-arm-...@vger.kernel.org Betreff: Re: [PATCH v4 09/21] drm/etnaviv: 
Prepare to dynamic dma-buf locking
  specification

Am 31.08.22 um 17:37 schrieb Dmitry Osipenko:
> Prepare Etnaviv driver to the common dynamic dma-buf locking convention
> by starting to use the unlocked versions of dma-buf API functions.
>
> Signed-off-by: Dmitry Osipenko 

Interesting, where is the matching vmap()?

Anyway, this patch is Acked-by: Christian König 

> ---
>   drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c b/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
> index 3fa2da149639..7031db145a77 100644
> --- a/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
> +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
> @@ -65,7 +65,7 @@ static void etnaviv_gem_prime_release(struct etnaviv_gem_object *etnaviv_obj)
>   	struct iosys_map map = IOSYS_MAP_INIT_VADDR(etnaviv_obj->vaddr);
>
>   	if (etnaviv_obj->vaddr)
> -		dma_buf_vunmap(etnaviv_obj->base.import_attach->dmabuf, &map);
> +		dma_buf_vunmap_unlocked(etnaviv_obj->base.import_attach->dmabuf, &map);
>
>   	/* Don't drop the pages for imported dmabuf, as they are not
>   	 * ours, just free the array we allocated:

Re: [PATCH v5 02/31] drm/i915: Don't register backlight when another backlight should be used (v2)

2022-09-26 Thread Dmitry Osipenko
25.08.2022 17:36, Hans de Goede пишет:
> Before this commit when we want userspace to use the acpi_video backlight
> device we register both the GPU's native backlight device and acpi_video's
> firmware acpi_video# backlight device. This relies on userspace preferring
> firmware type backlight devices over native ones.
> 
> Registering 2 backlight devices for a single display really is
> undesirable, don't register the GPU's native backlight device when
> another backlight device should be used.
> 
> Changes in v2:
> - Use drm_info(drm_dev,  ...) for log messages
> 
> Reviewed-by: Jani Nikula 
> Signed-off-by: Hans de Goede 
> ---
>  drivers/gpu/drm/i915/display/intel_backlight.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/display/intel_backlight.c 
> b/drivers/gpu/drm/i915/display/intel_backlight.c
> index 681ebcda97ad..03c7966f68d6 100644
> --- a/drivers/gpu/drm/i915/display/intel_backlight.c
> +++ b/drivers/gpu/drm/i915/display/intel_backlight.c
> @@ -8,6 +8,8 @@
>  #include 
>  #include 
>  
> +#include 
> +
>  #include "intel_backlight.h"
>  #include "intel_backlight_regs.h"
>  #include "intel_connector.h"
> @@ -952,6 +954,11 @@ int intel_backlight_device_register(struct 
> intel_connector *connector)
>  
>   WARN_ON(panel->backlight.max == 0);
>  
> + if (!acpi_video_backlight_use_native()) {
> + drm_info(>drm, "Skipping intel_backlight registration\n");
> + return 0;
> + }
> +
>   memset(, 0, sizeof(props));
>   props.type = BACKLIGHT_RAW;
>  

This breaks backlight on Acer Chromebook Spin 713 because backlight
isn't registered anymore. Any ideas how to fix it?


[PATCH 1/7] mm/memory.c: Fix race when faulting a device private page

2022-09-26 Thread Alistair Popple
When the CPU tries to access a device private page the migrate_to_ram()
callback associated with the pgmap for the page is called. However no
reference is taken on the faulting page. Therefore a concurrent
migration of the device private page can free the page and possibly the
underlying pgmap. This results in a race which can crash the kernel due
to the migrate_to_ram() function pointer becoming invalid. It also means
drivers can't reliably read the zone_device_data field because the page
may have been freed with memunmap_pages().

Close the race by getting a reference on the page while holding the ptl
to ensure it has not been freed. Unfortunately the elevated reference
count will cause the migration required to handle the fault to fail. To
avoid this failure pass the faulting page into the migrate_vma functions
so that if an elevated reference count is found it can be checked to see
if it's expected or not.
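The "expected extra reference" check described above can be sketched as a tiny
userspace model — names and counts are purely illustrative, not the kernel
implementation: migration normally fails when a page holds more references than
its mappings account for, but the one reference taken on the faulting page
under the ptl is allowed.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a migrate-check with a fault_page exception.
 * All names are illustrative; only the refcount logic matters. */
struct page { int refcount; };

/* A page may migrate only if every reference is accounted for; the
 * faulting page is allowed exactly one extra, taken under the ptl. */
static bool migrate_check_page(struct page *page, struct page *fault_page,
			       int expected)
{
	if (page == fault_page)
		expected++;	/* fault handler holds one extra reference */
	return page->refcount <= expected;
}
```

Any other page with an unexpected elevated count still refuses to migrate, so
pins taken elsewhere keep their current behaviour.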

Signed-off-by: Alistair Popple 
---
 arch/powerpc/kvm/book3s_hv_uvmem.c   | 15 ++-
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 17 +++--
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.h |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 11 +---
 include/linux/migrate.h  |  8 ++-
 lib/test_hmm.c   |  7 ++---
 mm/memory.c  | 16 +++-
 mm/migrate.c | 34 ++---
 mm/migrate_device.c  | 18 +
 9 files changed, 87 insertions(+), 41 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 5980063..d4eacf4 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -508,10 +508,10 @@ unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
 static int __kvmppc_svm_page_out(struct vm_area_struct *vma,
unsigned long start,
unsigned long end, unsigned long page_shift,
-   struct kvm *kvm, unsigned long gpa)
+   struct kvm *kvm, unsigned long gpa, struct page *fault_page)
 {
unsigned long src_pfn, dst_pfn = 0;
-   struct migrate_vma mig;
+   struct migrate_vma mig = { 0 };
struct page *dpage, *spage;
struct kvmppc_uvmem_page_pvt *pvt;
unsigned long pfn;
@@ -525,6 +525,7 @@ static int __kvmppc_svm_page_out(struct vm_area_struct *vma,
	mig.dst = &dst_pfn;
	mig.pgmap_owner = &kvmppc_uvmem_pgmap;
mig.flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE;
+   mig.fault_page = fault_page;
 
/* The requested page is already paged-out, nothing to do */
if (!kvmppc_gfn_is_uvmem_pfn(gpa >> page_shift, kvm, NULL))
@@ -580,12 +581,14 @@ static int __kvmppc_svm_page_out(struct vm_area_struct 
*vma,
 static inline int kvmppc_svm_page_out(struct vm_area_struct *vma,
  unsigned long start, unsigned long end,
  unsigned long page_shift,
- struct kvm *kvm, unsigned long gpa)
+ struct kvm *kvm, unsigned long gpa,
+ struct page *fault_page)
 {
int ret;
 
	mutex_lock(&kvm->arch.uvmem_lock);
-   ret = __kvmppc_svm_page_out(vma, start, end, page_shift, kvm, gpa);
+   ret = __kvmppc_svm_page_out(vma, start, end, page_shift, kvm, gpa,
+   fault_page);
	mutex_unlock(&kvm->arch.uvmem_lock);
 
return ret;
@@ -736,7 +739,7 @@ static int kvmppc_svm_page_in(struct vm_area_struct *vma,
bool pagein)
 {
unsigned long src_pfn, dst_pfn = 0;
-   struct migrate_vma mig;
+   struct migrate_vma mig = { 0 };
struct page *spage;
unsigned long pfn;
struct page *dpage;
@@ -994,7 +997,7 @@ static vm_fault_t kvmppc_uvmem_migrate_to_ram(struct 
vm_fault *vmf)
 
if (kvmppc_svm_page_out(vmf->vma, vmf->address,
vmf->address + PAGE_SIZE, PAGE_SHIFT,
-   pvt->kvm, pvt->gpa))
+   pvt->kvm, pvt->gpa, vmf->page))
return VM_FAULT_SIGBUS;
else
return 0;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index b059a77..776448b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -409,7 +409,7 @@ svm_migrate_vma_to_vram(struct amdgpu_device *adev, struct 
svm_range *prange,
uint64_t npages = (end - start) >> PAGE_SHIFT;
struct kfd_process_device *pdd;
struct dma_fence *mfence = NULL;
-   struct migrate_vma migrate;
+   struct migrate_vma migrate = { 0 };
unsigned long cpages = 0;
dma_addr_t *scratch;
void *buf;
@@ -668,7 +668,7 @@ svm_migrate_copy_to_ram(struct amdgpu_device *adev, struct 
svm_range *prange,
 static long
 

Re: [REGRESSION] Graphical issues on Lenovo Yoga 7 14ARB7 laptop since v6.0-rc1 (bisected)

2022-09-26 Thread August Wikerfors

Hi Leo,

On 2022-09-23 20:41, Leo Li wrote:

Hi August,

Can you provide a dmesg log with drm.debug=0x16 enabled in kernel cmdline?

Log is available here: https://files.augustwikerfors.se/dmesg.2022-09-23.txt

This is what I did during that log:
1. Boot the system
2. Type into the password field in SDDM (this is when the problem occurs)
3. Switch to a TTY to save the log (the problem doesn't happen there)

Regards,
August Wikerfors


[PATCH 6/7] nouveau/dmem: Evict device private memory during release

2022-09-26 Thread Alistair Popple
When the module is unloaded or a GPU is unbound from the module it is
possible for device private pages to be left mapped in currently running
processes. This leads to a kernel crash when the pages are either freed
or accessed from the CPU because the GPU and associated data structures
and callbacks have all been freed.

Fix this by migrating any mappings back to normal CPU memory prior to
freeing the GPU memory chunks and associated device private pages.
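The ordering invariant behind this fix can be modelled in a few lines of
userspace C — the names are illustrative, not the driver API: every
device-private page still mapped by a process must be migrated back to system
memory before the chunk backing it is freed.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the teardown ordering: evict (migrate back to CPU memory)
 * before freeing the chunk. Names are illustrative, not the nouveau API. */
struct chunk { int mapped_pages; bool freed; };

static void evict_chunk(struct chunk *c)
{
	c->mapped_pages = 0;	/* stands in for migrating pages back */
}

static void free_chunk(struct chunk *c)
{
	/* freeing with live mappings is the crash the patch prevents */
	assert(c->mapped_pages == 0);
	c->freed = true;
}
```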

Signed-off-by: Alistair Popple 

---

I assume the AMD driver might have a similar issue. However I can't see
where device private (or coherent) pages actually get unmapped/freed
during teardown as I couldn't find any relevant calls to
devm_memunmap(), memunmap(), devm_release_mem_region() or
release_mem_region(). So it appears that ZONE_DEVICE pages are not being
properly freed during module unload, unless I'm missing something?
---
 drivers/gpu/drm/nouveau/nouveau_dmem.c | 48 +++-
 1 file changed, 48 insertions(+)

diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c 
b/drivers/gpu/drm/nouveau/nouveau_dmem.c
index 66ebbd4..3b247b8 100644
--- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
@@ -369,6 +369,52 @@ nouveau_dmem_suspend(struct nouveau_drm *drm)
	mutex_unlock(&drm->dmem->mutex);
 }
 
+/*
+ * Evict all pages mapping a chunk.
+ */
+void
+nouveau_dmem_evict_chunk(struct nouveau_dmem_chunk *chunk)
+{
+   unsigned long i, npages = range_len(&chunk->pagemap.range) >> PAGE_SHIFT;
+   unsigned long *src_pfns, *dst_pfns;
+   dma_addr_t *dma_addrs;
+   struct nouveau_fence *fence;
+
+   src_pfns = kcalloc(npages, sizeof(*src_pfns), GFP_KERNEL);
+   dst_pfns = kcalloc(npages, sizeof(*dst_pfns), GFP_KERNEL);
+   dma_addrs = kcalloc(npages, sizeof(*dma_addrs), GFP_KERNEL);
+
+   migrate_device_range(src_pfns, chunk->pagemap.range.start >> PAGE_SHIFT,
+   npages);
+
+   for (i = 0; i < npages; i++) {
+   if (src_pfns[i] & MIGRATE_PFN_MIGRATE) {
+   struct page *dpage;
+
+   /*
+* __GFP_NOFAIL because the GPU is going away and there
+* is nothing sensible we can do if we can't copy the
+* data back.
+*/
+   dpage = alloc_page(GFP_HIGHUSER | __GFP_NOFAIL);
+   dst_pfns[i] = migrate_pfn(page_to_pfn(dpage));
+   nouveau_dmem_copy_one(chunk->drm,
+   migrate_pfn_to_page(src_pfns[i]), dpage,
+   &dma_addrs[i]);
+   }
+   }
+
+   nouveau_fence_new(chunk->drm->dmem->migrate.chan, false, &fence);
+   migrate_device_pages(src_pfns, dst_pfns, npages);
+   nouveau_dmem_fence_done(&fence);
+   migrate_device_finalize(src_pfns, dst_pfns, npages);
+   kfree(src_pfns);
+   kfree(dst_pfns);
+   for (i = 0; i < npages; i++)
+   dma_unmap_page(chunk->drm->dev->dev, dma_addrs[i], PAGE_SIZE, 
DMA_BIDIRECTIONAL);
+   kfree(dma_addrs);
+}
+
 void
 nouveau_dmem_fini(struct nouveau_drm *drm)
 {
@@ -380,8 +426,10 @@ nouveau_dmem_fini(struct nouveau_drm *drm)
	mutex_lock(&drm->dmem->mutex);
 
	list_for_each_entry_safe(chunk, tmp, &drm->dmem->chunks, list) {
+		nouveau_dmem_evict_chunk(chunk);
		nouveau_bo_unpin(chunk->bo);
		nouveau_bo_ref(NULL, &chunk->bo);
+		WARN_ON(chunk->callocated);
		list_del(&chunk->list);
		memunmap_pages(&chunk->pagemap);
release_mem_region(chunk->pagemap.range.start,
-- 
git-series 0.9.1


[PATCH 5/7] nouveau/dmem: Refactor nouveau_dmem_fault_copy_one()

2022-09-26 Thread Alistair Popple
nouveau_dmem_fault_copy_one() is used during handling of CPU faults via
the migrate_to_ram() callback and is used to copy data from GPU to CPU
memory. It is currently specific to fault handling, however a future
patch implementing eviction of data during teardown needs similar
functionality.

Refactor out the core functionality so that it is not specific to fault
handling.

Signed-off-by: Alistair Popple 
---
 drivers/gpu/drm/nouveau/nouveau_dmem.c | 59 +--
 1 file changed, 29 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c 
b/drivers/gpu/drm/nouveau/nouveau_dmem.c
index f9234ed..66ebbd4 100644
--- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
@@ -139,44 +139,25 @@ static void nouveau_dmem_fence_done(struct nouveau_fence 
**fence)
}
 }
 
-static vm_fault_t nouveau_dmem_fault_copy_one(struct nouveau_drm *drm,
-   struct vm_fault *vmf, struct migrate_vma *args,
-   dma_addr_t *dma_addr)
+static int nouveau_dmem_copy_one(struct nouveau_drm *drm, struct page *spage,
+   struct page *dpage, dma_addr_t *dma_addr)
 {
struct device *dev = drm->dev->dev;
-   struct page *dpage, *spage;
-   struct nouveau_svmm *svmm;
-
-   spage = migrate_pfn_to_page(args->src[0]);
-   if (!spage || !(args->src[0] & MIGRATE_PFN_MIGRATE))
-   return 0;
 
-   dpage = alloc_page_vma(GFP_HIGHUSER, vmf->vma, vmf->address);
-   if (!dpage)
-   return VM_FAULT_SIGBUS;
lock_page(dpage);
 
*dma_addr = dma_map_page(dev, dpage, 0, PAGE_SIZE, DMA_BIDIRECTIONAL);
if (dma_mapping_error(dev, *dma_addr))
-   goto error_free_page;
+   return -EIO;
 
-   svmm = spage->zone_device_data;
-   mutex_lock(&svmm->mutex);
-   nouveau_svmm_invalidate(svmm, args->start, args->end);
if (drm->dmem->migrate.copy_func(drm, 1, NOUVEAU_APER_HOST, *dma_addr,
-   NOUVEAU_APER_VRAM, nouveau_dmem_page_addr(spage)))
-   goto error_dma_unmap;
-   mutex_unlock(&svmm->mutex);
+NOUVEAU_APER_VRAM,
+nouveau_dmem_page_addr(spage))) {
+   dma_unmap_page(dev, *dma_addr, PAGE_SIZE, DMA_BIDIRECTIONAL);
+   return -EIO;
+   }
 
-   args->dst[0] = migrate_pfn(page_to_pfn(dpage));
return 0;
-
-error_dma_unmap:
-   mutex_unlock(&svmm->mutex);
-   dma_unmap_page(dev, *dma_addr, PAGE_SIZE, DMA_BIDIRECTIONAL);
-error_free_page:
-   __free_page(dpage);
-   return VM_FAULT_SIGBUS;
 }
 
 static vm_fault_t nouveau_dmem_migrate_to_ram(struct vm_fault *vmf)
@@ -184,9 +165,11 @@ static vm_fault_t nouveau_dmem_migrate_to_ram(struct 
vm_fault *vmf)
struct nouveau_drm *drm = page_to_drm(vmf->page);
struct nouveau_dmem *dmem = drm->dmem;
struct nouveau_fence *fence;
+   struct nouveau_svmm *svmm;
+   struct page *spage, *dpage;
unsigned long src = 0, dst = 0;
dma_addr_t dma_addr = 0;
-   vm_fault_t ret;
+   vm_fault_t ret = 0;
struct migrate_vma args = {
.vma= vmf->vma,
.start  = vmf->address,
@@ -207,9 +190,25 @@ static vm_fault_t nouveau_dmem_migrate_to_ram(struct 
vm_fault *vmf)
if (!args.cpages)
return 0;
 
-   ret = nouveau_dmem_fault_copy_one(drm, vmf, &args, &dma_addr);
-   if (ret || dst == 0)
+   spage = migrate_pfn_to_page(src);
+   if (!spage || !(src & MIGRATE_PFN_MIGRATE))
+   goto done;
+
+   dpage = alloc_page_vma(GFP_HIGHUSER, vmf->vma, vmf->address);
+   if (!dpage)
+   goto done;
+
+   dst = migrate_pfn(page_to_pfn(dpage));
+
+   svmm = spage->zone_device_data;
+   mutex_lock(&svmm->mutex);
+   nouveau_svmm_invalidate(svmm, args.start, args.end);
+   ret = nouveau_dmem_copy_one(drm, spage, dpage, &dma_addr);
+   mutex_unlock(&svmm->mutex);
+   if (ret) {
+   ret = VM_FAULT_SIGBUS;
goto done;
+   }
 
	nouveau_fence_new(dmem->migrate.chan, false, &fence);
	migrate_vma_pages(&args);
-- 
git-series 0.9.1


[PATCH 2/7] mm: Free device private pages have zero refcount

2022-09-26 Thread Alistair Popple
Since 27674ef6c73f ("mm: remove the extra ZONE_DEVICE struct page
refcount") device private pages have no longer had an extra reference
count when the page is in use. However before handing them back to the
owning device driver we add an extra reference count such that free
pages have a reference count of one.

This makes it difficult to tell if a page is free or not because both
free and in use pages will have a non-zero refcount. Instead we should
return pages to the drivers page allocator with a zero reference count.
Kernel code can then safely use kernel functions such as
get_page_unless_zero().
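A rough userspace sketch of the new convention (illustrative names only): free
pages sit at refcount 0, the driver's page allocator hands them out at 1, and
`get_page_unless_zero()` then reliably refuses free pages.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy model of the zero-refcount convention for free ZONE_DEVICE pages. */
struct page { atomic_int refcount; };

/* Pin the page only if it is currently in use (refcount > 0). */
static bool get_page_unless_zero(struct page *p)
{
	int c = atomic_load(&p->refcount);

	while (c != 0)
		if (atomic_compare_exchange_weak(&p->refcount, &c, c + 1))
			return true;
	return false;	/* refcount was 0: the page is free */
}

/* Mirrors set_page_count(page, 1) when the driver allocator hands a page out. */
static void driver_alloc_page(struct page *p)
{
	atomic_store(&p->refcount, 1);
}
```

With the old convention (free pages at refcount 1) the first call below would
wrongly succeed, which is exactly the ambiguity the patch removes.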

Signed-off-by: Alistair Popple 
---
 arch/powerpc/kvm/book3s_hv_uvmem.c   | 1 +
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 1 +
 drivers/gpu/drm/nouveau/nouveau_dmem.c   | 1 +
 lib/test_hmm.c   | 1 +
 mm/memremap.c| 5 -
 mm/page_alloc.c  | 6 ++
 6 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
b/arch/powerpc/kvm/book3s_hv_uvmem.c
index d4eacf4..08d2f7d 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -718,6 +718,7 @@ static struct page *kvmppc_uvmem_get_page(unsigned long 
gpa, struct kvm *kvm)
 
dpage = pfn_to_page(uvmem_pfn);
dpage->zone_device_data = pvt;
+   set_page_count(dpage, 1);
lock_page(dpage);
return dpage;
 out_clear:
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index 776448b..05c2f4d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -223,6 +223,7 @@ svm_migrate_get_vram_page(struct svm_range *prange, 
unsigned long pfn)
page = pfn_to_page(pfn);
svm_range_bo_ref(prange->svm_bo);
page->zone_device_data = prange->svm_bo;
+   set_page_count(page, 1);
lock_page(page);
 }
 
diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c 
b/drivers/gpu/drm/nouveau/nouveau_dmem.c
index 1635661..f9234ed 100644
--- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
@@ -326,6 +326,7 @@ nouveau_dmem_page_alloc_locked(struct nouveau_drm *drm)
return NULL;
}
 
+   set_page_count(page, 1);
lock_page(page);
return page;
 }
diff --git a/lib/test_hmm.c b/lib/test_hmm.c
index 89463ff..2bd3a67 100644
--- a/lib/test_hmm.c
+++ b/lib/test_hmm.c
@@ -627,6 +627,7 @@ static struct page *dmirror_devmem_alloc_page(struct 
dmirror_device *mdevice)
goto error;
}
 
+   set_page_count(dpage, 1);
dpage->zone_device_data = rpage;
lock_page(dpage);
return dpage;
diff --git a/mm/memremap.c b/mm/memremap.c
index 25029a4..e065171 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -501,11 +501,6 @@ void free_zone_device_page(struct page *page)
 */
page->mapping = NULL;
page->pgmap->ops->page_free(page);
-
-   /*
-* Reset the page count to 1 to prepare for handing out the page again.
-*/
-   set_page_count(page, 1);
 }
 
 #ifdef CONFIG_FS_DAX
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9d49803..67eaab5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6744,6 +6744,12 @@ static void __ref __init_zone_device_page(struct page 
*page, unsigned long pfn,
set_pageblock_migratetype(page, MIGRATE_MOVABLE);
cond_resched();
}
+
+   /*
+* ZONE_DEVICE pages are released directly to the driver page allocator
+* which will set the page count to 1 when allocating the page.
+*/
+   set_page_count(page, 0);
 }
 
 /*
-- 
git-series 0.9.1


Re: [PATCH] drm/amdgpu: Remove fence_process in count_emitted

2022-09-26 Thread Christian König

Am 23.09.22 um 15:27 schrieb jiadong@amd.com:

From: "Jiadong.Zhu" 

The function amdgpu_fence_count_emitted used in work_handler should not call
amdgpu_fence_process, which must be used in the irq handler.

Signed-off-by: Jiadong.Zhu 


Reviewed-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 1 -
  1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index d16c8c1f72db..790f7bfdc654 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -394,7 +394,6 @@ unsigned amdgpu_fence_count_emitted(struct amdgpu_ring 
*ring)
/* We are not protected by ring lock when reading the last sequence
 * but it's ok to report slightly wrong fence count here.
 */
-   amdgpu_fence_process(ring);
	emitted = 0x100000000ull;
	emitted -= atomic_read(&ring->fence_drv.last_seq);
emitted += READ_ONCE(ring->fence_drv.sync_seq);




Re: [PATCH 4/4] drm/amdgpu: MCBP based on DRM scheduler (v6)

2022-09-26 Thread Christian König

Am 23.09.22 um 15:16 schrieb jiadong@amd.com:

From: "Jiadong.Zhu" 

Trigger Mid-Command Buffer Preemption according to the priority of the software
rings and the hw fence signalling condition.

The muxer saves the locations of the indirect buffer frames from the software
ring together with the fence sequence number in its fifo queue, and pops out
those records when the fences are signalled. The locations are used to resubmit
packages in preemption scenarios by copying the chunks from the software ring.
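A minimal userspace model of that bookkeeping (illustrative names, fixed-size
fifo — not the muxer's actual data structures): each software-ring frame is
recorded with its fence sequence number, signalled records are popped, and
whatever remains is the set of frames that would be resubmitted after a
preemption.

```c
#include <assert.h>

/* Toy fifo of (fence_seq, ib start, ib end) records for one software ring. */
struct mux_chunk { unsigned long long fence_seq, start, end; };

#define MAX_CHUNKS 8
static struct mux_chunk fifo[MAX_CHUNKS];
static unsigned int head, tail;

/* Remember where a frame lives on the software ring and which fence ends it. */
static void record_frame(unsigned long long seq,
			 unsigned long long start, unsigned long long end)
{
	fifo[tail++ % MAX_CHUNKS] = (struct mux_chunk){ seq, start, end };
}

/* Pop every record whose fence has signalled; return how many frames are
 * still pending (the candidates for resubmission after preemption). */
static unsigned int retire_signalled(unsigned long long last_signalled)
{
	while (head != tail && fifo[head % MAX_CHUNKS].fence_seq <= last_signalled)
		head++;
	return tail - head;
}
```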

v2: Update comment style.
v3: Fix conflict caused by previous modifications.
v4: Remove unnecessary prints.
v5: Fix corner cases for resubmission cases.
v6: Refactor functions for resubmission, calling fence_process in irq handler.

Cc: Christian Koenig 
Cc: Luben Tuikov 
Cc: Andrey Grodzovsky 
Acked-by: Luben Tuikov 
Signed-off-by: Jiadong.Zhu 


I need more time for an in-depth review of this, but from the one mile 
high view it looks correct to me now.


Can we do some pre-commit qa testing with this?

Thanks,
Christian.


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c   |   2 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c |  13 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |   3 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c | 323 ---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h |  30 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.c  |  26 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c   |   2 +
  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c|  10 +-
  8 files changed, 368 insertions(+), 41 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
index 258cffe3c06a..af86d87e2f3b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
@@ -211,6 +211,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned 
num_ibs,
}
}
  
+	amdgpu_ring_ib_begin(ring);

if (job && ring->funcs->init_cond_exec)
patch_offset = amdgpu_ring_init_cond_exec(ring);
  
@@ -285,6 +286,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
	if (ring->funcs->emit_wave_limit &&
	    ring->hw_prio == AMDGPU_GFX_PIPE_PRIO_HIGH)
ring->funcs->emit_wave_limit(ring, false);
  
+	amdgpu_ring_ib_end(ring);

amdgpu_ring_commit(ring);
return 0;
  }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
index 13db99d653bd..84b0b3c7d40f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -33,6 +33,7 @@
  
  #include 

  #include "amdgpu.h"
+#include "amdgpu_sw_ring.h"
  #include "atom.h"
  
  /*

@@ -569,3 +570,15 @@ int amdgpu_ring_init_mqd(struct amdgpu_ring *ring)
  
	return mqd_mgr->init_mqd(adev, ring->mqd_ptr, &prop);

  }
+
+void amdgpu_ring_ib_begin(struct amdgpu_ring *ring)
+{
+   if (ring->is_sw_ring)
+   amdgpu_sw_ring_ib_begin(ring);
+}
+
+void amdgpu_ring_ib_end(struct amdgpu_ring *ring)
+{
+   if (ring->is_sw_ring)
+   amdgpu_sw_ring_ib_end(ring);
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index e90d327a589e..6fbc1627dab7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -312,6 +312,9 @@ struct amdgpu_ring {
  #define amdgpu_ring_preempt_ib(r) (r)->funcs->preempt_ib(r)
  
  int amdgpu_ring_alloc(struct amdgpu_ring *ring, unsigned ndw);

+void amdgpu_ring_ib_begin(struct amdgpu_ring *ring);
+void amdgpu_ring_ib_end(struct amdgpu_ring *ring);
+
  void amdgpu_ring_insert_nop(struct amdgpu_ring *ring, uint32_t count);
  void amdgpu_ring_generic_pad_ib(struct amdgpu_ring *ring, struct amdgpu_ib 
*ib);
  void amdgpu_ring_commit(struct amdgpu_ring *ring);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c
index 662aadebf111..788567e3b743 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c
@@ -28,23 +28,146 @@
  
  #define AMDGPU_MUX_RESUBMIT_JIFFIES_TIMEOUT (HZ / 2)
  
+static struct kmem_cache *amdgpu_mux_chunk_slab;

+
+static inline struct amdgpu_mux_entry *amdgpu_ring_mux_sw_entry(struct 
amdgpu_ring_mux *mux,
+   struct 
amdgpu_ring *ring)
+{
+   return ring->entry_index < mux->ring_entry_size ?
+   &mux->ring_entry[ring->entry_index] : NULL;
+}
+
+/* copy packages on sw ring range[begin, end) */
+static void amdgpu_ring_mux_copy_pkt_from_sw_ring(struct amdgpu_ring_mux *mux,
+ struct amdgpu_ring *ring,
+ u64 s_start, u64 s_end)
+{
+   u64 start, end;
+   struct amdgpu_ring *real_ring = mux->real_ring;
+
+   start = s_start & ring->buf_mask;
+   end = s_end & ring->buf_mask;
+
+   if (start == end) {
+   

Re: [PATCH 2/4] drm/amdgpu: Add software ring callbacks for gfx9 (v6)

2022-09-26 Thread Christian König

Am 23.09.22 um 15:16 schrieb jiadong@amd.com:

From: "Jiadong.Zhu" 

Set ring functions with software ring callbacks on gfx9.

The software ring could be tested by debugfs_test_ib case.

v2: Set sw_ring 2 to enable software ring by default.
v3: Remove the parameter for software ring enablement.
v4: Use amdgpu_ring_init/fini for software rings.
v5: Update for code format. Fix conflict.
v6: Remove unnecessary checks and enable software ring on gfx9 by default.

Acked-by: Luben Tuikov 
Cc: Christian Koenig 
Cc: Luben Tuikov 
Cc: Andrey Grodzovsky 
Signed-off-by: Jiadong.Zhu 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h  |   1 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |   1 +
  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c| 108 ++-
  3 files changed, 109 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
index 9996dadb39f7..4fdfc3ec134a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
@@ -348,6 +348,7 @@ struct amdgpu_gfx {
  
	bool			is_poweron;
  
+	struct amdgpu_ring		sw_gfx_ring[AMDGPU_MAX_SW_GFX_RINGS];

struct amdgpu_ring_mux  muxer;
  };
  
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h

index 40b1277b4f0c..f08ee1ac281c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -39,6 +39,7 @@ struct amdgpu_vm;
  #define AMDGPU_MAX_RINGS  28
  #define AMDGPU_MAX_HWIP_RINGS 8
  #define AMDGPU_MAX_GFX_RINGS  2
+#define AMDGPU_MAX_SW_GFX_RINGS 2
  #define AMDGPU_MAX_COMPUTE_RINGS  8
  #define AMDGPU_MAX_VCE_RINGS  3
  #define AMDGPU_MAX_UVD_ENC_RINGS  2
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 5349ca4d19e3..e688665cd1e0 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -47,6 +47,7 @@
  
  #include "amdgpu_ras.h"
  
+#include "amdgpu_sw_ring.h"

  #include "gfx_v9_4.h"
  #include "gfx_v9_0.h"
  #include "gfx_v9_4_2.h"
@@ -56,6 +57,7 @@
  #include "asic_reg/gc/gc_9_0_default.h"
  
  #define GFX9_NUM_GFX_RINGS 1

+#define GFX9_NUM_SW_GFX_RINGS  2
  #define GFX9_MEC_HPD_SIZE 4096
  #define RLCG_UCODE_LOADING_START_ADDRESS 0x2000L
  #define RLC_SAVE_RESTORE_ADDR_STARTING_OFFSET 0xL
@@ -2273,6 +2275,7 @@ static int gfx_v9_0_sw_init(void *handle)
struct amdgpu_ring *ring;
struct amdgpu_kiq *kiq;
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
+   unsigned int hw_prio;
  
  	switch (adev->ip_versions[GC_HWIP][0]) {

case IP_VERSION(9, 0, 1):
@@ -2356,6 +2359,9 @@ static int gfx_v9_0_sw_init(void *handle)
sprintf(ring->name, "gfx_%d", i);
ring->use_doorbell = true;
ring->doorbell_index = adev->doorbell_index.gfx_ring0 << 1;
+
+   /* disable scheduler on the real ring */
+   ring->no_scheduler = true;
	r = amdgpu_ring_init(adev, ring, 1024, &adev->gfx.eop_irq,
 AMDGPU_CP_IRQ_GFX_ME0_PIPE0_EOP,
 AMDGPU_RING_PRIO_DEFAULT, NULL);
@@ -2363,6 +2369,42 @@ static int gfx_v9_0_sw_init(void *handle)
return r;
}
  
+	/* set up the software rings */

+   for (i = 0; i < GFX9_NUM_SW_GFX_RINGS; i++) {
+   ring = &adev->gfx.sw_gfx_ring[i];
+   ring->ring_obj = NULL;
+   if (!i)
+   sprintf(ring->name, "gfx_sw");
+   else
+   sprintf(ring->name, "gfx_sw_%d", i);


I think we should use something like gfx_low/gfx_high for the ring name 
here.


That this is implemented by a sw muxer is pretty much irrelevant for 
userspace.


Maybe use a static array for the names or something like this.

Apart from that looks good to me.

Regards,
Christian.


+   ring->use_doorbell = true;
+   ring->doorbell_index = adev->doorbell_index.gfx_ring0 << 1;
+   ring->is_sw_ring = true;
+   hw_prio = (i == 1) ? AMDGPU_RING_PRIO_2 :
+   AMDGPU_RING_PRIO_DEFAULT;
+   r = amdgpu_ring_init(adev, ring, 1024, &adev->gfx.eop_irq,
+AMDGPU_CP_IRQ_GFX_ME0_PIPE0_EOP, hw_prio,
+NULL);
+   if (r)
+   return r;
+   ring->wptr = 0;
+   }
+
+   /* init the muxer and add software rings */
+   r = amdgpu_ring_mux_init(&adev->gfx.muxer, &adev->gfx.gfx_ring[0],
+GFX9_NUM_SW_GFX_RINGS);
+   if (r) {
+   DRM_ERROR("amdgpu_ring_mux_init failed(%d)\n", r);
+   return r;
+   }
+   for (i = 0; i < GFX9_NUM_SW_GFX_RINGS; i++) {
+   r = 

Re: [PATCH 1/4] drm/amdgpu: Introduce gfx software ring (v7)

2022-09-26 Thread Christian König

Am 23.09.22 um 15:16 schrieb jiadong@amd.com:

From: "Jiadong.Zhu" 

The software ring is created to support priority context while there is only
one hardware queue for gfx.

Every software ring has its own fence driver and can be used as an ordinary ring
by the GPU scheduler.
Multiple software rings are bound to a real ring with the ring muxer. The
packages committed on the software ring are copied to the real ring.

v2: Use array to store software ring entry.
v3: Remove unnecessary prints.
v4: Remove amdgpu_ring_sw_init/fini functions,
using gtt for sw ring buffer for later dma copy
optimization.
v5: Allocate ring entry dynamically in the muxer.
v6: Update comments for the ring muxer.
v7: Modify for function naming.

Cc: Christian Koenig 
Cc: Luben Tuikov 
Cc: Andrey Grodzovsky  
Signed-off-by: Jiadong.Zhu 
---
  drivers/gpu/drm/amd/amdgpu/Makefile  |   3 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h  |   3 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |   4 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c | 183 +++
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h |  68 +++
  drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.c  |  61 +++
  drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.h  |  38 
  7 files changed, 359 insertions(+), 1 deletion(-)
  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c
  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h
  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.c
  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.h

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile
index 3e0e2eb7e235..85224bc81ce5 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -58,7 +58,8 @@ amdgpu-y += amdgpu_device.o amdgpu_kms.o \
amdgpu_vm_sdma.o amdgpu_discovery.o amdgpu_ras_eeprom.o amdgpu_nbio.o \
amdgpu_umc.o smu_v11_0_i2c.o amdgpu_fru_eeprom.o amdgpu_rap.o \
amdgpu_fw_attestation.o amdgpu_securedisplay.o \
-   amdgpu_eeprom.o amdgpu_mca.o amdgpu_psp_ta.o amdgpu_lsdma.o
+   amdgpu_eeprom.o amdgpu_mca.o amdgpu_psp_ta.o amdgpu_lsdma.o \
+   amdgpu_sw_ring.o amdgpu_ring_mux.o
  
  amdgpu-$(CONFIG_PROC_FS) += amdgpu_fdinfo.o
  
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h

index 53526ffb2ce1..9996dadb39f7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
@@ -33,6 +33,7 @@
  #include "amdgpu_imu.h"
  #include "soc15.h"
  #include "amdgpu_ras.h"
+#include "amdgpu_ring_mux.h"
  
  /* GFX current status */

  #define AMDGPU_GFX_NORMAL_MODE		0x00000000L
@@ -346,6 +347,8 @@ struct amdgpu_gfx {
struct amdgpu_gfx_ras   *ras;
  
  	boolis_poweron;

+
+   struct amdgpu_ring_mux  muxer;
  };
  
  #define amdgpu_gfx_get_gpu_clock_counter(adev) (adev)->gfx.funcs->get_gpu_clock_counter((adev))

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index 7d89a52091c0..40b1277b4f0c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -278,6 +278,10 @@ struct amdgpu_ring {
boolis_mes_queue;
uint32_thw_queue_id;
struct amdgpu_mes_ctx_data *mes_ctx;
+
+   boolis_sw_ring;
+   unsigned intentry_index;
+
  };
  
  #define amdgpu_ring_parse_cs(r, p, job, ib) ((r)->funcs->parse_cs((p), (job), (ib)))

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c
new file mode 100644
index ..662aadebf111
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c
@@ -0,0 +1,183 @@
+/*
+ * Copyright 2022 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+#include 

Re: [PATCH v3] drm/amdgpu: Fix VRAM BO swap issue

2022-09-26 Thread Christian König

Am 26.09.22 um 07:25 schrieb Arunpravin Paneer Selvam:

DRM buddy manager allocates the contiguous memory requests in
a single block or multiple blocks. So for the ttm move operation
(in case of low VRAM memory) we should consider all the blocks to
compute the total memory size, which is compared with the struct
ttm_resource num_pages in order to verify that the blocks are
contiguous for the eviction process.

v2: Added a Fixes tag
v3: Rewrite the code to save a bit of calculations and
 variables (Christian)

Fixes: c9cad937c0c5 ("drm/amdgpu: add drm buddy support to amdgpu")
Signed-off-by: Arunpravin Paneer Selvam 


Reviewed-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 17 -
  1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index b1c455329023..dc262d2c2925 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -424,8 +424,9 @@ static int amdgpu_move_blit(struct ttm_buffer_object *bo,
  static bool amdgpu_mem_visible(struct amdgpu_device *adev,
   struct ttm_resource *mem)
  {
-   uint64_t mem_size = (u64)mem->num_pages << PAGE_SHIFT;
+   u64 mem_size = (u64)mem->num_pages << PAGE_SHIFT;
struct amdgpu_res_cursor cursor;
+   u64 end;
  
  	if (mem->mem_type == TTM_PL_SYSTEM ||

mem->mem_type == TTM_PL_TT)
@@ -434,12 +435,18 @@ static bool amdgpu_mem_visible(struct amdgpu_device *adev,
return false;
  
  	amdgpu_res_first(mem, 0, mem_size, &cursor);

+   end = cursor.start + cursor.size;
+   while (cursor.remaining) {
+   amdgpu_res_next(&cursor, cursor.size);
  
-	/* ttm_resource_ioremap only supports contiguous memory */

-   if (cursor.size != mem_size)
-   return false;
+   /* ttm_resource_ioremap only supports contiguous memory */
+   if (end != cursor.start)
+   return false;
+
+   end = cursor.start + cursor.size;
+   }
  
-	return cursor.start + cursor.size <= adev->gmc.visible_vram_size;

+   return end <= adev->gmc.visible_vram_size;
  }
  
  /*




Re: [PATCH] drm/amd/display: Fix mutex lock in dcn10

2022-09-26 Thread Christian König

Am 25.09.22 um 23:53 schrieb Daniel Gomez:

Removal of DC_FP_* wrappers from dml (9696679bf7ac) provokes a mutex
lock [2] on the amdgpu driver. Re-arrange the dcn10 code to avoid
locking the mutex by placing the DC_FP_* wrappers around the proper
functions.


Offhand that looks correct to me now, but our DC team needs to take a 
closer look.


Regards,
Christian.



This fixes the following WARN/stacktrace:

BUG: sleeping function called from invalid context at kernel/locking/mutex.c:283
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 227, name: systemd-udevd
preempt_count: 1, expected: 0
CPU: 4 PID: 227 Comm: systemd-udevd Not tainted 6.0.0-rc6-qtec-standard #2
Hardware name: Qtechnology A/S QT5222/QT5221, BIOS v1.0.1 06/07/2021
Call Trace:
  
  dump_stack_lvl+0x33/0x42
  __might_resched.cold.172+0xa5/0xb3
  mutex_lock+0x1a/0x40
  amdgpu_dpm_get_clock_by_type_with_voltage+0x38/0x70 [amdgpu]
  dm_pp_get_clock_levels_by_type_with_voltage+0x64/0xa0 [amdgpu]
  dcn_bw_update_from_pplib+0x70/0x340 [amdgpu]
  dcn10_create_resource_pool+0x8c8/0xd20 [amdgpu]
  ? __kmalloc+0x1c7/0x4a0
  dc_create_resource_pool+0xe7/0x190 [amdgpu]
  dc_create+0x212/0x5d0 [amdgpu]
  amdgpu_dm_init+0x246/0x370 [amdgpu]
  ? schedule_hrtimeout_range_clock+0x93/0x120
  ? phm_wait_for_register_unequal.part.1+0x4a/0x80 [amdgpu]
  dm_hw_init+0xe/0x20 [amdgpu]
  amdgpu_device_init.cold.56+0x1324/0x1653 [amdgpu]
  ? pci_bus_read_config_word+0x43/0x80
  amdgpu_driver_load_kms+0x15/0x120 [amdgpu]
  amdgpu_pci_probe+0x116/0x320 [amdgpu]
  pci_device_probe+0x97/0x110
  really_probe+0xdd/0x340
  __driver_probe_device+0x80/0x170
  driver_probe_device+0x1f/0x90
  __driver_attach+0xdc/0x180
  ? __device_attach_driver+0x100/0x100
  ? __device_attach_driver+0x100/0x100
  bus_for_each_dev+0x74/0xc0
  bus_add_driver+0x19e/0x210
  ? kset_find_obj+0x30/0xa0
  ? 0xa0a5b000
  driver_register+0x6b/0xc0
  ? 0xa0a5b000
  do_one_initcall+0x4a/0x1f0
  ? __vunmap+0x28e/0x2f0
  ? __cond_resched+0x15/0x30
  ? kmem_cache_alloc_trace+0x3d/0x440
  do_init_module+0x4a/0x1e0
  load_module+0x1cba/0x1e10
  ? __do_sys_finit_module+0xb7/0x120
  __do_sys_finit_module+0xb7/0x120
  do_syscall_64+0x3c/0x80
  entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7ff2b5f5422d
Code: 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48
89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48>
3d 01 f0 ff ff 73 01 c3 48 8b 0d c3 ab 0e 00 f7 d8 64 89 01 48
RSP: 002b:7ffc44ab28e8 EFLAGS: 0246 ORIG_RAX: 0139
RAX: ffda RBX: 555c566a9240 RCX: 7ff2b5f5422d
RDX:  RSI: 7ff2b60bb353 RDI: 0019
RBP: 7ff2b60bb353 R08:  R09: 555c566a9240
R10: 0019 R11: 0246 R12: 
R13: 0002 R14:  R15: 


Fixes: 9696679bf7ac ("drm/amd/display: remove DC_FP_* wrapper from
dml folder")
Signed-off-by: Daniel Gomez 
---
  .../amd/display/dc/dcn10/dcn10_hw_sequencer.c |  12 +-
  .../drm/amd/display/dc/dcn10/dcn10_resource.c |  66 +-
  .../drm/amd/display/dc/dml/calcs/dcn_calcs.c  | 118 --
  .../gpu/drm/amd/display/dc/inc/dcn_calcs.h|  19 ++-
  4 files changed, 138 insertions(+), 77 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c 
b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c
index 5b5d952b2b8c..cb1e06d62841 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c
@@ -2994,6 +2994,7 @@ void dcn10_prepare_bandwidth(
  {
struct dce_hwseq *hws = dc->hwseq;
struct hubbub *hubbub = dc->res_pool->hubbub;
+   int min_fclk_khz, min_dcfclk_khz, socclk_khz;
  
  	if (dc->debug.sanity_checks)

hws->funcs.verify_allow_pstate_change_high(dc);
@@ -3016,8 +3017,11 @@ void dcn10_prepare_bandwidth(
  
  	if (dc->debug.pplib_wm_report_mode == WM_REPORT_OVERRIDE) {

DC_FP_START();
-   dcn_bw_notify_pplib_of_wm_ranges(dc);
+   dcn_get_soc_clks(
+   dc, &min_fclk_khz, &min_dcfclk_khz, &socclk_khz);
DC_FP_END();
+   dcn_bw_notify_pplib_of_wm_ranges(
+   dc, min_fclk_khz, min_dcfclk_khz, socclk_khz);
}
  
  	if (dc->debug.sanity_checks)

@@ -3030,6 +3034,7 @@ void dcn10_optimize_bandwidth(
  {
struct dce_hwseq *hws = dc->hwseq;
struct hubbub *hubbub = dc->res_pool->hubbub;
+   int min_fclk_khz, min_dcfclk_khz, socclk_khz;
  
  	if (dc->debug.sanity_checks)

hws->funcs.verify_allow_pstate_change_high(dc);
@@ -3053,8 +3058,11 @@ void dcn10_optimize_bandwidth(
  
  	if (dc->debug.pplib_wm_report_mode == WM_REPORT_OVERRIDE) {

DC_FP_START();
-   dcn_bw_notify_pplib_of_wm_ranges(dc);
+   dcn_get_soc_clks(
+   dc, &min_fclk_khz, &min_dcfclk_khz, &socclk_khz);