[PATCH] drm/amdgpu/pm: Fix the error of pwm1_enable setting

2024-02-29 Thread Ma Jun
Fix the incorrect pwm_mode value used for the
pwm1_enable setting

Signed-off-by: Ma Jun 
---
 drivers/gpu/drm/amd/pm/amdgpu_pm.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c 
b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
index 9e70c41ad98f..7cc5cd7616b1 100644
--- a/drivers/gpu/drm/amd/pm/amdgpu_pm.c
+++ b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
@@ -2582,6 +2582,7 @@ static ssize_t amdgpu_hwmon_set_pwm1_enable(struct device 
*dev,
struct amdgpu_device *adev = dev_get_drvdata(dev);
int err, ret;
int value;
+   u32 pwm_mode;
 
if (amdgpu_in_reset(adev))
return -EPERM;
@@ -2592,13 +2593,22 @@ static ssize_t amdgpu_hwmon_set_pwm1_enable(struct 
device *dev,
if (err)
return err;
 
+   if (value == 0)
+   pwm_mode = AMD_FAN_CTRL_NONE;
+   else if (value == 1)
+   pwm_mode = AMD_FAN_CTRL_MANUAL;
+   else if (value == 2)
+   pwm_mode = AMD_FAN_CTRL_AUTO;
+   else
+   return -EINVAL;
+
ret = pm_runtime_get_sync(adev_to_drm(adev)->dev);
if (ret < 0) {
pm_runtime_put_autosuspend(adev_to_drm(adev)->dev);
return ret;
}
 
-   ret = amdgpu_dpm_set_fan_control_mode(adev, value);
+   ret = amdgpu_dpm_set_fan_control_mode(adev, pwm_mode);
 
pm_runtime_mark_last_busy(adev_to_drm(adev)->dev);
pm_runtime_put_autosuspend(adev_to_drm(adev)->dev);
-- 
2.34.1
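
For reference, the 0/1/2 -> AMD_FAN_CTRL_* mapping added above could equally
be written as a switch; a small illustrative sketch only, not part of the
patch itself:

	u32 pwm_mode;

	/* map the sysfs pwm1_enable value onto the fan control mode */
	switch (value) {
	case 0:
		pwm_mode = AMD_FAN_CTRL_NONE;	/* no fan speed control */
		break;
	case 1:
		pwm_mode = AMD_FAN_CTRL_MANUAL;	/* manual duty cycle via pwm1 */
		break;
	case 2:
		pwm_mode = AMD_FAN_CTRL_AUTO;	/* firmware-controlled */
		break;
	default:
		return -EINVAL;
	}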



RE: [PATCH] drm/amdkfd: fix shift out of bounds about gpu debug

2024-02-29 Thread Kim, Jonathan
[Public]

> -Original Message-
> From: Zhang, Jesse(Jie) 
> Sent: Friday, March 1, 2024 12:50 AM
> To: Kim, Jonathan ; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Kuehling, Felix
> ; Zhang, Yifan 
> Subject: RE: [PATCH] drm/amdkfd: fix shift out of bounds about gpu debug
>
> [Public]
>
> Hi Jon,
>
> -Original Message-
> From: Kim, Jonathan 
> Sent: Thursday, February 29, 2024 11:58 PM
> To: Zhang, Jesse(Jie) ; amd-
> g...@lists.freedesktop.org
> Cc: Deucher, Alexander ; Kuehling, Felix
> ; Zhang, Yifan ; Zhang,
> Jesse(Jie) ; Zhang, Jesse(Jie)
> 
> Subject: RE: [PATCH] drm/amdkfd: fix shift out of bounds about gpu debug
>
> [Public]
>
> I think this was discussed in another thread.
> Exception codes should be range checked prior to applying the mask.  Raising
> null events to the debugger or runtime isn't useful.
> I haven't gotten around to fixing this yet.  I should have time this week.
> Just to double check, the out of bounds shift is because of a CP interrupt 
> that
> generates a null exception code?
>
> [Zhang, Jesse(Jie)] Thanks for your reminder, I saw that discussion.
> In this interrupt, other fields (such as source id, client id, pasid) are
> correct.
> Only the value of context_id0 (0xf) is invalid.
> How about doing the check like this:
>   } else if (source_id == SOC15_INTSRC_CP_BAD_OPCODE) {
> +   /* filter out the invalid context_id0 */
> +   if (!(context_id0 >> KFD_DEBUG_CP_BAD_OP_ECODE_SHIFT) 
> ||
> +   (context_id0 >> 
> KFD_DEBUG_CP_BAD_OP_ECODE_SHIFT) >
> EC_MAX)
> +   return;

The range check should probably flag any exception prefixed with
EC_QUEUE_PACKET_* as valid, as defined in kfd_dbg_trap_exception_code:
https://github.com/torvalds/linux/blob/master/include/uapi/linux/kfd_ioctl.h#L857
+ Jay to confirm this is the correct exception range for CP_BAD_OPCODE

If that's the case, then I think we can define a 
KFD_DBG_EC_TYPE_IS_QUEUE_PACKET macro similar to:
https://github.com/torvalds/linux/blob/master/include/uapi/linux/kfd_ioctl.h#L917

That way, KFD process interrupts v9, v10, v11 can use that check prior to mask 
conversion and user space may find it useful as well.
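
A rough sketch of what that could look like (illustrative only, not the final
patch: KFD_DBG_EC_IS_VALID, KFD_DBG_EC_TYPE_IS_QUEUE_PACKET and
KFD_EC_MASK_PACKET are hypothetical names, with KFD_EC_MASK_PACKET OR-ing
together the EC_QUEUE_PACKET_* codes analogous to the existing
KFD_EC_MASK_QUEUE, and EC_NONE/EC_MAX assumed to bound the enum):

	/* valid range check keeps KFD_EC_MASK() from shifting by (0 - 1) */
	#define KFD_DBG_EC_IS_VALID(ecode) \
		((ecode) > EC_NONE && (ecode) < EC_MAX)
	#define KFD_DBG_EC_TYPE_IS_QUEUE_PACKET(ecode)	\
		(KFD_DBG_EC_IS_VALID(ecode) &&		\
		 !!(KFD_EC_MASK(ecode) & KFD_EC_MASK_PACKET))

The v10 handler could then do the range check prior to the mask conversion,
for example:

	} else if (source_id == SOC15_INTSRC_CP_BAD_OPCODE) {
		u32 ecode = KFD_DEBUG_CP_BAD_OP_ECODE(context_id0);

		/* drop CP interrupts that don't carry a packet exception code */
		if (!KFD_DBG_EC_TYPE_IS_QUEUE_PACKET(ecode))
			return;

		kfd_set_dbg_ev_from_interrupt(dev, pasid,
					      KFD_DEBUG_DOORBELL_ID(context_id0),
					      KFD_EC_MASK(ecode),
					      NULL, 0);
	}

The trailing arguments simply mirror the existing
kfd_set_dbg_ev_from_interrupt() call in kfd_int_process_v10.c.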

Jon
> kfd_set_dbg_ev_from_interrupt(dev, pasid,
> KFD_DEBUG_DOORBELL_ID(context_id0),
>
> KFD_EC_MASK(KFD_DEBUG_CP_BAD_OP_ECODE(context_id0)),
>  Thanks
>  Jesse
> Jon
>
> > -Original Message-
> > From: Jesse Zhang 
> > Sent: Thursday, February 29, 2024 3:45 AM
> > To: amd-gfx@lists.freedesktop.org
> > Cc: Deucher, Alexander ; Kuehling, Felix
> > ; Kim, Jonathan ;
> Zhang,
> > Yifan ; Zhang, Jesse(Jie)
> ;
> > Zhang, Jesse(Jie) 
> > Subject: [PATCH] drm/amdkfd: fix shift out of bounds about gpu debug
> >
> >  the issue is :
> > [  388.151802] UBSAN: shift-out-of-bounds in
> > drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_int_process_v10.c:346:5
> > [  388.151807] shift exponent 4294967295 is too large for 64-bit type
> > 'long long unsigned int'
> > [  388.151812] CPU: 6 PID: 347 Comm: kworker/6:1H Tainted: GE
> > 6.7.0+ #1
> > [  388.151814] Hardware name: AMD Splinter/Splinter-GNR, BIOS
> > WS54117N_140 01/16/2024
> > [  388.151816] Workqueue: KFD IH interrupt_wq [amdgpu] [  388.152084]
> > Call Trace:
> > [  388.152086]  
> > [  388.152089]  dump_stack_lvl+0x4c/0x70 [  388.152096]
> > dump_stack+0x14/0x20 [  388.152098]  ubsan_epilogue+0x9/0x40 [
> > 388.152101]  __ubsan_handle_shift_out_of_bounds+0x113/0x170
> > [  388.152103]  ? vprintk+0x40/0x70
> > [  388.152106]  ? swsusp_check+0x131/0x190 [  388.152110]
> > event_interrupt_wq_v10.cold+0x16/0x1e [amdgpu] [  388.152411]  ?
> > raw_spin_rq_unlock+0x14/0x40 [  388.152415]  ?
> > finish_task_switch+0x85/0x2a0 [  388.152417]  ?
> > kfifo_copy_out+0x5f/0x70 [  388.152420]  interrupt_wq+0xb2/0x120
> > [amdgpu] [  388.152642]  ? interrupt_wq+0xb2/0x120 [amdgpu] [
> > 388.152728]  process_scheduled_works+0x9a/0x3a0
> > [  388.152731]  ? __pfx_worker_thread+0x10/0x10 [  388.152732]
> > worker_thread+0x15f/0x2d0 [  388.152733]  ?
> > __pfx_worker_thread+0x10/0x10 [  388.152734]  kthread+0xfb/0x130 [
> > 388.152735]  ? __pfx_kthread+0x10/0x10 [  388.152736]
> > ret_from_fork+0x3d/0x60 [  388.152738]  ? __pfx_kthread+0x10/0x10 [
> > 388.152739]  ret_from_fork_asm+0x1b/0x30 [  388.152742]  
> >
> > Signed-off-by: Jesse Zhang 
> > ---
> >  include/uapi/linux/kfd_ioctl.h | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/include/uapi/linux/kfd_ioctl.h
> > b/include/uapi/linux/kfd_ioctl.h index 9ce46edc62a5..3d5867df17e8
> > 100644
> > --- a/include/uapi/linux/kfd_ioctl.h
> > +++ b/include/uapi/linux/kfd_ioctl.h
> > @@ -887,7 +887,7 @@ enum kfd_dbg_trap_exception_code {  };
> >
> >  /* Mask generated by ecode in kfd_dbg_trap_exception_code */
> > -#define KFD_EC_MASK(ecode)   (1ULL << (ecode - 1))
> > +#define KFD_EC_MASK(ecode)   (ecode ? (1ULL << (ecode - 1)) : 0ULL)

RE: [PATCH] drm/amdkfd: fix shift out of bounds about gpu debug

2024-02-29 Thread Zhang, Jesse(Jie)
[Public]

Hi Jon,

-Original Message-
From: Kim, Jonathan 
Sent: Thursday, February 29, 2024 11:58 PM
To: Zhang, Jesse(Jie) ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Kuehling, Felix 
; Zhang, Yifan ; Zhang, 
Jesse(Jie) ; Zhang, Jesse(Jie) 
Subject: RE: [PATCH] drm/amdkfd: fix shift out of bounds about gpu debug

[Public]

I think this was discussed in another thread.
Exception codes should be range checked prior to applying the mask.  Raising 
null events to the debugger or runtime isn't useful.
I haven't gotten around to fixing this yet.  I should have time this week.
Just to double check, the out of bounds shift is because of a CP interrupt that 
generates a null exception code?

[Zhang, Jesse(Jie)] Thanks for your reminder, I saw that discussion.
In this interrupt, other fields (such as source id, client id, pasid) are
correct.
Only the value of context_id0 (0xf) is invalid.
How about doing the check like this:
  } else if (source_id == SOC15_INTSRC_CP_BAD_OPCODE) {
+   /* filter out the invalid context_id0 */
+   if (!(context_id0 >> KFD_DEBUG_CP_BAD_OP_ECODE_SHIFT) ||
+   (context_id0 >> 
KFD_DEBUG_CP_BAD_OP_ECODE_SHIFT) > EC_MAX)
+   return;
kfd_set_dbg_ev_from_interrupt(dev, pasid,
KFD_DEBUG_DOORBELL_ID(context_id0),

KFD_EC_MASK(KFD_DEBUG_CP_BAD_OP_ECODE(context_id0)),
 Thanks
 Jesse
Jon

> -Original Message-
> From: Jesse Zhang 
> Sent: Thursday, February 29, 2024 3:45 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Kuehling, Felix
> ; Kim, Jonathan ; Zhang,
> Yifan ; Zhang, Jesse(Jie) ;
> Zhang, Jesse(Jie) 
> Subject: [PATCH] drm/amdkfd: fix shift out of bounds about gpu debug
>
>  the issue is :
> [  388.151802] UBSAN: shift-out-of-bounds in
> drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_int_process_v10.c:346:5
> [  388.151807] shift exponent 4294967295 is too large for 64-bit type
> 'long long unsigned int'
> [  388.151812] CPU: 6 PID: 347 Comm: kworker/6:1H Tainted: GE
> 6.7.0+ #1
> [  388.151814] Hardware name: AMD Splinter/Splinter-GNR, BIOS
> WS54117N_140 01/16/2024
> [  388.151816] Workqueue: KFD IH interrupt_wq [amdgpu] [  388.152084]
> Call Trace:
> [  388.152086]  
> [  388.152089]  dump_stack_lvl+0x4c/0x70 [  388.152096]
> dump_stack+0x14/0x20 [  388.152098]  ubsan_epilogue+0x9/0x40 [
> 388.152101]  __ubsan_handle_shift_out_of_bounds+0x113/0x170
> [  388.152103]  ? vprintk+0x40/0x70
> [  388.152106]  ? swsusp_check+0x131/0x190 [  388.152110]
> event_interrupt_wq_v10.cold+0x16/0x1e [amdgpu] [  388.152411]  ?
> raw_spin_rq_unlock+0x14/0x40 [  388.152415]  ?
> finish_task_switch+0x85/0x2a0 [  388.152417]  ?
> kfifo_copy_out+0x5f/0x70 [  388.152420]  interrupt_wq+0xb2/0x120
> [amdgpu] [  388.152642]  ? interrupt_wq+0xb2/0x120 [amdgpu] [
> 388.152728]  process_scheduled_works+0x9a/0x3a0
> [  388.152731]  ? __pfx_worker_thread+0x10/0x10 [  388.152732]
> worker_thread+0x15f/0x2d0 [  388.152733]  ?
> __pfx_worker_thread+0x10/0x10 [  388.152734]  kthread+0xfb/0x130 [
> 388.152735]  ? __pfx_kthread+0x10/0x10 [  388.152736]
> ret_from_fork+0x3d/0x60 [  388.152738]  ? __pfx_kthread+0x10/0x10 [
> 388.152739]  ret_from_fork_asm+0x1b/0x30 [  388.152742]  
>
> Signed-off-by: Jesse Zhang 
> ---
>  include/uapi/linux/kfd_ioctl.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/uapi/linux/kfd_ioctl.h
> b/include/uapi/linux/kfd_ioctl.h index 9ce46edc62a5..3d5867df17e8
> 100644
> --- a/include/uapi/linux/kfd_ioctl.h
> +++ b/include/uapi/linux/kfd_ioctl.h
> @@ -887,7 +887,7 @@ enum kfd_dbg_trap_exception_code {  };
>
>  /* Mask generated by ecode in kfd_dbg_trap_exception_code */
> -#define KFD_EC_MASK(ecode)   (1ULL << (ecode - 1))
> +#define KFD_EC_MASK(ecode)   (ecode ? (1ULL << (ecode - 1)) : 0ULL)
>
>  /* Masks for exception code type checks below */  #define
> KFD_EC_MASK_QUEUE
>   (KFD_EC_MASK(EC_QUEUE_WAVE_ABORT) | \
> --
> 2.25.1




[PATCH AUTOSEL 6.7 24/24] drm/amd/display: fix input states translation error for dcn35 & dcn351

2024-02-29 Thread Sasha Levin
From: Swapnil Patel 

[ Upstream commit 27a6c49394b1a203beeb94752c9a1d6318f24ddf ]

[Why]
Currently there is an error while translating input clock states into
output clock states. The highest fclk setting from the output states is
being dropped because of this error.

[How]
For dcn35 and dcn351, make output_states equal to input states.

Reviewed-by: Charlene Liu 
Acked-by: Rodrigo Siqueira 
Tested-by: Daniel Wheeler 
Signed-off-by: Swapnil Patel 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 .../drm/amd/display/dc/dml2/dml2_translation_helper.c| 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml2_translation_helper.c 
b/drivers/gpu/drm/amd/display/dc/dml2/dml2_translation_helper.c
index 2c379be19aa84..16452dae4acac 100644
--- a/drivers/gpu/drm/amd/display/dc/dml2/dml2_translation_helper.c
+++ b/drivers/gpu/drm/amd/display/dc/dml2/dml2_translation_helper.c
@@ -398,7 +398,6 @@ void dml2_init_soc_states(struct dml2_context *dml2, const 
struct dc *in_dc,
/* Copy clocks tables entries, if available */
if (dml2->config.bbox_overrides.clks_table.num_states) {
p->in_states->num_states = 
dml2->config.bbox_overrides.clks_table.num_states;
-
for (i = 0; i < 
dml2->config.bbox_overrides.clks_table.num_entries_per_clk.num_dcfclk_levels; 
i++) {
p->in_states->state_array[i].dcfclk_mhz = 
dml2->config.bbox_overrides.clks_table.clk_entries[i].dcfclk_mhz;
}
@@ -437,6 +436,14 @@ void dml2_init_soc_states(struct dml2_context *dml2, const 
struct dc *in_dc,
}
 
dml2_policy_build_synthetic_soc_states(s, p);
+   if (dml2->v20.dml_core_ctx.project == dml_project_dcn35 ||
+   dml2->v20.dml_core_ctx.project == dml_project_dcn351) {
+   // Override last out_state with data from last in_state
+   // This will ensure that out_state contains max fclk
+   memcpy(&p->out_states->state_array[p->out_states->num_states - 
1],
+   
&p->in_states->state_array[p->in_states->num_states - 1],
+   sizeof(struct soc_state_bounding_box_st));
+   }
 }
 
 void dml2_translate_ip_params(const struct dc *in, struct ip_params_st *out)
-- 
2.43.0



Re: [PATCH] drm/amdgpu: remove misleading amdgpu_pmops_runtime_idle() comment

2024-02-29 Thread Alex Deucher
Applied.  Thanks!

On Thu, Feb 29, 2024 at 1:11 PM Bjorn Helgaas  wrote:
>
> From: Bjorn Helgaas 
>
> After 4020c2280233 ("drm/amdgpu: don't runtime suspend if there are
> displays attached (v3)"), "ret" is unconditionally set later before being
> used, so there's no point in initializing it and the associated comment is no
> longer meaningful.
>
> Remove the comment and the unnecessary initialization.
>
> Signed-off-by: Bjorn Helgaas 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index cc69005f5b46..68416e2a9130 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -2744,8 +2744,7 @@ static int amdgpu_pmops_runtime_idle(struct device *dev)
>  {
> struct drm_device *drm_dev = dev_get_drvdata(dev);
> struct amdgpu_device *adev = drm_to_adev(drm_dev);
> -   /* we don't want the main rpm_idle to call suspend - we want to 
> autosuspend */
> -   int ret = 1;
> +   int ret;
>
> if (adev->pm.rpm_mode == AMDGPU_RUNPM_NONE) {
> pm_runtime_forbid(dev);
> --
> 2.34.1
>


[PATCH] drm/amdgpu: remove misleading amdgpu_pmops_runtime_idle() comment

2024-02-29 Thread Bjorn Helgaas
From: Bjorn Helgaas 

After 4020c2280233 ("drm/amdgpu: don't runtime suspend if there are
displays attached (v3)"), "ret" is unconditionally set later before being
used, so there's no point in initializing it and the associated comment is no
longer meaningful.

Remove the comment and the unnecessary initialization.

Signed-off-by: Bjorn Helgaas 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index cc69005f5b46..68416e2a9130 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -2744,8 +2744,7 @@ static int amdgpu_pmops_runtime_idle(struct device *dev)
 {
struct drm_device *drm_dev = dev_get_drvdata(dev);
struct amdgpu_device *adev = drm_to_adev(drm_dev);
-   /* we don't want the main rpm_idle to call suspend - we want to 
autosuspend */
-   int ret = 1;
+   int ret;
 
if (adev->pm.rpm_mode == AMDGPU_RUNPM_NONE) {
pm_runtime_forbid(dev);
-- 
2.34.1



Re: [PATCH v3 3/3] drm/amdgpu: sync page table freeing with tlb flush

2024-02-29 Thread Sharma, Shashank



On 26/02/2024 17:52, Philip Yang wrote:



On 2024-02-23 08:42, Shashank Sharma wrote:

This patch:
- adds a new list in amdgpu_vm to hold the VM PT entries being freed
- waits for the TLB flush using the vm->tlb_flush_fence
- actually frees the PT BOs

V2: rebase
V3: Do not attach the tlb_fence to the entries, rather add the entries
 to a list and delay their freeing (Christian)

Cc: Christian König
Cc: Alex Deucher
Cc: Felix Kuehling
Cc: Rajneesh Bhardwaj
Signed-off-by: Shashank Sharma
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c|  6 +++
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h|  6 +++
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 51 ---
  3 files changed, 58 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 67c690044b97..eebb73f2c2ef 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -939,6 +939,10 @@ int amdgpu_vm_update_range(struct amdgpu_device *adev, 
struct amdgpu_vm *vm,
/* Makes sure no PD/PT is freed before the flush */
dma_resv_add_fence(vm->root.bo->tbo.base.resv, *fence,
   DMA_RESV_USAGE_BOOKKEEP);
+
+   mutex_lock(&vm->tlb_fence_lock);
+   vm->tlb_fence_last = *fence;
+   mutex_unlock(&vm->tlb_fence_lock);
}
  
  	amdgpu_res_first(pages_addr ? NULL : res, offset,

@@ -2212,6 +2216,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct 
amdgpu_vm *vm,
INIT_LIST_HEAD(&vm->freed);
INIT_LIST_HEAD(&vm->done);
INIT_LIST_HEAD(&vm->pt_freed);
+   INIT_LIST_HEAD(&vm->tlb_flush_waitlist);
INIT_WORK(&vm->pt_free_work, amdgpu_vm_pt_free_work);
INIT_KFIFO(vm->faults);
  
@@ -2244,6 +2249,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,

vm->last_unlocked = dma_fence_get_stub();
vm->generation = 0;
  
+	mutex_init(&vm->tlb_fence_lock);

mutex_init(&vm->eviction_lock);
vm->evicting = false;
vm->tlb_fence_context = dma_fence_context_alloc(1);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 8e6fd25d07b7..77f10ed80973 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -334,6 +334,10 @@ struct amdgpu_vm {
uint64_t*tlb_seq_cpu_addr;
uint64_ttlb_fence_context;
  
+	struct mutex 		tlb_fence_lock;

+   struct dma_fence*tlb_fence_last;
+   struct list_headtlb_flush_waitlist;
+
atomic64_t  kfd_last_flushed_seq;
  
  	/* How many times we had to re-generate the page tables */

@@ -379,6 +383,8 @@ struct amdgpu_vm {
  
  	/* cached fault info */

struct amdgpu_vm_fault_info fault_info;
+
+   int count_bos;
  };
  
  struct amdgpu_vm_manager {

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
index 95dc0afdaffb..57ea95c5c085 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
@@ -643,13 +643,13 @@ static void amdgpu_vm_pt_free(struct amdgpu_vm_bo_base 
*entry)
if (!entry->bo)
return;
  
-	entry->bo->vm_bo = NULL;

shadow = amdgpu_bo_shadowed(entry->bo);
if (shadow) {
ttm_bo_set_bulk_move(&shadow->tbo, NULL);
amdgpu_bo_unref(&shadow);
}
ttm_bo_set_bulk_move(&entry->bo->tbo, NULL);
+   entry->bo->vm_bo = NULL;
  
  	spin_lock(&entry->vm->status_lock);

list_del(&entry->vm_status);
@@ -657,6 +657,38 @@ static void amdgpu_vm_pt_free(struct amdgpu_vm_bo_base 
*entry)
amdgpu_bo_unref(&entry->bo);
  }
  
+static void amdgpu_vm_pt_flush_waitlist(struct amdgpu_vm *vm)

+{
+   struct amdgpu_vm_bo_base *entry, *next;
+   LIST_HEAD(tlb_flush_waitlist);
+
+   if (!vm || list_empty(&vm->tlb_flush_waitlist))
+   return;
+
+   /* Wait for pending TLB flush before freeing PT BOs */
+   mutex_lock(&vm->tlb_fence_lock);
+   if (vm->tlb_fence_last && !dma_fence_is_signaled(vm->tlb_fence_last)) {
+   if (dma_fence_wait_timeout(vm->tlb_fence_last, false,
+  MAX_SCHEDULE_TIMEOUT) <= 0) {
+   DRM_ERROR("Timedout waiting for TLB flush, not freeing PT 
BOs\n");
+   mutex_unlock(&vm->tlb_fence_lock);
+   return;
+   }
+
+   vm->tlb_fence_last = NULL;
+   }
+
+   /* Save the waitlist locally and reset the flushlist */
+   list_splice_init(&vm->tlb_flush_waitlist, &tlb_flush_waitlist);
+   mutex_unlock(&vm->tlb_fence_lock);
+
+   /* Now free the entries */
+   list_for_each_entry_safe(entry, next, &tlb_flush_waitlist, vm_status) {
+   if (entry)
+ 

RE: [PATCH] drm/amdkfd: fix shift out of bounds about gpu debug

2024-02-29 Thread Kim, Jonathan
[Public]

I think this was discussed in another thread.
Exception codes should be range checked prior to applying the mask.  Raising 
null events to the debugger or runtime isn't useful.
I haven't gotten around to fixing this yet.  I should have time this week.
Just to double check, the out of bounds shift is because of a CP interrupt that 
generates a null exception code?

Jon

> -Original Message-
> From: Jesse Zhang 
> Sent: Thursday, February 29, 2024 3:45 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Kuehling, Felix
> ; Kim, Jonathan ;
> Zhang, Yifan ; Zhang, Jesse(Jie)
> ; Zhang, Jesse(Jie) 
> Subject: [PATCH] drm/amdkfd: fix shift out of bounds about gpu debug
>
>  the issue is :
> [  388.151802] UBSAN: shift-out-of-bounds in
> drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_int_process_v10.c:346:5
> [  388.151807] shift exponent 4294967295 is too large for 64-bit type 'long
> long unsigned int'
> [  388.151812] CPU: 6 PID: 347 Comm: kworker/6:1H Tainted: GE
> 6.7.0+ #1
> [  388.151814] Hardware name: AMD Splinter/Splinter-GNR, BIOS
> WS54117N_140 01/16/2024
> [  388.151816] Workqueue: KFD IH interrupt_wq [amdgpu]
> [  388.152084] Call Trace:
> [  388.152086]  
> [  388.152089]  dump_stack_lvl+0x4c/0x70
> [  388.152096]  dump_stack+0x14/0x20
> [  388.152098]  ubsan_epilogue+0x9/0x40
> [  388.152101]  __ubsan_handle_shift_out_of_bounds+0x113/0x170
> [  388.152103]  ? vprintk+0x40/0x70
> [  388.152106]  ? swsusp_check+0x131/0x190
> [  388.152110]  event_interrupt_wq_v10.cold+0x16/0x1e [amdgpu]
> [  388.152411]  ? raw_spin_rq_unlock+0x14/0x40
> [  388.152415]  ? finish_task_switch+0x85/0x2a0
> [  388.152417]  ? kfifo_copy_out+0x5f/0x70
> [  388.152420]  interrupt_wq+0xb2/0x120 [amdgpu]
> [  388.152642]  ? interrupt_wq+0xb2/0x120 [amdgpu]
> [  388.152728]  process_scheduled_works+0x9a/0x3a0
> [  388.152731]  ? __pfx_worker_thread+0x10/0x10
> [  388.152732]  worker_thread+0x15f/0x2d0
> [  388.152733]  ? __pfx_worker_thread+0x10/0x10
> [  388.152734]  kthread+0xfb/0x130
> [  388.152735]  ? __pfx_kthread+0x10/0x10
> [  388.152736]  ret_from_fork+0x3d/0x60
> [  388.152738]  ? __pfx_kthread+0x10/0x10
> [  388.152739]  ret_from_fork_asm+0x1b/0x30
> [  388.152742]  
>
> Signed-off-by: Jesse Zhang 
> ---
>  include/uapi/linux/kfd_ioctl.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
> index 9ce46edc62a5..3d5867df17e8 100644
> --- a/include/uapi/linux/kfd_ioctl.h
> +++ b/include/uapi/linux/kfd_ioctl.h
> @@ -887,7 +887,7 @@ enum kfd_dbg_trap_exception_code {
>  };
>
>  /* Mask generated by ecode in kfd_dbg_trap_exception_code */
> -#define KFD_EC_MASK(ecode)   (1ULL << (ecode - 1))
> +#define KFD_EC_MASK(ecode)   (ecode ? (1ULL << (ecode - 1)) : 0ULL)
>
>  /* Masks for exception code type checks below */
>  #define KFD_EC_MASK_QUEUE
>   (KFD_EC_MASK(EC_QUEUE_WAVE_ABORT) | \
> --
> 2.25.1



Re: [PATCH 1/2] Revert "drm/amd: Remove freesync video mode amdgpu parameter"

2024-02-29 Thread Alex Deucher
On Wed, Feb 28, 2024 at 9:45 AM Christian König
 wrote:
>
> On 28.02.24 at 15:23, Alex Deucher wrote:
> > On Wed, Feb 28, 2024 at 2:03 AM Christian König
> >  wrote:
> >> On 27.02.24 at 19:48, Alex Deucher wrote:
> >>> This reverts commit e94e787e37b99645e7c02d20d0a1ba0f8a18a82a.
> >>>
> >>> This conflicts with how compositors want to handle VRR.  Now
> >>> that compositors actually handle VRR, we probably don't need
> >>> freesync video.
> >>>
> >>> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2985
> >> Scratching my head what actually happens here? Doesn't the problem then
> >> just depend on a module parameter?
> > Yes.  The problem is that when freesync video is enabled, compositors
> > don't know which modes are actual modes and which are a VRR video
> > mode.  There are still customers that want the VRR video modes for smooth
> > video playback, but compositors don't want this by default.  I guess
> > the alternative is to just drop this feature altogether now that
> > compositors and media players are starting to support this properly.
>
> That's what I would suggest as well.
>
> As far as I can see adding those modes is actually buggy behavior and we
> need to avoid it.

Well, they work as expected for the use case they were added for,
video playback with different refresh rates without a modeset.  I'll
apply them as is for now and then work on a set of patches to remove
the functionality.

Alex


>
> Christian.
>
> >
> > Alex
> >
> >> Regards,
> >> Christian.
> >>
> >>> Signed-off-by: Alex Deucher 
> >>> ---
> >>>drivers/gpu/drm/amd/amdgpu/amdgpu.h |  1 +
> >>>drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 27 +
> >>>2 files changed, 28 insertions(+)
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
> >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> >>> index 0e365cadcc3fc..925026c183f41 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> >>> @@ -194,6 +194,7 @@ extern int amdgpu_emu_mode;
> >>>extern uint amdgpu_smu_memory_pool_size;
> >>>extern int amdgpu_smu_pptable_id;
> >>>extern uint amdgpu_dc_feature_mask;
> >>> +extern uint amdgpu_freesync_vid_mode;
> >>>extern uint amdgpu_dc_debug_mask;
> >>>extern uint amdgpu_dc_visual_confirm;
> >>>extern int amdgpu_dm_abm_level;
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
> >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> >>> index 15a8a64fc4e28..82b154b103f43 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> >>> @@ -199,6 +199,7 @@ int amdgpu_mes_kiq;
> >>>int amdgpu_noretry = -1;
> >>>int amdgpu_force_asic_type = -1;
> >>>int amdgpu_tmz = -1; /* auto */
> >>> +uint amdgpu_freesync_vid_mode;
> >>>int amdgpu_reset_method = -1; /* auto */
> >>>int amdgpu_num_kcq = -1;
> >>>int amdgpu_smartshift_bias;
> >>> @@ -883,6 +884,32 @@ module_param_named(damageclips, amdgpu_damage_clips, 
> >>> int, 0444);
> >>>MODULE_PARM_DESC(tmz, "Enable TMZ feature (-1 = auto (default), 0 = 
> >>> off, 1 = on)");
> >>>module_param_named(tmz, amdgpu_tmz, int, 0444);
> >>>
> >>> +/**
> >>> + * DOC: freesync_video (uint)
> >>> + * Enable the optimization to adjust front porch timing to achieve 
> >>> seamless
> >>> + * mode change experience when setting a freesync supported mode for 
> >>> which full
> >>> + * modeset is not needed.
> >>> + *
> >>> + * The Display Core will add a set of modes derived from the base 
> >>> FreeSync
> >>> + * video mode into the corresponding connector's mode list based on 
> >>> commonly
> >>> + * used refresh rates and VRR range of the connected display, when users 
> >>> enable
> >>> + * this feature. From the userspace perspective, they can see a seamless 
> >>> mode
> >>> + * change experience when the change between different refresh rates 
> >>> under the
> >>> + * same resolution. Additionally, userspace applications such as Video 
> >>> playback
> >>> + * can read this modeset list and change the refresh rate based on the 
> >>> video
> >>> + * frame rate. Finally, the userspace can also derive an appropriate 
> >>> mode for a
> >>> + * particular refresh rate based on the FreeSync Mode and add it to the
> >>> + * connector's mode list.
> >>> + *
> >>> + * Note: This is an experimental feature.
> >>> + *
> >>> + * The default value: 0 (off).
> >>> + */
> >>> +MODULE_PARM_DESC(
> >>> + freesync_video,
> >>> + "Enable freesync modesetting optimization feature (0 = off 
> >>> (default), 1 = on)");
> >>> +module_param_named(freesync_video, amdgpu_freesync_vid_mode, uint, 0444);
> >>> +
> >>>/**
> >>> * DOC: reset_method (int)
> >>> * GPU reset method (-1 = auto (default), 0 = legacy, 1 = mode0, 2 = 
> >>> mode1, 3 = mode2, 4 = baco)
>


Re: [PATCH] drm/amd/display: check dc_link before dereferencing

2024-02-29 Thread Alex Deucher
Applied.  Thanks!

On Tue, Feb 27, 2024 at 2:08 PM Melissa Wen  wrote:
>
> drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:6683 
> amdgpu_dm_connector_funcs_force()
> warn: variable dereferenced before check 'dc_link' (see line 6663)
>
> Fixes: 967176179215 ("drm/amd/display: fix null-pointer dereference on edid 
> reading")
> Reported-by: Dan Carpenter 
> Signed-off-by: Melissa Wen 
> ---
>  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> index 32efce81a5a7..46dd06e8fc7e 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> @@ -6653,7 +6653,7 @@ static void amdgpu_dm_connector_funcs_force(struct 
> drm_connector *connector)
> struct edid *edid;
> struct i2c_adapter *ddc;
>
> -   if (dc_link->aux_mode)
> +   if (dc_link && dc_link->aux_mode)
> ddc = &aconnector->dm_dp_aux.aux.ddc;
> else
> ddc = &aconnector->i2c->base;
> --
> 2.43.0
>


Re: [PATCH 2/2] drm/amdgpu: workaround to avoid SET_Q_MODE packets

2024-02-29 Thread Alex Deucher
On Thu, Feb 29, 2024 at 9:58 AM Christian König
 wrote:
>
> It turned out that executing the SET_Q_MODE packet on every submission
> creates too much overhead.
>
> Implement a workaround which allows skipping the SET_Q_MODE packet if
> subsequent submissions all use the same parameters.
>
> Signed-off-by: Christian König 

Series is:
Reviewed-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |   3 +
>  drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c   | 104 +++
>  2 files changed, 92 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> index 756330767909..582053f1cd56 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> @@ -285,6 +285,9 @@ struct amdgpu_ring {
> unsignedcond_exe_offs;
> u64 cond_exe_gpu_addr;
> volatile u32*cond_exe_cpu_addr;
> +   unsigned intset_q_mode_offs;
> +   volatile u32*set_q_mode_ptr;
> +   u64 set_q_mode_token;
> unsignedvm_hub;
> unsignedvm_inv_eng;
> struct dma_fence*vmid_wait;
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> index 2ccbdee570cf..6e6b6eff48e2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> @@ -5461,6 +5461,11 @@ static void gfx_v11_0_ring_emit_vm_flush(struct 
> amdgpu_ring *ring,
> amdgpu_ring_write(ring, PACKET3(PACKET3_PFP_SYNC_ME, 0));
> amdgpu_ring_write(ring, 0x0);
> }
> +
> +   /* Make sure that we can't skip the SET_Q_MODE packets when the VM
> +* changed in any way.
> +*/
> +   ring->set_q_mode_ptr = NULL;
>  }
>
>  static void gfx_v11_0_ring_emit_fence_kiq(struct amdgpu_ring *ring, u64 addr,
> @@ -5510,16 +5515,81 @@ static void gfx_v11_0_ring_emit_cntxcntl(struct 
> amdgpu_ring *ring,
> amdgpu_ring_write(ring, 0);
>  }
>
> +static unsigned gfx_v11_0_ring_emit_init_cond_exec(struct amdgpu_ring *ring,
> +  uint64_t addr)
> +{
> +   unsigned ret;
> +
> +   amdgpu_ring_write(ring, PACKET3(PACKET3_COND_EXEC, 3));
> +   amdgpu_ring_write(ring, lower_32_bits(addr));
> +   amdgpu_ring_write(ring, upper_32_bits(addr));
> +   /* discard following DWs if *cond_exec_gpu_addr==0 */
> +   amdgpu_ring_write(ring, 0);
> +   ret = ring->wptr & ring->buf_mask;
> +   /* patch dummy value later */
> +   amdgpu_ring_write(ring, 0);
> +
> +   return ret;
> +}
> +
>  static void gfx_v11_0_ring_emit_gfx_shadow(struct amdgpu_ring *ring,
>u64 shadow_va, u64 csa_va,
>u64 gds_va, bool init_shadow,
>int vmid)
>  {
> struct amdgpu_device *adev = ring->adev;
> +   unsigned int offs, end;
>
> if (!adev->gfx.cp_gfx_shadow)
> return;
>
> +   /*
> +* The logic here isn't easy to understand because we need to keep 
> state
> +* accross multiple executions of the function as well as between the
> +* CPU and GPU. The general idea is that the newly written GPU command
> +* has a condition on the previous one and only executed if really
> +* necessary.
> +*/
> +
> +   /*
> +* The dw in the NOP controls if the next SET_Q_MODE packet should be
> +* executed or not. Reserve 64bits just to be on the save side.
> +*/
> +   amdgpu_ring_write(ring, PACKET3(PACKET3_NOP, 1));
> +   offs = ring->wptr & ring->buf_mask;
> +
> +   /*
> +* We start with skipping the prefix SET_Q_MODE and always executing
> +* the postfix SET_Q_MODE packet. This is changed below with a
> +* WRITE_DATA command when the postfix executed.
> +*/
> +   amdgpu_ring_write(ring, shadow_va ? 1 : 0);
> +   amdgpu_ring_write(ring, 0);
> +
> +   if (ring->set_q_mode_offs) {
> +   uint64_t addr;
> +
> +   addr = amdgpu_bo_gpu_offset(ring->ring_obj);
> +   addr += ring->set_q_mode_offs << 2;
> +   end = gfx_v11_0_ring_emit_init_cond_exec(ring, addr);
> +   }
> +
> +   /*
> +* When the postfix SET_Q_MODE packet executes we need to make sure 
> that the
> +* next prefix SET_Q_MODE packet executes as well.
> +*/
> +   if (!shadow_va) {
> +   uint64_t addr;
> +
> +   addr = amdgpu_bo_gpu_offset(ring->ring_obj);
> +   addr += offs << 2;
> +   amdgpu_ring_write(ring, PACKET3(PACKET3_WRITE_DATA, 3));
> +   amdgpu_ring_write(ring, WRITE_DATA_DST_SEL(5) | WR_CONFIRM);
> +   amdgp

Re: [RFC PATCH v4 00/42] Color Pipeline API w/ VKMS

2024-02-29 Thread Joshua Ashton




On 2/29/24 10:43, Daniel Vetter wrote:

On Mon, Feb 26, 2024 at 04:10:14PM -0500, Harry Wentland wrote:

This is an RFC set for a color pipeline API, along with a sample
implementation in VKMS. All the key API bits are here. VKMS now
supports two named transfer function colorops and two matrix
colorops. We have IGT tests that check all four of these colorops
with a pixel-by-pixel comparison that checks that these colorops
do what we expect them to do with a +/- 1 8 bpc code point margin.


So vkms is definitely great to make sure the igts are generic enough and
somewhat useful, but ... does steam run on vkms too? I think that would be
a really good test to show that the api we have here is actually useful
for compositors in a cross-driver way, and not just a neat idea that
doesn't survive harsh reality.

And yes I realize that's probably going to be a bunch of work, but I feel
like the color pipeline discussion has dragged around enough in
hypotheticals and concerns that I think it would really help a lot.


I don't think we have ever tested Steam/Gamescope on vkms.

The last time I tried stuff there, there were all the problems with the 
ttm page table tail thing for virtio stuff that made using Steam games + 
Gamescope unfeasible because of Vulkan + bindless, but I have heard those 
are solved now?


I will have to try it again at the weekend and see where it's at.
I am willing to place my bets that some part of the stack will fall over 
relating to modifiers somehow... =P


But yes, testing there would be good too, as we have the full Steam Deck 
OLED HDR color pipeline implemented in shader-based composition 
validated as being 1:1 on a suite of HDR and SDR test images.
(That *will* definitely rely on 3D LUTs being tetrahedrally interpolated 
though)


I'll have a look at this at the weekend and also see about getting a 
Gamescope branch that uses the new wip colorop stuff.


- Joshie 🐸✨



Thoughts?
-Sima



The big new change with v4 is the addition of an amdgpu color
pipeline, for all AMD GPUs with DCN 3 and newer. Amdgpu now support
the following:

1. 1D Curve EOTF
2. 3x4 CTM
3. Multiplier
4. 1D Curve Inverse EOTF
5. 1D LUT
6. 1D Curve EOTF
7. 1D LUT

The supported curves for the 1D Curve type are:
- sRGB EOTF and its inverse
- PQ EOTF, scaled to [0.0, 125.0] and its inverse
- BT.2020/BT.709 OETF and its inverse

Note that the 1st and 6th colorops take the EOTF or Inverse
OETF while the 4th colorop takes the Inverse EOTF or OETF.

We are working on two more ops for amdgpu, the HDR multiplier
and the 3DLUT, which will give us this:

1. 1D Curve EOTF
2. 3x4 CTM
3. HDR Multiplier
4. 1D Curve Inverse EOTF
5. 1D LUT
6. 3D LUT
7. 1D Curve EOTF
8. 1D LUT

This essentially mirrors the color pipeline used by gamescope
and presented by Melissa Wen, with the exception of the DEGAM
LUT, which is not currently used. See
[1] 
https://indico.freedesktop.org/event/4/contributions/186/attachments/138/218/xdc2023-TheRainbowTreasureMap-MelissaWen.pdf

After this we'd like to also add the following ops:
- Scaler (Informational only)
- Color Encoding, to replace drm_plane's COLOR_ENCODING
- Color Range, to replace drm_plane's COLOR_RANGE

This patchset is grouped as follows:
  - Patches 1-3: couple general patches/fixes
  - Patches 4-7: introduce kunit to VKMS
  - Patch 7: description of motivation and details behind the
 Color Pipeline API. If you're reading nothing else
 but are interested in the topic I highly recommend
 you take a look at this.
  - Patches 7-27: DRM core and VKMS changes for color pipeline API
  - Patches 28-40: DRM core and amdgpu changes for color pipeline API

VKMS patches could still be improved in a few ways, though the
payoff might be limited and I would rather focus on other work
at the moment. The most obvious thing to improve would be to
eliminate the hard-coded LUTs for identity, and sRGB, and replace
them with fixed-point math instead.

There are plenty of things that I would like to see here but
haven't had a chance to look at. These will (hopefully) be
addressed in future iterations, either in VKMS or amdgpu:
  - Clear documentation for each drm_colorop_type
  - Add custom LUT colorops to VKMS
  - Add pre-blending 3DLUT
  - How to support HW which can't bypass entire pipeline?
  - Add ability to create colorops that don't have BYPASS
  - Can we do a LOAD / COMMIT model for LUTs (and other properties)?
  - read-only scaling colorop which defines scaling taps and position
  - read-only color format colorop to define supported color formats
for a pipeline
  - named matrices, for things like converting YUV to RGB

IGT tests can be found at
https://gitlab.freedesktop.org/hwentland/igt-gpu-tools/-/merge_requests/1

IGT patches are also being sent to the igt-dev mailing list.

If you prefer a gitlab MR for review you can find it at
https://gitlab.freedesktop.org/hwentland/linux/-/merge_requests/5

v4:
  - Add amdgpu color pipeline (WIP)
  - Don't block

[pull] amdgpu drm-fixes-6.8

2024-02-29 Thread Alex Deucher
Hi Dave, Sima,

Fixes for 6.8.

The following changes since commit d206a76d7d2726f3b096037f2079ce0bd3ba329b:

  Linux 6.8-rc6 (2024-02-25 15:46:06 -0800)

are available in the Git repository at:

  https://gitlab.freedesktop.org/agd5f/linux.git 
tags/amd-drm-fixes-6.8-2024-02-29

for you to fetch changes up to b7cdccc6a849568775f738b1e233f751a8fed013:

  drm/amd/display: Add monitor patch for specific eDP (2024-02-28 17:33:05 
-0500)


amd-drm-fixes-6.8-2024-02-29:

amdgpu:
- Fix potential buffer overflow
- Fix power min cap
- Suspend/resume fix
- SI PM fix
- eDP fix


Alex Deucher (1):
  Revert "drm/amd/pm: resolve reboot exception for si oland"

Ma Jun (1):
  drm/amdgpu/pm: Fix the power1_min_cap value

Prike Liang (1):
  drm/amdgpu: Enable gpu reset for S3 abort cases on Raven series

Ryan Lin (1):
  drm/amd/display: Add monitor patch for specific eDP

Srinivasan Shanmugam (1):
  drm/amd/display: Prevent potential buffer overflow in map_hw_resources

 drivers/gpu/drm/amd/amdgpu/soc15.c | 45 --
 .../drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c  |  6 ++-
 drivers/gpu/drm/amd/display/dc/dml2/dml2_wrapper.c |  5 +++
 drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c | 29 ++
 drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c  |  9 ++---
 drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c|  9 ++---
 .../drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c|  9 ++---
 .../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c   |  9 ++---
 .../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_7_ppt.c   |  9 ++---
 9 files changed, 83 insertions(+), 47 deletions(-)


[PATCH 1/2] drm/amdgpu: cleanup conditional execution

2024-02-29 Thread Christian König
First of all, calculating the number of dw to patch into a
conditional execution is not something HW generation specific.
This is just standard ring buffer calculations. While at it, also
reduce the BUG_ON() into a WARN_ON().

Then, instead of a random bit pattern, use 0 as the default value for
the number of dw skipped; this way it's no longer mandatory
to patch the conditional execution.

And last, make the address to check a parameter of the
conditional execution instead of getting it from the ring.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c   | 21 ++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 30 
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c   |  8 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c  | 26 +---
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c   | 28 +++---
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c   | 27 +++--
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c| 28 +++---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c| 28 +++---
 drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c   | 29 +++
 drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c   | 29 +++
 drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c   | 29 +++
 11 files changed, 99 insertions(+), 184 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
index 6aa3b1d845ab..8b512dc28df8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
@@ -131,7 +131,6 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned 
int num_ibs,
struct amdgpu_ib *ib = &ibs[0];
struct dma_fence *tmp = NULL;
bool need_ctx_switch;
-   unsigned int patch_offset = ~0;
struct amdgpu_vm *vm;
uint64_t fence_ctx;
uint32_t status = 0, alloc_size;
@@ -139,10 +138,11 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned 
int num_ibs,
bool secure, init_shadow;
u64 shadow_va, csa_va, gds_va;
int vmid = AMDGPU_JOB_GET_VMID(job);
+   bool need_pipe_sync = false;
+   unsigned int cond_exec;
 
unsigned int i;
int r = 0;
-   bool need_pipe_sync = false;
 
if (num_ibs == 0)
return -EINVAL;
@@ -228,7 +228,8 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned 
int num_ibs,
init_shadow, vmid);
 
if (ring->funcs->init_cond_exec)
-   patch_offset = amdgpu_ring_init_cond_exec(ring);
+   cond_exec = amdgpu_ring_init_cond_exec(ring,
+  ring->cond_exe_gpu_addr);
 
amdgpu_device_flush_hdp(adev, ring);
 
@@ -278,16 +279,9 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned 
int num_ibs,
   fence_flags | AMDGPU_FENCE_FLAG_64BIT);
}
 
-   if (ring->funcs->emit_gfx_shadow) {
+   if (ring->funcs->emit_gfx_shadow && ring->funcs->init_cond_exec) {
amdgpu_ring_emit_gfx_shadow(ring, 0, 0, 0, false, 0);
-
-   if (ring->funcs->init_cond_exec) {
-   unsigned int ce_offset = ~0;
-
-   ce_offset = amdgpu_ring_init_cond_exec(ring);
-   if (ce_offset != ~0 && ring->funcs->patch_cond_exec)
-   amdgpu_ring_patch_cond_exec(ring, ce_offset);
-   }
+   amdgpu_ring_init_cond_exec(ring, ring->cond_exe_gpu_addr);
}
 
r = amdgpu_fence_emit(ring, f, job, fence_flags);
@@ -302,8 +296,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned 
int num_ibs,
if (ring->funcs->insert_end)
ring->funcs->insert_end(ring);
 
-   if (patch_offset != ~0 && ring->funcs->patch_cond_exec)
-   amdgpu_ring_patch_cond_exec(ring, patch_offset);
+   amdgpu_ring_patch_cond_exec(ring, cond_exec);
 
ring->current_ctx = fence_ctx;
if (vm && ring->funcs->emit_switch_buffer)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index fe1a61eb6e4c..756330767909 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -209,8 +209,7 @@ struct amdgpu_ring_funcs {
void (*insert_end)(struct amdgpu_ring *ring);
/* pad the indirect buffer to the necessary number of dw */
void (*pad_ib)(struct amdgpu_ring *ring, struct amdgpu_ib *ib);
-   unsigned (*init_cond_exec)(struct amdgpu_ring *ring);
-   void (*patch_cond_exec)(struct amdgpu_ring *ring, unsigned offset);
+   unsigned (*init_cond_exec)(struct amdgpu_ring *ring, uint64_t addr);
/* note usage for clock and power gating */
void (*begin_use)(struct amdgpu_ring *ring);
void (*end_use)(struct amdgpu_ring *ring);
@@ -327,8 +326,7 @@ struct amdgpu_ring {
 #def

[PATCH 2/2] drm/amdgpu: workaround to avoid SET_Q_MODE packets

2024-02-29 Thread Christian König
It turned out that executing the SET_Q_MODE packet on every submission
creates too much overhead.

Implement a workaround which allows skipping the SET_Q_MODE packet if
subsequent submissions all use the same parameters.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |   3 +
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c   | 104 +++
 2 files changed, 92 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index 756330767909..582053f1cd56 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -285,6 +285,9 @@ struct amdgpu_ring {
unsignedcond_exe_offs;
u64 cond_exe_gpu_addr;
volatile u32*cond_exe_cpu_addr;
+   unsigned intset_q_mode_offs;
+   volatile u32*set_q_mode_ptr;
+   u64 set_q_mode_token;
unsignedvm_hub;
unsignedvm_inv_eng;
struct dma_fence*vmid_wait;
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index 2ccbdee570cf..6e6b6eff48e2 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -5461,6 +5461,11 @@ static void gfx_v11_0_ring_emit_vm_flush(struct 
amdgpu_ring *ring,
amdgpu_ring_write(ring, PACKET3(PACKET3_PFP_SYNC_ME, 0));
amdgpu_ring_write(ring, 0x0);
}
+
+   /* Make sure that we can't skip the SET_Q_MODE packets when the VM
+* changed in any way.
+*/
+   ring->set_q_mode_ptr = NULL;
 }
 
 static void gfx_v11_0_ring_emit_fence_kiq(struct amdgpu_ring *ring, u64 addr,
@@ -5510,16 +5515,81 @@ static void gfx_v11_0_ring_emit_cntxcntl(struct 
amdgpu_ring *ring,
amdgpu_ring_write(ring, 0);
 }
 
+static unsigned gfx_v11_0_ring_emit_init_cond_exec(struct amdgpu_ring *ring,
+  uint64_t addr)
+{
+   unsigned ret;
+
+   amdgpu_ring_write(ring, PACKET3(PACKET3_COND_EXEC, 3));
+   amdgpu_ring_write(ring, lower_32_bits(addr));
+   amdgpu_ring_write(ring, upper_32_bits(addr));
+   /* discard following DWs if *cond_exec_gpu_addr==0 */
+   amdgpu_ring_write(ring, 0);
+   ret = ring->wptr & ring->buf_mask;
+   /* patch dummy value later */
+   amdgpu_ring_write(ring, 0);
+
+   return ret;
+}
+
 static void gfx_v11_0_ring_emit_gfx_shadow(struct amdgpu_ring *ring,
   u64 shadow_va, u64 csa_va,
   u64 gds_va, bool init_shadow,
   int vmid)
 {
struct amdgpu_device *adev = ring->adev;
+   unsigned int offs, end;
 
if (!adev->gfx.cp_gfx_shadow)
return;
 
+   /*
+* The logic here isn't easy to understand because we need to keep state
+* accross multiple executions of the function as well as between the
+* CPU and GPU. The general idea is that the newly written GPU command
+* has a condition on the previous one and only executed if really
+* necessary.
+*/
+
+   /*
+* The dw in the NOP controls if the next SET_Q_MODE packet should be
+* executed or not. Reserve 64bits just to be on the save side.
+*/
+   amdgpu_ring_write(ring, PACKET3(PACKET3_NOP, 1));
+   offs = ring->wptr & ring->buf_mask;
+
+   /*
+* We start with skipping the prefix SET_Q_MODE and always executing
+* the postfix SET_Q_MODE packet. This is changed below with a
+* WRITE_DATA command when the postfix executed.
+*/
+   amdgpu_ring_write(ring, shadow_va ? 1 : 0);
+   amdgpu_ring_write(ring, 0);
+
+   if (ring->set_q_mode_offs) {
+   uint64_t addr;
+
+   addr = amdgpu_bo_gpu_offset(ring->ring_obj);
+   addr += ring->set_q_mode_offs << 2;
+   end = gfx_v11_0_ring_emit_init_cond_exec(ring, addr);
+   }
+
+   /*
+* When the postfix SET_Q_MODE packet executes we need to make sure 
that the
+* next prefix SET_Q_MODE packet executes as well.
+*/
+   if (!shadow_va) {
+   uint64_t addr;
+
+   addr = amdgpu_bo_gpu_offset(ring->ring_obj);
+   addr += offs << 2;
+   amdgpu_ring_write(ring, PACKET3(PACKET3_WRITE_DATA, 3));
+   amdgpu_ring_write(ring, WRITE_DATA_DST_SEL(5) | WR_CONFIRM);
+   amdgpu_ring_write(ring, lower_32_bits(addr));
+   amdgpu_ring_write(ring, upper_32_bits(addr));
+   amdgpu_ring_write(ring, 0x1);
+   }
+
amdgpu_ring_write(ring, PACKET3(PACKET3_SET_Q_PREEMPTION_MODE, 7));
amdgpu_ring_write(ring, lower_32_bits(shadow_va));
amdgpu_ring_write(ring, upper_

Re: [PATCH v2] drm/amgpu: Check return value of amdgpu_device_baco_enter/exit

2024-02-29 Thread Lazar, Lijo



On 2/29/2024 4:40 PM, Ma, Jun wrote:
> Hi Lijo,
> 
> On 2/29/2024 3:33 PM, Lazar, Lijo wrote:
>>
>>
>> On 2/29/2024 11:49 AM, Ma Jun wrote:
>>> Check return value of amdgpu_device_baco_enter/exit and print
>>> warning message because these errors may cause runtime resume failure
>>>
>>> Signed-off-by: Ma Jun 
>>> ---
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 29 --
>>>  1 file changed, 22 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> index e68bd6f8a6a4..4928b588cd12 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> @@ -6000,15 +6000,24 @@ int amdgpu_device_baco_enter(struct drm_device *dev)
>>>  {
>>> struct amdgpu_device *adev = drm_to_adev(dev);
>>> struct amdgpu_ras *ras = amdgpu_ras_get_context(adev);
>>> +   int ret = 0;
>>>  
>>> -   if (!amdgpu_device_supports_baco(dev))
>>> -   return -ENOTSUPP;
>>> +   if (!amdgpu_device_supports_baco(dev)) {
>>> +   ret = -ENOTSUPP;
>>> +   goto baco_error;
>>> +   }
>>>  
>>> if (ras && adev->ras_enabled &&
>>> adev->nbio.funcs->enable_doorbell_interrupt)
>>> adev->nbio.funcs->enable_doorbell_interrupt(adev, false);
>>>  
>>> -   return amdgpu_dpm_baco_enter(adev);
>>> +   ret = amdgpu_dpm_baco_enter(adev);
>>> +
>>> +baco_error:
>>> +   if (ret)
>>> +   dev_warn(adev->dev, "warning: device fails to enter baco. 
>>> ret=%d\n", ret);
>>> +
>>
>> This doesn't look like a real case, moreover the warning message is
> 
> In fact this is a case that actually happened.
> 
> When amdgpu_device_supports_baco returns an error for some reason,
> the device will enter runtime suspend without calling
> amdgpu_device_baco_enter()
> and without any warning message being printed. Then the device usually fails
> to resume.
> So, I added this message as a warning to help us find the real reason.
> 
>> misleading. If the device doesn't support baco, driver is not supposed
>> to call it for runpm purpose -
>> https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c#L2664
>>
> I changed this code in another patch.
> https://lists.freedesktop.org/archives/amd-gfx/2024-February/104929.html
> 

Baco support checks are all based on cached software variables and not
on hardware interactions, so this is very unlikely. It might also
be that the driver really called baco entry, but even before baco entry
happened, a device usage was detected for which resume is called.

One other way to really check this is to check whether baco exit is called
when the software state is not actually SMU_BACO_STATE_ENTER (applicable
only for swsmu). Since baco entry is driver triggered, the imbalance
shouldn't happen.
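
(To illustrate the idea, a minimal sketch only, assuming the swsmu code
tracks the state in smu->smu_baco.state with SMU_BACO_STATE_ENTER/EXIT;
names and placement here are assumptions, not a tested patch:)

	/* e.g. at the top of the swsmu baco exit path */
	struct smu_baco_context *smu_baco = &smu->smu_baco;

	/*
	 * Baco entry is driver triggered, so an exit without a matching
	 * entry points at an unbalanced runtime-pm sequence rather than
	 * a hardware problem.
	 */
	if (smu_baco->state != SMU_BACO_STATE_ENTER)
		dev_warn(smu->adev->dev,
			 "baco exit requested without a prior baco entry\n");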

Thanks,
Lijo

> Regards,
> Ma Jun
> 
>> Thanks,
>> Lijo
>>
>>> +   return ret;
>>>  }
>>>  
>>>  int amdgpu_device_baco_exit(struct drm_device *dev)
>>> @@ -6017,12 +6026,14 @@ int amdgpu_device_baco_exit(struct drm_device *dev)
>>> struct amdgpu_ras *ras = amdgpu_ras_get_context(adev);
>>> int ret = 0;
>>>  
>>> -   if (!amdgpu_device_supports_baco(dev))
>>> -   return -ENOTSUPP;
>>> +   if (!amdgpu_device_supports_baco(dev)) {
>>> +   ret = -ENOTSUPP;
>>> +   goto baco_error;
>>> +   }
>>>  
>>> ret = amdgpu_dpm_baco_exit(adev);
>>> if (ret)
>>> -   return ret;
>>> +   goto baco_error;
>>>  
>>> if (ras && adev->ras_enabled &&
>>> adev->nbio.funcs->enable_doorbell_interrupt)
>>> @@ -6032,7 +6043,11 @@ int amdgpu_device_baco_exit(struct drm_device *dev)
>>> adev->nbio.funcs->clear_doorbell_interrupt)
>>> adev->nbio.funcs->clear_doorbell_interrupt(adev);
>>>  
>>> -   return 0;
>>> +baco_error:
>>> +   if (ret)
>>> +   dev_warn(adev->dev, "warning: device fails to exit from baco. 
>>> ret=%d\n", ret);
>>> +
>>> +   return ret;
>>>  }
>>>  
>>>  /**


Re: [PATCH V3] Revert "drm/amdgpu: remove vm sanity check from amdgpu_vm_make_compute" for Raven

2024-02-29 Thread Christian König

On 29.02.24 at 06:51, Jesse.Zhang wrote:

Fix the issue:
"amdgpu: Failed to create process VM object".

[Why] When amdgpu is initialized, seq64 does its mapping and updates the bo mapping in the vm 
page table.
But when clinfo runs, it also initializes a vm for a process device through the 
function kfd_process_device_init_vm and ensures the root PD is clean through the 
function amdgpu_vm_pt_is_root_clean.
So the two conflict, and clinfo always fails.

  v1:
 - remove all the pte_supports_ats stuff from the amdgpu_vm code (Felix)

Signed-off-by: Jesse Zhang 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c| 23 --
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h|  3 --
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 55 +--
  3 files changed, 1 insertion(+), 80 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index ed4a8c5d26d7..d004ace79536 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -1385,10 +1385,6 @@ int amdgpu_vm_clear_freed(struct amdgpu_device *adev,
struct amdgpu_bo_va_mapping, list);
list_del(&mapping->list);
  
-		if (vm->pte_support_ats &&

-   mapping->start < AMDGPU_GMC_HOLE_START)
-   init_pte_value = AMDGPU_PTE_DEFAULT_ATC;
-
r = amdgpu_vm_update_range(adev, vm, false, false, true, false,
   resv, mapping->start, mapping->last,
   init_pte_value, 0, 0, NULL, NULL,
@@ -2264,7 +2260,6 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct 
amdgpu_vm *vm,
if (r)
return r;
  
-	vm->pte_support_ats = false;

vm->is_compute_context = false;
  
  	vm->use_cpu_for_update = !!(adev->vm_manager.vm_update_mode &

@@ -2350,30 +2345,12 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct 
amdgpu_vm *vm,
   */
  int amdgpu_vm_make_compute(struct amdgpu_device *adev, struct amdgpu_vm *vm)
  {
-   bool pte_support_ats = (adev->asic_type == CHIP_RAVEN);
int r;
  
  	r = amdgpu_bo_reserve(vm->root.bo, true);

if (r)
return r;
  
-	/* Check if PD needs to be reinitialized and do it before

-* changing any other state, in case it fails.
-*/
-   if (pte_support_ats != vm->pte_support_ats) {
-   /* Sanity checks */
-   if (!amdgpu_vm_pt_is_root_clean(adev, vm)) {
-   r = -EINVAL;
-   goto unreserve_bo;
-   }
-
-   vm->pte_support_ats = pte_support_ats;
-   r = amdgpu_vm_pt_clear(adev, vm, to_amdgpu_bo_vm(vm->root.bo),
-  false);
-   if (r)
-   goto unreserve_bo;
-   }
-
/* Update VM state */
vm->use_cpu_for_update = !!(adev->vm_manager.vm_update_mode &
AMDGPU_VM_USE_CPU_FOR_COMPUTE);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 42f6ddec50c1..9f6b5e1ccf34 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -357,9 +357,6 @@ struct amdgpu_vm {
/* Functions to use for VM table updates */
const struct amdgpu_vm_update_funcs *update_funcs;
  
-	/* Flag to indicate ATS support from PTE for GFX9 */

-   boolpte_support_ats;
-
/* Up to 128 pending retry page faults */
DECLARE_KFIFO(faults, u64, 128);
  
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c

index a160265ddc07..f07255532aae 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
@@ -89,22 +89,6 @@ static unsigned int amdgpu_vm_pt_num_entries(struct 
amdgpu_device *adev,
return AMDGPU_VM_PTE_COUNT(adev);
  }
  
-/**

- * amdgpu_vm_pt_num_ats_entries - return the number of ATS entries in the root 
PD
- *
- * @adev: amdgpu_device pointer
- *
- * Returns:
- * The number of entries in the root page directory which needs the ATS 
setting.
- */
-static unsigned int amdgpu_vm_pt_num_ats_entries(struct amdgpu_device *adev)
-{
-   unsigned int shift;
-
-   shift = amdgpu_vm_pt_level_shift(adev, adev->vm_manager.root_level);
-   return AMDGPU_GMC_HOLE_START >> (shift + AMDGPU_GPU_PAGE_SHIFT);
-}
-
  /**
   * amdgpu_vm_pt_entries_mask - the mask to get the entry number of a PD/PT
   *
@@ -394,27 +378,7 @@ int amdgpu_vm_pt_clear(struct amdgpu_device *adev, struct 
amdgpu_vm *vm,
}
  
  	entries = amdgpu_bo_size(bo) / 8;

-   if (!vm->pte_support_ats) {
-   ats_entries = 0;
-
-   } else if (!bo->parent) {
-   ats_entries = amdgpu_vm_pt_num_ats_entries(adev);
-   ats_entries = min(ats_entries, entries);
-   entries -= ats_entries;
-

Re: [PATCH v2] drm/amgpu: Check return value of amdgpu_device_baco_enter/exit

2024-02-29 Thread Ma, Jun
Hi Lijo,

On 2/29/2024 3:33 PM, Lazar, Lijo wrote:
> 
> 
> On 2/29/2024 11:49 AM, Ma Jun wrote:
>> Check return value of amdgpu_device_baco_enter/exit and print
>> warning message because these errors may cause runtime resume failure
>>
>> Signed-off-by: Ma Jun 
>> ---
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 29 --
>>  1 file changed, 22 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index e68bd6f8a6a4..4928b588cd12 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -6000,15 +6000,24 @@ int amdgpu_device_baco_enter(struct drm_device *dev)
>>  {
>>  struct amdgpu_device *adev = drm_to_adev(dev);
>>  struct amdgpu_ras *ras = amdgpu_ras_get_context(adev);
>> +int ret = 0;
>>  
>> -if (!amdgpu_device_supports_baco(dev))
>> -return -ENOTSUPP;
>> +if (!amdgpu_device_supports_baco(dev)) {
>> +ret = -ENOTSUPP;
>> +goto baco_error;
>> +}
>>  
>>  if (ras && adev->ras_enabled &&
>>  adev->nbio.funcs->enable_doorbell_interrupt)
>>  adev->nbio.funcs->enable_doorbell_interrupt(adev, false);
>>  
>> -return amdgpu_dpm_baco_enter(adev);
>> +ret = amdgpu_dpm_baco_enter(adev);
>> +
>> +baco_error:
>> +if (ret)
>> +dev_warn(adev->dev, "warning: device fails to enter baco. ret=%d\n", ret);
>> +
> 
> This doesn't look like a real case, moreover the warning message is

In fact, this is a case that actually happened.

When amdgpu_device_supports_baco() returns an error for some reason, the
device enters runtime suspend without amdgpu_device_baco_enter() being
called and without any warning message being printed. The device then
usually fails to resume.
So I added this message as a warning to help us find the real reason.
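
As a minimal sketch (not part of this patch; the call-site name and the
surrounding runtime-suspend logic are assumed for illustration), the caller
could propagate such a failure instead of suspending silently:

/* Hedged sketch only: the call site in the runtime-suspend path is
 * assumed, not taken from this patch. */
static int runtime_suspend_baco_step(struct drm_device *drm_dev,
				     struct amdgpu_device *adev)
{
	int ret;

	ret = amdgpu_device_baco_enter(drm_dev);
	if (ret) {
		/* Surface the failure here; otherwise it only shows up
		 * later as a resume failure. */
		dev_warn(adev->dev,
			 "BACO entry failed, aborting runtime suspend (%d)\n",
			 ret);
		return ret;
	}

	return 0;
}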

> misleading. If the device doesn't support baco, driver is not supposed
> to call it for runpm purpose -
> https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c#L2664
> 
I changed this code in another patch.
https://lists.freedesktop.org/archives/amd-gfx/2024-February/104929.html

Regards,
Ma Jun

> Thanks,
> Lijo
> 
>> +return ret;
>>  }
>>  
>>  int amdgpu_device_baco_exit(struct drm_device *dev)
>> @@ -6017,12 +6026,14 @@ int amdgpu_device_baco_exit(struct drm_device *dev)
>>  struct amdgpu_ras *ras = amdgpu_ras_get_context(adev);
>>  int ret = 0;
>>  
>> -if (!amdgpu_device_supports_baco(dev))
>> -return -ENOTSUPP;
>> +if (!amdgpu_device_supports_baco(dev)) {
>> +ret = -ENOTSUPP;
>> +goto baco_error;
>> +}
>>  
>>  ret = amdgpu_dpm_baco_exit(adev);
>>  if (ret)
>> -return ret;
>> +goto baco_error;
>>  
>>  if (ras && adev->ras_enabled &&
>>  adev->nbio.funcs->enable_doorbell_interrupt)
>> @@ -6032,7 +6043,11 @@ int amdgpu_device_baco_exit(struct drm_device *dev)
>>  adev->nbio.funcs->clear_doorbell_interrupt)
>>  adev->nbio.funcs->clear_doorbell_interrupt(adev);
>>  
>> -return 0;
>> +baco_error:
>> +if (ret)
>> +dev_warn(adev->dev, "warning: device fails to exit from baco. ret=%d\n", ret);
>> +
>> +return ret;
>>  }
>>  
>>  /**


Re: [RFC PATCH v4 00/42] Color Pipeline API w/ VKMS

2024-02-29 Thread Daniel Vetter
On Mon, Feb 26, 2024 at 04:10:14PM -0500, Harry Wentland wrote:
> This is an RFC set for a color pipeline API, along with a sample
> implementation in VKMS. All the key API bits are here. VKMS now
> supports two named transfer function colorops and two matrix
> colorops. We have IGT tests that check all four of these colorops
> with a pixel-by-pixel comparison that checks that these colorops
> do what we expect them to do with a +/- 1 8 bpc code point margin.

So vkms is definitely great to make sure the igts are generic enough and
somewhat useful, but ... does steam run on vkms too? I think that would be
a really good test to show that the api we have here is actually useful
for compositors in a cross-driver way, and not just a neat idea that
doesn't survive harsh reality.

And yes I realize that's probably going to be a bunch of work, but I feel
like the color pipeline discussion has dragged around enough in
hypotheticals and concerns that I think it would really help a lot.

Thoughts?
-Sima

> 
> The big new change with v4 is the addition of an amdgpu color
> pipeline for all AMD GPUs with DCN 3 and newer. Amdgpu now supports
> the following:
> 
> 1. 1D Curve EOTF
> 2. 3x4 CTM
> 3. Multiplier
> 4. 1D Curve Inverse EOTF
> 5. 1D LUT
> 6. 1D Curve EOTF
> 7. 1D LUT
> 
> The supported curves for the 1D Curve type are:
> - sRGB EOTF and its inverse
> - PQ EOTF, scaled to [0.0, 125.0] and its inverse
> - BT.2020/BT.709 OETF and its inverse
> 
> Note that the 1st and 5th colorops take the EOTF or Inverse
> OETF while the 3rd colorop takes the Inverse EOTF or OETF.
> 
> We are working on two more ops for amdgpu, the HDR multiplier
> and the 3DLUT, which will give us this:
> 
> 1. 1D Curve EOTF
> 2. 3x4 CTM
> 3. HDR Multiplier
> 4. 1D Curve Inverse EOTF
> 5. 1D LUT
> 6. 3D LUT
> 7. 1D Curve EOTF
> 8. 1D LUT
> 
> This essentially mirrors the color pipeline used by gamescope
> and presented by Melissa Wen, with the exception of the DEGAM
> LUT, which is not currently used. See
> [1] 
> https://indico.freedesktop.org/event/4/contributions/186/attachments/138/218/xdc2023-TheRainbowTreasureMap-MelissaWen.pdf
> 
> After this we'd like to also add the following ops:
> - Scaler (Informational only)
> - Color Encoding, to replace drm_plane's COLOR_ENCODING
> - Color Range, to replace drm_plane's COLOR_RANGE
> 
> This patchset is grouped as follows:
>  - Patches 1-3: couple general patches/fixes
>  - Patches 4-7: introduce kunit to VKMS
>  - Patch 7: description of motivation and details behind the
> Color Pipeline API. If you're reading nothing else
> but are interested in the topic I highly recommend
> you take a look at this.
>  - Patches 7-27: DRM core and VKMS changes for color pipeline API
>  - Patches 28-40: DRM core and amdgpu changes for color pipeline API
> 
> VKMS patches could still be improved in a few ways, though the
> payoff might be limited and I would rather focus on other work
> at the moment. The most obvious thing to improve would be to
> eliminate the hard-coded LUTs for identity, and sRGB, and replace
> them with fixed-point math instead.
> 
> There are plenty of things that I would like to see here but
> haven't had a chance to look at. These will (hopefully) be
> addressed in future iterations, either in VKMS or amdgpu:
>  - Clear documentation for each drm_colorop_type
>  - Add custom LUT colorops to VKMS
>  - Add pre-blending 3DLUT
>  - How to support HW which can't bypass entire pipeline?
>  - Add ability to create colorops that don't have BYPASS
>  - Can we do a LOAD / COMMIT model for LUTs (and other properties)?
>  - read-only scaling colorop which defines scaling taps and position
>  - read-only color format colorop to define supported color formats
>for a pipeline
>  - named matrices, for things like converting YUV to RGB
> 
> IGT tests can be found at
> https://gitlab.freedesktop.org/hwentland/igt-gpu-tools/-/merge_requests/1
> 
> IGT patches are also being sent to the igt-dev mailing list.
> 
> If you prefer a gitlab MR for review you can find it at
> https://gitlab.freedesktop.org/hwentland/linux/-/merge_requests/5
> 
> v4:
>  - Add amdgpu color pipeline (WIP)
>  - Don't block setting of deprecated properties, instead pass client cap
>to atomic check so drivers can ignore these props
>  - Drop IOCTL definitions (Pekka)
>  - Use enum property for colorop TYPE (Pekka)
>  - A few cleanups to the docs (Pekka)
>  - Rework the TYPE enum to name relation to avoid code duplication (Pekka)
>  - Add missing function declarations (Chaitanya Kumar Borah)
>  - Allow setting of NEXT property to NULL in _set_ function (Chaitanya Kumar Borah)
>  - Add helper for creation of pipeline drm_plane property (Pekka)
>  - Always create Bypass pipeline (Pekka)
>  - A bunch of changes to VKMS kunit tests (Pekka)
>  - Fix index in CTM doc (Pekka)
> 
> v3:
>  - Abandon IOCTLs and discover colorops as clients iterate the pipeline
>  - Remove need for libdrm
>  - Add co
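
To make the "clients iterate the pipeline" model described in the quoted
cover letter concrete, here is a hedged user-space sketch. The colorop
object type and the property names ("COLOR_PIPELINE" on the plane, "NEXT"
on each colorop) are assumptions for illustration based on the cover
letter's wording, not a finalized API:

#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

/* Read a named property value from a mode object (returns 0 if not found). */
static uint64_t get_prop_value(int fd, uint32_t obj_id, uint32_t obj_type,
			       const char *name)
{
	drmModeObjectProperties *props;
	uint64_t value = 0;
	uint32_t i;

	props = drmModeObjectGetProperties(fd, obj_id, obj_type);
	if (!props)
		return 0;

	for (i = 0; i < props->count_props; i++) {
		drmModePropertyRes *prop = drmModeGetProperty(fd, props->props[i]);

		if (prop && !strcmp(prop->name, name))
			value = props->prop_values[i];
		drmModeFreeProperty(prop);
	}
	drmModeFreeObjectProperties(props);
	return value;
}

/* Walk a plane's color pipeline by following each colorop's NEXT link. */
static void walk_color_pipeline(int fd, uint32_t plane_id)
{
	/* Assumed plane property pointing at the first colorop (0 = bypass). */
	uint32_t colorop = (uint32_t)get_prop_value(fd, plane_id,
						    DRM_MODE_OBJECT_PLANE,
						    "COLOR_PIPELINE");

	while (colorop) {
		printf("colorop object %u\n", colorop);
		colorop = (uint32_t)get_prop_value(fd, colorop,
						   DRM_MODE_OBJECT_ANY,
						   "NEXT");
	}
}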

[PATCH] drm/amdkfd: fix shift out of bounds about gpu debug

2024-02-29 Thread Jesse Zhang
The issue is:
[  388.151802] UBSAN: shift-out-of-bounds in drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_int_process_v10.c:346:5
[  388.151807] shift exponent 4294967295 is too large for 64-bit type 'long long unsigned int'
[  388.151812] CPU: 6 PID: 347 Comm: kworker/6:1H Tainted: GE  6.7.0+ #1
[  388.151814] Hardware name: AMD Splinter/Splinter-GNR, BIOS WS54117N_140 01/16/2024
[  388.151816] Workqueue: KFD IH interrupt_wq [amdgpu]
[  388.152084] Call Trace:
[  388.152086]  
[  388.152089]  dump_stack_lvl+0x4c/0x70
[  388.152096]  dump_stack+0x14/0x20
[  388.152098]  ubsan_epilogue+0x9/0x40
[  388.152101]  __ubsan_handle_shift_out_of_bounds+0x113/0x170
[  388.152103]  ? vprintk+0x40/0x70
[  388.152106]  ? swsusp_check+0x131/0x190
[  388.152110]  event_interrupt_wq_v10.cold+0x16/0x1e [amdgpu]
[  388.152411]  ? raw_spin_rq_unlock+0x14/0x40
[  388.152415]  ? finish_task_switch+0x85/0x2a0
[  388.152417]  ? kfifo_copy_out+0x5f/0x70
[  388.152420]  interrupt_wq+0xb2/0x120 [amdgpu]
[  388.152642]  ? interrupt_wq+0xb2/0x120 [amdgpu]
[  388.152728]  process_scheduled_works+0x9a/0x3a0
[  388.152731]  ? __pfx_worker_thread+0x10/0x10
[  388.152732]  worker_thread+0x15f/0x2d0
[  388.152733]  ? __pfx_worker_thread+0x10/0x10
[  388.152734]  kthread+0xfb/0x130
[  388.152735]  ? __pfx_kthread+0x10/0x10
[  388.152736]  ret_from_fork+0x3d/0x60
[  388.152738]  ? __pfx_kthread+0x10/0x10
[  388.152739]  ret_from_fork_asm+0x1b/0x30
[  388.152742]  

Signed-off-by: Jesse Zhang 
---
 include/uapi/linux/kfd_ioctl.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index 9ce46edc62a5..3d5867df17e8 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -887,7 +887,7 @@ enum kfd_dbg_trap_exception_code {
 };
 
 /* Mask generated by ecode in kfd_dbg_trap_exception_code */
-#define KFD_EC_MASK(ecode) (1ULL << (ecode - 1))
+#define KFD_EC_MASK(ecode) (ecode ? (1ULL << (ecode - 1)) : 0ULL)
 
 /* Masks for exception code type checks below */
 #define KFD_EC_MASK_QUEUE  (KFD_EC_MASK(EC_QUEUE_WAVE_ABORT) | \
-- 
2.25.1
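
As a quick illustration of the guarded macro (a stand-alone user-space
check, not kernel code), an ecode of 0 now yields an empty mask instead of
the undefined shift reported by UBSAN above:

#include <stdio.h>

/* Guarded form from this patch: ecode 0 no longer shifts by (0 - 1). */
#define KFD_EC_MASK(ecode) (ecode ? (1ULL << (ecode - 1)) : 0ULL)

int main(void)
{
	/* ecode 0 (no exception decoded) -> empty mask, no UBSAN splat. */
	printf("KFD_EC_MASK(0) = 0x%llx\n", (unsigned long long)KFD_EC_MASK(0));
	/* ecode 1 -> bit 0, i.e. 1ULL << (1 - 1), unchanged behaviour. */
	printf("KFD_EC_MASK(1) = 0x%llx\n", (unsigned long long)KFD_EC_MASK(1));
	return 0;
}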