[PATCH] drm/amdgpu/pm: Fix the error of pwm1_enable setting
Fix the pwm_mode value error which used for pwm1_enable setting Signed-off-by: Ma Jun --- drivers/gpu/drm/amd/pm/amdgpu_pm.c | 12 +++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c b/drivers/gpu/drm/amd/pm/amdgpu_pm.c index 9e70c41ad98f..7cc5cd7616b1 100644 --- a/drivers/gpu/drm/amd/pm/amdgpu_pm.c +++ b/drivers/gpu/drm/amd/pm/amdgpu_pm.c @@ -2582,6 +2582,7 @@ static ssize_t amdgpu_hwmon_set_pwm1_enable(struct device *dev, struct amdgpu_device *adev = dev_get_drvdata(dev); int err, ret; int value; + u32 pwm_mode; if (amdgpu_in_reset(adev)) return -EPERM; @@ -2592,13 +2593,22 @@ static ssize_t amdgpu_hwmon_set_pwm1_enable(struct device *dev, if (err) return err; + if (value == 0) + pwm_mode = AMD_FAN_CTRL_NONE; + else if (value == 1) + pwm_mode = AMD_FAN_CTRL_MANUAL; + else if (value == 2) + pwm_mode = AMD_FAN_CTRL_AUTO; + else + return -EINVAL; + ret = pm_runtime_get_sync(adev_to_drm(adev)->dev); if (ret < 0) { pm_runtime_put_autosuspend(adev_to_drm(adev)->dev); return ret; } - ret = amdgpu_dpm_set_fan_control_mode(adev, value); + ret = amdgpu_dpm_set_fan_control_mode(adev, pwm_mode); pm_runtime_mark_last_busy(adev_to_drm(adev)->dev); pm_runtime_put_autosuspend(adev_to_drm(adev)->dev); -- 2.34.1
RE: [PATCH] drm/amdkfd: fix shift out of bounds about gpu debug
[Public] > -Original Message- > From: Zhang, Jesse(Jie) > Sent: Friday, March 1, 2024 12:50 AM > To: Kim, Jonathan ; amd-gfx@lists.freedesktop.org > Cc: Deucher, Alexander ; Kuehling, Felix > ; Zhang, Yifan > Subject: RE: [PATCH] drm/amdkfd: fix shift out of bounds about gpu debug > > [Public] > > Hi Jon, > > -Original Message- > From: Kim, Jonathan > Sent: Thursday, February 29, 2024 11:58 PM > To: Zhang, Jesse(Jie) ; amd- > g...@lists.freedesktop.org > Cc: Deucher, Alexander ; Kuehling, Felix > ; Zhang, Yifan ; Zhang, > Jesse(Jie) ; Zhang, Jesse(Jie) > > Subject: RE: [PATCH] drm/amdkfd: fix shift out of bounds about gpu debug > > [Public] > > I think this was discussed in another thread. > Exception codes should be range checked prior to applying the mask. Raising > null events to the debugger or runtime isn't useful. > I haven't gotten around to fixing this yet. I should have time this week. > Just to double check, the out of bounds shift is because of a CP interrupt > that > generates a null exception code? > > [Zhang, Jesse(Jie)] Thanks for your reminder, I saw that discussion. > In this interrupt, other fields(such as, source id, client id pasid ) are > correct. > only the value of context_id0 (0xf) is invalid. >How about do the check ,like this: > } else if (source_id == SOC15_INTSRC_CP_BAD_OPCODE) { > + /* filter out the invalidate context_id0 */ > + if (!(context_id0 >> KFD_DEBUG_CP_BAD_OP_ECODE_SHIFT) > || > + (context_id0 >> > KFD_DEBUG_CP_BAD_OP_ECODE_SHIFT) > > EC_MAX) > + return; The range check should probably flag any exception prefixed as EC_QUEUE_PACKET_* as valid defined in kfd_dbg_trap_exception_code: https://github.com/torvalds/linux/blob/master/include/uapi/linux/kfd_ioctl.h#L857 + Jay to confirm this is the correct exception range for CP_BAD_OPCODE If that's the case, then I think we can define a KFD_DBG_EC_TYPE_IS_QUEUE_PACKET macro similar to: https://github.com/torvalds/linux/blob/master/include/uapi/linux/kfd_ioctl.h#L917 That way, KFD process interrupts v9, v10, v11 can use that check prior to mask conversion and user space may find it useful as well. Jon > kfd_set_dbg_ev_from_interrupt(dev, pasid, > KFD_DEBUG_DOORBELL_ID(context_id0), > > KFD_EC_MASK(KFD_DEBUG_CP_BAD_OP_ECODE(context_id0)), > Thanks > Jesse > Jon > > > -Original Message- > > From: Jesse Zhang > > Sent: Thursday, February 29, 2024 3:45 AM > > To: amd-gfx@lists.freedesktop.org > > Cc: Deucher, Alexander ; Kuehling, Felix > > ; Kim, Jonathan ; > Zhang, > > Yifan ; Zhang, Jesse(Jie) > ; > > Zhang, Jesse(Jie) > > Subject: [PATCH] drm/amdkfd: fix shift out of bounds about gpu debug > > > > the issue is : > > [ 388.151802] UBSAN: shift-out-of-bounds in > > drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_int_process_v10.c:346:5 > > [ 388.151807] shift exponent 4294967295 is too large for 64-bit type > > 'long long unsigned int' > > [ 388.151812] CPU: 6 PID: 347 Comm: kworker/6:1H Tainted: GE > > 6.7.0+ #1 > > [ 388.151814] Hardware name: AMD Splinter/Splinter-GNR, BIOS > > WS54117N_140 01/16/2024 > > [ 388.151816] Workqueue: KFD IH interrupt_wq [amdgpu] [ 388.152084] > > Call Trace: > > [ 388.152086] > > [ 388.152089] dump_stack_lvl+0x4c/0x70 [ 388.152096] > > dump_stack+0x14/0x20 [ 388.152098] ubsan_epilogue+0x9/0x40 [ > > 388.152101] __ubsan_handle_shift_out_of_bounds+0x113/0x170 > > [ 388.152103] ? vprintk+0x40/0x70 > > [ 388.152106] ? swsusp_check+0x131/0x190 [ 388.152110] > > event_interrupt_wq_v10.cold+0x16/0x1e [amdgpu] [ 388.152411] ? > > raw_spin_rq_unlock+0x14/0x40 [ 388.152415] ? > > finish_task_switch+0x85/0x2a0 [ 388.152417] ? > > kfifo_copy_out+0x5f/0x70 [ 388.152420] interrupt_wq+0xb2/0x120 > > [amdgpu] [ 388.152642] ? interrupt_wq+0xb2/0x120 [amdgpu] [ > > 388.152728] process_scheduled_works+0x9a/0x3a0 > > [ 388.152731] ? __pfx_worker_thread+0x10/0x10 [ 388.152732] > > worker_thread+0x15f/0x2d0 [ 388.152733] ? > > __pfx_worker_thread+0x10/0x10 [ 388.152734] kthread+0xfb/0x130 [ > > 388.152735] ? __pfx_kthread+0x10/0x10 [ 388.152736] > > ret_from_fork+0x3d/0x60 [ 388.152738] ? __pfx_kthread+0x10/0x10 [ > > 388.152739] ret_from_fork_asm+0x1b/0x30 [ 388.152742] > > > > Signed-off-by: Jesse Zhang > > --- > > include/uapi/linux/kfd_ioctl.h | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/include/uapi/linux/kfd_ioctl.h > > b/include/uapi/linux/kfd_ioctl.h index 9ce46edc62a5..3d5867df17e8 > > 100644 > > --- a/include/uapi/linux/kfd_ioctl.h > > +++ b/include/uapi/linux/kfd_ioctl.h > > @@ -887,7 +887,7 @@ enum kfd_dbg_trap_exception_code { }; > > > > /* Mask generated by ecode in kfd_dbg_trap_exception_code */ > > -#define KFD_EC_MASK(ecode) (1ULL << (ecode - 1)) > > +#define KFD_EC_MASK(ecode) (ecode ? (1UL
RE: [PATCH] drm/amdkfd: fix shift out of bounds about gpu debug
[Public] Hi Jon, -Original Message- From: Kim, Jonathan Sent: Thursday, February 29, 2024 11:58 PM To: Zhang, Jesse(Jie) ; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Kuehling, Felix ; Zhang, Yifan ; Zhang, Jesse(Jie) ; Zhang, Jesse(Jie) Subject: RE: [PATCH] drm/amdkfd: fix shift out of bounds about gpu debug [Public] I think this was discussed in another thread. Exception codes should be range checked prior to applying the mask. Raising null events to the debugger or runtime isn't useful. I haven't gotten around to fixing this yet. I should have time this week. Just to double check, the out of bounds shift is because of a CP interrupt that generates a null exception code? [Zhang, Jesse(Jie)] Thanks for your reminder, I saw that discussion. In this interrupt, other fields(such as, source id, client id pasid ) are correct. only the value of context_id0 (0xf) is invalid. How about do the check ,like this: } else if (source_id == SOC15_INTSRC_CP_BAD_OPCODE) { + /* filter out the invalidate context_id0 */ + if (!(context_id0 >> KFD_DEBUG_CP_BAD_OP_ECODE_SHIFT) || + (context_id0 >> KFD_DEBUG_CP_BAD_OP_ECODE_SHIFT) > EC_MAX) + return; kfd_set_dbg_ev_from_interrupt(dev, pasid, KFD_DEBUG_DOORBELL_ID(context_id0), KFD_EC_MASK(KFD_DEBUG_CP_BAD_OP_ECODE(context_id0)), Thanks Jesse Jon > -Original Message- > From: Jesse Zhang > Sent: Thursday, February 29, 2024 3:45 AM > To: amd-gfx@lists.freedesktop.org > Cc: Deucher, Alexander ; Kuehling, Felix > ; Kim, Jonathan ; Zhang, > Yifan ; Zhang, Jesse(Jie) ; > Zhang, Jesse(Jie) > Subject: [PATCH] drm/amdkfd: fix shift out of bounds about gpu debug > > the issue is : > [ 388.151802] UBSAN: shift-out-of-bounds in > drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_int_process_v10.c:346:5 > [ 388.151807] shift exponent 4294967295 is too large for 64-bit type > 'long long unsigned int' > [ 388.151812] CPU: 6 PID: 347 Comm: kworker/6:1H Tainted: GE > 6.7.0+ #1 > [ 388.151814] Hardware name: AMD Splinter/Splinter-GNR, BIOS > WS54117N_140 01/16/2024 > [ 388.151816] Workqueue: KFD IH interrupt_wq [amdgpu] [ 388.152084] > Call Trace: > [ 388.152086] > [ 388.152089] dump_stack_lvl+0x4c/0x70 [ 388.152096] > dump_stack+0x14/0x20 [ 388.152098] ubsan_epilogue+0x9/0x40 [ > 388.152101] __ubsan_handle_shift_out_of_bounds+0x113/0x170 > [ 388.152103] ? vprintk+0x40/0x70 > [ 388.152106] ? swsusp_check+0x131/0x190 [ 388.152110] > event_interrupt_wq_v10.cold+0x16/0x1e [amdgpu] [ 388.152411] ? > raw_spin_rq_unlock+0x14/0x40 [ 388.152415] ? > finish_task_switch+0x85/0x2a0 [ 388.152417] ? > kfifo_copy_out+0x5f/0x70 [ 388.152420] interrupt_wq+0xb2/0x120 > [amdgpu] [ 388.152642] ? interrupt_wq+0xb2/0x120 [amdgpu] [ > 388.152728] process_scheduled_works+0x9a/0x3a0 > [ 388.152731] ? __pfx_worker_thread+0x10/0x10 [ 388.152732] > worker_thread+0x15f/0x2d0 [ 388.152733] ? > __pfx_worker_thread+0x10/0x10 [ 388.152734] kthread+0xfb/0x130 [ > 388.152735] ? __pfx_kthread+0x10/0x10 [ 388.152736] > ret_from_fork+0x3d/0x60 [ 388.152738] ? __pfx_kthread+0x10/0x10 [ > 388.152739] ret_from_fork_asm+0x1b/0x30 [ 388.152742] > > Signed-off-by: Jesse Zhang > --- > include/uapi/linux/kfd_ioctl.h | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/include/uapi/linux/kfd_ioctl.h > b/include/uapi/linux/kfd_ioctl.h index 9ce46edc62a5..3d5867df17e8 > 100644 > --- a/include/uapi/linux/kfd_ioctl.h > +++ b/include/uapi/linux/kfd_ioctl.h > @@ -887,7 +887,7 @@ enum kfd_dbg_trap_exception_code { }; > > /* Mask generated by ecode in kfd_dbg_trap_exception_code */ > -#define KFD_EC_MASK(ecode) (1ULL << (ecode - 1)) > +#define KFD_EC_MASK(ecode) (ecode ? (1ULL << (ecode - 1)) : 0ULL) > > /* Masks for exception code type checks below */ #define > KFD_EC_MASK_QUEUE > (KFD_EC_MASK(EC_QUEUE_WAVE_ABORT) | \ > -- > 2.25.1
[PATCH AUTOSEL 6.7 24/24] drm/amd/display: fix input states translation error for dcn35 & dcn351
From: Swapnil Patel [ Upstream commit 27a6c49394b1a203beeb94752c9a1d6318f24ddf ] [Why] Currently there is an error while translating input clock sates into output clock states. The highest fclk setting from output sates is being dropped because of this error. [How] For dcn35 and dcn351, make output_states equal to input states. Reviewed-by: Charlene Liu Acked-by: Rodrigo Siqueira Tested-by: Daniel Wheeler Signed-off-by: Swapnil Patel Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- .../drm/amd/display/dc/dml2/dml2_translation_helper.c| 9 - 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml2_translation_helper.c b/drivers/gpu/drm/amd/display/dc/dml2/dml2_translation_helper.c index 2c379be19aa84..16452dae4acac 100644 --- a/drivers/gpu/drm/amd/display/dc/dml2/dml2_translation_helper.c +++ b/drivers/gpu/drm/amd/display/dc/dml2/dml2_translation_helper.c @@ -398,7 +398,6 @@ void dml2_init_soc_states(struct dml2_context *dml2, const struct dc *in_dc, /* Copy clocks tables entries, if available */ if (dml2->config.bbox_overrides.clks_table.num_states) { p->in_states->num_states = dml2->config.bbox_overrides.clks_table.num_states; - for (i = 0; i < dml2->config.bbox_overrides.clks_table.num_entries_per_clk.num_dcfclk_levels; i++) { p->in_states->state_array[i].dcfclk_mhz = dml2->config.bbox_overrides.clks_table.clk_entries[i].dcfclk_mhz; } @@ -437,6 +436,14 @@ void dml2_init_soc_states(struct dml2_context *dml2, const struct dc *in_dc, } dml2_policy_build_synthetic_soc_states(s, p); + if (dml2->v20.dml_core_ctx.project == dml_project_dcn35 || + dml2->v20.dml_core_ctx.project == dml_project_dcn351) { + // Override last out_state with data from last in_state + // This will ensure that out_state contains max fclk + memcpy(&p->out_states->state_array[p->out_states->num_states - 1], + &p->in_states->state_array[p->in_states->num_states - 1], + sizeof(struct soc_state_bounding_box_st)); + } } void dml2_translate_ip_params(const struct dc *in, struct ip_params_st *out) -- 2.43.0
Re: [PATCH] drm/amdgpu: remove misleading amdgpu_pmops_runtime_idle() comment
Applied. Thanks! On Thu, Feb 29, 2024 at 1:11 PM Bjorn Helgaas wrote: > > From: Bjorn Helgaas > > After 4020c2280233 ("drm/amdgpu: don't runtime suspend if there are > displays attached (v3)"), "ret" is unconditionally set later before being > used, so there's point in initializing it and the associated comment is no > longer meaningful. > > Remove the comment and the unnecessary initialization. > > Signed-off-by: Bjorn Helgaas > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > index cc69005f5b46..68416e2a9130 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > @@ -2744,8 +2744,7 @@ static int amdgpu_pmops_runtime_idle(struct device *dev) > { > struct drm_device *drm_dev = dev_get_drvdata(dev); > struct amdgpu_device *adev = drm_to_adev(drm_dev); > - /* we don't want the main rpm_idle to call suspend - we want to > autosuspend */ > - int ret = 1; > + int ret; > > if (adev->pm.rpm_mode == AMDGPU_RUNPM_NONE) { > pm_runtime_forbid(dev); > -- > 2.34.1 >
[PATCH] drm/amdgpu: remove misleading amdgpu_pmops_runtime_idle() comment
From: Bjorn Helgaas After 4020c2280233 ("drm/amdgpu: don't runtime suspend if there are displays attached (v3)"), "ret" is unconditionally set later before being used, so there's point in initializing it and the associated comment is no longer meaningful. Remove the comment and the unnecessary initialization. Signed-off-by: Bjorn Helgaas --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c index cc69005f5b46..68416e2a9130 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c @@ -2744,8 +2744,7 @@ static int amdgpu_pmops_runtime_idle(struct device *dev) { struct drm_device *drm_dev = dev_get_drvdata(dev); struct amdgpu_device *adev = drm_to_adev(drm_dev); - /* we don't want the main rpm_idle to call suspend - we want to autosuspend */ - int ret = 1; + int ret; if (adev->pm.rpm_mode == AMDGPU_RUNPM_NONE) { pm_runtime_forbid(dev); -- 2.34.1
Re: [PATCH v3 3/3] drm/amdgpu: sync page table freeing with tlb flush
On 26/02/2024 17:52, Philip Yang wrote: On 2024-02-23 08:42, Shashank Sharma wrote: This patch: - adds a new list in amdgou_vm to hold the VM PT entries being freed - waits for the TLB flush using the vm->tlb_flush_fence - actually frees the PT BOs V2: rebase V3: Do not attach the tlb_fence to the entries, rather add the entries to a list and delay their freeing (Christian) Cc: Christian König Cc: Alex Deucher Cc: Felix Kuehling Cc: Rajneesh Bhardwaj Signed-off-by: Shashank Sharma --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c| 6 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h| 6 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 51 --- 3 files changed, 58 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index 67c690044b97..eebb73f2c2ef 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -939,6 +939,10 @@ int amdgpu_vm_update_range(struct amdgpu_device *adev, struct amdgpu_vm *vm, /* Makes sure no PD/PT is freed before the flush */ dma_resv_add_fence(vm->root.bo->tbo.base.resv, *fence, DMA_RESV_USAGE_BOOKKEEP); + + mutex_lock(&vm->tlb_fence_lock); + vm->tlb_fence_last = *fence; + mutex_unlock(&vm->tlb_fence_lock); } amdgpu_res_first(pages_addr ? NULL : res, offset, @@ -2212,6 +2216,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm, INIT_LIST_HEAD(&vm->freed); INIT_LIST_HEAD(&vm->done); INIT_LIST_HEAD(&vm->pt_freed); + INIT_LIST_HEAD(&vm->tlb_flush_waitlist); INIT_WORK(&vm->pt_free_work, amdgpu_vm_pt_free_work); INIT_KFIFO(vm->faults); @@ -2244,6 +2249,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm, vm->last_unlocked = dma_fence_get_stub(); vm->generation = 0; + mutex_init(&vm->tlb_fence_lock); mutex_init(&vm->eviction_lock); vm->evicting = false; vm->tlb_fence_context = dma_fence_context_alloc(1); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h index 8e6fd25d07b7..77f10ed80973 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h @@ -334,6 +334,10 @@ struct amdgpu_vm { uint64_t*tlb_seq_cpu_addr; uint64_ttlb_fence_context; + struct mutex tlb_fence_lock; + struct dma_fence*tlb_fence_last; + struct list_headtlb_flush_waitlist; + atomic64_t kfd_last_flushed_seq; /* How many times we had to re-generate the page tables */ @@ -379,6 +383,8 @@ struct amdgpu_vm { /* cached fault info */ struct amdgpu_vm_fault_info fault_info; + + int count_bos; }; struct amdgpu_vm_manager { diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c index 95dc0afdaffb..57ea95c5c085 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c @@ -643,13 +643,13 @@ static void amdgpu_vm_pt_free(struct amdgpu_vm_bo_base *entry) if (!entry->bo) return; - entry->bo->vm_bo = NULL; shadow = amdgpu_bo_shadowed(entry->bo); if (shadow) { ttm_bo_set_bulk_move(&shadow->tbo, NULL); amdgpu_bo_unref(&shadow); } ttm_bo_set_bulk_move(&entry->bo->tbo, NULL); + entry->bo->vm_bo = NULL; spin_lock(&entry->vm->status_lock); list_del(&entry->vm_status); @@ -657,6 +657,38 @@ static void amdgpu_vm_pt_free(struct amdgpu_vm_bo_base *entry) amdgpu_bo_unref(&entry->bo); } +static void amdgpu_vm_pt_flush_waitlist(struct amdgpu_vm *vm) +{ + struct amdgpu_vm_bo_base *entry, *next; + LIST_HEAD(tlb_flush_waitlist); + + if (!vm || list_empty(&vm->tlb_flush_waitlist)) + return; + + /* Wait for pending TLB flush before freeing PT BOs */ + mutex_lock(&vm->tlb_fence_lock); + if (vm->tlb_fence_last && !dma_fence_is_signaled(vm->tlb_fence_last)) { + if (dma_fence_wait_timeout(vm->tlb_fence_last, false, + MAX_SCHEDULE_TIMEOUT) <= 0) { + DRM_ERROR("Timedout waiting for TLB flush, not freeing PT BOs\n"); + mutex_unlock(&vm->tlb_fence_lock); + return; + } + + vm->tlb_fence_last = NULL; + } + + /* Save the waitlist locally and reset the flushlist */ + list_splice_init(&vm->tlb_flush_waitlist, &tlb_flush_waitlist); + mutex_unlock(&vm->tlb_fence_lock); + + /* Now free the entries */ + list_for_each_entry_safe(entry, next, &tlb_flush_waitlist, vm_status) { + if (entry) +
RE: [PATCH] drm/amdkfd: fix shift out of bounds about gpu debug
[Public] I think this was discussed in another thread. Exception codes should be range checked prior to applying the mask. Raising null events to the debugger or runtime isn't useful. I haven't gotten around to fixing this yet. I should have time this week. Just to double check, the out of bounds shift is because of a CP interrupt that generates a null exception code? Jon > -Original Message- > From: Jesse Zhang > Sent: Thursday, February 29, 2024 3:45 AM > To: amd-gfx@lists.freedesktop.org > Cc: Deucher, Alexander ; Kuehling, Felix > ; Kim, Jonathan ; > Zhang, Yifan ; Zhang, Jesse(Jie) > ; Zhang, Jesse(Jie) > Subject: [PATCH] drm/amdkfd: fix shift out of bounds about gpu debug > > the issue is : > [ 388.151802] UBSAN: shift-out-of-bounds in > drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_int_process_v10.c:346:5 > [ 388.151807] shift exponent 4294967295 is too large for 64-bit type 'long > long unsigned int' > [ 388.151812] CPU: 6 PID: 347 Comm: kworker/6:1H Tainted: GE > 6.7.0+ #1 > [ 388.151814] Hardware name: AMD Splinter/Splinter-GNR, BIOS > WS54117N_140 01/16/2024 > [ 388.151816] Workqueue: KFD IH interrupt_wq [amdgpu] > [ 388.152084] Call Trace: > [ 388.152086] > [ 388.152089] dump_stack_lvl+0x4c/0x70 > [ 388.152096] dump_stack+0x14/0x20 > [ 388.152098] ubsan_epilogue+0x9/0x40 > [ 388.152101] __ubsan_handle_shift_out_of_bounds+0x113/0x170 > [ 388.152103] ? vprintk+0x40/0x70 > [ 388.152106] ? swsusp_check+0x131/0x190 > [ 388.152110] event_interrupt_wq_v10.cold+0x16/0x1e [amdgpu] > [ 388.152411] ? raw_spin_rq_unlock+0x14/0x40 > [ 388.152415] ? finish_task_switch+0x85/0x2a0 > [ 388.152417] ? kfifo_copy_out+0x5f/0x70 > [ 388.152420] interrupt_wq+0xb2/0x120 [amdgpu] > [ 388.152642] ? interrupt_wq+0xb2/0x120 [amdgpu] > [ 388.152728] process_scheduled_works+0x9a/0x3a0 > [ 388.152731] ? __pfx_worker_thread+0x10/0x10 > [ 388.152732] worker_thread+0x15f/0x2d0 > [ 388.152733] ? __pfx_worker_thread+0x10/0x10 > [ 388.152734] kthread+0xfb/0x130 > [ 388.152735] ? __pfx_kthread+0x10/0x10 > [ 388.152736] ret_from_fork+0x3d/0x60 > [ 388.152738] ? __pfx_kthread+0x10/0x10 > [ 388.152739] ret_from_fork_asm+0x1b/0x30 > [ 388.152742] > > Signed-off-by: Jesse Zhang > --- > include/uapi/linux/kfd_ioctl.h | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h > index 9ce46edc62a5..3d5867df17e8 100644 > --- a/include/uapi/linux/kfd_ioctl.h > +++ b/include/uapi/linux/kfd_ioctl.h > @@ -887,7 +887,7 @@ enum kfd_dbg_trap_exception_code { > }; > > /* Mask generated by ecode in kfd_dbg_trap_exception_code */ > -#define KFD_EC_MASK(ecode) (1ULL << (ecode - 1)) > +#define KFD_EC_MASK(ecode) (ecode ? (1ULL << (ecode - 1)) : 0ULL) > > /* Masks for exception code type checks below */ > #define KFD_EC_MASK_QUEUE > (KFD_EC_MASK(EC_QUEUE_WAVE_ABORT) | \ > -- > 2.25.1
Re: [PATCH 1/2] Revert "drm/amd: Remove freesync video mode amdgpu parameter"
On Wed, Feb 28, 2024 at 9:45 AM Christian König wrote: > > Am 28.02.24 um 15:23 schrieb Alex Deucher: > > On Wed, Feb 28, 2024 at 2:03 AM Christian König > > wrote: > >> Am 27.02.24 um 19:48 schrieb Alex Deucher: > >>> This reverts commit e94e787e37b99645e7c02d20d0a1ba0f8a18a82a. > >>> > >>> This conflicts with how compositors want to handle VRR. Now > >>> that compositors actually handle VRR, we probably don't need > >>> freesync video. > >>> > >>> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2985 > >> Scratching my head what actually happens here? Doesn't the problem then > >> just depend on a module parameter? > > Yes. The problem is that when freesync video is enabled, compositors > > don't know which modes are actual modes and which are a VRR video > > mode. There are still customers that want the vrr video mode smooth > > video playback, but compositors don't want this by default. I guess > > the alternative is to just drop this feature altogether now that > > compositors and media players are starting to support this properly. > > That's what I would suggest as well. > > As far as I can see adding those modes is actually buggy behavior and we > need to avoid it. Well, they work as expected for the use case they were added for, video playback with different refresh rates without a modeset. I'll apply then as is for now and then work on a set of patches to remove the functionality. Alex > > Christian. > > > > > Alex > > > >> Regards, > >> Christian. > >> > >>> Signed-off-by: Alex Deucher > >>> --- > >>>drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 + > >>>drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 27 + > >>>2 files changed, 28 insertions(+) > >>> > >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h > >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h > >>> index 0e365cadcc3fc..925026c183f41 100644 > >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h > >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h > >>> @@ -194,6 +194,7 @@ extern int amdgpu_emu_mode; > >>>extern uint amdgpu_smu_memory_pool_size; > >>>extern int amdgpu_smu_pptable_id; > >>>extern uint amdgpu_dc_feature_mask; > >>> +extern uint amdgpu_freesync_vid_mode; > >>>extern uint amdgpu_dc_debug_mask; > >>>extern uint amdgpu_dc_visual_confirm; > >>>extern int amdgpu_dm_abm_level; > >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > >>> index 15a8a64fc4e28..82b154b103f43 100644 > >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > >>> @@ -199,6 +199,7 @@ int amdgpu_mes_kiq; > >>>int amdgpu_noretry = -1; > >>>int amdgpu_force_asic_type = -1; > >>>int amdgpu_tmz = -1; /* auto */ > >>> +uint amdgpu_freesync_vid_mode; > >>>int amdgpu_reset_method = -1; /* auto */ > >>>int amdgpu_num_kcq = -1; > >>>int amdgpu_smartshift_bias; > >>> @@ -883,6 +884,32 @@ module_param_named(damageclips, amdgpu_damage_clips, > >>> int, 0444); > >>>MODULE_PARM_DESC(tmz, "Enable TMZ feature (-1 = auto (default), 0 = > >>> off, 1 = on)"); > >>>module_param_named(tmz, amdgpu_tmz, int, 0444); > >>> > >>> +/** > >>> + * DOC: freesync_video (uint) > >>> + * Enable the optimization to adjust front porch timing to achieve > >>> seamless > >>> + * mode change experience when setting a freesync supported mode for > >>> which full > >>> + * modeset is not needed. > >>> + * > >>> + * The Display Core will add a set of modes derived from the base > >>> FreeSync > >>> + * video mode into the corresponding connector's mode list based on > >>> commonly > >>> + * used refresh rates and VRR range of the connected display, when users > >>> enable > >>> + * this feature. From the userspace perspective, they can see a seamless > >>> mode > >>> + * change experience when the change between different refresh rates > >>> under the > >>> + * same resolution. Additionally, userspace applications such as Video > >>> playback > >>> + * can read this modeset list and change the refresh rate based on the > >>> video > >>> + * frame rate. Finally, the userspace can also derive an appropriate > >>> mode for a > >>> + * particular refresh rate based on the FreeSync Mode and add it to the > >>> + * connector's mode list. > >>> + * > >>> + * Note: This is an experimental feature. > >>> + * > >>> + * The default value: 0 (off). > >>> + */ > >>> +MODULE_PARM_DESC( > >>> + freesync_video, > >>> + "Enable freesync modesetting optimization feature (0 = off > >>> (default), 1 = on)"); > >>> +module_param_named(freesync_video, amdgpu_freesync_vid_mode, uint, 0444); > >>> + > >>>/** > >>> * DOC: reset_method (int) > >>> * GPU reset method (-1 = auto (default), 0 = legacy, 1 = mode0, 2 = > >>> mode1, 3 = mode2, 4 = baco) >
Re: [PATCH] drm/amd/display: check dc_link before dereferencing
Applied. Thanks! On Tue, Feb 27, 2024 at 2:08 PM Melissa Wen wrote: > > drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:6683 > amdgpu_dm_connector_funcs_force() > warn: variable dereferenced before check 'dc_link' (see line 6663) > > Fixes: 967176179215 ("drm/amd/display: fix null-pointer dereference on edid > reading") > Reported-by: Dan Carpenter > Signed-off-by: Melissa Wen > --- > drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > index 32efce81a5a7..46dd06e8fc7e 100644 > --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > @@ -6653,7 +6653,7 @@ static void amdgpu_dm_connector_funcs_force(struct > drm_connector *connector) > struct edid *edid; > struct i2c_adapter *ddc; > > - if (dc_link->aux_mode) > + if (dc_link && dc_link->aux_mode) > ddc = &aconnector->dm_dp_aux.aux.ddc; > else > ddc = &aconnector->i2c->base; > -- > 2.43.0 >
Re: [PATCH 2/2] drm/amdgpu: workaround to avoid SET_Q_MODE packets
On Thu, Feb 29, 2024 at 9:58 AM Christian König wrote: > > It turned out that executing the SET_Q_MODE packet on every submission > creates to much overhead. > > Implement a workaround which allows skipping the SET_Q_MODE packet if > subsequent submissions all use the same parameters. > > Signed-off-by: Christian König Series is: Reviewed-by: Alex Deucher > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 3 + > drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 104 +++ > 2 files changed, 92 insertions(+), 15 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h > b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h > index 756330767909..582053f1cd56 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h > @@ -285,6 +285,9 @@ struct amdgpu_ring { > unsignedcond_exe_offs; > u64 cond_exe_gpu_addr; > volatile u32*cond_exe_cpu_addr; > + unsigned intset_q_mode_offs; > + volatile u32*set_q_mode_ptr; > + u64 set_q_mode_token; > unsignedvm_hub; > unsignedvm_inv_eng; > struct dma_fence*vmid_wait; > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c > b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c > index 2ccbdee570cf..6e6b6eff48e2 100644 > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c > @@ -5461,6 +5461,11 @@ static void gfx_v11_0_ring_emit_vm_flush(struct > amdgpu_ring *ring, > amdgpu_ring_write(ring, PACKET3(PACKET3_PFP_SYNC_ME, 0)); > amdgpu_ring_write(ring, 0x0); > } > + > + /* Make sure that we can't skip the SET_Q_MODE packets when the VM > +* changed in any way. > +*/ > + ring->set_q_mode_ptr = NULL; > } > > static void gfx_v11_0_ring_emit_fence_kiq(struct amdgpu_ring *ring, u64 addr, > @@ -5510,16 +5515,81 @@ static void gfx_v11_0_ring_emit_cntxcntl(struct > amdgpu_ring *ring, > amdgpu_ring_write(ring, 0); > } > > +static unsigned gfx_v11_0_ring_emit_init_cond_exec(struct amdgpu_ring *ring, > + uint64_t addr) > +{ > + unsigned ret; > + > + amdgpu_ring_write(ring, PACKET3(PACKET3_COND_EXEC, 3)); > + amdgpu_ring_write(ring, lower_32_bits(addr)); > + amdgpu_ring_write(ring, upper_32_bits(addr)); > + /* discard following DWs if *cond_exec_gpu_addr==0 */ > + amdgpu_ring_write(ring, 0); > + ret = ring->wptr & ring->buf_mask; > + /* patch dummy value later */ > + amdgpu_ring_write(ring, 0); > + > + return ret; > +} > + > static void gfx_v11_0_ring_emit_gfx_shadow(struct amdgpu_ring *ring, >u64 shadow_va, u64 csa_va, >u64 gds_va, bool init_shadow, >int vmid) > { > struct amdgpu_device *adev = ring->adev; > + unsigned int offs, end; > > if (!adev->gfx.cp_gfx_shadow) > return; > > + /* > +* The logic here isn't easy to understand because we need to keep > state > +* accross multiple executions of the function as well as between the > +* CPU and GPU. The general idea is that the newly written GPU command > +* has a condition on the previous one and only executed if really > +* necessary. > +*/ > + > + /* > +* The dw in the NOP controls if the next SET_Q_MODE packet should be > +* executed or not. Reserve 64bits just to be on the save side. > +*/ > + amdgpu_ring_write(ring, PACKET3(PACKET3_NOP, 1)); > + offs = ring->wptr & ring->buf_mask; > + > + /* > +* We start with skipping the prefix SET_Q_MODE and always executing > +* the postfix SET_Q_MODE packet. This is changed below with a > +* WRITE_DATA command when the postfix executed. > +*/ > + amdgpu_ring_write(ring, shadow_va ? 1 : 0); > + amdgpu_ring_write(ring, 0); > + > + if (ring->set_q_mode_offs) { > + uint64_t addr; > + > + addr = amdgpu_bo_gpu_offset(ring->ring_obj); > + addr += ring->set_q_mode_offs << 2; > + end = gfx_v11_0_ring_emit_init_cond_exec(ring, addr); > + } > + > + /* > +* When the postfix SET_Q_MODE packet executes we need to make sure > that the > +* next prefix SET_Q_MODE packet executes as well. > +*/ > + if (!shadow_va) { > + uint64_t addr; > + > + addr = amdgpu_bo_gpu_offset(ring->ring_obj); > + addr += offs << 2; > + amdgpu_ring_write(ring, PACKET3(PACKET3_WRITE_DATA, 3)); > + amdgpu_ring_write(ring, WRITE_DATA_DST_SEL(5) | WR_CONFIRM); > + amdgp
Re: [RFC PATCH v4 00/42] Color Pipeline API w/ VKMS
On 2/29/24 10:43, Daniel Vetter wrote: On Mon, Feb 26, 2024 at 04:10:14PM -0500, Harry Wentland wrote: This is an RFC set for a color pipeline API, along with a sample implementation in VKMS. All the key API bits are here. VKMS now supports two named transfer function colorops and two matrix colorops. We have IGT tests that check all four of these colorops with a pixel-by-pixel comparison that checks that these colorops do what we expect them to do with a +/- 1 8 bpc code point margin. So vkms is definitely great to make sure the igts are generic enough and somewhat useful, but ... does steam run on vkms too? I think that would be a really good test to show that the api we have here is actually useful for compositors in a cross-driver way, and not just a neat idea that doesn't survive harsh reality. And yes I realize that's probably going to be a bunch of work, but I feel like the color pipeline discussion has dragged around enough in hypotheticals and concerns that I think it would really help a lot. I don't think we have ever tested Steam/Gamescope on vkms. The last time I tried stuff there, there was all the problems with the ttm page table tail thing for virtio stuff that made using Steam games + Gamescope unfeasable because Vulkan + bindless, but I have heard those are solved now? I will have to try it again at the weekend and see where it's at. I am willing to place my bets that some part of the stack will fall over relating to modifiers somehow... =P But yes, testing there would be good too, as we have the full Steam Deck OLED HDR color pipeline implemented in shader-based composition validated as being 1:1 on a suite of HDR and SDR test images. (That *will* definitely rely on 3D LUTs being tetrahedrally interpolated though) I'll have a look at this at the weekend and also see about getting a Gamescope branch that uses the new wip colorop stuff. - Joshie 🐸✨ Thoughts? -Sima The big new change with v4 is the addition of an amdgpu color pipeline, for all AMD GPUs with DCN 3 and newer. Amdgpu now support the following: 1. 1D Curve EOTF 2. 3x4 CTM 3. Multiplier 4. 1D Curve Inverse EOTF 5. 1D LUT 6. 1D Curve EOTF 7. 1D LUT The supported curves for the 1D Curve type are: - sRGB EOTF and its inverse - PQ EOTF, scaled to [0.0, 125.0] and its inverse - BT.2020/BT.709 OETF and its inverse Note that the 1st and 5th colorops take the EOTF or Inverse OETF while the 3rd colorop takes the Inverse EOTF or OETF. We are working on two more ops for amdgpu, the HDR multiplier and the 3DLUT, which will give us this: 1. 1D Curve EOTF 2. 3x4 CTM 3. HDR Multiplier 4. 1D Curve Inverse EOTF 5. 1D LUT 6. 3D LUT 7. 1D Curve EOTF 8. 1D LUT This, essentially mirrors the color pipeline used by gamescope and presented by Melissa Wen, with the exception of the DEGAM LUT, which is not currently used. See [1] https://indico.freedesktop.org/event/4/contributions/186/attachments/138/218/xdc2023-TheRainbowTreasureMap-MelissaWen.pdf After this we'd like to also add the following ops: - Scaler (Informational only) - Color Encoding, to replace drm_plane's COLOR_ENCODING - Color Range, to replace drm_plane's COLOR_RANGE This patchset is grouped as follows: - Patches 1-3: couple general patches/fixes - Patches 4-7: introduce kunit to VKMS - Patch 7: description of motivation and details behind the Color Pipeline API. If you're reading nothing else but are interested in the topic I highly recommend you take a look at this. - Patches 7-27: DRM core and VKMS changes for color pipeline API - Patches 28-40: DRM core and amdgpu changes for color pipeline API VKMS patches could still be improved in a few ways, though the payoff might be limited and I would rather focus on other work at the moment. The most obvious thing to improve would be to eliminate the hard-coded LUTs for identity, and sRGB, and replace them with fixed-point math instead. There are plenty of things that I would like to see here but haven't had a chance to look at. These will (hopefully) be addressed in future iterations, either in VKMS or amdgpu: - Clear documentation for each drm_colorop_type - Add custom LUT colorops to VKMS - Add pre-blending 3DLUT - How to support HW which can't bypass entire pipeline? - Add ability to create colorops that don't have BYPASS - Can we do a LOAD / COMMIT model for LUTs (and other properties)? - read-only scaling colorop which defines scaling taps and position - read-only color format colorop to define supported color formats for a pipeline - named matrices, for things like converting YUV to RGB IGT tests can be found at https://gitlab.freedesktop.org/hwentland/igt-gpu-tools/-/merge_requests/1 IGT patches are also being sent to the igt-dev mailing list. If you prefer a gitlab MR for review you can find it at https://gitlab.freedesktop.org/hwentland/linux/-/merge_requests/5 v4: - Add amdgpu color pipeline (WIP) - Don't block
[pull] amdgpu drm-fixes-6.8
Hi Dave, Sima, Fixes for 6.8. The following changes since commit d206a76d7d2726f3b096037f2079ce0bd3ba329b: Linux 6.8-rc6 (2024-02-25 15:46:06 -0800) are available in the Git repository at: https://gitlab.freedesktop.org/agd5f/linux.git tags/amd-drm-fixes-6.8-2024-02-29 for you to fetch changes up to b7cdccc6a849568775f738b1e233f751a8fed013: drm/amd/display: Add monitor patch for specific eDP (2024-02-28 17:33:05 -0500) amd-drm-fixes-6.8-2024-02-29: amdgpu: - Fix potential buffer overflow - Fix power min cap - Suspend/resume fix - SI PM fix - eDP fix Alex Deucher (1): Revert "drm/amd/pm: resolve reboot exception for si oland" Ma Jun (1): drm/amdgpu/pm: Fix the power1_min_cap value Prike Liang (1): drm/amdgpu: Enable gpu reset for S3 abort cases on Raven series Ryan Lin (1): drm/amd/display: Add monitor patch for specific eDP Srinivasan Shanmugam (1): drm/amd/display: Prevent potential buffer overflow in map_hw_resources drivers/gpu/drm/amd/amdgpu/soc15.c | 45 -- .../drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c | 6 ++- drivers/gpu/drm/amd/display/dc/dml2/dml2_wrapper.c | 5 +++ drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c | 29 ++ drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c | 9 ++--- drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c| 9 ++--- .../drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c| 9 ++--- .../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c | 9 ++--- .../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_7_ppt.c | 9 ++--- 9 files changed, 83 insertions(+), 47 deletions(-)
[PATCH 1/2] drm/amdgpu: cleanup conditional execution
First of all calculating the number of dw to patch into a conditional execution is not something HW generation specific. This is just standard ring buffer calculations. While at it also reduce the BUG_ON() into WARN_ON(). Then instead of a random bit pattern use 0 as default value for the number of dw skipped, this way it's not mandatory any more to patch the conditional execution. And last make the address to check a parameter of the conditional execution instead of getting this from the ring. Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 21 ++--- drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 30 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 8 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c | 26 +--- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 28 +++--- drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 27 +++-- drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c| 28 +++--- drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c| 28 +++--- drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c | 29 +++ drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c | 29 +++ drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c | 29 +++ 11 files changed, 99 insertions(+), 184 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c index 6aa3b1d845ab..8b512dc28df8 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c @@ -131,7 +131,6 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned int num_ibs, struct amdgpu_ib *ib = &ibs[0]; struct dma_fence *tmp = NULL; bool need_ctx_switch; - unsigned int patch_offset = ~0; struct amdgpu_vm *vm; uint64_t fence_ctx; uint32_t status = 0, alloc_size; @@ -139,10 +138,11 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned int num_ibs, bool secure, init_shadow; u64 shadow_va, csa_va, gds_va; int vmid = AMDGPU_JOB_GET_VMID(job); + bool need_pipe_sync = false; + unsigned int cond_exec; unsigned int i; int r = 0; - bool need_pipe_sync = false; if (num_ibs == 0) return -EINVAL; @@ -228,7 +228,8 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned int num_ibs, init_shadow, vmid); if (ring->funcs->init_cond_exec) - patch_offset = amdgpu_ring_init_cond_exec(ring); + cond_exec = amdgpu_ring_init_cond_exec(ring, + ring->cond_exe_gpu_addr); amdgpu_device_flush_hdp(adev, ring); @@ -278,16 +279,9 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned int num_ibs, fence_flags | AMDGPU_FENCE_FLAG_64BIT); } - if (ring->funcs->emit_gfx_shadow) { + if (ring->funcs->emit_gfx_shadow && ring->funcs->init_cond_exec) { amdgpu_ring_emit_gfx_shadow(ring, 0, 0, 0, false, 0); - - if (ring->funcs->init_cond_exec) { - unsigned int ce_offset = ~0; - - ce_offset = amdgpu_ring_init_cond_exec(ring); - if (ce_offset != ~0 && ring->funcs->patch_cond_exec) - amdgpu_ring_patch_cond_exec(ring, ce_offset); - } + amdgpu_ring_init_cond_exec(ring, ring->cond_exe_gpu_addr); } r = amdgpu_fence_emit(ring, f, job, fence_flags); @@ -302,8 +296,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned int num_ibs, if (ring->funcs->insert_end) ring->funcs->insert_end(ring); - if (patch_offset != ~0 && ring->funcs->patch_cond_exec) - amdgpu_ring_patch_cond_exec(ring, patch_offset); + amdgpu_ring_patch_cond_exec(ring, cond_exec); ring->current_ctx = fence_ctx; if (vm && ring->funcs->emit_switch_buffer) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h index fe1a61eb6e4c..756330767909 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h @@ -209,8 +209,7 @@ struct amdgpu_ring_funcs { void (*insert_end)(struct amdgpu_ring *ring); /* pad the indirect buffer to the necessary number of dw */ void (*pad_ib)(struct amdgpu_ring *ring, struct amdgpu_ib *ib); - unsigned (*init_cond_exec)(struct amdgpu_ring *ring); - void (*patch_cond_exec)(struct amdgpu_ring *ring, unsigned offset); + unsigned (*init_cond_exec)(struct amdgpu_ring *ring, uint64_t addr); /* note usage for clock and power gating */ void (*begin_use)(struct amdgpu_ring *ring); void (*end_use)(struct amdgpu_ring *ring); @@ -327,8 +326,7 @@ struct amdgpu_ring { #def
[PATCH 2/2] drm/amdgpu: workaround to avoid SET_Q_MODE packets
It turned out that executing the SET_Q_MODE packet on every submission creates to much overhead. Implement a workaround which allows skipping the SET_Q_MODE packet if subsequent submissions all use the same parameters. Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 3 + drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 104 +++ 2 files changed, 92 insertions(+), 15 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h index 756330767909..582053f1cd56 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h @@ -285,6 +285,9 @@ struct amdgpu_ring { unsignedcond_exe_offs; u64 cond_exe_gpu_addr; volatile u32*cond_exe_cpu_addr; + unsigned intset_q_mode_offs; + volatile u32*set_q_mode_ptr; + u64 set_q_mode_token; unsignedvm_hub; unsignedvm_inv_eng; struct dma_fence*vmid_wait; diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c index 2ccbdee570cf..6e6b6eff48e2 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c @@ -5461,6 +5461,11 @@ static void gfx_v11_0_ring_emit_vm_flush(struct amdgpu_ring *ring, amdgpu_ring_write(ring, PACKET3(PACKET3_PFP_SYNC_ME, 0)); amdgpu_ring_write(ring, 0x0); } + + /* Make sure that we can't skip the SET_Q_MODE packets when the VM +* changed in any way. +*/ + ring->set_q_mode_ptr = NULL; } static void gfx_v11_0_ring_emit_fence_kiq(struct amdgpu_ring *ring, u64 addr, @@ -5510,16 +5515,81 @@ static void gfx_v11_0_ring_emit_cntxcntl(struct amdgpu_ring *ring, amdgpu_ring_write(ring, 0); } +static unsigned gfx_v11_0_ring_emit_init_cond_exec(struct amdgpu_ring *ring, + uint64_t addr) +{ + unsigned ret; + + amdgpu_ring_write(ring, PACKET3(PACKET3_COND_EXEC, 3)); + amdgpu_ring_write(ring, lower_32_bits(addr)); + amdgpu_ring_write(ring, upper_32_bits(addr)); + /* discard following DWs if *cond_exec_gpu_addr==0 */ + amdgpu_ring_write(ring, 0); + ret = ring->wptr & ring->buf_mask; + /* patch dummy value later */ + amdgpu_ring_write(ring, 0); + + return ret; +} + static void gfx_v11_0_ring_emit_gfx_shadow(struct amdgpu_ring *ring, u64 shadow_va, u64 csa_va, u64 gds_va, bool init_shadow, int vmid) { struct amdgpu_device *adev = ring->adev; + unsigned int offs, end; if (!adev->gfx.cp_gfx_shadow) return; + /* +* The logic here isn't easy to understand because we need to keep state +* accross multiple executions of the function as well as between the +* CPU and GPU. The general idea is that the newly written GPU command +* has a condition on the previous one and only executed if really +* necessary. +*/ + + /* +* The dw in the NOP controls if the next SET_Q_MODE packet should be +* executed or not. Reserve 64bits just to be on the save side. +*/ + amdgpu_ring_write(ring, PACKET3(PACKET3_NOP, 1)); + offs = ring->wptr & ring->buf_mask; + + /* +* We start with skipping the prefix SET_Q_MODE and always executing +* the postfix SET_Q_MODE packet. This is changed below with a +* WRITE_DATA command when the postfix executed. +*/ + amdgpu_ring_write(ring, shadow_va ? 1 : 0); + amdgpu_ring_write(ring, 0); + + if (ring->set_q_mode_offs) { + uint64_t addr; + + addr = amdgpu_bo_gpu_offset(ring->ring_obj); + addr += ring->set_q_mode_offs << 2; + end = gfx_v11_0_ring_emit_init_cond_exec(ring, addr); + } + + /* +* When the postfix SET_Q_MODE packet executes we need to make sure that the +* next prefix SET_Q_MODE packet executes as well. +*/ + if (!shadow_va) { + uint64_t addr; + + addr = amdgpu_bo_gpu_offset(ring->ring_obj); + addr += offs << 2; + amdgpu_ring_write(ring, PACKET3(PACKET3_WRITE_DATA, 3)); + amdgpu_ring_write(ring, WRITE_DATA_DST_SEL(5) | WR_CONFIRM); + amdgpu_ring_write(ring, lower_32_bits(addr)); + amdgpu_ring_write(ring, upper_32_bits(addr)); + amdgpu_ring_write(ring, 0x1); + } + amdgpu_ring_write(ring, PACKET3(PACKET3_SET_Q_PREEMPTION_MODE, 7)); amdgpu_ring_write(ring, lower_32_bits(shadow_va)); amdgpu_ring_write(ring, upper_
Re: [PATCH v2] drm/amgpu: Check return value of amdgpu_device_baco_enter/exit
On 2/29/2024 4:40 PM, Ma, Jun wrote: > Hi Lijo, > > On 2/29/2024 3:33 PM, Lazar, Lijo wrote: >> >> >> On 2/29/2024 11:49 AM, Ma Jun wrote: >>> Check return value of amdgpu_device_baco_enter/exit and print >>> warning message because these errors may cause runtime resume failure >>> >>> Signed-off-by: Ma Jun >>> --- >>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 29 -- >>> 1 file changed, 22 insertions(+), 7 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>> index e68bd6f8a6a4..4928b588cd12 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>> @@ -6000,15 +6000,24 @@ int amdgpu_device_baco_enter(struct drm_device *dev) >>> { >>> struct amdgpu_device *adev = drm_to_adev(dev); >>> struct amdgpu_ras *ras = amdgpu_ras_get_context(adev); >>> + int ret = 0; >>> >>> - if (!amdgpu_device_supports_baco(dev)) >>> - return -ENOTSUPP; >>> + if (!amdgpu_device_supports_baco(dev)) { >>> + ret = -ENOTSUPP; >>> + goto baco_error; >>> + } >>> >>> if (ras && adev->ras_enabled && >>> adev->nbio.funcs->enable_doorbell_interrupt) >>> adev->nbio.funcs->enable_doorbell_interrupt(adev, false); >>> >>> - return amdgpu_dpm_baco_enter(adev); >>> + ret = amdgpu_dpm_baco_enter(adev); >>> + >>> +baco_error: >>> + if (ret) >>> + dev_warn(adev->dev, "warning: device fails to enter baco. >>> ret=%d\n", ret); >>> + >> >> This doesn't look like a real case, moreover the warning message is > > In fact this is a case that actually happened. > > When amdgpu_device_supports_baco returns with error because of some reasons, > device will enter runtime suspend without calling the > amdgpu_device_baco_enter() > and without any warning message being printed. Then, device is usually fails > to resume. > So, I add this message as a warning to help us find the real reason. > >> misleading. If the device doesn't support baco, driver is not supposed >> to call it for runpm purpose - >> https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c#L2664 >> > I changed this code in another patch. > https://lists.freedesktop.org/archives/amd-gfx/2024-February/104929.html > Baco support checks are all based on cached software variables and not based on hardware interactions. So this is very unlikely. It might also be that driver really called baco entry, but even before baco entry happened a device usage is detected for which resume is called. One other way to really check this is to check if baco exit is called when the software state is not really SMU_BACO_STATE_ENTER (applicable only for swsmu). Since baco entry is driver triggered, the imbalance shouldn't happen. Thanks, Lijo > Regards, > Ma Jun > >> Thanks, >> Lijo >> >>> + return ret; >>> } >>> >>> int amdgpu_device_baco_exit(struct drm_device *dev) >>> @@ -6017,12 +6026,14 @@ int amdgpu_device_baco_exit(struct drm_device *dev) >>> struct amdgpu_ras *ras = amdgpu_ras_get_context(adev); >>> int ret = 0; >>> >>> - if (!amdgpu_device_supports_baco(dev)) >>> - return -ENOTSUPP; >>> + if (!amdgpu_device_supports_baco(dev)) { >>> + ret = -ENOTSUPP; >>> + goto baco_error; >>> + } >>> >>> ret = amdgpu_dpm_baco_exit(adev); >>> if (ret) >>> - return ret; >>> + goto baco_error; >>> >>> if (ras && adev->ras_enabled && >>> adev->nbio.funcs->enable_doorbell_interrupt) >>> @@ -6032,7 +6043,11 @@ int amdgpu_device_baco_exit(struct drm_device *dev) >>> adev->nbio.funcs->clear_doorbell_interrupt) >>> adev->nbio.funcs->clear_doorbell_interrupt(adev); >>> >>> - return 0; >>> +baco_error: >>> + if (ret) >>> + dev_warn(adev->dev, "warning: device fails to exit from baco. >>> ret=%d\n", ret); >>> + >>> + return ret; >>> } >>> >>> /**
Re: [PATCH V3] Revert "drm/amdgpu: remove vm sanity check from amdgpu_vm_make_compute" for Raven
Am 29.02.24 um 06:51 schrieb Jesse.Zhang: fix the issue: "amdgpu: Failed to create process VM object". [Why]when amdgpu initialized, seq64 do mampping and update bo mapping in vm page table. But when clifo run. It also initializes a vm for a process device through the function kfd_process_device_init_vm and ensure the root PD is clean through the function amdgpu_vm_pt_is_root_clean. So they have a conflict, and clinfo always failed. v1: - remove all the pte_supports_ats stuff from the amdgpu_vm code (Felix) Signed-off-by: Jesse Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c| 23 -- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h| 3 -- drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 55 +-- 3 files changed, 1 insertion(+), 80 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index ed4a8c5d26d7..d004ace79536 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -1385,10 +1385,6 @@ int amdgpu_vm_clear_freed(struct amdgpu_device *adev, struct amdgpu_bo_va_mapping, list); list_del(&mapping->list); - if (vm->pte_support_ats && - mapping->start < AMDGPU_GMC_HOLE_START) - init_pte_value = AMDGPU_PTE_DEFAULT_ATC; - r = amdgpu_vm_update_range(adev, vm, false, false, true, false, resv, mapping->start, mapping->last, init_pte_value, 0, 0, NULL, NULL, @@ -2264,7 +2260,6 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm, if (r) return r; - vm->pte_support_ats = false; vm->is_compute_context = false; vm->use_cpu_for_update = !!(adev->vm_manager.vm_update_mode & @@ -2350,30 +2345,12 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm, */ int amdgpu_vm_make_compute(struct amdgpu_device *adev, struct amdgpu_vm *vm) { - bool pte_support_ats = (adev->asic_type == CHIP_RAVEN); int r; r = amdgpu_bo_reserve(vm->root.bo, true); if (r) return r; - /* Check if PD needs to be reinitialized and do it before -* changing any other state, in case it fails. -*/ - if (pte_support_ats != vm->pte_support_ats) { - /* Sanity checks */ - if (!amdgpu_vm_pt_is_root_clean(adev, vm)) { - r = -EINVAL; - goto unreserve_bo; - } - - vm->pte_support_ats = pte_support_ats; - r = amdgpu_vm_pt_clear(adev, vm, to_amdgpu_bo_vm(vm->root.bo), - false); - if (r) - goto unreserve_bo; - } - /* Update VM state */ vm->use_cpu_for_update = !!(adev->vm_manager.vm_update_mode & AMDGPU_VM_USE_CPU_FOR_COMPUTE); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h index 42f6ddec50c1..9f6b5e1ccf34 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h @@ -357,9 +357,6 @@ struct amdgpu_vm { /* Functions to use for VM table updates */ const struct amdgpu_vm_update_funcs *update_funcs; - /* Flag to indicate ATS support from PTE for GFX9 */ - boolpte_support_ats; - /* Up to 128 pending retry page faults */ DECLARE_KFIFO(faults, u64, 128); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c index a160265ddc07..f07255532aae 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c @@ -89,22 +89,6 @@ static unsigned int amdgpu_vm_pt_num_entries(struct amdgpu_device *adev, return AMDGPU_VM_PTE_COUNT(adev); } -/** - * amdgpu_vm_pt_num_ats_entries - return the number of ATS entries in the root PD - * - * @adev: amdgpu_device pointer - * - * Returns: - * The number of entries in the root page directory which needs the ATS setting. - */ -static unsigned int amdgpu_vm_pt_num_ats_entries(struct amdgpu_device *adev) -{ - unsigned int shift; - - shift = amdgpu_vm_pt_level_shift(adev, adev->vm_manager.root_level); - return AMDGPU_GMC_HOLE_START >> (shift + AMDGPU_GPU_PAGE_SHIFT); -} - /** * amdgpu_vm_pt_entries_mask - the mask to get the entry number of a PD/PT * @@ -394,27 +378,7 @@ int amdgpu_vm_pt_clear(struct amdgpu_device *adev, struct amdgpu_vm *vm, } entries = amdgpu_bo_size(bo) / 8; - if (!vm->pte_support_ats) { - ats_entries = 0; - - } else if (!bo->parent) { - ats_entries = amdgpu_vm_pt_num_ats_entries(adev); - ats_entries = min(ats_entries, entries); - entries -= ats_entries; -
Re: [PATCH v2] drm/amgpu: Check return value of amdgpu_device_baco_enter/exit
Hi Lijo, On 2/29/2024 3:33 PM, Lazar, Lijo wrote: > > > On 2/29/2024 11:49 AM, Ma Jun wrote: >> Check return value of amdgpu_device_baco_enter/exit and print >> warning message because these errors may cause runtime resume failure >> >> Signed-off-by: Ma Jun >> --- >> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 29 -- >> 1 file changed, 22 insertions(+), 7 deletions(-) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> index e68bd6f8a6a4..4928b588cd12 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> @@ -6000,15 +6000,24 @@ int amdgpu_device_baco_enter(struct drm_device *dev) >> { >> struct amdgpu_device *adev = drm_to_adev(dev); >> struct amdgpu_ras *ras = amdgpu_ras_get_context(adev); >> +int ret = 0; >> >> -if (!amdgpu_device_supports_baco(dev)) >> -return -ENOTSUPP; >> +if (!amdgpu_device_supports_baco(dev)) { >> +ret = -ENOTSUPP; >> +goto baco_error; >> +} >> >> if (ras && adev->ras_enabled && >> adev->nbio.funcs->enable_doorbell_interrupt) >> adev->nbio.funcs->enable_doorbell_interrupt(adev, false); >> >> -return amdgpu_dpm_baco_enter(adev); >> +ret = amdgpu_dpm_baco_enter(adev); >> + >> +baco_error: >> +if (ret) >> +dev_warn(adev->dev, "warning: device fails to enter baco. >> ret=%d\n", ret); >> + > > This doesn't look like a real case, moreover the warning message is In fact this is a case that actually happened. When amdgpu_device_supports_baco returns with error because of some reasons, device will enter runtime suspend without calling the amdgpu_device_baco_enter() and without any warning message being printed. Then, device is usually fails to resume. So, I add this message as a warning to help us find the real reason. > misleading. If the device doesn't support baco, driver is not supposed > to call it for runpm purpose - > https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c#L2664 > I changed this code in another patch. https://lists.freedesktop.org/archives/amd-gfx/2024-February/104929.html Regards, Ma Jun > Thanks, > Lijo > >> +return ret; >> } >> >> int amdgpu_device_baco_exit(struct drm_device *dev) >> @@ -6017,12 +6026,14 @@ int amdgpu_device_baco_exit(struct drm_device *dev) >> struct amdgpu_ras *ras = amdgpu_ras_get_context(adev); >> int ret = 0; >> >> -if (!amdgpu_device_supports_baco(dev)) >> -return -ENOTSUPP; >> +if (!amdgpu_device_supports_baco(dev)) { >> +ret = -ENOTSUPP; >> +goto baco_error; >> +} >> >> ret = amdgpu_dpm_baco_exit(adev); >> if (ret) >> -return ret; >> +goto baco_error; >> >> if (ras && adev->ras_enabled && >> adev->nbio.funcs->enable_doorbell_interrupt) >> @@ -6032,7 +6043,11 @@ int amdgpu_device_baco_exit(struct drm_device *dev) >> adev->nbio.funcs->clear_doorbell_interrupt) >> adev->nbio.funcs->clear_doorbell_interrupt(adev); >> >> -return 0; >> +baco_error: >> +if (ret) >> +dev_warn(adev->dev, "warning: device fails to exit from baco. >> ret=%d\n", ret); >> + >> +return ret; >> } >> >> /**
Re: [RFC PATCH v4 00/42] Color Pipeline API w/ VKMS
On Mon, Feb 26, 2024 at 04:10:14PM -0500, Harry Wentland wrote: > This is an RFC set for a color pipeline API, along with a sample > implementation in VKMS. All the key API bits are here. VKMS now > supports two named transfer function colorops and two matrix > colorops. We have IGT tests that check all four of these colorops > with a pixel-by-pixel comparison that checks that these colorops > do what we expect them to do with a +/- 1 8 bpc code point margin. So vkms is definitely great to make sure the igts are generic enough and somewhat useful, but ... does steam run on vkms too? I think that would be a really good test to show that the api we have here is actually useful for compositors in a cross-driver way, and not just a neat idea that doesn't survive harsh reality. And yes I realize that's probably going to be a bunch of work, but I feel like the color pipeline discussion has dragged around enough in hypotheticals and concerns that I think it would really help a lot. Thoughts? -Sima > > The big new change with v4 is the addition of an amdgpu color > pipeline, for all AMD GPUs with DCN 3 and newer. Amdgpu now support > the following: > > 1. 1D Curve EOTF > 2. 3x4 CTM > 3. Multiplier > 4. 1D Curve Inverse EOTF > 5. 1D LUT > 6. 1D Curve EOTF > 7. 1D LUT > > The supported curves for the 1D Curve type are: > - sRGB EOTF and its inverse > - PQ EOTF, scaled to [0.0, 125.0] and its inverse > - BT.2020/BT.709 OETF and its inverse > > Note that the 1st and 5th colorops take the EOTF or Inverse > OETF while the 3rd colorop takes the Inverse EOTF or OETF. > > We are working on two more ops for amdgpu, the HDR multiplier > and the 3DLUT, which will give us this: > > 1. 1D Curve EOTF > 2. 3x4 CTM > 3. HDR Multiplier > 4. 1D Curve Inverse EOTF > 5. 1D LUT > 6. 3D LUT > 7. 1D Curve EOTF > 8. 1D LUT > > This, essentially mirrors the color pipeline used by gamescope > and presented by Melissa Wen, with the exception of the DEGAM > LUT, which is not currently used. See > [1] > https://indico.freedesktop.org/event/4/contributions/186/attachments/138/218/xdc2023-TheRainbowTreasureMap-MelissaWen.pdf > > After this we'd like to also add the following ops: > - Scaler (Informational only) > - Color Encoding, to replace drm_plane's COLOR_ENCODING > - Color Range, to replace drm_plane's COLOR_RANGE > > This patchset is grouped as follows: > - Patches 1-3: couple general patches/fixes > - Patches 4-7: introduce kunit to VKMS > - Patch 7: description of motivation and details behind the > Color Pipeline API. If you're reading nothing else > but are interested in the topic I highly recommend > you take a look at this. > - Patches 7-27: DRM core and VKMS changes for color pipeline API > - Patches 28-40: DRM core and amdgpu changes for color pipeline API > > VKMS patches could still be improved in a few ways, though the > payoff might be limited and I would rather focus on other work > at the moment. The most obvious thing to improve would be to > eliminate the hard-coded LUTs for identity, and sRGB, and replace > them with fixed-point math instead. > > There are plenty of things that I would like to see here but > haven't had a chance to look at. These will (hopefully) be > addressed in future iterations, either in VKMS or amdgpu: > - Clear documentation for each drm_colorop_type > - Add custom LUT colorops to VKMS > - Add pre-blending 3DLUT > - How to support HW which can't bypass entire pipeline? > - Add ability to create colorops that don't have BYPASS > - Can we do a LOAD / COMMIT model for LUTs (and other properties)? > - read-only scaling colorop which defines scaling taps and position > - read-only color format colorop to define supported color formats >for a pipeline > - named matrices, for things like converting YUV to RGB > > IGT tests can be found at > https://gitlab.freedesktop.org/hwentland/igt-gpu-tools/-/merge_requests/1 > > IGT patches are also being sent to the igt-dev mailing list. > > If you prefer a gitlab MR for review you can find it at > https://gitlab.freedesktop.org/hwentland/linux/-/merge_requests/5 > > v4: > - Add amdgpu color pipeline (WIP) > - Don't block setting of deprecated properties, instead pass client cap >to atomic check so drivers can ignore these props > - Drop IOCTL definitions (Pekka) > - Use enum property for colorop TYPE (Pekka) > - A few cleanups to the docs (Pekka) > - Rework the TYPE enum to name relation to avoid code duplication (Pekka) > - Add missing function declarations (Chaitanya Kumar Borah) > - Allow setting of NEXT property to NULL in _set_ function (Chaitanya Kumar > Borah) > - Add helper for creation of pipeline drm_plane property (Pekka) > - Always create Bypass pipeline (Pekka) > - A bunch of changes to VKMS kunit tests (Pekka) > - Fix index in CTM doc (Pekka) > > v3: > - Abandon IOCTLs and discover colorops as clients iterate the pipeline > - Remove need for libdrm > - Add co
[PATCH] drm/amdkfd: fix shift out of bounds about gpu debug
the issue is : [ 388.151802] UBSAN: shift-out-of-bounds in drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_int_process_v10.c:346:5 [ 388.151807] shift exponent 4294967295 is too large for 64-bit type 'long long unsigned int' [ 388.151812] CPU: 6 PID: 347 Comm: kworker/6:1H Tainted: GE 6.7.0+ #1 [ 388.151814] Hardware name: AMD Splinter/Splinter-GNR, BIOS WS54117N_140 01/16/2024 [ 388.151816] Workqueue: KFD IH interrupt_wq [amdgpu] [ 388.152084] Call Trace: [ 388.152086] [ 388.152089] dump_stack_lvl+0x4c/0x70 [ 388.152096] dump_stack+0x14/0x20 [ 388.152098] ubsan_epilogue+0x9/0x40 [ 388.152101] __ubsan_handle_shift_out_of_bounds+0x113/0x170 [ 388.152103] ? vprintk+0x40/0x70 [ 388.152106] ? swsusp_check+0x131/0x190 [ 388.152110] event_interrupt_wq_v10.cold+0x16/0x1e [amdgpu] [ 388.152411] ? raw_spin_rq_unlock+0x14/0x40 [ 388.152415] ? finish_task_switch+0x85/0x2a0 [ 388.152417] ? kfifo_copy_out+0x5f/0x70 [ 388.152420] interrupt_wq+0xb2/0x120 [amdgpu] [ 388.152642] ? interrupt_wq+0xb2/0x120 [amdgpu] [ 388.152728] process_scheduled_works+0x9a/0x3a0 [ 388.152731] ? __pfx_worker_thread+0x10/0x10 [ 388.152732] worker_thread+0x15f/0x2d0 [ 388.152733] ? __pfx_worker_thread+0x10/0x10 [ 388.152734] kthread+0xfb/0x130 [ 388.152735] ? __pfx_kthread+0x10/0x10 [ 388.152736] ret_from_fork+0x3d/0x60 [ 388.152738] ? __pfx_kthread+0x10/0x10 [ 388.152739] ret_from_fork_asm+0x1b/0x30 [ 388.152742] Signed-off-by: Jesse Zhang --- include/uapi/linux/kfd_ioctl.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h index 9ce46edc62a5..3d5867df17e8 100644 --- a/include/uapi/linux/kfd_ioctl.h +++ b/include/uapi/linux/kfd_ioctl.h @@ -887,7 +887,7 @@ enum kfd_dbg_trap_exception_code { }; /* Mask generated by ecode in kfd_dbg_trap_exception_code */ -#define KFD_EC_MASK(ecode) (1ULL << (ecode - 1)) +#define KFD_EC_MASK(ecode) (ecode ? (1ULL << (ecode - 1)) : 0ULL) /* Masks for exception code type checks below */ #define KFD_EC_MASK_QUEUE (KFD_EC_MASK(EC_QUEUE_WAVE_ABORT) | \ -- 2.25.1