[PATCH v2] drm/amdgpu/sriov: Only sriov runtime support use kiq
Refine the code style: add brackets around the SR-IOV condition. For SR-IOV,
don't use the KIQ while in exclusive mode; there is no way to know how long
the access will take, so it can trigger an exclusive-mode timeout.

Signed-off-by: Emily Deng
---
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index f71615e..adfd0bd 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -320,7 +320,7 @@ signed long amdgpu_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
 	struct amdgpu_kiq *kiq = &adev->gfx.kiq;
 	struct amdgpu_ring *ring = &kiq->ring;
 
-	if (!ring->ready)
+	if (!ring->ready || (!amdgpu_sriov_runtime(adev) && amdgpu_sriov_vf(adev)))
 		return -EINVAL;
 
 	spin_lock_irqsave(&kiq->ring_lock, flags);
--
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH] drm/amdgpu/sriov: Only sriov runtime support use kiq
For SR-IOV, don't use the KIQ while in exclusive mode; there is no way to
know how long the access will take, so it can trigger an exclusive-mode
timeout.

Signed-off-by: Emily Deng
---
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index f71615e..062391e 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -320,7 +320,7 @@ signed long amdgpu_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
 	struct amdgpu_kiq *kiq = &adev->gfx.kiq;
 	struct amdgpu_ring *ring = &kiq->ring;
 
-	if (!ring->ready)
+	if (!ring->ready || !amdgpu_sriov_runtime(adev) && amdgpu_sriov_vf(adev))
 		return -EINVAL;
 
 	spin_lock_irqsave(&kiq->ring_lock, flags);
--
2.7.4
RE: [PATCH] amdgpu: fix multi-process hang issue
>-----Original Message-----
>From: Christian König
>Sent: Monday, August 20, 2018 9:15 PM
>To: Deng, Emily; amd-gfx@lists.freedesktop.org
>Cc: Liu, Monk
>Subject: Re: [PATCH] amdgpu: fix multi-process hang issue
>
>Am 20.08.2018 um 05:35 schrieb Emily Deng:
>> From: Monk Liu
>>
>> SWDEV-146499: hang during multi vulkan process testing
>>
>> cause:
>> the second frame's PREAMBLE_IB has clear-state and LOAD actions;
>> those actions ruin the pipeline that is still processing the
>> previous frame's workload IB.
>>
>> fix:
>> insert a pipeline sync when there is a context switch under SRIOV
>> (because only SRIOV reports the PREEMPTION flag to the UMD)
>>
>> Change-Id: If676da7a9b0147114cd76d19b6035ed8033de449
>> Signed-off-by: Monk Liu
>> ---
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 15 +++++++++++----
>>  1 file changed, 11 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>> index 5c22cfd..66efa85 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>> @@ -95,6 +95,12 @@ void amdgpu_ib_free(struct amdgpu_device *adev, struct amdgpu_ib *ib,
>>  	amdgpu_sa_bo_free(adev, &ib->sa_bo, f);
>>  }
>>
>> +static bool amdgpu_ib_has_preamble(struct amdgpu_ring *ring, bool ctx_switch) {
>> +	return (amdgpu_sriov_vf(ring->adev) &&
>> +		ring->funcs->type == AMDGPU_RING_TYPE_GFX &&
>> +		ctx_switch);
>> +}
>> +
>
>Well NAK, please merge that into amdgpu_ib_schedule() and drop the extra
>check for GFX, that will apply to compute queues as well.
>
>>  /**
>>   * amdgpu_ib_schedule - schedule an IB (Indirect Buffer) on the ring
>>   *
>> @@ -123,7 +129,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
>>  	struct amdgpu_device *adev = ring->adev;
>>  	struct amdgpu_ib *ib = &ibs[0];
>>  	struct dma_fence *tmp = NULL;
>> -	bool skip_preamble, need_ctx_switch;
>> +	bool need_ctx_switch;
>>  	unsigned patch_offset = ~0;
>>  	struct amdgpu_vm *vm;
>>  	uint64_t fence_ctx;
>> @@ -156,6 +162,8 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
>>  		return -EINVAL;
>>  	}
>>
>> +	need_ctx_switch = ring->current_ctx != fence_ctx;
>> +
>>  	alloc_size = ring->funcs->emit_frame_size + num_ibs *
>>  		ring->funcs->emit_ib_size;
>>
>> @@ -167,6 +175,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
>>
>>  	if (ring->funcs->emit_pipeline_sync && job &&
>>  	    ((tmp = amdgpu_sync_get_fence(&job->sched_sync, NULL)) ||
>> +	     amdgpu_ib_has_preamble(ring, need_ctx_switch) ||
>>  	     amdgpu_vm_need_pipeline_sync(ring, job))) {
>>  		need_pipe_sync = true;
>>
>> @@ -200,8 +209,6 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
>>  		amdgpu_asic_flush_hdp(adev, ring);
>>  	}
>>
>> -	skip_preamble = ring->current_ctx == fence_ctx;
>> -	need_ctx_switch = ring->current_ctx != fence_ctx;
>>  	if (job && ring->funcs->emit_cntxcntl) {
>>  		if (need_ctx_switch)
>>  			status |= AMDGPU_HAVE_CTX_SWITCH;
>> @@ -215,7 +222,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
>>
>>  		/* drop preamble IBs if we don't have a context switch */
>>  		if ((ib->flags & AMDGPU_IB_FLAG_PREAMBLE) &&
>> -		    skip_preamble &&
>> +		    !need_ctx_switch &&
>>  		    !(status & AMDGPU_PREAMBLE_IB_PRESENT_FIRST) &&
>>  		    !amdgpu_sriov_vf(adev)) /* for SRIOV preemption, Preamble CE ib must be inserted anyway */
>
>Ok, please clean that up more carefully. All prerequisites which are not IB
>dependent should come before the loop.
>
>BTW: The handling of AMDGPU_PREAMBLE_IB_PRESENT_FIRST in amdgpu_cs.c
>is completely broken as well.

Thanks, will refine more carefully.

>Regards,
>Christian.
>
>> 			continue;
Re: [PATCH] drm/powerplay: enable dpm under pass-through
On 08/20/2018 11:39 AM, Yintian Tao wrote:

  Re-enable DPM under pass-through because there is no actual hardware
  fini and no real power-off when the guest VM shuts down or reboots.
  Otherwise, under pass-through, populating and uploading the SCLK and
  MCLK DPM levels fails because pcie_speed_table.count is zero.

  Change-Id: I7cbc55c650867d00e19241ceea5d98f78b5ac3f5
  Signed-off-by: Yintian Tao
  ---
   drivers/gpu/drm/amd/powerplay/hwmgr/hardwaremanager.c | 4 +++-
   1 file changed, 3 insertions(+), 1 deletion(-)

  diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/hardwaremanager.c b/drivers/gpu/drm/amd/powerplay/hwmgr/hardwaremanager.c
  index 53207e7..6ef3c87 100644
  --- a/drivers/gpu/drm/amd/powerplay/hwmgr/hardwaremanager.c
  +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/hardwaremanager.c
  @@ -75,10 +75,12 @@ int phm_set_power_state(struct pp_hwmgr *hwmgr,
   int phm_enable_dynamic_state_management(struct pp_hwmgr *hwmgr)
   {
  +	struct amdgpu_device *adev = NULL;
   	int ret = -EINVAL;;

   	PHM_FUNC_CHECK(hwmgr);

  +	adev = hwmgr->adev;

It could be simplified as
    struct amdgpu_device *adev = hwmgr->adev;
or use that directly:
    amdgpu_passthrough(hwmgr->adev)

Apart from that, Acked-by: Junwei Zhang

  -	if (smum_is_dpm_running(hwmgr)) {
  +	if (smum_is_dpm_running(hwmgr) && !amdgpu_passthrough(adev)) {
   		pr_info("dpm has been enabled\n");
   		return 0;
   	}
Re: [PATCH (repost) 1/5] drm_dp_cec: check that aux has a transfer function
On Mon, 2018-08-20 at 22:47 +0200, Hans Verkuil wrote:
> On 08/20/2018 08:51 PM, Lyude Paul wrote:
> > On Fri, 2018-08-17 at 16:11 +0200, Hans Verkuil wrote:
> > > From: Hans Verkuil
> > >
> > > If aux->transfer == NULL, then just return without doing
> > > anything. In that case the function is likely called for
> > > a non-(e)DP connector.
> > >
> > > This never happened for the i915 driver, but the nouveau and amdgpu
> > > drivers need this check.
> >
> > Could you give a backtrace from where you're hitting this issue with
> > nouveau and amdgpu? It doesn't make a whole ton of sense to have
> > connectors registering DP aux busses if they aren't actually DP, that
> > should probably just be fixed...

Agh, nouveau and amdgpu making questionable decisions :s... but I suppose
that makes sense.

Reviewed-by: Lyude Paul

> The difference between the i915 driver and the nouveau (and amdgpu) driver
> is that in the i915 driver the drm_dp_cec_set_edid/unset_edid/irq functions
> are called from code that is exclusively for DisplayPort connectors.
>
> For nouveau/amdgpu they are called from code that is shared between
> DisplayPort and HDMI, so aux->transfer may be NULL.
>
> Rather than either testing for the connector type or for a non-NULL
> aux->transfer every time I call a drm_dp_cec_* function, it is better to
> just test for aux->transfer in the drm_dp_cec_* functions themselves.
> It's more robust.
>
> So there isn't a bug or anything like that, it's just so that these
> drm_dp_cec functions can handle a slightly different driver design safely.
>
> The registration and unregistration of the cec devices is always DP
> specific, and an attempt to register a cec device for a non-DP connector
> will now fail with a WARN_ON.
>
> Regards,
>
> Hans
>
> > > The alternative would be to add this check in those drivers before
> > > every drm_dp_cec call, but it makes sense to check it in the
> > > drm_dp_cec functions to prevent a kernel oops.
> > >
> > > Signed-off-by: Hans Verkuil
> > > ---
> > >  drivers/gpu/drm/drm_dp_cec.c | 14 ++++++++++++++
> > >  1 file changed, 14 insertions(+)
> > >
> > > diff --git a/drivers/gpu/drm/drm_dp_cec.c b/drivers/gpu/drm/drm_dp_cec.c
> > > index 988513346e9c..1407b13a8d5d 100644
> > > --- a/drivers/gpu/drm/drm_dp_cec.c
> > > +++ b/drivers/gpu/drm/drm_dp_cec.c
> > > @@ -238,6 +238,10 @@ void drm_dp_cec_irq(struct drm_dp_aux *aux)
> > >  	u8 cec_irq;
> > >  	int ret;
> > >
> > > +	/* No transfer function was set, so not a DP connector */
> > > +	if (!aux->transfer)
> > > +		return;
> > > +
> > >  	mutex_lock(&aux->cec.lock);
> > >  	if (!aux->cec.adap)
> > >  		goto unlock;
> > > @@ -293,6 +297,10 @@ void drm_dp_cec_set_edid(struct drm_dp_aux *aux, const struct edid *edid)
> > >  	unsigned int num_las = 1;
> > >  	u8 cap;
> > >
> > > +	/* No transfer function was set, so not a DP connector */
> > > +	if (!aux->transfer)
> > > +		return;
> > > +
> > >  #ifndef CONFIG_MEDIA_CEC_RC
> > >  	/*
> > >  	 * CEC_CAP_RC is part of CEC_CAP_DEFAULTS, but it is stripped by
> > > @@ -361,6 +369,10 @@ EXPORT_SYMBOL(drm_dp_cec_set_edid);
> > >   */
> > >  void drm_dp_cec_unset_edid(struct drm_dp_aux *aux)
> > >  {
> > > +	/* No transfer function was set, so not a DP connector */
> > > +	if (!aux->transfer)
> > > +		return;
> > > +
> > >  	cancel_delayed_work_sync(&aux->cec.unregister_work);
> > >
> > >  	mutex_lock(&aux->cec.lock);
> > > @@ -404,6 +416,8 @@ void drm_dp_cec_register_connector(struct drm_dp_aux *aux,
> > >  				   const char *name, struct device *parent)
> > >  {
> > >  	WARN_ON(aux->cec.adap);
> > > +	if (WARN_ON(!aux->transfer))
> > > +		return;
> > >  	aux->cec.name = name;
> > >  	aux->cec.parent = parent;
> > >  	INIT_DELAYED_WORK(&aux->cec.unregister_work,
Re: [PATCH] drm/amdgpu: fix amdgpu_amdkfd_remove_eviction_fence v2
Hi Christian,

Are you going to submit this change to amd-staging-drm-next?
amd-kfd-staging would pick it up from there automatically.

Regards,
  Felix

On 2018-08-15 01:57 PM, Felix Kuehling wrote:
> I applied your change to my local KFD staging branch and ran it through a
> pre-submission build/test (sorry, only accessible from inside AMD):
> http://git.amd.com:8080/#/c/167906/
>
> It passed, but checkpatch pointed out one issue:
> http://git.amd.com:8080/#/c/167906/1/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c@247
> (see inline)
>
> With that fixed, this change is Reviewed-by: Felix Kuehling
>
> Regards,
>   Felix
>
> On 2018-08-15 03:46 AM, Christian König wrote:
>> Fix quite a number of bugs here. Unfortunately only compile tested.
>>
>> v2: fix copy error
>>
>> Signed-off-by: Christian König
>> ---
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 103 ++-
>>  1 file changed, 46 insertions(+), 57 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> index fa38a960ce00..887663ede781 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> @@ -206,11 +206,9 @@ static int amdgpu_amdkfd_remove_eviction_fence(struct amdgpu_bo *bo,
>>  					struct amdgpu_amdkfd_fence ***ef_list,
>>  					unsigned int *ef_count)
>>  {
>> -	struct reservation_object_list *fobj;
>> -	struct reservation_object *resv;
>> -	unsigned int i = 0, j = 0, k = 0, shared_count;
>> -	unsigned int count = 0;
>> -	struct amdgpu_amdkfd_fence **fence_list;
>> +	struct reservation_object *resv = bo->tbo.resv;
>> +	struct reservation_object_list *old, *new;
>> +	unsigned int i, j, k;
>>
>>  	if (!ef && !ef_list)
>>  		return -EINVAL;
>> @@ -220,76 +218,67 @@ static int amdgpu_amdkfd_remove_eviction_fence(struct amdgpu_bo *bo,
>>  		*ef_count = 0;
>>  	}
>>
>> -	resv = bo->tbo.resv;
>> -	fobj = reservation_object_get_list(resv);
>> -
>> -	if (!fobj)
>> +	old = reservation_object_get_list(resv);
>> +	if (!old)
>>  		return 0;
>>
>> -	preempt_disable();
>> -	write_seqcount_begin(&resv->seq);
>> +	new = kmalloc(offsetof(typeof(*new), shared[old->shared_max]),
>> +		      GFP_KERNEL);
>> +	if (!new)
>> +		return -ENOMEM;
>>
>> -	/* Go through all the shared fences in the resevation object. If
>> -	 * ef is specified and it exists in the list, remove it and reduce the
>> -	 * count. If ef is not specified, then get the count of eviction fences
>> -	 * present.
>> +	/* Go through all the shared fences in the resevation object and sort
>> +	 * the interesting ones to the end of the list.
>>  	 */
>> -	shared_count = fobj->shared_count;
>> -	for (i = 0; i < shared_count; ++i) {
>> +	for (i = 0, j = old->shared_count, k = 0; i < old->shared_count; ++i) {
>>  		struct dma_fence *f;
>>
>> -		f = rcu_dereference_protected(fobj->shared[i],
>> +		f = rcu_dereference_protected(old->shared[i],
>>  					      reservation_object_held(resv));
>>
>> -		if (ef) {
>> -			if (f->context == ef->base.context) {
>> -				dma_fence_put(f);
>> -				fobj->shared_count--;
>> -			} else {
>> -				RCU_INIT_POINTER(fobj->shared[j++], f);
>> -			}
>> -		} else if (to_amdgpu_amdkfd_fence(f))
>> -			count++;
>> +		if ((ef && f->context == ef->base.context) ||
>> +		    (!ef && to_amdgpu_amdkfd_fence(f)))
>> +			RCU_INIT_POINTER(new->shared[--j], f);
>> +		else
>> +			RCU_INIT_POINTER(new->shared[k++], f);
>>  	}
>> -	write_seqcount_end(&resv->seq);
>> -	preempt_enable();
>> -
>> -	if (ef || !count)
>> -		return 0;
>> -
>> -	/* Alloc memory for count number of eviction fence pointers. Fill the
>> -	 * ef_list array and ef_count
>> -	 */
>> -	fence_list = kcalloc(count, sizeof(struct amdgpu_amdkfd_fence *),
>> -			     GFP_KERNEL);
>> -	if (!fence_list)
>> -		return -ENOMEM;
>> +	new->shared_max = old->shared_max;
>> +	new->shared_count = k;
>>
>> -	preempt_disable();
>> -	write_seqcount_begin(&resv->seq);
>> +	if (!ef) {
>> +		unsigned int count = old->shared_count - j;
>>
>> -	j = 0;
>> -	for (i = 0; i < shared_count; ++i) {
>> -		struct dma_fence *f;
>> -		struct amdgpu_amdkfd_fence *efence;
>> -
>> -		f = rcu_dereference_protected(fobj->shared[i],
>> -			reservation_object_held(resv));
>> +		/* Alloc memory for count number of eviction fence pointers. Fill
Re: [PATCH (repost) 3/5] drm_dp_mst_topology: fix broken drm_dp_sideband_parse_remote_dpcd_read()
On Mon, 2018-08-20 at 22:43 +0200, Hans Verkuil wrote:
> On 08/20/2018 08:59 PM, Lyude Paul wrote:
> > Reviewed-by: Lyude Paul
> >
> > We really need to add support for using this into the MST helpers. A
> > good way to test this would probably be to hook up an aux device to the
> > DP AUX adapters we create for each MST topology
>
> If you are interested, I have code for that in my MST test branch:
>
> https://git.linuxtv.org/hverkuil/media_tree.git/log/?h=cec-nv-amd-mst
>
> It's the "drm_dp_mst_topology: use correct AUX channel" patch.
>
> I don't have plans to post this patch since CEC for MST isn't working
> (still trying to figure out why not), but you are free to pick it up
> if you want.

Maybe someday but don't count on it yet! I've got a lot of stuff on my
plate atm :)

> Regards,
>
> Hans
>
> > On Fri, 2018-08-17 at 16:11 +0200, Hans Verkuil wrote:
> > > From: Hans Verkuil
> > >
> > > When parsing the reply of a DP_REMOTE_DPCD_READ DPCD command the
> > > result is wrong due to a missing idx increment.
> > >
> > > This was never noticed since DP_REMOTE_DPCD_READ is currently not
> > > used, but if you enable it, then it is all wrong.
> > >
> > > Signed-off-by: Hans Verkuil
> > > ---
> > >  drivers/gpu/drm/drm_dp_mst_topology.c | 1 +
> > >  1 file changed, 1 insertion(+)
> > >
> > > diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
> > > index 7780567aa669..5ff1d79b86c4 100644
> > > --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> > > +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> > > @@ -439,6 +439,7 @@ static bool drm_dp_sideband_parse_remote_dpcd_read(struct drm_dp_sideband_msg_rx
> > >  	if (idx > raw->curlen)
> > >  		goto fail_len;
> > >  	repmsg->u.remote_dpcd_read_ack.num_bytes = raw->msg[idx];
> > > +	idx++;
> > >  	if (idx > raw->curlen)
> > >  		goto fail_len;
Re: [PATCH (repost) 1/5] drm_dp_cec: check that aux has a transfer function
On 08/20/2018 08:51 PM, Lyude Paul wrote:
> On Fri, 2018-08-17 at 16:11 +0200, Hans Verkuil wrote:
>> From: Hans Verkuil
>>
>> If aux->transfer == NULL, then just return without doing
>> anything. In that case the function is likely called for
>> a non-(e)DP connector.
>>
>> This never happened for the i915 driver, but the nouveau and amdgpu
>> drivers need this check.
>
> Could you give a backtrace from where you're hitting this issue with
> nouveau and amdgpu? It doesn't make a whole ton of sense to have
> connectors registering DP aux busses if they aren't actually DP, that
> should probably just be fixed...

The difference between the i915 driver and the nouveau (and amdgpu) driver
is that in the i915 driver the drm_dp_cec_set_edid/unset_edid/irq functions
are called from code that is exclusively for DisplayPort connectors.

For nouveau/amdgpu they are called from code that is shared between
DisplayPort and HDMI, so aux->transfer may be NULL.

Rather than either testing for the connector type or for a non-NULL
aux->transfer every time I call a drm_dp_cec_* function, it is better to
just test for aux->transfer in the drm_dp_cec_* functions themselves.
It's more robust.

So there isn't a bug or anything like that, it's just so that these
drm_dp_cec functions can handle a slightly different driver design safely.

The registration and unregistration of the cec devices is always DP
specific, and an attempt to register a cec device for a non-DP connector
will now fail with a WARN_ON.

Regards,

Hans

>
>> The alternative would be to add this check in those drivers before
>> every drm_dp_cec call, but it makes sense to check it in the
>> drm_dp_cec functions to prevent a kernel oops.
>>
>> Signed-off-by: Hans Verkuil
>> ---
>>  drivers/gpu/drm/drm_dp_cec.c | 14 ++++++++++++++
>>  1 file changed, 14 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/drm_dp_cec.c b/drivers/gpu/drm/drm_dp_cec.c
>> index 988513346e9c..1407b13a8d5d 100644
>> --- a/drivers/gpu/drm/drm_dp_cec.c
>> +++ b/drivers/gpu/drm/drm_dp_cec.c
>> @@ -238,6 +238,10 @@ void drm_dp_cec_irq(struct drm_dp_aux *aux)
>>  	u8 cec_irq;
>>  	int ret;
>>
>> +	/* No transfer function was set, so not a DP connector */
>> +	if (!aux->transfer)
>> +		return;
>> +
>>  	mutex_lock(&aux->cec.lock);
>>  	if (!aux->cec.adap)
>>  		goto unlock;
>> @@ -293,6 +297,10 @@ void drm_dp_cec_set_edid(struct drm_dp_aux *aux, const struct edid *edid)
>>  	unsigned int num_las = 1;
>>  	u8 cap;
>>
>> +	/* No transfer function was set, so not a DP connector */
>> +	if (!aux->transfer)
>> +		return;
>> +
>>  #ifndef CONFIG_MEDIA_CEC_RC
>>  	/*
>>  	 * CEC_CAP_RC is part of CEC_CAP_DEFAULTS, but it is stripped by
>> @@ -361,6 +369,10 @@ EXPORT_SYMBOL(drm_dp_cec_set_edid);
>>   */
>>  void drm_dp_cec_unset_edid(struct drm_dp_aux *aux)
>>  {
>> +	/* No transfer function was set, so not a DP connector */
>> +	if (!aux->transfer)
>> +		return;
>> +
>>  	cancel_delayed_work_sync(&aux->cec.unregister_work);
>>
>>  	mutex_lock(&aux->cec.lock);
>> @@ -404,6 +416,8 @@ void drm_dp_cec_register_connector(struct drm_dp_aux *aux,
>>  				   const char *name, struct device *parent)
>>  {
>>  	WARN_ON(aux->cec.adap);
>> +	if (WARN_ON(!aux->transfer))
>> +		return;
>>  	aux->cec.name = name;
>>  	aux->cec.parent = parent;
>>  	INIT_DELAYED_WORK(&aux->cec.unregister_work,
Re: [PATCH (repost) 3/5] drm_dp_mst_topology: fix broken drm_dp_sideband_parse_remote_dpcd_read()
On 08/20/2018 08:59 PM, Lyude Paul wrote:
> Reviewed-by: Lyude Paul
>
> We really need to add support for using this into the MST helpers. A good
> way to test this would probably be to hook up an aux device to the DP AUX
> adapters we create for each MST topology

If you are interested, I have code for that in my MST test branch:

https://git.linuxtv.org/hverkuil/media_tree.git/log/?h=cec-nv-amd-mst

It's the "drm_dp_mst_topology: use correct AUX channel" patch.

I don't have plans to post this patch since CEC for MST isn't working
(still trying to figure out why not), but you are free to pick it up
if you want.

Regards,

Hans

> On Fri, 2018-08-17 at 16:11 +0200, Hans Verkuil wrote:
>> From: Hans Verkuil
>>
>> When parsing the reply of a DP_REMOTE_DPCD_READ DPCD command the
>> result is wrong due to a missing idx increment.
>>
>> This was never noticed since DP_REMOTE_DPCD_READ is currently not
>> used, but if you enable it, then it is all wrong.
>>
>> Signed-off-by: Hans Verkuil
>> ---
>>  drivers/gpu/drm/drm_dp_mst_topology.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
>> index 7780567aa669..5ff1d79b86c4 100644
>> --- a/drivers/gpu/drm/drm_dp_mst_topology.c
>> +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
>> @@ -439,6 +439,7 @@ static bool drm_dp_sideband_parse_remote_dpcd_read(struct drm_dp_sideband_msg_rx
>>  	if (idx > raw->curlen)
>>  		goto fail_len;
>>  	repmsg->u.remote_dpcd_read_ack.num_bytes = raw->msg[idx];
>> +	idx++;
>>  	if (idx > raw->curlen)
>>  		goto fail_len;
Re: [PATCH (repost) 3/5] drm_dp_mst_topology: fix broken drm_dp_sideband_parse_remote_dpcd_read()
Reviewed-by: Lyude Paul

We really need to add support for using this into the MST helpers. A good
way to test this would probably be to hook up an aux device to the DP AUX
adapters we create for each MST topology

On Fri, 2018-08-17 at 16:11 +0200, Hans Verkuil wrote:
> From: Hans Verkuil
>
> When parsing the reply of a DP_REMOTE_DPCD_READ DPCD command the
> result is wrong due to a missing idx increment.
>
> This was never noticed since DP_REMOTE_DPCD_READ is currently not
> used, but if you enable it, then it is all wrong.
>
> Signed-off-by: Hans Verkuil
> ---
>  drivers/gpu/drm/drm_dp_mst_topology.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
> index 7780567aa669..5ff1d79b86c4 100644
> --- a/drivers/gpu/drm/drm_dp_mst_topology.c
> +++ b/drivers/gpu/drm/drm_dp_mst_topology.c
> @@ -439,6 +439,7 @@ static bool drm_dp_sideband_parse_remote_dpcd_read(struct drm_dp_sideband_msg_rx
>  	if (idx > raw->curlen)
>  		goto fail_len;
>  	repmsg->u.remote_dpcd_read_ack.num_bytes = raw->msg[idx];
> +	idx++;
>  	if (idx > raw->curlen)
>  		goto fail_len;
Re: [PATCH (repost) 4/5] drm/nouveau: add DisplayPort CEC-Tunneling-over-AUX support
On Fri, 2018-08-17 at 16:11 +0200, Hans Verkuil wrote:
> From: Hans Verkuil
>
> Add DisplayPort CEC-Tunneling-over-AUX support to nouveau.
>
> Signed-off-by: Hans Verkuil
> ---
>  drivers/gpu/drm/nouveau/nouveau_connector.c | 17 +++++++++++++++--
>  1 file changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_connector.c b/drivers/gpu/drm/nouveau/nouveau_connector.c
> index 51932c72334e..eb4f766b5958 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_connector.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_connector.c
> @@ -400,8 +400,10 @@ nouveau_connector_destroy(struct drm_connector *connector)
>  	kfree(nv_connector->edid);
>  	drm_connector_unregister(connector);
>  	drm_connector_cleanup(connector);
> -	if (nv_connector->aux.transfer)
> +	if (nv_connector->aux.transfer) {

With the comments I made about this on patch 1 addressed/resolved:

Reviewed-by: Lyude Paul

> +		drm_dp_cec_unregister_connector(&nv_connector->aux);
>  		drm_dp_aux_unregister(&nv_connector->aux);
> +	}
>  	kfree(connector);
>  }
>
> @@ -608,6 +610,7 @@ nouveau_connector_detect(struct drm_connector *connector, bool force)
>
>  		nouveau_connector_set_encoder(connector, nv_encoder);
>  		conn_status = connector_status_connected;
> +		drm_dp_cec_set_edid(&nv_connector->aux, nv_connector->edid);
>  		goto out;
>  	}
>
> @@ -1108,11 +1111,14 @@ nouveau_connector_hotplug(struct nvif_notify *notify)
>
>  	if (rep->mask & NVIF_NOTIFY_CONN_V0_IRQ) {
>  		NV_DEBUG(drm, "service %s\n", name);
> +		drm_dp_cec_irq(&nv_connector->aux);
>  		if ((nv_encoder = find_encoder(connector, DCB_OUTPUT_DP)))
>  			nv50_mstm_service(nv_encoder->dp.mstm);
>  	} else {
>  		bool plugged = (rep->mask != NVIF_NOTIFY_CONN_V0_UNPLUG);
>
> +		if (!plugged)
> +			drm_dp_cec_unset_edid(&nv_connector->aux);
>  		NV_DEBUG(drm, "%splugged %s\n", plugged ? "" : "un", name);
>  		if ((nv_encoder = find_encoder(connector, DCB_OUTPUT_DP))) {
>  			if (!plugged)
> @@ -1302,7 +1308,6 @@ nouveau_connector_create(struct drm_device *dev, int index)
>  			kfree(nv_connector);
>  			return ERR_PTR(ret);
>  		}
> -
>  		funcs = &nouveau_connector_funcs;
>  		break;
>  	default:
> @@ -1356,6 +1361,14 @@ nouveau_connector_create(struct drm_device *dev, int index)
>  		break;
>  	}
>
> +	switch (type) {
> +	case DRM_MODE_CONNECTOR_DisplayPort:
> +	case DRM_MODE_CONNECTOR_eDP:
> +		drm_dp_cec_register_connector(&nv_connector->aux,
> +					      connector->name, dev->dev);
> +		break;
> +	}
> +
>  	ret = nvif_notify_init(&disp->disp.object, nouveau_connector_hotplug,
>  			       true, NV04_DISP_NTFY_CONN,
>  			       &(struct nvif_notify_conn_req_v0) {
Re: [Nouveau] [PATCH (repost) 2/5] drm_dp_cec: add note about good MegaChips 2900 CEC support
Reviewed-by: Lyude Paul

On Fri, 2018-08-17 at 16:11 +0200, Hans Verkuil wrote:
> From: Hans Verkuil
>
> A big problem with DP CEC-Tunneling-over-AUX is that it is tricky
> to find adapters with a chipset that supports this AND where the
> manufacturer actually connected the HDMI CEC line to the chipset.
>
> Add a mention of the MegaChips 2900 chipset which seems to support
> this feature well.
>
> Signed-off-by: Hans Verkuil
> ---
>  drivers/gpu/drm/drm_dp_cec.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/drm_dp_cec.c b/drivers/gpu/drm/drm_dp_cec.c
> index 1407b13a8d5d..8a718f85079a 100644
> --- a/drivers/gpu/drm/drm_dp_cec.c
> +++ b/drivers/gpu/drm/drm_dp_cec.c
> @@ -16,7 +16,9 @@
>   * here. Quite a few active (mini-)DP-to-HDMI or USB-C-to-HDMI adapters
>   * have a converter chip that supports CEC-Tunneling-over-AUX (usually the
>   * Parade PS176), but they do not wire up the CEC pin, thus making CEC
> - * useless.
> + * useless. Note that MegaChips 2900-based adapters appear to have good
> + * support for CEC tunneling. Those adapters that I have tested using
> + * this chipset all have the CEC line connected.
>   *
>   * Sadly there is no way for this driver to know this. What happens is
>   * that a /dev/cecX device is created that is isolated and unable to see
Re: [PATCH (repost) 1/5] drm_dp_cec: check that aux has a transfer function
On Fri, 2018-08-17 at 16:11 +0200, Hans Verkuil wrote:
> From: Hans Verkuil
>
> If aux->transfer == NULL, then just return without doing
> anything. In that case the function is likely called for
> a non-(e)DP connector.
>
> This never happened for the i915 driver, but the nouveau and amdgpu
> drivers need this check.

Could you give a backtrace from where you're hitting this issue with
nouveau and amdgpu? It doesn't make a whole ton of sense to have
connectors registering DP aux busses if they aren't actually DP, that
should probably just be fixed...

> The alternative would be to add this check in those drivers before
> every drm_dp_cec call, but it makes sense to check it in the
> drm_dp_cec functions to prevent a kernel oops.
>
> Signed-off-by: Hans Verkuil
> ---
>  drivers/gpu/drm/drm_dp_cec.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
>
> diff --git a/drivers/gpu/drm/drm_dp_cec.c b/drivers/gpu/drm/drm_dp_cec.c
> index 988513346e9c..1407b13a8d5d 100644
> --- a/drivers/gpu/drm/drm_dp_cec.c
> +++ b/drivers/gpu/drm/drm_dp_cec.c
> @@ -238,6 +238,10 @@ void drm_dp_cec_irq(struct drm_dp_aux *aux)
>  	u8 cec_irq;
>  	int ret;
>
> +	/* No transfer function was set, so not a DP connector */
> +	if (!aux->transfer)
> +		return;
> +
>  	mutex_lock(&aux->cec.lock);
>  	if (!aux->cec.adap)
>  		goto unlock;
> @@ -293,6 +297,10 @@ void drm_dp_cec_set_edid(struct drm_dp_aux *aux, const struct edid *edid)
>  	unsigned int num_las = 1;
>  	u8 cap;
>
> +	/* No transfer function was set, so not a DP connector */
> +	if (!aux->transfer)
> +		return;
> +
> #ifndef CONFIG_MEDIA_CEC_RC
>  	/*
>  	 * CEC_CAP_RC is part of CEC_CAP_DEFAULTS, but it is stripped by
> @@ -361,6 +369,10 @@ EXPORT_SYMBOL(drm_dp_cec_set_edid);
>   */
>  void drm_dp_cec_unset_edid(struct drm_dp_aux *aux)
>  {
> +	/* No transfer function was set, so not a DP connector */
> +	if (!aux->transfer)
> +		return;
> +
>  	cancel_delayed_work_sync(&aux->cec.unregister_work);
>
>  	mutex_lock(&aux->cec.lock);
> @@ -404,6 +416,8 @@ void drm_dp_cec_register_connector(struct drm_dp_aux *aux,
>  				   const char *name, struct device *parent)
>  {
>  	WARN_ON(aux->cec.adap);
> +	if (WARN_ON(!aux->transfer))
> +		return;
>  	aux->cec.name = name;
>  	aux->cec.parent = parent;
>  	INIT_DELAYED_WORK(&aux->cec.unregister_work,
Re: gitlab migration
On Mon, 2018-06-11 at 12:33 +0200, Michel Dänzer wrote:
> As the maintainer of xf86-video-amdgpu and -ati, I'm fine with migrating
> these to GitLab for Git and patch review.
>
> However, I'm not sure what to do about bugs/issues. My first thought was
> to allow creating new issues in GitLab and disable creating new reports
> in Bugzilla, but not to migrate existing reports from Bugzilla. However,
> it still happens fairly often that a report is initially filed against
> the Xorg drivers, even though it's actually an issue in the X server,
> Mesa or the kernel (old habits die hard...). Therefore, I'm inclined to
> stick to Bugzilla until at least the X server and Mesa have moved to
> GitLab for issue tracking, to hopefully allow moving such misfiled issues.

I've hacked up the migration script to allow moving individual issues by
bug number instead of matching a whole product/component [1]. This has
already been useful to sort out some of the junkdrawer components like
App/Other. Running the script does require someone with supercow powers
on both sides, but I'm happy to do so as needed.

With that in mind, I think it may be time to enable issues and merge
requests for xserver, and disable the remaining (server) bz components
for new issues. Any objections?

[1] https://gitlab.freedesktop.org/freedesktop/bztogl/merge_requests/1

- ajax
Re: Chrome crash on 4.18.1
Thanks! Em seg, 20 de ago de 2018 12:17, Michel Dänzer escreveu: > On 2018-08-18 12:39 AM, Bráulio Bhavamitra wrote: > > Chrome freezed and had to be killed, error on dmesg below. > > > > [...] > > > > [ 5898.208262] [ cut here ] > > [ 5898.208272] kernel BUG at drivers/dma-buf/reservation.c:234! > > [ 5898.208305] invalid opcode: [#1] PREEMPT SMP NOPTI > > [ 5898.208314] CPU: 6 PID: 2988 Comm: chromium Tainted: G C O > > 4.18.1-arch1-1-ARCH #1 > > [ 5898.208317] Hardware name: HP HP ENVY x360 Convertible 15m-bq1xx/83C6, > > BIOS F.17 03/29/2018 > > [ 5898.208334] RIP: 0010:reservation_object_add_shared_fence+0x31b/0x370 > > [ 5898.208336] Code: ff 4c 89 7d 18 c7 45 10 01 00 00 00 e9 e2 fd ff ff > e8 > > 46 8c a8 ff e9 0c fe ff ff b8 18 00 00 00 ba 01 00 00 00 e9 c0 fd ff ff > > <0f> 0b 8d 50 01 48 8d 04 c5 18 00 00 00 48 01 d8 45 31 f6 4c 89 38 > > [ 5898.208373] RSP: 0018:a1b1ca9abb38 EFLAGS: 00010246 > > [ 5898.208377] RAX: 0004 RBX: 9236f1c71580 RCX: > > 0001 > > [ 5898.208380] RDX: 0001 RSI: 92371223da60 RDI: > > 9236b196ba18 > > [ 5898.208382] RBP: R08: 9236bc7c3c80 R09: > > 92374aba2378 > > [ 5898.208385] R10: 9237374cded8 R11: 9237374cdec8 R12: > > 9236b196ba18 > > [ 5898.208387] R13: 9236b1ac0130 R14: 9236bfe3c000 R15: > > 92371223da60 > > [ 5898.208390] FS: 7fffe08c2180() GS:92374ab8() > > knlGS: > > [ 5898.208393] CS: 0010 DS: ES: CR0: 80050033 > > [ 5898.208395] CR2: 265eadacf000 CR3: 0004794e4000 CR4: > > 003406e0 > > [ 5898.208398] Call Trace: > > [ 5898.208519] amdgpu_vm_update_directories+0x2a8/0x3e0 [amdgpu] > > This should be fixed with the changes merged by Linus for the 4.19 > release. The relevant fixes were marked for backporting to stable, so > they should appear in future 4.18.y releases as well. > > > -- > Earthling Michel Dänzer | http://www.amd.com > Libre software enthusiast | Mesa and X developer > ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: Chrome crash on 4.18.1
On 2018-08-18 12:39 AM, Bráulio Bhavamitra wrote: > Chrome freezed and had to be killed, error on dmesg below. > > [...] > > [ 5898.208262] [ cut here ] > [ 5898.208272] kernel BUG at drivers/dma-buf/reservation.c:234! > [ 5898.208305] invalid opcode: [#1] PREEMPT SMP NOPTI > [ 5898.208314] CPU: 6 PID: 2988 Comm: chromium Tainted: G C O > 4.18.1-arch1-1-ARCH #1 > [ 5898.208317] Hardware name: HP HP ENVY x360 Convertible 15m-bq1xx/83C6, > BIOS F.17 03/29/2018 > [ 5898.208334] RIP: 0010:reservation_object_add_shared_fence+0x31b/0x370 > [ 5898.208336] Code: ff 4c 89 7d 18 c7 45 10 01 00 00 00 e9 e2 fd ff ff e8 > 46 8c a8 ff e9 0c fe ff ff b8 18 00 00 00 ba 01 00 00 00 e9 c0 fd ff ff > <0f> 0b 8d 50 01 48 8d 04 c5 18 00 00 00 48 01 d8 45 31 f6 4c 89 38 > [ 5898.208373] RSP: 0018:a1b1ca9abb38 EFLAGS: 00010246 > [ 5898.208377] RAX: 0004 RBX: 9236f1c71580 RCX: > 0001 > [ 5898.208380] RDX: 0001 RSI: 92371223da60 RDI: > 9236b196ba18 > [ 5898.208382] RBP: R08: 9236bc7c3c80 R09: > 92374aba2378 > [ 5898.208385] R10: 9237374cded8 R11: 9237374cdec8 R12: > 9236b196ba18 > [ 5898.208387] R13: 9236b1ac0130 R14: 9236bfe3c000 R15: > 92371223da60 > [ 5898.208390] FS: 7fffe08c2180() GS:92374ab8() > knlGS: > [ 5898.208393] CS: 0010 DS: ES: CR0: 80050033 > [ 5898.208395] CR2: 265eadacf000 CR3: 0004794e4000 CR4: > 003406e0 > [ 5898.208398] Call Trace: > [ 5898.208519] amdgpu_vm_update_directories+0x2a8/0x3e0 [amdgpu] This should be fixed with the changes merged by Linus for the 4.19 release. The relevant fixes were marked for backporting to stable, so they should appear in future 4.18.y releases as well. -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH v2 5/5] drm/amdgpu:add VCN booting with firmware loaded by PSP
On 2018-08-18 01:25 AM, Felix Kuehling wrote: > ROCm CQE is seeing what looks like hangs during amdgpu initialization on > Raven and Vega20. Amdgpu basically stops printing messages while trying > to load VCN firmware. It never completes initialization, but there is no > obvious error message. What does "never" mean exactly? :) I.e. how long have you waited? If it does continue after a few minutes, it sounds like the firmware isn't available in the same place where the driver is loaded from, i.e. either the driver is loaded from initrd but the firmware isn't available there, or the driver is built into the kernel, but the firmware isn't built into the kernel. -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH] drm/amdgpu: Fix compile warning
Reviewed-by: Alex Deucher From: amd-gfx on behalf of Rex Zhu Sent: Monday, August 20, 2018 8:26:07 AM To: amd-gfx@lists.freedesktop.org Cc: Zhu, Rex Subject: [PATCH] drm/amdgpu: Fix compile warning In function ‘gfx_v9_0_check_fw_write_wait’: warning: enumeration value ‘CHIP_TAHITI’ not handled in switch [-Wswitch] Always add default case in case there is no match Signed-off-by: Rex Zhu --- drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c index 6d75927..5990e5dc 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c @@ -532,6 +532,8 @@ static void gfx_v9_0_check_fw_write_wait(struct amdgpu_device *adev) (adev->gfx.mec_feature_version >= 42)) adev->gfx.mec_fw_write_wait = true; break; + default: + break; } } -- 1.9.1 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
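The warning Rex is silencing comes from `-Wswitch`, which fires on a `switch` over an enum when some enumerators are unhandled and there is no `default` label. A small userspace sketch of the same fix (the enum and function here are illustrative, not the real `amd_asic_type` or driver code):

```c
#include <assert.h>

enum chip { CHIP_TAHITI, CHIP_VEGA10, CHIP_RAVEN };

/* Without the default case, compiling with -Wswitch warns that
 * CHIP_TAHITI is not handled; adding default silences the warning
 * while leaving the behavior for unmatched values unchanged. */
static int fw_write_wait(enum chip c)
{
    switch (c) {
    case CHIP_VEGA10:
    case CHIP_RAVEN:
        return 1;
    default:
        return 0;
    }
}
```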
Re: [PATCH] drm/powerplay: enable dpm under pass-through
Acked-by: Alex Deucher From: amd-gfx on behalf of Yintian Tao Sent: Sunday, August 19, 2018 11:39:53 PM To: amd-gfx@lists.freedesktop.org Cc: Tao, Yintian Subject: [PATCH] drm/powerplay: enable dpm under pass-through Repeat enable dpm under pass-through because there is no actually hardware-fini and real power-off when guest vm shutdown or reboot. Otherwise, under pass-through it will be failed to populate populate and upload SCLK MCLK DPM levels due to zero of pcie_speed_table.count. Change-Id: I7cbc55c650867d00e19241ceea5d98f78b5ac3f5 Signed-off-by: Yintian Tao --- drivers/gpu/drm/amd/powerplay/hwmgr/hardwaremanager.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/hardwaremanager.c b/drivers/gpu/drm/amd/powerplay/hwmgr/hardwaremanager.c index 53207e7..6ef3c87 100644 --- a/drivers/gpu/drm/amd/powerplay/hwmgr/hardwaremanager.c +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/hardwaremanager.c @@ -75,10 +75,12 @@ int phm_set_power_state(struct pp_hwmgr *hwmgr, int phm_enable_dynamic_state_management(struct pp_hwmgr *hwmgr) { + struct amdgpu_device *adev = NULL; int ret = -EINVAL;; PHM_FUNC_CHECK(hwmgr); + adev = hwmgr->adev; - if (smum_is_dpm_running(hwmgr)) { + if (smum_is_dpm_running(hwmgr) && !amdgpu_passthrough(adev)) { pr_info("dpm has been enabled\n"); return 0; } -- 2.7.4 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH v2 5/5] drm/amdgpu:add VCN booting with firmware loaded by PSP
Hi Felix, We did test on both China team and Makham team. Also Embedded team did the test also on release 18.20 for Raven. Please let ROCm CQE team issue a JIRA ticket and the detail reproduce step. Thanks & Best Regards! James Zhu From: amd-gfx on behalf of Felix Kuehling Sent: Friday, August 17, 2018 7:25:53 PM To: James Zhu; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander; Gao, Likun; Zhu, James; Huang, Ray Subject: Re: [PATCH v2 5/5] drm/amdgpu:add VCN booting with firmware loaded by PSP ROCm CQE is seeing what looks like hangs during amdgpu initialization on Raven and Vega20. Amdgpu basically stops printing messages while trying to load VCN firmware. It never completes initialization, but there is no obvious error message. These are the last messages from amdgpu in the log: [1.282661] [drm] Found VCN firmware Version: 1.24 Family ID: 18 [1.282664] [drm] PSP loading VCN firmware [1.303164] [drm] reserve 0x40 from 0xf400e0 for PSP TMR SIZE Any applications trying to use /dev/dri/* hang with a backtrace like below. Was this change expected to affect Raven and Vega20? Has it been tested before submitting? Do we need updated VCN firmware for it to work? Thanks, Felix [ 363.352985] INFO: task gpu-manager:937 blocked for more than 120 seconds. [ 363.352995] Not tainted 4.18.0-rc1-kfd-compute-roc-master-8912 #1 [ 363.352999] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 363.353004] gpu-manager D0 937 1 0x [ 363.353008] Call Trace: [ 363.353018] ? __schedule+0x3d9/0x8b0 [ 363.353023] schedule+0x32/0x80 [ 363.353026] schedule_preempt_disabled+0xa/0x10 [ 363.353028] __mutex_lock.isra.4+0x2ae/0x4e0 [ 363.353031] ? _cond_resched+0x16/0x40 [ 363.353048] ? drm_stub_open+0x2e/0x100 [drm] [ 363.353063] drm_stub_open+0x2e/0x100 [drm] [ 363.353069] chrdev_open+0xbe/0x1a0 [ 363.353072] ? cdev_put+0x20/0x20 [ 363.353075] do_dentry_open+0x1e2/0x300 [ 363.353078] path_openat+0x2b4/0x14b0 [ 363.353082] ? vsnprintf+0x230/0x4c0 [ 363.353086] ? 
__alloc_pages_nodemask+0x100/0x290 [ 363.353088] do_filp_open+0x99/0x110 [ 363.353092] ? generic_update_time+0x6a/0xc0 [ 363.353094] ? touch_atime+0xc1/0xd0 [ 363.353096] ? _cond_resched+0x16/0x40 [ 363.353100] ? do_sys_open+0x126/0x210 [ 363.353102] do_sys_open+0x126/0x210 [ 363.353106] do_syscall_64+0x4f/0x100 [ 363.353110] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 363.353113] RIP: 0033:0x7f988f340040 [ 363.353113] Code: Bad RIP value. [ 363.353120] RSP: 002b:7ffecdefe618 EFLAGS: 0246 ORIG_RAX: 0002 [ 363.353123] RAX: ffda RBX: 02337cd0 RCX: 7f988f340040 [ 363.353124] RDX: 7ffecdefe67e RSI: 0002 RDI: 7ffecdefe670 [ 363.353125] RBP: 7ffecdefe6a0 R08: R09: 000e [ 363.353126] R10: 069d R11: 0246 R12: 00401b40 [ 363.353127] R13: 7ffecdefe910 R14: R15: On 2018-08-09 12:31 PM, James Zhu wrote: > From: Likun Gao > > Setup psp firmware loading for VCN, and make VCN block > booting from tmr mac address. > > Signed-off-by: James Zhu > Reviewed-by: Alex Deucher > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 17 +-- > drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c | 38 > ++--- > 2 files changed, 40 insertions(+), 15 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c > index 878f62c..77c192a 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c > @@ -111,9 +111,10 @@ int amdgpu_vcn_sw_init(struct amdgpu_device *adev) >version_major, version_minor, family_id); >} > > - bo_size = AMDGPU_GPU_PAGE_ALIGN(le32_to_cpu(hdr->ucode_size_bytes) + 8) > - + AMDGPU_VCN_STACK_SIZE + AMDGPU_VCN_HEAP_SIZE > + bo_size = AMDGPU_VCN_STACK_SIZE + AMDGPU_VCN_HEAP_SIZE > + AMDGPU_VCN_SESSION_SIZE * 40; > + if (adev->firmware.load_type != AMDGPU_FW_LOAD_PSP) > + bo_size += > AMDGPU_GPU_PAGE_ALIGN(le32_to_cpu(hdr->ucode_size_bytes) + 8); >r = amdgpu_bo_create_kernel(adev, bo_size, PAGE_SIZE, >AMDGPU_GEM_DOMAIN_VRAM, >vcn.vcpu_bo, >>vcn.gpu_addr, >vcn.cpu_addr); > @@ -189,11 +190,13 @@ int 
amdgpu_vcn_resume(struct amdgpu_device *adev) >unsigned offset; > >hdr = (const struct common_firmware_header > *)adev->vcn.fw->data; > - offset = le32_to_cpu(hdr->ucode_array_offset_bytes); > - memcpy_toio(adev->vcn.cpu_addr, adev->vcn.fw->data + offset, > - le32_to_cpu(hdr->ucode_size_bytes)); > - size -= le32_to_cpu(hdr->ucode_size_bytes); > - ptr +=
Re: [PATCH v4 4/5] drm/amdgpu: use bulk moves for efficient VM LRU handling (v4)
Am 20.08.2018 um 08:05 schrieb Huang Rui: On Fri, Aug 17, 2018 at 06:38:16PM +0800, Koenig, Christian wrote: Am 17.08.2018 um 12:08 schrieb Huang Rui: I continue to work for bulk moving that based on the proposal by Christian. Background: amdgpu driver will move all PD/PT and PerVM BOs into idle list. Then move all of them on the end of LRU list one by one. Thus, that cause so many BOs moved to the end of the LRU, and impact performance seriously. Then Christian provided a workaround to not move PD/PT BOs on LRU with below patch: "drm/amdgpu: band aid validating VM PTs" Commit 0bbf32026cf5ba41e9922b30e26e1bed1ecd38ae However, the final solution should bulk move all PD/PT and PerVM BOs on the LRU instead of one by one. Whenever amdgpu_vm_validate_pt_bos() is called and we have BOs which need to be validated we move all BOs together to the end of the LRU without dropping the lock for the LRU. While doing so we note the beginning and end of this block in the LRU list. Now when amdgpu_vm_validate_pt_bos() is called and we don't have anything to do, we don't move every BO one by one, but instead cut the LRU list into pieces so that we bulk move everything to the end in just one operation. Test data: +--+-+---+---+ | |The Talos|Clpeak(OCL)|BusSpeedReadback(OCL) | | |Principle(Vulkan)| | | ++ | | | |0.319 ms(1k) 0.314 ms(2K) 0.308 ms(4K) | | Original | 147.7 FPS | 76.86 us |0.307 ms(8K) 0.310 ms(16K) | ++ | Orignial + WA| | |0.254 ms(1K) 0.241 ms(2K) | |(don't move | 162.1 FPS | 42.15 us |0.230 ms(4K) 0.223 ms(8K) 0.204 ms(16K)| |PT BOs on LRU)| | | | ++ | Bulk move| 163.1 FPS | 40.52 us |0.244 ms(1K) 0.252 ms(2K) 0.213 ms(4K) | | | | |0.214 ms(8K) 0.225 ms(16K) | +--+-+---+---+ After test them with above three benchmarks include vulkan and opencl. We can see the visible improvement than original, and even better than original with workaround. v2: move all BOs include idle, relocated, and moved list to the end of LRU and put them together. 
v3: remove unused parameter and use list_for_each_entry instead of the one with save entry. v4: move the amdgpu_vm_move_to_lru_tail after command submission, at that time, all bo will be back on idle list. Signed-off-by: Christian König Signed-off-by: Huang Rui Tested-by: Mike Lothian Tested-by: Dieter Nützel Acked-by: Chunming Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 11 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 71 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 11 +- 3 files changed, 75 insertions(+), 18 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c index 502b94f..9fbdf02 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c @@ -1260,6 +1260,16 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p, return 0; } +static void amdgpu_cs_vm_move_on_lru(struct amdgpu_device *adev, +struct amdgpu_cs_parser *p) +{ + struct amdgpu_fpriv *fpriv = p->filp->driver_priv; + struct amdgpu_vm *vm = >vm; + + if (vm->validated) That check belongs inside amdgpu_vm_move_to_lru_tail(). + amdgpu_vm_move_to_lru_tail(adev, vm); +} + int amdgpu_cs_ioctl(struct drm_device *dev, void *data, struct drm_file *filp) { struct amdgpu_device *adev = dev->dev_private; @@ -1310,6 +1320,7 @@ int amdgpu_cs_ioctl(struct drm_device *dev, void *data, struct drm_file *filp) r = amdgpu_cs_submit(, cs); + amdgpu_cs_vm_move_on_lru(adev, ); out: amdgpu_cs_parser_fini(, r, reserved_buffers); return r; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index 9c84770..037cfbc 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -268,6 +268,53 @@ void amdgpu_vm_get_pd_bo(struct amdgpu_vm *vm, } /** + * amdgpu_vm_move_to_lru_tail_by_list - move one list of BOs to end of LRU + * + * @vm: vm providing the BOs + * @list: the list that stored BOs + * + * Move one list of BOs to the end of LRU and update the positions. 
+ */ +static void +amdgpu_vm_move_to_lru_tail_by_list(struct amdgpu_vm
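The bulk-move idea in this thread — note the first and last element of a contiguous block on the LRU, then later cut the whole block out and splice it to the tail in one operation instead of moving BOs one by one — can be sketched with an ordinary circular doubly linked list. This illustrates only the list surgery; all names are invented and none of this is the actual TTM/amdgpu code.

```c
#include <assert.h>

/* Minimal circular doubly linked list, in the style of the
 * kernel's struct list_head. */
struct node {
    struct node *prev, *next;
    int id;
};

static void list_init(struct node *head)
{
    head->prev = head->next = head;
}

static void list_add_tail(struct node *head, struct node *n)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Cut the contiguous block [first..last] out of the list and
 * splice it onto the tail in O(1), instead of walking the block
 * and moving each element individually. */
static void bulk_move_tail(struct node *head, struct node *first,
                           struct node *last)
{
    /* unlink the block */
    first->prev->next = last->next;
    last->next->prev = first->prev;
    /* splice it before head (i.e. at the tail) */
    first->prev = head->prev;
    last->next = head;
    head->prev->next = first;
    head->prev = last;
}
```

This is the same constant-time trick that lets `amdgpu_vm_validate_pt_bos()` avoid touching every BO on the LRU when nothing needs validation.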
Re: [PATCH] amdgpu: fix multi-process hang issue
On 2018-08-20 05:35, Emily Deng wrote:
From: Monk Liu

SWDEV-146499: hang during multi vulkan process testing

cause:
the second frame's PREAMBLE_IB have clear-state and LOAD actions,
those actions ruin the pipeline that is still doing process in the
previous frame's work-load IB.

fix:
need insert pipeline sync if have context switch for SRIOV (because
only SRIOV will report PREEMPTION flag to UMD)

Change-Id: If676da7a9b0147114cd76d19b6035ed8033de449
Signed-off-by: Monk Liu
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
index 5c22cfd..66efa85 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
@@ -95,6 +95,12 @@ void amdgpu_ib_free(struct amdgpu_device *adev, struct amdgpu_ib *ib,
 	amdgpu_sa_bo_free(adev, &ib->sa_bo, f);
 }

+static bool amdgpu_ib_has_preamble(struct amdgpu_ring *ring, bool ctx_switch) {
+	return (amdgpu_sriov_vf(ring->adev) &&
+		ring->funcs->type == AMDGPU_RING_TYPE_GFX &&
+		ctx_switch);
+}
+

Well NAK, please merge that into amdgpu_ib_schedule() and drop the extra
check for GFX, that will apply to compute queues as well.

 /**
  * amdgpu_ib_schedule - schedule an IB (Indirect Buffer) on the ring
  *
@@ -123,7 +129,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
 	struct amdgpu_device *adev = ring->adev;
 	struct amdgpu_ib *ib = &ibs[0];
 	struct dma_fence *tmp = NULL;
-	bool skip_preamble, need_ctx_switch;
+	bool need_ctx_switch;
 	unsigned patch_offset = ~0;
 	struct amdgpu_vm *vm;
 	uint64_t fence_ctx;
@@ -156,6 +162,8 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
 		return -EINVAL;
 	}

+	need_ctx_switch = ring->current_ctx != fence_ctx;
+
 	alloc_size = ring->funcs->emit_frame_size + num_ibs *
 		ring->funcs->emit_ib_size;

@@ -167,6 +175,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
 	if (ring->funcs->emit_pipeline_sync && job &&
 	    ((tmp = amdgpu_sync_get_fence(&job->sched_sync, NULL)) ||
+	     amdgpu_ib_has_preamble(ring, need_ctx_switch) ||
 	     amdgpu_vm_need_pipeline_sync(ring, job))) {
 		need_pipe_sync = true;
@@ -200,8 +209,6 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
 		amdgpu_asic_flush_hdp(adev, ring);
 	}

-	skip_preamble = ring->current_ctx == fence_ctx;
-	need_ctx_switch = ring->current_ctx != fence_ctx;
 	if (job && ring->funcs->emit_cntxcntl) {
 		if (need_ctx_switch)
 			status |= AMDGPU_HAVE_CTX_SWITCH;
@@ -215,7 +222,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
 	/* drop preamble IBs if we don't have a context switch */
 	if ((ib->flags & AMDGPU_IB_FLAG_PREAMBLE) &&
-	    skip_preamble &&
+	    !need_ctx_switch &&
 	    !(status & AMDGPU_PREAMBLE_IB_PRESENT_FIRST) &&
 	    !amdgpu_sriov_vf(adev)) /* for SRIOV preemption, Preamble CE ib must be inserted anyway */

Ok, please clean that up more carefully. All prerequisites which are not
IB dependent should come before the loop.

BTW: The handling of AMDGPU_PREAMBLE_IB_PRESENT_FIRST in amdgpu_cs.c is
completely broken as well.

Regards,
Christian.

 		continue;
___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
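Christian's request — compute prerequisites that don't depend on the individual IB once, before the loop — amounts to hoisting loop-invariant state. A hedged userspace sketch of the resulting shape; all names are invented and the conditions are simplified stand-ins for the driver's real checks:

```c
#include <assert.h>
#include <stdbool.h>

struct ib { bool preamble; };

/* Loop-invariant decision: does this submission need a pipeline
 * sync?  In the patch this depends on the context switch and on
 * SRIOV state, neither of which varies per IB, so it is computed
 * once up front rather than inside the IB loop. */
static bool need_pipe_sync(bool ctx_switch, bool sriov_vf)
{
    return ctx_switch && sriov_vf;
}

/* Per-IB decision: drop preamble IBs when there is no context
 * switch, except under SRIOV, where the preamble must stay. */
static bool emit_ib(const struct ib *ib, bool ctx_switch, bool sriov_vf)
{
    if (ib->preamble && !ctx_switch && !sriov_vf)
        return false; /* preamble IB skipped */
    return true;
}
```

Separating the two decisions this way makes it obvious which state belongs before the loop and which inside it, which is the cleanup being asked for.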
[PATCH] drm/amdgpu: Fix compile warning
In function ‘gfx_v9_0_check_fw_write_wait’: warning: enumeration value ‘CHIP_TAHITI’ not handled in switch [-Wswitch] Always add default case in case there is no match Signed-off-by: Rex Zhu --- drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c index 6d75927..5990e5dc 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c @@ -532,6 +532,8 @@ static void gfx_v9_0_check_fw_write_wait(struct amdgpu_device *adev) (adev->gfx.mec_feature_version >= 42)) adev->gfx.mec_fw_write_wait = true; break; + default: + break; } } -- 1.9.1 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH v2] drm/amd/display: Fix bug use wrong pp interface
Used the wrong pp interface; the original interface is exposed by dpm on SI and partial CI.

Pointed out by Francis David

v2: dal only needs to set min_dcefclk and min_fclk to smu, so use the
display_clock_voltage_request interface instead of updating the whole
display configuration.

Acked-by: Alex Deucher
Signed-off-by: Rex Zhu
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_pp_smu.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_pp_smu.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_pp_smu.c
index e5c5b0a..7811d60 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_pp_smu.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_pp_smu.c
@@ -480,12 +480,20 @@ void pp_rv_set_display_requirement(struct pp_smu *pp,
 {
 	const struct dc_context *ctx = pp->dm;
 	struct amdgpu_device *adev = ctx->driver_context;
+	void *pp_handle = adev->powerplay.pp_handle;
 	const struct amd_pm_funcs *pp_funcs = adev->powerplay.pp_funcs;
+	struct pp_display_clock_request clock = {0};

-	if (!pp_funcs || !pp_funcs->display_configuration_changed)
+	if (!req || !pp_funcs || !pp_funcs->display_clock_voltage_request)
 		return;

-	amdgpu_dpm_display_configuration_changed(adev);
+	clock.clock_type = amd_pp_dcf_clock;
+	clock.clock_freq_in_khz = req->hard_min_dcefclk_khz;
+	pp_funcs->display_clock_voltage_request(pp_handle, &clock);
+
+	clock.clock_type = amd_pp_f_clock;
+	clock.clock_freq_in_khz = req->hard_min_fclk_khz;
+	pp_funcs->display_clock_voltage_request(pp_handle, &clock);
 }

 void pp_rv_set_wm_ranges(struct pp_smu *pp,
--
1.9.1
___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH] amdgpu: fix multi-process hang issue
From: Monk Liu

SWDEV-146499: hang during multi vulkan process testing

cause:
the second frame's PREAMBLE_IB have clear-state and LOAD actions,
those actions ruin the pipeline that is still doing process in the
previous frame's work-load IB.

fix:
need insert pipeline sync if have context switch for SRIOV (because
only SRIOV will report PREEMPTION flag to UMD)

Change-Id: If676da7a9b0147114cd76d19b6035ed8033de449
Signed-off-by: Monk Liu
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
index 5c22cfd..66efa85 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
@@ -95,6 +95,12 @@ void amdgpu_ib_free(struct amdgpu_device *adev, struct amdgpu_ib *ib,
 	amdgpu_sa_bo_free(adev, &ib->sa_bo, f);
 }

+static bool amdgpu_ib_has_preamble(struct amdgpu_ring *ring, bool ctx_switch) {
+	return (amdgpu_sriov_vf(ring->adev) &&
+		ring->funcs->type == AMDGPU_RING_TYPE_GFX &&
+		ctx_switch);
+}
+
 /**
  * amdgpu_ib_schedule - schedule an IB (Indirect Buffer) on the ring
  *
@@ -123,7 +129,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
 	struct amdgpu_device *adev = ring->adev;
 	struct amdgpu_ib *ib = &ibs[0];
 	struct dma_fence *tmp = NULL;
-	bool skip_preamble, need_ctx_switch;
+	bool need_ctx_switch;
 	unsigned patch_offset = ~0;
 	struct amdgpu_vm *vm;
 	uint64_t fence_ctx;
@@ -156,6 +162,8 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
 		return -EINVAL;
 	}

+	need_ctx_switch = ring->current_ctx != fence_ctx;
+
 	alloc_size = ring->funcs->emit_frame_size + num_ibs *
 		ring->funcs->emit_ib_size;

@@ -167,6 +175,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
 	if (ring->funcs->emit_pipeline_sync && job &&
 	    ((tmp = amdgpu_sync_get_fence(&job->sched_sync, NULL)) ||
+	     amdgpu_ib_has_preamble(ring, need_ctx_switch) ||
 	     amdgpu_vm_need_pipeline_sync(ring, job))) {
 		need_pipe_sync = true;
@@ -200,8 +209,6 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
 		amdgpu_asic_flush_hdp(adev, ring);
 	}

-	skip_preamble = ring->current_ctx == fence_ctx;
-	need_ctx_switch = ring->current_ctx != fence_ctx;
 	if (job && ring->funcs->emit_cntxcntl) {
 		if (need_ctx_switch)
 			status |= AMDGPU_HAVE_CTX_SWITCH;
@@ -215,7 +222,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
 	/* drop preamble IBs if we don't have a context switch */
 	if ((ib->flags & AMDGPU_IB_FLAG_PREAMBLE) &&
-	    skip_preamble &&
+	    !need_ctx_switch &&
 	    !(status & AMDGPU_PREAMBLE_IB_PRESENT_FIRST) &&
 	    !amdgpu_sriov_vf(adev)) /* for SRIOV preemption, Preamble CE ib must be inserted anyway */
 		continue;
--
2.7.4
___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH] drm/amdgpu: fix VM clearing for the root PD
On 08/17/2018 08:24 PM, Christian König wrote: We need to figure out the address after validating the BO, not before. Signed-off-by: Christian König Reviewed-by: Junwei Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index 662e8a34d52c..662aec5c81d4 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -369,7 +369,6 @@ static int amdgpu_vm_clear_bo(struct amdgpu_device *adev, uint64_t addr; int r; - addr = amdgpu_bo_gpu_offset(bo); entries = amdgpu_bo_size(bo) / 8; if (pte_support_ats) { @@ -401,6 +400,7 @@ static int amdgpu_vm_clear_bo(struct amdgpu_device *adev, if (r) goto error; + addr = amdgpu_bo_gpu_offset(bo); if (ats_entries) { uint64_t ats_value; ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH v4 4/5] drm/amdgpu: use bulk moves for efficient VM LRU handling (v4)
On Fri, Aug 17, 2018 at 06:38:16PM +0800, Koenig, Christian wrote: > Am 17.08.2018 um 12:08 schrieb Huang Rui: > > I continue to work for bulk moving that based on the proposal by Christian. > > > > Background: > > amdgpu driver will move all PD/PT and PerVM BOs into idle list. Then move > > all of > > them on the end of LRU list one by one. Thus, that cause so many BOs moved > > to > > the end of the LRU, and impact performance seriously. > > > > Then Christian provided a workaround to not move PD/PT BOs on LRU with below > > patch: > > "drm/amdgpu: band aid validating VM PTs" > > Commit 0bbf32026cf5ba41e9922b30e26e1bed1ecd38ae > > > > However, the final solution should bulk move all PD/PT and PerVM BOs on the > > LRU > > instead of one by one. > > > > Whenever amdgpu_vm_validate_pt_bos() is called and we have BOs which need > > to be > > validated we move all BOs together to the end of the LRU without dropping > > the > > lock for the LRU. > > > > While doing so we note the beginning and end of this block in the LRU list. > > > > Now when amdgpu_vm_validate_pt_bos() is called and we don't have anything > > to do, > > we don't move every BO one by one, but instead cut the LRU list into pieces > > so > > that we bulk move everything to the end in just one operation. > > > > Test data: > > +--+-+---+---+ > > | |The Talos|Clpeak(OCL)|BusSpeedReadback(OCL) > > | > > | |Principle(Vulkan)| | > > | > > ++ > > | | | |0.319 ms(1k) 0.314 ms(2K) > > 0.308 ms(4K) | > > | Original | 147.7 FPS | 76.86 us |0.307 ms(8K) 0.310 ms(16K) > > | > > ++ > > | Orignial + WA| | |0.254 ms(1K) 0.241 ms(2K) > > | > > |(don't move | 162.1 FPS | 42.15 us |0.230 ms(4K) 0.223 ms(8K) > > 0.204 ms(16K)| > > |PT BOs on LRU)| | | > > | > > ++ > > | Bulk move| 163.1 FPS | 40.52 us |0.244 ms(1K) 0.252 ms(2K) > > 0.213 ms(4K) | > > | | | |0.214 ms(8K) 0.225 ms(16K) > > | > > +--+-+---+---+ > > > > After test them with above three benchmarks include vulkan and opencl. 
We > > can > > see the visible improvement than original, and even better than original > > with > > workaround. > > > > v2: move all BOs include idle, relocated, and moved list to the end of LRU > > and > > put them together. > > v3: remove unused parameter and use list_for_each_entry instead of the one > > with > > save entry. > > v4: move the amdgpu_vm_move_to_lru_tail after command submission, at that > > time, > > all bo will be back on idle list. > > > > Signed-off-by: Christian König > > Signed-off-by: Huang Rui > > Tested-by: Mike Lothian > > Tested-by: Dieter Nützel > > Acked-by: Chunming Zhou > > --- > > drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 11 ++ > > drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 71 > > ++ > > drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 11 +- > > 3 files changed, 75 insertions(+), 18 deletions(-) > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c > > index 502b94f..9fbdf02 100644 > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c > > @@ -1260,6 +1260,16 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser > > *p, > > return 0; > > } > > > > +static void amdgpu_cs_vm_move_on_lru(struct amdgpu_device *adev, > > +struct amdgpu_cs_parser *p) > > +{ > > + struct amdgpu_fpriv *fpriv = p->filp->driver_priv; > > + struct amdgpu_vm *vm = >vm; > > + > > + if (vm->validated) > > That check belongs inside amdgpu_vm_move_to_lru_tail(). 
> > > + amdgpu_vm_move_to_lru_tail(adev, vm); > > +} > > + > > int amdgpu_cs_ioctl(struct drm_device *dev, void *data, struct drm_file > > *filp) > > { > > struct amdgpu_device *adev = dev->dev_private; > > @@ -1310,6 +1320,7 @@ int amdgpu_cs_ioctl(struct drm_device *dev, void > > *data, struct drm_file *filp) > > > > r = amdgpu_cs_submit(, cs); > > > > + amdgpu_cs_vm_move_on_lru(adev, ); > > out: > > amdgpu_cs_parser_fini(, r, reserved_buffers); > > return r; > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > > index 9c84770..037cfbc 100644 > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > > +++
RE: [PATCH] drm/scheduler: Add stopped flag to drm_sched_entity
-----Original Message-----
From: dri-devel On Behalf Of Andrey Grodzovsky
Sent: Friday, August 17, 2018 11:16 PM
To: dri-de...@lists.freedesktop.org
Cc: Koenig, Christian ; amd-gfx@lists.freedesktop.org
Subject: [PATCH] drm/scheduler: Add stopped flag to drm_sched_entity

The flag will prevent another thread from the same process from reinserting
the entity queue into the scheduler's rq after it was already removed from
there by another thread during drm_sched_entity_flush.

Signed-off-by: Andrey Grodzovsky
---
 drivers/gpu/drm/scheduler/sched_entity.c | 10 +-
 include/drm/gpu_scheduler.h              |  2 ++
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 1416edb..07cfe63 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -177,8 +177,12 @@ long drm_sched_entity_flush(struct drm_sched_entity *entity, long timeout)
 	/* For killed process disable any more IBs enqueue right now */
 	last_user = cmpxchg(&entity->last_user, current->group_leader, NULL);
 	if ((!last_user || last_user == current->group_leader) &&
-	    (current->flags & PF_EXITING) && (current->exit_code == SIGKILL))
+	    (current->flags & PF_EXITING) && (current->exit_code == SIGKILL)) {
+		spin_lock(&entity->rq_lock);
+		entity->stopped = true;
 		drm_sched_rq_remove_entity(entity->rq, entity);
+		spin_unlock(&entity->rq_lock);
+	}

 	return ret;
 }
@@ -504,6 +508,10 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job,
 	if (first) {
 		/* Add the entity to the run queue */
 		spin_lock(&entity->rq_lock);
+		if (entity->stopped) {
+			spin_unlock(&entity->rq_lock);
+			return;
+		}

[DZ] The code has changed so frequently recently and has this regression;
my code synced last Friday still has the check below:

	spin_lock(&entity->rq_lock);
	if (!entity->rq) {
		DRM_ERROR("Trying to push to a killed entity\n");
		spin_unlock(&entity->rq_lock);
		return;
	}

So you should add a DRM_ERROR as well when hitting it.

With that fix, patch is Reviewed-by: Chunming Zhou

Regards,
David Zhou

 		drm_sched_rq_add_entity(entity->rq, entity);
 		spin_unlock(&entity->rq_lock);
 		drm_sched_wakeup(entity->rq->sched);
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 919ae57..daec50f 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -70,6 +70,7 @@ enum drm_sched_priority {
  * @fini_status: contains the exit status in case the process was signalled.
  * @last_scheduled: points to the finished fence of the last scheduled job.
  * @last_user: last group leader pushing a job into the entity.
+ * @stopped: Marks the entity as removed from rq and destined for termination.
  *
  * Entities will emit jobs in order to their corresponding hardware
  * ring, and the scheduler will alternate between entities based on
@@ -92,6 +93,7 @@ struct drm_sched_entity {
 	atomic_t		*guilty;
 	struct dma_fence	*last_scheduled;
 	struct task_struct	*last_user;
+	bool			stopped;
 };

 /**
--
2.7.4
___ dri-devel mailing list dri-de...@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH] drm/amdgpu: fix VM clearing for the root PD
On 08/17/2018 08:24 PM, Christian König wrote:
> We need to figure out the address after validating the BO, not before.
>
> Signed-off-by: Christian König

Reviewed-by: Junwei Zhang

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 662e8a34d52c..662aec5c81d4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -369,7 +369,6 @@ static int amdgpu_vm_clear_bo(struct amdgpu_device *adev,
>  	uint64_t addr;
>  	int r;
>
> -	addr = amdgpu_bo_gpu_offset(bo);
>  	entries = amdgpu_bo_size(bo) / 8;
>
>  	if (pte_support_ats) {
> @@ -401,6 +400,7 @@ static int amdgpu_vm_clear_bo(struct amdgpu_device *adev,
>  	if (r)
>  		goto error;
>
> +	addr = amdgpu_bo_gpu_offset(bo);
>  	if (ats_entries) {
>  		uint64_t ats_value;
>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH] drm/powerplay: enable dpm under pass-through
Re-enable DPM under pass-through, because there is no actual hardware fini and no real power-off when the guest VM shuts down or reboots. Otherwise, under pass-through, populating and uploading the SCLK/MCLK DPM levels fails because pcie_speed_table.count is zero.

Change-Id: I7cbc55c650867d00e19241ceea5d98f78b5ac3f5
Signed-off-by: Yintian Tao
---
 drivers/gpu/drm/amd/powerplay/hwmgr/hardwaremanager.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/hardwaremanager.c b/drivers/gpu/drm/amd/powerplay/hwmgr/hardwaremanager.c
index 53207e7..6ef3c87 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/hardwaremanager.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/hardwaremanager.c
@@ -75,10 +75,12 @@ int phm_set_power_state(struct pp_hwmgr *hwmgr,

 int phm_enable_dynamic_state_management(struct pp_hwmgr *hwmgr)
 {
+	struct amdgpu_device *adev = NULL;
 	int ret = -EINVAL;;

 	PHM_FUNC_CHECK(hwmgr);
+	adev = hwmgr->adev;

-	if (smum_is_dpm_running(hwmgr)) {
+	if (smum_is_dpm_running(hwmgr) && !amdgpu_passthrough(adev)) {
 		pr_info("dpm has been enabled\n");
 		return 0;
 	}
--
2.7.4
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH] amdgpu: fix multi-process hang issue
From: Monk Liu

SWDEV-146499: hang during multi vulkan process testing

cause:
the second frame's PREAMBLE_IB has clear-state and LOAD actions; those actions ruin the pipeline that is still processing the previous frame's workload IB.

fix:
need to insert a pipeline sync if there is a context switch for SRIOV (because only SRIOV will report the PREEMPTION flag to UMD)

Change-Id: If676da7a9b0147114cd76d19b6035ed8033de449
Signed-off-by: Monk Liu
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
index 5c22cfd..66efa85 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
@@ -95,6 +95,12 @@ void amdgpu_ib_free(struct amdgpu_device *adev, struct amdgpu_ib *ib,
 	amdgpu_sa_bo_free(adev, &ib->sa_bo, f);
 }

+static bool amdgpu_ib_has_preamble(struct amdgpu_ring *ring, bool ctx_switch)
+{
+	return (amdgpu_sriov_vf(ring->adev) &&
+		ring->funcs->type == AMDGPU_RING_TYPE_GFX &&
+		ctx_switch);
+}
+
 /**
  * amdgpu_ib_schedule - schedule an IB (Indirect Buffer) on the ring
  *
@@ -123,7 +129,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
 	struct amdgpu_device *adev = ring->adev;
 	struct amdgpu_ib *ib = &ibs[0];
 	struct dma_fence *tmp = NULL;
-	bool skip_preamble, need_ctx_switch;
+	bool need_ctx_switch;
 	unsigned patch_offset = ~0;
 	struct amdgpu_vm *vm;
 	uint64_t fence_ctx;
@@ -156,6 +162,8 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
 		return -EINVAL;
 	}

+	need_ctx_switch = ring->current_ctx != fence_ctx;
+
 	alloc_size = ring->funcs->emit_frame_size + num_ibs *
 		ring->funcs->emit_ib_size;

@@ -167,6 +175,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
 	if (ring->funcs->emit_pipeline_sync && job &&
 	    ((tmp = amdgpu_sync_get_fence(&job->sched_sync, NULL)) ||
+	     amdgpu_ib_has_preamble(ring, need_ctx_switch) ||
 	     amdgpu_vm_need_pipeline_sync(ring, job))) {
 		need_pipe_sync = true;
@@ -200,8 +209,6 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
 		amdgpu_asic_flush_hdp(adev, ring);
 	}

-	skip_preamble = ring->current_ctx == fence_ctx;
-	need_ctx_switch = ring->current_ctx != fence_ctx;
 	if (job && ring->funcs->emit_cntxcntl) {
 		if (need_ctx_switch)
 			status |= AMDGPU_HAVE_CTX_SWITCH;
@@ -215,7 +222,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,

 		/* drop preamble IBs if we don't have a context switch */
 		if ((ib->flags & AMDGPU_IB_FLAG_PREAMBLE) &&
-		    skip_preamble &&
+		    !need_ctx_switch &&
 		    !(status & AMDGPU_PREAMBLE_IB_PRESENT_FIRST) &&
 		    !amdgpu_sriov_vf(adev)) /* for SRIOV preemption, Preamble CE ib must be inserted anyway */
 			continue;
--
2.7.4
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH] drm/amdgpu: fix VM clearing for the root PD
On Fri, Aug 17, 2018 at 02:24:22PM +0200, Christian König wrote:
> We need to figure out the address after validating the BO, not before.
>
> Signed-off-by: Christian König

Reviewed-by: Huang Rui

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 662e8a34d52c..662aec5c81d4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -369,7 +369,6 @@ static int amdgpu_vm_clear_bo(struct amdgpu_device *adev,
>  	uint64_t addr;
>  	int r;
>
> -	addr = amdgpu_bo_gpu_offset(bo);
>  	entries = amdgpu_bo_size(bo) / 8;
>
>  	if (pte_support_ats) {
> @@ -401,6 +400,7 @@ static int amdgpu_vm_clear_bo(struct amdgpu_device *adev,
>  	if (r)
>  		goto error;
>
> +	addr = amdgpu_bo_gpu_offset(bo);
>  	if (ats_entries) {
>  		uint64_t ats_value;
>
> --
> 2.17.1
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
RE: [PATCH] drm/amdgpu: fix VM clearing for the root PD
> From: amd-gfx On Behalf Of Christian König
> Sent: Friday, August 17, 2018 20:24
> To: amd-gfx@lists.freedesktop.org
> Subject: [PATCH] drm/amdgpu: fix VM clearing for the root PD
>
> We need to figure out the address after validating the BO, not before.
>
> Signed-off-by: Christian König

Reviewed-by: Junwei Zhang

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 662e8a34d52c..662aec5c81d4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -369,7 +369,6 @@ static int amdgpu_vm_clear_bo(struct amdgpu_device *adev,
>  	uint64_t addr;
>  	int r;
>
> -	addr = amdgpu_bo_gpu_offset(bo);
>  	entries = amdgpu_bo_size(bo) / 8;
>
>  	if (pte_support_ats) {
> @@ -401,6 +400,7 @@ static int amdgpu_vm_clear_bo(struct amdgpu_device *adev,
>  	if (r)
>  		goto error;
>
> +	addr = amdgpu_bo_gpu_offset(bo);
>  	if (ats_entries) {
>  		uint64_t ats_value;
>
> --
> 2.17.1
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx