>>the KIQ is used to invalidate both the GFXHUB as well as the MMHUB on Vega.
I know, > + /* a read return value of 1 means semaphore acuqire */ > + if (ring->funcs->vmhub == AMDGPU_MMHUB_0 || > + ring->funcs->vmhub == AMDGPU_MMHUB_1) > + amdgpu_ring_emit_reg_wait(ring, sem, 0x1, 0x1); But ring->funcs->vmhub wil always be AMDGPU_GFXHUB, right ? since this ring is from "&kiq->ring" ? >> Yes, agree. But since we now knew that we won't need that we can just drop >> this patch altogether. Yeah, the semaphore wrapping is in PATCH 2/2, agree that this PATCH 1/2 could be dropped -----邮件原件----- 发件人: Christian König <ckoenig.leichtzumer...@gmail.com> 发送时间: 2019年11月20日 22:39 收件人: Liu, Monk <monk....@amd.com>; Zhu, Changfeng <changfeng....@amd.com>; Koenig, Christian <christian.koe...@amd.com>; Xiao, Jack <jack.x...@amd.com>; Zhou1, Tao <tao.zh...@amd.com>; Huang, Ray <ray.hu...@amd.com>; Huang, Shimmer <xinmei.hu...@amd.com>; amd-gfx@lists.freedesktop.org 主题: Re: 答复: 答复: 答复: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore workaround in amdgpu_virt Hi Monk, the KIQ is used to invalidate both the GFXHUB as well as the MMHUB on Vega. > Besides, amdgpu_virt_kiq_reg_write_reg_wait() is not deadly a helper > function that only serve VM invalidate, so I don't think You should > put the semaphore read/write in this routine, instead you can put > semaphore r/w out side of this routine and only Put them around the VM > invalidate logic Yes, agree. But since we now knew that we won't need that we can just drop this patch altogether. Regards, Christian. Am 20.11.19 um 15:30 schrieb Liu, Monk: > Thanks for sharing this JIR > > now I got the picture of this issue from you and Christian. > > So the semaphore grabbing can prevent RTL to power off the MMHUB, I > see > > The practice is that SRIOV won't enable PG at all (even our GIM driver > won't enable PG, maybe in future we would enable it ) > > I think I don't have too many concern about your patches, > > But I have comments on your patch 1: > > void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev, > uint32_t reg0, uint32_t reg1, > - uint32_t ref, uint32_t mask) > + uint32_t ref, uint32_t mask, > + uint32_t sem) > { > struct amdgpu_kiq *kiq = &adev->gfx.kiq; > struct amdgpu_ring *ring = &kiq->ring; @@ -144,9 +145,30 @@ void > amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev, > uint32_t seq; > > spin_lock_irqsave(&kiq->ring_lock, flags); > - amdgpu_ring_alloc(ring, 32); > + amdgpu_ring_alloc(ring, 60); > + > + /* > + * It may lose gpuvm invalidate acknowldege state across power-gating > + * off cycle, add semaphore acquire before invalidation and semaphore > + * release after invalidation to avoid entering power gated state > + * to WA the Issue > + */ > + > + /* a read return value of 1 means semaphore acuqire */ > + if (ring->funcs->vmhub == AMDGPU_MMHUB_0 || > + ring->funcs->vmhub == AMDGPU_MMHUB_1) > + amdgpu_ring_emit_reg_wait(ring, sem, 0x1, 0x1); > > > See that in this routine, the ring is always KIQ, so below code looks > redundant : > > + /* a read return value of 1 means semaphore acuqire */ > + if (ring->funcs->vmhub == AMDGPU_MMHUB_0 || > + ring->funcs->vmhub == AMDGPU_MMHUB_1) > + amdgpu_ring_emit_reg_wait(ring, sem, 0x1, 0x1); > > Besides, amdgpu_virt_kiq_reg_write_reg_wait() is not deadly a helper > function that only serve VM invalidate, so I don't think You should > put the semaphore read/write in this routine, instead you can put > semaphore r/w out side of this routine and only Put them around the VM > invalidate logic > > Thanks > > -----邮件原件----- > 发件人: Zhu, Changfeng <changfeng....@amd.com> > 发送时间: 2019年11月20日 22:17 > 收件人: Koenig, Christian <christian.koe...@amd.com>; Liu, Monk > <monk....@amd.com>; Xiao, Jack <jack.x...@amd.com>; Zhou1, Tao > <tao.zh...@amd.com>; Huang, Ray <ray.hu...@amd.com>; Huang, Shimmer > <xinmei.hu...@amd.com>; amd-gfx@lists.freedesktop.org > 主题: RE: 答复: 答复: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore > workaround in amdgpu_virt > >>>> Did Changfeng already hit this issue under SRIOV ??? > I meet this problem on navi14 under gmc_v10_0_emit_flush_gpu_tlb . > The problem is also seen by Zhou,Tao. > > And this is ticket: > http://ontrack-internal.amd.com/browse/SWDEV-201459 > > After the semaphore patch, the problem can be fixed. > > If SROV has concern about this problem, it should not add semaphore in SROV. > > However, we should apply semaphore for gmc_v9_0_flush_gpu_tlb/ > gmc_v9_0_emit_flush_gpu_tlb/ gmc_v10_0_flush_gpu_tlb/ > gmc_v10_0_emit_flush_gpu_tlb > > Or how can we handle the ticket above? > > BR, > Changfeng. > > -----Original Message----- > From: Christian König <ckoenig.leichtzumer...@gmail.com> > Sent: Wednesday, November 20, 2019 10:00 PM > To: Liu, Monk <monk....@amd.com>; Koenig, Christian > <christian.koe...@amd.com>; Zhu, Changfeng <changfeng....@amd.com>; > Xiao, Jack <jack.x...@amd.com>; Zhou1, Tao <tao.zh...@amd.com>; Huang, > Ray <ray.hu...@amd.com>; Huang, Shimmer <xinmei.hu...@amd.com>; > amd-gfx@lists.freedesktop.org > Subject: Re: 答复: 答复: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore > workaround in amdgpu_virt > >> Did Changfeng already hit this issue under SRIOV ? > I don't think so, but Changfeng needs to answer this. > > Question is does the extra semaphore acquire has some negative effect on > SRIOV? > > I would like to avoid having even more SRIOV specific handling in here which > we can't really test on bare metal. > > Christian. > > Am 20.11.19 um 14:54 schrieb Liu, Monk: >> Hah, but in SRIOV case, our guest KMD driver is not allowed to do >> such things .... (and even there is a bug that KMD try to power gate, >> the SMU firmware would not really do the jobs since We have PSP L1 >> policy to prevent those danger operations ) >> >> Did Changfeng already hit this issue under SRIOV ??? >> >> -----邮件原件----- >> 发件人: Koenig, Christian <christian.koe...@amd.com> >> 发送时间: 2019年11月20日 21:21 >> 收件人: Liu, Monk <monk....@amd.com>; Zhu, Changfeng >> <changfeng....@amd.com>; Xiao, Jack <jack.x...@amd.com>; Zhou1, Tao >> <tao.zh...@amd.com>; Huang, Ray <ray.hu...@amd.com>; Huang, Shimmer >> <xinmei.hu...@amd.com>; amd-gfx@lists.freedesktop.org >> 主题: Re: 答复: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore >> workaround in amdgpu_virt >> >> Hi Monk, >> >> this is a fix for power gating the MMHUB. >> >> Basic problem is that the MMHUB can power gate while an invalidation is in >> progress which looses all bits in the ACK register and so deadlocks the >> engine waiting for the invalidation to finish. >> >> This bug is hit immediately when we enable power gating of the MMHUB. >> >> Regards, >> Christian. >> >> Am 20.11.19 um 14:18 schrieb Liu, Monk: >>> Hi Changfeng >>> >>> Firs of all, there is no power-gating off circle involved in AMDGPU >>> SRIOV, since we don't allow VF/VM do such things so I do feel >>> strange why you post something like this Especially on VEGA10 >>> serials which looks doesn't have any issue on those gpu_flush part >>> >>> Here is my questions for you: >>> 1) Can you point me what issue had you been experienced ? and how to >>> repro the bug >>> 2) if you do hit some issues, did you verified that your patch can fix it ? >>> >>> besides >>> >>> /Monk >>> >>> -----邮件原件----- >>> 发件人: amd-gfx <amd-gfx-boun...@lists.freedesktop.org> 代表 >>> Changfeng.Zhu >>> 发送时间: 2019年11月20日 17:14 >>> 收件人: Koenig, Christian <christian.koe...@amd.com>; Xiao, Jack >>> <jack.x...@amd.com>; Zhou1, Tao <tao.zh...@amd.com>; Huang, Ray >>> <ray.hu...@amd.com>; Huang, Shimmer <xinmei.hu...@amd.com>; >>> amd-gfx@lists.freedesktop.org >>> 抄送: Zhu, Changfeng <changfeng....@amd.com> >>> 主题: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore workaround in >>> amdgpu_virt >>> >>> From: changzhu <changfeng....@amd.com> >>> >>> It may lose gpuvm invalidate acknowldege state across power-gating off >>> cycle. To avoid this issue in virt invalidation, add semaphore acquire >>> before invalidation and semaphore release after invalidation. >>> >>> Change-Id: Ie98304e475166b53eed033462d76423b6b0fc25b >>> Signed-off-by: changzhu <changfeng....@amd.com> >>> --- >>> drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 26 ++++++++++++++++++++++-- >>> drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h | 3 ++- >>> drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 3 ++- >>> 3 files changed, 28 insertions(+), 4 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c >>> index f04eb1a64271..70ffaf91cd12 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c >>> @@ -135,7 +135,8 @@ void amdgpu_virt_kiq_wreg(struct amdgpu_device >>> *adev, uint32_t reg, uint32_t v) >>> >>> void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev, >>> uint32_t reg0, uint32_t reg1, >>> - uint32_t ref, uint32_t mask) >>> + uint32_t ref, uint32_t mask, >>> + uint32_t sem) >>> { >>> struct amdgpu_kiq *kiq = &adev->gfx.kiq; >>> struct amdgpu_ring *ring = &kiq->ring; @@ -144,9 +145,30 @@ >>> void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev, >>> uint32_t seq; >>> >>> spin_lock_irqsave(&kiq->ring_lock, flags); >>> - amdgpu_ring_alloc(ring, 32); >>> + amdgpu_ring_alloc(ring, 60); >>> + >>> + /* >>> + * It may lose gpuvm invalidate acknowldege state across power-gating >>> + * off cycle, add semaphore acquire before invalidation and semaphore >>> + * release after invalidation to avoid entering power gated state >>> + * to WA the Issue >>> + */ >>> + >>> + /* a read return value of 1 means semaphore acuqire */ >>> + if (ring->funcs->vmhub == AMDGPU_MMHUB_0 || >>> + ring->funcs->vmhub == AMDGPU_MMHUB_1) >>> + amdgpu_ring_emit_reg_wait(ring, sem, 0x1, 0x1); >>> + >>> amdgpu_ring_emit_reg_write_reg_wait(ring, reg0, reg1, >>> ref, mask); >>> + /* >>> + * add semaphore release after invalidation, >>> + * write with 0 means semaphore release >>> + */ >>> + if (ring->funcs->vmhub == AMDGPU_MMHUB_0 || >>> + ring->funcs->vmhub == AMDGPU_MMHUB_1) >>> + amdgpu_ring_emit_wreg(ring, sem, 0); >>> + >>> amdgpu_fence_emit_polling(ring, &seq); >>> amdgpu_ring_commit(ring); >>> spin_unlock_irqrestore(&kiq->ring_lock, flags); diff --git >>> a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h >>> index b0b2bdc750df..bda6a2f37dc0 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h >>> @@ -295,7 +295,8 @@ uint32_t amdgpu_virt_kiq_rreg(struct amdgpu_device >>> *adev, uint32_t reg); void amdgpu_virt_kiq_wreg(struct amdgpu_device >>> *adev, uint32_t reg, uint32_t v); void >>> amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev, >>> uint32_t reg0, uint32_t rreg1, >>> - uint32_t ref, uint32_t mask); >>> + uint32_t ref, uint32_t mask, >>> + uint32_t sem); >>> int amdgpu_virt_request_full_gpu(struct amdgpu_device *adev, >>> bool init); int amdgpu_virt_release_full_gpu(struct amdgpu_device >>> *adev, bool init); int amdgpu_virt_reset_gpu(struct amdgpu_device >>> *adev); diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c >>> b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c >>> index f25cd97ba5f2..1ae59af7836a 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c >>> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c >>> @@ -448,9 +448,10 @@ static void gmc_v9_0_flush_gpu_tlb(struct >>> amdgpu_device *adev, uint32_t vmid, >>> !adev->in_gpu_reset) { >>> uint32_t req = hub->vm_inv_eng0_req + eng; >>> uint32_t ack = hub->vm_inv_eng0_ack + eng; >>> + uint32_t sem = hub->vm_inv_eng0_sem + eng; >>> >>> amdgpu_virt_kiq_reg_write_reg_wait(adev, req, ack, tmp, >>> - 1 << vmid); >>> + 1 << vmid, sem); >>> return; >>> } >>> >>> -- >>> 2.17.1 >>> >>> _______________________________________________ >>> amd-gfx mailing list >>> amd-gfx@lists.freedesktop.org >>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx >> _______________________________________________ >> amd-gfx mailing list >> amd-gfx@lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/amd-gfx > _______________________________________________ > amd-gfx mailing list > amd-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/amd-gfx _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx