Re: [PATCH] drm/amdgpu: stop disable the scheduler during HW fini
Eh sorry. It is merged, I was looking at the wrong branch.

Regards,
Nirmoy

On 6/8/20 7:13 PM, Nirmoy wrote:

Hi Christian,

I realized we are still missing this patch while reading dmesg of
https://gitlab.freedesktop.org/drm/amd/-/issues/1158

Regards,
Nirmoy

On 2/28/20 4:24 PM, Li, Dennis wrote:

[AMD Public Use]

Looks good to me

Tested-by: Dennis Li <dennis...@amd.com>

Best Regards
Dennis Li

From: amd-gfx On Behalf Of Deucher, Alexander
Sent: Thursday, February 27, 2020 11:18 PM
To: Christian König; Das, Nirmoy; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: stop disable the scheduler during HW fini

[AMD Public Use]

Looks good to me.

Reviewed-by: Alex Deucher <alexander.deuc...@amd.com>

From: Christian König <ckoenig.leichtzumer...@gmail.com>
Sent: Thursday, February 27, 2020 9:50 AM
To: Das, Nirmoy <nirmoy@amd.com>; amd-gfx@lists.freedesktop.org; Deucher, Alexander <alexander.deuc...@amd.com>
Subject: Re: [PATCH] drm/amdgpu: stop disable the scheduler during HW fini

Alex any comment on this?

On 25.02.20 at 14:16, Nirmoy wrote:
> Acked-by: Nirmoy Das <nirmoy@amd.com>
>
> On 2/25/20 2:07 PM, Christian König wrote:
>> When we stop the HW for example for GPU reset we should not stop the
>> front-end scheduler. Otherwise we run into intermediate failures during
>> command submission.
>>
>> The scheduler should only be stopped in very few cases:
>> 1. We can't get the hardware working in ring or IB test after a GPU
>>    reset.
>> 2. The KIQ scheduler is not used in the front-end and should be
>>    disabled during GPU reset.
>> 3. In amdgpu_ring_fini() when the driver unloads.
>>
>> Signed-off-by: Christian König <christian.koe...@amd.com>
>> ---
>>  drivers/gpu/drm/amd/amdgpu/cik_sdma.c  |  2 --
>>  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c |  8
>>  drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c  |  5 -
>>  drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c  | 25 +
>>  drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c  |  7 ---
>>  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  |  9 -
>>  drivers/gpu/drm/amd/amdgpu/jpeg_v2_0.c |  3 ---
>>  drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c |  2 --
>>  drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c |  2 --
>>  drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c |  4
>>  drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c |  3 ---
>>  drivers/gpu/drm/amd/amdgpu/si_dma.c    |  1 -
>>  drivers/gpu/drm/amd/amdgpu/uvd_v4_2.c  |  3 ---
>>  drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c  |  3 ---
>>  drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c  |  3 ---
>>  drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c  |  7 ---
>>  drivers/gpu/drm/amd/amdgpu/vce_v4_0.c  |  4
>>  drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c  |  3 ---
>>  drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c  |  9 -
>>  drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c  | 11 +--
>>  20 files changed, 10 insertions(+), 104 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/cik_sdma.c b/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
>> index 4274ccf765de..cb3b3a0a1348 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
>> @@ -320,8 +320,6 @@ static void cik_sdma_gfx_stop(struct amdgpu_device *adev)
>>  		WREG32(mmSDMA0_GFX_RB_CNTL + sdma_offsets[i], rb_cntl);
>>  		WREG32(mmSDMA0_GFX_IB_CNTL + sdma_offsets[i], 0);
>>  	}
>> -	sdma0->sched.ready = false;
>> -	sdma1->sched.ready = false;
>>  }
>>
>>  /**
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> index 7b6158320400..36ce67ce4800 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> @@ -2391,10 +2391,6 @@ static int gfx_v10_0_cp_gfx_enable(struct amdgpu_device *adev, bool enable)
>>  	tmp = REG_SET_FIELD(tmp, CP_ME_CNTL, ME_HALT, enable ? 0 : 1);
>>  	tmp = REG_SET_FIELD(tmp, CP_ME_CNTL, PFP_HALT, enable ? 0 : 1);
>>  	tmp = REG_SET_FIELD(tmp, CP_ME_CNTL, CE_HALT, enable ? 0 : 1);
>> -	if (!enable) {
>> -		for (i = 0; i < adev->gfx.num_gfx_rings; i++)
>> -			adev->gfx.gfx_ring[i].sched.ready = false;
>> -	}
>>  	WREG32_SOC15(GC, 0, mmCP_ME_CNTL, tmp);
>>
>>  	for (i = 0; i < adev->usec_timeout; i++) {
>> @@ -2869,16 +2865,12 @@ static int gfx_v10_0_cp_gfx_resume(struct amdgpu_device *adev)
>>  static void gfx_v10_0_cp_compute_enable(struct amdgpu_device *adev, bool enable)
>>  {
>> -	int i;
>> -
>>  	if (enable) {
>>  		WREG32_SOC15(GC, 0, mmCP_MEC_CNTL, 0);
>>  	} else {
>>  		WREG32_SOC15(GC, 0, mmCP_MEC_CNTL,
>>  			     (CP_MEC_CNTL__MEC_ME1_HALT_MASK |
>>  			      CP_MEC_CNTL__MEC_ME2_HALT_MASK));
>> -		for (i = 0; i < adev->gfx.num_compute_rings; i++)
>> -			adev->gfx.compute_ring[i].sched.ready = false;
>>  		adev->gfx.kiq.ring.sched.ready = false;
>>  	}
>>  	udelay(50);
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
>> index 31f44d05e606..e462a099dbda 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
>> @@ -1950,7 +1950,6 @@ static int gfx_v6_0_ring_test_ib(struct amdgpu_ring *ring, long timeout)
>>  static void gfx_v6_0_cp_gfx_enable(struct amdgpu_device *adev, bool enable)
>>  {
>> -	int i;
>>  	if (enable) {
>>  		WREG32(mmCP_ME_CNTL, 0);
>>  	} else {
>> @@ -1958,10 +1957,6 @@ static void gfx_v6_0_cp_gfx_enable(struct amdgpu_device *adev, bool enable)
>>  					  CP_ME_CNTL__PFP_HALT_MASK |
>>  					  CP_ME_CNTL__CE_HALT_MASK));
>>  		WREG32(mmSCRATCH_UMSK, 0);
>> -		for (i = 0; i < adev->gfx.num_gfx_rings; i++)
>> -			adev->gfx.gfx_ring[i].sched.ready = false;
>> -		for (i = 0; i < adev->gfx.num_compute_rings; i++)
>> -			adev->gfx.compute_ring[i].sched.ready = false;
>>  	}
>>  	udelay(50);
>>  }
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
>> index 8f20a5dd44fe..9bc8673c83ac 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
>> @@ -2431,15 +2431,12 @@ static int gfx_v7_0_ring_test_ib(struct amdgpu_ring *ring, long timeout)
>>   */
>>  static void gfx_v7_0_cp_gfx_enable(struct amdgpu_device *adev,
Re: [PATCH] drm/amdgpu: stop disable the scheduler during HW fini
Hi Christian,

I realized we are still missing this patch while reading dmesg of
https://gitlab.freedesktop.org/drm/amd/-/issues/1158

Regards,
Nirmoy
RE: [PATCH] drm/amdgpu: stop disable the scheduler during HW fini
[AMD Public Use]

Looks good to me

Tested-by: Dennis Li <dennis...@amd.com>

Best Regards
Dennis Li
Re: [PATCH] drm/amdgpu: stop disable the scheduler during HW fini
[AMD Public Use]

Looks good to me.

Reviewed-by: Alex Deucher
Re: [PATCH] drm/amdgpu: stop disable the scheduler during HW fini
Alex any comment on this?

On 25.02.20 at 14:16, Nirmoy wrote:
> Acked-by: Nirmoy Das
Re: [PATCH] drm/amdgpu: stop disable the scheduler during HW fini
On 2/25/20 2:16 PM, Nirmoy wrote:
> Acked-by: Nirmoy Das

Please change that to Tested-by: Nirmoy Das

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH] drm/amdgpu: stop disable the scheduler during HW fini
Hi Dennis,

Can you please test this patch on vega20 for SWDEV-223117?

Regards,
Nirmoy
Re: [PATCH] drm/amdgpu: stop disable the scheduler during HW fini
Hi Christian,

I tested with amdgpu_test, which does a GPU reset as well because of deadlock_tests. Reset was fine, I could run amdgpu_test multiple times.

dmesg:

Feb 25 14:32:20 brihaspati kernel: [drm:gfx_v9_0_priv_reg_irq [amdgpu]] *ERROR* Illegal register access in command stream
Feb 25 14:32:20 brihaspati kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=290, emitted seq=291
Feb 25 14:32:20 brihaspati kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process amdgpu_test pid 2401 thread amdgpu_test pid 2401
Feb 25 14:32:20 brihaspati kernel: amdgpu 0000:09:00.0: GPU reset begin!
Feb 25 14:32:21 brihaspati kernel: amdgpu 0000:09:00.0: GPU BACO reset
Feb 25 14:32:21 brihaspati kernel: amdgpu 0000:09:00.0: GPU reset succeeded, trying to resume
Feb 25 14:32:21 brihaspati kernel: [drm] PCIE GART of 512M enabled (table at 0x00F40090).
Feb 25 14:32:21 brihaspati kernel: [drm] VRAM is lost due to GPU reset!
Feb 25 14:32:21 brihaspati kernel: [drm] PSP is resuming...
Feb 25 14:32:22 brihaspati kernel: [drm] reserve 0x40 from 0xf5fe80 for PSP TMR
Feb 25 14:32:22 brihaspati kernel: [drm] kiq ring mec 2 pipe 1 q 0
Feb 25 14:32:22 brihaspati kernel: [drm] UVD and UVD ENC initialized successfully.
Feb 25 14:32:22 brihaspati kernel: [drm] VCE initialized successfully.
Feb 25 14:32:22 brihaspati kernel: amdgpu 0000:09:00.0: ring gfx uses VM inv eng 0 on hub 0
Feb 25 14:32:22 brihaspati kernel: amdgpu 0000:09:00.0: ring comp_1.0.0 uses VM inv eng 1 on hub 0
Feb 25 14:32:22 brihaspati kernel: amdgpu 0000:09:00.0: ring comp_1.1.0 uses VM inv eng 4 on hub 0
Feb 25 14:32:22 brihaspati kernel: amdgpu 0000:09:00.0: ring comp_1.2.0 uses VM inv eng 5 on hub 0
Feb 25 14:32:22 brihaspati kernel: amdgpu 0000:09:00.0: ring comp_1.3.0 uses VM inv eng 6 on hub 0
Feb 25 14:32:22 brihaspati kernel: amdgpu 0000:09:00.0: ring comp_1.0.1 uses VM inv eng 7 on hub 0
Feb 25 14:32:22 brihaspati kernel: amdgpu 0000:09:00.0: ring comp_1.1.1 uses VM inv eng 8 on hub 0
Feb 25 14:32:22 brihaspati kernel: amdgpu 0000:09:00.0: ring comp_1.2.1 uses VM inv eng 9 on hub 0
Feb 25 14:32:22 brihaspati kernel: amdgpu 0000:09:00.0: ring comp_1.3.1 uses VM inv eng 10 on hub 0
Feb 25 14:32:22 brihaspati kernel: amdgpu 0000:09:00.0: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
Feb 25 14:32:22 brihaspati kernel: amdgpu 0000:09:00.0: ring sdma0 uses VM inv eng 0 on hub 1
Feb 25 14:32:22 brihaspati kernel: amdgpu 0000:09:00.0: ring page0 uses VM inv eng 1 on hub 1
Feb 25 14:32:22 brihaspati kernel: amdgpu 0000:09:00.0: ring sdma1 uses VM inv eng 4 on hub 1
Feb 25 14:32:22 brihaspati kernel: amdgpu 0000:09:00.0: ring page1 uses VM inv eng 5 on hub 1
Feb 25 14:32:22 brihaspati kernel: amdgpu 0000:09:00.0: ring uvd_0 uses VM inv eng 6 on hub 1
Feb 25 14:32:22 brihaspati kernel: amdgpu 0000:09:00.0: ring uvd_enc_0.0 uses VM inv eng 7 on hub 1
Feb 25 14:32:22 brihaspati kernel: amdgpu 0000:09:00.0: ring uvd_enc_0.1 uses VM inv eng 8 on hub 1
Feb 25 14:32:22 brihaspati kernel: amdgpu 0000:09:00.0: ring vce0 uses VM inv eng 9 on hub 1
Feb 25 14:32:22 brihaspati kernel: amdgpu 0000:09:00.0: ring vce1 uses VM inv eng 10 on hub 1
Feb 25 14:32:22 brihaspati kernel: amdgpu 0000:09:00.0: ring vce2 uses VM inv eng 11 on hub 1
Feb 25 14:32:22 brihaspati kernel: [drm] ECC is not present.
Feb 25 14:32:22 brihaspati kernel: [drm] SRAM ECC is not present.
Feb 25 14:32:22 brihaspati kernel: [drm] recover vram bo from shadow start
Feb 25 14:32:22 brihaspati kernel: [drm] recover vram bo from shadow done
Feb 25 14:32:22 brihaspati kernel: [drm] Skip scheduling IBs!
Feb 25 14:32:22 brihaspati kernel: amdgpu 0000:09:00.0: GPU reset(2) succeeded!
Feb 25 14:32:22 brihaspati kernel: gmc_v9_0_process_interrupt: 45 callbacks suppressed
Feb 25 14:32:22 brihaspati kernel: amdgpu 0000:09:00.0: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:4 pasid:32769, for process amdgpu_test pid 2401 thread amdgpu_test pid 2401)
Feb 25 14:32:22 brihaspati kernel: amdgpu 0000:09:00.0: in page starting at address 0xdeadb000 from client 27
Feb 25 14:32:22 brihaspati kernel: amdgpu 0000:09:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00440C51
Feb 25 14:32:22 brihaspati kernel: amdgpu 0000:09:00.0: MORE_FAULTS: 0x1
Feb 25 14:32:22 brihaspati kernel: amdgpu 0000:09:00.0: WALKER_ERROR: 0x0
Feb 25 14:32:22 brihaspati kernel: amdgpu 0000:09:00.0: PERMISSION_FAULTS: 0x5
Feb 25 14:32:22 brihaspati kernel: amdgpu 0000:09:00.0: MAPPING_ERROR: 0x0
Feb 25 14:32:22 brihaspati kernel: amdgpu 0000:09:00.0: RW: 0x1
Feb 25 14:32:23 brihaspati systemd[1]: NetworkManager-dispatcher.service: Succeeded.
Feb 25 14:32:24 brihaspati nscd[1255]: 1255 checking for monitored file `/etc/services': No such file or directory
Feb 25 14:32:32 brihaspati PackageKit[2092]: daemon quit
Feb 25 14:32:32 brihaspati systemd[1]: packagekit.service: Succeeded.
Feb 25 14:32:39 brihaspati nscd[1255]: 1255 checking for monitored fil
Re: [PATCH] drm/amdgpu: stop disable the scheduler during HW fini
On 25.02.20 at 14:16, Nirmoy wrote:
> Acked-by: Nirmoy Das

Could you test it as well? I only did a quick round of smoke tests, but somebody should probably run a GPU reset test as well.

Thanks in advance,
Christian.

On 2/25/20 2:07 PM, Christian König wrote:

When we stop the HW, for example for a GPU reset, we should not stop the front-end scheduler. Otherwise we run into intermittent failures during command submission.

The scheduler should only be stopped in very few cases:
1. We can't get the hardware working in the ring or IB test after a GPU reset.
2. The KIQ scheduler is not used in the front-end and should be disabled during GPU reset.
3. In amdgpu_ring_fini() when the driver unloads.

Signed-off-by: Christian König
---
 drivers/gpu/drm/amd/amdgpu/cik_sdma.c  |  2 --
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c |  8
 drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c  |  5 -
 drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c  | 25 +
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c  |  7 ---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  |  9 -
 drivers/gpu/drm/amd/amdgpu/jpeg_v2_0.c |  3 ---
 drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c |  2 --
 drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c |  2 --
 drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c |  4
 drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c |  3 ---
 drivers/gpu/drm/amd/amdgpu/si_dma.c    |  1 -
 drivers/gpu/drm/amd/amdgpu/uvd_v4_2.c  |  3 ---
 drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c  |  3 ---
 drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c  |  3 ---
 drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c  |  7 ---
 drivers/gpu/drm/amd/amdgpu/vce_v4_0.c  |  4
 drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c  |  3 ---
 drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c  |  9 -
 drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c  | 11 +--
 20 files changed, 10 insertions(+), 104 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/cik_sdma.c b/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
index 4274ccf765de..cb3b3a0a1348 100644
--- a/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
+++ b/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
@@ -320,8 +320,6 @@ static void cik_sdma_gfx_stop(struct amdgpu_device *adev)
 		WREG32(mmSDMA0_GFX_RB_CNTL + sdma_offsets[i], rb_cntl);
 		WREG32(mmSDMA0_GFX_IB_CNTL + sdma_offsets[i], 0);
 	}
-	sdma0->sched.ready = false;
-	sdma1->sched.ready = false;
 }

 /**
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 7b6158320400..36ce67ce4800 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -2391,10 +2391,6 @@ static int gfx_v10_0_cp_gfx_enable(struct amdgpu_device *adev, bool enable)
 	tmp = REG_SET_FIELD(tmp, CP_ME_CNTL, ME_HALT, enable ? 0 : 1);
 	tmp = REG_SET_FIELD(tmp, CP_ME_CNTL, PFP_HALT, enable ? 0 : 1);
 	tmp = REG_SET_FIELD(tmp, CP_ME_CNTL, CE_HALT, enable ? 0 : 1);
-	if (!enable) {
-		for (i = 0; i < adev->gfx.num_gfx_rings; i++)
-			adev->gfx.gfx_ring[i].sched.ready = false;
-	}
 	WREG32_SOC15(GC, 0, mmCP_ME_CNTL, tmp);

 	for (i = 0; i < adev->usec_timeout; i++) {
@@ -2869,16 +2865,12 @@ static int gfx_v10_0_cp_gfx_resume(struct amdgpu_device *adev)
 static void gfx_v10_0_cp_compute_enable(struct amdgpu_device *adev, bool enable)
 {
-	int i;
-
 	if (enable) {
 		WREG32_SOC15(GC, 0, mmCP_MEC_CNTL, 0);
 	} else {
 		WREG32_SOC15(GC, 0, mmCP_MEC_CNTL,
 			     (CP_MEC_CNTL__MEC_ME1_HALT_MASK |
 			      CP_MEC_CNTL__MEC_ME2_HALT_MASK));
-		for (i = 0; i < adev->gfx.num_compute_rings; i++)
-			adev->gfx.compute_ring[i].sched.ready = false;
 		adev->gfx.kiq.ring.sched.ready = false;
 	}
 	udelay(50);
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
index 31f44d05e606..e462a099dbda 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
@@ -1950,7 +1950,6 @@ static int gfx_v6_0_ring_test_ib(struct amdgpu_ring *ring, long timeout)
 static void gfx_v6_0_cp_gfx_enable(struct amdgpu_device *adev, bool enable)
 {
-	int i;
 	if (enable) {
 		WREG32(mmCP_ME_CNTL, 0);
 	} else {
@@ -1958,10 +1957,6 @@ static void gfx_v6_0_cp_gfx_enable(struct amdgpu_device *adev, bool enable)
 				      CP_ME_CNTL__PFP_HALT_MASK |
 				      CP_ME_CNTL__CE_HALT_MASK));
 		WREG32(mmSCRATCH_UMSK, 0);
-		for (i = 0; i < adev->gfx.num_gfx_rings; i++)
-			adev->gfx.gfx_ring[i].sched.ready = false;
-		for (i = 0; i < adev->gfx.num_compute_rings; i++)
-			adev->gfx.compute_ring[i].sched.ready = false;
 	}
 	udelay(50);
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
index 8f20a5dd44fe..9bc8673c83ac 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
@@ -2431,15 +2431,12 @@ static
Re: [PATCH] drm/amdgpu: stop disable the scheduler during HW fini
Acked-by: Nirmoy Das