Re: [PATCH] drm/amdgpu: wait for all rings to drain before runtime suspending
On Wed, Dec 11, 2019 at 8:07 AM Christian König wrote:
>
> On 11.12.19 at 03:26, zhoucm1 wrote:
> >
> > On 2019/12/11 6:08 AM, Alex Deucher wrote:
> >> Add a safety check to runtime suspend to make sure all outstanding
> >> fences have signaled before we suspend. Doesn't fix any known issue.
> >>
> >> We already do this via the fence driver suspend function, but we
> >> just force completion rather than bailing. This bails on runtime
> >> suspend so we can try again later once the fences are signaled to
> >> avoid missing any outstanding work.
> >
> > The idea sounds OK to me, but if you want to drain the rings, you
> > should make sure no more submission, right?
> >
> > So you should park all schedulers before waiting for all outstanding
> > fences completed.
>
> At that point userspace should already be put to hold, so no new
> submissions. But it probably won't hurt stopping the scheduler anyway.
>

Any ioctl calls will wake the hw again or increase the usage count.

> But another issue I see is what happens if we locked up the hardware?

Regular GPU reset would kick in eventually.

Alex

> Christian.
>
> >
> > -David
> >
> >>
> >> Signed-off-by: Alex Deucher
> >> ---
> >>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 12 +++-
> >>  1 file changed, 11 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> >> index 2f367146c72c..81322b0a8acf 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> >> @@ -1214,13 +1214,23 @@ static int amdgpu_pmops_runtime_suspend(struct device *dev)
> >>  	struct pci_dev *pdev = to_pci_dev(dev);
> >>  	struct drm_device *drm_dev = pci_get_drvdata(pdev);
> >>  	struct amdgpu_device *adev = drm_dev->dev_private;
> >> -	int ret;
> >> +	int ret, i;
> >>
> >>  	if (!adev->runpm) {
> >>  		pm_runtime_forbid(dev);
> >>  		return -EBUSY;
> >>  	}
> >>
> >> +	/* wait for all rings to drain before suspending */
> >> +	for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
> >> +		struct amdgpu_ring *ring = adev->rings[i];
> >> +		if (ring && ring->sched.ready) {
> >> +			ret = amdgpu_fence_wait_empty(ring);
> >> +			if (ret)
> >> +				return -EBUSY;
> >> +		}
> >> +	}
> >> +
> >>  	if (amdgpu_device_supports_boco(drm_dev))
> >>  		drm_dev->switch_power_state = DRM_SWITCH_POWER_CHANGING;
> >>  	drm_kms_helper_poll_disable(drm_dev);
>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH] drm/amdgpu: wait for all rings to drain before runtime suspending
On 11.12.19 at 03:26, zhoucm1 wrote:
>
> On 2019/12/11 6:08 AM, Alex Deucher wrote:
>> Add a safety check to runtime suspend to make sure all outstanding
>> fences have signaled before we suspend. Doesn't fix any known issue.
>>
>> We already do this via the fence driver suspend function, but we
>> just force completion rather than bailing. This bails on runtime
>> suspend so we can try again later once the fences are signaled to
>> avoid missing any outstanding work.
>
> The idea sounds OK to me, but if you want to drain the rings, you
> should make sure no more submission, right?
>
> So you should park all schedulers before waiting for all outstanding
> fences completed.

At that point userspace should already be put to hold, so no new
submissions. But it probably won't hurt stopping the scheduler anyway.

But another issue I see is what happens if we locked up the hardware?

Christian.

>
> -David
>
>>
>> Signed-off-by: Alex Deucher
>> ---
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 12 +++-
>>  1 file changed, 11 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> index 2f367146c72c..81322b0a8acf 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> @@ -1214,13 +1214,23 @@ static int amdgpu_pmops_runtime_suspend(struct device *dev)
>>  	struct pci_dev *pdev = to_pci_dev(dev);
>>  	struct drm_device *drm_dev = pci_get_drvdata(pdev);
>>  	struct amdgpu_device *adev = drm_dev->dev_private;
>> -	int ret;
>> +	int ret, i;
>>
>>  	if (!adev->runpm) {
>>  		pm_runtime_forbid(dev);
>>  		return -EBUSY;
>>  	}
>>
>> +	/* wait for all rings to drain before suspending */
>> +	for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
>> +		struct amdgpu_ring *ring = adev->rings[i];
>> +		if (ring && ring->sched.ready) {
>> +			ret = amdgpu_fence_wait_empty(ring);
>> +			if (ret)
>> +				return -EBUSY;
>> +		}
>> +	}
>> +
>>  	if (amdgpu_device_supports_boco(drm_dev))
>>  		drm_dev->switch_power_state = DRM_SWITCH_POWER_CHANGING;
>>  	drm_kms_helper_poll_disable(drm_dev);
Re: [PATCH] drm/amdgpu: wait for all rings to drain before runtime suspending
On 2019/12/11 6:08 AM, Alex Deucher wrote:
> Add a safety check to runtime suspend to make sure all outstanding
> fences have signaled before we suspend. Doesn't fix any known issue.
>
> We already do this via the fence driver suspend function, but we
> just force completion rather than bailing. This bails on runtime
> suspend so we can try again later once the fences are signaled to
> avoid missing any outstanding work.

The idea sounds OK to me, but if you want to drain the rings, you
should make sure no more submission, right?

So you should park all schedulers before waiting for all outstanding
fences completed.

-David

>
> Signed-off-by: Alex Deucher
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 12 +++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index 2f367146c72c..81322b0a8acf 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -1214,13 +1214,23 @@ static int amdgpu_pmops_runtime_suspend(struct device *dev)
>  	struct pci_dev *pdev = to_pci_dev(dev);
>  	struct drm_device *drm_dev = pci_get_drvdata(pdev);
>  	struct amdgpu_device *adev = drm_dev->dev_private;
> -	int ret;
> +	int ret, i;
>
>  	if (!adev->runpm) {
>  		pm_runtime_forbid(dev);
>  		return -EBUSY;
>  	}
>
> +	/* wait for all rings to drain before suspending */
> +	for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
> +		struct amdgpu_ring *ring = adev->rings[i];
> +		if (ring && ring->sched.ready) {
> +			ret = amdgpu_fence_wait_empty(ring);
> +			if (ret)
> +				return -EBUSY;
> +		}
> +	}
> +
>  	if (amdgpu_device_supports_boco(drm_dev))
>  		drm_dev->switch_power_state = DRM_SWITCH_POWER_CHANGING;
>  	drm_kms_helper_poll_disable(drm_dev);
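The ordering David suggests (stop new submissions first, then check for outstanding fences) can be sketched in plain userspace C. This is only an illustration under stated assumptions, not driver code: sketch_ring, sketch_park_all, sketch_drain_after_park and SKETCH_MAX_RINGS are hypothetical names that do not exist in amdgpu, "parked" merely models a scheduler that accepts no new jobs, and unparking on the bail path is an assumption of this sketch rather than something the thread specifies.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_RINGS 4	/* hypothetical; stands in for AMDGPU_MAX_RINGS */

/* Hypothetical stand-in for a ring plus its scheduler state. */
struct sketch_ring {
	bool ready;			/* models ring->sched.ready */
	bool parked;			/* true while no new jobs are accepted */
	unsigned int outstanding;	/* fences emitted but not yet signaled */
};

/* Park or unpark the scheduler of every ready ring. */
static void sketch_park_all(struct sketch_ring *rings[SKETCH_MAX_RINGS], bool park)
{
	int i;

	assert(rings != NULL);
	for (i = 0; i < SKETCH_MAX_RINGS; i++)
		if (rings[i] && rings[i]->ready)
			rings[i]->parked = park;
}

/*
 * Park first so nothing new can be queued, then look for outstanding
 * fences. If any ready ring is still busy, unpark everything and bail
 * with -EBUSY so the suspend can be retried later.
 */
static int sketch_drain_after_park(struct sketch_ring *rings[SKETCH_MAX_RINGS])
{
	int i;

	sketch_park_all(rings, true);
	for (i = 0; i < SKETCH_MAX_RINGS; i++) {
		struct sketch_ring *ring = rings[i];

		if (ring && ring->ready && ring->outstanding != 0) {
			sketch_park_all(rings, false);
			return -EBUSY;
		}
	}
	return 0;	/* schedulers stay parked; the caller goes on to suspend */
}
```

With every ready ring idle the function returns 0 and leaves the schedulers parked; with one busy ring it returns -EBUSY and all rings are unparked again.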
[PATCH] drm/amdgpu: wait for all rings to drain before runtime suspending
Add a safety check to runtime suspend to make sure all outstanding
fences have signaled before we suspend. Doesn't fix any known issue.

We already do this via the fence driver suspend function, but we
just force completion rather than bailing. This bails on runtime
suspend so we can try again later once the fences are signaled to
avoid missing any outstanding work.

Signed-off-by: Alex Deucher
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 2f367146c72c..81322b0a8acf 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -1214,13 +1214,23 @@ static int amdgpu_pmops_runtime_suspend(struct device *dev)
 	struct pci_dev *pdev = to_pci_dev(dev);
 	struct drm_device *drm_dev = pci_get_drvdata(pdev);
 	struct amdgpu_device *adev = drm_dev->dev_private;
-	int ret;
+	int ret, i;
 
 	if (!adev->runpm) {
 		pm_runtime_forbid(dev);
 		return -EBUSY;
 	}
 
+	/* wait for all rings to drain before suspending */
+	for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
+		struct amdgpu_ring *ring = adev->rings[i];
+		if (ring && ring->sched.ready) {
+			ret = amdgpu_fence_wait_empty(ring);
+			if (ret)
+				return -EBUSY;
+		}
+	}
+
 	if (amdgpu_device_supports_boco(drm_dev))
 		drm_dev->switch_power_state = DRM_SWITCH_POWER_CHANGING;
 	drm_kms_helper_poll_disable(drm_dev);
-- 
2.23.0
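
For readers who want to poke at the control flow outside the kernel, the loop this patch adds can be modeled in a few lines of userspace C. This is a rough sketch under assumptions, not the driver's implementation: mock_ring, mock_fence_wait_empty, mock_runtime_suspend_check and MOCK_MAX_RINGS are hypothetical stand-ins for the amdgpu types and helpers, and the mock reports outstanding fences immediately instead of waiting for them as amdgpu_fence_wait_empty() would.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MOCK_MAX_RINGS 4	/* hypothetical; stands in for AMDGPU_MAX_RINGS */

/* Hypothetical stand-in for struct amdgpu_ring: only what the check needs. */
struct mock_ring {
	bool ready;		/* models ring->sched.ready */
	unsigned int emitted;	/* sequence number of the last fence emitted */
	unsigned int signaled;	/* sequence number of the last fence signaled */
};

/*
 * Stand-in for amdgpu_fence_wait_empty(): returns 0 when nothing is
 * outstanding and a nonzero error otherwise. Unlike the real helper,
 * it never blocks.
 */
static int mock_fence_wait_empty(const struct mock_ring *ring)
{
	assert(ring != NULL);
	return ring->signaled == ring->emitted ? 0 : -EBUSY;
}

/*
 * Mirrors the loop in the patch: skip missing or not-ready rings and
 * bail with -EBUSY as soon as one ready ring still has work pending.
 */
static int mock_runtime_suspend_check(struct mock_ring *rings[MOCK_MAX_RINGS])
{
	int i;

	for (i = 0; i < MOCK_MAX_RINGS; i++) {
		struct mock_ring *ring = rings[i];

		if (ring && ring->ready && mock_fence_wait_empty(ring))
			return -EBUSY;
	}
	return 0;
}
```

A not-ready ring with pending fences does not block the suspend in this model, matching the ring->sched.ready test in the patch; only a ready ring with unsignaled fences makes the check return -EBUSY.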