Re: [RFC v2 4/8] drm/amdgpu: Serialize non TDR gpu recovery with TDRs
I already did. Thanks to Shayun I already tested on XGMI SRIOV and it looks OK. What I need now is code review, mostly on the new patches (8-12). I hope you, Monk, Shayun, Lijo and Christian can help with that.

Andrey

From: Chen, JingWen
Sent: 06 February 2022 21:41
To: Grodzovsky, Andrey; Christian König; Koenig, Christian; Lazar, Lijo; dri-devel@lists.freedesktop.org; amd-...@lists.freedesktop.org; Chen, JingWen
Cc: Chen, Horace; Liu, Monk
Subject: Re: [RFC v2 4/8] drm/amdgpu: Serialize non TDR gpu recovery with TDRs

Hi Andrey,

I don't have any XGMI machines here; maybe you can reach out to Shaoyun for help.

On 2022/1/29 12:57 AM, Grodzovsky, Andrey wrote:

Just a gentle ping.

Andrey

From: Grodzovsky, Andrey
Sent: 26 January 2022 10:52
To: Christian König; Koenig, Christian; Lazar, Lijo; dri-devel@lists.freedesktop.org; amd-...@lists.freedesktop.org; Chen, JingWen
Cc: Chen, Horace; Liu, Monk
Subject: Re: [RFC v2 4/8] drm/amdgpu: Serialize non TDR gpu recovery with TDRs

JingWen - could you maybe give those patches a try on an SRIOV XGMI system? If you see issues, maybe you could let me connect and debug. My SRIOV XGMI system, which Shayun kindly arranged for me, is not loading the driver with my drm-misc-next branch even without my patches.

Andrey

On 2022-01-17 14:21, Andrey Grodzovsky wrote:
On 2022-01-17 2:17 p.m., Christian König wrote:
On 17.01.22 at 20:14, Andrey Grodzovsky wrote:

[Andrey] Ping on the question.

[Christian] Oh, my! That was already more than a week ago and is completely swapped out of my head again.

On 2022-01-05 1:11 p.m., Andrey Grodzovsky wrote:

[Andrey] Also, what about having the reset_active or in_reset flag in the reset_domain itself?

[Christian] Off hand that sounds like a good idea.

[Andrey] What then about the adev->reset_sem semaphore? Should we also move this to reset_domain? Both of the moves have functional implications only for the XGMI case, because there will be contention over accessing those single-instance variables from multiple devices, while now each device has its own copy.

[Christian] Since this is a rw semaphore, that should be unproblematic I think. It could just be that the cache line of the lock then plays ping-pong between the CPU cores.

[Andrey] What benefit does the centralization into reset_domain give - is it, for example, to prevent one device in a hive trying to access through MMIO another one's VRAM (shared FB memory) while the other one goes through reset?

[Christian] I think that this is the killer argument for a centralized lock, yes.

[Andrey] No problem, I will add a patch centralizing both flags into the reset domain and resend.

Andrey
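For illustration, the centralization discussed above could look roughly like the following minimal sketch - hypothetical names only, not the actual amdgpu implementation: one rw_semaphore plus one in_reset flag live in a reset_domain object shared by every device in an XGMI hive, so at most one reset runs per hive and cross-device MMIO accesses can take the lock for read.

#include <linux/atomic.h>
#include <linux/rwsem.h>

/* Sketch only: a per-hive reset domain instead of per-device state. */
struct reset_domain {
	struct rw_semaphore sem;	/* held for write while a reset runs */
	atomic_t in_reset;		/* nonzero while any hive member resets */
};

/* Reset path: only one caller per hive wins. */
static bool reset_domain_try_begin(struct reset_domain *d)
{
	if (atomic_cmpxchg(&d->in_reset, 0, 1) != 0)
		return false;		/* a reset is already in flight */
	down_write(&d->sem);
	return true;
}

static void reset_domain_end(struct reset_domain *d)
{
	up_write(&d->sem);
	atomic_set(&d->in_reset, 0);
}

/* Access path, e.g. MMIO into another hive member's VRAM: blocks
 * while any device in the domain is in reset. */
static void reset_domain_access_begin(struct reset_domain *d)
{
	down_read(&d->sem);
}

static void reset_domain_access_end(struct reset_domain *d)
{
	up_read(&d->sem);
}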
Re: [PATCH 1/2] drm/sched: fix the bug of time out calculation(v4)
AFAIK this one is independent. Christian, can you confirm?

Andrey

From: amd-gfx on behalf of Alex Deucher
Sent: 14 September 2021 15:33
To: Christian König
Cc: Liu, Monk; amd-gfx list; Maling list - DRI developers
Subject: Re: [PATCH 1/2] drm/sched: fix the bug of time out calculation(v4)

Was this fix independent of the other discussions? Should this be applied to drm-misc?

Alex

On Wed, Sep 1, 2021 at 4:42 PM Alex Deucher wrote:
>
> On Wed, Sep 1, 2021 at 2:50 AM Christian König wrote:
> >
> > Am 01.09.21 um 02:46 schrieb Monk Liu:
> > > issue:
> > > in cleanup_job the cancel_delayed_work will cancel a TO timer
> > > even though its corresponding job is still running.
> > >
> > > fix:
> > > do not cancel the timer in cleanup_job, instead do the cancelling
> > > only when the heading job is signaled, and if there is a "next" job
> > > we start_timeout again.
> > >
> > > v2:
> > > further clean up the logic, and do the TDR timer cancelling if the
> > > signaled job is the last one in its scheduler.
> > >
> > > v3:
> > > change the issue description
> > > remove the cancel_delayed_work in the beginning of the cleanup_job
> > > restore the implementation of drm_sched_job_begin.
> > >
> > > v4:
> > > remove the kthread_should_park() check in the cleanup_job routine,
> > > we should clean up the signaled job asap
> > >
> > > TODO:
> > > 1) introduce pause/resume scheduler in job_timeout to serialize the
> > >    handling of scheduler and job_timeout.
> > > 2) drop the bad job's del and insert in scheduler due to the above
> > >    serialization (no race issue anymore with the serialization)
> > >
> > > tested-by: jingwen
> > > Signed-off-by: Monk Liu
> >
> > Reviewed-by: Christian König
>
> Are you planning to push this to drm-misc?
>
> Alex
>
> > > ---
> > >   drivers/gpu/drm/scheduler/sched_main.c | 26 +-
> > >   1 file changed, 9 insertions(+), 17 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > > index a2a9536..3e0bbc7 100644
> > > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > > @@ -676,15 +676,6 @@ drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
> > >   {
> > >           struct drm_sched_job *job, *next;
> > >
> > > -         /*
> > > -          * Don't destroy jobs while the timeout worker is running OR thread
> > > -          * is being parked and hence assumed to not touch pending_list
> > > -          */
> > > -         if ((sched->timeout != MAX_SCHEDULE_TIMEOUT &&
> > > -             !cancel_delayed_work(&sched->work_tdr)) ||
> > > -             kthread_should_park())
> > > -                 return NULL;
> > > -
> > >           spin_lock(&sched->job_list_lock);
> > >
> > >           job = list_first_entry_or_null(&sched->pending_list,
> > > @@ -693,17 +684,21 @@ drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
> > >           if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
> > >                   /* remove job from pending_list */
> > >                   list_del_init(&job->list);
> > > +
> > > +                 /* cancel this job's TO timer */
> > > +                 cancel_delayed_work(&sched->work_tdr);
> > >                   /* make the scheduled timestamp more accurate */
> > >                   next = list_first_entry_or_null(&sched->pending_list,
> > >                                                   typeof(*next), list);
> > > -                 if (next)
> > > +
> > > +                 if (next) {
> > >                           next->s_fence->scheduled.timestamp =
> > >                                   job->s_fence->finished.timestamp;
> > > -
> > > +                         /* start TO timer for next job */
> > > +                         drm_sched_start_timeout(sched);
> > > +                 }
> > >           } else {
> > >                   job = NULL;
> > > -                 /* queue timeout for next job */
> > > -                 drm_sched_start_timeout(sched);
> > >           }
> > >
> > >           spin_unlock(&sched->job_list_lock);
> > > @@ -791,11 +786,8 @@ static int drm_sched_main(void *param)
> > >                                            (entity = drm_sched_select_entity(sched))) ||
> > >                                            kthread_should_stop());
> > >
> > > -         if (cleanup_job) {
> > > +         if (cleanup_job)
> > >                   sched->ops->free_job(cleanup_job);
> > > -                 /* queue timeout for next job */
> > > -                 drm_sched_start_timeout(sched);
> > > -         }
> > >
> > >           if (!entity)
> > >                   continue;
Re: [PATCH 1/2] drm/sched: fix the bug of time out calculation(v3)
What about removing kthread_should_park()? We decided it's useless, as far as I remember.

Andrey

From: amd-gfx on behalf of Liu, Monk
Sent: 31 August 2021 20:24
To: Liu, Monk; amd-...@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Subject: RE: [PATCH 1/2] drm/sched: fix the bug of time out calculation(v3)

Ping Christian, Andrey

Can we merge this patch first? This is a standalone patch for the timer.

Thanks

--
Monk Liu | Cloud-GPU Core team
--

-----Original Message-----
From: Monk Liu
Sent: Tuesday, August 31, 2021 6:36 PM
To: amd-...@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org; Liu, Monk
Subject: [PATCH 1/2] drm/sched: fix the bug of time out calculation(v3)

issue:
in cleanup_job the cancel_delayed_work will cancel a TO timer even though its corresponding job is still running.

fix:
do not cancel the timer in cleanup_job, instead do the cancelling only when the heading job is signaled, and if there is a "next" job we start_timeout again.

v2:
further clean up the logic, and do the TDR timer cancelling if the signaled job is the last one in its scheduler.

v3:
change the issue description
remove the cancel_delayed_work in the beginning of the cleanup_job
restore the implementation of drm_sched_job_begin.

TODO:
1) introduce pause/resume scheduler in job_timeout to serialize the handling of scheduler and job_timeout.
2) drop the bad job's del and insert in scheduler due to the above serialization (no race issue anymore with the serialization)

tested-by: jingwen
Signed-off-by: Monk Liu
---
 drivers/gpu/drm/scheduler/sched_main.c | 25 ++---
 1 file changed, 10 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index a2a9536..ecf8140 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -676,13 +676,7 @@ drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
 {
 	struct drm_sched_job *job, *next;

-	/*
-	 * Don't destroy jobs while the timeout worker is running OR thread
-	 * is being parked and hence assumed to not touch pending_list
-	 */
-	if ((sched->timeout != MAX_SCHEDULE_TIMEOUT &&
-	    !cancel_delayed_work(&sched->work_tdr)) ||
-	    kthread_should_park())
+	if (kthread_should_park())
 		return NULL;

 	spin_lock(&sched->job_list_lock);
@@ -693,17 +687,21 @@ drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
 	if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
 		/* remove job from pending_list */
 		list_del_init(&job->list);
+
+		/* cancel this job's TO timer */
+		cancel_delayed_work(&sched->work_tdr);
 		/* make the scheduled timestamp more accurate */
 		next = list_first_entry_or_null(&sched->pending_list,
 						typeof(*next), list);
-		if (next)
+
+		if (next) {
 			next->s_fence->scheduled.timestamp =
 				job->s_fence->finished.timestamp;
-
+			/* start TO timer for next job */
+			drm_sched_start_timeout(sched);
+		}
 	} else {
 		job = NULL;
-		/* queue timeout for next job */
-		drm_sched_start_timeout(sched);
 	}

 	spin_unlock(&sched->job_list_lock);
@@ -791,11 +789,8 @@ static int drm_sched_main(void *param)
 					 (entity = drm_sched_select_entity(sched))) ||
 					 kthread_should_stop());

-		if (cleanup_job) {
+		if (cleanup_job)
 			sched->ops->free_job(cleanup_job);
-			/* queue timeout for next job */
-			drm_sched_start_timeout(sched);
-		}

 		if (!entity)
 			continue;
--
2.7.4
Re: [PATCH 0/7] libdrm tests for hot-unplug feature
Is libdrm on gitlab? I wasn't aware of this. I assumed code reviews still go through dri-devel.

Andrey

From: Alex Deucher
Sent: 03 June 2021 17:20
To: Grodzovsky, Andrey
Cc: Maling list - DRI developers; amd-gfx list; Deucher, Alexander; Christian König
Subject: Re: [PATCH 0/7] libdrm tests for hot-unplug feature

Please open a gitlab MR for these.

Alex

On Tue, Jun 1, 2021 at 4:17 PM Andrey Grodzovsky wrote:
>
> Adding some tests to accompany the recently added hot-unplug
> feature. For now the test suite is disabled until the feature
> propagates from drm-misc-next to drm-next.
>
> Andrey Grodzovsky (7):
>   tests/amdgpu: Fix valgrind warning
>   xf86drm: Add function to retrieve char device path
>   test/amdgpu: Add helper functions for hot unplug
>   test/amdgpu/hotunplug: Add test suite for GPU unplug
>   test/amdgpu/hotunplug: Add basic test
>   tests/amdgpu/hotunplug: Add unplug with cs test.
>   tests/amdgpu/hotunplug: Add hotunplug with exported bo test
>
>  tests/amdgpu/amdgpu_test.c     |  42 +++-
>  tests/amdgpu/amdgpu_test.h     |  26 +++
>  tests/amdgpu/basic_tests.c     |   5 +-
>  tests/amdgpu/hotunplug_tests.c | 357 +
>  tests/amdgpu/meson.build       |   1 +
>  xf86drm.c                      |  23 +++
>  xf86drm.h                      |   1 +
>  7 files changed, 450 insertions(+), 5 deletions(-)
>  create mode 100644 tests/amdgpu/hotunplug_tests.c
>
> --
> 2.25.1
>
> ___
> amd-gfx mailing list
> amd-...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
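For illustration, the basic flow these tests exercise can be sketched in plain userspace C - the device node and PCI address below are illustrative assumptions, not the actual test-suite code, which discovers them at runtime and drives the GPU through the libdrm amdgpu API:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <xf86drm.h>

int main(void)
{
	/* Illustrative path; the real tests discover the node at runtime. */
	int fd = open("/dev/dri/card0", O_RDWR | O_CLOEXEC);
	if (fd < 0)
		return 1;

	/* Simulate hot-unplug through sysfs, like the test helpers do:
	 * echo 1 > /sys/bus/pci/devices/<bdf>/remove (bdf is hypothetical) */
	int rm = open("/sys/bus/pci/devices/0000:03:00.0/remove", O_WRONLY);
	if (rm >= 0) {
		write(rm, "1", 1);
		close(rm);
	}

	/* With the device gone, driver calls on the still-open fd are
	 * expected to fail cleanly instead of crashing the kernel. */
	drmVersionPtr ver = drmGetVersion(fd);
	if (!ver)
		printf("post-unplug access failed cleanly, as expected\n");
	else
		drmFreeVersion(ver);

	close(fd);
	return 0;
}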
Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object
Ok then, I guess I will proceed with the dummy pages list implementation then.

Andrey

From: Koenig, Christian
Sent: 08 January 2021 09:52
To: Grodzovsky, Andrey; Daniel Vetter
Cc: amd-...@lists.freedesktop.org; dri-devel@lists.freedesktop.org; daniel.vet...@ffwll.ch; r...@kernel.org; l.st...@pengutronix.de; yuq...@gmail.com; e...@anholt.net; Deucher, Alexander; gre...@linuxfoundation.org; ppaala...@gmail.com; Wentland, Harry
Subject: Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object

Mhm, I'm not aware of any left-over pointer between TTM and GEM, and we worked quite hard on reducing the size of the amdgpu_bo, so another extra pointer just for that corner case would suck quite a bit.

Christian.

Am 08.01.21 um 15:46 schrieb Andrey Grodzovsky:
> Daniel had some objections to this (see below) and so I guess I need
> you both to agree on the approach before I proceed.
>
> Andrey
>
> On 1/8/21 9:33 AM, Christian König wrote:
>> Am 08.01.21 um 15:26 schrieb Andrey Grodzovsky:
>>> Hey Christian, just a ping.
>>
>> Was there any question for me here?
>>
>> As far as I can see the best approach would still be to fill the VMA
>> with a single dummy page and avoid pointers in the GEM object.
>>
>> Christian.
>>
>>>
>>> Andrey
>>>
>>> On 1/7/21 11:37 AM, Andrey Grodzovsky wrote:
>>>>
>>>> On 1/7/21 11:30 AM, Daniel Vetter wrote:
>>>>> On Thu, Jan 07, 2021 at 11:26:52AM -0500, Andrey Grodzovsky wrote:
>>>>>> On 1/7/21 11:21 AM, Daniel Vetter wrote:
>>>>>>> On Tue, Jan 05, 2021 at 04:04:16PM -0500, Andrey Grodzovsky wrote:
>>>>>>>> On 11/23/20 3:01 AM, Christian König wrote:
>>>>>>>>> Am 23.11.20 um 05:54 schrieb Andrey Grodzovsky:
>>>>>>>>>> On 11/21/20 9:15 AM, Christian König wrote:
>>>>>>>>>>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>>>>>>>>>>>> Will be used to reroute CPU mapped BO's page faults once
>>>>>>>>>>>> device is removed.
>>>>>>>>>>> Uff, one page for each exported DMA-buf? That's not
>>>>>>>>>>> something we can do.
>>>>>>>>>>>
>>>>>>>>>>> We need to find a different approach here.
>>>>>>>>>>>
>>>>>>>>>>> Can't we call alloc_page() on each fault and link them together
>>>>>>>>>>> so they are freed when the device is finally reaped?
>>>>>>>>>> For sure better to optimize and allocate on demand when we reach
>>>>>>>>>> this corner case, but why the linking?
>>>>>>>>>> Shouldn't drm_prime_gem_destroy be a good enough place to free?
>>>>>>>>> I want to avoid keeping the page in the GEM object.
>>>>>>>>>
>>>>>>>>> What we can do is to allocate a page on demand for each fault
>>>>>>>>> and link them together in the bdev instead.
>>>>>>>>>
>>>>>>>>> And when the bdev is then finally destroyed after the last
>>>>>>>>> application closed we can finally release all of them.
>>>>>>>>>
>>>>>>>>> Christian.
>>>>>>>> Hey, started to implement this and then realized that by
>>>>>>>> allocating a page for each fault indiscriminately
>>>>>>>> we will be allocating a new page for each faulting virtual
>>>>>>>> address within a VA range belonging to the same BO,
>>>>>>>> and this is obviously too much and not the intention. Should I
>>>>>>>> instead use, let's say, a hashtable with the hash
>>>>>>>> key being the faulting BO address, to actually keep allocating and
>>>>>>>> reusing the same dummy zero page per GEM BO
>>>>>>>> (or for that matter DRM file object address for non-imported
>>>>>>>> BOs)?
>>>>>>> Why do we need a hashtable? All the sw structures to track this
>>>>>>> should s
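For illustration, the scheme agreed on above - allocate a dummy page on demand at fault time and keep it on a per-device list rather than in the GEM object, freeing everything when the device is finally released - could be sketched like this. Hypothetical names, not the actual patch; the refinement Daniel raises (reusing one page per BO instead of one per fault) is left out:

#include <linux/list.h>
#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

struct dummy_page {
	struct page *page;
	struct list_head node;
};

struct dummy_dev {	/* stands in for the real device/bdev structure */
	struct list_head dummy_pages;
	spinlock_t lock;
};

static vm_fault_t dummy_page_fault(struct dummy_dev *ddev,
				   struct vm_fault *vmf)
{
	struct dummy_page *dpage = kzalloc(sizeof(*dpage), GFP_KERNEL);

	if (!dpage)
		return VM_FAULT_OOM;

	dpage->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
	if (!dpage->page) {
		kfree(dpage);
		return VM_FAULT_OOM;
	}

	/* Track the page in the device, not in the GEM object. */
	spin_lock(&ddev->lock);
	list_add(&dpage->node, &ddev->dummy_pages);
	spin_unlock(&ddev->lock);

	/* Back the faulting address with the zeroed dummy page. */
	return vmf_insert_pfn(vmf->vma, vmf->address,
			      page_to_pfn(dpage->page));
}

/* Called when the device is finally reaped, after the last user closed. */
static void dummy_dev_release(struct dummy_dev *ddev)
{
	struct dummy_page *dpage, *tmp;

	list_for_each_entry_safe(dpage, tmp, &ddev->dummy_pages, node) {
		__free_page(dpage->page);
		list_del(&dpage->node);
		kfree(dpage);
	}
}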
Re: [PATCH v3 10/12] drm/amdgpu: Avoid sysfs dirs removal post device unplug
Hey, just a ping on my comments/question below.

Andrey

From: Grodzovsky, Andrey
Sent: 25 November 2020 12:39
To: Daniel Vetter
Cc: amd-gfx list; dri-devel; Christian König; Rob Herring; Lucas Stach; Qiang Yu; Anholt, Eric; Pekka Paalanen; Deucher, Alexander; Greg KH; Wentland, Harry
Subject: Re: [PATCH v3 10/12] drm/amdgpu: Avoid sysfs dirs removal post device unplug

On 11/25/20 4:04 AM, Daniel Vetter wrote:
On Tue, Nov 24, 2020 at 11:27 PM Andrey Grodzovsky wrote:
On 11/24/20 9:49 AM, Daniel Vetter wrote:
On Sat, Nov 21, 2020 at 12:21:20AM -0500, Andrey Grodzovsky wrote:

Avoids NULL ptr due to kobj->sd being unset on device removal.

Signed-off-by: Andrey Grodzovsky
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c   | 4 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c | 4 +++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index caf828a..812e592 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -27,6 +27,7 @@
 #include <...>
 #include <...>
 #include <...>
+#include <...>

 #include "amdgpu.h"
 #include "amdgpu_ras.h"
@@ -1043,7 +1044,8 @@ static int amdgpu_ras_sysfs_remove_feature_node(struct amdgpu_device *adev)
 		.attrs = attrs,
 	};

-	sysfs_remove_group(&adev->dev->kobj, &group);
+	if (!drm_dev_is_unplugged(&adev->ddev))
+		sysfs_remove_group(&adev->dev->kobj, &group);

[Daniel] This looks wrong. sysfs, like any other interface, should be unconditionally thrown out when we do the drm_dev_unregister. Whether hotunplugged or not shouldn't matter at all. Either this isn't needed at all, or something is wrong with the ordering here. But definitely fishy.

-Daniel

[Andrey] So technically this is needed because the kobject's sysfs directory entry kobj->sd is set to NULL on device removal (from sysfs_remove_dir), but because we don't finalize the device until the last reference to the drm file is dropped (which can happen later) we end up calling sysfs_remove_file/dir after this pointer is NULL. sysfs_remove_file checks for NULL and aborts while sysfs_remove_dir does not, and that is why I guard against calls to sysfs_remove_dir. But indeed the whole approach in the driver is incorrect, as Greg pointed out - we should use default group attributes instead of explicit calls to the sysfs interface, and this would save those troubles. But again, there is the issue of scope of work: converting all of amdgpu to default group attributes is a somewhat lengthy process with extra testing, as the entire driver is papered with sysfs references, and it seems to me more of a standalone cleanup, just like switching to devm_ and drmm_ work. To me at least it seems that it makes more sense to finalize and push the hot unplug patches so that this new functionality can be part of the driver sooner, and then incrementally improve it by working on those other topics. Just as with devm_/drmm_, I also added sysfs cleanup to my TODO list in the RFC patch.

[Daniel] Hm, whether you solve this with the default group stuff to auto-remove, or remove explicitly at the right time, doesn't matter much. The underlying problem you have here is that it's done way too late.
[Andrey] As far as I understood the default group attrs from reading this article by Greg - https://www.linux.com/news/how-create-sysfs-file-correctly/ - they will be removed together with the device and not too late like now, and I quote from the last paragraph there:

"By setting this value, you don’t have to do anything in your probe() or release() functions at all in order for the sysfs files to be properly created and destroyed whenever your device is added or removed from the system. And you will, most importantly, do it in a race-free manner, which is always a good thing."

To me this seems like the best solution to the late-remove issue. What do you think?

[Daniel] sysfs removal (like all uapi interfaces) needs to be done as part of drm_dev_unregister.

[Andrey] Do you mean we need to trace and aggregate all sysfs file creation within the low level drivers and then call some sysfs release function inside drm_dev_unregister to iterate and release them all?

[Daniel] I guess aside from the split into fini_hw and fini_sw, you also need an unregister_late callback (like we have already for drm_connector, so that e.g. backlight and similar stuff can be unregistered).

[Andrey] Is this the callback you suggest to call from within drm_dev_unregister, and will it be responsible for releasing all sysfs files created within the driver?

Andrey

[Daniel] Papering over the underlying bug like this doesn't really fix much; the lifetimes are still wrong.

-Daniel

 	return 0;
 }

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c b/driver
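For illustration, the default-groups pattern from the article reads roughly like this minimal sketch - made-up attribute name, not amdgpu code. Attributes wired up through dev->groups are created and removed by the driver core together with the device itself, so no explicit sysfs_remove_group() call can run after kobj->sd is gone:

#include <linux/device.h>
#include <linux/sysfs.h>

static ssize_t features_show(struct device *dev,
			     struct device_attribute *attr, char *buf)
{
	return sysfs_emit(buf, "example\n");
}
static DEVICE_ATTR_RO(features);

static struct attribute *example_attrs[] = {
	&dev_attr_features.attr,
	NULL,
};
ATTRIBUTE_GROUPS(example);	/* defines example_groups */

/* At device creation time, before device_add()/device_register(): */
static void example_setup(struct device *dev)
{
	dev->groups = example_groups;
	/* No sysfs calls needed in probe() or release(); creation and
	 * removal are handled race-free by the driver core. */
}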
Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use
Hey Daniel, just a ping on a bunch of questions I posted below.

Andrey

From: Grodzovsky, Andrey
Sent: 25 November 2020 14:34
To: Daniel Vetter; Koenig, Christian
Cc: r...@kernel.org; daniel.vet...@ffwll.ch; dri-devel@lists.freedesktop.org; e...@anholt.net; ppaala...@gmail.com; amd-...@lists.freedesktop.org; gre...@linuxfoundation.org; Deucher, Alexander; l.st...@pengutronix.de; Wentland, Harry; yuq...@gmail.com
Subject: Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use

On 11/25/20 11:36 AM, Daniel Vetter wrote:
> On Wed, Nov 25, 2020 at 01:57:40PM +0100, Christian König wrote:
>> Am 25.11.20 um 11:40 schrieb Daniel Vetter:
>>> On Tue, Nov 24, 2020 at 05:44:07PM +0100, Christian König wrote:
>>>> Am 24.11.20 um 17:22 schrieb Andrey Grodzovsky:
>>>>> On 11/24/20 2:41 AM, Christian König wrote:
>>>>>> Am 23.11.20 um 22:08 schrieb Andrey Grodzovsky:
>>>>>>> On 11/23/20 3:41 PM, Christian König wrote:
>>>>>>>> Am 23.11.20 um 21:38 schrieb Andrey Grodzovsky:
>>>>>>>>> On 11/23/20 3:20 PM, Christian König wrote:
>>>>>>>>>> Am 23.11.20 um 21:05 schrieb Andrey Grodzovsky:
>>>>>>>>>>> On 11/25/20 5:42 AM, Christian König wrote:
>>>>>>>>>>>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>>>>>>>>>>>>> It's needed to drop iommu backed pages on device unplug
>>>>>>>>>>>>> before device's IOMMU group is released.
>>>>>>>>>>>> It would be cleaner if we could do the whole
>>>>>>>>>>>> handling in TTM. I also need to double check
>>>>>>>>>>>> what you are doing with this function.
>>>>>>>>>>>>
>>>>>>>>>>>> Christian.
>>>>>>>>>>> Check the patch "drm/amdgpu: Register IOMMU topology
>>>>>>>>>>> notifier per device." to see how I use it. I don't see
>>>>>>>>>>> why this should go into the TTM mid-layer - the stuff I
>>>>>>>>>>> do inside is vendor specific, and also I don't think TTM
>>>>>>>>>>> is explicitly aware of IOMMU?
>>>>>>>>>>> Do you mean you prefer the IOMMU notifier to be
>>>>>>>>>>> registered from within TTM
>>>>>>>>>>> and then use a hook to call into a vendor specific handler?
>>>>>>>>>> No, that is really vendor specific.
>>>>>>>>>>
>>>>>>>>>> What I meant is to have a function like
>>>>>>>>>> ttm_resource_manager_evict_all() which you only need
>>>>>>>>>> to call and all tt objects are unpopulated.
>>>>>>>>> So instead of this BO list I create and later iterate in
>>>>>>>>> amdgpu from the IOMMU patch, you just want to do it within
>>>>>>>>> TTM with a single function? Makes much more sense.
>>>>>>>> Yes, exactly.
>>>>>>>>
>>>>>>>> The list_empty() checks we have in TTM for the LRU are
>>>>>>>> actually not the best idea, we should now check the
>>>>>>>> pin_count instead. This way we could also have a list of the
>>>>>>>> pinned BOs in TTM.
>>>>>>> So from my IOMMU topology handler I will iterate the TTM LRU for
>>>>>>> the unpinned BOs and this new function for the pinned ones?
>>>>>>> It's probably a good idea to combine both iterations into this
>>>>>>> new function to cover all the BOs allocated on the device.
>>>>>> Yes, that's what I had in my mind as well.
>>>>>>
>>>>>>>> BTW: Have you thought about what happens when we unpopulate
>>>>>>>> a BO while we still try to use a kernel mapping for it? That
>>>>>>>> could have unforeseen consequences.
>>>>>>> Are you asking what happens to kmap or vmap style mapped CPU
>>>>>>> accesses once we drop all the DMA backing pages for a particular
>>>>>>> BO? Because for user mappings
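For illustration, the TTM-level helper Christian suggests could be used on unplug roughly as follows - a sketch assuming the modern TTM interfaces (ttm_manager_type() and ttm_resource_manager_evict_all(); at the time of this thread the device structure was still named ttm_bo_device). The handling of pinned BOs and kernel mappings debated above is left open:

#include <drm/ttm/ttm_device.h>
#include <drm/ttm/ttm_placement.h>
#include <drm/ttm/ttm_resource.h>

static void example_evict_all_on_unplug(struct ttm_device *bdev)
{
	unsigned int i;

	for (i = TTM_PL_SYSTEM; i < TTM_NUM_MEM_TYPES; i++) {
		struct ttm_resource_manager *man = ttm_manager_type(bdev, i);

		if (!man || !man->use_type)
			continue;

		/* Evicts everything on this manager's LRU lists, so the
		 * ttm_tt backing pages can be unpopulated before the
		 * device's IOMMU group goes away. */
		ttm_resource_manager_evict_all(bdev, man);
	}
}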
Re: [PATCH v4] drm/scheduler: Avoid accessing freed bad job.
Christian asked to submit it to drm-misc instead of our drm-next to avoid later conflicts with Steven's patch, which he mentioned in this thread and which is not in drm-next yet. Christian, Alex, once this is merged to drm-misc I guess we need to pull all the latest changes from there to drm-next so the issue Emily reported can be avoided.

Andrey

From: Deng, Emily
Sent: 25 November 2019 16:44:36
To: Grodzovsky, Andrey
Cc: dri-devel@lists.freedesktop.org; amd-...@lists.freedesktop.org; Koenig, Christian; steven.pr...@arm.com; Grodzovsky, Andrey
Subject: RE: [PATCH v4] drm/scheduler: Avoid accessing freed bad job.

Hi Andrey,

Seems you didn't submit this patch?

Best wishes
Emily Deng

>-----Original Message-----
>From: Andrey Grodzovsky
>Sent: Monday, November 25, 2019 12:51 PM
>Cc: dri-devel@lists.freedesktop.org; amd-...@lists.freedesktop.org; Koenig,
>Christian; Deng, Emily; steven.pr...@arm.com; Grodzovsky, Andrey
>Subject: [PATCH v4] drm/scheduler: Avoid accessing freed bad job.
>
>Problem:
>Due to a race between drm_sched_cleanup_jobs in the sched thread and
>drm_sched_job_timedout in the timeout work there is a possibility that the bad job
>is already freed while still being accessed from the timeout thread.
>
>Fix:
>Instead of just peeking at the bad job in the mirror list, remove it from the list
>under lock and then put it back later when we are guaranteed no race with the
>main sched thread is possible, which is after the thread is parked.
>
>v2: Lock around processing ring_mirror_list in drm_sched_cleanup_jobs.
>
>v3: Rebase on top of drm-misc-next. v2 is not needed anymore as
>drm_sched_get_cleanup_job already has a lock there.
>
>v4: Fix comments to reflect latest code in drm-misc.
>
>Signed-off-by: Andrey Grodzovsky
>Reviewed-by: Christian König
>Tested-by: Emily Deng
>---
> drivers/gpu/drm/scheduler/sched_main.c | 27 +++
> 1 file changed, 27 insertions(+)
>
>diff --git a/drivers/gpu/drm/scheduler/sched_main.c
>b/drivers/gpu/drm/scheduler/sched_main.c
>index 6774955..1bf9c40 100644
>--- a/drivers/gpu/drm/scheduler/sched_main.c
>+++ b/drivers/gpu/drm/scheduler/sched_main.c
>@@ -284,10 +284,21 @@ static void drm_sched_job_timedout(struct work_struct *work)
> 	unsigned long flags;
>
> 	sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
>+
>+	/* Protects against concurrent deletion in drm_sched_get_cleanup_job */
>+	spin_lock_irqsave(&sched->job_list_lock, flags);
> 	job = list_first_entry_or_null(&sched->ring_mirror_list,
> 				       struct drm_sched_job, node);
>
> 	if (job) {
>+		/*
>+		 * Remove the bad job so it cannot be freed by concurrent
>+		 * drm_sched_cleanup_jobs. It will be reinserted back after sched->thread
>+		 * is parked at which point it's safe.
>+		 */
>+		list_del_init(&job->node);
>+		spin_unlock_irqrestore(&sched->job_list_lock, flags);
>+
> 		job->sched->ops->timedout_job(job);
>
> 		/*
>@@ -298,6 +309,8 @@ static void drm_sched_job_timedout(struct work_struct *work)
> 			job->sched->ops->free_job(job);
> 			sched->free_guilty = false;
> 		}
>+	} else {
>+		spin_unlock_irqrestore(&sched->job_list_lock, flags);
> 	}
>
> 	spin_lock_irqsave(&sched->job_list_lock, flags);
>@@ -370,6 +383,20 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
> 	kthread_park(sched->thread);
>
> 	/*
>+	 * Reinsert back the bad job here - now it's safe as
>+	 * drm_sched_get_cleanup_job cannot race against us and release the
>+	 * bad job at this point - we parked (waited for) any in progress
>+	 * (earlier) cleanups and drm_sched_get_cleanup_job will not be called
>+	 * now until the scheduler thread is unparked.
>+	 */
>+	if (bad && bad->sched == sched)
>+		/*
>+		 * Add at the head of the queue to reflect it was the earliest
>+		 * job extracted.
>+		 */
>+		list_add(&bad->node, &sched->ring_mirror_list);
>+
>+	/*
> 	 * Iterate the job list from later to earlier one and either deactivate
> 	 * their HW callbacks or remove them from mirror list if they already
> 	 * signaled.
>--
>2.7.4
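For illustration, the use-after-free this patch closes can be sketched as a timeline (condensed from the commit message above):

/*
 * Without the fix:
 *
 *   scheduler thread                      timeout worker
 *   ----------------                      --------------
 *   drm_sched_get_cleanup_job():          drm_sched_job_timedout():
 *     sees head job signaled                reads head job from
 *     list_del + free_job(job)              ring_mirror_list
 *                                           dereferences freed job  <- UAF
 *
 * With the fix, the timeout worker first removes the bad job from the
 * list under job_list_lock, so the cleanup path can no longer free it;
 * drm_sched_stop() reinserts it only after kthread_park() guarantees
 * the scheduler thread is not running drm_sched_get_cleanup_job().
 */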
Re: [PATCH] drm/sched: Fix passing zero to 'PTR_ERR' warning
On 10/29/19 2:03 PM, Dan Carpenter wrote:
> On Tue, Oct 29, 2019 at 11:04:44AM -0400, Andrey Grodzovsky wrote:
>> Fix a static code checker warning.
>>
>> Signed-off-by: Andrey Grodzovsky
>> ---
>>  drivers/gpu/drm/scheduler/sched_main.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>> index f39b97e..898b0c9 100644
>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>> @@ -497,7 +497,7 @@ void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched)
>>  		fence = sched->ops->run_job(s_job);
>>
>>  		if (IS_ERR_OR_NULL(fence)) {
>>  			s_job->s_fence->parent = NULL;
>> -			dma_fence_set_error(&s_fence->finished, PTR_ERR(fence));
>> +			dma_fence_set_error(&s_fence->finished, PTR_ERR_OR_ZERO(fence));
>
> I feel like I should explain better.  It's generally bad to mix NULL and
> error pointers.  The situation where you would do it is when NULL is a
> special case of success.  A typical situation is you request a feature,
> like maybe logging for example:
>
> 	p = get_logger();
>
> If there isn't enough memory then get_logger() returns ERR_PTR(-ENOMEM);
> but if the user has disabled logging then we can't return a valid
> pointer but it's also not an error so we return NULL.  It's a special
> case of success.
>
> In this situation sched->ops->run_job(s_job); appears to only ever
> return NULL and it's not a special case of success, it's a regular old
> error.  I guess we are transitioning from returning NULL to returning
> error pointers?

No, check the patch "drm/amdgpu: If amdgpu_ib_schedule fails return back the error." - amdgpu_job_run will pack an actual error code into the ERR_PTR.

Andrey

> So we should just do something like:
>
> 	fence = sched->ops->run_job(s_job);
>
> 	/* FIXME: Oct 2019: Remove this code when fence can't be NULL. */
> 	if (!fence)
> 		fence = ERR_PTR(-EINVAL);
>
> 	if (IS_ERR(fence)) {
> 		...
>
> regards,
> dan carpenter
Re: [PATCH 1/2] drm/sched: Set error to s_fence if HW job submission failed.
On 10/25/19 11:55 AM, Koenig, Christian wrote:
> Am 25.10.19 um 16:57 schrieb Grodzovsky, Andrey:
>> On 10/25/19 4:44 AM, Christian König wrote:
>>> Am 24.10.19 um 21:57 schrieb Andrey Grodzovsky:
>>>> Problem:
>>>> When run_job fails and HW fence returned is NULL we still signal
>>>> the s_fence to avoid hangs but the user has no way of knowing if
>>>> the actual HW job was ran and finished.
>>>>
>>>> Fix:
>>>> Allow .run_job implementations to return ERR_PTR in the fence pointer
>>>> returned and then set this error for s_fence->finished fence so whoever
>>>> wait on this fence can inspect the signaled fence for an error.
>>>>
>>>> Signed-off-by: Andrey Grodzovsky
>>>> ---
>>>>    drivers/gpu/drm/scheduler/sched_main.c | 19 ---
>>>>    1 file changed, 16 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>>> index 9a0ee74..f39b97e 100644
>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>> @@ -479,6 +479,7 @@ void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched)
>>>>        struct drm_sched_job *s_job, *tmp;
>>>>        uint64_t guilty_context;
>>>>        bool found_guilty = false;
>>>> +      struct dma_fence *fence;
>>>>        list_for_each_entry_safe(s_job, tmp, &sched->ring_mirror_list, node) {
>>>>            struct drm_sched_fence *s_fence = s_job->s_fence;
>>>> @@ -492,7 +493,16 @@ void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched)
>>>>                dma_fence_set_error(&s_fence->finished, -ECANCELED);
>>>>            dma_fence_put(s_job->s_fence->parent);
>>>> -          s_job->s_fence->parent = sched->ops->run_job(s_job);
>>>> +          fence = sched->ops->run_job(s_job);
>>>> +
>>>> +          if (IS_ERR_OR_NULL(fence)) {
>>>> +              s_job->s_fence->parent = NULL;
>>>> +              dma_fence_set_error(&s_fence->finished, PTR_ERR(fence));
>>>> +          } else {
>>>> +              s_job->s_fence->parent = fence;
>>>> +          }
>>>> +
>>>> +
>>> Maybe time for a drm_sched_run_job() function which does that
>>> handling? And why don't we need to install the callback here?
>> What code do you want to put in drm_sched_run_job?
>>
>> We reinstall the callback later in drm_sched_start;
>> drm_sched_resubmit_jobs is conditional on whether the guilty fence did
>> signal by this time or not, and so the split of the logic into
>> drm_sched_start and drm_sched_resubmit_jobs.
> Ah, yes of course. In this case the patch is Reviewed-by: Christian
> König.
>
> Regards,
> Christian.

Thanks, there is also the 2/2 patch for amdgpu, please take a look.

Andrey

>
>> Andrey
>>
>>
>>> Apart from that looks good to me,
>>> Christian.
>>>
>>>>        }
>>>>    }
>>>>    EXPORT_SYMBOL(drm_sched_resubmit_jobs);
>>>> @@ -720,7 +730,7 @@ static int drm_sched_main(void *param)
>>>>            fence = sched->ops->run_job(sched_job);
>>>>            drm_sched_fence_scheduled(s_fence);
>>>> -          if (fence) {
>>>> +          if (!IS_ERR_OR_NULL(fence)) {
>>>>                s_fence->parent = dma_fence_get(fence);
>>>>                r = dma_fence_add_callback(fence, &sched_job->cb,
>>>>                                           drm_sched_process_job);
>>>> @@ -730,8 +740,11 @@ static int drm_sched_main(void *param)
>>>>                    DRM_ERROR("fence add callback failed (%d)\n", r);
>>>>                    dma_fence_put(fence);
>>>> -          } else
>>>> +          } else {
>>>> +
>>>> +              dma_fence_set_error(&s_fence->finished, PTR_ERR(fence));
>>>>                drm_sched_process_job(NULL, &sched_job->cb);
>>>> +          }
>>>>            wake_up(&sched->job_scheduled);
>>>>        }
Re: [PATCH 1/2] drm/sched: Set error to s_fence if HW job submission failed.
On 10/25/19 4:44 AM, Christian König wrote:
> Am 24.10.19 um 21:57 schrieb Andrey Grodzovsky:
>> Problem:
>> When run_job fails and HW fence returned is NULL we still signal
>> the s_fence to avoid hangs but the user has no way of knowing if
>> the actual HW job was ran and finished.
>>
>> Fix:
>> Allow .run_job implementations to return ERR_PTR in the fence pointer
>> returned and then set this error for s_fence->finished fence so whoever
>> wait on this fence can inspect the signaled fence for an error.
>>
>> Signed-off-by: Andrey Grodzovsky
>> ---
>>   drivers/gpu/drm/scheduler/sched_main.c | 19 ---
>>   1 file changed, 16 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>> index 9a0ee74..f39b97e 100644
>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>> @@ -479,6 +479,7 @@ void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched)
>>       struct drm_sched_job *s_job, *tmp;
>>       uint64_t guilty_context;
>>       bool found_guilty = false;
>> +     struct dma_fence *fence;
>>       list_for_each_entry_safe(s_job, tmp, &sched->ring_mirror_list, node) {
>>           struct drm_sched_fence *s_fence = s_job->s_fence;
>> @@ -492,7 +493,16 @@ void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched)
>>               dma_fence_set_error(&s_fence->finished, -ECANCELED);
>>           dma_fence_put(s_job->s_fence->parent);
>> -         s_job->s_fence->parent = sched->ops->run_job(s_job);
>> +         fence = sched->ops->run_job(s_job);
>> +
>> +         if (IS_ERR_OR_NULL(fence)) {
>> +             s_job->s_fence->parent = NULL;
>> +             dma_fence_set_error(&s_fence->finished, PTR_ERR(fence));
>> +         } else {
>> +             s_job->s_fence->parent = fence;
>> +         }
>> +
>> +
>
> Maybe time for a drm_sched_run_job() function which does that
> handling? And why don't we need to install the callback here?

What code do you want to put in drm_sched_run_job?

We reinstall the callback later in drm_sched_start; drm_sched_resubmit_jobs is conditional on whether the guilty fence did signal by this time or not, and so the split of the logic into drm_sched_start and drm_sched_resubmit_jobs.

Andrey

>
> Apart from that looks good to me,
> Christian.
>
>>       }
>>   }
>>   EXPORT_SYMBOL(drm_sched_resubmit_jobs);
>> @@ -720,7 +730,7 @@ static int drm_sched_main(void *param)
>>           fence = sched->ops->run_job(sched_job);
>>           drm_sched_fence_scheduled(s_fence);
>> -         if (fence) {
>> +         if (!IS_ERR_OR_NULL(fence)) {
>>               s_fence->parent = dma_fence_get(fence);
>>               r = dma_fence_add_callback(fence, &sched_job->cb,
>>                                          drm_sched_process_job);
>> @@ -730,8 +740,11 @@ static int drm_sched_main(void *param)
>>                   DRM_ERROR("fence add callback failed (%d)\n", r);
>>                   dma_fence_put(fence);
>> -         } else
>> +         } else {
>> +
>> +             dma_fence_set_error(&s_fence->finished, PTR_ERR(fence));
>>               drm_sched_process_job(NULL, &sched_job->cb);
>> +         }
>>           wake_up(&sched->job_scheduled);
>>       }
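For illustration, under the contract this patch establishes, a driver's .run_job backend looks roughly like the following sketch - example_hw_submit() is a hypothetical submission helper; the thread notes that the companion patch makes amdgpu_job_run do exactly this kind of error packing:

#include <linux/dma-fence.h>
#include <linux/err.h>
#include <drm/gpu_scheduler.h>

/* Hypothetical hardware submission helper. */
extern int example_hw_submit(struct drm_sched_job *job, struct dma_fence **out);

static struct dma_fence *example_run_job(struct drm_sched_job *sched_job)
{
	struct dma_fence *fence = NULL;
	int r;

	r = example_hw_submit(sched_job, &fence);
	if (r)
		/* The scheduler sets this error on s_fence->finished, so
		 * waiters can tell the HW job never actually ran. */
		return ERR_PTR(r);

	return fence;
}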
Re: drm_sched with panfrost crash on T820
On 10/3/19 4:34 AM, Neil Armstrong wrote:
> Hi Andrey,
>
> On 02/10/2019 at 16:40, Grodzovsky, Andrey wrote:
>> On 9/30/19 10:52 AM, Hillf Danton wrote:
>>> On Mon, 30 Sep 2019 11:17:45 +0200 Neil Armstrong wrote:
>>>> Did a new run from 5.3:
>>>>
>>>> [   35.971972] Call trace:
>>>> [   35.974391]  drm_sched_increase_karma+0x5c/0xf0
>>>> 		10667f38	10667F94
>>>> 		drivers/gpu/drm/scheduler/sched_main.c:335
>>>>
>>>> The crashing line is:
>>>> 	if (bad->s_fence->scheduled.context == entity->fence_context) {
>>>>
>>>> Doesn't seem related to the guilty job.
>>> Bail out if s_fence is no longer fresh.
>>>
>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>> @@ -333,6 +333,10 @@ void drm_sched_increase_karma(struct drm
>>>
>>>  			spin_lock(&rq->lock);
>>>  			list_for_each_entry_safe(entity, tmp, &rq->entities, list) {
>>> +				if (!smp_load_acquire(&bad->s_fence)) {
>>> +					spin_unlock(&rq->lock);
>>> +					return;
>>> +				}
>>>  				if (bad->s_fence->scheduled.context ==
>>>  				    entity->fence_context) {
>>>  					if (atomic_read(&bad->karma) >
>>> @@ -543,7 +547,7 @@ EXPORT_SYMBOL(drm_sched_job_init);
>>>   void drm_sched_job_cleanup(struct drm_sched_job *job)
>>>   {
>>>  	dma_fence_put(&job->s_fence->finished);
>>> -	job->s_fence = NULL;
>>> +	smp_store_release(&job->s_fence, 0);
>>>   }
>>>   EXPORT_SYMBOL(drm_sched_job_cleanup);
> This fixed the problem on the 10 CI runs.
>
> Neil

These are good news, but I still fail to see how this fixes the problem - Hillf, do you mind explaining how you came up with this particular fix - what was the bug you saw?

Andrey

>
>> Does this change help the problem? Note that drm_sched_job_cleanup is
>> called from the scheduler thread, which is stopped at all times when the
>> work_tdr thread is running, and anyway the 'bad' job is still in the
>> ring_mirror_list while it's being accessed from
>> drm_sched_increase_karma, so I don't think drm_sched_job_cleanup can be
>> called for it BEFORE or while drm_sched_increase_karma is executed.
>>
>> Andrey
>>
>>
>>>
>>> --
>>>
Re: drm_sched with panfrost crash on T820
On 9/30/19 5:17 AM, Neil Armstrong wrote:
> Hi Andrey,
>
> On 27/09/2019 22:55, Grodzovsky, Andrey wrote:
>> Can you please use addr2line or gdb to pinpoint where in
>> drm_sched_increase_karma you hit the NULL ptr? It looks like the guilty
>> job, but to be sure.
> Did a new run from 5.3:
>
> [   35.971972] Call trace:
> [   35.974391]  drm_sched_increase_karma+0x5c/0xf0
> 		10667f38	10667F94
> 		drivers/gpu/drm/scheduler/sched_main.c:335
>
> The crashing line is:
> 	if (bad->s_fence->scheduled.context == entity->fence_context) {
>
> Doesn't seem related to the guilty job.
>
> Neil

Thanks Neil. By guilty I meant the 'bad' job. I reviewed the code and can't see anything suspicious for now. To help clarify, could you please provide an ftrace log for this? All the dma_fence and gpu_scheduler traces can help. If you have any relevant traces in panfrost they can also be useful. I usually just set them all up in one line using the trace-cmd utility like this before starting the run:

sudo trace-cmd start -e dma_fence -e gpu_scheduler

Andrey

>
>> Andrey
>>
>> On 9/27/19 4:12 AM, Neil Armstrong wrote:
>>> Hi Christian,
>>>
>>> In v5.3, running dEQP triggers the following kernel crash:
>>>
>>> [   20.224982] Unable to handle kernel NULL pointer dereference at virtual address 0038
>>> [...]
>>> [   20.291064] Hardware name: Khadas VIM2 (DT)
>>> [   20.295217] Workqueue: events drm_sched_job_timedout
>>> [...]
>>> [   20.304867] pc : drm_sched_increase_karma+0x5c/0xf0
>>> [   20.309696] lr : drm_sched_increase_karma+0x44/0xf0
>>> [...]
>>> [   20.396720] Call trace:
>>> [   20.399138]  drm_sched_increase_karma+0x5c/0xf0
>>> [   20.403623]  panfrost_job_timedout+0x12c/0x1e0
>>> [   20.408021]  drm_sched_job_timedout+0x48/0xa0
>>> [   20.412336]  process_one_work+0x1e0/0x320
>>> [   20.416300]  worker_thread+0x40/0x450
>>> [   20.419924]  kthread+0x124/0x128
>>> [   20.423116]  ret_from_fork+0x10/0x18
>>> [   20.426653] Code: f941 540001c0 f9400a83 f9402402 (f9401c64)
>>> [   20.432690] ---[ end trace bd02f890139096a7 ]---
>>>
>>> Which never happens, at all, on v5.2.
>>>
>>> I did a (very) long (7 days, ~100 runs) bisect run using our LAVA lab (thanks tomeu!), but
>>> bisecting was not easy since the bad commit landed on drm-misc-next after v5.1-rc6, and
>>> then v5.2-rc1 was backmerged into drm-misc-next at:
>>> [1] 374ed5429346 Merge drm/drm-next into drm-misc-next
>>>
>>> Thus bisecting between [1] and v5.2-rc1 leads to commits based on v5.2-rc1... where panfrost was
>>> not enabled in the Khadas VIM2 DT.
>>>
>>> Anyway, I managed to identify 3 possibly breaking commits:
>>> [2] 290764af7e36 drm/sched: Keep s_fence->parent pointer
>>> [3] 5918045c4ed4 drm/scheduler: rework job destruction
>>> [4] a5343b8a2ca5 drm/scheduler: Add flag to hint the release of guilty job.
>>>
>>> But [1] and [2] don't crash the same way:
>>> [   16.257912] Unable to handle kernel NULL pointer dereference at virtual address 0060
>>> [...]
>>> [   16.308307] CPU: 4 PID: 80 Comm: kworker/4:1 Not tainted 5.1.0-rc2-01185-g290764af7e36-dirty #378
>>> [   16.317099] Hardware name: Khadas VIM2 (DT)
>>> [...]
>>> [   16.330907] pc : refcount_sub_and_test_checked+0x4/0xb0
>>> [   16.336078] lr : refcount_dec_and_test_checked+0x14/0x20
>>> [...]
>>> [   16.423533] Process kworker/4:1 (pid: 80, stack limit = 0x(ptrval))
>>> [   16.430431] Call trace:
>>> [   16.432851]  refcount_sub_and_test_checked+0x4/0xb0
>>> [   16.437681]  drm_sched_job_cleanup+0x24/0x58
>>> [   16.441908]  panfrost_job_free+0x14/0x28
>>> [   16.445787]  drm_sched_job_timedout+0x6c/0xa0
>>> [   16.450102]  process_one_work+0x1e0/0x320
>>> [   16.454067]  worker_thread+0x40/0x450
>>> [   16.457690]  kthread+0x124/0x128
>>> [   16.460882]  ret_from_fork+0x10/0x18
>>> [   16.464421] Code: 5280 d65f03c0 d503201f aa0103e3 (b9400021)
>>> [   16.470456] ---[ end trace 39a67412ee1b64b5 ]---
>>>
>>> and [3] fails like on v5.3 (in drm_sched_increase_karma):
>>> [   33.830080] Unable to handle kernel NULL pointer dereference at virtual address 00
Re: drm_sched with panfrost crash on T820
On 9/30/19 10:52 AM, Hillf Danton wrote:
> On Mon, 30 Sep 2019 11:17:45 +0200 Neil Armstrong wrote:
>> Did a new run from 5.3:
>>
>> [   35.971972] Call trace:
>> [   35.974391]  drm_sched_increase_karma+0x5c/0xf0
>> 		10667f38	10667F94
>> 		drivers/gpu/drm/scheduler/sched_main.c:335
>>
>> The crashing line is:
>> 	if (bad->s_fence->scheduled.context == entity->fence_context) {
>>
>> Doesn't seem related to the guilty job.
> Bail out if s_fence is no longer fresh.
>
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -333,6 +333,10 @@ void drm_sched_increase_karma(struct drm
>
>  			spin_lock(&rq->lock);
>  			list_for_each_entry_safe(entity, tmp, &rq->entities, list) {
> +				if (!smp_load_acquire(&bad->s_fence)) {
> +					spin_unlock(&rq->lock);
> +					return;
> +				}
>  				if (bad->s_fence->scheduled.context ==
>  				    entity->fence_context) {
>  					if (atomic_read(&bad->karma) >
> @@ -543,7 +547,7 @@ EXPORT_SYMBOL(drm_sched_job_init);
>   void drm_sched_job_cleanup(struct drm_sched_job *job)
>   {
>  	dma_fence_put(&job->s_fence->finished);
> -	job->s_fence = NULL;
> +	smp_store_release(&job->s_fence, 0);
>   }
>   EXPORT_SYMBOL(drm_sched_job_cleanup);

Does this change help the problem? Note that drm_sched_job_cleanup is called from the scheduler thread, which is stopped at all times when the work_tdr thread is running, and anyway the 'bad' job is still in the ring_mirror_list while it's being accessed from drm_sched_increase_karma, so I don't think drm_sched_job_cleanup can be called for it BEFORE or while drm_sched_increase_karma is executed.

Andrey

>
> --
>
Re: drm_sched with panfrost crash on T820
Can you please use addr2line or gdb to pinpoint where in drm_sched_increase_karma you hit the NULL ptr? It looks like the guilty job, but to be sure.

Andrey

On 9/27/19 4:12 AM, Neil Armstrong wrote:
> Hi Christian,
>
> In v5.3, running dEQP triggers the following kernel crash:
>
> [   20.224982] Unable to handle kernel NULL pointer dereference at virtual address 0038
> [...]
> [   20.291064] Hardware name: Khadas VIM2 (DT)
> [   20.295217] Workqueue: events drm_sched_job_timedout
> [...]
> [   20.304867] pc : drm_sched_increase_karma+0x5c/0xf0
> [   20.309696] lr : drm_sched_increase_karma+0x44/0xf0
> [...]
> [   20.396720] Call trace:
> [   20.399138]  drm_sched_increase_karma+0x5c/0xf0
> [   20.403623]  panfrost_job_timedout+0x12c/0x1e0
> [   20.408021]  drm_sched_job_timedout+0x48/0xa0
> [   20.412336]  process_one_work+0x1e0/0x320
> [   20.416300]  worker_thread+0x40/0x450
> [   20.419924]  kthread+0x124/0x128
> [   20.423116]  ret_from_fork+0x10/0x18
> [   20.426653] Code: f941 540001c0 f9400a83 f9402402 (f9401c64)
> [   20.432690] ---[ end trace bd02f890139096a7 ]---
>
> Which never happens, at all, on v5.2.
>
> I did a (very) long (7 days, ~100 runs) bisect run using our LAVA lab (thanks tomeu!), but
> bisecting was not easy since the bad commit landed on drm-misc-next after v5.1-rc6, and
> then v5.2-rc1 was backmerged into drm-misc-next at:
> [1] 374ed5429346 Merge drm/drm-next into drm-misc-next
>
> Thus bisecting between [1] and v5.2-rc1 leads to commits based on v5.2-rc1... where panfrost was
> not enabled in the Khadas VIM2 DT.
>
> Anyway, I managed to identify 3 possibly breaking commits:
> [2] 290764af7e36 drm/sched: Keep s_fence->parent pointer
> [3] 5918045c4ed4 drm/scheduler: rework job destruction
> [4] a5343b8a2ca5 drm/scheduler: Add flag to hint the release of guilty job.
>
> But [1] and [2] don't crash the same way:
> [   16.257912] Unable to handle kernel NULL pointer dereference at virtual address 0060
> [...]
> [   16.308307] CPU: 4 PID: 80 Comm: kworker/4:1 Not tainted 5.1.0-rc2-01185-g290764af7e36-dirty #378
> [   16.317099] Hardware name: Khadas VIM2 (DT)
> [...]
> [   16.330907] pc : refcount_sub_and_test_checked+0x4/0xb0
> [   16.336078] lr : refcount_dec_and_test_checked+0x14/0x20
> [...]
> [   16.423533] Process kworker/4:1 (pid: 80, stack limit = 0x(ptrval))
> [   16.430431] Call trace:
> [   16.432851]  refcount_sub_and_test_checked+0x4/0xb0
> [   16.437681]  drm_sched_job_cleanup+0x24/0x58
> [   16.441908]  panfrost_job_free+0x14/0x28
> [   16.445787]  drm_sched_job_timedout+0x6c/0xa0
> [   16.450102]  process_one_work+0x1e0/0x320
> [   16.454067]  worker_thread+0x40/0x450
> [   16.457690]  kthread+0x124/0x128
> [   16.460882]  ret_from_fork+0x10/0x18
> [   16.464421] Code: 5280 d65f03c0 d503201f aa0103e3 (b9400021)
> [   16.470456] ---[ end trace 39a67412ee1b64b5 ]---
>
> and [3] fails like on v5.3 (in drm_sched_increase_karma):
> [   33.830080] Unable to handle kernel NULL pointer dereference at virtual address 0038
> [...]
> [   33.871946] Internal error: Oops: 9604 [#1] PREEMPT SMP
> [   33.877450] Modules linked in:
> [   33.880474] CPU: 6 PID: 81 Comm: kworker/6:1 Not tainted 5.1.0-rc2-01186-ga5343b8a2ca5-dirty #380
> [   33.889265] Hardware name: Khadas VIM2 (DT)
> [   33.893419] Workqueue: events drm_sched_job_timedout
> [...]
> [   33.903069] pc : drm_sched_increase_karma+0x5c/0xf0
> [   33.907898] lr : drm_sched_increase_karma+0x44/0xf0
> [...]
> [   33.994924] Process kworker/6:1 (pid: 81, stack limit = 0x(ptrval))
> [   34.001822] Call trace:
> [   34.004242]  drm_sched_increase_karma+0x5c/0xf0
> [   34.008726]  panfrost_job_timedout+0x12c/0x1e0
> [   34.013122]  drm_sched_job_timedout+0x48/0xa0
> [   34.017438]  process_one_work+0x1e0/0x320
> [   34.021402]  worker_thread+0x40/0x450
> [   34.025026]  kthread+0x124/0x128
> [   34.028218]  ret_from_fork+0x10/0x18
> [   34.031755] Code: f941 540001c0 f9400a83 f9402402 (f9401c64)
> [   34.037792] ---[ end trace be3fd6f77f4df267 ]---
>
> When I revert [3] on [1], I get the same crash as [2], meaning commit [3] masks the failure [2] introduced.
>
> Do you know how to solve this?
>
> Thanks,
> Neil
Re: [PATCH v4] drm: Don't free jobs in wait_event_interruptible()
On 9/26/19 11:59 AM, Steven Price wrote:
> On 26/09/2019 16:48, Grodzovsky, Andrey wrote:
>> On 9/26/19 11:23 AM, Steven Price wrote:
>>> On 26/09/2019 16:14, Grodzovsky, Andrey wrote:
>>>> On 9/26/19 10:16 AM, Steven Price wrote:
>>>>> drm_sched_cleanup_jobs() attempts to free finished jobs, however because
>>>>> it is called as the condition of wait_event_interruptible() it must not
>>>>> sleep. Unfortunately some free callbacks (notably for Panfrost) do sleep.
>>>>>
>>>>> Instead let's rename drm_sched_cleanup_jobs() to
>>>>> drm_sched_get_cleanup_job() and simply return a job for processing if
>>>>> there is one. The caller can then call the free_job() callback outside
>>>>> the wait_event_interruptible() where sleeping is possible before
>>>>> re-checking and returning to sleep if necessary.
>>>>>
>>>>> Signed-off-by: Steven Price
>>>>> ---
>>>>> Changes from v3:
>>>>>  * drm_sched_main() re-arms the timeout for the next job after calling
>>>>>    free_job()
>>>>>
>>>>>  drivers/gpu/drm/scheduler/sched_main.c | 45 +++---
>>>>>  1 file changed, 26 insertions(+), 19 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>>>> index 9a0ee74d82dc..148468447ba9 100644
>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>>> @@ -622,43 +622,41 @@ static void drm_sched_process_job(struct dma_fence *f, struct dma_fence_cb *cb)
>>>>>  }
>>>>>
>>>>>  /**
>>>>> - * drm_sched_cleanup_jobs - destroy finished jobs
>>>>> + * drm_sched_get_cleanup_job - fetch the next finished job to be destroyed
>>>>>   *
>>>>>   * @sched: scheduler instance
>>>>>   *
>>>>> - * Remove all finished jobs from the mirror list and destroy them.
>>>>> + * Returns the next finished job from the mirror list (if there is one)
>>>>> + * ready for it to be destroyed.
>>>>>   */
>>>>> -static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched)
>>>>> +static struct drm_sched_job *
>>>>> +drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
>>>>>  {
>>>>> +	struct drm_sched_job *job = NULL;
>>>>>  	unsigned long flags;
>>>>>
>>>>>  	/* Don't destroy jobs while the timeout worker is running */
>>>>>  	if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
>>>>>  	    !cancel_delayed_work(&sched->work_tdr))
>>>>> -		return;
>>>>> -
>>>>> +		return NULL;
>>>>>
>>>>> -	while (!list_empty(&sched->ring_mirror_list)) {
>>>>> -		struct drm_sched_job *job;
>>>>> +	spin_lock_irqsave(&sched->job_list_lock, flags);
>>>>>
>>>>> -		job = list_first_entry(&sched->ring_mirror_list,
>>>>> +	job = list_first_entry_or_null(&sched->ring_mirror_list,
>>>>>  				       struct drm_sched_job, node);
>>>>> -		if (!dma_fence_is_signaled(&job->s_fence->finished))
>>>>> -			break;
>>>>>
>>>>> -		spin_lock_irqsave(&sched->job_list_lock, flags);
>>>>> +	if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
>>>>>  		/* remove job from ring_mirror_list */
>>>>>  		list_del_init(&job->node);
>>>>> -		spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>>>> -
>>>>> -		sched->ops->free_job(job);
>>>>> +	} else {
>>>>> +		job = NULL;
>>>>> +		/* queue timeout for next job */
>>>>> +		drm_sched_start_timeout(sched);
>>>>>  	}
>>>>>
>>>>> -	/* queue timeout for next job */
>>>>> -	spin_lock_irqsave(&sched->job_list_lock, flags);
>>>>> -	drm_sched_start_ti
Re: [PATCH v4] drm: Don't free jobs in wait_event_interruptible()
On 9/26/19 11:23 AM, Steven Price wrote:
> On 26/09/2019 16:14, Grodzovsky, Andrey wrote:
>> On 9/26/19 10:16 AM, Steven Price wrote:
>>> drm_sched_cleanup_jobs() attempts to free finished jobs, however because
>>> it is called as the condition of wait_event_interruptible() it must not
>>> sleep. Unfortunately some free callbacks (notably for Panfrost) do sleep.
>>>
>>> Instead let's rename drm_sched_cleanup_jobs() to
>>> drm_sched_get_cleanup_job() and simply return a job for processing if
>>> there is one. The caller can then call the free_job() callback outside
>>> the wait_event_interruptible() where sleeping is possible before
>>> re-checking and returning to sleep if necessary.
>>>
>>> Signed-off-by: Steven Price
>>> ---
>>> Changes from v3:
>>>  * drm_sched_main() re-arms the timeout for the next job after calling
>>>    free_job()
>>>
>>>  drivers/gpu/drm/scheduler/sched_main.c | 45 +++---
>>>  1 file changed, 26 insertions(+), 19 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>> index 9a0ee74d82dc..148468447ba9 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>> @@ -622,43 +622,41 @@ static void drm_sched_process_job(struct dma_fence *f, struct dma_fence_cb *cb)
>>>  }
>>>
>>>  /**
>>> - * drm_sched_cleanup_jobs - destroy finished jobs
>>> + * drm_sched_get_cleanup_job - fetch the next finished job to be destroyed
>>>   *
>>>   * @sched: scheduler instance
>>>   *
>>> - * Remove all finished jobs from the mirror list and destroy them.
>>> + * Returns the next finished job from the mirror list (if there is one)
>>> + * ready for it to be destroyed.
>>>   */
>>> -static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched)
>>> +static struct drm_sched_job *
>>> +drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
>>>  {
>>> +	struct drm_sched_job *job = NULL;
>>>  	unsigned long flags;
>>>
>>>  	/* Don't destroy jobs while the timeout worker is running */
>>>  	if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
>>>  	    !cancel_delayed_work(&sched->work_tdr))
>>> -		return;
>>> -
>>> +		return NULL;
>>>
>>> -	while (!list_empty(&sched->ring_mirror_list)) {
>>> -		struct drm_sched_job *job;
>>> +	spin_lock_irqsave(&sched->job_list_lock, flags);
>>>
>>> -		job = list_first_entry(&sched->ring_mirror_list,
>>> +	job = list_first_entry_or_null(&sched->ring_mirror_list,
>>>  				       struct drm_sched_job, node);
>>> -		if (!dma_fence_is_signaled(&job->s_fence->finished))
>>> -			break;
>>>
>>> -		spin_lock_irqsave(&sched->job_list_lock, flags);
>>> +	if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
>>>  		/* remove job from ring_mirror_list */
>>>  		list_del_init(&job->node);
>>> -		spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>> -
>>> -		sched->ops->free_job(job);
>>> +	} else {
>>> +		job = NULL;
>>> +		/* queue timeout for next job */
>>> +		drm_sched_start_timeout(sched);
>>>  	}
>>>
>>> -	/* queue timeout for next job */
>>> -	spin_lock_irqsave(&sched->job_list_lock, flags);
>>> -	drm_sched_start_timeout(sched);
>>>  	spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>>
>>> +	return job;
>>>  }
>>>
>>>  /**
>>> @@ -698,12 +696,21 @@ static int drm_sched_main(void *param)
>>>  	struct drm_sched_fence *s_fence;
>>>  	struct drm_sched_job *sched_job;
>>>  	struct dma_fence *fence;
>>> +	struct drm_sched_job *cleanup_job = NULL;
>>>
>>>  	wait_event_interruptible(sched->wake_up_worker,
>>> -				 (drm_sched_cleanup_jobs(sched),
>>> +				 (cleanup_job = drm_sched_get_cleanup_job(sched))
Re: [PATCH v4] drm: Don't free jobs in wait_event_interruptible()
On 9/26/19 10:16 AM, Steven Price wrote: > drm_sched_cleanup_jobs() attempts to free finished jobs, however because > it is called as the condition of wait_event_interruptible() it must not > sleep. Unfortunately some free callbacks (notably for Panfrost) do sleep. > > Instead let's rename drm_sched_cleanup_jobs() to > drm_sched_get_cleanup_job() and simply return a job for processing if > there is one. The caller can then call the free_job() callback outside > the wait_event_interruptible() where sleeping is possible before > re-checking and returning to sleep if necessary. > > Signed-off-by: Steven Price > --- > Changes from v3: > * drm_sched_main() re-arms the timeout for the next job after calling > free_job() > > drivers/gpu/drm/scheduler/sched_main.c | 45 +++--- > 1 file changed, 26 insertions(+), 19 deletions(-) > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c > b/drivers/gpu/drm/scheduler/sched_main.c > index 9a0ee74d82dc..148468447ba9 100644 > --- a/drivers/gpu/drm/scheduler/sched_main.c > +++ b/drivers/gpu/drm/scheduler/sched_main.c > @@ -622,43 +622,41 @@ static void drm_sched_process_job(struct dma_fence *f, > struct dma_fence_cb *cb) > } > > /** > - * drm_sched_cleanup_jobs - destroy finished jobs > + * drm_sched_get_cleanup_job - fetch the next finished job to be destroyed >* >* @sched: scheduler instance >* > - * Remove all finished jobs from the mirror list and destroy them. > + * Returns the next finished job from the mirror list (if there is one) > + * ready for it to be destroyed. >*/ > -static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched) > +static struct drm_sched_job * > +drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched) > { > + struct drm_sched_job *job = NULL; > unsigned long flags; > > /* Don't destroy jobs while the timeout worker is running */ > if (sched->timeout != MAX_SCHEDULE_TIMEOUT && > !cancel_delayed_work(&sched->work_tdr)) > - return; > - > + return NULL; > > - while (!list_empty(&sched->ring_mirror_list)) { > - struct drm_sched_job *job; > + spin_lock_irqsave(&sched->job_list_lock, flags); > > - job = list_first_entry(&sched->ring_mirror_list, > + job = list_first_entry_or_null(&sched->ring_mirror_list, > struct drm_sched_job, node); > - if (!dma_fence_is_signaled(&job->s_fence->finished)) > - break; > > - spin_lock_irqsave(&sched->job_list_lock, flags); > + if (job && dma_fence_is_signaled(&job->s_fence->finished)) { > /* remove job from ring_mirror_list */ > list_del_init(&job->node); > - spin_unlock_irqrestore(&sched->job_list_lock, flags); > - > - sched->ops->free_job(job); > + } else { > + job = NULL; > + /* queue timeout for next job */ > + drm_sched_start_timeout(sched); > } > > - /* queue timeout for next job */ > - spin_lock_irqsave(&sched->job_list_lock, flags); > - drm_sched_start_timeout(sched); > spin_unlock_irqrestore(&sched->job_list_lock, flags); > > + return job; > } > > /** > @@ -698,12 +696,21 @@ static int drm_sched_main(void *param) > struct drm_sched_fence *s_fence; > struct drm_sched_job *sched_job; > struct dma_fence *fence; > + struct drm_sched_job *cleanup_job = NULL; > > wait_event_interruptible(sched->wake_up_worker, > - (drm_sched_cleanup_jobs(sched), > + (cleanup_job = > drm_sched_get_cleanup_job(sched)) || >(!drm_sched_blocked(sched) && > (entity = > drm_sched_select_entity(sched))) || > - kthread_should_stop())); > + kthread_should_stop()); > + > + while (cleanup_job) { > + sched->ops->free_job(cleanup_job); > + /* queue timeout for next job */ > + drm_sched_start_timeout(sched); > + > + 
cleanup_job = drm_sched_get_cleanup_job(sched); > + } Why is drm_sched_start_timeout called both here and inside drm_sched_get_cleanup_job? And also, why call it multiple times in the loop instead of only once after the loop is done? Andrey > > if (!entity) > continue;
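For readers piecing this together from the fragments above, the v4 main-loop shape after the patch is roughly the following (a condensed sketch reconstructed from the quoted hunks, not a verbatim copy of the merged code):

    struct drm_sched_job *cleanup_job = NULL;

    /* The condition is evaluated with the thread already prepared to
     * sleep, so it must not block; it only fetches a finished job. */
    wait_event_interruptible(sched->wake_up_worker,
                             (cleanup_job = drm_sched_get_cleanup_job(sched)) ||
                             (!drm_sched_blocked(sched) &&
                              (entity = drm_sched_select_entity(sched))) ||
                             kthread_should_stop());

    /* Normal process context again: free_job() is allowed to sleep. */
    while (cleanup_job) {
            sched->ops->free_job(cleanup_job);
            /* queue timeout for next job */
            drm_sched_start_timeout(sched);
            cleanup_job = drm_sched_get_cleanup_job(sched);
    }

As far as the hunks show, drm_sched_start_timeout() appears in both places because drm_sched_get_cleanup_job() only re-arms the timer when it finds nothing to reap; once a job has been freed, the new head of the mirror list needs its own timeout queued.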
Re: [PATCH] drm: Don't free jobs in wait_event_interruptible()
On 9/26/19 3:07 AM, Koenig, Christian wrote: > Am 25.09.19 um 17:14 schrieb Steven Price: >> drm_sched_cleanup_jobs() attempts to free finished jobs, however because >> it is called as the condition of wait_event_interruptible() it must not >> sleep. Unfortunately some free callbacks (notably for Panfrost) do sleep. >> >> Instead let's rename drm_sched_cleanup_jobs() to >> drm_sched_get_cleanup_job() and simply return a job for processing if >> there is one. The caller can then call the free_job() callback outside >> the wait_event_interruptible() where sleeping is possible before >> re-checking and returning to sleep if necessary. >> >> Signed-off-by: Steven Price >> --- >>drivers/gpu/drm/scheduler/sched_main.c | 44 ++ >>1 file changed, 24 insertions(+), 20 deletions(-) >> >> diff --git a/drivers/gpu/drm/scheduler/sched_main.c >> b/drivers/gpu/drm/scheduler/sched_main.c >> index 9a0ee74d82dc..0ed4aaa4e6d1 100644 >> --- a/drivers/gpu/drm/scheduler/sched_main.c >> +++ b/drivers/gpu/drm/scheduler/sched_main.c >> @@ -622,43 +622,41 @@ static void drm_sched_process_job(struct dma_fence *f, >> struct dma_fence_cb *cb) >>} >> >>/** >> - * drm_sched_cleanup_jobs - destroy finished jobs >> + * drm_sched_get_cleanup_job - fetch the next finished job to be destroyed >> * >> * @sched: scheduler instance >> * >> - * Remove all finished jobs from the mirror list and destroy them. >> + * Returns the next finished job from the mirror list (if there is one) >> + * ready for it to be destroyed. >> */ >> -static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched) >> +static struct drm_sched_job * >> +drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched) >>{ >> +struct drm_sched_job *job = NULL; >> unsigned long flags; >> >> /* Don't destroy jobs while the timeout worker is running */ >> if (sched->timeout != MAX_SCHEDULE_TIMEOUT && >> !cancel_delayed_work(&sched->work_tdr)) >> -return; >> - >> - >> -while (!list_empty(&sched->ring_mirror_list)) { >> -struct drm_sched_job *job; >> +return NULL; >> >> -job = list_first_entry(&sched->ring_mirror_list, >> +job = list_first_entry_or_null(&sched->ring_mirror_list, >> struct drm_sched_job, node); > This is probably better done after taking the lock, apart from that > looks good to me. > > Christian. Why is it necessary if insertion and removal from the ring_mirror_list are only done from the same scheduler thread ? 
Andrey > >> -if (!dma_fence_is_signaled(&job->s_fence->finished)) >> -break; >> >> -spin_lock_irqsave(&sched->job_list_lock, flags); >> +spin_lock_irqsave(&sched->job_list_lock, flags); >> + >> +if (job && dma_fence_is_signaled(&job->s_fence->finished)) { >> /* remove job from ring_mirror_list */ >> list_del_init(&job->node); >> -spin_unlock_irqrestore(&sched->job_list_lock, flags); >> - >> -sched->ops->free_job(job); >> +} else { >> +job = NULL; >> +/* queue timeout for next job */ >> +drm_sched_start_timeout(sched); >> } >> >> -/* queue timeout for next job */ >> -spin_lock_irqsave(&sched->job_list_lock, flags); >> -drm_sched_start_timeout(sched); >> spin_unlock_irqrestore(&sched->job_list_lock, flags); >> >> +return job; >>} >> >>/** >> @@ -698,12 +696,18 @@ static int drm_sched_main(void *param) >> struct drm_sched_fence *s_fence; >> struct drm_sched_job *sched_job; >> struct dma_fence *fence; >> +struct drm_sched_job *cleanup_job = NULL; >> >> wait_event_interruptible(sched->wake_up_worker, >> - (drm_sched_cleanup_jobs(sched), >> + (cleanup_job = >> drm_sched_get_cleanup_job(sched)) || >> (!drm_sched_blocked(sched) && >>(entity = >> drm_sched_select_entity(sched))) || >> - kthread_should_stop())); >> + kthread_should_stop()); >> + >> +while (cleanup_job) { >> +sched->ops->free_job(cleanup_job); >> +cleanup_job = drm_sched_get_cleanup_job(sched); >> +} >> >> if (!entity) >> continue;
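Assembled from the hunks, the v4 version of the function with the peek moved under the lock reads roughly as follows (an editorial reconstruction; the comment on the motivation is a reading of Christian's remark, presumably about the timeout/reset path also touching the list, and is not spelled out in the thread):

    static struct drm_sched_job *
    drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
    {
            struct drm_sched_job *job = NULL;
            unsigned long flags;

            /* Don't destroy jobs while the timeout worker is running */
            if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
                !cancel_delayed_work(&sched->work_tdr))
                    return NULL;

            spin_lock_irqsave(&sched->job_list_lock, flags);

            /* Peek only while holding job_list_lock, so the head cannot
             * change between the peek and the removal below. */
            job = list_first_entry_or_null(&sched->ring_mirror_list,
                                           struct drm_sched_job, node);

            if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
                    /* remove job from ring_mirror_list */
                    list_del_init(&job->node);
            } else {
                    job = NULL;
                    /* queue timeout for next job */
                    drm_sched_start_timeout(sched);
            }

            spin_unlock_irqrestore(&sched->job_list_lock, flags);

            return job;
    }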
Re: [PATCH] drm: Don't free jobs in wait_event_interruptible()
On 9/26/19 5:41 AM, Steven Price wrote: > On 25/09/2019 21:09, Grodzovsky, Andrey wrote: >> On 9/25/19 12:07 PM, Andrey Grodzovsky wrote: >>> On 9/25/19 12:00 PM, Steven Price wrote: >>> >>>> On 25/09/2019 16:56, Grodzovsky, Andrey wrote: >>>>> On 9/25/19 11:14 AM, Steven Price wrote: >>>>> >>>>>> drm_sched_cleanup_jobs() attempts to free finished jobs, however >>>>>> because >>>>>> it is called as the condition of wait_event_interruptible() it must >>>>>> not >>>>>> sleep. Unfortunately some free callbacks (notably for Panfrost) do >>>>>> sleep. >>>>>> >>>>>> Instead let's rename drm_sched_cleanup_jobs() to >>>>>> drm_sched_get_cleanup_job() and simply return a job for processing if >>>>>> there is one. The caller can then call the free_job() callback outside >>>>>> the wait_event_interruptible() where sleeping is possible before >>>>>> re-checking and returning to sleep if necessary. >>>>>> >>>>>> Signed-off-by: Steven Price >>>>>> --- >>>>>> drivers/gpu/drm/scheduler/sched_main.c | 44 >>>>>> ++ >>>>>> 1 file changed, 24 insertions(+), 20 deletions(-) >>>>>> >>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c >>>>>> b/drivers/gpu/drm/scheduler/sched_main.c >>>>>> index 9a0ee74d82dc..0ed4aaa4e6d1 100644 >>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c >>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c >>>>>> @@ -622,43 +622,41 @@ static void drm_sched_process_job(struct >>>>>> dma_fence *f, struct dma_fence_cb *cb) >>>>>> } >>>>>> /** >>>>>> - * drm_sched_cleanup_jobs - destroy finished jobs >>>>>> + * drm_sched_get_cleanup_job - fetch the next finished job to be >>>>>> destroyed >>>>>> * >>>>>> * @sched: scheduler instance >>>>>> * >>>>>> - * Remove all finished jobs from the mirror list and destroy them. >>>>>> + * Returns the next finished job from the mirror list (if there is >>>>>> one) >>>>>> + * ready for it to be destroyed. >>>>>> */ >>>>>> -static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched) >>>>>> +static struct drm_sched_job * >>>>>> +drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched) >>>>>> { >>>>>> + struct drm_sched_job *job = NULL; >>>>>> unsigned long flags; >>>>>> /* Don't destroy jobs while the timeout worker is running */ >>>>>> if (sched->timeout != MAX_SCHEDULE_TIMEOUT && >>>>>> !cancel_delayed_work(&sched->work_tdr)) >>>>>> - return; >>>>>> - >>>>>> - >>>>>> - while (!list_empty(&sched->ring_mirror_list)) { >>>>>> - struct drm_sched_job *job; >>>>>> + return NULL; >>>>>> - job = list_first_entry(&sched->ring_mirror_list, >>>>>> + job = list_first_entry_or_null(&sched->ring_mirror_list, >>>>>> struct drm_sched_job, node); >>>>>> - if (!dma_fence_is_signaled(&job->s_fence->finished)) >>>>>> - break; >>>>>> - spin_lock_irqsave(&sched->job_list_lock, flags); >>>>>> + spin_lock_irqsave(&sched->job_list_lock, flags); >>>>>> + >>>>>> + if (job && dma_fence_is_signaled(&job->s_fence->finished)) { >>>>>> /* remove job from ring_mirror_list */ >>>>>> list_del_init(&job->node); >>>>>> - spin_unlock_irqrestore(&sched->job_list_lock, flags); >>>>>> - >>>>>> - sched->ops->free_job(job); >>>>>> + } else { >>>>>> + job = NULL; >>>>>> + /* queue timeout for next job */ >>>>>> + drm_sched_start_timeout(sched); >>>>>> } >>>
Re: [PATCH] drm: Don't free jobs in wait_event_interruptible()
On 9/25/19 12:07 PM, Andrey Grodzovsky wrote: > On 9/25/19 12:00 PM, Steven Price wrote: > >> On 25/09/2019 16:56, Grodzovsky, Andrey wrote: >>> On 9/25/19 11:14 AM, Steven Price wrote: >>> >>>> drm_sched_cleanup_jobs() attempts to free finished jobs, however >>>> because >>>> it is called as the condition of wait_event_interruptible() it must >>>> not >>>> sleep. Unfortunately some free callbacks (notably for Panfrost) do >>>> sleep. >>>> >>>> Instead let's rename drm_sched_cleanup_jobs() to >>>> drm_sched_get_cleanup_job() and simply return a job for processing if >>>> there is one. The caller can then call the free_job() callback outside >>>> the wait_event_interruptible() where sleeping is possible before >>>> re-checking and returning to sleep if necessary. >>>> >>>> Signed-off-by: Steven Price >>>> --- >>>> drivers/gpu/drm/scheduler/sched_main.c | 44 >>>> ++ >>>> 1 file changed, 24 insertions(+), 20 deletions(-) >>>> >>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c >>>> b/drivers/gpu/drm/scheduler/sched_main.c >>>> index 9a0ee74d82dc..0ed4aaa4e6d1 100644 >>>> --- a/drivers/gpu/drm/scheduler/sched_main.c >>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c >>>> @@ -622,43 +622,41 @@ static void drm_sched_process_job(struct >>>> dma_fence *f, struct dma_fence_cb *cb) >>>> } >>>> /** >>>> - * drm_sched_cleanup_jobs - destroy finished jobs >>>> + * drm_sched_get_cleanup_job - fetch the next finished job to be >>>> destroyed >>>> * >>>> * @sched: scheduler instance >>>> * >>>> - * Remove all finished jobs from the mirror list and destroy them. >>>> + * Returns the next finished job from the mirror list (if there is >>>> one) >>>> + * ready for it to be destroyed. >>>> */ >>>> -static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched) >>>> +static struct drm_sched_job * >>>> +drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched) >>>> { >>>> + struct drm_sched_job *job = NULL; >>>> unsigned long flags; >>>> /* Don't destroy jobs while the timeout worker is running */ >>>> if (sched->timeout != MAX_SCHEDULE_TIMEOUT && >>>> !cancel_delayed_work(&sched->work_tdr)) >>>> - return; >>>> - >>>> - >>>> - while (!list_empty(&sched->ring_mirror_list)) { >>>> - struct drm_sched_job *job; >>>> + return NULL; >>>> - job = list_first_entry(&sched->ring_mirror_list, >>>> + job = list_first_entry_or_null(&sched->ring_mirror_list, >>>> struct drm_sched_job, node); >>>> - if (!dma_fence_is_signaled(&job->s_fence->finished)) >>>> - break; >>>> - spin_lock_irqsave(&sched->job_list_lock, flags); >>>> + spin_lock_irqsave(&sched->job_list_lock, flags); >>>> + >>>> + if (job && dma_fence_is_signaled(&job->s_fence->finished)) { >>>> /* remove job from ring_mirror_list */ >>>> list_del_init(&job->node); >>>> - spin_unlock_irqrestore(&sched->job_list_lock, flags); >>>> - >>>> - sched->ops->free_job(job); >>>> + } else { >>>> + job = NULL; >>>> + /* queue timeout for next job */ >>>> + drm_sched_start_timeout(sched); >>>> } >>>> - /* queue timeout for next job */ >>>> - spin_lock_irqsave(&sched->job_list_lock, flags); >>>> - drm_sched_start_timeout(sched); >>>> spin_unlock_irqrestore(&sched->job_list_lock, flags); >>>> + return job; >>>> } >>>> /** >>>> @@ -698,12 +696,18 @@ static int drm_sched_main(void *param) >>>> struct drm_sched_fence *s_fence; >>>> struct drm_sched_job *sched_job; >>>> struct dma_fence *fence; >>>> + struct drm_sched_job *cleanup_job = NULL; >>>> wait_event_interruptible(sc
Re: [PATCH] drm: Don't free jobs in wait_event_interruptible()
On 9/25/19 12:00 PM, Steven Price wrote: > On 25/09/2019 16:56, Grodzovsky, Andrey wrote: >> On 9/25/19 11:14 AM, Steven Price wrote: >> >>> drm_sched_cleanup_jobs() attempts to free finished jobs, however because >>> it is called as the condition of wait_event_interruptible() it must not >>> sleep. Unfortunately some free callbacks (notably for Panfrost) do sleep. >>> >>> Instead let's rename drm_sched_cleanup_jobs() to >>> drm_sched_get_cleanup_job() and simply return a job for processing if >>> there is one. The caller can then call the free_job() callback outside >>> the wait_event_interruptible() where sleeping is possible before >>> re-checking and returning to sleep if necessary. >>> >>> Signed-off-by: Steven Price >>> --- >>>drivers/gpu/drm/scheduler/sched_main.c | 44 ++ >>>1 file changed, 24 insertions(+), 20 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c >>> b/drivers/gpu/drm/scheduler/sched_main.c >>> index 9a0ee74d82dc..0ed4aaa4e6d1 100644 >>> --- a/drivers/gpu/drm/scheduler/sched_main.c >>> +++ b/drivers/gpu/drm/scheduler/sched_main.c >>> @@ -622,43 +622,41 @@ static void drm_sched_process_job(struct dma_fence >>> *f, struct dma_fence_cb *cb) >>>} >>> >>>/** >>> - * drm_sched_cleanup_jobs - destroy finished jobs >>> + * drm_sched_get_cleanup_job - fetch the next finished job to be destroyed >>> * >>> * @sched: scheduler instance >>> * >>> - * Remove all finished jobs from the mirror list and destroy them. >>> + * Returns the next finished job from the mirror list (if there is one) >>> + * ready for it to be destroyed. >>> */ >>> -static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched) >>> +static struct drm_sched_job * >>> +drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched) >>>{ >>> + struct drm_sched_job *job = NULL; >>> unsigned long flags; >>> >>> /* Don't destroy jobs while the timeout worker is running */ >>> if (sched->timeout != MAX_SCHEDULE_TIMEOUT && >>> !cancel_delayed_work(&sched->work_tdr)) >>> - return; >>> - >>> - >>> - while (!list_empty(&sched->ring_mirror_list)) { >>> - struct drm_sched_job *job; >>> + return NULL; >>> >>> - job = list_first_entry(&sched->ring_mirror_list, >>> + job = list_first_entry_or_null(&sched->ring_mirror_list, >>>struct drm_sched_job, node); >>> - if (!dma_fence_is_signaled(&job->s_fence->finished)) >>> - break; >>> >>> - spin_lock_irqsave(&sched->job_list_lock, flags); >>> + spin_lock_irqsave(&sched->job_list_lock, flags); >>> + >>> + if (job && dma_fence_is_signaled(&job->s_fence->finished)) { >>> /* remove job from ring_mirror_list */ >>> list_del_init(&job->node); >>> - spin_unlock_irqrestore(&sched->job_list_lock, flags); >>> - >>> - sched->ops->free_job(job); >>> + } else { >>> + job = NULL; >>> + /* queue timeout for next job */ >>> + drm_sched_start_timeout(sched); >>> } >>> >>> - /* queue timeout for next job */ >>> - spin_lock_irqsave(&sched->job_list_lock, flags); >>> - drm_sched_start_timeout(sched); >>> spin_unlock_irqrestore(&sched->job_list_lock, flags); >>> >>> + return job; >>>} >>> >>>/** >>> @@ -698,12 +696,18 @@ static int drm_sched_main(void *param) >>> struct drm_sched_fence *s_fence; >>> struct drm_sched_job *sched_job; >>> struct dma_fence *fence; >>> + struct drm_sched_job *cleanup_job = NULL; >>> >>> wait_event_interruptible(sched->wake_up_worker, >>> -(drm_sched_cleanup_jobs(sched), >>> +(cleanup_job = >>> drm_sched_get_cleanup_job(sched)) || >>> (!drm_sched_blocked(sched) && >>>
Re: [PATCH] drm: Don't free jobs in wait_event_interruptible()
On 9/25/19 11:14 AM, Steven Price wrote: > drm_sched_cleanup_jobs() attempts to free finished jobs, however because > it is called as the condition of wait_event_interruptible() it must not > sleep. Unfortunately some free callbacks (notably for Panfrost) do sleep. > > Instead let's rename drm_sched_cleanup_jobs() to > drm_sched_get_cleanup_job() and simply return a job for processing if > there is one. The caller can then call the free_job() callback outside > the wait_event_interruptible() where sleeping is possible before > re-checking and returning to sleep if necessary. > > Signed-off-by: Steven Price > --- > drivers/gpu/drm/scheduler/sched_main.c | 44 ++ > 1 file changed, 24 insertions(+), 20 deletions(-) > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c > b/drivers/gpu/drm/scheduler/sched_main.c > index 9a0ee74d82dc..0ed4aaa4e6d1 100644 > --- a/drivers/gpu/drm/scheduler/sched_main.c > +++ b/drivers/gpu/drm/scheduler/sched_main.c > @@ -622,43 +622,41 @@ static void drm_sched_process_job(struct dma_fence *f, > struct dma_fence_cb *cb) > } > > /** > - * drm_sched_cleanup_jobs - destroy finished jobs > + * drm_sched_get_cleanup_job - fetch the next finished job to be destroyed >* >* @sched: scheduler instance >* > - * Remove all finished jobs from the mirror list and destroy them. > + * Returns the next finished job from the mirror list (if there is one) > + * ready for it to be destroyed. >*/ > -static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched) > +static struct drm_sched_job * > +drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched) > { > + struct drm_sched_job *job = NULL; > unsigned long flags; > > /* Don't destroy jobs while the timeout worker is running */ > if (sched->timeout != MAX_SCHEDULE_TIMEOUT && > !cancel_delayed_work(&sched->work_tdr)) > - return; > - > - > - while (!list_empty(&sched->ring_mirror_list)) { > - struct drm_sched_job *job; > + return NULL; > > - job = list_first_entry(&sched->ring_mirror_list, > + job = list_first_entry_or_null(&sched->ring_mirror_list, > struct drm_sched_job, node); > - if (!dma_fence_is_signaled(&job->s_fence->finished)) > - break; > > - spin_lock_irqsave(&sched->job_list_lock, flags); > + spin_lock_irqsave(&sched->job_list_lock, flags); > + > + if (job && dma_fence_is_signaled(&job->s_fence->finished)) { > /* remove job from ring_mirror_list */ > list_del_init(&job->node); > - spin_unlock_irqrestore(&sched->job_list_lock, flags); > - > - sched->ops->free_job(job); > + } else { > + job = NULL; > + /* queue timeout for next job */ > + drm_sched_start_timeout(sched); > } > > - /* queue timeout for next job */ > - spin_lock_irqsave(&sched->job_list_lock, flags); > - drm_sched_start_timeout(sched); > spin_unlock_irqrestore(&sched->job_list_lock, flags); > > + return job; > } > > /** > @@ -698,12 +696,18 @@ static int drm_sched_main(void *param) > struct drm_sched_fence *s_fence; > struct drm_sched_job *sched_job; > struct dma_fence *fence; > + struct drm_sched_job *cleanup_job = NULL; > > wait_event_interruptible(sched->wake_up_worker, > - (drm_sched_cleanup_jobs(sched), > + (cleanup_job = > drm_sched_get_cleanup_job(sched)) || >(!drm_sched_blocked(sched) && > (entity = > drm_sched_select_entity(sched))) || > - kthread_should_stop())); > + kthread_should_stop()); Can't we just call drm_sched_cleanup_jobs right here, remove all the conditions in wait_event_interruptible (make it always true) and after drm_sched_cleanup_jobs is called test for all those conditions and return to sleep if they evaluate to false ? 
drm_sched_cleanup_jobs is called unconditionally inside wait_event_interruptible anyway... This is more of a question to Christian. Andrey > + > + while (cleanup_job) { > + sched->ops->free_job(cleanup_job); > + cleanup_job = drm_sched_get_cleanup_job(sched); > + } > > if (!entity) > continue;
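For background on why the cleanup has to stay a non-sleeping condition: wait_event_interruptible(wq, cond) expands to roughly the loop below (a simplified model of the <linux/wait.h> macro, which differs in detail):

    DEFINE_WAIT(wait);
    int ret = 0;

    for (;;) {
            prepare_to_wait(&wq, &wait, TASK_INTERRUPTIBLE);
            if (cond)       /* re-evaluated on every wakeup: must not sleep */
                    break;
            if (signal_pending(current)) {
                    ret = -ERESTARTSYS;
                    break;
            }
            schedule();     /* the only place the thread actually sleeps */
    }
    finish_wait(&wq, &wait);

The condition runs after prepare_to_wait() has set the task state, so a condition that sleeps clobbers that state and risks missed wakeups; and making the condition always true, as suggested above, would turn the construct into a pass-through, leaving the sleep and re-check logic to be reimplemented by hand around it.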
Re: [PATCH] drm/scheduler: use job count instead of peek
Acked-by: Andrey Grodzovsky Andrey On 8/9/19 11:31 AM, Christian König wrote: > The spsc_queue_peek function is accessing queue->head which belongs to > the consumer thread and shouldn't be accessed by the producer. > > This is fixing a rare race condition when destroying entities. > > Signed-off-by: Christian König > --- > drivers/gpu/drm/scheduler/sched_entity.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/scheduler/sched_entity.c > b/drivers/gpu/drm/scheduler/sched_entity.c > index 35ddbec1375a..671c90f34ede 100644 > --- a/drivers/gpu/drm/scheduler/sched_entity.c > +++ b/drivers/gpu/drm/scheduler/sched_entity.c > @@ -95,7 +95,7 @@ static bool drm_sched_entity_is_idle(struct > drm_sched_entity *entity) > rmb(); /* for list_empty to work without lock */ > > if (list_empty(&entity->list) || > - spsc_queue_peek(&entity->job_queue) == NULL) > + spsc_queue_count(&entity->job_queue) == 0) > return true; > > return false; > @@ -281,7 +281,7 @@ void drm_sched_entity_fini(struct drm_sched_entity > *entity) > /* Consumption of existing IBs wasn't completed. Forcefully >* remove them here. >*/ > - if (spsc_queue_peek(&entity->job_queue)) { > + if (spsc_queue_count(&entity->job_queue)) { > if (sched) { > /* Park the kernel for a moment to make sure it isn't > processing >* our entity.
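For context, the queue in question is the scheduler's single-producer single-consumer job queue; reduced to the pieces that matter here it looks like this (condensed and lightly paraphrased from include/drm/spsc_queue.h):

    struct spsc_queue {
            struct spsc_node *head;     /* followed by the consumer side */
            atomic_long_t tail;         /* advanced by the producer side */
            atomic_t job_count;         /* shared, updated atomically */
    };

    /* Only safe on the consumer thread: reads the unsynchronized head. */
    static inline struct spsc_node *spsc_queue_peek(struct spsc_queue *queue)
    {
            return queue->head;
    }

    /* Safe from either side: a plain atomic read. */
    static inline int spsc_queue_count(struct spsc_queue *queue)
    {
            return atomic_read(&queue->job_count);
    }

drm_sched_entity_is_idle() and drm_sched_entity_fini() run on the producer side of the entity, hence the switch from the peek to the count.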
Re: [PATCH] drm/scheduler: put killed job cleanup to worker
On 7/3/19 10:53 AM, Lucas Stach wrote: > On Wednesday, 03.07.2019, 14:41, Grodzovsky, Andrey wrote: >> On 7/3/19 10:32 AM, Lucas Stach wrote: >>> On Wednesday, 03.07.2019, 14:23, Grodzovsky, Andrey wrote: >>>> On 7/3/19 6:28 AM, Lucas Stach wrote: >>>>> drm_sched_entity_kill_jobs_cb() is called right from the last scheduled >>>>> job finished fence signaling. As this might happen from IRQ context we >>>>> now end up calling the GPU driver free_job callback in IRQ context, while >>>>> all other paths call it from normal process context. >>>>> >>>>> Etnaviv in particular calls core kernel functions that are only valid to >>>>> be called from process context when freeing the job. Other drivers might >>>>> have similar issues, but I did not validate this. Fix this by punting >>>>> the cleanup work into a work item, so the driver expectations are met. >>>>> >>>>>>> Signed-off-by: Lucas Stach >>>>> --- > [...] > >>>> I rechecked the latest code and finish_work was removed in ffae3e5 >>>> 'drm/scheduler: rework job destruction' >>> Aw, thanks. Seems this patch was stuck for a bit too long in my >>> outgoing queue. I've just checked the commit you pointed out, it should >>> also fix the issue that this patch was trying to fix. >> >> Not sure about this, as your patch only concerns the use case of cleaning >> unfinished jobs for an entity being destroyed. > AFAICS after ffae3e5 all the free_job invocations are done from process > context, so things should work for etnaviv. > > Regards, > Lucas Actually, for jobs that were never submitted to HW your change makes sense, as those will still get cleaned from IRQ context when entity->last_scheduled signals. Andrey
Re: [PATCH] drm/scheduler: put killed job cleanup to worker
On 7/3/19 10:32 AM, Lucas Stach wrote: > On Wednesday, 03.07.2019, 14:23, Grodzovsky, Andrey wrote: >> On 7/3/19 6:28 AM, Lucas Stach wrote: >>> drm_sched_entity_kill_jobs_cb() is called right from the last scheduled >>> job finished fence signaling. As this might happen from IRQ context we >>> now end up calling the GPU driver free_job callback in IRQ context, while >>> all other paths call it from normal process context. >>> >>> Etnaviv in particular calls core kernel functions that are only valid to >>> be called from process context when freeing the job. Other drivers might >>> have similar issues, but I did not validate this. Fix this by punting >>> the cleanup work into a work item, so the driver expectations are met. >>> >>>>> Signed-off-by: Lucas Stach >>> --- >>> drivers/gpu/drm/scheduler/sched_entity.c | 28 ++-- >>> 1 file changed, 17 insertions(+), 11 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c >>> b/drivers/gpu/drm/scheduler/sched_entity.c >>> index 35ddbec1375a..ba4eb66784b9 100644 >>> --- a/drivers/gpu/drm/scheduler/sched_entity.c >>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c >>> @@ -202,23 +202,23 @@ long drm_sched_entity_flush(struct drm_sched_entity >>> *entity, long timeout) >>> } >>> EXPORT_SYMBOL(drm_sched_entity_flush); >>> >>> -/** >>> - * drm_sched_entity_kill_jobs - helper for drm_sched_entity_kill_jobs >>> - * >>> - * @f: signaled fence >>> - * @cb: our callback structure >>> - * >>> - * Signal the scheduler finished fence when the entity in question is >>> killed. >>> - */ >>> +static void drm_sched_entity_kill_work(struct work_struct *work) >>> +{ >>>>> + struct drm_sched_job *job = container_of(work, struct drm_sched_job, >>>>> + finish_work); >>> + >>>>> + drm_sched_fence_finished(job->s_fence); >>>>> + WARN_ON(job->s_fence->parent); >>>>> + job->sched->ops->free_job(job); >>> +} >>> + >>> static void drm_sched_entity_kill_jobs_cb(struct dma_fence *f, >>>>> struct dma_fence_cb *cb) >>> { >>>>> struct drm_sched_job *job = container_of(cb, struct >>>>> drm_sched_job, >>>>> finish_cb); >>> >>>>> - drm_sched_fence_finished(job->s_fence); >>>>> - WARN_ON(job->s_fence->parent); >>>>> - job->sched->ops->free_job(job); >>>>> + schedule_work(&job->finish_work); >>> } >>> >>> /** >>> @@ -240,6 +240,12 @@ static void drm_sched_entity_kill_jobs(struct >>> drm_sched_entity *entity) >>>>> drm_sched_fence_scheduled(s_fence); >>>>> dma_fence_set_error(&s_fence->finished, -ESRCH); >>> >>>>> + /* >>>>> + * Replace regular finish work function with one that just >>>>> + * kills the job. >>>>> + */ >>> + job->finish_work.func = drm_sched_entity_kill_work; >> >> I rechecked the latest code and finish_work was removed in ffae3e5 >> 'drm/scheduler: rework job destruction' > Aw, thanks. Seems this patch was stuck for a bit too long in my > outgoing queue. I've just checked the commit you pointed out, it should > also fix the issue that this patch was trying to fix. Not sure about this, as your patch only concerns the use case of cleaning unfinished jobs for an entity being destroyed. Andrey > > Regards, > Lucas
Re: [PATCH] drm/scheduler: put killed job cleanup to worker
On 7/3/19 6:28 AM, Lucas Stach wrote: > drm_sched_entity_kill_jobs_cb() is called right from the last scheduled > job finished fence signaling. As this might happen from IRQ context we > now end up calling the GPU driver free_job callback in IRQ context, while > all other paths call it from normal process context. > > Etnaviv in particular calls core kernel functions that are only valid to > be called from process context when freeing the job. Other drivers might > have similar issues, but I did not validate this. Fix this by punting > the cleanup work into a work item, so the driver expectations are met. > > Signed-off-by: Lucas Stach > --- > drivers/gpu/drm/scheduler/sched_entity.c | 28 ++-- > 1 file changed, 17 insertions(+), 11 deletions(-) > > diff --git a/drivers/gpu/drm/scheduler/sched_entity.c > b/drivers/gpu/drm/scheduler/sched_entity.c > index 35ddbec1375a..ba4eb66784b9 100644 > --- a/drivers/gpu/drm/scheduler/sched_entity.c > +++ b/drivers/gpu/drm/scheduler/sched_entity.c > @@ -202,23 +202,23 @@ long drm_sched_entity_flush(struct drm_sched_entity > *entity, long timeout) > } > EXPORT_SYMBOL(drm_sched_entity_flush); > > -/** > - * drm_sched_entity_kill_jobs - helper for drm_sched_entity_kill_jobs > - * > - * @f: signaled fence > - * @cb: our callback structure > - * > - * Signal the scheduler finished fence when the entity in question is killed. > - */ > +static void drm_sched_entity_kill_work(struct work_struct *work) > +{ > + struct drm_sched_job *job = container_of(work, struct drm_sched_job, > + finish_work); > + > + drm_sched_fence_finished(job->s_fence); > + WARN_ON(job->s_fence->parent); > + job->sched->ops->free_job(job); > +} > + > static void drm_sched_entity_kill_jobs_cb(struct dma_fence *f, > struct dma_fence_cb *cb) > { > struct drm_sched_job *job = container_of(cb, struct drm_sched_job, >finish_cb); > > - drm_sched_fence_finished(job->s_fence); > - WARN_ON(job->s_fence->parent); > - job->sched->ops->free_job(job); > + schedule_work(&job->finish_work); > } > > /** > @@ -240,6 +240,12 @@ static void drm_sched_entity_kill_jobs(struct > drm_sched_entity *entity) > drm_sched_fence_scheduled(s_fence); > dma_fence_set_error(&s_fence->finished, -ESRCH); > > + /* > + * Replace regular finish work function with one that just > + * kills the job. > + */ > + job->finish_work.func = drm_sched_entity_kill_work; I rechecked the latest code and finish_work was removed in ffae3e5 'drm/scheduler: rework job destruction' Andrey > + > /* >* When pipe is hanged by older entity, new entity might >* not even have chance to submit its first job to HW
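Abstracted from the patch, this is the classic "punt from fence callback to process context" pattern. A minimal self-contained sketch (kill_ctx and both helper names are hypothetical, invented for illustration; they are not part of the scheduler):

    #include <drm/gpu_scheduler.h>
    #include <linux/dma-fence.h>
    #include <linux/workqueue.h>
    #include <linux/slab.h>

    struct kill_ctx {
            struct dma_fence_cb cb;
            struct work_struct work;
            struct drm_sched_job *job;
    };

    static void kill_work_fn(struct work_struct *work)
    {
            struct kill_ctx *ctx = container_of(work, struct kill_ctx, work);

            /* Process context: free_job() may take mutexes, sleep, etc. */
            ctx->job->sched->ops->free_job(ctx->job);
            kfree(ctx);
    }

    static void kill_fence_cb(struct dma_fence *f, struct dma_fence_cb *cb)
    {
            struct kill_ctx *ctx = container_of(cb, struct kill_ctx, cb);

            /* Possibly IRQ context: nothing here may sleep, so punt. */
            INIT_WORK(&ctx->work, kill_work_fn);
            schedule_work(&ctx->work);
    }

with ctx->job filled in before dma_fence_add_callback(fence, &ctx->cb, kill_fence_cb). The patch avoids the extra allocation by reusing the job's embedded finish_work, which is also why it became moot once ffae3e5 removed that field.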
Re: [PATCH] drm/sched: Fix make htmldocs warnings.
On 6/3/19 3:24 AM, Daniel Vetter wrote: > On Thu, May 30, 2019 at 05:04:20PM +0200, Christian König wrote: >> On 29.05.19 at 21:36, Daniel Vetter wrote: >>> On Wed, May 29, 2019 at 04:43:45PM +, Grodzovsky, Andrey wrote: >>>> I don't, sorry. >>> Should we fix that? Seems like you do plenty of scheduler stuff, so would >>> make sense I guess ... >> Reviewed-by: Christian König for the patch. >> >> And +1 for giving Andrey commit rights to drm-misc-next. > If Andrey is on board too, pls file an issue with the legacy ssh account > requests template here: > https://gitlab.freedesktop.org/freedesktop/freedesktop/ > > And then ping on irc or here so drm-misc folks can ack&forward. > -Daniel Here is the ticket https://gitlab.freedesktop.org/freedesktop/freedesktop/issues/152 Andrey > >> Christian. >> >>> -Daniel >>> >>>> Andrey >>>> >>>> On 5/29/19 12:42 PM, Alex Deucher wrote: >>>>> On Wed, May 29, 2019 at 10:29 AM Andrey Grodzovsky >>>>> wrote: >>>>>> Signed-off-by: Andrey Grodzovsky >>>>> Reviewed-by: Alex Deucher >>>>> >>>>> I'll push it to drm-misc in a minute unless you have commit rights. >>>>> >>>>> Alex >>>>> >>>>>> --- >>>>>> drivers/gpu/drm/scheduler/sched_main.c | 2 ++ >>>>>> 1 file changed, 2 insertions(+) >>>>>> >>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c >>>>>> b/drivers/gpu/drm/scheduler/sched_main.c >>>>>> index 49e7d07..c1058ee 100644 >>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c >>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c >>>>>> @@ -353,6 +353,7 @@ EXPORT_SYMBOL(drm_sched_increase_karma); >>>>>> * drm_sched_stop - stop the scheduler >>>>>> * >>>>>> * @sched: scheduler instance >>>>>> + * @bad: job which caused the time out >>>>>> * >>>>>> * Stop the scheduler and also removes and frees all completed jobs. >>>>>> * Note: bad job will not be freed as it might be used later and so >>>>>> it's >>>>>> @@ -422,6 +423,7 @@ EXPORT_SYMBOL(drm_sched_stop); >>>>>> * drm_sched_job_recovery - recover jobs after a reset >>>>>> * >>>>>> * @sched: scheduler instance >>>>>> + * @full_recovery: proceed with complete sched restart >>>>>> * >>>>>> */ >>>>>> void drm_sched_start(struct drm_gpu_scheduler *sched, bool >>>>>> full_recovery) >>>>>> -- >>>>>> 2.7.4
Re: [PATCH] drm/sched: Fix make htmldocs warnings.
I don't, sorry. Andrey On 5/29/19 12:42 PM, Alex Deucher wrote: > On Wed, May 29, 2019 at 10:29 AM Andrey Grodzovsky > wrote: >> Signed-off-by: Andrey Grodzovsky > Reviewed-by: Alex Deucher > > I'll push it to drm-misc in a minute unless you have commit rights. > > Alex > >> --- >> drivers/gpu/drm/scheduler/sched_main.c | 2 ++ >> 1 file changed, 2 insertions(+) >> >> diff --git a/drivers/gpu/drm/scheduler/sched_main.c >> b/drivers/gpu/drm/scheduler/sched_main.c >> index 49e7d07..c1058ee 100644 >> --- a/drivers/gpu/drm/scheduler/sched_main.c >> +++ b/drivers/gpu/drm/scheduler/sched_main.c >> @@ -353,6 +353,7 @@ EXPORT_SYMBOL(drm_sched_increase_karma); >>* drm_sched_stop - stop the scheduler >>* >>* @sched: scheduler instance >> + * @bad: job which caused the time out >>* >>* Stop the scheduler and also removes and frees all completed jobs. >>* Note: bad job will not be freed as it might be used later and so it's >> @@ -422,6 +423,7 @@ EXPORT_SYMBOL(drm_sched_stop); >>* drm_sched_job_recovery - recover jobs after a reset >>* >>* @sched: scheduler instance >> + * @full_recovery: proceed with complete sched restart >>* >>*/ >> void drm_sched_start(struct drm_gpu_scheduler *sched, bool full_recovery) >> -- >> 2.7.4
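For reference, the kernel-doc shape these warnings enforce: every parameter in the signature needs a matching @name line in the comment block, e.g. (mirroring the hunks above):

    /**
     * drm_sched_stop - stop the scheduler
     *
     * @sched: scheduler instance
     * @bad: job which caused the time out
     *
     * Stop the scheduler and also removes and frees all completed jobs.
     */
    void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad);

'make htmldocs' flags any documented function whose parameters lack such lines, which is exactly what the two added lines in the patch supply.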
Re: [bug report] drm/scheduler: rework job destruction
Thanks for letting me know, I will send a fix soon. Andrey On 5/22/19 9:07 AM, Dan Carpenter wrote: > Hello Christian König, > > The patch 5918045c4ed4: "drm/scheduler: rework job destruction" from > Apr 18, 2019, leads to the following static checker warning: > > drivers/gpu/drm/scheduler/sched_main.c:297 drm_sched_job_timedout() > error: potential NULL dereference 'job'. > > drivers/gpu/drm/scheduler/sched_main.c > 279 static void drm_sched_job_timedout(struct work_struct *work) > 280 { > 281 struct drm_gpu_scheduler *sched; > 282 struct drm_sched_job *job; > 283 unsigned long flags; > 284 > 285 sched = container_of(work, struct drm_gpu_scheduler, > work_tdr.work); > 286 job = list_first_entry_or_null(&sched->ring_mirror_list, > 287 struct drm_sched_job, node); > 288 > 289 if (job) > ^^^ > We assume that job can be NULL. > > 290 job->sched->ops->timedout_job(job); > 291 > 292 /* > 293 * Guilty job did complete and hence needs to be manually > removed > 294 * See drm_sched_stop doc. > 295 */ > 296 if (sched->free_guilty) { > > Originally (last week) this check was "if (list_empty(&job->node))" > which is obviously problematic if job is NULL. It's not clear to me > that this new check ensures that job is non-NULL either. > > 297 job->sched->ops->free_job(job); > ^ > Dereference. > > 298 sched->free_guilty = false; > 299 } > 300 > 301 spin_lock_irqsave(&sched->job_list_lock, flags); > 302 drm_sched_start_timeout(sched); > 303 spin_unlock_irqrestore(&sched->job_list_lock, flags); > 304 } > > regards, > dan carpenter
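One shape of the fix is simply to keep every dereference under the existing NULL check; a sketch of the reworked handler (an editorial reconstruction based on the listing above, not necessarily the fix that was eventually merged):

    static void drm_sched_job_timedout(struct work_struct *work)
    {
            struct drm_gpu_scheduler *sched;
            struct drm_sched_job *job;
            unsigned long flags;

            sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
            job = list_first_entry_or_null(&sched->ring_mirror_list,
                                           struct drm_sched_job, node);

            if (job) {
                    job->sched->ops->timedout_job(job);

                    /*
                     * Guilty job did complete and hence needs to be
                     * manually removed. See drm_sched_stop doc.
                     */
                    if (sched->free_guilty) {
                            job->sched->ops->free_job(job);
                            sched->free_guilty = false;
                    }
            }

            spin_lock_irqsave(&sched->job_list_lock, flags);
            drm_sched_start_timeout(sched);
            spin_unlock_irqrestore(&sched->job_list_lock, flags);
    }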
Re: lima_bo memory leak after drm_sched job destruction rework
Don't have the code in front of me now, but as far as I remember it will only prematurely terminate in drm_sched_cleanup_jobs if there is timeout work in progress, which would not be the case if nothing hangs. Andrey From: Erico Nunes Sent: 17 May 2019 17:42:48 To: Grodzovsky, Andrey Cc: Deucher, Alexander; Koenig, Christian; Zhou, David(ChunMing); David Airlie; Daniel Vetter; Lucas Stach; Russell King; Christian Gmeiner; Qiang Yu; Rob Herring; Tomeu Vizoso; Eric Anholt; Rex Zhu; Huang, Ray; Deng, Emily; Nayan Deshmukh; Sharat Masetty; amd-...@lists.freedesktop.org; dri-devel@lists.freedesktop.org; l...@lists.freedesktop.org Subject: Re: lima_bo memory leak after drm_sched job destruction rework On Fri, May 17, 2019 at 10:43 PM Grodzovsky, Andrey wrote: > On 5/17/19 3:35 PM, Erico Nunes wrote: > > Lima currently defaults to an "infinite" timeout. Setting a 500ms > > default timeout like most other drm_sched users do fixed the leak for > > me. > > I am not very clear about the problem - so you basically never allow a > time out handler to run ? And then when the job hangs for ever you get > this memory leak ? How it worked for you before this refactoring ? As > far as I remember sched->ops->free_job before this change was called > from drm_sched_job_finish which is the work scheduled from HW fence > signaled callback - drm_sched_process_job so if your job hangs for ever > anyway and this work is never scheduled when your free_job callback was > called ? In this particular case, the jobs run successfully, nothing hangs. Lima currently specifies an "infinite" timeout to the drm scheduler, so if a job did hang, it would hang forever, I suppose. But this is not the issue. If I understand correctly it worked well before the rework because the cleanup was triggered at the end of drm_sched_process_job independently of the timeout. What I'm observing now is that even when jobs run successfully, they are not cleaned by the drm scheduler because drm_sched_cleanup_jobs seems to give up based on the status of a timeout worker. I would expect the timeout value to only be relevant in error/hung job cases. I will probably set the timeout to a reasonable value anyway, I just posted here to report that this can possibly be a bug in the drm scheduler after that rework.
Re: lima_bo memory leak after drm_sched job destruction rework
On 5/17/19 3:35 PM, Erico Nunes wrote: > Hello, > > I have recently observed a memory leak issue with lima using > drm-misc-next, which I initially reported here: > https://gitlab.freedesktop.org/lima/linux/issues/24 > It is an easily reproducible memory leak which I was able to bisect to > commit: > > 5918045c4ed4 drm/scheduler: rework job destruction > > After some investigation, it seems that after the refactor, > sched->ops->free_job (in lima: lima_sched_free_job) is no longer > called. > With some more debugging I found that it is not being called because > the job freeing is now in drm_sched_cleanup_jobs, which for lima > always aborts in the initial "Don't destroy jobs while the timeout > worker is running" condition. > > Lima currently defaults to an "infinite" timeout. Setting a 500ms > default timeout like most other drm_sched users do fixed the leak for > me. I am not very clear about the problem - so you basically never allow a time out handler to run ? And then when the job hangs for ever you get this memory leak ? How did it work for you before this refactoring ? As far as I remember, sched->ops->free_job before this change was called from drm_sched_job_finish, which is the work scheduled from the HW fence signaled callback - drm_sched_process_job - so if your job hangs forever anyway and this work is never scheduled, when was your free_job callback called ? > > I can send a patch to set a 500ms timeout and have it probably working > again, but I am wondering now if this is expected behaviour for > drm_sched after the refactor. > In particular I also noticed that drm_sched_suspend_timeout is not > called anywhere. Is it expected that we now rely on a timeout > parameter to cleanup jobs that ran successfully? AFAIK the drm_sched_suspend_timeout is used by a driver in a staging branch, Christian can give more detail. Andrey > > Erico
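Connecting this to the code quoted elsewhere in this archive: drm_sched_start_timeout() only queues work_tdr when sched->timeout != MAX_SCHEDULE_TIMEOUT, so with an "infinite" timeout there is never a pending work item, cancel_delayed_work() returns false, and drm_sched_cleanup_jobs() bails out on its first check every time. The September patches above already carry the extra guard that avoids exactly this:

    /* Don't destroy jobs while the timeout worker is running */
    if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
        !cancel_delayed_work(&sched->work_tdr))
            return;

With that guard, a scheduler that never arms the timer skips the cancel_delayed_work() test entirely instead of leaking every finished job; lima's switch to a finite default timeout sidesteps the problem from the other end.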
Re: [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already signaled.
Thanks David, with that only patches 5 and 6 are left for the series to be reviewed. Christian, any more comments on those patches ? Andrey On 4/27/19 10:56 PM, Zhou, David(ChunMing) wrote: Sorry, I can only put my Acked-by: Chunming Zhou <mailto:david1.z...@amd.com> on patch #3. I cannot fully judge patches #4, #5, #6. -David From: amd-gfx <mailto:amd-gfx-boun...@lists.freedesktop.org> On Behalf Of Grodzovsky, Andrey Sent: Friday, April 26, 2019 10:09 PM To: Koenig, Christian <mailto:christian.koe...@amd.com>; Zhou, David(ChunMing) <mailto:david1.z...@amd.com>; dri-devel@lists.freedesktop.org<mailto:dri-devel@lists.freedesktop.org>; amd-...@lists.freedesktop.org<mailto:amd-...@lists.freedesktop.org>; e...@anholt.net<mailto:e...@anholt.net>; etna...@lists.freedesktop.org<mailto:etna...@lists.freedesktop.org> Cc: Kazlauskas, Nicholas <mailto:nicholas.kazlaus...@amd.com>; Liu, Monk <mailto:monk@amd.com> Subject: Re: [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already signaled. Ping (mostly David and Monk). Andrey On 4/24/19 3:09 AM, Christian König wrote: On 24.04.19 at 05:02, Zhou, David(ChunMing) wrote: >> -drm_sched_stop(&ring->sched, &job->base); >> - >> /* after all hw jobs are reset, hw fence is meaningless, so >> force_completion */ >> amdgpu_fence_driver_force_completion(ring); >> } HW fences are already forced to completion, then we can just disable irq fence process and ignore hw fence signal when we are trying to do GPU reset, I think. Otherwise it will make the logic much more complex. If this situation happens because of long time execution, we can increase the timeout of reset detection. You are not thinking widely enough, forcing the hw fence to complete can trigger others to start other activity in the system. We first need to stop everything and make sure that we don't do any processing any more and then start with our reset procedure including forcing all hw fences to complete. Christian. -David From: amd-gfx <mailto:amd-gfx-boun...@lists.freedesktop.org> On Behalf Of Grodzovsky, Andrey Sent: Wednesday, April 24, 2019 12:00 AM To: Zhou, David(ChunMing) <mailto:david1.z...@amd.com>; dri-devel@lists.freedesktop.org<mailto:dri-devel@lists.freedesktop.org>; amd-...@lists.freedesktop.org<mailto:amd-...@lists.freedesktop.org>; e...@anholt.net<mailto:e...@anholt.net>; etna...@lists.freedesktop.org<mailto:etna...@lists.freedesktop.org>; ckoenig.leichtzumer...@gmail.com<mailto:ckoenig.leichtzumer...@gmail.com> Cc: Kazlauskas, Nicholas <mailto:nicholas.kazlaus...@amd.com>; Liu, Monk <mailto:monk@amd.com> Subject: Re: [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already signaled. No, I mean the actual HW fence which signals when the job finished execution on the HW. Andrey On 4/23/19 11:19 AM, Zhou, David(ChunMing) wrote: do you mean fence timer? why not stop it as well when stopping sched for the reason of hw reset? Original Message Subject: Re: [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already signaled. From: "Grodzovsky, Andrey" To: "Zhou, David(ChunMing)" ,dri-devel@lists.freedesktop.org,amd-...@lists.freedesktop.org,e...@anholt.net,etna...@lists.freedesktop.org,ckoenig.leichtzumer...@gmail.com<mailto:dri-devel@lists.freedesktop.org,amd-...@lists.freedesktop.org,e...@anholt.net,etna...@lists.freedesktop.org,ckoenig.leichtzumer...@gmail.com> CC: "Kazlauskas, Nicholas" ,"Liu, Monk" On 4/22/19 9:09 AM, Zhou, David(ChunMing) wrote: > +Monk. > > GPU reset is used widely in SRIOV, so need virtualization guys to take a look. 
> > But out of curiosity, why can the guilty job signal more if the job is already > set to guilty? Was it set wrongly? > > > -David It's possible that the job completes at a later time than when its timeout handler started processing, so in this patch we try to protect against this by rechecking the HW fence after stopping all SW schedulers. We do it BEFORE marking guilty on the job's sched_entity, so at the point we check, the guilty flag is not set yet. Andrey > > On 2019/4/18 23:00, Andrey Grodzovsky wrote: >> Also reject TDRs if another one is already running. >> >> v2: >> Stop all schedulers across device and entire XGMI hive before >> force signaling HW fences. >> Avoid passing job_signaled to helper functions to keep all the decision >> making about skipping HW reset in one place. >> >> v3: >> Fix SW sched. hang after non HW reset. sched.hw_rq_count has to be balanced >> against its decrement in drm_sched_stop in non HW reset case. >> v4: rebase >> v5: Revert v3 as we do it now in scheduler code. >
Re: [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already signaled.
Ping (mostly David and Monk). Andrey On 4/24/19 3:09 AM, Christian König wrote: On 24.04.19 at 05:02, Zhou, David(ChunMing) wrote: >> -drm_sched_stop(&ring->sched, &job->base); >> - >> /* after all hw jobs are reset, hw fence is meaningless, so >> force_completion */ >> amdgpu_fence_driver_force_completion(ring); >> } HW fences are already forced to completion, then we can just disable irq fence process and ignore hw fence signal when we are trying to do GPU reset, I think. Otherwise it will make the logic much more complex. If this situation happens because of long time execution, we can increase the timeout of reset detection. You are not thinking widely enough, forcing the hw fence to complete can trigger others to start other activity in the system. We first need to stop everything and make sure that we don't do any processing any more and then start with our reset procedure including forcing all hw fences to complete. Christian. -David From: amd-gfx <mailto:amd-gfx-boun...@lists.freedesktop.org> On Behalf Of Grodzovsky, Andrey Sent: Wednesday, April 24, 2019 12:00 AM To: Zhou, David(ChunMing) <mailto:david1.z...@amd.com>; dri-devel@lists.freedesktop.org<mailto:dri-devel@lists.freedesktop.org>; amd-...@lists.freedesktop.org<mailto:amd-...@lists.freedesktop.org>; e...@anholt.net<mailto:e...@anholt.net>; etna...@lists.freedesktop.org<mailto:etna...@lists.freedesktop.org>; ckoenig.leichtzumer...@gmail.com<mailto:ckoenig.leichtzumer...@gmail.com> Cc: Kazlauskas, Nicholas <mailto:nicholas.kazlaus...@amd.com>; Liu, Monk <mailto:monk@amd.com> Subject: Re: [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already signaled. No, I mean the actual HW fence which signals when the job finished execution on the HW. Andrey On 4/23/19 11:19 AM, Zhou, David(ChunMing) wrote: do you mean fence timer? why not stop it as well when stopping sched for the reason of hw reset? Original Message Subject: Re: [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already signaled. From: "Grodzovsky, Andrey" To: "Zhou, David(ChunMing)" ,dri-devel@lists.freedesktop.org,amd-...@lists.freedesktop.org,e...@anholt.net,etna...@lists.freedesktop.org,ckoenig.leichtzumer...@gmail.com<mailto:dri-devel@lists.freedesktop.org,amd-...@lists.freedesktop.org,e...@anholt.net,etna...@lists.freedesktop.org,ckoenig.leichtzumer...@gmail.com> CC: "Kazlauskas, Nicholas" ,"Liu, Monk" On 4/22/19 9:09 AM, Zhou, David(ChunMing) wrote: > +Monk. > > GPU reset is used widely in SRIOV, so need virtualization guys to take a look. > > But out of curiosity, why can the guilty job signal more if the job is already > set to guilty? Was it set wrongly? > > > -David It's possible that the job completes at a later time than when its timeout handler started processing, so in this patch we try to protect against this by rechecking the HW fence after stopping all SW schedulers. We do it BEFORE marking guilty on the job's sched_entity, so at the point we check, the guilty flag is not set yet. Andrey > > On 2019/4/18 23:00, Andrey Grodzovsky wrote: >> Also reject TDRs if another one is already running. >> >> v2: >> Stop all schedulers across device and entire XGMI hive before >> force signaling HW fences. >> Avoid passing job_signaled to helper functions to keep all the decision >> making about skipping HW reset in one place. >> >> v3: >> Fix SW sched. hang after non HW reset. sched.hw_rq_count has to be balanced >> against its decrement in drm_sched_stop in non HW reset case. >> v4: rebase >> v5: Revert v3 as we do it now in scheduler code. 
>> >> Signed-off-by: Andrey Grodzovsky >> <mailto:andrey.grodzov...@amd.com> >> --- >>drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 143 >> +++-- >>1 file changed, 95 insertions(+), 48 deletions(-) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> index a0e165c..85f8792 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> @@ -3334,8 +3334,6 @@ static int amdgpu_device_pre_asic_reset(struct >> amdgpu_device *adev, >> if (!ring || !ring->sched.thread) >> continue; >> >> -drm_sched_stop(&ring->sched, &job->base); >> - >> /* after all hw jobs are reset, hw fence is meaningless, so >> force_completion */ >> amdgpu_fence_driver_force_completion(ring); >> } >> @@ -3343,6 +
Re: [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already signaled.
No, I mean the actual HW fence which signals when the job finished execution on the HW. Andrey On 4/23/19 11:19 AM, Zhou, David(ChunMing) wrote: do you mean fence timer? why not stop it as well when stopping sched for the reason of hw reset? Original Message Subject: Re: [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already signaled. From: "Grodzovsky, Andrey" To: "Zhou, David(ChunMing)" ,dri-devel@lists.freedesktop.org,amd-...@lists.freedesktop.org,e...@anholt.net,etna...@lists.freedesktop.org,ckoenig.leichtzumer...@gmail.com<mailto:dri-devel@lists.freedesktop.org,amd-...@lists.freedesktop.org,e...@anholt.net,etna...@lists.freedesktop.org,ckoenig.leichtzumer...@gmail.com> CC: "Kazlauskas, Nicholas" ,"Liu, Monk" On 4/22/19 9:09 AM, Zhou, David(ChunMing) wrote: > +Monk. > > GPU reset is used widely in SRIOV, so need virtualization guys to take a look. > > But out of curiosity, why can the guilty job signal more if the job is already > set to guilty? Was it set wrongly? > > > -David It's possible that the job completes at a later time than when its timeout handler started processing, so in this patch we try to protect against this by rechecking the HW fence after stopping all SW schedulers. We do it BEFORE marking guilty on the job's sched_entity, so at the point we check, the guilty flag is not set yet. Andrey > > On 2019/4/18 23:00, Andrey Grodzovsky wrote: >> Also reject TDRs if another one is already running. >> >> v2: >> Stop all schedulers across device and entire XGMI hive before >> force signaling HW fences. >> Avoid passing job_signaled to helper functions to keep all the decision >> making about skipping HW reset in one place. >> >> v3: >> Fix SW sched. hang after non HW reset. sched.hw_rq_count has to be balanced >> against its decrement in drm_sched_stop in non HW reset case. >> v4: rebase >> v5: Revert v3 as we do it now in scheduler code. 
>> >> Signed-off-by: Andrey Grodzovsky >> <mailto:andrey.grodzov...@amd.com> >> --- >>drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 143 >> +++-- >>1 file changed, 95 insertions(+), 48 deletions(-) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> index a0e165c..85f8792 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> @@ -3334,8 +3334,6 @@ static int amdgpu_device_pre_asic_reset(struct >> amdgpu_device *adev, >> if (!ring || !ring->sched.thread) >> continue; >> >> -drm_sched_stop(&ring->sched, &job->base); >> - >> /* after all hw jobs are reset, hw fence is meaningless, so >> force_completion */ >> amdgpu_fence_driver_force_completion(ring); >> } >> @@ -3343,6 +3341,7 @@ static int amdgpu_device_pre_asic_reset(struct >> amdgpu_device *adev, >> if(job) >> drm_sched_increase_karma(&job->base); >> >> +/* Don't suspend on bare metal if we are not going to HW reset the ASIC >> */ >> if (!amdgpu_sriov_vf(adev)) { >> >> if (!need_full_reset) >> @@ -3480,37 +3479,21 @@ static int amdgpu_do_asic_reset(struct >> amdgpu_hive_info *hive, >> return r; >>} >> >> -static void amdgpu_device_post_asic_reset(struct amdgpu_device *adev) >> +static bool amdgpu_device_lock_adev(struct amdgpu_device *adev, bool >> trylock) >>{ >> -int i; >> - >> -for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { >> -struct amdgpu_ring *ring = adev->rings[i]; >> - >> -if (!ring || !ring->sched.thread) >> -continue; >> - >> -if (!adev->asic_reset_res) >> -drm_sched_resubmit_jobs(&ring->sched); >> +if (trylock) { >> +if (!mutex_trylock(&adev->lock_reset)) >> +return false; >> +} else >> +mutex_lock(&adev->lock_reset); >> >> -drm_sched_start(&ring->sched, !adev->asic_reset_res); >> -} >> - >> -if (!amdgpu_device_has_dc_support(adev)) { >> -drm_helper_resume_force_mode(adev->ddev); >> -} >> - >> -adev->asic_reset_res = 0; >> -} >> - >> -static void amdgpu_device_lock_adev(struct amdgpu_device *adev) >> -{ >> -mutex_lock(&adev
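The recheck being discussed, condensed to its core from the patch fragments above (illustrative only; error handling and the XGMI hive iteration are simplified away):

    bool job_signaled = false;
    int i;

    /* Stop all schedulers first so no new work reaches the HW. */
    for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
            struct amdgpu_ring *ring = adev->rings[i];

            if (!ring || !ring->sched.thread)
                    continue;

            drm_sched_stop(&ring->sched, job ? &job->base : NULL);
    }

    /*
     * The job may have completed between the timeout firing and the
     * schedulers stopping; look at the real HW fence before deciding
     * on a full ASIC reset.
     */
    if (job && job->base.s_fence->parent &&
        dma_fence_is_signaled(job->base.s_fence->parent)) {
            job_signaled = true;
            dev_info(adev->dev, "Guilty job already signaled, skipping HW reset");
    }

Because the check happens before the job's entity is marked guilty, a late-signaling job is treated as innocent, which is the point Andrey makes above.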
Re: [PATCH v5 4/6] drm/sched: Keep s_fence->parent pointer
On 4/22/19 8:59 AM, Zhou, David(ChunMing) wrote: > +Monk to respond to this patch. > > > On 2019/4/18 23:00, Andrey Grodzovsky wrote: >> For later driver's reference to see if the fence is signaled. >> >> v2: Move parent fence put to resubmit jobs. >> >> Signed-off-by: Andrey Grodzovsky >> Reviewed-by: Christian König >> --- >>drivers/gpu/drm/scheduler/sched_main.c | 11 +-- >>1 file changed, 9 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/gpu/drm/scheduler/sched_main.c >> b/drivers/gpu/drm/scheduler/sched_main.c >> index 7816de7..03e6bd8 100644 >> --- a/drivers/gpu/drm/scheduler/sched_main.c >> +++ b/drivers/gpu/drm/scheduler/sched_main.c >> @@ -375,8 +375,6 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, >> struct drm_sched_job *bad) >> if (s_job->s_fence->parent && >> dma_fence_remove_callback(s_job->s_fence->parent, >>&s_job->cb)) { >> -dma_fence_put(s_job->s_fence->parent); >> -s_job->s_fence->parent = NULL; > I vaguely remember Monk set parent to NULL to avoid a potential free > problem after callback removal. > > > -David I see, we have to avoid setting it to NULL here because, in case the guilty job does signal and we avoid the HW reset, we are not going to resubmit the jobs and hence stay with the same parent on reattachment of the cb. So I need to know exactly what scenario this set to NULL fixes. Andrey > > >> atomic_dec(&sched->hw_rq_count); >> } else { >> /* >> @@ -403,6 +401,14 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, >> struct drm_sched_job *bad) >> sched->ops->free_job(s_job); >> } >> } >> + >> +/* >> + * Stop pending timer in flight as we rearm it in drm_sched_start. This >> + * avoids the pending timeout work in progress to fire right away after >> + * this TDR finished and before the newly restarted jobs had a >> + * chance to complete. >> + */ >> +cancel_delayed_work(&sched->work_tdr); >>} >> >>EXPORT_SYMBOL(drm_sched_stop); >> @@ -477,6 +483,7 @@ void drm_sched_resubmit_jobs(struct drm_gpu_scheduler >> *sched) >> if (found_guilty && s_job->s_fence->scheduled.context == >> guilty_context) >> dma_fence_set_error(&s_fence->finished, -ECANCELED); >> >> +dma_fence_put(s_job->s_fence->parent); >> s_job->s_fence->parent = sched->ops->run_job(s_job); >> } >>}
Re: [PATCH v5 3/6] drm/scheduler: rework job destruction
On 4/23/19 10:44 AM, Zhou, David(ChunMing) wrote: This patch is to fix deadlock between fence->lock and sched->job_list_lock, right? So I suggest just moving list_del_init(&s_job->node) from drm_sched_process_job to the work thread. That will avoid the deadlock described in the link. Do you mean going back to scheduling the work from the HW fence interrupt handler and deleting there? Yes, I suggested this as an option (take a look at my comment 9 in https://bugs.freedesktop.org/show_bug.cgi?id=109692) but since we still have to wait for all fences in flight to signal to avoid the problem fixed in '3741540 drm/sched: Rework HW fence processing.' this thing becomes somewhat complicated, and so Christian came up with the core idea in this patch, which is to make all deletions/insertions thread-safe by guaranteeing they are always done from one thread. It does simplify the handling. Andrey Original Message Subject: Re: [PATCH v5 3/6] drm/scheduler: rework job destruction From: "Grodzovsky, Andrey" To: "Zhou, David(ChunMing)" ,dri-devel@lists.freedesktop.org,amd-...@lists.freedesktop.org,e...@anholt.net,etna...@lists.freedesktop.org,ckoenig.leichtzumer...@gmail.com CC: "Kazlauskas, Nicholas" ,"Koenig, Christian" On 4/22/19 8:48 AM, Chunming Zhou wrote: > Hi Andrey, > > static void drm_sched_process_job(struct dma_fence *f, struct > dma_fence_cb *cb) > { > ... > spin_lock_irqsave(&sched->job_list_lock, flags); > /* remove job from ring_mirror_list */ > list_del_init(&s_job->node); > spin_unlock_irqrestore(&sched->job_list_lock, flags); > [David] How about just moving the above out of the irq process into a worker? Any > problem? Maybe I missed your previous discussion, but I think removing > the lock for the list is a risk for future maintenance although you make sure > it is thread safe currently. > > -David We remove the lock exactly because of the fact that insertion and removal to/from the list will be done from exactly one thread at any time now. So I am not sure I understand what you mean. Andrey > > ... > > schedule_work(&s_job->finish_work); > } > > On 2019/4/18 23:00, Andrey Grodzovsky wrote: >> From: Christian König >> <mailto:christian.koe...@amd.com> >> >> We now destroy finished jobs from the worker thread to make sure that >> we never destroy a job currently in timeout processing. >> By this we avoid holding lock around ring mirror list in drm_sched_stop >> which should solve a deadlock reported by a user. >> >> v2: Remove unused variable. >> v4: Move guilty job free into sched code. >> v5: >> Move sched->hw_rq_count to drm_sched_start to account for counter >> decrement in drm_sched_stop even when we don't call resubmit jobs >> if guilty job did signal. 
>> >> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109692 >> >> Signed-off-by: Christian König >> <mailto:christian.koe...@amd.com> >> Signed-off-by: Andrey Grodzovsky >> <mailto:andrey.grodzov...@amd.com> >> --- >>drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 9 +- >>drivers/gpu/drm/etnaviv/etnaviv_dump.c | 4 - >>drivers/gpu/drm/etnaviv/etnaviv_sched.c| 2 +- >>drivers/gpu/drm/lima/lima_sched.c | 2 +- >>drivers/gpu/drm/panfrost/panfrost_job.c| 2 +- >>drivers/gpu/drm/scheduler/sched_main.c | 159 >> + >>drivers/gpu/drm/v3d/v3d_sched.c| 2 +- >>include/drm/gpu_scheduler.h| 6 +- >>8 files changed, 102 insertions(+), 84 deletions(-) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> index 7cee269..a0e165c 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> @@ -3334,7 +3334,7 @@ static int amdgpu_device_pre_asic_reset(struct >> amdgpu_device *adev, >>if (!ring || !ring->sched.thread) >>continue; >> >> - drm_sched_stop(&ring->sched); >> + drm_sched_stop(&ring->sched, &job->base); >> >>/* after all hw jobs are reset, hw fence is meaningless, so >> force_completion */ >>amdgpu_fence_driver_force_completion(ring); >> @@ -3343,8 +3343,6 @@ static int amdgpu_device_pre_asic_reset(struct >> amdgpu_device *adev, >>if(job) >>drm_sched_increase_karma(&am
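The invariant this thread converges on: the HW fence callback only signals the scheduler fence, and the scheduler thread is the sole place that unlinks and frees jobs from the ring mirror list. A hypothetical helper capturing that scheme (names are illustrative, loosely modeled on what the scheduler code later grew; not part of this patch):

	static struct drm_sched_job *
	drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
	{
		struct drm_sched_job *job;

		job = list_first_entry_or_null(&sched->ring_mirror_list,
					       struct drm_sched_job, node);

		/* No job_list_lock needed for the removal itself: only
		 * this (scheduler) thread ever unlinks entries now. */
		if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
			list_del_init(&job->node);
			return job; /* caller invokes sched->ops->free_job() */
		}

		return NULL;
	}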
Re: [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already signaled.
On 4/22/19 9:09 AM, Zhou, David(ChunMing) wrote: > +Monk. > > GPU reset is used widely in SRIOV, so we need a virtualization guy to take a look. > > But out of curiosity, why can the guilty job still signal if the job is already > set to guilty? Was it set wrongly? > > > -David It's possible that the job completes at a later time than when its timeout handler started processing, so in this patch we try to protect against this by rechecking the HW fence after stopping all SW schedulers. We do it BEFORE marking guilty on the job's sched_entity, so at the point we check, the guilty flag is not set yet. Andrey > > On 2019/4/18 23:00, Andrey Grodzovsky wrote: >> Also reject TDRs if another one already running. >> >> v2: >> Stop all schedulers across device and entire XGMI hive before >> force signaling HW fences. >> Avoid passing job_signaled to helper functions to keep all the decision >> making about skipping HW reset in one place. >> >> v3: >> Fix SW sched. hang after non HW reset. sched.hw_rq_count has to be balanced >> against its decrement in drm_sched_stop in non HW reset case. >> v4: rebase >> v5: Revert v3 as we do it now in scheduler code. >> >> Signed-off-by: Andrey Grodzovsky >> --- >>drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 143 >> +++-- >>1 file changed, 95 insertions(+), 48 deletions(-) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> index a0e165c..85f8792 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> @@ -3334,8 +3334,6 @@ static int amdgpu_device_pre_asic_reset(struct >> amdgpu_device *adev, >> if (!ring || !ring->sched.thread) >> continue; >> >> -drm_sched_stop(&ring->sched, &job->base); >> - >> /* after all hw jobs are reset, hw fence is meaningless, so >> force_completion */ >> amdgpu_fence_driver_force_completion(ring); >> } >> @@ -3343,6 +3341,7 @@ static int amdgpu_device_pre_asic_reset(struct >> amdgpu_device *adev, >> if(job) >> drm_sched_increase_karma(&job->base); >> >> +/* Don't suspend on bare metal if we are not going to HW reset the ASIC >> */ >> if (!amdgpu_sriov_vf(adev)) { >> >> if (!need_full_reset) >> @@ -3480,37 +3479,21 @@ static int amdgpu_do_asic_reset(struct >> amdgpu_hive_info *hive, >> return r; >>} >> >> -static void amdgpu_device_post_asic_reset(struct amdgpu_device *adev) >> +static bool amdgpu_device_lock_adev(struct amdgpu_device *adev, bool >> trylock) >>{ >> -int i; >> - >> -for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { >> -struct amdgpu_ring *ring = adev->rings[i]; >> - >> -if (!ring || !ring->sched.thread) >> -continue; >> - >> -if (!adev->asic_reset_res) >> -drm_sched_resubmit_jobs(&ring->sched); >> +if (trylock) { >> +if (!mutex_trylock(&adev->lock_reset)) >> +return false; >> +} else >> +mutex_lock(&adev->lock_reset); >> >> -drm_sched_start(&ring->sched, !adev->asic_reset_res); >> -} >> - >> -if (!amdgpu_device_has_dc_support(adev)) { >> -drm_helper_resume_force_mode(adev->ddev); >> -} >> - >> -adev->asic_reset_res = 0; >> -} >> - >> -static void amdgpu_device_lock_adev(struct amdgpu_device *adev) >> -{ >> -mutex_lock(&adev->lock_reset); >> atomic_inc(&adev->gpu_reset_counter); >> adev->in_gpu_reset = 1; >> /* Block kfd: SRIOV would do it separately */ >> if (!amdgpu_sriov_vf(adev)) >>amdgpu_amdkfd_pre_reset(adev); >> + >> +return true; >>} >> >>static void amdgpu_device_unlock_adev(struct amdgpu_device *adev) >> @@ -3538,40 +3521,42 @@ static void amdgpu_device_unlock_adev(struct >> amdgpu_device *adev) >>int 
amdgpu_device_gpu_recover(struct amdgpu_device *adev, >>struct amdgpu_job *job) >>{ >> -int r; >> +struct list_head device_list, *device_list_handle = NULL; >> +bool need_full_reset, job_signaled; >> struct amdgpu_hive_info *hive = NULL; >> -bool need_full_reset = false; >> struct amdgpu_device *tmp_adev = NULL; >> -struct list_head device_list, *device_list_handle = NULL; >> +int i, r = 0; >> >> +need_full_reset = job_signaled = false; >> INIT_LIST_HEAD(&device_list); >> >> dev_info(adev->dev, "GPU reset begin!\n"); >> >> +hive = amdgpu_get_xgmi_hive(adev, false); >> + >> /* >> - * In case of XGMI hive disallow concurrent resets to be triggered >> - * by different nodes. No point also since the one node already >> executing >> - * reset will also reset all the other nodes in the hive. >> + * Here we trylock to avoid chain of resets executing from >> + * either trigger by jobs on different adevs
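The race this patch closes: a job can complete between the timeout firing and the handler getting to run, so the recheck is only meaningful after every scheduler on the device (or the whole hive) is stopped. Condensed sketch of that step, using the names from the diff (the skip label is hypothetical):

	/* All schedulers are stopped, so the mirror lists are stable;
	 * now recheck whether the "guilty" job's HW fence has signaled. */
	if (job && job->base.s_fence->parent &&
	    dma_fence_is_signaled(job->base.s_fence->parent)) {
		job_signaled = true;
		dev_info(adev->dev, "Guilty job already signaled, skipping HW reset");
		goto skip_hw_reset;
	}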
Re: [PATCH v5 3/6] drm/scheduler: rework job destruction
On 4/22/19 8:48 AM, Chunming Zhou wrote: > Hi Andrey, > > static void drm_sched_process_job(struct dma_fence *f, struct > dma_fence_cb *cb) > { > ... > spin_lock_irqsave(&sched->job_list_lock, flags); > /* remove job from ring_mirror_list */ > list_del_init(&s_job->node); > spin_unlock_irqrestore(&sched->job_list_lock, flags); > [David] How about just remove above to worker from irq process? Any > problem? Maybe I missed previous your discussion, but I think removing > lock for list is a risk for future maintenance although you make sure > thread safe currently. > > -David We remove the lock exactly because of the fact that insertion and removal to/from the list will be done form exactly one thread at ant time now. So I am not sure I understand what you mean. Andrey > > ... > > schedule_work(&s_job->finish_work); > } > > 在 2019/4/18 23:00, Andrey Grodzovsky 写道: >> From: Christian König >> >> We now destroy finished jobs from the worker thread to make sure that >> we never destroy a job currently in timeout processing. >> By this we avoid holding lock around ring mirror list in drm_sched_stop >> which should solve a deadlock reported by a user. >> >> v2: Remove unused variable. >> v4: Move guilty job free into sched code. >> v5: >> Move sched->hw_rq_count to drm_sched_start to account for counter >> decrement in drm_sched_stop even when we don't call resubmit jobs >> if guily job did signal. >> >> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109692 >> >> Signed-off-by: Christian König >> Signed-off-by: Andrey Grodzovsky >> --- >>drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 9 +- >>drivers/gpu/drm/etnaviv/etnaviv_dump.c | 4 - >>drivers/gpu/drm/etnaviv/etnaviv_sched.c| 2 +- >>drivers/gpu/drm/lima/lima_sched.c | 2 +- >>drivers/gpu/drm/panfrost/panfrost_job.c| 2 +- >>drivers/gpu/drm/scheduler/sched_main.c | 159 >> + >>drivers/gpu/drm/v3d/v3d_sched.c| 2 +- >>include/drm/gpu_scheduler.h| 6 +- >>8 files changed, 102 insertions(+), 84 deletions(-) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> index 7cee269..a0e165c 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> @@ -3334,7 +3334,7 @@ static int amdgpu_device_pre_asic_reset(struct >> amdgpu_device *adev, >> if (!ring || !ring->sched.thread) >> continue; >> >> -drm_sched_stop(&ring->sched); >> +drm_sched_stop(&ring->sched, &job->base); >> >> /* after all hw jobs are reset, hw fence is meaningless, so >> force_completion */ >> amdgpu_fence_driver_force_completion(ring); >> @@ -3343,8 +3343,6 @@ static int amdgpu_device_pre_asic_reset(struct >> amdgpu_device *adev, >> if(job) >> drm_sched_increase_karma(&job->base); >> >> - >> - >> if (!amdgpu_sriov_vf(adev)) { >> >> if (!need_full_reset) >> @@ -3482,8 +3480,7 @@ static int amdgpu_do_asic_reset(struct >> amdgpu_hive_info *hive, >> return r; >>} >> >> -static void amdgpu_device_post_asic_reset(struct amdgpu_device *adev, >> - struct amdgpu_job *job) >> +static void amdgpu_device_post_asic_reset(struct amdgpu_device *adev) >>{ >> int i; >> >> @@ -3623,7 +3620,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device >> *adev, >> >> /* Post ASIC reset for all devs .*/ >> list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) { >> -amdgpu_device_post_asic_reset(tmp_adev, tmp_adev == adev ? job >> : NULL); >> +amdgpu_device_post_asic_reset(tmp_adev); >> >> if (r) { >> /* bad news, how to tell it to userspace ? 
*/ >> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_dump.c >> b/drivers/gpu/drm/etnaviv/etnaviv_dump.c >> index 33854c9..5778d9c 100644 >> --- a/drivers/gpu/drm/etnaviv/etnaviv_dump.c >> +++ b/drivers/gpu/drm/etnaviv/etnaviv_dump.c >> @@ -135,13 +135,11 @@ void etnaviv_core_dump(struct etnaviv_gpu *gpu) >> mmu_size + gpu->buffer.size; >> >> /* Add in the active command buffers */ >> -spin_lock_irqsave(&gpu->sched.job_list_lock, flags); >> list_for_each_entry(s_job, &gpu->sched.ring_mirror_list, node) { >> submit = to_etnaviv_submit(s_job); >> file_size += submit->cmdbuf.size; >> n_obj++; >> } >> -spin_unlock_irqrestore(&gpu->sched.job_list_lock, flags); >> >> /* Add in the active buffer objects */ >> list_for_each_entry(vram, &gpu->mmu->mappings, mmu_node) { >> @@ -183,14 +181,12 @@ void etnaviv_core_dump(struct etnaviv_gpu *gpu) >>gpu->buffer.size, >>etn
Re: [PATCH] drm/sched: Fix description of drm_sched_stop
Reviewed-by: Andrey Grodzovsky Andrey On 4/20/19 8:50 AM, Jonathan Neuschäfer wrote: > Since commit 222b5f044159 ("drm/sched: Refactor ring mirror list > handling."), drm_sched_hw_job_reset is no longer there, so let's adjust > the doc comment accordingly. > > Signed-off-by: Jonathan Neuschäfer > --- > drivers/gpu/drm/scheduler/sched_main.c | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c > b/drivers/gpu/drm/scheduler/sched_main.c > index 19fc601c9eeb..a1bec2779e76 100644 > --- a/drivers/gpu/drm/scheduler/sched_main.c > +++ b/drivers/gpu/drm/scheduler/sched_main.c > @@ -366,10 +366,9 @@ void drm_sched_increase_karma(struct drm_sched_job *bad) > EXPORT_SYMBOL(drm_sched_increase_karma); > > /** > - * drm_sched_hw_job_reset - stop the scheduler if it contains the bad job > + * drm_sched_stop - stop the scheduler >* >* @sched: scheduler instance > - * @bad: bad scheduler job >* >*/ > void drm_sched_stop(struct drm_gpu_scheduler *sched) > -- > 2.20.1 > ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already signaled.
On 4/23/19 8:32 AM, Koenig, Christian wrote: > Well you at least have to give me time till after the holidays to get > going again :) > > Not sure exactly jet why we need patch number 5. Probably you missed the mail where I pointed out a bug I found during testing - I am reattaching the mail and the KASAN dump. Andrey > > And we should probably commit patch #1 and #2. > > Christian. > > Am 22.04.19 um 13:54 schrieb Grodzovsky, Andrey: >> Ping for patches 3, new patch 5 and patch 6. >> >> Andrey >> >> On 4/18/19 11:00 AM, Andrey Grodzovsky wrote: >>> Also reject TDRs if another one already running. >>> >>> v2: >>> Stop all schedulers across device and entire XGMI hive before >>> force signaling HW fences. >>> Avoid passing job_signaled to helper fnctions to keep all the decision >>> making about skipping HW reset in one place. >>> >>> v3: >>> Fix SW sched. hang after non HW reset. sched.hw_rq_count has to be balanced >>> against it's decrement in drm_sched_stop in non HW reset case. >>> v4: rebase >>> v5: Revert v3 as we do it now in sceduler code. >>> >>> Signed-off-by: Andrey Grodzovsky >>> --- >>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 143 >>> +++-- >>> 1 file changed, 95 insertions(+), 48 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>> index a0e165c..85f8792 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>> @@ -3334,8 +3334,6 @@ static int amdgpu_device_pre_asic_reset(struct >>> amdgpu_device *adev, >>> if (!ring || !ring->sched.thread) >>> continue; >>> >>> - drm_sched_stop(&ring->sched, &job->base); >>> - >>> /* after all hw jobs are reset, hw fence is >>> meaningless, so force_completion */ >>> amdgpu_fence_driver_force_completion(ring); >>> } >>> @@ -3343,6 +3341,7 @@ static int amdgpu_device_pre_asic_reset(struct >>> amdgpu_device *adev, >>> if(job) >>> drm_sched_increase_karma(&job->base); >>> >>> + /* Don't suspend on bare metal if we are not going to HW reset the ASIC >>> */ >>> if (!amdgpu_sriov_vf(adev)) { >>> >>> if (!need_full_reset) >>> @@ -3480,37 +3479,21 @@ static int amdgpu_do_asic_reset(struct >>> amdgpu_hive_info *hive, >>> return r; >>> } >>> >>> -static void amdgpu_device_post_asic_reset(struct amdgpu_device *adev) >>> +static bool amdgpu_device_lock_adev(struct amdgpu_device *adev, bool >>> trylock) >>> { >>> - int i; >>> - >>> - for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { >>> - struct amdgpu_ring *ring = adev->rings[i]; >>> - >>> - if (!ring || !ring->sched.thread) >>> - continue; >>> - >>> - if (!adev->asic_reset_res) >>> - drm_sched_resubmit_jobs(&ring->sched); >>> + if (trylock) { >>> + if (!mutex_trylock(&adev->lock_reset)) >>> + return false; >>> + } else >>> + mutex_lock(&adev->lock_reset); >>> >>> - drm_sched_start(&ring->sched, !adev->asic_reset_res); >>> - } >>> - >>> - if (!amdgpu_device_has_dc_support(adev)) { >>> - drm_helper_resume_force_mode(adev->ddev); >>> - } >>> - >>> - adev->asic_reset_res = 0; >>> -} >>> - >>> -static void amdgpu_device_lock_adev(struct amdgpu_device *adev) >>> -{ >>> - mutex_lock(&adev->lock_reset); >>> atomic_inc(&adev->gpu_reset_counter); >>> adev->in_gpu_reset = 1; >>> /* Block kfd: SRIOV would do it separately */ >>> if (!amdgpu_sriov_vf(adev)) >>> amdgpu_amdkfd_pre_reset(adev); >>> + >>> + return true; >>> } >>> >>> static void amdgpu_device_unlock_adev(struct amdgpu_device *adev) >>> @@ -3538,40 +3521,42 @@
Re: [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already signaled.
OK, i will merge them into amd-staging drm-next. Andrey On 4/23/19 9:14 AM, Kazlauskas, Nicholas wrote: > Feel free to merge 1+2 since they don't really depend on any other work > in the series and they were previously reviewed. > > Nicholas Kazlauskas > > On 4/23/19 8:32 AM, Koenig, Christian wrote: >> Well you at least have to give me time till after the holidays to get >> going again :) >> >> Not sure exactly jet why we need patch number 5. >> >> And we should probably commit patch #1 and #2. >> >> Christian. >> >> Am 22.04.19 um 13:54 schrieb Grodzovsky, Andrey: >>> Ping for patches 3, new patch 5 and patch 6. >>> >>> Andrey >>> >>> On 4/18/19 11:00 AM, Andrey Grodzovsky wrote: >>>> Also reject TDRs if another one already running. >>>> >>>> v2: >>>> Stop all schedulers across device and entire XGMI hive before >>>> force signaling HW fences. >>>> Avoid passing job_signaled to helper fnctions to keep all the decision >>>> making about skipping HW reset in one place. >>>> >>>> v3: >>>> Fix SW sched. hang after non HW reset. sched.hw_rq_count has to be balanced >>>> against it's decrement in drm_sched_stop in non HW reset case. >>>> v4: rebase >>>> v5: Revert v3 as we do it now in sceduler code. >>>> >>>> Signed-off-by: Andrey Grodzovsky >>>> --- >>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 143 >>>> +++-- >>>> 1 file changed, 95 insertions(+), 48 deletions(-) >>>> >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>> index a0e165c..85f8792 100644 >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>> @@ -3334,8 +3334,6 @@ static int amdgpu_device_pre_asic_reset(struct >>>> amdgpu_device *adev, >>>>if (!ring || !ring->sched.thread) >>>>continue; >>>> >>>> - drm_sched_stop(&ring->sched, &job->base); >>>> - >>>>/* after all hw jobs are reset, hw fence is >>>> meaningless, so force_completion */ >>>>amdgpu_fence_driver_force_completion(ring); >>>>} >>>> @@ -3343,6 +3341,7 @@ static int amdgpu_device_pre_asic_reset(struct >>>> amdgpu_device *adev, >>>>if(job) >>>>drm_sched_increase_karma(&job->base); >>>> >>>> + /* Don't suspend on bare metal if we are not going to HW reset the ASIC >>>> */ >>>>if (!amdgpu_sriov_vf(adev)) { >>>> >>>>if (!need_full_reset) >>>> @@ -3480,37 +3479,21 @@ static int amdgpu_do_asic_reset(struct >>>> amdgpu_hive_info *hive, >>>>return r; >>>> } >>>> >>>> -static void amdgpu_device_post_asic_reset(struct amdgpu_device *adev) >>>> +static bool amdgpu_device_lock_adev(struct amdgpu_device *adev, bool >>>> trylock) >>>> { >>>> - int i; >>>> - >>>> - for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { >>>> - struct amdgpu_ring *ring = adev->rings[i]; >>>> - >>>> - if (!ring || !ring->sched.thread) >>>> - continue; >>>> - >>>> - if (!adev->asic_reset_res) >>>> - drm_sched_resubmit_jobs(&ring->sched); >>>> + if (trylock) { >>>> + if (!mutex_trylock(&adev->lock_reset)) >>>> + return false; >>>> + } else >>>> + mutex_lock(&adev->lock_reset); >>>> >>>> - drm_sched_start(&ring->sched, !adev->asic_reset_res); >>>> - } >>>> - >>>> - if (!amdgpu_device_has_dc_support(adev)) { >>>> - drm_helper_resume_force_mode(adev->ddev); >>>> - } >>>> - >>>> - adev->asic_reset_res = 0; >>>> -} >>>> - >>>> -static void amdgpu_device_lock_adev(struct amdgpu_device *adev) >>>> -{ >>>> - mutex_lock(&adev->lock_rese
Re: [PATCH v5 1/6] drm/amd/display: wait for fence without holding reservation lock
This series is on top of drm-misc because of the panfrost and lima drivers which are missing from amd-staging-drm-next. Once I land it in drm-misc I will merge and push it into drm-next. Andrey On 4/22/19 10:35 PM, Dieter Nützel wrote: > Hello Andrey, > > this series can't apply (breaks on #3) on top of amd-staging-drm-next. > v2 works (Thu, 11 Apr 2019). > > Dieter > > On 18.04.2019 17:00, Andrey Grodzovsky wrote: >> From: Christian König >> >> Don't block others while waiting for the fences to finish, concurrent >> submission is perfectly valid in this case and holding the lock can >> prevent killed applications from terminating. >> >> Signed-off-by: Christian König >> Reviewed-by: Nicholas Kazlauskas >> --- >> drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 13 - >> 1 file changed, 8 insertions(+), 5 deletions(-) >> >> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c >> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c >> index 380a7f9..ad4f0e5 100644 >> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c >> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c >> @@ -4814,23 +4814,26 @@ static void amdgpu_dm_commit_planes(struct >> drm_atomic_state *state, >> continue; >> } >> >> + abo = gem_to_amdgpu_bo(fb->obj[0]); >> + >> + /* Wait for all fences on this FB */ >> + r = reservation_object_wait_timeout_rcu(abo->tbo.resv, true, >> + false, >> + MAX_SCHEDULE_TIMEOUT); >> + WARN_ON(r < 0); >> + >> /* >> * TODO This might fail and hence better not used, wait >> * explicitly on fences instead >> * and in general should be called for >> * blocking commit to as per framework helpers >> */ >> - abo = gem_to_amdgpu_bo(fb->obj[0]); >> r = amdgpu_bo_reserve(abo, true); >> if (unlikely(r != 0)) { >> DRM_ERROR("failed to reserve buffer before flip\n"); >> WARN_ON(1); >> } >> >> - /* Wait for all fences on this FB */ >> - WARN_ON(reservation_object_wait_timeout_rcu(abo->tbo.resv, true, >> false, >> - MAX_SCHEDULE_TIMEOUT) < 0); >> - >> amdgpu_bo_get_tiling_flags(abo, &tiling_flags); >> >> amdgpu_bo_unreserve(abo); ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already signaled.
Ping for patches 3, new patch 5 and patch 6. Andrey On 4/18/19 11:00 AM, Andrey Grodzovsky wrote: > Also reject TDRs if another one already running. > > v2: > Stop all schedulers across device and entire XGMI hive before > force signaling HW fences. > Avoid passing job_signaled to helper fnctions to keep all the decision > making about skipping HW reset in one place. > > v3: > Fix SW sched. hang after non HW reset. sched.hw_rq_count has to be balanced > against it's decrement in drm_sched_stop in non HW reset case. > v4: rebase > v5: Revert v3 as we do it now in sceduler code. > > Signed-off-by: Andrey Grodzovsky > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 143 > +++-- > 1 file changed, 95 insertions(+), 48 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > index a0e165c..85f8792 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > @@ -3334,8 +3334,6 @@ static int amdgpu_device_pre_asic_reset(struct > amdgpu_device *adev, > if (!ring || !ring->sched.thread) > continue; > > - drm_sched_stop(&ring->sched, &job->base); > - > /* after all hw jobs are reset, hw fence is meaningless, so > force_completion */ > amdgpu_fence_driver_force_completion(ring); > } > @@ -3343,6 +3341,7 @@ static int amdgpu_device_pre_asic_reset(struct > amdgpu_device *adev, > if(job) > drm_sched_increase_karma(&job->base); > > + /* Don't suspend on bare metal if we are not going to HW reset the ASIC > */ > if (!amdgpu_sriov_vf(adev)) { > > if (!need_full_reset) > @@ -3480,37 +3479,21 @@ static int amdgpu_do_asic_reset(struct > amdgpu_hive_info *hive, > return r; > } > > -static void amdgpu_device_post_asic_reset(struct amdgpu_device *adev) > +static bool amdgpu_device_lock_adev(struct amdgpu_device *adev, bool trylock) > { > - int i; > - > - for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { > - struct amdgpu_ring *ring = adev->rings[i]; > - > - if (!ring || !ring->sched.thread) > - continue; > - > - if (!adev->asic_reset_res) > - drm_sched_resubmit_jobs(&ring->sched); > + if (trylock) { > + if (!mutex_trylock(&adev->lock_reset)) > + return false; > + } else > + mutex_lock(&adev->lock_reset); > > - drm_sched_start(&ring->sched, !adev->asic_reset_res); > - } > - > - if (!amdgpu_device_has_dc_support(adev)) { > - drm_helper_resume_force_mode(adev->ddev); > - } > - > - adev->asic_reset_res = 0; > -} > - > -static void amdgpu_device_lock_adev(struct amdgpu_device *adev) > -{ > - mutex_lock(&adev->lock_reset); > atomic_inc(&adev->gpu_reset_counter); > adev->in_gpu_reset = 1; > /* Block kfd: SRIOV would do it separately */ > if (!amdgpu_sriov_vf(adev)) > amdgpu_amdkfd_pre_reset(adev); > + > + return true; > } > > static void amdgpu_device_unlock_adev(struct amdgpu_device *adev) > @@ -3538,40 +3521,42 @@ static void amdgpu_device_unlock_adev(struct > amdgpu_device *adev) > int amdgpu_device_gpu_recover(struct amdgpu_device *adev, > struct amdgpu_job *job) > { > - int r; > + struct list_head device_list, *device_list_handle = NULL; > + bool need_full_reset, job_signaled; > struct amdgpu_hive_info *hive = NULL; > - bool need_full_reset = false; > struct amdgpu_device *tmp_adev = NULL; > - struct list_head device_list, *device_list_handle = NULL; > + int i, r = 0; > > + need_full_reset = job_signaled = false; > INIT_LIST_HEAD(&device_list); > > dev_info(adev->dev, "GPU reset begin!\n"); > > + hive = amdgpu_get_xgmi_hive(adev, false); > + > /* > - * In case of XGMI hive disallow concurrent resets to be 
triggered > - * by different nodes. No point also since the one node already > executing > - * reset will also reset all the other nodes in the hive. > + * Here we trylock to avoid chain of resets executing from > + * either trigger by jobs on different adevs in XGMI hive or jobs on > + * different schedulers for same device while this TO handler is > running. > + * We always reset all schedulers for device and all devices for XGMI > + * hive so that should take care of them too. >*/ > - hive = amdgpu_get_xgmi_hive(adev, 0); > - if (hive && adev->gmc.xgmi.num_physical_nodes > 1 && > - !mutex_trylock(&hive->reset_lock)) > + > + if (hive && !mutex_trylock(&hive->reset_lock)) { > + DRM_INFO("Bailing on TDR for s_job:%llx, hive: %llx as another > already in progress", > + job->base.id, hive->hive_id); >
Re: [PATCH v3 1/5] drm/scheduler: rework job destruction
On 4/16/19 12:00 PM, Koenig, Christian wrote: > Am 16.04.19 um 17:42 schrieb Grodzovsky, Andrey: >> On 4/16/19 10:58 AM, Grodzovsky, Andrey wrote: >>> On 4/16/19 10:43 AM, Koenig, Christian wrote: >>>> Am 16.04.19 um 16:36 schrieb Grodzovsky, Andrey: >>>>> On 4/16/19 5:47 AM, Christian König wrote: >>>>>> Am 15.04.19 um 23:17 schrieb Eric Anholt: >>>>>>> Andrey Grodzovsky writes: >>>>>>> >>>>>>>> From: Christian König >>>>>>>> >>>>>>>> We now destroy finished jobs from the worker thread to make sure that >>>>>>>> we never destroy a job currently in timeout processing. >>>>>>>> By this we avoid holding lock around ring mirror list in drm_sched_stop >>>>>>>> which should solve a deadlock reported by a user. >>>>>>>> >>>>>>>> v2: Remove unused variable. >>>>>>>> >>>>>>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109692 >>>>>>>> >>>>>>>> Signed-off-by: Christian König >>>>>>>> Signed-off-by: Andrey Grodzovsky >>>>>>>> --- >>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 17 ++-- >>>>>>>> drivers/gpu/drm/etnaviv/etnaviv_dump.c | 4 - >>>>>>>> drivers/gpu/drm/etnaviv/etnaviv_sched.c | 9 +- >>>>>>>> drivers/gpu/drm/scheduler/sched_main.c | 138 >>>>>>>> + >>>>>>>> drivers/gpu/drm/v3d/v3d_sched.c | 9 +- >>>>>>> Missing corresponding panfrost and lima updates. You should probably >>>>>>> pull in drm-misc for hacking on the scheduler. >>>>>>> >>>>>>>> diff --git a/drivers/gpu/drm/v3d/v3d_sched.c >>>>>>>> b/drivers/gpu/drm/v3d/v3d_sched.c >>>>>>>> index ce7c737b..8efb091 100644 >>>>>>>> --- a/drivers/gpu/drm/v3d/v3d_sched.c >>>>>>>> +++ b/drivers/gpu/drm/v3d/v3d_sched.c >>>>>>>> @@ -232,11 +232,18 @@ v3d_gpu_reset_for_timeout(struct v3d_dev *v3d, >>>>>>>> struct drm_sched_job *sched_job) >>>>>>>> /* block scheduler */ >>>>>>>> for (q = 0; q < V3D_MAX_QUEUES; q++) >>>>>>>> - drm_sched_stop(&v3d->queue[q].sched); >>>>>>>> + drm_sched_stop(&v3d->queue[q].sched, sched_job); >>>>>>>> if(sched_job) >>>>>>>> drm_sched_increase_karma(sched_job); >>>>>>>> + /* >>>>>>>> + * Guilty job did complete and hence needs to be manually removed >>>>>>>> + * See drm_sched_stop doc. >>>>>>>> + */ >>>>>>>> + if (list_empty(&sched_job->node)) >>>>>>>> + sched_job->sched->ops->free_job(sched_job); >>>>>>> If the if (sched_job) is necessary up above, then this should clearly be >>>>>>> under it. >>>>>>> >>>>>>> But, can we please have a core scheduler thing we call here instead of >>>>>>> drivers all replicating it? >>>>>> Yeah that's also something I noted before. >>>>>> >>>>>> Essential problem is that we remove finished jobs from the mirror list >>>>>> and so need to destruct them because we otherwise leak them. >>>>>> >>>>>> Alternative approach here would be to keep the jobs on the ring mirror >>>>>> list, but not submit them again. >>>>>> >>>>>> Regards, >>>>>> Christian. >>>>> I really prefer to avoid this, it means adding extra flag to sched_job >>>>> to check in each iteration of the ring mirror list. >>>> Mhm, why actually? We just need to check if the scheduler fence is >>>> signaled. >>> OK, i see it's equivalent but this still en extra check for all the >>> iterations. >>> >>>>> What about changing >>>>> signature of drm_sched_backend_ops.timedout_job to return drm_sched_job* >>>>> instead
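For comparison, the alternative Christian floats above - keep finished jobs on the ring mirror list and simply not submit them again - would look roughly like this inside drm_sched_resubmit_jobs() (illustrative only; this is the option Andrey argues against because of the extra per-iteration check):

	list_for_each_entry(s_job, &sched->ring_mirror_list, node) {
		/* the extra check: already-finished jobs stay on the
		 * list but must not be run again */
		if (dma_fence_is_signaled(&s_job->s_fence->finished))
			continue;

		dma_fence_put(s_job->s_fence->parent);
		s_job->s_fence->parent = sched->ops->run_job(s_job);
	}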
Re: [PATCH v4 3/5] drm/scheduler: rework job destruction
On 4/17/19 2:01 PM, Koenig, Christian wrote: > Am 17.04.19 um 19:59 schrieb Christian König: >> Am 17.04.19 um 19:53 schrieb Grodzovsky, Andrey: >>> On 4/17/19 1:17 PM, Christian König wrote: >>>> I can't review this patch, since I'm one of the authors of it, but in >>>> general your changes look good to me now. >>>> >>>> For patch #5 I think it might be cleaner if we move incrementing of >>>> the hw_rq_count while starting the scheduler again. >>> But the increment of hw_rq_count is conditional on if the guilty job >>> was signaled, moving it into drm_sched_start will also force me to pass >>> 'job_signaled' flag into drm_sched_start which is against your original >>> comment that we don't want to pass this logic around helper functions >>> and keep it all in one place which is amdgpu_device_gpu_recover. >> Well I hope that incrementing hw_rq_count is conditional for signaled >> jobs anyway, or otherwise we would seriously mess up the counter. >> >> E.g. in drm_sched_stop() we also only decrement it when we where able >> to remove the callback. > Ok, checking the code again we don't need any special handling here > since all signaled jobs are already removed from the mirror_list. > > Christian. We decrement in drm_sched_stop and then later if the guilty job is found to be signaled we are skipping drm_sched_resubmit_jobs and so will not increment back and then the count becomes 'negative' when the fence signals and i got a bug. But now i think what i need is to just move the atomic_inc(&sched->hw_rq_count) from drm_sched_resubmit_jobs into drm_sched_start and so this way i can get rid of the conditional re-incriment i am doing now. Agree ? Andrey > >> Christian. >> >>> Andrey >>> >>> >>>> Regards, >>>> Christian. >>>> >>>> Am 17.04.19 um 16:36 schrieb Grodzovsky, Andrey: >>>>> Ping on this patch and patch 5. The rest already RBed. >>>>> >>>>> Andrey >>>>> >>>>> On 4/16/19 2:23 PM, Andrey Grodzovsky wrote: >>>>>> From: Christian König >>>>>> >>>>>> We now destroy finished jobs from the worker thread to make sure that >>>>>> we never destroy a job currently in timeout processing. >>>>>> By this we avoid holding lock around ring mirror list in >>>>>> drm_sched_stop >>>>>> which should solve a deadlock reported by a user. >>>>>> >>>>>> v2: Remove unused variable. >>>>>> v4: Move guilty job free into sched code. 
>>>>>> >>>>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109692 >>>>>> >>>>>> Signed-off-by: Christian König >>>>>> Signed-off-by: Andrey Grodzovsky >>>>>> --- >>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 9 +- >>>>>> drivers/gpu/drm/etnaviv/etnaviv_dump.c | 4 - >>>>>> drivers/gpu/drm/etnaviv/etnaviv_sched.c | 2 +- >>>>>> drivers/gpu/drm/lima/lima_sched.c | 2 +- >>>>>> drivers/gpu/drm/panfrost/panfrost_job.c | 2 +- >>>>>> drivers/gpu/drm/scheduler/sched_main.c | 145 >>>>>> + >>>>>> drivers/gpu/drm/v3d/v3d_sched.c | 2 +- >>>>>> include/drm/gpu_scheduler.h | 6 +- >>>>>> 8 files changed, 94 insertions(+), 78 deletions(-) >>>>>> >>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>>>> index 7cee269..a0e165c 100644 >>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>>>> @@ -3334,7 +3334,7 @@ static int amdgpu_device_pre_asic_reset(struct >>>>>> amdgpu_device *adev, >>>>>> if (!ring || !ring->sched.thread) >>>>>> continue; >>>>>> - drm_sched_stop(&ring->sched); >>>>>> + drm_sched_stop(&ring->sched, &job->base); >>>>>> /* after all hw jobs are reset, hw fence is >>>>>> meaningless, so force_completion */ >>>>>> amd
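The shape this settles on: count jobs in drm_sched_start() itself rather than in drm_sched_resubmit_jobs(), so the counter stays balanced against the decrement in drm_sched_stop() whether or not resubmission happened. Sketch of the resulting start loop (assuming the 2019-era drm_sched_start() with its full_recovery flag; not the upstream function verbatim):

	list_for_each_entry(s_job, &sched->ring_mirror_list, node) {
		struct dma_fence *fence = s_job->s_fence->parent;

		/* balanced against the atomic_dec in drm_sched_stop(),
		 * even when drm_sched_resubmit_jobs() was skipped */
		atomic_inc(&sched->hw_rq_count);

		if (!full_recovery)
			continue;

		if (fence && dma_fence_add_callback(fence, &s_job->cb,
						    drm_sched_process_job) == -ENOENT)
			drm_sched_process_job(fence, &s_job->cb);
	}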
Re: [PATCH v4 3/5] drm/scheduler: rework job destruction
On 4/17/19 1:17 PM, Christian König wrote: > I can't review this patch, since I'm one of the authors of it, but in > general your changes look good to me now. > > For patch #5 I think it might be cleaner if we move incrementing of > the hw_rq_count while starting the scheduler again. But the increment of hw_rq_count is conditional on if the guilty job was signaled, moving it into drm_sched_start will also force me to pass 'job_signaled' flag into drm_sched_start which is against your original comment that we don't want to pass this logic around helper functions and keep it all in one place which is amdgpu_device_gpu_recover. Andrey > > Regards, > Christian. > > Am 17.04.19 um 16:36 schrieb Grodzovsky, Andrey: >> Ping on this patch and patch 5. The rest already RBed. >> >> Andrey >> >> On 4/16/19 2:23 PM, Andrey Grodzovsky wrote: >>> From: Christian König >>> >>> We now destroy finished jobs from the worker thread to make sure that >>> we never destroy a job currently in timeout processing. >>> By this we avoid holding lock around ring mirror list in drm_sched_stop >>> which should solve a deadlock reported by a user. >>> >>> v2: Remove unused variable. >>> v4: Move guilty job free into sched code. >>> >>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109692 >>> >>> Signed-off-by: Christian König >>> Signed-off-by: Andrey Grodzovsky >>> --- >>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 9 +- >>> drivers/gpu/drm/etnaviv/etnaviv_dump.c | 4 - >>> drivers/gpu/drm/etnaviv/etnaviv_sched.c | 2 +- >>> drivers/gpu/drm/lima/lima_sched.c | 2 +- >>> drivers/gpu/drm/panfrost/panfrost_job.c | 2 +- >>> drivers/gpu/drm/scheduler/sched_main.c | 145 >>> + >>> drivers/gpu/drm/v3d/v3d_sched.c | 2 +- >>> include/drm/gpu_scheduler.h | 6 +- >>> 8 files changed, 94 insertions(+), 78 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>> index 7cee269..a0e165c 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>> @@ -3334,7 +3334,7 @@ static int amdgpu_device_pre_asic_reset(struct >>> amdgpu_device *adev, >>> if (!ring || !ring->sched.thread) >>> continue; >>> - drm_sched_stop(&ring->sched); >>> + drm_sched_stop(&ring->sched, &job->base); >>> /* after all hw jobs are reset, hw fence is >>> meaningless, so force_completion */ >>> amdgpu_fence_driver_force_completion(ring); >>> @@ -3343,8 +3343,6 @@ static int amdgpu_device_pre_asic_reset(struct >>> amdgpu_device *adev, >>> if(job) >>> drm_sched_increase_karma(&job->base); >>> - >>> - >>> if (!amdgpu_sriov_vf(adev)) { >>> if (!need_full_reset) >>> @@ -3482,8 +3480,7 @@ static int amdgpu_do_asic_reset(struct >>> amdgpu_hive_info *hive, >>> return r; >>> } >>> -static void amdgpu_device_post_asic_reset(struct amdgpu_device >>> *adev, >>> - struct amdgpu_job *job) >>> +static void amdgpu_device_post_asic_reset(struct amdgpu_device *adev) >>> { >>> int i; >>> @@ -3623,7 +3620,7 @@ int amdgpu_device_gpu_recover(struct >>> amdgpu_device *adev, >>> /* Post ASIC reset for all devs .*/ >>> list_for_each_entry(tmp_adev, device_list_handle, >>> gmc.xgmi.head) { >>> - amdgpu_device_post_asic_reset(tmp_adev, tmp_adev == adev ? >>> job : NULL); >>> + amdgpu_device_post_asic_reset(tmp_adev); >>> if (r) { >>> /* bad news, how to tell it to userspace ? 
*/ >>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_dump.c >>> b/drivers/gpu/drm/etnaviv/etnaviv_dump.c >>> index 33854c9..5778d9c 100644 >>> --- a/drivers/gpu/drm/etnaviv/etnaviv_dump.c >>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_dump.c >>> @@ -135,13 +135,11 @@ void etnaviv_core_dump(struct etnaviv_gpu *gpu) >>> mmu_size + gpu->buffer.size; >>> /* Add in the active command buffers */ >>> - spin_lock_irqsave(&gpu->sche
Re: [PATCH v4 3/5] drm/scheduler: rework job destruction
Ping on this patch and patch 5. The rest already RBed. Andrey On 4/16/19 2:23 PM, Andrey Grodzovsky wrote: > From: Christian König > > We now destroy finished jobs from the worker thread to make sure that > we never destroy a job currently in timeout processing. > By this we avoid holding lock around ring mirror list in drm_sched_stop > which should solve a deadlock reported by a user. > > v2: Remove unused variable. > v4: Move guilty job free into sched code. > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109692 > > Signed-off-by: Christian König > Signed-off-by: Andrey Grodzovsky > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 9 +- > drivers/gpu/drm/etnaviv/etnaviv_dump.c | 4 - > drivers/gpu/drm/etnaviv/etnaviv_sched.c| 2 +- > drivers/gpu/drm/lima/lima_sched.c | 2 +- > drivers/gpu/drm/panfrost/panfrost_job.c| 2 +- > drivers/gpu/drm/scheduler/sched_main.c | 145 > + > drivers/gpu/drm/v3d/v3d_sched.c| 2 +- > include/drm/gpu_scheduler.h| 6 +- > 8 files changed, 94 insertions(+), 78 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > index 7cee269..a0e165c 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > @@ -3334,7 +3334,7 @@ static int amdgpu_device_pre_asic_reset(struct > amdgpu_device *adev, > if (!ring || !ring->sched.thread) > continue; > > - drm_sched_stop(&ring->sched); > + drm_sched_stop(&ring->sched, &job->base); > > /* after all hw jobs are reset, hw fence is meaningless, so > force_completion */ > amdgpu_fence_driver_force_completion(ring); > @@ -3343,8 +3343,6 @@ static int amdgpu_device_pre_asic_reset(struct > amdgpu_device *adev, > if(job) > drm_sched_increase_karma(&job->base); > > - > - > if (!amdgpu_sriov_vf(adev)) { > > if (!need_full_reset) > @@ -3482,8 +3480,7 @@ static int amdgpu_do_asic_reset(struct amdgpu_hive_info > *hive, > return r; > } > > -static void amdgpu_device_post_asic_reset(struct amdgpu_device *adev, > - struct amdgpu_job *job) > +static void amdgpu_device_post_asic_reset(struct amdgpu_device *adev) > { > int i; > > @@ -3623,7 +3620,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device > *adev, > > /* Post ASIC reset for all devs .*/ > list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) { > - amdgpu_device_post_asic_reset(tmp_adev, tmp_adev == adev ? job > : NULL); > + amdgpu_device_post_asic_reset(tmp_adev); > > if (r) { > /* bad news, how to tell it to userspace ? 
*/ > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_dump.c > b/drivers/gpu/drm/etnaviv/etnaviv_dump.c > index 33854c9..5778d9c 100644 > --- a/drivers/gpu/drm/etnaviv/etnaviv_dump.c > +++ b/drivers/gpu/drm/etnaviv/etnaviv_dump.c > @@ -135,13 +135,11 @@ void etnaviv_core_dump(struct etnaviv_gpu *gpu) > mmu_size + gpu->buffer.size; > > /* Add in the active command buffers */ > - spin_lock_irqsave(&gpu->sched.job_list_lock, flags); > list_for_each_entry(s_job, &gpu->sched.ring_mirror_list, node) { > submit = to_etnaviv_submit(s_job); > file_size += submit->cmdbuf.size; > n_obj++; > } > - spin_unlock_irqrestore(&gpu->sched.job_list_lock, flags); > > /* Add in the active buffer objects */ > list_for_each_entry(vram, &gpu->mmu->mappings, mmu_node) { > @@ -183,14 +181,12 @@ void etnaviv_core_dump(struct etnaviv_gpu *gpu) > gpu->buffer.size, > etnaviv_cmdbuf_get_va(&gpu->buffer)); > > - spin_lock_irqsave(&gpu->sched.job_list_lock, flags); > list_for_each_entry(s_job, &gpu->sched.ring_mirror_list, node) { > submit = to_etnaviv_submit(s_job); > etnaviv_core_dump_mem(&iter, ETDUMP_BUF_CMD, > submit->cmdbuf.vaddr, submit->cmdbuf.size, > etnaviv_cmdbuf_get_va(&submit->cmdbuf)); > } > - spin_unlock_irqrestore(&gpu->sched.job_list_lock, flags); > > /* Reserve space for the bomap */ > if (n_bomap_pages) { > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c > b/drivers/gpu/drm/etnaviv/etnaviv_sched.c > index 6d24fea..a813c82 100644 > --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c > +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c > @@ -109,7 +109,7 @@ static void etnaviv_sched_timedout_job(struct > drm_sched_job *sched_job) > } > > /* block scheduler */ > - drm_sched_stop(&gpu->sched); > + drm_sched_stop(&gpu->sched, sched_job); > > if(sched_job) >
Re: [PATCH v3 1/5] drm/scheduler: rework job destruction
On 4/16/19 10:58 AM, Grodzovsky, Andrey wrote: > On 4/16/19 10:43 AM, Koenig, Christian wrote: >> Am 16.04.19 um 16:36 schrieb Grodzovsky, Andrey: >>> On 4/16/19 5:47 AM, Christian König wrote: >>>> Am 15.04.19 um 23:17 schrieb Eric Anholt: >>>>> Andrey Grodzovsky writes: >>>>> >>>>>> From: Christian König >>>>>> >>>>>> We now destroy finished jobs from the worker thread to make sure that >>>>>> we never destroy a job currently in timeout processing. >>>>>> By this we avoid holding lock around ring mirror list in drm_sched_stop >>>>>> which should solve a deadlock reported by a user. >>>>>> >>>>>> v2: Remove unused variable. >>>>>> >>>>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109692 >>>>>> >>>>>> Signed-off-by: Christian König >>>>>> Signed-off-by: Andrey Grodzovsky >>>>>> --- >>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 17 ++-- >>>>>> drivers/gpu/drm/etnaviv/etnaviv_dump.c | 4 - >>>>>> drivers/gpu/drm/etnaviv/etnaviv_sched.c | 9 +- >>>>>> drivers/gpu/drm/scheduler/sched_main.c | 138 >>>>>> + >>>>>> drivers/gpu/drm/v3d/v3d_sched.c | 9 +- >>>>> Missing corresponding panfrost and lima updates. You should probably >>>>> pull in drm-misc for hacking on the scheduler. >>>>> >>>>>> diff --git a/drivers/gpu/drm/v3d/v3d_sched.c >>>>>> b/drivers/gpu/drm/v3d/v3d_sched.c >>>>>> index ce7c737b..8efb091 100644 >>>>>> --- a/drivers/gpu/drm/v3d/v3d_sched.c >>>>>> +++ b/drivers/gpu/drm/v3d/v3d_sched.c >>>>>> @@ -232,11 +232,18 @@ v3d_gpu_reset_for_timeout(struct v3d_dev *v3d, >>>>>> struct drm_sched_job *sched_job) >>>>>> /* block scheduler */ >>>>>> for (q = 0; q < V3D_MAX_QUEUES; q++) >>>>>> - drm_sched_stop(&v3d->queue[q].sched); >>>>>> + drm_sched_stop(&v3d->queue[q].sched, sched_job); >>>>>> if(sched_job) >>>>>> drm_sched_increase_karma(sched_job); >>>>>> + /* >>>>>> + * Guilty job did complete and hence needs to be manually removed >>>>>> + * See drm_sched_stop doc. >>>>>> + */ >>>>>> + if (list_empty(&sched_job->node)) >>>>>> + sched_job->sched->ops->free_job(sched_job); >>>>> If the if (sched_job) is necessary up above, then this should clearly be >>>>> under it. >>>>> >>>>> But, can we please have a core scheduler thing we call here instead of >>>>> drivers all replicating it? >>>> Yeah that's also something I noted before. >>>> >>>> Essential problem is that we remove finished jobs from the mirror list >>>> and so need to destruct them because we otherwise leak them. >>>> >>>> Alternative approach here would be to keep the jobs on the ring mirror >>>> list, but not submit them again. >>>> >>>> Regards, >>>> Christian. >>> I really prefer to avoid this, it means adding extra flag to sched_job >>> to check in each iteration of the ring mirror list. >> Mhm, why actually? We just need to check if the scheduler fence is signaled. > OK, i see it's equivalent but this still en extra check for all the > iterations. > >>> What about changing >>> signature of drm_sched_backend_ops.timedout_job to return drm_sched_job* >>> instead of void, this way we can return the guilty job back from the >>> driver specific handler to the generic drm_sched_job_timedout and >>> release it there. >> Well the timeout handler already has the job, so returning it doesn't >> make much sense. >> >> The problem is rather that the timeout handler doesn't know if it should >> destroy the job or not. > > But the driver specific handler does, and actually returning back either > the pointer to the job or null will give an indication of that. We can > even return bool. 
> > Andrey Thinking a bit more about this - the way this check is done now, "if (list_empty(&sched_job->node)) then free the sched_job", actually makes it possible to just move this as-is from the driver-specific callbacks into drm_sched_job_timedout without any other changes. Andrey > >> Christian. >> >>> Andrey >>> >>>>>> + >>>>>> /* get the GPU back into the init state */ >>>>>> v3d_reset(v3d); >> ___ >> amd-gfx mailing list >> amd-...@lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/amd-gfx > ___ > amd-gfx mailing list > amd-...@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/amd-gfx ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
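Concretely, the consolidation described above would land in the core timeout handler along these lines (a sketch of the suggestion, not merged code; it assumes the bad job sits at the head of the mirror list and that the driver's stop/start cycle unlinks a guilty job that had already completed):

	static void drm_sched_job_timedout(struct work_struct *work)
	{
		struct drm_gpu_scheduler *sched =
			container_of(work, struct drm_gpu_scheduler, work_tdr.work);
		struct drm_sched_job *job =
			list_first_entry_or_null(&sched->ring_mirror_list,
						 struct drm_sched_job, node);

		if (job)
			sched->ops->timedout_job(job);

		/* If the driver unlinked the job (it had completed), free
		 * it here once instead of in every driver callback. */
		if (job && list_empty(&job->node))
			sched->ops->free_job(job);
	}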
Re: [PATCH v3 1/5] drm/scheduler: rework job destruction
On 4/16/19 10:43 AM, Koenig, Christian wrote: > Am 16.04.19 um 16:36 schrieb Grodzovsky, Andrey: >> On 4/16/19 5:47 AM, Christian König wrote: >>> Am 15.04.19 um 23:17 schrieb Eric Anholt: >>>> Andrey Grodzovsky writes: >>>> >>>>> From: Christian König >>>>> >>>>> We now destroy finished jobs from the worker thread to make sure that >>>>> we never destroy a job currently in timeout processing. >>>>> By this we avoid holding lock around ring mirror list in drm_sched_stop >>>>> which should solve a deadlock reported by a user. >>>>> >>>>> v2: Remove unused variable. >>>>> >>>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109692 >>>>> >>>>> Signed-off-by: Christian König >>>>> Signed-off-by: Andrey Grodzovsky >>>>> --- >>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 17 ++-- >>>>> drivers/gpu/drm/etnaviv/etnaviv_dump.c | 4 - >>>>> drivers/gpu/drm/etnaviv/etnaviv_sched.c | 9 +- >>>>> drivers/gpu/drm/scheduler/sched_main.c | 138 >>>>> + >>>>> drivers/gpu/drm/v3d/v3d_sched.c | 9 +- >>>> Missing corresponding panfrost and lima updates. You should probably >>>> pull in drm-misc for hacking on the scheduler. >>>> >>>>> diff --git a/drivers/gpu/drm/v3d/v3d_sched.c >>>>> b/drivers/gpu/drm/v3d/v3d_sched.c >>>>> index ce7c737b..8efb091 100644 >>>>> --- a/drivers/gpu/drm/v3d/v3d_sched.c >>>>> +++ b/drivers/gpu/drm/v3d/v3d_sched.c >>>>> @@ -232,11 +232,18 @@ v3d_gpu_reset_for_timeout(struct v3d_dev *v3d, >>>>> struct drm_sched_job *sched_job) >>>>> /* block scheduler */ >>>>> for (q = 0; q < V3D_MAX_QUEUES; q++) >>>>> - drm_sched_stop(&v3d->queue[q].sched); >>>>> + drm_sched_stop(&v3d->queue[q].sched, sched_job); >>>>> if(sched_job) >>>>> drm_sched_increase_karma(sched_job); >>>>> + /* >>>>> + * Guilty job did complete and hence needs to be manually removed >>>>> + * See drm_sched_stop doc. >>>>> + */ >>>>> + if (list_empty(&sched_job->node)) >>>>> + sched_job->sched->ops->free_job(sched_job); >>>> If the if (sched_job) is necessary up above, then this should clearly be >>>> under it. >>>> >>>> But, can we please have a core scheduler thing we call here instead of >>>> drivers all replicating it? >>> Yeah that's also something I noted before. >>> >>> Essential problem is that we remove finished jobs from the mirror list >>> and so need to destruct them because we otherwise leak them. >>> >>> Alternative approach here would be to keep the jobs on the ring mirror >>> list, but not submit them again. >>> >>> Regards, >>> Christian. >> I really prefer to avoid this, it means adding extra flag to sched_job >> to check in each iteration of the ring mirror list. > Mhm, why actually? We just need to check if the scheduler fence is signaled. OK, i see it's equivalent but this still en extra check for all the iterations. > >> What about changing >> signature of drm_sched_backend_ops.timedout_job to return drm_sched_job* >> instead of void, this way we can return the guilty job back from the >> driver specific handler to the generic drm_sched_job_timedout and >> release it there. > Well the timeout handler already has the job, so returning it doesn't > make much sense. > > The problem is rather that the timeout handler doesn't know if it should > destroy the job or not. But the driver specific handler does, and actually returning back either the pointer to the job or null will give an indication of that. We can even return bool. Andrey > > Christian. 
> >> Andrey >> >>>>> + >>>>> /* get the GPU back into the init state */ >>>>> v3d_reset(v3d); > ___ > amd-gfx mailing list > amd-...@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/amd-gfx ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
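The signature change floated in this exchange, for reference - the driver would hand the job (or NULL) back when the core should destroy it. Purely a proposal sketch; this is not what was merged:

	/* in struct drm_sched_backend_ops: return the guilty job if the
	 * core should free it, or NULL if the driver kept ownership */
	struct drm_sched_job *(*timedout_job)(struct drm_sched_job *sched_job);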
Re: [PATCH v3 1/5] drm/scheduler: rework job destruction
On 4/16/19 5:47 AM, Christian König wrote: > Am 15.04.19 um 23:17 schrieb Eric Anholt: >> Andrey Grodzovsky writes: >> >>> From: Christian König >>> >>> We now destroy finished jobs from the worker thread to make sure that >>> we never destroy a job currently in timeout processing. >>> By this we avoid holding lock around ring mirror list in drm_sched_stop >>> which should solve a deadlock reported by a user. >>> >>> v2: Remove unused variable. >>> >>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109692 >>> >>> Signed-off-by: Christian König >>> Signed-off-by: Andrey Grodzovsky >>> --- >>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 17 ++-- >>> drivers/gpu/drm/etnaviv/etnaviv_dump.c | 4 - >>> drivers/gpu/drm/etnaviv/etnaviv_sched.c | 9 +- >>> drivers/gpu/drm/scheduler/sched_main.c | 138 >>> + >>> drivers/gpu/drm/v3d/v3d_sched.c | 9 +- >> Missing corresponding panfrost and lima updates. You should probably >> pull in drm-misc for hacking on the scheduler. >> >>> diff --git a/drivers/gpu/drm/v3d/v3d_sched.c >>> b/drivers/gpu/drm/v3d/v3d_sched.c >>> index ce7c737b..8efb091 100644 >>> --- a/drivers/gpu/drm/v3d/v3d_sched.c >>> +++ b/drivers/gpu/drm/v3d/v3d_sched.c >>> @@ -232,11 +232,18 @@ v3d_gpu_reset_for_timeout(struct v3d_dev *v3d, >>> struct drm_sched_job *sched_job) >>> /* block scheduler */ >>> for (q = 0; q < V3D_MAX_QUEUES; q++) >>> - drm_sched_stop(&v3d->queue[q].sched); >>> + drm_sched_stop(&v3d->queue[q].sched, sched_job); >>> if(sched_job) >>> drm_sched_increase_karma(sched_job); >>> + /* >>> + * Guilty job did complete and hence needs to be manually removed >>> + * See drm_sched_stop doc. >>> + */ >>> + if (list_empty(&sched_job->node)) >>> + sched_job->sched->ops->free_job(sched_job); >> If the if (sched_job) is necessary up above, then this should clearly be >> under it. >> >> But, can we please have a core scheduler thing we call here instead of >> drivers all replicating it? > > Yeah that's also something I noted before. > > Essential problem is that we remove finished jobs from the mirror list > and so need to destruct them because we otherwise leak them. > > Alternative approach here would be to keep the jobs on the ring mirror > list, but not submit them again. > > Regards, > Christian. I really prefer to avoid this, it means adding extra flag to sched_job to check in each iteration of the ring mirror list. What about changing signature of drm_sched_backend_ops.timedout_job to return drm_sched_job* instead of void, this way we can return the guilty job back from the driver specific handler to the generic drm_sched_job_timedout and release it there. Andrey > >> >>> + >>> /* get the GPU back into the init state */ >>> v3d_reset(v3d); > ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH 3/4] drm/amdgpu: Avoid HW reset if guilty job already signaled.
On 4/15/19 2:46 AM, Koenig, Christian wrote: I agree this would be good in case of amdgpu_device_pre_asic_reset, because we can totally skip this function if the guilty job already signaled, but for amdgpu_device_post_asic_reset it creates complications, because drm_sched_start is right in the middle there, after drm_sched_resubmit_jobs but before forcing back the set mode in display, so I prefer to keep passing 'job_signaled' to amdgpu_device_post_asic_reset. The alternative is to get rid of this function and bring its body into amdgpu_device_gpu_recover, which is already pretty cluttered and confusing. What do you think? Yeah, that's what I meant with that this still looks rather unorganized. How about splitting up amdgpu_device_post_asic_reset into more functions? Would that work out as well? I looked into it, and it seems the simplest way is just to move this function body into the main reset function, since it's a pretty short function and there is an internal loop across all rings inside. I re-sent the patches and also amended your lost display patch 'wait for fence without holding reservation lock'. Andrey I think what we should do is to keep amdgpu_device_gpu_recover() the top level control logic, e.g. which step is called on which device in which order. We should not push that decision into the individual steps, because that would make things even more confusing. Regards, Christian. ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
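What "keeping amdgpu_device_gpu_recover() the top level control logic" amounts to in practice: the per-step, per-device ordering stays visible in one function. Skeleton only, with helper names taken from the diffs in this thread and argument lists abbreviated:

	/* 1. Lock each device in scope, stop its schedulers, prepare reset. */
	list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head)
		amdgpu_device_pre_asic_reset(tmp_adev, job, &need_full_reset);

	/* 2. One HW reset covering the whole hive (or the single device). */
	r = amdgpu_do_asic_reset(hive, device_list_handle, &need_full_reset);

	/* 3. Resubmit jobs and restart schedulers on every device. */
	list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head)
		amdgpu_device_post_asic_reset(tmp_adev);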
Re: [PATCH 3/4] drm/amdgpu: Avoid HW reset if guilty job already signaled.
On 4/12/19 3:39 AM, Christian König wrote: > Am 11.04.19 um 18:03 schrieb Andrey Grodzovsky: >> Also reject TDRs if another one already running. >> >> v2: >> Stop all schedulers across device and entire XGMI hive before >> force signaling HW fences. >> >> Signed-off-by: Andrey Grodzovsky >> --- >> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 125 >> - >> 1 file changed, 88 insertions(+), 37 deletions(-) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> index aabd043..ce9c494 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> @@ -3327,7 +3327,8 @@ bool amdgpu_device_should_recover_gpu(struct >> amdgpu_device *adev) >> static int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev, >> struct amdgpu_job *job, >> - bool *need_full_reset_arg) >> + bool *need_full_reset_arg, >> + bool job_signaled) >> { >> int i, r = 0; >> bool need_full_reset = *need_full_reset_arg; >> @@ -3339,8 +3340,6 @@ static int amdgpu_device_pre_asic_reset(struct >> amdgpu_device *adev, >> if (!ring || !ring->sched.thread) >> continue; >> - drm_sched_stop(&ring->sched, &job->base); >> - >> /* after all hw jobs are reset, hw fence is meaningless, so >> force_completion */ >> amdgpu_fence_driver_force_completion(ring); >> } >> @@ -3358,7 +3357,8 @@ static int amdgpu_device_pre_asic_reset(struct >> amdgpu_device *adev, >> - if (!amdgpu_sriov_vf(adev)) { >> + /* Don't suspend on bare metal if we are not going to HW reset >> the ASIC */ >> + if (!amdgpu_sriov_vf(adev) && !job_signaled) { >> if (!need_full_reset) >> need_full_reset = amdgpu_device_ip_need_full_reset(adev); >> @@ -3495,7 +3495,7 @@ static int amdgpu_do_asic_reset(struct >> amdgpu_hive_info *hive, >> return r; >> } >> -static void amdgpu_device_post_asic_reset(struct amdgpu_device *adev) >> +static void amdgpu_device_post_asic_reset(struct amdgpu_device >> *adev, bool job_signaled) >> { >> int i; >> @@ -3505,7 +3505,8 @@ static void >> amdgpu_device_post_asic_reset(struct amdgpu_device *adev) >> if (!ring || !ring->sched.thread) >> continue; >> - if (!adev->asic_reset_res) >> + /* No point to resubmit jobs if we didn't HW reset*/ >> + if (!adev->asic_reset_res && !job_signaled) >> drm_sched_resubmit_jobs(&ring->sched); >> drm_sched_start(&ring->sched, !adev->asic_reset_res); >> @@ -3518,14 +3519,21 @@ static void >> amdgpu_device_post_asic_reset(struct amdgpu_device *adev) >> adev->asic_reset_res = 0; >> } >> -static void amdgpu_device_lock_adev(struct amdgpu_device *adev) >> +static bool amdgpu_device_lock_adev(struct amdgpu_device *adev, bool >> trylock) >> { >> - mutex_lock(&adev->lock_reset); >> + if (trylock) { >> + if (!mutex_trylock(&adev->lock_reset)) >> + return false; >> + } else >> + mutex_lock(&adev->lock_reset); >> + >> atomic_inc(&adev->gpu_reset_counter); >> adev->in_gpu_reset = 1; >> /* Block kfd: SRIOV would do it separately */ >> if (!amdgpu_sriov_vf(adev)) >> amdgpu_amdkfd_pre_reset(adev); >> + >> + return true; >> } >> static void amdgpu_device_unlock_adev(struct amdgpu_device *adev) >> @@ -3553,38 +3561,43 @@ static void amdgpu_device_unlock_adev(struct >> amdgpu_device *adev) >> int amdgpu_device_gpu_recover(struct amdgpu_device *adev, >> struct amdgpu_job *job) >> { >> - int r; >> + int r, i; > > BTW: Usual kernel coding style is to use reverse xmas tree. E.g. > variables like "int i, r" last in the declarations. 
> >> struct amdgpu_hive_info *hive = NULL; >> - bool need_full_reset = false; >> struct amdgpu_device *tmp_adev = NULL; >> struct list_head device_list, *device_list_handle = NULL; >> + bool xgmi_topology_present, need_full_reset, job_signaled; >> + need_full_reset = job_signaled = false; >> INIT_LIST_HEAD(&device_list); >> dev_info(adev->dev, "GPU reset begin!\n"); >> + hive = amdgpu_get_xgmi_hive(adev, 0); > > The second parameter should actually be a bool, so please use false here. > >> + xgmi_topology_present = hive && >> adev->gmc.xgmi.num_physical_nodes > 1; > > Why are we actually checking num_physical_nodes here? That the usual way of knowing you have XGMI topology, but having hive!=NULL should be equivalent so I can remove it. > >> + >> /* >> - * In case of XGMI hive disallow concurrent resets to be triggered >> - * by different nodes. No point also since the one node already >> executing >> - * reset will also reset all the other nodes in the hive. >> + * Here we trylock to a
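The trylock variant introduced in the hunk above is what lets a second, concurrent TDR back off instead of blocking; condensed from the diff, with the KFD pre-reset call and SRIOV details stripped:

static bool amdgpu_device_lock_adev(struct amdgpu_device *adev, bool trylock)
{
	if (trylock) {
		/* A concurrent reset already owns the lock; in a hive its
		 * reset covers this device too, so the caller just bails. */
		if (!mutex_trylock(&adev->lock_reset))
			return false;
	} else
		mutex_lock(&adev->lock_reset);

	atomic_inc(&adev->gpu_reset_counter);
	adev->in_gpu_reset = 1;

	return true;
}

The caller picks the trylock path only when the device sits in an XGMI hive, which is exactly the "disallow concurrent resets across nodes" policy the removed comment used to describe.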
Re: [PATCH 4/4] drm/amd/display: Restore deleted patch to resolve reset deadlock.
On 4/12/19 3:40 AM, Christian König wrote: > Am 11.04.19 um 18:03 schrieb Andrey Grodzovsky: >> Patch '5edb0c9b Fix deadlock with display during hanged ring recovery' >> was accidentally removed during one of DAL's code merges. >> >> Signed-off-by: Andrey Grodzovsky >> --- >> drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 19 >> +-- >> 1 file changed, 13 insertions(+), 6 deletions(-) >> >> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c >> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c >> index 0648794..27e0383 100644 >> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c >> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c >> @@ -5138,14 +5138,21 @@ static void amdgpu_dm_commit_planes(struct >> drm_atomic_state *state, >> */ >> abo = gem_to_amdgpu_bo(fb->obj[0]); >> r = amdgpu_bo_reserve(abo, true); >> - if (unlikely(r != 0)) { >> + if (unlikely(r != 0)) >> DRM_ERROR("failed to reserve buffer before flip\n"); >> - WARN_ON(1); >> - } > > I also already suggested to completely stop waiting while the BO is > being reserved, but it looks like that got dropped as well. > > I would say something is seriously wrong with DAL's development process > here. > > Christian. Yeah, I think your patch that moved the wait out of the reserved section got dropped as well; when I respin the series with your comments for the TDR stuff I will also add a patch restoring your change. Andrey > >> - /* Wait for all fences on this FB */ >> - WARN_ON(reservation_object_wait_timeout_rcu(abo->tbo.resv, true, >> false, >> - MAX_SCHEDULE_TIMEOUT) < 0); >> + /* >> + * Wait for all fences on this FB. Do limited wait to avoid >> + * deadlock during GPU reset when this fence will not signal >> + * but we hold reservation lock for the BO. >> + */ >> + r = reservation_object_wait_timeout_rcu(abo->tbo.resv, >> + true, false, >> + msecs_to_jiffies(5000)); >> + if (unlikely(r == 0)) >> + DRM_ERROR("Waiting for fences timed out."); >> + >> + >> amdgpu_bo_get_tiling_flags(abo, &tiling_flags);
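To spell out the deadlock this bounded wait avoids: the commit worker reserves the FB BO and then waits on its fences, but during a GPU reset those fences can only signal after recovery resubmits the jobs, and recovery in turn needs display to make progress, so an unbounded wait under the reservation lock never returns. The escape hatch is simply the bounded form of the wait, repeated here in isolation (same calls as in the diff above):

	/* Bounded wait: while a reset is in flight these fences cannot
	 * signal, so give up after a while instead of deadlocking with
	 * the BO reservation held. */
	r = reservation_object_wait_timeout_rcu(abo->tbo.resv,
						true, false,
						msecs_to_jiffies(5000));
	if (unlikely(r == 0))
		DRM_ERROR("Waiting for fences timed out.");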
Re: [PATCH 4/4] drm/amd/display: Restore deleted patch to resolve reset deadlock.
On 4/11/19 12:41 PM, Kazlauskas, Nicholas wrote: > On 4/11/19 12:03 PM, Andrey Grodzovsky wrote: >> Patch '5edb0c9b Fix deadlock with display during hanged ring recovery' >> was accidentally removed during one of DAL's code merges. >> >> Signed-off-by: Andrey Grodzovsky > Reviewed-by: Nicholas Kazlauskas > > Probably got lost in a refactor. > > Also, didn't Christian have a patch recently to not lock the reservation > object when waiting for the fence? Looks like that's missing too, or > maybe it didn't get merged. > > Nicholas Kazlauskas This patch actually didn't help to resolve the deadlock - take a look at https://bugs.freedesktop.org/show_bug.cgi?id=109692 toward the end. I believe the reason is that the fences attached to the FB BO in the reservation object are SW fences; they are detached from HW fences during reset and will only be attached back/signaled later in the reset sequence when we resubmit the jobs in the ring mirror list. So the force signaling of HW fences that we do before suspending the display will have no effect. But I am not 100% sure about this. Andrey > >> --- >>drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 19 +-- >>1 file changed, 13 insertions(+), 6 deletions(-) >> >> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c >> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c >> index 0648794..27e0383 100644 >> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c >> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c >> @@ -5138,14 +5138,21 @@ static void amdgpu_dm_commit_planes(struct >> drm_atomic_state *state, >> */ >> abo = gem_to_amdgpu_bo(fb->obj[0]); >> r = amdgpu_bo_reserve(abo, true); >> -if (unlikely(r != 0)) { >> +if (unlikely(r != 0)) >> DRM_ERROR("failed to reserve buffer before flip\n"); >> -WARN_ON(1); >> -} >> >> -/* Wait for all fences on this FB */ >> -WARN_ON(reservation_object_wait_timeout_rcu(abo->tbo.resv, >> true, false, >> - >> MAX_SCHEDULE_TIMEOUT) < 0); >> +/* >> + * Wait for all fences on this FB. Do limited wait to avoid >> + * deadlock during GPU reset when this fence will not signal >> + * but we hold reservation lock for the BO. >> + */ >> +r = reservation_object_wait_timeout_rcu(abo->tbo.resv, >> +true, false, >> +msecs_to_jiffies(5000)); >> +if (unlikely(r == 0)) >> +DRM_ERROR("Waiting for fences timed out."); >> + >> + >> >> amdgpu_bo_get_tiling_flags(abo, &tiling_flags);
Re: [PATCH 3/3] drm/amdgpu: Avoid HW reset if guilty job already signaled.
On 4/10/19 10:41 AM, Christian König wrote: > Am 10.04.19 um 16:28 schrieb Grodzovsky, Andrey: >> On 4/10/19 10:06 AM, Christian König wrote: >>> Am 09.04.19 um 18:42 schrieb Grodzovsky, Andrey: >>>> On 4/9/19 10:50 AM, Christian König wrote: >>>>> Am 08.04.19 um 18:08 schrieb Andrey Grodzovsky: >>>>>> Also reject TDRs if another one already running. >>>>>> >>>>>> Signed-off-by: Andrey Grodzovsky >>>>>> --- >>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 94 >>>>>> +- >>>>>> 1 file changed, 67 insertions(+), 27 deletions(-) >>>>>> >>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>>>> index aabd043..4446892 100644 >>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>>>> @@ -3327,10 +3327,12 @@ bool amdgpu_device_should_recover_gpu(struct >>>>>> amdgpu_device *adev) >>>>>> static int amdgpu_device_pre_asic_reset(struct amdgpu_device >>>>>> *adev, >>>>>> struct amdgpu_job *job, >>>>>> - bool *need_full_reset_arg) >>>>>> + bool *need_full_reset_arg, >>>>>> + bool *job_signaled) >>>>>> { >>>>>> int i, r = 0; >>>>>> bool need_full_reset = *need_full_reset_arg; >>>>>> + struct amdgpu_ring *job_ring = job ? >>>>>> to_amdgpu_ring(job->base.sched) : NULL; >>>>>> /* block all schedulers and reset given job's ring */ >>>>>> for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { >>>>>> @@ -3341,6 +3343,17 @@ static int >>>>>> amdgpu_device_pre_asic_reset(struct >>>>>> amdgpu_device *adev, >>>>>> drm_sched_stop(&ring->sched, &job->base); >>>>>> + /* >>>>>> + * Must check guilty signal here since after this point >>>>>> all old >>>>>> + * HW fences are force signaled. >>>>>> + * >>>>>> + * job->base holds a reference to parent fence >>>>>> + */ >>>>>> + if (job_signaled && job && ring == job_ring && >>>>>> + job->base.s_fence->parent && >>>>>> + dma_fence_is_signaled(job->base.s_fence->parent)) >>>>>> + *job_signaled = true; >>>>>> + >>>>> That won't work correctly. See when the guilty job is not on the >>>>> first >>>>> scheduler, you would already have force completed some before getting >>>>> here. >>>>> >>>>> Better to stop all schedulers first and then do the check. >>>>> >>>>> Christian. >>>> What do you mean by first scheduler ? There is one scheduler object >>>> per >>>> ring so I am not clear what 'first' means here. >>> Well for example if the guilty job is from a compute ring the we have >>> already force signaled the gfx ring here. >>> >>> Same is true for other devices in the same hive, so it would probably >>> be a good idea to move the force signaling and the IP reset somewhere >>> else and this check up a layer. >>> >>> Christian. >> >> Let me see if I understand you correctly - you want to AVOID ANY force >> signaling in case we are not going to HW reset and so you want to have >> the check if guilty is signaled BEFORE any ring fences are force >> signaled. Correct ? > > Correct. > > Basically we should do the following: > 1. Stop all schedulers to make sure that nothing is going on. > 2. Check the guilty job once more to make sure that it hasn't signaled > in the meantime. > 3. Start our reset procedure, with force complete, soft reset > eventually hard reset etc etc.. > 4. Resubmit all not yet completed jobs. > 5. Start the schedulers again. > > Christian. Why not just always ensure the guilty job's ring is always checked first and then do the rest of the rings - inside amdgpu_device_pre_asic_reset. Seems to me
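Christian's five steps map naturally onto helpers that already exist in the driver; a rough sketch of the ordering, with ring NULL checks done minimally and the locking and XGMI device-list walk elided (an illustration of the sequence, not the eventual patch):

static void reset_sequence_sketch(struct amdgpu_device *adev,
				  struct amdgpu_job *job)
{
	bool job_signaled;
	int i;

	/* 1. Quiesce: stop every scheduler so nothing races with us. */
	for (i = 0; i < AMDGPU_MAX_RINGS; ++i)
		if (adev->rings[i])
			drm_sched_stop(&adev->rings[i]->sched, NULL);

	/* 2. Re-check the guilty job: it may have signaled by now. */
	job_signaled = job && job->base.s_fence->parent &&
		       dma_fence_is_signaled(job->base.s_fence->parent);

	if (!job_signaled) {
		/* 3. Only now force-complete fences and reset: soft
		 *    reset first, hard reset if that fails (elided). */

		/* 4. Resubmit everything that had not completed. */
		for (i = 0; i < AMDGPU_MAX_RINGS; ++i)
			if (adev->rings[i])
				drm_sched_resubmit_jobs(&adev->rings[i]->sched);
	}

	/* 5. Let the schedulers run again (full recovery re-attaches
	 *    the HW fence callbacks). */
	for (i = 0; i < AMDGPU_MAX_RINGS; ++i)
		if (adev->rings[i])
			drm_sched_start(&adev->rings[i]->sched, true);
}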
Re: [PATCH 3/3] drm/amdgpu: Avoid HW reset if guilty job already signaled.
On 4/10/19 10:06 AM, Christian König wrote: > Am 09.04.19 um 18:42 schrieb Grodzovsky, Andrey: >> On 4/9/19 10:50 AM, Christian König wrote: >>> Am 08.04.19 um 18:08 schrieb Andrey Grodzovsky: >>>> Also reject TDRs if another one already running. >>>> >>>> Signed-off-by: Andrey Grodzovsky >>>> --- >>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 94 >>>> +- >>>> 1 file changed, 67 insertions(+), 27 deletions(-) >>>> >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>> index aabd043..4446892 100644 >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>> @@ -3327,10 +3327,12 @@ bool amdgpu_device_should_recover_gpu(struct >>>> amdgpu_device *adev) >>>> static int amdgpu_device_pre_asic_reset(struct amdgpu_device >>>> *adev, >>>> struct amdgpu_job *job, >>>> - bool *need_full_reset_arg) >>>> + bool *need_full_reset_arg, >>>> + bool *job_signaled) >>>> { >>>> int i, r = 0; >>>> bool need_full_reset = *need_full_reset_arg; >>>> + struct amdgpu_ring *job_ring = job ? >>>> to_amdgpu_ring(job->base.sched) : NULL; >>>> /* block all schedulers and reset given job's ring */ >>>> for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { >>>> @@ -3341,6 +3343,17 @@ static int amdgpu_device_pre_asic_reset(struct >>>> amdgpu_device *adev, >>>> drm_sched_stop(&ring->sched, &job->base); >>>> + /* >>>> + * Must check guilty signal here since after this point >>>> all old >>>> + * HW fences are force signaled. >>>> + * >>>> + * job->base holds a reference to parent fence >>>> + */ >>>> + if (job_signaled && job && ring == job_ring && >>>> + job->base.s_fence->parent && >>>> + dma_fence_is_signaled(job->base.s_fence->parent)) >>>> + *job_signaled = true; >>>> + >>> That won't work correctly. See when the guilty job is not on the first >>> scheduler, you would already have force completed some before getting >>> here. >>> >>> Better to stop all schedulers first and then do the check. >>> >>> Christian. >> >> What do you mean by first scheduler ? There is one scheduler object per >> ring so I am not clear what 'first' means here. > > Well for example if the guilty job is from a compute ring the we have > already force signaled the gfx ring here. > > Same is true for other devices in the same hive, so it would probably > be a good idea to move the force signaling and the IP reset somewhere > else and this check up a layer. > > Christian. Let me see if I understand you correctly - you want to AVOID ANY force signaling in case we are not going to HW reset and so you want to have the check if guilty is signaled BEFORE any ring fences are force signaled. Correct ? 
Andrey > >> >> Andrey >> >> >>>> /* after all hw jobs are reset, hw fence is meaningless, so >>>> force_completion */ >>>> amdgpu_fence_driver_force_completion(ring); >>>> } >>>> @@ -3358,7 +3371,8 @@ static int amdgpu_device_pre_asic_reset(struct >>>> amdgpu_device *adev, >>>> - if (!amdgpu_sriov_vf(adev)) { >>>> + /* Don't suspend on bare metal if we are not going to HW reset >>>> the ASIC */ >>>> + if (!amdgpu_sriov_vf(adev) && !(*job_signaled)) { >>>> if (!need_full_reset) >>>> need_full_reset = >>>> amdgpu_device_ip_need_full_reset(adev); >>>> @@ -3495,7 +3509,7 @@ static int amdgpu_do_asic_reset(struct >>>> amdgpu_hive_info *hive, >>>> return r; >>>> } >>>> -static void amdgpu_device_post_asic_reset(struct amdgpu_device >>>> *adev) >>>> +static void amdgpu_device_post_asic_reset(struct amdgpu_device >>>> *adev, bool job_signaled) >>>> { >>>> int i; >>>> @@ -3505,7
Re: [PATCH 3/3] drm/amdgpu: Avoid HW reset if guilty job already signaled.
On 4/9/19 10:50 AM, Christian König wrote: > Am 08.04.19 um 18:08 schrieb Andrey Grodzovsky: >> Also reject TDRs if another one already running. >> >> Signed-off-by: Andrey Grodzovsky >> --- >> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 94 >> +- >> 1 file changed, 67 insertions(+), 27 deletions(-) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> index aabd043..4446892 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> @@ -3327,10 +3327,12 @@ bool amdgpu_device_should_recover_gpu(struct >> amdgpu_device *adev) >> static int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev, >> struct amdgpu_job *job, >> - bool *need_full_reset_arg) >> + bool *need_full_reset_arg, >> + bool *job_signaled) >> { >> int i, r = 0; >> bool need_full_reset = *need_full_reset_arg; >> + struct amdgpu_ring *job_ring = job ? >> to_amdgpu_ring(job->base.sched) : NULL; >> /* block all schedulers and reset given job's ring */ >> for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { >> @@ -3341,6 +3343,17 @@ static int amdgpu_device_pre_asic_reset(struct >> amdgpu_device *adev, >> drm_sched_stop(&ring->sched, &job->base); >> + /* >> + * Must check guilty signal here since after this point all old >> + * HW fences are force signaled. >> + * >> + * job->base holds a reference to parent fence >> + */ >> + if (job_signaled && job && ring == job_ring && >> + job->base.s_fence->parent && >> + dma_fence_is_signaled(job->base.s_fence->parent)) >> + *job_signaled = true; >> + > > That won't work correctly. See when the guilty job is not on the first > scheduler, you would already have force completed some before getting > here. > > Better to stop all schedulers first and then do the check. > > Christian. What do you mean by first scheduler ? There is one scheduler object per ring so I am not clear what 'first' means here. 
Andrey > >> /* after all hw jobs are reset, hw fence is meaningless, so >> force_completion */ >> amdgpu_fence_driver_force_completion(ring); >> } >> @@ -3358,7 +3371,8 @@ static int amdgpu_device_pre_asic_reset(struct >> amdgpu_device *adev, >> - if (!amdgpu_sriov_vf(adev)) { >> + /* Don't suspend on bare metal if we are not going to HW reset >> the ASIC */ >> + if (!amdgpu_sriov_vf(adev) && !(*job_signaled)) { >> if (!need_full_reset) >> need_full_reset = amdgpu_device_ip_need_full_reset(adev); >> @@ -3495,7 +3509,7 @@ static int amdgpu_do_asic_reset(struct >> amdgpu_hive_info *hive, >> return r; >> } >> -static void amdgpu_device_post_asic_reset(struct amdgpu_device *adev) >> +static void amdgpu_device_post_asic_reset(struct amdgpu_device >> *adev, bool job_signaled) >> { >> int i; >> @@ -3505,7 +3519,8 @@ static void >> amdgpu_device_post_asic_reset(struct amdgpu_device *adev) >> if (!ring || !ring->sched.thread) >> continue; >> - if (!adev->asic_reset_res) >> + /* No point to resubmit jobs if we didn't HW reset*/ >> + if (!adev->asic_reset_res && !job_signaled) >> drm_sched_resubmit_jobs(&ring->sched); >> drm_sched_start(&ring->sched, !adev->asic_reset_res); >> @@ -3518,14 +3533,21 @@ static void >> amdgpu_device_post_asic_reset(struct amdgpu_device *adev) >> adev->asic_reset_res = 0; >> } >> -static void amdgpu_device_lock_adev(struct amdgpu_device *adev) >> +static bool amdgpu_device_lock_adev(struct amdgpu_device *adev, bool >> trylock) >> { >> - mutex_lock(&adev->lock_reset); >> + if (trylock) { >> + if (!mutex_trylock(&adev->lock_reset)) >> + return false; >> + } else >> + mutex_lock(&adev->lock_reset); >> + >> atomic_inc(&adev->gpu_reset_counter); >> adev->in_gpu_reset = 1; >> /* Block kfd: SRIOV would do it separately */ >> if (!amdgpu_sriov_vf(adev)) >> amdgpu_amdkfd_pre_reset(adev); >> + >> + return true; >> } >> static void amdgpu_device_unlock_adev(struct amdgpu_device *adev) >> @@ -3555,29 +3577,44 @@ int amdgpu_device_gpu_recover(struct >> amdgpu_device *adev, >> { >> int r; >> struct amdgpu_hive_info *hive = NULL; >> - bool need_full_reset = false; >> struct amdgpu_device *tmp_adev = NULL; >> struct list_head device_list, *device_list_handle = NULL; >> + bool xgmi_topology_present, need_full_reset, job_signaled; >> + need_full_reset = job_signaled = false; >> INIT_LIST_HEAD(&device_list); >> dev_info(adev->dev, "GPU reset begin!\n"); >> + hive = amdgpu_get_xgmi_hive(adev, 0); >> + xgmi_topology_present = hive && >> adev->gm
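The placement Christian objects to is visible in the hunk quoted above: the signaled-check sits inside the per-ring loop, so by the time the loop reaches the guilty ring, fences on rings visited earlier were already force-completed. The fix is purely structural; sketched with the same names as the patch:

static void pre_reset_sketch(struct amdgpu_device *adev,
			     struct amdgpu_job *job, bool *job_signaled)
{
	int i;

	/* First pass: stop all schedulers, force-complete nothing yet. */
	for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
		struct amdgpu_ring *ring = adev->rings[i];

		if (!ring || !ring->sched.thread)
			continue;
		drm_sched_stop(&ring->sched, &job->base);
	}

	/* One race-free check once everything is quiesced. */
	if (job && job->base.s_fence->parent &&
	    dma_fence_is_signaled(job->base.s_fence->parent))
		*job_signaled = true;

	/* Second pass, only if the job is really hung. */
	if (!*job_signaled) {
		for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
			struct amdgpu_ring *ring = adev->rings[i];

			if (!ring || !ring->sched.thread)
				continue;
			amdgpu_fence_driver_force_completion(ring);
		}
	}
}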
Re: [PATCH] drm/v3d: Fix calling drm_sched_resubmit_jobs for same sched.
np Andrey On 3/13/19 1:53 PM, Eric Anholt wrote: > "Grodzovsky, Andrey" writes: > >> On 3/13/19 12:13 PM, Eric Anholt wrote: >>> "Grodzovsky, Andrey" writes: >>> >>>> They are not the same, but the guilty job belongs to only one {entity, >>>> scheduler} pair and so we mark it as guilty only for that particular >>>> entity in the context of that scheduler, and only once. >>> I get it now, sorry. I'll merge this through drm-misc-next. >> np, I actually pushed it into our internal branch already, so you can do >> that or wait for our next pull request. > I also fixed the whitespace in the moved code and added the missing > Fixes: line, so I'd like to get it merged through the proper tree for > maintaining v3d.
Re: [PATCH] drm/v3d: Fix calling drm_sched_resubmit_jobs for same sched.
On 3/13/19 12:13 PM, Eric Anholt wrote: > "Grodzovsky, Andrey" writes: > >> They are not the same, but the guilty job belongs to only one {entity, >> scheduler} pair and so we mark it as guilty only for that particular >> entity in the context of that scheduler, and only once. > I get it now, sorry. I'll merge this through drm-misc-next. np, I actually pushed it into our internal branch already, so you can do that or wait for our next pull request. Andrey
Re: [PATCH] drm/v3d: Fix calling drm_sched_resubmit_jobs for same sched.
They are not the same, but the guilty job belongs to only one {entity, scheduler} pair and so we mark it as guilty only for that particular entity in the context of that scheduler, and only once. Andrey From: Eric Anholt Sent: 12 March 2019 13:33:16 To: Grodzovsky, Andrey; dri-devel@lists.freedesktop.org; amd-...@lists.freedesktop.org; to...@tomeuvizoso.net Cc: Grodzovsky, Andrey Subject: Re: [PATCH] drm/v3d: Fix calling drm_sched_resubmit_jobs for same sched. Andrey Grodzovsky writes: > Also stop calling drm_sched_increase_karma multiple times. Each v3d->queue[q].sched was initialized with a separate drm_sched_init(). I wouldn't have thought they were all the "same sched".
Re: [PATCH v6 1/2] drm/sched: Refactor ring mirror list handling.
On 3/12/19 3:43 AM, Tomeu Vizoso wrote: > On Thu, 27 Dec 2018 at 20:28, Andrey Grodzovsky > wrote: >> Decouple sched threads stop and start and ring mirror >> list handling from the policy of what to do about the >> guilty jobs. >> When stopping the sched thread and detaching sched fences >> from non-signaled HW fences, wait for all signaled HW fences >> to complete before rerunning the jobs. >> >> v2: Fix resubmission of guilty job into HW after refactoring. >> >> v4: >> Full restart for all the jobs, not only from guilty ring. >> Extract karma increase into standalone function. >> >> v5: >> Rework waiting for signaled jobs without relying on the job >> struct itself as those might already be freed for non 'guilty' >> job's schedulers. >> Expose karma increase to drivers. >> >> v6: >> Use list_for_each_entry_safe_continue and drm_sched_process_job >> in case fence already signaled. >> Call drm_sched_increase_karma only once for amdgpu and add documentation. >> >> Suggested-by: Christian Koenig >> Signed-off-by: Andrey Grodzovsky >> --- >> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 20 ++- >> drivers/gpu/drm/etnaviv/etnaviv_sched.c| 11 +- >> drivers/gpu/drm/scheduler/sched_main.c | 195 >> +++-- >> drivers/gpu/drm/v3d/v3d_sched.c| 12 +- >> include/drm/gpu_scheduler.h| 8 +- >> 5 files changed, 157 insertions(+), 89 deletions(-) >> > [snip] >> diff --git a/drivers/gpu/drm/v3d/v3d_sched.c >> b/drivers/gpu/drm/v3d/v3d_sched.c >> index 445b2ef..f76d9ed 100644 >> --- a/drivers/gpu/drm/v3d/v3d_sched.c >> +++ b/drivers/gpu/drm/v3d/v3d_sched.c >> @@ -178,18 +178,22 @@ v3d_job_timedout(struct drm_sched_job *sched_job) >> for (q = 0; q < V3D_MAX_QUEUES; q++) { >> struct drm_gpu_scheduler *sched = &v3d->queue[q].sched; >> >> - kthread_park(sched->thread); >> - drm_sched_hw_job_reset(sched, (sched_job->sched == sched ? >> + drm_sched_stop(sched, (sched_job->sched == sched ? >> sched_job : NULL)); >> + >> + if(sched_job) >> + drm_sched_increase_karma(sched_job); >> } >> >> /* get the GPU back into the init state */ >> v3d_reset(v3d); >> >> + for (q = 0; q < V3D_MAX_QUEUES; q++) >> + drm_sched_resubmit_jobs(sched_job->sched); > Hi Andrey, > > I'm not sure what the original intent was, but I guess it wasn't to > repeatedly call resubmit_jobs on that specific job's queue? > > Regards, > > Tomeu My bad, there is also another mistake here with increasing karma for the guilty job's entity multiple times. I will fix that. Thanks for pointing it out. Andrey
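Putting Tomeu's and Andrey's observations together, the corrected handler stops every queue, bumps karma once for the guilty job only, resets, and then resubmits each queue's own scheduler instead of the guilty one V3D_MAX_QUEUES times. A sketch under those assumptions (how the v3d device is looked up from the job is assumed here, and this is not the merged fix verbatim):

static void v3d_job_timedout_sketch(struct drm_sched_job *sched_job)
{
	struct v3d_job *job = to_v3d_job(sched_job);	/* assumed helper */
	struct v3d_dev *v3d = job->v3d;			/* assumed field */
	int q;

	for (q = 0; q < V3D_MAX_QUEUES; q++) {
		struct drm_gpu_scheduler *sched = &v3d->queue[q].sched;

		drm_sched_stop(sched,
			       sched_job->sched == sched ? sched_job : NULL);
	}

	/* Once for the guilty job, not once per queue. */
	if (sched_job)
		drm_sched_increase_karma(sched_job);

	/* get the GPU back into the init state */
	v3d_reset(v3d);

	/* Each queue resubmits its own jobs and restarts. */
	for (q = 0; q < V3D_MAX_QUEUES; q++)
		drm_sched_resubmit_jobs(&v3d->queue[q].sched);

	for (q = 0; q < V3D_MAX_QUEUES; q++)
		drm_sched_start(&v3d->queue[q].sched, true);
}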
Re: [PATCH v2] tests/amdgpu: add deadlock test for sdma
On 3/6/19 1:37 AM, Cui, Flora wrote: > deadlock test for sdma will cause gpu recoverty. > disable the test for now until GPU reset recovery could survive at least > 1000 times test. Can you specify what issues you see and on what ASIC ? Andrey > > v2: add modprobe parameter > > Change-Id: I9adac63c62db22107345eddb30e7d81a1bda838c > Signed-off-by: Flora Cui > --- > tests/amdgpu/amdgpu_test.c| 4 ++ > tests/amdgpu/deadlock_tests.c | 103 > +- > 2 files changed, 106 insertions(+), 1 deletion(-) > > diff --git a/tests/amdgpu/amdgpu_test.c b/tests/amdgpu/amdgpu_test.c > index ebf4409..38b8a68 100644 > --- a/tests/amdgpu/amdgpu_test.c > +++ b/tests/amdgpu/amdgpu_test.c > @@ -426,6 +426,10 @@ static void amdgpu_disable_suites() > "compute ring block test (set > amdgpu.lockup_timeout=50)", CU_FALSE)) > fprintf(stderr, "test deactivation failed - %s\n", > CU_get_error_msg()); > > + if (amdgpu_set_test_active(DEADLOCK_TESTS_STR, > + "sdma ring block test (set > amdgpu.lockup_timeout=50)", CU_FALSE)) > + fprintf(stderr, "test deactivation failed - %s\n", > CU_get_error_msg()); > + > if (amdgpu_set_test_active(BO_TESTS_STR, "Metadata", CU_FALSE)) > fprintf(stderr, "test deactivation failed - %s\n", > CU_get_error_msg()); > > diff --git a/tests/amdgpu/deadlock_tests.c b/tests/amdgpu/deadlock_tests.c > index a6c2635..91368c1 100644 > --- a/tests/amdgpu/deadlock_tests.c > +++ b/tests/amdgpu/deadlock_tests.c > @@ -96,6 +96,9 @@ > > #define mmVM_CONTEXT0_PAGE_TABLE_BASE_ADDR > 0x54f > > +#define SDMA_PKT_HEADER_OP(x)(x & 0xff) > +#define SDMA_OP_POLL_REGMEM 8 > + > static amdgpu_device_handle device_handle; > static uint32_t major_version; > static uint32_t minor_version; > @@ -110,6 +113,7 @@ static void amdgpu_deadlock_gfx(void); > static void amdgpu_deadlock_compute(void); > static void amdgpu_illegal_reg_access(); > static void amdgpu_illegal_mem_access(); > +static void amdgpu_deadlock_sdma(void); > > CU_BOOL suite_deadlock_tests_enable(void) > { > @@ -171,6 +175,7 @@ int suite_deadlock_tests_clean(void) > CU_TestInfo deadlock_tests[] = { > { "gfx ring block test (set amdgpu.lockup_timeout=50)", > amdgpu_deadlock_gfx }, > { "compute ring block test (set amdgpu.lockup_timeout=50)", > amdgpu_deadlock_compute }, > + { "sdma ring block test (set amdgpu.lockup_timeout=50)", > amdgpu_deadlock_sdma }, > { "illegal reg access test", amdgpu_illegal_reg_access }, > { "illegal mem access test (set amdgpu.vm_fault_stop=2)", > amdgpu_illegal_mem_access }, > CU_TEST_INFO_NULL, > @@ -260,7 +265,6 @@ static void amdgpu_deadlock_helper(unsigned ip_type) > ibs_request.ibs = &ib_info; > ibs_request.resources = bo_list; > ibs_request.fence_info.handle = NULL; > - > for (i = 0; i < 200; i++) { > r = amdgpu_cs_submit(context_handle, 0,&ibs_request, 1); > CU_ASSERT_EQUAL((r == 0 || r == -ECANCELED), 1); > @@ -291,6 +295,103 @@ static void amdgpu_deadlock_helper(unsigned ip_type) > CU_ASSERT_EQUAL(r, 0); > } > > +static void amdgpu_deadlock_sdma(void) > +{ > + amdgpu_context_handle context_handle; > + amdgpu_bo_handle ib_result_handle; > + void *ib_result_cpu; > + uint64_t ib_result_mc_address; > + struct amdgpu_cs_request ibs_request; > + struct amdgpu_cs_ib_info ib_info; > + struct amdgpu_cs_fence fence_status; > + uint32_t expired; > + int i, r; > + amdgpu_bo_list_handle bo_list; > + amdgpu_va_handle va_handle; > + struct drm_amdgpu_info_hw_ip info; > + uint32_t ring_id; > + > + r = amdgpu_query_hw_ip_info(device_handle, AMDGPU_HW_IP_DMA, 0, &info); > + CU_ASSERT_EQUAL(r, 0); > + > + r = amdgpu_cs_ctx_create(device_handle, 
&context_handle); > + CU_ASSERT_EQUAL(r, 0); > + > + for (ring_id = 0; (1 << ring_id) & info.available_rings; ring_id++) { > + r = pthread_create(&stress_thread, NULL, write_mem_address, > NULL); > + CU_ASSERT_EQUAL(r, 0); > + > + r = amdgpu_bo_alloc_and_map_raw(device_handle, 4096, 4096, > + AMDGPU_GEM_DOMAIN_GTT, 0, use_uc_mtype ? > AMDGPU_VM_MTYPE_UC : 0, > + &ib_result_handle, > &ib_result_cpu, > + > &ib_result_mc_address, &va_handle); > + CU_ASSERT_EQUAL(r, 0); > + > + r = amdgpu_get_bo_list(device_handle, ib_result_handle, NULL, > +&bo_list); > + CU_ASSERT_EQUAL(r, 0); > + > + ptr = ib_result_cpu; > + i = 0; > + > + ptr[i++] = SDMA_PKT_HEADER_OP(SDMA_OP_POLL_REGMEM) | > + (0 << 26) | /* WAIT
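For reference, parking a known-bad case uses the suite's existing wrapper around CUnit test deactivation; the sdma case is disabled by default exactly like the compute one, mirroring the amdgpu_test.c hunk above:

	if (amdgpu_set_test_active(DEADLOCK_TESTS_STR,
			"sdma ring block test (set amdgpu.lockup_timeout=50)",
			CU_FALSE))
		fprintf(stderr, "test deactivation failed - %s\n",
			CU_get_error_msg());

Anyone who wants to run the soak Flora mentions flips CU_FALSE to CU_TRUE locally and boots with amdgpu.lockup_timeout=50, as the test name demands.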
Re: [PATCH v5 1/2] drm/sched: Refactor ring mirror list handling.
OK, I will update patches 1 and 2 and, given your RBs, push them since they fix some races. I will then update and test patch 3 on some basic scenarios and will send it for separate review, where I might put a TODO comment in the code with my objections regarding long jobs from our discussion so you can see and reply to that. Andrey On 01/24/2019 06:34 AM, Koenig, Christian wrote: > I see a few cleanups on Patch #3 which actually belong in patch #1: > >> +void drm_sched_stop(struct drm_gpu_scheduler *sched, struct >> drm_sched_job *bad) > The "bad" job parameter actually isn't used any more, is it? > >> +retry_wait: > Not used any more. > > But apart from that at least patch #1 and #2 look like they can have my > rb now. > > Patch #3 looks also like it should work after a bit of polishing. > > Thanks, > Christian. > > Am 18.01.19 um 20:15 schrieb Grodzovsky, Andrey: >> Attached series is the first 2 patches we already discussed about ring >> mirror list handling racing with all your comments fixed (still not >> committed). The third patch is a prototype based on the first 2 patches >> and on our discussion. >> >> Please take a look. >> >> Andrey >> >> >> On 01/18/2019 01:32 PM, Koenig, Christian wrote: >>> Am 18.01.19 um 18:34 schrieb Grodzovsky, Andrey: >>>> On 01/18/2019 12:10 PM, Koenig, Christian wrote: >>>>> Am 18.01.19 um 16:21 schrieb Grodzovsky, Andrey: >>>>>> On 01/18/2019 04:25 AM, Koenig, Christian wrote: >>>>>>> [SNIP] >>>>>>>>>>> Re-arming the timeout should probably have a much reduced value >>>>>>>>>>> when the job hasn't changed. E.g. something like a few ms. >>>>>>>> Now i got thinking about non hanged job in progress (job A) and let's >>>>>>>> say it's a long job , it just started executing but due to time out of >>>>>>>> another job (job B) on another (or this scheduler) it's parent cb got >>>>>>>> disconnected, we disarmed the tdr timer for the job's scheduler, >>>>>>>> meanwhile the timed out job did manage to complete before HW reset >>>>>>>> check and hence we skip HW reset, attach back the cb and rearm job's A >>>>>>>> tdr timer with a future value of few ms only - aren't we going to get >>>>>>>> false tdr triggered on job B now because we didn't let it enough time >>>>>>>> to run and complete ? I would prefer the other extreme of longer time >>>>>>>> for time out to trigger then false TDR. Optimally we would have per >>>>>>>> job timer and rearm to exactly the reminder of it's time out value - >>>>>>>> but we gave up on per job tdr work long ago. >>>>>>> Well we only re-arm the timeout with a shorter period if it already >>>>>>> triggered once. If we just suspend the timeout then we should still use >>>>>>> the longer period. >>>>>> Can you explain more on this ? I don't get it. >>>>> See drm_sched_job_timedout(), we re-arm the timeout at the end of the >>>>> procedure. >>>>> >>>>> We should change that and re-arm the timer with a much lower timeout if >>>>> the job is still not finished. >>>>> >>>>> Christian. >>>> I still don't see how this can fix the problem of of long job in >>>> progress triggering false tdr if no HW reset was done, but maybe I am >>>> missing other pieces you have in mind, I will finish the patch and send >>>> it and then we can be more specific based on the code. >>> Ok sounds good. We should probably discuss less on details and prototype >>> a bit more. >>> >>> Might be that I'm missing something here as well, so probably good to >>> have some code to talk about things more directly. >>> >>> Christian.
>>> >>>> Andrey >>>> >>>>>> Andrey >>>>>> >>>>>>>> In general the more i think about it (correct me if I am wrong) I am >>>>>>>> less sure how much the optimization feature is useful - if job's time >>>>>>>> out did trigger what are the chances that the little more time we give >>>>>>>> it between beginning of tdr function and the time we do start the >>>>&
Re: [PATCH v5 1/2] drm/sched: Refactor ring mirror list handling.
Attached series is the first 2 patches we already discussed about ring mirror list handling racing with all your comments fixed (still not committed). The third patch is a prototype based on the first 2 patches and on our discussion. Please take a look. Andrey On 01/18/2019 01:32 PM, Koenig, Christian wrote: > Am 18.01.19 um 18:34 schrieb Grodzovsky, Andrey: >> On 01/18/2019 12:10 PM, Koenig, Christian wrote: >>> Am 18.01.19 um 16:21 schrieb Grodzovsky, Andrey: >>>> On 01/18/2019 04:25 AM, Koenig, Christian wrote: >>>>> [SNIP] >>>>>>>>> Re-arming the timeout should probably have a much reduced value >>>>>>>>> when the job hasn't changed. E.g. something like a few ms. >>>>>> Now i got thinking about non hanged job in progress (job A) and let's >>>>>> say it's a long job , it just started executing but due to time out of >>>>>> another job (job B) on another (or this scheduler) it's parent cb got >>>>>> disconnected, we disarmed the tdr timer for the job's scheduler, >>>>>> meanwhile the timed out job did manage to complete before HW reset >>>>>> check and hence we skip HW reset, attach back the cb and rearm job's A >>>>>> tdr timer with a future value of few ms only - aren't we going to get >>>>>> false tdr triggered on job B now because we didn't let it enough time >>>>>> to run and complete ? I would prefer the other extreme of longer time >>>>>> for time out to trigger then false TDR. Optimally we would have per >>>>>> job timer and rearm to exactly the reminder of it's time out value - >>>>>> but we gave up on per job tdr work long ago. >>>>> Well we only re-arm the timeout with a shorter period if it already >>>>> triggered once. If we just suspend the timeout then we should still use >>>>> the longer period. >>>> Can you explain more on this ? I don't get it. >>> See drm_sched_job_timedout(), we re-arm the timeout at the end of the >>> procedure. >>> >>> We should change that and re-arm the timer with a much lower timeout if >>> the job is still not finished. >>> >>> Christian. >> I still don't see how this can fix the problem of of long job in >> progress triggering false tdr if no HW reset was done, but maybe I am >> missing other pieces you have in mind, I will finish the patch and send >> it and then we can be more specific based on the code. > Ok sounds good. We should probably discuss less on details and prototype > a bit more. > > Might be that I'm missing something here as well, so probably good to > have some code to talk about things more directly. > > Christian. > >> Andrey >> >>>> Andrey >>>> >>>>>> In general the more i think about it (correct me if I am wrong) I am >>>>>> less sure how much the optimization feature is useful - if job's time >>>>>> out did trigger what are the chances that the little more time we give >>>>>> it between beginning of tdr function and the time we do start the >>>>>> actual HW reset will be exactly what it needed to complete. Also, this >>>>>> is still not water proof as the job might complete and signal it's HW >>>>>> fence exactly after we checked for completion but before starting the >>>>>> HW reset code. >>>>> I don't see this as an optimization, but rather as mandatory for correct >>>>> operation. >>>>> >>>>> See without this we can run into issues because we execute jobs multiple >>>>> times. That can still happen with this clean handling, but it is much >>>>> more unlikely. >>>>> >>>>> Christian. 
>>>>> >>>>>> Andrey >>>>>> >>>>>>>> By unchanged you mean when we didn't resubmit the job because of the >>>>>>>> optimized non HW reset, right ? >>>>>>> Correct, yes. >>>>>>> >>>>>>>>>> About flushing tdr jobs in progress from .free_job cb - looks like >>>>>>>>>> drm_sched_job_finish->cancel_delayed_work_sync is not enough, we >>>>>>>>>> still need to take care of flushing all sced->work_tdr for a >>>>>>>>>> device and for all devices in
Re: [PATCH v5 1/2] drm/sched: Refactor ring mirror list handling.
On 01/18/2019 12:10 PM, Koenig, Christian wrote: > Am 18.01.19 um 16:21 schrieb Grodzovsky, Andrey: >> On 01/18/2019 04:25 AM, Koenig, Christian wrote: >>> [SNIP] >>>>>>> Re-arming the timeout should probably have a much reduced value >>>>>>> when the job hasn't changed. E.g. something like a few ms. >>>> Now i got thinking about non hanged job in progress (job A) and let's >>>> say it's a long job , it just started executing but due to time out of >>>> another job (job B) on another (or this scheduler) it's parent cb got >>>> disconnected, we disarmed the tdr timer for the job's scheduler, >>>> meanwhile the timed out job did manage to complete before HW reset >>>> check and hence we skip HW reset, attach back the cb and rearm job's A >>>> tdr timer with a future value of few ms only - aren't we going to get >>>> false tdr triggered on job B now because we didn't let it enough time >>>> to run and complete ? I would prefer the other extreme of longer time >>>> for time out to trigger then false TDR. Optimally we would have per >>>> job timer and rearm to exactly the reminder of it's time out value - >>>> but we gave up on per job tdr work long ago. >>> Well we only re-arm the timeout with a shorter period if it already >>> triggered once. If we just suspend the timeout then we should still use >>> the longer period. >> Can you explain more on this ? I don't get it. > See drm_sched_job_timedout(), we re-arm the timeout at the end of the > procedure. > > We should change that and re-arm the timer with a much lower timeout if > the job is still not finished. > > Christian. I still don't see how this can fix the problem of of long job in progress triggering false tdr if no HW reset was done, but maybe I am missing other pieces you have in mind, I will finish the patch and send it and then we can be more specific based on the code. Andrey > >> Andrey >> >>>> In general the more i think about it (correct me if I am wrong) I am >>>> less sure how much the optimization feature is useful - if job's time >>>> out did trigger what are the chances that the little more time we give >>>> it between beginning of tdr function and the time we do start the >>>> actual HW reset will be exactly what it needed to complete. Also, this >>>> is still not water proof as the job might complete and signal it's HW >>>> fence exactly after we checked for completion but before starting the >>>> HW reset code. >>> I don't see this as an optimization, but rather as mandatory for correct >>> operation. >>> >>> See without this we can run into issues because we execute jobs multiple >>> times. That can still happen with this clean handling, but it is much >>> more unlikely. >>> >>> Christian. >>> >>>> Andrey >>>> >>>>>> By unchanged you mean when we didn't resubmit the job because of the >>>>>> optimized non HW reset, right ? >>>>> Correct, yes. >>>>> >>>>>>>> About flushing tdr jobs in progress from .free_job cb - looks like >>>>>>>> drm_sched_job_finish->cancel_delayed_work_sync is not enough, we >>>>>>>> still need to take care of flushing all sced->work_tdr for a >>>>>>>> device and for all devices in hive for XGMI. >>>>>>>> What do you think ? >>>>>>> Why should that be necessary? We only wait for the delayed work to >>>>>>> make sure that the job is not destroyed while dealing with it. >>>>>>> >>>>>>> Christian. 
>>>>>> But we might not be waiting for the correct sched->work_tdr, we do >>>>>> the reset routine for all schedulers in a device, accessing their >>>>>> jobs too, and not only for the scheduler to which the job belongs. >>>>>> For XGMI not only that, we reset all the devices in the hive. >>>>> That is harmless, you only need to wait for the work_tdr of the >>>>> current scheduler, not for all of them. >>>>> >>>>>> I was thinking, the amdgpu driver is not even interested in allowing >>>>>> multiple sched->work_tdr to execute together - we have to serialize all of >>>>>> them anyway with the trylock mutex (even without XGMI), and v3d in >>>>>> v3d_job_timedout seems also to reset all of its schedulers from the >>>>>> tdr work. Would it make sense to provide the sched->work_tdr as an init >>>>>> parameter to the scheduler (same one for all schedulers) so we can >>>>>> enforce serialization by disallowing more than 1 tdr work to execute >>>>>> at the same time ? Other drivers interested in running them in parallel can >>>>>> provide a unique sched->work_tdr per scheduler.
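One way to get the "at most one TDR at a time" behaviour described above, offered purely as an illustration rather than something proposed verbatim in this thread: put every scheduler's timeout work on one shared ordered workqueue, which by construction never runs two items concurrently:

	/* Created once per device (or per hive); sketch only. */
	struct workqueue_struct *tdr_wq = alloc_ordered_workqueue("tdr", 0);

	/* Each scheduler then arms its timeout on the shared queue. */
	queue_delayed_work(tdr_wq, &sched->work_tdr, sched->timeout);

An ordered workqueue has max_active == 1, so TDR handlers from different schedulers serialize for free, while drivers that do want parallel TDRs would simply use one queue per scheduler.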
Re: [PATCH v5 1/2] drm/sched: Refactor ring mirror list handling.
On 01/18/2019 04:25 AM, Koenig, Christian wrote: > [SNIP] > Re-arming the timeout should probably have a much reduced value > when the job hasn't changed. E.g. something like a few ms. >> Now i got thinking about non hanged job in progress (job A) and let's >> say it's a long job , it just started executing but due to time out of >> another job (job B) on another (or this scheduler) it's parent cb got >> disconnected, we disarmed the tdr timer for the job's scheduler, >> meanwhile the timed out job did manage to complete before HW reset >> check and hence we skip HW reset, attach back the cb and rearm job's A >> tdr timer with a future value of few ms only - aren't we going to get >> false tdr triggered on job B now because we didn't let it enough time >> to run and complete ? I would prefer the other extreme of longer time >> for time out to trigger then false TDR. Optimally we would have per >> job timer and rearm to exactly the reminder of it's time out value - >> but we gave up on per job tdr work long ago. > Well we only re-arm the timeout with a shorter period if it already > triggered once. If we just suspend the timeout then we should still use > the longer period. Can you explain more on this ? I don't get it. Andrey > >> In general the more i think about it (correct me if I am wrong) I am >> less sure how much the optimization feature is useful - if job's time >> out did trigger what are the chances that the little more time we give >> it between beginning of tdr function and the time we do start the >> actual HW reset will be exactly what it needed to complete. Also, this >> is still not water proof as the job might complete and signal it's HW >> fence exactly after we checked for completion but before starting the >> HW reset code. > I don't see this as an optimization, but rather as mandatory for correct > operation. > > See without this we can run into issues because we execute jobs multiple > times. That can still happen with this clean handling, but it is much > more unlikely. > > Christian. > >> Andrey >> By unchanged you mean when we didn't resubmit the job because of the optimized non HW reset, right ? >>> Correct, yes. >>> >> About flushing tdr jobs in progress from .free_job cb - looks like >> drm_sched_job_finish->cancel_delayed_work_sync is not enough, we >> still need to take care of flushing all sced->work_tdr for a >> device and for all devices in hive for XGMI. >> What do you think ? > Why should that be necessary? We only wait for the delayed work to > make sure that the job is not destroyed while dealing with it. > > Christian. But we might not be waiting for the correct sched->work_tdr, we do the reset routine for all schedulers in a device accessing their jobs too and not only for the scheduler to which the job belongs. For XGMI not only that, we reset all the devices in the hive. >>> That is harmless you only need to wait for the work_tdr of the >>> current scheduler, not for all of them. >>> I was thinking, amdgpu driver is not even interested in allowing multiple sced->tdr to execute together - we have to serialize all of them anyway with the trylock mutex (even without XGMI), v3d in v3d_job_timedout seems also to reset all of his schedulers from the tdr work. Would it make sense to provide the sched->work_td as init parameter to scheduler (same one for all schedulers) so we can enforce serialization by disallowing more then 1 tdr work to execute in the same time ? Other drivers interested to do in parallel can provide unique sched->work_tdr per scheduler. 
This does imply drm_sched_job_timedout has to be removed and delegated to the specific driver implementation, as probably other code dealing with sched->work_tdr... Maybe even move tdr handling to the driver altogether ? >>> Yeah, I was thinking something similar. The problem with this >>> approach is that a delayed work item can have only one delay, but for >>> multiple engines we need multiple delays. >>> >>> What we could do is to make it a timer instead and raise the work >>> item from the device specific callback. >>> >>> But that doesn't really save us the stop all schedulers trouble, so >>> it doesn't buy us much in the end if I see this correctly. >>> >>> Christian.
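What the reduced re-arm could look like at the end of drm_sched_job_timedout(), sketched under the assumption that we can tell whether the job made progress; the condition is a placeholder and the few-ms constant is illustrative, neither comes from the thread:

	/* Sketch only: if we skipped the reset because the job might
	 * merely be slow, poll it again soon rather than granting it
	 * another full timeout period. */
	if (job_still_running)	/* placeholder condition */
		schedule_delayed_work(&sched->work_tdr, msecs_to_jiffies(10));
	else
		schedule_delayed_work(&sched->work_tdr, sched->timeout);

Andrey's objection upthread targets exactly the first branch: a job whose timer was disarmed on another job's behalf gets only a few ms of budget back and can trip a false TDR, hence his preference for the longer period, or for re-arming with the per-job remainder.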
Re: [PATCH v5 1/2] drm/sched: Refactor ring mirror list handling.
On 01/17/2019 10:29 AM, Koenig, Christian wrote: Am 17.01.19 um 16:22 schrieb Grodzovsky, Andrey: On 01/17/2019 02:45 AM, Christian König wrote: Am 16.01.19 um 18:17 schrieb Grodzovsky, Andrey: On 01/16/2019 11:02 AM, Koenig, Christian wrote: Am 16.01.19 um 16:45 schrieb Grodzovsky, Andrey: On 01/16/2019 02:46 AM, Christian König wrote: Am 15.01.19 um 23:01 schrieb Grodzovsky, Andrey: On 01/11/2019 05:03 PM, Andrey Grodzovsky wrote: On 01/11/2019 02:11 PM, Koenig, Christian wrote: Am 11.01.19 um 16:37 schrieb Grodzovsky, Andrey: On 01/11/2019 04:42 AM, Koenig, Christian wrote: Am 10.01.19 um 16:56 schrieb Grodzovsky, Andrey: [SNIP] But we will not be adding the cb back in drm_sched_stop anymore, now we are only going to add back the cb in drm_sched_startr after rerunning those jobs in drm_sched_resubmit_jobs and assign them a new parent there anyway. Yeah, but when we find that we don't need to reset anything anymore then adding the callbacks again won't be possible any more. Christian. I am not sure I understand it, can u point me to example of how this will happen ? I am attaching my latest patches with waiting only for the last job's fence here just so we are on same page regarding the code. Well the whole idea is to prepare all schedulers, then check once more if the offending job hasn't completed in the meantime. If the job completed we need to be able to rollback everything and continue as if nothing had happened. Christian. Oh, but this piece of functionality - skipping HW ASIC reset in case the guilty job done is totally missing form this patch series and so needs to be added. So what you say actually is that for the case were we skip HW asic reset because the guilty job did complete we also need to skip resubmitting the jobs in drm_sched_resubmit_jobs and hence preserve the old parent fence pointer for reuse ? If so I would like to add all this functionality as a third patch since the first 2 patches are more about resolving race condition with jobs in flight while doing reset - what do you think ? Yeah, sounds perfectly fine to me. Christian. I realized there is another complication now for XGMI hive use case, we currently skip gpu recover for adev in case another gpu recover for different adev in same hive is running, under the assumption that we are going to reset all devices in hive anyway because that should cover our own dev too. But if we chose to skip now HW asic reset if our guilty job did finish we will aslo not HW reset any other devices in the hive even if one of them might actually had a bad job, wanted to do gpu recover but skipped it because our recover was in progress in that time. My general idea on that is to keep a list of guilty jobs per hive, when you start gpu recover you first add you guilty job to the hive and trylock hive->reset_lock. Any owner of hive->reset_lock (gpu recovery in progress) once he finished his recovery and released hive->reset_lock should go over hive->bad_jobs_list and if at least one of them is still not signaled (not done) trigger another gpu recovery and so on. If you do manage to trylock you also go over the list, clean it and perform recovery. This list probably needs to be protected with per hive lock. I also think we can for now at least finish reviewing the first 2 patches and submit them since as I said before they are not dealing with this topic and fixing existing race conditions. If you are OK with that I can send for review the last iteration of the first 2 patches where I wait for the last fence in ring mirror list. 
Andrey I implemented HW reset avoidance including the XGMI use case according to the plan I specified. Patch is attached but I can't test it yet due to an XGMI regression in PSP which is supposed to be fixed soon. Please take a look. Looks a bit too complicated on first glance. In general we should probably get away from handling a hive in any special way. Yes, I guess I can do it the same way as the generic handling in amdgpu_device_gpu_recover - there is a list of devices to process which is of size 1 for the non-XGMI use case or more than 1 for XGMI. Multiple timeout jobs in a hive are identical to multiple timeout jobs on different engines on a single device. How about the following handling: 1. Add the timeout job to a list. 2. Try to grab a lock to handle the reset, if that doesn't work because there is already a reset underway return immediately. 3. Stop all schedulers on all affected devices including stopping the timeouts and detaching all callbacks. 4. Double check the list of timed out jobs, if all hw fences of all jobs are completed now we actually don't need to do anything. What if all the jobs on the timed out list did complete but another job (or jobs) for which we removed the timeout timer became hung ? Wouldn't we miss a required reset in this case and wouldn't even have any indication of their hang ? If t
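Step 4 of the handling above is the crux, so here is what the double check boils down to; the list name is invented for illustration, and every scheduler is assumed to be stopped already, which is what makes the check race-free:

static bool all_timed_out_jobs_signaled(struct list_head *timed_out_jobs)
{
	struct drm_sched_job *j;

	/* All schedulers are stopped, so neither the list nor the
	 * HW fences can change under us. */
	list_for_each_entry(j, timed_out_jobs, node) {
		if (!j->s_fence->parent ||
		    !dma_fence_is_signaled(j->s_fence->parent))
			return false;
	}

	/* Nothing actually hung: the caller can re-attach callbacks,
	 * restart the schedulers and return without touching the HW. */
	return true;
}

Andrey's question right after is about jobs that never made it onto that list because their timers were disarmed by the reset in progress; those still need a re-check, or re-armed timers, before the hang can be declared resolved.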
Re: [PATCH v5 1/2] drm/sched: Refactor ring mirror list handling.
On 01/17/2019 02:45 AM, Christian König wrote: Am 16.01.19 um 18:17 schrieb Grodzovsky, Andrey: On 01/16/2019 11:02 AM, Koenig, Christian wrote: Am 16.01.19 um 16:45 schrieb Grodzovsky, Andrey: On 01/16/2019 02:46 AM, Christian König wrote: Am 15.01.19 um 23:01 schrieb Grodzovsky, Andrey: On 01/11/2019 05:03 PM, Andrey Grodzovsky wrote: On 01/11/2019 02:11 PM, Koenig, Christian wrote: Am 11.01.19 um 16:37 schrieb Grodzovsky, Andrey: On 01/11/2019 04:42 AM, Koenig, Christian wrote: Am 10.01.19 um 16:56 schrieb Grodzovsky, Andrey: [SNIP] But we will not be adding the cb back in drm_sched_stop anymore, now we are only going to add back the cb in drm_sched_startr after rerunning those jobs in drm_sched_resubmit_jobs and assign them a new parent there anyway. Yeah, but when we find that we don't need to reset anything anymore then adding the callbacks again won't be possible any more. Christian. I am not sure I understand it, can u point me to example of how this will happen ? I am attaching my latest patches with waiting only for the last job's fence here just so we are on same page regarding the code. Well the whole idea is to prepare all schedulers, then check once more if the offending job hasn't completed in the meantime. If the job completed we need to be able to rollback everything and continue as if nothing had happened. Christian. Oh, but this piece of functionality - skipping HW ASIC reset in case the guilty job done is totally missing form this patch series and so needs to be added. So what you say actually is that for the case were we skip HW asic reset because the guilty job did complete we also need to skip resubmitting the jobs in drm_sched_resubmit_jobs and hence preserve the old parent fence pointer for reuse ? If so I would like to add all this functionality as a third patch since the first 2 patches are more about resolving race condition with jobs in flight while doing reset - what do you think ? Yeah, sounds perfectly fine to me. Christian. I realized there is another complication now for XGMI hive use case, we currently skip gpu recover for adev in case another gpu recover for different adev in same hive is running, under the assumption that we are going to reset all devices in hive anyway because that should cover our own dev too. But if we chose to skip now HW asic reset if our guilty job did finish we will aslo not HW reset any other devices in the hive even if one of them might actually had a bad job, wanted to do gpu recover but skipped it because our recover was in progress in that time. My general idea on that is to keep a list of guilty jobs per hive, when you start gpu recover you first add you guilty job to the hive and trylock hive->reset_lock. Any owner of hive->reset_lock (gpu recovery in progress) once he finished his recovery and released hive->reset_lock should go over hive->bad_jobs_list and if at least one of them is still not signaled (not done) trigger another gpu recovery and so on. If you do manage to trylock you also go over the list, clean it and perform recovery. This list probably needs to be protected with per hive lock. I also think we can for now at least finish reviewing the first 2 patches and submit them since as I said before they are not dealing with this topic and fixing existing race conditions. If you are OK with that I can send for review the last iteration of the first 2 patches where I wait for the last fence in ring mirror list. Andrey I implemented HW reset avoidance including XGMI use case according to the plan i specified. 
Patch is attached but I can't test it yet due to XGMI regression in PSP which is supposed to be fixed soon. Please take a look. Looks a bit too complicated on first glance. In general we should probably get away from handling a hive in any special way. Yes, I guess i can do it the same way as the generic handling in amdgpu_device_gpu_recover - there is a list of devices to process which is of size 1 for non xgmi use case or more then 1 for XGMI. Multiple timeout jobs in a hive are identical to multiple timeout jobs on different engines on a single device. How about the following handling: 1. Add the timeout job to a list. 2. Try to grab a lock to handle the reset, if that doesn't work because there is already a reset underway return immediately. 3. Stop all schedulers on all affected devices including stopping the timeouts and detaching all callbacks. 4. Double check the list of timed out jobs, if all hw fences of all jobs are completed now we actually don't need to do anything. What if all the jobs on the timed out list did complete but other job (or jobs) for which we removed the time out timer became hanged ? Wouldn't we miss a required reset in this case and wouldn't even have any indication of their hang ? If the timeout triggers before we disable it we will have the job on the list of jobs which are ha
Re: [PATCH v5 1/2] drm/sched: Refactor ring mirror list handling.
On 01/16/2019 11:02 AM, Koenig, Christian wrote: Am 16.01.19 um 16:45 schrieb Grodzovsky, Andrey: [SNIP]

What if all the jobs on the timed-out list did complete, but another job (or jobs) for which we removed the timeout timer hung? Wouldn't we miss a required reset in this case, without even having any indication of their hang?

If the timeout triggers before we disable it, we will have the job on the list of jobs which are hanging. If we find that we don't need to reset, and we re-enable the timeout, it will trigger a bit later and
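The decisive double check discussed here (step 4 of the proposed handling) could be as simple as the sketch below; if it returns true the HW reset is skipped, the schedulers restart, and the re-armed timeout simply fires a bit later for any job that hung while timeouts were disabled, which is Christian's point above. bad_job_entry is the same hypothetical bookkeeping as before, not an existing structure.

#include <linux/dma-fence.h>
#include <linux/list.h>
#include <drm/gpu_scheduler.h>

struct bad_job_entry {
	struct list_head node;
	struct drm_sched_job *bad;
};

/* Step 4: with all schedulers stopped, skip the HW reset entirely if
 * every timed-out job signaled in the meantime. */
static bool all_bad_jobs_signaled(struct list_head *bad_list)
{
	struct bad_job_entry *e;

	list_for_each_entry(e, bad_list, node) {
		/* parent is the HW fence of the job's last submission */
		if (!dma_fence_is_signaled(e->bad->s_fence->parent))
			return false;
	}
	return true;
}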
Re: [PATCH v5 1/2] drm/sched: Refactor ring mirror list handling.
On 01/16/2019 02:46 AM, Christian König wrote: Am 15.01.19 um 23:01 schrieb Grodzovsky, Andrey: [SNIP]

I implemented HW reset avoidance, including the XGMI use case, according to the plan I specified. Patch is attached, but I can't test it yet due to an XGMI regression in PSP which is supposed to be fixed soon. Please take a look.

Looks a bit too complicated on first glance. In general we should probably get away from handling a hive in any special way.

Yes, I guess I can do it the same way as the generic handling in amdgpu_device_gpu_recover - there is a list of devices to process, which is of size 1 for the non-XGMI use case or more than 1 for XGMI.

Multiple timed-out jobs in a hive are identical to multiple timed-out jobs on different engines on a single device. How about the following handling:
1. Add the timed-out job to a list.
2. Try to grab a lock to handle the reset; if that doesn't work because there is already a reset underway, return immediately.
3. Stop all schedulers on all affected devices, including stopping the timeouts and detaching all callbacks.
4. Double check the list of timed-out jobs; if all HW fences of all jobs are completed now, we actually don't need to do anything.

What if all the jobs on the timed-out list did complete, but another job (or jobs) for which we removed the timeout timer hung? Wouldn't we miss a required reset in this case, without even having any indication of their hang?

5. Do the reset on all affected devices.
6. Drop the lock.
7. Add the callbacks again and restart the schedulers.

I see your steps don't include flushing any TDR in progress from drm_sched_job_finish (or, as I did it, from amdgpu_job_free_cb) - does it mean you don't think we need to flush in ord
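Taken together, the seven steps above read roughly as the sketch below. This is a sketch of the proposal, not existing code: recovery_domain, its fields and the stubbed helpers are invented names; only drm_sched_stop, drm_sched_resubmit_jobs and drm_sched_start are the API this patch series introduces. Note the skip in the last loop: when the double check shows all guilty jobs completed, resubmission is skipped too, so each job keeps its old (already signaled) parent fence, which is the point discussed further down the thread.

#include <linux/list.h>
#include <linux/mutex.h>
#include <drm/gpu_scheduler.h>

struct recovery_domain {
	struct mutex reset_lock;
	struct drm_gpu_scheduler **sched;	/* all schedulers involved */
	int num_sched;
	struct list_head bad_jobs;
};

/* Placeholders for steps 1, 4 and 5 respectively. */
static void bad_jobs_add(struct recovery_domain *dom,
			 struct drm_sched_job *bad) { }
static bool all_bad_jobs_signaled(struct recovery_domain *dom) { return true; }
static void hw_reset_all_devices(struct recovery_domain *dom) { }

static void recover_from_timeout(struct recovery_domain *dom,
				 struct drm_sched_job *bad)
{
	bool needs_reset;
	int i;

	bad_jobs_add(dom, bad);				/* 1 */

	if (!mutex_trylock(&dom->reset_lock))		/* 2 */
		return;					/* reset underway */

	for (i = 0; i < dom->num_sched; i++)		/* 3 */
		drm_sched_stop(dom->sched[i], bad);

	needs_reset = !all_bad_jobs_signaled(dom);	/* 4 */
	if (needs_reset)
		hw_reset_all_devices(dom);		/* 5 */

	mutex_unlock(&dom->reset_lock);			/* 6 */

	for (i = 0; i < dom->num_sched; i++) {		/* 7 */
		if (needs_reset)
			drm_sched_resubmit_jobs(dom->sched[i]);
		drm_sched_start(dom->sched[i], true);
	}
}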
Re: [PATCH v5 1/2] drm/sched: Refactor ring mirror list handling.
On 01/11/2019 05:03 PM, Andrey Grodzovsky wrote:
> [SNIP]

I implemented HW reset avoidance, including the XGMI use case, according to the plan I specified. Patch is attached, but I can't test it yet due to an XGMI regression in PSP which is supposed to be fixed soon. Please take a look.

Andrey
Re: [PATCH v5 1/2] drm/sched: Refactor ring mirror list handling.
On 01/11/2019 02:11 PM, Koenig, Christian wrote:
> [SNIP]
> Yeah, sounds perfectly fine to me.
>
> Christian.

I realized there is another complication now for the XGMI hive use case: we currently skip gpu recover for an adev when another gpu recover for a different adev in the same hive is running, under the assumption that we are going to reset all devices in the hive anyway, which should cover our own dev too. But if we now choose to skip the HW ASIC reset when our guilty job did finish, we will also not HW reset any other devices in the hive, even if one of them might actually have had a bad job, wanted to do gpu recover, but skipped it because our recover was in progress at the time.

My general idea on that is to keep a list of guilty jobs per hive: when you start gpu recover, you first add your guilty job to the hive and trylock hive->reset_lock. Any owner of hive->reset_lock (a gpu recovery in progress), once it has finished its recovery and released hive->reset_lock, should go over hive->bad_jobs_list and, if at least one of the jobs is still not signaled (not done), trigger another gpu recovery, and so on. If you do manage to trylock, you also go over the list, clean it and perform the recovery. This list probably needs to be protected with a per-hive lock.

I also think we can for now at least finish reviewing the first two patches and submit them since, as I said before, they do not deal with this topic and fix existing race conditions. If you are OK with that, I can send for review the last iteration of the first two patches, where I wait for the last fence in the ring mirror list.

Andrey
Re: [PATCH v5 1/2] drm/sched: Refactor ring mirror list handling.
On 01/11/2019 04:42 AM, Koenig, Christian wrote:
> Am 10.01.19 um 16:56 schrieb Grodzovsky, Andrey:
>> [SNIP]
> Well, the whole idea is to prepare all schedulers, then check once more
> whether the offending job hasn't completed in the meantime.
>
> If the job completed, we need to be able to roll back everything and
> continue as if nothing had happened.
>
> Christian.

Oh, but this piece of functionality - skipping the HW ASIC reset in case the guilty job is done - is totally missing from this patch series, and so needs to be added. So what you are actually saying is that for the case where we skip the HW ASIC reset because the guilty job did complete, we also need to skip resubmitting the jobs in drm_sched_resubmit_jobs, and hence preserve the old parent fence pointer for reuse? If so, I would like to add all this functionality as a third patch, since the first two patches are more about resolving the race condition with jobs in flight while doing a reset - what do you think?

Andrey
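A sketch of what the restart side of that third patch could look like, assuming a did_hw_reset flag computed from the "did the guilty job complete" double check; drm_sched_resubmit_jobs and drm_sched_start are from this series, the wrapper is illustrative only.

#include <drm/gpu_scheduler.h>

/* Only resubmit (and thereby hand every job a new parent HW fence)
 * when the ASIC was actually reset. If the guilty job turned out to
 * have completed, the old parent fence pointers stay valid and are
 * simply reused when the callbacks are re-added. */
static void restart_scheduler(struct drm_gpu_scheduler *sched,
			      bool did_hw_reset)
{
	if (did_hw_reset)
		drm_sched_resubmit_jobs(sched);

	/* re-adds the HW fence callbacks and unparks the thread */
	drm_sched_start(sched, true);
}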
Re: [PATCH v5 1/2] drm/sched: Refactor ring mirror list handling.
Just a ping.

Andrey

On 01/09/2019 10:18 AM, Andrey Grodzovsky wrote:
> [SNIP]
Re: [PATCH v5 1/2] drm/sched: Refactor ring mirror list handling.
On 01/09/2019 05:22 AM, Christian König wrote:
> Am 07.01.19 um 20:47 schrieb Grodzovsky, Andrey:
> [SNIP]
Re: [PATCH v5 1/2] drm/sched: Refactor ring mirror list handling.
On 01/07/2019 09:13 AM, Christian König wrote:
> Am 03.01.19 um 18:42 schrieb Grodzovsky, Andrey:
> [SNIP]
Re: [PATCH v5 1/2] drm/sched: Refactor ring mirror list handling.
On 01/03/2019 11:20 AM, Grodzovsky, Andrey wrote:
> On 01/03/2019 03:54 AM, Koenig, Christian wrote:
> [SNIP]
Re: [PATCH v5 1/2] drm/sched: Refactor ring mirror list handling.
On 01/03/2019 03:54 AM, Koenig, Christian wrote:
> Am 21.12.18 um 21:36 schrieb Grodzovsky, Andrey:
> [SNIP]
Re: [PATCH v5 1/2] drm/sched: Refactor ring mirror list handling.
On 12/21/2018 01:37 PM, Christian König wrote:
> Am 20.12.18 um 20:23 schrieb Andrey Grodzovsky:
>> Decouple sched threads stop and start and ring mirror
>> list handling from the policy of what to do about the
>> guilty jobs.
>> When stopping the sched thread and detaching sched fences
>> from non-signaled HW fences, wait for all signaled HW fences
>> to complete before rerunning the jobs.
>>
>> v2: Fix resubmission of guilty job into HW after refactoring.
>>
>> v4:
>> Full restart for all the jobs, not only from guilty ring.
>> Extract karma increase into standalone function.
>>
>> v5:
>> Rework waiting for signaled jobs without relying on the job
>> struct itself as those might already be freed for non 'guilty'
>> job's schedulers.
>> Expose karma increase to drivers.
>>
>> Suggested-by: Christian Koenig
>> Signed-off-by: Andrey Grodzovsky
>> ---
>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 18 +--
>> drivers/gpu/drm/etnaviv/etnaviv_sched.c | 11 +-
>> drivers/gpu/drm/scheduler/sched_main.c | 188
>> +++--
>> drivers/gpu/drm/v3d/v3d_sched.c | 12 +-
>> include/drm/gpu_scheduler.h | 10 +-
>> 5 files changed, 151 insertions(+), 88 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index 8a078f4..a4bd2d3 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -3298,12 +3298,10 @@ static int
>> amdgpu_device_pre_asic_reset(struct amdgpu_device *adev,
>> if (!ring || !ring->sched.thread)
>> continue;
>> - kthread_park(ring->sched.thread);
>> + drm_sched_stop(&ring->sched, job ? &job->base : NULL);
>> - if (job && job->base.sched != &ring->sched)
>> - continue;
>> -
>> - drm_sched_hw_job_reset(&ring->sched, job ? &job->base :
>> NULL);
>> + if(job)
>> + drm_sched_increase_karma(&job->base);
>
> Since we dropped the "job && job->base.sched != &ring->sched" check
> above, this will now increase the job's karma multiple times.
>
> Maybe just move that outside of the loop.
>
>> /* after all hw jobs are reset, hw fence is meaningless,
>> so force_completion */
>> amdgpu_fence_driver_force_completion(ring);
>> @@ -3454,14 +3452,10 @@ static void
>> amdgpu_device_post_asic_reset(struct amdgpu_device *adev,
>> if (!ring || !ring->sched.thread)
>> continue;
>> - /* only need recovery sched of the given job's ring
>> - * or all rings (in the case @job is NULL)
>> - * after above amdgpu_reset accomplished
>> - */
>> - if ((!job || job->base.sched == &ring->sched) &&
>> !adev->asic_reset_res)
>> - drm_sched_job_recovery(&ring->sched);
>> + if (!adev->asic_reset_res)
>> + drm_sched_resubmit_jobs(&ring->sched);
>> - kthread_unpark(ring->sched.thread);
>> + drm_sched_start(&ring->sched, !adev->asic_reset_res);
>> }
>> if (!amdgpu_device_has_dc_support(adev)) {
>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>> b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>> index 49a6763..6f1268f 100644
>> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>> @@ -109,16 +109,19 @@ static void etnaviv_sched_timedout_job(struct
>> drm_sched_job *sched_job)
>> }
>> /* block scheduler */
>> - kthread_park(gpu->sched.thread);
>> - drm_sched_hw_job_reset(&gpu->sched, sched_job);
>> + drm_sched_stop(&gpu->sched, sched_job);
>> +
>> + if(sched_job)
>> + drm_sched_increase_karma(sched_job);
>> /* get the GPU back into the init state */
>> etnaviv_core_dump(gpu);
>> etnaviv_gpu_recover_hang(gpu);
>> + drm_sched_resubmit_jobs(&gpu->sched);
>> +
>> /* restart scheduler after GPU is usable again */
>> - drm_sched_job_recovery(&gpu->sched);
>> - kthread_unpark(gpu->sched.thread);
>> + drm_sched_start(&gpu->sched, true);
>> }
>> static void etnaviv_sched_free_job(struct drm_sched_job *sched_job)
>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
>> b/drivers/gpu/drm/scheduler/sched_main.c
>> index dbb6906..b5c5bee 100644
>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>> @@ -60,8 +60,6 @@
>> static void drm_sched_process_job(struct dma_fence *f, struct
>> dma_fence_cb *cb);
>> -static void drm_sched_expel_job_unlocked(struct drm_sched_job
>> *s_job);
>> -
>> /**
>> * drm_sched_rq_init - initialize a given run queue struct
>> *
>> @@ -335,6 +333,42 @@ static void drm_sched_job_timedout(struct
>> work_struct *work)
>> spin_unlock_irqrestore(&sched->job_list_lock, flags);
>> }
>
> Kernel doc here would be nice to have.
>
>> +void drm_sched_increase_karma(struct drm_sched_job *bad)
>> +{
>> + int i;
>> + struct drm
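Following Christian's comment, a sketch of the pre-reset loop from the hunk quoted above with the karma bump hoisted out of the per-ring loop, so the guilty job's karma is increased exactly once per reset. This is a fragment of the amdgpu_device_pre_asic_reset() body as it would look with the suggestion applied, not the committed code.

	if (job)
		drm_sched_increase_karma(&job->base);	/* once, not per ring */

	for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
		struct amdgpu_ring *ring = adev->rings[i];

		if (!ring || !ring->sched.thread)
			continue;

		/* park the thread and detach the HW fence callbacks */
		drm_sched_stop(&ring->sched, job ? &job->base : NULL);

		/* after all hw jobs are reset, hw fence is meaningless,
		 * so force_completion */
		amdgpu_fence_driver_force_completion(ring);
	}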
Re: [PATCH] drm: Block fb changes for async plane updates
As far as we discussed this internally, it looks good to me, but obviously we need to wait for some feedback from non-AMD people.

Acked-by: Andrey Grodzovsky

Andrey

On 12/21/2018 09:33 AM, Nicholas Kazlauskas wrote:
> The behavior of drm_atomic_helper_cleanup_planes differs depending on
> whether the commit was an asynchronous update or not.
>
> For a typical (non-async) atomic commit, prepare_fb is called on the
> new_plane_state and cleanup_fb is called on the old_plane_state.
>
> However, async commits are performed in place and don't swap the state
> for the plane. The call to prepare_fb happens on the new_plane_state,
> and the call to cleanup_fb is also made on the new_plane_state in
> this case (since the state hasn't swapped).
>
> This behavior can lead to a use-after-free or unpin of an active fb.
>
> Consider the following sequence of events for interleaving fbs:
>
> - Async update, fb1 prepare, fb1 cleanup_fb
> - Async update, fb2 prepare, fb2 cleanup_fb
> - Non-async update, fb1 prepare, fb2 cleanup_fb
> - Async update, fb2 cleanup_fb -> double cleanup, use-after-free
>
> This situation has been observed in practice for a double-buffered
> cursor when closing an X client. The non-async update occurs because
> new_plane_state->crtc != old_plane_state->crtc, which forces the
> non-async path to occur.
>
> The simplest fix for this is to block fb updates in
> drm_atomic_helper_async_check. This guarantees that the framebuffer
> will have previously been prepared, and any subsequent async updates
> will always call prepare and cleanup_fb like the non-async atomic
> commit path would.
>
> Cc: Michel Dänzer
> Cc: Daniel Vetter
> Cc: Andrey Grodzovsky
> Cc: Harry Wentland
> Signed-off-by: Nicholas Kazlauskas
> ---
> drivers/gpu/drm/drm_atomic_helper.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/drm_atomic_helper.c
> b/drivers/gpu/drm/drm_atomic_helper.c
> index 54e2ae614dcc..d2f80bf14f86 100644
> --- a/drivers/gpu/drm/drm_atomic_helper.c
> +++ b/drivers/gpu/drm/drm_atomic_helper.c
> @@ -1599,7 +1599,8 @@ int drm_atomic_helper_async_check(struct drm_device *dev,
> return -EINVAL;
>
> if (!new_plane_state->crtc ||
> - old_plane_state->crtc != new_plane_state->crtc)
> + old_plane_state->crtc != new_plane_state->crtc ||
> + old_plane_state->fb != new_plane_state->fb)
> return -EINVAL;
>
> funcs = plane->helper_private;
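The interleaving from the commit message can be replayed with a tiny pin-count model (pure illustration, no DRM code): prepare increments a framebuffer's pin count, cleanup decrements it, and the four updates leave fb1 pinned forever while fb2 is unpinned twice too often, which is the double cleanup.

#include <stdio.h>

static int fb1, fb2;	/* pin counts */

static void prepare(int *fb) { ++*fb; }
static void cleanup(int *fb) { --*fb; }

int main(void)
{
	prepare(&fb1); cleanup(&fb1);	/* async update: state not swapped,    */
	prepare(&fb2); cleanup(&fb2);	/* so cleanup hits the *new* fb        */
	prepare(&fb1); cleanup(&fb2);	/* non-async: prepare new, cleanup old */
	cleanup(&fb2);			/* async again: double cleanup of fb2  */

	printf("fb1=%d fb2=%d\n", fb1, fb2);	/* fb1=1 (leaked pin), fb2=-2 */
	return 0;
}

With the patch, the async path is rejected whenever old_plane_state->fb != new_plane_state->fb, so every fb change goes through the normal commit path, where prepare runs on the new state and cleanup on the old one, and the counts stay balanced.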
Re: [PATCH v4 1/2] drm/sched: Refactor ring mirror list handling.
On 12/19/2018 11:21 AM, Christian König wrote:
> Am 17.12.18 um 20:51 schrieb Andrey Grodzovsky:
>> Decouple sched threads stop and start and ring mirror
>> list handling from the policy of what to do about the
>> guilty jobs.
>> When stopping the sched thread and detaching sched fences
>> from non-signaled HW fences, wait for all signaled HW fences
>> to complete before rerunning the jobs.
>>
>> v2: Fix resubmission of guilty job into HW after refactoring.
>>
>> v4:
>> Full restart for all the jobs, not only from guilty ring.
>> Extract karma increase into standalone function.
>>
>> Suggested-by: Christian Koenig
>> Signed-off-by: Andrey Grodzovsky
>> ---
>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 17 +--
>> drivers/gpu/drm/etnaviv/etnaviv_sched.c | 8 +-
>> drivers/gpu/drm/scheduler/sched_main.c | 164
>> +
>> drivers/gpu/drm/v3d/v3d_sched.c | 9 +-
>> include/drm/gpu_scheduler.h | 9 +-
>> 5 files changed, 118 insertions(+), 89 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index 8a078f4..8ac4f43 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -3298,12 +3298,7 @@ static int amdgpu_device_pre_asic_reset(struct
>> amdgpu_device *adev,
>> if (!ring || !ring->sched.thread)
>> continue;
>> - kthread_park(ring->sched.thread);
>> -
>> - if (job && job->base.sched != &ring->sched)
>> - continue;
>> -
>> - drm_sched_hw_job_reset(&ring->sched, job ? &job->base : NULL);
>> + drm_sched_stop(&ring->sched, job ? &job->base : NULL);
>> /* after all hw jobs are reset, hw fence is meaningless,
>> so force_completion */
>> amdgpu_fence_driver_force_completion(ring);
>> @@ -3454,14 +3449,10 @@ static void
>> amdgpu_device_post_asic_reset(struct amdgpu_device *adev,
>> if (!ring || !ring->sched.thread)
>> continue;
>> - /* only need recovery sched of the given job's ring
>> - * or all rings (in the case @job is NULL)
>> - * after above amdgpu_reset accomplished
>> - */
>> - if ((!job || job->base.sched == &ring->sched) &&
>> !adev->asic_reset_res)
>> - drm_sched_job_recovery(&ring->sched);
>> + if (!adev->asic_reset_res)
>> + drm_sched_resubmit_jobs(&ring->sched);
>> - kthread_unpark(ring->sched.thread);
>> + drm_sched_start(&ring->sched, !adev->asic_reset_res);
>> }
>> if (!amdgpu_device_has_dc_support(adev)) {
>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>> b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>> index 49a6763..d7075cd 100644
>> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>> @@ -109,16 +109,16 @@ static void etnaviv_sched_timedout_job(struct
>> drm_sched_job *sched_job)
>> }
>> /* block scheduler */
>> - kthread_park(gpu->sched.thread);
>> - drm_sched_hw_job_reset(&gpu->sched, sched_job);
>> + drm_sched_stop(&gpu->sched, sched_job);
>> /* get the GPU back into the init state */
>> etnaviv_core_dump(gpu);
>> etnaviv_gpu_recover_hang(gpu);
>> + drm_sched_resubmit_jobs(&gpu->sched);
>> +
>> /* restart scheduler after GPU is usable again */
>> - drm_sched_job_recovery(&gpu->sched);
>> - kthread_unpark(gpu->sched.thread);
>> + drm_sched_start(&gpu->sched, true);
>> }
>> static void etnaviv_sched_free_job(struct drm_sched_job *sched_job)
>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
>> b/drivers/gpu/drm/scheduler/sched_main.c
>> index dbb6906..1cf9541 100644
>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>> @@ -60,8 +60,6 @@
>> static void
drm_sched_process_job(struct dma_fence *f, struct >> dma_fence_cb *cb); >> -static void drm_sched_expel_job_unlocked(struct drm_sched_job >> *s_job); >> - >> /** >> * drm_sched_rq_init - initialize a given run queue struct >> * >> @@ -335,6 +333,41 @@ static void drm_sched_job_timedout(struct >> work_struct *work) >> spin_unlock_irqrestore(&sched->job_list_lock, flags); >> } >> +static void drm_sched_increase_karma(struct drm_sched_job *bad) >> +{ >> + int i; >> + struct drm_sched_entity *tmp; >> + struct drm_sched_entity *entity; >> + struct drm_gpu_scheduler *sched = bad->sched; >> + >> + /* don't increase @bad's karma if it's from KERNEL RQ, >> + * because sometimes GPU hang would cause kernel jobs (like VM >> updating jobs) >> + * corrupt but keep in mind that kernel jobs always considered >> good. >> + */ >> + if (bad->s_priority != DRM_SCHED_PRIORITY_KERNEL) { >> + atomic_inc(&bad->karma); >> + for (i = DRM_SCHED_PRIORITY_MIN; i < DRM_SCHED_PRIORITY_KERNEL; >> + i++) { >
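For drivers, the refactoring under review boils down to one fixed sequence, assembled below from the etnaviv hunks quoted in this thread (the karma call is the v5 addition). This is a sketch: example_gpu and example_gpu_recover_hang() are stand-ins for a driver's own structures, and the lookup of the gpu from sched_job is elided; only the drm_sched_* calls are the API this series introduces.

#include <drm/gpu_scheduler.h>

struct example_gpu {
	struct drm_gpu_scheduler sched;
};

static void example_gpu_recover_hang(struct example_gpu *gpu) { /* elided */ }

static void example_sched_timedout_job(struct example_gpu *gpu,
				       struct drm_sched_job *sched_job)
{
	/* 1. Park the thread, detach HW fence callbacks, stop timeouts. */
	drm_sched_stop(&gpu->sched, sched_job);

	/* 2. Punish the offender (v5 exposes this to drivers). */
	if (sched_job)
		drm_sched_increase_karma(sched_job);

	/* 3. Get the GPU back into a usable state. */
	example_gpu_recover_hang(gpu);

	/* 4. Re-push unfinished jobs; each gets a new parent HW fence. */
	drm_sched_resubmit_jobs(&gpu->sched);

	/* 5. Re-add the HW fence callbacks and unpark the thread. */
	drm_sched_start(&gpu->sched, true);
}

This replaces the old per-driver kthread_park / drm_sched_hw_job_reset / drm_sched_job_recovery / kthread_unpark combination shown on the removed lines of the diffs.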
Re: [PATCH v3 1/2] drm/sched: Refactor ring mirror list handling.
On 12/17/2018 10:27 AM, Christian König wrote:
> Am 10.12.18 um 22:43 schrieb Andrey Grodzovsky:
>> Decouple sched threads stop and start and ring mirror
>> list handling from the policy of what to do about the
>> guilty jobs.
>> When stopping the sched thread and detaching sched fences
>> from non-signaled HW fences, wait for all signaled HW fences
>> to complete before rerunning the jobs.
>>
>> v2: Fix resubmission of guilty job into HW after refactoring.
>>
>> Suggested-by: Christian Koenig
>> Signed-off-by: Andrey Grodzovsky
>> ---
>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 17 +++--
>> drivers/gpu/drm/etnaviv/etnaviv_sched.c | 8 +--
>> drivers/gpu/drm/scheduler/sched_main.c | 110
>> ++---
>> drivers/gpu/drm/v3d/v3d_sched.c | 11 +--
>> include/drm/gpu_scheduler.h | 10 ++-
>> 5 files changed, 95 insertions(+), 61 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index ef36cc5..42111d5 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -3292,17 +3292,16 @@ static int
>> amdgpu_device_pre_asic_reset(struct amdgpu_device *adev,
>> /* block all schedulers and reset given job's ring */
>> for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
>> struct amdgpu_ring *ring = adev->rings[i];
>> + bool park_only = job && job->base.sched != &ring->sched;
>> if (!ring || !ring->sched.thread)
>> continue;
>> - kthread_park(ring->sched.thread);
>> + drm_sched_stop(&ring->sched, job ? &job->base : NULL,
>> park_only);
>> - if (job && job->base.sched != &ring->sched)
>> + if (park_only)
>> continue;
>> - drm_sched_hw_job_reset(&ring->sched, job ? &job->base :
>> NULL);
>> -
>> /* after all hw jobs are reset, hw fence is meaningless, so
>> force_completion */
>> amdgpu_fence_driver_force_completion(ring);
>> }
>> @@ -3445,6 +3444,7 @@ static void
>> amdgpu_device_post_asic_reset(struct amdgpu_device *adev,
>> struct amdgpu_job *job)
>> {
>> int i;
>> + bool unpark_only;
>> for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
>> struct amdgpu_ring *ring = adev->rings[i];
>> @@ -3456,10 +3456,13 @@ static void
>> amdgpu_device_post_asic_reset(struct amdgpu_device *adev,
>> * or all rings (in the case @job is NULL)
>> * after above amdgpu_reset accomplished
>> */
>> - if ((!job || job->base.sched == &ring->sched) &&
>> !adev->asic_reset_res)
>> - drm_sched_job_recovery(&ring->sched);
>> + unpark_only = (job && job->base.sched != &ring->sched) ||
>> + adev->asic_reset_res;
>> +
>> + if (!unpark_only)
>> + drm_sched_resubmit_jobs(&ring->sched);
>> - kthread_unpark(ring->sched.thread);
>> + drm_sched_start(&ring->sched, unpark_only);
>> }
>> if (!amdgpu_device_has_dc_support(adev)) {
>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>> b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>> index 49a6763..fab3b51 100644
>> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>> @@ -109,16 +109,16 @@ static void etnaviv_sched_timedout_job(struct
>> drm_sched_job *sched_job)
>> }
>> /* block scheduler */
>> - kthread_park(gpu->sched.thread);
>> - drm_sched_hw_job_reset(&gpu->sched, sched_job);
>> + drm_sched_stop(&gpu->sched, sched_job, false);
>> /* get the GPU back into the init state */
>> etnaviv_core_dump(gpu);
>> etnaviv_gpu_recover_hang(gpu);
>> + drm_sched_resubmit_jobs(&gpu->sched);
>> +
>> /* restart scheduler after GPU is usable again */
>> - drm_sched_job_recovery(&gpu->sched);
>> - kthread_unpark(gpu->sched.thread);
>> +
drm_sched_start(&gpu->sched); >> } >> static void etnaviv_sched_free_job(struct drm_sched_job *sched_job) >> diff --git a/drivers/gpu/drm/scheduler/sched_main.c >> b/drivers/gpu/drm/scheduler/sched_main.c >> index dbb6906..cdf95e2 100644 >> --- a/drivers/gpu/drm/scheduler/sched_main.c >> +++ b/drivers/gpu/drm/scheduler/sched_main.c >> @@ -60,8 +60,6 @@ >> static void drm_sched_process_job(struct dma_fence *f, struct >> dma_fence_cb *cb); >> -static void drm_sched_expel_job_unlocked(struct drm_sched_job >> *s_job); >> - >> /** >> * drm_sched_rq_init - initialize a given run queue struct >> * >> @@ -342,13 +340,21 @@ static void drm_sched_job_timedout(struct >> work_struct *work) >> * @bad: bad scheduler job >> * >> */ >> -void drm_sched_hw_job_reset(struct drm_gpu_scheduler *sched, struct >> drm_sched_job *bad) >> +void drm_sched_stop(struct drm_gpu_scheduler *sched, struct >> drm_sched_job *bad, >> + bool park_only) >> { >> struct drm_s
Re: [PATCH v3 2/2] drm/sched: Rework HW fence processing.
Just a reminder. Any new comments in light of all the discussion?

Andrey

On 12/12/2018 08:08 AM, Grodzovsky, Andrey wrote:
> [SNIP]
Re: [PATCH v3 2/2] drm/sched: Rework HW fence processing.
BTW, the problem I pointed out with drm_sched_entity_kill_jobs_cb is not an issue with this patch set, since it removes the cb from s_fence->finished in general, so we only free the job once - directly from drm_sched_entity_kill_jobs_cb.

Andrey

On 12/11/2018 11:20 AM, Christian König wrote:
> Yeah, completely correctly explained.
>
> I was unfortunately really busy today, but going to give that a look
> as soon as I have time.
>
> Christian.
>
> [SNIP]
>>>> Subject: [PATCH v3 2/2] drm/sched: Rework HW fence processing.
>>>>
>>>> Expedite job deletion from the ring mirror list to the HW fence signal
>>>> callback instead of from finish_work; together with waiting for all such
>>>> fences to signal in drm_sched_stop, we guarantee that an already signaled
>>>> job will not be processed twice.
>>>> Remove the sched finish fence callback and just submit finish_work
>>>> directly from the HW fence callback.
>>>>
>>>> v2: Fix comments.
>>>> >>>> v3: Attach hw fence cb to sched_job >>>> >>>> Suggested-by: Christian Koenig >>>> Signed-off-by: Andrey Grodzovsky >>>> --- >>>> drivers/gpu/drm/scheduler/sched_main.c | 58 >>>> -- >>>> >>>> include/drm/gpu_scheduler.h | 6 ++-- >>>> 2 files changed, 30 insertions(+), 34 deletions(-) >>>> >>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c >>>> b/drivers/gpu/drm/scheduler/sched_main.c >>>> index cdf95e2..f0c1f32 100644 >>>> --- a/drivers/gpu/drm/scheduler/sched_main.c >>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c >>>> @@ -284,8 +284,6 @@ static void drm_sched_job_finish(struct >>>> work_struct >>>> *work) >>>> cancel_delayed_work_sync(&sched->work_tdr); >>>> >>>> spin_lock_irqsave(&sched->job_list_lock, flags); >>>> - /* remove job from ring_mirror_list */ >>>> - list_del_init(&s_job->node); >>>> /* queue TDR for next job */ >>>> drm_sched_start_timeout(sched); >>>> spin_unlock_irqrestore(&sched->job_list_lock, flags); @@ >>>> -293,22 >>>> +291,11 @@ static void drm_sched_job_finish(struct work_struct *work) >>>> sched->ops->free_job(s_job); >>>> } >>>> >>>> -static void drm_sched_job_finish_cb(struct dma_fence *f, >>>> - struct dma_fence_cb *cb) >>>> -{ >>>> - struct drm_sched_job *job = container_of(cb, struct >>>> drm_sched_job, >>>> - finish_cb); >>
Re: [PATCH libdrm] amdgpu/test: Enable deadlock test for CI family (gfx7)
np Andrey On 12/11/2018 03:18 PM, Alex Deucher wrote: > On Tue, Dec 11, 2018 at 3:13 PM Andrey Grodzovsky > wrote: >> I retested GPU recovery with Bonaire ASIC and it works. >> >> Signed-off-by: Andrey Grodzovsky > Reviewed-by: Alex Deucher > > Care to enable it in the kernel as well? > > Alex > >> --- >> tests/amdgpu/deadlock_tests.c | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/tests/amdgpu/deadlock_tests.c b/tests/amdgpu/deadlock_tests.c >> index 6bd36aa..a6c2635 100644 >> --- a/tests/amdgpu/deadlock_tests.c >> +++ b/tests/amdgpu/deadlock_tests.c >> @@ -124,7 +124,8 @@ CU_BOOL suite_deadlock_tests_enable(void) >> * by default (currently GFX8/9 dGPUS) >> */ >> if (device_handle->info.family_id != AMDGPU_FAMILY_VI && >> - device_handle->info.family_id != AMDGPU_FAMILY_AI) { >> + device_handle->info.family_id != AMDGPU_FAMILY_AI && >> + device_handle->info.family_id != AMDGPU_FAMILY_CI) { >> printf("\n\nGPU reset is not enabled for the ASIC, deadlock >> suite disabled\n"); >> enable = CU_FALSE; >> } >> -- >> 2.7.4
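The kernel-side enable Alex asks for would presumably mean adding the CIK dGPUs to amdgpu's default-recovery check; a hypothetical sketch follows (the function and chip names match the kernel of that period, but this is not the actual follow-up patch):

/* Hypothetical hunk in amdgpu_device_should_recover_gpu(): */
if (amdgpu_gpu_recovery == -1) {
	switch (adev->asic_type) {
	case CHIP_BONAIRE:	/* gfx7 dGPUs, newly opted in */
	case CHIP_HAWAII:
	case CHIP_TOPAZ:	/* existing gfx8/9 dGPU entries */
	case CHIP_TONGA:
	case CHIP_FIJI:
	case CHIP_POLARIS10:
	case CHIP_POLARIS11:
	case CHIP_VEGA10:
		break;		/* recovery enabled by default */
	default:
		return false;	/* everything else stays opt-in */
	}
}
return true;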
Re: [PATCH v3 2/2] drm/sched: Rework HW fence processing.
As I understand, you say that by the time the fence callback runs the job might have already been released, but how so, if the job gets released from the drm_sched_job_finish work handler in the normal flow - so, after the HW fence (s_fence->parent) cb is executed? The other 2 flows are error use cases where amdgpu_job_free is called directly, in which cases I assume the job wasn't submitted to HW. The last flow I see is drm_sched_entity_kill_jobs_cb, and here I actually see a problem with the code as it is today - drm_sched_fence_finished is called, which will trigger the s_fence->finished callback to run, which today schedules drm_sched_job_finish, which releases the job, but we don't even wait for that and call the free_job cb directly right after, which seems wrong to me. Andrey On 12/10/2018 09:45 PM, Zhou, David(ChunMing) wrote: > I don't think adding the cb to the sched job would work, as its lifetime is > different from the fence's. > Unless you make the sched job reference counted, we will get into trouble sooner > or later. > > -David > >> -Original Message- >> From: amd-gfx On Behalf Of >> Andrey Grodzovsky >> Sent: Tuesday, December 11, 2018 5:44 AM >> To: dri-devel@lists.freedesktop.org; amd-...@lists.freedesktop.org; >> ckoenig.leichtzumer...@gmail.com; e...@anholt.net; >> etna...@lists.freedesktop.org >> Cc: Zhou, David(ChunMing) ; Liu, Monk >> ; Grodzovsky, Andrey >> >> Subject: [PATCH v3 2/2] drm/sched: Rework HW fence processing. >> >> Expedite job deletion from the ring mirror list to the HW fence signal callback >> instead of from finish_work; together with waiting for all such fences to >> signal in >> drm_sched_stop, we guarantee that an already signaled job will not be processed >> twice. >> Remove the sched finish fence callback and just submit finish_work directly >> from the HW fence callback. >> >> v2: Fix comments. 
>> >> v3: Attach hw fence cb to sched_job >> >> Suggested-by: Christian Koenig >> Signed-off-by: Andrey Grodzovsky >> --- >> drivers/gpu/drm/scheduler/sched_main.c | 58 -- >> >> include/drm/gpu_scheduler.h| 6 ++-- >> 2 files changed, 30 insertions(+), 34 deletions(-) >> >> diff --git a/drivers/gpu/drm/scheduler/sched_main.c >> b/drivers/gpu/drm/scheduler/sched_main.c >> index cdf95e2..f0c1f32 100644 >> --- a/drivers/gpu/drm/scheduler/sched_main.c >> +++ b/drivers/gpu/drm/scheduler/sched_main.c >> @@ -284,8 +284,6 @@ static void drm_sched_job_finish(struct work_struct >> *work) >> cancel_delayed_work_sync(&sched->work_tdr); >> >> spin_lock_irqsave(&sched->job_list_lock, flags); >> -/* remove job from ring_mirror_list */ >> -list_del_init(&s_job->node); >> /* queue TDR for next job */ >> drm_sched_start_timeout(sched); >> spin_unlock_irqrestore(&sched->job_list_lock, flags); @@ -293,22 >> +291,11 @@ static void drm_sched_job_finish(struct work_struct *work) >> sched->ops->free_job(s_job); >> } >> >> -static void drm_sched_job_finish_cb(struct dma_fence *f, >> -struct dma_fence_cb *cb) >> -{ >> -struct drm_sched_job *job = container_of(cb, struct drm_sched_job, >> - finish_cb); >> -schedule_work(&job->finish_work); >> -} >> - >> static void drm_sched_job_begin(struct drm_sched_job *s_job) { >> struct drm_gpu_scheduler *sched = s_job->sched; >> unsigned long flags; >> >> -dma_fence_add_callback(&s_job->s_fence->finished, &s_job- >>> finish_cb, >> - drm_sched_job_finish_cb); >> - >> spin_lock_irqsave(&sched->job_list_lock, flags); >> list_add_tail(&s_job->node, &sched->ring_mirror_list); >> drm_sched_start_timeout(sched); >> @@ -359,12 +346,11 @@ void drm_sched_stop(struct drm_gpu_scheduler >> *sched, struct drm_sched_job *bad, >> list_for_each_entry_reverse(s_job, &sched->ring_mirror_list, node) >> { >> if (s_job->s_fence->parent && >> dma_fence_remove_callback(s_job->s_fence->parent, >> - &s_job->s_fence->cb)) { >> + &s_job->cb)) { >> dma_fence_put(s_job->s_fence->parent); >> s_job->s_fence->parent = NULL; >> atomic_d
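The quoted diff cuts off before the new HW fence callback itself; based on the hunks and commit message above, the v3 drm_sched_process_job presumably ends up along these lines (a reconstruction under those assumptions, so details may differ from the posted patch):

static void drm_sched_process_job(struct dma_fence *f, struct dma_fence_cb *cb)
{
	/* v3: the cb is embedded in the job itself, not in s_fence */
	struct drm_sched_job *s_job = container_of(cb, struct drm_sched_job, cb);
	struct drm_sched_fence *s_fence = s_job->s_fence;
	struct drm_gpu_scheduler *sched = s_fence->sched;
	unsigned long flags;

	/* Expedited removal: the job leaves the ring mirror list as soon as
	 * its HW fence signals, instead of later in finish_work. */
	spin_lock_irqsave(&sched->job_list_lock, flags);
	list_del_init(&s_job->node);
	spin_unlock_irqrestore(&sched->job_list_lock, flags);

	atomic_dec(&sched->hw_rq_count);
	drm_sched_fence_finished(s_fence);

	/* No detached finished-fence cb anymore: queue finish_work directly. */
	schedule_work(&s_job->finish_work);
	wake_up_interruptible(&sched->wake_up_worker);
}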
Re: [PATCH 2/2] drm/sched: Rework HW fence processing.
On 12/07/2018 03:19 AM, Christian König wrote: > Am 07.12.18 um 04:18 schrieb Zhou, David(ChunMing): >> >>> -Original Message- >>> From: dri-devel On Behalf Of >>> Andrey Grodzovsky >>> Sent: Friday, December 07, 2018 1:41 AM >>> To: dri-devel@lists.freedesktop.org; amd-...@lists.freedesktop.org; >>> ckoenig.leichtzumer...@gmail.com; e...@anholt.net; >>> etna...@lists.freedesktop.org >>> Cc: Liu, Monk >>> Subject: [PATCH 2/2] drm/sched: Rework HW fence processing. >>> >>> Expedite job deletion from the ring mirror list to the HW fence signal >>> callback >>> instead of from finish_work; together with waiting for all such fences >>> to signal in >>> drm_sched_stop, we guarantee that an already signaled job will not be >>> processed >>> twice. >>> Remove the sched finish fence callback and just submit finish_work >>> directly >>> from the HW fence callback. >>> >>> Suggested-by: Christian Koenig >>> Signed-off-by: Andrey Grodzovsky >>> --- >>> drivers/gpu/drm/scheduler/sched_fence.c | 4 +++- >>> drivers/gpu/drm/scheduler/sched_main.c | 39 -- >>> --- >>> include/drm/gpu_scheduler.h | 10 +++-2 >>> 3 files changed, 30 insertions(+), 23 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/scheduler/sched_fence.c >>> b/drivers/gpu/drm/scheduler/sched_fence.c >>> index d8d2dff..e62c239 100644 >>> --- a/drivers/gpu/drm/scheduler/sched_fence.c >>> +++ b/drivers/gpu/drm/scheduler/sched_fence.c >>> @@ -151,7 +151,8 @@ struct drm_sched_fence >>> *to_drm_sched_fence(struct dma_fence *f) >>> EXPORT_SYMBOL(to_drm_sched_fence); >>> >>> struct drm_sched_fence *drm_sched_fence_create(struct >>> drm_sched_entity *entity, >>> - void *owner) >>> + void *owner, >>> + struct drm_sched_job *s_job) >>> { >>> struct drm_sched_fence *fence = NULL; >>> unsigned seq; >>> @@ -163,6 +164,7 @@ struct drm_sched_fence >>> *drm_sched_fence_create(struct drm_sched_entity *entity, >>> fence->owner = owner; >>> fence->sched = entity->rq->sched; >>> spin_lock_init(&fence->lock); >>> + fence->s_job = s_job; >>> >>> seq = atomic_inc_return(&entity->fence_seq); >>> dma_fence_init(&fence->scheduled, >>> &drm_sched_fence_ops_scheduled, diff --git >>> a/drivers/gpu/drm/scheduler/sched_main.c >>> b/drivers/gpu/drm/scheduler/sched_main.c >>> index 8fb7f86..2860037 100644 >>> --- a/drivers/gpu/drm/scheduler/sched_main.c >>> +++ b/drivers/gpu/drm/scheduler/sched_main.c >>> @@ -284,31 +284,17 @@ static void drm_sched_job_finish(struct >>> work_struct *work) >>> cancel_delayed_work_sync(&sched->work_tdr); >>> >>> spin_lock_irqsave(&sched->job_list_lock, flags); >>> - /* remove job from ring_mirror_list */ >>> - list_del_init(&s_job->node); >>> - /* queue TDR for next job */ >>> drm_sched_start_timeout(sched); >>> spin_unlock_irqrestore(&sched->job_list_lock, flags); >>> >>> sched->ops->free_job(s_job); >>> } >>> >>> -static void drm_sched_job_finish_cb(struct dma_fence *f, >>> - struct dma_fence_cb *cb) >>> -{ >>> - struct drm_sched_job *job = container_of(cb, struct drm_sched_job, >>> - finish_cb); >>> - schedule_work(&job->finish_work); >>> -} >>> - >>> static void drm_sched_job_begin(struct drm_sched_job *s_job) { >>> struct drm_gpu_scheduler *sched = s_job->sched; >>> unsigned long flags; >>> >>> - dma_fence_add_callback(&s_job->s_fence->finished, &s_job- finish_cb, >>> - drm_sched_job_finish_cb); >>> - >>> spin_lock_irqsave(&sched->job_list_lock, flags); >>> list_add_tail(&s_job->node, &sched->ring_mirror_list); >>> drm_sched_start_timeout(sched); >>> @@ -418,13 +404,17 @@ void drm_sched_start(struct drm_gpu_scheduler >>> *sched, bool unpark_only) { 
>>> struct drm_sched_job *s_job, *tmp; >>> bool found_guilty = false; >>> - unsigned long flags; >>> int r; >>> >>> if (unpark_only) >>> goto unpark; >>> >>> - spin_lock_irqsave(&sched->job_list_lock, flags); >>> + /* >>> + * Locking the list is not required here as the sched thread is >>> parked >>> + * so no new jobs are being pushed in to HW and in drm_sched_stop >>> we >>> + * flushed any in flight jobs who didn't signal yet. Also >>> concurrent >>> + * GPU recovers can't run in parallel. >>> + */ >>> list_for_each_entry_safe(s_job, tmp, &sched->ring_mirror_list, >>> node) { >>> struct drm_sched_fence *s_fence = s_job->s_fence; >>> struct dma_fence *fence; >>> @@ -453,7 +443,6 @@ void drm_sched_start(struct drm_gpu_scheduler >>> *sched, bool unpark_only) >>> } >>> >>> drm_sched_start_timeout(sched); >>> - spin_unlock_irqrestore(&sched->job_list_lock, flags); >>> >>> unpark: >>> kthread_unpark(sched->thread); >>> @@ -505,7 +494,7 @@ int drm_sched_job_init(struct drm_sch
Re: [PATCH 1/2] drm/sched: Refactor ring mirror list handling.
On 12/06/2018 01:33 PM, Christian König wrote: > Am 06.12.18 um 18:41 schrieb Andrey Grodzovsky: >> Decouple sched thread stop and start and ring mirror >> list handling from the policy of what to do about the >> guilty jobs. >> When stopping the sched thread and detaching sched fences >> from non-signaled HW fences, wait for all signaled HW fences >> to complete before rerunning the jobs. >> >> Suggested-by: Christian Koenig >> Signed-off-by: Andrey Grodzovsky > > Just briefly skimmed over this, but it looks exactly like what I had > in mind. > > Need to give that a more detailed thought tomorrow, > Christian. Please note I've already resent V2 after finding a refactoring error. Andrey > >> --- >> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 17 +++--- >> drivers/gpu/drm/etnaviv/etnaviv_sched.c | 8 +-- >> drivers/gpu/drm/scheduler/sched_main.c | 86 >> +++--- >> drivers/gpu/drm/v3d/v3d_sched.c | 11 ++-- >> include/drm/gpu_scheduler.h | 10 ++-- >> 5 files changed, 83 insertions(+), 49 deletions(-) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> index ef36cc5..42111d5 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> @@ -3292,17 +3292,16 @@ static int >> amdgpu_device_pre_asic_reset(struct amdgpu_device *adev, >> /* block all schedulers and reset given job's ring */ >> for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { >> struct amdgpu_ring *ring = adev->rings[i]; >> + bool park_only = job && job->base.sched != &ring->sched; >> if (!ring || !ring->sched.thread) >> continue; >> - kthread_park(ring->sched.thread); >> + drm_sched_stop(&ring->sched, job ? &job->base : NULL, >> park_only); >> - if (job && job->base.sched != &ring->sched) >> + if (park_only) >> continue; >> - drm_sched_hw_job_reset(&ring->sched, job ? 
&job->base : >> NULL); >> - >> /* after all hw jobs are reset, hw fence is meaningless, so >> force_completion */ >> amdgpu_fence_driver_force_completion(ring); >> } >> @@ -3445,6 +3444,7 @@ static void >> amdgpu_device_post_asic_reset(struct amdgpu_device *adev, >> struct amdgpu_job *job) >> { >> int i; >> + bool unpark_only; >> for (i = 0; i < AMDGPU_MAX_RINGS; ++i) { >> struct amdgpu_ring *ring = adev->rings[i]; >> @@ -3456,10 +3456,13 @@ static void >> amdgpu_device_post_asic_reset(struct amdgpu_device *adev, >> * or all rings (in the case @job is NULL) >> * after above amdgpu_reset accomplished >> */ >> - if ((!job || job->base.sched == &ring->sched) && >> !adev->asic_reset_res) >> - drm_sched_job_recovery(&ring->sched); >> + unpark_only = (job && job->base.sched != &ring->sched) || >> + adev->asic_reset_res; >> + >> + if (!unpark_only) >> + drm_sched_resubmit_jobs(&ring->sched); >> - kthread_unpark(ring->sched.thread); >> + drm_sched_start(&ring->sched, unpark_only); >> } >> if (!amdgpu_device_has_dc_support(adev)) { >> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c >> b/drivers/gpu/drm/etnaviv/etnaviv_sched.c >> index 49a6763..fab3b51 100644 >> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c >> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c >> @@ -109,16 +109,16 @@ static void etnaviv_sched_timedout_job(struct >> drm_sched_job *sched_job) >> } >> /* block scheduler */ >> - kthread_park(gpu->sched.thread); >> - drm_sched_hw_job_reset(&gpu->sched, sched_job); >> + drm_sched_stop(&gpu->sched, sched_job, false); >> /* get the GPU back into the init state */ >> etnaviv_core_dump(gpu); >> etnaviv_gpu_recover_hang(gpu); >> + drm_sched_resubmit_jobs(&gpu->sched); >> + >> /* restart scheduler after GPU is usable again */ >> - drm_sched_job_recovery(&gpu->sched); >> - kthread_unpark(gpu->sched.thread); >> + drm_sched_start(&gpu->sched); >> } >> static void etnaviv_sched_free_job(struct drm_sched_job *sched_job) >> diff --git a/drivers/gpu/drm/scheduler/sched_main.c >> b/drivers/gpu/drm/scheduler/sched_main.c >> index dbb6906..8fb7f86 100644 >> --- a/drivers/gpu/drm/scheduler/sched_main.c >> +++ b/drivers/gpu/drm/scheduler/sched_main.c >> @@ -60,8 +60,6 @@ >> static void drm_sched_process_job(struct dma_fence *f, struct >> dma_fence_cb *cb); >> -static void drm_sched_expel_job_unlocked(struct drm_sched_job >> *s_job); >> - >> /** >> * drm_sched_rq_init - initialize a given run queue struct >> * >> @@ -342,13 +340,21 @@ static void drm_sched_job_timedout(struct >> work_struct *work) >> * @bad: bad scheduler job >> * >> */ >> -void drm_sched_hw_job_reset(struct drm_gpu_scheduler *sched, struct >> drm
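Pulled out of diff form, the recovery sequence this refactor gives a driver's timeout handler looks roughly like this (signatures as in this patch revision, per the amdgpu and etnaviv hunks above - a sketch, not the final merged API):

static void example_timedout_job(struct drm_sched_job *sched_job)
{
	struct drm_gpu_scheduler *sched = sched_job->sched;

	/* Park the sched thread and detach the HW-fence callbacks,
	 * waiting for already-signaled fences to be processed. */
	drm_sched_stop(sched, sched_job, false);

	/* ... driver-specific dump and engine/ASIC reset goes here ... */

	/* Re-submit the jobs still on the ring mirror list ... */
	drm_sched_resubmit_jobs(sched);

	/* ... then unpark the thread and re-arm the timeout. */
	drm_sched_start(sched, false);
}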
Re: [PATCH 2/2] drm/sched: Rework HW fence processing.
On 12/06/2018 12:41 PM, Andrey Grodzovsky wrote: > Expedite job deletion from the ring mirror list to the HW fence signal > callback instead of from finish_work; together with waiting for all > such fences to signal in drm_sched_stop, we guarantee that > an already signaled job will not be processed twice. > Remove the sched finish fence callback and just submit finish_work > directly from the HW fence callback. > > Suggested-by: Christian Koenig > Signed-off-by: Andrey Grodzovsky > --- > drivers/gpu/drm/scheduler/sched_fence.c | 4 +++- > drivers/gpu/drm/scheduler/sched_main.c | 39 > - > include/drm/gpu_scheduler.h | 10 +++-2 > 3 files changed, 30 insertions(+), 23 deletions(-) > > diff --git a/drivers/gpu/drm/scheduler/sched_fence.c > b/drivers/gpu/drm/scheduler/sched_fence.c > index d8d2dff..e62c239 100644 > --- a/drivers/gpu/drm/scheduler/sched_fence.c > +++ b/drivers/gpu/drm/scheduler/sched_fence.c > @@ -151,7 +151,8 @@ struct drm_sched_fence *to_drm_sched_fence(struct > dma_fence *f) > EXPORT_SYMBOL(to_drm_sched_fence); > > struct drm_sched_fence *drm_sched_fence_create(struct drm_sched_entity > *entity, > -void *owner) > +void *owner, > +struct drm_sched_job *s_job) > { > struct drm_sched_fence *fence = NULL; > unsigned seq; > @@ -163,6 +164,7 @@ struct drm_sched_fence *drm_sched_fence_create(struct > drm_sched_entity *entity, > fence->owner = owner; > fence->sched = entity->rq->sched; > spin_lock_init(&fence->lock); > + fence->s_job = s_job; > > seq = atomic_inc_return(&entity->fence_seq); > dma_fence_init(&fence->scheduled, &drm_sched_fence_ops_scheduled, > diff --git a/drivers/gpu/drm/scheduler/sched_main.c > b/drivers/gpu/drm/scheduler/sched_main.c > index 8fb7f86..2860037 100644 > --- a/drivers/gpu/drm/scheduler/sched_main.c > +++ b/drivers/gpu/drm/scheduler/sched_main.c > @@ -284,31 +284,17 @@ static void drm_sched_job_finish(struct work_struct > *work) > cancel_delayed_work_sync(&sched->work_tdr); > > spin_lock_irqsave(&sched->job_list_lock, flags); > - /* remove job from ring_mirror_list */ > - list_del_init(&s_job->node); > - /* queue TDR for next job */ > drm_sched_start_timeout(sched); > spin_unlock_irqrestore(&sched->job_list_lock, flags); > > sched->ops->free_job(s_job); > } > > -static void drm_sched_job_finish_cb(struct dma_fence *f, > - struct dma_fence_cb *cb) > -{ > - struct drm_sched_job *job = container_of(cb, struct drm_sched_job, > - finish_cb); > - schedule_work(&job->finish_work); > -} > - > static void drm_sched_job_begin(struct drm_sched_job *s_job) > { > struct drm_gpu_scheduler *sched = s_job->sched; > unsigned long flags; > > - dma_fence_add_callback(&s_job->s_fence->finished, &s_job->finish_cb, > -drm_sched_job_finish_cb); > - > spin_lock_irqsave(&sched->job_list_lock, flags); > list_add_tail(&s_job->node, &sched->ring_mirror_list); > drm_sched_start_timeout(sched); > @@ -418,13 +404,17 @@ void drm_sched_start(struct drm_gpu_scheduler *sched, > bool unpark_only) > { > struct drm_sched_job *s_job, *tmp; > bool found_guilty = false; > - unsigned long flags; > int r; > > if (unpark_only) > goto unpark; > > - spin_lock_irqsave(&sched->job_list_lock, flags); > + /* > + * Locking the list is not required here as the sched thread is parked > + * so no new jobs are being pushed in to HW and in drm_sched_stop we > + * flushed any in flight jobs who didn't signal yet. 
The comment is inaccurate here - it's supposed to be ' any in flight jobs who already have their sched finished signaled and they are removed from the mirror ring list at that point already anyway' I will fix this text later with other comments received on the patches. Andrey > Also concurrent > + * GPU recovers can't run in parallel. > + */ > list_for_each_entry_safe(s_job, tmp, &sched->ring_mirror_list, node) { > struct drm_sched_fence *s_fence = s_job->s_fence; > struct dma_fence *fence; > @@ -453,7 +443,6 @@ void drm_sched_start(struct drm_gpu_scheduler *sched, > bool unpark_only) > } > > drm_sched_start_timeout(sched); > - spin_unlock_irqrestore(&sched->job_list_lock, flags); > > unpark: > kthread_unpark(sched->thread); > @@ -505,7 +494,7 @@ int drm_sched_job_init(struct drm_sched_job *job, > job->sched = sched; > job->entity = entity; > job->s_priority = entity->rq - sched->sched_rq; > - job->s_fence = drm_sched_fence_create(entity, owner); > + job->s_fence = drm_sched_fence_create(e
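Since the corrected wording is only promised for a later respin, the fixed-up comment would presumably read something like this in place (assumed wording, not a quote of any posted patch):

	/*
	 * Locking the list is not required here: the sched thread is parked,
	 * so no new jobs are being pushed to the HW, and any job whose
	 * scheduler-finished fence already signaled was removed from the
	 * mirror list back in drm_sched_stop. Concurrent GPU recoveries
	 * cannot run in parallel either.
	 */
	list_for_each_entry_safe(s_job, tmp, &sched->ring_mirror_list, node) {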
Re: [PATCH libdrm] amdgpu/test: Add illegal register and memory access test.
There is currently a pplib messaging related failure during GPU reset. I will put this issue on my TODO list for a later time, after handling more prioritized stuff, and will disable the deadlock test suite for all ASICs other than gfx8/9 dGPUs until then. Andrey On 11/02/2018 02:14 PM, Grodzovsky, Andrey wrote: Have you tried enabling reset on gfx7 dGPUs? It uses pretty much the same sequence as gfx8 so it might just work. Alex I haven't but I can give it a try. Andrey
Re: [PATCH libdrm] amdgpu/test: Add illegal register and memory access test.
On 11/02/2018 02:12 PM, Alex Deucher wrote: > On Fri, Nov 2, 2018 at 11:59 AM Grodzovsky, Andrey > wrote: >> >> >> On 11/02/2018 10:24 AM, Michel Dänzer wrote: >>> On 2018-10-31 7:33 p.m., Andrey Grodzovsky wrote: >>>> Illegal access will cause CP hang followed by job timeout and >>>> recovery kicking in. >>>> Also, disable the suite for all APU ASICs until GPU >>>> reset issues for them will be resolved and GPU reset recovery >>>> will be enabled by default. >>>> >>>> Signed-off-by: Andrey Grodzovsky >>>> >>>> [...] >>>> >>>> @@ -94,7 +119,9 @@ CU_BOOL suite_deadlock_tests_enable(void) >>>>&minor_version, &device_handle)) >>>> return CU_FALSE; >>>> >>>> -if (device_handle->info.family_id == AMDGPU_FAMILY_SI) { >>>> +if (device_handle->info.family_id == AMDGPU_FAMILY_SI || >>>> +device_handle->info.family_id == AMDGPU_FAMILY_CZ || >>>> +device_handle->info.family_id == AMDGPU_FAMILY_RV) { >>>> printf("\n\nCurrently hangs the CP on this ASIC, deadlock >>>> suite disabled\n"); >>>> enable = CU_FALSE; >>>> } >>> Indentation is wrong here and in other places. The libdrm tree contains >>> configuration files for EditorConfig (https://editorconfig.org/); since >>> you're using Eclipse, https://github.com/ncjones/editorconfig-eclipse >>> should help. >> I installed the eclipse plugin. >>> >>> I run amdgpu_test as part of my daily build/test script during lunch >>> break; when I came back today, I was greeted by a GFX hang of the >>> Bonaire in my development box due to this test. Please disable it for >>> all pre-GFX8 ASICs. Ideally, it should also check at runtime that GPU >>> recovery is actually enabled, as that still isn't the case by default >>> except with bleeding edge amdgpu kernel code. >> Thanks for testing - I will send a fix. >> > Have you tried enabling reset on gfx7 dGPUs? It uses pretty much the > same sequence as gfx8 so it might just work. > > Alex I haven't but I can give it a try. Andrey
Re: [PATCH libdrm] amdgpu/test: Add illegal register and memory access test.
On 11/02/2018 10:24 AM, Michel Dänzer wrote: > On 2018-10-31 7:33 p.m., Andrey Grodzovsky wrote: >> Illegal access will cause CP hang followed by job timeout and >> recovery kicking in. >> Also, disable the suite for all APU ASICs until GPU >> reset issues for them will be resolved and GPU reset recovery >> will be enabled by default. >> >> Signed-off-by: Andrey Grodzovsky >> >> [...] >> >> @@ -94,7 +119,9 @@ CU_BOOL suite_deadlock_tests_enable(void) >> &minor_version, &device_handle)) >> return CU_FALSE; >> >> -if (device_handle->info.family_id == AMDGPU_FAMILY_SI) { >> +if (device_handle->info.family_id == AMDGPU_FAMILY_SI || >> +device_handle->info.family_id == AMDGPU_FAMILY_CZ || >> +device_handle->info.family_id == AMDGPU_FAMILY_RV) { >> printf("\n\nCurrently hangs the CP on this ASIC, deadlock suite >> disabled\n"); >> enable = CU_FALSE; >> } > Indentation is wrong here and in other places. The libdrm tree contains > configuration files for EditorConfig (https://editorconfig.org/); since > you're using Eclipse, https://github.com/ncjones/editorconfig-eclipse > should help. I installed the eclipse plugin. > > > I run amdgpu_test as part of my daily build/test script during lunch > break; when I came back today, I was greeted by a GFX hang of the > Bonaire in my development box due to this test. Please disable it for > all pre-GFX8 ASICs. Ideally, it should also check at runtime that GPU > recovery is actually enabled, as that still isn't the case by default > except with bleeding edge amdgpu kernel code. Thanks for testing - I will send a fix. Andrey
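Michel's runtime check could be a small sysfs probe in suite_deadlock_tests_enable(); a hypothetical sketch, assuming amdgpu's gpu_recovery module parameter is readable on the target kernel:

/* Hypothetical helper for deadlock_tests.c (stdio.h/CUnit already included there). */
static CU_BOOL gpu_recovery_enabled(void)
{
	int v = 0;
	FILE *f = fopen("/sys/module/amdgpu/parameters/gpu_recovery", "r");

	if (!f)
		return CU_FALSE;
	if (fscanf(f, "%d", &v) != 1)
		v = 0;
	fclose(f);

	/* -1 = auto (off by default back then), 0 = off, 1 = on */
	return v == 1 ? CU_TRUE : CU_FALSE;
}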
Re: [PATCH libdrm] amdgpu/test: Add illegal register and memory access test.
On 10/31/2018 03:49 PM, Alex Deucher wrote: > On Wed, Oct 31, 2018 at 2:33 PM Andrey Grodzovsky > wrote: >> Illegal access will cause CP hang followed by job timeout and >> recovery kicking in. >> Also, disable the suite for all APU ASICs until GPU >> reset issues for them will be resolved and GPU reset recovery >> will be enabled by default. >> >> Signed-off-by: Andrey Grodzovsky >> --- >> tests/amdgpu/deadlock_tests.c | 118 >> +- >> 1 file changed, 117 insertions(+), 1 deletion(-) >> >> diff --git a/tests/amdgpu/deadlock_tests.c b/tests/amdgpu/deadlock_tests.c >> index 292ec4e..c565f7a 100644 >> --- a/tests/amdgpu/deadlock_tests.c >> +++ b/tests/amdgpu/deadlock_tests.c >> @@ -73,6 +73,29 @@ >> * 1 - pfp >> */ >> >> +#definePACKET3_WRITE_DATA 0x37 >> +#defineWRITE_DATA_DST_SEL(x) ((x) << 8) >> + /* 0 - register >> +* 1 - memory (sync - via GRBM) >> +* 2 - gl2 >> +* 3 - gds >> +* 4 - reserved >> +* 5 - memory (async - direct) >> +*/ >> +#defineWR_ONE_ADDR (1 << 16) >> +#defineWR_CONFIRM (1 << 20) >> +#defineWRITE_DATA_CACHE_POLICY(x) ((x) << 25) >> + /* 0 - LRU >> +* 1 - Stream >> +*/ >> +#defineWRITE_DATA_ENGINE_SEL(x)((x) << 30) >> + /* 0 - me >> +* 1 - pfp >> +* 2 - ce >> +*/ >> + >> +#define mmVM_CONTEXT0_PAGE_TABLE_BASE_ADDR >> 0x54f >> + >> static amdgpu_device_handle device_handle; >> static uint32_t major_version; >> static uint32_t minor_version; >> @@ -85,6 +108,8 @@ int use_uc_mtype = 0; >> static void amdgpu_deadlock_helper(unsigned ip_type); >> static void amdgpu_deadlock_gfx(void); >> static void amdgpu_deadlock_compute(void); >> +static void amdgpu_illegal_reg_access(); >> +static void amdgpu_illegal_mem_access(); >> >> CU_BOOL suite_deadlock_tests_enable(void) >> { >> @@ -94,7 +119,9 @@ CU_BOOL suite_deadlock_tests_enable(void) >> &minor_version, >> &device_handle)) >> return CU_FALSE; >> >> - if (device_handle->info.family_id == AMDGPU_FAMILY_SI) { >> + if (device_handle->info.family_id == AMDGPU_FAMILY_SI || > Add AMDGPU_FAMILY_KV for CI based APUs as well. > > >> + device_handle->info.family_id == AMDGPU_FAMILY_CZ || >> + device_handle->info.family_id == AMDGPU_FAMILY_RV) { >> printf("\n\nCurrently hangs the CP on this ASIC, deadlock >> suite disabled\n"); >> enable = CU_FALSE; >> } >> @@ -140,6 +167,8 @@ int suite_deadlock_tests_clean(void) >> CU_TestInfo deadlock_tests[] = { >> { "gfx ring block test", amdgpu_deadlock_gfx }, >> { "compute ring block test", amdgpu_deadlock_compute }, >> + { "illegal reg access test", amdgpu_illegal_reg_access }, >> + { "illegal mem access test", amdgpu_illegal_mem_access }, > Won't this illegal mem access just result in a page fault? Is the > idea to set vm_debug to force an MC halt to test reset? > > Alex For this test to hang the CP amdgpu.vm_fault_stop=2 needs to be set. 
Andrey > >> CU_TEST_INFO_NULL, >> }; >> >> @@ -257,3 +286,90 @@ static void amdgpu_deadlock_helper(unsigned ip_type) >> r = amdgpu_cs_ctx_free(context_handle); >> CU_ASSERT_EQUAL(r, 0); >> } >> + >> +static void bad_access_helper(int reg_access) >> +{ >> + amdgpu_context_handle context_handle; >> + amdgpu_bo_handle ib_result_handle; >> + void *ib_result_cpu; >> + uint64_t ib_result_mc_address; >> + struct amdgpu_cs_request ibs_request; >> + struct amdgpu_cs_ib_info ib_info; >> + struct amdgpu_cs_fence fence_status; >> + uint32_t expired; >> + int i, r; >> + amdgpu_bo_list_handle bo_list; >> + amdgpu_va_handle va_handle; >> + >> + r = amdgpu_cs_ctx_create(device_handle, &context_handle); >> + CU_ASSERT_EQUAL(r, 0); >> + >> + r = amdgpu_bo_alloc_and_map_raw(device_handle, 4096, 4096, >> + AMDGPU_GEM_DOMAIN_GTT, 0, 0, >> + &ib_result_handle, >> &ib_result_cpu, >> + >> &ib_result_mc_address, &va_handle); >> + CU_ASSERT_EQUAL(r, 0); >> + >> + r = amdgpu_get_bo_list(device_handle, ib_result_handle, NULL, >> + &bo_list); >> + CU_ASSERT_EQUAL(r, 0); >> + >> + ptr = ib_result_cpu; >> + i = 0; >> + >> + ptr[i++]
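The diff above is cut off right at the IB setup; judging from the defines it introduces, the packet stream bad_access_helper emits is presumably along these lines (a reconstruction, not the verbatim patch tail):

	/* Assumes the PACKET3() helper defined earlier in deadlock_tests.c and
	 * a uint32_t *ptr into the mapped IB, as in the surrounding helper. */
	ptr[i++] = PACKET3(PACKET3_WRITE_DATA, 3);
	ptr[i++] = (reg_access ? WRITE_DATA_DST_SEL(0)	/* register */
			       : WRITE_DATA_DST_SEL(5))	/* memory, direct */
		   | WR_CONFIRM;
	ptr[i++] = reg_access ? mmVM_CONTEXT0_PAGE_TABLE_BASE_ADDR
			      : 0xdeadbee0;		/* bogus address, low */
	ptr[i++] = 0;					/* address, high */
	ptr[i++] = 0xdeadbeef;				/* payload */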
Re: [PATCH v2] drm/scheduler: Add drm_sched_job_cleanup
Acked-by: Andrey Grodzovsky Andrey On 10/29/2018 05:32 AM, Sharat Masetty wrote: > This patch adds a new API to clean up the scheduler job resources. This > is primarily needed in cases where the job was created but was not queued to > the scheduler queue. Additionally with this change, the layer which > creates the scheduler job also gets to free up the job's resources and > this entails moving the dma_fence_put(finished_fence) to the driver's > ops free handler routines. > > Signed-off-by: Sharat Masetty > --- > Changes from v1: > Addressed review comments from Christian Koenig > > drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 3 +-- > drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 2 ++ > drivers/gpu/drm/etnaviv/etnaviv_sched.c | 3 +++ > drivers/gpu/drm/scheduler/sched_entity.c | 1 - > drivers/gpu/drm/scheduler/sched_main.c | 13 - > drivers/gpu/drm/v3d/v3d_sched.c | 2 ++ > include/drm/gpu_scheduler.h | 1 + > 7 files changed, 21 insertions(+), 4 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c > index 663043c..5d768f9 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c > @@ -1260,8 +1260,7 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p, > return 0; > > error_abort: > - dma_fence_put(&job->base.s_fence->finished); > - job->base.s_fence = NULL; > + drm_sched_job_cleanup(&job->base); > amdgpu_mn_unlock(p->mn); > > error_unlock: > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c > index 755f733..e0af44f 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c > @@ -112,6 +112,8 @@ static void amdgpu_job_free_cb(struct drm_sched_job > *s_job) > struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched); > struct amdgpu_job *job = to_amdgpu_job(s_job); > > + drm_sched_job_cleanup(s_job); > + > amdgpu_ring_priority_put(ring, s_job->s_priority); > dma_fence_put(job->fence); > amdgpu_sync_free(&job->sync); > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c > b/drivers/gpu/drm/etnaviv/etnaviv_sched.c > index e7c3ed6..6f3c9bf 100644 > --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c > +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c > @@ -127,6 +127,8 @@ static void etnaviv_sched_free_job(struct drm_sched_job > *sched_job) > { > struct etnaviv_gem_submit *submit = to_etnaviv_submit(sched_job); > > + drm_sched_job_cleanup(sched_job); > + > etnaviv_submit_put(submit); > } > > @@ -159,6 +161,7 @@ int etnaviv_sched_push_job(struct drm_sched_entity > *sched_entity, > submit->out_fence, 0, > INT_MAX, GFP_KERNEL); > if (submit->out_fence_id < 0) { > + drm_sched_job_cleanup(&submit->sched_job); > ret = -ENOMEM; > goto out_unlock; > } > diff --git a/drivers/gpu/drm/scheduler/sched_entity.c > b/drivers/gpu/drm/scheduler/sched_entity.c > index 3e22a54..8ff9d21f 100644 > --- a/drivers/gpu/drm/scheduler/sched_entity.c > +++ b/drivers/gpu/drm/scheduler/sched_entity.c > @@ -204,7 +204,6 @@ static void drm_sched_entity_kill_jobs_cb(struct > dma_fence *f, > > drm_sched_fence_finished(job->s_fence); > WARN_ON(job->s_fence->parent); > - dma_fence_put(&job->s_fence->finished); > job->sched->ops->free_job(job); > } > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c > b/drivers/gpu/drm/scheduler/sched_main.c > index 44fe587..14009af 100644 > --- a/drivers/gpu/drm/scheduler/sched_main.c > +++ b/drivers/gpu/drm/scheduler/sched_main.c > @@ -220,7 +220,6 @@ static void drm_sched_job_finish(struct work_struct *work) > 
drm_sched_start_timeout(sched); > spin_unlock(&sched->job_list_lock); > > - dma_fence_put(&s_job->s_fence->finished); > sched->ops->free_job(s_job); > } > > @@ -424,6 +423,18 @@ int drm_sched_job_init(struct drm_sched_job *job, > EXPORT_SYMBOL(drm_sched_job_init); > > /** > + * drm_sched_job_cleanup - clean up scheduler job resources > + * > + * @job: scheduler job to clean up > + */ > +void drm_sched_job_cleanup(struct drm_sched_job *job) > +{ > + dma_fence_put(&job->s_fence->finished); > + job->s_fence = NULL; > +} > +EXPORT_SYMBOL(drm_sched_job_cleanup); > + > +/** >* drm_sched_ready -
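The amdgpu and etnaviv hunks above show the intent; for a driver author, the resulting submit-path pattern is roughly the following (a sketch with hypothetical example_* names, not code from the patch):

	r = drm_sched_job_init(&job->base, entity, owner);
	if (r)
		goto err_free_job;

	r = example_pin_and_validate(job);	/* hypothetical failure point */
	if (r)
		goto err_cleanup;	/* job was initialized but never queued */

	drm_sched_entity_push_job(&job->base, entity);
	return 0;

err_cleanup:
	drm_sched_job_cleanup(&job->base);	/* drops the finished-fence ref */
err_free_job:
	example_free(job);			/* hypothetical driver free */
	return r;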
Re: [PATCH v2 1/2] drm/sched: Add boolean to mark if sched is ready to work v2
On 10/22/2018 05:33 AM, Koenig, Christian wrote: > Am 19.10.18 um 22:52 schrieb Andrey Grodzovsky: >> Problem: >> A particular scheduler may become unusable (underlying HW) after >> some event (e.g. GPU reset). If it's later chosen by >> the get free sched. policy a command will fail to be >> submitted. >> >> Fix: >> Add a driver specific callback to report the sched status so >> rq with bad sched can be avoided in favor of working one or >> none in which case job init will fail. >> >> v2: Switch from driver callback to flag in scheduler. >> >> Signed-off-by: Andrey Grodzovsky >> --- >>drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 2 +- >>drivers/gpu/drm/etnaviv/etnaviv_sched.c | 2 +- >>drivers/gpu/drm/scheduler/sched_entity.c | 9 - >>drivers/gpu/drm/scheduler/sched_main.c| 10 +- >>drivers/gpu/drm/v3d/v3d_sched.c | 4 ++-- >>include/drm/gpu_scheduler.h | 5 - >>6 files changed, 25 insertions(+), 7 deletions(-) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c >> index 5448cf2..bf845b0 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c >> @@ -450,7 +450,7 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring >> *ring, >> >> r = drm_sched_init(&ring->sched, &amdgpu_sched_ops, >> num_hw_submission, amdgpu_job_hang_limit, >> - timeout, ring->name); >> + timeout, ring->name, false); >> if (r) { >> DRM_ERROR("Failed to create scheduler on ring %s.\n", >>ring->name); >> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c >> b/drivers/gpu/drm/etnaviv/etnaviv_sched.c >> index f8c5f1e..9dca347 100644 >> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c >> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c >> @@ -178,7 +178,7 @@ int etnaviv_sched_init(struct etnaviv_gpu *gpu) >> >> ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops, >> etnaviv_hw_jobs_limit, etnaviv_job_hang_limit, >> - msecs_to_jiffies(500), dev_name(gpu->dev)); >> + msecs_to_jiffies(500), dev_name(gpu->dev), true); >> if (ret) >> return ret; >> >> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c >> b/drivers/gpu/drm/scheduler/sched_entity.c >> index 3e22a54..ba54c30 100644 >> --- a/drivers/gpu/drm/scheduler/sched_entity.c >> +++ b/drivers/gpu/drm/scheduler/sched_entity.c >> @@ -130,7 +130,14 @@ drm_sched_entity_get_free_sched(struct drm_sched_entity >> *entity) >> int i; >> >> for (i = 0; i < entity->num_rq_list; ++i) { >> -num_jobs = atomic_read(&entity->rq_list[i]->sched->num_jobs); >> +struct drm_gpu_scheduler *sched = entity->rq_list[i]->sched; >> + >> +if (!entity->rq_list[i]->sched->ready) { >> +DRM_WARN("sched%s is not ready, skipping", sched->name); >> +continue; >> +} >> + >> +num_jobs = atomic_read(&sched->num_jobs); >> if (num_jobs < min_jobs) { >> min_jobs = num_jobs; >> rq = entity->rq_list[i]; >> diff --git a/drivers/gpu/drm/scheduler/sched_main.c >> b/drivers/gpu/drm/scheduler/sched_main.c >> index 63b997d..772adec 100644 >> --- a/drivers/gpu/drm/scheduler/sched_main.c >> +++ b/drivers/gpu/drm/scheduler/sched_main.c >> @@ -420,6 +420,9 @@ int drm_sched_job_init(struct drm_sched_job *job, >> struct drm_gpu_scheduler *sched; >> >> drm_sched_entity_select_rq(entity); >> +if (!entity->rq) >> +return -ENOENT; >> + >> sched = entity->rq->sched; >> >> job->sched = sched; >> @@ -598,6 +601,7 @@ static int drm_sched_main(void *param) >> * @hang_limit: number of times to allow a job to hang before dropping it >> * @timeout: timeout value in jiffies for the scheduler >> * @name: name used for debugging >> + * @ready: 
marks if the underlying HW is ready to work >> * >> * Return 0 on success, otherwise error code. >> */ >> @@ -606,7 +610,8 @@ int drm_sched_init(struct drm_gpu_scheduler *sched, >> unsigned hw_submission, >> unsigned hang_limit, >> long timeout, >> - const char *name) >> + const char *name, >> + bool ready) > Please drop the ready flag here. We should consider a scheduler ready as > soon as it is initialized. I don't fully agree with this, because the flag marks that the HW (the HW ring) is ready to run, not the scheduler, which is a SW entity. For amdgpu, drm_sched_init is called from the sw_init stage, while the ring initialization and tests take place in the hw_init stage. Maybe if the flag were named 'hw_ready' instead of just 'ready' it would
Re: [PATCH v3 2/2] drm/amdgpu: Retire amdgpu_ring.ready flag v3
On 10/23/2018 05:23 AM, Christian König wrote: > Am 22.10.18 um 22:46 schrieb Andrey Grodzovsky: >> Start using drm_gpu_scheduler.ready instead. >> >> v3: >> Add helper function to run ring test and set >> sched.ready flag status accordingly, clean explicit >> sched.ready sets from the IP specific files. >> >> Signed-off-by: Andrey Grodzovsky >> --- > >> [SNIP] >> + >> +int amdgpu_ring_test_helper(struct amdgpu_ring *ring) > > This needs some kernel doc, with that fixed the patch is Reviewed-by: > Christian König > > Did you miss my comment on the first patch? > > Thanks for the nice cleanup, > Christian. Yeah, it seems like mails with dri-devel in CC occasionally end up in the dri-devel folder and not in my inbox, even when sent to me. Will reply soon. Andrey
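The helper under review is only visible as a [SNIP] above; per the v3 changelog it presumably boils down to something like this (a sketch of the described behavior, not the reviewed patch itself):

/* Run the ring test and fold the result into the scheduler's ready flag. */
int amdgpu_ring_test_helper(struct amdgpu_ring *ring)
{
	int r;

	r = amdgpu_ring_test_ring(ring);
	if (r)
		DRM_DEV_ERROR(ring->adev->dev, "ring %s test failed (%d)\n",
			      ring->name, r);

	ring->sched.ready = !r;	/* HW is usable iff the test passed */

	return r;
}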