Re: [PATCH] drm/gem: Expose the buffer object handle to userspace last
On 14/02/2023 12:50, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin
>
> Currently drm_gem_handle_create_tail exposes the handle to userspace
> before the buffer object construction is complete. This allows userspace
> to operate on a partially constructed object, which may also be in the
> process of having its creation fail, with a range of negative outcomes.
>
> A lot of those will depend on what the individual drivers are doing in
> their obj->funcs->open() callbacks, with a common failure mode being
> -ENOMEM from drm_vma_node_allow.
>
> We can make sure none of this can happen by allocating the handle last,
> although with the downside that more of the function now runs under
> dev->object_name_lock.
>
> Looking into the individual drivers' open() hooks, we have
> amdgpu_gem_object_open which seems like it could have a potential security
> issue without this change.
>
> A couple of drivers like qxl_gem_object_open and vmw_gem_object_open
> implement no-op hooks, so no impact for them.
>
> A number of others require a deeper look by their individual owners to
> assess the impact. Those are lima_gem_object_open, nouveau_gem_object_open,
> panfrost_gem_open, radeon_gem_object_open and virtio_gpu_gem_object_open.

I've looked over the panfrost code, and I can't see how this could create a security hole there. It looks like there's a path which can confuse the shrinker (so objects might not be purged when they could be[1]) but they would be freed properly in the normal path - so no worse than user space could already do.

[1] gpu_usecount is incremented in panfrost_lookup_bos() per bo, but not decremented on failure.

> Putting aside the risk assessment of the above, some common scenarios to
> think about are along these lines:
>
> 1)
> Userspace closes a handle by speculatively "guessing" it from a second
> thread.
>
> This results in an unreachable buffer object, i.e. a memory leak.
>
> 2)
> Same as 1), but the object is in the process of getting closed (failed
> creation).
> The second thread is then able to re-cycle the handle and idr_remove in
> the first thread would then remove the handle it does not own from the
> idr.

This, however, looks plausible - and I can see how this could potentially trigger a security hole in user space.

> 3)
> Going back to the earlier per driver problem space - individual impact
> assessment of allowing a second thread to access and operate on a
> partially constructed handle / object. (Can something crash? Leak
> information?)
>
> In terms of identifying when the problem started I will tag some patches
> as references, but not all, if even any, of them actually point to a
> broken state. I am just identifying points at which more opportunity for
> issues to arise was added.
>
> References: 304eda32920b ("drm/gem: add hooks to notify driver when object
> handle is created/destroyed")
> References: ca481c9b2a3a ("drm/gem: implement vma access management")
> References: b39b5394fabc ("drm/gem: Add drm_gem_object_funcs")
> Cc: dri-de...@lists.freedesktop.org
> Cc: Rob Clark
> Cc: Ben Skeggs
> Cc: David Herrmann
> Cc: Noralf Trønnes
> Cc: David Airlie
> Cc: Daniel Vetter
> Cc: amd-gfx@lists.freedesktop.org
> Cc: l...@lists.freedesktop.org
> Cc: nouv...@lists.freedesktop.org
> Cc: Steven Price
> Cc: virtualizat...@lists.linux-foundation.org
> Cc: spice-de...@lists.freedesktop.org
> Cc: Zack Rusin

FWIW I think this makes the code easier to reason about, so

Reviewed-by: Steven Price

> ---
> drivers/gpu/drm/drm_gem.c | 48 +++
> 1 file changed, 24 insertions(+), 24 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
> index aa15c52ae182..e3d897bca0f2 100644
> --- a/drivers/gpu/drm/drm_gem.c
> +++ b/drivers/gpu/drm/drm_gem.c
> @@ -356,52 +356,52 @@ drm_gem_handle_create_tail(struct drm_file *file_priv,
>                            struct drm_gem_object *obj,
>                            u32 *handlep)
> {
>     struct drm_device *dev = obj->dev;
> -   u32 handle;
>     int ret;
>
>     WARN_ON(!mutex_is_locked(&dev->object_name_lock));
>     if (obj->handle_count++ == 0)
>         drm_gem_object_get(obj);
>
> +   ret = drm_vma_node_allow(&obj->vma_node, file_priv);
> +   if (ret)
> +       goto err_put;
> +
> +   if (obj->funcs->open) {
> +       ret = obj->funcs->open(obj, file_priv);
> +       if (ret)
> +           goto err_revoke;
> +   }
> +
>     /*
> -    * Get the user-visible handle using idr. Preload and perform
> -    * allocation under our spinlock.
> +    * Get the user-visible handle
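The effect of the reordering can be illustrated with a deliberately simplified, hypothetical Python model (none of these names exist in the kernel - the idr and the open() step are reduced to toy stand-ins): if the handle is published first, a second thread can observe, and act on, an object whose construction may still fail; if the handle is allocated last, a failed construction was never visible to userspace at all.

```python
# Toy model of the two orderings in drm_gem_handle_create_tail().
# Purely illustrative: HandleTable stands in for the kernel idr, and
# open_ok stands in for the obj->funcs->open() / drm_vma_node_allow() steps.

class HandleTable:
    """Minimal handle -> object map, like the kernel's idr."""
    def __init__(self):
        self._next = 1
        self._map = {}

    def alloc(self, obj):
        h = self._next
        self._next += 1
        self._map[h] = obj
        return h

    def lookup(self, h):
        return self._map.get(h)

    def remove(self, h):
        return self._map.pop(h, None)

def create_handle_first(table, open_ok):
    """Old ordering: the handle is exposed before construction finishes."""
    obj = {"state": "constructing"}
    h = table.alloc(obj)                 # userspace can already see this
    visible_midway = table.lookup(h) is not None
    if not open_ok:                      # e.g. -ENOMEM from drm_vma_node_allow
        table.remove(h)
        return None, visible_midway
    obj["state"] = "ready"
    return h, visible_midway

def create_handle_last(table, open_ok):
    """New ordering: run the construction steps first, publish the handle last."""
    obj = {"state": "constructing"}
    if not open_ok:
        return None, False               # nothing was ever visible to userspace
    obj["state"] = "ready"
    return table.alloc(obj), False

t = HandleTable()
_, leaked_visibility = create_handle_first(t, open_ok=False)
assert leaked_visibility                 # half-built object was reachable

t = HandleTable()
h, leaked_visibility = create_handle_last(t, open_ok=True)
assert not leaked_visibility and t.lookup(h)["state"] == "ready"
```

The "visible_midway" window is exactly where the scenarios 1-3 above play out; allocating last closes it at the cost of holding dev->object_name_lock for longer.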
Re: [PATCH v6 04/22] drm/panfrost: Fix shrinker list corruption by madvise IOCTL
On 27/05/2022 00:50, Dmitry Osipenko wrote:
> Calling madvise IOCTL twice on BO causes memory shrinker list corruption
> and crashes kernel because BO is already on the list and it's added to
> the list again, while BO should be removed from the list before it's
> re-added. Fix it.
>
> Cc: sta...@vger.kernel.org
> Fixes: 013b65101315 ("drm/panfrost: Add madvise and shrinker support")
> Signed-off-by: Dmitry Osipenko

Reviewed-by: Steven Price

> ---
> drivers/gpu/drm/panfrost/panfrost_drv.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c
> b/drivers/gpu/drm/panfrost/panfrost_drv.c
> index 087e69b98d06..b1e6d238674f 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_drv.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
> @@ -433,8 +433,8 @@ static int panfrost_ioctl_madvise(struct drm_device *dev,
> void *data,
>
>     if (args->retained) {
>         if (args->madv == PANFROST_MADV_DONTNEED)
> -           list_add_tail(&bo->base.madv_list,
> -                         &pfdev->shrinker_list);
> +           list_move_tail(&bo->base.madv_list,
> +                          &pfdev->shrinker_list);
>         else if (args->madv == PANFROST_MADV_WILLNEED)
>             list_del_init(&bo->base.madv_list);
>     }
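Why list_add_tail() on an already-linked node corrupts the list, while list_move_tail() (unlink, then re-add) is safe to call repeatedly, can be shown with a small Python model of the kernel's circular doubly linked lists. This is a toy reimplementation of the list.h semantics, not kernel code; a second madvise(DONTNEED) on the same BO amounts to the "buggy" call below.

```python
# Toy model of <linux/list.h> circular doubly linked lists, illustrating
# the madvise double-add bug fixed by switching to list_move_tail().

class Node:
    def __init__(self, name="head"):
        self.name = name
        self.prev = self.next = self  # empty circular list, like LIST_HEAD

def list_add_tail(node, head):
    node.prev = head.prev
    node.next = head
    head.prev.next = node
    head.prev = node

def list_del_init(node):
    node.prev.next = node.next
    node.next.prev = node.prev
    node.prev = node.next = node

def list_move_tail(node, head):
    list_del_init(node)           # safe even if node is already unlinked
    list_add_tail(node, head)

def walk(head, limit=10):
    """Traverse forward, bailing out after `limit` nodes to catch loops."""
    out, n = [], head.next
    while n is not head and len(out) < limit:
        out.append(n.name)
        n = n.next
    return out

shrinker_list = Node()
bo = Node("bo")

# Safe: two DONTNEED calls in a row with list_move_tail.
list_move_tail(bo, shrinker_list)
list_move_tail(bo, shrinker_list)
assert walk(shrinker_list) == ["bo"]

# Buggy: a second list_add_tail makes the node point at itself, so a
# forward traversal never gets back to the list head.
list_add_tail(bo, shrinker_list)
assert bo.next is bo
assert walk(shrinker_list) == ["bo"] * 10   # loops until the safety limit
```

In the kernel the equivalent traversal is the shrinker walking pfdev->shrinker_list, which is why the double-add crashes rather than merely leaking.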
Re: [bug report] new kvmalloc() WARN() triggered by DRM ioctls tracking
On 16/12/2021 14:15, Boris Brezillon wrote: > Hi Steve, > > On Thu, 16 Dec 2021 14:02:25 + > Steven Price wrote: > >> + Boris >> >> On 16/12/2021 12:08, Dan Carpenter wrote: >>> Hi DRM Devs, >>> >>> In commit 7661809d493b ("mm: don't allow oversized kvmalloc() calls") >>> from July, Linus added a WARN_ONCE() for "crazy" allocations over 2GB. >>> I have a static checker warning for this and most of the warnings are >>> from DRM ioctls. >>> >>> drivers/gpu/drm/lima/lima_drv.c:124 lima_ioctl_gem_submit() warn: uncapped >>> user size for kvmalloc() will WARN >>> drivers/gpu/drm/radeon/radeon_cs.c:291 radeon_cs_parser_init() warn: >>> uncapped user size for kvmalloc() will WARN >>> drivers/gpu/drm/v3d/v3d_gem.c:311 v3d_lookup_bos() warn: uncapped user size >>> for kvmalloc() will WARN >>> drivers/gpu/drm/v3d/v3d_gem.c:319 v3d_lookup_bos() warn: uncapped user size >>> for kvmalloc() will WARN >>> drivers/gpu/drm/v3d/v3d_gem.c:601 v3d_get_multisync_post_deps() warn: >>> uncapped user size for kvmalloc() will WARN >>> drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c:476 etnaviv_ioctl_gem_submit() >>> warn: uncapped user size for kvmalloc() will WARN >>> drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c:477 etnaviv_ioctl_gem_submit() >>> warn: uncapped user size for kvmalloc() will WARN >>> drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c:478 etnaviv_ioctl_gem_submit() >>> warn: uncapped user size for kvmalloc() will WARN >>> drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c:479 etnaviv_ioctl_gem_submit() >>> warn: uncapped user size for kvmalloc() will WARN >>> drivers/gpu/drm/virtio/virtgpu_ioctl.c:186 virtio_gpu_execbuffer_ioctl() >>> warn: uncapped user size for kvmalloc() will WARN >>> drivers/gpu/drm/panfrost/panfrost_drv.c:198 panfrost_copy_in_sync() warn: >>> uncapped user size for kvmalloc() will WARN >> >> I believe this one in Panfrost would be fixed by Boris's series >> reworking the submit ioctl[1]. 
>> >> Boris: are you planning on submitting that series soon - or is it worth >> cherry picking the rework in patch 5 to fix this issue? > > Don't know when I'll get back to it, so I'd recommend cherry-picking > what you need. Thanks, no problem - it was mostly when I looked at the code I had the feeling that "surely this has already been fixed", then discovered your series was never merged ;) I'll hammer out a patch for this one issue. Thanks, Steve
Re: [bug report] new kvmalloc() WARN() triggered by DRM ioctls tracking
+ Boris On 16/12/2021 12:08, Dan Carpenter wrote: > Hi DRM Devs, > > In commit 7661809d493b ("mm: don't allow oversized kvmalloc() calls") > from July, Linus added a WARN_ONCE() for "crazy" allocations over 2GB. > I have a static checker warning for this and most of the warnings are > from DRM ioctls. > > drivers/gpu/drm/lima/lima_drv.c:124 lima_ioctl_gem_submit() warn: uncapped > user size for kvmalloc() will WARN > drivers/gpu/drm/radeon/radeon_cs.c:291 radeon_cs_parser_init() warn: uncapped > user size for kvmalloc() will WARN > drivers/gpu/drm/v3d/v3d_gem.c:311 v3d_lookup_bos() warn: uncapped user size > for kvmalloc() will WARN > drivers/gpu/drm/v3d/v3d_gem.c:319 v3d_lookup_bos() warn: uncapped user size > for kvmalloc() will WARN > drivers/gpu/drm/v3d/v3d_gem.c:601 v3d_get_multisync_post_deps() warn: > uncapped user size for kvmalloc() will WARN > drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c:476 etnaviv_ioctl_gem_submit() > warn: uncapped user size for kvmalloc() will WARN > drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c:477 etnaviv_ioctl_gem_submit() > warn: uncapped user size for kvmalloc() will WARN > drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c:478 etnaviv_ioctl_gem_submit() > warn: uncapped user size for kvmalloc() will WARN > drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c:479 etnaviv_ioctl_gem_submit() > warn: uncapped user size for kvmalloc() will WARN > drivers/gpu/drm/virtio/virtgpu_ioctl.c:186 virtio_gpu_execbuffer_ioctl() > warn: uncapped user size for kvmalloc() will WARN > drivers/gpu/drm/panfrost/panfrost_drv.c:198 panfrost_copy_in_sync() warn: > uncapped user size for kvmalloc() will WARN I believe this one in Panfrost would be fixed by Boris's series reworking the submit ioctl[1]. Boris: are you planning on submitting that series soon - or is it worth cherry picking the rework in patch 5 to fix this issue? 
[1] https://lore.kernel.org/all/20210930190954.1525933-1-boris.brezil...@collabora.com/

> drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c:120 amdgpu_cs_parser_init() warn: uncapped user size for kvmalloc() will WARN
>
> These ioctls can all trigger the stack dump. The line numbers are from
> linux next (next-20211214).
>
> I feel like ideally this could be fixed in a central way, but if not
> then hopefully I've added the relevant lists to the CC.

I've only looked at Panfrost, but at least here we're simply allowing user space to allocate an arbitrary amount of kernel memory in one go - which is always going to be a good way of triggering the OOM killer if nothing else.

Boris's series includes a change which means that instead of trying to copy an (attacker controlled) sized array into the kernel to process, we copy each element of the array in turn. So I don't really see how this could be fixed in a central way (but some of the other cases might be different).

Thanks, Steve

> regards,
> dan carpenter
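The two mitigations discussed in this thread can be sketched in Python standing in for the kernel C. Both the cap value and the function names are illustrative only, not real kernel constants or APIs: either reject an oversized user-supplied count before allocating, or process the user array element by element so no count-sized kernel allocation is needed at all (the per-element approach is what the series referenced above does).

```python
# Illustrative sketch of two ways to avoid an uncapped, user-sized
# kernel allocation. MAX_ENTRIES is a made-up limit for the example.

MAX_ENTRIES = 4096  # hypothetical cap, not a real kernel constant

def copy_in_capped(user_array):
    """Reject oversized requests up front, then do one bounded allocation."""
    if len(user_array) > MAX_ENTRIES:
        raise ValueError("-EINVAL: entry count too large")
    return list(user_array)

def process_per_element(user_array, handler):
    """Per-element approach: never allocate the whole array at once.
    Each iteration only needs a fixed-size copy of one entry."""
    for entry in user_array:
        handler(entry)

seen = []
process_per_element(range(5), seen.append)
assert seen == [0, 1, 2, 3, 4]

try:
    copy_in_capped([0] * (MAX_ENTRIES + 1))
    assert False, "oversized request should have been rejected"
except ValueError as e:
    assert "EINVAL" in str(e)
```

The per-element form trades a single large copy_from_user() for many small ones, which is why it also removes the kvmalloc() WARN: there is no attacker-sized allocation left to warn about.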
Re: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak
On 26/03/2021 02:04, Zhang, Jack (Jian) wrote: [AMD Official Use Only - Internal Distribution Only] Hi, Steve, Thank you for your detailed comments. But currently the patch is not finalized. We found some potential race condition even with this patch. The solution is under discussion and hopefully we could find an ideal one. After that, I will start to consider other drm-driver if it will influence other drivers(except for amdgpu). No problem. Please keep me CC'd, the suggestion of using reference counts may be beneficial for Panfrost as we already build a reference count on top of struct drm_sched_job. So there may be scope for cleaning up Panfrost afterwards even if your work doesn't directly affect it. Thanks, Steve Best, Jack -Original Message- From: Steven Price Sent: Monday, March 22, 2021 11:29 PM To: Zhang, Jack (Jian) ; dri-de...@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; Koenig, Christian ; Grodzovsky, Andrey ; Liu, Monk ; Deng, Emily ; Rob Herring ; Tomeu Vizoso Subject: Re: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak On 15/03/2021 05:23, Zhang, Jack (Jian) wrote: [AMD Public Use] Hi, Rob/Tomeu/Steven, Would you please help to review this patch for panfrost driver? Thanks, Jack Zhang -Original Message- From: Jack Zhang Sent: Monday, March 15, 2021 1:21 PM To: dri-de...@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; Koenig, Christian ; Grodzovsky, Andrey ; Liu, Monk ; Deng, Emily Cc: Zhang, Jack (Jian) Subject: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak re-insert Bailing jobs to avoid memory leak. V2: move re-insert step to drm/scheduler logic V3: add panfrost's return value for bailing jobs in case it hits the memleak issue. This commit message could do with some work - it's really hard to decipher what the actual problem you're solving is. 
Signed-off-by: Jack Zhang
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    | 8 ++--
 drivers/gpu/drm/panfrost/panfrost_job.c    | 4 ++--
 drivers/gpu/drm/scheduler/sched_main.c     | 8 +++-
 include/drm/gpu_scheduler.h                | 1 +
 5 files changed, 19 insertions(+), 6 deletions(-)

[...]

diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
index 6003cfeb1322..e2cb4f32dae1 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -444,7 +444,7 @@ static enum drm_gpu_sched_stat panfrost_job_timedout(struct drm_sched_job
 	 * spurious. Bail out. */
 	if (dma_fence_is_signaled(job->done_fence))
-		return DRM_GPU_SCHED_STAT_NOMINAL;
+		return DRM_GPU_SCHED_STAT_BAILING;

 	dev_err(pfdev->dev, "gpu sched timeout, js=%d, config=0x%x, status=0x%x, head=0x%x, tail=0x%x, sched_job=%p", js,
@@ -456,7 +456,7 @@ static enum drm_gpu_sched_stat panfrost_job_timedout(struct drm_sched_job
 	/* Scheduler is already stopped, nothing to do. */
 	if (!panfrost_scheduler_stop(&pfdev->js->queue[js], sched_job))
-		return DRM_GPU_SCHED_STAT_NOMINAL;
+		return DRM_GPU_SCHED_STAT_BAILING;

 	/* Schedule a reset if there's no reset in progress. */
 	if (!atomic_xchg(&pfdev->reset.pending, 1))

This looks correct to me - in these two cases drm_sched_stop() is not called on the sched_job, so it looks like currently the job will be leaked.

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 92d8de24d0a1..a44f621fb5c4 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -314,6 +314,7 @@ static void drm_sched_job_timedout(struct work_struct *work)
 {
 	struct drm_gpu_scheduler *sched;
 	struct drm_sched_job *job;
+	int ret;

 	sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
@@ -331,8 +332,13 @@ static void drm_sched_job_timedout(struct work_struct *work)
 		list_del_init(&job->list);
 		spin_unlock(&sched->job_list_lock);

-		job->sched->ops->timedout_job(job);
+		ret = job->sched->ops->timedout_job(job);
+		if (ret == DRM_GPU_SCHED_STAT_BAILING) {
+			spin_lock(&sched->job_list_lock);
+			list_add(&job->node, &sched->ring_mirror_list);
+			spin_unlock(&sched->job_list_lock);
+		}

I think we could really do with a comment somewhere explaining what "bailing" means in this context. For the Panfrost case we have two cases:

* The GPU job actually finished while the timeout code was running (done_fence is signalled).

* The GPU is already in the process of being reset (Panfrost has multiple queues, so most likely a bad job in another queue).

I'm also not convinced that (for Panfrost) it makes sense to be adding the jobs back to the list. For the first case above clearly the job could just be freed (it's complete). The second case is more interesting and Panfrost currently doesn't handle this well. In theory the driver could try to rescue the job ('soft sto
Re: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak
On 15/03/2021 05:23, Zhang, Jack (Jian) wrote:

[AMD Public Use]

Hi, Rob/Tomeu/Steven,

Would you please help to review this patch for panfrost driver?

Thanks, Jack Zhang

-Original Message-
From: Jack Zhang
Sent: Monday, March 15, 2021 1:21 PM
To: dri-de...@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; Koenig, Christian ; Grodzovsky, Andrey ; Liu, Monk ; Deng, Emily
Cc: Zhang, Jack (Jian)
Subject: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak

re-insert Bailing jobs to avoid memory leak.

V2: move re-insert step to drm/scheduler logic
V3: add panfrost's return value for bailing jobs in case it hits the memleak issue.

This commit message could do with some work - it's really hard to decipher what the actual problem you're solving is.

Signed-off-by: Jack Zhang
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    | 8 ++--
 drivers/gpu/drm/panfrost/panfrost_job.c    | 4 ++--
 drivers/gpu/drm/scheduler/sched_main.c     | 8 +++-
 include/drm/gpu_scheduler.h                | 1 +
 5 files changed, 19 insertions(+), 6 deletions(-)

[...]

diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
index 6003cfeb1322..e2cb4f32dae1 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -444,7 +444,7 @@ static enum drm_gpu_sched_stat panfrost_job_timedout(struct drm_sched_job
 	 * spurious. Bail out. */
 	if (dma_fence_is_signaled(job->done_fence))
-		return DRM_GPU_SCHED_STAT_NOMINAL;
+		return DRM_GPU_SCHED_STAT_BAILING;

 	dev_err(pfdev->dev, "gpu sched timeout, js=%d, config=0x%x, status=0x%x, head=0x%x, tail=0x%x, sched_job=%p", js,
@@ -456,7 +456,7 @@ static enum drm_gpu_sched_stat panfrost_job_timedout(struct drm_sched_job
 	/* Scheduler is already stopped, nothing to do. */
 	if (!panfrost_scheduler_stop(&pfdev->js->queue[js], sched_job))
-		return DRM_GPU_SCHED_STAT_NOMINAL;
+		return DRM_GPU_SCHED_STAT_BAILING;

 	/* Schedule a reset if there's no reset in progress. */
 	if (!atomic_xchg(&pfdev->reset.pending, 1))

This looks correct to me - in these two cases drm_sched_stop() is not called on the sched_job, so it looks like currently the job will be leaked.

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 92d8de24d0a1..a44f621fb5c4 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -314,6 +314,7 @@ static void drm_sched_job_timedout(struct work_struct *work)
 {
 	struct drm_gpu_scheduler *sched;
 	struct drm_sched_job *job;
+	int ret;

 	sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
@@ -331,8 +332,13 @@ static void drm_sched_job_timedout(struct work_struct *work)
 		list_del_init(&job->list);
 		spin_unlock(&sched->job_list_lock);

-		job->sched->ops->timedout_job(job);
+		ret = job->sched->ops->timedout_job(job);
+		if (ret == DRM_GPU_SCHED_STAT_BAILING) {
+			spin_lock(&sched->job_list_lock);
+			list_add(&job->node, &sched->ring_mirror_list);
+			spin_unlock(&sched->job_list_lock);
+		}

I think we could really do with a comment somewhere explaining what "bailing" means in this context. For the Panfrost case we have two cases:

* The GPU job actually finished while the timeout code was running (done_fence is signalled).

* The GPU is already in the process of being reset (Panfrost has multiple queues, so most likely a bad job in another queue).

I'm also not convinced that (for Panfrost) it makes sense to be adding the jobs back to the list. For the first case above clearly the job could just be freed (it's complete). The second case is more interesting and Panfrost currently doesn't handle this well. In theory the driver could try to rescue the job ('soft stop' in Mali language) so that it could be resubmitted. Panfrost doesn't currently support that, so attempting to resubmit the job is almost certainly going to fail. It's on my TODO list to look at improving Panfrost in this regard, but sadly still quite far down.
Steve

 	/*
 	 * Guilty job did complete and hence needs to be manually removed
 	 * See drm_sched_stop doc.

diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 4ea8606d91fe..8093ac2427ef 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -210,6 +210,7 @@ enum drm_gpu_sched_stat {
 	DRM_GPU_SCHED_STAT_NONE, /* Reserve 0 */
 	DRM_GPU_SCHED_STAT_NOMINAL,
 	DRM_GPU_SCHED_STAT_ENODEV,
+	DRM_GPU_SCHED_STAT_BAILING,
 };

 /**

___ amd-gfx mailing list amd-gfx@lists.freedesktop.org
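The re-insert flow proposed by the patch can be modelled in a few lines of hypothetical Python (this is a toy, not the real scheduler): drm_sched_job_timedout() takes the job off the pending list before calling the driver; if the driver reports BAILING (it did not take ownership of recovery, e.g. the job already completed or a reset is in flight elsewhere), the job is put back so the normal free path can still find it instead of leaking it.

```python
# Toy model of the BAILING re-insert logic discussed in this thread.
# Lists of dicts stand in for the scheduler's job list and drm_sched_job.

NOMINAL, BAILING = "nominal", "bailing"

def sched_job_timedout(pending, timedout_cb):
    job = pending.pop(0)            # like list_del_init(&job->list)
    status = timedout_cb(job)
    if status == BAILING:
        pending.insert(0, job)      # re-insert so the job stays reachable
    return status

def panfrost_like_timedout(job):
    """Stand-in for panfrost_job_timedout() with the patch applied."""
    if job["done_fence_signaled"]:  # raced with completion: bail out
        return BAILING
    job["reset_scheduled"] = True   # otherwise the driver handles recovery
    return NOMINAL

pending = [{"done_fence_signaled": True}]
assert sched_job_timedout(pending, panfrost_like_timedout) == BAILING
assert len(pending) == 1            # job kept reachable, not leaked

pending = [{"done_fence_signaled": False}]
assert sched_job_timedout(pending, panfrost_like_timedout) == NOMINAL
assert pending == []                # driver now owns recovery for this job
```

As Steve notes above, whether re-inserting is actually the right policy for each bailing case (free immediately vs. attempt a rescue) is a separate question; the model only shows how the leak is avoided.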
Re: [PATCH 1/1] drm/scheduler: Job timeout handler returns status (v3)
On 20/01/2021 20:09, Luben Tuikov wrote: This patch does not change current behaviour. The driver's job timeout handler now returns status indicating back to the DRM layer whether the device (GPU) is no longer available, such as after it's been unplugged, or whether all is normal, i.e. current behaviour. All drivers which make use of the drm_sched_backend_ops' .timedout_job() callback have been accordingly renamed and return the would've-been default value of DRM_GPU_SCHED_STAT_NOMINAL to restart the task's timeout timer--this is the old behaviour, and is preserved by this patch. v2: Use enum as the status of a driver's job timeout callback method. v3: Return scheduler/device information, rather than task information. Cc: Alexander Deucher Cc: Andrey Grodzovsky Cc: Christian König Cc: Daniel Vetter Cc: Lucas Stach Cc: Russell King Cc: Christian Gmeiner Cc: Qiang Yu Cc: Rob Herring Cc: Tomeu Vizoso Cc: Steven Price Cc: Alyssa Rosenzweig Cc: Eric Anholt Reported-by: kernel test robot Signed-off-by: Luben Tuikov Acked-by: Steven Price
Re: [PATCH 1/1] drm/scheduler: Job timeout handler returns status (v2)
On 11/12/2020 21:44, Luben Tuikov wrote: On 2020-12-10 4:46 a.m., Steven Price wrote: On 10/12/2020 02:14, Luben Tuikov wrote: This patch does not change current behaviour. The driver's job timeout handler now returns status indicating back to the DRM layer whether the task (job) was successfully aborted or whether more time should be given to the task to complete. I find the definitions given a little confusing, see below. Default behaviour as of this patch, is preserved, except in obvious-by-comment case in the Panfrost driver, as documented below. All drivers which make use of the drm_sched_backend_ops' .timedout_job() callback have been accordingly renamed and return the would've-been default value of DRM_TASK_STATUS_ALIVE to restart the task's timeout timer--this is the old behaviour, and is preserved by this patch. In the case of the Panfrost driver, its timedout callback correctly first checks if the job had completed in due time and if so, it now returns DRM_TASK_STATUS_COMPLETE to notify the DRM layer that the task can be moved to the done list, to be freed later. In the other two subsequent checks, the value of DRM_TASK_STATUS_ALIVE is returned, as per the default behaviour. A more involved driver's solutions can be had in subequent patches. NIT: ^ subsequent Thank you--no idea how my spellcheck and checkpatch.pl missed that. v2: Use enum as the status of a driver's job timeout callback method. Cc: Alexander Deucher Cc: Andrey Grodzovsky Cc: Christian König Cc: Daniel Vetter Cc: Lucas Stach Cc: Russell King Cc: Christian Gmeiner Cc: Qiang Yu Cc: Rob Herring Cc: Tomeu Vizoso Cc: Steven Price Cc: Alyssa Rosenzweig Cc: Eric Anholt Reported-by: kernel test robot This reported-by seems a little odd for this patch. I got this because I wasn't (originally) changing all drivers which were using the task timeout callback. Should I remove it? 
Well the kernel test robot would only have "reported" things like build failures and this patch isn't really 'fixing' anything (as you state it does not change current behaviour). So personally I'm not sure how the kernel test robot triggered you to write this patch. But equally if you were inspired by it and want to credit it, that's fine by me! I'd assumed that this was just a copy-paste error and you'd taken the list of CC's from another patch and accidentally included the Reported-by too.

Signed-off-by: Luben Tuikov
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 6 +++--
 drivers/gpu/drm/etnaviv/etnaviv_sched.c | 10 +++-
 drivers/gpu/drm/lima/lima_sched.c       | 4 +++-
 drivers/gpu/drm/panfrost/panfrost_job.c | 9 ---
 drivers/gpu/drm/scheduler/sched_main.c  | 4 +---
 drivers/gpu/drm/v3d/v3d_sched.c         | 32 +
 include/drm/gpu_scheduler.h             | 20 +---
 7 files changed, 57 insertions(+), 28 deletions(-)

[...]

diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 2e0c368e19f6..cedfc5394e52 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -206,6 +206,11 @@ static inline bool drm_sched_invalidate_job(struct drm_sched_job *s_job,
 	return s_job && atomic_inc_return(&s_job->karma) > threshold;
 }

+enum drm_task_status {
+	DRM_TASK_STATUS_COMPLETE,
+	DRM_TASK_STATUS_ALIVE
+};
+
 /**
  * struct drm_sched_backend_ops
  *
@@ -230,10 +235,19 @@ struct drm_sched_backend_ops {
 	struct dma_fence *(*run_job)(struct drm_sched_job *sched_job);

 	/**
-	 * @timedout_job: Called when a job has taken too long to execute,
-	 * to trigger GPU recovery.
+	 * @timedout_job: Called when a job has taken too long to execute,
+	 * to trigger GPU recovery.
+	 *
+	 * Return DRM_TASK_STATUS_ALIVE, if the task (job) is healthy
+	 * and executing in the hardware, i.e. it needs more time.

So 'alive' means the job (was) alive, and GPU recovery is happening. I.e. it's just that the job takes too long.
Panfrost will trigger a GPU reset (killing the job) in this case while returning DRM_TASK_STATUS_ALIVE. "ALIVE" means "at this moment the task is in the hardware executing, using memories, DMA engines, compute units, the whole thing." You, can also call it active, executing, busy, etc. "ALIVE" makes no assumptions about how the task got there. Maybe this was original submission, and the task is taking its sweet time. Maybe the driver aborted it and reissued it, all in the timer timeout callback, etc. It merely tells the DRM layer that the task is actively executing in the hardware, so no assumptions can be made about freeing memories, decrementing krefs, etc. IOW, it's executing, check back later. Ok, makes sense. Although I think the comment about "it needs more time" could do with a clarification that it's up to the timeout handler wh
Re: [PATCH 1/1] drm/scheduler: Job timeout handler returns status (v2)
On 10/12/2020 02:14, Luben Tuikov wrote: This patch does not change current behaviour. The driver's job timeout handler now returns status indicating back to the DRM layer whether the task (job) was successfully aborted or whether more time should be given to the task to complete. I find the definitions given a little confusing, see below. Default behaviour as of this patch, is preserved, except in obvious-by-comment case in the Panfrost driver, as documented below. All drivers which make use of the drm_sched_backend_ops' .timedout_job() callback have been accordingly renamed and return the would've-been default value of DRM_TASK_STATUS_ALIVE to restart the task's timeout timer--this is the old behaviour, and is preserved by this patch. In the case of the Panfrost driver, its timedout callback correctly first checks if the job had completed in due time and if so, it now returns DRM_TASK_STATUS_COMPLETE to notify the DRM layer that the task can be moved to the done list, to be freed later. In the other two subsequent checks, the value of DRM_TASK_STATUS_ALIVE is returned, as per the default behaviour. A more involved driver's solutions can be had in subequent patches. NIT: ^ subsequent v2: Use enum as the status of a driver's job timeout callback method. Cc: Alexander Deucher Cc: Andrey Grodzovsky Cc: Christian König Cc: Daniel Vetter Cc: Lucas Stach Cc: Russell King Cc: Christian Gmeiner Cc: Qiang Yu Cc: Rob Herring Cc: Tomeu Vizoso Cc: Steven Price Cc: Alyssa Rosenzweig Cc: Eric Anholt Reported-by: kernel test robot This reported-by seems a little odd for this patch. 
Signed-off-by: Luben Tuikov
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 6 +++--
 drivers/gpu/drm/etnaviv/etnaviv_sched.c | 10 +++-
 drivers/gpu/drm/lima/lima_sched.c       | 4 +++-
 drivers/gpu/drm/panfrost/panfrost_job.c | 9 ---
 drivers/gpu/drm/scheduler/sched_main.c  | 4 +---
 drivers/gpu/drm/v3d/v3d_sched.c         | 32 +
 include/drm/gpu_scheduler.h             | 20 +---
 7 files changed, 57 insertions(+), 28 deletions(-)

[...]

diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 2e0c368e19f6..cedfc5394e52 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -206,6 +206,11 @@ static inline bool drm_sched_invalidate_job(struct drm_sched_job *s_job,
 	return s_job && atomic_inc_return(&s_job->karma) > threshold;
 }

+enum drm_task_status {
+	DRM_TASK_STATUS_COMPLETE,
+	DRM_TASK_STATUS_ALIVE
+};
+
 /**
  * struct drm_sched_backend_ops
  *
@@ -230,10 +235,19 @@ struct drm_sched_backend_ops {
 	struct dma_fence *(*run_job)(struct drm_sched_job *sched_job);

 	/**
-	 * @timedout_job: Called when a job has taken too long to execute,
-	 * to trigger GPU recovery.
+	 * @timedout_job: Called when a job has taken too long to execute,
+	 * to trigger GPU recovery.
+	 *
+	 * Return DRM_TASK_STATUS_ALIVE, if the task (job) is healthy
+	 * and executing in the hardware, i.e. it needs more time.

So 'alive' means the job (was) alive, and GPU recovery is happening. I.e. it's just that the job takes too long. Panfrost will trigger a GPU reset (killing the job) in this case while returning DRM_TASK_STATUS_ALIVE.

+	 *
+	 * Return DRM_TASK_STATUS_COMPLETE, if the task (job) has
+	 * been aborted or is unknown to the hardware, i.e. if
+	 * the task is out of the hardware, and maybe it is now
+	 * in the done list, or it was completed long ago, or
+	 * if it is unknown to the hardware.

Where 'complete' seems to mean a variety of things:

* The job completed successfully (i.e. the timeout raced), this is the situation that Panfrost detects. In this case (and only this case) the GPU reset will *not* happen.
* The job failed (aborted) and is no longer on the hardware. Panfrost currently handles a job failure by triggering drm_sched_fault() to trigger the timeout handler. But the timeout handler doesn't handle this differently so will return DRM_TASK_STATUS_ALIVE.

* The job is "unknown to hardware". There are some corner cases in Panfrost (specifically two early returns from panfrost_job_hw_submit()) where the job never actually lands on the hardware, but the scheduler isn't informed. We currently rely on the timeout handling to recover from that. However, again, the timeout handler doesn't know about this so will return DRM_TASK_STATUS_ALIVE.

So of the four cases listed in these comments, Panfrost is only getting 2 'correct' after this change. But what I really want to know is what the scheduler is planning to do in these situations? The Panfrost return value in this patch is really a "did we trigger a GPU reset" - and doesn't seem to match the descriptions above.

Steve

 	 */
-	void (*timedout_job)(struct drm_s
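Steve's four Panfrost timeout cases can be tabulated in a short, hypothetical Python sketch (the case names are invented labels for this illustration, not kernel identifiers). It makes the mismatch explicit: only the raced-with-completion case maps to COMPLETE today, while two of the ALIVE cases arguably should not be ALIVE at all.

```python
# Illustrative mapping of the four Panfrost timeout cases described above
# onto the proposed DRM_TASK_STATUS values. Case names are made up.

COMPLETE, ALIVE = "COMPLETE", "ALIVE"

def panfrost_timedout_status(case):
    if case == "completed_while_timeout_ran":
        return COMPLETE   # done_fence signalled: no GPU reset happens
    if case == "job_failed_aborted":
        return ALIVE      # via drm_sched_fault(): not distinguished today
    if case == "never_reached_hardware":
        return ALIVE      # early return in panfrost_job_hw_submit()
    if case == "still_running_too_long":
        return ALIVE      # genuine timeout: GPU reset kills the job
    raise ValueError(case)

assert panfrost_timedout_status("completed_while_timeout_ran") == COMPLETE
assert all(panfrost_timedout_status(c) == ALIVE for c in
           ("job_failed_aborted", "never_reached_hardware",
            "still_running_too_long"))
```

In other words the return value, as implemented, answers "did we trigger a GPU reset" rather than "is the task still in the hardware", which is the gap the review is pointing at.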
Re: [PATCH 3/6] drm/scheduler: Job timeout handler returns status
On 25/11/2020 11:15, Lucas Stach wrote: On Wednesday, 25.11.2020 at 11:04 +, Steven Price wrote: On 25/11/2020 03:17, Luben Tuikov wrote: The job timeout handler now returns status indicating back to the DRM layer whether the job was successfully cancelled or whether more time should be given to the job to complete. I'm not sure I understand in what circumstances you would want to give the job more time to complete. Could you expand on that? On etnaviv we don't have the ability to preempt a running job, but we can look at the GPU state to determine if it's still making progress with the current job, so we want to extend the timeout in that case to not kill a long running but valid job. Ok, fair enough. Although from my experience (on Mali) jobs very rarely "get stuck"; it's just that their run time can be excessive[1], causing other processes to not make forward progress. So I'd expect the timeout to be set based on how long a job can run before you need to stop it to allow other processes to run their jobs. But I'm not familiar with etnaviv so perhaps stuck jobs are actually a thing there. Thanks, Steve [1] Also on Mali it's quite possible to create an infinite duration job which appears to be making forward progress, so in that case our measure of progress isn't useful against these malicious jobs. Regards, Lucas One thing we're missing at the moment in Panfrost is the ability to suspend ("soft stop" is the Mali jargon) a job and pick something else to run. The proprietary driver stack uses this to avoid timing out long running jobs while still allowing other processes to have time on the GPU. But this interface as it stands doesn't seem to provide that. As the kernel test robot has already pointed out - you'll need to at the very least update the other uses of this interface.
Steve Signed-off-by: Luben Tuikov --- drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 6 -- include/drm/gpu_scheduler.h | 13 ++--- 2 files changed, 14 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c index ff48101bab55..81b73790ecc6 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c @@ -28,7 +28,7 @@ #include "amdgpu.h" #include "amdgpu_trace.h" -static void amdgpu_job_timedout(struct drm_sched_job *s_job) +static int amdgpu_job_timedout(struct drm_sched_job *s_job) { struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched); struct amdgpu_job *job = to_amdgpu_job(s_job); @@ -41,7 +41,7 @@ static void amdgpu_job_timedout(struct drm_sched_job *s_job) amdgpu_ring_soft_recovery(ring, job->vmid, s_job->s_fence->parent)) { DRM_ERROR("ring %s timeout, but soft recovered\n", s_job->sched->name); - return; + return 0; } amdgpu_vm_get_task_info(ring->adev, job->pasid, &ti); @@ -53,10 +53,12 @@ static void amdgpu_job_timedout(struct drm_sched_job *s_job) if (amdgpu_device_should_recover_gpu(ring->adev)) { amdgpu_device_gpu_recover(ring->adev, job); + return 0; } else { drm_sched_suspend_timeout(&ring->sched); if (amdgpu_sriov_vf(adev)) adev->virt.tdr_debug = true; + return 1; } } diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h index 2e0c368e19f6..61f7121e1c19 100644 --- a/include/drm/gpu_scheduler.h +++ b/include/drm/gpu_scheduler.h @@ -230,10 +230,17 @@ struct drm_sched_backend_ops { struct dma_fence *(*run_job)(struct drm_sched_job *sched_job); /** - * @timedout_job: Called when a job has taken too long to execute, - * to trigger GPU recovery. +* @timedout_job: Called when a job has taken too long to execute, +* to trigger GPU recovery. +* +* Return 0, if the job has been aborted successfully and will +* never be heard of from the device. Return non-zero if the +* job wasn't able to be aborted, i.e. if more time should be +* given to this job. 
The result is not "bool" as this +* function is not a predicate, although its result may seem +* as one. */ - void (*timedout_job)(struct drm_sched_job *sched_job); + int (*timedout_job)(struct drm_sched_job *sched_job); /** * @free_job: Called once the job's finished fence has been signaled ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
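The etnaviv-style "extend the timeout while the job makes progress" policy Lucas describes could look roughly like this under the new return-value contract. This is a hedged sketch - the progress counter and all names are invented, not etnaviv's actual code:

```c
#include <stdbool.h>

/* Hypothetical per-scheduler state: where the GPU front end was when
 * the previous timeout fired. */
struct progress_check {
	unsigned long last_fe_addr;
};

/* Returns 0 if the job was aborted (reset path taken), non-zero if the
 * job should be given more time -- matching the new timedout_job()
 * contract in the patch above. */
static int timedout_job_with_progress(struct progress_check *pc,
				      unsigned long cur_fe_addr,
				      bool *reset_triggered)
{
	if (cur_fe_addr != pc->last_fe_addr) {
		/* still making forward progress: extend the timeout */
		pc->last_fe_addr = cur_fe_addr;
		*reset_triggered = false;
		return 1;
	}
	/* no movement since the last timeout: treat as hung and recover */
	*reset_triggered = true;
	return 0;
}
```

Note this is exactly the policy Steve's footnote undermines on Mali: a malicious job can keep the counter moving forever, so "progress" alone can't bound runtime.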
Re: [PATCH 6/6] drm/sched: Make use of a "done" thread
On 25/11/2020 03:17, Luben Tuikov wrote: Add a "done" list to which all completed jobs are added to be freed. The drm_sched_job_done() callback is the producer of jobs to this list. Add a "done" thread which consumes from the done list and frees up jobs. Now, the main scheduler thread only pushes jobs to the GPU and the "done" thread frees them up, on the way out of the GPU when they've completed execution. Generally I'd be in favour of a "done thread" as I think there are some murky corners of Panfrost's locking that would be helped by deferring the free_job() callback. But I think you're trying to do too much in one patch here. And as Christian has pointed out there's some dodgy looking changes to locking which aren't explained. Steve Make use of the status returned by the GPU driver timeout handler to decide whether to leave the job in the pending list, or to send it off to the done list. If a job is done, it is added to the done list and the done thread woken up. If a job needs more time, it is left on the pending list and the timeout timer restarted. Eliminate the polling mechanism of picking out done jobs from the pending list, i.e. eliminate drm_sched_get_cleanup_job(). Now the main scheduler thread only pushes jobs down to the GPU. Various other optimizations to the GPU scheduler and job recovery are possible with this format. Signed-off-by: Luben Tuikov --- drivers/gpu/drm/scheduler/sched_main.c | 173 + include/drm/gpu_scheduler.h| 14 ++ 2 files changed, 101 insertions(+), 86 deletions(-) diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index 3eb7618a627d..289ae68cd97f 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -164,7 +164,8 @@ drm_sched_rq_select_entity(struct drm_sched_rq *rq) * drm_sched_job_done - complete a job * @s_job: pointer to the job which is done * - * Finish the job's fence and wake up the worker thread. 
+ * Finish the job's fence, move it to the done list, + * and wake up the done thread. */ static void drm_sched_job_done(struct drm_sched_job *s_job) { @@ -179,7 +180,12 @@ static void drm_sched_job_done(struct drm_sched_job *s_job) dma_fence_get(&s_fence->finished); drm_sched_fence_finished(s_fence); dma_fence_put(&s_fence->finished); - wake_up_interruptible(&sched->wake_up_worker); + + spin_lock(&sched->job_list_lock); + list_move(&s_job->list, &sched->done_list); + spin_unlock(&sched->job_list_lock); + + wake_up_interruptible(&sched->done_wait_q); } /** @@ -221,11 +227,10 @@ bool drm_sched_dependency_optimized(struct dma_fence* fence, EXPORT_SYMBOL(drm_sched_dependency_optimized); /** - * drm_sched_start_timeout - start timeout for reset worker - * - * @sched: scheduler instance to start the worker for + * drm_sched_start_timeout - start a timeout timer + * @sched: scheduler instance whose job we're timing - * - * Start the timeout for the given scheduler. + * Start a timeout timer for the given scheduler. */ static void drm_sched_start_timeout(struct drm_gpu_scheduler *sched) { @@ -305,8 +310,8 @@ static void drm_sched_job_begin(struct drm_sched_job *s_job) spin_lock(&sched->job_list_lock); list_add_tail(&s_job->list, &sched->pending_list); - drm_sched_start_timeout(sched); spin_unlock(&sched->job_list_lock); + drm_sched_start_timeout(sched); } static void drm_sched_job_timedout(struct work_struct *work) @@ -316,37 +321,30 @@ static void drm_sched_job_timedout(struct work_struct *work) sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work); - /* Protects against concurrent deletion in drm_sched_get_cleanup_job */ spin_lock(&sched->job_list_lock); job = list_first_entry_or_null(&sched->pending_list, struct drm_sched_job, list); + spin_unlock(&sched->job_list_lock); if (job) { - /* -* Remove the bad job so it cannot be freed by concurrent -* drm_sched_cleanup_jobs. It will be reinserted back after sched->thread -* is parked at which point it's safe. 
-*/ - list_del_init(&job->list); - spin_unlock(&sched->job_list_lock); + int res; - job->sched->ops->timedout_job(job); + job->job_status |= DRM_JOB_STATUS_TIMEOUT; + res = job->sched->ops->timedout_job(job); + if (res == 0) { + /* The job is out of the device. +*/ + spin_lock(&sched->job_list_lock); + list_move(&job->list, &sched->done_list); + spin_unlock(&sched->job_list_lock); - /* -* Guilty job did complete and hence needs to be manually removed -* See drm_sched_stop doc. -*/ - if (sched->free_guilty) { -
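The producer side of the "done thread" design - completed jobs moved from the pending list to a done list under one lock, then the consumer woken - can be modelled in plain userspace C. This is a minimal sketch only, with a hand-rolled singly-linked list and pthreads standing in for the kernel's list and wait-queue primitives; all names are made up:

```c
#include <pthread.h>
#include <stddef.h>

struct job {
	struct job *next;
};

struct sched {
	pthread_mutex_t job_list_lock;
	pthread_cond_t done_wait_q;
	struct job *pending;
	struct job *done;
};

/* Model of drm_sched_job_done(): unlink the job from the pending list,
 * push it on the done list, then wake the consumer ("done") thread,
 * which would free jobs off s->done. */
static void job_done(struct sched *s, struct job *j)
{
	pthread_mutex_lock(&s->job_list_lock);
	for (struct job **p = &s->pending; *p; p = &(*p)->next) {
		if (*p == j) {
			*p = j->next;	/* unlink from pending */
			break;
		}
	}
	j->next = s->done;		/* push onto done */
	s->done = j;
	pthread_mutex_unlock(&s->job_list_lock);

	pthread_cond_signal(&s->done_wait_q);
}
```

The property Steve is after for Panfrost falls out of this shape: free_job() runs in the consumer's context, never from the fence callback, so locking in the completion path stays trivial.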
Re: [PATCH 3/6] drm/scheduler: Job timeout handler returns status
On 25/11/2020 03:17, Luben Tuikov wrote: The job timeout handler now returns status indicating back to the DRM layer whether the job was successfully cancelled or whether more time should be given to the job to complete. I'm not sure I understand in what circumstances you would want to give the job more time to complete. Could you expand on that? One thing we're missing at the moment in Panfrost is the ability to suspend ("soft stop" is the Mali jargon) a job and pick something else to run. The proprietary driver stack uses this to avoid timing out long running jobs while still allowing other processes to have time on the GPU. But this interface as it stands doesn't seem to provide that. As the kernel test robot has already pointed out - you'll need to at the very least update the other uses of this interface. Steve Signed-off-by: Luben Tuikov --- drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 6 -- include/drm/gpu_scheduler.h | 13 ++--- 2 files changed, 14 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c index ff48101bab55..81b73790ecc6 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c @@ -28,7 +28,7 @@ #include "amdgpu.h" #include "amdgpu_trace.h" -static void amdgpu_job_timedout(struct drm_sched_job *s_job) +static int amdgpu_job_timedout(struct drm_sched_job *s_job) { struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched); struct amdgpu_job *job = to_amdgpu_job(s_job); @@ -41,7 +41,7 @@ static void amdgpu_job_timedout(struct drm_sched_job *s_job) amdgpu_ring_soft_recovery(ring, job->vmid, s_job->s_fence->parent)) { DRM_ERROR("ring %s timeout, but soft recovered\n", s_job->sched->name); - return; + return 0; } amdgpu_vm_get_task_info(ring->adev, job->pasid, &ti); @@ -53,10 +53,12 @@ static void amdgpu_job_timedout(struct drm_sched_job *s_job) if (amdgpu_device_should_recover_gpu(ring->adev)) { amdgpu_device_gpu_recover(ring->adev, job); + return 
0; } else { drm_sched_suspend_timeout(&ring->sched); if (amdgpu_sriov_vf(adev)) adev->virt.tdr_debug = true; + return 1; } } diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h index 2e0c368e19f6..61f7121e1c19 100644 --- a/include/drm/gpu_scheduler.h +++ b/include/drm/gpu_scheduler.h @@ -230,10 +230,17 @@ struct drm_sched_backend_ops { struct dma_fence *(*run_job)(struct drm_sched_job *sched_job); /** - * @timedout_job: Called when a job has taken too long to execute, - * to trigger GPU recovery. +* @timedout_job: Called when a job has taken too long to execute, +* to trigger GPU recovery. +* +* Return 0, if the job has been aborted successfully and will +* never be heard of from the device. Return non-zero if the +* job wasn't able to be aborted, i.e. if more time should be +* given to this job. The result is not "bool" as this +* function is not a predicate, although its result may seem +* as one. */ - void (*timedout_job)(struct drm_sched_job *sched_job); + int (*timedout_job)(struct drm_sched_job *sched_job); /** * @free_job: Called once the job's finished fence has been signaled
Re: [PATCH] mm/hmm: Simplify hmm_vma_walk_pud slightly
On 12/03/2020 16:37, Jason Gunthorpe wrote: On Thu, Mar 12, 2020 at 04:16:33PM +, Steven Price wrote: Actually, while you are looking at this, do you think we should be adding at least READ_ONCE in the pagewalk.c walk_* functions? The multiple references of pmd, pud, etc without locking seems sketchy to me. I agree it seems worrying. I'm not entirely sure whether the holding of mmap_sem is sufficient, I looked at this question, and at least for PMD, mmap_sem is not sufficient. I didn't easily figure it out for the other ones I'm guessing if PMD is not safe then none of them are. this isn't something that I changed so I've just been hoping that it's sufficient since it seems to have been working (whether that's by chance because the compiler didn't generate multiple reads I've no idea). For walking the kernel's page tables the lack of READ_ONCE is also not great, but at least for PTDUMP we don't care too much about accuracy and it should be crash proof because there's no RCU grace period. And again the code I was replacing didn't have any special protection. I can't see any harm in updating the code to include READ_ONCE and I'm happy to review a patch. The reason I ask is because hmm's walkers often have this pattern where they get the pointer and then de-ref it (again) then immediately have to recheck the 'again' conditions of the walker itself because the re-read may have given a different value. Having the walker deref the pointer and pass the value it into the ops for use rather than repeatedly de-refing an unlocked value seems like a much safer design to me. Yeah that sounds like a good idea. If this also implicitly relies on a RCU grace period then it is also missing RCU locking... True - I'm not 100% sure in what situations a page table entry can be freed. Anshuman has added locking to deal with memory hotplug[1]. I believe this is sufficient. 
[1] bf2b59f60ee1 ("arm64/mm: Hold memory hotplug lock while walking for kernel page table dump") I also didn't quite understand why walk_pte_range() skipped locking the pte in the no_vma case - I don't get why vma would be related to locking here. The no_vma case is for walking the kernel's page tables and they may have entries that are not backed by struct page, so there isn't (reliably) a PTE lock to take. I also saw that hmm open coded the pte walk, presumably for performance, so I was thinking of adding some kind of pte_range() callback to avoid the expensive indirect function call per pte, but hmm also can't have the pmd locked... Yeah the callback per PTE is a bit heavy because of the indirect function call. I'm not sure how to optimise it beyond open coding at the PMD level. One option would be to provide helper functions to make it a bit more generic. Do you have an idea of what pte_range() would look like? Steve
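The "deref once, pass the value" design Jason describes can be sketched in userspace C. READ_ONCE is approximated here with a volatile cast, and pmd_val_t, walk_pmd() and the callback are hypothetical names, not the real pagewalk API:

```c
/* Single-read snapshot of a location the hardware (or another CPU)
 * may be modifying concurrently. */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

typedef unsigned long pmd_val_t;

/* The walker dereferences the entry exactly once and hands the *value*
 * to the op, so the op can never observe a second, different snapshot
 * of the same entry. */
static int walk_pmd(const pmd_val_t *pmdp,
		    int (*cb)(pmd_val_t val, void *priv), void *priv)
{
	pmd_val_t val = READ_ONCE(*pmdp);	/* the one and only read */
	return cb(val, priv);
}

/* Trivial op used below: report whether the snapshot was non-zero. */
static int pmd_present_cb(pmd_val_t val, void *priv)
{
	(void)priv;
	return val != 0;
}
```

The 'again' rechecks hmm currently does exist precisely because its ops re-read the pointer; passing the value by copy makes those rechecks unnecessary by construction.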
Re: [PATCH] mm/hmm: Simplify hmm_vma_walk_pud slightly
On 12/03/2020 15:11, Jason Gunthorpe wrote: On Thu, Mar 12, 2020 at 02:40:08PM +, Steven Price wrote: On 12/03/2020 14:27, Jason Gunthorpe wrote: On Thu, Mar 12, 2020 at 10:28:13AM +, Steven Price wrote: By refactoring to deal with the !pud_huge(pud) || !pud_devmap(pud) condition early it's possible to remove the 'ret' variable and remove a level of indentation from half the function making the code easier to read. No functional change. Signed-off-by: Steven Price Thanks to Jason's changes there were only two code paths left using the out_unlock label so it seemed like a good opportunity to refactor. Yes, I made something very similar, what do you think of this: https://github.com/jgunthorpe/linux/commit/93f0ed42ab3f9ceb27b58fb7c7c3ecaf60f16b36 Even better! Sorry I didn't realise you'd already done this. I just saw that the function was needlessly complicated after your fix, so I thought I'd do a drive-by cleanup since part of the mess was my fault! :) No worries, I've got a lot of patches for hmm_range_fault right now, just trying to organize them, test them and post them. Haven't posted that one yet. Actually, while you are looking at this, do you think we should be adding at least READ_ONCE in the pagewalk.c walk_* functions? The multiple references of pmd, pud, etc without locking seems sketchy to me. I agree it seems worrying. I'm not entirely sure whether the holding of mmap_sem is sufficient, this isn't something that I changed so I've just been hoping that it's sufficient since it seems to have been working (whether that's by chance because the compiler didn't generate multiple reads I've no idea). For walking the kernel's page tables the lack of READ_ONCE is also not great, but at least for PTDUMP we don't care too much about accuracy and it should be crash proof because there's no RCU grace period. And again the code I was replacing didn't have any special protection. 
I can't see any harm in updating the code to include READ_ONCE and I'm happy to review a patch. Thanks, Steve
Re: [PATCH] mm/hmm: Simplify hmm_vma_walk_pud slightly
On 12/03/2020 14:27, Jason Gunthorpe wrote: On Thu, Mar 12, 2020 at 10:28:13AM +, Steven Price wrote: By refactoring to deal with the !pud_huge(pud) || !pud_devmap(pud) condition early it's possible to remove the 'ret' variable and remove a level of indentation from half the function making the code easier to read. No functional change. Signed-off-by: Steven Price --- Thanks to Jason's changes there were only two code paths left using the out_unlock label so it seemed like a good opportunity to refactor. Yes, I made something very similar, what do you think of this: https://github.com/jgunthorpe/linux/commit/93f0ed42ab3f9ceb27b58fb7c7c3ecaf60f16b36 Even better! Sorry I didn't realise you'd already done this. I just saw that the function was needlessly complicated after your fix, so I thought I'd do a drive-by cleanup since part of the mess was my fault! :) Thanks, Steve
Re: [PATCH hmm 3/8] mm/hmm: do not call hmm_vma_walk_hole() while holding a spinlock
On 11/03/2020 18:35, Jason Gunthorpe wrote: From: Jason Gunthorpe This eventually calls into handle_mm_fault() which is a sleeping function. Release the lock first. hmm_vma_walk_hole() does not touch the contents of the PUD, so it does not need the lock. Fixes: 3afc423632a1 ("mm: pagewalk: add p4d_entry() and pgd_entry()") Cc: Steven Price Signed-off-by: Jason Gunthorpe Sorry about that, thanks for fixing. Reviewed-by: Steven Price --- mm/hmm.c | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/mm/hmm.c b/mm/hmm.c index 9e8f68eb83287a..32dcbfd3908315 100644 --- a/mm/hmm.c +++ b/mm/hmm.c @@ -473,8 +473,8 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end, pud = READ_ONCE(*pudp); if (pud_none(pud)) { - ret = hmm_vma_walk_hole(start, end, -1, walk); - goto out_unlock; + spin_unlock(ptl); + return hmm_vma_walk_hole(start, end, -1, walk); } if (pud_huge(pud) && pud_devmap(pud)) { @@ -483,8 +483,8 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end, bool fault, write_fault; if (!pud_present(pud)) { - ret = hmm_vma_walk_hole(start, end, -1, walk); - goto out_unlock; + spin_unlock(ptl); + return hmm_vma_walk_hole(start, end, -1, walk); } i = (addr - range->start) >> PAGE_SHIFT; @@ -495,9 +495,9 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end, hmm_range_need_fault(hmm_vma_walk, pfns, npages, cpu_flags, &fault, &write_fault); if (fault || write_fault) { - ret = hmm_vma_walk_hole_(addr, end, fault, -write_fault, walk); - goto out_unlock; + spin_unlock(ptl); + return hmm_vma_walk_hole_(addr, end, fault, write_fault, + walk); } pfn = pud_pfn(pud) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
Re: [PATCH v3 2/2] drm/scheduler: Avoid accessing freed bad job.
On 25/11/2019 14:10, Andrey Grodzovsky wrote: > Problem: > Due to a race between drm_sched_cleanup_jobs in sched thread and > drm_sched_job_timedout in timeout work there is a possibility that > bad job was already freed while still being accessed from the > timeout thread. > > Fix: > Instead of just peeking at the bad job in the mirror list > remove it from the list under lock and then put it back later when > we are guaranteed no race with main sched thread is possible which > is after the thread is parked. > > v2: Lock around processing ring_mirror_list in drm_sched_cleanup_jobs. > > v3: Rebase on top of drm-misc-next. v2 is not needed anymore as > drm_sched_cleanup_jobs already has a lock there. > > Signed-off-by: Andrey Grodzovsky > Reviewed-by: Christian König > Tested-by: Emily Deng > --- > drivers/gpu/drm/scheduler/sched_main.c | 29 + > 1 file changed, 29 insertions(+) > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c > b/drivers/gpu/drm/scheduler/sched_main.c > index 6774955..a604dfa 100644 > --- a/drivers/gpu/drm/scheduler/sched_main.c > +++ b/drivers/gpu/drm/scheduler/sched_main.c > @@ -284,10 +284,24 @@ static void drm_sched_job_timedout(struct work_struct > *work) > unsigned long flags; > > sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work); > + > + /* > + * Protects against concurrent deletion in drm_sched_cleanup_jobs that ^ drm_sched_cleanup_jobs() is gone - replaced with drm_sched_get_cleanup_job() and code in drm_sched_main(). See: 588b9828f074 ("drm: Don't free jobs in wait_event_interruptible()") > + * is already in progress. > + */ > + spin_lock_irqsave(&sched->job_list_lock, flags); > job = list_first_entry_or_null(&sched->ring_mirror_list, > struct drm_sched_job, node); > > if (job) { > + /* > + * Remove the bad job so it cannot be freed by already in > progress > + * drm_sched_cleanup_jobs. It will be reinsrted back after > sched->thread ^ Typo: s/reinsrted/reinserted/ > + * is parked at which point it's safe. 
> + */ > + list_del_init(&job->node); > + spin_unlock_irqrestore(&sched->job_list_lock, flags); > + > job->sched->ops->timedout_job(job); > > /* > @@ -298,6 +312,8 @@ static void drm_sched_job_timedout(struct work_struct > *work) > job->sched->ops->free_job(job); > sched->free_guilty = false; > } > + } else { > + spin_unlock_irqrestore(&sched->job_list_lock, flags); > } > > spin_lock_irqsave(&sched->job_list_lock, flags); > @@ -370,6 +386,19 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, > struct drm_sched_job *bad) > kthread_park(sched->thread); > > /* > + * Reinsert back the bad job here - now it's safe as > drm_sched_cleanup_jobs Another reference to drm_sched_cleanup_jobs. > + * cannot race against us and release the bad job at this point - we > parked > + * (waited for) any in progress (earlier) cleanups and any later ones > will > + * bail out due to sched->thread being parked. As explained in the other patch - after the kthread_park() has returned there will be no more calls to drm_sched_get_cleanup_job() until the thread is unparked. Steve > + */ > + if (bad && bad->sched == sched) > + /* > + * Add at the head of the queue to reflect it was the earliest > + * job extracted. > + */ > + list_add(&bad->node, &sched->ring_mirror_list); > + > + /* >* Iterate the job list from later to earlier one and either deactive >* their HW callbacks or remove them from mirror list if they already >* signaled. >
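The core pattern of this fix - take the job off the shared list under the lock so the concurrent cleanup path can no longer find (and free) it, then reinsert it at the head once the worker is parked - can be modelled in userspace like this. A sketch only: pthreads stands in for the kernel spinlock and the names are illustrative:

```c
#include <pthread.h>
#include <stddef.h>

struct node {
	struct node *next;
};

struct job_list {
	pthread_mutex_t lock;
	struct node *head;
};

/* Pop the first job under the lock. Once it is off the list, a
 * concurrent cleanup pass iterating the same list cannot reach it,
 * so it cannot be freed under our feet. */
static struct node *claim_first(struct job_list *l)
{
	pthread_mutex_lock(&l->lock);
	struct node *n = l->head;
	if (n)
		l->head = n->next;
	pthread_mutex_unlock(&l->lock);
	return n;
}

/* Reinsert at the head -- reflecting that it was the earliest job --
 * after the worker thread has been parked (the drm_sched_stop() step). */
static void reinsert_head(struct job_list *l, struct node *n)
{
	pthread_mutex_lock(&l->lock);
	n->next = l->head;
	l->head = n;
	pthread_mutex_unlock(&l->lock);
}
```

The window the patch closes is exactly the one the model makes visible: between peeking at the head and using it, nothing previously stopped the other thread from freeing it.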
Re: [PATCH v3 1/2] drm/sched: Avoid job cleanup if sched thread is parked.
On 25/11/2019 14:10, Andrey Grodzovsky wrote: > When the sched thread is parked we assume ring_mirror_list is > not accessed from here. FWIW I don't think this is necessary. kthread_park() will wait until the thread is parked, at which point the thread is stuck in kthread_parkme() until unparked. So all this does is avoid waiting for any cleanup jobs before parking - which might be a reasonable goal in itself, but if so lets at least document that. Steve > > Signed-off-by: Andrey Grodzovsky > Reviewed-by: Christian König > --- > drivers/gpu/drm/scheduler/sched_main.c | 10 +++--- > 1 file changed, 7 insertions(+), 3 deletions(-) > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c > b/drivers/gpu/drm/scheduler/sched_main.c > index d4cc728..6774955 100644 > --- a/drivers/gpu/drm/scheduler/sched_main.c > +++ b/drivers/gpu/drm/scheduler/sched_main.c > @@ -635,9 +635,13 @@ drm_sched_get_cleanup_job(struct drm_gpu_scheduler > *sched) > struct drm_sched_job *job; > unsigned long flags; > > - /* Don't destroy jobs while the timeout worker is running */ > - if (sched->timeout != MAX_SCHEDULE_TIMEOUT && > - !cancel_delayed_work(&sched->work_tdr)) > + /* > + * Don't destroy jobs while the timeout worker is running OR thread > + * is being parked and hence assumed to not touch ring_mirror_list > + */ > + if ((sched->timeout != MAX_SCHEDULE_TIMEOUT && > + !cancel_delayed_work(&sched->work_tdr)) || > + __kthread_should_park(sched->thread)) > return NULL; > > spin_lock_irqsave(&sched->job_list_lock, flags); >
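The combined guard the patch introduces can be read as a small predicate. This is a model with the three inputs made explicit, not the kernel code:

```c
#include <stdbool.h>

/* Model of the guard in drm_sched_get_cleanup_job() after this patch:
 * cleanup may proceed only if (a) the timeout worker cannot be running
 * (infinite timeout, or we successfully cancelled the pending TDR work)
 * and (b) the scheduler thread is not being parked. */
static bool may_cleanup(bool timeout_is_finite, bool tdr_work_cancelled,
			bool thread_parked)
{
	if (timeout_is_finite && !tdr_work_cancelled)
		return false;	/* timeout handler may be touching the list */
	if (thread_parked)
		return false;	/* parked thread must leave ring_mirror_list alone */
	return true;
}
```

Steve's point maps onto the model directly: the thread_parked check only skips cleanup work before parking; it is not what makes parking safe, since kthread_park() already waits for the thread to reach kthread_parkme().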