Re: [PATCH] drm/gem: Expose the buffer object handle to userspace last

2023-02-15 Thread Steven Price
On 14/02/2023 12:50, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin 
> 
> Currently drm_gem_handle_create_tail exposes the handle to userspace
> before the buffer object construction is complete. This allows
> working against a partially constructed object, which may also be in
> the process of having its creation fail, and can have a range of
> negative outcomes.
> 
> A lot of those will depend on what the individual drivers are doing in
> their obj->funcs->open() callbacks, with a common failure mode being
> -ENOMEM from drm_vma_node_allow.
> 
> We can make sure none of this can happen by allocating a handle last,
> although with a downside that more of the function now runs under the
> dev->object_name_lock.
> 
> Looking into the individual drivers' open() hooks, we have
> amdgpu_gem_object_open which seems like it could have a potential security
> issue without this change.
> 
> A couple of drivers, like qxl_gem_object_open and vmw_gem_object_open,
> implement no-op hooks so there is no impact for them.
> 
> A bunch of others require a deeper look by individual owners to assess the
> impact. Those are lima_gem_object_open, nouveau_gem_object_open,
> panfrost_gem_open, radeon_gem_object_open and virtio_gpu_gem_object_open.

I've looked over the panfrost code, and I can't see how this could
create a security hole there. It looks like there's a path which can
confuse the shrinker (so objects might not be purged when they could
be[1]) but they would be freed properly in the normal path - so no worse
than user space could already do.

[1] gpu_usecount is incremented in panfrost_lookup_bos() per bo, but not
decremented on failure.
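
(The usual shape of the fix would be an unwind on the error path - a generic,
untested sketch with illustrative placeholder names, not the actual panfrost
code:)

	/* Undo the per-BO gpu_usecount bumps already taken when a later
	 * iteration of the lookup loop fails, so the shrinker still sees
	 * those BOs as purgeable. */
	for (i = 0; i < bo_count; i++) {
		mapping = acquire_mapping(bos[i]);	/* placeholder helper */
		if (!mapping) {
			ret = -EINVAL;
			goto err_unwind;
		}
		atomic_inc(&bos[i]->gpu_usecount);
		mappings[i] = mapping;
	}
	return 0;

err_unwind:
	while (i--) {
		atomic_dec(&bos[i]->gpu_usecount);
		release_mapping(mappings[i]);		/* placeholder helper */
	}
	return ret;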

> Putting aside the risk assessment of the above, some common scenarios to
> think about are along these lines:
> 
> 1)
> Userspace closes a handle by speculatively "guessing" it from a second
> thread.
> 
> This results in an unreachable buffer object, so a memory leak.
> 
> 2)
> Same as 1), but the object is in the process of getting closed (failed
> creation).
> 
> The second thread is then able to re-cycle the handle and idr_remove in
> the first thread would then remove a handle it does not own from the
> idr.

This, however, looks plausible - and I can see how this could
potentially trigger a security hole in user space.

> 3)
> Going back to the earlier per driver problem space - individual impact
> assessment of allowing a second thread to access and operate on a partially
> constructed handle / object. (Can something crash? Leak information?)
> 
> In terms of identifying when the problem started I will tag some patches
> as references, but not all, if even any, of them actually point to a
> broken state. I am just identifying points at which more opportunity for
> issues to arise was added.
> 
> References: 304eda32920b ("drm/gem: add hooks to notify driver when object 
> handle is created/destroyed")
> References: ca481c9b2a3a ("drm/gem: implement vma access management")
> References: b39b5394fabc ("drm/gem: Add drm_gem_object_funcs")
> Cc: dri-de...@lists.freedesktop.org
> Cc: Rob Clark 
> Cc: Ben Skeggs 
> Cc: David Herrmann 
> Cc: Noralf Trønnes 
> Cc: David Airlie 
> Cc: Daniel Vetter 
> Cc: amd-gfx@lists.freedesktop.org
> Cc: l...@lists.freedesktop.org
> Cc: nouv...@lists.freedesktop.org
> Cc: Steven Price 
> Cc: virtualizat...@lists.linux-foundation.org
> Cc: spice-de...@lists.freedesktop.org
> Cc: Zack Rusin 

FWIW I think this makes the code easier to reason about, so

Reviewed-by: Steven Price 

> ---
>  drivers/gpu/drm/drm_gem.c | 48 +++
>  1 file changed, 24 insertions(+), 24 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
> index aa15c52ae182..e3d897bca0f2 100644
> --- a/drivers/gpu/drm/drm_gem.c
> +++ b/drivers/gpu/drm/drm_gem.c
> @@ -356,52 +356,52 @@ drm_gem_handle_create_tail(struct drm_file *file_priv,
>  u32 *handlep)
>  {
>   struct drm_device *dev = obj->dev;
> - u32 handle;
>   int ret;
>  
>   WARN_ON(!mutex_is_locked(&dev->object_name_lock));
>   if (obj->handle_count++ == 0)
>   drm_gem_object_get(obj);
>  
> + ret = drm_vma_node_allow(&obj->vma_node, file_priv);
> + if (ret)
> + goto err_put;
> +
> + if (obj->funcs->open) {
> + ret = obj->funcs->open(obj, file_priv);
> + if (ret)
> + goto err_revoke;
> + }
> +
>   /*
> -  * Get the user-visible handle using idr.  Preload and perform
> -  * allocation under our spinlock.
> +  * Get the user-visible handle

Re: [PATCH v6 04/22] drm/panfrost: Fix shrinker list corruption by madvise IOCTL

2022-05-30 Thread Steven Price
On 27/05/2022 00:50, Dmitry Osipenko wrote:
> Calling the madvise IOCTL twice on a BO causes memory shrinker list corruption
> and crashes the kernel, because the BO is already on the list and is added to
> the list again, while the BO should be removed from the list before it's
> re-added. Fix it.
> 
> Cc: sta...@vger.kernel.org
> Fixes: 013b65101315 ("drm/panfrost: Add madvise and shrinker support")
> Signed-off-by: Dmitry Osipenko 

Reviewed-by: Steven Price 

> ---
>  drivers/gpu/drm/panfrost/panfrost_drv.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c 
> b/drivers/gpu/drm/panfrost/panfrost_drv.c
> index 087e69b98d06..b1e6d238674f 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_drv.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
> @@ -433,8 +433,8 @@ static int panfrost_ioctl_madvise(struct drm_device *dev, 
> void *data,
>  
>   if (args->retained) {
>   if (args->madv == PANFROST_MADV_DONTNEED)
> - list_add_tail(&bo->base.madv_list,
> -   &pfdev->shrinker_list);
> + list_move_tail(&bo->base.madv_list,
> +   &pfdev->shrinker_list);
>   else if (args->madv == PANFROST_MADV_WILLNEED)
>   list_del_init(&bo->base.madv_list);
>   }



Re: [bug report] new kvmalloc() WARN() triggered by DRM ioctls tracking

2021-12-16 Thread Steven Price
On 16/12/2021 14:15, Boris Brezillon wrote:
> Hi Steve,
> 
> On Thu, 16 Dec 2021 14:02:25 +
> Steven Price  wrote:
> 
>> + Boris
>>
>> On 16/12/2021 12:08, Dan Carpenter wrote:
>>> Hi DRM Devs,
>>>
>>> In commit 7661809d493b ("mm: don't allow oversized kvmalloc() calls")
>>> from July, Linus added a WARN_ONCE() for "crazy" allocations over 2GB.
>>> I have a static checker warning for this and most of the warnings are
>>> from DRM ioctls.
>>>
>>> drivers/gpu/drm/lima/lima_drv.c:124 lima_ioctl_gem_submit() warn: uncapped 
>>> user size for kvmalloc() will WARN
>>> drivers/gpu/drm/radeon/radeon_cs.c:291 radeon_cs_parser_init() warn: 
>>> uncapped user size for kvmalloc() will WARN
>>> drivers/gpu/drm/v3d/v3d_gem.c:311 v3d_lookup_bos() warn: uncapped user size 
>>> for kvmalloc() will WARN
>>> drivers/gpu/drm/v3d/v3d_gem.c:319 v3d_lookup_bos() warn: uncapped user size 
>>> for kvmalloc() will WARN
>>> drivers/gpu/drm/v3d/v3d_gem.c:601 v3d_get_multisync_post_deps() warn: 
>>> uncapped user size for kvmalloc() will WARN
>>> drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c:476 etnaviv_ioctl_gem_submit() 
>>> warn: uncapped user size for kvmalloc() will WARN
>>> drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c:477 etnaviv_ioctl_gem_submit() 
>>> warn: uncapped user size for kvmalloc() will WARN
>>> drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c:478 etnaviv_ioctl_gem_submit() 
>>> warn: uncapped user size for kvmalloc() will WARN
>>> drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c:479 etnaviv_ioctl_gem_submit() 
>>> warn: uncapped user size for kvmalloc() will WARN
>>> drivers/gpu/drm/virtio/virtgpu_ioctl.c:186 virtio_gpu_execbuffer_ioctl() 
>>> warn: uncapped user size for kvmalloc() will WARN
>>> drivers/gpu/drm/panfrost/panfrost_drv.c:198 panfrost_copy_in_sync() warn: 
>>> uncapped user size for kvmalloc() will WARN  
>>
>> I believe this one in Panfrost would be fixed by Boris's series
>> reworking the submit ioctl[1].
>>
>> Boris: are you planning on submitting that series soon - or is it worth
>> cherry picking the rework in patch 5 to fix this issue?
> 
> Don't know when I'll get back to it, so I'd recommend cherry-picking
> what you need.

Thanks, no problem - it was mostly that when I looked at the code I had the
feeling that "surely this has already been fixed", and then discovered your
series was never merged ;)

I'll hammer out a patch for this one issue.

Thanks,

Steve


Re: [bug report] new kvmalloc() WARN() triggered by DRM ioctls tracking

2021-12-16 Thread Steven Price
+ Boris

On 16/12/2021 12:08, Dan Carpenter wrote:
> Hi DRM Devs,
> 
> In commit 7661809d493b ("mm: don't allow oversized kvmalloc() calls")
> from July, Linus added a WARN_ONCE() for "crazy" allocations over 2GB.
> I have a static checker warning for this and most of the warnings are
> from DRM ioctls.
> 
> drivers/gpu/drm/lima/lima_drv.c:124 lima_ioctl_gem_submit() warn: uncapped 
> user size for kvmalloc() will WARN
> drivers/gpu/drm/radeon/radeon_cs.c:291 radeon_cs_parser_init() warn: uncapped 
> user size for kvmalloc() will WARN
> drivers/gpu/drm/v3d/v3d_gem.c:311 v3d_lookup_bos() warn: uncapped user size 
> for kvmalloc() will WARN
> drivers/gpu/drm/v3d/v3d_gem.c:319 v3d_lookup_bos() warn: uncapped user size 
> for kvmalloc() will WARN
> drivers/gpu/drm/v3d/v3d_gem.c:601 v3d_get_multisync_post_deps() warn: 
> uncapped user size for kvmalloc() will WARN
> drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c:476 etnaviv_ioctl_gem_submit() 
> warn: uncapped user size for kvmalloc() will WARN
> drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c:477 etnaviv_ioctl_gem_submit() 
> warn: uncapped user size for kvmalloc() will WARN
> drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c:478 etnaviv_ioctl_gem_submit() 
> warn: uncapped user size for kvmalloc() will WARN
> drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c:479 etnaviv_ioctl_gem_submit() 
> warn: uncapped user size for kvmalloc() will WARN
> drivers/gpu/drm/virtio/virtgpu_ioctl.c:186 virtio_gpu_execbuffer_ioctl() 
> warn: uncapped user size for kvmalloc() will WARN
> drivers/gpu/drm/panfrost/panfrost_drv.c:198 panfrost_copy_in_sync() warn: 
> uncapped user size for kvmalloc() will WARN

I believe this one in Panfrost would be fixed by Boris's series
reworking the submit ioctl[1].

Boris: are you planning on submitting that series soon - or is it worth
cherry picking the rework in patch 5 to fix this issue?

[1]
https://lore.kernel.org/all/20210930190954.1525933-1-boris.brezil...@collabora.com/

> drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c:120 amdgpu_cs_parser_init() warn: 
> uncapped user size for kvmalloc() will WARN
> 
> These ioctls can all trigger the stack dump.  The line numbers are from
> linux next (next-20211214).
> 
> I feel like ideally if this could be fixed in a central way, but if not
> then hopefully I've added the relevant lists to the CC.

I've only looked at Panfrost, but at least here we're simply allowing
user space to allocate an arbitrary amount of kernel memory in one go -
which is always going to be a good way of triggering the OOM killer if
nothing else. Boris's series includes a change that means instead of trying
to copy an (attacker controlled) sized array into the kernel to process,
we copy each element of the array in turn.
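
(Roughly the difference in shape - a simplified, untested sketch with
made-up placeholder names, not the actual code from either series:)

	/* Before: one allocation sized directly by a userspace-controlled
	 * count - a huge count hits the kvmalloc() WARN (or the OOM killer). */
	handles = kvmalloc_array(count, sizeof(u32), GFP_KERNEL);
	if (!handles)
		return -ENOMEM;
	if (copy_from_user(handles, user_ptr, count * sizeof(u32))) {
		kvfree(handles);
		return -EFAULT;
	}
	for (i = 0; i < count; i++)
		process_handle(handles[i]);	/* placeholder */
	kvfree(handles);

	/* After: copy and process one element at a time, so no kernel
	 * allocation is sized by an unchecked userspace value. */
	for (i = 0; i < count; i++) {
		u32 handle;

		if (get_user(handle, user_ptr + i))
			return -EFAULT;
		process_handle(handle);		/* placeholder */
	}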

So I don't really see how this could be fixed in a central way (but some
of the other cases might be different).

Thanks,

Steve

> regards,
> dan carpenter
> 



Re: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak

2021-03-26 Thread Steven Price

On 26/03/2021 02:04, Zhang, Jack (Jian) wrote:

[AMD Official Use Only - Internal Distribution Only]

Hi, Steve,

Thank you for your detailed comments.

But currently the patch is not finalized.
We found some potential race conditions even with this patch. The solution is
under discussion and hopefully we can find an ideal one.
After that, I will start to consider whether it will influence other drm
drivers (other than amdgpu).


No problem. Please keep me CC'd, the suggestion of using reference 
counts may be beneficial for Panfrost as we already build a reference 
count on top of struct drm_sched_job. So there may be scope for cleaning 
up Panfrost afterwards even if your work doesn't directly affect it.


Thanks,

Steve


Best,
Jack

-Original Message-
From: Steven Price 
Sent: Monday, March 22, 2021 11:29 PM
To: Zhang, Jack (Jian) ; dri-de...@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; 
Koenig, Christian ; Grodzovsky, Andrey ; Liu, Monk 
; Deng, Emily ; Rob Herring ; Tomeu Vizoso 

Subject: Re: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak

On 15/03/2021 05:23, Zhang, Jack (Jian) wrote:

[AMD Public Use]

Hi, Rob/Tomeu/Steven,

Would you please help to review this patch for panfrost driver?

Thanks,
Jack Zhang

-Original Message-
From: Jack Zhang 
Sent: Monday, March 15, 2021 1:21 PM
To: dri-de...@lists.freedesktop.org; amd-gfx@lists.freedesktop.org;
Koenig, Christian ; Grodzovsky, Andrey
; Liu, Monk ; Deng, Emily

Cc: Zhang, Jack (Jian) 
Subject: [PATCH v3] drm/scheduler re-insert Bailing job to avoid
memleak

re-insert Bailing jobs to avoid memory leak.

V2: move re-insert step to drm/scheduler logic
V3: add panfrost's return value for bailing jobs in case it hits the
memleak issue.


This commit message could do with some work - it's really hard to decipher what 
the actual problem you're solving is.



Signed-off-by: Jack Zhang 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +++-
   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c| 8 ++--
   drivers/gpu/drm/panfrost/panfrost_job.c| 4 ++--
   drivers/gpu/drm/scheduler/sched_main.c | 8 +++-
   include/drm/gpu_scheduler.h| 1 +
   5 files changed, 19 insertions(+), 6 deletions(-)


[...]

diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c
b/drivers/gpu/drm/panfrost/panfrost_job.c
index 6003cfeb1322..e2cb4f32dae1 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -444,7 +444,7 @@ static enum drm_gpu_sched_stat panfrost_job_timedout(struct 
drm_sched_job
* spurious. Bail out.
*/
   if (dma_fence_is_signaled(job->done_fence))
-return DRM_GPU_SCHED_STAT_NOMINAL;
+return DRM_GPU_SCHED_STAT_BAILING;

   dev_err(pfdev->dev, "gpu sched timeout, js=%d, config=0x%x, status=0x%x, 
head=0x%x, tail=0x%x, sched_job=%p",
   js,
@@ -456,7 +456,7 @@ static enum drm_gpu_sched_stat
panfrost_job_timedout(struct drm_sched_job

   /* Scheduler is already stopped, nothing to do. */
    if (!panfrost_scheduler_stop(&pfdev->js->queue[js], sched_job))
-return DRM_GPU_SCHED_STAT_NOMINAL;
+return DRM_GPU_SCHED_STAT_BAILING;

   /* Schedule a reset if there's no reset in progress. */
    if (!atomic_xchg(&pfdev->reset.pending, 1))


This looks correct to me - in these two cases drm_sched_stop() is not called on 
the sched_job, so it looks like currently the job will be leaked.


diff --git a/drivers/gpu/drm/scheduler/sched_main.c
b/drivers/gpu/drm/scheduler/sched_main.c
index 92d8de24d0a1..a44f621fb5c4 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -314,6 +314,7 @@ static void drm_sched_job_timedout(struct work_struct *work)
   {
   struct drm_gpu_scheduler *sched;
   struct drm_sched_job *job;
+int ret;

   sched = container_of(work, struct drm_gpu_scheduler,
work_tdr.work);

@@ -331,8 +332,13 @@ static void drm_sched_job_timedout(struct work_struct 
*work)
   list_del_init(&job->list);
   spin_unlock(&sched->job_list_lock);

-job->sched->ops->timedout_job(job);
+ret = job->sched->ops->timedout_job(job);

+if (ret == DRM_GPU_SCHED_STAT_BAILING) {
+spin_lock(&sched->job_list_lock);
+list_add(&job->node, &sched->ring_mirror_list);
+spin_unlock(&sched->job_list_lock);
+}


I think we could really do with a comment somewhere explaining what "bailing" 
means in this context. For the Panfrost case we have two cases:

   * The GPU job actually finished while the timeout code was running 
(done_fence is signalled).

   * The GPU is already in the process of being reset (Panfrost has multiple 
queues, so most likely a bad job in another queue).

I'm also not convinced that (for Panfrost) it makes sense to be adding the jobs 
back to the list. For the first case above clearly the job could just be freed 
(it's complete). The second case is more interesting and Panfrost currently 
doesn't handle this well. In theory the driver could try to rescue the job 
('soft stop' in Mali language) so that it could be resubmitted. Panfrost 
doesn't currently support that, so attempting to resubmit the job is almost 
certainly going to fail.

Re: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak

2021-03-22 Thread Steven Price

On 15/03/2021 05:23, Zhang, Jack (Jian) wrote:

[AMD Public Use]

Hi, Rob/Tomeu/Steven,

Would you please help to review this patch for panfrost driver?

Thanks,
Jack Zhang

-Original Message-
From: Jack Zhang 
Sent: Monday, March 15, 2021 1:21 PM
To: dri-de...@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; Koenig, Christian 
; Grodzovsky, Andrey ; Liu, Monk 
; Deng, Emily 
Cc: Zhang, Jack (Jian) 
Subject: [PATCH v3] drm/scheduler re-insert Bailing job to avoid memleak

re-insert Bailing jobs to avoid memory leak.

V2: move re-insert step to drm/scheduler logic
V3: add panfrost's return value for bailing jobs
in case it hits the memleak issue.


This commit message could do with some work - it's really hard to 
decipher what the actual problem you're solving is.




Signed-off-by: Jack Zhang 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +++-
  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c| 8 ++--
  drivers/gpu/drm/panfrost/panfrost_job.c| 4 ++--
  drivers/gpu/drm/scheduler/sched_main.c | 8 +++-
  include/drm/gpu_scheduler.h| 1 +
  5 files changed, 19 insertions(+), 6 deletions(-)


[...]

diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c 
b/drivers/gpu/drm/panfrost/panfrost_job.c
index 6003cfeb1322..e2cb4f32dae1 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -444,7 +444,7 @@ static enum drm_gpu_sched_stat panfrost_job_timedout(struct 
drm_sched_job
 * spurious. Bail out.
 */
if (dma_fence_is_signaled(job->done_fence))
-   return DRM_GPU_SCHED_STAT_NOMINAL;
+   return DRM_GPU_SCHED_STAT_BAILING;
  
  	dev_err(pfdev->dev, "gpu sched timeout, js=%d, config=0x%x, status=0x%x, head=0x%x, tail=0x%x, sched_job=%p",

js,
@@ -456,7 +456,7 @@ static enum drm_gpu_sched_stat panfrost_job_timedout(struct 
drm_sched_job
  
  	/* Scheduler is already stopped, nothing to do. */

if (!panfrost_scheduler_stop(&pfdev->js->queue[js], sched_job))
-   return DRM_GPU_SCHED_STAT_NOMINAL;
+   return DRM_GPU_SCHED_STAT_BAILING;
  
  	/* Schedule a reset if there's no reset in progress. */

if (!atomic_xchg(&pfdev->reset.pending, 1))


This looks correct to me - in these two cases drm_sched_stop() is not 
called on the sched_job, so it looks like currently the job will be leaked.



diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index 92d8de24d0a1..a44f621fb5c4 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -314,6 +314,7 @@ static void drm_sched_job_timedout(struct work_struct *work)
  {
struct drm_gpu_scheduler *sched;
struct drm_sched_job *job;
+   int ret;
  
  	sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
  
@@ -331,8 +332,13 @@ static void drm_sched_job_timedout(struct work_struct *work)

list_del_init(&job->list);
spin_unlock(&sched->job_list_lock);
  
-		job->sched->ops->timedout_job(job);

+   ret = job->sched->ops->timedout_job(job);
  
+		if (ret == DRM_GPU_SCHED_STAT_BAILING) {

+   spin_lock(&sched->job_list_lock);
+   list_add(&job->node, &sched->ring_mirror_list);
+   spin_unlock(&sched->job_list_lock);
+   }


I think we could really do with a comment somewhere explaining what 
"bailing" means in this context. For the Panfrost case we have two cases:


 * The GPU job actually finished while the timeout code was running 
(done_fence is signalled).


 * The GPU is already in the process of being reset (Panfrost has 
multiple queues, so most likely a bad job in another queue).


I'm also not convinced that (for Panfrost) it makes sense to be adding 
the jobs back to the list. For the first case above clearly the job 
could just be freed (it's complete). The second case is more interesting 
and Panfrost currently doesn't handle this well. In theory the driver 
could try to rescue the job ('soft stop' in Mali language) so that it 
could be resubmitted. Panfrost doesn't currently support that, so 
attempting to resubmit the job is almost certainly going to fail.


It's on my TODO list to look at improving Panfrost in this regard, but 
sadly still quite far down.


Steve


/*
 * Guilty job did complete and hence needs to be manually 
removed
 * See drm_sched_stop doc.
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 4ea8606d91fe..8093ac2427ef 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -210,6 +210,7 @@ enum drm_gpu_sched_stat {
DRM_GPU_SCHED_STAT_NONE, /* Reserve 0 */
DRM_GPU_SCHED_STAT_NOMINAL,
DRM_GPU_SCHED_STAT_ENODEV,
+   DRM_GPU_SCHED_STAT_BAILING,
  };
  
  /**





Re: [PATCH 1/1] drm/scheduler: Job timeout handler returns status (v3)

2021-01-21 Thread Steven Price

On 20/01/2021 20:09, Luben Tuikov wrote:

This patch does not change current behaviour.

The driver's job timeout handler now returns
status indicating back to the DRM layer whether
the device (GPU) is no longer available, such as
after it's been unplugged, or whether all is
normal, i.e. current behaviour.

All drivers which make use of the
drm_sched_backend_ops' .timedout_job() callback
have been accordingly renamed and return the
would've-been default value of
DRM_GPU_SCHED_STAT_NOMINAL to restart the task's
timeout timer--this is the old behaviour, and is
preserved by this patch.

v2: Use enum as the status of a driver's job
 timeout callback method.

v3: Return scheduler/device information, rather
 than task information.

Cc: Alexander Deucher 
Cc: Andrey Grodzovsky 
Cc: Christian König 
Cc: Daniel Vetter 
Cc: Lucas Stach 
Cc: Russell King 
Cc: Christian Gmeiner 
Cc: Qiang Yu 
Cc: Rob Herring 
Cc: Tomeu Vizoso 
Cc: Steven Price 
Cc: Alyssa Rosenzweig 
Cc: Eric Anholt 
Reported-by: kernel test robot 
Signed-off-by: Luben Tuikov 


Acked-by: Steven Price 


Re: [PATCH 1/1] drm/scheduler: Job timeout handler returns status (v2)

2020-12-14 Thread Steven Price

On 11/12/2020 21:44, Luben Tuikov wrote:

On 2020-12-10 4:46 a.m., Steven Price wrote:

On 10/12/2020 02:14, Luben Tuikov wrote:

This patch does not change current behaviour.

The driver's job timeout handler now returns
status indicating back to the DRM layer whether
the task (job) was successfully aborted or whether
more time should be given to the task to complete.


I find the definitions given a little confusing, see below.


Default behaviour as of this patch, is preserved,
except in obvious-by-comment case in the Panfrost
driver, as documented below.

All drivers which make use of the
drm_sched_backend_ops' .timedout_job() callback
have been accordingly renamed and return the
would've-been default value of
DRM_TASK_STATUS_ALIVE to restart the task's
timeout timer--this is the old behaviour, and
is preserved by this patch.

In the case of the Panfrost driver, its timedout
callback correctly first checks if the job had
completed in due time and if so, it now returns
DRM_TASK_STATUS_COMPLETE to notify the DRM layer
that the task can be moved to the done list, to be
freed later. In the other two subsequent checks,
the value of DRM_TASK_STATUS_ALIVE is returned, as
per the default behaviour.

A more involved driver's solutions can be had
in subequent patches.


NIT: ^ subsequent


Thank you--no idea how my spellcheck and checkpatch.pl missed that.





v2: Use enum as the status of a driver's job
  timeout callback method.

Cc: Alexander Deucher 
Cc: Andrey Grodzovsky 
Cc: Christian König 
Cc: Daniel Vetter 
Cc: Lucas Stach 
Cc: Russell King 
Cc: Christian Gmeiner 
Cc: Qiang Yu 
Cc: Rob Herring 
Cc: Tomeu Vizoso 
Cc: Steven Price 
Cc: Alyssa Rosenzweig 
Cc: Eric Anholt 
Reported-by: kernel test robot 


This reported-by seems a little odd for this patch.


I got this because I wasn't (originally) changing all drivers
which were using the task timeout callback.
Should I remove it?


Well the kernel test robot would only have "reported" things like build 
failures and this patch isn't really 'fixing' anything (as you state it 
does not change current behaviour). So personally I'm not sure how the 
kernel test robot triggered you to write this patch. But equally if you 
were inspired by it and want to credit it, that's fine by me! I'd 
assumed that this was just a copy-paste error and you'd taken the list 
of CC's from another patch and accidentally included the Reported-by too.





Signed-off-by: Luben Tuikov 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  6 +++--
   drivers/gpu/drm/etnaviv/etnaviv_sched.c | 10 +++-
   drivers/gpu/drm/lima/lima_sched.c   |  4 +++-
   drivers/gpu/drm/panfrost/panfrost_job.c |  9 ---
   drivers/gpu/drm/scheduler/sched_main.c  |  4 +---
   drivers/gpu/drm/v3d/v3d_sched.c | 32 +
   include/drm/gpu_scheduler.h | 20 +---
   7 files changed, 57 insertions(+), 28 deletions(-)



[]


diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 2e0c368e19f6..cedfc5394e52 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -206,6 +206,11 @@ static inline bool drm_sched_invalidate_job(struct 
drm_sched_job *s_job,
	return s_job && atomic_inc_return(&s_job->karma) > threshold;
   }
   
+enum drm_task_status {

+   DRM_TASK_STATUS_COMPLETE,
+   DRM_TASK_STATUS_ALIVE
+};
+
   /**
* struct drm_sched_backend_ops
*
@@ -230,10 +235,19 @@ struct drm_sched_backend_ops {
struct dma_fence *(*run_job)(struct drm_sched_job *sched_job);
   
   	/**

- * @timedout_job: Called when a job has taken too long to execute,
- * to trigger GPU recovery.
+* @timedout_job: Called when a job has taken too long to execute,
+* to trigger GPU recovery.
+*
+* Return DRM_TASK_STATUS_ALIVE, if the task (job) is healthy
+* and executing in the hardware, i.e. it needs more time.


So 'alive' means the job (was) alive, and GPU recovery is happening.
I.e. it's just that the job takes too long. Panfrost will trigger a GPU reset
(killing the job) in this case while returning DRM_TASK_STATUS_ALIVE.


"ALIVE" means "at this moment the task is in the hardware executing,
using memories, DMA engines, compute units, the whole thing." You,
can also call it active, executing, busy, etc.

"ALIVE" makes no assumptions about how the task got there. Maybe
this was original submission, and the task is taking its sweet time.
Maybe the driver aborted it and reissued it, all in the timer timeout
callback, etc.

It merely tells the DRM layer that the task is actively executing
in the hardware, so no assumptions can be made about freeing memories,
decrementing krefs, etc. IOW, it's executing, check back later.


Ok, makes sense. Although I think the comment about "it needs more time" 
could do with a clarification that it's up to the timeout handler 
wh

Re: [PATCH 1/1] drm/scheduler: Job timeout handler returns status (v2)

2020-12-10 Thread Steven Price

On 10/12/2020 02:14, Luben Tuikov wrote:

This patch does not change current behaviour.

The driver's job timeout handler now returns
status indicating back to the DRM layer whether
the task (job) was successfully aborted or whether
more time should be given to the task to complete.


I find the definitions given a little confusing, see below.


Default behaviour as of this patch, is preserved,
except in obvious-by-comment case in the Panfrost
driver, as documented below.

All drivers which make use of the
drm_sched_backend_ops' .timedout_job() callback
have been accordingly renamed and return the
would've-been default value of
DRM_TASK_STATUS_ALIVE to restart the task's
timeout timer--this is the old behaviour, and
is preserved by this patch.

In the case of the Panfrost driver, its timedout
callback correctly first checks if the job had
completed in due time and if so, it now returns
DRM_TASK_STATUS_COMPLETE to notify the DRM layer
that the task can be moved to the done list, to be
freed later. In the other two subsequent checks,
the value of DRM_TASK_STATUS_ALIVE is returned, as
per the default behaviour.

A more involved driver's solutions can be had
in subequent patches.


NIT: ^ subsequent



v2: Use enum as the status of a driver's job
 timeout callback method.

Cc: Alexander Deucher 
Cc: Andrey Grodzovsky 
Cc: Christian König 
Cc: Daniel Vetter 
Cc: Lucas Stach 
Cc: Russell King 
Cc: Christian Gmeiner 
Cc: Qiang Yu 
Cc: Rob Herring 
Cc: Tomeu Vizoso 
Cc: Steven Price 
Cc: Alyssa Rosenzweig 
Cc: Eric Anholt 
Reported-by: kernel test robot 


This reported-by seems a little odd for this patch.


Signed-off-by: Luben Tuikov 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  6 +++--
  drivers/gpu/drm/etnaviv/etnaviv_sched.c | 10 +++-
  drivers/gpu/drm/lima/lima_sched.c   |  4 +++-
  drivers/gpu/drm/panfrost/panfrost_job.c |  9 ---
  drivers/gpu/drm/scheduler/sched_main.c  |  4 +---
  drivers/gpu/drm/v3d/v3d_sched.c | 32 +
  include/drm/gpu_scheduler.h | 20 +---
  7 files changed, 57 insertions(+), 28 deletions(-)



[]


diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 2e0c368e19f6..cedfc5394e52 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -206,6 +206,11 @@ static inline bool drm_sched_invalidate_job(struct 
drm_sched_job *s_job,
	return s_job && atomic_inc_return(&s_job->karma) > threshold;
  }
  
+enum drm_task_status {

+   DRM_TASK_STATUS_COMPLETE,
+   DRM_TASK_STATUS_ALIVE
+};
+
  /**
   * struct drm_sched_backend_ops
   *
@@ -230,10 +235,19 @@ struct drm_sched_backend_ops {
struct dma_fence *(*run_job)(struct drm_sched_job *sched_job);
  
  	/**

- * @timedout_job: Called when a job has taken too long to execute,
- * to trigger GPU recovery.
+* @timedout_job: Called when a job has taken too long to execute,
+* to trigger GPU recovery.
+*
+* Return DRM_TASK_STATUS_ALIVE, if the task (job) is healthy
+* and executing in the hardware, i.e. it needs more time.


So 'alive' means the job (was) alive, and GPU recovery is happening. 
I.e. it's just that the job takes too long. Panfrost will trigger a GPU reset 
(killing the job) in this case while returning DRM_TASK_STATUS_ALIVE.



+*
+* Return DRM_TASK_STATUS_COMPLETE, if the task (job) has
+* been aborted or is unknown to the hardware, i.e. if
+* the task is out of the hardware, and maybe it is now
+* in the done list, or it was completed long ago, or
+* if it is unknown to the hardware.


Where 'complete' seems to mean a variety of things:

 * The job completed successfully (i.e. the timeout raced), this is the 
situation that Panfrost detects. In this case (and only this case) the 
GPU reset will *not* happen.


 * The job failed (aborted) and is no longer on the hardware. Panfrost 
currently handles a job failure by triggering drm_sched_fault() to 
trigger the timeout handler. But the timeout handler doesn't handle this 
differently so will return DRM_TASK_STATUS_ALIVE.


 * The job is "unknown to hardware". There are some corner cases in 
Panfrost (specifically two early returns from panfrost_job_hw_submit()) 
where the job never actually lands on the hardware, but the scheduler 
isn't informed. We currently rely on the timeout handling to recover 
from that. However, again, the timeout handler doesn't know about this 
so will return DRM_TASK_STATUS_ALIVE.


So of the four cases listed in these comments, Panfrost is only getting 
2 'correct' after this change.


But what I really want to know is what the scheduler is planning to do 
in these situations? The Panfrost return value in this patch is really a 
"did we trigger a GPU reset" - and doesn't seem to match the 
descriptions above.


Steve


 */
-   void (*timedout_job)(struct drm_sched_job *sched_job);

Re: [PATCH 3/6] drm/scheduler: Job timeout handler returns status

2020-11-25 Thread Steven Price

On 25/11/2020 11:15, Lucas Stach wrote:

Am Mittwoch, den 25.11.2020, 11:04 + schrieb Steven Price:

On 25/11/2020 03:17, Luben Tuikov wrote:

The job timeout handler now returns status
indicating back to the DRM layer whether the job
was successfully cancelled or whether more time
should be given to the job to complete.


I'm not sure I understand in what circumstances you would want to give
the job more time to complete. Could you expand on that?


On etnaviv we don't have the ability to preempt a running job, but we
can look at the GPU state to determine if it's still making progress
with the current job, so we want to extend the timeout in that case to
not kill a long running but valid job.


Ok, fair enough. Although from my experience (on Mali) jobs very rarely 
"get stuck" it's just that their run time can be excessive[1] causing 
other processes to not make forward progress. So I'd expect the timeout 
to be set based on how long a job can run before you need to stop it to 
allow other processes to run their jobs.


But I'm not familiar with etnaviv so perhaps stuck jobs are actually a 
thing there.


Thanks,

Steve

[1] Also on Mali it's quite possible to create an infinite duration job 
which appears to be making forward progress, so in that case our measure 
of progress isn't useful against these malicious jobs.



Regards,
Lucas


One thing we're missing at the moment in Panfrost is the ability to
suspend ("soft stop" is the Mali jargon) a job and pick something else
to run. The proprietary driver stack uses this to avoid timing out long
running jobs while still allowing other processes to have time on the
GPU. But this interface as it stands doesn't seem to provide that.

As the kernel test robot has already pointed out - you'll need to at the
very least update the other uses of this interface.

Steve


Signed-off-by: Luben Tuikov 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  6 --
   include/drm/gpu_scheduler.h | 13 ++---
   2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index ff48101bab55..81b73790ecc6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -28,7 +28,7 @@
   #include "amdgpu.h"
   #include "amdgpu_trace.h"
   
-static void amdgpu_job_timedout(struct drm_sched_job *s_job)

+static int amdgpu_job_timedout(struct drm_sched_job *s_job)
   {
struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched);
struct amdgpu_job *job = to_amdgpu_job(s_job);
@@ -41,7 +41,7 @@ static void amdgpu_job_timedout(struct drm_sched_job *s_job)
amdgpu_ring_soft_recovery(ring, job->vmid, s_job->s_fence->parent)) 
{
DRM_ERROR("ring %s timeout, but soft recovered\n",
  s_job->sched->name);
-   return;
+   return 0;
}
   
  	amdgpu_vm_get_task_info(ring->adev, job->pasid, &ti);

@@ -53,10 +53,12 @@ static void amdgpu_job_timedout(struct drm_sched_job *s_job)
   
   	if (amdgpu_device_should_recover_gpu(ring->adev)) {

amdgpu_device_gpu_recover(ring->adev, job);
+   return 0;
} else {
	drm_sched_suspend_timeout(&ring->sched);
if (amdgpu_sriov_vf(adev))
adev->virt.tdr_debug = true;
+   return 1;
}
   }
   
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h

index 2e0c368e19f6..61f7121e1c19 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -230,10 +230,17 @@ struct drm_sched_backend_ops {
struct dma_fence *(*run_job)(struct drm_sched_job *sched_job);
   
   	/**

- * @timedout_job: Called when a job has taken too long to execute,
- * to trigger GPU recovery.
+* @timedout_job: Called when a job has taken too long to execute,
+* to trigger GPU recovery.
+*
+* Return 0, if the job has been aborted successfully and will
+* never be heard of from the device. Return non-zero if the
+* job wasn't able to be aborted, i.e. if more time should be
+* given to this job. The result is not "bool" as this
+* function is not a predicate, although its result may seem
+* as one.
 */
-   void (*timedout_job)(struct drm_sched_job *sched_job);
+   int (*timedout_job)(struct drm_sched_job *sched_job);
   
   	/**

* @free_job: Called once the job's finished fence has been signaled







Re: [PATCH 6/6] drm/sched: Make use of a "done" thread

2020-11-25 Thread Steven Price

On 25/11/2020 03:17, Luben Tuikov wrote:

Add a "done" list to which all completed jobs are added
to be freed. The drm_sched_job_done() callback is the
producer of jobs to this list.

Add a "done" thread which consumes from the done list
and frees up jobs. Now, the main scheduler thread only
pushes jobs to the GPU and the "done" thread frees them
up, on the way out of the GPU when they've completed
execution.


Generally I'd be in favour of a "done thread" as I think there are some 
murky corners of Panfrost's locking that would be helped by deferring 
the free_job() callback.


But I think you're trying to do too much in one patch here. And as 
Christian has pointed out there are some dodgy-looking changes to locking 
which aren't explained.


Steve



Make use of the status returned by the GPU driver
timeout handler to decide whether to leave the job in
the pending list, or to send it off to the done list.
If a job is done, it is added to the done list and the
done thread woken up. If a job needs more time, it is
left on the pending list and the timeout timer
restarted.

Eliminate the polling mechanism of picking out done
jobs from the pending list, i.e. eliminate
drm_sched_get_cleanup_job(). Now the main scheduler
thread only pushes jobs down to the GPU.

Various other optimizations to the GPU scheduler
and job recovery are possible with this format.

Signed-off-by: Luben Tuikov 
---
  drivers/gpu/drm/scheduler/sched_main.c | 173 +
  include/drm/gpu_scheduler.h|  14 ++
  2 files changed, 101 insertions(+), 86 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index 3eb7618a627d..289ae68cd97f 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -164,7 +164,8 @@ drm_sched_rq_select_entity(struct drm_sched_rq *rq)
   * drm_sched_job_done - complete a job
   * @s_job: pointer to the job which is done
   *
- * Finish the job's fence and wake up the worker thread.
+ * Finish the job's fence, move it to the done list,
+ * and wake up the done thread.
   */
  static void drm_sched_job_done(struct drm_sched_job *s_job)
  {
@@ -179,7 +180,12 @@ static void drm_sched_job_done(struct drm_sched_job *s_job)
	dma_fence_get(&s_fence->finished);
	drm_sched_fence_finished(s_fence);
	dma_fence_put(&s_fence->finished);
-   wake_up_interruptible(&sched->wake_up_worker);
+
+   spin_lock(&sched->job_list_lock);
+   list_move(&s_job->list, &sched->done_list);
+   spin_unlock(&sched->job_list_lock);
+
+   wake_up_interruptible(&sched->done_wait_q);
  }
  
  /**

@@ -221,11 +227,10 @@ bool drm_sched_dependency_optimized(struct dma_fence* 
fence,
  EXPORT_SYMBOL(drm_sched_dependency_optimized);
  
  /**

- * drm_sched_start_timeout - start timeout for reset worker
- *
- * @sched: scheduler instance to start the worker for
+ * drm_sched_start_timeout - start a timeout timer
+ * @sched: scheduler instance whose job we're timing
   *
- * Start the timeout for the given scheduler.
+ * Start a timeout timer for the given scheduler.
   */
  static void drm_sched_start_timeout(struct drm_gpu_scheduler *sched)
  {
@@ -305,8 +310,8 @@ static void drm_sched_job_begin(struct drm_sched_job *s_job)
  
  	spin_lock(&sched->job_list_lock);

	list_add_tail(&s_job->list, &sched->pending_list);
-   drm_sched_start_timeout(sched);
	spin_unlock(&sched->job_list_lock);
+   drm_sched_start_timeout(sched);
  }
  
  static void drm_sched_job_timedout(struct work_struct *work)

@@ -316,37 +321,30 @@ static void drm_sched_job_timedout(struct work_struct 
*work)
  
  	sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
  
-	/* Protects against concurrent deletion in drm_sched_get_cleanup_job */

	spin_lock(&sched->job_list_lock);
	job = list_first_entry_or_null(&sched->pending_list,
	   struct drm_sched_job, list);
+   spin_unlock(&sched->job_list_lock);
  
  	if (job) {

-   /*
-* Remove the bad job so it cannot be freed by concurrent
-* drm_sched_cleanup_jobs. It will be reinserted back after 
sched->thread
-* is parked at which point it's safe.
-*/
-   list_del_init(&job->list);
-   spin_unlock(&sched->job_list_lock);
+   int res;
  
-		job->sched->ops->timedout_job(job);

+   job->job_status |= DRM_JOB_STATUS_TIMEOUT;
+   res = job->sched->ops->timedout_job(job);
+   if (res == 0) {
+   /* The job is out of the device.
+*/
+   spin_lock(&sched->job_list_lock);
+   list_move(&job->list, &sched->done_list);
+   spin_unlock(&sched->job_list_lock);
  
-		/*

-* Guilty job did complete and hence needs to be manually 
removed
-* See drm_sched_stop doc.
-*/
-   if (sched->free_guilty) {
-   

Re: [PATCH 3/6] drm/scheduler: Job timeout handler returns status

2020-11-25 Thread Steven Price

On 25/11/2020 03:17, Luben Tuikov wrote:

The job timeout handler now returns status
indicating back to the DRM layer whether the job
was successfully cancelled or whether more time
should be given to the job to complete.


I'm not sure I understand in what circumstances you would want to give 
the job more time to complete. Could you expand on that?


One thing we're missing at the moment in Panfrost is the ability to 
suspend ("soft stop" is the Mali jargon) a job and pick something else 
to run. The proprietary driver stack uses this to avoid timing out long 
running jobs while still allowing other processes to have time on the 
GPU. But this interface as it stands doesn't seem to provide that.


As the kernel test robot has already pointed out - you'll need to at the 
very least update the other uses of this interface.


Steve



Signed-off-by: Luben Tuikov 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  6 --
  include/drm/gpu_scheduler.h | 13 ++---
  2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index ff48101bab55..81b73790ecc6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -28,7 +28,7 @@
  #include "amdgpu.h"
  #include "amdgpu_trace.h"
  
-static void amdgpu_job_timedout(struct drm_sched_job *s_job)

+static int amdgpu_job_timedout(struct drm_sched_job *s_job)
  {
struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched);
struct amdgpu_job *job = to_amdgpu_job(s_job);
@@ -41,7 +41,7 @@ static void amdgpu_job_timedout(struct drm_sched_job *s_job)
amdgpu_ring_soft_recovery(ring, job->vmid, s_job->s_fence->parent)) 
{
DRM_ERROR("ring %s timeout, but soft recovered\n",
  s_job->sched->name);
-   return;
+   return 0;
}
  
  	amdgpu_vm_get_task_info(ring->adev, job->pasid, &ti);

@@ -53,10 +53,12 @@ static void amdgpu_job_timedout(struct drm_sched_job *s_job)
  
  	if (amdgpu_device_should_recover_gpu(ring->adev)) {

amdgpu_device_gpu_recover(ring->adev, job);
+   return 0;
} else {
	drm_sched_suspend_timeout(&ring->sched);
if (amdgpu_sriov_vf(adev))
adev->virt.tdr_debug = true;
+   return 1;
}
  }
  
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h

index 2e0c368e19f6..61f7121e1c19 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -230,10 +230,17 @@ struct drm_sched_backend_ops {
struct dma_fence *(*run_job)(struct drm_sched_job *sched_job);
  
  	/**

- * @timedout_job: Called when a job has taken too long to execute,
- * to trigger GPU recovery.
+* @timedout_job: Called when a job has taken too long to execute,
+* to trigger GPU recovery.
+*
+* Return 0, if the job has been aborted successfully and will
+* never be heard of from the device. Return non-zero if the
+* job wasn't able to be aborted, i.e. if more time should be
+* given to this job. The result is not "bool" as this
+* function is not a predicate, although its result may seem
+* as one.
 */
-   void (*timedout_job)(struct drm_sched_job *sched_job);
+   int (*timedout_job)(struct drm_sched_job *sched_job);
  
  	/**

   * @free_job: Called once the job's finished fence has been signaled





Re: [PATCH] mm/hmm: Simplify hmm_vma_walk_pud slightly

2020-03-12 Thread Steven Price

On 12/03/2020 16:37, Jason Gunthorpe wrote:

On Thu, Mar 12, 2020 at 04:16:33PM +, Steven Price wrote:

Actually, while you are looking at this, do you think we should be
adding at least READ_ONCE in the pagewalk.c walk_* functions? The
multiple references of pmd, pud, etc without locking seems sketchy to
me.


I agree it seems worrying. I'm not entirely sure whether the holding of
mmap_sem is sufficient,


I looked at this question, and at least for PMD, mmap_sem is not
sufficient. I didn't easilly figure it out for the other ones

I'm guessing if PMD is not safe then none of them are.


this isn't something that I changed so I've just
been hoping that it's sufficient since it seems to have been working
(whether that's by chance because the compiler didn't generate multiple
reads I've no idea). For walking the kernel's page tables the lack of
READ_ONCE is also not great, but at least for PTDUMP we don't care too much
about accuracy and it should be crash proof because there's no RCU grace
period. And again the code I was replacing didn't have any special
protection.

I can't see any harm in updating the code to include READ_ONCE and I'm happy
to review a patch.


The reason I ask is because hmm's walkers often have this pattern
where they get the pointer and then de-ref it (again) then
immediately have to recheck the 'again' conditions of the walker
itself because the re-read may have given a different value.

Having the walker deref the pointer and pass the value into the ops
for use rather than repeatedly de-refing an unlocked value seems like
a much safer design to me.


Yeah that sounds like a good idea.
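
(i.e. something along these lines - purely an illustrative sketch, not the
existing pagewalk API; the callback here takes the entry by value so it can
never observe a second, different read:)

	static int walk_pud_range(pud_t *pudp, unsigned long addr,
				  unsigned long end, struct mm_walk *walk)
	{
		/* single snapshot of the (unlocked) entry */
		pud_t pud = READ_ONCE(*pudp);

		if (pud_none(pud))
			return do_hole(addr, end, walk);	/* illustrative helper */

		/* the op only ever sees the snapshot, never *pudp */
		return walk->ops->pud_entry_val(pud, addr, end, walk);
	}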


If this also implicitly relies on a RCU grace period then it is also
missing RCU locking...


True - I'm not 100% sure in what situations a page table entry can be 
freed. Anshuman has added locking to deal with memory hotplug[1]. I 
believe this is sufficient.


[1] bf2b59f60ee1 ("arm64/mm: Hold memory hotplug lock while walking for 
kernel page table dump")



I also didn't quite understand why walk_pte_range() skipped locking
the pte in the no_vma case - I don't get why vma would be related to
locking here.


The no_vma case is for walking the kernel's page tables and they may 
have entries that are not backed by struct page, so there isn't 
(reliably) a PTE lock to take.



I also saw that hmm open coded the pte walk, presumably for
performance, so I was thinking of adding some kind of pte_range()
callback to avoid the expensive indirect function call per pte, but
hmm also can't have the pmd locked...


Yeah the callback per PTE is a bit heavy because of the indirect 
function call. I'm not sure how to optimise it beyond open coding at the 
PMD level. One option would be to provide helper functions to make it a 
bit more generic.


Do you have an idea of what pte_range() would look like?
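
(For the sake of discussion, perhaps something vaguely like this - completely
hypothetical, names invented, not an existing interface:)

	/* Hypothetical: one callback per contiguous run of PTEs, called with
	 * the PTE table already mapped by the walker, so there is a single
	 * indirect call per range rather than one per PTE. */
	int (*pte_range)(pte_t *ptep, unsigned long addr, unsigned long end,
			 struct mm_walk *walk);

The callback would then iterate the PTEs itself, keeping the indirect call
out of the inner loop.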

Steve


Re: [PATCH] mm/hmm: Simplify hmm_vma_walk_pud slightly

2020-03-12 Thread Steven Price

On 12/03/2020 15:11, Jason Gunthorpe wrote:

On Thu, Mar 12, 2020 at 02:40:08PM +, Steven Price wrote:

On 12/03/2020 14:27, Jason Gunthorpe wrote:

On Thu, Mar 12, 2020 at 10:28:13AM +, Steven Price wrote:

By refactoring to deal with the !pud_huge(pud) || !pud_devmap(pud)
condition early it's possible to remove the 'ret' variable and remove a
level of indentation from half the function making the code easier to
read.

No functional change.

Signed-off-by: Steven Price 
Thanks to Jason's changes there were only two code paths left using
the out_unlock label so it seemed like a good opportunity to
refactor.


Yes, I made something very similar, what do you think of this:

https://github.com/jgunthorpe/linux/commit/93f0ed42ab3f9ceb27b58fb7c7c3ecaf60f16b36


Even better! Sorry I didn't realise you'd already done this. I just saw that
the function was needlessly complicated after your fix, so I thought I'd do
a drive-by cleanup since part of the mess was my fault! :)


No worries, I've got a lot of patches for hmm_range_fault right now,
just trying to organize them, test them and post them. Haven't posted
that one yet.

Actually, while you are looking at this, do you think we should be
adding at least READ_ONCE in the pagewalk.c walk_* functions? The
multiple references of pmd, pud, etc without locking seems sketchy to
me.


I agree it seems worrying. I'm not entirely sure whether the holding of 
mmap_sem is sufficient, this isn't something that I changed so I've just 
been hoping that it's sufficient since it seems to have been working 
(whether that's by chance because the compiler didn't generate multiple 
reads I've no idea). For walking the kernel's page tables the lack of 
READ_ONCE is also not great, but at least for PTDUMP we don't care too 
much about accuracy and it should be crash proof because there's no RCU 
grace period. And again the code I was replacing didn't have any special 
protection.


I can't see any harm in updating the code to include READ_ONCE and I'm 
happy to review a patch.


Thanks,

Steve


Re: [PATCH] mm/hmm: Simplify hmm_vma_walk_pud slightly

2020-03-12 Thread Steven Price

On 12/03/2020 14:27, Jason Gunthorpe wrote:

On Thu, Mar 12, 2020 at 10:28:13AM +, Steven Price wrote:

By refactoring to deal with the !pud_huge(pud) || !pud_devmap(pud)
condition early it's possible to remove the 'ret' variable and remove a
level of indentation from half the function making the code easier to
read.

No functional change.

Signed-off-by: Steven Price 
---
Thanks to Jason's changes there were only two code paths left using
the out_unlock label so it seemed like a good opportunity to
refactor.


Yes, I made something very similar, what do you think of this:

https://github.com/jgunthorpe/linux/commit/93f0ed42ab3f9ceb27b58fb7c7c3ecaf60f16b36


Even better! Sorry I didn't realise you'd already done this. I just saw 
that the function was needlessly complicated after your fix, so I 
thought I'd do a drive-by cleanup since part of the mess was my fault! :)


Thanks,

Steve


Re: [PATCH hmm 3/8] mm/hmm: do not call hmm_vma_walk_hole() while holding a spinlock

2020-03-12 Thread Steven Price

On 11/03/2020 18:35, Jason Gunthorpe wrote:

From: Jason Gunthorpe 

This eventually calls into handle_mm_fault() which is a sleeping function.
Release the lock first.

hmm_vma_walk_hole() does not touch the contents of the PUD, so it does not
need the lock.

Fixes: 3afc423632a1 ("mm: pagewalk: add p4d_entry() and pgd_entry()")
Cc: Steven Price 
Signed-off-by: Jason Gunthorpe 


Sorry about that, thanks for fixing.

Reviewed-by: Steven Price 


---
  mm/hmm.c | 14 +++---
  1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/mm/hmm.c b/mm/hmm.c
index 9e8f68eb83287a..32dcbfd3908315 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -473,8 +473,8 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long 
start, unsigned long end,
  
  	pud = READ_ONCE(*pudp);

if (pud_none(pud)) {
-   ret = hmm_vma_walk_hole(start, end, -1, walk);
-   goto out_unlock;
+   spin_unlock(ptl);
+   return hmm_vma_walk_hole(start, end, -1, walk);
}
  
  	if (pud_huge(pud) && pud_devmap(pud)) {

@@ -483,8 +483,8 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long 
start, unsigned long end,
bool fault, write_fault;
  
  		if (!pud_present(pud)) {

-   ret = hmm_vma_walk_hole(start, end, -1, walk);
-   goto out_unlock;
+   spin_unlock(ptl);
+   return hmm_vma_walk_hole(start, end, -1, walk);
}
  
  		i = (addr - range->start) >> PAGE_SHIFT;

@@ -495,9 +495,9 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long 
start, unsigned long end,
	hmm_range_need_fault(hmm_vma_walk, pfns, npages,
	 cpu_flags, &fault, &write_fault);
if (fault || write_fault) {
-   ret = hmm_vma_walk_hole_(addr, end, fault,
-write_fault, walk);
-   goto out_unlock;
+   spin_unlock(ptl);
+   return hmm_vma_walk_hole_(addr, end, fault, write_fault,
+ walk);
}
  
  		pfn = pud_pfn(pud) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);






Re: [PATCH v3 2/2] drm/scheduler: Avoid accessing freed bad job.

2019-11-25 Thread Steven Price
On 25/11/2019 14:10, Andrey Grodzovsky wrote:
> Problem:
> Due to a race between drm_sched_cleanup_jobs in sched thread and
> drm_sched_job_timedout in timeout work there is a possibility that
> bad job was already freed while still being accessed from the
> timeout thread.
> 
> Fix:
> Instead of just peeking at the bad job in the mirror list
> remove it from the list under lock and then put it back later when
> we are guaranteed no race with the main sched thread is possible, which
> is after the thread is parked.
> 
> v2: Lock around processing ring_mirror_list in drm_sched_cleanup_jobs.
> 
> v3: Rebase on top of drm-misc-next. v2 is not needed anymore as
> drm_sched_cleanup_jobs already has a lock there.
> 
> Signed-off-by: Andrey Grodzovsky 
> Reviewed-by: Christian König 
> Tested-by: Emily Deng 
> ---
>  drivers/gpu/drm/scheduler/sched_main.c | 29 +
>  1 file changed, 29 insertions(+)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
> b/drivers/gpu/drm/scheduler/sched_main.c
> index 6774955..a604dfa 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -284,10 +284,24 @@ static void drm_sched_job_timedout(struct work_struct 
> *work)
>   unsigned long flags;
>  
>   sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
> +
> + /*
> +  * Protects against concurrent deletion in drm_sched_cleanup_jobs that
   ^

drm_sched_cleanup_jobs() is gone - replaced with
drm_sched_get_cleanup_job() and code in drm_sched_main(). See:

588b9828f074 ("drm: Don't free jobs in wait_event_interruptible()")

> +  * is already in progress.
> +  */
> + spin_lock_irqsave(&sched->job_list_lock, flags);
>   job = list_first_entry_or_null(&sched->ring_mirror_list,
>  struct drm_sched_job, node);
>  
>   if (job) {
> + /*
> +  * Remove the bad job so it cannot be freed by already in 
> progress
> +  * drm_sched_cleanup_jobs. It will be reinsrted back after 
> sched->thread
  ^
Typo: s/reinsrted/reinserted/

> +  * is parked at which point it's safe.
> +  */
> + list_del_init(&job->node);
> + spin_unlock_irqrestore(&sched->job_list_lock, flags);
> +
>   job->sched->ops->timedout_job(job);
>  
>   /*
> @@ -298,6 +312,8 @@ static void drm_sched_job_timedout(struct work_struct 
> *work)
>   job->sched->ops->free_job(job);
>   sched->free_guilty = false;
>   }
> + } else {
> + spin_unlock_irqrestore(&sched->job_list_lock, flags);
>   }
>  
>   spin_lock_irqsave(&sched->job_list_lock, flags);
> @@ -370,6 +386,19 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, 
> struct drm_sched_job *bad)
>   kthread_park(sched->thread);
>  
>   /*
> +  * Reinsert back the bad job here - now it's safe as 
> drm_sched_cleanup_jobs

Another reference to drm_sched_cleanup_jobs.

> +  * cannot race against us and release the bad job at this point - we 
> parked
> +  * (waited for) any in progress (earlier) cleanups and any later ones 
> will
> +  * bail out due to sched->thread being parked.

As explained in the other patch - after the kthread_park() has returned
there will be no more calls to drm_sched_get_cleanup_job() until the
thread is unparked.

Steve

> +  */
> + if (bad && bad->sched == sched)
> + /*
> +  * Add at the head of the queue to reflect it was the earliest
> +  * job extracted.
> +  */
> + list_add(&bad->node, &sched->ring_mirror_list);
> +
> + /*
>* Iterate the job list from later to  earlier one and either deactive
>* their HW callbacks or remove them from mirror list if they already
>* signaled.
> 


Re: [PATCH v3 1/2] drm/sched: Avoid job cleanup if sched thread is parked.

2019-11-25 Thread Steven Price
On 25/11/2019 14:10, Andrey Grodzovsky wrote:
> When the sched thread is parked we assume ring_mirror_list is
> not accessed from here.

FWIW I don't think this is necessary. kthread_park() will wait until the
thread is parked, at which point the thread is stuck in kthread_parkme()
until unparked.

So all this does is avoid waiting for any cleanup jobs before parking -
which might be a reasonable goal in itself, but if so let's at least
document that.

Steve

> 
> Signed-off-by: Andrey Grodzovsky 
> Reviewed-by: Christian König 
> ---
>  drivers/gpu/drm/scheduler/sched_main.c | 10 +++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
> b/drivers/gpu/drm/scheduler/sched_main.c
> index d4cc728..6774955 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -635,9 +635,13 @@ drm_sched_get_cleanup_job(struct drm_gpu_scheduler 
> *sched)
>   struct drm_sched_job *job;
>   unsigned long flags;
>  
> - /* Don't destroy jobs while the timeout worker is running */
> - if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
> - !cancel_delayed_work(&sched->work_tdr))
> + /*
> + * Don't destroy jobs while the timeout worker is running  OR thread
> + * is being parked and hence assumed to not touch ring_mirror_list
> + */
> + if ((sched->timeout != MAX_SCHEDULE_TIMEOUT &&
> +  !cancel_delayed_work(&sched->work_tdr)) ||
> +  __kthread_should_park(sched->thread))
>   return NULL;
>  
>   spin_lock_irqsave(&sched->job_list_lock, flags);
> 
