from:"Tvrtko Ursulin"

Re: [PATCH v6 5/5] drm/amdgpu: track bo memory stats at runtime

2024-11-07 Thread Tvrtko Ursulin




On 07/11/2024 14:24, Li, Yunxiang (Teddy) wrote:

[Public]


From: Tvrtko Ursulin 
Sent: Thursday, November 7, 2024 5:48
On 31/10/2024 13:48, Li, Yunxiang (Teddy) wrote:

[Public]


From: Christian König 
Sent: Thursday, October 31, 2024 8:54
On 25.10.24 at 19:41, Yunxiang Li wrote:

Before, every time fdinfo is queried we try to lock all the BOs in
the VM and calculate memory usage from scratch. This works okay if
the fdinfo is rarely read and the VMs don't have a ton of BOs. If
either of these conditions is not true, we get a massive performance hit.

In this new revision, we track the BOs as they change states. This
way when the fdinfo is queried we only need to take the status lock
and copy out the usage stats with minimal impact to the runtime performance.

Signed-off-by: Yunxiang Li 
---
drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c |  14 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c  |  10 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_object.c  | 107 +++
drivers/gpu/drm/amd/amdgpu/amdgpu_object.h  |   5 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h |   2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c  | 189 +++-----
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h  |  12 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c   |   1 +
8 files changed, 199 insertions(+), 141 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
index b144404902255..1d8a0ff3c8604 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
@@ -36,6 +36,7 @@
#include "amdgpu_gem.h"
#include "amdgpu_dma_buf.h"
#include "amdgpu_xgmi.h"
+#include "amdgpu_vm.h"
#include 
#include 
#include 
@@ -190,6 +191,13 @@ static void amdgpu_dma_buf_unmap(struct

dma_buf_attachment *attach,

  }
}

+static void amdgpu_dma_buf_release(struct dma_buf *buf) {
+   struct amdgpu_bo *bo = gem_to_amdgpu_bo(buf->priv);
+   amdgpu_vm_bo_update_shared(bo, -1);
+   drm_gem_dmabuf_release(buf);


Please run checkpatch.pl on the patch. As far as I can see it would
complain about the coding style here (missing empty line between
declaration and code).

Not much of an issue but we would like to prevent upstream from
complaining about such things.


Will do


+}
+
/**
 * amdgpu_dma_buf_begin_cpu_access - &dma_buf_ops.begin_cpu_access implementation
 * @dma_buf: Shared DMA buffer
@@ -237,7 +245,7 @@ const struct dma_buf_ops amdgpu_dmabuf_ops = {
  .unpin = amdgpu_dma_buf_unpin,
  .map_dma_buf = amdgpu_dma_buf_map,
  .unmap_dma_buf = amdgpu_dma_buf_unmap,
-   .release = drm_gem_dmabuf_release,
+   .release = amdgpu_dma_buf_release,
  .begin_cpu_access = amdgpu_dma_buf_begin_cpu_access,
  .mmap = drm_gem_dmabuf_mmap,
  .vmap = drm_gem_dmabuf_vmap,
@@ -265,8 +273,10 @@ struct dma_buf *amdgpu_gem_prime_export(struct

drm_gem_object *gobj,

  return ERR_PTR(-EPERM);

  buf = drm_gem_prime_export(gobj, flags);
-   if (!IS_ERR(buf))
+   if (!IS_ERR(buf)) {
  buf->ops = &amdgpu_dmabuf_ops;
+   amdgpu_vm_bo_update_shared(bo, +1);
+   }

  return buf;
}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
index 7a9573958d87c..e0e09f7b39d10 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
@@ -60,7 +60,7 @@ void amdgpu_show_fdinfo(struct drm_printer *p,
struct

drm_file *file)

  struct amdgpu_fpriv *fpriv = file->driver_priv;
  struct amdgpu_vm *vm = &fpriv->vm;

-   struct amdgpu_mem_stats stats[__AMDGPU_PL_LAST + 1] = { };
+   struct amdgpu_mem_stats stats[__AMDGPU_PL_LAST] = { };
  ktime_t usage[AMDGPU_HW_IP_NUM];
  const char *pl_name[] = {
  [TTM_PL_VRAM] = "vram",
@@ -70,13 +70,7 @@ void amdgpu_show_fdinfo(struct drm_printer *p, struct

drm_file *file)

  unsigned int hw_ip, i;
  int ret;

-   ret = amdgpu_bo_reserve(vm->root.bo, false);
-   if (ret)
-   return;
-
-   amdgpu_vm_get_memory(vm, stats, ARRAY_SIZE(stats));
-   amdgpu_bo_unreserve(vm->root.bo);
-
+   amdgpu_vm_get_memory(vm, stats);
  amdgpu_ctx_mgr_usage(&fpriv->ctx_mgr, usage);

  /*
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 2436b7c9ad12b..98563124ff99c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -1156,7 +1156,7 @@ void amdgpu_bo_move_notify(struct
ttm_buffer_object

*bo,

  return;

  abo = ttm_to_amdgpu_bo(bo);
-   amdgpu_vm_bo_invalidate(abo, evict);
+   amdgpu_vm_bo_move(abo, new_mem, evict);

  amdgpu_bo_kunmap(abo);

@@ -1169,86 +1169,6 @@ void amdgpu_bo_move_notify(struct

ttm_buffer_object *bo,

Re: [PATCH 1/3] dma-buf/dma-fence_array: use kvzalloc

2024-11-07 Thread Tvrtko Ursulin




On 28/10/2024 10:34, Christian König wrote:

On 25.10.24 at 11:05, Tvrtko Ursulin wrote:


On 25/10/2024 09:59, Tvrtko Ursulin wrote:


On 24/10/2024 13:41, Christian König wrote:
Reports indicate that some userspace applications try to merge more than
80k fences into a single dma_fence_array, leading to a warning from
kzalloc() that the requested size becomes too big.

While that is clearly a userspace bug, we should probably handle that case
gracefully in the kernel.

So we can either reject requests to merge more than a reasonable number of
fences (64k maybe?) or we can start to use kvzalloc() instead of kzalloc().

This patch does the latter.


Rejecting would potentially be safer, otherwise there is a path for 
userspace to trigger a warn in kvmalloc_node (see 0829b5bcdd3b 
("drm/i915: 2 GiB of relocations ought to be enough for anybody*")) 
and spam dmesg at will.


Actually that is a WARN_ON_*ONCE* there so maybe not so critical to 
invent a limit. Up for discussion I suppose.


Regards,

Tvrtko



Question is what limit to set...


That's one of the reasons why I opted for kvzalloc() initially.


I didn't get that, what was the reason? To not have to invent an 
arbitrary limit?


I mean we could use some nice round number like 65536, but that would be 
totally arbitrary.


Yeah.. Set an arbitrary limit so a warning in __kvmalloc_node_noprof() 
is avoided? Or pass __GFP_NOWARN?



Any comments on the other two patches? I need to get them upstream.


Will look into them shortly.

Regards,

Tvrtko



Thanks,
Christian.



Regards,

Tvrtko


Signed-off-by: Christian König 
CC: sta...@vger.kernel.org
---
  drivers/dma-buf/dma-fence-array.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/dma-buf/dma-fence-array.c 
b/drivers/dma-buf/dma-fence-array.c

index 8a08ffde31e7..46ac42bcfac0 100644
--- a/drivers/dma-buf/dma-fence-array.c
+++ b/drivers/dma-buf/dma-fence-array.c
@@ -119,8 +119,8 @@ static void dma_fence_array_release(struct 
dma_fence *fence)

  for (i = 0; i < array->num_fences; ++i)
  dma_fence_put(array->fences[i]);
-    kfree(array->fences);
-    dma_fence_free(fence);
+    kvfree(array->fences);
+    kvfree_rcu(fence, rcu);
  }
  static void dma_fence_array_set_deadline(struct dma_fence *fence,
@@ -153,7 +153,7 @@ struct dma_fence_array 
*dma_fence_array_alloc(int num_fences)

  {
  struct dma_fence_array *array;
-    return kzalloc(struct_size(array, callbacks, num_fences), 
GFP_KERNEL);
+    return kvzalloc(struct_size(array, callbacks, num_fences), 
GFP_KERNEL);

  }
  EXPORT_SYMBOL(dma_fence_array_alloc);

Re: [PATCH 1/3] dma-buf/dma-fence_array: use kvzalloc

2024-11-07 Thread Tvrtko Ursulin




On 07/11/2024 12:48, Christian König wrote:

On 07.11.24 at 12:29, Tvrtko Ursulin wrote:


On 28/10/2024 10:34, Christian König wrote:

On 25.10.24 at 11:05, Tvrtko Ursulin wrote:


On 25/10/2024 09:59, Tvrtko Ursulin wrote:


On 24/10/2024 13:41, Christian König wrote:
Reports indicate that some userspace applications try to merge more than
80k fences into a single dma_fence_array, leading to a warning from
kzalloc() that the requested size becomes too big.

While that is clearly a userspace bug, we should probably handle that case
gracefully in the kernel.

So we can either reject requests to merge more than a reasonable number of
fences (64k maybe?) or we can start to use kvzalloc() instead of kzalloc().

This patch does the latter.


Rejecting would potentially be safer, otherwise there is a path for 
userspace to trigger a warn in kvmalloc_node (see 0829b5bcdd3b 
("drm/i915: 2 GiB of relocations ought to be enough for anybody*")) 
and spam dmesg at will.


Actually that is a WARN_ON_*ONCE* there so maybe not so critical to 
invent a limit. Up for discussion I suppose.


Regards,

Tvrtko



Question is what limit to set...


That's one of the reasons why I opted for kvzalloc() initially.


I didn't get that, what was the reason? To not have to invent an 
arbitrary limit?


Well that I couldn't come up with any arbitrary limit that I had 
confidence would work and not block real world use cases.


Switching to kvzalloc() just seemed the more defensive approach.


Yeah it is.

I mean we could use some nice round number like 65536, but that would 
be totally arbitrary.


Yeah.. Set an arbitrary limit so a warning in __kvmalloc_node_noprof() 
is avoided? Or pass __GFP_NOWARN?


Well are we sure that will never hit 65536 in a real world use case? 
It's still pretty low.


Ah no, I did not express myself clearly. I did not mean 64k, but a limit 
aligned with the INT_MAX check in __kvmalloc_node_noprof(). Or 
__GFP_NOWARN might be better when the allocation size is userspace 
controlled.
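As a rough illustration of the "limit aligned with INT_MAX" option, here is a minimal userspace sketch. It assumes the warning discussed above triggers for requests larger than INT_MAX bytes, as the email suggests. The names toy_cb, toy_fence_array and toy_fence_array_alloc() are hypothetical stand-ins, and calloc() replaces kvzalloc(); this is not the kernel code.

```c
#include <assert.h>	/* only needed for the checks in the usage note */
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-fence callback slot. */
struct toy_cb {
	void *fence;
	void *array;
};

/* Hypothetical array header followed by one slot per fence. */
struct toy_fence_array {
	unsigned int num_fences;
	struct toy_cb callbacks[];
};

/*
 * Reject any count whose total size would exceed INT_MAX bytes before
 * calling the allocator, so a caller-controlled count can never reach
 * the allocator's own size check.
 */
static struct toy_fence_array *toy_fence_array_alloc(size_t num_fences)
{
	const size_t limit = INT_MAX;
	size_t bytes;

	if (num_fences > (limit - sizeof(struct toy_fence_array)) /
			 sizeof(struct toy_cb))
		return NULL;

	bytes = sizeof(struct toy_fence_array) +
		num_fences * sizeof(struct toy_cb);
	return calloc(1, bytes);
}
```

With this check a request for 16 fences succeeds and returns zeroed memory, while a request for INT_MAX fences returns NULL without touching the allocator. The alternative mentioned in the email, passing __GFP_NOWARN, would instead let the allocation fail quietly inside the allocator.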


Regards,

Tvrtko


Any comments on the other two patches? I need to get them upstream.


Will look into them shortly.


Thanks,
Christian.



Regards,

Tvrtko



Thanks,
Christian.



Regards,

Tvrtko


Signed-off-by: Christian König 
CC: sta...@vger.kernel.org
---
  drivers/dma-buf/dma-fence-array.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/dma-buf/dma-fence-array.c 
b/drivers/dma-buf/dma-fence-array.c

index 8a08ffde31e7..46ac42bcfac0 100644
--- a/drivers/dma-buf/dma-fence-array.c
+++ b/drivers/dma-buf/dma-fence-array.c
@@ -119,8 +119,8 @@ static void dma_fence_array_release(struct 
dma_fence *fence)

  for (i = 0; i < array->num_fences; ++i)
  dma_fence_put(array->fences[i]);
-    kfree(array->fences);
-    dma_fence_free(fence);
+    kvfree(array->fences);
+    kvfree_rcu(fence, rcu);
  }
  static void dma_fence_array_set_deadline(struct dma_fence *fence,
@@ -153,7 +153,7 @@ struct dma_fence_array 
*dma_fence_array_alloc(int num_fences)

  {
  struct dma_fence_array *array;
-    return kzalloc(struct_size(array, callbacks, num_fences), 
GFP_KERNEL);
+    return kvzalloc(struct_size(array, callbacks, num_fences), 
GFP_KERNEL);

  }
  EXPORT_SYMBOL(dma_fence_array_alloc);

Re: [PATCH 2/3] dma-buf: sort fences in dma_fence_unwrap_merge

2024-11-07 Thread Tvrtko Ursulin




On 24/10/2024 13:41, Christian König wrote:

The merge function initially handled only individual fences and
arrays which in turn were created by the merge function. This allowed
creating the new array with a simple merge sort based on the fence
context number.

The problem now is that since the addition of timeline sync objects
userspace can create chain containers in basically any fence context
order.

If those are merged together it can happen that we create really
large arrays since the merge sort algorithm doesn't work any more.

So put an insertion sort behind the merge sort which kicks in when the
input fences are not in the expected order. This isn't as efficient
as a heap sort, but has better properties for the most common use
case.

Signed-off-by: Christian König 
---
  drivers/dma-buf/dma-fence-unwrap.c | 39 ++
  1 file changed, 34 insertions(+), 5 deletions(-)

diff --git a/drivers/dma-buf/dma-fence-unwrap.c 
b/drivers/dma-buf/dma-fence-unwrap.c
index 628af51c81af..d9aa280d9ff6 100644
--- a/drivers/dma-buf/dma-fence-unwrap.c
+++ b/drivers/dma-buf/dma-fence-unwrap.c
@@ -106,7 +106,7 @@ struct dma_fence *__dma_fence_unwrap_merge(unsigned int 
num_fences,
fences[i] = dma_fence_unwrap_first(fences[i], &iter[i]);
  
  	count = 0;

-   do {
+   while (true) {
unsigned int sel;
  
  restart:

@@ -144,11 +144,40 @@ struct dma_fence *__dma_fence_unwrap_merge(unsigned int 
num_fences,
}
}
  
-		if (tmp) {

-   array[count++] = dma_fence_get(tmp);
-   fences[sel] = dma_fence_unwrap_next(&iter[sel]);
+   if (!tmp)
+   break;
+
+   /*
+* We could use a binary search here, but since the assumption
+* is that the main input are already sorted dma_fence_arrays
+* just looking from end has a higher chance of finding the
+* right location on the first try
+*/
+
+   for (i = count; i--;) {
+   if (likely(array[i]->context < tmp->context))
+   break;
+
+   if (array[i]->context == tmp->context) {
+   if (dma_fence_is_later(tmp, array[i])) {
+   dma_fence_put(array[i]);
+   array[i] = dma_fence_get(tmp);
+   }
+   fences[sel] = dma_fence_unwrap_next(&iter[sel]);
+   goto restart;
+   }
}
-   } while (tmp);
+
+   ++i;
+   /*
+* Make room for the fence, this should be a nop most of the
+* time.
+*/
+   memcpy(&array[i + 1], &array[i], (count - i) * sizeof(*array));
+   array[i] = dma_fence_get(tmp);
+   fences[sel] = dma_fence_unwrap_next(&iter[sel]);
+   count++;


Having ventured into this function for the first time, I can say that 
this is some smart code which is not easy to grasp. It could definitely 
benefit from a high level comment before the do-while loop to explain 
what it is going to do.


I also wonder whether the next and tmp local variables could be renamed 
to something more descriptive.


And the algorithmic complexity of the end result, given the multiple 
loops and gotos, I have no idea what it could be.


Has a dumb solution been considered, like a two-pass approach with a 
pessimistically allocated fence array? Like:


1) Populate array with all unsignalled unwrapped fences. (O(count))

2) Bog standard include/linux/sort.h by context and seqno.
(O(count*log(count)))


3) Walk array and squash same context to latest fence. (Before this 
patch that wasn't there, right?). (O(count)) (Overwrite in place, no 
memcpy needed.)


Algorithmic complexity of that would be obvious and code much simpler.
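Steps 2 and 3 of the proposal above can be sketched in userspace C as follows. This is only an illustration under stated assumptions: struct toy_fence is a hypothetical stand-in holding just the two sort keys, qsort() replaces the kernel's sort(), reference counting is left out, and plain integer comparison of seqno is used, which ignores the wraparound handling a real "is later" check would need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a fence: only the two sort keys. */
struct toy_fence {
	uint64_t context;
	uint64_t seqno;
};

/* Order by context first, then by seqno, so the latest fence ends its run. */
static int toy_fence_cmp(const void *a, const void *b)
{
	const struct toy_fence *fa = a, *fb = b;

	if (fa->context != fb->context)
		return fa->context < fb->context ? -1 : 1;
	if (fa->seqno != fb->seqno)
		return fa->seqno < fb->seqno ? -1 : 1;
	return 0;
}

/*
 * Sort the array, then walk it once and keep only the last (latest)
 * fence of each context, overwriting in place. Returns the new count.
 */
static size_t toy_fence_sort_and_squash(struct toy_fence *array, size_t count)
{
	size_t i, out = 0;

	assert(array != NULL || count == 0);
	if (count == 0)
		return 0;

	qsort(array, count, sizeof(*array), toy_fence_cmp);

	for (i = 1; i < count; i++) {
		if (array[i].context == array[out].context)
			array[out] = array[i];	/* same context: later seqno wins */
		else
			array[++out] = array[i];	/* new context: next output slot */
	}
	return out + 1;
}
```

The sort costs O(count*log(count)) and the squash pass O(count), so the overall complexity is easy to state, which is the point made in the email.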

Regards,

Tvrtko


+   };
  
  	if (count == 0) {

tmp = dma_fence_allocate_private_stub(ktime_get());

Re: [PATCH v6 4/5] drm: add drm_memory_stats_is_zero

2024-11-07 Thread Tvrtko Ursulin




On 07/11/2024 14:17, Li, Yunxiang (Teddy) wrote:

[AMD Official Use Only - AMD Internal Distribution Only]


From: Tvrtko Ursulin 
Sent: Thursday, November 7, 2024 5:41
On 25/10/2024 18:41, Yunxiang Li wrote:

Add a helper to check if the memory stats is zero, this will be used
to check for memory accounting errors.

Signed-off-by: Yunxiang Li 
---
   drivers/gpu/drm/drm_file.c | 9 +
   include/drm/drm_file.h | 1 +
   2 files changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
index 714e42b051080..75ed701d80f74 100644
--- a/drivers/gpu/drm/drm_file.c
+++ b/drivers/gpu/drm/drm_file.c
@@ -859,6 +859,15 @@ static void print_size(struct drm_printer *p, const char

*stat,

 drm_printf(p, "drm-%s-%s:\t%llu%s\n", stat, region, sz, units[u]);
   }

+int drm_memory_stats_is_zero(const struct drm_memory_stats *stats) {
+   return (stats->shared == 0 &&
+   stats->private == 0 &&
+   stats->resident == 0 &&
+   stats->purgeable == 0 &&
+   stats->active == 0);
+}


Could use mem_is_zero() for some value of source/binary compactness.


Yeah, the patch set started out with that when it was just a function in amdgpu, 
but Christian didn't like it.


Okay, I don't feel so strongly about the implementation details.


+EXPORT_SYMBOL(drm_memory_stats_is_zero);
+


I am not a huge fan of adding this as an interface as the only caller appears
to be a sanity check in amdgpu_vm_fini():

   if (!amdgpu_vm_stats_is_zero(vm))
   dev_err(adev->dev, "VM memory stats is non-zero when fini\n");

But I guess there is some value in sanity checking since amdgpu does not have a
notion of debug only code (compiled at production and exercised via a test 
suite).

I do suggest demoting the dev_err to notice log level, which would suffice and
be more accurate.


I think it's very important to have a check like this when we have a known 
invariant, especially in this case where there's stat tracking code spread out 
everywhere and we have very little chance of catching a bug right when it 
happened. And since whenever this check fails we know for sure there is a bug, 
I don't see the harm of keeping it as an error.


It would indeed be a programming error if it can happen, but from the 
point of view of a driver and system log I think a warning is actually 
right.


Regards,

Tvrtko



Now that I think about it, I probably want to have the process & task name in 
here to aid in reproduction.

Teddy

Re: [PATCH v6 5/5] drm/amdgpu: track bo memory stats at runtime

2024-11-07 Thread Tvrtko Ursulin




On 31/10/2024 13:48, Li, Yunxiang (Teddy) wrote:

[Public]


From: Christian König 
Sent: Thursday, October 31, 2024 8:54
On 25.10.24 at 19:41, Yunxiang Li wrote:

Before, every time fdinfo is queried we try to lock all the BOs in the
VM and calculate memory usage from scratch. This works okay if the
fdinfo is rarely read and the VMs don't have a ton of BOs. If either
of these conditions is not true, we get a massive performance hit.

In this new revision, we track the BOs as they change states. This way
when the fdinfo is queried we only need to take the status lock and
copy out the usage stats with minimal impact to the runtime performance.

Signed-off-by: Yunxiang Li 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c |  14 +-
   drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c  |  10 +-
   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c  | 107 +++
   drivers/gpu/drm/amd/amdgpu/amdgpu_object.h  |   5 +-
   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h |   2 +-
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c  | 189 +++-
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h  |  12 +-
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c   |   1 +
   8 files changed, 199 insertions(+), 141 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
index b144404902255..1d8a0ff3c8604 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
@@ -36,6 +36,7 @@
   #include "amdgpu_gem.h"
   #include "amdgpu_dma_buf.h"
   #include "amdgpu_xgmi.h"
+#include "amdgpu_vm.h"
   #include 
   #include 
   #include 
@@ -190,6 +191,13 @@ static void amdgpu_dma_buf_unmap(struct

dma_buf_attachment *attach,

 }
   }

+static void amdgpu_dma_buf_release(struct dma_buf *buf) {
+   struct amdgpu_bo *bo = gem_to_amdgpu_bo(buf->priv);
+   amdgpu_vm_bo_update_shared(bo, -1);
+   drm_gem_dmabuf_release(buf);


Please run checkpatch.pl on the patch. As far as I can see it would complain
about the coding style here (missing empty line between declaration and code).

Not much of an issue but we would like to prevent upstream from complaining
about such things.


Will do


+}
+
   /**
* amdgpu_dma_buf_begin_cpu_access - &dma_buf_ops.begin_cpu_access

implementation

* @dma_buf: Shared DMA buffer
@@ -237,7 +245,7 @@ const struct dma_buf_ops amdgpu_dmabuf_ops = {
 .unpin = amdgpu_dma_buf_unpin,
 .map_dma_buf = amdgpu_dma_buf_map,
 .unmap_dma_buf = amdgpu_dma_buf_unmap,
-   .release = drm_gem_dmabuf_release,
+   .release = amdgpu_dma_buf_release,
 .begin_cpu_access = amdgpu_dma_buf_begin_cpu_access,
 .mmap = drm_gem_dmabuf_mmap,
 .vmap = drm_gem_dmabuf_vmap,
@@ -265,8 +273,10 @@ struct dma_buf *amdgpu_gem_prime_export(struct

drm_gem_object *gobj,

 return ERR_PTR(-EPERM);

 buf = drm_gem_prime_export(gobj, flags);
-   if (!IS_ERR(buf))
+   if (!IS_ERR(buf)) {
 buf->ops = &amdgpu_dmabuf_ops;
+   amdgpu_vm_bo_update_shared(bo, +1);
+   }

 return buf;
   }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
index 7a9573958d87c..e0e09f7b39d10 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
@@ -60,7 +60,7 @@ void amdgpu_show_fdinfo(struct drm_printer *p, struct

drm_file *file)

 struct amdgpu_fpriv *fpriv = file->driver_priv;
 struct amdgpu_vm *vm = &fpriv->vm;

-   struct amdgpu_mem_stats stats[__AMDGPU_PL_LAST + 1] = { };
+   struct amdgpu_mem_stats stats[__AMDGPU_PL_LAST] = { };
 ktime_t usage[AMDGPU_HW_IP_NUM];
 const char *pl_name[] = {
 [TTM_PL_VRAM] = "vram",
@@ -70,13 +70,7 @@ void amdgpu_show_fdinfo(struct drm_printer *p, struct

drm_file *file)

 unsigned int hw_ip, i;
 int ret;

-   ret = amdgpu_bo_reserve(vm->root.bo, false);
-   if (ret)
-   return;
-
-   amdgpu_vm_get_memory(vm, stats, ARRAY_SIZE(stats));
-   amdgpu_bo_unreserve(vm->root.bo);
-
+   amdgpu_vm_get_memory(vm, stats);
 amdgpu_ctx_mgr_usage(&fpriv->ctx_mgr, usage);

 /*
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 2436b7c9ad12b..98563124ff99c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -1156,7 +1156,7 @@ void amdgpu_bo_move_notify(struct ttm_buffer_object

*bo,

 return;

 abo = ttm_to_amdgpu_bo(bo);
-   amdgpu_vm_bo_invalidate(abo, evict);
+   amdgpu_vm_bo_move(abo, new_mem, evict);

 amdgpu_bo_kunmap(abo);

@@ -1169,86 +1169,6 @@ void amdgpu_bo_move_notify(struct

ttm_buffer_object *bo,

  old_mem ? old_mem->mem_type : -1);
   }

-void amdgpu_bo_get_memory(struct amdgpu_bo *bo,
- struct amdgpu_mem_stats *stats,
- unsigned int sz)
-{
-   const unsigned int domain_to_pl[] = {
-   [ilog2(AMDGPU_GEM_DOMAIN_CPU)]  = TTM_PL_

Re: [PATCH v6 4/5] drm: add drm_memory_stats_is_zero

2024-11-07 Thread Tvrtko Ursulin




On 25/10/2024 18:41, Yunxiang Li wrote:

Add a helper to check if the memory stats is zero, this will be used to
check for memory accounting errors.

Signed-off-by: Yunxiang Li 
---
  drivers/gpu/drm/drm_file.c | 9 +
  include/drm/drm_file.h | 1 +
  2 files changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
index 714e42b051080..75ed701d80f74 100644
--- a/drivers/gpu/drm/drm_file.c
+++ b/drivers/gpu/drm/drm_file.c
@@ -859,6 +859,15 @@ static void print_size(struct drm_printer *p, const char 
*stat,
drm_printf(p, "drm-%s-%s:\t%llu%s\n", stat, region, sz, units[u]);
  }
  
+int drm_memory_stats_is_zero(const struct drm_memory_stats *stats) {

+   return (stats->shared == 0 &&
+   stats->private == 0 &&
+   stats->resident == 0 &&
+   stats->purgeable == 0 &&
+   stats->active == 0);
+}


Could use mem_is_zero() for some value of source/binary compactness.
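For comparison, the byte-wise variant suggested here could look roughly like the userspace sketch below. toy_memory_stats and toy_mem_is_zero() are hypothetical stand-ins written for illustration; the latter only imitates what a mem_is_zero()-style helper does and is not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the five counters checked in the patch. */
struct toy_memory_stats {
	uint64_t shared;
	uint64_t private;
	uint64_t resident;
	uint64_t purgeable;
	uint64_t active;
};

/* Userspace stand-in: true if all n bytes starting at s are zero. */
static bool toy_mem_is_zero(const void *s, size_t n)
{
	const unsigned char *p = s;

	while (n--)
		if (*p++)
			return false;
	return true;
}

/* Byte-wise check over the whole struct instead of field by field. */
static bool toy_stats_is_zero(const struct toy_memory_stats *stats)
{
	assert(stats != NULL);
	return toy_mem_is_zero(stats, sizeof(*stats));
}
```

The byte-wise form is shorter and automatically covers fields added later, but it is only equivalent to the field-by-field check when the struct has no padding, or when any padding is known to be zeroed.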


+EXPORT_SYMBOL(drm_memory_stats_is_zero);
+


I am not a huge fan of adding this as an interface as the only caller 
appears to be a sanity check in amdgpu_vm_fini():


if (!amdgpu_vm_stats_is_zero(vm))
dev_err(adev->dev, "VM memory stats is non-zero when fini\n");

But I guess there is some value in sanity checking since amdgpu does not 
have a notion of debug only code (compiled at production and exercised 
via a test suite).


I do suggest demoting the dev_err to notice log level, which would suffice 
and be more accurate.


Regards,

Tvrtko


  /**
   * drm_print_memory_stats - A helper to print memory stats
   * @p: The printer to print output to
diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
index ab230d3af138d..7f91e35d027d9 100644
--- a/include/drm/drm_file.h
+++ b/include/drm/drm_file.h
@@ -477,6 +477,7 @@ struct drm_memory_stats {
  
  enum drm_gem_object_status;
  
+int drm_memory_stats_is_zero(const struct drm_memory_stats *stats);

  void drm_print_memory_stats(struct drm_printer *p,
const struct drm_memory_stats *stats,
enum drm_gem_object_status supported_status,

Re: [PATCH 1/3] dma-buf/dma-fence_array: use kvzalloc

2024-10-25 Thread Tvrtko Ursulin




On 25/10/2024 09:59, Tvrtko Ursulin wrote:


On 24/10/2024 13:41, Christian König wrote:

Reports indicate that some userspace applications try to merge more than
80k fences into a single dma_fence_array, leading to a warning from
kzalloc() that the requested size becomes too big.

While that is clearly a userspace bug, we should probably handle that case
gracefully in the kernel.

So we can either reject requests to merge more than a reasonable number of
fences (64k maybe?) or we can start to use kvzalloc() instead of kzalloc().

This patch does the latter.


Rejecting would potentially be safer, otherwise there is a path for 
userspace to trigger a warn in kvmalloc_node (see 0829b5bcdd3b 
("drm/i915: 2 GiB of relocations ought to be enough for anybody*")) and 
spam dmesg at will.


Actually that is a WARN_ON_*ONCE* there so maybe not so critical to 
invent a limit. Up for discussion I suppose.


Regards,

Tvrtko



Question is what limit to set...

Regards,

Tvrtko


Signed-off-by: Christian König 
CC: sta...@vger.kernel.org
---
  drivers/dma-buf/dma-fence-array.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/dma-buf/dma-fence-array.c 
b/drivers/dma-buf/dma-fence-array.c

index 8a08ffde31e7..46ac42bcfac0 100644
--- a/drivers/dma-buf/dma-fence-array.c
+++ b/drivers/dma-buf/dma-fence-array.c
@@ -119,8 +119,8 @@ static void dma_fence_array_release(struct 
dma_fence *fence)

  for (i = 0; i < array->num_fences; ++i)
  dma_fence_put(array->fences[i]);
-    kfree(array->fences);
-    dma_fence_free(fence);
+    kvfree(array->fences);
+    kvfree_rcu(fence, rcu);
  }
  static void dma_fence_array_set_deadline(struct dma_fence *fence,
@@ -153,7 +153,7 @@ struct dma_fence_array *dma_fence_array_alloc(int 
num_fences)

  {
  struct dma_fence_array *array;
-    return kzalloc(struct_size(array, callbacks, num_fences), 
GFP_KERNEL);
+    return kvzalloc(struct_size(array, callbacks, num_fences), 
GFP_KERNEL);

  }
  EXPORT_SYMBOL(dma_fence_array_alloc);

Re: [PATCH 1/3] dma-buf/dma-fence_array: use kvzalloc

2024-10-25 Thread Tvrtko Ursulin




On 24/10/2024 13:41, Christian König wrote:

Reports indicate that some userspace applications try to merge more than
80k fences into a single dma_fence_array, leading to a warning from
kzalloc() that the requested size becomes too big.

While that is clearly a userspace bug, we should probably handle that case
gracefully in the kernel.

So we can either reject requests to merge more than a reasonable number of
fences (64k maybe?) or we can start to use kvzalloc() instead of kzalloc().
This patch does the latter.


Rejecting would potentially be safer, otherwise there is a path for 
userspace to trigger a warn in kvmalloc_node (see 0829b5bcdd3b 
("drm/i915: 2 GiB of relocations ought to be enough for anybody*")) and 
spam dmesg at will.


Question is what limit to set...

Regards,

Tvrtko


Signed-off-by: Christian König 
CC: sta...@vger.kernel.org
---
  drivers/dma-buf/dma-fence-array.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/dma-buf/dma-fence-array.c 
b/drivers/dma-buf/dma-fence-array.c
index 8a08ffde31e7..46ac42bcfac0 100644
--- a/drivers/dma-buf/dma-fence-array.c
+++ b/drivers/dma-buf/dma-fence-array.c
@@ -119,8 +119,8 @@ static void dma_fence_array_release(struct dma_fence *fence)
for (i = 0; i < array->num_fences; ++i)
dma_fence_put(array->fences[i]);
  
-	kfree(array->fences);

-   dma_fence_free(fence);
+   kvfree(array->fences);
+   kvfree_rcu(fence, rcu);
  }
  
  static void dma_fence_array_set_deadline(struct dma_fence *fence,

@@ -153,7 +153,7 @@ struct dma_fence_array *dma_fence_array_alloc(int 
num_fences)
  {
struct dma_fence_array *array;
  
-	return kzalloc(struct_size(array, callbacks, num_fences), GFP_KERNEL);

+   return kvzalloc(struct_size(array, callbacks, num_fences), GFP_KERNEL);
  }
  EXPORT_SYMBOL(dma_fence_array_alloc);

Re: [PATCH v5 4/4] drm/amdgpu: track bo memory stats at runtime

2024-10-24 Thread Tvrtko Ursulin




On 23/10/2024 13:56, Christian König wrote:

On 23.10.24 at 14:24, Tvrtko Ursulin wrote:

[SNIP]
To fold or not the special placements (GWS, GDS & co) is also 
tangential. In my patch I just preserved the legacy behaviour so it 
can easily be tweaked on top.


Yeah, but again the original behavior is completely broken.

GWS, GDS and OA are counted in blocks of HW units (multiplied by 
PAGE_SIZE IIRC to avoid some GEM&TTM warnings).


When you accumulate that anywhere in the memory stats then that is 
just completely off.


Ooops. :) Are they backed by some memory though, be it system or VRAM?


GDS is an internal 4 or 64KiB memory block which is only valid while 
shaders are running. It is used to communicate stuff between different 
shader stages and not even CPU accessible.


GWS and OA are not even memory, those are just HW blocks which implement 
a fixed function.


IIRC most HW generations have 16 of each and when setting up the 
application virtual address space you can specify how many will be used 
by the application.


I see, thank you! Though I could have bothered to look in the code or 
even instrument at runtime too.


I agree removing it from system is correct. If wanted and/or desirable, 
some or all could even be exported as different memory regions. The DRM 
fdinfo spec already allows that. Like:


drm-total-vram: ...
drm-total-gds: ...
drm-total-oa: ...

Etc.

Regards,

Tvrtko

Re: [PATCH v5 4/4] drm/amdgpu: track bo memory stats at runtime

2024-10-23 Thread Tvrtko Ursulin




On 23/10/2024 14:31, Li, Yunxiang (Teddy) wrote:

[AMD Official Use Only - AMD Internal Distribution Only]


From: Tvrtko Ursulin 
Sent: Wednesday, October 23, 2024 8:25
On 23/10/2024 13:12, Christian König wrote:

On 23.10.24 at 13:37, Tvrtko Ursulin wrote:


On 23/10/2024 10:14, Christian König wrote:

On 23.10.24 at 09:38, Tvrtko Ursulin wrote:


On 22/10/2024 17:24, Christian König wrote:

On 22.10.24 at 17:17, Li, Yunxiang (Teddy) wrote:

[Public]


+static uint32_t fold_memtype(uint32_t memtype) {

In general please add prefixes to even static functions, e.g.
amdgpu_vm_ or amdgpu_bo_.


+   /* Squash private placements into 'cpu' to keep the legacy userspace view. */

+   switch (mem_type) {
+   case TTM_PL_VRAM:
+   case TTM_PL_TT:
+   return memtype
+   default:
+   return TTM_PL_SYSTEM;
+   }
+}
+
+static uint32_t bo_get_memtype(struct amdgpu_bo *bo) {

That whole function belongs into amdgpu_bo.c

Do you mean bo_get_memtype or fold_memtype? I debated whether
bo_get_memtype should go into amdgpu_vm.c or amdgpu_bo.c, and
since it's using fold_memtype and only useful for memory stats
because of folding the private placements I just left them here
together with the other mem stats code.

I can move it to amdgpu_bo.c, make it return the memtype verbatim
and just fold it when I do the accounting.


I think that folding GDS, GWS and OA into system is also a bug. We
should really not be doing that.

Just wanted to point out for this round that the code to query the
current placement from a BO should probably go into amdgpu_bo.c
and not amdgpu_vm.c




+   struct ttm_resource *res = bo->tbo.resource;
+   const uint32_t domain_to_pl[] = {
+   [ilog2(AMDGPU_GEM_DOMAIN_CPU)]  = TTM_PL_SYSTEM,
+   [ilog2(AMDGPU_GEM_DOMAIN_GTT)]  = TTM_PL_TT,
+   [ilog2(AMDGPU_GEM_DOMAIN_VRAM)] = TTM_PL_VRAM,
+   [ilog2(AMDGPU_GEM_DOMAIN_GDS)]  = AMDGPU_PL_GDS,
+   [ilog2(AMDGPU_GEM_DOMAIN_GWS)]  = AMDGPU_PL_GWS,
+   [ilog2(AMDGPU_GEM_DOMAIN_OA)]   = AMDGPU_PL_OA,
+   [ilog2(AMDGPU_GEM_DOMAIN_DOORBELL)] = AMDGPU_PL_DOORBELL,
+   };
+   uint32_t domain;
+
+   if (res)
+   return fold_memtype(res->mem_type);
+
+   /*
+* If no backing store use one of the preferred domain for basic
+* stats. We take the MSB since that should give a reasonable
+* view.
+*/
+   BUILD_BUG_ON(TTM_PL_VRAM < TTM_PL_TT || TTM_PL_VRAM < TTM_PL_SYSTEM);
+   domain = fls(bo->preferred_domains & AMDGPU_GEM_DOMAIN_MASK);
+   if (drm_WARN_ON_ONCE(&adev->ddev,
+domain == 0 || --domain >= ARRAY_SIZE(domain_to_pl)))

It's perfectly legal to create a BO without a placement. That
one just won't have a backing store.


This is lifted from the previous change I'm rebasing onto. I
think what it’s trying to do is if the BO doesn't have a
placement, use the "biggest" (VRAM > TT > SYSTEM) preferred
placement for the purpose of accounting. Previously we just
ignored BOs that don't have a placement. I guess there's an
argument for going with either approach.


I was not arguing, I'm simply pointing out a bug. It's perfectly
valid for bo->preferred_domains to be 0.

So the following WARN_ON() that no bit is set is incorrect.




+   return 0;
+   return fold_memtype(domain_to_pl[domain])

That would need speculative execution mitigation if I'm not
completely mistaken.

Better use a switch/case statement.


Do you mean change the array indexing to a switch statement?


Yes.


Did you mean array_index_nospec?


Yes.


Domain is not a direct userspace input and is calculated from the
mask which is sanitized to allowed values prior to this call. So I
*think* a switch is overkill but don't mind it either. Just
commenting FWIW.


I missed that the mask is applied.

Thinking more about it I'm not sure if we should do this conversion
in the first place. IIRC Tvrtko you once suggested a patch which
switched a bunch of code to use the TTM placement instead of the
UAPI flags.


Maybe 8fb0efb10184 ("drm/amdgpu: Reduce mem_type to domain double
indirection") is what you are thinking of?


Yes, exactly that one.




Going more into this direction I think when we want to look at the
current placement we should probably also use the TTM PL enumeration
directly.


It does this already. The placement flags are just to "invent" a TTM
PL enum when bo->tbo.resource == NULL.


Ah, good point! I thought we would do the mapping the other way around.

In this case that is even more something we should probably not do at all.

When bo->tbo.resource is NULL then this BO isn't resident at all, so
it should not account to resident memory.


It doesn't, only for total. I should have pasted more context..:

   struct ttm_resource *res = bo->tbo.resource; ...
  /* DRM stats c

Re: [PATCH v5 4/4] drm/amdgpu: track bo memory stats at runtime

2024-10-23 Thread Tvrtko Ursulin




On 23/10/2024 13:12, Christian König wrote:

Am 23.10.24 um 13:37 schrieb Tvrtko Ursulin:


On 23/10/2024 10:14, Christian König wrote:

Am 23.10.24 um 09:38 schrieb Tvrtko Ursulin:


On 22/10/2024 17:24, Christian König wrote:

Am 22.10.24 um 17:17 schrieb Li, Yunxiang (Teddy):

[Public]


+static uint32_t fold_memtype(uint32_t memtype) {
In general please add prefixes to even static functions, e.g. 
amdgpu_vm_ or

amdgpu_bo_.

+   /* Squash private placements into 'cpu' to keep the legacy 
userspace view.

*/

+   switch (mem_type) {
+   case TTM_PL_VRAM:
+   case TTM_PL_TT:
+   return memtype
+   default:
+   return TTM_PL_SYSTEM;
+   }
+}
+
+static uint32_t bo_get_memtype(struct amdgpu_bo *bo) {

That whole function belongs into amdgpu_bo.c
Do you mean bo_get_memtype or fold_memtype? I debated whether 
bo_get_memtype should go into amdgpu_vm.c or amdgpu_bo.c, and 
since it's using fold_memtype and only useful for memory stats 
because of folding the private placements I just left them here 
together with the other mem stats code.


I can move it to amdgpu_bo.c make it return the memtype verbatim 
and just fold it when I do the accounting.


I think that folding GDS, GWS and OA into system is also a bug. We
should really not be doing that.


Just wanted to point out for this round that the code to query the 
current placement from a BO should probably go into amdgpu_bo.c and 
not amdgpu_vm.c





+   struct ttm_resource *res = bo->tbo.resource;
+   const uint32_t domain_to_pl[] = {
+   [ilog2(AMDGPU_GEM_DOMAIN_CPU)]  = TTM_PL_SYSTEM,
+   [ilog2(AMDGPU_GEM_DOMAIN_GTT)]  = TTM_PL_TT,
+   [ilog2(AMDGPU_GEM_DOMAIN_VRAM)] = TTM_PL_VRAM,
+   [ilog2(AMDGPU_GEM_DOMAIN_GDS)]  = AMDGPU_PL_GDS,
+   [ilog2(AMDGPU_GEM_DOMAIN_GWS)]  = AMDGPU_PL_GWS,
+   [ilog2(AMDGPU_GEM_DOMAIN_OA)]   = AMDGPU_PL_OA,
+   [ilog2(AMDGPU_GEM_DOMAIN_DOORBELL)] =

AMDGPU_PL_DOORBELL,

+   };
+   uint32_t domain;
+
+   if (res)
+   return fold_memtype(res->mem_type);
+
+   /*
+    * If no backing store use one of the preferred domain for 
basic

+    * stats. We take the MSB since that should give a reasonable
+    * view.
+    */
+   BUILD_BUG_ON(TTM_PL_VRAM < TTM_PL_TT || TTM_PL_VRAM <

TTM_PL_SYSTEM);

+   domain = fls(bo->preferred_domains & AMDGPU_GEM_DOMAIN_MASK);
+   if (drm_WARN_ON_ONCE(&adev->ddev,
+    domain == 0 || --domain >= 
ARRAY_SIZE(domain_to_pl)))
It's perfectly legal to create a BO without a placement. That one 
just won't have a

backing store.

This is lifted from the previous change I'm rebasing onto. I think 
what it’s trying to do is if the BO doesn't have a placement, use 
the "biggest" (VRAM > TT > SYSTEM) preferred placement for the 
purpose of accounting. Previously we just ignored BOs that don't
have a placement. I guess there's an argument for going with either
approach.


I was not arguing, I'm simply pointing out a bug. It's perfectly 
valid for bo->preferred_domains to be 0.


So the following WARN_ON() that no bit is set is incorrect.




+   return 0;
+   return fold_memtype(domain_to_pl[domain])
That would need speculative execution mitigation if I'm not
completely mistaken.


Better use a switch/case statement.


Do you mean change the array indexing to a switch statement?


Yes.


Did you mean array_index_nospec?


Yes.

Domain is not a direct userspace input and is calculated from the
mask which is sanitized to allowed values prior to this call. So I
*think* a switch is overkill but don't mind it either. Just
commenting FWIW.


I missed that the mask is applied.

Thinking more about it I'm not sure if we should do this conversion 
in the first place. IIRC Tvrtko you once suggested a patch which 
switched a bunch of code to use the TTM placement instead of the UAPI 
flags.


Maybe 8fb0efb10184 ("drm/amdgpu: Reduce mem_type to domain double
indirection") is what you are thinking of?


Yes, exactly that one.



Going more into this direction I think when we want to look at the 
current placement we should probably also use the TTM PL enumeration 
directly.


It does this already. The placement flags are just to "invent" a TTM 
PL enum when bo->tbo.resource == NULL.


Ah, good point! I thought we would do the mapping the other way around.

In this case that is even more something we should probably not do at all.

When bo->tbo.resource is NULL then this BO isn't resident at all, so it 
should not account to resident memory.


It doesn't, only for total. I should have pasted more context..:

struct ttm_resource *res = bo->tbo.resource;
...
/* DRM stats common fields: */

stats[type].total += size;
if (drm_gem_object_is_shared_for_memory_stats(obj))
stats[type].drm.shared

Re: [PATCH v5 4/4] drm/amdgpu: track bo memory stats at runtime

2024-10-23 Thread Tvrtko Ursulin




On 23/10/2024 10:14, Christian König wrote:

Am 23.10.24 um 09:38 schrieb Tvrtko Ursulin:


On 22/10/2024 17:24, Christian König wrote:

Am 22.10.24 um 17:17 schrieb Li, Yunxiang (Teddy):

[Public]


+static uint32_t fold_memtype(uint32_t memtype) {
In general please add prefixes to even static functions, e.g. 
amdgpu_vm_ or

amdgpu_bo_.

+   /* Squash private placements into 'cpu' to keep the legacy 
userspace view.

*/

+   switch (mem_type) {
+   case TTM_PL_VRAM:
+   case TTM_PL_TT:
+   return memtype
+   default:
+   return TTM_PL_SYSTEM;
+   }
+}
+
+static uint32_t bo_get_memtype(struct amdgpu_bo *bo) {

That whole function belongs into amdgpu_bo.c
Do you mean bo_get_memtype or fold_memtype? I debated whether 
bo_get_memtype should go into amdgpu_vm.c or amdgpu_bo.c, and since 
it's using fold_memtype and only useful for memory stats because of 
folding the private placements I just left them here together with 
the other mem stats code.


I can move it to amdgpu_bo.c make it return the memtype verbatim and 
just fold it when I do the accounting.


I think that folding GDS, GWS and OA into system is also a bug. We
should really not be doing that.


Just wanted to point out for this round that the code to query the 
current placement from a BO should probably go into amdgpu_bo.c and 
not amdgpu_vm.c





+   struct ttm_resource *res = bo->tbo.resource;
+   const uint32_t domain_to_pl[] = {
+   [ilog2(AMDGPU_GEM_DOMAIN_CPU)]  = TTM_PL_SYSTEM,
+   [ilog2(AMDGPU_GEM_DOMAIN_GTT)]  = TTM_PL_TT,
+   [ilog2(AMDGPU_GEM_DOMAIN_VRAM)] = TTM_PL_VRAM,
+   [ilog2(AMDGPU_GEM_DOMAIN_GDS)]  = AMDGPU_PL_GDS,
+   [ilog2(AMDGPU_GEM_DOMAIN_GWS)]  = AMDGPU_PL_GWS,
+   [ilog2(AMDGPU_GEM_DOMAIN_OA)]   = AMDGPU_PL_OA,
+   [ilog2(AMDGPU_GEM_DOMAIN_DOORBELL)] =

AMDGPU_PL_DOORBELL,

+   };
+   uint32_t domain;
+
+   if (res)
+   return fold_memtype(res->mem_type);
+
+   /*
+    * If no backing store use one of the preferred domain for basic
+    * stats. We take the MSB since that should give a reasonable
+    * view.
+    */
+   BUILD_BUG_ON(TTM_PL_VRAM < TTM_PL_TT || TTM_PL_VRAM <

TTM_PL_SYSTEM);

+   domain = fls(bo->preferred_domains & AMDGPU_GEM_DOMAIN_MASK);
+   if (drm_WARN_ON_ONCE(&adev->ddev,
+    domain == 0 || --domain >= 
ARRAY_SIZE(domain_to_pl)))
It's perfectly legal to create a BO without a placement. That one 
just won't have a

backing store.

This is lifted from the previous change I'm rebasing onto. I think 
what it’s trying to do is if the BO doesn't have a placement, use 
the "biggest" (VRAM > TT > SYSTEM) preferred placement for the 
purpose of accounting. Previously we just ignored BOs that don't
have a placement. I guess there's an argument for going with either
approach.


I was not arguing, I'm simply pointing out a bug. It's perfectly 
valid for bo->preferred_domains to be 0.


So the following WARN_ON() that no bit is set is incorrect.




+   return 0;
+   return fold_memtype(domain_to_pl[domain])
That would need speculative execution mitigation if I'm not completely
mistaken.


Better use a switch/case statement.


Do you mean change the array indexing to a switch statement?


Yes.


Did you mean array_index_nospec?


Yes.

Domain is not a direct userspace input and is calculated from the mask
which is sanitized to allowed values prior to this call. So I *think* a
switch is overkill but don't mind it either. Just commenting FWIW.


I missed that the mask is applied.

Thinking more about it I'm not sure if we should do this conversion in 
the first place. IIRC Tvrtko you once suggested a patch which switched a 
bunch of code to use the TTM placement instead of the UAPI flags.


Maybe 8fb0efb10184 ("drm/amdgpu: Reduce mem_type to domain double
indirection") is what you are thinking of?


Going more into this direction I think when we want to look at the 
current placement we should probably also use the TTM PL enumeration 
directly.


It does this already. The placement flags are just to "invent" a TTM PL 
enum when bo->tbo.resource == NULL.


if (!res) {
/*
 * If no backing store use one of the preferred domain 
for basic
 * stats. We take the MSB since that should give a 
reasonable

 * view.
 */
BUILD_BUG_ON(TTM_PL_VRAM < TTM_PL_TT ||
 TTM_PL_VRAM < TTM_PL_SYSTEM);
type = fls(bo->preferred_domains & AMDGPU_GEM_DOMAIN_MASK);
if (!type)
return;
type--;
if (drm_WARN_ON_ONCE(&adev->ddev,
 type >= ARRAY_SIZE(domain_to_pl)))
r

[PULL] drm-intel-gt-next

2024-10-23 Thread Tvrtko Ursulin



Hi Dave, Sima,

This is the main pull request for the 6.13 merge window.

The PXP GuC auto-teardown feature got enabled, GPU reset robustness was
improved for Haswell, and basic PMU functionality was enabled for Gen2 platforms.

The rest is a handful of small cleanups.

Regards,

Tvrtko

drm-intel-gt-next-2024-10-23:
Driver Changes:

Fixes/improvements/new stuff:

- Enable PXP GuC autoteardown flow [guc] (Juston Li)
- Retry RING_HEAD reset until it sticks [gt] (Nitin Gote)
- Add basic PMU support for gen2 [pmu] (Ville Syrjälä)

Miscellaneous:

- Prevent a possible int overflow in wq offsets [guc] (Nikita Zhandarovich)
- PMU code cleanups (Lucas De Marchi)
- Fixed "CPU" -> "GPU" typo [gt] (Zhang He)
- Gen2/3 interrupt handling cleanup (Ville Syrjälä)

The following changes since commit 596a7f1084e49cc65072c458c348861e9b9ceab9:

  drm/i915: Remove extra unlikely helper (2024-09-05 15:44:37 -0400)

are available in the Git repository at:

  https://gitlab.freedesktop.org/drm/i915/kernel.git tags/drm-intel-gt-next-2024-10-23

for you to fetch changes up to 6ef0e3ef2662db71d363af77ce31fa940bb7d525:

  drm/i915/gt: Retry RING_HEAD reset until it get sticks (2024-10-22 11:35:07 +0200)


Driver Changes:

Fixes/improvements/new stuff:

- Enable PXP GuC autoteardown flow [guc] (Juston Li)
- Retry RING_HEAD reset until it get sticks [gt] (Nitin Gote)
- Add basic PMU support for gen2 [pmu] (Ville Syrjälä)

Miscellaneous:

- Prevent a possible int overflow in wq offsets [guc] (Nikita Zhandarovich)
- PMU code cleanups (Lucas De Marchi)
- Fixed "CPU" -> "GPU" typo [gt] (Zhang He)
- Gen2/3 interrupt handling cleanup (Ville Syrjälä)


Juston Li (1):
  drm/i915/guc: Enable PXP GuC autoteardown flow

Lucas De Marchi (2):
  drm/i915/pmu: Drop is_igp()
  drm/i915/pmu: Use event_to_pmu()

Nikita Zhandarovich (1):
  drm/i915/guc: prevent a possible int overflow in wq offsets

Nitin Gote (1):
  drm/i915/gt: Retry RING_HEAD reset until it get sticks

Ville Syrjälä (3):
  drm/i915/gt: Nuke gen2_irq_{enable,disable}()
  drm/i915/gt: s/gen3/gen2/
  drm/i915/pmu: Add support for gen2

Zhang He (1):
  drm/i915/gt: Fixed "CPU" -> "GPU" typo

 drivers/gpu/drm/i915/gt/gen2_engine_cs.c  | 23 ++
 drivers/gpu/drm/i915/gt/gen2_engine_cs.h  |  6 +--
 drivers/gpu/drm/i915/gt/intel_engine_regs.h   |  2 +-
 drivers/gpu/drm/i915/gt/intel_gt_pm_debugfs.c |  2 +-
 drivers/gpu/drm/i915/gt/intel_ring_submission.c   | 38 
 drivers/gpu/drm/i915/gt/uc/intel_guc.c|  8 
 drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |  1 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c |  4 +-
 drivers/gpu/drm/i915/i915_drv.h   |  3 ++
 drivers/gpu/drm/i915/i915_pmu.c   | 54 +--
 drivers/gpu/drm/i915/pxp/intel_pxp.c  |  2 +-
 11 files changed, 82 insertions(+), 61 deletions(-)

Re: [PATCH v5 4/4] drm/amdgpu: track bo memory stats at runtime

2024-10-23 Thread Tvrtko Ursulin




On 22/10/2024 17:24, Christian König wrote:

Am 22.10.24 um 17:17 schrieb Li, Yunxiang (Teddy):

[Public]


+static uint32_t fold_memtype(uint32_t memtype) {
In general please add prefixes to even static functions, e.g. 
amdgpu_vm_ or

amdgpu_bo_.

+   /* Squash private placements into 'cpu' to keep the legacy 
userspace view.

*/

+   switch (mem_type) {
+   case TTM_PL_VRAM:
+   case TTM_PL_TT:
+   return memtype
+   default:
+   return TTM_PL_SYSTEM;
+   }
+}
+
+static uint32_t bo_get_memtype(struct amdgpu_bo *bo) {

That whole function belongs into amdgpu_bo.c
Do you mean bo_get_memtype or fold_memtype? I debated whether 
bo_get_memtype should go into amdgpu_vm.c or amdgpu_bo.c, and since 
it's using fold_memtype and only useful for memory stats because of 
folding the private placements I just left them here together with the 
other mem stats code.


I can move it to amdgpu_bo.c make it return the memtype verbatim and 
just fold it when I do the accounting.


I think that folding GDS, GWS and OA into system is also a bug. We
should really not be doing that.


Just wanted to point out for this round that the code to query the 
current placement from a BO should probably go into amdgpu_bo.c and not 
amdgpu_vm.c





+   struct ttm_resource *res = bo->tbo.resource;
+   const uint32_t domain_to_pl[] = {
+   [ilog2(AMDGPU_GEM_DOMAIN_CPU)]  = TTM_PL_SYSTEM,
+   [ilog2(AMDGPU_GEM_DOMAIN_GTT)]  = TTM_PL_TT,
+   [ilog2(AMDGPU_GEM_DOMAIN_VRAM)] = TTM_PL_VRAM,
+   [ilog2(AMDGPU_GEM_DOMAIN_GDS)]  = AMDGPU_PL_GDS,
+   [ilog2(AMDGPU_GEM_DOMAIN_GWS)]  = AMDGPU_PL_GWS,
+   [ilog2(AMDGPU_GEM_DOMAIN_OA)]   = AMDGPU_PL_OA,
+   [ilog2(AMDGPU_GEM_DOMAIN_DOORBELL)] =

AMDGPU_PL_DOORBELL,

+   };
+   uint32_t domain;
+
+   if (res)
+   return fold_memtype(res->mem_type);
+
+   /*
+    * If no backing store use one of the preferred domain for basic
+    * stats. We take the MSB since that should give a reasonable
+    * view.
+    */
+   BUILD_BUG_ON(TTM_PL_VRAM < TTM_PL_TT || TTM_PL_VRAM <

TTM_PL_SYSTEM);

+   domain = fls(bo->preferred_domains & AMDGPU_GEM_DOMAIN_MASK);
+   if (drm_WARN_ON_ONCE(&adev->ddev,
+    domain == 0 || --domain >= 
ARRAY_SIZE(domain_to_pl)))
It's perfectly legal to create a BO without a placement. That one 
just won't have a

backing store.

This is lifted from the previous change I'm rebasing onto. I think 
what it’s trying to do is if the BO doesn't have a placement, use the 
"biggest" (VRAM > TT > SYSTEM) preferred placement for the purpose of 
accounting. Previously we just ignored BOs that don't have a
placement. I guess there's an argument for going with either approach.


I was not arguing, I'm simply pointing out a bug. It's perfectly valid 
for bo->preferred_domains to be 0.


So the following WARN_ON() that no bit is set is incorrect.




+   return 0;
+   return fold_memtype(domain_to_pl[domain])
That would need speculative execution mitigation if I'm not completely
mistaken.


Better use a switch/case statement.


Do you mean change the array indexing to a switch statement?


Yes.


Did you mean array_index_nospec? Domain is not a direct userspace input
and is calculated from the mask which is sanitized to allowed values prior
to this call. So I *think* a switch is overkill but don't mind it
either. Just commenting FWIW.
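
For illustration only, here is a minimal userspace sketch of what the switch variant could look like. The SKETCH_* names and bit values are hypothetical stand-ins rather than the real AMDGPU_GEM_DOMAIN_* or TTM_PL_* definitions, and an empty preferred mask is treated as a valid case instead of a warning:

```c
#include <stdint.h>

/* Hypothetical stand-ins for the placement enum; values are illustrative. */
enum sketch_pl {
	SKETCH_PL_SYSTEM,
	SKETCH_PL_TT,
	SKETCH_PL_VRAM,
	SKETCH_PL_NONE,		/* BO has no preferred domain at all */
};

/* Hypothetical stand-ins for the UAPI domain bits. */
#define SKETCH_DOMAIN_CPU	0x1u
#define SKETCH_DOMAIN_GTT	0x2u
#define SKETCH_DOMAIN_VRAM	0x4u
#define SKETCH_DOMAIN_MASK	(SKETCH_DOMAIN_CPU | SKETCH_DOMAIN_GTT | \
				 SKETCH_DOMAIN_VRAM)

/* Return the most significant set bit of v as a mask, or 0 if v is 0. */
uint32_t sketch_highest_bit(uint32_t v)
{
	uint32_t bit = 0x80000000u;

	while (bit && !(v & bit))
		bit >>= 1;
	return bit;
}

/* Pick the "biggest" preferred placement without indexing an array. */
enum sketch_pl sketch_preferred_pl(uint32_t preferred_domains)
{
	switch (sketch_highest_bit(preferred_domains & SKETCH_DOMAIN_MASK)) {
	case SKETCH_DOMAIN_VRAM:
		return SKETCH_PL_VRAM;
	case SKETCH_DOMAIN_GTT:
		return SKETCH_PL_TT;
	case SKETCH_DOMAIN_CPU:
		return SKETCH_PL_SYSTEM;
	default:
		/* No usable bit set: legal, so no warning here. */
		return SKETCH_PL_NONE;
	}
}
```

With a switch there is no table lookup left to harden, at the cost of a few more lines than the array version.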


Regards,

Tvrtko

Re: [PATCH v5 4/4] drm/amdgpu: track bo memory stats at runtime

2024-10-23 Thread Tvrtko Ursulin




On 22/10/2024 18:06, Christian König wrote:

Am 22.10.24 um 18:46 schrieb Li, Yunxiang (Teddy):

[Public]

I suppose we could add a field like amd-memory-private: to cover the 
private placements.


No, that is not really appropriate either. GWS, GDS and OA are not 
memory in the first place.


Those BOs are HW blocks which the driver allocated to use.

So accounting them for the memory usage doesn't make any sense at all.

We could print them in the fdinfo as something special for statistics, 
but it's probably not that useful.



  When would a BO not have a placement, is it when it is being moved?


There are BOs which are only temporary, so when they are evicted their 
backing store is just discarded.


In addition to that, allocation of the backing store is sometimes delayed
until the first use.


Would this work correctly if the allowed mask was used instead of the preferred one?

Point being, to correctly support fdinfo stats drm-total-, *if* a BO 
*can* have a backing store at any point it should always be counted there.


*If* it currently has a placement it is drm-resident-.

If it has a placement but can be discarded it is drm-purgeable-. Etc.
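
To make the three rules above concrete, here is a small userspace sketch. The struct and field names are made up for illustration and do not correspond to amdgpu or DRM core structures:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of a buffer object. */
struct sketch_bo {
	uint64_t size;
	bool can_have_backing;	/* may get a backing store at some point */
	bool has_placement;	/* currently has a backing store */
	bool discardable;	/* backing store may be thrown away */
};

/* Hypothetical per-region counters mirroring the fdinfo key families. */
struct sketch_stats {
	uint64_t total;		/* drm-total-<region> */
	uint64_t resident;	/* drm-resident-<region> */
	uint64_t purgeable;	/* drm-purgeable-<region> */
};

void sketch_account(struct sketch_stats *s, const struct sketch_bo *bo)
{
	/* Objects which can never have a backing store are not memory. */
	if (!bo->can_have_backing)
		return;

	/* Counted in total even when nothing is allocated right now. */
	s->total += bo->size;

	if (!bo->has_placement)
		return;

	s->resident += bo->size;
	if (bo->discardable)
		s->purgeable += bo->size;
}
```

A BO whose allocation is delayed until first use would then show up in total only, and move into resident once it gets a placement.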

Regards,

Tvrtko



  Since we are tracking the state changes, I wonder if such situations 
can be avoided now so whenever we call these stat update functions the 
BO would always have a placement.


No, as I said before those use cases are perfectly valid. BOs don't need
a backing store nor do they need a placement.


So the code has to gracefully handle that.

Regards,
Christian.



Teddy

Re: [PATCH v5 2/4] drm/amdgpu: make drm-memory-* report resident memory

2024-10-18 Thread Tvrtko Ursulin




On 18/10/2024 14:33, Yunxiang Li wrote:

The old behavior reports the resident memory usage for this key and the
documentation says so as well. However this was accidentally changed to
include buffers that were evicted.

Fixes: a2529f67e2ed ("drm/amdgpu: Use drm_print_memory_stats helper from fdinfo")
Signed-off-by: Yunxiang Li 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c | 7 ---
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 1 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 1 -
  3 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
index 00a4ab082459f..8281dd45faaa0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
@@ -33,6 +33,7 @@
  #include 
  #include 
  #include 
+#include 
  
  #include "amdgpu.h"

  #include "amdgpu_vm.h"
@@ -95,11 +96,11 @@ void amdgpu_show_fdinfo(struct drm_printer *p, struct 
drm_file *file)
  
  	/* Legacy amdgpu keys, alias to drm-resident-memory-: */

drm_printf(p, "drm-memory-vram:\t%llu KiB\n",
-  stats[TTM_PL_VRAM].total/1024UL);
+  stats[TTM_PL_VRAM].drm.resident/1024UL);
drm_printf(p, "drm-memory-gtt: \t%llu KiB\n",
-  stats[TTM_PL_TT].total/1024UL);
+  stats[TTM_PL_TT].drm.resident/1024UL);
drm_printf(p, "drm-memory-cpu: \t%llu KiB\n",
-  stats[TTM_PL_SYSTEM].total/1024UL);
+  stats[TTM_PL_SYSTEM].drm.resident/1024UL);
  
  	/* Amdgpu specific memory accounting keys: */

drm_printf(p, "amd-memory-visible-vram:\t%llu KiB\n",
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 045222b6bd049..2a53e72f3964f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -1223,7 +1223,6 @@ void amdgpu_bo_get_memory(struct amdgpu_bo *bo,
  
  	/* DRM stats common fields: */
  
-	stats[type].total += size;

if (drm_gem_object_is_shared_for_memory_stats(obj))
stats[type].drm.shared += size;
else
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
index 7260349917ef0..a5653f474f85c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
@@ -142,7 +142,6 @@ struct amdgpu_bo_vm {
  struct amdgpu_mem_stats {
struct drm_memory_stats drm;
  
-	uint64_t total;

uint64_t visible;
uint64_t evicted;
uint64_t evicted_visible;


LGTM, thanks for fixing it!

Reviewed-by: Tvrtko Ursulin 

Regards,

Tvrtko

[PATCH 1/5] drm/sched: Optimise drm_sched_entity_push_job

2024-10-16 Thread Tvrtko Ursulin

From: Tvrtko Ursulin 

In FIFO mode (which is the default), both drm_sched_entity_push_job() and
drm_sched_rq_update_fifo(), where the former calls the latter, are
currently taking and releasing the same entity->rq_lock.

We can avoid that design inelegance, and also have a minuscule
efficiency improvement on the submit from idle path, by introducing a new
drm_sched_rq_update_fifo_locked() helper and pulling up the lock taking to
its callers.

v2:
 * Remove drm_sched_rq_update_fifo() altogether. (Christian)

v3:
 * Improved commit message. (Philipp)

Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Cc: Alex Deucher 
Cc: Luben Tuikov 
Cc: Matthew Brost 
Cc: Philipp Stanner 
Reviewed-by: Christian König 
---
 drivers/gpu/drm/scheduler/sched_entity.c | 13 +
 drivers/gpu/drm/scheduler/sched_main.c   |  6 +++---
 include/drm/gpu_scheduler.h  |  2 +-
 3 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c
index 2951fcc2e6b1..b72cba292839 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -514,8 +514,12 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct 
drm_sched_entity *entity)
struct drm_sched_job *next;
 
next = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
-   if (next)
-   drm_sched_rq_update_fifo(entity, next->submit_ts);
+   if (next) {
+   spin_lock(&entity->rq_lock);
+   drm_sched_rq_update_fifo_locked(entity,
+   next->submit_ts);
+   spin_unlock(&entity->rq_lock);
+   }
}
 
/* Jobs and entities might have different lifecycles. Since we're
@@ -613,10 +617,11 @@ void drm_sched_entity_push_job(struct drm_sched_job 
*sched_job)
sched = rq->sched;
 
drm_sched_rq_add_entity(rq, entity);
-   spin_unlock(&entity->rq_lock);
 
if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
-   drm_sched_rq_update_fifo(entity, submit_ts);
+   drm_sched_rq_update_fifo_locked(entity, submit_ts);
+
+   spin_unlock(&entity->rq_lock);
 
drm_sched_wakeup(sched);
}
diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index e32b0f7d7e94..bbd1630407e4 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -169,14 +169,15 @@ static inline void drm_sched_rq_remove_fifo_locked(struct 
drm_sched_entity *enti
}
 }
 
-void drm_sched_rq_update_fifo(struct drm_sched_entity *entity, ktime_t ts)
+void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, ktime_t 
ts)
 {
/*
 * Both locks need to be grabbed, one to protect from entity->rq change
 * for entity from within concurrent drm_sched_entity_select_rq and the
 * other to update the rb tree structure.
 */
-   spin_lock(&entity->rq_lock);
+   lockdep_assert_held(&entity->rq_lock);
+
spin_lock(&entity->rq->lock);
 
drm_sched_rq_remove_fifo_locked(entity);
@@ -187,7 +188,6 @@ void drm_sched_rq_update_fifo(struct drm_sched_entity 
*entity, ktime_t ts)
  drm_sched_entity_compare_before);
 
spin_unlock(&entity->rq->lock);
-   spin_unlock(&entity->rq_lock);
 }
 
 /**
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index e9f075f51db3..3658a6cb048e 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -593,7 +593,7 @@ void drm_sched_rq_add_entity(struct drm_sched_rq *rq,
 void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
struct drm_sched_entity *entity);
 
-void drm_sched_rq_update_fifo(struct drm_sched_entity *entity, ktime_t ts);
+void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, ktime_t 
ts);
 
 int drm_sched_entity_init(struct drm_sched_entity *entity,
  enum drm_sched_priority priority,
-- 
2.46.0

[PATCH 2/5] drm/sched: Stop setting current entity in FIFO mode

2024-10-16 Thread Tvrtko Ursulin

From: Tvrtko Ursulin 

It does not seem there is a need to set the current entity in FIFO mode
since it only serves as being a "cursor" in round-robin mode. Even if
scheduling mode is changed at runtime the change in behaviour is simply
to restart from the first entity, instead of continuing in RR mode from
where FIFO left it, and that sounds completely fine.

Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Cc: Alex Deucher 
Cc: Luben Tuikov 
Cc: Matthew Brost 
Cc: Philipp Stanner 
Acked-by: Christian König 
Reviewed-by: Philipp Stanner 
---
 drivers/gpu/drm/scheduler/sched_main.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index bbd1630407e4..07ee386b8e4b 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -355,7 +355,6 @@ drm_sched_rq_select_entity_fifo(struct drm_gpu_scheduler 
*sched,
return ERR_PTR(-ENOSPC);
}

-   rq->current_entity = entity;
reinit_completion(&entity->entity_idle);
break;
}
-- 
2.46.0

[PATCH v2 0/5] Small DRM scheduler improvements

2024-10-16 Thread Tvrtko Ursulin

From: Tvrtko Ursulin 

Leftovers from the earlier "DRM scheduler fixes and improvements" series.

It looks like the fixes have now propagated back to drm-misc-next so this should now
be mergeable.

It also needed a small rebase to account for one revert and one spelling fix
which landed in the meantime.

As a reminder, what remains are kerneldoc improvements, struct layout tweaks for
clarity, one trivial cleanup for the FIFO mode, and most importantly two spin
lock-unlock cycles are removed from the push job path by pulling the taking of
the locks one level up.
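
As a rough illustration of the "pull the lock one level up" pattern, here is a userspace sketch. The toy lock only counts acquisitions and is not a real spinlock, and all sketch_* names are hypothetical rather than scheduler API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy lock which only tracks state and counts acquisitions. */
struct sketch_lock {
	bool held;
	unsigned int acquisitions;
};

void sketch_lock_take(struct sketch_lock *l)
{
	assert(!l->held);
	l->held = true;
	l->acquisitions++;
}

void sketch_lock_release(struct sketch_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* Hypothetical, heavily simplified entity. */
struct sketch_entity {
	struct sketch_lock lock;
	bool on_rq;
	uint64_t oldest_job_waiting;
};

/* Caller must hold entity->lock, similar in spirit to lockdep_assert_held(). */
void sketch_update_fifo_locked(struct sketch_entity *entity, uint64_t ts)
{
	assert(entity->lock.held);
	entity->oldest_job_waiting = ts;
}

/* One lock-unlock cycle covers both the add and the FIFO update. */
void sketch_push_job(struct sketch_entity *entity, uint64_t ts)
{
	sketch_lock_take(&entity->lock);
	entity->on_rq = true;
	sketch_update_fifo_locked(entity, ts);
	sketch_lock_release(&entity->lock);
}
```

If the helper took the lock itself, the same push would need two acquisitions instead of one.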

I smoke tested it on the Steam Deck and lockdep seems happy.

v2:
 * Tweaks to commit messages and rename of some leftover rq_lock naming inside
   kerneldoc.

Cc: Christian König 
Cc: Philipp Stanner 

Tvrtko Ursulin (5):
  drm/sched: Optimise drm_sched_entity_push_job
  drm/sched: Stop setting current entity in FIFO mode
  drm/sched: Re-order struct drm_sched_rq members for clarity
  drm/sched: Re-group and rename the entity run-queue lock
  drm/sched: Further optimise drm_sched_entity_push_job

 drivers/gpu/drm/scheduler/sched_entity.c | 42 +++-
 drivers/gpu/drm/scheduler/sched_main.c   | 32 +-
 include/drm/gpu_scheduler.h  | 34 ++-
 3 files changed, 61 insertions(+), 47 deletions(-)

-- 
2.46.0

[PATCH 5/5] drm/sched: Further optimise drm_sched_entity_push_job

2024-10-16 Thread Tvrtko Ursulin

From: Tvrtko Ursulin 

Having removed one re-lock cycle on the entity->lock in a patch titled
"drm/sched: Optimise drm_sched_entity_push_job", with only a tiny bit
larger refactoring we can do the same optimisation on the rq->lock.
(Currently both drm_sched_rq_add_entity() and
drm_sched_rq_update_fifo_locked() take and release the same lock.)

To achieve this we make drm_sched_rq_update_fifo_locked() and
drm_sched_rq_add_entity() expect the rq->lock to be held.

We also align drm_sched_rq_update_fifo_locked(),
drm_sched_rq_add_entity() and
drm_sched_rq_remove_fifo_locked() function signatures, by adding rq as a
parameter to the latter.

v2:
 * Fix after rebase of the series.
 * Avoid naming inconsistency between drm_sched_rq_add/remove. (Christian)

Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Cc: Alex Deucher 
Cc: Luben Tuikov 
Cc: Matthew Brost 
Cc: Philipp Stanner 
Reviewed-by: Christian König 
Reviewed-by: Philipp Stanner 
---
 drivers/gpu/drm/scheduler/sched_entity.c | 11 +++--
 drivers/gpu/drm/scheduler/sched_main.c   | 29 
 include/drm/gpu_scheduler.h  |  3 ++-
 3 files changed, 25 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c
index c013c2b49aa5..69bcf0e99d57 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -515,9 +515,14 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct 
drm_sched_entity *entity)
 
next = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
if (next) {
+   struct drm_sched_rq *rq;
+
spin_lock(&entity->lock);
-   drm_sched_rq_update_fifo_locked(entity,
+   rq = entity->rq;
+   spin_lock(&rq->lock);
+   drm_sched_rq_update_fifo_locked(entity, rq,
next->submit_ts);
+   spin_unlock(&rq->lock);
spin_unlock(&entity->lock);
}
}
@@ -616,11 +621,13 @@ void drm_sched_entity_push_job(struct drm_sched_job 
*sched_job)
rq = entity->rq;
sched = rq->sched;
 
+   spin_lock(&rq->lock);
drm_sched_rq_add_entity(rq, entity);
 
if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
-   drm_sched_rq_update_fifo_locked(entity, submit_ts);
+   drm_sched_rq_update_fifo_locked(entity, rq, submit_ts);
 
+   spin_unlock(&rq->lock);
spin_unlock(&entity->lock);
 
drm_sched_wakeup(sched);
diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index 2670bf9f34b2..6e4d004d09ce 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -159,17 +159,18 @@ static __always_inline bool 
drm_sched_entity_compare_before(struct rb_node *a,
return ktime_before(ent_a->oldest_job_waiting, 
ent_b->oldest_job_waiting);
 }
 
-static inline void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity 
*entity)
+static void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *entity,
+   struct drm_sched_rq *rq)
 {
-   struct drm_sched_rq *rq = entity->rq;
-
if (!RB_EMPTY_NODE(&entity->rb_tree_node)) {
rb_erase_cached(&entity->rb_tree_node, &rq->rb_tree_root);
RB_CLEAR_NODE(&entity->rb_tree_node);
}
 }
 
-void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, ktime_t 
ts)
+void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity,
+struct drm_sched_rq *rq,
+ktime_t ts)
 {
/*
 * Both locks need to be grabbed, one to protect from entity->rq change
@@ -177,17 +178,14 @@ void drm_sched_rq_update_fifo_locked(struct 
drm_sched_entity *entity, ktime_t ts
 * other to update the rb tree structure.
 */
lockdep_assert_held(&entity->lock);
+   lockdep_assert_held(&rq->lock);
 
-   spin_lock(&entity->rq->lock);
-
-   drm_sched_rq_remove_fifo_locked(entity);
+   drm_sched_rq_remove_fifo_locked(entity, rq);
 
entity->oldest_job_waiting = ts;
 
-   rb_add_cached(&entity->rb_tree_node, &entity->rq->rb_tree_root,
+   rb_add_cached(&entity->rb_tree_node, &rq->rb_tree_root,
  drm_sched_entity_compare_before);
-
-   spin_unlock(&entity->rq->lock);
 }
 
 /**
@@ -219,15 +217,14 @@ static void drm_sched_rq_init(struct drm_gpu_scheduler 
*sched,
 void drm_sched_rq_ad

[PATCH 3/5] drm/sched: Re-order struct drm_sched_rq members for clarity

2024-10-16 Thread Tvrtko Ursulin

From: Tvrtko Ursulin 

Current kerneldoc for struct drm_sched_rq incompletely documents what
fields are protected by the lock.

This is not good because it is misleading.

Lets fix it by listing all the elements which are protected by the lock.

While at it, lets also re-order the members so all protected by the lock
are in a single group.

v2:
 * Refer variables by kerneldoc syntax, more verbose commit text. (Philipp)

Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Cc: Alex Deucher 
Cc: Luben Tuikov 
Cc: Matthew Brost 
Cc: Philipp Stanner 
Reviewed-by: Christian König 
Reviewed-by: Philipp Stanner 
---
 include/drm/gpu_scheduler.h | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 3658a6cb048e..b6d095074c19 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -243,10 +243,10 @@ struct drm_sched_entity {
 /**
  * struct drm_sched_rq - queue of entities to be scheduled.
  *
- * @lock: to modify the entities list.
  * @sched: the scheduler to which this rq belongs to.
- * @entities: list of the entities to be scheduled.
+ * @lock: protects @entities, @rb_tree_root and @current_entity.
  * @current_entity: the entity which is to be scheduled.
+ * @entities: list of the entities to be scheduled.
  * @rb_tree_root: root of time based priority queue of entities for FIFO 
scheduling
  *
  * Run queue is a set of entities scheduling command submissions for
@@ -254,10 +254,12 @@ struct drm_sched_entity {
  * the next entity to emit commands from.
  */
 struct drm_sched_rq {
-   spinlock_t  lock;
struct drm_gpu_scheduler*sched;
-   struct list_headentities;
+
+   spinlock_t  lock;
+   /* Following members are protected by the @lock: */
struct drm_sched_entity *current_entity;
+   struct list_headentities;
struct rb_root_cached   rb_tree_root;
 };
 
-- 
2.46.0
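
For illustration, the layout convention this patch applies to struct drm_sched_rq (unprotected members first, then the lock, then everything the lock protects in one group) can be sketched in userspace C. This is only a sketch under stated assumptions: a pthread mutex stands in for the kernel spinlock, and demo_rq, demo_rq_entity, nr_entities and the helper functions are hypothetical names that do not exist in the scheduler.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for struct drm_sched_entity. */
struct demo_rq_entity {
	int id;
	struct demo_rq_entity *next;
};

/* Hypothetical userspace analogue of struct drm_sched_rq. */
struct demo_rq {
	const char *name;	/* set once at init, not protected by the lock */

	pthread_mutex_t lock;
	/* Following members are protected by the lock: */
	struct demo_rq_entity *current_entity;
	struct demo_rq_entity *entities;
	size_t nr_entities;	/* illustration only, not part of the patch */
};

static void demo_rq_init(struct demo_rq *rq, const char *name)
{
	rq->name = name;
	pthread_mutex_init(&rq->lock, NULL);
	rq->current_entity = NULL;
	rq->entities = NULL;
	rq->nr_entities = 0;
}

/* Push an entity onto the list; every protected member is touched under the lock. */
static void demo_rq_add_entity(struct demo_rq *rq, struct demo_rq_entity *entity)
{
	assert(rq && entity);

	pthread_mutex_lock(&rq->lock);
	entity->next = rq->entities;
	rq->entities = entity;
	rq->nr_entities++;
	pthread_mutex_unlock(&rq->lock);
}

/* Read a protected member; readers take the lock as well. */
static size_t demo_rq_count(struct demo_rq *rq)
{
	size_t n;

	pthread_mutex_lock(&rq->lock);
	n = rq->nr_entities;
	pthread_mutex_unlock(&rq->lock);

	return n;
}
```

With the members grouped this way, a reviewer can check any access to current_entity, entities or nr_entities against a single comment instead of reading the whole kerneldoc block.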

[PATCH 4/5] drm/sched: Re-group and rename the entity run-queue lock

2024-10-16 Thread Tvrtko Ursulin

From: Tvrtko Ursulin 

When writing to a drm_sched_entity's run-queue, writers are protected
through the lock drm_sched_entity.rq_lock. This naming, however,
frequently collides with the separate internal lock of struct
drm_sched_rq, resulting in uses like this:

spin_lock(&entity->rq_lock);
spin_lock(&entity->rq->lock);

Rename drm_sched_entity.rq_lock to improve readability. While at it,
re-order that struct's members to make it more obvious what the lock
protects.

v2:
 * Rename some rq_lock straddlers in kerneldoc, improve commit text. (Philipp)

Signed-off-by: Tvrtko Ursulin 
Suggested-by: Christian König 
Cc: Alex Deucher 
Cc: Luben Tuikov 
Cc: Matthew Brost 
Cc: Philipp Stanner 
Reviewed-by: Christian König 
---
 drivers/gpu/drm/scheduler/sched_entity.c | 28 
 drivers/gpu/drm/scheduler/sched_main.c   |  2 +-
 include/drm/gpu_scheduler.h  | 21 +-
 3 files changed, 26 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c
index b72cba292839..c013c2b49aa5 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -105,7 +105,7 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
/* We start in an idle state. */
complete_all(&entity->entity_idle);
 
-   spin_lock_init(&entity->rq_lock);
+   spin_lock_init(&entity->lock);
spsc_queue_init(&entity->job_queue);
 
atomic_set(&entity->fence_seq, 0);
@@ -133,10 +133,10 @@ void drm_sched_entity_modify_sched(struct 
drm_sched_entity *entity,
 {
WARN_ON(!num_sched_list || !sched_list);
 
-   spin_lock(&entity->rq_lock);
+   spin_lock(&entity->lock);
entity->sched_list = sched_list;
entity->num_sched_list = num_sched_list;
-   spin_unlock(&entity->rq_lock);
+   spin_unlock(&entity->lock);
 }
 EXPORT_SYMBOL(drm_sched_entity_modify_sched);
 
@@ -244,10 +244,10 @@ static void drm_sched_entity_kill(struct drm_sched_entity 
*entity)
if (!entity->rq)
return;
 
-   spin_lock(&entity->rq_lock);
+   spin_lock(&entity->lock);
entity->stopped = true;
drm_sched_rq_remove_entity(entity->rq, entity);
-   spin_unlock(&entity->rq_lock);
+   spin_unlock(&entity->lock);
 
/* Make sure this entity is not used by the scheduler at the moment */
wait_for_completion(&entity->entity_idle);
@@ -396,9 +396,9 @@ static void drm_sched_entity_wakeup(struct dma_fence *f,
 void drm_sched_entity_set_priority(struct drm_sched_entity *entity,
   enum drm_sched_priority priority)
 {
-   spin_lock(&entity->rq_lock);
+   spin_lock(&entity->lock);
entity->priority = priority;
-   spin_unlock(&entity->rq_lock);
+   spin_unlock(&entity->lock);
 }
 EXPORT_SYMBOL(drm_sched_entity_set_priority);
 
@@ -515,10 +515,10 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct 
drm_sched_entity *entity)
 
next = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
if (next) {
-   spin_lock(&entity->rq_lock);
+   spin_lock(&entity->lock);
drm_sched_rq_update_fifo_locked(entity,
next->submit_ts);
-   spin_unlock(&entity->rq_lock);
+   spin_unlock(&entity->lock);
}
}
 
@@ -559,14 +559,14 @@ void drm_sched_entity_select_rq(struct drm_sched_entity 
*entity)
if (fence && !dma_fence_is_signaled(fence))
return;
 
-   spin_lock(&entity->rq_lock);
+   spin_lock(&entity->lock);
sched = drm_sched_pick_best(entity->sched_list, entity->num_sched_list);
rq = sched ? sched->sched_rq[entity->priority] : NULL;
if (rq != entity->rq) {
drm_sched_rq_remove_entity(entity->rq, entity);
entity->rq = rq;
}
-   spin_unlock(&entity->rq_lock);
+   spin_unlock(&entity->lock);
 
if (entity->num_sched_list == 1)
entity->sched_list = NULL;
@@ -605,9 +605,9 @@ void drm_sched_entity_push_job(struct drm_sched_job 
*sched_job)
struct drm_sched_rq *rq;
 
/* Add the entity to the run queue */
-   spin_lock(&entity->rq_lock);
+   spin_lock(&entity->lock);
if (entity->stopped) {
-   spin_unlock(&entity->rq_lock);
+   spin_unlock(&entity->lock);
 
DRM_ERROR("Trying to push t

Re: [RFC PATCH 1/2] drm/drm_file: Add display of driver's internal memory size

2024-10-16 Thread Tvrtko Ursulin




On 15/10/2024 20:05, Adrián Larumbe wrote:

Hi Tvrtko,

On 10.10.2024 10:50, Tvrtko Ursulin wrote:


On 09/10/2024 23:55, Adrián Larumbe wrote:

Hi Tvrtko,

On 04.10.2024 14:41, Tvrtko Ursulin wrote:


Hi Adrian,

On 03/10/2024 00:45, Adrián Larumbe wrote:

Some drivers must allocate a considerable amount of memory for bookkeeping
structures and GPU's MCU-kernel shared communication regions. These are
often created as a result of the invocation of the driver's ioctl()
interface functions, so it is sensible to consider them as being owned by
the render context associated with an open drm file.

However, at the moment drm_show_memory_stats only traverses the UM-exposed
drm objects for which a handle exists. Private driver objects and memory
regions, though connected to a render context, are unaccounted for in their
fdinfo numbers.

Add a new drm_memory_stats 'internal' memory category.

Because deciding what constitutes an 'internal' object and where to find
these are driver-dependent, calculation of this size must be done through a
driver-provided function pointer, which becomes the third argument of
drm_show_memory_stats. Drivers which have no interest in exposing the size
of internal memory objects can keep passing NULL for unaltered behaviour.

Signed-off-by: Adrián Larumbe 
Cc: Rob Clark 
Cc: Tvrtko Ursulin 
Cc: Lucas De Marchi 
---
drivers/gpu/drm/drm_file.c  | 6 +-
drivers/gpu/drm/msm/msm_drv.c   | 2 +-
drivers/gpu/drm/panfrost/panfrost_drv.c | 2 +-
drivers/gpu/drm/v3d/v3d_drv.c   | 2 +-
include/drm/drm_file.h  | 7 ++-
5 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
index ad1dc638c83b..937471339c9a 100644
--- a/drivers/gpu/drm/drm_file.c
+++ b/drivers/gpu/drm/drm_file.c
@@ -856,6 +856,7 @@ void drm_print_memory_stats(struct drm_printer *p,
print_size(p, "total", region, stats->private + stats->shared);
print_size(p, "shared", region, stats->shared);
print_size(p, "active", region, stats->active);
+   print_size(p, "internal", region, stats->internal);
if (supported_status & DRM_GEM_OBJECT_RESIDENT)
print_size(p, "resident", region, stats->resident);
@@ -873,7 +874,7 @@ EXPORT_SYMBOL(drm_print_memory_stats);
 * Helper to iterate over GEM objects with a handle allocated in the 
specified
 * file.
 */
-void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file)
+void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file, 
internal_bos func)
{
struct drm_gem_object *obj;
struct drm_memory_stats status = {};
@@ -919,6 +920,9 @@ void drm_show_memory_stats(struct drm_printer *p, struct 
drm_file *file)
}
spin_unlock(&file->table_lock);
+   if (func)
+   func(&status, file);
+
drm_print_memory_stats(p, &status, supported_status, "memory");
}
EXPORT_SYMBOL(drm_show_memory_stats);
diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index edbc1ab0fbc8..2b3feb79afc4 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -880,7 +880,7 @@ static void msm_show_fdinfo(struct drm_printer *p, struct 
drm_file *file)
msm_gpu_show_fdinfo(priv->gpu, file->driver_priv, p);
-   drm_show_memory_stats(p, file);
+   drm_show_memory_stats(p, file, NULL);
}
static const struct file_operations fops = {
diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c 
b/drivers/gpu/drm/panfrost/panfrost_drv.c
index 04d615df5259..aaa8602bf00d 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -609,7 +609,7 @@ static void panfrost_show_fdinfo(struct drm_printer *p, 
struct drm_file *file)
panfrost_gpu_show_fdinfo(pfdev, file->driver_priv, p);
-   drm_show_memory_stats(p, file);
+   drm_show_memory_stats(p, file, NULL);
}
static const struct file_operations panfrost_drm_driver_fops = {
diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
index fb35c5c3f1a7..314e77c67972 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -195,7 +195,7 @@ static void v3d_show_fdinfo(struct drm_printer *p, struct 
drm_file *file)
   v3d_queue_to_string(queue), jobs_completed);
}
-   drm_show_memory_stats(p, file);
+   drm_show_memory_stats(p, file, NULL);
}
static const struct file_operations v3d_drm_fops = {
diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
index 8c0030c77308..661d00d5350e 100644
--- a/include/drm/drm_file.h
+++ b/include/drm/drm_file.h
@@ -469,6 +469,7 @@ void drm_send_event_timestamp_locked(struct drm_device *dev,
 * @resident: Total size of GE

Re: [PATCH 1/5] drm/sched: Optimise drm_sched_entity_push_job

2024-10-16 Thread Tvrtko Ursulin




On 15/10/2024 15:00, Philipp Stanner wrote:

On Tue, 2024-10-15 at 14:14 +0100, Tvrtko Ursulin wrote:


On 15/10/2024 12:38, Philipp Stanner wrote:

On Tue, 2024-10-15 at 09:12 +0100, Tvrtko Ursulin wrote:


On 15/10/2024 08:11, Philipp Stanner wrote:

On Mon, 2024-10-14 at 13:07 +0100, Tvrtko Ursulin wrote:


On 14/10/2024 12:32, Philipp Stanner wrote:

Hi,

On Mon, 2024-10-14 at 11:46 +0100, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

In FIFO mode we can avoid dropping the lock only to immediately re-acquire
by adding a new drm_sched_rq_update_fifo_locked() helper.



Please write detailed commit messages, as described here [1].
   1. Describe the problem: current state and why it's bad.
   2. Then, describe in imperative (present tense) form what the commit
      does about the problem.


Both pieces of info are already there:

1. Drops the lock to immediately re-acquire it.
2. We avoid that by adding a locked helper.

Optionally, in between can be information about why it's solved this
way and not another etc.

Applies to the other patches, too.


[1]
https://www.kernel.org/doc/html/latest/process/submitting-patches.html#describe-your-changes


Thanks I am new here and did not know this.

Seriously, let's not be too blindly strict about this because it can get
IMO ridiculous.

One example when I previously accommodated your request is patch 3/5 from
this series:

"""
Current kerneldoc for struct drm_sched_rq incompletely documents what
fields are protected by the lock.

This is not good because it is misleading.

Lets fix it by listing all the elements which are protected by the lock.
"""

While this was the original commit text you weren't happy with:

"""
drm/sched: Re-order struct drm_sched_rq members for clarity

Lets re-order the members to make it clear which are protected by the lock
and at the same time document it via kerneldoc.
"""

I maintain the original text was passable.

On top, this was just a respin to accommodate the merge process. All
approvals were done and dusted a couple of weeks or so ago, so asking for yet
another respin for such trivial objections is not great.


I understand that you're unhappy, but please understand the position
I'm coming from. As you know, since you sent these patches within a
different series (and, thus, since I reviewed them), I was trusted with
co-maintaining this piece of shared infrastructure.

And since you've worked on it a bit now, I suppose you also know that
the GPU Scheduler is arguably in quite a bad shape, has far too little
documentation, has leaks, maybe race conditions, parts *where the
locking rules are unclear* and is probably only fully understood by a
small handful of people. I also argue that this is a *very*
complicated piece of software.


We already went over that and agreed. Not least I agreed the base is
shaky since few years ago. :)

Btw if things align, I hope you will at some point see a follow up
series from me which makes some significant simplifications and
improvements at the same time.


Cool, good to hear!
(Would be even cooler if simplifications and improvements can be
delivered through separate patch series to be easier to review etc.)


Yes, when I spot something I pull it ahead and/or standalone when it
makes sense. But it is early days and a big job.


So I might be or appear to be a bit pedantic, but I'm not doing that to
terrorize you, but because I want this thing to become well documented,
understandable, and bisectable. Working towards a canonical, idiot-proof
commit style is one measure that will help with that.

I want to offer you the following: I can be more relaxed with things
universally recognized as trivial (comment changes, struct member
reordering) – but when something like a lock is touched in any way, we
shall document that in the commit message as canonically as possible,
so someone who's less experienced and just bisected the commit
immediately understands what has been done (or rather: was supposed to
be done).


So how would you suggest to expand this commit text so it doesn't read
too self-repeating?


My issue with this particular commit message is mainly that it doesn't
make it obvious what the patch is supposed to do. So one can make it
quicker and better to review by detailing it a bit more, so the
reviewer then can compare commit message vs. what the code does. It
seems to me for example that the actual optimization is being done in
drm_sched_entity_push_job(), and drm_sched_entity_pop_job() had to be
ported, too, for correctness


"It seems" aka the commit title says so. ;)


Another small thing that might be cool is something that makes it a bit
more obvious that this is an optimization, not a fix.

So I would probably write:

"So far, drm_sched_rq_update_fifo() automatically takes
drm_sched_entity.rq_lock. For DRM_SCHED_POLICY_FIFO, this is
ineffic

Re: [PATCH 4/5] drm/sched: Re-group and rename the entity run-queue lock

2024-10-15 Thread Tvrtko Ursulin




On 15/10/2024 12:56, Philipp Stanner wrote:

On Mon, 2024-10-14 at 11:46 +0100, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

Christian suggested to rename the lock and improve the documentation


Let's move it to Annotators:
Suggested-by: Christian König 


Ack.


(Otherwise some time in the future a Christian Kaiser might start
working on the scheduler and steal the praise ^^)


of
what it protects.


So without Christian's name here I'd phrase it as:
"When writing to a drm_sched_entity's run-queue, writers are protected
through the lock drm_sched_entity.rq_lock. This naming, however,
frequently collides with the separate internal lock of struct
drm_sched_rq, resulting in uses like this:

spin_lock(&entity->rq_lock);
spin_lock(&entity->rq->lock);

Rename drm_sched_entity.rq_lock to improve readability. While at it,
re-order that struct's members to make it more obvious what the lock
protects.


Will copy&paste - thanks for typing it out.


And to also re-order the structure members so all
protected by the lock are together in a block.





Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Cc: Alex Deucher 
Cc: Luben Tuikov 
Cc: Matthew Brost 
Cc: Philipp Stanner 
Reviewed-by: Christian König 
---
  drivers/gpu/drm/scheduler/sched_entity.c | 28 --
--
  drivers/gpu/drm/scheduler/sched_main.c   |  2 +-
  include/drm/gpu_scheduler.h  | 15 +++--
  3 files changed, 23 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c
b/drivers/gpu/drm/scheduler/sched_entity.c
index b72cba292839..c013c2b49aa5 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -105,7 +105,7 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
 	/* We start in an idle state. */
 	complete_all(&entity->entity_idle);
 
-	spin_lock_init(&entity->rq_lock);
+	spin_lock_init(&entity->lock);
 	spsc_queue_init(&entity->job_queue);
 
 	atomic_set(&entity->fence_seq, 0);
@@ -133,10 +133,10 @@ void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
 {
 	WARN_ON(!num_sched_list || !sched_list);
 
-	spin_lock(&entity->rq_lock);
+	spin_lock(&entity->lock);
 	entity->sched_list = sched_list;
 	entity->num_sched_list = num_sched_list;
-	spin_unlock(&entity->rq_lock);
+	spin_unlock(&entity->lock);
 }
 EXPORT_SYMBOL(drm_sched_entity_modify_sched);
 
@@ -244,10 +244,10 @@ static void drm_sched_entity_kill(struct drm_sched_entity *entity)
 	if (!entity->rq)
 		return;
 
-	spin_lock(&entity->rq_lock);
+	spin_lock(&entity->lock);
 	entity->stopped = true;
 	drm_sched_rq_remove_entity(entity->rq, entity);
-	spin_unlock(&entity->rq_lock);
+	spin_unlock(&entity->lock);
 
 	/* Make sure this entity is not used by the scheduler at the moment */
 	wait_for_completion(&entity->entity_idle);
@@ -396,9 +396,9 @@ static void drm_sched_entity_wakeup(struct dma_fence *f,
 void drm_sched_entity_set_priority(struct drm_sched_entity *entity,
 				   enum drm_sched_priority priority)
 {
-	spin_lock(&entity->rq_lock);
+	spin_lock(&entity->lock);
 	entity->priority = priority;
-	spin_unlock(&entity->rq_lock);
+	spin_unlock(&entity->lock);
 }
 EXPORT_SYMBOL(drm_sched_entity_set_priority);
 
@@ -515,10 +515,10 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
 
 		next = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
 		if (next) {
-			spin_lock(&entity->rq_lock);
+			spin_lock(&entity->lock);
 			drm_sched_rq_update_fifo_locked(entity,
 							next->submit_ts);
-			spin_unlock(&entity->rq_lock);
+			spin_unlock(&entity->lock);
 		}
 	}
 
@@ -559,14 +559,14 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
 	if (fence && !dma_fence_is_signaled(fence))
 		return;
 
-	spin_lock(&entity->rq_lock);
+	spin_lock(&entity->lock);
 	sched = drm_sched_pick_best(entity->sched_list, entity->num_sched_list);
 	rq = sched ? sched->sched_rq[entity->priority] : NULL;
 	if (rq != entity->rq) {
 		drm_sched_rq_remove_entity(entity->rq, entity);
 		entity->rq = rq;
 	}
-	spin_unlock(&entity->rq_lock);
+	spin_unlock(&entity->lock);
 
 	if (entity->num_sched_list == 1)
 		entity->sched_list = NULL;
@@ -60

Re: [PATCH 1/5] drm/sched: Optimise drm_sched_entity_push_job

2024-10-15 Thread Tvrtko Ursulin




On 15/10/2024 12:38, Philipp Stanner wrote:

On Tue, 2024-10-15 at 09:12 +0100, Tvrtko Ursulin wrote:


On 15/10/2024 08:11, Philipp Stanner wrote:

On Mon, 2024-10-14 at 13:07 +0100, Tvrtko Ursulin wrote:


On 14/10/2024 12:32, Philipp Stanner wrote:

Hi,

On Mon, 2024-10-14 at 11:46 +0100, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

In FIFO mode we can avoid dropping the lock only to immediately re-acquire
by adding a new drm_sched_rq_update_fifo_locked() helper.



Please write detailed commit messages, as described here [1].
  1. Describe the problem: current state and why it's bad.
  2. Then, describe in imperative (present tense) form what the commit
     does about the problem.


Both pieces of info are already there:

1. Drops the lock to immediately re-acquire it.
2. We avoid that by adding a locked helper.

Optionally, in between can be information about why it's solved this
way and not another etc.

Applies to the other patches, too.


[1]
https://www.kernel.org/doc/html/latest/process/submitting-patches.html#describe-your-changes


Thanks I am new here and did not know this.

Seriously, let's not be too blindly strict about this because it can get
IMO ridiculous.

One example when I previously accommodated your request is patch 3/5 from
this series:

"""
Current kerneldoc for struct drm_sched_rq incompletely documents what
fields are protected by the lock.

This is not good because it is misleading.

Lets fix it by listing all the elements which are protected by the lock.
"""

While this was the original commit text you weren't happy with:

"""
drm/sched: Re-order struct drm_sched_rq members for clarity

Lets re-order the members to make it clear which are protected by the lock
and at the same time document it via kerneldoc.
"""

I maintain the original text was passable.

On top, this was just a respin to accommodate the merge process. All
approvals were done and dusted a couple of weeks or so ago, so asking for yet
another respin for such trivial objections is not great.


I understand that you're unhappy, but please understand the position
I'm coming from. As you know, since you sent these patches within a
different series (and, thus, since I reviewed them), I was trusted with
co-maintaining this piece of shared infrastructure.

And since you've worked on it a bit now, I suppose you also know that
the GPU Scheduler is arguably in quite a bad shape, has far too little
documentation, has leaks, maybe race conditions, parts *where the
locking rules are unclear* and is probably only fully understood by a
small handful of people. I also argue that this is a *very*
complicated piece of software.


We already went over that and agreed. Not least I agreed the base is
shaky since few years ago. :)

Btw if things align, I hope you will at some point see a follow up
series from me which makes some significant simplifications and
improvements at the same time.


Cool, good to hear!
(Would be even cooler if simplifications and improvements can be
delivered through separate patch series to be easier to review etc.)


Yes, when I spot something I pull it ahead and/or standalone when it 
makes sense. But it is early days and a big job.



So I might be or appear to be a bit pedantic, but I'm not doing that to
terrorize you, but because I want this thing to become well documented,
understandable, and bisectable. Working towards a canonical, idiot-proof
commit style is one measure that will help with that.

I want to offer you the following: I can be more relaxed with things
universally recognized as trivial (comment changes, struct member
reordering) – but when something like a lock is touched in any way, we
shall document that in the commit message as canonically as possible,
so someone who's less experienced and just bisected the commit
immediately understands what has been done (or rather: was supposed to
be done).


So how would you suggest to expand this commit text so it doesn't read
too self-repeating?


My issue with this particular commit message is mainly that it doesn't
make it obvious what the patch is supposed to do. So one can make it
quicker and better to review by detailing it a bit more, so the
reviewer then can compare commit message vs. what the code does. It
seems to me for example that the actual optimization is being done in
drm_sched_entity_push_job(), and drm_sched_entity_pop_job() had to be
ported, too, for correctness


"It seems" aka the commit title says so. ;)


Another small thing that might be cool is something that makes it a bit
more obvious that this is an optimization, not a fix.

So I would probably write:

"So far, drm_sched_rq_update_fifo() automatically takes
drm_sched_entity.rq_lock. For DRM_SCHED_POLICY_FIFO, this is
inefficient because that lock is then taken, released and retaken in
drm_sched_entity_push_job().

Improve p

Re: [PATCH 1/5] drm/sched: Optimise drm_sched_entity_push_job

2024-10-15 Thread Tvrtko Ursulin




On 15/10/2024 08:11, Philipp Stanner wrote:

On Mon, 2024-10-14 at 13:07 +0100, Tvrtko Ursulin wrote:


On 14/10/2024 12:32, Philipp Stanner wrote:

Hi,

On Mon, 2024-10-14 at 11:46 +0100, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

In FIFO mode we can avoid dropping the lock only to immediately re-acquire
by adding a new drm_sched_rq_update_fifo_locked() helper.



Please write detailed commit messages, as described here [1].
     1. Describe the problem: current state and why it's bad.
     2. Then, describe in imperative (present tense) form what the commit
        does about the problem.


Both pieces of info are already there:

1. Drops the lock to immediately re-acquire it.
2. We avoid that by adding a locked helper.

Optionally, in between can be information about why it's solved this
way and not another etc.

Applies to the other patches, too.


[1]
https://www.kernel.org/doc/html/latest/process/submitting-patches.html#describe-your-changes


Thanks I am new here and did not know this.

Seriously, let's not be too blindly strict about this because it can get
IMO ridiculous.

One example when I previously accommodated your request is patch 3/5 from
this series:

"""
Current kerneldoc for struct drm_sched_rq incompletely documents what
fields are protected by the lock.

This is not good because it is misleading.

Lets fix it by listing all the elements which are protected by the
lock.
"""

While this was the original commit text you weren't happy with:

"""
drm/sched: Re-order struct drm_sched_rq members for clarity

Lets re-order the members to make it clear which are protected by the
lock
and at the same time document it via kerneldoc.
"""

I maintain the original text was passable.

On top, this was just a respin to accommodate the merge process. All
approvals were done and dusted a couple of weeks or so ago, so asking for yet
another respin for such trivial objections is not great.


I understand that you're unhappy, but please understand the position
I'm coming from. As you know, since you sent these patches within a
different series (and, thus, since I reviewed them), I was trusted with
co-maintaining this piece of shared infrastructure.

And since you've worked on it a bit now, I suppose you also know that
the GPU Scheduler is arguably in quite a bad shape, has far too little
documentation, has leaks, maybe race conditions, parts *where the
locking rules are unclear* and is probably only fully understood by a
small handful of people. I also argue that this is a *very*
complicated piece of software.


We already went over that and agreed. Not least I agreed the base is
shaky since few years ago. :)


Btw if things align, I hope you will at some point see a follow up 
series from me which makes some significant simplifications and 
improvements at the same time.

So I might be or appear to be a bit pedantic, but I'm not doing that to
terrorize you, but because I want this thing to become well documented,
understandable, and bisectable. Working towards a canonical, idiot-
proof commit style is one measure that will help with that.

I want to offer you the following: I can be more relaxed with things
universally recognized as trivial (comment changes, struct member
reordering) – but when something like a lock is touched in any way, we
shall document that in the commit message as canonically as possible,
so someone who's less experienced and just bisected the commit
immediately understands what has been done (or rather: was supposed to
be done).


So how would you suggest to expand this commit text so it doesn't read 
too self-repeating?


Regards,

Tvrtko
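
For illustration, the optimisation under discussion (call a "_locked" helper inside the caller's existing critical section instead of unlocking and letting the helper take the same lock again) can be sketched in userspace C. This is a sketch under stated assumptions, not the scheduler code: a pthread mutex replaces the spinlock, a lock_held flag with assert() crudely imitates lockdep_assert_held(), and fifo_entity, lock_count and all function names here are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct drm_sched_entity. */
struct fifo_entity {
	pthread_mutex_t lock;
	bool lock_held;			/* crude lockdep substitute */
	unsigned int lock_count;	/* number of lock acquisitions so far */
	long oldest_job_waiting;
	bool queued;
};

static void fifo_entity_init(struct fifo_entity *e)
{
	pthread_mutex_init(&e->lock, NULL);
	e->lock_held = false;
	e->lock_count = 0;
	e->oldest_job_waiting = 0;
	e->queued = false;
}

static void fifo_entity_lock(struct fifo_entity *e)
{
	pthread_mutex_lock(&e->lock);
	e->lock_held = true;
	e->lock_count++;
}

static void fifo_entity_unlock(struct fifo_entity *e)
{
	e->lock_held = false;
	pthread_mutex_unlock(&e->lock);
}

/* Caller must hold e->lock; the assert plays the role of lockdep_assert_held(). */
static void fifo_update_locked(struct fifo_entity *e, long ts)
{
	assert(e->lock_held);
	e->oldest_job_waiting = ts;
}

/* Old shape: drop the lock after queueing, then take it again for the update. */
static void push_job_old(struct fifo_entity *e, long ts)
{
	fifo_entity_lock(e);
	e->queued = true;
	fifo_entity_unlock(e);

	fifo_entity_lock(e);	/* what the non-locked helper did internally */
	fifo_update_locked(e, ts);
	fifo_entity_unlock(e);
}

/* New shape: one critical section covers both steps. */
static void push_job_new(struct fifo_entity *e, long ts)
{
	fifo_entity_lock(e);
	e->queued = true;
	fifo_update_locked(e, ts);
	fifo_entity_unlock(e);
}
```

Both variants leave the entity in the same state; the second takes the lock once instead of twice, which is the whole of the optimisation. The assert in fifo_update_locked() is also the kind of locking requirement a doc string on the helper would spell out.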

Re: [PATCH 1/5] drm/sched: Optimise drm_sched_entity_push_job

2024-10-14 Thread Tvrtko Ursulin




On 14/10/2024 12:32, Philipp Stanner wrote:

Hi,

On Mon, 2024-10-14 at 11:46 +0100, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

In FIFO mode we can avoid dropping the lock only to immediately re-acquire
by adding a new drm_sched_rq_update_fifo_locked() helper.



Please write detailed commit messages, as described here [1].
1. Describe the problem: current state and why it's bad.
2. Then, describe in imperative (present tense) form what the commit
   does about the problem.


Both pieces of info are already there:

1. Drops the lock to immediately re-acquire it.
2. We avoid that by adding a locked helper.

Optionally, in between can be information about why it's solved this
way and not another etc.

Applies to the other patches, too.


[1] 
https://www.kernel.org/doc/html/latest/process/submitting-patches.html#describe-your-changes


Thanks I am new here and did not know this.

Seriously, let's not be too blindly strict about this because it can get
IMO ridiculous.

One example when I previously accommodated your request is patch 3/5 from
this series:


"""
Current kerneldoc for struct drm_sched_rq incompletely documents what
fields are protected by the lock.

This is not good because it is misleading.

Lets fix it by listing all the elements which are protected by the lock.
"""

While this was the original commit text you weren't happy with:

"""
drm/sched: Re-order struct drm_sched_rq members for clarity

Lets re-order the members to make it clear which are protected by the
lock
and at the same time document it via kerneldoc.
"""

I maintain the original text was passable.

On top, this was just a respin to accommodate the merge process. All
approvals were done and dusted a couple of weeks or so ago, so asking for yet
another respin for such trivial objections is not great.


Regards,

Tvrtko


v2:
  * Remove drm_sched_rq_update_fifo() altogether. (Christian)

Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Cc: Alex Deucher 
Cc: Luben Tuikov 
Cc: Matthew Brost 
Cc: Philipp Stanner 
Reviewed-by: Christian König 
---
  drivers/gpu/drm/scheduler/sched_entity.c | 13 +
  drivers/gpu/drm/scheduler/sched_main.c   |  6 +++---
  include/drm/gpu_scheduler.h  |  2 +-
  3 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c
b/drivers/gpu/drm/scheduler/sched_entity.c
index 2951fcc2e6b1..b72cba292839 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -514,8 +514,12 @@ struct drm_sched_job
*drm_sched_entity_pop_job(struct drm_sched_entity *entity)
   struct drm_sched_job *next;
  
   next = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));

- if (next)
- drm_sched_rq_update_fifo(entity, next->submit_ts);
+ if (next) {
+ spin_lock(&entity->rq_lock);
+ drm_sched_rq_update_fifo_locked(entity,
+ next->submit_ts);
+ spin_unlock(&entity->rq_lock);
+ }
   }
  
   /* Jobs and entities might have different lifecycles. Since we're

@@ -613,10 +617,11 @@ void drm_sched_entity_push_job(struct
drm_sched_job *sched_job)
   sched = rq->sched;
  
   drm_sched_rq_add_entity(rq, entity);

- spin_unlock(&entity->rq_lock);
  
   if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)

- drm_sched_rq_update_fifo(entity, submit_ts);
+ drm_sched_rq_update_fifo_locked(entity, submit_ts);
+
+ spin_unlock(&entity->rq_lock);
  
   drm_sched_wakeup(sched);

   }
diff --git a/drivers/gpu/drm/scheduler/sched_main.c
b/drivers/gpu/drm/scheduler/sched_main.c
index e32b0f7d7e94..bbd1630407e4 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -169,14 +169,15 @@ static inline void
drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *enti
   }
  }
  
-void drm_sched_rq_update_fifo(struct drm_sched_entity *entity,

ktime_t ts)
+void drm_sched_rq_update_fifo_locked(struct drm_sched_entity
*entity, ktime_t ts)


Since you touch function name / signature already, would you mind
writing a small doc string that also mentions the locking requirements
or lack of the same?


  {
   /*
   * Both locks need to be grabbed, one to protect from entity->rq
change
   * for entity from within concurrent drm_sched_entity_select_rq and
the
   * other to update the rb tree structure.
   */


It seems to me that the comment above is now out of date, no?


Thx for your efforts,
P.


- spin_lock(&entity->rq_lock);
+ lockdep_assert_held(&entity->rq_lock);
+
   spin_lock(&entity->rq->lock);
  
   drm_sched_rq_remove_fifo_locked(entity);

@@ -187,7 +188,6 @@ void drm_sched_rq_update_fifo(struct
drm_sched_entity *entity, ktime_t ts)
     drm_sched_entity_compare_before);
  
   spin_unlock(&entity->rq->lock);

- spin_unlock(&entity->rq_lock);
  }
  
  /**

diff --git a/include/drm/gpu_scheduler.h
b/include/drm/gpu_

[PATCH 3/5] drm/sched: Re-order struct drm_sched_rq members for clarity

2024-10-14 Thread Tvrtko Ursulin

From: Tvrtko Ursulin 

Current kerneldoc for struct drm_sched_rq incompletely documents what
fields are protected by the lock.

This is not good because it is misleading.

Lets fix it by listing all the elements which are protected by the lock.

While at it, lets also re-order the members so all protected by the lock
are in a single group.
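
The layout convention described above can be sketched in user-space C. This is only an illustration under stated assumptions: `toy_lock`, `toy_rq` and the helper names are hypothetical stand-ins, not the kernel's spinlock_t or struct drm_sched_rq. The point is that members protected by the lock sit directly after it, and both the kerneldoc-style comment and the inline comment name them.

```c
#include <assert.h>

/* Toy stand-in for spinlock_t; it only tracks whether it is held. */
struct toy_lock {
	int held;
};

static void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);	/* non-recursive, like a spinlock */
	l->held = 1;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = 0;
}

/**
 * struct toy_rq - hypothetical run queue used only for illustration.
 * @owner: set once at init, not protected by @lock.
 * @lock: protects @current_entity and @nr_entities.
 * @current_entity: index of the entity to schedule next.
 * @nr_entities: number of queued entities.
 */
struct toy_rq {
	const char	*owner;

	struct toy_lock	lock;
	/* Following members are protected by the @lock: */
	int		current_entity;
	int		nr_entities;
};

/* Adds one entity; takes the lock because it touches a protected member. */
static int toy_rq_add_entity(struct toy_rq *rq)
{
	int n;

	toy_lock_acquire(&rq->lock);
	n = ++rq->nr_entities;
	toy_lock_release(&rq->lock);

	return n;
}
```

With this grouping a reader can tell from the declaration alone which accesses need the lock, which is the clarity the re-ordering aims for.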

v2:
 * Refer variables by kerneldoc syntax, more verbose commit text. (Philipp)

Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Cc: Alex Deucher 
Cc: Luben Tuikov 
Cc: Matthew Brost 
Cc: Philipp Stanner 
Reviewed-by: Christian König 
Reviewed-by: Philipp Stanner 
---
 include/drm/gpu_scheduler.h | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 3658a6cb048e..b6d095074c19 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -243,10 +243,10 @@ struct drm_sched_entity {
 /**
  * struct drm_sched_rq - queue of entities to be scheduled.
  *
- * @lock: to modify the entities list.
  * @sched: the scheduler to which this rq belongs to.
- * @entities: list of the entities to be scheduled.
+ * @lock: protects @entities, @rb_tree_root and @current_entity.
  * @current_entity: the entity which is to be scheduled.
+ * @entities: list of the entities to be scheduled.
  * @rb_tree_root: root of time based priority queue of entities for FIFO 
scheduling
  *
  * Run queue is a set of entities scheduling command submissions for
@@ -254,10 +254,12 @@ struct drm_sched_entity {
  * the next entity to emit commands from.
  */
 struct drm_sched_rq {
-   spinlock_t  lock;
struct drm_gpu_scheduler*sched;
-   struct list_headentities;
+
+   spinlock_t  lock;
+   /* Following members are protected by the @lock: */
struct drm_sched_entity *current_entity;
+   struct list_headentities;
struct rb_root_cached   rb_tree_root;
 };
 
-- 
2.46.0

[PATCH 4/5] drm/sched: Re-group and rename the entity run-queue lock

2024-10-14 Thread Tvrtko Ursulin

From: Tvrtko Ursulin 

Christian suggested to rename the lock and improve the documentation of
what it protects. And to also re-order the structure members so all
protected by the lock are together in a block.

Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Cc: Alex Deucher 
Cc: Luben Tuikov 
Cc: Matthew Brost 
Cc: Philipp Stanner 
Reviewed-by: Christian König 
---
 drivers/gpu/drm/scheduler/sched_entity.c | 28 
 drivers/gpu/drm/scheduler/sched_main.c   |  2 +-
 include/drm/gpu_scheduler.h  | 15 +++--
 3 files changed, 23 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c
index b72cba292839..c013c2b49aa5 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -105,7 +105,7 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
/* We start in an idle state. */
complete_all(&entity->entity_idle);
 
-   spin_lock_init(&entity->rq_lock);
+   spin_lock_init(&entity->lock);
spsc_queue_init(&entity->job_queue);
 
atomic_set(&entity->fence_seq, 0);
@@ -133,10 +133,10 @@ void drm_sched_entity_modify_sched(struct 
drm_sched_entity *entity,
 {
WARN_ON(!num_sched_list || !sched_list);
 
-   spin_lock(&entity->rq_lock);
+   spin_lock(&entity->lock);
entity->sched_list = sched_list;
entity->num_sched_list = num_sched_list;
-   spin_unlock(&entity->rq_lock);
+   spin_unlock(&entity->lock);
 }
 EXPORT_SYMBOL(drm_sched_entity_modify_sched);
 
@@ -244,10 +244,10 @@ static void drm_sched_entity_kill(struct drm_sched_entity 
*entity)
if (!entity->rq)
return;
 
-   spin_lock(&entity->rq_lock);
+   spin_lock(&entity->lock);
entity->stopped = true;
drm_sched_rq_remove_entity(entity->rq, entity);
-   spin_unlock(&entity->rq_lock);
+   spin_unlock(&entity->lock);
 
/* Make sure this entity is not used by the scheduler at the moment */
wait_for_completion(&entity->entity_idle);
@@ -396,9 +396,9 @@ static void drm_sched_entity_wakeup(struct dma_fence *f,
 void drm_sched_entity_set_priority(struct drm_sched_entity *entity,
   enum drm_sched_priority priority)
 {
-   spin_lock(&entity->rq_lock);
+   spin_lock(&entity->lock);
entity->priority = priority;
-   spin_unlock(&entity->rq_lock);
+   spin_unlock(&entity->lock);
 }
 EXPORT_SYMBOL(drm_sched_entity_set_priority);
 
@@ -515,10 +515,10 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct 
drm_sched_entity *entity)
 
next = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
if (next) {
-   spin_lock(&entity->rq_lock);
+   spin_lock(&entity->lock);
drm_sched_rq_update_fifo_locked(entity,
next->submit_ts);
-   spin_unlock(&entity->rq_lock);
+   spin_unlock(&entity->lock);
}
}
 
@@ -559,14 +559,14 @@ void drm_sched_entity_select_rq(struct drm_sched_entity 
*entity)
if (fence && !dma_fence_is_signaled(fence))
return;
 
-   spin_lock(&entity->rq_lock);
+   spin_lock(&entity->lock);
sched = drm_sched_pick_best(entity->sched_list, entity->num_sched_list);
rq = sched ? sched->sched_rq[entity->priority] : NULL;
if (rq != entity->rq) {
drm_sched_rq_remove_entity(entity->rq, entity);
entity->rq = rq;
}
-   spin_unlock(&entity->rq_lock);
+   spin_unlock(&entity->lock);
 
if (entity->num_sched_list == 1)
entity->sched_list = NULL;
@@ -605,9 +605,9 @@ void drm_sched_entity_push_job(struct drm_sched_job 
*sched_job)
struct drm_sched_rq *rq;
 
/* Add the entity to the run queue */
-   spin_lock(&entity->rq_lock);
+   spin_lock(&entity->lock);
if (entity->stopped) {
-   spin_unlock(&entity->rq_lock);
+   spin_unlock(&entity->lock);
 
DRM_ERROR("Trying to push to a killed entity\n");
return;
@@ -621,7 +621,7 @@ void drm_sched_entity_push_job(struct drm_sched_job 
*sched_job)
if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
drm_sched_rq_update_fifo_locked(entity, submit_ts);
 
-   spin_unlock(&entity->rq_lock);
+   spin_unlock(&entity->lock);

[PATCH 2/5] drm/sched: Stop setting current entity in FIFO mode

2024-10-14 Thread Tvrtko Ursulin

From: Tvrtko Ursulin 

It does not seem there is a need to set the current entity in FIFO mode
since it only serves as a "cursor" in round-robin mode. Even if
scheduling mode is changed at runtime the change in behaviour is simply
to restart from the first entity, instead of continuing in RR mode from
where FIFO left it, and that sounds completely fine.

Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Cc: Alex Deucher 
Cc: Luben Tuikov 
Cc: Matthew Brost 
Cc: Philipp Stanner 
Acked-by: Christian König 
Reviewed-by: Philipp Stanner 
---
 drivers/gpu/drm/scheduler/sched_main.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index bbd1630407e4..07ee386b8e4b 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -355,7 +355,6 @@ drm_sched_rq_select_entity_fifo(struct drm_gpu_scheduler 
*sched,
return ERR_PTR(-ENOSPC);
}

-   rq->current_entity = entity;
reinit_completion(&entity->entity_idle);
break;
}
-- 
2.46.0

[PATCH 1/5] drm/sched: Optimise drm_sched_entity_push_job

2024-10-14 Thread Tvrtko Ursulin

From: Tvrtko Ursulin 

In FIFO mode we can avoid dropping the lock only to immediately re-acquire
it by adding a new drm_sched_rq_update_fifo_locked() helper.
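
As a rough user-space sketch of the pattern (not the scheduler code itself: `toy_lock`, `toy_entity` and all `toy_*` functions are hypothetical, and the lock is a plain counter rather than a real spinlock), the "_locked" variant lets the caller keep one critical section instead of unlocking and re-locking:

```c
#include <assert.h>

/* Toy stand-in for spinlock_t that counts acquisitions. */
struct toy_lock {
	int held;
	int acquisitions;
};

static void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);	/* non-recursive, like a spinlock */
	l->held = 1;
	l->acquisitions++;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = 0;
}

struct toy_entity {
	struct toy_lock lock;
	int queued;
	long oldest_ts;
};

/* Old shape: the helper takes the lock itself. */
static void toy_update_fifo(struct toy_entity *e, long ts)
{
	toy_lock_acquire(&e->lock);
	e->oldest_ts = ts;
	toy_lock_release(&e->lock);
}

/* New shape: the caller must already hold the lock. */
static void toy_update_fifo_locked(struct toy_entity *e, long ts)
{
	assert(e->lock.held);	/* plays the role of lockdep_assert_held() */
	e->oldest_ts = ts;
}

/* Before: unlock, then the helper immediately re-locks. */
static void toy_push_job_before(struct toy_entity *e, long ts)
{
	toy_lock_acquire(&e->lock);
	e->queued = 1;
	toy_lock_release(&e->lock);

	toy_update_fifo(e, ts);
}

/* After: one critical section covers both steps. */
static void toy_push_job_after(struct toy_entity *e, long ts)
{
	toy_lock_acquire(&e->lock);
	e->queued = 1;
	toy_update_fifo_locked(e, ts);
	toy_lock_release(&e->lock);
}
```

Calling `toy_push_job_before()` leaves `acquisitions` at 2, while `toy_push_job_after()` leaves it at 1, which is the saved lock-unlock cycle.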

v2:
 * Remove drm_sched_rq_update_fifo() altogether. (Christian)

Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Cc: Alex Deucher 
Cc: Luben Tuikov 
Cc: Matthew Brost 
Cc: Philipp Stanner 
Reviewed-by: Christian König 
---
 drivers/gpu/drm/scheduler/sched_entity.c | 13 +
 drivers/gpu/drm/scheduler/sched_main.c   |  6 +++---
 include/drm/gpu_scheduler.h  |  2 +-
 3 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c
index 2951fcc2e6b1..b72cba292839 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -514,8 +514,12 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct 
drm_sched_entity *entity)
struct drm_sched_job *next;
 
next = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
-   if (next)
-   drm_sched_rq_update_fifo(entity, next->submit_ts);
+   if (next) {
+   spin_lock(&entity->rq_lock);
+   drm_sched_rq_update_fifo_locked(entity,
+   next->submit_ts);
+   spin_unlock(&entity->rq_lock);
+   }
}
 
/* Jobs and entities might have different lifecycles. Since we're
@@ -613,10 +617,11 @@ void drm_sched_entity_push_job(struct drm_sched_job 
*sched_job)
sched = rq->sched;
 
drm_sched_rq_add_entity(rq, entity);
-   spin_unlock(&entity->rq_lock);
 
if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
-   drm_sched_rq_update_fifo(entity, submit_ts);
+   drm_sched_rq_update_fifo_locked(entity, submit_ts);
+
+   spin_unlock(&entity->rq_lock);
 
drm_sched_wakeup(sched);
}
diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index e32b0f7d7e94..bbd1630407e4 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -169,14 +169,15 @@ static inline void drm_sched_rq_remove_fifo_locked(struct 
drm_sched_entity *enti
}
 }
 
-void drm_sched_rq_update_fifo(struct drm_sched_entity *entity, ktime_t ts)
+void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, ktime_t 
ts)
 {
/*
 * Both locks need to be grabbed, one to protect from entity->rq change
 * for entity from within concurrent drm_sched_entity_select_rq and the
 * other to update the rb tree structure.
 */
-   spin_lock(&entity->rq_lock);
+   lockdep_assert_held(&entity->rq_lock);
+
spin_lock(&entity->rq->lock);
 
drm_sched_rq_remove_fifo_locked(entity);
@@ -187,7 +188,6 @@ void drm_sched_rq_update_fifo(struct drm_sched_entity 
*entity, ktime_t ts)
  drm_sched_entity_compare_before);
 
spin_unlock(&entity->rq->lock);
-   spin_unlock(&entity->rq_lock);
 }
 
 /**
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index e9f075f51db3..3658a6cb048e 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -593,7 +593,7 @@ void drm_sched_rq_add_entity(struct drm_sched_rq *rq,
 void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
struct drm_sched_entity *entity);
 
-void drm_sched_rq_update_fifo(struct drm_sched_entity *entity, ktime_t ts);
+void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, ktime_t 
ts);
 
 int drm_sched_entity_init(struct drm_sched_entity *entity,
  enum drm_sched_priority priority,
-- 
2.46.0

[PATCH 5/5] drm/sched: Further optimise drm_sched_entity_push_job

2024-10-14 Thread Tvrtko Ursulin

From: Tvrtko Ursulin 

Having removed one re-lock cycle on the entity->lock in a patch titled
"drm/sched: Optimise drm_sched_entity_push_job", with only a tiny bit
larger refactoring we can do the same optimisation on the rq->lock.
(Currently both drm_sched_rq_add_entity() and
drm_sched_rq_update_fifo_locked() take and release the same lock.)

To achieve this we make drm_sched_rq_update_fifo_locked() and
drm_sched_rq_add_entity() expect the rq->lock to be held.

We also align drm_sched_rq_update_fifo_locked(),
drm_sched_rq_add_entity() and
drm_sched_rq_remove_fifo_locked() function signatures, by adding rq as a
parameter to the latter.
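
A minimal user-space sketch of the resulting call shape, assuming toy types (`toy_lock`, `toy_rq`, `toy_entity` and the `toy_*` helpers are hypothetical and only mimic the locking structure, not the rb-tree or list handling): the caller takes the entity lock, then the run-queue lock once, and both helpers merely assert what they expect to be held.

```c
#include <assert.h>

/* Toy stand-in for spinlock_t that counts acquisitions. */
struct toy_lock {
	int held;
	int acquisitions;
};

static void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);
	l->held = 1;
	l->acquisitions++;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = 0;
}

struct toy_rq {
	struct toy_lock lock;
	int nr_entities;
	long head_ts;
};

struct toy_entity {
	struct toy_lock lock;
	struct toy_rq *rq;
	long oldest_ts;
};

/* Expects rq->lock to be held by the caller. */
static void toy_rq_add_entity(struct toy_rq *rq, struct toy_entity *e)
{
	assert(rq->lock.held);
	(void)e;
	rq->nr_entities++;
}

/* Expects both entity->lock and rq->lock to be held by the caller. */
static void toy_rq_update_fifo_locked(struct toy_entity *e,
				      struct toy_rq *rq, long ts)
{
	assert(e->lock.held);
	assert(rq->lock.held);
	e->oldest_ts = ts;
	rq->head_ts = ts;
}

static void toy_push_job(struct toy_entity *e, long ts)
{
	struct toy_rq *rq;

	toy_lock_acquire(&e->lock);	/* outer lock: keeps e->rq stable */
	rq = e->rq;
	toy_lock_acquire(&rq->lock);	/* inner lock: taken once for both steps */
	toy_rq_add_entity(rq, e);
	toy_rq_update_fifo_locked(e, rq, ts);
	toy_lock_release(&rq->lock);
	toy_lock_release(&e->lock);
}
```

Passing `rq` explicitly also means the helpers never re-read `e->rq`, so the run queue they operate on is the one whose lock the caller actually holds.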

v2:
 * Fix after rebase of the series.
 * Avoid naming inconsistency between drm_sched_rq_add/remove. (Christian)

Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Cc: Alex Deucher 
Cc: Luben Tuikov 
Cc: Matthew Brost 
Cc: Philipp Stanner 
Reviewed-by: Christian König 
Reviewed-by: Philipp Stanner 
---
 drivers/gpu/drm/scheduler/sched_entity.c | 11 +++--
 drivers/gpu/drm/scheduler/sched_main.c   | 29 
 include/drm/gpu_scheduler.h  |  3 ++-
 3 files changed, 25 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c
index c013c2b49aa5..69bcf0e99d57 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -515,9 +515,14 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct 
drm_sched_entity *entity)
 
next = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
if (next) {
+   struct drm_sched_rq *rq;
+
spin_lock(&entity->lock);
-   drm_sched_rq_update_fifo_locked(entity,
+   rq = entity->rq;
+   spin_lock(&rq->lock);
+   drm_sched_rq_update_fifo_locked(entity, rq,
next->submit_ts);
+   spin_unlock(&rq->lock);
spin_unlock(&entity->lock);
}
}
@@ -616,11 +621,13 @@ void drm_sched_entity_push_job(struct drm_sched_job 
*sched_job)
rq = entity->rq;
sched = rq->sched;
 
+   spin_lock(&rq->lock);
drm_sched_rq_add_entity(rq, entity);
 
if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
-   drm_sched_rq_update_fifo_locked(entity, submit_ts);
+   drm_sched_rq_update_fifo_locked(entity, rq, submit_ts);
 
+   spin_unlock(&rq->lock);
spin_unlock(&entity->lock);
 
drm_sched_wakeup(sched);
diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index 2670bf9f34b2..6e4d004d09ce 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -159,17 +159,18 @@ static __always_inline bool 
drm_sched_entity_compare_before(struct rb_node *a,
return ktime_before(ent_a->oldest_job_waiting, 
ent_b->oldest_job_waiting);
 }
 
-static inline void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity 
*entity)
+static void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *entity,
+   struct drm_sched_rq *rq)
 {
-   struct drm_sched_rq *rq = entity->rq;
-
if (!RB_EMPTY_NODE(&entity->rb_tree_node)) {
rb_erase_cached(&entity->rb_tree_node, &rq->rb_tree_root);
RB_CLEAR_NODE(&entity->rb_tree_node);
}
 }
 
-void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, ktime_t 
ts)
+void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity,
+struct drm_sched_rq *rq,
+ktime_t ts)
 {
/*
 * Both locks need to be grabbed, one to protect from entity->rq change
@@ -177,17 +178,14 @@ void drm_sched_rq_update_fifo_locked(struct 
drm_sched_entity *entity, ktime_t ts
 * other to update the rb tree structure.
 */
lockdep_assert_held(&entity->lock);
+   lockdep_assert_held(&rq->lock);
 
-   spin_lock(&entity->rq->lock);
-
-   drm_sched_rq_remove_fifo_locked(entity);
+   drm_sched_rq_remove_fifo_locked(entity, rq);
 
entity->oldest_job_waiting = ts;
 
-   rb_add_cached(&entity->rb_tree_node, &entity->rq->rb_tree_root,
+   rb_add_cached(&entity->rb_tree_node, &rq->rb_tree_root,
  drm_sched_entity_compare_before);
-
-   spin_unlock(&entity->rq->lock);
 }
 
 /**
@@ -219,15 +217,14 @@ static void drm_sched_rq_init(struct drm_gpu_scheduler 
*sched,
 void drm_sched_rq_ad

[PATCH 0/5] Small DRM scheduler improvements

2024-10-14 Thread Tvrtko Ursulin

From: Tvrtko Ursulin 

Leftovers from the earlier "DRM scheduler fixes and improvements" series.

It looks like the fixes have now propagated back to drm-misc-next so this should now
be mergeable.

It also needed a small rebase to account for one revert and one spelling fix
which landed in the meantime.

As a reminder, what remains are kerneldoc improvements, struct layout tweaks for
clarity, one trivial cleanup for the FIFO mode, and most importantly two spin
lock-unlock cycles are removed from the push job path by pulling taking of the
locks one level up.

I smoke tested it on the Steam Deck and lockdep seems happy.

Cc: Christian König 
Cc: Philipp Stanner 

Tvrtko Ursulin (5):
  drm/sched: Optimise drm_sched_entity_push_job
  drm/sched: Stop setting current entity in FIFO mode
  drm/sched: Re-order struct drm_sched_rq members for clarity
  drm/sched: Re-group and rename the entity run-queue lock
  drm/sched: Further optimise drm_sched_entity_push_job

 drivers/gpu/drm/scheduler/sched_entity.c | 42 +++-
 drivers/gpu/drm/scheduler/sched_main.c   | 32 +-
 include/drm/gpu_scheduler.h  | 28 +---
 3 files changed, 58 insertions(+), 44 deletions(-)

-- 
2.46.0

Re: [RFC PATCH 1/2] drm/drm_file: Add display of driver's internal memory size

2024-10-10 Thread Tvrtko Ursulin




On 09/10/2024 23:55, Adrián Larumbe wrote:

Hi Tvrtko,

On 04.10.2024 14:41, Tvrtko Ursulin wrote:


Hi Adrian,

On 03/10/2024 00:45, Adrián Larumbe wrote:

Some drivers must allocate a considerable amount of memory for bookkeeping
structures and GPU's MCU-kernel shared communication regions. These are
often created as a result of the invocation of the driver's ioctl()
interface functions, so it is sensible to consider them as being owned by
the render context associated with an open drm file.

However, at the moment drm_show_memory_stats only traverses the UM-exposed
drm objects for which a handle exists. Private driver objects and memory
regions, though connected to a render context, are unaccounted for in their
fdinfo numbers.

Add a new drm_memory_stats 'internal' memory category.

Because deciding what constitutes an 'internal' object and where to find
these are driver-dependent, calculation of this size must be done through a
driver-provided function pointer, which becomes the third argument of
drm_show_memory_stats. Drivers which have no interest in exposing the size
of internal memory objects can keep passing NULL for unaltered behaviour.

Signed-off-by: Adrián Larumbe 
Cc: Rob Clark 
Cc: Tvrtko Ursulin 
Cc: Lucas De Marchi 
---
   drivers/gpu/drm/drm_file.c  | 6 +-
   drivers/gpu/drm/msm/msm_drv.c   | 2 +-
   drivers/gpu/drm/panfrost/panfrost_drv.c | 2 +-
   drivers/gpu/drm/v3d/v3d_drv.c   | 2 +-
   include/drm/drm_file.h  | 7 ++-
   5 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
index ad1dc638c83b..937471339c9a 100644
--- a/drivers/gpu/drm/drm_file.c
+++ b/drivers/gpu/drm/drm_file.c
@@ -856,6 +856,7 @@ void drm_print_memory_stats(struct drm_printer *p,
print_size(p, "total", region, stats->private + stats->shared);
print_size(p, "shared", region, stats->shared);
print_size(p, "active", region, stats->active);
+   print_size(p, "internal", region, stats->internal);
if (supported_status & DRM_GEM_OBJECT_RESIDENT)
print_size(p, "resident", region, stats->resident);
@@ -873,7 +874,7 @@ EXPORT_SYMBOL(drm_print_memory_stats);
* Helper to iterate over GEM objects with a handle allocated in the 
specified
* file.
*/
-void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file)
+void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file, 
internal_bos func)
   {
struct drm_gem_object *obj;
struct drm_memory_stats status = {};
@@ -919,6 +920,9 @@ void drm_show_memory_stats(struct drm_printer *p, struct 
drm_file *file)
}
spin_unlock(&file->table_lock);
+   if (func)
+   func(&status, file);
+
drm_print_memory_stats(p, &status, supported_status, "memory");
   }
   EXPORT_SYMBOL(drm_show_memory_stats);
diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index edbc1ab0fbc8..2b3feb79afc4 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -880,7 +880,7 @@ static void msm_show_fdinfo(struct drm_printer *p, struct 
drm_file *file)
msm_gpu_show_fdinfo(priv->gpu, file->driver_priv, p);
-   drm_show_memory_stats(p, file);
+   drm_show_memory_stats(p, file, NULL);
   }
   static const struct file_operations fops = {
diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c 
b/drivers/gpu/drm/panfrost/panfrost_drv.c
index 04d615df5259..aaa8602bf00d 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -609,7 +609,7 @@ static void panfrost_show_fdinfo(struct drm_printer *p, 
struct drm_file *file)
panfrost_gpu_show_fdinfo(pfdev, file->driver_priv, p);
-   drm_show_memory_stats(p, file);
+   drm_show_memory_stats(p, file, NULL);
   }
   static const struct file_operations panfrost_drm_driver_fops = {
diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
index fb35c5c3f1a7..314e77c67972 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -195,7 +195,7 @@ static void v3d_show_fdinfo(struct drm_printer *p, struct 
drm_file *file)
   v3d_queue_to_string(queue), jobs_completed);
}
-   drm_show_memory_stats(p, file);
+   drm_show_memory_stats(p, file, NULL);
   }
   static const struct file_operations v3d_drm_fops = {
diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
index 8c0030c77308..661d00d5350e 100644
--- a/include/drm/drm_file.h
+++ b/include/drm/drm_file.h
@@ -469,6 +469,7 @@ void drm_send_event_timestamp_locked(struct drm_device *dev,
* @resident: Total size of GEM objects backing pages
* @purgeable: Total size of GEM objects that can be purged (resident and 
not active)
* @ac

Re: [RFC PATCH 1/2] drm/drm_file: Add display of driver's internal memory size

2024-10-04 Thread Tvrtko Ursulin




Hi Adrian,

On 03/10/2024 00:45, Adrián Larumbe wrote:

Some drivers must allocate a considerable amount of memory for bookkeeping
structures and GPU's MCU-kernel shared communication regions. These are
often created as a result of the invocation of the driver's ioctl()
interface functions, so it is sensible to consider them as being owned by
the render context associated with an open drm file.

However, at the moment drm_show_memory_stats only traverses the UM-exposed
drm objects for which a handle exists. Private driver objects and memory
regions, though connected to a render context, are unaccounted for in their
fdinfo numbers.

Add a new drm_memory_stats 'internal' memory category.

Because deciding what constitutes an 'internal' object and where to find
these are driver-dependent, calculation of this size must be done through a
driver-provided function pointer, which becomes the third argument of
drm_show_memory_stats. Drivers which have no interest in exposing the size
of internal memory objects can keep passing NULL for unaltered behaviour.
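
The optional-callback design described above can be sketched in plain C as follows. All names here (`toy_memory_stats`, `toy_file`, `toy_internal_fn`, `toy_collect_stats`, `toy_driver_internal`) are hypothetical; the real typedef's definition is not visible in this excerpt, so the callback signature is an assumption modelled on the `func(&status, file)` call in the diff.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified counterpart of struct drm_memory_stats. */
struct toy_memory_stats {
	unsigned long shared;
	unsigned long private_size;
	unsigned long internal;
};

/* Hypothetical per-file state; stands in for struct drm_file. */
struct toy_file {
	unsigned long handle_bytes;	/* objects reachable via handles */
	unsigned long driver_bytes;	/* driver-private allocations */
};

/* Assumed callback shape: fills in the 'internal' category. */
typedef void (*toy_internal_fn)(struct toy_memory_stats *stats,
				struct toy_file *file);

static struct toy_memory_stats toy_collect_stats(struct toy_file *file,
						 toy_internal_fn func)
{
	struct toy_memory_stats stats = { 0 };

	/* Generic part: only handle-backed objects are visible here. */
	stats.private_size = file->handle_bytes;

	/* NULL keeps the old behaviour: 'internal' stays at zero. */
	if (func)
		func(&stats, file);

	return stats;
}

/* Example driver callback reporting its private allocations. */
static void toy_driver_internal(struct toy_memory_stats *stats,
				struct toy_file *file)
{
	stats->internal = file->driver_bytes;
}
```

A driver that opts in passes `toy_driver_internal`; one that does not passes NULL and sees the same numbers as before.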

Signed-off-by: Adrián Larumbe 
Cc: Rob Clark 
Cc: Tvrtko Ursulin 
Cc: Lucas De Marchi 
---
  drivers/gpu/drm/drm_file.c  | 6 +-
  drivers/gpu/drm/msm/msm_drv.c   | 2 +-
  drivers/gpu/drm/panfrost/panfrost_drv.c | 2 +-
  drivers/gpu/drm/v3d/v3d_drv.c   | 2 +-
  include/drm/drm_file.h  | 7 ++-
  5 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
index ad1dc638c83b..937471339c9a 100644
--- a/drivers/gpu/drm/drm_file.c
+++ b/drivers/gpu/drm/drm_file.c
@@ -856,6 +856,7 @@ void drm_print_memory_stats(struct drm_printer *p,
print_size(p, "total", region, stats->private + stats->shared);
print_size(p, "shared", region, stats->shared);
print_size(p, "active", region, stats->active);
+   print_size(p, "internal", region, stats->internal);
  
  	if (supported_status & DRM_GEM_OBJECT_RESIDENT)

print_size(p, "resident", region, stats->resident);
@@ -873,7 +874,7 @@ EXPORT_SYMBOL(drm_print_memory_stats);
   * Helper to iterate over GEM objects with a handle allocated in the specified
   * file.
   */
-void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file)
+void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file, 
internal_bos func)
  {
struct drm_gem_object *obj;
struct drm_memory_stats status = {};
@@ -919,6 +920,9 @@ void drm_show_memory_stats(struct drm_printer *p, struct 
drm_file *file)
}
spin_unlock(&file->table_lock);
  
+	if (func)

+   func(&status, file);
+
drm_print_memory_stats(p, &status, supported_status, "memory");
  }
  EXPORT_SYMBOL(drm_show_memory_stats);
diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index edbc1ab0fbc8..2b3feb79afc4 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -880,7 +880,7 @@ static void msm_show_fdinfo(struct drm_printer *p, struct 
drm_file *file)
  
  	msm_gpu_show_fdinfo(priv->gpu, file->driver_priv, p);
  
-	drm_show_memory_stats(p, file);

+   drm_show_memory_stats(p, file, NULL);
  }
  
  static const struct file_operations fops = {

diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c 
b/drivers/gpu/drm/panfrost/panfrost_drv.c
index 04d615df5259..aaa8602bf00d 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -609,7 +609,7 @@ static void panfrost_show_fdinfo(struct drm_printer *p, 
struct drm_file *file)
  
  	panfrost_gpu_show_fdinfo(pfdev, file->driver_priv, p);
  
-	drm_show_memory_stats(p, file);

+   drm_show_memory_stats(p, file, NULL);
  }
  
  static const struct file_operations panfrost_drm_driver_fops = {

diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
index fb35c5c3f1a7..314e77c67972 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -195,7 +195,7 @@ static void v3d_show_fdinfo(struct drm_printer *p, struct 
drm_file *file)
   v3d_queue_to_string(queue), jobs_completed);
}
  
-	drm_show_memory_stats(p, file);

+   drm_show_memory_stats(p, file, NULL);
  }
  
  static const struct file_operations v3d_drm_fops = {

diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
index 8c0030c77308..661d00d5350e 100644
--- a/include/drm/drm_file.h
+++ b/include/drm/drm_file.h
@@ -469,6 +469,7 @@ void drm_send_event_timestamp_locked(struct drm_device *dev,
   * @resident: Total size of GEM objects backing pages
   * @purgeable: Total size of GEM objects that can be purged (resident and not 
active)
   * @active: Total size of GEM objects active on one or more engines
+ * @internal: Total size of GEM objects that aren'

Re: [PATCH] drm/sched: revert "Always increment correct scheduler score"

2024-09-30 Thread Tvrtko Ursulin




On 30/09/2024 14:14, Christian König wrote:

This reverts commit 087913e0ba2b3b9d7ccbafb2acf5dab9e35ae1d5.

It turned out that the original code was correct since the rq can only
change when there is no armed job for an entity.

This change here broke the logic since we only incremented the counter
for the first job, so revert it.

Signed-off-by: Christian König 
---
  drivers/gpu/drm/scheduler/sched_entity.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c
index b2cf3e0c1838..a75eede8bf8d 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -586,6 +586,7 @@ void drm_sched_entity_push_job(struct drm_sched_job 
*sched_job)
ktime_t submit_ts;
  
  	trace_drm_sched_job(sched_job, entity);

+   atomic_inc(entity->rq->sched->score);
WRITE_ONCE(entity->last_user, current->group_leader);
  
  	/*

@@ -613,7 +614,6 @@ void drm_sched_entity_push_job(struct drm_sched_job 
*sched_job)
rq = entity->rq;
sched = rq->sched;
  
-		atomic_inc(sched->score);

drm_sched_rq_add_entity(rq, entity);
spin_unlock(&entity->rq_lock);
  


This was definitely broken so revert is the right thing, thank you.

Acked-by: Tvrtko Ursulin 

Regards,

Tvrtko

Re: [PATCH 3/8] drm/sched: Always increment correct scheduler score

2024-09-30 Thread Tvrtko Ursulin




On 30/09/2024 14:07, Christian König wrote:

Am 30.09.24 um 15:01 schrieb Tvrtko Ursulin:


On 13/09/2024 17:05, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

An entity's run queue can change during drm_sched_entity_push_job() so make
sure to update the score consistently.

Signed-off-by: Tvrtko Ursulin 
Fixes: d41a39dda140 ("drm/scheduler: improve job distribution with 
multiple queues")

Cc: Nirmoy Das 
Cc: Christian König 
Cc: Luben Tuikov 
Cc: Matthew Brost 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: dri-devel@lists.freedesktop.org
Cc:  # v5.9+
Reviewed-by: Christian König 
Reviewed-by: Nirmoy Das 
---
  drivers/gpu/drm/scheduler/sched_entity.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c

index 76e422548d40..6645a8524699 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -586,7 +586,6 @@ void drm_sched_entity_push_job(struct 
drm_sched_job *sched_job)

  ktime_t submit_ts;
    trace_drm_sched_job(sched_job, entity);
-    atomic_inc(entity->rq->sched->score);
  WRITE_ONCE(entity->last_user, current->group_leader);
    /*
@@ -614,6 +613,7 @@ void drm_sched_entity_push_job(struct 
drm_sched_job *sched_job)

  rq = entity->rq;
  sched = rq->sched;
  +    atomic_inc(sched->score);


Ugh this is wrong. :(

I was working on some further consolidation and realised this.

It will create an imbalance in score since score is currently supposed 
to be accounted twice:


 1. +/- 1 for each entity (de-)queued
 2. +/- 1 for each job queued/completed

By moving it into the "if (first)" branch it unbalances it.

But it is still true the original placement is racy. It looks like 
what is required is an unconditional entity->lock section after 
spsc_queue_push. AFAICT that's the only way to be sure entity->rq is 
set for the submission at hand.


Question also is, why +/- score in entity add/remove and not just for 
jobs?


In the meantime patch will need to get reverted.


Ok going to revert that.


Thank you, and sorry for the trouble!

I also just realized that we don't need to change anything. The rq can't 
change as soon as there is a job armed for it.


So having the increment right before pushing the armed job to the entity 
was actually correct in the first place.


Are you sure? Two threads racing to arm and push on the same entity?


T1                              T2

arm job
rq1 selected
..
push job                        arm job
inc score rq1
                                spsc_queue_count check passes
                                 ---  just before T1 spsc_queue_push ---
                                changed to rq2
spsc_queue_push
if (first)
  resamples entity->rq
  queues rq2

Where rq1 and rq2 belong to different schedulers.   

Regards,

Tvrtko



Regards,
Christian.



Regards,

Tvrtko


  drm_sched_rq_add_entity(rq, entity);
  spin_unlock(&entity->rq_lock);

Re: [PATCH 3/8] drm/sched: Always increment correct scheduler score

2024-09-30 Thread Tvrtko Ursulin




On 13/09/2024 17:05, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

An entity's run queue can change during drm_sched_entity_push_job() so make
sure to update the score consistently.

Signed-off-by: Tvrtko Ursulin 
Fixes: d41a39dda140 ("drm/scheduler: improve job distribution with multiple 
queues")
Cc: Nirmoy Das 
Cc: Christian König 
Cc: Luben Tuikov 
Cc: Matthew Brost 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: dri-devel@lists.freedesktop.org
Cc:  # v5.9+
Reviewed-by: Christian König 
Reviewed-by: Nirmoy Das 
---
  drivers/gpu/drm/scheduler/sched_entity.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c
index 76e422548d40..6645a8524699 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -586,7 +586,6 @@ void drm_sched_entity_push_job(struct drm_sched_job 
*sched_job)
ktime_t submit_ts;
  
  	trace_drm_sched_job(sched_job, entity);

-   atomic_inc(entity->rq->sched->score);
WRITE_ONCE(entity->last_user, current->group_leader);
  
  	/*

@@ -614,6 +613,7 @@ void drm_sched_entity_push_job(struct drm_sched_job 
*sched_job)
rq = entity->rq;
sched = rq->sched;
  
+		atomic_inc(sched->score);


Ugh this is wrong. :(

I was working on some further consolidation and realised this.

It will create an imbalance in score since score is currently supposed 
to be accounted twice:


 1. +/- 1 for each entity (de-)queued
 2. +/- 1 for each job queued/completed

By moving it into the "if (first) branch" it unbalances it.
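
The imbalance described above can be sketched with a small user-space toy model. This is only an illustration under stated assumptions: `toy_sched`, `toy_push_job()`, `toy_complete_job()` and `toy_run()` are hypothetical names, not scheduler code, and the model assumes the entity is dequeued when its last job completes.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the scheduler score; not the kernel structure. */
struct toy_sched {
	int score;
	int queued_jobs;
	bool entity_queued;
};

/*
 * Push one job. With inc_only_on_first set, the per-job increment is done
 * only when the entity queue was empty, mimicking the misplaced increment.
 */
static void toy_push_job(struct toy_sched *s, bool inc_only_on_first)
{
	bool first = (s->queued_jobs == 0);

	s->queued_jobs++;

	if (!inc_only_on_first)
		s->score++;		/* per-job accounting, unconditional */

	if (first) {
		if (inc_only_on_first)
			s->score++;	/* per-job accounting, first job only */
		if (!s->entity_queued) {
			s->entity_queued = true;
			s->score++;	/* per-entity accounting */
		}
	}
}

/* Complete one job; the per-job decrement always happens. */
static void toy_complete_job(struct toy_sched *s)
{
	s->queued_jobs--;
	s->score--;			/* per-job accounting */

	if (s->queued_jobs == 0 && s->entity_queued) {
		s->entity_queued = false;
		s->score--;		/* per-entity accounting */
	}
}

/* Push n jobs, then complete them all, and return the leftover score. */
static int toy_run(int n, bool inc_only_on_first)
{
	struct toy_sched s = { 0 };
	int i;

	for (i = 0; i < n; i++)
		toy_push_job(&s, inc_only_on_first);
	for (i = 0; i < n; i++)
		toy_complete_job(&s);

	return s.score;
}
```

In this model the unconditional increment always returns to zero, while the "first only" variant drifts negative by one for every job pushed onto an already non-empty queue.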

But it is still true the original placement is racy. It looks like what 
is required is an unconditional entity->lock section after 
spsc_queue_push. AFAICT that's the only way to be sure entity->rq is set 
for the submission at hand.


Question also is, why +/- score in entity add/remove and not just for jobs?

In the meantime patch will need to get reverted.

Regards,

Tvrtko


drm_sched_rq_add_entity(rq, entity);
spin_unlock(&entity->rq_lock);

Re: [PATCH v4 6/6] drm/amdgpu: use drm_file::name in task_info::process_desc

2024-09-30 Thread Tvrtko Ursulin




On 27/09/2024 09:48, Pierre-Eric Pelloux-Prayer wrote:

If a drm_file name is set append it to the process name.

This information is useful with the virtio/native-context driver: this
allows the guest application's identifier to be visible in amdgpu's output.

The output in amdgpu_vm_info/amdgpu_gem_info looks like this:
pid:12255   Process:glxgears/test-set-fd-name --

Signed-off-by: Pierre-Eric Pelloux-Prayer 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h|  1 +
  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  3 ++-
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c|  2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c| 26 +++
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h|  2 +-
  drivers/gpu/drm/amd/amdkfd/kfd_process.c  |  3 +++
  6 files changed, 29 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index f9d119448442..ad909173e419 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -299,6 +299,7 @@ int amdgpu_amdkfd_gpuvm_set_vm_pasid(struct amdgpu_device 
*adev,
 struct amdgpu_vm *avm, u32 pasid);
  int amdgpu_amdkfd_gpuvm_acquire_process_vm(struct amdgpu_device *adev,
struct amdgpu_vm *avm,
+   struct drm_file *filp,
void **process_info,
struct dma_fence **ef);
  void amdgpu_amdkfd_gpuvm_release_process_vm(struct amdgpu_device *adev,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 6d5fd371d5ce..172882af6705 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1558,6 +1558,7 @@ int amdgpu_amdkfd_gpuvm_set_vm_pasid(struct amdgpu_device 
*adev,
  
  int amdgpu_amdkfd_gpuvm_acquire_process_vm(struct amdgpu_device *adev,

   struct amdgpu_vm *avm,
+  struct drm_file *filp,
   void **process_info,
   struct dma_fence **ef)
  {
@@ -1577,7 +1578,7 @@ int amdgpu_amdkfd_gpuvm_acquire_process_vm(struct 
amdgpu_device *adev,
if (ret)
return ret;
  
-	amdgpu_vm_set_task_info(avm);

+   amdgpu_vm_set_task_info(avm, filp);
  
  	return 0;

  }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 891128ecee6d..5d43e24906d2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1178,7 +1178,7 @@ static int amdgpu_cs_vm_handling(struct amdgpu_cs_parser 
*p)
}
  
  	/* Use this opportunity to fill in task info for the vm */

-   amdgpu_vm_set_task_info(vm);
+   amdgpu_vm_set_task_info(vm, p->filp);
  
  	if (adev->debug_vm) {

/* Invalidate all BOs to test for userspace bugs */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index cec0a5cffcc8..f6e2be6d4e9e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2355,25 +2355,40 @@ amdgpu_vm_get_task_info_pasid(struct amdgpu_device 
*adev, u32 pasid)
amdgpu_vm_get_vm_from_pasid(adev, pasid));
  }
  
-static int amdgpu_vm_create_task_info(struct amdgpu_vm *vm)

+static int amdgpu_vm_create_task_info(struct amdgpu_vm *vm, struct drm_file 
*filp)
  {
char process_name[TASK_COMM_LEN];
-   int desc_len;
+   size_t desc_len;


Nit - would be nicer to avoid the churn from patch to patch by starting 
with the correct type in the previous patch.


  
  	get_task_comm(process_name, current->group_leader);

desc_len = strlen(process_name);
  
+	mutex_lock(&filp->client_name_lock);

+   if (filp->client_name)
+   desc_len += 1 + strlen(filp->client_name);
+
vm->task_info = kzalloc(
struct_size(vm->task_info, process_desc, desc_len + 1),
GFP_KERNEL);
  
-	if (!vm->task_info)

+   if (!vm->task_info) {
+   mutex_unlock(&filp->client_name_lock);
return -ENOMEM;
+   }
  
  	/* Set process attributes now. */

vm->task_info->tgid = current->group_leader->pid;
strscpy(vm->task_info->process_desc, process_name, desc_len + 1);
  
+	if (filp->client_name) {

+   size_t p_len = strlen(process_name);


Another nit is that you are taking this strlen twice. Maybe cache it in 
a top level local so it looks cleaner.


But those are just nits to make the series look more polished. 
Fundamentals look fine to me so up to you if you want to respin or not.
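
As a rough user-space illustration of the second nit (take the process-name length once and reuse it), something along these lines could work. `toy_build_process_desc()` is a hypothetical helper and uses `calloc()`/`memcpy()` in place of the kernel's `kzalloc()`/`strscpy()`; it is not the code from the patch.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: returns a heap-allocated "process" or
 * "process/client" string, or NULL on allocation failure.
 * client_name may be NULL when no name has been set.
 */
static char *toy_build_process_desc(const char *process_name,
				    const char *client_name)
{
	size_t p_len = strlen(process_name);	/* computed once, reused below */
	size_t c_len = client_name ? strlen(client_name) : 0;
	size_t desc_len = p_len + (client_name ? 1 + c_len : 0);
	char *desc;

	desc = calloc(desc_len + 1, 1);
	if (!desc)
		return NULL;

	memcpy(desc, process_name, p_len);
	if (client_name) {
		desc[p_len] = '/';
		memcpy(desc + p_len + 1, client_name, c_len);
	}
	/* calloc() zeroed the buffer, so the terminating NUL is in place. */

	return desc;
}
```

With the names from the commit message, `toy_build_process_desc("glxgears", "test-set-fd-name")` would yield `glxgears/test-set-fd-name`.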


Regards,

Tvrtko


+
+   vm->task_info->process_desc[p_len] = '/';

Re: [PATCH v4 1/6] drm: add DRM_SET_CLIENT_NAME ioctl

2024-09-30 Thread Tvrtko Ursulin




On 27/09/2024 09:48, Pierre-Eric Pelloux-Prayer wrote:

Giving the opportunity to userspace to associate a free-form
name with a drm_file struct is helpful for tracking and debugging.

This is similar to the existing DMA_BUF_SET_NAME ioctl.

Access to client_name is protected by a mutex, and the 'clients' debugfs
file has been updated to print it.

Userspace MR to use this ioctl:
https://gitlab.freedesktop.org/virgl/virglrenderer/-/merge_requests/1428

If the string passed by userspace contains chars that would mess up output
when it's going to be printed (in dmesg, fdinfo, etc), -EINVAL is returned.

A 0-length string is a valid use, and clears the existing name.

Reviewed-by: Tvrtko Ursulin 
Signed-off-by: Pierre-Eric Pelloux-Prayer 
---
  drivers/gpu/drm/drm_debugfs.c | 14 ++---
  drivers/gpu/drm/drm_file.c|  5 
  drivers/gpu/drm/drm_ioctl.c   | 55 +++
  include/drm/drm_file.h|  9 ++
  include/uapi/drm/drm.h| 17 +++
  5 files changed, 96 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/drm_debugfs.c b/drivers/gpu/drm/drm_debugfs.c
index 6b239a24f1df..5c99322a4c6f 100644
--- a/drivers/gpu/drm/drm_debugfs.c
+++ b/drivers/gpu/drm/drm_debugfs.c
@@ -78,12 +78,14 @@ static int drm_clients_info(struct seq_file *m, void *data)
kuid_t uid;
  
  	seq_printf(m,

-  "%20s %5s %3s master a %5s %10s\n",
+  "%20s %5s %3s master a %5s %10s %*s\n",
   "command",
   "tgid",
   "dev",
   "uid",
-  "magic");
+  "magic",
+  DRM_CLIENT_NAME_MAX_LEN,
+  "name");
  
  	/* dev->filelist is sorted youngest first, but we want to present

 * oldest first (i.e. kernel, servers, clients), so walk backwardss.
@@ -94,19 +96,23 @@ static int drm_clients_info(struct seq_file *m, void *data)
struct task_struct *task;
struct pid *pid;
  
+		mutex_lock(&priv->client_name_lock);

rcu_read_lock(); /* Locks priv->pid and pid_task()->comm! */
pid = rcu_dereference(priv->pid);
task = pid_task(pid, PIDTYPE_TGID);
uid = task ? __task_cred(task)->euid : GLOBAL_ROOT_UID;
-   seq_printf(m, "%20s %5d %3d   %c%c %5d %10u\n",
+   seq_printf(m, "%20s %5d %3d   %c%c %5d %10u %*s\n",
   task ? task->comm : "",
   pid_vnr(pid),
   priv->minor->index,
   is_current_master ? 'y' : 'n',
   priv->authenticated ? 'y' : 'n',
   from_kuid_munged(seq_user_ns(m), uid),
-  priv->magic);
+  priv->magic,
+  DRM_CLIENT_NAME_MAX_LEN,
+  priv->client_name ? priv->client_name : "");
rcu_read_unlock();
+   mutex_unlock(&priv->client_name_lock);
}
mutex_unlock(&dev->filelist_mutex);
return 0;
diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
index 01fde94fe2a9..64f5e15304e7 100644
--- a/drivers/gpu/drm/drm_file.c
+++ b/drivers/gpu/drm/drm_file.c
@@ -158,6 +158,7 @@ struct drm_file *drm_file_alloc(struct drm_minor *minor)
  
  	spin_lock_init(&file->master_lookup_lock);

mutex_init(&file->event_read_lock);
+   mutex_init(&file->client_name_lock);
  
  	if (drm_core_check_feature(dev, DRIVER_GEM))

drm_gem_open(dev, file);
@@ -259,6 +260,10 @@ void drm_file_free(struct drm_file *file)
WARN_ON(!list_empty(&file->event_list));
  
  	put_pid(rcu_access_pointer(file->pid));

+
+   mutex_destroy(&file->client_name_lock);
+   kfree(file->client_name);
+
kfree(file);
  }
  
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c

index 51f39912866f..df8d59bd5241 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -540,6 +540,59 @@ int drm_version(struct drm_device *dev, void *data,
return err;
  }
  
+/*

+ * Check if the passed string contains control char or spaces or
+ * anything that would mess up a formatted output.
+ */
+static int drm_validate_value_string(const char *value, size_t len)
+{
+   int i;
+
+   for (i = 0; i < len; i++) {
+   if (value[i] <= 32 || value[i] >= 127)


Would !isascii() || isgraph() work for what you have in mind here, 
considering the comment from the cover letter about the extended ASCII?
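
For reference, a user-space sketch of what the ctype-based check might look like. It assumes the intent is to reject any byte that is either non-ASCII or not a printable, non-space character, and it assumes the default "C" locale; `toy_validate_value_string()` is a hypothetical name, and the kernel's own ctype helpers may behave differently for bytes above 127.

```c
#include <ctype.h>
#include <errno.h>
#include <stddef.h>

/* Returns 0 if every byte is a printable, non-space ASCII character. */
static int toy_validate_value_string(const char *value, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++) {
		/* Cast first: passing a negative char to isgraph() is undefined. */
		unsigned char c = (unsigned char)value[i];

		/* Reject non-ASCII bytes, control characters and spaces. */
		if (c > 127 || !isgraph(c))
			return -EINVAL;
	}

	return 0;
}
```

In the "C" locale this accepts the same range as the open-coded `<= 32 || >= 127` test, with the intent spelled out by name.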



+   return -EINVAL;
+   }

Re: [PATCH v2] drm/sched: Further optimise drm_sched_entity_push_job

2024-09-26 Thread Tvrtko Ursulin




On 26/09/2024 09:15, Philipp Stanner wrote:

On Mon, 2024-09-23 at 15:35 +0100, Tvrtko Ursulin wrote:


Ping Christian and Philipp - reasonably happy with v2? I think it's the
only unreviewed patch from the series.


Howdy,

sry for the delay, I had been traveling.

I have a few nits below regarding the commit message. Besides, I'm OK
with that, thx for your work :)


No worries.


On 16/09/2024 18:30, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

Having removed one re-lock cycle on the entity->lock in a patch titled
"drm/sched: Optimise drm_sched_entity_push_job", with only a tiny bit
larger refactoring we can do the same optimisation


Well, the commit message does not state which optimization that is. One
would have to look for the previous patch, which you apparently cannot
provide a commit ID for yet because it's not in Big Boss's branch.


With added emphasis:

"Having _removed one re-lock cycle_ on the entity-lock..."

"...do the same optimisation on the rq->lock."

How is it not clear?


In this case I am for including a sentence about what is being
optimized also because


on the rq->lock.
(Currently both drm_sched_rq_add_entity() and
drm_sched_rq_update_fifo_locked() take and release the same lock.)

To achieve this we make drm_sched_rq_update_fifo_locked() and


it's not clear what the "this" that's being achieved is.


"This" is the optimisation previous paragraph talks about.

What/why followed by how.

I honestly think this part of the commit text is good enough.


drm_sched_rq_add_entity() expect the rq->lock to be held.

We also align drm_sched_rq_update_fifo_locked(),
drm_sched_rq_add_entity() and
drm_sched_rq_remove_fifo_locked() function signatures, by adding rq as a
parameter to the latter.

v2:
   * Fix after rebase of the series.
   * Avoid naming inconsistency between drm_sched_rq_add/remove. (Christian)

Signed-off-by: Tvrtko Ursulin 


Reviewed-by: Philipp Stanner 


Thank you!


Cc: Christian König 
Cc: Alex Deucher 
Cc: Luben Tuikov 
Cc: Matthew Brost 
Cc: Philipp Stanner 
---
   drivers/gpu/drm/scheduler/sched_entity.c | 12 --
   drivers/gpu/drm/scheduler/sched_main.c   | 29 ---
-
   include/drm/gpu_scheduler.h  |  3 ++-
   3 files changed, 26 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c
b/drivers/gpu/drm/scheduler/sched_entity.c
index d982cebc6bee..8ace1f1ea66b 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -515,9 +515,14 @@ struct drm_sched_job
*drm_sched_entity_pop_job(struct drm_sched_entity *entity)
   
    next = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
    if (next) {
+   struct drm_sched_rq *rq;
+
    spin_lock(&entity->lock);
-   drm_sched_rq_update_fifo_locked(entity,
+   rq = entity->rq;
+   spin_lock(&rq->lock);
+   drm_sched_rq_update_fifo_locked(entity, rq,
    next->submit_ts);
+   spin_unlock(&rq->lock);
    spin_unlock(&entity->lock);
    }
    }
@@ -618,11 +623,14 @@ void drm_sched_entity_push_job(struct
drm_sched_job *sched_job)
    sched = rq->sched;
   
   		atomic_inc(sched->score);

+
+   spin_lock(&rq->lock);
    drm_sched_rq_add_entity(rq, entity);
   
   		if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)

-   drm_sched_rq_update_fifo_locked(entity, submit_ts);
+   drm_sched_rq_update_fifo_locked(entity, rq, submit_ts);
   
+		spin_unlock(&rq->lock);

    spin_unlock(&entity->lock);
   
   		drm_sched_wakeup(sched, entity);

diff --git a/drivers/gpu/drm/scheduler/sched_main.c
b/drivers/gpu/drm/scheduler/sched_main.c
index 18a952f73ecb..5c83fb92bb89 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -153,17 +153,18 @@ static __always_inline bool
drm_sched_entity_compare_before(struct rb_node *a,
    return ktime_before(ent_a->oldest_job_waiting, ent_b->oldest_job_waiting);
   }
   
-static inline void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *entity)
+static void drm_sched_rq_remove_fifo_locked(struct


I think the commit message should contain a short sentence about why
you removed the inline.

AKA "As we're at it, remove the inline function specifier from
drm_sched_rq_remove_fifo_locked() because XYZ"


Fair play on this one, should have mentioned it. Probably just removed 
the inline by habit while touching the function signature. Under the 
"compiler knows better" mantra.


Regards,

Tvrtko


drm_sched_enti

Re: [PATCH 2/8] drm/sched: Always wake up correct scheduler in drm_sched_entity_push_job

2024-09-24 Thread Tvrtko Ursulin




On 24/09/2024 15:20, Christian König wrote:

Am 24.09.24 um 16:12 schrieb Tvrtko Ursulin:


On 24/09/2024 14:55, Christian König wrote:
I've pushed the first to drm-misc-next, but that one here fails to 
apply cleanly.


This appears to be due to 440d52b370b0 ("drm/sched: Fix dynamic job-flow 
control race") in drm-misc-fixes.


In theory 1-3 from my series are fixes. Should they also go to 
drm-misc-fixes? I am not too familiar with the drm-misc flow.


Ah shit, in that case you should have split the patches up into fixes 
and next. Going to push the first 3 to fixes.


Sorry my drm-intel ways of thinking (cherry picked fixes) are hard to 
get rid of. Hence the series was structured as 1-3 fixes, 4-8 refactors etc.


Now it appears it is too late to pull out the first one from drm-misc-next.

Or the series now needs to wait for some backmerge?


Are the remaining 3 patches independent? If not then we need to wait for 
a backmerge.


These are independent:

Fixes:

 1/8 "drm/sched: Add locking to drm_sched_entity_modify_sched"

Not fixes:

 5/8 "drm/sched: Stop setting current entity in FIFO mode"
 6/8 "drm/sched: Re-order struct drm_sched_rq members for clarity"

While the rest touch at least some common areas.

2/8 and 3/8 are also fixes.

4/8, 7/8 and 8/8 not fixes but depend on 2/8 and 3/8.

Regards,

Tvrtko


Am 24.09.24 um 12:19 schrieb Tvrtko Ursulin:

From: Tvrtko Ursulin 

Since drm_sched_entity_modify_sched() can modify the entity's run queue,
let's make sure to only dereference the pointer once so both adding and
waking up are guaranteed to be consistent.

The alternative of moving the spin_unlock to after the wake up would for now
be more problematic since the same lock is taken inside
drm_sched_rq_update_fifo().

v2:
  * Improve commit message. (Philipp)
  * Cache the scheduler pointer directly. (Christian)

Signed-off-by: Tvrtko Ursulin 
Fixes: b37aced31eb0 ("drm/scheduler: implement a function to modify 
sched list")

Cc: Christian König 
Cc: Alex Deucher 
Cc: Luben Tuikov 
Cc: Matthew Brost 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Philipp Stanner 
Cc: dri-devel@lists.freedesktop.org
Cc:  # v5.7+
Reviewed-by: Christian König 
---
  drivers/gpu/drm/scheduler/sched_entity.c | 10 --
  1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c

index 0e002c17fcb6..a75eede8bf8d 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -599,6 +599,9 @@ void drm_sched_entity_push_job(struct 
drm_sched_job *sched_job)

  /* first job wakes up scheduler */
  if (first) {
+    struct drm_gpu_scheduler *sched;
+    struct drm_sched_rq *rq;
+
  /* Add the entity to the run queue */
  spin_lock(&entity->rq_lock);
  if (entity->stopped) {
@@ -608,13 +611,16 @@ void drm_sched_entity_push_job(struct 
drm_sched_job *sched_job)

  return;
  }
-    drm_sched_rq_add_entity(entity->rq, entity);
+    rq = entity->rq;
+    sched = rq->sched;
+
+    drm_sched_rq_add_entity(rq, entity);
  spin_unlock(&entity->rq_lock);
  if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
  drm_sched_rq_update_fifo(entity, submit_ts);
-    drm_sched_wakeup(entity->rq->sched);
+    drm_sched_wakeup(sched);
  }
  }
  EXPORT_SYMBOL(drm_sched_entity_push_job);

Re: [PATCH 2/8] drm/sched: Always wake up correct scheduler in drm_sched_entity_push_job

2024-09-24 Thread Tvrtko Ursulin




On 24/09/2024 14:55, Christian König wrote:
I've pushed the first to drm-misc-next, but that one here fails to apply 
cleanly.


This appears to be due to 440d52b370b0 ("drm/sched: Fix dynamic job-flow 
control race") in drm-misc-fixes.


In theory 1-3 from my series are fixes. Should they also go to 
drm-misc-fixes? I am not too familiar with the drm-misc flow.


Or the series now needs to wait for some backmerge?

Regards,

Tvrtko


Am 24.09.24 um 12:19 schrieb Tvrtko Ursulin:

From: Tvrtko Ursulin 

Since drm_sched_entity_modify_sched() can modify the entity's run queue,
let's make sure to only dereference the pointer once so both adding and
waking up are guaranteed to be consistent.

The alternative of moving the spin_unlock to after the wake up would for now
be more problematic since the same lock is taken inside
drm_sched_rq_update_fifo().

v2:
  * Improve commit message. (Philipp)
  * Cache the scheduler pointer directly. (Christian)

Signed-off-by: Tvrtko Ursulin 
Fixes: b37aced31eb0 ("drm/scheduler: implement a function to modify 
sched list")

Cc: Christian König 
Cc: Alex Deucher 
Cc: Luben Tuikov 
Cc: Matthew Brost 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Philipp Stanner 
Cc: dri-devel@lists.freedesktop.org
Cc:  # v5.7+
Reviewed-by: Christian König 
---
  drivers/gpu/drm/scheduler/sched_entity.c | 10 --
  1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c

index 0e002c17fcb6..a75eede8bf8d 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -599,6 +599,9 @@ void drm_sched_entity_push_job(struct 
drm_sched_job *sched_job)

  /* first job wakes up scheduler */
  if (first) {
+    struct drm_gpu_scheduler *sched;
+    struct drm_sched_rq *rq;
+
  /* Add the entity to the run queue */
  spin_lock(&entity->rq_lock);
  if (entity->stopped) {
@@ -608,13 +611,16 @@ void drm_sched_entity_push_job(struct 
drm_sched_job *sched_job)

  return;
  }
-    drm_sched_rq_add_entity(entity->rq, entity);
+    rq = entity->rq;
+    sched = rq->sched;
+
+    drm_sched_rq_add_entity(rq, entity);
  spin_unlock(&entity->rq_lock);
  if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
  drm_sched_rq_update_fifo(entity, submit_ts);
-    drm_sched_wakeup(entity->rq->sched);
+    drm_sched_wakeup(sched);
  }
  }
  EXPORT_SYMBOL(drm_sched_entity_push_job);

[PATCH 8/8] drm/sched: Further optimise drm_sched_entity_push_job

2024-09-24 Thread Tvrtko Ursulin

From: Tvrtko Ursulin 

Having removed one re-lock cycle on the entity->lock in a patch titled
"drm/sched: Optimise drm_sched_entity_push_job", with only a tiny bit
larger refactoring we can do the same optimisation on the rq->lock.
(Currently both drm_sched_rq_add_entity() and
drm_sched_rq_update_fifo_locked() take and release the same lock.)

To achieve this we make drm_sched_rq_update_fifo_locked() and
drm_sched_rq_add_entity() expect the rq->lock to be held.

We also align drm_sched_rq_update_fifo_locked(),
drm_sched_rq_add_entity() and
drm_sched_rq_remove_fifo_locked() function signatures, by adding rq as a
parameter to the latter.
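
The "caller holds the lock" convention described above can be sketched in user space as follows. This is a toy model under stated assumptions: `toy_lock` only records state and is not a real spinlock, and every `toy_*` name is hypothetical. The `-1` return loosely mimics what `lockdep_assert_held()` would flag in the kernel.

```c
#include <stdbool.h>

/* Toy lock that only records state; it is not a real spinlock. */
struct toy_lock {
	bool held;
	int acquisitions;
};

struct toy_rq {
	struct toy_lock lock;
	int nr_entities;
	long oldest_ts;
};

static void toy_lock_acquire(struct toy_lock *l)
{
	l->held = true;
	l->acquisitions++;
}

static void toy_lock_release(struct toy_lock *l)
{
	l->held = false;
}

/* "_locked" helpers: the caller must already hold rq->lock. */
static int toy_rq_add_entity_locked(struct toy_rq *rq)
{
	if (!rq->lock.held)
		return -1;
	rq->nr_entities++;
	return 0;
}

static int toy_rq_update_fifo_locked(struct toy_rq *rq, long ts)
{
	if (!rq->lock.held)
		return -1;
	rq->oldest_ts = ts;
	return 0;
}

/* One lock/unlock cycle covers both updates. */
static int toy_push(struct toy_rq *rq, long ts)
{
	int ret;

	toy_lock_acquire(&rq->lock);
	ret = toy_rq_add_entity_locked(rq);
	if (!ret)
		ret = toy_rq_update_fifo_locked(rq, ts);
	toy_lock_release(&rq->lock);

	return ret;
}
```

The point of the sketch is that `toy_push()` acquires the lock once for both updates, where helpers that lock internally would each need their own cycle.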

v2:
 * Fix after rebase of the series.
 * Avoid naming inconsistency between drm_sched_rq_add/remove. (Christian)

Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Cc: Alex Deucher 
Cc: Luben Tuikov 
Cc: Matthew Brost 
Cc: Philipp Stanner 
Reviewed-by: Christian König 
---
 drivers/gpu/drm/scheduler/sched_entity.c | 12 --
 drivers/gpu/drm/scheduler/sched_main.c   | 29 
 include/drm/gpu_scheduler.h  |  3 ++-
 3 files changed, 26 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c
index 5ebbba77e77d..0aa90829c1d2 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -515,9 +515,14 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct 
drm_sched_entity *entity)
 
next = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
if (next) {
+   struct drm_sched_rq *rq;
+
spin_lock(&entity->lock);
-   drm_sched_rq_update_fifo_locked(entity,
+   rq = entity->rq;
+   spin_lock(&rq->lock);
+   drm_sched_rq_update_fifo_locked(entity, rq,
next->submit_ts);
+   spin_unlock(&rq->lock);
spin_unlock(&entity->lock);
}
}
@@ -618,11 +623,14 @@ void drm_sched_entity_push_job(struct drm_sched_job 
*sched_job)
sched = rq->sched;
 
atomic_inc(sched->score);
+
+   spin_lock(&rq->lock);
drm_sched_rq_add_entity(rq, entity);
 
if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
-   drm_sched_rq_update_fifo_locked(entity, submit_ts);
+   drm_sched_rq_update_fifo_locked(entity, rq, submit_ts);
 
+   spin_unlock(&rq->lock);
spin_unlock(&entity->lock);
 
drm_sched_wakeup(sched);
diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index 5628a4c78242..bdb55545b57c 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -153,17 +153,18 @@ static __always_inline bool 
drm_sched_entity_compare_before(struct rb_node *a,
return ktime_before(ent_a->oldest_job_waiting, 
ent_b->oldest_job_waiting);
 }
 
-static inline void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity 
*entity)
+static void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *entity,
+   struct drm_sched_rq *rq)
 {
-   struct drm_sched_rq *rq = entity->rq;
-
if (!RB_EMPTY_NODE(&entity->rb_tree_node)) {
rb_erase_cached(&entity->rb_tree_node, &rq->rb_tree_root);
RB_CLEAR_NODE(&entity->rb_tree_node);
}
 }
 
-void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, ktime_t 
ts)
+void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity,
+struct drm_sched_rq *rq,
+ktime_t ts)
 {
/*
 * Both locks need to be grabbed, one to protect from entity->rq change
@@ -171,17 +172,14 @@ void drm_sched_rq_update_fifo_locked(struct 
drm_sched_entity *entity, ktime_t ts
 * other to update the rb tree structure.
 */
lockdep_assert_held(&entity->lock);
+   lockdep_assert_held(&rq->lock);
 
-   spin_lock(&entity->rq->lock);
-
-   drm_sched_rq_remove_fifo_locked(entity);
+   drm_sched_rq_remove_fifo_locked(entity, rq);
 
entity->oldest_job_waiting = ts;
 
-   rb_add_cached(&entity->rb_tree_node, &entity->rq->rb_tree_root,
+   rb_add_cached(&entity->rb_tree_node, &rq->rb_tree_root,
  drm_sched_entity_compare_before);
-
-   spin_unlock(&entity->rq->lock);
 }
 
 /**
@@ -213,15 +211,14 @@ static void drm_sched_rq_init(struct drm_gpu_scheduler 
*sched,
 void drm_sched_rq_add_entity(struct drm_sched_r

[PATCH 7/8] drm/sched: Re-group and rename the entity run-queue lock

2024-09-24 Thread Tvrtko Ursulin

From: Tvrtko Ursulin 

Christian suggested renaming the lock and improving the documentation of
what it protects, and also re-ordering the structure members so that all
members protected by the lock are together in a block.

Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Cc: Alex Deucher 
Cc: Luben Tuikov 
Cc: Matthew Brost 
Cc: Philipp Stanner 
Reviewed-by: Christian König 
---
 drivers/gpu/drm/scheduler/sched_entity.c | 28 
 drivers/gpu/drm/scheduler/sched_main.c   |  2 +-
 include/drm/gpu_scheduler.h  | 15 +++--
 3 files changed, 23 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c
index 59f710afe992..5ebbba77e77d 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -105,7 +105,7 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
/* We start in an idle state. */
complete_all(&entity->entity_idle);
 
-   spin_lock_init(&entity->rq_lock);
+   spin_lock_init(&entity->lock);
spsc_queue_init(&entity->job_queue);
 
atomic_set(&entity->fence_seq, 0);
@@ -133,10 +133,10 @@ void drm_sched_entity_modify_sched(struct 
drm_sched_entity *entity,
 {
WARN_ON(!num_sched_list || !sched_list);
 
-   spin_lock(&entity->rq_lock);
+   spin_lock(&entity->lock);
entity->sched_list = sched_list;
entity->num_sched_list = num_sched_list;
-   spin_unlock(&entity->rq_lock);
+   spin_unlock(&entity->lock);
 }
 EXPORT_SYMBOL(drm_sched_entity_modify_sched);
 
@@ -244,10 +244,10 @@ static void drm_sched_entity_kill(struct drm_sched_entity 
*entity)
if (!entity->rq)
return;
 
-   spin_lock(&entity->rq_lock);
+   spin_lock(&entity->lock);
entity->stopped = true;
drm_sched_rq_remove_entity(entity->rq, entity);
-   spin_unlock(&entity->rq_lock);
+   spin_unlock(&entity->lock);
 
/* Make sure this entity is not used by the scheduler at the moment */
wait_for_completion(&entity->entity_idle);
@@ -396,9 +396,9 @@ static void drm_sched_entity_wakeup(struct dma_fence *f,
 void drm_sched_entity_set_priority(struct drm_sched_entity *entity,
   enum drm_sched_priority priority)
 {
-   spin_lock(&entity->rq_lock);
+   spin_lock(&entity->lock);
entity->priority = priority;
-   spin_unlock(&entity->rq_lock);
+   spin_unlock(&entity->lock);
 }
 EXPORT_SYMBOL(drm_sched_entity_set_priority);
 
@@ -515,10 +515,10 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct 
drm_sched_entity *entity)
 
next = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
if (next) {
-   spin_lock(&entity->rq_lock);
+   spin_lock(&entity->lock);
drm_sched_rq_update_fifo_locked(entity,
next->submit_ts);
-   spin_unlock(&entity->rq_lock);
+   spin_unlock(&entity->lock);
}
}
 
@@ -559,14 +559,14 @@ void drm_sched_entity_select_rq(struct drm_sched_entity 
*entity)
if (fence && !dma_fence_is_signaled(fence))
return;
 
-   spin_lock(&entity->rq_lock);
+   spin_lock(&entity->lock);
sched = drm_sched_pick_best(entity->sched_list, entity->num_sched_list);
rq = sched ? sched->sched_rq[entity->priority] : NULL;
if (rq != entity->rq) {
drm_sched_rq_remove_entity(entity->rq, entity);
entity->rq = rq;
}
-   spin_unlock(&entity->rq_lock);
+   spin_unlock(&entity->lock);
 
if (entity->num_sched_list == 1)
entity->sched_list = NULL;
@@ -606,9 +606,9 @@ void drm_sched_entity_push_job(struct drm_sched_job 
*sched_job)
struct drm_sched_rq *rq;
 
/* Add the entity to the run queue */
-   spin_lock(&entity->rq_lock);
+   spin_lock(&entity->lock);
if (entity->stopped) {
-   spin_unlock(&entity->rq_lock);
+   spin_unlock(&entity->lock);
 
DRM_ERROR("Trying to push to a killed entity\n");
return;
@@ -623,7 +623,7 @@ void drm_sched_entity_push_job(struct drm_sched_job 
*sched_job)
if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
drm_sched_rq_update_fifo_locked(entity, submit_ts);
 
-   spin_unlock(&entity->rq_lock);
+   spin_unlock(&entity->lock);

[PATCH 2/8] drm/sched: Always wake up correct scheduler in drm_sched_entity_push_job

2024-09-24 Thread Tvrtko Ursulin

From: Tvrtko Ursulin 

Since drm_sched_entity_modify_sched() can modify the entity's run queue,
let's make sure to only dereference the pointer once so both adding and
waking up are guaranteed to be consistent.

The alternative of moving the spin_unlock to after the wake up would for now
be more problematic since the same lock is taken inside
drm_sched_rq_update_fifo().
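
A minimal single-threaded sketch of the "dereference once" idea, assuming a hypothetical hook that stands in for a concurrent change of the entity's run queue. All `toy_*` names are illustrative only, and the locking is reduced to comments.

```c
#include <stddef.h>

struct toy_sched {
	int wakeups;
};

struct toy_rq {
	struct toy_sched *sched;
	int nr_entities;
};

struct toy_entity {
	struct toy_rq *rq;
};

/*
 * Hypothetical hook run after the (imaginary) unlock, standing in for a
 * concurrent change of entity->rq.
 */
typedef void (*toy_race_fn)(struct toy_entity *entity, void *data);

static void toy_switch_rq(struct toy_entity *entity, void *data)
{
	entity->rq = data;
}

static void toy_push_first_job(struct toy_entity *entity,
			       toy_race_fn race, void *data)
{
	struct toy_sched *sched;
	struct toy_rq *rq;

	/* The entity lock would be taken here. */
	rq = entity->rq;	/* read the pointer once ... */
	sched = rq->sched;	/* ... and cache the scheduler */
	rq->nr_entities++;
	/* The entity lock would be dropped here. */

	if (race)
		race(entity, data);	/* entity->rq may change from here on */

	/* Wake the scheduler the entity was actually added to. */
	sched->wakeups++;
}
```

Even if `entity->rq` is switched to another run queue after the unlock, the wakeup still goes to the scheduler whose run queue received the entity.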

v2:
 * Improve commit message. (Philipp)
 * Cache the scheduler pointer directly. (Christian)

Signed-off-by: Tvrtko Ursulin 
Fixes: b37aced31eb0 ("drm/scheduler: implement a function to modify sched list")
Cc: Christian König 
Cc: Alex Deucher 
Cc: Luben Tuikov 
Cc: Matthew Brost 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Philipp Stanner 
Cc: dri-devel@lists.freedesktop.org
Cc:  # v5.7+
Reviewed-by: Christian König 
---
 drivers/gpu/drm/scheduler/sched_entity.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c
index 0e002c17fcb6..a75eede8bf8d 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -599,6 +599,9 @@ void drm_sched_entity_push_job(struct drm_sched_job 
*sched_job)
 
/* first job wakes up scheduler */
if (first) {
+   struct drm_gpu_scheduler *sched;
+   struct drm_sched_rq *rq;
+
/* Add the entity to the run queue */
spin_lock(&entity->rq_lock);
if (entity->stopped) {
@@ -608,13 +611,16 @@ void drm_sched_entity_push_job(struct drm_sched_job 
*sched_job)
return;
}
 
-   drm_sched_rq_add_entity(entity->rq, entity);
+   rq = entity->rq;
+   sched = rq->sched;
+
+   drm_sched_rq_add_entity(rq, entity);
spin_unlock(&entity->rq_lock);
 
if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
drm_sched_rq_update_fifo(entity, submit_ts);
 
-   drm_sched_wakeup(entity->rq->sched);
+   drm_sched_wakeup(sched);
}
 }
 EXPORT_SYMBOL(drm_sched_entity_push_job);
-- 
2.46.0

[PATCH 5/8] drm/sched: Stop setting current entity in FIFO mode

2024-09-24 Thread Tvrtko Ursulin

From: Tvrtko Ursulin 

It does not seem there is a need to set the current entity in FIFO mode
since it only serves as a "cursor" in round-robin mode. Even if the
scheduling mode is changed at runtime the change in behaviour is simply
to restart from the first entity, instead of continuing in RR mode from
where FIFO left it, and that sounds completely fine.

Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Cc: Alex Deucher 
Cc: Luben Tuikov 
Cc: Matthew Brost 
Cc: Philipp Stanner 
Acked-by: Christian König 
Reviewed-by: Philipp Stanner 
---
 drivers/gpu/drm/scheduler/sched_main.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index e312eb6ac85a..130b53f02bbf 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -349,7 +349,6 @@ drm_sched_rq_select_entity_fifo(struct drm_gpu_scheduler 
*sched,
return ERR_PTR(-ENOSPC);
}

-   rq->current_entity = entity;
reinit_completion(&entity->entity_idle);
break;
}
-- 
2.46.0

[PATCH 1/8] drm/sched: Add locking to drm_sched_entity_modify_sched

2024-09-24 Thread Tvrtko Ursulin

From: Tvrtko Ursulin 

Without the locking amdgpu currently can race between
amdgpu_ctx_set_entity_priority() (via drm_sched_entity_modify_sched()) and
drm_sched_job_arm(), leading to the latter accessing a potentially
inconsistent entity->sched_list and entity->num_sched_list pair.

v2:
 * Improve commit message. (Philipp)

Signed-off-by: Tvrtko Ursulin 
Fixes: b37aced31eb0 ("drm/scheduler: implement a function to modify sched list")
Cc: Christian König 
Cc: Alex Deucher 
Cc: Luben Tuikov 
Cc: Matthew Brost 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: dri-devel@lists.freedesktop.org
Cc: Philipp Stanner 
Cc:  # v5.7+
Reviewed-by: Christian König 
---
 drivers/gpu/drm/scheduler/sched_entity.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c
index 567e5ace6d0c..0e002c17fcb6 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -133,8 +133,10 @@ void drm_sched_entity_modify_sched(struct drm_sched_entity 
*entity,
 {
WARN_ON(!num_sched_list || !sched_list);
 
+   spin_lock(&entity->rq_lock);
entity->sched_list = sched_list;
entity->num_sched_list = num_sched_list;
+   spin_unlock(&entity->rq_lock);
 }
 EXPORT_SYMBOL(drm_sched_entity_modify_sched);
 
-- 
2.46.0
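
The fix above works because the list pointer and its length are written as a pair under one lock. A minimal userspace sketch of the idea, assuming the reader takes the same lock; a pthread mutex stands in for the spinlock and all names are hypothetical:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical entity holding a (list, length) pair. */
struct entity {
	pthread_mutex_t lock;
	const int *sched_list;
	size_t num_sched_list;
};

/* Writer: both fields change inside one critical section. */
static void entity_modify_sched(struct entity *e, const int *list, size_t n)
{
	pthread_mutex_lock(&e->lock);
	e->sched_list = list;
	e->num_sched_list = n;
	pthread_mutex_unlock(&e->lock);
}

/* Reader: sees either the old pair or the new pair, never a mix. */
static int entity_last_sched(struct entity *e)
{
	int v;

	pthread_mutex_lock(&e->lock);
	assert(e->num_sched_list > 0);
	v = e->sched_list[e->num_sched_list - 1];
	pthread_mutex_unlock(&e->lock);
	return v;
}
```

Without the lock in the writer, a reader could combine the new list pointer with the old length (or the reverse) and index out of bounds.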

[PATCH 4/8] drm/sched: Optimise drm_sched_entity_push_job

2024-09-24 Thread Tvrtko Ursulin

From: Tvrtko Ursulin 

In FIFO mode we can avoid dropping the lock only to immediately re-acquire
it by adding a new drm_sched_rq_update_fifo_locked() helper.

v2:
 * Remove drm_sched_rq_update_fifo() altogether. (Christian)

Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Cc: Alex Deucher 
Cc: Luben Tuikov 
Cc: Matthew Brost 
Cc: Philipp Stanner 
Reviewed-by: Christian König 
---
 drivers/gpu/drm/scheduler/sched_entity.c | 13 +
 drivers/gpu/drm/scheduler/sched_main.c   |  6 +++---
 include/drm/gpu_scheduler.h  |  2 +-
 3 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c
index b2cf3e0c1838..59f710afe992 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -514,8 +514,12 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct 
drm_sched_entity *entity)
struct drm_sched_job *next;
 
next = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
-   if (next)
-   drm_sched_rq_update_fifo(entity, next->submit_ts);
+   if (next) {
+   spin_lock(&entity->rq_lock);
+   drm_sched_rq_update_fifo_locked(entity,
+   next->submit_ts);
+   spin_unlock(&entity->rq_lock);
+   }
}
 
/* Jobs and entities might have different lifecycles. Since we're
@@ -615,10 +619,11 @@ void drm_sched_entity_push_job(struct drm_sched_job 
*sched_job)
 
atomic_inc(sched->score);
drm_sched_rq_add_entity(rq, entity);
-   spin_unlock(&entity->rq_lock);
 
if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
-   drm_sched_rq_update_fifo(entity, submit_ts);
+   drm_sched_rq_update_fifo_locked(entity, submit_ts);
+
+   spin_unlock(&entity->rq_lock);
 
drm_sched_wakeup(sched);
}
diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index 03c532590e2a..e312eb6ac85a 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -163,14 +163,15 @@ static inline void drm_sched_rq_remove_fifo_locked(struct 
drm_sched_entity *enti
}
 }
 
-void drm_sched_rq_update_fifo(struct drm_sched_entity *entity, ktime_t ts)
+void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, ktime_t 
ts)
 {
/*
 * Both locks need to be grabbed, one to protect from entity->rq change
 * for entity from within concurrent drm_sched_entity_select_rq and the
 * other to update the rb tree structure.
 */
-   spin_lock(&entity->rq_lock);
+   lockdep_assert_held(&entity->rq_lock);
+
spin_lock(&entity->rq->lock);
 
drm_sched_rq_remove_fifo_locked(entity);
@@ -181,7 +182,6 @@ void drm_sched_rq_update_fifo(struct drm_sched_entity 
*entity, ktime_t ts)
  drm_sched_entity_compare_before);
 
spin_unlock(&entity->rq->lock);
-   spin_unlock(&entity->rq_lock);
 }
 
 /**
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 0b679700a63a..f62cc31fea18 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -593,7 +593,7 @@ void drm_sched_rq_add_entity(struct drm_sched_rq *rq,
 void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
struct drm_sched_entity *entity);
 
-void drm_sched_rq_update_fifo(struct drm_sched_entity *entity, ktime_t ts);
+void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, ktime_t 
ts);
 
 int drm_sched_entity_init(struct drm_sched_entity *entity,
  enum drm_sched_priority priority,
-- 
2.46.0
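
The "_locked" helper convention used by this patch can be sketched in userspace as follows. This is only an illustration under stated assumptions: a pthread mutex replaces the spinlock, a plain boolean flag stands in for lockdep_assert_held(), and all names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct entity {
	pthread_mutex_t lock;
	bool lock_held;		/* crude stand-in for lockdep tracking */
	bool on_rq;
	long oldest_ts;
	int lock_cycles;	/* counts lock acquisitions */
};

static void entity_lock(struct entity *e)
{
	pthread_mutex_lock(&e->lock);
	e->lock_held = true;
	e->lock_cycles++;
}

static void entity_unlock(struct entity *e)
{
	e->lock_held = false;
	pthread_mutex_unlock(&e->lock);
}

/* Caller must hold e->lock; the helper only asserts it. */
static void update_fifo_locked(struct entity *e, long ts)
{
	assert(e->lock_held);
	e->oldest_ts = ts;
}

/* One lock cycle covers both the add and the FIFO update. */
static void push_job(struct entity *e, long ts)
{
	entity_lock(e);
	e->on_rq = true;
	update_fifo_locked(e, ts);
	entity_unlock(e);
}
```

Compared with a helper that takes the lock itself, the caller here pays for one acquire and release instead of two back to back.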

[PATCH 6/8] drm/sched: Re-order struct drm_sched_rq members for clarity

2024-09-24 Thread Tvrtko Ursulin

From: Tvrtko Ursulin 

Current kerneldoc for struct drm_sched_rq incompletely documents what
fields are protected by the lock.

This is not good because it is misleading.

Let's fix it by listing all the elements which are protected by the lock.

While at it, let's also re-order the members so all those protected by the
lock are in a single group.

v2:
 * Refer to variables using kerneldoc syntax, more verbose commit text. (Philipp)

Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Cc: Alex Deucher 
Cc: Luben Tuikov 
Cc: Matthew Brost 
Cc: Philipp Stanner 
Reviewed-by: Christian König 
Reviewed-by: Philipp Stanner 
---
 include/drm/gpu_scheduler.h | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index f62cc31fea18..33c60889e0a3 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -243,10 +243,10 @@ struct drm_sched_entity {
 /**
  * struct drm_sched_rq - queue of entities to be scheduled.
  *
- * @lock: to modify the entities list.
  * @sched: the scheduler to which this rq belongs to.
- * @entities: list of the entities to be scheduled.
+ * @lock: protects @entities, @rb_tree_root and @current_entity.
  * @current_entity: the entity which is to be scheduled.
+ * @entities: list of the entities to be scheduled.
  * @rb_tree_root: root of time based priory queue of entities for FIFO 
scheduling
  *
  * Run queue is a set of entities scheduling command submissions for
@@ -254,10 +254,12 @@ struct drm_sched_entity {
  * the next entity to emit commands from.
  */
 struct drm_sched_rq {
-   spinlock_t  lock;
struct drm_gpu_scheduler*sched;
-   struct list_headentities;
+
+   spinlock_t  lock;
+   /* Following members are protected by the @lock: */
struct drm_sched_entity *current_entity;
+   struct list_headentities;
struct rb_root_cached   rb_tree_root;
 };
 
-- 
2.46.0

[PATCH 3/8] drm/sched: Always increment correct scheduler score

2024-09-24 Thread Tvrtko Ursulin

From: Tvrtko Ursulin 

An entity's run queue can change during drm_sched_entity_push_job() so make
sure to update the score consistently.

Signed-off-by: Tvrtko Ursulin 
Fixes: d41a39dda140 ("drm/scheduler: improve job distribution with multiple 
queues")
Cc: Nirmoy Das 
Cc: Christian König 
Cc: Luben Tuikov 
Cc: Matthew Brost 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: dri-devel@lists.freedesktop.org
Cc:  # v5.9+
Reviewed-by: Christian König 
Reviewed-by: Nirmoy Das 
---
 drivers/gpu/drm/scheduler/sched_entity.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c
index a75eede8bf8d..b2cf3e0c1838 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -586,7 +586,6 @@ void drm_sched_entity_push_job(struct drm_sched_job 
*sched_job)
ktime_t submit_ts;
 
trace_drm_sched_job(sched_job, entity);
-   atomic_inc(entity->rq->sched->score);
WRITE_ONCE(entity->last_user, current->group_leader);
 
/*
@@ -614,6 +613,7 @@ void drm_sched_entity_push_job(struct drm_sched_job 
*sched_job)
rq = entity->rq;
sched = rq->sched;
 
+   atomic_inc(sched->score);
drm_sched_rq_add_entity(rq, entity);
spin_unlock(&entity->rq_lock);
 
-- 
2.46.0

[PATCH v3 0/8] DRM scheduler fixes and improvements

2024-09-24 Thread Tvrtko Ursulin

From: Tvrtko Ursulin 

All reviewed now, re-sending after rebasing on latest drm-tip so it is in a
mergeable state.

Tvrtko Ursulin (8):
  drm/sched: Add locking to drm_sched_entity_modify_sched
  drm/sched: Always wake up correct scheduler in
drm_sched_entity_push_job
  drm/sched: Always increment correct scheduler score
  drm/sched: Optimise drm_sched_entity_push_job
  drm/sched: Stop setting current entity in FIFO mode
  drm/sched: Re-order struct drm_sched_rq members for clarity
  drm/sched: Re-group and rename the entity run-queue lock
  drm/sched: Further optimise drm_sched_entity_push_job

 drivers/gpu/drm/scheduler/sched_entity.c | 53 +---
 drivers/gpu/drm/scheduler/sched_main.c   | 32 +++---
 include/drm/gpu_scheduler.h  | 28 +++--
 3 files changed, 68 insertions(+), 45 deletions(-)

-- 
2.46.0

Re: [PATCH v2] drm/sched: Further optimise drm_sched_entity_push_job

2024-09-24 Thread Tvrtko Ursulin




On 24/09/2024 10:45, Tvrtko Ursulin wrote:


On 24/09/2024 09:20, Christian König wrote:

On 16.09.24 at 19:30, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

Having removed one re-lock cycle on the entity->lock in a patch titled
"drm/sched: Optimise drm_sched_entity_push_job", with only a tiny bit
larger refactoring we can do the same optimisation on the rq->lock.
(Currently both drm_sched_rq_add_entity() and
drm_sched_rq_update_fifo_locked() take and release the same lock.)

To achieve this we make drm_sched_rq_update_fifo_locked() and
drm_sched_rq_add_entity() expect the rq->lock to be held.

We also align drm_sched_rq_update_fifo_locked(),
drm_sched_rq_add_entity() and
drm_sched_rq_remove_fifo_locked() function signatures, by adding rq as a
parameter to the latter.

v2:
  * Fix after rebase of the series.
  * Avoid naming inconsistency between drm_sched_rq_add/remove. 
(Christian)


Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Cc: Alex Deucher 
Cc: Luben Tuikov 
Cc: Matthew Brost 
Cc: Philipp Stanner 


Reviewed-by: Christian König 


Thanks!

Are you okay to pull into drm-misc-next or should we do some more 
testing on this?


And/or should I resend the series once more in its entirety so this v2 
is not a reply to the original?


I have to respin for the drm_sched_wakeup fix that landed.

Regards,

Tvrtko



Regards,

Tvrtko




---
  drivers/gpu/drm/scheduler/sched_entity.c | 12 --
  drivers/gpu/drm/scheduler/sched_main.c   | 29 
  include/drm/gpu_scheduler.h  |  3 ++-
  3 files changed, 26 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c

index d982cebc6bee..8ace1f1ea66b 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -515,9 +515,14 @@ struct drm_sched_job 
*drm_sched_entity_pop_job(struct drm_sched_entity *entity)

  next = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
  if (next) {
+    struct drm_sched_rq *rq;
+
  spin_lock(&entity->lock);
-    drm_sched_rq_update_fifo_locked(entity,
+    rq = entity->rq;
+    spin_lock(&rq->lock);
+    drm_sched_rq_update_fifo_locked(entity, rq,
  next->submit_ts);
+    spin_unlock(&rq->lock);
  spin_unlock(&entity->lock);
  }
  }
@@ -618,11 +623,14 @@ void drm_sched_entity_push_job(struct 
drm_sched_job *sched_job)

  sched = rq->sched;
  atomic_inc(sched->score);
+
+    spin_lock(&rq->lock);
  drm_sched_rq_add_entity(rq, entity);
  if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
-    drm_sched_rq_update_fifo_locked(entity, submit_ts);
+    drm_sched_rq_update_fifo_locked(entity, rq, submit_ts);
+    spin_unlock(&rq->lock);
  spin_unlock(&entity->lock);
  drm_sched_wakeup(sched, entity);
diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c

index 18a952f73ecb..5c83fb92bb89 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -153,17 +153,18 @@ static __always_inline bool 
drm_sched_entity_compare_before(struct rb_node *a,
  return ktime_before(ent_a->oldest_job_waiting, 
ent_b->oldest_job_waiting);

  }
-static inline void drm_sched_rq_remove_fifo_locked(struct 
drm_sched_entity *entity)
+static void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity 
*entity,

+    struct drm_sched_rq *rq)
  {
-    struct drm_sched_rq *rq = entity->rq;
-
  if (!RB_EMPTY_NODE(&entity->rb_tree_node)) {
  rb_erase_cached(&entity->rb_tree_node, &rq->rb_tree_root);
  RB_CLEAR_NODE(&entity->rb_tree_node);
  }
  }
-void drm_sched_rq_update_fifo_locked(struct drm_sched_entity 
*entity, ktime_t ts)

+void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity,
+ struct drm_sched_rq *rq,
+ ktime_t ts)
  {
  /*
   * Both locks need to be grabbed, one to protect from 
entity->rq change
@@ -171,17 +172,14 @@ void drm_sched_rq_update_fifo_locked(struct 
drm_sched_entity *entity, ktime_t ts

   * other to update the rb tree structure.
   */
  lockdep_assert_held(&entity->lock);
+    lockdep_assert_held(&rq->lock);
-    spin_lock(&entity->rq->lock);
-
-    drm_sched_rq_remove_fifo_locked(entity);
+    drm_sched_rq_remove_fifo_locked(entity, rq);
  entity->oldest_job_waiting = ts;
-    rb_add_cached(&entity->rb_tree_node, &entity->rq->rb_tree_root,
+    rb_add_cached(&entity->rb_tree_node, &rq->rb_tree_root,
    drm_sched_entity_compare_before);
-
-    spin_unlock(&entity->rq->lo

Re: [PATCH v2] drm/sched: Further optimise drm_sched_entity_push_job

2024-09-24 Thread Tvrtko Ursulin




On 24/09/2024 09:20, Christian König wrote:

On 16.09.24 at 19:30, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

Having removed one re-lock cycle on the entity->lock in a patch titled
"drm/sched: Optimise drm_sched_entity_push_job", with only a tiny bit
larger refactoring we can do the same optimisation on the rq->lock.
(Currently both drm_sched_rq_add_entity() and
drm_sched_rq_update_fifo_locked() take and release the same lock.)

To achieve this we make drm_sched_rq_update_fifo_locked() and
drm_sched_rq_add_entity() expect the rq->lock to be held.

We also align drm_sched_rq_update_fifo_locked(),
drm_sched_rq_add_entity() and
drm_sched_rq_remove_fifo_locked() function signatures, by adding rq as a
parameter to the latter.

v2:
  * Fix after rebase of the series.
  * Avoid naming inconsistency between drm_sched_rq_add/remove. 
(Christian)


Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Cc: Alex Deucher 
Cc: Luben Tuikov 
Cc: Matthew Brost 
Cc: Philipp Stanner 


Reviewed-by: Christian König 


Thanks!

Are you okay to pull into drm-misc-next or should we do some more 
testing on this?


And/or should I resend the series once more in its entirety so this v2 
is not a reply to the original?


Regards,

Tvrtko




---
  drivers/gpu/drm/scheduler/sched_entity.c | 12 --
  drivers/gpu/drm/scheduler/sched_main.c   | 29 
  include/drm/gpu_scheduler.h  |  3 ++-
  3 files changed, 26 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c

index d982cebc6bee..8ace1f1ea66b 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -515,9 +515,14 @@ struct drm_sched_job 
*drm_sched_entity_pop_job(struct drm_sched_entity *entity)

  next = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
  if (next) {
+    struct drm_sched_rq *rq;
+
  spin_lock(&entity->lock);
-    drm_sched_rq_update_fifo_locked(entity,
+    rq = entity->rq;
+    spin_lock(&rq->lock);
+    drm_sched_rq_update_fifo_locked(entity, rq,
  next->submit_ts);
+    spin_unlock(&rq->lock);
  spin_unlock(&entity->lock);
  }
  }
@@ -618,11 +623,14 @@ void drm_sched_entity_push_job(struct 
drm_sched_job *sched_job)

  sched = rq->sched;
  atomic_inc(sched->score);
+
+    spin_lock(&rq->lock);
  drm_sched_rq_add_entity(rq, entity);
  if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
-    drm_sched_rq_update_fifo_locked(entity, submit_ts);
+    drm_sched_rq_update_fifo_locked(entity, rq, submit_ts);
+    spin_unlock(&rq->lock);
  spin_unlock(&entity->lock);
  drm_sched_wakeup(sched, entity);
diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c

index 18a952f73ecb..5c83fb92bb89 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -153,17 +153,18 @@ static __always_inline bool 
drm_sched_entity_compare_before(struct rb_node *a,
  return ktime_before(ent_a->oldest_job_waiting, 
ent_b->oldest_job_waiting);

  }
-static inline void drm_sched_rq_remove_fifo_locked(struct 
drm_sched_entity *entity)
+static void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity 
*entity,

+    struct drm_sched_rq *rq)
  {
-    struct drm_sched_rq *rq = entity->rq;
-
  if (!RB_EMPTY_NODE(&entity->rb_tree_node)) {
  rb_erase_cached(&entity->rb_tree_node, &rq->rb_tree_root);
  RB_CLEAR_NODE(&entity->rb_tree_node);
  }
  }
-void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, 
ktime_t ts)

+void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity,
+ struct drm_sched_rq *rq,
+ ktime_t ts)
  {
  /*
   * Both locks need to be grabbed, one to protect from entity->rq 
change
@@ -171,17 +172,14 @@ void drm_sched_rq_update_fifo_locked(struct 
drm_sched_entity *entity, ktime_t ts

   * other to update the rb tree structure.
   */
  lockdep_assert_held(&entity->lock);
+    lockdep_assert_held(&rq->lock);
-    spin_lock(&entity->rq->lock);
-
-    drm_sched_rq_remove_fifo_locked(entity);
+    drm_sched_rq_remove_fifo_locked(entity, rq);
  entity->oldest_job_waiting = ts;
-    rb_add_cached(&entity->rb_tree_node, &entity->rq->rb_tree_root,
+    rb_add_cached(&entity->rb_tree_node, &rq->rb_tree_root,
    drm_sched_entity_compare_before);
-
-    spin_unlock(&entity->rq->lock);
  }
  /**
@@ -213,15 +211,14 @@ static void drm_sched_rq_init(struct 
drm_gpu_scheduler *sched,

  void drm_sched_rq_a

Re: [PATCH v3 3/6] drm/amdgpu: delay the use of amdgpu_vm_set_task_info

2024-09-24 Thread Tvrtko Ursulin




On 24/09/2024 09:23, Christian König wrote:

On 23.09.24 at 12:25, Tvrtko Ursulin wrote:


On 20/09/2024 10:06, Pierre-Eric Pelloux-Prayer wrote:

At this point the vm is locked so we can safely modify it without risk of
concurrent access.


Which particular lock does this refer to, and does it imply the 
previous placement was unsafe?


We use the root PD's dma_resv object as the VM lock to protect most fields 
inside the VM structure; only a few are protected by an additional 
spinlock.


And yes, previously it was possible that you got a mangled process/task 
name because no lock was protecting the task_info structure.


Got it, thanks Christian!

In this case I only suggest being more explicit in the commit message 
and clearly saying it is fixing an existing bug. As it stands, I wasn't 
sure if it was that, or whether the move was just enabling the changes 
which come later in the series.


Regards,

Tvrtko

Signed-off-by: Pierre-Eric Pelloux-Prayer 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c

index 1e475eb01417..891128ecee6d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -309,9 +309,6 @@ static int amdgpu_cs_pass1(struct 
amdgpu_cs_parser *p,

  p->gang_leader->uf_addr = uf_offset;
  kvfree(chunk_array);
  -    /* Use this opportunity to fill in task info for the vm */
-    amdgpu_vm_set_task_info(vm);
-
  return 0;
    free_all_kdata:
@@ -1180,6 +1177,9 @@ static int amdgpu_cs_vm_handling(struct 
amdgpu_cs_parser *p)

  job->vm_pd_addr = amdgpu_gmc_pd_addr(vm->root.bo);
  }
  +    /* Use this opportunity to fill in task info for the vm */
+    amdgpu_vm_set_task_info(vm);
+
  if (adev->debug_vm) {
  /* Invalidate all BOs to test for userspace bugs */
  amdgpu_bo_list_for_each_entry(e, p->bo_list) {

Re: [PATCH v3 1/6] drm: add DRM_SET_NAME ioctl

2024-09-24 Thread Tvrtko Ursulin




On 24/09/2024 09:22, Pierre-Eric Pelloux-Prayer wrote:



On 23/09/2024 at 12:06, Tvrtko Ursulin wrote:


On 20/09/2024 10:06, Pierre-Eric Pelloux-Prayer wrote:

Giving userspace the opportunity to associate a free-form
name with a drm_file struct is helpful for tracking and debugging.

This is similar to the existing DMA_BUF_SET_NAME ioctl.

Access to name is protected by a mutex, and the 'clients' debugfs
file has been updated to print it.

Userspace MR to use this ioctl:

https://gitlab.freedesktop.org/virgl/virglrenderer/-/merge_requests/1428


The string passed by userspace is filtered a bit, to avoid garbling the
output when it is printed (in dmesg, fdinfo, etc):
   * all chars failing isgraph() are replaced by '-'
   * if a 0-length string is passed the name is cleared
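
The two filtering rules above can be sketched in plain C roughly as follows. This is a userspace illustration only, not the code from the patch; the function name and buffer handling are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy src into dst, replacing every character that fails isgraph()
 * with '-'. An empty src leaves dst empty, i.e. the name is cleared.
 * Returns the length of the stored name.
 */
static size_t filter_client_name(char *dst, size_t dst_size, const char *src)
{
	size_t i;

	assert(dst_size > 0);

	for (i = 0; i + 1 < dst_size && src[i] != '\0'; i++)
		dst[i] = isgraph((unsigned char)src[i]) ? src[i] : '-';
	dst[i] = '\0';

	return i;
}
```

Spaces, newlines and other control characters all fail isgraph(), so a name such as "my app" followed by a newline would come out with '-' in both positions and could not break a single-line debugfs or fdinfo record.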

Signed-off-by: Pierre-Eric Pelloux-Prayer 


---
  drivers/gpu/drm/drm_debugfs.c | 12 ++---
  drivers/gpu/drm/drm_file.c    |  5 
  drivers/gpu/drm/drm_ioctl.c   | 48 +++
  include/drm/drm_file.h    |  9 +++
  include/uapi/drm/drm.h    | 17 +
  5 files changed, 87 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/drm_debugfs.c 
b/drivers/gpu/drm/drm_debugfs.c

index 6b239a24f1df..482e71160544 100644
--- a/drivers/gpu/drm/drm_debugfs.c
+++ b/drivers/gpu/drm/drm_debugfs.c
@@ -78,12 +78,13 @@ static int drm_clients_info(struct seq_file *m, 
void *data)

  kuid_t uid;
  seq_printf(m,
-   "%20s %5s %3s master a %5s %10s\n",
+   "%20s %5s %3s master a %5s %10s %20s\n",


Allow full DRM_NAME_MAX_LEN? Not sure, feels not very consequential 
either way.


I'll switch to:

     seq_printf(m,
    "%20s %5s %3s master a %5s %10s %*s\n",
    "command",
    "tgid",
    "dev",
    "uid",
    "magic",
    DRM_CLIENT_NAME_MAX_LEN,
    "name");



That works.


And:

     seq_printf(m, "%20s %5d %3d   %c    %c %5d %10u %*s\n",
    task ? task->comm : "",
    pid_vnr(pid),
    priv->minor->index,
    is_current_master ? 'y' : 'n',
    priv->authenticated ? 'y' : 'n',
    from_kuid_munged(seq_user_ns(m), uid),
    priv->magic,
    DRM_CLIENT_NAME_MAX_LEN,
    priv->client_name ? priv->client_name : 
"");


Also works for me although it will look a bit busy by default since 
every line will contain it.


I don't immediately see that "parseability" (what Dmitry raised) is a 
concern, because new code can detect if there is something there or not. 
For old code, or future changes, we do not care in debugfs.


Equally we don't care that much if it looks busy, hence why I said 
"" works for me.


I'd also be okay with repeating task->comm, but that perhaps complicates 
things too much when task is not available.





 "command",
 "tgid",
 "dev",
 "uid",
-   "magic");
+   "magic",
+   "name");
  /* dev->filelist is sorted youngest first, but we want to present
   * oldest first (i.e. kernel, servers, clients), so walk 
backwardss.
@@ -94,19 +95,22 @@ static int drm_clients_info(struct seq_file *m, 
void *data)

  struct task_struct *task;
  struct pid *pid;
+    mutex_lock(&priv->name_lock);
  rcu_read_lock(); /* Locks priv->pid and pid_task()->comm! */
  pid = rcu_dereference(priv->pid);
  task = pid_task(pid, PIDTYPE_TGID);
  uid = task ? __task_cred(task)->euid : GLOBAL_ROOT_UID;
-    seq_printf(m, "%20s %5d %3d   %c    %c %5d %10u\n",
+    seq_printf(m, "%20s %5d %3d   %c    %c %5d %10u %20s\n",
 task ? task->comm : "",
 pid_vnr(pid),
 priv->minor->index,
 is_current_master ? 'y' : 'n',
 priv->authenticated ? 'y' : 'n',
 from_kuid_munged(seq_user_ns(m), uid),
-   priv->magic);
+   priv->magic,
+   priv->name ?: "");
  rcu_read_unlock();
+    mutex_unlock(&priv->name_lock);
  }
  mutex_unlock(&dev->filelist_mutex);
  return 0;
diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
index 01fde94fe2a9..e9dd0e90a1f9 100644
--- a/d

Re: [PATCH v2] drm/sched: Further optimise drm_sched_entity_push_job

2024-09-23 Thread Tvrtko Ursulin




Ping Christian and Philipp - reasonably happy with v2? I think it's the 
only unreviewed patch from the series.


Regards,

Tvrtko

On 16/09/2024 18:30, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

Having removed one re-lock cycle on the entity->lock in a patch titled
"drm/sched: Optimise drm_sched_entity_push_job", with only a tiny bit
larger refactoring we can do the same optimisation on the rq->lock.
(Currently both drm_sched_rq_add_entity() and
drm_sched_rq_update_fifo_locked() take and release the same lock.)

To achieve this we make drm_sched_rq_update_fifo_locked() and
drm_sched_rq_add_entity() expect the rq->lock to be held.

We also align drm_sched_rq_update_fifo_locked(),
drm_sched_rq_add_entity() and
drm_sched_rq_remove_fifo_locked() function signatures, by adding rq as a
parameter to the latter.
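
A userspace sketch of the lock nesting described above, assuming pthread mutexes in place of the spinlocks; all names are hypothetical and the helpers are reduced to trivial bookkeeping:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct rq {
	pthread_mutex_t lock;
	int nr_entities;
	long head_ts;
	int lock_cycles;	/* counts rq->lock acquisitions */
};

struct entity {
	pthread_mutex_t lock;
	struct rq *rq;
	bool queued;
};

/* Both helpers expect e->lock and rq->lock to be held by the caller. */
static void rq_add_entity_locked(struct rq *rq, struct entity *e)
{
	if (!e->queued) {
		e->queued = true;
		rq->nr_entities++;
	}
}

static void rq_update_fifo_locked(struct entity *e, struct rq *rq, long ts)
{
	(void)e;
	rq->head_ts = ts;
}

static void entity_push(struct entity *e, long ts)
{
	struct rq *rq;

	pthread_mutex_lock(&e->lock);	/* outer lock: stabilises e->rq */
	rq = e->rq;
	assert(rq != NULL);

	pthread_mutex_lock(&rq->lock);	/* inner lock: taken once */
	rq->lock_cycles++;
	rq_add_entity_locked(rq, e);
	rq_update_fifo_locked(e, rq, ts);
	pthread_mutex_unlock(&rq->lock);

	pthread_mutex_unlock(&e->lock);
}
```

Passing rq explicitly to both helpers means neither has to dereference e->rq again, and the inner lock is taken once per push rather than once per helper.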

v2:
  * Fix after rebase of the series.
  * Avoid naming inconsistency between drm_sched_rq_add/remove. (Christian)

Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Cc: Alex Deucher 
Cc: Luben Tuikov 
Cc: Matthew Brost 
Cc: Philipp Stanner 
---
  drivers/gpu/drm/scheduler/sched_entity.c | 12 --
  drivers/gpu/drm/scheduler/sched_main.c   | 29 
  include/drm/gpu_scheduler.h  |  3 ++-
  3 files changed, 26 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c
index d982cebc6bee..8ace1f1ea66b 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -515,9 +515,14 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct 
drm_sched_entity *entity)
  
  		next = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));

if (next) {
+   struct drm_sched_rq *rq;
+
spin_lock(&entity->lock);
-   drm_sched_rq_update_fifo_locked(entity,
+   rq = entity->rq;
+   spin_lock(&rq->lock);
+   drm_sched_rq_update_fifo_locked(entity, rq,
next->submit_ts);
+   spin_unlock(&rq->lock);
spin_unlock(&entity->lock);
}
}
@@ -618,11 +623,14 @@ void drm_sched_entity_push_job(struct drm_sched_job 
*sched_job)
sched = rq->sched;
  
  		atomic_inc(sched->score);

+
+   spin_lock(&rq->lock);
drm_sched_rq_add_entity(rq, entity);
  
  		if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)

-   drm_sched_rq_update_fifo_locked(entity, submit_ts);
+   drm_sched_rq_update_fifo_locked(entity, rq, submit_ts);
  
+		spin_unlock(&rq->lock);

spin_unlock(&entity->lock);
  
  		drm_sched_wakeup(sched, entity);

diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index 18a952f73ecb..5c83fb92bb89 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -153,17 +153,18 @@ static __always_inline bool 
drm_sched_entity_compare_before(struct rb_node *a,
return ktime_before(ent_a->oldest_job_waiting, 
ent_b->oldest_job_waiting);
  }
  
-static inline void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *entity)

+static void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *entity,
+   struct drm_sched_rq *rq)
  {
-   struct drm_sched_rq *rq = entity->rq;
-
if (!RB_EMPTY_NODE(&entity->rb_tree_node)) {
rb_erase_cached(&entity->rb_tree_node, &rq->rb_tree_root);
RB_CLEAR_NODE(&entity->rb_tree_node);
}
  }
  
-void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, ktime_t ts)

+void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity,
+struct drm_sched_rq *rq,
+ktime_t ts)
  {
/*
 * Both locks need to be grabbed, one to protect from entity->rq change
@@ -171,17 +172,14 @@ void drm_sched_rq_update_fifo_locked(struct 
drm_sched_entity *entity, ktime_t ts
 * other to update the rb tree structure.
 */
lockdep_assert_held(&entity->lock);
+   lockdep_assert_held(&rq->lock);
  
-	spin_lock(&entity->rq->lock);

-
-   drm_sched_rq_remove_fifo_locked(entity);
+   drm_sched_rq_remove_fifo_locked(entity, rq);
  
  	entity->oldest_job_waiting = ts;
  
-	rb_add_cached(&entity->rb_tree_node, &entity->rq->rb_tree_root,

+   rb_add_cached(&entity->rb_tree_node, &rq->rb_tree_root,
  drm_sched_entity_compare_before);
-
-   spin_unlock(&entity->rq->lock);
  }
  
  /**

@@

Re: [PATCH v3 4/6] drm/amdgpu: alloc and init vm::task_info from first submit

2024-09-23 Thread Tvrtko Ursulin




On 20/09/2024 10:06, Pierre-Eric Pelloux-Prayer wrote:

This will allow the use of a flexible array to store the process name and
other information.

This also means that process name will be determined once and for all,
instead of at each submit.


But the pid and others can still change? By design?


Signed-off-by: Pierre-Eric Pelloux-Prayer 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 15 +--
  1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index e20d19ae01b2..690676cab022 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2331,7 +2331,7 @@ amdgpu_vm_get_task_info_vm(struct amdgpu_vm *vm)
  {
struct amdgpu_task_info *ti = NULL;
  
-	if (vm) {

+   if (vm && vm->task_info) {
ti = vm->task_info;
kref_get(&vm->task_info->refcount);
}
@@ -2372,8 +2372,12 @@ static int amdgpu_vm_create_task_info(struct amdgpu_vm 
*vm)
   */
  void amdgpu_vm_set_task_info(struct amdgpu_vm *vm)
  {
-   if (!vm->task_info)
-   return;
+   if (!vm->task_info) {
+   if (amdgpu_vm_create_task_info(vm))
+   return;
+
+   get_task_comm(vm->task_info->process_name, 
current->group_leader);
+   }
  
  	if (vm->task_info->pid == current->pid)


This ends up relying on vm->task_info->pid being zero due to kzalloc, right?


return;
@@ -2385,7 +2389,6 @@ void amdgpu_vm_set_task_info(struct amdgpu_vm *vm)
return;
  
  	vm->task_info->tgid = current->group_leader->pid;

-   get_task_comm(vm->task_info->process_name, current->group_leader);
  }


I wonder how many of the task_info fields you want to set once instead 
of per submission. Would a fully one-shot version like the one below be 
what you want?


diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c

index a060c28f0877..da492223a8b5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2349,16 +2349,6 @@ amdgpu_vm_get_task_info_pasid(struct 
amdgpu_device *adev, u32 pasid)

amdgpu_vm_get_vm_from_pasid(adev, pasid));
 }

-static int amdgpu_vm_create_task_info(struct amdgpu_vm *vm)
-{
-   vm->task_info = kzalloc(sizeof(struct amdgpu_task_info), GFP_KERNEL);
-   if (!vm->task_info)
-   return -ENOMEM;
-
-   kref_init(&vm->task_info->refcount);
-   return 0;
-}
-
 /**
  * amdgpu_vm_set_task_info - Sets VMs task info.
  *
@@ -2366,20 +2356,28 @@ static int amdgpu_vm_create_task_info(struct 
amdgpu_vm *vm)

  */
 void amdgpu_vm_set_task_info(struct amdgpu_vm *vm)
 {
-   if (!vm->task_info)
-   return;
+   struct amdgpu_task_info *task_info = vm->task_info;
+
+   if (!task_info) {
+   task_info = kzalloc(sizeof(struct amdgpu_task_info),
+   GFP_KERNEL);
+   if (!task_info)
+   return;

-   if (vm->task_info->pid == current->pid)
+   kref_init(&task_info->refcount);
+   } else {
return;
+   }

-   vm->task_info->pid = current->pid;
-   get_task_comm(vm->task_info->task_name, current);
+   task_info->pid = current->pid;
+   get_task_comm(task_info->task_name, current);

-   if (current->group_leader->mm != current->mm)
-   return;
+   if (current->group_leader->mm == current->mm) {
+   task_info->tgid = current->group_leader->pid;
+   get_task_comm(task_info->process_name, current->group_leader);
+   }

-   vm->task_info->tgid = current->group_leader->pid;
-   get_task_comm(vm->task_info->process_name, current->group_leader);
+   vm->task_info = task_info;
 }

 /**

End result is code like this:

void amdgpu_vm_set_task_info(struct amdgpu_vm *vm)
{
struct amdgpu_task_info *task_info = vm->task_info;

if (!task_info) {
task_info = kzalloc(sizeof(struct amdgpu_task_info),
GFP_KERNEL);
if (!task_info)
return;

kref_init(&task_info->refcount);
} else {
return;
}

task_info->pid = current->pid;
get_task_comm(task_info->task_name, current);

if (current->group_leader->mm == current->mm) {
task_info->tgid = current->group_leader->pid;
get_task_comm(task_info->process_name, current->group_leader);
}

vm->task_info = task_info;
}

?

  
  /**

@@ -2482,7 +2485,6 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct 
amdgpu_vm *vm,
if (r)
goto error_free_root;
  
-	r = amdgpu_vm_create_task_info(vm);

if (r)
DRM_DEBUG("Failed to create task info for VM\n");


Two more lines to delete here.

Re: [PATCH v3 3/6] drm/amdgpu: delay the use of amdgpu_vm_set_task_info

2024-09-23 Thread Tvrtko Ursulin




On 20/09/2024 10:06, Pierre-Eric Pelloux-Prayer wrote:

At this point the vm is locked so we safely modify it without risk of
concurrent access.


Which particular lock is this referring to, and does it imply the 
previous placement was unsafe?


Regards,

Tvrtko


Signed-off-by: Pierre-Eric Pelloux-Prayer 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 1e475eb01417..891128ecee6d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -309,9 +309,6 @@ static int amdgpu_cs_pass1(struct amdgpu_cs_parser *p,
p->gang_leader->uf_addr = uf_offset;
kvfree(chunk_array);
  
-	/* Use this opportunity to fill in task info for the vm */

-   amdgpu_vm_set_task_info(vm);
-
return 0;
  
  free_all_kdata:

@@ -1180,6 +1177,9 @@ static int amdgpu_cs_vm_handling(struct amdgpu_cs_parser 
*p)
job->vm_pd_addr = amdgpu_gmc_pd_addr(vm->root.bo);
}
  
+	/* Use this opportunity to fill in task info for the vm */

+   amdgpu_vm_set_task_info(vm);
+
if (adev->debug_vm) {
/* Invalidate all BOs to test for userspace bugs */
amdgpu_bo_list_for_each_entry(e, p->bo_list) {

Re: [PATCH v3 1/6] drm: add DRM_SET_NAME ioctl

2024-09-23 Thread Tvrtko Ursulin

) {
+   kfree(new_name);
+   return -EINVAL;
+   }
+
+   /*
+* Filter out control char / spaces / new lines etc in the name
+* since it's going to be used in dmesg or fdinfo's output.
+*/
+   for (i = 0; i < len; i++) {
+   if (!isgraph(new_name[i]))
+   new_name[i] = '-';
+   }
+
+   mutex_lock(&file_priv->name_lock);
+   kfree(file_priv->name);
+   if (len > 0) {
+   file_priv->name = new_name;
+   } else {
+   kfree(new_name);
+   file_priv->name = NULL;
+   }
+   mutex_unlock(&file_priv->name_lock);
+
+   return 0;
+}
+
  static int drm_ioctl_permit(u32 flags, struct drm_file *file_priv)
  {
/* ROOT_ONLY is only for CAP_SYS_ADMIN */
@@ -610,6 +656,8 @@ static const struct drm_ioctl_desc drm_ioctls[] = {
DRM_IOCTL_DEF(DRM_IOCTL_PRIME_HANDLE_TO_FD, 
drm_prime_handle_to_fd_ioctl, DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_PRIME_FD_TO_HANDLE, 
drm_prime_fd_to_handle_ioctl, DRM_RENDER_ALLOW),
  
+	DRM_IOCTL_DEF(DRM_IOCTL_SET_NAME, drm_set_name, DRM_RENDER_ALLOW),

+
DRM_IOCTL_DEF(DRM_IOCTL_MODE_GETPLANERESOURCES, drm_mode_getplane_res, 
0),
DRM_IOCTL_DEF(DRM_IOCTL_MODE_GETCRTC, drm_mode_getcrtc, 0),
DRM_IOCTL_DEF(DRM_IOCTL_MODE_SETCRTC, drm_mode_setcrtc, DRM_MASTER),
diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
index 8c0030c77308..df26eee8f79c 100644
--- a/include/drm/drm_file.h
+++ b/include/drm/drm_file.h
@@ -388,6 +388,15 @@ struct drm_file {
 * Per-file buffer caches used by the PRIME buffer sharing code.
 */
struct drm_prime_file_private prime;
+
+   /**
+* @name:
+*
+* Userspace-provided name; useful for accounting and debugging.
+*/
+   const char *name;
+   /** @name_lock: Protects @name. */
+   struct mutex name_lock;
  };
  
  /**

diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h
index 16122819edfe..f5e92e4f909b 100644
--- a/include/uapi/drm/drm.h
+++ b/include/uapi/drm/drm.h
@@ -1024,6 +1024,13 @@ struct drm_crtc_queue_sequence {
__u64 user_data;/* user data passed to event */
  };
  
+#define DRM_NAME_MAX_LEN	64

+struct drm_set_name {
+   __u64 name_len;
+   __u64 name;
+};
+
+
  #if defined(__cplusplus)
  }
  #endif
@@ -1288,6 +1295,16 @@ extern "C" {
   */
  #define DRM_IOCTL_MODE_CLOSEFBDRM_IOWR(0xD0, struct 
drm_mode_closefb)
  
+/**

+ * DRM_IOCTL_SET_NAME - Attach a name to a drm_file
+ *
+ * This ioctl is similar to DMA_BUF_SET_NAME - it allows for easier tracking
+ * and debugging.
+ * The length of the name must <= DRM_NAME_MAX_LEN. All characters that are
+ * non-printable or whitespaces will be replaced by -.
+ */
+#define DRM_IOCTL_SET_NAME DRM_IOWR(0xD1, struct drm_set_name)
+


A comment, nice! :) Overall looks good to me.

Reviewed-by: Tvrtko Ursulin 

I do however wish for more opinions (before merging) on whether strings 
with invalid characters should perhaps instead be rejected. I don't 
currently have a solid argument either way.


Perhaps the only argument against silent transformation is if someone 
sets some wild string, then greps for it somewhere, which would be a 
false negative without an understanding of what kind of remapping the 
kernel does. It is weak, but it is uapi, so worth discussing every crazy 
possibility I think.


On the other hand it would create another annoying source of EINVAL. :shrug:
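
To make the grep concern concrete, here is a minimal userspace sketch of the filtering step from the quoted hunk. The names sanitize_name(), name_survives_filter() and NAME_MAX_LEN are hypothetical and only mirror the patch (NAME_MAX_LEN copies the value of DRM_NAME_MAX_LEN); this is not the kernel code itself.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define NAME_MAX_LEN 64	/* assumption: mirrors DRM_NAME_MAX_LEN in the quoted patch */

/*
 * Same rule as the quoted loop: every byte failing isgraph() becomes '-'.
 * The return value (number of bytes changed) is an addition for illustration.
 */
static size_t sanitize_name(char *name, size_t len)
{
	size_t i, changed = 0;

	assert(name != NULL || len == 0);

	for (i = 0; i < len; i++) {
		if (!isgraph((unsigned char)name[i])) {
			name[i] = '-';
			changed++;
		}
	}

	return changed;
}

/*
 * Hypothetical helper: returns 1 if searching for the string userspace
 * passed in would still match the stored, filtered name, 0 otherwise.
 */
static int name_survives_filter(const char *requested)
{
	char stored[NAME_MAX_LEN + 1];
	size_t len = strlen(requested);

	if (len > NAME_MAX_LEN)
		return 0;

	memcpy(stored, requested, len + 1);
	sanitize_name(stored, len);

	return strcmp(stored, requested) == 0;
}
```

With this, "glxgears" is stored unchanged, while "my app" would be stored as "my-app", so a later search for the original string finds nothing. Rejecting with EINVAL instead would surface that mismatch at ioctl time.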

Also, how are we with testing the DRM core features? Add something for 
the uapi in IGT/tests/drm_client_name, or some such?


Regards,

Tvrtko


  /*
   * Device specific ioctls should only be in their respective headers
   * The device specific ioctl range is from 0x40 to 0x9f.

Re: [PATCH v6 01/12] spi: add driver for intel graphics on-die spi device

2024-09-23 Thread Tvrtko Ursulin




On 21/09/2024 14:00, Winkler, Tomas wrote:





On Thu, Sep 19, 2024 at 09:54:24AM +, Winkler, Tomas wrote:

On Mon, Sep 16, 2024 at 04:49:17PM +0300, Alexander Usyskin wrote:



@@ -0,0 +1,142 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright(c) 2019-2024, Intel Corporation. All rights reserved.
+ */



Please make the entire comment a C++ one so things look more

intentional.


This is how it is required by Linux spdx checker,


There is no incompatibility between SPDX and what I'm asking for...


+   size = sizeof(*spi) + sizeof(spi->regions[0]) * nregions;
+   spi = kzalloc(size, GFP_KERNEL);



Use at least array_size().



Regions is not fixed size array, it will not work.


Yes, that's the wrong helper - there is a relevant one though which I'm not
remembering right now.



I don't think there is one, you can allocate arrays but this is not the case 
here.


struct_size() probably.
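
In the kernel that would presumably read kzalloc(struct_size(spi, regions, nregions), GFP_KERNEL). As a rough userspace illustration of what the helper buys, the sketch below computes header plus flexible array size with overflow checking. The struct layouts and the spi_dev_size()/spi_dev_alloc() names are hypothetical stand-ins, and the __builtin_*_overflow() calls are GCC/Clang builtins rather than anything from the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver's types. */
struct region {
	uint32_t id;
	uint64_t size;
};

struct spi_dev {
	size_t nregions;
	struct region regions[];	/* flexible array member */
};

/*
 * Approximation of what struct_size() computes:
 * sizeof(header) + n * sizeof(element), saturating to SIZE_MAX on overflow
 * so the allocation that follows fails instead of being undersized.
 */
static size_t spi_dev_size(size_t n)
{
	size_t bytes;

	if (__builtin_mul_overflow(n, sizeof(struct region), &bytes) ||
	    __builtin_add_overflow(bytes, sizeof(struct spi_dev), &bytes))
		return SIZE_MAX;

	return bytes;
}

static struct spi_dev *spi_dev_alloc(size_t n)
{
	size_t bytes = spi_dev_size(n);
	struct spi_dev *spi;

	if (bytes == SIZE_MAX)
		return NULL;

	/* A non-saturated size always covers at least the header. */
	assert(bytes >= sizeof(*spi));

	spi = calloc(1, bytes);
	if (spi)
		spi->nregions = n;

	return spi;
}
```

The open-coded sizeof(*spi) + sizeof(spi->regions[0]) * nregions from the patch gives the same number for sane inputs; the difference only shows when nregions is large enough to wrap.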

Regards,

Tvrtko

Re: [PATCH 8/8] drm/sched: Further optimise drm_sched_entity_push_job

2024-09-16 Thread Tvrtko Ursulin




On 16/09/2024 13:20, Tvrtko Ursulin wrote:


On 16/09/2024 13:11, Christian König wrote:

Am 13.09.24 um 18:05 schrieb Tvrtko Ursulin:

From: Tvrtko Ursulin 

Having removed one re-lock cycle on the entity->lock in a patch titled
"drm/sched: Optimise drm_sched_entity_push_job", with only a tiny bit
larger refactoring we can do the same optimisation on the rq->lock.
(Currently both drm_sched_rq_add_entity() and
drm_sched_rq_update_fifo_locked() take and release the same lock.)

To achieve this we rename drm_sched_rq_add_entity() to
drm_sched_rq_add_entity_locked(), making it expect the rq->lock to be
held, and also add the same expectation to
drm_sched_rq_update_fifo_locked().

Finally, to align drm_sched_rq_update_fifo_locked(),
drm_sched_rq_add_entity_locked() and
drm_sched_rq_remove_fifo_locked() function signatures, we add rq as a
parameter to the latter.

Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Cc: Alex Deucher 
Cc: Luben Tuikov 
Cc: Matthew Brost 
Cc: Philipp Stanner 
---
  drivers/gpu/drm/scheduler/sched_entity.c |  8 --
  drivers/gpu/drm/scheduler/sched_main.c   | 34 +++-
  include/drm/gpu_scheduler.h  |  7 ++---
  3 files changed, 26 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c

index d982cebc6bee..c48f17faef41 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -517,6 +517,7 @@ struct drm_sched_job 
*drm_sched_entity_pop_job(struct drm_sched_entity *entity)

  if (next) {
  spin_lock(&entity->lock);
  drm_sched_rq_update_fifo_locked(entity,
+    entity->rq,
  next->submit_ts);
  spin_unlock(&entity->lock);
  }
@@ -618,11 +619,14 @@ void drm_sched_entity_push_job(struct 
drm_sched_job *sched_job)

  sched = rq->sched;
  atomic_inc(sched->score);
-    drm_sched_rq_add_entity(rq, entity);
+
+    spin_lock(&rq->lock);
+    drm_sched_rq_add_entity_locked(rq, entity);
  if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
-    drm_sched_rq_update_fifo_locked(entity, submit_ts);
+    drm_sched_rq_update_fifo_locked(entity, rq, submit_ts);
+    spin_unlock(&rq->lock);
  spin_unlock(&entity->lock);
  drm_sched_wakeup(sched, entity);
diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c

index 18a952f73ecb..c0d3f6ac3ae3 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -153,17 +153,18 @@ static __always_inline bool 
drm_sched_entity_compare_before(struct rb_node *a,
  return ktime_before(ent_a->oldest_job_waiting, 
ent_b->oldest_job_waiting);

  }
-static inline void drm_sched_rq_remove_fifo_locked(struct 
drm_sched_entity *entity)
+static void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity 
*entity,

+    struct drm_sched_rq *rq)
  {
-    struct drm_sched_rq *rq = entity->rq;
-
  if (!RB_EMPTY_NODE(&entity->rb_tree_node)) {
  rb_erase_cached(&entity->rb_tree_node, &rq->rb_tree_root);
  RB_CLEAR_NODE(&entity->rb_tree_node);
  }
  }
-void drm_sched_rq_update_fifo_locked(struct drm_sched_entity 
*entity, ktime_t ts)

+void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity,
+ struct drm_sched_rq *rq,
+ ktime_t ts)
  {
  /*
   * Both locks need to be grabbed, one to protect from 
entity->rq change
@@ -171,17 +172,14 @@ void drm_sched_rq_update_fifo_locked(struct 
drm_sched_entity *entity, ktime_t ts

   * other to update the rb tree structure.
   */
  lockdep_assert_held(&entity->lock);
+    lockdep_assert_held(&rq->lock);
-    spin_lock(&entity->rq->lock);
-
-    drm_sched_rq_remove_fifo_locked(entity);
+    drm_sched_rq_remove_fifo_locked(entity, rq);
  entity->oldest_job_waiting = ts;
-    rb_add_cached(&entity->rb_tree_node, &entity->rq->rb_tree_root,
+    rb_add_cached(&entity->rb_tree_node, &rq->rb_tree_root,
    drm_sched_entity_compare_before);
-
-    spin_unlock(&entity->rq->lock);
  }
  /**
@@ -203,25 +201,23 @@ static void drm_sched_rq_init(struct 
drm_gpu_scheduler *sched,

  }
  /**
- * drm_sched_rq_add_entity - add an entity
+ * drm_sched_rq_add_entity_locked - add an entity
   *
   * @rq: scheduler run queue
   * @entity: scheduler entity
   *
   * Adds a scheduler entity to the run queue.
   */
-void drm_sched_rq_add_entity(struct drm_sched_rq *rq,
- struct drm_sched_entity *entity)
+void drm_sched_rq_add_entity_locked(struct drm_sched_rq *rq,
+    struct drm_sched_entity *entity)
  {
+    lockdep_assert_held

[PATCH v2] drm/sched: Further optimise drm_sched_entity_push_job

2024-09-16 Thread Tvrtko Ursulin

From: Tvrtko Ursulin 

Having removed one re-lock cycle on the entity->lock in a patch titled
"drm/sched: Optimise drm_sched_entity_push_job", with only a tiny bit
larger refactoring we can do the same optimisation on the rq->lock.
(Currently both drm_sched_rq_add_entity() and
drm_sched_rq_update_fifo_locked() take and release the same lock.)

To achieve this we make drm_sched_rq_update_fifo_locked() and
drm_sched_rq_add_entity() expect the rq->lock to be held.

We also align drm_sched_rq_update_fifo_locked(),
drm_sched_rq_add_entity() and
drm_sched_rq_remove_fifo_locked() function signatures, by adding rq as a
parameter to the latter.
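
For readers unfamiliar with the pattern, the locking change described above can be sketched in userspace roughly as follows. This is an illustration only: a pthread mutex stands in for the rq spinlock, the `held` flag is a crude substitute for lockdep_assert_held() (it does not track which thread owns the lock), and all names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct run_queue {
	pthread_mutex_t lock;
	bool held;		/* debug flag standing in for lockdep */
	int nr_entities;
	long oldest_ts;
};

static void rq_lock(struct run_queue *rq)
{
	pthread_mutex_lock(&rq->lock);
	rq->held = true;
}

static void rq_unlock(struct run_queue *rq)
{
	rq->held = false;
	pthread_mutex_unlock(&rq->lock);
}

/* Both helpers now expect the caller to hold rq->lock. */
static void rq_add_entity(struct run_queue *rq)
{
	assert(rq->held);
	rq->nr_entities++;
}

static void rq_update_fifo_locked(struct run_queue *rq, long ts)
{
	assert(rq->held);
	rq->oldest_ts = ts;
}

/* One lock/unlock cycle covers both updates instead of two. */
static void push_job(struct run_queue *rq, long ts, bool fifo)
{
	rq_lock(rq);
	rq_add_entity(rq);
	if (fifo)
		rq_update_fifo_locked(rq, ts);
	rq_unlock(rq);
}
```

Before the change each helper would take and release the lock itself, so the push path paid for two acquisitions of the same lock back to back.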

v2:
 * Fix after rebase of the series.
 * Avoid naming inconsistency between drm_sched_rq_add/remove. (Christian)

Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Cc: Alex Deucher 
Cc: Luben Tuikov 
Cc: Matthew Brost 
Cc: Philipp Stanner 
---
 drivers/gpu/drm/scheduler/sched_entity.c | 12 --
 drivers/gpu/drm/scheduler/sched_main.c   | 29 
 include/drm/gpu_scheduler.h  |  3 ++-
 3 files changed, 26 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c
index d982cebc6bee..8ace1f1ea66b 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -515,9 +515,14 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct 
drm_sched_entity *entity)
 
next = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
if (next) {
+   struct drm_sched_rq *rq;
+
spin_lock(&entity->lock);
-   drm_sched_rq_update_fifo_locked(entity,
+   rq = entity->rq;
+   spin_lock(&rq->lock);
+   drm_sched_rq_update_fifo_locked(entity, rq,
next->submit_ts);
+   spin_unlock(&rq->lock);
spin_unlock(&entity->lock);
}
}
@@ -618,11 +623,14 @@ void drm_sched_entity_push_job(struct drm_sched_job 
*sched_job)
sched = rq->sched;
 
atomic_inc(sched->score);
+
+   spin_lock(&rq->lock);
drm_sched_rq_add_entity(rq, entity);
 
if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
-   drm_sched_rq_update_fifo_locked(entity, submit_ts);
+   drm_sched_rq_update_fifo_locked(entity, rq, submit_ts);
 
+   spin_unlock(&rq->lock);
spin_unlock(&entity->lock);
 
drm_sched_wakeup(sched, entity);
diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index 18a952f73ecb..5c83fb92bb89 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -153,17 +153,18 @@ static __always_inline bool 
drm_sched_entity_compare_before(struct rb_node *a,
return ktime_before(ent_a->oldest_job_waiting, 
ent_b->oldest_job_waiting);
 }
 
-static inline void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity 
*entity)
+static void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *entity,
+   struct drm_sched_rq *rq)
 {
-   struct drm_sched_rq *rq = entity->rq;
-
if (!RB_EMPTY_NODE(&entity->rb_tree_node)) {
rb_erase_cached(&entity->rb_tree_node, &rq->rb_tree_root);
RB_CLEAR_NODE(&entity->rb_tree_node);
}
 }
 
-void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, ktime_t 
ts)
+void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity,
+struct drm_sched_rq *rq,
+ktime_t ts)
 {
/*
 * Both locks need to be grabbed, one to protect from entity->rq change
@@ -171,17 +172,14 @@ void drm_sched_rq_update_fifo_locked(struct 
drm_sched_entity *entity, ktime_t ts
 * other to update the rb tree structure.
 */
lockdep_assert_held(&entity->lock);
+   lockdep_assert_held(&rq->lock);
 
-   spin_lock(&entity->rq->lock);
-
-   drm_sched_rq_remove_fifo_locked(entity);
+   drm_sched_rq_remove_fifo_locked(entity, rq);
 
entity->oldest_job_waiting = ts;
 
-   rb_add_cached(&entity->rb_tree_node, &entity->rq->rb_tree_root,
+   rb_add_cached(&entity->rb_tree_node, &rq->rb_tree_root,
  drm_sched_entity_compare_before);
-
-   spin_unlock(&entity->rq->lock);
 }
 
 /**
@@ -213,15 +211,14 @@ static void drm_sched_rq_init(struct drm_gpu_scheduler 
*sched,
 void drm_sched_rq_add_entity(struct drm_sched_r

Re: [PATCH v2 3/3] drm/amdgpu: use drm_file name

2024-09-16 Thread Tvrtko Ursulin




On 16/09/2024 14:32, Pierre-Eric Pelloux-Prayer wrote:

In debugfs gem_info/vm_info files, timeout handler and page fault reports.

This information is useful with the virtio/native-context driver: it
allows the guest application's identifier to be visible in amdgpu's output.

The output in amdgpu_vm_info/amdgpu_gem_info looks like this:
pid:12255   Process:glxgears/test-set-fd-name --

Signed-off-by: Pierre-Eric Pelloux-Prayer 
---
  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c|  2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c   | 16 +---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c| 25 +--
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h|  4 +--
  5 files changed, 40 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 6d5fd371d5ce..1712feb2c238 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1577,7 +1577,7 @@ int amdgpu_amdkfd_gpuvm_acquire_process_vm(struct 
amdgpu_device *adev,
if (ret)
return ret;
  
-	amdgpu_vm_set_task_info(avm);

+   amdgpu_vm_set_task_info(avm, NULL);
  
  	return 0;

  }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 1e475eb01417..d32dc547cc80 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -310,7 +310,7 @@ static int amdgpu_cs_pass1(struct amdgpu_cs_parser *p,
kvfree(chunk_array);
  
  	/* Use this opportunity to fill in task info for the vm */

-   amdgpu_vm_set_task_info(vm);
+   amdgpu_vm_set_task_info(vm, p->filp);
  
  	return 0;
  
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c

index 0e617dff8765..0c52168edbaf 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -997,6 +997,10 @@ static int amdgpu_debugfs_gem_info_show(struct seq_file 
*m, void *unused)
if (r)
return r;
  
+	r = mutex_lock_interruptible(&file->name_lock);

+   if (r)
+   goto out;


Shouldn't this be in the below loop?
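
As far as the quoted hunk shows, `file` is only assigned by list_for_each_entry(), so taking file->name_lock before the loop locks nothing meaningful, and the unlock after the loop has the same problem. A userspace sketch of what I mean by per-entry locking is below; pthread mutexes stand in for the kernel mutex, and struct client and format_clients() are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdio.h>

struct client {
	struct client *next;
	pthread_mutex_t name_lock;	/* protects name */
	const char *name;		/* optional, may be NULL */
	const char *comm;
	int pid;
};

/*
 * Formats one line per client into buf and returns the number of bytes
 * written for complete lines. Each entry's name_lock is taken only while
 * that entry's name is being read.
 */
static size_t format_clients(struct client *head, char *buf, size_t size)
{
	struct client *c;
	size_t used = 0;

	assert(buf != NULL);
	if (size == 0)
		return 0;
	buf[0] = '\0';

	for (c = head; c; c = c->next) {
		int n;

		pthread_mutex_lock(&c->name_lock);
		if (c->name)
			n = snprintf(buf + used, size - used,
				     "pid %8d command %s/%s:\n",
				     c->pid, c->comm, c->name);
		else
			n = snprintf(buf + used, size - used,
				     "pid %8d command %s:\n",
				     c->pid, c->comm);
		pthread_mutex_unlock(&c->name_lock);

		if (n < 0 || (size_t)n >= size - used) {
			buf[used] = '\0';	/* drop the truncated line */
			break;
		}
		used += (size_t)n;
	}

	return used;
}
```

The same shape should apply to the seq_file code: lock inside the loop body around the file->name access and unlock before moving to the next entry.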


+
list_for_each_entry(file, &dev->filelist, lhead) {
struct task_struct *task;
struct drm_gem_object *gobj;
@@ -1012,8 +1016,13 @@ static int amdgpu_debugfs_gem_info_show(struct seq_file 
*m, void *unused)
rcu_read_lock();
pid = rcu_dereference(file->pid);
task = pid_task(pid, PIDTYPE_TGID);
-   seq_printf(m, "pid %8d command %s:\n", pid_nr(pid),
-  task ? task->comm : "");
+   seq_printf(m, "pid %8d command %s", pid_nr(pid),
+  task ? task->comm : "");
+   if (file->name) {
+   seq_putc(m, '/');
+   seq_puts(m, file->name);
+   }
+   seq_puts(m, ":\n");
rcu_read_unlock();
  
  		spin_lock(&file->table_lock);

@@ -1024,7 +1033,8 @@ static int amdgpu_debugfs_gem_info_show(struct seq_file 
*m, void *unused)
}
spin_unlock(&file->table_lock);
}
-
+   mutex_unlock(&file->name_lock);
+out:
mutex_unlock(&dev->filelist_mutex);
return 0;
  }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index e20d19ae01b2..5701d74159d4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2370,7 +2370,7 @@ static int amdgpu_vm_create_task_info(struct amdgpu_vm 
*vm)
   *
   * @vm: vm for which to set the info
   */
-void amdgpu_vm_set_task_info(struct amdgpu_vm *vm)
+void amdgpu_vm_set_task_info(struct amdgpu_vm *vm, struct drm_file *file)
  {
if (!vm->task_info)
return;
@@ -2385,7 +2385,28 @@ void amdgpu_vm_set_task_info(struct amdgpu_vm *vm)
return;
  
  	vm->task_info->tgid = current->group_leader->pid;

-   get_task_comm(vm->task_info->process_name, current->group_leader);
+   __get_task_comm(vm->task_info->process_name, TASK_COMM_LEN,
+   current->group_leader);
+   /* Append drm_client_name if set. */
+   if (file && file->name) {
+   mutex_lock(&file->name_lock);
+
+   /* Assert that process_name is big enough to store process_name,
+* so: (TASK_COMM_LEN - 1) + '/' + '\0'.
+* This way we can concat file->name without worrying about 
space.
+*/
+   static_assert(sizeof(vm->task_info->process_name) >= 
TASK_COMM_LEN + 1);
+   if (file->name) {
+   int n;
+
+   n = strlen(vm->task_info->process_name);
+   vm->task_info->proces

Re: [PATCH v2 2/3] drm: use drm_file name in fdinfo

2024-09-16 Thread Tvrtko Ursulin




On 16/09/2024 14:32, Pierre-Eric Pelloux-Prayer wrote:

Add an optional drm-client-name field to drm fdinfo's output.

Signed-off-by: Pierre-Eric Pelloux-Prayer 
---
  Documentation/gpu/drm-usage-stats.rst | 5 +
  drivers/gpu/drm/drm_file.c| 5 +
  2 files changed, 10 insertions(+)

diff --git a/Documentation/gpu/drm-usage-stats.rst 
b/Documentation/gpu/drm-usage-stats.rst
index a80f95ca1b2f..ed1d7edbbc5f 100644
--- a/Documentation/gpu/drm-usage-stats.rst
+++ b/Documentation/gpu/drm-usage-stats.rst
@@ -73,6 +73,11 @@ scope of each device, in which case `drm-pdev` shall be 
present as well.
  Userspace should make sure to not double account any usage statistics by using
  the above described criteria in order to associate data to individual clients.
  
+- drm-client-name: 

+
+String optionally set by userspace using DRM_IOCTL_SET_NAME.
+
+
  Utilization
  ^^^
  
diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c

index e9dd0e90a1f9..6a3621f50784 100644
--- a/drivers/gpu/drm/drm_file.c
+++ b/drivers/gpu/drm/drm_file.c
@@ -955,6 +955,11 @@ void drm_show_fdinfo(struct seq_file *m, struct file *f)
   PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn));
}
  
+	mutex_lock(&file->name_lock);

+   if (file->name)
+   drm_printf(&p, "drm-client-name:\t%s\n", file->name);
+   mutex_unlock(&file->name_lock);
+
if (dev->driver->show_fdinfo)
dev->driver->show_fdinfo(&p, file);
  }


Reviewed-by: Tvrtko Ursulin 

Regards,

Tvrtko

Re: [PATCH v2 1/3] drm: add DRM_SET_NAME ioctl

2024-09-16 Thread Tvrtko Ursulin




On 16/09/2024 14:32, Pierre-Eric Pelloux-Prayer wrote:

Giving the opportunity to userspace to associate a free-form
name with a drm_file struct is helpful for tracking and debugging.

This is similar to the existing DMA_BUF_SET_NAME ioctl.

Access to name is protected by a mutex, and the 'clients' debugfs
file has been updated to print it.

Userspace MR to use this ioctl:
https://gitlab.freedesktop.org/virgl/virglrenderer/-/merge_requests/1428

The string passed by userspace is filtered a bit, to avoid messing
output when it's going to be printed (in dmesg, fdinfo, etc):
   * all chars failing isgraph() are replaced by '-'
   * if a 0-length string is passed the name is cleared

Signed-off-by: Pierre-Eric Pelloux-Prayer 
---
  drivers/gpu/drm/drm_debugfs.c | 12 ++
  drivers/gpu/drm/drm_file.c|  5 +
  drivers/gpu/drm/drm_ioctl.c   | 42 +++
  include/drm/drm_file.h|  9 
  include/uapi/drm/drm.h| 14 
  5 files changed, 78 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/drm_debugfs.c b/drivers/gpu/drm/drm_debugfs.c
index 6b239a24f1df..b7492225ae88 100644
--- a/drivers/gpu/drm/drm_debugfs.c
+++ b/drivers/gpu/drm/drm_debugfs.c
@@ -78,12 +78,13 @@ static int drm_clients_info(struct seq_file *m, void *data)
kuid_t uid;
  
  	seq_printf(m,

-  "%20s %5s %3s master a %5s %10s\n",
+  "%20s %5s %3s master a %5s %10s %20s\n",
   "command",
   "tgid",
   "dev",
   "uid",
-  "magic");
+  "magic",
+  "name");
  
  	/* dev->filelist is sorted youngest first, but we want to present

 * oldest first (i.e. kernel, servers, clients), so walk backwardss.
@@ -94,19 +95,22 @@ static int drm_clients_info(struct seq_file *m, void *data)
struct task_struct *task;
struct pid *pid;
  
+		mutex_lock(&priv->name_lock);

rcu_read_lock(); /* Locks priv->pid and pid_task()->comm! */
pid = rcu_dereference(priv->pid);
task = pid_task(pid, PIDTYPE_TGID);
uid = task ? __task_cred(task)->euid : GLOBAL_ROOT_UID;
-   seq_printf(m, "%20s %5d %3d   %c%c %5d %10u\n",
+   seq_printf(m, "%20s %5d %3d   %c%c %5d %10u %20s\n",
   task ? task->comm : "",
   pid_vnr(pid),
   priv->minor->index,
   is_current_master ? 'y' : 'n',
   priv->authenticated ? 'y' : 'n',
   from_kuid_munged(seq_user_ns(m), uid),
-  priv->magic);
+  priv->magic,
+  priv->name ? priv->name : "");
rcu_read_unlock();
+   mutex_unlock(&priv->name_lock);
}
mutex_unlock(&dev->filelist_mutex);
return 0;
diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
index 01fde94fe2a9..e9dd0e90a1f9 100644
--- a/drivers/gpu/drm/drm_file.c
+++ b/drivers/gpu/drm/drm_file.c
@@ -158,6 +158,7 @@ struct drm_file *drm_file_alloc(struct drm_minor *minor)
  
  	spin_lock_init(&file->master_lookup_lock);

mutex_init(&file->event_read_lock);
+   mutex_init(&file->name_lock);
  
  	if (drm_core_check_feature(dev, DRIVER_GEM))

drm_gem_open(dev, file);
@@ -259,6 +260,10 @@ void drm_file_free(struct drm_file *file)
WARN_ON(!list_empty(&file->event_list));
  
  	put_pid(rcu_access_pointer(file->pid));

+
+   mutex_destroy(&file->name_lock);
+   kfree(file->name);
+
kfree(file);
  }
  
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c

index 51f39912866f..b7d7bede0ab3 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -540,6 +540,46 @@ int drm_version(struct drm_device *dev, void *data,
return err;
  }
  
+static int drm_set_name(struct drm_device *dev, void *data,

+   struct drm_file *file_priv)
+{
+   struct drm_set_name *name = data;
+   void *user_ptr;


__user as kernel test robot reminds us.


+   char *new_name;
+   size_t i, len;
+
+   if (name->name_len >= NAME_MAX)
+   return -EINVAL;


Maybe it is a bit unsubstantiated, but I am leaning towards defining our 
own smaller limit, like dma-buf does. If 32 is deemed too restrictive, 
make it larger, but 255 feels unnecessary. I don't feel too strongly 
about this though, so if people insist we need names this long then so 
be it.



+
+   user_ptr = u64_to_user_ptr(name->name);
+
+   new_name = memdup_user_nul(user_ptr, name->name_len);
+


Nit: I'd zap this blank line since it is breaking a logical group.


+   if (IS_ERR(new_name))
+   return PTR_ERR(new_name);
+
+   /* Filter out control char / spac

Re: [PATCH 8/8] drm/sched: Further optimise drm_sched_entity_push_job

2024-09-16 Thread Tvrtko Ursulin




On 16/09/2024 13:11, Christian König wrote:

Am 13.09.24 um 18:05 schrieb Tvrtko Ursulin:

From: Tvrtko Ursulin 

Having removed one re-lock cycle on the entity->lock in a patch titled
"drm/sched: Optimise drm_sched_entity_push_job", with only a tiny bit
larger refactoring we can do the same optimisation on the rq->lock.
(Currently both drm_sched_rq_add_entity() and
drm_sched_rq_update_fifo_locked() take and release the same lock.)

To achieve this we rename drm_sched_rq_add_entity() to
drm_sched_rq_add_entity_locked(), making it expect the rq->lock to be
held, and also add the same expectation to
drm_sched_rq_update_fifo_locked().

Finally, to align drm_sched_rq_update_fifo_locked(),
drm_sched_rq_add_entity_locked() and
drm_sched_rq_remove_fifo_locked() function signatures, we add rq as a
parameter to the latter.

Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Cc: Alex Deucher 
Cc: Luben Tuikov 
Cc: Matthew Brost 
Cc: Philipp Stanner 
---
  drivers/gpu/drm/scheduler/sched_entity.c |  8 --
  drivers/gpu/drm/scheduler/sched_main.c   | 34 +++-
  include/drm/gpu_scheduler.h  |  7 ++---
  3 files changed, 26 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c

index d982cebc6bee..c48f17faef41 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -517,6 +517,7 @@ struct drm_sched_job 
*drm_sched_entity_pop_job(struct drm_sched_entity *entity)

  if (next) {
  spin_lock(&entity->lock);
  drm_sched_rq_update_fifo_locked(entity,
+    entity->rq,
  next->submit_ts);
  spin_unlock(&entity->lock);
  }
@@ -618,11 +619,14 @@ void drm_sched_entity_push_job(struct 
drm_sched_job *sched_job)

  sched = rq->sched;
  atomic_inc(sched->score);
-    drm_sched_rq_add_entity(rq, entity);
+
+    spin_lock(&rq->lock);
+    drm_sched_rq_add_entity_locked(rq, entity);
  if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
-    drm_sched_rq_update_fifo_locked(entity, submit_ts);
+    drm_sched_rq_update_fifo_locked(entity, rq, submit_ts);
+    spin_unlock(&rq->lock);
  spin_unlock(&entity->lock);
  drm_sched_wakeup(sched, entity);
diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c

index 18a952f73ecb..c0d3f6ac3ae3 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -153,17 +153,18 @@ static __always_inline bool 
drm_sched_entity_compare_before(struct rb_node *a,
  return ktime_before(ent_a->oldest_job_waiting, 
ent_b->oldest_job_waiting);

  }
-static inline void drm_sched_rq_remove_fifo_locked(struct 
drm_sched_entity *entity)
+static void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity 
*entity,

+    struct drm_sched_rq *rq)
  {
-    struct drm_sched_rq *rq = entity->rq;
-
  if (!RB_EMPTY_NODE(&entity->rb_tree_node)) {
  rb_erase_cached(&entity->rb_tree_node, &rq->rb_tree_root);
  RB_CLEAR_NODE(&entity->rb_tree_node);
  }
  }
-void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, 
ktime_t ts)

+void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity,
+ struct drm_sched_rq *rq,
+ ktime_t ts)
  {
  /*
   * Both locks need to be grabbed, one to protect from entity->rq 
change
@@ -171,17 +172,14 @@ void drm_sched_rq_update_fifo_locked(struct 
drm_sched_entity *entity, ktime_t ts

   * other to update the rb tree structure.
   */
  lockdep_assert_held(&entity->lock);
+    lockdep_assert_held(&rq->lock);
-    spin_lock(&entity->rq->lock);
-
-    drm_sched_rq_remove_fifo_locked(entity);
+    drm_sched_rq_remove_fifo_locked(entity, rq);
  entity->oldest_job_waiting = ts;
-    rb_add_cached(&entity->rb_tree_node, &entity->rq->rb_tree_root,
+    rb_add_cached(&entity->rb_tree_node, &rq->rb_tree_root,
    drm_sched_entity_compare_before);
-
-    spin_unlock(&entity->rq->lock);
  }
  /**
@@ -203,25 +201,23 @@ static void drm_sched_rq_init(struct 
drm_gpu_scheduler *sched,

  }
  /**
- * drm_sched_rq_add_entity - add an entity
+ * drm_sched_rq_add_entity_locked - add an entity
   *
   * @rq: scheduler run queue
   * @entity: scheduler entity
   *
   * Adds a scheduler entity to the run queue.
   */
-void drm_sched_rq_add_entity(struct drm_sched_rq *rq,
- struct drm_sched_entity *entity)
+void drm_sched_rq_add_entity_locked(struct drm_sched_rq *rq,
+    struct drm_sched_entity *entity)
  {
+    lockdep_assert_held(&rq->lock);
+
  if (!list_emp

Re: [PATCH 6/8] drm/sched: Re-order struct drm_sched_rq members for clarity

2024-09-16 Thread Tvrtko Ursulin




On 16/09/2024 09:16, Philipp Stanner wrote:

On Fri, 2024-09-13 at 17:05 +0100, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

Current kerneldoc for struct drm_sched_rq incompletely documents what
fields are protected by the lock.

This is not good because it is misleading.

Let's fix it by listing all the elements which are protected by the
lock.

While at it, let's also re-order the members so all protected by the
lock are in a single group.

v2:
  * Refer variables by kerneldoc syntax, more verbose commit text.
(Philipp)

Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Cc: Alex Deucher 
Cc: Luben Tuikov 
Cc: Matthew Brost 
Cc: Philipp Stanner 
Reviewed-by: Christian König 


Looks good, thx

Reviewed-by: Philipp Stanner 


Thanks!

4/8 and 8/8 are now the only two left with no r-b.

---
  include/drm/gpu_scheduler.h | 10 ++
  1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/include/drm/gpu_scheduler.h
b/include/drm/gpu_scheduler.h
index 38465b78c7d5..2f58af00f792 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -243,10 +243,10 @@ struct drm_sched_entity {
  /**
   * struct drm_sched_rq - queue of entities to be scheduled.
   *
- * @lock: to modify the entities list.
   * @sched: the scheduler to which this rq belongs to.
- * @entities: list of the entities to be scheduled.
+ * @lock: protects @entities, @rb_tree_root and @current_entity.


nit: in case you'll provide a new version anyways you could consider
sorting these three to be congruent with the lines below.


To me it looks like the order of kerneldoc vs members is aligned. Unless I 
missed what you mean here?


Regards,

Tvrtko


   * @current_entity: the entity which is to be scheduled.
+ * @entities: list of the entities to be scheduled.
   * @rb_tree_root: root of time based priory queue of entities for
FIFO scheduling
   *
   * Run queue is a set of entities scheduling command submissions for
@@ -254,10 +254,12 @@ struct drm_sched_entity {
   * the next entity to emit commands from.
   */
  struct drm_sched_rq {
-   spinlock_t  lock;
    struct drm_gpu_scheduler*sched;
-   struct list_headentities;
+
+   spinlock_t  lock;
+   /* Following members are protected by the @lock: */
    struct drm_sched_entity *current_entity;
+   struct list_headentities;
    struct rb_root_cached   rb_tree_root;
  };

[PATCH 6/8] drm/sched: Re-order struct drm_sched_rq members for clarity

2024-09-13 Thread Tvrtko Ursulin

From: Tvrtko Ursulin 

Current kerneldoc for struct drm_sched_rq incompletely documents what
fields are protected by the lock.

This is not good because it is misleading.

Let's fix it by listing all the elements which are protected by the lock.

While at it, let's also re-order the members so all protected by the lock
are in a single group.

v2:
 * Refer variables by kerneldoc syntax, more verbose commit text. (Philipp)

Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Cc: Alex Deucher 
Cc: Luben Tuikov 
Cc: Matthew Brost 
Cc: Philipp Stanner 
Reviewed-by: Christian König 
---
 include/drm/gpu_scheduler.h | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 38465b78c7d5..2f58af00f792 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -243,10 +243,10 @@ struct drm_sched_entity {
 /**
  * struct drm_sched_rq - queue of entities to be scheduled.
  *
- * @lock: to modify the entities list.
  * @sched: the scheduler to which this rq belongs to.
- * @entities: list of the entities to be scheduled.
+ * @lock: protects @entities, @rb_tree_root and @current_entity.
  * @current_entity: the entity which is to be scheduled.
+ * @entities: list of the entities to be scheduled.
  * @rb_tree_root: root of time based priory queue of entities for FIFO 
scheduling
  *
  * Run queue is a set of entities scheduling command submissions for
@@ -254,10 +254,12 @@ struct drm_sched_entity {
  * the next entity to emit commands from.
  */
 struct drm_sched_rq {
-   spinlock_t  lock;
struct drm_gpu_scheduler*sched;
-   struct list_headentities;
+
+   spinlock_t  lock;
+   /* Following members are protected by the @lock: */
struct drm_sched_entity *current_entity;
+   struct list_headentities;
struct rb_root_cached   rb_tree_root;
 };
 
-- 
2.46.0

[PATCH 7/8] drm/sched: Re-group and rename the entity run-queue lock

2024-09-13 Thread Tvrtko Ursulin

From: Tvrtko Ursulin 

Christian suggested renaming the lock and improving the documentation of
what it protects, and also re-ordering the structure members so that all
those protected by the lock are together in a block.

Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Cc: Alex Deucher 
Cc: Luben Tuikov 
Cc: Matthew Brost 
Cc: Philipp Stanner 
Reviewed-by: Christian König 
---
 drivers/gpu/drm/scheduler/sched_entity.c | 28 
 drivers/gpu/drm/scheduler/sched_main.c   |  2 +-
 include/drm/gpu_scheduler.h  | 15 +++--
 3 files changed, 23 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c
index aff79055643f..d982cebc6bee 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -105,7 +105,7 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
/* We start in an idle state. */
complete_all(&entity->entity_idle);
 
-   spin_lock_init(&entity->rq_lock);
+   spin_lock_init(&entity->lock);
spsc_queue_init(&entity->job_queue);
 
atomic_set(&entity->fence_seq, 0);
@@ -133,10 +133,10 @@ void drm_sched_entity_modify_sched(struct 
drm_sched_entity *entity,
 {
WARN_ON(!num_sched_list || !sched_list);
 
-   spin_lock(&entity->rq_lock);
+   spin_lock(&entity->lock);
entity->sched_list = sched_list;
entity->num_sched_list = num_sched_list;
-   spin_unlock(&entity->rq_lock);
+   spin_unlock(&entity->lock);
 }
 EXPORT_SYMBOL(drm_sched_entity_modify_sched);
 
@@ -244,10 +244,10 @@ static void drm_sched_entity_kill(struct drm_sched_entity 
*entity)
if (!entity->rq)
return;
 
-   spin_lock(&entity->rq_lock);
+   spin_lock(&entity->lock);
entity->stopped = true;
drm_sched_rq_remove_entity(entity->rq, entity);
-   spin_unlock(&entity->rq_lock);
+   spin_unlock(&entity->lock);
 
/* Make sure this entity is not used by the scheduler at the moment */
wait_for_completion(&entity->entity_idle);
@@ -396,9 +396,9 @@ static void drm_sched_entity_wakeup(struct dma_fence *f,
 void drm_sched_entity_set_priority(struct drm_sched_entity *entity,
   enum drm_sched_priority priority)
 {
-   spin_lock(&entity->rq_lock);
+   spin_lock(&entity->lock);
entity->priority = priority;
-   spin_unlock(&entity->rq_lock);
+   spin_unlock(&entity->lock);
 }
 EXPORT_SYMBOL(drm_sched_entity_set_priority);
 
@@ -515,10 +515,10 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct 
drm_sched_entity *entity)
 
next = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
if (next) {
-   spin_lock(&entity->rq_lock);
+   spin_lock(&entity->lock);
drm_sched_rq_update_fifo_locked(entity,
next->submit_ts);
-   spin_unlock(&entity->rq_lock);
+   spin_unlock(&entity->lock);
}
}
 
@@ -559,14 +559,14 @@ void drm_sched_entity_select_rq(struct drm_sched_entity 
*entity)
if (fence && !dma_fence_is_signaled(fence))
return;
 
-   spin_lock(&entity->rq_lock);
+   spin_lock(&entity->lock);
sched = drm_sched_pick_best(entity->sched_list, entity->num_sched_list);
rq = sched ? sched->sched_rq[entity->priority] : NULL;
if (rq != entity->rq) {
drm_sched_rq_remove_entity(entity->rq, entity);
entity->rq = rq;
}
-   spin_unlock(&entity->rq_lock);
+   spin_unlock(&entity->lock);
 
if (entity->num_sched_list == 1)
entity->sched_list = NULL;
@@ -606,9 +606,9 @@ void drm_sched_entity_push_job(struct drm_sched_job 
*sched_job)
struct drm_sched_rq *rq;
 
/* Add the entity to the run queue */
-   spin_lock(&entity->rq_lock);
+   spin_lock(&entity->lock);
if (entity->stopped) {
-   spin_unlock(&entity->rq_lock);
+   spin_unlock(&entity->lock);
 
DRM_ERROR("Trying to push to a killed entity\n");
return;
@@ -623,7 +623,7 @@ void drm_sched_entity_push_job(struct drm_sched_job 
*sched_job)
if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
drm_sched_rq_update_fifo_locked(entity, submit_ts);
 
-   spin_unlock(&entity->rq_lock);
+   spin_unlock(&entity->lock);

[PATCH 8/8] drm/sched: Further optimise drm_sched_entity_push_job

2024-09-13 Thread Tvrtko Ursulin

From: Tvrtko Ursulin 

Having removed one re-lock cycle on the entity->lock in a patch titled
"drm/sched: Optimise drm_sched_entity_push_job", with only a tiny bit
larger refactoring we can do the same optimisation on the rq->lock.
(Currently both drm_sched_rq_add_entity() and
drm_sched_rq_update_fifo_locked() take and release the same lock.)

To achieve this we rename drm_sched_rq_add_entity() to
drm_sched_rq_add_entity_locked(), making it expect the rq->lock to be
held, and also add the same expectation to
drm_sched_rq_update_fifo_locked().

Finally, to align drm_sched_rq_update_fifo_locked(),
drm_sched_rq_add_entity_locked() and
drm_sched_rq_remove_fifo_locked() function signatures, we add rq as a
parameter to the latter.

Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Cc: Alex Deucher 
Cc: Luben Tuikov 
Cc: Matthew Brost 
Cc: Philipp Stanner 
---
 drivers/gpu/drm/scheduler/sched_entity.c |  8 --
 drivers/gpu/drm/scheduler/sched_main.c   | 34 +++-
 include/drm/gpu_scheduler.h  |  7 ++---
 3 files changed, 26 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c
index d982cebc6bee..c48f17faef41 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -517,6 +517,7 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct 
drm_sched_entity *entity)
if (next) {
spin_lock(&entity->lock);
drm_sched_rq_update_fifo_locked(entity,
+   entity->rq,
next->submit_ts);
spin_unlock(&entity->lock);
}
@@ -618,11 +619,14 @@ void drm_sched_entity_push_job(struct drm_sched_job 
*sched_job)
sched = rq->sched;
 
atomic_inc(sched->score);
-   drm_sched_rq_add_entity(rq, entity);
+
+   spin_lock(&rq->lock);
+   drm_sched_rq_add_entity_locked(rq, entity);
 
if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
-   drm_sched_rq_update_fifo_locked(entity, submit_ts);
+   drm_sched_rq_update_fifo_locked(entity, rq, submit_ts);
 
+   spin_unlock(&rq->lock);
spin_unlock(&entity->lock);
 
drm_sched_wakeup(sched, entity);
diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index 18a952f73ecb..c0d3f6ac3ae3 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -153,17 +153,18 @@ static __always_inline bool 
drm_sched_entity_compare_before(struct rb_node *a,
return ktime_before(ent_a->oldest_job_waiting, 
ent_b->oldest_job_waiting);
 }
 
-static inline void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity 
*entity)
+static void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *entity,
+   struct drm_sched_rq *rq)
 {
-   struct drm_sched_rq *rq = entity->rq;
-
if (!RB_EMPTY_NODE(&entity->rb_tree_node)) {
rb_erase_cached(&entity->rb_tree_node, &rq->rb_tree_root);
RB_CLEAR_NODE(&entity->rb_tree_node);
}
 }
 
-void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, ktime_t 
ts)
+void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity,
+struct drm_sched_rq *rq,
+ktime_t ts)
 {
/*
 * Both locks need to be grabbed, one to protect from entity->rq change
@@ -171,17 +172,14 @@ void drm_sched_rq_update_fifo_locked(struct 
drm_sched_entity *entity, ktime_t ts
 * other to update the rb tree structure.
 */
lockdep_assert_held(&entity->lock);
+   lockdep_assert_held(&rq->lock);
 
-   spin_lock(&entity->rq->lock);
-
-   drm_sched_rq_remove_fifo_locked(entity);
+   drm_sched_rq_remove_fifo_locked(entity, rq);
 
entity->oldest_job_waiting = ts;
 
-   rb_add_cached(&entity->rb_tree_node, &entity->rq->rb_tree_root,
+   rb_add_cached(&entity->rb_tree_node, &rq->rb_tree_root,
  drm_sched_entity_compare_before);
-
-   spin_unlock(&entity->rq->lock);
 }
 
 /**
@@ -203,25 +201,23 @@ static void drm_sched_rq_init(struct drm_gpu_scheduler 
*sched,
 }
 
 /**
- * drm_sched_rq_add_entity - add an entity
+ * drm_sched_rq_add_entity_locked - add an entity
  *
  * @rq: scheduler run queue
  * @entity: scheduler entity
  *
  * Adds a scheduler entity to the run queue.
  */
-void drm_sched_rq_add_entity(struct drm_sched_rq *rq,
-struct drm_sched_entity

[PATCH v3 0/8] DRM scheduler fixes and improvements

2024-09-13 Thread Tvrtko Ursulin

From: Tvrtko Ursulin 

Re-spin of the series from last week. Changelog is in individual patches.

Cc: Christian König 
Cc: Alex Deucher 
Cc: Luben Tuikov 
Cc: Matthew Brost 
Cc: Philipp Stanner 

Tvrtko Ursulin (8):
  drm/sched: Add locking to drm_sched_entity_modify_sched
  drm/sched: Always wake up correct scheduler in
drm_sched_entity_push_job
  drm/sched: Always increment correct scheduler score
  drm/sched: Optimise drm_sched_entity_push_job
  drm/sched: Stop setting current entity in FIFO mode
  drm/sched: Re-order struct drm_sched_rq members for clarity
  drm/sched: Re-group and rename the entity run-queue lock
  drm/sched: Further optimise drm_sched_entity_push_job

 drivers/gpu/drm/scheduler/sched_entity.c | 49 
 drivers/gpu/drm/scheduler/sched_main.c   | 37 --
 include/drm/gpu_scheduler.h  | 32 +---
 3 files changed, 68 insertions(+), 50 deletions(-)

-- 
2.46.0

[PATCH 5/8] drm/sched: Stop setting current entity in FIFO mode

2024-09-13 Thread Tvrtko Ursulin

From: Tvrtko Ursulin 

It does not seem there is a need to set the current entity in FIFO mode
since it only serves as a "cursor" in round-robin mode. Even if the
scheduling mode is changed at runtime, the change in behaviour is simply
to restart from the first entity, instead of continuing in RR mode from
where FIFO left it, and that sounds completely fine.

Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Cc: Alex Deucher 
Cc: Luben Tuikov 
Cc: Matthew Brost 
Cc: Philipp Stanner 
Acked-by: Christian König 
Reviewed-by: Philipp Stanner 
---
 drivers/gpu/drm/scheduler/sched_main.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index d0ee0ba75a86..74eaa3b23821 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -349,7 +349,6 @@ drm_sched_rq_select_entity_fifo(struct drm_gpu_scheduler 
*sched,
return ERR_PTR(-ENOSPC);
}

-   rq->current_entity = entity;
reinit_completion(&entity->entity_idle);
break;
}
-- 
2.46.0
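
To illustrate the "cursor" argument from the commit message above, here is a minimal userspace sketch. It is not the scheduler code: the kernel walks a list for round-robin and an rb-tree for FIFO, whereas this toy uses a fixed array, and every name (toy_rq, toy_select_rr, toy_select_fifo, TOY_NUM_ENTITIES) is hypothetical. It only shows that round-robin selection needs to remember where it stopped, while FIFO selection depends purely on the timestamps.

```c
#define TOY_NUM_ENTITIES 3

/* Hypothetical stand-in for a run queue; not struct drm_sched_rq. */
struct toy_rq {
	long oldest_job_waiting[TOY_NUM_ENTITIES]; /* per-entity timestamps */
	int current_entity;                        /* RR cursor, -1 = none yet */
};

/* Round-robin: continue after the cursor and remember the new pick. */
int toy_select_rr(struct toy_rq *rq)
{
	int next = (rq->current_entity + 1) % TOY_NUM_ENTITIES;

	rq->current_entity = next;
	return next;
}

/* FIFO: pick the entity waiting the longest; the cursor is never touched. */
int toy_select_fifo(const struct toy_rq *rq)
{
	int best = 0;

	for (int i = 1; i < TOY_NUM_ENTITIES; i++)
		if (rq->oldest_job_waiting[i] < rq->oldest_job_waiting[best])
			best = i;

	return best;
}
```

In this sketch, switching from FIFO to round-robin with an unset cursor simply starts from the first entity, which is the behaviour the commit message describes as acceptable.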

[PATCH 4/8] drm/sched: Optimise drm_sched_entity_push_job

2024-09-13 Thread Tvrtko Ursulin

From: Tvrtko Ursulin 

In FIFO mode we can avoid dropping the lock only to immediately re-acquire
it by adding a new drm_sched_rq_update_fifo_locked() helper.

v2:
 * Remove drm_sched_rq_update_fifo() altogether. (Christian)

Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Cc: Alex Deucher 
Cc: Luben Tuikov 
Cc: Matthew Brost 
Cc: Philipp Stanner 
---
 drivers/gpu/drm/scheduler/sched_entity.c | 13 +
 drivers/gpu/drm/scheduler/sched_main.c   |  6 +++---
 include/drm/gpu_scheduler.h  |  2 +-
 3 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c
index 6645a8524699..aff79055643f 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -514,8 +514,12 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct 
drm_sched_entity *entity)
struct drm_sched_job *next;
 
next = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
-   if (next)
-   drm_sched_rq_update_fifo(entity, next->submit_ts);
+   if (next) {
+   spin_lock(&entity->rq_lock);
+   drm_sched_rq_update_fifo_locked(entity,
+   next->submit_ts);
+   spin_unlock(&entity->rq_lock);
+   }
}
 
/* Jobs and entities might have different lifecycles. Since we're
@@ -615,10 +619,11 @@ void drm_sched_entity_push_job(struct drm_sched_job 
*sched_job)
 
atomic_inc(sched->score);
drm_sched_rq_add_entity(rq, entity);
-   spin_unlock(&entity->rq_lock);
 
if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
-   drm_sched_rq_update_fifo(entity, submit_ts);
+   drm_sched_rq_update_fifo_locked(entity, submit_ts);
+
+   spin_unlock(&entity->rq_lock);
 
drm_sched_wakeup(sched, entity);
}
diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index f093616fe53c..d0ee0ba75a86 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -163,14 +163,15 @@ static inline void drm_sched_rq_remove_fifo_locked(struct 
drm_sched_entity *enti
}
 }
 
-void drm_sched_rq_update_fifo(struct drm_sched_entity *entity, ktime_t ts)
+void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, ktime_t 
ts)
 {
/*
 * Both locks need to be grabbed, one to protect from entity->rq change
 * for entity from within concurrent drm_sched_entity_select_rq and the
 * other to update the rb tree structure.
 */
-   spin_lock(&entity->rq_lock);
+   lockdep_assert_held(&entity->rq_lock);
+
spin_lock(&entity->rq->lock);
 
drm_sched_rq_remove_fifo_locked(entity);
@@ -181,7 +182,6 @@ void drm_sched_rq_update_fifo(struct drm_sched_entity 
*entity, ktime_t ts)
  drm_sched_entity_compare_before);
 
spin_unlock(&entity->rq->lock);
-   spin_unlock(&entity->rq_lock);
 }
 
 /**
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index a8d19b10f9b8..38465b78c7d5 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -593,7 +593,7 @@ void drm_sched_rq_add_entity(struct drm_sched_rq *rq,
 void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
struct drm_sched_entity *entity);
 
-void drm_sched_rq_update_fifo(struct drm_sched_entity *entity, ktime_t ts);
+void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, ktime_t 
ts);
 
 int drm_sched_entity_init(struct drm_sched_entity *entity,
  enum drm_sched_priority priority,
-- 
2.46.0
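
For readers unfamiliar with the _locked helper convention used in this patch, a rough userspace sketch follows. It is an illustration under stated assumptions, not the scheduler implementation: the toy lock is single-threaded and merely counts acquisitions, its assert() calls stand in for what lockdep_assert_held() checks in the kernel, and all names (toy_lock, toy_entity, toy_push_job_old, toy_push_job_new, ...) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy single-threaded stand-in for a spinlock; it only counts acquisitions. */
struct toy_lock {
	bool held;
	unsigned int acquisitions;
};

static void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);	/* a real spinlock would deadlock here */
	l->held = true;
	l->acquisitions++;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* Hypothetical stand-in for an entity; not struct drm_sched_entity. */
struct toy_entity {
	struct toy_lock lock;
	long oldest_job_waiting;
	bool queued;
};

/* Old style: the helper takes and drops the lock itself. */
static void toy_update_fifo(struct toy_entity *e, long ts)
{
	toy_lock_acquire(&e->lock);
	e->oldest_job_waiting = ts;
	toy_lock_release(&e->lock);
}

/* New style: the caller must already hold the lock. */
static void toy_update_fifo_locked(struct toy_entity *e, long ts)
{
	assert(e->lock.held);	/* plays the role of lockdep_assert_held() */
	e->oldest_job_waiting = ts;
}

/* Before: unlock after queueing, then re-lock inside the helper. */
void toy_push_job_old(struct toy_entity *e, long ts)
{
	toy_lock_acquire(&e->lock);
	e->queued = true;
	toy_lock_release(&e->lock);

	toy_update_fifo(e, ts);
}

/* After: a single critical section covers both steps. */
void toy_push_job_new(struct toy_entity *e, long ts)
{
	toy_lock_acquire(&e->lock);
	e->queued = true;
	toy_update_fifo_locked(e, ts);
	toy_lock_release(&e->lock);
}
```

Pushing one job through each variant leaves the same state behind, but the old path acquires the toy lock twice and the new path once, which is the re-lock cycle the patch removes.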

[PATCH 3/8] drm/sched: Always increment correct scheduler score

2024-09-13 Thread Tvrtko Ursulin

From: Tvrtko Ursulin 

The entity's run queue can change during drm_sched_entity_push_job(), so make
sure to update the score consistently.

Signed-off-by: Tvrtko Ursulin 
Fixes: d41a39dda140 ("drm/scheduler: improve job distribution with multiple 
queues")
Cc: Nirmoy Das 
Cc: Christian König 
Cc: Luben Tuikov 
Cc: Matthew Brost 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: dri-devel@lists.freedesktop.org
Cc:  # v5.9+
Reviewed-by: Christian König 
Reviewed-by: Nirmoy Das 
---
 drivers/gpu/drm/scheduler/sched_entity.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c
index 76e422548d40..6645a8524699 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -586,7 +586,6 @@ void drm_sched_entity_push_job(struct drm_sched_job 
*sched_job)
ktime_t submit_ts;
 
trace_drm_sched_job(sched_job, entity);
-   atomic_inc(entity->rq->sched->score);
WRITE_ONCE(entity->last_user, current->group_leader);
 
/*
@@ -614,6 +613,7 @@ void drm_sched_entity_push_job(struct drm_sched_job 
*sched_job)
rq = entity->rq;
sched = rq->sched;
 
+   atomic_inc(sched->score);
drm_sched_rq_add_entity(rq, entity);
spin_unlock(&entity->rq_lock);
 
-- 
2.46.0

[PATCH 2/8] drm/sched: Always wake up correct scheduler in drm_sched_entity_push_job

2024-09-13 Thread Tvrtko Ursulin

From: Tvrtko Ursulin 

Since drm_sched_entity_modify_sched() can modify the entity's run queue,
let's make sure to only dereference the pointer once so both adding and
waking up are guaranteed to be consistent.

The alternative of moving the spin_unlock to after the wake-up would for now
be more problematic, since the same lock is taken inside
drm_sched_rq_update_fifo().

v2:
 * Improve commit message. (Philipp)
 * Cache the scheduler pointer directly. (Christian)

Signed-off-by: Tvrtko Ursulin 
Fixes: b37aced31eb0 ("drm/scheduler: implement a function to modify sched list")
Cc: Christian König 
Cc: Alex Deucher 
Cc: Luben Tuikov 
Cc: Matthew Brost 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Philipp Stanner 
Cc: dri-devel@lists.freedesktop.org
Cc:  # v5.7+
Reviewed-by: Christian König 
---
 drivers/gpu/drm/scheduler/sched_entity.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c
index ae8be30472cd..76e422548d40 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -599,6 +599,9 @@ void drm_sched_entity_push_job(struct drm_sched_job 
*sched_job)
 
/* first job wakes up scheduler */
if (first) {
+   struct drm_gpu_scheduler *sched;
+   struct drm_sched_rq *rq;
+
/* Add the entity to the run queue */
spin_lock(&entity->rq_lock);
if (entity->stopped) {
@@ -608,13 +611,16 @@ void drm_sched_entity_push_job(struct drm_sched_job 
*sched_job)
return;
}
 
-   drm_sched_rq_add_entity(entity->rq, entity);
+   rq = entity->rq;
+   sched = rq->sched;
+
+   drm_sched_rq_add_entity(rq, entity);
spin_unlock(&entity->rq_lock);
 
if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
drm_sched_rq_update_fifo(entity, submit_ts);
 
-   drm_sched_wakeup(entity->rq->sched, entity);
+   drm_sched_wakeup(sched, entity);
}
 }
 EXPORT_SYMBOL(drm_sched_entity_push_job);
-- 
2.46.0
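
The "dereference the pointer once" idea in this patch can be sketched in plain userspace C. This is a simplified single-threaded model, not the scheduler code: the concurrent change of entity->rq is simulated by an explicit helper call placed where the lock would already have been dropped, and all names (toy_sched, toy_rq, toy_entity, toy_push_racy, toy_push_cached, toy_concurrent_switch) are hypothetical.

```c
/* Hypothetical stand-ins; not the real drm_sched structures. */
struct toy_sched { int wakeups; };
struct toy_rq { struct toy_sched *sched; int num_entities; };
struct toy_entity { struct toy_rq *rq; };

/*
 * Simulates another thread re-pointing the entity at a different run
 * queue in the window after the entity lock has been dropped.
 */
static void toy_concurrent_switch(struct toy_entity *e, struct toy_rq *other)
{
	e->rq = other;
}

/* Racy pattern: entity->rq is dereferenced twice. */
void toy_push_racy(struct toy_entity *e, struct toy_rq *other)
{
	e->rq->num_entities++;            /* "add to run queue", lock held */
	toy_concurrent_switch(e, other);  /* lock dropped, rq changes */
	e->rq->sched->wakeups++;          /* may wake the wrong scheduler */
}

/* Fixed pattern: read the pointers once and reuse the cached values. */
void toy_push_cached(struct toy_entity *e, struct toy_rq *other)
{
	struct toy_rq *rq = e->rq;
	struct toy_sched *sched = rq->sched;

	rq->num_entities++;
	toy_concurrent_switch(e, other);
	sched->wakeups++;                 /* scheduler the entity was added to */
}
```

With the racy variant the entity is added to one run queue while the other scheduler gets woken; with the cached variant both operations hit the same scheduler regardless of the simulated switch.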

[PATCH 1/8] drm/sched: Add locking to drm_sched_entity_modify_sched

2024-09-13 Thread Tvrtko Ursulin

From: Tvrtko Ursulin 

Without the locking, amdgpu currently can race between
amdgpu_ctx_set_entity_priority() (via drm_sched_entity_modify_sched()) and
drm_sched_job_arm(), leading to the latter accessing a potentially
inconsistent entity->sched_list and entity->num_sched_list pair.

v2:
 * Improve commit message. (Philipp)

Signed-off-by: Tvrtko Ursulin 
Fixes: b37aced31eb0 ("drm/scheduler: implement a function to modify sched list")
Cc: Christian König 
Cc: Alex Deucher 
Cc: Luben Tuikov 
Cc: Matthew Brost 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: dri-devel@lists.freedesktop.org
Cc: Philipp Stanner 
Cc:  # v5.7+
Reviewed-by: Christian König 
---
 drivers/gpu/drm/scheduler/sched_entity.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c
index 58c8161289fe..ae8be30472cd 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -133,8 +133,10 @@ void drm_sched_entity_modify_sched(struct drm_sched_entity 
*entity,
 {
WARN_ON(!num_sched_list || !sched_list);
 
+   spin_lock(&entity->rq_lock);
entity->sched_list = sched_list;
entity->num_sched_list = num_sched_list;
+   spin_unlock(&entity->rq_lock);
 }
 EXPORT_SYMBOL(drm_sched_entity_modify_sched);
 
-- 
2.46.0

Re: [PATCH 8/8] drm/sched: Further optimise drm_sched_entity_push_job

2024-09-13 Thread Tvrtko Ursulin




On 13/09/2024 13:19, Philipp Stanner wrote:

On Wed, 2024-09-11 at 13:22 +0100, Tvrtko Ursulin wrote:


On 10/09/2024 11:25, Philipp Stanner wrote:

On Mon, 2024-09-09 at 18:19 +0100, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

Having removed one re-lock cycle on the entity->lock in a patch titled
"drm/sched: Optimise drm_sched_entity_push_job", with only a tiny bit
larger refactoring we can do the same optimisation on the rq->lock.
(Currently both drm_sched_rq_add_entity() and
drm_sched_rq_update_fifo_locked() take and release the same lock.)

To achieve this we rename drm_sched_rq_add_entity() to
drm_sched_rq_add_entity_locked(), making it expect the rq->lock to be
held, and also add the same expectation to
drm_sched_rq_update_fifo_locked().

For more stream-lining we also add the run-queue as an explicit parameter
to drm_sched_rq_remove_fifo_locked() to avoid both callers and callee
having to dereference entity->rq.


Why is dereferencing it a problem?


As you have noticed below the API is a bit unsightly. Consider for
example this call chain:

drm_sched_entity_kill(entity)
  drm_sched_rq_remove_entity(entity->rq, entity);
  drm_sched_rq_remove_fifo_locked(entity);
  struct drm_sched_rq *rq = entity->rq;

A bit confused, no?

I thought adding rq to remove_fifo_locked at least removes one back and
forth between the entity->rq and rq.

And then if we cache the rq in a local variable, after having explicitly
taken the correct lock, we have this other call chain example:

drm_sched_entity_push_job()
...
  rq = entity->rq;
  spin_lock(&rq->lock);

  drm_sched_rq_add_entity_locked(rq, entity);
  drm_sched_rq_update_fifo_locked(entity, rq, submit_ts);

  spin_unlock(&rq->lock);

To me at least this reads more streamlined.


Alright, doesn't sound too bad, but




Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Cc: Alex Deucher 
Cc: Luben Tuikov 
Cc: Matthew Brost 
Cc: Philipp Stanner 
---
   drivers/gpu/drm/scheduler/sched_entity.c |  7 ++--
   drivers/gpu/drm/scheduler/sched_main.c   | 41 +---
   include/drm/gpu_scheduler.h  |  7 ++--
   3 files changed, 31 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c
b/drivers/gpu/drm/scheduler/sched_entity.c
index b4c4f9923e0b..2102c726d275 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -614,11 +614,14 @@ void drm_sched_entity_push_job(struct
drm_sched_job *sched_job)
    sched = rq->sched;
   
   		atomic_inc(sched->score);

-   drm_sched_rq_add_entity(rq, entity);
+
+   spin_lock(&rq->lock);
+   drm_sched_rq_add_entity_locked(rq, entity);
   
   		if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)

-   drm_sched_rq_update_fifo_locked(entity,
submit_ts);
+   drm_sched_rq_update_fifo_locked(entity,
rq,
submit_ts);
   
+		spin_unlock(&rq->lock);

    spin_unlock(&entity->lock);
   
   		drm_sched_wakeup(sched, entity);

diff --git a/drivers/gpu/drm/scheduler/sched_main.c
b/drivers/gpu/drm/scheduler/sched_main.c
index 937e7d1cfc49..1ccd2aed2d32 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -153,41 +153,44 @@ static __always_inline bool
drm_sched_entity_compare_before(struct rb_node *a,
    return ktime_before(ent_a->oldest_job_waiting, ent_b-

oldest_job_waiting);

   }
   
-static inline void drm_sched_rq_remove_fifo_locked(struct

drm_sched_entity *entity)
+static void drm_sched_rq_remove_fifo_locked(struct
drm_sched_entity
*entity,
+       struct drm_sched_rq
*rq)


I would then at least like to see a comment somewhere telling the
reader why rq is taken as a separate variable. One might otherwise
easily wonder why it's not obtained through the entity and what the
difference is.


I failed to find a nice place to put it. I'll send v3 of the series with
some changes soon, and then please have another look at this patch and see
if you can think of where it would look good.


Regards,

Tvrtko





So here we'd add a new function parameter that still doesn't allow for
getting rid of 'entity' as a parameter.


We can't get rid of the entity.

Maaaybe instead we could get rid of the rq in the whole chain, I mean
from drm_sched_rq_add_entity and drm_sched_rq_remove_entity to start
with.


Let's postpone that.



But then to remove double re-lock we still (like in this patch) need to
make the callers take the locks and rename the helpers with _locked
suffix. Otherwise it would be inconsistent that a lock is taken outside
the helpers with no _locked suffix.

I am not sure if that is better. All it achieves is remove the rq as
explicit parameter by making the callees dereference it from the
entity.

Re: [PATCH 8/8] drm/sched: Further optimise drm_sched_entity_push_job

2024-09-13 Thread Tvrtko Ursulin




On 10/09/2024 16:03, Christian König wrote:

On 10.09.24 at 11:46, Tvrtko Ursulin wrote:


On 10/09/2024 10:08, Christian König wrote:

On 09.09.24 at 19:19, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

Having removed one re-lock cycle on the entity->lock in a patch titled
"drm/sched: Optimise drm_sched_entity_push_job", with only a tiny bit
larger refactoring we can do the same optimisation on the rq->lock.
(Currently both drm_sched_rq_add_entity() and
drm_sched_rq_update_fifo_locked() take and release the same lock.)


I think that goes into the wrong direction.

Probably better to move this here into drm_sched_rq_add_entity():

       if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
          drm_sched_rq_update_fifo_locked(entity, submit_ts);

We can then also drop adding the entity to the rr list when FIFO is 
in use.


Unfortunately there are a few other places which appear to rely on the
list. Like drm_sched_fini,


That should be only a warning.


Warning as in?

drm_sched_increase_karma and 


The karma handling was another bad idea from AMD for how to propagate
errors back to userspace, and I've just recently documented together with
Sima that we should use dma-fence errors instead.


Just haven't had time to tackle cleaning that up yet.


even amdgpu_job_stop_all_jobs_on_sched.


Uff, seeing that for the first time just now. Another bad idea for how to
handle things, which doesn't take the appropriate locks and looks racy to
me.



The latter could perhaps be solved by adding an iterator helper to the
scheduler, which would perhaps be a good move for component isolation.
And the first two could be handled by implementing a complete and mutually
exclusive duality of how entities are walked depending on scheduling
mode. Plus making the scheduling mode only configurable at boot. It
feels doable but is significant work, and in the meantime removing the
double re-lock may be acceptable?


I don't think we should optimize for something we want to remove in the 
long term.


I knew using the term optimise would just make things more difficult
for myself. :) Let's view this as cleaning up the API to avoid the
inelegance of taking the same lock twice right next to each other.


If we can achieve this while not making the API worse then there is
nothing to lose either short, medium or long term.


If possible I would rather say that we should completely drop the RR 
approach and only use FIFO or even something more sophisticated.


No complaints from me, but I don't know how that would work other than
putting a deprecation warning if someone selected RR. And keeping that
for a good number of kernel releases. Any other ideas?


Regards,

Tvrtko

Re: [PATCH 1/3] drm: add DRM_SET_NAME ioctl

2024-09-13 Thread Tvrtko Ursulin




On 13/09/2024 13:17, Pierre-Eric Pelloux-Prayer wrote:

Hi Tvrtko,

On 12/09/2024 at 10:13, Tvrtko Ursulin wrote:


On 11/09/2024 15:58, Pierre-Eric Pelloux-Prayer wrote:

Giving the opportunity to userspace to associate a free-form
name with a drm_file struct is helpful for tracking and debugging.

This is similar to the existing DMA_BUF_SET_NAME ioctl.

Access to name is protected by a mutex, and the 'clients' debugfs
file has been updated to print it.

Userspace MR to use this ioctl:

https://gitlab.freedesktop.org/virgl/virglrenderer/-/merge_requests/1428


Idea seems useful to me. Various classes of comments/questions below:

Signed-off-by: Pierre-Eric Pelloux-Prayer 


---
  drivers/gpu/drm/drm_debugfs.c | 12 
  drivers/gpu/drm/drm_file.c    |  5 +
  drivers/gpu/drm/drm_ioctl.c   | 28 
  include/drm/drm_file.h    |  9 +
  include/uapi/drm/drm.h    | 14 ++
  5 files changed, 64 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/drm_debugfs.c 
b/drivers/gpu/drm/drm_debugfs.c

index 6b239a24f1df..b7492225ae88 100644
--- a/drivers/gpu/drm/drm_debugfs.c
+++ b/drivers/gpu/drm/drm_debugfs.c
@@ -78,12 +78,13 @@ static int drm_clients_info(struct seq_file *m, 
void *data)

  kuid_t uid;
  seq_printf(m,
-   "%20s %5s %3s master a %5s %10s\n",
+   "%20s %5s %3s master a %5s %10s %20s\n",
 "command",
 "tgid",
 "dev",
 "uid",
-   "magic");
+   "magic",
+   "name");
  /* dev->filelist is sorted youngest first, but we want to present
   * oldest first (i.e. kernel, servers, clients), so walk 
backwardss.
@@ -94,19 +95,22 @@ static int drm_clients_info(struct seq_file *m, 
void *data)

  struct task_struct *task;
  struct pid *pid;
+    mutex_lock(&priv->name_lock);
  rcu_read_lock(); /* Locks priv->pid and pid_task()->comm! */
  pid = rcu_dereference(priv->pid);
  task = pid_task(pid, PIDTYPE_TGID);
  uid = task ? __task_cred(task)->euid : GLOBAL_ROOT_UID;
-    seq_printf(m, "%20s %5d %3d   %c    %c %5d %10u\n",
+    seq_printf(m, "%20s %5d %3d   %c    %c %5d %10u %20s\n",
 task ? task->comm : "",
 pid_vnr(pid),
 priv->minor->index,
 is_current_master ? 'y' : 'n',
 priv->authenticated ? 'y' : 'n',
 from_kuid_munged(seq_user_ns(m), uid),
-   priv->magic);
+   priv->magic,
+   priv->name ? priv->name : "");
  rcu_read_unlock();
+    mutex_unlock(&priv->name_lock);


FWIW it is possible you could get away without the need for a lock on 
the read side if you make the pointer RCU managed and stick a 
synchronize_rcu before kfree in the ioctl update path.


Not because this lock would be a contended one per se, but mostly
to avoid complications such as amdgpu_debugfs_gem_info_show(), where
3/3 has it broken - cannot take the mutex in an RCU locked section. Just
something to consider in case it would end up simpler code.


I don't mind using RCU or a mutex. Christian suggested a mutex, so I 
used that, but I'm happy to switch if the RCU approach is preferred.


Mutex is fine as I said. Just mentioning RCU since it feels trivial and 
avoids the complications in amdgpu_debugfs_gem_info_show().



  mutex_unlock(&dev->filelist_mutex);
  return 0;
diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
index 01fde94fe2a9..558151c3912e 100644
--- a/drivers/gpu/drm/drm_file.c
+++ b/drivers/gpu/drm/drm_file.c
@@ -158,6 +158,7 @@ struct drm_file *drm_file_alloc(struct drm_minor 
*minor)

  spin_lock_init(&file->master_lookup_lock);
  mutex_init(&file->event_read_lock);
+    mutex_init(&file->name_lock);
  if (drm_core_check_feature(dev, DRIVER_GEM))
  drm_gem_open(dev, file);
@@ -259,6 +260,10 @@ void drm_file_free(struct drm_file *file)
  WARN_ON(!list_empty(&file->event_list));
  put_pid(rcu_access_pointer(file->pid));
+
+    mutex_destroy(&file->name_lock);
+    kvfree(file->name);


I think kfree is correct here.



OK, I'll update in v2.


+
  kfree(file);
  }
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index 51f39912866f..ba2f2120e99b 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -540,6 +540,32 @@ int drm_version(struct drm_device *dev, void *data,
  return err;
  }
+static int drm_set_name(struct drm_device *dev, void *data,
+    struct drm_file *file_priv)
+{
+    struct drm_set_name *name = data;
+    void *us

Re: [PATCH v2 1/2] drm/sched: memset() 'job' in drm_sched_job_init()

2024-09-13 Thread Tvrtko Ursulin




On 13/09/2024 13:30, Philipp Stanner wrote:

On Fri, 2024-09-13 at 12:56 +0100, Tvrtko Ursulin wrote:


Hi,

On 28/08/2024 10:41, Philipp Stanner wrote:

drm_sched_job_init() has no control over how users allocate struct
drm_sched_job. Unfortunately, the function can also not set some
struct
members such as job->sched.


job->sched usage from within looks like a bug. But not related to the
memset you add.

For this one something like this looks easiest for a start:

diff --git a/drivers/gpu/drm/scheduler/sched_main.c
b/drivers/gpu/drm/scheduler/sched_main.c
index ab53ab486fe6..877113b01af2 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -788,7 +788,7 @@ int drm_sched_job_init(struct drm_sched_job *job,
   * or worse--a blank screen--leave a trail in the
   * logs, so this can be debugged easier.
   */
-   drm_err(job->sched, "%s: entity has no rq!\n",
__func__);
+   pr_err("%s: entity has no rq!\n", __func__);
  return -ENOENT;
  }

Fixes: 56e449603f0a ("drm/sched: Convert the GPU scheduler to
variable
number of run-queues")
Cc:  # v6.7+


Danilo and I already solved that:

https://lore.kernel.org/all/20240827074521.12828-2-pstan...@redhat.com/


Ah.. I saw the link to this in your maintainership thread and 
superficially assumed it is among the pending stuff. All good.





This could theoretically lead to UB by users dereferencing the
struct's
pointer members too early.


Hmm if drm_sched_job_init returned an error callers should not
dereference anything. What was actually the issue you were debugging?


I was learning about the scheduler, wrote a dummy driver and had
awkward behavior. Turned out it was this pointer not being initialized.
I would have seen it immediately if it were NULL.

The actual issue was and is IMO that a function called
drm_sched_job_init() initializes the job. But it doesn't, it only
partially initializes it. Only after drm_sched_job_arm() ran you're
actually ready to go.


In my experience one good approach when developing stuff is to have the 
various kernel debugging aids enabled. Lockdep, SLAB debugging, memory 
poisoning, kfence.. Then if you were allocating your job without 
GFP_ZERO, _and_ dereferencing something too early out of 
misunderstanding of the API, you would get something obvious in the oops 
and not a random pointer.


The same applies to various CI systems, such as Intel's, which 
already runs a debug kernel, so a lot of these mistakes are caught 
instantly.



Adding a memset is I think not the best solution since it is very
likely
redundant to someone doing a kzalloc in the first place.


It is redundant in most cases, but it is effectively for free. I
measured the runtime with 1e6 jobs with and without memset and there
was no difference.


I guess if kzalloc and drm_sched_job_init() are close enough in time so 
that the cachelines stay put, and depending on how you measure, it may be 
hard to see, but the cost is still there.


For instance 
https://lore.kernel.org/amd-gfx/20240813140310.82706-1-tursu...@igalia.com/ 
I can see with perf that both memsets are hotspots even when testing 
with glxgears and vsync off.


But I don't feel too strongly about this and there definitely is sense 
in initializing everything. Perhaps even instead of a memset we should 
use correct methods per field? Since in there we have spsc_node, 
atomic_t, ktime_t, dma_fence_cb (even in an annoying union), 
drm_sched_priority.. In an ideal world all those would have their 
initializers. But some don't so meh.
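
To make the two options concrete, here is a minimal userspace sketch. It is not the scheduler code: struct demo_job and both init helpers are hypothetical stand-ins, and the real struct drm_sched_job has more members than shown. It only illustrates that blanket zeroing and per-field initialisation both leave the not-yet-set pointer as NULL, even when the caller allocated without zeroing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct drm_sched_job; not the kernel layout. */
struct demo_job {
	void *sched;		/* only set by a later "arm" step */
	void *entity;
	unsigned int credits;
};

/* Blanket zeroing: every member this step does not set becomes 0/NULL. */
static int demo_job_init_memset(struct demo_job *job, void *entity,
				unsigned int credits)
{
	assert(job != NULL);

	if (!entity)
		return -1;

	memset(job, 0, sizeof(*job));
	job->entity = entity;
	job->credits = credits;
	return 0;
}

/* Per-field variant: same end state, but each member is named explicitly. */
static int demo_job_init_fields(struct demo_job *job, void *entity,
				unsigned int credits)
{
	assert(job != NULL);

	if (!entity)
		return -1;

	job->sched = NULL;
	job->entity = entity;
	job->credits = credits;
	return 0;
}
```

The per-field version avoids a second full-struct write when the caller already used a zeroing allocator, at the price of having to be updated whenever a member is added.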


Regards,

Tvrtko


It is easier to debug such issues if these pointers are initialized
to
NULL, so dereferencing them causes a NULL pointer exception.
Accordingly, drm_sched_entity_init() does precisely that and
initializes
its struct with memset().

Initialize parameter "job" to 0 in drm_sched_job_init().

Signed-off-by: Philipp Stanner 
---
No changes in v2.
---
   drivers/gpu/drm/scheduler/sched_main.c | 8 
   1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c
b/drivers/gpu/drm/scheduler/sched_main.c
index 356c30fa24a8..b0c8ad10b419 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -806,6 +806,14 @@ int drm_sched_job_init(struct drm_sched_job
*job,
    return -EINVAL;
    }
   
+	/*

+* We don't know for sure how the user has allocated.
Thus, zero the
+* struct so that unallowed (i.e., too early) usage of
pointers that
+* this function does not set is guaranteed to lead to a
NULL pointer
+* exception instead of UB.
+*/
+   memset(job, 0, sizeof(*job));
+
    job->entity = entity;
    job->credits = credits;
    job->s_fence = drm_sched_fence_alloc(entity, owner);

Re: [PATCH v2 2/2] drm/sched: warn about drm_sched_job_init()'s partial init

2024-09-13 Thread Tvrtko Ursulin




On 28/08/2024 10:41, Philipp Stanner wrote:

drm_sched_job_init()'s name suggests that after the function succeeded,
parameter "job" will be fully initialized. This is not the case; some
members are only later set, notably "job->sched" by drm_sched_job_arm().

Document that drm_sched_job_init() does not set all struct members.

Document that job->sched in particular is uninitialized before
drm_sched_job_arm().

Signed-off-by: Philipp Stanner 
---
Changes in v2:
   - Change grammar in the new comments a bit.
---
  drivers/gpu/drm/scheduler/sched_main.c | 4 
  include/drm/gpu_scheduler.h| 7 +++
  2 files changed, 11 insertions(+)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index b0c8ad10b419..721373938c1e 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -781,6 +781,10 @@ EXPORT_SYMBOL(drm_sched_resubmit_jobs);
   * Drivers must make sure drm_sched_job_cleanup() if this function returns
   * successfully, even when @job is aborted before drm_sched_job_arm() is 
called.
   *
+ * Note that this function does not assign a valid value to each struct member
+ * of struct drm_sched_job. Take a look at that struct's documentation to see
+ * who sets which struct member with what lifetime.


First sentence is fine, but for the second I don't see those details in 
struct drm_sched_job. (And I am not saying that they must be listed. IMO 
at some point it is better to have a high level overview than to describe 
the lifetime rules with individual members.)



+ *
   * WARNING: amdgpu abuses &drm_sched.ready to signal when the hardware
   * has died, which can mean that there's no valid runqueue for a @entity.
   * This function returns -ENOENT in this case (which probably should be -EIO 
as
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 5acc64954a88..04a268cd22f1 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -337,6 +337,13 @@ struct drm_sched_fence *to_drm_sched_fence(struct 
dma_fence *f);
  struct drm_sched_job {
struct spsc_node queue_node;
struct list_head list;
+
+   /*
+* The scheduler this job is or will be scheduled on.
+*
+* Gets set by drm_sched_arm(). Valid until the scheduler's backend_ops
+* callback "free_job()" has been called.


This is interesting - I was not sure where lifetime for job->sched is 
defined and couldn't find it browsing around. Where did you find the 
clues to tie it to the free_job() callback?


Regards,

Tvrtko


+*/
struct drm_gpu_scheduler *sched;
struct drm_sched_fence  *s_fence;

Re: [PATCH v2 1/2] drm/sched: memset() 'job' in drm_sched_job_init()

2024-09-13 Thread Tvrtko Ursulin




Hi,

On 28/08/2024 10:41, Philipp Stanner wrote:

drm_sched_job_init() has no control over how users allocate struct
drm_sched_job. Unfortunately, the function can also not set some struct
members such as job->sched.


job->sched usage from within looks like a bug. But not related to the 
memset you add.


For this one something like this looks easiest for a start:

diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c

index ab53ab486fe6..877113b01af2 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -788,7 +788,7 @@ int drm_sched_job_init(struct drm_sched_job *job,
 * or worse--a blank screen--leave a trail in the
 * logs, so this can be debugged easier.
 */
-   drm_err(job->sched, "%s: entity has no rq!\n", __func__);
+   pr_err("%s: entity has no rq!\n", __func__);
return -ENOENT;
}

Fixes: 56e449603f0a ("drm/sched: Convert the GPU scheduler to variable 
number of run-queues")

Cc:  # v6.7+


This could theoretically lead to UB by users dereferencing the struct's
pointer members too early.


Hmm if drm_sched_job_init returned an error callers should not 
dereference anything. What was actually the issue you were debugging?


Adding a memset is I think not the best solution since it is very likely 
redundant to someone doing a kzalloc in the first place.


Regards,

Tvrtko


It is easier to debug such issues if these pointers are initialized to
NULL, so dereferencing them causes a NULL pointer exception.
Accordingly, drm_sched_entity_init() does precisely that and initializes
its struct with memset().

Initialize parameter "job" to 0 in drm_sched_job_init().

Signed-off-by: Philipp Stanner 
---
No changes in v2.
---
  drivers/gpu/drm/scheduler/sched_main.c | 8 
  1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index 356c30fa24a8..b0c8ad10b419 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -806,6 +806,14 @@ int drm_sched_job_init(struct drm_sched_job *job,
return -EINVAL;
}
  
+	/*

+* We don't know for sure how the user has allocated. Thus, zero the
+* struct so that unallowed (i.e., too early) usage of pointers that
+* this function does not set is guaranteed to lead to a NULL pointer
+* exception instead of UB.
+*/
+   memset(job, 0, sizeof(*job));
+
job->entity = entity;
job->credits = credits;
job->s_fence = drm_sched_fence_alloc(entity, owner);

Re: [PATCH 7/7] dma-buf: rework the enable_signaling handling

2024-09-13 Thread Tvrtko Ursulin




On 11/09/2024 09:59, Christian König wrote:

The enable_signaling callback is the only function the dma_fence
object calls with the fence lock held (the signaled callback might be
called with the fence lock held as well, but that isn't guaranteed).

The background of that decision was to avoid races with other
CPUs trying to signal the fence at the same time and potentially
enforce an ordering of fence signaling.

The only problem is that this never worked correctly.

First of all the enable_signaling call can still race with
signaling a fence, it's just that informing the installed callbacks
is blocking for the enable signaling to finish. If that is required
(radeon is an example of that) then drivers can still grab the fence
themselves, everybody else doesn't need that.

Then regarding fence ordering it is perfectly possible that fences
emitted in the order A,B,C call their installed callbacks in the
order B, C, A. The background is that the optimization to signal
fences from dma_fence_is_signaled() decouples the fence signaling
from the interrupt handlers. The result is that fence C can signal
because somebody queried its state while A and B still wait for their
interrupt to arrive.

While those two reasons are just unnecessary churn the documentation
is simply erroneous and suggests an illegal operation to
implementations: "This function can be called from atomic context,
but not from irq context, so normal spinlocks can be used.". Since
the enable_signaling callback was called with interrupts disabled that
practice could deadlock.

Fortunately nobody actually ran into problems with that, but
considering that we should probably re-work the locking to allow
dma_fence objects to exist after their drivers were unloaded this
patch re-works all this to not call the callback with the dma_fence
spinlock held and rather move the handling into the drivers which
actually need it.

Signed-off-by: Christian König 
---
  drivers/dma-buf/dma-fence-array.c |  7 +-
  drivers/dma-buf/dma-fence-chain.c | 13 ++--
  drivers/dma-buf/dma-fence.c   | 68 +++
  drivers/dma-buf/st-dma-fence-chain.c  |  4 +-
  drivers/dma-buf/st-dma-fence-unwrap.c | 22 +++---
  drivers/dma-buf/st-dma-fence.c| 16 ++---
  drivers/dma-buf/st-dma-resv.c | 10 +--
  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c  | 12 ++--
  drivers/gpu/drm/i915/i915_active.c|  2 +-
  drivers/gpu/drm/i915/i915_request.c   | 12 +++-
  drivers/gpu/drm/nouveau/nouveau_fence.c   |  9 ++-
  drivers/gpu/drm/radeon/radeon_fence.c | 17 +++--
  drivers/gpu/drm/ttm/ttm_bo.c  |  2 +-
  drivers/gpu/drm/xe/xe_bo.c|  2 +-
  drivers/gpu/drm/xe/xe_hw_fence.c  |  4 +-
  drivers/gpu/drm/xe/xe_preempt_fence.c |  3 +-
  drivers/gpu/drm/xe/xe_pt.c|  2 +-
  drivers/gpu/drm/xe/xe_sched_job.c |  2 +-
  drivers/gpu/drm/xe/xe_vm.c|  6 +-
  drivers/gpu/host1x/fence.c| 14 ++--
  include/linux/dma-fence.h | 35 +++---
  21 files changed, 123 insertions(+), 139 deletions(-)

diff --git a/drivers/dma-buf/dma-fence-array.c 
b/drivers/dma-buf/dma-fence-array.c
index c74ac197d5fe..1022b08c9b42 100644
--- a/drivers/dma-buf/dma-fence-array.c
+++ b/drivers/dma-buf/dma-fence-array.c
@@ -67,7 +67,7 @@ static void dma_fence_array_cb_func(struct dma_fence *f,
dma_fence_put(&array->base);
  }
  
-static bool dma_fence_array_enable_signaling(struct dma_fence *fence)

+static void dma_fence_array_enable_signaling(struct dma_fence *fence)
  {
struct dma_fence_array *array = to_dma_fence_array(fence);
struct dma_fence_array_cb *cb = array->callbacks;
@@ -92,12 +92,11 @@ static bool dma_fence_array_enable_signaling(struct 
dma_fence *fence)
dma_fence_put(&array->base);
if (atomic_dec_and_test(&array->num_pending)) {
dma_fence_array_clear_pending_error(array);
-   return false;
+   dma_fence_signal(&array->base);
+   return;
}
}
}
-
-   return true;
  }
  
  static bool dma_fence_array_signaled(struct dma_fence *fence)

diff --git a/drivers/dma-buf/dma-fence-chain.c 
b/drivers/dma-buf/dma-fence-chain.c
index 9663ba1bb6ac..f56baa214a6c 100644
--- a/drivers/dma-buf/dma-fence-chain.c
+++ b/drivers/dma-buf/dma-fence-chain.c
@@ -9,7 +9,7 @@
  
  #include 
  
-static bool dma_fence_chain_enable_signaling(struct dma_fence *fence);

+static void dma_fence_chain_enable_signaling(struct dma_fence *fence);
  
  /**

   * dma_fence_chain_get_prev - use RCU to get a reference to the previous fence
@@ -125,10 +125,7 @@ static void dma_fence_chain_irq_work(struct irq_work *work)
  
  	chain = contain

Re: [PATCH 3/3] drm/amdgpu: use drm_file name

2024-09-12 Thread Tvrtko Ursulin




On 11/09/2024 15:58, Pierre-Eric Pelloux-Prayer wrote:

In debugfs gem_info/vm_info files, timeout handler and page fault reports.

This information is useful with the virtio/native-context driver: this
allows the guest application's identifier to be visible in amdgpu's output.

The output in amdgpu_vm_info/amdgpu_gem_info looks like this:
pid:12255   Process:glxgears/test-set-fd-name --

Signed-off-by: Pierre-Eric Pelloux-Prayer 
---
  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c|  2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c   | 11 --
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c| 20 +--
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h|  4 ++--
  5 files changed, 31 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 6d5fd371d5ce..1712feb2c238 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1577,7 +1577,7 @@ int amdgpu_amdkfd_gpuvm_acquire_process_vm(struct 
amdgpu_device *adev,
if (ret)
return ret;
  
-	amdgpu_vm_set_task_info(avm);

+   amdgpu_vm_set_task_info(avm, NULL);
  
  	return 0;

  }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 1e475eb01417..d32dc547cc80 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -310,7 +310,7 @@ static int amdgpu_cs_pass1(struct amdgpu_cs_parser *p,
kvfree(chunk_array);
  
  	/* Use this opportunity to fill in task info for the vm */

-   amdgpu_vm_set_task_info(vm);
+   amdgpu_vm_set_task_info(vm, p->filp);
  
  	return 0;
  
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c

index 0e617dff8765..0e0d49060ca8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -1012,8 +1012,15 @@ static int amdgpu_debugfs_gem_info_show(struct seq_file 
*m, void *unused)
rcu_read_lock();
pid = rcu_dereference(file->pid);
task = pid_task(pid, PIDTYPE_TGID);
-   seq_printf(m, "pid %8d command %s:\n", pid_nr(pid),
-  task ? task->comm : "");
+   seq_printf(m, "pid %8d command %s", pid_nr(pid),
+  task ? task->comm : "");
+   if (file->name) {
+   mutex_lock(&file->name_lock);


As mentioned taking a mutex under rcu_read_lock is not allowed. It will 
need to either be re-arranged or, also as mentioned, alternatively 
aligned to use the same RCU access rules.
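
One possible shape of the "re-arranged" option, as a userspace sketch only: take the lock before entering the section that must not sleep, copy the name into a local buffer, and print the copy later. Everything here is an assumption for illustration: struct demo_file, demo_file_snapshot_name() and the pthread mutex stand in for the drm_file name/name_lock pair and are not the amdgpu or DRM code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the name/name_lock pair in struct drm_file. */
struct demo_file {
	pthread_mutex_t name_lock;
	const char *name;	/* may be NULL */
};

/*
 * Copy the name into a caller-provided buffer while holding the lock.
 * The caller can then use the copy from a context where taking the
 * (sleeping) lock would not be allowed.
 * Returns the number of bytes stored, excluding the terminator.
 */
static size_t demo_file_snapshot_name(struct demo_file *file,
				      char *out, size_t size)
{
	int n;

	assert(file != NULL && out != NULL && size > 0);

	pthread_mutex_lock(&file->name_lock);
	n = snprintf(out, size, "%s", file->name ? file->name : "");
	pthread_mutex_unlock(&file->name_lock);

	if (n < 0) {
		out[0] = '\0';
		return 0;
	}

	/* snprintf reports the untruncated length; clamp to what fits. */
	return (size_t)n < size ? (size_t)n : size - 1;
}
```

The cost is an extra copy per reader, which is why the RCU-managed pointer may still end up as the simpler variant.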



+   seq_putc(m, '/');
+   seq_puts(m, file->name);
+   mutex_unlock(&file->name_lock);
+   }
+   seq_puts(m, ":\n");
rcu_read_unlock();
  
  		spin_lock(&file->table_lock);

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index e20d19ae01b2..385211846ae3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2370,7 +2370,7 @@ static int amdgpu_vm_create_task_info(struct amdgpu_vm 
*vm)
   *
   * @vm: vm for which to set the info
   */
-void amdgpu_vm_set_task_info(struct amdgpu_vm *vm)
+void amdgpu_vm_set_task_info(struct amdgpu_vm *vm, struct drm_file *file)
  {
if (!vm->task_info)
return;
@@ -2385,7 +2385,23 @@ void amdgpu_vm_set_task_info(struct amdgpu_vm *vm)
return;
  
  	vm->task_info->tgid = current->group_leader->pid;

-   get_task_comm(vm->task_info->process_name, current->group_leader);
+   __get_task_comm(vm->task_info->process_name, TASK_COMM_LEN,
+   current->group_leader);
+   /* Append drm_client_name if set. */
+   if (file && file->name) {
+   int n;
+
+   mutex_lock(&file->name_lock);
+   n = strlen(vm->task_info->process_name);
+   if (n < NAME_MAX) {


NAME_MAX because sizeof(vm->task_info->process_name) is NAME_MAX? (hint)


+   if (file->name) {


FWIW could check before strlen.


+   vm->task_info->process_name[n] = '/';


Can this replace the null terminator at process_name[NAME_MAX - 1] with 
a '/'?
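
For illustration, a bounded append that cannot leave the buffer unterminated could look roughly like the userspace sketch below. demo_append_client_name() is a hypothetical helper, not an existing kernel function, and the buffer size is passed in explicitly rather than assumed to be NAME_MAX or TASK_COMM_LEN.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Append "/<name>" to the NUL-terminated string in buf (capacity size).
 * Nothing is written unless there is room for '/', at least one
 * character of the name and a terminator, so the existing terminator
 * is never overwritten without a new one following it.
 * Returns 1 if something was appended, 0 otherwise.
 */
static int demo_append_client_name(char *buf, size_t size, const char *name)
{
	const char *end;
	size_t n;

	assert(buf != NULL && size > 0);

	if (!name || !name[0])
		return 0;

	/* Refuse to touch a buffer that is not terminated within size. */
	end = memchr(buf, '\0', size);
	if (!end)
		return 0;

	n = (size_t)(end - buf);
	if (n + 3 > size)
		return 0;

	/* snprintf truncates the name if needed and always terminates. */
	snprintf(buf + n, size - n, "/%s", name);
	return 1;
}
```

With a 16 byte buffer holding "glxgears", appending "test" yields "glxgears/test"; with a 9 byte buffer the string is left untouched.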



+   strscpy_pad(&vm->task_info->process_name[n + 1],
+   file->name, NAME_MAX - (n + 1));
+   }
+   }
+   mutex_unlock(&file->name_lock);
+   }
  }
  
  /**

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index d12d66dca8e9..cabec384b4d4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_v

Re: [PATCH 1/3] drm: add DRM_SET_NAME ioctl

2024-09-12 Thread Tvrtko Ursulin




On 11/09/2024 15:58, Pierre-Eric Pelloux-Prayer wrote:

Giving the opportunity to userspace to associate a free-form
name with a drm_file struct is helpful for tracking and debugging.

This is similar to the existing DMA_BUF_SET_NAME ioctl.

Access to name is protected by a mutex, and the 'clients' debugfs
file has been updated to print it.

Userspace MR to use this ioctl:
https://gitlab.freedesktop.org/virgl/virglrenderer/-/merge_requests/1428


Idea seems useful to me. Various classes of comments/questions below:


Signed-off-by: Pierre-Eric Pelloux-Prayer 
---
  drivers/gpu/drm/drm_debugfs.c | 12 
  drivers/gpu/drm/drm_file.c|  5 +
  drivers/gpu/drm/drm_ioctl.c   | 28 
  include/drm/drm_file.h|  9 +
  include/uapi/drm/drm.h| 14 ++
  5 files changed, 64 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/drm_debugfs.c b/drivers/gpu/drm/drm_debugfs.c
index 6b239a24f1df..b7492225ae88 100644
--- a/drivers/gpu/drm/drm_debugfs.c
+++ b/drivers/gpu/drm/drm_debugfs.c
@@ -78,12 +78,13 @@ static int drm_clients_info(struct seq_file *m, void *data)
kuid_t uid;
  
  	seq_printf(m,

-  "%20s %5s %3s master a %5s %10s\n",
+  "%20s %5s %3s master a %5s %10s %20s\n",
   "command",
   "tgid",
   "dev",
   "uid",
-  "magic");
+  "magic",
+  "name");
  
  	/* dev->filelist is sorted youngest first, but we want to present

 * oldest first (i.e. kernel, servers, clients), so walk backwardss.
@@ -94,19 +95,22 @@ static int drm_clients_info(struct seq_file *m, void *data)
struct task_struct *task;
struct pid *pid;
  
+		mutex_lock(&priv->name_lock);

rcu_read_lock(); /* Locks priv->pid and pid_task()->comm! */
pid = rcu_dereference(priv->pid);
task = pid_task(pid, PIDTYPE_TGID);
uid = task ? __task_cred(task)->euid : GLOBAL_ROOT_UID;
-   seq_printf(m, "%20s %5d %3d   %c%c %5d %10u\n",
+   seq_printf(m, "%20s %5d %3d   %c%c %5d %10u %20s\n",
   task ? task->comm : "",
   pid_vnr(pid),
   priv->minor->index,
   is_current_master ? 'y' : 'n',
   priv->authenticated ? 'y' : 'n',
   from_kuid_munged(seq_user_ns(m), uid),
-  priv->magic);
+  priv->magic,
+  priv->name ? priv->name : "");
rcu_read_unlock();
+   mutex_unlock(&priv->name_lock);


FWIW it is possible you could get away without the need for a lock on 
the read side if you make the pointer RCU managed and stick a 
synchronize_rcu before kfree in the ioctl update path.


Not because this lock would be a contended one per se, but mostly to 
avoid complications such as amdgpu_debugfs_gem_info_show() where 3/3 has 
it broken - a mutex cannot be taken in an RCU read-side section. Just 
something to consider in case it would end up simpler code.



}
mutex_unlock(&dev->filelist_mutex);
return 0;
diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
index 01fde94fe2a9..558151c3912e 100644
--- a/drivers/gpu/drm/drm_file.c
+++ b/drivers/gpu/drm/drm_file.c
@@ -158,6 +158,7 @@ struct drm_file *drm_file_alloc(struct drm_minor *minor)
  
  	spin_lock_init(&file->master_lookup_lock);

mutex_init(&file->event_read_lock);
+   mutex_init(&file->name_lock);
  
  	if (drm_core_check_feature(dev, DRIVER_GEM))

drm_gem_open(dev, file);
@@ -259,6 +260,10 @@ void drm_file_free(struct drm_file *file)
WARN_ON(!list_empty(&file->event_list));
  
  	put_pid(rcu_access_pointer(file->pid));

+
+   mutex_destroy(&file->name_lock);
+   kvfree(file->name);


I think kfree is correct here.


+
kfree(file);
  }
  
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c

index 51f39912866f..ba2f2120e99b 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -540,6 +540,32 @@ int drm_version(struct drm_device *dev, void *data,
return err;
  }
  
+static int drm_set_name(struct drm_device *dev, void *data,

+   struct drm_file *file_priv)
+{
+   struct drm_set_name *name = data;
+   void *user_ptr;
+   char *new_name;
+
+   if (name->name_len >= NAME_MAX)
+   return -EINVAL;


Any special reason to use the filesystem NAME_MAX?


+
+   user_ptr = u64_to_user_ptr(name->name);
+
+   new_name = memdup_user_nul(user_ptr, name->name_len);
+
+   if (IS_ERR(new_name))
+   return PTR_ERR(new_name);
+
+   mutex_lock(&file_priv->name_lock);
+   if (file_priv->name)
+   kvfree(file

[PULL] drm-intel-fixes

2024-09-12 Thread Tvrtko Ursulin



Hi Dave, Sima,

It is late in the cycle and luckily the fix in this week's PR is just
something to satisfy static analyzers, nothing that can happen in reality,
so pulling it is even optional.

Regards,

Tvrtko

drm-intel-fixes-2024-09-12:
- Prevent a possible int overflow in wq offsets [guc] (Nikita Zhandarovich)

The following changes since commit da3ea35007d0af457a0afc87e84fddaebc4e0b63:

  Linux 6.11-rc7 (2024-09-08 14:50:28 -0700)

are available in the Git repository at:

  https://gitlab.freedesktop.org/drm/i915/kernel.git 
tags/drm-intel-fixes-2024-09-12

for you to fetch changes up to d3d37f74683e2f16f2635ee265884f7ca69350ae:

  drm/i915/guc: prevent a possible int overflow in wq offsets (2024-09-10 
08:13:51 +0100)


- Prevent a possible int overflow in wq offsets [guc] (Nikita Zhandarovich)


Nikita Zhandarovich (1):
  drm/i915/guc: prevent a possible int overflow in wq offsets

 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Re: [PATCH 8/8] drm/sched: Further optimise drm_sched_entity_push_job

2024-09-11 Thread Tvrtko Ursulin




On 10/09/2024 11:25, Philipp Stanner wrote:

On Mon, 2024-09-09 at 18:19 +0100, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

Having removed one re-lock cycle on the entity->lock in a patch
titled
"drm/sched: Optimise drm_sched_entity_push_job", with only a tiny bit
larger refactoring we can do the same optimisation on the rq->lock
(Currently both drm_sched_rq_add_entity() and
drm_sched_rq_update_fifo_locked() take and release the same lock.)

To achieve this we rename drm_sched_rq_add_entity() to
drm_sched_rq_add_entity_locked(), making it expect the rq->lock to be
held, and also add the same expectation to
drm_sched_rq_update_fifo_locked().

For more stream-lining we also add the run-queue as an explicit
parameter
to drm_sched_rq_remove_fifo_locked() to avoid both callers and callee
having to dereference entity->rq.


Why is dereferencing it a problem?


As you have noticed below the API is a bit unsightly. Consider for 
example this call chain:


drm_sched_entity_kill(entity)
    drm_sched_rq_remove_entity(entity->rq, entity);
        drm_sched_rq_remove_fifo_locked(entity);
            struct drm_sched_rq *rq = entity->rq;

A bit confused, no?

I thought adding rq to remove_fifo_locked at least removes one back and 
forth between the entity->rq and rq.


And then if we cache the rq in a local variable, after having explicitly 
taken the correct lock, we have this other call chain example:


drm_sched_entity_push_job()
    ...
    rq = entity->rq;
    spin_lock(rq->lock);

    drm_sched_rq_add_entity_locked(rq, entity);
    drm_sched_rq_update_fifo_locked(rq, entity, submit_ts);

    spin_unlock(rq->lock);

To me at least this reads more streamlined.
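
As a rough userspace model of that shape (all demo_* names are hypothetical, a pthread mutex stands in for the spinlock, and the lock_cycles counter exists only to make the single lock/unlock cycle observable), the caller takes the lock once and both helpers merely expect it to be held:

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical, much-simplified stand-ins for the run-queue and entity. */
struct demo_rq {
	pthread_mutex_t lock;
	unsigned int lock_cycles;	/* lock acquisitions, illustration only */
	unsigned int nr_entities;	/* protected by lock */
	long fifo_ts;			/* protected by lock */
};

struct demo_entity {
	struct demo_rq *rq;
	int queued;
};

static void demo_rq_lock(struct demo_rq *rq)
{
	pthread_mutex_lock(&rq->lock);
	rq->lock_cycles++;
}

static void demo_rq_unlock(struct demo_rq *rq)
{
	pthread_mutex_unlock(&rq->lock);
}

/* Caller must hold rq->lock. */
static void demo_rq_add_entity_locked(struct demo_rq *rq,
				      struct demo_entity *entity)
{
	assert(rq == entity->rq);

	if (!entity->queued) {
		entity->queued = 1;
		rq->nr_entities++;
	}
}

/* Caller must hold rq->lock. */
static void demo_rq_update_fifo_locked(struct demo_rq *rq,
				       struct demo_entity *entity, long ts)
{
	assert(rq == entity->rq);
	rq->fifo_ts = ts;
}

/* One lock/unlock cycle covers both updates. */
static void demo_entity_push_job(struct demo_entity *entity, long submit_ts)
{
	struct demo_rq *rq = entity->rq;

	demo_rq_lock(rq);
	demo_rq_add_entity_locked(rq, entity);
	demo_rq_update_fifo_locked(rq, entity, submit_ts);
	demo_rq_unlock(rq);
}
```

Passing rq explicitly is what makes a consistency check such as rq == entity->rq possible inside the helpers.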


Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Cc: Alex Deucher 
Cc: Luben Tuikov 
Cc: Matthew Brost 
Cc: Philipp Stanner 
---
  drivers/gpu/drm/scheduler/sched_entity.c |  7 ++--
  drivers/gpu/drm/scheduler/sched_main.c   | 41 +-
--
  include/drm/gpu_scheduler.h  |  7 ++--
  3 files changed, 31 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c
b/drivers/gpu/drm/scheduler/sched_entity.c
index b4c4f9923e0b..2102c726d275 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -614,11 +614,14 @@ void drm_sched_entity_push_job(struct
drm_sched_job *sched_job)
    sched = rq->sched;
  
  		atomic_inc(sched->score);

-   drm_sched_rq_add_entity(rq, entity);
+
+   spin_lock(&rq->lock);
+   drm_sched_rq_add_entity_locked(rq, entity);
  
  		if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)

-   drm_sched_rq_update_fifo_locked(entity,
submit_ts);
+   drm_sched_rq_update_fifo_locked(entity, rq,
submit_ts);
  
+		spin_unlock(&rq->lock);

    spin_unlock(&entity->lock);
  
  		drm_sched_wakeup(sched, entity);

diff --git a/drivers/gpu/drm/scheduler/sched_main.c
b/drivers/gpu/drm/scheduler/sched_main.c
index 937e7d1cfc49..1ccd2aed2d32 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -153,41 +153,44 @@ static __always_inline bool
drm_sched_entity_compare_before(struct rb_node *a,
    return ktime_before(ent_a->oldest_job_waiting, ent_b-

oldest_job_waiting);

  }
  
-static inline void drm_sched_rq_remove_fifo_locked(struct

drm_sched_entity *entity)
+static void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity
*entity,
+       struct drm_sched_rq *rq)


So here we'd add a new function parameter that still doesn't allow for
getting rid of 'entity' as a parameter.


We can't get rid of the entity.

Maaaybe instead we could get rid of the rq in the whole chain, I mean 
from drm_sched_rq_add_entity and drm_sched_rq_remove_entity to start with.


But then to remove the double re-lock we still (like in this patch) need to 
make the callers take the locks and rename the helpers with a _locked 
suffix. Otherwise it would be inconsistent that a lock is taken outside 
the helpers with no _locked suffix.


I am not sure if that is better. All it achieves is removing the rq as an 
explicit parameter by making the callees dereference it from the entity.


Worst part is all these helpers have drm_sched_rq_ prefix.. which to me 
reads as "we operate on rq". So not passing in rq is confusing to start 
with.


Granted, some confusion still remains with my approach since ideally, to 
those helpers, I wanted to add some asserts that rq == entity->rq...



The API gets larger that way and readers will immediately wonder why
sth is passed as a separate variable that could also be obtained
through the pointer.


  {
-   struct drm_sched_rq *rq = entity->rq;
-
    if (!RB_EMPTY_NODE(&entity->rb_tree_node)) {
    rb_erase_cached(&am

Re: [PATCH 6/8] drm/sched: Re-order struct drm_sched_rq members for clarity

2024-09-10 Thread Tvrtko Ursulin




On 10/09/2024 11:05, Philipp Stanner wrote:

On Mon, 2024-09-09 at 18:19 +0100, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

Let's re-order the members to make it clear which are protected by the
lock
and at the same time document it via kerneldoc.


I'd prefer if commit messages follow the idiomatic kernel style of that
order:
1. Describe the current situation
2. State why it's bad or undesirable
3. (describe the solution)
4. Conclude commit message through sentences in imperative stating
   what the commit does.

In this case I would go for:
"struct drm_sched_rq contains a spinlock that protects several struct
members. The current documentation incorrectly states that this lock
only guards the entities list. In truth, it guards that list, the
rb_tree and the current entity.

Document what the lock actually guards. Rearrange struct members so
that this becomes even more visible."


IMO a bit much to ask for a textbook format, for a trivial patch, when 
all points are already implicitly obvious. That is, "let's make it clear" 
= current situation is not clear -> obviously bad with no need to 
explain; "and at the same time document" = means it is currently not 
documented -> again obviously not desirable.


But okay, since I agree with the point below (*), I can explode the text 
for maximum redundancy.



Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Cc: Alex Deucher 
Cc: Luben Tuikov 
Cc: Matthew Brost 
Cc: Philipp Stanner 
---
  include/drm/gpu_scheduler.h | 10 ++
  1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/include/drm/gpu_scheduler.h
b/include/drm/gpu_scheduler.h
index a06753987d93..d4a3ba333568 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -243,10 +243,10 @@ struct drm_sched_entity {
  /**
   * struct drm_sched_rq - queue of entities to be scheduled.
   *
- * @lock: to modify the entities list.
   * @sched: the scheduler to which this rq belongs to.
- * @entities: list of the entities to be scheduled.
+ * @lock: protects the list, tree and current entity.


Would be more consistent with the below comment if you'd address them
with their full name, aka "protects @entities, @rb_tree_root and
@current_entity".


*) this one I agree with.

Regards,

Tvrtko



Thanks,
P.



   * @current_entity: the entity which is to be scheduled.
+ * @entities: list of the entities to be scheduled.
   * @rb_tree_root: root of time based priory queue of entities for
FIFO scheduling
   *
   * Run queue is a set of entities scheduling command submissions for
@@ -254,10 +254,12 @@ struct drm_sched_entity {
   * the next entity to emit commands from.
   */
  struct drm_sched_rq {
- spinlock_t lock;
   struct drm_gpu_scheduler *sched;
- struct list_head entities;
+
+ spinlock_t lock;
+ /* Following members are protected by the @lock: */
   struct drm_sched_entity *current_entity;
+ struct list_head entities;
   struct rb_root_cached rb_tree_root;
  };

Re: [PATCH 8/8] drm/sched: Further optimise drm_sched_entity_push_job

2024-09-10 Thread Tvrtko Ursulin




On 10/09/2024 10:08, Christian König wrote:

Am 09.09.24 um 19:19 schrieb Tvrtko Ursulin:

From: Tvrtko Ursulin 

Having removed one re-lock cycle on the entity->lock in a patch titled
"drm/sched: Optimise drm_sched_entity_push_job", with only a tiny bit
larger refactoring we can do the same optimisation on the rq->lock.
(Currently both drm_sched_rq_add_entity() and
drm_sched_rq_update_fifo_locked() take and release the same lock.)


I think that goes into the wrong direction.

Probably better to move this here into drm_sched_rq_add_entity():

       if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
          drm_sched_rq_update_fifo_locked(entity, submit_ts);

We can then also drop adding the entity to the rr list when FIFO is in use.


Unfortunately there are a few other places which appear to rely on the 
list, like drm_sched_fini, drm_sched_increase_karma and even 
amdgpu_job_stop_all_jobs_on_sched. The latter could perhaps be solved by 
adding an iterator helper to the scheduler, which would perhaps be a 
good move for component isolation. And the first two could be handled by 
implementing a complete and mutually exclusive duality of how entities 
are walked depending on scheduling mode, plus making the scheduling mode 
only configurable at boot. It feels doable but is significant work, and 
in the meantime removing the double re-lock may be acceptable?


Regards,

Tvrtko


To achieve this we rename drm_sched_rq_add_entity() to
drm_sched_rq_add_entity_locked(), making it expect the rq->lock to be
held, and also add the same expectation to
drm_sched_rq_update_fifo_locked().

For more stream-lining we also add the run-queue as an explicit parameter
to drm_sched_rq_remove_fifo_locked() to avoid both callers and callee
having to dereference entity->rq.

Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Cc: Alex Deucher 
Cc: Luben Tuikov 
Cc: Matthew Brost 
Cc: Philipp Stanner 
---
  drivers/gpu/drm/scheduler/sched_entity.c |  7 ++--
  drivers/gpu/drm/scheduler/sched_main.c   | 41 +---
  include/drm/gpu_scheduler.h  |  7 ++--
  3 files changed, 31 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c

index b4c4f9923e0b..2102c726d275 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -614,11 +614,14 @@ void drm_sched_entity_push_job(struct 
drm_sched_job *sched_job)

  sched = rq->sched;
  atomic_inc(sched->score);
-    drm_sched_rq_add_entity(rq, entity);
+
+    spin_lock(&rq->lock);
+    drm_sched_rq_add_entity_locked(rq, entity);
  if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
-    drm_sched_rq_update_fifo_locked(entity, submit_ts);
+    drm_sched_rq_update_fifo_locked(entity, rq, submit_ts);
+    spin_unlock(&rq->lock);
  spin_unlock(&entity->lock);
  drm_sched_wakeup(sched, entity);
diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c

index 937e7d1cfc49..1ccd2aed2d32 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -153,41 +153,44 @@ static __always_inline bool 
drm_sched_entity_compare_before(struct rb_node *a,
  return ktime_before(ent_a->oldest_job_waiting, 
ent_b->oldest_job_waiting);

  }
-static inline void drm_sched_rq_remove_fifo_locked(struct 
drm_sched_entity *entity)
+static void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity 
*entity,

+    struct drm_sched_rq *rq)
  {
-    struct drm_sched_rq *rq = entity->rq;
-
  if (!RB_EMPTY_NODE(&entity->rb_tree_node)) {
  rb_erase_cached(&entity->rb_tree_node, &rq->rb_tree_root);
  RB_CLEAR_NODE(&entity->rb_tree_node);
  }
  }
-void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, 
ktime_t ts)

+void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity,
+ struct drm_sched_rq *rq,
+ ktime_t ts)
  {
  lockdep_assert_held(&entity->lock);
+    lockdep_assert_held(&rq->lock);
-    spin_lock(&entity->rq->lock);
-
-    drm_sched_rq_remove_fifo_locked(entity);
+    drm_sched_rq_remove_fifo_locked(entity, rq);
  entity->oldest_job_waiting = ts;
-    rb_add_cached(&entity->rb_tree_node, &entity->rq->rb_tree_root,
+    rb_add_cached(&entity->rb_tree_node, &rq->rb_tree_root,
    drm_sched_entity_compare_before);
-
-    spin_unlock(&entity->rq->lock);
  }
  void drm_sched_rq_update_fifo(struct drm_sched_entity *entity, 
ktime_t ts)

  {
+    struct drm_sched_rq *rq;
+
  /*
   * Both locks need to be grabbed, one to protect from entity->rq 
change
   * for entity from within concurrent drm_sched_entity_select_rq 
and the

Re: [PATCH] drm/syncobj: Fix syncobj leak in drm_syncobj_eventfd_ioctl

2024-09-10 Thread Tvrtko Ursulin




On 09/09/2024 21:53, T.J. Mercier wrote:

A syncobj reference is taken in drm_syncobj_find, but not released if
eventfd_ctx_fdget or kzalloc fails. Put the reference in these error
paths.

Reported-by: Xingyu Jin 
Fixes: c7a472297169 ("drm/syncobj: add IOCTL to register an eventfd")
Signed-off-by: T.J. Mercier 
---
  drivers/gpu/drm/drm_syncobj.c | 17 +
  1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index a0e94217b511..4fcfc0b9b386 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -1464,6 +1464,7 @@ drm_syncobj_eventfd_ioctl(struct drm_device *dev, void 
*data,
struct drm_syncobj *syncobj;
struct eventfd_ctx *ev_fd_ctx;
struct syncobj_eventfd_entry *entry;
+   int ret;
  
  	if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ_TIMELINE))

return -EOPNOTSUPP;
@@ -1479,13 +1480,15 @@ drm_syncobj_eventfd_ioctl(struct drm_device *dev, void 
*data,
return -ENOENT;
  
  	ev_fd_ctx = eventfd_ctx_fdget(args->fd);

-   if (IS_ERR(ev_fd_ctx))
-   return PTR_ERR(ev_fd_ctx);
+   if (IS_ERR(ev_fd_ctx)) {
+   ret = PTR_ERR(ev_fd_ctx);
+   goto err_fdget;
+   }
  
  	entry = kzalloc(sizeof(*entry), GFP_KERNEL);

if (!entry) {
-   eventfd_ctx_put(ev_fd_ctx);
-   return -ENOMEM;
+   ret = -ENOMEM;
+   goto err_kzalloc;
}
entry->syncobj = syncobj;
entry->ev_fd_ctx = ev_fd_ctx;
@@ -1496,6 +1499,12 @@ drm_syncobj_eventfd_ioctl(struct drm_device *dev, void 
*data,
drm_syncobj_put(syncobj);
  
  	return 0;

+
+err_kzalloc:
+   eventfd_ctx_put(ev_fd_ctx);
+err_fdget:
+   drm_syncobj_put(syncobj);
+   return ret;
  }
  
  int


Easy enough to review while browsing the list:

Reviewed-by: Tvrtko Ursulin 

Regards,

Tvrtko
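
For readers following along, the error-path fix reviewed above is an instance of the usual goto-based unwind: each resource gets a label, and a failure jumps to the label that releases everything acquired so far, in reverse order. Below is a minimal userspace sketch of that shape. All names in it (toy_obj, toy_register and the fail_* switches) are hypothetical stand-ins and not the DRM API; the reference count is a plain integer so the leak is easy to observe.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted object such as a syncobj. */
struct toy_obj {
	int refcount;
};

static void toy_obj_get(struct toy_obj *obj)
{
	obj->refcount++;
}

static void toy_obj_put(struct toy_obj *obj)
{
	obj->refcount--;
}

/*
 * Take a reference, then acquire two more resources. fail_ctx and
 * fail_entry force the respective step to fail so both error paths can
 * be exercised. Every exit path drops the reference taken at the top.
 */
static int toy_register(struct toy_obj *obj, int fail_ctx, int fail_entry)
{
	void *ctx, *entry;
	int ret;

	toy_obj_get(obj);

	ctx = fail_ctx ? NULL : malloc(16);
	if (!ctx) {
		ret = -EBADF;
		goto err_ctx;
	}

	entry = fail_entry ? NULL : malloc(16);
	if (!entry) {
		ret = -ENOMEM;
		goto err_entry;
	}

	/* Success: this toy has nothing to keep, so release everything. */
	free(entry);
	free(ctx);
	toy_obj_put(obj);
	return 0;

err_entry:
	free(ctx);
err_ctx:
	toy_obj_put(obj);
	return ret;
}
```

Calling toy_register() with either failure switch set should leave the reference count where it started, which is the property the original early returns were missing.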

[PATCH 8/8] drm/sched: Further optimise drm_sched_entity_push_job

2024-09-09 Thread Tvrtko Ursulin

From: Tvrtko Ursulin 

Having removed one re-lock cycle on the entity->lock in a patch titled
"drm/sched: Optimise drm_sched_entity_push_job", with only a tiny bit
larger refactoring we can do the same optimisation on the rq->lock.
(Currently both drm_sched_rq_add_entity() and
drm_sched_rq_update_fifo_locked() take and release the same lock.)

To achieve this we rename drm_sched_rq_add_entity() to
drm_sched_rq_add_entity_locked(), making it expect the rq->lock to be
held, and also add the same expectation to
drm_sched_rq_update_fifo_locked().

For more stream-lining we also add the run-queue as an explicit parameter
to drm_sched_rq_remove_fifo_locked() to avoid both callers and callee
having to dereference entity->rq.

Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Cc: Alex Deucher 
Cc: Luben Tuikov 
Cc: Matthew Brost 
Cc: Philipp Stanner 
---
 drivers/gpu/drm/scheduler/sched_entity.c |  7 ++--
 drivers/gpu/drm/scheduler/sched_main.c   | 41 +---
 include/drm/gpu_scheduler.h  |  7 ++--
 3 files changed, 31 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c
index b4c4f9923e0b..2102c726d275 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -614,11 +614,14 @@ void drm_sched_entity_push_job(struct drm_sched_job 
*sched_job)
sched = rq->sched;
 
atomic_inc(sched->score);
-   drm_sched_rq_add_entity(rq, entity);
+
+   spin_lock(&rq->lock);
+   drm_sched_rq_add_entity_locked(rq, entity);
 
if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
-   drm_sched_rq_update_fifo_locked(entity, submit_ts);
+   drm_sched_rq_update_fifo_locked(entity, rq, submit_ts);
 
+   spin_unlock(&rq->lock);
spin_unlock(&entity->lock);
 
drm_sched_wakeup(sched, entity);
diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index 937e7d1cfc49..1ccd2aed2d32 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -153,41 +153,44 @@ static __always_inline bool 
drm_sched_entity_compare_before(struct rb_node *a,
return ktime_before(ent_a->oldest_job_waiting, 
ent_b->oldest_job_waiting);
 }
 
-static inline void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity 
*entity)
+static void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *entity,
+   struct drm_sched_rq *rq)
 {
-   struct drm_sched_rq *rq = entity->rq;
-
if (!RB_EMPTY_NODE(&entity->rb_tree_node)) {
rb_erase_cached(&entity->rb_tree_node, &rq->rb_tree_root);
RB_CLEAR_NODE(&entity->rb_tree_node);
}
 }
 
-void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, ktime_t 
ts)
+void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity,
+struct drm_sched_rq *rq,
+ktime_t ts)
 {
lockdep_assert_held(&entity->lock);
+   lockdep_assert_held(&rq->lock);
 
-   spin_lock(&entity->rq->lock);
-
-   drm_sched_rq_remove_fifo_locked(entity);
+   drm_sched_rq_remove_fifo_locked(entity, rq);
 
entity->oldest_job_waiting = ts;
 
-   rb_add_cached(&entity->rb_tree_node, &entity->rq->rb_tree_root,
+   rb_add_cached(&entity->rb_tree_node, &rq->rb_tree_root,
  drm_sched_entity_compare_before);
-
-   spin_unlock(&entity->rq->lock);
 }
 
 void drm_sched_rq_update_fifo(struct drm_sched_entity *entity, ktime_t ts)
 {
+   struct drm_sched_rq *rq;
+
/*
 * Both locks need to be grabbed, one to protect from entity->rq change
 * for entity from within concurrent drm_sched_entity_select_rq and the
 * other to update the rb tree structure.
 */
spin_lock(&entity->lock);
-   drm_sched_rq_update_fifo_locked(entity, ts);
+   rq = entity->rq;
+   spin_lock(&rq->lock);
+   drm_sched_rq_update_fifo_locked(entity, rq, ts);
+   spin_unlock(&rq->lock);
spin_unlock(&entity->lock);
 }
 
@@ -210,25 +213,23 @@ static void drm_sched_rq_init(struct drm_gpu_scheduler 
*sched,
 }
 
 /**
- * drm_sched_rq_add_entity - add an entity
+ * drm_sched_rq_add_entity_locked - add an entity
  *
  * @rq: scheduler run queue
  * @entity: scheduler entity
  *
  * Adds a scheduler entity to the run queue.
  */
-void drm_sched_rq_add_entity(struct drm_sched_rq *rq,
-struct drm_sched_entity *entity)
+void drm_sched_rq_add_entity_locked(struct drm_sched_rq *rq,
+

[PATCH 1/8] drm/sched: Add locking to drm_sched_entity_modify_sched

2024-09-09 Thread Tvrtko Ursulin

From: Tvrtko Ursulin 

Without the locking amdgpu currently can race between
amdgpu_ctx_set_entity_priority() (via drm_sched_entity_modify_sched()) and
drm_sched_job_arm(), leading to the latter accessing a potentially
inconsistent entity->sched_list and entity->num_sched_list pair.

v2:
 * Improve commit message. (Philipp)

Signed-off-by: Tvrtko Ursulin 
Fixes: b37aced31eb0 ("drm/scheduler: implement a function to modify sched list")
Cc: Christian König 
Cc: Alex Deucher 
Cc: Luben Tuikov 
Cc: Matthew Brost 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: dri-devel@lists.freedesktop.org
Cc: Philipp Stanner 
Cc:  # v5.7+
---
 drivers/gpu/drm/scheduler/sched_entity.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c
index 58c8161289fe..ae8be30472cd 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -133,8 +133,10 @@ void drm_sched_entity_modify_sched(struct drm_sched_entity 
*entity,
 {
WARN_ON(!num_sched_list || !sched_list);
 
+   spin_lock(&entity->rq_lock);
entity->sched_list = sched_list;
entity->num_sched_list = num_sched_list;
+   spin_unlock(&entity->rq_lock);
 }
 EXPORT_SYMBOL(drm_sched_entity_modify_sched);
 
-- 
2.46.0
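
As an illustration of the patch above, here is a small userspace sketch of the rule it enforces: a pointer and its element count are only ever written, and only ever read, inside one critical section, so a reader cannot see a new list with an old length. The toy_lock type is a hypothetical stand-in for spinlock_t that merely records whether it is held, and toy_entity, toy_modify_sched and toy_snapshot are invented for this example. Being single-threaded, the sketch shows the discipline rather than the race itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for spinlock_t; it only records its state. */
struct toy_lock {
	int held;
};

static void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);
	l->held = 1;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = 0;
}

/* Hypothetical, cut-down entity: a list pointer and its length. */
struct toy_entity {
	struct toy_lock lock;
	const int *sched_list;
	size_t num_sched_list;
};

/* Writer: both fields change inside a single critical section. */
static void toy_modify_sched(struct toy_entity *e, const int *list, size_t num)
{
	toy_lock_acquire(&e->lock);
	e->sched_list = list;
	e->num_sched_list = num;
	toy_lock_release(&e->lock);
}

/* Reader: copy the pair under the same lock, then use the local copy. */
static size_t toy_snapshot(struct toy_entity *e, const int **list)
{
	size_t num;

	toy_lock_acquire(&e->lock);
	*list = e->sched_list;
	num = e->num_sched_list;
	toy_lock_release(&e->lock);

	return num;
}
```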

[PATCH 7/8] drm/sched: Re-group and rename the entity run-queue lock

2024-09-09 Thread Tvrtko Ursulin

From: Tvrtko Ursulin 

Christian suggested renaming the lock and improving the documentation of
what it protects, and also re-ordering the structure members so that all
members protected by the lock are together in a block.

Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Cc: Alex Deucher 
Cc: Luben Tuikov 
Cc: Matthew Brost 
Cc: Philipp Stanner 
---
 drivers/gpu/drm/scheduler/sched_entity.c | 24 
 drivers/gpu/drm/scheduler/sched_main.c   |  6 +++---
 include/drm/gpu_scheduler.h  | 15 ---
 3 files changed, 23 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c
index 2da677681291..b4c4f9923e0b 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -105,7 +105,7 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
/* We start in an idle state. */
complete_all(&entity->entity_idle);
 
-   spin_lock_init(&entity->rq_lock);
+   spin_lock_init(&entity->lock);
spsc_queue_init(&entity->job_queue);
 
atomic_set(&entity->fence_seq, 0);
@@ -133,10 +133,10 @@ void drm_sched_entity_modify_sched(struct 
drm_sched_entity *entity,
 {
WARN_ON(!num_sched_list || !sched_list);
 
-   spin_lock(&entity->rq_lock);
+   spin_lock(&entity->lock);
entity->sched_list = sched_list;
entity->num_sched_list = num_sched_list;
-   spin_unlock(&entity->rq_lock);
+   spin_unlock(&entity->lock);
 }
 EXPORT_SYMBOL(drm_sched_entity_modify_sched);
 
@@ -244,10 +244,10 @@ static void drm_sched_entity_kill(struct drm_sched_entity 
*entity)
if (!entity->rq)
return;
 
-   spin_lock(&entity->rq_lock);
+   spin_lock(&entity->lock);
entity->stopped = true;
drm_sched_rq_remove_entity(entity->rq, entity);
-   spin_unlock(&entity->rq_lock);
+   spin_unlock(&entity->lock);
 
/* Make sure this entity is not used by the scheduler at the moment */
wait_for_completion(&entity->entity_idle);
@@ -396,9 +396,9 @@ static void drm_sched_entity_wakeup(struct dma_fence *f,
 void drm_sched_entity_set_priority(struct drm_sched_entity *entity,
   enum drm_sched_priority priority)
 {
-   spin_lock(&entity->rq_lock);
+   spin_lock(&entity->lock);
entity->priority = priority;
-   spin_unlock(&entity->rq_lock);
+   spin_unlock(&entity->lock);
 }
 EXPORT_SYMBOL(drm_sched_entity_set_priority);
 
@@ -555,14 +555,14 @@ void drm_sched_entity_select_rq(struct drm_sched_entity 
*entity)
if (fence && !dma_fence_is_signaled(fence))
return;
 
-   spin_lock(&entity->rq_lock);
+   spin_lock(&entity->lock);
sched = drm_sched_pick_best(entity->sched_list, entity->num_sched_list);
rq = sched ? sched->sched_rq[entity->priority] : NULL;
if (rq != entity->rq) {
drm_sched_rq_remove_entity(entity->rq, entity);
entity->rq = rq;
}
-   spin_unlock(&entity->rq_lock);
+   spin_unlock(&entity->lock);
 
if (entity->num_sched_list == 1)
entity->sched_list = NULL;
@@ -602,9 +602,9 @@ void drm_sched_entity_push_job(struct drm_sched_job 
*sched_job)
struct drm_sched_rq *rq;
 
/* Add the entity to the run queue */
-   spin_lock(&entity->rq_lock);
+   spin_lock(&entity->lock);
if (entity->stopped) {
-   spin_unlock(&entity->rq_lock);
+   spin_unlock(&entity->lock);
 
DRM_ERROR("Trying to push to a killed entity\n");
return;
@@ -619,7 +619,7 @@ void drm_sched_entity_push_job(struct drm_sched_job 
*sched_job)
if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
drm_sched_rq_update_fifo_locked(entity, submit_ts);
 
-   spin_unlock(&entity->rq_lock);
+   spin_unlock(&entity->lock);
 
drm_sched_wakeup(sched, entity);
}
diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index 54c5fe7a7d1d..937e7d1cfc49 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -165,7 +165,7 @@ static inline void drm_sched_rq_remove_fifo_locked(struct 
drm_sched_entity *enti
 
 void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, ktime_t 
ts)
 {
-   lockdep_assert_held(&entity->rq_lock);
+   lockdep_assert_held(&entity->lock);
 
spin_lock(&entity->rq->lock);
 
@@ -186,9 +186,9 @@

[PATCH 6/8] drm/sched: Re-order struct drm_sched_rq members for clarity

2024-09-09 Thread Tvrtko Ursulin

From: Tvrtko Ursulin 

Let's re-order the members to make it clear which are protected by the lock
and at the same time document it via kerneldoc.

Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Cc: Alex Deucher 
Cc: Luben Tuikov 
Cc: Matthew Brost 
Cc: Philipp Stanner 
---
 include/drm/gpu_scheduler.h | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index a06753987d93..d4a3ba333568 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -243,10 +243,10 @@ struct drm_sched_entity {
 /**
  * struct drm_sched_rq - queue of entities to be scheduled.
  *
- * @lock: to modify the entities list.
  * @sched: the scheduler to which this rq belongs to.
- * @entities: list of the entities to be scheduled.
+ * @lock: protects the list, tree and current entity.
  * @current_entity: the entity which is to be scheduled.
+ * @entities: list of the entities to be scheduled.
  * @rb_tree_root: root of time based priory queue of entities for FIFO 
scheduling
  *
  * Run queue is a set of entities scheduling command submissions for
@@ -254,10 +254,12 @@ struct drm_sched_entity {
  * the next entity to emit commands from.
  */
 struct drm_sched_rq {
-   spinlock_t  lock;
struct drm_gpu_scheduler*sched;
-   struct list_headentities;
+
+   spinlock_t  lock;
+   /* Following members are protected by the @lock: */
struct drm_sched_entity *current_entity;
+   struct list_headentities;
struct rb_root_cached   rb_tree_root;
 };
 
-- 
2.46.0

[PATCH v2 0/8] DRM scheduler fixes, or not, or incorrect kind

2024-09-09 Thread Tvrtko Ursulin

From: Tvrtko Ursulin 

Re-spin of the series from two days ago with review feedback addressed and
some new patches added.

Changelog is in individual patches but essentially new patches are renames
and struct members re-ordering as discussed in v1, plus one more optimisation
when I noticed we can save another spinlock re-lock cycle this time on rq->lock.

Cc: Christian König 
Cc: Alex Deucher 
Cc: Luben Tuikov 
Cc: Matthew Brost 
Cc: Philipp Stanner 

Tvrtko Ursulin (8):
  drm/sched: Add locking to drm_sched_entity_modify_sched
  drm/sched: Always wake up correct scheduler in
drm_sched_entity_push_job
  drm/sched: Always increment correct scheduler score
  drm/sched: Optimise drm_sched_entity_push_job
  drm/sched: Stop setting current entity in FIFO mode
  drm/sched: Re-order struct drm_sched_rq members for clarity
  drm/sched: Re-group and rename the entity run-queue lock
  drm/sched: Further optimise drm_sched_entity_push_job

 drivers/gpu/drm/scheduler/sched_entity.c | 40 +++--
 drivers/gpu/drm/scheduler/sched_main.c   | 57 ++--
 include/drm/gpu_scheduler.h  | 31 +++--
 3 files changed, 77 insertions(+), 51 deletions(-)

-- 
2.46.0

[PATCH 4/8] drm/sched: Optimise drm_sched_entity_push_job

2024-09-09 Thread Tvrtko Ursulin

From: Tvrtko Ursulin 

In FIFO mode we can avoid dropping the lock only to immediately re-acquire
it by adding a new drm_sched_rq_update_fifo_locked() helper.

Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Cc: Alex Deucher 
Cc: Luben Tuikov 
Cc: Matthew Brost 
Cc: Philipp Stanner 
---
 drivers/gpu/drm/scheduler/sched_entity.c |  5 +++--
 drivers/gpu/drm/scheduler/sched_main.c   | 21 ++---
 include/drm/gpu_scheduler.h  |  1 +
 3 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c
index 6645a8524699..2da677681291 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -615,10 +615,11 @@ void drm_sched_entity_push_job(struct drm_sched_job 
*sched_job)
 
atomic_inc(sched->score);
drm_sched_rq_add_entity(rq, entity);
-   spin_unlock(&entity->rq_lock);
 
if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
-   drm_sched_rq_update_fifo(entity, submit_ts);
+   drm_sched_rq_update_fifo_locked(entity, submit_ts);
+
+   spin_unlock(&entity->rq_lock);
 
drm_sched_wakeup(sched, entity);
}
diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index ab53ab486fe6..10abbcefe9d8 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -163,14 +163,10 @@ static inline void drm_sched_rq_remove_fifo_locked(struct 
drm_sched_entity *enti
}
 }
 
-void drm_sched_rq_update_fifo(struct drm_sched_entity *entity, ktime_t ts)
+void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, ktime_t 
ts)
 {
-   /*
-* Both locks need to be grabbed, one to protect from entity->rq change
-* for entity from within concurrent drm_sched_entity_select_rq and the
-* other to update the rb tree structure.
-*/
-   spin_lock(&entity->rq_lock);
+   lockdep_assert_held(&entity->rq_lock);
+
spin_lock(&entity->rq->lock);
 
drm_sched_rq_remove_fifo_locked(entity);
@@ -181,6 +177,17 @@ void drm_sched_rq_update_fifo(struct drm_sched_entity 
*entity, ktime_t ts)
  drm_sched_entity_compare_before);
 
spin_unlock(&entity->rq->lock);
+}
+
+void drm_sched_rq_update_fifo(struct drm_sched_entity *entity, ktime_t ts)
+{
+   /*
+* Both locks need to be grabbed, one to protect from entity->rq change
+* for entity from within concurrent drm_sched_entity_select_rq and the
+* other to update the rb tree structure.
+*/
+   spin_lock(&entity->rq_lock);
+   drm_sched_rq_update_fifo_locked(entity, ts);
spin_unlock(&entity->rq_lock);
 }
 
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index fe8edb917360..a06753987d93 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -594,6 +594,7 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
struct drm_sched_entity *entity);
 
 void drm_sched_rq_update_fifo(struct drm_sched_entity *entity, ktime_t ts);
+void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, ktime_t 
ts);
 
 int drm_sched_entity_init(struct drm_sched_entity *entity,
  enum drm_sched_priority priority,
-- 
2.46.0
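
To make the saving in the patch above concrete, here is a userspace sketch of the "_locked helper plus locking wrapper" split. It assumes a hypothetical toy_lock that counts acquisitions in place of spinlock_t, and an assert() where the kernel would use lockdep_assert_held(). All toy_* names are invented for the example. The caller that already holds the lock calls the _locked variant directly and pays for one acquisition instead of two.

```c
#include <assert.h>

/* Hypothetical stand-in for spinlock_t that counts acquisitions. */
struct toy_lock {
	int held;
	unsigned int acquisitions;
};

static void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);
	l->held = 1;
	l->acquisitions++;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = 0;
}

struct toy_entity {
	struct toy_lock lock;
	int queued;
	long oldest_job_waiting;
};

/* Caller must hold e->lock; the assert plays the role of lockdep. */
static void toy_update_fifo_locked(struct toy_entity *e, long ts)
{
	assert(e->lock.held);
	e->oldest_job_waiting = ts;
}

/* Convenience wrapper for callers that do not hold the lock yet. */
static void toy_update_fifo(struct toy_entity *e, long ts)
{
	toy_lock_acquire(&e->lock);
	toy_update_fifo_locked(e, ts);
	toy_lock_release(&e->lock);
}

/* Old shape: unlock, then the wrapper immediately locks again. */
static void toy_push_job_relock(struct toy_entity *e, long ts)
{
	toy_lock_acquire(&e->lock);
	e->queued++;
	toy_lock_release(&e->lock);

	toy_update_fifo(e, ts);
}

/* New shape: stay inside one critical section. */
static void toy_push_job_single(struct toy_entity *e, long ts)
{
	toy_lock_acquire(&e->lock);
	e->queued++;
	toy_update_fifo_locked(e, ts);
	toy_lock_release(&e->lock);
}
```

Both push functions leave the entity in the same state; only the number of lock round trips differs, which is the whole point of the optimisation.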

[PATCH 5/8] drm/sched: Stop setting current entity in FIFO mode

2024-09-09 Thread Tvrtko Ursulin

From: Tvrtko Ursulin 

It does not seem there is a need to set the current entity in FIFO mode
since it only serves as a "cursor" in round-robin mode. Even if
scheduling mode is changed at runtime the change in behaviour is simply
to restart from the first entity, instead of continuing in RR mode from
where FIFO left it, and that sounds completely fine.

Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Cc: Alex Deucher 
Cc: Luben Tuikov 
Cc: Matthew Brost 
Cc: Philipp Stanner 
---
 drivers/gpu/drm/scheduler/sched_main.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index 10abbcefe9d8..54c5fe7a7d1d 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -356,7 +356,6 @@ drm_sched_rq_select_entity_fifo(struct drm_gpu_scheduler 
*sched,
return ERR_PTR(-ENOSPC);
}

-   rq->current_entity = entity;
reinit_completion(&entity->entity_idle);
break;
}
-- 
2.46.0
