Re: [PATCH v6 5/5] drm/amdgpu: track bo memory stats at runtime
On 07/11/2024 14:24, Li, Yunxiang (Teddy) wrote: [Public] From: Tvrtko Ursulin Sent: Thursday, November 7, 2024 5:48 On 31/10/2024 13:48, Li, Yunxiang (Teddy) wrote: [Public] From: Christian König Sent: Thursday, October 31, 2024 8:54 Am 25.10.24 um 19:41 schrieb Yunxiang Li: Before, every time fdinfo is queried we try to lock all the BOs in the VM and calculate memory usage from scratch. This works okay if the fdinfo is rarely read and the VMs don't have a ton of BOs. If either of these conditions is not true, we get a massive performance hit. In this new revision, we track the BOs as they change states. This way when the fdinfo is queried we only need to take the status lock and copy out the usage stats with minimal impact to the runtime performance. Signed-off-by: Yunxiang Li --- drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 14 +- drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c | 10 +- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 107 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 5 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 189 +++--- -- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 12 +- drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 1 + 8 files changed, 199 insertions(+), 141 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c index b144404902255..1d8a0ff3c8604 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c @@ -36,6 +36,7 @@ #include "amdgpu_gem.h" #include "amdgpu_dma_buf.h" #include "amdgpu_xgmi.h" +#include "amdgpu_vm.h" #include #include #include @@ -190,6 +191,13 @@ static void amdgpu_dma_buf_unmap(struct dma_buf_attachment *attach, } } +static void amdgpu_dma_buf_release(struct dma_buf *buf) { + struct amdgpu_bo *bo = gem_to_amdgpu_bo(buf->priv); + amdgpu_vm_bo_update_shared(bo, -1); + drm_gem_dmabuf_release(buf); Please run checkpatch.pl on the patch. 
As far as I can see it would complain about the coding style here (empty line between declaration and code). Not much of an issue but we would like to prevent upstream from complaining about such things. Will do +} + /** * amdgpu_dma_buf_begin_cpu_access - &dma_buf_ops.begin_cpu_access implementation * @dma_buf: Shared DMA buffer @@ -237,7 +245,7 @@ const struct dma_buf_ops amdgpu_dmabuf_ops = { .unpin = amdgpu_dma_buf_unpin, .map_dma_buf = amdgpu_dma_buf_map, .unmap_dma_buf = amdgpu_dma_buf_unmap, - .release = drm_gem_dmabuf_release, + .release = amdgpu_dma_buf_release, .begin_cpu_access = amdgpu_dma_buf_begin_cpu_access, .mmap = drm_gem_dmabuf_mmap, .vmap = drm_gem_dmabuf_vmap, @@ -265,8 +273,10 @@ struct dma_buf *amdgpu_gem_prime_export(struct drm_gem_object *gobj, return ERR_PTR(-EPERM); buf = drm_gem_prime_export(gobj, flags); - if (!IS_ERR(buf)) + if (!IS_ERR(buf)) { buf->ops = &amdgpu_dmabuf_ops; + amdgpu_vm_bo_update_shared(bo, +1); + } return buf; } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c index 7a9573958d87c..e0e09f7b39d10 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c @@ -60,7 +60,7 @@ void amdgpu_show_fdinfo(struct drm_printer *p, struct drm_file *file) struct amdgpu_fpriv *fpriv = file->driver_priv; struct amdgpu_vm *vm = &fpriv->vm; - struct amdgpu_mem_stats stats[__AMDGPU_PL_LAST + 1] = { }; + struct amdgpu_mem_stats stats[__AMDGPU_PL_LAST] = { }; ktime_t usage[AMDGPU_HW_IP_NUM]; const char *pl_name[] = { [TTM_PL_VRAM] = "vram", @@ -70,13 +70,7 @@ void amdgpu_show_fdinfo(struct drm_printer *p, struct drm_file *file) unsigned int hw_ip, i; int ret; - ret = amdgpu_bo_reserve(vm->root.bo, false); - if (ret) - return; - - amdgpu_vm_get_memory(vm, stats, ARRAY_SIZE(stats)); - amdgpu_bo_unreserve(vm->root.bo); - + amdgpu_vm_get_memory(vm, stats); amdgpu_ctx_mgr_usage(&fpriv->ctx_mgr, usage); /* diff --git 
a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index 2436b7c9ad12b..98563124ff99c 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -1156,7 +1156,7 @@ void amdgpu_bo_move_notify(struct ttm_buffer_object *bo, return; abo = ttm_to_amdgpu_bo(bo); - amdgpu_vm_bo_invalidate(abo, evict); + amdgpu_vm_bo_move(abo, new_mem, evict); amdgpu_bo_kunmap(abo); @@ -1169,86 +1169,6 @@ void amdgpu_bo_move_notify(struct ttm_buffer_object *bo,
Re: [PATCH 1/3] dma-buf/dma-fence_array: use kvzalloc
On 28/10/2024 10:34, Christian König wrote: Am 25.10.24 um 11:05 schrieb Tvrtko Ursulin: On 25/10/2024 09:59, Tvrtko Ursulin wrote: On 24/10/2024 13:41, Christian König wrote: Reports indicates that some userspace applications try to merge more than 80k of fences into a single dma_fence_array leading to a warning from kzalloc() that the requested size becomes to big. While that is clearly an userspace bug we should probably handle that case gracefully in the kernel. So we can either reject requests to merge more than a reasonable amount of fences (64k maybe?) or we can start to use kvzalloc() instead of kzalloc(). This patch here does the later. Rejecting would potentially be safer, otherwise there is a path for userspace to trigger a warn in kvmalloc_node (see 0829b5bcdd3b ("drm/i915: 2 GiB of relocations ought to be enough for anybody*")) and spam dmesg at will. Actually that is a WARN_ON_*ONCE* there so maybe not so critical to invent a limit. Up for discussion I suppose. Regards, Tvrtko Question is what limit to set... That's one of the reasons why I opted for kvzalloc() initially. I didn't get that, what was the reason? To not have to invent an arbitrary limit? I mean we could use some nice round number like 65536, but that would be totally arbitrary. Yeah.. Set an arbitrary limit so a warning in __kvmalloc_node_noprof() is avoided? Or pass __GFP_NOWARN? Any comments on the other two patches? I need to get them upstream. Will look into them shortly. Regards, Tvrtko Thanks, Christian. 
Regards, Tvrtko Signed-off-by: Christian König CC: sta...@vger.kernel.org --- drivers/dma-buf/dma-fence-array.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/dma-buf/dma-fence-array.c b/drivers/dma-buf/dma-fence-array.c index 8a08ffde31e7..46ac42bcfac0 100644 --- a/drivers/dma-buf/dma-fence-array.c +++ b/drivers/dma-buf/dma-fence-array.c @@ -119,8 +119,8 @@ static void dma_fence_array_release(struct dma_fence *fence) for (i = 0; i < array->num_fences; ++i) dma_fence_put(array->fences[i]); - kfree(array->fences); - dma_fence_free(fence); + kvfree(array->fences); + kvfree_rcu(fence, rcu); } static void dma_fence_array_set_deadline(struct dma_fence *fence, @@ -153,7 +153,7 @@ struct dma_fence_array *dma_fence_array_alloc(int num_fences) { struct dma_fence_array *array; - return kzalloc(struct_size(array, callbacks, num_fences), GFP_KERNEL); + return kvzalloc(struct_size(array, callbacks, num_fences), GFP_KERNEL); } EXPORT_SYMBOL(dma_fence_array_alloc);
Re: [PATCH 1/3] dma-buf/dma-fence_array: use kvzalloc
On 07/11/2024 12:48, Christian König wrote: Am 07.11.24 um 12:29 schrieb Tvrtko Ursulin: On 28/10/2024 10:34, Christian König wrote: Am 25.10.24 um 11:05 schrieb Tvrtko Ursulin: On 25/10/2024 09:59, Tvrtko Ursulin wrote: On 24/10/2024 13:41, Christian König wrote: Reports indicates that some userspace applications try to merge more than 80k of fences into a single dma_fence_array leading to a warning from kzalloc() that the requested size becomes to big. While that is clearly an userspace bug we should probably handle that case gracefully in the kernel. So we can either reject requests to merge more than a reasonable amount of fences (64k maybe?) or we can start to use kvzalloc() instead of kzalloc(). This patch here does the later. Rejecting would potentially be safer, otherwise there is a path for userspace to trigger a warn in kvmalloc_node (see 0829b5bcdd3b ("drm/i915: 2 GiB of relocations ought to be enough for anybody*")) and spam dmesg at will. Actually that is a WARN_ON_*ONCE* there so maybe not so critical to invent a limit. Up for discussion I suppose. Regards, Tvrtko Question is what limit to set... That's one of the reasons why I opted for kvzalloc() initially. I didn't get that, what was the reason? To not have to invent an arbitrary limit? Well that I couldn't come up with any arbitrary limit that I had confidence would work and not block real world use cases. Switching to kvzalloc() just seemed the more defensive approach. Yeah it is. I mean we could use some nice round number like 65536, but that would be totally arbitrary. Yeah.. Set an arbitrary limit so a warning in __kvmalloc_node_noprof() is avoided? Or pass __GFP_NOWARN? Well are we sure that will never hit 65536 in a real world use case? It's still pretty low. Ah no, I did not express myself clearly. I did not mean 64k, but a limit to align with INT_MAX __kvmalloc_node_noprof(). Or __GFP_NOWARN might be better when allocation size is userspace controlled. 
Regards, Tvrtko Any comments on the other two patches? I need to get them upstream. Will look into them shortly. Thanks, Christian. Regards, Tvrtko Thanks, Christian. Regards, Tvrtko Signed-off-by: Christian König CC: sta...@vger.kernel.org --- drivers/dma-buf/dma-fence-array.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/dma-buf/dma-fence-array.c b/drivers/dma-buf/dma-fence-array.c index 8a08ffde31e7..46ac42bcfac0 100644 --- a/drivers/dma-buf/dma-fence-array.c +++ b/drivers/dma-buf/dma-fence-array.c @@ -119,8 +119,8 @@ static void dma_fence_array_release(struct dma_fence *fence) for (i = 0; i < array->num_fences; ++i) dma_fence_put(array->fences[i]); - kfree(array->fences); - dma_fence_free(fence); + kvfree(array->fences); + kvfree_rcu(fence, rcu); } static void dma_fence_array_set_deadline(struct dma_fence *fence, @@ -153,7 +153,7 @@ struct dma_fence_array *dma_fence_array_alloc(int num_fences) { struct dma_fence_array *array; - return kzalloc(struct_size(array, callbacks, num_fences), GFP_KERNEL); + return kvzalloc(struct_size(array, callbacks, num_fences), GFP_KERNEL); } EXPORT_SYMBOL(dma_fence_array_alloc);
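The idea of aligning a limit with the INT_MAX check mentioned for __kvmalloc_node_noprof() can be illustrated with a small userspace sketch. Everything here is an assumption for illustration: the struct layouts, the `fake_` names and the element sizes are hypothetical stand-ins and do not match the real dma_fence_array; only the shape of the check (saturating size computation, then comparison against INT_MAX) reflects what the thread discusses.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header and per-fence element; the real structs differ. */
struct fake_array_hdr { uint64_t pad[8]; };
struct fake_cb { uint64_t pad[4]; };

static_assert(sizeof(struct fake_cb) != 0,
	      "element size must be non-zero for the division below");

/* header + n * element, saturating at SIZE_MAX in the spirit of struct_size(). */
static size_t fake_struct_size(size_t n)
{
	if (n > (SIZE_MAX - sizeof(struct fake_array_hdr)) / sizeof(struct fake_cb))
		return SIZE_MAX;
	return sizeof(struct fake_array_hdr) + n * sizeof(struct fake_cb);
}

/* Reject requests whose byte size would exceed INT_MAX before allocating. */
static bool fake_alloc_size_ok(size_t n)
{
	return fake_struct_size(n) <= (size_t)INT_MAX;
}
```

With a check of this kind, the roughly 80k fences from the report would still be accepted, while only sizes that the allocator would refuse anyway are rejected up front. Passing __GFP_NOWARN, the other option raised in the thread, avoids the warning without a separate check.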
Re: [PATCH 2/3] dma-buf: sort fences in dma_fence_unwrap_merge
On 24/10/2024 13:41, Christian König wrote: The merge function initially handled only individual fences and arrays which in turn were created by the merge function. This allowed to create the new array by a simple merge sort based on the fence context number. The problem is now that since the addition of timeline sync objects userspace can create chain containers in basically any fence context order. If those are merged together it can happen that we create really large arrays since the merge sort algorithm doesn't work any more. So put an insert sort behind the merge sort which kicks in when the input fences are not in the expected order. This isn't as efficient as a heap sort, but has better properties for the most common use case. Signed-off-by: Christian König --- drivers/dma-buf/dma-fence-unwrap.c | 39 ++ 1 file changed, 34 insertions(+), 5 deletions(-) diff --git a/drivers/dma-buf/dma-fence-unwrap.c b/drivers/dma-buf/dma-fence-unwrap.c index 628af51c81af..d9aa280d9ff6 100644 --- a/drivers/dma-buf/dma-fence-unwrap.c +++ b/drivers/dma-buf/dma-fence-unwrap.c @@ -106,7 +106,7 @@ struct dma_fence *__dma_fence_unwrap_merge(unsigned int num_fences, fences[i] = dma_fence_unwrap_first(fences[i], &iter[i]); count = 0; - do { + while (true) { unsigned int sel; restart: @@ -144,11 +144,40 @@ struct dma_fence *__dma_fence_unwrap_merge(unsigned int num_fences, } } - if (tmp) { - array[count++] = dma_fence_get(tmp); - fences[sel] = dma_fence_unwrap_next(&iter[sel]); + if (!tmp) + break; + + /* +* We could use a binary search here, but since the assumption +* is that the main input are already sorted dma_fence_arrays +* just looking from end has a higher chance of finding the +* right location on the first try +*/ + + for (i = count; i--;) { + if (likely(array[i]->context < tmp->context)) + break; + + if (array[i]->context == tmp->context) { + if (dma_fence_is_later(tmp, array[i])) { + dma_fence_put(array[i]); + array[i] = dma_fence_get(tmp); + } + fences[sel] = 
dma_fence_unwrap_next(&iter[sel]);
+				goto restart;
+			}
 		}
-	} while (tmp);
+
+		++i;
+		/*
+		 * Make room for the fence, this should be a nop most of the
+		 * time.
+		 */
+		memcpy(&array[i + 1], &array[i], (count - i) * sizeof(*array));
+		array[i] = dma_fence_get(tmp);
+		fences[sel] = dma_fence_unwrap_next(&iter[sel]);
+		count++;

Having ventured into this function for the first time, I can say that this is some smart code which is not easy to grasp. It could definitely benefit from a high level comment before the do-while loop to explain what it is going to do. I also wonder if the next and tmp local variables could be renamed to something more descriptive.

And the algorithmic complexity of the end result, given the multiple loops and gotos, I have no idea what it could be.

Has a dumb solution been considered, like a two-pass with a pessimistically allocated fence array? Like:

1) Populate array with all unsignalled unwrapped fences. (O(count))

2) Bog standard include/linux/sort.h by context and seqno. (O(count*log(count)))

3) Walk array and squash same context to latest fence. (Before this patch that wasn't there, right?) (O(count)) (Overwrite in place, no memcpy needed.)

Algorithmic complexity of that would be obvious and code much simpler.

Regards,

Tvrtko

+	};

 	if (count == 0) {
 		tmp = dma_fence_allocate_private_stub(ktime_get());
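The three-step alternative suggested in the review (collect, sort by context and seqno, squash to the latest fence per context) could look roughly like the following userspace sketch. This is not the kernel implementation: `struct fake_fence`, `fence_cmp()` and `sort_and_squash()` are hypothetical names, qsort() stands in for include/linux/sort.h, plain seqno comparison stands in for dma_fence_is_later(), and reference counting (dma_fence_get()/dma_fence_put()) is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct dma_fence: only the fields the sort needs. */
struct fake_fence {
	uint64_t context;
	uint64_t seqno;
};

/* Order by context, then by seqno, so the latest fence of a context comes last. */
static int fence_cmp(const void *a, const void *b)
{
	const struct fake_fence *fa = a, *fb = b;

	if (fa->context != fb->context)
		return fa->context < fb->context ? -1 : 1;
	if (fa->seqno != fb->seqno)
		return fa->seqno < fb->seqno ? -1 : 1;
	return 0;
}

/*
 * Steps 2 and 3: sort the already collected fences, then squash each
 * context down to its latest fence, overwriting in place.
 * Returns the number of fences kept.
 */
static size_t sort_and_squash(struct fake_fence *fences, size_t count)
{
	size_t i, out = 0;

	assert(fences || count == 0);
	if (count == 0)
		return 0;

	qsort(fences, count, sizeof(*fences), fence_cmp);

	for (i = 1; i < count; i++) {
		if (fences[i].context == fences[out].context)
			fences[out] = fences[i];	/* same context: later seqno wins */
		else
			fences[++out] = fences[i];	/* new context: keep it */
	}
	return out + 1;
}
```

The cost is dominated by the sort, so the overall complexity is O(count*log(count)) as stated in the review, and the squash pass needs no memcpy because it only ever writes at or before the read position.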
Re: [PATCH v6 4/5] drm: add drm_memory_stats_is_zero
On 07/11/2024 14:17, Li, Yunxiang (Teddy) wrote: [AMD Official Use Only - AMD Internal Distribution Only] From: Tvrtko Ursulin Sent: Thursday, November 7, 2024 5:41 On 25/10/2024 18:41, Yunxiang Li wrote: Add a helper to check if the memory stats is zero, this will be used to check for memory accounting errors. Signed-off-by: Yunxiang Li --- drivers/gpu/drm/drm_file.c | 9 + include/drm/drm_file.h | 1 + 2 files changed, 10 insertions(+) diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c index 714e42b051080..75ed701d80f74 100644 --- a/drivers/gpu/drm/drm_file.c +++ b/drivers/gpu/drm/drm_file.c @@ -859,6 +859,15 @@ static void print_size(struct drm_printer *p, const char *stat, drm_printf(p, "drm-%s-%s:\t%llu%s\n", stat, region, sz, units[u]); } +int drm_memory_stats_is_zero(const struct drm_memory_stats *stats) { + return (stats->shared == 0 && + stats->private == 0 && + stats->resident == 0 && + stats->purgeable == 0 && + stats->active == 0); +} Could use mem_is_zero() for some value of source/binary compactness. Yeah, the patch set started out with that when it's just a function in amdgpu, but Christ didn't like it. Okay, I don't feel so strongly about the implementation details. +EXPORT_SYMBOL(drm_memory_stats_is_zero); + I am not a huge fan of adding this as an interface as the only caller appears to be a sanity check in amdgpu_vm_fini(): if (!amdgpu_vm_stats_is_zero(vm)) dev_err(adev->dev, "VM memory stats is non-zero when fini\n"); But I guess there is some value in sanity checking since amdgpu does not have a notion of debug only code (compiled at production and exercised via a test suite). I do suggest to demote the dev_err to notice log level would suffice and be more accurate. I think it's very important to have a check like this when we have a known invariant, especially in this case where there's stat tracking code spread out everywhere and we have very little chance of catching a bug right when it happened. 
And since whenever this check fails we know for sure there is a bug, I don't see the harm of keeping it as an error.

It would indeed be a programming error if it can happen, but from the point of view of a driver and system log I think a warning is actually right.

Regards,

Tvrtko

Now that I think about it, I probably want to have the process & task name in here to aid in reproduction.

Teddy
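The two implementation styles discussed for the zero check, comparing field by field versus treating the struct as a block of memory, can be compared in a small userspace sketch. The struct below only mirrors the field names from the patch; `struct fake_memory_stats` and both function names are hypothetical, and memcmp() against a zeroed instance stands in for the kernel's mem_is_zero().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Mirrors the fields named in the patch; not the real struct drm_memory_stats. */
struct fake_memory_stats {
	uint64_t shared;
	uint64_t private;
	uint64_t resident;
	uint64_t purgeable;
	uint64_t active;
};

/* Field-by-field check, the form used in the posted patch. */
static bool stats_is_zero_fields(const struct fake_memory_stats *s)
{
	assert(s);
	return s->shared == 0 && s->private == 0 && s->resident == 0 &&
	       s->purgeable == 0 && s->active == 0;
}

/*
 * Byte-wise check. Equivalent here only because the struct consists of
 * same-sized integers and therefore has no padding bytes.
 */
static bool stats_is_zero_bytes(const struct fake_memory_stats *s)
{
	static const struct fake_memory_stats zero;

	assert(s);
	return memcmp(s, &zero, sizeof(zero)) == 0;
}
```

The byte-wise form is more compact and picks up newly added fields automatically, while the explicit form stays correct even if padding or non-counter fields are added later.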
Re: [PATCH v6 5/5] drm/amdgpu: track bo memory stats at runtime
On 31/10/2024 13:48, Li, Yunxiang (Teddy) wrote: [Public] From: Christian König Sent: Thursday, October 31, 2024 8:54 Am 25.10.24 um 19:41 schrieb Yunxiang Li: Before, every time fdinfo is queried we try to lock all the BOs in the VM and calculate memory usage from scratch. This works okay if the fdinfo is rarely read and the VMs don't have a ton of BOs. If either of these conditions is not true, we get a massive performance hit. In this new revision, we track the BOs as they change states. This way when the fdinfo is queried we only need to take the status lock and copy out the usage stats with minimal impact to the runtime performance. Signed-off-by: Yunxiang Li --- drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 14 +- drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c | 10 +- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 107 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 5 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 189 +++- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 12 +- drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 1 + 8 files changed, 199 insertions(+), 141 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c index b144404902255..1d8a0ff3c8604 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c @@ -36,6 +36,7 @@ #include "amdgpu_gem.h" #include "amdgpu_dma_buf.h" #include "amdgpu_xgmi.h" +#include "amdgpu_vm.h" #include #include #include @@ -190,6 +191,13 @@ static void amdgpu_dma_buf_unmap(struct dma_buf_attachment *attach, } } +static void amdgpu_dma_buf_release(struct dma_buf *buf) { + struct amdgpu_bo *bo = gem_to_amdgpu_bo(buf->priv); + amdgpu_vm_bo_update_shared(bo, -1); + drm_gem_dmabuf_release(buf); Please run checkpatch.pl on the patch. As far as I can see it would complain about the coding style here (empty line between declaration and code). 
Not much of an issue but we would like to prevent upstream from complaining about such things. Will do +} + /** * amdgpu_dma_buf_begin_cpu_access - &dma_buf_ops.begin_cpu_access implementation * @dma_buf: Shared DMA buffer @@ -237,7 +245,7 @@ const struct dma_buf_ops amdgpu_dmabuf_ops = { .unpin = amdgpu_dma_buf_unpin, .map_dma_buf = amdgpu_dma_buf_map, .unmap_dma_buf = amdgpu_dma_buf_unmap, - .release = drm_gem_dmabuf_release, + .release = amdgpu_dma_buf_release, .begin_cpu_access = amdgpu_dma_buf_begin_cpu_access, .mmap = drm_gem_dmabuf_mmap, .vmap = drm_gem_dmabuf_vmap, @@ -265,8 +273,10 @@ struct dma_buf *amdgpu_gem_prime_export(struct drm_gem_object *gobj, return ERR_PTR(-EPERM); buf = drm_gem_prime_export(gobj, flags); - if (!IS_ERR(buf)) + if (!IS_ERR(buf)) { buf->ops = &amdgpu_dmabuf_ops; + amdgpu_vm_bo_update_shared(bo, +1); + } return buf; } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c index 7a9573958d87c..e0e09f7b39d10 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c @@ -60,7 +60,7 @@ void amdgpu_show_fdinfo(struct drm_printer *p, struct drm_file *file) struct amdgpu_fpriv *fpriv = file->driver_priv; struct amdgpu_vm *vm = &fpriv->vm; - struct amdgpu_mem_stats stats[__AMDGPU_PL_LAST + 1] = { }; + struct amdgpu_mem_stats stats[__AMDGPU_PL_LAST] = { }; ktime_t usage[AMDGPU_HW_IP_NUM]; const char *pl_name[] = { [TTM_PL_VRAM] = "vram", @@ -70,13 +70,7 @@ void amdgpu_show_fdinfo(struct drm_printer *p, struct drm_file *file) unsigned int hw_ip, i; int ret; - ret = amdgpu_bo_reserve(vm->root.bo, false); - if (ret) - return; - - amdgpu_vm_get_memory(vm, stats, ARRAY_SIZE(stats)); - amdgpu_bo_unreserve(vm->root.bo); - + amdgpu_vm_get_memory(vm, stats); amdgpu_ctx_mgr_usage(&fpriv->ctx_mgr, usage); /* diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index 2436b7c9ad12b..98563124ff99c 100644 --- 
a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -1156,7 +1156,7 @@ void amdgpu_bo_move_notify(struct ttm_buffer_object *bo, return; abo = ttm_to_amdgpu_bo(bo); - amdgpu_vm_bo_invalidate(abo, evict); + amdgpu_vm_bo_move(abo, new_mem, evict); amdgpu_bo_kunmap(abo); @@ -1169,86 +1169,6 @@ void amdgpu_bo_move_notify(struct ttm_buffer_object *bo, old_mem ? old_mem->mem_type : -1); } -void amdgpu_bo_get_memory(struct amdgpu_bo *bo, - struct amdgpu_mem_stats *stats, - unsigned int sz) -{ - const unsigned int domain_to_pl[] = { - [ilog2(AMDGPU_GEM_DOMAIN_CPU)] = TTM_PL_
Re: [PATCH v6 4/5] drm: add drm_memory_stats_is_zero
On 25/10/2024 18:41, Yunxiang Li wrote:

Add a helper to check if the memory stats is zero, this will be used to check for memory accounting errors.

Signed-off-by: Yunxiang Li
---
 drivers/gpu/drm/drm_file.c | 9 +
 include/drm/drm_file.h | 1 +
 2 files changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
index 714e42b051080..75ed701d80f74 100644
--- a/drivers/gpu/drm/drm_file.c
+++ b/drivers/gpu/drm/drm_file.c
@@ -859,6 +859,15 @@ static void print_size(struct drm_printer *p, const char *stat,
 	drm_printf(p, "drm-%s-%s:\t%llu%s\n", stat, region, sz, units[u]);
 }

+int drm_memory_stats_is_zero(const struct drm_memory_stats *stats) {
+	return (stats->shared == 0 &&
+		stats->private == 0 &&
+		stats->resident == 0 &&
+		stats->purgeable == 0 &&
+		stats->active == 0);
+}

Could use mem_is_zero() for some value of source/binary compactness.

+EXPORT_SYMBOL(drm_memory_stats_is_zero);
+

I am not a huge fan of adding this as an interface as the only caller appears to be a sanity check in amdgpu_vm_fini():

	if (!amdgpu_vm_stats_is_zero(vm))
		dev_err(adev->dev, "VM memory stats is non-zero when fini\n");

But I guess there is some value in sanity checking since amdgpu does not have a notion of debug only code (compiled at production and exercised via a test suite).

I do suggest demoting the dev_err to notice log level, which would suffice and be more accurate.

Regards,

Tvrtko

 /**
  * drm_print_memory_stats - A helper to print memory stats
  * @p: The printer to print output to
diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
index ab230d3af138d..7f91e35d027d9 100644
--- a/include/drm/drm_file.h
+++ b/include/drm/drm_file.h
@@ -477,6 +477,7 @@ struct drm_memory_stats {

 enum drm_gem_object_status;

+int drm_memory_stats_is_zero(const struct drm_memory_stats *stats);
 void drm_print_memory_stats(struct drm_printer *p,
 			    const struct drm_memory_stats *stats,
 			    enum drm_gem_object_status supported_status,
Re: [PATCH 1/3] dma-buf/dma-fence_array: use kvzalloc
On 25/10/2024 09:59, Tvrtko Ursulin wrote: On 24/10/2024 13:41, Christian König wrote: Reports indicates that some userspace applications try to merge more than 80k of fences into a single dma_fence_array leading to a warning from kzalloc() that the requested size becomes to big. While that is clearly an userspace bug we should probably handle that case gracefully in the kernel. So we can either reject requests to merge more than a reasonable amount of fences (64k maybe?) or we can start to use kvzalloc() instead of kzalloc(). This patch here does the later. Rejecting would potentially be safer, otherwise there is a path for userspace to trigger a warn in kvmalloc_node (see 0829b5bcdd3b ("drm/i915: 2 GiB of relocations ought to be enough for anybody*")) and spam dmesg at will. Actually that is a WARN_ON_*ONCE* there so maybe not so critical to invent a limit. Up for discussion I suppose. Regards, Tvrtko Question is what limit to set... Regards, Tvrtko Signed-off-by: Christian König CC: sta...@vger.kernel.org --- drivers/dma-buf/dma-fence-array.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/dma-buf/dma-fence-array.c b/drivers/dma-buf/dma-fence-array.c index 8a08ffde31e7..46ac42bcfac0 100644 --- a/drivers/dma-buf/dma-fence-array.c +++ b/drivers/dma-buf/dma-fence-array.c @@ -119,8 +119,8 @@ static void dma_fence_array_release(struct dma_fence *fence) for (i = 0; i < array->num_fences; ++i) dma_fence_put(array->fences[i]); - kfree(array->fences); - dma_fence_free(fence); + kvfree(array->fences); + kvfree_rcu(fence, rcu); } static void dma_fence_array_set_deadline(struct dma_fence *fence, @@ -153,7 +153,7 @@ struct dma_fence_array *dma_fence_array_alloc(int num_fences) { struct dma_fence_array *array; - return kzalloc(struct_size(array, callbacks, num_fences), GFP_KERNEL); + return kvzalloc(struct_size(array, callbacks, num_fences), GFP_KERNEL); } EXPORT_SYMBOL(dma_fence_array_alloc);
Re: [PATCH 1/3] dma-buf/dma-fence_array: use kvzalloc
On 24/10/2024 13:41, Christian König wrote:

Reports indicate that some userspace applications try to merge more than 80k of fences into a single dma_fence_array, leading to a warning from kzalloc() that the requested size becomes too big.

While that is clearly a userspace bug we should probably handle that case gracefully in the kernel.

So we can either reject requests to merge more than a reasonable amount of fences (64k maybe?) or we can start to use kvzalloc() instead of kzalloc(). This patch here does the latter.

Rejecting would potentially be safer, otherwise there is a path for userspace to trigger a warn in kvmalloc_node (see 0829b5bcdd3b ("drm/i915: 2 GiB of relocations ought to be enough for anybody*")) and spam dmesg at will.

Question is what limit to set...

Regards,

Tvrtko

Signed-off-by: Christian König
CC: sta...@vger.kernel.org
---
 drivers/dma-buf/dma-fence-array.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/dma-buf/dma-fence-array.c b/drivers/dma-buf/dma-fence-array.c
index 8a08ffde31e7..46ac42bcfac0 100644
--- a/drivers/dma-buf/dma-fence-array.c
+++ b/drivers/dma-buf/dma-fence-array.c
@@ -119,8 +119,8 @@ static void dma_fence_array_release(struct dma_fence *fence)
 	for (i = 0; i < array->num_fences; ++i)
 		dma_fence_put(array->fences[i]);

-	kfree(array->fences);
-	dma_fence_free(fence);
+	kvfree(array->fences);
+	kvfree_rcu(fence, rcu);
 }

 static void dma_fence_array_set_deadline(struct dma_fence *fence,
@@ -153,7 +153,7 @@ struct dma_fence_array *dma_fence_array_alloc(int num_fences)
 {
 	struct dma_fence_array *array;

-	return kzalloc(struct_size(array, callbacks, num_fences), GFP_KERNEL);
+	return kvzalloc(struct_size(array, callbacks, num_fences), GFP_KERNEL);
 }
 EXPORT_SYMBOL(dma_fence_array_alloc);
Re: [PATCH v5 4/4] drm/amdgpu: track bo memory stats at runtime
On 23/10/2024 13:56, Christian König wrote:

Am 23.10.24 um 14:24 schrieb Tvrtko Ursulin:

[SNIP]

To fold or not the special placements (GWS, GDS & co) is also tangential. In my patch I just preserved the legacy behaviour so it can easily be tweaked on top.

Yeah, but again the original behavior is completely broken. GWS, GDS and OA are counted in blocks of HW units (multiplied by PAGE_SIZE IIRC to avoid some GEM&TTM warnings). When you accumulate that anywhere in the memory stats then that is just completely off.

Ooops. :) Are they backed by some memory though, be it system or VRAM?

GDS is an internal 4 or 64KiB memory block which is only valid while shaders are running. It is used to communicate stuff between different shader stages and not even CPU accessible.

GWS and OA are not even memory, those are just HW blocks which implement a fixed function. IIRC most HW generation have 16 of each and when setting up the application virtual address space you can specify how many will be used by the application.

I see, thank you! Though I could have bothered to look in the code or even instrument at runtime too. I agree removing it from system is correct. If wanted and/or desirable some or all could be exported as different memory regions even. DRM fdinfo specs already allows that. Like:

drm-total-vram: ...
drm-total-gds: ...
drm-total-oa: ...

Etc.

Regards,

Tvrtko
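As a rough illustration of exporting the special placements as separate regions, the sketch below formats one per-region key in the style of the example keys above. The helper name `format_region_total()` and the fixed KiB unit are assumptions made for this example; the real DRM helper chooses units itself and prints through a struct drm_printer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format one "drm-total-<region>" line, in KiB.
 * Returns what snprintf() returns, so callers can detect truncation.
 */
static int format_region_total(char *buf, size_t len, const char *region,
			       unsigned long long kib)
{
	assert(buf && region);
	return snprintf(buf, len, "drm-total-%s:\t%llu KiB\n", region, kib);
}
```

Calling it once per region ("vram", "gds", "oa", and so on) would yield one key per region, so units that are not memory never get summed into the system or VRAM totals.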
Re: [PATCH v5 4/4] drm/amdgpu: track bo memory stats at runtime
On 23/10/2024 14:31, Li, Yunxiang (Teddy) wrote: [AMD Official Use Only - AMD Internal Distribution Only] From: Tvrtko Ursulin Sent: Wednesday, October 23, 2024 8:25 On 23/10/2024 13:12, Christian König wrote: Am 23.10.24 um 13:37 schrieb Tvrtko Ursulin: On 23/10/2024 10:14, Christian König wrote: Am 23.10.24 um 09:38 schrieb Tvrtko Ursulin: On 22/10/2024 17:24, Christian König wrote: Am 22.10.24 um 17:17 schrieb Li, Yunxiang (Teddy): [Public] +static uint32_t fold_memtype(uint32_t memtype) { In general please add prefixes to even static functions, e.g. amdgpu_vm_ or amdgpu_bo_. + /* Squash private placements into 'cpu' to keep the legacy userspace view. */ + switch (mem_type) { + case TTM_PL_VRAM: + case TTM_PL_TT: + return memtype + default: + return TTM_PL_SYSTEM; + } +} + +static uint32_t bo_get_memtype(struct amdgpu_bo *bo) { That whole function belongs into amdgpu_bo.c Do you mean bo_get_memtype or fold_memtype? I debated whether bo_get_memtype should go into amdgpu_vm.c or amdgpu_bo.c, and since it's using fold_memtype and only useful for memory stats because of folding the private placements I just left them here together with the other mem stats code. I can move it to amdgpu_bo.c make it return the memtype verbatim and just fold it when I do the accounting. I think that folding GDS, GWS and OA into system is also a bug. We should really not doing that. 
Just wanted to point out for this round that the code to query the current placement from a BO should probably go into amdgpu_bo.c and not amdgpu_vm.c + struct ttm_resource *res = bo->tbo.resource; + const uint32_t domain_to_pl[] = { + [ilog2(AMDGPU_GEM_DOMAIN_CPU)] = +TTM_PL_SYSTEM, + [ilog2(AMDGPU_GEM_DOMAIN_GTT)] = TTM_PL_TT, + [ilog2(AMDGPU_GEM_DOMAIN_VRAM)] = TTM_PL_VRAM, + [ilog2(AMDGPU_GEM_DOMAIN_GDS)] = +AMDGPU_PL_GDS, + [ilog2(AMDGPU_GEM_DOMAIN_GWS)] = +AMDGPU_PL_GWS, + [ilog2(AMDGPU_GEM_DOMAIN_OA)] = AMDGPU_PL_OA, + [ilog2(AMDGPU_GEM_DOMAIN_DOORBELL)] = AMDGPU_PL_DOORBELL, + }; + uint32_t domain; + + if (res) + return fold_memtype(res->mem_type); + + /* +* If no backing store use one of the preferred domain for basic +* stats. We take the MSB since that should give a +reasonable +* view. +*/ + BUILD_BUG_ON(TTM_PL_VRAM < TTM_PL_TT || TTM_PL_VRAM < TTM_PL_SYSTEM); + domain = fls(bo->preferred_domains & +AMDGPU_GEM_DOMAIN_MASK); + if (drm_WARN_ON_ONCE(&adev->ddev, +domain == 0 || --domain >= ARRAY_SIZE(domain_to_pl))) It's perfectly legal to create a BO without a placement. That one just won't have a backing store. This is lifted from the previous change I'm rebasing onto. I think what it’s trying to do is if the BO doesn't have a placement, use the "biggest" (VRAM > TT > SYSTEM) preferred placement for the purpose of accounting. Previously we just ignore BOs that doesn't have a placement. I guess there's argument for going with either approaches. I was not arguing, I'm simply pointing out a bug. It's perfectly valid for bo->preferred_domains to be 0. So the following WARN_ON() that no bit is set is incorrect. + return 0; + return fold_memtype(domain_to_pl[domain]) That would need specular execution mitigation if I'm not completely mistaken. Better use a switch/case statement. Do you mean change the array indexing to a switch statement? Yes. Did you mean array_index_nospec? Yes. 
Domain is not a direct userspace input and is calculated from the mask which sanitized to allowed values prior to this call. So I *think* switch is an overkill but don't mind it either. Just commenting FWIW. I missed that the mask is applied. Thinking more about it I'm not sure if we should do this conversion in the first place. IIRC Tvrtko you once suggested a patch which switched a bunch of code to use the TTM placement instead of the UAPI flags. Maybe 8fb0efb10184 ("drm/amdgpu: Reduce mem_type to domain double indirection") is what are you thinking of? Yes, exactly that one. Going more into this direction I think when we want to look at the current placement we should probably also use the TTM PL enumeration directly. It does this already. The placement flags are just to "invent" a TTM PL enum when bo->tbo.resource == NULL. Ah, good point! I though we would do the mapping the other way around. In this case that is even more something we should probably not do at all. When bo->tbo.resource is NULL then this BO isn't resident at all, so it should not account to resident memory. It doesn't, only for total. I should have pasted more context..: struct ttm_resource *res = bo->tbo.resource; ... /* DRM stats c
Re: [PATCH v5 4/4] drm/amdgpu: track bo memory stats at runtime
On 23/10/2024 13:12, Christian König wrote: Am 23.10.24 um 13:37 schrieb Tvrtko Ursulin: On 23/10/2024 10:14, Christian König wrote: Am 23.10.24 um 09:38 schrieb Tvrtko Ursulin: On 22/10/2024 17:24, Christian König wrote: Am 22.10.24 um 17:17 schrieb Li, Yunxiang (Teddy): [Public] +static uint32_t fold_memtype(uint32_t memtype) { In general please add prefixes to even static functions, e.g. amdgpu_vm_ or amdgpu_bo_. + /* Squash private placements into 'cpu' to keep the legacy userspace view. */ + switch (mem_type) { + case TTM_PL_VRAM: + case TTM_PL_TT: + return memtype + default: + return TTM_PL_SYSTEM; + } +} + +static uint32_t bo_get_memtype(struct amdgpu_bo *bo) { That whole function belongs into amdgpu_bo.c Do you mean bo_get_memtype or fold_memtype? I debated whether bo_get_memtype should go into amdgpu_vm.c or amdgpu_bo.c, and since it's using fold_memtype and only useful for memory stats because of folding the private placements I just left them here together with the other mem stats code. I can move it to amdgpu_bo.c make it return the memtype verbatim and just fold it when I do the accounting. I think that folding GDS, GWS and OA into system is also a bug. We should really not doing that. Just wanted to point out for this round that the code to query the current placement from a BO should probably go into amdgpu_bo.c and not amdgpu_vm.c + struct ttm_resource *res = bo->tbo.resource; + const uint32_t domain_to_pl[] = { + [ilog2(AMDGPU_GEM_DOMAIN_CPU)] = TTM_PL_SYSTEM, + [ilog2(AMDGPU_GEM_DOMAIN_GTT)] = TTM_PL_TT, + [ilog2(AMDGPU_GEM_DOMAIN_VRAM)] = TTM_PL_VRAM, + [ilog2(AMDGPU_GEM_DOMAIN_GDS)] = AMDGPU_PL_GDS, + [ilog2(AMDGPU_GEM_DOMAIN_GWS)] = AMDGPU_PL_GWS, + [ilog2(AMDGPU_GEM_DOMAIN_OA)] = AMDGPU_PL_OA, + [ilog2(AMDGPU_GEM_DOMAIN_DOORBELL)] = AMDGPU_PL_DOORBELL, + }; + uint32_t domain; + + if (res) + return fold_memtype(res->mem_type); + + /* + * If no backing store use one of the preferred domain for basic + * stats. 
We take the MSB since that should give a reasonable + * view. + */ + BUILD_BUG_ON(TTM_PL_VRAM < TTM_PL_TT || TTM_PL_VRAM < TTM_PL_SYSTEM); + domain = fls(bo->preferred_domains & AMDGPU_GEM_DOMAIN_MASK); + if (drm_WARN_ON_ONCE(&adev->ddev, + domain == 0 || --domain >= ARRAY_SIZE(domain_to_pl))) It's perfectly legal to create a BO without a placement. That one just won't have a backing store. This is lifted from the previous change I'm rebasing onto. I think what it's trying to do is if the BO doesn't have a placement, use the "biggest" (VRAM > TT > SYSTEM) preferred placement for the purpose of accounting. Previously we just ignored BOs that don't have a placement. I guess there's an argument for going with either approach. I was not arguing, I'm simply pointing out a bug. It's perfectly valid for bo->preferred_domains to be 0. So the following WARN_ON() that no bit is set is incorrect. + return 0; + return fold_memtype(domain_to_pl[domain]) That would need speculative execution mitigation if I'm not completely mistaken. Better use a switch/case statement. Do you mean change the array indexing to a switch statement? Yes. Did you mean array_index_nospec? Yes. Domain is not a direct userspace input and is calculated from the mask which is sanitized to allowed values prior to this call. So I *think* switch is overkill but don't mind it either. Just commenting FWIW. I missed that the mask is applied. Thinking more about it I'm not sure if we should do this conversion in the first place. IIRC Tvrtko you once suggested a patch which switched a bunch of code to use the TTM placement instead of the UAPI flags. Maybe 8fb0efb10184 ("drm/amdgpu: Reduce mem_type to domain double indirection") is what you are thinking of? Yes, exactly that one. Going more into this direction I think when we want to look at the current placement we should probably also use the TTM PL enumeration directly. It does this already. 
The placement flags are just to "invent" a TTM PL enum when bo->tbo.resource == NULL. Ah, good point! I thought we would do the mapping the other way around. In this case that is even more something we should probably not do at all. When bo->tbo.resource is NULL then this BO isn't resident at all, so it should not count towards resident memory. It doesn't, only for total. I should have pasted more context..: struct ttm_resource *res = bo->tbo.resource; ... /* DRM stats common fields: */ stats[type].total += size; if (drm_gem_object_is_shared_for_memory_stats(obj)) stats[type].drm.shared
Re: [PATCH v5 4/4] drm/amdgpu: track bo memory stats at runtime
On 23/10/2024 10:14, Christian König wrote: Am 23.10.24 um 09:38 schrieb Tvrtko Ursulin: On 22/10/2024 17:24, Christian König wrote: Am 22.10.24 um 17:17 schrieb Li, Yunxiang (Teddy): [Public] +static uint32_t fold_memtype(uint32_t memtype) { In general please add prefixes to even static functions, e.g. amdgpu_vm_ or amdgpu_bo_. + /* Squash private placements into 'cpu' to keep the legacy userspace view. */ + switch (mem_type) { + case TTM_PL_VRAM: + case TTM_PL_TT: + return memtype + default: + return TTM_PL_SYSTEM; + } +} + +static uint32_t bo_get_memtype(struct amdgpu_bo *bo) { That whole function belongs into amdgpu_bo.c Do you mean bo_get_memtype or fold_memtype? I debated whether bo_get_memtype should go into amdgpu_vm.c or amdgpu_bo.c, and since it's using fold_memtype and only useful for memory stats because of folding the private placements I just left them here together with the other mem stats code. I can move it to amdgpu_bo.c make it return the memtype verbatim and just fold it when I do the accounting. I think that folding GDS, GWS and OA into system is also a bug. We should really not doing that. Just wanted to point out for this round that the code to query the current placement from a BO should probably go into amdgpu_bo.c and not amdgpu_vm.c + struct ttm_resource *res = bo->tbo.resource; + const uint32_t domain_to_pl[] = { + [ilog2(AMDGPU_GEM_DOMAIN_CPU)] = TTM_PL_SYSTEM, + [ilog2(AMDGPU_GEM_DOMAIN_GTT)] = TTM_PL_TT, + [ilog2(AMDGPU_GEM_DOMAIN_VRAM)] = TTM_PL_VRAM, + [ilog2(AMDGPU_GEM_DOMAIN_GDS)] = AMDGPU_PL_GDS, + [ilog2(AMDGPU_GEM_DOMAIN_GWS)] = AMDGPU_PL_GWS, + [ilog2(AMDGPU_GEM_DOMAIN_OA)] = AMDGPU_PL_OA, + [ilog2(AMDGPU_GEM_DOMAIN_DOORBELL)] = AMDGPU_PL_DOORBELL, + }; + uint32_t domain; + + if (res) + return fold_memtype(res->mem_type); + + /* + * If no backing store use one of the preferred domain for basic + * stats. We take the MSB since that should give a reasonable + * view. 
+ */ + BUILD_BUG_ON(TTM_PL_VRAM < TTM_PL_TT || TTM_PL_VRAM < TTM_PL_SYSTEM); + domain = fls(bo->preferred_domains & AMDGPU_GEM_DOMAIN_MASK); + if (drm_WARN_ON_ONCE(&adev->ddev, + domain == 0 || --domain >= ARRAY_SIZE(domain_to_pl))) It's perfectly legal to create a BO without a placement. That one just won't have a backing store. This is lifted from the previous change I'm rebasing onto. I think what it's trying to do is if the BO doesn't have a placement, use the "biggest" (VRAM > TT > SYSTEM) preferred placement for the purpose of accounting. Previously we just ignored BOs that don't have a placement. I guess there's an argument for going with either approach. I was not arguing, I'm simply pointing out a bug. It's perfectly valid for bo->preferred_domains to be 0. So the following WARN_ON() that no bit is set is incorrect. + return 0; + return fold_memtype(domain_to_pl[domain]) That would need speculative execution mitigation if I'm not completely mistaken. Better use a switch/case statement. Do you mean change the array indexing to a switch statement? Yes. Did you mean array_index_nospec? Yes. Domain is not a direct userspace input and is calculated from the mask which is sanitized to allowed values prior to this call. So I *think* switch is overkill but don't mind it either. Just commenting FWIW. I missed that the mask is applied. Thinking more about it I'm not sure if we should do this conversion in the first place. IIRC Tvrtko you once suggested a patch which switched a bunch of code to use the TTM placement instead of the UAPI flags. Maybe 8fb0efb10184 ("drm/amdgpu: Reduce mem_type to domain double indirection") is what you are thinking of? Going more into this direction I think when we want to look at the current placement we should probably also use the TTM PL enumeration directly. It does this already. The placement flags are just to "invent" a TTM PL enum when bo->tbo.resource == NULL. 
if (!res) { /* * If no backing store use one of the preferred domain for basic * stats. We take the MSB since that should give a reasonable * view. */ BUILD_BUG_ON(TTM_PL_VRAM < TTM_PL_TT || TTM_PL_VRAM < TTM_PL_SYSTEM); type = fls(bo->preferred_domains & AMDGPU_GEM_DOMAIN_MASK); if (!type) return; type--; if (drm_WARN_ON_ONCE(&adev->ddev, type >= ARRAY_SIZE(domain_to_pl))) r
[PULL] drm-intel-gt-next
Hi Dave, Sima,

This is the main pull request for the 6.13 merge window.

The PXP GuC auto-teardown feature got enabled, GPU reset robustness was improved for Haswell, and basic PMU functionality was enabled for Gen2 platforms. The rest is a handful of small cleanups.

Regards,

Tvrtko

drm-intel-gt-next-2024-10-23:

Driver Changes:

Fixes/improvements/new stuff:

- Enable PXP GuC autoteardown flow [guc] (Juston Li)
- Retry RING_HEAD reset until it sticks [gt] (Nitin Gote)
- Add basic PMU support for gen2 [pmu] (Ville Syrjälä)

Miscellaneous:

- Prevent a possible int overflow in wq offsets [guc] (Nikita Zhandarovich)
- PMU code cleanups (Lucas De Marchi)
- Fixed "CPU" -> "GPU" typo [gt] (Zhang He)
- Gen2/3 interrupt handling cleanup (Ville Syrjälä)

The following changes since commit 596a7f1084e49cc65072c458c348861e9b9ceab9:

  drm/i915: Remove extra unlikely helper (2024-09-05 15:44:37 -0400)

are available in the Git repository at:

  https://gitlab.freedesktop.org/drm/i915/kernel.git tags/drm-intel-gt-next-2024-10-23

for you to fetch changes up to 6ef0e3ef2662db71d363af77ce31fa940bb7d525:

  drm/i915/gt: Retry RING_HEAD reset until it get sticks (2024-10-22 11:35:07 +0200)

Driver Changes:

Fixes/improvements/new stuff:

- Enable PXP GuC autoteardown flow [guc] (Juston Li)
- Retry RING_HEAD reset until it get sticks [gt] (Nitin Gote)
- Add basic PMU support for gen2 [pmu] (Ville Syrjälä)

Miscellaneous:

- Prevent a possible int overflow in wq offsets [guc] (Nikita Zhandarovich)
- PMU code cleanups (Lucas De Marchi)
- Fixed "CPU" -> "GPU" typo [gt] (Zhang He)
- Gen2/3 interrupt handling cleanup (Ville Syrjälä)

Juston Li (1):
  drm/i915/guc: Enable PXP GuC autoteardown flow

Lucas De Marchi (2):
  drm/i915/pmu: Drop is_igp()
  drm/i915/pmu: Use event_to_pmu()

Nikita Zhandarovich (1):
  drm/i915/guc: prevent a possible int overflow in wq offsets

Nitin Gote (1):
  drm/i915/gt: Retry RING_HEAD reset until it get sticks

Ville Syrjälä (3):
  drm/i915/gt: Nuke gen2_irq_{enable,disable}()
  drm/i915/gt: s/gen3/gen2/
  drm/i915/pmu: Add support for gen2

Zhang He (1):
  drm/i915/gt: Fixed "CPU" -> "GPU" typo

 drivers/gpu/drm/i915/gt/gen2_engine_cs.c | 23 ++
 drivers/gpu/drm/i915/gt/gen2_engine_cs.h | 6 +--
 drivers/gpu/drm/i915/gt/intel_engine_regs.h | 2 +-
 drivers/gpu/drm/i915/gt/intel_gt_pm_debugfs.c | 2 +-
 drivers/gpu/drm/i915/gt/intel_ring_submission.c | 38
 drivers/gpu/drm/i915/gt/uc/intel_guc.c | 8
 drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h | 1 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 4 +-
 drivers/gpu/drm/i915/i915_drv.h | 3 ++
 drivers/gpu/drm/i915/i915_pmu.c | 54 +--
 drivers/gpu/drm/i915/pxp/intel_pxp.c | 2 +-
 11 files changed, 82 insertions(+), 61 deletions(-)
Re: [PATCH v5 4/4] drm/amdgpu: track bo memory stats at runtime
On 22/10/2024 17:24, Christian König wrote: Am 22.10.24 um 17:17 schrieb Li, Yunxiang (Teddy): [Public] +static uint32_t fold_memtype(uint32_t memtype) { In general please add prefixes to even static functions, e.g. amdgpu_vm_ or amdgpu_bo_. + /* Squash private placements into 'cpu' to keep the legacy userspace view. */ + switch (mem_type) { + case TTM_PL_VRAM: + case TTM_PL_TT: + return memtype + default: + return TTM_PL_SYSTEM; + } +} + +static uint32_t bo_get_memtype(struct amdgpu_bo *bo) { That whole function belongs into amdgpu_bo.c Do you mean bo_get_memtype or fold_memtype? I debated whether bo_get_memtype should go into amdgpu_vm.c or amdgpu_bo.c, and since it's using fold_memtype and only useful for memory stats because of folding the private placements I just left them here together with the other mem stats code. I can move it to amdgpu_bo.c make it return the memtype verbatim and just fold it when I do the accounting. I think that folding GDS, GWS and OA into system is also a bug. We should really not doing that. Just wanted to point out for this round that the code to query the current placement from a BO should probably go into amdgpu_bo.c and not amdgpu_vm.c + struct ttm_resource *res = bo->tbo.resource; + const uint32_t domain_to_pl[] = { + [ilog2(AMDGPU_GEM_DOMAIN_CPU)] = TTM_PL_SYSTEM, + [ilog2(AMDGPU_GEM_DOMAIN_GTT)] = TTM_PL_TT, + [ilog2(AMDGPU_GEM_DOMAIN_VRAM)] = TTM_PL_VRAM, + [ilog2(AMDGPU_GEM_DOMAIN_GDS)] = AMDGPU_PL_GDS, + [ilog2(AMDGPU_GEM_DOMAIN_GWS)] = AMDGPU_PL_GWS, + [ilog2(AMDGPU_GEM_DOMAIN_OA)] = AMDGPU_PL_OA, + [ilog2(AMDGPU_GEM_DOMAIN_DOORBELL)] = AMDGPU_PL_DOORBELL, + }; + uint32_t domain; + + if (res) + return fold_memtype(res->mem_type); + + /* + * If no backing store use one of the preferred domain for basic + * stats. We take the MSB since that should give a reasonable + * view. 
+ */ + BUILD_BUG_ON(TTM_PL_VRAM < TTM_PL_TT || TTM_PL_VRAM < TTM_PL_SYSTEM); + domain = fls(bo->preferred_domains & AMDGPU_GEM_DOMAIN_MASK); + if (drm_WARN_ON_ONCE(&adev->ddev, + domain == 0 || --domain >= ARRAY_SIZE(domain_to_pl))) It's perfectly legal to create a BO without a placement. That one just won't have a backing store. This is lifted from the previous change I'm rebasing onto. I think what it's trying to do is if the BO doesn't have a placement, use the "biggest" (VRAM > TT > SYSTEM) preferred placement for the purpose of accounting. Previously we just ignored BOs that don't have a placement. I guess there's an argument for going with either approach. I was not arguing, I'm simply pointing out a bug. It's perfectly valid for bo->preferred_domains to be 0. So the following WARN_ON() that no bit is set is incorrect. + return 0; + return fold_memtype(domain_to_pl[domain]) That would need speculative execution mitigation if I'm not completely mistaken. Better use a switch/case statement. Do you mean change the array indexing to a switch statement? Yes. Did you mean array_index_nospec? Domain is not a direct userspace input and is calculated from the mask which is sanitized to allowed values prior to this call. So I *think* switch is overkill but don't mind it either. Just commenting FWIW. Regards, Tvrtko
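For readers following the switch-versus-array point above, here is a minimal userspace sketch of what a switch-based fallback could look like. It is not the driver code: every `ex_`/`EX_` name and bit value is a hypothetical stand-in for the `AMDGPU_GEM_DOMAIN_*` and `TTM_PL_*` definitions, and `EX_PL_NONE` is an invented marker for "nothing to account". The sketch picks the highest preferred domain and treats an empty mask as a legal, silent case rather than something to warn about.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the TTM placement enumeration. */
enum ex_placement {
	EX_PL_SYSTEM,
	EX_PL_TT,
	EX_PL_VRAM,
	EX_PL_NONE,	/* invented marker: no preferred placement to account */
};

/* Hypothetical stand-ins for the UAPI domain bits. */
#define EX_DOMAIN_CPU	(1u << 0)
#define EX_DOMAIN_GTT	(1u << 1)
#define EX_DOMAIN_VRAM	(1u << 2)
#define EX_DOMAIN_MASK	(EX_DOMAIN_CPU | EX_DOMAIN_GTT | EX_DOMAIN_VRAM)

/* Return the highest set bit of the masked domains, or 0 if none is set. */
static uint32_t ex_highest_domain(uint32_t domains)
{
	uint32_t d = domains & EX_DOMAIN_MASK;
	uint32_t top = 0;

	while (d) {
		top = d & -d;	/* isolate the lowest remaining bit */
		d &= d - 1;	/* and clear it */
	}
	return top;
}

/*
 * Map the "biggest" preferred domain to a placement. A switch has no
 * data-dependent array index, so there is nothing to bounds-check, and
 * an empty mask simply falls through to the default case.
 */
static enum ex_placement ex_preferred_placement(uint32_t preferred_domains)
{
	switch (ex_highest_domain(preferred_domains)) {
	case EX_DOMAIN_VRAM:
		return EX_PL_VRAM;
	case EX_DOMAIN_GTT:
		return EX_PL_TT;
	case EX_DOMAIN_CPU:
		return EX_PL_SYSTEM;
	default:
		/* preferred_domains == 0 is valid, so no warning here. */
		return EX_PL_NONE;
	}
}
```

Whether a BO without a backing store should be accounted at all is the open question in the thread; the sketch only shows that the lookup itself does not need an array or a warning.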
Re: [PATCH v5 4/4] drm/amdgpu: track bo memory stats at runtime
On 22/10/2024 18:06, Christian König wrote: On 22.10.24 at 18:46, Li, Yunxiang (Teddy) wrote: [Public] I suppose we could add a field like amd-memory-private: to cover the private placements. No, that is not really appropriate either. GWS, GDS and OA are not memory in the first place. Those BOs are HW blocks which the driver allocated for use. So accounting them as memory usage doesn't make any sense at all. We could print them in the fdinfo as something special for statistics, but it's probably not that useful. When would a BO not have a placement, is it when it is being moved? There are BOs which are only temporary, so when they are evicted their backing store is just discarded. In addition to that, allocation of the backing store is sometimes delayed until the first use. Would this work correctly if the allowed mask was used instead of the preferred one? Point being, to correctly support fdinfo stats drm-total-, *if* a BO *can* have a backing store at any point it should always be counted there. *If* it currently has a placement it is drm-resident-. If it has a placement but can be discarded it is drm-purgeable-. Etc. Regards, Tvrtko Since we are tracking the state changes, I wonder if such situations can be avoided now so whenever we call these stat update functions the BO would always have a placement. No, as I said before those use cases are perfectly valid. BOs don't need a backing store nor do they need a placement. So the code has to gracefully handle that. Regards, Christian. Teddy
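The accounting rule Tvrtko describes (total if a backing store is possible, resident if there currently is a placement, purgeable if that placement can be discarded) can be sketched as below. This is only an illustration under stated assumptions: `struct ex_bo`, `struct ex_mem_stats` and `ex_account_bo()` are hypothetical names, and the three boolean flags stand in for whatever state the driver would actually consult.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical counters, one set per memory region. */
struct ex_mem_stats {
	uint64_t total;		/* would feed a drm-total- key */
	uint64_t resident;	/* would feed a drm-resident- key */
	uint64_t purgeable;	/* would feed a drm-purgeable- key */
};

/* Hypothetical, simplified view of a buffer object. */
struct ex_bo {
	uint64_t size;
	bool can_have_backing;	/* may get a backing store at some point */
	bool has_placement;	/* currently has a backing store */
	bool discardable;	/* backing store may be dropped on eviction */
};

static void ex_account_bo(struct ex_mem_stats *stats, const struct ex_bo *bo)
{
	/* HW-block style objects are not memory, so they are skipped. */
	if (!bo->can_have_backing)
		return;

	/* Counted in total even while no backing store exists. */
	stats->total += bo->size;

	/* No placement is a valid state, not an error. */
	if (!bo->has_placement)
		return;

	stats->resident += bo->size;
	if (bo->discardable)
		stats->purgeable += bo->size;
}
```

Note that the function returns quietly for a BO without a placement, which matches Christian's point that this state has to be handled gracefully.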
Re: [PATCH v5 2/4] drm/amdgpu: make drm-memory-* report resident memory
On 18/10/2024 14:33, Yunxiang Li wrote: The old behavior reports the resident memory usage for this key and the documentation says so as well. However this was accidentally changed to include buffers that were evicted. Fixes: a2529f67e2ed ("drm/amdgpu: Use drm_print_memory_stats helper from fdinfo") Signed-off-by: Yunxiang Li --- drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c | 7 --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 1 - drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 1 - 3 files changed, 4 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c index 00a4ab082459f..8281dd45faaa0 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c @@ -33,6 +33,7 @@ #include #include #include +#include #include "amdgpu.h" #include "amdgpu_vm.h" @@ -95,11 +96,11 @@ void amdgpu_show_fdinfo(struct drm_printer *p, struct drm_file *file) /* Legacy amdgpu keys, alias to drm-resident-memory-: */ drm_printf(p, "drm-memory-vram:\t%llu KiB\n", - stats[TTM_PL_VRAM].total/1024UL); + stats[TTM_PL_VRAM].drm.resident/1024UL); drm_printf(p, "drm-memory-gtt: \t%llu KiB\n", - stats[TTM_PL_TT].total/1024UL); + stats[TTM_PL_TT].drm.resident/1024UL); drm_printf(p, "drm-memory-cpu: \t%llu KiB\n", - stats[TTM_PL_SYSTEM].total/1024UL); + stats[TTM_PL_SYSTEM].drm.resident/1024UL); /* Amdgpu specific memory accounting keys: */ drm_printf(p, "amd-memory-visible-vram:\t%llu KiB\n", diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index 045222b6bd049..2a53e72f3964f 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -1223,7 +1223,6 @@ void amdgpu_bo_get_memory(struct amdgpu_bo *bo, /* DRM stats common fields: */ - stats[type].total += size; if (drm_gem_object_is_shared_for_memory_stats(obj)) stats[type].drm.shared += size; else diff --git 
a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h index 7260349917ef0..a5653f474f85c 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h @@ -142,7 +142,6 @@ struct amdgpu_bo_vm { struct amdgpu_mem_stats { struct drm_memory_stats drm; - uint64_t total; uint64_t visible; uint64_t evicted; uint64_t evicted_visible; LGTM, thanks for fixing it! Reviewed-by: Tvrtko Ursulin Regards, Tvrtko
[PATCH 1/5] drm/sched: Optimise drm_sched_entity_push_job
From: Tvrtko Ursulin In FIFO mode (which is the default), both drm_sched_entity_push_job() and drm_sched_rq_update_fifo(), where the former calls the latter, are currently taking and releasing the same entity->rq_lock. We can avoid that design inelegance, and also have a minuscule efficiency improvement on the submit from idle path, by introducing a new drm_sched_rq_update_fifo_locked() helper and pulling up the lock taking to its callers. v2: * Remove drm_sched_rq_update_fifo() altogether. (Christian) v3: * Improved commit message. (Philipp) Signed-off-by: Tvrtko Ursulin Cc: Christian König Cc: Alex Deucher Cc: Luben Tuikov Cc: Matthew Brost Cc: Philipp Stanner Reviewed-by: Christian König --- drivers/gpu/drm/scheduler/sched_entity.c | 13 + drivers/gpu/drm/scheduler/sched_main.c | 6 +++--- include/drm/gpu_scheduler.h | 2 +- 3 files changed, 13 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c index 2951fcc2e6b1..b72cba292839 100644 --- a/drivers/gpu/drm/scheduler/sched_entity.c +++ b/drivers/gpu/drm/scheduler/sched_entity.c @@ -514,8 +514,12 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity) struct drm_sched_job *next; next = to_drm_sched_job(spsc_queue_peek(&entity->job_queue)); - if (next) - drm_sched_rq_update_fifo(entity, next->submit_ts); + if (next) { + spin_lock(&entity->rq_lock); + drm_sched_rq_update_fifo_locked(entity, + next->submit_ts); + spin_unlock(&entity->rq_lock); + } } /* Jobs and entities might have different lifecycles. 
Since we're @@ -613,10 +617,11 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) sched = rq->sched; drm_sched_rq_add_entity(rq, entity); - spin_unlock(&entity->rq_lock); if (drm_sched_policy == DRM_SCHED_POLICY_FIFO) - drm_sched_rq_update_fifo(entity, submit_ts); + drm_sched_rq_update_fifo_locked(entity, submit_ts); + + spin_unlock(&entity->rq_lock); drm_sched_wakeup(sched); } diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index e32b0f7d7e94..bbd1630407e4 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -169,14 +169,15 @@ static inline void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *enti } } -void drm_sched_rq_update_fifo(struct drm_sched_entity *entity, ktime_t ts) +void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, ktime_t ts) { /* * Both locks need to be grabbed, one to protect from entity->rq change * for entity from within concurrent drm_sched_entity_select_rq and the * other to update the rb tree structure. 
*/ - spin_lock(&entity->rq_lock); + lockdep_assert_held(&entity->rq_lock); + spin_lock(&entity->rq->lock); drm_sched_rq_remove_fifo_locked(entity); @@ -187,7 +188,6 @@ void drm_sched_rq_update_fifo(struct drm_sched_entity *entity, ktime_t ts) drm_sched_entity_compare_before); spin_unlock(&entity->rq->lock); - spin_unlock(&entity->rq_lock); } /** diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h index e9f075f51db3..3658a6cb048e 100644 --- a/include/drm/gpu_scheduler.h +++ b/include/drm/gpu_scheduler.h @@ -593,7 +593,7 @@ void drm_sched_rq_add_entity(struct drm_sched_rq *rq, void drm_sched_rq_remove_entity(struct drm_sched_rq *rq, struct drm_sched_entity *entity); -void drm_sched_rq_update_fifo(struct drm_sched_entity *entity, ktime_t ts); +void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, ktime_t ts); int drm_sched_entity_init(struct drm_sched_entity *entity, enum drm_sched_priority priority, -- 2.46.0
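The lock hoisting in this patch follows a common pattern: the helper gets a `_locked` suffix and asserts that the lock is held, while the caller takes the lock once around all the work. The sketch below models that in plain userspace C. It is an assumption-laden toy, not scheduler code: `struct ex_lock` is a hypothetical single-threaded stand-in for a spinlock that only records whether it is held and how many times it was taken, and the `assert()` in the helper plays the role of `lockdep_assert_held()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy lock: tracks ownership and counts lock/unlock cycles. Not thread safe. */
struct ex_lock {
	bool held;
	unsigned int cycles;
};

static void ex_lock_acquire(struct ex_lock *l)
{
	assert(!l->held);	/* a real spinlock would spin or deadlock here */
	l->held = true;
	l->cycles++;
}

static void ex_lock_release(struct ex_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* Hypothetical, stripped-down entity. */
struct ex_entity {
	struct ex_lock lock;
	unsigned int queued;
	uint64_t oldest_ts;
};

/* Caller must hold entity->lock; the helper no longer takes it itself. */
static void ex_update_fifo_locked(struct ex_entity *entity, uint64_t ts)
{
	assert(entity->lock.held);	/* stand-in for lockdep_assert_held() */
	entity->oldest_ts = ts;
}

/* One lock cycle covers both the queueing and the FIFO update. */
static void ex_push_job(struct ex_entity *entity, uint64_t ts)
{
	ex_lock_acquire(&entity->lock);
	entity->queued++;
	ex_update_fifo_locked(entity, ts);
	ex_lock_release(&entity->lock);
}
```

With the helper taking the lock itself, the same push would cost two cycles on `lock`; the `cycles` counter makes the saving visible in a test.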
[PATCH 2/5] drm/sched: Stop setting current entity in FIFO mode
From: Tvrtko Ursulin It does not seem there is a need to set the current entity in FIFO mode since it only serves as a "cursor" in round-robin mode. Even if scheduling mode is changed at runtime the change in behaviour is simply to restart from the first entity, instead of continuing in RR mode from where FIFO left it, and that sounds completely fine. Signed-off-by: Tvrtko Ursulin Cc: Christian König Cc: Alex Deucher Cc: Luben Tuikov Cc: Matthew Brost Cc: Philipp Stanner Acked-by: Christian König Reviewed-by: Philipp Stanner --- drivers/gpu/drm/scheduler/sched_main.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index bbd1630407e4..07ee386b8e4b 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -355,7 +355,6 @@ drm_sched_rq_select_entity_fifo(struct drm_gpu_scheduler *sched, return ERR_PTR(-ENOSPC); } - rq->current_entity = entity; reinit_completion(&entity->entity_idle); break; } -- 2.46.0
[PATCH v2 0/5] Small DRM scheduler improvements
From: Tvrtko Ursulin Leftovers from the earlier "DRM scheduler fixes and improvements" series. It looks like the fixes have now propagated back to drm-misc-next so this should now be mergeable. It also needed a small rebase to account for one revert and one spelling fix which landed in the meantime. As a reminder, what remains are kerneldoc improvements, struct layout tweaks for clarity, one trivial cleanup for the FIFO mode, and most importantly two spin lock-unlock cycles are removed from the push job path by pulling the lock taking one level up. I smoke tested it on the Steam Deck and lockdep seems happy. v2: * Tweaks to commit messages and rename of some leftover rq_lock naming inside kerneldoc. Cc: Christian König Cc: Philipp Stanner Tvrtko Ursulin (5): drm/sched: Optimise drm_sched_entity_push_job drm/sched: Stop setting current entity in FIFO mode drm/sched: Re-order struct drm_sched_rq members for clarity drm/sched: Re-group and rename the entity run-queue lock drm/sched: Further optimise drm_sched_entity_push_job drivers/gpu/drm/scheduler/sched_entity.c | 42 +++- drivers/gpu/drm/scheduler/sched_main.c | 32 +- include/drm/gpu_scheduler.h | 34 ++- 3 files changed, 61 insertions(+), 47 deletions(-) -- 2.46.0
[PATCH 5/5] drm/sched: Further optimise drm_sched_entity_push_job
From: Tvrtko Ursulin Having removed one re-lock cycle on the entity->lock in a patch titled "drm/sched: Optimise drm_sched_entity_push_job", with only a tiny bit larger refactoring we can do the same optimisation on the rq->lock. (Currently both drm_sched_rq_add_entity() and drm_sched_rq_update_fifo_locked() take and release the same lock.) To achieve this we make drm_sched_rq_update_fifo_locked() and drm_sched_rq_add_entity() expect the rq->lock to be held. We also align drm_sched_rq_update_fifo_locked(), drm_sched_rq_add_entity() and drm_sched_rq_remove_fifo_locked() function signatures, by adding rq as a parameter to the latter. v2: * Fix after rebase of the series. * Avoid naming inconsistency between drm_sched_rq_add/remove. (Christian) Signed-off-by: Tvrtko Ursulin Cc: Christian König Cc: Alex Deucher Cc: Luben Tuikov Cc: Matthew Brost Cc: Philipp Stanner Reviewed-by: Christian König Reviewed-by: Philipp Stanner --- drivers/gpu/drm/scheduler/sched_entity.c | 11 +++-- drivers/gpu/drm/scheduler/sched_main.c | 29 include/drm/gpu_scheduler.h | 3 ++- 3 files changed, 25 insertions(+), 18 deletions(-) diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c index c013c2b49aa5..69bcf0e99d57 100644 --- a/drivers/gpu/drm/scheduler/sched_entity.c +++ b/drivers/gpu/drm/scheduler/sched_entity.c @@ -515,9 +515,14 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity) next = to_drm_sched_job(spsc_queue_peek(&entity->job_queue)); if (next) { + struct drm_sched_rq *rq; + spin_lock(&entity->lock); - drm_sched_rq_update_fifo_locked(entity, + rq = entity->rq; + spin_lock(&rq->lock); + drm_sched_rq_update_fifo_locked(entity, rq, next->submit_ts); + spin_unlock(&rq->lock); spin_unlock(&entity->lock); } } @@ -616,11 +621,13 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) rq = entity->rq; sched = rq->sched; + spin_lock(&rq->lock); drm_sched_rq_add_entity(rq, entity); if (drm_sched_policy == 
DRM_SCHED_POLICY_FIFO) - drm_sched_rq_update_fifo_locked(entity, submit_ts); + drm_sched_rq_update_fifo_locked(entity, rq, submit_ts); + spin_unlock(&rq->lock); spin_unlock(&entity->lock); drm_sched_wakeup(sched); diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index 2670bf9f34b2..6e4d004d09ce 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -159,17 +159,18 @@ static __always_inline bool drm_sched_entity_compare_before(struct rb_node *a, return ktime_before(ent_a->oldest_job_waiting, ent_b->oldest_job_waiting); } -static inline void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *entity) +static void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *entity, + struct drm_sched_rq *rq) { - struct drm_sched_rq *rq = entity->rq; - if (!RB_EMPTY_NODE(&entity->rb_tree_node)) { rb_erase_cached(&entity->rb_tree_node, &rq->rb_tree_root); RB_CLEAR_NODE(&entity->rb_tree_node); } } -void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, ktime_t ts) +void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, +struct drm_sched_rq *rq, +ktime_t ts) { /* * Both locks need to be grabbed, one to protect from entity->rq change @@ -177,17 +178,14 @@ void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, ktime_t ts * other to update the rb tree structure. */ lockdep_assert_held(&entity->lock); + lockdep_assert_held(&rq->lock); - spin_lock(&entity->rq->lock); - - drm_sched_rq_remove_fifo_locked(entity); + drm_sched_rq_remove_fifo_locked(entity, rq); entity->oldest_job_waiting = ts; - rb_add_cached(&entity->rb_tree_node, &entity->rq->rb_tree_root, + rb_add_cached(&entity->rb_tree_node, &rq->rb_tree_root, drm_sched_entity_compare_before); - - spin_unlock(&entity->rq->lock); } /** @@ -219,15 +217,14 @@ static void drm_sched_rq_init(struct drm_gpu_scheduler *sched, void drm_sched_rq_ad
[PATCH 3/5] drm/sched: Re-order struct drm_sched_rq members for clarity
From: Tvrtko Ursulin Current kerneldoc for struct drm_sched_rq incompletely documents what fields are protected by the lock. This is not good because it is misleading. Let's fix it by listing all the elements which are protected by the lock. While at it, let's also re-order the members so all protected by the lock are in a single group. v2: * Refer variables by kerneldoc syntax, more verbose commit text. (Philipp) Signed-off-by: Tvrtko Ursulin Cc: Christian König Cc: Alex Deucher Cc: Luben Tuikov Cc: Matthew Brost Cc: Philipp Stanner Reviewed-by: Christian König Reviewed-by: Philipp Stanner --- include/drm/gpu_scheduler.h | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h index 3658a6cb048e..b6d095074c19 100644 --- a/include/drm/gpu_scheduler.h +++ b/include/drm/gpu_scheduler.h @@ -243,10 +243,10 @@ struct drm_sched_entity { /** * struct drm_sched_rq - queue of entities to be scheduled. * - * @lock: to modify the entities list. * @sched: the scheduler to which this rq belongs to. - * @entities: list of the entities to be scheduled. + * @lock: protects @entities, @rb_tree_root and @current_entity. * @current_entity: the entity which is to be scheduled. + * @entities: list of the entities to be scheduled. * @rb_tree_root: root of time based priority queue of entities for FIFO scheduling * * Run queue is a set of entities scheduling command submissions for @@ -254,10 +254,12 @@ struct drm_sched_entity { * the next entity to emit commands from. */ struct drm_sched_rq { - spinlock_t lock; struct drm_gpu_scheduler *sched; - struct list_head entities; + + spinlock_t lock; + /* Following members are protected by the @lock: */ struct drm_sched_entity *current_entity; + struct list_head entities; struct rb_root_cached rb_tree_root; }; -- 2.46.0
[PATCH 4/5] drm/sched: Re-group and rename the entity run-queue lock
From: Tvrtko Ursulin When writing to a drm_sched_entity's run-queue, writers are protected through the lock drm_sched_entity.rq_lock. This naming, however, frequently collides with the separate internal lock of struct drm_sched_rq, resulting in uses like this: spin_lock(&entity->rq_lock); spin_lock(&entity->rq->lock); Rename drm_sched_entity.rq_lock to improve readability. While at it, re-order that struct's members to make it more obvious what the lock protects. v2: * Rename some rq_lock straddlers in kerneldoc, improve commit text. (Philipp) Signed-off-by: Tvrtko Ursulin Suggested-by: Christian König Cc: Alex Deucher Cc: Luben Tuikov Cc: Matthew Brost Cc: Philipp Stanner Reviewed-by: Christian König --- drivers/gpu/drm/scheduler/sched_entity.c | 28 drivers/gpu/drm/scheduler/sched_main.c | 2 +- include/drm/gpu_scheduler.h | 21 +- 3 files changed, 26 insertions(+), 25 deletions(-) diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c index b72cba292839..c013c2b49aa5 100644 --- a/drivers/gpu/drm/scheduler/sched_entity.c +++ b/drivers/gpu/drm/scheduler/sched_entity.c @@ -105,7 +105,7 @@ int drm_sched_entity_init(struct drm_sched_entity *entity, /* We start in an idle state. 
*/ complete_all(&entity->entity_idle); - spin_lock_init(&entity->rq_lock); + spin_lock_init(&entity->lock); spsc_queue_init(&entity->job_queue); atomic_set(&entity->fence_seq, 0); @@ -133,10 +133,10 @@ void drm_sched_entity_modify_sched(struct drm_sched_entity *entity, { WARN_ON(!num_sched_list || !sched_list); - spin_lock(&entity->rq_lock); + spin_lock(&entity->lock); entity->sched_list = sched_list; entity->num_sched_list = num_sched_list; - spin_unlock(&entity->rq_lock); + spin_unlock(&entity->lock); } EXPORT_SYMBOL(drm_sched_entity_modify_sched); @@ -244,10 +244,10 @@ static void drm_sched_entity_kill(struct drm_sched_entity *entity) if (!entity->rq) return; - spin_lock(&entity->rq_lock); + spin_lock(&entity->lock); entity->stopped = true; drm_sched_rq_remove_entity(entity->rq, entity); - spin_unlock(&entity->rq_lock); + spin_unlock(&entity->lock); /* Make sure this entity is not used by the scheduler at the moment */ wait_for_completion(&entity->entity_idle); @@ -396,9 +396,9 @@ static void drm_sched_entity_wakeup(struct dma_fence *f, void drm_sched_entity_set_priority(struct drm_sched_entity *entity, enum drm_sched_priority priority) { - spin_lock(&entity->rq_lock); + spin_lock(&entity->lock); entity->priority = priority; - spin_unlock(&entity->rq_lock); + spin_unlock(&entity->lock); } EXPORT_SYMBOL(drm_sched_entity_set_priority); @@ -515,10 +515,10 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity) next = to_drm_sched_job(spsc_queue_peek(&entity->job_queue)); if (next) { - spin_lock(&entity->rq_lock); + spin_lock(&entity->lock); drm_sched_rq_update_fifo_locked(entity, next->submit_ts); - spin_unlock(&entity->rq_lock); + spin_unlock(&entity->lock); } } @@ -559,14 +559,14 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity) if (fence && !dma_fence_is_signaled(fence)) return; - spin_lock(&entity->rq_lock); + spin_lock(&entity->lock); sched = drm_sched_pick_best(entity->sched_list, entity->num_sched_list); rq = 
sched ? sched->sched_rq[entity->priority] : NULL; if (rq != entity->rq) { drm_sched_rq_remove_entity(entity->rq, entity); entity->rq = rq; } - spin_unlock(&entity->rq_lock); + spin_unlock(&entity->lock); if (entity->num_sched_list == 1) entity->sched_list = NULL; @@ -605,9 +605,9 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) struct drm_sched_rq *rq; /* Add the entity to the run queue */ - spin_lock(&entity->rq_lock); + spin_lock(&entity->lock); if (entity->stopped) { - spin_unlock(&entity->rq_lock); + spin_unlock(&entity->lock); DRM_ERROR("Trying to push t
Re: [RFC PATCH 1/2] drm/drm_file: Add display of driver's internal memory size
On 15/10/2024 20:05, Adrián Larumbe wrote: Hi Tvrtko, On 10.10.2024 10:50, Tvrtko Ursulin wrote: On 09/10/2024 23:55, Adrián Larumbe wrote: Hi Tvrtko, On 04.10.2024 14:41, Tvrtko Ursulin wrote: Hi Adrian, On 03/10/2024 00:45, Adrián Larumbe wrote: Some drivers must allocate a considerable amount of memory for bookkeeping structures and GPU's MCU-kernel shared communication regions. These are often created as a result of the invocation of the driver's ioctl() interface functions, so it is sensible to consider them as being owned by the render context associated with an open drm file. However, at the moment drm_show_memory_stats only traverses the UM-exposed drm objects for which a handle exists. Private driver objects and memory regions, though connected to a render context, are unaccounted for in their fdinfo numbers. Add a new drm_memory_stats 'internal' memory category. Because deciding what constitutes an 'internal' object and where to find these are driver-dependent, calculation of this size must be done through a driver-provided function pointer, which becomes the third argument of drm_show_memory_stats. Drivers which have no interest in exposing the size of internal memory objects can keep passing NULL for unaltered behaviour. 
Signed-off-by: Adrián Larumbe Cc: Rob Clark Cc: Tvrtko Ursulin Cc: Lucas De Marchi --- drivers/gpu/drm/drm_file.c | 6 +- drivers/gpu/drm/msm/msm_drv.c | 2 +- drivers/gpu/drm/panfrost/panfrost_drv.c | 2 +- drivers/gpu/drm/v3d/v3d_drv.c | 2 +- include/drm/drm_file.h | 7 ++- 5 files changed, 14 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c index ad1dc638c83b..937471339c9a 100644 --- a/drivers/gpu/drm/drm_file.c +++ b/drivers/gpu/drm/drm_file.c @@ -856,6 +856,7 @@ void drm_print_memory_stats(struct drm_printer *p, print_size(p, "total", region, stats->private + stats->shared); print_size(p, "shared", region, stats->shared); print_size(p, "active", region, stats->active); + print_size(p, "internal", region, stats->internal); if (supported_status & DRM_GEM_OBJECT_RESIDENT) print_size(p, "resident", region, stats->resident); @@ -873,7 +874,7 @@ EXPORT_SYMBOL(drm_print_memory_stats); * Helper to iterate over GEM objects with a handle allocated in the specified * file. 
*/ -void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file) +void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file, internal_bos func) { struct drm_gem_object *obj; struct drm_memory_stats status = {}; @@ -919,6 +920,9 @@ void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file) } spin_unlock(&file->table_lock); + if (func) + func(&status, file); + drm_print_memory_stats(p, &status, supported_status, "memory"); } EXPORT_SYMBOL(drm_show_memory_stats); diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c index edbc1ab0fbc8..2b3feb79afc4 100644 --- a/drivers/gpu/drm/msm/msm_drv.c +++ b/drivers/gpu/drm/msm/msm_drv.c @@ -880,7 +880,7 @@ static void msm_show_fdinfo(struct drm_printer *p, struct drm_file *file) msm_gpu_show_fdinfo(priv->gpu, file->driver_priv, p); - drm_show_memory_stats(p, file); + drm_show_memory_stats(p, file, NULL); } static const struct file_operations fops = { diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c index 04d615df5259..aaa8602bf00d 100644 --- a/drivers/gpu/drm/panfrost/panfrost_drv.c +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c @@ -609,7 +609,7 @@ static void panfrost_show_fdinfo(struct drm_printer *p, struct drm_file *file) panfrost_gpu_show_fdinfo(pfdev, file->driver_priv, p); - drm_show_memory_stats(p, file); + drm_show_memory_stats(p, file, NULL); } static const struct file_operations panfrost_drm_driver_fops = { diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c index fb35c5c3f1a7..314e77c67972 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.c +++ b/drivers/gpu/drm/v3d/v3d_drv.c @@ -195,7 +195,7 @@ static void v3d_show_fdinfo(struct drm_printer *p, struct drm_file *file) v3d_queue_to_string(queue), jobs_completed); } - drm_show_memory_stats(p, file); + drm_show_memory_stats(p, file, NULL); } static const struct file_operations v3d_drm_fops = { diff --git a/include/drm/drm_file.h 
b/include/drm/drm_file.h index 8c0030c77308..661d00d5350e 100644 --- a/include/drm/drm_file.h +++ b/include/drm/drm_file.h @@ -469,6 +469,7 @@ void drm_send_event_timestamp_locked(struct drm_device *dev, * @resident: Total size of GE
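The optional-callback arrangement described in the commit message above can be sketched in plain userspace C. This is only an illustration under stated assumptions: `toy_file`, `toy_stats`, `toy_internal_fn`, `toy_show_memory_stats()` and `toy_driver_internal()` are hypothetical names invented here, not the DRM types or the patch's actual `internal_bos` definition. It shows a core helper that accumulates the handle-visible total itself and lets the driver add its "internal" figure only when a non-NULL function pointer is passed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are not the DRM structures. */
struct toy_file {
	uint64_t handle_bytes;		/* memory reachable through handles */
	uint64_t driver_private_bytes;	/* bookkeeping owned by the driver */
};

struct toy_stats {
	uint64_t total;
	uint64_t internal;
};

/* Driver-provided hook; passing NULL keeps the previous behaviour. */
typedef void (*toy_internal_fn)(struct toy_stats *stats,
				struct toy_file *file);

/* Core helper: counts handle-visible memory, then asks the driver. */
static struct toy_stats toy_show_memory_stats(struct toy_file *file,
					      toy_internal_fn func)
{
	struct toy_stats stats = { 0 };

	stats.total = file->handle_bytes;

	if (func)
		func(&stats, file);

	return stats;
}

/* Example driver hook: only the driver knows where its objects live. */
static void toy_driver_internal(struct toy_stats *stats,
				struct toy_file *file)
{
	stats->internal = file->driver_private_bytes;
}
```

A driver without internal objects would call `toy_show_memory_stats(&file, NULL)` and see `internal` stay at zero, which mirrors the "keep passing NULL for unaltered behaviour" wording in the commit message.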
Re: [PATCH 1/5] drm/sched: Optimise drm_sched_entity_push_job
On 15/10/2024 15:00, Philipp Stanner wrote: On Tue, 2024-10-15 at 14:14 +0100, Tvrtko Ursulin wrote: On 15/10/2024 12:38, Philipp Stanner wrote: On Tue, 2024-10-15 at 09:12 +0100, Tvrtko Ursulin wrote: On 15/10/2024 08:11, Philipp Stanner wrote: On Mon, 2024-10-14 at 13:07 +0100, Tvrtko Ursulin wrote: On 14/10/2024 12:32, Philipp Stanner wrote: Hi, On Mon, 2024-10-14 at 11:46 +0100, Tvrtko Ursulin wrote: From: Tvrtko Ursulin In FIFO mode We can avoid dropping the lock only to immediately re- acquire by adding a new drm_sched_rq_update_fifo_locked() helper. Please write detailed commit messages, as described here [1]. 1. Describe the problem: current state and why it's bad. 2. Then, describe in imperative (present tense) form what the commit does about the problem. Both pieces of info are already there: 1. Drops the lock to immediately re-acquire it. 2. We avoid that by by adding a locked helper. Optionally, in between can be information about why it's solved this way and not another etc. Applies to the other patches, too. [1] https://www.kernel.org/doc/html/latest/process/submitting-patches.html#describe-your-changes Thanks I am new here and did not know this. Seriosuly, lets not be too blindly strict about this because it can get IMO ridiculous. One example when I previously accomodated your request is patch 3/5 from this series: """ Current kerneldoc for struct drm_sched_rq incompletely documents what fields are protected by the lock. This is not good because it is misleading. Lets fix it by listing all the elements which are protected by the lock. """ While this was the original commit text you weren't happy with: """ drm/sched: Re-order struct drm_sched_rq members for clarity Lets re-order the members to make it clear which are protected by the lock and at the same time document it via kerneldoc. """ I maintain the original text was passable. On top, this was just a respin to accomodate the merge process. 
All approvals were done and dusted couple weeks or so ago so asking for yet another respin for such trivial objections is not great. I understand that you're unhappy, but please understand the position I'm coming from. As you know, since you sent these patches within a different series (and, thus, since I reviewed them), I was trusted with co-maintaining this piece of shared infrastructure. And since you've worked on it a bit now, I suppose you also know that the GPU Scheduler is arguably in quite a bad shape, has far too little documentation, has leaks, maybe race conditions, parts *where the locking rules are unclear* and is probably only fully understood by a small hand full of people. I also argue that this is a *very* complicated piece of software. We already went over that and agreed. Not least I agreed the base is shaky since few years ago. :) Btw if things align, I hope you will at some point see a follow up series from me which makes some significant simplifications and improvements at the same time. Cool, good to hear! (Would be even cooler if simplifications and improvements can be delivered through separate patch series to be easier to review etc.) Yes, when I spot something I pull it ahead and/or standalone when it makes sense. But it is early days and a big job. So I might be or appear to be a bit pedantic, but I'm not doing that to terrorize you, but because I want this thing to become well documented, understandable, and bisectable. Working towards a canonical, idiot- proof commit style is one measure that will help with that. I want to offer you the following: I can be more relaxed with things universally recognized as trivial (comment changes, struct member reordering) – but when something like a lock is touched in any way, we shall document that in the commit message as canonically as possible, so someone who's less experienced and just bisected the commit immediately understands what has been done (or rather: was supposed to be done). 
So how would you suggest to expand this commit text so it doesn't read too self-repeating? My issue with this particular commit message is mainly that it doesn't make it obvious what the patch is supposed to do. So one can make it quicker and better to review by detailing it a bit more, so the reviewer then can compare commit message vs. what the code does. It seems to me for example that the actual optimization is being done in drm_sched_entity_push_job(), and drm_sched_entity_pop_job() had to be ported, too, for correctness "It seems" aka the commit title says so. ;) Another small thing that might be cool is something that makes it a bit more obvious that this is an optimization, not a fix. So I would probably write: "So far, drm_sched_rq_update_fifo() automatically takes drm_sched_entity.rq_lock. For DRM_SCHED_POLICY_FIFO, this is ineffic
Re: [PATCH 4/5] drm/sched: Re-group and rename the entity run-queue lock
On 15/10/2024 12:56, Philipp Stanner wrote: On Mon, 2024-10-14 at 11:46 +0100, Tvrtko Ursulin wrote: From: Tvrtko Ursulin Christian suggested to rename the lock and improve the documentation Let's move it to Annotators: Suggested-by: Christian König Ack. (Otherwise some time in the future a Christian Kaiser might start working on the scheduler on steal the praise ^^) of what it protects. So without Christian's name here I'd phrase it as: "When writing to a drm_sched_entity's run-queue, writers are protected through the lock drm_sched_entity.rq_lock. This naming, however, frequently collides with the separate internal lock of struct drm_sched_rq, resulting in uses like this: spin_lock(&entity->rq_lock); spin_lock(&entity->rq->lock); Rename drm_sched_entity.rq_lock to improve readability. While at it, re-order that struct's members to make it more obvious what the lock protects. Will copy&paste - thanks for typing it out. And to also re-order the structure members so all protected by the lock are together in a block. Signed-off-by: Tvrtko Ursulin Cc: Christian König Cc: Alex Deucher Cc: Luben Tuikov Cc: Matthew Brost Cc: Philipp Stanner Reviewed-by: Christian König --- drivers/gpu/drm/scheduler/sched_entity.c | 28 -- -- drivers/gpu/drm/scheduler/sched_main.c | 2 +- include/drm/gpu_scheduler.h | 15 +++-- 3 files changed, 23 insertions(+), 22 deletions(-) diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c index b72cba292839..c013c2b49aa5 100644 --- a/drivers/gpu/drm/scheduler/sched_entity.c +++ b/drivers/gpu/drm/scheduler/sched_entity.c @@ -105,7 +105,7 @@ int drm_sched_entity_init(struct drm_sched_entity *entity, /* We start in an idle state. 
*/ complete_all(&entity->entity_idle); - spin_lock_init(&entity->rq_lock); + spin_lock_init(&entity->lock); spsc_queue_init(&entity->job_queue); atomic_set(&entity->fence_seq, 0); @@ -133,10 +133,10 @@ void drm_sched_entity_modify_sched(struct drm_sched_entity *entity, { WARN_ON(!num_sched_list || !sched_list); - spin_lock(&entity->rq_lock); + spin_lock(&entity->lock); entity->sched_list = sched_list; entity->num_sched_list = num_sched_list; - spin_unlock(&entity->rq_lock); + spin_unlock(&entity->lock); } EXPORT_SYMBOL(drm_sched_entity_modify_sched); @@ -244,10 +244,10 @@ static void drm_sched_entity_kill(struct drm_sched_entity *entity) if (!entity->rq) return; - spin_lock(&entity->rq_lock); + spin_lock(&entity->lock); entity->stopped = true; drm_sched_rq_remove_entity(entity->rq, entity); - spin_unlock(&entity->rq_lock); + spin_unlock(&entity->lock); /* Make sure this entity is not used by the scheduler at the moment */ wait_for_completion(&entity->entity_idle); @@ -396,9 +396,9 @@ static void drm_sched_entity_wakeup(struct dma_fence *f, void drm_sched_entity_set_priority(struct drm_sched_entity *entity, enum drm_sched_priority priority) { - spin_lock(&entity->rq_lock); + spin_lock(&entity->lock); entity->priority = priority; - spin_unlock(&entity->rq_lock); + spin_unlock(&entity->lock); } EXPORT_SYMBOL(drm_sched_entity_set_priority); @@ -515,10 +515,10 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity) next = to_drm_sched_job(spsc_queue_peek(&entity- job_queue)); if (next) { - spin_lock(&entity->rq_lock); + spin_lock(&entity->lock); drm_sched_rq_update_fifo_locked(entity, next- submit_ts); - spin_unlock(&entity->rq_lock); + spin_unlock(&entity->lock); } } @@ -559,14 +559,14 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity) if (fence && !dma_fence_is_signaled(fence)) return; - spin_lock(&entity->rq_lock); + spin_lock(&entity->lock); sched = drm_sched_pick_best(entity->sched_list, entity- num_sched_list); rq = 
sched ? sched->sched_rq[entity->priority] : NULL; if (rq != entity->rq) { drm_sched_rq_remove_entity(entity->rq, entity); entity->rq = rq; } - spin_unlock(&entity->rq_lock); + spin_unlock(&entity->lock); if (entity->num_sched_list == 1) entity->sched_list = NULL; @@ -60
Re: [PATCH 1/5] drm/sched: Optimise drm_sched_entity_push_job
On 15/10/2024 12:38, Philipp Stanner wrote: On Tue, 2024-10-15 at 09:12 +0100, Tvrtko Ursulin wrote: On 15/10/2024 08:11, Philipp Stanner wrote: On Mon, 2024-10-14 at 13:07 +0100, Tvrtko Ursulin wrote: On 14/10/2024 12:32, Philipp Stanner wrote: Hi, On Mon, 2024-10-14 at 11:46 +0100, Tvrtko Ursulin wrote: From: Tvrtko Ursulin In FIFO mode We can avoid dropping the lock only to immediately re- acquire by adding a new drm_sched_rq_update_fifo_locked() helper. Please write detailed commit messages, as described here [1]. 1. Describe the problem: current state and why it's bad. 2. Then, describe in imperative (present tense) form what the commit does about the problem. Both pieces of info are already there: 1. Drops the lock to immediately re-acquire it. 2. We avoid that by by adding a locked helper. Optionally, in between can be information about why it's solved this way and not another etc. Applies to the other patches, too. [1] https://www.kernel.org/doc/html/latest/process/submitting-patches.html#describe-your-changes Thanks I am new here and did not know this. Seriosuly, lets not be too blindly strict about this because it can get IMO ridiculous. One example when I previously accomodated your request is patch 3/5 from this series: """ Current kerneldoc for struct drm_sched_rq incompletely documents what fields are protected by the lock. This is not good because it is misleading. Lets fix it by listing all the elements which are protected by the lock. """ While this was the original commit text you weren't happy with: """ drm/sched: Re-order struct drm_sched_rq members for clarity Lets re-order the members to make it clear which are protected by the lock and at the same time document it via kerneldoc. """ I maintain the original text was passable. On top, this was just a respin to accomodate the merge process. All approvals were done and dusted couple weeks or so ago so asking for yet another respin for such trivial objections is not great. 
I understand that you're unhappy, but please understand the position I'm coming from. As you know, since you sent these patches within a different series (and, thus, since I reviewed them), I was trusted with co-maintaining this piece of shared infrastructure. And since you've worked on it a bit now, I suppose you also know that the GPU Scheduler is arguably in quite a bad shape, has far too little documentation, has leaks, maybe race conditions, parts *where the locking rules are unclear* and is probably only fully understood by a small hand full of people. I also argue that this is a *very* complicated piece of software. We already went over that and agreed. Not least I agreed the base is shaky since few years ago. :) Btw if things align, I hope you will at some point see a follow up series from me which makes some significant simplifications and improvements at the same time. Cool, good to hear! (Would be even cooler if simplifications and improvements can be delivered through separate patch series to be easier to review etc.) Yes, when I spot something I pull it ahead and/or standalone when it makes sense. But it is early days and a big job. So I might be or appear to be a bit pedantic, but I'm not doing that to terrorize you, but because I want this thing to become well documented, understandable, and bisectable. Working towards a canonical, idiot- proof commit style is one measure that will help with that. I want to offer you the following: I can be more relaxed with things universally recognized as trivial (comment changes, struct member reordering) – but when something like a lock is touched in any way, we shall document that in the commit message as canonically as possible, so someone who's less experienced and just bisected the commit immediately understands what has been done (or rather: was supposed to be done). So how would you suggest to expand this commit text so it doesn't read too self-repeating? 
My issue with this particular commit message is mainly that it doesn't make it obvious what the patch is supposed to do. So one can make it quicker and better to review by detailing it a bit more, so the reviewer then can compare commit message vs. what the code does. It seems to me for example that the actual optimization is being done in drm_sched_entity_push_job(), and drm_sched_entity_pop_job() had to be ported, too, for correctness "It seems" aka the commit title says so. ;) Another small thing that might be cool is something that makes it a bit more obvious that this is an optimization, not a fix. So I would probably write: "So far, drm_sched_rq_update_fifo() automatically takes drm_sched_entity.rq_lock. For DRM_SCHED_POLICY_FIFO, this is inefficient because that lock is then taken, released and retaken in drm_sched_entity_push_job(). Improve p
Re: [PATCH 1/5] drm/sched: Optimise drm_sched_entity_push_job
On 15/10/2024 08:11, Philipp Stanner wrote:
> On Mon, 2024-10-14 at 13:07 +0100, Tvrtko Ursulin wrote:
>> On 14/10/2024 12:32, Philipp Stanner wrote:
>>> Hi,
>>>
>>> On Mon, 2024-10-14 at 11:46 +0100, Tvrtko Ursulin wrote:
>>>> From: Tvrtko Ursulin
>>>>
>>>> In FIFO mode we can avoid dropping the lock only to immediately
>>>> re-acquire by adding a new drm_sched_rq_update_fifo_locked() helper.
>>>
>>> Please write detailed commit messages, as described here [1].
>>> 1. Describe the problem: current state and why it's bad.
>>> 2. Then, describe in imperative (present tense) form what the commit
>>>    does about the problem.
>>
>> Both pieces of info are already there:
>>
>> 1. Drops the lock to immediately re-acquire it.
>> 2. We avoid that by adding a locked helper.
>>
>>> Optionally, in between can be information about why it's solved this
>>> way and not another etc.
>>>
>>> Applies to the other patches, too.
>>>
>>> [1] https://www.kernel.org/doc/html/latest/process/submitting-patches.html#describe-your-changes
>>
>> Thanks I am new here and did not know this.
>>
>> Seriously, let's not be too blindly strict about this because it can get
>> IMO ridiculous.
>>
>> One example when I previously accommodated your request is patch 3/5 from
>> this series:
>>
>> """
>> Current kerneldoc for struct drm_sched_rq incompletely documents what
>> fields are protected by the lock.
>>
>> This is not good because it is misleading.
>>
>> Lets fix it by listing all the elements which are protected by the lock.
>> """
>>
>> While this was the original commit text you weren't happy with:
>>
>> """
>> drm/sched: Re-order struct drm_sched_rq members for clarity
>>
>> Lets re-order the members to make it clear which are protected by the
>> lock and at the same time document it via kerneldoc.
>> """
>>
>> I maintain the original text was passable.
>>
>> On top, this was just a respin to accommodate the merge process. All
>> approvals were done and dusted a couple of weeks or so ago, so asking for
>> yet another respin for such trivial objections is not great.
>
> I understand that you're unhappy, but please understand the position
> I'm coming from. As you know, since you sent these patches within a
> different series (and, thus, since I reviewed them), I was trusted with
> co-maintaining this piece of shared infrastructure.
>
> And since you've worked on it a bit now, I suppose you also know that
> the GPU Scheduler is arguably in quite a bad shape, has far too little
> documentation, has leaks, maybe race conditions, parts *where the
> locking rules are unclear* and is probably only fully understood by a
> small handful of people. I also argue that this is a *very*
> complicated piece of software.

We already went over that and agreed. Not least I agreed the base has been
shaky since a few years ago. :)

Btw if things align, I hope you will at some point see a follow-up series
from me which makes some significant simplifications and improvements at
the same time.

> So I might be or appear to be a bit pedantic, but I'm not doing that to
> terrorize you, but because I want this thing to become well documented,
> understandable, and bisectable. Working towards a canonical,
> idiot-proof commit style is one measure that will help with that.
>
> I want to offer you the following: I can be more relaxed with things
> universally recognized as trivial (comment changes, struct member
> reordering) – but when something like a lock is touched in any way, we
> shall document that in the commit message as canonically as possible,
> so someone who's less experienced and just bisected the commit
> immediately understands what has been done (or rather: was supposed to
> be done).

So how would you suggest to expand this commit text so it doesn't read too
self-repeating?

Regards,

Tvrtko
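For readers following the thread, the change being argued over (stop dropping the entity lock only to take it again inside the FIFO update) can be sketched in userspace C. This is a rough illustration, not the scheduler code: `toy_entity`, the `toy_*` functions, the C11 `atomic_flag` spinlock and the `held` flag standing in for `lockdep_assert_held()` are all assumptions made for this sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a scheduler entity; not the kernel struct. */
struct toy_entity {
	atomic_flag lock;	/* plays the role of entity->rq_lock */
	bool held;		/* crude substitute for lockdep_assert_held() */
	int acquisitions;	/* how many times the lock has been taken */
	long fifo_ts;		/* timestamp ordering the entity in FIFO mode */
};

static void toy_entity_init(struct toy_entity *e)
{
	atomic_flag_clear(&e->lock);
	e->held = false;
	e->acquisitions = 0;
	e->fifo_ts = 0;
}

static void toy_lock(struct toy_entity *e)
{
	while (atomic_flag_test_and_set(&e->lock))
		;	/* spin until the flag was previously clear */
	e->held = true;
	e->acquisitions++;
}

static void toy_unlock(struct toy_entity *e)
{
	e->held = false;
	atomic_flag_clear(&e->lock);
}

/* Old-style helper: takes and releases the lock by itself. */
static void toy_update_fifo(struct toy_entity *e, long ts)
{
	toy_lock(e);
	e->fifo_ts = ts;
	toy_unlock(e);
}

/* New-style helper: the caller must already hold the lock. */
static void toy_update_fifo_locked(struct toy_entity *e, long ts)
{
	assert(e->held);
	e->fifo_ts = ts;
}

/* Before: unlock, then the helper immediately locks again. */
static void toy_push_job_before(struct toy_entity *e, long ts)
{
	toy_lock(e);
	/* ... add the entity to its run queue ... */
	toy_unlock(e);

	toy_update_fifo(e, ts);	/* second acquisition */
}

/* After: a single critical section covers both steps. */
static void toy_push_job_after(struct toy_entity *e, long ts)
{
	toy_lock(e);
	/* ... add the entity to its run queue ... */
	toy_update_fifo_locked(e, ts);
	toy_unlock(e);
}
```

The `acquisitions` counter exists only to make the difference observable: the "before" path takes the lock twice per push, the "after" path once. The assertion in the `_locked` helper is the kind of locking requirement a commit message or doc string would be expected to spell out.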
Re: [PATCH 1/5] drm/sched: Optimise drm_sched_entity_push_job
On 14/10/2024 12:32, Philipp Stanner wrote: Hi, On Mon, 2024-10-14 at 11:46 +0100, Tvrtko Ursulin wrote: From: Tvrtko Ursulin In FIFO mode We can avoid dropping the lock only to immediately re- acquire by adding a new drm_sched_rq_update_fifo_locked() helper. Please write detailed commit messages, as described here [1]. 1. Describe the problem: current state and why it's bad. 2. Then, describe in imperative (present tense) form what the commit does about the problem. Both pieces of info are already there: 1. Drops the lock to immediately re-acquire it. 2. We avoid that by by adding a locked helper. Optionally, in between can be information about why it's solved this way and not another etc. Applies to the other patches, too. [1] https://www.kernel.org/doc/html/latest/process/submitting-patches.html#describe-your-changes Thanks I am new here and did not know this. Seriosuly, lets not be too blindly strict about this because it can get IMO ridiculous. One example when I previously accomodated your request is patch 3/5 from this series: """ Current kerneldoc for struct drm_sched_rq incompletely documents what fields are protected by the lock. This is not good because it is misleading. Lets fix it by listing all the elements which are protected by the lock. """ While this was the original commit text you weren't happy with: """ drm/sched: Re-order struct drm_sched_rq members for clarity Lets re-order the members to make it clear which are protected by the lock and at the same time document it via kerneldoc. """ I maintain the original text was passable. On top, this was just a respin to accomodate the merge process. All approvals were done and dusted couple weeks or so ago so asking for yet another respin for such trivial objections is not great. Regards, Tvrtko v2: * Remove drm_sched_rq_update_fifo() altogether. 
(Christian) Signed-off-by: Tvrtko Ursulin Cc: Christian König Cc: Alex Deucher Cc: Luben Tuikov Cc: Matthew Brost Cc: Philipp Stanner Reviewed-by: Christian König --- drivers/gpu/drm/scheduler/sched_entity.c | 13 + drivers/gpu/drm/scheduler/sched_main.c | 6 +++--- include/drm/gpu_scheduler.h | 2 +- 3 files changed, 13 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c index 2951fcc2e6b1..b72cba292839 100644 --- a/drivers/gpu/drm/scheduler/sched_entity.c +++ b/drivers/gpu/drm/scheduler/sched_entity.c @@ -514,8 +514,12 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity) struct drm_sched_job *next; next = to_drm_sched_job(spsc_queue_peek(&entity->job_queue)); - if (next) - drm_sched_rq_update_fifo(entity, next->submit_ts); + if (next) { + spin_lock(&entity->rq_lock); + drm_sched_rq_update_fifo_locked(entity, + next->submit_ts); + spin_unlock(&entity->rq_lock); + } } /* Jobs and entities might have different lifecycles. 
Since we're @@ -613,10 +617,11 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) sched = rq->sched; drm_sched_rq_add_entity(rq, entity); - spin_unlock(&entity->rq_lock); if (drm_sched_policy == DRM_SCHED_POLICY_FIFO) - drm_sched_rq_update_fifo(entity, submit_ts); + drm_sched_rq_update_fifo_locked(entity, submit_ts); + + spin_unlock(&entity->rq_lock); drm_sched_wakeup(sched); } diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index e32b0f7d7e94..bbd1630407e4 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -169,14 +169,15 @@ static inline void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *enti } } -void drm_sched_rq_update_fifo(struct drm_sched_entity *entity, ktime_t ts) +void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, ktime_t ts) Since you touch function name / signature already, would you mind writing a small doc string that also mentions the locking requirements or lack of the same? { /* * Both locks need to be grabbed, one to protect from entity->rq change * for entity from within concurrent drm_sched_entity_select_rq and the * other to update the rb tree structure. */ It seems to me that the comment above is now out of date, no? Thx for your efforts, P. - spin_lock(&entity->rq_lock); + lockdep_assert_held(&entity->rq_lock); + spin_lock(&entity->rq->lock); drm_sched_rq_remove_fifo_locked(entity); @@ -187,7 +188,6 @@ void drm_sched_rq_update_fifo(struct drm_sched_entity *entity, ktime_t ts) drm_sched_entity_compare_before); spin_unlock(&entity->rq->lock); - spin_unlock(&entity->rq_lock); } /** diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_
[PATCH 3/5] drm/sched: Re-order struct drm_sched_rq members for clarity
From: Tvrtko Ursulin

Current kerneldoc for struct drm_sched_rq incompletely documents what
fields are protected by the lock.

This is not good because it is misleading.

Lets fix it by listing all the elements which are protected by the lock.

While at it, lets also re-order the members so all protected by the lock
are in a single group.

v2:
 * Refer variables by kerneldoc syntax, more verbose commit text. (Philipp)

Signed-off-by: Tvrtko Ursulin
Cc: Christian König
Cc: Alex Deucher
Cc: Luben Tuikov
Cc: Matthew Brost
Cc: Philipp Stanner
Reviewed-by: Christian König
Reviewed-by: Philipp Stanner
---
 include/drm/gpu_scheduler.h | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 3658a6cb048e..b6d095074c19 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -243,10 +243,10 @@ struct drm_sched_entity {
 /**
  * struct drm_sched_rq - queue of entities to be scheduled.
  *
- * @lock: to modify the entities list.
  * @sched: the scheduler to which this rq belongs to.
- * @entities: list of the entities to be scheduled.
+ * @lock: protects @entities, @rb_tree_root and @current_entity.
  * @current_entity: the entity which is to be scheduled.
+ * @entities: list of the entities to be scheduled.
  * @rb_tree_root: root of time based priority queue of entities for FIFO scheduling
  *
  * Run queue is a set of entities scheduling command submissions for
@@ -254,10 +254,12 @@ struct drm_sched_entity {
  * the next entity to emit commands from.
  */
 struct drm_sched_rq {
-	spinlock_t			lock;
 	struct drm_gpu_scheduler	*sched;
-	struct list_head		entities;
+
+	spinlock_t			lock;
+	/* Following members are protected by the @lock: */
 	struct drm_sched_entity		*current_entity;
+	struct list_head		entities;
 	struct rb_root_cached		rb_tree_root;
 };
 
-- 
2.46.0
[PATCH 4/5] drm/sched: Re-group and rename the entity run-queue lock
From: Tvrtko Ursulin Christian suggested to rename the lock and improve the documentation of what it protects. And to also re-order the structure members so all protected by the lock are together in a block. Signed-off-by: Tvrtko Ursulin Cc: Christian König Cc: Alex Deucher Cc: Luben Tuikov Cc: Matthew Brost Cc: Philipp Stanner Reviewed-by: Christian König --- drivers/gpu/drm/scheduler/sched_entity.c | 28 drivers/gpu/drm/scheduler/sched_main.c | 2 +- include/drm/gpu_scheduler.h | 15 +++-- 3 files changed, 23 insertions(+), 22 deletions(-) diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c index b72cba292839..c013c2b49aa5 100644 --- a/drivers/gpu/drm/scheduler/sched_entity.c +++ b/drivers/gpu/drm/scheduler/sched_entity.c @@ -105,7 +105,7 @@ int drm_sched_entity_init(struct drm_sched_entity *entity, /* We start in an idle state. */ complete_all(&entity->entity_idle); - spin_lock_init(&entity->rq_lock); + spin_lock_init(&entity->lock); spsc_queue_init(&entity->job_queue); atomic_set(&entity->fence_seq, 0); @@ -133,10 +133,10 @@ void drm_sched_entity_modify_sched(struct drm_sched_entity *entity, { WARN_ON(!num_sched_list || !sched_list); - spin_lock(&entity->rq_lock); + spin_lock(&entity->lock); entity->sched_list = sched_list; entity->num_sched_list = num_sched_list; - spin_unlock(&entity->rq_lock); + spin_unlock(&entity->lock); } EXPORT_SYMBOL(drm_sched_entity_modify_sched); @@ -244,10 +244,10 @@ static void drm_sched_entity_kill(struct drm_sched_entity *entity) if (!entity->rq) return; - spin_lock(&entity->rq_lock); + spin_lock(&entity->lock); entity->stopped = true; drm_sched_rq_remove_entity(entity->rq, entity); - spin_unlock(&entity->rq_lock); + spin_unlock(&entity->lock); /* Make sure this entity is not used by the scheduler at the moment */ wait_for_completion(&entity->entity_idle); @@ -396,9 +396,9 @@ static void drm_sched_entity_wakeup(struct dma_fence *f, void drm_sched_entity_set_priority(struct drm_sched_entity 
*entity, enum drm_sched_priority priority) { - spin_lock(&entity->rq_lock); + spin_lock(&entity->lock); entity->priority = priority; - spin_unlock(&entity->rq_lock); + spin_unlock(&entity->lock); } EXPORT_SYMBOL(drm_sched_entity_set_priority); @@ -515,10 +515,10 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity) next = to_drm_sched_job(spsc_queue_peek(&entity->job_queue)); if (next) { - spin_lock(&entity->rq_lock); + spin_lock(&entity->lock); drm_sched_rq_update_fifo_locked(entity, next->submit_ts); - spin_unlock(&entity->rq_lock); + spin_unlock(&entity->lock); } } @@ -559,14 +559,14 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity) if (fence && !dma_fence_is_signaled(fence)) return; - spin_lock(&entity->rq_lock); + spin_lock(&entity->lock); sched = drm_sched_pick_best(entity->sched_list, entity->num_sched_list); rq = sched ? sched->sched_rq[entity->priority] : NULL; if (rq != entity->rq) { drm_sched_rq_remove_entity(entity->rq, entity); entity->rq = rq; } - spin_unlock(&entity->rq_lock); + spin_unlock(&entity->lock); if (entity->num_sched_list == 1) entity->sched_list = NULL; @@ -605,9 +605,9 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) struct drm_sched_rq *rq; /* Add the entity to the run queue */ - spin_lock(&entity->rq_lock); + spin_lock(&entity->lock); if (entity->stopped) { - spin_unlock(&entity->rq_lock); + spin_unlock(&entity->lock); DRM_ERROR("Trying to push to a killed entity\n"); return; @@ -621,7 +621,7 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) if (drm_sched_policy == DRM_SCHED_POLICY_FIFO) drm_sched_rq_update_fifo_locked(entity, submit_ts); - spin_unlock(&entity->rq_lock); + spin_unlock(&entity->lock);
[PATCH 2/5] drm/sched: Stop setting current entity in FIFO mode
From: Tvrtko Ursulin It does not seem there is a need to set the current entity in FIFO mode since it only serves as a "cursor" in round-robin mode. Even if scheduling mode is changed at runtime the change in behaviour is simply to restart from the first entity, instead of continuing in RR mode from where FIFO left it, and that sounds completely fine. Signed-off-by: Tvrtko Ursulin Cc: Christian König Cc: Alex Deucher Cc: Luben Tuikov Cc: Matthew Brost Cc: Philipp Stanner Acked-by: Christian König Reviewed-by: Philipp Stanner --- drivers/gpu/drm/scheduler/sched_main.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index bbd1630407e4..07ee386b8e4b 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -355,7 +355,6 @@ drm_sched_rq_select_entity_fifo(struct drm_gpu_scheduler *sched, return ERR_PTR(-ENOSPC); } - rq->current_entity = entity; reinit_completion(&entity->entity_idle); break; } -- 2.46.0
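As a rough illustration of why the cursor is irrelevant in FIFO mode, here is a minimal user-space sketch. It is not the scheduler code: `model_rq`, `model_select_rr()` and `model_select_fifo()` are hypothetical names, and the run queue is reduced to a fixed array. Round-robin selection reads and advances the cursor, while FIFO selection only compares timestamps.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NUM_ENTITIES 3

/* Hypothetical stand-in for an entity: only the FIFO timestamp matters. */
struct model_entity {
	long oldest_job_waiting;
};

/* Hypothetical stand-in for a run queue with a round-robin cursor. */
struct model_rq {
	struct model_entity entities[MODEL_NUM_ENTITIES];
	int current_entity; /* index of the last RR pick, -1 if none yet */
};

/* Round-robin: continue after the cursor and remember the new position. */
static int model_select_rr(struct model_rq *rq)
{
	assert(rq != NULL);
	rq->current_entity = (rq->current_entity + 1) % MODEL_NUM_ENTITIES;
	return rq->current_entity;
}

/* FIFO: pick the oldest timestamp; the cursor is neither read nor written. */
static int model_select_fifo(const struct model_rq *rq)
{
	int best = 0;

	assert(rq != NULL);
	for (int i = 1; i < MODEL_NUM_ENTITIES; i++) {
		if (rq->entities[i].oldest_job_waiting <
		    rq->entities[best].oldest_job_waiting)
			best = i;
	}
	return best;
}
```

In this model, switching from FIFO to round-robin simply resumes from wherever the cursor was last left by round-robin (or from the first entity if it was never set), which matches the behaviour the commit message calls completely fine.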
[PATCH 1/5] drm/sched: Optimise drm_sched_entity_push_job
From: Tvrtko Ursulin In FIFO mode we can avoid dropping the lock only to immediately re-acquire it by adding a new drm_sched_rq_update_fifo_locked() helper. v2: * Remove drm_sched_rq_update_fifo() altogether. (Christian) Signed-off-by: Tvrtko Ursulin Cc: Christian König Cc: Alex Deucher Cc: Luben Tuikov Cc: Matthew Brost Cc: Philipp Stanner Reviewed-by: Christian König --- drivers/gpu/drm/scheduler/sched_entity.c | 13 + drivers/gpu/drm/scheduler/sched_main.c | 6 +++--- include/drm/gpu_scheduler.h | 2 +- 3 files changed, 13 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c index 2951fcc2e6b1..b72cba292839 100644 --- a/drivers/gpu/drm/scheduler/sched_entity.c +++ b/drivers/gpu/drm/scheduler/sched_entity.c @@ -514,8 +514,12 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity) struct drm_sched_job *next; next = to_drm_sched_job(spsc_queue_peek(&entity->job_queue)); - if (next) - drm_sched_rq_update_fifo(entity, next->submit_ts); + if (next) { + spin_lock(&entity->rq_lock); + drm_sched_rq_update_fifo_locked(entity, + next->submit_ts); + spin_unlock(&entity->rq_lock); + } } /* Jobs and entities might have different lifecycles. 
Since we're @@ -613,10 +617,11 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) sched = rq->sched; drm_sched_rq_add_entity(rq, entity); - spin_unlock(&entity->rq_lock); if (drm_sched_policy == DRM_SCHED_POLICY_FIFO) - drm_sched_rq_update_fifo(entity, submit_ts); + drm_sched_rq_update_fifo_locked(entity, submit_ts); + + spin_unlock(&entity->rq_lock); drm_sched_wakeup(sched); } diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index e32b0f7d7e94..bbd1630407e4 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -169,14 +169,15 @@ static inline void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *enti } } -void drm_sched_rq_update_fifo(struct drm_sched_entity *entity, ktime_t ts) +void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, ktime_t ts) { /* * Both locks need to be grabbed, one to protect from entity->rq change * for entity from within concurrent drm_sched_entity_select_rq and the * other to update the rb tree structure. 
*/ - spin_lock(&entity->rq_lock); + lockdep_assert_held(&entity->rq_lock); + spin_lock(&entity->rq->lock); drm_sched_rq_remove_fifo_locked(entity); @@ -187,7 +188,6 @@ void drm_sched_rq_update_fifo(struct drm_sched_entity *entity, ktime_t ts) drm_sched_entity_compare_before); spin_unlock(&entity->rq->lock); - spin_unlock(&entity->rq_lock); } /** diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h index e9f075f51db3..3658a6cb048e 100644 --- a/include/drm/gpu_scheduler.h +++ b/include/drm/gpu_scheduler.h @@ -593,7 +593,7 @@ void drm_sched_rq_add_entity(struct drm_sched_rq *rq, void drm_sched_rq_remove_entity(struct drm_sched_rq *rq, struct drm_sched_entity *entity); -void drm_sched_rq_update_fifo(struct drm_sched_entity *entity, ktime_t ts); +void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, ktime_t ts); int drm_sched_entity_init(struct drm_sched_entity *entity, enum drm_sched_priority priority, -- 2.46.0
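The effect of the `_locked` helper can be sketched in user space. This is a single-threaded model under stated assumptions, not kernel code: `model_lock` only records whether it is held and how often it was acquired, and all `model_*` names are hypothetical. The assertion inside the helper plays the role that `lockdep_assert_held()` plays in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded model of a lock: it only records state. */
struct model_lock {
	bool held;
	unsigned int acquisitions;
};

/* Hypothetical stand-in for the scheduler entity. */
struct model_entity {
	struct model_lock lock;
	bool on_rq;
	long oldest_job_waiting;
};

static void model_lock_acquire(struct model_lock *l)
{
	assert(!l->held);
	l->held = true;
	l->acquisitions++;
}

static void model_lock_release(struct model_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* Stand-in for the _locked helper: the caller must already hold the lock. */
static void model_update_fifo_locked(struct model_entity *e, long ts)
{
	assert(e->lock.held); /* plays the role of lockdep_assert_held() */
	e->oldest_job_waiting = ts;
}

/* Before: the lock is dropped after queueing and taken again for the update. */
static void model_push_job_relock(struct model_entity *e, long ts)
{
	model_lock_acquire(&e->lock);
	e->on_rq = true;
	model_lock_release(&e->lock);

	model_lock_acquire(&e->lock);
	model_update_fifo_locked(e, ts);
	model_lock_release(&e->lock);
}

/* After: queueing and the FIFO update share one critical section. */
static void model_push_job_single(struct model_entity *e, long ts)
{
	model_lock_acquire(&e->lock);
	e->on_rq = true;
	model_update_fifo_locked(e, ts);
	model_lock_release(&e->lock);
}
```

Both variants leave the entity in the same state; the only difference in this model is the acquisition count, two for the re-locking variant and one for the single critical section.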
[PATCH 5/5] drm/sched: Further optimise drm_sched_entity_push_job
From: Tvrtko Ursulin Having removed one re-lock cycle on the entity->lock in a patch titled "drm/sched: Optimise drm_sched_entity_push_job", with only a tiny bit larger refactoring we can do the same optimisation on the rq->lock. (Currently both drm_sched_rq_add_entity() and drm_sched_rq_update_fifo_locked() take and release the same lock.) To achieve this we make drm_sched_rq_update_fifo_locked() and drm_sched_rq_add_entity() expect the rq->lock to be held. We also align drm_sched_rq_update_fifo_locked(), drm_sched_rq_add_entity() and drm_sched_rq_remove_fifo_locked() function signatures, by adding rq as a parameter to the latter. v2: * Fix after rebase of the series. * Avoid naming inconsistency between drm_sched_rq_add/remove. (Christian) Signed-off-by: Tvrtko Ursulin Cc: Christian König Cc: Alex Deucher Cc: Luben Tuikov Cc: Matthew Brost Cc: Philipp Stanner Reviewed-by: Christian König Reviewed-by: Philipp Stanner --- drivers/gpu/drm/scheduler/sched_entity.c | 11 +++-- drivers/gpu/drm/scheduler/sched_main.c | 29 include/drm/gpu_scheduler.h | 3 ++- 3 files changed, 25 insertions(+), 18 deletions(-) diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c index c013c2b49aa5..69bcf0e99d57 100644 --- a/drivers/gpu/drm/scheduler/sched_entity.c +++ b/drivers/gpu/drm/scheduler/sched_entity.c @@ -515,9 +515,14 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity) next = to_drm_sched_job(spsc_queue_peek(&entity->job_queue)); if (next) { + struct drm_sched_rq *rq; + spin_lock(&entity->lock); - drm_sched_rq_update_fifo_locked(entity, + rq = entity->rq; + spin_lock(&rq->lock); + drm_sched_rq_update_fifo_locked(entity, rq, next->submit_ts); + spin_unlock(&rq->lock); spin_unlock(&entity->lock); } } @@ -616,11 +621,13 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) rq = entity->rq; sched = rq->sched; + spin_lock(&rq->lock); drm_sched_rq_add_entity(rq, entity); if (drm_sched_policy == 
DRM_SCHED_POLICY_FIFO) - drm_sched_rq_update_fifo_locked(entity, submit_ts); + drm_sched_rq_update_fifo_locked(entity, rq, submit_ts); + spin_unlock(&rq->lock); spin_unlock(&entity->lock); drm_sched_wakeup(sched); diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index 2670bf9f34b2..6e4d004d09ce 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -159,17 +159,18 @@ static __always_inline bool drm_sched_entity_compare_before(struct rb_node *a, return ktime_before(ent_a->oldest_job_waiting, ent_b->oldest_job_waiting); } -static inline void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *entity) +static void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *entity, + struct drm_sched_rq *rq) { - struct drm_sched_rq *rq = entity->rq; - if (!RB_EMPTY_NODE(&entity->rb_tree_node)) { rb_erase_cached(&entity->rb_tree_node, &rq->rb_tree_root); RB_CLEAR_NODE(&entity->rb_tree_node); } } -void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, ktime_t ts) +void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, +struct drm_sched_rq *rq, +ktime_t ts) { /* * Both locks need to be grabbed, one to protect from entity->rq change @@ -177,17 +178,14 @@ void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, ktime_t ts * other to update the rb tree structure. */ lockdep_assert_held(&entity->lock); + lockdep_assert_held(&rq->lock); - spin_lock(&entity->rq->lock); - - drm_sched_rq_remove_fifo_locked(entity); + drm_sched_rq_remove_fifo_locked(entity, rq); entity->oldest_job_waiting = ts; - rb_add_cached(&entity->rb_tree_node, &entity->rq->rb_tree_root, + rb_add_cached(&entity->rb_tree_node, &rq->rb_tree_root, drm_sched_entity_compare_before); - - spin_unlock(&entity->rq->lock); } /** @@ -219,15 +217,14 @@ static void drm_sched_rq_init(struct drm_gpu_scheduler *sched, void drm_sched_rq_ad
[PATCH 0/5] Small DRM scheduler improvements
From: Tvrtko Ursulin Leftovers from the earlier "DRM scheduler fixes and improvements" series. It looks like the fixes have now propagated back to drm-misc-next so this should now be mergeable. It also needed a small rebase to account for one revert and one spelling fix which landed in the meantime. As a reminder, what remains are kerneldoc improvements, struct layout tweaks for clarity, one trivial cleanup for the FIFO mode, and most importantly two spin lock-unlock cycles are removed from the push job path by taking the locks one level up. I smoke tested it on the Steam Deck and lockdep seems happy. Cc: Christian König Cc: Philipp Stanner Tvrtko Ursulin (5): drm/sched: Optimise drm_sched_entity_push_job drm/sched: Stop setting current entity in FIFO mode drm/sched: Re-order struct drm_sched_rq members for clarity drm/sched: Re-group and rename the entity run-queue lock drm/sched: Further optimise drm_sched_entity_push_job drivers/gpu/drm/scheduler/sched_entity.c | 42 +++- drivers/gpu/drm/scheduler/sched_main.c | 32 +- include/drm/gpu_scheduler.h | 28 +--- 3 files changed, 58 insertions(+), 44 deletions(-) -- 2.46.0
Re: [RFC PATCH 1/2] drm/drm_file: Add display of driver's internal memory size
On 09/10/2024 23:55, Adrián Larumbe wrote: Hi Tvrtko, On 04.10.2024 14:41, Tvrtko Ursulin wrote: Hi Adrian, On 03/10/2024 00:45, Adrián Larumbe wrote: Some drivers must allocate a considerable amount of memory for bookkeeping structures and GPU's MCU-kernel shared communication regions. These are often created as a result of the invocation of the driver's ioctl() interface functions, so it is sensible to consider them as being owned by the render context associated with an open drm file. However, at the moment drm_show_memory_stats only traverses the UM-exposed drm objects for which a handle exists. Private driver objects and memory regions, though connected to a render context, are unaccounted for in their fdinfo numbers. Add a new drm_memory_stats 'internal' memory category. Because deciding what constitutes an 'internal' object and where to find these are driver-dependent, calculation of this size must be done through a driver-provided function pointer, which becomes the third argument of drm_show_memory_stats. Drivers which have no interest in exposing the size of internal memory objects can keep passing NULL for unaltered behaviour. 
Signed-off-by: Adrián Larumbe Cc: Rob Clark Cc: Tvrtko Ursulin Cc: Lucas De Marchi --- drivers/gpu/drm/drm_file.c | 6 +- drivers/gpu/drm/msm/msm_drv.c | 2 +- drivers/gpu/drm/panfrost/panfrost_drv.c | 2 +- drivers/gpu/drm/v3d/v3d_drv.c | 2 +- include/drm/drm_file.h | 7 ++- 5 files changed, 14 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c index ad1dc638c83b..937471339c9a 100644 --- a/drivers/gpu/drm/drm_file.c +++ b/drivers/gpu/drm/drm_file.c @@ -856,6 +856,7 @@ void drm_print_memory_stats(struct drm_printer *p, print_size(p, "total", region, stats->private + stats->shared); print_size(p, "shared", region, stats->shared); print_size(p, "active", region, stats->active); + print_size(p, "internal", region, stats->internal); if (supported_status & DRM_GEM_OBJECT_RESIDENT) print_size(p, "resident", region, stats->resident); @@ -873,7 +874,7 @@ EXPORT_SYMBOL(drm_print_memory_stats); * Helper to iterate over GEM objects with a handle allocated in the specified * file. 
*/ -void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file) +void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file, internal_bos func) { struct drm_gem_object *obj; struct drm_memory_stats status = {}; @@ -919,6 +920,9 @@ void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file) } spin_unlock(&file->table_lock); + if (func) + func(&status, file); + drm_print_memory_stats(p, &status, supported_status, "memory"); } EXPORT_SYMBOL(drm_show_memory_stats); diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c index edbc1ab0fbc8..2b3feb79afc4 100644 --- a/drivers/gpu/drm/msm/msm_drv.c +++ b/drivers/gpu/drm/msm/msm_drv.c @@ -880,7 +880,7 @@ static void msm_show_fdinfo(struct drm_printer *p, struct drm_file *file) msm_gpu_show_fdinfo(priv->gpu, file->driver_priv, p); - drm_show_memory_stats(p, file); + drm_show_memory_stats(p, file, NULL); } static const struct file_operations fops = { diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c index 04d615df5259..aaa8602bf00d 100644 --- a/drivers/gpu/drm/panfrost/panfrost_drv.c +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c @@ -609,7 +609,7 @@ static void panfrost_show_fdinfo(struct drm_printer *p, struct drm_file *file) panfrost_gpu_show_fdinfo(pfdev, file->driver_priv, p); - drm_show_memory_stats(p, file); + drm_show_memory_stats(p, file, NULL); } static const struct file_operations panfrost_drm_driver_fops = { diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c index fb35c5c3f1a7..314e77c67972 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.c +++ b/drivers/gpu/drm/v3d/v3d_drv.c @@ -195,7 +195,7 @@ static void v3d_show_fdinfo(struct drm_printer *p, struct drm_file *file) v3d_queue_to_string(queue), jobs_completed); } - drm_show_memory_stats(p, file); + drm_show_memory_stats(p, file, NULL); } static const struct file_operations v3d_drm_fops = { diff --git a/include/drm/drm_file.h 
b/include/drm/drm_file.h index 8c0030c77308..661d00d5350e 100644 --- a/include/drm/drm_file.h +++ b/include/drm/drm_file.h @@ -469,6 +469,7 @@ void drm_send_event_timestamp_locked(struct drm_device *dev, * @resident: Total size of GEM objects backing pages * @purgeable: Total size of GEM objects that can be purged (resident and not active) * @ac
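The optional-callback shape of the proposed interface can be sketched in plain C. This is a user-space model, not the DRM API: `model_memory_stats`, `model_file`, `model_internal_bos_fn` and the example callback are hypothetical stand-ins, and the handle walk is reduced to copying one number. It only shows how passing NULL keeps the old behaviour while a driver-provided function fills in the `internal` figure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for struct drm_memory_stats. */
struct model_memory_stats {
	size_t private_size;
	size_t internal;
};

/* Hypothetical stand-in for an open file with handle-backed and private memory. */
struct model_file {
	size_t handle_bytes;         /* objects reachable through handles */
	size_t driver_private_bytes; /* driver bookkeeping without a handle */
};

/* Assumed callback shape: the driver adds its internal sizes to stats. */
typedef void (*model_internal_bos_fn)(struct model_memory_stats *stats,
				      const struct model_file *file);

/* Collect handle-backed sizes, then let the driver add internal ones if asked. */
static struct model_memory_stats
model_show_memory_stats(const struct model_file *file, model_internal_bos_fn func)
{
	struct model_memory_stats stats = { 0 };

	assert(file != NULL);
	stats.private_size = file->handle_bytes;
	if (func)
		func(&stats, file);
	return stats;
}

/* Example driver callback: accounts the driver-private region as internal. */
static void model_driver_internal_bos(struct model_memory_stats *stats,
				      const struct model_file *file)
{
	stats->internal += file->driver_private_bytes;
}
```

A caller with no internal objects would pass NULL, as the msm, panfrost and v3d hunks above do, and would see `internal` stay at zero.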
Re: [RFC PATCH 1/2] drm/drm_file: Add display of driver's internal memory size
Hi Adrian, On 03/10/2024 00:45, Adrián Larumbe wrote: Some drivers must allocate a considerable amount of memory for bookkeeping structures and GPU's MCU-kernel shared communication regions. These are often created as a result of the invocation of the driver's ioctl() interface functions, so it is sensible to consider them as being owned by the render context associated with an open drm file. However, at the moment drm_show_memory_stats only traverses the UM-exposed drm objects for which a handle exists. Private driver objects and memory regions, though connected to a render context, are unaccounted for in their fdinfo numbers. Add a new drm_memory_stats 'internal' memory category. Because deciding what constitutes an 'internal' object and where to find these are driver-dependent, calculation of this size must be done through a driver-provided function pointer, which becomes the third argument of drm_show_memory_stats. Drivers which have no interest in exposing the size of internal memory objects can keep passing NULL for unaltered behaviour. 
Signed-off-by: Adrián Larumbe Cc: Rob Clark Cc: Tvrtko Ursulin Cc: Lucas De Marchi --- drivers/gpu/drm/drm_file.c | 6 +- drivers/gpu/drm/msm/msm_drv.c | 2 +- drivers/gpu/drm/panfrost/panfrost_drv.c | 2 +- drivers/gpu/drm/v3d/v3d_drv.c | 2 +- include/drm/drm_file.h | 7 ++- 5 files changed, 14 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c index ad1dc638c83b..937471339c9a 100644 --- a/drivers/gpu/drm/drm_file.c +++ b/drivers/gpu/drm/drm_file.c @@ -856,6 +856,7 @@ void drm_print_memory_stats(struct drm_printer *p, print_size(p, "total", region, stats->private + stats->shared); print_size(p, "shared", region, stats->shared); print_size(p, "active", region, stats->active); + print_size(p, "internal", region, stats->internal); if (supported_status & DRM_GEM_OBJECT_RESIDENT) print_size(p, "resident", region, stats->resident); @@ -873,7 +874,7 @@ EXPORT_SYMBOL(drm_print_memory_stats); * Helper to iterate over GEM objects with a handle allocated in the specified * file. 
*/ -void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file) +void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file, internal_bos func) { struct drm_gem_object *obj; struct drm_memory_stats status = {}; @@ -919,6 +920,9 @@ void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file) } spin_unlock(&file->table_lock); + if (func) + func(&status, file); + drm_print_memory_stats(p, &status, supported_status, "memory"); } EXPORT_SYMBOL(drm_show_memory_stats); diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c index edbc1ab0fbc8..2b3feb79afc4 100644 --- a/drivers/gpu/drm/msm/msm_drv.c +++ b/drivers/gpu/drm/msm/msm_drv.c @@ -880,7 +880,7 @@ static void msm_show_fdinfo(struct drm_printer *p, struct drm_file *file) msm_gpu_show_fdinfo(priv->gpu, file->driver_priv, p); - drm_show_memory_stats(p, file); + drm_show_memory_stats(p, file, NULL); } static const struct file_operations fops = { diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c index 04d615df5259..aaa8602bf00d 100644 --- a/drivers/gpu/drm/panfrost/panfrost_drv.c +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c @@ -609,7 +609,7 @@ static void panfrost_show_fdinfo(struct drm_printer *p, struct drm_file *file) panfrost_gpu_show_fdinfo(pfdev, file->driver_priv, p); - drm_show_memory_stats(p, file); + drm_show_memory_stats(p, file, NULL); } static const struct file_operations panfrost_drm_driver_fops = { diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c index fb35c5c3f1a7..314e77c67972 100644 --- a/drivers/gpu/drm/v3d/v3d_drv.c +++ b/drivers/gpu/drm/v3d/v3d_drv.c @@ -195,7 +195,7 @@ static void v3d_show_fdinfo(struct drm_printer *p, struct drm_file *file) v3d_queue_to_string(queue), jobs_completed); } - drm_show_memory_stats(p, file); + drm_show_memory_stats(p, file, NULL); } static const struct file_operations v3d_drm_fops = { diff --git a/include/drm/drm_file.h 
b/include/drm/drm_file.h index 8c0030c77308..661d00d5350e 100644 --- a/include/drm/drm_file.h +++ b/include/drm/drm_file.h @@ -469,6 +469,7 @@ void drm_send_event_timestamp_locked(struct drm_device *dev, * @resident: Total size of GEM objects backing pages * @purgeable: Total size of GEM objects that can be purged (resident and not active) * @active: Total size of GEM objects active on one or more engines + * @internal: Total size of GEM objects that aren'
Re: [PATCH] drm/sched: revert "Always increment correct scheduler score"
On 30/09/2024 14:14, Christian König wrote: This reverts commit 087913e0ba2b3b9d7ccbafb2acf5dab9e35ae1d5. It turned out that the original code was correct since the rq can only change when there is no armed job for an entity. This change here broke the logic since we only incremented the counter for the first job, so revert it. Signed-off-by: Christian König --- drivers/gpu/drm/scheduler/sched_entity.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c index b2cf3e0c1838..a75eede8bf8d 100644 --- a/drivers/gpu/drm/scheduler/sched_entity.c +++ b/drivers/gpu/drm/scheduler/sched_entity.c @@ -586,6 +586,7 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) ktime_t submit_ts; trace_drm_sched_job(sched_job, entity); + atomic_inc(entity->rq->sched->score); WRITE_ONCE(entity->last_user, current->group_leader); /* @@ -613,7 +614,6 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) rq = entity->rq; sched = rq->sched; - atomic_inc(sched->score); drm_sched_rq_add_entity(rq, entity); spin_unlock(&entity->rq_lock); This was definitely broken so revert is the right thing, thank you. Acked-by: Tvrtko Ursulin Regards, Tvrtko
Re: [PATCH 3/8] drm/sched: Always increment correct scheduler score
On 30/09/2024 14:07, Christian König wrote: Am 30.09.24 um 15:01 schrieb Tvrtko Ursulin: On 13/09/2024 17:05, Tvrtko Ursulin wrote: From: Tvrtko Ursulin Entities run queue can change during drm_sched_entity_push_job() so make sure to update the score consistently. Signed-off-by: Tvrtko Ursulin Fixes: d41a39dda140 ("drm/scheduler: improve job distribution with multiple queues") Cc: Nirmoy Das Cc: Christian König Cc: Luben Tuikov Cc: Matthew Brost Cc: David Airlie Cc: Daniel Vetter Cc: dri-devel@lists.freedesktop.org Cc: # v5.9+ Reviewed-by: Christian König Reviewed-by: Nirmoy Das --- drivers/gpu/drm/scheduler/sched_entity.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c index 76e422548d40..6645a8524699 100644 --- a/drivers/gpu/drm/scheduler/sched_entity.c +++ b/drivers/gpu/drm/scheduler/sched_entity.c @@ -586,7 +586,6 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) ktime_t submit_ts; trace_drm_sched_job(sched_job, entity); - atomic_inc(entity->rq->sched->score); WRITE_ONCE(entity->last_user, current->group_leader); /* @@ -614,6 +613,7 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) rq = entity->rq; sched = rq->sched; + atomic_inc(sched->score); Ugh this is wrong. :( I was working on some further consolidation and realised this. It will create an imbalance in score since score is currently supposed to be accounted twice: 1. +/- 1 for each entity (de-)queued 2. +/- 1 for each job queued/completed By moving it into the "if (first) branch" it unbalances it. But it is still true the original placement is racy. It looks like what is required is an unconditional entity->lock section after spsc_queue_push. AFAICT that's the only way to be sure entity->rq is set for the submission at hand. Question also is, why +/- score in entity add/remove and not just for jobs? In the meantime patch will need to get reverted. Ok going to revert that. 
Thank you, and sorry for the trouble! I also just realized that we don't need to change anything. The rq can't change as soon as there is a job armed for it. So having the increment right before pushing the armed job to the entity was actually correct in the first place. Are you sure? Two threads racing to arm and push on the same entity?

T1                        T2
arm job
rq1 selected
..
push job                  arm job
inc score rq1
                          spsc_queue_count check passes
--- just before T1 spsc_queue_push ---
                          changed to rq2
spsc_queue_push
if (first)
  resamples entity->rq
  queues rq2

Where rq1 and rq2 belong to different schedulers. Regards, Tvrtko Regards, Christian. Regards, Tvrtko drm_sched_rq_add_entity(rq, entity); spin_unlock(&entity->rq_lock);
Re: [PATCH 3/8] drm/sched: Always increment correct scheduler score
On 13/09/2024 17:05, Tvrtko Ursulin wrote: From: Tvrtko Ursulin Entities run queue can change during drm_sched_entity_push_job() so make sure to update the score consistently. Signed-off-by: Tvrtko Ursulin Fixes: d41a39dda140 ("drm/scheduler: improve job distribution with multiple queues") Cc: Nirmoy Das Cc: Christian König Cc: Luben Tuikov Cc: Matthew Brost Cc: David Airlie Cc: Daniel Vetter Cc: dri-devel@lists.freedesktop.org Cc: # v5.9+ Reviewed-by: Christian König Reviewed-by: Nirmoy Das --- drivers/gpu/drm/scheduler/sched_entity.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c index 76e422548d40..6645a8524699 100644 --- a/drivers/gpu/drm/scheduler/sched_entity.c +++ b/drivers/gpu/drm/scheduler/sched_entity.c @@ -586,7 +586,6 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) ktime_t submit_ts; trace_drm_sched_job(sched_job, entity); - atomic_inc(entity->rq->sched->score); WRITE_ONCE(entity->last_user, current->group_leader); /* @@ -614,6 +613,7 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) rq = entity->rq; sched = rq->sched; + atomic_inc(sched->score); Ugh this is wrong. :( I was working on some further consolidation and realised this. It will create an imbalance in score since score is currently supposed to be accounted twice: 1. +/- 1 for each entity (de-)queued 2. +/- 1 for each job queued/completed By moving it into the "if (first) branch" it unbalances it. But it is still true the original placement is racy. It looks like what is required is an unconditional entity->lock section after spsc_queue_push. AFAICT that's the only way to be sure entity->rq is set for the submission at hand. Question also is, why +/- score in entity add/remove and not just for jobs? In the meantime patch will need to get reverted. Regards, Tvrtko drm_sched_rq_add_entity(rq, entity); spin_unlock(&entity->rq_lock);
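The imbalance described in this message can be reproduced with a deliberately simplified, single-threaded model. It assumes the accounting stated above (one increment per entity queued and one per job queued, with matching decrements) and additionally assumes that the entity leaves the run queue as soon as its last job completes, which is a simplification. All `model_*` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one scheduler's score with a single entity. */
struct model_sched {
	int score;
	int queued_jobs;
};

/* Original placement: the per-job increment is unconditional. */
static void model_push_job_balanced(struct model_sched *s)
{
	bool first = (s->queued_jobs == 0);

	s->score++; /* +1 for the job */
	s->queued_jobs++;
	if (first)
		s->score++; /* +1 for the entity joining the run queue */
}

/* Reverted placement: the per-job increment sits inside the first-job branch. */
static void model_push_job_first_only(struct model_sched *s)
{
	bool first = (s->queued_jobs == 0);

	s->queued_jobs++;
	if (first) {
		s->score++; /* job increment, now only for the first job */
		s->score++; /* entity increment */
	}
}

/* Completion always drops the job's share; the last job also drops the entity's. */
static void model_complete_job(struct model_sched *s)
{
	assert(s->queued_jobs > 0);
	s->score--;
	s->queued_jobs--;
	if (s->queued_jobs == 0)
		s->score--;
}
```

With three jobs pushed and completed, the balanced variant returns to a score of zero, while the first-only variant ends at minus two in this model, one missing increment for each job after the first.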
Re: [PATCH v4 6/6] drm/amdgpu: use drm_file::name in task_info::process_desc
On 27/09/2024 09:48, Pierre-Eric Pelloux-Prayer wrote: If a drm_file name is set, append it to the process name. This information is useful with the virtio/native-context driver: this allows the guest application's identifier to be visible in amdgpu's output. The output in amdgpu_vm_info/amdgpu_gem_info looks like this: pid:12255 Process:glxgears/test-set-fd-name -- Signed-off-by: Pierre-Eric Pelloux-Prayer --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h| 1 + .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 3 ++- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c| 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c| 26 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h| 2 +- drivers/gpu/drm/amd/amdkfd/kfd_process.c | 3 +++ 6 files changed, 29 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h index f9d119448442..ad909173e419 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h @@ -299,6 +299,7 @@ int amdgpu_amdkfd_gpuvm_set_vm_pasid(struct amdgpu_device *adev, struct amdgpu_vm *avm, u32 pasid); int amdgpu_amdkfd_gpuvm_acquire_process_vm(struct amdgpu_device *adev, struct amdgpu_vm *avm, + struct drm_file *filp, void **process_info, struct dma_fence **ef); void amdgpu_amdkfd_gpuvm_release_process_vm(struct amdgpu_device *adev, diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index 6d5fd371d5ce..172882af6705 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -1558,6 +1558,7 @@ int amdgpu_amdkfd_gpuvm_set_vm_pasid(struct amdgpu_device *adev, int amdgpu_amdkfd_gpuvm_acquire_process_vm(struct amdgpu_device *adev, struct amdgpu_vm *avm, + struct drm_file *filp, void **process_info, struct dma_fence **ef) { @@ -1577,7 +1578,7 @@ int amdgpu_amdkfd_gpuvm_acquire_process_vm(struct amdgpu_device *adev, if (ret) return ret; - 
amdgpu_vm_set_task_info(avm); + amdgpu_vm_set_task_info(avm, filp); return 0; } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c index 891128ecee6d..5d43e24906d2 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c @@ -1178,7 +1178,7 @@ static int amdgpu_cs_vm_handling(struct amdgpu_cs_parser *p) } /* Use this opportunity to fill in task info for the vm */ - amdgpu_vm_set_task_info(vm); + amdgpu_vm_set_task_info(vm, p->filp); if (adev->debug_vm) { /* Invalidate all BOs to test for userspace bugs */ diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index cec0a5cffcc8..f6e2be6d4e9e 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -2355,25 +2355,40 @@ amdgpu_vm_get_task_info_pasid(struct amdgpu_device *adev, u32 pasid) amdgpu_vm_get_vm_from_pasid(adev, pasid)); } -static int amdgpu_vm_create_task_info(struct amdgpu_vm *vm) +static int amdgpu_vm_create_task_info(struct amdgpu_vm *vm, struct drm_file *filp) { char process_name[TASK_COMM_LEN]; - int desc_len; + size_t desc_len; Nit - would be nicer to avoid the churn from patch to patch by starting with the correct type in the previous patch. get_task_comm(process_name, current->group_leader); desc_len = strlen(process_name); + mutex_lock(&filp->client_name_lock); + if (filp->client_name) + desc_len += 1 + strlen(filp->client_name); + vm->task_info = kzalloc( struct_size(vm->task_info, process_desc, desc_len + 1), GFP_KERNEL); - if (!vm->task_info) + if (!vm->task_info) { + mutex_unlock(&filp->client_name_lock); return -ENOMEM; + } /* Set process attributes now. */ vm->task_info->tgid = current->group_leader->pid; strscpy(vm->task_info->process_desc, process_name, desc_len + 1); + if (filp->client_name) { + size_t p_len = strlen(process_name); Another nit is that you are taking this strlen twice. 
Maybe cache it in a top level local so it looks cleaner. But those are just nits to make the series look more polished. Fundamentals look fine to me so up to you if you want to respin or not. Regards, Tvrtko + + vm->task_info->process_desc[p_len] = '/';
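The nit about taking `strlen()` twice could be addressed along these lines. This is a user-space sketch, not the amdgpu code: `model_build_process_desc()` is a hypothetical helper, `calloc()` stands in for `kzalloc()`, and locking of the client name is left out. The process-name length is computed once and reused both for sizing the buffer and for placing the `/` separator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Build "process_name" or "process_name/client_name" in a fresh buffer.
 * Returns NULL on allocation failure; the caller frees the result.
 */
static char *model_build_process_desc(const char *process_name,
				      const char *client_name)
{
	size_t p_len, desc_len;
	char *desc;

	assert(process_name != NULL);
	p_len = strlen(process_name); /* cached once, reused below */
	desc_len = p_len;
	if (client_name)
		desc_len += 1 + strlen(client_name); /* '/' plus the name */

	desc = calloc(desc_len + 1, 1); /* zeroed, so always NUL-terminated */
	if (!desc)
		return NULL;

	memcpy(desc, process_name, p_len);
	if (client_name) {
		desc[p_len] = '/';
		memcpy(desc + p_len + 1, client_name, strlen(client_name));
	}
	return desc;
}
```

With the inputs from the commit message, `glxgears` and `test-set-fd-name`, this yields the `glxgears/test-set-fd-name` form shown in the example output.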
Re: [PATCH v4 1/6] drm: add DRM_SET_CLIENT_NAME ioctl
On 27/09/2024 09:48, Pierre-Eric Pelloux-Prayer wrote: Giving the opportunity to userspace to associate a free-form name with a drm_file struct is helpful for tracking and debugging. This is similar to the existing DMA_BUF_SET_NAME ioctl. Access to client_name is protected by a mutex, and the 'clients' debugfs file has been updated to print it. Userspace MR to use this ioctl: https://gitlab.freedesktop.org/virgl/virglrenderer/-/merge_requests/1428 If the string passed by userspace contains chars that would mess up output when it's going to be printed (in dmesg, fdinfo, etc), -EINVAL is returned. A 0-length string is a valid use, and clears the existing name. Reviewed-by: Tvrtko Ursulin Signed-off-by: Pierre-Eric Pelloux-Prayer --- drivers/gpu/drm/drm_debugfs.c | 14 ++--- drivers/gpu/drm/drm_file.c| 5 drivers/gpu/drm/drm_ioctl.c | 55 +++ include/drm/drm_file.h| 9 ++ include/uapi/drm/drm.h| 17 +++ 5 files changed, 96 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/drm_debugfs.c b/drivers/gpu/drm/drm_debugfs.c index 6b239a24f1df..5c99322a4c6f 100644 --- a/drivers/gpu/drm/drm_debugfs.c +++ b/drivers/gpu/drm/drm_debugfs.c @@ -78,12 +78,14 @@ static int drm_clients_info(struct seq_file *m, void *data) kuid_t uid; seq_printf(m, - "%20s %5s %3s master a %5s %10s\n", + "%20s %5s %3s master a %5s %10s %*s\n", "command", "tgid", "dev", "uid", - "magic"); + "magic", + DRM_CLIENT_NAME_MAX_LEN, + "name"); /* dev->filelist is sorted youngest first, but we want to present * oldest first (i.e. kernel, servers, clients), so walk backwardss. @@ -94,19 +96,23 @@ static int drm_clients_info(struct seq_file *m, void *data) struct task_struct *task; struct pid *pid; + mutex_lock(&priv->client_name_lock); rcu_read_lock(); /* Locks priv->pid and pid_task()->comm! */ pid = rcu_dereference(priv->pid); task = pid_task(pid, PIDTYPE_TGID); uid = task ? 
__task_cred(task)->euid : GLOBAL_ROOT_UID; - seq_printf(m, "%20s %5d %3d %c%c %5d %10u\n", + seq_printf(m, "%20s %5d %3d %c%c %5d %10u %*s\n", task ? task->comm : "", pid_vnr(pid), priv->minor->index, is_current_master ? 'y' : 'n', priv->authenticated ? 'y' : 'n', from_kuid_munged(seq_user_ns(m), uid), - priv->magic); + priv->magic, + DRM_CLIENT_NAME_MAX_LEN, + priv->client_name ? priv->client_name : ""); rcu_read_unlock(); + mutex_unlock(&priv->client_name_lock); } mutex_unlock(&dev->filelist_mutex); return 0; diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c index 01fde94fe2a9..64f5e15304e7 100644 --- a/drivers/gpu/drm/drm_file.c +++ b/drivers/gpu/drm/drm_file.c @@ -158,6 +158,7 @@ struct drm_file *drm_file_alloc(struct drm_minor *minor) spin_lock_init(&file->master_lookup_lock); mutex_init(&file->event_read_lock); + mutex_init(&file->client_name_lock); if (drm_core_check_feature(dev, DRIVER_GEM)) drm_gem_open(dev, file); @@ -259,6 +260,10 @@ void drm_file_free(struct drm_file *file) WARN_ON(!list_empty(&file->event_list)); put_pid(rcu_access_pointer(file->pid)); + + mutex_destroy(&file->client_name_lock); + kfree(file->client_name); + kfree(file); } diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c index 51f39912866f..df8d59bd5241 100644 --- a/drivers/gpu/drm/drm_ioctl.c +++ b/drivers/gpu/drm/drm_ioctl.c @@ -540,6 +540,59 @@ int drm_version(struct drm_device *dev, void *data, return err; } +/* + * Check if the passed string contains control char or spaces or + * anything that would mess up a formatted output. + */ +static int drm_validate_value_string(const char *value, size_t len) +{ + int i; + + for (i = 0; i < len; i++) { + if (value[i] <= 32 || value[i] >= 127) Would !isascii() || isgraph() work for what you have in mind here, considering the comment from the cover letter about the extended ASCII? + return -EINVAL; + }
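The range check in the hunk above and the ctype-based form raised in the review can be compared in plain userspace C. This is only a sketch under assumptions: it uses the hosted <ctype.h> in the "C" locale rather than the kernel's own ctype helpers, it reads the suggestion as "reject anything that is not ASCII or not a graphic character", and the names validate_range(), validate_ctype() and is_ascii_byte() are made up for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stddef.h>

/*
 * Range check as written in the quoted hunk: reject control characters,
 * space, DEL and every byte outside 7-bit ASCII. The cast to unsigned
 * char only makes the intent explicit.
 */
int validate_range(const char *value, size_t len)
{
	size_t i;

	assert(value != NULL || len == 0);

	for (i = 0; i < len; i++) {
		unsigned char c = (unsigned char)value[i];

		if (c <= 32 || c >= 127)
			return -EINVAL;
	}
	return 0;
}

/* Stand-in for isascii(), which is POSIX rather than ISO C. */
int is_ascii_byte(unsigned char c)
{
	return c < 128;
}

/*
 * ctype-based variant (assumption: this is what the review comment
 * means). isgraph() is true for printable characters except space.
 */
int validate_ctype(const char *value, size_t len)
{
	size_t i;

	assert(value != NULL || len == 0);

	for (i = 0; i < len; i++) {
		unsigned char c = (unsigned char)value[i];

		if (!is_ascii_byte(c) || !isgraph(c))
			return -EINVAL;
	}
	return 0;
}
```

Under those assumptions both functions accept and reject the same inputs, since isgraph() in the "C" locale covers exactly 0x21 to 0x7e. A zero-length string passes both checks, matching the "0-length string clears the name" behaviour described in the commit message.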
Re: [PATCH v2] drm/sched: Further optimise drm_sched_entity_push_job
On 26/09/2024 09:15, Philipp Stanner wrote: On Mon, 2024-09-23 at 15:35 +0100, Tvrtko Ursulin wrote: Ping Christian and Philipp - reasonably happy with v2? I think it's the only unreviewed patch from the series. Howdy, sry for the delay, I had been traveling. I have a few nits below regarding the commit message. Besides, I'm OK with that, thx for your work :) No worries. On 16/09/2024 18:30, Tvrtko Ursulin wrote: From: Tvrtko Ursulin Having removed one re-lock cycle on the entity->lock in a patch titled "drm/sched: Optimise drm_sched_entity_push_job", with only a tiny bit larger refactoring we can do the same optimisation Well, the commit message does not state which optimization that is. One would have to look for the previous patch, which you apparently cannot provide a commit ID for yet because it's not in Big Boss's branch. With added emphasis: "Having _removed one re-lock cycle_ on the entity-lock..." "...do the same optimisation on the rq->lock." How it is not clear? In this case I am for including a sentence about what is being optimized also because on the rq->lock. (Currently both drm_sched_rq_add_entity() and drm_sched_rq_update_fifo_locked() take and release the same lock.) To achieve this we make drm_sched_rq_update_fifo_locked() and it's not clear what the "this" that's being achieved is. "This" is the optimisation previous paragraph talks about. What/why followed by how. I honestly think this part of the commit text is good enough. drm_sched_rq_add_entity() expect the rq->lock to be held. We also align drm_sched_rq_update_fifo_locked(), drm_sched_rq_add_entity() and drm_sched_rq_remove_fifo_locked() function signatures, by adding rq as a parameter to the latter. v2: * Fix after rebase of the series. * Avoid naming incosistency between drm_sched_rq_add/remove. (Christian) Signed-off-by: Tvrtko Ursulin Reviewed-by: Philipp Stanner Thank you! 
Cc: Christian König Cc: Alex Deucher Cc: Luben Tuikov Cc: Matthew Brost Cc: Philipp Stanner --- drivers/gpu/drm/scheduler/sched_entity.c | 12 -- drivers/gpu/drm/scheduler/sched_main.c | 29 --- - include/drm/gpu_scheduler.h | 3 ++- 3 files changed, 26 insertions(+), 18 deletions(-) diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c index d982cebc6bee..8ace1f1ea66b 100644 --- a/drivers/gpu/drm/scheduler/sched_entity.c +++ b/drivers/gpu/drm/scheduler/sched_entity.c @@ -515,9 +515,14 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity) next = to_drm_sched_job(spsc_queue_peek(&entity- job_queue)); if (next) { + struct drm_sched_rq *rq; + spin_lock(&entity->lock); - drm_sched_rq_update_fifo_locked(entity, + rq = entity->rq; + spin_lock(&rq->lock); + drm_sched_rq_update_fifo_locked(entity, rq, next- submit_ts); + spin_unlock(&rq->lock); spin_unlock(&entity->lock); } } @@ -618,11 +623,14 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) sched = rq->sched; atomic_inc(sched->score); + + spin_lock(&rq->lock); drm_sched_rq_add_entity(rq, entity); if (drm_sched_policy == DRM_SCHED_POLICY_FIFO) - drm_sched_rq_update_fifo_locked(entity, submit_ts); + drm_sched_rq_update_fifo_locked(entity, rq, submit_ts); + spin_unlock(&rq->lock); spin_unlock(&entity->lock); drm_sched_wakeup(sched, entity); diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index 18a952f73ecb..5c83fb92bb89 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -153,17 +153,18 @@ static __always_inline bool drm_sched_entity_compare_before(struct rb_node *a, return ktime_before(ent_a->oldest_job_waiting, ent_b- oldest_job_waiting); } -static inline void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *entity) +static void drm_sched_rq_remove_fifo_locked(struct I think the commit message should contain a short sentence about why 
you removed the inline. AKA "As we're at it, remove the inline function specifier from drm_sched_rq_remove_fifo_locked() because XYZ" Fair play on this one, should have mentioned it. Probably just removed the inline by habit while touching the function signature. Under the "compiler knows better" mantra. Regards, Tvrtko drm_sched_enti
Re: [PATCH 2/8] drm/sched: Always wake up correct scheduler in drm_sched_entity_push_job
On 24/09/2024 15:20, Christian König wrote: Am 24.09.24 um 16:12 schrieb Tvrtko Ursulin: On 24/09/2024 14:55, Christian König wrote: I've pushed the first to drm-misc-next, but that one here fails to apply cleanly. This appears due 440d52b370b0 ("drm/sched: Fix dynamic job-flow control race") in drm-misc-fixes. In theory 1-3 from my series are fixes. Should they also go to drm-misc-fixes? I am not too familiar with the drm-misc flow. Ah shit, in that case you should have spitted the patches up into fixes and next. Going to push the first 3 to fixes. Sorry my drm-intel ways of thinking (cherry picked fixes) are hard to get rid of. Hence the series was structured as 1-3 fixes, 4-8 refactors etc. Now appears it is too late to pull out the first one from drm-misc-next. Or the series now needs to wait for some backmerge? Are the remaining 3 patches independent? If not then we need to wait for a backmerge. These are independent: Fixes: 1/8 "drm/sched: Add locking to drm_sched_entity_modify_sched" Not fixes: 5/8 "drm/sched: Stop setting current entity in FIFO mode" 6/8 "drm/sched: Re-order struct drm_sched_rq members for clarity" While the rest touch at least some common areas. 2/8 and 3/8 are also fixes. 4/8, 7/8 and 8/8 not fixes but depend on 2/8 and 3/8. Regards, Tvrtko Am 24.09.24 um 12:19 schrieb Tvrtko Ursulin: From: Tvrtko Ursulin Since drm_sched_entity_modify_sched() can modify the entities run queue, lets make sure to only dereference the pointer once so both adding and waking up are guaranteed to be consistent. Alternative of moving the spin_unlock to after the wake up would for now be more problematic since the same lock is taken inside drm_sched_rq_update_fifo(). v2: * Improve commit message. (Philipp) * Cache the scheduler pointer directly. 
(Christian) Signed-off-by: Tvrtko Ursulin Fixes: b37aced31eb0 ("drm/scheduler: implement a function to modify sched list") Cc: Christian König Cc: Alex Deucher Cc: Luben Tuikov Cc: Matthew Brost Cc: David Airlie Cc: Daniel Vetter Cc: Philipp Stanner Cc: dri-devel@lists.freedesktop.org Cc: # v5.7+ Reviewed-by: Christian König --- drivers/gpu/drm/scheduler/sched_entity.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c index 0e002c17fcb6..a75eede8bf8d 100644 --- a/drivers/gpu/drm/scheduler/sched_entity.c +++ b/drivers/gpu/drm/scheduler/sched_entity.c @@ -599,6 +599,9 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) /* first job wakes up scheduler */ if (first) { + struct drm_gpu_scheduler *sched; + struct drm_sched_rq *rq; + /* Add the entity to the run queue */ spin_lock(&entity->rq_lock); if (entity->stopped) { @@ -608,13 +611,16 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) return; } - drm_sched_rq_add_entity(entity->rq, entity); + rq = entity->rq; + sched = rq->sched; + + drm_sched_rq_add_entity(rq, entity); spin_unlock(&entity->rq_lock); if (drm_sched_policy == DRM_SCHED_POLICY_FIFO) drm_sched_rq_update_fifo(entity, submit_ts); - drm_sched_wakeup(entity->rq->sched); + drm_sched_wakeup(sched); } } EXPORT_SYMBOL(drm_sched_entity_push_job);
Re: [PATCH 2/8] drm/sched: Always wake up correct scheduler in drm_sched_entity_push_job
On 24/09/2024 14:55, Christian König wrote: I've pushed the first to drm-misc-next, but that one here fails to apply cleanly. This appears due 440d52b370b0 ("drm/sched: Fix dynamic job-flow control race") in drm-misc-fixes. In theory 1-3 from my series are fixes. Should they also go to drm-misc-fixes? I am not too familiar with the drm-misc flow. Or the series now needs to wait for some backmerge? Regards, Tvrtko Am 24.09.24 um 12:19 schrieb Tvrtko Ursulin: From: Tvrtko Ursulin Since drm_sched_entity_modify_sched() can modify the entities run queue, lets make sure to only dereference the pointer once so both adding and waking up are guaranteed to be consistent. Alternative of moving the spin_unlock to after the wake up would for now be more problematic since the same lock is taken inside drm_sched_rq_update_fifo(). v2: * Improve commit message. (Philipp) * Cache the scheduler pointer directly. (Christian) Signed-off-by: Tvrtko Ursulin Fixes: b37aced31eb0 ("drm/scheduler: implement a function to modify sched list") Cc: Christian König Cc: Alex Deucher Cc: Luben Tuikov Cc: Matthew Brost Cc: David Airlie Cc: Daniel Vetter Cc: Philipp Stanner Cc: dri-devel@lists.freedesktop.org Cc: # v5.7+ Reviewed-by: Christian König --- drivers/gpu/drm/scheduler/sched_entity.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c index 0e002c17fcb6..a75eede8bf8d 100644 --- a/drivers/gpu/drm/scheduler/sched_entity.c +++ b/drivers/gpu/drm/scheduler/sched_entity.c @@ -599,6 +599,9 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) /* first job wakes up scheduler */ if (first) { + struct drm_gpu_scheduler *sched; + struct drm_sched_rq *rq; + /* Add the entity to the run queue */ spin_lock(&entity->rq_lock); if (entity->stopped) { @@ -608,13 +611,16 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) return; } - drm_sched_rq_add_entity(entity->rq, entity); + rq 
= entity->rq; + sched = rq->sched; + + drm_sched_rq_add_entity(rq, entity); spin_unlock(&entity->rq_lock); if (drm_sched_policy == DRM_SCHED_POLICY_FIFO) drm_sched_rq_update_fifo(entity, submit_ts); - drm_sched_wakeup(entity->rq->sched); + drm_sched_wakeup(sched); } } EXPORT_SYMBOL(drm_sched_entity_push_job);
[PATCH 8/8] drm/sched: Further optimise drm_sched_entity_push_job
From: Tvrtko Ursulin Having removed one re-lock cycle on the entity->lock in a patch titled "drm/sched: Optimise drm_sched_entity_push_job", with only a slightly larger refactoring we can do the same optimisation on the rq->lock. (Currently both drm_sched_rq_add_entity() and drm_sched_rq_update_fifo_locked() take and release the same lock.) To achieve this we make drm_sched_rq_update_fifo_locked() and drm_sched_rq_add_entity() expect the rq->lock to be held. We also align drm_sched_rq_update_fifo_locked(), drm_sched_rq_add_entity() and drm_sched_rq_remove_fifo_locked() function signatures, by adding rq as a parameter to the latter. v2: * Fix after rebase of the series. * Avoid naming inconsistency between drm_sched_rq_add/remove. (Christian) Signed-off-by: Tvrtko Ursulin Cc: Christian König Cc: Alex Deucher Cc: Luben Tuikov Cc: Matthew Brost Cc: Philipp Stanner Reviewed-by: Christian König --- drivers/gpu/drm/scheduler/sched_entity.c | 12 -- drivers/gpu/drm/scheduler/sched_main.c | 29 include/drm/gpu_scheduler.h | 3 ++- 3 files changed, 26 insertions(+), 18 deletions(-) diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c index 5ebbba77e77d..0aa90829c1d2 100644 --- a/drivers/gpu/drm/scheduler/sched_entity.c +++ b/drivers/gpu/drm/scheduler/sched_entity.c @@ -515,9 +515,14 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity) next = to_drm_sched_job(spsc_queue_peek(&entity->job_queue)); if (next) { + struct drm_sched_rq *rq; + spin_lock(&entity->lock); - drm_sched_rq_update_fifo_locked(entity, + rq = entity->rq; + spin_lock(&rq->lock); + drm_sched_rq_update_fifo_locked(entity, rq, next->submit_ts); + spin_unlock(&rq->lock); spin_unlock(&entity->lock); } } @@ -618,11 +623,14 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) sched = rq->sched; atomic_inc(sched->score); + + spin_lock(&rq->lock); drm_sched_rq_add_entity(rq, entity); if (drm_sched_policy == DRM_SCHED_POLICY_FIFO) -
drm_sched_rq_update_fifo_locked(entity, submit_ts); + drm_sched_rq_update_fifo_locked(entity, rq, submit_ts); + spin_unlock(&rq->lock); spin_unlock(&entity->lock); drm_sched_wakeup(sched); diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index 5628a4c78242..bdb55545b57c 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -153,17 +153,18 @@ static __always_inline bool drm_sched_entity_compare_before(struct rb_node *a, return ktime_before(ent_a->oldest_job_waiting, ent_b->oldest_job_waiting); } -static inline void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *entity) +static void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *entity, + struct drm_sched_rq *rq) { - struct drm_sched_rq *rq = entity->rq; - if (!RB_EMPTY_NODE(&entity->rb_tree_node)) { rb_erase_cached(&entity->rb_tree_node, &rq->rb_tree_root); RB_CLEAR_NODE(&entity->rb_tree_node); } } -void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, ktime_t ts) +void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, +struct drm_sched_rq *rq, +ktime_t ts) { /* * Both locks need to be grabbed, one to protect from entity->rq change @@ -171,17 +172,14 @@ void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, ktime_t ts * other to update the rb tree structure. */ lockdep_assert_held(&entity->lock); + lockdep_assert_held(&rq->lock); - spin_lock(&entity->rq->lock); - - drm_sched_rq_remove_fifo_locked(entity); + drm_sched_rq_remove_fifo_locked(entity, rq); entity->oldest_job_waiting = ts; - rb_add_cached(&entity->rb_tree_node, &entity->rq->rb_tree_root, + rb_add_cached(&entity->rb_tree_node, &rq->rb_tree_root, drm_sched_entity_compare_before); - - spin_unlock(&entity->rq->lock); } /** @@ -213,15 +211,14 @@ static void drm_sched_rq_init(struct drm_gpu_scheduler *sched, void drm_sched_rq_add_entity(struct drm_sched_r
[PATCH 7/8] drm/sched: Re-group and rename the entity run-queue lock
From: Tvrtko Ursulin Christian suggested to rename the lock and improve the documentation of what it protects. And to also re-order the structure members so all protected by the lock are together in a block. Signed-off-by: Tvrtko Ursulin Cc: Christian König Cc: Alex Deucher Cc: Luben Tuikov Cc: Matthew Brost Cc: Philipp Stanner Reviewed-by: Christian König --- drivers/gpu/drm/scheduler/sched_entity.c | 28 drivers/gpu/drm/scheduler/sched_main.c | 2 +- include/drm/gpu_scheduler.h | 15 +++-- 3 files changed, 23 insertions(+), 22 deletions(-) diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c index 59f710afe992..5ebbba77e77d 100644 --- a/drivers/gpu/drm/scheduler/sched_entity.c +++ b/drivers/gpu/drm/scheduler/sched_entity.c @@ -105,7 +105,7 @@ int drm_sched_entity_init(struct drm_sched_entity *entity, /* We start in an idle state. */ complete_all(&entity->entity_idle); - spin_lock_init(&entity->rq_lock); + spin_lock_init(&entity->lock); spsc_queue_init(&entity->job_queue); atomic_set(&entity->fence_seq, 0); @@ -133,10 +133,10 @@ void drm_sched_entity_modify_sched(struct drm_sched_entity *entity, { WARN_ON(!num_sched_list || !sched_list); - spin_lock(&entity->rq_lock); + spin_lock(&entity->lock); entity->sched_list = sched_list; entity->num_sched_list = num_sched_list; - spin_unlock(&entity->rq_lock); + spin_unlock(&entity->lock); } EXPORT_SYMBOL(drm_sched_entity_modify_sched); @@ -244,10 +244,10 @@ static void drm_sched_entity_kill(struct drm_sched_entity *entity) if (!entity->rq) return; - spin_lock(&entity->rq_lock); + spin_lock(&entity->lock); entity->stopped = true; drm_sched_rq_remove_entity(entity->rq, entity); - spin_unlock(&entity->rq_lock); + spin_unlock(&entity->lock); /* Make sure this entity is not used by the scheduler at the moment */ wait_for_completion(&entity->entity_idle); @@ -396,9 +396,9 @@ static void drm_sched_entity_wakeup(struct dma_fence *f, void drm_sched_entity_set_priority(struct drm_sched_entity 
*entity, enum drm_sched_priority priority) { - spin_lock(&entity->rq_lock); + spin_lock(&entity->lock); entity->priority = priority; - spin_unlock(&entity->rq_lock); + spin_unlock(&entity->lock); } EXPORT_SYMBOL(drm_sched_entity_set_priority); @@ -515,10 +515,10 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity) next = to_drm_sched_job(spsc_queue_peek(&entity->job_queue)); if (next) { - spin_lock(&entity->rq_lock); + spin_lock(&entity->lock); drm_sched_rq_update_fifo_locked(entity, next->submit_ts); - spin_unlock(&entity->rq_lock); + spin_unlock(&entity->lock); } } @@ -559,14 +559,14 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity) if (fence && !dma_fence_is_signaled(fence)) return; - spin_lock(&entity->rq_lock); + spin_lock(&entity->lock); sched = drm_sched_pick_best(entity->sched_list, entity->num_sched_list); rq = sched ? sched->sched_rq[entity->priority] : NULL; if (rq != entity->rq) { drm_sched_rq_remove_entity(entity->rq, entity); entity->rq = rq; } - spin_unlock(&entity->rq_lock); + spin_unlock(&entity->lock); if (entity->num_sched_list == 1) entity->sched_list = NULL; @@ -606,9 +606,9 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) struct drm_sched_rq *rq; /* Add the entity to the run queue */ - spin_lock(&entity->rq_lock); + spin_lock(&entity->lock); if (entity->stopped) { - spin_unlock(&entity->rq_lock); + spin_unlock(&entity->lock); DRM_ERROR("Trying to push to a killed entity\n"); return; @@ -623,7 +623,7 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) if (drm_sched_policy == DRM_SCHED_POLICY_FIFO) drm_sched_rq_update_fifo_locked(entity, submit_ts); - spin_unlock(&entity->rq_lock); + spin_unlock(&entity->lock);
[PATCH 2/8] drm/sched: Always wake up correct scheduler in drm_sched_entity_push_job
From: Tvrtko Ursulin Since drm_sched_entity_modify_sched() can modify the entity's run queue, let's make sure to only dereference the pointer once so both adding and waking up are guaranteed to be consistent. The alternative of moving the spin_unlock to after the wake up would for now be more problematic since the same lock is taken inside drm_sched_rq_update_fifo(). v2: * Improve commit message. (Philipp) * Cache the scheduler pointer directly. (Christian) Signed-off-by: Tvrtko Ursulin Fixes: b37aced31eb0 ("drm/scheduler: implement a function to modify sched list") Cc: Christian König Cc: Alex Deucher Cc: Luben Tuikov Cc: Matthew Brost Cc: David Airlie Cc: Daniel Vetter Cc: Philipp Stanner Cc: dri-devel@lists.freedesktop.org Cc: # v5.7+ Reviewed-by: Christian König --- drivers/gpu/drm/scheduler/sched_entity.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c index 0e002c17fcb6..a75eede8bf8d 100644 --- a/drivers/gpu/drm/scheduler/sched_entity.c +++ b/drivers/gpu/drm/scheduler/sched_entity.c @@ -599,6 +599,9 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) /* first job wakes up scheduler */ if (first) { + struct drm_gpu_scheduler *sched; + struct drm_sched_rq *rq; + /* Add the entity to the run queue */ spin_lock(&entity->rq_lock); if (entity->stopped) { @@ -608,13 +611,16 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) return; } - drm_sched_rq_add_entity(entity->rq, entity); + rq = entity->rq; + sched = rq->sched; + + drm_sched_rq_add_entity(rq, entity); spin_unlock(&entity->rq_lock); if (drm_sched_policy == DRM_SCHED_POLICY_FIFO) drm_sched_rq_update_fifo(entity, submit_ts); - drm_sched_wakeup(entity->rq->sched); + drm_sched_wakeup(sched); } } EXPORT_SYMBOL(drm_sched_entity_push_job); -- 2.46.0
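The "dereference once under the lock, then use the cached pointer" idea from this patch can be shown in a userspace miniature. This is a sketch, not the scheduler code: the toy spinlock, the struct layouts and the names push_job() and retarget() are all hypothetical, and the "race" is simulated deterministically by re-targeting the entity right after the lock is dropped.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy spinlock standing in for the kernel's spinlock_t (hypothetical). */
struct toy_lock {
	atomic_bool locked;
};

void toy_lock_acquire(struct toy_lock *l)
{
	while (atomic_exchange(&l->locked, true))
		; /* spin until the current holder releases the lock */
}

void toy_lock_release(struct toy_lock *l)
{
	atomic_store(&l->locked, false);
}

struct sched {
	int wakeups;
};

struct rq {
	struct sched *sched;
};

struct entity {
	struct toy_lock lock;
	struct rq *rq; /* may be changed by another thread under the lock */
};

/* Simulates a concurrent path moving the entity to another run queue. */
void retarget(struct entity *e, struct rq *new_rq)
{
	toy_lock_acquire(&e->lock);
	e->rq = new_rq;
	toy_lock_release(&e->lock);
}

/*
 * Reads e->rq exactly once while the lock is held and wakes the
 * scheduler that was cached at that point. moved_to, when not NULL,
 * simulates the race window between unlock and wakeup.
 */
struct sched *push_job(struct entity *e, struct rq *moved_to)
{
	struct sched *sched;
	struct rq *rq;

	toy_lock_acquire(&e->lock);
	rq = e->rq;
	sched = rq->sched;
	/* ... the entity would be added to rq here ... */
	toy_lock_release(&e->lock);

	if (moved_to != NULL)
		retarget(e, moved_to);

	assert(sched == rq->sched); /* cached pair is still consistent */
	sched->wakeups++;           /* stand-in for the wakeup call */
	return sched;
}
```

Had the wakeup re-read e->rq->sched after the unlock, the simulated re-target would have woken the second scheduler while the entity had been added to the first run queue, which is the inconsistency the patch describes.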
[PATCH 5/8] drm/sched: Stop setting current entity in FIFO mode
From: Tvrtko Ursulin It does not seem there is a need to set the current entity in FIFO mode since it only serves as a "cursor" in round-robin mode. Even if scheduling mode is changed at runtime the change in behaviour is simply to restart from the first entity, instead of continuing in RR mode from where FIFO left it, and that sounds completely fine. Signed-off-by: Tvrtko Ursulin Cc: Christian König Cc: Alex Deucher Cc: Luben Tuikov Cc: Matthew Brost Cc: Philipp Stanner Acked-by: Christian König Reviewed-by: Philipp Stanner --- drivers/gpu/drm/scheduler/sched_main.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index e312eb6ac85a..130b53f02bbf 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -349,7 +349,6 @@ drm_sched_rq_select_entity_fifo(struct drm_gpu_scheduler *sched, return ERR_PTR(-ENOSPC); } - rq->current_entity = entity; reinit_completion(&entity->entity_idle); break; } -- 2.46.0
[PATCH 1/8] drm/sched: Add locking to drm_sched_entity_modify_sched
From: Tvrtko Ursulin Without the locking, amdgpu can currently race between amdgpu_ctx_set_entity_priority() (via drm_sched_entity_modify_sched()) and drm_sched_job_arm(), leading to the latter accessing a potentially inconsistent entity->sched_list and entity->num_sched_list pair. v2: * Improve commit message. (Philipp) Signed-off-by: Tvrtko Ursulin Fixes: b37aced31eb0 ("drm/scheduler: implement a function to modify sched list") Cc: Christian König Cc: Alex Deucher Cc: Luben Tuikov Cc: Matthew Brost Cc: David Airlie Cc: Daniel Vetter Cc: dri-devel@lists.freedesktop.org Cc: Philipp Stanner Cc: # v5.7+ Reviewed-by: Christian König --- drivers/gpu/drm/scheduler/sched_entity.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c index 567e5ace6d0c..0e002c17fcb6 100644 --- a/drivers/gpu/drm/scheduler/sched_entity.c +++ b/drivers/gpu/drm/scheduler/sched_entity.c @@ -133,8 +133,10 @@ void drm_sched_entity_modify_sched(struct drm_sched_entity *entity, { WARN_ON(!num_sched_list || !sched_list); + spin_lock(&entity->rq_lock); entity->sched_list = sched_list; entity->num_sched_list = num_sched_list; + spin_unlock(&entity->rq_lock); } EXPORT_SYMBOL(drm_sched_entity_modify_sched); -- 2.46.0
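Why a list pointer and its length have to change inside one critical section can be sketched in userspace as follows. Everything here is an assumption made for illustration: the toy spinlock, the reduced struct entity, the use of plain ints in place of scheduler pointers, and the names modify_sched() and last_sched().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy spinlock standing in for the kernel's spinlock_t (hypothetical). */
struct toy_lock {
	atomic_bool locked;
};

void toy_lock_acquire(struct toy_lock *l)
{
	while (atomic_exchange(&l->locked, true))
		; /* spin until the current holder releases the lock */
}

void toy_lock_release(struct toy_lock *l)
{
	atomic_store(&l->locked, false);
}

/* Reduced entity: the list pointer and its length form a pair. */
struct entity {
	struct toy_lock lock;
	const int *sched_list; /* ints stand in for scheduler pointers */
	unsigned int num_sched_list;
};

/* Writer: both fields are replaced inside one critical section. */
void modify_sched(struct entity *e, const int *list, unsigned int num)
{
	assert(list != NULL && num > 0);

	toy_lock_acquire(&e->lock);
	e->sched_list = list;
	e->num_sched_list = num;
	toy_lock_release(&e->lock);
}

/*
 * Reader: takes the same lock, so it can never combine a new list
 * with a stale length (or the other way round).
 */
int last_sched(struct entity *e)
{
	int last;

	toy_lock_acquire(&e->lock);
	last = e->sched_list[e->num_sched_list - 1];
	toy_lock_release(&e->lock);

	return last;
}
```

Without the lock in the writer, a reader running between the two stores could index a one-element list with the old length of three, which is the kind of inconsistent pair the commit message refers to.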
[PATCH 4/8] drm/sched: Optimise drm_sched_entity_push_job
From: Tvrtko Ursulin In FIFO mode we can avoid dropping the lock only to immediately re-acquire it by adding a new drm_sched_rq_update_fifo_locked() helper. v2: * Remove drm_sched_rq_update_fifo() altogether. (Christian) Signed-off-by: Tvrtko Ursulin Cc: Christian König Cc: Alex Deucher Cc: Luben Tuikov Cc: Matthew Brost Cc: Philipp Stanner Reviewed-by: Christian König --- drivers/gpu/drm/scheduler/sched_entity.c | 13 + drivers/gpu/drm/scheduler/sched_main.c | 6 +++--- include/drm/gpu_scheduler.h | 2 +- 3 files changed, 13 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c index b2cf3e0c1838..59f710afe992 100644 --- a/drivers/gpu/drm/scheduler/sched_entity.c +++ b/drivers/gpu/drm/scheduler/sched_entity.c @@ -514,8 +514,12 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity) struct drm_sched_job *next; next = to_drm_sched_job(spsc_queue_peek(&entity->job_queue)); - if (next) - drm_sched_rq_update_fifo(entity, next->submit_ts); + if (next) { + spin_lock(&entity->rq_lock); + drm_sched_rq_update_fifo_locked(entity, + next->submit_ts); + spin_unlock(&entity->rq_lock); + } } /* Jobs and entities might have different lifecycles. 
Since we're @@ -615,10 +619,11 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) atomic_inc(sched->score); drm_sched_rq_add_entity(rq, entity); - spin_unlock(&entity->rq_lock); if (drm_sched_policy == DRM_SCHED_POLICY_FIFO) - drm_sched_rq_update_fifo(entity, submit_ts); + drm_sched_rq_update_fifo_locked(entity, submit_ts); + + spin_unlock(&entity->rq_lock); drm_sched_wakeup(sched); } diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index 03c532590e2a..e312eb6ac85a 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -163,14 +163,15 @@ static inline void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *enti } } -void drm_sched_rq_update_fifo(struct drm_sched_entity *entity, ktime_t ts) +void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, ktime_t ts) { /* * Both locks need to be grabbed, one to protect from entity->rq change * for entity from within concurrent drm_sched_entity_select_rq and the * other to update the rb tree structure. 
*/ - spin_lock(&entity->rq_lock); + lockdep_assert_held(&entity->rq_lock); + spin_lock(&entity->rq->lock); drm_sched_rq_remove_fifo_locked(entity); @@ -181,7 +182,6 @@ void drm_sched_rq_update_fifo(struct drm_sched_entity *entity, ktime_t ts) drm_sched_entity_compare_before); spin_unlock(&entity->rq->lock); - spin_unlock(&entity->rq_lock); } /** diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h index 0b679700a63a..f62cc31fea18 100644 --- a/include/drm/gpu_scheduler.h +++ b/include/drm/gpu_scheduler.h @@ -593,7 +593,7 @@ void drm_sched_rq_add_entity(struct drm_sched_rq *rq, void drm_sched_rq_remove_entity(struct drm_sched_rq *rq, struct drm_sched_entity *entity); -void drm_sched_rq_update_fifo(struct drm_sched_entity *entity, ktime_t ts); +void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, ktime_t ts); int drm_sched_entity_init(struct drm_sched_entity *entity, enum drm_sched_priority priority, -- 2.46.0
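The effect of turning a self-locking helper into a "_locked" helper can be sketched in userspace. This is a miniature under assumptions, not the scheduler code: it models only the entity lock (the real helper in this patch still takes the run-queue lock internally), toy_assert_held() is a rough stand-in for lockdep_assert_held() that only checks that somebody holds the lock, and all struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy spinlock that also counts how often it was taken (hypothetical). */
struct toy_lock {
	atomic_bool locked;
	unsigned int acquisitions;
};

void toy_lock_acquire(struct toy_lock *l)
{
	while (atomic_exchange(&l->locked, true))
		; /* spin until the current holder releases the lock */
	l->acquisitions++; /* safe: updated only while holding the lock */
}

void toy_lock_release(struct toy_lock *l)
{
	atomic_store(&l->locked, false);
}

/* Rough analogue of lockdep_assert_held(). */
void toy_assert_held(struct toy_lock *l)
{
	assert(atomic_load(&l->locked));
}

struct entity {
	struct toy_lock rq_lock;
	bool queued;
	long oldest_job_waiting;
};

/* "_locked" helper: the caller must already hold e->rq_lock. */
void update_fifo_locked(struct entity *e, long ts)
{
	toy_assert_held(&e->rq_lock);
	e->oldest_job_waiting = ts;
}

/* Before: the lock is dropped and immediately taken again. */
void push_job_relock(struct entity *e, long ts)
{
	toy_lock_acquire(&e->rq_lock);
	e->queued = true; /* stand-in for adding the entity to the rq */
	toy_lock_release(&e->rq_lock);

	toy_lock_acquire(&e->rq_lock); /* what the old helper did itself */
	update_fifo_locked(e, ts);
	toy_lock_release(&e->rq_lock);
}

/* After: one critical section covers both steps. */
void push_job_single(struct entity *e, long ts)
{
	toy_lock_acquire(&e->rq_lock);
	e->queued = true;
	update_fifo_locked(e, ts);
	toy_lock_release(&e->rq_lock);
}
```

Both variants end in the same state; the second one takes the lock once instead of twice and leaves no window between the two steps in which another thread could observe the entity queued but without its timestamp updated.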
[PATCH 6/8] drm/sched: Re-order struct drm_sched_rq members for clarity
From: Tvrtko Ursulin The current kerneldoc for struct drm_sched_rq incompletely documents which fields are protected by the lock. This is not good because it is misleading. Let's fix it by listing all the elements which are protected by the lock. While at it, let's also re-order the members so all protected by the lock are in a single group. v2: * Refer to variables using kerneldoc syntax, more verbose commit text. (Philipp) Signed-off-by: Tvrtko Ursulin Cc: Christian König Cc: Alex Deucher Cc: Luben Tuikov Cc: Matthew Brost Cc: Philipp Stanner Reviewed-by: Christian König Reviewed-by: Philipp Stanner --- include/drm/gpu_scheduler.h | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h index f62cc31fea18..33c60889e0a3 100644 --- a/include/drm/gpu_scheduler.h +++ b/include/drm/gpu_scheduler.h @@ -243,10 +243,10 @@ struct drm_sched_entity { /** * struct drm_sched_rq - queue of entities to be scheduled. * - * @lock: to modify the entities list. * @sched: the scheduler to which this rq belongs to. - * @entities: list of the entities to be scheduled. + * @lock: protects @entities, @rb_tree_root and @current_entity. * @current_entity: the entity which is to be scheduled. + * @entities: list of the entities to be scheduled. * @rb_tree_root: root of time based priory queue of entities for FIFO scheduling * * Run queue is a set of entities scheduling command submissions for @@ -254,10 +254,12 @@ struct drm_sched_entity { * the next entity to emit commands from. */ struct drm_sched_rq { - spinlock_t lock; struct drm_gpu_scheduler *sched; - struct list_head entities; + + spinlock_t lock; + /* Following members are protected by the @lock: */ struct drm_sched_entity *current_entity; + struct list_head entities; struct rb_root_cached rb_tree_root; }; -- 2.46.0
[PATCH 3/8] drm/sched: Always increment correct scheduler score
From: Tvrtko Ursulin An entity's run queue can change during drm_sched_entity_push_job() so make sure to update the score consistently. Signed-off-by: Tvrtko Ursulin Fixes: d41a39dda140 ("drm/scheduler: improve job distribution with multiple queues") Cc: Nirmoy Das Cc: Christian König Cc: Luben Tuikov Cc: Matthew Brost Cc: David Airlie Cc: Daniel Vetter Cc: dri-devel@lists.freedesktop.org Cc: # v5.9+ Reviewed-by: Christian König Reviewed-by: Nirmoy Das --- drivers/gpu/drm/scheduler/sched_entity.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c index a75eede8bf8d..b2cf3e0c1838 100644 --- a/drivers/gpu/drm/scheduler/sched_entity.c +++ b/drivers/gpu/drm/scheduler/sched_entity.c @@ -586,7 +586,6 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) ktime_t submit_ts; trace_drm_sched_job(sched_job, entity); - atomic_inc(entity->rq->sched->score); WRITE_ONCE(entity->last_user, current->group_leader); /* @@ -614,6 +613,7 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) rq = entity->rq; sched = rq->sched; + atomic_inc(sched->score); drm_sched_rq_add_entity(rq, entity); spin_unlock(&entity->rq_lock); -- 2.46.0
[PATCH v3 0/8] DRM scheduler fixes and improvements
From: Tvrtko Ursulin All reviewed now, re-sending after rebasing on latest drm-tip so it is in a mergeable state. Tvrtko Ursulin (8): drm/sched: Add locking to drm_sched_entity_modify_sched drm/sched: Always wake up correct scheduler in drm_sched_entity_push_job drm/sched: Always increment correct scheduler score drm/sched: Optimise drm_sched_entity_push_job drm/sched: Stop setting current entity in FIFO mode drm/sched: Re-order struct drm_sched_rq members for clarity drm/sched: Re-group and rename the entity run-queue lock drm/sched: Further optimise drm_sched_entity_push_job drivers/gpu/drm/scheduler/sched_entity.c | 53 +--- drivers/gpu/drm/scheduler/sched_main.c | 32 +++--- include/drm/gpu_scheduler.h | 28 +++-- 3 files changed, 68 insertions(+), 45 deletions(-) -- 2.46.0
Re: [PATCH v2] drm/sched: Further optimise drm_sched_entity_push_job
On 24/09/2024 10:45, Tvrtko Ursulin wrote: On 24/09/2024 09:20, Christian König wrote: Am 16.09.24 um 19:30 schrieb Tvrtko Ursulin: From: Tvrtko Ursulin Having removed one re-lock cycle on the entity->lock in a patch titled "drm/sched: Optimise drm_sched_entity_push_job", with only a tiny bit larger refactoring we can do the same optimisation on the rq->lock. (Currently both drm_sched_rq_add_entity() and drm_sched_rq_update_fifo_locked() take and release the same lock.) To achieve this we make drm_sched_rq_update_fifo_locked() and drm_sched_rq_add_entity() expect the rq->lock to be held. We also align drm_sched_rq_update_fifo_locked(), drm_sched_rq_add_entity() and drm_sched_rq_remove_fifo_locked() function signatures, by adding rq as a parameter to the latter. v2: * Fix after rebase of the series. * Avoid naming incosistency between drm_sched_rq_add/remove. (Christian) Signed-off-by: Tvrtko Ursulin Cc: Christian König Cc: Alex Deucher Cc: Luben Tuikov Cc: Matthew Brost Cc: Philipp Stanner Reviewed-by: Christian König Thanks! Are you okay to pull into drm-misc-next or we should do some more testing on this? And/or should I resend the series once more in it's entirety so this v2 is not a reply-to to the original? I have to respin for the drm_sched_wakeup fix that landed. 
Regards, Tvrtko Regards, Tvrtko --- drivers/gpu/drm/scheduler/sched_entity.c | 12 -- drivers/gpu/drm/scheduler/sched_main.c | 29 include/drm/gpu_scheduler.h | 3 ++- 3 files changed, 26 insertions(+), 18 deletions(-) diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c index d982cebc6bee..8ace1f1ea66b 100644 --- a/drivers/gpu/drm/scheduler/sched_entity.c +++ b/drivers/gpu/drm/scheduler/sched_entity.c @@ -515,9 +515,14 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity) next = to_drm_sched_job(spsc_queue_peek(&entity->job_queue)); if (next) { + struct drm_sched_rq *rq; + spin_lock(&entity->lock); - drm_sched_rq_update_fifo_locked(entity, + rq = entity->rq; + spin_lock(&rq->lock); + drm_sched_rq_update_fifo_locked(entity, rq, next->submit_ts); + spin_unlock(&rq->lock); spin_unlock(&entity->lock); } } @@ -618,11 +623,14 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) sched = rq->sched; atomic_inc(sched->score); + + spin_lock(&rq->lock); drm_sched_rq_add_entity(rq, entity); if (drm_sched_policy == DRM_SCHED_POLICY_FIFO) - drm_sched_rq_update_fifo_locked(entity, submit_ts); + drm_sched_rq_update_fifo_locked(entity, rq, submit_ts); + spin_unlock(&rq->lock); spin_unlock(&entity->lock); drm_sched_wakeup(sched, entity); diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index 18a952f73ecb..5c83fb92bb89 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -153,17 +153,18 @@ static __always_inline bool drm_sched_entity_compare_before(struct rb_node *a, return ktime_before(ent_a->oldest_job_waiting, ent_b->oldest_job_waiting); } -static inline void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *entity) +static void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *entity, + struct drm_sched_rq *rq) { - struct drm_sched_rq *rq = entity->rq; - if 
(!RB_EMPTY_NODE(&entity->rb_tree_node)) { rb_erase_cached(&entity->rb_tree_node, &rq->rb_tree_root); RB_CLEAR_NODE(&entity->rb_tree_node); } } -void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, ktime_t ts) +void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, + struct drm_sched_rq *rq, + ktime_t ts) { /* * Both locks need to be grabbed, one to protect from entity->rq change @@ -171,17 +172,14 @@ void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, ktime_t ts * other to update the rb tree structure. */ lockdep_assert_held(&entity->lock); + lockdep_assert_held(&rq->lock); - spin_lock(&entity->rq->lock); - - drm_sched_rq_remove_fifo_locked(entity); + drm_sched_rq_remove_fifo_locked(entity, rq); entity->oldest_job_waiting = ts; - rb_add_cached(&entity->rb_tree_node, &entity->rq->rb_tree_root, + rb_add_cached(&entity->rb_tree_node, &rq->rb_tree_root, drm_sched_entity_compare_before); - - spin_unlock(&entity->rq->lo
Re: [PATCH v2] drm/sched: Further optimise drm_sched_entity_push_job
On 24/09/2024 09:20, Christian König wrote: On 16.09.24 at 19:30, Tvrtko Ursulin wrote: From: Tvrtko Ursulin Having removed one re-lock cycle on the entity->lock in a patch titled "drm/sched: Optimise drm_sched_entity_push_job", with only a tiny bit larger refactoring we can do the same optimisation on the rq->lock. (Currently both drm_sched_rq_add_entity() and drm_sched_rq_update_fifo_locked() take and release the same lock.) To achieve this we make drm_sched_rq_update_fifo_locked() and drm_sched_rq_add_entity() expect the rq->lock to be held. We also align drm_sched_rq_update_fifo_locked(), drm_sched_rq_add_entity() and drm_sched_rq_remove_fifo_locked() function signatures, by adding rq as a parameter to the latter. v2: * Fix after rebase of the series. * Avoid naming inconsistency between drm_sched_rq_add/remove. (Christian) Signed-off-by: Tvrtko Ursulin Cc: Christian König Cc: Alex Deucher Cc: Luben Tuikov Cc: Matthew Brost Cc: Philipp Stanner Reviewed-by: Christian König Thanks! Are you okay to pull into drm-misc-next or should we do some more testing on this? And/or should I resend the series once more in its entirety so this v2 is not a reply-to to the original? 
Regards, Tvrtko --- drivers/gpu/drm/scheduler/sched_entity.c | 12 -- drivers/gpu/drm/scheduler/sched_main.c | 29 include/drm/gpu_scheduler.h | 3 ++- 3 files changed, 26 insertions(+), 18 deletions(-) diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c index d982cebc6bee..8ace1f1ea66b 100644 --- a/drivers/gpu/drm/scheduler/sched_entity.c +++ b/drivers/gpu/drm/scheduler/sched_entity.c @@ -515,9 +515,14 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity) next = to_drm_sched_job(spsc_queue_peek(&entity->job_queue)); if (next) { + struct drm_sched_rq *rq; + spin_lock(&entity->lock); - drm_sched_rq_update_fifo_locked(entity, + rq = entity->rq; + spin_lock(&rq->lock); + drm_sched_rq_update_fifo_locked(entity, rq, next->submit_ts); + spin_unlock(&rq->lock); spin_unlock(&entity->lock); } } @@ -618,11 +623,14 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) sched = rq->sched; atomic_inc(sched->score); + + spin_lock(&rq->lock); drm_sched_rq_add_entity(rq, entity); if (drm_sched_policy == DRM_SCHED_POLICY_FIFO) - drm_sched_rq_update_fifo_locked(entity, submit_ts); + drm_sched_rq_update_fifo_locked(entity, rq, submit_ts); + spin_unlock(&rq->lock); spin_unlock(&entity->lock); drm_sched_wakeup(sched, entity); diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index 18a952f73ecb..5c83fb92bb89 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -153,17 +153,18 @@ static __always_inline bool drm_sched_entity_compare_before(struct rb_node *a, return ktime_before(ent_a->oldest_job_waiting, ent_b->oldest_job_waiting); } -static inline void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *entity) +static void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *entity, + struct drm_sched_rq *rq) { - struct drm_sched_rq *rq = entity->rq; - if (!RB_EMPTY_NODE(&entity->rb_tree_node)) { 
rb_erase_cached(&entity->rb_tree_node, &rq->rb_tree_root); RB_CLEAR_NODE(&entity->rb_tree_node); } } -void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, ktime_t ts) +void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, + struct drm_sched_rq *rq, + ktime_t ts) { /* * Both locks need to be grabbed, one to protect from entity->rq change @@ -171,17 +172,14 @@ void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, ktime_t ts * other to update the rb tree structure. */ lockdep_assert_held(&entity->lock); + lockdep_assert_held(&rq->lock); - spin_lock(&entity->rq->lock); - - drm_sched_rq_remove_fifo_locked(entity); + drm_sched_rq_remove_fifo_locked(entity, rq); entity->oldest_job_waiting = ts; - rb_add_cached(&entity->rb_tree_node, &entity->rq->rb_tree_root, + rb_add_cached(&entity->rb_tree_node, &rq->rb_tree_root, drm_sched_entity_compare_before); - - spin_unlock(&entity->rq->lock); } /** @@ -213,15 +211,14 @@ static void drm_sched_rq_init(struct drm_gpu_scheduler *sched, void drm_sched_rq_a
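For readers skimming the thread, the locking change described in the commit message can be sketched in plain userspace C. This is only an illustration under stated assumptions: `rq_sketch`, `push_job_before()` and `push_job_after()` are made-up names, a pthread mutex stands in for the `rq->lock` spinlock, and the `lock_cycles` counter exists purely to make the saved lock/unlock cycle visible. It is not the scheduler code itself.

```c
#include <pthread.h>

/* Hypothetical stand-in for struct drm_sched_rq; not the kernel structure. */
struct rq_sketch {
	pthread_mutex_t lock;
	unsigned int lock_cycles;	/* how many times the lock was taken */
	unsigned int nr_entities;
	long oldest_ts;
};

static void rq_lock(struct rq_sketch *rq)
{
	pthread_mutex_lock(&rq->lock);
	rq->lock_cycles++;
}

static void rq_unlock(struct rq_sketch *rq)
{
	pthread_mutex_unlock(&rq->lock);
}

/* Before: each helper takes and releases the lock on its own. */
static void rq_add_entity(struct rq_sketch *rq)
{
	rq_lock(rq);
	rq->nr_entities++;
	rq_unlock(rq);
}

static void rq_update_fifo(struct rq_sketch *rq, long ts)
{
	rq_lock(rq);
	rq->oldest_ts = ts;
	rq_unlock(rq);
}

static void push_job_before(struct rq_sketch *rq, long ts)
{
	rq_add_entity(rq);	/* first lock cycle */
	rq_update_fifo(rq, ts);	/* second lock cycle on the same lock */
}

/* After: helpers expect the caller to hold the lock. */
static void rq_add_entity_locked(struct rq_sketch *rq)
{
	rq->nr_entities++;
}

static void rq_update_fifo_locked(struct rq_sketch *rq, long ts)
{
	rq->oldest_ts = ts;
}

static void push_job_after(struct rq_sketch *rq, long ts)
{
	rq_lock(rq);		/* single lock cycle covers both updates */
	rq_add_entity_locked(rq);
	rq_update_fifo_locked(rq, ts);
	rq_unlock(rq);
}
```

In the kernel the "caller holds the lock" contract is documented and checked with lockdep_assert_held(), as the patch does; the sketch has no equivalent check and relies on the naming convention alone.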
Re: [PATCH v3 3/6] drm/amdgpu: delay the use of amdgpu_vm_set_task_info
On 24/09/2024 09:23, Christian König wrote: On 23.09.24 at 12:25, Tvrtko Ursulin wrote: On 20/09/2024 10:06, Pierre-Eric Pelloux-Prayer wrote: At this point the vm is locked so we can safely modify it without risk of concurrent access. To which particular lock is this referring, and does this imply the previous placement was unsafe? We use the root PD's dma_resv object as the VM lock to protect most fields inside the VM structure; only a few are protected by an additional spinlock. And yes, previously it was possible that you got a mangled process/task name because no lock was protecting the task_info structure. Got it, thanks Christian! In this case I only suggest being more explicit in the commit message and clearly saying it is fixing an existing bug. As it stands I wasn't sure if it was that, or if the movement was just enabling the changes which come later in the series. Regards, Tvrtko Signed-off-by: Pierre-Eric Pelloux-Prayer --- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c index 1e475eb01417..891128ecee6d 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c @@ -309,9 +309,6 @@ static int amdgpu_cs_pass1(struct amdgpu_cs_parser *p, p->gang_leader->uf_addr = uf_offset; kvfree(chunk_array); - /* Use this opportunity to fill in task info for the vm */ - amdgpu_vm_set_task_info(vm); - return 0; free_all_kdata: @@ -1180,6 +1177,9 @@ static int amdgpu_cs_vm_handling(struct amdgpu_cs_parser *p) job->vm_pd_addr = amdgpu_gmc_pd_addr(vm->root.bo); } + /* Use this opportunity to fill in task info for the vm */ + amdgpu_vm_set_task_info(vm); + if (adev->debug_vm) { /* Invalidate all BOs to test for userspace bugs */ amdgpu_bo_list_for_each_entry(e, p->bo_list) {
Re: [PATCH v3 1/6] drm: add DRM_SET_NAME ioctl
On 24/09/2024 09:22, Pierre-Eric Pelloux-Prayer wrote: Le 23/09/2024 à 12:06, Tvrtko Ursulin a écrit : On 20/09/2024 10:06, Pierre-Eric Pelloux-Prayer wrote: Giving the opportunity to userspace to associate a free-form name with a drm_file struct is helpful for tracking and debugging. This is similar to the existing DMA_BUF_SET_NAME ioctl. Access to name is protected by a mutex, and the 'clients' debugfs file has been updated to print it. Userspace MR to use this ioctl: https://gitlab.freedesktop.org/virgl/virglrenderer/-/merge_requests/1428 The string passed by userspace is filtered a bit, to avoid messing output when it's going to be printed (in dmesg, fdinfo, etc): * all chars failing isgraph() are replaced by '-' * if a 0-length string is passed the name is cleared Signed-off-by: Pierre-Eric Pelloux-Prayer --- drivers/gpu/drm/drm_debugfs.c | 12 ++--- drivers/gpu/drm/drm_file.c | 5 drivers/gpu/drm/drm_ioctl.c | 48 +++ include/drm/drm_file.h | 9 +++ include/uapi/drm/drm.h | 17 + 5 files changed, 87 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/drm_debugfs.c b/drivers/gpu/drm/drm_debugfs.c index 6b239a24f1df..482e71160544 100644 --- a/drivers/gpu/drm/drm_debugfs.c +++ b/drivers/gpu/drm/drm_debugfs.c @@ -78,12 +78,13 @@ static int drm_clients_info(struct seq_file *m, void *data) kuid_t uid; seq_printf(m, - "%20s %5s %3s master a %5s %10s\n", + "%20s %5s %3s master a %5s %10s %20s\n", Allow full DRM_NAME_MAX_LEN? Not sure, feels not very consequential either way. I'll switch to: seq_printf(m, "%20s %5s %3s master a %5s %10s %*s\n", "command", "tgid", "dev", "uid", "magic", DRM_CLIENT_NAME_MAX_LEN, "name"); That works. And: seq_printf(m, "%20s %5d %3d %c %c %5d %10u %*s\n", task ? task->comm : "", pid_vnr(pid), priv->minor->index, is_current_master ? 'y' : 'n', priv->authenticated ? 'y' : 'n', from_kuid_munged(seq_user_ns(m), uid), priv->magic, DRM_CLIENT_NAME_MAX_LEN, priv->client_name ? 
priv->client_name : ""); Also works for me although it will look a bit busy by default since every line will contain it. I don't immediately see "parseability" is a concern (what Dmitry raised) because new code can detect if there is something there or not. For old code, or future changes, we do not care in debugfs. Equally we don't care that much if it looks busy, hence why I said "" works for me. I'd also be okay with repeating task->comm, but that perhaps complicates things too much when task is not available. "command", "tgid", "dev", "uid", - "magic"); + "magic", + "name"); /* dev->filelist is sorted youngest first, but we want to present * oldest first (i.e. kernel, servers, clients), so walk backwardss. @@ -94,19 +95,22 @@ static int drm_clients_info(struct seq_file *m, void *data) struct task_struct *task; struct pid *pid; + mutex_lock(&priv->name_lock); rcu_read_lock(); /* Locks priv->pid and pid_task()->comm! */ pid = rcu_dereference(priv->pid); task = pid_task(pid, PIDTYPE_TGID); uid = task ? __task_cred(task)->euid : GLOBAL_ROOT_UID; - seq_printf(m, "%20s %5d %3d %c %c %5d %10u\n", + seq_printf(m, "%20s %5d %3d %c %c %5d %10u %20s\n", task ? task->comm : "", pid_vnr(pid), priv->minor->index, is_current_master ? 'y' : 'n', priv->authenticated ? 'y' : 'n', from_kuid_munged(seq_user_ns(m), uid), - priv->magic); + priv->magic, + priv->name ?: ""); rcu_read_unlock(); + mutex_unlock(&priv->name_lock); } mutex_unlock(&dev->filelist_mutex); return 0; diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c index 01fde94fe2a9..e9dd0e90a1f9 100644 --- a/d
Re: [PATCH v2] drm/sched: Further optimise drm_sched_entity_push_job
Ping Christian and Philipp - reasonably happy with v2? I think it's the only unreviewed patch from the series. Regards, Tvrtko On 16/09/2024 18:30, Tvrtko Ursulin wrote: From: Tvrtko Ursulin Having removed one re-lock cycle on the entity->lock in a patch titled "drm/sched: Optimise drm_sched_entity_push_job", with only a tiny bit larger refactoring we can do the same optimisation on the rq->lock. (Currently both drm_sched_rq_add_entity() and drm_sched_rq_update_fifo_locked() take and release the same lock.) To achieve this we make drm_sched_rq_update_fifo_locked() and drm_sched_rq_add_entity() expect the rq->lock to be held. We also align drm_sched_rq_update_fifo_locked(), drm_sched_rq_add_entity() and drm_sched_rq_remove_fifo_locked() function signatures, by adding rq as a parameter to the latter. v2: * Fix after rebase of the series. * Avoid naming incosistency between drm_sched_rq_add/remove. (Christian) Signed-off-by: Tvrtko Ursulin Cc: Christian König Cc: Alex Deucher Cc: Luben Tuikov Cc: Matthew Brost Cc: Philipp Stanner --- drivers/gpu/drm/scheduler/sched_entity.c | 12 -- drivers/gpu/drm/scheduler/sched_main.c | 29 include/drm/gpu_scheduler.h | 3 ++- 3 files changed, 26 insertions(+), 18 deletions(-) diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c index d982cebc6bee..8ace1f1ea66b 100644 --- a/drivers/gpu/drm/scheduler/sched_entity.c +++ b/drivers/gpu/drm/scheduler/sched_entity.c @@ -515,9 +515,14 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity) next = to_drm_sched_job(spsc_queue_peek(&entity->job_queue)); if (next) { + struct drm_sched_rq *rq; + spin_lock(&entity->lock); - drm_sched_rq_update_fifo_locked(entity, + rq = entity->rq; + spin_lock(&rq->lock); + drm_sched_rq_update_fifo_locked(entity, rq, next->submit_ts); + spin_unlock(&rq->lock); spin_unlock(&entity->lock); } } @@ -618,11 +623,14 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) sched = rq->sched; 
atomic_inc(sched->score); + + spin_lock(&rq->lock); drm_sched_rq_add_entity(rq, entity); if (drm_sched_policy == DRM_SCHED_POLICY_FIFO) - drm_sched_rq_update_fifo_locked(entity, submit_ts); + drm_sched_rq_update_fifo_locked(entity, rq, submit_ts); + spin_unlock(&rq->lock); spin_unlock(&entity->lock); drm_sched_wakeup(sched, entity); diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index 18a952f73ecb..5c83fb92bb89 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -153,17 +153,18 @@ static __always_inline bool drm_sched_entity_compare_before(struct rb_node *a, return ktime_before(ent_a->oldest_job_waiting, ent_b->oldest_job_waiting); } -static inline void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *entity) +static void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *entity, + struct drm_sched_rq *rq) { - struct drm_sched_rq *rq = entity->rq; - if (!RB_EMPTY_NODE(&entity->rb_tree_node)) { rb_erase_cached(&entity->rb_tree_node, &rq->rb_tree_root); RB_CLEAR_NODE(&entity->rb_tree_node); } } -void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, ktime_t ts) +void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, +struct drm_sched_rq *rq, +ktime_t ts) { /* * Both locks need to be grabbed, one to protect from entity->rq change @@ -171,17 +172,14 @@ void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, ktime_t ts * other to update the rb tree structure. */ lockdep_assert_held(&entity->lock); + lockdep_assert_held(&rq->lock); - spin_lock(&entity->rq->lock); - - drm_sched_rq_remove_fifo_locked(entity); + drm_sched_rq_remove_fifo_locked(entity, rq); entity->oldest_job_waiting = ts; - rb_add_cached(&entity->rb_tree_node, &entity->rq->rb_tree_root, + rb_add_cached(&entity->rb_tree_node, &rq->rb_tree_root, drm_sched_entity_compare_before); - - spin_unlock(&entity->rq->lock); } /** @@
Re: [PATCH v3 4/6] drm/amdgpu: alloc and init vm::task_info from first submit
On 20/09/2024 10:06, Pierre-Eric Pelloux-Prayer wrote: This will allow to use flexible array to store the process name and other information. This also means that process name will be determined once and for all, instead of at each submit. But the pid and others can still change? By design? Signed-off-by: Pierre-Eric Pelloux-Prayer --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 15 +-- 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index e20d19ae01b2..690676cab022 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -2331,7 +2331,7 @@ amdgpu_vm_get_task_info_vm(struct amdgpu_vm *vm) { struct amdgpu_task_info *ti = NULL; - if (vm) { + if (vm && vm->task_info) { ti = vm->task_info; kref_get(&vm->task_info->refcount); } @@ -2372,8 +2372,12 @@ static int amdgpu_vm_create_task_info(struct amdgpu_vm *vm) */ void amdgpu_vm_set_task_info(struct amdgpu_vm *vm) { - if (!vm->task_info) - return; + if (!vm->task_info) { + if (amdgpu_vm_create_task_info(vm)) + return; + + get_task_comm(vm->task_info->process_name, current->group_leader); + } if (vm->task_info->pid == current->pid) This ends up relying on vm->task_info->pid being zero due kzalloc right? return; @@ -2385,7 +2389,6 @@ void amdgpu_vm_set_task_info(struct amdgpu_vm *vm) return; vm->task_info->tgid = current->group_leader->pid; - get_task_comm(vm->task_info->process_name, current->group_leader); } I wonder how many of the task_info fields you want to set once instead of per submission. Like a fully one shot like the below be what you want? 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index a060c28f0877..da492223a8b5 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -2349,16 +2349,6 @@ amdgpu_vm_get_task_info_pasid(struct amdgpu_device *adev, u32 pasid) amdgpu_vm_get_vm_from_pasid(adev, pasid)); } -static int amdgpu_vm_create_task_info(struct amdgpu_vm *vm) -{ - vm->task_info = kzalloc(sizeof(struct amdgpu_task_info), GFP_KERNEL); - if (!vm->task_info) - return -ENOMEM; - - kref_init(&vm->task_info->refcount); - return 0; -} - /** * amdgpu_vm_set_task_info - Sets VMs task info. * @@ -2366,20 +2356,28 @@ static int amdgpu_vm_create_task_info(struct amdgpu_vm *vm) */ void amdgpu_vm_set_task_info(struct amdgpu_vm *vm) { - if (!vm->task_info) - return; + struct amdgpu_task_info *task_info = vm->task_info; + + if (!task_info) { + task_info = kzalloc(sizeof(struct amdgpu_task_info), + GFP_KERNEL); + if (!task_info) + return; - if (vm->task_info->pid == current->pid) + kref_init(&task_info->refcount); + } else { return; + } - vm->task_info->pid = current->pid; - get_task_comm(vm->task_info->task_name, current); + task_info->pid = current->pid; + get_task_comm(task_info->task_name, current); - if (current->group_leader->mm != current->mm) - return; + if (current->group_leader->mm == current->mm) { + task_info->tgid = current->group_leader->pid; + get_task_comm(task_info->process_name, current->group_leader); + } - vm->task_info->tgid = current->group_leader->pid; - get_task_comm(vm->task_info->process_name, current->group_leader); + vm->task_info = task_info; } /** End result is code like this: void amdgpu_vm_set_task_info(struct amdgpu_vm *vm) { struct amdgpu_task_info *task_info = vm->task_info; if (!task_info) { task_info = kzalloc(sizeof(struct amdgpu_task_info), GFP_KERNEL); if (!task_info) return; kref_init(&task_info->refcount); } else { return; } task_info->pid = current->pid; 
get_task_comm(task_info->task_name, current); if (current->group_leader->mm == current->mm) { task_info->tgid = current->group_leader->pid; get_task_comm(task_info->process_name, current->group_leader); } vm->task_info = task_info; } ? /** @@ -2482,7 +2485,6 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm, if (r) goto error_free_root; - r = amdgpu_vm_create_task_info(vm); if (r) DRM_DEBUG("Failed to create task info for VM\n"); Two more lines to delete here.
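The "fully one shot" variant suggested above can be tried out in userspace with a small sketch. The names here (`task_info_sketch`, `vm_task_sketch`, `vm_set_task_info_once()`) are hypothetical, calloc() stands in for kzalloc(), snprintf() stands in for get_task_comm(), and pid/tgid/names are passed in as arguments instead of being read from `current`. The refcount and the group_leader->mm check from the review are left out.

```c
#include <stdio.h>
#include <stdlib.h>

#define COMM_LEN_SKETCH 16

/* Hypothetical stand-in for struct amdgpu_task_info. */
struct task_info_sketch {
	int pid;
	int tgid;
	char task_name[COMM_LEN_SKETCH];
	char process_name[COMM_LEN_SKETCH];
};

/* Hypothetical stand-in for the task_info pointer inside struct amdgpu_vm. */
struct vm_task_sketch {
	struct task_info_sketch *task_info;
};

/*
 * Fill in task info on the first call only. Later calls return early,
 * so pid, tgid and both names stay as recorded by the first submitter.
 */
static void vm_set_task_info_once(struct vm_task_sketch *vm, int pid, int tgid,
				  const char *task_name,
				  const char *process_name)
{
	struct task_info_sketch *ti;

	if (vm->task_info)
		return;

	ti = calloc(1, sizeof(*ti));
	if (!ti)
		return;

	ti->pid = pid;
	snprintf(ti->task_name, sizeof(ti->task_name), "%s", task_name);
	ti->tgid = tgid;
	snprintf(ti->process_name, sizeof(ti->process_name), "%s", process_name);

	/* Publish the pointer only once the structure is fully initialised. */
	vm->task_info = ti;
}
```

Whether all fields should really be frozen after the first submit is exactly the open question in the review; the sketch only shows what the one-shot behaviour would look like.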
Re: [PATCH v3 3/6] drm/amdgpu: delay the use of amdgpu_vm_set_task_info
On 20/09/2024 10:06, Pierre-Eric Pelloux-Prayer wrote: At this point the vm is locked so we safely modify it without risk of concurrent access. To which particular lock this is referring to and does this imply previous placement was unsafe? Regards, Tvrtko Signed-off-by: Pierre-Eric Pelloux-Prayer --- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c index 1e475eb01417..891128ecee6d 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c @@ -309,9 +309,6 @@ static int amdgpu_cs_pass1(struct amdgpu_cs_parser *p, p->gang_leader->uf_addr = uf_offset; kvfree(chunk_array); - /* Use this opportunity to fill in task info for the vm */ - amdgpu_vm_set_task_info(vm); - return 0; free_all_kdata: @@ -1180,6 +1177,9 @@ static int amdgpu_cs_vm_handling(struct amdgpu_cs_parser *p) job->vm_pd_addr = amdgpu_gmc_pd_addr(vm->root.bo); } + /* Use this opportunity to fill in task info for the vm */ + amdgpu_vm_set_task_info(vm); + if (adev->debug_vm) { /* Invalidate all BOs to test for userspace bugs */ amdgpu_bo_list_for_each_entry(e, p->bo_list) {
Re: [PATCH v3 1/6] drm: add DRM_SET_NAME ioctl
) { + kfree(new_name); + return -EINVAL; + } + + /* +* Filter out control char / spaces / new lines etc in the name +* since it's going to be used in dmesg or fdinfo's output. +*/ + for (i = 0; i < len; i++) { + if (!isgraph(new_name[i])) + new_name[i] = '-'; + } + + mutex_lock(&file_priv->name_lock); + kfree(file_priv->name); + if (len > 0) { + file_priv->name = new_name; + } else { + kfree(new_name); + file_priv->name = NULL; + } + mutex_unlock(&file_priv->name_lock); + + return 0; +} + static int drm_ioctl_permit(u32 flags, struct drm_file *file_priv) { /* ROOT_ONLY is only for CAP_SYS_ADMIN */ @@ -610,6 +656,8 @@ static const struct drm_ioctl_desc drm_ioctls[] = { DRM_IOCTL_DEF(DRM_IOCTL_PRIME_HANDLE_TO_FD, drm_prime_handle_to_fd_ioctl, DRM_RENDER_ALLOW), DRM_IOCTL_DEF(DRM_IOCTL_PRIME_FD_TO_HANDLE, drm_prime_fd_to_handle_ioctl, DRM_RENDER_ALLOW), + DRM_IOCTL_DEF(DRM_IOCTL_SET_NAME, drm_set_name, DRM_RENDER_ALLOW), + DRM_IOCTL_DEF(DRM_IOCTL_MODE_GETPLANERESOURCES, drm_mode_getplane_res, 0), DRM_IOCTL_DEF(DRM_IOCTL_MODE_GETCRTC, drm_mode_getcrtc, 0), DRM_IOCTL_DEF(DRM_IOCTL_MODE_SETCRTC, drm_mode_setcrtc, DRM_MASTER), diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h index 8c0030c77308..df26eee8f79c 100644 --- a/include/drm/drm_file.h +++ b/include/drm/drm_file.h @@ -388,6 +388,15 @@ struct drm_file { * Per-file buffer caches used by the PRIME buffer sharing code. */ struct drm_prime_file_private prime; + + /** +* @name: +* +* Userspace-provided name; useful for accounting and debugging. +*/ + const char *name; + /** @name_lock: Protects @name. 
*/ + struct mutex name_lock; }; /** diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h index 16122819edfe..f5e92e4f909b 100644 --- a/include/uapi/drm/drm.h +++ b/include/uapi/drm/drm.h @@ -1024,6 +1024,13 @@ struct drm_crtc_queue_sequence { __u64 user_data; /* user data passed to event */ }; +#define DRM_NAME_MAX_LEN 64 +struct drm_set_name { + __u64 name_len; + __u64 name; +}; + + #if defined(__cplusplus) } #endif @@ -1288,6 +1295,16 @@ extern "C" { */ #define DRM_IOCTL_MODE_CLOSEFB DRM_IOWR(0xD0, struct drm_mode_closefb) +/** + * DRM_IOCTL_SET_NAME - Attach a name to a drm_file + * + * This ioctl is similar to DMA_BUF_SET_NAME - it allows for easier tracking + * and debugging. + * The length of the name must <= DRM_NAME_MAX_LEN. All characters that are + * non-printable or whitespaces will be replaced by -. + */ +#define DRM_IOCTL_SET_NAME DRM_IOWR(0xD1, struct drm_set_name) + A comment, nice! :) Overall looks good to me. Reviewed-by: Tvrtko Ursulin I do however wish for more opinions (before merging) on whether strings with invalid characters should perhaps instead be rejected. I don't currently have a solid argument either way. Perhaps the only argument against silent transformation is if someone sets some wild string, then greps for it somewhere, which would be a false negative without the understanding of what kind of remapping the kernel does. It is weak but it is uapi so worth discussing every crazy possibility I think. On the other hand it would create another annoying source of EINVAL. :shrug: Also, how are we with testing the DRM core features? Add something for the uapi in IGT/tests/drm_client_name, or some such? Regards, Tvrtko /* * Device specific ioctls should only be in their respective headers * The device specific ioctl range is from 0x40 to 0x9f.
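To make the two options in this discussion concrete (silently remap versus reject), here is a small userspace sketch. `sanitize_name()` mirrors the isgraph() loop from the patch, while `name_is_clean()` is a hypothetical helper showing what a rejecting variant could test before returning -EINVAL; neither function name comes from the series.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Option 1: silent transformation, as in the patch. Every character that
 * fails isgraph() (control characters, spaces, newlines) becomes '-'.
 */
static void sanitize_name(char *name, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++) {
		if (!isgraph((unsigned char)name[i]))
			name[i] = '-';
	}
}

/*
 * Option 2: rejection. Returns 1 if every character passes isgraph(),
 * 0 otherwise, so a caller could refuse the name instead of rewriting it.
 */
static int name_is_clean(const char *name, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++) {
		if (!isgraph((unsigned char)name[i]))
			return 0;
	}
	return 1;
}
```

With option 1 a name such as "my app" is stored as "my-app", which is the grep false-negative scenario raised above; with option 2 the same input would be refused and userspace would have to pick a different string.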
Re: [PATCH v6 01/12] spi: add driver for intel graphics on-die spi device
On 21/09/2024 14:00, Winkler, Tomas wrote: On Thu, Sep 19, 2024 at 09:54:24AM +, Winkler, Tomas wrote: On Mon, Sep 16, 2024 at 04:49:17PM +0300, Alexander Usyskin wrote: @@ -0,0 +1,142 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright(c) 2019-2024, Intel Corporation. All rights reserved. + */ Please make the entire comment a C++ one so things look more intentional. This is how it is required by Linux spdx checker, There is no incompatibility between SPDX and what I'm asking for... + size = sizeof(*spi) + sizeof(spi->regions[0]) * nregions; + spi = kzalloc(size, GFP_KERNEL); Use at least array_size(). Regions is not fixed size array, it will not work. Yes, that's the wrong helper - there is a relevant one though which I'm not remembering right now. I don't think there is one, you can allocate arrays but this is not the case here. struct_size() probably. Regards, Tvrtko
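As background to the struct_size() suggestion: the kernel helper computes the size of a structure with a trailing flexible array and saturates to SIZE_MAX on overflow instead of wrapping. A rough userspace analogue is sketched below, assuming GCC or Clang for the __builtin_*_overflow() builtins. `spi_sketch`, `region_sketch` and both functions are invented for illustration and are not the driver's types.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical region descriptor; the real driver layout is not shown here. */
struct region_sketch {
	unsigned int id;
	size_t size;
};

/* Hypothetical device structure with a flexible array member. */
struct spi_sketch {
	unsigned int nregions;
	struct region_sketch regions[];
};

/*
 * Header plus nregions trailing elements, with overflow checking.
 * Returns SIZE_MAX on overflow rather than a silently wrapped value.
 */
static size_t spi_sketch_size(size_t nregions)
{
	size_t bytes;

	if (__builtin_mul_overflow(nregions, sizeof(struct region_sketch), &bytes) ||
	    __builtin_add_overflow(bytes, sizeof(struct spi_sketch), &bytes))
		return SIZE_MAX;

	return bytes;
}

static struct spi_sketch *spi_sketch_alloc(size_t nregions)
{
	size_t size = spi_sketch_size(nregions);
	struct spi_sketch *spi;

	if (size == SIZE_MAX)
		return NULL;	/* refuse an overflowed size */

	spi = calloc(1, size);
	if (spi)
		spi->nregions = (unsigned int)nregions;

	return spi;
}
```

The open-coded `sizeof(*spi) + sizeof(spi->regions[0]) * nregions` from the patch gives the same result for sane inputs; the difference only shows when nregions is large enough for the arithmetic to wrap.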
Re: [PATCH 8/8] drm/sched: Further optimise drm_sched_entity_push_job
On 16/09/2024 13:20, Tvrtko Ursulin wrote: On 16/09/2024 13:11, Christian König wrote: Am 13.09.24 um 18:05 schrieb Tvrtko Ursulin: From: Tvrtko Ursulin Having removed one re-lock cycle on the entity->lock in a patch titled "drm/sched: Optimise drm_sched_entity_push_job", with only a tiny bit larger refactoring we can do the same optimisation on the rq->lock. (Currently both drm_sched_rq_add_entity() and drm_sched_rq_update_fifo_locked() take and release the same lock.) To achieve this we rename drm_sched_rq_add_entity() to drm_sched_rq_add_entity_locked(), making it expect the rq->lock to be held, and also add the same expectation to drm_sched_rq_update_fifo_locked(). Finally, to align drm_sched_rq_update_fifo_locked(), drm_sched_rq_add_entity_locked() and drm_sched_rq_remove_fifo_locked() function signatures, we add rq as a parameter to the latter. Signed-off-by: Tvrtko Ursulin Cc: Christian König Cc: Alex Deucher Cc: Luben Tuikov Cc: Matthew Brost Cc: Philipp Stanner --- drivers/gpu/drm/scheduler/sched_entity.c | 8 -- drivers/gpu/drm/scheduler/sched_main.c | 34 +++- include/drm/gpu_scheduler.h | 7 ++--- 3 files changed, 26 insertions(+), 23 deletions(-) diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c index d982cebc6bee..c48f17faef41 100644 --- a/drivers/gpu/drm/scheduler/sched_entity.c +++ b/drivers/gpu/drm/scheduler/sched_entity.c @@ -517,6 +517,7 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity) if (next) { spin_lock(&entity->lock); drm_sched_rq_update_fifo_locked(entity, + entity->rq, next->submit_ts); spin_unlock(&entity->lock); } @@ -618,11 +619,14 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) sched = rq->sched; atomic_inc(sched->score); - drm_sched_rq_add_entity(rq, entity); + + spin_lock(&rq->lock); + drm_sched_rq_add_entity_locked(rq, entity); if (drm_sched_policy == DRM_SCHED_POLICY_FIFO) - drm_sched_rq_update_fifo_locked(entity, submit_ts); + 
drm_sched_rq_update_fifo_locked(entity, rq, submit_ts); + spin_unlock(&rq->lock); spin_unlock(&entity->lock); drm_sched_wakeup(sched, entity); diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index 18a952f73ecb..c0d3f6ac3ae3 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -153,17 +153,18 @@ static __always_inline bool drm_sched_entity_compare_before(struct rb_node *a, return ktime_before(ent_a->oldest_job_waiting, ent_b->oldest_job_waiting); } -static inline void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *entity) +static void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *entity, + struct drm_sched_rq *rq) { - struct drm_sched_rq *rq = entity->rq; - if (!RB_EMPTY_NODE(&entity->rb_tree_node)) { rb_erase_cached(&entity->rb_tree_node, &rq->rb_tree_root); RB_CLEAR_NODE(&entity->rb_tree_node); } } -void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, ktime_t ts) +void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, + struct drm_sched_rq *rq, + ktime_t ts) { /* * Both locks need to be grabbed, one to protect from entity->rq change @@ -171,17 +172,14 @@ void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, ktime_t ts * other to update the rb tree structure. 
*/ lockdep_assert_held(&entity->lock); + lockdep_assert_held(&rq->lock); - spin_lock(&entity->rq->lock); - - drm_sched_rq_remove_fifo_locked(entity); + drm_sched_rq_remove_fifo_locked(entity, rq); entity->oldest_job_waiting = ts; - rb_add_cached(&entity->rb_tree_node, &entity->rq->rb_tree_root, + rb_add_cached(&entity->rb_tree_node, &rq->rb_tree_root, drm_sched_entity_compare_before); - - spin_unlock(&entity->rq->lock); } /** @@ -203,25 +201,23 @@ static void drm_sched_rq_init(struct drm_gpu_scheduler *sched, } /** - * drm_sched_rq_add_entity - add an entity + * drm_sched_rq_add_entity_locked - add an entity * * @rq: scheduler run queue * @entity: scheduler entity * * Adds a scheduler entity to the run queue. */ -void drm_sched_rq_add_entity(struct drm_sched_rq *rq, - struct drm_sched_entity *entity) +void drm_sched_rq_add_entity_locked(struct drm_sched_rq *rq, + struct drm_sched_entity *entity) { + lockdep_assert_held
[PATCH v2] drm/sched: Further optimise drm_sched_entity_push_job
From: Tvrtko Ursulin Having removed one re-lock cycle on the entity->lock in a patch titled "drm/sched: Optimise drm_sched_entity_push_job", with only a tiny bit larger refactoring we can do the same optimisation on the rq->lock. (Currently both drm_sched_rq_add_entity() and drm_sched_rq_update_fifo_locked() take and release the same lock.) To achieve this we make drm_sched_rq_update_fifo_locked() and drm_sched_rq_add_entity() expect the rq->lock to be held. We also align drm_sched_rq_update_fifo_locked(), drm_sched_rq_add_entity() and drm_sched_rq_remove_fifo_locked() function signatures, by adding rq as a parameter to the latter. v2: * Fix after rebase of the series. * Avoid naming incosistency between drm_sched_rq_add/remove. (Christian) Signed-off-by: Tvrtko Ursulin Cc: Christian König Cc: Alex Deucher Cc: Luben Tuikov Cc: Matthew Brost Cc: Philipp Stanner --- drivers/gpu/drm/scheduler/sched_entity.c | 12 -- drivers/gpu/drm/scheduler/sched_main.c | 29 include/drm/gpu_scheduler.h | 3 ++- 3 files changed, 26 insertions(+), 18 deletions(-) diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c index d982cebc6bee..8ace1f1ea66b 100644 --- a/drivers/gpu/drm/scheduler/sched_entity.c +++ b/drivers/gpu/drm/scheduler/sched_entity.c @@ -515,9 +515,14 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity) next = to_drm_sched_job(spsc_queue_peek(&entity->job_queue)); if (next) { + struct drm_sched_rq *rq; + spin_lock(&entity->lock); - drm_sched_rq_update_fifo_locked(entity, + rq = entity->rq; + spin_lock(&rq->lock); + drm_sched_rq_update_fifo_locked(entity, rq, next->submit_ts); + spin_unlock(&rq->lock); spin_unlock(&entity->lock); } } @@ -618,11 +623,14 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) sched = rq->sched; atomic_inc(sched->score); + + spin_lock(&rq->lock); drm_sched_rq_add_entity(rq, entity); if (drm_sched_policy == DRM_SCHED_POLICY_FIFO) - 
drm_sched_rq_update_fifo_locked(entity, submit_ts); + drm_sched_rq_update_fifo_locked(entity, rq, submit_ts); + spin_unlock(&rq->lock); spin_unlock(&entity->lock); drm_sched_wakeup(sched, entity); diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index 18a952f73ecb..5c83fb92bb89 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -153,17 +153,18 @@ static __always_inline bool drm_sched_entity_compare_before(struct rb_node *a, return ktime_before(ent_a->oldest_job_waiting, ent_b->oldest_job_waiting); } -static inline void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *entity) +static void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *entity, + struct drm_sched_rq *rq) { - struct drm_sched_rq *rq = entity->rq; - if (!RB_EMPTY_NODE(&entity->rb_tree_node)) { rb_erase_cached(&entity->rb_tree_node, &rq->rb_tree_root); RB_CLEAR_NODE(&entity->rb_tree_node); } } -void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, ktime_t ts) +void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, +struct drm_sched_rq *rq, +ktime_t ts) { /* * Both locks need to be grabbed, one to protect from entity->rq change @@ -171,17 +172,14 @@ void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, ktime_t ts * other to update the rb tree structure. */ lockdep_assert_held(&entity->lock); + lockdep_assert_held(&rq->lock); - spin_lock(&entity->rq->lock); - - drm_sched_rq_remove_fifo_locked(entity); + drm_sched_rq_remove_fifo_locked(entity, rq); entity->oldest_job_waiting = ts; - rb_add_cached(&entity->rb_tree_node, &entity->rq->rb_tree_root, + rb_add_cached(&entity->rb_tree_node, &rq->rb_tree_root, drm_sched_entity_compare_before); - - spin_unlock(&entity->rq->lock); } /** @@ -213,15 +211,14 @@ static void drm_sched_rq_init(struct drm_gpu_scheduler *sched, void drm_sched_rq_add_entity(struct drm_sched_r
Re: [PATCH v2 3/3] drm/amdgpu: use drm_file name
On 16/09/2024 14:32, Pierre-Eric Pelloux-Prayer wrote: In debugfs gem_info/vm_info files, timeout handler and page fault reports. This information is useful with the virtio/native-context driver: this allows the guest application's identifier to be visible in amdgpu's output. The output in amdgpu_vm_info/amdgpu_gem_info looks like this: pid:12255 Process:glxgears/test-set-fd-name -- Signed-off-by: Pierre-Eric Pelloux-Prayer --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c| 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 16 +--- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c| 25 +-- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h| 4 +-- 5 files changed, 40 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index 6d5fd371d5ce..1712feb2c238 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -1577,7 +1577,7 @@ int amdgpu_amdkfd_gpuvm_acquire_process_vm(struct amdgpu_device *adev, if (ret) return ret; - amdgpu_vm_set_task_info(avm); + amdgpu_vm_set_task_info(avm, NULL); return 0; } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c index 1e475eb01417..d32dc547cc80 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c @@ -310,7 +310,7 @@ static int amdgpu_cs_pass1(struct amdgpu_cs_parser *p, kvfree(chunk_array); /* Use this opportunity to fill in task info for the vm */ - amdgpu_vm_set_task_info(vm); + amdgpu_vm_set_task_info(vm, p->filp); return 0; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c index 0e617dff8765..0c52168edbaf 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c @@ -997,6 +997,10 @@ static int amdgpu_debugfs_gem_info_show(struct seq_file *m, void *unused) if (r) return r; + r =
mutex_lock_interruptible(&file->name_lock); + if (r) + goto out; Shouldn't this be in the below loop? + list_for_each_entry(file, &dev->filelist, lhead) { struct task_struct *task; struct drm_gem_object *gobj; @@ -1012,8 +1016,13 @@ static int amdgpu_debugfs_gem_info_show(struct seq_file *m, void *unused) rcu_read_lock(); pid = rcu_dereference(file->pid); task = pid_task(pid, PIDTYPE_TGID); - seq_printf(m, "pid %8d command %s:\n", pid_nr(pid), - task ? task->comm : ""); + seq_printf(m, "pid %8d command %s", pid_nr(pid), + task ? task->comm : ""); + if (file->name) { + seq_putc(m, '/'); + seq_puts(m, file->name); + } + seq_puts(m, ":\n"); rcu_read_unlock(); spin_lock(&file->table_lock); @@ -1024,7 +1033,8 @@ static int amdgpu_debugfs_gem_info_show(struct seq_file *m, void *unused) } spin_unlock(&file->table_lock); } - + mutex_unlock(&file->name_lock); +out: mutex_unlock(&dev->filelist_mutex); return 0; } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index e20d19ae01b2..5701d74159d4 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -2370,7 +2370,7 @@ static int amdgpu_vm_create_task_info(struct amdgpu_vm *vm) * * @vm: vm for which to set the info */ -void amdgpu_vm_set_task_info(struct amdgpu_vm *vm) +void amdgpu_vm_set_task_info(struct amdgpu_vm *vm, struct drm_file *file) { if (!vm->task_info) return; @@ -2385,7 +2385,28 @@ void amdgpu_vm_set_task_info(struct amdgpu_vm *vm) return; vm->task_info->tgid = current->group_leader->pid; - get_task_comm(vm->task_info->process_name, current->group_leader); + __get_task_comm(vm->task_info->process_name, TASK_COMM_LEN, + current->group_leader); + /* Append drm_client_name if set. */ + if (file && file->name) { + mutex_lock(&file->name_lock); + + /* Assert that process_name is big enough to store process_name, +* so: (TASK_COMM_LEN - 1) + '/' + '\0'. +* This way we can concat file->name without worrying about space. 
+*/ + static_assert(sizeof(vm->task_info->process_name) >= TASK_COMM_LEN + 1); + if (file->name) { + int n; + + n = strlen(vm->task_info->process_name); + vm->task_info->proces
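For readers following the thread, here is a minimal userspace sketch of the buffer-sizing idea behind the static_assert above: reserve room for the task name, a '/' separator and the terminating NUL, then append the optional client name with truncation. The helper name format_task_name and the constants COMM_LEN and PROCESS_NAME_LEN are assumptions made for this illustration; they are not the amdgpu code, which is cut off in the quoted diff.

```c
#include <stdio.h>
#include <stddef.h>

/* Assumed sizes for this sketch only; the kernel's TASK_COMM_LEN is 16. */
#define COMM_LEN 16
#define PROCESS_NAME_LEN 64

/* Same idea as the static_assert in the patch: comm + '/' + NUL must fit. */
_Static_assert(PROCESS_NAME_LEN >= COMM_LEN + 1,
	       "need room for comm, '/' and the terminating NUL");

/*
 * Hypothetical helper: write "comm" or "comm/client" into dst.
 * The comm part is limited to COMM_LEN - 1 characters and the client
 * part is truncated if it does not fit. Returns the stored length.
 */
static size_t format_task_name(char dst[PROCESS_NAME_LEN],
			       const char *comm, const char *client)
{
	int n;

	if (client && client[0])
		n = snprintf(dst, PROCESS_NAME_LEN, "%.*s/%s",
			     COMM_LEN - 1, comm, client);
	else
		n = snprintf(dst, PROCESS_NAME_LEN, "%.*s",
			     COMM_LEN - 1, comm);

	if (n < 0) {
		dst[0] = '\0';
		return 0;
	}
	/* snprintf reports the untruncated length, so clamp it. */
	if ((size_t)n >= PROCESS_NAME_LEN)
		return PROCESS_NAME_LEN - 1;
	return (size_t)n;
}
```

With the sample values from the commit message, format_task_name(buf, "glxgears", "test-set-fd-name") stores "glxgears/test-set-fd-name".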
Re: [PATCH v2 2/3] drm: use drm_file name in fdinfo
On 16/09/2024 14:32, Pierre-Eric Pelloux-Prayer wrote: Add an optional drm-client-name field to drm fdinfo's output. Signed-off-by: Pierre-Eric Pelloux-Prayer --- Documentation/gpu/drm-usage-stats.rst | 5 + drivers/gpu/drm/drm_file.c| 5 + 2 files changed, 10 insertions(+) diff --git a/Documentation/gpu/drm-usage-stats.rst b/Documentation/gpu/drm-usage-stats.rst index a80f95ca1b2f..ed1d7edbbc5f 100644 --- a/Documentation/gpu/drm-usage-stats.rst +++ b/Documentation/gpu/drm-usage-stats.rst @@ -73,6 +73,11 @@ scope of each device, in which case `drm-pdev` shall be present as well. Userspace should make sure to not double account any usage statistics by using the above described criteria in order to associate data to individual clients. +- drm-client-name: + +String optionally set by userspace using DRM_IOCTL_SET_NAME. + + Utilization ^^^ diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c index e9dd0e90a1f9..6a3621f50784 100644 --- a/drivers/gpu/drm/drm_file.c +++ b/drivers/gpu/drm/drm_file.c @@ -955,6 +955,11 @@ void drm_show_fdinfo(struct seq_file *m, struct file *f) PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn)); } + mutex_lock(&file->name_lock); + if (file->name) + drm_printf(&p, "drm-client-name:\t%s\n", file->name); + mutex_unlock(&file->name_lock); + if (dev->driver->show_fdinfo) dev->driver->show_fdinfo(&p, file); } Reviewed-by: Tvrtko Ursulin Regards, Tvrtko
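As an illustration of how a userspace tool might consume the new optional key, here is a small sketch that scans fdinfo text for a "drm-client-name:" line. The helper fdinfo_client_name is hypothetical and only assumes the "key:<whitespace>value" line layout that the drm_printf() call above produces; since the key is optional, "not found" is a normal outcome.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: look for a "drm-client-name:" line in fdinfo text
 * and copy its value into out (truncating if needed).
 * Returns 1 if the key was found, 0 if it is absent.
 */
static int fdinfo_client_name(const char *fdinfo, char *out, size_t out_len)
{
	static const char key[] = "drm-client-name:";
	const size_t key_len = sizeof(key) - 1;
	const char *line = fdinfo;

	if (!out_len)
		return 0;

	while (line && *line) {
		const char *eol = strchr(line, '\n');
		size_t len = eol ? (size_t)(eol - line) : strlen(line);

		if (len >= key_len && !strncmp(line, key, key_len)) {
			const char *val = line + key_len;
			size_t vlen = len - key_len;

			/* Skip the whitespace separating key and value. */
			while (vlen && (*val == ' ' || *val == '\t')) {
				val++;
				vlen--;
			}
			if (vlen >= out_len)
				vlen = out_len - 1;
			memcpy(out, val, vlen);
			out[vlen] = '\0';
			return 1;
		}
		line = eol ? eol + 1 : NULL;
	}
	return 0;
}
```

A caller would read /proc/<pid>/fdinfo/<fd> into a buffer first and treat a return of 0 as "no name was set".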
Re: [PATCH v2 1/3] drm: add DRM_SET_NAME ioctl
On 16/09/2024 14:32, Pierre-Eric Pelloux-Prayer wrote: Giving the opportunity to userspace to associate a free-form name with a drm_file struct is helpful for tracking and debugging. This is similar to the existing DMA_BUF_SET_NAME ioctl. Access to name is protected by a mutex, and the 'clients' debugfs file has been updated to print it. Userspace MR to use this ioctl: https://gitlab.freedesktop.org/virgl/virglrenderer/-/merge_requests/1428 The string passed by userspace is filtered a bit, to avoid messing output when it's going to be printed (in dmesg, fdinfo, etc): * all chars failing isgraph() are replaced by '-' * if a 0-length string is passed the name is cleared Signed-off-by: Pierre-Eric Pelloux-Prayer --- drivers/gpu/drm/drm_debugfs.c | 12 ++ drivers/gpu/drm/drm_file.c| 5 + drivers/gpu/drm/drm_ioctl.c | 42 +++ include/drm/drm_file.h| 9 include/uapi/drm/drm.h| 14 5 files changed, 78 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/drm_debugfs.c b/drivers/gpu/drm/drm_debugfs.c index 6b239a24f1df..b7492225ae88 100644 --- a/drivers/gpu/drm/drm_debugfs.c +++ b/drivers/gpu/drm/drm_debugfs.c @@ -78,12 +78,13 @@ static int drm_clients_info(struct seq_file *m, void *data) kuid_t uid; seq_printf(m, - "%20s %5s %3s master a %5s %10s\n", + "%20s %5s %3s master a %5s %10s %20s\n", "command", "tgid", "dev", "uid", - "magic"); + "magic", + "name"); /* dev->filelist is sorted youngest first, but we want to present * oldest first (i.e. kernel, servers, clients), so walk backwardss. @@ -94,19 +95,22 @@ static int drm_clients_info(struct seq_file *m, void *data) struct task_struct *task; struct pid *pid; + mutex_lock(&priv->name_lock); rcu_read_lock(); /* Locks priv->pid and pid_task()->comm! */ pid = rcu_dereference(priv->pid); task = pid_task(pid, PIDTYPE_TGID); uid = task ? __task_cred(task)->euid : GLOBAL_ROOT_UID; - seq_printf(m, "%20s %5d %3d %c%c %5d %10u\n", + seq_printf(m, "%20s %5d %3d %c%c %5d %10u %20s\n", task ? 
task->comm : "", pid_vnr(pid), priv->minor->index, is_current_master ? 'y' : 'n', priv->authenticated ? 'y' : 'n', from_kuid_munged(seq_user_ns(m), uid), - priv->magic); + priv->magic, + priv->name ? priv->name : ""); rcu_read_unlock(); + mutex_unlock(&priv->name_lock); } mutex_unlock(&dev->filelist_mutex); return 0; diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c index 01fde94fe2a9..e9dd0e90a1f9 100644 --- a/drivers/gpu/drm/drm_file.c +++ b/drivers/gpu/drm/drm_file.c @@ -158,6 +158,7 @@ struct drm_file *drm_file_alloc(struct drm_minor *minor) spin_lock_init(&file->master_lookup_lock); mutex_init(&file->event_read_lock); + mutex_init(&file->name_lock); if (drm_core_check_feature(dev, DRIVER_GEM)) drm_gem_open(dev, file); @@ -259,6 +260,10 @@ void drm_file_free(struct drm_file *file) WARN_ON(!list_empty(&file->event_list)); put_pid(rcu_access_pointer(file->pid)); + + mutex_destroy(&file->name_lock); + kfree(file->name); + kfree(file); } diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c index 51f39912866f..b7d7bede0ab3 100644 --- a/drivers/gpu/drm/drm_ioctl.c +++ b/drivers/gpu/drm/drm_ioctl.c @@ -540,6 +540,46 @@ int drm_version(struct drm_device *dev, void *data, return err; } +static int drm_set_name(struct drm_device *dev, void *data, + struct drm_file *file_priv) +{ + struct drm_set_name *name = data; + void *user_ptr; __user as kernel test robot reminds us. + char *new_name; + size_t i, len; + + if (name->name_len >= NAME_MAX) + return -EINVAL; Maybe it is a bit unsubstantiated, but I am leaning towards defining our own smaller limit, like dma-buf does. If 32 is deemed too restrictive, make it larger, but 255 feels unnecessary. But I don't feel too strongly about this, so if people insist we need names this long then so be it. + + user_ptr = u64_to_user_ptr(name->name); + + new_name = memdup_user_nul(user_ptr, name->name_len); + Nit: I'd zap this blank line since it is breaking a logical group.
+ if (IS_ERR(new_name)) + return PTR_ERR(new_name); + + /* Filter out control char / spac
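The filtering rule described in the commit message (every character failing isgraph() is replaced by '-') can be sketched in userspace as below. The function name sanitize_name is an assumption for this example; the kernel loop itself is truncated in the quoted diff, so this only mirrors the behaviour stated in the commit text, using the C library's isgraph() in the default "C" locale.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper mirroring the rule from the commit message:
 * replace every character that is not printable-and-visible
 * (spaces, tabs, newlines, control characters) with '-'.
 */
static void sanitize_name(char *name, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++) {
		/* Cast avoids undefined behaviour for negative char values. */
		if (!isgraph((unsigned char)name[i]))
			name[i] = '-';
	}
}
```

Filtering on the way in means every later consumer (dmesg, fdinfo, debugfs) can print the name without worrying about embedded newlines or column-breaking whitespace.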
Re: [PATCH 8/8] drm/sched: Further optimise drm_sched_entity_push_job
On 16/09/2024 13:11, Christian König wrote: Am 13.09.24 um 18:05 schrieb Tvrtko Ursulin: From: Tvrtko Ursulin Having removed one re-lock cycle on the entity->lock in a patch titled "drm/sched: Optimise drm_sched_entity_push_job", with only a tiny bit larger refactoring we can do the same optimisation on the rq->lock. (Currently both drm_sched_rq_add_entity() and drm_sched_rq_update_fifo_locked() take and release the same lock.) To achieve this we rename drm_sched_rq_add_entity() to drm_sched_rq_add_entity_locked(), making it expect the rq->lock to be held, and also add the same expectation to drm_sched_rq_update_fifo_locked(). Finally, to align drm_sched_rq_update_fifo_locked(), drm_sched_rq_add_entity_locked() and drm_sched_rq_remove_fifo_locked() function signatures, we add rq as a parameter to the latter. Signed-off-by: Tvrtko Ursulin Cc: Christian König Cc: Alex Deucher Cc: Luben Tuikov Cc: Matthew Brost Cc: Philipp Stanner --- drivers/gpu/drm/scheduler/sched_entity.c | 8 -- drivers/gpu/drm/scheduler/sched_main.c | 34 +++- include/drm/gpu_scheduler.h | 7 ++--- 3 files changed, 26 insertions(+), 23 deletions(-) diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c index d982cebc6bee..c48f17faef41 100644 --- a/drivers/gpu/drm/scheduler/sched_entity.c +++ b/drivers/gpu/drm/scheduler/sched_entity.c @@ -517,6 +517,7 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity) if (next) { spin_lock(&entity->lock); drm_sched_rq_update_fifo_locked(entity, + entity->rq, next->submit_ts); spin_unlock(&entity->lock); } @@ -618,11 +619,14 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) sched = rq->sched; atomic_inc(sched->score); - drm_sched_rq_add_entity(rq, entity); + + spin_lock(&rq->lock); + drm_sched_rq_add_entity_locked(rq, entity); if (drm_sched_policy == DRM_SCHED_POLICY_FIFO) - drm_sched_rq_update_fifo_locked(entity, submit_ts); + drm_sched_rq_update_fifo_locked(entity, rq, 
submit_ts); + spin_unlock(&rq->lock); spin_unlock(&entity->lock); drm_sched_wakeup(sched, entity); diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index 18a952f73ecb..c0d3f6ac3ae3 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -153,17 +153,18 @@ static __always_inline bool drm_sched_entity_compare_before(struct rb_node *a, return ktime_before(ent_a->oldest_job_waiting, ent_b->oldest_job_waiting); } -static inline void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *entity) +static void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *entity, + struct drm_sched_rq *rq) { - struct drm_sched_rq *rq = entity->rq; - if (!RB_EMPTY_NODE(&entity->rb_tree_node)) { rb_erase_cached(&entity->rb_tree_node, &rq->rb_tree_root); RB_CLEAR_NODE(&entity->rb_tree_node); } } -void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, ktime_t ts) +void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, + struct drm_sched_rq *rq, + ktime_t ts) { /* * Both locks need to be grabbed, one to protect from entity->rq change @@ -171,17 +172,14 @@ void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, ktime_t ts * other to update the rb tree structure. 
*/ lockdep_assert_held(&entity->lock); + lockdep_assert_held(&rq->lock); - spin_lock(&entity->rq->lock); - - drm_sched_rq_remove_fifo_locked(entity); + drm_sched_rq_remove_fifo_locked(entity, rq); entity->oldest_job_waiting = ts; - rb_add_cached(&entity->rb_tree_node, &entity->rq->rb_tree_root, + rb_add_cached(&entity->rb_tree_node, &rq->rb_tree_root, drm_sched_entity_compare_before); - - spin_unlock(&entity->rq->lock); } /** @@ -203,25 +201,23 @@ static void drm_sched_rq_init(struct drm_gpu_scheduler *sched, } /** - * drm_sched_rq_add_entity - add an entity + * drm_sched_rq_add_entity_locked - add an entity * * @rq: scheduler run queue * @entity: scheduler entity * * Adds a scheduler entity to the run queue. */ -void drm_sched_rq_add_entity(struct drm_sched_rq *rq, - struct drm_sched_entity *entity) +void drm_sched_rq_add_entity_locked(struct drm_sched_rq *rq, + struct drm_sched_entity *entity) { + lockdep_assert_held(&rq->lock); + if (!list_emp
Re: [PATCH 6/8] drm/sched: Re-order struct drm_sched_rq members for clarity
On 16/09/2024 09:16, Philipp Stanner wrote: On Fri, 2024-09-13 at 17:05 +0100, Tvrtko Ursulin wrote: From: Tvrtko Ursulin Current kerneldoc for struct drm_sched_rq incompletely documents what fields are protected by the lock. This is not good because it is misleading. Lets fix it by listing all the elements which are protected by the lock. While at it, lets also re-order the members so all protected by the lock are in a single group. v2: * Refer variables by kerneldoc syntax, more verbose commit text. (Philipp) Signed-off-by: Tvrtko Ursulin Cc: Christian König Cc: Alex Deucher Cc: Luben Tuikov Cc: Matthew Brost Cc: Philipp Stanner Reviewed-by: Christian König Looks good, thx Reviewed-by: Philipp Stanner Thanks! 4/8 and 8/8 are now the only two left with no r-b. --- include/drm/gpu_scheduler.h | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h index 38465b78c7d5..2f58af00f792 100644 --- a/include/drm/gpu_scheduler.h +++ b/include/drm/gpu_scheduler.h @@ -243,10 +243,10 @@ struct drm_sched_entity { /** * struct drm_sched_rq - queue of entities to be scheduled. * - * @lock: to modify the entities list. * @sched: the scheduler to which this rq belongs to. - * @entities: list of the entities to be scheduled. + * @lock: protects @entities, @rb_tree_root and @current_entity. nit: in case you'll provide a new version anyways you could consider sorting these three to be congruent with the lines below. To me it looks the order of kerneldoc vs members is aligned. Unless I missed what you mean here? Regards, Tvrtko * @current_entity: the entity which is to be scheduled. + * @entities: list of the entities to be scheduled. * @rb_tree_root: root of time based priory queue of entities for FIFO scheduling * * Run queue is a set of entities scheduling command submissions for @@ -254,10 +254,12 @@ struct drm_sched_entity { * the next entity to emit commands from. 
*/ struct drm_sched_rq { - spinlock_t lock; struct drm_gpu_scheduler*sched; - struct list_headentities; + + spinlock_t lock; + /* Following members are protected by the @lock: */ struct drm_sched_entity *current_entity; + struct list_headentities; struct rb_root_cached rb_tree_root; };
[PATCH 6/8] drm/sched: Re-order struct drm_sched_rq members for clarity
From: Tvrtko Ursulin Current kerneldoc for struct drm_sched_rq incompletely documents what fields are protected by the lock. This is not good because it is misleading. Lets fix it by listing all the elements which are protected by the lock. While at it, lets also re-order the members so all protected by the lock are in a single group. v2: * Refer variables by kerneldoc syntax, more verbose commit text. (Philipp) Signed-off-by: Tvrtko Ursulin Cc: Christian König Cc: Alex Deucher Cc: Luben Tuikov Cc: Matthew Brost Cc: Philipp Stanner Reviewed-by: Christian König --- include/drm/gpu_scheduler.h | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h index 38465b78c7d5..2f58af00f792 100644 --- a/include/drm/gpu_scheduler.h +++ b/include/drm/gpu_scheduler.h @@ -243,10 +243,10 @@ struct drm_sched_entity { /** * struct drm_sched_rq - queue of entities to be scheduled. * - * @lock: to modify the entities list. * @sched: the scheduler to which this rq belongs to. - * @entities: list of the entities to be scheduled. + * @lock: protects @entities, @rb_tree_root and @current_entity. * @current_entity: the entity which is to be scheduled. + * @entities: list of the entities to be scheduled. * @rb_tree_root: root of time based priory queue of entities for FIFO scheduling * * Run queue is a set of entities scheduling command submissions for @@ -254,10 +254,12 @@ struct drm_sched_entity { * the next entity to emit commands from. */ struct drm_sched_rq { - spinlock_t lock; struct drm_gpu_scheduler*sched; - struct list_headentities; + + spinlock_t lock; + /* Following members are protected by the @lock: */ struct drm_sched_entity *current_entity; + struct list_headentities; struct rb_root_cached rb_tree_root; }; -- 2.46.0
[PATCH 7/8] drm/sched: Re-group and rename the entity run-queue lock
From: Tvrtko Ursulin Christian suggested to rename the lock and improve the documentation of what it protects. And to also re-order the structure members so all protected by the lock are together in a block. Signed-off-by: Tvrtko Ursulin Cc: Christian König Cc: Alex Deucher Cc: Luben Tuikov Cc: Matthew Brost Cc: Philipp Stanner Reviewed-by: Christian König --- drivers/gpu/drm/scheduler/sched_entity.c | 28 drivers/gpu/drm/scheduler/sched_main.c | 2 +- include/drm/gpu_scheduler.h | 15 +++-- 3 files changed, 23 insertions(+), 22 deletions(-) diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c index aff79055643f..d982cebc6bee 100644 --- a/drivers/gpu/drm/scheduler/sched_entity.c +++ b/drivers/gpu/drm/scheduler/sched_entity.c @@ -105,7 +105,7 @@ int drm_sched_entity_init(struct drm_sched_entity *entity, /* We start in an idle state. */ complete_all(&entity->entity_idle); - spin_lock_init(&entity->rq_lock); + spin_lock_init(&entity->lock); spsc_queue_init(&entity->job_queue); atomic_set(&entity->fence_seq, 0); @@ -133,10 +133,10 @@ void drm_sched_entity_modify_sched(struct drm_sched_entity *entity, { WARN_ON(!num_sched_list || !sched_list); - spin_lock(&entity->rq_lock); + spin_lock(&entity->lock); entity->sched_list = sched_list; entity->num_sched_list = num_sched_list; - spin_unlock(&entity->rq_lock); + spin_unlock(&entity->lock); } EXPORT_SYMBOL(drm_sched_entity_modify_sched); @@ -244,10 +244,10 @@ static void drm_sched_entity_kill(struct drm_sched_entity *entity) if (!entity->rq) return; - spin_lock(&entity->rq_lock); + spin_lock(&entity->lock); entity->stopped = true; drm_sched_rq_remove_entity(entity->rq, entity); - spin_unlock(&entity->rq_lock); + spin_unlock(&entity->lock); /* Make sure this entity is not used by the scheduler at the moment */ wait_for_completion(&entity->entity_idle); @@ -396,9 +396,9 @@ static void drm_sched_entity_wakeup(struct dma_fence *f, void drm_sched_entity_set_priority(struct drm_sched_entity 
*entity, enum drm_sched_priority priority) { - spin_lock(&entity->rq_lock); + spin_lock(&entity->lock); entity->priority = priority; - spin_unlock(&entity->rq_lock); + spin_unlock(&entity->lock); } EXPORT_SYMBOL(drm_sched_entity_set_priority); @@ -515,10 +515,10 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity) next = to_drm_sched_job(spsc_queue_peek(&entity->job_queue)); if (next) { - spin_lock(&entity->rq_lock); + spin_lock(&entity->lock); drm_sched_rq_update_fifo_locked(entity, next->submit_ts); - spin_unlock(&entity->rq_lock); + spin_unlock(&entity->lock); } } @@ -559,14 +559,14 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity) if (fence && !dma_fence_is_signaled(fence)) return; - spin_lock(&entity->rq_lock); + spin_lock(&entity->lock); sched = drm_sched_pick_best(entity->sched_list, entity->num_sched_list); rq = sched ? sched->sched_rq[entity->priority] : NULL; if (rq != entity->rq) { drm_sched_rq_remove_entity(entity->rq, entity); entity->rq = rq; } - spin_unlock(&entity->rq_lock); + spin_unlock(&entity->lock); if (entity->num_sched_list == 1) entity->sched_list = NULL; @@ -606,9 +606,9 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) struct drm_sched_rq *rq; /* Add the entity to the run queue */ - spin_lock(&entity->rq_lock); + spin_lock(&entity->lock); if (entity->stopped) { - spin_unlock(&entity->rq_lock); + spin_unlock(&entity->lock); DRM_ERROR("Trying to push to a killed entity\n"); return; @@ -623,7 +623,7 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) if (drm_sched_policy == DRM_SCHED_POLICY_FIFO) drm_sched_rq_update_fifo_locked(entity, submit_ts); - spin_unlock(&entity->rq_lock); + spin_unlock(&entity->lock);
[PATCH 8/8] drm/sched: Further optimise drm_sched_entity_push_job
From: Tvrtko Ursulin Having removed one re-lock cycle on the entity->lock in a patch titled "drm/sched: Optimise drm_sched_entity_push_job", with only a tiny bit larger refactoring we can do the same optimisation on the rq->lock. (Currently both drm_sched_rq_add_entity() and drm_sched_rq_update_fifo_locked() take and release the same lock.) To achieve this we rename drm_sched_rq_add_entity() to drm_sched_rq_add_entity_locked(), making it expect the rq->lock to be held, and also add the same expectation to drm_sched_rq_update_fifo_locked(). Finally, to align drm_sched_rq_update_fifo_locked(), drm_sched_rq_add_entity_locked() and drm_sched_rq_remove_fifo_locked() function signatures, we add rq as a parameter to the latter. Signed-off-by: Tvrtko Ursulin Cc: Christian König Cc: Alex Deucher Cc: Luben Tuikov Cc: Matthew Brost Cc: Philipp Stanner --- drivers/gpu/drm/scheduler/sched_entity.c | 8 -- drivers/gpu/drm/scheduler/sched_main.c | 34 +++- include/drm/gpu_scheduler.h | 7 ++--- 3 files changed, 26 insertions(+), 23 deletions(-) diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c index d982cebc6bee..c48f17faef41 100644 --- a/drivers/gpu/drm/scheduler/sched_entity.c +++ b/drivers/gpu/drm/scheduler/sched_entity.c @@ -517,6 +517,7 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity) if (next) { spin_lock(&entity->lock); drm_sched_rq_update_fifo_locked(entity, + entity->rq, next->submit_ts); spin_unlock(&entity->lock); } @@ -618,11 +619,14 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) sched = rq->sched; atomic_inc(sched->score); - drm_sched_rq_add_entity(rq, entity); + + spin_lock(&rq->lock); + drm_sched_rq_add_entity_locked(rq, entity); if (drm_sched_policy == DRM_SCHED_POLICY_FIFO) - drm_sched_rq_update_fifo_locked(entity, submit_ts); + drm_sched_rq_update_fifo_locked(entity, rq, submit_ts); + spin_unlock(&rq->lock); spin_unlock(&entity->lock); drm_sched_wakeup(sched, 
entity); diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index 18a952f73ecb..c0d3f6ac3ae3 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -153,17 +153,18 @@ static __always_inline bool drm_sched_entity_compare_before(struct rb_node *a, return ktime_before(ent_a->oldest_job_waiting, ent_b->oldest_job_waiting); } -static inline void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *entity) +static void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *entity, + struct drm_sched_rq *rq) { - struct drm_sched_rq *rq = entity->rq; - if (!RB_EMPTY_NODE(&entity->rb_tree_node)) { rb_erase_cached(&entity->rb_tree_node, &rq->rb_tree_root); RB_CLEAR_NODE(&entity->rb_tree_node); } } -void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, ktime_t ts) +void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, +struct drm_sched_rq *rq, +ktime_t ts) { /* * Both locks need to be grabbed, one to protect from entity->rq change @@ -171,17 +172,14 @@ void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, ktime_t ts * other to update the rb tree structure. */ lockdep_assert_held(&entity->lock); + lockdep_assert_held(&rq->lock); - spin_lock(&entity->rq->lock); - - drm_sched_rq_remove_fifo_locked(entity); + drm_sched_rq_remove_fifo_locked(entity, rq); entity->oldest_job_waiting = ts; - rb_add_cached(&entity->rb_tree_node, &entity->rq->rb_tree_root, + rb_add_cached(&entity->rb_tree_node, &rq->rb_tree_root, drm_sched_entity_compare_before); - - spin_unlock(&entity->rq->lock); } /** @@ -203,25 +201,23 @@ static void drm_sched_rq_init(struct drm_gpu_scheduler *sched, } /** - * drm_sched_rq_add_entity - add an entity + * drm_sched_rq_add_entity_locked - add an entity * * @rq: scheduler run queue * @entity: scheduler entity * * Adds a scheduler entity to the run queue. 
*/ -void drm_sched_rq_add_entity(struct drm_sched_rq *rq, -struct drm_sched_entity
[PATCH v3 0/8] DRM scheduler fixes and improvements
From: Tvrtko Ursulin

Re-spin of the series from last week. Changelog is in individual patches.

Cc: Christian König
Cc: Alex Deucher
Cc: Luben Tuikov
Cc: Matthew Brost
Cc: Philipp Stanner

Tvrtko Ursulin (8):
  drm/sched: Add locking to drm_sched_entity_modify_sched
  drm/sched: Always wake up correct scheduler in drm_sched_entity_push_job
  drm/sched: Always increment correct scheduler score
  drm/sched: Optimise drm_sched_entity_push_job
  drm/sched: Stop setting current entity in FIFO mode
  drm/sched: Re-order struct drm_sched_rq members for clarity
  drm/sched: Re-group and rename the entity run-queue lock
  drm/sched: Further optimise drm_sched_entity_push_job

 drivers/gpu/drm/scheduler/sched_entity.c | 49
 drivers/gpu/drm/scheduler/sched_main.c   | 37
 include/drm/gpu_scheduler.h              | 32
 3 files changed, 68 insertions(+), 50 deletions(-)

--
2.46.0
[PATCH 5/8] drm/sched: Stop setting current entity in FIFO mode
From: Tvrtko Ursulin It does not seem there is a need to set the current entity in FIFO mode since it only serves as a "cursor" in round-robin mode. Even if scheduling mode is changed at runtime the change in behaviour is simply to restart from the first entity, instead of continuing in RR mode from where FIFO left it, and that sounds completely fine. Signed-off-by: Tvrtko Ursulin Cc: Christian König Cc: Alex Deucher Cc: Luben Tuikov Cc: Matthew Brost Cc: Philipp Stanner Acked-by: Christian König Reviewed-by: Philipp Stanner --- drivers/gpu/drm/scheduler/sched_main.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index d0ee0ba75a86..74eaa3b23821 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -349,7 +349,6 @@ drm_sched_rq_select_entity_fifo(struct drm_gpu_scheduler *sched, return ERR_PTR(-ENOSPC); } - rq->current_entity = entity; reinit_completion(&entity->entity_idle); break; } -- 2.46.0
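To make the "cursor" argument concrete, here is a toy model of the two selection policies. Everything in it (struct toy_rq, struct toy_entity, the array-based run queue, the linear scan standing in for the rb tree) is an assumption made for illustration and is much simpler than the scheduler code; it only shows that round-robin needs to remember where it stopped, while FIFO picks by oldest waiting timestamp and never consults the cursor.

```c
#include <stddef.h>

#define NUM_ENTITIES 3

/* Toy stand-ins for the scheduler structures; not the kernel types. */
struct toy_entity {
	int ready;			/* has a job queued */
	long oldest_job_waiting;	/* submit timestamp of the oldest job */
};

struct toy_rq {
	struct toy_entity entities[NUM_ENTITIES];
	int current_entity;		/* RR cursor: last pick, -1 if none */
};

/* Round-robin: continue after the cursor and remember the new pick. */
static int select_entity_rr(struct toy_rq *rq)
{
	int i;

	for (i = 1; i <= NUM_ENTITIES; i++) {
		int idx = (rq->current_entity + i) % NUM_ENTITIES;

		if (rq->entities[idx].ready) {
			rq->current_entity = idx;
			return idx;
		}
	}
	return -1;
}

/*
 * FIFO: the entity whose oldest job has waited longest wins.
 * A linear scan replaces the rb tree here; the cursor is never touched.
 */
static int select_entity_fifo(const struct toy_rq *rq)
{
	int i, best = -1;

	for (i = 0; i < NUM_ENTITIES; i++) {
		if (!rq->entities[i].ready)
			continue;
		if (best < 0 || rq->entities[i].oldest_job_waiting <
				rq->entities[best].oldest_job_waiting)
			best = i;
	}
	return best;
}
```

In this model, switching from FIFO to round-robin at runtime simply starts the rotation from the first entity, which matches the behaviour the commit message describes as acceptable.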
[PATCH 4/8] drm/sched: Optimise drm_sched_entity_push_job
From: Tvrtko Ursulin In FIFO mode we can avoid dropping the lock only to immediately re-acquire it by adding a new drm_sched_rq_update_fifo_locked() helper. v2: * Remove drm_sched_rq_update_fifo() altogether. (Christian) Signed-off-by: Tvrtko Ursulin Cc: Christian König Cc: Alex Deucher Cc: Luben Tuikov Cc: Matthew Brost Cc: Philipp Stanner --- drivers/gpu/drm/scheduler/sched_entity.c | 13 + drivers/gpu/drm/scheduler/sched_main.c | 6 +++--- include/drm/gpu_scheduler.h | 2 +- 3 files changed, 13 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c index 6645a8524699..aff79055643f 100644 --- a/drivers/gpu/drm/scheduler/sched_entity.c +++ b/drivers/gpu/drm/scheduler/sched_entity.c @@ -514,8 +514,12 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity) struct drm_sched_job *next; next = to_drm_sched_job(spsc_queue_peek(&entity->job_queue)); - if (next) - drm_sched_rq_update_fifo(entity, next->submit_ts); + if (next) { + spin_lock(&entity->rq_lock); + drm_sched_rq_update_fifo_locked(entity, + next->submit_ts); + spin_unlock(&entity->rq_lock); + } } /* Jobs and entities might have different lifecycles.
Since we're @@ -615,10 +619,11 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) atomic_inc(sched->score); drm_sched_rq_add_entity(rq, entity); - spin_unlock(&entity->rq_lock); if (drm_sched_policy == DRM_SCHED_POLICY_FIFO) - drm_sched_rq_update_fifo(entity, submit_ts); + drm_sched_rq_update_fifo_locked(entity, submit_ts); + + spin_unlock(&entity->rq_lock); drm_sched_wakeup(sched, entity); } diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index f093616fe53c..d0ee0ba75a86 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -163,14 +163,15 @@ static inline void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *enti } } -void drm_sched_rq_update_fifo(struct drm_sched_entity *entity, ktime_t ts) +void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, ktime_t ts) { /* * Both locks need to be grabbed, one to protect from entity->rq change * for entity from within concurrent drm_sched_entity_select_rq and the * other to update the rb tree structure. 
*/ - spin_lock(&entity->rq_lock); + lockdep_assert_held(&entity->rq_lock); + spin_lock(&entity->rq->lock); drm_sched_rq_remove_fifo_locked(entity); @@ -181,7 +182,6 @@ void drm_sched_rq_update_fifo(struct drm_sched_entity *entity, ktime_t ts) drm_sched_entity_compare_before); spin_unlock(&entity->rq->lock); - spin_unlock(&entity->rq_lock); } /** diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h index a8d19b10f9b8..38465b78c7d5 100644 --- a/include/drm/gpu_scheduler.h +++ b/include/drm/gpu_scheduler.h @@ -593,7 +593,7 @@ void drm_sched_rq_add_entity(struct drm_sched_rq *rq, void drm_sched_rq_remove_entity(struct drm_sched_rq *rq, struct drm_sched_entity *entity); -void drm_sched_rq_update_fifo(struct drm_sched_entity *entity, ktime_t ts); +void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, ktime_t ts); int drm_sched_entity_init(struct drm_sched_entity *entity, enum drm_sched_priority priority, -- 2.46.0
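For readers less familiar with the "_locked" naming convention the patch above relies on, here is a rough userspace analogy, not kernel code. A pthread mutex stands in for entity->rq_lock and a plain flag stands in for lockdep_assert_held(). Every name in it (struct entity, entity_lock(), entity_push() and so on) is made up for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct drm_sched_entity; not the kernel type. */
struct entity {
	pthread_mutex_t lock;    /* plays the role of entity->rq_lock */
	bool lock_held;          /* crude substitute for lockdep tracking */
	long oldest_job_waiting; /* timestamp used for FIFO ordering */
	int updates;             /* how many FIFO updates were performed */
};

static void entity_lock(struct entity *e)
{
	pthread_mutex_lock(&e->lock);
	e->lock_held = true;
}

static void entity_unlock(struct entity *e)
{
	e->lock_held = false;
	pthread_mutex_unlock(&e->lock);
}

/*
 * "_locked" suffix: the caller must already hold e->lock. The assert
 * mirrors what lockdep_assert_held() checks in the kernel.
 */
static void entity_update_fifo_locked(struct entity *e, long ts)
{
	assert(e->lock_held);
	e->oldest_job_waiting = ts;
	e->updates++;
}

/* One lock/unlock cycle covers both the "add" step and the FIFO update. */
static void entity_push(struct entity *e, long ts)
{
	entity_lock(e);
	/* ... adding the entity to its run queue would happen here ... */
	entity_update_fifo_locked(e, ts);
	entity_unlock(e);
}
```

Calling entity_push() on an entity whose lock was set up with PTHREAD_MUTEX_INITIALIZER takes the lock once. Calling entity_update_fifo_locked() without the lock trips the assertion instead of silently racing.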
[PATCH 3/8] drm/sched: Always increment correct scheduler score
From: Tvrtko Ursulin

The entity's run queue can change during drm_sched_entity_push_job(), so
make sure to update the score consistently.

Signed-off-by: Tvrtko Ursulin
Fixes: d41a39dda140 ("drm/scheduler: improve job distribution with multiple queues")
Cc: Nirmoy Das
Cc: Christian König
Cc: Luben Tuikov
Cc: Matthew Brost
Cc: David Airlie
Cc: Daniel Vetter
Cc: dri-devel@lists.freedesktop.org
Cc: # v5.9+
Reviewed-by: Christian König
Reviewed-by: Nirmoy Das
---
 drivers/gpu/drm/scheduler/sched_entity.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 76e422548d40..6645a8524699 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -586,7 +586,6 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
 	ktime_t submit_ts;
 
 	trace_drm_sched_job(sched_job, entity);
-	atomic_inc(entity->rq->sched->score);
 	WRITE_ONCE(entity->last_user, current->group_leader);
 
 	/*
@@ -614,6 +613,7 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
 		rq = entity->rq;
 		sched = rq->sched;
 
+		atomic_inc(sched->score);
 		drm_sched_rq_add_entity(rq, entity);
 		spin_unlock(&entity->rq_lock);
 
-- 
2.46.0
[PATCH 2/8] drm/sched: Always wake up correct scheduler in drm_sched_entity_push_job
From: Tvrtko Ursulin

Since drm_sched_entity_modify_sched() can modify the entity's run queue,
let's make sure to only dereference the pointer once so both adding and
waking up are guaranteed to be consistent.

The alternative of moving the spin_unlock to after the wake up would for
now be more problematic since the same lock is taken inside
drm_sched_rq_update_fifo().

v2:
 * Improve commit message. (Philipp)
 * Cache the scheduler pointer directly. (Christian)

Signed-off-by: Tvrtko Ursulin
Fixes: b37aced31eb0 ("drm/scheduler: implement a function to modify sched list")
Cc: Christian König
Cc: Alex Deucher
Cc: Luben Tuikov
Cc: Matthew Brost
Cc: David Airlie
Cc: Daniel Vetter
Cc: Philipp Stanner
Cc: dri-devel@lists.freedesktop.org
Cc: # v5.7+
Reviewed-by: Christian König
---
 drivers/gpu/drm/scheduler/sched_entity.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index ae8be30472cd..76e422548d40 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -599,6 +599,9 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
 
 	/* first job wakes up scheduler */
 	if (first) {
+		struct drm_gpu_scheduler *sched;
+		struct drm_sched_rq *rq;
+
 		/* Add the entity to the run queue */
 		spin_lock(&entity->rq_lock);
 		if (entity->stopped) {
@@ -608,13 +611,16 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
 			return;
 		}
 
-		drm_sched_rq_add_entity(entity->rq, entity);
+		rq = entity->rq;
+		sched = rq->sched;
+
+		drm_sched_rq_add_entity(rq, entity);
 		spin_unlock(&entity->rq_lock);
 
 		if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
 			drm_sched_rq_update_fifo(entity, submit_ts);
 
-		drm_sched_wakeup(entity->rq->sched, entity);
+		drm_sched_wakeup(sched, entity);
 	}
 }
 EXPORT_SYMBOL(drm_sched_entity_push_job);
-- 
2.46.0
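The "dereference the pointer once" idea in this patch can be illustrated with a small userspace sketch. It is only an analogy under stated assumptions: struct sched, struct entity, push_job() and move_to_other_sched() are hypothetical names, and the concurrent modifier is simulated by a callback that runs after the lock is dropped.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel structures. */
struct sched {
	int queued;  /* entities added to this scheduler */
	int wakeups; /* wake-ups delivered to this scheduler */
};

struct entity {
	pthread_mutex_t lock; /* plays the role of entity->rq_lock */
	struct sched *sched;  /* may be changed by another context */
};

static struct sched other_sched;

/* Simulates a concurrent drm_sched_entity_modify_sched()-style change. */
static void move_to_other_sched(struct entity *e)
{
	pthread_mutex_lock(&e->lock);
	e->sched = &other_sched;
	pthread_mutex_unlock(&e->lock);
}

static void push_job(struct entity *e, void (*racer)(struct entity *))
{
	struct sched *sched;

	pthread_mutex_lock(&e->lock);
	sched = e->sched; /* read the pointer once, under the lock */
	assert(sched != NULL);
	sched->queued++;
	pthread_mutex_unlock(&e->lock);

	if (racer)
		racer(e); /* e->sched may point elsewhere from here on */

	sched->wakeups++; /* still the scheduler we queued on */
}
```

Had push_job() re-read e->sched for the wake-up, the simulated change would have sent the wake-up to a scheduler that never received the entity.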
[PATCH 1/8] drm/sched: Add locking to drm_sched_entity_modify_sched
From: Tvrtko Ursulin

Without the locking amdgpu currently can race between
amdgpu_ctx_set_entity_priority() (via drm_sched_entity_modify_sched()) and
drm_sched_job_arm(), leading to the latter accessing a potentially
inconsistent entity->sched_list and entity->num_sched_list pair.

v2:
 * Improve commit message. (Philipp)

Signed-off-by: Tvrtko Ursulin
Fixes: b37aced31eb0 ("drm/scheduler: implement a function to modify sched list")
Cc: Christian König
Cc: Alex Deucher
Cc: Luben Tuikov
Cc: Matthew Brost
Cc: David Airlie
Cc: Daniel Vetter
Cc: dri-devel@lists.freedesktop.org
Cc: Philipp Stanner
Cc: # v5.7+
Reviewed-by: Christian König
---
 drivers/gpu/drm/scheduler/sched_entity.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 58c8161289fe..ae8be30472cd 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -133,8 +133,10 @@ void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
 {
 	WARN_ON(!num_sched_list || !sched_list);
 
+	spin_lock(&entity->rq_lock);
 	entity->sched_list = sched_list;
 	entity->num_sched_list = num_sched_list;
+	spin_unlock(&entity->rq_lock);
 }
 EXPORT_SYMBOL(drm_sched_entity_modify_sched);
 
-- 
2.46.0
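The hazard described above is a reader seeing a new list pointer together with an old element count, or the other way round. A minimal userspace sketch of the fix follows, assuming a pthread mutex in place of entity->rq_lock. The struct and function names are hypothetical and only loosely modelled on the patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the entity's scheduler list; not the kernel type. */
struct entity {
	pthread_mutex_t lock;  /* plays the role of entity->rq_lock */
	const int *sched_list; /* array of scheduler ids */
	size_t num_sched_list; /* number of valid entries in sched_list */
};

/* Writer: both fields change inside one critical section. */
static void entity_modify_sched(struct entity *e, const int *list, size_t num)
{
	assert(list != NULL && num > 0);

	pthread_mutex_lock(&e->lock);
	e->sched_list = list;
	e->num_sched_list = num;
	pthread_mutex_unlock(&e->lock);
}

/*
 * Reader: the index and the array it indexes are read under the same lock,
 * so the reader can never combine an old count with a new, shorter array.
 */
static int entity_pick_last(struct entity *e)
{
	int id;

	pthread_mutex_lock(&e->lock);
	id = e->sched_list[e->num_sched_list - 1];
	pthread_mutex_unlock(&e->lock);

	return id;
}
```

Without the lock in the writer, a reader running between the two stores could index a one-element array with a count of three. That is the kind of inconsistent pair the commit message refers to.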
Re: [PATCH 8/8] drm/sched: Further optimise drm_sched_entity_push_job
On 13/09/2024 13:19, Philipp Stanner wrote: On Wed, 2024-09-11 at 13:22 +0100, Tvrtko Ursulin wrote: On 10/09/2024 11:25, Philipp Stanner wrote: On Mon, 2024-09-09 at 18:19 +0100, Tvrtko Ursulin wrote: From: Tvrtko Ursulin Having removed one re-lock cycle on the entity->lock in a patch titled "drm/sched: Optimise drm_sched_entity_push_job", with only a tiny bit larger refactoring we can do the same optimisation on the rq- lock (Currently both drm_sched_rq_add_entity() and drm_sched_rq_update_fifo_locked() take and release the same lock.) To achieve this we rename drm_sched_rq_add_entity() to drm_sched_rq_add_entity_locked(), making it expect the rq->lock to be held, and also add the same expectation to drm_sched_rq_update_fifo_locked(). For more stream-lining we also add the run-queue as an explicit parameter to drm_sched_rq_remove_fifo_locked() to avoid both callers and callee having to dereference entity->rq. Why is dereferencing it a problem? As you have noticed below the API is a bit unsightly. Consider for example this call chain: drm_sched_entity_kill(entity) drm_sched_rq_remove_entity(entity->rq, entity); drm_sched_rq_remove_fifo_locked(entity); struct drm_sched_rq *rq = entity->rq; A bit confused, no? I thought adding rq to remove_fifo_locked at least removes one back and forth between the entity->rq and rq. And then if we cache the rq in a local variable, after having explicitly taken the correct lock, we have this other call chain example: drm_sched_entity_push_job() ... rq = entity->rq; spin_lock(rq->lock); drm_sched_rq_add_entity_locked(rq, entity); drm_sched_rq_update_fifo_locked(rq, entity, submit_ts); spin_unlock(rq->lock); To me at least this reads more streamlined. 
Alright, doesn't sound to bad, but Signed-off-by: Tvrtko Ursulin Cc: Christian König Cc: Alex Deucher Cc: Luben Tuikov Cc: Matthew Brost Cc: Philipp Stanner --- drivers/gpu/drm/scheduler/sched_entity.c | 7 ++-- drivers/gpu/drm/scheduler/sched_main.c | 41 + - -- include/drm/gpu_scheduler.h | 7 ++-- 3 files changed, 31 insertions(+), 24 deletions(-) diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c index b4c4f9923e0b..2102c726d275 100644 --- a/drivers/gpu/drm/scheduler/sched_entity.c +++ b/drivers/gpu/drm/scheduler/sched_entity.c @@ -614,11 +614,14 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) sched = rq->sched; atomic_inc(sched->score); - drm_sched_rq_add_entity(rq, entity); + + spin_lock(&rq->lock); + drm_sched_rq_add_entity_locked(rq, entity); if (drm_sched_policy == DRM_SCHED_POLICY_FIFO) - drm_sched_rq_update_fifo_locked(entity, submit_ts); + drm_sched_rq_update_fifo_locked(entity, rq, submit_ts); + spin_unlock(&rq->lock); spin_unlock(&entity->lock); drm_sched_wakeup(sched, entity); diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index 937e7d1cfc49..1ccd2aed2d32 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -153,41 +153,44 @@ static __always_inline bool drm_sched_entity_compare_before(struct rb_node *a, return ktime_before(ent_a->oldest_job_waiting, ent_b- oldest_job_waiting); } -static inline void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *entity) +static void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *entity, + struct drm_sched_rq *rq) I would then at least like to see a comment somewhere telling the reader why rq is taken as a separate variable. One might otherwise easily wonder why it's not obtained through the entity and what the difference is. I failed to find a nice place to put it. 
I'll send v3 of the series with some changes soon and then please have another look at this patch and see if you can think of where it would look good. Regards, Tvrtko So here we'd add a new function parameter that still doesn't allow for getting rid of 'entity' as a parameter. We can't get rid of the entity. Maaaybe instead we could get rid of the rq in the whole chain, I mean from drm_sched_rq_add_entity and drm_sched_rq_remove_entity to start with. Let's postpone that. But then to remove the double re-lock we still (like in this patch) need to make the callers take the locks and rename the helpers with a _locked suffix. Otherwise it would be inconsistent that a lock is taken outside the helpers with no _locked suffix. I am not sure if that is better. All it achieves is to remove the rq as an explicit parameter by making the callees dereference it from the entity.
Re: [PATCH 8/8] drm/sched: Further optimise drm_sched_entity_push_job
On 10/09/2024 16:03, Christian König wrote: On 10.09.24 at 11:46, Tvrtko Ursulin wrote: On 10/09/2024 10:08, Christian König wrote: On 09.09.24 at 19:19, Tvrtko Ursulin wrote: From: Tvrtko Ursulin Having removed one re-lock cycle on the entity->lock in a patch titled "drm/sched: Optimise drm_sched_entity_push_job", with only a tiny bit larger refactoring we can do the same optimisation on the rq->lock. (Currently both drm_sched_rq_add_entity() and drm_sched_rq_update_fifo_locked() take and release the same lock.) I think that goes in the wrong direction. Probably better to move this here into drm_sched_rq_add_entity(): if (drm_sched_policy == DRM_SCHED_POLICY_FIFO) drm_sched_rq_update_fifo_locked(entity, submit_ts); We can then also drop adding the entity to the rr list when FIFO is in use. Unfortunately there are a few other places which appear to rely on the list. Like drm_sched_fini, That should be only a warning. Warning as in? drm_sched_increase_karma and The karma handling was another bad idea from AMD of how to propagate errors back to userspace and I've just recently documented together with Sima that we should use dma-fence errors instead. Just haven't had time to tackle cleaning that up yet. even amdgpu_job_stop_all_jobs_on_sched. Uff, seeing that for the first time just now. Another bad idea of how to handle things, which doesn't take the appropriate locks and looks racy to me. The latter could perhaps be solved by adding an iterator helper to the scheduler, which would perhaps be a good move for component isolation. And the first two could be handled by implementing a complete and mutually exclusive duality of how entities are walked depending on scheduling mode. Plus making the scheduling mode only configurable at boot. It feels doable but significant work, and in the meantime removing the double re-lock may be acceptable? I don't think we should optimize for something we want to remove in the long term.
I knew using the term optimise would just make things more difficult for myself. :) Let's view this as cleaning up the API to avoid the inelegance of taking the same lock twice right next to each other. If we can achieve this while not making the API worse then there is nothing to lose either short, medium or long term. If possible I would rather say that we should completely drop the RR approach and only use FIFO or even something more sophisticated. No complaints from me, but I don't know how that would work other than putting a deprecation warning if someone selected RR. And keeping that for a good number of kernel releases. Any other ideas? Regards, Tvrtko
Re: [PATCH 1/3] drm: add DRM_SET_NAME ioctl
On 13/09/2024 13:17, Pierre-Eric Pelloux-Prayer wrote: Hi Tvrtko, Le 12/09/2024 à 10:13, Tvrtko Ursulin a écrit : On 11/09/2024 15:58, Pierre-Eric Pelloux-Prayer wrote: Giving the opportunity to userspace to associate a free-form name with a drm_file struct is helpful for tracking and debugging. This is similar to the existing DMA_BUF_SET_NAME ioctl. Access to name is protected by a mutex, and the 'clients' debugfs file has been updated to print it. Userspace MR to use this ioctl: https://gitlab.freedesktop.org/virgl/virglrenderer/-/merge_requests/1428 Idea seems useful to me. Various classes of comments/questions below: Signed-off-by: Pierre-Eric Pelloux-Prayer --- drivers/gpu/drm/drm_debugfs.c | 12 drivers/gpu/drm/drm_file.c | 5 + drivers/gpu/drm/drm_ioctl.c | 28 include/drm/drm_file.h | 9 + include/uapi/drm/drm.h | 14 ++ 5 files changed, 64 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/drm_debugfs.c b/drivers/gpu/drm/drm_debugfs.c index 6b239a24f1df..b7492225ae88 100644 --- a/drivers/gpu/drm/drm_debugfs.c +++ b/drivers/gpu/drm/drm_debugfs.c @@ -78,12 +78,13 @@ static int drm_clients_info(struct seq_file *m, void *data) kuid_t uid; seq_printf(m, - "%20s %5s %3s master a %5s %10s\n", + "%20s %5s %3s master a %5s %10s %20s\n", "command", "tgid", "dev", "uid", - "magic"); + "magic", + "name"); /* dev->filelist is sorted youngest first, but we want to present * oldest first (i.e. kernel, servers, clients), so walk backwardss. @@ -94,19 +95,22 @@ static int drm_clients_info(struct seq_file *m, void *data) struct task_struct *task; struct pid *pid; + mutex_lock(&priv->name_lock); rcu_read_lock(); /* Locks priv->pid and pid_task()->comm! */ pid = rcu_dereference(priv->pid); task = pid_task(pid, PIDTYPE_TGID); uid = task ? __task_cred(task)->euid : GLOBAL_ROOT_UID; - seq_printf(m, "%20s %5d %3d %c %c %5d %10u\n", + seq_printf(m, "%20s %5d %3d %c %c %5d %10u %20s\n", task ? task->comm : "", pid_vnr(pid), priv->minor->index, is_current_master ? 
'y' : 'n', priv->authenticated ? 'y' : 'n', from_kuid_munged(seq_user_ns(m), uid), - priv->magic); + priv->magic, + priv->name ? priv->name : ""); rcu_read_unlock(); + mutex_unlock(&priv->name_lock); FWIW it is possible you could get away without the need for a lock on the read side if you make the pointer RCU managed and stick a synchronize_rcu before kfree in the ioctl update path. Not because this lock would be a contentended one per se, but mostly to avoid complications such as amdgpu_debugfs_gem_info_show() where 3/3 has it broken - cannot take the mutex in rcu locked section. Just something to consider in case it would end up simpler code. I don't mind using RCU or a mutex. Christian suggested a mutex, so I used that, but I'm happy to switch if the RCU approach is preferred. Mutex is fine as I said. Just mentioning RCU since it feels trivial and avoids the complications in amdgpu_debugfs_gem_info_show(). mutex_unlock(&dev->filelist_mutex); return 0; diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c index 01fde94fe2a9..558151c3912e 100644 --- a/drivers/gpu/drm/drm_file.c +++ b/drivers/gpu/drm/drm_file.c @@ -158,6 +158,7 @@ struct drm_file *drm_file_alloc(struct drm_minor *minor) spin_lock_init(&file->master_lookup_lock); mutex_init(&file->event_read_lock); + mutex_init(&file->name_lock); if (drm_core_check_feature(dev, DRIVER_GEM)) drm_gem_open(dev, file); @@ -259,6 +260,10 @@ void drm_file_free(struct drm_file *file) WARN_ON(!list_empty(&file->event_list)); put_pid(rcu_access_pointer(file->pid)); + + mutex_destroy(&file->name_lock); + kvfree(file->name); I think kfree is correct here. OK, I'll update in v2. 
+ kfree(file); } diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c index 51f39912866f..ba2f2120e99b 100644 --- a/drivers/gpu/drm/drm_ioctl.c +++ b/drivers/gpu/drm/drm_ioctl.c @@ -540,6 +540,32 @@ int drm_version(struct drm_device *dev, void *data, return err; } +static int drm_set_name(struct drm_device *dev, void *data, + struct drm_file *file_priv) +{ + struct drm_set_name *name = data; + void *us
Re: [PATCH v2 1/2] drm/sched: memset() 'job' in drm_sched_job_init()
On 13/09/2024 13:30, Philipp Stanner wrote: On Fri, 2024-09-13 at 12:56 +0100, Tvrtko Ursulin wrote: Hi, On 28/08/2024 10:41, Philipp Stanner wrote: drm_sched_job_init() has no control over how users allocate struct drm_sched_job. Unfortunately, the function can also not set some struct members such as job->sched. job->sched usage from within looks like a bug. But not related to the memset you add. For this one something like this looks easiest for a start: diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index ab53ab486fe6..877113b01af2 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -788,7 +788,7 @@ int drm_sched_job_init(struct drm_sched_job *job, * or worse--a blank screen--leave a trail in the * logs, so this can be debugged easier. */ - drm_err(job->sched, "%s: entity has no rq!\n", __func__); + pr_err("%s: entity has no rq!\n", __func__); return -ENOENT; } Fixes: 56e449603f0a ("drm/sched: Convert the GPU scheduler to variable number of run-queues") Cc: # v6.7+ Danilo and I already solved that: https://lore.kernel.org/all/20240827074521.12828-2-pstan...@redhat.com/ Ah.. I saw the link to this in your maintainership thread and superficially assumed it is among the pending stuff. All good. This could theoretically lead to UB by users dereferencing the struct's pointer members too early. Hmm if drm_sched_job_init returned an error callers should not dereference anything. What was actually the issue you were debugging? I was learning about the scheduler, wrote a dummy driver and had awkward behavior. Turned out it was this pointer not being initialized. I would have seen it immediately if it were NULL. The actual issue was and is IMO that a function called drm_sched_job_init() initializes the job. But it doesn't, it only partially initializes it. Only after drm_sched_job_arm() ran you're actually ready to go. 
In my experience one good approach when developing stuff is to have the various kernel debugging aids enabled. Lockdep, SLAB debugging, memory poisoning, kfence.. Then if you were allocating your job without GFP_ZERO, _and_ dereferencing something too early out of a misunderstanding of the API, you would get something obvious in the oops and not a random pointer. Which also applies to various CI systems, such as Intel's, which already runs a debug kernel, so a lot of these mistakes are caught instantly. Adding a memset is I think not the best solution since it is very likely redundant to someone doing a kzalloc in the first place. It is redundant in most cases, but it is effectively for free. I measured the runtime with 1e6 jobs with and without memset and there was no difference. I guess if kzalloc and drm_sched_job_init() are close enough in time so that the cachelines stay put, and depending on how you measure, it may be hard to see, but the cost is still there. For instance https://lore.kernel.org/amd-gfx/20240813140310.82706-1-tursu...@igalia.com/ I can see with perf that both memsets are hotspots even when testing with glxgears and vsync off. But I don't feel too strongly about this and there definitely is sense in initializing everything. Perhaps even instead of a memset we should use the correct methods per field? Since in there we have spsc_node, atomic_t, ktime_t, dma_fence_cb (even in an annoying union), drm_sched_priority.. In an ideal world all those would have their initializers. But some don't so meh. Regards, Tvrtko It is easier to debug such issues if these pointers are initialized to NULL, so dereferencing them causes a NULL pointer exception. Accordingly, drm_sched_entity_init() does precisely that and initializes its struct with memset(). Initialize parameter "job" to 0 in drm_sched_job_init(). Signed-off-by: Philipp Stanner --- No changes in v2.
--- drivers/gpu/drm/scheduler/sched_main.c | 8 1 file changed, 8 insertions(+) diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index 356c30fa24a8..b0c8ad10b419 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -806,6 +806,14 @@ int drm_sched_job_init(struct drm_sched_job *job, return -EINVAL; } + /* +* We don't know for sure how the user has allocated. Thus, zero the +* struct so that unallowed (i.e., too early) usage of pointers that +* this function does not set is guaranteed to lead to a NULL pointer +* exception instead of UB. +*/ + memset(job, 0, sizeof(*job)); + job->entity = entity; job->credits = credits; job->s_fence = drm_sched_fence_alloc(entity, owner);
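To make the debugging argument in this thread concrete, here is a small userspace sketch of the "zero the struct in init" approach. It is an illustration only. struct job, job_init() and job_arm() are hypothetical and much simpler than struct drm_sched_job. The sketch also assumes, as on all mainstream platforms, that all-zero bytes represent a NULL pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-ins; not the kernel structures. */
struct sched {
	int id;
};

struct job {
	struct sched *sched;  /* only set later, by the "arm" step */
	void *entity;
	unsigned int credits;
};

/*
 * Validates the arguments, then zeroes the whole struct before filling in
 * the members this step is responsible for. Members owned by later steps
 * (here: sched) are left as NULL rather than as whatever the allocator
 * returned.
 */
static int job_init(struct job *job, void *entity, unsigned int credits)
{
	if (entity == NULL || credits == 0)
		return -1;

	memset(job, 0, sizeof(*job));

	job->entity = entity;
	job->credits = credits;
	return 0;
}

/* The later step that finally assigns the scheduler. */
static void job_arm(struct job *job, struct sched *sched)
{
	assert(job->sched == NULL); /* arming twice would be a bug */
	job->sched = sched;
}
```

If the caller allocated the job without zeroing it, job->sched is still a well-defined NULL between job_init() and job_arm(), so a too-early dereference faults immediately instead of following a stale pointer. The cost Tvrtko mentions remains: callers that already zero-allocate pay for the memset twice.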
Re: [PATCH v2 2/2] drm/sched: warn about drm_sched_job_init()'s partial init
On 28/08/2024 10:41, Philipp Stanner wrote:
> drm_sched_job_init()'s name suggests that after the function succeeded,
> parameter "job" will be fully initialized. This is not the case; some
> members are only later set, notably "job->sched" by drm_sched_job_arm().
>
> Document that drm_sched_job_init() does not set all struct members.
>
> Document that job->sched in particular is uninitialized before
> drm_sched_job_arm().
>
> Signed-off-by: Philipp Stanner
> ---
> Changes in v2:
>  - Change grammar in the new comments a bit.
> ---
>  drivers/gpu/drm/scheduler/sched_main.c | 4 ++++
>  include/drm/gpu_scheduler.h            | 7 +++++++
>  2 files changed, 11 insertions(+)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index b0c8ad10b419..721373938c1e 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -781,6 +781,10 @@ EXPORT_SYMBOL(drm_sched_resubmit_jobs);
>   * Drivers must make sure drm_sched_job_cleanup() if this function returns
>   * successfully, even when @job is aborted before drm_sched_job_arm() is called.
>   *
> + * Note that this function does not assign a valid value to each struct member
> + * of struct drm_sched_job. Take a look at that struct's documentation to see
> + * who sets which struct member with what lifetime.

The first sentence is fine, but for the second, I don't see those details in struct drm_sched_job. (And I am not saying that they must be listed. IMO at some point it is better to have a high level overview than to describe the lifetime rules with individual members.)

> + *
>   * WARNING: amdgpu abuses &drm_sched.ready to signal when the hardware
>   * has died, which can mean that there's no valid runqueue for a @entity.
>   * This function returns -ENOENT in this case (which probably should be -EIO as
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> index 5acc64954a88..04a268cd22f1 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -337,6 +337,13 @@ struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f);
>  struct drm_sched_job {
>  	struct spsc_node		queue_node;
>  	struct list_head		list;
> +
> +	/*
> +	 * The scheduler this job is or will be scheduled on.
> +	 *
> +	 * Gets set by drm_sched_arm(). Valid until the scheduler's backend_ops
> +	 * callback "free_job()" has been called.

This is interesting - I was not sure where the lifetime for job->sched is defined and couldn't find it browsing around. Where did you find the clues to tie it to the free_job() callback?

Regards,

Tvrtko

> +	 */
>  	struct drm_gpu_scheduler	*sched;
>  	struct drm_sched_fence		*s_fence;
Re: [PATCH v2 1/2] drm/sched: memset() 'job' in drm_sched_job_init()
Hi, On 28/08/2024 10:41, Philipp Stanner wrote: drm_sched_job_init() has no control over how users allocate struct drm_sched_job. Unfortunately, the function can also not set some struct members such as job->sched. job->sched usage from within looks like a bug. But not related to the memset you add. For this one something like this looks easiest for a start: diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index ab53ab486fe6..877113b01af2 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -788,7 +788,7 @@ int drm_sched_job_init(struct drm_sched_job *job, * or worse--a blank screen--leave a trail in the * logs, so this can be debugged easier. */ - drm_err(job->sched, "%s: entity has no rq!\n", __func__); + pr_err("%s: entity has no rq!\n", __func__); return -ENOENT; } Fixes: 56e449603f0a ("drm/sched: Convert the GPU scheduler to variable number of run-queues") Cc: # v6.7+ This could theoretically lead to UB by users dereferencing the struct's pointer members too early. Hmm if drm_sched_job_init returned an error callers should not dereference anything. What was actually the issue you were debugging? Adding a memset is I think not the best solution since it is very likely redundant to someone doing a kzalloc in the first place. Regards, Tvrtko It is easier to debug such issues if these pointers are initialized to NULL, so dereferencing them causes a NULL pointer exception. Accordingly, drm_sched_entity_init() does precisely that and initializes its struct with memset(). Initialize parameter "job" to 0 in drm_sched_job_init(). Signed-off-by: Philipp Stanner --- No changes in v2. 
--- drivers/gpu/drm/scheduler/sched_main.c | 8 1 file changed, 8 insertions(+) diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index 356c30fa24a8..b0c8ad10b419 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -806,6 +806,14 @@ int drm_sched_job_init(struct drm_sched_job *job, return -EINVAL; } + /* +* We don't know for sure how the user has allocated. Thus, zero the +* struct so that unallowed (i.e., too early) usage of pointers that +* this function does not set is guaranteed to lead to a NULL pointer +* exception instead of UB. +*/ + memset(job, 0, sizeof(*job)); + job->entity = entity; job->credits = credits; job->s_fence = drm_sched_fence_alloc(entity, owner);
Re: [PATCH 7/7] dma-buf: rework the enable_signaling handling
On 11/09/2024 09:59, Christian König wrote:

The enable_signaling callback is the only function the dma_fence object calls with the fence lock held (the signaled callback might be called with the fence lock held as well, but that isn't guaranteed).

The background of that decision was to avoid races with other CPUs trying to signal the fence at the same time and potentially enforce an ordering of fence signaling.

The only problem is that this never worked correctly. First of all the enable_signaling call can still race with signaling a fence, it's just that informing the installed callbacks is blocked until enable signaling finishes. If that is required (radeon is an example of that) then drivers can still grab the fence lock themselves, everybody else doesn't need that.

Then regarding fence ordering it is perfectly possible that fences emitted in the order A, B, C call their installed callbacks in the order B, C, A. The background is that the optimization to signal fences from dma_fence_is_signaled() decouples the fence signaling from the interrupt handlers. The result is that fence C can signal because somebody queried its state while A and B still wait for their interrupt to arrive.

While those two reasons are just unnecessary churn the documentation is simply erroneous and suggests an illegal operation to implementations: "This function can be called from atomic context, but not from irq context, so normal spinlocks can be used.". Since the enable_signaling callback was called with interrupts disabled that practice could deadlock.

Fortunately nobody actually ran into problems with that, but considering that we should probably re-work the locking to allow dma_fence objects to exist after their drivers were unloaded this patch re-works all this to not call the callback with the dma_fence spinlock held and rather move the handling into the drivers which actually need it.
Signed-off-by: Christian König --- drivers/dma-buf/dma-fence-array.c | 7 +- drivers/dma-buf/dma-fence-chain.c | 13 ++-- drivers/dma-buf/dma-fence.c | 68 +++ drivers/dma-buf/st-dma-fence-chain.c | 4 +- drivers/dma-buf/st-dma-fence-unwrap.c | 22 +++--- drivers/dma-buf/st-dma-fence.c| 16 ++--- drivers/dma-buf/st-dma-resv.c | 10 +-- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c | 12 ++-- drivers/gpu/drm/i915/i915_active.c| 2 +- drivers/gpu/drm/i915/i915_request.c | 12 +++- drivers/gpu/drm/nouveau/nouveau_fence.c | 9 ++- drivers/gpu/drm/radeon/radeon_fence.c | 17 +++-- drivers/gpu/drm/ttm/ttm_bo.c | 2 +- drivers/gpu/drm/xe/xe_bo.c| 2 +- drivers/gpu/drm/xe/xe_hw_fence.c | 4 +- drivers/gpu/drm/xe/xe_preempt_fence.c | 3 +- drivers/gpu/drm/xe/xe_pt.c| 2 +- drivers/gpu/drm/xe/xe_sched_job.c | 2 +- drivers/gpu/drm/xe/xe_vm.c| 6 +- drivers/gpu/host1x/fence.c| 14 ++-- include/linux/dma-fence.h | 35 +++--- 21 files changed, 123 insertions(+), 139 deletions(-) diff --git a/drivers/dma-buf/dma-fence-array.c b/drivers/dma-buf/dma-fence-array.c index c74ac197d5fe..1022b08c9b42 100644 --- a/drivers/dma-buf/dma-fence-array.c +++ b/drivers/dma-buf/dma-fence-array.c @@ -67,7 +67,7 @@ static void dma_fence_array_cb_func(struct dma_fence *f, dma_fence_put(&array->base); } -static bool dma_fence_array_enable_signaling(struct dma_fence *fence) +static void dma_fence_array_enable_signaling(struct dma_fence *fence) { struct dma_fence_array *array = to_dma_fence_array(fence); struct dma_fence_array_cb *cb = array->callbacks; @@ -92,12 +92,11 @@ static bool dma_fence_array_enable_signaling(struct dma_fence *fence) dma_fence_put(&array->base); if (atomic_dec_and_test(&array->num_pending)) { dma_fence_array_clear_pending_error(array); - return false; + dma_fence_signal(&array->base); + return; } } } - - return true; } static bool dma_fence_array_signaled(struct dma_fence *fence) diff --git a/drivers/dma-buf/dma-fence-chain.c b/drivers/dma-buf/dma-fence-chain.c index 9663ba1bb6ac..f56baa214a6c 
100644 --- a/drivers/dma-buf/dma-fence-chain.c +++ b/drivers/dma-buf/dma-fence-chain.c @@ -9,7 +9,7 @@ #include -static bool dma_fence_chain_enable_signaling(struct dma_fence *fence); +static void dma_fence_chain_enable_signaling(struct dma_fence *fence); /** * dma_fence_chain_get_prev - use RCU to get a reference to the previous fence @@ -125,10 +125,7 @@ static void dma_fence_chain_irq_work(struct irq_work *work) chain = contain
Re: [PATCH 3/3] drm/amdgpu: use drm_file name
On 11/09/2024 15:58, Pierre-Eric Pelloux-Prayer wrote: In debugfs gem_info/vm_info files, timeout handler and page fault reports. This information is useful with the virtio/native-context driver: this allows the guest applications identifier to visible in amdgpu's output. The output in amdgpu_vm_info/amdgpu_gem_info looks like this: pid:12255 Process:glxgears/test-set-fd-name -- Signed-off-by: Pierre-Eric Pelloux-Prayer --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c| 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 11 -- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c| 20 +-- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h| 4 ++-- 5 files changed, 31 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index 6d5fd371d5ce..1712feb2c238 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -1577,7 +1577,7 @@ int amdgpu_amdkfd_gpuvm_acquire_process_vm(struct amdgpu_device *adev, if (ret) return ret; - amdgpu_vm_set_task_info(avm); + amdgpu_vm_set_task_info(avm, NULL); return 0; } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c index 1e475eb01417..d32dc547cc80 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c @@ -310,7 +310,7 @@ static int amdgpu_cs_pass1(struct amdgpu_cs_parser *p, kvfree(chunk_array); /* Use this opportunity to fill in task info for the vm */ - amdgpu_vm_set_task_info(vm); + amdgpu_vm_set_task_info(vm, p->filp); return 0; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c index 0e617dff8765..0e0d49060ca8 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c @@ -1012,8 +1012,15 @@ static int amdgpu_debugfs_gem_info_show(struct seq_file *m, void *unused) rcu_read_lock(); pid = 
rcu_dereference(file->pid); task = pid_task(pid, PIDTYPE_TGID); - seq_printf(m, "pid %8d command %s:\n", pid_nr(pid), - task ? task->comm : ""); + seq_printf(m, "pid %8d command %s", pid_nr(pid), + task ? task->comm : ""); + if (file->name) { + mutex_lock(&file->name_lock); As mentioned taking a mutex under rcu_read_lock is not allowed. It will need to either be re-arranged or, also as mentioned, alternatively aligned to use the same RCU access rules. + seq_putc(m, '/'); + seq_puts(m, file->name); + mutex_unlock(&file->name_lock); + } + seq_puts(m, ":\n"); rcu_read_unlock(); spin_lock(&file->table_lock); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index e20d19ae01b2..385211846ae3 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -2370,7 +2370,7 @@ static int amdgpu_vm_create_task_info(struct amdgpu_vm *vm) * * @vm: vm for which to set the info */ -void amdgpu_vm_set_task_info(struct amdgpu_vm *vm) +void amdgpu_vm_set_task_info(struct amdgpu_vm *vm, struct drm_file *file) { if (!vm->task_info) return; @@ -2385,7 +2385,23 @@ void amdgpu_vm_set_task_info(struct amdgpu_vm *vm) return; vm->task_info->tgid = current->group_leader->pid; - get_task_comm(vm->task_info->process_name, current->group_leader); + __get_task_comm(vm->task_info->process_name, TASK_COMM_LEN, + current->group_leader); + /* Append drm_client_name if set. */ + if (file && file->name) { + int n; + + mutex_lock(&file->name_lock); + n = strlen(vm->task_info->process_name); + if (n < NAME_MAX) { NAME_MAX because sizeof(vm->task_info->process_name) is NAME_MAX? (hint) + if (file->name) { FWIW could check before strlen. + vm->task_info->process_name[n] = '/'; Can this replace the null terminator at process_name[NAME_MAX - 1] with a '/'? 
+ strscpy_pad(&vm->task_info->process_name[n + 1], + file->name, NAME_MAX - (n + 1)); + } + } + mutex_unlock(&file->name_lock); + } } /** diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h index d12d66dca8e9..cabec384b4d4 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_v
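The review question above (can the '/' overwrite the terminator at the end of the buffer?) can be illustrated in plain userspace C. This is a minimal sketch under stated assumptions, not the amdgpu code: `PROC_NAME_SIZE` and `append_client_name()` are hypothetical names, and `snprintf()` stands in for the kernel string helpers. The point is that the room check must account for the '/', at least one character and the NUL before anything is written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for sizeof(vm->task_info->process_name). */
#define PROC_NAME_SIZE 16

/*
 * Append "/<client>" to the NUL-terminated string in buf without ever
 * writing past buf[size - 1]. The result is always NUL-terminated and
 * may be truncated. Returns the resulting string length.
 */
static size_t append_client_name(char *buf, size_t size, const char *client)
{
	size_t n = strlen(buf);

	/* Precondition: buf already holds a terminated string. */
	assert(size > 0 && n < size);

	/* Need room for '/', at least one character and the NUL. */
	if (!client || !client[0] || n + 2 >= size)
		return n;

	/* snprintf() writes at most size - n bytes, including the NUL. */
	snprintf(buf + n, size - n, "/%s", client);
	return strlen(buf);
}
```

With a 16-byte buffer holding "glxgears", appending "test-set-fd-name" yields the truncated "glxgears/test-s". A buffer that is already full is left untouched, so the terminator is never replaced.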
Re: [PATCH 1/3] drm: add DRM_SET_NAME ioctl
On 11/09/2024 15:58, Pierre-Eric Pelloux-Prayer wrote: Giving the opportunity to userspace to associate a free-form name with a drm_file struct is helpful for tracking and debugging. This is similar to the existing DMA_BUF_SET_NAME ioctl. Access to name is protected by a mutex, and the 'clients' debugfs file has been updated to print it. Userspace MR to use this ioctl: https://gitlab.freedesktop.org/virgl/virglrenderer/-/merge_requests/1428 Idea seems useful to me. Various classes of comments/questions below: Signed-off-by: Pierre-Eric Pelloux-Prayer --- drivers/gpu/drm/drm_debugfs.c | 12 drivers/gpu/drm/drm_file.c| 5 + drivers/gpu/drm/drm_ioctl.c | 28 include/drm/drm_file.h| 9 + include/uapi/drm/drm.h| 14 ++ 5 files changed, 64 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/drm_debugfs.c b/drivers/gpu/drm/drm_debugfs.c index 6b239a24f1df..b7492225ae88 100644 --- a/drivers/gpu/drm/drm_debugfs.c +++ b/drivers/gpu/drm/drm_debugfs.c @@ -78,12 +78,13 @@ static int drm_clients_info(struct seq_file *m, void *data) kuid_t uid; seq_printf(m, - "%20s %5s %3s master a %5s %10s\n", + "%20s %5s %3s master a %5s %10s %20s\n", "command", "tgid", "dev", "uid", - "magic"); + "magic", + "name"); /* dev->filelist is sorted youngest first, but we want to present * oldest first (i.e. kernel, servers, clients), so walk backwardss. @@ -94,19 +95,22 @@ static int drm_clients_info(struct seq_file *m, void *data) struct task_struct *task; struct pid *pid; + mutex_lock(&priv->name_lock); rcu_read_lock(); /* Locks priv->pid and pid_task()->comm! */ pid = rcu_dereference(priv->pid); task = pid_task(pid, PIDTYPE_TGID); uid = task ? __task_cred(task)->euid : GLOBAL_ROOT_UID; - seq_printf(m, "%20s %5d %3d %c%c %5d %10u\n", + seq_printf(m, "%20s %5d %3d %c%c %5d %10u %20s\n", task ? task->comm : "", pid_vnr(pid), priv->minor->index, is_current_master ? 'y' : 'n', priv->authenticated ? 'y' : 'n', from_kuid_munged(seq_user_ns(m), uid), - priv->magic); + priv->magic, + priv->name ? 
priv->name : ""); rcu_read_unlock(); + mutex_unlock(&priv->name_lock); FWIW it is possible you could get away without the need for a lock on the read side if you make the pointer RCU managed and stick a synchronize_rcu before kfree in the ioctl update path. Not because this lock would be a contentended one per se, but mostly to avoid complications such as amdgpu_debugfs_gem_info_show() where 3/3 has it broken - cannot take the mutex in rcu locked section. Just something to consider in case it would end up simpler code. } mutex_unlock(&dev->filelist_mutex); return 0; diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c index 01fde94fe2a9..558151c3912e 100644 --- a/drivers/gpu/drm/drm_file.c +++ b/drivers/gpu/drm/drm_file.c @@ -158,6 +158,7 @@ struct drm_file *drm_file_alloc(struct drm_minor *minor) spin_lock_init(&file->master_lookup_lock); mutex_init(&file->event_read_lock); + mutex_init(&file->name_lock); if (drm_core_check_feature(dev, DRIVER_GEM)) drm_gem_open(dev, file); @@ -259,6 +260,10 @@ void drm_file_free(struct drm_file *file) WARN_ON(!list_empty(&file->event_list)); put_pid(rcu_access_pointer(file->pid)); + + mutex_destroy(&file->name_lock); + kvfree(file->name); I think kfree is correct here. + kfree(file); } diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c index 51f39912866f..ba2f2120e99b 100644 --- a/drivers/gpu/drm/drm_ioctl.c +++ b/drivers/gpu/drm/drm_ioctl.c @@ -540,6 +540,32 @@ int drm_version(struct drm_device *dev, void *data, return err; } +static int drm_set_name(struct drm_device *dev, void *data, + struct drm_file *file_priv) +{ + struct drm_set_name *name = data; + void *user_ptr; + char *new_name; + + if (name->name_len >= NAME_MAX) + return -EINVAL; Any special reason to use the filesystem NAME_MAX? 
+ + user_ptr = u64_to_user_ptr(name->name); + + new_name = memdup_user_nul(user_ptr, name->name_len); + + if (IS_ERR(new_name)) + return PTR_ERR(new_name); + + mutex_lock(&file_priv->name_lock); + if (file_priv->name) + kvfree(file
[PULL] drm-intel-fixes
Hi Dave, Sima,

It is late in the cycle and luckily the fix in this week's PR is just something to satisfy static analyzers, nothing that can happen in reality, so pulling it is even optional.

Regards,

Tvrtko

drm-intel-fixes-2024-09-12:
- Prevent a possible int overflow in wq offsets [guc] (Nikita Zhandarovich)

The following changes since commit da3ea35007d0af457a0afc87e84fddaebc4e0b63:

  Linux 6.11-rc7 (2024-09-08 14:50:28 -0700)

are available in the Git repository at:

  https://gitlab.freedesktop.org/drm/i915/kernel.git tags/drm-intel-fixes-2024-09-12

for you to fetch changes up to d3d37f74683e2f16f2635ee265884f7ca69350ae:

  drm/i915/guc: prevent a possible int overflow in wq offsets (2024-09-10 08:13:51 +0100)

- Prevent a possible int overflow in wq offsets [guc] (Nikita Zhandarovich)

Nikita Zhandarovich (1):
      drm/i915/guc: prevent a possible int overflow in wq offsets

 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
Re: [PATCH 8/8] drm/sched: Further optimise drm_sched_entity_push_job
On 10/09/2024 11:25, Philipp Stanner wrote: On Mon, 2024-09-09 at 18:19 +0100, Tvrtko Ursulin wrote: From: Tvrtko Ursulin Having removed one re-lock cycle on the entity->lock in a patch titled "drm/sched: Optimise drm_sched_entity_push_job", with only a tiny bit larger refactoring we can do the same optimisation on the rq->lock (Currently both drm_sched_rq_add_entity() and drm_sched_rq_update_fifo_locked() take and release the same lock.) To achieve this we rename drm_sched_rq_add_entity() to drm_sched_rq_add_entity_locked(), making it expect the rq->lock to be held, and also add the same expectation to drm_sched_rq_update_fifo_locked(). For more stream-lining we also add the run-queue as an explicit parameter to drm_sched_rq_remove_fifo_locked() to avoid both callers and callee having to dereference entity->rq. Why is dereferencing it a problem? As you have noticed below the API is a bit unsightly. Consider for example this call chain: drm_sched_entity_kill(entity) drm_sched_rq_remove_entity(entity->rq, entity); drm_sched_rq_remove_fifo_locked(entity); struct drm_sched_rq *rq = entity->rq; A bit confused, no? I thought adding rq to remove_fifo_locked at least removes one back and forth between the entity->rq and rq. And then if we cache the rq in a local variable, after having explicitly taken the correct lock, we have this other call chain example: drm_sched_entity_push_job() ... rq = entity->rq; spin_lock(rq->lock); drm_sched_rq_add_entity_locked(rq, entity); drm_sched_rq_update_fifo_locked(rq, entity, submit_ts); spin_unlock(rq->lock); To me at least this reads more streamlined. 
Signed-off-by: Tvrtko Ursulin Cc: Christian König Cc: Alex Deucher Cc: Luben Tuikov Cc: Matthew Brost Cc: Philipp Stanner --- drivers/gpu/drm/scheduler/sched_entity.c | 7 ++-- drivers/gpu/drm/scheduler/sched_main.c | 41 +- -- include/drm/gpu_scheduler.h | 7 ++-- 3 files changed, 31 insertions(+), 24 deletions(-) diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c index b4c4f9923e0b..2102c726d275 100644 --- a/drivers/gpu/drm/scheduler/sched_entity.c +++ b/drivers/gpu/drm/scheduler/sched_entity.c @@ -614,11 +614,14 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) sched = rq->sched; atomic_inc(sched->score); - drm_sched_rq_add_entity(rq, entity); + + spin_lock(&rq->lock); + drm_sched_rq_add_entity_locked(rq, entity); if (drm_sched_policy == DRM_SCHED_POLICY_FIFO) - drm_sched_rq_update_fifo_locked(entity, submit_ts); + drm_sched_rq_update_fifo_locked(entity, rq, submit_ts); + spin_unlock(&rq->lock); spin_unlock(&entity->lock); drm_sched_wakeup(sched, entity); diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index 937e7d1cfc49..1ccd2aed2d32 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -153,41 +153,44 @@ static __always_inline bool drm_sched_entity_compare_before(struct rb_node *a, return ktime_before(ent_a->oldest_job_waiting, ent_b- oldest_job_waiting); } -static inline void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *entity) +static void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *entity, + struct drm_sched_rq *rq) So here we'd add a new function parameter that still doesn't allow for getting rid of 'entity' as a parameter. We can't get rid of the entity. Maaaybe instead we could get rid of the rq in the whole chain, I mean from drm_sched_rq_add_entity and drm_sched_rq_remove_entity to start with. 
But then to remove the double re-lock we still (like in this patch) need to make the callers take the locks and rename the helpers with a _locked suffix. Otherwise it would be inconsistent that a lock is taken outside helpers which have no _locked suffix. I am not sure if that is better. All it achieves is removing the rq as an explicit parameter by making the callees dereference it from the entity. Worst part is all these helpers have the drm_sched_rq_ prefix, which to me reads as "we operate on rq". So not passing in rq is confusing to start with. Granted, some confusion still remains with my approach since ideally, to those helpers, I wanted to add some asserts that rq == entity->rq... The API gets larger that way and readers will immediately wonder why something is passed as a separate variable that could also be obtained through the pointer. { - struct drm_sched_rq *rq = entity->rq; - if (!RB_EMPTY_NODE(&entity->rb_tree_node)) { rb_erase_cached(&am
Re: [PATCH 6/8] drm/sched: Re-order struct drm_sched_rq members for clarity
On 10/09/2024 11:05, Philipp Stanner wrote: On Mon, 2024-09-09 at 18:19 +0100, Tvrtko Ursulin wrote: From: Tvrtko Ursulin Lets re-order the members to make it clear which are protected by the lock and at the same time document it via kerneldoc. I'd prefer if commit messages follow the idiomatic kernel style of that order: 1. Describe the current situation 2. State why it's bad or undesirable 3. (describe the solution) 4. Conclude commit message through sentences in imperative stating what the commit does. In this case I would go for: "struct drm_sched_rq contains a spinlock that protects several struct members. The current documentation incorrectly states that this lock only guards the entities list. In truth, it guards that list, the rb_tree and the current entity. Document what the lock actually guards. Rearrange struct members so that this becomes even more visible." IMO a bit much to ask for a text book format, for a trivial patch, when all points are already implicitly obvious. That is "lets make it clear" = current situation is not clear -> obviously bad with no need to explain; "and the same time document" = means it is currently not documented -> again obviously not desirable. But okay, since I agree with the point below (*), I can explode the text for maximum redundancy. Signed-off-by: Tvrtko Ursulin Cc: Christian König Cc: Alex Deucher Cc: Luben Tuikov Cc: Matthew Brost Cc: Philipp Stanner --- include/drm/gpu_scheduler.h | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h index a06753987d93..d4a3ba333568 100644 --- a/include/drm/gpu_scheduler.h +++ b/include/drm/gpu_scheduler.h @@ -243,10 +243,10 @@ struct drm_sched_entity { /** * struct drm_sched_rq - queue of entities to be scheduled. * - * @lock: to modify the entities list. * @sched: the scheduler to which this rq belongs to. - * @entities: list of the entities to be scheduled. 
+ * @lock: protects the list, tree and current entity. Would be more consistent with the below comment if you'd address them with their full name, aka "protects @entities, @rb_tree_root and @current_entity". *) this one I agree with. Regards, Tvrtko Thanks, P. * @current_entity: the entity which is to be scheduled. + * @entities: list of the entities to be scheduled. * @rb_tree_root: root of time based priory queue of entities for FIFO scheduling * * Run queue is a set of entities scheduling command submissions for @@ -254,10 +254,12 @@ struct drm_sched_entity { * the next entity to emit commands from. */ struct drm_sched_rq { - spinlock_t lock; struct drm_gpu_scheduler *sched; - struct list_head entities; + + spinlock_t lock; + /* Following members are protected by the @lock: */ struct drm_sched_entity *current_entity; + struct list_head entities; struct rb_root_cached rb_tree_root; };
Re: [PATCH 8/8] drm/sched: Further optimise drm_sched_entity_push_job
On 10/09/2024 10:08, Christian König wrote: On 09.09.24 at 19:19, Tvrtko Ursulin wrote: From: Tvrtko Ursulin Having removed one re-lock cycle on the entity->lock in a patch titled "drm/sched: Optimise drm_sched_entity_push_job", with only a tiny bit larger refactoring we can do the same optimisation on the rq->lock. (Currently both drm_sched_rq_add_entity() and drm_sched_rq_update_fifo_locked() take and release the same lock.) I think that goes into the wrong direction. Probably better to move this here into drm_sched_rq_add_entity(): if (drm_sched_policy == DRM_SCHED_POLICY_FIFO) drm_sched_rq_update_fifo_locked(entity, submit_ts); We can then also drop adding the entity to the rr list when FIFO is in use. Unfortunately there are a few other places which appear to rely on the list, like drm_sched_fini, drm_sched_increase_karma and even amdgpu_job_stop_all_jobs_on_sched. The latter could perhaps be solved by adding an iterator helper to the scheduler, which would perhaps be a good move for component isolation. And the first two could be handled by implementing a complete and mutually exclusive duality of how entities are walked depending on scheduling mode, plus making the scheduling mode only configurable at boot. It feels doable but is significant work, and in the meantime removing the double re-lock may be acceptable? Regards, Tvrtko To achieve this we rename drm_sched_rq_add_entity() to drm_sched_rq_add_entity_locked(), making it expect the rq->lock to be held, and also add the same expectation to drm_sched_rq_update_fifo_locked(). For more stream-lining we also add the run-queue as an explicit parameter to drm_sched_rq_remove_fifo_locked() to avoid both callers and callee having to dereference entity->rq.
Signed-off-by: Tvrtko Ursulin Cc: Christian König Cc: Alex Deucher Cc: Luben Tuikov Cc: Matthew Brost Cc: Philipp Stanner --- drivers/gpu/drm/scheduler/sched_entity.c | 7 ++-- drivers/gpu/drm/scheduler/sched_main.c | 41 +--- include/drm/gpu_scheduler.h | 7 ++-- 3 files changed, 31 insertions(+), 24 deletions(-) diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c index b4c4f9923e0b..2102c726d275 100644 --- a/drivers/gpu/drm/scheduler/sched_entity.c +++ b/drivers/gpu/drm/scheduler/sched_entity.c @@ -614,11 +614,14 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) sched = rq->sched; atomic_inc(sched->score); - drm_sched_rq_add_entity(rq, entity); + + spin_lock(&rq->lock); + drm_sched_rq_add_entity_locked(rq, entity); if (drm_sched_policy == DRM_SCHED_POLICY_FIFO) - drm_sched_rq_update_fifo_locked(entity, submit_ts); + drm_sched_rq_update_fifo_locked(entity, rq, submit_ts); + spin_unlock(&rq->lock); spin_unlock(&entity->lock); drm_sched_wakeup(sched, entity); diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index 937e7d1cfc49..1ccd2aed2d32 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -153,41 +153,44 @@ static __always_inline bool drm_sched_entity_compare_before(struct rb_node *a, return ktime_before(ent_a->oldest_job_waiting, ent_b->oldest_job_waiting); } -static inline void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *entity) +static void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *entity, + struct drm_sched_rq *rq) { - struct drm_sched_rq *rq = entity->rq; - if (!RB_EMPTY_NODE(&entity->rb_tree_node)) { rb_erase_cached(&entity->rb_tree_node, &rq->rb_tree_root); RB_CLEAR_NODE(&entity->rb_tree_node); } } -void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, ktime_t ts) +void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, + struct drm_sched_rq *rq, + 
ktime_t ts) { lockdep_assert_held(&entity->lock); + lockdep_assert_held(&rq->lock); - spin_lock(&entity->rq->lock); - - drm_sched_rq_remove_fifo_locked(entity); + drm_sched_rq_remove_fifo_locked(entity, rq); entity->oldest_job_waiting = ts; - rb_add_cached(&entity->rb_tree_node, &entity->rq->rb_tree_root, + rb_add_cached(&entity->rb_tree_node, &rq->rb_tree_root, drm_sched_entity_compare_before); - - spin_unlock(&entity->rq->lock); } void drm_sched_rq_update_fifo(struct drm_sched_entity *entity, ktime_t ts) { + struct drm_sched_rq *rq; + /* * Both locks need to be grabbed, one to protect from entity->rq change * for entity from within concurrent drm_sched_entity_select_rq and the
Re: [PATCH] drm/syncobj: Fix syncobj leak in drm_syncobj_eventfd_ioctl
On 09/09/2024 21:53, T.J. Mercier wrote:
> A syncobj reference is taken in drm_syncobj_find, but not released if
> eventfd_ctx_fdget or kzalloc fails. Put the reference in these error
> paths.
>
> Reported-by: Xingyu Jin
> Fixes: c7a472297169 ("drm/syncobj: add IOCTL to register an eventfd")
> Signed-off-by: T.J. Mercier
> ---
>  drivers/gpu/drm/drm_syncobj.c | 17 +
>  1 file changed, 13 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
> index a0e94217b511..4fcfc0b9b386 100644
> --- a/drivers/gpu/drm/drm_syncobj.c
> +++ b/drivers/gpu/drm/drm_syncobj.c
> @@ -1464,6 +1464,7 @@ drm_syncobj_eventfd_ioctl(struct drm_device *dev, void *data,
>  	struct drm_syncobj *syncobj;
>  	struct eventfd_ctx *ev_fd_ctx;
>  	struct syncobj_eventfd_entry *entry;
> +	int ret;
>
>  	if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ_TIMELINE))
>  		return -EOPNOTSUPP;
> @@ -1479,13 +1480,15 @@ drm_syncobj_eventfd_ioctl(struct drm_device *dev, void *data,
>  		return -ENOENT;
>
>  	ev_fd_ctx = eventfd_ctx_fdget(args->fd);
> -	if (IS_ERR(ev_fd_ctx))
> -		return PTR_ERR(ev_fd_ctx);
> +	if (IS_ERR(ev_fd_ctx)) {
> +		ret = PTR_ERR(ev_fd_ctx);
> +		goto err_fdget;
> +	}
>
>  	entry = kzalloc(sizeof(*entry), GFP_KERNEL);
>  	if (!entry) {
> -		eventfd_ctx_put(ev_fd_ctx);
> -		return -ENOMEM;
> +		ret = -ENOMEM;
> +		goto err_kzalloc;
>  	}
>  	entry->syncobj = syncobj;
>  	entry->ev_fd_ctx = ev_fd_ctx;
> @@ -1496,6 +1499,12 @@ drm_syncobj_eventfd_ioctl(struct drm_device *dev, void *data,
>  	drm_syncobj_put(syncobj);
>
>  	return 0;
> +
> +err_kzalloc:
> +	eventfd_ctx_put(ev_fd_ctx);
> +err_fdget:
> +	drm_syncobj_put(syncobj);
> +	return ret;
>  }
>
>  int

Easy enough to review while browsing the list:

Reviewed-by: Tvrtko Ursulin

Regards,

Tvrtko
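For readers less familiar with the idiom used by this fix, here is a small userspace sketch of goto-based unwinding, where each error label releases what was acquired before the failing step. Everything in it is hypothetical and for illustration only: `struct obj`, `register_entry()`, the `fail_ctx`/`fail_entry` injection flags and the use of `malloc()` in place of the kernel allocators. Unlike the ioctl, which drops its lookup reference on success as well, the sketch lets the entry keep the reference until `release_entry()`.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for the syncobj. */
struct obj {
	int refs;
};

static void obj_get(struct obj *o) { o->refs++; }
static void obj_put(struct obj *o) { o->refs--; }

struct entry {
	struct obj *o;
	void *ctx;
};

/*
 * Take a reference, then acquire two more resources. On success the
 * entry owns the reference; on any failure everything acquired so far
 * is released in reverse order through the labels at the bottom.
 */
static int register_entry(struct obj *o, int fail_ctx, int fail_entry,
			  struct entry **out)
{
	struct entry *e;
	void *ctx;
	int ret;

	assert(o && out);
	obj_get(o);

	ctx = fail_ctx ? NULL : malloc(1);
	if (!ctx) {
		ret = -EBADF;
		goto err_ctx;
	}

	e = fail_entry ? NULL : malloc(sizeof(*e));
	if (!e) {
		ret = -ENOMEM;
		goto err_entry;
	}

	e->o = o;
	e->ctx = ctx;
	*out = e;
	return 0;

err_entry:
	free(ctx);
err_ctx:
	obj_put(o);
	return ret;
}

/* Undo a successful register_entry(). */
static void release_entry(struct entry *e)
{
	obj_put(e->o);
	free(e->ctx);
	free(e);
}
```

The property worth checking is that the reference count returns to its starting value on every failure path, which is exactly what the original code missed.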
[PATCH 8/8] drm/sched: Further optimise drm_sched_entity_push_job
From: Tvrtko Ursulin Having removed one re-lock cycle on the entity->lock in a patch titled "drm/sched: Optimise drm_sched_entity_push_job", with only a tiny bit larger refactoring we can do the same optimisation on the rq->lock. (Currently both drm_sched_rq_add_entity() and drm_sched_rq_update_fifo_locked() take and release the same lock.) To achieve this we rename drm_sched_rq_add_entity() to drm_sched_rq_add_entity_locked(), making it expect the rq->lock to be held, and also add the same expectation to drm_sched_rq_update_fifo_locked(). For more stream-lining we also add the run-queue as an explicit parameter to drm_sched_rq_remove_fifo_locked() to avoid both callers and callee having to dereference entity->rq. Signed-off-by: Tvrtko Ursulin Cc: Christian König Cc: Alex Deucher Cc: Luben Tuikov Cc: Matthew Brost Cc: Philipp Stanner --- drivers/gpu/drm/scheduler/sched_entity.c | 7 ++-- drivers/gpu/drm/scheduler/sched_main.c | 41 +--- include/drm/gpu_scheduler.h | 7 ++-- 3 files changed, 31 insertions(+), 24 deletions(-) diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c index b4c4f9923e0b..2102c726d275 100644 --- a/drivers/gpu/drm/scheduler/sched_entity.c +++ b/drivers/gpu/drm/scheduler/sched_entity.c @@ -614,11 +614,14 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) sched = rq->sched; atomic_inc(sched->score); - drm_sched_rq_add_entity(rq, entity); + + spin_lock(&rq->lock); + drm_sched_rq_add_entity_locked(rq, entity); if (drm_sched_policy == DRM_SCHED_POLICY_FIFO) - drm_sched_rq_update_fifo_locked(entity, submit_ts); + drm_sched_rq_update_fifo_locked(entity, rq, submit_ts); + spin_unlock(&rq->lock); spin_unlock(&entity->lock); drm_sched_wakeup(sched, entity); diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index 937e7d1cfc49..1ccd2aed2d32 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -153,41 +153,44 
@@ static __always_inline bool drm_sched_entity_compare_before(struct rb_node *a, return ktime_before(ent_a->oldest_job_waiting, ent_b->oldest_job_waiting); } -static inline void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *entity) +static void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *entity, + struct drm_sched_rq *rq) { - struct drm_sched_rq *rq = entity->rq; - if (!RB_EMPTY_NODE(&entity->rb_tree_node)) { rb_erase_cached(&entity->rb_tree_node, &rq->rb_tree_root); RB_CLEAR_NODE(&entity->rb_tree_node); } } -void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, ktime_t ts) +void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, +struct drm_sched_rq *rq, +ktime_t ts) { lockdep_assert_held(&entity->lock); + lockdep_assert_held(&rq->lock); - spin_lock(&entity->rq->lock); - - drm_sched_rq_remove_fifo_locked(entity); + drm_sched_rq_remove_fifo_locked(entity, rq); entity->oldest_job_waiting = ts; - rb_add_cached(&entity->rb_tree_node, &entity->rq->rb_tree_root, + rb_add_cached(&entity->rb_tree_node, &rq->rb_tree_root, drm_sched_entity_compare_before); - - spin_unlock(&entity->rq->lock); } void drm_sched_rq_update_fifo(struct drm_sched_entity *entity, ktime_t ts) { + struct drm_sched_rq *rq; + /* * Both locks need to be grabbed, one to protect from entity->rq change * for entity from within concurrent drm_sched_entity_select_rq and the * other to update the rb tree structure. */ spin_lock(&entity->lock); - drm_sched_rq_update_fifo_locked(entity, ts); + rq = entity->rq; + spin_lock(&rq->lock); + drm_sched_rq_update_fifo_locked(entity, rq, ts); + spin_unlock(&rq->lock); spin_unlock(&entity->lock); } @@ -210,25 +213,23 @@ static void drm_sched_rq_init(struct drm_gpu_scheduler *sched, } /** - * drm_sched_rq_add_entity - add an entity + * drm_sched_rq_add_entity_locked - add an entity * * @rq: scheduler run queue * @entity: scheduler entity * * Adds a scheduler entity to the run queue. 
*/ -void drm_sched_rq_add_entity(struct drm_sched_rq *rq, -struct drm_sched_entity *entity) +void drm_sched_rq_add_entity_locked(struct drm_sched_rq *rq, +
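The caller-takes-the-lock shape this patch moves to can be sketched in userspace as follows. This is only an illustration under stated assumptions: `struct rq`, `rq_lock()`, `push_job()` and the `held` flag are hypothetical, a pthread mutex replaces the spinlock, and the `assert(rq->held)` checks are a crude stand-in for `lockdep_assert_held()`. The `lock_cycles` counter exists purely to show that two updates now cost one lock/unlock cycle.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical run queue; only the locking shape matters here. */
struct rq {
	pthread_mutex_t lock;
	bool held;		/* true while lock is held (poor man's lockdep) */
	int nr_entities;
	long oldest_ts;
	int lock_cycles;	/* number of lock acquisitions so far */
};

static void rq_lock(struct rq *rq)
{
	pthread_mutex_lock(&rq->lock);
	rq->held = true;
	rq->lock_cycles++;
}

static void rq_unlock(struct rq *rq)
{
	rq->held = false;
	pthread_mutex_unlock(&rq->lock);
}

/* _locked helpers: the caller must already hold rq->lock. */
static void rq_add_entity_locked(struct rq *rq)
{
	assert(rq->held);
	rq->nr_entities++;
}

static void rq_update_fifo_locked(struct rq *rq, long ts)
{
	assert(rq->held);
	rq->oldest_ts = ts;
}

/* One lock cycle covers both updates instead of one cycle per helper. */
static void push_job(struct rq *rq, long ts)
{
	rq_lock(rq);
	rq_add_entity_locked(rq);
	rq_update_fifo_locked(rq, ts);
	rq_unlock(rq);
}
```

Passing the run queue explicitly to every helper mirrors the argument made in the review thread: the caller has already picked and locked a specific rq, so the helpers operate on that one rather than looking it up again.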
[PATCH 1/8] drm/sched: Add locking to drm_sched_entity_modify_sched
From: Tvrtko Ursulin

Without the locking amdgpu currently can race between
amdgpu_ctx_set_entity_priority() (via drm_sched_entity_modify_sched()) and
drm_sched_job_arm(), leading to the latter accessing a potentially
inconsistent entity->sched_list and entity->num_sched_list pair.

v2:
 * Improve commit message. (Philipp)

Signed-off-by: Tvrtko Ursulin
Fixes: b37aced31eb0 ("drm/scheduler: implement a function to modify sched list")
Cc: Christian König
Cc: Alex Deucher
Cc: Luben Tuikov
Cc: Matthew Brost
Cc: David Airlie
Cc: Daniel Vetter
Cc: dri-devel@lists.freedesktop.org
Cc: Philipp Stanner
Cc: # v5.7+
---
 drivers/gpu/drm/scheduler/sched_entity.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 58c8161289fe..ae8be30472cd 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -133,8 +133,10 @@ void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
 {
 	WARN_ON(!num_sched_list || !sched_list);
 
+	spin_lock(&entity->rq_lock);
 	entity->sched_list = sched_list;
 	entity->num_sched_list = num_sched_list;
+	spin_unlock(&entity->rq_lock);
 }
 EXPORT_SYMBOL(drm_sched_entity_modify_sched);
 
-- 
2.46.0
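As a rough userspace illustration of why the pair needs a lock: a pointer and its element count only make sense together, so both the writer and any reader should touch them under the same lock. The names below (`struct entity`, `entity_modify_sched()`, `entity_last_sched()`) are hypothetical, an `int` array stands in for the scheduler list, and a pthread mutex stands in for the spinlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical entity: list pointer and length must change together. */
struct entity {
	pthread_mutex_t lock;
	const int *sched_list;
	size_t num_sched_list;
};

/* Writer: update both fields under the lock so no reader sees a mix. */
static void entity_modify_sched(struct entity *e, const int *list, size_t num)
{
	assert(list && num);	/* mirrors the WARN_ON in the patch */

	pthread_mutex_lock(&e->lock);
	e->sched_list = list;
	e->num_sched_list = num;
	pthread_mutex_unlock(&e->lock);
}

/*
 * Reader: index the list using a length read under the same lock.
 * Returns the last element, or -1 if no list has been set yet.
 */
static int entity_last_sched(struct entity *e)
{
	int last = -1;

	pthread_mutex_lock(&e->lock);
	if (e->num_sched_list)
		last = e->sched_list[e->num_sched_list - 1];
	pthread_mutex_unlock(&e->lock);

	return last;
}
```

Without the lock in the writer, a reader running between the two stores could pair a new, shorter list with the old, larger count and index past its end.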
[PATCH 7/8] drm/sched: Re-group and rename the entity run-queue lock
From: Tvrtko Ursulin Christian suggested to rename the lock and improve the documentation of what it protects. And to also re-order the structure members so all protected by the lock are together in a block. Signed-off-by: Tvrtko Ursulin Cc: Christian König Cc: Alex Deucher Cc: Luben Tuikov Cc: Matthew Brost Cc: Philipp Stanner --- drivers/gpu/drm/scheduler/sched_entity.c | 24 drivers/gpu/drm/scheduler/sched_main.c | 6 +++--- include/drm/gpu_scheduler.h | 15 --- 3 files changed, 23 insertions(+), 22 deletions(-) diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c index 2da677681291..b4c4f9923e0b 100644 --- a/drivers/gpu/drm/scheduler/sched_entity.c +++ b/drivers/gpu/drm/scheduler/sched_entity.c @@ -105,7 +105,7 @@ int drm_sched_entity_init(struct drm_sched_entity *entity, /* We start in an idle state. */ complete_all(&entity->entity_idle); - spin_lock_init(&entity->rq_lock); + spin_lock_init(&entity->lock); spsc_queue_init(&entity->job_queue); atomic_set(&entity->fence_seq, 0); @@ -133,10 +133,10 @@ void drm_sched_entity_modify_sched(struct drm_sched_entity *entity, { WARN_ON(!num_sched_list || !sched_list); - spin_lock(&entity->rq_lock); + spin_lock(&entity->lock); entity->sched_list = sched_list; entity->num_sched_list = num_sched_list; - spin_unlock(&entity->rq_lock); + spin_unlock(&entity->lock); } EXPORT_SYMBOL(drm_sched_entity_modify_sched); @@ -244,10 +244,10 @@ static void drm_sched_entity_kill(struct drm_sched_entity *entity) if (!entity->rq) return; - spin_lock(&entity->rq_lock); + spin_lock(&entity->lock); entity->stopped = true; drm_sched_rq_remove_entity(entity->rq, entity); - spin_unlock(&entity->rq_lock); + spin_unlock(&entity->lock); /* Make sure this entity is not used by the scheduler at the moment */ wait_for_completion(&entity->entity_idle); @@ -396,9 +396,9 @@ static void drm_sched_entity_wakeup(struct dma_fence *f, void drm_sched_entity_set_priority(struct drm_sched_entity *entity, enum 
drm_sched_priority priority) { - spin_lock(&entity->rq_lock); + spin_lock(&entity->lock); entity->priority = priority; - spin_unlock(&entity->rq_lock); + spin_unlock(&entity->lock); } EXPORT_SYMBOL(drm_sched_entity_set_priority); @@ -555,14 +555,14 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity) if (fence && !dma_fence_is_signaled(fence)) return; - spin_lock(&entity->rq_lock); + spin_lock(&entity->lock); sched = drm_sched_pick_best(entity->sched_list, entity->num_sched_list); rq = sched ? sched->sched_rq[entity->priority] : NULL; if (rq != entity->rq) { drm_sched_rq_remove_entity(entity->rq, entity); entity->rq = rq; } - spin_unlock(&entity->rq_lock); + spin_unlock(&entity->lock); if (entity->num_sched_list == 1) entity->sched_list = NULL; @@ -602,9 +602,9 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) struct drm_sched_rq *rq; /* Add the entity to the run queue */ - spin_lock(&entity->rq_lock); + spin_lock(&entity->lock); if (entity->stopped) { - spin_unlock(&entity->rq_lock); + spin_unlock(&entity->lock); DRM_ERROR("Trying to push to a killed entity\n"); return; @@ -619,7 +619,7 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) if (drm_sched_policy == DRM_SCHED_POLICY_FIFO) drm_sched_rq_update_fifo_locked(entity, submit_ts); - spin_unlock(&entity->rq_lock); + spin_unlock(&entity->lock); drm_sched_wakeup(sched, entity); } diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index 54c5fe7a7d1d..937e7d1cfc49 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -165,7 +165,7 @@ static inline void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *enti void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, ktime_t ts) { - lockdep_assert_held(&entity->rq_lock); + lockdep_assert_held(&entity->lock); spin_lock(&entity->rq->lock); @@ -186,9 +186,9 @@
[PATCH 6/8] drm/sched: Re-order struct drm_sched_rq members for clarity
From: Tvrtko Ursulin

Let's re-order the members to make it clear which are protected by the lock
and at the same time document it via kerneldoc.

Signed-off-by: Tvrtko Ursulin
Cc: Christian König
Cc: Alex Deucher
Cc: Luben Tuikov
Cc: Matthew Brost
Cc: Philipp Stanner
---
 include/drm/gpu_scheduler.h | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index a06753987d93..d4a3ba333568 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -243,10 +243,10 @@ struct drm_sched_entity {
 /**
  * struct drm_sched_rq - queue of entities to be scheduled.
  *
- * @lock: to modify the entities list.
  * @sched: the scheduler to which this rq belongs to.
- * @entities: list of the entities to be scheduled.
+ * @lock: protects the list, tree and current entity.
  * @current_entity: the entity which is to be scheduled.
+ * @entities: list of the entities to be scheduled.
  * @rb_tree_root: root of time based priory queue of entities for FIFO scheduling
  *
  * Run queue is a set of entities scheduling command submissions for
@@ -254,10 +254,12 @@ struct drm_sched_entity {
  * the next entity to emit commands from.
  */
 struct drm_sched_rq {
-	spinlock_t			lock;
 	struct drm_gpu_scheduler	*sched;
-	struct list_head		entities;
+
+	spinlock_t			lock;
+	/* Following members are protected by the @lock: */
 	struct drm_sched_entity		*current_entity;
+	struct list_head		entities;
 	struct rb_root_cached		rb_tree_root;
 };
 
-- 
2.46.0
[PATCH v2 0/8] DRM scheduler fixes, or not, or incorrect kind
From: Tvrtko Ursulin

Re-spin of the series from two days ago with review feedback addressed and some new patches added. Changelog is in individual patches but essentially new patches are renames and struct members re-ordering as discussed in v1, plus one more optimisation when I noticed we can save another spinlock re-lock cycle this time on rq->lock.

Cc: Christian König
Cc: Alex Deucher
Cc: Luben Tuikov
Cc: Matthew Brost
Cc: Philipp Stanner

Tvrtko Ursulin (8):
  drm/sched: Add locking to drm_sched_entity_modify_sched
  drm/sched: Always wake up correct scheduler in drm_sched_entity_push_job
  drm/sched: Always increment correct scheduler score
  drm/sched: Optimise drm_sched_entity_push_job
  drm/sched: Stop setting current entity in FIFO mode
  drm/sched: Re-order struct drm_sched_rq members for clarity
  drm/sched: Re-group and rename the entity run-queue lock
  drm/sched: Further optimise drm_sched_entity_push_job

 drivers/gpu/drm/scheduler/sched_entity.c | 40 +++--
 drivers/gpu/drm/scheduler/sched_main.c | 57 ++--
 include/drm/gpu_scheduler.h | 31 +++--
 3 files changed, 77 insertions(+), 51 deletions(-)

--
2.46.0
[PATCH 4/8] drm/sched: Optimise drm_sched_entity_push_job
From: Tvrtko Ursulin In FIFO mode we can avoid dropping the lock only to immediately re-acquire it by adding a new drm_sched_rq_update_fifo_locked() helper. Signed-off-by: Tvrtko Ursulin Cc: Christian König Cc: Alex Deucher Cc: Luben Tuikov Cc: Matthew Brost Cc: Philipp Stanner --- drivers/gpu/drm/scheduler/sched_entity.c | 5 +++-- drivers/gpu/drm/scheduler/sched_main.c | 21 ++--- include/drm/gpu_scheduler.h | 1 + 3 files changed, 18 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c index 6645a8524699..2da677681291 100644 --- a/drivers/gpu/drm/scheduler/sched_entity.c +++ b/drivers/gpu/drm/scheduler/sched_entity.c @@ -615,10 +615,11 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) atomic_inc(sched->score); drm_sched_rq_add_entity(rq, entity); - spin_unlock(&entity->rq_lock); if (drm_sched_policy == DRM_SCHED_POLICY_FIFO) - drm_sched_rq_update_fifo(entity, submit_ts); + drm_sched_rq_update_fifo_locked(entity, submit_ts); + + spin_unlock(&entity->rq_lock); drm_sched_wakeup(sched, entity); } diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index ab53ab486fe6..10abbcefe9d8 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -163,14 +163,10 @@ static inline void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *enti } } -void drm_sched_rq_update_fifo(struct drm_sched_entity *entity, ktime_t ts) +void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, ktime_t ts) { - /* -* Both locks need to be grabbed, one to protect from entity->rq change -* for entity from within concurrent drm_sched_entity_select_rq and the -* other to update the rb tree structure.
-*/ - spin_lock(&entity->rq_lock); + lockdep_assert_held(&entity->rq_lock); + spin_lock(&entity->rq->lock); drm_sched_rq_remove_fifo_locked(entity); @@ -181,6 +177,17 @@ void drm_sched_rq_update_fifo(struct drm_sched_entity *entity, ktime_t ts) drm_sched_entity_compare_before); spin_unlock(&entity->rq->lock); +} + +void drm_sched_rq_update_fifo(struct drm_sched_entity *entity, ktime_t ts) +{ + /* +* Both locks need to be grabbed, one to protect from entity->rq change +* for entity from within concurrent drm_sched_entity_select_rq and the +* other to update the rb tree structure. +*/ + spin_lock(&entity->rq_lock); + drm_sched_rq_update_fifo_locked(entity, ts); spin_unlock(&entity->rq_lock); } diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h index fe8edb917360..a06753987d93 100644 --- a/include/drm/gpu_scheduler.h +++ b/include/drm/gpu_scheduler.h @@ -594,6 +594,7 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq, struct drm_sched_entity *entity); void drm_sched_rq_update_fifo(struct drm_sched_entity *entity, ktime_t ts); +void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, ktime_t ts); int drm_sched_entity_init(struct drm_sched_entity *entity, enum drm_sched_priority priority, -- 2.46.0
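For readers less familiar with the "_locked" helper convention used in this patch, here is a minimal userspace sketch of the same idea. It is not the kernel code: struct toy_entity, the toy_* functions and the atomic_flag based spinlock are all hypothetical stand-ins, a single lock replaces the entity and run-queue locks, and a plain assert() plays the role of lockdep_assert_held().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for an entity; not the kernel struct. */
struct toy_entity {
	atomic_flag lock;     /* plays the role of entity->rq_lock */
	bool lock_held;       /* crude replacement for lockdep tracking */
	long long oldest_ts;  /* plays the role of the FIFO timestamp */
};

static void toy_entity_init(struct toy_entity *e)
{
	atomic_flag_clear(&e->lock);
	e->lock_held = false;
	e->oldest_ts = 0;
}

static void toy_lock(struct toy_entity *e)
{
	/* Spin until the flag was previously clear, i.e. we own the lock. */
	while (atomic_flag_test_and_set_explicit(&e->lock, memory_order_acquire))
		;
	e->lock_held = true;
}

static void toy_unlock(struct toy_entity *e)
{
	e->lock_held = false;
	atomic_flag_clear_explicit(&e->lock, memory_order_release);
}

/* Caller must hold the lock; assert() stands in for lockdep_assert_held(). */
static void toy_update_fifo_locked(struct toy_entity *e, long long ts)
{
	assert(e->lock_held);
	e->oldest_ts = ts;
}

/* Wrapper for callers which do not hold the lock yet. */
static void toy_update_fifo(struct toy_entity *e, long long ts)
{
	toy_lock(e);
	toy_update_fifo_locked(e, ts);
	toy_unlock(e);
}

/* Push path: one lock/unlock cycle covers both queueing and the update. */
static void toy_push_job(struct toy_entity *e, long long submit_ts)
{
	toy_lock(e);
	/* ... the entity would be added to the run queue here ... */
	toy_update_fifo_locked(e, submit_ts);
	toy_unlock(e);
}
```

The point of the split is that toy_push_job() no longer has to unlock and immediately re-lock around the timestamp update, while callers that do not already hold the lock can keep using the unlocked wrapper.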
[PATCH 5/8] drm/sched: Stop setting current entity in FIFO mode
From: Tvrtko Ursulin It does not seem there is a need to set the current entity in FIFO mode since it only serves as a "cursor" in round-robin mode. Even if scheduling mode is changed at runtime the change in behaviour is simply to restart from the first entity, instead of continuing in RR mode from where FIFO left it, and that sounds completely fine. Signed-off-by: Tvrtko Ursulin Cc: Christian König Cc: Alex Deucher Cc: Luben Tuikov Cc: Matthew Brost Cc: Philipp Stanner --- drivers/gpu/drm/scheduler/sched_main.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index 10abbcefe9d8..54c5fe7a7d1d 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -356,7 +356,6 @@ drm_sched_rq_select_entity_fifo(struct drm_gpu_scheduler *sched, return ERR_PTR(-ENOSPC); } - rq->current_entity = entity; reinit_completion(&entity->entity_idle); break; } -- 2.46.0
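To illustrate why the cursor only matters for round-robin, below is a simplified userspace sketch. It is an assumption-laden toy, not the scheduler code: struct toy_rq, the fixed-size arrays and both toy_select_* functions are invented for illustration, and the real selection paths also deal with locking, job readiness and credit limits.

```c
#include <assert.h>

#define TOY_NUM_ENTITIES 3

/* Hypothetical run queue; array indices stand in for list positions. */
struct toy_rq {
	int ready[TOY_NUM_ENTITIES];     /* non-zero if entity i has a job */
	long long ts[TOY_NUM_ENTITIES];  /* submit time of the oldest job */
	int current;                     /* RR cursor, -1 when unset */
};

/* Round-robin: resume after the cursor, wrap around, then move the cursor. */
static int toy_select_rr(struct toy_rq *rq)
{
	int start = rq->current + 1; /* an unset cursor starts from entity 0 */

	for (int n = 0; n < TOY_NUM_ENTITIES; n++) {
		int i = (start + n) % TOY_NUM_ENTITIES;

		if (rq->ready[i]) {
			rq->current = i;
			return i;
		}
	}
	return -1;
}

/* FIFO: pick the oldest ready entity; the cursor is never read or written. */
static int toy_select_fifo(const struct toy_rq *rq)
{
	int best = -1;

	for (int i = 0; i < TOY_NUM_ENTITIES; i++) {
		if (rq->ready[i] && (best < 0 || rq->ts[i] < rq->ts[best]))
			best = i;
	}
	return best;
}
```

In this toy, any number of FIFO selections leave the cursor untouched, so a later switch to round-robin simply starts from the first entity if the cursor was never set, which matches the behaviour change described in the commit message.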