On 18. 09. 25 11:13, David Rosca wrote:
On 18. 09. 25 9:47, Tvrtko Ursulin wrote:
On 17/09/2025 11:54, David Rosca wrote:
Hi,
On 17. 09. 25 12:15, Tvrtko Ursulin wrote:
Hi,
On 17/09/2025 10:59, David Rosca wrote:
drm_syncobj_find_fence returns fence chain for timeline syncobjs.
Scheduler expects normal fences as job dependencies to be able to
determine whether the fences come from the same entity or sched
and skip waiting on them.
With fence chain as job dependency, the fence will always be
waited on forcing CPU round-trip before starting the job.
Interesting! I was sending patches to fix this differently last
year or so, by making the scheduler use dma_fence_array for
tracking dependencies and relying on dma_fence_unwrap_merge to
unwrap, coalesce contexts and only keep the latest fence for each.
But I did not have a good story to show for which use cases it
helped. So I am curious if you could share which scenario you found
gets an improvement from your patch?
The scenario I am trying to fix is very simple to reproduce with
Vulkan when using timeline semaphore to sync submissions on the same
queue (eg. each submit waiting on value signaled by previous
submit). I have noticed this issue with FFmpeg Vulkan video code,
but it will happen with any Vulkan app using this pattern.
Still out of curiosity, is the performance loss from the CPU
round-trip something you are able to measure?
Yes, I'm seeing measurable improvement with this patch applied. I have
tested three cases (decoding three different videos) now and the
result was +2%, +4% and +6%.
Regards,
David
Cc Christian
Btw your patch is I think fine, so:
Reviewed-by: Tvrtko Ursulin <[email protected]>
But you will probably need Christian to ack it.
Regards,
Tvrtko
Signed-off-by: David Rosca <[email protected]>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/
drm/amd/amdgpu/amdgpu_cs.c
index 2e93d570153c..779c11227a53 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -29,6 +29,7 @@
#include <linux/pagemap.h>
#include <linux/sync_file.h>
#include <linux/dma-buf.h>
+#include <linux/dma-fence-unwrap.h>
#include <drm/amdgpu_drm.h>
#include <drm/drm_syncobj.h>
@@ -450,7 +451,8 @@ static int
amdgpu_syncobj_lookup_and_add(struct amdgpu_cs_parser *p,
uint32_t handle, u64 point,
u64 flags)
{
- struct dma_fence *fence;
+ struct dma_fence *fence, *f;
+ struct dma_fence_unwrap iter;
int r;
r = drm_syncobj_find_fence(p->filp, handle, point, flags,
&fence);
@@ -460,7 +462,11 @@ static int
amdgpu_syncobj_lookup_and_add(struct amdgpu_cs_parser *p,
return r;
}
- r = amdgpu_sync_fence(&p->sync, fence, GFP_KERNEL);
+ dma_fence_unwrap_for_each(f, &iter, fence) {
+ if (!r)
+ r = amdgpu_sync_fence(&p->sync, f, GFP_KERNEL);
+ }
+
dma_fence_put(fence);
return r;
}