On 18.09.25 11:13, David Rosca wrote:
> 
> On 18. 09. 25 9:47, Tvrtko Ursulin wrote:
>>
>> On 17/09/2025 11:54, David Rosca wrote:
>>> Hi,
>>>
>>> On 17. 09. 25 12:15, Tvrtko Ursulin wrote:
>>>>
>>>> Hi,
>>>>
>>>> On 17/09/2025 10:59, David Rosca wrote:
>>>>> drm_syncobj_find_fence returns a fence chain for timeline syncobjs.
>>>>> The scheduler expects normal fences as job dependencies to be able to
>>>>> determine whether the fences come from the same entity or scheduler
>>>>> and skip waiting on them.
>>>>> With a fence chain as the job dependency, the fence will always be
>>>>> waited on, forcing a CPU round-trip before starting the job.
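(As a rough, non-authoritative illustration of the mechanism described above: walk
the container returned by drm_syncobj_find_fence() and add its component fences
individually. add_unwrapped_deps is a hypothetical helper; amdgpu_sync_fence() and
struct amdgpu_cs_parser are taken from the diff further down, error handling is
trimmed.)

#include <linux/dma-fence-unwrap.h>

static int add_unwrapped_deps(struct amdgpu_cs_parser *p,
			      struct dma_fence *fence)
{
	struct dma_fence_unwrap iter;
	struct dma_fence *f;
	int r = 0;

	/*
	 * For a timeline syncobj point, 'fence' is a dma_fence_chain; iterate
	 * its components so the scheduler sees plain fences and can skip
	 * waiting on those coming from its own entity/scheduler.
	 */
	dma_fence_unwrap_for_each(f, &iter, fence) {
		r = amdgpu_sync_fence(&p->sync, f, GFP_KERNEL);
		if (r)
			break;
	}
	return r;
}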
>>>>
>>>> Interesting! I was sending patches to fix this differently a year or so 
>>>> ago, by making the scheduler use a dma_fence_array for tracking dependencies 
>>>> and relying on dma_fence_unwrap_merge to unwrap, coalesce contexts and 
>>>> keep only the latest fence for each. But I did not have a good story to 
>>>> show for which use cases it helped, so I am curious if you could share 
>>>> which scenario you found gets an improvement from your patch?
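(For reference, a minimal sketch of that coalescing step; coalesce_deps is a
hypothetical helper and the snippet is illustrative, not code from those patches.)

#include <linux/dma-fence-unwrap.h>

static struct dma_fence *coalesce_deps(struct dma_fence *a, struct dma_fence *b)
{
	/*
	 * dma_fence_unwrap_merge() flattens fence chains/arrays and keeps only
	 * the latest fence per context; it returns a new reference, or NULL on
	 * allocation failure.
	 */
	return dma_fence_unwrap_merge(a, b);
}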
>>>
>>> The scenario I am trying to fix is very simple to reproduce with Vulkan 
>>> when using a timeline semaphore to sync submissions on the same queue 
>>> (e.g. each submit waiting on the value signaled by the previous submit). 
>>> I have noticed this issue with the FFmpeg Vulkan video code, but it will 
>>> happen with any Vulkan app using this pattern.
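(To make the pattern concrete, a rough sketch of such a submission chain;
hypothetical code rather than the actual FFmpeg code, assuming Vulkan 1.2 timeline
semaphores are enabled and 'queue', 'cmds[]' and the timeline semaphore already
exist.)

#include <vulkan/vulkan.h>

static void submit_chained(VkQueue queue, const VkCommandBuffer *cmds,
			   VkSemaphore timeline, uint32_t count)
{
	VkPipelineStageFlags wait_stage = VK_PIPELINE_STAGE_ALL_COMMANDS_BIT;

	for (uint32_t i = 0; i < count; i++) {
		/* Submit i waits for the timeline value signaled by submit i - 1. */
		uint64_t wait_value = i;
		uint64_t signal_value = i + 1;

		VkTimelineSemaphoreSubmitInfo timeline_info = {
			.sType = VK_STRUCTURE_TYPE_TIMELINE_SEMAPHORE_SUBMIT_INFO,
			.waitSemaphoreValueCount = i ? 1 : 0,
			.pWaitSemaphoreValues = &wait_value,
			.signalSemaphoreValueCount = 1,
			.pSignalSemaphoreValues = &signal_value,
		};
		VkSubmitInfo submit = {
			.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
			.pNext = &timeline_info,
			.waitSemaphoreCount = i ? 1 : 0,
			.pWaitSemaphores = &timeline,
			.pWaitDstStageMask = &wait_stage,
			.commandBufferCount = 1,
			.pCommandBuffers = &cmds[i],
			.signalSemaphoreCount = 1,
			.pSignalSemaphores = &timeline,
		};

		vkQueueSubmit(queue, 1, &submit, VK_NULL_HANDLE);
	}
}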
>>
>> Still out of curiosity, is the performance loss from the CPU round-trip 
>> something you are able to measure?
> 
> Yes, I'm seeing a measurable improvement with this patch applied. I have tested 
> three cases (decoding three different videos) and the results were +2%, +4% 
> and +6%.

Mhm, I'm nearly 100% sure that we unwrapped the fences when we added timeline 
support. That must have been broken at some point without anybody noticing it.

Potentially when we moved the dependency handling from driver-specific code into 
common code. Probably a good idea to add a Cc: stable tag to the patch before 
pushing it.

Going to review it on the original.

Regards,
Christian.

> 
> Regards,
> David
> 
>>
>> Btw your patch is I think fine, so:
>>
>> Reviewed-by: Tvrtko Ursulin <[email protected]>
>>
>> But you will probably need Christian to ack it.
>>
>> Regards,
>>
>> Tvrtko
>>
>>>>> Signed-off-by: David Rosca <[email protected]>
>>>>> ---
>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 10 ++++++++--
>>>>>   1 file changed, 8 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>>> index 2e93d570153c..779c11227a53 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>>> @@ -29,6 +29,7 @@
>>>>>   #include <linux/pagemap.h>
>>>>>   #include <linux/sync_file.h>
>>>>>   #include <linux/dma-buf.h>
>>>>> +#include <linux/dma-fence-unwrap.h>
>>>>>
>>>>>   #include <drm/amdgpu_drm.h>
>>>>>   #include <drm/drm_syncobj.h>
>>>>> @@ -450,7 +451,8 @@ static int amdgpu_syncobj_lookup_and_add(struct amdgpu_cs_parser *p,
>>>>>                        uint32_t handle, u64 point,
>>>>>                        u64 flags)
>>>>>   {
>>>>> -    struct dma_fence *fence;
>>>>> +    struct dma_fence *fence, *f;
>>>>> +    struct dma_fence_unwrap iter;
>>>>>       int r;
>>>>>
>>>>>       r = drm_syncobj_find_fence(p->filp, handle, point, flags, &fence);
>>>>> @@ -460,7 +462,11 @@ static int amdgpu_syncobj_lookup_and_add(struct amdgpu_cs_parser *p,
>>>>>           return r;
>>>>>       }
>>>>>
>>>>> -    r = amdgpu_sync_fence(&p->sync, fence, GFP_KERNEL);
>>>>> +    dma_fence_unwrap_for_each(f, &iter, fence) {
>>>>> +        if (!r)
>>>>> +            r = amdgpu_sync_fence(&p->sync, f, GFP_KERNEL);
>>>>> +    }
>>>>> +
>>>>>       dma_fence_put(fence);
>>>>>       return r;
>>>>>   }
>>>>
>>
