On 10/31/25 14:53, Alex Deucher wrote:
> On Fri, Oct 31, 2025 at 4:40 AM Christian König
> <[email protected]> wrote:
>>
>> On 10/27/25 23:02, Alex Deucher wrote:
>>> If we don't end up initializing the fences, free them when
>>> we free the job.
>>>
>>> v2: take a reference to the fences if we emit them
>>>
>>> Fixes: db36632ea51e ("drm/amdgpu: clean up and unify hw fence handling")
>>> Reviewed-by: Jesse Zhang <[email protected]> (v1)
>>> Signed-off-by: Alex Deucher <[email protected]>
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 2 ++
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 18 ++++++++++++++++++
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 ++
>>> 3 files changed, 22 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>>> index 39229ece83f83..0596114377600 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>>> @@ -302,6 +302,8 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned int num_ibs,
>>> return r;
>>> }
>>> *f = &af->base;
>>> + /* get a ref for the job */
>>> + dma_fence_get(*f);
>>
>> I think it would be better to set the fence inside the job to NULL as soon
>> as it is consumed/initialized.
>
> We need the pointer for job timeout handling.
I don't think that is true. During a timeout we should have
job->s_fence->parent for the HW fence.
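Roughly something like this in the timeout path (untested sketch, just to
illustrate the idea):

	struct dma_fence *hw_fence = job->base.s_fence->parent;

	/* parent is the fence returned from run_job(), i.e. the HW
	 * fence, so the job doesn't need to keep its own reference
	 * just for the timeout path.
	 */
	if (hw_fence && !dma_fence_is_signaled(hw_fence))
		dma_fence_wait(hw_fence, false);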
But even if we go down that route here, you only grab a reference to the
hw_fence, but not to the hw_vm_fence.
That looks broken to me.
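To illustrate what I suggested instead: clear the pointers as soon as the
fences are emitted, e.g. in amdgpu_ib_schedule() (untested sketch):

	*f = &af->base;
	/* the emitted fence is now owned by the submission, not the job */
	job->hw_fence = NULL;

and the same for hw_vm_fence in amdgpu_vm_flush(). The free paths then
reduce to unconditional kfree() calls, since kfree(NULL) is a no-op:

	kfree(job->hw_fence);
	kfree(job->hw_vm_fence);
	kfree(job);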
Christian.
>
> Alex
>
>>
>>>
>>> if (ring->funcs->insert_end)
>>> ring->funcs->insert_end(ring);
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>> index 55c7e104d5ca0..dc970f5fe601b 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>> @@ -295,6 +295,15 @@ static void amdgpu_job_free_cb(struct drm_sched_job *s_job)
>>>
>>> amdgpu_sync_free(&job->explicit_sync);
>>>
>>> + if (job->hw_fence->base.ops)
>>> + dma_fence_put(&job->hw_fence->base);
>>> + else
>>> + kfree(job->hw_fence);
>>> + if (job->hw_vm_fence->base.ops)
>>> + dma_fence_put(&job->hw_vm_fence->base);
>>> + else
>>> + kfree(job->hw_vm_fence);
>>> +
>>
>> This way that here can just be a kfree(..).
>>
>> Regards,
>> Christian.
>>
>>> kfree(job);
>>> }
>>>
>>> @@ -324,6 +333,15 @@ void amdgpu_job_free(struct amdgpu_job *job)
>>> if (job->gang_submit != &job->base.s_fence->scheduled)
>>> dma_fence_put(job->gang_submit);
>>>
>>> + if (job->hw_fence->base.ops)
>>> + dma_fence_put(&job->hw_fence->base);
>>> + else
>>> + kfree(job->hw_fence);
>>> + if (job->hw_vm_fence->base.ops)
>>> + dma_fence_put(&job->hw_vm_fence->base);
>>> + else
>>> + kfree(job->hw_vm_fence);
>>> +
>>> kfree(job);
>>> }
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> index db66b4232de02..f8c67840f446f 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> @@ -845,6 +845,8 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, struct amdgpu_job *job,
>>> if (r)
>>> return r;
>>> fence = &job->hw_vm_fence->base;
>>> + /* get a ref for the job */
>>> + dma_fence_get(fence);
>>> }
>>>
>>> if (vm_flush_needed) {
>>