Hi Luben,

I compared the bad jobs from a failed IB test with the ones that cause a TDR, 
and I think the main difference is whether the job is submitted via drm_sched 
or not. In simple test cases the patch does not seem to incorrectly signal 
fences that shouldn't be signaled. We may indeed need heavier tests, but so 
far, based on static analysis, I haven't found the case you mentioned. There 
is another case that uses direct job submission during reset, but it happens 
in recover_vram, which runs after the pre_asic reset, so I think it won't be 
affected.

I'll move these lines into a new function as you suggested and send a v2 patch.

Regards,
Yubiao Wang

-----Original Message-----
From: Tuikov, Luben <luben.tui...@amd.com> 
Sent: Wednesday, March 8, 2023 7:22 AM
To: Koenig, Christian <christian.koe...@amd.com>; Wang, YuBiao 
<yubiao.w...@amd.com>; amd-gfx@lists.freedesktop.org
Cc: Chen, Horace <horace.c...@amd.com>; Deucher, Alexander 
<alexander.deuc...@amd.com>; Zhang, Hawking <hawking.zh...@amd.com>; Liu, Monk 
<monk....@amd.com>; Xu, Feifei <feifei...@amd.com>; Wang, Yang(Kevin) 
<kevinyang.w...@amd.com>
Subject: Re: [PATCH] drm/amdgpu: Force signal hw_fences that are embedded in 
non-sched jobs

On 2023-03-07 15:36, Luben Tuikov wrote:
> +                     job = container_of(old, struct amdgpu_job, hw_fence);
> +                     if (!job->base.s_fence && !dma_fence_is_signaled(old))
> +                             dma_fence_signal(old);

Thinking about this more, is !job->base.s_fence condition here enough to mean 
"non-sched jobs like ib_test"?

I feel that it is a bit overloaded here--could we have this condition 
satisfied, yet we can't willy-nilly signal the fence here?
--
Regards,
Luben
