Hi Philipp,

On Wed, 12 Nov 2025 13:38:00 +0100
Philipp Stanner <[email protected]> wrote:
> On Wed, 2025-11-12 at 13:17 +0100, Boris Brezillon wrote:
> > From: Ashley Smith <[email protected]>
> > 
> > The timeout logic provided by drm_sched leads to races when we try
> > to suspend it while the drm_sched workqueue queues more jobs. Let's
> > overhaul the timeout handling in panthor to have our own delayed work
> > that's resumed/suspended when a group is resumed/suspended. When an
> > actual timeout occurs, we call drm_sched_fault() to report it
> > through drm_sched, still. But otherwise, the drm_sched timeout is
> > disabled (set to MAX_SCHEDULE_TIMEOUT), which leaves us in control of
> > how we protect modifications on the timer.
> > 
> > One issue seems to be when we call drm_sched_suspend_timeout() from
> > both queue_run_job() and tick_work() which could lead to races due to
> > drm_sched_suspend_timeout() not having a lock. Another issue seems to
> > be in queue_run_job() if the group is not scheduled, we suspend the
> > timeout again which undoes what drm_sched_job_begin() did when calling
> > drm_sched_start_timeout(). So the timeout does not reset when a job
> > is finished.
> > 
> > […]
> > 
> > +
> > +static void
> > +queue_reset_timeout_locked(struct panthor_queue *queue)
> > +{
> > +	lockdep_assert_held(&queue->fence_ctx.lock);
> > +
> > +	if (queue->timeout.remaining != MAX_SCHEDULE_TIMEOUT) {
> > +		mod_delayed_work(queue->scheduler.timeout_wq,
> 
> Here you are interfering with the scheduler's internals again, don't
> you. I think we agreed that we don't want to do such things anymore,
> didn't we?

We're not really touching drm_gpu_scheduler's internals here, we're
just retrieving the timeout workqueue we passed at init time:
panthor_queue::timeout is panthor internals, not drm_sched internals.

This being said, I agree we should use ptdev->reset.wq instead of
retrieving the timeout workqueue through the drm_gpu_scheduler object.

Just to be clear, the goal of this patch is to bypass the
drm_gpu_scheduler timeout logic entirely, so that we can have our own
thing that's not racy (see below).

> 
> You could write a proper drm_sched API function which serves your
> usecase.

It's not really a lack of support for our use case that drives this
change, but rather the fact that the current helpers are racy for
drivers that have a 1:1 entity:sched relationship, with queues that
can be scheduled out behind drm_gpu_scheduler's back.

> 
> Or could maybe DRM_GPU_SCHED_STAT_NO_HANG be returned from your driver
> in case an interrupt actually fires?

I don't think it helps, see below.

> 
> > +				 &queue->timeout.work,
> > +				 msecs_to_jiffies(JOB_TIMEOUT_MS));
> > +	}
> > +}
> > +
> > +static bool
> > +group_can_run(struct panthor_group *group)
> > +{
> > +	return group->state != PANTHOR_CS_GROUP_TERMINATED &&
> > +	       group->state != PANTHOR_CS_GROUP_UNKNOWN_STATE &&
> > +	       !group->destroyed && group->fatal_queues == 0 &&
> > +	       !group->timedout;
> > +}
> > +
> > 
> > […]
> > 
> > +queue_suspend_timeout(struct panthor_queue *queue)
> > +{
> > +	spin_lock(&queue->fence_ctx.lock);
> > +	queue_suspend_timeout_locked(queue);
> > +	spin_unlock(&queue->fence_ctx.lock);
> > +}
> > +
> > +static void
> > +queue_resume_timeout(struct panthor_queue *queue)
> > +{
> > +	spin_lock(&queue->fence_ctx.lock);
> > +
> > +	if (queue_timeout_is_suspended(queue)) {
> > +		mod_delayed_work(queue->scheduler.timeout_wq,
> 
> There is drm_sched_resume_timeout(). Why can it not be used?

Because it's racy.
I don't remember all the details, but IIRC, it had to do with job
insertion calling drm_sched_start_timeout() while we're calling
drm_sched_resume_timeout() from cs_slot_reset_locked(). We tried to
make it work, but we gave up at some point and went for a
driver-specific timer, because ultimately what we want is a per-queue
timeout that we can pause/resume without drm_sched interfering when
new jobs are queued to our ring buffers: we resume when the execution
context (AKA group) is scheduled in, and we pause when this execution
context is scheduled out.

That's the very reason we set drm_gpu_scheduler::timeout to
MAX_SCHEDULE_TIMEOUT at init time (AKA timeout disabled) and never
touch it again. When our driver-internal timer expires, we forward the
information to the drm_sched layer by calling drm_sched_fault() (see
the sketch below).

> 
> > +				 &queue->timeout.work,
> > +				 queue->timeout.remaining);
> > +
> > +		queue->timeout.remaining = MAX_SCHEDULE_TIMEOUT;
> > +	}
> > +
> > +	spin_unlock(&queue->fence_ctx.lock);
> > +}
> > +
> > 
> > […]
> > 
> > @@ -3270,6 +3379,11 @@ queue_timedout_job(struct drm_sched_job *sched_job)
> > 
> >  	queue_start(queue);
> > 
> > +	/* We already flagged the queue as faulty, make sure we don't get
> > +	 * called again.
> > +	 */
> > +	queue->scheduler.timeout = MAX_SCHEDULE_TIMEOUT;
> > +
> >  	return DRM_GPU_SCHED_STAT_RESET;
> 
> DRM_GPU_SCHED_STAT_NO_HANG instead of just modifying the scheduler's
> internal data??

The queue->scheduler.timeout = MAX_SCHEDULE_TIMEOUT; assignment is a
leftover from a previous version. We shouldn't have to modify it here,
because the timeout is initialized to MAX_SCHEDULE_TIMEOUT and never
touched again.
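
For the record, the expiry path boils down to something like this
(rough sketch, the handler name here is made up and the details might
differ slightly from what the actual patch does):

static void
queue_timeout_work(struct work_struct *work)
{
	struct panthor_queue *queue = container_of(work, struct panthor_queue,
						   timeout.work.work);

	/* Our driver-owned delayed work fired: report the fault to
	 * drm_sched so that it runs its timedout_job() path. drm_sched's
	 * own timer stays disabled (MAX_SCHEDULE_TIMEOUT, set at init and
	 * never touched again).
	 */
	drm_sched_fault(&queue->scheduler);
}

Suspending/resuming the timeout is then just a matter of cancelling or
mod_delayed_work()'ing that delayed work under queue->fence_ctx.lock,
which is entirely under our control.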

Regards,

Boris