On Thu, 2026-03-05 at 09:16 -0300, Maíra Canal wrote:
> Hi Iago and Tvrtko,
> 
> Thanks for the reviews!
> 
> On 05/03/26 08:21, Iago Toral wrote:
> > On Thu, 2026-03-05 at 10:50 +0000, Tvrtko Ursulin wrote:
> > > 
> > > On 05/03/2026 10:35, Iago Toral wrote:
> > > > On Thu, 2026-03-05 at 10:25 +0000, Tvrtko Ursulin wrote:
> > > > > 
> > > > > On 05/03/2026 10:18, Iago Toral wrote:
> > > > > > On Thu, 2026-03-05 at 09:34 +0000, Tvrtko Ursulin wrote:
> > > > > > > 
> > > > > > > On 05/03/2026 09:15, Iago Toral wrote:
> > > > > > > > On Tue, 2026-02-17 at 09:18 -0300, Maíra Canal wrote:
> > > > > > > > > From: Tvrtko Ursulin <[email protected]>
> > > > > > > > > 
> > > > > > > > > To remove the file_priv NULL-ing dance needed to check
> > > > > > > > > if the file descriptor is open, move the per-fd reset
> > > > > > > > > counter into v3d_stats, which is heap-allocated and
> > > > > > > > > refcounted, outliving the fd as long as jobs reference
> > > > > > > > > it.
> > > > > > > > > 
> > > > > > > > > This change allows the removal of the last `queue_lock`
> > > > > > > > > usage to protect `job->file_priv` and avoids possible
> > > > > > > > > NULL ptr dereference issues due to lifetime mismatches.
> > > > > > > > > 
> > > > > > > > > Also, to simplify locking, replace both the global and
> > > > > > > > > per-fd locked reset counters with atomics.
> > > > > > > > > 
> > > > > > > > > Signed-off-by: Tvrtko Ursulin <[email protected]>
> > > > > > > > > Co-developed-by: Maíra Canal <[email protected]>
> > > > > > > > > Signed-off-by: Maíra Canal <[email protected]>
> > > > > > > > > ---
> > > > > > > > >  drivers/gpu/drm/v3d/v3d_drv.c   | 20 ++++----------------
> > > > > > > > >  drivers/gpu/drm/v3d/v3d_drv.h   | 14 ++++----------
> > > > > > > > >  drivers/gpu/drm/v3d/v3d_sched.c |  9 ++-------
> > > > > > > > >  3 files changed, 10 insertions(+), 33 deletions(-)
> > > > > > > > > 
> > > > > > > > > diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
> > > > > > > > > index aafb402c6ac3118a57df9fc0a0d21d35d48e3b2c..4e77f4808145df21746ff4b7058089d0d161e3fc 100644
> > > > > > > > > --- a/drivers/gpu/drm/v3d/v3d_drv.c
> > > > > > > > > +++ b/drivers/gpu/drm/v3d/v3d_drv.c
> > > > > > > > > @@ -110,13 +110,13 @@ static int v3d_get_param_ioctl(struct drm_device *dev, void *data,
> > > > > > > > >  		args->value = !!drm_gem_get_huge_mnt(dev);
> > > > > > > > >  		return 0;
> > > > > > > > >  	case DRM_V3D_PARAM_GLOBAL_RESET_COUNTER:
> > > > > > > > > -		mutex_lock(&v3d->reset_lock);
> > > > > > > > > -		args->value = v3d->reset_counter;
> > > > > > > > > -		mutex_unlock(&v3d->reset_lock);
> > > > > > > > > +		args->value = atomic_read(&v3d->reset_counter);
> > > > > > > > 
> > > > > > > > Don't we still need to take the reset lock here?
> > > > > > > > Otherwise there would be a chance that we read the
> > > > > > > > counter while a reset is in flight, no?
> > > > > > > 
> > > > > > > I don't see that it would make a difference but maybe I am
> > > > > > > not seeing your concern. It uses atomic_t so the increment
> > > > > > > versus read is fine. Are you maybe saying the v3d ABI
> > > > > > > guarantees reset is 100% done (so not in progress, for some
> > > > > > > definition of progress, because hardware reset is done by
> > > > > > > then, only re-submit and re-start of the software state is
> > > > > > > pending) if userspace observes an increased global reset
> > > > > > > counter? That would be surprising and I don't see how it
> > > > > > > could make a practical difference, but perhaps it could be
> > > > > > > mitigated by moving the atomic_inc to the end of
> > > > > > > v3d_gpu_reset_for_timeout(). Or by still taking the lock as
> > > > > > > you say.
> > > > > > 
> > > > > > My concern is just that it is possible for the query and the
> > > > > > reset to race, and I think it would make sense for the
> > > > > > counter query to include in-flight resets (since what apps
> > > > > > really care about is whether a GPU reset happened, not
> > > > > > whether the reset process completed).
> > > > > 
> > > > > Then there is no problem, I think. Mutex lock or not, in
> > > > > neither case is it guaranteed that a reset is not in progress
> > > > > at the time of the ioctl. Even if the ioctl does not return an
> > > > > increased counter, perhaps the reset handler is already
> > > > > running but hasn't grabbed the mutex yet.
> > > > > 
> > > > > 
> > > > 
> > > > That's true, but then I wonder: what is the rationale for still
> > > > taking the lock when resolving the
> > > > DRM_V3D_PARAM_CONTEXT_RESET_COUNTER query?
> > > 
> > > Good question, and I asked myself the same this morning. For full
> > > disclosure, I wrote this patch back in Sep '25, so between then
> > > and now I forgot a thing or two.
> > > 
> > > In the latest local branch that I can find I had it without the
> > > mutex even for DRM_V3D_PARAM_CONTEXT_RESET_COUNTER. Maíra, was
> > > there a newer version somewhere which I forgot about?
> > > 
> > > A mutex would make sense if there was any chance for paired jobs
> > > across two engines to get reset at the same time. I think Maíra
> > > was explaining to me that could be a possibility, maybe with some
> > > future rework. Unless I am confusing things.
> > > 
> > > In any case, in the current upstream v3d it indeed looks to be
> > > safe with no mutex.
> > 
> > Ok, thanks for checking the history here! Let's see if Maíra has
> > anything to add to this so we have the full picture, and then we
> > can decide if we can also drop the remaining lock (and I guess in
> > that case the reset mutex too).
> 
> Initially, I was trying to make sure we had some consistency between
> queues, so that the result for the user is a reliable snapshot from
> all queues. So, for example, let's say we are iterating the loop and
> have reached q=3 (CSD) when the BIN queue reset counter is
> incremented. In such a scenario, the user would get outdated
> information.

I am not sure that we strictly need to handle this scenario, but I am
okay with keeping the lock for it. Maybe we can add a small comment
explaining the rationale. With that, patch 5 is also:
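For the record, the consistency argument can be illustrated with a
small user-space C11 sketch. This is not v3d code: NUM_QUEUES,
record_reset() and snapshot_resets() are made-up names. The point is
only that, with the lock held across the whole loop, a concurrent
reset cannot land between two per-queue reads, so the returned values
form one coherent snapshot.

```c
/*
 * User-space illustration (not v3d code) of per-queue reset counters
 * summed under a mutex. Holding reset_lock across the loop means a
 * concurrent record_reset() cannot interleave mid-snapshot.
 */
#include <pthread.h>
#include <stdatomic.h>

#define NUM_QUEUES 4 /* hypothetical: BIN, RENDER, TFU, CSD */

static atomic_uint queue_reset_counter[NUM_QUEUES];
static pthread_mutex_t reset_lock = PTHREAD_MUTEX_INITIALIZER;

/* Timeout-handler side: bump one queue's counter under the lock. */
static void record_reset(int q)
{
	pthread_mutex_lock(&reset_lock);
	atomic_fetch_add(&queue_reset_counter[q], 1);
	pthread_mutex_unlock(&reset_lock);
}

/*
 * Query side: with the lock held, no reset can land mid-iteration,
 * so the per-queue values are one consistent snapshot.
 */
static unsigned int snapshot_resets(unsigned int out[NUM_QUEUES])
{
	unsigned int total = 0;

	pthread_mutex_lock(&reset_lock);
	for (int q = 0; q < NUM_QUEUES; q++) {
		out[q] = atomic_load(&queue_reset_counter[q]);
		total += out[q];
	}
	pthread_mutex_unlock(&reset_lock);
	return total;
}
```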

Reviewed-by: Iago Toral Quiroga <[email protected]>


> 
> If we take the lock, this scenario wouldn't happen, as the timeout
> hook wouldn't be able to take the lock. Feel free to correct me if
> this train of thought is mistaken (or if you believe this level of
> consistency is not needed).
> 
> About removing the reset mutex, I don't believe it's possible, as we
> need it to make sure that two queues are not resetting at the same
> time. Also, I'll use this mutex (and the reset counter) in a future
> patch to address redundant resets that happen when two queues are
> competing for the `reset_lock`.

Makes sense, thanks Maíra.
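As a side note on the atomic_read() question earlier in the thread,
here is a user-space C11 sketch (again not v3d code; the function
names are made up) of why the lock-free global counter is fine: a read
of a single atomic word can never be torn, so the only thing the mutex
ever added on the read side was ordering relative to an in-flight
reset, which the ioctl cannot promise anyway.

```c
/*
 * User-space illustration (not v3d code) of a lock-free global reset
 * counter. Increment and read are both single atomic operations, so
 * no mutex is needed for correctness of the value itself.
 */
#include <stdatomic.h>

static atomic_uint global_reset_counter;

/* Corresponds to the atomic_inc in the reset path discussed above. */
static inline void note_global_reset(void)
{
	atomic_fetch_add_explicit(&global_reset_counter, 1,
				  memory_order_relaxed);
}

/* Corresponds to the DRM_V3D_PARAM_GLOBAL_RESET_COUNTER query side. */
static inline unsigned int read_global_resets(void)
{
	return atomic_load_explicit(&global_reset_counter,
				    memory_order_relaxed);
}
```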

Iago

> 
> Best regards,
> - Maíra
> 
> 
> 
