On Mon, 12 Jan 2026 16:03:18 +0000
Steven Price <[email protected]> wrote:

> On 12/01/2026 14:17, Boris Brezillon wrote:
> > On Mon, 12 Jan 2026 12:06:17 +0000
> > Steven Price <[email protected]> wrote:
> >   
> >> On 09/01/2026 13:07, Boris Brezillon wrote:  
> >>> While drm_gem_shmem_object does most of the job we need it to do, the
> >>> way sub-resources (pages, sgt, vmap) are handled and their lifetimes
> >>> get in the way of BO reclaim. There have been attempts to address
> >>> that [1], but in the meantime, new gem_shmem users were introduced
> >>> (accel drivers), and some of them manually free some of these resources.
> >>> This makes things harder to control/sanitize/validate.
> >>>
> >>> Thomas Zimmermann is not a huge fan of enforcing lifetimes of sub-resources
> >>> and forcing gem_shmem users to go through new gem_shmem helpers when they
> >>> need manual control of some sort. I believe this is a dead end if
> >>> we don't force users to follow some stricter rules through carefully
> >>> designed helpers, because there will always be one user doing crazy things
> >>> with gem_shmem_object internals, which ends up tripping up the common
> >>> helpers when they are called.
> >>>
> >>> The consensus we reached was that we would be better off forking
> >>> gem_shmem in panthor. So here we are, parting ways with gem_shmem. The
> >>> current transition tries to minimize the changes, but there are still
> >>> some aspects that are different, the main one being that we no longer
> >>> have a pages_use_count, and pages stay around until the GEM object is
> >>> destroyed (or until it's evicted, once we've added a shrinker). The sgt
> >>> also no longer retains pages. This is loosely based on how msm does
> >>> things, by the way.
> >>
> >> From a reviewing perspective it's a little tricky trying to match up
> >> the implementation to shmem because of these changes. I don't know how
> >> difficult it would be to split the changes into a patch which literally
> >> copies (with renames) from shmem, followed by simplifying out the parts
> >> we don't want.
> > 
> > It's a bit annoying as the new implementation is not based on shmem at
> > all, but if you think it helps the review, I can try what you're
> > suggesting. I mean, I'm not convinced it will be significantly easier
> > to review with this extra step, since the new logic is different enough
> > (especially when it comes to resource refcounting) that it needs a
> > careful review anyway (which you started doing here).  
> 
> I wasn't sure how much you had originally based it on shmem. I noticed
> some comments were copied over and in some places it was easy to match
> up. But in others it's much less clear.
> 
> If you haven't actually started from a direct copy of shmem then it's
> probably not going to be much clearer doing that as an extra step. It's
> just in places it looked like you had.

The reason both look similar has more to do with the fact that they
both use shmem for their memory allocation than with one being a copy
of the other. That's not to say I didn't pick bits and pieces here and
there (including comments), but it didn't start as a full copy followed
by incremental modifications.

> >>  
> >>> +	}
> >>> +
> >>> +	return 0;
> >>> +}
> >>> +
> >>> +static int panthor_gem_backing_pin_locked(struct panthor_gem_object *bo)
> >>> +{
> >>> +	int ret;
> >>> +
> >>> +	dma_resv_assert_held(bo->base.resv);
> >>> +	drm_WARN_ON_ONCE(bo->base.dev, drm_gem_is_imported(&bo->base));
> >>> +
> >>> +	if (refcount_inc_not_zero(&bo->backing.pin_count))
> >>> +		return 0;
> >>> +
> >>> +	ret = panthor_gem_backing_get_pages_locked(bo);
> >>> +	if (!ret)
> >>> +		refcount_set(&bo->backing.pin_count, 1);
> >>> +
> >>> +	return ret;
> >>> +}
> >>> +
> >>> +static void panthor_gem_backing_unpin_locked(struct panthor_gem_object *bo)
> >>> +{
> >>> +	dma_resv_assert_held(bo->base.resv);
> >>> +	drm_WARN_ON_ONCE(bo->base.dev, drm_gem_is_imported(&bo->base));
> >>> +
> >>> +	/* We don't release anything when pin_count drops to zero.
> >>> +	 * Pages stay there until an explicit cleanup is requested.
> >>> +	 */
> >>> +	if (!refcount_dec_not_one(&bo->backing.pin_count))
> >>> +		refcount_set(&bo->backing.pin_count, 0);
> >>
> >> Why not just refcount_dec()?  
> > 
> > Because refcount_dec() complains when it's passed a value that's less
> > than 2. The rationale being that you need to do something special
> > (release resources) when you reach zero. In our case we don't, because
> > pages are lazily reclaimed, so we just set the counter back to zero.
> 
> Ah, yes I'd misread the "old <= 1" check as "old < 1". Hmm, I dislike it
> because it's breaking the atomicity - if another thread does an increment
> between the two operations then we lose a reference.

I don't think we do, because any 0 <-> 1 transition needs to happen
with the resv lock held (see the dma_resv_assert_held() in both
panthor_gem_backing_unpin_locked() and
panthor_gem_backing_pin_locked()).
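
To make that concrete, the pattern is roughly this (simplified sketch,
not the actual panthor code):

	/* Every pin_count transition happens under the resv lock (both
	 * _locked() helpers assert it), so the failed
	 * refcount_dec_not_one() + refcount_set(0) pair in unpin can't
	 * interleave with an increment from the pin path.
	 */
	dma_resv_lock(bo->base.resv, NULL);
	panthor_gem_backing_pin_locked(bo);	/* 0 -> 1, or N -> N + 1 */
	dma_resv_unlock(bo->base.resv);

	...

	dma_resv_lock(bo->base.resv, NULL);
	panthor_gem_backing_unpin_locked(bo);	/* 1 -> 0 via set, else N -> N - 1 */
	dma_resv_unlock(bo->base.resv);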

> 
> It does make me think that perhaps the refcount APIs are not designed
> for this case and perhaps we should just use atomics directly.

I think it's the lazy/deferred put_pages() that makes it look weird,
but for the rest, refcount_t looks like the right tool (the !locked
variants and even _pin_locked() look sane).
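
For reference, a !locked variant would presumably just wrap the locked
one, something like this (hypothetical sketch, this function isn't part
of the quoted patch):

	static int panthor_gem_backing_pin(struct panthor_gem_object *bo)
	{
		int ret;

		/* Take the resv lock so all pin_count transitions stay
		 * serialized, then defer to the _locked() helper.
		 */
		ret = dma_resv_lock(bo->base.resv, NULL);
		if (ret)
			return ret;

		ret = panthor_gem_backing_pin_locked(bo);
		dma_resv_unlock(bo->base.resv);

		return ret;
	}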
