Re: [PATCH v2] drm/sched: Further optimise drm_sched_entity_push_job

2024-09-26 Thread Philipp Stanner
On Tue, 2024-09-24 at 14:02 +0200, Christian König wrote:
> Am 24.09.24 um 11:58 schrieb Tvrtko Ursulin:
> > 
> > On 24/09/2024 10:45, Tvrtko Ursulin wrote:
> > > 
> > > On 24/09/2024 09:20, Christian König wrote:
> > > > Am 16.09.24 um 19:30 schrieb Tvrtko Ursulin:
> > > > > From: Tvrtko Ursulin 
> > > > > 
> > > > > Having removed one re-lock cycle on the entity->lock in a
> > > > > patch titled
> > > > > "drm/sched: Optimise drm_sched_entity_push_job", with only a
> > > > > tiny bit
> > > > > larger refactoring we can do the same optimisation on the rq-
> > > > > >lock.
> > > > > (Currently both drm_sched_rq_add_entity() and
> > > > > drm_sched_rq_update_fifo_locked() take and release the same
> > > > > lock.)
> > > > > 
> > > > > To achieve this we make drm_sched_rq_update_fifo_locked() and
> > > > > drm_sched_rq_add_entity() expect the rq->lock to be held.
> > > > > 
> > > > > We also align drm_sched_rq_update_fifo_locked(),
> > > > > drm_sched_rq_add_entity() and
> > > > > drm_sched_rq_remove_fifo_locked() function signatures, by
> > > > > adding rq 
> > > > > as a
> > > > > parameter to the latter.
> > > > > 
> > > > > v2:
> > > > >   * Fix after rebase of the series.
> > > > >   * Avoid naming inconsistency between
> > > > > drm_sched_rq_add/remove. 
> > > > > (Christian)
> > > > > 
> > > > > Signed-off-by: Tvrtko Ursulin 
> > > > > Cc: Christian König 
> > > > > Cc: Alex Deucher 
> > > > > Cc: Luben Tuikov 
> > > > > Cc: Matthew Brost 
> > > > > Cc: Philipp Stanner 
> > > > 
> > > > Reviewed-by: Christian König 
> > > 
> > > Thanks!
> > > 
> > > Are you okay to pull into drm-misc-next or should we do some more
> > > testing on this?
> > > 
> > And/or should I resend the series once more in its entirety so
> > > this 
> > > v2 is not a reply-to to the original?
> > 
> > I have to respin for the drm_sched_wakeup fix that landed.
> 
> If I'm to push the series to drm-misc-next then please send it to
> me once more.
> 
> On the other hand we should now have two maintainers for that.

Yup, we will pick up these responsibilities soon. Danilo and I have
been at a conference recently and I'm out of office soon for a bit, but
you can expect me / us to take over that work in early autumn.

Regards,
P.

> 
> Regards,
> Christian.
> 
> > 
> > Regards,
> > 
> > Tvrtko
> > 
> > > 
> > > Regards,
> > > 
> > > Tvrtko
> > > 
> > > > 
> > > > > ---
> > > > >   drivers/gpu/drm/scheduler/sched_entity.c | 12 --
> > > > >   drivers/gpu/drm/scheduler/sched_main.c   | 29 
> > > > > 
> > > > >   include/drm/gpu_scheduler.h  |  3 ++-
> > > > >   3 files changed, 26 insertions(+), 18 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
> > > > > b/drivers/gpu/drm/scheduler/sched_entity.c
> > > > > index d982cebc6bee..8ace1f1ea66b 100644
> > > > > --- a/drivers/gpu/drm/scheduler/sched_entity.c
> > > > > +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> > > > > @@ -515,9 +515,14 @@ struct drm_sched_job 
> > > > > *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
> > > > >   next = 
> > > > > to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
> > > > >   if (next) {
> > > > > +    struct drm_sched_rq *rq;
> > > > > +
> > > > >   spin_lock(&entity->lock);
> > > > > -    drm_sched_rq_update_fifo_locked(entity,
> > > > > +    rq = entity->rq;
> > > > > +    spin_lock(&rq->lock);
> > > > > +    drm_sched_rq_update_fifo_locked(entity, rq,
> > > > >   next->submit_ts);
> > > > > +    spin_unlock(&rq->lock);
> > > > >   spin_unlock(&entity->lock);
>

Re: [PATCH v3] drm/scheduler: Improve documentation

2024-09-26 Thread Philipp Stanner
On Tue, 2024-09-24 at 12:03 +0200, Simona Vetter wrote:
> On Sun, Sep 22, 2024 at 05:29:36PM +, Lin, Shuicheng wrote:
> > Hi all,
> > I am not familiar with the process yet. To get it merged, should I
> > add more mailing lists, or how should I notify the maintainers?
> > Thanks in advance for your guidance.
> 
> drm/sched is a bit undermaintained, things unfortunately fall through
> the cracks. I've picked this up and merged it to drm-misc-next, thanks a
> lot.
> -Sima

Thx!

Feel free to ping Danilo and me in the future. We might be unavailable
at times individually, but in general we will take care of it.

P.

> 
> > 
> > Best Regards
> > Shuicheng
> > 
> > > -Original Message-
> > > From: Lin, Shuicheng 
> > > Sent: Tuesday, September 17, 2024 7:48 AM
> > > To: dri-devel@lists.freedesktop.org
> > > Cc: Lin, Shuicheng ; Philipp Stanner
> > > 
> > > Subject: [PATCH v3] drm/scheduler: Improve documentation
> > > 
> > > Function drm_sched_entity_push_job() doesn't have a return value,
> > > remove the
> > > return value description for it.
> > > Correct several other typo errors.
> > > 
> > > v2 (Philipp):
> > > - more correction with related comments.
> > > 
> > > Signed-off-by: Shuicheng Lin 
> > > Reviewed-by: Philipp Stanner 
> > > ---
> > >  drivers/gpu/drm/scheduler/sched_entity.c | 10 --
> > >  drivers/gpu/drm/scheduler/sched_main.c   |  4 ++--
> > >  include/drm/gpu_scheduler.h  | 12 ++--
> > >  include/linux/dma-resv.h |  6 +++---
> > >  4 files changed, 15 insertions(+), 17 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/scheduler/sched_entity.c
> > > b/drivers/gpu/drm/scheduler/sched_entity.c
> > > index 58c8161289fe..ffa3e765f5db 100644
> > > --- a/drivers/gpu/drm/scheduler/sched_entity.c
> > > +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> > > @@ -51,7 +51,7 @@
> > >   * drm_sched_entity_set_priority(). For changing the set of
> > > schedulers
> > >   * @sched_list at runtime see drm_sched_entity_modify_sched().
> > >   *
> > > - * An entity is cleaned up by callind drm_sched_entity_fini().
> > > See also
> > > + * An entity is cleaned up by calling drm_sched_entity_fini().
> > > See also
> > >   * drm_sched_entity_destroy().
> > >   *
> > >   * Returns 0 on success or a negative error code on failure.
> > > @@ -370,8 +370,8 @@ static void drm_sched_entity_clear_dep(struct
> > > dma_fence *f,  }
> > > 
> > >  /*
> > > - * drm_sched_entity_clear_dep - callback to clear the entities
> > > dependency and
> > > - * wake up scheduler
> > > + * drm_sched_entity_wakeup - callback to clear the entity's
> > > dependency
> > > + and
> > > + * wake up the scheduler
> > >   */
> > >  static void drm_sched_entity_wakeup(struct dma_fence *f,
> > >       struct dma_fence_cb *cb)
> > > @@ -389,7 +389,7 @@ static void drm_sched_entity_wakeup(struct
> > > dma_fence
> > > *f,
> > >   * @entity: scheduler entity
> > >   * @priority: scheduler priority
> > >   *
> > > - * Update the priority of runqueus used for the entity.
> > > + * Update the priority of runqueues used for the entity.
> > >   */
> > >  void drm_sched_entity_set_priority(struct drm_sched_entity
> > > *entity,
> > >      enum drm_sched_priority
> > > priority) @@ -574,8
> > > +574,6 @@ void drm_sched_entity_select_rq(struct drm_sched_entity
> > > *entity)
> > >   * fence sequence number this function should be called with
> > > drm_sched_job_arm()
> > >   * under common lock for the struct drm_sched_entity that was
> > > set up for
> > >   * @sched_job in drm_sched_job_init().
> > > - *
> > > - * Returns 0 for success, negative error code otherwise.
> > >   */
> > >  void drm_sched_entity_push_job(struct drm_sched_job *sched_job) 
> > > { diff --git
> > > a/drivers/gpu/drm/scheduler/sched_main.c
> > > b/drivers/gpu/drm/scheduler/sched_main.c
> > > index ab53ab486fe6..cadf1662bc01 100644
> > > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > > @@ -41,7 +41,7 @@
> > >   * 4. Entities themselves maintain a queue of jobs t

Re: [PATCH v2] drm/sched: Further optimise drm_sched_entity_push_job

2024-09-26 Thread Philipp Stanner
On Mon, 2024-09-23 at 15:35 +0100, Tvrtko Ursulin wrote:
> 
> Ping Christian and Philipp - reasonably happy with v2? I think it's
> the 
> only unreviewed patch from the series.

Howdy,

Sorry for the delay, I had been traveling.

I have a few nits below regarding the commit message. Apart from those,
I'm OK with it, thanks for your work :)

> 
> Regards,
> 
> Tvrtko
> 
> On 16/09/2024 18:30, Tvrtko Ursulin wrote:
> > From: Tvrtko Ursulin 
> > 
> > Having removed one re-lock cycle on the entity->lock in a patch
> > titled
> > "drm/sched: Optimise drm_sched_entity_push_job", 
> > with only a tiny bit
> > larger refactoring we can do the same optimisation 

Well, the commit message does not state which optimization that is. One
would have to look for the previous patch, for which you apparently
cannot provide a commit ID yet because it's not in Big Boss's branch.

In this case I am for including a sentence about what is being
optimized, also because

> > on the rq->lock.
> > (Currently both drm_sched_rq_add_entity() and
> > drm_sched_rq_update_fifo_locked() take and release the same lock.)
> > 
> > To achieve this we make drm_sched_rq_update_fifo_locked() and

it's not clear what the "this" that's being achieved is.

> > drm_sched_rq_add_entity() expect the rq->lock to be held.
> > 
> > We also align drm_sched_rq_update_fifo_locked(),
> > drm_sched_rq_add_entity() and
> > drm_sched_rq_remove_fifo_locked() function signatures, by adding rq
> > as a
> > parameter to the latter.
> > 
> > v2:
> >   * Fix after rebase of the series.
> >   * Avoid naming inconsistency between drm_sched_rq_add/remove.
> > (Christian)
> > 
> > Signed-off-by: Tvrtko Ursulin 

Reviewed-by: Philipp Stanner 

> > Cc: Christian König 
> > Cc: Alex Deucher 
> > Cc: Luben Tuikov 
> > Cc: Matthew Brost 
> > Cc: Philipp Stanner 
> > ---
> >   drivers/gpu/drm/scheduler/sched_entity.c | 12 --
> >   drivers/gpu/drm/scheduler/sched_main.c   | 29 ---
> > -
> >   include/drm/gpu_scheduler.h  |  3 ++-
> >   3 files changed, 26 insertions(+), 18 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/scheduler/sched_entity.c
> > b/drivers/gpu/drm/scheduler/sched_entity.c
> > index d982cebc6bee..8ace1f1ea66b 100644
> > --- a/drivers/gpu/drm/scheduler/sched_entity.c
> > +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> > @@ -515,9 +515,14 @@ struct drm_sched_job
> > *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
> >   
> >     next = to_drm_sched_job(spsc_queue_peek(&entity-
> > >job_queue));
> >     if (next) {
> > +   struct drm_sched_rq *rq;
> > +
> >     spin_lock(&entity->lock);
> > -   drm_sched_rq_update_fifo_locked(entity,
> > +   rq = entity->rq;
> > +   spin_lock(&rq->lock);
> > +   drm_sched_rq_update_fifo_locked(entity,
> > rq,
> >     next-
> > >submit_ts);
> > +   spin_unlock(&rq->lock);
> >     spin_unlock(&entity->lock);
> >     }
> >     }
> > @@ -618,11 +623,14 @@ void drm_sched_entity_push_job(struct
> > drm_sched_job *sched_job)
> >     sched = rq->sched;
> >   
> >     atomic_inc(sched->score);
> > +
> > +   spin_lock(&rq->lock);
> >     drm_sched_rq_add_entity(rq, entity);
> >   
> >     if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
> > -   drm_sched_rq_update_fifo_locked(entity,
> > submit_ts);
> > +   drm_sched_rq_update_fifo_locked(entity,
> > rq, submit_ts);
> >   
> > +   spin_unlock(&rq->lock);
> >     spin_unlock(&entity->lock);
> >   
> >     drm_sched_wakeup(sched, entity);
> > diff --git a/drivers/gpu/drm/scheduler/sched_main.c
> > b/drivers/gpu/drm/scheduler/sched_main.c
> > index 18a952f73ecb..5c83fb92bb89 100644
> > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > @@ -153,17 +153,18 @@ static __always_inline bool
> > drm_sched_entity_compare_before(struct rb_node *a,
> >     return ktime_before(ent_a->oldest_job_waiting, ent_b-
> > >oldest_job_waiting);
> >   }
> >   
> > -static inline void drm_sched_rq_remove_fifo_locked(

Re: [PATCH 1/2] drm/sched: add WARN_ON and BUG_ON to drm_sched_fini

2024-09-25 Thread Philipp Stanner
On Tue, 2024-09-24 at 13:18 +0200, Simona Vetter wrote:
> On Mon, Sep 23, 2024 at 05:24:10PM +0200, Christian König wrote:
> > Am 20.09.24 um 15:26 schrieb Philipp Stanner:
> > > On Fri, 2024-09-20 at 12:33 +0200, Christian König wrote:
> > > > Am 20.09.24 um 10:57 schrieb Philipp Stanner:
> > > > > On Wed, 2024-09-18 at 15:39 +0200, Christian König wrote:
> > > > > > Tearing down the scheduler with jobs still on the pending
> > > > > > list
> > > > > > can
> > > > > > lead to use after free issues. Add a warning if drivers try
> > > > > > to
> > > > > > destroy a scheduler which still has work pushed to the HW.
> > > > > Did you have time yet to look into my proposed waitque-
> > > > > solution?
> > > > I don't remember seeing anything. What have I missed?
> > > https://lore.kernel.org/all/20240903094446.29797-2-pstan...@redhat.com/
> > 
> > Mhm, I didn't get that in my inbox for some reason.
> > 
> > Interesting approach, I'm just not sure if we can or should wait in
> > drm_sched_fini().

We do agree that jobs still pending when drm_sched_fini() starts is
always a bug, right?

If so, what are the disadvantages of waiting in drm_sched_fini()? We
could block buggy drivers, as I see it. That wouldn't be good, but
could then be fixed on the drivers' side.

> > 
> > Probably better to make that a separate function, something like
> > drm_sched_flush() or similar.

We could do that. Such a function could then be called by drivers which
are not sure whether all jobs are done before they start tearing down.

> 
> Yeah I don't think we should smash this into drm_sched_fini
> unconditionally. I think conceptually there are about three cases:
> 
> - Ringbuffer schedules. Probably want everything as-is, because
>   drm_sched_fini is called long after all the entities are gone in
>   drm_device cleanup.
> 
> - fw scheduler hardware with preemption support. There we probably
> want to
>   nuke the context by setting the tdr timeout to zero (or maybe just
> as
>   long as context preemption takes to be efficient), and relying on
> the
>   normal gpu reset flow to handle things. drm_sched_entity_flush
> kinda
>   does this, except not really and it's a lot more focused on the
>   ringbuffer context. So maybe we want a new drm_sched_entity_kill.
> 
>   For this case calling drm_sched_fini() after the 1:1 entity is gone
>   should not find any lingering jobs, it would actually be a bug
> somewhere if
>   there's a job lingering. Maybe a sanity check that not only are no
>   jobs lingering, but also that no entities are left would be good here?

The check for lingering jobs is in Christian's patch here, IIUC.
At which position would you imagine the check for the entity being
performed?

> 
> - fw scheduler without preemption support. There we kinda need the
>   drm_sched_flush, except blocking in fops->close is not great. So
> instead
>   I think the following is better:
>   1. drm_sched_entity_stopped, which only stops new submissions (for
>   paranoia) but doesn't tear down the entity

Who would call that function?
Drivers using it voluntarily could just as well stop accepting new jobs
from userspace to their entities, couldn't they?

>   2. drm_dev_get
>   3. launch a worker which does a) drm_sched_flush (or
>   drm_sched_entity_flush or whatever we call it) b)
> drm_sched_entity_fini
>   + drm_sched_fini c) drm_dev_put
> 
>   Note that semantically this implements the refcount in the other
> path
>   from Phillip:
> 
>  
> https://lore.kernel.org/all/20240903094531.29893-2-pstan...@redhat.com/
>   
>   Except it doesn't impose refcount on everyone else who doesn't need
> it,
>   and it doesn't even impose refcounting on drivers that do need it
>   because we use drm_sched_flush and a worker to achieve the same.

I indeed wasn't happy with the refcount approach for that reason,
agreed.

> 
> Essentially helper functions for the common use-cases instead of
> trying to
> solve them all by putting drm_sched_flush as a potentially very
> blocking
> function into drm_sched_fini.

I'm still not able to see why its blocking would be undesired – as far
as I can see, it is only invoked on driver teardown, so not during
active operation. Teardown doesn't happen that often, and it can (if
implemented correctly) only block until the driver's code has signaled
the last fences. If that doesn't happen, the block would reveal a bug.

But don't get me wrong: I don't want to *push* this solution. I just
want to understand when

Re: [PATCH 1/2] drm/sched: add WARN_ON and BUG_ON to drm_sched_fini

2024-09-20 Thread Philipp Stanner
On Fri, 2024-09-20 at 12:33 +0200, Christian König wrote:
> Am 20.09.24 um 10:57 schrieb Philipp Stanner:
> > On Wed, 2024-09-18 at 15:39 +0200, Christian König wrote:
> > > Tearing down the scheduler with jobs still on the pending list
> > > can
> > > lead to use after free issues. Add a warning if drivers try to
> > > destroy a scheduler which still has work pushed to the HW.
> > Did you have time yet to look into my proposed waitque-solution?
> 
> I don't remember seeing anything. What have I missed?

https://lore.kernel.org/all/20240903094446.29797-2-pstan...@redhat.com/

> 
> > 
> > > When there are still entities with jobs the situation is even
> > > worse
> > > since the dma_fences for those jobs can never signal, we can just
> > > choose between potentially locking up core memory management and
> > > random memory corruption. When drivers really mess it up that
> > > well, let them run into a BUG_ON().
> > > 
> > > Signed-off-by: Christian König 
> > > ---
> > >   drivers/gpu/drm/scheduler/sched_main.c | 19 ++-
> > >   1 file changed, 18 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c
> > > b/drivers/gpu/drm/scheduler/sched_main.c
> > > index f093616fe53c..8a46fab5cdc8 100644
> > > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > > @@ -1333,17 +1333,34 @@ void drm_sched_fini(struct
> > > drm_gpu_scheduler
> > > *sched)
> > I agree with Sima that it should first be documented in the
> > function's
> > docstring what the user is expected to have done before calling the
> > function.
> 
> Good point, going to update the documentation as well.

Cool thing, thx.
It would be great if everything non-trivial that needs to be done
before _fini() were mentioned.

One could also think about providing a hint at how the driver can do
that. AFAICS the only way for the driver to ensure that is to maintain
its own, separate list of submitted jobs.

P.

> 
> Thanks,
> Christian.
> 
> > 
> > P.
> > 
> > >   
> > >   drm_sched_wqueue_stop(sched);
> > >   
> > > + /*
> > > +  * Tearing down the scheduler while there are still
> > > unprocessed jobs can
> > > +  * lead to use after free issues in the scheduler fence.
> > > +  */
> > > + WARN_ON(!list_empty(&sched->pending_list));
> > > +
> > >   for (i = DRM_SCHED_PRIORITY_KERNEL; i < sched->num_rqs;
> > > i++)
> > > {
> > >   struct drm_sched_rq *rq = sched->sched_rq[i];
> > >   
> > >   spin_lock(&rq->lock);
> > > - list_for_each_entry(s_entity, &rq->entities,
> > > list)
> > > + list_for_each_entry(s_entity, &rq->entities,
> > > list) {
> > > + /*
> > > +  * The justification for this BUG_ON()
> > > is
> > > that tearing
> > > +  * down the scheduler while jobs are
> > > pending
> > > leaves
> > > +  * dma_fences unsignaled. Since we have
> > > dependencies
> > > +  * from the core memory management to
> > > eventually signal
> > > +  * dma_fences this can trivially lead to
> > > a
> > > system wide
> > > +  * stop because of a locked up memory
> > > management.
> > > +  */
> > > + BUG_ON(spsc_queue_count(&s_entity-
> > > > job_queue));
> > > +
> > >   /*
> > >    * Prevents reinsertion and marks
> > > job_queue
> > > as idle,
> > >    * it will removed from rq in
> > > drm_sched_entity_fini
> > >    * eventually
> > >    */
> > >   s_entity->stopped = true;
> > > + }
> > >   spin_unlock(&rq->lock);
> > >   kfree(sched->sched_rq[i]);
> > >   }
> 



Re: [PATCH 1/2] drm/sched: add WARN_ON and BUG_ON to drm_sched_fini

2024-09-20 Thread Philipp Stanner
On Wed, 2024-09-18 at 15:39 +0200, Christian König wrote:
> Tearing down the scheduler with jobs still on the pending list can
> lead to use after free issues. Add a warning if drivers try to
> destroy a scheduler which still has work pushed to the HW.

Did you have time yet to look into my proposed waitque-solution?

> 
> When there are still entities with jobs the situation is even worse
> since the dma_fences for those jobs can never signal, we can just
> choose between potentially locking up core memory management and
> random memory corruption. When drivers really mess it up that well,
> let them run into a BUG_ON().
> 
> Signed-off-by: Christian König 
> ---
>  drivers/gpu/drm/scheduler/sched_main.c | 19 ++-
>  1 file changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
> b/drivers/gpu/drm/scheduler/sched_main.c
> index f093616fe53c..8a46fab5cdc8 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -1333,17 +1333,34 @@ void drm_sched_fini(struct drm_gpu_scheduler
> *sched)

I agree with Sima that it should first be documented in the function's
docstring what the user is expected to have done before calling the
function.

P.

>  
>   drm_sched_wqueue_stop(sched);
>  
> + /*
> +  * Tearing down the scheduler while there are still
> unprocessed jobs can
> +  * lead to use after free issues in the scheduler fence.
> +  */
> + WARN_ON(!list_empty(&sched->pending_list));
> +
>   for (i = DRM_SCHED_PRIORITY_KERNEL; i < sched->num_rqs; i++)
> {
>   struct drm_sched_rq *rq = sched->sched_rq[i];
>  
>   spin_lock(&rq->lock);
> - list_for_each_entry(s_entity, &rq->entities, list)
> + list_for_each_entry(s_entity, &rq->entities, list) {
> + /*
> +  * The justification for this BUG_ON() is
> that tearing
> +  * down the scheduler while jobs are pending
> leaves
> +  * dma_fences unsignaled. Since we have
> dependencies
> +  * from the core memory management to
> eventually signal
> +  * dma_fences this can trivially lead to a
> system wide
> +  * stop because of a locked up memory
> management.
> +  */
> + BUG_ON(spsc_queue_count(&s_entity-
> >job_queue));
> +
>   /*
>    * Prevents reinsertion and marks job_queue
> as idle,
>    * it will removed from rq in
> drm_sched_entity_fini
>    * eventually
>    */
>   s_entity->stopped = true;
> + }
>   spin_unlock(&rq->lock);
>   kfree(sched->sched_rq[i]);
>   }



Re: [PATCH v2] drm/scheduler: Improve documentation

2024-09-17 Thread Philipp Stanner
On Mon, 2024-09-16 at 21:05 +, Shuicheng Lin wrote:
> Function drm_sched_entity_push_job() doesn't have return value,

Doesn't have *a* return value

> remove the return value description for it.
> Correct several other typo errors.
> 
> v2 (Philipp):
> - more correction with related comments.
> 
> Signed-off-by: Shuicheng Lin 

Except for the nit above, looks good to me:

Reviewed-by: Philipp Stanner 


Thx

> Cc: Philipp Stanner 
> ---
>  drivers/gpu/drm/scheduler/sched_entity.c | 10 --
>  drivers/gpu/drm/scheduler/sched_main.c   |  4 ++--
>  include/drm/gpu_scheduler.h  | 12 ++--
>  include/linux/dma-resv.h |  6 +++---
>  4 files changed, 15 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c
> b/drivers/gpu/drm/scheduler/sched_entity.c
> index 58c8161289fe..ffa3e765f5db 100644
> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> @@ -51,7 +51,7 @@
>   * drm_sched_entity_set_priority(). For changing the set of
> schedulers
>   * @sched_list at runtime see drm_sched_entity_modify_sched().
>   *
> - * An entity is cleaned up by callind drm_sched_entity_fini(). See
> also
> + * An entity is cleaned up by calling drm_sched_entity_fini(). See
> also
>   * drm_sched_entity_destroy().
>   *
>   * Returns 0 on success or a negative error code on failure.
> @@ -370,8 +370,8 @@ static void drm_sched_entity_clear_dep(struct
> dma_fence *f,
>  }
>  
>  /*
> - * drm_sched_entity_clear_dep - callback to clear the entities
> dependency and
> - * wake up scheduler
> + * drm_sched_entity_wakeup - callback to clear the entity's
> dependency and
> + * wake up the scheduler
>   */
>  static void drm_sched_entity_wakeup(struct dma_fence *f,
>       struct dma_fence_cb *cb)
> @@ -389,7 +389,7 @@ static void drm_sched_entity_wakeup(struct
> dma_fence *f,
>   * @entity: scheduler entity
>   * @priority: scheduler priority
>   *
> - * Update the priority of runqueus used for the entity.
> + * Update the priority of runqueues used for the entity.
>   */
>  void drm_sched_entity_set_priority(struct drm_sched_entity *entity,
>      enum drm_sched_priority priority)
> @@ -574,8 +574,6 @@ void drm_sched_entity_select_rq(struct
> drm_sched_entity *entity)
>   * fence sequence number this function should be called with
> drm_sched_job_arm()
>   * under common lock for the struct drm_sched_entity that was set up
> for
>   * @sched_job in drm_sched_job_init().
> - *
> - * Returns 0 for success, negative error code otherwise.
>   */
>  void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
>  {
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
> b/drivers/gpu/drm/scheduler/sched_main.c
> index ab53ab486fe6..cadf1662bc01 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -41,7 +41,7 @@
>   * 4. Entities themselves maintain a queue of jobs that will be
> scheduled on
>   *    the hardware.
>   *
> - * The jobs in a entity are always scheduled in the order that they
> were pushed.
> + * The jobs in an entity are always scheduled in the order in which
> they were pushed.
>   *
>   * Note that once a job was taken from the entities queue and pushed
> to the
>   * hardware, i.e. the pending queue, the entity must not be
> referenced anymore
> @@ -1339,7 +1339,7 @@ void drm_sched_fini(struct drm_gpu_scheduler
> *sched)
>   list_for_each_entry(s_entity, &rq->entities, list)
>   /*
>    * Prevents reinsertion and marks job_queue
> as idle,
> -  * it will removed from rq in
> drm_sched_entity_fini
> +  * it will be removed from the rq in
> drm_sched_entity_fini()
>    * eventually
>    */
>   s_entity->stopped = true;
> diff --git a/include/drm/gpu_scheduler.h
> b/include/drm/gpu_scheduler.h
> index fe8edb917360..ef23113451e4 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -33,11 +33,11 @@
>  #define MAX_WAIT_SCHED_ENTITY_Q_EMPTY msecs_to_jiffies(1000)
>  
>  /**
> - * DRM_SCHED_FENCE_DONT_PIPELINE - Prefent dependency pipelining
> + * DRM_SCHED_FENCE_DONT_PIPELINE - Prevent dependency pipelining
>   *
>   * Setting this flag on a scheduler fence prevents pipelining of
> jobs depending
>   * on this fence. In other words we always insert a full CPU round
> trip before
> - * dependen jobs are pushed to the hw queue.
> + 

[PATCH] MAINTAINERS: drm/sched: Add new maintainers

2024-09-16 Thread Philipp Stanner
DRM's GPU scheduler is arguably in need of more intensive maintenance.
Danilo and Philipp volunteer to help with the maintainership.

Signed-off-by: Philipp Stanner 
Cc: Christian König 
Cc: Luben Tuikov 
Cc: Matthew Brost 
Cc: Danilo Krummrich 
Cc: Tvrtko Ursulin 

---
 MAINTAINERS | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 10430778c998..fc2d8bf3ee74 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7710,6 +7710,8 @@ F:	drivers/gpu/drm/xlnx/
 DRM GPU SCHEDULER
 M: Luben Tuikov 
 M: Matthew Brost 
+M: Danilo Krummrich 
+M: Philipp Stanner 
 L: dri-devel@lists.freedesktop.org
 S: Maintained
 T: git https://gitlab.freedesktop.org/drm/misc/kernel.git
-- 
2.46.0



Re: [PATCH] drm/scheduler: correct comments relate to scheduler

2024-09-16 Thread Philipp Stanner
Hi,

I would call the commit "drm/scheduler: Improve documentation"

On Sun, 2024-09-15 at 15:52 +, Shuicheng Lin wrote:
> function drm_sched_entity_push_job doesn't have return value,

s/function/Function

It's also nice to always terminate a function's name with its
parenthesis: drm_sched_entity_push_job()

> remove the return value description for it.
> Correct several other typo errors.
> 
> Signed-off-by: Shuicheng Lin 
> ---
>  drivers/gpu/drm/scheduler/sched_entity.c |  8 +++-
>  drivers/gpu/drm/scheduler/sched_main.c   |  4 ++--
>  include/drm/gpu_scheduler.h  | 12 ++--
>  include/linux/dma-resv.h |  4 ++--
>  4 files changed, 13 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c
> b/drivers/gpu/drm/scheduler/sched_entity.c
> index 58c8161289fe..4d6a05fc35ca 100644
> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> @@ -51,7 +51,7 @@
>   * drm_sched_entity_set_priority(). For changing the set of
> schedulers
>   * @sched_list at runtime see drm_sched_entity_modify_sched().
>   *
> - * An entity is cleaned up by callind drm_sched_entity_fini(). See
> also
> + * An entity is cleaned up by calling drm_sched_entity_fini(). See
> also
>   * drm_sched_entity_destroy().
>   *
>   * Returns 0 on success or a negative error code on failure.
> @@ -370,7 +370,7 @@ static void drm_sched_entity_clear_dep(struct
> dma_fence *f,
>  }
>  
>  /*
> - * drm_sched_entity_clear_dep - callback to clear the entities
> dependency and
> + * drm_sched_entity_wakeup - callback to clear the entities
> dependency and

While you're at it:

s/entities dependency/entity's dependency

>   * wake up scheduler
>   */
>  static void drm_sched_entity_wakeup(struct dma_fence *f,
> @@ -389,7 +389,7 @@ static void drm_sched_entity_wakeup(struct
> dma_fence *f,
>   * @entity: scheduler entity
>   * @priority: scheduler priority
>   *
> - * Update the priority of runqueus used for the entity.
> + * Update the priority of runqueues used for the entity.
>   */
>  void drm_sched_entity_set_priority(struct drm_sched_entity *entity,
>      enum drm_sched_priority priority)
> @@ -574,8 +574,6 @@ void drm_sched_entity_select_rq(struct
> drm_sched_entity *entity)
>   * fence sequence number this function should be called with
> drm_sched_job_arm()
>   * under common lock for the struct drm_sched_entity that was set up
> for
>   * @sched_job in drm_sched_job_init().
> - *
> - * Returns 0 for success, negative error code otherwise.
>   */
>  void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
>  {
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
> b/drivers/gpu/drm/scheduler/sched_main.c
> index f093616fe53c..6e8c7651bd95 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -41,7 +41,7 @@
>   * 4. Entities themselves maintain a queue of jobs that will be
> scheduled on
>   *    the hardware.
>   *
> - * The jobs in a entity are always scheduled in the order that they
> were pushed.
> + * The jobs in an entity are always scheduled in the order that they
> were pushed.

"in the order in which they were ..."?

>   *
>   * Note that once a job was taken from the entities queue and pushed
> to the
>   * hardware, i.e. the pending queue, the entity must not be
> referenced anymore
> @@ -1340,7 +1340,7 @@ void drm_sched_fini(struct drm_gpu_scheduler
> *sched)
>   list_for_each_entry(s_entity, &rq->entities, list)
>   /*
>    * Prevents reinsertion and marks job_queue
> as idle,
> -  * it will removed from rq in
> drm_sched_entity_fini
> +  * it will be removed from rq in
> drm_sched_entity_fini

"from the rq"?

s/drm_sched_entity_fini/drm_sched_entity_fini()

>    * eventually
>    */
>   s_entity->stopped = true;
> diff --git a/include/drm/gpu_scheduler.h
> b/include/drm/gpu_scheduler.h
> index a8d19b10f9b8..9e1b12ca84b9 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -33,11 +33,11 @@
>  #define MAX_WAIT_SCHED_ENTITY_Q_EMPTY msecs_to_jiffies(1000)
>  
>  /**
> - * DRM_SCHED_FENCE_DONT_PIPELINE - Prefent dependency pipelining
> + * DRM_SCHED_FENCE_DONT_PIPELINE - Prevent dependency pipelining
>   *
>   * Setting this flag on a scheduler fence prevents pipelining of
> jobs depending
>   * on this fence. In other words we always insert a full CPU round
> trip before
> - * dependen jobs are pushed to the hw queue.
> + * dependent jobs are pushed to the hw queue.
>   */
>  #define DRM_SCHED_FENCE_DONT_PIPELINEDMA_FENCE_FLAG_USER_BITS
>  
> @@ -71,7 +71,7 @@ enum drm_sched_priority {
>   DRM_SCHED_PRIORITY_COUNT
>  };
>  
> -/* Used to chose between FIFO and RR jobs scheduling */
> +/* Used to choose between FIFO and RR jobs scheduling */


Re: [PATCH] drm/sched: Fix dynamic job-flow control race

2024-09-16 Thread Philipp Stanner
On Fri, 2024-09-13 at 09:53 -0700, Rob Clark wrote:
> From: Rob Clark 
> 
> Fixes a race condition reported here:
> https://github.com/AsahiLinux/linux/issues/309#issuecomment-2238968609

As Danilo suggested before, I'd put this in a Fixes: section at the
bottom and instead have a sentence here detailing what the race
consists of, i.e., who is racing with whom.

P.

> 
> The whole premise of lockless access to a single-producer-single-
> consumer queue is that there is just a single producer and single
> consumer.  That means we can't call drm_sched_can_queue() (which is
> about queueing more work to the hw, not to the spsc queue) from
> anywhere other than the consumer (wq).
> 
> This call in the producer is just an optimization to avoid scheduling
> the consuming worker if it cannot yet queue more work to the hw.  It
> is safe to drop this optimization to avoid the race condition.
> 
> Suggested-by: Asahi Lina 
> Fixes: a78422e9dff3 ("drm/sched: implement dynamic job-flow control")
> Signed-off-by: Rob Clark 
> ---
>  drivers/gpu/drm/scheduler/sched_main.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
> b/drivers/gpu/drm/scheduler/sched_main.c
> index ab53ab486fe6..1af1dbe757d5 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -1020,8 +1020,7 @@ EXPORT_SYMBOL(drm_sched_job_cleanup);
>  void drm_sched_wakeup(struct drm_gpu_scheduler *sched,
>     struct drm_sched_entity *entity)
>  {
> - if (drm_sched_can_queue(sched, entity))
> - drm_sched_run_job_queue(sched);
> + drm_sched_run_job_queue(sched);
>  }
>  
>  /**



Re: [PATCH 6/8] drm/sched: Re-order struct drm_sched_rq members for clarity

2024-09-16 Thread Philipp Stanner
On Fri, 2024-09-13 at 17:05 +0100, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin 
> 
> Current kerneldoc for struct drm_sched_rq incompletely documents what
> fields are protected by the lock.
> 
> This is not good because it is misleading.
> 
> Lets fix it by listing all the elements which are protected by the
> lock.
> 
> While at it, lets also re-order the members so all protected by the
> lock
> are in a single group.
> 
> v2:
>  * Refer variables by kerneldoc syntax, more verbose commit text.
> (Philipp)
> 
> Signed-off-by: Tvrtko Ursulin 
> Cc: Christian König 
> Cc: Alex Deucher 
> Cc: Luben Tuikov 
> Cc: Matthew Brost 
> Cc: Philipp Stanner 
> Reviewed-by: Christian König 

Looks good, thx

Reviewed-by: Philipp Stanner 


> ---
>  include/drm/gpu_scheduler.h | 10 ++
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/include/drm/gpu_scheduler.h
> b/include/drm/gpu_scheduler.h
> index 38465b78c7d5..2f58af00f792 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -243,10 +243,10 @@ struct drm_sched_entity {
>  /**
>   * struct drm_sched_rq - queue of entities to be scheduled.
>   *
> - * @lock: to modify the entities list.
>   * @sched: the scheduler to which this rq belongs to.
> - * @entities: list of the entities to be scheduled.
> + * @lock: protects @entities, @rb_tree_root and @current_entity.

nit: in case you'll provide a new version anyways you could consider
sorting these three to be congruent with the lines below.

Thank you!
P.


>   * @current_entity: the entity which is to be scheduled.
> + * @entities: list of the entities to be scheduled.
>   * @rb_tree_root: root of time based priory queue of entities for
> FIFO scheduling
>   *
>   * Run queue is a set of entities scheduling command submissions for
> @@ -254,10 +254,12 @@ struct drm_sched_entity {
>   * the next entity to emit commands from.
>   */
>  struct drm_sched_rq {
> - spinlock_t  lock;
>   struct drm_gpu_scheduler*sched;
> - struct list_headentities;
> +
> + spinlock_t  lock;
> + /* Following members are protected by the @lock: */
>   struct drm_sched_entity *current_entity;
> + struct list_headentities;
>   struct rb_root_cached   rb_tree_root;
>  };
>  



Re: [PATCH v2 1/2] drm/sched: memset() 'job' in drm_sched_job_init()

2024-09-13 Thread Philipp Stanner
On Fri, 2024-09-13 at 12:56 +0100, Tvrtko Ursulin wrote:
> 
> Hi,
> 
> On 28/08/2024 10:41, Philipp Stanner wrote:
> > drm_sched_job_init() has no control over how users allocate struct
> > drm_sched_job. Unfortunately, the function can also not set some
> > struct
> > members such as job->sched.
> 
> job->sched usage from within looks like a bug. But not related to the
> memset you add.
> 
> For this one something like this looks easiest for a start:
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
> b/drivers/gpu/drm/scheduler/sched_main.c
> index ab53ab486fe6..877113b01af2 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -788,7 +788,7 @@ int drm_sched_job_init(struct drm_sched_job *job,
>   * or worse--a blank screen--leave a trail in the
>   * logs, so this can be debugged easier.
>   */
> -   drm_err(job->sched, "%s: entity has no rq!\n",
> __func__);
> +   pr_err("%s: entity has no rq!\n", __func__);
>  return -ENOENT;
>  }
> 
> Fixes: 56e449603f0a ("drm/sched: Convert the GPU scheduler to
> variable 
> number of run-queues")
> Cc:  # v6.7+

Danilo and I already solved that:

https://lore.kernel.org/all/20240827074521.12828-2-pstan...@redhat.com/


> 
> > This could theoretically lead to UB by users dereferencing the
> > struct's
> > pointer members too early.
> 
> Hmm, if drm_sched_job_init() returned an error, callers should not 
> dereference anything. What was actually the issue you were debugging?

I was learning about the scheduler, wrote a dummy driver and had
awkward behavior. Turned out it was this pointer not being initialized.
I would have seen it immediately if it were NULL.

The actual issue was and is IMO that a function called
drm_sched_job_init() initializes the job. But it doesn't, it only
partially initializes it. Only after drm_sched_job_arm() has run are you
actually ready to go.

> 
> Adding a memset is I think not the best solution since it is very
> likely 
> redundant to someone doing a kzalloc in the first place.

It is redundant in most cases, but it is effectively free. I
measured the runtime with 1e6 jobs with and without memset and there
was no difference.


P.

> 
> Regards,
> 
> Tvrtko
> 
> > It is easier to debug such issues if these pointers are initialized
> > to
> > NULL, so dereferencing them causes a NULL pointer exception.
> > Accordingly, drm_sched_entity_init() does precisely that and
> > initializes
> > its struct with memset().
> > 
> > Initialize parameter "job" to 0 in drm_sched_job_init().
> > 
> > Signed-off-by: Philipp Stanner 
> > ---
> > No changes in v2.
> > ---
> >   drivers/gpu/drm/scheduler/sched_main.c | 8 
> >   1 file changed, 8 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/scheduler/sched_main.c
> > b/drivers/gpu/drm/scheduler/sched_main.c
> > index 356c30fa24a8..b0c8ad10b419 100644
> > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > @@ -806,6 +806,14 @@ int drm_sched_job_init(struct drm_sched_job
> > *job,
> >     return -EINVAL;
> >     }
> >   
> > +   /*
> > +* We don't know for sure how the user has allocated.
> > Thus, zero the
> > +* struct so that unallowed (i.e., too early) usage of
> > pointers that
> > +* this function does not set is guaranteed to lead to a
> > NULL pointer
> > +* exception instead of UB.
> > +*/
> > +   memset(job, 0, sizeof(*job));
> > +
> >     job->entity = entity;
> >     job->credits = credits;
> >     job->s_fence = drm_sched_fence_alloc(entity, owner);
> 



Re: [PATCH 8/8] drm/sched: Further optimise drm_sched_entity_push_job

2024-09-13 Thread Philipp Stanner
On Wed, 2024-09-11 at 13:22 +0100, Tvrtko Ursulin wrote:
> 
> On 10/09/2024 11:25, Philipp Stanner wrote:
> > On Mon, 2024-09-09 at 18:19 +0100, Tvrtko Ursulin wrote:
> > > From: Tvrtko Ursulin 
> > > 
> > > Having removed one re-lock cycle on the entity->lock in a patch
> > > titled
> > > "drm/sched: Optimise drm_sched_entity_push_job", with only a tiny
> > > bit
> > > larger refactoring we can do the same optimisation on the rq-
> > > >lock
> > > (Currently both drm_sched_rq_add_entity() and
> > > drm_sched_rq_update_fifo_locked() take and release the same
> > > lock.)
> > > 
> > > To achieve this we rename drm_sched_rq_add_entity() to
> > > drm_sched_rq_add_entity_locked(), making it expect the rq->lock
> > > to be
> > > held, and also add the same expectation to
> > > drm_sched_rq_update_fifo_locked().
> > > 
> > > For more stream-lining we also add the run-queue as an explicit
> > > parameter
> > > to drm_sched_rq_remove_fifo_locked() to avoid both callers and
> > > callee
> > > having to dereference entity->rq.
> > 
> > Why is dereferencing it a problem?
> 
> As you have noticed below the API is a bit unsightly. Consider for 
> example this call chain:
> 
> drm_sched_entity_kill(entity)
>  drm_sched_rq_remove_entity(entity->rq, entity);
>  drm_sched_rq_remove_fifo_locked(entity);
>  struct drm_sched_rq *rq = entity->rq;
> 
> A bit confusing, no?
> 
> I thought adding rq to remove_fifo_locked at least removes one back
> and 
> forth between the entity->rq and rq.
> 
> And then if we cache the rq in a local variable, after having
> explicitly 
> taken the correct lock, we have this other call chain example:
> 
> drm_sched_entity_push_job()
> ...
>  rq = entity->rq;
>  spin_lock(rq->lock);
> 
>  drm_sched_rq_add_entity_locked(rq, entity);
>  drm_sched_rq_update_fifo_locked(rq, entity, submit_ts);
> 
>  spin_unlock(rq->lock);
> 
> To me at least this reads more streamlined.

Alright, doesn't sound too bad, but

> 
> > > Signed-off-by: Tvrtko Ursulin 
> > > Cc: Christian König 
> > > Cc: Alex Deucher 
> > > Cc: Luben Tuikov 
> > > Cc: Matthew Brost 
> > > Cc: Philipp Stanner 
> > > ---
> > >   drivers/gpu/drm/scheduler/sched_entity.c |  7 ++--
> > >   drivers/gpu/drm/scheduler/sched_main.c   | 41 +
> > > -
> > > --
> > >   include/drm/gpu_scheduler.h  |  7 ++--
> > >   3 files changed, 31 insertions(+), 24 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/scheduler/sched_entity.c
> > > b/drivers/gpu/drm/scheduler/sched_entity.c
> > > index b4c4f9923e0b..2102c726d275 100644
> > > --- a/drivers/gpu/drm/scheduler/sched_entity.c
> > > +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> > > @@ -614,11 +614,14 @@ void drm_sched_entity_push_job(struct
> > > drm_sched_job *sched_job)
> > >   sched = rq->sched;
> > >   
> > >   atomic_inc(sched->score);
> > > - drm_sched_rq_add_entity(rq, entity);
> > > +
> > > + spin_lock(&rq->lock);
> > > + drm_sched_rq_add_entity_locked(rq, entity);
> > >   
> > >   if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
> > > - drm_sched_rq_update_fifo_locked(entity,
> > > submit_ts);
> > > + drm_sched_rq_update_fifo_locked(entity,
> > > rq,
> > > submit_ts);
> > >   
> > > + spin_unlock(&rq->lock);
> > >   spin_unlock(&entity->lock);
> > >   
> > >   drm_sched_wakeup(sched, entity);
> > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c
> > > b/drivers/gpu/drm/scheduler/sched_main.c
> > > index 937e7d1cfc49..1ccd2aed2d32 100644
> > > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > > @@ -153,41 +153,44 @@ static __always_inline bool
> > > drm_sched_entity_compare_before(struct rb_node *a,
> > >   return ktime_before(ent_a->oldest_job_waiting, ent_b-
> > > > oldest_job_waiting);
> > >   }
> > >   
> > > -static inline void drm_sched_rq_remove_fifo_locked(struct
> > > drm_sched_entity *entity)
> > > +static void drm_sched_rq_remove_fifo_

drm: GPU Scheduler maintainership

2024-09-13 Thread Philipp Stanner
Hi everyone,

it seemed to me in recent weeks that the GPU Scheduler is not that
actively maintained.

At least I haven't seen Luben posting that much, and a trivial patch of
mine [1] has been pending for a while now. We also didn't have that
much discussion yet about looking deeper into the scheduler teardown
[2].

@Luben, Matthew: How's it going, are you still passionate about the
scheduler? Can one help you with anything?

I certainly would be willing to help, but at this point would judge
that I understand it far too badly to do more than reviews.

*glances at Christian*
;)


P.


[1] 
https://lore.kernel.org/all/74a7e80ea893c2b7fefbd0ae3b53881ddf789c3f.ca...@redhat.com/
[2] https://lore.kernel.org/all/20240903094446.29797-2-pstan...@redhat.com/



Re: [PATCH 1/7] dma-buf: add WARN_ON() illegal dma-fence signaling

2024-09-13 Thread Philipp Stanner
On Thu, 2024-09-12 at 16:55 +0200, Christian König wrote:
> Am 11.09.24 um 11:44 schrieb Philipp Stanner:
> > On Wed, 2024-09-11 at 10:58 +0200, Christian König wrote:
> > > Calling the signaling functions with a NULL fence is obviously a
> > > coding error in a driver. Those functions unfortunately just
> > > returned silently
> > > without
> > > raising a warning.
> > Good catch
> > 
> > > Signed-off-by: Christian König 
> > > ---
> > >   drivers/dma-buf/dma-fence.c | 4 ++--
> > >   1 file changed, 2 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-
> > > fence.c
> > > index 0393a9bba3a8..325a263ac798 100644
> > > --- a/drivers/dma-buf/dma-fence.c
> > > +++ b/drivers/dma-buf/dma-fence.c
> > > @@ -412,7 +412,7 @@ int dma_fence_signal_timestamp(struct
> > > dma_fence
> > > *fence, ktime_t timestamp)
> > >   unsigned long flags;
> > >   int ret;
> > >   
> > > - if (!fence)
> > > + if (WARN_ON(!fence))
> > >   return -EINVAL;
> > While one can do that, as far as I can see there are only a handful
> > of users of that function anyways.
> 
> The dma_fence_signal() function has tons of users, it's basically the
> core of the DMA-buf framework.

I meant dma_fence_signal_timestamp() itself.

> 
> > Couldn't one (additionally) add the error check of
> > dma_fence_signal_timestamp() to those? Like in
> > dma_fence_allocate_private_stub().
> > 
> > It seems some of them are void functions, though. Hm.
> > There is also the attribute __must_check that could be considered
> > now
> > or in the future for such functions.
> 
> I actually want to remove the error return from dma_fence_signal()
> and 
> the other variants. There is no valid reason that those functions
> should 
> fail.

Makes sense to me.
+1

P.

> 
> The only user is some obscure use case in AMDs KFD driver and I would
> rather like to clean that one up.
> 
> Regards,
> Christian.
> 
> > 
> > Regards,
> > P.
> > 
> > 
> > >   
> > >   spin_lock_irqsave(fence->lock, flags);
> > > @@ -464,7 +464,7 @@ int dma_fence_signal(struct dma_fence *fence)
> > >   int ret;
> > >   bool tmp;
> > >   
> > > - if (!fence)
> > > + if (WARN_ON(!fence))
> > >   return -EINVAL;
> > >   
> > >   tmp = dma_fence_begin_signalling();
> 



Re: [PATCH 1/7] dma-buf: add WARN_ON() illegal dma-fence signaling

2024-09-11 Thread Philipp Stanner
On Wed, 2024-09-11 at 10:58 +0200, Christian König wrote:
> Calling the signaling functions with a NULL fence is obviously a coding
> error in a driver. Those functions unfortunately just returned silently
> raising a warning.

Good catch

> 
> Signed-off-by: Christian König 
> ---
>  drivers/dma-buf/dma-fence.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-
> fence.c
> index 0393a9bba3a8..325a263ac798 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -412,7 +412,7 @@ int dma_fence_signal_timestamp(struct dma_fence
> *fence, ktime_t timestamp)
>   unsigned long flags;
>   int ret;
>  
> - if (!fence)
> + if (WARN_ON(!fence))
>   return -EINVAL;

While one can do that, as far as I can see there are only a handful
of users of that function anyways.

Couldn't one (additionally) add the error check of
dma_fence_signal_timestamp() to those? Like in
dma_fence_allocate_private_stub().

It seems some of them are void functions, though. Hm.
There is also the attribute __must_check that could be considered now
or in the future for such functions.

Regards,
P.


>  
>   spin_lock_irqsave(fence->lock, flags);
> @@ -464,7 +464,7 @@ int dma_fence_signal(struct dma_fence *fence)
>   int ret;
>   bool tmp;
>  
> - if (!fence)
> + if (WARN_ON(!fence))
>   return -EINVAL;
>  
>   tmp = dma_fence_begin_signalling();



Re: [PATCH 6/8] drm/sched: Re-order struct drm_sched_rq members for clarity

2024-09-10 Thread Philipp Stanner
On Tue, 2024-09-10 at 11:42 +0100, Tvrtko Ursulin wrote:
> 
> On 10/09/2024 11:05, Philipp Stanner wrote:
> > On Mon, 2024-09-09 at 18:19 +0100, Tvrtko Ursulin wrote:
> > > From: Tvrtko Ursulin 
> > > 
> > > Lets re-order the members to make it clear which are protected by
> > > the
> > > lock
> > > and at the same time document it via kerneldoc.
> > 
> > I'd prefer if commit messages follow the idiomatic kernel style of
> > that
> > order:
> >     1. Describe the current situation
> >     2. State why it's bad or undesirable
> >     3. (describe the solution)
> >     4. Conclude commit message through sentences in imperative
> > stating
> >    what the commit does.
> > 
> > In this case I would go for:
> > "struct drm_sched_rq contains a spinlock that protects several
> > struct
> > members. The current documentation incorrectly states that this
> > lock
> > only guards the entities list. In truth, it guards that list, the
> > rb_tree and the current entity.
> > 
> > Document what the lock actually guards. Rearrange struct members so
> > that this becomes even more visible."
> 
> IMO a bit much to ask for a textbook format, for a trivial patch,
> when 
> all points are already implicitly obvious. That is "lets make it
> clear" 
> = current situation is not clear -> obviously bad with no need to 
> explain; "and the same time document" = means it is currently not 
> documented -> again obviously not desirable.
> 
> But okay, since I agree with the point below (*), I can explode the
> text 
> for maximum redundancy.

I agree that for very short / trivial changes one can keep it short.
But the line separating what is obvious for oneself and for others is
often thin.

P.

> 
> > > Signed-off-by: Tvrtko Ursulin 
> > > Cc: Christian König 
> > > Cc: Alex Deucher 
> > > Cc: Luben Tuikov 
> > > Cc: Matthew Brost 
> > > Cc: Philipp Stanner 
> > > ---
> > >   include/drm/gpu_scheduler.h | 10 ++
> > >   1 file changed, 6 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/include/drm/gpu_scheduler.h
> > > b/include/drm/gpu_scheduler.h
> > > index a06753987d93..d4a3ba333568 100644
> > > --- a/include/drm/gpu_scheduler.h
> > > +++ b/include/drm/gpu_scheduler.h
> > > @@ -243,10 +243,10 @@ struct drm_sched_entity {
> > >   /**
> > >    * struct drm_sched_rq - queue of entities to be scheduled.
> > >    *
> > > - * @lock: to modify the entities list.
> > >    * @sched: the scheduler to which this rq belongs to.
> > > - * @entities: list of the entities to be scheduled.
> > > + * @lock: protects the list, tree and current entity.
> > 
> > Would be more consistent with the below comment if you'd address
> > them
> > with their full name, aka "protects @entities, @rb_tree_root and
> > @current_entity".
> 
> *) this one I agree with.
> 
> Regards,
> 
> Tvrtko
> 
> > 
> > Thanks,
> > P.
> > 
> > 
> > >    * @current_entity: the entity which is to be scheduled.
> > > + * @entities: list of the entities to be scheduled.
> > >    * @rb_tree_root: root of time based priory queue of entities
> > > for
> > > FIFO scheduling
> > >    *
> > >    * Run queue is a set of entities scheduling command
> > > submissions for
> > > @@ -254,10 +254,12 @@ struct drm_sched_entity {
> > >    * the next entity to emit commands from.
> > >    */
> > >   struct drm_sched_rq {
> > > - spinlock_t lock;
> > >    struct drm_gpu_scheduler *sched;
> > > - struct list_head entities;
> > > +
> > > + spinlock_t lock;
> > > + /* Following members are protected by the @lock: */
> > >    struct drm_sched_entity *current_entity;
> > > + struct list_head entities;
> > >    struct rb_root_cached rb_tree_root;
> > >   };
> > >   
> > 
> 



Re: [PATCH 8/8] drm/sched: Further optimise drm_sched_entity_push_job

2024-09-10 Thread Philipp Stanner
On Mon, 2024-09-09 at 18:19 +0100, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin 
> 
> Having removed one re-lock cycle on the entity->lock in a patch
> titled
> "drm/sched: Optimise drm_sched_entity_push_job", with only a tiny bit
> larger refactoring we can do the same optimisation on the rq->lock
> (Currently both drm_sched_rq_add_entity() and
> drm_sched_rq_update_fifo_locked() take and release the same lock.)
> 
> To achieve this we rename drm_sched_rq_add_entity() to
> drm_sched_rq_add_entity_locked(), making it expect the rq->lock to be
> held, and also add the same expectation to
> drm_sched_rq_update_fifo_locked().
> 
> For more stream-lining we also add the run-queue as an explicit
> parameter
> to drm_sched_rq_remove_fifo_locked() to avoid both callers and callee
> having to dereference entity->rq.

Why is dereferencing it a problem?

> 
> Signed-off-by: Tvrtko Ursulin 
> Cc: Christian König 
> Cc: Alex Deucher 
> Cc: Luben Tuikov 
> Cc: Matthew Brost 
> Cc: Philipp Stanner 
> ---
>  drivers/gpu/drm/scheduler/sched_entity.c |  7 ++--
>  drivers/gpu/drm/scheduler/sched_main.c   | 41 +-
> --
>  include/drm/gpu_scheduler.h  |  7 ++--
>  3 files changed, 31 insertions(+), 24 deletions(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c
> b/drivers/gpu/drm/scheduler/sched_entity.c
> index b4c4f9923e0b..2102c726d275 100644
> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> @@ -614,11 +614,14 @@ void drm_sched_entity_push_job(struct
> drm_sched_job *sched_job)
>   sched = rq->sched;
>  
>   atomic_inc(sched->score);
> - drm_sched_rq_add_entity(rq, entity);
> +
> + spin_lock(&rq->lock);
> + drm_sched_rq_add_entity_locked(rq, entity);
>  
>   if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
> - drm_sched_rq_update_fifo_locked(entity,
> submit_ts);
> + drm_sched_rq_update_fifo_locked(entity, rq,
> submit_ts);
>  
> + spin_unlock(&rq->lock);
>   spin_unlock(&entity->lock);
>  
>   drm_sched_wakeup(sched, entity);
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
> b/drivers/gpu/drm/scheduler/sched_main.c
> index 937e7d1cfc49..1ccd2aed2d32 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -153,41 +153,44 @@ static __always_inline bool
> drm_sched_entity_compare_before(struct rb_node *a,
>   return ktime_before(ent_a->oldest_job_waiting, ent_b-
> >oldest_job_waiting);
>  }
>  
> -static inline void drm_sched_rq_remove_fifo_locked(struct
> drm_sched_entity *entity)
> +static void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity
> *entity,
> +     struct drm_sched_rq *rq)

So here we'd add a new function parameter that still doesn't allow for
getting rid of 'entity' as a parameter.

The API gets larger that way and readers will immediately wonder why
> something is passed as a separate variable that could also be obtained
through the pointer.

>  {
> - struct drm_sched_rq *rq = entity->rq;
> -
>   if (!RB_EMPTY_NODE(&entity->rb_tree_node)) {
>   rb_erase_cached(&entity->rb_tree_node, &rq-
> >rb_tree_root);
>   RB_CLEAR_NODE(&entity->rb_tree_node);
>   }
>  }
>  
> -void drm_sched_rq_update_fifo_locked(struct drm_sched_entity
> *entity, ktime_t ts)
> +void drm_sched_rq_update_fifo_locked(struct drm_sched_entity
> *entity,
> +  struct drm_sched_rq *rq,
> +  ktime_t ts)

The function is still called _locked. That implies to the reader that
this function takes care of locking. But it doesn't anymore. Instead,

>  {
>   lockdep_assert_held(&entity->lock);
> + lockdep_assert_held(&rq->lock);
>  
> - spin_lock(&entity->rq->lock);
> -
> - drm_sched_rq_remove_fifo_locked(entity);
> + drm_sched_rq_remove_fifo_locked(entity, rq);
>  
>   entity->oldest_job_waiting = ts;
>  
> - rb_add_cached(&entity->rb_tree_node, &entity->rq-
> >rb_tree_root,
> + rb_add_cached(&entity->rb_tree_node, &rq->rb_tree_root,
>     drm_sched_entity_compare_before);
> -
> - spin_unlock(&entity->rq->lock);
>  }
>  
>  void drm_sched_rq_update_fifo(struct drm_sched_entity *entity,
> ktime_t ts)
>  {
> + struct drm_sched_rq *rq;
> +
>   /*
>  

Re: [PATCH 6/8] drm/sched: Re-order struct drm_sched_rq members for clarity

2024-09-10 Thread Philipp Stanner
On Mon, 2024-09-09 at 18:19 +0100, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin 
> 
> Lets re-order the members to make it clear which are protected by the
> lock
> and at the same time document it via kerneldoc.

I'd prefer if commit messages follow the idiomatic kernel style of that
order:
   1. Describe the current situation
   2. State why it's bad or undesirable
   3. (describe the solution)
   4. Conclude commit message through sentences in imperative stating
  what the commit does.

In this case I would go for:
"struct drm_sched_rq contains a spinlock that protects several struct
members. The current documentation incorrectly states that this lock
only guards the entities list. In truth, it guards that list, the
rb_tree and the current entity.

Document what the lock actually guards. Rearrange struct members so
that this becomes even more visible."

> 
> Signed-off-by: Tvrtko Ursulin 
> Cc: Christian König 
> Cc: Alex Deucher 
> Cc: Luben Tuikov 
> Cc: Matthew Brost 
> Cc: Philipp Stanner 
> ---
>  include/drm/gpu_scheduler.h | 10 ++
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/include/drm/gpu_scheduler.h
> b/include/drm/gpu_scheduler.h
> index a06753987d93..d4a3ba333568 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -243,10 +243,10 @@ struct drm_sched_entity {
>  /**
>   * struct drm_sched_rq - queue of entities to be scheduled.
>   *
> - * @lock: to modify the entities list.
>   * @sched: the scheduler to which this rq belongs to.
> - * @entities: list of the entities to be scheduled.
> + * @lock: protects the list, tree and current entity.

Would be more consistent with the below comment if you'd address them
with their full name, aka "protects @entities, @rb_tree_root and
@current_entity".


Thanks,
P.


>   * @current_entity: the entity which is to be scheduled.
> + * @entities: list of the entities to be scheduled.
>   * @rb_tree_root: root of time based priory queue of entities for
> FIFO scheduling
>   *
>   * Run queue is a set of entities scheduling command submissions for
> @@ -254,10 +254,12 @@ struct drm_sched_entity {
>   * the next entity to emit commands from.
>   */
>  struct drm_sched_rq {
> - spinlock_t lock;
>   struct drm_gpu_scheduler *sched;
> - struct list_head entities;
> +
> + spinlock_t lock;
> + /* Following members are protected by the @lock: */
>   struct drm_sched_entity *current_entity;
> + struct list_head entities;
>   struct rb_root_cached rb_tree_root;
>  };
>  



Re: [PATCH 5/8] drm/sched: Stop setting current entity in FIFO mode

2024-09-10 Thread Philipp Stanner
On Mon, 2024-09-09 at 18:19 +0100, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin 
> 
> It does not seem there is a need to set the current entity in FIFO
> mode
> since it only serves as being a "cursor" in round-robin mode. Even if
> scheduling mode is changed at runtime the change in behaviour is
> simply
> to restart from the first entity, instead of continuing in RR mode
> from
> where FIFO left it, and that sounds completely fine.
> 
> Signed-off-by: Tvrtko Ursulin 

I went through the code and agree that this looks good.

Reviewed-by: Philipp Stanner 

> Cc: Christian König 
> Cc: Alex Deucher 
> Cc: Luben Tuikov 
> Cc: Matthew Brost 
> Cc: Philipp Stanner 
> ---
>  drivers/gpu/drm/scheduler/sched_main.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
> b/drivers/gpu/drm/scheduler/sched_main.c
> index 10abbcefe9d8..54c5fe7a7d1d 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -356,7 +356,6 @@ drm_sched_rq_select_entity_fifo(struct
> drm_gpu_scheduler *sched,
>   return ERR_PTR(-ENOSPC);
>   }
>  
> - rq->current_entity = entity;
>   reinit_completion(&entity->entity_idle);
>   break;
>   }



Re: [RFC 1/4] drm/sched: Add locking to drm_sched_entity_modify_sched

2024-09-09 Thread Philipp Stanner
On Mon, 2024-09-09 at 14:27 +0100, Tvrtko Ursulin wrote:
> 
> On 09/09/2024 13:46, Philipp Stanner wrote:
> > On Mon, 2024-09-09 at 13:37 +0100, Tvrtko Ursulin wrote:
> > > 
> > > On 09/09/2024 13:18, Christian König wrote:
> > > > Am 09.09.24 um 14:13 schrieb Philipp Stanner:
> > > > > On Mon, 2024-09-09 at 13:29 +0200, Christian König wrote:
> > > > > > Am 09.09.24 um 11:44 schrieb Philipp Stanner:
> > > > > > > On Fri, 2024-09-06 at 19:06 +0100, Tvrtko Ursulin wrote:
> > > > > > > > From: Tvrtko Ursulin 
> > > > > > > > 
> > > > > > > > Without the locking amdgpu currently can race
> > > > > > > > amdgpu_ctx_set_entity_priority() and
> > > > > > > > drm_sched_job_arm(),
> > > > > > > I would explicitly say "amdgpu's
> > > > > > > amdgpu_ctx_set_entity_priority()
> > > > > > > races
> > > > > > > through drm_sched_entity_modify_sched() with
> > > > > > > drm_sched_job_arm()".
> > > > > > > 
> > > > > > > The actual issue then seems to be drm_sched_job_arm()
> > > > > > > calling
> > > > > > > drm_sched_entity_select_rq(). I would mention that, too.
> > > > > > > 
> > > > > > > 
> > > > > > > > leading to the
> > > > > > > > latter accessing potentially inconsistent entity-
> > > > > > > > >sched_list
> > > > > > > > and
> > > > > > > > entity->num_sched_list pair.
> > > > > > > > 
> > > > > > > > The comment on drm_sched_entity_modify_sched() however
> > > > > > > > says:
> > > > > > > > 
> > > > > > > > """
> > > > > > > > * Note that this must be called under the same
> > > > > > > > common
> > > > > > > > lock for
> > > > > > > > @entity as
> > > > > > > > * drm_sched_job_arm() and
> > > > > > > > drm_sched_entity_push_job(),
> > > > > > > > or the
> > > > > > > > driver
> > > > > > > > needs to
> > > > > > > > * guarantee through some other means that this is
> > > > > > > > never
> > > > > > > > called
> > > > > > > > while
> > > > > > > > new jobs
> > > > > > > > * can be pushed to @entity.
> > > > > > > > """
> > > > > > > > 
> > > > > > > > It is unclear if that is referring to this race or
> > > > > > > > something
> > > > > > > > else.
> > > > > > > That comment is indeed a bit awkward. Both
> > > > > > > drm_sched_entity_push_job()
> > > > > > > and drm_sched_job_arm() take rq_lock. But
> > > > > > > drm_sched_entity_modify_sched() doesn't.
> > > > > > > 
> > > > > > > The comment was written in 981b04d968561. Interestingly,
> > > > > > > in
> > > > > > > drm_sched_entity_push_job(), this "common lock" is
> > > > > > > mentioned
> > > > > > > with
> > > > > > > the
> > > > > > > soft requirement word "should" and apparently is more
> > > > > > > about
> > > > > > > keeping
> > > > > > > sequence numbers in order when inserting.
> > > > > > > 
> > > > > > > I tend to think that the issue discovered by you is
> > > > > > > unrelated
> > > > > > > to
> > > > > > > that
> > > > > > > comment. But if no one can make sense of the comment,
> > > > > > > should
> > > > > > > it
> > > > > > > maybe
> > > > > > > be removed? Confusing comment is arguably worse than no
> > > > > > > comment
> > > > > > Agree, we probably mixed up in 981b04d968561 that
> > > > > > submission
> > > > > 

Re: [RFC 1/4] drm/sched: Add locking to drm_sched_entity_modify_sched

2024-09-09 Thread Philipp Stanner
On Mon, 2024-09-09 at 13:37 +0100, Tvrtko Ursulin wrote:
> 
> On 09/09/2024 13:18, Christian König wrote:
> > Am 09.09.24 um 14:13 schrieb Philipp Stanner:
> > > On Mon, 2024-09-09 at 13:29 +0200, Christian König wrote:
> > > > Am 09.09.24 um 11:44 schrieb Philipp Stanner:
> > > > > On Fri, 2024-09-06 at 19:06 +0100, Tvrtko Ursulin wrote:
> > > > > > From: Tvrtko Ursulin 
> > > > > > 
> > > > > > Without the locking amdgpu currently can race
> > > > > > amdgpu_ctx_set_entity_priority() and drm_sched_job_arm(),
> > > > > I would explicitly say "amdgpu's
> > > > > amdgpu_ctx_set_entity_priority()
> > > > > races
> > > > > through drm_sched_entity_modify_sched() with
> > > > > drm_sched_job_arm()".
> > > > > 
> > > > > The actual issue then seems to be drm_sched_job_arm() calling
> > > > > drm_sched_entity_select_rq(). I would mention that, too.
> > > > > 
> > > > > 
> > > > > > leading to the
> > > > > > latter accessing potentially inconsistent entity->sched_list
> > > > > > and
> > > > > > entity->num_sched_list pair.
> > > > > > 
> > > > > > The comment on drm_sched_entity_modify_sched() however
> > > > > > says:
> > > > > > 
> > > > > > """
> > > > > >    * Note that this must be called under the same common
> > > > > > lock for
> > > > > > @entity as
> > > > > >    * drm_sched_job_arm() and drm_sched_entity_push_job(),
> > > > > > or the
> > > > > > driver
> > > > > > needs to
> > > > > >    * guarantee through some other means that this is never
> > > > > > called
> > > > > > while
> > > > > > new jobs
> > > > > >    * can be pushed to @entity.
> > > > > > """
> > > > > > 
> > > > > > It is unclear if that is referring to this race or
> > > > > > something
> > > > > > else.
> > > > > That comment is indeed a bit awkward. Both
> > > > > drm_sched_entity_push_job()
> > > > > and drm_sched_job_arm() take rq_lock. But
> > > > > drm_sched_entity_modify_sched() doesn't.
> > > > > 
> > > > > The comment was written in 981b04d968561. Interestingly, in
> > > > > drm_sched_entity_push_job(), this "common lock" is mentioned
> > > > > with
> > > > > the
> > > > > soft requirement word "should" and apparently is more about
> > > > > keeping
> > > > > sequence numbers in order when inserting.
> > > > > 
> > > > > I tend to think that the issue discovered by you is unrelated
> > > > > to
> > > > > that
> > > > > comment. But if no one can make sense of the comment, should
> > > > > it
> > > > > maybe
> > > > > be removed? A confusing comment is arguably worse than no
> > > > > comment.
> > > > Agree, we probably mixed up in 981b04d968561 that submission
> > > > needs a
> > > > common lock and that rq/priority needs to be protected by the
> > > > rq_lock.
> > > > 
> > > > There is also the big FIXME in the drm_sched_entity
> > > > documentation
> > > > pointing out that this is most likely not implemented
> > > > correctly.
> > > > 
> > > > I suggest to move the rq, priority and rq_lock fields together
> > > > in the
> > > > drm_sched_entity structure and document that rq_lock is
> > > > protecting
> > > > the two.
> > > That could also be a great opportunity for improving the lock
> > > naming:
> > 
> > Well, that comment made me laugh, because I pointed out the same when
> > the
> > scheduler came out ~8 years ago and nobody has cared about it since
> > then.
> > 
> > But yeah completely agree :)
> 
> Maybe, but we need to keep in sight the fact that some of these fixes may
> be 
> good to backport. In which case re-naming exercises are best left to
> follow.

My argument basically. It's good if fixes and other improvements are
separated, in general, unless there is 

Re: [RFC 1/4] drm/sched: Add locking to drm_sched_entity_modify_sched

2024-09-09 Thread Philipp Stanner
On Mon, 2024-09-09 at 13:29 +0200, Christian König wrote:
> Am 09.09.24 um 11:44 schrieb Philipp Stanner:
> > On Fri, 2024-09-06 at 19:06 +0100, Tvrtko Ursulin wrote:
> > > From: Tvrtko Ursulin 
> > > 
> > > Without the locking amdgpu currently can race
> > > amdgpu_ctx_set_entity_priority() and drm_sched_job_arm(),
> > I would explicitly say "amdgpu's amdgpu_ctx_set_entity_priority()
> > races
> > through drm_sched_entity_modify_sched() with drm_sched_job_arm()".
> > 
> > The actual issue then seems to be drm_sched_job_arm() calling
> > drm_sched_entity_select_rq(). I would mention that, too.
> > 
> > 
> > > leading to the
> > > latter accessing potentially inconsistent entity->sched_list and
> > > entity->num_sched_list pair.
> > > 
> > > The comment on drm_sched_entity_modify_sched() however says:
> > > 
> > > """
> > >   * Note that this must be called under the same common lock for
> > > @entity as
> > >   * drm_sched_job_arm() and drm_sched_entity_push_job(), or the
> > > driver
> > > needs to
> > >   * guarantee through some other means that this is never called
> > > while
> > > new jobs
> > >   * can be pushed to @entity.
> > > """
> > > 
> > > It is unclear if that is referring to this race or something
> > > else.
> > That comment is indeed a bit awkward. Both
> > drm_sched_entity_push_job()
> > and drm_sched_job_arm() take rq_lock. But
> > drm_sched_entity_modify_sched() doesn't.
> > 
> > The comment was written in 981b04d968561. Interestingly, in
> > drm_sched_entity_push_job(), this "common lock" is mentioned with
> > the
> > soft requirement word "should" and apparently is more about keeping
> > sequence numbers in order when inserting.
> > 
> > I tend to think that the issue discovered by you is unrelated to
> > that
> > comment. But if no one can make sense of the comment, should it
> > maybe
> > be removed? A confusing comment is arguably worse than no comment.
> 
> Agree, we probably mixed up in 981b04d968561 that submission needs a 
> common lock and that rq/priority needs to be protected by the
> rq_lock.
> 
> There is also the big FIXME in the drm_sched_entity documentation 
> pointing out that this is most likely not implemented correctly.
> 
> I suggest to move the rq, priority and rq_lock fields together in the
> drm_sched_entity structure and document that rq_lock is protecting
> the two.

That could also be a great opportunity for improving the lock naming:

void drm_sched_rq_update_fifo(struct drm_sched_entity *entity, ktime_t ts)
{
	/*
	 * Both locks need to be grabbed, one to protect from entity->rq change
	 * for entity from within concurrent drm_sched_entity_select_rq and the
	 * other to update the rb tree structure.
	 */
	spin_lock(&entity->rq_lock);
	spin_lock(&entity->rq->lock);

	[...]


P.


> 
> Then audit the code if all users of rq and priority actually hold the
> correct locks while reading and writing them.
> 
> Regards,
> Christian.
> 
> > 
> > P.
> > 
> > > Signed-off-by: Tvrtko Ursulin 
> > > Fixes: b37aced31eb0 ("drm/scheduler: implement a function to
> > > modify
> > > sched list")
> > > Cc: Christian König 
> > > Cc: Alex Deucher 
> > > Cc: Luben Tuikov 
> > > Cc: Matthew Brost 
> > > Cc: David Airlie 
> > > Cc: Daniel Vetter 
> > > Cc: dri-devel@lists.freedesktop.org
> > > Cc:  # v5.7+
> > > ---
> > >   drivers/gpu/drm/scheduler/sched_entity.c | 2 ++
> > >   1 file changed, 2 insertions(+)
> > > 
> > > diff --git a/drivers/gpu/drm/scheduler/sched_entity.c
> > > b/drivers/gpu/drm/scheduler/sched_entity.c
> > > index 58c8161289fe..ae8be30472cd 100644
> > > --- a/drivers/gpu/drm/scheduler/sched_entity.c
> > > +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> > > @@ -133,8 +133,10 @@ void drm_sched_entity_modify_sched(struct
> > > drm_sched_entity *entity,
> > >   {
> > >   WARN_ON(!num_sched_list || !sched_list);
> > >   
> > > + spin_lock(&entity->rq_lock);
> > >   entity->sched_list = sched_list;
> > >   entity->num_sched_list = num_sched_list;
> > > + spin_unlock(&entity->rq_lock);
> > >   }
> > >   EXPORT_SYMBOL(drm_sched_entity_modify_sched);
> > >   
> 



Re: [RFC 2/4] drm/sched: Always wake up correct scheduler in drm_sched_entity_push_job

2024-09-09 Thread Philipp Stanner
On Fri, 2024-09-06 at 19:06 +0100, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin 
> 
> Since drm_sched_entity_modify_sched() can modify the entities run
> queue
> lets make sure to only derefernce the pointer once so both adding and
> waking up are guaranteed to be consistent.
> 
> Signed-off-by: Tvrtko Ursulin 
> Fixes: b37aced31eb0 ("drm/scheduler: implement a function to modify
> sched list")
> Cc: Christian König 
> Cc: Alex Deucher 
> Cc: Luben Tuikov 
> Cc: Matthew Brost 
> Cc: David Airlie 
> Cc: Daniel Vetter 
> Cc: dri-devel@lists.freedesktop.org
> Cc:  # v5.7+
> ---
>  drivers/gpu/drm/scheduler/sched_entity.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c
> b/drivers/gpu/drm/scheduler/sched_entity.c
> index ae8be30472cd..62b07ef7630a 100644
> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> @@ -599,6 +599,8 @@ void drm_sched_entity_push_job(struct
> drm_sched_job *sched_job)
>  
>   /* first job wakes up scheduler */
>   if (first) {
> + struct drm_sched_rq *rq;
> +
>   /* Add the entity to the run queue */
>   spin_lock(&entity->rq_lock);
>   if (entity->stopped) {
> @@ -608,13 +610,15 @@ void drm_sched_entity_push_job(struct
> drm_sched_job *sched_job)
>   return;
>   }
>  
> - drm_sched_rq_add_entity(entity->rq, entity);
> + rq = entity->rq;
> +
> + drm_sched_rq_add_entity(rq, entity);
>   spin_unlock(&entity->rq_lock);
>  
>   if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
>   drm_sched_rq_update_fifo(entity, submit_ts);
>  
> - drm_sched_wakeup(entity->rq->sched, entity);
> + drm_sched_wakeup(rq->sched, entity);

OK, I think that makes sense. But I'd mention that the more readable
solution of moving the spin_unlock() down here cannot be done because
drm_sched_rq_update_fifo() needs that same lock.

P.

>   }
>  }
>  EXPORT_SYMBOL(drm_sched_entity_push_job);



Re: [RFC 1/4] drm/sched: Add locking to drm_sched_entity_modify_sched

2024-09-09 Thread Philipp Stanner
On Fri, 2024-09-06 at 19:06 +0100, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin 
> 
> Without the locking amdgpu currently can race
> amdgpu_ctx_set_entity_priority() and drm_sched_job_arm(), 

I would explicitly say "amdgpu's amdgpu_ctx_set_entity_priority() races
through drm_sched_entity_modify_sched() with drm_sched_job_arm()".

The actual issue then seems to be drm_sched_job_arm() calling
drm_sched_entity_select_rq(). I would mention that, too.


> leading to the
> latter accessing potentially inconsistent entity->sched_list and
> entity->num_sched_list pair.
> 
> The comment on drm_sched_entity_modify_sched() however says:
> 
> """
>  * Note that this must be called under the same common lock for
> @entity as
>  * drm_sched_job_arm() and drm_sched_entity_push_job(), or the driver
> needs to
>  * guarantee through some other means that this is never called while
> new jobs
>  * can be pushed to @entity.
> """
> 
> It is unclear if that is referring to this race or something else.

That comment is indeed a bit awkward. Both drm_sched_entity_push_job()
and drm_sched_job_arm() take rq_lock. But
drm_sched_entity_modify_sched() doesn't.

The comment was written in 981b04d968561. Interestingly, in
drm_sched_entity_push_job(), this "common lock" is mentioned with the
soft requirement word "should" and apparently is more about keeping
sequence numbers in order when inserting.

I tend to think that the issue discovered by you is unrelated to that
comment. But if no one can make sense of the comment, should it maybe
be removed? A confusing comment is arguably worse than no comment.

P.

> 
> Signed-off-by: Tvrtko Ursulin 
> Fixes: b37aced31eb0 ("drm/scheduler: implement a function to modify
> sched list")
> Cc: Christian König 
> Cc: Alex Deucher 
> Cc: Luben Tuikov 
> Cc: Matthew Brost 
> Cc: David Airlie 
> Cc: Daniel Vetter 
> Cc: dri-devel@lists.freedesktop.org
> Cc:  # v5.7+
> ---
>  drivers/gpu/drm/scheduler/sched_entity.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c
> b/drivers/gpu/drm/scheduler/sched_entity.c
> index 58c8161289fe..ae8be30472cd 100644
> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> @@ -133,8 +133,10 @@ void drm_sched_entity_modify_sched(struct
> drm_sched_entity *entity,
>  {
>   WARN_ON(!num_sched_list || !sched_list);
>  
> + spin_lock(&entity->rq_lock);
>   entity->sched_list = sched_list;
>   entity->num_sched_list = num_sched_list;
> + spin_unlock(&entity->rq_lock);
>  }
>  EXPORT_SYMBOL(drm_sched_entity_modify_sched);
>  



Re: [RFC 0/4] DRM scheduler fixes, or not, or incorrect kind

2024-09-09 Thread Philipp Stanner
Hi,

On Fri, 2024-09-06 at 19:06 +0100, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin 
> 
> In a recent conversation with Christian there was a thought that
> drm_sched_entity_modify_sched() should start using the entity->rq_lock
> to be
> safe against job submission and simultaneous priority changes.

There are also FIXMEs in gpu_scheduler.h that might be related.

> 
> The kerneldoc accompanying that function however is a bit unclear to
> me. For
> instance is amdgpu simply doing it wrongly by not serializing the two
> in the
> driver? Or is the comment referring to some other race condition than
> which is
> of concern in this series?
> 
> To cut the long story short, first three patches try to fix this race
> in three
> places I *think* can manifest in different ways.
> 
> Last patch is a trivial optimisation I spotted can be easily done.

I took a look and at least to me it doesn't appear to be that trivial,
mostly because it takes two locks.

Would you mind branching that out as a separate patch so that the
series would 100% address bugs?

P.  

> 
> Cc: Christian König 
> Cc: Alex Deucher 
> Cc: Luben Tuikov 
> Cc: Matthew Brost 
> 
> Tvrtko Ursulin (4):
>   drm/sched: Add locking to drm_sched_entity_modify_sched
>   drm/sched: Always wake up correct scheduler in
>     drm_sched_entity_push_job
>   drm/sched: Always increment correct scheduler score
>   drm/sched: Optimise drm_sched_entity_push_job
> 
>  drivers/gpu/drm/scheduler/sched_entity.c | 17 -
>  drivers/gpu/drm/scheduler/sched_main.c   | 21 ++---
>  include/drm/gpu_scheduler.h  |  1 +
>  3 files changed, 27 insertions(+), 12 deletions(-)
> 



Re: [RFC PATCH] drm/sched: Fix teardown leaks with waitqueue

2024-09-05 Thread Philipp Stanner
On Wed, 2024-09-04 at 19:47 +0200, Simona Vetter wrote:
> On Tue, Sep 03, 2024 at 11:44:47AM +0200, Philipp Stanner wrote:
> > The GPU scheduler currently does not ensure that its pending_list
> > is
> > empty before performing various other teardown tasks in
> > drm_sched_fini().
> > 
> > If there are still jobs in the pending_list, this is problematic
> > because
> > after scheduler teardown, no one will call backend_ops.free_job()
> > anymore. This would, consequently, result in memory leaks.
> > 
> > One way to solve this is to implement a waitqueue that
> > drm_sched_fini()
> > blocks on until the pending_list has become empty.
> > 
> > Add a waitqueue to struct drm_gpu_scheduler. Wake up waiters once
> > the
> > pending_list becomes empty. Wait in drm_sched_fini() for that to
> > happen.
> > 
> > Suggested-by: Danilo Krummrich 
> > Signed-off-by: Philipp Stanner 
> > ---
> > Hi all,
> > 
> > since the scheduler has many stakeholders, I want this solution
> > discussed as an RFC first.
> > 
> > This version here has IMO the advantage (and disadvantage...) that
> > it
> > blocks infinitely if the driver messed up the cleanup, so problems
> > might become more visible than the refcount solution I proposed in
> > parallel.
> 
> Very quick comment because I'm heading out for the r4l conference,
> but
> maybe I can discuss this there with Danilo a bit.
> 
> Maybe we should do step 0 first, and document the current rules? The
> kerneldoc isn't absolute zero anymore, but it's very, very bare-
> bones.
> Then get that acked and merged, which is a very good way to make sure
> we're actually standing on common ground.

Yes, documentation is definitely also on my TODO list. I wanted to send
out something clarifying the objects' lifetimes (based on Christian's
previous series [1]) quite soon.

> 
> Then maybe step 0.5 would be to add runtime asserts to enforce the
> rules,
> like if you tear down the scheduler and there's stuff in flight, you
> splat
> on a WARN_ON.
> 
> With that foundation there should be a lot clearer basis to discuss
> whether there is an issue, and what a better design could look like.

I mean, I'm quite sure that there are teardown issues. But we could
indeed make them visible first through documentation (and a FIXME tag)
and after establishing consensus through merging that go on as you
suggested.

> The
> little pondering I've done I've come up with a few more ideas along
> similar lines.
> 
> One comment below, kinda unrelated.
> 
> > 
> > Cheers,
> > P.
> > ---
> >  drivers/gpu/drm/scheduler/sched_main.c | 40
> > ++
> >  include/drm/gpu_scheduler.h    |  4 +++
> >  2 files changed, 44 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/scheduler/sched_main.c
> > b/drivers/gpu/drm/scheduler/sched_main.c
> > index 7e90c9f95611..200fa932f289 100644
> > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > @@ -564,6 +564,13 @@ static void drm_sched_job_timedout(struct
> > work_struct *work)
> >  * is parked at which point it's safe.
> >  */
> >     list_del_init(&job->list);
> > +
> > +   /*
> > +* Inform tasks blocking in drm_sched_fini() that
> > it's now safe to proceed.
> > +*/
> > +   if (list_empty(&sched->pending_list))
> > +   wake_up(&sched->job_list_waitque);
> > +
> >     spin_unlock(&sched->job_list_lock);
> >  
> >     status = job->sched->ops->timedout_job(job);
> > @@ -584,6 +591,15 @@ static void drm_sched_job_timedout(struct
> > work_struct *work)
> >     drm_sched_start_timeout_unlocked(sched);
> >  }
> >  
> > +static bool drm_sched_no_jobs_pending(struct drm_gpu_scheduler
> > *sched)
> > +{
> > +   /*
> > +* For list_empty() to work without a lock.
> > +*/
> 
> So this is pretty far from the gold standard for documenting memory
> barrier semantics in lockless code. Ideally we have a comment for
> both
> sides of the barrier (you always need two, or there's no functional
> barrier), pointing at one another, and explaining exactly what's
> being
> synchronized against what and how.
> 
> I did years ago add a few missing barriers with that approach, see
> b0a5303d4e14 ("drm/sched: Barriers are needed for
> entity-

Re: [PATCH v2 1/2] drm/sched: memset() 'job' in drm_sched_job_init()

2024-09-04 Thread Philipp Stanner
Luben? Christian?

On Wed, 2024-08-28 at 11:41 +0200, Philipp Stanner wrote:
> drm_sched_job_init() has no control over how users allocate struct
> drm_sched_job. Unfortunately, the function can also not set some
> struct
> members such as job->sched.
> 
> This could theoretically lead to UB by users dereferencing the
> struct's
> pointer members too early.
> 
> It is easier to debug such issues if these pointers are initialized
> to
> NULL, so dereferencing them causes a NULL pointer exception.
> Accordingly, drm_sched_entity_init() does precisely that and
> initializes
> its struct with memset().
> 
> Initialize parameter "job" to 0 in drm_sched_job_init().
> 
> Signed-off-by: Philipp Stanner 
> ---
> No changes in v2.
> ---
>  drivers/gpu/drm/scheduler/sched_main.c | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
> b/drivers/gpu/drm/scheduler/sched_main.c
> index 356c30fa24a8..b0c8ad10b419 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -806,6 +806,14 @@ int drm_sched_job_init(struct drm_sched_job
> *job,
>   return -EINVAL;
>   }
>  
> + /*
> +  * We don't know for sure how the user has allocated. Thus,
> zero the
> +  * struct so that unallowed (i.e., too early) usage of
> pointers that
> +  * this function does not set is guaranteed to lead to a
> NULL pointer
> +  * exception instead of UB.
> +  */
> + memset(job, 0, sizeof(*job));
> +
>   job->entity = entity;
>   job->credits = credits;
>   job->s_fence = drm_sched_fence_alloc(entity, owner);



[RFC PATCH] drm/sched: Fix teardown leaks with refcounting

2024-09-03 Thread Philipp Stanner
The GPU scheduler currently does not ensure that its pending_list is
empty before performing various other teardown tasks in
drm_sched_fini().

If there are still jobs in the pending_list, this is problematic because
after scheduler teardown, no one will call backend_ops.free_job()
anymore. This would, consequently, result in memory leaks.

One way to solve this is to implement reference counting for struct
drm_gpu_scheduler itself. Each job added to the pending_list takes a
reference, each one removed drops a reference.

This approach would keep the scheduler running even after users have
called drm_sched_fini(), and it would ultimately stop after the last job
has been removed from pending_list.

Add reference counting to struct drm_gpu_scheduler. Move the teardown
logic to __drm_sched_fini() and have drm_sched_fini() just also drop a
reference.

Suggested-by: Danilo Krummrich 
Signed-off-by: Philipp Stanner 
---
Hi all,

since the scheduler has many stakeholders, I want this solution
discussed as an RFC first.

The advantage of this version would be that it does not block
drm_sched_fini(), but the price paid for that is that the scheduler
keeps running until all jobs are gone.

Cheers,
P.
---
 drivers/gpu/drm/scheduler/sched_main.c | 17 -
 include/drm/gpu_scheduler.h|  5 +
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index 7e90c9f95611..62b453c8ed76 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -99,6 +99,8 @@ int drm_sched_policy = DRM_SCHED_POLICY_FIFO;
 MODULE_PARM_DESC(sched_policy, "Specify the scheduling policy for entities on 
a run-queue, " __stringify(DRM_SCHED_POLICY_RR) " = Round Robin, " 
__stringify(DRM_SCHED_POLICY_FIFO) " = FIFO (default).");
 module_param_named(sched_policy, drm_sched_policy, int, 0444);
 
+static void __drm_sched_fini(struct kref *jobs_remaining);
+
 static u32 drm_sched_available_credits(struct drm_gpu_scheduler *sched)
 {
u32 credits;
@@ -540,6 +542,7 @@ static void drm_sched_job_begin(struct drm_sched_job *s_job)
 
spin_lock(&sched->job_list_lock);
list_add_tail(&s_job->list, &sched->pending_list);
+   kref_get(&sched->jobs_remaining);
drm_sched_start_timeout(sched);
spin_unlock(&sched->job_list_lock);
 }
@@ -564,6 +567,7 @@ static void drm_sched_job_timedout(struct work_struct *work)
 * is parked at which point it's safe.
 */
list_del_init(&job->list);
+   kref_put(&sched->jobs_remaining, __drm_sched_fini);
spin_unlock(&sched->job_list_lock);
 
status = job->sched->ops->timedout_job(job);
@@ -637,6 +641,7 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct 
drm_sched_job *bad)
 */
spin_lock(&sched->job_list_lock);
list_del_init(&s_job->list);
+   kref_put(&sched->jobs_remaining, __drm_sched_fini);
spin_unlock(&sched->job_list_lock);
 
/*
@@ -1084,6 +1089,7 @@ drm_sched_get_finished_job(struct drm_gpu_scheduler 
*sched)
if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
/* remove job from pending_list */
list_del_init(&job->list);
+   kref_put(&sched->jobs_remaining, __drm_sched_fini);
 
/* cancel this job's TO timer */
cancel_delayed_work(&sched->work_tdr);
@@ -1303,6 +1309,7 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
init_waitqueue_head(&sched->job_scheduled);
INIT_LIST_HEAD(&sched->pending_list);
spin_lock_init(&sched->job_list_lock);
+   kref_init(&sched->jobs_remaining);
atomic_set(&sched->credit_count, 0);
INIT_DELAYED_WORK(&sched->work_tdr, drm_sched_job_timedout);
INIT_WORK(&sched->work_run_job, drm_sched_run_job_work);
@@ -1334,11 +1341,14 @@ EXPORT_SYMBOL(drm_sched_init);
  *
  * Tears down and cleans up the scheduler.
  */
-void drm_sched_fini(struct drm_gpu_scheduler *sched)
+static void __drm_sched_fini(struct kref *jobs_remaining)
 {
+   struct drm_gpu_scheduler *sched;
struct drm_sched_entity *s_entity;
int i;
 
+   sched = container_of(jobs_remaining, struct drm_gpu_scheduler, 
jobs_remaining);
+
drm_sched_wqueue_stop(sched);
 
for (i = DRM_SCHED_PRIORITY_KERNEL; i < sched->num_rqs; i++) {
@@ -1368,6 +1378,11 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched)
kfree(sched->sched_rq);
sched->sched_rq = NULL;
 }
+
+void drm_sched_fini(struct d

[RFC PATCH] drm/sched: Fix teardown leaks with waitqueue

2024-09-03 Thread Philipp Stanner
The GPU scheduler currently does not ensure that its pending_list is
empty before performing various other teardown tasks in
drm_sched_fini().

If there are still jobs in the pending_list, this is problematic because
after scheduler teardown, no one will call backend_ops.free_job()
anymore. This would, consequently, result in memory leaks.

One way to solve this is to implement a waitqueue that drm_sched_fini()
blocks on until the pending_list has become empty.

Add a waitqueue to struct drm_gpu_scheduler. Wake up waiters once the
pending_list becomes empty. Wait in drm_sched_fini() for that to happen.

Suggested-by: Danilo Krummrich 
Signed-off-by: Philipp Stanner 
---
Hi all,

since the scheduler has many stakeholders, I want this solution
discussed as an RFC first.

This version here has IMO the advantage (and disadvantage...) that it
blocks infinitely if the driver messed up the cleanup, so problems
might become more visible than the refcount solution I proposed in
parallel.

Cheers,
P.
---
 drivers/gpu/drm/scheduler/sched_main.c | 40 ++
 include/drm/gpu_scheduler.h|  4 +++
 2 files changed, 44 insertions(+)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index 7e90c9f95611..200fa932f289 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -564,6 +564,13 @@ static void drm_sched_job_timedout(struct work_struct 
*work)
 * is parked at which point it's safe.
 */
list_del_init(&job->list);
+
+   /*
+* Inform tasks blocking in drm_sched_fini() that it's now safe 
to proceed.
+*/
+   if (list_empty(&sched->pending_list))
+   wake_up(&sched->job_list_waitque);
+
spin_unlock(&sched->job_list_lock);
 
status = job->sched->ops->timedout_job(job);
@@ -584,6 +591,15 @@ static void drm_sched_job_timedout(struct work_struct 
*work)
drm_sched_start_timeout_unlocked(sched);
 }
 
+static bool drm_sched_no_jobs_pending(struct drm_gpu_scheduler *sched)
+{
+   /*
+* For list_empty() to work without a lock.
+*/
+   rmb();
+   return list_empty(&sched->pending_list);
+}
+
 /**
  * drm_sched_stop - stop the scheduler
  *
@@ -659,6 +675,12 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, 
struct drm_sched_job *bad)
}
}
 
+   /*
+* Inform tasks blocking in drm_sched_fini() that it's now safe to 
proceed.
+*/
+   if (drm_sched_no_jobs_pending(sched))
+   wake_up(&sched->job_list_waitque);
+
/*
 * Stop pending timer in flight as we rearm it in  drm_sched_start. This
 * avoids the pending timeout work in progress to fire right away after
@@ -1085,6 +1107,12 @@ drm_sched_get_finished_job(struct drm_gpu_scheduler 
*sched)
/* remove job from pending_list */
list_del_init(&job->list);
 
+   /*
+* Inform tasks blocking in drm_sched_fini() that it's now safe 
to proceed.
+*/
+   if (list_empty(&sched->pending_list))
+   wake_up(&sched->job_list_waitque);
+
/* cancel this job's TO timer */
cancel_delayed_work(&sched->work_tdr);
/* make the scheduled timestamp more accurate */
@@ -1303,6 +1331,7 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
init_waitqueue_head(&sched->job_scheduled);
INIT_LIST_HEAD(&sched->pending_list);
spin_lock_init(&sched->job_list_lock);
+   init_waitqueue_head(&sched->job_list_waitque);
atomic_set(&sched->credit_count, 0);
INIT_DELAYED_WORK(&sched->work_tdr, drm_sched_job_timedout);
INIT_WORK(&sched->work_run_job, drm_sched_run_job_work);
@@ -1333,12 +1362,23 @@ EXPORT_SYMBOL(drm_sched_init);
  * @sched: scheduler instance
  *
  * Tears down and cleans up the scheduler.
+ *
+ * Note that this function blocks until the fences returned by
+ * backend_ops.run_job() have been signalled.
  */
 void drm_sched_fini(struct drm_gpu_scheduler *sched)
 {
struct drm_sched_entity *s_entity;
int i;
 
+   /*
+* Jobs that have neither been scheduled nor timed out are
+* gone by now, but jobs that have been submitted through
+* backend_ops.run_job() and have not yet terminated are still pending.
+*
+* Wait for the pending_list to become empty to avoid leaking those 
jobs.
+*/
+   wait_event(sched->job_list_waitque, drm_sched_no_jobs_pending(sched));
drm_sched_wqueue_stop(sched);
 
for (i = DRM_SCHED_PRIORITY_KERNEL; i < sched->num_rqs; i++) {

Re: [PATCH 1/4] drm/sched: add optional errno to drm_sched_start()

2024-09-02 Thread Philipp Stanner
On Fri, 2024-08-30 at 15:15 +0200, Christian König wrote:
> Am 28.08.24 um 11:30 schrieb Philipp Stanner:
> > On Mon, 2024-08-26 at 14:25 +0200, Christian König wrote:
> > > The current implementation of drm_sched_start uses a hardcoded
> > > -ECANCELED to dispose of a job when the parent/hw fence is NULL.
> > > This results in drm_sched_job_done being called with -ECANCELED
> > > for
> > > each job with a NULL parent in the pending list, making it
> > > difficult
> > > to distinguish between recovery methods, whether a queue reset or
> > > a
> > > full GPU reset was used.
> > > 
> > > To improve this, we first try a soft recovery for timeout jobs
> > > and
> > > use the error code -ENODATA. If soft recovery fails, we proceed
> > > with
> > > a queue reset, where the error code remains -ENODATA for the job.
> > > Finally, for a full GPU reset, we use error codes -ECANCELED or
> > > -ETIME. This patch adds an error code parameter to
> > > drm_sched_start,
> > > allowing us to differentiate between queue reset and GPU reset
> > > failures. This enables user mode and test applications to
> > > validate
> > > the expected correctness of the requested operation. After a
> > > successful queue reset, the only way to continue normal operation
> > > is
> > > to call drm_sched_job_done with the specific error code -ENODATA.
> > > 
> > > v1: Initial implementation by Jesse utilized
> > > amdgpu_device_lock_reset_domain
> > >  and amdgpu_device_unlock_reset_domain to allow user mode to
> > > track
> > >  the queue reset status and distinguish between queue reset
> > > and
> > >  GPU reset.
> > > v2: Christian suggested using the error codes -ENODATA for queue
> > > reset
> > >  and -ECANCELED or -ETIME for GPU reset, returned to
> > >  amdgpu_cs_wait_ioctl.
> > > v3: To meet the requirements, we introduce a new function
> > >  drm_sched_start_ex with an additional parameter to set
> > >  dma_fence_set_error, allowing us to handle the specific
> > > error
> > >  codes appropriately and dispose of bad jobs with the
> > > selected
> > >  error code depending on whether it was a queue reset or GPU
> > > reset.
> > > v4: Alex suggested using a new name,
> > > drm_sched_start_with_recovery_error,
> > >  which more accurately describes the function's purpose.
> > >  Additionally, it was recommended to add documentation
> > > details
> > >  about the new method.
> > > v5: Fixed declaration of new function
> > > drm_sched_start_with_recovery_error.(Alex)
> > > v6 (chk): rebase on upstream changes, cleanup the commit message,
> > >    drop the new function again and update all callers,
> > >    apply the errno also to scheduler fences with hw
> > > fences
> > > 
> > > Signed-off-by: Jesse Zhang 
> > > Signed-off-by: Vitaly Prosyak 
> > > Signed-off-by: Christian König 
> > > Cc: Alex Deucher 
> > > ---
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c | 2 +-
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  | 4 ++--
> > >   drivers/gpu/drm/etnaviv/etnaviv_sched.c | 2 +-
> > >   drivers/gpu/drm/imagination/pvr_queue.c | 4 ++--
> > >   drivers/gpu/drm/lima/lima_sched.c   | 2 +-
> > >   drivers/gpu/drm/nouveau/nouveau_sched.c | 2 +-
> > >   drivers/gpu/drm/panfrost/panfrost_job.c | 2 +-
> > >   drivers/gpu/drm/panthor/panthor_mmu.c   | 2 +-
> > >   drivers/gpu/drm/scheduler/sched_main.c  | 7 ++++---
> > >   drivers/gpu/drm/v3d/v3d_sched.c | 2 +-
> > >   include/drm/gpu_scheduler.h | 2 +-
> > >   11 files changed, 16 insertions(+), 15 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
> > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
> > > index 2320df51c914..18135d8235f9 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
> > > @@ -300,7 +300,7 @@ static int
> > > suspend_resume_compute_scheduler(struct amdgpu_device *adev, bool
> > > sus
> > >   if (r)
> > 

Re: [PATCH] drm/meson: switch to a managed drm device

2024-08-28 Thread Philipp Stanner
On Wed, 2024-08-28 at 14:04 +0300, Anastasia Belova wrote:
> Switch to a managed drm device to cleanup some error handling
> and make future work easier.

It's best to state what the current situation is like and then how this
patch addresses it.

If the cleanup topic is separate from the addressed NULL pointer issue
referenced below, it might make sense to split it into two patches, one
that does the cleanup, and another that addresses the NULL pointer
deref.

> 
> Fix dereference of NULL in meson_drv_bind_master by removing
> drm_dev_put(drm) before meson_encoder_*_remove where drm
> dereferenced.

(-> "where drm *is* dereferenced" )

Can this be backported to older versions?

If yes, sta...@vger.kernel.org should be put on CC.


> 
> Co-developed by Linux Verification Center (linuxtesting.org).
> 
> Signed-off-by: Anastasia Belova 

Should also include:

Cc: sta...@vger.kernel.org # vX.Y
Fixes: 

Thanks,
P.

> ---
>  drivers/gpu/drm/meson/meson_crtc.c | 10 +--
>  drivers/gpu/drm/meson/meson_drv.c  | 71 ++--
> --
>  drivers/gpu/drm/meson/meson_drv.h  |  2 +-
>  drivers/gpu/drm/meson/meson_encoder_cvbs.c |  8 +--
>  drivers/gpu/drm/meson/meson_overlay.c  |  8 +--
>  drivers/gpu/drm/meson/meson_plane.c    | 10 +--
>  6 files changed, 51 insertions(+), 58 deletions(-)
> 
> diff --git a/drivers/gpu/drm/meson/meson_crtc.c
> b/drivers/gpu/drm/meson/meson_crtc.c
> index d70616da8ce2..e1c0bf3baeea 100644
> --- a/drivers/gpu/drm/meson/meson_crtc.c
> +++ b/drivers/gpu/drm/meson/meson_crtc.c
> @@ -662,13 +662,13 @@ void meson_crtc_irq(struct meson_drm *priv)
>  
>   drm_crtc_handle_vblank(priv->crtc);
>  
> - spin_lock_irqsave(&priv->drm->event_lock, flags);
> + spin_lock_irqsave(&priv->drm.event_lock, flags);
>   if (meson_crtc->event) {
>   drm_crtc_send_vblank_event(priv->crtc, meson_crtc-
> >event);
>   drm_crtc_vblank_put(priv->crtc);
>   meson_crtc->event = NULL;
>   }
> - spin_unlock_irqrestore(&priv->drm->event_lock, flags);
> + spin_unlock_irqrestore(&priv->drm.event_lock, flags);
>  }
>  
>  int meson_crtc_create(struct meson_drm *priv)
> @@ -677,18 +677,18 @@ int meson_crtc_create(struct meson_drm *priv)
>   struct drm_crtc *crtc;
>   int ret;
>  
> - meson_crtc = devm_kzalloc(priv->drm->dev,
> sizeof(*meson_crtc),
> + meson_crtc = devm_kzalloc(priv->drm.dev,
> sizeof(*meson_crtc),
>     GFP_KERNEL);
>   if (!meson_crtc)
>   return -ENOMEM;
>  
>   meson_crtc->priv = priv;
>   crtc = &meson_crtc->base;
> - ret = drm_crtc_init_with_planes(priv->drm, crtc,
> + ret = drm_crtc_init_with_planes(&priv->drm, crtc,
>   priv->primary_plane, NULL,
>   &meson_crtc_funcs,
> "meson_crtc");
>   if (ret) {
> - dev_err(priv->drm->dev, "Failed to init CRTC\n");
> + dev_err(priv->drm.dev, "Failed to init CRTC\n");
>   return ret;
>   }
>  
> diff --git a/drivers/gpu/drm/meson/meson_drv.c
> b/drivers/gpu/drm/meson/meson_drv.c
> index 4bd0baa2a4f5..2e7c2e7c7b82 100644
> --- a/drivers/gpu/drm/meson/meson_drv.c
> +++ b/drivers/gpu/drm/meson/meson_drv.c
> @@ -182,7 +182,6 @@ static int meson_drv_bind_master(struct device
> *dev, bool has_components)
>   struct platform_device *pdev = to_platform_device(dev);
>   const struct meson_drm_match_data *match;
>   struct meson_drm *priv;
> - struct drm_device *drm;
>   struct resource *res;
>   void __iomem *regs;
>   int ret, i;
> @@ -197,17 +196,13 @@ static int meson_drv_bind_master(struct device
> *dev, bool has_components)
>   if (!match)
>   return -ENODEV;
>  
> - drm = drm_dev_alloc(&meson_driver, dev);
> - if (IS_ERR(drm))
> - return PTR_ERR(drm);
> + priv = devm_drm_dev_alloc(dev, &meson_driver,
> +  struct meson_drm, drm);
>  
> - priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
> - if (!priv) {
> - ret = -ENOMEM;
> - goto free_drm;
> - }
> - drm->dev_private = priv;
> - priv->drm = drm;
> + if (IS_ERR(priv))
> + return PTR_ERR(priv);
> +
> + priv->drm.dev_private = priv;
>   priv->dev = dev;
>   priv->compat = match->compat;
>   priv->afbcd.ops = match->afbcd_ops;
> @@ -215,7 +210,7 @@ static int meson_drv_bind_master(struct device
> *dev, bool has_components)
>   regs = devm_platform_ioremap_resource_byname(pdev, "vpu");
>   if (IS_ERR(regs)) {
>   ret = PTR_ERR(regs);
> - goto free_drm;
> + goto remove_encoders;
>   }
>  
>   priv->io_base = regs;
> @@ -223,13 +218,13 @@ static int meson_drv_bind_master(struct device
> *dev, bool has_components)
>   res = platform_get_resource_byname(pdev, IORESOURCE_MEM,
> "hhi");
>   if (!res) {
>   

[PATCH v2 2/2] drm/sched: warn about drm_sched_job_init()'s partial init

2024-08-28 Thread Philipp Stanner
drm_sched_job_init()'s name suggests that after the function succeeded,
parameter "job" will be fully initialized. This is not the case; some
members are only later set, notably "job->sched" by drm_sched_job_arm().

Document that drm_sched_job_init() does not set all struct members.

Document that job->sched in particular is uninitialized before
drm_sched_job_arm().

Signed-off-by: Philipp Stanner 
---
Changes in v2:
  - Change grammar in the new comments a bit.
---
 drivers/gpu/drm/scheduler/sched_main.c | 4 ++++
 include/drm/gpu_scheduler.h            | 7 +++++++
 2 files changed, 11 insertions(+)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index b0c8ad10b419..721373938c1e 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -781,6 +781,10 @@ EXPORT_SYMBOL(drm_sched_resubmit_jobs);
  * Drivers must make sure drm_sched_job_cleanup() if this function returns
  * successfully, even when @job is aborted before drm_sched_job_arm() is 
called.
  *
+ * Note that this function does not assign a valid value to each struct member
+ * of struct drm_sched_job. Take a look at that struct's documentation to see
+ * who sets which struct member with what lifetime.
+ *
  * WARNING: amdgpu abuses &drm_sched.ready to signal when the hardware
  * has died, which can mean that there's no valid runqueue for a @entity.
  * This function returns -ENOENT in this case (which probably should be -EIO as
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 5acc64954a88..04a268cd22f1 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -337,6 +337,13 @@ struct drm_sched_fence *to_drm_sched_fence(struct 
dma_fence *f);
 struct drm_sched_job {
	struct spsc_node		queue_node;
	struct list_head		list;
+
+   /*
+* The scheduler this job is or will be scheduled on.
+*
+	 * Gets set by drm_sched_job_arm(). Valid until the scheduler's backend_ops
+* callback "free_job()" has been called.
+*/
	struct drm_gpu_scheduler	*sched;
	struct drm_sched_fence		*s_fence;
 
-- 
2.46.0



[PATCH v2 1/2] drm/sched: memset() 'job' in drm_sched_job_init()

2024-08-28 Thread Philipp Stanner
drm_sched_job_init() has no control over how users allocate struct
drm_sched_job. Unfortunately, the function can also not set some struct
members such as job->sched.

This could theoretically lead to UB by users dereferencing the struct's
pointer members too early.

It is easier to debug such issues if these pointers are initialized to
NULL, so dereferencing them causes a NULL pointer exception.
Accordingly, drm_sched_entity_init() does precisely that and initializes
its struct with memset().

Initialize parameter "job" to 0 in drm_sched_job_init().

Signed-off-by: Philipp Stanner 
---
No changes in v2.
---
 drivers/gpu/drm/scheduler/sched_main.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index 356c30fa24a8..b0c8ad10b419 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -806,6 +806,14 @@ int drm_sched_job_init(struct drm_sched_job *job,
return -EINVAL;
}
 
+   /*
+* We don't know for sure how the user has allocated. Thus, zero the
+* struct so that unallowed (i.e., too early) usage of pointers that
+* this function does not set is guaranteed to lead to a NULL pointer
+* exception instead of UB.
+*/
+   memset(job, 0, sizeof(*job));
+
job->entity = entity;
job->credits = credits;
job->s_fence = drm_sched_fence_alloc(entity, owner);
-- 
2.46.0



Re: [PATCH 1/4] drm/sched: add optional errno to drm_sched_start()

2024-08-28 Thread Philipp Stanner
On Mon, 2024-08-26 at 14:25 +0200, Christian König wrote:
> The current implementation of drm_sched_start uses a hardcoded
> -ECANCELED to dispose of a job when the parent/hw fence is NULL.
> This results in drm_sched_job_done being called with -ECANCELED for
> each job with a NULL parent in the pending list, making it difficult
> to distinguish between recovery methods, whether a queue reset or a
> full GPU reset was used.
> 
> To improve this, we first try a soft recovery for timeout jobs and
> use the error code -ENODATA. If soft recovery fails, we proceed with
> a queue reset, where the error code remains -ENODATA for the job.
> Finally, for a full GPU reset, we use error codes -ECANCELED or
> -ETIME. This patch adds an error code parameter to drm_sched_start,
> allowing us to differentiate between queue reset and GPU reset
> failures. This enables user mode and test applications to validate
> the expected correctness of the requested operation. After a
> successful queue reset, the only way to continue normal operation is
> to call drm_sched_job_done with the specific error code -ENODATA.
> 
> v1: Initial implementation by Jesse utilized
> amdgpu_device_lock_reset_domain
>     and amdgpu_device_unlock_reset_domain to allow user mode to track
>     the queue reset status and distinguish between queue reset and
>     GPU reset.
> v2: Christian suggested using the error codes -ENODATA for queue
> reset
>     and -ECANCELED or -ETIME for GPU reset, returned to
>     amdgpu_cs_wait_ioctl.
> v3: To meet the requirements, we introduce a new function
>     drm_sched_start_ex with an additional parameter to set
>     dma_fence_set_error, allowing us to handle the specific error
>     codes appropriately and dispose of bad jobs with the selected
>     error code depending on whether it was a queue reset or GPU
> reset.
> v4: Alex suggested using a new name,
> drm_sched_start_with_recovery_error,
>     which more accurately describes the function's purpose.
>     Additionally, it was recommended to add documentation details
>     about the new method.
> v5: Fixed declaration of new function
> drm_sched_start_with_recovery_error.(Alex)
> v6 (chk): rebase on upstream changes, cleanup the commit message,
>   drop the new function again and update all callers,
>   apply the errno also to scheduler fences with hw fences
> 
> Signed-off-by: Jesse Zhang 
> Signed-off-by: Vitaly Prosyak 
> Signed-off-by: Christian König 
> Cc: Alex Deucher 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c | 2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  | 4 ++--
>  drivers/gpu/drm/etnaviv/etnaviv_sched.c | 2 +-
>  drivers/gpu/drm/imagination/pvr_queue.c | 4 ++--
>  drivers/gpu/drm/lima/lima_sched.c   | 2 +-
>  drivers/gpu/drm/nouveau/nouveau_sched.c | 2 +-
>  drivers/gpu/drm/panfrost/panfrost_job.c | 2 +-
>  drivers/gpu/drm/panthor/panthor_mmu.c   | 2 +-
>  drivers/gpu/drm/scheduler/sched_main.c  | 7 ++++---
>  drivers/gpu/drm/v3d/v3d_sched.c | 2 +-
>  include/drm/gpu_scheduler.h | 2 +-
>  11 files changed, 16 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
> index 2320df51c914..18135d8235f9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
> @@ -300,7 +300,7 @@ static int
> suspend_resume_compute_scheduler(struct amdgpu_device *adev, bool sus
>   if (r)
>   goto out;
>   } else {
> - drm_sched_start(&ring->sched);
> + drm_sched_start(&ring->sched, 0);
>   }
>   }
>  
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 1cd7d355689c..5891312e44ea 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -5879,7 +5879,7 @@ int amdgpu_device_gpu_recover(struct
> amdgpu_device *adev,
>   if (!amdgpu_ring_sched_ready(ring))
>   continue;
>  
> - drm_sched_start(&ring->sched);
> + drm_sched_start(&ring->sched, 0);
>   }
>  
>   if
> (!drm_drv_uses_atomic_modeset(adev_to_drm(tmp_adev)) &&
> !job_signaled)
> @@ -6374,7 +6374,7 @@ void amdgpu_pci_resume(struct pci_dev *pdev)
>   if (!amdgpu_ring_sched_ready(ring))
>   continue;
>  
> - drm_sched_start(&ring->sched);
> + drm_sched_start(&ring->sched, 0);
>   }
>  
>   amdgpu_device_unset_mp1_state(adev);
> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> index ab9ca4824b62..23ced5896c7c 100644

Re: [PATCH] drm/sched: Fix UB pointer dereference

2024-08-27 Thread Philipp Stanner
On Tue, 2024-08-27 at 11:00 +0200, Christian König wrote:
> Am 27.08.24 um 10:39 schrieb Danilo Krummrich:
> > On 8/27/24 9:45 AM, Philipp Stanner wrote:
> > > In drm_sched_job_init(), commit 56e449603f0a ("drm/sched: Convert
> > > the
> > > GPU scheduler to variable number of run-queues") implemented a
> > > call to
> > > drm_err(), which uses the job's scheduler pointer as a parameter.
> > > job->sched, however, is not yet valid as it gets set by
> > > drm_sched_job_arm(), which is always called after
> > > drm_sched_job_init().
> > > 
> > > Since the scheduler code has no control over how the API-User has
> > > allocated or set 'job', the pointer's dereference is undefined
> > > behavior.
> > > 
> > > Fix the UB by replacing drm_err() with pr_err().
> > > 
> > > Cc:     # 6.7+
> > > Fixes: 56e449603f0a ("drm/sched: Convert the GPU scheduler to 
> > > variable number of run-queues")
> > > Reported-by: Danilo Krummrich 
> > > Closes: 
> > > https://lore.kernel.org/lkml/20231108022716.15250-1-d...@redhat.com/
> > > Signed-off-by: Philipp Stanner 
> > > ---
> > >   drivers/gpu/drm/scheduler/sched_main.c | 2 +-
> > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
> > > b/drivers/gpu/drm/scheduler/sched_main.c
> > > index 7e90c9f95611..356c30fa24a8 100644
> > > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > > @@ -797,7 +797,7 @@ int drm_sched_job_init(struct drm_sched_job
> > > *job,
> > >    * or worse--a blank screen--leave a trail in the
> > >    * logs, so this can be debugged easier.
> > >    */
> > > -    drm_err(job->sched, "%s: entity has no rq!\n",
> > > __func__);
> > > +    pr_err("*ERROR* %s: entity has no rq!\n", __func__);
> > 
> > I don't think the "*ERROR*" string is necessary, it's pr_err()
> > already.
> 
> Good point. I will remove that and also add a comment why drm_err
> won't 
> work here before pushing it to drm-misc-fixes.

Well, as we're at it I want to point out that the exact same mechanism
occurs just a few lines below, from where I shamelessly copied it:

if (unlikely(!credits)) {
pr_err("*ERROR* %s: credits cannot be 0!\n", __func__);


P.

> 
> Thanks,
> Christian.
> 
> > 
> > Otherwise,
> > 
> > Acked-by: Danilo Krummrich 
> > 
> > >   return -ENOENT;
> > >   }
> 



[PATCH] drm/sched: Fix UB pointer dereference

2024-08-27 Thread Philipp Stanner
In drm_sched_job_init(), commit 56e449603f0a ("drm/sched: Convert the
GPU scheduler to variable number of run-queues") implemented a call to
drm_err(), which uses the job's scheduler pointer as a parameter.
job->sched, however, is not yet valid as it gets set by
drm_sched_job_arm(), which is always called after drm_sched_job_init().

Since the scheduler code has no control over how the API-User has
allocated or set 'job', the pointer's dereference is undefined behavior.

Fix the UB by replacing drm_err() with pr_err().

Cc: # 6.7+
Fixes: 56e449603f0a ("drm/sched: Convert the GPU scheduler to variable number 
of run-queues")
Reported-by: Danilo Krummrich 
Closes: https://lore.kernel.org/lkml/20231108022716.15250-1-d...@redhat.com/
Signed-off-by: Philipp Stanner 
---
 drivers/gpu/drm/scheduler/sched_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index 7e90c9f95611..356c30fa24a8 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -797,7 +797,7 @@ int drm_sched_job_init(struct drm_sched_job *job,
 * or worse--a blank screen--leave a trail in the
 * logs, so this can be debugged easier.
 */
-   drm_err(job->sched, "%s: entity has no rq!\n", __func__);
+   pr_err("*ERROR* %s: entity has no rq!\n", __func__);
return -ENOENT;
}
 
-- 
2.46.0



[PATCH] drm/sched: Document drm_sched_job_arm()'s effect on fences

2024-08-26 Thread Philipp Stanner
The GPU Scheduler's job initialization is split into two steps,
drm_sched_job_init() and drm_sched_job_arm(). One reason for this is
that actually arming a job results in the job's fences getting
initialized (armed).

Currently, the documentation does not explicitly state what
drm_sched_job_arm() does in this regard and which rules the API-User has
to follow once the function has been called.

Add a section to drm_sched_job_arm()'s docstring which details the
function's consequences regarding the job's fences.

Signed-off-by: Philipp Stanner 
---
 drivers/gpu/drm/scheduler/sched_main.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index 7e90c9f95611..e563eff4887c 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -831,6 +831,12 @@ EXPORT_SYMBOL(drm_sched_job_init);
  * Refer to drm_sched_entity_push_job() documentation for locking
  * considerations.
  *
+ * drm_sched_job_cleanup() can be used to disarm the job again - but only
+ * _before_ the job's fences have been published. Once a drm_sched_fence was
+ * published, the associated job needs to be submitted to and processed by the
+ * scheduler to avoid potential deadlocks on the DMA fences encapsulated by
+ * drm_sched_fence.
+ *
  * This can only be called if drm_sched_job_init() succeeded.
  */
 void drm_sched_job_arm(struct drm_sched_job *job)
-- 
2.46.0



Re: [PATCH 1/2] drm/sched: memset() 'job' in drm_sched_job_init()

2024-08-20 Thread Philipp Stanner
*PING*


On Tue, 2024-08-06 at 16:38 +0200, Philipp Stanner wrote:
> drm_sched_job_init() has no control over how users allocate struct
> drm_sched_job. Unfortunately, the function can also not set some
> struct
> members such as job->sched.
> 
> This could theoretically lead to UB by users dereferencing the
> struct's
> pointer members too early.
> 
> It is easier to debug such issues if these pointers are initialized
> to
> NULL, so dereferencing them causes a NULL pointer exception.
> Accordingly, drm_sched_entity_init() does precisely that and
> initializes
> its struct with memset().
> 
> Initialize parameter "job" to 0 in drm_sched_job_init().
> 
> Signed-off-by: Philipp Stanner 
> ---
> Hi all,
> I did some experiments with the scheduler recently and am trying to
> make
> the documentation and bits of the code more bullet proof.
> 
> I tested the performance of v6.11-rc2 with and without this memset()
> by
> creating 1e6 jobs and found no performance regression.
> 
> Cheers,
> P.
> ---
>  drivers/gpu/drm/scheduler/sched_main.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
> b/drivers/gpu/drm/scheduler/sched_main.c
> index 76969f9c59c2..1498ee3cbf39 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -936,6 +936,14 @@ int drm_sched_job_init(struct drm_sched_job
> *job,
>   return -EINVAL;
>   }
>  
> + /*
> +  * We don't know for sure how the user has allocated. Thus,
> zero the
> +  * struct so that unallowed (i.e., too early) usage of
> pointers that
> +  * this function does not set is guaranteed to lead to a
> NULL pointer
> +  * exception instead of UB.
> +  */
> + memset(job, 0, sizeof(*job));
> +
>   job->entity = entity;
>   job->credits = credits;
>   job->s_fence = drm_sched_fence_alloc(entity, owner);



[PATCH 2/2] drm/ast: Request PCI BAR with devres

2024-08-07 Thread Philipp Stanner
ast currently ioremaps two PCI BARs using pcim_iomap(). It does not
perform a request on the regions, however, which would make the driver a
bit more robust.

PCI now offers pcim_iomap_region(), a managed function which both
requests and ioremaps a BAR.

Replace pcim_iomap() with pcim_iomap_region().

Suggested-by: Thomas Zimmermann 
Signed-off-by: Philipp Stanner 
---
 drivers/gpu/drm/ast/ast_drv.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/ast/ast_drv.c b/drivers/gpu/drm/ast/ast_drv.c
index aae019e79bda..1fadaadfbe39 100644
--- a/drivers/gpu/drm/ast/ast_drv.c
+++ b/drivers/gpu/drm/ast/ast_drv.c
@@ -287,9 +287,9 @@ static int ast_pci_probe(struct pci_dev *pdev, const struct 
pci_device_id *ent)
if (ret)
return ret;
 
-   regs = pcim_iomap(pdev, 1, 0);
-   if (!regs)
-   return -EIO;
+   regs = pcim_iomap_region(pdev, 1, "ast");
+   if (IS_ERR(regs))
+   return PTR_ERR(regs);
 
if (pdev->revision >= 0x40) {
/*
@@ -311,9 +311,9 @@ static int ast_pci_probe(struct pci_dev *pdev, const struct 
pci_device_id *ent)
 
if (len < AST_IO_MM_LENGTH)
return -EIO;
-   ioregs = pcim_iomap(pdev, 2, 0);
-   if (!ioregs)
-   return -EIO;
+   ioregs = pcim_iomap_region(pdev, 2, "ast");
+   if (IS_ERR(ioregs))
+   return PTR_ERR(ioregs);
} else {
/*
 * Anything else is best effort.
-- 
2.45.2



[PATCH 1/2] PCI: Deprecate pcim_iomap_regions() in favor of pcim_iomap_region()

2024-08-07 Thread Philipp Stanner
pcim_iomap_regions() is a complicated function that uses a bit mask to
determine the BARs the user wishes to request and ioremap. Almost all
users only ever set a single bit in that mask, making that mechanism
questionable.

pcim_iomap_region() is now available as a more simple replacement.

Make pcim_iomap_region() a public function.

Mark pcim_iomap_regions() as deprecated.

Signed-off-by: Philipp Stanner 
---
 drivers/pci/devres.c | 8 ++--
 include/linux/pci.h  | 2 ++
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/devres.c b/drivers/pci/devres.c
index 3780a9f9ec00..89ec26ea1501 100644
--- a/drivers/pci/devres.c
+++ b/drivers/pci/devres.c
@@ -728,7 +728,7 @@ EXPORT_SYMBOL(pcim_iounmap);
  * Mapping and region will get automatically released on driver detach. If
  * desired, release manually only with pcim_iounmap_region().
  */
-static void __iomem *pcim_iomap_region(struct pci_dev *pdev, int bar,
+void __iomem *pcim_iomap_region(struct pci_dev *pdev, int bar,
   const char *name)
 {
int ret;
@@ -761,6 +761,7 @@ static void __iomem *pcim_iomap_region(struct pci_dev 
*pdev, int bar,
 
return IOMEM_ERR_PTR(ret);
 }
+EXPORT_SYMBOL(pcim_iomap_region);
 
 /**
  * pcim_iounmap_region - Unmap and release a PCI BAR
@@ -783,7 +784,7 @@ static void pcim_iounmap_region(struct pci_dev *pdev, int 
bar)
 }
 
 /**
- * pcim_iomap_regions - Request and iomap PCI BARs
+ * pcim_iomap_regions - Request and iomap PCI BARs (DEPRECATED)
  * @pdev: PCI device to map IO resources for
  * @mask: Mask of BARs to request and iomap
  * @name: Name associated with the requests
@@ -791,6 +792,9 @@ static void pcim_iounmap_region(struct pci_dev *pdev, int 
bar)
  * Returns: 0 on success, negative error code on failure.
  *
  * Request and iomap regions specified by @mask.
+ *
+ * This function is DEPRECATED. Do not use it in new code.
+ * Use pcim_iomap_region() instead.
  */
 int pcim_iomap_regions(struct pci_dev *pdev, int mask, const char *name)
 {
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 4cf89a4b4cbc..fc30176d28ca 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -2292,6 +2292,8 @@ static inline void pci_fixup_device(enum pci_fixup_pass 
pass,
 void __iomem *pcim_iomap(struct pci_dev *pdev, int bar, unsigned long maxlen);
 void pcim_iounmap(struct pci_dev *pdev, void __iomem *addr);
 void __iomem * const *pcim_iomap_table(struct pci_dev *pdev);
+void __iomem *pcim_iomap_region(struct pci_dev *pdev, int bar,
+  const char *name);
 int pcim_iomap_regions(struct pci_dev *pdev, int mask, const char *name);
 int pcim_iomap_regions_request_all(struct pci_dev *pdev, int mask,
   const char *name);
-- 
2.45.2



[PATCH 2/2] drm/sched: warn about drm_sched_job_init()'s partial init

2024-08-06 Thread Philipp Stanner
drm_sched_job_init()'s name suggests that after the function succeeded,
parameter "job" will be fully initialized. This is not the case; some
members are only later set, notably "job->sched" by drm_sched_job_arm().

Document that drm_sched_job_init() does not set all struct members.

Document that job->sched in particular is uninitialized before
drm_sched_job_arm().

Signed-off-by: Philipp Stanner 
---
 drivers/gpu/drm/scheduler/sched_main.c | 4 ++++
 include/drm/gpu_scheduler.h            | 7 +++++++
 2 files changed, 11 insertions(+)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index 1498ee3cbf39..2adb13745500 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -911,6 +911,10 @@ EXPORT_SYMBOL(drm_sched_resubmit_jobs);
  * Drivers must make sure drm_sched_job_cleanup() if this function returns
  * successfully, even when @job is aborted before drm_sched_job_arm() is 
called.
  *
+ * Note that this function does not assign a valid value to each struct member
+ * of struct drm_sched_job. Take a look at that struct's documentation to see
+ * who sets which struct member with what lifetime.
+ *
  * WARNING: amdgpu abuses &drm_sched.ready to signal when the hardware
  * has died, which can mean that there's no valid runqueue for a @entity.
  * This function returns -ENOENT in this case (which probably should be -EIO as
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index ce15c50d8a10..7df81a07f1f9 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -337,6 +337,13 @@ struct drm_sched_fence *to_drm_sched_fence(struct 
dma_fence *f);
 struct drm_sched_job {
	struct spsc_node		queue_node;
	struct list_head		list;
+
+   /*
+* The scheduler this job is or will be scheduled on.
+*
+	 * Gets set by drm_sched_job_arm(). Valid until the scheduler's backend_ops
+	 * callback "free_job()" is called.
+*/
	struct drm_gpu_scheduler	*sched;
	struct drm_sched_fence		*s_fence;
 
-- 
2.45.2



[PATCH 1/2] drm/sched: memset() 'job' in drm_sched_job_init()

2024-08-06 Thread Philipp Stanner
drm_sched_job_init() has no control over how users allocate struct
drm_sched_job. Unfortunately, the function can also not set some struct
members such as job->sched.

This could theoretically lead to UB by users dereferencing the struct's
pointer members too early.

It is easier to debug such issues if these pointers are initialized to
NULL, so dereferencing them causes a NULL pointer exception.
Accordingly, drm_sched_entity_init() does precisely that and initializes
its struct with memset().

Initialize parameter "job" to 0 in drm_sched_job_init().

Signed-off-by: Philipp Stanner 
---
Hi all,
I did some experiments with the scheduler recently and am trying to make
the documentation and bits of the code more bullet proof.

I tested the performance of v6.11-rc2 with and without this memset() by
creating 1e6 jobs and found no performance regression.

Cheers,
P.
---
 drivers/gpu/drm/scheduler/sched_main.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index 76969f9c59c2..1498ee3cbf39 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -936,6 +936,14 @@ int drm_sched_job_init(struct drm_sched_job *job,
return -EINVAL;
}
 
+   /*
+* We don't know for sure how the user has allocated. Thus, zero the
+* struct so that unallowed (i.e., too early) usage of pointers that
+* this function does not set is guaranteed to lead to a NULL pointer
+* exception instead of UB.
+*/
+   memset(job, 0, sizeof(*job));
+
job->entity = entity;
job->credits = credits;
job->s_fence = drm_sched_fence_alloc(entity, owner);
-- 
2.45.2



[PATCH 2/2] drm/vboxvideo: Add PCI region request

2024-07-29 Thread Philipp Stanner
vboxvideo currently does not reserve its PCI BAR through a region
request.

Implement the request through the managed function
pcim_request_region().

Signed-off-by: Philipp Stanner 
---
 drivers/gpu/drm/vboxvideo/vbox_main.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/vboxvideo/vbox_main.c 
b/drivers/gpu/drm/vboxvideo/vbox_main.c
index d4ade9325401..7f686a0190e6 100644
--- a/drivers/gpu/drm/vboxvideo/vbox_main.c
+++ b/drivers/gpu/drm/vboxvideo/vbox_main.c
@@ -114,6 +114,10 @@ int vbox_hw_init(struct vbox_private *vbox)
 
DRM_INFO("VRAM %08x\n", vbox->full_vram_size);
 
+   ret = pcim_request_region(pdev, 0, "vboxvideo");
+   if (ret)
+   return ret;
+
/* Map guest-heap at end of vram */
vbox->guest_heap = pcim_iomap_range(pdev, 0,
GUEST_HEAP_OFFSET(vbox), GUEST_HEAP_SIZE);
-- 
2.45.2



[PATCH 1/2] PCI: Make pcim_request_region() a public function

2024-07-29 Thread Philipp Stanner
pcim_request_region() is the managed counterpart of
pci_request_region(). It is currently only used internally for PCI.

It can be useful for a number of drivers and exporting it is a step
towards deprecating more complicated functions.

Make pcim_request_region a public function.

Signed-off-by: Philipp Stanner 
---
 drivers/pci/devres.c | 1 +
 drivers/pci/pci.h| 2 --
 include/linux/pci.h  | 1 +
 3 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/devres.c b/drivers/pci/devres.c
index 3780a9f9ec00..0127ca58c6e5 100644
--- a/drivers/pci/devres.c
+++ b/drivers/pci/devres.c
@@ -863,6 +863,7 @@ int pcim_request_region(struct pci_dev *pdev, int bar, 
const char *name)
 {
return _pcim_request_region(pdev, bar, name, 0);
 }
+EXPORT_SYMBOL(pcim_request_region);
 
 /**
  * pcim_request_region_exclusive - Request a PCI BAR exclusively
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 79c8398f3938..2fe6055a334d 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -887,8 +887,6 @@ static inline pci_power_t mid_pci_get_power_state(struct 
pci_dev *pdev)
 #endif
 
 int pcim_intx(struct pci_dev *dev, int enable);
-
-int pcim_request_region(struct pci_dev *pdev, int bar, const char *name);
 int pcim_request_region_exclusive(struct pci_dev *pdev, int bar,
  const char *name);
 void pcim_release_region(struct pci_dev *pdev, int bar);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 9e36b6c1810e..e5d8406874e2 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -2294,6 +2294,7 @@ static inline void pci_fixup_device(enum pci_fixup_pass pass,
 void __iomem *pcim_iomap(struct pci_dev *pdev, int bar, unsigned long maxlen);
 void pcim_iounmap(struct pci_dev *pdev, void __iomem *addr);
 void __iomem * const *pcim_iomap_table(struct pci_dev *pdev);
+int pcim_request_region(struct pci_dev *pdev, int bar, const char *name);
 int pcim_iomap_regions(struct pci_dev *pdev, int mask, const char *name);
 int pcim_iomap_regions_request_all(struct pci_dev *pdev, int mask,
 				   const char *name);
-- 
2.45.2



[PATCH 0/2] Use pcim_request_region() in vboxvideo

2024-07-29 Thread Philipp Stanner
Hi everyone,

Now that we've got the simplified PCI devres API available we can slowly
start using it in drivers and step by step phase the more problematic
API out.

vboxvideo currently does not have a region request, so it is a suitable
first user.

P.

Philipp Stanner (2):
  PCI: Make pcim_request_region() a public function
  drm/vboxvideo: Add PCI region request

 drivers/gpu/drm/vboxvideo/vbox_main.c | 4 ++++
 drivers/pci/devres.c  | 1 +
 drivers/pci/pci.h | 2 --
 include/linux/pci.h   | 1 +
 4 files changed, 6 insertions(+), 2 deletions(-)

-- 
2.45.2



Re: [PATCH 1/2] drm/scheduler: improve GPU scheduler documentation v2

2024-07-19 Thread Philipp Stanner
On Thu, 2023-11-16 at 15:15 +0100, Christian König wrote:
> Start to improve the scheduler document. Especially document the
> lifetime of each of the objects as well as the restrictions around
> DMA-fence handling and userspace compatibility.

Hallo Christian,

thanks for working on this.

I'm currently looking deeper into the scheduler and am documenting the
pitfalls etc. that I have found so far.


What are your current plans with this documentation series? If you
don't intend to get it upstreamed in the foreseeable future, I would
like to hijack the series and use it as a basis for my own improvements
to the documentation.

Please tell me what you think,


Regards,
P.


> 
> v2: Some improvements suggested by Danilo, add section about error
>     handling.
> 
> Signed-off-by: Christian König 
> ---
>  Documentation/gpu/drm-mm.rst   |  36 +
>  drivers/gpu/drm/scheduler/sched_main.c | 174 +----
>  2 files changed, 188 insertions(+), 22 deletions(-)
> 
> diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-
> mm.rst
> index acc5901ac840..112463fa9f3a 100644
> --- a/Documentation/gpu/drm-mm.rst
> +++ b/Documentation/gpu/drm-mm.rst
> @@ -552,12 +552,48 @@ Overview
>  .. kernel-doc:: drivers/gpu/drm/scheduler/sched_main.c
>     :doc: Overview
>  
> +Job Object
> +--
> +
> +.. kernel-doc:: drivers/gpu/drm/scheduler/sched_main.c
> +   :doc: Job Object
> +
> +Entity Object
> +-
> +
> +.. kernel-doc:: drivers/gpu/drm/scheduler/sched_main.c
> +   :doc: Entity Object
> +
> +Hardware Fence Object
> +-
> +
> +.. kernel-doc:: drivers/gpu/drm/scheduler/sched_main.c
> +   :doc: Hardware Fence Object
> +
> +Scheduler Fence Object
> +--
> +
> +.. kernel-doc:: drivers/gpu/drm/scheduler/sched_main.c
> +   :doc: Scheduler Fence Object
> +
> +Scheduler and Run Queue Objects
> +---
> +
> +.. kernel-doc:: drivers/gpu/drm/scheduler/sched_main.c
> +   :doc: Scheduler and Run Queue Objects
> +
>  Flow Control
>  
>  
>  .. kernel-doc:: drivers/gpu/drm/scheduler/sched_main.c
>     :doc: Flow Control
>  
> +Error and Timeout handling
> +--
> +
> +.. kernel-doc:: drivers/gpu/drm/scheduler/sched_main.c
> +   :doc: Error and Timeout handling
> +
>  Scheduler Function References
>  -
>  
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
> b/drivers/gpu/drm/scheduler/sched_main.c
> index 044a8c4875ba..026123497b0e 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -24,28 +24,122 @@
>  /**
>   * DOC: Overview
>   *
> - * The GPU scheduler provides entities which allow userspace to push
> jobs
> - * into software queues which are then scheduled on a hardware run
> queue.
> - * The software queues have a priority among them. The scheduler
> selects the entities
> - * from the run queue using a FIFO. The scheduler provides
> dependency handling
> - * features among jobs. The driver is supposed to provide callback
> functions for
> - * backend operations to the scheduler like submitting a job to
> hardware run queue,
> - * returning the dependencies of a job etc.
> - *
> - * The organisation of the scheduler is the following:
> - *
> - * 1. Each hw run queue has one scheduler
> - * 2. Each scheduler has multiple run queues with different
> priorities
> - *    (e.g., HIGH_HW,HIGH_SW, KERNEL, NORMAL)
> - * 3. Each scheduler run queue has a queue of entities to schedule
> - * 4. Entities themselves maintain a queue of jobs that will be
> scheduled on
> - *    the hardware.
> - *
> - * The jobs in a entity are always scheduled in the order that they
> were pushed.
> - *
> - * Note that once a job was taken from the entities queue and pushed
> to the
> - * hardware, i.e. the pending queue, the entity must not be
> referenced anymore
> - * through the jobs entity pointer.
> + * The GPU scheduler implements some logic to decide which command
> submission
> + * to push next to the hardware. Another major use case of the GPU
> scheduler
> + * is to enforce correct driver behavior around those command
> submissions.
> + * Because of this it's also used by drivers which don't need the
> actual
> + * scheduling functionality.
> + *
> + * All callbacks the driver needs to implement are restricted by
> DMA-fence
> + * signaling rules to guarantee deadlock free forward progress. This
> especially
> + * means that for normal operation no memory can be allocated in a
> callback.
> + * All memory which is needed for pushing the job to the hardware
> must be
> + * allocated before arming a job. It also means that no locks can be
> taken
> + * under which memory might be allocated as well.
> + *
> + * Memory which is optional to allocate, for example for device core
> dumping or
> + * debugging, *must* be allocated with GFP_NOWAIT and appropriate
> error
> + * handling taking if that allocation fails. GFP_A

[PATCH] drm/scheduler: Use ternary operator in standardized manner

2024-07-15 Thread Philipp Stanner
drm_sched_init() omits the middle operand when using the ternary
operator to set the timeout_wq if one has been passed.

This is a non-standardized GNU extension to the C language [1].

It decreases code readability and might be read as a bug. Furthermore,
it is not consistent with all other places in drm/scheduler where the
ternary operator is used.

Replace the expression with the standard one.

[1] https://gcc.gnu.org/onlinedocs/gcc-14.1.0/gcc/Conditionals.html

Suggested-by: Marco Pagani 
Signed-off-by: Philipp Stanner 
---
 drivers/gpu/drm/scheduler/sched_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 7e90c9f95611..02cf9c37a232 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -1257,7 +1257,7 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
sched->credit_limit = credit_limit;
sched->name = name;
sched->timeout = timeout;
-   sched->timeout_wq = timeout_wq ? : system_wq;
+   sched->timeout_wq = timeout_wq ? timeout_wq : system_wq;
sched->hang_limit = hang_limit;
sched->score = score ? score : &sched->_score;
sched->dev = dev;
-- 
2.45.0



[PATCH v2] drm/nouveau: Improve variable names in nouveau_sched_init()

2024-07-11 Thread Philipp Stanner
nouveau_sched_init() uses the function drm_sched_init(). The latter
function has parameters called "hang_limit" and "timeout" in its API
documentation.

nouveau_sched_init(), however, defines a variable called
"job_hang_limit" which is passed to drm_sched_init()'s "timeout"
parameter. The actual "hang_limit" parameter is directly set to 0.

Rename "job_hang_limit" to "timeout".

Signed-off-by: Philipp Stanner 
---
Changes in v2:
- Remove variable "hang_limit". (Danilo)
---
 drivers/gpu/drm/nouveau/nouveau_sched.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_sched.c b/drivers/gpu/drm/nouveau/nouveau_sched.c
index 32fa2e273965..ba4139288a6d 100644
--- a/drivers/gpu/drm/nouveau/nouveau_sched.c
+++ b/drivers/gpu/drm/nouveau/nouveau_sched.c
@@ -404,7 +404,7 @@ nouveau_sched_init(struct nouveau_sched *sched, struct nouveau_drm *drm,
 {
struct drm_gpu_scheduler *drm_sched = &sched->base;
struct drm_sched_entity *entity = &sched->entity;
-   long job_hang_limit = msecs_to_jiffies(NOUVEAU_SCHED_JOB_TIMEOUT_MS);
+   const long timeout = msecs_to_jiffies(NOUVEAU_SCHED_JOB_TIMEOUT_MS);
int ret;
 
if (!wq) {
@@ -418,7 +418,7 @@ nouveau_sched_init(struct nouveau_sched *sched, struct nouveau_drm *drm,
 
ret = drm_sched_init(drm_sched, &nouveau_sched_ops, wq,
 NOUVEAU_SCHED_PRIORITY_COUNT,
-credit_limit, 0, job_hang_limit,
+credit_limit, 0, timeout,
 NULL, NULL, "nouveau_sched", drm->dev->dev);
if (ret)
goto fail_wq;
-- 
2.45.0



[PATCH] drm/nouveau: Improve variable names in nouveau_sched_init()

2024-07-11 Thread Philipp Stanner
nouveau_sched_init() uses the function drm_sched_init(). The latter
function has parameters called "hang_limit" and "timeout" in its API
documentation.

nouveau_sched_init(), however, defines a variable called
"job_hang_limit" which is passed to drm_sched_init()'s "timeout"
parameter. The actual "hang_limit" parameter is directly set to 0.

Define a new variable and rename the existing one to make naming
congruent with the function API.

Signed-off-by: Philipp Stanner 
---
 drivers/gpu/drm/nouveau/nouveau_sched.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_sched.c b/drivers/gpu/drm/nouveau/nouveau_sched.c
index 32fa2e273965..ee1f49056737 100644
--- a/drivers/gpu/drm/nouveau/nouveau_sched.c
+++ b/drivers/gpu/drm/nouveau/nouveau_sched.c
@@ -404,7 +404,8 @@ nouveau_sched_init(struct nouveau_sched *sched, struct nouveau_drm *drm,
 {
struct drm_gpu_scheduler *drm_sched = &sched->base;
struct drm_sched_entity *entity = &sched->entity;
-   long job_hang_limit = msecs_to_jiffies(NOUVEAU_SCHED_JOB_TIMEOUT_MS);
+   const long timeout = msecs_to_jiffies(NOUVEAU_SCHED_JOB_TIMEOUT_MS);
+   const unsigned int hang_limit = 0;
int ret;
 
if (!wq) {
@@ -418,7 +419,7 @@ nouveau_sched_init(struct nouveau_sched *sched, struct nouveau_drm *drm,
 
ret = drm_sched_init(drm_sched, &nouveau_sched_ops, wq,
 NOUVEAU_SCHED_PRIORITY_COUNT,
-credit_limit, 0, job_hang_limit,
+credit_limit, hang_limit, timeout,
 NULL, NULL, "nouveau_sched", drm->dev->dev);
if (ret)
goto fail_wq;
-- 
2.45.0



Re: [PATCH v9 10/13] PCI: Give pci_intx() its own devres callback

2024-07-09 Thread Philipp Stanner
>From c24bd5b66e798a341caf183fb7cdbdf235502d90 Mon Sep 17 00:00:00 2001
From: Philipp Stanner 
Date: Tue, 9 Jul 2024 09:45:48 +0200
Subject: [PATCH] PCI: Fix pcim_intx() recursive calls

pci_intx() calls into pcim_intx() in managed mode, i.e., when
pcim_enable_device() had been called. This recursive call causes a bug
by re-registering the device resource in the release callback.

This is the same phenomenon that made it necessary to implement some
functionality a second time, see __pcim_request_region().

Implement __pcim_intx() to bypass the hybrid nature of pci_intx() on
driver detach.

Fixes: https://lore.kernel.org/all/20240708214656.4721-1-ashish.ka...@amd.com/
Reported-by: Ashish Kalra 
Signed-off-by: Philipp Stanner 
---
Hi Ashish,
I hacked down this fix that should be applyable on top.
Could you maybe have a first quick look whether this fixes the issue?
---
 drivers/pci/devres.c | 33 +
 1 file changed, 21 insertions(+), 12 deletions(-)

diff --git a/drivers/pci/devres.c b/drivers/pci/devres.c
index 2f0379a4e58f..dcef049b72fe 100644
--- a/drivers/pci/devres.c
+++ b/drivers/pci/devres.c
@@ -408,12 +408,31 @@ static inline bool mask_contains_bar(int mask, int bar)
return mask & BIT(bar);
 }
 
+/*
+ * This is a copy of pci_intx() used to bypass the problem of occurring
+ * recursive function calls due to the hybrid nature of pci_intx().
+ */
+static void __pcim_intx(struct pci_dev *pdev, int enable)
+{
+   u16 pci_command, new;
+
+   pci_read_config_word(pdev, PCI_COMMAND, &pci_command);
+
+   if (enable)
+   new = pci_command & ~PCI_COMMAND_INTX_DISABLE;
+   else
+   new = pci_command | PCI_COMMAND_INTX_DISABLE;
+
+   if (new != pci_command)
+   pci_write_config_word(pdev, PCI_COMMAND, new);
+}
+
 static void pcim_intx_restore(struct device *dev, void *data)
 {
struct pci_dev *pdev = to_pci_dev(dev);
struct pcim_intx_devres *res = data;
 
-   pci_intx(pdev, res->orig_intx);
+   __pcim_intx(pdev, res->orig_intx);
 }
 
 static struct pcim_intx_devres *get_or_create_intx_devres(struct device *dev)
@@ -443,7 +462,6 @@ static struct pcim_intx_devres *get_or_create_intx_devres(struct device *dev)
  */
 int pcim_intx(struct pci_dev *pdev, int enable)
 {
-   u16 pci_command, new;
struct pcim_intx_devres *res;
 
res = get_or_create_intx_devres(&pdev->dev);
@@ -451,16 +469,7 @@ int pcim_intx(struct pci_dev *pdev, int enable)
return -ENOMEM;
 
res->orig_intx = !enable;
-
-   pci_read_config_word(pdev, PCI_COMMAND, &pci_command);
-
-   if (enable)
-   new = pci_command & ~PCI_COMMAND_INTX_DISABLE;
-   else
-   new = pci_command | PCI_COMMAND_INTX_DISABLE;
-
-   if (new != pci_command)
-   pci_write_config_word(pdev, PCI_COMMAND, new);
+   __pcim_intx(pdev, enable);
 
return 0;
 }
-- 
2.45.0



Re: [PATCH v9 10/13] PCI: Give pci_intx() its own devres callback

2024-07-09 Thread Philipp Stanner
@Bjorn, @Krzysztof

On Mon, 2024-07-08 at 21:46 +, Ashish Kalra wrote:
> With this patch applied, we are observing unloading and then
> reloading issues with the AMD Crypto (CCP) driver:

Thank you very much for digging into this, Ashish.

Could you give me some pointers on how one could test CCP oneself?

> 
> with DEVRES logging enabled, we observe the following logs:
> 
> [  218.093588] ccp :a2:00.1: DEVRES REL c18c52fb
> 0x8d09dc1972c0 devm_kzalloc_release (152 bytes)
> [  218.105527] ccp :a2:00.1: DEVRES REL 3091fb95
> 0x8d09d3aad000 devm_kzalloc_release (3072 bytes)
> [  218.117500] ccp :a2:00.1: DEVRES REL 49e4adfe
> 0x8d09d588f000 pcim_intx_restore (4 bytes)
> [  218.129519] ccp :a2:00.1: DEVRES ADD 1a2ac6ad
> 0x8cfa867b7cc0 pcim_intx_restore (4 bytes)
> [  218.140434] ccp :a2:00.1: DEVRES REL 627ecaf7
> 0x8d09d588f680 pcim_msi_release (16 bytes)
> [  218.151665] ccp :a2:00.1: DEVRES REL 58b2252a
> 0x8d09dc199680 msi_device_data_release (80 bytes)
> [  218.163625] ccp :a2:00.1: DEVRES REL 435cc85e
> 0x8d09d588ff80 devm_attr_group_remove (8 bytes)
> [  218.175224] ccp :a2:00.1: DEVRES REL cb6fcd9b
> 0x8d09eb583660 pcim_addr_resource_release (40 bytes)
> [  218.187319] ccp :a2:00.1: DEVRES REL d64a8b84
> 0x8d09eb583180 pcim_iomap_release (48 bytes)
> [  218.198615] ccp :a2:00.1: DEVRES REL 99ac6b28
> 0x8d09eb5830c0 pcim_addr_resource_release (40 bytes)
> [  218.210730] ccp :a2:00.1: DEVRES REL bdd27f88
> 0x8d09d3ac2700 pcim_release (0 bytes)
> [  218.221489] ccp :a2:00.1: DEVRES REL e763315c
> 0x8d09d3ac2240 devm_kzalloc_release (20 bytes)
> [  218.233008] ccp :a2:00.1: DEVRES REL ae90f983
> 0x8d09dc25a800 devm_kzalloc_release (184 bytes)
> [  218.245251] ccp :23:00.1: DEVRES REL a2ec0085
> 0x8cfa86bee700 fw_name_devm_release (16 bytes)
> [  218.256748] ccp :23:00.1: DEVRES REL 21bccd98
> 0x8cfaa528d5c0 devm_pages_release (16 bytes)
> [  218.268044] ccp :23:00.1: DEVRES REL 3ef7cbc7
> 0x8cfaa1b5ec00 devm_kzalloc_release (104 bytes)
> [  218.279631] ccp :23:00.1: DEVRES REL 619322e1
> 0x8cfaa1b5e480 devm_kzalloc_release (152 bytes)
> [  218.300438] ccp :23:00.1: DEVRES REL c261523b
> 0x8cfaad88b000 devm_kzalloc_release (3072 bytes)
> [  218.331000] ccp :23:00.1: DEVRES REL fbd19618
> 0x8cfaa528d140 pcim_intx_restore (4 bytes)
> [  218.361330] ccp :23:00.1: DEVRES ADD 57f8e767
> 0x8cfa867b7740 pcim_intx_restore (4 bytes)
> [  218.391226] ccp :23:00.1: DEVRES REL 58c9dce1
> 0x8cfaa528d880 pcim_msi_release (16 bytes)
> [  218.421340] ccp :23:00.1: DEVRES REL c8ab08a7
> 0x8cfa9e617300 msi_device_data_release (80 bytes)
> [  218.452357] ccp :23:00.1: DEVRES REL cf5baccb
> 0x8cfaa528d8c0 devm_attr_group_remove (8 bytes)
> [  218.483011] ccp :23:00.1: DEVRES REL b8cbbadd
> 0x8cfa9c596060 pcim_addr_resource_release (40 bytes)
> [  218.514343] ccp :23:00.1: DEVRES REL 920f9607
> 0x8cfa9c596c60 pcim_iomap_release (48 bytes)
> [  218.544659] ccp :23:00.1: DEVRES REL d401a708
> 0x8cfa9c596840 pcim_addr_resource_release (40 bytes)
> [  218.575774] ccp :23:00.1: DEVRES REL 865d2fa2
> 0x8cfaa528d940 pcim_release (0 bytes)
> [  218.605758] ccp :23:00.1: DEVRES REL f5b79222
> 0x8cfaa528d080 devm_kzalloc_release (20 bytes)
> [  218.636260] ccp :23:00.1: DEVRES REL 37ef240a
> 0x8cfa9eeb3f00 devm_kzalloc_release (184 bytes)
> 
> and the CCP driver reload issue during driver probe:
> 
> [  226.552684] pci :23:00.1: Resources present before probing
> [  226.568846] pci :a2:00.1: Resources present before probing
> 
> From the above DEVRES logging, it looks like pcim_intx_restore
> associated resource is being released but then
> being re-added during detach/unload, which causes really_probe() to
> fail at probe time, as dev->devres_head is
> not empty due to this added resource:
> ...
> [  218.331000] ccp :23:00.1: DEVRES REL fbd19618
> 0x8cfaa528d140 pcim_intx_restore (4 bytes)
> [  218.361330] ccp :23:00.1: DEVRES ADD 57f8e767
> 0x8cfa867b7740 pcim_intx_restore (4 bytes)
> ...
> 
> Going more deep into this: 
> 
> This is the initial pcim_intx_resoure associated resource being added
> during first (CCP) driver load:
> 
> [   40.418933]  pcim_intx+0x3a/0x120
> [   40.418936]  pci_intx+0x8b/0xa0
> [   40.418939]  __pci_enable_msix_range+0x369/0x530
> [   40.418943]  pci_enable_msix_range+0x18/0x20
> [   40.418946]  sp_pci_probe+0x106/0x310 [ccp]
> [   40.418965] ipmi device interface
> [   40.418960]  ? srso_alias_return_thunk+0x5/0xfbef5
> [   40.418969]  local_pci_probe+0x4f/0xb0
> [   40.418973]  work_for_cpu_fn+0x1e/0x30
>

Re: [PATCH v9 10/13] PCI: Give pci_intx() its own devres callback

2024-06-18 Thread Philipp Stanner
On Mon, 2024-06-17 at 11:46 -0500, Bjorn Helgaas wrote:
> On Mon, Jun 17, 2024 at 10:21:10AM +0200, Philipp Stanner wrote:
> > On Fri, 2024-06-14 at 11:14 -0500, Bjorn Helgaas wrote:
> > > On Fri, Jun 14, 2024 at 10:09:46AM +0200, Philipp Stanner wrote:
> > ...
> 
> > > > Apparently INTx is "old IRQ management" and should be done
> > > > through
> > > > pci_alloc_irq_vectors() nowadays.
> > > 
> > > Do we have pcim_ support for pci_alloc_irq_vectors()?
> > 
> > Nope.
> 
> Should we?  Or is IRQ support not amenable to devm?

I don't see why it wouldn't work; AFAIU you just register a callback
that deregisters the interrupts again.

This series here, though, stems from me trying to clean up drivers in
DRM. That's when I discovered that regions / IO-mappings (which I need)
are broken.

Adding further stuff to pci/devres.c is no problem at all and
independent from this series; one just has to add the code and call the
appropriate devm_ functions.

> 
> Happened to see this new driver:
> https://lore.kernel.org/all/20240617100359.2550541-3-basavaraj.nati...@amd.com/
> that uses devm and the only PCI-related part of .remove() is cleaning
> up the IRQs.
> 

OK. They also use pcim_iomap_table() and related functions. I think we
should inform them about the deprecation.

I don't have a user for IRQ at hand for my DRM work right now. I'd try
to upstream new infrastructure we need there as I did for vboxvideo.


Grüße,
P.



Re: [PATCH v9 10/13] PCI: Give pci_intx() its own devres callback

2024-06-17 Thread Philipp Stanner
On Fri, 2024-06-14 at 11:14 -0500, Bjorn Helgaas wrote:
> On Fri, Jun 14, 2024 at 10:09:46AM +0200, Philipp Stanner wrote:
> > On Thu, 2024-06-13 at 16:06 -0500, Bjorn Helgaas wrote:
> > > On Thu, Jun 13, 2024 at 01:50:23PM +0200, Philipp Stanner wrote:
> > > > pci_intx() is one of the functions that have "hybrid mode"
> > > > (i.e.,
> > > > sometimes managed, sometimes not). Providing a separate
> > > > pcim_intx()
> > > > function with its own device resource and cleanup callback
> > > > allows
> > > > for
> > > > removing further large parts of the legacy PCI devres
> > > > implementation.
> > > > 
> > > > As in the region-request-functions, pci_intx() has to call into
> > > > its
> > > > managed counterpart for backwards compatibility.
> > > > 
> > > > As pci_intx() is an outdated function, pcim_intx() shall not be
> > > > made
> > > > visible to drivers via a public API.
> > > 
> > > What makes pci_intx() outdated?  If it's outdated, we should
> > > mention
> > > why and what the 30+ callers (including a couple in drivers/pci/)
> > > should use instead.
> > 
> > That is 100% based on Andy Shevchenko's (+CC) statement back from
> > January 2024 a.D. [1]
> > 
> > Apparently INTx is "old IRQ management" and should be done through
> > pci_alloc_irq_vectors() nowadays.
> 
> Do we have pcim_ support for pci_alloc_irq_vectors()?

Nope.

All PCI devres functions that exist are now in pci/devres.c, except for
the hybrid functions (pci_intx() and everything calling
__pci_request_region()) in pci.c


P.

> 
> > [1]
> > https://lore.kernel.org/all/ZabyY3csP0y-p7lb@surfacebook.localdomain/
> 



Re: [PATCH v9 00/13] Make PCI's devres API more consistent

2024-06-14 Thread Philipp Stanner
On Thu, 2024-06-13 at 16:57 -0500, Bjorn Helgaas wrote:
> On Thu, Jun 13, 2024 at 01:50:13PM +0200, Philipp Stanner wrote:
> > Changes in v9:
> >   - Remove forgotten dead code ('enabled' bit in struct pci_dev) in
> >     patch No.8 ("Move pinned status bit...")
> >   - Rework patch No.3:
> >   - Change title from "Reimplement plural devres functions"
> >     to "Add partial-BAR devres support".
> >   - Drop excessive details about the general cleanup from the
> > commit
> > message. Only motivate why this patch's new infrastructure
> > is
> > necessary.
> >   - Fix some minor spelling issues (s/pci/PCI ...)
> > 
> > Changes in v8:
> >   - Rebase the series on the already merged patches which were
> > slightly
> >     modified by Bjorn Helgaas.
> >   - Reword the pci_intx() commit message so it clearly states it's
> > about
> >     reworking pci_intx().
> >   - Move the removal of find_pci_dr() from patch "Remove legacy
> >     pcim_release()" to patch "Give pci_intx() its own devres
> > callback"
> >     since this later patch already removed all calls to that
> > function.
> >   - In patch "Give pci_intx() its own devres callback": use
> >     pci_is_enabled() (and, thus, the enabled_cnt in struct pci_dev)
> >     instead of a separate enabled field. (Bjorn)
> > 
> > Changes in v7:
> >   - Split the entire series in smaller, more atomic chunks /
> > patches
> >     (Bjorn)
> >   - Remove functions (such as pcim_iomap_region_range()) that do
> > not yet
> >     have a user (Bjorn)
> >   - Don't export interfaces publicly anymore, except for
> >     pcim_iomap_range(), needed by vboxvideo (Bjorn)
> >   - Mention the actual (vboxvideo) bug in "PCI: Warn users..."
> > commit
> >     (Bjorn)
> >   - Drop docstring warnings on PCI-internal functions (Bjorn)
> >   - Rework docstring warnings
> >   - Fix spelling in a few places. Rewrapp paragraphs (Bjorn)
> > 
> > Changes in v6:
> >   - Restructure the cleanup in pcim_iomap_regions_request_all() so
> > that
> >     it doesn't trigger a (false positive) test robot warning. No
> >     behavior change intended. (Dan Carpenter)
> > 
> > Changes in v5:
> >   - Add Hans's Reviewed-by to vboxvideo patch (Hans de Goede)
> >   - Remove stable-kernel from CC in vboxvideo patch (Hans de Goede)
> > 
> > Changes in v4:
> >   - Rebase against linux-next
> > 
> > Changes in v3:
> >   - Use the term "PCI devres API" at some forgotten places.
> >   - Fix more grammar errors in patch #3.
> >   - Remove the comment advising to call (the outdated) pcim_intx()
> > in pci.c
> >   - Rename __pcim_request_region_range() flags-field "exclusive" to
> >     "req_flags", since this is what the int actually represents.
> >   - Remove the call to pcim_region_request() from patch #10. (Hans)
> > 
> > Changes in v2:
> >   - Make commit head lines congruent with PCI's style (Bjorn)
> >   - Add missing error checks for devm_add_action(). (Andy)
> >   - Repair the "Returns: " marks for docu generation (Andy)
> >   - Initialize the addr_devres struct with memset(). (Andy)
> >   - Make pcim_intx() a PCI-internal function so that new drivers
> > won't
> >     be encouraged to use the outdated pci_intx() mechanism.
> >     (Andy / Philipp)
> >   - Fix grammar and spelling (Bjorn)
> >   - Be more precise on why pcim_iomap_table() is problematic
> > (Bjorn)
> >   - Provide the actual structs' and functions' names in the commit
> >     messages (Bjorn)
> >   - Remove redundant variable initializers (Andy)
> >   - Regroup PM bitfield members in struct pci_dev (Andy)
>   - Make pcim_intx() visible only for the PCI subsystem so that new
>     drivers won't use this outdated API (Andy, Myself)
> >   - Add a NOTE to pcim_iomap() to warn about this function being
> > the one
> >     exception that does just return NULL.
> >   - Consistently use the term "PCI devres API"; also in Patch #10
> > (Bjorn)
> > 
> > 
> > ¡Hola!
> > 
> > PCI's devres API suffers several weaknesses:
> > 
> > 1. There are functions prefixed with pcim_. Those are always
> > managed
> >    counterparts to never-managed functions prefixed with pci_ – or
> > so one
> >    would like to think. There are some appa

Re: [PATCH v7 09/13] PCI: Give pcim_set_mwi() its own devres callback

2024-06-14 Thread Philipp Stanner
On Thu, 2024-06-13 at 20:19 +0300, Ilpo Järvinen wrote:
> On Wed, 5 Jun 2024, Philipp Stanner wrote:
> 
> > Managing pci_set_mwi() with devres can easily be done with its own
> > callback, without the necessity to store any state about it in a
> > device-related struct.
> > 
> > Remove the MWI state from struct pci_devres.
> > Give pcim_set_mwi() a separate devres-callback.
> > 
> > Signed-off-by: Philipp Stanner 
> > ---
> >  drivers/pci/devres.c | 29 ++---
> >  drivers/pci/pci.h    |  1 -
> >  2 files changed, 18 insertions(+), 12 deletions(-)
> > 
> > diff --git a/drivers/pci/devres.c b/drivers/pci/devres.c
> > index 936369face4b..0bafb67e1886 100644
> > --- a/drivers/pci/devres.c
> > +++ b/drivers/pci/devres.c
> > @@ -361,24 +361,34 @@ void __iomem
> > *devm_pci_remap_cfg_resource(struct device *dev,
> >  }
> >  EXPORT_SYMBOL(devm_pci_remap_cfg_resource);
> >  
> > +static void __pcim_clear_mwi(void *pdev_raw)
> > +{
> > +   struct pci_dev *pdev = pdev_raw;
> > +
> > +   pci_clear_mwi(pdev);
> > +}
> > +
> >  /**
> >   * pcim_set_mwi - a device-managed pci_set_mwi()
> > - * @dev: the PCI device for which MWI is enabled
> > + * @pdev: the PCI device for which MWI is enabled
> >   *
> >   * Managed pci_set_mwi().
> >   *
> >   * RETURNS: An appropriate -ERRNO error value on error, or zero
> > for success.
> >   */
> > -int pcim_set_mwi(struct pci_dev *dev)
> > +int pcim_set_mwi(struct pci_dev *pdev)
> >  {
> > -   struct pci_devres *dr;
> > +   int ret;
> >  
> > -   dr = find_pci_dr(dev);
> > -   if (!dr)
> > -   return -ENOMEM;
> > +   ret = devm_add_action(&pdev->dev, __pcim_clear_mwi, pdev);
> > +   if (ret != 0)
> > +   return ret;
> > +
> > +   ret = pci_set_mwi(pdev);
> > +   if (ret != 0)
> > +   devm_remove_action(&pdev->dev, __pcim_clear_mwi,
> > pdev);
> 
> I'm sorry if this is a stupid question but why this cannot use 
> devm_add_action_or_reset()?

For MWI that could be done.

This is basically just consistent with the new pcim_enable_device() in
patch No.11 where devm_add_action_or_reset() could collide with
pcim_pin_device().

We could squash usage of devm_add_action_or_reset() in here. I don't
care.

P.


> 
> > -   dr->mwi = 1;
> > -   return pci_set_mwi(dev);
> > +   return ret;
> >  }
> >  EXPORT_SYMBOL(pcim_set_mwi);
> 



Re: [PATCH v9 10/13] PCI: Give pci_intx() its own devres callback

2024-06-14 Thread Philipp Stanner
On Thu, 2024-06-13 at 16:06 -0500, Bjorn Helgaas wrote:
> On Thu, Jun 13, 2024 at 01:50:23PM +0200, Philipp Stanner wrote:
> > pci_intx() is one of the functions that have "hybrid mode" (i.e.,
> > sometimes managed, sometimes not). Providing a separate pcim_intx()
> > function with its own device resource and cleanup callback allows
> > for
> > removing further large parts of the legacy PCI devres
> > implementation.
> > 
> > As in the region-request-functions, pci_intx() has to call into its
> > managed counterpart for backwards compatibility.
> > 
> > As pci_intx() is an outdated function, pcim_intx() shall not be
> > made
> > visible to drivers via a public API.
> 
> What makes pci_intx() outdated?  If it's outdated, we should mention
> why and what the 30+ callers (including a couple in drivers/pci/)
> should use instead.

That is 100% based on Andy Shevchenko's (+CC) statement back from
January 2024 a.D. [1]

Apparently INTx is "old IRQ management" and should be done through
pci_alloc_irq_vectors() nowadays.


[1] https://lore.kernel.org/all/ZabyY3csP0y-p7lb@surfacebook.localdomain/


P.


> 
> Bjorn
> 



Re: [PATCH v9 03/13] PCI: Add partial-BAR devres support

2024-06-14 Thread Philipp Stanner
On Thu, 2024-06-13 at 16:28 -0500, Bjorn Helgaas wrote:
> On Thu, Jun 13, 2024 at 01:50:16PM +0200, Philipp Stanner wrote:
> > With the current PCI devres API implementing a managed version of
> > pci_iomap_range() is impossible.
> > 
> > Furthermore, the PCI devres API currently is inconsistent and
> > complicated. This is in large part due to the fact that there are
> > hybrid
> > functions which are only sometimes managed via devres, and
> > functions
> > IO-mapping and requesting several BARs at once and returning
> > mappings
> > through a separately administrated table.
> > 
> > This table's indexing mechanism does not support partial-BAR
> > mappings.
> > 
> > Another notable problem is that there are no separate managed
> > counterparts for region-request functions such as
> > pci_request_region(),
> > as they exist for other PCI functions (e.g., pci_iomap() <->
> > pcim_iomap()). Instead, functions based on __pci_request_region()
> > change
> > their internal behavior and suddenly become managed functions when
> > pcim_enable_device() instead of pci_enable_device() is used.
> 
> The hybrid thing is certainly a problem, but does this patch address
> it?  I don't see that it does (other than adding comments in
> __pci_request_region() and pci_release_region()), but maybe I missed
> it.

This is just the justification for why __pcim_request_region() etc. are
implemented: they bypass the hybrid nature of __pci_request_region(). If
the latter didn't have that hybrid behavior, the separate implementation
wouldn't be necessary.

> 
> Correct me if I'm wrong, but I don't think this patch makes any
> user-visible changes.

Except for deprecating those two functions and adding a new public one,
the entire series shouldn't make user-visible changes. That's the
point.

P.

> 
> I'm proposing this:
> 
>   PCI: Add managed partial-BAR request and map infrastructure
> 
>   The pcim_iomap_devres table tracks entire-BAR mappings, so we can't
> use it
>   to build a managed version of pci_iomap_range(), which maps partial
> BARs.
> 
>   Add struct pcim_addr_devres, which can track request and mapping of
> both
>   entire BARs and partial BARs.
> 
>   Add the following internal devres functions based on struct
>   pcim_addr_devres:
> 
>     pcim_iomap_region()   # request & map entire BAR
>     pcim_iounmap_region() # unmap & release entire BAR
>     pcim_request_region() # request entire BAR
>     pcim_release_region() # release entire BAR
>     pcim_request_all_regions()    # request all entire BARs
>     pcim_release_all_regions()    # release all entire BARs
> 
>   Rework the following public interfaces using the new infrastructure
>   listed above:
> 
>     pcim_iomap()                      # map partial BAR
>     pcim_iounmap()                    # unmap partial BAR
>     pcim_iomap_regions()              # request & map specified BARs
>     pcim_iomap_regions_request_all()  # request all BARs, map specified BARs
>     pcim_iounmap_regions()            # unmap & release specified BARs
> 
> 
> > This API is hard to understand and potentially bug-provoking.
> > Hence, it
> > should be made more consistent.
> > 
> > This patch adds the necessary infrastructure for partial-BAR
> > mappings
> > managed with devres. That infrastructure also serves as a ground
> > layer
> > for significantly simplifying the PCI devres API in subsequent
> > patches
> > which can then cleanly separate managed and unmanaged API.
> > 
> > When having the long term goal of providing always managed
> > functions
> > prefixed with "pcim_" and never managed functions prefixed with
> > "pci_"
> > and, thus, separating managed and unmanaged APIs cleanly, new PCI
> > devres
> > infrastructure cannot use __pci_request_region() and its wrappers
> > since
> > those would then again interact with PCI devres and, consequently,
> > prevent the managed nature from being removed from the pci_*
> > functions
> > in the first place. Thus, it's necessary to provide an alternative
> > to
> > __pci_request_region().
> > 
> > This patch addresses the following problems of the PCI devres API:
> > 
> >   a) There is no PCI devres infrastructure on which a managed
> >  counterpart to pci_iomap_range() could be based.
> > 
> >   b) The vast majority of the users of plural functions such as
> >  pcim_iomap_regions() only ever sets a single bit in the bit
> > mask,
> >

[PATCH v9 12/13] PCI: Add pcim_iomap_range()

2024-06-13 Thread Philipp Stanner
The only managed mapping function currently is pcim_iomap(), which
doesn't allow mapping an area starting at a certain offset, something
many drivers want.

Add pcim_iomap_range() as an exported function.

Signed-off-by: Philipp Stanner 
---
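One thing worth noting for callers: unlike pci_iomap_range(), which returns NULL on failure, the new function reports errors via IOMEM_ERR_PTR(). Below is a hedged userspace sketch of that error-pointer contract; the helpers are modeled after include/linux/err.h and `toy_pcim_iomap_range()` is a stand-in operating on an ordinary buffer, not the real function:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace model of the ERR_PTR convention pcim_iomap_range() uses:
 * failures come back as IOMEM_ERR_PTR(-E...), never as NULL.
 * MAX_ERRNO and the helpers mirror include/linux/err.h.
 */

#define MAX_ERRNO 4095
#define TOY_EINVAL 22

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	/* Error pointers live in the top MAX_ERRNO bytes of address space. */
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

static char toy_bar[4096]; /* stands in for BAR 0 */

/* Stub: "map" @len bytes at @offset within the fake BAR, failing with an
 * ERR_PTR (not NULL) when the window is out of bounds. */
static void *toy_pcim_iomap_range(unsigned long offset, unsigned long len)
{
	if (offset > sizeof(toy_bar) || len > sizeof(toy_bar) - offset)
		return ERR_PTR(-TOY_EINVAL);
	return toy_bar + offset;
}
```

Callers therefore must check with IS_ERR() rather than comparing against NULL, exactly as the vboxvideo conversion in patch 13/13 does.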
 drivers/pci/devres.c | 44 
 include/linux/pci.h  |  2 ++
 2 files changed, 46 insertions(+)

diff --git a/drivers/pci/devres.c b/drivers/pci/devres.c
index 37ac8fd37291..2f0379a4e58f 100644
--- a/drivers/pci/devres.c
+++ b/drivers/pci/devres.c
@@ -1015,3 +1015,47 @@ void pcim_iounmap_regions(struct pci_dev *pdev, int mask)
}
 }
 EXPORT_SYMBOL(pcim_iounmap_regions);
+
+/**
+ * pcim_iomap_range - Create a ranged __iomem mapping within a PCI BAR
+ * @pdev: PCI device to map IO resources for
+ * @bar: Index of the BAR
+ * @offset: Offset from the beginning of the BAR
+ * @len: Length in bytes for the mapping
+ *
+ * Returns: __iomem pointer on success, an IOMEM_ERR_PTR on failure.
+ *
+ * Creates a new I/O mapping within the specified @bar, ranging from @offset to
+ * @offset + @len.
+ *
+ * The mapping will automatically get unmapped on driver detach. If desired,
+ * release manually only with pcim_iounmap().
+ */
+void __iomem *pcim_iomap_range(struct pci_dev *pdev, int bar,
+   unsigned long offset, unsigned long len)
+{
+   void __iomem *mapping;
+   struct pcim_addr_devres *res;
+
+   res = pcim_addr_devres_alloc(pdev);
+   if (!res)
+   return IOMEM_ERR_PTR(-ENOMEM);
+
+   mapping = pci_iomap_range(pdev, bar, offset, len);
+   if (!mapping) {
+   pcim_addr_devres_free(res);
+   return IOMEM_ERR_PTR(-EINVAL);
+   }
+
+   res->type = PCIM_ADDR_DEVRES_TYPE_MAPPING;
+   res->baseaddr = mapping;
+
+   /*
+* Ranged mappings don't get added to the legacy-table, since the table
+* only ever keeps track of whole BARs.
+*/
+
+   devres_add(&pdev->dev, res);
+   return mapping;
+}
+EXPORT_SYMBOL(pcim_iomap_range);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 0c19f0717899..98893a89bb5b 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -2303,6 +2303,8 @@ int pcim_iomap_regions(struct pci_dev *pdev, int mask, 
const char *name);
 int pcim_iomap_regions_request_all(struct pci_dev *pdev, int mask,
   const char *name);
 void pcim_iounmap_regions(struct pci_dev *pdev, int mask);
+void __iomem *pcim_iomap_range(struct pci_dev *pdev, int bar,
+   unsigned long offset, unsigned long len);
 
 extern int pci_pci_problems;
 #define PCIPCI_FAIL1   /* No PCI PCI DMA */
-- 
2.45.0



[PATCH v9 11/13] PCI: Remove legacy pcim_release()

2024-06-13 Thread Philipp Stanner
Thanks to preceding cleanup steps, pcim_release() is no longer needed
and can be replaced by pcim_disable_device(), which is the exact
counterpart to pcim_enable_device().

This permits removing further parts of the old PCI devres implementation.

Replace pcim_release() with pcim_disable_device().
Remove the now surplus function get_pci_dr().
Remove the struct pci_devres from pci.h.

Signed-off-by: Philipp Stanner 
---
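The reworked pcim_enable_device() in this patch registers its cleanup action *before* enabling and removes it again on failure (rather than using devm_add_action_or_reset(), for the reason given in the diff's comment). A userspace sketch of that control flow, with a toy devres action stack; all names mirror the kernel API but none of this is the real implementation:

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ACTIONS 8

typedef void (*action_fn)(void *);

/* Toy devres action stack; actions run in reverse order on detach. */
static struct { action_fn fn; void *data; } actions[MAX_ACTIONS];
static int nr_actions;

static int toy_devm_add_action(action_fn fn, void *data)
{
	if (nr_actions == MAX_ACTIONS)
		return -1; /* -ENOMEM in the kernel */
	actions[nr_actions].fn = fn;
	actions[nr_actions].data = data;
	nr_actions++;
	return 0;
}

static void toy_devm_remove_action(action_fn fn, void *data)
{
	for (int i = nr_actions - 1; i >= 0; i--) {
		if (actions[i].fn == fn && actions[i].data == data) {
			for (; i < nr_actions - 1; i++)
				actions[i] = actions[i + 1];
			nr_actions--;
			return;
		}
	}
}

static void toy_driver_detach(void)
{
	while (nr_actions > 0) {
		nr_actions--;
		actions[nr_actions].fn(actions[nr_actions].data);
	}
}

/* A toy device with the two bits pcim_disable_device() looks at. */
struct toy_pci_dev { bool enabled; bool pinned; };

static void toy_pcim_disable_device(void *pdev_raw)
{
	struct toy_pci_dev *pdev = pdev_raw;

	if (!pdev->pinned)
		pdev->enabled = false;
}

/* Mirrors the patch's control flow: register cleanup first, undo the
 * registration if enabling fails. @enable_fails injects the error. */
static int toy_pcim_enable_device(struct toy_pci_dev *pdev, bool enable_fails)
{
	int ret = toy_devm_add_action(toy_pcim_disable_device, pdev);
	if (ret)
		return ret;

	if (enable_fails) {
		toy_devm_remove_action(toy_pcim_disable_device, pdev);
		return -1;
	}

	pdev->enabled = true;
	return 0;
}
```

The pinned check in the action is what lets pcim_pin_device() keep a device enabled across detach without any devres bookkeeping of its own.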
 drivers/pci/devres.c | 53 +---
 drivers/pci/pci.h| 16 -
 2 files changed, 25 insertions(+), 44 deletions(-)

diff --git a/drivers/pci/devres.c b/drivers/pci/devres.c
index 7b72c952a9e5..37ac8fd37291 100644
--- a/drivers/pci/devres.c
+++ b/drivers/pci/devres.c
@@ -465,48 +465,45 @@ int pcim_intx(struct pci_dev *pdev, int enable)
return 0;
 }
 
-static void pcim_release(struct device *gendev, void *res)
+static void pcim_disable_device(void *pdev_raw)
 {
-   struct pci_dev *dev = to_pci_dev(gendev);
-
-   if (pci_is_enabled(dev) && !dev->pinned)
-   pci_disable_device(dev);
-}
-
-static struct pci_devres *get_pci_dr(struct pci_dev *pdev)
-{
-   struct pci_devres *dr, *new_dr;
-
-   dr = devres_find(&pdev->dev, pcim_release, NULL, NULL);
-   if (dr)
-   return dr;
+   struct pci_dev *pdev = pdev_raw;
 
-   new_dr = devres_alloc(pcim_release, sizeof(*new_dr), GFP_KERNEL);
-   if (!new_dr)
-   return NULL;
-   return devres_get(&pdev->dev, new_dr, NULL, NULL);
+   if (!pdev->pinned)
+   pci_disable_device(pdev);
 }
 
 /**
  * pcim_enable_device - Managed pci_enable_device()
  * @pdev: PCI device to be initialized
  *
- * Managed pci_enable_device().
+ * Returns: 0 on success, negative error code on failure.
+ *
+ * Managed pci_enable_device(). Device will automatically be disabled on
+ * driver detach.
  */
 int pcim_enable_device(struct pci_dev *pdev)
 {
-   struct pci_devres *dr;
-   int rc;
+   int ret;
 
-   dr = get_pci_dr(pdev);
-   if (unlikely(!dr))
-   return -ENOMEM;
+   ret = devm_add_action(&pdev->dev, pcim_disable_device, pdev);
+   if (ret != 0)
+   return ret;
 
-   rc = pci_enable_device(pdev);
-   if (!rc)
-   pdev->is_managed = 1;
+   /*
+* We prefer removing the action in case of an error over
+* devm_add_action_or_reset() because the latter could theoretically be
+* disturbed by users having pinned the device too soon.
+*/
+   ret = pci_enable_device(pdev);
+   if (ret != 0) {
+   devm_remove_action(&pdev->dev, pcim_disable_device, pdev);
+   return ret;
+   }
 
-   return rc;
+   pdev->is_managed = true;
+
+   return ret;
 }
 EXPORT_SYMBOL(pcim_enable_device);
 
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 9e87528f1157..e51e6fa79fcc 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -810,22 +810,6 @@ static inline pci_power_t mid_pci_get_power_state(struct 
pci_dev *pdev)
 }
 #endif
 
-/*
- * Managed PCI resources.  This manages device on/off, INTx/MSI/MSI-X
- * on/off and BAR regions.  pci_dev itself records MSI/MSI-X status, so
- * there's no need to track it separately.  pci_devres is initialized
- * when a device is enabled using managed PCI device enable interface.
- *
- * TODO: Struct pci_devres only needs to be here because they're used in pci.c.
- * Port or move these functions to devres.c and then remove them from here.
- */
-struct pci_devres {
-   /*
-* TODO:
-* This struct is now surplus. Remove it by refactoring pci/devres.c
-*/
-};
-
 int pcim_intx(struct pci_dev *dev, int enable);
 
 int pcim_request_region(struct pci_dev *pdev, int bar, const char *name);
-- 
2.45.0



[PATCH v9 09/13] PCI: Give pcim_set_mwi() its own devres callback

2024-06-13 Thread Philipp Stanner
Managing pci_set_mwi() with devres can easily be done with its own
callback, without needing to store any state about it in a
device-related struct.

Remove the MWI state from struct pci_devres.
Give pcim_set_mwi() a separate devres-callback.

Signed-off-by: Philipp Stanner 
---
 drivers/pci/devres.c | 29 ++---
 drivers/pci/pci.h|  1 -
 2 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/drivers/pci/devres.c b/drivers/pci/devres.c
index 84caa0034813..e8de93e95eb6 100644
--- a/drivers/pci/devres.c
+++ b/drivers/pci/devres.c
@@ -366,24 +366,34 @@ void __iomem *devm_pci_remap_cfg_resource(struct device 
*dev,
 }
 EXPORT_SYMBOL(devm_pci_remap_cfg_resource);
 
+static void __pcim_clear_mwi(void *pdev_raw)
+{
+   struct pci_dev *pdev = pdev_raw;
+
+   pci_clear_mwi(pdev);
+}
+
 /**
  * pcim_set_mwi - a device-managed pci_set_mwi()
- * @dev: the PCI device for which MWI is enabled
+ * @pdev: the PCI device for which MWI is enabled
  *
  * Managed pci_set_mwi().
  *
  * RETURNS: An appropriate -ERRNO error value on error, or zero for success.
  */
-int pcim_set_mwi(struct pci_dev *dev)
+int pcim_set_mwi(struct pci_dev *pdev)
 {
-   struct pci_devres *dr;
+   int ret;
 
-   dr = find_pci_dr(dev);
-   if (!dr)
-   return -ENOMEM;
+   ret = devm_add_action(&pdev->dev, __pcim_clear_mwi, pdev);
+   if (ret != 0)
+   return ret;
+
+   ret = pci_set_mwi(pdev);
+   if (ret != 0)
+   devm_remove_action(&pdev->dev, __pcim_clear_mwi, pdev);
 
-   dr->mwi = 1;
-   return pci_set_mwi(dev);
+   return ret;
 }
 EXPORT_SYMBOL(pcim_set_mwi);
 
@@ -397,9 +407,6 @@ static void pcim_release(struct device *gendev, void *res)
struct pci_dev *dev = to_pci_dev(gendev);
struct pci_devres *this = res;
 
-   if (this->mwi)
-   pci_clear_mwi(dev);
-
if (this->restore_intx)
pci_intx(dev, this->orig_intx);
 
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 6e02ba1b5947..c355bb6a698d 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -823,7 +823,6 @@ static inline pci_power_t mid_pci_get_power_state(struct 
pci_dev *pdev)
 struct pci_devres {
unsigned int orig_intx:1;
unsigned int restore_intx:1;
-   unsigned int mwi:1;
 };
 
 struct pci_devres *find_pci_dr(struct pci_dev *pdev);
-- 
2.45.0



[PATCH v9 08/13] PCI: Move pinned status bit to struct pci_dev

2024-06-13 Thread Philipp Stanner
The bit describing whether the PCI device is currently pinned is stored
in struct pci_devres. To clean up and simplify the PCI devres API, it's
better if this information is stored in struct pci_dev.

This will later permit simplifying pcim_enable_device().

Move the 'pinned' boolean bit to struct pci_dev.

Restructure bits in struct pci_dev so the pm / pme fields are next to
each other.

Signed-off-by: Philipp Stanner 
---
 drivers/pci/devres.c | 14 --
 drivers/pci/pci.h|  1 -
 include/linux/pci.h  |  3 ++-
 3 files changed, 6 insertions(+), 12 deletions(-)

diff --git a/drivers/pci/devres.c b/drivers/pci/devres.c
index 643e3a94a1d6..84caa0034813 100644
--- a/drivers/pci/devres.c
+++ b/drivers/pci/devres.c
@@ -403,7 +403,7 @@ static void pcim_release(struct device *gendev, void *res)
if (this->restore_intx)
pci_intx(dev, this->orig_intx);
 
-   if (pci_is_enabled(dev) && !this->pinned)
+   if (pci_is_enabled(dev) && !dev->pinned)
pci_disable_device(dev);
 }
 
@@ -459,18 +459,12 @@ EXPORT_SYMBOL(pcim_enable_device);
  * pcim_pin_device - Pin managed PCI device
  * @pdev: PCI device to pin
  *
- * Pin managed PCI device @pdev.  Pinned device won't be disabled on
- * driver detach.  @pdev must have been enabled with
- * pcim_enable_device().
+ * Pin managed PCI device @pdev. Pinned device won't be disabled on driver
+ * detach. @pdev must have been enabled with pcim_enable_device().
  */
 void pcim_pin_device(struct pci_dev *pdev)
 {
-   struct pci_devres *dr;
-
-   dr = find_pci_dr(pdev);
-   WARN_ON(!dr || !pci_is_enabled(pdev));
-   if (dr)
-   dr->pinned = 1;
+   pdev->pinned = true;
 }
 EXPORT_SYMBOL(pcim_pin_device);
 
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index d7f00b43b098..6e02ba1b5947 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -821,7 +821,6 @@ static inline pci_power_t mid_pci_get_power_state(struct 
pci_dev *pdev)
  * then remove them from here.
  */
 struct pci_devres {
-   unsigned int pinned:1;
unsigned int orig_intx:1;
unsigned int restore_intx:1;
unsigned int mwi:1;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index fb004fd4e889..0c19f0717899 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -367,10 +367,11 @@ struct pci_dev {
   this is D0-D3, D0 being fully
   functional, and D3 being off. */
u8  pm_cap; /* PM capability offset */
-   unsigned intimm_ready:1;/* Supports Immediate Readiness */
unsigned intpme_support:5;  /* Bitmask of states from which PME#
   can be generated */
unsigned intpme_poll:1; /* Poll device's PME status bit */
+   unsigned intpinned:1;   /* Whether this dev is pinned */
+   unsigned intimm_ready:1;/* Supports Immediate Readiness */
unsigned intd1_support:1;   /* Low power state D1 is supported */
unsigned intd2_support:1;   /* Low power state D2 is supported */
unsigned intno_d1d2:1;  /* D1 and D2 are forbidden */
-- 
2.45.0



[PATCH v9 06/13] PCI: Warn users about complicated devres nature

2024-06-13 Thread Philipp Stanner
The PCI region-request functions become managed functions when
pcim_enable_device() has been called previously instead of
pci_enable_device().

This has already caused a bug (in 8558de401b5f) by confusing users, who
came to believe that all PCI functions, such as pci_iomap_range(), are
suddenly managed that way, which is not the case.

Add comments to the relevant functions' docstrings that warn users about
this behavior.

Link: https://lore.kernel.org/r/20240605081605.18769-8-pstan...@redhat.com
Signed-off-by: Philipp Stanner 
Signed-off-by: Bjorn Helgaas 
---
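The pitfall these docstring notes warn about can be sketched in a few lines of userspace C. All names are stand-ins (not the real kernel functions); the point is only that devres has no record of mappings made with a never-managed function, regardless of how the device was enabled:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace model of the 8558de401b5f-style bug: pci_iomap_range() is
 * never managed, even after pcim_enable_device().
 */

static bool is_managed;    /* models pdev->is_managed */
static int  live_mappings; /* mappings that were never unmapped */
static char toy_bar[64];

static void toy_pcim_enable_device(void) { is_managed = true; }

/* Stub pci_iomap_range(): creates a mapping that devres does NOT track. */
static void *toy_pci_iomap_range(void)
{
	live_mappings++;
	return toy_bar;
}

/* Models driver detach: devres tears down its own resources, but it has
 * no record of this mapping, managed device or not. */
static void toy_driver_detach(void)
{
	/* intentionally nothing to unmap here */
}
```

A driver author who assumed "pcim_enable_device() makes everything managed" would see exactly this leak, which is what patch 13/13 fixes in vboxvideo.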
 drivers/pci/iomap.c | 16 
 drivers/pci/pci.c   | 42 +-
 2 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/iomap.c b/drivers/pci/iomap.c
index c9725428e387..a715a4803c95 100644
--- a/drivers/pci/iomap.c
+++ b/drivers/pci/iomap.c
@@ -23,6 +23,10 @@
  *
  * @maxlen specifies the maximum length to map. If you want to get access to
  * the complete BAR from offset to the end, pass %0 here.
+ *
+ * NOTE:
+ * This function is never managed, even if you initialized with
+ * pcim_enable_device().
  * */
 void __iomem *pci_iomap_range(struct pci_dev *dev,
  int bar,
@@ -63,6 +67,10 @@ EXPORT_SYMBOL(pci_iomap_range);
  *
  * @maxlen specifies the maximum length to map. If you want to get access to
  * the complete BAR from offset to the end, pass %0 here.
+ *
+ * NOTE:
+ * This function is never managed, even if you initialized with
+ * pcim_enable_device().
  * */
 void __iomem *pci_iomap_wc_range(struct pci_dev *dev,
 int bar,
@@ -106,6 +114,10 @@ EXPORT_SYMBOL_GPL(pci_iomap_wc_range);
  *
  * @maxlen specifies the maximum length to map. If you want to get access to
  * the complete BAR without checking for its length first, pass %0 here.
+ *
+ * NOTE:
+ * This function is never managed, even if you initialized with
+ * pcim_enable_device(). If you need automatic cleanup, use pcim_iomap().
  * */
 void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned long maxlen)
 {
@@ -127,6 +139,10 @@ EXPORT_SYMBOL(pci_iomap);
  *
  * @maxlen specifies the maximum length to map. If you want to get access to
  * the complete BAR without checking for its length first, pass %0 here.
+ *
+ * NOTE:
+ * This function is never managed, even if you initialized with
+ * pcim_enable_device().
  * */
 void __iomem *pci_iomap_wc(struct pci_dev *dev, int bar, unsigned long maxlen)
 {
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 7013699db242..5e4f377411ec 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -3900,6 +3900,8 @@ EXPORT_SYMBOL(pci_release_region);
  * @res_name: Name to be associated with resource.
  * @exclusive: whether the region access is exclusive or not
  *
+ * Returns: 0 on success, negative error code on failure.
+ *
  * Mark the PCI region associated with PCI device @pdev BAR @bar as
  * being reserved by owner @res_name.  Do not access any
  * address inside the PCI regions unless this call returns
@@ -3950,6 +3952,8 @@ static int __pci_request_region(struct pci_dev *pdev, int 
bar,
  * @bar: BAR to be reserved
  * @res_name: Name to be associated with resource
  *
+ * Returns: 0 on success, negative error code on failure.
+ *
  * Mark the PCI region associated with PCI device @pdev BAR @bar as
  * being reserved by owner @res_name.  Do not access any
  * address inside the PCI regions unless this call returns
@@ -3957,6 +3961,11 @@ static int __pci_request_region(struct pci_dev *pdev, 
int bar,
  *
  * Returns 0 on success, or %EBUSY on error.  A warning
  * message is also printed on failure.
+ *
+ * NOTE:
+ * This is a "hybrid" function: It's normally unmanaged, but becomes managed
+ * when pcim_enable_device() has been called in advance. This hybrid feature is
+ * DEPRECATED! If you want managed cleanup, use the pcim_* functions instead.
  */
 int pci_request_region(struct pci_dev *pdev, int bar, const char *res_name)
 {
@@ -4007,6 +4016,13 @@ static int __pci_request_selected_regions(struct pci_dev 
*pdev, int bars,
  * @pdev: PCI device whose resources are to be reserved
  * @bars: Bitmask of BARs to be requested
  * @res_name: Name to be associated with resource
+ *
+ * Returns: 0 on success, negative error code on failure.
+ *
+ * NOTE:
+ * This is a "hybrid" function: It's normally unmanaged, but becomes managed
+ * when pcim_enable_device() has been called in advance. This hybrid feature is
+ * DEPRECATED! If you want managed cleanup, use the pcim_* functions instead.
  */
 int pci_request_selected_regions(struct pci_dev *pdev, int bars,
 const char *res_name)
@@ -4015,6 +4031,19 @@ int pci_request_selected_regions(struct pci_dev *pdev, 
int bars,
 }
 EXPORT_SYMBOL(pci_request_selected_regions);
 
+/**
+ * pci_request_selected_regions_exclusive - Request regions exclusively
+ * @pdev: PCI device to request regions from
+ *

[PATCH v9 07/13] PCI: Remove enabled status bit from pci_devres

2024-06-13 Thread Philipp Stanner
The PCI devres implementation has a separate boolean to track whether a
device is enabled. That, however, can easily be tracked in an agnostic
manner through the function pci_is_enabled().

Using it allows for simplifying the PCI devres implementation.

Replace the separate 'enabled' status bit from struct pci_devres with
calls to pci_is_enabled() at the appropriate places.

Signed-off-by: Philipp Stanner 
---
 drivers/pci/devres.c | 11 ---
 drivers/pci/pci.c|  6 --
 drivers/pci/pci.h|  1 -
 3 files changed, 4 insertions(+), 14 deletions(-)

diff --git a/drivers/pci/devres.c b/drivers/pci/devres.c
index d90bed785c3f..643e3a94a1d6 100644
--- a/drivers/pci/devres.c
+++ b/drivers/pci/devres.c
@@ -403,7 +403,7 @@ static void pcim_release(struct device *gendev, void *res)
if (this->restore_intx)
pci_intx(dev, this->orig_intx);
 
-   if (this->enabled && !this->pinned)
+   if (pci_is_enabled(dev) && !this->pinned)
pci_disable_device(dev);
 }
 
@@ -446,14 +446,11 @@ int pcim_enable_device(struct pci_dev *pdev)
dr = get_pci_dr(pdev);
if (unlikely(!dr))
return -ENOMEM;
-   if (dr->enabled)
-   return 0;
 
rc = pci_enable_device(pdev);
-   if (!rc) {
+   if (!rc)
pdev->is_managed = 1;
-   dr->enabled = 1;
-   }
+
return rc;
 }
 EXPORT_SYMBOL(pcim_enable_device);
@@ -471,7 +468,7 @@ void pcim_pin_device(struct pci_dev *pdev)
struct pci_devres *dr;
 
dr = find_pci_dr(pdev);
-   WARN_ON(!dr || !dr->enabled);
+   WARN_ON(!dr || !pci_is_enabled(pdev));
if (dr)
dr->pinned = 1;
 }
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 5e4f377411ec..db2cc48f3d63 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2218,12 +2218,6 @@ void pci_disable_enabled_device(struct pci_dev *dev)
  */
 void pci_disable_device(struct pci_dev *dev)
 {
-   struct pci_devres *dr;
-
-   dr = find_pci_dr(dev);
-   if (dr)
-   dr->enabled = 0;
-
dev_WARN_ONCE(&dev->dev, atomic_read(&dev->enable_cnt) <= 0,
  "disabling already-disabled device");
 
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 2403c5a0ff7a..d7f00b43b098 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -821,7 +821,6 @@ static inline pci_power_t mid_pci_get_power_state(struct 
pci_dev *pdev)
  * then remove them from here.
  */
 struct pci_devres {
-   unsigned int enabled:1;
unsigned int pinned:1;
unsigned int orig_intx:1;
unsigned int restore_intx:1;
-- 
2.45.0



[PATCH v9 05/13] PCI: Make devres region requests consistent

2024-06-13 Thread Philipp Stanner
Now that pure managed region request functions are available, the
implementation of the hybrid functions, which are only sometimes
managed, can be made more consistent and readable by wrapping those
always-managed functions.

Implement pcim_request_region_exclusive() as a PCI-internal helper.  Have
the PCI request / release functions call their pcim_ counterparts.  Remove
the now surplus region_mask from struct pci_devres.

Link: https://lore.kernel.org/r/20240605081605.18769-7-pstan...@redhat.com
Signed-off-by: Philipp Stanner 
Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/devres.c | 53 ++--
 drivers/pci/pci.c| 47 +--
 drivers/pci/pci.h| 10 -
 3 files changed, 45 insertions(+), 65 deletions(-)

diff --git a/drivers/pci/devres.c b/drivers/pci/devres.c
index 5ecffc7424ed..d90bed785c3f 100644
--- a/drivers/pci/devres.c
+++ b/drivers/pci/devres.c
@@ -24,18 +24,15 @@
  *
  *Consequently, in the new API, region requests performed by the pcim_
  *functions are automatically cleaned up through the devres callback
- *pcim_addr_resource_release(), while requests performed by
- *pcim_enable_device() + pci_*region*() are automatically cleaned up
- *through the for-loop in pcim_release().
+ *pcim_addr_resource_release().
+ *Users utilizing pcim_enable_device() + pci_*region*() are redirected in
+ *pci.c to the managed functions here in this file. This isn't exactly
+ *perfect, but the only alternative way would be to port ALL drivers using
+ *said combination to pcim_ functions.
  *
- * TODO 1:
+ * TODO:
  * Remove the legacy table entirely once all calls to pcim_iomap_table() in
  * the kernel have been removed.
- *
- * TODO 2:
- * Port everyone calling pcim_enable_device() + pci_*region*() to using the
- * pcim_ functions. Then, remove all devres functionality from pci_*region*()
- * functions and remove the associated cleanups described above in point #2.
  */
 
 /*
@@ -399,22 +396,6 @@ static void pcim_release(struct device *gendev, void *res)
 {
struct pci_dev *dev = to_pci_dev(gendev);
struct pci_devres *this = res;
-   int i;
-
-   /*
-* This is legacy code.
-*
-* All regions requested by a pcim_ function do get released through
-* pcim_addr_resource_release(). Thanks to the hybrid nature of the pci_
-* region-request functions, this for-loop has to release the regions
-* if they have been requested by such a function.
-*
-* TODO: Remove this once all users of pcim_enable_device() PLUS
-* pci-region-request-functions have been ported to pcim_ functions.
-*/
-   for (i = 0; i < DEVICE_COUNT_RESOURCE; i++)
-   if (mask_contains_bar(this->region_mask, i))
-   pci_release_region(dev, i);
 
if (this->mwi)
pci_clear_mwi(dev);
@@ -823,11 +804,29 @@ static int _pcim_request_region(struct pci_dev *pdev, int 
bar, const char *name,
  * The region will automatically be released on driver detach. If desired,
  * release manually only with pcim_release_region().
  */
-static int pcim_request_region(struct pci_dev *pdev, int bar, const char *name)
+int pcim_request_region(struct pci_dev *pdev, int bar, const char *name)
 {
return _pcim_request_region(pdev, bar, name, 0);
 }
 
+/**
+ * pcim_request_region_exclusive - Request a PCI BAR exclusively
+ * @pdev: PCI device to request region for
+ * @bar: Index of BAR to request
+ * @name: Name associated with the request
+ *
+ * Returns: 0 on success, a negative error code on failure.
+ *
+ * Request region specified by @bar exclusively.
+ *
+ * The region will automatically be released on driver detach. If desired,
+ * release manually only with pcim_release_region().
+ */
+int pcim_request_region_exclusive(struct pci_dev *pdev, int bar, const char 
*name)
+{
+   return _pcim_request_region(pdev, bar, name, IORESOURCE_EXCLUSIVE);
+}
+
 /**
  * pcim_release_region - Release a PCI BAR
  * @pdev: PCI device to operate on
@@ -836,7 +835,7 @@ static int pcim_request_region(struct pci_dev *pdev, int 
bar, const char *name)
  * Release a region manually that was previously requested by
  * pcim_request_region().
  */
-static void pcim_release_region(struct pci_dev *pdev, int bar)
+void pcim_release_region(struct pci_dev *pdev, int bar)
 {
struct pcim_addr_devres res_searched;
 
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index d94445f5f882..7013699db242 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -3872,7 +3872,15 @@ EXPORT_SYMBOL(pci_enable_atomic_ops_to_root);
  */
 void pci_release_region(struct pci_dev *pdev, int bar)
 {
-   struct pci_devres *dr;
+   /*
+* This is done for backwards compatibility, because the old PCI devres
+* API had a mode in which the function became managed if it had been
+

[PATCH v9 10/13] PCI: Give pci_intx() its own devres callback

2024-06-13 Thread Philipp Stanner
pci_intx() is one of the functions that have "hybrid mode" (i.e.,
sometimes managed, sometimes not). Providing a separate pcim_intx()
function with its own device resource and cleanup callback allows for
removing further large parts of the legacy PCI devres implementation.

As with the region-request functions, pci_intx() has to call into its
managed counterpart for backwards compatibility.

As pci_intx() is an outdated function, pcim_intx() shall not be made
visible to drivers via a public API.

Implement pcim_intx() with its own device resource.
Make pci_intx() call pcim_intx() in the managed case.
Remove the now surplus function find_pci_dr().

Signed-off-by: Philipp Stanner 
---
 drivers/pci/devres.c | 76 
 drivers/pci/pci.c| 21 ++--
 drivers/pci/pci.h| 13 
 3 files changed, 80 insertions(+), 30 deletions(-)

diff --git a/drivers/pci/devres.c b/drivers/pci/devres.c
index e8de93e95eb6..7b72c952a9e5 100644
--- a/drivers/pci/devres.c
+++ b/drivers/pci/devres.c
@@ -42,6 +42,11 @@ struct pcim_iomap_devres {
void __iomem *table[PCI_STD_NUM_BARS];
 };
 
+/* Used to restore the old intx state on driver detach. */
+struct pcim_intx_devres {
+   int orig_intx;
+};
+
 enum pcim_addr_devres_type {
/* Default initializer. */
PCIM_ADDR_DEVRES_TYPE_INVALID,
@@ -397,32 +402,75 @@ int pcim_set_mwi(struct pci_dev *pdev)
 }
 EXPORT_SYMBOL(pcim_set_mwi);
 
+
 static inline bool mask_contains_bar(int mask, int bar)
 {
return mask & BIT(bar);
 }
 
-static void pcim_release(struct device *gendev, void *res)
+static void pcim_intx_restore(struct device *dev, void *data)
 {
-   struct pci_dev *dev = to_pci_dev(gendev);
-   struct pci_devres *this = res;
+   struct pci_dev *pdev = to_pci_dev(dev);
+   struct pcim_intx_devres *res = data;
 
-   if (this->restore_intx)
-   pci_intx(dev, this->orig_intx);
+   pci_intx(pdev, res->orig_intx);
+}
 
-   if (pci_is_enabled(dev) && !dev->pinned)
-   pci_disable_device(dev);
+static struct pcim_intx_devres *get_or_create_intx_devres(struct device *dev)
+{
+   struct pcim_intx_devres *res;
+
+   res = devres_find(dev, pcim_intx_restore, NULL, NULL);
+   if (res)
+   return res;
+
+   res = devres_alloc(pcim_intx_restore, sizeof(*res), GFP_KERNEL);
+   if (res)
+   devres_add(dev, res);
+
+   return res;
 }
 
-/*
- * TODO: After the last four callers in pci.c are ported, find_pci_dr()
- * needs to be made static again.
+/**
+ * pcim_intx - managed pci_intx()
+ * @pdev: the PCI device to operate on
+ * @enable: boolean: whether to enable or disable PCI INTx
+ *
+ * Returns: 0 on success, -ENOMEM on error.
+ *
+ * Enables/disables PCI INTx for device @pdev.
+ * Restores the original state on driver detach.
  */
-struct pci_devres *find_pci_dr(struct pci_dev *pdev)
+int pcim_intx(struct pci_dev *pdev, int enable)
 {
-   if (pci_is_managed(pdev))
-   return devres_find(&pdev->dev, pcim_release, NULL, NULL);
-   return NULL;
+   u16 pci_command, new;
+   struct pcim_intx_devres *res;
+
+   res = get_or_create_intx_devres(&pdev->dev);
+   if (!res)
+   return -ENOMEM;
+
+   res->orig_intx = !enable;
+
+   pci_read_config_word(pdev, PCI_COMMAND, &pci_command);
+
+   if (enable)
+   new = pci_command & ~PCI_COMMAND_INTX_DISABLE;
+   else
+   new = pci_command | PCI_COMMAND_INTX_DISABLE;
+
+   if (new != pci_command)
+   pci_write_config_word(pdev, PCI_COMMAND, new);
+
+   return 0;
+}
+
+static void pcim_release(struct device *gendev, void *res)
+{
+   struct pci_dev *dev = to_pci_dev(gendev);
+
+   if (pci_is_enabled(dev) && !dev->pinned)
+   pci_disable_device(dev);
 }
 
 static struct pci_devres *get_pci_dr(struct pci_dev *pdev)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index db2cc48f3d63..1b4832a60047 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4443,6 +4443,16 @@ void pci_intx(struct pci_dev *pdev, int enable)
 {
u16 pci_command, new;
 
+   /*
+* This is done for backwards compatibility, because the old PCI devres
+* API had a mode in which this function became managed if the dev had
+* been enabled with pcim_enable_device() instead of 
pci_enable_device().
+*/
+   if (pci_is_managed(pdev)) {
+   WARN_ON_ONCE(pcim_intx(pdev, enable) != 0);
+   return;
+   }
+
pci_read_config_word(pdev, PCI_COMMAND, &pci_command);
 
if (enable)
@@ -4450,17 +4460,8 @@ void pci_intx(struct pci_dev *pdev, int enable)
else
new = pci_command | PCI_COMMAND_INTX_DISABLE;
 
-   if (new != pci_command) {
-   struct pci_devres *dr;
-
+   if (new !=

[PATCH v9 13/13] drm/vboxvideo: fix mapping leaks

2024-06-13 Thread Philipp Stanner
When the PCI devres API was introduced to this driver, it was wrongly
assumed that initializing the device with pcim_enable_device() instead
of pci_enable_device() would make all PCI functions managed.

This is wrong and was caused by the quite confusing PCI devres API in
which some, but not all, functions become managed that way.

The function pci_iomap_range() is never managed.

Replace pci_iomap_range() with the actually managed function
pcim_iomap_range().

Fixes: 8558de401b5f ("drm/vboxvideo: use managed pci functions")
Signed-off-by: Philipp Stanner 
Reviewed-by: Hans de Goede 
---
 drivers/gpu/drm/vboxvideo/vbox_main.c | 20 +---
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/vboxvideo/vbox_main.c 
b/drivers/gpu/drm/vboxvideo/vbox_main.c
index 42c2d8a99509..d4ade9325401 100644
--- a/drivers/gpu/drm/vboxvideo/vbox_main.c
+++ b/drivers/gpu/drm/vboxvideo/vbox_main.c
@@ -42,12 +42,11 @@ static int vbox_accel_init(struct vbox_private *vbox)
/* Take a command buffer for each screen from the end of usable VRAM. */
vbox->available_vram_size -= vbox->num_crtcs * VBVA_MIN_BUFFER_SIZE;
 
-   vbox->vbva_buffers = pci_iomap_range(pdev, 0,
-vbox->available_vram_size,
-vbox->num_crtcs *
-VBVA_MIN_BUFFER_SIZE);
-   if (!vbox->vbva_buffers)
-   return -ENOMEM;
+   vbox->vbva_buffers = pcim_iomap_range(
+   pdev, 0, vbox->available_vram_size,
+   vbox->num_crtcs * VBVA_MIN_BUFFER_SIZE);
+   if (IS_ERR(vbox->vbva_buffers))
+   return PTR_ERR(vbox->vbva_buffers);
 
for (i = 0; i < vbox->num_crtcs; ++i) {
vbva_setup_buffer_context(&vbox->vbva_info[i],
@@ -116,11 +115,10 @@ int vbox_hw_init(struct vbox_private *vbox)
DRM_INFO("VRAM %08x\n", vbox->full_vram_size);
 
/* Map guest-heap at end of vram */
-   vbox->guest_heap =
-   pci_iomap_range(pdev, 0, GUEST_HEAP_OFFSET(vbox),
-   GUEST_HEAP_SIZE);
-   if (!vbox->guest_heap)
-   return -ENOMEM;
+   vbox->guest_heap = pcim_iomap_range(pdev, 0,
+   GUEST_HEAP_OFFSET(vbox), GUEST_HEAP_SIZE);
+   if (IS_ERR(vbox->guest_heap))
+   return PTR_ERR(vbox->guest_heap);
 
/* Create guest-heap mem-pool use 2^4 = 16 byte chunks */
vbox->guest_pool = devm_gen_pool_create(vbox->ddev.dev, 4, -1,
-- 
2.45.0



[PATCH v9 04/13] PCI: Deprecate two surplus devres functions

2024-06-13 Thread Philipp Stanner
pcim_iomap_table() should not be used anymore because it contributed to the
PCI devres API being designed contrary to devres's design goals.

pcim_iomap_regions_request_all() is a surplus, complicated function that
can easily be replaced by using a pcim_* request function in combination
with a pcim_* mapping function.

Mark pcim_iomap_table() and pcim_iomap_regions_request_all() as deprecated
in the function documentation.

Link: https://lore.kernel.org/r/20240605081605.18769-6-pstan...@redhat.com
Signed-off-by: Philipp Stanner 
Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/devres.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/devres.c b/drivers/pci/devres.c
index cf2c11b54ca6..5ecffc7424ed 100644
--- a/drivers/pci/devres.c
+++ b/drivers/pci/devres.c
@@ -507,7 +507,7 @@ static void pcim_iomap_release(struct device *gendev, void 
*res)
 }
 
 /**
- * pcim_iomap_table - access iomap allocation table
+ * pcim_iomap_table - access iomap allocation table (DEPRECATED)
  * @pdev: PCI device to access iomap table for
  *
  * Returns:
@@ -521,6 +521,11 @@ static void pcim_iomap_release(struct device *gendev, void 
*res)
  * This function might sleep when the table is first allocated but can
  * be safely called without context and guaranteed to succeed once
  * allocated.
+ *
+ * This function is DEPRECATED. Do not use it in new code. Instead, obtain a
+ * mapping's address directly from one of the pcim_* mapping functions. For
+ * example:
+ * void __iomem *mappy = pcim_iomap(pdev, bar, length);
  */
 void __iomem * const *pcim_iomap_table(struct pci_dev *pdev)
 {
@@ -894,6 +899,7 @@ static int pcim_request_all_regions(struct pci_dev *pdev, 
const char *name)
 
 /**
  * pcim_iomap_regions_request_all - Request all BARs and iomap specified ones
+ * (DEPRECATED)
  * @pdev: PCI device to map IO resources for
  * @mask: Mask of BARs to iomap
  * @name: Name associated with the requests
@@ -904,6 +910,10 @@ static int pcim_request_all_regions(struct pci_dev *pdev, 
const char *name)
  *
  * To release these resources manually, call pcim_release_region() for the
  * regions and pcim_iounmap() for the mappings.
+ *
+ * This function is DEPRECATED. Don't use it in new code. Instead, use one
+ * of the pcim_* region request functions in combination with a pcim_*
+ * mapping function.
  */
 int pcim_iomap_regions_request_all(struct pci_dev *pdev, int mask,
   const char *name)
-- 
2.45.0



[PATCH v9 03/13] PCI: Add partial-BAR devres support

2024-06-13 Thread Philipp Stanner
With the current PCI devres API implementing a managed version of
pci_iomap_range() is impossible.

Furthermore, the PCI devres API currently is inconsistent and
complicated. This is in large part because some hybrid functions are
only sometimes managed via devres, and because several functions IO-map
and request multiple BARs at once, returning the mappings through a
separately administrated table.

This table's indexing mechanism does not support partial-BAR mappings.

Another notable problem is that there are no separate managed
counterparts for region-request functions such as pci_request_region(),
as they exist for other PCI functions (e.g., pci_iomap() <->
pcim_iomap()). Instead, functions based on __pci_request_region() change
their internal behavior and suddenly become managed functions when
pcim_enable_device() instead of pci_enable_device() is used.

This API is hard to understand and potentially bug-provoking. Hence, it
should be made more consistent.

This patch adds the necessary infrastructure for partial-BAR mappings
managed with devres. That infrastructure also serves as a ground layer
for significantly simplifying the PCI devres API in subsequent patches
which can then cleanly separate managed and unmanaged API.

The long-term goal is to provide always-managed functions prefixed with
"pcim_" and never-managed functions prefixed with "pci_", thus cleanly
separating the managed and unmanaged APIs. New PCI devres infrastructure
therefore cannot use __pci_request_region() and its wrappers, since
those would again interact with PCI devres and, consequently, prevent
the managed nature from ever being removed from the pci_* functions.
Thus, it is necessary to provide an alternative to
__pci_request_region().

This patch addresses the following problems of the PCI devres API:

  a) There is no PCI devres infrastructure on which a managed
     counterpart to pci_iomap_range() could be based.

  b) The vast majority of the users of plural functions such as
     pcim_iomap_regions() only ever set a single bit in the bit mask,
     consequently making them singular functions anyway.

  c) region-request functions being sometimes managed and sometimes not
 is bug-provoking. pcim_* functions should always be managed, pci_*
 functions never.

Add a new PCI device resource, pcim_addr_devres, that serves to
encapsulate all device resource types related to region requests and
IO-mappings since those are very frequently created together.

Add a set of alternatives cleanly separated from the hybrid mechanism in
__pci_request_region() and its respective wrappers:
  - __pcim_request_region_range()
  - __pcim_release_region_range()
  - __pcim_request_region()
  - __pcim_release_region()

Add the following PCI-internal devres functions based on the above:
  - pcim_iomap_region()
  - pcim_iounmap_region()
  - _pcim_request_region()
  - pcim_request_region()
  - pcim_release_region()
  - pcim_request_all_regions()
  - pcim_release_all_regions()

Add new needed helper pcim_remove_bar_from_legacy_table().

Rework the following public interfaces using the new infrastructure
listed above:
  - pcim_iomap_release()
  - pcim_iomap()
  - pcim_iounmap()
  - pcim_iomap_regions()
  - pcim_iomap_regions_request_all()
  - pcim_iounmap_regions()

Update API documentation.

Link: https://lore.kernel.org/r/20240605081605.18769-5-pstan...@redhat.com
Signed-off-by: Philipp Stanner 
Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/devres.c | 608 ++-
 drivers/pci/pci.c|  22 ++
 drivers/pci/pci.h|   5 +
 3 files changed, 568 insertions(+), 67 deletions(-)

diff --git a/drivers/pci/devres.c b/drivers/pci/devres.c
index 845d6fab0ce7..cf2c11b54ca6 100644
--- a/drivers/pci/devres.c
+++ b/drivers/pci/devres.c
@@ -4,14 +4,243 @@
 #include "pci.h"
 
 /*
- * PCI iomap devres
+ * On the state of PCI's devres implementation:
+ *
+ * The older devres API for PCI has two significant problems:
+ *
+ * 1. It is very strongly tied to the statically allocated mapping table in
+ *struct pcim_iomap_devres below. This is mostly solved in the sense of the
+ *pcim_ functions in this file providing things like ranged mapping by
+ *    bypassing this table, whereas the functions that were present in the old
+ *API still enter the mapping addresses into the table for users of the old
+ *API.
+ *
+ * 2. The region-request-functions in pci.c do become managed IF the device has
+ *been enabled with pcim_enable_device() instead of pci_enable_device().
+ *This resulted in the API becoming inconsistent: Some functions have an
+ *obviously managed counter-part (e.g., pci_iomap() <-> pcim_iomap()),
+ *whereas some don't and are never managed, while others don't and are
+ *_sometimes_ managed (e.g. pci_request_region()).
+ *
+ *Consequently, in the new API, region request

[PATCH v9 01/13] PCI: Add and use devres helper for bit masks

2024-06-13 Thread Philipp Stanner
The current devres implementation uses manual shift operations to check
whether a bit in a mask is set. The code can be made more readable by
writing a small helper function for that.

Implement mask_contains_bar() and use it where applicable.

Link: https://lore.kernel.org/r/20240605081605.18769-3-pstan...@redhat.com
Signed-off-by: Philipp Stanner 
Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/devres.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/devres.c b/drivers/pci/devres.c
index 2c562b9eaf80..f13edd4a3873 100644
--- a/drivers/pci/devres.c
+++ b/drivers/pci/devres.c
@@ -161,6 +161,10 @@ int pcim_set_mwi(struct pci_dev *dev)
 }
 EXPORT_SYMBOL(pcim_set_mwi);
 
+static inline bool mask_contains_bar(int mask, int bar)
+{
+   return mask & BIT(bar);
+}
 
 static void pcim_release(struct device *gendev, void *res)
 {
@@ -169,7 +173,7 @@ static void pcim_release(struct device *gendev, void *res)
int i;
 
for (i = 0; i < DEVICE_COUNT_RESOURCE; i++)
-   if (this->region_mask & (1 << i))
+   if (mask_contains_bar(this->region_mask, i))
pci_release_region(dev, i);
 
if (this->mwi)
@@ -363,7 +367,7 @@ int pcim_iomap_regions(struct pci_dev *pdev, int mask, 
const char *name)
for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) {
unsigned long len;
 
-   if (!(mask & (1 << i)))
+   if (!mask_contains_bar(mask, i))
continue;
 
rc = -EINVAL;
@@ -386,7 +390,7 @@ int pcim_iomap_regions(struct pci_dev *pdev, int mask, 
const char *name)
pci_release_region(pdev, i);
  err_inval:
while (--i >= 0) {
-   if (!(mask & (1 << i)))
+   if (!mask_contains_bar(mask, i))
continue;
pcim_iounmap(pdev, iomap[i]);
pci_release_region(pdev, i);
@@ -438,7 +442,7 @@ void pcim_iounmap_regions(struct pci_dev *pdev, int mask)
return;
 
for (i = 0; i < PCIM_IOMAP_MAX; i++) {
-   if (!(mask & (1 << i)))
+   if (!mask_contains_bar(mask, i))
continue;
 
pcim_iounmap(pdev, iomap[i]);
-- 
2.45.0



[PATCH v9 00/13] Make PCI's devres API more consistent

2024-06-13 Thread Philipp Stanner
functions.
- preserves backwards compatibility so that drivers using the existing
  API won't notice any changes.
- adds documentation, especially some warning users about the
  complicated nature of PCI's devres.


Note that this series is based on my "unify pci_iounmap"-series from a
few weeks ago. [1]

I tested this on a x86 VM with a simple pci test-device with two
regions. Operates and reserves resources as intended on my system.
Kasan and kmemleak didn't find any problems.

I believe this series cleans the API up as much as possible without
having to port all existing drivers to the new API. Especially, I think
that this implementation is easy to extend if the need for new managed
functions arises :)

Greetings,
P.

Philipp Stanner (13):
  PCI: Add and use devres helper for bit masks
  PCI: Add devres helpers for iomap table
  PCI: Add partial-BAR devres support
  PCI: Deprecate two surplus devres functions
  PCI: Make devres region requests consistent
  PCI: Warn users about complicated devres nature
  PCI: Remove enabled status bit from pci_devres
  PCI: Move pinned status bit to struct pci_dev
  PCI: Give pcim_set_mwi() its own devres callback
  PCI: Give pci_intx() its own devres callback
  PCI: Remove legacy pcim_release()
  PCI: Add pcim_iomap_range()
  drm/vboxvideo: fix mapping leaks

 drivers/gpu/drm/vboxvideo/vbox_main.c |  20 +-
 drivers/pci/devres.c  | 903 +-
 drivers/pci/iomap.c   |  16 +
 drivers/pci/pci.c |  94 ++-
 drivers/pci/pci.h |  23 +-
 include/linux/pci.h   |   5 +-
 6 files changed, 858 insertions(+), 203 deletions(-)

-- 
2.45.0



[PATCH v9 02/13] PCI: Add devres helpers for iomap table

2024-06-13 Thread Philipp Stanner
The pcim_iomap_devres.table administrated by pcim_iomap_table() has its
entries set and unset at several places throughout devres.c using manual
iterations which are effectively code duplications.

Add pcim_add_mapping_to_legacy_table() and
pcim_remove_mapping_from_legacy_table() helper functions and use them where
possible.

Link: https://lore.kernel.org/r/20240605081605.18769-4-pstan...@redhat.com
Signed-off-by: Philipp Stanner 
[bhelgaas: s/short bar/int bar/ for consistency]
Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/devres.c | 77 +---
 1 file changed, 58 insertions(+), 19 deletions(-)

diff --git a/drivers/pci/devres.c b/drivers/pci/devres.c
index f13edd4a3873..845d6fab0ce7 100644
--- a/drivers/pci/devres.c
+++ b/drivers/pci/devres.c
@@ -297,6 +297,52 @@ void __iomem * const *pcim_iomap_table(struct pci_dev 
*pdev)
 }
 EXPORT_SYMBOL(pcim_iomap_table);
 
+/*
+ * Fill the legacy mapping-table, so that drivers using the old API can
+ * still get a BAR's mapping address through pcim_iomap_table().
+ */
+static int pcim_add_mapping_to_legacy_table(struct pci_dev *pdev,
+   void __iomem *mapping, int bar)
+{
+   void __iomem **legacy_iomap_table;
+
+   if (bar >= PCI_STD_NUM_BARS)
+   return -EINVAL;
+
+   legacy_iomap_table = (void __iomem **)pcim_iomap_table(pdev);
+   if (!legacy_iomap_table)
+   return -ENOMEM;
+
+   /* The legacy mechanism doesn't allow for duplicate mappings. */
+   WARN_ON(legacy_iomap_table[bar]);
+
+   legacy_iomap_table[bar] = mapping;
+
+   return 0;
+}
+
+/*
+ * Remove a mapping. The table only contains whole-BAR mappings, so this will
+ * never interfere with ranged mappings.
+ */
+static void pcim_remove_mapping_from_legacy_table(struct pci_dev *pdev,
+ void __iomem *addr)
+{
+   int bar;
+   void __iomem **legacy_iomap_table;
+
+   legacy_iomap_table = (void __iomem **)pcim_iomap_table(pdev);
+   if (!legacy_iomap_table)
+   return;
+
+   for (bar = 0; bar < PCI_STD_NUM_BARS; bar++) {
+   if (legacy_iomap_table[bar] == addr) {
+   legacy_iomap_table[bar] = NULL;
+   return;
+   }
+   }
+}
+
 /**
  * pcim_iomap - Managed pcim_iomap()
  * @pdev: PCI device to iomap for
@@ -308,16 +354,20 @@ EXPORT_SYMBOL(pcim_iomap_table);
  */
 void __iomem *pcim_iomap(struct pci_dev *pdev, int bar, unsigned long maxlen)
 {
-   void __iomem **tbl;
+   void __iomem *mapping;
 
-   BUG_ON(bar >= PCIM_IOMAP_MAX);
-
-   tbl = (void __iomem **)pcim_iomap_table(pdev);
-   if (!tbl || tbl[bar])   /* duplicate mappings not allowed */
+   mapping = pci_iomap(pdev, bar, maxlen);
+   if (!mapping)
return NULL;
 
-   tbl[bar] = pci_iomap(pdev, bar, maxlen);
-   return tbl[bar];
+   if (pcim_add_mapping_to_legacy_table(pdev, mapping, bar) != 0)
+   goto err_table;
+
+   return mapping;
+
+err_table:
+   pci_iounmap(pdev, mapping);
+   return NULL;
 }
 EXPORT_SYMBOL(pcim_iomap);
 
@@ -330,20 +380,9 @@ EXPORT_SYMBOL(pcim_iomap);
  */
 void pcim_iounmap(struct pci_dev *pdev, void __iomem *addr)
 {
-   void __iomem **tbl;
-   int i;
-
pci_iounmap(pdev, addr);
 
-   tbl = (void __iomem **)pcim_iomap_table(pdev);
-   BUG_ON(!tbl);
-
-   for (i = 0; i < PCIM_IOMAP_MAX; i++)
-   if (tbl[i] == addr) {
-   tbl[i] = NULL;
-   return;
-   }
-   WARN_ON(1);
+   pcim_remove_mapping_from_legacy_table(pdev, addr);
 }
 EXPORT_SYMBOL(pcim_iounmap);
 
-- 
2.45.0



Re: [PATCH v8 03/13] PCI: Reimplement plural devres functions

2024-06-12 Thread Philipp Stanner
On Wed, 2024-06-12 at 15:42 -0500, Bjorn Helgaas wrote:
> On Wed, Jun 12, 2024 at 10:51:40AM +0200, Philipp Stanner wrote:
> > On Tue, 2024-06-11 at 16:44 -0500, Bjorn Helgaas wrote:
> > > I'm trying to merge these into pci/next, but I'm having a hard
> > > time
> > > writing the merge commit log.  I want a one-sentence description
> > > of
> > > each patch that tells me what the benefit of the patch is. 
> > > Usually
> > > the subject line is a good start.
> > > 
> > > "Reimplement plural devres functions" is kind of vague and
> > > doesn't
> > > quite motivate this patch, and I'm having a hard time extracting
> > > the
> > > relevant details from the commit log below.
> > 
> > I would say that the summary would be something along the lines:
> > "Set ground layer for devres simplification and extension"
> > 
> > because this patch simplifies the existing functions and adds
> > infrastructure that can later be used to deprecate the bloated
> > existing
> > functions, remove the hybrid mechanism and add pcim_iomap_range().
> 
> I think something concrete like "Add partial-BAR devres support"
> would
> give people a hint about what to look for.

Okay, will do.

> 
> This patch contains quite a bit more than that, and if it were
> possible, it might be nice to split the rest to a different patch,
> but
> I'm not sure it's even possible 

I tried and got screamed at by the build chain because of dead code. So
I don't really think they can be split more, unfortunately.

In possible future series for PCI I'll pay attention to designing
things as atomically as possible from the start.


> and I just want to get this series out
> the door.

That's actually something you and I have in common. I have been working
on the preparations for this since November last year ^^'

> 
> If the commit log includes the partial-BAR idea and the specific
> functions added, I think that will hold together.  And then it makes
> sense for why the "plural" functions would be implemented on top of
> the "singular" ones.
> 
> > > > Implement a set of singular functions 
> > > 
> > > What is this set of functions?  My guess is below.
> > > 
> > > > that use devres as it's intended and
> > > > use those singular functions to reimplement the plural
> > > > functions.
> > > 
> > > What does "as it's intended" mean?  Too nebulous to fit here.
> > 
> > Well, the idea behind devres is that you allocate a device resource
> > _for each_ object you want to be freed / deinitialized
> > automatically.
> > One devres object per driver / subsystem object, one devres
> > callback
> > per cleanup job for the driver / subsystem.
> > 
> > What PCI devres did instead was to use just ONE devres object _for
> > everything_ and then it had to implement all sorts of checks to
> > check
> > which sub-resource this master resource is actually about:
> > 
> > (from devres.c)
> > static void pcim_release(struct device *gendev, void *res)
> > {
> > struct pci_dev *dev = to_pci_dev(gendev);
> > struct pci_devres *this = res;
> > int i;
> > 
> > for (i = 0; i < DEVICE_COUNT_RESOURCE; i++)
> > if (this->region_mask & (1 << i))
> > pci_release_region(dev, i);
> > 
> > if (this->mwi)
> > pci_clear_mwi(dev);
> > 
> > if (this->restore_intx)
> > pci_intx(dev, this->orig_intx);
> > 
> > if (this->enabled && !this->pinned)
> > pci_disable_device(dev);
> > }
> > 
> > 
> > So one could dare to say that devres was partially re-implemented
> > on
> > top of devres.
> > 
> > The for-loop and the if-conditions constitute that "re-
> > implementation".
> > No one has any clue why it has been done that way, because it
> > provides
> > 0 upsides and would have been far easier to implement by just
> > letting
> > devres do its job.
> > 
> > Would you like to see the above details in the commit message?
> 
> No.  Just remove the "use devres as it's intended" since that's not
> needed to motivate this patch.  I think we need fewer and
> more-specific words.

ACK. I will rework it


Thank you Bjorn for your time and effort,

P.


> 
> Bjorn
> 



Re: [PATCH v8 03/13] PCI: Reimplement plural devres functions

2024-06-12 Thread Philipp Stanner
On Tue, 2024-06-11 at 16:44 -0500, Bjorn Helgaas wrote:
> I'm trying to merge these into pci/next, but I'm having a hard time
> writing the merge commit log.  I want a one-sentence description of
> each patch that tells me what the benefit of the patch is.  Usually
> the subject line is a good start.
> 
> "Reimplement plural devres functions" is kind of vague and doesn't
> quite motivate this patch, and I'm having a hard time extracting the
> relevant details from the commit log below.

I would say that the summary would be something along the lines:
"Set ground layer for devres simplification and extension"

because this patch simplifies the existing functions and adds
infrastructure that can later be used to deprecate the bloated existing
functions, remove the hybrid mechanism and add pcim_iomap_range().

> 
> On Mon, Jun 10, 2024 at 11:31:25AM +0200, Philipp Stanner wrote:
> > When the original PCI devres API was implemented, priority was
> > given to the
> > creation of a set of "plural functions" such as
> > pcim_request_regions().
> > These functions have bit masks as parameters to specify which BARs
> > shall
> > get mapped. Most users, however, only use those to map 1-3 BARs.
> > 
> > A complete set of "singular functions" does not exist.
> > 
> > As functions mapping / requesting multiple BARs at once have
> > (almost) no
> > mechanism in C to return the resources to the caller of the plural
> > function, the PCI devres API utilizes the iomap-table administrated
> > by the
> > function pcim_iomap_table().
> > 
> > The entire PCI devres API was strongly tied to that table which
> > only allows
> > for mapping whole, complete BARs, as the BAR's index is used as
> > table
> > index. Consequently, it's not possible to, e.g., have a
> > pcim_iomap_range()
> > function with that mechanism.
> 
> I'm getting the hint that part of the point of this patch is to add
> infrastructure so we can request and map either an entire BAR or just
> part of a BAR?

Yes, that and in the long term the simplification of the PCI devres API
is the goal.

> 
> > An additional problem is that the PCI devres API has been
> > implemented in a
> > sort of "hybrid-mode": Some unmanaged functions have managed
> > counterparts
> > (e.g.: pci_iomap() <-> pcim_iomap()), making their managed nature
> > obvious
> > to the programmer. However, the region-request functions in pci.c,
> > prefixed
> > with pci_, behave either managed or unmanaged, depending on whether
> > pci_enable_device() or pcim_enable_device() has been called in
> > advance.
> > 
> > This hybrid API is confusing and should be more cleanly separated
> > by
> > providing always-managed functions prefixed with pcim_.
> 
> I'm not sure these two paragraphs apply to this patch.  If they do,
> be
> specific about which functions are affected and how this patch fixes
> them.

That's a relict from the days when this series consisted of fewer
commits. Back then this commit's ancestor served preparing the entire
series; therefore, it contained a lot of motivational information.

I can just cut that out. I guess a link to the mail thread and its
cover letter in the commit message would explain the wider motivation
behind all of this.

> 
> > Thus, the existing PCI devres API is not desirable because:
> > 
> >   a) The vast majority of the users of the plural functions only
> > ever sets
> >  a single bit in the bit mask, consequently making them
> > singular
> >  functions anyways.
> > 
> >   b) There is no mechanism to request / iomap only part of a BAR.
> 
> >   c) The iomap-table mechanism is over-engineered and complicated.
> > Even
> >  worse, some users index over the table administration function
> >  directly, e.g.:
> > 
> >    void __iomem *mapping = pcim_iomap_table(pdev)[my_index];
> > 
> >  This can not perform bounds checks; an invalid index won't
> > cause
> >  return of -EINVAL or even NULL, resulting in undefined
> > behavior.
> 
> True, but *this* patch doesn't remove or deprecate
> pcim_iomap_table(),
> so needs to be in the patch that does that, not here.
> 
> 
> ACK, I will remove it.
> 
> 
> >   d) region-request functions being sometimes managed and sometimes
> > not
> >  is bug-provoking.
> 
> I'm not sure all the deficiencies of the past are necessary.  I'm
> more
> interested in specifics about what's being added or fixed.

Re: [PATCH v8 08/13] PCI: Move pinned status bit to struct pci_dev

2024-06-10 Thread Philipp Stanner
On Mon, 2024-06-10 at 11:31 +0200, Philipp Stanner wrote:
> The bit describing whether the PCI device is currently pinned is stored
> in struct pci_devres. To clean up and simplify the PCI devres API, it's
> better if this information is stored in struct pci_dev.
> 
> This will later permit simplifying pcim_enable_device().
> 
> Move the 'pinned' boolean bit to struct pci_dev.
> 
> Restructure bits in struct pci_dev so the pm / pme fields are next to
> each other.
> 
> Signed-off-by: Philipp Stanner 
> ---
>  drivers/pci/devres.c | 14 --
>  drivers/pci/pci.h    |  1 -
>  include/linux/pci.h  |  4 +++-
>  3 files changed, 7 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/pci/devres.c b/drivers/pci/devres.c
> index 9d25940ce260..2696baef5c2c 100644
> --- a/drivers/pci/devres.c
> +++ b/drivers/pci/devres.c
> @@ -403,7 +403,7 @@ static void pcim_release(struct device *gendev, void *res)
> if (this->restore_intx)
> pci_intx(dev, this->orig_intx);
>  
> -   if (pci_is_enabled(dev) && !this->pinned)
> +   if (pci_is_enabled(dev) && !dev->pinned)
> pci_disable_device(dev);
>  }
>  
> @@ -459,18 +459,12 @@ EXPORT_SYMBOL(pcim_enable_device);
>   * pcim_pin_device - Pin managed PCI device
>   * @pdev: PCI device to pin
>   *
> - * Pin managed PCI device @pdev.  Pinned device won't be disabled on
> - * driver detach.  @pdev must have been enabled with
> - * pcim_enable_device().
> + * Pin managed PCI device @pdev. Pinned device won't be disabled on driver
> + * detach. @pdev must have been enabled with pcim_enable_device().
>   */
>  void pcim_pin_device(struct pci_dev *pdev)
>  {
> -   struct pci_devres *dr;
> -
> -   dr = find_pci_dr(pdev);
> -   WARN_ON(!dr || !pci_is_enabled(pdev));
> -   if (dr)
> -   dr->pinned = 1;
> +   pdev->pinned = true;
>  }
>  EXPORT_SYMBOL(pcim_pin_device);
>  
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index d7f00b43b098..6e02ba1b5947 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -821,7 +821,6 @@ static inline pci_power_t mid_pci_get_power_state(struct 
> pci_dev *pdev)
>   * then remove them from here.
>   */
>  struct pci_devres {
> -   unsigned int pinned:1;
> unsigned int orig_intx:1;
> unsigned int restore_intx:1;
> unsigned int mwi:1;
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index fb004fd4e889..cc9247f78158 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -367,10 +367,12 @@ struct pci_dev {
>    this is D0-D3, D0 being fully
>    functional, and D3 being off. */
> u8  pm_cap; /* PM capability offset */
> -   unsigned intimm_ready:1;/* Supports Immediate Readiness */
> unsigned intpme_support:5;  /* Bitmask of states from which PME#
>    can be generated */
> unsigned intpme_poll:1; /* Poll device's PME status bit */
> +   unsigned intenabled:1;  /* Whether this dev is enabled */

Ah crap, here it survived for some reason...

Should just be dead code and not have any effect. In any case, we
should remove it.

@Bjorn: Feel free to remove it yourself. Otherwise I could provide a v9
together with potential further feedback taken into account in a few
days

Thx,
P.

> +   unsigned intpinned:1;   /* Whether this dev is pinned */
> +   unsigned intimm_ready:1;/* Supports Immediate Readiness */
> unsigned intd1_support:1;   /* Low power state D1 is supported */
> unsigned intd2_support:1;   /* Low power state D2 is supported */
> unsigned intno_d1d2:1;  /* D1 and D2 are forbidden */



[PATCH v8 10/13] PCI: Give pci_intx() its own devres callback

2024-06-10 Thread Philipp Stanner
pci_intx() is one of the functions that have "hybrid mode" (i.e.,
sometimes managed, sometimes not). Providing a separate pcim_intx()
function with its own device resource and cleanup callback allows for
removing further large parts of the legacy PCI devres implementation.

As in the region-request-functions, pci_intx() has to call into its
managed counterpart for backwards compatibility.

As pci_intx() is an outdated function, pcim_intx() shall not be made
visible to drivers via a public API.

Implement pcim_intx() with its own device resource.
Make pci_intx() call pcim_intx() in the managed case.
Remove the now surplus function find_pci_dr().

Signed-off-by: Philipp Stanner 
---
 drivers/pci/devres.c | 76 
 drivers/pci/pci.c| 21 ++--
 drivers/pci/pci.h| 13 
 3 files changed, 80 insertions(+), 30 deletions(-)

diff --git a/drivers/pci/devres.c b/drivers/pci/devres.c
index a0a59338cd92..0bb144fdb69b 100644
--- a/drivers/pci/devres.c
+++ b/drivers/pci/devres.c
@@ -42,6 +42,11 @@ struct pcim_iomap_devres {
void __iomem *table[PCI_STD_NUM_BARS];
 };
 
+/* Used to restore the old intx state on driver detach. */
+struct pcim_intx_devres {
+   int orig_intx;
+};
+
 enum pcim_addr_devres_type {
/* Default initializer. */
PCIM_ADDR_DEVRES_TYPE_INVALID,
@@ -397,32 +402,75 @@ int pcim_set_mwi(struct pci_dev *pdev)
 }
 EXPORT_SYMBOL(pcim_set_mwi);
 
+
 static inline bool mask_contains_bar(int mask, int bar)
 {
return mask & BIT(bar);
 }
 
-static void pcim_release(struct device *gendev, void *res)
+static void pcim_intx_restore(struct device *dev, void *data)
 {
-   struct pci_dev *dev = to_pci_dev(gendev);
-   struct pci_devres *this = res;
+   struct pci_dev *pdev = to_pci_dev(dev);
+   struct pcim_intx_devres *res = data;
 
-   if (this->restore_intx)
-   pci_intx(dev, this->orig_intx);
+   pci_intx(pdev, res->orig_intx);
+}
 
-   if (pci_is_enabled(dev) && !dev->pinned)
-   pci_disable_device(dev);
+static struct pcim_intx_devres *get_or_create_intx_devres(struct device *dev)
+{
+   struct pcim_intx_devres *res;
+
+   res = devres_find(dev, pcim_intx_restore, NULL, NULL);
+   if (res)
+   return res;
+
+   res = devres_alloc(pcim_intx_restore, sizeof(*res), GFP_KERNEL);
+   if (res)
+   devres_add(dev, res);
+
+   return res;
 }
 
-/*
- * TODO: After the last four callers in pci.c are ported, find_pci_dr()
- * needs to be made static again.
+/**
+ * pcim_intx - managed pci_intx()
+ * @pdev: the PCI device to operate on
+ * @enable: boolean: whether to enable or disable PCI INTx
+ *
+ * Returns: 0 on success, -ENOMEM on error.
+ *
+ * Enables/disables PCI INTx for device @pdev.
+ * Restores the original state on driver detach.
  */
-struct pci_devres *find_pci_dr(struct pci_dev *pdev)
+int pcim_intx(struct pci_dev *pdev, int enable)
 {
-   if (pci_is_managed(pdev))
-   return devres_find(&pdev->dev, pcim_release, NULL, NULL);
-   return NULL;
+   u16 pci_command, new;
+   struct pcim_intx_devres *res;
+
+   res = get_or_create_intx_devres(&pdev->dev);
+   if (!res)
+   return -ENOMEM;
+
+   res->orig_intx = !enable;
+
+   pci_read_config_word(pdev, PCI_COMMAND, &pci_command);
+
+   if (enable)
+   new = pci_command & ~PCI_COMMAND_INTX_DISABLE;
+   else
+   new = pci_command | PCI_COMMAND_INTX_DISABLE;
+
+   if (new != pci_command)
+   pci_write_config_word(pdev, PCI_COMMAND, new);
+
+   return 0;
+}
+
+static void pcim_release(struct device *gendev, void *res)
+{
+   struct pci_dev *dev = to_pci_dev(gendev);
+
+   if (pci_is_enabled(dev) && !dev->pinned)
+   pci_disable_device(dev);
 }
 
 static struct pci_devres *get_pci_dr(struct pci_dev *pdev)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index db2cc48f3d63..1b4832a60047 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4443,6 +4443,16 @@ void pci_intx(struct pci_dev *pdev, int enable)
 {
u16 pci_command, new;
 
+   /*
+* This is done for backwards compatibility, because the old PCI devres
+* API had a mode in which this function became managed if the dev had
+* been enabled with pcim_enable_device() instead of 
pci_enable_device().
+*/
+   if (pci_is_managed(pdev)) {
+   WARN_ON_ONCE(pcim_intx(pdev, enable) != 0);
+   return;
+   }
+
pci_read_config_word(pdev, PCI_COMMAND, &pci_command);
 
if (enable)
@@ -4450,17 +4460,8 @@ void pci_intx(struct pci_dev *pdev, int enable)
else
new = pci_command | PCI_COMMAND_INTX_DISABLE;
 
-   if (new != pci_command) {
-   struct pci_devres *dr;
-
+   if (new !=

[PATCH v8 04/13] PCI: Deprecate two surplus devres functions

2024-06-10 Thread Philipp Stanner
pcim_iomap_table() should not be used anymore because it contributed to the
PCI devres API being designed contrary to devres's design goals.

pcim_iomap_regions_request_all() is a surplus, complicated function that
can easily be replaced by using a pcim_* request function in combination
with a pcim_* mapping function.

Mark pcim_iomap_table() and pcim_iomap_regions_request_all() as deprecated
in the function documentation.

Link: https://lore.kernel.org/r/20240605081605.18769-6-pstan...@redhat.com
Signed-off-by: Philipp Stanner 
Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/devres.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/devres.c b/drivers/pci/devres.c
index 82f71f5e164a..54b10f5433ab 100644
--- a/drivers/pci/devres.c
+++ b/drivers/pci/devres.c
@@ -507,7 +507,7 @@ static void pcim_iomap_release(struct device *gendev, void *res)
 }
 
 /**
- * pcim_iomap_table - access iomap allocation table
+ * pcim_iomap_table - access iomap allocation table (DEPRECATED)
  * @pdev: PCI device to access iomap table for
  *
  * Returns:
@@ -521,6 +521,11 @@ static void pcim_iomap_release(struct device *gendev, void *res)
  * This function might sleep when the table is first allocated but can
  * be safely called without context and guaranteed to succeed once
  * allocated.
+ *
+ * This function is DEPRECATED. Do not use it in new code. Instead, obtain a
+ * mapping's address directly from one of the pcim_* mapping functions. For
+ * example:
+ * void __iomem *mappy = pcim_iomap(pdev, bar, length);
  */
 void __iomem * const *pcim_iomap_table(struct pci_dev *pdev)
 {
@@ -894,6 +899,7 @@ static int pcim_request_all_regions(struct pci_dev *pdev, const char *name)
 
 /**
  * pcim_iomap_regions_request_all - Request all BARs and iomap specified ones
+ * (DEPRECATED)
  * @pdev: PCI device to map IO resources for
  * @mask: Mask of BARs to iomap
  * @name: Name associated with the requests
@@ -904,6 +910,10 @@ static int pcim_request_all_regions(struct pci_dev *pdev, const char *name)
  *
  * To release these resources manually, call pcim_release_region() for the
  * regions and pcim_iounmap() for the mappings.
+ *
+ * This function is DEPRECATED. Don't use it in new code. Instead, use one
+ * of the pcim_* region request functions in combination with a pcim_*
+ * mapping function.
  */
 int pcim_iomap_regions_request_all(struct pci_dev *pdev, int mask,
   const char *name)
-- 
2.45.0



[PATCH v8 07/13] PCI: Remove enabled status bit from pci_devres

2024-06-10 Thread Philipp Stanner
The PCI devres implementation has a separate boolean to track whether a
device is enabled. However, that state can just as easily be obtained
through the function pci_is_enabled(), which needs no devres-specific
bookkeeping.

Using it allows for simplifying the PCI devres implementation.

Replace the separate 'enabled' status bit from struct pci_devres with
calls to pci_is_enabled() at the appropriate places.
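The idea behind the patch can be illustrated with a small userspace sketch. This is a toy model, not kernel code: it only shows how the enabled state can be derived from the device's enable count (the way pci_is_enabled() does it), so that devres needs no separate boolean of its own.

```c
#include <stdbool.h>

/* Toy model of a PCI device's enable tracking. */
struct toy_pci_dev {
	int enable_cnt;		/* models pdev->enable_cnt */
	bool pinned;
};

/* Mirrors pci_is_enabled(): derived state, no extra bookkeeping. */
static bool toy_pci_is_enabled(const struct toy_pci_dev *dev)
{
	return dev->enable_cnt > 0;
}

/* Mirrors the release logic after this patch: disable only if the
 * device is still enabled and has not been pinned. */
static bool toy_should_disable(const struct toy_pci_dev *dev)
{
	return toy_pci_is_enabled(dev) && !dev->pinned;
}
```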

Signed-off-by: Philipp Stanner 
---
 drivers/pci/devres.c | 11 ---
 drivers/pci/pci.c|  6 --
 drivers/pci/pci.h|  1 -
 3 files changed, 4 insertions(+), 14 deletions(-)

diff --git a/drivers/pci/devres.c b/drivers/pci/devres.c
index f2a1250c0679..9d25940ce260 100644
--- a/drivers/pci/devres.c
+++ b/drivers/pci/devres.c
@@ -403,7 +403,7 @@ static void pcim_release(struct device *gendev, void *res)
if (this->restore_intx)
pci_intx(dev, this->orig_intx);
 
-   if (this->enabled && !this->pinned)
+   if (pci_is_enabled(dev) && !this->pinned)
pci_disable_device(dev);
 }
 
@@ -446,14 +446,11 @@ int pcim_enable_device(struct pci_dev *pdev)
dr = get_pci_dr(pdev);
if (unlikely(!dr))
return -ENOMEM;
-   if (dr->enabled)
-   return 0;
 
rc = pci_enable_device(pdev);
-   if (!rc) {
+   if (!rc)
pdev->is_managed = 1;
-   dr->enabled = 1;
-   }
+
return rc;
 }
 EXPORT_SYMBOL(pcim_enable_device);
@@ -471,7 +468,7 @@ void pcim_pin_device(struct pci_dev *pdev)
struct pci_devres *dr;
 
dr = find_pci_dr(pdev);
-   WARN_ON(!dr || !dr->enabled);
+   WARN_ON(!dr || !pci_is_enabled(pdev));
if (dr)
dr->pinned = 1;
 }
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 5e4f377411ec..db2cc48f3d63 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2218,12 +2218,6 @@ void pci_disable_enabled_device(struct pci_dev *dev)
  */
 void pci_disable_device(struct pci_dev *dev)
 {
-   struct pci_devres *dr;
-
-   dr = find_pci_dr(dev);
-   if (dr)
-   dr->enabled = 0;
-
dev_WARN_ONCE(&dev->dev, atomic_read(&dev->enable_cnt) <= 0,
  "disabling already-disabled device");
 
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 2403c5a0ff7a..d7f00b43b098 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -821,7 +821,6 @@ static inline pci_power_t mid_pci_get_power_state(struct pci_dev *pdev)
  * then remove them from here.
  */
 struct pci_devres {
-   unsigned int enabled:1;
unsigned int pinned:1;
unsigned int orig_intx:1;
unsigned int restore_intx:1;
-- 
2.45.0



[PATCH v8 13/13] drm/vboxvideo: fix mapping leaks

2024-06-10 Thread Philipp Stanner
When the PCI devres API was introduced to this driver, it was wrongly
assumed that initializing the device with pcim_enable_device() instead
of pci_enable_device() will make all PCI functions managed.

This is wrong and was caused by the quite confusing PCI devres API in
which some, but not all, functions become managed that way.

The function pci_iomap_range() is never managed.

Replace pci_iomap_range() with the actually managed function
pcim_iomap_range().
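The error-checking change in this patch (from `!ptr` to `IS_ERR(ptr)`) follows from the different failure conventions of the two functions. The sketch below is a userspace model of the kernel's ERR_PTR machinery (the toy_* names are hypothetical): pci_iomap_range() reports failure with NULL, while pcim_iomap_range() returns an encoded error pointer.

```c
#include <stdint.h>

/* Userspace model of the kernel's ERR_PTR()/IS_ERR()/PTR_ERR():
 * error codes are encoded in the topmost page of the address space. */
#define TOY_MAX_ERRNO 4095

static inline void *toy_err_ptr(intptr_t err)
{
	return (void *)err;
}

static inline intptr_t toy_ptr_err(const void *ptr)
{
	return (intptr_t)ptr;
}

static inline int toy_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-TOY_MAX_ERRNO;
}
```

Note that NULL is not an error pointer under this convention, which is exactly why a `!ptr` check misses the errors that pcim_iomap_range() returns.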

Fixes: 8558de401b5f ("drm/vboxvideo: use managed pci functions")
Signed-off-by: Philipp Stanner 
Reviewed-by: Hans de Goede 
---
 drivers/gpu/drm/vboxvideo/vbox_main.c | 20 +---
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/vboxvideo/vbox_main.c b/drivers/gpu/drm/vboxvideo/vbox_main.c
index 42c2d8a99509..d4ade9325401 100644
--- a/drivers/gpu/drm/vboxvideo/vbox_main.c
+++ b/drivers/gpu/drm/vboxvideo/vbox_main.c
@@ -42,12 +42,11 @@ static int vbox_accel_init(struct vbox_private *vbox)
/* Take a command buffer for each screen from the end of usable VRAM. */
vbox->available_vram_size -= vbox->num_crtcs * VBVA_MIN_BUFFER_SIZE;
 
-   vbox->vbva_buffers = pci_iomap_range(pdev, 0,
-vbox->available_vram_size,
-vbox->num_crtcs *
-VBVA_MIN_BUFFER_SIZE);
-   if (!vbox->vbva_buffers)
-   return -ENOMEM;
+   vbox->vbva_buffers = pcim_iomap_range(
+   pdev, 0, vbox->available_vram_size,
+   vbox->num_crtcs * VBVA_MIN_BUFFER_SIZE);
+   if (IS_ERR(vbox->vbva_buffers))
+   return PTR_ERR(vbox->vbva_buffers);
 
for (i = 0; i < vbox->num_crtcs; ++i) {
vbva_setup_buffer_context(&vbox->vbva_info[i],
@@ -116,11 +115,10 @@ int vbox_hw_init(struct vbox_private *vbox)
DRM_INFO("VRAM %08x\n", vbox->full_vram_size);
 
/* Map guest-heap at end of vram */
-   vbox->guest_heap =
-   pci_iomap_range(pdev, 0, GUEST_HEAP_OFFSET(vbox),
-   GUEST_HEAP_SIZE);
-   if (!vbox->guest_heap)
-   return -ENOMEM;
+   vbox->guest_heap = pcim_iomap_range(pdev, 0,
+   GUEST_HEAP_OFFSET(vbox), GUEST_HEAP_SIZE);
+   if (IS_ERR(vbox->guest_heap))
+   return PTR_ERR(vbox->guest_heap);
 
/* Create guest-heap mem-pool use 2^4 = 16 byte chunks */
vbox->guest_pool = devm_gen_pool_create(vbox->ddev.dev, 4, -1,
-- 
2.45.0



[PATCH v8 12/13] PCI: Add pcim_iomap_range()

2024-06-10 Thread Philipp Stanner
The only managed mapping function currently is pcim_iomap() which
doesn't allow for mapping an area starting at a certain offset, which
many drivers want.

Add pcim_iomap_range() as an exported function.
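Conceptually, a ranged mapping is the BAR base plus an offset, valid only if the requested window fits inside the BAR. The following userspace sketch (a toy model, not the kernel implementation) shows the bounds logic such a function needs:

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model of a ranged BAR mapping: return base + offset, but only
 * if [offset, offset + len) lies entirely within the BAR. */
static uint8_t *toy_iomap_range(uint8_t *bar_base, size_t bar_len,
				size_t offset, size_t len)
{
	if (len == 0 || offset > bar_len || len > bar_len - offset)
		return NULL;	/* the real function returns IOMEM_ERR_PTR() */
	return bar_base + offset;
}
```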

Signed-off-by: Philipp Stanner 
---
 drivers/pci/devres.c | 44 
 include/linux/pci.h  |  2 ++
 2 files changed, 46 insertions(+)

diff --git a/drivers/pci/devres.c b/drivers/pci/devres.c
index e92a8802832f..96f18243742b 100644
--- a/drivers/pci/devres.c
+++ b/drivers/pci/devres.c
@@ -1015,3 +1015,47 @@ void pcim_iounmap_regions(struct pci_dev *pdev, int mask)
}
 }
 EXPORT_SYMBOL(pcim_iounmap_regions);
+
+/**
+ * pcim_iomap_range - Create a ranged __iomap mapping within a PCI BAR
+ * @pdev: PCI device to map IO resources for
+ * @bar: Index of the BAR
+ * @offset: Offset from the beginning of the BAR
+ * @len: Length in bytes for the mapping
+ *
+ * Returns: __iomem pointer on success, an IOMEM_ERR_PTR on failure.
+ *
+ * Creates a new IO-Mapping within the specified @bar, ranging from @offset to
+ * @offset + @len.
+ *
+ * The mapping will automatically get unmapped on driver detach. If desired,
+ * release manually only with pcim_iounmap().
+ */
+void __iomem *pcim_iomap_range(struct pci_dev *pdev, int bar,
+   unsigned long offset, unsigned long len)
+{
+   void __iomem *mapping;
+   struct pcim_addr_devres *res;
+
+   res = pcim_addr_devres_alloc(pdev);
+   if (!res)
+   return IOMEM_ERR_PTR(-ENOMEM);
+
+   mapping = pci_iomap_range(pdev, bar, offset, len);
+   if (!mapping) {
+   pcim_addr_devres_free(res);
+   return IOMEM_ERR_PTR(-EINVAL);
+   }
+
+   res->type = PCIM_ADDR_DEVRES_TYPE_MAPPING;
+   res->baseaddr = mapping;
+
+   /*
+* Ranged mappings don't get added to the legacy-table, since the table
+* only ever keeps track of whole BARs.
+*/
+
+   devres_add(&pdev->dev, res);
+   return mapping;
+}
+EXPORT_SYMBOL(pcim_iomap_range);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index cc9247f78158..bee1b2754219 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -2304,6 +2304,8 @@ int pcim_iomap_regions(struct pci_dev *pdev, int mask, 
const char *name);
 int pcim_iomap_regions_request_all(struct pci_dev *pdev, int mask,
   const char *name);
 void pcim_iounmap_regions(struct pci_dev *pdev, int mask);
+void __iomem *pcim_iomap_range(struct pci_dev *pdev, int bar,
+   unsigned long offset, unsigned long len);
 
 extern int pci_pci_problems;
 #define PCIPCI_FAIL1   /* No PCI PCI DMA */
-- 
2.45.0



[PATCH v8 11/13] PCI: Remove legacy pcim_release()

2024-06-10 Thread Philipp Stanner
Thanks to preceding cleanup steps, pcim_release() is now not needed
anymore and can be replaced by pcim_disable_device(), which is the exact
counterpart to pcim_enable_device().

This permits removing further parts of the old PCI devres implementation.

Replace pcim_release() with pcim_disable_device().
Remove the now surplus function get_pci_dr().
Remove the struct pci_devres from pci.h.
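The ordering in the new pcim_enable_device() (register the cleanup action first, remove it again if enabling fails) can be modeled in userspace. This is a toy sketch, not kernel code; enable_result stands in for the return value of pci_enable_device():

```c
#include <stdbool.h>

/* Toy model of devm_add_action() / devm_remove_action(). */
struct toy_devres {
	bool armed;		/* cleanup action registered and pending */
};

static int toy_add_action(struct toy_devres *dr)
{
	dr->armed = true;
	return 0;
}

static void toy_remove_action(struct toy_devres *dr)
{
	dr->armed = false;
}

/* Mirrors the control flow of the new pcim_enable_device(): a failed
 * enable must leave no pending cleanup callback behind. */
static int toy_pcim_enable(struct toy_devres *dr, int enable_result)
{
	int ret = toy_add_action(dr);
	if (ret != 0)
		return ret;

	if (enable_result != 0) {
		toy_remove_action(dr);
		return enable_result;
	}
	return 0;
}
```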

Signed-off-by: Philipp Stanner 
---
 drivers/pci/devres.c | 53 +---
 drivers/pci/pci.h| 16 -
 2 files changed, 25 insertions(+), 44 deletions(-)

diff --git a/drivers/pci/devres.c b/drivers/pci/devres.c
index 0bb144fdb69b..e92a8802832f 100644
--- a/drivers/pci/devres.c
+++ b/drivers/pci/devres.c
@@ -465,48 +465,45 @@ int pcim_intx(struct pci_dev *pdev, int enable)
return 0;
 }
 
-static void pcim_release(struct device *gendev, void *res)
+static void pcim_disable_device(void *pdev_raw)
 {
-   struct pci_dev *dev = to_pci_dev(gendev);
-
-   if (pci_is_enabled(dev) && !dev->pinned)
-   pci_disable_device(dev);
-}
-
-static struct pci_devres *get_pci_dr(struct pci_dev *pdev)
-{
-   struct pci_devres *dr, *new_dr;
-
-   dr = devres_find(&pdev->dev, pcim_release, NULL, NULL);
-   if (dr)
-   return dr;
+   struct pci_dev *pdev = pdev_raw;
 
-   new_dr = devres_alloc(pcim_release, sizeof(*new_dr), GFP_KERNEL);
-   if (!new_dr)
-   return NULL;
-   return devres_get(&pdev->dev, new_dr, NULL, NULL);
+   if (!pdev->pinned)
+   pci_disable_device(pdev);
 }
 
 /**
  * pcim_enable_device - Managed pci_enable_device()
  * @pdev: PCI device to be initialized
  *
- * Managed pci_enable_device().
+ * Returns: 0 on success, negative error code on failure.
+ *
+ * Managed pci_enable_device(). Device will automatically be disabled on
+ * driver detach.
  */
 int pcim_enable_device(struct pci_dev *pdev)
 {
-   struct pci_devres *dr;
-   int rc;
+   int ret;
 
-   dr = get_pci_dr(pdev);
-   if (unlikely(!dr))
-   return -ENOMEM;
+   ret = devm_add_action(&pdev->dev, pcim_disable_device, pdev);
+   if (ret != 0)
+   return ret;
 
-   rc = pci_enable_device(pdev);
-   if (!rc)
-   pdev->is_managed = 1;
+   /*
+* We prefer removing the action in case of an error over
+* devm_add_action_or_reset() because the latter could theoretically be
+* disturbed by users having pinned the device too soon.
+*/
+   ret = pci_enable_device(pdev);
+   if (ret != 0) {
+   devm_remove_action(&pdev->dev, pcim_disable_device, pdev);
+   return ret;
+   }
 
-   return rc;
+   pdev->is_managed = true;
+
+   return ret;
 }
 EXPORT_SYMBOL(pcim_enable_device);
 
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 9e87528f1157..e51e6fa79fcc 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -810,22 +810,6 @@ static inline pci_power_t mid_pci_get_power_state(struct pci_dev *pdev)
 }
 #endif
 
-/*
- * Managed PCI resources.  This manages device on/off, INTx/MSI/MSI-X
- * on/off and BAR regions.  pci_dev itself records MSI/MSI-X status, so
- * there's no need to track it separately.  pci_devres is initialized
- * when a device is enabled using managed PCI device enable interface.
- *
- * TODO: Struct pci_devres only needs to be here because they're used in pci.c.
- * Port or move these functions to devres.c and then remove them from here.
- */
-struct pci_devres {
-   /*
-* TODO:
-* This struct is now surplus. Remove it by refactoring pci/devres.c
-*/
-};
-
 int pcim_intx(struct pci_dev *dev, int enable);
 
 int pcim_request_region(struct pci_dev *pdev, int bar, const char *name);
-- 
2.45.0



[PATCH v8 09/13] PCI: Give pcim_set_mwi() its own devres callback

2024-06-10 Thread Philipp Stanner
Managing pci_set_mwi() with devres can easily be done with its own
callback, without the necessity to store any state about it in a
device-related struct.

Remove the MWI state from struct pci_devres.
Give pcim_set_mwi() a separate devres-callback.

Signed-off-by: Philipp Stanner 
---
 drivers/pci/devres.c | 29 ++---
 drivers/pci/pci.h|  1 -
 2 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/drivers/pci/devres.c b/drivers/pci/devres.c
index 2696baef5c2c..a0a59338cd92 100644
--- a/drivers/pci/devres.c
+++ b/drivers/pci/devres.c
@@ -366,24 +366,34 @@ void __iomem *devm_pci_remap_cfg_resource(struct device *dev,
 }
 EXPORT_SYMBOL(devm_pci_remap_cfg_resource);
 
+static void __pcim_clear_mwi(void *pdev_raw)
+{
+   struct pci_dev *pdev = pdev_raw;
+
+   pci_clear_mwi(pdev);
+}
+
 /**
  * pcim_set_mwi - a device-managed pci_set_mwi()
- * @dev: the PCI device for which MWI is enabled
+ * @pdev: the PCI device for which MWI is enabled
  *
  * Managed pci_set_mwi().
  *
  * RETURNS: An appropriate -ERRNO error value on error, or zero for success.
  */
-int pcim_set_mwi(struct pci_dev *dev)
+int pcim_set_mwi(struct pci_dev *pdev)
 {
-   struct pci_devres *dr;
+   int ret;
 
-   dr = find_pci_dr(dev);
-   if (!dr)
-   return -ENOMEM;
+   ret = devm_add_action(&pdev->dev, __pcim_clear_mwi, pdev);
+   if (ret != 0)
+   return ret;
+
+   ret = pci_set_mwi(pdev);
+   if (ret != 0)
+   devm_remove_action(&pdev->dev, __pcim_clear_mwi, pdev);
 
-   dr->mwi = 1;
-   return pci_set_mwi(dev);
+   return ret;
 }
 EXPORT_SYMBOL(pcim_set_mwi);
 
@@ -397,9 +407,6 @@ static void pcim_release(struct device *gendev, void *res)
struct pci_dev *dev = to_pci_dev(gendev);
struct pci_devres *this = res;
 
-   if (this->mwi)
-   pci_clear_mwi(dev);
-
if (this->restore_intx)
pci_intx(dev, this->orig_intx);
 
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 6e02ba1b5947..c355bb6a698d 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -823,7 +823,6 @@ static inline pci_power_t mid_pci_get_power_state(struct pci_dev *pdev)
 struct pci_devres {
unsigned int orig_intx:1;
unsigned int restore_intx:1;
-   unsigned int mwi:1;
 };
 
 struct pci_devres *find_pci_dr(struct pci_dev *pdev);
-- 
2.45.0



[PATCH v8 08/13] PCI: Move pinned status bit to struct pci_dev

2024-06-10 Thread Philipp Stanner
The bit describing whether the PCI device is currently pinned is stored
in struct pci_devres. To clean up and simplify the PCI devres API, it's
better if this information is stored in struct pci_dev.

This will later permit simplifying pcim_enable_device().

Move the 'pinned' boolean bit to struct pci_dev.

Restructure bits in struct pci_dev so the pm / pme fields are next to
each other.

Signed-off-by: Philipp Stanner 
---
 drivers/pci/devres.c | 14 --
 drivers/pci/pci.h|  1 -
 include/linux/pci.h  |  4 +++-
 3 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/drivers/pci/devres.c b/drivers/pci/devres.c
index 9d25940ce260..2696baef5c2c 100644
--- a/drivers/pci/devres.c
+++ b/drivers/pci/devres.c
@@ -403,7 +403,7 @@ static void pcim_release(struct device *gendev, void *res)
if (this->restore_intx)
pci_intx(dev, this->orig_intx);
 
-   if (pci_is_enabled(dev) && !this->pinned)
+   if (pci_is_enabled(dev) && !dev->pinned)
pci_disable_device(dev);
 }
 
@@ -459,18 +459,12 @@ EXPORT_SYMBOL(pcim_enable_device);
  * pcim_pin_device - Pin managed PCI device
  * @pdev: PCI device to pin
  *
- * Pin managed PCI device @pdev.  Pinned device won't be disabled on
- * driver detach.  @pdev must have been enabled with
- * pcim_enable_device().
+ * Pin managed PCI device @pdev. Pinned device won't be disabled on driver
+ * detach. @pdev must have been enabled with pcim_enable_device().
  */
 void pcim_pin_device(struct pci_dev *pdev)
 {
-   struct pci_devres *dr;
-
-   dr = find_pci_dr(pdev);
-   WARN_ON(!dr || !pci_is_enabled(pdev));
-   if (dr)
-   dr->pinned = 1;
+   pdev->pinned = true;
 }
 EXPORT_SYMBOL(pcim_pin_device);
 
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index d7f00b43b098..6e02ba1b5947 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -821,7 +821,6 @@ static inline pci_power_t mid_pci_get_power_state(struct pci_dev *pdev)
  * then remove them from here.
  */
 struct pci_devres {
-   unsigned int pinned:1;
unsigned int orig_intx:1;
unsigned int restore_intx:1;
unsigned int mwi:1;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index fb004fd4e889..cc9247f78158 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -367,10 +367,12 @@ struct pci_dev {
   this is D0-D3, D0 being fully
   functional, and D3 being off. */
u8  pm_cap; /* PM capability offset */
-   unsigned int    imm_ready:1;    /* Supports Immediate Readiness */
    unsigned int    pme_support:5;  /* Bitmask of states from which PME#
                                       can be generated */
    unsigned int    pme_poll:1;     /* Poll device's PME status bit */
+   unsigned int    enabled:1;      /* Whether this dev is enabled */
+   unsigned int    pinned:1;       /* Whether this dev is pinned */
+   unsigned int    imm_ready:1;    /* Supports Immediate Readiness */
    unsigned int    d1_support:1;   /* Low power state D1 is supported */
    unsigned int    d2_support:1;   /* Low power state D2 is supported */
    unsigned int    no_d1d2:1;      /* D1 and D2 are forbidden */
-- 
2.45.0



[PATCH v8 01/13] PCI: Add and use devres helper for bit masks

2024-06-10 Thread Philipp Stanner
The current devres implementation uses manual shift operations to check
whether a bit in a mask is set. The code can be made more readable by
writing a small helper function for that.

Implement mask_contains_bar() and use it where applicable.
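The helper is small enough to sketch in userspace. In the kernel, BIT(n) expands to (1UL << (n)); the TOY_ prefix here just marks this as a standalone model:

```c
/* Userspace sketch of mask_contains_bar(): test whether the bit for
 * BAR index 'bar' is set in 'mask'. */
#define TOY_BIT(n) (1UL << (n))

static inline int toy_mask_contains_bar(int mask, int bar)
{
	return (mask & TOY_BIT(bar)) != 0;
}
```

Compared with the open-coded `mask & (1 << i)`, the named helper makes the intent of the loop bodies in devres.c obvious at a glance.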

Link: https://lore.kernel.org/r/20240605081605.18769-3-pstan...@redhat.com
Signed-off-by: Philipp Stanner 
Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/devres.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/devres.c b/drivers/pci/devres.c
index 2c562b9eaf80..f13edd4a3873 100644
--- a/drivers/pci/devres.c
+++ b/drivers/pci/devres.c
@@ -161,6 +161,10 @@ int pcim_set_mwi(struct pci_dev *dev)
 }
 EXPORT_SYMBOL(pcim_set_mwi);
 
+static inline bool mask_contains_bar(int mask, int bar)
+{
+   return mask & BIT(bar);
+}
 
 static void pcim_release(struct device *gendev, void *res)
 {
@@ -169,7 +173,7 @@ static void pcim_release(struct device *gendev, void *res)
int i;
 
for (i = 0; i < DEVICE_COUNT_RESOURCE; i++)
-   if (this->region_mask & (1 << i))
+   if (mask_contains_bar(this->region_mask, i))
pci_release_region(dev, i);
 
if (this->mwi)
@@ -363,7 +367,7 @@ int pcim_iomap_regions(struct pci_dev *pdev, int mask, const char *name)
for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) {
unsigned long len;
 
-   if (!(mask & (1 << i)))
+   if (!mask_contains_bar(mask, i))
continue;
 
rc = -EINVAL;
@@ -386,7 +390,7 @@ int pcim_iomap_regions(struct pci_dev *pdev, int mask, const char *name)
pci_release_region(pdev, i);
  err_inval:
while (--i >= 0) {
-   if (!(mask & (1 << i)))
+   if (!mask_contains_bar(mask, i))
continue;
pcim_iounmap(pdev, iomap[i]);
pci_release_region(pdev, i);
@@ -438,7 +442,7 @@ void pcim_iounmap_regions(struct pci_dev *pdev, int mask)
return;
 
for (i = 0; i < PCIM_IOMAP_MAX; i++) {
-   if (!(mask & (1 << i)))
+   if (!mask_contains_bar(mask, i))
continue;
 
pcim_iounmap(pdev, iomap[i]);
-- 
2.45.0



[PATCH v8 06/13] PCI: Warn users about complicated devres nature

2024-06-10 Thread Philipp Stanner
The PCI region-request functions become managed functions when
pcim_enable_device() has been called previously instead of
pci_enable_device().

This has already caused a bug (in 8558de401b5f) by confusing users, who
came to believe that all pci functions, such as pci_iomap_range(), suddenly
are managed that way, which is not the case.

Add comments to the relevant functions' docstrings that warn users about
this behavior.
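The "hybrid" behavior being warned about can be condensed into a toy userspace model (not kernel code): the very same request function either does or does not register automatic cleanup, depending only on how the device was enabled earlier.

```c
#include <stdbool.h>

struct toy_dev {
	bool is_managed;	/* set by pcim_enable_device() */
	bool cleanup_registered;
};

/* Models a hybrid pci_request_*() function: its cleanup behavior
 * silently depends on device state set elsewhere. */
static int toy_request_region(struct toy_dev *dev)
{
	/* ... perform the actual region request here ... */
	if (dev->is_managed)
		dev->cleanup_registered = true;	/* surprising side effect */
	return 0;
}
```

This is exactly the kind of non-local behavior that confused the author of 8558de401b5f into assuming pci_iomap_range() was managed, too.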

Link: https://lore.kernel.org/r/20240605081605.18769-8-pstan...@redhat.com
Signed-off-by: Philipp Stanner 
Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/iomap.c | 16 
 drivers/pci/pci.c   | 42 +-
 2 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/iomap.c b/drivers/pci/iomap.c
index c9725428e387..a715a4803c95 100644
--- a/drivers/pci/iomap.c
+++ b/drivers/pci/iomap.c
@@ -23,6 +23,10 @@
  *
  * @maxlen specifies the maximum length to map. If you want to get access to
  * the complete BAR from offset to the end, pass %0 here.
+ *
+ * NOTE:
+ * This function is never managed, even if you initialized with
+ * pcim_enable_device().
  * */
 void __iomem *pci_iomap_range(struct pci_dev *dev,
  int bar,
@@ -63,6 +67,10 @@ EXPORT_SYMBOL(pci_iomap_range);
  *
  * @maxlen specifies the maximum length to map. If you want to get access to
  * the complete BAR from offset to the end, pass %0 here.
+ *
+ * NOTE:
+ * This function is never managed, even if you initialized with
+ * pcim_enable_device().
  * */
 void __iomem *pci_iomap_wc_range(struct pci_dev *dev,
 int bar,
@@ -106,6 +114,10 @@ EXPORT_SYMBOL_GPL(pci_iomap_wc_range);
  *
  * @maxlen specifies the maximum length to map. If you want to get access to
  * the complete BAR without checking for its length first, pass %0 here.
+ *
+ * NOTE:
+ * This function is never managed, even if you initialized with
+ * pcim_enable_device(). If you need automatic cleanup, use pcim_iomap().
  * */
 void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned long maxlen)
 {
@@ -127,6 +139,10 @@ EXPORT_SYMBOL(pci_iomap);
  *
  * @maxlen specifies the maximum length to map. If you want to get access to
  * the complete BAR without checking for its length first, pass %0 here.
+ *
+ * NOTE:
+ * This function is never managed, even if you initialized with
+ * pcim_enable_device().
  * */
 void __iomem *pci_iomap_wc(struct pci_dev *dev, int bar, unsigned long maxlen)
 {
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 7013699db242..5e4f377411ec 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -3900,6 +3900,8 @@ EXPORT_SYMBOL(pci_release_region);
  * @res_name: Name to be associated with resource.
  * @exclusive: whether the region access is exclusive or not
  *
+ * Returns: 0 on success, negative error code on failure.
+ *
  * Mark the PCI region associated with PCI device @pdev BAR @bar as
  * being reserved by owner @res_name.  Do not access any
  * address inside the PCI regions unless this call returns
@@ -3950,6 +3952,8 @@ static int __pci_request_region(struct pci_dev *pdev, int bar,
  * @bar: BAR to be reserved
  * @res_name: Name to be associated with resource
  *
+ * Returns: 0 on success, negative error code on failure.
+ *
  * Mark the PCI region associated with PCI device @pdev BAR @bar as
  * being reserved by owner @res_name.  Do not access any
  * address inside the PCI regions unless this call returns
@@ -3957,6 +3961,11 @@ static int __pci_request_region(struct pci_dev *pdev, int bar,
  *
  * Returns 0 on success, or %EBUSY on error.  A warning
  * message is also printed on failure.
+ *
+ * NOTE:
+ * This is a "hybrid" function: It's normally unmanaged, but becomes managed
+ * when pcim_enable_device() has been called in advance. This hybrid feature is
+ * DEPRECATED! If you want managed cleanup, use the pcim_* functions instead.
  */
 int pci_request_region(struct pci_dev *pdev, int bar, const char *res_name)
 {
@@ -4007,6 +4016,13 @@ static int __pci_request_selected_regions(struct pci_dev *pdev, int bars,
  * @pdev: PCI device whose resources are to be reserved
  * @bars: Bitmask of BARs to be requested
  * @res_name: Name to be associated with resource
+ *
+ * Returns: 0 on success, negative error code on failure.
+ *
+ * NOTE:
+ * This is a "hybrid" function: It's normally unmanaged, but becomes managed
+ * when pcim_enable_device() has been called in advance. This hybrid feature is
+ * DEPRECATED! If you want managed cleanup, use the pcim_* functions instead.
  */
 int pci_request_selected_regions(struct pci_dev *pdev, int bars,
 const char *res_name)
@@ -4015,6 +4031,19 @@ int pci_request_selected_regions(struct pci_dev *pdev, int bars,
 }
 EXPORT_SYMBOL(pci_request_selected_regions);
 
+/**
+ * pci_request_selected_regions_exclusive - Request regions exclusively
+ * @pdev: PCI device to request regions from
+ *

[PATCH v8 05/13] PCI: Make devres region requests consistent

2024-06-10 Thread Philipp Stanner
Now that pure managed region request functions are available, the
implementation of the hybrid-functions which are only sometimes managed can
be made more consistent and readable by wrapping those always-managed
functions.

Implement pcim_request_region_exclusive() as a PCI-internal helper.  Have
the PCI request / release functions call their pcim_ counterparts.  Remove
the now surplus region_mask from struct pci_devres.

Link: https://lore.kernel.org/r/20240605081605.18769-7-pstan...@redhat.com
Signed-off-by: Philipp Stanner 
Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/devres.c | 53 ++--
 drivers/pci/pci.c| 47 +--
 drivers/pci/pci.h| 10 -
 3 files changed, 45 insertions(+), 65 deletions(-)

diff --git a/drivers/pci/devres.c b/drivers/pci/devres.c
index 54b10f5433ab..f2a1250c0679 100644
--- a/drivers/pci/devres.c
+++ b/drivers/pci/devres.c
@@ -24,18 +24,15 @@
  *
  *Consequently, in the new API, region requests performed by the pcim_
  *functions are automatically cleaned up through the devres callback
- *pcim_addr_resource_release(), while requests performed by
- *pcim_enable_device() + pci_*region*() are automatically cleaned up
- *through the for-loop in pcim_release().
+ *pcim_addr_resource_release().
+ *Users utilizing pcim_enable_device() + pci_*region*() are redirected in
+ *pci.c to the managed functions here in this file. This isn't exactly
+ *perfect, but the only alternative way would be to port ALL drivers using
+ *said combination to pcim_ functions.
  *
- * TODO 1:
+ * TODO:
  * Remove the legacy table entirely once all calls to pcim_iomap_table() in
  * the kernel have been removed.
- *
- * TODO 2:
- * Port everyone calling pcim_enable_device() + pci_*region*() to using the
- * pcim_ functions. Then, remove all devres functionality from pci_*region*()
- * functions and remove the associated cleanups described above in point #2.
  */
 
 /*
@@ -399,22 +396,6 @@ static void pcim_release(struct device *gendev, void *res)
 {
struct pci_dev *dev = to_pci_dev(gendev);
struct pci_devres *this = res;
-   int i;
-
-   /*
-* This is legacy code.
-*
-* All regions requested by a pcim_ function do get released through
-* pcim_addr_resource_release(). Thanks to the hybrid nature of the pci_
-* region-request functions, this for-loop has to release the regions
-* if they have been requested by such a function.
-*
-* TODO: Remove this once all users of pcim_enable_device() PLUS
-* pci-region-request-functions have been ported to pcim_ functions.
-*/
-   for (i = 0; i < DEVICE_COUNT_RESOURCE; i++)
-   if (mask_contains_bar(this->region_mask, i))
-   pci_release_region(dev, i);
 
if (this->mwi)
pci_clear_mwi(dev);
@@ -823,11 +804,29 @@ static int _pcim_request_region(struct pci_dev *pdev, int bar, const char *name,
  * The region will automatically be released on driver detach. If desired,
  * release manually only with pcim_release_region().
  */
-static int pcim_request_region(struct pci_dev *pdev, int bar, const char *name)
+int pcim_request_region(struct pci_dev *pdev, int bar, const char *name)
 {
return _pcim_request_region(pdev, bar, name, 0);
 }
 
+/**
+ * pcim_request_region_exclusive - Request a PCI BAR exclusively
+ * @pdev: PCI device to request the region for
+ * @bar: Index of BAR to request
+ * @name: Name associated with the request
+ *
+ * Returns: 0 on success, a negative error code on failure.
+ *
+ * Request region specified by @bar exclusively.
+ *
+ * The region will automatically be released on driver detach. If desired,
+ * release manually only with pcim_release_region().
+ */
+int pcim_request_region_exclusive(struct pci_dev *pdev, int bar, const char *name)
+{
+   return _pcim_request_region(pdev, bar, name, IORESOURCE_EXCLUSIVE);
+}
+
 /**
  * pcim_release_region - Release a PCI BAR
  * @pdev: PCI device to operate on
@@ -836,7 +835,7 @@ static int pcim_request_region(struct pci_dev *pdev, int bar, const char *name)
  * Release a region manually that was previously requested by
  * pcim_request_region().
  */
-static void pcim_release_region(struct pci_dev *pdev, int bar)
+void pcim_release_region(struct pci_dev *pdev, int bar)
 {
struct pcim_addr_devres res_searched;
 
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index d94445f5f882..7013699db242 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -3872,7 +3872,15 @@ EXPORT_SYMBOL(pci_enable_atomic_ops_to_root);
  */
 void pci_release_region(struct pci_dev *pdev, int bar)
 {
-   struct pci_devres *dr;
+   /*
+* This is done for backwards compatibility, because the old PCI devres
+* API had a mode in which the function became managed if it had been
+

[PATCH v8 03/13] PCI: Reimplement plural devres functions

2024-06-10 Thread Philipp Stanner
When the original PCI devres API was implemented, priority was given to the
creation of a set of "plural functions" such as pcim_request_regions().
These functions have bit masks as parameters to specify which BARs shall
get mapped. Most users, however, only use those to map 1-3 BARs.

A complete set of "singular functions" does not exist.

As functions mapping / requesting multiple BARs at once have (almost) no
mechanism in C to return the resources to the caller of the plural
function, the PCI devres API utilizes the iomap-table administered by the
function pcim_iomap_table().

The entire PCI devres API was strongly tied to that table which only allows
for mapping whole, complete BARs, as the BAR's index is used as table
index. Consequently, it's not possible to, e.g., have a pcim_iomap_range()
function with that mechanism.

An additional problem is that the PCI devres API has been implemented in a
sort of "hybrid-mode": Some unmanaged functions have managed counterparts
(e.g.: pci_iomap() <-> pcim_iomap()), making their managed nature obvious
to the programmer. However, the region-request functions in pci.c, prefixed
with pci_, behave either managed or unmanaged, depending on whether
pci_enable_device() or pcim_enable_device() has been called in advance.

This hybrid API is confusing and should be more cleanly separated by
providing always-managed functions prefixed with pcim_.

Thus, the existing PCI devres API is not desirable because:

  a) The vast majority of the users of the plural functions only ever sets
 a single bit in the bit mask, consequently making them singular
 functions anyways.

  b) There is no mechanism to request / iomap only part of a BAR.

  c) The iomap-table mechanism is over-engineered and complicated. Even
 worse, some users index directly into the table that the
 administration function returns, e.g.:

   void __iomem *mapping = pcim_iomap_table(pdev)[my_index];

 This cannot perform bounds checks; an invalid index won't return
 -EINVAL or even NULL, but results in undefined behavior.

  d) region-request functions being sometimes managed and sometimes not
 is bug-provoking.

Implement a set of internal helper functions that don't have the hybrid
nature of their counterparts in pci.c. Write those helpers in a generic
manner so that they can easily be extended to, e.g., ranged mappings and
requests.

Implement a set of singular functions that use devres as it is intended, and
use those singular functions to reimplement the plural functions.

Link: https://lore.kernel.org/r/20240605081605.18769-5-pstan...@redhat.com
Signed-off-by: Philipp Stanner 
Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/devres.c | 608 ++-
 drivers/pci/pci.c    |  22 ++
 drivers/pci/pci.h    |   5 +
 3 files changed, 568 insertions(+), 67 deletions(-)

diff --git a/drivers/pci/devres.c b/drivers/pci/devres.c
index 845d6fab0ce7..82f71f5e164a 100644
--- a/drivers/pci/devres.c
+++ b/drivers/pci/devres.c
@@ -4,14 +4,243 @@
 #include "pci.h"
 
 /*
- * PCI iomap devres
+ * On the state of PCI's devres implementation:
+ *
+ * The older devres API for PCI has two significant problems:
+ *
+ * 1. It is very strongly tied to the statically allocated mapping table in
+ *struct pcim_iomap_devres below. This is mostly solved in the sense of the
+ *pcim_ functions in this file providing things like ranged mapping by
+ *    bypassing this table, whereas the functions that were present in the old
+ *API still enter the mapping addresses into the table for users of the old
+ *API.
+ *
+ * 2. The region-request-functions in pci.c do become managed IF the device has
+ *been enabled with pcim_enable_device() instead of pci_enable_device().
+ *This resulted in the API becoming inconsistent: Some functions have an
+ *obviously managed counter-part (e.g., pci_iomap() <-> pcim_iomap()),
+ *whereas some don't and are never managed, while others don't and are
+ *_sometimes_ managed (e.g. pci_request_region()).
+ *
+ *Consequently, in the new API, region requests performed by the pcim_
+ *functions are automatically cleaned up through the devres callback
+ *pcim_addr_resource_release(), while requests performed by
+ *pcim_enable_device() + pci_*region*() are automatically cleaned up
+ *through the for-loop in pcim_release().
+ *
+ * TODO 1:
+ * Remove the legacy table entirely once all calls to pcim_iomap_table() in
+ * the kernel have been removed.
+ *
+ * TODO 2:
+ * Port everyone calling pcim_enable_device() + pci_*region*() to using the
+ * pcim_ functions. Then, remove all devres functionality from pci_*region*()
+ * functions and remove the associated cleanups described above in point #2.
  */
-#define PCIM_IOMAP_MAX PCI_STD_NUM_BARS
 
+/*
+ * Legacy struct storing addresses to whole mapped BARs

[PATCH v8 02/13] PCI: Add devres helpers for iomap table

2024-06-10 Thread Philipp Stanner
The pcim_iomap_devres.table administered by pcim_iomap_table() has its
entries set and unset at several places throughout devres.c using manual
iterations, which are effectively duplicated code.

Add pcim_add_mapping_to_legacy_table() and
pcim_remove_mapping_from_legacy_table() helper functions and use them where
possible.

Link: https://lore.kernel.org/r/20240605081605.18769-4-pstan...@redhat.com
Signed-off-by: Philipp Stanner 
[bhelgaas: s/short bar/int bar/ for consistency]
Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/devres.c | 77 +---
 1 file changed, 58 insertions(+), 19 deletions(-)

diff --git a/drivers/pci/devres.c b/drivers/pci/devres.c
index f13edd4a3873..845d6fab0ce7 100644
--- a/drivers/pci/devres.c
+++ b/drivers/pci/devres.c
@@ -297,6 +297,52 @@ void __iomem * const *pcim_iomap_table(struct pci_dev *pdev)
 }
 EXPORT_SYMBOL(pcim_iomap_table);
 
+/*
+ * Fill the legacy mapping-table, so that drivers using the old API can
+ * still get a BAR's mapping address through pcim_iomap_table().
+ */
+static int pcim_add_mapping_to_legacy_table(struct pci_dev *pdev,
+   void __iomem *mapping, int bar)
+{
+   void __iomem **legacy_iomap_table;
+
+   if (bar >= PCI_STD_NUM_BARS)
+   return -EINVAL;
+
+   legacy_iomap_table = (void __iomem **)pcim_iomap_table(pdev);
+   if (!legacy_iomap_table)
+   return -ENOMEM;
+
+   /* The legacy mechanism doesn't allow for duplicate mappings. */
+   WARN_ON(legacy_iomap_table[bar]);
+
+   legacy_iomap_table[bar] = mapping;
+
+   return 0;
+}
+
+/*
+ * Remove a mapping. The table only contains whole-BAR mappings, so this will
+ * never interfere with ranged mappings.
+ */
+static void pcim_remove_mapping_from_legacy_table(struct pci_dev *pdev,
+ void __iomem *addr)
+{
+   int bar;
+   void __iomem **legacy_iomap_table;
+
+   legacy_iomap_table = (void __iomem **)pcim_iomap_table(pdev);
+   if (!legacy_iomap_table)
+   return;
+
+   for (bar = 0; bar < PCI_STD_NUM_BARS; bar++) {
+   if (legacy_iomap_table[bar] == addr) {
+   legacy_iomap_table[bar] = NULL;
+   return;
+   }
+   }
+}
+
 /**
  * pcim_iomap - Managed pcim_iomap()
  * @pdev: PCI device to iomap for
@@ -308,16 +354,20 @@ EXPORT_SYMBOL(pcim_iomap_table);
  */
 void __iomem *pcim_iomap(struct pci_dev *pdev, int bar, unsigned long maxlen)
 {
-   void __iomem **tbl;
+   void __iomem *mapping;
 
-   BUG_ON(bar >= PCIM_IOMAP_MAX);
-
-   tbl = (void __iomem **)pcim_iomap_table(pdev);
-   if (!tbl || tbl[bar])   /* duplicate mappings not allowed */
+   mapping = pci_iomap(pdev, bar, maxlen);
+   if (!mapping)
return NULL;
 
-   tbl[bar] = pci_iomap(pdev, bar, maxlen);
-   return tbl[bar];
+   if (pcim_add_mapping_to_legacy_table(pdev, mapping, bar) != 0)
+   goto err_table;
+
+   return mapping;
+
+err_table:
+   pci_iounmap(pdev, mapping);
+   return NULL;
 }
 EXPORT_SYMBOL(pcim_iomap);
 
@@ -330,20 +380,9 @@ EXPORT_SYMBOL(pcim_iomap);
  */
 void pcim_iounmap(struct pci_dev *pdev, void __iomem *addr)
 {
-   void __iomem **tbl;
-   int i;
-
pci_iounmap(pdev, addr);
 
-   tbl = (void __iomem **)pcim_iomap_table(pdev);
-   BUG_ON(!tbl);
-
-   for (i = 0; i < PCIM_IOMAP_MAX; i++)
-   if (tbl[i] == addr) {
-   tbl[i] = NULL;
-   return;
-   }
-   WARN_ON(1);
+   pcim_remove_mapping_from_legacy_table(pdev, addr);
 }
 EXPORT_SYMBOL(pcim_iounmap);
 
-- 
2.45.0



[PATCH v8 00/13] Make PCI's devres API more consistent

2024-06-10 Thread Philipp Stanner
 complicated nature of PCI's devres.


Note that this series is based on my "unify pci_iounmap"-series from a
few weeks ago. [1]

I tested this on a x86 VM with a simple pci test-device with two
regions. Operates and reserves resources as intended on my system.
Kasan and kmemleak didn't find any problems.

I believe this series cleans the API up as much as possible without
having to port all existing drivers to the new API. Especially, I think
that this implementation is easy to extend if the need for new managed
functions arises :)

Greetings,
P.

Philipp Stanner (13):
  PCI: Add and use devres helper for bit masks
  PCI: Add devres helpers for iomap table
  PCI: Reimplement plural devres functions
  PCI: Deprecate two surplus devres functions
  PCI: Make devres region requests consistent
  PCI: Warn users about complicated devres nature
  PCI: Remove enabled status bit from pci_devres
  PCI: Move pinned status bit to struct pci_dev
  PCI: Give pcim_set_mwi() its own devres callback
  PCI: Give pci_intx() its own devres callback
  PCI: Remove legacy pcim_release()
  PCI: Add pcim_iomap_range()
  drm/vboxvideo: fix mapping leaks

 drivers/gpu/drm/vboxvideo/vbox_main.c |  20 +-
 drivers/pci/devres.c  | 903 +-
 drivers/pci/iomap.c   |  16 +
 drivers/pci/pci.c |  94 ++-
 drivers/pci/pci.h |  23 +-
 include/linux/pci.h   |   6 +-
 6 files changed, 859 insertions(+), 203 deletions(-)

-- 
2.45.0



Re: [PATCH v7 07/13] PCI: Move dev-enabled status bit to struct pci_dev

2024-06-06 Thread Philipp Stanner
On Wed, 2024-06-05 at 16:11 -0500, Bjorn Helgaas wrote:
> On Wed, Jun 05, 2024 at 10:15:59AM +0200, Philipp Stanner wrote:
> > The bit describing whether the PCI device is currently enabled is
> > stored
> > in struct pci_devres. Besides this struct being subject of a
> > cleanup
> > process, struct pci_device is in general the right place to store
> > this
> > information, since it is not devres-specific.
> > 
> > Move the 'enabled' boolean bit to struct pci_dev.
> 
> I think this (and the similar 'pinned' patch) appeared in v6.

Yes. This patch and its siblings serve to remove members from
struct pci_devres step by step, so that it can ultimately be removed
and we won't have a generic devres struct anymore, but actual
resource-specific structs.

> 
> It sounds plausible to have this in struct pci_dev, but it's
> confusing
> to have both:
> 
>   pci_dev.enabled
>   pci_dev.enable_cnt, used by pci_is_enabled()
> 
> I haven't looked hard enough to see whether both are required.  If
> they are, I think we should rename "enabled" to something descriptive
> enough to make it obviously different from "enable_cnt".

I took a look at it and I think we can actually drop "enabled" and use
"enable_cnt" for everything. That would even simplify things more, I'd
say.

Let me provide that in v8.


P.

> 
> > Signed-off-by: Philipp Stanner 
> > ---
> >  drivers/pci/devres.c | 11 ---
> >  drivers/pci/pci.c    | 17 ++---
> >  drivers/pci/pci.h    |  1 -
> >  include/linux/pci.h  |  1 +
> >  4 files changed, 15 insertions(+), 15 deletions(-)
> > 
> > diff --git a/drivers/pci/devres.c b/drivers/pci/devres.c
> > index 572a4e193879..ea590caf8995 100644
> > --- a/drivers/pci/devres.c
> > +++ b/drivers/pci/devres.c
> > @@ -398,7 +398,7 @@ static void pcim_release(struct device *gendev,
> > void *res)
> > if (this->restore_intx)
> > pci_intx(dev, this->orig_intx);
> >  
> > -   if (this->enabled && !this->pinned)
> > +   if (!this->pinned)
> > pci_disable_device(dev);
> >  }
> >  
> > @@ -441,14 +441,11 @@ int pcim_enable_device(struct pci_dev *pdev)
> > dr = get_pci_dr(pdev);
> > if (unlikely(!dr))
> > return -ENOMEM;
> > -   if (dr->enabled)
> > -   return 0;
> >  
> > rc = pci_enable_device(pdev);
> > -   if (!rc) {
> > +   if (!rc)
> > pdev->is_managed = 1;
> > -   dr->enabled = 1;
> > -   }
> > +
> > return rc;
> >  }
> >  EXPORT_SYMBOL(pcim_enable_device);
> > @@ -466,7 +463,7 @@ void pcim_pin_device(struct pci_dev *pdev)
> > struct pci_devres *dr;
> >  
> > dr = find_pci_dr(pdev);
> > -   WARN_ON(!dr || !dr->enabled);
> > +   WARN_ON(!dr || !pdev->enabled);
> > if (dr)
> > dr->pinned = 1;
> >  }
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index 8dd711b9a291..04accdfab7ce 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -2011,6 +2011,9 @@ static int do_pci_enable_device(struct
> > pci_dev *dev, int bars)
> > u16 cmd;
> > u8 pin;
> >  
> > +   if (dev->enabled)
> > +   return 0;
> > +
> > err = pci_set_power_state(dev, PCI_D0);
> > if (err < 0 && err != -EIO)
> > return err;
> > @@ -2025,7 +2028,7 @@ static int do_pci_enable_device(struct
> > pci_dev *dev, int bars)
> > pci_fixup_device(pci_fixup_enable, dev);
> >  
> > if (dev->msi_enabled || dev->msix_enabled)
> > -   return 0;
> > +   goto success_out;
> >  
> > pci_read_config_byte(dev, PCI_INTERRUPT_PIN, &pin);
> > if (pin) {
> > @@ -2035,6 +2038,8 @@ static int do_pci_enable_device(struct
> > pci_dev *dev, int bars)
> >   cmd &
> > ~PCI_COMMAND_INTX_DISABLE);
> > }
> >  
> > +success_out:
> > +   dev->enabled = true;
> > return 0;
> >  }
> >  
> > @@ -2193,6 +2198,9 @@ static void do_pci_disable_device(struct
> > pci_dev *dev)
> >  {
> > u16 pci_command;
> >  
> > +   if (!dev->enabled)
> > +   return;
> > +
> > 

[PATCH v7 01/13] PCI: Add and use devres helper for bit masks

2024-06-05 Thread Philipp Stanner
The current devres implementation uses manual shift operations to check
whether a bit in a mask is set. The code can be made more readable by
writing a small helper function for that.

Implement mask_contains_bar() and use it where applicable.

Signed-off-by: Philipp Stanner 
---
 drivers/pci/devres.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/devres.c b/drivers/pci/devres.c
index 2c562b9eaf80..f13edd4a3873 100644
--- a/drivers/pci/devres.c
+++ b/drivers/pci/devres.c
@@ -161,6 +161,10 @@ int pcim_set_mwi(struct pci_dev *dev)
 }
 EXPORT_SYMBOL(pcim_set_mwi);
 
+static inline bool mask_contains_bar(int mask, int bar)
+{
+   return mask & BIT(bar);
+}
 
 static void pcim_release(struct device *gendev, void *res)
 {
@@ -169,7 +173,7 @@ static void pcim_release(struct device *gendev, void *res)
int i;
 
for (i = 0; i < DEVICE_COUNT_RESOURCE; i++)
-   if (this->region_mask & (1 << i))
+   if (mask_contains_bar(this->region_mask, i))
pci_release_region(dev, i);
 
if (this->mwi)
@@ -363,7 +367,7 @@ int pcim_iomap_regions(struct pci_dev *pdev, int mask, const char *name)
for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) {
unsigned long len;
 
-   if (!(mask & (1 << i)))
+   if (!mask_contains_bar(mask, i))
continue;
 
rc = -EINVAL;
@@ -386,7 +390,7 @@ int pcim_iomap_regions(struct pci_dev *pdev, int mask, const char *name)
pci_release_region(pdev, i);
  err_inval:
while (--i >= 0) {
-   if (!(mask & (1 << i)))
+   if (!mask_contains_bar(mask, i))
continue;
pcim_iounmap(pdev, iomap[i]);
pci_release_region(pdev, i);
@@ -438,7 +442,7 @@ void pcim_iounmap_regions(struct pci_dev *pdev, int mask)
return;
 
for (i = 0; i < PCIM_IOMAP_MAX; i++) {
-   if (!(mask & (1 << i)))
+   if (!mask_contains_bar(mask, i))
continue;
 
pcim_iounmap(pdev, iomap[i]);
-- 
2.45.0



[PATCH v7 13/13] drm/vboxvideo: fix mapping leaks

2024-06-05 Thread Philipp Stanner
When the PCI devres API was introduced to this driver, it was wrongly
assumed that initializing the device with pcim_enable_device() instead
of pci_enable_device() will make all PCI functions managed.

This is wrong and was caused by the quite confusing PCI devres API in
which some, but not all, functions become managed that way.

The function pci_iomap_range() is never managed.

Replace pci_iomap_range() with the actually managed function
pcim_iomap_range().

Fixes: 8558de401b5f ("drm/vboxvideo: use managed pci functions")
Signed-off-by: Philipp Stanner 
Reviewed-by: Hans de Goede 
---
 drivers/gpu/drm/vboxvideo/vbox_main.c | 20 +---
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/vboxvideo/vbox_main.c b/drivers/gpu/drm/vboxvideo/vbox_main.c
index 42c2d8a99509..d4ade9325401 100644
--- a/drivers/gpu/drm/vboxvideo/vbox_main.c
+++ b/drivers/gpu/drm/vboxvideo/vbox_main.c
@@ -42,12 +42,11 @@ static int vbox_accel_init(struct vbox_private *vbox)
/* Take a command buffer for each screen from the end of usable VRAM. */
vbox->available_vram_size -= vbox->num_crtcs * VBVA_MIN_BUFFER_SIZE;
 
-   vbox->vbva_buffers = pci_iomap_range(pdev, 0,
-vbox->available_vram_size,
-vbox->num_crtcs *
-VBVA_MIN_BUFFER_SIZE);
-   if (!vbox->vbva_buffers)
-   return -ENOMEM;
+   vbox->vbva_buffers = pcim_iomap_range(
+   pdev, 0, vbox->available_vram_size,
+   vbox->num_crtcs * VBVA_MIN_BUFFER_SIZE);
+   if (IS_ERR(vbox->vbva_buffers))
+   return PTR_ERR(vbox->vbva_buffers);
 
for (i = 0; i < vbox->num_crtcs; ++i) {
vbva_setup_buffer_context(&vbox->vbva_info[i],
@@ -116,11 +115,10 @@ int vbox_hw_init(struct vbox_private *vbox)
DRM_INFO("VRAM %08x\n", vbox->full_vram_size);
 
/* Map guest-heap at end of vram */
-   vbox->guest_heap =
-   pci_iomap_range(pdev, 0, GUEST_HEAP_OFFSET(vbox),
-   GUEST_HEAP_SIZE);
-   if (!vbox->guest_heap)
-   return -ENOMEM;
+   vbox->guest_heap = pcim_iomap_range(pdev, 0,
+   GUEST_HEAP_OFFSET(vbox), GUEST_HEAP_SIZE);
+   if (IS_ERR(vbox->guest_heap))
+   return PTR_ERR(vbox->guest_heap);
 
/* Create guest-heap mem-pool use 2^4 = 16 byte chunks */
vbox->guest_pool = devm_gen_pool_create(vbox->ddev.dev, 4, -1,
-- 
2.45.0



[PATCH v7 04/13] PCI: Deprecate two surplus devres functions

2024-06-05 Thread Philipp Stanner
pcim_iomap_table() should not be used anymore because it contributed to
the PCI devres API being designed contrary to devres's design goals.

pcim_iomap_regions_request_all() is a surplus, complicated function
that can easily be replaced by using a pcim_* request function in
combination with a pcim_* mapping function.

Mark pcim_iomap_table() and pcim_iomap_regions_request_all() as
deprecated in the function documentation.

Signed-off-by: Philipp Stanner 
---
 drivers/pci/devres.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/devres.c b/drivers/pci/devres.c
index e6e791c9db6e..f199f610ae51 100644
--- a/drivers/pci/devres.c
+++ b/drivers/pci/devres.c
@@ -501,7 +501,7 @@ static void pcim_iomap_release(struct device *gendev, void *res)
 }
 
 /**
- * pcim_iomap_table - access iomap allocation table
+ * pcim_iomap_table - access iomap allocation table (DEPRECATED)
  * @pdev: PCI device to access iomap table for
  *
  * Returns:
@@ -515,6 +515,11 @@ static void pcim_iomap_release(struct device *gendev, void *res)
  * This function might sleep when the table is first allocated but can
  * be safely called without context and guaranteed to succeed once
  * allocated.
+ *
+ * This function is DEPRECATED. Do not use it in new code. Instead, obtain a
+ * mapping's address directly from one of the pcim_* mapping functions. For
+ * example:
+ * void __iomem *mappy = pcim_iomap(pdev, barnr, length);
  */
 void __iomem * const *pcim_iomap_table(struct pci_dev *pdev)
 {
@@ -886,7 +891,7 @@ static int pcim_request_all_regions(struct pci_dev *pdev, const char *name)
 }
 
 /**
- * pcim_iomap_regions_request_all - Request all BARs and iomap specified ones
+ * pcim_iomap_regions_request_all - Request all BARs and iomap specified ones (DEPRECATED)
  * @pdev: PCI device to map IO resources for
  * @mask: Mask of BARs to iomap
  * @name: Name associated with the requests
@@ -897,6 +902,9 @@ static int pcim_request_all_regions(struct pci_dev *pdev, const char *name)
  *
  * To release these resources manually, call pcim_release_region() for the
  * regions and pcim_iounmap() for the mappings.
+ *
+ * This function is DEPRECATED. Don't use it in new code. Instead, use one of the
+ * pcim_* region request functions in combination with a pcim_* mapping function.
  */
 int pcim_iomap_regions_request_all(struct pci_dev *pdev, int mask,
   const char *name)
-- 
2.45.0



[PATCH v7 12/13] PCI: Add pcim_iomap_range()

2024-06-05 Thread Philipp Stanner
The only managed mapping function currently is pcim_iomap() which
doesn't allow for mapping an area starting at a certain offset, which
many drivers want.

Add pcim_iomap_range() as an exported function.

Signed-off-by: Philipp Stanner 
---
 drivers/pci/devres.c | 44 
 include/linux/pci.h  |  2 ++
 2 files changed, 46 insertions(+)

diff --git a/drivers/pci/devres.c b/drivers/pci/devres.c
index 271ffd1aaf47..5ddcfe001d08 100644
--- a/drivers/pci/devres.c
+++ b/drivers/pci/devres.c
@@ -1007,3 +1007,47 @@ void pcim_iounmap_regions(struct pci_dev *pdev, int mask)
}
 }
 EXPORT_SYMBOL(pcim_iounmap_regions);
+
+/**
+ * pcim_iomap_range - Create a ranged __iomap mapping within a PCI BAR
+ * @pdev: PCI device to map IO resources for
+ * @bar: Index of the BAR
+ * @offset: Offset from the begin of the BAR
+ * @len: Length in bytes for the mapping
+ *
+ * Returns: __iomem pointer on success, an IOMEM_ERR_PTR on failure.
+ *
+ * Creates a new IO-Mapping within the specified @bar, ranging from @offset to
+ * @offset + @len.
+ *
+ * The mapping will automatically get unmapped on driver detach. If desired,
+ * release manually only with pcim_iounmap().
+ */
+void __iomem *pcim_iomap_range(struct pci_dev *pdev, int bar,
+   unsigned long offset, unsigned long len)
+{
+   void __iomem *mapping;
+   struct pcim_addr_devres *res;
+
+   res = pcim_addr_devres_alloc(pdev);
+   if (!res)
+   return IOMEM_ERR_PTR(-ENOMEM);
+
+   mapping = pci_iomap_range(pdev, bar, offset, len);
+   if (!mapping) {
+   pcim_addr_devres_free(res);
+   return IOMEM_ERR_PTR(-EINVAL);
+   }
+
+   res->type = PCIM_ADDR_DEVRES_TYPE_MAPPING;
+   res->baseaddr = mapping;
+
+   /*
+* Ranged mappings don't get added to the legacy-table, since the table
+* only ever keeps track of whole BARs.
+*/
+
+   devres_add(&pdev->dev, res);
+   return mapping;
+}
+EXPORT_SYMBOL(pcim_iomap_range);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 3104c0238a42..f6918e49ea5f 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -2329,6 +2329,8 @@ int pcim_iomap_regions(struct pci_dev *pdev, int mask, 
const char *name);
 int pcim_iomap_regions_request_all(struct pci_dev *pdev, int mask,
   const char *name);
 void pcim_iounmap_regions(struct pci_dev *pdev, int mask);
+void __iomem *pcim_iomap_range(struct pci_dev *pdev, int bar,
+   unsigned long offset, unsigned long len);
 
 extern int pci_pci_problems;
 #define PCIPCI_FAIL1   /* No PCI PCI DMA */
-- 
2.45.0



[PATCH v7 10/13] PCI: Give pci(m)_intx its own devres callback

2024-06-05 Thread Philipp Stanner
pci_intx() is one of the functions that have "hybrid mode" (i.e.,
sometimes managed, sometimes not). Providing a separate pcim_intx()
function with its own device resource and cleanup callback allows for
removing further large parts of the legacy PCI devres implementation.

As with the region-request functions, pci_intx() has to call into its
managed counterpart for backwards compatibility.

As pci_intx() is an outdated function, pcim_intx() shall not be made
visible to drivers via a public API.

Implement pcim_intx() with its own device resource.
Make pci_intx() call pcim_intx() in the managed case.

Signed-off-by: Philipp Stanner 
---
 drivers/pci/devres.c | 76 
 drivers/pci/pci.c| 23 --
 drivers/pci/pci.h|  7 ++--
 3 files changed, 80 insertions(+), 26 deletions(-)

diff --git a/drivers/pci/devres.c b/drivers/pci/devres.c
index 0bafb67e1886..9a997de280df 100644
--- a/drivers/pci/devres.c
+++ b/drivers/pci/devres.c
@@ -40,6 +40,11 @@ struct pcim_iomap_devres {
void __iomem *table[PCI_STD_NUM_BARS];
 };
 
+/* Used to restore the old intx state on driver detach. */
+struct pcim_intx_devres {
+   int orig_intx;
+};
+
 enum pcim_addr_devres_type {
/* Default initializer. */
PCIM_ADDR_DEVRES_TYPE_INVALID,
@@ -392,32 +397,75 @@ int pcim_set_mwi(struct pci_dev *pdev)
 }
 EXPORT_SYMBOL(pcim_set_mwi);
 
+
 static inline bool mask_contains_bar(int mask, int bar)
 {
return mask & BIT(bar);
 }
 
-static void pcim_release(struct device *gendev, void *res)
+static void pcim_intx_restore(struct device *dev, void *data)
 {
-   struct pci_dev *dev = to_pci_dev(gendev);
-   struct pci_devres *this = res;
+   struct pci_dev *pdev = to_pci_dev(dev);
+   struct pcim_intx_devres *res = data;
 
-   if (this->restore_intx)
-   pci_intx(dev, this->orig_intx);
+   pci_intx(pdev, res->orig_intx);
+}
 
-   if (!dev->pinned)
-   pci_disable_device(dev);
+static struct pcim_intx_devres *get_or_create_intx_devres(struct device *dev)
+{
+   struct pcim_intx_devres *res;
+
+   res = devres_find(dev, pcim_intx_restore, NULL, NULL);
+   if (res)
+   return res;
+
+   res = devres_alloc(pcim_intx_restore, sizeof(*res), GFP_KERNEL);
+   if (res)
+   devres_add(dev, res);
+
+   return res;
 }
 
-/*
- * TODO: After the last four callers in pci.c are ported, find_pci_dr()
- * needs to be made static again.
+/**
+ * pcim_intx - managed pci_intx()
+ * @pdev: the PCI device to operate on
+ * @enable: boolean: whether to enable or disable PCI INTx
+ *
+ * Returns: 0 on success, -ENOMEM on error.
+ *
+ * Enables/disables PCI INTx for device @pdev.
+ * Restores the original state on driver detach.
  */
-struct pci_devres *find_pci_dr(struct pci_dev *pdev)
+int pcim_intx(struct pci_dev *pdev, int enable)
 {
-   if (pci_is_managed(pdev))
-   return devres_find(&pdev->dev, pcim_release, NULL, NULL);
-   return NULL;
+   u16 pci_command, new;
+   struct pcim_intx_devres *res;
+
+   res = get_or_create_intx_devres(&pdev->dev);
+   if (!res)
+   return -ENOMEM;
+
+   res->orig_intx = !enable;
+
+   pci_read_config_word(pdev, PCI_COMMAND, &pci_command);
+
+   if (enable)
+   new = pci_command & ~PCI_COMMAND_INTX_DISABLE;
+   else
+   new = pci_command | PCI_COMMAND_INTX_DISABLE;
+
+   if (new != pci_command)
+   pci_write_config_word(pdev, PCI_COMMAND, new);
+
+   return 0;
+}
+
+static void pcim_release(struct device *gendev, void *res)
+{
+   struct pci_dev *dev = to_pci_dev(gendev);
+
+   if (!dev->pinned)
+   pci_disable_device(dev);
 }
 
 static struct pci_devres *get_pci_dr(struct pci_dev *pdev)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 04accdfab7ce..de58e77f0ee0 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -,11 +,23 @@ void pci_disable_parity(struct pci_dev *dev)
  * This is a "hybrid" function: It's normally unmanaged, but becomes managed
  * when pcim_enable_device() has been called in advance. This hybrid feature is
  * DEPRECATED!
+ *
+ * Use pcim_intx() if you need a managed version.
  */
 void pci_intx(struct pci_dev *pdev, int enable)
 {
u16 pci_command, new;
 
+   /*
+* This is done for backwards compatibility, because the old PCI devres
+* API had a mode in which this function became managed if the dev had
+* been enabled with pcim_enable_device() instead of pci_enable_device().
+*/
+   if (pci_is_managed(pdev)) {
+   WARN_ON_ONCE(pcim_intx(pdev, enable) != 0);
+   return;
+   }
+
pci_read_config_word(pdev, PCI_COMMAND, &pci_command);
 
if (enable)
@@ -4456,17 +4468,8 @@ void pci_intx(struct pci_dev *pdev, int enable)

[PATCH v7 08/13] PCI: Move pinned status bit to struct pci_dev

2024-06-05 Thread Philipp Stanner
The bit describing whether the PCI device is currently pinned is stored
in struct pci_devres. To clean up and simplify the PCI devres API, it's
better if this information is stored in struct pci_dev.

This will later permit simplifying pcim_enable_device().

Move the 'pinned' boolean bit to struct pci_dev.

Restructure bits in struct pci_dev so the pm / pme fields are next to
each other.

Signed-off-by: Philipp Stanner 
---
 drivers/pci/devres.c | 14 --
 drivers/pci/pci.h|  1 -
 include/linux/pci.h  |  5 +++--
 3 files changed, 7 insertions(+), 13 deletions(-)

diff --git a/drivers/pci/devres.c b/drivers/pci/devres.c
index ea590caf8995..936369face4b 100644
--- a/drivers/pci/devres.c
+++ b/drivers/pci/devres.c
@@ -398,7 +398,7 @@ static void pcim_release(struct device *gendev, void *res)
if (this->restore_intx)
pci_intx(dev, this->orig_intx);
 
-   if (!this->pinned)
+   if (!dev->pinned)
pci_disable_device(dev);
 }
 
@@ -454,18 +454,12 @@ EXPORT_SYMBOL(pcim_enable_device);
  * pcim_pin_device - Pin managed PCI device
  * @pdev: PCI device to pin
  *
- * Pin managed PCI device @pdev.  Pinned device won't be disabled on
- * driver detach.  @pdev must have been enabled with
- * pcim_enable_device().
+ * Pin managed PCI device @pdev. Pinned device won't be disabled on driver
+ * detach. @pdev must have been enabled with pcim_enable_device().
  */
 void pcim_pin_device(struct pci_dev *pdev)
 {
-   struct pci_devres *dr;
-
-   dr = find_pci_dr(pdev);
-   WARN_ON(!dr || !pdev->enabled);
-   if (dr)
-   dr->pinned = 1;
+   pdev->pinned = true;
 }
 EXPORT_SYMBOL(pcim_pin_device);
 
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index e223e0f7dada..ff439dd05200 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -823,7 +823,6 @@ static inline pci_power_t mid_pci_get_power_state(struct pci_dev *pdev)
  * then remove them from here.
  */
 struct pci_devres {
-   unsigned int pinned:1;
unsigned int orig_intx:1;
unsigned int restore_intx:1;
unsigned int mwi:1;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 110548f00b3b..3104c0238a42 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -367,11 +367,12 @@ struct pci_dev {
   this is D0-D3, D0 being fully
   functional, and D3 being off. */
u8  pm_cap; /* PM capability offset */
-   unsigned intenabled:1;  /* Whether this dev is enabled */
-   unsigned intimm_ready:1;/* Supports Immediate Readiness */
unsigned intpme_support:5;  /* Bitmask of states from which PME#
   can be generated */
unsigned intpme_poll:1; /* Poll device's PME status bit */
+   unsigned intenabled:1;  /* Whether this dev is enabled */
+   unsigned intpinned:1;   /* Whether this dev is pinned */
+   unsigned intimm_ready:1;/* Supports Immediate Readiness */
unsigned intd1_support:1;   /* Low power state D1 is supported */
unsigned intd2_support:1;   /* Low power state D2 is supported */
unsigned intno_d1d2:1;  /* D1 and D2 are forbidden */
-- 
2.45.0



[PATCH v7 09/13] PCI: Give pcim_set_mwi() its own devres callback

2024-06-05 Thread Philipp Stanner
Managing pci_set_mwi() with devres can easily be done with its own
callback, without having to store any state about it in a device-related
struct.

Remove the MWI state from struct pci_devres.
Give pcim_set_mwi() a separate devres-callback.

Signed-off-by: Philipp Stanner 
---
 drivers/pci/devres.c | 29 ++---
 drivers/pci/pci.h|  1 -
 2 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/drivers/pci/devres.c b/drivers/pci/devres.c
index 936369face4b..0bafb67e1886 100644
--- a/drivers/pci/devres.c
+++ b/drivers/pci/devres.c
@@ -361,24 +361,34 @@ void __iomem *devm_pci_remap_cfg_resource(struct device *dev,
 }
 EXPORT_SYMBOL(devm_pci_remap_cfg_resource);
 
+static void __pcim_clear_mwi(void *pdev_raw)
+{
+   struct pci_dev *pdev = pdev_raw;
+
+   pci_clear_mwi(pdev);
+}
+
 /**
  * pcim_set_mwi - a device-managed pci_set_mwi()
- * @dev: the PCI device for which MWI is enabled
+ * @pdev: the PCI device for which MWI is enabled
  *
  * Managed pci_set_mwi().
  *
  * RETURNS: An appropriate -ERRNO error value on error, or zero for success.
  */
-int pcim_set_mwi(struct pci_dev *dev)
+int pcim_set_mwi(struct pci_dev *pdev)
 {
-   struct pci_devres *dr;
+   int ret;
 
-   dr = find_pci_dr(dev);
-   if (!dr)
-   return -ENOMEM;
+   ret = devm_add_action(&pdev->dev, __pcim_clear_mwi, pdev);
+   if (ret != 0)
+   return ret;
+
+   ret = pci_set_mwi(pdev);
+   if (ret != 0)
+   devm_remove_action(&pdev->dev, __pcim_clear_mwi, pdev);
 
-   dr->mwi = 1;
-   return pci_set_mwi(dev);
+   return ret;
 }
 EXPORT_SYMBOL(pcim_set_mwi);
 
@@ -392,9 +402,6 @@ static void pcim_release(struct device *gendev, void *res)
struct pci_dev *dev = to_pci_dev(gendev);
struct pci_devres *this = res;
 
-   if (this->mwi)
-   pci_clear_mwi(dev);
-
if (this->restore_intx)
pci_intx(dev, this->orig_intx);
 
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index ff439dd05200..dbf6772f 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -825,7 +825,6 @@ static inline pci_power_t mid_pci_get_power_state(struct pci_dev *pdev)
 struct pci_devres {
unsigned int orig_intx:1;
unsigned int restore_intx:1;
-   unsigned int mwi:1;
 };
 
 struct pci_devres *find_pci_dr(struct pci_dev *pdev);
-- 
2.45.0



[PATCH v7 06/13] PCI: Warn users about complicated devres nature

2024-06-05 Thread Philipp Stanner
The PCI region-request functions become managed functions when
pcim_enable_device() has been called previously instead of
pci_enable_device().

This has already caused a bug (in commit 8558de401b5f) by confusing users,
who came to believe that all PCI functions, such as pci_iomap_range(),
are suddenly managed that way.

This is not the case.

Add comments to the relevant functions' docstrings that warn users about
this behavior.

Signed-off-by: Philipp Stanner 
---
 drivers/pci/iomap.c | 16 
 drivers/pci/pci.c   | 42 +-
 2 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/iomap.c b/drivers/pci/iomap.c
index c9725428e387..a715a4803c95 100644
--- a/drivers/pci/iomap.c
+++ b/drivers/pci/iomap.c
@@ -23,6 +23,10 @@
  *
  * @maxlen specifies the maximum length to map. If you want to get access to
  * the complete BAR from offset to the end, pass %0 here.
+ *
+ * NOTE:
+ * This function is never managed, even if you initialized with
+ * pcim_enable_device().
  * */
 void __iomem *pci_iomap_range(struct pci_dev *dev,
  int bar,
@@ -63,6 +67,10 @@ EXPORT_SYMBOL(pci_iomap_range);
  *
  * @maxlen specifies the maximum length to map. If you want to get access to
  * the complete BAR from offset to the end, pass %0 here.
+ *
+ * NOTE:
+ * This function is never managed, even if you initialized with
+ * pcim_enable_device().
  * */
 void __iomem *pci_iomap_wc_range(struct pci_dev *dev,
 int bar,
@@ -106,6 +114,10 @@ EXPORT_SYMBOL_GPL(pci_iomap_wc_range);
  *
  * @maxlen specifies the maximum length to map. If you want to get access to
  * the complete BAR without checking for its length first, pass %0 here.
+ *
+ * NOTE:
+ * This function is never managed, even if you initialized with
+ * pcim_enable_device(). If you need automatic cleanup, use pcim_iomap().
  * */
 void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned long maxlen)
 {
@@ -127,6 +139,10 @@ EXPORT_SYMBOL(pci_iomap);
  *
  * @maxlen specifies the maximum length to map. If you want to get access to
  * the complete BAR without checking for its length first, pass %0 here.
+ *
+ * NOTE:
+ * This function is never managed, even if you initialized with
+ * pcim_enable_device().
  * */
 void __iomem *pci_iomap_wc(struct pci_dev *dev, int bar, unsigned long maxlen)
 {
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index e4feb093f097..8dd711b9a291 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -3897,6 +3897,8 @@ EXPORT_SYMBOL(pci_release_region);
  * @res_name: Name to be associated with resource.
  * @exclusive: whether the region access is exclusive or not
  *
+ * Returns: 0 on success, negative error code on failure.
+ *
  * Mark the PCI region associated with PCI device @pdev BAR @bar as
  * being reserved by owner @res_name.  Do not access any
  * address inside the PCI regions unless this call returns
@@ -3947,6 +3949,8 @@ static int __pci_request_region(struct pci_dev *pdev, int 
bar,
  * @bar: BAR to be reserved
  * @res_name: Name to be associated with resource
  *
+ * Returns: 0 on success, negative error code on failure.
+ *
  * Mark the PCI region associated with PCI device @pdev BAR @bar as
  * being reserved by owner @res_name.  Do not access any
  * address inside the PCI regions unless this call returns
@@ -3954,6 +3958,11 @@ static int __pci_request_region(struct pci_dev *pdev, 
int bar,
  *
  * Returns 0 on success, or %EBUSY on error.  A warning
  * message is also printed on failure.
+ *
+ * NOTE:
+ * This is a "hybrid" function: It's normally unmanaged, but becomes managed
+ * when pcim_enable_device() has been called in advance. This hybrid feature is
+ * DEPRECATED! If you want managed cleanup, use the pcim_* functions instead.
  */
 int pci_request_region(struct pci_dev *pdev, int bar, const char *res_name)
 {
@@ -4004,6 +4013,13 @@ static int __pci_request_selected_regions(struct pci_dev 
*pdev, int bars,
  * @pdev: PCI device whose resources are to be reserved
  * @bars: Bitmask of BARs to be requested
  * @res_name: Name to be associated with resource
+ *
+ * Returns: 0 on success, negative error code on failure.
+ *
+ * NOTE:
+ * This is a "hybrid" function: It's normally unmanaged, but becomes managed
+ * when pcim_enable_device() has been called in advance. This hybrid feature is
+ * DEPRECATED! If you want managed cleanup, use the pcim_* functions instead.
  */
 int pci_request_selected_regions(struct pci_dev *pdev, int bars,
 const char *res_name)
@@ -4012,6 +4028,19 @@ int pci_request_selected_regions(struct pci_dev *pdev, 
int bars,
 }
 EXPORT_SYMBOL(pci_request_selected_regions);
 
+/**
+ * pci_request_selected_regions_exclusive - Request regions exclusively
+ * @pdev: PCI device to request regions from
+ * @bars: bit mask of bars to request
+ * @res_name: name to be associated with the requests
+ *
+ * Re

[PATCH v7 02/13] PCI: Add devres helpers for iomap table

2024-06-05 Thread Philipp Stanner
The iomap table administered by pcim_iomap_table() has its entries set
and unset at several places throughout devres.c through manual
iteration, which is effectively duplicated code.

This can be done in a centralized, reusable manner.

Providing these new functions here and using them where (already)
possible will allow for using them in subsequent cleanup steps to
simplify the PCI devres API.

Implement helper functions to add mappings to the table and to remove
them again. Use them where applicable.

Signed-off-by: Philipp Stanner 
---
 drivers/pci/devres.c | 77 +---
 1 file changed, 58 insertions(+), 19 deletions(-)

diff --git a/drivers/pci/devres.c b/drivers/pci/devres.c
index f13edd4a3873..5fc35a947b58 100644
--- a/drivers/pci/devres.c
+++ b/drivers/pci/devres.c
@@ -297,6 +297,52 @@ void __iomem * const *pcim_iomap_table(struct pci_dev 
*pdev)
 }
 EXPORT_SYMBOL(pcim_iomap_table);
 
+/*
+ * Fill the legacy mapping-table, so that drivers using the old API
+ * can still get a BAR's mapping address through pcim_iomap_table().
+ */
+static int pcim_add_mapping_to_legacy_table(struct pci_dev *pdev,
+void __iomem *mapping, short bar)
+{
+   void __iomem **legacy_iomap_table;
+
+   if (bar >= PCI_STD_NUM_BARS)
+   return -EINVAL;
+
+   legacy_iomap_table = (void __iomem **)pcim_iomap_table(pdev);
+   if (!legacy_iomap_table)
+   return -ENOMEM;
+
+   /* The legacy mechanism doesn't allow for duplicate mappings. */
+   WARN_ON(legacy_iomap_table[bar]);
+
+   legacy_iomap_table[bar] = mapping;
+
+   return 0;
+}
+
+/*
+ * Removes a mapping. The table only contains whole-bar-mappings, so this will
+ * never interfere with ranged mappings.
+ */
+static void pcim_remove_mapping_from_legacy_table(struct pci_dev *pdev,
+   void __iomem *addr)
+{
+   short bar;
+   void __iomem **legacy_iomap_table;
+
+   legacy_iomap_table = (void __iomem **)pcim_iomap_table(pdev);
+   if (!legacy_iomap_table)
+   return;
+
+   for (bar = 0; bar < PCI_STD_NUM_BARS; bar++) {
+   if (legacy_iomap_table[bar] == addr) {
+   legacy_iomap_table[bar] = NULL;
+   return;
+   }
+   }
+}
+
 /**
  * pcim_iomap - Managed pcim_iomap()
  * @pdev: PCI device to iomap for
@@ -308,16 +354,20 @@ EXPORT_SYMBOL(pcim_iomap_table);
  */
 void __iomem *pcim_iomap(struct pci_dev *pdev, int bar, unsigned long maxlen)
 {
-   void __iomem **tbl;
+   void __iomem *mapping;
 
-   BUG_ON(bar >= PCIM_IOMAP_MAX);
-
-   tbl = (void __iomem **)pcim_iomap_table(pdev);
-   if (!tbl || tbl[bar])   /* duplicate mappings not allowed */
+   mapping = pci_iomap(pdev, bar, maxlen);
+   if (!mapping)
return NULL;
 
-   tbl[bar] = pci_iomap(pdev, bar, maxlen);
-   return tbl[bar];
+   if (pcim_add_mapping_to_legacy_table(pdev, mapping, bar) != 0)
+   goto err_table;
+
+   return mapping;
+
+err_table:
+   pci_iounmap(pdev, mapping);
+   return NULL;
 }
 EXPORT_SYMBOL(pcim_iomap);
 
@@ -330,20 +380,9 @@ EXPORT_SYMBOL(pcim_iomap);
  */
 void pcim_iounmap(struct pci_dev *pdev, void __iomem *addr)
 {
-   void __iomem **tbl;
-   int i;
-
pci_iounmap(pdev, addr);
 
-   tbl = (void __iomem **)pcim_iomap_table(pdev);
-   BUG_ON(!tbl);
-
-   for (i = 0; i < PCIM_IOMAP_MAX; i++)
-   if (tbl[i] == addr) {
-   tbl[i] = NULL;
-   return;
-   }
-   WARN_ON(1);
+   pcim_remove_mapping_from_legacy_table(pdev, addr);
 }
 EXPORT_SYMBOL(pcim_iounmap);
 
-- 
2.45.0



[PATCH v7 11/13] PCI: Remove legacy pcim_release()

2024-06-05 Thread Philipp Stanner
Thanks to preceding cleanup steps, pcim_release() is no longer needed
and can be replaced by pcim_disable_device(), which is the exact
counterpart to pcim_enable_device().

This permits removing further parts of the old PCI devres implementation.

Replace pcim_release() with pcim_disable_device().
Remove the now surplus function get_pci_dr().
Remove the struct pci_devres from pci.h.
Remove the now surplus function find_pci_dr().

Signed-off-by: Philipp Stanner 
---
 drivers/pci/devres.c | 53 +---
 drivers/pci/pci.h| 18 ---
 2 files changed, 25 insertions(+), 46 deletions(-)

diff --git a/drivers/pci/devres.c b/drivers/pci/devres.c
index 9a997de280df..271ffd1aaf47 100644
--- a/drivers/pci/devres.c
+++ b/drivers/pci/devres.c
@@ -460,48 +460,45 @@ int pcim_intx(struct pci_dev *pdev, int enable)
return 0;
 }
 
-static void pcim_release(struct device *gendev, void *res)
+static void pcim_disable_device(void *pdev_raw)
 {
-   struct pci_dev *dev = to_pci_dev(gendev);
-
-   if (!dev->pinned)
-   pci_disable_device(dev);
-}
-
-static struct pci_devres *get_pci_dr(struct pci_dev *pdev)
-{
-   struct pci_devres *dr, *new_dr;
-
-   dr = devres_find(&pdev->dev, pcim_release, NULL, NULL);
-   if (dr)
-   return dr;
+   struct pci_dev *pdev = pdev_raw;
 
-   new_dr = devres_alloc(pcim_release, sizeof(*new_dr), GFP_KERNEL);
-   if (!new_dr)
-   return NULL;
-   return devres_get(&pdev->dev, new_dr, NULL, NULL);
+   if (!pdev->pinned)
+   pci_disable_device(pdev);
 }
 
 /**
  * pcim_enable_device - Managed pci_enable_device()
  * @pdev: PCI device to be initialized
  *
- * Managed pci_enable_device().
+ * Returns: 0 on success, negative error code on failure.
+ *
+ * Managed pci_enable_device(). Device will automatically be disabled on
+ * driver detach.
  */
 int pcim_enable_device(struct pci_dev *pdev)
 {
-   struct pci_devres *dr;
-   int rc;
+   int ret;
 
-   dr = get_pci_dr(pdev);
-   if (unlikely(!dr))
-   return -ENOMEM;
+   ret = devm_add_action(&pdev->dev, pcim_disable_device, pdev);
+   if (ret != 0)
+   return ret;
 
-   rc = pci_enable_device(pdev);
-   if (!rc)
-   pdev->is_managed = 1;
+   /*
+* We prefer removing the action in case of an error over
+* devm_add_action_or_reset() because the latter could theoretically be
+* disturbed by users having pinned the device too soon.
+*/
+   ret = pci_enable_device(pdev);
+   if (ret != 0) {
+   devm_remove_action(&pdev->dev, pcim_disable_device, pdev);
+   return ret;
+   }
 
-   return rc;
+   pdev->is_managed = true;
+
+   return ret;
 }
 EXPORT_SYMBOL(pcim_enable_device);
 
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 3aa57cd8b3e5..6a9c4dd77d68 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -812,24 +812,6 @@ static inline pci_power_t mid_pci_get_power_state(struct 
pci_dev *pdev)
 }
 #endif
 
-/*
- * Managed PCI resources.  This manages device on/off, INTx/MSI/MSI-X
- * on/off and BAR regions.  pci_dev itself records MSI/MSI-X status, so
- * there's no need to track it separately.  pci_devres is initialized
- * when a device is enabled using managed PCI device enable interface.
- *
- * TODO: Struct pci_devres and find_pci_dr() only need to be here because
- * they're used in pci.c.  Port or move these functions to devres.c and
- * then remove them from here.
- */
-struct pci_devres {
-   /*
-* TODO:
-* This struct is now surplus. Remove it by refactoring pci/devres.c
-*/
-};
-
-struct pci_devres *find_pci_dr(struct pci_dev *pdev);
 int pcim_intx(struct pci_dev *dev, int enable);
 
 int pcim_request_region(struct pci_dev *pdev, int bar, const char *name);
-- 
2.45.0


