On Sat, 30 Aug 2025 10:12:32 +0200 Daniel Stone <dan...@fooishbar.org> wrote:
> Hi Adrian,
>
> On Thu, 28 Aug 2025 at 04:35, Adrián Larumbe
> <adrian.laru...@collabora.com> wrote:
> > -void panfrost_job_close(struct panfrost_file_priv *panfrost_priv)
> > +int panfrost_jm_ctx_destroy(struct drm_file *file, u32 handle)
> >  {
> > -       struct panfrost_device *pfdev = panfrost_priv->pfdev;
> > -       int i;
> > +       struct panfrost_file_priv *priv = file->driver_priv;
> > +       struct panfrost_device *pfdev = priv->pfdev;
> > +       struct panfrost_jm_ctx *jm_ctx;
> >
> > -       for (i = 0; i < NUM_JOB_SLOTS; i++)
> > -               drm_sched_entity_destroy(&panfrost_priv->sched_entity[i]);
> > +       jm_ctx = xa_erase(&priv->jm_ctxs, handle);
> > +       if (!jm_ctx)
> > +               return -EINVAL;
> > +
> > +       for (u32 i = 0; i < ARRAY_SIZE(jm_ctx->slots); i++) {
> > +               if (jm_ctx->slots[i].enabled)
> > +                       drm_sched_entity_destroy(&jm_ctx->slots[i].sched_entity);
> > +       }
> >
> >         /* Kill in-flight jobs */
> >         spin_lock(&pfdev->js->job_lock);
> > -       for (i = 0; i < NUM_JOB_SLOTS; i++) {
> > -               struct drm_sched_entity *entity = &panfrost_priv->sched_entity[i];
> > -               int j;
> > +       for (u32 i = 0; i < ARRAY_SIZE(jm_ctx->slots); i++) {
> > +               struct drm_sched_entity *entity = &jm_ctx->slots[i].sched_entity;
> > +
> > +               if (!jm_ctx->slots[i].enabled)
> > +                       continue;
> >
> > -               for (j = ARRAY_SIZE(pfdev->jobs[0]) - 1; j >= 0; j--) {
> > +               for (int j = ARRAY_SIZE(pfdev->jobs[0]) - 1; j >= 0; j--) {
> >                         struct panfrost_job *job = pfdev->jobs[i][j];
> >                         u32 cmd;
> >
> > @@ -980,18 +1161,7 @@ void panfrost_job_close(struct panfrost_file_priv *panfrost_priv)
> >                 }
> >         }
> >         spin_unlock(&pfdev->js->job_lock);
> > -}
> > -
> > -int panfrost_job_is_idle(struct panfrost_device *pfdev)
> > -{
> > -       struct panfrost_job_slot *js = pfdev->js;
> > -       int i;
> > -
> > -       for (i = 0; i < NUM_JOB_SLOTS; i++) {
> > -               /* If there are any jobs in the HW queue, we're not idle */
> > -               if (atomic_read(&js->queue[i].sched.credit_count))
> > -                       return false;
> > -       }
> >
> > -       return true;
> > +       panfrost_jm_ctx_put(jm_ctx);
> > +       return 0;
> >  }
>
> It seems odd that both panfrost_jm_ctx_destroy() and
> panfrost_jm_ctx_release() share lifetime responsibilities. I'd expect
> calling panfrost_jm_ctx_destroy() to just release the xarray handle
> and drop the refcount.

I guess you are referring to the drm_sched_entity_destroy() calls. If
so, I agree that they should be removed from panfrost_jm_ctx_release(),
because panfrost_jm_ctx_destroy() should always be called for the JM
ctx refcount to drop to zero.

> I can see why calling panfrost_jm_ctx_destroy() is the one to go try
> to cancel the jobs - because the jobs keep a refcount on the context,
> so we need to break that cycle somehow. But having both the
> handle-release and object-release function drop a ref on the sched
> entity seems odd?

Note that drm_sched_entity_destroy() doesn't really drop a ref; it just
flushes/cancels the queued jobs and makes sure the entity is no longer
considered by the scheduler. After the first drm_sched_entity_destroy()
call (in jm_ctx_destroy()), I'd expect entity->rq to be NULL, making the
subsequent call to drm_sched_entity_destroy() (in jm_ctx_release()) a
NOP (both drm_sched_entity_{flush,fini}() bail out early if entity->rq
is NULL). Now, there might be other things in drm_sched_entity that are
not safe to clean up twice, and I agree that drm_sched_entity_destroy()
shouldn't be called in both places anyway.
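
To put that first point in code, here's roughly what I'd expect
panfrost_jm_ctx_release() to be reduced to once the entity teardown
only lives in panfrost_jm_ctx_destroy(). The body and the field names
(refcnt, etc.) are guesses on my side, not code from this series:

        /* Called when the last reference goes away. By then
         * panfrost_jm_ctx_destroy() has already run (it's the only path
         * that lets the refcount reach zero), so the sched entities have
         * already been destroyed and all that's left to do is free the
         * context.
         */
        static void panfrost_jm_ctx_release(struct kref *kref)
        {
                struct panfrost_jm_ctx *jm_ctx =
                        container_of(kref, struct panfrost_jm_ctx, refcnt);

                kfree(jm_ctx);
        }

panfrost_jm_ctx_destroy() would keep the drm_sched_entity_destroy()
loop, the in-flight job killing and the final panfrost_jm_ctx_put(),
as in the hunk above.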
>
> It doesn't help much that panfrost_job is used both for actual jobs
> (as the type) and the capability for a device to have multiple
> job-manager contexts (as a function prefix). Would be great to clean
> that up, so you don't have to think about whether e.g.
> panfrost_job_close() is actually operating on a panfrost_job, or
> operating on multiple panfrost_jm_ctx which operate on multiple
> panfrost_job.

Yep, we should definitely change the prefix to panfrost_jm_ when the
function manipulates the JM scheduler context rather than an
individual job.
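
Just to illustrate the split (hypothetical names, not something that's
in this series yet):

        /* Drives the Job Manager / scheduler context -> jm_ prefix */
        int panfrost_jm_init(struct panfrost_device *pfdev);   /* was panfrost_job_init() */
        void panfrost_jm_fini(struct panfrost_device *pfdev);  /* was panfrost_job_fini() */

        /* Acts on a single job -> keeps the job_ prefix, unchanged */
        int panfrost_job_push(struct panfrost_job *job);
        void panfrost_job_put(struct panfrost_job *job);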