Re: [PATCH v3 4/4] drm/v3d: add multiple syncobjs support
On 10/01, Iago Toral wrote:
> On Fri, 2021-10-01 at 09:37 +0100, Melissa Wen wrote:
> > On 10/01, Iago Toral wrote:
> > > (...)
> >
> > (...)
> >
> > So, the refcount pairs seem consistent, I mean, get and put. And about
> > drm_sched_job_cleanup, it is explicitly called when job_init fails or
> > after that by the scheduler.
>
> Ah ok, thanks for explaining this!

nice!

> With the consistency issue discussed above fixed, for the series:
>
> Reviewed-by: Iago Toral Quiroga

ok, thanks for reviewing and all the improvement suggestions,

Melissa

> Iago
Re: [PATCH v3 4/4] drm/v3d: add multiple syncobjs support
On Fri, 2021-10-01 at 09:37 +0100, Melissa Wen wrote:
> On 10/01, Iago Toral wrote:
> > (...)
> >
> > ...we should also check for NULL here for consistency.
>
> yes, consistency makes sense here.
>
> > Also, I think there is another problem in the code: we always call
> > v3d_job_cleanup for failed jobs, but only call v3d_job_put for
> > successful jobs. However, reading the docs for drm_sched_job_init:
> >
> > "Drivers must make sure drm_sched_job_cleanup() if this function
> > returns successfully, even when @job is aborted before
> > drm_sched_job_arm() is called."
> >
> > So my understanding is that we should call v3d_job_cleanup instead of
> > v3d_job_put for successful jobs or we would be leaking sched resources
> > on every successful job, no?
>
> When job_init is successful, v3d_job_cleanup is called by the scheduler
> when the job is completed. drm_sched_job_cleanup describes how it works
> after drm_sched_job_arm:
>
> "After that point of no return @job is committed to be executed by the
> scheduler, and this function should be called from the
> &drm_sched_backend_ops.free_job callback."
>
> On v3d_sched.c, .free_job points to v3d_sched_job_free(), which in turn
> calls v3d_job_cleanup() (and then drm_sched_job_cleanup). So, it looks
> ok.
>
> Also, we can say that the very last v3d_job_put() is in charge of
> decrementing the refcount initialized (set to 1) in v3d_job_init(), while
> the v3d_job_cleanup() from the v3d_sched_job_free() callback decrements
> the refcount that was incremented when v3d_job_push() pushed the job to
> the scheduler.
>
> So, the refcount pairs seem consistent, I mean, get and put. And about
> drm_sched_job_cleanup, it is explicitly called when job_init fails or
> after that by the scheduler.

Ah ok, thanks for explaining this!

With the consistency issue discussed above fixed, for the series:

Reviewed-by: Iago Toral Quiroga

Iago
Re: [PATCH v3 4/4] drm/v3d: add multiple syncobjs support
On 10/01, Iago Toral wrote:
> On Thu, 2021-09-30 at 17:19 +0100, Melissa Wen wrote:
> > (...)
> >
> > @@ -516,9 +536,11 @@ v3d_attach_fences_and_unlock_reservation(struct drm_file *file_priv,
> >                                          struct v3d_job *job,
> >                                          struct ww_acquire_ctx *acquire_ctx,
> >                                          u32 out_sync,
> > +                                        struct v3d_submit_ext *se,
> >                                          struct dma_fence *done_fence)
> > {
> >         struct drm_syncobj *sync_out;
> > +       bool has_multisync = se && (se->flags & DRM_V3D_EXT_ID_MULTI_SYNC);
>
> We always pass the 'se' pointer from a local variable allocated in the
> stack of the caller so it is never going to be NULL.
>
> I am happy to keep the NULL check if you want to protect against future
> changes just in case, but if we do that then...
>
> > +static void
> > +v3d_put_multisync_post_deps(struct v3d_submit_ext *se)
> > +{
> > +       unsigned int i;
> > +
> > +       if (!se->out_sync_count)
>
> ...we should also check for NULL here for consistency.

yes, consistency makes sense here.

> Also, I think there is another problem in the code: we always call
> v3d_job_cleanup for failed jobs, but only call v3d_job_put for
> successful jobs. However, reading the docs for drm_sched_job_init:
>
> "Drivers must make sure drm_sched_job_cleanup() if this function
> returns successfully, even when @job is aborted before
> drm_sched_job_arm() is called."
>
> So my understanding is that we should call v3d_job_cleanup instead of
> v3d_job_put for successful jobs or we would be leaking sched resources
> on every successful job, no?

When job_init is successful, v3d_job_cleanup is called by the scheduler
when the job is completed. drm_sched_job_cleanup describes how it works
after drm_sched_job_arm:

"After that point of no return @job is committed to be executed by the
scheduler, and this function should be called from the
&drm_sched_backend_ops.free_job callback."

On v3d_sched.c, .free_job points to v3d_sched_job_free(), which in turn
calls v3d_job_cleanup() (and then drm_sched_job_cleanup). So, it looks
ok.

Also, we can say that the very last v3d_job_put() is in charge of
decrementing the refcount initialized (set to 1) in v3d_job_init(), while
the v3d_job_cleanup() from the v3d_sched_job_free() callback decrements
the refcount that was incremented when v3d_job_push() pushed the job to
the scheduler.

So, the refcount pairs seem consistent, I mean, get and put. And about
drm_sched_job_cleanup, it is explicitly called when job_init fails or
after that by the scheduler.

Thanks,

Melissa

> Iago
>
> (...)
Re: [PATCH v3 4/4] drm/v3d: add multiple syncobjs support
On Thu, 2021-09-30 at 17:19 +0100, Melissa Wen wrote:
> (...)
>
> @@ -516,9 +536,11 @@ v3d_attach_fences_and_unlock_reservation(struct drm_file *file_priv,
>                                          struct v3d_job *job,
>                                          struct ww_acquire_ctx *acquire_ctx,
>                                          u32 out_sync,
> +                                        struct v3d_submit_ext *se,
>                                          struct dma_fence *done_fence)
> {
>         struct drm_syncobj *sync_out;
> +       bool has_multisync = se && (se->flags & DRM_V3D_EXT_ID_MULTI_SYNC);

We always pass the 'se' pointer from a local variable allocated in the
stack of the caller so it is never going to be NULL.

I am happy to keep the NULL check if you want to protect against future
changes just in case, but if we do that then...

>         int i;
>
>         for (i = 0; i < job->bo_count; i++) {

(...)

> +static void
> +v3d_put_multisync_post_deps(struct v3d_submit_ext *se)
> +{
> +       unsigned int i;
> +
> +       if (!se->out_sync_count)

...we should also check for NULL here for consistency.

Also, I think there is another problem in the code: we always call
v3d_job_cleanup for failed jobs, but only call v3d_job_put for
successful jobs. However, reading the docs for drm_sched_job_init:

"Drivers must make sure drm_sched_job_cleanup() if this function
returns successfully, even when @job is aborted before
drm_sched_job_arm() is called."

So my understanding is that we should call v3d_job_cleanup instead of
v3d_job_put for successful jobs or we would be leaking sched resources
on every successful job, no?

Iago

> +               return;
> +
> +       for (i = 0; i < se->out_sync_count; i++)
> +               drm_syncobj_put(se->out_syncs[i].syncobj);
> +       kvfree(se->out_syncs);
> +}
> +
> +static int
> +v3d_get_multisync_post_deps(struct drm_file *file_priv,
> +                           struct v3d_submit_ext *se,
> +                           u32 count, u64 handles)
> +{
> +       struct drm_v3d_sem __user *post_deps;
> +       int i, ret;
> +
> +       if (!count)
> +               return 0;
> +
> +       se->out_syncs = (struct v3d_submit_outsync *)
> +                       kvmalloc_array(count,
> +                                      sizeof(struct v3d_submit_outsync),
> +                                      GFP_KERNEL);
> +       if (!se->out_syncs)
> +               return -ENOMEM;
> +
> +       post_deps = u64_to_user_ptr(handles);
> +
> +       for (i = 0; i < count; i++) {
> +               struct drm_v3d_sem out;
> +
> +               ret = copy_from_user(&out, post_deps++, sizeof(out));
> +               if (ret) {
> +                       DRM_DEBUG("Failed to copy post dep handles\n");
> +                       goto fail;
> +               }
> +
> +               se->out_syncs[i].syncobj = drm_syncobj_find(file_priv, out.handle);
> +               if (!se->out_syncs[i].syncobj) {
> +                       ret = -EINVAL;
> +                       goto fail;
> +               }
> +       }
> +       se->out_sync_count = count;
> +
> +       return 0;
> +
> +fail:
> +       for (i--; i >= 0; i--)
> +               drm_syncobj_put(se->out_syncs[i].syncobj);
> +       kvfree(se->out_syncs);
> +
> +       return ret;
> +}
> +
> +/* Get data for multiple binary semaphores synchronization. Parse syncobj
> + * to be signaled when job completes (out_sync).
> + */
> +static int
> +v3d_get_multisync_submit_deps(struct drm_file *file_priv,
> +                             struct drm_v3d_extension __user *ext,
> +                             void *data)
> +{
> +       struct drm_v3d_multi_sync multisync;
> +       struct v3d_submit_ext *se = data;
> +       int ret;
> +
> +       ret = copy_from_user(&multisync, ext, sizeof(multisync));
> +       if (ret)
> +               return ret;
> (...)
[PATCH v3 4/4] drm/v3d: add multiple syncobjs support
Using the generic extension from the previous patch, a specific multisync
extension enables more than one in/out binary syncobj per job submission.
Arrays of syncobjs are set in struct drm_v3d_multisync, which also takes
care of determining the stage for sync (wait deps) according to the job
queue.

v2:
- subclass the generic extension struct (Daniel)
- simplify adding dependency conditions to make them understandable (Iago)

v3:
- fix conditions to consider single or multiple in/out_syncs (Iago)
- remove irrelevant comment (Iago)

Signed-off-by: Melissa Wen
---
 drivers/gpu/drm/v3d/v3d_drv.c |   6 +-
 drivers/gpu/drm/v3d/v3d_drv.h |  24 ++--
 drivers/gpu/drm/v3d/v3d_gem.c | 185 +++++++++++++++++++++++++++++++---
 include/uapi/drm/v3d_drm.h    |  49 ++++++++-
 4 files changed, 232 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
index 3d6b9bcce2f7..bd46396a1ae0 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.c
+++ b/drivers/gpu/drm/v3d/v3d_drv.c
@@ -96,6 +96,9 @@ static int v3d_get_param_ioctl(struct drm_device *dev, void *data,
 	case DRM_V3D_PARAM_SUPPORTS_PERFMON:
 		args->value = (v3d->ver >= 40);
 		return 0;
+	case DRM_V3D_PARAM_SUPPORTS_MULTISYNC_EXT:
+		args->value = 1;
+		return 0;
 	default:
 		DRM_DEBUG("Unknown parameter %d\n", args->param);
 		return -EINVAL;
@@ -135,9 +138,8 @@ v3d_postclose(struct drm_device *dev, struct drm_file *file)
 	struct v3d_file_priv *v3d_priv = file->driver_priv;
 	enum v3d_queue q;
 
-	for (q = 0; q < V3D_MAX_QUEUES; q++) {
+	for (q = 0; q < V3D_MAX_QUEUES; q++)
 		drm_sched_entity_destroy(&v3d_priv->sched_entity[q]);
-	}
 
 	v3d_perfmon_close_file(v3d_priv);
 	kfree(v3d_priv);
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index b900a050d5e2..b74b1351bfc8 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -19,15 +19,6 @@ struct reset_control;
 
 #define GMP_GRANULARITY (128 * 1024)
 
-/* Enum for each of the V3D queues.
- */
-enum v3d_queue {
-	V3D_BIN,
-	V3D_RENDER,
-	V3D_TFU,
-	V3D_CSD,
-	V3D_CACHE_CLEAN,
-};
-
 #define V3D_MAX_QUEUES (V3D_CACHE_CLEAN + 1)
 
 struct v3d_queue_state {
@@ -294,6 +285,21 @@ struct v3d_csd_job {
 	struct drm_v3d_submit_csd args;
 };
 
+struct v3d_submit_outsync {
+	struct drm_syncobj *syncobj;
+};
+
+struct v3d_submit_ext {
+	u32 flags;
+	u32 wait_stage;
+
+	u32 in_sync_count;
+	u64 in_syncs;
+
+	u32 out_sync_count;
+	struct v3d_submit_outsync *out_syncs;
+};
+
 /**
  * __wait_for - magic wait macro
  *
diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
index 93f130fb3a13..d344eae36b57 100644
--- a/drivers/gpu/drm/v3d/v3d_gem.c
+++ b/drivers/gpu/drm/v3d/v3d_gem.c
@@ -454,11 +454,12 @@ v3d_job_add_deps(struct drm_file *file_priv, struct v3d_job *job,
 static int
 v3d_job_init(struct v3d_dev *v3d, struct drm_file *file_priv,
 	     void **container, size_t size, void (*free)(struct kref *ref),
-	     u32 in_sync, enum v3d_queue queue)
+	     u32 in_sync, struct v3d_submit_ext *se, enum v3d_queue queue)
 {
 	struct v3d_file_priv *v3d_priv = file_priv->driver_priv;
 	struct v3d_job *job;
-	int ret;
+	bool has_multisync = se && (se->flags & DRM_V3D_EXT_ID_MULTI_SYNC);
+	int ret, i;
 
 	*container = kcalloc(1, size, GFP_KERNEL);
 	if (!*container) {
@@ -479,9 +480,28 @@ v3d_job_init(struct v3d_dev *v3d, struct drm_file *file_priv,
 	if (ret)
 		goto fail_job;
 
-	ret = v3d_job_add_deps(file_priv, job, in_sync, 0);
-	if (ret)
-		goto fail_deps;
+	if (has_multisync) {
+		if (se->in_sync_count && se->wait_stage == queue) {
+			struct drm_v3d_sem __user *handle = u64_to_user_ptr(se->in_syncs);
+
+			for (i = 0; i < se->in_sync_count; i++) {
+				struct drm_v3d_sem in;
+
+				ret = copy_from_user(&in, handle++, sizeof(in));
+				if (ret) {
+					DRM_DEBUG("Failed to copy wait dep handle.\n");
+					goto fail_deps;
+				}
+				ret = v3d_job_add_deps(file_priv, job, in.handle, 0);
+				if (ret)
+					goto fail_deps;
+			}
+		}
+	} else {
+		ret = v3d_job_add_deps(file_priv, job, in_sync, 0);
+		if (ret)
+			goto fail_deps;
+	}
 
 	kref_init(&job->refcount);
 
@@ -516,9 +536,11 @@ v3d_attach_fences_and_unlock_reservation(struct drm_file *file_priv,