Re: [Intel-gfx] [RFC PATCH 2/5] drm/doc/rfc: i915 new parallel submission uAPI plan

2021-05-11 Thread Matthew Brost
On Tue, May 11, 2021 at 04:49:58PM +0200, Daniel Vetter wrote:
> On Thu, May 06, 2021 at 10:30:46AM -0700, Matthew Brost wrote:
> > Add entry for i915 new parallel submission uAPI plan.
> > 
> > Cc: Tvrtko Ursulin 
> > Cc: Tony Ye 
> > CC: Carl Zhang 
> > Cc: Daniel Vetter 
> > Cc: Jason Ekstrand 
> > Signed-off-by: Matthew Brost 
> > ---
> >  Documentation/gpu/rfc/i915_scheduler.rst | 56 +++-
> >  1 file changed, 54 insertions(+), 2 deletions(-)
> > 
> > diff --git a/Documentation/gpu/rfc/i915_scheduler.rst b/Documentation/gpu/rfc/i915_scheduler.rst
> > index fa6780a11c86..e3455b33edfe 100644
> > --- a/Documentation/gpu/rfc/i915_scheduler.rst
> > +++ b/Documentation/gpu/rfc/i915_scheduler.rst
> > @@ -13,7 +13,8 @@ i915 with the DRM scheduler is:
> >   modparam enable_guc
> > * Lots of rework will need to be done to integrate with DRM scheduler so
> >   no need to nit pick everything in the code, it just should be
> > - functional and not regress execlists
> > + functional, no major coding style / layering errors, and not regress
> > + execlists
> 
> I guess this hunk should be in the previous patch?
> 

Yep, noticed this after sending.

> > * Update IGTs / selftests as needed to work with GuC submission
> > * Enable CI on supported platforms for a baseline
> > * Rework / get CI heathly for GuC submission in place as needed
> > @@ -67,4 +68,55 @@ levels too.
> >  
> >  New parallel submission uAPI
> >  
> > -Details to come in a following patch.
> > +The existing bonding uAPI is completely broken with GuC submission because
> > +whether a submission is a single context submit or parallel submit isn't known
> > +until execbuf time activated via the I915_SUBMIT_FENCE. To submit multiple
> > +contexts in parallel with the GuC the context must be explicitly registered with
> > +N contexts and all N contexts must be submitted in a single command to the GuC.
> > +This interface doesn't support dynamically changing between N contexts as the
> > +bonding uAPI does. Hence the need for a new parallel submission interface. Also
> > +the legacy bonding uAPI is quite confusing and not intuitive at all.
> 
> I think you should sit together with Jason on irc or so for a bit and get
> an earful of how it's all broken irrespective of GuC submission or not.
> Just to hammer in our case :-)
>

Sounds like a fun conversation, will do.
 
> > +
> > +The new parallel submission uAPI consists of 3 parts:
> > +
> > +* Export engines logical mapping
> > +* A 'set_parallel' extension to configure contexts for parallel
> > +  submission
> > +* Extend execbuf2 IOCTL to support submitting N BBs in a single IOCTL
> > +
> > +Export engines logical mapping
> > +--
> > +Certain use cases require BBs to be placed on engine instances in logical order
> > +(e.g. split-frame on gen11+). The logical mapping of engine instances can change
> > +based on fusing. Rather than making UMDs be aware of fusing, simply expose the
> > +logical mapping with the existing query engine info IOCTL. Also the GuC
> > +submission interface currently only supports submitting multiple contexts to
> > +engines in logical order.
> 
> Maybe highlight more that this is a new restriction with GuC compared to
> execlist, which is why we need to expose this information to userspace.
> Also on the platforms thus far supported in upstream there's at most 2
> engines of the same type, so really not an issue.
>

Sure. This is a limitation of the GuC interface + really isn't needed unless we
have more than 2 engines of the same type.
 
> > +
> > +A single bit will be added to drm_i915_engine_info.flags indicating that the
> > +logical instance has been returned and a new field,
> > +drm_i915_engine_info.logical_instance, returns the logical instance.
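
For illustration, here is a minimal userspace-side sketch of how a UMD might consume the proposed flag bit and field to place BBs in logical order. It is a Python model, not the real uAPI: the flag name HAS_LOGICAL_INSTANCE, its bit value, the EngineInfo class and the helper function are all assumptions, since the RFC only says "a single bit" and names the new logical_instance field.

```python
from dataclasses import dataclass

# Hypothetical flag bit: the RFC only promises "a single bit" in
# drm_i915_engine_info.flags, so both name and value are assumptions.
HAS_LOGICAL_INSTANCE = 1 << 0


@dataclass(frozen=True)
class EngineInfo:
    """Illustrative model of one entry from the engine info query."""
    engine_class: int
    engine_instance: int       # physical instance
    flags: int = 0
    logical_instance: int = 0  # only meaningful if the flag bit is set


def engines_in_logical_order(engines, engine_class):
    """Return the engines of one class sorted by logical instance.

    Raises ValueError if an engine of that class does not report a
    logical instance, because the field cannot be trusted in that case.
    """
    selected = [e for e in engines if e.engine_class == engine_class]
    for e in selected:
        if not e.flags & HAS_LOGICAL_INSTANCE:
            raise ValueError(
                f"engine {e.engine_class}:{e.engine_instance} "
                "does not report a logical instance"
            )
    return sorted(selected, key=lambda e: e.logical_instance)
```

With fusing, physical instances 0 and 2 of a class could map to logical instances 0 and 1; the helper returns them in that logical order without the UMD knowing anything about the fusing itself.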
> > +
> > +A 'set_parallel' extension to configure contexts for parallel submission
> > +
> > +The 'set_parallel' extension configures N contexts for parallel submission. It
> > +is a setup step that should be called before using any of the contexts. See
> > +I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE or I915_CONTEXT_ENGINES_EXT_BOND for
> > +similar existing examples. Once the N contexts are configured for parallel
> > +submission the execbuf2 IOCTL can be called submitting 1-N BBs in a single IOCTL.
> > +Although submitting fewer than N BBs is allowed it is not recommended as that
> > +will likely leave parts of the hardware reserved and idle. Initially only
> > +support GuC submission. Execlist support can be added later if needed.
> 
> Can we just require that you always submit N batchbuffers, or does this
> create a problem for userspace? Allowing things just because is generally
> not a good idea with uapi, it's better to limit and then allow when
> there's a need.
>

Yes, we can 

Re: [Intel-gfx] [RFC PATCH 2/5] drm/doc/rfc: i915 new parallel submission uAPI plan

2021-05-11 Thread Daniel Vetter
On Thu, May 06, 2021 at 10:30:46AM -0700, Matthew Brost wrote:
> Add entry for i915 new parallel submission uAPI plan.
> 
> Cc: Tvrtko Ursulin 
> Cc: Tony Ye 
> CC: Carl Zhang 
> Cc: Daniel Vetter 
> Cc: Jason Ekstrand 
> Signed-off-by: Matthew Brost 
> ---
>  Documentation/gpu/rfc/i915_scheduler.rst | 56 +++-
>  1 file changed, 54 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/gpu/rfc/i915_scheduler.rst b/Documentation/gpu/rfc/i915_scheduler.rst
> index fa6780a11c86..e3455b33edfe 100644
> --- a/Documentation/gpu/rfc/i915_scheduler.rst
> +++ b/Documentation/gpu/rfc/i915_scheduler.rst
> @@ -13,7 +13,8 @@ i915 with the DRM scheduler is:
>     modparam enable_guc
>   * Lots of rework will need to be done to integrate with DRM scheduler so
>     no need to nit pick everything in the code, it just should be
> -   functional and not regress execlists
> +   functional, no major coding style / layering errors, and not regress
> +   execlists

I guess this hunk should be in the previous patch?

>   * Update IGTs / selftests as needed to work with GuC submission
>   * Enable CI on supported platforms for a baseline
>   * Rework / get CI heathly for GuC submission in place as needed
> @@ -67,4 +68,55 @@ levels too.
>  
>  New parallel submission uAPI
>  
> -Details to come in a following patch.
> +The existing bonding uAPI is completely broken with GuC submission because
> +whether a submission is a single context submit or parallel submit isn't known
> +until execbuf time activated via the I915_SUBMIT_FENCE. To submit multiple
> +contexts in parallel with the GuC the context must be explicitly registered with
> +N contexts and all N contexts must be submitted in a single command to the GuC.
> +This interface doesn't support dynamically changing between N contexts as the
> +bonding uAPI does. Hence the need for a new parallel submission interface. Also
> +the legacy bonding uAPI is quite confusing and not intuitive at all.

I think you should sit together with Jason on irc or so for a bit and get
an earful of how it's all broken irrespective of GuC submission or not.
Just to hammer in our case :-)

> +
> +The new parallel submission uAPI consists of 3 parts:
> +
> +* Export engines logical mapping
> +* A 'set_parallel' extension to configure contexts for parallel
> +  submission
> +* Extend execbuf2 IOCTL to support submitting N BBs in a single IOCTL
> +
> +Export engines logical mapping
> +--
> +Certain use cases require BBs to be placed on engine instances in logical order
> +(e.g. split-frame on gen11+). The logical mapping of engine instances can change
> +based on fusing. Rather than making UMDs be aware of fusing, simply expose the
> +logical mapping with the existing query engine info IOCTL. Also the GuC
> +submission interface currently only supports submitting multiple contexts to
> +engines in logical order.

Maybe highlight more that this is a new restriction with GuC compared to
execlist, which is why we need to expose this information to userspace.
Also on the platforms thus far supported in upstream there's at most 2
engines of the same type, so really not an issue.

> +
> +A single bit will be added to drm_i915_engine_info.flags indicating that the
> +logical instance has been returned and a new field,
> +drm_i915_engine_info.logical_instance, returns the logical instance.
> +
> +A 'set_parallel' extension to configure contexts for parallel submission
> +
> +The 'set_parallel' extension configures N contexts for parallel submission. It
> +is a setup step that should be called before using any of the contexts. See
> +I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE or I915_CONTEXT_ENGINES_EXT_BOND for
> +similar existing examples. Once the N contexts are configured for parallel
> +submission the execbuf2 IOCTL can be called submitting 1-N BBs in a single IOCTL.
> +Although submitting fewer than N BBs is allowed it is not recommended as that
> +will likely leave parts of the hardware reserved and idle. Initially only
> +support GuC submission. Execlist support can be added later if needed.

Can we just require that you always submit N batchbuffers, or does this
create a problem for userspace? Allowing things just because is generally
not a good idea with uapi, it's better to limit and then allow when
there's a need.

Ofc if we already have a need then explain why and that's all fine.

Also detailed comments on the kerneldoc I'll do in the next patches.
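
To make the stricter rule concrete, here is a rough sketch of "always submit exactly N batchbuffers" as a validation step. It is only a Python model of the idea, not kernel code: the ParallelEngine class, its width attribute and the submit() method are hypothetical names, and the real check would live in the execbuf2 path.

```python
class ParallelEngine:
    """Hypothetical model of N contexts configured via 'set_parallel'."""

    def __init__(self, width):
        # width is N, fixed once at setup time and never changed afterwards.
        if width < 1:
            raise ValueError("width must be a positive number of contexts")
        self.width = width

    def submit(self, batches):
        """Accept a submission only if it carries exactly N batch buffers.

        Returns (slot, batch) pairs, one per configured context, in the
        order the batches were passed in.
        """
        if len(batches) != self.width:
            raise ValueError(
                f"expected exactly {self.width} batch buffers, "
                f"got {len(batches)}"
            )
        return list(enumerate(batches))
```

Starting with the strict equality check keeps the door open: relaxing it later to accept fewer than N batches would not break userspace that already passes N, whereas tightening a permissive rule after the fact would.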

> +
> +Add I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT and
> +i915_context_engines_parallel_submit to the uAPI to implement this extension.
> +
> +Extend execbuf2 IOCTL to support submitting N BBs in a single IOCTL
> +---
> +Contexts that have been configured with the