Re: [PATCH 2/2] drm/doc/rfc: i915 new parallel submission uAPI plan

2021-06-30 Thread Martin Peres

On 29/06/2021 22:35, Matthew Brost wrote:

Add entry for i915 new parallel submission uAPI plan.

v2:
  (Daniel Vetter):
   - Expand logical order explaination
   - Add dummy header
   - Only allow N BBs in execbuf IOCTL
   - Configure parallel submission per slot not per gem context
v3:
  (Marcin Ślusarz):
   - Lot's of typos / bad english fixed
  (Tvrtko Ursulin):
   - Consistent pseudo code, clean up wording in descriptions
v4:
  (Daniel Vetter)
   - Drop flags
   - Add kernel doc
   - Reword a few things / fix typos
  (Tvrtko)
   - Reword a few things / fix typos
v5:
  (Checkpatch)
   - Fix typos
  (Docs)
   - Fix warning

Cc: Tvrtko Ursulin 
Cc: Tony Ye 
CC: Carl Zhang 
Cc: Daniel Vetter 
Cc: Jason Ekstrand 
Signed-off-by: Matthew Brost 
Acked-by: Daniel Vetter 
Acked-by: Tony Ye 
---
  Documentation/gpu/rfc/i915_parallel_execbuf.h | 122 ++
  Documentation/gpu/rfc/i915_scheduler.rst  |  59 -
  2 files changed, 180 insertions(+), 1 deletion(-)
  create mode 100644 Documentation/gpu/rfc/i915_parallel_execbuf.h

diff --git a/Documentation/gpu/rfc/i915_parallel_execbuf.h 
b/Documentation/gpu/rfc/i915_parallel_execbuf.h
new file mode 100644
index ..8cbe2c4e0172
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_parallel_execbuf.h
@@ -0,0 +1,122 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2021 Intel Corporation
+ */
+
+#define I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT 2 /* see 
i915_context_engines_parallel_submit */
+
+/**
+ * struct drm_i915_context_engines_parallel_submit - Configure engine for
+ * parallel submission.
+ *
+ * Setup a slot in the context engine map to allow multiple BBs to be submitted
+ * in a single execbuf IOCTL. Those BBs will then be scheduled to run on the 
GPU
+ * in parallel. Multiple hardware contexts are created internally in the i915
+ * run these BBs.


The previous sentence makes little sense.


 Once a slot is configured for N BBs only N BBs can be
+ * submitted in each execbuf IOCTL and this is implicit behavior e.g. The user
+ * doesn't tell the execbuf IOCTL there are N BBs, the execbuf IOCTL knows how
+ * many BBs there are based on the slot's configuration. The N BBs are the last
+ * N buffer objects or first N if I915_EXEC_BATCH_FIRST is set.
+ *
+ * The default placement behavior is to create implicit bonds between each
+ * context if each context maps to more than 1 physical engine (e.g. context is
+ * a virtual engine). Also we only allow contexts of same engine class and 
these
+ * contexts must be in logically contiguous order. Examples of the placement
+ * behavior described below. Lastly, the default is to not allow BBs to
+ * preempted mid BB rather insert coordinated preemption on all hardware
+ * contexts between each set of BBs. Flags may be added in the future to change
+ * both of these default behaviors.
+ *
+ * Returns -EINVAL if hardware context placement configuration is invalid or if
+ * the placement configuration isn't supported on the platform / submission
+ * interface.
+ * Returns -ENODEV if extension isn't supported on the platform / submission
+ * interface.
+ *
+ * .. code-block:: none
+ *
+ * Example 1 pseudo code:
+ * CS[X] = generic engine of same class, logical instance X
+ * INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
+ * set_engines(INVALID)
+ * set_parallel(engine_index=0, width=2, num_siblings=1,
+ *  engines=CS[0],CS[1])
+ *
+ * Results in the following valid placement:
+ * CS[0], CS[1]
+ *
+ * Example 2 pseudo code:
+ * CS[X] = generic engine of same class, logical instance X
+ * INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
+ * set_engines(INVALID)
+ * set_parallel(engine_index=0, width=2, num_siblings=2,
+ *  engines=CS[0],CS[2],CS[1],CS[3])
+ *
+ * Results in the following valid placements:
+ * CS[0], CS[1]
+ * CS[2], CS[3]
+ *
+ * This can also be thought of as 2 virtual engines described by 2-D array
+ * in the engines the field with bonds placed between each index of the
+ * virtual engines. e.g. CS[0] is bonded to CS[1], CS[2] is bonded to
+ * CS[3].
+ * VE[0] = CS[0], CS[2]
+ * VE[1] = CS[1], CS[3]
+ *
+ * Example 3 pseudo code:
+ * CS[X] = generic engine of same class, logical instance X
+ * INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
+ * set_engines(INVALID)
+ * set_parallel(engine_index=0, width=2, num_siblings=2,
+ *  engines=CS[0],CS[1],CS[1],CS[3])
+ *
+ * Results in the following valid and invalid placements:
+ * CS[0], CS[1]
+ * CS[1], CS[3] - Not logical contiguous, return -EINVAL
+ */
+struct drm_i915_context_engines_parallel_submit {
+   /**
+* @base: base user extension.
+*/
+   struct i915_user_extension base;
+
+   /**
+* @engine_index: slot for parallel engine
+*/
+   __u16 engine_in

Re: [PATCH 2/2] drm/doc/rfc: i915 new parallel submission uAPI plan

2021-06-18 Thread Ye, Tony
Acked-by: Tony Ye 

Regards,
Tony

On 6/11/2021 4:40 PM, Matthew Brost wrote:
> Add entry for i915 new parallel submission uAPI plan.
> 
> v2:
>   (Daniel Vetter):
>- Expand logical order explaination
>- Add dummy header
>- Only allow N BBs in execbuf IOCTL
>- Configure parallel submission per slot not per gem context
> v3:
>   (Marcin Ślusarz):
>- Lot's of typos / bad english fixed
>   (Tvrtko Ursulin):
>- Consistent pseudo code, clean up wording in descriptions
> v4:
>   (Daniel Vetter)
>- Drop flags
>- Add kernel doc
>- Reword a few things / fix typos
>   (Tvrtko)
>- Reword a few things / fix typos
> 
> Cc: Tvrtko Ursulin 
> Cc: Tony Ye 
> CC: Carl Zhang 
> Cc: Daniel Vetter 
> Cc: Jason Ekstrand 
> Signed-off-by: Matthew Brost 
> Acked-by: Daniel Vetter 
> ---
>   Documentation/gpu/rfc/i915_parallel_execbuf.h | 117 ++
>   Documentation/gpu/rfc/i915_scheduler.rst  |  59 -
>   2 files changed, 175 insertions(+), 1 deletion(-)
>   create mode 100644 Documentation/gpu/rfc/i915_parallel_execbuf.h
> 
> diff --git a/Documentation/gpu/rfc/i915_parallel_execbuf.h 
> b/Documentation/gpu/rfc/i915_parallel_execbuf.h
> new file mode 100644
> index ..c22af3a359e4
> --- /dev/null
> +++ b/Documentation/gpu/rfc/i915_parallel_execbuf.h
> @@ -0,0 +1,117 @@
> +#define I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT 2 /* see 
> i915_context_engines_parallel_submit */
> +
> +/**
> + * struct drm_i915_context_engines_parallel_submit - Configure engine for
> + * parallel submission.
> + *
> + * Setup a slot in the context engine map to allow multiple BBs to be 
> submitted
> + * in a single execbuf IOCTL. Those BBs will then be scheduled to run on the 
> GPU
> + * in parallel. Multiple hardware contexts are created internally in the i915
> + * run these BBs. Once a slot is configured for N BBs only N BBs can be
> + * submitted in each execbuf IOCTL and this is implicit behavior e.g. The 
> user
> + * doesn't tell the execbuf IOCTL there are N BBs, the execbuf IOCTL knows 
> how
> + * many BBs there are based on the slot's configuration. The N BBs are the 
> last
> + * N buffer objects or first N if I915_EXEC_BATCH_FIRST is set.
> + *
> + * The default placement behavior is to create implicit bonds between each
> + * context if each context maps to more than 1 physical engine (e.g. context 
> is
> + * a virtual engine). Also we only allow contexts of same engine class and 
> these
> + * contexts must be in logically contiguous order. Examples of the placement
> + * behavior described below. Lastly, the default is to not allow BBs to
> + * preempted mid BB rather insert coordinated preemption on all hardware
> + * contexts between each set of BBs. Flags may be added in the future to 
> change
> + * bott of these default behaviors.
> + *
> + * Returns -EINVAL if hardware context placement configuration is invalid or 
> if
> + * the placement configuration isn't supported on the platform / submission
> + * interface.
> + * Returns -ENODEV if extension isn't supported on the platform / submission
> + * inteface.
> + *
> + * .. code-block::
> + *
> + *   Example 1 pseudo code:
> + *   CS[X] = generic engine of same class, logical instance X
> + *   INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
> + *   set_engines(INVALID)
> + *   set_parallel(engine_index=0, width=2, num_siblings=1,
> + *engines=CS[0],CS[1])
> + *
> + *   Results in the following valid placement:
> + *   CS[0], CS[1]
> + *
> + *   Example 2 pseudo code:
> + *   CS[X] = generic engine of same class, logical instance X
> + *   INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
> + *   set_engines(INVALID)
> + *   set_parallel(engine_index=0, width=2, num_siblings=2,
> + *engines=CS[0],CS[2],CS[1],CS[3])
> + *
> + *   Results in the following valid placements:
> + *   CS[0], CS[1]
> + *   CS[2], CS[3]
> + *
> + *   This can also be thought of as 2 virtual engines described by 2-D array
> + *   in the engines the field with bonds placed between each index of the
> + *   virtual engines. e.g. CS[0] is bonded to CS[1], CS[2] is bonded to
> + *   CS[3].
> + *   VE[0] = CS[0], CS[2]
> + *   VE[1] = CS[1], CS[3]
> + *
> + *   Example 3 pseudo code:
> + *   CS[X] = generic engine of same class, logical instance X
> + *   INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
> + *   set_engines(INVALID)
> + *   set_parallel(engine_index=0, width=2, num_siblings=2,
> + *engines=CS[0],CS[1],CS[1],CS[3])
> + *
> + *   Results in the following valid and invalid placements:
> + *   CS[0], CS[1]
> + *   CS[1], CS[3] - Not logical contiguous, return -EINVAL
> + */
> +struct drm_i915_context_engines_parallel_submit {
> + /**
> +  * @base: base user extension.
> +  */
> + struct i915_user_extension base;
> +
> + /**
> +  * @engine_index: slot for parallel engine
> +  */
> + __u16 en