[Intel-gfx] [PATCH v5 7/7] drm/i915: Expose RPCS (SSEU) configuration to userspace

2018-05-14 Thread Lionel Landwerlin
From: Chris Wilson 

We want to allow userspace to reconfigure the subslice configuration for
its own use case. To do so, we expose a context parameter to allow
adjustment of the RPCS register stored within the context image (and
currently not accessible via LRI). If the context is adjusted before
first use, the adjustment is for "free"; otherwise if the context is
active we flush the context off the GPU (stalling all users) and forcing
the GPU to save the context to memory where we can modify it and so
ensure that the register is reloaded on next execution.

The overhead of managing additional EU subslices can be significant,
especially in multi-context workloads. Non-GPGPU contexts should
preferably disable the subslices it is not using, and others should
fine-tune the number to match their workload.

We expose complete control over the RPCS register, allowing
configuration of slice/subslice, via masks packed into a u64 for
simplicity. For example,

struct drm_i915_gem_context_param arg;
struct drm_i915_gem_context_param_sseu sseu = { .class = 0,
.instance = 0, };

memset(&arg, 0, sizeof(arg));
arg.ctx_id = ctx;
arg.param = I915_CONTEXT_PARAM_SSEU;
arg.value = (uintptr_t) &sseu;
if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) {
sseu.packed.subslice_mask = 0;

drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
}

could be used to disable all subslices where supported.

v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel)

v3: Add ability to program this per engine (Chris)

v4: Move most get_sseu() into i915_gem_context.c (Lionel)

v5: Validate sseu configuration against the device's capabilities (Lionel)

v6: Change context powergating settings through MI_SDM on kernel context (Chris)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
Signed-off-by: Chris Wilson 
Signed-off-by: Lionel Landwerlin 
c: Dmitry Rogozhkin 
CC: Tvrtko Ursulin 
CC: Zhipeng Gong 
CC: Joonas Lahtinen 
---
 drivers/gpu/drm/i915/i915_gem_context.c | 169 
 drivers/gpu/drm/i915/intel_lrc.c| 103 ++-
 drivers/gpu/drm/i915/intel_ringbuffer.c |   2 +
 drivers/gpu/drm/i915/intel_ringbuffer.h |   4 +
 include/uapi/drm/i915_drm.h |  38 ++
 5 files changed, 281 insertions(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c 
b/drivers/gpu/drm/i915/i915_gem_context.c
index 01310c99e032..0b72a771c3f3 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -734,6 +734,110 @@ int i915_gem_context_destroy_ioctl(struct drm_device 
*dev, void *data,
return 0;
 }
 
+static int
+intel_sseu_from_user_sseu(const struct sseu_dev_info *sseu,
+ const struct drm_i915_gem_context_param_sseu 
*user_sseu,
+ union intel_sseu *ctx_sseu)
+{
+   if ((user_sseu->slice_mask & ~sseu->slice_mask) != 0 ||
+   user_sseu->slice_mask == 0)
+   return -EINVAL;
+
+   if ((user_sseu->subslice_mask & ~sseu->subslice_mask[0]) != 0 ||
+   user_sseu->subslice_mask == 0)
+   return -EINVAL;
+
+   if (user_sseu->min_eus_per_subslice > sseu->max_eus_per_subslice)
+   return -EINVAL;
+
+   if (user_sseu->max_eus_per_subslice > sseu->max_eus_per_subslice ||
+   user_sseu->max_eus_per_subslice < user_sseu->min_eus_per_subslice ||
+   user_sseu->max_eus_per_subslice == 0)
+   return -EINVAL;
+
+   ctx_sseu->slice_mask = user_sseu->slice_mask;
+   ctx_sseu->subslice_mask = user_sseu->subslice_mask;
+   ctx_sseu->min_eus_per_subslice = user_sseu->min_eus_per_subslice;
+   ctx_sseu->max_eus_per_subslice = user_sseu->max_eus_per_subslice;
+
+   return 0;
+}
+
+static int
+i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx,
+ struct intel_engine_cs *engine,
+ union intel_sseu sseu)
+{
+   struct drm_i915_private *dev_priv = ctx->i915;
+   struct i915_timeline *timeline;
+   struct i915_request *rq;
+   union intel_sseu actual_sseu;
+   enum intel_engine_id id;
+   int ret;
+
+   /*
+* First notify user when this capability is not available so that it
+* can be detected with any valid input.
+*/
+   if (!engine->emit_rpcs_config)
+   return -ENODEV;
+
+   if (to_intel_context(ctx, engine)->sseu.value == sseu.value)
+   return 0;
+
+   lockdep_assert_held(&dev_priv->drm.struct_mutex);
+
+   i915_retire_requests(dev_priv);
+
+   /* Now use the RCS to actually reconfigure. */
+   engine = dev_priv->engine[RCS];
+
+   rq = i915_request_alloc(engine, dev_priv->kernel_context);
+   if (IS_ERR(rq))
+   re

Re: [Intel-gfx] [PATCH v5 7/7] drm/i915: Expose RPCS (SSEU) configuration to userspace

2018-05-15 Thread Tvrtko Ursulin


On 14/05/2018 16:56, Lionel Landwerlin wrote:

From: Chris Wilson 

We want to allow userspace to reconfigure the subslice configuration for
its own use case. To do so, we expose a context parameter to allow
adjustment of the RPCS register stored within the context image (and
currently not accessible via LRI). If the context is adjusted before
first use, the adjustment is for "free"; otherwise if the context is
active we flush the context off the GPU (stalling all users) and forcing
the GPU to save the context to memory where we can modify it and so
ensure that the register is reloaded on next execution.

The overhead of managing additional EU subslices can be significant,
especially in multi-context workloads. Non-GPGPU contexts should
preferably disable the subslices it is not using, and others should
fine-tune the number to match their workload.

We expose complete control over the RPCS register, allowing
configuration of slice/subslice, via masks packed into a u64 for
simplicity. For example,

struct drm_i915_gem_context_param arg;
struct drm_i915_gem_context_param_sseu sseu = { .class = 0,
.instance = 0, };

memset(&arg, 0, sizeof(arg));
arg.ctx_id = ctx;
arg.param = I915_CONTEXT_PARAM_SSEU;
arg.value = (uintptr_t) &sseu;
if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) {
sseu.packed.subslice_mask = 0;

drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
}

could be used to disable all subslices where supported.

v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel)

v3: Add ability to program this per engine (Chris)

v4: Move most get_sseu() into i915_gem_context.c (Lionel)

v5: Validate sseu configuration against the device's capabilities (Lionel)

v6: Change context powergating settings through MI_SDM on kernel context (Chris)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
Signed-off-by: Chris Wilson 
Signed-off-by: Lionel Landwerlin 
c: Dmitry Rogozhkin 
CC: Tvrtko Ursulin 
CC: Zhipeng Gong 
CC: Joonas Lahtinen 
---
  drivers/gpu/drm/i915/i915_gem_context.c | 169 
  drivers/gpu/drm/i915/intel_lrc.c| 103 ++-
  drivers/gpu/drm/i915/intel_ringbuffer.c |   2 +
  drivers/gpu/drm/i915/intel_ringbuffer.h |   4 +
  include/uapi/drm/i915_drm.h |  38 ++
  5 files changed, 281 insertions(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c 
b/drivers/gpu/drm/i915/i915_gem_context.c
index 01310c99e032..0b72a771c3f3 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -734,6 +734,110 @@ int i915_gem_context_destroy_ioctl(struct drm_device 
*dev, void *data,
return 0;
  }
  
+static int

+intel_sseu_from_user_sseu(const struct sseu_dev_info *sseu,
+ const struct drm_i915_gem_context_param_sseu 
*user_sseu,
+ union intel_sseu *ctx_sseu)
+{
+   if ((user_sseu->slice_mask & ~sseu->slice_mask) != 0 ||
+   user_sseu->slice_mask == 0)
+   return -EINVAL;
+
+   if ((user_sseu->subslice_mask & ~sseu->subslice_mask[0]) != 0 ||
+   user_sseu->subslice_mask == 0)
+   return -EINVAL;
+
+   if (user_sseu->min_eus_per_subslice > sseu->max_eus_per_subslice)
+   return -EINVAL;
+
+   if (user_sseu->max_eus_per_subslice > sseu->max_eus_per_subslice ||
+   user_sseu->max_eus_per_subslice < user_sseu->min_eus_per_subslice ||
+   user_sseu->max_eus_per_subslice == 0)
+   return -EINVAL;
+
+   ctx_sseu->slice_mask = user_sseu->slice_mask;
+   ctx_sseu->subslice_mask = user_sseu->subslice_mask;
+   ctx_sseu->min_eus_per_subslice = user_sseu->min_eus_per_subslice;
+   ctx_sseu->max_eus_per_subslice = user_sseu->max_eus_per_subslice;
+
+   return 0;
+}
+
+static int
+i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx,
+ struct intel_engine_cs *engine,
+ union intel_sseu sseu)
+{
+   struct drm_i915_private *dev_priv = ctx->i915;
+   struct i915_timeline *timeline;
+   struct i915_request *rq;
+   union intel_sseu actual_sseu;
+   enum intel_engine_id id;
+   int ret;
+
+   /*
+* First notify user when this capability is not available so that it
+* can be detected with any valid input.
+*/
+   if (!engine->emit_rpcs_config)
+   return -ENODEV;
+
+   if (to_intel_context(ctx, engine)->sseu.value == sseu.value)


Are there other uses for the value union in the series? Think whether it 
could be dropped and memcmp used here for simplicity.



+   return 0;
+
+   lockdep_assert_held(&dev_priv->drm.struct_mutex);
+
+   i915_retire_requests(dev_priv);
+
+   /* Now 

Re: [Intel-gfx] [PATCH v5 7/7] drm/i915: Expose RPCS (SSEU) configuration to userspace

2018-05-16 Thread Tvrtko Ursulin


On 15/05/2018 10:05, Tvrtko Ursulin wrote:


On 14/05/2018 16:56, Lionel Landwerlin wrote:

From: Chris Wilson 

We want to allow userspace to reconfigure the subslice configuration for
its own use case. To do so, we expose a context parameter to allow
adjustment of the RPCS register stored within the context image (and
currently not accessible via LRI). If the context is adjusted before
first use, the adjustment is for "free"; otherwise if the context is
active we flush the context off the GPU (stalling all users) and forcing
the GPU to save the context to memory where we can modify it and so
ensure that the register is reloaded on next execution.

The overhead of managing additional EU subslices can be significant,
especially in multi-context workloads. Non-GPGPU contexts should
preferably disable the subslices it is not using, and others should
fine-tune the number to match their workload.

We expose complete control over the RPCS register, allowing
configuration of slice/subslice, via masks packed into a u64 for
simplicity. For example,

struct drm_i915_gem_context_param arg;
struct drm_i915_gem_context_param_sseu sseu = { .class = 0,
    .instance = 0, };

memset(&arg, 0, sizeof(arg));
arg.ctx_id = ctx;
arg.param = I915_CONTEXT_PARAM_SSEU;
arg.value = (uintptr_t) &sseu;
if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) {
    sseu.packed.subslice_mask = 0;

    drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
}

could be used to disable all subslices where supported.

v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() 
(Lionel)


v3: Add ability to program this per engine (Chris)

v4: Move most get_sseu() into i915_gem_context.c (Lionel)

v5: Validate sseu configuration against the device's capabilities 
(Lionel)


v6: Change context powergating settings through MI_SDM on kernel 
context (Chris)


Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
Signed-off-by: Chris Wilson 
Signed-off-by: Lionel Landwerlin 
c: Dmitry Rogozhkin 
CC: Tvrtko Ursulin 
CC: Zhipeng Gong 
CC: Joonas Lahtinen 
---
  drivers/gpu/drm/i915/i915_gem_context.c | 169 
  drivers/gpu/drm/i915/intel_lrc.c    | 103 ++-
  drivers/gpu/drm/i915/intel_ringbuffer.c |   2 +
  drivers/gpu/drm/i915/intel_ringbuffer.h |   4 +
  include/uapi/drm/i915_drm.h |  38 ++
  5 files changed, 281 insertions(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c 
b/drivers/gpu/drm/i915/i915_gem_context.c

index 01310c99e032..0b72a771c3f3 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -734,6 +734,110 @@ int i915_gem_context_destroy_ioctl(struct 
drm_device *dev, void *data,

  return 0;
  }
+static int
+intel_sseu_from_user_sseu(const struct sseu_dev_info *sseu,
+  const struct drm_i915_gem_context_param_sseu *user_sseu,
+  union intel_sseu *ctx_sseu)
+{
+    if ((user_sseu->slice_mask & ~sseu->slice_mask) != 0 ||
+    user_sseu->slice_mask == 0)
+    return -EINVAL;
+
+    if ((user_sseu->subslice_mask & ~sseu->subslice_mask[0]) != 0 ||
+    user_sseu->subslice_mask == 0)
+    return -EINVAL;
+
+    if (user_sseu->min_eus_per_subslice > sseu->max_eus_per_subslice)
+    return -EINVAL;
+
+    if (user_sseu->max_eus_per_subslice > sseu->max_eus_per_subslice ||
+    user_sseu->max_eus_per_subslice < 
user_sseu->min_eus_per_subslice ||

+    user_sseu->max_eus_per_subslice == 0)
+    return -EINVAL;
+
+    ctx_sseu->slice_mask = user_sseu->slice_mask;
+    ctx_sseu->subslice_mask = user_sseu->subslice_mask;
+    ctx_sseu->min_eus_per_subslice = user_sseu->min_eus_per_subslice;
+    ctx_sseu->max_eus_per_subslice = user_sseu->max_eus_per_subslice;
+
+    return 0;
+}
+
+static int
+i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx,
+  struct intel_engine_cs *engine,
+  union intel_sseu sseu)
+{
+    struct drm_i915_private *dev_priv = ctx->i915;
+    struct i915_timeline *timeline;
+    struct i915_request *rq;
+    union intel_sseu actual_sseu;
+    enum intel_engine_id id;
+    int ret;
+
+    /*
+ * First notify user when this capability is not available so 
that it

+ * can be detected with any valid input.
+ */
+    if (!engine->emit_rpcs_config)
+    return -ENODEV;
+
+    if (to_intel_context(ctx, engine)->sseu.value == sseu.value)


Are there other uses for the value union in the series? Think whether it 
could be dropped and memcmp used here for simplicity.



+    return 0;
+
+    lockdep_assert_held(&dev_priv->drm.struct_mutex);
+
+    i915_retire_requests(dev_priv);
+
+    /* Now use the RCS to actually reconfigure. */
+    engine = dev_priv->engine[RCS];
+
+    rq = i915_request_alloc(engine, dev_priv->kernel_context);
+    if (IS_ERR(rq))
+    return 

Re: [Intel-gfx] [PATCH v5 7/7] drm/i915: Expose RPCS (SSEU) configuration to userspace

2018-05-16 Thread Lionel Landwerlin

On 16/05/18 16:40, Tvrtko Ursulin wrote:


On 15/05/2018 10:05, Tvrtko Ursulin wrote:


On 14/05/2018 16:56, Lionel Landwerlin wrote:

From: Chris Wilson 

We want to allow userspace to reconfigure the subslice configuration 
for

its own use case. To do so, we expose a context parameter to allow
adjustment of the RPCS register stored within the context image (and
currently not accessible via LRI). If the context is adjusted before
first use, the adjustment is for "free"; otherwise if the context is
active we flush the context off the GPU (stalling all users) and 
forcing

the GPU to save the context to memory where we can modify it and so
ensure that the register is reloaded on next execution.

The overhead of managing additional EU subslices can be significant,
especially in multi-context workloads. Non-GPGPU contexts should
preferably disable the subslices it is not using, and others should
fine-tune the number to match their workload.

We expose complete control over the RPCS register, allowing
configuration of slice/subslice, via masks packed into a u64 for
simplicity. For example,

struct drm_i915_gem_context_param arg;
struct drm_i915_gem_context_param_sseu sseu = { .class = 0,
    .instance = 0, };

memset(&arg, 0, sizeof(arg));
arg.ctx_id = ctx;
arg.param = I915_CONTEXT_PARAM_SSEU;
arg.value = (uintptr_t) &sseu;
if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) {
    sseu.packed.subslice_mask = 0;

    drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
}

could be used to disable all subslices where supported.

v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() 
(Lionel)


v3: Add ability to program this per engine (Chris)

v4: Move most get_sseu() into i915_gem_context.c (Lionel)

v5: Validate sseu configuration against the device's capabilities 
(Lionel)


v6: Change context powergating settings through MI_SDM on kernel 
context (Chris)


Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
Signed-off-by: Chris Wilson 
Signed-off-by: Lionel Landwerlin 
c: Dmitry Rogozhkin 
CC: Tvrtko Ursulin 
CC: Zhipeng Gong 
CC: Joonas Lahtinen 
---
  drivers/gpu/drm/i915/i915_gem_context.c | 169 


  drivers/gpu/drm/i915/intel_lrc.c    | 103 ++-
  drivers/gpu/drm/i915/intel_ringbuffer.c |   2 +
  drivers/gpu/drm/i915/intel_ringbuffer.h |   4 +
  include/uapi/drm/i915_drm.h |  38 ++
  5 files changed, 281 insertions(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c 
b/drivers/gpu/drm/i915/i915_gem_context.c

index 01310c99e032..0b72a771c3f3 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -734,6 +734,110 @@ int i915_gem_context_destroy_ioctl(struct 
drm_device *dev, void *data,

  return 0;
  }
+static int
+intel_sseu_from_user_sseu(const struct sseu_dev_info *sseu,
+  const struct drm_i915_gem_context_param_sseu *user_sseu,
+  union intel_sseu *ctx_sseu)
+{
+    if ((user_sseu->slice_mask & ~sseu->slice_mask) != 0 ||
+    user_sseu->slice_mask == 0)
+    return -EINVAL;
+
+    if ((user_sseu->subslice_mask & ~sseu->subslice_mask[0]) != 0 ||
+    user_sseu->subslice_mask == 0)
+    return -EINVAL;
+
+    if (user_sseu->min_eus_per_subslice > sseu->max_eus_per_subslice)
+    return -EINVAL;
+
+    if (user_sseu->max_eus_per_subslice > 
sseu->max_eus_per_subslice ||
+    user_sseu->max_eus_per_subslice < 
user_sseu->min_eus_per_subslice ||

+    user_sseu->max_eus_per_subslice == 0)
+    return -EINVAL;
+
+    ctx_sseu->slice_mask = user_sseu->slice_mask;
+    ctx_sseu->subslice_mask = user_sseu->subslice_mask;
+    ctx_sseu->min_eus_per_subslice = user_sseu->min_eus_per_subslice;
+    ctx_sseu->max_eus_per_subslice = user_sseu->max_eus_per_subslice;
+
+    return 0;
+}
+
+static int
+i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx,
+  struct intel_engine_cs *engine,
+  union intel_sseu sseu)
+{
+    struct drm_i915_private *dev_priv = ctx->i915;
+    struct i915_timeline *timeline;
+    struct i915_request *rq;
+    union intel_sseu actual_sseu;
+    enum intel_engine_id id;
+    int ret;
+
+    /*
+ * First notify user when this capability is not available so 
that it

+ * can be detected with any valid input.
+ */
+    if (!engine->emit_rpcs_config)
+    return -ENODEV;
+
+    if (to_intel_context(ctx, engine)->sseu.value == sseu.value)


Are there other uses for the value union in the series? Think whether 
it could be dropped and memcmp used here for simplicity.



+    return 0;
+
+    lockdep_assert_held(&dev_priv->drm.struct_mutex);
+
+    i915_retire_requests(dev_priv);
+
+    /* Now use the RCS to actually reconfigure. */
+    engine = dev_priv->engine[RCS];
+
+    rq = i915_request_alloc(engine, dev_priv->kerne

Re: [Intel-gfx] [PATCH v5 7/7] drm/i915: Expose RPCS (SSEU) configuration to userspace

2018-05-16 Thread Chris Wilson
Quoting Tvrtko Ursulin (2018-05-16 16:40:55)
> 
> On 15/05/2018 10:05, Tvrtko Ursulin wrote:
> > 
> > On 14/05/2018 16:56, Lionel Landwerlin wrote:
> >> From: Chris Wilson 
> >>
> >> We want to allow userspace to reconfigure the subslice configuration for
> >> its own use case. To do so, we expose a context parameter to allow
> >> adjustment of the RPCS register stored within the context image (and
> >> currently not accessible via LRI). If the context is adjusted before
> >> first use, the adjustment is for "free"; otherwise if the context is
> >> active we flush the context off the GPU (stalling all users) and forcing
> >> the GPU to save the context to memory where we can modify it and so
> >> ensure that the register is reloaded on next execution.
> >>
> >> The overhead of managing additional EU subslices can be significant,
> >> especially in multi-context workloads. Non-GPGPU contexts should
> >> preferably disable the subslices it is not using, and others should
> >> fine-tune the number to match their workload.
> >>
> >> We expose complete control over the RPCS register, allowing
> >> configuration of slice/subslice, via masks packed into a u64 for
> >> simplicity. For example,
> >>
> >> struct drm_i915_gem_context_param arg;
> >> struct drm_i915_gem_context_param_sseu sseu = { .class = 0,
> >>     .instance = 0, };
> >>
> >> memset(&arg, 0, sizeof(arg));
> >> arg.ctx_id = ctx;
> >> arg.param = I915_CONTEXT_PARAM_SSEU;
> >> arg.value = (uintptr_t) &sseu;
> >> if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) {
> >>     sseu.packed.subslice_mask = 0;
> >>
> >>     drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
> >> }
> >>
> >> could be used to disable all subslices where supported.
> >>
> >> v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() 
> >> (Lionel)
> >>
> >> v3: Add ability to program this per engine (Chris)
> >>
> >> v4: Move most get_sseu() into i915_gem_context.c (Lionel)
> >>
> >> v5: Validate sseu configuration against the device's capabilities 
> >> (Lionel)
> >>
> >> v6: Change context powergating settings through MI_SDM on kernel 
> >> context (Chris)
> >>
> >> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
> >> Signed-off-by: Chris Wilson 
> >> Signed-off-by: Lionel Landwerlin 
> >> c: Dmitry Rogozhkin 
> >> CC: Tvrtko Ursulin 
> >> CC: Zhipeng Gong 
> >> CC: Joonas Lahtinen 
> >> ---
> >>   drivers/gpu/drm/i915/i915_gem_context.c | 169 
> >>   drivers/gpu/drm/i915/intel_lrc.c    | 103 ++-
> >>   drivers/gpu/drm/i915/intel_ringbuffer.c |   2 +
> >>   drivers/gpu/drm/i915/intel_ringbuffer.h |   4 +
> >>   include/uapi/drm/i915_drm.h |  38 ++
> >>   5 files changed, 281 insertions(+), 35 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c 
> >> b/drivers/gpu/drm/i915/i915_gem_context.c
> >> index 01310c99e032..0b72a771c3f3 100644
> >> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> >> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> >> @@ -734,6 +734,110 @@ int i915_gem_context_destroy_ioctl(struct 
> >> drm_device *dev, void *data,
> >>   return 0;
> >>   }
> >> +static int
> >> +intel_sseu_from_user_sseu(const struct sseu_dev_info *sseu,
> >> +  const struct drm_i915_gem_context_param_sseu *user_sseu,
> >> +  union intel_sseu *ctx_sseu)
> >> +{
> >> +    if ((user_sseu->slice_mask & ~sseu->slice_mask) != 0 ||
> >> +    user_sseu->slice_mask == 0)
> >> +    return -EINVAL;
> >> +
> >> +    if ((user_sseu->subslice_mask & ~sseu->subslice_mask[0]) != 0 ||
> >> +    user_sseu->subslice_mask == 0)
> >> +    return -EINVAL;
> >> +
> >> +    if (user_sseu->min_eus_per_subslice > sseu->max_eus_per_subslice)
> >> +    return -EINVAL;
> >> +
> >> +    if (user_sseu->max_eus_per_subslice > sseu->max_eus_per_subslice ||
> >> +    user_sseu->max_eus_per_subslice < 
> >> user_sseu->min_eus_per_subslice ||
> >> +    user_sseu->max_eus_per_subslice == 0)
> >> +    return -EINVAL;
> >> +
> >> +    ctx_sseu->slice_mask = user_sseu->slice_mask;
> >> +    ctx_sseu->subslice_mask = user_sseu->subslice_mask;
> >> +    ctx_sseu->min_eus_per_subslice = user_sseu->min_eus_per_subslice;
> >> +    ctx_sseu->max_eus_per_subslice = user_sseu->max_eus_per_subslice;
> >> +
> >> +    return 0;
> >> +}
> >> +
> >> +static int
> >> +i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx,
> >> +  struct intel_engine_cs *engine,
> >> +  union intel_sseu sseu)
> >> +{
> >> +    struct drm_i915_private *dev_priv = ctx->i915;
> >> +    struct i915_timeline *timeline;
> >> +    struct i915_request *rq;
> >> +    union intel_sseu actual_sseu;
> >> +    enum intel_engine_id id;
> >> +    int ret;
> >> +
> >> +    /*
> >> + * First notify user when this capability is not available so 
> >> that it
> >

Re: [Intel-gfx] [PATCH v5 7/7] drm/i915: Expose RPCS (SSEU) configuration to userspace

2018-05-21 Thread Lionel Landwerlin

On 15/05/18 10:05, Tvrtko Ursulin wrote:


On 14/05/2018 16:56, Lionel Landwerlin wrote:

From: Chris Wilson 

We want to allow userspace to reconfigure the subslice configuration for
its own use case. To do so, we expose a context parameter to allow
adjustment of the RPCS register stored within the context image (and
currently not accessible via LRI). If the context is adjusted before
first use, the adjustment is for "free"; otherwise if the context is
active we flush the context off the GPU (stalling all users) and forcing
the GPU to save the context to memory where we can modify it and so
ensure that the register is reloaded on next execution.

The overhead of managing additional EU subslices can be significant,
especially in multi-context workloads. Non-GPGPU contexts should
preferably disable the subslices it is not using, and others should
fine-tune the number to match their workload.

We expose complete control over the RPCS register, allowing
configuration of slice/subslice, via masks packed into a u64 for
simplicity. For example,

struct drm_i915_gem_context_param arg;
struct drm_i915_gem_context_param_sseu sseu = { .class = 0,
    .instance = 0, };

memset(&arg, 0, sizeof(arg));
arg.ctx_id = ctx;
arg.param = I915_CONTEXT_PARAM_SSEU;
arg.value = (uintptr_t) &sseu;
if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) {
    sseu.packed.subslice_mask = 0;

    drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
}

could be used to disable all subslices where supported.

v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() 
(Lionel)


v3: Add ability to program this per engine (Chris)

v4: Move most get_sseu() into i915_gem_context.c (Lionel)

v5: Validate sseu configuration against the device's capabilities 
(Lionel)


v6: Change context powergating settings through MI_SDM on kernel 
context (Chris)


Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
Signed-off-by: Chris Wilson 
Signed-off-by: Lionel Landwerlin 
c: Dmitry Rogozhkin 
CC: Tvrtko Ursulin 
CC: Zhipeng Gong 
CC: Joonas Lahtinen 
---
  drivers/gpu/drm/i915/i915_gem_context.c | 169 
  drivers/gpu/drm/i915/intel_lrc.c    | 103 ++-
  drivers/gpu/drm/i915/intel_ringbuffer.c |   2 +
  drivers/gpu/drm/i915/intel_ringbuffer.h |   4 +
  include/uapi/drm/i915_drm.h |  38 ++
  5 files changed, 281 insertions(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c 
b/drivers/gpu/drm/i915/i915_gem_context.c

index 01310c99e032..0b72a771c3f3 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -734,6 +734,110 @@ int i915_gem_context_destroy_ioctl(struct 
drm_device *dev, void *data,

  return 0;
  }
  +static int
+intel_sseu_from_user_sseu(const struct sseu_dev_info *sseu,
+  const struct drm_i915_gem_context_param_sseu *user_sseu,
+  union intel_sseu *ctx_sseu)
+{
+    if ((user_sseu->slice_mask & ~sseu->slice_mask) != 0 ||
+    user_sseu->slice_mask == 0)
+    return -EINVAL;
+
+    if ((user_sseu->subslice_mask & ~sseu->subslice_mask[0]) != 0 ||
+    user_sseu->subslice_mask == 0)
+    return -EINVAL;
+
+    if (user_sseu->min_eus_per_subslice > sseu->max_eus_per_subslice)
+    return -EINVAL;
+
+    if (user_sseu->max_eus_per_subslice > sseu->max_eus_per_subslice ||
+    user_sseu->max_eus_per_subslice < 
user_sseu->min_eus_per_subslice ||

+    user_sseu->max_eus_per_subslice == 0)
+    return -EINVAL;
+
+    ctx_sseu->slice_mask = user_sseu->slice_mask;
+    ctx_sseu->subslice_mask = user_sseu->subslice_mask;
+    ctx_sseu->min_eus_per_subslice = user_sseu->min_eus_per_subslice;
+    ctx_sseu->max_eus_per_subslice = user_sseu->max_eus_per_subslice;
+
+    return 0;
+}
+
+static int
+i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx,
+  struct intel_engine_cs *engine,
+  union intel_sseu sseu)
+{
+    struct drm_i915_private *dev_priv = ctx->i915;
+    struct i915_timeline *timeline;
+    struct i915_request *rq;
+    union intel_sseu actual_sseu;
+    enum intel_engine_id id;
+    int ret;
+
+    /*
+ * First notify user when this capability is not available so 
that it

+ * can be detected with any valid input.
+ */
+    if (!engine->emit_rpcs_config)
+    return -ENODEV;
+
+    if (to_intel_context(ctx, engine)->sseu.value == sseu.value)


Are there other uses for the value union in the series? Think whether 
it could be dropped and memcmp used here for simplicity.


Sure.




+    return 0;
+
+    lockdep_assert_held(&dev_priv->drm.struct_mutex);
+
+    i915_retire_requests(dev_priv);
+
+    /* Now use the RCS to actually reconfigure. */
+    engine = dev_priv->engine[RCS];
+
+    rq = i915_request_alloc(engine, dev_priv->kernel_context);
+    if (IS_ERR(rq))
+   

Re: [Intel-gfx] [PATCH v5 7/7] drm/i915: Expose RPCS (SSEU) configuration to userspace

2018-05-21 Thread Tvrtko Ursulin


On 21/05/2018 14:22, Lionel Landwerlin wrote:

On 15/05/18 10:05, Tvrtko Ursulin wrote:


On 14/05/2018 16:56, Lionel Landwerlin wrote:

From: Chris Wilson 

We want to allow userspace to reconfigure the subslice configuration for
its own use case. To do so, we expose a context parameter to allow
adjustment of the RPCS register stored within the context image (and
currently not accessible via LRI). If the context is adjusted before
first use, the adjustment is for "free"; otherwise if the context is
active we flush the context off the GPU (stalling all users) and forcing
the GPU to save the context to memory where we can modify it and so
ensure that the register is reloaded on next execution.

The overhead of managing additional EU subslices can be significant,
especially in multi-context workloads. Non-GPGPU contexts should
preferably disable the subslices it is not using, and others should
fine-tune the number to match their workload.

We expose complete control over the RPCS register, allowing
configuration of slice/subslice, via masks packed into a u64 for
simplicity. For example,

struct drm_i915_gem_context_param arg;
struct drm_i915_gem_context_param_sseu sseu = { .class = 0,
    .instance = 0, };

memset(&arg, 0, sizeof(arg));
arg.ctx_id = ctx;
arg.param = I915_CONTEXT_PARAM_SSEU;
arg.value = (uintptr_t) &sseu;
if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) {
    sseu.packed.subslice_mask = 0;

    drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
}

could be used to disable all subslices where supported.

v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() 
(Lionel)


v3: Add ability to program this per engine (Chris)

v4: Move most get_sseu() into i915_gem_context.c (Lionel)

v5: Validate sseu configuration against the device's capabilities 
(Lionel)


v6: Change context powergating settings through MI_SDM on kernel 
context (Chris)


[snip]




+ intel_sseu_from_device_sseu(&INTEL_INFO(dev_priv)->sseu) :
+    sseu;
+
+    ret = engine->emit_rpcs_config(rq, ctx, actual_sseu);
+    if (ret) {
+    __i915_request_add(rq, true);
+    return ret;
+    }
+
+    /* Queue this switch after all other activity */
+    list_for_each_entry(timeline, &dev_priv->gt.timelines, link) {


This can iterate over gt.active_rings for a shorter walk. See current 
state of engine_has_idle_kernel_context.



+    struct i915_request *prev;
+
+    prev = last_request_on_engine(timeline, engine);
+    if (prev)
+ i915_sw_fence_await_sw_fence_gfp(&rq->submit,
+ &prev->submit,
+ I915_FENCE_GFP);
+    }
+
+    __i915_request_add(rq, true);


This is actually the bit I reading the patch for. So this I think is 
much better/safer than previous idling. However one concern, and maybe 
not a concern but just something which needs to be explained in the 
uAPI, is what it means with regards to the point from which the new 
configuration becomes live.


To me it's seems that all context ioctl effects are ordered.
So I would expect any execbuf after this ioctl would have the right 
configuration.

Anything before would have the previous configuration.

If that's your understanind too, I can add that in the documentation.


I guess I am missing what prevents the context which issued the SSEU 
re-configuraiton to run before the re-configuration request executes. 
They are on separate timelines and subsequent execbuf on the issuing 
context won't depend on it.




Could preemption for instance make it not defined enough? Could we, or 
should we, also link this request somehow onto the issuing context 
timeline so it must be first there? Hm, should we use the user context 
instead of the kernel one, and set the highest priority? But would 
have to avoid triggering preemption.


My understanding is that a context can't MI_LRI itself before Gen11.
I haven't tested that on Gen11 though.

I'm afraid that leaves us with only a max priority request tied to all 
the previous requests.
Or somehow keep a pointer to the last request to change the context 
powergating configuration and add a dependency on all new request until 
it's retired?


I lost the train of dependencies here. :) [snip]




+
+    param_sseu.slice_mask = ce->sseu.slice_mask;
+    param_sseu.subslice_mask = ce->sseu.subslice_mask;
+    param_sseu.min_eus_per_subslice = 
ce->sseu.min_eus_per_subslice;
+    param_sseu.max_eus_per_subslice = 
ce->sseu.max_eus_per_subslice;

+
+    if (copy_to_user(u64_to_user_ptr(args->value), ¶m_sseu,
+ sizeof(param_sseu)))
+    ret = -EFAULT;
+    break;
+    }
  default:
  ret = -EINVAL;
  break;
@@ -845,7 +980,41 @@ int i915_gem_context_setparam_ioctl(struct 
drm_device *dev, void *data,

  ctx->sched.priority = priority;
  }
  break;

Re: [Intel-gfx] [PATCH v5 7/7] drm/i915: Expose RPCS (SSEU) configuration to userspace

2018-05-21 Thread Lionel Landwerlin

On 21/05/18 17:00, Tvrtko Ursulin wrote:


On 21/05/2018 14:22, Lionel Landwerlin wrote:

On 15/05/18 10:05, Tvrtko Ursulin wrote:


On 14/05/2018 16:56, Lionel Landwerlin wrote:

From: Chris Wilson 

We want to allow userspace to reconfigure the subslice 
configuration for

its own use case. To do so, we expose a context parameter to allow
adjustment of the RPCS register stored within the context image (and
currently not accessible via LRI). If the context is adjusted before
first use, the adjustment is for "free"; otherwise if the context is
active we flush the context off the GPU (stalling all users) and 
forcing

the GPU to save the context to memory where we can modify it and so
ensure that the register is reloaded on next execution.

The overhead of managing additional EU subslices can be significant,
especially in multi-context workloads. Non-GPGPU contexts should
preferably disable the subslices it is not using, and others should
fine-tune the number to match their workload.

We expose complete control over the RPCS register, allowing
configuration of slice/subslice, via masks packed into a u64 for
simplicity. For example,

struct drm_i915_gem_context_param arg;
struct drm_i915_gem_context_param_sseu sseu = { .class = 0,
.instance = 0, };

memset(&arg, 0, sizeof(arg));
arg.ctx_id = ctx;
arg.param = I915_CONTEXT_PARAM_SSEU;
arg.value = (uintptr_t) &sseu;
if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 
0) {

    sseu.packed.subslice_mask = 0;

    drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
}

could be used to disable all subslices where supported.

v2: Fix offset of CTX_R_PWR_CLK_STATE in 
intel_lr_context_set_sseu() (Lionel)


v3: Add ability to program this per engine (Chris)

v4: Move most get_sseu() into i915_gem_context.c (Lionel)

v5: Validate sseu configuration against the device's capabilities 
(Lionel)


v6: Change context powergating settings through MI_SDM on kernel 
context (Chris)


[snip]




+ intel_sseu_from_device_sseu(&INTEL_INFO(dev_priv)->sseu) :
+    sseu;
+
+    ret = engine->emit_rpcs_config(rq, ctx, actual_sseu);
+    if (ret) {
+    __i915_request_add(rq, true);
+    return ret;
+    }
+
+    /* Queue this switch after all other activity */
+    list_for_each_entry(timeline, &dev_priv->gt.timelines, link) {


This can iterate over gt.active_rings for a shorter walk. See current 
state of engine_has_idle_kernel_context.


Thanks.




+    struct i915_request *prev;
+
+    prev = last_request_on_engine(timeline, engine);
+    if (prev)
+ i915_sw_fence_await_sw_fence_gfp(&rq->submit,
+ &prev->submit,
+ I915_FENCE_GFP);
+    }
+
+    __i915_request_add(rq, true);


This is actually the bit I reading the patch for. So this I think is 
much better/safer than previous idling. However one concern, and 
maybe not a concern but just something which needs to be explained 
in the uAPI, is what it means with regards to the point from which 
the new configuration becomes live.


To me it's seems that all context ioctl effects are ordered.
So I would expect any execbuf after this ioctl would have the right 
configuration.

Anything before would have the previous configuration.

If that's your understanind too, I can add that in the documentation.


I guess I am missing what prevents the context which issued the SSEU 
re-configuraiton to run before the re-configuration request executes. 
They are on separate timelines and subsequent execbuf on the issuing 
context won't depend on it.




Could preemption for instance make it not defined enough? Could we, 
or should we, also link this request somehow onto the issuing 
context timeline so it must be first there? Hm, should we use the 
user context instead of the kernel one, and set the highest 
priority? But would have to avoid triggering preemption.


My understanding is that a context can't MI_LRI itself before Gen11.
I haven't tested that on Gen11 though.

I'm afraid that leaves us with only a max priority request tied to 
all the previous requests.
Or somehow keep a pointer to the last request to change the context 
powergating configuration and add a dependency on all new request 
until it's retired?


I lost the train of dependencies here. :) [snip]


Sorry, I was getting confused by reading Chris' reply at the same time.

There is a problem with preemption and since the documentation tells us 
that we can't MI_LRI on the context itself (like Chris suggested and 
what would be really simpler), we have to go with an MI_SDM which this 
patch currently lacks synchronization for.







+
+    param_sseu.slice_mask = ce->sseu.slice_mask;
+    param_sseu.subslice_mask = ce->sseu.subslice_mask;
+    param_sseu.min_eus_per_subslice = 
ce->sseu.min_eus_per_subslice;
+    param_sseu.max_eus_per_subslice = 
ce->sseu.max_eus_per_subslice;

+
+    if (copy_to_user(u64_to_user_ptr(a

Re: [Intel-gfx] [PATCH v5 7/7] drm/i915: Expose RPCS (SSEU) configuration to userspace

2018-05-22 Thread Lionel Landwerlin

On 21/05/18 17:00, Tvrtko Ursulin wrote:


+
+    /* Queue this switch after all other activity */
+    list_for_each_entry(timeline, &dev_priv->gt.timelines, link) {


This can iterate over gt.active_rings for a shorter walk. See current 
state of engine_has_idle_kernel_context.


For some reason, iterating over gt.active_rings will trigger an invalid 
memory access :|


Not sure what's wrong here...

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH v5 7/7] drm/i915: Expose RPCS (SSEU) configuration to userspace

2018-05-22 Thread Lionel Landwerlin

On 22/05/18 17:11, Lionel Landwerlin wrote:

On 21/05/18 17:00, Tvrtko Ursulin wrote:


+
+    /* Queue this switch after all other activity */
+    list_for_each_entry(timeline, &dev_priv->gt.timelines, link) {


This can iterate over gt.active_rings for a shorter walk. See current 
state of engine_has_idle_kernel_context.


For some reason, iterating over gt.active_rings will trigger an 
invalid memory access :|


Not sure what's wrong here...


Duh!

Found it :

list_for_each_entry(ring, &dev_priv->gt.active_rings, link) {

Instead of :

list_for_each_entry(ring, &dev_priv->gt.active_rings, active_link) {

-
Lionel
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx