Re: [PATCH v2 1/3] drm/i915/gt: Support fixed CCS mode

2024-01-09 Thread Tvrtko Ursulin



On 08/01/2024 15:13, Joonas Lahtinen wrote:

Quoting Tvrtko Ursulin (2024-01-05 12:39:31)


On 04/01/2024 21:23, Andi Shyti wrote:





+void intel_gt_apply_ccs_mode(struct intel_gt *gt)
+{
+   mutex_lock(&gt->ccs.mutex);
+   __intel_gt_apply_ccs_mode(gt);
+   mutex_unlock(&gt->ccs.mutex);
+}
+
+void intel_gt_init_ccs_mode(struct intel_gt *gt)
+{
+   mutex_init(&gt->ccs.mutex);
+   gt->ccs.mode = 1;


What is '1'? And this question carries over to the sysfs interface in the
following patch - who will use it and where it is documented how to use it?


The value '1' is explained in the comment above[1] and in the


Do you mean this is mode '1':

   * With 1 engine (ccs0):
   *   slice 0, 1, 2, 3: ccs0

?

But I don't see where it says what the different modes mean on different
SKU configurations.

It also does not say what the num_slices sysfs file should be used for.

Does "mode N" mean "assign each command streamer N compute slices"? Or
"assign each compute slice N command streamers"?

I wonder if we should add something user friendly into
Documentation/ABI/*/sysfs-... Joonas your thoughts?


We definitely should always properly document all sysfs additions; it
just seems we less frequently remember to do so. So yeah, this should be
documented just like other uAPI.

I also like the idea of not exposing the file at all if the value
can't be modified.

The ccs_mode is just supposed to allow the user to select how many CCS
engines they want to expose, and always make an even split of slices
between them; nothing more, nothing less.


Hmm, I can't see where the series changes what command streamers
will get reported as available.


Regards,

Tvrtko




Re: [PATCH v2 1/3] drm/i915/gt: Support fixed CCS mode

2024-01-08 Thread Joonas Lahtinen
Quoting Tvrtko Ursulin (2024-01-05 12:39:31)
> 
> On 04/01/2024 21:23, Andi Shyti wrote:



> >>> +void intel_gt_apply_ccs_mode(struct intel_gt *gt)
> >>> +{
> >>> +   mutex_lock(&gt->ccs.mutex);
> >>> +   __intel_gt_apply_ccs_mode(gt);
> >>> +   mutex_unlock(&gt->ccs.mutex);
> >>> +}
> >>> +
> >>> +void intel_gt_init_ccs_mode(struct intel_gt *gt)
> >>> +{
> >>> +   mutex_init(&gt->ccs.mutex);
> >>> +   gt->ccs.mode = 1;
> >>
> >> What is '1'? And this question carries over to the sysfs interface in the
> >> following patch - who will use it and where it is documented how to use it?
> > 
> > The value '1' is explained in the comment above[1] and in the
> 
> Do you mean this is mode '1':
> 
>   * With 1 engine (ccs0):
>   *   slice 0, 1, 2, 3: ccs0
> 
> ?
> 
> But I don't see where it says what the different modes mean on different
> SKU configurations.
> 
> It also does not say what the num_slices sysfs file should be used for.
> 
> Does "mode N" mean "assign each command streamer N compute slices"? Or 
> "assign each compute slice N command streamers"?
> 
> I wonder if we should add something user friendly into 
> Documentation/ABI/*/sysfs-... Joonas your thoughts?

We definitely should always properly document all sysfs additions; it
just seems we less frequently remember to do so. So yeah, this should be
documented just like other uAPI.

I also like the idea of not exposing the file at all if the value
can't be modified.

The ccs_mode is just supposed to allow the user to select how many CCS
engines they want to expose, and always make an even split of slices
between them; nothing more, nothing less.

Regards, Joonas


Re: [PATCH v2 1/3] drm/i915/gt: Support fixed CCS mode

2024-01-05 Thread Tvrtko Ursulin



On 04/01/2024 21:23, Andi Shyti wrote:

Hi Tvrtko,

[1]


+   /*
+* Loop over all available slices and assign each a user engine.
+*
+* With 1 engine (ccs0):
+*   slice 0, 1, 2, 3: ccs0
+*
+* With 2 engines (ccs0, ccs1):
+*   slice 0, 2: ccs0
+*   slice 1, 3: ccs1
+*
+* With 4 engines (ccs0, ccs1, ccs2, ccs3):
+*   slice 0: ccs0
+*   slice 1: ccs1
+*   slice 2: ccs2
+*   slice 3: ccs3
+*
+* Since the number of slices and the number of engines is
+* known, and we ensure that there is an exact multiple of
+* engines for slices, the double loop becomes a loop over each
+* slice.
+*/
+   for (i = num_slices / num_engines; i < num_slices; i++) {
+   struct intel_engine_cs *engine;
+   intel_engine_mask_t tmp;
+
+   for_each_engine_masked(engine, gt, ALL_CCS(gt), tmp) {
+   /* If a slice is fused off, leave disabled */
+   while (!(CCS_MASK(gt) & BIT(slice)))
+   slice++;
+
+   mode &= ~XEHP_CCS_MODE_CSLICE(slice, XEHP_CCS_MODE_CSLICE_MASK);
+   mode |= XEHP_CCS_MODE_CSLICE(slice, engine->instance);
+
+   /* assign the next slice */
+   slice++;
+   }
+   }
+
+   intel_uncore_write(gt->uncore, XEHP_CCS_MODE, mode);
+}
+
+void intel_gt_apply_ccs_mode(struct intel_gt *gt)
+{
+   mutex_lock(&gt->ccs.mutex);
+   __intel_gt_apply_ccs_mode(gt);
+   mutex_unlock(&gt->ccs.mutex);
+}
+
+void intel_gt_init_ccs_mode(struct intel_gt *gt)
+{
+   mutex_init(&gt->ccs.mutex);
+   gt->ccs.mode = 1;


What is '1'? And this question carries over to the sysfs interface in the
following patch - who will use it and where it is documented how to use it?


The value '1' is explained in the comment above[1] and in the


Do you mean this is mode '1':

 * With 1 engine (ccs0):
 *   slice 0, 1, 2, 3: ccs0

?

But I don't see where it says what the different modes mean on different
SKU configurations.


It also does not say what the num_slices sysfs file should be used for.

Does "mode N" mean "assign each command streamer N compute slices"? Or 
"assign each compute slice N command streamers"?


I wonder if we should add something user friendly into 
Documentation/ABI/*/sysfs-... Joonas your thoughts?



comment below[2]. Maybe we should give it an enum meaning? But
that would be something like CCS_MODE_1/2/4; I think
ccs.mode = 1/2/4 is more understandable.


Also, should this setting somehow be gated by an applicable platform? Or if
not on setting then when acting on it in __intel_gt_apply_ccs_mode?

Creation of sysfs files as well should be gated by platform too in the
following patch?


The idea of this series is to disable the CCS load balancing
(which automatically chooses between mode 1/2/4) and use a
fixed scheme chosen by the user.

(I'm preparing v3 as Chris was so kind to recommend some changes
offline)


Okay, let's wait for v3 and I will then see if that makes it
clearer to casual observers.


Regards,

Tvrtko



Thanks,
Andi

[2]


+   /*
+* Track fixed mapping between CCS engines and compute slices.
+*
+* In order to w/a HW that has the inability to dynamically load
+* balance between CCS engines and EU in the compute slices, we have to
+* reconfigure a static mapping on the fly. We track the current CCS
+* configuration (set by the user through a sysfs interface) and compare
+* it against the current CCS_MODE (which maps CCS engines to compute
+* slices). If there is only a single engine selected, we can map it to
+* all available compute slices for maximal single task performance
+* (fast/narrow). If there is more than one engine selected, we have to
+* reduce the number of slices allocated to each engine (wide/slow),
+* fairly distributing the EU between the equivalent engines.
+*/
+   struct {
+   struct mutex mutex;
+   u32 mode;
+   } ccs;


Re: [PATCH v2 1/3] drm/i915/gt: Support fixed CCS mode

2024-01-04 Thread Andi Shyti
Hi Tvrtko,

[1]

> > +   /*
> > +* Loop over all available slices and assign each a user engine.
> > +*
> > +* With 1 engine (ccs0):
> > +*   slice 0, 1, 2, 3: ccs0
> > +*
> > +* With 2 engines (ccs0, ccs1):
> > +*   slice 0, 2: ccs0
> > +*   slice 1, 3: ccs1
> > +*
> > +* With 4 engines (ccs0, ccs1, ccs2, ccs3):
> > +*   slice 0: ccs0
> > +*   slice 1: ccs1
> > +*   slice 2: ccs2
> > +*   slice 3: ccs3
> > +*
> > +* Since the number of slices and the number of engines is
> > +* known, and we ensure that there is an exact multiple of
> > +* engines for slices, the double loop becomes a loop over each
> > +* slice.
> > +*/
> > +   for (i = num_slices / num_engines; i < num_slices; i++) {
> > +   struct intel_engine_cs *engine;
> > +   intel_engine_mask_t tmp;
> > +
> > +   for_each_engine_masked(engine, gt, ALL_CCS(gt), tmp) {
> > +   /* If a slice is fused off, leave disabled */
> > +   while (!(CCS_MASK(gt) & BIT(slice)))
> > +   slice++;
> > +
> > +   mode &= ~XEHP_CCS_MODE_CSLICE(slice, XEHP_CCS_MODE_CSLICE_MASK);
> > +   mode |= XEHP_CCS_MODE_CSLICE(slice, engine->instance);
> > +
> > +   /* assign the next slice */
> > +   slice++;
> > +   }
> > +   }
> > +
> > +   intel_uncore_write(gt->uncore, XEHP_CCS_MODE, mode);
> > +}
> > +
> > +void intel_gt_apply_ccs_mode(struct intel_gt *gt)
> > +{
> > +   mutex_lock(&gt->ccs.mutex);
> > +   __intel_gt_apply_ccs_mode(gt);
> > +   mutex_unlock(&gt->ccs.mutex);
> > +}
> > +
> > +void intel_gt_init_ccs_mode(struct intel_gt *gt)
> > +{
> > +   mutex_init(&gt->ccs.mutex);
> > +   gt->ccs.mode = 1;
> 
> What is '1'? And this question carries over to the sysfs interface in the
> following patch - who will use it and where it is documented how to use it?

The value '1' is explained in the comment above[1] and in the
comment below[2]. Maybe we should give it an enum meaning? But
that would be something like CCS_MODE_1/2/4; I think
ccs.mode = 1/2/4 is more understandable.

> Also, should this setting somehow be gated by an applicable platform? Or if
> not on setting then when acting on it in __intel_gt_apply_ccs_mode?
> 
> Creation of sysfs files as well should be gated by platform too in the
> following patch?

The idea of this series is to disable the CCS load balancing
(which automatically chooses between mode 1/2/4) and use
a fixed scheme chosen by the user.

(I'm preparing v3 as Chris was so kind to recommend some changes
offline)

Thanks,
Andi

[2]

> > +   /*
> > +* Track fixed mapping between CCS engines and compute slices.
> > +*
> > +* In order to w/a HW that has the inability to dynamically load
> > +* balance between CCS engines and EU in the compute slices, we have to
> > +* reconfigure a static mapping on the fly. We track the current CCS
> > +* configuration (set by the user through a sysfs interface) and compare
> > +* it against the current CCS_MODE (which maps CCS engines to compute
> > +* slices). If there is only a single engine selected, we can map it to
> > +* all available compute slices for maximal single task performance
> > +* (fast/narrow). If there is more than one engine selected, we have to
> > +* reduce the number of slices allocated to each engine (wide/slow),
> > +* fairly distributing the EU between the equivalent engines.
> > +*/
> > +   struct {
> > +   struct mutex mutex;
> > +   u32 mode;
> > +   } ccs;


Re: [PATCH v2 1/3] drm/i915/gt: Support fixed CCS mode

2024-01-04 Thread Tvrtko Ursulin



On 04/01/2024 14:35, Andi Shyti wrote:

The CCS mode involves assigning CCS engines to slices depending
on the number of slices and the number of engines the user wishes
to set.

In this patch, the default CCS setting is established during the
initial GT settings. It involves assigning only one CCS to all
the slices.

Based on a patch by Chris Wilson 
and Tejas Upadhyay .

Signed-off-by: Andi Shyti 
Cc: Chris Wilson 
Cc: Joonas Lahtinen 
Cc: Niranjana Vishwanathapura 
Cc: Tejas Upadhyay 
---
  drivers/gpu/drm/i915/Makefile   |  1 +
  drivers/gpu/drm/i915/gt/intel_gt.c  |  6 ++
  drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 81 +
  drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h | 16 
  drivers/gpu/drm/i915/gt/intel_gt_regs.h | 13 
  drivers/gpu/drm/i915/gt/intel_gt_types.h| 19 +
  drivers/gpu/drm/i915/i915_drv.h |  2 +
  7 files changed, 138 insertions(+)
  create mode 100644 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
  create mode 100644 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index e777686190ca..1dce15d6306b 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -119,6 +119,7 @@ gt-y += \
gt/intel_ggtt_fencing.o \
gt/intel_gt.o \
gt/intel_gt_buffer_pool.o \
+   gt/intel_gt_ccs_mode.o \
gt/intel_gt_clock_utils.o \
gt/intel_gt_debugfs.o \
gt/intel_gt_engines_debugfs.o \
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index a425db5ed3a2..e83c7b80c07a 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -17,6 +17,7 @@
  #include "intel_engine_regs.h"
  #include "intel_ggtt_gmch.h"
  #include "intel_gt.h"
+#include "intel_gt_ccs_mode.h"
  #include "intel_gt_buffer_pool.h"
  #include "intel_gt_clock_utils.h"
  #include "intel_gt_debugfs.h"
@@ -47,6 +48,7 @@ void intel_gt_common_init_early(struct intel_gt *gt)
	init_llist_head(&gt->watchdog.list);
	INIT_WORK(&gt->watchdog.work, intel_gt_watchdog_work);
  
+	intel_gt_init_ccs_mode(gt);
	intel_gt_init_buffer_pool(gt);
intel_gt_init_reset(gt);
intel_gt_init_requests(gt);
@@ -195,6 +197,9 @@ int intel_gt_init_hw(struct intel_gt *gt)
  
  	intel_gt_init_swizzling(gt);
  
+	/* Configure CCS mode */
+	intel_gt_apply_ccs_mode(gt);
+
/*
 * At least 830 can leave some of the unused rings
 * "active" (ie. head != tail) after resume which
@@ -860,6 +865,7 @@ void intel_gt_driver_late_release_all(struct drm_i915_private *i915)
  
	for_each_gt(gt, i915, id) {
		intel_uc_driver_late_release(&gt->uc);
+   intel_gt_fini_ccs_mode(gt);
intel_gt_fini_requests(gt);
intel_gt_fini_reset(gt);
intel_gt_fini_timelines(gt);
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
new file mode 100644
index ..fab8a77bded2
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
@@ -0,0 +1,81 @@
+//SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2023 Intel Corporation
+ */
+
+#include "i915_drv.h"
+
+#include "intel_gt.h"
+#include "intel_gt_ccs_mode.h"
+#include "intel_gt_regs.h"
+#include "intel_gt_types.h"
+
+static void __intel_gt_apply_ccs_mode(struct intel_gt *gt)
+{
+   u32 mode = XEHP_CCS_MODE_CSLICE_0_3_MASK; /* disable all by default */
+   int num_slices = hweight32(CCS_MASK(gt));
+   int num_engines = gt->ccs.mode;
+   int slice = 0;
+   int i;
+
+   if (!num_engines)
+   return;
+
+   /*
+* Loop over all available slices and assign each a user engine.
+*
+* With 1 engine (ccs0):
+*   slice 0, 1, 2, 3: ccs0
+*
+* With 2 engines (ccs0, ccs1):
+*   slice 0, 2: ccs0
+*   slice 1, 3: ccs1
+*
+* With 4 engines (ccs0, ccs1, ccs2, ccs3):
+*   slice 0: ccs0
+*   slice 1: ccs1
+*   slice 2: ccs2
+*   slice 3: ccs3
+*
+* Since the number of slices and the number of engines is
+* known, and we ensure that there is an exact multiple of
+* engines for slices, the double loop becomes a loop over each
+* slice.
+*/
+   for (i = num_slices / num_engines; i < num_slices; i++) {
+   struct intel_engine_cs *engine;
+   intel_engine_mask_t tmp;
+
+   for_each_engine_masked(engine, gt, ALL_CCS(gt), tmp) {
+   /* If a slice is fused off, leave disabled */
+   while (!(CCS_MASK(gt) & BIT(slice)))
+   slice++;
+
+   mode &= ~XEHP_CCS_MODE_CSLICE(slice, XEHP_CCS_MODE_CSLICE_MASK);
+   mode |= XEHP_CCS_MODE_CSLICE(slice, engine->instance);
+
+  

[PATCH v2 1/3] drm/i915/gt: Support fixed CCS mode

2024-01-04 Thread Andi Shyti
The CCS mode involves assigning CCS engines to slices depending
on the number of slices and the number of engines the user wishes
to set.

In this patch, the default CCS setting is established during the
initial GT settings. It involves assigning only one CCS to all
the slices.

Based on a patch by Chris Wilson 
and Tejas Upadhyay .

Signed-off-by: Andi Shyti 
Cc: Chris Wilson 
Cc: Joonas Lahtinen 
Cc: Niranjana Vishwanathapura 
Cc: Tejas Upadhyay 
---
 drivers/gpu/drm/i915/Makefile   |  1 +
 drivers/gpu/drm/i915/gt/intel_gt.c  |  6 ++
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 81 +
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h | 16 
 drivers/gpu/drm/i915/gt/intel_gt_regs.h | 13 
 drivers/gpu/drm/i915/gt/intel_gt_types.h| 19 +
 drivers/gpu/drm/i915/i915_drv.h |  2 +
 7 files changed, 138 insertions(+)
 create mode 100644 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
 create mode 100644 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index e777686190ca..1dce15d6306b 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -119,6 +119,7 @@ gt-y += \
gt/intel_ggtt_fencing.o \
gt/intel_gt.o \
gt/intel_gt_buffer_pool.o \
+   gt/intel_gt_ccs_mode.o \
gt/intel_gt_clock_utils.o \
gt/intel_gt_debugfs.o \
gt/intel_gt_engines_debugfs.o \
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index a425db5ed3a2..e83c7b80c07a 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -17,6 +17,7 @@
 #include "intel_engine_regs.h"
 #include "intel_ggtt_gmch.h"
 #include "intel_gt.h"
+#include "intel_gt_ccs_mode.h"
 #include "intel_gt_buffer_pool.h"
 #include "intel_gt_clock_utils.h"
 #include "intel_gt_debugfs.h"
@@ -47,6 +48,7 @@ void intel_gt_common_init_early(struct intel_gt *gt)
	init_llist_head(&gt->watchdog.list);
	INIT_WORK(&gt->watchdog.work, intel_gt_watchdog_work);
 
+   intel_gt_init_ccs_mode(gt);
intel_gt_init_buffer_pool(gt);
intel_gt_init_reset(gt);
intel_gt_init_requests(gt);
@@ -195,6 +197,9 @@ int intel_gt_init_hw(struct intel_gt *gt)
 
intel_gt_init_swizzling(gt);
 
+   /* Configure CCS mode */
+   intel_gt_apply_ccs_mode(gt);
+
/*
 * At least 830 can leave some of the unused rings
 * "active" (ie. head != tail) after resume which
@@ -860,6 +865,7 @@ void intel_gt_driver_late_release_all(struct drm_i915_private *i915)
 
for_each_gt(gt, i915, id) {
		intel_uc_driver_late_release(&gt->uc);
+   intel_gt_fini_ccs_mode(gt);
intel_gt_fini_requests(gt);
intel_gt_fini_reset(gt);
intel_gt_fini_timelines(gt);
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
new file mode 100644
index ..fab8a77bded2
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
@@ -0,0 +1,81 @@
+//SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2023 Intel Corporation
+ */
+
+#include "i915_drv.h"
+
+#include "intel_gt.h"
+#include "intel_gt_ccs_mode.h"
+#include "intel_gt_regs.h"
+#include "intel_gt_types.h"
+
+static void __intel_gt_apply_ccs_mode(struct intel_gt *gt)
+{
+   u32 mode = XEHP_CCS_MODE_CSLICE_0_3_MASK; /* disable all by default */
+   int num_slices = hweight32(CCS_MASK(gt));
+   int num_engines = gt->ccs.mode;
+   int slice = 0;
+   int i;
+
+   if (!num_engines)
+   return;
+
+   /*
+* Loop over all available slices and assign each a user engine.
+*
+* With 1 engine (ccs0):
+*   slice 0, 1, 2, 3: ccs0
+*
+* With 2 engines (ccs0, ccs1):
+*   slice 0, 2: ccs0
+*   slice 1, 3: ccs1
+*
+* With 4 engines (ccs0, ccs1, ccs2, ccs3):
+*   slice 0: ccs0
+*   slice 1: ccs1
+*   slice 2: ccs2
+*   slice 3: ccs3
+*
+* Since the number of slices and the number of engines is
+* known, and we ensure that there is an exact multiple of
+* engines for slices, the double loop becomes a loop over each
+* slice.
+*/
+   for (i = num_slices / num_engines; i < num_slices; i++) {
+   struct intel_engine_cs *engine;
+   intel_engine_mask_t tmp;
+
+   for_each_engine_masked(engine, gt, ALL_CCS(gt), tmp) {
+   /* If a slice is fused off, leave disabled */
+   while (!(CCS_MASK(gt) & BIT(slice)))
+   slice++;
+
+   mode &= ~XEHP_CCS_MODE_CSLICE(slice, XEHP_CCS_MODE_CSLICE_MASK);
+   mode |= XEHP_CCS_MODE_CSLICE(slice, engine->instance);
+
+   /* assign the next slice */
+