Re: [PATCH] drm/i915: Rename functions in the docs to match code changes

2024-10-02 Thread Andi Shyti

Hi Harshit,

On Mon, Sep 30, 2024 at 11:25:54PM -0700, Harshit Mogalapalli wrote:
> make htmldocs is reporting:
> 
> drivers/gpu/drm/i915/i915_irq.c:1: warning: 'intel_runtime_pm_disable_interrupts' not found
> drivers/gpu/drm/i915/i915_irq.c:1: warning: 'intel_runtime_pm_enable_interrupts' not found
> 
> intel_runtime_pm_disable_interrupts() is renamed to intel_irq_suspend(),
> make documentation changes accordingly.
> 
> Fixes: 3de5774cb8c0 ("drm/i915/irq: Rename suspend/resume functions")
> Reported-by: Stephen Rothwell 
> Closes: https://lore.kernel.org/all/20241001134331.7b4d4...@canb.auug.org.au/
> Signed-off-by: Harshit Mogalapalli 

Thanks for your patch. The functions were indeed renamed here(*)
by Rodrigo.

I'm going to remove the "Fixes:" tag as I don't think
documentation fixes are part of it. Unless someone wants it
strongly.

Without the Fixes tag:

Reviewed-by: Andi Shyti 

Andi

(*) 3de5774cb8c0 ("drm/i915/irq: Rename suspend/resume functions")

> ---
> Noticed that Stephen also reported this so added a Closes URL.
> ---
>  Documentation/gpu/i915.rst | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst
> index ad59ae579237..7a469df675d8 100644
> --- a/Documentation/gpu/i915.rst
> +++ b/Documentation/gpu/i915.rst
> @@ -35,10 +35,10 @@ Interrupt Handling
>     :functions: intel_irq_init intel_irq_init_hw intel_hpd_init
>  
>  .. kernel-doc:: drivers/gpu/drm/i915/i915_irq.c
> -   :functions: intel_runtime_pm_disable_interrupts
> +   :functions: intel_irq_suspend
>  
>  .. kernel-doc:: drivers/gpu/drm/i915/i915_irq.c
> -   :functions: intel_runtime_pm_enable_interrupts
> +   :functions: intel_irq_resume
>  
>  Intel GVT-g Guest Support(vGPU)
>  ---
> -- 
> 2.46.0

Re: [PATCH v1 1/1] drm/i915/gt: Use IS_ENABLED() instead of defined() on config check

2024-09-27 Thread Andi Shyti

Hi Nitin,

On Fri, Sep 20, 2024 at 04:15:41PM +0530, Nitin Gote wrote:
> Always prefer to use IS_ENABLED() instead of defined() for
> checking whether a kconfig option is enabled or not.
> 
> Signed-off-by: Nitin Gote 

Reviewed-by: Andi Shyti 

Thanks,
Andi
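
The preference stated in the commit message can be illustrated with a
small user-space sketch. Unlike an "#if defined()" check, an
IS_ENABLED()-style macro expands to a plain 0 or 1, so it can sit in an
ordinary if () and both branches still get compiled and type-checked.
The DEMO_* macros below are a simplified imitation of the idea written
for this note and only handle options defined to 1; they are not the
kernel's own macros, and CONFIG_DEMO_FOO and CONFIG_DEMO_BAR are
hypothetical options.

```c
/*
 * Simplified user-space imitation of an IS_ENABLED()-style check.
 * Assumption: an enabled option is defined to 1, a disabled one is
 * not defined at all. Module ("=m") options are not handled here.
 */
#define DEMO_ARG_PLACEHOLDER_1 0,
#define demo_take_second_arg(ignored, val, ...) val
/* If arg1_or_junk expanded to "0," the second argument becomes 1. */
#define demo_is_defined_3(arg1_or_junk) \
	demo_take_second_arg(arg1_or_junk 1, 0, 0)
#define demo_is_defined_2(val) \
	demo_is_defined_3(DEMO_ARG_PLACEHOLDER_##val)
#define DEMO_IS_ENABLED(option) demo_is_defined_2(option)

#define CONFIG_DEMO_FOO 1	/* hypothetical option set to "y" */
/* CONFIG_DEMO_BAR is intentionally left undefined */

/* Both functions compile whether or not the option is set. */
int demo_foo_enabled(void)
{
	if (DEMO_IS_ENABLED(CONFIG_DEMO_FOO))
		return 1;
	return 0;
}

int demo_bar_enabled(void)
{
	if (DEMO_IS_ENABLED(CONFIG_DEMO_BAR))
		return 1;
	return 0;
}
```

With these definitions demo_foo_enabled() returns 1 and
demo_bar_enabled() returns 0, and the compiler is free to drop the dead
branch in each case.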

Re: [PATCH 1/5] drm/i915/gem: fix bitwise and logical AND mixup

2024-09-18 Thread Andi Shyti

Hi Jani,

On Wed, Sep 18, 2024 at 02:17:44PM +0300, Jani Nikula wrote:
> CONFIG_DRM_I915_USERFAULT_AUTOSUSPEND is an int, defaulting to 250. When
> the wakeref is non-zero, it's either -1 or a dynamically allocated
> pointer, depending on CONFIG_DRM_I915_DEBUG_RUNTIME_PM. It's likely that
> the code works by coincidence with the bitwise AND, but with
> CONFIG_DRM_I915_DEBUG_RUNTIME_PM=y, there's the off chance that the
> condition evaluates to false, and intel_wakeref_auto() doesn't get
> called. Switch to the intended logical AND.
> 
> Fixes: ad74457a6b5a ("drm/i915/dgfx: Release mmap on rpm suspend")
> Cc: Matthew Auld 
> Cc: Rodrigo Vivi 
> Cc: Anshuman Gupta 
> Cc: Andi Shyti 
> Cc:  # v6.1+
> Signed-off-by: Jani Nikula 
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> index 5c72462d1f57..c157ade48c39 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> @@ -1131,7 +1131,7 @@ static vm_fault_t vm_fault_ttm(struct vm_fault *vmf)
>   GEM_WARN_ON(!i915_ttm_cpu_maps_iomem(bo->resource));
>   }
>  
> - if (wakeref & CONFIG_DRM_I915_USERFAULT_AUTOSUSPEND)
> + if (wakeref && CONFIG_DRM_I915_USERFAULT_AUTOSUSPEND)

Oops!

Reviewed-by: Andi Shyti 

Andi

>   intel_wakeref_auto(&to_i915(obj->base.dev)->runtime_pm.userfault_wakeref,
>      msecs_to_jiffies_timeout(CONFIG_DRM_I915_USERFAULT_AUTOSUSPEND));
>  
> -- 
> 2.39.2
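
The failure mode described in the commit message can be reproduced in
user space. The sketch below is only an illustration: it models the
wakeref as a uintptr_t (an assumption made for this note, not the
driver's type), uses the default of 250 quoted above, and the value
0x1000 stands in for a hypothetical aligned pointer.

```c
#include <stdint.h>

/* Mirrors the default of 250 mentioned in the commit message. */
#define DEMO_AUTOSUSPEND_MS 250

/* Old form: bitwise AND of the wakeref value with the constant. */
int demo_check_bitwise(uintptr_t wakeref)
{
	return (wakeref & DEMO_AUTOSUSPEND_MS) != 0;
}

/* Fixed form: true whenever both operands are non-zero. */
int demo_check_logical(uintptr_t wakeref)
{
	return wakeref && DEMO_AUTOSUSPEND_MS;
}
```

For a wakeref of -1 both forms agree. For 0x1000, whose low byte has no
bits in common with 250 (0xfa), the bitwise form evaluates to false even
though the wakeref is non-zero, while the logical form stays true.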

Re: [PATCH] drm/i915: Fix typos

2024-09-16 Thread Andi Shyti

Hi Andrew,

On Sun, Sep 15, 2024 at 03:01:55PM GMT, Andrew Kreimer wrote:
> Fix typos in documentation.
> 
> Reported-by: Matthew Wilcox 
> Signed-off-by: Andrew Kreimer 

Reviewed-by: Andi Shyti 

Because we are receiving lots of typo patches in this period,
it's nice to have the context written in the subject, e.g., in
this case, Fix "bellow" -> "below" typo.

Don't worry, I will take care of it.

Thanks,
Andi

Re: [PATCH v1] drm/i915: Fix typo in the comment

2024-09-16 Thread Andi Shyti

Hi Yan,

On Sat, Sep 14, 2024 at 02:41:41PM GMT, Yan Zhen wrote:
> Correctly spelled comments make it easier for the reader to understand
> the code.
> 
> Replace 'ojects' with 'objects' in the comment &
> replace 'resonable' with 'reasonable' in the comment &
> replace 'droping' with 'dropping' in the comment &
> replace 'sacrifical' with 'sacrificial' in the comment.
> 
> Signed-off-by: Yan Zhen 

Reviewed-by: Andi Shyti 

Thanks,
Andi

Re: [PATCH v1] drm/i915/display: fix typo in the comment

2024-09-16 Thread Andi Shyti

Hi Yan,

On Fri, Sep 13, 2024 at 02:17:27PM GMT, Yan Zhen wrote:
> Correctly spelled comments make it easier for the reader to understand
> the code.
> 
> Replace 'platformas' with 'platforms' in the comment &
> replace 'prefere' with 'prefer' in the comment &
> replace 'corresponsding' with 'corresponding' in the comment &
> replace 'harizontal' with 'horizontal' in the comment.
> 
> Signed-off-by: Yan Zhen 

reviewed and merged to drm-intel-next.

Thanks,
Andi

Re: [PATCH v1] drm/i915/gvt: Correct multiple typos in comments

2024-09-16 Thread Andi Shyti

Hi Shen,

On Fri, Sep 13, 2024 at 10:16:12AM GMT, Shen Lichuan wrote:
> Fixed some spelling errors, the details are as follows:
> 
> -in the code comments:
>   addess->address
>   trasitions->transitions
>   furture->future
>   unsubmited->unsubmitted
> 
> Signed-off-by: Shen Lichuan 

reviewed and merged to drm-intel-next.

Thanks,
Andi

Re: [PATCH v1] drm/i915/dp: Remove double assignment in intel_dp_compute_as_sdp()

2024-09-16 Thread Andi Shyti

Hi Yuesong,

On Fri, Aug 23, 2024 at 10:36:12AM GMT, Yuesong Li wrote:
> cocci report a double assignment warning. 'as_sdp->duration_incr_ms'
> was assigned twice in intel_dp_compute_as_sdp().
> 
> Signed-off-by: Yuesong Li 

reviewed and merged to drm-intel-next.

Thanks,
Andi

Re: [PATCH 2/2] drm/i915/gt: Fixed an typo

2024-09-16 Thread Andi Shyti

Hi Zhang He,

I merged your previous patch so that you don't need to resend it
anymore.

In my reply to the previous version I already asked you not to
resend it. Please read the comments you receive carefully.

I repeat: add the versioning. When you do:

   git format-patch ...

you get:

   [PATCH 1/1] drm/

if you do

   git format-patch -v 2 ...

you get:

   [PATCH v2 1/1] drm/

This is what I asked you to do.

The 1/1 or 2/2 is the patch counter for multi-patch series, not
the version. The version is given by "-v 2" in the git
format-patch command.

Please read the SubmittingPatches document, it's essential for
sending patches.

On Sat, Sep 14, 2024 at 09:31:46AM GMT, Zhang He wrote:
> column header should be GPU, not CPU
> 
> ---
> ChangeLog:
> v1: use correct name as Author and Signer
> v2: change one line in drivers/gpu/drm/i915/gt/intel_gt_pm_debugfs.c,
> LLC's information header from "Effective CPU freq" to "Effective GPU 
> freq"

Good that the changelog is here. The v2, though, is not a
changelog, but a description.

> Signed-off-by: Zhang He 

The signature goes above the "---" section, otherwise it doesn't
show up when I apply the patch.

Thanks for your patch,
Andi

Re: [PATCH 1/2] drm/i915/debugfs: remove superfluous kernel_param_lock/unlock

2024-09-13 Thread Andi Shyti

Hi Jani,

On Fri, Sep 13, 2024 at 03:51:54PM GMT, Jani Nikula wrote:
> We're not actually accessing the module params here anymore. The locking
> is completely unnecessary.
> 
> Signed-off-by: Jani Nikula 

Reviewed-by: Andi Shyti 

Thanks,
Andi

Re: [PATCH] drm/i915/gt: Fixed an typo

2024-09-13 Thread Andi Shyti

Hi Zhang,

On Fri, Sep 13, 2024 at 10:07:21PM GMT, Zhang He wrote:
> column header should be GPU, not CPU
> 
> Signed-off-by: Zhang He 

Thanks for having fixed the issues I pointed out. That said, for
the next patches:

1. Add versioning. This is version number 2, so you need
   to do "git format-patch -v 2"
2. Add the changelog: you need to list the differences between
   the two versions, so that people are aware of what changes to
   look for. You can do it after the "---" section in this patch.
   For this patch the difference would be the use of your correct
   name as Author and as Signer.
3. Add the tags that you collected in the previous version of the
   patch. I did review your change, so you should have added
   my:

Reviewed-by: Andi Shyti 

For now it's OK, your patch is accepted, I will merge it and then
I will notify you.

Thanks for having sent your change and for following up on the
review,
Andi

> ---
>  drivers/gpu/drm/i915/gt/intel_gt_pm_debugfs.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm_debugfs.c b/drivers/gpu/drm/i915/gt/intel_gt_pm_debugfs.c
> index 8d08b38874ef..b635aa2820d9 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_pm_debugfs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_pm_debugfs.c
> @@ -431,7 +431,7 @@ static int llc_show(struct seq_file *m, void *data)
>   max_gpu_freq /= GEN9_FREQ_SCALER;
>   }
>  
> - seq_puts(m, "GPU freq (MHz)\tEffective CPU freq (MHz)\tEffective Ring freq (MHz)\n");
> + seq_puts(m, "GPU freq (MHz)\tEffective GPU freq (MHz)\tEffective Ring freq (MHz)\n");
>  
>   wakeref = intel_runtime_pm_get(gt->uncore->rpm);
>   for (gpu_freq = min_gpu_freq; gpu_freq <= max_gpu_freq; gpu_freq++) {
> -- 
> 2.34.1
>

Re: [PATCH v3] drm/i915/hwmon: expose package temperature

2024-09-13 Thread Andi Shyti

Hi Raag,

On Fri, Sep 13, 2024 at 10:57:00AM GMT, Andi Shyti wrote:
> On Fri, Sep 13, 2024 at 11:51:22AM GMT, Raag Jadav wrote:
> > On Fri, Sep 13, 2024 at 11:14:22AM +0530, Riana Tauro wrote:
> > > On 9/10/2024 4:22 PM, Raag Jadav wrote:
> > > > Add hwmon support for temp1_input attribute, which will expose package
> > > > temperature in millidegree Celsius. With this in place we can monitor
> > > > package temperature using lm-sensors tool.
> > > > 
> > > > $ sensors
> > > > i915-pci-0300
> > > > Adapter: PCI adapter
> > > > in0: 990.00 mV
> > > > fan1:1260 RPM
> > > > temp1:+45.0°C
> > > > power1:   N/A  (max =  35.00 W)
> > > > energy1:  12.62 kJ
> > > > 
> > > > v2: Use switch case (Anshuman)
> > > > v3: Comment adjustment (Riana)
> > > > 
> > > > Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/11276
> > > > Signed-off-by: Raag Jadav 
> > > > Reviewed-by: Anshuman Gupta 
> > > > Reviewed-by: Andi Shyti 
> > > Looks good to me
> > > Reviewed-by: Riana Tauro 
> > 
> > Thank you :)
> > 
> > Andi, can you pick this one up? Anshuman's machine is down.
> 
> Sure!

merged to drm-intel-next.

Thanks,
Andi

Re: [PATCH 3/3] drm/i915/irq: Rename suspend/resume funcitons

2024-09-13 Thread Andi Shyti

Hi Rodrigo,

On Thu, Sep 12, 2024 at 01:25:39PM GMT, Rodrigo Vivi wrote:
> Although these functions are used in runtime_pm, they are not
> exclusively used there, so remove the misleading prefix.
> 
> Signed-off-by: Rodrigo Vivi 

Reviewed-by: Andi Shyti 

A general note, please add a cover letter even if the series
looks trivial. It's important to have an overview of what the
series does.

Thanks,
Andi

Re: [PATCH 2/3] drm/i915/irq: Move irqs_enabled out of runtime_pm

2024-09-13 Thread Andi Shyti

Hi Rodrigo,

On Thu, Sep 12, 2024 at 01:25:38PM GMT, Rodrigo Vivi wrote:
> This information is used in many places and it doesn't have
> anything to do with runtime_pm directly. Let's move it to
> the driver, where it belongs.
> 
> Signed-off-by: Rodrigo Vivi 

Reviewed-by: Andi Shyti 

Thanks,
Andi

Re: [PATCH 1/3] drm/i915/irq: Remove duplicated irq_enabled variable

2024-09-13 Thread Andi Shyti

Hi Rodrigo,

On Thu, Sep 12, 2024 at 01:25:37PM GMT, Rodrigo Vivi wrote:
> Let's kill this legacy and almost unused irq_enabled version
> in favor of the real one that is checked at
> intel_irqs_enabled().
> 
> The commit 'ac1723c16b66 ("drm/i915: Track IRQ state
> in local device state")' shows that this was a legacy
> DRM level irq_enabled information that got removed.
> 
> But the driver one already existed under a different

But the driver has already one under a different name (and
perhaps specify which one and which location :-))

I don't think you need to send a v2 for commit log changes.

Reviewed-by: Andi Shyti 

Andi

Re: [PATCH v3] drm/i915/hwmon: expose package temperature

2024-09-13 Thread Andi Shyti

Hi Raag,

On Fri, Sep 13, 2024 at 11:51:22AM GMT, Raag Jadav wrote:
> On Fri, Sep 13, 2024 at 11:14:22AM +0530, Riana Tauro wrote:
> > On 9/10/2024 4:22 PM, Raag Jadav wrote:
> > > Add hwmon support for temp1_input attribute, which will expose package
> > > temperature in millidegree Celsius. With this in place we can monitor
> > > package temperature using lm-sensors tool.
> > > 
> > > $ sensors
> > > i915-pci-0300
> > > Adapter: PCI adapter
> > > in0: 990.00 mV
> > > fan1:1260 RPM
> > > temp1:+45.0°C
> > > power1:   N/A  (max =  35.00 W)
> > > energy1:  12.62 kJ
> > > 
> > > v2: Use switch case (Anshuman)
> > > v3: Comment adjustment (Riana)
> > > 
> > > Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/11276
> > > Signed-off-by: Raag Jadav 
> > > Reviewed-by: Anshuman Gupta 
> > > Reviewed-by: Andi Shyti 
> > Looks good to me
> > Reviewed-by: Riana Tauro 
> 
> Thank you :)
> 
> Andi, can you pick this one up? Anshuman's machine is down.

Sure!

Andi

Re: [PATCH] drm/i915/gt: Fixed an typo

2024-09-10 Thread Andi Shyti

Hi,

On Tue, Sep 10, 2024 at 10:10:04PM GMT, 张河 wrote:
> :), i think you mean should use CPU column head? because the reg value just
> reflect CPU related information

Before getting into it, please keep in mind:

1. Do not top post.
2. Do not send HTML e-mails.
3. Read the reviews from reviewers carefully.

That said, I asked you:

1. Include the proper mailing lists when you send patches (use
get_maintainer.pl)
2. Do you really want to use "zhanghe9702" as a name rather than
your real "Name Surname" as everyone does?

Andi

> At 2024-09-10 17:24:32, "Andi Shyti"  wrote:
> >Hi Zhanghe,
> >
> >Thanks for your patch. Please next time check from
> >get_maintainer.pl the mailing lists that need to be included in
> >your patches.
> >
> >In this case you should have included at least the
> >intel-gfx and the dri-devel mailing lists.
> >
> >On Sat, Sep 07, 2024 at 05:24:43PM +0800, zhanghe9702 wrote:
> >> column header should be GPU, not CPU
> >>
> >> Signed-off-by: zhanghe9702 
> >
> >Do you really want your name to appear as zhanghe9702? If you git
> >log the linux directory you will see that people normally use
> >the "Name Surname" style. As you wish.
> >
> >> ---
> >>  drivers/gpu/drm/i915/gt/intel_gt_pm_debugfs.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm_debugfs.c b/drivers/gpu/drm/i915/gt/intel_gt_pm_debugfs.c
> >> index 8d08b38874ef..b635aa2820d9 100644
> >> --- a/drivers/gpu/drm/i915/gt/intel_gt_pm_debugfs.c
> >> +++ b/drivers/gpu/drm/i915/gt/intel_gt_pm_debugfs.c
> >> @@ -431,7 +431,7 @@ static int llc_show(struct seq_file *m, void *data)
> >>  max_gpu_freq /= GEN9_FREQ_SCALER;
> >>  }
> >>
> >> -seq_puts(m, "GPU freq (MHz)\tEffective CPU freq (MHz)\tEffective Ring freq (MHz)\n");
> >> +seq_puts(m, "GPU freq (MHz)\tEffective GPU freq (MHz)\tEffective Ring freq (MHz)\n");
> >
> >This is correct:
> >
> >Reviewed-by: Andi Shyti 
> >
> >Thanks,
> >Andi
>

Re: [PATCH v2] drm/i915/hwmon: expose package temperature

2024-09-10 Thread Andi Shyti

On Tue, Sep 10, 2024 at 01:58:29PM GMT, Raag Jadav wrote:
> On Tue, Sep 10, 2024 at 11:57:20AM +0530, Nilawar, Badal wrote:
> > On 10-09-2024 10:07, Gupta, Anshuman wrote:
> > > > 
> > > > ...
> > > > 
> > > > > +static int
> > > > > +hwm_temp_read(struct hwm_drvdata *ddat, u32 attr, long *val)
> > > > > +{
> > > > > +   struct i915_hwmon *hwmon = ddat->hwmon;
> > > > > +   intel_wakeref_t wakeref;
> > > > > +   u32 reg_val;
> > > > > +
> > > > > +   switch (attr) {
> > > > > +   case hwmon_temp_input:
> > > > > +           with_intel_runtime_pm(ddat->uncore->rpm, wakeref)
> > > > > +                   reg_val = intel_uncore_read(ddat->uncore, hwmon->rg.pkg_temp);
> > > > > +
> > > > > +           /* HW register value is in degrees, convert to millidegrees. */
> > > > > +           *val = REG_FIELD_GET(TEMP_MASK, reg_val) * MILLIDEGREE_PER_DEGREE;
> > > > > +           return 0;
> > > > > +   default:
> > > > > +           return -EOPNOTSUPP;
> > > > > +   }
> > > > 
> > > > I don't understand this love for single case switches.
> > > IMHO this is kept to keep symmetry in this file to make it more readable.
> > > Also it readable to return error using default case, which is followed in 
> > > this entire file.
> > I agree on this. Let’s stick to file-wide approach and ensure it is applied
> > to the fan_input attribute as well.
> 
> Since fan patch is already on its way to drm-next, you can submit a fix if 
> you wish.
> Although I don't agree with it, I have no objections.

nack! :-)

It doesn't make much sense to send a controversial patch that
refactors good working code to other good (some would say worse)
working code without any functional change.

Thanks,
Andi

Re: [PATCH] drm/i915/gt: Fixed an typo

2024-09-10 Thread Andi Shyti

Hi Zhanghe,

Thanks for your patch. Please next time check from
get_maintainer.pl the mailing lists that need to be included in
your patches.

In this case you should have included at least the
intel-gfx and the dri-devel mailing lists.

On Sat, Sep 07, 2024 at 05:24:43PM +0800, zhanghe9702 wrote:
> column header should be GPU, not CPU
> 
> Signed-off-by: zhanghe9702 

Do you really want your name to appear as zhanghe9702? If you git
log the linux directory you will see that people normally use
the "Name Surname" style. As you wish.

> ---
>  drivers/gpu/drm/i915/gt/intel_gt_pm_debugfs.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm_debugfs.c b/drivers/gpu/drm/i915/gt/intel_gt_pm_debugfs.c
> index 8d08b38874ef..b635aa2820d9 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_pm_debugfs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_pm_debugfs.c
> @@ -431,7 +431,7 @@ static int llc_show(struct seq_file *m, void *data)
>   max_gpu_freq /= GEN9_FREQ_SCALER;
>   }
>  
> - seq_puts(m, "GPU freq (MHz)\tEffective CPU freq (MHz)\tEffective Ring freq (MHz)\n");
> + seq_puts(m, "GPU freq (MHz)\tEffective GPU freq (MHz)\tEffective Ring freq (MHz)\n");

This is correct:

Reviewed-by: Andi Shyti 

Thanks,
Andi

Re: [PATCH v2] drm/i915/hwmon: expose package temperature

2024-09-10 Thread Andi Shyti

Hi,

> > > > +static int
> > > > +hwm_temp_read(struct hwm_drvdata *ddat, u32 attr, long *val)
> > > > +{
> > > > +   struct i915_hwmon *hwmon = ddat->hwmon;
> > > > +   intel_wakeref_t wakeref;
> > > > +   u32 reg_val;
> > > > +
> > > > +   switch (attr) {
> > > > +   case hwmon_temp_input:
> > > > +           with_intel_runtime_pm(ddat->uncore->rpm, wakeref)
> > > > +                   reg_val = intel_uncore_read(ddat->uncore, hwmon->rg.pkg_temp);
> > > > +
> > > > +           /* HW register value is in degrees, convert to millidegrees. */
> > > > +           *val = REG_FIELD_GET(TEMP_MASK, reg_val) * MILLIDEGREE_PER_DEGREE;
> > > > +           return 0;
> > > > +   default:
> > > > +           return -EOPNOTSUPP;
> > > > +   }
> > > 
> > > I don't understand this love for single case switches.
> > IMHO this is kept to keep symmetry in this file to make it more readable.
> > Also it readable to return error using default case, which is followed in 
> > this entire file.
> I agree on this. Let’s stick to file-wide approach and ensure it is applied
> to the fan_input attribute as well.

Yes, that's why I'm giving the r-b. I don't like it, but that's
how you guys have decided to do it.

Thanks,
Andi

Re: [PATCH v2] drm/i915/hwmon: expose package temperature

2024-09-09 Thread Andi Shyti

Hi Raag,

> > > + case hwmon_temp_input:
> > > + with_intel_runtime_pm(ddat->uncore->rpm, wakeref)
> > > + reg_val = intel_uncore_read(ddat->uncore, hwmon->rg.pkg_temp);
> > > +
> > > + /* HW register value is in degrees, convert to millidegrees. */
> > use millidegree Celsius here
> 
> The intent here is to signify the conversion rather than the unit.
> But okay, will add if we have another version.

Is Riana asking to improve the comment here? Then please do: if
someone asks for a better comment, it means they are asking you
to answer an open question that someone might have in the
future.

Sending a v3 is not much work, but improving the comment
later is not trivial.

Besides you need to retrigger tests anyway because you got a
BAT test failure :-)

Thanks,
Andi

Re: [PATCH v2] drm/i915/hwmon: expose package temperature

2024-09-09 Thread Andi Shyti

Hi Raag,

...

> +static int
> +hwm_temp_read(struct hwm_drvdata *ddat, u32 attr, long *val)
> +{
> + struct i915_hwmon *hwmon = ddat->hwmon;
> + intel_wakeref_t wakeref;
> + u32 reg_val;
> +
> + switch (attr) {
> + case hwmon_temp_input:
> + with_intel_runtime_pm(ddat->uncore->rpm, wakeref)
> + reg_val = intel_uncore_read(ddat->uncore, hwmon->rg.pkg_temp);
> +
> + /* HW register value is in degrees, convert to millidegrees. */
> + *val = REG_FIELD_GET(TEMP_MASK, reg_val) * MILLIDEGREE_PER_DEGREE;
> + return 0;
> + default:
> + return -EOPNOTSUPP;
> +     }

I don't understand this love for single case switches.

Reviewed-by: Andi Shyti 

Thanks,
Andi
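
To make the point under discussion concrete, here is a user-space
sketch of the same read path written both ways. Everything prefixed
with demo_ or DEMO_ is hypothetical: in particular the mask value
(temperature in bits 7:0) is an assumption for illustration, since the
real TEMP_MASK is not shown in this thread, and the register value is
passed in directly instead of being read from hardware.

```c
#include <errno.h>
#include <stdint.h>

#define DEMO_TEMP_MASK 0xffu			/* assumed field: bits 7:0 */
#define DEMO_MILLIDEGREE_PER_DEGREE 1000

enum demo_temp_attr { DEMO_TEMP_INPUT, DEMO_TEMP_OTHER };

/* Single-case switch, as in the quoted patch. */
int demo_temp_read_switch(enum demo_temp_attr attr, uint32_t reg_val,
			  long *val)
{
	switch (attr) {
	case DEMO_TEMP_INPUT:
		/* Register field is in degrees, convert to millidegrees. */
		*val = (long)(reg_val & DEMO_TEMP_MASK) *
		       DEMO_MILLIDEGREE_PER_DEGREE;
		return 0;
	default:
		return -EOPNOTSUPP;
	}
}

/* Same behaviour with an early return instead of a switch. */
int demo_temp_read_if(enum demo_temp_attr attr, uint32_t reg_val,
		      long *val)
{
	if (attr != DEMO_TEMP_INPUT)
		return -EOPNOTSUPP;

	*val = (long)(reg_val & DEMO_TEMP_MASK) *
	       DEMO_MILLIDEGREE_PER_DEGREE;
	return 0;
}
```

Both variants return the same values for the same inputs; the
difference is purely one of style and of consistency with the rest of
the file, which is what the review comments are about.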

Re: [PATCH] drm/i915/gt: Continue creating engine sysfs files even after a failure

2024-09-04 Thread Andi Shyti

Hi Sima,

On Tue, Aug 27, 2024 at 07:05:05PM +0200, Daniel Vetter wrote:
> On Mon, Aug 19, 2024 at 01:31:40PM +0200, Andi Shyti wrote:
> > The i915 driver generates sysfs entries for each engine of the
> > GPU in /sys/class/drm/cardX/engines/.
> > 
> > The process is straightforward: we loop over the UABI engines and
> > for each one, we:
> > 
> >  - Create the object.
> >  - Create basic files.
> >  - If the engine supports timeslicing, create timeslice duration files.
> >  - If the engine supports preemption, create preemption-related files.
> >  - Create default value files.
> > 
> > Currently, if any of these steps fail, the process stops, and no
> > further sysfs files are created.
> > 
> > However, it's not necessary to stop the process on failure.
> > Instead, we can continue creating the remaining sysfs files for
> > the other engines. Even if some files fail to be created, the
> > list of engines can still be retrieved by querying i915.
> > 
> > Signed-off-by: Andi Shyti 
> 
> Uh, sysfs is uapi. Either we need it, and it _must_ be there, or it's not
> needed, and we should delete those files probably.
> 
> This is different from debugfs, where failures are consistently ignored
> because that's the conscious design choice Greg made and wants supported.
> Because debugfs is optional.
> 
> So please make sure we correctly fail driver load if these don't register.
> Even better would be if sysfs files are registered atomically as attribute
> blocks, but that's an entire different can of worms. But that would really
> clean up this code and essentially put any failure handling onto core
> driver model and sysfs code.

This comment came after I merged the patch. So far, we have been
keeping the driver going even if sysfs fails to create, with the
idea of "if there is something wrong let it go as far as it can
and fail on its own".

This change is just setting the behavior to what the rest of the
interfaces are doing, so that either we change them all to fail
the driver's probe or we have them behaving consistently as they
are.

Tvrtko, Chris, Rodrigo any opinion from your side? Shall we bail
out as Sima is suggesting?

Thanks,
Andi
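
The two behaviours being weighed here can be sketched in user space as
follows. This is not the i915 code: struct demo_engine, the helper that
stands in for creating one engine's sysfs files, and the simulated error
field are all hypothetical, and the engine names used when calling it
are only examples.

```c
#include <errno.h>
#include <stddef.h>

struct demo_engine {
	const char *name;
	int create_err;	/* 0, or a negative errno to simulate a failure */
	int registered;
};

/* Stand-in for creating the sysfs files of one engine. */
static int demo_add_engine_files(struct demo_engine *e)
{
	if (e->create_err)
		return e->create_err;
	e->registered = 1;
	return 0;
}

/* Keep going on failure; returns how many engines failed. */
int demo_register_continue(struct demo_engine *engines, size_t n)
{
	int failed = 0;

	for (size_t i = 0; i < n; i++)
		if (demo_add_engine_files(&engines[i]))
			failed++;
	return failed;
}

/* Stop at the first failure, undo earlier engines, return the error. */
int demo_register_strict(struct demo_engine *engines, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		int err = demo_add_engine_files(&engines[i]);

		if (err) {
			while (i--)
				engines[i].registered = 0;
			return err;
		}
	}
	return 0;
}
```

With the first policy a failing engine only loses its own files and the
remaining engines are still registered; with the second one the caller
gets the error back and can fail the probe, which is the behaviour Sima
is asking for.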

Re: [PATCH v3 00/15] CCS static load balance

2024-08-28 Thread Andi Shyti

Hi Sima,

On Wed, Aug 28, 2024 at 03:47:21PM +0200, Daniel Vetter wrote:
> On Wed, Aug 28, 2024 at 10:20:15AM +0200, Andi Shyti wrote:
> > Hi Sima,
> > 
> > first of all, thanks for looking into this series.
> > 
> > On Tue, Aug 27, 2024 at 07:31:21PM +0200, Daniel Vetter wrote:
> > > On Fri, Aug 23, 2024 at 03:08:40PM +0200, Andi Shyti wrote:
> > > > Hi,
> > > > 
> > > > This patch series introduces static load balancing for GPUs with
> > > > multiple compute engines. It's a lengthy series, and some
> > > > challenging aspects still need to be resolved.
> > > 
> > > Do we have an actual user for this, where just reloading the entire driver
> > > (or well-rebinding, if you only want to change the value for a specific
> > > device) with a new module option isn't enough?
> > 
> > Yes, we have users for this and this has been already agreed with
> > architects and maintainers.
> 
> So my understanding is that for upstream, this only applies to dg2,
> because the other platforms don't have enough CCS engines to make this a
> real issue.
> 
> Do we really have upstream demand for this feature on dg2 only?

That's my understanding.

> Also how hard would it be to make these users happy with xe-on-dg2 in
> upstream instead?

I don't know this, I think the user is already on i915.

> > Why are you saying that we are reloading/rebinding the driver?
> 
> That's the other alternate solution.

But that's not how XE does it, though.

The use case is that userspace has an environment variable that
they change on demand for choosing the CCS mode. They want to
change the value of that variable on the fly and, as we are only
adding or removing a few engines, this is done without reprobing
the whole driver.

In a previous implementation (from where both I and Niranjana for
XE took inspiration) the CCS mode was passed during compute
execbuf.

> > I'm only removing the exposure of user engines, which is
> > basically a flag in the engines data structure.
> > 
> > > There's some really gnarly locking and lifetime fun in there, and it needs
> > > a corresponding justification.
> > 
> > What locking are you referring to?
> > 
> > I only added one single mutex that has a comment and a
> > justification. If you think that's not enough, I can of course
> > improve it (please note that the changes have a good amount of
> > comments and I tried to be as descriptive as I could).
> > 
> > When I change the engines configurations only for the compute
> > engines and only for DG2 platforms, I need to make sure that no
> > other user is affected by the change. Thus I need to make sure
> > that access to some of the structures is properly serialized.
> > 
> > > Which needs to be enormous for this case,
> > > meaning actual customers willing to shout on dri-devel that they really,
> > > absolutely need this, or their machines will go up in flames.
> > > Otherwise this is a nack from me.
> > 
> > Would you please tell me why you are nacking the patch, so that
> > I can address your comments in v4?
> 
> So for one, this is substantially more flexible than the solution merged
> into xe. And the patch set doesn't explain why (the commit messages
> actually describe the design xe has).

I think in XE we might have missed a few things and my plan is to
check the XE implementation once I'm done with i915 (I was one of
the XE reviewers). And, many of the things in XE are so different
that the solution can't be taken as it is.

> That does not inspire confidence at all.

Consider that most of the patches are refactoring; only the last
patch is doing the real job. That's because the first workaround
was already merged a while ago. XE, on the other hand, didn't
need the refactorings I made.

> Second, I don't think anyone understands the entire engine/ctx locking
> design in i915-gem. And the fix for that was to make as much as absolutely
> possible immutable. Yes the implementation looks correct, but when I
> looked at the much, much simpler xe implementation I'm pretty sure I've
> found an issue there too. Here I can't even tell.

The locking is fairly simple: when the user wants to set a
specific CCS mode, I take the wakeref lock and I check that no
one is holding it. This way I am sure that I am the only user of
the GPU (otherwise the GPU would be off).

I added one single lock to be used for the for_each_uabi_engine.
It's not really required but I really want to be sure that I am
not changing the CCS mode while someone else is using the uabi
engines.

I'm also adding Joonas in Cc, with whom I discussed many details
of the implementation. I would really appreciate knowing what
exactly is wrong here and which changes are needed to get the
series merged.

For now, thanks a lot for your comments,
Andi

Re: [PATCH v3 00/15] CCS static load balance

2024-08-28 Thread Andi Shyti

Hi Sima,

first of all, thanks for looking into this series.

On Tue, Aug 27, 2024 at 07:31:21PM +0200, Daniel Vetter wrote:
> On Fri, Aug 23, 2024 at 03:08:40PM +0200, Andi Shyti wrote:
> > Hi,
> > 
> > This patch series introduces static load balancing for GPUs with
> > multiple compute engines. It's a lengthy series, and some
> > challenging aspects still need to be resolved.
> 
> Do we have an actual user for this, where just reloading the entire driver
> (or well-rebinding, if you only want to change the value for a specific
> device) with a new module option isn't enough?

Yes, we have users for this and this has been already agreed with
architects and maintainers.

Why are you saying that we are reloading/rebinding the driver?
I'm only removing the exposure of user engines, which is
basically a flag in the engines data structure.

> There's some really gnarly locking and lifetime fun in there, and it needs
> a corresponding justification.

What locking are you referring to?

I only added one single mutex that has a comment and a
justification. If you think that's not enough, I can of course
improve it (please note that the changes have a good amount of
comments and I tried to be as descriptive as I could).

When I change the engines configurations only for the compute
engines and only for DG2 platforms, I need to make sure that no
other user is affected by the change. Thus I need to make sure
that access to some of the structures is properly serialized.

> Which needs to be enormous for this case,
> meaning actual customers willing to shout on dri-devel that they really,
> absolutely need this, or their machines will go up in flames.
> Otherwise this is a nack from me.

Would you please tell me why you are nacking the patch, so that
I can address your comments in v4?

Thanks,
Andi

Re: [PATCH v3 00/15] CCS static load balance

2024-08-27 Thread Andi Shyti

Hi Chris,

just a kind ping: any chance you can take a look at this? I would
really appreciate it.

Thanks,
Andi

On Fri, Aug 23, 2024 at 03:08:40PM +0200, Andi Shyti wrote:
> Hi,
> 
> This patch series introduces static load balancing for GPUs with
> multiple compute engines. It's a lengthy series, and some
> challenging aspects still need to be resolved.
> 
> I have tried to split the work as much as possible to facilitate
> the review process.
> 
> To summarize, in patches 1 to 14, no functional changes occur
> except for the addition of the num_cslices interface. The
> significant changes happen in patch 15, which is the core part of
> the CCS mode setting, utilizing the groundwork laid in the
> earlier patches.
> 
> In this updated approach, the focus is now on managing the UABI
> engine list, which controls the engines exposed to userspace.
> Instead of manipulating physical engines and their memory, we now
> handle engine exposure through this list.
> 
> I would greatly appreciate further input from all reviewers who
> have already assisted with the previous work.
> 
> IGT tests have also been developed, but I haven't sent them yet.
> 
> Thank you Chris for the offline reviews.
> 
> Thanks,
> Andi
> 
> Changelog:
> ==
> PATCHv2 -> PATCHv3
> --
>  - Fix a NULL pointer dereference during module unload.
>In i915_gem_driver_remove() I was accessing the gt after the
>gt was removed. Use the dev_priv, instead (obviously!).
>  - Fix a lockdep issue: Some of the uabi_engines_mutex unlocks
>were not correctly placed in the exit paths.
>  - Fix a checkpatch error for spaces after and before parenthesis
>in the for_each_enabled_engine() definition.
> 
> PATCHv1 -> PATCHv2
> --
>  - Use uabi_mutex to protect the uabi_engines, not the engine
>itself. Rename it to uabi_engines_mutex.
>  - Use kobject_add/kobject_del for adding and removing
>interfaces, this way we don't need to destroy and recreate the
>engines, anymore. Refactor intel_engine_add_single_sysfs() to
>reflect this scenario.
>  - After adding engines to the rb_tree check that they have been
>added correctly.
>  - Fix rb_find_add() compare function to take into account also
>the class, not just the instance.
> 
> RFCv2 -> PATCHv1
> 
>  - Removed gt->ccs.mutex
>  - Rename m -> width, ccs_id -> engine in
>intel_gt_apply_ccs_mode().
>  - In the CCS register value calculation
>(intel_gt_apply_ccs_mode()) the engine (ccs_id) needs to move
>along the ccs_mask (set by the user) instead of the
>cslice_mask.
>  - Add GEM_BUG_ON after calculating the new ccs_mask
>(update_ccs_mask()) to make sure all engines have been
>evaluated (i.e. ccs_mask must be '0' at the end of the
>algorithm).
>  - move wakeref lock before evaluating intel_gt_pm_is_awake() and
>fix exit path accordingly.
>  - Use a more compact form in intel_gt_sysfs_ccs_init() and
>add_uabi_ccs_engines() when evaluating sysfs_create_file(): no
>need to store the return value to the err variable which is
>unused. Get rid of err.
>  - Print a warning instead of a debug message if we fail to
>create the sysfs files.
>  - If engine files creation fails in
>intel_engine_add_single_sysfs(), print a warning, not an
>error.
>  - Rename gt->ccs.ccs_mask to gt->ccs.id_mask and add a comment
>to explain its purpose.
>  - During uabi engine creation, in
>intel_engines_driver_register(), the uabi_ccs_instance is
>redundant because the ccs_instances is already tracked in
>engine->uabi_instance.
>  - Mark add_uabi_ccs_engines() and remove_uabi_ccs_engines() as
>__maybe_unused not to break bisectability. They wouldn't
>compile in their own commit. They will be used in the next
>patch and the __maybe_unused is removed.
>  - Update engine's workaround every time a new mode is set in
>update_ccs_mask().
>  - Mark engines as valid or invalid using their status as
>rb_node. Invalid engines are marked as invalid using
>RB_CLEAR_NODE(). Execbufs will check for their validity when
>selecting the engine to be combined to a context.
>  - Create for_each_enabled_engine() which skips the non valid
>engines and use it in selftests.
> 
> RFCv1 -> RFCv2
> --
> Compared to the first version I've taken a completely different
> approach to adding and removing engines. In v1 physical engines
> were directly added and removed, along with the memory allocated
> to them, each time the user changed the CCS mode (from the
> previous cover letter).
> 

Re: [PATCH] drm: Fix kerneldoc for "Returns" section

2024-08-26 Thread Andi Shyti

Hi Renjun,

On Sat, Aug 24, 2024 at 04:36:34PM +0800, renjun wang wrote:
> The blank line between title "Returns:" and detail description is not
> allowed, otherwise the title will go under the description block in
> generated .html file after running `make htmldocs`.
> 
> There are a few examples for current kerneldoc:
> https://www.kernel.org/doc/html/latest/gpu/drm-kms.html#c.drm_crtc_commit_wait
> https://www.kernel.org/doc/html/latest/gpu/drm-kms.html#c.drm_atomic_get_crtc_state
> https://www.kernel.org/doc/html/latest/gpu/i915.html#c.i915_vma_pin_fence
> 
> Signed-off-by: renjun wang 
> ---
>  drivers/gpu/drm/display/drm_dp_mst_topology.c | 4 
>  drivers/gpu/drm/drm_atomic.c  | 6 --
>  drivers/gpu/drm/drm_atomic_helper.c   | 2 --
>  drivers/gpu/drm/drm_file.c| 7 ---
>  drivers/gpu/drm/drm_gem.c | 7 ++-
>  drivers/gpu/drm/drm_modes.c   | 1 -
>  drivers/gpu/drm/drm_rect.c| 1 -
>  drivers/gpu/drm/drm_vblank.c  | 2 --
>  drivers/gpu/drm/i915/gem/i915_gem_object.h| 1 -
>  drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c  | 1 -
>  drivers/gpu/drm/i915/i915_vma.h   | 1 -
>  11 files changed, 2 insertions(+), 31 deletions(-)

next time, please, split the series so that each component goes
to the right branch.

Andi

Re: [PATCH v3] drm/i915/gt: Use kmemdup_array instead of kmemdup for multiple allocation

2024-08-23 Thread Andi Shyti

Hi Yu,

On Wed, Aug 21, 2024 at 10:41:27AM +0800, Yu Jiaoliang wrote:
> Let the kmemdup_array() take care about multiplication and possible
> overflows.
> 
> v2:
> - Change subject
> - Leave one blank line between the commit log and the tag section
> - Fix code alignment issue
> 
> v3:
> - Fix code alignment
> - Apply the patch on a clean drm-tip
> 
> Signed-off-by: Yu Jiaoliang 
> Reviewed-by: Jani Nikula 
> Reviewed-by: Andi Shyti 

merged to drm-intel-gt-next.

Thanks,
Andi

Re: [PATCH] drm/i915/gt: Continue creating engine sysfs files even after a failure

2024-08-23 Thread Andi Shyti

Hi Rodrigo,

On Fri, Aug 23, 2024 at 09:41:31AM -0400, Rodrigo Vivi wrote:
> On Wed, Aug 21, 2024 at 09:32:48AM +0200, Andi Shyti wrote:
> > On Tue, Aug 20, 2024 at 05:22:40PM -0400, Rodrigo Vivi wrote:
> > > On Mon, Aug 19, 2024 at 01:31:40PM +0200, Andi Shyti wrote:
...
> > > > It might make sense to create an "inv-" if something
> > > > goes wrong, so that the user is aware that the engine exists, but
> > > > the sysfs file is not present.
> > > 
> > > well, if the sysfs dir/files creation is failing, then it will
> > > probably be unreliable anyway right?
> > 
> > Are you suggesting that "inv-" is OK?
> 
> it is okay I guess.
> But my point is more on, how are we going to create this if
> the creation mechanism is what is likely failing here.

We can fail for different reasons... but yeah you are right, it
doesn't make much sense, as also the creation of "inv-<...>"
interfaces might be unreliable.

> > > Also it looks something is off with the goto paths...
> > > 
> > > That if (0) is also ugly... probably better to use a
> > > kobject_put with continue on every failing point as well...
> > 
> > ehehe... I came to like it, to be honest. Besides I like single
> > exit paths instead of distributed returns. In this particular
> > case we would replicate the same "kobject_put() ... dev_warn()" in
> > several places, so that I'm not sure it's better.
> > 
> > If you like more we could do:
> > 
> > for (...) {
> > ...
> > ...
> > /* everything goes fine */
> > continue;
> > 
> > err_engine:
> > kobject_put(...);
> > dev_warn(...);
> > }
> > 
> > And we avoid using the "if (0)" that you don't like.
> 
> nah, no strong feeling from my side. It is there, let's
> avoid unnecessary refactors.
> 
> Reviewed-by: Rodrigo Vivi 
> 
> on this patch as is. And sorry for the delay.

Thanks a lot for your review :-)

Andi

[PATCH v3 15/15] drm/i915/gt: Allow the user to change the CCS mode through sysfs

2024-08-23 Thread Andi Shyti

Create the 'ccs_mode' file under

/sys/class/drm/cardX/gt/gt0/ccs_mode

This file allows the user to read and set the current CCS mode.

 - Reading: The user can read the current CCS mode, which can be
   1, 2, or 4. This value is derived from the current engine
   mask.

 - Writing: The user can set the CCS mode to 1, 2, or 4,
   depending on the desired number of exposed engines and the
   required load balancing.

The interface will return -EBUSY if other clients are connected
to i915, or -EINVAL if an invalid value is set.
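
The validity rule above can be sketched in plain C. This is only a
userspace illustration, not the driver code: the helper names
ccs_mode_is_valid() and ccs_mode_check() are hypothetical, and the
rule is assumed to be the one used by the store handler in this
patch (non-zero, not larger than the number of slices, and an exact
divisor of it).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true when 'val' is an acceptable CCS
 * mode for a GT that has 'num_cslices' compute slices.
 */
static bool ccs_mode_is_valid(uint32_t val, uint32_t num_cslices)
{
	/* Zero is never valid, and we cannot expose more engines than slices */
	if (val == 0 || val > num_cslices)
		return false;

	/* The slices must split evenly among the exposed engines */
	return (num_cslices % val) == 0;
}

/* Hypothetical wrapper returning the errno the interface would report */
static int ccs_mode_check(uint32_t val, uint32_t num_cslices)
{
	return ccs_mode_is_valid(val, num_cslices) ? 0 : -EINVAL;
}
```

With four slices this accepts 1, 2 and 4 and rejects 0, 3 and 8,
which matches the values listed above.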

Signed-off-by: Andi Shyti 
---
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 82 -
 1 file changed, 80 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c 
b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
index cc46ee9dea3f..1ed6153ff8cf 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
@@ -6,6 +6,7 @@
 #include "i915_drv.h"
 #include "intel_engine_user.h"
 #include "intel_gt_ccs_mode.h"
+#include "intel_gt_pm.h"
 #include "intel_gt_print.h"
 #include "intel_gt_regs.h"
 #include "intel_gt_sysfs.h"
@@ -172,7 +173,7 @@ static int rb_engine_cmp(struct rb_node *rb_new, const 
struct rb_node *rb_old)
return new->uabi_class - old->uabi_class;
 }
 
-static void __maybe_unused add_uabi_ccs_engines(struct intel_gt *gt, u32 ccs_mode)
+static void add_uabi_ccs_engines(struct intel_gt *gt, u32 ccs_mode)
 {
struct drm_i915_private *i915 = gt->i915;
intel_engine_mask_t new_ccs_mask, tmp;
@@ -230,7 +231,7 @@ static void __maybe_unused add_uabi_ccs_engines(struct 
intel_gt *gt, u32 ccs_mod
mutex_unlock(&i915->uabi_engines_mutex);
 }
 
-static void __maybe_unused remove_uabi_ccs_engines(struct intel_gt *gt, u8 ccs_mode)
+static void remove_uabi_ccs_engines(struct intel_gt *gt, u8 ccs_mode)
 {
struct drm_i915_private *i915 = gt->i915;
intel_engine_mask_t new_ccs_mask, tmp;
@@ -273,8 +274,85 @@ static ssize_t num_cslices_show(struct device *dev,
 }
 static DEVICE_ATTR_RO(num_cslices);
 
+static ssize_t ccs_mode_show(struct device *dev,
+struct device_attribute *attr, char *buff)
+{
+   struct intel_gt *gt = kobj_to_gt(&dev->kobj);
+   u32 ccs_mode;
+
+   ccs_mode = hweight32(gt->ccs.id_mask);
+
+   return sysfs_emit(buff, "%u\n", ccs_mode);
+}
+
+static ssize_t ccs_mode_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buff, size_t count)
+{
+   struct intel_gt *gt = kobj_to_gt(&dev->kobj);
+   int num_cslices = hweight32(CCS_MASK(gt));
+   int ccs_mode = hweight32(gt->ccs.id_mask);
+   ssize_t ret;
+   u32 val;
+
+   ret = kstrtou32(buff, 0, &val);
+   if (ret)
+   return ret;
+
+   /*
+* As of now possible values to be set are 1, 2, 4,
+* up to the maximum number of available slices
+*/
+   if (!val || val > num_cslices || (num_cslices % val))
+   return -EINVAL;
+
+   /* Let's wait until the GT is no longer in use */
+   ret = intel_gt_pm_wait_for_idle(gt);
+   if (ret)
+   return ret;
+
+   mutex_lock(&gt->wakeref.mutex);
+
+   /*
+* Let's check again that the GT is idle,
+* we don't want to change the CCS mode
+* while someone is using the GT
+*/
+   if (intel_gt_pm_is_awake(gt)) {
+   ret = -EBUSY;
+   goto out;
+   }
+
+   /*
+* Nothing to do if the requested setting
+* is the same as the current one
+*/
+   if (val == ccs_mode)
+   goto out;
+   else if (val > ccs_mode)
+   add_uabi_ccs_engines(gt, val);
+   else
+   remove_uabi_ccs_engines(gt, val);
+
+out:
+   mutex_unlock(&gt->wakeref.mutex);
+
+   return ret ?: count;
+}
+static DEVICE_ATTR_RW(ccs_mode);
+
 void intel_gt_sysfs_ccs_init(struct intel_gt *gt)
 {
if (sysfs_create_file(&gt->sysfs_gt, &dev_attr_num_cslices.attr))
gt_warn(gt, "Failed to create sysfs num_cslices files\n");
+
+   /*
+* Do not create the ccs_mode file for non DG2 platforms
+* because they don't need it as they have only one CCS engine
+*/
+   if (!IS_DG2(gt->i915))
+   return;
+
+   if (sysfs_create_file(&gt->sysfs_gt, &dev_attr_ccs_mode.attr))
+   gt_warn(gt, "Failed to create sysfs ccs_mode files\n");
 }
-- 
2.45.2

[PATCH v3 14/15] drm/i915/gt: Implement creation and removal routines for CCS engines

2024-08-23 Thread Andi Shyti

In preparation for upcoming patches, we need routines to
dynamically create and destroy CCS engines based on the CCS mode
that the user wants to set.

The process begins by calculating the engine mask for the engines
that need to be added or removed. We then update the UABI list of
exposed engines and create or destroy the corresponding sysfs
interfaces accordingly.
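
As a rough illustration of the mask calculation described above,
here is a minimal sketch in plain C. The helper names are
hypothetical and the masks are plain 32-bit values; the driver code
additionally shifts the result to the CCS0 bit position, which is
left out here.

```c
#include <assert.h>
#include <stdint.h>

/* Engines set in 'new_mask' but not in 'old_mask' have to be added */
static uint32_t engines_to_add(uint32_t old_mask, uint32_t new_mask)
{
	return new_mask & ~old_mask;
}

/* Engines set in 'old_mask' but not in 'new_mask' have to be removed */
static uint32_t engines_to_remove(uint32_t old_mask, uint32_t new_mask)
{
	return old_mask & ~new_mask;
}
```

Moving from one exposed engine (0x1) to four (0xf) gives 0xe as the
set to add; moving from four (0xf) to two (0x3) gives 0xc as the set
to remove.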

These functions are not yet in use, so no functional changes are
intended at this stage.

Mark the functions 'add_uabi_ccs_engines()' and
'remove_uabi_ccs_engines()' as '__maybe_unused' to ensure
successful compilation and maintain bisectability. This
annotation will be removed in subsequent commits.

Signed-off-by: Andi Shyti 
---
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 124 
 1 file changed, 124 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c 
b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
index 5eead7b18f57..cc46ee9dea3f 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
@@ -4,10 +4,12 @@
  */
 
 #include "i915_drv.h"
+#include "intel_engine_user.h"
 #include "intel_gt_ccs_mode.h"
 #include "intel_gt_print.h"
 #include "intel_gt_regs.h"
 #include "intel_gt_sysfs.h"
+#include "sysfs_engines.h"
 
 static void intel_gt_apply_ccs_mode(struct intel_gt *gt)
 {
@@ -123,6 +125,29 @@ static void __update_ccs_mask(struct intel_gt *gt, u32 
ccs_mode)
intel_gt_apply_ccs_mode(gt);
 }
 
+static void update_ccs_mask(struct intel_gt *gt, u32 ccs_mode)
+{
+   struct intel_engine_cs *engine;
+   intel_engine_mask_t tmp;
+
+   __update_ccs_mask(gt, ccs_mode);
+
+   /* Update workaround values */
+   for_each_engine_masked(engine, gt, gt->ccs.id_mask, tmp) {
+   struct i915_wa_list *wal = &engine->wa_list;
+   struct i915_wa *wa;
+   int i;
+
+   for (i = 0, wa = wal->list; i < wal->count; i++, wa++) {
+   if (!i915_mmio_reg_equal(wa->reg, XEHP_CCS_MODE))
+   continue;
+
+   wa->set = gt->ccs.mode_reg_val;
+   wa->read = gt->ccs.mode_reg_val;
+   }
+   }
+}
+
 void intel_gt_ccs_mode_init(struct intel_gt *gt)
 {
if (!IS_DG2(gt->i915))
@@ -136,6 +161,105 @@ void intel_gt_ccs_mode_init(struct intel_gt *gt)
__update_ccs_mask(gt, 1);
 }
 
+static int rb_engine_cmp(struct rb_node *rb_new, const struct rb_node *rb_old)
+{
+   struct intel_engine_cs *new = rb_to_uabi_engine(rb_new);
+   struct intel_engine_cs *old = rb_to_uabi_engine(rb_old);
+
+   if (new->uabi_class - old->uabi_class == 0)
+   return new->uabi_instance - old->uabi_instance;
+
+   return new->uabi_class - old->uabi_class;
+}
+
+static void __maybe_unused add_uabi_ccs_engines(struct intel_gt *gt, u32 ccs_mode)
+{
+   struct drm_i915_private *i915 = gt->i915;
+   intel_engine_mask_t new_ccs_mask, tmp;
+   struct intel_engine_cs *e;
+
+   /* Store the current ccs mask */
+   new_ccs_mask = gt->ccs.id_mask;
+   update_ccs_mask(gt, ccs_mode);
+
+   /*
+* Store only the mask of the CCS engines that need to be added by
+* removing from the new mask the engines that are already active
+*/
+   new_ccs_mask = gt->ccs.id_mask & ~new_ccs_mask;
+   new_ccs_mask <<= CCS0;
+
+   mutex_lock(&i915->uabi_engines_mutex);
+   for_each_engine_masked(e, gt, new_ccs_mask, tmp) {
+   int err;
+
+   i915->engine_uabi_class_count[I915_ENGINE_CLASS_COMPUTE]++;
+
+   /*
+* The engine is now inserted and marked as valid.
+*
+* rb_find_add() should always return NULL. If it returns a
+* pointer to an rb_node it means that it found the engine we
+* are trying to insert which means that something is really
+* wrong.
+*/
+   GEM_BUG_ON(rb_find_add(&e->uabi_node,
+  &i915->uabi_engines, rb_engine_cmp));
+
+   /* We inserted the engine, let's check if now we can find it */
+   GEM_BUG_ON(intel_engine_lookup_user(i915, e->uabi_class,
+   e->uabi_instance) != e);
+
+   /*
+* If the engine has never been used before (e.g. we are moving
+* for the first time from CCS mode 1 to CCS mode 2 or 4), then
+* also its sysfs entry has never been created. In this case its
+* value will be null and we need to allocate it.
+*/
+   if (!e->kobj)
+   err

[PATCH v3 13/15] drm/i915/gt: Isolate single sysfs engine file creation

2024-08-23 Thread Andi Shyti

In preparation for upcoming patches, we need the ability to
create and remove individual sysfs files. To facilitate this,
extract from the intel_engines_add_sysfs() function the creation
of individual files.
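
The shape of this refactor can be sketched outside the kernel as
follows. Everything here is hypothetical (struct item, the fail_step
test hook, the function names); it only illustrates a per-item
helper with a single unwinding path and a caller loop that stops on
the first error.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct item {
	int *data;	/* stands in for the per-engine kobject */
	int fail_step;	/* test hook: 1 makes the second step fail */
};

/* Hypothetical per-item setup with one unwinding path */
static int item_add_single(struct item *it)
{
	int *data;
	int err;

	data = malloc(sizeof(*data));
	if (!data) {
		err = -ENOMEM;
		goto err_item;
	}

	if (it->fail_step == 1) {	/* e.g. creating a file failed */
		err = -EIO;
		goto err_object;
	}

	it->data = data;

	return 0;

err_object:
	free(data);
err_item:
	it->data = NULL;

	return err;
}

/* The caller loop reduces to "call the helper, stop on error" */
static int items_add_all(struct item *items, int count)
{
	for (int i = 0; i < count; i++) {
		int err = item_add_single(&items[i]);

		if (err)
			return err;
	}

	return 0;
}
```

The helper can then also be called on its own for a single item at a
later time, which is the point of the split.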

Signed-off-by: Andi Shyti 
---
 drivers/gpu/drm/i915/gt/sysfs_engines.c | 74 +++--
 drivers/gpu/drm/i915/gt/sysfs_engines.h |  2 +
 2 files changed, 48 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/sysfs_engines.c 
b/drivers/gpu/drm/i915/gt/sysfs_engines.c
index c1cc0981c8fb..ef2eda72ac7f 100644
--- a/drivers/gpu/drm/i915/gt/sysfs_engines.c
+++ b/drivers/gpu/drm/i915/gt/sysfs_engines.c
@@ -9,6 +9,7 @@
 #include "i915_drv.h"
 #include "intel_engine.h"
 #include "intel_engine_heartbeat.h"
+#include "intel_gt_print.h"
 #include "sysfs_engines.h"
 
 struct kobj_engine {
@@ -481,7 +482,7 @@ static void add_defaults(struct kobj_engine *parent)
return;
 }
 
-void intel_engines_add_sysfs(struct drm_i915_private *i915)
+int intel_engine_add_single_sysfs(struct intel_engine_cs *engine)
 {
static const struct attribute * const files[] = {
&name_attr.attr,
@@ -497,7 +498,48 @@ void intel_engines_add_sysfs(struct drm_i915_private *i915)
 #endif
NULL
};
+   struct kobject *dir = engine->i915->sysfs_engine;
+   struct kobject *kobj = engine->kobj;
+   int err;
+
+   kobj = kobj_engine(dir, engine);
+   if (!kobj) {
+   err = -EFAULT;
+   goto err_engine;
+   }
+
+   err = sysfs_create_files(kobj, files);
+   if (err)
+   goto err_object;
+
+   if (intel_engine_has_timeslices(engine)) {
+   err = sysfs_create_file(kobj, &timeslice_duration_attr.attr);
+   if (err)
+   goto err_object;
+   }
+
+   if (intel_engine_has_preempt_reset(engine)) {
+   err = sysfs_create_file(kobj, &preempt_timeout_attr.attr);
+   if (err)
+   goto err_object;
+   }
+
+   add_defaults(container_of(kobj, struct kobj_engine, base));
+
+   engine->kobj = kobj;
+
+   return 0;
+
+err_object:
+   kobject_put(kobj);
+err_engine:
+   gt_warn(engine->gt, "Failed to add sysfs engine '%s'\n", engine->name);
+
+   return err;
+}
 
+void intel_engines_add_sysfs(struct drm_i915_private *i915)
+{
struct device *kdev = i915->drm.primary->kdev;
struct intel_engine_cs *engine;
struct kobject *dir;
@@ -514,34 +556,10 @@ void intel_engines_add_sysfs(struct drm_i915_private 
*i915)
 * uabi_engines access list with the mutex.
 */
for_each_uabi_engine(engine, i915) {
-   struct kobject *kobj;
-
-   kobj = kobj_engine(dir, engine);
-   if (!kobj)
-   goto err_engine;
-
-   if (sysfs_create_files(kobj, files))
-   goto err_object;
-
-   if (intel_engine_has_timeslices(engine) &&
-   sysfs_create_file(kobj, &timeslice_duration_attr.attr))
-   goto err_engine;
-
-   if (intel_engine_has_preempt_reset(engine) &&
-   sysfs_create_file(kobj, &preempt_timeout_attr.attr))
-   goto err_engine;
-
-   add_defaults(container_of(kobj, struct kobj_engine, base));
+   int err;
 
-   engine->kobj = kobj;
-
-   if (0) {
-err_object:
-   kobject_put(kobj);
-err_engine:
-   dev_err(kdev, "Failed to add sysfs engine '%s'\n",
-   engine->name);
+   err = intel_engine_add_single_sysfs(engine);
+   if (err)
break;
-   }
}
 }
diff --git a/drivers/gpu/drm/i915/gt/sysfs_engines.h 
b/drivers/gpu/drm/i915/gt/sysfs_engines.h
index 9546fffe03a7..2e3ec2df14a9 100644
--- a/drivers/gpu/drm/i915/gt/sysfs_engines.h
+++ b/drivers/gpu/drm/i915/gt/sysfs_engines.h
@@ -7,7 +7,9 @@
 #define INTEL_ENGINE_SYSFS_H
 
 struct drm_i915_private;
+struct intel_engine_cs;
 
 void intel_engines_add_sysfs(struct drm_i915_private *i915);
+int intel_engine_add_single_sysfs(struct intel_engine_cs *engine);
 
 #endif /* INTEL_ENGINE_SYSFS_H */
-- 
2.45.2

[PATCH v3 12/15] drm/i915: Protect access to the UABI engines list with a mutex

2024-08-23 Thread Andi Shyti

Until now, the UABI engines list has been accessed in read-only
mode, as it was created once during boot and destroyed upon
module unload.

In upcoming commits, we will be modifying this list by changing
the CCS mode, allowing compute engines to be dynamically added
and removed at runtime based on user whims.

To ensure thread safety and prevent race conditions, we need to
protect the engine list with a mutex, thereby serializing access
to it.
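
A userspace analogy of the locking rule, using a pthread mutex in
place of the kernel mutex. The structure and function names are
hypothetical; the sketch only shows that writers and readers take
the same lock and that the lock is released on every exit path.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_ENGINES 8

/* Hypothetical stand-in for the engine list and its lock */
struct engine_list {
	pthread_mutex_t lock;
	int ids[MAX_ENGINES];
	size_t count;
};

static void engine_list_init(struct engine_list *l)
{
	pthread_mutex_init(&l->lock, NULL);
	l->count = 0;
}

/* Writer: runs when the set of exposed engines changes */
static int engine_list_add(struct engine_list *l, int id)
{
	int ret = -1;

	pthread_mutex_lock(&l->lock);
	if (l->count < MAX_ENGINES) {
		l->ids[l->count++] = id;
		ret = 0;
	}
	pthread_mutex_unlock(&l->lock);

	return ret;
}

/* Reader: every walk of the list is done under the same lock */
static int engine_list_contains(struct engine_list *l, int id)
{
	int found = 0;

	pthread_mutex_lock(&l->lock);
	for (size_t i = 0; i < l->count; i++) {
		if (l->ids[i] == id) {
			found = 1;
			break;	/* leave the loop only, unlock happens below */
		}
	}
	pthread_mutex_unlock(&l->lock);

	return found;
}
```

Note the 'break' in the reader: returning directly from inside the
loop would leave the mutex held, which is the kind of exit path
that is easy to get wrong.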

Signed-off-by: Andi Shyti 
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c |  3 +++
 drivers/gpu/drm/i915/gt/intel_engine_user.c |  7 +++
 drivers/gpu/drm/i915/gt/sysfs_engines.c |  5 +
 drivers/gpu/drm/i915/i915_cmd_parser.c  |  2 ++
 drivers/gpu/drm/i915/i915_debugfs.c |  4 
 drivers/gpu/drm/i915/i915_drv.h |  4 
 drivers/gpu/drm/i915/i915_gem.c |  4 
 drivers/gpu/drm/i915/i915_perf.c|  8 +---
 drivers/gpu/drm/i915/i915_pmu.c | 11 +--
 drivers/gpu/drm/i915/i915_query.c   | 21 -
 10 files changed, 59 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index c0543c35cd6a..0ccbe447f51d 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -1124,6 +1124,7 @@ static struct i915_gem_engines *default_engines(struct 
i915_gem_context *ctx,
if (!e)
return ERR_PTR(-ENOMEM);
 
+   mutex_lock(&ctx->i915->uabi_engines_mutex);
for_each_uabi_engine(engine, ctx->i915) {
struct intel_context *ce;
struct intel_sseu sseu = {};
@@ -1155,9 +1156,11 @@ static struct i915_gem_engines *default_engines(struct 
i915_gem_context *ctx,
 
}
 
+   mutex_unlock(&ctx->i915->uabi_engines_mutex);
return e;
 
 free_engines:
+   mutex_unlock(&ctx->i915->uabi_engines_mutex);
free_engines(e);
return err;
 }
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c 
b/drivers/gpu/drm/i915/gt/intel_engine_user.c
index 8e5284af8335..209d5badbd3d 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
@@ -210,6 +210,13 @@ void intel_engines_driver_register(struct drm_i915_private 
*i915)
LIST_HEAD(engines);
 
sort_engines(i915, &engines);
+   mutex_init(&i915->uabi_engines_mutex);
+
+   /*
+* We are still booting i915 and we are sure we are running
+* single-threaded. We don't need at this point to protect the
+* uabi_engines access list with the mutex.
+*/
 
prev = NULL;
p = &i915->uabi_engines.rb_node;
diff --git a/drivers/gpu/drm/i915/gt/sysfs_engines.c 
b/drivers/gpu/drm/i915/gt/sysfs_engines.c
index f67f76df1cfe..c1cc0981c8fb 100644
--- a/drivers/gpu/drm/i915/gt/sysfs_engines.c
+++ b/drivers/gpu/drm/i915/gt/sysfs_engines.c
@@ -508,6 +508,11 @@ void intel_engines_add_sysfs(struct drm_i915_private *i915)
 
i915->sysfs_engine = dir;
 
+   /*
+* We are still booting i915 and we are sure we are running
+* single-threaded. We don't need at this point to protect the
+* uabi_engines access list with the mutex.
+*/
for_each_uabi_engine(engine, i915) {
struct kobject *kobj;
 
diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c 
b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 2905df83e180..12987ece6f8e 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -1592,12 +1592,14 @@ int i915_cmd_parser_get_version(struct drm_i915_private 
*dev_priv)
bool active = false;
 
/* If the command parser is not enabled, report 0 - unsupported */
+   mutex_lock(&dev_priv->uabi_engines_mutex);
for_each_uabi_engine(engine, dev_priv) {
if (intel_engine_using_cmd_parser(engine)) {
active = true;
break;
}
}
+   mutex_unlock(&dev_priv->uabi_engines_mutex);
if (!active)
return 0;
 
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
b/drivers/gpu/drm/i915/i915_debugfs.c
index bc717cf544e4..8b5e365eb6bd 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -459,8 +459,10 @@ static int i915_engine_info(struct seq_file *m, void 
*unused)
   to_gt(i915)->clock_period_ns);
 
p = drm_seq_file_printer(m);
+   mutex_lock(&i915->uabi_engines_mutex);
for_each_uabi_engine(engine, i915)
intel_engine_dump(engine, &p, "%s\n", engine->name);
+   mutex_unlock(&i915->uabi_engines_mutex);
 
intel_gt_show_timelines(to_gt(i915), &p, 
i915_request_show_with_schedule);
 
@@ -474,6 +476,7 @@ static int

[PATCH v3 11/15] drm/i915/gt: Store active CCS mask

2024-08-23 Thread Andi Shyti

To support upcoming patches, we need to store the current mask
for active CCS engines.

Active engines refer to those exposed to userspace via the UABI
engine list.
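
A minimal sketch of how such a mask can be derived, assuming the
active engines are simply the lowest 'mode' set bits of the
available slice mask. The helper name and MAX_CCS are assumptions
made for this example.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CCS 4	/* assumption: at most four compute slices */

/*
 * Hypothetical helper: build the mask of exposed engines by taking
 * the lowest 'mode' set bits of 'slices_mask'.
 */
static uint32_t active_ccs_mask(uint32_t slices_mask, unsigned int mode)
{
	uint32_t id_mask = 0;

	for (unsigned int i = 0; i < MAX_CCS && mode; i++) {
		if (!(slices_mask & (UINT32_C(1) << i)))
			continue;	/* slice not available, skip it */

		id_mask |= UINT32_C(1) << i;
		mode--;
	}

	/* The caller must not ask for more engines than available slices */
	assert(mode == 0);

	return id_mask;
}
```

With all four slices available (0xf), mode 2 gives 0x3. With only
slices 1 and 3 available (0xa), mode 1 gives 0x2.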

Signed-off-by: Andi Shyti 
---
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 41 +++--
 drivers/gpu/drm/i915/gt/intel_gt_types.h|  7 
 2 files changed, 44 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c 
b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
index edb6a4b63826..5eead7b18f57 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
@@ -12,6 +12,7 @@
 static void intel_gt_apply_ccs_mode(struct intel_gt *gt)
 {
unsigned long cslices_mask = CCS_MASK(gt);
+   unsigned long ccs_mask = gt->ccs.id_mask;
u32 mode_val = 0;
/* CCS engine id, i.e. the engines position in the engine's bitmask */
int engine;
@@ -55,7 +56,7 @@ static void intel_gt_apply_ccs_mode(struct intel_gt *gt)
 *   slice 2: ccs2
 *   slice 3: ccs3
 */
-   engine = __ffs(cslices_mask);
+   engine = __ffs(ccs_mask);
 
for (cslice = 0; cslice < I915_MAX_CCS; cslice++) {
if (!(cslices_mask & BIT(cslice))) {
@@ -86,7 +87,7 @@ static void intel_gt_apply_ccs_mode(struct intel_gt *gt)
 * CCS mode, will be used later to
 * reset to a flexible value
 */
-   engine = __ffs(cslices_mask);
+   engine = __ffs(ccs_mask);
continue;
}
}
@@ -94,13 +95,45 @@ static void intel_gt_apply_ccs_mode(struct intel_gt *gt)
gt->ccs.mode_reg_val = mode_val;
 }
 
+static void __update_ccs_mask(struct intel_gt *gt, u32 ccs_mode)
+{
+   unsigned long cslices_mask = CCS_MASK(gt);
+   int i;
+
+   /* Mask off all the CCS engines */
+   gt->ccs.id_mask = 0;
+
+   for_each_set_bit(i, &cslices_mask, I915_MAX_CCS) {
+   gt->ccs.id_mask |= BIT(i);
+
+   ccs_mode--;
+   if (!ccs_mode)
+   break;
+   }
+
+   /*
+* It's impossible for 'ccs_mode' to be non-zero at this point.
+* This scenario would only occur if the 'ccs_mode' provided by
+* the caller exceeded the total number of CCS engines, a condition
+* we check before calling the 'update_ccs_mask()' function.
+*/
+   GEM_BUG_ON(ccs_mode);
+
+   /* Initialize the CCS mode setting */
+   intel_gt_apply_ccs_mode(gt);
+}
+
 void intel_gt_ccs_mode_init(struct intel_gt *gt)
 {
if (!IS_DG2(gt->i915))
return;
 
-   /* Initialize the CCS mode setting */
-   intel_gt_apply_ccs_mode(gt);
+   /*
+* Set CCS balance mode 1 in the ccs_mask.
+*
+* During init the workarounds are not set up yet.
+*/
+   __update_ccs_mask(gt, 1);
 }
 
 static ssize_t num_cslices_show(struct device *dev,
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h 
b/drivers/gpu/drm/i915/gt/intel_gt_types.h
index 71e43071da0b..641be69016e1 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
@@ -219,6 +219,13 @@ struct intel_gt {
 */
struct {
u32 mode_reg_val;
+
+   /*
+* CCS id_mask is the command streamer instance
+* exposed to the user. While the CCS_MASK(gt)
+* is the available unfused compute slices.
+*/
+   intel_engine_mask_t id_mask;
} ccs;
 
/*
-- 
2.45.2

[PATCH v3 10/15] drm/i915/gt: Store engine-related sysfs kobjects

2024-08-23 Thread Andi Shyti

Upcoming commits will need to access engine-related kobjects to
enable the creation and destruction of sysfs interfaces at
runtime.

For this, store the "engine" directory (i915->sysfs_engine) and
the per-engine kobject (engine->kobj).

Signed-off-by: Andi Shyti 
---
 drivers/gpu/drm/i915/gt/intel_engine_types.h | 2 ++
 drivers/gpu/drm/i915/gt/sysfs_engines.c  | 4 
 drivers/gpu/drm/i915/i915_drv.h  | 1 +
 3 files changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h 
b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index ba55c059063d..cdc695fda918 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -388,6 +388,8 @@ struct intel_engine_cs {
u32 context_size;
u32 mmio_base;
 
+   struct kobject *kobj;
+
struct intel_engine_tlb_inv tlb_inv;
 
/*
diff --git a/drivers/gpu/drm/i915/gt/sysfs_engines.c 
b/drivers/gpu/drm/i915/gt/sysfs_engines.c
index 021f51d9b456..f67f76df1cfe 100644
--- a/drivers/gpu/drm/i915/gt/sysfs_engines.c
+++ b/drivers/gpu/drm/i915/gt/sysfs_engines.c
@@ -506,6 +506,8 @@ void intel_engines_add_sysfs(struct drm_i915_private *i915)
if (!dir)
return;
 
+   i915->sysfs_engine = dir;
+
for_each_uabi_engine(engine, i915) {
struct kobject *kobj;
 
@@ -526,6 +528,8 @@ void intel_engines_add_sysfs(struct drm_i915_private *i915)
 
add_defaults(container_of(kobj, struct kobj_engine, base));
 
+   engine->kobj = kobj;
+
if (0) {
 err_object:
kobject_put(kobj);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 94f7f6cc444c..3a8a757f5bd5 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -320,6 +320,7 @@ struct drm_i915_private {
struct intel_gt *gt[I915_MAX_GT];
 
struct kobject *sysfs_gt;
+   struct kobject *sysfs_engine;
 
/* Quick lookup of media GT (current platforms only have one) */
struct intel_gt *media_gt;
-- 
2.45.2

[PATCH v3 09/15] drm/i915/gt: Expose the number of total CCS slices

2024-08-23 Thread Andi Shyti

Implement a sysfs interface to show the number of available CCS
slices. The displayed number does not take into account the CCS
balancing mode.
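
The number shown is the count of set bits in the slice mask (the
patch uses hweight32() for this). A portable userspace equivalent,
with a hypothetical function name, could look like this:

```c
#include <assert.h>
#include <stdint.h>

/* Count the set bits in 'mask', one per available slice */
static unsigned int count_slices(uint32_t mask)
{
	unsigned int n = 0;

	while (mask) {
		mask &= mask - 1;	/* clear the lowest set bit */
		n++;
	}

	return n;
}
```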

Signed-off-by: Andi Shyti 
---
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 21 +
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h |  1 +
 drivers/gpu/drm/i915/gt/intel_gt_sysfs.c|  2 ++
 3 files changed, 24 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c 
b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
index fc8a23fc28b6..edb6a4b63826 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
@@ -5,7 +5,9 @@
 
 #include "i915_drv.h"
 #include "intel_gt_ccs_mode.h"
+#include "intel_gt_print.h"
 #include "intel_gt_regs.h"
+#include "intel_gt_sysfs.h"
 
 static void intel_gt_apply_ccs_mode(struct intel_gt *gt)
 {
@@ -100,3 +102,22 @@ void intel_gt_ccs_mode_init(struct intel_gt *gt)
/* Initialize the CCS mode setting */
intel_gt_apply_ccs_mode(gt);
 }
+
+static ssize_t num_cslices_show(struct device *dev,
+   struct device_attribute *attr,
+   char *buff)
+{
+   struct intel_gt *gt = kobj_to_gt(&dev->kobj);
+   u32 num_slices;
+
+   num_slices = hweight32(CCS_MASK(gt));
+
+   return sysfs_emit(buff, "%u\n", num_slices);
+}
+static DEVICE_ATTR_RO(num_cslices);
+
+void intel_gt_sysfs_ccs_init(struct intel_gt *gt)
+{
+   if (sysfs_create_file(&gt->sysfs_gt, &dev_attr_num_cslices.attr))
+   gt_warn(gt, "Failed to create sysfs num_cslices files\n");
+}
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h 
b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h
index 4a6763b95a78..9696cc9017f6 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h
@@ -9,5 +9,6 @@
 #include "intel_gt.h"
 
 void intel_gt_ccs_mode_init(struct intel_gt *gt);
+void intel_gt_sysfs_ccs_init(struct intel_gt *gt);
 
 #endif /* __INTEL_GT_CCS_MODE_H__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_sysfs.c 
b/drivers/gpu/drm/i915/gt/intel_gt_sysfs.c
index 33cba406b569..895eedc402ae 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_sysfs.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_sysfs.c
@@ -12,6 +12,7 @@
 #include "i915_drv.h"
 #include "i915_sysfs.h"
 #include "intel_gt.h"
+#include "intel_gt_ccs_mode.h"
 #include "intel_gt_print.h"
 #include "intel_gt_sysfs.h"
 #include "intel_gt_sysfs_pm.h"
@@ -101,6 +102,7 @@ void intel_gt_sysfs_register(struct intel_gt *gt)
goto exit_fail;
 
intel_gt_sysfs_pm_init(gt, &gt->sysfs_gt);
+   intel_gt_sysfs_ccs_init(gt);
 
return;
 
-- 
2.45.2

[PATCH v3 08/15] drm/i915/gt: Remove cslices mask value from the CCS structure

2024-08-23 Thread Andi Shyti

Following the decision to manage CCS engine creation within UABI
engines, the "cslices" variable in the "ccs" structure in the
"gt" is no longer needed. Remove it, as it is now redundant.

Signed-off-by: Andi Shyti 
---
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 2 +-
 drivers/gpu/drm/i915/gt/intel_gt_types.h| 5 -
 2 files changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c 
b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
index a6c33b471567..fc8a23fc28b6 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
@@ -9,7 +9,7 @@
 
 static void intel_gt_apply_ccs_mode(struct intel_gt *gt)
 {
-   unsigned long cslices_mask = gt->ccs.cslices;
+   unsigned long cslices_mask = CCS_MASK(gt);
u32 mode_val = 0;
/* CCS engine id, i.e. the engines position in the engine's bitmask */
int engine;
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h 
b/drivers/gpu/drm/i915/gt/intel_gt_types.h
index 9e257f34d05b..71e43071da0b 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
@@ -218,11 +218,6 @@ struct intel_gt {
 * i.e. how the CCS streams are distributed amongs the slices.
 */
struct {
-   /*
-* Mask of the non fused CCS slices
-* to be used for the load balancing
-*/
-   intel_engine_mask_t cslices;
u32 mode_reg_val;
} ccs;
 
-- 
2.45.2

[PATCH v3 07/15] drm/i915/gt: Manage CCS engine creation within UABI exposure

2024-08-23 Thread Andi Shyti

In commit ea315f98e5d6 ("drm/i915/gt: Do not generate the command
streamer for all the CCS"), we restricted the creation of
physical CCS engines to only one stream. This allowed the user to
submit a single compute workload, with all CCS slices sharing the
workload from that stream.

This patch removes that limitation but still exposes only one
stream to the user. The physical memory for each engine remains
allocated but unused; the user will still see only one engine
exposed.

Do this by adding only one engine to the UABI list, ensuring that
only one engine is visible to the user.

Signed-off-by: Andi Shyti 
---
 drivers/gpu/drm/i915/gt/intel_engine_cs.c   | 23 -
 drivers/gpu/drm/i915/gt/intel_engine_user.c | 17 ---
 2 files changed, 14 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 4d30a86016f2..def255ee0b96 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -876,29 +876,6 @@ static intel_engine_mask_t init_engine_mask(struct 
intel_gt *gt)
info->engine_mask &= ~BIT(GSC0);
}
 
-   /*
-* Do not create the command streamer for CCS slices beyond the first.
-* All the workload submitted to the first engine will be shared among
-* all the slices.
-*
-* Once the user will be allowed to customize the CCS mode, then this
-* check needs to be removed.
-*/
-   if (IS_DG2(gt->i915)) {
-   u8 first_ccs = __ffs(CCS_MASK(gt));
-
-   /*
-* Store the number of active cslices before
-* changing the CCS engine configuration
-*/
-   gt->ccs.cslices = CCS_MASK(gt);
-
-   /* Mask off all the CCS engine */
-   info->engine_mask &= ~GENMASK(CCS3, CCS0);
-   /* Put back in the first CCS engine */
-   info->engine_mask |= BIT(_CCS(first_ccs));
-   }
-
return info->engine_mask;
 }
 
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c 
b/drivers/gpu/drm/i915/gt/intel_engine_user.c
index cd7662b1ad59..8e5284af8335 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
@@ -246,6 +246,20 @@ void intel_engines_driver_register(struct drm_i915_private 
*i915)
 
GEM_BUG_ON(uabi_class >=
   ARRAY_SIZE(i915->engine_uabi_class_count));
+
+   /* Fix up the mapping to match default execbuf::user_map[] */
+   add_legacy_ring(&ring, engine);
+
+   /*
+* Do not create the command streamer for CCS slices beyond the
+* first. All the workload submitted to the first engine will be
+* shared among all the slices.
+*/
+   if (IS_DG2(i915) &&
+   uabi_class == I915_ENGINE_CLASS_COMPUTE &&
+   engine->uabi_instance)
+   goto clear_node_continue;
+
i915->engine_uabi_class_count[uabi_class]++;
 
rb_link_node(&engine->uabi_node, prev, p);
@@ -255,9 +269,6 @@ void intel_engines_driver_register(struct drm_i915_private 
*i915)
engine->uabi_class,
engine->uabi_instance) != 
engine);
 
-   /* Fix up the mapping to match default execbuf::user_map[] */
-   add_legacy_ring(&ring, engine);
-
prev = &engine->uabi_node;
p = &prev->rb_right;
 
-- 
2.45.2

[PATCH v3 06/15] drm/i915/gt: Introduce for_each_enabled_engine() and apply it in selftests

2024-08-23 Thread Andi Shyti

Selftests should run only on enabled engines, as disabled engines
are not intended for use. A practical example is when, on DG2
machines, the user chooses to utilize only one CCS stream instead
of all four.

To address this, introduce the for_each_enabled_engine() loop,
which will skip engines when they are marked as RB_EMPTY.

Signed-off-by: Andi Shyti 
---
 drivers/gpu/drm/i915/gt/intel_gt.h| 12 +
 drivers/gpu/drm/i915/gt/selftest_context.c|  6 +--
 drivers/gpu/drm/i915/gt/selftest_engine_cs.c  |  4 +-
 .../drm/i915/gt/selftest_engine_heartbeat.c   |  6 +--
 drivers/gpu/drm/i915/gt/selftest_engine_pm.c  |  6 +--
 drivers/gpu/drm/i915/gt/selftest_execlists.c  | 52 +--
 drivers/gpu/drm/i915/gt/selftest_gt_pm.c  |  2 +-
 drivers/gpu/drm/i915/gt/selftest_hangcheck.c  | 22 
 drivers/gpu/drm/i915/gt/selftest_lrc.c| 18 +++
 drivers/gpu/drm/i915/gt/selftest_mocs.c   |  6 +--
 drivers/gpu/drm/i915/gt/selftest_rc6.c|  4 +-
 drivers/gpu/drm/i915/gt/selftest_reset.c  |  8 +--
 .../drm/i915/gt/selftest_ring_submission.c|  2 +-
 drivers/gpu/drm/i915/gt/selftest_rps.c| 14 ++---
 drivers/gpu/drm/i915/gt/selftest_timeline.c   | 14 ++---
 drivers/gpu/drm/i915/gt/selftest_tlb.c|  2 +-
 .../gpu/drm/i915/gt/selftest_workarounds.c| 14 ++---
 17 files changed, 102 insertions(+), 90 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h 
b/drivers/gpu/drm/i915/gt/intel_gt.h
index 998ca029b73a..1c9d861241ad 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt.h
@@ -188,6 +188,18 @@ int intel_gt_tiles_init(struct drm_i915_private *i915);
 (id__)++) \
for_each_if ((engine__) = (gt__)->engine[(id__)])
 
+/*
+ * Iterator over all initialized and enabled engines. Some engines, like CCS,
+ * may be "disabled" (i.e., not exposed to the user). Disabling is indicated
+ * by marking the rb_node as empty.
+ */
+#define for_each_enabled_engine(engine__, gt__, id__) \
+   for ((id__) = 0; \
+(id__) < I915_NUM_ENGINES; \
+(id__)++) \
+   for_each_if (((engine__) = (gt__)->engine[(id__)]) && \
+(!RB_EMPTY_NODE(&(engine__)->uabi_node)))
+
 /* Iterator over subset of engines selected by mask */
 #define for_each_engine_masked(engine__, gt__, mask__, tmp__) \
for ((tmp__) = (mask__) & (gt__)->info.engine_mask; \
diff --git a/drivers/gpu/drm/i915/gt/selftest_context.c 
b/drivers/gpu/drm/i915/gt/selftest_context.c
index 5eb46700dc4e..9976e231248d 100644
--- a/drivers/gpu/drm/i915/gt/selftest_context.c
+++ b/drivers/gpu/drm/i915/gt/selftest_context.c
@@ -157,7 +157,7 @@ static int live_context_size(void *arg)
 * HW tries to write past the end of one.
 */
 
-   for_each_engine(engine, gt, id) {
+   for_each_enabled_engine(engine, gt, id) {
struct file *saved;
 
if (!engine->context_size)
@@ -311,7 +311,7 @@ static int live_active_context(void *arg)
enum intel_engine_id id;
int err = 0;
 
-   for_each_engine(engine, gt, id) {
+   for_each_enabled_engine(engine, gt, id) {
err = __live_active_context(engine);
if (err)
break;
@@ -424,7 +424,7 @@ static int live_remote_context(void *arg)
enum intel_engine_id id;
int err = 0;
 
-   for_each_engine(engine, gt, id) {
+   for_each_enabled_engine(engine, gt, id) {
err = __live_remote_context(engine);
if (err)
break;
diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_cs.c 
b/drivers/gpu/drm/i915/gt/selftest_engine_cs.c
index 5ffa5e30f419..038723a401df 100644
--- a/drivers/gpu/drm/i915/gt/selftest_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/selftest_engine_cs.c
@@ -142,7 +142,7 @@ static int perf_mi_bb_start(void *arg)
return 0;
 
wakeref = perf_begin(gt);
-   for_each_engine(engine, gt, id) {
+   for_each_enabled_engine(engine, gt, id) {
struct intel_context *ce = engine->kernel_context;
struct i915_vma *batch;
u32 cycles[COUNT];
@@ -270,7 +270,7 @@ static int perf_mi_noop(void *arg)
return 0;
 
wakeref = perf_begin(gt);
-   for_each_engine(engine, gt, id) {
+   for_each_enabled_engine(engine, gt, id) {
struct intel_context *ce = engine->kernel_context;
struct i915_vma *base, *nop;
u32 cycles[COUNT];
diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c 
b/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
index 9e4f0e417b3b..74d4c2dc69cf 100644
--- a/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
+++ b/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
@@ -160,7 +160,7 @@ static int live_idle_flush(void *arg)
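
The iterator introduced in this patch is a plain for loop with a
filtering condition attached. Here is a self-contained userspace
sketch of the same pattern, under stated assumptions: every sketch_*
name is hypothetical, and the boolean "exposed" field stands in for
the !RB_EMPTY_NODE(&engine->uabi_node) test used by the real macro.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NUM_ENGINES 4

struct sketch_iter_engine {
	const char *name;
	bool exposed;	/* stands in for !RB_EMPTY_NODE(&engine->uabi_node) */
};

struct sketch_iter_gt {
	struct sketch_iter_engine *engine[SKETCH_NUM_ENGINES];
};

/* Filter helper with the same shape as for_each_if(): safe against a dangling else. */
#define sketch_for_each_if(cond) if (!(cond)) {} else

/* Visit only the slots that are populated and marked as exposed. */
#define sketch_for_each_enabled_engine(engine__, gt__, id__) \
	for ((id__) = 0; (id__) < SKETCH_NUM_ENGINES; (id__)++) \
		sketch_for_each_if(((engine__) = (gt__)->engine[(id__)]) && \
				   (engine__)->exposed)

int sketch_count_enabled(struct sketch_iter_gt *gt)
{
	struct sketch_iter_engine *engine;
	int id, count = 0;

	sketch_for_each_enabled_engine(engine, gt, id)
		count++;

	return count;
}

/* Four slots: two exposed engines, one hidden engine, one empty slot. */
int sketch_iter_demo(void)
{
	struct sketch_iter_engine ccs0 = { "ccs0", true };
	struct sketch_iter_engine ccs1 = { "ccs1", false };
	struct sketch_iter_engine ccs3 = { "ccs3", true };
	struct sketch_iter_gt gt = { { &ccs0, &ccs1, NULL, &ccs3 } };

	return sketch_count_enabled(&gt);
}
```

The empty slot and the hidden engine are both skipped, so the loop
body runs only for ccs0 and ccs3.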

[PATCH v3 05/15] drm/i915/gem: Mark and verify UABI engine validity

2024-08-23 Thread Andi Shyti

Mark engines as invalid when they are not added to the UABI list
to prevent accidental assignment of batch buffers.

Currently, this change is mostly precautionary with minimal
impact. However, in the future, when CCS engines will be
dynamically added and removed by the user, this mechanism will
be used for determining engine validity.

Signed-off-by: Andi Shyti 
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 28 +--
 drivers/gpu/drm/i915/gt/intel_engine_user.c   |  9 --
 2 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index c58290274f97..770875e72056 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -2682,6 +2682,22 @@ eb_select_legacy_ring(struct i915_execbuffer *eb)
return user_ring_map[user_ring_id];
 }
 
+static bool engine_valid(struct intel_context *ce)
+{
+   if (!intel_engine_is_virtual(ce->engine))
+   return !RB_EMPTY_NODE(&ce->engine->uabi_node);
+
+   /*
+* TODO: check virtual sibilings; we need to walk through all the
+* virtual engines and ask whether the physical engine where it is based
+* is still valid. For each of them we need to check with
+* RB_EMPTY_NODE(...)
+*
+* This can be a placed in a new ce_ops.
+*/
+   return true;
+}
+
 static int
 eb_select_engine(struct i915_execbuffer *eb)
 {
@@ -2712,8 +2728,6 @@ eb_select_engine(struct i915_execbuffer *eb)
eb->num_batches = ce->parallel.number_children + 1;
gt = ce->engine->gt;
 
-   for_each_child(ce, child)
-   intel_context_get(child);
eb->wakeref = intel_gt_pm_get(ce->engine->gt);
/*
 * Keep GT0 active on MTL so that i915_vma_parked() doesn't
@@ -2722,6 +2736,16 @@ eb_select_engine(struct i915_execbuffer *eb)
if (gt->info.id)
eb->wakeref_gt0 = intel_gt_pm_get(to_gt(gt->i915));
 
+   /* We need to hold the wakeref to stabilize i915->uabi_engines */
+   if (!engine_valid(ce)) {
+   intel_context_put(ce);
+   err = -ENODEV;
+   goto err;
+   }
+
+   for_each_child(ce, child)
+   intel_context_get(child);
+
if (!test_bit(CONTEXT_ALLOC_BIT, &ce->flags)) {
err = intel_context_alloc_state(ce);
if (err)
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c 
b/drivers/gpu/drm/i915/gt/intel_engine_user.c
index 11cc06c0c785..cd7662b1ad59 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
@@ -220,7 +220,7 @@ void intel_engines_driver_register(struct drm_i915_private 
*i915)
container_of(it, typeof(*engine), uabi_list);
 
if (intel_gt_has_unrecoverable_error(engine->gt))
-   continue; /* ignore incomplete engines */
+   goto clear_node_continue; /* ignore incomplete engines 
*/
 
GEM_BUG_ON(engine->class >= ARRAY_SIZE(uabi_classes));
engine->uabi_class = uabi_classes[engine->class];
@@ -242,7 +242,7 @@ void intel_engines_driver_register(struct drm_i915_private 
*i915)
  engine->uabi_instance);
 
if (uabi_class > I915_LAST_UABI_ENGINE_CLASS)
-   continue;
+   goto clear_node_continue;
 
GEM_BUG_ON(uabi_class >=
   ARRAY_SIZE(i915->engine_uabi_class_count));
@@ -260,6 +260,11 @@ void intel_engines_driver_register(struct drm_i915_private 
*i915)
 
prev = &engine->uabi_node;
p = &prev->rb_right;
+
+   continue;
+
+clear_node_continue:
+   RB_CLEAR_NODE(&engine->uabi_node);
}
 
if (IS_ENABLED(CONFIG_DRM_I915_SELFTESTS) &&
-- 
2.45.2
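
The validity check above relies on a property of kernel rb_nodes: a
node that has been cleared points at itself, so "not in the tree" can
be tested without a separate flag. The following userspace sketch
models that idea. It is not the kernel implementation: the sketch_*
types and helpers are hypothetical, the node is reduced to a single
word, and the linking helper is a crude stand-in for rb_link_node().

```c
#include <errno.h>
#include <stdbool.h>

/* Reduced stand-in for struct rb_node: only the parent/colour word. */
struct sketch_rb_node {
	unsigned long parent_color;
};

/* Models RB_CLEAR_NODE(): an unlinked node points at itself. */
void sketch_rb_clear_node(struct sketch_rb_node *node)
{
	node->parent_color = (unsigned long)node;
}

/* Models RB_EMPTY_NODE(): true when the node is not linked into a tree. */
bool sketch_rb_empty_node(const struct sketch_rb_node *node)
{
	return node->parent_color == (unsigned long)node;
}

/* Crude stand-in for rb_link_node(): record some other node as parent. */
void sketch_rb_link_node(struct sketch_rb_node *node,
			 struct sketch_rb_node *parent)
{
	node->parent_color = (unsigned long)parent;
}

struct sketch_uabi_engine {
	struct sketch_rb_node uabi_node;
};

/* Refuse engines that were never added to the (modelled) UABI tree. */
int sketch_select_engine(const struct sketch_uabi_engine *engine)
{
	if (sketch_rb_empty_node(&engine->uabi_node))
		return -ENODEV;

	return 0;
}

/* An engine whose node was cleared is rejected. */
int sketch_demo_hidden(void)
{
	struct sketch_uabi_engine hidden;

	sketch_rb_clear_node(&hidden.uabi_node);
	return sketch_select_engine(&hidden);
}

/* An engine whose node was linked under a parent is accepted. */
int sketch_demo_exposed(void)
{
	struct sketch_rb_node root = { 0 };
	struct sketch_uabi_engine exposed;

	sketch_rb_link_node(&exposed.uabi_node, &root);
	return sketch_select_engine(&exposed);
}
```

The point of the design is that the rb_node doubles as the "exposed to
userspace" marker, so no extra state has to be kept in sync with the
tree.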

[PATCH v3 04/15] drm/i915/gt: Refactor uabi engine class/instance list creation

2024-08-23 Thread Andi Shyti

For the upcoming changes we need a cleaner way to build the list
of uabi engines.

Suggested-by: Tvrtko Ursulin 
Signed-off-by: Andi Shyti 
---
 drivers/gpu/drm/i915/gt/intel_engine_user.c | 29 -
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c 
b/drivers/gpu/drm/i915/gt/intel_engine_user.c
index 833987015b8b..11cc06c0c785 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
@@ -203,7 +203,7 @@ static void engine_rename(struct intel_engine_cs *engine, 
const char *name, u16
 
 void intel_engines_driver_register(struct drm_i915_private *i915)
 {
-   u16 name_instance, other_instance = 0;
+   u16 class_instance[I915_LAST_UABI_ENGINE_CLASS + 2] = { };
struct legacy_ring ring = {};
struct list_head *it, *next;
struct rb_node **p, *prev;
@@ -214,6 +214,8 @@ void intel_engines_driver_register(struct drm_i915_private 
*i915)
prev = NULL;
p = &i915->uabi_engines.rb_node;
list_for_each_safe(it, next, &engines) {
+   u16 uabi_class;
+
struct intel_engine_cs *engine =
container_of(it, typeof(*engine), uabi_list);
 
@@ -222,15 +224,14 @@ void intel_engines_driver_register(struct 
drm_i915_private *i915)
 
GEM_BUG_ON(engine->class >= ARRAY_SIZE(uabi_classes));
engine->uabi_class = uabi_classes[engine->class];
-   if (engine->uabi_class == I915_NO_UABI_CLASS) {
-   name_instance = other_instance++;
-   } else {
-   GEM_BUG_ON(engine->uabi_class >=
-  ARRAY_SIZE(i915->engine_uabi_class_count));
-   name_instance =
-   
i915->engine_uabi_class_count[engine->uabi_class]++;
-   }
-   engine->uabi_instance = name_instance;
+
+   if (engine->uabi_class == I915_NO_UABI_CLASS)
+   uabi_class = I915_LAST_UABI_ENGINE_CLASS + 1;
+   else
+   uabi_class = engine->uabi_class;
+
+   GEM_BUG_ON(uabi_class >= ARRAY_SIZE(class_instance));
+   engine->uabi_instance = class_instance[uabi_class]++;
 
/*
 * Replace the internal name with the final user and log facing
@@ -238,11 +239,15 @@ void intel_engines_driver_register(struct 
drm_i915_private *i915)
 */
engine_rename(engine,
  intel_engine_class_repr(engine->class),
- name_instance);
+ engine->uabi_instance);
 
-   if (engine->uabi_class == I915_NO_UABI_CLASS)
+   if (uabi_class > I915_LAST_UABI_ENGINE_CLASS)
continue;
 
+   GEM_BUG_ON(uabi_class >=
+  ARRAY_SIZE(i915->engine_uabi_class_count));
+   i915->engine_uabi_class_count[uabi_class]++;
+
rb_link_node(&engine->uabi_node, prev, p);
rb_insert_color(&engine->uabi_node, &i915->uabi_engines);
 
-- 
2.45.2
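
The refactor above replaces two ad-hoc counters with one array indexed
by class, plus one extra slot shared by engines that have no UABI
class. A small userspace sketch of that numbering scheme follows. The
constant values and all sketch_* names are assumptions made for
illustration; they are not taken from the i915 headers.

```c
#include <stdint.h>

#define SKETCH_LAST_UABI_CLASS	4	/* hypothetical value */
#define SKETCH_NO_UABI_CLASS	0xffff	/* hypothetical "no class" marker */
#define SKETCH_NUM_SLOTS	(SKETCH_LAST_UABI_CLASS + 2)

/* One counter per UABI class, plus one shared slot for "no class". */
struct sketch_instances {
	uint16_t class_instance[SKETCH_NUM_SLOTS];
};

/* Return the next free instance number for the given class. */
uint16_t sketch_next_instance(struct sketch_instances *s, uint16_t uabi_class)
{
	uint16_t slot;

	if (uabi_class == SKETCH_NO_UABI_CLASS)
		slot = SKETCH_LAST_UABI_CLASS + 1;
	else
		slot = uabi_class;

	/* The patch uses GEM_BUG_ON() here; the sketch just bails out. */
	if (slot >= SKETCH_NUM_SLOTS)
		return UINT16_MAX;

	return s->class_instance[slot]++;
}

/* Returns 1 when the counters behave independently per class. */
int sketch_instances_demo(void)
{
	struct sketch_instances s = { { 0 } };
	uint16_t a = sketch_next_instance(&s, 4);
	uint16_t b = sketch_next_instance(&s, 4);
	uint16_t c = sketch_next_instance(&s, SKETCH_NO_UABI_CLASS);
	uint16_t d = sketch_next_instance(&s, 0);

	return a == 0 && b == 1 && c == 0 && d == 0;
}
```

Each class counts from zero on its own, and engines without a UABI
class get their own sequence in the last slot.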

[PATCH v3 03/15] drm/i915/gt: Allow the creation of multi-mode CCS masks

2024-08-23 Thread Andi Shyti

Until now, we have only set CCS mode balancing to 1, which means
that only one compute engine is exposed to the user. The stream
of compute commands submitted to that engine is then shared among
all the dedicated execution units.

This is done by calling the intel_gt_apply_ccs_mode() function.

With this change, the aforementioned function takes an additional
parameter called 'mode' that specifies the desired mode to be set
for the CCS engines balancing. The mode parameter can have the
following values:

 - mode = 0: CCS load balancing mode 1 (1 CCS engine exposed)
 - mode = 1: CCS load balancing mode 2 (2 CCS engines exposed)
 - mode = 3: CCS load balancing mode 4 (4 CCS engines exposed)

This allows us to generate the appropriate register value to be
written to CCS_MODE, configuring how the exposed engine streams
will be submitted to the execution units.

No functional changes are intended yet, as no mode higher than
'0' is currently being set.

Signed-off-by: Andi Shyti 
---
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 85 +
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h |  2 +-
 2 files changed, 72 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c 
b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
index fcd07eb4728b..a6c33b471567 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
@@ -4,35 +4,92 @@
  */
 
 #include "i915_drv.h"
-#include "intel_gt.h"
 #include "intel_gt_ccs_mode.h"
 #include "intel_gt_regs.h"
 
 static void intel_gt_apply_ccs_mode(struct intel_gt *gt)
 {
+   unsigned long cslices_mask = gt->ccs.cslices;
+   u32 mode_val = 0;
+   /* CCS engine id, i.e. the engines position in the engine's bitmask */
+   int engine;
int cslice;
-   u32 mode = 0;
-   int first_ccs = __ffs(CCS_MASK(gt));
 
-   /* Build the value for the fixed CCS load balancing */
+   /*
+* The mode has two bit dedicated for each engine
+* that will be used for the CCS balancing algorithm:
+*
+*BIT | CCS slice
+*   --
+* 0  | CCS slice
+* 1  | 0
+*   --
+* 2  | CCS slice
+* 3  | 1
+*   --
+* 4  | CCS slice
+* 5  | 2
+*   --
+* 6  | CCS slice
+* 7  | 3
+*   --
+*
+* When a CCS slice is not available, then we will write 0x7,
+* oterwise we will write the user engine id which load will
+* be forwarded to that slice.
+*
+* The possible configurations are:
+*
+* 1 engine (ccs0):
+*   slice 0, 1, 2, 3: ccs0
+*
+* 2 engines (ccs0, ccs1):
+*   slice 0, 2: ccs0
+*   slice 1, 3: ccs1
+*
+* 4 engines (ccs0, ccs1, ccs2, ccs3):
+*   slice 0: ccs0
+*   slice 1: ccs1
+*   slice 2: ccs2
+*   slice 3: ccs3
+*/
+   engine = __ffs(cslices_mask);
+
for (cslice = 0; cslice < I915_MAX_CCS; cslice++) {
-   if (gt->ccs.cslices & BIT(cslice))
+   if (!(cslices_mask & BIT(cslice))) {
/*
-* If available, assign the cslice
-* to the first available engine...
+* If not available, mark the slice as unavailable
+* and no task will be dispatched here.
 */
-   mode |= XEHP_CCS_MODE_CSLICE(cslice, first_ccs);
+   mode_val |= XEHP_CCS_MODE_CSLICE(cslice,
+XEHP_CCS_MODE_CSLICE_MASK);
+   continue;
+   }
 
-   else
+   mode_val |= XEHP_CCS_MODE_CSLICE(cslice, engine);
+
+   engine = find_next_bit(&cslices_mask, I915_MAX_CCS, engine + 1);
+   /*
+* If "engine" has reached the I915_MAX_CCS value it means that
+* we have gone through all the unfused engines and now we need
+* to reset its value to the first engine.
+*
+* From the find_next_bit() description:
+*
+* "Returns the bit number for the next set bit
+* If no bits are set, returns @size."
+*/
+   if (engine == I915_MAX_CCS) {
/*
-* ... otherwise, mark the cslice as
-* unavailable if no CCS dispatches here
+* CCS mode, will be used later to
+* re
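
As a worked example of the register layout described in the comment
above, here is a userspace sketch of the simplest configuration, "1
engine (ccs0)": every available slice is pointed at the first
available CCS and every fused-off slice gets the 0x7 marker. This
follows the lines being removed by the hunk, not the new multi-mode
loop, which is cut off in this archive. The sketch_* names are
hypothetical. The 3-bit field width is an assumption chosen so that
the 0x7 marker fits in one field; the real XEHP_CCS_MODE_CSLICE()
macro is not shown in this thread.

```c
#include <stdint.h>

#define SKETCH_MAX_CCS		4
#define SKETCH_CSLICE_MASK	0x7u	/* "slice unavailable" marker from the patch */
#define SKETCH_CSLICE_WIDTH	3	/* assumed field width, see the note above */
#define SKETCH_CSLICE(cslice, ccs) \
	((uint32_t)(ccs) << ((cslice) * SKETCH_CSLICE_WIDTH))

/* Build the mode value that routes every available slice to one CCS. */
uint32_t sketch_ccs_mode1_value(uint32_t cslices_mask)
{
	uint32_t mode_val = 0;
	int first_ccs, cslice;

	/* __builtin_ctz() is undefined for 0, so guard the empty mask. */
	first_ccs = cslices_mask ? __builtin_ctz(cslices_mask) : 0;

	for (cslice = 0; cslice < SKETCH_MAX_CCS; cslice++) {
		if (cslices_mask & (1u << cslice))
			/* Available: forward the first CCS stream here. */
			mode_val |= SKETCH_CSLICE(cslice, first_ccs);
		else
			/* Fused off: nothing is dispatched to this slice. */
			mode_val |= SKETCH_CSLICE(cslice, SKETCH_CSLICE_MASK);
	}

	return mode_val;
}
```

Under these assumptions a fully populated part (mask 0xf) yields 0x0,
while a part with only slices 1 and 3 (mask 0xa) yields 0x3cf: slices
0 and 2 carry 0x7, slices 1 and 3 carry engine id 1.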

[PATCH v3 02/15] drm/i915/gt: Move the CCS mode variable to a global position

2024-08-23 Thread Andi Shyti

Store the CCS mode value in the intel_gt->ccs structure to make
it available for future instances that may need to change its
value.

Name it mode_reg_val because it holds the value that will
be written into the CCS_MODE register, determining the CCS
balancing and, consequently, the number of engines generated.

No functional changes intended.

Signed-off-by: Andi Shyti 
---
 drivers/gpu/drm/i915/gt/intel_gt.c  |  3 +++
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 16 +++-
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h |  2 +-
 drivers/gpu/drm/i915/gt/intel_gt_types.h| 11 +++
 drivers/gpu/drm/i915/gt/intel_workarounds.c |  6 --
 5 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c 
b/drivers/gpu/drm/i915/gt/intel_gt.c
index a6c69a706fd7..5af0527d822d 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -18,6 +18,7 @@
 #include "intel_ggtt_gmch.h"
 #include "intel_gt.h"
 #include "intel_gt_buffer_pool.h"
+#include "intel_gt_ccs_mode.h"
 #include "intel_gt_clock_utils.h"
 #include "intel_gt_debugfs.h"
 #include "intel_gt_mcr.h"
@@ -136,6 +137,8 @@ int intel_gt_init_mmio(struct intel_gt *gt)
intel_sseu_info_init(gt);
intel_gt_mcr_init(gt);
 
+   intel_gt_ccs_mode_init(gt);
+
return intel_engines_init_mmio(gt);
 }
 
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c 
b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
index 3c62a44e9106..fcd07eb4728b 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
@@ -8,15 +8,12 @@
 #include "intel_gt_ccs_mode.h"
 #include "intel_gt_regs.h"
 
-unsigned int intel_gt_apply_ccs_mode(struct intel_gt *gt)
+static void intel_gt_apply_ccs_mode(struct intel_gt *gt)
 {
int cslice;
u32 mode = 0;
int first_ccs = __ffs(CCS_MASK(gt));
 
-   if (!IS_DG2(gt->i915))
-   return 0;
-
/* Build the value for the fixed CCS load balancing */
for (cslice = 0; cslice < I915_MAX_CCS; cslice++) {
if (gt->ccs.cslices & BIT(cslice))
@@ -35,5 +32,14 @@ unsigned int intel_gt_apply_ccs_mode(struct intel_gt *gt)
 XEHP_CCS_MODE_CSLICE_MASK);
}
 
-   return mode;
+   gt->ccs.mode_reg_val = mode;
+}
+
+void intel_gt_ccs_mode_init(struct intel_gt *gt)
+{
+   if (!IS_DG2(gt->i915))
+   return;
+
+   /* Initialize the CCS mode setting */
+   intel_gt_apply_ccs_mode(gt);
 }
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h 
b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h
index 55547f2ff426..0f2506586a41 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h
@@ -8,6 +8,6 @@
 
 struct intel_gt;
 
-unsigned int intel_gt_apply_ccs_mode(struct intel_gt *gt);
+void intel_gt_ccs_mode_init(struct intel_gt *gt);
 
 #endif /* __INTEL_GT_CCS_MODE_H__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h 
b/drivers/gpu/drm/i915/gt/intel_gt_types.h
index bcee084b1f27..9e257f34d05b 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
@@ -207,12 +207,23 @@ struct intel_gt {
[MAX_ENGINE_INSTANCE + 1];
enum intel_submission_method submission_method;
 
+   /*
+* Track fixed mapping between CCS engines and compute slices.
+*
+* In order to w/a HW that has the inability to dynamically load
+* balance between CCS engines and EU in the compute slices, we have to
+* reconfigure a static mapping on the fly.
+*
+* The mode variable is set by the user and sets the balancing mode,
+* i.e. how the CCS streams are distributed amongs the slices.
+*/
struct {
/*
 * Mask of the non fused CCS slices
 * to be used for the load balancing
 */
intel_engine_mask_t cslices;
+   u32 mode_reg_val;
} ccs;
 
/*
diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c 
b/drivers/gpu/drm/i915/gt/intel_workarounds.c
index f3082fad3f45..f6135be3cd86 100644
--- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
@@ -2727,7 +2727,7 @@ add_render_compute_tuning_settings(struct intel_gt *gt,
 static void ccs_engine_wa_mode(struct intel_engine_cs *engine, struct 
i915_wa_list *wal)
 {
struct intel_gt *gt = engine->gt;
-   u32 mode;
+   u32 mode = gt->ccs.mode_reg_val;
 
if (!IS_DG2(gt->i915))
return;
@@ -2743,8 +2743,10 @@ static void ccs_engine_wa_mode(struct intel_engine_cs 
*engine, struct i915_wa_li
/*
 * After hav

[PATCH v3 01/15] drm/i915/gt: Avoid using masked workaround for CCS_MODE setting

2024-08-23 Thread Andi Shyti

When setting the CCS mode, we mistakenly used wa_masked_en() to
apply the workaround, which reads from the register and masks the
existing value with the new one.

Our intention was to write the value directly, without masking
it.

So far, this hasn't caused issues because we've been using a
register value that only enables a single CCS engine, typically
with an ID of '0'.

However, in upcoming patches, we will be utilizing multiple
engines, and it's crucial that we write the new value directly
without any masking.

Signed-off-by: Andi Shyti 
---
 drivers/gpu/drm/i915/gt/intel_workarounds.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c 
b/drivers/gpu/drm/i915/gt/intel_workarounds.c
index bfe6d8fc820f..f3082fad3f45 100644
--- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
@@ -2745,7 +2745,7 @@ static void ccs_engine_wa_mode(struct intel_engine_cs 
*engine, struct i915_wa_li
 * assign all slices to a single CCS. We will call it CCS mode 1
 */
mode = intel_gt_apply_ccs_mode(gt);
-   wa_masked_en(wal, XEHP_CCS_MODE, mode);
+   wa_add(wal, XEHP_CCS_MODE, 0, mode, mode, false);
 }
 
 /*
-- 
2.45.2
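
To see why the masked helper is a poor fit once more than one mode
value is in play, here is a simplified userspace model that contrasts
a masked-bit-enable write with a plain write. It assumes hardware
that treats the upper 16 bits of the written word as a per-bit write
enable for the lower 16, which is how i915 masked registers are
usually described. Whether CCS_MODE itself behaves that way is not
stated in this thread, and the sketch_* helpers are hypothetical.

```c
#include <stdint.h>

/*
 * Model of a masked-bit-enable write: the payload carries the value in
 * the low half and the same bits as write enable in the high half, so
 * only the bits set in val can change, and they can only become 1.
 */
uint32_t sketch_masked_bit_enable_write(uint32_t old, uint16_t val)
{
	uint32_t payload = ((uint32_t)val << 16) | val;
	uint32_t write_enable = payload >> 16;

	return (old & ~write_enable) | (payload & write_enable);
}

/* Model of a plain write: the register simply takes the new value. */
uint32_t sketch_direct_write(uint32_t old, uint32_t val)
{
	(void)old;
	return val;
}
```

In this model, going from a previous value of 0x3cf to a new value of
0x0 leaves 0x3cf in place with the masked write, because no bit is
write-enabled, whereas the direct write stores 0x0 as intended.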

[PATCH v3 00/15] CCS static load balance

2024-08-23 Thread Andi Shyti

Hi,

This patch series introduces static load balancing for GPUs with
multiple compute engines. It's a lengthy series, and some
challenging aspects still need to be resolved.

I have tried to split the work as much as possible to facilitate
the review process.

To summarize, in patches 1 to 14, no functional changes occur
except for the addition of the num_cslices interface. The
significant changes happen in patch 15, which is the core part of
the CCS mode setting, utilizing the groundwork laid in the
earlier patches.

In this updated approach, the focus is now on managing the UABI
engine list, which controls the engines exposed to userspace.
Instead of manipulating physical engines and their memory, we now
handle engine exposure through this list.

I would greatly appreciate further input from all reviewers who
have already assisted with the previous work.

IGT tests have also been developed, but I haven't sent them yet.

Thank you Chris for the offline reviews.

Thanks,
Andi

Changelog:
==
PATCHv2 -> PATCHv3
--
 - Fix a NULL pointer dereference during module unload.
   In i915_gem_driver_remove() I was accessing the gt after the
   gt was removed. Use the dev_priv, instead (obviously!).
 - Fix a lockdep issue: Some of the uabi_engines_mutex unlocks
   were not correctly placed in the exit paths.
 - Fix a checkpatch error for spaces after and before parentheses
   in the for_each_enabled_engine() definition.

PATCHv1 -> PATCHv2
--
 - Use uabi_mutex to protect the uabi_engines, not the engine
   itself. Rename it to uabi_engines_mutex.
 - Use kobject_add/kobject_del for adding and removing
   interfaces; this way we no longer need to destroy and recreate
   the engines. Refactor intel_engine_add_single_sysfs() to
   reflect this scenario.
 - After adding engines to the rb_tree check that they have been
   added correctly.
 - Fix rb_find_add() compare function to take into account also
   the class, not just the instance.

RFCv2 -> PATCHv1

 - Removed gt->ccs.mutex
 - Rename m -> width, ccs_id -> engine in
   intel_gt_apply_ccs_mode().
 - In the CCS register value calculation
   (intel_gt_apply_ccs_mode()) the engine (ccs_id) needs to move
   along the ccs_mask (set by the user) instead of the
   cslice_mask.
 - Add GEM_BUG_ON after calculating the new ccs_mask
   (update_ccs_mask()) to make sure all engines have been
   evaluated (i.e. ccs_mask must be '0' at the end of the
   algorithm).
 - Move wakeref lock before evaluating intel_gt_pm_is_awake() and
   fix exit path accordingly.
 - Use a more compact form in intel_gt_sysfs_ccs_init() and
   add_uabi_ccs_engines() when evaluating sysfs_create_file(): no
   need to store the return value to the err variable which is
   unused. Get rid of err.
 - Print a warning instead of a debug message if we fail to
   create the sysfs files.
 - If engine files creation fails in
   intel_engine_add_single_sysfs(), print a warning, not an
   error.
 - Rename gt->ccs.ccs_mask to gt->ccs.id_mask and add a comment
   to explain its purpose.
 - During uabi engine creation, in
   intel_engines_driver_register(), the uabi_ccs_instance is
   redundant because the ccs_instances is already tracked in
   engine->uabi_instance.
 - Mark add_uabi_ccs_engines() and remove_uabi_ccs_engines() as
   __maybe_unused not to break bisectability. They wouldn't
   compile in their own commit. They will be used in the next
   patch and the __maybe_unused is removed.
 - Update engine's workaround every time a new mode is set in
   update_ccs_mask().
 - Mark engines as valid or invalid using their status as
   rb_node. Invalid engines are marked as invalid using
   RB_CLEAR_NODE(). Execbufs will check for their validity when
   selecting the engine to be combined to a context.
 - Create for_each_enabled_engine() which skips the non valid
   engines and use it in selftests.

RFCv1 -> RFCv2
--
Compared to the first version I've taken a completely different
approach to adding and removing engines. In v1, physical engines
were directly added and removed, along with the memory allocated
to them, each time the user changed the CCS mode (from the
previous cover letter).

Andi Shyti (15):
  drm/i915/gt: Avoid using masked workaround for CCS_MODE setting
  drm/i915/gt: Move the CCS mode variable to a global position
  drm/i915/gt: Allow the creation of multi-mode CCS masks
  drm/i915/gt: Refactor uabi engine class/instance list creation
  drm/i915/gem: Mark and verify UABI engine validity
  drm/i915/gt: Introduce for_each_enabled_engine() and apply it in
selftests
  drm/i915/gt: Manage CCS engine creation within UABI exposure
  drm/i915/gt: Remove cslices mask value from the CCS structure
  drm/i915/gt: Expose the number of total CCS slices
  drm/i915/gt: Store engine-related sysfs kobjects
  drm/i915/gt: Store active CCS mask
  drm/i915: Protect access to

[PATCH v3 00/15] CCS static load balance

2024-08-23 Thread Andi Shyti

Hi,

This patch series introduces static load balancing for GPUs with
multiple compute engines. It's a lengthy series, and some
challenging aspects still need to be resolved.

I have tried to split the work as much as possible to facilitate
the review process.

To summarize, in patches 1 to 14, no functional changes occur
except for the addition of the num_cslices interface. The
significant changes happen in patch 15, which is the core part of
the CCS mode setting, utilizing the groundwork laid in the
earlier patches.

In this updated approach, the focus is now on managing the UABI
engine list, which controls the engines exposed to userspace.
Instead of manipulating phuscal engines and their memory, we now
handle engine exposure through this list.

I would greatly appreciate further input from all reviewers who
have already assisted with the previous work.

IGT tests have also been developed, but I haven't sent them yet.

Thank you Chris for the offline reviews.

Thanks,
Andi

Changelog:
==
PATCHv2 -> PATCHv3
--
 - Fix a NULL pointer dereference during module unload.
   In i915_gem_driver_remove() I was accessing the gt after the
   gt was removed. Use the dev_priv, instead (obviously!).
 - Fix a lockdep issue: Some of the uabi_engines_mutex unlocks
   were not correctly placed in the exit paths.
 - Fix a checkpatch error for spaces after and before parenthesis
   in the for_each_enabled_engine() definition.

PATCHv1 -> PATCHv2
--
 - Use uabi_mutex to protect the uabi_engines, not the engine
   itself. Rename it to uabi_engines_mutex.
 - Use kobject_add/kobject_del for adding and removing
   interfaces, this way we don't need to destroy and recreate the
   engines, anymore. Refactor intel_engine_add_single_sysfs() to
   reflect this scenario.
 - After adding engines to the rb_tree check that they have been
   added correctly.
 - Fix rb_find_add() compare function to take into accoung also
   the class, not just the instance.

RFCv2 -> PATCHv1

 - Removed gt->ccs.mutex
 - Rename m -> width, ccs_id -> engine in
   intel_gt_apply_ccs_mode().
 - In the CCS register value calculation
   (intel_gt_apply_ccs_mode()) the engine (ccs_id) needs to move
   along the ccs_mask (set by the user) instead of the
   cslice_mask.
 - Add GEM_BUG_ON after calculating the new ccs_mask
   (update_ccs_mask()) to make sure all angines have been
   evaluated (i.e. ccs_mask must be '0' at the end of the
   algorithm).
 - move wakeref lock before evaluating intel_gt_pm_is_awake() and
   fix exit path accordingly.
 - Use a more compact form in intel_gt_sysfs_ccs_init() and
   add_uabi_ccs_engines() when evaluating sysfs_create_file(): no
   need to store the return value to the err variable which is
   unused. Get rid of err.
 - Print a warnging instead of a debug message if we fail to
   create the sysfs files.
 - If engine files creation fails in
   intel_engine_add_single_sysfs(), print a warning, not an
   error.
 - Rename gt->ccs.ccs_mask to gt->ccs.id_mask and add a comment
   to explain its purpose.
 - During uabi engine creation, in
   intel_engines_driver_register(), the uabi_ccs_instance is
   redundant because the ccs_instances is already tracked in
   engine->uabi_instance.
 - Mark add_uabi_ccs_engines() and remove_uabi_ccs_engines() as
   __maybe_unused not to break bisectability. They wouldn't
   compile in their own commit. They will be used in the next
   patch and the __maybe_unused is removed.
 - Update engine's workaround every time a new mode is set in
   update_ccs_mask().
 - Mark engines as valid or invalid using their status as
   rb_node. Invalid engines are marked as invalid using
   RB_CLEAR_NODE(). Execbufs will check for their validity when
   selecting the engine to be combined to a context.
 - Create for_each_enabled_engine() which skips the non valid
   engines and use it in selftests.

RFCv1 -> RFCv2
--
Compared to the first version I've taken a completely different
approach to adding and removing engines. In v1 physical engines
were directly added and removed, along with the memory allocated
to them, each time the user changed the CCS mode (from the
previous cover letter).

Andi Shyti (15):
  drm/i915/gt: Avoid using masked workaround for CCS_MODE setting
  drm/i915/gt: Move the CCS mode variable to a global position
  drm/i915/gt: Allow the creation of multi-mode CCS masks
  drm/i915/gt: Refactor uabi engine class/instance list creation
  drm/i915/gem: Mark and verify UABI engine validity
  drm/i915/gt: Introduce for_each_enabled_engine() and apply it in
selftests
  drm/i915/gt: Manage CCS engine creation within UABI exposure
  drm/i915/gt: Remove cslices mask value from the CCS structure
  drm/i915/gt: Expose the number of total CCS slices
  drm/i915/gt: Store engine-related sysfs kobjects
  drm/i915/gt: Store active CCS mask
  drm/i915: Protect access to

[PATCH v2 15/15] drm/i915/gt: Allow the user to change the CCS mode through sysfs

2024-08-22 Thread Andi Shyti

Create the 'ccs_mode' file under

/sys/class/drm/cardX/gt/gt0/ccs_mode

This file allows the user to read and set the current CCS mode.

 - Reading: The user can read the current CCS mode, which can be
   1, 2, or 4. This value is derived from the current engine
   mask.

 - Writing: The user can set the CCS mode to 1, 2, or 4,
   depending on the desired number of exposed engines and the
   required load balancing.

The interface will return -EBUSY if other clients are connected
to i915, or -EINVAL if an invalid value is set.

Signed-off-by: Andi Shyti 
---
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 82 -
 1 file changed, 80 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c 
b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
index 4462e07ee903..b0f69ae435ca 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
@@ -6,6 +6,7 @@
 #include "i915_drv.h"
 #include "intel_engine_user.h"
 #include "intel_gt_ccs_mode.h"
+#include "intel_gt_pm.h"
 #include "intel_gt_print.h"
 #include "intel_gt_regs.h"
 #include "intel_gt_sysfs.h"
@@ -172,7 +173,7 @@ static int rb_engine_cmp(struct rb_node *rb_new, const 
struct rb_node *rb_old)
return new->uabi_class - old->uabi_class;
 }
 
-static void __maybe_unused add_uabi_ccs_engines(struct intel_gt *gt, u32 ccs_mode)
+static void add_uabi_ccs_engines(struct intel_gt *gt, u32 ccs_mode)
 {
struct drm_i915_private *i915 = gt->i915;
intel_engine_mask_t new_ccs_mask, tmp;
@@ -231,7 +232,7 @@ static void __maybe_unused add_uabi_ccs_engines(struct 
intel_gt *gt, u32 ccs_mod
}
 }
 
-static void __maybe_unused remove_uabi_ccs_engines(struct intel_gt *gt, u8 ccs_mode)
+static void remove_uabi_ccs_engines(struct intel_gt *gt, u8 ccs_mode)
 {
struct drm_i915_private *i915 = gt->i915;
intel_engine_mask_t new_ccs_mask, tmp;
@@ -272,8 +273,85 @@ static ssize_t num_cslices_show(struct device *dev,
 }
 static DEVICE_ATTR_RO(num_cslices);
 
+static ssize_t ccs_mode_show(struct device *dev,
+struct device_attribute *attr, char *buff)
+{
+   struct intel_gt *gt = kobj_to_gt(&dev->kobj);
+   u32 ccs_mode;
+
+   ccs_mode = hweight32(gt->ccs.id_mask);
+
+   return sysfs_emit(buff, "%u\n", ccs_mode);
+}
+
+static ssize_t ccs_mode_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buff, size_t count)
+{
+   struct intel_gt *gt = kobj_to_gt(&dev->kobj);
+   int num_cslices = hweight32(CCS_MASK(gt));
+   int ccs_mode = hweight32(gt->ccs.id_mask);
+   ssize_t ret;
+   u32 val;
+
+   ret = kstrtou32(buff, 0, &val);
+   if (ret)
+   return ret;
+
+   /*
+* As of now possible values to be set are 1, 2, 4,
+* up to the maximum number of available slices
+*/
+   if (!val || val > num_cslices || (num_cslices % val))
+   return -EINVAL;
+
+   /* Let's wait until the GT is no longer in use */
+   ret = intel_gt_pm_wait_for_idle(gt);
+   if (ret)
+   return ret;
+
+   mutex_lock(&gt->wakeref.mutex);
+
+   /*
+* Let's check again that the GT is idle,
+* we don't want to change the CCS mode
+* while someone is using the GT
+*/
+   if (intel_gt_pm_is_awake(gt)) {
+   ret = -EBUSY;
+   goto out;
+   }
+
+   /*
+* Nothing to do if the requested setting
+* is the same as the current one
+*/
+   if (val == ccs_mode)
+   goto out;
+   else if (val > ccs_mode)
+   add_uabi_ccs_engines(gt, val);
+   else
+   remove_uabi_ccs_engines(gt, val);
+
+out:
+   mutex_unlock(&gt->wakeref.mutex);
+
+   return ret ?: count;
+}
+static DEVICE_ATTR_RW(ccs_mode);
+
 void intel_gt_sysfs_ccs_init(struct intel_gt *gt)
 {
if (sysfs_create_file(&gt->sysfs_gt, &dev_attr_num_cslices.attr))
gt_warn(gt, "Failed to create sysfs num_cslices files\n");
+
+   /*
+* Do not create the ccs_mode file for non DG2 platforms
+* because they don't need it as they have only one CCS engine
+*/
+   if (!IS_DG2(gt->i915))
+   return;
+
+   if (sysfs_create_file(&gt->sysfs_gt, &dev_attr_ccs_mode.attr))
+   gt_warn(gt, "Failed to create sysfs ccs_mode files\n");
 }
-- 
2.45.2
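
For anyone who wants to try the range check from ccs_mode_store() in
isolation, here is a minimal userspace sketch. It is not part of the
patch: popcount32() and ccs_mode_is_valid() are hypothetical names that
stand in for hweight32() and the inline condition, and the slice mask is
passed in directly instead of being read through CCS_MASK(gt).

```c
#include <stdbool.h>
#include <stdint.h>

/* Count set bits; a userspace stand-in for the kernel's hweight32(). */
static unsigned int popcount32(uint32_t v)
{
	unsigned int n = 0;

	for (; v; v &= v - 1)
		n++;

	return n;
}

/*
 * Same condition as in ccs_mode_store(), inverted to return "valid":
 * the requested mode must be non-zero, must not exceed the number of
 * available slices, and must divide that number evenly.
 */
static bool ccs_mode_is_valid(uint32_t val, uint32_t cslices_mask)
{
	unsigned int num_cslices = popcount32(cslices_mask);

	return val && val <= num_cslices && !(num_cslices % val);
}
```

With all four slices available (mask 0xf) this accepts 1, 2 and 4 and
rejects 0, 3 and anything above 4. Note that on a part with three
unfused slices the same condition would accept 3, which is a property
of the check as written, not something the sketch adds.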

[PATCH v2 14/15] drm/i915/gt: Implement creation and removal routines for CCS engines

2024-08-22 Thread Andi Shyti

In preparation for upcoming patches, we need routines to
dynamically create and destroy CCS engines based on the CCS mode
that the user wants to set.

The process begins by calculating the engine mask for the engines
that need to be added or removed. We then update the UABI list of
exposed engines and create or destroy the corresponding sysfs
interfaces accordingly.

These functions are not yet in use, so no functional changes are
intended at this stage.

Mark the functions 'add_uabi_ccs_engines()' and
'remove_uabi_ccs_engines()' as '__maybe_unused' to ensure
successful compilation and maintain bisectability. This
annotation will be removed in subsequent commits.

Signed-off-by: Andi Shyti 
---
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 123 
 1 file changed, 123 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c 
b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
index 5eead7b18f57..4462e07ee903 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
@@ -4,10 +4,12 @@
  */
 
 #include "i915_drv.h"
+#include "intel_engine_user.h"
 #include "intel_gt_ccs_mode.h"
 #include "intel_gt_print.h"
 #include "intel_gt_regs.h"
 #include "intel_gt_sysfs.h"
+#include "sysfs_engines.h"
 
 static void intel_gt_apply_ccs_mode(struct intel_gt *gt)
 {
@@ -123,6 +125,29 @@ static void __update_ccs_mask(struct intel_gt *gt, u32 
ccs_mode)
intel_gt_apply_ccs_mode(gt);
 }
 
+static void update_ccs_mask(struct intel_gt *gt, u32 ccs_mode)
+{
+   struct intel_engine_cs *engine;
+   intel_engine_mask_t tmp;
+
+   __update_ccs_mask(gt, ccs_mode);
+
+   /* Update workaround values */
+   for_each_engine_masked(engine, gt, gt->ccs.id_mask, tmp) {
+   struct i915_wa_list *wal = &engine->wa_list;
+   struct i915_wa *wa;
+   int i;
+
+   for (i = 0, wa = wal->list; i < wal->count; i++, wa++) {
+   if (!i915_mmio_reg_equal(wa->reg, XEHP_CCS_MODE))
+   continue;
+
+   wa->set = gt->ccs.mode_reg_val;
+   wa->read = gt->ccs.mode_reg_val;
+   }
+   }
+}
+
 void intel_gt_ccs_mode_init(struct intel_gt *gt)
 {
if (!IS_DG2(gt->i915))
@@ -136,6 +161,104 @@ void intel_gt_ccs_mode_init(struct intel_gt *gt)
__update_ccs_mask(gt, 1);
 }
 
+static int rb_engine_cmp(struct rb_node *rb_new, const struct rb_node *rb_old)
+{
+   struct intel_engine_cs *new = rb_to_uabi_engine(rb_new);
+   struct intel_engine_cs *old = rb_to_uabi_engine(rb_old);
+
+   if (new->uabi_class - old->uabi_class == 0)
+   return new->uabi_instance - old->uabi_instance;
+
+   return new->uabi_class - old->uabi_class;
+}
+
+static void __maybe_unused add_uabi_ccs_engines(struct intel_gt *gt, u32 ccs_mode)
+{
+   struct drm_i915_private *i915 = gt->i915;
+   intel_engine_mask_t new_ccs_mask, tmp;
+   struct intel_engine_cs *e;
+
+   /* Store the current ccs mask */
+   new_ccs_mask = gt->ccs.id_mask;
+   update_ccs_mask(gt, ccs_mode);
+
+   /*
+* Store only the mask of the CCS engines that need to be added by
+* removing from the new mask the engines that are already active
+*/
+   new_ccs_mask = gt->ccs.id_mask & ~new_ccs_mask;
+   new_ccs_mask <<= CCS0;
+
+   for_each_engine_masked(e, gt, new_ccs_mask, tmp) {
+   int err;
+
+   i915->engine_uabi_class_count[I915_ENGINE_CLASS_COMPUTE]++;
+
+   /*
+* The engine is now inserted and marked as valid.
+*
+* rb_find_add() should always return NULL. If it returns a
+* pointer to an rb_node it means that it found the engine we
+* are trying to insert which means that something is really
+* wrong.
+*/
+   if (rb_find_add(&e->uabi_node,
+   &i915->uabi_engines, rb_engine_cmp)) {
+   gt_err(gt, "Failed to apply CCS mode!\n");
+   return;
+   }
+
+   /* We inserted the engine, let's check if now we can find it */
+   GEM_BUG_ON(intel_engine_lookup_user(i915, e->uabi_class,
+   e->uabi_instance) != e);
+
+   /*
+* If the engine has never been used before (e.g. we are moving
+* for the first time from CCS mode 1 to CCS mode 2 or 4), then
+* also its sysfs entry has never been created. In this case its
+* value will be null and we need to allocate it.
+*/
+
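
Two pieces of the (truncated) patch above can be illustrated outside
the kernel. The sketch below is only an approximation under stated
assumptions: engine_mask_t and struct uabi_key are hypothetical
stand-ins for intel_engine_mask_t and the uabi_class/uabi_instance pair
of an engine, and engines_to_remove() is my guess at the mirror image
of the add path, since the removal routine is not visible here.

```c
#include <stdint.h>

typedef uint32_t engine_mask_t;	/* hypothetical stand-in */

/* Engines present in the new mask but not in the old one. */
static engine_mask_t engines_to_add(engine_mask_t old_mask,
				    engine_mask_t new_mask)
{
	return new_mask & ~old_mask;
}

/* Assumed counterpart: engines in the old mask missing from the new. */
static engine_mask_t engines_to_remove(engine_mask_t old_mask,
				       engine_mask_t new_mask)
{
	return old_mask & ~new_mask;
}

struct uabi_key {
	int uabi_class;
	int uabi_instance;
};

/*
 * Ordering in the spirit of rb_engine_cmp(): compare by class first and
 * fall back to the instance only when the classes are equal.
 */
static int uabi_key_cmp(const struct uabi_key *new,
			const struct uabi_key *old)
{
	if (new->uabi_class == old->uabi_class)
		return new->uabi_instance - old->uabi_instance;

	return new->uabi_class - old->uabi_class;
}
```

Going from mode 1 (mask 0x1) to mode 4 (mask 0xf) yields 0xe as the set
of engines to add, which matches the "remove the engines that are
already active" comment in add_uabi_ccs_engines().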

[PATCH v2 13/15] drm/i915/gt: Isolate single sysfs engine file creation

2024-08-22 Thread Andi Shyti

In preparation for upcoming patches, we need the ability to
create and remove individual sysfs files. To facilitate this,
extract from the intel_engines_add_sysfs() function the creation
of individual files.

Signed-off-by: Andi Shyti 
---
 drivers/gpu/drm/i915/gt/sysfs_engines.c | 74 +++--
 drivers/gpu/drm/i915/gt/sysfs_engines.h |  2 +
 2 files changed, 48 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/sysfs_engines.c 
b/drivers/gpu/drm/i915/gt/sysfs_engines.c
index c1cc0981c8fb..ef2eda72ac7f 100644
--- a/drivers/gpu/drm/i915/gt/sysfs_engines.c
+++ b/drivers/gpu/drm/i915/gt/sysfs_engines.c
@@ -9,6 +9,7 @@
 #include "i915_drv.h"
 #include "intel_engine.h"
 #include "intel_engine_heartbeat.h"
+#include "intel_gt_print.h"
 #include "sysfs_engines.h"
 
 struct kobj_engine {
@@ -481,7 +482,7 @@ static void add_defaults(struct kobj_engine *parent)
return;
 }
 
-void intel_engines_add_sysfs(struct drm_i915_private *i915)
+int intel_engine_add_single_sysfs(struct intel_engine_cs *engine)
 {
static const struct attribute * const files[] = {
&name_attr.attr,
@@ -497,7 +498,48 @@ void intel_engines_add_sysfs(struct drm_i915_private *i915)
 #endif
NULL
};
+   struct kobject *dir = engine->i915->sysfs_engine;
+   struct kobject *kobj = engine->kobj;
+   int err;
+
+   kobj = kobj_engine(dir, engine);
+   if (!kobj) {
+   err = -EFAULT;
+   goto err_engine;
+   }
+
+   err = sysfs_create_files(kobj, files);
+   if (err)
+   goto err_object;
+
+   if (intel_engine_has_timeslices(engine)) {
+   err = sysfs_create_file(kobj, &timeslice_duration_attr.attr);
+   if (err)
+   goto err_object;
+   }
+
+   if (intel_engine_has_preempt_reset(engine)) {
+   err = sysfs_create_file(kobj, &preempt_timeout_attr.attr);
+   if (err)
+   goto err_object;
+   }
+
+   add_defaults(container_of(kobj, struct kobj_engine, base));
+
+   engine->kobj = kobj;
+
+   return 0;
+
+err_object:
+   kobject_put(kobj);
+err_engine:
+   gt_warn(engine->gt, "Failed to add sysfs engine '%s'\n", engine->name);
+
+   return err;
+}
 
+void intel_engines_add_sysfs(struct drm_i915_private *i915)
+{
struct device *kdev = i915->drm.primary->kdev;
struct intel_engine_cs *engine;
struct kobject *dir;
@@ -514,34 +556,10 @@ void intel_engines_add_sysfs(struct drm_i915_private 
*i915)
 * uabi_engines access list with the mutex.
 */
for_each_uabi_engine(engine, i915) {
-   struct kobject *kobj;
-
-   kobj = kobj_engine(dir, engine);
-   if (!kobj)
-   goto err_engine;
-
-   if (sysfs_create_files(kobj, files))
-   goto err_object;
-
-   if (intel_engine_has_timeslices(engine) &&
-   sysfs_create_file(kobj, &timeslice_duration_attr.attr))
-   goto err_engine;
-
-   if (intel_engine_has_preempt_reset(engine) &&
-   sysfs_create_file(kobj, &preempt_timeout_attr.attr))
-   goto err_engine;
-
-   add_defaults(container_of(kobj, struct kobj_engine, base));
+   int err;
 
-   engine->kobj = kobj;
-
-   if (0) {
-err_object:
-   kobject_put(kobj);
-err_engine:
-   dev_err(kdev, "Failed to add sysfs engine '%s'\n",
-   engine->name);
+   err = intel_engine_add_single_sysfs(engine);
+   if (err)
break;
-   }
}
 }
diff --git a/drivers/gpu/drm/i915/gt/sysfs_engines.h 
b/drivers/gpu/drm/i915/gt/sysfs_engines.h
index 9546fffe03a7..2e3ec2df14a9 100644
--- a/drivers/gpu/drm/i915/gt/sysfs_engines.h
+++ b/drivers/gpu/drm/i915/gt/sysfs_engines.h
@@ -7,7 +7,9 @@
 #define INTEL_ENGINE_SYSFS_H
 
 struct drm_i915_private;
+struct intel_engine_cs;
 
 void intel_engines_add_sysfs(struct drm_i915_private *i915);
+int intel_engine_add_single_sysfs(struct intel_engine_cs *engine);
 
 #endif /* INTEL_ENGINE_SYSFS_H */
-- 
2.45.2

[PATCH v2 12/15] drm/i915: Protect access to the UABI engines list with a mutex

2024-08-22 Thread Andi Shyti

Until now, the UABI engines list has been accessed in read-only
mode, as it was created once during boot and destroyed upon
module unload.

In upcoming commits, we will be modifying this list by changing
the CCS mode, allowing compute engines to be dynamically added
and removed at runtime based on user whims.

To ensure thread safety and prevent race conditions, we need to
protect the engine list with a mutex, thereby serializing access
to it.

Signed-off-by: Andi Shyti 
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c | 3 +++
 drivers/gpu/drm/i915/gt/intel_engine_user.c | 7 +++
 drivers/gpu/drm/i915/gt/sysfs_engines.c | 5 +
 drivers/gpu/drm/i915/i915_cmd_parser.c  | 2 ++
 drivers/gpu/drm/i915/i915_debugfs.c | 4 
 drivers/gpu/drm/i915/i915_drv.h | 4 
 drivers/gpu/drm/i915/i915_gem.c | 4 
 drivers/gpu/drm/i915/i915_perf.c| 2 ++
 drivers/gpu/drm/i915/i915_pmu.c | 4 
 drivers/gpu/drm/i915/i915_query.c   | 2 ++
 10 files changed, 37 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index c0543c35cd6a..0ccbe447f51d 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -1124,6 +1124,7 @@ static struct i915_gem_engines *default_engines(struct 
i915_gem_context *ctx,
if (!e)
return ERR_PTR(-ENOMEM);
 
+   mutex_lock(&ctx->i915->uabi_engines_mutex);
for_each_uabi_engine(engine, ctx->i915) {
struct intel_context *ce;
struct intel_sseu sseu = {};
@@ -1155,9 +1156,11 @@ static struct i915_gem_engines *default_engines(struct 
i915_gem_context *ctx,
 
}
 
+   mutex_unlock(&ctx->i915->uabi_engines_mutex);
return e;
 
 free_engines:
+   mutex_unlock(&ctx->i915->uabi_engines_mutex);
free_engines(e);
return err;
 }
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c 
b/drivers/gpu/drm/i915/gt/intel_engine_user.c
index 8e5284af8335..209d5badbd3d 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
@@ -210,6 +210,13 @@ void intel_engines_driver_register(struct drm_i915_private 
*i915)
LIST_HEAD(engines);
 
sort_engines(i915, &engines);
+   mutex_init(&i915->uabi_engines_mutex);
+
+   /*
+* We are still booting i915 and we are sure we are running
+* single-threaded. We don't need at this point to protect the
+* uabi_engines access list with the mutex.
+*/
 
prev = NULL;
p = &i915->uabi_engines.rb_node;
diff --git a/drivers/gpu/drm/i915/gt/sysfs_engines.c 
b/drivers/gpu/drm/i915/gt/sysfs_engines.c
index f67f76df1cfe..c1cc0981c8fb 100644
--- a/drivers/gpu/drm/i915/gt/sysfs_engines.c
+++ b/drivers/gpu/drm/i915/gt/sysfs_engines.c
@@ -508,6 +508,11 @@ void intel_engines_add_sysfs(struct drm_i915_private *i915)
 
i915->sysfs_engine = dir;
 
+   /*
+* We are still booting i915 and we are sure we are running
+* single-threaded. We don't need at this point to protect the
+* uabi_engines access list with the mutex.
+*/
for_each_uabi_engine(engine, i915) {
struct kobject *kobj;
 
diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c 
b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 2905df83e180..12987ece6f8e 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -1592,12 +1592,14 @@ int i915_cmd_parser_get_version(struct drm_i915_private 
*dev_priv)
bool active = false;
 
/* If the command parser is not enabled, report 0 - unsupported */
+   mutex_lock(&dev_priv->uabi_engines_mutex);
for_each_uabi_engine(engine, dev_priv) {
if (intel_engine_using_cmd_parser(engine)) {
active = true;
break;
}
}
+   mutex_unlock(&dev_priv->uabi_engines_mutex);
if (!active)
return 0;
 
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
b/drivers/gpu/drm/i915/i915_debugfs.c
index bc717cf544e4..8b5e365eb6bd 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -459,8 +459,10 @@ static int i915_engine_info(struct seq_file *m, void 
*unused)
   to_gt(i915)->clock_period_ns);
 
p = drm_seq_file_printer(m);
+   mutex_lock(&i915->uabi_engines_mutex);
for_each_uabi_engine(engine, i915)
intel_engine_dump(engine, &p, "%s\n", engine->name);
+   mutex_unlock(&i915->uabi_engines_mutex);
 
intel_gt_show_timelines(to_gt(i915), &p, i915_request_show_with_schedule);
 
@@ -474,6 +476,7 @@ static int i915_wa_registers(struct seq_file *m, voi
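
The locking pattern used throughout this patch (take the mutex, walk
the list, release it on every exit path) can be sketched in userspace
with pthreads. This is an illustration only: struct device, struct
engine and any_engine_uses_cmd_parser() are hypothetical names, and a
plain singly linked list replaces the rb-tree walked by
for_each_uabi_engine().

```c
#include <pthread.h>
#include <stddef.h>

struct engine {
	int id;
	int using_cmd_parser;
	struct engine *next;
};

struct device {
	pthread_mutex_t engines_mutex;	/* role of uabi_engines_mutex */
	struct engine *engines;
};

/* Return 1 if any engine uses the command parser, 0 otherwise. */
static int any_engine_uses_cmd_parser(struct device *dev)
{
	struct engine *e;
	int active = 0;

	pthread_mutex_lock(&dev->engines_mutex);
	for (e = dev->engines; e; e = e->next) {
		if (e->using_cmd_parser) {
			active = 1;
			break;	/* leave the loop, still reach the unlock */
		}
	}
	pthread_mutex_unlock(&dev->engines_mutex);

	return active;
}
```

The early break mirrors i915_cmd_parser_get_version(): the unlock sits
after the loop so that both the "found" and "not found" paths release
the mutex exactly once.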

[PATCH v2 11/15] drm/i915/gt: Store active CCS mask

2024-08-22 Thread Andi Shyti

To support upcoming patches, we need to store the current mask
for active CCS engines.

Active engines refer to those exposed to userspace via the UABI
engine list.

Signed-off-by: Andi Shyti 
---
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 41 +++--
 drivers/gpu/drm/i915/gt/intel_gt_types.h|  7 
 2 files changed, 44 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c 
b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
index edb6a4b63826..5eead7b18f57 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
@@ -12,6 +12,7 @@
 static void intel_gt_apply_ccs_mode(struct intel_gt *gt)
 {
unsigned long cslices_mask = CCS_MASK(gt);
+   unsigned long ccs_mask = gt->ccs.id_mask;
u32 mode_val = 0;
/* CCS engine id, i.e. the engines position in the engine's bitmask */
int engine;
@@ -55,7 +56,7 @@ static void intel_gt_apply_ccs_mode(struct intel_gt *gt)
 *   slice 2: ccs2
 *   slice 3: ccs3
 */
-   engine = __ffs(cslices_mask);
+   engine = __ffs(ccs_mask);
 
for (cslice = 0; cslice < I915_MAX_CCS; cslice++) {
if (!(cslices_mask & BIT(cslice))) {
@@ -86,7 +87,7 @@ static void intel_gt_apply_ccs_mode(struct intel_gt *gt)
 * CCS mode, will be used later to
 * reset to a flexible value
 */
-   engine = __ffs(cslices_mask);
+   engine = __ffs(ccs_mask);
continue;
}
}
@@ -94,13 +95,45 @@ static void intel_gt_apply_ccs_mode(struct intel_gt *gt)
gt->ccs.mode_reg_val = mode_val;
 }
 
+static void __update_ccs_mask(struct intel_gt *gt, u32 ccs_mode)
+{
+   unsigned long cslices_mask = CCS_MASK(gt);
+   int i;
+
+   /* Mask off all the CCS engines */
+   gt->ccs.id_mask = 0;
+
+   for_each_set_bit(i, &cslices_mask, I915_MAX_CCS) {
+   gt->ccs.id_mask |= BIT(i);
+
+   ccs_mode--;
+   if (!ccs_mode)
+   break;
+   }
+
+   /*
+* It's impossible for 'ccs_mode' to be non-zero at this point.
+* This scenario would only occur if the 'ccs_mode' provided by
+* the caller exceeded the total number of CCS engines, a condition
+* we check before calling the 'update_ccs_mask()' function.
+*/
+   GEM_BUG_ON(ccs_mode);
+
+   /* Initialize the CCS mode setting */
+   intel_gt_apply_ccs_mode(gt);
+}
+
 void intel_gt_ccs_mode_init(struct intel_gt *gt)
 {
if (!IS_DG2(gt->i915))
return;
 
-   /* Initialize the CCS mode setting */
-   intel_gt_apply_ccs_mode(gt);
+   /*
+* Set CCS balance mode 1 in the ccs_mask.
+*
+* During init the workarounds are not set up yet.
+*/
+   __update_ccs_mask(gt, 1);
 }
 
 static ssize_t num_cslices_show(struct device *dev,
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h 
b/drivers/gpu/drm/i915/gt/intel_gt_types.h
index 71e43071da0b..641be69016e1 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
@@ -219,6 +219,13 @@ struct intel_gt {
 */
struct {
u32 mode_reg_val;
+
+   /*
+* CCS id_mask is the command streamer instance
+* exposed to the user. While the CCS_MASK(gt)
+* is the available unfused compute slices.
+*/
+   intel_engine_mask_t id_mask;
} ccs;
 
/*
-- 
2.45.2
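
The selection done by __update_ccs_mask() amounts to "take the first
ccs_mode set bits of the slice mask". A minimal userspace sketch of
that idea follows. It assumes MAX_CCS is 4 to match I915_MAX_CCS;
pick_ccs_id_mask() is a hypothetical name, and assert() stands in for
GEM_BUG_ON().

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CCS 4	/* assumed to match I915_MAX_CCS */

/*
 * Build the mask of exposed CCS engines by walking the available slices
 * from bit 0 upwards and keeping the first 'ccs_mode' of them.
 */
static uint32_t pick_ccs_id_mask(uint32_t cslices_mask, unsigned int ccs_mode)
{
	uint32_t id_mask = 0;
	unsigned int i;

	for (i = 0; i < MAX_CCS && ccs_mode; i++) {
		if (!(cslices_mask & (UINT32_C(1) << i)))
			continue;

		id_mask |= UINT32_C(1) << i;
		ccs_mode--;
	}

	/* The caller must not ask for more engines than there are slices. */
	assert(!ccs_mode);

	return id_mask;
}
```

For a part with slices 0, 2 and 3 available (mask 0xd), asking for two
engines gives 0x5, i.e. the engines sitting on slices 0 and 2.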

[PATCH v2 10/15] drm/i915/gt: Store engine-related sysfs kobjects

2024-08-22 Thread Andi Shyti

Upcoming commits will need to access engine-related kobjects to
enable the creation and destruction of sysfs interfaces at
runtime.

For this, store the "engine" directory (i915->sysfs_engine), the
engine files (engine->kobj), and the default data
(gt->kobj_defaults).

Signed-off-by: Andi Shyti 
---
 drivers/gpu/drm/i915/gt/intel_engine_types.h | 2 ++
 drivers/gpu/drm/i915/gt/sysfs_engines.c  | 4 
 drivers/gpu/drm/i915/i915_drv.h  | 1 +
 3 files changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h 
b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index ba55c059063d..cdc695fda918 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -388,6 +388,8 @@ struct intel_engine_cs {
u32 context_size;
u32 mmio_base;
 
+   struct kobject *kobj;
+
struct intel_engine_tlb_inv tlb_inv;
 
/*
diff --git a/drivers/gpu/drm/i915/gt/sysfs_engines.c 
b/drivers/gpu/drm/i915/gt/sysfs_engines.c
index 021f51d9b456..f67f76df1cfe 100644
--- a/drivers/gpu/drm/i915/gt/sysfs_engines.c
+++ b/drivers/gpu/drm/i915/gt/sysfs_engines.c
@@ -506,6 +506,8 @@ void intel_engines_add_sysfs(struct drm_i915_private *i915)
if (!dir)
return;
 
+   i915->sysfs_engine = dir;
+
for_each_uabi_engine(engine, i915) {
struct kobject *kobj;
 
@@ -526,6 +528,8 @@ void intel_engines_add_sysfs(struct drm_i915_private *i915)
 
add_defaults(container_of(kobj, struct kobj_engine, base));
 
+   engine->kobj = kobj;
+
if (0) {
 err_object:
kobject_put(kobj);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 94f7f6cc444c..3a8a757f5bd5 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -320,6 +320,7 @@ struct drm_i915_private {
struct intel_gt *gt[I915_MAX_GT];
 
struct kobject *sysfs_gt;
+   struct kobject *sysfs_engine;
 
/* Quick lookup of media GT (current platforms only have one) */
struct intel_gt *media_gt;
-- 
2.45.2

[PATCH v2 09/15] drm/i915/gt: Expose the number of total CCS slices

2024-08-22 Thread Andi Shyti

Implement a sysfs interface to show the number of available CCS
slices. The displayed number does not take into account the CCS
balancing mode.

Signed-off-by: Andi Shyti 
---
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 21 +
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h |  1 +
 drivers/gpu/drm/i915/gt/intel_gt_sysfs.c|  2 ++
 3 files changed, 24 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c 
b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
index fc8a23fc28b6..edb6a4b63826 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
@@ -5,7 +5,9 @@
 
 #include "i915_drv.h"
 #include "intel_gt_ccs_mode.h"
+#include "intel_gt_print.h"
 #include "intel_gt_regs.h"
+#include "intel_gt_sysfs.h"
 
 static void intel_gt_apply_ccs_mode(struct intel_gt *gt)
 {
@@ -100,3 +102,22 @@ void intel_gt_ccs_mode_init(struct intel_gt *gt)
/* Initialize the CCS mode setting */
intel_gt_apply_ccs_mode(gt);
 }
+
+static ssize_t num_cslices_show(struct device *dev,
+   struct device_attribute *attr,
+   char *buff)
+{
+   struct intel_gt *gt = kobj_to_gt(&dev->kobj);
+   u32 num_slices;
+
+   num_slices = hweight32(CCS_MASK(gt));
+
+   return sysfs_emit(buff, "%u\n", num_slices);
+}
+static DEVICE_ATTR_RO(num_cslices);
+
+void intel_gt_sysfs_ccs_init(struct intel_gt *gt)
+{
+   if (sysfs_create_file(&gt->sysfs_gt, &dev_attr_num_cslices.attr))
+   gt_warn(gt, "Failed to create sysfs num_cslices files\n");
+}
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h 
b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h
index 4a6763b95a78..9696cc9017f6 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h
@@ -9,5 +9,6 @@
 #include "intel_gt.h"
 
 void intel_gt_ccs_mode_init(struct intel_gt *gt);
+void intel_gt_sysfs_ccs_init(struct intel_gt *gt);
 
 #endif /* __INTEL_GT_CCS_MODE_H__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_sysfs.c 
b/drivers/gpu/drm/i915/gt/intel_gt_sysfs.c
index 33cba406b569..895eedc402ae 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_sysfs.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_sysfs.c
@@ -12,6 +12,7 @@
 #include "i915_drv.h"
 #include "i915_sysfs.h"
 #include "intel_gt.h"
+#include "intel_gt_ccs_mode.h"
 #include "intel_gt_print.h"
 #include "intel_gt_sysfs.h"
 #include "intel_gt_sysfs_pm.h"
@@ -101,6 +102,7 @@ void intel_gt_sysfs_register(struct intel_gt *gt)
goto exit_fail;
 
intel_gt_sysfs_pm_init(gt, &gt->sysfs_gt);
+   intel_gt_sysfs_ccs_init(gt);
 
return;
 
-- 
2.45.2

[PATCH v2 08/15] drm/i915/gt: Remove cslices mask value from the CCS structure

2024-08-22 Thread Andi Shyti

Following the decision to manage CCS engine creation within UABI
engines, the "cslices" variable in the "ccs" structure in the
"gt" is no longer needed. Remove it, as it is now redundant.

Signed-off-by: Andi Shyti 
---
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 2 +-
 drivers/gpu/drm/i915/gt/intel_gt_types.h| 5 -
 2 files changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c 
b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
index a6c33b471567..fc8a23fc28b6 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
@@ -9,7 +9,7 @@
 
 static void intel_gt_apply_ccs_mode(struct intel_gt *gt)
 {
-   unsigned long cslices_mask = gt->ccs.cslices;
+   unsigned long cslices_mask = CCS_MASK(gt);
u32 mode_val = 0;
/* CCS engine id, i.e. the engines position in the engine's bitmask */
int engine;
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h 
b/drivers/gpu/drm/i915/gt/intel_gt_types.h
index 9e257f34d05b..71e43071da0b 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
@@ -218,11 +218,6 @@ struct intel_gt {
 * i.e. how the CCS streams are distributed amongs the slices.
 */
struct {
-   /*
-* Mask of the non fused CCS slices
-* to be used for the load balancing
-*/
-   intel_engine_mask_t cslices;
u32 mode_reg_val;
} ccs;
 
-- 
2.45.2

[PATCH v2 07/15] drm/i915/gt: Manage CCS engine creation within UABI exposure

2024-08-22 Thread Andi Shyti

In commit ea315f98e5d6 ("drm/i915/gt: Do not generate the command
streamer for all the CCS"), we restricted the creation of
physical CCS engines to only one stream. This allowed the user to
submit a single compute workload, with all CCS slices sharing the
workload from that stream.

This patch removes that limitation but still exposes only one
stream to the user. The physical memory for each engine remains
allocated but unused, however the user will only see one engine
exposed.

Do this by adding only one engine to the UABI list, ensuring that
only one engine is visible to the user.

Signed-off-by: Andi Shyti 
---
 drivers/gpu/drm/i915/gt/intel_engine_cs.c   | 23 -
 drivers/gpu/drm/i915/gt/intel_engine_user.c | 17 ---
 2 files changed, 14 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 4d30a86016f2..def255ee0b96 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -876,29 +876,6 @@ static intel_engine_mask_t init_engine_mask(struct 
intel_gt *gt)
info->engine_mask &= ~BIT(GSC0);
}
 
-   /*
-* Do not create the command streamer for CCS slices beyond the first.
-* All the workload submitted to the first engine will be shared among
-* all the slices.
-*
-* Once the user will be allowed to customize the CCS mode, then this
-* check needs to be removed.
-*/
-   if (IS_DG2(gt->i915)) {
-   u8 first_ccs = __ffs(CCS_MASK(gt));
-
-   /*
-* Store the number of active cslices before
-* changing the CCS engine configuration
-*/
-   gt->ccs.cslices = CCS_MASK(gt);
-
-   /* Mask off all the CCS engine */
-   info->engine_mask &= ~GENMASK(CCS3, CCS0);
-   /* Put back in the first CCS engine */
-   info->engine_mask |= BIT(_CCS(first_ccs));
-   }
-
return info->engine_mask;
 }
 
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c 
b/drivers/gpu/drm/i915/gt/intel_engine_user.c
index cd7662b1ad59..8e5284af8335 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
@@ -246,6 +246,20 @@ void intel_engines_driver_register(struct drm_i915_private 
*i915)
 
GEM_BUG_ON(uabi_class >=
   ARRAY_SIZE(i915->engine_uabi_class_count));
+
+   /* Fix up the mapping to match default execbuf::user_map[] */
+   add_legacy_ring(&ring, engine);
+
+   /*
+* Do not create the command streamer for CCS slices beyond the
+* first. All the workload submitted to the first engine will be
+* shared among all the slices.
+*/
+   if (IS_DG2(i915) &&
+   uabi_class == I915_ENGINE_CLASS_COMPUTE &&
+   engine->uabi_instance)
+   goto clear_node_continue;
+
i915->engine_uabi_class_count[uabi_class]++;
 
rb_link_node(&engine->uabi_node, prev, p);
@@ -255,9 +269,6 @@ void intel_engines_driver_register(struct drm_i915_private 
*i915)
engine->uabi_class,
engine->uabi_instance) != engine);
 
-   /* Fix up the mapping to match default execbuf::user_map[] */
-   add_legacy_ring(&ring, engine);
-
prev = &engine->uabi_node;
p = &prev->rb_right;
 
-- 
2.45.2
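
The exposure rule added to intel_engines_driver_register() boils down
to a small predicate: on DG2, compute engines with a non-zero UABI
instance stay allocated but are not linked into the UABI list. A
userspace sketch of that predicate, with hypothetical names (enum
engine_class, struct engine, engine_is_exposed()) and the platform
check reduced to a boolean argument:

```c
#include <stdbool.h>

enum engine_class {
	CLASS_RENDER,
	CLASS_COPY,
	CLASS_VIDEO,
	CLASS_COMPUTE,
};

struct engine {
	enum engine_class uabi_class;
	unsigned int uabi_instance;
};

/*
 * Return true if the engine should be linked into the user-visible
 * list. Only the first compute instance is exposed on DG2; every other
 * engine is exposed unconditionally.
 */
static bool engine_is_exposed(const struct engine *e, bool is_dg2)
{
	if (is_dg2 && e->uabi_class == CLASS_COMPUTE && e->uabi_instance)
		return false;

	return true;
}
```

In the patch the "not exposed" branch jumps to clear_node_continue, so
the skipped engine keeps its legacy ring mapping but never reaches
rb_link_node().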

[PATCH v2 06/15] drm/i915/gt: Introduce for_each_enabled_engine() and apply it in selftests

2024-08-22 Thread Andi Shyti

Selftests should run only on enabled engines, as disabled engines
are not intended for use. A practical example is when, on DG2
machines, the user chooses to utilize only one CCS stream instead
of all four.

To address this, introduce the for_each_enabled_engine() loop,
which will skip engines when they are marked as RB_EMPTY.

Signed-off-by: Andi Shyti 
---
 drivers/gpu/drm/i915/gt/intel_gt.h| 12 +
 drivers/gpu/drm/i915/gt/selftest_context.c|  6 +--
 drivers/gpu/drm/i915/gt/selftest_engine_cs.c  |  4 +-
 .../drm/i915/gt/selftest_engine_heartbeat.c   |  6 +--
 drivers/gpu/drm/i915/gt/selftest_engine_pm.c  |  6 +--
 drivers/gpu/drm/i915/gt/selftest_execlists.c  | 52 +--
 drivers/gpu/drm/i915/gt/selftest_gt_pm.c  |  2 +-
 drivers/gpu/drm/i915/gt/selftest_hangcheck.c  | 22 
 drivers/gpu/drm/i915/gt/selftest_lrc.c| 18 +++
 drivers/gpu/drm/i915/gt/selftest_mocs.c   |  6 +--
 drivers/gpu/drm/i915/gt/selftest_rc6.c|  4 +-
 drivers/gpu/drm/i915/gt/selftest_reset.c  |  8 +--
 .../drm/i915/gt/selftest_ring_submission.c|  2 +-
 drivers/gpu/drm/i915/gt/selftest_rps.c| 14 ++---
 drivers/gpu/drm/i915/gt/selftest_timeline.c   | 14 ++---
 drivers/gpu/drm/i915/gt/selftest_tlb.c|  2 +-
 .../gpu/drm/i915/gt/selftest_workarounds.c| 14 ++---
 17 files changed, 102 insertions(+), 90 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h 
b/drivers/gpu/drm/i915/gt/intel_gt.h
index 998ca029b73a..1c52db1b5e25 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt.h
@@ -188,6 +188,18 @@ int intel_gt_tiles_init(struct drm_i915_private *i915);
 (id__)++) \
for_each_if ((engine__) = (gt__)->engine[(id__)])
 
+/*
+ * Iterator over all initialized and enabled engines. Some engines, like CCS,
+ * may be "disabled" (i.e., not exposed to the user). Disabling is indicated
+ * by marking the rb_node as empty.
+ */
+#define for_each_enabled_engine(engine__, gt__, id__) \
+   for ((id__) = 0; \
+(id__) < I915_NUM_ENGINES; \
+(id__)++) \
+   for_each_if ( ((engine__) = (gt__)->engine[(id__)]) && \
+ (!RB_EMPTY_NODE(&(engine__)->uabi_node)) )
+
 /* Iterator over subset of engines selected by mask */
 #define for_each_engine_masked(engine__, gt__, mask__, tmp__) \
for ((tmp__) = (mask__) & (gt__)->info.engine_mask; \
diff --git a/drivers/gpu/drm/i915/gt/selftest_context.c 
b/drivers/gpu/drm/i915/gt/selftest_context.c
index 5eb46700dc4e..9976e231248d 100644
--- a/drivers/gpu/drm/i915/gt/selftest_context.c
+++ b/drivers/gpu/drm/i915/gt/selftest_context.c
@@ -157,7 +157,7 @@ static int live_context_size(void *arg)
 * HW tries to write past the end of one.
 */
 
-   for_each_engine(engine, gt, id) {
+   for_each_enabled_engine(engine, gt, id) {
struct file *saved;
 
if (!engine->context_size)
@@ -311,7 +311,7 @@ static int live_active_context(void *arg)
enum intel_engine_id id;
int err = 0;
 
-   for_each_engine(engine, gt, id) {
+   for_each_enabled_engine(engine, gt, id) {
err = __live_active_context(engine);
if (err)
break;
@@ -424,7 +424,7 @@ static int live_remote_context(void *arg)
enum intel_engine_id id;
int err = 0;
 
-   for_each_engine(engine, gt, id) {
+   for_each_enabled_engine(engine, gt, id) {
err = __live_remote_context(engine);
if (err)
break;
diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_cs.c 
b/drivers/gpu/drm/i915/gt/selftest_engine_cs.c
index 5ffa5e30f419..038723a401df 100644
--- a/drivers/gpu/drm/i915/gt/selftest_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/selftest_engine_cs.c
@@ -142,7 +142,7 @@ static int perf_mi_bb_start(void *arg)
return 0;
 
wakeref = perf_begin(gt);
-   for_each_engine(engine, gt, id) {
+   for_each_enabled_engine(engine, gt, id) {
struct intel_context *ce = engine->kernel_context;
struct i915_vma *batch;
u32 cycles[COUNT];
@@ -270,7 +270,7 @@ static int perf_mi_noop(void *arg)
return 0;
 
wakeref = perf_begin(gt);
-   for_each_engine(engine, gt, id) {
+   for_each_enabled_engine(engine, gt, id) {
struct intel_context *ce = engine->kernel_context;
struct i915_vma *base, *nop;
u32 cycles[COUNT];
diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c 
b/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
index 9e4f0e417b3b..74d4c2dc69cf 100644
--- a/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
+++ b/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
@@ -160,7 +160,7 @@ static int live_idle_flush(void *arg)

[PATCH v2 05/15] drm/i915/gem: Mark and verify UABI engine validity

2024-08-22 Thread Andi Shyti

Mark engines as invalid when they are not added to the UABI list
to prevent accidental assignment of batch buffers.

Currently, this change is mostly precautionary with minimal
impact. However, in the future, when CCS engines will be
dynamically added and removed by the user, this mechanism will
be used for determining engine validity.
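
As an illustration of the marking scheme, here is a small user-space mock.
The rb_node layout and the two macros mirror the definitions in the kernel's
rbtree header as I understand them; "struct engine" is a hypothetical
stand-in for intel_engine_cs, and this engine_valid() takes the engine
directly rather than a context.

```c
#include <stdbool.h>

/* Minimal user-space mock of the kernel's struct rb_node. */
struct rb_node {
	unsigned long __rb_parent_color;
	struct rb_node *rb_right;
	struct rb_node *rb_left;
};

/* A node counts as "empty" when its parent field points back at itself. */
#define RB_EMPTY_NODE(node) \
	((node)->__rb_parent_color == (unsigned long)(node))
#define RB_CLEAR_NODE(node) \
	((node)->__rb_parent_color = (unsigned long)(node))

struct engine {
	struct rb_node uabi_node;
};

/* An engine is valid for submission only while its node is not cleared. */
static bool engine_valid(const struct engine *engine)
{
	return !RB_EMPTY_NODE(&engine->uabi_node);
}
```

Reusing the node state avoids adding a separate "valid" flag that would have
to be kept in sync with the tree.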

Signed-off-by: Andi Shyti 
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 28 +--
 drivers/gpu/drm/i915/gt/intel_engine_user.c   |  9 --
 2 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index c58290274f97..770875e72056 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -2682,6 +2682,22 @@ eb_select_legacy_ring(struct i915_execbuffer *eb)
return user_ring_map[user_ring_id];
 }
 
+static bool engine_valid(struct intel_context *ce)
+{
+   if (!intel_engine_is_virtual(ce->engine))
+   return !RB_EMPTY_NODE(&ce->engine->uabi_node);
+
+   /*
+* TODO: check virtual siblings; we need to walk through all the
+* virtual engines and ask whether the physical engine where it is based
+* is still valid. For each of them we need to check with
+* RB_EMPTY_NODE(...)
+*
+* This can be placed in a new ce_ops.
+*/
+   return true;
+}
+
 static int
 eb_select_engine(struct i915_execbuffer *eb)
 {
@@ -2712,8 +2728,6 @@ eb_select_engine(struct i915_execbuffer *eb)
eb->num_batches = ce->parallel.number_children + 1;
gt = ce->engine->gt;
 
-   for_each_child(ce, child)
-   intel_context_get(child);
eb->wakeref = intel_gt_pm_get(ce->engine->gt);
/*
 * Keep GT0 active on MTL so that i915_vma_parked() doesn't
@@ -2722,6 +2736,16 @@ eb_select_engine(struct i915_execbuffer *eb)
if (gt->info.id)
eb->wakeref_gt0 = intel_gt_pm_get(to_gt(gt->i915));
 
+   /* We need to hold the wakeref to stabilize i915->uabi_engines */
+   if (!engine_valid(ce)) {
+   intel_context_put(ce);
+   err = -ENODEV;
+   goto err;
+   }
+
+   for_each_child(ce, child)
+   intel_context_get(child);
+
if (!test_bit(CONTEXT_ALLOC_BIT, &ce->flags)) {
err = intel_context_alloc_state(ce);
if (err)
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c 
b/drivers/gpu/drm/i915/gt/intel_engine_user.c
index 11cc06c0c785..cd7662b1ad59 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
@@ -220,7 +220,7 @@ void intel_engines_driver_register(struct drm_i915_private 
*i915)
container_of(it, typeof(*engine), uabi_list);
 
if (intel_gt_has_unrecoverable_error(engine->gt))
-   continue; /* ignore incomplete engines */
+   goto clear_node_continue; /* ignore incomplete engines 
*/
 
GEM_BUG_ON(engine->class >= ARRAY_SIZE(uabi_classes));
engine->uabi_class = uabi_classes[engine->class];
@@ -242,7 +242,7 @@ void intel_engines_driver_register(struct drm_i915_private 
*i915)
  engine->uabi_instance);
 
if (uabi_class > I915_LAST_UABI_ENGINE_CLASS)
-   continue;
+   goto clear_node_continue;
 
GEM_BUG_ON(uabi_class >=
   ARRAY_SIZE(i915->engine_uabi_class_count));
@@ -260,6 +260,11 @@ void intel_engines_driver_register(struct drm_i915_private 
*i915)
 
prev = &engine->uabi_node;
p = &prev->rb_right;
+
+   continue;
+
+clear_node_continue:
+   RB_CLEAR_NODE(&engine->uabi_node);
}
 
if (IS_ENABLED(CONFIG_DRM_I915_SELFTESTS) &&
-- 
2.45.2

[PATCH v2 04/15] drm/i915/gt: Refactor uabi engine class/instance list creation

2024-08-22 Thread Andi Shyti

For the upcoming changes we need a cleaner way to build the list
of uabi engines.
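
A rough sketch of the counting scheme the refactor moves to: one instance
counter per UABI class, plus one extra slot shared by engines that have no
UABI class. The constants below are hypothetical stand-ins, not the driver's
values.

```c
#include <stdint.h>

#define LAST_UABI_CLASS 4          /* hypothetical, for I915_LAST_UABI_ENGINE_CLASS */
#define NO_UABI_CLASS   UINT16_MAX /* hypothetical, for I915_NO_UABI_CLASS */

/* One counter per UABI class, plus a final slot for classless engines. */
static uint16_t class_instance[LAST_UABI_CLASS + 2];

/* Return the next free instance number for the given class. */
static uint16_t next_instance(uint16_t engine_uabi_class)
{
	uint16_t slot = engine_uabi_class == NO_UABI_CLASS ?
			LAST_UABI_CLASS + 1 : engine_uabi_class;

	return class_instance[slot]++;
}
```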

Suggested-by: Tvrtko Ursulin 
Signed-off-by: Andi Shyti 
---
 drivers/gpu/drm/i915/gt/intel_engine_user.c | 29 -
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c 
b/drivers/gpu/drm/i915/gt/intel_engine_user.c
index 833987015b8b..11cc06c0c785 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
@@ -203,7 +203,7 @@ static void engine_rename(struct intel_engine_cs *engine, 
const char *name, u16
 
 void intel_engines_driver_register(struct drm_i915_private *i915)
 {
-   u16 name_instance, other_instance = 0;
+   u16 class_instance[I915_LAST_UABI_ENGINE_CLASS + 2] = { };
struct legacy_ring ring = {};
struct list_head *it, *next;
struct rb_node **p, *prev;
@@ -214,6 +214,8 @@ void intel_engines_driver_register(struct drm_i915_private 
*i915)
prev = NULL;
p = &i915->uabi_engines.rb_node;
list_for_each_safe(it, next, &engines) {
+   u16 uabi_class;
+
struct intel_engine_cs *engine =
container_of(it, typeof(*engine), uabi_list);
 
@@ -222,15 +224,14 @@ void intel_engines_driver_register(struct 
drm_i915_private *i915)
 
GEM_BUG_ON(engine->class >= ARRAY_SIZE(uabi_classes));
engine->uabi_class = uabi_classes[engine->class];
-   if (engine->uabi_class == I915_NO_UABI_CLASS) {
-   name_instance = other_instance++;
-   } else {
-   GEM_BUG_ON(engine->uabi_class >=
-  ARRAY_SIZE(i915->engine_uabi_class_count));
-   name_instance =
-   
i915->engine_uabi_class_count[engine->uabi_class]++;
-   }
-   engine->uabi_instance = name_instance;
+
+   if (engine->uabi_class == I915_NO_UABI_CLASS)
+   uabi_class = I915_LAST_UABI_ENGINE_CLASS + 1;
+   else
+   uabi_class = engine->uabi_class;
+
+   GEM_BUG_ON(uabi_class >= ARRAY_SIZE(class_instance));
+   engine->uabi_instance = class_instance[uabi_class]++;
 
/*
 * Replace the internal name with the final user and log facing
@@ -238,11 +239,15 @@ void intel_engines_driver_register(struct 
drm_i915_private *i915)
 */
engine_rename(engine,
  intel_engine_class_repr(engine->class),
- name_instance);
+ engine->uabi_instance);
 
-   if (engine->uabi_class == I915_NO_UABI_CLASS)
+   if (uabi_class > I915_LAST_UABI_ENGINE_CLASS)
continue;
 
+   GEM_BUG_ON(uabi_class >=
+  ARRAY_SIZE(i915->engine_uabi_class_count));
+   i915->engine_uabi_class_count[uabi_class]++;
+
rb_link_node(&engine->uabi_node, prev, p);
rb_insert_color(&engine->uabi_node, &i915->uabi_engines);
 
-- 
2.45.2

[PATCH v2 03/15] drm/i915/gt: Allow the creation of multi-mode CCS masks

2024-08-22 Thread Andi Shyti

Until now, we have only set CCS mode balancing to 1, which means
that only one compute engine is exposed to the user. The stream
of compute commands submitted to that engine is then shared among
all the dedicated execution units.

This is done by calling the intel_gt_apply_ccs_mode() function.

With this change, the aforementioned function takes an additional
parameter called 'mode' that specifies the desired mode to be set
for the CCS engines balancing. The mode parameter can have the
following values:

 - mode = 0: CCS load balancing mode 1 (1 CCS engine exposed)
 - mode = 1: CCS load balancing mode 2 (2 CCS engines exposed)
 - mode = 3: CCS load balancing mode 4 (4 CCS engines exposed)

This allows us to generate the appropriate register value to be
written to CCS_MODE, configuring how the exposed engine streams
will be submitted to the execution units.

No functional changes are intended yet, as no mode higher than
'0' is currently being set.
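
A simplified sketch of how such a register value could be built. It assumes
a 3-bit field per slice (chosen so that the 0x7 "unavailable" marker from the
patch fits) and hands out engine ids 0..n-1 in round-robin order. The real
code walks the fused engine mask instead, so treat the names and widths here
as assumptions.

```c
#include <stdint.h>

#define MAX_CCS            4   /* number of slices, mirroring I915_MAX_CCS */
#define CSLICE_FIELD_WIDTH 3   /* assumed width of each per-slice field */
#define CSLICE_UNAVAILABLE 0x7 /* marker written for a fused-off slice */

/* num_engines must be at least 1. */
static uint32_t ccs_mode_value(unsigned int cslices_mask,
			       unsigned int num_engines)
{
	uint32_t val = 0;
	unsigned int cslice, next = 0;

	for (cslice = 0; cslice < MAX_CCS; cslice++) {
		unsigned int shift = cslice * CSLICE_FIELD_WIDTH;

		if (!(cslices_mask & (1u << cslice))) {
			/* Fused-off slice: nothing is dispatched here. */
			val |= (uint32_t)CSLICE_UNAVAILABLE << shift;
			continue;
		}

		/* Available slice: assign the next engine id, wrapping around. */
		val |= (uint32_t)next << shift;
		next = (next + 1) % num_engines;
	}

	return val;
}
```

With all four slices available and two engines, slices 0 and 2 map to engine
0 and slices 1 and 3 to engine 1, which matches the "2 engines" configuration
listed in the patch comment.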

Signed-off-by: Andi Shyti 
---
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 85 +
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h |  2 +-
 2 files changed, 72 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c 
b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
index fcd07eb4728b..a6c33b471567 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
@@ -4,35 +4,92 @@
  */
 
 #include "i915_drv.h"
-#include "intel_gt.h"
 #include "intel_gt_ccs_mode.h"
 #include "intel_gt_regs.h"
 
 static void intel_gt_apply_ccs_mode(struct intel_gt *gt)
 {
+   unsigned long cslices_mask = gt->ccs.cslices;
+   u32 mode_val = 0;
+   /* CCS engine id, i.e. the engines position in the engine's bitmask */
+   int engine;
int cslice;
-   u32 mode = 0;
-   int first_ccs = __ffs(CCS_MASK(gt));
 
-   /* Build the value for the fixed CCS load balancing */
+   /*
+* The mode has two bits dedicated for each engine
+* that will be used for the CCS balancing algorithm:
+*
+*BIT | CCS slice
+*   --
+* 0  | CCS slice
+* 1  | 0
+*   --
+* 2  | CCS slice
+* 3  | 1
+*   --
+* 4  | CCS slice
+* 5  | 2
+*   --
+* 6  | CCS slice
+* 7  | 3
+*   --
+*
+* When a CCS slice is not available, then we will write 0x7,
+* otherwise we will write the user engine id which load will
+* be forwarded to that slice.
+*
+* The possible configurations are:
+*
+* 1 engine (ccs0):
+*   slice 0, 1, 2, 3: ccs0
+*
+* 2 engines (ccs0, ccs1):
+*   slice 0, 2: ccs0
+*   slice 1, 3: ccs1
+*
+* 4 engines (ccs0, ccs1, ccs2, ccs3):
+*   slice 0: ccs0
+*   slice 1: ccs1
+*   slice 2: ccs2
+*   slice 3: ccs3
+*/
+   engine = __ffs(cslices_mask);
+
for (cslice = 0; cslice < I915_MAX_CCS; cslice++) {
-   if (gt->ccs.cslices & BIT(cslice))
+   if (!(cslices_mask & BIT(cslice))) {
/*
-* If available, assign the cslice
-* to the first available engine...
+* If not available, mark the slice as unavailable
+* and no task will be dispatched here.
 */
-   mode |= XEHP_CCS_MODE_CSLICE(cslice, first_ccs);
+   mode_val |= XEHP_CCS_MODE_CSLICE(cslice,
+XEHP_CCS_MODE_CSLICE_MASK);
+   continue;
+   }
 
-   else
+   mode_val |= XEHP_CCS_MODE_CSLICE(cslice, engine);
+
+   engine = find_next_bit(&cslices_mask, I915_MAX_CCS, engine + 1);
+   /*
+* If "engine" has reached the I915_MAX_CCS value it means that
+* we have gone through all the unfused engines and now we need
+* to reset its value to the first engine.
+*
+* From the find_next_bit() description:
+*
+* "Returns the bit number for the next set bit
+* If no bits are set, returns @size."
+*/
+   if (engine == I915_MAX_CCS) {
/*
-* ... otherwise, mark the cslice as
-* unavailable if no CCS dispatches here
+* CCS mode, will be used later to
+* re

[PATCH v2 02/15] drm/i915/gt: Move the CCS mode variable to a global position

2024-08-22 Thread Andi Shyti

Store the CCS mode value in the intel_gt->ccs structure to make
it available for future instances that may need to change its
value.

Name it mode_reg_val because it holds the value that will
be written into the CCS_MODE register, determining the CCS
balancing and, consequently, the number of engines generated.

No functional changes intended.

Signed-off-by: Andi Shyti 
---
 drivers/gpu/drm/i915/gt/intel_gt.c  |  3 +++
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 16 +++-
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h |  2 +-
 drivers/gpu/drm/i915/gt/intel_gt_types.h| 11 +++
 drivers/gpu/drm/i915/gt/intel_workarounds.c |  6 --
 5 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c 
b/drivers/gpu/drm/i915/gt/intel_gt.c
index a6c69a706fd7..5af0527d822d 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -18,6 +18,7 @@
 #include "intel_ggtt_gmch.h"
 #include "intel_gt.h"
 #include "intel_gt_buffer_pool.h"
+#include "intel_gt_ccs_mode.h"
 #include "intel_gt_clock_utils.h"
 #include "intel_gt_debugfs.h"
 #include "intel_gt_mcr.h"
@@ -136,6 +137,8 @@ int intel_gt_init_mmio(struct intel_gt *gt)
intel_sseu_info_init(gt);
intel_gt_mcr_init(gt);
 
+   intel_gt_ccs_mode_init(gt);
+
return intel_engines_init_mmio(gt);
 }
 
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c 
b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
index 3c62a44e9106..fcd07eb4728b 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
@@ -8,15 +8,12 @@
 #include "intel_gt_ccs_mode.h"
 #include "intel_gt_regs.h"
 
-unsigned int intel_gt_apply_ccs_mode(struct intel_gt *gt)
+static void intel_gt_apply_ccs_mode(struct intel_gt *gt)
 {
int cslice;
u32 mode = 0;
int first_ccs = __ffs(CCS_MASK(gt));
 
-   if (!IS_DG2(gt->i915))
-   return 0;
-
/* Build the value for the fixed CCS load balancing */
for (cslice = 0; cslice < I915_MAX_CCS; cslice++) {
if (gt->ccs.cslices & BIT(cslice))
@@ -35,5 +32,14 @@ unsigned int intel_gt_apply_ccs_mode(struct intel_gt *gt)
 XEHP_CCS_MODE_CSLICE_MASK);
}
 
-   return mode;
+   gt->ccs.mode_reg_val = mode;
+}
+
+void intel_gt_ccs_mode_init(struct intel_gt *gt)
+{
+   if (!IS_DG2(gt->i915))
+   return;
+
+   /* Initialize the CCS mode setting */
+   intel_gt_apply_ccs_mode(gt);
 }
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h 
b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h
index 55547f2ff426..0f2506586a41 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h
@@ -8,6 +8,6 @@
 
 struct intel_gt;
 
-unsigned int intel_gt_apply_ccs_mode(struct intel_gt *gt);
+void intel_gt_ccs_mode_init(struct intel_gt *gt);
 
 #endif /* __INTEL_GT_CCS_MODE_H__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h 
b/drivers/gpu/drm/i915/gt/intel_gt_types.h
index bcee084b1f27..9e257f34d05b 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
@@ -207,12 +207,23 @@ struct intel_gt {
[MAX_ENGINE_INSTANCE + 1];
enum intel_submission_method submission_method;
 
+   /*
+* Track fixed mapping between CCS engines and compute slices.
+*
+* In order to w/a HW that has the inability to dynamically load
+* balance between CCS engines and EU in the compute slices, we have to
+* reconfigure a static mapping on the fly.
+*
+* The mode variable is set by the user and sets the balancing mode,
+* i.e. how the CCS streams are distributed amongst the slices.
+*/
struct {
/*
 * Mask of the non fused CCS slices
 * to be used for the load balancing
 */
intel_engine_mask_t cslices;
+   u32 mode_reg_val;
} ccs;
 
/*
diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c 
b/drivers/gpu/drm/i915/gt/intel_workarounds.c
index f3082fad3f45..f6135be3cd86 100644
--- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
@@ -2727,7 +2727,7 @@ add_render_compute_tuning_settings(struct intel_gt *gt,
 static void ccs_engine_wa_mode(struct intel_engine_cs *engine, struct 
i915_wa_list *wal)
 {
struct intel_gt *gt = engine->gt;
-   u32 mode;
+   u32 mode = gt->ccs.mode_reg_val;
 
if (!IS_DG2(gt->i915))
return;
@@ -2743,8 +2743,10 @@ static void ccs_engine_wa_mode(struct intel_engine_cs 
*engine, struct i915_wa_li
/*
 * After hav

[PATCH v2 01/15] drm/i915/gt: Avoid using masked workaround for CCS_MODE setting

2024-08-22 Thread Andi Shyti

When setting the CCS mode, we mistakenly used wa_masked_en() to
apply the workaround, which reads from the register and masks the
existing value with the new one.

Our intention was to write the value directly, without masking
it.

So far, this hasn't caused issues because we've been using a
register value that only enables a single CCS engine, typically
with an ID of '0'.

However, in upcoming patches, we will be utilizing multiple
engines, and it's crucial that we write the new value directly
without any masking.
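
To make the difference concrete, here is a sketch of the masked-register
encoding, assuming wa_masked_en() uses the usual i915 convention in which
the upper 16 bits select which of the lower 16 bits get updated. That
convention is my assumption and is not shown in this patch; the helper names
are illustrative.

```c
#include <stdint.h>

/* Masked-register encoding: high 16 bits are the write mask. */
static uint32_t masked_field(uint32_t mask, uint32_t value)
{
	return (mask << 16) | value;
}

/* Enable the given bits: they serve as both mask and value. */
static uint32_t masked_bit_enable(uint32_t bits)
{
	return masked_field(bits, bits);
}
```

For an all-zero mode value both encodings give 0, which fits the observation
that the problem stayed hidden while only engine id '0' was written. Any
non-zero value, such as 0x208, would be encoded as 0x02080208 and would no
longer match the intended plain write.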

Signed-off-by: Andi Shyti 
---
 drivers/gpu/drm/i915/gt/intel_workarounds.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c 
b/drivers/gpu/drm/i915/gt/intel_workarounds.c
index bfe6d8fc820f..f3082fad3f45 100644
--- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
@@ -2745,7 +2745,7 @@ static void ccs_engine_wa_mode(struct intel_engine_cs 
*engine, struct i915_wa_li
 * assign all slices to a single CCS. We will call it CCS mode 1
 */
mode = intel_gt_apply_ccs_mode(gt);
-   wa_masked_en(wal, XEHP_CCS_MODE, mode);
+   wa_add(wal, XEHP_CCS_MODE, 0, mode, mode, false);
 }
 
 /*
-- 
2.45.2

[PATCH v2 00/15] CCS static load balance

2024-08-22 Thread Andi Shyti

Hi,

This patch series introduces static load balancing for GPUs with
multiple compute engines. It's a lengthy series, and some
challenging aspects still need to be resolved.

I have tried to split the work as much as possible to facilitate
the review process.

To summarize, in patches 1 to 14, no functional changes occur
except for the addition of the num_cslices interface. The
significant changes happen in patch 15, which is the core part of
the CCS mode setting, utilizing the groundwork laid in the
earlier patches.

In this updated approach, the focus is now on managing the UABI
engine list, which controls the engines exposed to userspace.
Instead of manipulating physical engines and their memory, we now
handle engine exposure through this list.

I would greatly appreciate further input from all reviewers who
have already assisted with the previous work.

IGT tests have also been developed, but I haven't sent them yet.

Thank you Chris for the offline reviews.

Thanks,
Andi

Changelog:
==
PATCHv1 -> PATCHv2
--
 - Use uabi_mutex to protect the uabi_engines, not the engine
   itself. Rename it to uabi_engines_mutex.
 - Use kobject_add/kobject_del for adding and removing
   interfaces, this way we don't need to destroy and recreate the
   engines, anymore. Refactor intel_engine_add_single_sysfs() to
   reflect this scenario.
 - After adding engines to the rb_tree check that they have been
   added correctly.
 - Fix rb_find_add() compare function to also take into account
   the class, not just the instance.

RFCv2 -> PATCHv1

 - Removed gt->ccs.mutex
 - Rename m -> width, ccs_id -> engine in
   intel_gt_apply_ccs_mode().
 - In the CCS register value calculation
   (intel_gt_apply_ccs_mode()) the engine (ccs_id) needs to move
   along the ccs_mask (set by the user) instead of the
   cslice_mask.
 - Add GEM_BUG_ON after calculating the new ccs_mask
   (update_ccs_mask()) to make sure all engines have been
   evaluated (i.e. ccs_mask must be '0' at the end of the
   algorithm).
 - move wakeref lock before evaluating intel_gt_pm_is_awake() and
   fix exit path accordingly.
 - Use a more compact form in intel_gt_sysfs_ccs_init() and
   add_uabi_ccs_engines() when evaluating sysfs_create_file(): no
   need to store the return value to the err variable which is
   unused. Get rid of err.
 - Print a warning instead of a debug message if we fail to
   create the sysfs files.
 - If engine files creation fails in
   intel_engine_add_single_sysfs(), print a warning, not an
   error.
 - Rename gt->ccs.ccs_mask to gt->ccs.id_mask and add a comment
   to explain its purpose.
 - During uabi engine creation, in
   intel_engines_driver_register(), the uabi_ccs_instance is
   redundant because the ccs_instances is already tracked in
   engine->uabi_instance.
 - Mark add_uabi_ccs_engines() and remove_uabi_ccs_engines() as
   __maybe_unused not to break bisectability. They wouldn't
   compile in their own commit. They will be used in the next
   patch and the __maybe_unused is removed.
 - Update engine's workaround every time a new mode is set in
   update_ccs_mask().
 - Mark engines as valid or invalid using their status as
   rb_node. Invalid engines are marked as invalid using
   RB_CLEAR_NODE(). Execbufs will check for their validity when
   selecting the engine to be combined to a context.
 - Create for_each_enabled_engine() which skips the non valid
   engines and use it in selftests.

RFCv1 -> RFCv2
--
Compared to the first version I've taken a completely different
approach to adding and removing engines. In v1, physical engines
were directly added and removed, along with the memory allocated
to them, each time the user changed the CCS mode (from the
previous cover letter).

Andi Shyti (15):
  drm/i915/gt: Avoid using masked workaround for CCS_MODE setting
  drm/i915/gt: Move the CCS mode variable to a global position
  drm/i915/gt: Allow the creation of multi-mode CCS masks
  drm/i915/gt: Refactor uabi engine class/instance list creation
  drm/i915/gem: Mark and verify UABI engine validity
  drm/i915/gt: Introduce for_each_enabled_engine() and apply it in
selftests
  drm/i915/gt: Manage CCS engine creation within UABI exposure
  drm/i915/gt: Remove cslices mask value from the CCS structure
  drm/i915/gt: Expose the number of total CCS slices
  drm/i915/gt: Store engine-related sysfs kobjects
  drm/i915/gt: Store active CCS mask
  drm/i915: Protect access to the UABI engines list with a mutex
  drm/i915/gt: Isolate single sysfs engine file creation
  drm/i915/gt: Implement creation and removal routines for CCS engines
  drm/i915/gt: Allow the user to change the CCS mode through sysfs

 drivers/gpu/drm/i915/gem/i915_gem_context.c   |   3 +
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c|  28 +-
 drivers/gpu/drm/i915/gt/intel_engine_cs.c |  23 --
 drivers/gpu/drm/i915/gt/intel_engine_ty

Re: [PATCH v5] drm/i915/hwmon: expose fan speed

2024-08-21 Thread Andi Shyti

> > > + /*
> > > +  * HW register value is accumulated count of pulses from
> > > +  * PWM fan with the scale of 2 pulses per rotation.
> > > +  */
> > > + rotations = pulses / 2;
> > > +
> > > + time = jiffies_delta_to_msecs(time_now - fi->time_prev);
> > > + if (unlikely(!time)) {
> > > + ret = -EAGAIN;
> > > + goto exit;
> > > + }
> > 
> > Can you please add a comment describing how you obtain the speed
> > calculation?
> 
> That's what I initially tried but ended up dropping it in favour of RPM
> formula below, which I found to be doing a better job of explaining than
> a few lines of description.
> 
> > Basically at every read you store the values. Is it possible that
> > we don't have reads for a long time and the register resets more
> > than once?
> 
> Considering a fan continuously running at higher speeds (for example 4000 RPM
> which is quite optimistic), with the scale of 2 pulses per rotation, a 32 bit
> register will take around a year to overflow, which is more than most usecases
> I could think of.

Which can be considered a worst-case scenario. I would have
preferred here a runtime calculation, which means read now, wait
a bit, read again and calculate. The read might be slow, but
efficient.

Anyway, your argument makes sense, so I'm not going to push
on this, I already r-b'ed it.
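
For reference, the overflow estimate quoted above can be checked with a few
lines of arithmetic. This is only a back-of-the-envelope sketch; the function
name is made up for illustration.

```c
#include <stdint.h>

/* Days until a 32-bit pulse counter wraps at a constant fan speed. */
static double days_until_wrap(unsigned int rpm,
			      unsigned int pulses_per_rotation)
{
	double pulses_per_minute = (double)rpm * pulses_per_rotation;
	double minutes = 4294967296.0 / pulses_per_minute; /* 2^32 pulses */

	return minutes / (60.0 * 24.0);
}
```

At 4000 RPM and 2 pulses per rotation this comes to a little over a year,
consistent with the figure given in the quoted reply.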

Thanks,
Andi

Re: [PATCH v6] drm/i915/hwmon: expose fan speed

2024-08-21 Thread Andi Shyti

Hi Raag,

> + /*
> +  * Calculate fan speed in RPM by time averaging two subsequent
> +  * readings in minutes.
> +  * RPM = number of rotations * msecs per minute / time in msecs
> +  */
> + *val = DIV_ROUND_UP(rotations * (MSEC_PER_SEC * 60), time);

once you find the correct operator here, please feel free to add:

Reviewed-by: Andi Shyti 

Normally these get me confused, I'm happy Andy is looking after
it :-)
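
A user-space sketch of the RPM formula from the quoted comment, using 64-bit
arithmetic throughout. The pulse-delta handling and the -1 return for a zero
interval are assumptions for illustration; the driver returns -EAGAIN in
that case and may use a different division helper.

```c
#include <stdint.h>

#define MSEC_PER_MINUTE 60000ULL

/* Round-up integer division, in the spirit of the kernel's DIV_ROUND_UP(). */
static uint64_t div_round_up_u64(uint64_t n, uint64_t d)
{
	return (n + d - 1) / d;
}

/* Returns RPM, or -1 when no time has elapsed between the two readings. */
static int64_t fan_rpm(uint32_t pulses_prev, uint32_t pulses_now,
		       uint64_t elapsed_ms)
{
	uint32_t pulses = pulses_now - pulses_prev; /* wraps modulo 2^32 */
	uint64_t rotations = pulses / 2;            /* 2 pulses per rotation */

	if (!elapsed_ms)
		return -1;

	/* RPM = rotations * msecs per minute / elapsed msecs */
	return (int64_t)div_round_up_u64(rotations * MSEC_PER_MINUTE,
					 elapsed_ms);
}
```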

Thanks,
Andi

[CI] drm/i915/gt: Use kmemdup_array instead of kmemdup for multiple allocation

2024-08-21 Thread Andi Shyti

From: Yu Jiaoliang 

Let kmemdup_array() take care of the multiplication and possible
overflows.
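
As a rough user-space analogy (not the kernel implementation), an
array-duplicating helper can reject a count/size pair whose product would
overflow before it allocates anything. The name memdup_array() is
hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Duplicate n elements of the given size; NULL on overflow or alloc failure. */
static void *memdup_array(const void *src, size_t n, size_t size)
{
	void *dst;

	if (size && n > SIZE_MAX / size)
		return NULL; /* n * size would overflow size_t */

	dst = malloc(n * size);
	if (dst)
		memcpy(dst, src, n * size);

	return dst;
}
```

With an open-coded "count * sizeof(*list)", a wrapped product would silently
lead to an undersized allocation; centralizing the check in the helper
removes that risk at every call site.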

v2:
- Change subject
- Leave one blank line between the commit log and the tag section
- Fix code alignment issue

v3:
- Fix code alignment
- Apply the patch on a clean drm-tip

Signed-off-by: Yu Jiaoliang 
Reviewed-by: Jani Nikula 
Reviewed-by: Andi Shyti 
Signed-off-by: Andi Shyti 
---
Hi Yu,

I am resending this patch because this never reached the mailing
list and it didn't get picked up by CI. Maybe it's going through
some manual spam check so that it might take a bit of time.

You should have seen it here[*] as you can see your previous
patches.

Andi

[*] 
https://patchwork.freedesktop.org/project/intel-gfx/series/?ordering=-last_updated

 drivers/gpu/drm/i915/gt/intel_workarounds.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c 
b/drivers/gpu/drm/i915/gt/intel_workarounds.c
index bfe6d8fc820f..baa609bdf7cb 100644
--- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
@@ -111,9 +111,8 @@ static void wa_init_finish(struct i915_wa_list *wal)
 {
/* Trim unused entries. */
if (!IS_ALIGNED(wal->count, WA_LIST_CHUNK)) {
-   struct i915_wa *list = kmemdup(wal->list,
-  wal->count * sizeof(*list),
-  GFP_KERNEL);
+   struct i915_wa *list = kmemdup_array(wal->list, wal->count,
+sizeof(*list), GFP_KERNEL);
 
if (list) {
kfree(wal->list);
-- 
2.45.2

[PATCH v1 14/14] drm/i915/gt: Allow the user to change the CCS mode through sysfs

2024-08-21 Thread Andi Shyti

Create the 'ccs_mode' file under

/sys/class/drm/cardX/gt/gt0/ccs_mode

This file allows the user to read and set the current CCS mode.

 - Reading: The user can read the current CCS mode, which can be
   1, 2, or 4. This value is derived from the current engine
   mask.

 - Writing: The user can set the CCS mode to 1, 2, or 4,
   depending on the desired number of exposed engines and the
   required load balancing.

The interface will return -EBUSY if other clients are connected
to i915, or -EINVAL if an invalid value is set.
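
The accepted values follow from a simple divisibility rule, sketched below
as a stand-alone function. The function name is illustrative; the condition
is the one used in ccs_mode_store() in this patch.

```c
#include <errno.h>

/* Accept only non-zero values that evenly divide the number of slices. */
static int ccs_mode_validate(unsigned int val, unsigned int num_cslices)
{
	if (!val || val > num_cslices || (num_cslices % val))
		return -EINVAL;

	return 0;
}
```

With four available slices this accepts 1, 2 and 4 and rejects 0, 3 and
anything above 4.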

Signed-off-by: Andi Shyti 
---
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 82 -
 1 file changed, 80 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c 
b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
index 82de29eb4dd7..ffdcc98b0802 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
@@ -5,6 +5,7 @@
 
 #include "i915_drv.h"
 #include "intel_gt_ccs_mode.h"
+#include "intel_gt_pm.h"
 #include "intel_gt_print.h"
 #include "intel_gt_regs.h"
 #include "intel_gt_sysfs.h"
@@ -160,7 +161,7 @@ static int rb_engine_cmp(struct rb_node *rb_new, const 
struct rb_node *rb_old)
return new->uabi_instance - old->uabi_instance;
 }
 
-static void __maybe_unused add_uabi_ccs_engines(struct intel_gt *gt, u32 
ccs_mode)
+static void add_uabi_ccs_engines(struct intel_gt *gt, u32 ccs_mode)
 {
struct drm_i915_private *i915 = gt->i915;
intel_engine_mask_t new_ccs_mask, tmp;
@@ -194,7 +195,7 @@ static void __maybe_unused add_uabi_ccs_engines(struct 
intel_gt *gt, u32 ccs_mod
}
 }
 
-static void __maybe_unused remove_uabi_ccs_engines(struct intel_gt *gt, u8 
ccs_mode)
+static void remove_uabi_ccs_engines(struct intel_gt *gt, u8 ccs_mode)
 {
struct drm_i915_private *i915 = gt->i915;
intel_engine_mask_t new_ccs_mask, tmp;
@@ -240,8 +241,85 @@ static ssize_t num_cslices_show(struct device *dev,
 }
 static DEVICE_ATTR_RO(num_cslices);
 
+static ssize_t ccs_mode_show(struct device *dev,
+struct device_attribute *attr, char *buff)
+{
+   struct intel_gt *gt = kobj_to_gt(&dev->kobj);
+   u32 ccs_mode;
+
+   ccs_mode = hweight32(gt->ccs.id_mask);
+
+   return sysfs_emit(buff, "%u\n", ccs_mode);
+}
+
+static ssize_t ccs_mode_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buff, size_t count)
+{
+   struct intel_gt *gt = kobj_to_gt(&dev->kobj);
+   int num_cslices = hweight32(CCS_MASK(gt));
+   int ccs_mode = hweight32(gt->ccs.id_mask);
+   ssize_t ret;
+   u32 val;
+
+   ret = kstrtou32(buff, 0, &val);
+   if (ret)
+   return ret;
+
+   /*
+* As of now possible values to be set are 1, 2, 4,
+* up to the maximum number of available slices
+*/
+   if (!val || val > num_cslices || (num_cslices % val))
+   return -EINVAL;
+
+   /* Let's wait until the GT is no longer in use */
+   ret = intel_gt_pm_wait_for_idle(gt);
+   if (ret)
+   return ret;
+
+   mutex_lock(&gt->wakeref.mutex);
+
+   /*
+* Let's check again that the GT is idle,
+* we don't want to change the CCS mode
+* while someone is using the GT
+*/
+   if (intel_gt_pm_is_awake(gt)) {
+   ret = -EBUSY;
+   goto out;
+   }
+
+   /*
+* Nothing to do if the requested setting
+* is the same as the current one
+*/
+   if (val == ccs_mode)
+   goto out;
+   else if (val > ccs_mode)
+   add_uabi_ccs_engines(gt, val);
+   else
+   remove_uabi_ccs_engines(gt, val);
+
+out:
+   mutex_unlock(&gt->wakeref.mutex);
+
+   return ret ?: count;
+}
+static DEVICE_ATTR_RW(ccs_mode);
+
 void intel_gt_sysfs_ccs_init(struct intel_gt *gt)
 {
if (sysfs_create_file(&gt->sysfs_gt, &dev_attr_num_cslices.attr))
gt_warn(gt, "Failed to create sysfs num_cslices files\n");
+
+   /*
+* Do not create the ccs_mode file for non DG2 platforms
+* because they don't need it as they have only one CCS engine
+*/
+   if (!IS_DG2(gt->i915))
+   return;
+
+   if (sysfs_create_file(&gt->sysfs_gt, &dev_attr_ccs_mode.attr))
+   gt_warn(gt, "Failed to create sysfs ccs_mode files\n");
 }
-- 
2.45.2

[PATCH v1 13/14] drm/i915/gt: Implement creation and removal routines for CCS engines

2024-08-21 Thread Andi Shyti

In preparation for upcoming patches, we need routines to
dynamically create and destroy CCS engines based on the CCS mode
that the user wants to set.

The process begins by calculating the engine mask for the engines
that need to be added or removed. We then update the UABI list of
exposed engines and create or destroy the corresponding sysfs
interfaces accordingly.

These functions are not yet in use, so no functional changes are
intended at this stage.

Mark the functions 'add_uabi_ccs_engines()' and
'remove_uabi_ccs_engines()' as '__maybe_unused' to ensure
successful compilation and maintain bisectability. This
annotation will be removed in subsequent commits.

Use a mutex to control the changes to the uabi engine.

Signed-off-by: Andi Shyti 
---
 drivers/gpu/drm/i915/gt/intel_engine_types.h |  5 +
 drivers/gpu/drm/i915/gt/intel_engine_user.c  |  2 +
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c  | 99 
 3 files changed, 106 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h 
b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index cdc695fda918..28a81e33dbe1 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -413,6 +413,11 @@ struct intel_engine_cs {
struct list_head uabi_list;
struct rb_node uabi_node;
};
+   /*
+* Serialize changes of the engine status, validity (through
+* RB_CLEAR_NODE) and insertion and removal from uabi list
+*/
+   struct mutex uabi_mutex;
 
struct intel_sseu sseu;
 
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c 
b/drivers/gpu/drm/i915/gt/intel_engine_user.c
index 8e5284af8335..c5006016 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
@@ -219,6 +219,8 @@ void intel_engines_driver_register(struct drm_i915_private 
*i915)
struct intel_engine_cs *engine =
container_of(it, typeof(*engine), uabi_list);
 
+   mutex_init(&engine->uabi_mutex);
+
if (intel_gt_has_unrecoverable_error(engine->gt))
goto clear_node_continue; /* ignore incomplete engines 
*/
 
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c 
b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
index 45e9280f9bac..82de29eb4dd7 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
@@ -8,6 +8,7 @@
 #include "intel_gt_print.h"
 #include "intel_gt_regs.h"
 #include "intel_gt_sysfs.h"
+#include "sysfs_engines.h"
 
 static void intel_gt_apply_ccs_mode(struct intel_gt *gt)
 {
@@ -115,6 +116,29 @@ static void __update_ccs_mask(struct intel_gt *gt, u32 
ccs_mode)
intel_gt_apply_ccs_mode(gt);
 }
 
+static void update_ccs_mask(struct intel_gt *gt, u32 ccs_mode)
+{
+   struct intel_engine_cs *engine;
+   intel_engine_mask_t tmp;
+
+   __update_ccs_mask(gt, ccs_mode);
+
+   /* Update workaround values */
+   for_each_engine_masked(engine, gt, gt->ccs.id_mask, tmp) {
+   struct i915_wa_list *wal = &engine->wa_list;
+   struct i915_wa *wa;
+   int i;
+
+   for (i = 0, wa = wal->list; i < wal->count; i++, wa++) {
+   if (!i915_mmio_reg_equal(wa->reg, XEHP_CCS_MODE))
+   continue;
+
+   wa->set = gt->ccs.mode_reg_val;
+   wa->read = gt->ccs.mode_reg_val;
+   }
+   }
+}
+
 void intel_gt_ccs_mode_init(struct intel_gt *gt)
 {
if (!IS_DG2(gt->i915))
@@ -128,6 +152,81 @@ void intel_gt_ccs_mode_init(struct intel_gt *gt)
__update_ccs_mask(gt, 1);
 }
 
+static int rb_engine_cmp(struct rb_node *rb_new, const struct rb_node *rb_old)
+{
+   struct intel_engine_cs *new = rb_to_uabi_engine(rb_new);
+   struct intel_engine_cs *old = rb_to_uabi_engine(rb_old);
+
+   return new->uabi_instance - old->uabi_instance;
+}
+
+static void __maybe_unused add_uabi_ccs_engines(struct intel_gt *gt, u32 
ccs_mode)
+{
+   struct drm_i915_private *i915 = gt->i915;
+   intel_engine_mask_t new_ccs_mask, tmp;
+   struct intel_engine_cs *e;
+
+   /* Store the current ccs mask */
+   new_ccs_mask = gt->ccs.id_mask;
+   update_ccs_mask(gt, ccs_mode);
+
+   /*
+* Store only the mask of the CCS engines that need to be added by
+* removing from the new mask the engines that are already active
+*/
+   new_ccs_mask = gt->ccs.id_mask & ~new_ccs_mask;
+   new_ccs_mask <<= CCS0;
+
+   for_each_engine_masked(e, gt, new_ccs_mask, tmp) {
+   mutex_lock(&e->uabi_mutex);
+
+   i915->engine_uabi_class_count[I915_ENGINE_CLASS_COMPUTE]++;
+
+   /
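
The mask arithmetic used by add_uabi_ccs_engines() (keep only the bits that
are set in the new id mask but not in the old one, then shift them to the
CCS position in the engine mask) can be sketched in plain C. The shift value
below is a placeholder and not the real value of CCS0, and the removal
helper is an assumption made by symmetry, since the removal path is not
visible in this excerpt.

```c
#include <stdint.h>

/* Placeholder for the position of CCS0 in the engine mask (assumed value). */
#define CCS0_SHIFT_SKETCH 8

/* Engines present in the new id mask but not in the old one. */
static uint32_t ccs_engines_to_add(uint32_t old_id_mask, uint32_t new_id_mask)
{
	return (new_id_mask & ~old_id_mask) << CCS0_SHIFT_SKETCH;
}

/* Assumed counterpart: engines present in the old id mask but not in the new one. */
static uint32_t ccs_engines_to_remove(uint32_t old_id_mask, uint32_t new_id_mask)
{
	return (old_id_mask & ~new_id_mask) << CCS0_SHIFT_SKETCH;
}
```

Going from mode 1 (id mask 0x1) to mode 4 (id mask 0xf) yields the three
engines ccs1..ccs3 as the set to add.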

[PATCH v1 12/14] drm/i915/gt: Isolate single sysfs engine file creation

2024-08-21 Thread Andi Shyti

In preparation for upcoming patches, we need the ability to
create and remove individual sysfs files. To facilitate this,
extract from the intel_engines_add_sysfs() function the creation
of individual files.

Signed-off-by: Andi Shyti 
---
 drivers/gpu/drm/i915/gt/sysfs_engines.c | 76 -
 drivers/gpu/drm/i915/gt/sysfs_engines.h |  2 +
 2 files changed, 50 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/sysfs_engines.c 
b/drivers/gpu/drm/i915/gt/sysfs_engines.c
index f67f76df1cfe..fd1685f81505 100644
--- a/drivers/gpu/drm/i915/gt/sysfs_engines.c
+++ b/drivers/gpu/drm/i915/gt/sysfs_engines.c
@@ -9,6 +9,7 @@
 #include "i915_drv.h"
 #include "intel_engine.h"
 #include "intel_engine_heartbeat.h"
+#include "intel_gt_print.h"
 #include "sysfs_engines.h"
 
 struct kobj_engine {
@@ -481,7 +482,7 @@ static void add_defaults(struct kobj_engine *parent)
return;
 }
 
-void intel_engines_add_sysfs(struct drm_i915_private *i915)
+int intel_engine_add_single_sysfs(struct intel_engine_cs *engine)
 {
static const struct attribute * const files[] = {
&name_attr.attr,
@@ -497,7 +498,50 @@ void intel_engines_add_sysfs(struct drm_i915_private *i915)
 #endif
NULL
};
+   struct kobject *dir = engine->i915->sysfs_engine;
+   struct kobject *kobj = engine->kobj;
+   int err;
+
+   if (!kobj) {
+   kobj = kobj_engine(dir, engine);
+   if (!kobj) {
+   err = -EFAULT;
+   goto err_engine;
+   }
+   }
+
+   err = sysfs_create_files(kobj, files);
+   if (err)
+   goto err_object;
+
+   if (intel_engine_has_timeslices(engine)) {
+   err = sysfs_create_file(kobj, &timeslice_duration_attr.attr);
+   if (err)
+   goto err_object;
+   }
+
+   if (intel_engine_has_preempt_reset(engine)) {
+   err = sysfs_create_file(kobj, &preempt_timeout_attr.attr);
+   if (err)
+   goto err_object;
+   }
 
+   add_defaults(container_of(kobj, struct kobj_engine, base));
+
+   engine->kobj = kobj;
+
+   return 0;
+
+err_object:
+   kobject_put(kobj);
+err_engine:
+   gt_warn(engine->gt, "Failed to add sysfs engine '%s'\n", engine->name);
+
+   return err;
+}
+
+void intel_engines_add_sysfs(struct drm_i915_private *i915)
+{
struct device *kdev = i915->drm.primary->kdev;
struct intel_engine_cs *engine;
struct kobject *dir;
@@ -509,34 +553,10 @@ void intel_engines_add_sysfs(struct drm_i915_private 
*i915)
i915->sysfs_engine = dir;
 
for_each_uabi_engine(engine, i915) {
-   struct kobject *kobj;
-
-   kobj = kobj_engine(dir, engine);
-   if (!kobj)
-   goto err_engine;
-
-   if (sysfs_create_files(kobj, files))
-   goto err_object;
+   int err;
 
-   if (intel_engine_has_timeslices(engine) &&
-   sysfs_create_file(kobj, &timeslice_duration_attr.attr))
-   goto err_engine;
-
-   if (intel_engine_has_preempt_reset(engine) &&
-   sysfs_create_file(kobj, &preempt_timeout_attr.attr))
-   goto err_engine;
-
-   add_defaults(container_of(kobj, struct kobj_engine, base));
-
-   engine->kobj = kobj;
-
-   if (0) {
-err_object:
-   kobject_put(kobj);
-err_engine:
-   dev_err(kdev, "Failed to add sysfs engine '%s'\n",
-   engine->name);
+   err = intel_engine_add_single_sysfs(engine);
+   if (err)
break;
-   }
}
 }
diff --git a/drivers/gpu/drm/i915/gt/sysfs_engines.h 
b/drivers/gpu/drm/i915/gt/sysfs_engines.h
index 9546fffe03a7..2e3ec2df14a9 100644
--- a/drivers/gpu/drm/i915/gt/sysfs_engines.h
+++ b/drivers/gpu/drm/i915/gt/sysfs_engines.h
@@ -7,7 +7,9 @@
 #define INTEL_ENGINE_SYSFS_H
 
 struct drm_i915_private;
+struct intel_engine_cs;
 
 void intel_engines_add_sysfs(struct drm_i915_private *i915);
+int intel_engine_add_single_sysfs(struct intel_engine_cs *engine);
 
 #endif /* INTEL_ENGINE_SYSFS_H */
-- 
2.45.2

[PATCH v1 11/14] drm/i915/gt: Store active CCS mask

2024-08-21 Thread Andi Shyti

To support upcoming patches, we need to store the current mask
for active CCS engines.

Active engines refer to those exposed to userspace via the UABI
engine list.

Signed-off-by: Andi Shyti 
---
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 54 -
 drivers/gpu/drm/i915/gt/intel_gt_types.h|  7 +++
 2 files changed, 49 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c 
b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
index ed3ad881a89d..45e9280f9bac 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
@@ -12,9 +12,10 @@
 static void intel_gt_apply_ccs_mode(struct intel_gt *gt)
 {
unsigned long cslices_mask = CCS_MASK(gt);
-   u32 mode_val = 0;
+   unsigned long ccs_mask = gt->ccs.id_mask;
/* CCS mode, i.e. number of CCS engines to be enabled */
-   u32 width = 1;
+   u32 width = hweight32(ccs_mask);
+   u32 mode_val = 0;
/* CCS engine id, i.e. the engines position in the engine's bitmask */
int engine;
int cslice;
@@ -57,7 +58,7 @@ static void intel_gt_apply_ccs_mode(struct intel_gt *gt)
 *   slice 2: ccs2
 *   slice 3: ccs3
 */
-   engine = __ffs(cslices_mask);
+   engine = __ffs(ccs_mask);
 
for (cslice = 0; cslice < I915_MAX_CCS; cslice++) {
if (!(cslices_mask & BIT(cslice))) {
@@ -73,29 +74,58 @@ static void intel_gt_apply_ccs_mode(struct intel_gt *gt)
mode_val |= XEHP_CCS_MODE_CSLICE(cslice, engine);
 
if (!width) {
-   /*
-* CCS mode, will be used later to
-* reset to a flexible value
-*/
-   width = 1;
-   engine = __ffs(cslices_mask);
+   /* CCS mode, reset to the initial mode */
+   width = hweight32(ccs_mask);
+   engine = __ffs(ccs_mask);
continue;
}
 
width--;
-   engine = find_next_bit(&cslices_mask, I915_MAX_CCS, engine + 1);
+   engine = find_next_bit(&ccs_mask, I915_MAX_CCS, engine + 1);
}
 
gt->ccs.mode_reg_val = mode_val;
 }
 
+static void __update_ccs_mask(struct intel_gt *gt, u32 ccs_mode)
+{
+   unsigned long cslices_mask = CCS_MASK(gt);
+   int i;
+
+   /* Mask off all the CCS engines */
+   gt->ccs.id_mask = 0;
+
+   for_each_set_bit(i, &cslices_mask, I915_MAX_CCS) {
+   gt->ccs.id_mask |= BIT(i);
+
+   ccs_mode--;
+   if (!ccs_mode)
+   break;
+   }
+
+   /*
+* It's impossible for 'ccs_mode' to be non-zero at this point.
+* This scenario would only occur if the 'ccs_mode' provided by
+* the caller exceeded the total number of CCS engines, a condition
+* we check before calling the 'update_ccs_mask()' function.
+*/
+   GEM_BUG_ON(ccs_mode);
+
+   /* Initialize the CCS mode setting */
+   intel_gt_apply_ccs_mode(gt);
+}
+
 void intel_gt_ccs_mode_init(struct intel_gt *gt)
 {
if (!IS_DG2(gt->i915))
return;
 
-   /* Initialize the CCS mode setting */
-   intel_gt_apply_ccs_mode(gt);
+   /*
+* Set CCS balance mode 1 in the ccs_mask.
+*
+* During init the workarounds are not set up yet.
+*/
+   __update_ccs_mask(gt, 1);
 }
 
 static ssize_t num_cslices_show(struct device *dev,
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h 
b/drivers/gpu/drm/i915/gt/intel_gt_types.h
index 71e43071da0b..641be69016e1 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
@@ -219,6 +219,13 @@ struct intel_gt {
 */
struct {
u32 mode_reg_val;
+
+   /*
+   * CCS id_mask is the mask of command streamer instances
+   * exposed to the user, while CCS_MASK(gt) is the mask
+   * of the available unfused compute slices.
+*/
+   intel_engine_mask_t id_mask;
} ccs;
 
/*
-- 
2.45.2
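
The selection done by __update_ccs_mask() above (take the first 'ccs_mode'
set bits of the available slice mask) can be sketched in userspace as
follows. MAX_CCS_SKETCH stands in for I915_MAX_CCS and its value is an
assumption; pick_ccs_id_mask() is a hypothetical name, and the assert()
plays the role of GEM_BUG_ON().

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for I915_MAX_CCS (value assumed for this sketch). */
#define MAX_CCS_SKETCH 4

/*
 * Return a mask with the lowest 'ccs_mode' set bits of 'cslices_mask'.
 * The caller is expected to pass 1 <= ccs_mode <= popcount(cslices_mask).
 */
static uint32_t pick_ccs_id_mask(uint32_t cslices_mask, uint32_t ccs_mode)
{
	uint32_t id_mask = 0;
	int i;

	for (i = 0; i < MAX_CCS_SKETCH; i++) {
		if (!(cslices_mask & (1u << i)))
			continue;

		id_mask |= 1u << i;

		if (--ccs_mode == 0)
			break;
	}

	/* Non-zero here would mean more engines were requested than exist. */
	assert(ccs_mode == 0);

	return id_mask;
}
```

For a part with slice 1 fused off (slice mask 0xd), requesting mode 2
selects bits 0 and 2, i.e. an id mask of 0x5.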

[PATCH v1 10/14] drm/i915/gt: Store engine-related sysfs kobjects

2024-08-21 Thread Andi Shyti

Upcoming commits will need to access engine-related kobjects to
enable the creation and destruction of sysfs interfaces at
runtime.

For this, store the "engine" directory (i915->sysfs_engine), the
engine files (gt->kobj), and the default data
(gt->kobj_defaults).

Signed-off-by: Andi Shyti 
---
 drivers/gpu/drm/i915/gt/intel_engine_types.h | 2 ++
 drivers/gpu/drm/i915/gt/sysfs_engines.c  | 4 
 drivers/gpu/drm/i915/i915_drv.h  | 1 +
 3 files changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h 
b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index ba55c059063d..cdc695fda918 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -388,6 +388,8 @@ struct intel_engine_cs {
u32 context_size;
u32 mmio_base;
 
+   struct kobject *kobj;
+
struct intel_engine_tlb_inv tlb_inv;
 
/*
diff --git a/drivers/gpu/drm/i915/gt/sysfs_engines.c 
b/drivers/gpu/drm/i915/gt/sysfs_engines.c
index 021f51d9b456..f67f76df1cfe 100644
--- a/drivers/gpu/drm/i915/gt/sysfs_engines.c
+++ b/drivers/gpu/drm/i915/gt/sysfs_engines.c
@@ -506,6 +506,8 @@ void intel_engines_add_sysfs(struct drm_i915_private *i915)
if (!dir)
return;
 
+   i915->sysfs_engine = dir;
+
for_each_uabi_engine(engine, i915) {
struct kobject *kobj;
 
@@ -526,6 +528,8 @@ void intel_engines_add_sysfs(struct drm_i915_private *i915)
 
add_defaults(container_of(kobj, struct kobj_engine, base));
 
+   engine->kobj = kobj;
+
if (0) {
 err_object:
kobject_put(kobj);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 94f7f6cc444c..3a8a757f5bd5 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -320,6 +320,7 @@ struct drm_i915_private {
struct intel_gt *gt[I915_MAX_GT];
 
struct kobject *sysfs_gt;
+   struct kobject *sysfs_engine;
 
/* Quick lookup of media GT (current platforms only have one) */
struct intel_gt *media_gt;
-- 
2.45.2

[PATCH v1 09/14] drm/i915/gt: Expose the number of total CCS slices

2024-08-21 Thread Andi Shyti

Implement a sysfs interface to show the number of available CCS
slices. The displayed number does not take into account the CCS
balancing mode.

Signed-off-by: Andi Shyti 
---
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 21 +
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h |  1 +
 drivers/gpu/drm/i915/gt/intel_gt_sysfs.c|  2 ++
 3 files changed, 24 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c 
b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
index f0319278a5fc..ed3ad881a89d 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
@@ -5,7 +5,9 @@
 
 #include "i915_drv.h"
 #include "intel_gt_ccs_mode.h"
+#include "intel_gt_print.h"
 #include "intel_gt_regs.h"
+#include "intel_gt_sysfs.h"
 
 static void intel_gt_apply_ccs_mode(struct intel_gt *gt)
 {
@@ -95,3 +97,22 @@ void intel_gt_ccs_mode_init(struct intel_gt *gt)
/* Initialize the CCS mode setting */
intel_gt_apply_ccs_mode(gt);
 }
+
+static ssize_t num_cslices_show(struct device *dev,
+   struct device_attribute *attr,
+   char *buff)
+{
+   struct intel_gt *gt = kobj_to_gt(&dev->kobj);
+   u32 num_slices;
+
+   num_slices = hweight32(CCS_MASK(gt));
+
+   return sysfs_emit(buff, "%u\n", num_slices);
+}
+static DEVICE_ATTR_RO(num_cslices);
+
+void intel_gt_sysfs_ccs_init(struct intel_gt *gt)
+{
+   if (sysfs_create_file(&gt->sysfs_gt, &dev_attr_num_cslices.attr))
+   gt_warn(gt, "Failed to create sysfs num_cslices files\n");
+}
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h 
b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h
index 4a6763b95a78..9696cc9017f6 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h
@@ -9,5 +9,6 @@
 #include "intel_gt.h"
 
 void intel_gt_ccs_mode_init(struct intel_gt *gt);
+void intel_gt_sysfs_ccs_init(struct intel_gt *gt);
 
 #endif /* __INTEL_GT_CCS_MODE_H__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_sysfs.c 
b/drivers/gpu/drm/i915/gt/intel_gt_sysfs.c
index 33cba406b569..895eedc402ae 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_sysfs.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_sysfs.c
@@ -12,6 +12,7 @@
 #include "i915_drv.h"
 #include "i915_sysfs.h"
 #include "intel_gt.h"
+#include "intel_gt_ccs_mode.h"
 #include "intel_gt_print.h"
 #include "intel_gt_sysfs.h"
 #include "intel_gt_sysfs_pm.h"
@@ -101,6 +102,7 @@ void intel_gt_sysfs_register(struct intel_gt *gt)
goto exit_fail;
 
intel_gt_sysfs_pm_init(gt, &gt->sysfs_gt);
+   intel_gt_sysfs_ccs_init(gt);
 
return;
 
-- 
2.45.2
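
The value reported by num_cslices is simply the number of bits set in the
slice mask. As a rough userspace illustration of what hweight32() computes
(this is not the kernel implementation, and count_cslices() is a made-up
name):

```c
#include <stdint.h>

/* Count the set bits of a 32-bit mask by clearing the lowest one each pass. */
static unsigned int count_cslices(uint32_t mask)
{
	unsigned int count = 0;

	while (mask) {
		mask &= mask - 1; /* drop the lowest set bit */
		count++;
	}

	return count;
}
```

A fully populated mask of 0xf reports four slices, while a mask of 0xd (one
slice fused off) reports three, regardless of the CCS mode currently set.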

[PATCH v1 08/14] drm/i915/gt: Remove cslices mask value from the CCS structure

2024-08-21 Thread Andi Shyti

Following the decision to manage CCS engine creation within UABI
engines, the "cslices" variable in the "ccs" structure in the
"gt" is no longer needed. Remove it, as it is now redundant.

Signed-off-by: Andi Shyti 
---
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 2 +-
 drivers/gpu/drm/i915/gt/intel_gt_types.h| 5 -
 2 files changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c 
b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
index 5ca36985bdd7..f0319278a5fc 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
@@ -9,7 +9,7 @@
 
 static void intel_gt_apply_ccs_mode(struct intel_gt *gt)
 {
-   unsigned long cslices_mask = gt->ccs.cslices;
+   unsigned long cslices_mask = CCS_MASK(gt);
u32 mode_val = 0;
/* CCS mode, i.e. number of CCS engines to be enabled */
u32 width = 1;
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h 
b/drivers/gpu/drm/i915/gt/intel_gt_types.h
index 9e257f34d05b..71e43071da0b 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
@@ -218,11 +218,6 @@ struct intel_gt {
 * i.e. how the CCS streams are distributed amongs the slices.
 */
struct {
-   /*
-* Mask of the non fused CCS slices
-* to be used for the load balancing
-*/
-   intel_engine_mask_t cslices;
u32 mode_reg_val;
} ccs;
 
-- 
2.45.2

[PATCH v1 07/14] drm/i915/gt: Manage CCS engine creation within UABI exposure

2024-08-21 Thread Andi Shyti

In commit ea315f98e5d6 ("drm/i915/gt: Do not generate the command
streamer for all the CCS"), we restricted the creation of
physical CCS engines to only one stream. This allowed the user to
submit a single compute workload, with all CCS slices sharing the
workload from that stream.

This patch removes that limitation but still exposes only one
stream to the user. The physical memory for each engine remains
allocated but unused; the user, however, will only see one engine
exposed.

Do this by adding only one engine to the UABI list, ensuring that
only one engine is visible to the user.

Signed-off-by: Andi Shyti 
---
 drivers/gpu/drm/i915/gt/intel_engine_cs.c   | 23 -
 drivers/gpu/drm/i915/gt/intel_engine_user.c | 17 ---
 2 files changed, 14 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 4d30a86016f2..def255ee0b96 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -876,29 +876,6 @@ static intel_engine_mask_t init_engine_mask(struct 
intel_gt *gt)
info->engine_mask &= ~BIT(GSC0);
}
 
-   /*
-* Do not create the command streamer for CCS slices beyond the first.
-* All the workload submitted to the first engine will be shared among
-* all the slices.
-*
-* Once the user will be allowed to customize the CCS mode, then this
-* check needs to be removed.
-*/
-   if (IS_DG2(gt->i915)) {
-   u8 first_ccs = __ffs(CCS_MASK(gt));
-
-   /*
-* Store the number of active cslices before
-* changing the CCS engine configuration
-*/
-   gt->ccs.cslices = CCS_MASK(gt);
-
-   /* Mask off all the CCS engine */
-   info->engine_mask &= ~GENMASK(CCS3, CCS0);
-   /* Put back in the first CCS engine */
-   info->engine_mask |= BIT(_CCS(first_ccs));
-   }
-
return info->engine_mask;
 }
 
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c 
b/drivers/gpu/drm/i915/gt/intel_engine_user.c
index cd7662b1ad59..8e5284af8335 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
@@ -246,6 +246,20 @@ void intel_engines_driver_register(struct drm_i915_private 
*i915)
 
GEM_BUG_ON(uabi_class >=
   ARRAY_SIZE(i915->engine_uabi_class_count));
+
+   /* Fix up the mapping to match default execbuf::user_map[] */
+   add_legacy_ring(&ring, engine);
+
+   /*
+* Do not create the command streamer for CCS slices beyond the
+* first. All the workload submitted to the first engine will be
+* shared among all the slices.
+*/
+   if (IS_DG2(i915) &&
+   uabi_class == I915_ENGINE_CLASS_COMPUTE &&
+   engine->uabi_instance)
+   goto clear_node_continue;
+
i915->engine_uabi_class_count[uabi_class]++;
 
rb_link_node(&engine->uabi_node, prev, p);
@@ -255,9 +269,6 @@ void intel_engines_driver_register(struct drm_i915_private 
*i915)
engine->uabi_class,
engine->uabi_instance) != 
engine);
 
-   /* Fix up the mapping to match default execbuf::user_map[] */
-   add_legacy_ring(&ring, engine);
-
prev = &engine->uabi_node;
p = &prev->rb_right;
 
-- 
2.45.2

[PATCH v1 06/14] drm/i915/gt: Introduce for_each_enabled_engine() and apply it in selftests

2024-08-21 Thread Andi Shyti

Selftests should run only on enabled engines, as disabled engines
are not intended for use. A practical example is when, on DG2
machines, the user chooses to utilize only one CCS stream instead
of all four.

To address this, introduce the for_each_enabled_engine() loop,
which skips engines whose uabi_node is marked as empty
(RB_EMPTY_NODE()).

Signed-off-by: Andi Shyti 
---
 drivers/gpu/drm/i915/gt/intel_gt.h| 12 +
 drivers/gpu/drm/i915/gt/selftest_context.c|  6 +--
 drivers/gpu/drm/i915/gt/selftest_engine_cs.c  |  4 +-
 .../drm/i915/gt/selftest_engine_heartbeat.c   |  6 +--
 drivers/gpu/drm/i915/gt/selftest_engine_pm.c  |  6 +--
 drivers/gpu/drm/i915/gt/selftest_execlists.c  | 52 +--
 drivers/gpu/drm/i915/gt/selftest_gt_pm.c  |  2 +-
 drivers/gpu/drm/i915/gt/selftest_hangcheck.c  | 22 
 drivers/gpu/drm/i915/gt/selftest_lrc.c| 18 +++
 drivers/gpu/drm/i915/gt/selftest_mocs.c   |  6 +--
 drivers/gpu/drm/i915/gt/selftest_rc6.c|  4 +-
 drivers/gpu/drm/i915/gt/selftest_reset.c  |  8 +--
 .../drm/i915/gt/selftest_ring_submission.c|  2 +-
 drivers/gpu/drm/i915/gt/selftest_rps.c| 14 ++---
 drivers/gpu/drm/i915/gt/selftest_timeline.c   | 14 ++---
 drivers/gpu/drm/i915/gt/selftest_tlb.c|  2 +-
 .../gpu/drm/i915/gt/selftest_workarounds.c| 14 ++---
 17 files changed, 102 insertions(+), 90 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h 
b/drivers/gpu/drm/i915/gt/intel_gt.h
index 998ca029b73a..1c52db1b5e25 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt.h
@@ -188,6 +188,18 @@ int intel_gt_tiles_init(struct drm_i915_private *i915);
 (id__)++) \
for_each_if ((engine__) = (gt__)->engine[(id__)])
 
+/*
+ * Iterator over all initialized and enabled engines. Some engines, like CCS,
+ * may be "disabled" (i.e., not exposed to the user). Disabling is indicated
+ * by marking the rb_node as empty.
+ */
+#define for_each_enabled_engine(engine__, gt__, id__) \
+   for ((id__) = 0; \
+(id__) < I915_NUM_ENGINES; \
+(id__)++) \
+   for_each_if ( ((engine__) = (gt__)->engine[(id__)]) && \
+ (!RB_EMPTY_NODE(&(engine__)->uabi_node)) )
+
 /* Iterator over subset of engines selected by mask */
 #define for_each_engine_masked(engine__, gt__, mask__, tmp__) \
for ((tmp__) = (mask__) & (gt__)->info.engine_mask; \
diff --git a/drivers/gpu/drm/i915/gt/selftest_context.c 
b/drivers/gpu/drm/i915/gt/selftest_context.c
index 5eb46700dc4e..9976e231248d 100644
--- a/drivers/gpu/drm/i915/gt/selftest_context.c
+++ b/drivers/gpu/drm/i915/gt/selftest_context.c
@@ -157,7 +157,7 @@ static int live_context_size(void *arg)
 * HW tries to write past the end of one.
 */
 
-   for_each_engine(engine, gt, id) {
+   for_each_enabled_engine(engine, gt, id) {
struct file *saved;
 
if (!engine->context_size)
@@ -311,7 +311,7 @@ static int live_active_context(void *arg)
enum intel_engine_id id;
int err = 0;
 
-   for_each_engine(engine, gt, id) {
+   for_each_enabled_engine(engine, gt, id) {
err = __live_active_context(engine);
if (err)
break;
@@ -424,7 +424,7 @@ static int live_remote_context(void *arg)
enum intel_engine_id id;
int err = 0;
 
-   for_each_engine(engine, gt, id) {
+   for_each_enabled_engine(engine, gt, id) {
err = __live_remote_context(engine);
if (err)
break;
diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_cs.c 
b/drivers/gpu/drm/i915/gt/selftest_engine_cs.c
index 5ffa5e30f419..038723a401df 100644
--- a/drivers/gpu/drm/i915/gt/selftest_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/selftest_engine_cs.c
@@ -142,7 +142,7 @@ static int perf_mi_bb_start(void *arg)
return 0;
 
wakeref = perf_begin(gt);
-   for_each_engine(engine, gt, id) {
+   for_each_enabled_engine(engine, gt, id) {
struct intel_context *ce = engine->kernel_context;
struct i915_vma *batch;
u32 cycles[COUNT];
@@ -270,7 +270,7 @@ static int perf_mi_noop(void *arg)
return 0;
 
wakeref = perf_begin(gt);
-   for_each_engine(engine, gt, id) {
+   for_each_enabled_engine(engine, gt, id) {
struct intel_context *ce = engine->kernel_context;
struct i915_vma *base, *nop;
u32 cycles[COUNT];
diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c 
b/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
index 9e4f0e417b3b..74d4c2dc69cf 100644
--- a/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
+++ b/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
@@ -160,7 +160,7 @@ static int live_idle_flush(void *arg)
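
The filtering idiom behind for_each_enabled_engine() can be reproduced
outside the kernel. In the sketch below a plain 'enabled' flag replaces the
RB_EMPTY_NODE() test, and the struct names, NUM_ENGINES_SKETCH and
count_enabled() are all invented for illustration; only the for_each_if()
idiom is taken from the kernel.

```c
#include <stdbool.h>
#include <stddef.h>

#define NUM_ENGINES_SKETCH 4

struct engine_sketch {
	bool enabled; /* stands in for !RB_EMPTY_NODE(&engine->uabi_node) */
};

struct gt_sketch {
	struct engine_sketch *engine[NUM_ENGINES_SKETCH];
};

/* Same dangling-else-safe idiom as the kernel's for_each_if(). */
#define for_each_if(cond) if (!(cond)) {} else

/* Visit only the slots that are populated and marked as enabled. */
#define for_each_enabled_engine(e__, gt__, id__) \
	for ((id__) = 0; (id__) < NUM_ENGINES_SKETCH; (id__)++) \
		for_each_if(((e__) = (gt__)->engine[(id__)]) && \
			    (e__)->enabled)

static int count_enabled(struct gt_sketch *gt)
{
	struct engine_sketch *e;
	int id, n = 0;

	for_each_enabled_engine(e, gt, id)
		n++;

	return n;
}
```

Empty slots and disabled engines are both skipped, so a selftest body
written under this loop never sees an engine that is hidden from userspace.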

[PATCH v1 05/14] drm/i915/gem: Mark and verify UABI engine validity

2024-08-21 Thread Andi Shyti

Mark engines as invalid when they are not added to the UABI list
to prevent accidental assignment of batch buffers.

Currently, this change is mostly precautionary with minimal
impact. However, once CCS engines can be dynamically added and
removed by the user, this mechanism will be used to determine
engine validity.

Signed-off-by: Andi Shyti 
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 28 +--
 drivers/gpu/drm/i915/gt/intel_engine_user.c   |  9 --
 2 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index c58290274f97..770875e72056 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -2682,6 +2682,22 @@ eb_select_legacy_ring(struct i915_execbuffer *eb)
return user_ring_map[user_ring_id];
 }
 
+static bool engine_valid(struct intel_context *ce)
+{
+   if (!intel_engine_is_virtual(ce->engine))
+   return !RB_EMPTY_NODE(&ce->engine->uabi_node);
+
+   /*
+* TODO: check virtual siblings; we need to walk through all the
+* virtual engines and ask whether the physical engine where it is based
+* is still valid. For each of them we need to check with
+* RB_EMPTY_NODE(...)
+*
+* This can be placed in a new ce_ops.
+*/
+   return true;
+}
+
 static int
 eb_select_engine(struct i915_execbuffer *eb)
 {
@@ -2712,8 +2728,6 @@ eb_select_engine(struct i915_execbuffer *eb)
eb->num_batches = ce->parallel.number_children + 1;
gt = ce->engine->gt;
 
-   for_each_child(ce, child)
-   intel_context_get(child);
eb->wakeref = intel_gt_pm_get(ce->engine->gt);
/*
 * Keep GT0 active on MTL so that i915_vma_parked() doesn't
@@ -2722,6 +2736,16 @@ eb_select_engine(struct i915_execbuffer *eb)
if (gt->info.id)
eb->wakeref_gt0 = intel_gt_pm_get(to_gt(gt->i915));
 
+   /* We need to hold the wakeref to stabilize i915->uabi_engines */
+   if (!engine_valid(ce)) {
+   intel_context_put(ce);
+   err = -ENODEV;
+   goto err;
+   }
+
+   for_each_child(ce, child)
+   intel_context_get(child);
+
if (!test_bit(CONTEXT_ALLOC_BIT, &ce->flags)) {
err = intel_context_alloc_state(ce);
if (err)
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c 
b/drivers/gpu/drm/i915/gt/intel_engine_user.c
index 11cc06c0c785..cd7662b1ad59 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
@@ -220,7 +220,7 @@ void intel_engines_driver_register(struct drm_i915_private 
*i915)
container_of(it, typeof(*engine), uabi_list);
 
if (intel_gt_has_unrecoverable_error(engine->gt))
-   continue; /* ignore incomplete engines */
+   goto clear_node_continue; /* ignore incomplete engines 
*/
 
GEM_BUG_ON(engine->class >= ARRAY_SIZE(uabi_classes));
engine->uabi_class = uabi_classes[engine->class];
@@ -242,7 +242,7 @@ void intel_engines_driver_register(struct drm_i915_private 
*i915)
  engine->uabi_instance);
 
if (uabi_class > I915_LAST_UABI_ENGINE_CLASS)
-   continue;
+   goto clear_node_continue;
 
GEM_BUG_ON(uabi_class >=
   ARRAY_SIZE(i915->engine_uabi_class_count));
@@ -260,6 +260,11 @@ void intel_engines_driver_register(struct drm_i915_private 
*i915)
 
prev = &engine->uabi_node;
p = &prev->rb_right;
+
+   continue;
+
+clear_node_continue:
+   RB_CLEAR_NODE(&engine->uabi_node);
}
 
if (IS_ENABLED(CONFIG_DRM_I915_SELFTESTS) &&
-- 
2.45.2

[PATCH v1 04/14] drm/i915/gt: Refactor uabi engine class/instance list creation

2024-08-21 Thread Andi Shyti

For the upcoming changes we need a cleaner way to build the list
of uabi engines.

Suggested-by: Tvrtko Ursulin 
Signed-off-by: Andi Shyti 
---
 drivers/gpu/drm/i915/gt/intel_engine_user.c | 29 -
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c 
b/drivers/gpu/drm/i915/gt/intel_engine_user.c
index 833987015b8b..11cc06c0c785 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
@@ -203,7 +203,7 @@ static void engine_rename(struct intel_engine_cs *engine, 
const char *name, u16
 
 void intel_engines_driver_register(struct drm_i915_private *i915)
 {
-   u16 name_instance, other_instance = 0;
+   u16 class_instance[I915_LAST_UABI_ENGINE_CLASS + 2] = { };
struct legacy_ring ring = {};
struct list_head *it, *next;
struct rb_node **p, *prev;
@@ -214,6 +214,8 @@ void intel_engines_driver_register(struct drm_i915_private 
*i915)
prev = NULL;
p = &i915->uabi_engines.rb_node;
list_for_each_safe(it, next, &engines) {
+   u16 uabi_class;
+
struct intel_engine_cs *engine =
container_of(it, typeof(*engine), uabi_list);
 
@@ -222,15 +224,14 @@ void intel_engines_driver_register(struct 
drm_i915_private *i915)
 
GEM_BUG_ON(engine->class >= ARRAY_SIZE(uabi_classes));
engine->uabi_class = uabi_classes[engine->class];
-   if (engine->uabi_class == I915_NO_UABI_CLASS) {
-   name_instance = other_instance++;
-   } else {
-   GEM_BUG_ON(engine->uabi_class >=
-  ARRAY_SIZE(i915->engine_uabi_class_count));
-   name_instance =
-   
i915->engine_uabi_class_count[engine->uabi_class]++;
-   }
-   engine->uabi_instance = name_instance;
+
+   if (engine->uabi_class == I915_NO_UABI_CLASS)
+   uabi_class = I915_LAST_UABI_ENGINE_CLASS + 1;
+   else
+   uabi_class = engine->uabi_class;
+
+   GEM_BUG_ON(uabi_class >= ARRAY_SIZE(class_instance));
+   engine->uabi_instance = class_instance[uabi_class]++;
 
/*
 * Replace the internal name with the final user and log facing
@@ -238,11 +239,15 @@ void intel_engines_driver_register(struct 
drm_i915_private *i915)
 */
engine_rename(engine,
  intel_engine_class_repr(engine->class),
- name_instance);
+ engine->uabi_instance);
 
-   if (engine->uabi_class == I915_NO_UABI_CLASS)
+   if (uabi_class > I915_LAST_UABI_ENGINE_CLASS)
continue;
 
+   GEM_BUG_ON(uabi_class >=
+  ARRAY_SIZE(i915->engine_uabi_class_count));
+   i915->engine_uabi_class_count[uabi_class]++;
+
rb_link_node(&engine->uabi_node, prev, p);
rb_insert_color(&engine->uabi_node, &i915->uabi_engines);
 
-- 
2.45.2
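
The per-class counter array introduced above, with one extra slot for
engines that have no UABI class, can be sketched as follows. The constant
values, the sentinel and the struct are assumptions chosen for this example
and are not taken from the driver headers.

```c
#include <stdint.h>

#define LAST_UABI_CLASS_SKETCH 4      /* assumed last valid class index */
#define NO_UABI_CLASS_SKETCH   0xffff /* assumed "no class" sentinel */

struct engine_sketch {
	uint16_t uabi_class;
	uint16_t uabi_instance;
};

/*
 * Number engines per class in list order. Engines without a UABI class
 * share the extra slot at index LAST_UABI_CLASS_SKETCH + 1.
 * Returns 0 on success, -1 if a class index is out of range.
 */
static int assign_instances(struct engine_sketch *engines, int count)
{
	uint16_t class_instance[LAST_UABI_CLASS_SKETCH + 2] = { 0 };
	int i;

	for (i = 0; i < count; i++) {
		uint16_t slot = engines[i].uabi_class;

		if (slot == NO_UABI_CLASS_SKETCH)
			slot = LAST_UABI_CLASS_SKETCH + 1;
		else if (slot > LAST_UABI_CLASS_SKETCH)
			return -1;

		engines[i].uabi_instance = class_instance[slot]++;
	}

	return 0;
}
```

Keeping the naming counter separate from engine_uabi_class_count[] is what
lets a later patch skip an engine in the UABI list without disturbing the
instance numbers of the others.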

[PATCH v1 03/14] drm/i915/gt: Allow the creation of multi-mode CCS masks

2024-08-21 Thread Andi Shyti

Until now, we have only set CCS mode balancing to 1, which means
that only one compute engine is exposed to the user. The stream
of compute commands submitted to that engine is then shared among
all the dedicated execution units.

This is done by calling the 'intel_gt_apply_ccs_mode()' function.

With this change, the aforementioned function takes an additional
parameter called 'mode' that specifies the desired mode to be set
for the CCS engines balancing. The mode parameter can have the
following values:

 - mode = 0: CCS load balancing mode 1 (1 CCS engine exposed)
 - mode = 1: CCS load balancing mode 2 (2 CCS engines exposed)
 - mode = 3: CCS load balancing mode 4 (4 CCS engines exposed)

This allows us to generate the appropriate register value to be
written to CCS_MODE, configuring how the exposed engine streams
will be submitted to the execution units.

No functional changes are intended yet, as no mode higher than
'0' is currently being set.
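
As a rough illustration of the register value described above, here is a
standalone userspace sketch. It is not the driver code: ccs_mode_value(),
MAX_CCS, CSLICE_FIELD_WIDTH and CSLICE_UNAVAILABLE are hypothetical names,
the 3-bit field per slice is an assumption taken from the 0x7 "unavailable"
marker, and engines are numbered 0..n-1 in round-robin order rather than by
their position in the engine mask.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CCS            4	/* slices covered by the register (assumed) */
#define CSLICE_FIELD_WIDTH 3	/* bits per slice field (assumed) */
#define CSLICE_UNAVAILABLE 0x7u	/* marker for a slice that is not available */

/*
 * Build a CCS_MODE-style value: every available slice is handed to the
 * next exposed engine in round-robin order, every unavailable slice is
 * marked with 0x7.
 */
uint32_t ccs_mode_value(uint32_t cslices_mask, unsigned int num_engines)
{
	uint32_t val = 0;
	unsigned int next = 0;
	unsigned int cslice;

	assert(num_engines > 0);	/* at least one engine is exposed */

	for (cslice = 0; cslice < MAX_CCS; cslice++) {
		uint32_t field = CSLICE_UNAVAILABLE;

		if (cslices_mask & (1u << cslice)) {
			field = next;
			next = (next + 1) % num_engines;
		}

		val |= field << (cslice * CSLICE_FIELD_WIDTH);
	}

	return val;
}
```

With all four slices available, one exposed engine gives 0x0, two give
0x208 and four give 0x688 under these assumptions.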

Signed-off-by: Andi Shyti 
---
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 80 +
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h |  2 +-
 2 files changed, 67 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c 
b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
index fcd07eb4728b..5ca36985bdd7 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
@@ -4,35 +4,87 @@
  */
 
 #include "i915_drv.h"
-#include "intel_gt.h"
 #include "intel_gt_ccs_mode.h"
 #include "intel_gt_regs.h"
 
 static void intel_gt_apply_ccs_mode(struct intel_gt *gt)
 {
+   unsigned long cslices_mask = gt->ccs.cslices;
+   u32 mode_val = 0;
+   /* CCS mode, i.e. number of CCS engines to be enabled */
+   u32 width = 1;
+   /* CCS engine id, i.e. the engines position in the engine's bitmask */
+   int engine;
int cslice;
-   u32 mode = 0;
-   int first_ccs = __ffs(CCS_MASK(gt));
 
-   /* Build the value for the fixed CCS load balancing */
+   /*
+* The mode has two bit dedicated for each engine
+* that will be used for the CCS balancing algorithm:
+*
+*BIT | CCS slice
+*   --
+* 0  | CCS slice
+* 1  | 0
+*   --
+* 2  | CCS slice
+* 3  | 1
+*   --
+* 4  | CCS slice
+* 5  | 2
+*   --
+* 6  | CCS slice
+* 7  | 3
+*   --
+*
+* When a CCS slice is not available, then we will write 0x7,
+* otherwise we will write the user engine id whose load will
+* be forwarded to that slice.
+*
+* The possible configurations are:
+*
+* 1 engine (ccs0):
+*   slice 0, 1, 2, 3: ccs0
+*
+* 2 engines (ccs0, ccs1):
+*   slice 0, 2: ccs0
+*   slice 1, 3: ccs1
+*
+* 4 engines (ccs0, ccs1, ccs2, ccs3):
+*   slice 0: ccs0
+*   slice 1: ccs1
+*   slice 2: ccs2
+*   slice 3: ccs3
+*/
+   engine = __ffs(cslices_mask);
+
for (cslice = 0; cslice < I915_MAX_CCS; cslice++) {
-   if (gt->ccs.cslices & BIT(cslice))
+   if (!(cslices_mask & BIT(cslice))) {
/*
-* If available, assign the cslice
-* to the first available engine...
+* If not available, mark the slice as unavailable
+* and no task will be dispatched here.
 */
-   mode |= XEHP_CCS_MODE_CSLICE(cslice, first_ccs);
+   mode_val |= XEHP_CCS_MODE_CSLICE(cslice,
+XEHP_CCS_MODE_CSLICE_MASK);
+   continue;
+   }
+
+   mode_val |= XEHP_CCS_MODE_CSLICE(cslice, engine);
 
-   else
+   if (!width) {
/*
-* ... otherwise, mark the cslice as
-* unavailable if no CCS dispatches here
+* CCS mode, will be used later to
+* reset to a flexible value
 */
-   mode |= XEHP_CCS_MODE_CSLICE(cslice,
-XEHP_CCS_MODE_CSLICE_MASK);
+   width = 1;
+   engine = __ffs(cslices_mask);
+   continue;
+   }
+
+   width--;
+   engine = find_next_bit(&cslices_mask, I915_MAX_CCS, engine + 1);
}
 
-   gt->ccs.mode_reg_val = mode;
+   gt->ccs.mode_reg_val = mode_val;
 }
 
 void intel_gt_ccs_mode_

[PATCH v1 02/14] drm/i915/gt: Move the CCS mode variable to a global position

2024-08-21 Thread Andi Shyti

Store the CCS mode value in the intel_gt->ccs structure to make
it available for future instances that may need to change its
value.

Name it mode_reg_val because it holds the value that will
be written into the CCS_MODE register, determining the CCS
balancing and, consequently, the number of engines generated.

No functional changes intended.

Signed-off-by: Andi Shyti 
---
 drivers/gpu/drm/i915/gt/intel_gt.c  |  3 +++
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 16 +++-
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h |  2 +-
 drivers/gpu/drm/i915/gt/intel_gt_types.h| 11 +++
 drivers/gpu/drm/i915/gt/intel_workarounds.c |  6 --
 5 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c 
b/drivers/gpu/drm/i915/gt/intel_gt.c
index a6c69a706fd7..5af0527d822d 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -18,6 +18,7 @@
 #include "intel_ggtt_gmch.h"
 #include "intel_gt.h"
 #include "intel_gt_buffer_pool.h"
+#include "intel_gt_ccs_mode.h"
 #include "intel_gt_clock_utils.h"
 #include "intel_gt_debugfs.h"
 #include "intel_gt_mcr.h"
@@ -136,6 +137,8 @@ int intel_gt_init_mmio(struct intel_gt *gt)
intel_sseu_info_init(gt);
intel_gt_mcr_init(gt);
 
+   intel_gt_ccs_mode_init(gt);
+
return intel_engines_init_mmio(gt);
 }
 
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c 
b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
index 3c62a44e9106..fcd07eb4728b 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
@@ -8,15 +8,12 @@
 #include "intel_gt_ccs_mode.h"
 #include "intel_gt_regs.h"
 
-unsigned int intel_gt_apply_ccs_mode(struct intel_gt *gt)
+static void intel_gt_apply_ccs_mode(struct intel_gt *gt)
 {
int cslice;
u32 mode = 0;
int first_ccs = __ffs(CCS_MASK(gt));
 
-   if (!IS_DG2(gt->i915))
-   return 0;
-
/* Build the value for the fixed CCS load balancing */
for (cslice = 0; cslice < I915_MAX_CCS; cslice++) {
if (gt->ccs.cslices & BIT(cslice))
@@ -35,5 +32,14 @@ unsigned int intel_gt_apply_ccs_mode(struct intel_gt *gt)
 XEHP_CCS_MODE_CSLICE_MASK);
}
 
-   return mode;
+   gt->ccs.mode_reg_val = mode;
+}
+
+void intel_gt_ccs_mode_init(struct intel_gt *gt)
+{
+   if (!IS_DG2(gt->i915))
+   return;
+
+   /* Initialize the CCS mode setting */
+   intel_gt_apply_ccs_mode(gt);
 }
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h 
b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h
index 55547f2ff426..0f2506586a41 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h
@@ -8,6 +8,6 @@
 
 struct intel_gt;
 
-unsigned int intel_gt_apply_ccs_mode(struct intel_gt *gt);
+void intel_gt_ccs_mode_init(struct intel_gt *gt);
 
 #endif /* __INTEL_GT_CCS_MODE_H__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h 
b/drivers/gpu/drm/i915/gt/intel_gt_types.h
index bcee084b1f27..9e257f34d05b 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
@@ -207,12 +207,23 @@ struct intel_gt {
[MAX_ENGINE_INSTANCE + 1];
enum intel_submission_method submission_method;
 
+   /*
+* Track fixed mapping between CCS engines and compute slices.
+*
+* In order to w/a HW that has the inability to dynamically load
+* balance between CCS engines and EU in the compute slices, we have to
+* reconfigure a static mapping on the fly.
+*
+* The mode variable is set by the user and sets the balancing mode,
+* i.e. how the CCS streams are distributed amongst the slices.
+*/
struct {
/*
 * Mask of the non fused CCS slices
 * to be used for the load balancing
 */
intel_engine_mask_t cslices;
+   u32 mode_reg_val;
} ccs;
 
/*
diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c 
b/drivers/gpu/drm/i915/gt/intel_workarounds.c
index f3082fad3f45..f6135be3cd86 100644
--- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
@@ -2727,7 +2727,7 @@ add_render_compute_tuning_settings(struct intel_gt *gt,
 static void ccs_engine_wa_mode(struct intel_engine_cs *engine, struct 
i915_wa_list *wal)
 {
struct intel_gt *gt = engine->gt;
-   u32 mode;
+   u32 mode = gt->ccs.mode_reg_val;
 
if (!IS_DG2(gt->i915))
return;
@@ -2743,8 +2743,10 @@ static void ccs_engine_wa_mode(struct intel_engine_cs 
*engine, struct i915_wa_li
/*
 * After hav

[PATCH v1 01/14] drm/i915/gt: Avoid using masked workaround for CCS_MODE setting

2024-08-21 Thread Andi Shyti

When setting the CCS mode, we mistakenly used wa_masked_en() to
apply the workaround, which reads from the register and masks the
existing value with the new one.

Our intention was to write the value directly, without masking
it.

So far, this hasn't caused issues because we've been using a
register value that only enables a single CCS engine, typically
with an ID of '0'.

However, in upcoming patches, we will be utilizing multiple
engines, and it's crucial that we write the new value directly
without any masking.
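
A toy model of the two write styles may help to see why this matters. It
assumes the usual i915 convention for masked registers, where the upper 16
bits of the written word select which of the lower 16 bits get updated. The
helper names are hypothetical, this is not the wa_add() logic, and it does
not claim to describe how the XEHP_CCS_MODE hardware decodes a write.

```c
#include <assert.h>
#include <stdint.h>

/* Word built by a masked "enable": payload low, same bits as mask high. */
uint32_t masked_bit_enable(uint32_t value)
{
	assert((value >> 16) == 0);	/* only 16 payload bits fit */
	return (value << 16) | value;
}

/* Masked register model: only the bits selected by the mask change. */
uint32_t masked_reg_write(uint32_t old, uint32_t word)
{
	uint32_t mask = word >> 16;
	uint32_t data = word & 0xffffu;

	return (old & ~mask) | (data & mask);
}

/* Plain register model: the written word replaces the old content. */
uint32_t plain_reg_write(uint32_t old, uint32_t word)
{
	(void)old;
	return word;
}
```

In this model a masked enable can only set bits: going from 0x688 to 0x208
leaves 0x688 in place, while the plain write stores 0x208 as intended.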

Signed-off-by: Andi Shyti 
---
 drivers/gpu/drm/i915/gt/intel_workarounds.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c 
b/drivers/gpu/drm/i915/gt/intel_workarounds.c
index bfe6d8fc820f..f3082fad3f45 100644
--- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
@@ -2745,7 +2745,7 @@ static void ccs_engine_wa_mode(struct intel_engine_cs 
*engine, struct i915_wa_li
 * assign all slices to a single CCS. We will call it CCS mode 1
 */
mode = intel_gt_apply_ccs_mode(gt);
-   wa_masked_en(wal, XEHP_CCS_MODE, mode);
+   wa_add(wal, XEHP_CCS_MODE, 0, mode, mode, false);
 }
 
 /*
-- 
2.45.2

[PATCH v1 00/14] CCS static load balance

2024-08-21 Thread Andi Shyti

Hi,

Time to promote this series from RFCv2 to PATCHv1 as I think it's
already in a decent working condition.

This patch series introduces static load balancing for GPUs with
multiple compute engines. It's a lengthy series, and some
challenging aspects still need to be resolved.

I have tried to split the work as much as possible to facilitate
the review process.

To summarize, in patches 1 to 13, no functional changes occur
except for the addition of the num_cslices interface. The
significant changes happen in patch 14, which is the core part of
the CCS mode setting, utilizing the groundwork laid in the
earlier patches.

In this updated approach, the focus is now on managing the UABI
engine list, which controls the engines exposed to userspace.
Instead of manipulating physical engines and their memory, we now
handle engine exposure through this list.

I would greatly appreciate further input from all reviewers who
have already assisted with the previous work.

IGT tests have also been developed, but I haven't sent them yet.

Thank you Chris for the offline reviews.

Changelog:
==
RFCv2 -> PATCHv1

 - Removed gt->ccs.mutex
 - Rename m -> width, ccs_id -> engine in
   intel_gt_apply_ccs_mode().
 - In the CCS register value calculation
   (intel_gt_apply_ccs_mode()) the engine (ccs_id) needs to move
   along the ccs_mask (set by the user) instead of the
   cslice_mask.
 - Add GEM_BUG_ON after calculating the new ccs_mask
   (update_ccs_mask()) to make sure all engines have been
   evaluated (i.e. ccs_mask must be '0' at the end of the
   algorithm).
 - Move wakeref lock before evaluating intel_gt_pm_is_awake() and
   fix exit path accordingly.
 - Use a more compact form in intel_gt_sysfs_ccs_init() and
   add_uabi_ccs_engines() when evaluating sysfs_create_file(): no
   need to store the return value to the err variable which is
   unused. Get rid of err.
 - Print a warning instead of a debug message if we fail to
   create the sysfs files.
 - If engine files creation fails in
   intel_engine_add_single_sysfs(), print a warning, not an
   error.
 - Rename gt->ccs.ccs_mask to gt->ccs.id_mask and add a comment
   to explain its purpose.
 - During uabi engine creation, in
   intel_engines_driver_register(), the uabi_ccs_instance is
   redundant because the ccs_instances is already tracked in
   engine->uabi_instance.
 - Mark add_uabi_ccs_engines() and remove_uabi_ccs_engines() as
   __maybe_unused not to break bisectability. They wouldn't
   compile in their own commit. They will be used in the next
   patch and the __maybe_unused is removed.
 - Update engine's workaround every time a new mode is set in
   update_ccs_mask().
 - Mark engines as valid or invalid using their status as
   rb_node. Invalid engines are marked as invalid using
   RB_CLEAR_NODE(). Execbufs will check for their validity when
   selecting the engine to be combined to a context.
 - Create for_each_enabled_engine() which skips the non valid
   engines and use it in selftests.
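
To make the rb_node based validity marking more concrete, here is a small
userspace sketch. The toy_* names are hypothetical stand-ins; the two helpers
only imitate what RB_CLEAR_NODE() and RB_EMPTY_NODE() do in the kernel (a
cleared node points to itself), and the loop imitates the idea behind
for_each_enabled_engine() without being its implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Minimal stand-in for the part of struct rb_node that matters here. */
struct toy_rb_node {
	uintptr_t parent_color;
};

/* Like RB_CLEAR_NODE(): a cleared node points to itself. */
static void toy_rb_clear_node(struct toy_rb_node *node)
{
	node->parent_color = (uintptr_t)node;
}

/* Like RB_EMPTY_NODE(): true if the node was cleared. */
static bool toy_rb_empty_node(const struct toy_rb_node *node)
{
	return node->parent_color == (uintptr_t)node;
}

struct toy_engine {
	const char *name;
	struct toy_rb_node uabi_node;
};

/* Mark an engine as invalid, i.e. not exposed to the user. */
void disable_engine(struct toy_engine *engine)
{
	toy_rb_clear_node(&engine->uabi_node);
}

/* Walk all engines and skip the ones marked as invalid. */
int count_enabled_engines(struct toy_engine *engines, int count)
{
	int enabled = 0;
	int i;

	assert(count >= 0);

	for (i = 0; i < count; i++) {
		if (toy_rb_empty_node(&engines[i].uabi_node))
			continue;
		enabled++;
	}

	return enabled;
}
```

The point of the design is that no extra flag is needed: the node state
itself tells whether the engine is part of the exposed list.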

RFCv1 -> RFCv2
--
Compared to the first version I've taken a completely different
approach to adding and removing engines. In v1 physical engines
were directly added and removed, along with the memory allocated
to them, each time the user changed the CCS mode (from the
previous cover letter).

Andi Shyti (14):
  drm/i915/gt: Avoid using masked workaround for CCS_MODE setting
  drm/i915/gt: Move the CCS mode variable to a global position
  drm/i915/gt: Allow the creation of multi-mode CCS masks
  drm/i915/gt: Refactor uabi engine class/instance list creation
  drm/i915/gem: Mark and verify UABI engine validity
  drm/i915/gt: Introduce for_each_enabled_engine() and apply it in
selftests
  drm/i915/gt: Manage CCS engine creation within UABI exposure
  drm/i915/gt: Remove cslices mask value from the CCS structure
  drm/i915/gt: Expose the number of total CCS slices
  drm/i915/gt: Store engine-related sysfs kobjects
  drm/i915/gt: Store active CCS mask
  drm/i915/gt: Isolate single sysfs engine file creation
  drm/i915/gt: Implement creation and removal routines for CCS engines
  drm/i915/gt: Allow the user to change the CCS mode through sysfs

 .../gpu/drm/i915/gem/i915_gem_execbuffer.c|  28 +-
 drivers/gpu/drm/i915/gt/intel_engine_cs.c |  23 --
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |   7 +
 drivers/gpu/drm/i915/gt/intel_engine_user.c   |  57 ++-
 drivers/gpu/drm/i915/gt/intel_gt.c|   3 +
 drivers/gpu/drm/i915/gt/intel_gt.h|  12 +
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c   | 324 +-
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h   |   5 +-
 drivers/gpu/drm/i915/gt/intel_gt_sysfs.c  |   2 +
 drivers/gpu/drm/i915/gt/intel_gt_types.h  |  19 +-
 drivers/gpu/drm/i915/gt/intel_workarounds.c   |   8 +-
 drivers/gpu/drm/i915/gt/selftest_context.c|   6 +-
 drivers/gpu/drm/i915/gt/selftest_engine_cs.

Re: [PATCH] drm/i915/gt: Continue creating engine sysfs files even after a failure

2024-08-21 Thread Andi Shyti

Hi again, Rodrigo,

...

> > Also it looks something is off with the goto paths...
> > 
> > That if (0) is also ugly... probably better to use a
> > kobject_put with continue on every failing point as well...
> 
> ehehe... I came to like it, to be honest. Besides I like single
> exit paths instead of distributed returns. In this particular
> > case we would replicate the same "kobject_put() ... dev_warn()" in
> several places, so that I'm not sure it's better.
> 
> If you like more we could do:
> 
>   for (...) {
>   ...
>   ...
>   /* everything goes fine */
>   continue
> 
> err_engine:
>   kobject_put(...);
>   dev_warn(...);
>   }
> 
> And we avoid using the "if (0)" that you don't like.

BTW, the purpose of the patch is to remove the break and, as I
was at it, I changed dev_err to dev_warn.

Refactoring the "if (0)" is a bit out of scope. Right?

Thanks,
Andi

Re: [PATCH] drm/i915/gt: Continue creating engine sysfs files even after a failure

2024-08-21 Thread Andi Shyti

Hi Rodrigo,

On Tue, Aug 20, 2024 at 05:22:40PM -0400, Rodrigo Vivi wrote:
> On Mon, Aug 19, 2024 at 01:31:40PM +0200, Andi Shyti wrote:
> > The i915 driver generates sysfs entries for each engine of the
> > GPU in /sys/class/drm/cardX/engines/.
> > 
> > The process is straightforward: we loop over the UABI engines and
> > for each one, we:
> > 
> >  - Create the object.
> >  - Create basic files.
> >  - If the engine supports timeslicing, create timeslice duration files.
> >  - If the engine supports preemption, create preemption-related files.
> >  - Create default value files.
> > 
> > Currently, if any of these steps fail, the process stops, and no
> > further sysfs files are created.
> > 
> > However, it's not necessary to stop the process on failure.
> > Instead, we can continue creating the remaining sysfs files for
> > the other engines. Even if some files fail to be created, the
> > list of engines can still be retrieved by querying i915.
> > 
> > Signed-off-by: Andi Shyti 
> > ---
> > Hi,
> > 
> > It might make sense to create an "inv-" if something
> > goes wrong, so that the user is aware that the engine exists, but
> > the sysfs file is not present.
> 
> well, if the sysfs dir/files creation is failing, then it will
> probably be unreliable anyway right?

Are you suggesting that "inv-" is OK?

> > One further improvement would be to provide more information
> > about the failure reason in the dev_warn() message.
> 
> So, perhaps this patch should already go there and remove
> the dev_err and add individual dev_warn for each failing path?

That's a suggestion, but it doesn't mean that it necessarily
improves things as it might add some unnecessary information.
Just thinking.

> Also it looks something is off with the goto paths...
> 
> That if (0) is also ugly... probably better to use a
> kobject_put with continue on every failing point as well...

ehehe... I came to like it, to be honest. Besides I like single
exit paths instead of distributed returns. In this particular
case we would replicate the same "kobject_put() ... dev_warn()" in
several places, so that I'm not sure it's better.

If you like more we could do:

	for (...) {
		...
		...
		/* everything goes fine */
		continue;

err_engine:
		kobject_put(...);
		dev_warn(...);
	}

And we avoid using the "if (0)" that you don't like.
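
To show that the shape above works as plain C, here is a userspace sketch
with stand-in types. struct fake_engine, create_step() and add_all_engines()
are hypothetical names, and fprintf() takes the place of dev_warn(); the
kobject handling is left out.

```c
#include <assert.h>
#include <stdio.h>

struct fake_engine {
	const char *name;
	int failing_step;	/* 0: nothing fails; N: step N fails */
};

/* Stand-in for one sysfs creation step; returns 0 on success. */
static int create_step(const struct fake_engine *engine, int step)
{
	return engine->failing_step == step ? -1 : 0;
}

/* Returns how many engines were fully set up. */
int add_all_engines(const struct fake_engine *engines, int count)
{
	int created = 0;
	int i;

	assert(engines != NULL || count == 0);

	for (i = 0; i < count; i++) {
		const struct fake_engine *engine = &engines[i];

		if (create_step(engine, 1))	/* e.g. create the object */
			goto err_engine;
		if (create_step(engine, 2))	/* e.g. create basic files */
			goto err_engine;

		/* everything goes fine */
		created++;
		continue;

err_engine:
		/* single error exit; the loop moves on to the next engine */
		fprintf(stderr, "Failed to add sysfs engine '%s'\n",
			engine->name);
	}

	return created;
}
```

A failure on one engine prints a warning and the loop carries on with the
remaining engines, which is the behaviour the patch is after.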

Thanks,
Andi

Re: [PATCH v2] drm/i915/gt: Use kmemdup_array instead of kmemdup for multiple allocation

2024-08-20 Thread Andi Shyti

Hi Lucas,

On Tue, Aug 20, 2024 at 07:53:10AM -0500, Lucas De Marchi wrote:
> On Tue, Aug 20, 2024 at 05:53:02PM GMT, Yu Jiaoliang wrote:
> > Let the kmemdup_array() take care about multiplication and possible
> > overflows.
> > 
> > v2:
> > - Change subject
> > - Leave one blank line between the commit log and the tag section
> > - Fix code alignment issue
> > 
> > Signed-off-by: Yu Jiaoliang 
> > Reviewed-by: Jani Nikula 
> > Reviewed-by: Andi Shyti 
> > ---
> > drivers/gpu/drm/i915/gt/intel_workarounds.c | 5 ++---
> > 1 file changed, 2 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c 
> > b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> > index d90348c56765..0fcfd55c62b4 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> > @@ -111,9 +111,8 @@ static void wa_init_finish(struct i915_wa_list *wal)
> > {
> > /* Trim unused entries. */
> > if (!IS_ALIGNED(wal->count, WA_LIST_CHUNK)) {
> > -   struct i915_wa *list = kmemdup_array(wal->list,
> 
>   ^
> 
> it was already kmemdup_array, not kmemdup. Am I missing anything?

I see kmemdup() in drm-tip.

What Yu has done here is to change kmemdup to kmemdup_array and
send the patch. He then received the reviews, made a new commit on
top of the previous one, and sent only this second commit.

Yu needs to make sure that:

 1. the patch applies correctly on a clean drm-tip
 2. drm-tip + patch compiles
 3. there are no new checkpatch or sparse errors

We missed point 1 here :-)

Andi

Re: [PATCH v2] drm/i915/gt: Use kmemdup_array instead of kmemdup for multiple allocation

2024-08-20 Thread Andi Shyti

Hi,

On Tue, Aug 20, 2024 at 05:53:02PM +0800, Yu Jiaoliang wrote:
> Let the kmemdup_array() take care about multiplication and possible
> overflows.
> 
> v2:
> - Change subject
> - Leave one blank line between the commit log and the tag section
> - Fix code alignment issue
> 
> Signed-off-by: Yu Jiaoliang 
> Reviewed-by: Jani Nikula 
> Reviewed-by: Andi Shyti 

I didn't give you an explicit R-b, but that's fine, you can keep
it as I think the patch is fine.

> - struct i915_wa *list = kmemdup_array(wal->list,
> -wal->count, sizeof(*list),
> -GFP_KERNEL);
> + struct i915_wa *list = kmemdup_array(wal->list, wal->count,
> + 
>  sizeof(*list), GFP_KERNEL);

Do you see the indentation is off here? :-)

Please, run checkpatch.pl before sending the patch, as well.

Besides, what patch is this? Are you replacing kmemdup_array with
kmemdup_array? This v2 applies on your v1 while it should apply
on a clean drm-tip repository.

Thanks,
Andi

>  
>   if (list) {
>   kfree(wal->list);
> -- 
> 2.34.1

Re: [PATCH v1] drivers:gt:Switch to use kmemdup_array()

2024-08-20 Thread Andi Shyti

Hi Yu,

Please, next time check with "git log drivers/gpu/drm/i915/gt" to
better understand the patch formatting.

The title should be something like:

   drm/i915/gt: Switch to use kmemdup_array()

But this sounds more grammatically correct:

   drm/i915/gt: Use kmemdup_array instead of kmemdup for multiple allocation

On Tue, Aug 20, 2024 at 03:45:03PM +0800, Yu Jiaoliang wrote:
> Let the kmemdup_array() take care about multiplication and possible
> overflows.

Leave one blank line between the commit log and the tag section

> Signed-off-by: Yu Jiaoliang 
> ---
>  drivers/gpu/drm/i915/gt/intel_workarounds.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c 
> b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> index 09a287c1aedd..d90348c56765 100644
> --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
> +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> @@ -111,8 +111,8 @@ static void wa_init_finish(struct i915_wa_list *wal)
>  {
>   /* Trim unused entries. */
>   if (!IS_ALIGNED(wal->count, WA_LIST_CHUNK)) {
> - struct i915_wa *list = kmemdup(wal->list,
> -wal->count * sizeof(*list),
> + struct i915_wa *list = kmemdup_array(wal->list,
> +wal->count, sizeof(*list),
>  GFP_KERNEL);

Here you are not aligning correctly. Everything should be aligned
to one character after the open parenthesis; for example:

struct i915_wa *list = kmemdup_array(wal->list, wal->count,
 sizeof(*list), GFP_KERNEL);

Patch is good, though looking forward to receiving v2.
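
For reference, the benefit of the array variant can be sketched in userspace
as follows. memdup_array() is a hypothetical stand-in that uses malloc()
instead of the kernel allocator; it only illustrates the overflow check on
count * element_size that the commit message refers to, not the kernel
implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Duplicate an array of 'count' elements of 'element_size' bytes each.
 * Returns NULL if the total size overflows or the allocation fails.
 */
void *memdup_array(const void *src, size_t count, size_t element_size)
{
	size_t bytes;
	void *copy;

	assert(src != NULL);

	/* Refuse sizes that do not fit in size_t instead of wrapping. */
	if (element_size != 0 && count > SIZE_MAX / element_size)
		return NULL;

	bytes = count * element_size;
	copy = malloc(bytes);
	if (copy)
		memcpy(copy, src, bytes);

	return copy;
}
```

With an open-coded "count * size" the multiplication could wrap and yield a
buffer that is too small; here the caller gets NULL instead.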

Thanks,
Andi

>  
>   if (list) {
> -- 
> 2.34.1

Re: [PATCH 0/2] Allow partial memory mapping for cpu memory

2024-08-19 Thread Andi Shyti

Hi Sima,

On Mon, Aug 19, 2024 at 04:17:01PM +0200, Daniel Vetter wrote:
> On Wed, Aug 14, 2024 at 02:08:49AM +, Matthew Brost wrote:
> > On Tue, Aug 13, 2024 at 07:08:02PM +, Matthew Brost wrote:
> > > On Tue, Aug 13, 2024 at 04:09:55PM +0200, Daniel Vetter wrote:
> > > > On Tue, Aug 13, 2024 at 02:54:31AM +, Matthew Brost wrote:
> > > > > On Mon, Aug 12, 2024 at 04:45:32PM +0200, Daniel Vetter wrote:
> > > > > > On Mon, Aug 12, 2024 at 01:51:30PM +0200, Andi Shyti wrote:
> > > > > > > On Mon, Aug 12, 2024 at 11:11:21AM +0200, Daniel Vetter wrote:
> > > > > > > > On Fri, Aug 09, 2024 at 11:20:56AM +0100, Andi Shyti wrote:
> > > > > > > > > On Fri, Aug 09, 2024 at 10:53:38AM +0200, Daniel Vetter wrote:
> > > > > > > > > > On Wed, Aug 07, 2024 at 11:05:19AM +0100, Andi Shyti wrote:
> > > > > > > > > > > This patch series concludes on the memory mapping fixes 
> > > > > > > > > > > and
> > > > > > > > > > > improvements by allowing partial memory mapping for the 
> > > > > > > > > > > cpu
> > > > > > > > > > > memory as well.
> > > > > > > > > > > 
> > > > > > > > > > > The partial memory mapping by adding an object offset was
> > > > > > > > > > > > implicitly included in commit 8bdd9ef7e9b1 
> > > > > > > > > > > ("drm/i915/gem: Fix
> > > > > > > > > > > Virtual Memory mapping boundaries calculation") for the 
> > > > > > > > > > > gtt
> > > > > > > > > > > memory.
> > > > > > > > > > 
> > > > > > > > > > Does userspace actually care? Do we have a flag or 
> > > > > > > > > > something, so that
> > > > > > > > > > userspace can discover this?
> > > > > > > > > > 
> > > > > > > > > > Adding complexity of any kind is absolute no-go, unless 
> > > > > > > > > > there's a
> > > > > > > > > > userspace need. This also includes the gtt accidental fix.
> > > > > > > > > 
> > > > > > > > > Actually this missing functionality was initially filed as a 
> > > > > > > > > bug
> > > > > > > > > by mesa folks. So that this patch was requested by them 
> > > > > > > > > (Lionel
> > > > > > > > > is Cc'ed).
> > > > > > > > > 
> > > > > > > > > The tests cases that have been sent previously and I'm going 
> > > > > > > > > to
> > > > > > > > > send again, are directly taken from mesa use cases.
> > > > > > > > 
> > > > > > > > Please add the relevant mesa MR to this patch then, and some 
> > > > > > > > relevant
> > > > > > > > explanations for how userspace detects this all and decides to 
> > > > > > > > use it.
> > > > > > > 
> > > > > > > AFAIK, there is no Mesa MR. We are adding a feature that was
> > > > > > > > missing, but Mesa already supported it (indeed, Nirmoy suggested
> > > > > > > adding the Fixes tag for this).
> > > > > > > 
> > > > > > > Also because, Mesa was receiving an invalid address error and
> > > > > > > asked to support the partial mapping of the memory.
> > > > > > 
> > > > > > Uh this sounds a bit too much like just yolo'ing uabi. There's two 
> > > > > > cases:
> > > > > > 
> > > > > > - Either this is a regression, it worked previously, mesa is now 
> > > > > > angry.
> > > > > >   Then we absolutely need a Fixes: tag, and we also need that for 
> > > > > > the
> > > > > >   preceeding work to re-enable this for gtt mappings.
> > > > > > 
> > > > > > - Or mesa is just plain wrong here, which is what my guess is. 
> > > > > > Because bo
> > > > > >   mappings have always been full-object (except for the old-

Re: [PATCH v4] drm/i915/hwmon: expose fan speed

2024-08-19 Thread Andi Shyti

Hi Raag,

I'm sorry, I missed this mail.

On Mon, Aug 19, 2024 at 09:50:13AM +0300, Raag Jadav wrote:
> On Wed, Aug 14, 2024 at 02:07:44PM +0530, Nilawar, Badal wrote:
> > On 09-08-2024 15:46, Andi Shyti wrote:
> > > > > +static int
> > > > > +hwm_fan_read(struct hwm_drvdata *ddat, u32 attr, long *val)
> > > > > +{
> > > > > + struct i915_hwmon *hwmon = ddat->hwmon;
> > > > > + struct hwm_fan_info *fi = &ddat->fi;
> > > > > + u32 reg_val, pulses, time, time_now;
> > > > > + intel_wakeref_t wakeref;
> > > > > + long rotations;
> > > > > + int ret = 0;
> > > > > +
> > > > > + if (attr != hwmon_fan_input)
> > > > > + return -EOPNOTSUPP;
> > > > Using a switch case in rev3 is more logical here. It will also simplify
> > > > adding more fan attributes in the future. This is why switch cases are 
> > > > used
> > > > in other parts of the file.
> > > 
> > > it was my suggestion and to be honest I would rather prefer it
> > > this way. I can understand it if we were expecting more cases in
> > > the immediate, like it was in your case.
> > > 
> > > But I wouldn't have an ugly and unreadable one-case-switch in the
> > > eventuality that something comes in the future. In that case, we
> > > can always convert it.
> > 
> > My rationale for suggesting a switch case is that in the current alignment
> > hwm_XX_read() function is designed to handle all possible/supported
> > attributes of the XX sensor type.
> > With the proposed change, hwm_fan_read() would only manage the
> > hwmon_fan_input attribute.
> > If a single switch case isn’t preferred, I would recommend creating a helper
> > function dedicated to hwmon_fan_input.
> > 
> > hwm_fan_read()
> > {
> > if (attr == hwmon_fan_input)
> > return helper(); //hwmon_fan_input_read()

I don't really understand what the point of the helper is, but
if it looks cleaner, I have no objection.

Thanks,
Andi

Re: [PATCH v2 0/2] Allow partial memory mapping for cpu memory

2024-08-19 Thread Andi Shyti

Hi Matt,

On Wed, Aug 14, 2024 at 04:07:02PM +, Matthew Brost wrote:
> On Wed, Aug 14, 2024 at 03:48:32PM +0200, Andi Shyti wrote:
> > I am resending this patch series, not to disregard the previous
> > discussions, but to ensure it gets tested with the IGTs that
> > Krzysztof has provided.
> > 
> > This patch series finalizes the memory mapping fixes and
> > improvements by enabling partial memory mapping for CPU memory as
> > well.
> > 
> > The concept of partial memory mapping, achieved by adding an
> > object offset, was implicitly introduced in commit 8bdd9ef7e9b1
> > ("drm/i915/gem: Fix Virtual Memory mapping boundaries
> > calculation") for GTT memory.
> > 
> > To address a previous discussion with Sima and Matt, this feature
> > is used by Mesa and is required across all platforms utilizing
> > Mesa. Although Nirmoy suggested using the Fixes tag to backport
> 
> Other vendors than Intel too?

Yes, that's what I understood.

I hope Lionel can jump in and explain the use cases from Mesa
perspective.

> > this to previous kernels, I view this as a new feature rather
> > than a fix.
> > 
> > Lionel, please let me know if you have a different perspective
> > and believe this should be treated as a bug fix, requiring it
> > to be backported to stable kernels.
> > 
> > The IGTs have been developed in collaboration with the Mesa team
> > to replicate the exact Mesa use case[*].
> > 
> > Thanks Chris for the support, thanks Krzysztof for taking care of
> > the IGT tests, thanks Nirmoy for your reviews and thanks Sima and
> > Matt for the discussion on this series.
> > 
> > Andi
> > 
> > [*] https://patchwork.freedesktop.org/patch/608232/?series=137303&rev=1
> 
> So here is really quick test [1] which I put together in Xe to test
> partial mmaps, not as complete as the i915 one though.
> 
> It fails on the Xe baseline.
> 
> It pass if with [2] in drm_gem.c:drm_gem_mmap. Blindly changing that
> function might not be the correct solution but thought I'd share as a
> reference.

Thanks for sharing it. I took a quick look and I think there are
a few things missing there. If you want and if this is not in
anyone's task list, I can try to "port" this in XE.

Is there any other objection to getting this merged into i915?

Andi

[PATCH] drm/i915/gt: Continue creating engine sysfs files even after a failure

2024-08-19 Thread Andi Shyti

The i915 driver generates sysfs entries for each engine of the
GPU in /sys/class/drm/cardX/engines/.

The process is straightforward: we loop over the UABI engines and
for each one, we:

 - Create the object.
 - Create basic files.
 - If the engine supports timeslicing, create timeslice duration files.
 - If the engine supports preemption, create preemption-related files.
 - Create default value files.

Currently, if any of these steps fail, the process stops, and no
further sysfs files are created.

However, it's not necessary to stop the process on failure.
Instead, we can continue creating the remaining sysfs files for
the other engines. Even if some files fail to be created, the
list of engines can still be retrieved by querying i915.

Signed-off-by: Andi Shyti 
---
Hi,

It might make sense to create an "inv-" if something
goes wrong, so that the user is aware that the engine exists, but
the sysfs file is not present.

One further improvement would be to provide more information
about the failure reason in the dev_warn() message.

Andi

 drivers/gpu/drm/i915/gt/sysfs_engines.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/sysfs_engines.c 
b/drivers/gpu/drm/i915/gt/sysfs_engines.c
index 021f51d9b456..aab2759067d2 100644
--- a/drivers/gpu/drm/i915/gt/sysfs_engines.c
+++ b/drivers/gpu/drm/i915/gt/sysfs_engines.c
@@ -530,9 +530,8 @@ void intel_engines_add_sysfs(struct drm_i915_private *i915)
 err_object:
kobject_put(kobj);
 err_engine:
-   dev_err(kdev, "Failed to add sysfs engine '%s'\n",
-   engine->name);
-   break;
+   dev_warn(kdev, "Failed to add sysfs engine '%s'\n",
+engine->name);
}
}
 }
-- 
2.45.2

[RFC PATCH v2 11/11] drm/i915/gt: Allow the user to change the CCS mode through sysfs

2024-08-17 Thread Andi Shyti

Create the 'ccs_mode' file under

/sys/class/drm/cardX/gt/gt0/ccs_mode

This file allows the user to read and set the current CCS mode.

 - Reading: The user can read the current CCS mode, which can be
   1, 2, or 4. This value is derived from the current engine
   mask.

 - Writing: The user can set the CCS mode to 1, 2, or 4,
   depending on the desired number of exposed engines and the
   required load balancing.

The interface will return -EBUSY if other clients are connected
to i915, or -EINVAL if an invalid value is set.
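
The -EINVAL rule can be sketched in isolation as below. ccs_mode_check() is
a hypothetical helper that mirrors the value check done in ccs_mode_store()
in this patch; the -EBUSY case depends on the GT power state and is not
modelled.

```c
#include <assert.h>
#include <errno.h>

/*
 * Accept a requested CCS mode only if it is non-zero, not larger than
 * the number of available slices and an exact divisor of it.
 */
int ccs_mode_check(unsigned int val, unsigned int num_cslices)
{
	assert(num_cslices > 0);	/* at least one slice is expected */

	if (val == 0 || val > num_cslices || (num_cslices % val) != 0)
		return -EINVAL;

	return 0;
}
```

With four slices this accepts 1, 2 and 4 and rejects 0, 3 and anything
above 4.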

Signed-off-by: Andi Shyti 
---
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 74 +
 1 file changed, 74 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c 
b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
index b1c3c9d9bb4f..30393009bc43 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
@@ -5,6 +5,7 @@
 
 #include "i915_drv.h"
 #include "intel_gt_ccs_mode.h"
+#include "intel_gt_pm.h"
 #include "intel_gt_print.h"
 #include "intel_gt_regs.h"
 #include "intel_gt_sysfs.h"
@@ -206,6 +207,68 @@ static ssize_t num_cslices_show(struct device *dev,
 }
 static DEVICE_ATTR_RO(num_cslices);
 
+static ssize_t ccs_mode_show(struct device *dev,
+struct device_attribute *attr, char *buff)
+{
+   struct intel_gt *gt = kobj_to_gt(&dev->kobj);
+   u32 ccs_mode;
+
+   ccs_mode = hweight32(gt->ccs.ccs_mask);
+
+   return sysfs_emit(buff, "%u\n", ccs_mode);
+}
+
+static ssize_t ccs_mode_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buff, size_t count)
+{
+   struct intel_gt *gt = kobj_to_gt(&dev->kobj);
+   int num_cslices = hweight32(CCS_MASK(gt));
+   int ccs_mode = hweight32(gt->ccs.ccs_mask);
+   ssize_t ret;
+   u32 val;
+
+   ret = kstrtou32(buff, 0, &val);
+   if (ret)
+   return ret;
+
+   /*
+* As of now possible values to be set are 1, 2, 4,
+* up to the maximum number of available slices
+*/
+   if ((!val) || (val > num_cslices) || (num_cslices % val))
+   return -EINVAL;
+
+   /*
+* We don't want to change the CCS
+* mode while someone is using the GT
+*/
+   if (intel_gt_pm_is_awake(gt))
+   return -EBUSY;
+
+   mutex_lock(&gt->wakeref.mutex);
+   mutex_lock(&gt->ccs.mutex);
+
+   /*
+* Nothing to do if the requested setting
+* is the same as the current one
+*/
+   if (val == ccs_mode)
+   return count;
+   else if (val > ccs_mode)
+   add_uabi_ccs_engines(gt, val);
+   else
+   remove_uabi_ccs_engines(gt, val);
+
+   intel_gt_apply_ccs_mode(gt, val);
+
+   mutex_unlock(&gt->ccs.mutex);
+   mutex_unlock(&gt->wakeref.mutex);
+
+   return count;
+}
+static DEVICE_ATTR_RW(ccs_mode);
+
 void intel_gt_sysfs_ccs_init(struct intel_gt *gt)
 {
int err;
@@ -213,4 +276,15 @@ void intel_gt_sysfs_ccs_init(struct intel_gt *gt)
err = sysfs_create_file(&gt->sysfs_gt, &dev_attr_num_cslices.attr);
if (err)
gt_dbg(gt, "failed to create sysfs num_cslices files\n");
+
+   /*
+* Do not create the ccs_mode file for non DG2 platforms
+* because they don't need it as they have only one CCS engine
+*/
+   if (!IS_DG2(gt->i915))
+   return;
+
+   err = sysfs_create_file(&gt->sysfs_gt, &dev_attr_ccs_mode.attr);
+   if (err)
+   gt_dbg(gt, "failed to create sysfs ccs_mode files\n");
 }
-- 
2.45.2

[RFC PATCH v2 10/11] drm/i915/gt: Implement creation and removal routines for CCS engines

2024-08-17 Thread Andi Shyti

In preparation for upcoming patches, we need routines to
dynamically create and destroy CCS engines based on the CCS mode
that the user wants to set.

The process begins by calculating the engine mask for the engines
that need to be added or removed. We then update the UABI list of
exposed engines and create or destroy the corresponding sysfs
interfaces accordingly.
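
The mask arithmetic behind "engines that need to be added or removed" can be sketched as follows. This is a simplified standalone sketch: engine_mask_t stands in for intel_engine_mask_t, and ccs_mask_delta() is a hypothetical name, not a driver function:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: a plain 32-bit stand-in for intel_engine_mask_t. */
typedef uint32_t engine_mask_t;

/*
 * Given the mask of engines exposed before and after a CCS mode change,
 * compute which engines have to be added and which have to be removed.
 */
static void ccs_mask_delta(engine_mask_t old_mask, engine_mask_t new_mask,
			   engine_mask_t *to_add, engine_mask_t *to_remove)
{
	assert(to_add && to_remove);

	/* Set in the new mask but not in the old one. */
	*to_add = new_mask & ~old_mask;

	/* Set in the old mask but not in the new one. */
	*to_remove = old_mask & ~new_mask;
}
```

Going from one exposed engine (0x1) to four (0xf) yields 0xe to add and nothing to remove; the reverse transition yields 0xe to remove.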

These functions are not yet in use, so no functional changes are
intended at this stage.

Signed-off-by: Andi Shyti 
---
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 80 +
 1 file changed, 80 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c 
b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
index 01ce719cf475..b1c3c9d9bb4f 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
@@ -8,6 +8,7 @@
 #include "intel_gt_print.h"
 #include "intel_gt_regs.h"
 #include "intel_gt_sysfs.h"
+#include "sysfs_engines.h"
 
 static void update_ccs_mask(struct intel_gt *gt, u32 ccs_mode)
 {
@@ -113,6 +114,85 @@ void intel_gt_ccs_mode_init(struct intel_gt *gt)
update_ccs_mask(gt, 1);
 }
 
+static void add_uabi_ccs_engines(struct intel_gt *gt, u32 ccs_mode)
+{
+   struct drm_i915_private *i915 = gt->i915;
+   intel_engine_mask_t new_ccs_mask, tmp;
+   struct intel_engine_cs *engine;
+   struct rb_node **p, *prev;
+
+   /* Store the current ccs mask */
+   new_ccs_mask = gt->ccs.ccs_mask;
+   update_ccs_mask(gt, ccs_mode);
+
+   /*
+* Store only the mask of the CCS engines that need to be added by
+* removing from the new mask the engines that are already active
+*/
+   new_ccs_mask = gt->ccs.ccs_mask & ~new_ccs_mask;
+   new_ccs_mask <<= CCS0;
+
+   /*
+* UABI are stored only on the right branch of the rb tree, making it
+* de facto a double linked list. Get to the bottom of the list and
+* insert there the new engines.
+*/
+   prev = NULL;
+   p = &i915->uabi_engines.rb_node;
+   for_each_uabi_engine(engine, i915) {
+   prev = &engine->uabi_node;
+   p = &prev->rb_right;
+   }
+
+   for_each_engine_masked(engine, gt, new_ccs_mask, tmp) {
+   int err;
+
+   i915->engine_uabi_class_count[I915_ENGINE_CLASS_COMPUTE]++;
+
+   rb_link_node(&engine->uabi_node, prev, p);
+   rb_insert_color(&engine->uabi_node, &i915->uabi_engines);
+
+   rb_link_node(&engine->uabi_node, prev, p);
+   rb_insert_color(&engine->uabi_node, &i915->uabi_engines);
+
+   prev = &engine->uabi_node;
+   p = &prev->rb_right;
+
+   err = intel_engine_add_single_sysfs(engine);
+   if (err)
+   gt_warn(gt,
+   "Unable to create sysfs entries for %s engine",
+   engine->name);
+   }
+}
+
+static void remove_uabi_ccs_engines(struct intel_gt *gt, u8 ccs_mode)
+{
+   struct drm_i915_private *i915 = gt->i915;
+   intel_engine_mask_t new_ccs_mask, tmp;
+   struct intel_engine_cs *engine;
+
+   /* Store the current ccs mask */
+   new_ccs_mask = gt->ccs.ccs_mask;
+   update_ccs_mask(gt, ccs_mode);
+
+   /*
+* Store only the mask of the CCS engines that need to be removed by
+* unmasking them from the new mask the engines that are already active
+*/
+   new_ccs_mask = new_ccs_mask & ~gt->ccs.ccs_mask;
+   new_ccs_mask <<= CCS0;
+
+   for_each_engine_masked(engine, gt, new_ccs_mask, tmp) {
+   i915->engine_uabi_class_count[I915_ENGINE_CLASS_COMPUTE]--;
+
+   rb_erase(&engine->uabi_node, &i915->uabi_engines);
+   /* Remove sysfs entries */
+   kobject_put(engine->kobj_defaults);
+   kobject_put(engine->kobj);
+   }
+}
+
 static ssize_t num_cslices_show(struct device *dev,
struct device_attribute *attr,
char *buff)
-- 
2.45.2

[RFC PATCH v2 09/11] drm/i915/gt: Isolate single sysfs engine file creation

2024-08-17 Thread Andi Shyti

In preparation for upcoming patches, we need the ability to
create and remove individual sysfs files. To facilitate this,
extract from the intel_engines_add_sysfs() function the creation
of individual files.

Signed-off-by: Andi Shyti 
---
 drivers/gpu/drm/i915/gt/sysfs_engines.c | 75 -
 drivers/gpu/drm/i915/gt/sysfs_engines.h |  2 +
 2 files changed, 49 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/sysfs_engines.c 
b/drivers/gpu/drm/i915/gt/sysfs_engines.c
index d0bb2aa561ed..3356fadce327 100644
--- a/drivers/gpu/drm/i915/gt/sysfs_engines.c
+++ b/drivers/gpu/drm/i915/gt/sysfs_engines.c
@@ -9,6 +9,7 @@
 #include "i915_drv.h"
 #include "intel_engine.h"
 #include "intel_engine_heartbeat.h"
+#include "intel_gt_print.h"
 #include "sysfs_engines.h"
 
 struct kobj_engine {
@@ -483,7 +484,7 @@ static void add_defaults(struct kobj_engine *parent)
parent->engine->kobj_defaults = &ke->base;
 }
 
-void intel_engines_add_sysfs(struct drm_i915_private *i915)
+int intel_engine_add_single_sysfs(struct intel_engine_cs *engine)
 {
static const struct attribute * const files[] = {
&name_attr.attr,
@@ -499,46 +500,64 @@ void intel_engines_add_sysfs(struct drm_i915_private 
*i915)
 #endif
NULL
};
+   struct kobject *dir = engine->i915->sysfs_engine;
+   struct kobject *kobj = engine->kobj;
+   int err;
 
-   struct device *kdev = i915->drm.primary->kdev;
-   struct intel_engine_cs *engine;
-   struct kobject *dir;
-
-   dir = kobject_create_and_add("engine", &kdev->kobj);
-   if (!dir)
-   return;
-
-   i915->sysfs_engine = dir;
-
-   for_each_uabi_engine(engine, i915) {
-   struct kobject *kobj;
-
+   if (!kobj) {
kobj = kobj_engine(dir, engine);
if (!kobj)
goto err_engine;
+   }
 
-   if (sysfs_create_files(kobj, files))
+   err = sysfs_create_files(kobj, files);
+   if (err)
+   goto err_object;
+
+   if (intel_engine_has_timeslices(engine)) {
+   err = sysfs_create_file(kobj, &timeslice_duration_attr.attr);
+   if (err)
goto err_object;
+   }
 
-   if (intel_engine_has_timeslices(engine) &&
-   sysfs_create_file(kobj, &timeslice_duration_attr.attr))
-   goto err_engine;
+   if (intel_engine_has_preempt_reset(engine)) {
+   err = sysfs_create_file(kobj, &preempt_timeout_attr.attr);
+   if (err)
+   goto err_object;
+   }
 
-   if (intel_engine_has_preempt_reset(engine) &&
-   sysfs_create_file(kobj, &preempt_timeout_attr.attr))
-   goto err_engine;
+   add_defaults(container_of(kobj, struct kobj_engine, base));
 
-   add_defaults(container_of(kobj, struct kobj_engine, base));
+   engine->kobj = kobj;
 
-   engine->kobj = kobj;
+   return 0;
 
-   if (0) {
 err_object:
-   kobject_put(kobj);
+   kobject_put(kobj);
 err_engine:
-   dev_err(kdev, "Failed to add sysfs engine '%s'\n",
-   engine->name);
+   gt_err(engine->gt, "Failed to add sysfs engine '%s'\n",
+   engine->name);
+
+   return err;
+}
+
+void intel_engines_add_sysfs(struct drm_i915_private *i915)
+{
+   struct device *kdev = i915->drm.primary->kdev;
+   struct intel_engine_cs *engine;
+   struct kobject *dir;
+
+   dir = kobject_create_and_add("engine", &kdev->kobj);
+   if (!dir)
+   return;
+
+   i915->sysfs_engine = dir;
+
+   for_each_uabi_engine(engine, i915) {
+   int err;
+
+   err = intel_engine_add_single_sysfs(engine);
+   if (err)
break;
-   }
}
 }
diff --git a/drivers/gpu/drm/i915/gt/sysfs_engines.h 
b/drivers/gpu/drm/i915/gt/sysfs_engines.h
index 9546fffe03a7..2e3ec2df14a9 100644
--- a/drivers/gpu/drm/i915/gt/sysfs_engines.h
+++ b/drivers/gpu/drm/i915/gt/sysfs_engines.h
@@ -7,7 +7,9 @@
 #define INTEL_ENGINE_SYSFS_H
 
 struct drm_i915_private;
+struct intel_engine_cs;
 
 void intel_engines_add_sysfs(struct drm_i915_private *i915);
+int intel_engine_add_single_sysfs(struct intel_engine_cs *engine);
 
 #endif /* INTEL_ENGINE_SYSFS_H */
-- 
2.45.2

[RFC PATCH v2 08/11] drm/i915/gt: Store active CCS mask

2024-08-17 Thread Andi Shyti

To support upcoming patches, we need to store the current mask
for active CCS engines.

Active engines refer to those exposed to userspace via the UABI
engine list.
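
A standalone sketch of how such an active mask can be derived, assuming the active engines are simply the first ccs_mode available slices, as update_ccs_mask() in this patch does. MAX_CCS stands in for I915_MAX_CCS and ccs_active_mask() is a hypothetical name:

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CCS 4 /* assumption: stand-in for I915_MAX_CCS */

/*
 * Build the mask of active CCS engines by picking the lowest 'ccs_mode'
 * bits that are set in the mask of available (non-fused) slices.
 */
static uint32_t ccs_active_mask(uint32_t available, unsigned int ccs_mode)
{
	uint32_t active = 0;
	unsigned int i;

	assert(ccs_mode > 0);

	for (i = 0; i < MAX_CCS && ccs_mode; i++) {
		if (!(available & (1u << i)))
			continue;

		active |= 1u << i;
		ccs_mode--;
	}

	return active;
}
```

For a part with slices 1 and 3 available (mask 0xa), mode 1 selects 0x2 and mode 2 selects 0xa.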

Signed-off-by: Andi Shyti 
---
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 20 
 drivers/gpu/drm/i915/gt/intel_gt_types.h|  1 +
 2 files changed, 21 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c 
b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
index 49493928f714..01ce719cf475 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
@@ -9,6 +9,23 @@
 #include "intel_gt_regs.h"
 #include "intel_gt_sysfs.h"
 
+static void update_ccs_mask(struct intel_gt *gt, u32 ccs_mode)
+{
+   unsigned long cslices_mask = CCS_MASK(gt);
+   int i;
+
+   /* Mask off all the CCS engines */
+   gt->ccs.ccs_mask = 0;
+
+   for_each_set_bit(i, &cslices_mask, I915_MAX_CCS) {
+   gt->ccs.ccs_mask |= BIT(i);
+
+   ccs_mode--;
+   if (!ccs_mode)
+   break;
+   }
+}
+
 void intel_gt_apply_ccs_mode(struct intel_gt *gt, u32 mode)
 {
unsigned long cslices_mask = CCS_MASK(gt);
@@ -91,6 +108,9 @@ void intel_gt_apply_ccs_mode(struct intel_gt *gt, u32 mode)
 void intel_gt_ccs_mode_init(struct intel_gt *gt)
 {
mutex_init(&gt->ccs.mutex);
+
+   /* Set CCS balance mode 1 in the ccs_mask */
+   update_ccs_mask(gt, 1);
 }
 
 static ssize_t num_cslices_show(struct device *dev,
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h 
b/drivers/gpu/drm/i915/gt/intel_gt_types.h
index a833b395237b..235b4b81eecd 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
@@ -220,6 +220,7 @@ struct intel_gt {
struct {
struct mutex mutex;
u32 mode_reg_val;
+   intel_engine_mask_t ccs_mask;
} ccs;
 
/*
-- 
2.45.2

[RFC PATCH v2 07/11] drm/i915/gt: Store engine-related sysfs kobjects

2024-08-17 Thread Andi Shyti

Upcoming commits will need to access engine-related kobjects to
enable the creation and destruction of sysfs interfaces at
runtime.

For this, store the "engine" directory (i915->sysfs_engine), the
engine files (engine->kobj), and the default data
(engine->kobj_defaults).

Signed-off-by: Andi Shyti 
---
 drivers/gpu/drm/i915/gt/intel_engine_types.h | 3 +++
 drivers/gpu/drm/i915/gt/sysfs_engines.c  | 6 ++
 drivers/gpu/drm/i915/i915_drv.h  | 1 +
 3 files changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h 
b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index ba55c059063d..a0f2f5c08388 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -388,6 +388,9 @@ struct intel_engine_cs {
u32 context_size;
u32 mmio_base;
 
+   struct kobject *kobj;
+   struct kobject *kobj_defaults;
+
struct intel_engine_tlb_inv tlb_inv;
 
/*
diff --git a/drivers/gpu/drm/i915/gt/sysfs_engines.c 
b/drivers/gpu/drm/i915/gt/sysfs_engines.c
index 021f51d9b456..d0bb2aa561ed 100644
--- a/drivers/gpu/drm/i915/gt/sysfs_engines.c
+++ b/drivers/gpu/drm/i915/gt/sysfs_engines.c
@@ -479,6 +479,8 @@ static void add_defaults(struct kobj_engine *parent)
if (intel_engine_has_preempt_reset(ke->engine) &&
sysfs_create_file(&ke->base, &preempt_timeout_def.attr))
return;
+
+   parent->engine->kobj_defaults = &ke->base;
 }
 
 void intel_engines_add_sysfs(struct drm_i915_private *i915)
@@ -506,6 +508,8 @@ void intel_engines_add_sysfs(struct drm_i915_private *i915)
if (!dir)
return;
 
+   i915->sysfs_engine = dir;
+
for_each_uabi_engine(engine, i915) {
struct kobject *kobj;
 
@@ -526,6 +530,8 @@ void intel_engines_add_sysfs(struct drm_i915_private *i915)
 
add_defaults(container_of(kobj, struct kobj_engine, base));
 
+   engine->kobj = kobj;
+
if (0) {
 err_object:
kobject_put(kobj);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 94f7f6cc444c..3a8a757f5bd5 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -320,6 +320,7 @@ struct drm_i915_private {
struct intel_gt *gt[I915_MAX_GT];
 
struct kobject *sysfs_gt;
+   struct kobject *sysfs_engine;
 
/* Quick lookup of media GT (current platforms only have one) */
struct intel_gt *media_gt;
-- 
2.45.2

[RFC PATCH v2 06/11] drm/i915/gt: Expose the number of total CCS slices

2024-08-17 Thread Andi Shyti

Implement a sysfs interface to show the number of available CCS
slices. The displayed number does not take into account the CCS
balancing mode.

Signed-off-by: Andi Shyti 
---
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 24 +
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h |  1 +
 drivers/gpu/drm/i915/gt/intel_gt_sysfs.c|  2 ++
 3 files changed, 27 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c 
b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
index 2b6d4ee7445d..49493928f714 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
@@ -5,7 +5,9 @@
 
 #include "i915_drv.h"
 #include "intel_gt_ccs_mode.h"
+#include "intel_gt_print.h"
 #include "intel_gt_regs.h"
+#include "intel_gt_sysfs.h"
 
 void intel_gt_apply_ccs_mode(struct intel_gt *gt, u32 mode)
 {
@@ -90,3 +92,25 @@ void intel_gt_ccs_mode_init(struct intel_gt *gt)
 {
mutex_init(&gt->ccs.mutex);
 }
+
+static ssize_t num_cslices_show(struct device *dev,
+   struct device_attribute *attr,
+   char *buff)
+{
+   struct intel_gt *gt = kobj_to_gt(&dev->kobj);
+   u32 num_slices;
+
+   num_slices = hweight32(CCS_MASK(gt));
+
+   return sysfs_emit(buff, "%u\n", num_slices);
+}
+static DEVICE_ATTR_RO(num_cslices);
+
+void intel_gt_sysfs_ccs_init(struct intel_gt *gt)
+{
+   int err;
+
+   err = sysfs_create_file(&gt->sysfs_gt, &dev_attr_num_cslices.attr);
+   if (err)
+   gt_dbg(gt, "failed to create sysfs num_cslices files\n");
+}
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h 
b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h
index 0e1c43ea1d54..c60bfdb54e37 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h
@@ -9,6 +9,7 @@
 #include "intel_gt.h"
 
 void intel_gt_apply_ccs_mode(struct intel_gt *gt, u32 mode);
+void intel_gt_sysfs_ccs_init(struct intel_gt *gt);
 void intel_gt_ccs_mode_init(struct intel_gt *gt);
 
 #endif /* __INTEL_GT_CCS_MODE_H__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_sysfs.c 
b/drivers/gpu/drm/i915/gt/intel_gt_sysfs.c
index 33cba406b569..895eedc402ae 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_sysfs.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_sysfs.c
@@ -12,6 +12,7 @@
 #include "i915_drv.h"
 #include "i915_sysfs.h"
 #include "intel_gt.h"
+#include "intel_gt_ccs_mode.h"
 #include "intel_gt_print.h"
 #include "intel_gt_sysfs.h"
 #include "intel_gt_sysfs_pm.h"
@@ -101,6 +102,7 @@ void intel_gt_sysfs_register(struct intel_gt *gt)
goto exit_fail;
 
intel_gt_sysfs_pm_init(gt, &gt->sysfs_gt);
+   intel_gt_sysfs_ccs_init(gt);
 
return;
 
-- 
2.45.2

[RFC PATCH v2 05/11] drm/i915/gt: Remove cslices mask value from the CCS structure

2024-08-17 Thread Andi Shyti

Following the decision to manage CCS engine creation within UABI
engines, the "cslices" variable in the "ccs" structure in the
"gt" is no longer needed. Remove it, as it is now redundant.

Signed-off-by: Andi Shyti 
---
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 2 +-
 drivers/gpu/drm/i915/gt/intel_gt_types.h| 5 -
 2 files changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c 
b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
index 6afd44ffc358..2b6d4ee7445d 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
@@ -9,7 +9,7 @@
 
 void intel_gt_apply_ccs_mode(struct intel_gt *gt, u32 mode)
 {
-   unsigned long cslices_mask = gt->ccs.cslices;
+   unsigned long cslices_mask = CCS_MASK(gt);
u32 mode_val = 0;
u32 m = mode;
int ccs_id;
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h 
b/drivers/gpu/drm/i915/gt/intel_gt_types.h
index 8df8fac066c0..a833b395237b 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
@@ -218,11 +218,6 @@ struct intel_gt {
 * i.e. how the CCS streams are distributed amongs the slices.
 */
struct {
-   /*
-* Mask of the non fused CCS slices
-* to be used for the load balancing
-*/
-   intel_engine_mask_t cslices;
struct mutex mutex;
u32 mode_reg_val;
} ccs;
-- 
2.45.2

[RFC PATCH v2 04/11] drm/i915/gt: Manage CCS engine creation within UABI exposure

2024-08-17 Thread Andi Shyti

In commit ea315f98e5d6 ("drm/i915/gt: Do not generate the command
streamer for all the CCS"), we restricted the creation of
physical CCS engines to only one stream. This allowed the user to
submit a single compute workload, with all CCS slices sharing the
workload from that stream.

This patch removes that limitation but still exposes only one
stream to the user. The physical memory for each engine remains
allocated but unused, and the user will only see one engine
exposed.

Do this by adding only one engine to the UABI list, ensuring that
only one engine is visible to the user.
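
The filtering idea can be sketched in standalone C as below. This is only an illustration of "skip every compute engine after the first"; the enum, the function name mark_exposed() and the single_ccs flag (standing in for the IS_DG2() check) are all hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical engine classes, for illustration only. */
enum engine_class { CLASS_RENDER, CLASS_COPY, CLASS_VIDEO, CLASS_COMPUTE };

/*
 * Walk the engines in order and decide which ones are exposed to the
 * user. When 'single_ccs' is set, only the first compute engine is
 * exposed and the remaining ones are skipped. Returns the number of
 * exposed engines.
 */
static size_t mark_exposed(const enum engine_class *classes, bool *exposed,
			   size_t count, bool single_ccs)
{
	unsigned int ccs_seen = 0;
	size_t i, n_exposed = 0;

	assert(classes && exposed);

	for (i = 0; i < count; i++) {
		exposed[i] = true;

		if (single_ccs && classes[i] == CLASS_COMPUTE && ++ccs_seen > 1)
			exposed[i] = false;

		if (exposed[i])
			n_exposed++;
	}

	return n_exposed;
}
```

The skipped engines still exist internally; they are just left out of the list that is visible to the user.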

Signed-off-by: Andi Shyti 
---
 drivers/gpu/drm/i915/gt/intel_engine_cs.c   | 23 -
 drivers/gpu/drm/i915/gt/intel_engine_user.c | 20 +++---
 2 files changed, 17 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 4d30a86016f2..def255ee0b96 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -876,29 +876,6 @@ static intel_engine_mask_t init_engine_mask(struct 
intel_gt *gt)
info->engine_mask &= ~BIT(GSC0);
}
 
-   /*
-* Do not create the command streamer for CCS slices beyond the first.
-* All the workload submitted to the first engine will be shared among
-* all the slices.
-*
-* Once the user will be allowed to customize the CCS mode, then this
-* check needs to be removed.
-*/
-   if (IS_DG2(gt->i915)) {
-   u8 first_ccs = __ffs(CCS_MASK(gt));
-
-   /*
-* Store the number of active cslices before
-* changing the CCS engine configuration
-*/
-   gt->ccs.cslices = CCS_MASK(gt);
-
-   /* Mask off all the CCS engine */
-   info->engine_mask &= ~GENMASK(CCS3, CCS0);
-   /* Put back in the first CCS engine */
-   info->engine_mask |= BIT(_CCS(first_ccs));
-   }
-
return info->engine_mask;
 }
 
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c 
b/drivers/gpu/drm/i915/gt/intel_engine_user.c
index 11cc06c0c785..c5ccb677ed15 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
@@ -207,6 +207,7 @@ void intel_engines_driver_register(struct drm_i915_private 
*i915)
struct legacy_ring ring = {};
struct list_head *it, *next;
struct rb_node **p, *prev;
+   u8 uabi_ccs_instance = 0;
LIST_HEAD(engines);
 
sort_engines(i915, &engines);
@@ -246,6 +247,22 @@ void intel_engines_driver_register(struct drm_i915_private 
*i915)
 
GEM_BUG_ON(uabi_class >=
   ARRAY_SIZE(i915->engine_uabi_class_count));
+
+   /* Fix up the mapping to match default execbuf::user_map[] */
+   add_legacy_ring(&ring, engine);
+
+   /*
+* Do not create the command streamer for CCS slices beyond the
+* first. All the workload submitted to the first engine will be
+* shared among all the slices.
+*/
+   if (IS_DG2(i915) && uabi_class == I915_ENGINE_CLASS_COMPUTE) {
+   uabi_ccs_instance++;
+
+   if (uabi_ccs_instance > 1)
+   continue;
+   }
+
i915->engine_uabi_class_count[uabi_class]++;
 
rb_link_node(&engine->uabi_node, prev, p);
@@ -255,9 +272,6 @@ void intel_engines_driver_register(struct drm_i915_private 
*i915)
engine->uabi_class,
engine->uabi_instance) != 
engine);
 
-   /* Fix up the mapping to match default execbuf::user_map[] */
-   add_legacy_ring(&ring, engine);
-
prev = &engine->uabi_node;
p = &prev->rb_right;
}
-- 
2.45.2

[RFC PATCH v2 03/11] drm/i915/gt: Refactor uabi engine class/instance list creation

2024-08-17 Thread Andi Shyti

For the upcoming changes we need a cleaner way to build the list
of uabi engines.

Suggested-by: Tvrtko Ursulin 
Signed-off-by: Andi Shyti 
---
 drivers/gpu/drm/i915/gt/intel_engine_user.c | 29 -
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c 
b/drivers/gpu/drm/i915/gt/intel_engine_user.c
index 833987015b8b..11cc06c0c785 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
@@ -203,7 +203,7 @@ static void engine_rename(struct intel_engine_cs *engine, 
const char *name, u16
 
 void intel_engines_driver_register(struct drm_i915_private *i915)
 {
-   u16 name_instance, other_instance = 0;
+   u16 class_instance[I915_LAST_UABI_ENGINE_CLASS + 2] = { };
struct legacy_ring ring = {};
struct list_head *it, *next;
struct rb_node **p, *prev;
@@ -214,6 +214,8 @@ void intel_engines_driver_register(struct drm_i915_private 
*i915)
prev = NULL;
p = &i915->uabi_engines.rb_node;
list_for_each_safe(it, next, &engines) {
+   u16 uabi_class;
+
struct intel_engine_cs *engine =
container_of(it, typeof(*engine), uabi_list);
 
@@ -222,15 +224,14 @@ void intel_engines_driver_register(struct 
drm_i915_private *i915)
 
GEM_BUG_ON(engine->class >= ARRAY_SIZE(uabi_classes));
engine->uabi_class = uabi_classes[engine->class];
-   if (engine->uabi_class == I915_NO_UABI_CLASS) {
-   name_instance = other_instance++;
-   } else {
-   GEM_BUG_ON(engine->uabi_class >=
-  ARRAY_SIZE(i915->engine_uabi_class_count));
-   name_instance =
-   
i915->engine_uabi_class_count[engine->uabi_class]++;
-   }
-   engine->uabi_instance = name_instance;
+
+   if (engine->uabi_class == I915_NO_UABI_CLASS)
+   uabi_class = I915_LAST_UABI_ENGINE_CLASS + 1;
+   else
+   uabi_class = engine->uabi_class;
+
+   GEM_BUG_ON(uabi_class >= ARRAY_SIZE(class_instance));
+   engine->uabi_instance = class_instance[uabi_class]++;
 
/*
 * Replace the internal name with the final user and log facing
@@ -238,11 +239,15 @@ void intel_engines_driver_register(struct 
drm_i915_private *i915)
 */
engine_rename(engine,
  intel_engine_class_repr(engine->class),
- name_instance);
+ engine->uabi_instance);
 
-   if (engine->uabi_class == I915_NO_UABI_CLASS)
+   if (uabi_class > I915_LAST_UABI_ENGINE_CLASS)
continue;
 
+   GEM_BUG_ON(uabi_class >=
+  ARRAY_SIZE(i915->engine_uabi_class_count));
+   i915->engine_uabi_class_count[uabi_class]++;
+
rb_link_node(&engine->uabi_node, prev, p);
rb_insert_color(&engine->uabi_node, &i915->uabi_engines);
 
-- 
2.45.2

[RFC PATCH v2 02/11] drm/i915/gt: Allow the creation of multi-mode CCS masks

2024-08-17 Thread Andi Shyti

Until now, we have only set CCS mode balancing to 1, which means
that only one compute engine is exposed to the user. The stream
of compute commands submitted to that engine is then shared among
all the dedicated execution units.

This is done by calling the intel_gt_apply_ccs_mode() function.

With this change, the aforementioned function takes an additional
parameter called 'mode' that specifies the desired mode to be set
for the CCS engines balancing. The mode parameter can have the
following values:

 - mode = 0: CCS load balancing mode 1 (1 CCS engine exposed)
 - mode = 1: CCS load balancing mode 2 (2 CCS engines exposed)
 - mode = 3: CCS load balancing mode 4 (4 CCS engines exposed)

This allows us to generate the appropriate register value to be
written to CCS_MODE, configuring how the exposed engine streams
will be submitted to the execution units.

No functional changes are intended yet, as no mode higher than
'0' is currently being set.
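
To make the slice distribution easier to follow, here is a standalone sketch of the same round-robin walk used in the loop of this patch. It records the engine assigned to each slice in an array rather than encoding the CCS_MODE register value. MAX_CCS, SLICE_UNAVAILABLE, next_bit() and ccs_map_slices() are hypothetical names for this example only:

```c
#include <assert.h>

#define MAX_CCS 4              /* assumption: stand-in for I915_MAX_CCS */
#define SLICE_UNAVAILABLE (-1) /* marker for a fused-off slice */

/* Index of the first set bit at or after 'from', or MAX_CCS if none. */
static int next_bit(unsigned long mask, int from)
{
	int i;

	for (i = from; i < MAX_CCS; i++)
		if (mask & (1UL << i))
			return i;

	return MAX_CCS;
}

/*
 * Assign an engine id to every slice. 'mode' follows the convention of
 * the commit message: 0 means one engine, 1 means two, 3 means four.
 */
static void ccs_map_slices(unsigned long cslices_mask, unsigned int mode,
			   int map[MAX_CCS])
{
	unsigned int m = mode;
	int ccs_id, cslice;

	/* At least one slice must be available. */
	assert(cslices_mask & ((1UL << MAX_CCS) - 1));

	ccs_id = next_bit(cslices_mask, 0);

	for (cslice = 0; cslice < MAX_CCS; cslice++) {
		if (!(cslices_mask & (1UL << cslice))) {
			map[cslice] = SLICE_UNAVAILABLE;
			continue;
		}

		map[cslice] = ccs_id;

		/* All engines of this round used: start again from the first. */
		if (!m) {
			m = mode;
			ccs_id = next_bit(cslices_mask, 0);
			continue;
		}

		m--;
		ccs_id = next_bit(cslices_mask, ccs_id + 1);
	}
}
```

With all four slices available, mode 0 maps every slice to ccs0, mode 1 alternates ccs0 and ccs1, and mode 3 gives each slice its own engine, which matches the configurations listed in the code comment of this patch.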

Signed-off-by: Andi Shyti 
---
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 78 -
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h |  4 +-
 drivers/gpu/drm/i915/gt/intel_workarounds.c |  2 +-
 3 files changed, 65 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c 
b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
index 19e0bc359861..6afd44ffc358 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
@@ -4,37 +4,83 @@
  */
 
 #include "i915_drv.h"
-#include "intel_gt.h"
 #include "intel_gt_ccs_mode.h"
 #include "intel_gt_regs.h"
 
-void intel_gt_apply_ccs_mode(struct intel_gt *gt)
+void intel_gt_apply_ccs_mode(struct intel_gt *gt, u32 mode)
 {
+   unsigned long cslices_mask = gt->ccs.cslices;
+   u32 mode_val = 0;
+   u32 m = mode;
+   int ccs_id;
int cslice;
-   u32 mode = 0;
-   int first_ccs = __ffs(CCS_MASK(gt));
 
lockdep_assert_held(&gt->ccs.mutex);
 
if (!IS_DG2(gt->i915))
return;
 
-   /* Build the value for the fixed CCS load balancing */
-   for (cslice = 0; cslice < I915_MAX_CCS; cslice++) {
-   if (gt->ccs.cslices & BIT(cslice))
-   /*
-* If available, assign the cslice
-* to the first available engine...
-*/
-   mode |= XEHP_CCS_MODE_CSLICE(cslice, first_ccs);
+   /*
+* The mode has two bit dedicated for each engine
+* that will be used for the CCS balancing algorithm:
+*
+*BIT | CCS slice
+*   --
+* 0  | CCS slice
+* 1  | 0
+*   --
+* 2  | CCS slice
+* 3  | 1
+*   --
+* 4  | CCS slice
+* 5  | 2
+*   --
+* 6  | CCS slice
+* 7  | 3
+*   --
+*
+* When a CCS slice is not available, then we will write 0x7,
+* otherwise we will write the user engine id whose load will
+* be forwarded to that slice.
+*
+* The possible configurations are:
+*
+* 1 engine (ccs0):
+*   slice 0, 1, 2, 3: ccs0
+*
+* 2 engines (ccs0, ccs1):
+*   slice 0, 2: ccs0
+*   slice 1, 3: ccs1
+*
+* 4 engines (ccs0, ccs1, ccs2, ccs3):
+*   slice 0: ccs0
+*   slice 1: ccs1
+*   slice 2: ccs2
+*   slice 3: ccs3
+*/
+   ccs_id = __ffs(cslices_mask);
 
-   else
+   for (cslice = 0; cslice < I915_MAX_CCS; cslice++) {
+   if (!(cslices_mask & BIT(cslice))) {
/*
-* ... otherwise, mark the cslice as
-* unavailable if no CCS dispatches here
+* If not available, mark the slice as unavailable
+* and no task will be dispatched here.
 */
-   mode |= XEHP_CCS_MODE_CSLICE(cslice,
+   mode_val |= XEHP_CCS_MODE_CSLICE(cslice,
 XEHP_CCS_MODE_CSLICE_MASK);
+   continue;
+   }
+
+   mode_val |= XEHP_CCS_MODE_CSLICE(cslice, ccs_id);
+
+   if (!m) {
+   m = mode;
+   ccs_id = __ffs(cslices_mask);
+   continue;
+   }
+
+   m--;
+   ccs_id = find_next_bit(&cslices_mask, I915_MAX_CCS, ccs_id + 1);
}
 
gt->ccs.mode_reg_val = mode;
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h 
b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h
index e646ab595ded..0e1c43ea1d54 10064

[RFC PATCH v2 01/11] drm/i915/gt: Move the CCS mode variable to a global position

2024-08-17 Thread Andi Shyti

Store the CCS mode value in the intel_gt->ccs structure to make
it available for future instances that may need to change its
value.

Name it mode_reg_val because it holds the value that will
be written into the CCS_MODE register, determining the CCS
balancing and, consequently, the number of engines generated.

Create a mutex to control access to the mode_reg_val variable.

No functional changes intended.

Signed-off-by: Andi Shyti 
---
 drivers/gpu/drm/i915/gt/intel_gt.c  |  3 +++
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 13 ++---
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h |  3 ++-
 drivers/gpu/drm/i915/gt/intel_gt_types.h| 12 
 drivers/gpu/drm/i915/gt/intel_workarounds.c |  7 ---
 5 files changed, 31 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c 
b/drivers/gpu/drm/i915/gt/intel_gt.c
index a6c69a706fd7..5af0527d822d 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -18,6 +18,7 @@
 #include "intel_ggtt_gmch.h"
 #include "intel_gt.h"
 #include "intel_gt_buffer_pool.h"
+#include "intel_gt_ccs_mode.h"
 #include "intel_gt_clock_utils.h"
 #include "intel_gt_debugfs.h"
 #include "intel_gt_mcr.h"
@@ -136,6 +137,8 @@ int intel_gt_init_mmio(struct intel_gt *gt)
intel_sseu_info_init(gt);
intel_gt_mcr_init(gt);
 
+   intel_gt_ccs_mode_init(gt);
+
return intel_engines_init_mmio(gt);
 }
 
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c 
b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
index 3c62a44e9106..19e0bc359861 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
@@ -8,14 +8,16 @@
 #include "intel_gt_ccs_mode.h"
 #include "intel_gt_regs.h"
 
-unsigned int intel_gt_apply_ccs_mode(struct intel_gt *gt)
+void intel_gt_apply_ccs_mode(struct intel_gt *gt)
 {
int cslice;
u32 mode = 0;
int first_ccs = __ffs(CCS_MASK(gt));
 
+   lockdep_assert_held(&gt->ccs.mutex);
+
if (!IS_DG2(gt->i915))
-   return 0;
+   return;
 
/* Build the value for the fixed CCS load balancing */
for (cslice = 0; cslice < I915_MAX_CCS; cslice++) {
@@ -35,5 +37,10 @@ unsigned int intel_gt_apply_ccs_mode(struct intel_gt *gt)
 XEHP_CCS_MODE_CSLICE_MASK);
}
 
-   return mode;
+   gt->ccs.mode_reg_val = mode;
+}
+
+void intel_gt_ccs_mode_init(struct intel_gt *gt)
+{
+   mutex_init(&gt->ccs.mutex);
 }
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h 
b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h
index 55547f2ff426..e646ab595ded 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h
@@ -8,6 +8,7 @@
 
 struct intel_gt;
 
-unsigned int intel_gt_apply_ccs_mode(struct intel_gt *gt);
+void intel_gt_apply_ccs_mode(struct intel_gt *gt);
+void intel_gt_ccs_mode_init(struct intel_gt *gt);
 
 #endif /* __INTEL_GT_CCS_MODE_H__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h 
b/drivers/gpu/drm/i915/gt/intel_gt_types.h
index bcee084b1f27..8df8fac066c0 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
@@ -207,12 +207,24 @@ struct intel_gt {
[MAX_ENGINE_INSTANCE + 1];
enum intel_submission_method submission_method;
 
+   /*
+* Track fixed mapping between CCS engines and compute slices.
+*
+* In order to w/a HW that has the inability to dynamically load
+* balance between CCS engines and EU in the compute slices, we have to
+* reconfigure a static mapping on the fly.
+*
+* The mode variable is set by the user and sets the balancing mode,
+* i.e. how the CCS streams are distributed amongs the slices.
+*/
struct {
/*
 * Mask of the non fused CCS slices
 * to be used for the load balancing
 */
intel_engine_mask_t cslices;
+   struct mutex mutex;
+   u32 mode_reg_val;
} ccs;
 
/*
diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c 
b/drivers/gpu/drm/i915/gt/intel_workarounds.c
index bfe6d8fc820f..daa11e11d68f 100644
--- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
@@ -2727,7 +2727,6 @@ add_render_compute_tuning_settings(struct intel_gt *gt,
 static void ccs_engine_wa_mode(struct intel_engine_cs *engine, struct 
i915_wa_list *wal)
 {
struct intel_gt *gt = engine->gt;
-   u32 mode;
 
if (!IS_DG2(gt->i915))
return;
@@ -2744,8 +2743,10 @@ static void ccs_engine_wa_mode(struct intel_engine_cs 
*engine, struct i915_wa_li
 * After havi
