Re: [PATCH v6 01/12] spi: add driver for intel graphics on-die spi device

2024-09-23 Thread Tvrtko Ursulin



On 21/09/2024 14:00, Winkler, Tomas wrote:





On Thu, Sep 19, 2024 at 09:54:24AM +, Winkler, Tomas wrote:

On Mon, Sep 16, 2024 at 04:49:17PM +0300, Alexander Usyskin wrote:



@@ -0,0 +1,142 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright(c) 2019-2024, Intel Corporation. All rights reserved.
+ */



Please make the entire comment a C++ one so things look more intentional.


This is how it is required by the Linux SPDX checker.


There is no incompatibility between SPDX and what I'm asking for...


+   size = sizeof(*spi) + sizeof(spi->regions[0]) * nregions;
+   spi = kzalloc(size, GFP_KERNEL);



Use at least array_size().



Regions is not fixed size array, it will not work.


Yes, that's the wrong helper - there is a relevant one though, which I'm not
remembering right now.



I don't think there is one; you can allocate arrays, but that is not the case
here.


struct_size() probably.
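
For illustration only, here is a userspace sketch of what struct_size() buys over the open-coded sum. In the kernel the allocation would presumably become kzalloc(struct_size(spi, regions, nregions), GFP_KERNEL). The structure layout and the names spi_dev, region, spi_dev_size() and spi_dev_alloc() below are assumptions made up for the example, not taken from the patch:

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver's structures. */
struct region {
	uint32_t offset;
	uint32_t size;
};

struct spi_dev {
	unsigned int nregions;
	struct region regions[];	/* flexible array member */
};

/*
 * Approximates struct_size(spi, regions, n): header plus n elements,
 * saturating to SIZE_MAX on overflow so the allocation fails instead
 * of silently being too small.
 */
static size_t spi_dev_size(size_t nregions)
{
	size_t bytes;

	if (__builtin_mul_overflow(nregions, sizeof(struct region), &bytes) ||
	    __builtin_add_overflow(bytes, sizeof(struct spi_dev), &bytes))
		return SIZE_MAX;

	return bytes;
}

static struct spi_dev *spi_dev_alloc(unsigned int nregions)
{
	size_t size = spi_dev_size(nregions);
	struct spi_dev *spi;

	if (size == SIZE_MAX)
		return NULL;

	spi = calloc(1, size);	/* zeroed, like kzalloc() */
	if (spi)
		spi->nregions = nregions;
	return spi;
}
```

The point of the helper is that the multiplication and addition are checked in one place, which the plain sizeof() arithmetic in the patch does not do.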

Regards,

Tvrtko


[PULL] drm-intel-fixes

2024-09-12 Thread Tvrtko Ursulin


Hi Dave, Sima,

It is late in the cycle and luckily the fix in this week's PR is just
something to satisfy static analyzers, nothing that can happen in reality,
so pulling it is even optional.

Regards,

Tvrtko

drm-intel-fixes-2024-09-12:
- Prevent a possible int overflow in wq offsets [guc] (Nikita Zhandarovich)

The following changes since commit da3ea35007d0af457a0afc87e84fddaebc4e0b63:

  Linux 6.11-rc7 (2024-09-08 14:50:28 -0700)

are available in the Git repository at:

  https://gitlab.freedesktop.org/drm/i915/kernel.git tags/drm-intel-fixes-2024-09-12

for you to fetch changes up to d3d37f74683e2f16f2635ee265884f7ca69350ae:

  drm/i915/guc: prevent a possible int overflow in wq offsets (2024-09-10 08:13:51 +0100)

----------------------------------------------------------------
- Prevent a possible int overflow in wq offsets [guc] (Nikita Zhandarovich)

----------------------------------------------------------------
Nikita Zhandarovich (1):
  drm/i915/guc: prevent a possible int overflow in wq offsets

 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)


[PULL] drm-intel-fixes

2024-09-05 Thread Tvrtko Ursulin
Hi Dave, Sima,

Some fixes for the weekly cycle:

Avoid pointless attempts to reload GSC, fix for VBIOS/GOP LUT takeover on
ILK and SNB, eliminate regressions by limiting the Fast Wake sync pulse
workaround to Dell Precision 5490 with AUO panels only, and some clang
build fixes.

Regards,

Tvrtko

drm-intel-fixes-2024-09-05:
- drm/i915: Do not attempt to load the GSC multiple times (Daniele Ceraolo Spurio)
- drm/i915: Fix readout degamma_lut mismatch on ilk/snb (Ville Syrjälä)
- drm/i915/fence: Mark debug_fence_init_onstack() with __maybe_unused (Andy Shevchenko)
- drm/i915/fence: Mark debug_fence_free() with __maybe_unused (Andy Shevchenko)
- drm/i915/display: Add mechanism to use sink model when applying quirk [display] (Jouni Högander)
- drm/i915/display: Increase Fast Wake Sync length as a quirk [display] (Jouni Högander)

The following changes since commit 431c1646e1f86b949fa3685efc50b660a364c2b6:

  Linux 6.11-rc6 (2024-09-01 19:46:02 +1200)

are available in the Git repository at:

  https://gitlab.freedesktop.org/drm/i915/kernel.git tags/drm-intel-fixes-2024-09-05

for you to fetch changes up to a13494de53258d8cf82ed3bcd69176bbf7f2640e:

  drm/i915/display: Increase Fast Wake Sync length as a quirk (2024-09-03 10:22:39 +0300)

----------------------------------------------------------------
- drm/i915: Do not attempt to load the GSC multiple times (Daniele Ceraolo Spurio)
- drm/i915: Fix readout degamma_lut mismatch on ilk/snb (Ville Syrjälä)
- drm/i915/fence: Mark debug_fence_init_onstack() with __maybe_unused (Andy Shevchenko)
- drm/i915/fence: Mark debug_fence_free() with __maybe_unused (Andy Shevchenko)
- drm/i915/display: Add mechanism to use sink model when applying quirk [display] (Jouni Högander)
- drm/i915/display: Increase Fast Wake Sync length as a quirk [display] (Jouni Högander)

----------------------------------------------------------------
Andy Shevchenko (2):
  drm/i915/fence: Mark debug_fence_init_onstack() with __maybe_unused
  drm/i915/fence: Mark debug_fence_free() with __maybe_unused

Daniele Ceraolo Spurio (1):
  drm/i915: Do not attempt to load the GSC multiple times

Jouni Högander (2):
  drm/i915/display: Add mechanism to use sink model when applying quirk
  drm/i915/display: Increase Fast Wake Sync length as a quirk

Ville Syrjälä (1):
  drm/i915: Fix readout degamma_lut mismatch on ilk/snb

 drivers/gpu/drm/i915/display/intel_alpm.c          |  2 +-
 drivers/gpu/drm/i915/display/intel_display_types.h |  4 ++
 drivers/gpu/drm/i915/display/intel_dp.c            |  4 ++
 drivers/gpu/drm/i915/display/intel_dp_aux.c        | 16 +++--
 drivers/gpu/drm/i915/display/intel_dp_aux.h        |  2 +-
 drivers/gpu/drm/i915/display/intel_modeset_setup.c | 31 --
 drivers/gpu/drm/i915/display/intel_quirks.c        | 68 ++
 drivers/gpu/drm/i915/display/intel_quirks.h        |  6 ++
 drivers/gpu/drm/i915/gt/uc/intel_gsc_uc.c          |  2 +-
 drivers/gpu/drm/i915/gt/uc/intel_uc_fw.h           |  5 ++
 drivers/gpu/drm/i915/i915_sw_fence.c               |  8 +--
 11 files changed, 131 insertions(+), 17 deletions(-)


Re: [PATCH] drm/i915/gt: Continue creating engine sysfs files even after a failure

2024-09-04 Thread Tvrtko Ursulin



On 04/09/2024 15:34, Jani Nikula wrote:

On Wed, 04 Sep 2024, Andi Shyti wrote:

Hi Sima,

On Tue, Aug 27, 2024 at 07:05:05PM +0200, Daniel Vetter wrote:

On Mon, Aug 19, 2024 at 01:31:40PM +0200, Andi Shyti wrote:

The i915 driver generates sysfs entries for each engine of the
GPU in /sys/class/drm/cardX/engines/.

The process is straightforward: we loop over the UABI engines and
for each one, we:

  - Create the object.
  - Create basic files.
  - If the engine supports timeslicing, create timeslice duration files.
  - If the engine supports preemption, create preemption-related files.
  - Create default value files.

Currently, if any of these steps fail, the process stops, and no
further sysfs files are created.

However, it's not necessary to stop the process on failure.
Instead, we can continue creating the remaining sysfs files for
the other engines. Even if some files fail to be created, the
list of engines can still be retrieved by querying i915.

Signed-off-by: Andi Shyti 


Uh, sysfs is uapi. Either we need it, and it _must_ be there, or it's not
needed, and we should delete those files probably.

This is different from debugfs, where failures are consistently ignored
because that's the conscious design choice Greg made and wants supported.
Because debugfs is optional.

So please make sure we correctly fail driver load if these don't register.
Even better would be if sysfs files were registered atomically as attribute
blocks, but that's an entirely different can of worms. But that would really
clean up this code and essentially put any failure handling onto core
driver model and sysfs code.


This comment came after I merged the patch. So far, we have been
keeping the driver going even if sysfs fails to create, with the
idea of "if there is something wrong let it go as far as it can
and fail on its own".

This change is just setting the behavior to what the rest of the
interfaces are doing, so that either we change them all to fail
the driver's probe or we have them behaving consistently as they
are.

Tvrtko, Chris, Rodrigo any opinion from your side? Shall we bail
out as Sima is suggesting?


Are there any causes for sysfs creation errors that would be acceptable
to ignore? I didn't see any examples. Or is this just speculative?


I think it is speculative and that the reason for "carry on on failure"
was probably simply that there aren't any real-world reasons why any of
these would ever fail. Either a programming error or kernel out of memory on
driver load, and neither of those sounds interesting.


I suspect historically it was probably deemed simpler not to bother with 
any unwind or such, and that is the only reason i915_setup_sysfs() 
returns void.


In this context I don't see a big ROI in making someone work on
implementing a driver load abort here, but I also don't think it would do harm.


IMO it would be fine to tie the decision to the fate of dynamic CCS
engines. If that goes in, then it definitely makes sense to
propagate all errors to the entity doing the sysfs write.
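
To make the trade-off a bit more concrete, a fail-fast setup that returns an error and unwinds could look roughly like the userspace sketch below. Everything in it (struct engine_sysfs, engines_sysfs_setup(), the create hooks) is hypothetical and only stands in for the kobject and file handling; it is not the i915 code:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-engine state; stands in for a kobject and its files. */
struct engine_sysfs {
	int registered;
};

/* Hypothetical creation hook: returns 0 or a negative errno value. */
typedef int (*create_fn)(struct engine_sysfs *e, size_t idx);

static void engine_sysfs_remove(struct engine_sysfs *e)
{
	e->registered = 0;
}

/*
 * Fail-fast variant: stop at the first error, undo what was already
 * registered, and hand the error back to the caller.
 */
static int engines_sysfs_setup(struct engine_sysfs *engines, size_t count,
			       create_fn create)
{
	size_t i;
	int err = 0;

	for (i = 0; i < count; i++) {
		err = create(&engines[i], i);
		if (err)
			goto unwind;
		engines[i].registered = 1;
	}
	return 0;

unwind:
	while (i--)
		engine_sysfs_remove(&engines[i]);
	return err;
}

/* Demo hooks for the usage example. */
static int create_ok(struct engine_sysfs *e, size_t idx)
{
	(void)e;
	(void)idx;
	return 0;
}

static int create_fails_on_third(struct engine_sysfs *e, size_t idx)
{
	(void)e;
	return idx == 2 ? -ENOMEM : 0;
}
```

With create_fails_on_third() the setup returns -ENOMEM and leaves no engine registered, which is the "all or nothing" behaviour a probe abort would need.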


Regards,

Tvrtko


IMO fail fast and loud. We get enough bug reports where there's some big
backtrace splash copy-pasted on the bug, but the root cause happened
much earlier and was ignored.

BR,
Jani.




Re: [PATCH] drm/i915/guc: prevent a possible int overflow in wq offsets

2024-09-04 Thread Tvrtko Ursulin



On 26/08/2024 11:45, Nikita Zhandarovich wrote:

Hi,

On 7/25/24 08:59, Nikita Zhandarovich wrote:

It may be possible for the sum of the values derived from
i915_ggtt_offset() and __get_parent_scratch_offset()/
__get_wq_offset() to go over the u32 limit before being assigned
to wq offsets of u64 type.

Mitigate these issues by expanding one of the right operands
to u64 to avoid any overflow issues just in case.

Found by Linux Verification Center (linuxtesting.org) with static
analysis tool SVACE.

Fixes: 2584b3549f4c ("drm/i915/guc: Update to GuC version 70.1.1")
Cc: sta...@vger.kernel.org
Signed-off-by: Nikita Zhandarovich 
---
  drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 9400d0eb682b..908ebfa22933 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -2842,9 +2842,9 @@ static void prepare_context_registration_info_v70(struct intel_context *ce,
 		ce->parallel.guc.wqi_tail = 0;
 		ce->parallel.guc.wqi_head = 0;
 
-		wq_desc_offset = i915_ggtt_offset(ce->state) +
+		wq_desc_offset = (u64)i915_ggtt_offset(ce->state) +
 				 __get_parent_scratch_offset(ce);
-		wq_base_offset = i915_ggtt_offset(ce->state) +
+		wq_base_offset = (u64)i915_ggtt_offset(ce->state) +
 				 __get_wq_offset(ce);
 		info->wq_desc_lo = lower_32_bits(wq_desc_offset);
 		info->wq_desc_hi = upper_32_bits(wq_desc_offset);


Gentle ping,


With the current hardware this cannot overflow but I guess it doesn't 
harm to be explicitly safe. Adding some GuC folks to either r-b or add 
more candidates for review.
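
For anyone wondering what the static analyzer is worried about, here is a small userspace sketch of the pattern. The function names are made up for the example, and the values in the note below are chosen to force the wrap; they are not offsets that real hardware produces:

```c
#include <stdint.h>

/*
 * Both operands are 32-bit, so the addition is done in 32 bits and wraps
 * before the result is widened for the 64-bit return value.
 */
static uint64_t offset_sum_32(uint32_t base, uint32_t delta)
{
	return base + delta;
}

/* Widening one operand first makes the addition itself 64-bit. */
static uint64_t offset_sum_64(uint32_t base, uint32_t delta)
{
	return (uint64_t)base + delta;
}
```

With base 0xfffff000 and delta 0x2000 the first variant yields 0x1000 while the second yields 0x100001000; for sums that fit in 32 bits the two agree, which is why the change is harmless on current hardware.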


Regards,

Tvrtko



[PULL] drm-intel-fixes

2024-08-08 Thread Tvrtko Ursulin


Hi Dave, Sima,

A small bunch of fixes for the weekly cycle:

Fixes for Meteorlake dual PPS handling, vma offset calculation and tidying
for partial mappings, and unbreaking of eviction handling on DG2 small BAR
systems.

Regards,

Tvrtko

drm-intel-fixes-2024-08-08:
- correct dual pps handling for MTL_PCH+ [display] (Dnyaneshwar Bhadane)
- Adjust vma offset for framebuffer mmap offset [gem] (Andi Shyti)
- Fix Virtual Memory mapping boundaries calculation [gem] (Andi Shyti)
- Allow evicting to use the requested placement (David Gow)
- Attempt to get pages without eviction first (David Gow)

The following changes since commit de9c2c66ad8e787abec7c9d7eff4f8c3cdd28aed:

  Linux 6.11-rc2 (2024-08-04 13:50:53 -0700)

are available in the Git repository at:

  https://gitlab.freedesktop.org/drm/i915/kernel.git tags/drm-intel-fixes-2024-08-08

for you to fetch changes up to 787db3bb6ed5cee56fc97fecdd61517d89763f0a:

  drm/i915: Attempt to get pages without eviction first (2024-08-07 11:02:38 +0300)

----------------------------------------------------------------
- correct dual pps handling for MTL_PCH+ [display] (Dnyaneshwar Bhadane)
- Adjust vma offset for framebuffer mmap offset [gem] (Andi Shyti)
- Fix Virtual Memory mapping boundaries calculation [gem] (Andi Shyti)
- Allow evicting to use the requested placement (David Gow)
- Attempt to get pages without eviction first (David Gow)

----------------------------------------------------------------
Andi Shyti (2):
  drm/i915/gem: Adjust vma offset for framebuffer mmap offset
  drm/i915/gem: Fix Virtual Memory mapping boundaries calculation

David Gow (2):
  drm/i915: Allow evicting to use the requested placement
  drm/i915: Attempt to get pages without eviction first

Dnyaneshwar Bhadane (1):
  drm/i915/display: correct dual pps handling for MTL_PCH+

 drivers/gpu/drm/i915/display/intel_backlight.c |  3 ++
 drivers/gpu/drm/i915/display/intel_pps.c       |  3 ++
 drivers/gpu/drm/i915/gem/i915_gem_mman.c       | 55 +++---
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c        | 13 +++---
 4 files changed, 62 insertions(+), 12 deletions(-)


Re: [PATCH 3/3] drm/i915: remove __i915_printk()

2024-08-07 Thread Tvrtko Ursulin



On 07/08/2024 12:40, Jani Nikula wrote:

On Wed, 07 Aug 2024, Tvrtko Ursulin wrote:

On 06/08/2024 14:38, Jani Nikula wrote:

With the previous cleanups, the last remaining user of __i915_printk()
is i915_probe_error(). Switch that to use drm_dbg() and drm_err()
instead, dropping the request to report bugs in the few remaining
specific cases.


Aren't those few cases legitimate probe failures, including anything
unexpected which results in non-operational GPU (any -EIO from
intel_gt_init())?


They are, and they're still logged as such. Functionally, the only
change is removing the bug filing request.


So it is effectively completely(*) removing the request to file bugs, or
did I miss something that remained? Or is the unmentioned goal to encourage
fewer i915 bug reports on top of the code base cleanup?


I should've elaborated this better.

My question is, what makes these cases so special that they warrant
logging a bug filing request? First, I would assume the init paths are
most tested in CI and least likely to trigger a failure on end user
machines. Second, even if they did trigger for the end user, a
non-operational GPU is most likely to lead to a bug report even without
a request.


Yeah I tend to agree. Just wanted to probe a bit more on the motivation.

Error captures aside, other places which can fail and which we are 
discussing here are a bit too varied and I agree it is better to 
simplify, rather than pretend some are more important than the others.


Acked-by: Tvrtko Ursulin 

Regards,

Tvrtko


To me it just seems weird, and I opted to remove them, not least because
it's not common for drivers to do this at all. (And yes, I'd remove the
backlight one too.)

The other option is to embrace logging bug reporting requests. But for
that I'd rather add a separate function, call it at the relevant places,
and not hide it within this complex maze of multi-level debug logging
macros.


BR,
Jani.





Regards,

Tvrtko

*) Apart from display/intel_dp_aux_backlight.c !? :)



Re: Too large alloc in gem_exec_reloc test? (was Re: [linus:master] [mm/slab] 2e8000b826: WARNING:at_mm/util.c:#__kvmalloc_node_noprof)

2024-08-07 Thread Tvrtko Ursulin



Hi,

On 05/08/2024 19:48, Kees Cook wrote:

This seems like some kind of pre-existing issue in the igt test, reachable
via eb_copy_relocations(). The only warning in kvmalloc_node_noprof() is:

 /* Don't even allow crazy sizes */
 if (unlikely(size > INT_MAX)) {
 WARN_ON_ONCE(!(flags & __GFP_NOWARN));
 return NULL;
 }

So, something is too big in the test?


Yes, and I think it was reported before, _and_ I tried to fix it
(https://patchwork.freedesktop.org/patch/594928/?series=133871&rev=1). It
looks like it fell through the cracks. Now pushed, thank you for the reviews!
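
As a generic illustration of the kind of guard that keeps a user-controlled count away from that WARN (this is not the pushed patch, just a sketch), the size can be validated before the allocator ever sees it. struct reloc_entry and reloc_array_bytes() are hypothetical names and the layout is made up:

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical relocation record; the real layout is not reproduced here. */
struct reloc_entry {
	uint64_t target;
	uint64_t offset;
};

/*
 * Returns the byte size for `count` entries, or 0 if the request is
 * empty, overflows size_t, or exceeds the INT_MAX ceiling that the
 * allocator warns about.
 */
static size_t reloc_array_bytes(size_t count)
{
	size_t bytes;

	if (count == 0)
		return 0;
	if (__builtin_mul_overflow(count, sizeof(struct reloc_entry), &bytes))
		return 0;
	if (bytes > INT_MAX)
		return 0;

	return bytes;
}
```

A caller would treat a zero return as "reject the request" and return an error to userspace instead of attempting the allocation.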


Regards,

Tvrtko


-Kees

On Sun, Aug 04, 2024 at 04:56:40PM +0800, kernel test robot wrote:


Hi Kees Cook,

As we understand it, this commit is not the root cause of the WARNING. The
WARNING just changes form from (2) to (1) due to this commit.

67f2df3b82d091ed 2e8000b826fcd2716449d09753d
---------------- ---------------------------
       fail:runs  %reproduction    fail:runs
           |             |             |
           :6          100%           6:6     dmesg.WARNING:at_mm/util.c:#__kvmalloc_node_noprof  <--- (1)
          6:6          -67%            :6     dmesg.WARNING:at_mm/util.c:#kvmalloc_node_noprof    <--- (2)

However, we failed to bisect (2), so the report below is an FYI on what we
observed in our tests. We are not sure if it gives any hint about a real issue.



Hello,

kernel test robot noticed "WARNING:at_mm/util.c:#__kvmalloc_node_noprof" on:

commit: 2e8000b826fcd2716449d09753d5ed843067881e ("mm/slab: Introduce kvmalloc_buckets_node() that can take kmem_buckets argument")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

[test failed on linus/master  786c8248dbd33a5a7a07f7c6e55a7bfc68d2ca48]
[test failed on linux-next/master 9ec6ec93f2c1e6cd2911e2a4acd5ac85e13bb3e2]

in testcase: igt
version: igt-x86_64-73e21b2bb-1_20240623
with following parameters:

group: gem_exec_reloc



compiler: gcc-13
test machine: 8 threads 1 sockets Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz (Kaby Lake) with 32G memory

(please refer to attached dmesg/kmsg for entire log/backtrace)



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot 
| Closes: https://lore.kernel.org/oe-lkp/202408041614.dbe4b7fd-...@intel.com



The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240804/202408041614.dbe4b7fd-...@intel.com


[  928.741334][ T5136] ------------[ cut here ]------------
[  928.747005][ T5136] WARNING: CPU: 2 PID: 5136 at mm/util.c:650 __kvmalloc_node_noprof+0x142/0x190
[  928.755967][ T5136] Modules linked in: btrfs blake2b_generic xor zstd_compress raid6_pq libcrc32c intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal ipmi_devintf ipmi_msghandler sd_mod intel_powerclamp t10_pi coretemp crc64_rocksoft_generic crc64_rocksoft crc64 kvm_intel sg i915 kvm crct10dif_pclmul crc32_pclmul crc32c_intel drm_buddy ghash_clmulni_intel intel_gtt sha512_ssse3 drm_display_helper mei_wdt ttm rapl drm_kms_helper ahci wmi_bmof libahci mei_me video intel_cstate intel_uncore idma64 libata mei i2c_designware_platform i2c_i801 i2c_designware_core i2c_smbus pinctrl_sunrisepoint wmi acpi_pad binfmt_misc loop drm fuse dm_mod ip_tables
[  928.812981][ T5136] CPU: 2 PID: 5136 Comm: gem_exec_reloc Tainted: G S 6.10.0-rc1-9-g2e8000b826fc #1
[  928.823924][ T5136] Hardware name: Dell Inc. OptiPlex 7050/062KRH, BIOS 1.2.0 12/22/2016
[  928.832080][ T5136] RIP: 0010:__kvmalloc_node_noprof+0x142/0x190
[  928.838186][ T5136] Code: c4 06 0e 00 48 83 c4 18 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e c3 cc cc cc cc 49 be 00 00 00 00 00 20 00 00 eb 9f 80 e7 20 75 de <0f> 0b eb da 48 c7 c7 f0 fe cf 84 e8 5e 2c 19 00 e9 3f ff ff ff 48
[  928.857727][ T5136] RSP: 0018:c9000e82f6f8 EFLAGS: 00010246
[  928.863744][ T5136] RAX:  RBX: 00c0 RCX: 0013
[  928.871647][ T5136] RDX: 0007 RSI: 81a13806 RDI:
[  928.879565][ T5136] RBP: 8000 R08: 0001 R09:
[  928.887466][ T5136] R10: c9000e82f6f8 R11:  R12:
[  928.895375][ T5136] R13:  R14: 0400 R15: c9000e82f9b0
[  928.903288][ T5136] FS:  7f0ff830d8c0() GS:88879db0() knlGS:
[  928.912151][ T5136] CS:  0010 DS:  ES:  CR0: 80050033
[  928.918679][ T5136] CR2: 7f0ff810 CR3: 0008162e0004 CR4: 003706f0
[  928.926595][ T5136] DR0:  DR1:  DR2:
[  928.934489][ T5136] DR3:  DR6: fffe0ff0 DR7: 0400
[  928.942382][ T5136] Call Trace:
[  928.945631][ T5136]  <TASK>
[  928.948499][ T5136]  ? __warn+0xcc/0x260
[  928.952503][ T5136]  ? __kvma

Re: [PATCH 3/3] drm/i915: remove __i915_printk()

2024-08-07 Thread Tvrtko Ursulin



On 06/08/2024 14:38, Jani Nikula wrote:

With the previous cleanups, the last remaining user of __i915_printk()
is i915_probe_error(). Switch that to use drm_dbg() and drm_err()
instead, dropping the request to report bugs in the few remaining
specific cases.


Aren't those few cases legitimate probe failures, including anything 
unexpected which results in non-operational GPU (any -EIO from 
intel_gt_init())?


So it is effectively completely(*) removing the request to file bugs, or
did I miss something that remained? Or is the unmentioned goal to encourage
fewer i915 bug reports on top of the code base cleanup?


Regards,

Tvrtko

*) Apart from display/intel_dp_aux_backlight.c !? :)


Signed-off-by: Jani Nikula 
---
  drivers/gpu/drm/i915/i915_utils.c | 41 ---
  drivers/gpu/drm/i915/i915_utils.h | 13 +-
  2 files changed, 6 insertions(+), 48 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_utils.c b/drivers/gpu/drm/i915/i915_utils.c
index bee3f0fd..b34a2d3d331d 100644
--- a/drivers/gpu/drm/i915/i915_utils.c
+++ b/drivers/gpu/drm/i915/i915_utils.c
@@ -11,47 +11,6 @@
 #include "i915_reg.h"
 #include "i915_utils.h"
 
-#define FDO_BUG_MSG "Please file a bug on drm/i915; see " FDO_BUG_URL " for details."
-
-void
-__i915_printk(struct drm_i915_private *dev_priv, const char *level,
-	      const char *fmt, ...)
-{
-	static bool shown_bug_once;
-	struct device *kdev = dev_priv->drm.dev;
-	bool is_error = level[1] <= KERN_ERR[1];
-	bool is_debug = level[1] == KERN_DEBUG[1];
-	struct va_format vaf;
-	va_list args;
-
-	if (is_debug && !drm_debug_enabled(DRM_UT_DRIVER))
-		return;
-
-	va_start(args, fmt);
-
-	vaf.fmt = fmt;
-	vaf.va = &args;
-
-	if (is_error)
-		dev_printk(level, kdev, "%pV", &vaf);
-	else
-		dev_printk(level, kdev, "[" DRM_NAME ":%ps] %pV",
-			   __builtin_return_address(0), &vaf);
-
-	va_end(args);
-
-	if (is_error && !shown_bug_once) {
-		/*
-		 * Ask the user to file a bug report for the error, except
-		 * if they may have caused the bug by fiddling with unsafe
-		 * module parameters.
-		 */
-		if (!test_taint(TAINT_USER))
-			dev_notice(kdev, "%s", FDO_BUG_MSG);
-		shown_bug_once = true;
-	}
-}
-
 void add_taint_for_CI(struct drm_i915_private *i915, unsigned int taint)
 {
 	drm_notice(&i915->drm, "CI tainted: %#x by %pS\n",
diff --git a/drivers/gpu/drm/i915/i915_utils.h b/drivers/gpu/drm/i915/i915_utils.h
index feb078ae246f..71bdc89bd621 100644
--- a/drivers/gpu/drm/i915/i915_utils.h
+++ b/drivers/gpu/drm/i915/i915_utils.h
@@ -45,10 +45,6 @@ struct timer_list;
 #define MISSING_CASE(x) WARN(1, "Missing case (%s == %ld)\n", \
 			     __stringify(x), (long)(x))
 
-void __printf(3, 4)
-__i915_printk(struct drm_i915_private *dev_priv, const char *level,
-	      const char *fmt, ...);
-
 #if IS_ENABLED(CONFIG_DRM_I915_DEBUG)
 
 int __i915_inject_probe_error(struct drm_i915_private *i915, int err,
@@ -66,9 +62,12 @@ bool i915_error_injected(void);
 
 #define i915_inject_probe_failure(i915) i915_inject_probe_error((i915), -ENODEV)
 
-#define i915_probe_error(i915, fmt, ...)				   \
-	__i915_printk(i915, i915_error_injected() ? KERN_DEBUG : KERN_ERR, \
-		      fmt, ##__VA_ARGS__)
+#define i915_probe_error(i915, fmt, ...) ({ \
+	if (i915_error_injected()) \
+		drm_dbg(&(i915)->drm, fmt, ##__VA_ARGS__); \
+	else \
+		drm_err(&(i915)->drm, fmt, ##__VA_ARGS__); \
+})
 
 #define range_overflows(start, size, max) ({ \
 	typeof(start) start__ = (start); \
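
As a rough userspace sketch of what the new i915_probe_error() does (pick the log level at the call site depending on whether the failure was injected), something like the following behaves the same way. The names probe_error(), log_dbg(), log_err() and the counters are invented for the example, and fprintf() stands in for drm_dbg()/drm_err():

```c
#include <stdbool.h>
#include <stdio.h>

static bool error_injected;	/* stand-in for i915_error_injected() */
static int dbg_count, err_count;	/* only here so the effect is observable */

/* Stand-ins for drm_dbg() and drm_err(); ##__VA_ARGS__ is a GNU extension. */
#define log_dbg(fmt, ...) \
	(dbg_count++, fprintf(stderr, "dbg: " fmt, ##__VA_ARGS__))
#define log_err(fmt, ...) \
	(err_count++, fprintf(stderr, "err: " fmt, ##__VA_ARGS__))

/* Injected failures are expected, so they are only logged at debug level. */
#define probe_error(fmt, ...) do {			\
	if (error_injected)				\
		log_dbg(fmt, ##__VA_ARGS__);		\
	else						\
		log_err(fmt, ##__VA_ARGS__);		\
} while (0)
```

Calling probe_error("probe failed: %d\n", -5) with error_injected clear goes through the error path; with it set, the same call is demoted to debug.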


Re: [PATCH 2/3] drm/i915: remove i915_report_error()

2024-08-07 Thread Tvrtko Ursulin



On 06/08/2024 14:38, Jani Nikula wrote:

i915_report_error() has only two users, both in driver probe. I doubt
these cases are worth having a dedicated wrapper to also print bug
reporting info. Just switch them to regular drm_err() and remove the
wrapper.

Signed-off-by: Jani Nikula 
---
  drivers/gpu/drm/i915/i915_driver.c | 8 
  drivers/gpu/drm/i915/i915_utils.h  | 3 ---
  2 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
index fb8e9c2fcea5..94dca1d8bb15 100644
--- a/drivers/gpu/drm/i915/i915_driver.c
+++ b/drivers/gpu/drm/i915/i915_driver.c
@@ -451,8 +451,8 @@ static int i915_driver_hw_probe(struct drm_i915_private *dev_priv)
 	if (HAS_PPGTT(dev_priv)) {
 		if (intel_vgpu_active(dev_priv) &&
 		    !intel_vgpu_has_full_ppgtt(dev_priv)) {
-			i915_report_error(dev_priv,
-					  "incompatible vGPU found, support for isolated ppGTT required\n");
+			drm_err(&dev_priv->drm,
+				"incompatible vGPU found, support for isolated ppGTT required\n");
 			return -ENXIO;
 		}
 	}
@@ -465,8 +465,8 @@ static int i915_driver_hw_probe(struct drm_i915_private *dev_priv)
 		 */
 		if (intel_vgpu_active(dev_priv) &&
 		    !intel_vgpu_has_hwsp_emulation(dev_priv)) {
-			i915_report_error(dev_priv,
-					  "old vGPU host found, support for HWSP emulation required\n");
+			drm_err(&dev_priv->drm,
+				"old vGPU host found, support for HWSP emulation required\n");
 			return -ENXIO;
 		}
 	}
diff --git a/drivers/gpu/drm/i915/i915_utils.h b/drivers/gpu/drm/i915/i915_utils.h
index 06ec6ceb61d5..feb078ae246f 100644
--- a/drivers/gpu/drm/i915/i915_utils.h
+++ b/drivers/gpu/drm/i915/i915_utils.h
@@ -49,9 +49,6 @@ void __printf(3, 4)
 __i915_printk(struct drm_i915_private *dev_priv, const char *level,
 	      const char *fmt, ...);
 
-#define i915_report_error(dev_priv, fmt, ...)				   \
-	__i915_printk(dev_priv, KERN_ERR, fmt, ##__VA_ARGS__)
-
 #if IS_ENABLED(CONFIG_DRM_I915_DEBUG)
 
 int __i915_inject_probe_error(struct drm_i915_private *i915, int err,


Reviewed-by: Tvrtko Ursulin 

Regards,

Tvrtko


Re: [PATCH 1/3] drm/i915: remove a few __i915_printk() uses

2024-08-07 Thread Tvrtko Ursulin



On 06/08/2024 14:38, Jani Nikula wrote:

__i915_printk() does nothing for notice/info levels. Just use the
regular drm_notice() and drm_info() calls.


"does nothing"? You mean does nothing _special_?

The patch itself looks okay.
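
To spell out the "nothing special" part: __i915_printk() classifies the level by comparing the digit that follows the SOH byte in the KERN_* strings. A userspace sketch of that classification, with the KERN_* macros redefined locally to mirror the kernel's encoding and with helper names invented for the example:

```c
#include <stdbool.h>

/* Same encoding the kernel uses: an SOH byte followed by a level digit. */
#define KERN_SOH	"\001"
#define KERN_ERR	KERN_SOH "3"
#define KERN_NOTICE	KERN_SOH "5"
#define KERN_INFO	KERN_SOH "6"
#define KERN_DEBUG	KERN_SOH "7"

/* Error or more severe: this is the path that adds the bug-filing request. */
static bool level_is_error(const char *level)
{
	return level[1] <= KERN_ERR[1];
}

/* Debug: this is the path gated on drm_debug_enabled(). */
static bool level_is_debug(const char *level)
{
	return level[1] == KERN_DEBUG[1];
}
```

For KERN_NOTICE and KERN_INFO both helpers return false, so neither the debug gating nor the bug-filing message applies and the call boils down to a plain dev_printk() with the "[drm:caller]" prefix.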

Regards,

Tvrtko


Signed-off-by: Jani Nikula 
---
  drivers/gpu/drm/i915/i915_utils.c | 10 +-
  1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_utils.c b/drivers/gpu/drm/i915/i915_utils.c
index 6f9e7b354b54..bee3f0fd 100644
--- a/drivers/gpu/drm/i915/i915_utils.c
+++ b/drivers/gpu/drm/i915/i915_utils.c
@@ -54,8 +54,8 @@ __i915_printk(struct drm_i915_private *dev_priv, const char *level,
 
 void add_taint_for_CI(struct drm_i915_private *i915, unsigned int taint)
 {
-	__i915_printk(i915, KERN_NOTICE, "CI tainted:%#x by %pS\n",
-		      taint, (void *)_RET_IP_);
+	drm_notice(&i915->drm, "CI tainted: %#x by %pS\n",
+		   taint, __builtin_return_address(0));
 
 	/* Failures that occur during fault injection testing are expected */
 	if (!i915_error_injected())
@@ -74,9 +74,9 @@ int __i915_inject_probe_error(struct drm_i915_private *i915, int err,
 	if (++i915_probe_fail_count < i915_modparams.inject_probe_failure)
 		return 0;
 
-	__i915_printk(i915, KERN_INFO,
-		      "Injecting failure %d at checkpoint %u [%s:%d]\n",
-		      err, i915_modparams.inject_probe_failure, func, line);
+	drm_info(&i915->drm, "Injecting failure %d at checkpoint %u [%s:%d]\n",
+		 err, i915_modparams.inject_probe_failure, func, line);
+
 	i915_modparams.inject_probe_failure = 0;
 	return err;
 }


Re: [PATCH] drm/i915: Allow NULL memory region

2024-07-29 Thread Tvrtko Ursulin



On 29/07/2024 15:59, Cavitt, Jonathan wrote:

-----Original Message-----
From: Tvrtko Ursulin
Sent: Monday, July 29, 2024 1:21 AM
To: Dan Carpenter
Cc: Cavitt, Jonathan; intel-gfx@lists.freedesktop.org; Gupta, saurabhg; chris.p.wil...@linux.intel.com
Subject: Re: [PATCH] drm/i915: Allow NULL memory region



On 26/07/2024 18:00, Dan Carpenter wrote:

On Fri, Jul 26, 2024 at 09:17:20AM +0100, Tvrtko Ursulin wrote:


On 25/07/2024 16:58, Dan Carpenter wrote:

On Thu, Jul 25, 2024 at 08:48:35AM +0100, Tvrtko Ursulin wrote:


Hi,

On 12/07/2024 22:41, Jonathan Cavitt wrote:

Prevent a NULL pointer access in intel_memory_regions_hw_probe.


For future reference please include some impact assessment in patches tagged
as fixes. It makes the maintainers' job, and that of anyone who tries to
backport stuff to stable at some future date, much easier if it is known how
important the fix is and in what circumstances the problem it is fixing can
trigger.



As someone doing backport work, I think this patch is fine.  Everyone
knows the impact of a NULL dereference in probe().

I guess with patches that add NULL checks, the trick is
understanding when people are adding NULL checks to make a static
checker happy or when it's a real bug.  But the fault lies with the
people adding NULL checks just to make the tools happy.  Some of these
pointless NULL checks end up in stable, but it's fine, extra NULL checks
never hurt anyone.  If the maintainer wants to be extra safe by adding
NULL checks then who are we to say otherwise.

In other words normal patches shouldn't have to say. "I'm not lying" at
the end.  It should be the pointless patches which say, "I'm doing a
pointless thing.  Don't bother backporting."

Most stable patch backports are done automatically and people have
various tools and scripts to do that.  If the tools don't handle this
patch automatically then they are defective.


Right, and every few releases maintainers and authors get a bunch of emails
for patches which did not apply to some stable tree.



I believe these emails are only sent for commits that are tagged for
stable.  For AUTOSEL patches, the backporting is done on a best effort
basis.  On the other hand, hopefully this patch would have been tagged
for stable if we hadn't fixed the bug so quickly.


In which case someone has to do manual work and then it is good to know how
important it is to backport something. For cases when it is not trivial. It
does not apply to this patch, but as a _best practice_ it is good if the
commit message explains the impacted platforms and scenarios.

In this case I can follow the Fixes: tag and see that the fix which this patch
fixes is only about ATS-M. If it was a more complicated patch, that might be
a reason not to bother backporting past some kernel version where
platform X wasn't even supported.

Therefore I think my point stands: best practice is to include this in the
commit text, so any future maintainer/backporter does not have to re-do
the detective work.


This is a really elaborate hypothetical.  Are there kernels which are
affected by this bug but don't support ATS-M?


I am not sure why we are arguing against the value of putting a bit more
info in commit messages.

When I was writing up the drm-intel-next-fixes pull request I already
had to follow the Fixes: chain for this one to understand the impact.

This patch is already in and all, but from my point of view best practice
still is for commit messages to be a bit more verbose than "fix null
pointer deref". At least when fixes are coming from inside Intel I think
we can assume people have enough info to assess and document.


For future reference, what kind of additional information would you have
preferred to see added to this patch that was not originally provided, and in
what location should that information have been added (as a part of the
commit message itself, after the Fixes tag, etc.)?


In this particular case something as simple as below would have made my 
job a little bit easier:


drm/i915: Allow NULL memory region

Prevent a NULL pointer access in intel_memory_regions_hw_probe which
can happen on some ATS-M machines with specific BIOS configurations.

(And it may be wrong what I added but hey-ho, that's kind of the point 
of getting the information direct from the source instead of having to 
figure it out when writing pull requests.)


Regards,

Tvrtko


Re: [PATCH] drm/i915: Allow NULL memory region

2024-07-29 Thread Tvrtko Ursulin



On 26/07/2024 18:00, Dan Carpenter wrote:

On Fri, Jul 26, 2024 at 09:17:20AM +0100, Tvrtko Ursulin wrote:


On 25/07/2024 16:58, Dan Carpenter wrote:

On Thu, Jul 25, 2024 at 08:48:35AM +0100, Tvrtko Ursulin wrote:


Hi,

On 12/07/2024 22:41, Jonathan Cavitt wrote:

Prevent a NULL pointer access in intel_memory_regions_hw_probe.


For future reference please include some impact assessment in patches tagged
as fixes. It makes the maintainers' job, and that of anyone who tries to
backport stuff to stable at some future date, much easier if it is known how
important the fix is and in what circumstances the problem it is fixing can
trigger.



As someone doing backport work, I think this patch is fine.  Everyone
knows the impact of a NULL dereference in probe().

I guess with patches that add NULL dereferences, the trick is
understanding when people are adding NULL checks to make a static
checker happy or when it's a real bug.  But the fault lies with the
people adding NULL checks just to make the tools happy.  Some of these
pointless NULL checks end up in stable, but it's fine, extra NULL checks
never hurt anyone.  If the maintainer wants to be extra safe by adding
NULL checks then who are we to say otherwise.

In other words normal patches shouldn't have to say "I'm not lying" at
the end.  It should be the pointless patches which say, "I'm doing a
pointless thing.  Don't bother backporting."

Most stable patch backports are done automatically and people have
various tools and scripts to do that.  If the tools don't handle this
patch automatically then they are defective.


Right, and every few releases maintainers and authors get a bunch of emails
for patches which did not apply to some stable tree.



I believe these emails are only sent for commits that are tagged for
stable.  For AUTOSEL patches, the backporting is done on a best effort
basis.  On the other hand, hopefully this patch would have been tagged
for stable if we hadn't fixed the bug so quickly.


In which case someone has to do manual work, and then, for cases when it is
not trivial, it is good to know how important it is to backport something. It
does not apply to this patch, but as a _best practice_ it is good if the
commit message explains the impacted platforms and scenarios.

In this case I can follow the Fixes: tag and see that the fix which this patch
fixes is only about ATS-M. If it was a more complicated patch, that might be
a reason not to bother backporting past some kernel version where
platform X wasn't even supported.

Therefore I think my point stands: best practice is to include this in the
commit text, so any future maintainer/backporter does not have to re-do the
detective work.


This is a really elaborate hypothetical.  Are there kernels which are
affected by this bug but don't support ATS-M?


I am not sure why we are arguing against the value of putting a bit more
info in commit messages.


When I was writing up the drm-intel-next-fixes pull request I already 
had to follow the Fixes: chain for this one to understand the impact.


This patch is already in and all, but from my point of view best practice
still is for commit messages to be a bit more verbose than "fix null
pointer deref". At least when fixes are coming from inside Intel I think
we can assume people have enough info to assess and document.


Regards,

Tvrtko


Re: [PATCH] drm/i915: Allow NULL memory region

2024-07-26 Thread Tvrtko Ursulin



On 25/07/2024 16:58, Dan Carpenter wrote:

On Thu, Jul 25, 2024 at 08:48:35AM +0100, Tvrtko Ursulin wrote:


Hi,

On 12/07/2024 22:41, Jonathan Cavitt wrote:

Prevent a NULL pointer access in intel_memory_regions_hw_probe.


For future reference please include some impact assessment in patches tagged
as fixes. It makes the maintainers' job, and that of anyone who tries to
backport stuff to stable at some future date, much easier if it is known how
important the fix is and in what circumstances the problem it is fixing can
trigger.



As someone doing backport work, I think this patch is fine.  Everyone
knows the impact of a NULL dereference in probe().

I guess with patches that add NULL dereferences, the trick is
understanding when people are adding NULL checks to make a static
checker happy or when it's a real bug.  But the fault lies with the
people adding NULL checks just to make the tools happy.  Some of these
pointless NULL checks end up in stable, but it's fine, extra NULL checks
never hurt anyone.  If the maintainer wants to be extra safe by adding
NULL checks then who are we to say otherwise.

In other words normal patches shouldn't have to say "I'm not lying" at
the end.  It should be the pointless patches which say, "I'm doing a
pointless thing.  Don't bother backporting."

Most stable patch backports are done automatically and people have
various tools and scripts to do that.  If the tools don't handle this
patch automatically then they are defective.


Right, and every few releases maintainers and authors get a bunch of 
emails for patches which did not apply to some stable tree.


In which case someone has to do manual work, and then, for cases when it is
not trivial, it is good to know how important it is to backport something.
It does not apply to this patch, but as a _best practice_ it is good if the
commit message explains the impacted platforms and scenarios.


In this case I can follow the Fixes: tag and see that the fix which this
patch fixes is only about ATS-M. If it was a more complicated patch, that
might be a reason not to bother backporting past some kernel version where
platform X wasn't even supported.


Therefore I think my point stands: best practice is to include this in the
commit text, so any future maintainer/backporter does not have to re-do the
detective work.


Regards,

Tvrtko


Re: [PATCH] drm/i915: Allow NULL memory region

2024-07-25 Thread Tvrtko Ursulin



Hi,

On 12/07/2024 22:41, Jonathan Cavitt wrote:

Prevent a NULL pointer access in intel_memory_regions_hw_probe.


For future reference please include some impact assessment in patches
tagged as fixes. It makes the maintainers' job, and that of anyone who
tries to backport stuff to stable at some future date, much easier if it is
known how important the fix is and in what circumstances the problem it is
fixing can trigger.


Regards,

Tvrtko


Fixes: 05da7d9f717b ("drm/i915/gem: Downgrade stolen lmem setup warning")
Reported-by: Dan Carpenter 
Signed-off-by: Jonathan Cavitt 
---
  drivers/gpu/drm/i915/intel_memory_region.c | 6 --
  1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_memory_region.c b/drivers/gpu/drm/i915/intel_memory_region.c
index 172dfa7c3588b..d40ee1b42110a 100644
--- a/drivers/gpu/drm/i915/intel_memory_region.c
+++ b/drivers/gpu/drm/i915/intel_memory_region.c
@@ -368,8 +368,10 @@ int intel_memory_regions_hw_probe(struct drm_i915_private *i915)
  			goto out_cleanup;
  		}
  
-		mem->id = i;
-		i915->mm.regions[i] = mem;
+		if (mem) { /* Skip on non-fatal errors */
+			mem->id = i;
+			i915->mm.regions[i] = mem;
+		}
  	}
  
  	for (i = 0; i < ARRAY_SIZE(i915->mm.regions); i++) {
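
A minimal sketch of the pattern the patch above applies, assuming a probe
loop where a NULL result means a non-fatal failure for that slot. The struct
names, MAX_REGIONS and both helpers are hypothetical stand-ins, not the i915
definitions.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_REGIONS 4

/* Hypothetical stand-ins for the driver structures; not the i915 types. */
struct mem_region {
	int id;
};

struct device_state {
	struct mem_region *regions[MAX_REGIONS];
};

/*
 * Store the result of probing slot 'i'. A NULL 'mem' models a non-fatal
 * probe failure: the slot is left empty rather than dereferenced.
 * Returns 1 if the region was stored, 0 if it was skipped.
 */
static int store_region(struct device_state *dev, int i, struct mem_region *mem)
{
	assert(i >= 0 && i < MAX_REGIONS);

	if (!mem)
		return 0;

	mem->id = i;
	dev->regions[i] = mem;
	return 1;
}

/* Count populated slots; skips the entries that were left NULL. */
static int count_regions(const struct device_state *dev)
{
	int i, n = 0;

	for (i = 0; i < MAX_REGIONS; i++)
		if (dev->regions[i])
			n++;
	return n;
}
```

Any later loop over the array has to tolerate the empty slots as well, which
is what count_regions() illustrates.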


[PULL] drm-intel-next-fixes

2024-07-25 Thread Tvrtko Ursulin


Hi Dave, Sima,

Two fixes for the merge window: turning off preemption on Gen8, since it
apparently just doesn't work reliably enough, and a fix for a potential NULL
pointer dereference when stolen memory probing fails.

Regards,

Tvrtko

drm-intel-next-fixes-2024-07-25:
- Do not consider preemption during execlists_dequeue for gen8 [gt] (Nitin Gote)
- Allow NULL memory region (Jonathan Cavitt)
The following changes since commit 509580fad7323b6a5da27e8365cd488f3b57210e:

  drm/i915/dp: Don't switch the LTTPR mode on an active link (2024-07-16 
08:14:29 +)

are available in the Git repository at:

  https://gitlab.freedesktop.org/drm/i915/kernel.git 
tags/drm-intel-next-fixes-2024-07-25

for you to fetch changes up to 26720dd2b5a1d088bff8f7e6355fca021c83718f:

  drm/i915: Allow NULL memory region (2024-07-23 09:34:13 +)


- Do not consider preemption during execlists_dequeue for gen8 [gt] (Nitin Gote)
- Allow NULL memory region (Jonathan Cavitt)


Jonathan Cavitt (1):
  drm/i915: Allow NULL memory region

Nitin Gote (1):
  drm/i915/gt: Do not consider preemption during execlists_dequeue for gen8

 drivers/gpu/drm/i915/gt/intel_execlists_submission.c | 6 +-
 drivers/gpu/drm/i915/intel_memory_region.c   | 6 --
 2 files changed, 5 insertions(+), 7 deletions(-)


Re: [PATCH 6/7] drm/i915/pmu: Lazy unregister

2024-07-24 Thread Tvrtko Ursulin



On 23/07/2024 16:30, Lucas De Marchi wrote:

On Tue, Jul 23, 2024 at 09:03:25AM GMT, Tvrtko Ursulin wrote:


On 22/07/2024 22:06, Lucas De Marchi wrote:

Instead of calling perf_pmu_unregister() when unbinding, defer that to
the destruction of i915 object. Since perf itself holds a reference in
the event, this only happens when all events are gone, which guarantees
i915 is not unregistering the pmu with live events.

Previously, running the following sequence would crash the system after
~2 tries:

1) bind device to i915
2) wait for events to show up in sysfs
3) start perf stat -I 1000 -e i915/rcs0-busy/
4) unbind driver
5) kill perf

Most of the time this crashes in perf_pmu_disable() while accessing the
percpu pmu_disable_count. This happens because perf_pmu_unregister()
destroys it with free_percpu(pmu->pmu_disable_count).

With a lazy unbind, the pmu is only unregistered after (5) as opposed to
after (4). The downside is that if a new bind operation is attempted for
the same device/driver without killing the perf process, i915 will fail
to register the pmu (but still load successfully). This seems better
than completely crashing the system.


So this effectively allows unbind to succeed without fully unbinding the
driver from the device? That sounds like a significant drawback and if
so, I wonder if a more complicated solution wouldn't be better after
all. Or is there precedent for allowing userspace to keep their paws
on unbound devices in this way?


keeping the resources alive but "unplugged" while the hardware
disappeared is a common thing to do... it's the whole point of the
drmm-managed resource for example. If you bind the driver and then
unbind it while userspace is holding a ref, next time you try to bind it
will come up with a different card number. A similar thing that could be
done is to adjust the name of the event - currently we add the mangled
pci slot.


Yes, but my point was about this from your commit message:

"""
The downside is that if a new bind operation is attempted for
the same device/driver without killing the perf process, i915 will fail
to register the pmu (but still load successfully).
"""

So the subsequent bind does not "come up with a different card number".
The statement is that it will come up with an error, if we look at the PMU
subset of functionality. I was wondering if there was precedent for that
kind of situation.


Mangling the PMU driver name probably also wouldn't be great.


That said, I agree a better approach would be to allow
perf_pmu_unregister() to do its job even when there are open events. On
top of that (or as a way to help achieve that), make perf core replace
the callbacks with stubs when pmu is unregistered - that would even kill
the need for i915's checks on pmu->closed (and fix the lack thereof in
other drivers).

It can be a can of worms though and may be pushed back by perf core
maintainers, so it'd be good to have their feedback.


Yeah definitely would be essential.

Regards,

Tvrtko


Signed-off-by: Lucas De Marchi 
---
 drivers/gpu/drm/i915/i915_pmu.c | 24 +---
 1 file changed, 9 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c 
b/drivers/gpu/drm/i915/i915_pmu.c

index 8708f905f4f4..df53a8fe53ec 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -1158,18 +1158,21 @@ static void free_pmu(struct drm_device *dev, 
void *res)

 struct i915_pmu *pmu = res;
 struct drm_i915_private *i915 = pmu_to_i915(pmu);
+    perf_pmu_unregister(&pmu->base);
 free_event_attributes(pmu);
 kfree(pmu->base.attr_groups);
 if (IS_DGFX(i915))
 kfree(pmu->name);
+
+    /*
+ * Make sure all currently running (but shortcut on pmu->closed) 
are
+ * gone before proceeding with free'ing the pmu object embedded 
in i915.

+ */
+    synchronize_rcu();
 }
 static int i915_pmu_cpu_online(unsigned int cpu, struct hlist_node 
*node)

 {
-    struct i915_pmu *pmu = hlist_entry_safe(node, typeof(*pmu), 
cpuhp.node);

-
-    GEM_BUG_ON(!pmu->base.event_init);
-
 /* Select the first online CPU as a designated reader. */
 if (cpumask_empty(&i915_pmu_cpumask))
 cpumask_set_cpu(cpu, &i915_pmu_cpumask);
@@ -1182,8 +1185,6 @@ static int i915_pmu_cpu_offline(unsigned int 
cpu, struct hlist_node *node)
 struct i915_pmu *pmu = hlist_entry_safe(node, typeof(*pmu), 
cpuhp.node);

 unsigned int target = i915_pmu_target_cpu;
-    GEM_BUG_ON(!pmu->base.event_init);
-
 /*
  * Unregistering an instance generates a CPU offline event which 
we must
  * ignore to avoid incorrectly modifying the shared 
i915_pmu_cpumask.
@@ -1337,21 +1338,14 @@ void i915_pmu_unregister(struct 
drm_i915_private *i915)

 {
 struct i915_pmu *pmu = &i915->pmu;
-    if (!pmu->base.event_init)
-    return;
-
 /*
- * 

Re: [PATCH 6/7] drm/i915/pmu: Lazy unregister

2024-07-23 Thread Tvrtko Ursulin



On 22/07/2024 22:06, Lucas De Marchi wrote:

Instead of calling perf_pmu_unregister() when unbinding, defer that to
the destruction of i915 object. Since perf itself holds a reference in
the event, this only happens when all events are gone, which guarantees
i915 is not unregistering the pmu with live events.

Previously, running the following sequence would crash the system after
~2 tries:

1) bind device to i915
2) wait for events to show up in sysfs
3) start perf stat -I 1000 -e i915/rcs0-busy/
4) unbind driver
5) kill perf

Most of the time this crashes in perf_pmu_disable() while accessing the
percpu pmu_disable_count. This happens because perf_pmu_unregister()
destroys it with free_percpu(pmu->pmu_disable_count).

With a lazy unbind, the pmu is only unregistered after (5) as opposed to
after (4). The downside is that if a new bind operation is attempted for
the same device/driver without killing the perf process, i915 will fail
to register the pmu (but still load successfully). This seems better
than completely crashing the system.


So this effectively allows unbind to succeed without fully unbinding the
driver from the device? That sounds like a significant drawback and if
so, I wonder if a more complicated solution wouldn't be better after
all. Or is there precedent for allowing userspace to keep their paws on
unbound devices in this way?


Regards,

Tvrtko



Signed-off-by: Lucas De Marchi 
---
  drivers/gpu/drm/i915/i915_pmu.c | 24 +---
  1 file changed, 9 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 8708f905f4f4..df53a8fe53ec 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -1158,18 +1158,21 @@ static void free_pmu(struct drm_device *dev, void *res)
struct i915_pmu *pmu = res;
struct drm_i915_private *i915 = pmu_to_i915(pmu);
  
+	perf_pmu_unregister(&pmu->base);

free_event_attributes(pmu);
kfree(pmu->base.attr_groups);
if (IS_DGFX(i915))
kfree(pmu->name);
+
+   /*
+* Make sure all currently running (but shortcut on pmu->closed) are
+* gone before proceeding with free'ing the pmu object embedded in i915.
+*/
+   synchronize_rcu();
  }
  
  static int i915_pmu_cpu_online(unsigned int cpu, struct hlist_node *node)

  {
-   struct i915_pmu *pmu = hlist_entry_safe(node, typeof(*pmu), cpuhp.node);
-
-   GEM_BUG_ON(!pmu->base.event_init);
-
/* Select the first online CPU as a designated reader. */
if (cpumask_empty(&i915_pmu_cpumask))
cpumask_set_cpu(cpu, &i915_pmu_cpumask);
@@ -1182,8 +1185,6 @@ static int i915_pmu_cpu_offline(unsigned int cpu, struct 
hlist_node *node)
struct i915_pmu *pmu = hlist_entry_safe(node, typeof(*pmu), cpuhp.node);
unsigned int target = i915_pmu_target_cpu;
  
-	GEM_BUG_ON(!pmu->base.event_init);

-
/*
 * Unregistering an instance generates a CPU offline event which we must
 * ignore to avoid incorrectly modifying the shared i915_pmu_cpumask.
@@ -1337,21 +1338,14 @@ void i915_pmu_unregister(struct drm_i915_private *i915)
  {
struct i915_pmu *pmu = &i915->pmu;
  
-	if (!pmu->base.event_init)

-   return;
-
/*
-* "Disconnect" the PMU callbacks - since all are atomic synchronize_rcu
-* ensures all currently executing ones will have exited before we
-* proceed with unregistration.
+* "Disconnect" the PMU callbacks - unregistering the pmu will be done
+* later when all currently open events are gone
 */
pmu->closed = true;
-   synchronize_rcu();
  
  	hrtimer_cancel(&pmu->timer);

-
i915_pmu_unregister_cpuhp_state(pmu);
-   perf_pmu_unregister(&pmu->base);
  
  	pmu->base.event_init = NULL;

  }
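
The trade-off discussed in this thread can be modelled in a few lines. This
is only a sketch under stated assumptions: struct pmu_model and every
function below are hypothetical and are not perf core or i915 API. It shows
unbind merely marking the PMU closed and dropping one reference, with the
actual unregistration deferred until the last open event is closed, which is
also why a rebind in between cannot register the PMU again.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; none of these names are perf or i915 API. */
struct pmu_model {
	int refs;         /* one for the driver binding, one per open event */
	bool closed;      /* set at unbind; new events are refused after it */
	bool registered;  /* stays true until the final release runs */
};

static void pmu_init(struct pmu_model *p)
{
	p->refs = 1;
	p->closed = false;
	p->registered = true;
}

/* Drop one reference; the last one performs the deferred unregister. */
static void pmu_put(struct pmu_model *p)
{
	assert(p->refs > 0);
	if (--p->refs == 0)
		p->registered = false;
}

/* Opening an event takes a reference, unless the PMU is already closed. */
static bool event_open(struct pmu_model *p)
{
	if (p->closed)
		return false;
	p->refs++;
	return true;
}

static void event_close(struct pmu_model *p)
{
	pmu_put(p);
}

/* Unbind only disconnects; teardown happens on the last put. */
static void driver_unbind(struct pmu_model *p)
{
	p->closed = true;
	pmu_put(p);
}

/* A rebind cannot register the same PMU while the old one still lives. */
static bool can_register_again(const struct pmu_model *old)
{
	return !old->registered;
}
```

In this model the window between driver_unbind() and the final event_close()
is exactly the period in which a new bind would load without a PMU.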


Re: [PATCH 5/7] drm/i915/pmu: Let resource survive unbind

2024-07-23 Thread Tvrtko Ursulin



On 22/07/2024 22:06, Lucas De Marchi wrote:

There's no need to free the resources during unbind. Since perf events
may still access them due to open events, it's safer to free them when
dropping the last i915 reference.

Signed-off-by: Lucas De Marchi 
---
  drivers/gpu/drm/i915/i915_pmu.c | 21 -
  1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index b5d14dd318e4..8708f905f4f4 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -5,6 +5,7 @@
   */
  
  #include 

+#include 
  
  #include "gt/intel_engine.h"

  #include "gt/intel_engine_pm.h"
@@ -1152,6 +1153,17 @@ static void free_event_attributes(struct i915_pmu *pmu)
pmu->pmu_attr = NULL;
  }
  
+static void free_pmu(struct drm_device *dev, void *res)

+{
+   struct i915_pmu *pmu = res;
+   struct drm_i915_private *i915 = pmu_to_i915(pmu);
+
+   free_event_attributes(pmu);
+   kfree(pmu->base.attr_groups);
+   if (IS_DGFX(i915))
+   kfree(pmu->name);
+}
+
  static int i915_pmu_cpu_online(unsigned int cpu, struct hlist_node *node)
  {
struct i915_pmu *pmu = hlist_entry_safe(node, typeof(*pmu), cpuhp.node);
@@ -1302,6 +1314,9 @@ void i915_pmu_register(struct drm_i915_private *i915)
if (ret)
goto err_unreg;
  
+	if (drmm_add_action_or_reset(&i915->drm, free_pmu, pmu))

+   goto err_unreg;


Is i915_pmu_unregister_cpuhp_state missing on this error path?

Regards,

Tvrtko


+
return;
  
  err_unreg:

@@ -1336,11 +1351,7 @@ void i915_pmu_unregister(struct drm_i915_private *i915)
hrtimer_cancel(&pmu->timer);
  
  	i915_pmu_unregister_cpuhp_state(pmu);

-
perf_pmu_unregister(&pmu->base);
+
pmu->base.event_init = NULL;
-   kfree(pmu->base.attr_groups);
-   if (IS_DGFX(i915))
-   kfree(pmu->name);
-   free_event_attributes(pmu);
  }
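
The review question in this message is about goto-based unwinding: a failure
point added after two completed setup steps has to jump to a label that
undoes both of them. A minimal sketch of that rule follows. The step names
and the fail_at knob are hypothetical, and the real ordering of steps in
i915_pmu_register() is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bookkeeping so the unwinding can be observed in a test. */
struct reg_state {
	bool first_done;
	bool second_done;
};

/*
 * Three setup steps; 'fail_at' (1..3) forces that step to fail, 0 means
 * success. Each error label undoes exactly the steps completed before it.
 */
static int register_all(struct reg_state *s, int fail_at)
{
	if (fail_at == 1)
		goto err;
	s->first_done = true;

	if (fail_at == 2)
		goto err_first;
	s->second_done = true;

	/* A step added later must jump to the label that undoes both. */
	if (fail_at == 3)
		goto err_second;

	return 0;

err_second:
	s->second_done = false;
err_first:
	s->first_done = false;
err:
	return -1;
}
```

If the third step jumped to err_first instead, second_done would stay set
after a failure, which is the kind of leak the question is probing for.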


Re: [PATCH 4/7] drm/i915/pmu: Drop is_igp()

2024-07-23 Thread Tvrtko Ursulin



On 22/07/2024 22:06, Lucas De Marchi wrote:

There's no reason to hardcode checking for integrated graphics on a
specific pci slot. That information is already available per platform and
can be checked with IS_DGFX().


Hmm, the probable reason is this. Added is_igp:

commit 05488673a4d41383f9dd537f298e525e6b00fb93
Author: Tvrtko Ursulin 
AuthorDate: Wed Oct 16 10:38:02 2019 +0100
Commit: Tvrtko Ursulin 
CommitDate: Thu Oct 17 10:50:47 2019 +0100

drm/i915/pmu: Support multiple GPUs

Added IS_DGFX:

commit dc90fe3fd219c7693617ba09a9467e4aadc2e039
Author: José Roberto de Souza 
AuthorDate: Thu Oct 24 12:51:19 2019 -0700
Commit: Lucas De Marchi 
CommitDate: Fri Oct 25 13:53:51 2019 -0700

drm/i915: Add is_dgfx to device info

So is_igp() innocently arrived just a bit before IS_DGFX().

Regards,

Tvrtko


Signed-off-by: Lucas De Marchi 
---
  drivers/gpu/drm/i915/i915_pmu.c | 17 +++--
  1 file changed, 3 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 3a8bd11b87e7..b5d14dd318e4 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -1235,17 +1235,6 @@ static void i915_pmu_unregister_cpuhp_state(struct 
i915_pmu *pmu)
cpuhp_state_remove_instance(cpuhp_slot, &pmu->cpuhp.node);
  }
  
-static bool is_igp(struct drm_i915_private *i915)

-{
-   struct pci_dev *pdev = to_pci_dev(i915->drm.dev);
-
-   /* IGP is 0000:00:02.0 */
-   return pci_domain_nr(pdev->bus) == 0 &&
-  pdev->bus->number == 0 &&
-  PCI_SLOT(pdev->devfn) == 2 &&
-  PCI_FUNC(pdev->devfn) == 0;
-}
-
  void i915_pmu_register(struct drm_i915_private *i915)
  {
struct i915_pmu *pmu = &i915->pmu;
@@ -1269,7 +1258,7 @@ void i915_pmu_register(struct drm_i915_private *i915)
pmu->cpuhp.cpu = -1;
init_rc6(pmu);
  
-	if (!is_igp(i915)) {

+   if (IS_DGFX(i915)) {
pmu->name = kasprintf(GFP_KERNEL,
  "i915_%s",
  dev_name(i915->drm.dev));
@@ -1323,7 +1312,7 @@ void i915_pmu_register(struct drm_i915_private *i915)
pmu->base.event_init = NULL;
free_event_attributes(pmu);
  err_name:
-   if (!is_igp(i915))
+   if (IS_DGFX(i915))
kfree(pmu->name);
  err:
drm_notice(&i915->drm, "Failed to register PMU!\n");
@@ -1351,7 +1340,7 @@ void i915_pmu_unregister(struct drm_i915_private *i915)
perf_pmu_unregister(&pmu->base);
pmu->base.event_init = NULL;
kfree(pmu->base.attr_groups);
-   if (!is_igp(i915))
+   if (IS_DGFX(i915))
kfree(pmu->name);
free_event_attributes(pmu);
  }


[PULL] drm-intel-next-fixes

2024-07-18 Thread Tvrtko Ursulin


Hi Dave, Sima,

One display fix for the merge window relating to DisplayPort LTTPR. It
fixes at least the Dell UD22 dock when used on Intel N100 systems.

Regards,

Tvrtko

drm-intel-next-fixes-2024-07-18:
- Reset intel_dp->link_trained before retraining the link [dp] (Imre Deak)
- Don't switch the LTTPR mode on an active link [dp] (Imre Deak)
The following changes since commit c58c39163a7e2c4c8885c57e4e74931c7b482e53:

  drm/omap: Restrict compile testing to PAGE_SIZE less than 64KB (2024-07-12 
13:13:15 +1000)

are available in the Git repository at:

  https://gitlab.freedesktop.org/drm/i915/kernel.git 
tags/drm-intel-next-fixes-2024-07-18

for you to fetch changes up to 509580fad7323b6a5da27e8365cd488f3b57210e:

  drm/i915/dp: Don't switch the LTTPR mode on an active link (2024-07-16 
08:14:29 +)


- Reset intel_dp->link_trained before retraining the link [dp] (Imre Deak)
- Don't switch the LTTPR mode on an active link [dp] (Imre Deak)


Imre Deak (2):
  drm/i915/dp: Reset intel_dp->link_trained before retraining the link
  drm/i915/dp: Don't switch the LTTPR mode on an active link

 drivers/gpu/drm/i915/display/intel_dp.c|  2 +
 .../gpu/drm/i915/display/intel_dp_link_training.c  | 55 +++---
 2 files changed, 50 insertions(+), 7 deletions(-)


Re: [PATCH] drm/i915/gt: Do not consider preemption during execlists_dequeue for gen8

2024-07-11 Thread Tvrtko Ursulin



On 11/07/2024 06:12, Nitin Gote wrote:

We're seeing a GPU HANG issue on a CHV platform, which was caused by
bac24f59f454 ("drm/i915/execlists: Enable coarse preemption boundaries for 
gen8").

Gen8 platforms have only timeslicing and do not support a preemption
mechanism, as the engines do not have a preemption timer and do not send an
irq if the preemption timeout expires. So, add a fix to not consider
preemption during dequeuing for gen8 platforms.

Also move the can_preempt() function above need_preempt() to resolve the
implicit declaration of function ‘can_preempt’ error, and make the
can_preempt() function parameter const to resolve error: passing argument 1
of ‘can_preempt’ discards ‘const’ qualifier from the pointer target type.

v2: Simplify can_preempt() function (Tvrtko Ursulin)


Yeah, sorry about yesterday, when I thought the gen8 emit bb was dead code;
somehow I thought there was a gen9 emit_bb flavour. Looks like I
confused it with something else.




Fixes: bac24f59f454 ("drm/i915/execlists: Enable coarse preemption boundaries for 
gen8")
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/11396
Suggested-by: Andi Shyti 
Signed-off-by: Nitin Gote 
Cc: Chris Wilson 
CC:  # v5.2+
---
  .../drm/i915/gt/intel_execlists_submission.c| 17 -
  1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index 21829439e686..59885d7721e4 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -294,11 +294,19 @@ static int virtual_prio(const struct 
intel_engine_execlists *el)
return rb ? rb_entry(rb, struct ve_node, rb)->prio : INT_MIN;
  }
  
+static bool can_preempt(const struct intel_engine_cs *engine)

+{
+   return GRAPHICS_VER(engine->i915) > 8;
+}
+
  static bool need_preempt(const struct intel_engine_cs *engine,
 const struct i915_request *rq)
  {
int last_prio;
  
+	if (!can_preempt(engine))

+   return false;
+
if (!intel_engine_has_semaphores(engine))


Patch looks clean now. Hmmm, one new observation is whether the "has
semaphores" check is now redundant? It looks like preemption depends on
semaphore support in logical_ring_default_vfuncs().


Regards,

Tvrtko


return false;
  
@@ -3313,15 +3321,6 @@ static void remove_from_engine(struct i915_request *rq)

i915_request_notify_execute_cb_imm(rq);
  }
  
-static bool can_preempt(struct intel_engine_cs *engine)

-{
-   if (GRAPHICS_VER(engine->i915) > 8)
-   return true;
-
-   /* GPGPU on bdw requires extra w/a; not implemented */
-   return engine->class != RENDER_CLASS;
-}
-
  static void kick_execlists(const struct i915_request *rq, int prio)
  {
struct intel_engine_cs *engine = rq->engine;


Re: [PATCH] drm/i915/gt: Do not consider preemption during execlists_dequeue for gen8

2024-07-10 Thread Tvrtko Ursulin



On 09/07/2024 15:02, Tvrtko Ursulin wrote:


On 09/07/2024 13:53, Nitin Gote wrote:

We're seeing a GPU HANG issue on a CHV platform, which was caused by
bac24f59f454 ("drm/i915/execlists: Enable coarse preemption boundaries 
for gen8").


Gen8 platforms have only timeslicing and do not support a preemption
mechanism, as the engines do not have a preemption timer and do not send an
irq if the preemption timeout expires. So, add a fix to not consider
preemption during dequeuing for gen8 platforms.

Also move the can_preempt() function above need_preempt() to resolve the
implicit declaration of function ‘can_preempt’ error, and make the
can_preempt() function parameter const to resolve error: passing argument 1
of ‘can_preempt’ discards ‘const’ qualifier from the pointer target type.

Fixes: bac24f59f454 ("drm/i915/execlists: Enable coarse preemption 
boundaries for gen8")

Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/11396
Suggested-by: Andi Shyti 
Signed-off-by: Nitin Gote 
Cc: Chris Wilson 
CC:  # v5.2+
---
  .../drm/i915/gt/intel_execlists_submission.c  | 24 ---
  1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c

index 21829439e686..30631cc690f2 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -294,11 +294,26 @@ static int virtual_prio(const struct 
intel_engine_execlists *el)

  return rb ? rb_entry(rb, struct ve_node, rb)->prio : INT_MIN;
  }
+static bool can_preempt(const struct intel_engine_cs *engine)
+{
+    if (GRAPHICS_VER(engine->i915) > 8)
+    return true;
+
+    if (IS_CHERRYVIEW(engine->i915) || IS_BROADWELL(engine->i915))
+    return false;
+
+    /* GPGPU on bdw requires extra w/a; not implemented */
+    return engine->class != RENDER_CLASS;


Aren't BDW and CHV the only Gen8 platforms, in which case this function
can be simplified to:


...
{
 return GRAPHICS_VER(engine->i915) > 8;
}

?


+}
+
  static bool need_preempt(const struct intel_engine_cs *engine,
   const struct i915_request *rq)
  {
  int last_prio;
+    if ((GRAPHICS_VER(engine->i915) <= 8) && can_preempt(engine))


The GRAPHICS_VER check here looks redundant with the one inside 
can_preempt().


One more thing - I think gen8_emit_bb_start() becomes dead code after 
this and can be removed.


Regards,

Tvrtko


+    return false;
+
  if (!intel_engine_has_semaphores(engine))
  return false;
@@ -3313,15 +3328,6 @@ static void remove_from_engine(struct 
i915_request *rq)

  i915_request_notify_execute_cb_imm(rq);
  }
-static bool can_preempt(struct intel_engine_cs *engine)
-{
-    if (GRAPHICS_VER(engine->i915) > 8)
-    return true;
-
-    /* GPGPU on bdw requires extra w/a; not implemented */
-    return engine->class != RENDER_CLASS;
-}
-
  static void kick_execlists(const struct i915_request *rq, int prio)
  {
  struct intel_engine_cs *engine = rq->engine;


Re: [PATCH] drm/i915/gt: Do not consider preemption during execlists_dequeue for gen8

2024-07-09 Thread Tvrtko Ursulin



On 09/07/2024 13:53, Nitin Gote wrote:

We're seeing a GPU HANG issue on a CHV platform, which was caused by
bac24f59f454 ("drm/i915/execlists: Enable coarse preemption boundaries for 
gen8").

Gen8 platforms have only timeslicing and do not support a preemption
mechanism, as the engines do not have a preemption timer and do not send an
irq if the preemption timeout expires. So, add a fix to not consider
preemption during dequeuing for gen8 platforms.

Also move the can_preempt() function above need_preempt() to resolve the
implicit declaration of function ‘can_preempt’ error, and make the
can_preempt() function parameter const to resolve error: passing argument 1
of ‘can_preempt’ discards ‘const’ qualifier from the pointer target type.

Fixes: bac24f59f454 ("drm/i915/execlists: Enable coarse preemption boundaries for 
gen8")
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/11396
Suggested-by: Andi Shyti 
Signed-off-by: Nitin Gote 
Cc: Chris Wilson 
CC:  # v5.2+
---
  .../drm/i915/gt/intel_execlists_submission.c  | 24 ---
  1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index 21829439e686..30631cc690f2 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -294,11 +294,26 @@ static int virtual_prio(const struct 
intel_engine_execlists *el)
return rb ? rb_entry(rb, struct ve_node, rb)->prio : INT_MIN;
  }
  
+static bool can_preempt(const struct intel_engine_cs *engine)

+{
+   if (GRAPHICS_VER(engine->i915) > 8)
+   return true;
+
+   if (IS_CHERRYVIEW(engine->i915) || IS_BROADWELL(engine->i915))
+   return false;
+
+   /* GPGPU on bdw requires extra w/a; not implemented */
+   return engine->class != RENDER_CLASS;


Aren't BDW and CHV the only Gen8 platforms, in which case this function
can be simplified to:


...
{
return GRAPHICS_VER(engine->i915) > 8;
}

?


+}
+
  static bool need_preempt(const struct intel_engine_cs *engine,
 const struct i915_request *rq)
  {
int last_prio;
  
+	if ((GRAPHICS_VER(engine->i915) <= 8) && can_preempt(engine))


The GRAPHICS_VER check here looks redundant with the one inside 
can_preempt().
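
Both review comments above can be checked with a small userspace model. This is only a sketch: struct engine_model and its fields are hypothetical stand-ins for what the driver reads through engine->i915, and it assumes, as the question above does, that every graphics version 8 device is either BDW or CHV.

```c
#include <stdbool.h>

enum engine_class_model { RENDER_CLASS, VIDEO_DECODE_CLASS, COPY_ENGINE_CLASS };

/* Hypothetical stand-in for the data reached through engine->i915. */
struct engine_model {
	int graphics_ver;
	bool is_cherryview;
	bool is_broadwell;
	enum engine_class_model engine_class;
};

/* can_preempt() with the logic posted in the patch. */
static bool can_preempt_patched(const struct engine_model *e)
{
	if (e->graphics_ver > 8)
		return true;

	if (e->is_cherryview || e->is_broadwell)
		return false;

	return e->engine_class != RENDER_CLASS;
}

/* The simplification proposed in review. */
static bool can_preempt_simplified(const struct engine_model *e)
{
	return e->graphics_ver > 8;
}

/* True if both variants agree for every modelled platform and class. */
static bool variants_agree(void)
{
	for (int ver = 8; ver <= 12; ver++) {
		for (int chv = 0; chv <= 1; chv++) {
			for (int cls = RENDER_CLASS; cls <= COPY_ENGINE_CLASS; cls++) {
				struct engine_model e = {
					.graphics_ver = ver,
					/* Assumption: ver 8 is always BDW or CHV. */
					.is_cherryview = ver == 8 && chv,
					.is_broadwell = ver == 8 && !chv,
					.engine_class = (enum engine_class_model)cls,
				};

				if (can_preempt_patched(&e) != can_preempt_simplified(&e))
					return false;
			}
		}
	}

	return true;
}
```

Under that assumption the RENDER_CLASS fallback is never reached, and any caller that has already tested can_preempt() learns nothing new from an additional graphics version check.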


Regards,

Tvrtko


+   return false;
+
if (!intel_engine_has_semaphores(engine))
return false;
  
@@ -3313,15 +3328,6 @@ static void remove_from_engine(struct i915_request *rq)

i915_request_notify_execute_cb_imm(rq);
  }
  
-static bool can_preempt(struct intel_engine_cs *engine)

-{
-   if (GRAPHICS_VER(engine->i915) > 8)
-   return true;
-
-   /* GPGPU on bdw requires extra w/a; not implemented */
-   return engine->class != RENDER_CLASS;
-}
-
  static void kick_execlists(const struct i915_request *rq, int prio)
  {
struct intel_engine_cs *engine = rq->engine;


[PULL] drm-intel-gt-next

2024-07-04 Thread Tvrtko Ursulin


Hi Dave, Sima,

The final pull for 6.11 is quite small and only contains a handful of
fixes in areas such as stolen memory probing on ATS-M, GuC priority
handling, out of memory reporting noise downgrade and fence register
handling race condition reported by CI.

Regards,

Tvrtko

drm-intel-gt-next-2024-07-04:
Driver Changes:

Fixes/improvements/new stuff:

- Downgrade stolen lmem setup warning [gem] (Jonathan Cavitt)
- Evaluate GuC priority within locks [gt/uc] (Andi Shyti)
- Fix potential UAF by revoke of fence registers [gt] (Janusz Krzysztofik)
- Return NULL instead of '0' [gem] (Andi Shyti)
- Use the correct format specifier for resource_size_t [gem] (Andi Shyti)
- Suppress oom warning in favour of ENOMEM to userspace [gem] (Nirmoy Das)

Miscellaneous:

- Evaluate forcewake usage within locks [gt] (Andi Shyti)
- Fix typo in comment [gt/uc] (Andi Shyti)
The following changes since commit 79655e867ad6dfde2734c67c7704c0dd5bf1e777:

  drm/i915/mtl: Update workaround 14018575942 (2024-06-11 16:06:20 +0200)

are available in the Git repository at:

  https://gitlab.freedesktop.org/drm/i915/kernel.git 
tags/drm-intel-gt-next-2024-07-04

for you to fetch changes up to 3b85152cb167bd24fe84ceb91b719b5904ca354f:

  drm/i915/gem: Suppress oom warning in favour of ENOMEM to userspace 
(2024-06-28 00:11:01 +0200)


Driver Changes:

Fixes/improvements/new stuff:

- Downgrade stolen lmem setup warning [gem] (Jonathan Cavitt)
- Evaluate GuC priority within locks [gt/uc] (Andi Shyti)
- Fix potential UAF by revoke of fence registers [gt] (Janusz Krzysztofik)
- Return NULL instead of '0' [gem] (Andi Shyti)
- Use the correct format specifier for resource_size_t [gem] (Andi Shyti)
- Suppress oom warning in favour of ENOMEM to userspace [gem] (Nirmoy Das)

Miscellaneous:

- Evaluate forcewake usage within locks [gt] (Andi Shyti)
- Fix typo in comment [gt/uc] (Andi Shyti)


Andi Shyti (5):
  drm/i915/gt: debugfs: Evaluate forcewake usage within locks
  drm/i915/gt/uc: Fix typo in comment
  drm/i915/gt/uc: Evaluate GuC priority within locks
  drm/i915/gem: Return NULL instead of '0'
  drm/i915/gem: Use the correct format specifier for resource_size_t

Janusz Krzysztofik (1):
  drm/i915/gt: Fix potential UAF by revoke of fence registers

Jonathan Cavitt (1):
  drm/i915/gem: Downgrade stolen lmem setup warning

Nirmoy Das (1):
  drm/i915/gem: Suppress oom warning in favour of ENOMEM to userspace

 drivers/gpu/drm/i915/gem/i915_gem_stolen.c|  8 +--
 drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c  |  1 +
 drivers/gpu/drm/i915/gt/intel_gt_pm_debugfs.c |  4 
 drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |  2 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 27 ++-
 drivers/gpu/drm/i915/i915_scatterlist.c   |  8 +++
 6 files changed, 32 insertions(+), 18 deletions(-)


Re: [PATCH v2 8/8] Revert "drm/i915: Depend on !PREEMPT_RT."

2024-06-19 Thread Tvrtko Ursulin



On 13/06/2024 11:20, Sebastian Andrzej Siewior wrote:

Once the known issues are addressed, it should be safe to enable the
driver.

Signed-off-by: Sebastian Andrzej Siewior 
---
  drivers/gpu/drm/i915/Kconfig | 1 -
  1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
index 5932024f8f954..a02162d6b710e 100644
--- a/drivers/gpu/drm/i915/Kconfig
+++ b/drivers/gpu/drm/i915/Kconfig
@@ -3,7 +3,6 @@ config DRM_I915
tristate "Intel 8xx/9xx/G3x/G4x/HD Graphics"
depends on DRM
depends on X86 && PCI
-   depends on !PREEMPT_RT
select INTEL_GTT if X86
select INTERVAL_TREE
# we need shmfs for the swappable backing store, and in particular


Cool!

Acked-by: Tvrtko Ursulin 

Regards,

Tvrtko


Re: [PATCH v2 6/8] drm/i915: Drop the irqs_disabled() check

2024-06-19 Thread Tvrtko Ursulin



On 13/06/2024 11:20, Sebastian Andrzej Siewior wrote:

The !irqs_disabled() check triggers on PREEMPT_RT even with
i915_sched_engine::lock acquired. The reason is the lock is transformed
into a sleeping lock on PREEMPT_RT and does not disable interrupts.

There is no need to check for disabled interrupts. The lockdep
annotation below already checks if the lock has been acquired by the
caller and will yell if the interrupts are not disabled.

Remove the !irqs_disabled() check.

Reported-by: Maarten Lankhorst 
Signed-off-by: Sebastian Andrzej Siewior 
---
  drivers/gpu/drm/i915/i915_request.c | 2 --
  1 file changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c 
b/drivers/gpu/drm/i915/i915_request.c
index 519e096c607cd..466b5ee8ed6d2 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -608,7 +608,6 @@ bool __i915_request_submit(struct i915_request *request)
  
  	RQ_TRACE(request, "\n");
  
-	GEM_BUG_ON(!irqs_disabled());

lockdep_assert_held(&engine->sched_engine->lock);
  
  	/*

@@ -717,7 +716,6 @@ void __i915_request_unsubmit(struct i915_request *request)
 */
RQ_TRACE(request, "\n");
  
-	GEM_BUG_ON(!irqs_disabled());

lockdep_assert_held(&engine->sched_engine->lock);
  
  	/*


Maarten can you r-b since it seems this one originated from your 
testing? Otherwise:


Acked-by: Tvrtko Ursulin 

Regards,

Tvrtko


Re: [PATCH v2 4/8] drm/i915: Disable tracing points on PREEMPT_RT

2024-06-19 Thread Tvrtko Ursulin



On 13/06/2024 11:20, Sebastian Andrzej Siewior wrote:

Luca Abeni reported this:
| BUG: scheduling while atomic: kworker/u8:2/15203/0x0003
| CPU: 1 PID: 15203 Comm: kworker/u8:2 Not tainted 4.19.1-rt3 #10
| Call Trace:
|  rt_spin_lock+0x3f/0x50
|  gen6_read32+0x45/0x1d0 [i915]
|  g4x_get_vblank_counter+0x36/0x40 [i915]
|  trace_event_raw_event_i915_pipe_update_start+0x7d/0xf0 [i915]

The tracing events, trace_intel_pipe_update_start() among others, use
functions which acquire spinlock_t locks, which are transformed into
sleeping locks on PREEMPT_RT. A few trace points use
intel_get_crtc_scanline(), others use ->get_vblank_counter() which also
might acquire sleeping locks on PREEMPT_RT.
At the time the arguments are evaluated within the trace point, preemption
is disabled and so the locks must not be acquired on PREEMPT_RT.

Based on this I don't see any other way than to disable trace points on
PREEMPT_RT.

Reported-by: Luca Abeni 
Cc: Steven Rostedt 
Signed-off-by: Sebastian Andrzej Siewior 
---
  drivers/gpu/drm/i915/display/intel_display_trace.h | 4 
  drivers/gpu/drm/i915/i915_trace.h  | 4 
  2 files changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/i915/display/intel_display_trace.h 
b/drivers/gpu/drm/i915/display/intel_display_trace.h
index 49a5e6d9dc0d7..b15c999d91e68 100644
--- a/drivers/gpu/drm/i915/display/intel_display_trace.h
+++ b/drivers/gpu/drm/i915/display/intel_display_trace.h
@@ -9,6 +9,10 @@
  #if !defined(__INTEL_DISPLAY_TRACE_H__) || defined(TRACE_HEADER_MULTI_READ)
  #define __INTEL_DISPLAY_TRACE_H__
  
+#if defined(CONFIG_PREEMPT_RT) && !defined(NOTRACE)

+#define NOTRACE
+#endif
+
  #include 
  #include 
  #include 
diff --git a/drivers/gpu/drm/i915/i915_trace.h 
b/drivers/gpu/drm/i915/i915_trace.h
index ce1cbee1b39dd..247e7d9448d70 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -6,6 +6,10 @@
  #if !defined(_I915_TRACE_H_) || defined(TRACE_HEADER_MULTI_READ)
  #define _I915_TRACE_H_
  
+#if defined(CONFIG_PREEMPT_RT) && !defined(NOTRACE)

+#define NOTRACE
+#endif
+
  #include 
  #include 
  #include 


If tracing experts said this is the way then it is fine by me.
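
For reference, the effect of the guard added to both headers can be modelled in userspace roughly as follows. This is a sketch under assumptions: crtc_model, model_get_vblank_counter() and trace_pipe_update_start_model() are hypothetical names, CONFIG_PREEMPT_RT is defined by hand, and the real stubs come from the kernel's tracepoint headers rather than from an #ifdef like the one below.

```c
/* Hypothetical stand-in for a kernel built with CONFIG_PREEMPT_RT=y. */
#define CONFIG_PREEMPT_RT 1

/* The guard from the patch, unchanged. */
#if defined(CONFIG_PREEMPT_RT) && !defined(NOTRACE)
#define NOTRACE
#endif

struct crtc_model {
	unsigned int (*get_vblank_counter)(struct crtc_model *crtc);
	unsigned int counter_reads;
};

/* Value the modelled trace event would have recorded. */
unsigned int last_traced_counter;

/* Stand-in for a helper that would take a sleeping lock on PREEMPT_RT. */
static unsigned int model_get_vblank_counter(struct crtc_model *crtc)
{
	crtc->counter_reads++;
	return 42;
}

#ifdef NOTRACE
/* Stub: the event body, and with it the locked register read, is gone. */
static inline void trace_pipe_update_start_model(struct crtc_model *crtc)
{
	(void)crtc;
}
#else
/* Enabled: the event body reads the counter while recording the event. */
static inline void trace_pipe_update_start_model(struct crtc_model *crtc)
{
	last_traced_counter = crtc->get_vblank_counter(crtc);
}
#endif
```

With the guard active, calling the trace point never reaches the counter read, which is the property the patch relies on to avoid taking a sleeping lock with preemption disabled.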

Acked-by: Tvrtko Ursulin 

Regards,

Tvrtko


Re: [PATCH v2 3/8] drm/i915: Don't check for atomic context on PREEMPT_RT

2024-06-19 Thread Tvrtko Ursulin



On 18/06/2024 13:54, Sebastian Andrzej Siewior wrote:

On 2024-06-18 10:00:09 [+0100], Tvrtko Ursulin wrote:

I did a re-test but am not 100% certain yet. CI looks frustratingly noisy at
the moment.

igt@debugfs_test@read_all_entries appears to be a fluke which is not new.

But igt@gem_exec_parallel@engines@basic from the latest run seems new.

So I queued another re-test.


Okay. If you want me to repost the whole series or just parts of it,
just say so.


Looks like a green set of results, both BAT and full run.

Patches 1&2 will need someone from the display side to bless though.

Regards,

Tvrtko


Re: [PATCH v2 3/8] drm/i915: Don't check for atomic context on PREEMPT_RT

2024-06-18 Thread Tvrtko Ursulin



On 17/06/2024 11:07, Sebastian Andrzej Siewior wrote:

On 2024-06-14 13:19:25 [+0100], Tvrtko Ursulin wrote:

So the question is why do you need to know if the context is atomic?
The only impact is avoiding disabling preemption. Is it that important
to avoid it?
If so would cant_migrate() work? It requires CONFIG_DEBUG_ATOMIC_SLEEP=y
to do the trick.


... catching misuse of atomic wait helpers, step 2: are you calling it from
a non-atomic context without a real need? If so, you should use the non-atomic
helper instead.

When i915 development was very active and with a lot of contributors it was
beneficial to catch these things which code review would easily miss.

Now that the pace is much, much slower, it is probably not very important.
So this patch is acceptable as far as I am concerned and:

Reviewed-by: Tvrtko Ursulin 

Actually please also add the PREEMPT_RT angle to the comment above
_WAIT_FOR_ATOMIC_CHECK. Sometimes lines change and git blame makes it hard
to find the commit text.


Do you want me to repost the series? Are the bots happy enough?


I did a re-test but am not 100% certain yet. CI looks frustratingly 
noisy at the moment.


igt@debugfs_test@read_all_entries appears to be a fluke which is not new.

But igt@gem_exec_parallel@engines@basic from the latest run seems new.

So I queued another re-test.


I have the following for this patch:

--->8--

The !in_atomic() check in _wait_for_atomic() triggers on PREEMPT_RT
because the uncore::lock is a spinlock_t and does not disable
preemption or interrupts.

Changing the uncore:lock to a raw_spinlock_t doubles the worst case
latency on an otherwise idle testbox during testing.

Ignore _WAIT_FOR_ATOMIC_CHECK() on PREEMPT_RT.

Reviewed-by: Tvrtko Ursulin 
Link: https://lore.kernel.org/all/20211006164628.s2mtsdd2jdbfy...@linutronix.de/
Signed-off-by: Sebastian Andrzej Siewior 
---
  drivers/gpu/drm/i915/i915_utils.h | 9 +++--
  1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_utils.h 
b/drivers/gpu/drm/i915/i915_utils.h
index 06ec6ceb61d57..f0d3c5cdc1b1b 100644
--- a/drivers/gpu/drm/i915/i915_utils.h
+++ b/drivers/gpu/drm/i915/i915_utils.h
@@ -273,8 +273,13 @@ wait_remaining_ms_from_jiffies(unsigned long 
timestamp_jiffies, int to_wait_ms)
   (Wmax))
  #define wait_for(COND, MS)_wait_for((COND), (MS) * 1000, 10, 1000)
  
-/* If CONFIG_PREEMPT_COUNT is disabled, in_atomic() always reports false. */

-#if defined(CONFIG_DRM_I915_DEBUG) && defined(CONFIG_PREEMPT_COUNT)
+/*
+ * If CONFIG_PREEMPT_COUNT is disabled, in_atomic() always reports false.
+ * On PREEMPT_RT the context isn't becoming atomic because it is used in an
+ * interrupt handler or because a spinlock_t is acquired. This leads warnings
+ * which don't occur otherwise and is therefore disabled.


Ack, thanks!

Regards,

Tvrtko


+ */
+#if defined(CONFIG_DRM_I915_DEBUG) && defined(CONFIG_PREEMPT_COUNT) && 
!defined(CONFIG_PREEMPT_RT)
  # define _WAIT_FOR_ATOMIC_CHECK(ATOMIC) WARN_ON_ONCE((ATOMIC) && !in_atomic())
  #else
  # define _WAIT_FOR_ATOMIC_CHECK(ATOMIC) do { } while (0)


Regards,

Tvrtko


Sebastian


Re: [PATCH v2 3/8] drm/i915: Don't check for atomic context on PREEMPT_RT

2024-06-14 Thread Tvrtko Ursulin



On 14/06/2024 12:05, Sebastian Andrzej Siewior wrote:

On 2024-06-14 09:32:07 [+0100], Tvrtko Ursulin wrote:

I think this could be okay-ish in principle, but the commit text is not
entirely accurate because there is no direct coupling between the wait
helpers and the uncore lock. They can be used from any atomic context.

Okay-ish in principle because there is sufficient testing in Intel's CI on
non-PREEMPT_RT kernels to catch any conceptual misuses.


You just avoid disabling preemption if you expect to be in atomic
context to save a few cycles. It wouldn't hurt to disable it anyway. The
only reason you need it is to remain on the same CPU while reading the
clock because it is not guaranteed otherwise.


Ah no, that is not why. Reason for conditional disabling of preemption 
is to have an implementation for very short delays which does not run 
with preemption permanently disabled. So it is disabled only around time 
tracking.
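
A simplified userspace model of that structure, for illustration only. The helper names, the fake clock and the preemption counter are all hypothetical; the real _wait_for_atomic() macro additionally relaxes the CPU between polls and handles migration to another CPU, both of which are left out here.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model state: preemption depth and a deterministic clock. */
static int preempt_depth;
static unsigned long fake_clock_ns;

static void preempt_disable_model(void) { preempt_depth++; }
static void preempt_enable_model(void) { preempt_depth--; }

/* Every read advances the fake clock by 100ns. */
static unsigned long local_clock_model(void)
{
	fake_clock_ns += 100;
	return fake_clock_ns;
}

static bool cond_always_true(void) { return true; }
static bool cond_never_true(void) { return false; }

/*
 * Poll cond() until it holds or timeout_ns elapses. When the caller is
 * not already atomic, preemption is disabled only around the clock
 * reads and enabled again before cond() is evaluated.
 */
static int wait_for_atomic_model(bool (*cond)(void),
				 unsigned long timeout_ns, bool atomic)
{
	unsigned long base, now;

	if (!atomic)
		preempt_disable_model();
	base = local_clock_model();

	for (;;) {
		now = local_clock_model();
		if (!atomic)
			preempt_enable_model();

		/* The condition is always checked before the timeout. */
		if (cond())
			return 0;
		if (now - base >= timeout_ns)
			return -ETIMEDOUT;

		if (!atomic)
			preempt_disable_model();
	}
}
```

In this model the preemption depth is back at zero on every return path, so a non-atomic caller is never left with preemption disabled across the whole wait.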



Delays > 50ms are detected at build time.


Right, the point of that is to ask the contributor if they are sure this is 
what they want. Catching misuse of the short delay wait helper, step one.



But see also the caller in skl_pcode_request. It is a bit harder to hit
since it is the fallback path. Or gen5_rps_enable which nests under a
different lock.

Hmm would there be a different helper, or combination of helpers, which
could replace in_atomic() which would do the right thing on both kernels?
Something to tell us we are neither under a spin_lock, nor preempt_disable,
nor interrupts disabled, nor bottom-half. On either stock or PREEMPT_RT.


There is nothing that you can use to deduce that you are under a
spin-lock. preemptible() works only if you have a preemption counter
which is not mandatory. It can affect RCU but not in all configurations.


WARN_ON_ONCE((ATOMIC) && !(!preemptible() || in_hardirq() ||
in_serving_softirq())

Would this work?


Nope. None of this triggers if you acquire a spinlock_t. And I can't
think of something that would always be true.


Bummer.


So the question is why do you need to know if the context is atomic?
The only impact is avoiding disabling preemption. Is it that important
to avoid it?
If so would cant_migrate() work? It requires CONFIG_DEBUG_ATOMIC_SLEEP=y
to do the trick.


... catching misuse of atomic wait helpers, step 2: are you calling it 
from a non-atomic context without a real need? If so, you should use the 
non-atomic helper instead.


When i915 development was very active and with a lot of contributors it 
was beneficial to catch these things which code review would easily miss.


Now that the pace is much, much slower, it is probably not very 
important. So this patch is acceptable as far as I am concerned and:


Reviewed-by: Tvrtko Ursulin 

Actually please also add the PREEMPT_RT angle to the comment above 
_WAIT_FOR_ATOMIC_CHECK. Sometimes lines change and git blame makes it 
hard to find the commit text.


Regards,

Tvrtko




Regards,

Tvrtko


Sebastian


Re: [PATCH v2 3/8] drm/i915: Don't check for atomic context on PREEMPT_RT

2024-06-14 Thread Tvrtko Ursulin



On 13/06/2024 11:20, Sebastian Andrzej Siewior wrote:

The !in_atomic() check in _wait_for_atomic() triggers on PREEMPT_RT
because the uncore::lock is a spinlock_t and does not disable
preemption or interrupts.

Changing the uncore:lock to a raw_spinlock_t doubles the worst case
latency on an otherwise idle testbox during testing. Therefore I'm
currently unsure about changing this.
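
The false positive described above can be illustrated with a userspace model. This is a sketch, not kernel code: the *_model names are hypothetical, and the only behaviour modelled is that taking a spinlock_t raises the preemption count on a stock kernel but not on PREEMPT_RT, where it becomes a sleeping lock.

```c
#include <stdbool.h>

/* Hypothetical model state. */
static int preempt_count_model;
static bool preempt_rt_model;

/* On PREEMPT_RT a spinlock_t does not raise the preemption count. */
static void spin_lock_model(void)
{
	if (!preempt_rt_model)
		preempt_count_model++;
}

static void spin_unlock_model(void)
{
	if (!preempt_rt_model)
		preempt_count_model--;
}

static bool in_atomic_model(void)
{
	return preempt_count_model != 0;
}

/* The condition tested by WARN_ON_ONCE((ATOMIC) && !in_atomic()). */
static bool atomic_check_would_warn(bool atomic)
{
	return atomic && !in_atomic_model();
}
```

A caller that holds the lock and uses the atomic wait helper passes the check on the stock model, but trips it on the PREEMPT_RT model even though nothing about the calling code has changed.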

Link: https://lore.kernel.org/all/20211006164628.s2mtsdd2jdbfy...@linutronix.de/
Signed-off-by: Sebastian Andrzej Siewior 
---
  drivers/gpu/drm/i915/i915_utils.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_utils.h 
b/drivers/gpu/drm/i915/i915_utils.h
index 06ec6ceb61d57..2ca54bc235925 100644
--- a/drivers/gpu/drm/i915/i915_utils.h
+++ b/drivers/gpu/drm/i915/i915_utils.h
@@ -274,7 +274,7 @@ wait_remaining_ms_from_jiffies(unsigned long 
timestamp_jiffies, int to_wait_ms)
  #define wait_for(COND, MS)_wait_for((COND), (MS) * 1000, 10, 1000)
  
  /* If CONFIG_PREEMPT_COUNT is disabled, in_atomic() always reports false. */

-#if defined(CONFIG_DRM_I915_DEBUG) && defined(CONFIG_PREEMPT_COUNT)
+#if defined(CONFIG_DRM_I915_DEBUG) && defined(CONFIG_PREEMPT_COUNT) && 
!defined(CONFIG_PREEMPT_RT)
  # define _WAIT_FOR_ATOMIC_CHECK(ATOMIC) WARN_ON_ONCE((ATOMIC) && !in_atomic())
  #else
  # define _WAIT_FOR_ATOMIC_CHECK(ATOMIC) do { } while (0)


I think this could be okay-ish in principle, but the commit text is not 
entirely accurate because there is no direct coupling between the wait 
helpers and the uncore lock. They can be used from any atomic context.


Okay-ish in principle because there is sufficient testing in Intel's CI 
on non-PREEMPT_RT kernels to catch any conceptual misuses.


But see also the caller in skl_pcode_request. It is a bit harder to hit 
since it is the fallback path. Or gen5_rps_enable which nests under a 
different lock.


Hmm would there be a different helper, or combination of helpers, which 
could replace in_atomic() which would do the right thing on both 
kernels? Something to tell us we are neither under a spin_lock, nor 
preempt_disable, nor interrupts disabled, nor bottom-half. On either 
stock or PREEMPT_RT.


WARN_ON_ONCE((ATOMIC) && !(!preemptible() || in_hardirq() || 
in_serving_softirq())


Would this work?

Regards,

Tvrtko


[PULL] drm-intel-gt-next

2024-06-12 Thread Tvrtko Ursulin


Hi Dave, Sima,

Here is the main pull request for drm-intel-gt-next targeting 6.11.

First is the new userspace API for allowing upload of custom context
state used for replaying GPU hang error state captures. This will be
used by Mesa (see
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27594) for
debugging GPU hangs captured in the wild on real hardware. So far that
was only possible under simulation and that via some hacks. Also,
simulation in general has certain limitations to what hangs it can
reproduce. As the UAPI is intended for Mesa developers only, it is
hidden behind a kconfig and runtime enablement switches.

Then there are fixes for hangs on Meteorlake due to incorrect reduced CCS
configuration and a missing video engine workaround. Then fixes for a
couple race conditions in multi GT and breadcrumb handling, and a more
robust functional level reset by extending the timeout used.

A couple tiny cleanups here and there and finally one back-merge which
was required to land some display code base refactoring.

Regards,

Tvrtko

drm-intel-gt-next-2024-06-12:
UAPI Changes:

- Support replaying GPU hangs with captured context image (Tvrtko Ursulin)

Driver Changes:

Fixes/improvements/new stuff:

- Automate CCS Mode setting during engine resets [gt] (Andi Shyti)
- Revert "drm/i915: Remove extra multi-gt pm-references" (Janusz Krzysztofik)
- Fix HAS_REGION() usage in intel_gt_probe_lmem() (Ville Syrjälä)
- Disarm breadcrumbs if engines are already idle [gt] (Chris Wilson)
- Shadow default engine context image in the context (Tvrtko Ursulin)
- Support replaying GPU hangs with captured context image (Tvrtko Ursulin)
- avoid FIELD_PREP warning [guc] (Arnd Bergmann)
- Fix CCS id's calculation for CCS mode setting [gt] (Andi Shyti)
- Increase FLR timeout from 3s to 9s (Andi Shyti)
- Update workaround 14018575942 [mtl] (Angus Chen)

Future platform enablement:

- Enable w/a 16021333562 for DG2, MTL and ARL [guc] (John Harrison)

Miscellaneous:

- Pass the region ID rather than a bitmask to HAS_REGION() (Ville Syrjälä)
- Remove counter productive REGION_* wrappers (Ville Syrjälä)
- Fix typo [gem/i915_gem_ttm_move] (Deming Wang)
- Delete the live_hearbeat_fast selftest [gt] (Krzysztof Niemiec)
The following changes since commit 431c590c3ab0469dfedad3a832fe73556396ee52:

  drm/tests: Add a unit test for range bias allocation (2024-05-16 12:50:14 
+1000)

are available in the Git repository at:

  https://gitlab.freedesktop.org/drm/i915/kernel.git 
tags/drm-intel-gt-next-2024-06-12

for you to fetch changes up to 79655e867ad6dfde2734c67c7704c0dd5bf1e777:

  drm/i915/mtl: Update workaround 14018575942 (2024-06-11 16:06:20 +0200)


UAPI Changes:

- Support replaying GPU hangs with captured context image (Tvrtko Ursulin)

Driver Changes:

Fixes/improvements/new stuff:

- Automate CCS Mode setting during engine resets [gt] (Andi Shyti)
- Revert "drm/i915: Remove extra multi-gt pm-references" (Janusz Krzysztofik)
- Fix HAS_REGION() usage in intel_gt_probe_lmem() (Ville Syrjälä)
- Disarm breadcrumbs if engines are already idle [gt] (Chris Wilson)
- Shadow default engine context image in the context (Tvrtko Ursulin)
- Support replaying GPU hangs with captured context image (Tvrtko Ursulin)
- avoid FIELD_PREP warning [guc] (Arnd Bergmann)
- Fix CCS id's calculation for CCS mode setting [gt] (Andi Shyti)
- Increase FLR timeout from 3s to 9s (Andi Shyti)
- Update workaround 14018575942 [mtl] (Angus Chen)

Future platform enablement:

- Enable w/a 16021333562 for DG2, MTL and ARL [guc] (John Harrison)

Miscellaneous:

- Pass the region ID rather than a bitmask to HAS_REGION() (Ville Syrjälä)
- Remove counter productive REGION_* wrappers (Ville Syrjälä)
- Fix typo [gem/i915_gem_ttm_move] (Deming Wang)
- Delete the live_hearbeat_fast selftest [gt] (Krzysztof Niemiec)


Andi Shyti (3):
  drm/i915/gt: Automate CCS Mode setting during engine resets
  drm/i915/gt: Fix CCS id's calculation for CCS mode setting
  drm/i915: Increase FLR timeout from 3s to 9s

Angus Chen (1):
  drm/i915/mtl: Update workaround 14018575942

Arnd Bergmann (1):
  drm/i915/guc: avoid FIELD_PREP warning

Chris Wilson (1):
  drm/i915/gt: Disarm breadcrumbs if engines are already idle

Deming Wang (1):
  drm/i915/gem/i915_gem_ttm_move: Fix typo

Janusz Krzysztofik (1):
  Revert "drm/i915: Remove extra multi-gt pm-references"

John Harrison (1):
  drm/i915/guc: Enable w/a 16021333562 for DG2, MTL and ARL

Niemiec, Krzysztof (1):
  drm/i915/gt: Delete the live_hearbeat_fast selftest

Tvrtko Ursulin (3):
  Merge drm/drm-next into drm-intel-gt-next
  drm/i915: Shadow default engine context image in the context
  drm/i915: Support replaying GPU hangs with captured context image

Ville Syrjälä (3):
  drm/i915: Fix HAS_REGI

Re: [PATCH] drm/i915/gt: debugfs: Evaluate forcewake usage within locks

2024-06-11 Thread Tvrtko Ursulin



On 10/06/2024 10:24, Nirmoy Das wrote:

Hi Andi,

On 6/7/2024 4:51 PM, Andi Shyti wrote:

The forcewake count and domains listing is multi process critical
and the uncore provides a spinlock for such cases.

Lock the forcewake evaluation section in the fw_domains_show()
debugfs interface.

Signed-off-by: Andi Shyti 


Needs a Fixes tag, below seems to be correct one.


Fixes: 9dd4b065446a ("drm/i915/gt: Move pm debug files into a gt aware 
debugfs")


Cc:  # v5.6+

Reviewed-by: Nirmoy Das 


What is the back story here and why would it need backporting? IGT cares 
about the atomic view of user_forcewake_count and individual domains or 
what?


Regards,

Tvrtko




Regards,

Nirmoy



---
  drivers/gpu/drm/i915/gt/intel_gt_pm_debugfs.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm_debugfs.c 
b/drivers/gpu/drm/i915/gt/intel_gt_pm_debugfs.c

index 4fcba42cfe34..0437fd8217e0 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm_debugfs.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm_debugfs.c
@@ -71,6 +71,8 @@ static int fw_domains_show(struct seq_file *m, void 
*data)

  struct intel_uncore_forcewake_domain *fw_domain;
  unsigned int tmp;
+    spin_lock_irq(&uncore->lock);
+
  seq_printf(m, "user.bypass_count = %u\n",
 uncore->user_forcewake_count);
@@ -79,6 +81,8 @@ static int fw_domains_show(struct seq_file *m, void 
*data)

 intel_uncore_forcewake_domain_to_str(fw_domain->id),
 READ_ONCE(fw_domain->wake_count));
+    spin_unlock_irq(&uncore->lock);
+
  return 0;
  }
  DEFINE_INTEL_GT_DEBUGFS_ATTRIBUTE(fw_domains);


Re: [PATCH 00/10] drm/i915: PREEMPT_RT related fixups.

2024-06-11 Thread Tvrtko Ursulin



Hi Sebastian,

On 05/06/2024 11:01, Sebastian Andrzej Siewior wrote:

On 2024-04-05 16:18:18 [+0200], To intel-gfx@lists.freedesktop.org wrote:
Hi,


The following patches are from the PREEMPT_RT queue.  It is mostly about
disabling interrupts/preemption which leads to problems. Unfortunately

…

Could I please get some feedback? I didn't receive anything but
automated mails from bots and I can't tell if this is a problem or not.

As of -rc2 I noticed that I can drop
[PATCH 06/10] drm/i915/gt: Queue and wait for the irq_work item.

from the series.


Previous CI results have unfortunately expired by now. I have tried 
re-queuing it but it also does not apply any longer so I'm afraid you 
will have to respin before anyone can see the results.


And this is not to say that I can promise someone will look at it or 
when. Maybe Jani you could ask for volunteers regarding the display 
related patches (head of the series) and Rodrigo you about the GuC 
change in 9/10?


Regards,

Tvrtko


Re: [PATCH] drm/i915/gt: Delete the live_hearbeat_fast selftest

2024-06-10 Thread Tvrtko Ursulin



Hi Andi,

On 10/06/2024 13:10, Andi Shyti wrote:

Hi Tvrtko,

On Mon, Jun 10, 2024 at 12:42:31PM +0100, Tvrtko Ursulin wrote:

On 03/06/2024 17:20, Niemiec, Krzysztof wrote:

The test is trying to push the heartbeat frequency to the limit, which
might sometimes fail. Such a failure does not provide valuable
information, because it does not indicate that there is something
necessarily wrong with either the driver or the hardware.

Remove the test to prevent random, unnecessary failures from appearing
in CI.

Suggested-by: Chris Wilson 
Signed-off-by: Niemiec, Krzysztof 


Just a note in passing that comma in the email display name is I believe not
RFC 5322 compliant and there might be tools which barf on it(*). If you can
put it in double quotes, it would be advisable.


yes, we discussed it with Krzysztof, I noticed it right after I
submitted the code.


Regards,

Tvrtko

*) Such as my internal pull request generator which uses CPAN's
Email::Address::XS. :)


If we are in time, we can fix it as Krzysztof Niemiec 


Sorry about this oversight,


It's not a big deal (it isn't the first and only occurrence) and no need 
to do anything more than correct the display name going forward.


Regards,

Tvrtko


Re: [PATCH] drm/i915/gt: Delete the live_hearbeat_fast selftest

2024-06-10 Thread Tvrtko Ursulin



On 03/06/2024 17:20, Niemiec, Krzysztof wrote:

The test is trying to push the heartbeat frequency to the limit, which
might sometimes fail. Such a failure does not provide valuable
information, because it does not indicate that there is something
necessarily wrong with either the driver or the hardware.

Remove the test to prevent random, unnecessary failures from appearing
in CI.

Suggested-by: Chris Wilson 
Signed-off-by: Niemiec, Krzysztof 


Just a note in passing that comma in the email display name is I believe 
not RFC 5322 compliant and there might be tools which barf on it(*). If 
you can put it in double quotes, it would be advisable.


Regards,

Tvrtko

*) Such as my internal pull request generator which uses CPAN's 
Email::Address::XS. :)
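
As an illustration of the quoting suggested above, here is a minimal sketch of a helper that emits a display name as an RFC 5322 quoted-string whenever it contains one of the "specials" characters. format_mailbox() is a hypothetical helper written for this example and user@example.com is a placeholder address.

```c
#include <stdio.h>
#include <string.h>

/*
 * Write "name <addr>" into out, quoting the display name if it contains
 * an RFC 5322 "special" such as a comma. Returns the snprintf() result,
 * or -1 if the display name is too long for the local buffer.
 */
static int format_mailbox(char *out, size_t len,
			  const char *name, const char *addr)
{
	/* Characters that may not appear in an unquoted atom. */
	static const char specials[] = "()<>[]:;@\\,.\"";
	char quoted[256];
	size_t n = 0;
	const char *p;

	if (!strpbrk(name, specials))
		return snprintf(out, len, "%s <%s>", name, addr);

	quoted[n++] = '"';
	for (p = name; *p; p++) {
		/* Leave room for an escape, the closing quote and the NUL. */
		if (n + 4 > sizeof(quoted))
			return -1;
		/* Inside a quoted-string, '"' and '\' must be escaped. */
		if (*p == '"' || *p == '\\')
			quoted[n++] = '\\';
		quoted[n++] = *p;
	}
	quoted[n++] = '"';
	quoted[n] = '\0';

	return snprintf(out, len, "%s <%s>", quoted, addr);
}
```

With this helper a name written as surname, comma, given name comes out wrapped in double quotes, while a name without specials is passed through unchanged.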



---
  .../drm/i915/gt/selftest_engine_heartbeat.c   | 110 --
  1 file changed, 110 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c 
b/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
index ef014df4c4fc..9e4f0e417b3b 100644
--- a/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
+++ b/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
@@ -193,115 +193,6 @@ static int live_idle_pulse(void *arg)
return err;
  }
  
-static int cmp_u32(const void *_a, const void *_b)

-{
-   const u32 *a = _a, *b = _b;
-
-   return *a - *b;
-}
-
-static int __live_heartbeat_fast(struct intel_engine_cs *engine)
-{
-   const unsigned int error_threshold = max(2u, jiffies_to_usecs(6));
-   struct intel_context *ce;
-   struct i915_request *rq;
-   ktime_t t0, t1;
-   u32 times[5];
-   int err;
-   int i;
-
-   ce = intel_context_create(engine);
-   if (IS_ERR(ce))
-   return PTR_ERR(ce);
-
-   intel_engine_pm_get(engine);
-
-   err = intel_engine_set_heartbeat(engine, 1);
-   if (err)
-   goto err_pm;
-
-   for (i = 0; i < ARRAY_SIZE(times); i++) {
-   do {
-   /* Manufacture a tick */
-   intel_engine_park_heartbeat(engine);
-   GEM_BUG_ON(engine->heartbeat.systole);
-   engine->serial++; /*  pretend we are not idle! */
-   intel_engine_unpark_heartbeat(engine);
-
-   flush_delayed_work(&engine->heartbeat.work);
-   if (!delayed_work_pending(&engine->heartbeat.work)) {
-   pr_err("%s: heartbeat %d did not start\n",
-  engine->name, i);
-   err = -EINVAL;
-   goto err_pm;
-   }
-
-   rcu_read_lock();
-   rq = READ_ONCE(engine->heartbeat.systole);
-   if (rq)
-   rq = i915_request_get_rcu(rq);
-   rcu_read_unlock();
-   } while (!rq);
-
-   t0 = ktime_get();
-   while (rq == READ_ONCE(engine->heartbeat.systole))
-   yield(); /* work is on the local cpu! */
-   t1 = ktime_get();
-
-   i915_request_put(rq);
-   times[i] = ktime_us_delta(t1, t0);
-   }
-
-   sort(times, ARRAY_SIZE(times), sizeof(times[0]), cmp_u32, NULL);
-
-   pr_info("%s: Heartbeat delay: %uus [%u, %u]\n",
-   engine->name,
-   times[ARRAY_SIZE(times) / 2],
-   times[0],
-   times[ARRAY_SIZE(times) - 1]);
-
-   /*
-* Ideally, the upper bound on min work delay would be something like
-* 2 * 2 (worst), +1 for scheduling, +1 for slack. In practice, we
-* are, even with system_wq_highpri, at the mercy of the CPU scheduler
-* and may be stuck behind some slow work for many millisecond. Such
-* as our very own display workers.
-*/
-   if (times[ARRAY_SIZE(times) / 2] > error_threshold) {
-   pr_err("%s: Heartbeat delay was %uus, expected less than 
%dus\n",
-  engine->name,
-  times[ARRAY_SIZE(times) / 2],
-  error_threshold);
-   err = -EINVAL;
-   }
-
-   reset_heartbeat(engine);
-err_pm:
-   intel_engine_pm_put(engine);
-   intel_context_put(ce);
-   return err;
-}
-
-static int live_heartbeat_fast(void *arg)
-{
-   struct intel_gt *gt = arg;
-   struct intel_engine_cs *engine;
-   enum intel_engine_id id;
-   int err = 0;
-
-   /* Check that the heartbeat ticks at the desired rate. */
-   if (!CONFIG_DRM_I915_HEARTBEAT_INTERVAL)
-   return 0;
-
-   for_each_engine(engine, gt, id) {
-   err = __live_heartbeat_fast(engine);
-   if (err)
-   break;
-   }
-
-   return err;
-}
-
  static int __live_heartbeat_off(struct intel_engine_cs *engine)
  {
int err;
@@ -372,7 +263,6 @@ i

Re: [PATCH] drm/i915/dpt: Make DPT object unshrinkable

2024-05-23 Thread Tvrtko Ursulin



On 23/05/2024 13:24, Ville Syrjälä wrote:

On Thu, May 23, 2024 at 01:07:24PM +0100, Tvrtko Ursulin wrote:


On 23/05/2024 12:19, Ville Syrjälä wrote:

On Thu, May 23, 2024 at 09:25:45AM +0100, Tvrtko Ursulin wrote:


On 22/05/2024 16:29, Vidya Srinivas wrote:

In some scenarios, the DPT object gets shrunk but
the actual framebuffer did not and thus it's still
there on the DPT's vm->bound_list. Then it tries to
rewrite the PTEs via a stale CPU mapping. This causes a panic.

Suggested-by: Ville Syrjala 
Cc: sta...@vger.kernel.org
Fixes: 0dc987b699ce ("drm/i915/display: Add smem fallback allocation for dpt")
Signed-off-by: Vidya Srinivas 
---
drivers/gpu/drm/i915/gem/i915_gem_object.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h 
b/drivers/gpu/drm/i915/gem/i915_gem_object.h
index 3560a062d287..e6b485fc54d4 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
@@ -284,7 +284,8 @@ bool i915_gem_object_has_iomem(const struct 
drm_i915_gem_object *obj);
static inline bool
i915_gem_object_is_shrinkable(const struct drm_i915_gem_object *obj)
{
-   return i915_gem_object_type_has(obj, I915_GEM_OBJECT_IS_SHRINKABLE);
+   return i915_gem_object_type_has(obj, I915_GEM_OBJECT_IS_SHRINKABLE) &&
+   !obj->is_dpt;


Is there a reason i915_gem_object_make_unshrinkable() cannot be used to
mark the object at a suitable place?


Do you have a suitable place in mind?
i915_gem_object_make_unshrinkable() contains some magic
ingredients so doesn't look like it can be called willy
nilly.


After it is created in intel_dpt_create?

I don't see why that helper couldn't be called. It is called from madvise
and tiling for instance without any apparent special considerations.


Did you actually read through i915_gem_object_make_unshrinkable()?


Briefly, and also looked around how it is used. I don't immediately 
understand which part concerns you and it is also quite possible I am 
missing something.


But see for example how it is used in intel_context.c+intel_lrc.c to 
protect the context state object from the shrinker while it is in use by 
the GPU. It does not appear any black magic is required.
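As a rough illustration of the two approaches being weighed here, a minimal userspace sketch follows. All names in it (struct obj, shrink_pin, obj_make_unshrinkable() and friends) are hypothetical stand-ins and not the i915 implementation; it only models a static, type-based check against a runtime pin that is taken and dropped around a period of use.

```c
#include <stdbool.h>

/* Hypothetical model of a GEM object; not the real i915 structure. */
struct obj {
	bool type_shrinkable;	/* static property of the object type */
	bool is_dpt;		/* special-cased by the patch under review */
	int shrink_pin;		/* runtime count of "keep me resident" users */
};

/* Patch approach: a DPT object is never considered shrinkable. */
static bool obj_is_shrinkable_static(const struct obj *o)
{
	return o->type_shrinkable && !o->is_dpt;
}

/* Alternative: protection is taken at runtime while the object is in use. */
static void obj_make_unshrinkable(struct obj *o)
{
	o->shrink_pin++;
}

/* Drop the protection again once the user is done with the object. */
static void obj_make_shrinkable(struct obj *o)
{
	if (o->shrink_pin > 0)
		o->shrink_pin--;
}

/* With the runtime scheme the object is shrinkable only while unpinned. */
static bool obj_is_shrinkable_pinned(const struct obj *o)
{
	return o->type_shrinkable && o->shrink_pin == 0;
}
```

In this model the runtime variant lets the object become shrinkable again once nothing holds it, which is the property the static check gives up.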


The question is also whether that kind of lifetime aligns with the DPT use case.


Also, there is no mention of this angle in the commit message so I
assumed it wasn't considered. If it was, then it should have been
mentioned why the hacky solution was chosen instead...


I suppose.




Anyways, looks like I forgot to reply that I already pushed this
with this extra comment added:
/* TODO: make DPT shrinkable when it has no bound vmas */


... because IMO the special case is quite ugly and out of place. :(


Yeah, not the nicest. But there's already a is_dpt check in the
i915_gem_object_is_framebuffer() right next door, so it's not
*that* out of place.


I also see who added that one! ;)


Another option maybe could be to manually clear
I915_GEM_OBJECT_IS_SHRINKABLE but I don't think that is
supposed to be mutable, so might also have other issues.
So a more proper solution with that approach would perhaps
need some kind of gem_create_shmem_unshrinkable() function.



I don't remember off the top of my head how DPT magic works but if
shrinker protection needs to be tied with VMAs there is also
i915_make_make(un)shrinkable to try.


I presume you mistyped something there.


Oops - i915_vma_make_(un)shrinkable.

Anyway, I think it is worth giving it a try if the DPT lifetimes make
it possible.


Regards,

Tvrtko


Re: [PATCH] drm/i915/dpt: Make DPT object unshrinkable

2024-05-23 Thread Tvrtko Ursulin



On 23/05/2024 12:19, Ville Syrjälä wrote:

On Thu, May 23, 2024 at 09:25:45AM +0100, Tvrtko Ursulin wrote:


On 22/05/2024 16:29, Vidya Srinivas wrote:

In some scenarios, the DPT object gets shrunk but
the actual framebuffer did not and thus it's still
there on the DPT's vm->bound_list. Then it tries to
rewrite the PTEs via a stale CPU mapping. This causes panic.

Suggested-by: Ville Syrjala 
Cc: sta...@vger.kernel.org
Fixes: 0dc987b699ce ("drm/i915/display: Add smem fallback allocation for dpt")
Signed-off-by: Vidya Srinivas 
---
   drivers/gpu/drm/i915/gem/i915_gem_object.h | 3 ++-
   1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h 
b/drivers/gpu/drm/i915/gem/i915_gem_object.h
index 3560a062d287..e6b485fc54d4 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
@@ -284,7 +284,8 @@ bool i915_gem_object_has_iomem(const struct 
drm_i915_gem_object *obj);
   static inline bool
   i915_gem_object_is_shrinkable(const struct drm_i915_gem_object *obj)
   {
-   return i915_gem_object_type_has(obj, I915_GEM_OBJECT_IS_SHRINKABLE);
+   return i915_gem_object_type_has(obj, I915_GEM_OBJECT_IS_SHRINKABLE) &&
+   !obj->is_dpt;


Is there a reason i915_gem_object_make_unshrinkable() cannot be used to
mark the object at a suitable place?


Do you have a suitable place in mind?
i915_gem_object_make_unshrinkable() contains some magic
ingredients so doesn't look like it can be called willy
nilly.


After it is created in intel_dpt_create?

I don't see why that helper couldn't be called. It is called from madvise
and tiling for instance without any apparent special considerations.


Also, there is no mention of this angle in the commit message so I 
assumed it wasn't considered. If it was, then it should have been 
mentioned why the hacky solution was chosen instead...



Anyways, looks like I forgot to reply that I already pushed this
with this extra comment added:
/* TODO: make DPT shrinkable when it has no bound vmas */


... because IMO the special case is quite ugly and out of place. :(

I don't remember off the top of my head how DPT magic works but if
shrinker protection needs to be tied with VMAs there is also 
i915_make_make(un)shrinkable to try.


Regards,

Tvrtko


Re: [PATCH] drm/i915/dpt: Make DPT object unshrinkable

2024-05-23 Thread Tvrtko Ursulin



On 22/05/2024 16:29, Vidya Srinivas wrote:

In some scenarios, the DPT object gets shrunk but
the actual framebuffer did not and thus it's still
there on the DPT's vm->bound_list. Then it tries to
rewrite the PTEs via a stale CPU mapping. This causes panic.

Suggested-by: Ville Syrjala 
Cc: sta...@vger.kernel.org
Fixes: 0dc987b699ce ("drm/i915/display: Add smem fallback allocation for dpt")
Signed-off-by: Vidya Srinivas 
---
  drivers/gpu/drm/i915/gem/i915_gem_object.h | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h 
b/drivers/gpu/drm/i915/gem/i915_gem_object.h
index 3560a062d287..e6b485fc54d4 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
@@ -284,7 +284,8 @@ bool i915_gem_object_has_iomem(const struct 
drm_i915_gem_object *obj);
  static inline bool
  i915_gem_object_is_shrinkable(const struct drm_i915_gem_object *obj)
  {
-   return i915_gem_object_type_has(obj, I915_GEM_OBJECT_IS_SHRINKABLE);
+   return i915_gem_object_type_has(obj, I915_GEM_OBJECT_IS_SHRINKABLE) &&
+   !obj->is_dpt;


Is there a reason i915_gem_object_make_unshrinkable() cannot be used to 
mark the object at a suitable place?


Regards,

Tvrtko


  }
  
  static inline bool


[PATCH] drm/i915: 2 GiB of relocations ought to be enough for anybody*

2024-05-21 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

Kernel test robot reports i915 can hit a warn in kvmalloc_node which has
a purpose of disallowing crazy size kernel allocations. This was added in
7661809d493b ("mm: don't allow oversized kvmalloc() calls"):

   /* Don't even allow crazy sizes */
   if (WARN_ON_ONCE(size > INT_MAX))
   return NULL;

This would be kind of okay since i915 at one point dropped the need for
making a shadow copy of the relocation list, but then it got re-added in
fd1500fcd442 ("Revert "drm/i915/gem: Drop relocation slowpath".") a year
after Linus added the above warning.

It is plausible that the issue was not seen until now because triggering it
with the gem_exec_reloc test requires a combination of relatively older
generation hardware and at least 8GiB of RAM installed. Probably even
more depending on runtime checks.

Let's cap what we allow userspace to pass in using the matching limit.
There should be no issue for real userspace since we are talking about
"crazy" number of relocations which have no practical purpose.
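To make the effect of the new limit concrete, here is a small standalone sketch. It assumes N_RELOC(x) divides a byte budget by the size of one relocation entry, and uses a hypothetical 32-byte struct reloc_entry as a stand-in for the uapi structure; check_reloc_count() is likewise an illustrative name, not a driver function.

```c
#include <errno.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical stand-in for a relocation entry; assumed to be 32 bytes. */
struct reloc_entry {
	uint32_t target_handle;
	uint32_t delta;
	uint64_t offset;
	uint64_t presumed_offset;
	uint32_t read_domains;
	uint32_t write_domain;
};

/* Assumed shape of the helper: how many entries fit in x bytes. */
#define N_RELOC(x) ((x) / sizeof(struct reloc_entry))

/*
 * Reject any count whose backing array would exceed INT_MAX bytes,
 * which is the size at which kvmalloc() starts to warn and fail.
 */
static int check_reloc_count(unsigned long count)
{
	if (count > N_RELOC(INT_MAX))
		return -EINVAL;

	return 0;
}
```

With a 32-byte entry and a 32-bit int this caps the count at 67108863 entries, just under 2 GiB of relocation data.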

*) Well IGT tests might get upset but they can be easily adjusted.

Signed-off-by: Tvrtko Ursulin 
Reported-by: kernel test robot 
Closes: 
https://lore.kernel.org/oe-lkp/202405151008.6ddd1aaf-oliver.s...@intel.com
Cc: Kees Cook 
Cc: Kent Overstreet 
Cc: Joonas Lahtinen 
Cc: Rodrigo Vivi 
---
 drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index d3a771afb083..4b34bf4fde77 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -1533,7 +1533,7 @@ static int eb_relocate_vma(struct i915_execbuffer *eb, 
struct eb_vma *ev)
u64_to_user_ptr(entry->relocs_ptr);
unsigned long remain = entry->relocation_count;
 
-   if (unlikely(remain > N_RELOC(ULONG_MAX)))
+   if (unlikely(remain > N_RELOC(INT_MAX)))
return -EINVAL;
 
/*
@@ -1641,7 +1641,7 @@ static int check_relocations(const struct 
drm_i915_gem_exec_object2 *entry)
if (size == 0)
return 0;
 
-   if (size > N_RELOC(ULONG_MAX))
+   if (size > N_RELOC(INT_MAX))
return -EINVAL;
 
addr = u64_to_user_ptr(entry->relocs_ptr);
-- 
2.44.0



[CI 2/2] drm/i915: Support replaying GPU hangs with captured context image

2024-05-14 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

When debugging GPU hangs Mesa developers are finding it useful to replay
the captured error state against the simulator. But due to various simulator
limitations which prevent replicating all hangs, one step further is being
able to replay against a real GPU.

This is almost doable today with the missing part being able to upload the
captured context image into the driver state prior to executing the
uploaded hanging batch and all the buffers.

To enable this last part we add a new context parameter called
I915_CONTEXT_PARAM_CONTEXT_IMAGE. It follows the existing SSEU
configuration pattern of being able to select which context to apply
against, paired with the actual image and its size.

Since this is adding a new concept of debug only uapi, we hide it behind
a new kconfig option and also require activation with a module parameter.
Together with a warning banner printed at driver load, all those combined
should be sufficient to guard against inadvertently enabling the feature.
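The layered opt-in described above can be sketched in isolation as follows. This is only a model under stated assumptions: REPLAY_API_BUILT_IN stands in for the kconfig option, struct params for the module parameter storage, and debug_api_allowed() is a hypothetical helper name.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical build-time switch standing in for the kconfig option. */
#ifndef REPLAY_API_BUILT_IN
#define REPLAY_API_BUILT_IN 1
#endif

/* Hypothetical runtime switch standing in for the module parameter. */
struct params {
	bool enable_debug_only_api;
};

/*
 * Every gate must pass before the debug-only uapi is usable:
 * compiled in, enabled at load time, and a large enough argument.
 */
static int debug_api_allowed(const struct params *p,
			     size_t arg_size, size_t want)
{
	if (!REPLAY_API_BUILT_IN)
		return -EINVAL;	/* feature compiled out */

	if (!p->enable_debug_only_api)
		return -EINVAL;	/* not activated by the administrator */

	if (arg_size < want)
		return -EINVAL;	/* userspace passed a short structure */

	return 0;
}
```

Returning the same error for every gate means userspace cannot tell a compiled-out feature from a disabled one, which seems in keeping with the debug-only intent.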

In terms of implementation we allow the legacy context set param to be
used since that removes the need to record the per context data in the
proto context, while still allowing flexibility of specifying context
images for any context.

Mesa MR using the uapi can be seen at:
  https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27594

v2:
 * Fix whitespace alignment as per checkpatch.
 * Added warning on userspace misuse.
 * Rebase for extracting ce->default_state shadowing.

v3:
 * Rebase for I915_CONTEXT_PARAM_LOW_LATENCY.

Signed-off-by: Tvrtko Ursulin 
Cc: Lionel Landwerlin 
Cc: Carlos Santa 
Cc: Rodrigo Vivi 
Reviewed-by: Rodrigo Vivi 
Tested-by: Carlos Santa 
Signed-off-by: Tvrtko Ursulin 
---
 drivers/gpu/drm/i915/Kconfig.debug|  17 +++
 drivers/gpu/drm/i915/gem/i915_gem_context.c   | 113 ++
 drivers/gpu/drm/i915/gt/intel_context.c   |   2 +
 drivers/gpu/drm/i915/gt/intel_context.h   |  22 
 drivers/gpu/drm/i915/gt/intel_context_types.h |   1 +
 drivers/gpu/drm/i915/gt/intel_lrc.c   |   3 +-
 .../gpu/drm/i915/gt/intel_ring_submission.c   |   3 +-
 drivers/gpu/drm/i915/i915_params.c|   5 +
 drivers/gpu/drm/i915/i915_params.h|   3 +-
 include/uapi/drm/i915_drm.h   |  27 +
 10 files changed, 193 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/Kconfig.debug 
b/drivers/gpu/drm/i915/Kconfig.debug
index d8397065c3f0..1852e0804942 100644
--- a/drivers/gpu/drm/i915/Kconfig.debug
+++ b/drivers/gpu/drm/i915/Kconfig.debug
@@ -16,6 +16,23 @@ config DRM_I915_WERROR
 
  If in doubt, say "N".
 
+config DRM_I915_REPLAY_GPU_HANGS_API
+   bool "Enable GPU hang replay userspace API"
+   depends on DRM_I915
+   depends on EXPERT
+   default n
+   help
+ Choose this option if you want to enable special and unstable
+ userspace API used for replaying GPU hangs on a running system.
+
+ This API is intended to be used by userspace graphics stack developers
+ and provides no stability guarantees.
+
+ The API needs to be activated at boot time using the
+ enable_debug_only_api module parameter.
+
+ If in doubt, say "N".
+
 config DRM_I915_DEBUG
bool "Enable additional driver debugging"
depends on DRM_I915
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 81f65cab1330..c0543c35cd6a 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -78,6 +78,7 @@
 #include "gt/intel_engine_user.h"
 #include "gt/intel_gpu_commands.h"
 #include "gt/intel_ring.h"
+#include "gt/shmem_utils.h"
 
 #include "pxp/intel_pxp.h"
 
@@ -957,6 +958,7 @@ static int set_proto_ctx_param(struct drm_i915_file_private 
*fpriv,
case I915_CONTEXT_PARAM_NO_ZEROMAP:
case I915_CONTEXT_PARAM_BAN_PERIOD:
case I915_CONTEXT_PARAM_RINGSIZE:
+   case I915_CONTEXT_PARAM_CONTEXT_IMAGE:
default:
ret = -EINVAL;
break;
@@ -2104,6 +2106,95 @@ static int get_protected(struct i915_gem_context *ctx,
return 0;
 }
 
+static int set_context_image(struct i915_gem_context *ctx,
+struct drm_i915_gem_context_param *args)
+{
+   struct i915_gem_context_param_context_image user;
+   struct intel_context *ce;
+   struct file *shmem_state;
+   unsigned long lookup;
+   void *state;
+   int ret = 0;
+
+   if (!IS_ENABLED(CONFIG_DRM_I915_REPLAY_GPU_HANGS_API))
+   return -EINVAL;
+
+   if (!ctx->i915->params.enable_debug_only_api)
+   return -EINVAL;
+
+   if (args->size < sizeof(user))
+   return -EINVAL;
+
+   if (copy_from_user(&user, u64_to_user_ptr(args->value), sizeof(user)))

[CI 1/2] drm/i915: Shadow default engine context image in the context

2024-05-14 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

To enable adding override of the default engine context image let us start
shadowing the per engine state in the context.
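The idea can be shown with a minimal standalone model. The structures and helpers below (struct engine, struct context, context_alloc(), context_init_state()) are simplified, hypothetical stand-ins for the driver types, and a plain memcpy() stands in for the shmem read.

```c
#include <stddef.h>
#include <string.h>

/* Simplified engine: owns the default context image. */
struct engine {
	const unsigned char *default_state;
	size_t context_size;
};

/* Simplified context: keeps its own pointer to the image to use. */
struct context {
	const struct engine *engine;
	const unsigned char *default_state;	/* shadow of engine->default_state */
};

/* At allocation time the context starts out with the engine default. */
static void context_alloc(struct context *ce, const struct engine *engine)
{
	ce->engine = engine;
	ce->default_state = engine->default_state;
}

/*
 * State initialisation only ever looks at the per-context pointer, so
 * replacing it later overrides the image without touching the engine.
 * Returns 1 if an image was copied, 0 if there was none.
 */
static int context_init_state(const struct context *ce, unsigned char *state)
{
	if (!ce->default_state)
		return 0;

	memcpy(state, ce->default_state, ce->engine->context_size);
	return 1;
}
```

Once every reader goes through the context pointer, a follow-up change can swap in a different image for one context while all others keep the engine default.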

Signed-off-by: Tvrtko Ursulin 
Cc: Lionel Landwerlin 
Cc: Carlos Santa 
Cc: Rodrigo Vivi 
Reviewed-by: Rodrigo Vivi 
Signed-off-by: Tvrtko Ursulin 
---
 drivers/gpu/drm/i915/gt/intel_context_types.h   | 2 ++
 drivers/gpu/drm/i915/gt/intel_lrc.c | 7 ---
 drivers/gpu/drm/i915/gt/intel_ring_submission.c | 7 ---
 3 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h 
b/drivers/gpu/drm/i915/gt/intel_context_types.h
index ed95a7b57cbb..6ae8abfeccdb 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -99,6 +99,8 @@ struct intel_context {
struct i915_address_space *vm;
struct i915_gem_context __rcu *gem_context;
 
+   struct file *default_state;
+
/*
 * @signal_lock protects the list of requests that need signaling,
 * @signals. While there are any requests that need signaling,
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c 
b/drivers/gpu/drm/i915/gt/intel_lrc.c
index b387146ede98..d4ffb352403c 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -1017,9 +1017,8 @@ void lrc_init_state(struct intel_context *ce,
 
set_redzone(state, engine);
 
-   if (engine->default_state) {
-   shmem_read(engine->default_state, 0,
-  state, engine->context_size);
+   if (ce->default_state) {
+   shmem_read(ce->default_state, 0, state, engine->context_size);
__set_bit(CONTEXT_VALID_BIT, &ce->flags);
inhibit = false;
}
@@ -1131,6 +1130,8 @@ int lrc_alloc(struct intel_context *ce, struct 
intel_engine_cs *engine)
 
GEM_BUG_ON(ce->state);
 
+   ce->default_state = engine->default_state;
+
vma = __lrc_alloc_state(ce, engine);
if (IS_ERR(vma))
return PTR_ERR(vma);
diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c 
b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
index 92085ffd23de..8625e88e785f 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
@@ -474,8 +474,7 @@ static int ring_context_init_default_state(struct 
intel_context *ce,
if (IS_ERR(vaddr))
return PTR_ERR(vaddr);
 
-   shmem_read(ce->engine->default_state, 0,
-  vaddr, ce->engine->context_size);
+   shmem_read(ce->default_state, 0, vaddr, ce->engine->context_size);
 
i915_gem_object_flush_map(obj);
__i915_gem_object_release_map(obj);
@@ -491,7 +490,7 @@ static int ring_context_pre_pin(struct intel_context *ce,
struct i915_address_space *vm;
int err = 0;
 
-   if (ce->engine->default_state &&
+   if (ce->default_state &&
!test_bit(CONTEXT_VALID_BIT, &ce->flags)) {
err = ring_context_init_default_state(ce, ww);
if (err)
@@ -570,6 +569,8 @@ static int ring_context_alloc(struct intel_context *ce)
 {
struct intel_engine_cs *engine = ce->engine;
 
+   ce->default_state = engine->default_state;
+
/* One ringbuffer to rule them all */
GEM_BUG_ON(!engine->legacy.ring);
ce->ring = engine->legacy.ring;
-- 
2.44.0



[CI 2/2] drm/i915: Support replaying GPU hangs with captured context image

2024-05-14 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

When debugging GPU hangs Mesa developers are finding it useful to replay
the captured error state against the simulator. But due to various simulator
limitations which prevent replicating all hangs, one step further is being
able to replay against a real GPU.

This is almost doable today with the missing part being able to upload the
captured context image into the driver state prior to executing the
uploaded hanging batch and all the buffers.

To enable this last part we add a new context parameter called
I915_CONTEXT_PARAM_CONTEXT_IMAGE. It follows the existing SSEU
configuration pattern of being able to select which context to apply
against, paired with the actual image and its size.

Since this is adding a new concept of debug only uapi, we hide it behind
a new kconfig option and also require activation with a module parameter.
Together with a warning banner printed at driver load, all those combined
should be sufficient to guard against inadvertently enabling the feature.

In terms of implementation we allow the legacy context set param to be
used since that removes the need to record the per context data in the
proto context, while still allowing flexibility of specifying context
images for any context.

Mesa MR using the uapi can be seen at:
  https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27594

v2:
 * Fix whitespace alignment as per checkpatch.
 * Added warning on userspace misuse.
 * Rebase for extracting ce->default_state shadowing.

Signed-off-by: Tvrtko Ursulin 
Cc: Lionel Landwerlin 
Cc: Carlos Santa 
Cc: Rodrigo Vivi 
Reviewed-by: Rodrigo Vivi 
Tested-by: Carlos Santa 
Signed-off-by: Tvrtko Ursulin 
---
 drivers/gpu/drm/i915/Kconfig.debug|  17 +++
 drivers/gpu/drm/i915/gem/i915_gem_context.c   | 113 ++
 drivers/gpu/drm/i915/gt/intel_context.c   |   2 +
 drivers/gpu/drm/i915/gt/intel_context.h   |  22 
 drivers/gpu/drm/i915/gt/intel_context_types.h |   1 +
 drivers/gpu/drm/i915/gt/intel_lrc.c   |   3 +-
 .../gpu/drm/i915/gt/intel_ring_submission.c   |   3 +-
 drivers/gpu/drm/i915/i915_params.c|   5 +
 drivers/gpu/drm/i915/i915_params.h|   3 +-
 include/uapi/drm/i915_drm.h   |  27 +
 10 files changed, 193 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/Kconfig.debug 
b/drivers/gpu/drm/i915/Kconfig.debug
index d8397065c3f0..1852e0804942 100644
--- a/drivers/gpu/drm/i915/Kconfig.debug
+++ b/drivers/gpu/drm/i915/Kconfig.debug
@@ -16,6 +16,23 @@ config DRM_I915_WERROR
 
  If in doubt, say "N".
 
+config DRM_I915_REPLAY_GPU_HANGS_API
+   bool "Enable GPU hang replay userspace API"
+   depends on DRM_I915
+   depends on EXPERT
+   default n
+   help
+ Choose this option if you want to enable special and unstable
+ userspace API used for replaying GPU hangs on a running system.
+
+ This API is intended to be used by userspace graphics stack developers
+ and provides no stability guarantees.
+
+ The API needs to be activated at boot time using the
+ enable_debug_only_api module parameter.
+
+ If in doubt, say "N".
+
 config DRM_I915_DEBUG
bool "Enable additional driver debugging"
depends on DRM_I915
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 81f65cab1330..c0543c35cd6a 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -78,6 +78,7 @@
 #include "gt/intel_engine_user.h"
 #include "gt/intel_gpu_commands.h"
 #include "gt/intel_ring.h"
+#include "gt/shmem_utils.h"
 
 #include "pxp/intel_pxp.h"
 
@@ -957,6 +958,7 @@ static int set_proto_ctx_param(struct drm_i915_file_private 
*fpriv,
case I915_CONTEXT_PARAM_NO_ZEROMAP:
case I915_CONTEXT_PARAM_BAN_PERIOD:
case I915_CONTEXT_PARAM_RINGSIZE:
+   case I915_CONTEXT_PARAM_CONTEXT_IMAGE:
default:
ret = -EINVAL;
break;
@@ -2104,6 +2106,95 @@ static int get_protected(struct i915_gem_context *ctx,
return 0;
 }
 
+static int set_context_image(struct i915_gem_context *ctx,
+struct drm_i915_gem_context_param *args)
+{
+   struct i915_gem_context_param_context_image user;
+   struct intel_context *ce;
+   struct file *shmem_state;
+   unsigned long lookup;
+   void *state;
+   int ret = 0;
+
+   if (!IS_ENABLED(CONFIG_DRM_I915_REPLAY_GPU_HANGS_API))
+   return -EINVAL;
+
+   if (!ctx->i915->params.enable_debug_only_api)
+   return -EINVAL;
+
+   if (args->size < sizeof(user))
+   return -EINVAL;
+
+   if (copy_from_user(&user, u64_to_user_ptr(args->value), sizeof(user)))
+   return -EFAULT;
+
+  

Re: [PATCH] MAINTAINERS: Move the drm-intel repo location to fd.o GitLab

2024-04-26 Thread Tvrtko Ursulin




On 26/04/2024 16:47, Lucas De Marchi wrote:

On Wed, Apr 24, 2024 at 01:41:59PM GMT, Ryszard Knop wrote:

The drm-intel repo is moving from the classic fd.o git host to GitLab.
Update its location with a URL matching other fd.o GitLab kernel trees.

Signed-off-by: Ryszard Knop 


Acked-by: Lucas De Marchi 

Also Cc'ing maintainers


Thanks,

Acked-by: Tvrtko Ursulin 

Regards,

Tvrtko


---
MAINTAINERS | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index d6327dc12cb1..fbf7371a0bb0 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -10854,7 +10854,7 @@ W:
https://drm.pages.freedesktop.org/intel-docs/

Q:    http://patchwork.freedesktop.org/project/intel-gfx/
B:
https://drm.pages.freedesktop.org/intel-docs/how-to-file-i915-bugs.html

C:    irc://irc.oftc.net/intel-gfx
-T:    git git://anongit.freedesktop.org/drm-intel
+T:    git https://gitlab.freedesktop.org/drm/i915/kernel.git
F:    Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon
F:    Documentation/gpu/i915.rst
F:    drivers/gpu/drm/ci/xfails/i915*
--
2.44.0



Re: [PATCH 1/2] drm/i915/display: remove small micro-optimizations in irq handling

2024-04-18 Thread Tvrtko Ursulin



On 18/04/2024 10:49, Jani Nikula wrote:

On Wed, 17 Apr 2024, Lucas De Marchi  wrote:

On Mon, Apr 08, 2024 at 03:54:44PM GMT, Jani Nikula wrote:

The raw register reads/writes are there as micro-optimizations to avoid
multiple pointer indirections on uncore->regs. Presumably this is useful
when there are plenty of register reads/writes in the same
function. However, the display irq handling only has a few raw
reads/writes. Remove them for simplification.


I think that comment didn't age well. Not to say there's something wrong
with this commit, but just to make sure we are aware of the additional
stuff going on and whether we are ok with that.

using intel_de_read() in place of raw_reg_read() will do (for newer
platforms):

1) Read FPGA_DBG to detect unclaimed access before the actual read
2) Find the relevant forcewake for that register, acquire and wait for ack
3) readl(reg)
4) Read FPGA_DBG to detect unclaimed access after the actual read
5) Trace reg rw

That's much more than a pointer indirection. Are we ok with that in the
irq?  Also, I don't know why but we have variants to skip tracing (step
5 above), but in my book a disabled tracepoint is orders of magnitude
less overhead than 1, 2 and 4.
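The difference in cost can be modelled with a toy register file that counts accesses. Everything here is hypothetical (struct mmio, REG_FPGA_DBG as index 0, raw_read(), checked_read()); it mirrors steps 1 to 4 of the list above, leaves tracing out, and counts the forcewake acquisition separately rather than modelling its own register traffic.

```c
#include <stdint.h>

#define NUM_REGS	16
#define REG_FPGA_DBG	0	/* hypothetical index of the debug register */

/* Toy register file with simple cost counters. */
struct mmio {
	uint32_t regs[NUM_REGS];
	unsigned int accesses;		/* MMIO reads issued */
	unsigned int forcewake_gets;	/* forcewake acquisitions */
};

/* Raw read: a single MMIO access and nothing else. */
static uint32_t raw_read(struct mmio *m, unsigned int reg)
{
	m->accesses++;
	return m->regs[reg];
}

/* Checked read: the target read bracketed by the extra work. */
static uint32_t checked_read(struct mmio *m, unsigned int reg)
{
	uint32_t val;

	(void)raw_read(m, REG_FPGA_DBG);	/* 1) unclaimed check before */
	m->forcewake_gets++;			/* 2) acquire forcewake */
	val = raw_read(m, reg);			/* 3) the actual read */
	(void)raw_read(m, REG_FPGA_DBG);	/* 4) unclaimed check after */

	return val;
}
```

In this model one checked read issues three MMIO reads and one forcewake acquisition, against a single MMIO read for the raw variant.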


Honestly, I don't really know.

The thing is, we have these ad hoc optimizations all over the place. Why
do we have the raw access in two places, but not everywhere in irq
handling? The pointer indirection thing really only makes sense if you
have a lot of access in a function, but that's not the case. You do have
a point about everything else.


The "why only two" places is I think simply an artefact of refactoring 
and code evolution. Initially all IRQ handling was in one function, then 
later gen11 and display parts got split out as more platforms were 
added. For example a3265d851e28 ("drm/i915/irq: Refactor gen11 display 
interrupt handling").


As for the original rationale, it was described in commits like:

2e4a5b25886c ("drm/i915: Prune gen8_gt_irq_handler")
c48a798a7447 ("drm/i915: Trim the ironlake+ irq handler")

Obviously, once a portion of a handler was/is extracted, pointer caching
to avoid uncore->regs reloads may not make full sense any more due to
function calls potentially overshadowing that cost.


As for unclaimed debug, I would say it is probably okay to not burden
the irq handlers with it, but if the display folks think a little bit of
extra cost in these sub-handlers is fine that would sound plausible to me
given the frequency of display related interrupts is low. So for me the
patch is fine if it makes the display decoupling easier.



What would the interface be like if display were its own module? We
couldn't just wrap it all in a bunch of macros and static inlines. Is
the end result that display irq handling needs to call functions via
pointers in another module? Or do we need to move the register level irq
handling to xe and i915 cores, and handle the display parts at a higher
abstraction level?


AFAIR the no-trace variants were not for performance but to avoid log spam
when debugging stuff, from things like busy/polling loops.


Regards,

Tvrtko


btw, if we drop the raw accesses, then we can probably drop (1) above.

Lucas De Marchi



Signed-off-by: Jani Nikula 
---
drivers/gpu/drm/i915/display/intel_display_irq.c | 15 +++
1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_display_irq.c 
b/drivers/gpu/drm/i915/display/intel_display_irq.c
index f846c5b108b5..d4ae9139be39 100644
--- a/drivers/gpu/drm/i915/display/intel_display_irq.c
+++ b/drivers/gpu/drm/i915/display/intel_display_irq.c
@@ -1148,15 +1148,14 @@ void gen8_de_irq_handler(struct drm_i915_private 
*dev_priv, u32 master_ctl)

u32 gen11_gu_misc_irq_ack(struct drm_i915_private *i915, const u32 master_ctl)
{
-   void __iomem * const regs = intel_uncore_regs(&i915->uncore);
u32 iir;

if (!(master_ctl & GEN11_GU_MISC_IRQ))
return 0;

-   iir = raw_reg_read(regs, GEN11_GU_MISC_IIR);
+   iir = intel_de_read(i915, GEN11_GU_MISC_IIR);
if (likely(iir))
-   raw_reg_write(regs, GEN11_GU_MISC_IIR, iir);
+   intel_de_write(i915, GEN11_GU_MISC_IIR, iir);

return iir;
}
@@ -1169,18 +1168,18 @@ void gen11_gu_misc_irq_handler(struct drm_i915_private 
*i915, const u32 iir)

void gen11_display_irq_handler(struct drm_i915_private *i915)
{
-   void __iomem * const regs = intel_uncore_regs(&i915->uncore);
-   const u32 disp_ctl = raw_reg_read(regs, GEN11_DISPLAY_INT_CTL);
+   u32 disp_ctl;

disable_rpm_wakeref_asserts(&i915->runtime_pm);
/*
 * GEN11_DISPLAY_INT_CTL has same format as GEN8_MASTER_IRQ
 * for the display related bits.
 */
-   raw_reg_write(regs, GEN11_DISPLAY_INT_CTL, 0x0);
+   disp_ctl = intel_de_read(i915, GEN11_DISPLAY_INT_CTL);
+
+   intel_de_write(i915, GEN11_DISPLAY_INT_CTL, 0);
 

Re: [PATCH] drm/i915/gem: Replace dev_priv with i915

2024-03-28 Thread Tvrtko Ursulin



On 28/03/2024 07:18, Andi Shyti wrote:

Anyone using 'dev_priv' instead of 'i915' in a cleaned-up area
should be fined and required to do community service for a few
days.

I thought I had cleaned up the 'gem/' directory in the past, but
still, old aficionados of the 'dev_priv' name keep sneaking it
in.

Signed-off-by: Andi Shyti 
Cc: Jani Nikula 
Cc: Joonas Lahtinen 
Cc: Rodrigo Vivi 
Cc: Tvrtko Ursulin 
---
  drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c |  4 ++--
  drivers/gpu/drm/i915/gem/i915_gem_shmem.c  |  6 +++---
  drivers/gpu/drm/i915/gem/i915_gem_stolen.h |  8 
  drivers/gpu/drm/i915/gem/i915_gem_tiling.c | 18 +-
  drivers/gpu/drm/i915/gem/i915_gem_userptr.c|  6 +++---
  .../gpu/drm/i915/gem/selftests/huge_pages.c| 14 +++---
  6 files changed, 28 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 3f20fe381199..42619fc05de4 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -2456,7 +2456,7 @@ static int eb_submit(struct i915_execbuffer *eb)
   * The engine index is returned.
   */
  static unsigned int
-gen8_dispatch_bsd_engine(struct drm_i915_private *dev_priv,
+gen8_dispatch_bsd_engine(struct drm_i915_private *i915,
 struct drm_file *file)
  {
struct drm_i915_file_private *file_priv = file->driver_priv;
@@ -2464,7 +2464,7 @@ gen8_dispatch_bsd_engine(struct drm_i915_private *dev_priv,
/* Check whether the file_priv has already selected one ring. */
if ((int)file_priv->bsd_engine < 0)
file_priv->bsd_engine =
-   get_random_u32_below(dev_priv->engine_uabi_class_count[I915_ENGINE_CLASS_VIDEO]);
+   get_random_u32_below(i915->engine_uabi_class_count[I915_ENGINE_CLASS_VIDEO]);
  
  	return file_priv->bsd_engine;

  }
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c 
b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
index 38b72d86560f..c5e1c718a6d2 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
@@ -654,7 +654,7 @@ i915_gem_object_create_shmem(struct drm_i915_private *i915,
  
  /* Allocate a new GEM object and fill it with the supplied data */

  struct drm_i915_gem_object *
-i915_gem_object_create_shmem_from_data(struct drm_i915_private *dev_priv,
+i915_gem_object_create_shmem_from_data(struct drm_i915_private *i915,
   const void *data, resource_size_t size)
  {
struct drm_i915_gem_object *obj;
@@ -663,8 +663,8 @@ i915_gem_object_create_shmem_from_data(struct drm_i915_private *dev_priv,
resource_size_t offset;
int err;
  
-   GEM_WARN_ON(IS_DGFX(dev_priv));
-   obj = i915_gem_object_create_shmem(dev_priv, round_up(size, PAGE_SIZE));
+   GEM_WARN_ON(IS_DGFX(i915));
+   obj = i915_gem_object_create_shmem(i915, round_up(size, PAGE_SIZE));
if (IS_ERR(obj))
return obj;
  
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.h b/drivers/gpu/drm/i915/gem/i915_gem_stolen.h

index 258381d1c054..dfe0db8bb1b9 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.h
@@ -14,14 +14,14 @@ struct drm_i915_gem_object;
  
  #define i915_stolen_fb drm_mm_node
  
-int i915_gem_stolen_insert_node(struct drm_i915_private *dev_priv,

+int i915_gem_stolen_insert_node(struct drm_i915_private *i915,
struct drm_mm_node *node, u64 size,
unsigned alignment);
-int i915_gem_stolen_insert_node_in_range(struct drm_i915_private *dev_priv,
+int i915_gem_stolen_insert_node_in_range(struct drm_i915_private *i915,
 struct drm_mm_node *node, u64 size,
 unsigned alignment, u64 start,
 u64 end);
-void i915_gem_stolen_remove_node(struct drm_i915_private *dev_priv,
+void i915_gem_stolen_remove_node(struct drm_i915_private *i915,
 struct drm_mm_node *node);
  struct intel_memory_region *
  i915_gem_stolen_smem_setup(struct drm_i915_private *i915, u16 type,
@@ -31,7 +31,7 @@ i915_gem_stolen_lmem_setup(struct drm_i915_private *i915, u16 
type,
   u16 instance);
  
  struct drm_i915_gem_object *

-i915_gem_object_create_stolen(struct drm_i915_private *dev_priv,
+i915_gem_object_create_stolen(struct drm_i915_private *i915,
  resource_size_t size);
  
  bool i915_gem_object_is_stolen(const struct drm_i915_gem_object *obj);

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_tiling.c 
b/drivers/gpu/drm/i915/gem/i915_gem_tiling.c
index a049ca0b7980..d9eb84c1d2f1 100644
--- a/drivers/gpu/drm/i915/gem/i915_

Re: [PATCH v6 0/3] Disable automatic load CCS load balancing

2024-03-20 Thread Tvrtko Ursulin



On 20/03/2024 15:06, Andi Shyti wrote:

Ping! Any thoughts here?


I only casually observed the discussion after I saw Matt suggested 
further simplifications. As I understood it, you will bring back the 
uabi engine games when adding the dynamic behaviour and that is fine by me.


Regards,

Tvrtko


On Wed, Mar 13, 2024 at 09:19:48PM +0100, Andi Shyti wrote:

Hi,

this series does basically two things:

1. Disables automatic load balancing as advised by the hardware
workaround.

2. Assigns all the CCS slices to one single user engine. The user
will then be able to query only one CCS engine.

From v5 I have created a new file, gt/intel_gt_ccs_mode.c where
I added the intel_gt_apply_ccs_mode(). In the upcoming patches,
this file will contain the implementation for dynamic CCS mode
setting.

Thanks Tvrtko, Matt, John and Joonas for your reviews!

Andi

Changelog
=
v5 -> v6 (thanks Matt for the suggestions in v6)
  - Remove the refactoring and the for_each_available_engine()
macro and instead do not create the intel_engine_cs structure
at all.
  - In patch 1 just a trivial reordering of the bit definitions.

v4 -> v5
  - Use the workaround framework to do all the CCS balancing
settings in order to always apply the modes also when the
engine resets. Put everything in its own specific function to
be executed for the first CCS engine encountered. (Thanks
Matt)
  - Calculate the CCS ID for the CCS mode as the first available
CCS among all the engines (Thanks Matt)
  - Create the intel_gt_ccs_mode.c file to host the CCS
configuration. We will have it ready for the next series.
  - Fix a selftest that was failing because could not set CCS2.
  - Add the for_each_available_engine() macro to exclude CCS1+ and
start using it in the hangcheck selftest.

v3 -> v4
  - Reword correctly the comment in the workaround
  - Fix a buffer overflow (Thanks Joonas)
  - Handle properly the fused engines when setting the CCS mode.

v2 -> v3
  - Simplified the algorithm for creating the list of the exported
uabi engines. (Patch 1) (Thanks, Tvrtko)
  - Consider the fused engines when creating the uabi engine list
(Patch 2) (Thanks, Matt)
  - Patch 4 now uses the refactoring from patch 1, with a cleaner
outcome.

v1 -> v2
  - In Patch 1 use the correct workaround number (thanks Matt).
  - In Patch 2 do not add the extra CCS engines to the exposed
UABI engine list and adapt the engine counting accordingly
(thanks Tvrtko).
  - Reword the commit of Patch 2 (thanks John).

Andi Shyti (3):
   drm/i915/gt: Disable HW load balancing for CCS
   drm/i915/gt: Do not generate the command streamer for all the CCS
   drm/i915/gt: Enable only one CCS for compute workload

  drivers/gpu/drm/i915/Makefile   |  1 +
  drivers/gpu/drm/i915/gt/intel_engine_cs.c   | 20 ---
  drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 39 +
  drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h | 13 +++
  drivers/gpu/drm/i915/gt/intel_gt_regs.h |  6 
  drivers/gpu/drm/i915/gt/intel_workarounds.c | 30 ++--
  6 files changed, 103 insertions(+), 6 deletions(-)
  create mode 100644 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
  create mode 100644 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h

--
2.43.0


Re: [PATCH 0/5] drm/i915: cleanup dead code

2024-03-12 Thread Tvrtko Ursulin



On 11/03/2024 19:27, Lucas De Marchi wrote:

On Mon, Mar 11, 2024 at 05:43:00PM +, Tvrtko Ursulin wrote:


On 06/03/2024 19:36, Lucas De Marchi wrote:

Remove platforms that never had their PCI IDs added to the driver and
are of course marked with requiring force_probe. Note that most of the
code for those platforms is actually used by subsequent ones, so it's
not a huge amount of code being removed.


I had PVC and xehpsdv back in October but could not collect all acks. :(

Last two patches from https://patchwork.freedesktop.org/series/124705/.


oh... I was actually surprised we still had xehpsdv while removing a
WA for PVC, which made me look into removing these platforms.

rebasing your series and comparing yours..my-v2, where my-v2 only has
patches 2 and 4, I have the diff below. I think it's small enough that I
can just take your commits and squash the delta. Is that ok with you?

my version is a little bit more aggressive, also doing some renames
s/xehpsdv/xehp/ and dropping some more code
(engine_mask_apply_copy_fuses(), unused registers, default ctx, fw
ranges).


Right, yeah I see I missed some case combos in the comments when 
grepping and more.


 diff --git a/Documentation/gpu/rfc/i915_vm_bind.h 
b/Documentation/gpu/rfc/i915_vm_bind.h

 index 8a8fcd4fceac..bc26dc126104 100644
 --- a/Documentation/gpu/rfc/i915_vm_bind.h
 +++ b/Documentation/gpu/rfc/i915_vm_bind.h
 @@ -93,12 +93,11 @@ struct drm_i915_gem_timeline_fence {
   * Multiple VA mappings can be created to the same section of the 
object

   * (aliasing).
   *
 - * The @start, @offset and @length must be 4K page aligned. 
However the DG2
 - * and XEHPSDV has 64K page size for device local memory and has 
compact page
 - * table. On those platforms, for binding device local-memory 
objects, the
 - * @start, @offset and @length must be 64K aligned. Also, UMDs 
should not mix
 - * the local memory 64K page and the system memory 4K page 
bindings in the same

 - * 2M range.
 + * The @start, @offset and @length must be 4K page aligned. 
However the DG2 has
 + * 64K page size for device local memory and has compact page 
table. On that
 + * platform, for binding device local-memory objects, the @start, 
@offset and
 + * @length must be 64K aligned. Also, UMDs should not mix the 
local memory 64K

 + * page and the system memory 4K page bindings in the same 2M range.
   *
   * Error code -EINVAL will be returned if @start, @offset and 
@length are not
   * properly aligned. In version 1 (See 
I915_PARAM_VM_BIND_VERSION), error code
 diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h 
b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h

 index 1495b6074492..d3300ae3053f 100644
 --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
 +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
 @@ -386,7 +386,7 @@ struct drm_i915_gem_object {
  * and kernel mode driver for caching policy control after GEN12.
  * In the meantime platform specific tables are created to 
translate
  * i915_cache_level into pat index, for more details check the 
macros

 - * defined i915/i915_pci.c, e.g. TGL_CACHELEVEL.
 + * defined i915/i915_pci.c, e.g. MTL_CACHELEVEL.


Why this?

  * For backward compatibility, this field contains values 
exactly match
  * the entries of enum i915_cache_level for pre-GEN12 platforms 
(See

  * LEGACY_CACHELEVEL), so that the PTE encode functions for these
 diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c 
b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c

 index fa46d2308b0e..1bd0e041e15c 100644
 --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
 +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
 @@ -500,11 +500,11 @@ gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt,
  }
  static void
 -xehpsdv_ppgtt_insert_huge(struct i915_address_space *vm,
 -  struct i915_vma_resource *vma_res,
 -  struct sgt_dma *iter,
 -  unsigned int pat_index,
 -  u32 flags)
 +xehp_ppgtt_insert_huge(struct i915_address_space *vm,
 +   struct i915_vma_resource *vma_res,
 +   struct sgt_dma *iter,
 +   unsigned int pat_index,
 +   u32 flags)
  {
     const gen8_pte_t pte_encode = vm->pte_encode(0, pat_index, flags);
     unsigned int rem = sg_dma_len(iter->sg);
 @@ -741,8 +741,8 @@ static void gen8_ppgtt_insert(struct 
i915_address_space *vm,

     struct sgt_dma iter = sgt_dma(vma_res);
     if (vma_res->bi.page_sizes.sg > I915_GTT_PAGE_SIZE) {
 -    if (GRAPHICS_VER_FULL(vm->i915) >= IP_VER(12, 50))
 -    xehpsdv_ppgtt_insert_huge(vm, vma_res, &iter, 
pat_index, flags);

 +    if (GRAPHICS_VER_FULL(vm->i915) >= IP_VER(12, 55))
 +    xehp_ppgtt_ins

Re: [PATCH 0/5] drm/i915: cleanup dead code

2024-03-11 Thread Tvrtko Ursulin



On 06/03/2024 19:36, Lucas De Marchi wrote:

Remove platforms that never had their PCI IDs added to the driver and
are of course marked with requiring force_probe. Note that most of the
code for those platforms is actually used by subsequent ones, so it's
not a huge amount of code being removed.


I had PVC and xehpsdv back in October but could not collect all acks. :(

Last two patches from https://patchwork.freedesktop.org/series/124705/.

Regards,

Tvrtko


drivers/gpu/drm/xe/compat-i915-headers/i915_drv.h is also changed on the
xe side, but that should be ok: the defines are there only for compat
reasons while building the display side (and none of these platforms
have display, so it's build-issue only).

First patch is what motivated the others and was submitted alone
@ 20240306144723.1826977-1-lucas.demar...@intel.com .
While looking at this WA I was wondering why we still had some of that
code around.

Build-tested only for now.

Lucas De Marchi (5):
   drm/i915: Drop WA 16015675438
   drm/i915: Drop dead code for xehpsdv
   drm/i915: Update IP_VER(12, 50)
   drm/i915: Drop dead code for pvc
   drm/i915: Remove special handling for !RCS_MASK()

  Documentation/gpu/rfc/i915_vm_bind.h  |  11 +-
  .../gpu/drm/i915/gem/i915_gem_object_types.h  |   2 +-
  .../gpu/drm/i915/gem/selftests/huge_pages.c   |   4 +-
  .../i915/gem/selftests/i915_gem_client_blt.c  |   8 +-
  drivers/gpu/drm/i915/gt/gen8_engine_cs.c  |   5 +-
  drivers/gpu/drm/i915/gt/gen8_ppgtt.c  |  40 ++--
  drivers/gpu/drm/i915/gt/intel_engine_cs.c |  43 +---
  .../drm/i915/gt/intel_execlists_submission.c  |  10 +-
  drivers/gpu/drm/i915/gt/intel_gsc.c   |  15 --
  drivers/gpu/drm/i915/gt/intel_gt.c|   4 +-
  drivers/gpu/drm/i915/gt/intel_gt_mcr.c|  52 +
  drivers/gpu/drm/i915/gt/intel_gt_mcr.h|   2 +-
  drivers/gpu/drm/i915/gt/intel_gt_regs.h   |  59 --
  drivers/gpu/drm/i915/gt/intel_gt_sysfs_pm.c   |  21 +-
  drivers/gpu/drm/i915/gt/intel_gtt.c   |   2 +-
  drivers/gpu/drm/i915/gt/intel_lrc.c   |  51 +
  drivers/gpu/drm/i915/gt/intel_migrate.c   |  22 +-
  drivers/gpu/drm/i915/gt/intel_mocs.c  |  52 +
  drivers/gpu/drm/i915/gt/intel_rps.c   |   6 +-
  drivers/gpu/drm/i915/gt/intel_sseu.c  |  13 +-
  drivers/gpu/drm/i915/gt/intel_workarounds.c   | 193 +-
  drivers/gpu/drm/i915/gt/uc/intel_guc.c|   6 +-
  drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c|   4 +-
  drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c |   2 +-
  drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |   1 -
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c |   2 +-
  drivers/gpu/drm/i915/gt/uc/intel_uc.c |   4 -
  drivers/gpu/drm/i915/i915_debugfs.c   |  12 --
  drivers/gpu/drm/i915/i915_drv.h   |  13 --
  drivers/gpu/drm/i915/i915_getparam.c  |   4 +-
  drivers/gpu/drm/i915/i915_gpu_error.c |   5 +-
  drivers/gpu/drm/i915/i915_hwmon.c |   6 -
  drivers/gpu/drm/i915/i915_pci.c   |  61 +-
  drivers/gpu/drm/i915/i915_perf.c  |  19 +-
  drivers/gpu/drm/i915/i915_query.c |   2 +-
  drivers/gpu/drm/i915/i915_reg.h   |   4 +-
  drivers/gpu/drm/i915/intel_clock_gating.c |  26 +--
  drivers/gpu/drm/i915/intel_device_info.c  |   2 -
  drivers/gpu/drm/i915/intel_device_info.h  |   2 -
  drivers/gpu/drm/i915/intel_step.c |  80 +---
  drivers/gpu/drm/i915/intel_uncore.c   | 159 +--
  drivers/gpu/drm/i915/selftests/intel_uncore.c |   3 -
  .../gpu/drm/xe/compat-i915-headers/i915_drv.h |   6 -
  43 files changed, 110 insertions(+), 928 deletions(-)



[PATCH] MAINTAINERS: Update email address for Tvrtko Ursulin

2024-02-28 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

I will lose access to my @.*intel.com e-mail addresses soon so let me
adjust the maintainers entry and update the mailmap too.

While at it consolidate a few other of my old emails to point to the
main one.

Signed-off-by: Tvrtko Ursulin 
Cc: Daniel Vetter 
Cc: Dave Airlie 
Cc: Jani Nikula 
Cc: Joonas Lahtinen 
Cc: Rodrigo Vivi 
---
 .mailmap| 5 +
 MAINTAINERS | 2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/.mailmap b/.mailmap
index b99a238ee3bd..d67e351bce8e 100644
--- a/.mailmap
+++ b/.mailmap
@@ -608,6 +608,11 @@ TripleX Chung  
 TripleX Chung  
 Tsuneo Yoshioka 
 Tudor Ambarus  
+Tvrtko Ursulin  
+Tvrtko Ursulin  
+Tvrtko Ursulin  
+Tvrtko Ursulin  
+Tvrtko Ursulin  
 Tycho Andersen  
 Tzung-Bi Shih  
 Uwe Kleine-König 
diff --git a/MAINTAINERS b/MAINTAINERS
index 19f6f8014f94..b940bfe2a692 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -10734,7 +10734,7 @@ INTEL DRM I915 DRIVER (Meteor Lake, DG2 and older 
excluding Poulsbo, Moorestown
 M: Jani Nikula 
 M: Joonas Lahtinen 
 M: Rodrigo Vivi 
-M: Tvrtko Ursulin 
+M: Tvrtko Ursulin 
 L: intel-gfx@lists.freedesktop.org
 S: Supported
 W: https://drm.pages.freedesktop.org/intel-docs/
-- 
2.40.1



[PULL] drm-intel-gt-next

2024-02-28 Thread Tvrtko Ursulin
Hi Dave, Sima,

Last drm-intel-gt-next pull request for 6.9.

There are only two small fixes in there so could also wait for the
-next-fixes round if so would be preferred. One fix is for a kerneldoc
warning and other for a very unlikely userptr object creation failure
where cleanup would oops.

Regards,

Tvrtko

drm-intel-gt-next-2024-02-28:
Driver Changes:

Fixes:

- Add some boring kerneldoc (Tvrtko Ursulin)
- Check before removing mm notifier (Nirmoy
The following changes since commit eb927f01dfb6309c8a184593c2c0618c4000c481:

  drm/i915/gt: Restart the heartbeat timer when forcing a pulse (2024-02-14 
17:17:35 -0800)

are available in the Git repository at:

  git://anongit.freedesktop.org/drm/drm-intel tags/drm-intel-gt-next-2024-02-28

for you to fetch changes up to db7bbd13f08774cde0332c705f042e327fe21e73:

  drm/i915: Check before removing mm notifier (2024-02-28 13:11:32 +)


Driver Changes:

Fixes:

- Add some boring kerneldoc (Tvrtko Ursulin)
- Check before removing mm notifier (Nirmoy


Nirmoy Das (1):
  drm/i915: Check before removing mm notifier

Tvrtko Ursulin (1):
  drm/i915: Add some boring kerneldoc

 drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 3 +++
 include/uapi/drm/i915_drm.h | 4 
 2 files changed, 7 insertions(+)


Re: [PATCH] drm/i915: check before removing mm notifier

2024-02-28 Thread Tvrtko Ursulin



On 27/02/2024 09:26, Nirmoy Das wrote:

Hi Tvrtko,

On 2/27/2024 10:04 AM, Tvrtko Ursulin wrote:


On 21/02/2024 11:52, Nirmoy Das wrote:

Merged it to drm-intel-gt-next with s/check/Check


Shouldn't this have had:

Fixes: ed29c2691188 ("drm/i915: Fix userptr so we do not have to worry 
about obj->mm.lock, v7.")

Cc:  # v5.13+

?


Yes. Sorry, I missed that. Can we still add the tag?


I've added them and force pushed the branch since commit was still at 
the top.


FYI + Jani, Joonas and Rodrigo

Regards,

Tvrtko




Thanks,

Nirmoy


Regards,

Tvrtko


On 2/19/2024 1:50 PM, Nirmoy Das wrote:

Error in mmu_interval_notifier_insert() can leave a NULL
notifier.mm pointer. Catch that and return early.

Cc: Andi Shyti 
Cc: Shawn Lee 
Signed-off-by: Nirmoy Das 
---
  drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c 
b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c

index 0e21ce9d3e5a..61abfb505766 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
@@ -349,6 +349,9 @@ i915_gem_userptr_release(struct 
drm_i915_gem_object *obj)

  {
  GEM_WARN_ON(obj->userptr.page_ref);
+    if (!obj->userptr.notifier.mm)
+    return;
+
mmu_interval_notifier_remove(&obj->userptr.notifier);
  obj->userptr.notifier.mm = NULL;
  }
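
The guard in the patch above can be modelled in plain userspace C. This is only a sketch under stated assumptions: `struct sketch_notifier`, `sketch_notifier_remove()` and `sketch_userptr_release()` are hypothetical stand-ins, not the kernel structures or the real mmu_interval_notifier_remove(), and the call counter exists purely to make the behaviour observable.

```c
#include <stddef.h>

/* Hypothetical stand-in for the notifier; only the fields used here. */
struct sketch_notifier {
	void *mm;         /* NULL when the notifier was never registered */
	int remove_calls; /* counts calls to the removal helper */
};

/* Stand-in for the removal helper; here it only records the call. */
static void sketch_notifier_remove(struct sketch_notifier *n)
{
	n->remove_calls++;
}

/* Mirrors the patched release path: skip removal if registration failed. */
static void sketch_userptr_release(struct sketch_notifier *n)
{
	if (!n->mm)
		return;

	sketch_notifier_remove(n);
	n->mm = NULL;
}
```

With the early return in place, releasing an object whose notifier insertion failed never reaches the removal helper, and a second release of a valid object is also a no-op because mm has been cleared.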


Re: [PATCH v2] drm/i915/guc: Use context hints for GT freq

2024-02-28 Thread Tvrtko Ursulin



On 27/02/2024 23:51, Vinay Belgaumkar wrote:

Allow user to provide a low latency context hint. When set, KMD
sends a hint to GuC which results in special handling for this
context. SLPC will ramp the GT frequency aggressively every time
it switches to this context. The down freq threshold will also be
lower so GuC will ramp down the GT freq for this context more slowly.
We also disable waitboost for this context as that will interfere with
the strategy.

We need to enable the use of SLPC Compute strategy during init, but
it will apply only to contexts that set this bit during context
creation.

Userland can check whether this feature is supported using a new param-
I915_PARAM_HAS_CONTEXT_FREQ_HINTS. This flag is true for all guc submission
enabled platforms as they use SLPC for frequency management.

The Mesa usage model for this flag is here -
https://gitlab.freedesktop.org/sushmave/mesa/-/commits/compute_hint

v2: Rename flags as per review suggestions (Rodrigo, Tvrtko).
Also, use flag bits in intel_context as it allows finer control for
toggling per engine if needed (Tvrtko).

Cc: Rodrigo Vivi 
Cc: Tvrtko Ursulin 
Cc: Sushma Venkatesh Reddy 
Signed-off-by: Vinay Belgaumkar 
---
  drivers/gpu/drm/i915/gem/i915_gem_context.c   | 15 +++--
  .../gpu/drm/i915/gem/i915_gem_context_types.h |  1 +
  drivers/gpu/drm/i915/gt/intel_context_types.h |  1 +
  drivers/gpu/drm/i915/gt/intel_rps.c   |  5 +
  .../drm/i915/gt/uc/abi/guc_actions_slpc_abi.h | 21 +++
  drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c   | 17 +++
  drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.h   |  1 +
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  6 ++
  drivers/gpu/drm/i915/i915_getparam.c  | 12 +++
  include/uapi/drm/i915_drm.h   | 15 +
  10 files changed, 92 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index dcbfe32fd30c..0799cb0b2803 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -879,6 +879,7 @@ static int set_proto_ctx_param(struct drm_i915_file_private 
*fpriv,
   struct i915_gem_proto_context *pc,
   struct drm_i915_gem_context_param *args)
  {
+   struct drm_i915_private *i915 = fpriv->i915;
int ret = 0;
  
  	switch (args->param) {

@@ -904,6 +905,13 @@ static int set_proto_ctx_param(struct 
drm_i915_file_private *fpriv,
pc->user_flags &= ~BIT(UCONTEXT_BANNABLE);
break;
  
+	case I915_CONTEXT_PARAM_LOW_LATENCY:

+   if (intel_uc_uses_guc_submission(&to_gt(i915)->uc))
+   pc->user_flags |= BIT(UCONTEXT_LOW_LATENCY);
+   else
+   ret = -EINVAL;
+   break;
+
case I915_CONTEXT_PARAM_RECOVERABLE:
if (args->size)
ret = -EINVAL;
@@ -992,6 +1000,9 @@ static int intel_context_set_gem(struct intel_context *ce,
if (sseu.slice_mask && !WARN_ON(ce->engine->class != RENDER_CLASS))
ret = intel_context_reconfigure_sseu(ce, sseu);
  
+	if (test_bit(UCONTEXT_LOW_LATENCY, &ctx->user_flags))

+   set_bit(CONTEXT_LOW_LATENCY, &ce->flags);


Does not need to be atomic so can use __set_bit as higher up in the 
function.
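
A minimal userspace sketch of the suggestion above, assuming nothing beyond standard C. The helpers are local stand-ins named after the kernel's test_bit() and __set_bit(), not the kernel implementations, and the bit number used for CONTEXT_LOW_LATENCY is made up for illustration; only UCONTEXT_LOW_LATENCY (5) comes from the patch.

```c
#include <stdbool.h>

/* UCONTEXT_LOW_LATENCY matches the patch; CONTEXT_LOW_LATENCY's value is hypothetical. */
#define UCONTEXT_LOW_LATENCY 5
#define CONTEXT_LOW_LATENCY  13

/* Local stand-in for test_bit(): read a single bit. */
static bool sketch_test_bit(unsigned int nr, const unsigned long *addr)
{
	return (*addr >> nr) & 1UL;
}

/* Local stand-in for the non-atomic __set_bit(): a plain OR, no locked operation. */
static void sketch_set_bit_nonatomic(unsigned int nr, unsigned long *addr)
{
	*addr |= 1UL << nr;
}

/* Copy the user-visible hint into the per-engine context flags. */
static void propagate_low_latency(const unsigned long *user_flags,
				  unsigned long *ce_flags)
{
	if (sketch_test_bit(UCONTEXT_LOW_LATENCY, user_flags))
		sketch_set_bit_nonatomic(CONTEXT_LOW_LATENCY, ce_flags);
}
```

The non-atomic variant is sufficient whenever nothing else can modify the flags word concurrently, which is presumably the case here since the same function already uses __set_bit higher up.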



+
return ret;
  }
  
@@ -1630,6 +1641,8 @@ i915_gem_create_context(struct drm_i915_private *i915,

if (vm)
ctx->vm = vm;
  
+	ctx->user_flags = pc->user_flags;

+


Given how most ctx->something assignments are at the bottom of the 
function I would stick a comment here saying along the lines of "assign 
early for intel_context_set_gem called when creating engines".
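
The ordering dependency behind that comment can be illustrated with a small userspace model. Everything here is a hypothetical sketch: the `sketch_` types and functions are invented for illustration and only mimic the shape of the kernel code, where engine setup reads ctx->user_flags.

```c
#include <stdbool.h>

#define UCONTEXT_LOW_LATENCY 5

struct sketch_engine_ctx {
	bool low_latency;
};

struct sketch_gem_ctx {
	unsigned long user_flags;
	struct sketch_engine_ctx engine;
};

/* Stand-in for the engine setup step, which reads ctx->user_flags. */
static void sketch_create_engine(struct sketch_gem_ctx *ctx)
{
	ctx->engine.low_latency =
		(ctx->user_flags >> UCONTEXT_LOW_LATENCY) & 1UL;
}

static void sketch_create_context(struct sketch_gem_ctx *ctx,
				  unsigned long proto_flags, bool assign_early)
{
	ctx->user_flags = 0;
	ctx->engine.low_latency = false;

	if (assign_early)
		ctx->user_flags = proto_flags; /* assign early: engine setup reads it */

	sketch_create_engine(ctx);

	if (!assign_early)
		ctx->user_flags = proto_flags; /* too late: engine already configured */
}
```

In this model the hint only reaches the engine context when the flags are copied before engine creation, which is why a comment at the early assignment helps future readers avoid moving it back down.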



mutex_init(&ctx->engines_mutex);
if (pc->num_user_engines >= 0) {
i915_gem_context_set_user_engines(ctx);
@@ -1652,8 +1665,6 @@ i915_gem_create_context(struct drm_i915_private *i915,
 * is no remap info, it will be a NOP. */
ctx->remap_slice = ALL_L3_SLICES(i915);
  
-	ctx->user_flags = pc->user_flags;

-
for (i = 0; i < ARRAY_SIZE(ctx->hang_timestamp); i++)
ctx->hang_timestamp[i] = jiffies - CONTEXT_FAST_HANG_JIFFIES;
  
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h

index 03bc7f9d191b..b6d97da63d1f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
@@ -338,6 +338,7 @@ struct i915_gem_context {
  #define UCONTEXT_BANNABLE 2
  #define UCONTEXT_RECOVERABLE  3
  #define UCONTEXT_PERSISTENCE  4
+#define UCONTEXT_LOW_LATENCY   5
  
  	/**

 * @flags: small set of booleans
diff -

Re: [PATCH] drm/i915: check before removing mm notifier

2024-02-27 Thread Tvrtko Ursulin



On 21/02/2024 11:52, Nirmoy Das wrote:

Merged it to drm-intel-gt-next with s/check/Check


Shouldn't this have had:

Fixes: ed29c2691188 ("drm/i915: Fix userptr so we do not have to worry about 
obj->mm.lock, v7.")
Cc:  # v5.13+

?

Regards,

Tvrtko
 

On 2/19/2024 1:50 PM, Nirmoy Das wrote:

Error in mmu_interval_notifier_insert() can leave a NULL
notifier.mm pointer. Catch that and return early.

Cc: Andi Shyti 
Cc: Shawn Lee 
Signed-off-by: Nirmoy Das 
---
  drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c 
b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c

index 0e21ce9d3e5a..61abfb505766 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
@@ -349,6 +349,9 @@ i915_gem_userptr_release(struct 
drm_i915_gem_object *obj)

  {
  GEM_WARN_ON(obj->userptr.page_ref);
+    if (!obj->userptr.notifier.mm)
+    return;
+
  mmu_interval_notifier_remove(&obj->userptr.notifier);
  obj->userptr.notifier.mm = NULL;
  }


Re: [PATCH 2/2] drm/i915: Support replaying GPU hangs with captured context image

2024-02-26 Thread Tvrtko Ursulin




On 22/02/2024 21:07, Rodrigo Vivi wrote:

On Wed, Feb 21, 2024 at 02:22:45PM +, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

When debugging GPU hangs Mesa developers are finding it useful to replay
the captured error state against the simulator. But due to various simulator
limitations which prevent replicating all hangs, one step further is being
able to replay against a real GPU.

This is almost doable today with the missing part being able to upload the
captured context image into the driver state prior to executing the
uploaded hanging batch and all the buffers.

To enable this last part we add a new context parameter called
I915_CONTEXT_PARAM_CONTEXT_IMAGE. It follows the existing SSEU
configuration pattern of being able to select which context to apply
against, paired with the actual image and its size.

Since this is adding a new concept of debug only uapi, we hide it behind
a new kconfig option and also require activation with a module parameter.
Together with a warning banner printed at driver load, all those combined
should be sufficient to guard against inadvertently enabling the feature.

In terms of implementation we allow the legacy context set param to be
used since that removes the need to record the per context data in the
proto context, while still allowing flexibility of specifying context
images for any context.

Mesa MR using the uapi can be seen at:
   https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27594

v2:
  * Fix whitespace alignment as per checkpatch.
  * Added warning on userspace misuse.
  * Rebase for extracting ce->default_state shadowing.

Signed-off-by: Tvrtko Ursulin 
Cc: Lionel Landwerlin 
Cc: Carlos Santa 
Cc: Rodrigo Vivi 
Reviewed-by: Rodrigo Vivi  # v1


still valid for v2. Thanks for splitting the patch.


Great, thanks!

Now we need to hear from Lionel if he is still keen to have this. In 
which case some acks or tested by would be good.


Regards,

Tvrtko


---
  drivers/gpu/drm/i915/Kconfig.debug|  17 +++
  drivers/gpu/drm/i915/gem/i915_gem_context.c   | 113 ++
  drivers/gpu/drm/i915/gt/intel_context.c   |   2 +
  drivers/gpu/drm/i915/gt/intel_context.h   |  22 
  drivers/gpu/drm/i915/gt/intel_context_types.h |   1 +
  drivers/gpu/drm/i915/gt/intel_lrc.c   |   3 +-
  .../gpu/drm/i915/gt/intel_ring_submission.c   |   3 +-
  drivers/gpu/drm/i915/i915_params.c|   5 +
  drivers/gpu/drm/i915/i915_params.h|   3 +-
  include/uapi/drm/i915_drm.h   |  27 +
  10 files changed, 193 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/Kconfig.debug 
b/drivers/gpu/drm/i915/Kconfig.debug
index 5b7162076850..32e9f70e91ed 100644
--- a/drivers/gpu/drm/i915/Kconfig.debug
+++ b/drivers/gpu/drm/i915/Kconfig.debug
@@ -16,6 +16,23 @@ config DRM_I915_WERROR
  
  	  If in doubt, say "N".
  
+config DRM_I915_REPLAY_GPU_HANGS_API

+   bool "Enable GPU hang replay userspace API"
+   depends on DRM_I915
+   depends on EXPERT
+   default n
+   help
+ Choose this option if you want to enable special and unstable
+ userspace API used for replaying GPU hangs on a running system.
+
+ This API is intended to be used by userspace graphics stack developers
+ and provides no stability guarantees.
+
+ The API needs to be activated at boot time using the
+ enable_debug_only_api module parameter.
+
+ If in doubt, say "N".
+
  config DRM_I915_DEBUG
bool "Enable additional driver debugging"
depends on DRM_I915
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index dcbfe32fd30c..481aacbc1772 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -78,6 +78,7 @@
  #include "gt/intel_engine_user.h"
  #include "gt/intel_gpu_commands.h"
  #include "gt/intel_ring.h"
+#include "gt/shmem_utils.h"
  
  #include "pxp/intel_pxp.h"
  
@@ -949,6 +950,7 @@ static int set_proto_ctx_param(struct drm_i915_file_private *fpriv,

case I915_CONTEXT_PARAM_NO_ZEROMAP:
case I915_CONTEXT_PARAM_BAN_PERIOD:
case I915_CONTEXT_PARAM_RINGSIZE:
+   case I915_CONTEXT_PARAM_CONTEXT_IMAGE:
default:
ret = -EINVAL;
break;
@@ -2092,6 +2094,95 @@ static int get_protected(struct i915_gem_context *ctx,
return 0;
  }
  
+static int set_context_image(struct i915_gem_context *ctx,

+struct drm_i915_gem_context_param *args)
+{
+   struct i915_gem_context_param_context_image user;
+   struct intel_context *ce;
+   struct file *shmem_state;
+   unsigned long lookup;
+   void *state;
+   int ret = 0;
+
+   if (!IS_ENABLED(CONFIG_DRM_I915_REPLAY_GPU_HANGS_API))
+   return -EINVAL;
+
+   if (!ctx-

Re: [PATCH] drm/i915/guc: Add Compute context hint

2024-02-26 Thread Tvrtko Ursulin



On 26/02/2024 08:47, Tvrtko Ursulin wrote:


On 23/02/2024 19:25, Rodrigo Vivi wrote:

On Fri, Feb 23, 2024 at 10:31:41AM -0800, Belgaumkar, Vinay wrote:


On 2/23/2024 12:51 AM, Tvrtko Ursulin wrote:


On 22/02/2024 23:31, Belgaumkar, Vinay wrote:


On 2/22/2024 7:32 AM, Tvrtko Ursulin wrote:


On 21/02/2024 21:28, Rodrigo Vivi wrote:

On Wed, Feb 21, 2024 at 09:42:34AM +, Tvrtko Ursulin wrote:


On 21/02/2024 00:14, Vinay Belgaumkar wrote:

Allow user to provide a context hint. When this is set, KMD will
send a hint to GuC which results in special handling for this
context. SLPC will ramp the GT frequency aggressively every time
it switches to this context. The down freq threshold will also be
lower so GuC will ramp down the GT freq for this
context more slowly.
We also disable waitboost for this context as that
will interfere with
the strategy.

We need to enable the use of Compute strategy during SLPC init, 
but

it will apply only to contexts that set this bit during context
creation.

Userland can check whether this feature is supported
using a new param-
I915_PARAM_HAS_COMPUTE_CONTEXT. This flag is true
for all guc submission
enabled platforms since they use SLPC for freq management.

The Mesa usage model for this flag is here -
https://gitlab.freedesktop.org/sushmave/mesa/-/commits/compute_hint


This allows for setting it for the whole application,
correct? Upsides,
downsides? Are there any plans for per context?


Currently there's no extension on a high level API
(Vulkan/OpenGL/OpenCL/etc)
that would allow the application to hint for
power/freq/latency. So Mesa cannot
decide when to hint. So their solution was to use .drirc and
make per-application
decision.

I would prefer a high level extension for a more granular
and informative
decision. We need to work with that goal, but for now I don't see 
any

cons on this approach.


In principle yeah, it doesn't harm to have the option. I am just
not sure how useful this intermediate step is with its lack
of intra-process granularity.


Cc: Rodrigo Vivi 
Signed-off-by: Vinay Belgaumkar 
---
    drivers/gpu/drm/i915/gem/i915_gem_context.c   |  8 +++
    .../gpu/drm/i915/gem/i915_gem_context_types.h |  1 +
    drivers/gpu/drm/i915/gt/intel_rps.c   |  8 +++
    .../drm/i915/gt/uc/abi/guc_actions_slpc_abi.h |
21 +++
    drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c   |
17 +++
    drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.h   |  1 +
    .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  7 +++
    drivers/gpu/drm/i915/i915_getparam.c  | 11 ++
    include/uapi/drm/i915_drm.h   | 15 
+

    9 files changed, 89 insertions(+)

diff --git
a/drivers/gpu/drm/i915/gem/i915_gem_context.c
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index dcbfe32fd30c..ceab7dbe9b47 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -879,6 +879,7 @@ static int
set_proto_ctx_param(struct drm_i915_file_private
*fpriv,
   struct i915_gem_proto_context *pc,
   struct drm_i915_gem_context_param *args)
    {
+    struct drm_i915_private *i915 = fpriv->i915;
    int ret = 0;
    switch (args->param) {
@@ -904,6 +905,13 @@ static int
set_proto_ctx_param(struct drm_i915_file_private
*fpriv,
    pc->user_flags &= ~BIT(UCONTEXT_BANNABLE);
    break;
+    case I915_CONTEXT_PARAM_IS_COMPUTE:
+    if (!intel_uc_uses_guc_submission(&to_gt(i915)->uc))
+    ret = -EINVAL;
+    else
+    pc->user_flags |= BIT(UCONTEXT_COMPUTE);
+    break;
+
    case I915_CONTEXT_PARAM_RECOVERABLE:
    if (args->size)
    ret = -EINVAL;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
index 03bc7f9d191b..db86d6f6245f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
@@ -338,6 +338,7 @@ struct i915_gem_context {
    #define UCONTEXT_BANNABLE    2
    #define UCONTEXT_RECOVERABLE    3
    #define UCONTEXT_PERSISTENCE    4
+#define UCONTEXT_COMPUTE    5


What is the GuC behaviour when SLPC_CTX_FREQ_REQ_IS_COMPUTE is set for
non-compute engines? Wondering if per intel_context is what we want instead.
(Which could then be the i915_context_param_engines extension to mark
individual contexts as compute strategy.)


Perhaps we should rename this? This is a freq-decision-strategy inside
GuC that is there mostly targeting compute workloads that need lower
latency with short burst execution. But the engine itself doesn't matter.
It can be applied to any engine.


I have no idea if it makes sense for other engines, such as video, and
what would be pros and cons in terms of PnP. But in the case we end up
allowing it on any engine, then at least userspace

Re: [PATCH] drm/i915/guc: Add Compute context hint

2024-02-26 Thread Tvrtko Ursulin



On 23/02/2024 19:25, Rodrigo Vivi wrote:

On Fri, Feb 23, 2024 at 10:31:41AM -0800, Belgaumkar, Vinay wrote:


On 2/23/2024 12:51 AM, Tvrtko Ursulin wrote:


On 22/02/2024 23:31, Belgaumkar, Vinay wrote:


On 2/22/2024 7:32 AM, Tvrtko Ursulin wrote:


On 21/02/2024 21:28, Rodrigo Vivi wrote:

On Wed, Feb 21, 2024 at 09:42:34AM +, Tvrtko Ursulin wrote:


On 21/02/2024 00:14, Vinay Belgaumkar wrote:

Allow user to provide a context hint. When this is set, KMD will
send a hint to GuC which results in special handling for this
context. SLPC will ramp the GT frequency aggressively every time
it switches to this context. The down freq threshold will also be
lower so GuC will ramp down the GT freq for this context more slowly.
We also disable waitboost for this context as that will interfere with
the strategy.

We need to enable the use of Compute strategy during SLPC init, but
it will apply only to contexts that set this bit during context
creation.

Userland can check whether this feature is supported using a new param-
I915_PARAM_HAS_COMPUTE_CONTEXT. This flag is true for all guc submission
enabled platforms since they use SLPC for freq management.

The Mesa usage model for this flag is here -
https://gitlab.freedesktop.org/sushmave/mesa/-/commits/compute_hint


This allows for setting it for the whole application, correct? Upsides,
downsides? Are there any plans for per context?


Currently there's no extension on a high level API (Vulkan/OpenGL/OpenCL/etc)
that would allow the application to hint for power/freq/latency. So Mesa cannot
decide when to hint. So their solution was to use .drirc and make a
per-application decision.

I would prefer a high level extension for a more granular and informative
decision. We need to work with that goal, but for now I don't see any
cons on this approach.


In principle yeah, it doesn't harm to have the option. I am just not sure
how useful this intermediate step is with its lack of intra-process
granularity.


Cc: Rodrigo Vivi 
Signed-off-by: Vinay Belgaumkar 
---
    drivers/gpu/drm/i915/gem/i915_gem_context.c   |  8 +++
    .../gpu/drm/i915/gem/i915_gem_context_types.h |  1 +
    drivers/gpu/drm/i915/gt/intel_rps.c           |  8 +++
    .../drm/i915/gt/uc/abi/guc_actions_slpc_abi.h | 21 +++
    drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c   | 17 +++
    drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.h   |  1 +
    .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  7 +++
    drivers/gpu/drm/i915/i915_getparam.c          | 11 ++
    include/uapi/drm/i915_drm.h                   | 15 +
    9 files changed, 89 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index dcbfe32fd30c..ceab7dbe9b47 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -879,6 +879,7 @@ static int set_proto_ctx_param(struct drm_i915_file_private *fpriv,
   struct i915_gem_proto_context *pc,
   struct drm_i915_gem_context_param *args)
    {
+    struct drm_i915_private *i915 = fpriv->i915;
    int ret = 0;
    switch (args->param) {
@@ -904,6 +905,13 @@ static int set_proto_ctx_param(struct drm_i915_file_private *fpriv,
    pc->user_flags &= ~BIT(UCONTEXT_BANNABLE);
    break;
+    case I915_CONTEXT_PARAM_IS_COMPUTE:
+    if (!intel_uc_uses_guc_submission(&to_gt(i915)->uc))
+    ret = -EINVAL;
+    else
+    pc->user_flags |= BIT(UCONTEXT_COMPUTE);
+    break;
+
    case I915_CONTEXT_PARAM_RECOVERABLE:
    if (args->size)
    ret = -EINVAL;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
index 03bc7f9d191b..db86d6f6245f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
@@ -338,6 +338,7 @@ struct i915_gem_context {
    #define UCONTEXT_BANNABLE    2
    #define UCONTEXT_RECOVERABLE    3
    #define UCONTEXT_PERSISTENCE    4
+#define UCONTEXT_COMPUTE    5


What is the GuC behaviour when SLPC_CTX_FREQ_REQ_IS_COMPUTE is set for
non-compute engines? Wondering if per intel_context is what we want instead.
(Which could then be the i915_context_param_engines extension to mark
individual contexts as compute strategy.)


Perhaps we should rename this? This is a freq-decision-strategy inside
GuC that is there mostly targeting compute workloads that need lower
latency with short burst execution. But the engine itself doesn't matter.
It can be applied to any engine.


I have no idea if it makes sense for other engines, such as video, and
what would be pros and cons in terms of PnP. But in the case we end up
allowing it on any engine, then at least userspace name shouldn't be
compute. :)

Yes, one of the sugge

Re: [PATCH] drm/i915/guc: Add Compute context hint

2024-02-23 Thread Tvrtko Ursulin



On 22/02/2024 23:31, Belgaumkar, Vinay wrote:


On 2/22/2024 7:32 AM, Tvrtko Ursulin wrote:


On 21/02/2024 21:28, Rodrigo Vivi wrote:

On Wed, Feb 21, 2024 at 09:42:34AM +, Tvrtko Ursulin wrote:


On 21/02/2024 00:14, Vinay Belgaumkar wrote:

Allow user to provide a context hint. When this is set, KMD will
send a hint to GuC which results in special handling for this
context. SLPC will ramp the GT frequency aggressively every time
it switches to this context. The down freq threshold will also be
lower so GuC will ramp down the GT freq for this context more slowly.
We also disable waitboost for this context as that will interfere with
the strategy.

We need to enable the use of Compute strategy during SLPC init, but
it will apply only to contexts that set this bit during context
creation.

Userland can check whether this feature is supported using a new param-
I915_PARAM_HAS_COMPUTE_CONTEXT. This flag is true for all guc submission
enabled platforms since they use SLPC for freq management.

The Mesa usage model for this flag is here -
https://gitlab.freedesktop.org/sushmave/mesa/-/commits/compute_hint


This allows for setting it for the whole application, correct? Upsides,
downsides? Are there any plans for per context?


Currently there's no extension on a high level API (Vulkan/OpenGL/OpenCL/etc)
that would allow the application to hint for power/freq/latency. So Mesa cannot
decide when to hint. So their solution was to use .drirc and make a
per-application decision.

I would prefer a high level extension for a more granular and informative
decision. We need to work with that goal, but for now I don't see any
cons on this approach.


In principle yeah, it doesn't harm to have the option. I am just not sure
how useful this intermediate step is with its lack of intra-process
granularity.



Cc: Rodrigo Vivi 
Signed-off-by: Vinay Belgaumkar 
---
   drivers/gpu/drm/i915/gem/i915_gem_context.c   |  8 +++
   .../gpu/drm/i915/gem/i915_gem_context_types.h |  1 +
   drivers/gpu/drm/i915/gt/intel_rps.c           |  8 +++
   .../drm/i915/gt/uc/abi/guc_actions_slpc_abi.h | 21 +++
   drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c   | 17 +++
   drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.h   |  1 +
   .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  7 +++
   drivers/gpu/drm/i915/i915_getparam.c          | 11 ++
   include/uapi/drm/i915_drm.h                   | 15 +
   9 files changed, 89 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index dcbfe32fd30c..ceab7dbe9b47 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -879,6 +879,7 @@ static int set_proto_ctx_param(struct drm_i915_file_private *fpriv,
  struct i915_gem_proto_context *pc,
  struct drm_i915_gem_context_param *args)
   {
+    struct drm_i915_private *i915 = fpriv->i915;
   int ret = 0;
   switch (args->param) {
@@ -904,6 +905,13 @@ static int set_proto_ctx_param(struct drm_i915_file_private *fpriv,
   pc->user_flags &= ~BIT(UCONTEXT_BANNABLE);
   break;
+    case I915_CONTEXT_PARAM_IS_COMPUTE:
+    if (!intel_uc_uses_guc_submission(&to_gt(i915)->uc))
+    ret = -EINVAL;
+    else
+    pc->user_flags |= BIT(UCONTEXT_COMPUTE);
+    break;
+
   case I915_CONTEXT_PARAM_RECOVERABLE:
   if (args->size)
   ret = -EINVAL;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
index 03bc7f9d191b..db86d6f6245f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
@@ -338,6 +338,7 @@ struct i915_gem_context {
   #define UCONTEXT_BANNABLE    2
   #define UCONTEXT_RECOVERABLE    3
   #define UCONTEXT_PERSISTENCE    4
+#define UCONTEXT_COMPUTE    5


What is the GuC behaviour when SLPC_CTX_FREQ_REQ_IS_COMPUTE is set for
non-compute engines? Wondering if per intel_context is what we want instead.
(Which could then be the i915_context_param_engines extension to mark
individual contexts as compute strategy.)


Perhaps we should rename this? This is a freq-decision-strategy inside
GuC that is there mostly targeting compute workloads that need lower
latency with short burst execution. But the engine itself doesn't matter.
It can be applied to any engine.


I have no idea if it makes sense for other engines, such as video, and
what would be pros and cons in terms of PnP. But in the case we end up
allowing it on any engine, then at least userspace name shouldn't be
compute. :)

Yes, one of the suggestions from Daniele was to have something along the 
lines of UCONTEXT_HIFREQ or something along those lines so we don't 
confuse it with the Compute

Re: [PATCH] drm/i915/guc: Add Compute context hint

2024-02-22 Thread Tvrtko Ursulin



On 21/02/2024 21:28, Rodrigo Vivi wrote:

On Wed, Feb 21, 2024 at 09:42:34AM +, Tvrtko Ursulin wrote:


On 21/02/2024 00:14, Vinay Belgaumkar wrote:

Allow user to provide a context hint. When this is set, KMD will
send a hint to GuC which results in special handling for this
context. SLPC will ramp the GT frequency aggressively every time
it switches to this context. The down freq threshold will also be
lower so GuC will ramp down the GT freq for this context more slowly.
We also disable waitboost for this context as that will interfere with
the strategy.

We need to enable the use of Compute strategy during SLPC init, but
it will apply only to contexts that set this bit during context
creation.

Userland can check whether this feature is supported using a new param-
I915_PARAM_HAS_COMPUTE_CONTEXT. This flag is true for all guc submission
enabled platforms since they use SLPC for freq management.

The Mesa usage model for this flag is here -
https://gitlab.freedesktop.org/sushmave/mesa/-/commits/compute_hint


This allows for setting it for the whole application, correct? Upsides,
downsides? Are there any plans for per context?


Currently there's no extension on a high level API (Vulkan/OpenGL/OpenCL/etc)
that would allow the application to hint for power/freq/latency. So Mesa cannot
decide when to hint. So their solution was to use .drirc and make a
per-application decision.

I would prefer a high level extension for a more granular and informative
decision. We need to work with that goal, but for now I don't see any
cons on this approach.


In principle yeah, it doesn't harm to have the option. I am just not sure
how useful this intermediate step is with its lack of intra-process
granularity.



Cc: Rodrigo Vivi 
Signed-off-by: Vinay Belgaumkar 
---
   drivers/gpu/drm/i915/gem/i915_gem_context.c   |  8 +++
   .../gpu/drm/i915/gem/i915_gem_context_types.h |  1 +
   drivers/gpu/drm/i915/gt/intel_rps.c           |  8 +++
   .../drm/i915/gt/uc/abi/guc_actions_slpc_abi.h | 21 +++
   drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c   | 17 +++
   drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.h   |  1 +
   .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  7 +++
   drivers/gpu/drm/i915/i915_getparam.c          | 11 ++
   include/uapi/drm/i915_drm.h                   | 15 +
   9 files changed, 89 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index dcbfe32fd30c..ceab7dbe9b47 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -879,6 +879,7 @@ static int set_proto_ctx_param(struct drm_i915_file_private *fpriv,
   struct i915_gem_proto_context *pc,
   struct drm_i915_gem_context_param *args)
   {
+   struct drm_i915_private *i915 = fpriv->i915;
int ret = 0;
switch (args->param) {
@@ -904,6 +905,13 @@ static int set_proto_ctx_param(struct drm_i915_file_private *fpriv,
pc->user_flags &= ~BIT(UCONTEXT_BANNABLE);
break;
+   case I915_CONTEXT_PARAM_IS_COMPUTE:
+   if (!intel_uc_uses_guc_submission(&to_gt(i915)->uc))
+   ret = -EINVAL;
+   else
+   pc->user_flags |= BIT(UCONTEXT_COMPUTE);
+   break;
+
case I915_CONTEXT_PARAM_RECOVERABLE:
if (args->size)
ret = -EINVAL;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
index 03bc7f9d191b..db86d6f6245f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
@@ -338,6 +338,7 @@ struct i915_gem_context {
   #define UCONTEXT_BANNABLE    2
   #define UCONTEXT_RECOVERABLE    3
   #define UCONTEXT_PERSISTENCE    4
+#define UCONTEXT_COMPUTE    5


What is the GuC behaviour when SLPC_CTX_FREQ_REQ_IS_COMPUTE is set for
non-compute engines? Wondering if per intel_context is what we want instead.
(Which could then be the i915_context_param_engines extension to mark
individual contexts as compute strategy.)


Perhaps we should rename this? This is a freq-decision-strategy inside
GuC that is there mostly targeting compute workloads that need lower
latency with short burst execution. But the engine itself doesn't matter.
It can be applied to any engine.


I have no idea if it makes sense for other engines, such as video, and 
what would be pros and cons in terms of PnP. But in the case we end up 
allowing it on any engine, then at least userspace name shouldn't be 
compute. :)


Or if we decide to call it compute and only apply to compute engines, 
then I would strongly suggest making the uapi per intel_context i.e. the 
set engines extension inste
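
For readers following the thread, the context-creation path in the quoted hunk can be modelled in plain userspace C. This is only a sketch under stated assumptions: struct proto_context, set_compute_hint() and context_is_compute() are made-up names for illustration, and the uses_guc_submission argument stands in for the intel_uc_uses_guc_submission() check. The bit numbers are the ones listed in the patch.

```c
#include <assert.h>
#include <errno.h>

#define BIT(n) (1UL << (n))

/* Bit numbers as listed in the quoted patch. */
#define UCONTEXT_BANNABLE    2
#define UCONTEXT_RECOVERABLE 3
#define UCONTEXT_PERSISTENCE 4
#define UCONTEXT_COMPUTE     5

/* Hypothetical stand-in for the proto context; only the flags matter here. */
struct proto_context {
	unsigned long user_flags;
};

/*
 * Model of the new switch case: reject the hint when GuC submission is not
 * in use, otherwise record it in the per-context user flags.
 */
int set_compute_hint(struct proto_context *pc, int uses_guc_submission)
{
	if (!uses_guc_submission)
		return -EINVAL;

	pc->user_flags |= BIT(UCONTEXT_COMPUTE);
	return 0;
}

/* Later consumers would test the bit before sending the hint to GuC. */
int context_is_compute(const struct proto_context *pc)
{
	return !!(pc->user_flags & BIT(UCONTEXT_COMPUTE));
}
```

The point of the model is that the hint is a property of the whole GEM context, which is why the review asks whether a per intel_context flag would be the better fit.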

[PATCH 2/2] drm/i915: Support replaying GPU hangs with captured context image

2024-02-21 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

When debugging GPU hangs Mesa developers are finding it useful to replay
the captured error state against the simulator. But due to various simulator
limitations which prevent replicating all hangs, one step further is being
able to replay against a real GPU.

This is almost doable today with the missing part being able to upload the
captured context image into the driver state prior to executing the
uploaded hanging batch and all the buffers.

To enable this last part we add a new context parameter called
I915_CONTEXT_PARAM_CONTEXT_IMAGE. It follows the existing SSEU
configuration pattern of being able to select which context to apply
against, paired with the actual image and its size.

Since this is adding a new concept of debug only uapi, we hide it behind
a new kconfig option and also require activation with a module parameter.
Together with a warning banner printed at driver load, all those combined
should be sufficient to guard against inadvertently enabling the feature.

In terms of implementation we allow the legacy context set param to be
used since that removes the need to record the per context data in the
proto context, while still allowing flexibility of specifying context
images for any context.

Mesa MR using the uapi can be seen at:
  https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27594

v2:
 * Fix whitespace alignment as per checkpatch.
 * Added warning on userspace misuse.
 * Rebase for extracting ce->default_state shadowing.

Signed-off-by: Tvrtko Ursulin 
Cc: Lionel Landwerlin 
Cc: Carlos Santa 
Cc: Rodrigo Vivi 
Reviewed-by: Rodrigo Vivi  # v1
---
 drivers/gpu/drm/i915/Kconfig.debug            |  17 +++
 drivers/gpu/drm/i915/gem/i915_gem_context.c   | 113 ++
 drivers/gpu/drm/i915/gt/intel_context.c       |   2 +
 drivers/gpu/drm/i915/gt/intel_context.h       |  22
 drivers/gpu/drm/i915/gt/intel_context_types.h |   1 +
 drivers/gpu/drm/i915/gt/intel_lrc.c           |   3 +-
 .../gpu/drm/i915/gt/intel_ring_submission.c   |   3 +-
 drivers/gpu/drm/i915/i915_params.c            |   5 +
 drivers/gpu/drm/i915/i915_params.h            |   3 +-
 include/uapi/drm/i915_drm.h                   |  27 +
 10 files changed, 193 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/Kconfig.debug b/drivers/gpu/drm/i915/Kconfig.debug
index 5b7162076850..32e9f70e91ed 100644
--- a/drivers/gpu/drm/i915/Kconfig.debug
+++ b/drivers/gpu/drm/i915/Kconfig.debug
@@ -16,6 +16,23 @@ config DRM_I915_WERROR
 
  If in doubt, say "N".
 
+config DRM_I915_REPLAY_GPU_HANGS_API
+   bool "Enable GPU hang replay userspace API"
+   depends on DRM_I915
+   depends on EXPERT
+   default n
+   help
+ Choose this option if you want to enable special and unstable
+ userspace API used for replaying GPU hangs on a running system.
+
+ This API is intended to be used by userspace graphics stack developers
+ and provides no stability guarantees.
+
+ The API needs to be activated at boot time using the
+ enable_debug_only_api module parameter.
+
+ If in doubt, say "N".
+
 config DRM_I915_DEBUG
bool "Enable additional driver debugging"
depends on DRM_I915
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index dcbfe32fd30c..481aacbc1772 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -78,6 +78,7 @@
 #include "gt/intel_engine_user.h"
 #include "gt/intel_gpu_commands.h"
 #include "gt/intel_ring.h"
+#include "gt/shmem_utils.h"
 
 #include "pxp/intel_pxp.h"
 
@@ -949,6 +950,7 @@ static int set_proto_ctx_param(struct drm_i915_file_private *fpriv,
case I915_CONTEXT_PARAM_NO_ZEROMAP:
case I915_CONTEXT_PARAM_BAN_PERIOD:
case I915_CONTEXT_PARAM_RINGSIZE:
+   case I915_CONTEXT_PARAM_CONTEXT_IMAGE:
default:
ret = -EINVAL;
break;
@@ -2092,6 +2094,95 @@ static int get_protected(struct i915_gem_context *ctx,
return 0;
 }
 
+static int set_context_image(struct i915_gem_context *ctx,
+                             struct drm_i915_gem_context_param *args)
+{
+       struct i915_gem_context_param_context_image user;
+       struct intel_context *ce;
+       struct file *shmem_state;
+       unsigned long lookup;
+       void *state;
+       int ret = 0;
+
+       if (!IS_ENABLED(CONFIG_DRM_I915_REPLAY_GPU_HANGS_API))
+               return -EINVAL;
+
+       if (!ctx->i915->params.enable_debug_only_api)
+               return -EINVAL;
+
+       if (args->size < sizeof(user))
+               return -EINVAL;
+
+       if (copy_from_user(&user, u64_to_user_ptr(args->value), sizeof(user)))
+               return -EFAULT;
+
+       if (user.mbz)
+               return -EINVAL;
+
+ 
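
The layered gating described in the commit message (build option, then module parameter, then argument validation) can be tried out in a small userspace model. This is a rough sketch and not the driver code: REPLAY_API_BUILT_IN stands in for the Kconfig option, struct driver_params for the module parameter, and struct image_args is a hypothetical argument layout in which only the mbz field comes from the quoted hunk. memcpy() stands in for copy_from_user().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for CONFIG_DRM_I915_REPLAY_GPU_HANGS_API. */
#define REPLAY_API_BUILT_IN 1

/* Hypothetical stand-in for the enable_debug_only_api module parameter. */
struct driver_params {
	int enable_debug_only_api;
};

/* Hypothetical argument layout; only mbz appears in the quoted code. */
struct image_args {
	unsigned int size;
	unsigned int mbz; /* must be zero */
};

int set_context_image(const struct driver_params *params,
		      const void *user_buf, size_t user_size)
{
	struct image_args args;

	/* Gate 1: the API must be compiled in. */
	if (!REPLAY_API_BUILT_IN)
		return -EINVAL;

	/* Gate 2: the API must be switched on at load time. */
	if (!params->enable_debug_only_api)
		return -EINVAL;

	/* Gate 3: the caller must pass at least a full argument struct. */
	if (user_size < sizeof(args))
		return -EINVAL;

	memcpy(&args, user_buf, sizeof(args));

	/* Gate 4: reserved fields must be zero. */
	if (args.mbz)
		return -EINVAL;

	return 0;
}
```

Returning the same -EINVAL for the two "feature off" gates means userspace cannot tell a kernel without the option from one where it is merely not activated, which seems in keeping with a debug-only interface.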

[PATCH v2 0/2] GPU hang replay

2024-02-21 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

Please see 2/2 for explanation and rationale.

v2:
 * Extracted shadowing of default state into a leading patch.

Tvrtko Ursulin (2):
  drm/i915: Shadow default engine context image in the context
  drm/i915: Support replaying GPU hangs with captured context image

 drivers/gpu/drm/i915/Kconfig.debug            |  17 +++
 drivers/gpu/drm/i915/gem/i915_gem_context.c   | 113 ++
 drivers/gpu/drm/i915/gt/intel_context.c       |   2 +
 drivers/gpu/drm/i915/gt/intel_context.h       |  22
 drivers/gpu/drm/i915/gt/intel_context_types.h |   3 +
 drivers/gpu/drm/i915/gt/intel_lrc.c           |   8 +-
 .../gpu/drm/i915/gt/intel_ring_submission.c   |   8 +-
 drivers/gpu/drm/i915/i915_params.c            |   5 +
 drivers/gpu/drm/i915/i915_params.h            |   3 +-
 include/uapi/drm/i915_drm.h                   |  27 +
 10 files changed, 201 insertions(+), 7 deletions(-)

-- 
2.40.1



[PATCH 1/2] drm/i915: Shadow default engine context image in the context

2024-02-21 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

To enable adding override of the default engine context image let us start
shadowing the per engine state in the context.

Signed-off-by: Tvrtko Ursulin 
Cc: Lionel Landwerlin 
Cc: Carlos Santa 
Cc: Rodrigo Vivi 
---
 drivers/gpu/drm/i915/gt/intel_context_types.h   | 2 ++
 drivers/gpu/drm/i915/gt/intel_lrc.c             | 7 ---
 drivers/gpu/drm/i915/gt/intel_ring_submission.c | 7 ---
 3 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 7eccbd70d89f..b179178680a5 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -99,6 +99,8 @@ struct intel_context {
struct i915_address_space *vm;
struct i915_gem_context __rcu *gem_context;
 
+   struct file *default_state;
+
/*
 * @signal_lock protects the list of requests that need signaling,
 * @signals. While there are any requests that need signaling,
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 7c367ba8d9dc..d4eb822d20ae 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -1060,9 +1060,8 @@ void lrc_init_state(struct intel_context *ce,
 
set_redzone(state, engine);
 
-   if (engine->default_state) {
-   shmem_read(engine->default_state, 0,
-  state, engine->context_size);
+   if (ce->default_state) {
+   shmem_read(ce->default_state, 0, state, engine->context_size);
__set_bit(CONTEXT_VALID_BIT, &ce->flags);
inhibit = false;
}
@@ -1174,6 +1173,8 @@ int lrc_alloc(struct intel_context *ce, struct intel_engine_cs *engine)
 
GEM_BUG_ON(ce->state);
 
+   ce->default_state = engine->default_state;
+
vma = __lrc_alloc_state(ce, engine);
if (IS_ERR(vma))
return PTR_ERR(vma);
diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
index 92085ffd23de..8625e88e785f 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
@@ -474,8 +474,7 @@ static int ring_context_init_default_state(struct intel_context *ce,
if (IS_ERR(vaddr))
return PTR_ERR(vaddr);
 
-   shmem_read(ce->engine->default_state, 0,
-  vaddr, ce->engine->context_size);
+   shmem_read(ce->default_state, 0, vaddr, ce->engine->context_size);
 
i915_gem_object_flush_map(obj);
__i915_gem_object_release_map(obj);
@@ -491,7 +490,7 @@ static int ring_context_pre_pin(struct intel_context *ce,
struct i915_address_space *vm;
int err = 0;
 
-   if (ce->engine->default_state &&
+   if (ce->default_state &&
!test_bit(CONTEXT_VALID_BIT, &ce->flags)) {
err = ring_context_init_default_state(ce, ww);
if (err)
@@ -570,6 +569,8 @@ static int ring_context_alloc(struct intel_context *ce)
 {
struct intel_engine_cs *engine = ce->engine;
 
+   ce->default_state = engine->default_state;
+
/* One ringbuffer to rule them all */
GEM_BUG_ON(!engine->legacy.ring);
ce->ring = engine->legacy.ring;
-- 
2.40.1
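
To illustrate what the shadowing buys, here is a small userspace model of the idea. It is a sketch under assumptions rather than the driver code: struct hw_engine, struct hw_context, CONTEXT_SIZE and context_set_image() are hypothetical names, and plain byte arrays with memcpy() stand in for the shmem-backed state and shmem_read().

```c
#include <assert.h>
#include <string.h>

#define CONTEXT_SIZE 8 /* hypothetical image size in bytes */

struct hw_engine {
	const unsigned char *default_state; /* engine-wide default image */
};

struct hw_context {
	struct hw_engine *engine;
	const unsigned char *default_state; /* per-context shadow */
	unsigned char state[CONTEXT_SIZE];
	int valid;
};

/* At allocation the context starts out shadowing the engine default. */
void context_alloc(struct hw_context *ce, struct hw_engine *engine)
{
	memset(ce, 0, sizeof(*ce));
	ce->engine = engine;
	ce->default_state = engine->default_state;
}

/* Hypothetical override hook: only this one context sees the new image. */
void context_set_image(struct hw_context *ce, const unsigned char *image)
{
	ce->default_state = image;
}

/* State init reads the shadow and never the engine pointer directly. */
void context_init_state(struct hw_context *ce)
{
	if (ce->default_state) {
		memcpy(ce->state, ce->default_state, CONTEXT_SIZE);
		ce->valid = 1;
	}
}
```

Because every reader goes through the per-context pointer, a later patch can swap the image for a single context without touching the engine default or any other context.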



Re: [PATCH v2 2/2] drm/i915/gt: Enable only one CCS for compute workload

2024-02-21 Thread Tvrtko Ursulin




On 21/02/2024 12:08, Tvrtko Ursulin wrote:


On 21/02/2024 11:19, Andi Shyti wrote:

Hi Tvrtko,

On Wed, Feb 21, 2024 at 08:19:34AM +, Tvrtko Ursulin wrote:

On 21/02/2024 00:14, Andi Shyti wrote:

On Tue, Feb 20, 2024 at 02:48:31PM +, Tvrtko Ursulin wrote:

On 20/02/2024 14:35, Andi Shyti wrote:

Enable only one CCS engine by default with all the compute sices


slices


Thanks!

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c b/drivers/gpu/drm/i915/gt/intel_engine_user.c
index 833987015b8b..7041acc77810 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
@@ -243,6 +243,15 @@ void intel_engines_driver_register(struct drm_i915_private *i915)
     if (engine->uabi_class == I915_NO_UABI_CLASS)
         continue;
+    /*
+     * Do not list and do not count CCS engines other than the first
+     */
+    if (engine->uabi_class == I915_ENGINE_CLASS_COMPUTE &&
+        engine->uabi_instance > 0) {
+        i915->engine_uabi_class_count[engine->uabi_class]--;
+        continue;
+    }


It's a bit ugly to decrement after increment, instead of somehow
restructuring the loop to satisfy both cases more elegantly.


yes, agree, indeed I had a hard time here to accept this change
myself.

But moving the check above where the counter was incremented it
would have been much uglier.

This check looks ugly everywhere you place it :-)


One idea would be to introduce a separate local counter array for
name_instance, so as not to use i915->engine_uabi_class_count[]. First one
increments for every engine, second only for the exposed ones. That way
it feels it wouldn't be too ugly.


Ah... you mean that whenever we change the CCS mode, we update
the indexes of the exposed engines from list of the real engines.
Will try.

My approach was to regenerate the list every time the CCS mode was
changed, but your suggestion looks a bit simpler.


No, I meant just for this first stage of permanently single engine. For
avoiding the decrement after increment. Something like this, but not
compile tested even:


diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c b/drivers/gpu/drm/i915/gt/intel_engine_user.c
index 833987015b8b..4c33f30612c4 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
@@ -203,7 +203,8 @@ static void engine_rename(struct intel_engine_cs *engine, const char *name, u16

  void intel_engines_driver_register(struct drm_i915_private *i915)
  {
-   u16 name_instance, other_instance = 0;
+   u16 class_instance[I915_LAST_UABI_ENGINE_CLASS + 2] = { };
+   u16 uabi_class, other_instance = 0;
     struct legacy_ring ring = {};
     struct list_head *it, *next;
     struct rb_node **p, *prev;
@@ -222,15 +223,14 @@ void intel_engines_driver_register(struct drm_i915_private *i915)

     GEM_BUG_ON(engine->class >= ARRAY_SIZE(uabi_classes));
     engine->uabi_class = uabi_classes[engine->class];
+
     if (engine->uabi_class == I915_NO_UABI_CLASS) {
-   name_instance = other_instance++;
-   } else {
-   GEM_BUG_ON(engine->uabi_class >=
-  ARRAY_SIZE(i915->engine_uabi_class_count));
-   name_instance =
-   i915->engine_uabi_class_count[engine->uabi_class]++;
-   }
-   engine->uabi_instance = name_instance;
+   uabi_class = I915_LAST_UABI_ENGINE_CLASS + 1;
+   else
+   uabi_class = engine->uabi_class;
+
+   GEM_BUG_ON(uabi_class >= ARRAY_SIZE(class_instance));
+   engine->uabi_instance = class_instance[uabi_class]++;

     /*
  * Replace the internal name with the final user and log facing
@@ -238,11 +238,15 @@ void intel_engines_driver_register(struct drm_i915_private *i915)
  */
     engine_rename(engine,
   intel_engine_class_repr(engine->class),
- name_instance);
+ engine->uabi_instance);

-   if (engine->uabi_class == I915_NO_UABI_CLASS)
+   if (uabi_class == I915_NO_UABI_CLASS)
     continue;


Here you just add the ccs skip condition.

Anyway.. I rushed it a bit so see what you think.

Regards,

Tvrtko



+   GEM_BUG_ON(uabi_class >=
+  ARRAY_SIZE(i915->engine_uabi_class_count));
+   i915->engine_uabi_class_count[uabi_class]++;
+
     rb_link_node(&engine->uabi_node, prev, p);
     rb_insert_color(&engine->uabi_node, &i915->uabi_engines);



In any case, I'm working on a patc

Re: [PATCH v2 2/2] drm/i915/gt: Enable only one CCS for compute workload

2024-02-21 Thread Tvrtko Ursulin



On 21/02/2024 11:19, Andi Shyti wrote:

Hi Tvrtko,

On Wed, Feb 21, 2024 at 08:19:34AM +, Tvrtko Ursulin wrote:

On 21/02/2024 00:14, Andi Shyti wrote:

On Tue, Feb 20, 2024 at 02:48:31PM +, Tvrtko Ursulin wrote:

On 20/02/2024 14:35, Andi Shyti wrote:

Enable only one CCS engine by default with all the compute sices


slices


Thanks!


diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c b/drivers/gpu/drm/i915/gt/intel_engine_user.c
index 833987015b8b..7041acc77810 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
@@ -243,6 +243,15 @@ void intel_engines_driver_register(struct drm_i915_private *i915)
                if (engine->uabi_class == I915_NO_UABI_CLASS)
                        continue;
+               /*
+                * Do not list and do not count CCS engines other than the first
+                */
+               if (engine->uabi_class == I915_ENGINE_CLASS_COMPUTE &&
+                   engine->uabi_instance > 0) {
+                       i915->engine_uabi_class_count[engine->uabi_class]--;
+                       continue;
+               }


It's a bit ugly to decrement after increment, instead of somehow
restructuring the loop to satisfy both cases more elegantly.


yes, agree, indeed I had a hard time here to accept this change
myself.

But moving the check above where the counter was incremented it
would have been much uglier.

This check looks ugly everywhere you place it :-)


One idea would be to introduce a separate local counter array for
name_instance, so as not to use i915->engine_uabi_class_count[]. First one
increments for every engine, second only for the exposed ones. That way
it feels it wouldn't be too ugly.


Ah... you mean that whenever we change the CCS mode, we update
the indexes of the exposed engines from list of the real engines.
Will try.

My approach was to regenerate the list every time the CCS mode was
changed, but your suggestion looks a bit simpler.


No, I meant just for this first stage of permanently single engine. For
avoiding the decrement after increment. Something like this, but not
compile tested even:

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c b/drivers/gpu/drm/i915/gt/intel_engine_user.c
index 833987015b8b..4c33f30612c4 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
@@ -203,7 +203,8 @@ static void engine_rename(struct intel_engine_cs *engine, const char *name, u16
 
 void intel_engines_driver_register(struct drm_i915_private *i915)
 {
-       u16 name_instance, other_instance = 0;
+       u16 class_instance[I915_LAST_UABI_ENGINE_CLASS + 2] = { };
+       u16 uabi_class, other_instance = 0;
        struct legacy_ring ring = {};
        struct list_head *it, *next;
        struct rb_node **p, *prev;
@@ -222,15 +223,14 @@ void intel_engines_driver_register(struct drm_i915_private *i915)
 
                GEM_BUG_ON(engine->class >= ARRAY_SIZE(uabi_classes));
                engine->uabi_class = uabi_classes[engine->class];
+
                if (engine->uabi_class == I915_NO_UABI_CLASS) {
-                       name_instance = other_instance++;
-               } else {
-                       GEM_BUG_ON(engine->uabi_class >=
-                                  ARRAY_SIZE(i915->engine_uabi_class_count));
-                       name_instance =
-                               i915->engine_uabi_class_count[engine->uabi_class]++;
-               }
-               engine->uabi_instance = name_instance;
+                       uabi_class = I915_LAST_UABI_ENGINE_CLASS + 1;
+               else
+                       uabi_class = engine->uabi_class;
+
+               GEM_BUG_ON(uabi_class >= ARRAY_SIZE(class_instance));
+               engine->uabi_instance = class_instance[uabi_class]++;
 
                /*
                 * Replace the internal name with the final user and log facing
@@ -238,11 +238,15 @@ void intel_engines_driver_register(struct drm_i915_private *i915)
                 */
                engine_rename(engine,
                              intel_engine_class_repr(engine->class),
-                             name_instance);
+                             engine->uabi_instance);
 
-               if (engine->uabi_class == I915_NO_UABI_CLASS)
+               if (uabi_class == I915_NO_UABI_CLASS)
                        continue;
 
+               GEM_BUG_ON(uabi_class >=
+                          ARRAY_SIZE(i915->engine_uabi_class_count));
+               i915->engine_uabi_class_count[uabi_class]++;
+
                rb_link_node(&engine->uabi_node, prev, p);
                rb_insert_color(&engine->uabi_node, &i915->uabi_engines);



In any case, I'm working on a patch that is splitting this
function in two parts and there is some refactoring happening
here (for the firs

Re: [PATCH] drm/i915/guc: Add Compute context hint

2024-02-21 Thread Tvrtko Ursulin



On 21/02/2024 00:14, Vinay Belgaumkar wrote:

Allow user to provide a context hint. When this is set, KMD will
send a hint to GuC which results in special handling for this
context. SLPC will ramp the GT frequency aggressively every time
it switches to this context. The down freq threshold will also be
lower so GuC will ramp down the GT freq for this context more slowly.
We also disable waitboost for this context as that will interfere with
the strategy.

We need to enable the use of Compute strategy during SLPC init, but
it will apply only to contexts that set this bit during context
creation.

Userland can check whether this feature is supported using a new param-
I915_PARAM_HAS_COMPUTE_CONTEXT. This flag is true for all guc submission
enabled platforms since they use SLPC for freq management.

The Mesa usage model for this flag is here -
https://gitlab.freedesktop.org/sushmave/mesa/-/commits/compute_hint


This allows for setting it for the whole application, correct? Upsides, 
downsides? Are there any plans for per context?



Cc: Rodrigo Vivi 
Signed-off-by: Vinay Belgaumkar 
---
  drivers/gpu/drm/i915/gem/i915_gem_context.c   |  8 +++
  .../gpu/drm/i915/gem/i915_gem_context_types.h |  1 +
  drivers/gpu/drm/i915/gt/intel_rps.c   |  8 +++
  .../drm/i915/gt/uc/abi/guc_actions_slpc_abi.h | 21 +++
  drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c   | 17 +++
  drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.h   |  1 +
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  7 +++
  drivers/gpu/drm/i915/i915_getparam.c  | 11 ++
  include/uapi/drm/i915_drm.h   | 15 +
  9 files changed, 89 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index dcbfe32fd30c..ceab7dbe9b47 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -879,6 +879,7 @@ static int set_proto_ctx_param(struct drm_i915_file_private 
*fpriv,
   struct i915_gem_proto_context *pc,
   struct drm_i915_gem_context_param *args)
  {
+   struct drm_i915_private *i915 = fpriv->i915;
int ret = 0;
  
  	switch (args->param) {

@@ -904,6 +905,13 @@ static int set_proto_ctx_param(struct 
drm_i915_file_private *fpriv,
pc->user_flags &= ~BIT(UCONTEXT_BANNABLE);
break;
  
+	case I915_CONTEXT_PARAM_IS_COMPUTE:

+   if (!intel_uc_uses_guc_submission(&to_gt(i915)->uc))
+   ret = -EINVAL;
+   else
+   pc->user_flags |= BIT(UCONTEXT_COMPUTE);
+   break;
+
case I915_CONTEXT_PARAM_RECOVERABLE:
if (args->size)
ret = -EINVAL;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h 
b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
index 03bc7f9d191b..db86d6f6245f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
@@ -338,6 +338,7 @@ struct i915_gem_context {
  #define UCONTEXT_BANNABLE 2
  #define UCONTEXT_RECOVERABLE  3
  #define UCONTEXT_PERSISTENCE  4
+#define UCONTEXT_COMPUTE   5


What is the GuC behaviour when SLPC_CTX_FREQ_REQ_IS_COMPUTE is set for 
non-compute engines? Wondering if per intel_context is what we want 
instead. (Which could then be the i915_context_param_engines extension 
to mark individual contexts as compute strategy.)


  
  	/**

 * @flags: small set of booleans
diff --git a/drivers/gpu/drm/i915/gt/intel_rps.c 
b/drivers/gpu/drm/i915/gt/intel_rps.c
index 4feef874e6d6..1ed40cd61b70 100644
--- a/drivers/gpu/drm/i915/gt/intel_rps.c
+++ b/drivers/gpu/drm/i915/gt/intel_rps.c
@@ -24,6 +24,7 @@
  #include "intel_pcode.h"
  #include "intel_rps.h"
  #include "vlv_sideband.h"
+#include "../gem/i915_gem_context.h"
  #include "../../../platform/x86/intel_ips.h"
  
  #define BUSY_MAX_EI	20u /* ms */

@@ -1018,6 +1019,13 @@ void intel_rps_boost(struct i915_request *rq)
struct intel_rps *rps = &READ_ONCE(rq->engine)->gt->rps;
  
  		if (rps_uses_slpc(rps)) {

+   const struct i915_gem_context *ctx;
+
+   ctx = i915_request_gem_context(rq);
+   if (ctx &&
+   test_bit(UCONTEXT_COMPUTE, &ctx->user_flags))
+   return;
+


I think request and intel_context do not own a strong reference to GEM 
context. So at minimum you need a local one obtained under an RCU lock 
with kref_get_unless_zero, as some other places do.
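
For readers unfamiliar with the "get unless zero" idea, here is a rough userspace sketch using C11 atomics. It is not the kernel's kref implementation and it does not model the RCU read-side section at all; struct ctx, ctx_get_unless_zero(), ctx_put() and ctx_test_flag() are hypothetical names made up for illustration. It only shows why a temporary reference is taken before the flag is read.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a reference-counted context. */
struct ctx {
	atomic_int refcount;
	unsigned long user_flags;
};

/* Take a reference only if the object is still alive (count > 0). */
static bool ctx_get_unless_zero(struct ctx *c)
{
	int old = atomic_load(&c->refcount);

	while (old > 0) {
		/* On failure 'old' is refreshed with the current count. */
		if (atomic_compare_exchange_weak(&c->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference (freeing on zero is left out of this sketch). */
static void ctx_put(struct ctx *c)
{
	atomic_fetch_sub(&c->refcount, 1);
}

/* Read one flag bit; reports false if the context is absent or dying. */
static bool ctx_test_flag(struct ctx *c, unsigned int bit)
{
	bool set;

	if (!c || !ctx_get_unless_zero(c))
		return false;
	set = (c->user_flags >> bit) & 1UL;
	ctx_put(c);
	return set;
}
```

In the driver the pointer load itself would additionally need to happen under the RCU read lock, which this sketch leaves out.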


However.. it may be simpler to just store the flag in 
intel_context->flags. If you carry it over at the time GEM context is 
assigned to intel_context, you not only simplify the runtime rules, but you 
also get the ability to not set the compute flags for video etc.
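
A minimal sketch of that alternative, in plain userspace C under loudly stated assumptions: the structures, flag names and the rule "only compute-class engines inherit the hint" are all hypothetical and only illustrate carrying the flag over once at assignment time, so the hot path never touches the GEM context.

```c
#include <assert.h>
#include <stdbool.h>

enum { CLASS_RENDER, CLASS_VIDEO, CLASS_COMPUTE };

#define GEM_FLAG_COMPUTE	(1u << 0)	/* hypothetical user flag */
#define CE_FLAG_COMPUTE		(1u << 0)	/* hypothetical per-engine-context flag */

struct gem_ctx { unsigned int user_flags; };
struct engine_ctx { int engine_class; unsigned int flags; };

/* Copy the hint once, at assignment time, and only for compute engines. */
static void assign_gem_ctx(struct engine_ctx *ce, const struct gem_ctx *ctx)
{
	if ((ctx->user_flags & GEM_FLAG_COMPUTE) &&
	    ce->engine_class == CLASS_COMPUTE)
		ce->flags |= CE_FLAG_COMPUTE;
}

/* Hot path: reads only the engine context, no GEM context dereference. */
static bool skip_waitboost(const struct engine_ctx *ce)
{
	return (ce->flags & CE_FLAG_COMPUTE) != 0;
}
```

With this shape a video context created from the same GEM context would keep waitboost, since the flag is never copied to it.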


It may e

Re: [RFC] drm/i915: Support replaying GPU hangs with captured context image

2024-02-21 Thread Tvrtko Ursulin



On 20/02/2024 22:50, Rodrigo Vivi wrote:

On Tue, Feb 13, 2024 at 01:14:34PM +, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

When debugging GPU hangs Mesa developers are finding it useful to replay
the captured error state against the simulator. But due to various simulator
limitations which prevent replicating all hangs, one step further is being
able to replay against a real GPU.

This is almost doable today with the missing part being able to upload the
captured context image into the driver state prior to executing the
uploaded hanging batch and all the buffers.

To enable this last part we add a new context parameter called
I915_CONTEXT_PARAM_CONTEXT_IMAGE. It follows the existing SSEU
configuration pattern of being able to select which context to apply
against, paired with the actual image and its size.

Since this is adding a new concept of debug only uapi, we hide it behind
a new kconfig option and also require activation with a module parameter.
Together with a warning banner printed at driver load, all those combined
should be sufficient to guard against inadvertently enabling the feature.

In terms of implementation the only trivial change is shadowing of the
default state from engine to context. We also allow the legacy context
set param to be used since that removes the need to record the per context
data in the proto context, while still allowing flexibility of specifying
context images for any context.

Mesa MR using the uapi can be seen at:
   https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27594


I just wonder if it would be better to split the default_state in a separate
patch but from what I could see it looks correct.


It definitely makes sense to split it. I was just a bit lazy while 
testing the waters. After all this is a very novel idea of debug only 
uapi outside debugfs so I wasn't too sure how it will be received. Stay 
tuned for v2.


Regards,

Tvrtko



Also, I have to say that this approach is nice, clean and well protected.
And much simpler than I imagined when I saw the idea around.

Feel free to use:
Reviewed-by: Rodrigo Vivi 



Signed-off-by: Tvrtko Ursulin 
Cc: Lionel Landwerlin 
Cc: Carlos Santa 
---
  drivers/gpu/drm/i915/Kconfig.debug|  17 +++
  drivers/gpu/drm/i915/gem/i915_gem_context.c   | 106 ++
  drivers/gpu/drm/i915/gt/intel_context.c   |   2 +
  drivers/gpu/drm/i915/gt/intel_context.h   |  22 
  drivers/gpu/drm/i915/gt/intel_context_types.h |   3 +
  drivers/gpu/drm/i915/gt/intel_lrc.c   |   8 +-
  .../gpu/drm/i915/gt/intel_ring_submission.c   |   8 +-
  drivers/gpu/drm/i915/i915_params.c|   5 +
  drivers/gpu/drm/i915/i915_params.h|   3 +-
  include/uapi/drm/i915_drm.h   |  27 +
  10 files changed, 194 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/Kconfig.debug 
b/drivers/gpu/drm/i915/Kconfig.debug
index 5b7162076850..32e9f70e91ed 100644
--- a/drivers/gpu/drm/i915/Kconfig.debug
+++ b/drivers/gpu/drm/i915/Kconfig.debug
@@ -16,6 +16,23 @@ config DRM_I915_WERROR
  
  	  If in doubt, say "N".
  
+config DRM_I915_REPLAY_GPU_HANGS_API

+   bool "Enable GPU hang replay userspace API"
+   depends on DRM_I915
+   depends on EXPERT
+   default n
+   help
+ Choose this option if you want to enable special and unstable
+ userspace API used for replaying GPU hangs on a running system.
+
+ This API is intended to be used by userspace graphics stack developers
+ and provides no stability guarantees.
+
+ The API needs to be activated at boot time using the
+ enable_debug_only_api module parameter.
+
+ If in doubt, say "N".
+
  config DRM_I915_DEBUG
bool "Enable additional driver debugging"
depends on DRM_I915
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index dcbfe32fd30c..1cfd624bd978 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -78,6 +78,7 @@
  #include "gt/intel_engine_user.h"
  #include "gt/intel_gpu_commands.h"
  #include "gt/intel_ring.h"
+#include "gt/shmem_utils.h"
  
  #include "pxp/intel_pxp.h"
  
@@ -949,6 +950,7 @@ static int set_proto_ctx_param(struct drm_i915_file_private *fpriv,

case I915_CONTEXT_PARAM_NO_ZEROMAP:
case I915_CONTEXT_PARAM_BAN_PERIOD:
case I915_CONTEXT_PARAM_RINGSIZE:
+   case I915_CONTEXT_PARAM_CONTEXT_IMAGE:
default:
ret = -EINVAL;
break;
@@ -2092,6 +2094,88 @@ static int get_protected(struct i915_gem_context *ctx,
return 0;
  }
  
+static int set_context_image(struct i915_gem_context *ctx,

+struct drm_i915_gem_context_param *args)
+{
+   struct i915_gem_context_param_context_image user;
+

Re: [PATCH v2 2/2] drm/i915/gt: Enable only one CCS for compute workload

2024-02-21 Thread Tvrtko Ursulin



On 21/02/2024 00:14, Andi Shyti wrote:

Hi Tvrtko,

On Tue, Feb 20, 2024 at 02:48:31PM +, Tvrtko Ursulin wrote:

On 20/02/2024 14:35, Andi Shyti wrote:

Enable only one CCS engine by default with all the compute sices


slices


Thanks!


diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c 
b/drivers/gpu/drm/i915/gt/intel_engine_user.c
index 833987015b8b..7041acc77810 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
@@ -243,6 +243,15 @@ void intel_engines_driver_register(struct drm_i915_private 
*i915)
if (engine->uabi_class == I915_NO_UABI_CLASS)
continue;
+   /*
+* Do not list and do not count CCS engines other than the first
+*/
+   if (engine->uabi_class == I915_ENGINE_CLASS_COMPUTE &&
+   engine->uabi_instance > 0) {
+   i915->engine_uabi_class_count[engine->uabi_class]--;
+   continue;
+   }


It's a bit ugly to decrement after increment, instead of somehow
restructuring the loop to satisfy both cases more elegantly.


yes, agree, indeed I had a hard time here to accept this change
myself.

But moving the check above where the counter is incremented
would have been much uglier.

This check looks ugly everywhere you place it :-)


One idea would be to introduce a separate local counter array for 
name_instance, so as not to use i915->engine_uabi_class_count[]. First one 
increments for every engine, second only for the exposed ones. That way 
it feels like it wouldn't be too ugly.
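
To make the two-counter idea concrete, here is a minimal userspace sketch, not driver code. The types, the class enum and register_engines() are hypothetical stand-ins; only the shape follows the suggestion: a local per-class counter used for naming every engine, and a separate count that is bumped only for engines exposed to userspace.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the i915 types in this thread. */
enum { CLASS_RENDER, CLASS_COPY, CLASS_VIDEO, CLASS_COMPUTE, NUM_CLASSES };

struct engine {
	int uabi_class;
	int uabi_instance;	/* naming instance, assigned to every engine */
	int exposed;		/* non-zero if listed to userspace */
};

/*
 * Assign instances from a local counter for every engine and bump the
 * exposed-engine count only for engines that are actually listed, so
 * nothing is incremented and then decremented again.
 */
static void register_engines(struct engine *engines, size_t n,
			     int exposed_count[NUM_CLASSES])
{
	int class_instance[NUM_CLASSES] = { 0 };
	size_t i;

	for (i = 0; i < n; i++) {
		struct engine *e = &engines[i];

		assert(e->uabi_class >= 0 && e->uabi_class < NUM_CLASSES);
		e->uabi_instance = class_instance[e->uabi_class]++;

		/* Assumption: only the first compute engine is exposed. */
		e->exposed = !(e->uabi_class == CLASS_COMPUTE &&
			       e->uabi_instance > 0);
		if (e->exposed)
			exposed_count[e->uabi_class]++;
	}
}
```

Every engine still gets a distinct naming instance here, which is also what keeps the log names from all collapsing to the same one.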



In any case, I'm working on a patch that is splitting this
function in two parts and there is some refactoring happening
here (for the first initialization and the dynamic update).

Please let me know if it's OK with you or you want me to fix it
in this run.


And I wonder if
internally (in dmesg when engine name is logged) we don't end up with ccs0
ccs0 ccs0 ccs0.. for all instances.


I don't see this. Even in sysfs we see only one ccs. Where is it?


When you run this patch on something with two or more ccs-es, the 
"renamed ccs... to ccs.." debug logs do not all log the new name as ccs0?


Regards,

Tvrtko




+
rb_link_node(&engine->uabi_node, prev, p);
rb_insert_color(&engine->uabi_node, &i915->uabi_engines);


[...]


diff --git a/drivers/gpu/drm/i915/i915_query.c 
b/drivers/gpu/drm/i915/i915_query.c
index 3baa2f54a86e..d5a5143971f5 100644
--- a/drivers/gpu/drm/i915/i915_query.c
+++ b/drivers/gpu/drm/i915/i915_query.c
@@ -124,6 +124,7 @@ static int query_geometry_subslices(struct drm_i915_private 
*i915,
return fill_topology_info(sseu, query_item, 
sseu->geometry_subslice_mask);
   }
+


Zap please.


yes... yes... I noticed it after sending the patch :-)

Thanks,
Andi


Re: [PATCH v2 2/2] drm/i915/gt: Enable only one CCS for compute workload

2024-02-20 Thread Tvrtko Ursulin



On 20/02/2024 14:35, Andi Shyti wrote:

Enable only one CCS engine by default with all the compute sices


slices


allocated to it.

While generating the list of UABI engines to be exposed to the
user, exclude any additional CCS engines beyond the first
instance.

This change can be tested with igt i915_query.

Fixes: d2eae8e98d59 ("drm/i915/dg2: Drop force_probe requirement")
Signed-off-by: Andi Shyti 
Cc: Chris Wilson 
Cc: Joonas Lahtinen 
Cc: Matt Roper 
Cc:  # v6.2+
---
  drivers/gpu/drm/i915/gt/intel_engine_user.c |  9 +
  drivers/gpu/drm/i915/gt/intel_gt.c  | 11 +++
  drivers/gpu/drm/i915/gt/intel_gt_regs.h |  2 ++
  drivers/gpu/drm/i915/i915_query.c   |  1 +
  4 files changed, 23 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c 
b/drivers/gpu/drm/i915/gt/intel_engine_user.c
index 833987015b8b..7041acc77810 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
@@ -243,6 +243,15 @@ void intel_engines_driver_register(struct drm_i915_private 
*i915)
if (engine->uabi_class == I915_NO_UABI_CLASS)
continue;
  
+		/*

+* Do not list and do not count CCS engines other than the first
+*/
+   if (engine->uabi_class == I915_ENGINE_CLASS_COMPUTE &&
+   engine->uabi_instance > 0) {
+   i915->engine_uabi_class_count[engine->uabi_class]--;
+   continue;
+   }


It's a bit ugly to decrement after increment, instead of somehow 
restructuring the loop to satisfy both cases more elegantly. And I 
wonder if internally (in dmesg when engine name is logged) we don't end 
up with ccs0 ccs0 ccs0 ccs0.. for all instances.



+
rb_link_node(&engine->uabi_node, prev, p);
rb_insert_color(&engine->uabi_node, &i915->uabi_engines);
  
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c

index a425db5ed3a2..e19df4ef47f6 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -168,6 +168,14 @@ static void init_unused_rings(struct intel_gt *gt)
}
  }
  
+static void intel_gt_apply_ccs_mode(struct intel_gt *gt)

+{
+   if (!IS_DG2(gt->i915))
+   return;
+
+   intel_uncore_write(gt->uncore, XEHP_CCS_MODE, 0);
+}
+
  int intel_gt_init_hw(struct intel_gt *gt)
  {
struct drm_i915_private *i915 = gt->i915;
@@ -195,6 +203,9 @@ int intel_gt_init_hw(struct intel_gt *gt)
  
  	intel_gt_init_swizzling(gt);
  
+	/* Configure CCS mode */

+   intel_gt_apply_ccs_mode(gt);
+
/*
 * At least 830 can leave some of the unused rings
 * "active" (ie. head != tail) after resume which
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h 
b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
index cf709f6c05ae..c148113770ea 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
@@ -1605,6 +1605,8 @@
  #define   GEN12_VOLTAGE_MASK  REG_GENMASK(10, 0)
  #define   GEN12_CAGF_MASK REG_GENMASK(19, 11)
  
+#define XEHP_CCS_MODE  _MMIO(0x14804)

+
  #define GEN11_GT_INTR_DW(x)   _MMIO(0x190018 + ((x) * 4))
  #define   GEN11_CSME  (31)
  #define   GEN12_HECI_2(30)
diff --git a/drivers/gpu/drm/i915/i915_query.c 
b/drivers/gpu/drm/i915/i915_query.c
index 3baa2f54a86e..d5a5143971f5 100644
--- a/drivers/gpu/drm/i915/i915_query.c
+++ b/drivers/gpu/drm/i915/i915_query.c
@@ -124,6 +124,7 @@ static int query_geometry_subslices(struct drm_i915_private 
*i915,
return fill_topology_info(sseu, query_item, 
sseu->geometry_subslice_mask);
  }
  
+


Zap please.


  static int
  query_engine_info(struct drm_i915_private *i915,
  struct drm_i915_query_item *query_item)


Regards,

Tvrtko


Re: [PATCH 2/2] drm/i915/gt: Set default CCS mode '1'

2024-02-20 Thread Tvrtko Ursulin



On 20/02/2024 14:20, Andi Shyti wrote:

Since CCS automatic load balancing is disabled, we will impose a
fixed balancing policy that involves setting all the CCS engines
to work together on the same load.


Erm *all* CCS engines work together..


Simultaneously, the user will see only 1 CCS rather than the
actual number. As of now, this change affects only DG2.


... *one* CCS engine.



Fixes: d2eae8e98d59 ("drm/i915/dg2: Drop force_probe requirement")
Signed-off-by: Andi Shyti 
Cc: Chris Wilson 
Cc: Joonas Lahtinen 
Cc: Matt Roper 
Cc:  # v6.2+
---
  drivers/gpu/drm/i915/gt/intel_gt.c  | 11 +++
  drivers/gpu/drm/i915/gt/intel_gt_regs.h |  2 ++
  drivers/gpu/drm/i915/i915_drv.h | 17 +
  drivers/gpu/drm/i915/i915_query.c   |  5 +++--
  4 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c 
b/drivers/gpu/drm/i915/gt/intel_gt.c
index a425db5ed3a2..e19df4ef47f6 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -168,6 +168,14 @@ static void init_unused_rings(struct intel_gt *gt)
}
  }
  
+static void intel_gt_apply_ccs_mode(struct intel_gt *gt)

+{
+   if (!IS_DG2(gt->i915))
+   return;
+
+   intel_uncore_write(gt->uncore, XEHP_CCS_MODE, 0);
+}
+
  int intel_gt_init_hw(struct intel_gt *gt)
  {
struct drm_i915_private *i915 = gt->i915;
@@ -195,6 +203,9 @@ int intel_gt_init_hw(struct intel_gt *gt)
  
  	intel_gt_init_swizzling(gt);
  
+	/* Configure CCS mode */

+   intel_gt_apply_ccs_mode(gt);
+
/*
 * At least 830 can leave some of the unused rings
 * "active" (ie. head != tail) after resume which
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h 
b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
index cf709f6c05ae..c148113770ea 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
@@ -1605,6 +1605,8 @@
  #define   GEN12_VOLTAGE_MASK  REG_GENMASK(10, 0)
  #define   GEN12_CAGF_MASK REG_GENMASK(19, 11)
  
+#define XEHP_CCS_MODE  _MMIO(0x14804)

+
  #define GEN11_GT_INTR_DW(x)   _MMIO(0x190018 + ((x) * 4))
  #define   GEN11_CSME  (31)
  #define   GEN12_HECI_2(30)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index e81b3b2858ac..0853ffd3cb8d 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -396,6 +396,23 @@ static inline struct intel_gt *to_gt(const struct 
drm_i915_private *i915)
 (engine__); \
 (engine__) = rb_to_uabi_engine(rb_next(&(engine__)->uabi_node)))
  
+/*

+ * Exclude unavailable engines.
+ *
+ * Only the first CCS engine is utilized due to the disabling of CCS auto load
+ * balancing. As a result, all CCS engines operate collectively, functioning
+ * essentially as a single CCS engine, hence the count of active CCS engines is
+ * considered '1'.
+ * Currently, this applies to platforms with more than one CCS engine,
+ * specifically DG2.
+ */
+#define for_each_available_uabi_engine(engine__, i915__) \
+   for_each_uabi_engine(engine__, i915__) \
+   if ((IS_DG2(i915__)) && \
+   ((engine__)->uabi_class == I915_ENGINE_CLASS_COMPUTE) && \
+   ((engine__)->uabi_instance)) { } \
+   else
+


I thought the plan was to simply not register the engine. Like that it 
would be a simpler patch.



  #define INTEL_INFO(i915)  ((i915)->__info)
  #define RUNTIME_INFO(i915)(&(i915)->__runtime)
  #define DRIVER_CAPS(i915) (&(i915)->caps)
diff --git a/drivers/gpu/drm/i915/i915_query.c 
b/drivers/gpu/drm/i915/i915_query.c
index fa3e937ed3f5..2d41bda626a6 100644
--- a/drivers/gpu/drm/i915/i915_query.c
+++ b/drivers/gpu/drm/i915/i915_query.c
@@ -124,6 +124,7 @@ static int query_geometry_subslices(struct drm_i915_private 
*i915,
return fill_topology_info(sseu, query_item, 
sseu->geometry_subslice_mask);
  }
  
+


!


  static int
  query_engine_info(struct drm_i915_private *i915,
  struct drm_i915_query_item *query_item)
@@ -140,7 +141,7 @@ query_engine_info(struct drm_i915_private *i915,
if (query_item->flags)
return -EINVAL;
  
-	for_each_uabi_engine(engine, i915)

+   for_each_available_uabi_engine(engine, i915)
num_uabi_engines++;
  
  	len = struct_size(query_ptr, engines, num_uabi_engines);

@@ -155,7 +156,7 @@ query_engine_info(struct drm_i915_private *i915,
  
  	info_ptr = &query_ptr->engines[0];
  
-	for_each_uabi_engine(engine, i915) {

+   for_each_available_uabi_engine(engine, i915) {
info.engine.engine_class = engine->uabi_class;
info.engine.engine_instance = engine->uabi_instance;
info.flags = I915_ENGINE_INFO_HAS_LOGICAL_INSTANCE;


I thought you agreed that this sti

Re: [PATCH] drm/i915: Fix possible null pointer dereference after drm_dbg_printer conversion

2024-02-20 Thread Tvrtko Ursulin



On 20/02/2024 10:36, Maxime Ripard wrote:

On Tue, Feb 20, 2024 at 09:16:43AM +, Tvrtko Ursulin wrote:


On 19/02/2024 20:02, Rodrigo Vivi wrote:

On Mon, Feb 19, 2024 at 01:14:23PM +, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

Request can be NULL if no guilty request was identified so simply use
engine->i915 instead.

Signed-off-by: Tvrtko Ursulin 
Fixes: d50892a9554c ("drm/i915: switch from drm_debug_printer() to device specific 
drm_dbg_printer()")
Reported-by: Dan Carpenter 
Cc: Jani Nikula 
Cc: Luca Coelho 
Cc: Maxime Ripard 
Cc: Jani Nikula 


Reviewed-by: Rodrigo Vivi 


Thanks Rodrigo!

Given how d50892a9554c landed via drm-misc-next, Maxime or Thomas - could
you take this via drm-misc-next-fixes, or via drm-misc-next if there will be
another pull request?


There will be a drm-misc-next PR on thursday


Could you pull this one into whichever branch is needed so it appears in 
that pull request?


Regards,

Tvrtko


Re: [PATCH 2/2] drm/i915/gt: Set default CCS mode '1'

2024-02-20 Thread Tvrtko Ursulin



On 20/02/2024 10:11, Andi Shyti wrote:

Hi Tvrtko,

On Mon, Feb 19, 2024 at 12:51:44PM +, Tvrtko Ursulin wrote:

On 19/02/2024 11:16, Tvrtko Ursulin wrote:

On 15/02/2024 13:59, Andi Shyti wrote:


...


+/*
+ * Exclude unavailable engines.
+ *
+ * Only the first CCS engine is utilized due to the disabling of
CCS auto load
+ * balancing. As a result, all CCS engines operate collectively,
functioning
+ * essentially as a single CCS engine, hence the count of active
CCS engines is
+ * considered '1'.
+ * Currently, this applies to platforms with more than one CCS engine,
+ * specifically DG2.
+ */
+#define for_each_available_uabi_engine(engine__, i915__) \
+    for_each_uabi_engine(engine__, i915__) \
+    if ((IS_DG2(i915__)) && \
+    ((engine__)->uabi_class == I915_ENGINE_CLASS_COMPUTE) && \
+    ((engine__)->uabi_instance)) { } \
+    else
+


If you don't want userspace to see some engines, just don't add them to
the uabi list in intel_engines_driver_register or thereabouts?


It will be dynamic. In the next series I am preparing, the user will
be able to increase the number of CCS engines they want to use.


Oh tricky and new. Does it need to be at runtime or could it be at boot time?

If you are aiming to get the static single CCS only into the 6.9 
release, and you feel you are running out of time, you could always do a 
simple solution for now: the one I mentioned of simply not registering on 
the uabi list. Then you can refine more leisurely for the next release.


Regards,

Tvrtko




Similar to what we do for gsc which uses I915_NO_UABI_CLASS, although for ccs
you can choose a different approach, whatever is more elegant.

That is also needed for i915->engine_uabi_class_count to be right, so
userspace stats which rely on it are correct.


Oh yes. Will update it.


I later realized it is more than that - everything that uses
intel_engine_lookup_user to look up a class:instance passed in from userspace
relies on the engine not being on the user list, otherwise userspace could
bypass the fact that the engine query does not list it. Like PMU, Perf/OA,
context engine map and SSEU context query.


Correct, will look into that, thank you!

Andi


Re: [PATCH] drm/i915: Fix possible null pointer dereference after drm_dbg_printer conversion

2024-02-20 Thread Tvrtko Ursulin



On 19/02/2024 20:02, Rodrigo Vivi wrote:

On Mon, Feb 19, 2024 at 01:14:23PM +, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

Request can be NULL if no guilty request was identified so simply use
engine->i915 instead.

Signed-off-by: Tvrtko Ursulin 
Fixes: d50892a9554c ("drm/i915: switch from drm_debug_printer() to device specific 
drm_dbg_printer()")
Reported-by: Dan Carpenter 
Cc: Jani Nikula 
Cc: Luca Coelho 
Cc: Maxime Ripard 
Cc: Jani Nikula 


Reviewed-by: Rodrigo Vivi 


Thanks Rodrigo!

Given how d50892a9554c landed via drm-misc-next, Maxime or Thomas - 
could you take this via drm-misc-next-fixes, or via drm-misc-next if 
there will be another pull request?


Regards,

Tvrtko




---
  drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c 
b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
index 5f8d86e25993..8d4bb95f8424 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
@@ -96,8 +96,8 @@ static void heartbeat_commit(struct i915_request *rq,
  static void show_heartbeat(const struct i915_request *rq,
   struct intel_engine_cs *engine)
  {
-   struct drm_printer p = drm_dbg_printer(&rq->i915->drm, DRM_UT_DRIVER,
-  "heartbeat");
+   struct drm_printer p =
+   drm_dbg_printer(&engine->i915->drm, DRM_UT_DRIVER, "heartbeat");
  
  	if (!rq) {

intel_engine_dump(engine, &p,
--
2.40.1
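
For illustration only, the shape of the fix can be reduced to the following userspace sketch. The structures and printer_device() are hypothetical stand-ins, not the driver's types; the point is simply that the device is taken from the engine, which is always valid, rather than from a request that may be NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, minimal stand-ins for the driver structures. */
struct device { int id; };
struct engine { struct device *i915; };
struct request { struct device *i915; };

/*
 * Pick the device for the debug printer. The request may be NULL when no
 * guilty request was identified, so the device always comes from the
 * engine, which is valid for the whole call.
 */
static struct device *printer_device(const struct request *rq,
				     const struct engine *engine)
{
	(void)rq;	/* deliberately unused: rq->i915 would fault on NULL */
	assert(engine != NULL);
	return engine->i915;
}
```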



[PATCH] drm/i915: Add some boring kerneldoc

2024-02-19 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

Tooling appears very strict so let's pacify it by adding some comments,
even if fields are completely self-explanatory.

Signed-off-by: Tvrtko Ursulin 
Fixes: b11236486749 ("drm/i915: Add GuC submission interface version query")
Reported-by: Stephen Rothwell 
Cc: Jose Souza 
---
 include/uapi/drm/i915_drm.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index bd87386a8243..2ee338860b7e 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -3572,9 +3572,13 @@ struct drm_i915_query_memory_regions {
  * struct drm_i915_query_guc_submission_version - query GuC submission 
interface version
  */
 struct drm_i915_query_guc_submission_version {
+   /** @branch: Firmware branch version. */
__u32 branch;
+   /** @major: Firmware major version. */
__u32 major;
+   /** @minor: Firmware minor version. */
__u32 minor;
+   /** @patch: Firmware patch version. */
__u32 patch;
 };
 
-- 
2.40.1



[PATCH] drm/i915: Fix possible null pointer dereference after drm_dbg_printer conversion

2024-02-19 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

Request can be NULL if no guilty request was identified so simply use
engine->i915 instead.

Signed-off-by: Tvrtko Ursulin 
Fixes: d50892a9554c ("drm/i915: switch from drm_debug_printer() to device 
specific drm_dbg_printer()")
Reported-by: Dan Carpenter 
Cc: Jani Nikula 
Cc: Luca Coelho 
Cc: Maxime Ripard 
Cc: Jani Nikula 
---
 drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c 
b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
index 5f8d86e25993..8d4bb95f8424 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
@@ -96,8 +96,8 @@ static void heartbeat_commit(struct i915_request *rq,
 static void show_heartbeat(const struct i915_request *rq,
   struct intel_engine_cs *engine)
 {
-   struct drm_printer p = drm_dbg_printer(&rq->i915->drm, DRM_UT_DRIVER,
-  "heartbeat");
+   struct drm_printer p =
+   drm_dbg_printer(&engine->i915->drm, DRM_UT_DRIVER, "heartbeat");
 
if (!rq) {
intel_engine_dump(engine, &p,
-- 
2.40.1



Re: [PATCH 2/2] drm/i915/gt: Set default CCS mode '1'

2024-02-19 Thread Tvrtko Ursulin



On 19/02/2024 11:16, Tvrtko Ursulin wrote:


On 15/02/2024 13:59, Andi Shyti wrote:

Since CCS automatic load balancing is disabled, we will impose a
fixed balancing policy that involves setting all the CCS engines
to work together on the same load.

Simultaneously, the user will see only 1 CCS rather than the
actual number. As of now, this change affects only DG2.

Fixes: d2eae8e98d59 ("drm/i915/dg2: Drop force_probe requirement")
Signed-off-by: Andi Shyti 
Cc: Chris Wilson 
Cc: Joonas Lahtinen 
Cc: Matt Roper 
Cc:  # v6.2+
---
  drivers/gpu/drm/i915/gt/intel_gt.c  | 11 +++
  drivers/gpu/drm/i915/gt/intel_gt_regs.h |  2 ++
  drivers/gpu/drm/i915/i915_drv.h | 17 +
  drivers/gpu/drm/i915/i915_query.c   |  5 +++--
  4 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c 
b/drivers/gpu/drm/i915/gt/intel_gt.c

index a425db5ed3a2..e19df4ef47f6 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -168,6 +168,14 @@ static void init_unused_rings(struct intel_gt *gt)
  }
  }
+static void intel_gt_apply_ccs_mode(struct intel_gt *gt)
+{
+    if (!IS_DG2(gt->i915))
+    return;
+
+    intel_uncore_write(gt->uncore, XEHP_CCS_MODE, 0);
+}
+
  int intel_gt_init_hw(struct intel_gt *gt)
  {
  struct drm_i915_private *i915 = gt->i915;
@@ -195,6 +203,9 @@ int intel_gt_init_hw(struct intel_gt *gt)
  intel_gt_init_swizzling(gt);
+    /* Configure CCS mode */
+    intel_gt_apply_ccs_mode(gt);
+
  /*
   * At least 830 can leave some of the unused rings
   * "active" (ie. head != tail) after resume which
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h 
b/drivers/gpu/drm/i915/gt/intel_gt_regs.h

index cf709f6c05ae..c148113770ea 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
@@ -1605,6 +1605,8 @@
  #define   GEN12_VOLTAGE_MASK    REG_GENMASK(10, 0)
  #define   GEN12_CAGF_MASK    REG_GENMASK(19, 11)
+#define XEHP_CCS_MODE  _MMIO(0x14804)
+
  #define GEN11_GT_INTR_DW(x)    _MMIO(0x190018 + ((x) * 4))
  #define   GEN11_CSME    (31)
  #define   GEN12_HECI_2    (30)
diff --git a/drivers/gpu/drm/i915/i915_drv.h 
b/drivers/gpu/drm/i915/i915_drv.h

index e81b3b2858ac..0853ffd3cb8d 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -396,6 +396,23 @@ static inline struct intel_gt *to_gt(const struct 
drm_i915_private *i915)

   (engine__); \
   (engine__) = 
rb_to_uabi_engine(rb_next(&(engine__)->uabi_node)))

+/*
+ * Exclude unavailable engines.
+ *
+ * Only the first CCS engine is utilized due to the disabling of CCS 
auto load
+ * balancing. As a result, all CCS engines operate collectively, 
functioning
+ * essentially as a single CCS engine, hence the count of active CCS 
engines is

+ * considered '1'.
+ * Currently, this applies to platforms with more than one CCS engine,
+ * specifically DG2.
+ */
+#define for_each_available_uabi_engine(engine__, i915__) \
+    for_each_uabi_engine(engine__, i915__) \
+    if ((IS_DG2(i915__)) && \
+    ((engine__)->uabi_class == I915_ENGINE_CLASS_COMPUTE) && \
+    ((engine__)->uabi_instance)) { } \
+    else
+


If you don't want userspace to see some engines, just don't add them to 
the uabi list in intel_engines_driver_register or thereabouts?


Similar to what we do for gsc which uses I915_NO_UABI_CLASS, although for 
ccs you can choose a different approach, whatever is more elegant.


That is also needed for i915->engine_uabi_class_count to be right, so 
userspace stats which rely on it are correct.


I later realized it is more than that - everything that uses 
intel_engine_lookup_user to look up a class:instance pair passed in from 
userspace relies on the engine not being on the user list, otherwise 
userspace could bypass the fact that the engine query does not list it. 
Examples are PMU, Perf/OA, the context engine map and the SSEU context query.
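
To illustrate the point, here is a small self-contained userspace model (not driver code; every model_* name is made up for the example). It assumes the filtering is done once, when the user-visible list is built, so that enumeration and lookup by class:instance cannot disagree:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the uabi engine classes. */
enum model_class { MODEL_RENDER, MODEL_COPY, MODEL_COMPUTE };

struct model_engine {
	enum model_class uabi_class;
	unsigned int uabi_instance;
	bool on_user_list;	/* decided once, at registration time */
};

/*
 * Registration: hide every compute engine except instance 0 by never
 * putting it on the user-visible list. Returns the number of engines
 * userspace will be able to see.
 */
static size_t model_register(struct model_engine *engines, size_t count)
{
	size_t visible = 0;
	size_t i;

	for (i = 0; i < count; i++) {
		struct model_engine *e = &engines[i];

		e->on_user_list = !(e->uabi_class == MODEL_COMPUTE &&
				    e->uabi_instance);
		if (e->on_user_list)
			visible++;
	}

	return visible;
}

/* Lookup as done by anything that takes class:instance from userspace. */
static const struct model_engine *
model_lookup_user(const struct model_engine *engines, size_t count,
		  enum model_class uabi_class, unsigned int uabi_instance)
{
	size_t i;

	for (i = 0; i < count; i++) {
		const struct model_engine *e = &engines[i];

		if (e->on_user_list && e->uabi_class == uabi_class &&
		    e->uabi_instance == uabi_instance)
			return e;
	}

	return NULL;	/* hidden engines cannot be addressed at all */
}
```

With one render, one copy and four compute engines, model_register() reports three visible engines and a lookup of compute instance 1 fails, which is the behaviour the PMU, Perf/OA and context paths would need.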


Regards,

Tvrtko



Regards,

Tvrtko


  #define INTEL_INFO(i915)    ((i915)->__info)
  #define RUNTIME_INFO(i915)    (&(i915)->__runtime)
  #define DRIVER_CAPS(i915)    (&(i915)->caps)
diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c
index fa3e937ed3f5..2d41bda626a6 100644
--- a/drivers/gpu/drm/i915/i915_query.c
+++ b/drivers/gpu/drm/i915/i915_query.c
@@ -124,6 +124,7 @@ static int query_geometry_subslices(struct drm_i915_private *i915,
  return fill_topology_info(sseu, query_item, sseu->geometry_subslice_mask);
  }
+
  static int
  query_engine_info(struct drm_i915_private *i915,
    struct drm_i915_query_item *query_item)
@@ -140,7 +141,7 @@ query_engine_info(struct drm_i915_private *i915,
  if (query_item->flags)
  return -EINVAL;
-    for_each_uabi_engine

Re: [PATCH 2/2] drm/i915/gt: Set default CCS mode '1'

2024-02-19 Thread Tvrtko Ursulin



On 15/02/2024 13:59, Andi Shyti wrote:

Since CCS automatic load balancing is disabled, we will impose a
fixed balancing policy that involves setting all the CCS engines
to work together on the same load.

Simultaneously, the user will see only 1 CCS rather than the
actual number. As of now, this change affects only DG2.

Fixes: d2eae8e98d59 ("drm/i915/dg2: Drop force_probe requirement")
Signed-off-by: Andi Shyti 
Cc: Chris Wilson 
Cc: Joonas Lahtinen 
Cc: Matt Roper 
Cc:  # v6.2+
---
  drivers/gpu/drm/i915/gt/intel_gt.c  | 11 +++
  drivers/gpu/drm/i915/gt/intel_gt_regs.h |  2 ++
  drivers/gpu/drm/i915/i915_drv.h | 17 +
  drivers/gpu/drm/i915/i915_query.c   |  5 +++--
  4 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index a425db5ed3a2..e19df4ef47f6 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -168,6 +168,14 @@ static void init_unused_rings(struct intel_gt *gt)
}
  }
  
+static void intel_gt_apply_ccs_mode(struct intel_gt *gt)
+{
+   if (!IS_DG2(gt->i915))
+   return;
+
+   intel_uncore_write(gt->uncore, XEHP_CCS_MODE, 0);
+}
+
  int intel_gt_init_hw(struct intel_gt *gt)
  {
struct drm_i915_private *i915 = gt->i915;
@@ -195,6 +203,9 @@ int intel_gt_init_hw(struct intel_gt *gt)
  
  	intel_gt_init_swizzling(gt);
  
+	/* Configure CCS mode */
+	intel_gt_apply_ccs_mode(gt);
+
/*
 * At least 830 can leave some of the unused rings
 * "active" (ie. head != tail) after resume which
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
index cf709f6c05ae..c148113770ea 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
@@ -1605,6 +1605,8 @@
  #define   GEN12_VOLTAGE_MASK  REG_GENMASK(10, 0)
  #define   GEN12_CAGF_MASK REG_GENMASK(19, 11)
  
+#define XEHP_CCS_MODE  _MMIO(0x14804)
+
  #define GEN11_GT_INTR_DW(x)   _MMIO(0x190018 + ((x) * 4))
  #define   GEN11_CSME  (31)
  #define   GEN12_HECI_2(30)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index e81b3b2858ac..0853ffd3cb8d 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -396,6 +396,23 @@ static inline struct intel_gt *to_gt(const struct drm_i915_private *i915)
 (engine__); \
 (engine__) = rb_to_uabi_engine(rb_next(&(engine__)->uabi_node)))
  
+/*
+ * Exclude unavailable engines.
+ *
+ * Only the first CCS engine is utilized due to the disabling of CCS auto load
+ * balancing. As a result, all CCS engines operate collectively, functioning
+ * essentially as a single CCS engine, hence the count of active CCS engines is
+ * considered '1'.
+ * Currently, this applies to platforms with more than one CCS engine,
+ * specifically DG2.
+ */
+#define for_each_available_uabi_engine(engine__, i915__) \
+   for_each_uabi_engine(engine__, i915__) \
+   if ((IS_DG2(i915__)) && \
+   ((engine__)->uabi_class == I915_ENGINE_CLASS_COMPUTE) && \
+   ((engine__)->uabi_instance)) { } \
+   else
+


If you don't want userspace to see some engines, just don't add them to 
the uabi list in intel_engines_driver_register or thereabouts?


Similar as we do for gsc which uses I915_NO_UABI_CLASS, although for ccs 
you can choose a different approach, whatever is more elegant.


That is also needed for i915->engine_uabi_class_count to be right, so 
userspace stats which rely on it are correct.
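
As a rough illustration of the counting problem (a userspace sketch only; the sketch_* names and the array-based iterator are assumptions standing in for the rb-tree walk), the macro quoted above uses the "if (skip) { } else" idiom to filter inside a for loop. Anything that still uses the unfiltered walk, such as a per-class counter, then reports a different number:

```c
#include <stddef.h>

/* Hypothetical stand-ins for the uabi engine classes. */
enum sketch_class { SKETCH_RENDER, SKETCH_COPY, SKETCH_COMPUTE };

struct sketch_engine {
	enum sketch_class uabi_class;
	unsigned int uabi_instance;
};

/* Unfiltered walk over an array of engines. */
#define sketch_for_each_engine(e, arr, n) \
	for ((e) = (arr); (e) < (arr) + (n); (e)++)

/*
 * Filtered walk: the empty "if" body swallows the engines to skip and
 * the loop body supplied by the caller becomes the "else" branch.
 */
#define sketch_for_each_available_engine(e, arr, n) \
	sketch_for_each_engine(e, arr, n) \
		if ((e)->uabi_class == SKETCH_COMPUTE && \
		    (e)->uabi_instance) { } \
		else

/* What a query built on the filtered iterator reports. */
static unsigned int
sketch_query_count(const struct sketch_engine *arr, size_t n)
{
	const struct sketch_engine *e;
	unsigned int count = 0;

	sketch_for_each_available_engine(e, arr, n)
		count++;

	return count;
}

/* What a per-class counter filled in from the full list reports. */
static unsigned int
sketch_class_count(const struct sketch_engine *arr, size_t n,
		   enum sketch_class uabi_class)
{
	const struct sketch_engine *e;
	unsigned int count = 0;

	sketch_for_each_engine(e, arr, n)
		if (e->uabi_class == uabi_class)
			count++;

	return count;
}
```

For one render, one copy and four compute engines the filtered query counts three engines in total while the per-class counter still sees four compute engines, which is the mismatch that keeping the engines off the list in the first place would avoid.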


Regards,

Tvrtko


  #define INTEL_INFO(i915)  ((i915)->__info)
  #define RUNTIME_INFO(i915)(&(i915)->__runtime)
  #define DRIVER_CAPS(i915) (&(i915)->caps)
diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c
index fa3e937ed3f5..2d41bda626a6 100644
--- a/drivers/gpu/drm/i915/i915_query.c
+++ b/drivers/gpu/drm/i915/i915_query.c
@@ -124,6 +124,7 @@ static int query_geometry_subslices(struct drm_i915_private *i915,
	return fill_topology_info(sseu, query_item, sseu->geometry_subslice_mask);
  }
  
+
  static int
  query_engine_info(struct drm_i915_private *i915,
  struct drm_i915_query_item *query_item)
@@ -140,7 +141,7 @@ query_engine_info(struct drm_i915_private *i915,
if (query_item->flags)
return -EINVAL;
  
-	for_each_uabi_engine(engine, i915)
+	for_each_available_uabi_engine(engine, i915)
num_uabi_engines++;
  
  	len = struct_size(query_ptr, engines, num_uabi_engines);
@@ -155,7 +156,7 @@ query_engine_info(struct drm_i915_private *i915,
  
  	info_ptr = &query_ptr->engines[0];
  
-	for_each_uabi_engine(engine, i915) {
+   for_each_available_uabi_engine(engine, i9

[PULL] drm-intel-gt-next

2024-02-15 Thread Tvrtko Ursulin
Hi Dave, Daniel,

First pull request for 6.9 with probably one more coming in one to two
weeks.

Nothing too interesting in this one, mostly a sprinkle of small fixes in
GuC, HuC, Perf/OA, a tiny bit of prep work for future platforms and some
code cleanups.

One new uapi in the form of a GuC submission version query which Mesa
wants for implementing Vulkan async compute queues.

Regards,

Tvrtko

drm-intel-gt-next-2024-02-15:
UAPI Changes:

- Add GuC submission interface version query (Tvrtko Ursulin)

Driver Changes:

Fixes/improvements/new stuff:

- Atomically invalidate userptr on mmu-notifier (Jonathan Cavitt)
- Update handling of MMIO triggered reports (Umesh Nerlige Ramappa)
- Don't make assumptions about intel_wakeref_t type (Jani Nikula)
- Add workaround 14019877138 [xelpg] (Tejas Upadhyay)
- Allow for very slow HuC loading [huc] (John Harrison)
- Flush context destruction worker at suspend [guc] (Alan Previn)
- Close deregister-context race against CT-loss [guc] (Alan Previn)
- Avoid circular locking issue on busyness flush [guc] (John Harrison)
- Use rc6.supported flag from intel_gt for rc6_enable sysfs (Juan Escamilla)
- Reflect the true and current status of rc6_enable (Juan Escamilla)
- Wake GT before sending H2G message [mtl] (Vinay Belgaumkar)
- Restart the heartbeat timer when forcing a pulse (John Harrison)

Future platform enablement:

- Extend driver code of Xe_LPG to Xe_LPG+ [xelpg] (Harish Chegondi)
- Extend some workarounds/tuning to gfx version 12.74 [xelpg] (Matt Roper)

Miscellaneous:

- Reconcile Excess struct member kernel-doc warnings (Randy Dunlap)
- Change wa and EU_PERF_CNTL registers to MCR type [guc] (Shuicheng Lin)
- Add flex arrays to struct i915_syncmap (Erick Archer)
- Increasing the sleep time for live_rc6_manual [selftests] (Anirban Sk)
The following changes since commit 31accc37eaee98a90b25809ed58c6ee4956ab642:

  drm/i915: Use kmap_local_page() in gem/i915_gem_execbuffer.c (2023-12-15 
09:34:31 +)

are available in the Git repository at:

  git://anongit.freedesktop.org/drm/drm-intel tags/drm-intel-gt-next-2024-02-15

for you to fetch changes up to eb927f01dfb6309c8a184593c2c0618c4000c481:

  drm/i915/gt: Restart the heartbeat timer when forcing a pulse (2024-02-14 
17:17:35 -0800)


UAPI Changes:

- Add GuC submission interface version query (Tvrtko Ursulin)

Driver Changes:

Fixes/improvements/new stuff:

- Atomically invalidate userptr on mmu-notifier (Jonathan Cavitt)
- Update handling of MMIO triggered reports (Umesh Nerlige Ramappa)
- Don't make assumptions about intel_wakeref_t type (Jani Nikula)
- Add workaround 14019877138 [xelpg] (Tejas Upadhyay)
- Allow for very slow HuC loading [huc] (John Harrison)
- Flush context destruction worker at suspend [guc] (Alan Previn)
- Close deregister-context race against CT-loss [guc] (Alan Previn)
- Avoid circular locking issue on busyness flush [guc] (John Harrison)
- Use rc6.supported flag from intel_gt for rc6_enable sysfs (Juan Escamilla)
- Reflect the true and current status of rc6_enable (Juan Escamilla)
- Wake GT before sending H2G message [mtl] (Vinay Belgaumkar)
- Restart the heartbeat timer when forcing a pulse (John Harrison)

Future platform enablement:

- Extend driver code of Xe_LPG to Xe_LPG+ [xelpg] (Harish Chegondi)
- Extend some workarounds/tuning to gfx version 12.74 [xelpg] (Matt Roper)

Miscellaneous:

- Reconcile Excess struct member kernel-doc warnings (Randy Dunlap)
- Change wa and EU_PERF_CNTL registers to MCR type [guc] (Shuicheng Lin)
- Add flex arrays to struct i915_syncmap (Erick Archer)
- Increasing the sleep time for live_rc6_manual [selftests] (Anirban Sk)


Alan Previn (2):
  drm/i915/guc: Flush context destruction worker at suspend
  drm/i915/guc: Close deregister-context race against CT-loss

Anirban Sk (1):
  drm/i915/selftests: Increasing the sleep time for live_rc6_manual

Erick Archer (1):
  drm/i915: Add flex arrays to struct i915_syncmap

Harish Chegondi (1):
  drm/i915/xelpg: Extend driver code of Xe_LPG to Xe_LPG+

Jani Nikula (1):
  drm/i915: don't make assumptions about intel_wakeref_t type

John Harrison (3):
  drm/i915/huc: Allow for very slow HuC loading
  drm/i915/guc: Avoid circular locking issue on busyness flush
  drm/i915/gt: Restart the heartbeat timer when forcing a pulse

Jonathan Cavitt (1):
  drm/i915/gem: Atomically invalidate userptr on mmu-notifier

Juan Escamilla (2):
  drm/i915/gt: Use rc6.supported flag from intel_gt for rc6_enable sysfs
  drm/i915/gt: Reflect the true and current status of rc6_enable

Matt Roper (1):
  drm/i915/xelpg: Extend some workarounds/tuning to gfx version 12.74

Randy Dunlap (4):
  drm/i915/gem: reconcile Excess struct member kernel-doc warnings
  drm/i915/gt: reconcile Excess struct member kernel-doc warnings
  drm/i915

Re: [PATCH 2/2] i915/pmu: Cleanup pending events on unbind

2024-02-14 Thread Tvrtko Ursulin



On 13/02/2024 18:03, Umesh Nerlige Ramappa wrote:

Once a user opens an fd for a perf event, if the driver undergoes a
function level reset (FLR), the resources are not cleaned up as
expected. For this discussion FLR is defined as a PCI unbind followed by
a bind. perf_pmu_unregister() would clean up everything, but when the user
closes the perf fd, perf_release is executed and we encounter null
pointer dereferences and/or list corruption in that path which require a
reboot to recover.

The only approach that worked to resolve this was to close the file
associated with the event such that the relevant cleanup happens w.r.t.
the open file. To do so, use the event->owner task and find the file
relevant to the event and close it. This relies on the
file->private_data matching the event object.

Note:
- Closing the event file is a delayed work that gets queued to system_wq.
The close is seen to happen when kernel returns to user space following
the unbind.

- The perf framework will access the pmu object after the last event has
been destroyed. The drm device is refcounted in the init and destroy
hooks, so this causes a use after free if we are releasing the drm
device reference after unbind has been called. To work around this, we
take an extra reference in the unbind path and release it using a
delayed work in the destroy path. The delayed work is queued to
system_wq.
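
The reference counting described in the second note can be modelled in plain userspace C. This is only a sketch under stated assumptions: the model_* names are invented, a simple counter stands in for the tracked event list, and the case of unbinding with no open events is left out.

```c
#include <stdbool.h>

struct model_dev {
	int refcount;		/* stands in for the drm device reference count */
	int open_events;	/* stands in for the list of initialized events */
	bool pmu_closed;	/* set once unbind has run */
	bool freed;		/* set when the last reference is dropped */
};

static void model_dev_get(struct model_dev *dev)
{
	dev->refcount++;
}

static void model_dev_put(struct model_dev *dev)
{
	if (--dev->refcount == 0)
		dev->freed = true;
}

/* Event init hook: track the event and pin the device. */
static void model_event_init(struct model_dev *dev)
{
	dev->open_events++;
	model_dev_get(dev);
}

/* Unbind path: mark the PMU closed and take the extra reference. */
static void model_pmu_unregister(struct model_dev *dev)
{
	dev->pmu_closed = true;
	model_dev_get(dev);
}

/*
 * Event destroy hook. Returns true when this was the last event after
 * unbind, i.e. when the caller should queue the delayed release.
 */
static bool model_event_destroy(struct model_dev *dev)
{
	bool queue_release;

	dev->open_events--;
	queue_release = dev->pmu_closed && dev->open_events == 0;
	model_dev_put(dev);

	return queue_release;
}

/* Delayed work: drop the extra reference taken at unbind. */
static void model_delayed_release(struct model_dev *dev)
{
	model_dev_put(dev);
}
```

In this model the device survives the destroy hook of the last event and is only freed once the delayed release has run, which is the window in which perf may still touch the pmu object.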

Ref: https://lore.kernel.org/lkml/20240115170120.662220-1-tvrtko.ursu...@linux.intel.com/T/#me72abfa2771e6fc94b167ce47efdbf391cc313ab

Opens:
- Synchronization may be needed between i915_pmu_unregister and
i915_pmu_event_destroy to avoid any races.

- If unbind and bind happen from the same process the event fd is closed
after bind completes. This means that the cleanup would not happen
until bind completes. In this case, i915 loads fine, but pmu
registration fails with an error that the sysfs entries are already
present. There is no feasible solution here. Since this is not a fatal
error (reloading i915 works fine) and the usual case is to have bind and
unbind in separate processes, there is no intention to solve this.

Other solutions/aspects tried:
- Call perf_event_disable() followed by perf_event_release_kernel() in
the unbind path to clean up the events. This still causes issues when
the user closes the fd, since perf_event_release_kernel() is called again
and fails, requiring a reboot.

- Close all event fds in unbind and wait for the close to complete by
checking if the list is empty. This wait does not work since the files
are only actually closed when unbind returns to user space.

Testing:
- New IGT tests have been added for this and are run with KASAN and
   kmemleak enabled.

Signed-off-by: Umesh Nerlige Ramappa 
---
  drivers/gpu/drm/i915/i915_pmu.c | 96 -
  drivers/gpu/drm/i915/i915_pmu.h | 15 ++
  2 files changed, 110 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 4d2a289f848a..2f365c7f5db7 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -4,6 +4,8 @@
   * Copyright © 2017-2018 Intel Corporation
   */
  
+#include 

+#include 
  #include 
  
  #include "gt/intel_engine.h"

@@ -573,9 +575,21 @@ static void i915_pmu_event_destroy(struct perf_event *event)
  {
struct i915_pmu *pmu = event_to_pmu(event);
struct drm_i915_private *i915 = pmu_to_i915(pmu);
+   struct i915_event *e = event->pmu_private;
  
  	drm_WARN_ON(&i915->drm, event->parent);
  
+	if (e) {

+   event->pmu_private = NULL;
+   list_del(&e->link);
+   kfree(e);
+   }
+
+   if (i915->pmu.closed && list_empty(&i915->pmu.initialized_events)) {
+   pmu_teardown(&i915->pmu);
+   mod_delayed_work(system_wq, &i915->pmu.work, 50);
+   }
+
drm_dev_put(&i915->drm);
  }
  
@@ -684,6 +698,14 @@ static int i915_pmu_event_init(struct perf_event *event)
return ret;
  
  	if (!event->parent) {

+   struct i915_event *e = kzalloc(sizeof(*e), GFP_KERNEL);
+
+   if (!e)
+   return -ENOMEM;
+
+   e->event = event;
+   list_add(&e->link, &pmu->initialized_events);
+   event->pmu_private = e;
drm_dev_get(&i915->drm);
event->destroy = i915_pmu_event_destroy;
}
@@ -1256,6 +1278,14 @@ void i915_pmu_exit(void)
cpuhp_remove_multi_state(cpuhp_slot);
  }
  
+static void i915_pmu_release(struct work_struct *work)
+{
+   struct i915_pmu *pmu = container_of(work, typeof(*pmu), work.work);
+   struct drm_i915_private *i915 = container_of(pmu, typeof(*i915), pmu);
+
+   drm_dev_put(&i915->drm);
+}
+
  void i915_pmu_register(struct drm_i915_private *i915)
  {
struct i915_pmu *pmu = &i915->pmu;
@@ -1313,6 +1343,9 @@ void i915_pmu_register(struct drm_i915_private *i915)
pmu->base.read   = i91

[RFC] drm/i915: Support replaying GPU hangs with captured context image

2024-02-13 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

When debugging GPU hangs Mesa developers are finding it useful to replay
the captured error state against the simulator. But due to various simulator
limitations, which prevent replicating all hangs, one step further is being
able to replay against a real GPU.

This is almost doable today with the missing part being able to upload the
captured context image into the driver state prior to executing the
uploaded hanging batch and all the buffers.

To enable this last part we add a new context parameter called
I915_CONTEXT_PARAM_CONTEXT_IMAGE. It follows the existing SSEU
configuration pattern of being able to select which context to apply
against, paired with the actual image and its size.

Since this is adding a new concept of debug only uapi, we hide it behind
a new kconfig option and also require activation with a module parameter.
Together with a warning banner printed at driver load, all those combined
should be sufficient to guard against inadvertently enabling the feature.
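
As a sketch of how this gating composes with the argument checks visible in the set_context_image() hunk of the patch (a userspace model only; the struct layout and every model_* name are assumptions, not the proposed uapi):

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag, standing in for the single flag the patch accepts. */
#define MODEL_IMAGE_FLAG_ENGINE_INDEX (1u << 0)

/* Hypothetical argument layout, only meant to carry the checked fields. */
struct model_context_image {
	uint32_t flags;
	uint32_t mbz;	/* must be zero */
	uint32_t size;
	uint64_t image;
};

/*
 * Mirrors the order of the checks: build-time option first, then the
 * module parameter, then the size and contents of the user argument.
 */
static int model_check_context_image(bool built_in, bool param_enabled,
				     size_t arg_size,
				     const struct model_context_image *user)
{
	if (!built_in)
		return -EINVAL;

	if (!param_enabled)
		return -EINVAL;

	if (arg_size < sizeof(*user))
		return -EINVAL;

	if (user->mbz)
		return -EINVAL;

	if (user->flags & ~MODEL_IMAGE_FLAG_ENGINE_INDEX)
		return -EINVAL;

	return 0;
}
```

Returning the same -EINVAL whether the feature is compiled out, not activated, or simply misused means userspace cannot tell a disabled kernel apart from an unknown parameter, which seems in keeping with hiding a debug-only interface.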

In terms of implementation the only trivial change is shadowing of the
default state from engine to context. We also allow the legacy context
set param to be used since that removes the need to record the per context
data in the proto context, while still allowing flexibility of specifying
context images for any context.

Mesa MR using the uapi can be seen at:
  https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27594

Signed-off-by: Tvrtko Ursulin 
Cc: Lionel Landwerlin 
Cc: Carlos Santa 
---
 drivers/gpu/drm/i915/Kconfig.debug|  17 +++
 drivers/gpu/drm/i915/gem/i915_gem_context.c   | 106 ++
 drivers/gpu/drm/i915/gt/intel_context.c   |   2 +
 drivers/gpu/drm/i915/gt/intel_context.h   |  22 
 drivers/gpu/drm/i915/gt/intel_context_types.h |   3 +
 drivers/gpu/drm/i915/gt/intel_lrc.c   |   8 +-
 .../gpu/drm/i915/gt/intel_ring_submission.c   |   8 +-
 drivers/gpu/drm/i915/i915_params.c|   5 +
 drivers/gpu/drm/i915/i915_params.h|   3 +-
 include/uapi/drm/i915_drm.h   |  27 +
 10 files changed, 194 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/Kconfig.debug 
b/drivers/gpu/drm/i915/Kconfig.debug
index 5b7162076850..32e9f70e91ed 100644
--- a/drivers/gpu/drm/i915/Kconfig.debug
+++ b/drivers/gpu/drm/i915/Kconfig.debug
@@ -16,6 +16,23 @@ config DRM_I915_WERROR
 
  If in doubt, say "N".
 
+config DRM_I915_REPLAY_GPU_HANGS_API
+   bool "Enable GPU hang replay userspace API"
+   depends on DRM_I915
+   depends on EXPERT
+   default n
+   help
+ Choose this option if you want to enable special and unstable
+ userspace API used for replaying GPU hangs on a running system.
+
+ This API is intended to be used by userspace graphics stack developers
+ and provides no stability guarantees.
+
+ The API needs to be activated at boot time using the
+ enable_debug_only_api module parameter.
+
+ If in doubt, say "N".
+
 config DRM_I915_DEBUG
bool "Enable additional driver debugging"
depends on DRM_I915
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index dcbfe32fd30c..1cfd624bd978 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -78,6 +78,7 @@
 #include "gt/intel_engine_user.h"
 #include "gt/intel_gpu_commands.h"
 #include "gt/intel_ring.h"
+#include "gt/shmem_utils.h"
 
 #include "pxp/intel_pxp.h"
 
@@ -949,6 +950,7 @@ static int set_proto_ctx_param(struct drm_i915_file_private *fpriv,
case I915_CONTEXT_PARAM_NO_ZEROMAP:
case I915_CONTEXT_PARAM_BAN_PERIOD:
case I915_CONTEXT_PARAM_RINGSIZE:
+   case I915_CONTEXT_PARAM_CONTEXT_IMAGE:
default:
ret = -EINVAL;
break;
@@ -2092,6 +2094,88 @@ static int get_protected(struct i915_gem_context *ctx,
return 0;
 }
 
+static int set_context_image(struct i915_gem_context *ctx,
+struct drm_i915_gem_context_param *args)
+{
+   struct i915_gem_context_param_context_image user;
+   struct intel_context *ce;
+   struct file *shmem_state;
+   unsigned long lookup;
+   void *state;
+   int ret = 0;
+
+   if (!IS_ENABLED(CONFIG_DRM_I915_REPLAY_GPU_HANGS_API))
+   return -EINVAL;
+
+   if (!ctx->i915->params.enable_debug_only_api)
+   return -EINVAL;
+
+   if (args->size < sizeof(user))
+   return -EINVAL;
+
+   if (copy_from_user(&user, u64_to_user_ptr(args->value), sizeof(user)))
+   return -EFAULT;
+
+   if (user.mbz)
+   return -EINVAL;
+
+   if (user.flags & ~(I915_CONTEXT_IMAGE_FLAG_ENGINE_INDEX))
+   return -EINVAL;
+
+   looku

Re: [PATCH] drm/i915: Add flex arrays to struct i915_syncmap

2024-02-12 Thread Tvrtko Ursulin



On 08/02/2024 18:13, Erick Archer wrote:

The "struct i915_syncmap" uses a dynamically sized set of trailing
elements. It can use an "u32" array or a "struct i915_syncmap *"
array.

So, use the preferred way in the kernel declaring flexible arrays [1].
Because there are two possibilities for the trailing arrays, it is
necessary to declare a union and use the DECLARE_FLEX_ARRAY macro.

The comment can be removed as the union is now clear enough.

Also, avoid the open-coded arithmetic in the memory allocator functions
[2] using the "struct_size" macro.

Moreover, refactor the "__sync_seqno" and "__sync_child" functions, because
it is now possible to use the union members added to the structure.
This way, it is also possible to avoid the open-coded arithmetic in
pointers.
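
For readers outside the kernel, the two ideas can be sketched in standard C. This is an approximation under stated assumptions: model_struct_size() is a hand-written stand-in for the saturating behaviour of struct_size(), only the seqno variant of the trailing array is modelled (standard C has no direct equivalent of DECLARE_FLEX_ARRAY inside a union), and the fan-out value is assumed.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MODEL_KSYNCMAP 16	/* fan-out; the value is an assumption */

struct model_leaf {
	unsigned int height;
	unsigned int bitmap;
	struct model_leaf *parent;
	uint32_t seqno[];	/* flexible array member */
};

/*
 * header + count * elem, saturating at SIZE_MAX instead of wrapping,
 * so that an overflowing request makes the allocation fail.
 */
static size_t model_struct_size(size_t header, size_t elem, size_t count)
{
	if (elem && count > (SIZE_MAX - header) / elem)
		return SIZE_MAX;

	return header + count * elem;
}

static struct model_leaf *model_alloc_leaf(struct model_leaf *parent)
{
	struct model_leaf *p;

	p = malloc(model_struct_size(sizeof(*p), sizeof(p->seqno[0]),
				     MODEL_KSYNCMAP));
	if (!p)
		return NULL;

	p->height = 0;
	p->bitmap = 0;
	p->parent = parent;

	return p;
}

/* Named member access replaces the old "(u32 *)(p + 1)" arithmetic. */
static uint32_t *model_sync_seqno(struct model_leaf *p)
{
	return p->seqno;
}
```

The accessor no longer depends on the reader knowing what follows the header, and the size computation cannot silently wrap around.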

Link: https://www.kernel.org/doc/html/next/process/deprecated.html#zero-length-and-one-element-arrays [1]
Link: https://www.kernel.org/doc/html/next/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments [2]
Signed-off-by: Erick Archer 


Looks good to me too so I've pushed it to drm-intel-gt-next, thanks!

Regards,

Tvrtko


---
  drivers/gpu/drm/i915/i915_syncmap.c | 19 ---
  1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_syncmap.c b/drivers/gpu/drm/i915/i915_syncmap.c
index 60404dbb2e9f..df6437c37373 100644
--- a/drivers/gpu/drm/i915/i915_syncmap.c
+++ b/drivers/gpu/drm/i915/i915_syncmap.c
@@ -75,13 +75,10 @@ struct i915_syncmap {
unsigned int height;
unsigned int bitmap;
struct i915_syncmap *parent;
-   /*
-* Following this header is an array of either seqno or child pointers:
-* union {
-*  u32 seqno[KSYNCMAP];
-*  struct i915_syncmap *child[KSYNCMAP];
-* };
-*/
+   union {
+   DECLARE_FLEX_ARRAY(u32, seqno);
+   DECLARE_FLEX_ARRAY(struct i915_syncmap *, child);
+   };
  };

  /**
@@ -99,13 +96,13 @@ void i915_syncmap_init(struct i915_syncmap **root)
  static inline u32 *__sync_seqno(struct i915_syncmap *p)
  {
GEM_BUG_ON(p->height);
-   return (u32 *)(p + 1);
+   return p->seqno;
  }

  static inline struct i915_syncmap **__sync_child(struct i915_syncmap *p)
  {
GEM_BUG_ON(!p->height);
-   return (struct i915_syncmap **)(p + 1);
+   return p->child;
  }

  static inline unsigned int
@@ -200,7 +197,7 @@ __sync_alloc_leaf(struct i915_syncmap *parent, u64 id)
  {
struct i915_syncmap *p;

-   p = kmalloc(sizeof(*p) + KSYNCMAP * sizeof(u32), GFP_KERNEL);
+   p = kmalloc(struct_size(p, seqno, KSYNCMAP), GFP_KERNEL);
if (unlikely(!p))
return NULL;

@@ -282,7 +279,7 @@ static noinline int __sync_set(struct i915_syncmap **root, u64 id, u32 seqno)
unsigned int above;

/* Insert a join above the current layer */
-   next = kzalloc(sizeof(*next) + KSYNCMAP * sizeof(next),
+   next = kzalloc(struct_size(next, child, KSYNCMAP),
   GFP_KERNEL);
if (unlikely(!next))
return -ENOMEM;
--
2.25.1



Re: [PATCH v2] drm/i915: Add GuC submission interface version query

2024-02-09 Thread Tvrtko Ursulin



On 08/02/2024 17:55, Souza, Jose wrote:

On Thu, 2024-02-08 at 07:19 -0800, José Roberto de Souza wrote:

On Thu, 2024-02-08 at 14:59 +, Tvrtko Ursulin wrote:

On 08/02/2024 14:30, Souza, Jose wrote:

On Thu, 2024-02-08 at 08:25 +, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

Add a new query for the GuC submission interface version.

Mesa intends to use this information to check for old firmware versions
with a known bug where using the render and compute command streamers
simultaneously can cause GPU hangs due to issues in firmware scheduling.

Based on patches from Vivaik and Joonas.

Compile tested only.

v2:
   * Added branch version.


Reviewed-by: José Roberto de Souza 
Tested-by: José Roberto de Souza 
UMD: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25233


Thanks, but we also need to close on the branch number
situation, i.e. be sure what the failure mode is in shipping Mesa with
the change as it stands in the linked MR: which platforms could start
failing and when, depending on GuC FW release eventualities.


yes, I have asked John Harrison for a documentation link about the firmware 
versioning.


Got the documentation link, MR updated.
Will ask for reviews in Mesa side.


Is it then understood and accepted that, should GuC ever update the 
branch number on any given platform, that platform will, for all Mesa 
versions deployed in the field, automatically revert to no async queues 
and so cause a silent performance regression?
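
To make the concern concrete, here is a hypothetical userspace-side check (not Mesa's actual code; the model_* names and the threshold version are placeholders, not values taken from this thread). A consumer that only trusts branch 0 silently loses the feature the moment the branch number changes, even on firmware that is otherwise new enough:

```c
#include <stdbool.h>
#include <stdint.h>

/* Same four fields as the proposed query struct. */
struct model_guc_version {
	uint32_t branch;
	uint32_t major;
	uint32_t minor;
	uint32_t patch;
};

/* Placeholder for "first version with the scheduling fix". */
#define MODEL_FIXED_MAJOR 1
#define MODEL_FIXED_MINOR 1
#define MODEL_FIXED_PATCH 3

/* Lexicographic comparison of major.minor.patch. */
static bool model_version_at_least(const struct model_guc_version *v,
				   uint32_t major, uint32_t minor,
				   uint32_t patch)
{
	if (v->major != major)
		return v->major > major;
	if (v->minor != minor)
		return v->minor > minor;

	return v->patch >= patch;
}

/* Strict policy: versions on an unknown branch are never trusted. */
static bool model_async_compute_ok(const struct model_guc_version *v)
{
	if (v->branch != 0)
		return false;

	return model_version_at_least(v, MODEL_FIXED_MAJOR,
				      MODEL_FIXED_MINOR, MODEL_FIXED_PATCH);
}
```

Under this policy a firmware reporting branch 1 is rejected regardless of its major.minor.patch, which is exactly the silent fallback to no async queues asked about above.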


Regards,

Tvrtko







Regards,

Tvrtko


Signed-off-by: Tvrtko Ursulin 
Cc: Kenneth Graunke 
Cc: Jose Souza 
Cc: Sagar Ghuge 
Cc: Paulo Zanoni 
Cc: John Harrison 
Cc: Rodrigo Vivi 
Cc: Jani Nikula 
Cc: Tvrtko Ursulin 
Cc: Vivaik Balasubrawmanian 
Cc: Joonas Lahtinen 
---
   drivers/gpu/drm/i915/i915_query.c | 33 +++
   include/uapi/drm/i915_drm.h   | 12 +++
   2 files changed, 45 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c
index 00871ef99792..d4dba1240b40 100644
--- a/drivers/gpu/drm/i915/i915_query.c
+++ b/drivers/gpu/drm/i915/i915_query.c
@@ -551,6 +551,38 @@ static int query_hwconfig_blob(struct drm_i915_private *i915,
return hwconfig->size;
   }
   
+static int
+query_guc_submission_version(struct drm_i915_private *i915,
+struct drm_i915_query_item *query)
+{
+   struct drm_i915_query_guc_submission_version __user *query_ptr =
+   u64_to_user_ptr(query->data_ptr);
+   struct drm_i915_query_guc_submission_version ver;
+   struct intel_guc *guc = &to_gt(i915)->uc.guc;
+   const size_t size = sizeof(ver);
+   int ret;
+
+   if (!intel_uc_uses_guc_submission(&to_gt(i915)->uc))
+   return -ENODEV;
+
+   ret = copy_query_item(&ver, size, size, query);
+   if (ret != 0)
+   return ret;
+
+   if (ver.branch || ver.major || ver.minor || ver.patch)
+   return -EINVAL;
+
+   ver.branch = 0;
+   ver.major = guc->submission_version.major;
+   ver.minor = guc->submission_version.minor;
+   ver.patch = guc->submission_version.patch;
+
+   if (copy_to_user(query_ptr, &ver, size))
+   return -EFAULT;
+
+   return 0;
+}
+
   static int (* const i915_query_funcs[])(struct drm_i915_private *dev_priv,
struct drm_i915_query_item *query_item) 
= {
query_topology_info,
@@ -559,6 +591,7 @@ static int (* const i915_query_funcs[])(struct drm_i915_private *dev_priv,
query_memregion_info,
query_hwconfig_blob,
query_geometry_subslices,
+   query_guc_submission_version,
   };
   
   int i915_query_ioctl(struct drm_device *dev, void *data, struct drm_file *file)

diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 550c496ce76d..84fb7f7ea834 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -3038,6 +3038,7 @@ struct drm_i915_query_item {
 *  - %DRM_I915_QUERY_MEMORY_REGIONS (see struct drm_i915_query_memory_regions)
 *  - %DRM_I915_QUERY_HWCONFIG_BLOB (see `GuC HWCONFIG blob uAPI`)
 *  - %DRM_I915_QUERY_GEOMETRY_SUBSLICES (see struct drm_i915_query_topology_info)
+*  - %DRM_I915_QUERY_GUC_SUBMISSION_VERSION (see struct drm_i915_query_guc_submission_version)
 */
__u64 query_id;
   #define DRM_I915_QUERY_TOPOLOGY_INFO 1
@@ -3046,6 +3047,7 @@ struct drm_i915_query_item {
   #define DRM_I915_QUERY_MEMORY_REGIONS4
   #define DRM_I915_QUERY_HWCONFIG_BLOB 5
   #define DRM_I915_QUERY_GEOMETRY_SUBSLICES6
+#define DRM_I915_QUERY_GUC_SUBMISSION_VERSION  7
   /* Must be kept compact -- no holes and well documented */
   
   	/**

@@ -3591,6 +3593,16 @@ struct drm_i915_query_memory_regions {
struct drm_i915_memory_region

Re: [PATCH v2] drm/i915: Add GuC submission interface version query

2024-02-08 Thread Tvrtko Ursulin



On 08/02/2024 14:30, Souza, Jose wrote:

On Thu, 2024-02-08 at 08:25 +, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

Add a new query for the GuC submission interface version.

Mesa intends to use this information to check for old firmware versions
with a known bug where using the render and compute command streamers
simultaneously can cause GPU hangs due to issues in firmware scheduling.

Based on patches from Vivaik and Joonas.

Compile tested only.

v2:
  * Added branch version.


Reviewed-by: José Roberto de Souza 
Tested-by: José Roberto de Souza 
UMD: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25233


Thanks, but we also need to close on the branch number 
situation, i.e. be sure what the failure mode is in shipping Mesa with 
the change as it stands in the linked MR: which platforms could start 
failing and when, depending on GuC FW release eventualities.


Regards,

Tvrtko


Signed-off-by: Tvrtko Ursulin 
Cc: Kenneth Graunke 
Cc: Jose Souza 
Cc: Sagar Ghuge 
Cc: Paulo Zanoni 
Cc: John Harrison 
Cc: Rodrigo Vivi 
Cc: Jani Nikula 
Cc: Tvrtko Ursulin 
Cc: Vivaik Balasubrawmanian 
Cc: Joonas Lahtinen 
---
  drivers/gpu/drm/i915/i915_query.c | 33 +++
  include/uapi/drm/i915_drm.h   | 12 +++
  2 files changed, 45 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c
index 00871ef99792..d4dba1240b40 100644
--- a/drivers/gpu/drm/i915/i915_query.c
+++ b/drivers/gpu/drm/i915/i915_query.c
@@ -551,6 +551,38 @@ static int query_hwconfig_blob(struct drm_i915_private *i915,
return hwconfig->size;
  }
  
+static int
+query_guc_submission_version(struct drm_i915_private *i915,
+struct drm_i915_query_item *query)
+{
+   struct drm_i915_query_guc_submission_version __user *query_ptr =
+   u64_to_user_ptr(query->data_ptr);
+   struct drm_i915_query_guc_submission_version ver;
+   struct intel_guc *guc = &to_gt(i915)->uc.guc;
+   const size_t size = sizeof(ver);
+   int ret;
+
+   if (!intel_uc_uses_guc_submission(&to_gt(i915)->uc))
+   return -ENODEV;
+
+   ret = copy_query_item(&ver, size, size, query);
+   if (ret != 0)
+   return ret;
+
+   if (ver.branch || ver.major || ver.minor || ver.patch)
+   return -EINVAL;
+
+   ver.branch = 0;
+   ver.major = guc->submission_version.major;
+   ver.minor = guc->submission_version.minor;
+   ver.patch = guc->submission_version.patch;
+
+   if (copy_to_user(query_ptr, &ver, size))
+   return -EFAULT;
+
+   return 0;
+}
+
  static int (* const i915_query_funcs[])(struct drm_i915_private *dev_priv,
struct drm_i915_query_item *query_item) 
= {
query_topology_info,
@@ -559,6 +591,7 @@ static int (* const i915_query_funcs[])(struct drm_i915_private *dev_priv,
query_memregion_info,
query_hwconfig_blob,
query_geometry_subslices,
+   query_guc_submission_version,
  };
  
  int i915_query_ioctl(struct drm_device *dev, void *data, struct drm_file *file)

diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 550c496ce76d..84fb7f7ea834 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -3038,6 +3038,7 @@ struct drm_i915_query_item {
 *  - %DRM_I915_QUERY_MEMORY_REGIONS (see struct drm_i915_query_memory_regions)
 *  - %DRM_I915_QUERY_HWCONFIG_BLOB (see `GuC HWCONFIG blob uAPI`)
 *  - %DRM_I915_QUERY_GEOMETRY_SUBSLICES (see struct drm_i915_query_topology_info)
+*  - %DRM_I915_QUERY_GUC_SUBMISSION_VERSION (see struct drm_i915_query_guc_submission_version)
 */
__u64 query_id;
  #define DRM_I915_QUERY_TOPOLOGY_INFO  1
@@ -3046,6 +3047,7 @@ struct drm_i915_query_item {
  #define DRM_I915_QUERY_MEMORY_REGIONS 4
  #define DRM_I915_QUERY_HWCONFIG_BLOB  5
  #define DRM_I915_QUERY_GEOMETRY_SUBSLICES 6
+#define DRM_I915_QUERY_GUC_SUBMISSION_VERSION  7
  /* Must be kept compact -- no holes and well documented */
  
  	/**

@@ -3591,6 +3593,16 @@ struct drm_i915_query_memory_regions {
struct drm_i915_memory_region_info regions[];
  };
  
+/**
+ * struct drm_i915_query_guc_submission_version - query GuC submission interface version
+ */
+struct drm_i915_query_guc_submission_version {
+   __u32 branch;
+   __u32 major;
+   __u32 minor;
+   __u32 patch;
+};
+
  /**
   * DOC: GuC HWCONFIG blob uAPI
   *




Re: [RFC] drm/i915: Add GuC submission interface version query

2024-02-08 Thread Tvrtko Ursulin



On 07/02/2024 19:34, John Harrison wrote:

On 2/7/2024 10:49, Tvrtko Ursulin wrote:

On 07/02/2024 18:12, John Harrison wrote:

On 2/7/2024 03:56, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

Add a new query for the GuC submission interface version.

Mesa intends to use this information to check for old firmware versions
with a known bug where using the render and compute command streamers
simultaneously can cause GPU hangs due to issues in firmware scheduling.

Based on patches from Vivaik and Joonas.

There is a little bit of an open around the width required for versions.
While the GuC FW iface says they are u8, the i915 GuC code uses u32:

  #define CSS_SW_VERSION_UC_MAJOR   (0xFF << 16)
  #define CSS_SW_VERSION_UC_MINOR   (0xFF << 8)
  #define CSS_SW_VERSION_UC_PATCH   (0xFF << 0)
...
  struct intel_uc_fw_ver {
  u32 major;
  u32 minor;
  u32 patch;
  u32 build;
  };
This is copied from generic code which supports firmwares other than 
GuC. Only GuC promises to use 8-bit version components. Other 
firmwares very definitely do not. There is no open.


Ack.



So we could make the query u8, and refactor the struct intel_uc_fw_ver
to use u8, or not. To avoid any doubts about why we are assigning u32 to
u8, I simply opted to use u64, which also avoids the need to add any
padding.

I don't follow how potential 8 vs 32 confusion means jump to 64?!


Suggestion was to use u8 in the uapi in order to align with GuC FW ABI 
(or however it's called), in which case there would be:


   ver.major = guc->submission_version.major;

which would be:

   (u8) = (u32)

And I was anticipating someone not liking that either. Using too wide 
u64 simply avoids the need to add a padding element to the uapi struct.
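
To make the width and padding trade-off concrete, here is a minimal user-space sketch; it is not part of any patch in this thread. The struct names and the narrow() helper are made up for illustration, and only the field widths come from the discussion. That a 12-byte struct would want an explicit pad to reach a multiple of 8 bytes is an assumption about what the padding concern refers to. The size assertions hold on the usual Linux ABIs.

```c
#include <assert.h>
#include <stdint.h>

/* Three u32 fields: 12 bytes, not a multiple of 8. */
struct ver_3x_u32 { uint32_t major, minor, patch; };

/* Same, with an explicit must-be-zero pad to reach 16 bytes. */
struct ver_3x_u32_pad { uint32_t major, minor, patch, pad; };

/* RFC layout: three u64 fields, 24 bytes, no pad needed. */
struct ver_3x_u64 { uint64_t major, minor, patch; };

/* Four u32 fields: a branch field occupies the slot a pad would take. */
struct ver_4x_u32 { uint32_t branch, major, minor, patch; };

static_assert(sizeof(struct ver_3x_u32) == 12, "three u32 fields");
static_assert(sizeof(struct ver_3x_u32_pad) == 16, "three u32 plus pad");
static_assert(sizeof(struct ver_3x_u64) == 24, "three u64 fields");
static_assert(sizeof(struct ver_4x_u32) == 16, "four u32 fields");

/* The "(u8) = (u32)" case: narrowing silently keeps only the low 8 bits. */
static uint8_t narrow(uint32_t v)
{
	return (uint8_t)v;
}
```

With these definitions narrow(0x1ff) yields 0xff, which is the kind of silent truncation a u8 uapi field fed from a u32 would invite questions about.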


If you are positive we need to include a branch number, even though it 
does not seem to be implemented in the code even(*) then I can make 
uapi 4x u32 and achieve the same.
It's not implemented in the code because we've never had to, and it is 
yet another train wreck waiting to happen. There are a bunch of issues 
at different levels that need to be resolved. But that is all in the 
kernel and/or firmware and so can be added by a later kernel update when 
necessary. However, if the UMDs are not already taking it into account 
or it's not even in the UAPI, then we can't backfill in the kernel 
later, we are just broken.




(*)
static void uc_unpack_css_version(struct intel_uc_fw_ver *ver, u32 
css_value)

{
/* Get version numbers from the CSS header */
ver->major = FIELD_GET(CSS_SW_VERSION_UC_MAJOR, css_value);
ver->minor = FIELD_GET(CSS_SW_VERSION_UC_MINOR, css_value);
ver->patch = FIELD_GET(CSS_SW_VERSION_UC_PATCH, css_value);
}

No branch field in the CSS header?

I think there is, it's just not officially implemented yet.
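
For readers who want to try the unpacking quoted above outside the kernel, a minimal user-space sketch follows. The three masks are copied from the thread; field_get(), struct fw_ver and unpack_css_version() are hypothetical stand-ins for the kernel's FIELD_GET(), struct intel_uc_fw_ver and uc_unpack_css_version(). Nothing here reads bits 31:24, and the sketch makes no claim about what the CSS header keeps there.

```c
#include <assert.h>
#include <stdint.h>

#define CSS_SW_VERSION_UC_MAJOR	(0xFFu << 16)
#define CSS_SW_VERSION_UC_MINOR	(0xFFu << 8)
#define CSS_SW_VERSION_UC_PATCH	(0xFFu << 0)

struct fw_ver {
	uint32_t major;
	uint32_t minor;
	uint32_t patch;
};

/*
 * Stand-in for FIELD_GET(): mask the value, then divide by the lowest
 * set bit of the mask, which is the same as shifting the field down.
 */
static uint32_t field_get(uint32_t mask, uint32_t value)
{
	assert(mask != 0);
	return (value & mask) / (mask & (~mask + 1u));
}

static void unpack_css_version(struct fw_ver *ver, uint32_t css_value)
{
	ver->major = field_get(CSS_SW_VERSION_UC_MAJOR, css_value);
	ver->minor = field_get(CSS_SW_VERSION_UC_MINOR, css_value);
	ver->patch = field_get(CSS_SW_VERSION_UC_PATCH, css_value);
}
```

Feeding it 0x00010305 gives major 1, minor 3, patch 5; setting any of the top eight bits does not change that result.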



And why is the UMD supposed to reject a non-zero branch? Like, how would 
1.1.3.0 be fine and 1.1.3.1 be bad? I don't get it. But anyway, I can 
respin if you definitely confirm.

Because that is backwards. The branch number goes at the front.

So, for example (using made up numbers, I don't recall offhand what 
versions we have where) say we currently have 0.1.3.0 in tip and 0.1.1.0 
in the last LTS. We then need to ship a critical security fix and back 
port it to the LTS. Tip becomes 0.1.3.1 but the LTS can't become 0.1.1.1 
because that version already exists in the history of tip and does not 
contain the fix. So the LTS gets branched to 1.1.0.0. We then have both 
branches potentially moving forwards with completely independent 
versioning.


Exactly the same as 5.8.x, 5.9.y, 6.0.z, etc. in the Linux kernel 
versioning. You cannot make any assumptions about what might be in 
1.4.5.6 compared to 0.1.2.3. 1.4.5.6 could actually be 0.1.0.3 with a stack 
of security fixes but none of the features, workarounds or bug fixes 
that are in 0.1.2.3.


Hence, if the branch number changes then all bets are off. You have to 
start over and reject anything you do not explicitly know about.
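
The rule above (branch first, and anything on an unknown branch is rejected) could look roughly like this on the UMD side. This is only a sketch: the struct mirrors the four-field branch/major/minor/patch layout discussed in this thread, and guc_version_at_least() is a hypothetical helper, not something Mesa or i915 is known to provide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Four fields in the order they are compared, with the branch first. */
struct guc_submission_version {
	uint32_t branch;
	uint32_t major;
	uint32_t minor;
	uint32_t patch;
};

/*
 * Hypothetical helper: true only if 'ver' is on branch 0 and is at
 * least min_major.min_minor.min_patch.
 */
static bool guc_version_at_least(const struct guc_submission_version *ver,
				 uint32_t min_major, uint32_t min_minor,
				 uint32_t min_patch)
{
	assert(ver);

	/* A non-zero branch has independent numbering: all bets are off. */
	if (ver->branch != 0)
		return false;

	if (ver->major != min_major)
		return ver->major > min_major;
	if (ver->minor != min_minor)
		return ver->minor > min_minor;
	return ver->patch >= min_patch;
}
```

Using the made-up numbers from above, 0.1.3.1 passes a check for 1.3.0 or newer, while the branched 1.1.0.0 fails every check regardless of the minimum requested.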


This is why we were saying that exposing version numbers to UMDs breaks 
down horribly as soon as we have to start branching. There is no clean 
or simple way to do this.


Right, thank you, I know we talked about the challenges with version 
numbers in the past and fully agreed. I just did not think the idea was to 
conceptually put the branch number first.


(It is called build btw in the i915 struct if that needs cleanup at some 
point. Or maybe name depends on the firmware type.)


But as the plan to piggyback on the existing semaphore capability flag 
has failed, and i915 definitely does not want to keep a database mapping 
version branches to bug fixes, and Mesa is adamant that they cannot 
ship without something, the agreement was to let them have that 
something. At least at the pretend level one can say it makes sense to 
expose the version and don'

[PATCH v2] drm/i915: Add GuC submission interface version query

2024-02-08 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

Add a new query for the GuC submission interface version.

Mesa intends to use this information to check for old firmware versions
with a known bug where using the render and compute command streamers
simultaneously can cause GPU hangs due to issues in firmware scheduling.

Based on patches from Vivaik and Joonas.

Compile tested only.

v2:
 * Added branch version.

Signed-off-by: Tvrtko Ursulin 
Cc: Kenneth Graunke 
Cc: Jose Souza 
Cc: Sagar Ghuge 
Cc: Paulo Zanoni 
Cc: John Harrison 
Cc: Rodrigo Vivi 
Cc: Jani Nikula 
Cc: Tvrtko Ursulin 
Cc: Vivaik Balasubrawmanian 
Cc: Joonas Lahtinen 
---
 drivers/gpu/drm/i915/i915_query.c | 33 +++
 include/uapi/drm/i915_drm.h   | 12 +++
 2 files changed, 45 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_query.c 
b/drivers/gpu/drm/i915/i915_query.c
index 00871ef99792..d4dba1240b40 100644
--- a/drivers/gpu/drm/i915/i915_query.c
+++ b/drivers/gpu/drm/i915/i915_query.c
@@ -551,6 +551,38 @@ static int query_hwconfig_blob(struct drm_i915_private 
*i915,
return hwconfig->size;
 }
 
+static int
+query_guc_submission_version(struct drm_i915_private *i915,
+struct drm_i915_query_item *query)
+{
+   struct drm_i915_query_guc_submission_version __user *query_ptr =
+   u64_to_user_ptr(query->data_ptr);
+   struct drm_i915_query_guc_submission_version ver;
+   struct intel_guc *guc = &to_gt(i915)->uc.guc;
+   const size_t size = sizeof(ver);
+   int ret;
+
+   if (!intel_uc_uses_guc_submission(&to_gt(i915)->uc))
+   return -ENODEV;
+
+   ret = copy_query_item(&ver, size, size, query);
+   if (ret != 0)
+   return ret;
+
+   if (ver.branch || ver.major || ver.minor || ver.patch)
+   return -EINVAL;
+
+   ver.branch = 0;
+   ver.major = guc->submission_version.major;
+   ver.minor = guc->submission_version.minor;
+   ver.patch = guc->submission_version.patch;
+
+   if (copy_to_user(query_ptr, &ver, size))
+   return -EFAULT;
+
+   return 0;
+}
+
 static int (* const i915_query_funcs[])(struct drm_i915_private *dev_priv,
struct drm_i915_query_item *query_item) 
= {
query_topology_info,
@@ -559,6 +591,7 @@ static int (* const i915_query_funcs[])(struct 
drm_i915_private *dev_priv,
query_memregion_info,
query_hwconfig_blob,
query_geometry_subslices,
+   query_guc_submission_version,
 };
 
 int i915_query_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 550c496ce76d..84fb7f7ea834 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -3038,6 +3038,7 @@ struct drm_i915_query_item {
 *  - %DRM_I915_QUERY_MEMORY_REGIONS (see struct 
drm_i915_query_memory_regions)
 *  - %DRM_I915_QUERY_HWCONFIG_BLOB (see `GuC HWCONFIG blob uAPI`)
 *  - %DRM_I915_QUERY_GEOMETRY_SUBSLICES (see struct 
drm_i915_query_topology_info)
+*  - %DRM_I915_QUERY_GUC_SUBMISSION_VERSION (see struct 
drm_i915_query_guc_submission_version)
 */
__u64 query_id;
 #define DRM_I915_QUERY_TOPOLOGY_INFO   1
@@ -3046,6 +3047,7 @@ struct drm_i915_query_item {
 #define DRM_I915_QUERY_MEMORY_REGIONS  4
 #define DRM_I915_QUERY_HWCONFIG_BLOB   5
 #define DRM_I915_QUERY_GEOMETRY_SUBSLICES  6
+#define DRM_I915_QUERY_GUC_SUBMISSION_VERSION  7
 /* Must be kept compact -- no holes and well documented */
 
/**
@@ -3591,6 +3593,16 @@ struct drm_i915_query_memory_regions {
struct drm_i915_memory_region_info regions[];
 };
 
+/**
+* struct drm_i915_query_guc_submission_version - query GuC submission 
interface version
+*/
+struct drm_i915_query_guc_submission_version {
+   __u32 branch;
+   __u32 major;
+   __u32 minor;
+   __u32 patch;
+};
+
 /**
  * DOC: GuC HWCONFIG blob uAPI
  *
-- 
2.40.1
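
As an aside for reviewers, the must-be-zero contract in the handler above can be modelled in plain user-space C. This is a sketch under assumptions: struct guc_ver and model_query() are hypothetical names, the user copy helpers are replaced by direct pointer access, and the copy_query_item() length handling is left out entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Same four fields as the uapi struct in the patch. */
struct guc_ver {
	uint32_t branch;
	uint32_t major;
	uint32_t minor;
	uint32_t patch;
};

/*
 * Model of the handler body: the caller must pass an all-zero struct,
 * otherwise the request is rejected and left untouched. On success the
 * struct is filled from the firmware version with branch forced to 0.
 */
static int model_query(struct guc_ver *user, const struct guc_ver *fw)
{
	assert(user && fw);

	if (user->branch || user->major || user->minor || user->patch)
		return -EINVAL;

	user->branch = 0;
	user->major = fw->major;
	user->minor = fw->minor;
	user->patch = fw->patch;

	return 0;
}
```

Rejecting non-zero input keeps the fields free to carry meaning in a later revision without breaking callers that already zero them.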



Re: [RFC] drm/i915: Add GuC submission interface version query

2024-02-07 Thread Tvrtko Ursulin



On 07/02/2024 18:12, John Harrison wrote:

On 2/7/2024 03:56, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

Add a new query for the GuC submission interface version.

Mesa intends to use this information to check for old firmware versions
with a known bug where using the render and compute command streamers
simultaneously can cause GPU hangs due to issues in firmware scheduling.

Based on patches from Vivaik and Joonas.

There is a little bit of an open around the width required for versions.
While the GuC FW iface says they are u8, the i915 GuC code uses u32:

  #define CSS_SW_VERSION_UC_MAJOR   (0xFF << 16)
  #define CSS_SW_VERSION_UC_MINOR   (0xFF << 8)
  #define CSS_SW_VERSION_UC_PATCH   (0xFF << 0)
...
  struct intel_uc_fw_ver {
  u32 major;
  u32 minor;
  u32 patch;
  u32 build;
  };
This is copied from generic code which supports firmwares other than 
GuC. Only GuC promises to use 8-bit version components. Other firmwares 
very definitely do not. There is no open.


Ack.



So we could make the query u8, and refactor the struct intel_uc_fw_ver
to use u8, or not. To avoid any doubts about why we are assigning u32 to
u8, I simply opted to use u64, which also avoids the need to add any
padding.

I don't follow how potential 8 vs 32 confusion means jump to 64?!


Suggestion was to use u8 in the uapi in order to align with GuC FW ABI (or 
however it's called), in which case there would be:

   ver.major = guc->submission_version.major;

which would be:

   (u8) = (u32)

And I was anticipating someone not liking that either. Using too wide u64 
simply avoids the need to add a padding element to the uapi struct.

If you are positive we need to include a branch number, even though it does not 
seem to be implemented in the code even(*) then I can make uapi 4x u32 and 
achieve the same.

(*)
static void uc_unpack_css_version(struct intel_uc_fw_ver *ver, u32 css_value)
{
/* Get version numbers from the CSS header */
ver->major = FIELD_GET(CSS_SW_VERSION_UC_MAJOR, css_value);
ver->minor = FIELD_GET(CSS_SW_VERSION_UC_MINOR, css_value);
ver->patch = FIELD_GET(CSS_SW_VERSION_UC_PATCH, css_value);
}

No branch field in the CSS header?

And why is the UMD supposed to reject a non-zero branch? Like, how would 1.1.3.0 be 
fine and 1.1.3.1 be bad? I don't get it. But anyway, I can respin if you 
definitely confirm.

Regards,

Tvrtko



Compile tested only.

Signed-off-by: Tvrtko Ursulin 
Cc: Kenneth Graunke 
Cc: Jose Souza 
Cc: Sagar Ghuge 
Cc: Paulo Zanoni 
Cc: John Harrison 
Cc: Rodrigo Vivi 
Cc: Jani Nikula 
Cc: Tvrtko Ursulin 
Cc: Vivaik Balasubrawmanian 
---
  drivers/gpu/drm/i915/i915_query.c | 32 +++
  include/uapi/drm/i915_drm.h   | 11 +++
  2 files changed, 43 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_query.c 
b/drivers/gpu/drm/i915/i915_query.c

index 00871ef99792..999687f6a3d4 100644
--- a/drivers/gpu/drm/i915/i915_query.c
+++ b/drivers/gpu/drm/i915/i915_query.c
@@ -551,6 +551,37 @@ static int query_hwconfig_blob(struct 
drm_i915_private *i915,

  return hwconfig->size;
  }
+static int
+query_guc_submission_version(struct drm_i915_private *i915,
+ struct drm_i915_query_item *query)
+{
+    struct drm_i915_query_guc_submission_version __user *query_ptr =
+    u64_to_user_ptr(query->data_ptr);
+    struct drm_i915_query_guc_submission_version ver;
+    struct intel_guc *guc = &to_gt(i915)->uc.guc;
+    const size_t size = sizeof(ver);
+    int ret;
+
+    if (!intel_uc_uses_guc_submission(&to_gt(i915)->uc))
+    return -ENODEV;
+
+    ret = copy_query_item(&ver, size, size, query);
+    if (ret != 0)
+    return ret;
+
+    if (ver.major || ver.minor || ver.patch)
+    return -EINVAL;
+
+    ver.major = guc->submission_version.major;
+    ver.minor = guc->submission_version.minor;
+    ver.patch = guc->submission_version.patch;
This needs to include the branch version (currently set to zero) in the 
definition. And the UMD needs to barf if branch comes back as non-zero. 
I.e. there is no guarantee that a branched version will have the w/a + 
fix that they are wanting.


John.



+
+    if (copy_to_user(query_ptr, &ver, size))
+    return -EFAULT;
+
+    return 0;
+}
+
  static int (* const i915_query_funcs[])(struct drm_i915_private 
*dev_priv,

  struct drm_i915_query_item *query_item) = {
  query_topology_info,
@@ -559,6 +590,7 @@ static int (* const i915_query_funcs[])(struct 
drm_i915_private *dev_priv,

  query_memregion_info,
  query_hwconfig_blob,
  query_geometry_subslices,
+    query_guc_submission_version,
  };
  int i915_query_ioctl(struct drm_device *dev, void *data, struct 
drm_file *file)

diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 550c496ce76d..d80d9

[RFC] drm/i915: Add GuC submission interface version query

2024-02-07 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

Add a new query for the GuC submission interface version.

Mesa intends to use this information to check for old firmware versions
with a known bug where using the render and compute command streamers
simultaneously can cause GPU hangs due to issues in firmware scheduling.

Based on patches from Vivaik and Joonas.

There is a little bit of an open around the width required for versions.
While the GuC FW iface says they are u8, the i915 GuC code uses u32:

 #define CSS_SW_VERSION_UC_MAJOR   (0xFF << 16)
 #define CSS_SW_VERSION_UC_MINOR   (0xFF << 8)
 #define CSS_SW_VERSION_UC_PATCH   (0xFF << 0)
...
 struct intel_uc_fw_ver {
 u32 major;
 u32 minor;
 u32 patch;
 u32 build;
 };

So we could make the query u8, and refactor the struct intel_uc_fw_ver
to use u8, or not. To avoid any doubts about why we are assigning u32 to
u8, I simply opted to use u64, which also avoids the need to add any
padding.

Compile tested only.

Signed-off-by: Tvrtko Ursulin 
Cc: Kenneth Graunke 
Cc: Jose Souza 
Cc: Sagar Ghuge 
Cc: Paulo Zanoni 
Cc: John Harrison 
Cc: Rodrigo Vivi 
Cc: Jani Nikula 
Cc: Tvrtko Ursulin 
Cc: Vivaik Balasubrawmanian 
---
 drivers/gpu/drm/i915/i915_query.c | 32 +++
 include/uapi/drm/i915_drm.h   | 11 +++
 2 files changed, 43 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_query.c 
b/drivers/gpu/drm/i915/i915_query.c
index 00871ef99792..999687f6a3d4 100644
--- a/drivers/gpu/drm/i915/i915_query.c
+++ b/drivers/gpu/drm/i915/i915_query.c
@@ -551,6 +551,37 @@ static int query_hwconfig_blob(struct drm_i915_private 
*i915,
return hwconfig->size;
 }
 
+static int
+query_guc_submission_version(struct drm_i915_private *i915,
+struct drm_i915_query_item *query)
+{
+   struct drm_i915_query_guc_submission_version __user *query_ptr =
+   u64_to_user_ptr(query->data_ptr);
+   struct drm_i915_query_guc_submission_version ver;
+   struct intel_guc *guc = &to_gt(i915)->uc.guc;
+   const size_t size = sizeof(ver);
+   int ret;
+
+   if (!intel_uc_uses_guc_submission(&to_gt(i915)->uc))
+   return -ENODEV;
+
+   ret = copy_query_item(&ver, size, size, query);
+   if (ret != 0)
+   return ret;
+
+   if (ver.major || ver.minor || ver.patch)
+   return -EINVAL;
+
+   ver.major = guc->submission_version.major;
+   ver.minor = guc->submission_version.minor;
+   ver.patch = guc->submission_version.patch;
+
+   if (copy_to_user(query_ptr, &ver, size))
+   return -EFAULT;
+
+   return 0;
+}
+
 static int (* const i915_query_funcs[])(struct drm_i915_private *dev_priv,
struct drm_i915_query_item *query_item) 
= {
query_topology_info,
@@ -559,6 +590,7 @@ static int (* const i915_query_funcs[])(struct 
drm_i915_private *dev_priv,
query_memregion_info,
query_hwconfig_blob,
query_geometry_subslices,
+   query_guc_submission_version,
 };
 
 int i915_query_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 550c496ce76d..d80d9b5e1eda 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -3038,6 +3038,7 @@ struct drm_i915_query_item {
 *  - %DRM_I915_QUERY_MEMORY_REGIONS (see struct 
drm_i915_query_memory_regions)
 *  - %DRM_I915_QUERY_HWCONFIG_BLOB (see `GuC HWCONFIG blob uAPI`)
 *  - %DRM_I915_QUERY_GEOMETRY_SUBSLICES (see struct 
drm_i915_query_topology_info)
+*  - %DRM_I915_QUERY_GUC_SUBMISSION_VERSION (see struct 
drm_i915_query_guc_submission_version)
 */
__u64 query_id;
 #define DRM_I915_QUERY_TOPOLOGY_INFO   1
@@ -3046,6 +3047,7 @@ struct drm_i915_query_item {
 #define DRM_I915_QUERY_MEMORY_REGIONS  4
 #define DRM_I915_QUERY_HWCONFIG_BLOB   5
 #define DRM_I915_QUERY_GEOMETRY_SUBSLICES  6
+#define DRM_I915_QUERY_GUC_SUBMISSION_VERSION  7
 /* Must be kept compact -- no holes and well documented */
 
/**
@@ -3591,6 +3593,15 @@ struct drm_i915_query_memory_regions {
struct drm_i915_memory_region_info regions[];
 };
 
+/**
+* struct drm_i915_query_guc_submission_version - query GuC submission 
interface version
+*/
+struct drm_i915_query_guc_submission_version {
+   __u64 major;
+   __u64 minor;
+   __u64 patch;
+};
+
 /**
  * DOC: GuC HWCONFIG blob uAPI
  *
-- 
2.40.1



Re: [PATCH] drm/i915/gt: Prevent possible NULL dereference in __caps_show()

2024-02-07 Thread Tvrtko Ursulin



Hi,

On 06/02/2024 16:45, Nikita Zhandarovich wrote:

After falling through the switch statement to default case 'repr' is
initialized with NULL, which will lead to incorrect dereference of
'!repr[n]' in the following loop.

Fix it with the help of an additional check for NULL.

Found by Linux Verification Center (linuxtesting.org) with static
analysis tool SVACE.

Fixes: 4ec76dbeb62b ("drm/i915/gt: Expose engine properties via sysfs")
Signed-off-by: Nikita Zhandarovich 
---
P.S. The NULL-deref problem might be dealt with this way but I am
not certain that the rest of the __caps_show() behaviour remains
correct if we end up in default case. For instance, as far as I
can tell, buf might turn out to be w/o '\0'. I could use some
direction if this has to be addressed as well.

  drivers/gpu/drm/i915/gt/sysfs_engines.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/sysfs_engines.c 
b/drivers/gpu/drm/i915/gt/sysfs_engines.c
index 021f51d9b456..6b130b732867 100644
--- a/drivers/gpu/drm/i915/gt/sysfs_engines.c
+++ b/drivers/gpu/drm/i915/gt/sysfs_engines.c
@@ -105,7 +105,7 @@ __caps_show(struct intel_engine_cs *engine,
  
  	len = 0;

for_each_set_bit(n, &caps, show_unknown ? BITS_PER_LONG : count) {
-   if (n >= count || !repr[n]) {
+   if (n >= count || !repr || !repr[n]) {


There are two input combinations to this function when repr is NULL.

First is show_unknown=true and caps=0, which means the for_each_set_bit 
will not execute its body. (No bits set.)


Second is show_unknown=false and caps=~0, which means count is zero so 
for_each_set_bit will again not run. (Bitfield size input param is zero.)


So unless I am missing something I do not see the null pointer dereference.
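
The two cases can be checked with a small user-space model of the loop. This is a sketch, not the kernel macro: count_body_runs() is a hypothetical function that counts how often a for_each_set_bit()-style body would execute for a given bitmask and size, and BITS_PER_LONG is redefined locally.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

/*
 * Counts how many times the body of a for_each_set_bit(n, &caps, size)
 * style loop would run: once per set bit below 'size'.
 */
static unsigned int count_body_runs(unsigned long caps, unsigned int size)
{
	unsigned int n, runs = 0;

	assert(size <= BITS_PER_LONG);

	for (n = 0; n < size; n++)
		if (caps & (1UL << n))
			runs++;

	return runs;
}
```

With caps == 0 and size == BITS_PER_LONG (the show_unknown=true case) the count is zero, and with caps == ~0 and size == 0 (the show_unknown=false case) it is zero as well, so the body containing the repr[n] access is never reached in either case.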

What could theoretically happen is that a third input combination 
appears, where caps is not zero in the show_unknown=true case, either 
via a fully un-handled engine->class (switch), or a new capability bit 
not added to the static array a bit above.


That would assert during driver development here:

if (GEM_WARN_ON(show_unknown))

Granted, that could be after the dereference in "if (n >= count || 
!repr[n])", but it would be caught in debug builds (CI) and therefore not 
be able to "ship" (get merged into the repo).


Your second question is about empty buffer returned i.e. len=0 at the 
end of the function? (Which is when the buffer will not be null 
terminated - or you see another option?)


That I think is safe too since it just results in a zero length read in 
sysfs.


Regards,

Tvrtko


if (GEM_WARN_ON(show_unknown))
len += sysfs_emit_at(buf, len, "[%x] ", n);
} else {


Re: [RFC PATCH] drm/i915: Add GETPARAM for GuC submission version

2024-02-07 Thread Tvrtko Ursulin



On 06/02/2024 20:51, Souza, Jose wrote:

On Tue, 2024-02-06 at 12:42 -0800, John Harrison wrote:

On 2/6/2024 08:33, Tvrtko Ursulin wrote:

On 01/02/2024 18:25, Souza, Jose wrote:

On Wed, 2024-01-24 at 08:55 +, Tvrtko Ursulin wrote:

On 24/01/2024 08:19, Joonas Lahtinen wrote:

Add reporting of the GuC submission/VF interface version via GETPARAM
properties. Mesa intends to use this information to check for old
firmware versions with known bugs before enabling features like async
compute.


There was
https://patchwork.freedesktop.org/patch/560704/?series=124592&rev=1
which does everything in one go so would be my preference.


IMO Joonas' version brings less maintenance burden (no new struct).
But both versions work, so please just reach some agreement so we
can move this forward.


So I would really prefer the query. A simplified version would look like
this (compile tested only):

Vivaik's patch is definitely preferred. It is much cleaner to make one
single call than having to make four separate calls. It is also
extensible to other firmwares if required. The only blockage against it
was whether it was a good thing to report at all. If that blockage is no
longer valid then we should just merge the patch that has already been
discussed, polished, fixed, etc. rather than starting the whole process
from scratch.


Agreed.

Vivaik can you please rebase and send it again?


Note there was review feedback not addressed so do that too please. 
AFAIR incorrect usage of copy item, pad/rsvd/mbz checking and questions 
about padding in general. Last is why I proposed a simplified version 
which is not future extensible and avoids the need for padding.


Regards,

Tvrtko






And note that it is four calls not three. The code below is missing the
branch version number.

John.



diff --git a/drivers/gpu/drm/i915/i915_query.c
b/drivers/gpu/drm/i915/i915_query.c
index 00871ef99792..999687f6a3d4 100644
--- a/drivers/gpu/drm/i915/i915_query.c
+++ b/drivers/gpu/drm/i915/i915_query.c
@@ -551,6 +551,37 @@ static int query_hwconfig_blob(struct
drm_i915_private *i915,
     return hwconfig->size;
  }

+static int
+query_guc_submission_version(struct drm_i915_private *i915,
+    struct drm_i915_query_item *query)
+{
+   struct drm_i915_query_guc_submission_version __user *query_ptr =
+ u64_to_user_ptr(query->data_ptr);
+   struct drm_i915_query_guc_submission_version ver;
+   struct intel_guc *guc = &to_gt(i915)->uc.guc;
+   const size_t size = sizeof(ver);
+   int ret;
+
+   if (!intel_uc_uses_guc_submission(&to_gt(i915)->uc))
+   return -ENODEV;
+
+   ret = copy_query_item(&ver, size, size, query);
+   if (ret != 0)
+   return ret;
+
+   if (ver.major || ver.minor || ver.patch)
+   return -EINVAL;
+
+   ver.major = guc->submission_version.major;
+   ver.minor = guc->submission_version.minor;
+   ver.patch = guc->submission_version.patch;
+
+   if (copy_to_user(query_ptr, &ver, size))
+   return -EFAULT;
+
+   return 0;
+}
+
  static int (* const i915_query_funcs[])(struct drm_i915_private
*dev_priv,
     struct drm_i915_query_item
*query_item) = {
     query_topology_info,
@@ -559,6 +590,7 @@ static int (* const i915_query_funcs[])(struct
drm_i915_private *dev_priv,
     query_memregion_info,
     query_hwconfig_blob,
     query_geometry_subslices,
+   query_guc_submission_version,
  };

  int i915_query_ioctl(struct drm_device *dev, void *data, struct
drm_file *file)
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 550c496ce76d..d80d9b5e1eda 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -3038,6 +3038,7 @@ struct drm_i915_query_item {
  *  - %DRM_I915_QUERY_MEMORY_REGIONS (see struct
drm_i915_query_memory_regions)
  *  - %DRM_I915_QUERY_HWCONFIG_BLOB (see `GuC HWCONFIG blob
uAPI`)
  *  - %DRM_I915_QUERY_GEOMETRY_SUBSLICES (see struct
drm_i915_query_topology_info)
+    *  - %DRM_I915_QUERY_GUC_SUBMISSION_VERSION (see struct
drm_i915_query_guc_submission_version)
  */
     __u64 query_id;
  #define DRM_I915_QUERY_TOPOLOGY_INFO   1
@@ -3046,6 +3047,7 @@ struct drm_i915_query_item {
  #define DRM_I915_QUERY_MEMORY_REGIONS  4
  #define DRM_I915_QUERY_HWCONFIG_BLOB   5
  #define DRM_I915_QUERY_GEOMETRY_SUBSLICES  6
+#define DRM_I915_QUERY_GUC_SUBMISSION_VERSION  7
  /* Must be kept compact -- no holes and well documented */

     /**
@@ -3591,6 +3593,15 @@ struct drm_i915_query_memory_regions {
     struct drm_i915_memory_region_info regions[];
  };

+/**
+* struct drm_i915_query_guc_submission_version - query GuC submission
interface version
+*/
+struct drm_i915_query_guc_submission_version {
+   __u64 major;
+   __u64 minor;
+   __u64 patch

Re: [RFC PATCH] drm/i915: Add GETPARAM for GuC submission version

2024-02-06 Thread Tvrtko Ursulin



On 01/02/2024 18:25, Souza, Jose wrote:

On Wed, 2024-01-24 at 08:55 +, Tvrtko Ursulin wrote:

On 24/01/2024 08:19, Joonas Lahtinen wrote:

Add reporting of the GuC submission/VF interface version via GETPARAM
properties. Mesa intends to use this information to check for old
firmware versions with known bugs before enabling features like async
compute.


There was
https://patchwork.freedesktop.org/patch/560704/?series=124592&rev=1
which does everything in one go so would be my preference.


IMO Joonas' version brings less maintenance burden (no new struct).
But both versions work, so please just reach some agreement so we can move 
this forward.


So I would really prefer the query. A simplified version would look like this 
(compile tested only):

diff --git a/drivers/gpu/drm/i915/i915_query.c 
b/drivers/gpu/drm/i915/i915_query.c
index 00871ef99792..999687f6a3d4 100644
--- a/drivers/gpu/drm/i915/i915_query.c
+++ b/drivers/gpu/drm/i915/i915_query.c
@@ -551,6 +551,37 @@ static int query_hwconfig_blob(struct drm_i915_private 
*i915,
return hwconfig->size;
 }
 
+static int
+query_guc_submission_version(struct drm_i915_private *i915,
+struct drm_i915_query_item *query)
+{
+   struct drm_i915_query_guc_submission_version __user *query_ptr =
+   u64_to_user_ptr(query->data_ptr);
+   struct drm_i915_query_guc_submission_version ver;
+   struct intel_guc *guc = &to_gt(i915)->uc.guc;
+   const size_t size = sizeof(ver);
+   int ret;
+
+   if (!intel_uc_uses_guc_submission(&to_gt(i915)->uc))
+   return -ENODEV;
+
+   ret = copy_query_item(&ver, size, size, query);
+   if (ret != 0)
+   return ret;
+
+   if (ver.major || ver.minor || ver.patch)
+   return -EINVAL;
+
+   ver.major = guc->submission_version.major;
+   ver.minor = guc->submission_version.minor;
+   ver.patch = guc->submission_version.patch;
+
+   if (copy_to_user(query_ptr, &ver, size))
+   return -EFAULT;
+
+   return 0;
+}
+
 static int (* const i915_query_funcs[])(struct drm_i915_private *dev_priv,
struct drm_i915_query_item *query_item) 
= {
query_topology_info,
@@ -559,6 +590,7 @@ static int (* const i915_query_funcs[])(struct 
drm_i915_private *dev_priv,
query_memregion_info,
query_hwconfig_blob,
query_geometry_subslices,
+   query_guc_submission_version,
 };
 
 int i915_query_ioctl(struct drm_device *dev, void *data, struct drm_file *file)

diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 550c496ce76d..d80d9b5e1eda 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -3038,6 +3038,7 @@ struct drm_i915_query_item {
 *  - %DRM_I915_QUERY_MEMORY_REGIONS (see struct 
drm_i915_query_memory_regions)
 *  - %DRM_I915_QUERY_HWCONFIG_BLOB (see `GuC HWCONFIG blob uAPI`)
 *  - %DRM_I915_QUERY_GEOMETRY_SUBSLICES (see struct 
drm_i915_query_topology_info)
+*  - %DRM_I915_QUERY_GUC_SUBMISSION_VERSION (see struct 
drm_i915_query_guc_submission_version)
 */
__u64 query_id;
 #define DRM_I915_QUERY_TOPOLOGY_INFO   1
@@ -3046,6 +3047,7 @@ struct drm_i915_query_item {
 #define DRM_I915_QUERY_MEMORY_REGIONS  4
 #define DRM_I915_QUERY_HWCONFIG_BLOB   5
 #define DRM_I915_QUERY_GEOMETRY_SUBSLICES  6
+#define DRM_I915_QUERY_GUC_SUBMISSION_VERSION  7
 /* Must be kept compact -- no holes and well documented */
 
/**

@@ -3591,6 +3593,15 @@ struct drm_i915_query_memory_regions {
struct drm_i915_memory_region_info regions[];
 };
 
+/**

+* struct drm_i915_query_guc_submission_version - query GuC submission 
interface version
+*/
+struct drm_i915_query_guc_submission_version {
+   __u64 major;
+   __u64 minor;
+   __u64 patch;
+};
+
 /**
  * DOC: GuC HWCONFIG blob uAPI
  *

It is not that much bigger than the triple get param and IMO nicer.

But if there is no motivation to do it properly then feel free to proceed with 
this, I will not block it.

Regards,

Tvrtko

P.S.
Probably still make sure to remove the reference to SR-IOV.





During the time of that patch there was discussion whether firmware
version or submission version was better. I vaguely remember someone
raised an issue with the latter. Adding John in case he remembers.


Signed-off-by: Joonas Lahtinen 
Cc: Kenneth Graunke 
Cc: Jose Souza 
Cc: Sagar Ghuge 
Cc: Paulo Zanoni 
Cc: John Harrison 
Cc: Rodrigo Vivi 
Cc: Jani Nikula 
Cc: Tvrtko Ursulin 
---
   drivers/gpu/drm/i915/i915_getparam.c | 12 
   include/uapi/drm/i915_drm.h  | 13 +
   2 files changed, 25 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_getparam.c 
b/drivers/gpu/drm/i915/i915_getparam.c
index 5c3fec63cb4c1..f176372debc54 100644
--- a/drivers/gpu/drm/i915/i

Re: [PATCH 5/6] drm/i915: Update shared stats to use the new gem helper

2024-01-30 Thread Tvrtko Ursulin




On 30/01/2024 16:12, Alex Deucher wrote:

Switch to using the new gem shared memory stats helper
rather than hand rolling it.

Link: 
https://lore.kernel.org/all/20231207180225.439482-1-alexander.deuc...@amd.com/
Signed-off-by: Alex Deucher 
---
  drivers/gpu/drm/i915/i915_drm_client.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drm_client.c 
b/drivers/gpu/drm/i915/i915_drm_client.c
index fa6852713bee..f58682505491 100644
--- a/drivers/gpu/drm/i915/i915_drm_client.c
+++ b/drivers/gpu/drm/i915/i915_drm_client.c
@@ -53,7 +53,7 @@ obj_meminfo(struct drm_i915_gem_object *obj,
obj->mm.region->id : INTEL_REGION_SMEM;
const u64 sz = obj->base.size;
  
-	if (obj->base.handle_count > 1)

+   if (drm_gem_object_is_shared_for_memory_stats(&obj->base))
stats[id].shared += sz;
else
stats[id].private += sz;


Reviewed-by: Tvrtko Ursulin 

Good that you remembered this story, I completely forgot!

Regards,

Tvrtko


Re: [PATCH 2/6] drm: add drm_gem_object_is_shared_for_memory_stats() helper

2024-01-30 Thread Tvrtko Ursulin



On 30/01/2024 16:12, Alex Deucher wrote:

Add a helper so that drm drivers can consistently report
shared status via the fdinfo shared memory stats interface.

In addition to handle count, show buffers as shared if they
are shared via dma-buf as well (e.g., shared with v4l or some
other subsystem).

Link: 
https://lore.kernel.org/all/20231207180225.439482-1-alexander.deuc...@amd.com/
Signed-off-by: Alex Deucher 
---
  drivers/gpu/drm/drm_gem.c | 16 
  include/drm/drm_gem.h |  1 +
  2 files changed, 17 insertions(+)

diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 44a948b80ee1..71b5f628d828 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -1506,3 +1506,19 @@ int drm_gem_evict(struct drm_gem_object *obj)
return 0;
  }
  EXPORT_SYMBOL(drm_gem_evict);
+
+/**
+ * drm_gem_object_is_shared_for_memory_stats - helper for shared memory stats
+ *
+ * This helper should only be used for fdinfo shared memory stats to determine
+ * if a GEM object is shared.
+ *
+ * @obj: obj in question
+ */
+bool drm_gem_object_is_shared_for_memory_stats(struct drm_gem_object *obj)
+{
+   if ((obj->handle_count > 1) || obj->dma_buf)
+   return true;
+   return false;
+}
+EXPORT_SYMBOL(drm_gem_object_is_shared_for_memory_stats);
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
index 369505447acd..86a9c696f038 100644
--- a/include/drm/drm_gem.h
+++ b/include/drm/drm_gem.h
@@ -552,6 +552,7 @@ unsigned long drm_gem_lru_scan(struct drm_gem_lru *lru,
   bool (*shrink)(struct drm_gem_object *obj));
  
  int drm_gem_evict(struct drm_gem_object *obj);

+bool drm_gem_object_is_shared_for_memory_stats(struct drm_gem_object *obj);
  
  #ifdef CONFIG_LOCKDEP

  /**


Not sure what the local view on static inlines is, but fine nevertheless.

Reviewed-by: Tvrtko Ursulin 

Regards,

Tvrtko
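
For readers following the thread, the accounting rule the new helper introduces can be sketched outside the kernel. This is only an illustration under stated assumptions: `struct mock_gem_object`, `struct mock_mem_stats`, `mock_is_shared_for_memory_stats()` and `mock_account()` are hypothetical stand-ins, not DRM API, and only the two fields the quoted helper reads are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct drm_gem_object (assumption, not DRM API). */
struct mock_gem_object {
	unsigned int handle_count; /* number of userspace handles */
	void *dma_buf;             /* non-NULL once exported via dma-buf */
	size_t size;               /* object size in bytes */
};

/* Hypothetical per-client totals, mirroring the shared/private split. */
struct mock_mem_stats {
	size_t shared_sz;
	size_t private_sz;
};

/* Same condition as the quoted helper: extra handles or a dma-buf export. */
static bool mock_is_shared_for_memory_stats(const struct mock_gem_object *obj)
{
	return obj->handle_count > 1 || obj->dma_buf != NULL;
}

/* Add the object's size to exactly one of the two buckets. */
static void mock_account(struct mock_mem_stats *stats,
			 const struct mock_gem_object *obj)
{
	assert(stats && obj);

	if (mock_is_shared_for_memory_stats(obj))
		stats->shared_sz += obj->size;
	else
		stats->private_sz += obj->size;
}
```

The point of the sketch is that an object with a single handle still counts as shared once it has a dma-buf attached, which the plain handle count check would have missed.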


Re: [PATCH 3/6] drm: update drm_show_memory_stats() for dma-bufs

2024-01-30 Thread Tvrtko Ursulin




On 30/01/2024 16:12, Alex Deucher wrote:

Show buffers as shared if they are shared via dma-buf as well
(e.g., shared with v4l or some other subsystem).

v2: switch to gem helper

Link: 
https://lore.kernel.org/all/20231207180225.439482-1-alexander.deuc...@amd.com/
Reviewed-by: Rob Clark  (v1)
Signed-off-by: Alex Deucher 
Cc: Rob Clark 
---
  drivers/gpu/drm/drm_file.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
index 8c87287c3e16..638ffaf5 100644
--- a/drivers/gpu/drm/drm_file.c
+++ b/drivers/gpu/drm/drm_file.c
@@ -913,7 +913,7 @@ void drm_show_memory_stats(struct drm_printer *p, struct 
drm_file *file)
DRM_GEM_OBJECT_PURGEABLE;
}
  
-		if (obj->handle_count > 1) {

+   if (drm_gem_object_is_shared_for_memory_stats(obj)) {
status.shared += obj->size;
} else {
status.private += obj->size;


Reviewed-by: Tvrtko Ursulin 

Regards,

Tvrtko


Re: [PATCH 1/6] Documentation/gpu: Update documentation on drm-shared-*

2024-01-30 Thread Tvrtko Ursulin



On 30/01/2024 16:12, Alex Deucher wrote:

Clarify the documentation in preparation for updated
helpers which check the handle count as well as whether
a dma-buf has been attached.

Link: 
https://lore.kernel.org/all/20231207180225.439482-1-alexander.deuc...@amd.com/
Signed-off-by: Alex Deucher 
---
  Documentation/gpu/drm-usage-stats.rst | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/gpu/drm-usage-stats.rst 
b/Documentation/gpu/drm-usage-stats.rst
index 7aca5c7a7b1d..6dc299343b48 100644
--- a/Documentation/gpu/drm-usage-stats.rst
+++ b/Documentation/gpu/drm-usage-stats.rst
@@ -138,7 +138,7 @@ indicating kibi- or mebi-bytes.
  
  - drm-shared-:  [KiB|MiB]
  
-The total size of buffers that are shared with another file (ie. have more

+The total size of buffers that are shared with another file (e.g., have more
  than a single handle).
  
  - drm-total-:  [KiB|MiB]


Reviewed-by: Tvrtko Ursulin 

Regards,

Tvrtko


Re: [RFC PATCH] drm/i915: Add GETPARAM for GuC submission version

2024-01-24 Thread Tvrtko Ursulin



On 24/01/2024 08:19, Joonas Lahtinen wrote:

Add reporting of the GuC submission/VF interface version via GETPARAM
properties. Mesa intends to use this information to check for old
firmware versions with known bugs before enabling features like async
compute.


There was 
https://patchwork.freedesktop.org/patch/560704/?series=124592&rev=1 
which does everything in one go so would be my preference.


During the time of that patch there was discussion whether firmware 
version or submission version was better. I vaguely remember someone 
raised an issue with the latter. Adding John in case he remembers.



Signed-off-by: Joonas Lahtinen 
Cc: Kenneth Graunke 
Cc: Jose Souza 
Cc: Sagar Ghuge 
Cc: Paulo Zanoni 
Cc: John Harrison 
Cc: Rodrigo Vivi 
Cc: Jani Nikula 
Cc: Tvrtko Ursulin 
---
  drivers/gpu/drm/i915/i915_getparam.c | 12 
  include/uapi/drm/i915_drm.h  | 13 +
  2 files changed, 25 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_getparam.c 
b/drivers/gpu/drm/i915/i915_getparam.c
index 5c3fec63cb4c1..f176372debc54 100644
--- a/drivers/gpu/drm/i915/i915_getparam.c
+++ b/drivers/gpu/drm/i915/i915_getparam.c
@@ -113,6 +113,18 @@ int i915_getparam_ioctl(struct drm_device *dev, void *data,
if (value < 0)
return value;
break;
+   case I915_PARAM_GUC_SUBMISSION_VERSION_MAJOR:
+   case I915_PARAM_GUC_SUBMISSION_VERSION_MINOR:
+   case I915_PARAM_GUC_SUBMISSION_VERSION_PATCH:
+   if (!intel_uc_uses_guc_submission(&to_gt(i915)->uc))
+   return -ENODEV;
+   if (param->param == I915_PARAM_GUC_SUBMISSION_VERSION_MAJOR)
+   value = to_gt(i915)->uc.guc.submission_version.major;
+   else if (param->param == 
I915_PARAM_GUC_SUBMISSION_VERSION_MINOR)
+   value = to_gt(i915)->uc.guc.submission_version.minor;
+   else
+   value = to_gt(i915)->uc.guc.submission_version.patch;
+   break;
case I915_PARAM_MMAP_GTT_VERSION:
/* Though we've started our numbering from 1, and so class all
 * earlier versions as 0, in effect their value is undefined as
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index fd4f9574d177a..7d5a47f182542 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -806,6 +806,19 @@ typedef struct drm_i915_irq_wait {
   */
  #define I915_PARAM_PXP_STATUS  58
  
+/*

+ * Query for the GuC submission/VF interface version number


What is this VF you speak of? :/

Regards,

Tvrtko


+ *
+ * -ENODEV is returned if GuC submission is not used
+ *
+ * On success, returns the respective GuC submission/VF interface major,
+ * minor or patch version as per the requested parameter.
+ *
+ */
+#define I915_PARAM_GUC_SUBMISSION_VERSION_MAJOR 59
+#define I915_PARAM_GUC_SUBMISSION_VERSION_MINOR 60
+#define I915_PARAM_GUC_SUBMISSION_VERSION_PATCH 61
+
  /* Must be kept compact -- no holes and well documented */
  
  /**
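
As a rough illustration of the dispatch in the quoted hunk, the sketch below maps the three parameter numbers to version fields in plain userspace C. The parameter values come from the quoted uAPI change; everything else (`struct mock_guc_version`, `mock_guc_version_getparam()`, and the `-EINVAL` fallback for unknown parameters) is an assumption made for the example and not i915 code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Parameter numbers as proposed in the quoted uAPI hunk. */
#define I915_PARAM_GUC_SUBMISSION_VERSION_MAJOR 59
#define I915_PARAM_GUC_SUBMISSION_VERSION_MINOR 60
#define I915_PARAM_GUC_SUBMISSION_VERSION_PATCH 61

/* Hypothetical container for the submission version (assumption). */
struct mock_guc_version {
	int major_ver;
	int minor_ver;
	int patch_ver;
};

/*
 * Returns 0 and stores the requested field in *value, -ENODEV when GuC
 * submission is not in use, or -EINVAL for a parameter this sketch does
 * not know (the fallback is an assumption of the example).
 */
static int mock_guc_version_getparam(bool uses_guc_submission,
				     const struct mock_guc_version *ver,
				     int param, int *value)
{
	assert(ver && value);

	if (!uses_guc_submission)
		return -ENODEV;

	switch (param) {
	case I915_PARAM_GUC_SUBMISSION_VERSION_MAJOR:
		*value = ver->major_ver;
		return 0;
	case I915_PARAM_GUC_SUBMISSION_VERSION_MINOR:
		*value = ver->minor_ver;
		return 0;
	case I915_PARAM_GUC_SUBMISSION_VERSION_PATCH:
		*value = ver->patch_ver;
		return 0;
	default:
		return -EINVAL;
	}
}
```

A caller that wants the full version would issue three queries, one per parameter, and treat `-ENODEV` as "GuC submission not used".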


[PATCH i-g-t] tools/intel_gpu_top: Fix near full percentage bar formatting

2024-01-18 Thread Tvrtko Ursulin
From: Tvrtko Ursulin 

Fix a bug where 1) the end vertical separator element would not be printed
if the progress bar portion was all filled by the progress bar characters
(no trailing spaces), and 2) the numerical overlay would be skipped too.

The bug would also shift the layout of following UI elements since the
progress bar would not be consuming all the allocated horizontal space.

Signed-off-by: Tvrtko Ursulin 
Reported-by: anonymoustranquill...@proton.me
---
 tools/intel_gpu_top.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/tools/intel_gpu_top.c b/tools/intel_gpu_top.c
index 046ead15a122..5b4f94d7de7a 100644
--- a/tools/intel_gpu_top.c
+++ b/tools/intel_gpu_top.c
@@ -1015,9 +1015,8 @@ print_percentage_bar(double percent, double max, int 
max_len, bool numeric)
printf("%s", bars[i]);
 
len -= (bar_len + (w - 1)) / w;
-   if (len < 1)
-   return;
-   n_spaces(len);
+   if (len >= 1)
+   n_spaces(len);
 
putchar('|');
 
-- 
2.40.1
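
The shape of the fix can be shown with a much simplified renderer. This is a sketch, not the intel_gpu_top code: `render_bar()` is a hypothetical helper that writes into a buffer, uses one ASCII `#` per cell instead of the tool's partial-block glyphs, and has no numeric overlay. What it keeps is the corrected control flow: padding is emitted only when there is room for it, and the closing separator is emitted unconditionally.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Render "|####    |" into out and return the number of characters written.
 * max_len is the total width including both separators.
 */
static size_t render_bar(char *out, size_t out_sz, double percent,
			 double max, int max_len)
{
	int len = max_len - 2; /* cells between the two separators */
	int filled, pad;
	size_t n = 0;

	assert(max > 0 && len >= 0 && out_sz >= (size_t)max_len + 1);

	if (percent < 0)
		percent = 0;
	if (percent > max)
		percent = max;

	filled = (int)(percent / max * len + 0.5);
	if (filled > len)
		filled = len;

	out[n++] = '|';
	memset(out + n, '#', (size_t)filled);
	n += (size_t)filled;

	/* Pad only if there is space left; never return early here. */
	pad = len - filled;
	if (pad >= 1) {
		memset(out + n, ' ', (size_t)pad);
		n += (size_t)pad;
	}

	/* The closing separator is always printed, even for a full bar. */
	out[n++] = '|';
	out[n] = '\0';

	return n;
}
```

With the early return of the old code, a completely filled bar would come out one column short and everything printed after it would shift left; here the returned width is the same for every percentage.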



Re: [PATCH v2 1/3] drm/i915/gt: Support fixed CCS mode

2024-01-09 Thread Tvrtko Ursulin



On 08/01/2024 15:13, Joonas Lahtinen wrote:

Quoting Tvrtko Ursulin (2024-01-05 12:39:31)


On 04/01/2024 21:23, Andi Shyti wrote:





+void intel_gt_apply_ccs_mode(struct intel_gt *gt)
+{
+   mutex_lock(&gt->ccs.mutex);
+   __intel_gt_apply_ccs_mode(gt);
+   mutex_unlock(&gt->ccs.mutex);
+}
+
+void intel_gt_init_ccs_mode(struct intel_gt *gt)
+{
+   mutex_init(&gt->ccs.mutex);
+   gt->ccs.mode = 1;


What is '1'? And this question carries over to the sysfs interface in the
following patch - who will use it and where it is documented how to use it?


The value '1' is explained in the comment above[1] and in the


Do you mean this is mode '1':

   * With 1 engine (ccs0):
   *   slice 0, 1, 2, 3: ccs0

?

But I don't see where it says what do different modes mean on different
SKU configurations.

It also does not say what should the num_slices sysfs file be used for.

Does "mode N" mean "assign each command streamer N compute slices"? Or
"assign each compute slice N command streamers"?

I wonder if we should add something user friendly into
Documentation/ABI/*/sysfs-... Joonas your thoughts?


We definitely should always properly document all sysfs additions, just
seems like we less frequently remember to do so. So yeah, this should be
documented just like other uAPI.

I also like the idea of not exposing the file at all if the value
can't be modified.

The ccs_mode is just supposed to allow user to select how many CCS
engines they want to expose, and always make an even split of slices
between them, nothing more nothing less.


Hmm I can't see that the series changes anywhere what command streamers 
will get reported as available.


Regards,

Tvrtko




Re: [PATCH v2 1/3] drm/i915/gt: Support fixed CCS mode

2024-01-05 Thread Tvrtko Ursulin



On 04/01/2024 21:23, Andi Shyti wrote:

Hi Tvrtko,

[1]


+   /*
+* Loop over all available slices and assign each a user engine.
+*
+* With 1 engine (ccs0):
+*   slice 0, 1, 2, 3: ccs0
+*
+* With 2 engines (ccs0, ccs1):
+*   slice 0, 2: ccs0
+*   slice 1, 3: ccs1
+*
+* With 4 engines (ccs0, ccs1, ccs2, ccs3):
+*   slice 0: ccs0
+*   slice 1: ccs1
+*   slice 2: ccs2
+*   slice 3: ccs3
+*
+* Since the number of slices and the number of engines is
+* known, and we ensure that there is an exact multiple of
+* engines for slices, the double loop becomes a loop over each
+* slice.
+*/
+   for (i = num_slices / num_engines; i < num_slices; i++) {
+   struct intel_engine_cs *engine;
+   intel_engine_mask_t tmp;
+
+   for_each_engine_masked(engine, gt, ALL_CCS(gt), tmp) {
+   /* If a slice is fused off, leave disabled */
+   while (!(CCS_MASK(gt) & BIT(slice)))
+   slice++;
+
+   mode &= ~XEHP_CCS_MODE_CSLICE(slice, 
XEHP_CCS_MODE_CSLICE_MASK);
+   mode |= XEHP_CCS_MODE_CSLICE(slice, engine->instance);
+
+   /* assign the next slice */
+   slice++;
+   }
+   }
+
+   intel_uncore_write(gt->uncore, XEHP_CCS_MODE, mode);
+}
+
+void intel_gt_apply_ccs_mode(struct intel_gt *gt)
+{
+   mutex_lock(&gt->ccs.mutex);
+   __intel_gt_apply_ccs_mode(gt);
+   mutex_unlock(&gt->ccs.mutex);
+}
+
+void intel_gt_init_ccs_mode(struct intel_gt *gt)
+{
+   mutex_init(&gt->ccs.mutex);
+   gt->ccs.mode = 1;


What is '1'? And this question carries over to the sysfs interface in the
following patch - who will use it and where it is documented how to use it?


The value '1' is explained in the comment above[1] and in the


Do you mean this is mode '1':

 * With 1 engine (ccs0):
 *   slice 0, 1, 2, 3: ccs0

?

But I don't see where it says what do different modes mean on different 
SKU configurations.


It also does not say what should the num_slices sysfs file be used for.

Does "mode N" mean "assign each command streamer N compute slices"? Or 
"assign each compute slice N command streamers"?


I wonder if we should add something user friendly into 
Documentation/ABI/*/sysfs-... Joonas your thoughts?



comment below[2]. Maybe we should give it an enum meaning? But
that would be something like CCS_MODE_1/2/4, I think
ccs.mode = 1/2/4 is more understandable.


Also, should this setting somehow be gated by an applicable platform? Or if
not on setting then when acting on it in __intel_gt_apply_ccs_mode?

Creation of sysfs files as well should be gated by platform too in the
following patch?


The idea of this series is to disable the CCS load balancing
(which automatically chooses between mode 1/2/4) and use
a fixed scheme chosen by the user.

(I'm preparing v3 as Chris was so kind to recommend some changes
offline)


Okay, let's wait for v3 and I will then see if that will make 
it clearer to casual observers.


Regards,

Tvrtko



Thanks,
Andi

[2]


+   /*
+* Track fixed mapping between CCS engines and compute slices.
+*
+* In order to w/a HW that has the inability to dynamically load
+* balance between CCS engines and EU in the compute slices, we have to
+* reconfigure a static mapping on the fly. We track the current CCS
+* configuration (set by the user through a sysfs interface) and compare
+* it against the current CCS_MODE (which maps CCS engines to compute
+* slices). If there is only a single engine selected, we can map it to
+* all available compute slices for maximal single task performance
+* (fast/narrow). If there are more than one engine selected, we have to
+* reduce the number of slices allocated to each engine (wide/slow),
+* fairly distributing the EU between the equivalent engines.
+*/
+   struct {
+   struct mutex mutex;
+   u32 mode;
+   } ccs;
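
To make the "mode N" question in this thread more concrete, here is a small sketch of the slice-to-engine mapping as the comment marked [1] describes it: enabled slices are handed out to the selected engines in round-robin order and fused-off slices are left disabled. It is an illustration under assumptions, not the patch's implementation: `mock_assign_slices()`, `MOCK_MAX_SLICES` and `MOCK_SLICE_DISABLED` are hypothetical names, and no register such as XEHP_CCS_MODE is touched.

```c
#include <assert.h>
#include <stdint.h>

#define MOCK_MAX_SLICES     4    /* assumption: four compute slices */
#define MOCK_SLICE_DISABLED (-1) /* marker for a fused-off slice */

/*
 * slice_mask has one bit per available slice; num_engines is the number
 * of CCS engines the user selected (1, 2 or 4 in the quoted comment).
 * On return engine_for_slice[s] holds the engine index serving slice s.
 */
static void mock_assign_slices(uint32_t slice_mask, int num_engines,
			       int engine_for_slice[MOCK_MAX_SLICES])
{
	int engine = 0;
	int slice;

	assert(num_engines > 0);

	for (slice = 0; slice < MOCK_MAX_SLICES; slice++) {
		/* If a slice is fused off, leave it disabled. */
		if (!(slice_mask & (UINT32_C(1) << slice))) {
			engine_for_slice[slice] = MOCK_SLICE_DISABLED;
			continue;
		}

		/* Hand the next enabled slice to the next engine in turn. */
		engine_for_slice[slice] = engine;
		engine = (engine + 1) % num_engines;
	}
}
```

Read this way, "mode 2" on a part with four slices gives ccs0 slices 0 and 2 and ccs1 slices 1 and 3, which matches the table in the quoted comment; whether that is the intended meaning on every SKU is exactly what the review asks to have documented.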


Re: [PATCH v2 1/3] drm/i915/gt: Support fixed CCS mode

2024-01-04 Thread Tvrtko Ursulin



On 04/01/2024 14:35, Andi Shyti wrote:

The CCS mode involves assigning CCS engines to slices depending
on the number of slices and the number of engines the user wishes
to set.

In this patch, the default CCS setting is established during the
initial GT settings. It involves assigning only one CCS to all
the slices.

Based on a patch by Chris Wilson 
and Tejas Upadhyay .

Signed-off-by: Andi Shyti 
Cc: Chris Wilson 
Cc: Joonas Lahtinen 
Cc: Niranjana Vishwanathapura 
Cc: Tejas Upadhyay 
---
  drivers/gpu/drm/i915/Makefile   |  1 +
  drivers/gpu/drm/i915/gt/intel_gt.c  |  6 ++
  drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 81 +
  drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h | 16 
  drivers/gpu/drm/i915/gt/intel_gt_regs.h | 13 
  drivers/gpu/drm/i915/gt/intel_gt_types.h| 19 +
  drivers/gpu/drm/i915/i915_drv.h |  2 +
  7 files changed, 138 insertions(+)
  create mode 100644 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
  create mode 100644 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index e777686190ca..1dce15d6306b 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -119,6 +119,7 @@ gt-y += \
gt/intel_ggtt_fencing.o \
gt/intel_gt.o \
gt/intel_gt_buffer_pool.o \
+   gt/intel_gt_ccs_mode.o \
gt/intel_gt_clock_utils.o \
gt/intel_gt_debugfs.o \
gt/intel_gt_engines_debugfs.o \
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c 
b/drivers/gpu/drm/i915/gt/intel_gt.c
index a425db5ed3a2..e83c7b80c07a 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -17,6 +17,7 @@
  #include "intel_engine_regs.h"
  #include "intel_ggtt_gmch.h"
  #include "intel_gt.h"
+#include "intel_gt_ccs_mode.h"
  #include "intel_gt_buffer_pool.h"
  #include "intel_gt_clock_utils.h"
  #include "intel_gt_debugfs.h"
@@ -47,6 +48,7 @@ void intel_gt_common_init_early(struct intel_gt *gt)
init_llist_head(&gt->watchdog.list);
INIT_WORK(&gt->watchdog.work, intel_gt_watchdog_work);
  
+	intel_gt_init_ccs_mode(gt);

intel_gt_init_buffer_pool(gt);
intel_gt_init_reset(gt);
intel_gt_init_requests(gt);
@@ -195,6 +197,9 @@ int intel_gt_init_hw(struct intel_gt *gt)
  
  	intel_gt_init_swizzling(gt);
  
+	/* Configure CCS mode */

+   intel_gt_apply_ccs_mode(gt);
+
/*
 * At least 830 can leave some of the unused rings
 * "active" (ie. head != tail) after resume which
@@ -860,6 +865,7 @@ void intel_gt_driver_late_release_all(struct 
drm_i915_private *i915)
  
  	for_each_gt(gt, i915, id) {

intel_uc_driver_late_release(&gt->uc);
+   intel_gt_fini_ccs_mode(gt);
intel_gt_fini_requests(gt);
intel_gt_fini_reset(gt);
intel_gt_fini_timelines(gt);
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c 
b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
new file mode 100644
index ..fab8a77bded2
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
@@ -0,0 +1,81 @@
+//SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2023 Intel Corporation
+ */
+
+#include "i915_drv.h"
+
+#include "intel_gt.h"
+#include "intel_gt_ccs_mode.h"
+#include "intel_gt_regs.h"
+#include "intel_gt_types.h"
+
+static void __intel_gt_apply_ccs_mode(struct intel_gt *gt)
+{
+   u32 mode = XEHP_CCS_MODE_CSLICE_0_3_MASK; /* disable all by default */
+   int num_slices = hweight32(CCS_MASK(gt));
+   int num_engines = gt->ccs.mode;
+   int slice = 0;
+   int i;
+
+   if (!num_engines)
+   return;
+
+   /*
+* Loop over all available slices and assign each a user engine.
+*
+* With 1 engine (ccs0):
+*   slice 0, 1, 2, 3: ccs0
+*
+* With 2 engines (ccs0, ccs1):
+*   slice 0, 2: ccs0
+*   slice 1, 3: ccs1
+*
+* With 4 engines (ccs0, ccs1, ccs2, ccs3):
+*   slice 0: ccs0
+*   slice 1: ccs1
+*   slice 2: ccs2
+*   slice 3: ccs3
+*
+* Since the number of slices and the number of engines is
+* known, and we ensure that there is an exact multiple of
+* engines for slices, the double loop becomes a loop over each
+* slice.
+*/
+   for (i = num_slices / num_engines; i < num_slices; i++) {
+   struct intel_engine_cs *engine;
+   intel_engine_mask_t tmp;
+
+   for_each_engine_masked(engine, gt, ALL_CCS(gt), tmp) {
+   /* If a slice is fused off, leave disabled */
+   while (!(CCS_MASK(gt) & BIT(slice)))
+   slice++;
+
+   mode &= ~XEHP_CCS_MODE_CSLICE(slice, 
XEHP_CCS_MODE_CSLICE_MASK);
+   mode |= XEHP_CCS_MODE_CSLICE(slice, engine->instance);
+
+
