[Intel-gfx] [PATCH v4a 11/38] timers: drm: Use timer_shutdown_sync() before freeing timer

2022-11-04 Thread Steven Rostedt
From: "Steven Rostedt (Google)" 

Before a timer is freed, timer_shutdown_sync() must be called.

Link: https://lore.kernel.org/all/20221104054053.431922...@goodmis.org/

Cc: "Noralf Trønnes" 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Jani Nikula 
Cc: Joonas Lahtinen 
Cc: Rodrigo Vivi 
Cc: Tvrtko Ursulin 
Cc: dri-de...@lists.freedesktop.org
Cc: intel-gfx@lists.freedesktop.org
Signed-off-by: Steven Rostedt (Google) 
---
 drivers/gpu/drm/i915/i915_sw_fence.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c b/drivers/gpu/drm/i915/i915_sw_fence.c
index 6fc0d1b89690..bfaa9a67dc35 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence.c
+++ b/drivers/gpu/drm/i915/i915_sw_fence.c
@@ -465,7 +465,7 @@ static void irq_i915_sw_fence_work(struct irq_work *wrk)
    struct i915_sw_dma_fence_cb_timer *cb =
        container_of(wrk, typeof(*cb), work);
 
-   del_timer_sync(&cb->timer);
+   timer_shutdown_sync(&cb->timer);
    dma_fence_put(cb->dma);
 
    kfree_rcu(cb, rcu);
-- 
2.35.1


[Intel-gfx] [PATCH v4a 31/38] timers: drm: Use timer_shutdown_sync() for on stack timers

2022-11-04 Thread Steven Rostedt
From: "Steven Rostedt (Google)" 

Before a timer is released, timer_shutdown_sync() must be called.

Link: https://lore.kernel.org/all/20221104054053.431922...@goodmis.org/

Cc: "Noralf Trønnes" 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Jani Nikula 
Cc: Joonas Lahtinen 
Cc: Rodrigo Vivi 
Cc: Tvrtko Ursulin 
Cc: dri-de...@lists.freedesktop.org
Cc: intel-gfx@lists.freedesktop.org
Signed-off-by: Steven Rostedt (Google) 
---
 drivers/gpu/drm/gud/gud_pipe.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/gud/gud_pipe.c b/drivers/gpu/drm/gud/gud_pipe.c
index 7c6dc2bcd14a..08429bdd57cf 100644
--- a/drivers/gpu/drm/gud/gud_pipe.c
+++ b/drivers/gpu/drm/gud/gud_pipe.c
@@ -272,7 +272,7 @@ static int gud_usb_bulk(struct gud_device *gdrm, size_t len)
 
    usb_sg_wait(&ctx.sgr);
 
-   if (!del_timer_sync(&ctx.timer))
+   if (!timer_shutdown_sync(&ctx.timer))
        ret = -ETIMEDOUT;
    else if (ctx.sgr.status < 0)
        ret = ctx.sgr.status;
-- 
2.35.1


[Intel-gfx] [PATCH v4a 00/38] timers: Use timer_shutdown*() before freeing timers

2022-11-04 Thread Steven Rostedt


Back in April, I posted an RFC patch set to help mitigate a common issue
where a timer gets armed just before it is freed, and when the timer
goes off, it crashes in the timer code without any evidence of who the
culprit was. I got sidetracked and never finished up on that patch set.
Since this type of crash is still the #1 crash we are seeing in the field,
it has become a priority again to finish it.

The last version of that patch set is here:

  https://lore.kernel.org/all/20221104054053.431922...@goodmis.org/

I'm calling this version 4a as it only has obvious changes where the timer that
is being shut down is in the same function where it will be freed or released,
as this series should be "safe" for adding. I'll be calling the other patches
4b for the next merge window.

Patch 1 fixes an issue with sunrpc/xprt where it incorrectly uses
del_singleshot_timer_sync() for something that is not a oneshot timer. As this
will be converted to shutdown, this needs to be fixed first.

Patches 2-4 rename the existing timer_shutdown() functions used locally in ARM
and some drivers so that they no longer use the timer namespace.

Patch 5 implements the new timer_shutdown() and timer_shutdown_sync() functions
that disable re-arming the timer after they are called.

Patches 6-28 change all the locations where there's a kfree(), kfree_rcu(),
kmem_cache_free() and one call_rcu() call where the RCU function frees the
timer (the workqueue patch) in the same function as the del_timer{,_sync}() is
called on that timer, and there's no extra exit path between the del_timer and
freeing of the timer.

Patches 29-32 add timer_shutdown*() on on-stack timers that are about to be
released at the end of the function.

Patches 33-37 add timer_shutdown*() on module timers in the module exit code.

Patch 38 simply converts an open-coded "shutdown" into timer_shutdown(), since
one way timer_shutdown() disables the timer is by setting the timer's function
to NULL.
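
To illustrate the "function set to NULL" idea from the patch 38 description, here is a minimal userspace sketch. It is not the kernel implementation: struct toy_timer and every toy_* helper are hypothetical names invented for this example, and there is no locking or real expiry handling. It only models the behaviour described above, where a shut-down timer can no longer be re-armed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel timer; NOT the real struct timer_list. */
struct toy_timer {
	void (*function)(struct toy_timer *t);	/* NULL once shut down */
	unsigned long expires;
	bool pending;
};

/* Placeholder callback so the timer has a non-NULL function. */
static void toy_noop(struct toy_timer *t)
{
	(void)t;
}

static void toy_timer_setup(struct toy_timer *t,
			    void (*fn)(struct toy_timer *t))
{
	t->function = fn;
	t->expires = 0;
	t->pending = false;
}

/* Arm the timer. Returns false and does nothing if it was shut down. */
static bool toy_mod_timer(struct toy_timer *t, unsigned long expires)
{
	if (!t->function)
		return false;
	t->expires = expires;
	t->pending = true;
	return true;
}

/*
 * Deactivate the timer and forbid re-arming by clearing its function.
 * Returns true if the timer was pending when it was shut down.
 */
static bool toy_timer_shutdown(struct toy_timer *t)
{
	bool was_pending = t->pending;

	t->pending = false;
	t->function = NULL;
	return was_pending;
}
```

In this model, a late toy_mod_timer() call after toy_timer_shutdown() is ignored, so the object can be freed without a stale timer firing on it later.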

Linus, I sorted the patches this way to let you see which ones you think are
safe to go into this -rc. I honestly believe that they are all safe, but that's
just my own opinion.

This series is here:

  git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git timers-start

Head SHA1: f58b516a65bac76f1bfa00126856d6c6c3d24a40


Steven Rostedt (Google) (38):
  SUNRPC/xprt: Use del_timer_sync() instead of del_singleshot_timer_sync()
  ARM: spear: Do not use timer namespace for timer_shutdown() function
  clocksource/drivers/arm_arch_timer: Do not use timer namespace for timer_shutdown() function
  clocksource/drivers/sp804: Do not use timer namespace for timer_shutdown() function
  timers: Add timer_shutdown_sync() and timer_shutdown() to be called before freeing timers
  timers: sh: Use timer_shutdown_sync() before freeing timer
  timers: block: Use timer_shutdown_sync() before freeing timer
  timers: ACPI: Use timer_shutdown_sync() before freeing timer
  timers: atm: Use timer_shutdown_sync() before freeing timer
  timers: Bluetooth: Use timer_shutdown_sync() before freeing timer
  timers: drm: Use timer_shutdown_sync() before freeing timer
  timers: HID: Use timer_shutdown_sync() before freeing timer
  timers: Input: Use timer_shutdown_sync() before freeing timer
  timers: mISDN: Use timer_shutdown_sync() before freeing timer
  timers: leds: Use timer_shutdown_sync() before freeing timer
  timers: media: Use timer_shutdown_sync() before freeing timer
  timers: net: Use timer_shutdown_sync() before freeing timer
  timers: usb: Use timer_shutdown_sync() before freeing timer
  timers: nfc: pn533: Use timer_shutdown_sync() before freeing timer
  timers: pcmcia: Use timer_shutdown_sync() before freeing timer
  timers: scsi: Use timer_shutdown_sync() and timer_shutdown() before freeing timer
  timers: tty: Use timer_shutdown_sync() before freeing timer
  timers: ext4: Use timer_shutdown_sync() before freeing timer
  timers: fs/nilfs2: Use timer_shutdown_sync() before freeing timer
  timers: ALSA: Use timer_shutdown_sync() before freeing timer
  timers: jbd2: Use timer_shutdown() before freeing timer
  timers: sched/psi: Use timer_shutdown_sync() before freeing timer
  timers: workqueue: Use timer_shutdown_sync() before freeing timer
  random: use timer_shutdown_sync() for on stack timers
  timers: dma-buf: Use timer_shutdown_sync() for on stack timers
  timers: drm: Use timer_shutdown_sync() for on stack timers
  timers: media: Use timer_shutdown_sync() for on stack timers
  timers: s390/cmm: Use timer_shutdown_sync() before a module is released
  timers: atm: Use timer_shutdown_sync() before a module is released
  timers: hangcheck: Use timer_shutdown_sync() before a module is released
  timers: ipmi: Use timer_shutdown_sync() before a module is released
  timers: Input: Use timer_shutdown_sync() before a module is released
  timers: PM: 

Re: [Intel-gfx] [PATCH v2 1/2] drm/i915/guc: Properly initialise kernel contexts

2022-11-04 Thread Lucas De Marchi

On Wed, Nov 02, 2022 at 12:21:08PM -0700, john.c.harri...@intel.com wrote:

From: John Harrison 

If a context has already been registered prior to first submission
then context init code was not being called. The noticeable effect of
that was the scheduling priority was left at zero (meaning super high
priority) instead of being set to normal. This would occur with
kernel contexts at start of day as they are manually pinned up front
rather than on first submission. So add a call to initialise those
when they are pinned.

Signed-off-by: John Harrison 



Reviewed-by: Lucas De Marchi 

Lucas De Marchi 


---
drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 4ccb29f9ac55c..941613be3b9dd 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -4111,6 +4111,9 @@ static inline void guc_kernel_context_pin(struct intel_guc *guc,
    if (context_guc_id_invalid(ce))
        pin_guc_id(guc, ce);

+   if (!test_bit(CONTEXT_GUC_INIT, &ce->flags))
+       guc_context_init(ce);
+
    try_context_registration(ce, true);
 }

--
2.37.3
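
The guard added by the patch above follows a common "initialise once, tracked by a flag bit" pattern. The following is a minimal userspace sketch of that pattern only. struct toy_context, TOY_CONTEXT_INIT_DONE, TOY_PRIO_NORMAL and the toy_* functions are hypothetical names for illustration and do not correspond to the i915 definitions.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_CONTEXT_INIT_DONE	(1u << 0)	/* hypothetical flag bit */
#define TOY_PRIO_NORMAL		1		/* hypothetical "normal" priority */

/* Hypothetical stand-in for a context; zero-initialised means prio == 0. */
struct toy_context {
	unsigned int flags;
	int prio;
};

/* One-time setup: give the context a normal priority and record that. */
static void toy_context_init(struct toy_context *ce)
{
	ce->prio = TOY_PRIO_NORMAL;
	ce->flags |= TOY_CONTEXT_INIT_DONE;
}

/*
 * Pin path: run the one-time init only if no earlier path (for example
 * a first submission) has already done so.
 */
static void toy_context_pin(struct toy_context *ce)
{
	assert(ce != NULL);

	if (!(ce->flags & TOY_CONTEXT_INIT_DONE))
		toy_context_init(ce);
}
```

Without the check in toy_context_pin(), a context pinned up front would keep the zero priority it got from zero-initialisation, which mirrors the symptom described in the commit message.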



[Intel-gfx] ✗ Fi.CI.IGT: failure for KUnit issues - Was: [igt-dev] [PATCH RFC v2 8/8] drm/i915: check if current->mm is not NULL (rev2)

2022-11-04 Thread Patchwork
== Series Details ==

Series: KUnit issues - Was: [igt-dev] [PATCH RFC v2 8/8] drm/i915: check if 
current->mm is not NULL (rev2)
URL   : https://patchwork.freedesktop.org/series/110492/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_12342_full -> Patchwork_110492v2_full


Summary
---

  **FAILURE**

  Serious unknown changes coming with Patchwork_110492v2_full absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_110492v2_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  

Participating hosts (11 -> 9)
--

  Missing(2): shard-rkl shard-dg1 

Possible new issues
---

  Here are the unknown changes that may have been introduced in 
Patchwork_110492v2_full:

### IGT changes ###

 Possible regressions 

  * igt@gem_exec_suspend@basic-s3@smem:
- shard-skl:  NOTRUN -> [INCOMPLETE][1]
   [1]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110492v2/shard-skl6/igt@gem_exec_suspend@basic...@smem.html

  
Known issues


  Here are the changes found in Patchwork_110492v2_full that come from known 
issues:

### CI changes ###

 Issues hit 

  * boot:
- shard-glk:  ([PASS][2], [PASS][3], [PASS][4], [PASS][5], 
[PASS][6], [PASS][7], [PASS][8], [PASS][9], [PASS][10], [PASS][11], [PASS][12], 
[PASS][13], [PASS][14], [PASS][15], [PASS][16], [PASS][17], [PASS][18], 
[PASS][19], [PASS][20], [PASS][21], [PASS][22], [PASS][23], [PASS][24], 
[PASS][25], [PASS][26]) -> ([PASS][27], [PASS][28], [PASS][29], [PASS][30], 
[PASS][31], [PASS][32], [PASS][33], [PASS][34], [PASS][35], [PASS][36], 
[PASS][37], [PASS][38], [PASS][39], [PASS][40], [PASS][41], [PASS][42], 
[PASS][43], [PASS][44], [FAIL][45], [PASS][46], [PASS][47], [PASS][48], 
[PASS][49], [PASS][50]) ([i915#4392])
   [2]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12342/shard-glk1/boot.html
   [3]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12342/shard-glk1/boot.html
   [4]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12342/shard-glk1/boot.html
   [5]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12342/shard-glk2/boot.html
   [6]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12342/shard-glk2/boot.html
   [7]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12342/shard-glk2/boot.html
   [8]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12342/shard-glk2/boot.html
   [9]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12342/shard-glk3/boot.html
   [10]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12342/shard-glk3/boot.html
   [11]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12342/shard-glk3/boot.html
   [12]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12342/shard-glk5/boot.html
   [13]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12342/shard-glk5/boot.html
   [14]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12342/shard-glk5/boot.html
   [15]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12342/shard-glk6/boot.html
   [16]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12342/shard-glk6/boot.html
   [17]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12342/shard-glk6/boot.html
   [18]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12342/shard-glk7/boot.html
   [19]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12342/shard-glk7/boot.html
   [20]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12342/shard-glk7/boot.html
   [21]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12342/shard-glk8/boot.html
   [22]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12342/shard-glk8/boot.html
   [23]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12342/shard-glk8/boot.html
   [24]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12342/shard-glk9/boot.html
   [25]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12342/shard-glk9/boot.html
   [26]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12342/shard-glk9/boot.html
   [27]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110492v2/shard-glk1/boot.html
   [28]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110492v2/shard-glk1/boot.html
   [29]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110492v2/shard-glk1/boot.html
   [30]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110492v2/shard-glk2/boot.html
   [31]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110492v2/shard-glk2/boot.html
   [32]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110492v2/shard-glk2/boot.html
   [33]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110492v2/shard-glk3/boot.html
   [34]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110492v2/shard-glk3/boot.html
   [35]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110492v2/shard-glk3/boot.html
   [36]: 
https://intel-gfx-ci.01.org

Re: [Intel-gfx] ✗ Fi.CI.BAT: failure for drm/i915: Set PROBE_PREFER_ASYNCHRONOUS

2022-11-04 Thread Brian Norris
On Fri, Nov 04, 2022 at 02:38:03PM +, Matthew Auld wrote:
> On Thu, 3 Nov 2022 at 00:14, Brian Norris  wrote:
> > I'm still curious about the reported failures, but maybe they require
> > some particular sequence of tests? I also don't have the full
> > igt-gpu-tools set running, so maybe they do something a little
> > differently than my steps in [1]?
> >
> > Brian
> >
> > [1] I have a GLk system, if it matters. I figured I can run some of
> > these with any one of the following:
> >
> >   modprobe i915 live_selftests=1
> >   modprobe i915 live_selftests=1 igt__20__live_workarounds=Y
> >   modprobe i915 live_selftests=1 igt__19__live_uncore=Y
> >   modprobe i915 live_selftests=1 igt__18__live_sanitycheck=Y
> >   ...
> 
> CI should be using the IGT wrapper to run them, AFAIK. So something like:
> 
> ./build/tests/i915_selftest
> 
> Or to just run the live, mock or perf:
> 
> ./build/tests/i915_selftest --run-subtest live
> ./build/tests/i915_selftest --run-subtest mock
> ./build/tests/i915_selftest --run-subtest perf
> 
> Or if you want to run some particular selftest, like live mman tests:
> 
> ./build/tests/i915_selftest --run-subtest live --dyn mman

Thanks. I'm running through those now, and it seems like I'm doing
something closer to what the CI logs show [1], but I'm still not
reproducing on my GLK. (I've now managed to run it with drm-tip; still no
luck.)

So far, I've only managed to reproduce *different* known problems:

https://lore.kernel.org/all/y2wfplbx1sedt...@google.com/

But after working around those, I run without any similar lockup
failures.

I might poke around some more next week, but I've probably spent more
time than reasonable on this already.

Anyway, thanks for the help!

Regards,
Brian

[1] For one, I've run through a test list, in order, based on this:
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110277v1/fi-glk-j4005/testlist0.txt


[Intel-gfx] ✓ Fi.CI.BAT: success for Fix live busy stats selftest failure

2022-11-04 Thread Patchwork
== Series Details ==

Series: Fix live busy stats selftest failure
URL   : https://patchwork.freedesktop.org/series/110557/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_12346 -> Patchwork_110557v1


Summary
---

  **SUCCESS**

  No regressions found.

  External URL: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110557v1/index.html

Participating hosts (39 -> 27)
--

  Missing(12): fi-bdw-samus bat-dg2-8 bat-dg2-9 bat-adlp-6 bat-adlp-4 
fi-ctg-p8600 bat-adln-1 bat-rplp-1 bat-rpls-1 bat-rpls-2 bat-dg2-11 bat-jsl-1 

Known issues


  Here are the changes found in Patchwork_110557v1 that come from known issues:

### IGT changes ###

 Issues hit 

  * igt@gem_exec_gttfill@basic:
- fi-pnv-d510:[PASS][1] -> [FAIL][2] ([i915#7229])
   [1]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12346/fi-pnv-d510/igt@gem_exec_gttf...@basic.html
   [2]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110557v1/fi-pnv-d510/igt@gem_exec_gttf...@basic.html

  
 Possible fixes 

  * igt@i915_selftest@live@hangcheck:
- {fi-ehl-2}: [INCOMPLETE][3] -> [PASS][4]
   [3]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12346/fi-ehl-2/igt@i915_selftest@l...@hangcheck.html
   [4]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110557v1/fi-ehl-2/igt@i915_selftest@l...@hangcheck.html

  
  {name}: This element is suppressed. This means it is ignored when computing
  the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#111827]: https://bugs.freedesktop.org/show_bug.cgi?id=111827
  [i915#7229]: https://gitlab.freedesktop.org/drm/intel/issues/7229


Build changes
-

  * Linux: CI_DRM_12346 -> Patchwork_110557v1

  CI-20190529: 20190529
  CI_DRM_12346: 7b32ba9462baa932abf6cbe2f1a8ecb79e922a6e @ 
git://anongit.freedesktop.org/gfx-ci/linux
  IGT_7044: dbeb6f92720292f8303182a0e649284cea5b11a6 @ 
https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_110557v1: 7b32ba9462baa932abf6cbe2f1a8ecb79e922a6e @ 
git://anongit.freedesktop.org/gfx-ci/linux


### Linux commits

231612650801 drm/i915/selftest: Bump up sample period for busy stats selftest
18b0e07b0348 i915/uncore: Acquire fw before loop in intel_uncore_read64_2x32

== Logs ==

For more details see: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110557v1/index.html


[Intel-gfx] ✗ Fi.CI.IGT: failure for series starting with [CI,1/2] Revert "freezer, sched: Rewrite core freezer logic fix"

2022-11-04 Thread Patchwork
== Series Details ==

Series: series starting with [CI,1/2] Revert "freezer, sched: Rewrite core 
freezer logic fix"
URL   : https://patchwork.freedesktop.org/series/110529/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_12342_full -> Patchwork_110529v1_full


Summary
---

  **FAILURE**

  Serious unknown changes coming with Patchwork_110529v1_full absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_110529v1_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  

Participating hosts (11 -> 9)
--

  Missing(2): shard-rkl shard-dg1 

Possible new issues
---

  Here are the unknown changes that may have been introduced in 
Patchwork_110529v1_full:

### IGT changes ###

 Possible regressions 

  * igt@i915_suspend@fence-restore-tiled2untiled:
- shard-tglb: [PASS][1] -> [INCOMPLETE][2]
   [1]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12342/shard-tglb1/igt@i915_susp...@fence-restore-tiled2untiled.html
   [2]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110529v1/shard-tglb1/igt@i915_susp...@fence-restore-tiled2untiled.html

  
Known issues


  Here are the changes found in Patchwork_110529v1_full that come from known 
issues:

### IGT changes ###

 Issues hit 

  * igt@api_intel_bb@crc32:
- shard-tglb: NOTRUN -> [SKIP][3] ([i915#6230])
   [3]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110529v1/shard-tglb7/igt@api_intel...@crc32.html

  * igt@feature_discovery@psr2:
- shard-iclb: [PASS][4] -> [SKIP][5] ([i915#658])
   [4]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12342/shard-iclb2/igt@feature_discov...@psr2.html
   [5]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110529v1/shard-iclb7/igt@feature_discov...@psr2.html

  * igt@gem_exec_balancer@parallel-keep-submit-fence:
- shard-iclb: [PASS][6] -> [SKIP][7] ([i915#4525])
   [6]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12342/shard-iclb1/igt@gem_exec_balan...@parallel-keep-submit-fence.html
   [7]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110529v1/shard-iclb5/igt@gem_exec_balan...@parallel-keep-submit-fence.html

  * igt@gem_exec_fair@basic-deadline:
- shard-skl:  NOTRUN -> [FAIL][8] ([i915#2846])
   [8]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110529v1/shard-skl4/igt@gem_exec_f...@basic-deadline.html

  * igt@gem_exec_fair@basic-flow@rcs0:
- shard-tglb: [PASS][9] -> [FAIL][10] ([i915#2842])
   [9]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12342/shard-tglb5/igt@gem_exec_fair@basic-f...@rcs0.html
   [10]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110529v1/shard-tglb7/igt@gem_exec_fair@basic-f...@rcs0.html

  * igt@gem_exec_fair@basic-pace-share@rcs0:
- shard-glk:  [PASS][11] -> [FAIL][12] ([i915#2842]) +1 similar 
issue
   [11]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12342/shard-glk5/igt@gem_exec_fair@basic-pace-sh...@rcs0.html
   [12]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110529v1/shard-glk9/igt@gem_exec_fair@basic-pace-sh...@rcs0.html

  * igt@gem_exec_flush@basic-batch-kernel-default-cmd:
- shard-tglb: NOTRUN -> [SKIP][13] ([fdo#109313])
   [13]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110529v1/shard-tglb7/igt@gem_exec_fl...@basic-batch-kernel-default-cmd.html

  * igt@gem_huc_copy@huc-copy:
- shard-tglb: [PASS][14] -> [SKIP][15] ([i915#2190])
   [14]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12342/shard-tglb5/igt@gem_huc_c...@huc-copy.html
   [15]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110529v1/shard-tglb6/igt@gem_huc_c...@huc-copy.html

  * igt@gem_lmem_swapping@basic:
- shard-skl:  NOTRUN -> [SKIP][16] ([fdo#109271] / [i915#4613]) +3 
similar issues
   [16]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110529v1/shard-skl9/igt@gem_lmem_swapp...@basic.html

  * igt@gem_lmem_swapping@parallel-multi:
- shard-glk:  NOTRUN -> [SKIP][17] ([fdo#109271] / [i915#4613]) +1 
similar issue
   [17]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110529v1/shard-glk9/igt@gem_lmem_swapp...@parallel-multi.html

  * igt@gem_lmem_swapping@parallel-random-verify-ccs:
- shard-apl:  NOTRUN -> [SKIP][18] ([fdo#109271] / [i915#4613]) +1 
similar issue
   [18]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110529v1/shard-apl6/igt@gem_lmem_swapp...@parallel-random-verify-ccs.html

  * igt@gem_userptr_blits@probe:
- shard-skl:  NOTRUN -> [FAIL][19] ([i915#7247])
   [19]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110529v1/shard-skl7/igt@gem_userptr_bl...@probe.html

  * igt@gem_userptr_blits@vma-merge:
- shard-skl:  NOTRUN -> [FAIL][20] (

Re: [Intel-gfx] [PATCH 1/2] drm/i915/gt: Add GT oriented dmesg output

2022-11-04 Thread Ceraolo Spurio, Daniele




On 11/4/2022 10:25 AM, john.c.harri...@intel.com wrote:

From: John Harrison 

When trying to analyse bug reports from CI, customers, etc. it can be
difficult to work out exactly what is happening on which GT in a
multi-GT system. So add GT oriented debug/error message wrappers. If
used instead of the drm_ equivalents, you get the same output but with
a GT# prefix on it.

Signed-off-by: John Harrison 


The only downside to this is that we'll print "GT0: " even on single-GT 
devices. We could introduce a gt->info.name and print that, so we could 
have it different per-platform, but IMO it's not worth the effort.
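
As a rough illustration of how such a prefixing wrapper works, here is a small userspace sketch. It is not the i915 code: struct toy_gt and TOY_GT_MSG are hypothetical names, and snprintf() stands in for the drm_*() logging helpers. It relies on the same two mechanisms as the macros in the patch: string literal concatenation to glue "GT%u: " onto the caller's format, and the GNU `##__VA_ARGS__` extension so the macro also works with no extra arguments.

```c
#include <stdio.h>

/* Hypothetical stand-in for struct intel_gt; only the id is modelled. */
struct toy_gt {
	unsigned int id;
};

/*
 * Format a message into buf with a "GT%u: " prefix.
 * "GT%u: " fmt  -> adjacent string literals are concatenated.
 * ##__VA_ARGS__ -> GNU extension (gcc/clang) that drops the preceding
 *                  comma when no variadic arguments are given.
 * Evaluates to the snprintf() return value.
 */
#define TOY_GT_MSG(buf, len, gt, fmt, ...) \
	snprintf(buf, len, "GT%u: " fmt, (gt)->id, ##__VA_ARGS__)
```

For a toy_gt with id 1, `TOY_GT_MSG(buf, sizeof(buf), &gt, "reset failed: %d", -5)` writes "GT1: reset failed: -5" into buf. A device with a single GT still gets the "GT0: " prefix, which is the downside mentioned above.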


Reviewed-by: Daniele Ceraolo Spurio 

I think it might be worth getting an ack from one of the maintainers to 
make sure we're all aligned on transitioning to these new logging macros 
for gt code.


Daniele


---
  drivers/gpu/drm/i915/gt/intel_gt.h | 15 +++
  1 file changed, 15 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h
index e0365d5562484..1e016fb0117a4 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt.h
@@ -13,6 +13,21 @@
  struct drm_i915_private;
  struct drm_printer;
  
+#define GT_ERR(_gt, _fmt, ...) \
+   drm_err(&(_gt)->i915->drm, "GT%u: " _fmt, (_gt)->info.id, ##__VA_ARGS__)
+
+#define GT_WARN(_gt, _fmt, ...) \
+   drm_warn(&(_gt)->i915->drm, "GT%u: " _fmt, (_gt)->info.id, ##__VA_ARGS__)
+
+#define GT_NOTICE(_gt, _fmt, ...) \
+   drm_notice(&(_gt)->i915->drm, "GT%u: " _fmt, (_gt)->info.id, ##__VA_ARGS__)
+
+#define GT_INFO(_gt, _fmt, ...) \
+   drm_info(&(_gt)->i915->drm, "GT%u: " _fmt, (_gt)->info.id, ##__VA_ARGS__)
+
+#define GT_DBG(_gt, _fmt, ...) \
+   drm_dbg(&(_gt)->i915->drm, "GT%u: " _fmt, (_gt)->info.id, ##__VA_ARGS__)
+
  #define GT_TRACE(gt, fmt, ...) do {   \
const struct intel_gt *gt__ __maybe_unused = (gt);  \
GEM_TRACE("%s " fmt, dev_name(gt__->i915->drm.dev), \




[Intel-gfx] ✗ Fi.CI.SPARSE: warning for Fix live busy stats selftest failure

2022-11-04 Thread Patchwork
== Series Details ==

Series: Fix live busy stats selftest failure
URL   : https://patchwork.freedesktop.org/series/110557/
State : warning

== Summary ==

Error: dim sparse failed
Sparse version: v0.6.2
Fast mode used, each commit won't be checked separately.




Re: [Intel-gfx] [CI 11/15] drm/i915/huc: track delayed HuC load with a fence

2022-11-04 Thread Ceraolo Spurio, Daniele




On 11/4/2022 5:38 PM, Ceraolo Spurio, Daniele wrote:



On 11/4/2022 4:26 PM, Brian Norris wrote:

Hi,

On Wed, Oct 19, 2022 at 10:54:34AM +0100, Tvrtko Ursulin wrote:
Don't know if this is real or not yet, hit it while running 
selftests a bit. Something to keep an eye on.


[ 2928.370577] ODEBUG: init destroyed (active state 0) object type: 
i915_sw_fence hint: sw_fence_dummy_notify+0x0/0x10 [i915]
[ 2928.370903] WARNING: CPU: 2 PID: 1113 at lib/debugobjects.c:502 
debug_print_object+0x6b/0x90
[ 2928.370984] Modules linked in: i915(+) drm_display_helper 
drm_kms_helper netconsole cmac algif_hash algif_skcipher af_alg bnep 
nls_iso8859_1 snd_hda_codec_hdmi snd_hda_codec_realtek 
snd_hda_codec_generic ledtrig_audio snd_intel_dspcfg snd_hda_codec 
snd_hwdep snd_hda_core snd_pcm intel_tcc_cooling 
x86_pkg_temp_thermal intel_powerclamp snd_seq_midi 
snd_seq_midi_event coretemp snd_rawmidi btusb btrtl btbcm kvm_intel 
btmtk btintel ath10k_pci snd_seq kvm ath10k_core bluetooth snd_timer 
rapl intel_cstate snd_seq_device input_leds mac80211 ecdh_generic 
libarc4 ath snd ecc serio_raw intel_wmi_thunderbolt at24 soundcore 
cfg80211 mei_me intel_xhci_usb_role_switch mei ideapad_laptop 
intel_pch_thermal platform_profile sparse_keymap acpi_pad 
sch_fq_codel msr efi_pstore ip_tables x_tables autofs4 
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel sha512_ssse3 
aesni_intel prime_numbers crypto_simd atkbd drm_buddy cryptd 
vivaldi_fmap r8169 ttm i2c_i801 i2c_smbus cec realtek xhci_pci 
syscopyarea ahci
[ 2928.371145]  xhci_pci_renesas sysfillrect sysimgblt libahci 
fb_sys_fops video wmi [last unloaded: drm_kms_helper]
[ 2928.371489] CPU: 2 PID: 1113 Comm: modprobe Tainted: G U  W  6.1.0-rc1 #196
[ 2928.371550] Hardware name: LENOVO 80MX/Lenovo E31-80, BIOS DCCN34WW(V2.03) 12/01/2015
[ 2928.371615] RIP: 0010:debug_print_object+0x6b/0x90
[ 2928.371664] Code: 49 89 c1 8b 43 10 83 c2 01 48 c7 c7 e8 be d6 bb 8b 4b 14 89 15 ca be b4 02 4c 8b 45 00 48 8b 14 c5 40 56 a8 bb e8 ec 5b 60 00 <0f> 0b 83 05 28 5a 3e 01 01 48 83 c4 08 5b 5d c3 83 05 1a 5a 3e 01
[ 2928.371782] RSP: 0018:9ed841607a18 EFLAGS: 00010286
[ 2928.371841] RAX:  RBX: 9208116a1d48 RCX: 
[ 2928.371909] RDX: 0001 RSI: bbd277d2 RDI: 
[ 2928.372024] RBP: c176a540 R08:  R09: bc07a1e0
[ 2928.372128] R10: 0001 R11: 0001 R12: 9208122da830
[ 2928.372192] R13: 92080089b000 R14: 9208122da770 R15: 
[ 2928.372259] FS:  7f53e7617c40() GS:92086e50() knlGS:
[ 2928.372365] CS:  0010 DS:  ES:  CR0: 80050033
[ 2928.372425] CR2: 55cd28b33070 CR3: 000110dbd006 CR4: 003706e0
[ 2928.372526] Call Trace:
[ 2928.372568]  
[ 2928.372614]  ? intel_guc_hang_check+0xb0/0xb0 [i915]
[ 2928.373001]  __i915_sw_fence_init+0x2b/0x50 [i915]
[ 2928.373374]  intel_huc_init_early+0x75/0xb0 [i915]
[ 2928.373868]  intel_uc_init_early+0x4e/0x210 [i915]
[ 2928.374241]  intel_gt_common_init_early+0x16f/0x180 [i915]
[ 2928.374718]  intel_root_gt_init_early+0x49/0x60 [i915]
[ 2928.375074]  i915_driver_probe+0x917/0xed0 [i915]

...

Did you track this down? Or consider reverting? This is tripping me up


No. I didn't manage to repro locally after Tvrtko reported it (I ran 
the full selftest suite twice on both ADL-S and DG2 with the debug 
config enabled), so I was keeping an eye out as suggested to see if it 
popped out again. If you can repro this consistently, can you share 
your setup info? What platform you're running on, if you're using the 
latest drm-tip, any non-default params you're using, etc. Dmesg would 
also be useful to see if there are other errors before this one.




Just to further clarify, this issue is also not showing up in our CI 
runs (which do have both the DEBUG_OBJECTS kconfigs you pointed out 
enabled), hence why I'm suspecting that this is only happening on 
specific setups, potentially due to a different kconfig or modparam 
being involved.


Daniele


Thanks,
Daniele


on drm-tip now when running selftests with CONFIG_DEBUG_OBJECTS=y /
CONFIG_DRM_I915_SW_FENCE_DEBUG_OBJECTS=y. It means I can't actually run
any subsequent tests, because of the kernel taint.

Brian






Re: [Intel-gfx] [CI 11/15] drm/i915/huc: track delayed HuC load with a fence

2022-11-04 Thread Ceraolo Spurio, Daniele



On 11/4/2022 4:26 PM, Brian Norris wrote:

Hi,

On Wed, Oct 19, 2022 at 10:54:34AM +0100, Tvrtko Ursulin wrote:

Don't know if this is real or not yet, hit it while running selftests a bit. 
Something to keep an eye on.

[ 2928.370577] ODEBUG: init destroyed (active state 0) object type: 
i915_sw_fence hint: sw_fence_dummy_notify+0x0/0x10 [i915]
[ 2928.370903] WARNING: CPU: 2 PID: 1113 at lib/debugobjects.c:502 
debug_print_object+0x6b/0x90
[ 2928.370984] Modules linked in: i915(+) drm_display_helper drm_kms_helper 
netconsole cmac algif_hash algif_skcipher af_alg bnep nls_iso8859_1 
snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio 
snd_intel_dspcfg snd_hda_codec snd_hwdep snd_hda_core snd_pcm intel_tcc_cooling 
x86_pkg_temp_thermal intel_powerclamp snd_seq_midi snd_seq_midi_event coretemp 
snd_rawmidi btusb btrtl btbcm kvm_intel btmtk btintel ath10k_pci snd_seq kvm 
ath10k_core bluetooth snd_timer rapl intel_cstate snd_seq_device input_leds 
mac80211 ecdh_generic libarc4 ath snd ecc serio_raw intel_wmi_thunderbolt at24 
soundcore cfg80211 mei_me intel_xhci_usb_role_switch mei ideapad_laptop 
intel_pch_thermal platform_profile sparse_keymap acpi_pad sch_fq_codel msr 
efi_pstore ip_tables x_tables autofs4 crct10dif_pclmul crc32_pclmul 
ghash_clmulni_intel sha512_ssse3 aesni_intel prime_numbers crypto_simd atkbd 
drm_buddy cryptd vivaldi_fmap r8169 ttm i2c_i801 i2c_smbus cec realtek xhci_pci 
syscopyarea ahci
[ 2928.371145]  xhci_pci_renesas sysfillrect sysimgblt libahci fb_sys_fops 
video wmi [last unloaded: drm_kms_helper]
[ 2928.371489] CPU: 2 PID: 1113 Comm: modprobe Tainted: G U  W  
6.1.0-rc1 #196
[ 2928.371550] Hardware name: LENOVO 80MX/Lenovo E31-80, BIOS DCCN34WW(V2.03) 
12/01/2015
[ 2928.371615] RIP: 0010:debug_print_object+0x6b/0x90
[ 2928.371664] Code: 49 89 c1 8b 43 10 83 c2 01 48 c7 c7 e8 be d6 bb 8b 4b 14 89 15 
ca be b4 02 4c 8b 45 00 48 8b 14 c5 40 56 a8 bb e8 ec 5b 60 00 <0f> 0b 83 05 28 
5a 3e 01 01 48 83 c4 08 5b 5d c3 83 05 1a 5a 3e 01
[ 2928.371782] RSP: 0018:9ed841607a18 EFLAGS: 00010286
[ 2928.371841] RAX:  RBX: 9208116a1d48 RCX: 
[ 2928.371909] RDX: 0001 RSI: bbd277d2 RDI: 
[ 2928.372024] RBP: c176a540 R08:  R09: bc07a1e0
[ 2928.372128] R10: 0001 R11: 0001 R12: 9208122da830
[ 2928.372192] R13: 92080089b000 R14: 9208122da770 R15: 
[ 2928.372259] FS:  7f53e7617c40() GS:92086e50() 
knlGS:
[ 2928.372365] CS:  0010 DS:  ES:  CR0: 80050033
[ 2928.372425] CR2: 55cd28b33070 CR3: 000110dbd006 CR4: 003706e0
[ 2928.372526] Call Trace:
[ 2928.372568]  
[ 2928.372614]  ? intel_guc_hang_check+0xb0/0xb0 [i915]
[ 2928.373001]  __i915_sw_fence_init+0x2b/0x50 [i915]
[ 2928.373374]  intel_huc_init_early+0x75/0xb0 [i915]
[ 2928.373868]  intel_uc_init_early+0x4e/0x210 [i915]
[ 2928.374241]  intel_gt_common_init_early+0x16f/0x180 [i915]
[ 2928.374718]  intel_root_gt_init_early+0x49/0x60 [i915]
[ 2928.375074]  i915_driver_probe+0x917/0xed0 [i915]

...

> Did you track this down? Or consider reverting? This is tripping me up
> on drm-tip now when running selftests with CONFIG_DEBUG_OBJECTS=y /
> CONFIG_DRM_I915_SW_FENCE_DEBUG_OBJECTS=y. It means I can't actually run
> any subsequent tests, because of the kernel taint.
>
> Brian

No. I didn't manage to repro locally after Tvrtko reported it (I ran the 
full selftest suite twice on both ADL-S and DG2 with the debug config 
enabled), so I was keeping an eye out as suggested to see if it popped 
out again. If you can repro this consistently, can you share your setup 
info? What platform you're running on, if you're using the latest 
drm-tip, any non-default params you're using, etc. Dmesg would also be 
useful to see if there are other errors before this one.

Thanks,
Daniele




[Intel-gfx] [PATCH 2/2] drm/i915/selftest: Bump up sample period for busy stats selftest

2022-11-04 Thread Umesh Nerlige Ramappa
Engine busyness sampling over a 10ms period is failing, with busyness
ranging from approx. 87% to 115%. The expected range is +/- 5% of the
sample period.

When determining the busyness of an active engine, the GuC based engine
busyness implementation relies on a 64 bit timestamp register read. The
latency incurred by this register read causes the failure.

On DG1, when the test fails, the observed latencies range from 900us -
1.5ms.

One solution tried was to reduce the latency between the register read and
the CPU timestamp capture, but such an optimization does not add value for
the user, since the CPU timestamp obtained here is only used for (1) the
selftest and (2) the i915 rps implementation specific to the execlist
scheduler. Also, this solution only reduces the frequency of failure and
does not eliminate it.

In order to make the selftest more robust and account for such
latencies, increase the sample period to 100 ms.

Signed-off-by: Umesh Nerlige Ramappa 
---
 drivers/gpu/drm/i915/gt/selftest_engine_pm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_pm.c 
b/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
index 0dcb3ed44a73..87c94314cf67 100644
--- a/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
+++ b/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
@@ -317,7 +317,7 @@ static int live_engine_busy_stats(void *arg)
ENGINE_TRACE(engine, "measuring busy time\n");
preempt_disable();
de = intel_engine_get_busy_time(engine, &t[0]);
-   mdelay(10);
+   mdelay(100);
de = ktime_sub(intel_engine_get_busy_time(engine, &t[1]), de);
preempt_enable();
dt = ktime_sub(t[1], t[0]);
-- 
2.36.1



[Intel-gfx] [PATCH 1/2] i915/uncore: Acquire fw before loop in intel_uncore_read64_2x32

2022-11-04 Thread Umesh Nerlige Ramappa
PMU reads the GT timestamp as a 2x32 mmio read and since upper and lower
32 bit registers are read in a loop, there is a latency involved between
getting the GT timestamp and the CPU timestamp. As part of the
resolution, refactor intel_uncore_read64_2x32 to acquire forcewake and
uncore lock prior to reading upper and lower regs.

Signed-off-by: Umesh Nerlige Ramappa 
---
 drivers/gpu/drm/i915/intel_uncore.h | 44 -
 1 file changed, 30 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_uncore.h 
b/drivers/gpu/drm/i915/intel_uncore.h
index 5449146a0624..e9e38490815d 100644
--- a/drivers/gpu/drm/i915/intel_uncore.h
+++ b/drivers/gpu/drm/i915/intel_uncore.h
@@ -382,20 +382,6 @@ __uncore_write(write_notrace, 32, l, false)
  */
 __uncore_read(read64, 64, q, true)
 
-static inline u64
-intel_uncore_read64_2x32(struct intel_uncore *uncore,
-i915_reg_t lower_reg, i915_reg_t upper_reg)
-{
-   u32 upper, lower, old_upper, loop = 0;
-   upper = intel_uncore_read(uncore, upper_reg);
-   do {
-   old_upper = upper;
-   lower = intel_uncore_read(uncore, lower_reg);
-   upper = intel_uncore_read(uncore, upper_reg);
-   } while (upper != old_upper && loop++ < 2);
-   return (u64)upper << 32 | lower;
-}
-
 #define intel_uncore_posting_read(...) 
((void)intel_uncore_read_notrace(__VA_ARGS__))
 #define intel_uncore_posting_read16(...) 
((void)intel_uncore_read16_notrace(__VA_ARGS__))
 
@@ -455,6 +441,36 @@ static inline void intel_uncore_rmw_fw(struct intel_uncore 
*uncore,
intel_uncore_write_fw(uncore, reg, val);
 }
 
+static inline u64
+intel_uncore_read64_2x32(struct intel_uncore *uncore,
+i915_reg_t lower_reg, i915_reg_t upper_reg)
+{
+   u32 upper, lower, old_upper, loop = 0;
+   enum forcewake_domains fw_domains;
+   unsigned long flags;
+
+   fw_domains = intel_uncore_forcewake_for_reg(uncore, lower_reg,
+   FW_REG_READ);
+
+   fw_domains |= intel_uncore_forcewake_for_reg(uncore, upper_reg,
+   FW_REG_READ);
+
+   spin_lock_irqsave(&uncore->lock, flags);
+   intel_uncore_forcewake_get__locked(uncore, fw_domains);
+
+   upper = intel_uncore_read_fw(uncore, upper_reg);
+   do {
+   old_upper = upper;
+   lower = intel_uncore_read_fw(uncore, lower_reg);
+   upper = intel_uncore_read_fw(uncore, upper_reg);
+   } while (upper != old_upper && loop++ < 2);
+
+   intel_uncore_forcewake_put__locked(uncore, fw_domains);
+   spin_unlock_irqrestore(&uncore->lock, flags);
+
+   return (u64)upper << 32 | lower;
+}
+
 static inline int intel_uncore_write_and_verify(struct intel_uncore *uncore,
i915_reg_t reg, u32 val,
u32 mask, u32 expected_val)
-- 
2.36.1
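
For readers following the retry loop in the patch above, here is a minimal
userspace sketch of why the upper dword is re-read. It is not driver code:
struct fake_counter, read_lower() and read_upper() are hypothetical stand-ins
for the hardware register and the MMIO accessors, and each simulated read
advances the counter to mimic time passing between the two accesses.

```c
#include <stdint.h>

/* Hypothetical stand-in for a free-running 64-bit hardware counter that
 * can only be read 32 bits at a time. Every read advances the counter by
 * `step` ticks to mimic the latency between two MMIO accesses. */
struct fake_counter {
	uint64_t value;
	uint64_t step;
};

static uint32_t read_lower(struct fake_counter *c)
{
	uint32_t v = (uint32_t)c->value;

	c->value += c->step;
	return v;
}

static uint32_t read_upper(struct fake_counter *c)
{
	uint32_t v = (uint32_t)(c->value >> 32);

	c->value += c->step;
	return v;
}

/* Same retry shape as the loop in the patch: re-read the upper half until
 * it is unchanged across the lower read, giving up after a few attempts. */
static uint64_t read64_2x32(struct fake_counter *c)
{
	uint32_t upper, lower, old_upper, loop = 0;

	upper = read_upper(c);
	do {
		old_upper = upper;
		lower = read_lower(c);
		upper = read_upper(c);
	} while (upper != old_upper && loop++ < 2);

	return (uint64_t)upper << 32 | lower;
}
```

Starting the fake counter at 0xFFFFFFFF with a step of 1, a single
upper-then-lower pair would combine an upper dword of 0 with a lower dword
of 0, i.e. a value that goes backwards. The retry notices that the upper
dword changed and samples again. The fewer cycles each read costs, the
tighter the window between the hardware timestamp and the CPU timestamp,
which is what taking forcewake and the lock once, outside the loop, aims at.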



[Intel-gfx] [PATCH 0/2] Fix live busy stats selftest failure

2022-11-04 Thread Umesh Nerlige Ramappa
Engine busyness sampling over a 10ms period is failing, with busyness
ranging from approx. 87% to 115%. The expected range is +/- 5% of the
sample period.

When determining the busyness of an active engine, the GuC based engine
busyness implementation relies on a 64 bit timestamp register read. The
latency incurred by this register read causes the failure.

On DG1, when the test fails, the observed latencies range from 900us -
1.5ms.

In order to make the selftest more robust and account for such
latencies, increase the sample period to 100 ms.

v2: (Tvrtko)
In addition refactor intel_uncore_read64_2x32 to obtain the forcewake
once before reading upper and lower register dwords.

Signed-off-by: Umesh Nerlige Ramappa 

Umesh Nerlige Ramappa (2):
  i915/uncore: Acquire fw before loop in intel_uncore_read64_2x32
  drm/i915/selftest: Bump up sample period for busy stats selftest

 drivers/gpu/drm/i915/gt/selftest_engine_pm.c |  2 +-
 drivers/gpu/drm/i915/intel_uncore.h  | 44 +---
 2 files changed, 31 insertions(+), 15 deletions(-)

-- 
2.36.1
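
As a rough back-of-the-envelope check of the numbers in the cover letter, the
sketch below computes how large a fixed read latency is relative to the sample
period. It assumes, for illustration only, that the whole latency shows up as
error in the measured busy time. skew_permille() is a hypothetical helper, not
something from the series.

```c
/* Relative error, in tenths of a percent (per mille), that a fixed
 * timestamp-read latency contributes to a busyness sample taken over
 * period_us microseconds. Rounded to the nearest integer. */
static unsigned int skew_permille(unsigned int latency_us,
				  unsigned int period_us)
{
	return (latency_us * 1000u + period_us / 2) / period_us;
}
```

With the 900us to 1.5ms latencies reported on DG1, a 10 ms period gives
roughly 9% to 15% of skew, well outside the +/- 5% tolerance, while a 100 ms
period brings the same latencies down to roughly 0.9% to 1.5%.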



Re: [Intel-gfx] [RFC][PATCH v3 00/33] timers: Use timer_shutdown*() before freeing timers

2022-11-04 Thread Guenter Roeck
On Fri, Nov 04, 2022 at 01:40:53AM -0400, Steven Rostedt wrote:
> 
> Back in April, I posted an RFC patch set to help mitigate a common issue
> where a timer gets armed just before it is freed, and when the timer
> goes off, it crashes in the timer code without any evidence of who the
> culprit was. I got side tracked and never finished up on that patch set.
> Since this type of crash is still our #1 crash we are seeing in the field,
> it has become a priority again to finish it.
> 

After applying the patches attached below, everything compiles for me,
and there are no crashes. There are still various warnings, most in
networking. I know I need to apply some patch(es) to fix the networking
warnings, but I didn't entirely understand what exactly to apply, so
I didn't try.

Complete logs are at https://kerneltests.org/builders, on the bottom half
of the page (qemu tests, in the 'testing' column).

Guenter

---
Warnings:

ODEBUG: free active (active state 0) object type: timer_list hint: 
tcp_write_timer+0x0/0x1d0
from tcp_close -> __sk_destruct -> tcp_write_timer

ODEBUG: free active (active state 0) object type: timer_list hint: 
tcp_keepalive_timer+0x0/0x4c0
from tcp_close -> __sk_destruct -> tcp_keepalive_timer -> 
__del_timer_sync

ODEBUG: free active (active state 0) object type: timer_list hint: 
blk_rq_timed_out_timer+0x0/0x40
blk_free_queue_rcu -> blk_free_queue_rcu -> blk_rq_timed_out_timer

---
Changes applied on top of patch set to fix build errors:

diff --git a/arch/arm/mach-spear/time.c b/arch/arm/mach-spear/time.c
index e979e2197f8e..5371c824786d 100644
--- a/arch/arm/mach-spear/time.c
+++ b/arch/arm/mach-spear/time.c
@@ -90,7 +90,7 @@ static void __init spear_clocksource_init(void)
200, 16, clocksource_mmio_readw_up);
 }
 
-static inline void timer_shutdown(struct clock_event_device *evt)
+static inline void spear_timer_shutdown(struct clock_event_device *evt)
 {
u16 val = readw(gpt_base + CR(CLKEVT));
 
@@ -101,7 +101,7 @@ static inline void timer_shutdown(struct clock_event_device 
*evt)
 
 static int spear_shutdown(struct clock_event_device *evt)
 {
-   timer_shutdown(evt);
+   spear_timer_shutdown(evt);
 
return 0;
 }
@@ -111,7 +111,7 @@ static int spear_set_oneshot(struct clock_event_device *evt)
u16 val;
 
/* stop the timer */
-   timer_shutdown(evt);
+   spear_timer_shutdown(evt);
 
val = readw(gpt_base + CR(CLKEVT));
val |= CTRL_ONE_SHOT;
@@ -126,7 +126,7 @@ static int spear_set_periodic(struct clock_event_device 
*evt)
u16 val;
 
/* stop the timer */
-   timer_shutdown(evt);
+   spear_timer_shutdown(evt);
 
period = clk_get_rate(gpt_clk) / HZ;
period >>= CTRL_PRESCALER16;
diff --git a/drivers/clocksource/arm_arch_timer.c 
b/drivers/clocksource/arm_arch_timer.c
index a7ff77550e17..9c3420a0d19d 100644
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -687,8 +687,8 @@ static irqreturn_t arch_timer_handler_virt_mem(int irq, 
void *dev_id)
return timer_handler(ARCH_TIMER_MEM_VIRT_ACCESS, evt);
 }
 
-static __always_inline int timer_shutdown(const int access,
- struct clock_event_device *clk)
+static __always_inline int arch_timer_shutdown(const int access,
+  struct clock_event_device *clk)
 {
unsigned long ctrl;
 
@@ -701,22 +701,22 @@ static __always_inline int timer_shutdown(const int 
access,
 
 static int arch_timer_shutdown_virt(struct clock_event_device *clk)
 {
-   return timer_shutdown(ARCH_TIMER_VIRT_ACCESS, clk);
+   return arch_timer_shutdown(ARCH_TIMER_VIRT_ACCESS, clk);
 }
 
 static int arch_timer_shutdown_phys(struct clock_event_device *clk)
 {
-   return timer_shutdown(ARCH_TIMER_PHYS_ACCESS, clk);
+   return arch_timer_shutdown(ARCH_TIMER_PHYS_ACCESS, clk);
 }
 
 static int arch_timer_shutdown_virt_mem(struct clock_event_device *clk)
 {
-   return timer_shutdown(ARCH_TIMER_MEM_VIRT_ACCESS, clk);
+   return arch_timer_shutdown(ARCH_TIMER_MEM_VIRT_ACCESS, clk);
 }
 
 static int arch_timer_shutdown_phys_mem(struct clock_event_device *clk)
 {
-   return timer_shutdown(ARCH_TIMER_MEM_PHYS_ACCESS, clk);
+   return arch_timer_shutdown(ARCH_TIMER_MEM_PHYS_ACCESS, clk);
 }
 
 static __always_inline void set_next_event(const int access, unsigned long evt,
diff --git a/drivers/clocksource/timer-sp804.c 
b/drivers/clocksource/timer-sp804.c
index e6a87f4af2b5..a3c38e1343f0 100644
--- a/drivers/clocksource/timer-sp804.c
+++ b/drivers/clocksource/timer-sp804.c
@@ -155,14 +155,14 @@ static irqreturn_t sp804_timer_interrupt(int irq, void 
*dev_id)
return IRQ_HANDLED;
 }
 
-static inline void timer_shutdown(struct clock_event_device *evt)
+static inline void sp804_timer_shutdown(struct clock_event_device *evt)
 {
writel(0, common_clkevt->ctrl);
 }
 
 static int sp

[Intel-gfx] ✓ Fi.CI.BAT: success for series starting with [1/2] drm/i915/display: Do both crawl and squash when changing cdclk

2022-11-04 Thread Patchwork
== Series Details ==

Series: series starting with [1/2] drm/i915/display: Do both crawl and squash 
when changing cdclk
URL   : https://patchwork.freedesktop.org/series/110554/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_12346 -> Patchwork_110554v1


Summary
---

  **SUCCESS**

  No regressions found.

  External URL: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110554v1/index.html

Participating hosts (39 -> 28)
--

  Additional (2): fi-rkl-11600 fi-tgl-dsi 
  Missing(13): fi-bdw-samus bat-dg2-8 bat-dg2-9 bat-adlp-6 bat-adlp-4 
fi-ctg-p8600 fi-hsw-4770 bat-adln-1 bat-rplp-1 bat-rpls-1 bat-rpls-2 bat-dg2-11 
bat-jsl-1 

Known issues


  Here are the changes found in Patchwork_110554v1 that come from known issues:

### IGT changes ###

 Issues hit 

  * igt@gem_exec_gttfill@basic:
- fi-pnv-d510:[PASS][1] -> [FAIL][2] ([i915#7229])
   [1]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12346/fi-pnv-d510/igt@gem_exec_gttf...@basic.html
   [2]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110554v1/fi-pnv-d510/igt@gem_exec_gttf...@basic.html

  * igt@gem_huc_copy@huc-copy:
- fi-rkl-11600:   NOTRUN -> [SKIP][3] ([i915#2190])
   [3]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110554v1/fi-rkl-11600/igt@gem_huc_c...@huc-copy.html

  * igt@gem_lmem_swapping@parallel-random-engines:
- fi-rkl-11600:   NOTRUN -> [SKIP][4] ([i915#4613]) +3 similar issues
   [4]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110554v1/fi-rkl-11600/igt@gem_lmem_swapp...@parallel-random-engines.html

  * igt@gem_tiled_pread_basic:
- fi-rkl-11600:   NOTRUN -> [SKIP][5] ([i915#3282])
   [5]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110554v1/fi-rkl-11600/igt@gem_tiled_pread_basic.html

  * igt@i915_pm_backlight@basic-brightness:
- fi-rkl-11600:   NOTRUN -> [SKIP][6] ([i915#3012])
   [6]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110554v1/fi-rkl-11600/igt@i915_pm_backli...@basic-brightness.html

  * igt@i915_selftest@live@gt_heartbeat:
- fi-apl-guc: [PASS][7] -> [DMESG-FAIL][8] ([i915#5334])
   [7]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12346/fi-apl-guc/igt@i915_selftest@live@gt_heartbeat.html
   [8]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110554v1/fi-apl-guc/igt@i915_selftest@live@gt_heartbeat.html

  * igt@i915_suspend@basic-s3-without-i915:
- fi-rkl-11600:   NOTRUN -> [INCOMPLETE][9] ([i915#4817])
   [9]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110554v1/fi-rkl-11600/igt@i915_susp...@basic-s3-without-i915.html

  * igt@kms_chamelium@hdmi-hpd-fast:
- fi-rkl-11600:   NOTRUN -> [SKIP][10] ([fdo#111827]) +7 similar issues
   [10]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110554v1/fi-rkl-11600/igt@kms_chamel...@hdmi-hpd-fast.html

  * igt@kms_cursor_legacy@basic-busy-flip-before-cursor:
- fi-rkl-11600:   NOTRUN -> [SKIP][11] ([i915#4103])
   [11]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110554v1/fi-rkl-11600/igt@kms_cursor_leg...@basic-busy-flip-before-cursor.html

  * igt@kms_force_connector_basic@force-load-detect:
- fi-rkl-11600:   NOTRUN -> [SKIP][12] ([fdo#109285] / [i915#4098])
   [12]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110554v1/fi-rkl-11600/igt@kms_force_connector_ba...@force-load-detect.html

  * igt@kms_psr@sprite_plane_onoff:
- fi-rkl-11600:   NOTRUN -> [SKIP][13] ([i915#1072]) +3 similar issues
   [13]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110554v1/fi-rkl-11600/igt@kms_psr@sprite_plane_onoff.html

  * igt@kms_setmode@basic-clone-single-crtc:
- fi-rkl-11600:   NOTRUN -> [SKIP][14] ([i915#3555] / [i915#4098])
   [14]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110554v1/fi-rkl-11600/igt@kms_setm...@basic-clone-single-crtc.html

  * igt@prime_vgem@basic-read:
- fi-rkl-11600:   NOTRUN -> [SKIP][15] ([fdo#109295] / [i915#3291] / 
[i915#3708]) +2 similar issues
   [15]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110554v1/fi-rkl-11600/igt@prime_v...@basic-read.html

  * igt@prime_vgem@basic-userptr:
- fi-rkl-11600:   NOTRUN -> [SKIP][16] ([fdo#109295] / [i915#3301] / 
[i915#3708])
   [16]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110554v1/fi-rkl-11600/igt@prime_v...@basic-userptr.html

  
 Possible fixes 

  * igt@i915_selftest@live@hangcheck:
- {fi-ehl-2}: [INCOMPLETE][17] -> [PASS][18]
   [17]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12346/fi-ehl-2/igt@i915_selftest@l...@hangcheck.html
   [18]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110554v1/fi-ehl-2/igt@i915_selftest@l...@hangcheck.html

  
  {name}: This element is suppressed. This means it is ignored when computing
  the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#109284]: https://bugs.freedesktop.org/sho

Re: [Intel-gfx] [CI 11/15] drm/i915/huc: track delayed HuC load with a fence

2022-11-04 Thread Brian Norris
Hi,

On Wed, Oct 19, 2022 at 10:54:34AM +0100, Tvrtko Ursulin wrote:
> Don't know if this is real or not yet, hit it while running selftests a bit. 
> Something to keep an eye on.
> 
> [ 2928.370577] ODEBUG: init destroyed (active state 0) object type: 
> i915_sw_fence hint: sw_fence_dummy_notify+0x0/0x10 [i915]
> [ 2928.370903] WARNING: CPU: 2 PID: 1113 at lib/debugobjects.c:502 
> debug_print_object+0x6b/0x90
> [ 2928.370984] Modules linked in: i915(+) drm_display_helper drm_kms_helper 
> netconsole cmac algif_hash algif_skcipher af_alg bnep nls_iso8859_1 
> snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio 
> snd_intel_dspcfg snd_hda_codec snd_hwdep snd_hda_core snd_pcm 
> intel_tcc_cooling x86_pkg_temp_thermal intel_powerclamp snd_seq_midi 
> snd_seq_midi_event coretemp snd_rawmidi btusb btrtl btbcm kvm_intel btmtk 
> btintel ath10k_pci snd_seq kvm ath10k_core bluetooth snd_timer rapl 
> intel_cstate snd_seq_device input_leds mac80211 ecdh_generic libarc4 ath snd 
> ecc serio_raw intel_wmi_thunderbolt at24 soundcore cfg80211 mei_me 
> intel_xhci_usb_role_switch mei ideapad_laptop intel_pch_thermal 
> platform_profile sparse_keymap acpi_pad sch_fq_codel msr efi_pstore ip_tables 
> x_tables autofs4 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel 
> sha512_ssse3 aesni_intel prime_numbers crypto_simd atkbd drm_buddy cryptd 
> vivaldi_fmap r8169 ttm i2c_i801 i2c_smbus cec realtek xhci_pci syscopyarea 
> ahci
> [ 2928.371145]  xhci_pci_renesas sysfillrect sysimgblt libahci fb_sys_fops 
> video wmi [last unloaded: drm_kms_helper]
> [ 2928.371489] CPU: 2 PID: 1113 Comm: modprobe Tainted: G U  W  
> 6.1.0-rc1 #196
> [ 2928.371550] Hardware name: LENOVO 80MX/Lenovo E31-80, BIOS DCCN34WW(V2.03) 
> 12/01/2015
> [ 2928.371615] RIP: 0010:debug_print_object+0x6b/0x90
> [ 2928.371664] Code: 49 89 c1 8b 43 10 83 c2 01 48 c7 c7 e8 be d6 bb 8b 4b 14 
> 89 15 ca be b4 02 4c 8b 45 00 48 8b 14 c5 40 56 a8 bb e8 ec 5b 60 00 <0f> 0b 
> 83 05 28 5a 3e 01 01 48 83 c4 08 5b 5d c3 83 05 1a 5a 3e 01
> [ 2928.371782] RSP: 0018:9ed841607a18 EFLAGS: 00010286
> [ 2928.371841] RAX:  RBX: 9208116a1d48 RCX: 
> 
> [ 2928.371909] RDX: 0001 RSI: bbd277d2 RDI: 
> 
> [ 2928.372024] RBP: c176a540 R08:  R09: 
> bc07a1e0
> [ 2928.372128] R10: 0001 R11: 0001 R12: 
> 9208122da830
> [ 2928.372192] R13: 92080089b000 R14: 9208122da770 R15: 
> 
> [ 2928.372259] FS:  7f53e7617c40() GS:92086e50() 
> knlGS:
> [ 2928.372365] CS:  0010 DS:  ES:  CR0: 80050033
> [ 2928.372425] CR2: 55cd28b33070 CR3: 000110dbd006 CR4: 
> 003706e0
> [ 2928.372526] Call Trace:
> [ 2928.372568]  
> [ 2928.372614]  ? intel_guc_hang_check+0xb0/0xb0 [i915]
> [ 2928.373001]  __i915_sw_fence_init+0x2b/0x50 [i915]
> [ 2928.373374]  intel_huc_init_early+0x75/0xb0 [i915]
> [ 2928.373868]  intel_uc_init_early+0x4e/0x210 [i915]
> [ 2928.374241]  intel_gt_common_init_early+0x16f/0x180 [i915]
> [ 2928.374718]  intel_root_gt_init_early+0x49/0x60 [i915]
> [ 2928.375074]  i915_driver_probe+0x917/0xed0 [i915]
...

Did you track this down? Or consider reverting? This is tripping me up
on drm-tip now when running selftests with CONFIG_DEBUG_OBJECTS=y /
CONFIG_DRM_I915_SW_FENCE_DEBUG_OBJECTS=y. It means I can't actually run
any subsequent tests, because of the kernel taint.

Brian


[Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for series starting with [1/2] drm/i915/display: Do both crawl and squash when changing cdclk

2022-11-04 Thread Patchwork
== Series Details ==

Series: series starting with [1/2] drm/i915/display: Do both crawl and squash 
when changing cdclk
URL   : https://patchwork.freedesktop.org/series/110554/
State : warning

== Summary ==

Error: dim checkpatch failed
a2db805524d6 drm/i915/display: Do both crawl and squash when changing cdclk
-:161: ERROR:TRAILING_WHITESPACE: trailing whitespace
#161: FILE: drivers/gpu/drm/i915/display/intel_cdclk.c:1833:
+^I * this for MTL. $

total: 1 errors, 0 warnings, 0 checks, 191 lines checked
8fe91359a4db drm/i915/display: Add CDCLK Support for MTL




[Intel-gfx] [PATCH 2/2] drm/i915/display: Add CDCLK Support for MTL

2022-11-04 Thread Anusha Srivatsa
As per BSpec, MTL has a 38.4 MHz reference clock.
Add the cdclk table and the cdclk_funcs that MTL
will use.

v2: Revert to using bxt_get_cdclk()

BSpec: 65243

Cc: Clint Taylor 
Signed-off-by: Anusha Srivatsa 
Reviewed-by: Clint Taylor 
---
 drivers/gpu/drm/i915/display/intel_cdclk.c | 22 +-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/display/intel_cdclk.c 
b/drivers/gpu/drm/i915/display/intel_cdclk.c
index d1e0763513be..e7374fd92da9 100644
--- a/drivers/gpu/drm/i915/display/intel_cdclk.c
+++ b/drivers/gpu/drm/i915/display/intel_cdclk.c
@@ -1345,6 +1345,16 @@ static const struct intel_cdclk_vals dg2_cdclk_table[] = 
{
{}
 };
 
+static const struct intel_cdclk_vals mtl_cdclk_table[] = {
+   { .refclk = 38400, .cdclk = 172800, .divider = 2, .ratio = 16, 
.waveform = 0xad5a },
+   { .refclk = 38400, .cdclk = 192000, .divider = 2, .ratio = 16, 
.waveform = 0xb6b6 },
+   { .refclk = 38400, .cdclk = 307200, .divider = 2, .ratio = 16, 
.waveform = 0x },
+   { .refclk = 38400, .cdclk = 48, .divider = 2, .ratio = 25, 
.waveform = 0x },
+   { .refclk = 38400, .cdclk = 556800, .divider = 2, .ratio = 29, 
.waveform = 0x },
+   { .refclk = 38400, .cdclk = 652800, .divider = 2, .ratio = 34, 
.waveform = 0x },
+   {}
+};
+
 static int bxt_calc_cdclk(struct drm_i915_private *dev_priv, int min_cdclk)
 {
const struct intel_cdclk_vals *table = dev_priv->display.cdclk.table;
@@ -3164,6 +3174,13 @@ u32 intel_read_rawclk(struct drm_i915_private *dev_priv)
return freq;
 }
 
+static const struct intel_cdclk_funcs mtl_cdclk_funcs = {
+   .get_cdclk = bxt_get_cdclk,
+   .set_cdclk = bxt_set_cdclk,
+   .modeset_calc_cdclk = bxt_modeset_calc_cdclk,
+   .calc_voltage_level = tgl_calc_voltage_level,
+};
+
 static const struct intel_cdclk_funcs tgl_cdclk_funcs = {
.get_cdclk = bxt_get_cdclk,
.set_cdclk = bxt_set_cdclk,
@@ -3299,7 +3316,10 @@ static const struct intel_cdclk_funcs i830_cdclk_funcs = 
{
  */
 void intel_init_cdclk_hooks(struct drm_i915_private *dev_priv)
 {
-   if (IS_DG2(dev_priv)) {
+   if (IS_METEORLAKE(dev_priv)) {
+   dev_priv->display.funcs.cdclk = &mtl_cdclk_funcs;
+   dev_priv->display.cdclk.table = mtl_cdclk_table;
+   } else if (IS_DG2(dev_priv)) {
dev_priv->display.funcs.cdclk = &tgl_cdclk_funcs;
dev_priv->display.cdclk.table = dg2_cdclk_table;
} else if (IS_ALDERLAKE_P(dev_priv)) {
-- 
2.25.1



[Intel-gfx] [PATCH 1/2] drm/i915/display: Do both crawl and squash when changing cdclk

2022-11-04 Thread Anusha Srivatsa
From: Ville Syrjälä 

For MTL, changing cdclk between certain frequencies requires
both squash and crawl. Use the current cdclk config and
the new (desired) cdclk config to construct a mid cdclk config.
Set the cdclk twice:
- Current cdclk -> mid cdclk
- mid cdclk -> desired cdclk

v2: Add check in intel_modeset_calc_cdclk() to avoid cdclk
change via modeset for platforms that support squash_crawl sequences (Ville)

v3: Add checks for:
- scenario where only slow clock is used and
cdclk is actually 0 (bringing up display).
- PLLs are on before looking up the waveform.
- Squash and crawl capability checks.(Ville)

v4: Rebase
- Move checks to be more consistent (Ville)
- Add comments (Bala)
v5:
- Further small changes. Move checks around.
- Make if-else better looking (Ville)

v6: MTL should not use PUnit mailbox communication the way the rest of
the gen11+ platforms do. (Anusha)

Cc: Clint Taylor 
Cc: Balasubramani Vivekanandan 
Signed-off-by: Anusha Srivatsa 
Signed-off-by: Ville Syrjälä 
---
 drivers/gpu/drm/i915/display/intel_cdclk.c | 161 +
 1 file changed, 133 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_cdclk.c 
b/drivers/gpu/drm/i915/display/intel_cdclk.c
index eada931cb1c8..d1e0763513be 100644
--- a/drivers/gpu/drm/i915/display/intel_cdclk.c
+++ b/drivers/gpu/drm/i915/display/intel_cdclk.c
@@ -1716,37 +1716,74 @@ static void dg2_cdclk_squash_program(struct 
drm_i915_private *i915,
intel_de_write(i915, CDCLK_SQUASH_CTL, squash_ctl);
 }
 
-static void bxt_set_cdclk(struct drm_i915_private *dev_priv,
- const struct intel_cdclk_config *cdclk_config,
- enum pipe pipe)
+static int cdclk_squash_divider(u16 waveform)
+{
+   return hweight16(waveform ?: 0x);
+}
+
+static bool cdclk_crawl_and_squash(struct drm_i915_private *i915,
+  const struct intel_cdclk_config 
*old_cdclk_config,
+  const struct intel_cdclk_config 
*new_cdclk_config,
+  struct intel_cdclk_config *mid_cdclk_config)
+{
+   u16 old_waveform, new_waveform, mid_waveform;
+   int size = 16;
+   int div = 2;
+
+   /* Return if both Squash and Crawl are not present */
+   if (!HAS_CDCLK_CRAWL(i915) || !HAS_CDCLK_SQUASH(i915))
+   return false;
+
+   old_waveform = cdclk_squash_waveform(i915, old_cdclk_config->cdclk);
+   new_waveform = cdclk_squash_waveform(i915, new_cdclk_config->cdclk);
+
+   /* Return if Squash only or Crawl only is the desired action */
+   if (old_cdclk_config->vco <= 0 || new_cdclk_config->vco <= 0 ||
+   old_cdclk_config->vco == new_cdclk_config->vco ||
+   old_waveform == new_waveform)
+   return false;
+
+   *mid_cdclk_config = *new_cdclk_config;
+
+   /* Populate the mid_cdclk_config accordingly.
+* - If moving to a higher cdclk, the desired action is squashing.
+* The mid cdclk config should have the new (squash) waveform.
+* - If moving to a lower cdclk, the desired action is crawling.
+* The mid cdclk config should have the new vco.
+*/
+
+   if (cdclk_squash_divider(new_waveform) > 
cdclk_squash_divider(old_waveform)) {
+   mid_cdclk_config->vco = old_cdclk_config->vco;
+   mid_waveform = new_waveform;
+   } else {
+   mid_cdclk_config->vco = new_cdclk_config->vco;
+   mid_waveform = old_waveform;
+   }
+
+   mid_cdclk_config->cdclk = 
DIV_ROUND_CLOSEST(cdclk_squash_divider(mid_waveform) *
+   mid_cdclk_config->vco, size 
* div);
+
+   /* make sure the mid clock came out sane */
+
+   drm_WARN_ON(&i915->drm, mid_cdclk_config->cdclk <
+   min(old_cdclk_config->cdclk, new_cdclk_config->cdclk));
+   drm_WARN_ON(&i915->drm, mid_cdclk_config->cdclk >
+   i915->display.cdclk.max_cdclk_freq);
+   drm_WARN_ON(&i915->drm, cdclk_squash_waveform(i915, 
mid_cdclk_config->cdclk) !=
+   mid_waveform);
+
+   return true;
+}
+
+static void _bxt_set_cdclk(struct drm_i915_private *dev_priv,
+  const struct intel_cdclk_config *cdclk_config,
+  enum pipe pipe)
 {
int cdclk = cdclk_config->cdclk;
int vco = cdclk_config->vco;
u32 val;
u16 waveform;
int clock;
-   int ret;
-
-   /* Inform power controller of upcoming frequency change. */
-   if (DISPLAY_VER(dev_priv) >= 11)
-   ret = skl_pcode_request(&dev_priv->uncore, 
SKL_PCODE_CDCLK_CONTROL,
-   SKL_CDCLK_PREPARE_FOR_CHANGE,
-   SKL_CDCLK_READY_FOR_CHANGE,
-   SKL_CDCLK_READY_FOR_CHANGE, 3);
-   else
-   /*
-* BSpec requires us to wait up to

Re: [Intel-gfx] [mm-unstable PATCH v7 2/8] mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present pud entry

2022-11-04 Thread Ville Syrjälä
On Sat, Nov 05, 2022 at 12:59:30AM +0900, Naoya Horiguchi wrote:
> On Wed, Nov 02, 2022 at 10:51:40PM +0200, Ville Syrjälä wrote:
> > On Thu, Jul 14, 2022 at 01:24:14PM +0900, Naoya Horiguchi wrote:
> > > +/*
> > > + * pud_huge() returns 1 if @pud is hugetlb related entry, that is normal
> > > + * hugetlb entry or non-present (migration or hwpoisoned) hugetlb entry.
> > > + * Otherwise, returns 0.
> > > + */
> > >  int pud_huge(pud_t pud)
> > >  {
> > > - return !!(pud_val(pud) & _PAGE_PSE);
> > > + return !pud_none(pud) &&
> > > + (pud_val(pud) & (_PAGE_PRESENT|_PAGE_PSE)) != _PAGE_PRESENT;
> > >  }
> > 
> > Hi,
> > 
> > This causes i915 to trip a BUG_ON() on x86-32 when I start X.
> 
> Hello,
> 
> Thank you for finding and reporting the issue.
> 
> x86-32 does not enable CONFIG_ARCH_HAS_GIGANTIC_PAGE, so pud_huge() is
> supposed to be false on x86-32.  Doing like below looks to me a fix
> (reverting to the original behavior for x86-32):
> diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
> index 6b3033845c6d..bf73f25aaa32 100644
> --- a/arch/x86/mm/hugetlbpage.c
> +++ b/arch/x86/mm/hugetlbpage.c
> @@ -37,8 +37,12 @@ int pmd_huge(pmd_t pmd)
>   */
>  int pud_huge(pud_t pud)
>  {
> +#ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE
> return !pud_none(pud) &&
> (pud_val(pud) & (_PAGE_PRESENT|_PAGE_PSE)) != _PAGE_PRESENT;
> +#else
> +   return !!(pud_val(pud) & _PAGE_PSE);// or "return 0;" ?
> +#endif
>  }
> 
>  #ifdef CONFIG_HUGETLB_PAGE
> 
> 
> Let me guess what the PUD entry was there when triggering the issue.
> Assuming that the original code (before 3a194f3f8ad0) was correct, the PSE
> bit in pud_val(pud) should be always cleared.  So, when pud_huge() returns
> true since 3a194f3f8ad0, the PRESENT bit should be clear and some other
> bits (rather than PRESENT and PSE) are set so that pud_none() is false.
> I'm not sure what such a non-present PUD entry does mean.

pud_val()==0 when it blows up, and pud_none() is false because
pgtable-nopmd.h says so with 2 level paging.

And I just tested with PAE / 3 level paging,
and sure enough it no longer blows up.

So looks to me like maybe this new code just doesn't understand
how the levels get folded.

I might also be missing something obvious, but why is it even
necessary to treat PRESENT==0+PSE==0 as a huge entry?

-- 
Ville Syrjälä
Intel
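
The case described in this reply can be reproduced on paper with a small
userspace model of the two pud_huge() variants quoted above. This is a sketch
under stated assumptions, not kernel code: the bit values are the x86
_PAGE_PRESENT (bit 0) and _PAGE_PSE (bit 7) flags, and pud_is_none is passed
in as a plain parameter to model a folded level where pud_none() is reported
as false regardless of the entry's value.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_PRESENT 0x001u	/* x86 _PAGE_PRESENT, bit 0 */
#define PAGE_PSE     0x080u	/* x86 _PAGE_PSE, bit 7 */

/* Original check: huge only if the PSE bit is set. */
static bool pud_huge_old(uint64_t pud_val)
{
	return !!(pud_val & PAGE_PSE);
}

/* Check after the change under discussion: anything that is not "none"
 * and not a plain present, non-PSE entry counts as huge. */
static bool pud_huge_new(uint64_t pud_val, bool pud_is_none)
{
	return !pud_is_none &&
	       (pud_val & (PAGE_PRESENT | PAGE_PSE)) != PAGE_PRESENT;
}
```

With pud_val == 0 and pud_none() reported as false, the old check says "not
huge" while the new one says "huge", which matches the behaviour reported
for 2 level paging. When pud_none() is true for the same value, the new
check returns false.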


Re: [Intel-gfx] [RFC][PATCH v3 00/33] timers: Use timer_shutdown*() before freeing timers

2022-11-04 Thread Guenter Roeck
On Fri, Nov 04, 2022 at 04:38:34PM -0400, Steven Rostedt wrote:
> On Fri, 4 Nov 2022 15:42:09 -0400
> Steven Rostedt  wrote:
> 
[ ... ]
> 
> > drivers/clocksource/timer-fttmr010.c:   fttmr010->timer_shutdown(evt);
> > drivers/clocksource/timer-fttmr010.c:   fttmr010->timer_shutdown(evt);
> > drivers/clocksource/timer-fttmr010.c:   fttmr010->timer_shutdown(evt);
> > drivers/clocksource/timer-fttmr010.c:   fttmr010->timer_shutdown = 
> > ast2600_timer_shutdown;
> > drivers/clocksource/timer-fttmr010.c:   fttmr010->timer_shutdown = 
> > fttmr010_timer_shutdown;
> > drivers/clocksource/timer-fttmr010.c:   fttmr010->clkevt.set_state_shutdown 
> > = fttmr010->timer_shutdown;
> > drivers/clocksource/timer-fttmr010.c:   fttmr010->clkevt.tick_resume = 
> > fttmr010->timer_shutdown;
> 
> I won't touch structure fields though.
> 

Agreed, same here.

Guenter


Re: [Intel-gfx] [RFC][PATCH v3 00/33] timers: Use timer_shutdown*() before freeing timers

2022-11-04 Thread Guenter Roeck
On Fri, Nov 04, 2022 at 03:42:09PM -0400, Steven Rostedt wrote:
> On Fri, 4 Nov 2022 12:22:32 -0700
> Guenter Roeck  wrote:
> 
> > Unfortunately the renaming caused some symbol conflicts.
> > 
> > Global definition: timer_shutdown
> > 
> >   File Line
> > 0 time.c93 static inline void timer_shutdown(struct 
> > clock_event_device *evt)
> > 1 arm_arch_timer.c 690 static __always_inline int timer_shutdown(const int 
> > access,
> > 2 timer-fttmr010.c 105 int (*timer_shutdown)(struct clock_event_device 
> > *evt);
> > 3 timer-sp804.c158 static inline void timer_shutdown(struct 
> > clock_event_device *evt)
> > 4 timer.h  239 static inline int timer_shutdown(struct timer_list 
> > *timer)
> 
> $ git grep '\btimer_shutdown'
> arch/arm/mach-spear/time.c:static inline void timer_shutdown(struct 
> clock_event_device *evt)
> arch/arm/mach-spear/time.c: timer_shutdown(evt);
> arch/arm/mach-spear/time.c: timer_shutdown(evt);
> arch/arm/mach-spear/time.c: timer_shutdown(evt);
> drivers/clocksource/arm_arch_timer.c:static __always_inline int 
> timer_shutdown(const int access,
> drivers/clocksource/arm_arch_timer.c:   return 
> timer_shutdown(ARCH_TIMER_VIRT_ACCESS, clk);
> drivers/clocksource/arm_arch_timer.c:   return 
> timer_shutdown(ARCH_TIMER_PHYS_ACCESS, clk);
> drivers/clocksource/arm_arch_timer.c:   return 
> timer_shutdown(ARCH_TIMER_MEM_VIRT_ACCESS, clk);
> drivers/clocksource/arm_arch_timer.c:   return 
> timer_shutdown(ARCH_TIMER_MEM_PHYS_ACCESS, clk);
> drivers/clocksource/timer-fttmr010.c:   int (*timer_shutdown)(struct 
> clock_event_device *evt);
> drivers/clocksource/timer-fttmr010.c:   fttmr010->timer_shutdown(evt);
> drivers/clocksource/timer-fttmr010.c:   fttmr010->timer_shutdown(evt);
> drivers/clocksource/timer-fttmr010.c:   fttmr010->timer_shutdown(evt);
> drivers/clocksource/timer-fttmr010.c:   fttmr010->timer_shutdown = 
> ast2600_timer_shutdown;
> drivers/clocksource/timer-fttmr010.c:   fttmr010->timer_shutdown = 
> fttmr010_timer_shutdown;
> drivers/clocksource/timer-fttmr010.c:   fttmr010->clkevt.set_state_shutdown = 
> fttmr010->timer_shutdown;
> drivers/clocksource/timer-fttmr010.c:   fttmr010->clkevt.tick_resume = 
> fttmr010->timer_shutdown;
> drivers/clocksource/timer-sp804.c:static inline void timer_shutdown(struct 
> clock_event_device *evt)
> drivers/clocksource/timer-sp804.c:  timer_shutdown(evt);
> drivers/clocksource/timer-sp804.c:  timer_shutdown(evt);
> 
> Honestly, I think these need to be renamed, as "timer_shutdown()"
> should be specific to the timer code, and not individual timers.

Yes, that is what I did locally. I am repeating my test now with that
change made.

Guenter


Re: [Intel-gfx] [RFC][PATCH v3 00/33] timers: Use timer_shutdown*() before freeing timers

2022-11-04 Thread Steven Rostedt
On Fri, 4 Nov 2022 15:42:09 -0400
Steven Rostedt  wrote:

> $ git grep '\btimer_shutdown'
> arch/arm/mach-spear/time.c:static inline void timer_shutdown(struct 
> clock_event_device *evt)
> arch/arm/mach-spear/time.c: timer_shutdown(evt);
> arch/arm/mach-spear/time.c: timer_shutdown(evt);
> arch/arm/mach-spear/time.c: timer_shutdown(evt);
> drivers/clocksource/arm_arch_timer.c:static __always_inline int 
> timer_shutdown(const int access,
> drivers/clocksource/arm_arch_timer.c:   return 
> timer_shutdown(ARCH_TIMER_VIRT_ACCESS, clk);
> drivers/clocksource/arm_arch_timer.c:   return 
> timer_shutdown(ARCH_TIMER_PHYS_ACCESS, clk);
> drivers/clocksource/arm_arch_timer.c:   return 
> timer_shutdown(ARCH_TIMER_MEM_VIRT_ACCESS, clk);
> drivers/clocksource/arm_arch_timer.c:   return 
> timer_shutdown(ARCH_TIMER_MEM_PHYS_ACCESS, clk);
> drivers/clocksource/timer-fttmr010.c:   int (*timer_shutdown)(struct 
> clock_event_device *evt);



> drivers/clocksource/timer-fttmr010.c:   fttmr010->timer_shutdown(evt);
> drivers/clocksource/timer-fttmr010.c:   fttmr010->timer_shutdown(evt);
> drivers/clocksource/timer-fttmr010.c:   fttmr010->timer_shutdown(evt);
> drivers/clocksource/timer-fttmr010.c:   fttmr010->timer_shutdown = 
> ast2600_timer_shutdown;
> drivers/clocksource/timer-fttmr010.c:   fttmr010->timer_shutdown = 
> fttmr010_timer_shutdown;
> drivers/clocksource/timer-fttmr010.c:   fttmr010->clkevt.set_state_shutdown = 
> fttmr010->timer_shutdown;
> drivers/clocksource/timer-fttmr010.c:   fttmr010->clkevt.tick_resume = 
> fttmr010->timer_shutdown;

I won't touch structure fields though.

-- Steve


> drivers/clocksource/timer-sp804.c:static inline void timer_shutdown(struct 
> clock_event_device *evt)
> drivers/clocksource/timer-sp804.c:  timer_shutdown(evt);
> drivers/clocksource/timer-sp804.c:  timer_shutdown(evt);


Re: [Intel-gfx] [PATCH 07/10] vfio-iommufd: Support iommufd for physical VFIO devices

2022-11-04 Thread Jason Gunthorpe
On Tue, Nov 01, 2022 at 08:21:20AM +, Tian, Kevin wrote:
> > From: Jason Gunthorpe 
> > Sent: Wednesday, October 26, 2022 2:51 AM
> > 
> > +int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx)
> > +{
> > +   u32 ioas_id;
> > +   u32 device_id;
> > +   int ret;
> > +
> > +   lockdep_assert_held(&vdev->dev_set->lock);
> > +
> > +   /*
> > +* If the driver doesn't provide this op then it means the device does
> > +* not do DMA at all. So nothing to do.
> > +*/
> > +   if (!vdev->ops->bind_iommufd)
> > +   return 0;
> 
> Nothing to do or return -EOPNOTSUPP?

As in the other email, nothing to do, driver is "bound" but doesn't
actually need iommufd at all.
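
The "nothing to do" convention can be sketched in plain C. This is only an illustration: the `toy_*` names are hypothetical and do not come from the vfio code. It assumes that a missing optional callback means the device needs no binding, so the wrapper reports success instead of an error.

```c
#include <assert.h>
#include <stddef.h>

struct toy_device;

/* Hypothetical ops table; the bind callback is optional. */
struct toy_ops {
	int (*bind)(struct toy_device *dev);	/* may be NULL */
};

struct toy_device {
	const struct toy_ops *ops;
	int bound;
};

/* Example callback for a device that really needs binding. */
static int toy_real_bind(struct toy_device *dev)
{
	dev->bound = 1;
	return 0;
}

/* Wrapper: a missing callback is treated as success, not as an error. */
static int toy_bind(struct toy_device *dev)
{
	assert(dev != NULL && dev->ops != NULL);

	if (!dev->ops->bind)
		return 0;	/* nothing to do for this device */

	return dev->ops->bind(dev);
}
```

With this shape, callers do not need to special-case devices that have no callback; returning an error code instead would force every caller to tell "unsupported" apart from a real failure.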

> > +   ret = vdev->ops->bind_iommufd(vdev, ictx, &device_id);
> > +   if (ret)
> > +   return ret;
> > +
> > +   if (vdev->ops->attach_ioas) {
> 
> __vfio_register_dev() already verifies that all three callbacks must
> co-exist. Then no need to check it again here and later.

Ok

> > +void vfio_iommufd_unbind(struct vfio_device *vdev)
> > +{
> > +   lockdep_assert_held(&vdev->dev_set->lock);
> > +
> > +   if (!vdev->iommufd_device)
> > +   return;
> 
> there is no iommufd_device in the emulated path...

Yes, this if should just be deleted

Thanks,
Jason


Re: [Intel-gfx] [RFC][PATCH v3 00/33] timers: Use timer_shutdown*() before freeing timers

2022-11-04 Thread Linus Torvalds
On Fri, Nov 4, 2022 at 12:42 PM Steven Rostedt  wrote:
>
> Linus, should I also add any patches that have already been acked by the
> respective maintainer?

No, I'd prefer to keep only the ones that are 100% unambiguously not
changing any semantics.

  Linus


Re: [Intel-gfx] KUnit issues - Was: [igt-dev] [PATCH RFC v2 8/8] drm/i915: check if current->mm is not NULL

2022-11-04 Thread Daniel Latypov
On Thu, Nov 3, 2022 at 8:23 AM Mauro Carvalho Chehab
 wrote:
>
> Hi,
>
> I'm facing a couple of issues when testing KUnit with the i915 driver.
>
> The DRM subsystem and the i915 driver have, for a long time, had their own
> way to do unit tests, which seems to have been added before KUnit.
>
> I'm now checking whether it is worth starting to use KUnit in i915. So, I wrote
> an RFC with some patches adding support for the tests we have to be
> reported using Kernel TAP and KUnit.
>
> There are basically 3 groups of tests there:
>
> - mock tests - check i915 hardware-independent logic;
> - live tests - run some hardware-specific tests;
> - perf tests - check perf support - also hardware-dependent.
>
> As they depend on i915 driver, they run only on x86, with PCI
> stack enabled, but the mock tests run nicely via qemu.
>
> The live and perf tests require real hardware. As we run them
> together with our CI, which, among other things, tests module
> unload/reload and tests loading the i915 driver with different
> modprobe parameters, the KUnit tests should be able to run as
> a module.
>
> While testing KUnit, I noticed a couple of issues:
>
> 1. kunit.py parser is currently broken when used with modules
>
> the parser expects "TAP version xx" output, but this won't
> happen when loading the kunit test driver.
>
> Are there any plans or patches fixing this issue?

Partially.
Note: we need a header to look for so we can strip prefixes (like timestamps).

But there is a patch in the works to add a TAP header for each
subtest, hopefully in time for 6.2.
This is to match the KTAP spec:
https://kernel.org/doc/html/latest/dev-tools/ktap.html

That should fix it so you can parse one suite's results at a time.
I'm pretty sure it won't fix the case where there's multiple suites
and/or you're trying to parse all test results at once via

$ find /sys/kernel/debug/kunit/ -type f | xargs cat |
./tools/testing/kunit/kunit.py parse

I think that in-kernel code change + some more python changes could
make the above command work, but no one has actively started looking
at that just yet.
Hopefully we can pick this up and also get it done for 6.2 (unless I'm
underestimating how complicated this is).

>
> 2. current->mm is not initialized
>
> Some tests do mmap(). They need the mm user context to be initialized,
> but this is not happening right now.
>
> Is there a way to properly initialize it for KUnit?

Right, this is a consequence of how early built-in KUnit tests are run
after boot.
I think for now, the answer is to make the test module-only.

I know David had some ideas here, but I can't speak to them.

>
> 3. there are no test filters for modules
>
> For proper CI automation, we need to be able to control which
> tests run. That's especially useful at development time, when
> some tests may not apply or may not run properly on new hardware.
>
> Are there any plans to add support for it at kunit_test_suites()
> when the driver is built as module? Ideally, the best would be to
> export a per-module filter_glob parameter on such cases.

I think this is a good idea and is doable. (I think I said as much on
the other thread).

The thinking before was that people would group tests together in modules.
But if you want to share a single module for many tests, this becomes
more useful.

This has some potential merge conflicts w/ other pending work.
I was also prototyping the ability to tell KUnit "run tests #2 - #5",
so that also touches the filtering code very heavily.
(The goal there is to have kunit.py able to shard up tests and boot
multiple kernels concurrently.)

>
> 4. there are actually 3 levels of tests on i915:
> - Level 1: mock, live, perf
> - Level 2: test group (mmap, fences, ...)
> - Level 3: unit tests
>
> Currently, KUnit seems to have just two levels (test suite and tests).
> Is there a way to add test groups there?

Parameterized tests are the closest we have to a third level of tests.
But other than that, the answer is no.

I'd need to get more familiar with the existing tests, but I'm pretty
sure parameters won't work for you.

And I don't know if this will get done.

Note: the kunit_parser.py code should be able to handle arbitrary
levels of tests in the output.
This restriction is purely in the in-kernel code.

I had brought up the idea of more layers of tests before.
It would also be useful for
a) sharing expensive setup between multiple tests
b) allowing more granular scope for cleanups (kunit_kmalloc and others)
c) more flexibility in dynamically generating subtests than
parameterized testing

There's some precedent in other unit testing frameworks, for example:
https://pkg.go.dev/testing#T.Run

Daniel


Re: [Intel-gfx] [PATCH v5] overflow: Introduce overflows_type() and castable_to_type()

2022-11-04 Thread Rasmus Villemoes
On 24/10/2022 22.11, Gwan-gyeong Mun wrote:
> From: Kees Cook 
> 
> Implement a robust overflows_type() macro to test if a variable or
> constant value would overflow another variable or type. This can be
> used as a constant expression for static_assert() (which requires a
> constant expression[1][2]) when used on constant values. This must be
> constructed manually, since __builtin_add_overflow() does not produce
> a constant expression[3].
> 
> Additionally adds castable_to_type(), similar to __same_type(), but for
> checking if a constant value would overflow if cast to a given type.
> 

> +#define __overflows_type_constexpr(x, T) (   \
> + is_unsigned_type(typeof(x)) ?   \
> + (x) > type_max(typeof(T)) ? 1 : 0   \
> + : is_unsigned_type(typeof(T)) ? \
> + (x) < 0 || (x) > type_max(typeof(T)) ? 1 : 0\
> + : (x) < type_min(typeof(T)) ||  \
> +   (x) > type_max(typeof(T)) ? 1 : 0)
> +

Can't all these instances of "foo ? 1 : 0" be simplified to "foo"? That
would improve the readability of this thing somewhat IMO.
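
To illustrate the point: in C, relational and logical operators already evaluate to an int that is exactly 0 or 1, so wrapping them in "? 1 : 0" does not change the result. A minimal sketch, using hypothetical helper names and a fixed target type (signed char) rather than the patch's generic macros:

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-ins for the patch's macro, fixed to 'signed char'. */

/* Form used in the patch: the comparison is wrapped in "? 1 : 0". */
static int overflows_schar_ternary(long x)
{
	return (x < SCHAR_MIN || x > SCHAR_MAX) ? 1 : 0;
}

/* Suggested simplification: the comparison already yields 0 or 1. */
static int overflows_schar_plain(long x)
{
	return x < SCHAR_MIN || x > SCHAR_MAX;
}

/* Both forms must agree over a range that covers both boundaries. */
static void check_forms_agree(void)
{
	long x;

	for (x = -1000; x <= 1000; x++)
		assert(overflows_schar_ternary(x) == overflows_schar_plain(x));
}
```

The same holds inside a constant expression, so dropping the ternaries should not affect use with static_assert().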

Rasmus



Re: [Intel-gfx] KUnit issues - Was: [igt-dev] [PATCH RFC v2 8/8] drm/i915: check if current->mm is not NULL

2022-11-04 Thread Daniel Latypov
On Fri, Nov 4, 2022 at 12:50 AM Mauro Carvalho Chehab
 wrote:
>
> On Thu, 3 Nov 2022 15:43:26 -0700
> Daniel Latypov  wrote:
>
> > On Thu, Nov 3, 2022 at 8:23 AM Mauro Carvalho Chehab
> >  wrote:
> > >
> > > Hi,
> > >
> > > I'm facing a couple of issues when testing KUnit with the i915 driver.
> > >
> > > The DRM subsystem and the i915 driver have, for a long time, had their own
> > > way to do unit tests, which seems to have been added before KUnit.
> > >
> > > I'm now checking whether it is worth starting to use KUnit in i915. So, I wrote
> > > an RFC with some patches adding support for the tests we have to be
> > > reported using Kernel TAP and KUnit.
> > >
> > > There are basically 3 groups of tests there:
> > >
> > > - mock tests - check i915 hardware-independent logic;
> > > - live tests - run some hardware-specific tests;
> > > - perf tests - check perf support - also hardware-dependent.
> > >
> > > As they depend on i915 driver, they run only on x86, with PCI
> > > stack enabled, but the mock tests run nicely via qemu.
> > >
> > > The live and perf tests require a real hardware. As we run them
> > > together with our CI, which, among other things, test module
> > > unload/reload and test loading i915 driver with different
> > > modprobe parameters, the KUnit tests should be able to run as
> > > a module.
> > >
> > > While testing KUnit, I noticed a couple of issues:
> > >
> > > 1. kunit.py parser is currently broken when used with modules
> > >
> > > the parser expects "TAP version xx" output, but this won't
> > > happen when loading the kunit test driver.
> > >
> > > Are there any plans or patches fixing this issue?
> >
> > Partially.
> > Note: we need a header to look for so we can strip prefixes (like 
> > timestamps).
> >
> > But there is a patch in the works to add a TAP header for each
> > subtest, hopefully in time for 6.2.
>
> Good to know.
>
> > This is to match the KTAP spec:
> > https://kernel.org/doc/html/latest/dev-tools/ktap.html
>
> I see.
>
> > That should fix it so you can parse one suite's results at a time.
> > I'm pretty sure it won't fix the case where there's multiple suites
> > and/or you're trying to parse all test results at once via
> >
> > $ find /sys/kernel/debug/kunit/ -type f | xargs cat |
> > ./tools/testing/kunit/kunit.py parse
>
> Could you point me to the changeset? perhaps I can write a followup
> patch addressing this case.

rm...@google.com was working on them and should hopefully be able to
send them out real soon.
You should get CC'd on those.

I think the follow-up work is just crafting an example parser input
file and iterating until
  $ ./tools/testing/kunit/kunit.py parse < /tmp/example_input
produces our desired results.

>
> > I think that in-kernel code change + some more python changes could
> > make the above command work, but no one has actively started looking
> > at that just yet.
> > Hopefully we can pick this up and also get it done for 6.2 (unless I'm
> > underestimating how complicated this is).
> >
> > >
> > > 2. current->mm is not initialized
> > >
> > > Some tests do mmap(). They need the mm user context to be initialized,
> > > but this is not happening right now.
> > >
> > > Is there a way to properly initialize it for KUnit?
> >
> > Right, this is a consequence of how early built-in KUnit tests are run
> > after boot.
> > I think for now, the answer is to make the test module-only.
> >
> > I know David had some ideas here, but I can't speak to them.
>
> This is happening when test-i915 is built as module as well.

Oh, I didn't expect that at all.

>
> I suspect that the function which initializes it is mm_alloc() inside
> kernel/fork.c:
>
> struct mm_struct *mm_alloc(void)
> {
> struct mm_struct *mm;
>
> mm = allocate_mm();
> if (!mm)
> return NULL;
>
> memset(mm, 0, sizeof(*mm));
> return mm_init(mm, current, current_user_ns());
> }
>
> As modprobing a test won't fork until all tests run, this never runs.
>
> It seems that the normal usage is at fs/exec.c:
>
> fs/exec.c:  bprm->mm = mm = mm_alloc();
>
> but other places also call it:
>
> arch/arm/mach-rpc/ecard.c:  struct mm_struct * mm = mm_alloc();
> drivers/dma-buf/dma-resv.c: struct mm_struct *mm = mm_alloc();
> include/linux/sched/mm.h:extern struct mm_struct *mm_alloc(void);
> mm/debug_vm_pgtable.c:  args->mm = mm_alloc();
>
> Probably the solution would be to call it inside kunit executor code,
> adding support for modules to use it.

I know basically nothing about the mm code.
I think I vaguely recall there being issues with this on UML or
something, but I could be totally wrong.

I'll wait for David to chime in when he can.

>
> > > 3. there's no test filters for modules
> > >
> > > In order to be able to do proper CI automation, it is needed to
> > > be able to control what tests will run or not. That's specially
> > > int

Re: [Intel-gfx] [PATCH v7 0/9] dyndbg: drm.debug adaptation

2022-11-04 Thread Jason Baron




On 10/31/22 6:11 PM, jim.cro...@gmail.com wrote:

On Mon, Oct 31, 2022 at 7:07 AM Ville Syrjälä
 wrote:

On Sun, Oct 30, 2022 at 08:42:52AM -0600, jim.cro...@gmail.com wrote:

On Thu, Oct 27, 2022 at 2:10 PM Ville Syrjälä
 wrote:

On Thu, Oct 27, 2022 at 01:55:39PM -0600, jim.cro...@gmail.com wrote:

On Thu, Oct 27, 2022 at 9:59 AM Ville Syrjälä
 wrote:

On Thu, Oct 27, 2022 at 09:37:52AM -0600, jim.cro...@gmail.com wrote:

On Thu, Oct 27, 2022 at 9:08 AM Jason Baron  wrote:



On 10/21/22 05:18, Jani Nikula wrote:

On Thu, 20 Oct 2022, Ville Syrjälä  wrote:

On Sat, Sep 24, 2022 at 03:02:34PM +0200, Greg KH wrote:

On Sun, Sep 11, 2022 at 11:28:43PM -0600, Jim Cromie wrote:

hi Greg, Dan, Jason, DRM-folk,

heres follow-up to V6:
   rebased on driver-core/driver-core-next for -v6 applied bits (thanks)
   rework drm_debug_enabled{_raw,_instrumented,} per Dan.

It excludes:
   nouveau parts (immature)
   tracefs parts (I missed --to=Steve on v6)
   split _ddebug_site and de-duplicate experiment (way unready)

IOW, its the remaining commits of V6 on which Dan gave his Reviewed-by.

If these are good to apply, I'll rebase and repost the rest separately.

All now queued up, thanks.

This stuff broke i915 debugs. When I first load i915 no debug prints are
produced. If I then go fiddle around in /sys/module/drm/parameters/debug
the debug prints start to suddenly work.

Wait what? I always assumed the default behaviour would stay the same,
which is usually how we roll. It's a regression in my books. We've got a
CI farm that's not very helpful in terms of dmesg logging right now
because of this.

BR,
Jani.



That doesn't sound good - so you are saying that prior to this change some
of the drm debugs were default enabled. But now you have to manually enable
them?

Thanks,

-Jason


Im just seeing this now.
Any new details ?

No. We just disabled it as BROKEN for now. I was just today thinking
about sending that patch out if no solution is forthcoming soon since
we need this working before 6.1 is released.

Pretty sure you should see the problem immediately with any driver
(at least if it's built as a module, didn't try builtin). Or at least
can't think what would make i915 any more special.


So, I should note -
99% of my time & energy on this dyndbg + drm patchset
has been done using virtme,
so my world-view (and dev-hack-test env) has been smaller, simpler
maybe its been fatally simplistic.

ive just rebuilt v6.0  (before the trouble)
and run it thru my virtual home box,
I didnt see any unfamiliar drm-debug output
that I might have inadvertently altered somehow

I have some real HW I can put a reference kernel on,
to look for the missing output, but its all gonna take some time,
esp to tighten up my dev-test-env

in the meantime, there is:

config DRM_USE_DYNAMIC_DEBUG
bool "use dynamic debug to implement drm.debug"
default y
depends on DRM
depends on DYNAMIC_DEBUG || DYNAMIC_DEBUG_CORE
depends on JUMP_LABEL
help
   Use dynamic-debug to avoid drm_debug_enabled() runtime overheads.
   Due to callsite counts in DRM drivers (~4k in amdgpu) and 56
   bytes per callsite, the .data costs can be substantial, and
   are therefore configurable.

Does changing the default fix things for i915 dmesg ?

I think we want to mark it BROKEN in addition to make sure no one

Ok, I get the distinction now.
youre spelling that
   depends on BROKEN

I have a notional explanation, and a conflating commit:

can you eliminate
git log -p ccc2b496324c13e917ef05f563626f4e7826bef1

as the cause ?

Reverting that doesn't help.


thanks for eliminating it.


I do need to clarify, I dont know exactly what debug/logging output
is missing such that CI is failing

CI isn't failing. But any logs it produces are 100% useless,
as are any user reported logs.

The debugs that are missing are anything not coming directly
from drm.ko.

The stuff that I see being printed by i915.ko are drm_info()
and the drm_printer stuff from i915_welcome_messages(). That
also implies that drm_debug_enabled(DRM_UT_DRIVER) does at
least still work correctly.

I suspect that the problem is just that the debug calls
aren't getting patched in when a module loads. And fiddling
with the modparam after the fact does trigger that somehow.


ok, heres the 'tape' of a virtme boot,
then modprobe going wrong.

[1.785873] dyndbg:   2 debug prints in module intel_rapl_msr
[2.040598] virtme-init: udev is done
virtme-init: console is ttyS0


load drm driver

bash-5.2# modprobe i915


drm module is loaded 1st

[6.549451] dyndbg: add-module: drm.302 sites
[6.549991] dyndbg: class[0]: module:drm base:0 len:10 ty:0
[6.550647] dyndbg:  0: 0 DRM_UT_CORE
[6.551097] dyndbg:  1: 1 DRM_UT_DRIVER
[6.551531] dyndbg:  2: 2 DRM_UT_KMS
[6.551931] dyndbg:  3: 3 DRM_UT_PRIME
[6.552402] dyndbg:  4: 4 DRM_UT_ATOMIC
[6.552799] dyndbg:  5: 5 DRM_UT_VBL
[6.553270] dyndbg:  6: 6 DRM_UT_STATE
[6.553634] dyndbg:  7: 7 DRM_UT_LEASE
[6.554043] dyndbg:  8: 8 

Re: [Intel-gfx] [mm-unstable PATCH v7 2/8] mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present pud entry

2022-11-04 Thread Naoya Horiguchi
On Wed, Nov 02, 2022 at 10:51:40PM +0200, Ville Syrjälä wrote:
> On Thu, Jul 14, 2022 at 01:24:14PM +0900, Naoya Horiguchi wrote:
> > +/*
> > + * pud_huge() returns 1 if @pud is hugetlb related entry, that is normal
> > + * hugetlb entry or non-present (migration or hwpoisoned) hugetlb entry.
> > + * Otherwise, returns 0.
> > + */
> >  int pud_huge(pud_t pud)
> >  {
> > -   return !!(pud_val(pud) & _PAGE_PSE);
> > +   return !pud_none(pud) &&
> > +   (pud_val(pud) & (_PAGE_PRESENT|_PAGE_PSE)) != _PAGE_PRESENT;
> >  }
> 
> Hi,
> 
> This causes i915 to trip a BUG_ON() on x86-32 when I start X.

Hello,

Thank you for finding and reporting the issue.

x86-32 does not enable CONFIG_ARCH_HAS_GIGANTIC_PAGE, so pud_huge() is
supposed to be false on x86-32.  Something like the change below looks like
a fix to me (reverting to the original behavior for x86-32):


diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
index 6b3033845c6d..bf73f25aaa32 100644
--- a/arch/x86/mm/hugetlbpage.c
+++ b/arch/x86/mm/hugetlbpage.c
@@ -37,8 +37,12 @@ int pmd_huge(pmd_t pmd)
  */
 int pud_huge(pud_t pud)
 {
+#ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE
return !pud_none(pud) &&
(pud_val(pud) & (_PAGE_PRESENT|_PAGE_PSE)) != _PAGE_PRESENT;
+#else
+   return !!(pud_val(pud) & _PAGE_PSE);// or "return 0;" ?
+#endif
 }

 #ifdef CONFIG_HUGETLB_PAGE


Let me guess what the PUD entry looked like when the issue was triggered.
Assuming that the original code (before 3a194f3f8ad0) was correct, the PSE
bit in pud_val(pud) should always be clear.  So, when pud_huge() returns
true since 3a194f3f8ad0, the PRESENT bit must be clear and some other
bits (other than PRESENT and PSE) must be set so that pud_none() is false.
I'm not sure what such a non-present PUD entry means.
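
This reasoning can be checked with a small user-space model. The bit values below are hypothetical placeholders rather than the real x86 page-table definitions, and toy_pud_none() simply treats an all-zero value as empty, which is an assumption; the sketch only compares the two predicates.

```c
#include <assert.h>

/* Hypothetical bit values, for illustration only. */
#define TOY_PRESENT	0x001UL
#define TOY_PSE		0x080UL
#define TOY_OTHER	0x200UL	/* any bit other than PRESENT and PSE */

/* Assumption: an entry is "none" only when no bits are set. */
static int toy_pud_none(unsigned long val)
{
	return val == 0;
}

/* Original check: huge only if PSE is set. */
static int toy_pud_huge_old(unsigned long val)
{
	return !!(val & TOY_PSE);
}

/* Check since 3a194f3f8ad0: non-empty and not "PRESENT without PSE". */
static int toy_pud_huge_new(unsigned long val)
{
	return !toy_pud_none(val) &&
	       (val & (TOY_PRESENT | TOY_PSE)) != TOY_PRESENT;
}

/* The predicates differ only for non-empty entries with neither bit set. */
static void toy_check(void)
{
	assert(toy_pud_huge_old(TOY_PRESENT) == 0);
	assert(toy_pud_huge_new(TOY_PRESENT) == 0);
	assert(toy_pud_huge_old(TOY_OTHER) == 0);
	assert(toy_pud_huge_new(TOY_OTHER) == 1);
}
```

In this model, the only input on which the old and new checks disagree is a non-empty entry with both PRESENT and PSE clear, which matches the guess above.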

Thanks,
Naoya Horiguchi

> 
> [  225.777375] kernel BUG at mm/memory.c:2664!
> [  225.777391] invalid opcode:  [#1] PREEMPT SMP
> [  225.777405] CPU: 0 PID: 2402 Comm: Xorg Not tainted 6.1.0-rc3-bdg+ #86
> [  225.777415] Hardware name:  /8I865G775-G, BIOS F1 08/29/2006
> [  225.777421] EIP: __apply_to_page_range+0x24d/0x31c
> [  225.777437] Code: ff ff 8b 55 e8 8b 45 cc e8 0a 11 ec ff 89 d8 83 c4 28 5b 
> 5e 5f 5d c3 81 7d e0 a0 ef 96 c1 74 ad 8b 45 d0 e8 2d 83 49 00 eb a3 <0f> 0b 
> 25 00 f0 ff ff 81 eb 00 00 00 40 01 c3 8b 45 ec 8b 00 e8 76
> [  225.777446] EAX: 0001 EBX: c53a3b58 ECX: b5c0 EDX: c258aa00
> [  225.777454] ESI: b5c0 EDI: b590 EBP: c4b0fdb4 ESP: c4b0fd80
> [  225.777462] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010202
> [  225.777470] CR0: 80050033 CR2: b590 CR3: 053a3000 CR4: 06d0
> [  225.777479] Call Trace:
> [  225.777486]  ? i915_memcpy_init_early+0x63/0x63 [i915]
> [  225.777684]  apply_to_page_range+0x21/0x27
> [  225.777694]  ? i915_memcpy_init_early+0x63/0x63 [i915]
> [  225.777870]  remap_io_mapping+0x49/0x75 [i915]
> [  225.778046]  ? i915_memcpy_init_early+0x63/0x63 [i915]
> [  225.778220]  ? mutex_unlock+0xb/0xd
> [  225.778231]  ? i915_vma_pin_fence+0x6d/0xf7 [i915]
> [  225.778420]  vm_fault_gtt+0x2a9/0x8f1 [i915]
> [  225.778644]  ? lock_is_held_type+0x56/0xe7
> [  225.778655]  ? lock_is_held_type+0x7a/0xe7
> [  225.778663]  ? 0xc100
> [  225.778670]  __do_fault+0x21/0x6a
> [  225.778679]  handle_mm_fault+0x708/0xb21
> [  225.778686]  ? mt_find+0x21e/0x5ae
> [  225.778696]  exc_page_fault+0x185/0x705
> [  225.778704]  ? doublefault_shim+0x127/0x127
> [  225.778715]  handle_exception+0x130/0x130
> [  225.778723] EIP: 0xb700468a
> [  225.778730] Code: 44 24 40 8b 7c 24 1c 89 47 54 8b 44 24 5c 65 2b 05 14 00 
> 00 00 0f 85 8a 01 00 00 83 c4 6c 5b 5e 5f 5d c3 8b 44 24 1c 8b 40 28  00 
> 00 00 00 00 8b 44 24 20 8d 90 20 1b 00 00 8b 02 83 e8 01 89
> [  225.778738] EAX: b590 EBX: b7148000 ECX:  EDX: 
> [  225.778745] ESI: 0103eb60 EDI: b7148000 EBP: b6cf7000 ESP: bfd76650
> [  225.778752] DS: 007b ES: 007b FS:  GS: 0033 SS: 007b EFLAGS: 00010246
> [  225.778761]  ? doublefault_shim+0x127/0x127
> [  225.778769] Modules linked in: i915 prime_numbers i2c_algo_bit iosf_mbi 
> drm_buddy video wmi drm_display_helper drm_kms_helper syscopyarea sysfillrect 
> sysimgblt fb_sys_fops ttm drm drm_panel_orientation_quirks backlight cfg80211 
> rfkill sch_fq_codel xt_tcpudp xt_multiport xt_state iptable_filter 
> iptable_nat nf_nat nf_conntrack nf_defrag_ipv4 ip_tables x_tables binfmt_misc 
> i2c_dev iTCO_wdt snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_timer 
> psmouse i2c_i801 snd i2c_smbus uhci_hcd i2c_core pcspkr soundcore lpc_ich 
> mfd_core ehci_pci ehci_hcd skge intel_agp intel_gtt usbcore agpgart 
> usb_common rng_core parport_pc parport evdev
> [  225.778899] ---[ end trace  ]---
> [  225.778906] EIP: __apply_to_page_range+0x24d/0x31c
> [  225.778916] Code: ff ff 8b 55 e8 8b 45 cc e8 0a 11 ec ff 89 d8 83 c4 28 5b 
> 5e 5f 5d c3 81 7d e0 a0 ef 96 c1 74 ad 8b 45 d0 e8 2d 83 49 00 eb a3 <0f> 0b 
> 25 00 f0 ff ff 81 eb 00 00 00 40 01 c3 8b 45 ec 8b 

Re: [Intel-gfx] [PATCH v5 1/3] drm: Use XArray instead of IDR for minors

2022-11-04 Thread Oded Gabbay
On Mon, Sep 12, 2022 at 12:17 AM Michał Winiarski
 wrote:
>
> IDR is deprecated, and since XArray manages its own state with internal
> locking, it simplifies the locking on DRM side.
> Additionally, don't use the IRQ-safe variant, since operating on drm
> minor is not done in IRQ context.
>
> Signed-off-by: Michał Winiarski 
> Suggested-by: Matthew Wilcox 
> ---
>  drivers/gpu/drm/drm_drv.c | 51 ++-
>  1 file changed, 18 insertions(+), 33 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
> index 8214a0b1ab7f..61d24cdcd0f8 100644
> --- a/drivers/gpu/drm/drm_drv.c
> +++ b/drivers/gpu/drm/drm_drv.c
> @@ -34,6 +34,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  #include 
>  #include 
> @@ -53,8 +54,7 @@ MODULE_AUTHOR("Gareth Hughes, Leif Delgass, José Fonseca, 
> Jon Smirl");
>  MODULE_DESCRIPTION("DRM shared core routines");
>  MODULE_LICENSE("GPL and additional rights");
>
> -static DEFINE_SPINLOCK(drm_minor_lock);
> -static struct idr drm_minors_idr;
> +static DEFINE_XARRAY_ALLOC(drm_minors_xa);
>
>  /*
>   * If the drm core fails to init for whatever reason,
> @@ -98,21 +98,19 @@ static struct drm_minor **drm_minor_get_slot(struct 
> drm_device *dev,
>  static void drm_minor_alloc_release(struct drm_device *dev, void *data)
>  {
> struct drm_minor *minor = data;
> -   unsigned long flags;
>
> WARN_ON(dev != minor->dev);
>
> put_device(minor->kdev);
>
> -   spin_lock_irqsave(&drm_minor_lock, flags);
> -   idr_remove(&drm_minors_idr, minor->index);
> -   spin_unlock_irqrestore(&drm_minor_lock, flags);
> +   xa_erase(&drm_minors_xa, minor->index);
>  }
>
> +#define DRM_MINOR_LIMIT(t) ({ typeof(t) _t = (t); XA_LIMIT(64 * _t, 64 * _t 
> + 63); })
> +
>  static int drm_minor_alloc(struct drm_device *dev, unsigned int type)
>  {
> struct drm_minor *minor;
> -   unsigned long flags;
> int r;
>
> minor = drmm_kzalloc(dev, sizeof(*minor), GFP_KERNEL);
> @@ -122,21 +120,10 @@ static int drm_minor_alloc(struct drm_device *dev, 
> unsigned int type)
> minor->type = type;
> minor->dev = dev;
>
> -   idr_preload(GFP_KERNEL);
> -   spin_lock_irqsave(&drm_minor_lock, flags);
> -   r = idr_alloc(&drm_minors_idr,
> - NULL,
> - 64 * type,
> - 64 * (type + 1),
> - GFP_NOWAIT);
> -   spin_unlock_irqrestore(&drm_minor_lock, flags);
> -   idr_preload_end();
> -
> +   r = xa_alloc(&drm_minors_xa, &minor->index, NULL, 
> DRM_MINOR_LIMIT(type), GFP_KERNEL);
> if (r < 0)
> return r;
>
> -   minor->index = r;
> -
> r = drmm_add_action_or_reset(dev, drm_minor_alloc_release, minor);
> if (r)
> return r;
> @@ -152,7 +139,7 @@ static int drm_minor_alloc(struct drm_device *dev, 
> unsigned int type)
>  static int drm_minor_register(struct drm_device *dev, unsigned int type)
>  {
> struct drm_minor *minor;
> -   unsigned long flags;
> +   void *entry;
> int ret;
>
> DRM_DEBUG("\n");
> @@ -172,9 +159,12 @@ static int drm_minor_register(struct drm_device *dev, 
> unsigned int type)
> goto err_debugfs;
>
> /* replace NULL with @minor so lookups will succeed from now on */
> -   spin_lock_irqsave(&drm_minor_lock, flags);
> -   idr_replace(&drm_minors_idr, minor, minor->index);
> -   spin_unlock_irqrestore(&drm_minor_lock, flags);
> +   entry = xa_cmpxchg(&drm_minors_xa, minor->index, NULL, &minor, 
> GFP_KERNEL);
I believe we should pass in "minor", without the &, as &minor will
give you the address of the local pointer.
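
A minimal user-space sketch of the difference; the toy slot and function names below are hypothetical and only stand in for an XArray entry. Storing "&minor" records the address of the local pointer variable, so a later lookup would not return the object itself.

```c
#include <assert.h>
#include <stddef.h>

struct toy_minor {
	int index;
};

/* Hypothetical one-entry table standing in for an XArray slot. */
static void *slot;

/* Buggy form: records the address of the local pointer variable. */
static int register_address_of_local(struct toy_minor *minor)
{
	int found;

	assert(minor != NULL);
	slot = &minor;				/* address of this function's parameter */
	found = (slot == (void *)minor);	/* would a lookup return the object? */
	slot = NULL;				/* never leave a stack address behind */
	return found;
}

/* Intended form: records the pointer value, i.e. the object's address. */
static int register_pointer_value(struct toy_minor *minor)
{
	assert(minor != NULL);
	slot = minor;
	return slot == (void *)minor;
}
```

In the buggy form the stored value also becomes a dangling pointer as soon as the function returns, which is why the sketch clears the slot before leaving.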

Oded

> +   if (xa_is_err(entry)) {
> +   ret = xa_err(entry);
> +   goto err_debugfs;
> +   }
> +   WARN_ON(entry);
>
> DRM_DEBUG("new minor registered %d\n", minor->index);
> return 0;
> @@ -187,16 +177,13 @@ static int drm_minor_register(struct drm_device *dev, 
> unsigned int type)
>  static void drm_minor_unregister(struct drm_device *dev, unsigned int type)
>  {
> struct drm_minor *minor;
> -   unsigned long flags;
>
> minor = *drm_minor_get_slot(dev, type);
> if (!minor || !device_is_registered(minor->kdev))
> return;
>
> /* replace @minor with NULL so lookups will fail from now on */
> -   spin_lock_irqsave(&drm_minor_lock, flags);
> -   idr_replace(&drm_minors_idr, NULL, minor->index);
> -   spin_unlock_irqrestore(&drm_minor_lock, flags);
> +   xa_store(&drm_minors_xa, minor->index, NULL, GFP_KERNEL);
>
> device_del(minor->kdev);
> dev_set_drvdata(minor->kdev, NULL); /* safety belt */
> @@ -215,13 +202,12 @@ static void drm_minor_unregister(struct drm_device 
> *dev, unsigned int type)
>  struct drm_minor *drm_minor_acquire(unsigned int minor_id)
>  {
> struct drm_minor *mino

Re: [Intel-gfx] [PATCH v2 7/7] vfio: Remove vfio_free_device

2022-11-04 Thread Anthony Krowiak

Reviewed-by: Tony Krowiak  : vfio_ap part

On 11/2/22 11:01 AM, Eric Farman wrote:

With the "mess" sorted out, we should be able to inline the
vfio_free_device call introduced by commit cb9ff3f3b84c
("vfio: Add helpers for unifying vfio_device life cycle")
and remove them from driver release callbacks.

Signed-off-by: Eric Farman 
Reviewed-by: Jason Gunthorpe 
Reviewed-by: Kevin Tian 
---
  drivers/gpu/drm/i915/gvt/kvmgt.c  |  1 -
  drivers/s390/cio/vfio_ccw_ops.c   |  2 --
  drivers/s390/crypto/vfio_ap_ops.c |  6 --
  drivers/vfio/fsl-mc/vfio_fsl_mc.c |  1 -
  drivers/vfio/pci/vfio_pci_core.c  |  1 -
  drivers/vfio/platform/vfio_amba.c |  1 -
  drivers/vfio/platform/vfio_platform.c |  1 -
  drivers/vfio/vfio_main.c  | 22 --
  include/linux/vfio.h  |  1 -
  samples/vfio-mdev/mbochs.c|  1 -
  samples/vfio-mdev/mdpy.c  |  1 -
  samples/vfio-mdev/mtty.c  |  1 -
  12 files changed, 4 insertions(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 7a45e5360caf..eee6805e67de 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -1461,7 +1461,6 @@ static void intel_vgpu_release_dev(struct vfio_device 
*vfio_dev)
struct intel_vgpu *vgpu = vfio_dev_to_vgpu(vfio_dev);
  
  	intel_gvt_destroy_vgpu(vgpu);

-   vfio_free_device(vfio_dev);
  }
  
  static const struct vfio_device_ops intel_vgpu_dev_ops = {

diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
index 1155f8bcedd9..598a3814d428 100644
--- a/drivers/s390/cio/vfio_ccw_ops.c
+++ b/drivers/s390/cio/vfio_ccw_ops.c
@@ -143,8 +143,6 @@ static void vfio_ccw_mdev_release_dev(struct vfio_device 
*vdev)
kmem_cache_free(vfio_ccw_io_region, private->io_region);
kfree(private->cp.guest_cp);
mutex_destroy(&private->io_mutex);
-
-   vfio_free_device(vdev);
  }
  
  static void vfio_ccw_mdev_remove(struct mdev_device *mdev)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c 
b/drivers/s390/crypto/vfio_ap_ops.c
index 0b4cc8c597ae..f108c0f14712 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -765,11 +765,6 @@ static void vfio_ap_mdev_unlink_fr_queues(struct 
ap_matrix_mdev *matrix_mdev)
}
  }
  
-static void vfio_ap_mdev_release_dev(struct vfio_device *vdev)

-{
-   vfio_free_device(vdev);
-}
-
  static void vfio_ap_mdev_remove(struct mdev_device *mdev)
  {
struct ap_matrix_mdev *matrix_mdev = dev_get_drvdata(&mdev->dev);
@@ -1784,7 +1779,6 @@ static const struct attribute_group vfio_queue_attr_group 
= {
  
  static const struct vfio_device_ops vfio_ap_matrix_dev_ops = {

.init = vfio_ap_mdev_init_dev,
-   .release = vfio_ap_mdev_release_dev,
.open_device = vfio_ap_mdev_open_device,
.close_device = vfio_ap_mdev_close_device,
.ioctl = vfio_ap_mdev_ioctl,
diff --git a/drivers/vfio/fsl-mc/vfio_fsl_mc.c 
b/drivers/vfio/fsl-mc/vfio_fsl_mc.c
index b16874e913e4..7b8889f55007 100644
--- a/drivers/vfio/fsl-mc/vfio_fsl_mc.c
+++ b/drivers/vfio/fsl-mc/vfio_fsl_mc.c
@@ -568,7 +568,6 @@ static void vfio_fsl_mc_release_dev(struct vfio_device 
*core_vdev)
  
  	vfio_fsl_uninit_device(vdev);

mutex_destroy(&vdev->igate);
-   vfio_free_device(core_vdev);
  }
  
  static int vfio_fsl_mc_remove(struct fsl_mc_device *mc_dev)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index badc9d828cac..9be2d5be5d95 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -2109,7 +2109,6 @@ void vfio_pci_core_release_dev(struct vfio_device 
*core_vdev)
mutex_destroy(&vdev->vma_lock);
kfree(vdev->region);
kfree(vdev->pm_save);
-   vfio_free_device(core_vdev);
  }
  EXPORT_SYMBOL_GPL(vfio_pci_core_release_dev);
  
diff --git a/drivers/vfio/platform/vfio_amba.c b/drivers/vfio/platform/vfio_amba.c

index eaea63e5294c..18faf2678b99 100644
--- a/drivers/vfio/platform/vfio_amba.c
+++ b/drivers/vfio/platform/vfio_amba.c
@@ -95,7 +95,6 @@ static void vfio_amba_release_dev(struct vfio_device 
*core_vdev)
  
  	vfio_platform_release_common(vdev);

kfree(vdev->name);
-   vfio_free_device(core_vdev);
  }
  
  static void vfio_amba_remove(struct amba_device *adev)

diff --git a/drivers/vfio/platform/vfio_platform.c 
b/drivers/vfio/platform/vfio_platform.c
index 82cedcebfd90..9910451dc341 100644
--- a/drivers/vfio/platform/vfio_platform.c
+++ b/drivers/vfio/platform/vfio_platform.c
@@ -83,7 +83,6 @@ static void vfio_platform_release_dev(struct vfio_device 
*core_vdev)
container_of(core_vdev, struct vfio_platform_device, vdev);
  
  	vfio_platform_release_common(vdev);

-   vfio_free_device(core_vdev);
  }
  
  static int vfio_platform_remove(struct platform_device *pdev)

diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_

Re: [Intel-gfx] [PATCH 05/10] vfio: Use IOMMU_CAP_ENFORCE_CACHE_COHERENCY for vfio_file_enforced_coherent()

2022-11-04 Thread Jason Gunthorpe
On Thu, Nov 03, 2022 at 04:38:16AM +, Tian, Kevin wrote:
> > From: Jason Gunthorpe 
> > Sent: Tuesday, November 1, 2022 8:26 PM
> > And this:
> > 
> > /*
> >  * If the device does not have
> > IOMMU_CAP_ENFORCE_CACHE_COHERENCY then
> >  * any domain later attached to it will also not support it. If the cap
> >  * is set then the iommu_domain eventually attached to the
> > device/group
> >  * must must use a domain with enforce_cache_coherency().
> >  */
> 
> duplicated 'must'

Done

Jason


Re: [Intel-gfx] [RFC][PATCH v3 00/33] timers: Use timer_shutdown*() before freeing timers

2022-11-04 Thread Steven Rostedt
On Fri, 4 Nov 2022 12:22:32 -0700
Guenter Roeck  wrote:

> Unfortunately the renaming caused some symbol conflicts.
> 
> Global definition: timer_shutdown
> 
>   File Line
> 0 time.c93 static inline void timer_shutdown(struct 
> clock_event_device *evt)
> 1 arm_arch_timer.c 690 static __always_inline int timer_shutdown(const int 
> access,
> 2 timer-fttmr010.c 105 int (*timer_shutdown)(struct clock_event_device *evt);
> 3 timer-sp804.c158 static inline void timer_shutdown(struct 
> clock_event_device *evt)
> 4 timer.h  239 static inline int timer_shutdown(struct timer_list 
> *timer)

$ git grep '\btimer_shutdown'
arch/arm/mach-spear/time.c:static inline void timer_shutdown(struct 
clock_event_device *evt)
arch/arm/mach-spear/time.c: timer_shutdown(evt);
arch/arm/mach-spear/time.c: timer_shutdown(evt);
arch/arm/mach-spear/time.c: timer_shutdown(evt);
drivers/clocksource/arm_arch_timer.c:static __always_inline int 
timer_shutdown(const int access,
drivers/clocksource/arm_arch_timer.c:   return 
timer_shutdown(ARCH_TIMER_VIRT_ACCESS, clk);
drivers/clocksource/arm_arch_timer.c:   return 
timer_shutdown(ARCH_TIMER_PHYS_ACCESS, clk);
drivers/clocksource/arm_arch_timer.c:   return 
timer_shutdown(ARCH_TIMER_MEM_VIRT_ACCESS, clk);
drivers/clocksource/arm_arch_timer.c:   return 
timer_shutdown(ARCH_TIMER_MEM_PHYS_ACCESS, clk);
drivers/clocksource/timer-fttmr010.c:   int (*timer_shutdown)(struct 
clock_event_device *evt);
drivers/clocksource/timer-fttmr010.c:   fttmr010->timer_shutdown(evt);
drivers/clocksource/timer-fttmr010.c:   fttmr010->timer_shutdown(evt);
drivers/clocksource/timer-fttmr010.c:   fttmr010->timer_shutdown(evt);
drivers/clocksource/timer-fttmr010.c:   fttmr010->timer_shutdown = 
ast2600_timer_shutdown;
drivers/clocksource/timer-fttmr010.c:   fttmr010->timer_shutdown = 
fttmr010_timer_shutdown;
drivers/clocksource/timer-fttmr010.c:   fttmr010->clkevt.set_state_shutdown = 
fttmr010->timer_shutdown;
drivers/clocksource/timer-fttmr010.c:   fttmr010->clkevt.tick_resume = 
fttmr010->timer_shutdown;
drivers/clocksource/timer-sp804.c:static inline void timer_shutdown(struct 
clock_event_device *evt)
drivers/clocksource/timer-sp804.c:  timer_shutdown(evt);
drivers/clocksource/timer-sp804.c:  timer_shutdown(evt);

Honestly, I think these need to be renamed, as "timer_shutdown()"
should be specific to the timer code, and not individual timers.

I'll start making a patch set that starts by renaming these timers,
then adds the timer_shutdown() API, and finishes with the trivial
updates, and that will be a real "PATCH" (non RFC).

Linus, should I also add any patches that have already been acked by the
respective maintainer?

-- Steve


Re: [Intel-gfx] ✗ Fi.CI.IGT: failure for Fix for two GuC issues (rev2)

2022-11-04 Thread John Harrison

On 11/2/2022 21:45, Patchwork wrote:

Project List - Patchwork *Patch Details*
*Series:*   Fix for two GuC issues (rev2)
*URL:*  https://patchwork.freedesktop.org/series/110269/
*State:*failure
*Details:* 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110269v2/index.html



  CI Bug Log - changes from CI_DRM_12332_full -> Patchwork_110269v2_full


Summary

*FAILURE*

Serious unknown changes coming with Patchwork_110269v2_full absolutely need
to be verified manually.

If you think the reported changes have nothing to do with the changes
introduced in Patchwork_110269v2_full, please notify your bug team to allow
them to document this new failure mode, which will reduce false positives
in CI.



Participating hosts (9 -> 9)

No changes in participating hosts


Possible new issues

Here are the unknown changes that may have been introduced in 
Patchwork_110269v2_full:



  CI changes


Possible regressions

  * boot:
  o shard-iclb: (PASS, PASS, PASS, PASS, PASS, PASS, PASS, PASS, PASS, PASS,
    PASS, PASS, PASS, PASS, PASS, PASS, PASS, PASS, PASS, PASS, PASS, PASS,
    PASS, PASS, PASS)
    -> (PASS, PASS, PASS, PASS, PASS, PASS, PASS, PASS, PASS, PASS, PASS,
    PASS, PASS, PASS



Re: [Intel-gfx] [RFC][PATCH v3 00/33] timers: Use timer_shutdown*() before freeing timers

2022-11-04 Thread Guenter Roeck
On Fri, Nov 04, 2022 at 01:40:53AM -0400, Steven Rostedt wrote:
> 
> Back in April, I posted an RFC patch set to help mitigate a common issue
> where a timer gets armed just before it is freed, and when the timer
> goes off, it crashes in the timer code without any evidence of who the
> culprit was. I got side tracked and never finished up on that patch set.
> Since this type of crash is still our #1 crash we are seeing in the field,
> it has become a priority again to finish it.
> 
> This is v3 of that patch set. Thomas Gleixner posted an untested version
> that makes timer->function NULL as the flag that it is shutdown. I took that
> code, tested it (fixed it up), added more comments, and changed the
> name to timer_shutdown_sync(). I also converted it to use WARN_ON_ONCE()
> instead of just WARN_ON() as Linus asked for.
> 

Unfortunately the renaming caused some symbol conflicts.

Global definition: timer_shutdown

  File Line
0 time.c93 static inline void timer_shutdown(struct 
clock_event_device *evt)
1 arm_arch_timer.c 690 static __always_inline int timer_shutdown(const int 
access,
2 timer-fttmr010.c 105 int (*timer_shutdown)(struct clock_event_device *evt);
3 timer-sp804.c158 static inline void timer_shutdown(struct 
clock_event_device *evt)
4 timer.h  239 static inline int timer_shutdown(struct timer_list 
*timer)

Guenter
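
For readers following the thread, the shutdown idea from the quoted cover letter (a NULL timer->function marks the timer as shut down) can be modelled in userspace as below. This is only a sketch under assumptions: `struct model_timer` and the `model_timer_*()` helpers are hypothetical names, not the kernel's `struct timer_list` or its API, and where the real series emits its WARN_ON_ONCE() is not shown in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; not the kernel's struct timer_list. */
struct model_timer {
	void (*function)(struct model_timer *t); /* NULL means "shut down" */
	bool pending;
};

static void model_timer_setup(struct model_timer *t,
			      void (*fn)(struct model_timer *t))
{
	assert(fn != NULL);
	t->function = fn;
	t->pending = false;
}

/* Arm the timer. Returns false if the timer has been shut down. */
static bool model_timer_arm(struct model_timer *t)
{
	if (t->function == NULL)
		return false; /* refused: timer was shut down */
	t->pending = true;
	return true;
}

/* Plain delete: cancels the timer, but it may be armed again later. */
static void model_timer_del(struct model_timer *t)
{
	t->pending = false;
}

/* Shutdown: cancels the timer and blocks every later attempt to arm it. */
static void model_timer_shutdown(struct model_timer *t)
{
	t->pending = false;
	t->function = NULL;
}

/* Callback used only to have something to install. */
static void model_noop(struct model_timer *t)
{
	(void)t;
}
```

The point of the model is the asymmetry: after a plain delete the timer can still be re-armed just before the object is freed, while after shutdown it cannot.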


Re: [Intel-gfx] [PATCH 2/2] drm/i915/mtl: Enable Idle Messaging for GSC CS

2022-11-04 Thread Belgaumkar, Vinay



On 10/31/2022 8:36 PM, Badal Nilawar wrote:

From: Vinay Belgaumkar 

By default idle messaging is disabled for GSC CS so to unblock RC6
entry on media tile idle messaging need to be enabled.

C6 entry instead of RC6. Also *needs*.


Bspec: 71496

Cc: Daniele Ceraolo Spurio 
Signed-off-by: Vinay Belgaumkar 
Signed-off-by: Badal Nilawar 
---
  drivers/gpu/drm/i915/gt/intel_engine_pm.c | 12 
  drivers/gpu/drm/i915/gt/intel_gt_regs.h   |  3 +++
  2 files changed, 15 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c 
b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
index b0a4a2dbe3ee..8d391f8fd861 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
@@ -15,6 +15,7 @@
  #include "intel_rc6.h"
  #include "intel_ring.h"
  #include "shmem_utils.h"
+#include "intel_gt_regs.h"
  
  static void dbg_poison_ce(struct intel_context *ce)

  {
@@ -271,10 +272,21 @@ static const struct intel_wakeref_ops wf_ops = {
  
  void intel_engine_init__pm(struct intel_engine_cs *engine)

  {
+   struct drm_i915_private *i915 = engine->i915;
struct intel_runtime_pm *rpm = engine->uncore->rpm;
  
  	intel_wakeref_init(&engine->wakeref, rpm, &wf_ops);

intel_engine_init_heartbeat(engine);
+
+   if (IS_METEORLAKE(i915) && engine->id == GSC0) {
+   intel_uncore_write(engine->gt->uncore,
+  RC_PSMI_CTRL_GSCCS,
+  _MASKED_BIT_DISABLE(IDLE_MSG_DISABLE));
+   drm_dbg(&i915->drm,
+   "Set GSC CS Idle Reg to: 0x%x",
+   intel_uncore_read(engine->gt->uncore, 
RC_PSMI_CTRL_GSCCS));

Do we need the debug print here?

+   }
+
  }
  
  /**

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h 
b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
index f4624262dc81..176902a9f2a2 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
@@ -908,6 +908,9 @@
  #define  MSG_IDLE_FW_MASK REG_GENMASK(13, 9)
  #define  MSG_IDLE_FW_SHIFT9
  
+#define	RC_PSMI_CTRL_GSCCS	_MMIO(0x11a050)

+#define IDLE_MSG_DISABLE   BIT(0)


Is the alignment off?

Thanks,

Vinay.


+
  #define FORCEWAKE_MEDIA_GEN9  _MMIO(0xa270)
  #define FORCEWAKE_RENDER_GEN9 _MMIO(0xa278)
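
As background for the `_MASKED_BIT_DISABLE(IDLE_MSG_DISABLE)` write discussed above, here is a small userspace model of how a masked register write is usually understood in i915: the upper 16 bits of the written value select which of the lower 16 bits are updated. The helper names (`masked_field()`, `apply_masked_write()` and friends) are hypothetical, and the sketch assumes RC_PSMI_CTRL_GSCCS follows this masked convention, which the patch implies but this thread does not state.

```c
#include <assert.h>
#include <stdint.h>

#define IDLE_MSG_DISABLE (1u << 0)

/* Build a masked write: high half is the write-enable mask, low half the value. */
static uint32_t masked_field(uint32_t mask, uint32_t value)
{
	assert((mask & 0xffff0000u) == 0); /* mask must fit in 16 bits */
	assert((value & ~mask) == 0);      /* value may only touch masked bits */
	return (mask << 16) | value;
}

static uint32_t masked_bit_enable(uint32_t bit)
{
	return masked_field(bit, bit);
}

static uint32_t masked_bit_disable(uint32_t bit)
{
	return masked_field(bit, 0);
}

/* Model of the hardware side: only bits selected by the mask change. */
static uint32_t apply_masked_write(uint32_t reg, uint32_t write)
{
	uint32_t mask = write >> 16;

	return (reg & ~mask) | (write & mask);
}
```

Under this model, clearing IDLE_MSG_DISABLE leaves every other bit of the register untouched, which is why no read-modify-write is needed in the patch.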
  


[Intel-gfx] ✓ Fi.CI.IGT: success for drm/i915/mtl: Add Wa_14017073508 for SAMedia

2022-11-04 Thread Patchwork
== Series Details ==

Series: drm/i915/mtl: Add Wa_14017073508 for SAMedia
URL   : https://patchwork.freedesktop.org/series/110502/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_12339_full -> Patchwork_110502v1_full


Summary
---

  **WARNING**

  Minor unknown changes coming with Patchwork_110502v1_full need to be verified
  manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_110502v1_full, please notify your bug team to allow 
them
  to document this new failure mode, which will reduce false positives in CI.

  

Participating hosts (11 -> 11)
--

  No changes in participating hosts

Possible new issues
---

  Here are the unknown changes that may have been introduced in 
Patchwork_110502v1_full:

### IGT changes ###

 Warnings 

  * igt@kms_plane_scaling@plane-upscale-with-rotation-factor-0-25@pipe-c-edp-1:
- shard-tglb: [SKIP][1] ([i915#5176]) -> [INCOMPLETE][2]
   [1]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-tglb7/igt@kms_plane_scaling@plane-upscale-with-rotation-factor-0...@pipe-c-edp-1.html
   [2]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110502v1/shard-tglb8/igt@kms_plane_scaling@plane-upscale-with-rotation-factor-0...@pipe-c-edp-1.html

  
 Suppressed 

  The following results come from untrusted machines, tests, or statuses.
  They do not affect the overall result.

  * igt@gem_exec_gttfill@engines@rcs0:
- {shard-rkl}:([PASS][3], [PASS][4]) -> ([INCOMPLETE][5], 
[PASS][6]) +1 similar issue
   [3]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-rkl-5/igt@gem_exec_gttfill@engi...@rcs0.html
   [4]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-rkl-4/igt@gem_exec_gttfill@engi...@rcs0.html
   [5]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110502v1/shard-rkl-2/igt@gem_exec_gttfill@engi...@rcs0.html
   [6]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110502v1/shard-rkl-4/igt@gem_exec_gttfill@engi...@rcs0.html

  * igt@kms_cursor_legacy@torture-move@all-pipes:
- {shard-rkl}:([PASS][7], [PASS][8]) -> [INCOMPLETE][9]
   [7]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-rkl-2/igt@kms_cursor_legacy@torture-m...@all-pipes.html
   [8]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-rkl-4/igt@kms_cursor_legacy@torture-m...@all-pipes.html
   [9]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110502v1/shard-rkl-5/igt@kms_cursor_legacy@torture-m...@all-pipes.html

  
Known issues


  Here are the changes found in Patchwork_110502v1_full that come from known 
issues:

### CI changes ###

 Possible fixes 

  * boot:
- shard-snb:  ([PASS][10], [PASS][11], [PASS][12], [PASS][13], 
[PASS][14], [PASS][15], [PASS][16], [PASS][17], [PASS][18], [PASS][19], 
[PASS][20], [PASS][21], [PASS][22], [PASS][23], [PASS][24], [PASS][25], 
[FAIL][26], [PASS][27], [PASS][28], [PASS][29], [PASS][30], [PASS][31], 
[PASS][32], [PASS][33], [PASS][34]) ([i915#4338]) -> ([PASS][35], [PASS][36], 
[PASS][37], [PASS][38], [PASS][39], [PASS][40], [PASS][41], [PASS][42], 
[PASS][43], [PASS][44], [PASS][45], [PASS][46], [PASS][47], [PASS][48], 
[PASS][49], [PASS][50], [PASS][51], [PASS][52], [PASS][53], [PASS][54], 
[PASS][55], [PASS][56], [PASS][57], [PASS][58], [PASS][59])
   [10]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-snb2/boot.html
   [11]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-snb2/boot.html
   [12]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-snb2/boot.html
   [13]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-snb7/boot.html
   [14]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-snb7/boot.html
   [15]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-snb7/boot.html
   [16]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-snb7/boot.html
   [17]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-snb7/boot.html
   [18]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-snb6/boot.html
   [19]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-snb6/boot.html
   [20]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-snb6/boot.html
   [21]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-snb6/boot.html
   [22]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-snb5/boot.html
   [23]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-snb5/boot.html
   [24]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-snb5/boot.html
   [25]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-snb5/boot.html
   [26]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-snb5/boot.html
   [27]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-snb5/boot.html
 

Re: [Intel-gfx] [RFC][PATCH v3 13/33] timers: drm: Use timer_shutdown_sync() before freeing timer

2022-11-04 Thread Steven Rostedt
On Fri, 4 Nov 2022 08:48:28 +
Tvrtko Ursulin  wrote:

> If it stays all DRM drivers in one patch then I guess it needs to go via 
> drm-misc, which for i915 would be okay I think in this case since patch 
> is extremely unlikely to clash with anything. Or split it up per driver 
> and then we can handle it in drm-intel-next once core functionality is in.
> 
> We do however have some more calls to del_timer_sync, where freeing is 
> perhaps not immediately next to the site in code, but things definitely 
> get freed like on module unload. Would we need to convert all of them to 
> avoid some, presumably new, warnings?


I'm happy to split this patch up. I just got a bit lazy and started
just grouping via entire subsystems. You should see the networking
patch ;-)

-- Steve


Re: [Intel-gfx] [PATCH v2 1/2] drm/i915/guc: Properly initialise kernel contexts

2022-11-04 Thread John Harrison

On 11/4/2022 11:53, Ceraolo Spurio, Daniele wrote:

On 11/2/2022 12:21 PM, john.c.harri...@intel.com wrote:

From: John Harrison 

If a context has already been registered prior to first submission
then context init code was not being called. The noticeable effect of
that was the scheduling priority was left at zero (meaning super high
priority) instead of being set to normal. This would occur with
kernel contexts at start of day as they are manually pinned up front
rather than on first submission. So add a call to initialise those
when they are pinned.


Does this need a fixes tag? On one side, we were leaving the priority 
at the wrong value, but on the other there were no actual consequences.


I think that's the point. There was no actual issue, it's just a 
theoretical problem. So there is nothing to be gained by pushing this as 
a fix. It seems like it would be a lot of unnecessary effort to push 
it all the way back to 5.17.


John.



Reviewed-by: Daniele Ceraolo Spurio 

Daniele


Signed-off-by: John Harrison 
---
  drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c

index 4ccb29f9ac55c..941613be3b9dd 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -4111,6 +4111,9 @@ static inline void 
guc_kernel_context_pin(struct intel_guc *guc,

  if (context_guc_id_invalid(ce))
  pin_guc_id(guc, ce);
  +    if (!test_bit(CONTEXT_GUC_INIT, &ce->flags))
+    guc_context_init(ce);
+
  try_context_registration(ce, true);
  }






Re: [Intel-gfx] [PATCH v2 1/2] drm/i915/guc: Properly initialise kernel contexts

2022-11-04 Thread Ceraolo Spurio, Daniele




On 11/2/2022 12:21 PM, john.c.harri...@intel.com wrote:

From: John Harrison 

If a context has already been registered prior to first submission
then context init code was not being called. The noticeable effect of
that was the scheduling priority was left at zero (meaning super high
priority) instead of being set to normal. This would occur with
kernel contexts at start of day as they are manually pinned up front
rather than on first submission. So add a call to initialise those
when they are pinned.


Does this need a fixes tag? On one side, we were leaving the priority at 
the wrong value, but on the other there were no actual consequences.


Reviewed-by: Daniele Ceraolo Spurio 

Daniele


Signed-off-by: John Harrison 
---
  drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 4ccb29f9ac55c..941613be3b9dd 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -4111,6 +4111,9 @@ static inline void guc_kernel_context_pin(struct 
intel_guc *guc,
if (context_guc_id_invalid(ce))
pin_guc_id(guc, ce);
  
+	if (!test_bit(CONTEXT_GUC_INIT, &ce->flags))

+   guc_context_init(ce);
+
try_context_registration(ce, true);
  }
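
The "initialise on pin if not already initialised" pattern in the hunk above can be sketched in userspace as follows. All names here (`struct model_ctx`, `model_ctx_init()`, `model_kernel_ctx_pin()`, the priority values) are hypothetical stand-ins, not the i915 structures; the `inited` field plays the role of the CONTEXT_GUC_INIT flag bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical priority values; zero stands for "never set". */
enum { MODEL_PRIO_UNSET = 0, MODEL_PRIO_NORMAL = 2 };

struct model_ctx {
	bool inited;     /* stands in for the CONTEXT_GUC_INIT flag bit */
	int prio;
	int init_calls;  /* counts how often init ran, for illustration */
	bool registered;
};

/* One-time setup that gives the context its normal priority. */
static void model_ctx_init(struct model_ctx *ce)
{
	ce->prio = MODEL_PRIO_NORMAL;
	ce->inited = true;
	ce->init_calls++;
}

/* Pin path: run init only if no earlier path has done so already. */
static void model_kernel_ctx_pin(struct model_ctx *ce)
{
	assert(ce != NULL);
	if (!ce->inited)
		model_ctx_init(ce);
	ce->registered = true;
}
```

Without the check in the pin path, a context registered before its first submission would keep the zero priority it started with, which is the situation the commit message describes.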
  




Re: [Intel-gfx] ✗ Fi.CI.IGT: failure for drm/i915/guc: Remove excessive line feeds in state dumps

2022-11-04 Thread John Harrison

On 10/31/2022 18:26, Patchwork wrote:

Project List - Patchwork *Patch Details*
*Series:*   drm/i915/guc: Remove excessive line feeds in state dumps
*URL:*  https://patchwork.freedesktop.org/series/110343/
*State:*failure
*Details:* 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110343v1/index.html



  CI Bug Log - changes from CI_DRM_12325_full -> Patchwork_110343v1_full


Summary

*FAILURE*

Serious unknown changes coming with Patchwork_110343v1_full absolutely need
to be verified manually.

If you think the reported changes have nothing to do with the changes
introduced in Patchwork_110343v1_full, please notify your bug team to allow
them to document this new failure mode, which will reduce false positives
in CI.



Participating hosts (11 -> 9)

Missing (2): shard-rkl shard-dg1


Possible new issues

Here are the unknown changes that may have been introduced in 
Patchwork_110343v1_full:



  IGT changes


Possible regressions

  * igt@i915_module_load@reload-with-fault-injection:
      o shard-snb: PASS -> INCOMPLETE

  * igt@kms_cursor_crc@cursor-offscreen-64x21@pipe-b-edp-1:
      o shard-tglb: PASS -> INCOMPLETE

  * igt@kms_flip_scaled_crc@flip-32bpp-ytileccs-to-64bpp-ytile-downscaling@pipe-a-valid-mode:
      o shard-iclb: PASS -> FAIL

  * igt@kms_sequence@queue-busy@edp-1-pipe-a:
      o shard-skl: PASS -> FAIL



This patch is literally just removing excess '\n' characters from some 
GuC only debugfs prints. It cannot cause any of the above failures.


John.





Known issues

Here are the changes found in Patchwork_110343v1_full that come from 
known issues:



  CI changes


Issues hit

  * boot:
  o shard-skl: (PASS, PASS, PASS, PASS, PASS, PASS, PASS, PASS, PASS, PASS,
    PASS, PASS, PASS, PASS, PASS, PASS, PASS, PASS, PASS, PASS, PASS



[Intel-gfx] ✓ Fi.CI.BAT: success for Add GT oriented dmesg output

2022-11-04 Thread Patchwork
== Series Details ==

Series: Add GT oriented dmesg output
URL   : https://patchwork.freedesktop.org/series/110550/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_12344 -> Patchwork_110550v1


Summary
---

  **SUCCESS**

  No regressions found.

  External URL: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110550v1/index.html

Participating hosts (41 -> 30)
--

  Additional (1): fi-cml-u2 
  Missing(12): fi-bdw-samus bat-dg2-8 bat-dg2-9 bat-adlp-6 bat-adlp-4 
fi-ctg-p8600 bat-adln-1 bat-rplp-1 bat-rpls-1 bat-rpls-2 bat-dg2-11 bat-jsl-1 

Possible new issues
---

  Here are the unknown changes that may have been introduced in 
Patchwork_110550v1:

### IGT changes ###

 Suppressed 

  The following results come from untrusted machines, tests, or statuses.
  They do not affect the overall result.

  * {igt@debugfs_test@basic-hwmon}:
- fi-cml-u2:  NOTRUN -> [SKIP][1]
   [1]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110550v1/fi-cml-u2/igt@debugfs_t...@basic-hwmon.html

  
Known issues


  Here are the changes found in Patchwork_110550v1 that come from known issues:

### IGT changes ###

 Issues hit 

  * igt@gem_exec_fence@basic-busy@bcs0:
- fi-cml-u2:  NOTRUN -> [SKIP][2] ([i915#1208]) +1 similar issue
   [2]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110550v1/fi-cml-u2/igt@gem_exec_fence@basic-b...@bcs0.html

  * igt@gem_huc_copy@huc-copy:
- fi-cml-u2:  NOTRUN -> [SKIP][3] ([i915#2190])
   [3]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110550v1/fi-cml-u2/igt@gem_huc_c...@huc-copy.html

  * igt@gem_lmem_swapping@verify-random:
- fi-cml-u2:  NOTRUN -> [SKIP][4] ([i915#4613]) +3 similar issues
   [4]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110550v1/fi-cml-u2/igt@gem_lmem_swapp...@verify-random.html

  * igt@i915_suspend@basic-s3-without-i915:
- fi-rkl-11600:   [PASS][5] -> [INCOMPLETE][6] ([i915#4817])
   [5]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12344/fi-rkl-11600/igt@i915_susp...@basic-s3-without-i915.html
   [6]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110550v1/fi-rkl-11600/igt@i915_susp...@basic-s3-without-i915.html

  * igt@kms_chamelium@vga-hpd-fast:
- fi-cml-u2:  NOTRUN -> [SKIP][7] ([fdo#109284] / [fdo#111827]) +8 
similar issues
   [7]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110550v1/fi-cml-u2/igt@kms_chamel...@vga-hpd-fast.html

  * igt@kms_cursor_legacy@basic-busy-flip-before-cursor:
- fi-cml-u2:  NOTRUN -> [SKIP][8] ([i915#4213])
   [8]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110550v1/fi-cml-u2/igt@kms_cursor_leg...@basic-busy-flip-before-cursor.html

  * igt@kms_force_connector_basic@force-load-detect:
- fi-cml-u2:  NOTRUN -> [SKIP][9] ([fdo#109285])
   [9]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110550v1/fi-cml-u2/igt@kms_force_connector_ba...@force-load-detect.html

  * igt@kms_pipe_crc_basic@suspend-read-crc@pipe-a-edp-1:
- fi-cml-u2:  NOTRUN -> [INCOMPLETE][10] ([i915#7379])
   [10]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110550v1/fi-cml-u2/igt@kms_pipe_crc_basic@suspend-read-...@pipe-a-edp-1.html

  * igt@kms_setmode@basic-clone-single-crtc:
- fi-cml-u2:  NOTRUN -> [SKIP][11] ([i915#3555])
   [11]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110550v1/fi-cml-u2/igt@kms_setm...@basic-clone-single-crtc.html

  * igt@prime_vgem@basic-userptr:
- fi-cml-u2:  NOTRUN -> [SKIP][12] ([fdo#109295] / [i915#3301])
   [12]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110550v1/fi-cml-u2/igt@prime_v...@basic-userptr.html

  
 Possible fixes 

  * igt@i915_selftest@live@gt_heartbeat:
- fi-apl-guc: [DMESG-FAIL][13] ([i915#5334]) -> [PASS][14]
   [13]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12344/fi-apl-guc/igt@i915_selftest@live@gt_heartbeat.html
   [14]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110550v1/fi-apl-guc/igt@i915_selftest@live@gt_heartbeat.html

  
  {name}: This element is suppressed. This means it is ignored when computing
  the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#109284]: https://bugs.freedesktop.org/show_bug.cgi?id=109284
  [fdo#109285]: https://bugs.freedesktop.org/show_bug.cgi?id=109285
  [fdo#109295]: https://bugs.freedesktop.org/show_bug.cgi?id=109295
  [fdo#111827]: https://bugs.freedesktop.org/show_bug.cgi?id=111827
  [i915#1208]: https://gitlab.freedesktop.org/drm/intel/issues/1208
  [i915#2190]: https://gitlab.freedesktop.org/drm/intel/issues/2190
  [i915#3301]: https://gitlab.freedesktop.org/drm/intel/issues/3301
  [i915#3555]: https://gitlab.freedesktop.org/drm/intel/issues/3555
  [i915#4213]: https://gitlab.freedesktop.org/drm/intel/issues/4213
  [i915#4613]: https://gitlab.freedesktop.org/drm/inte

[Intel-gfx] ✗ Fi.CI.IGT: failure for mei: add timeout to send

2022-11-04 Thread Patchwork
== Series Details ==

Series: mei: add timeout to send
URL   : https://patchwork.freedesktop.org/series/110495/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_12339_full -> Patchwork_110495v1_full


Summary
---

  **FAILURE**

  Serious unknown changes coming with Patchwork_110495v1_full absolutely need 
to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_110495v1_full, please notify your bug team to allow 
them
  to document this new failure mode, which will reduce false positives in CI.

  

Participating hosts (11 -> 11)
--

  No changes in participating hosts

Possible new issues
---

  Here are the unknown changes that may have been introduced in 
Patchwork_110495v1_full:

### IGT changes ###

 Possible regressions 

  * igt@gem_eio@kms:
- shard-tglb: [PASS][1] -> [INCOMPLETE][2]
   [1]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-tglb8/igt@gem_...@kms.html
   [2]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110495v1/shard-tglb5/igt@gem_...@kms.html

  * igt@kms_flip@flip-vs-suspend-interruptible@c-edp1:
- shard-skl:  [PASS][3] -> [INCOMPLETE][4] +1 similar issue
   [3]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-skl5/igt@kms_flip@flip-vs-suspend-interrupti...@c-edp1.html
   [4]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110495v1/shard-skl6/igt@kms_flip@flip-vs-suspend-interrupti...@c-edp1.html

  
Known issues


  Here are the changes found in Patchwork_110495v1_full that come from known 
issues:

### CI changes ###

 Issues hit 

  * boot:
- shard-apl:  ([PASS][5], [PASS][6], [PASS][7], [PASS][8], 
[PASS][9], [PASS][10], [PASS][11], [PASS][12], [PASS][13], [PASS][14], 
[PASS][15], [PASS][16], [PASS][17], [PASS][18], [PASS][19], [PASS][20], 
[PASS][21], [PASS][22], [PASS][23], [PASS][24], [PASS][25], [PASS][26], 
[PASS][27], [PASS][28], [PASS][29]) -> ([PASS][30], [FAIL][31], [PASS][32], 
[PASS][33], [PASS][34], [PASS][35], [PASS][36], [PASS][37], [PASS][38], 
[PASS][39], [PASS][40], [PASS][41], [PASS][42], [PASS][43], [PASS][44], 
[PASS][45], [PASS][46], [PASS][47], [PASS][48], [PASS][49], [PASS][50], 
[PASS][51], [PASS][52], [PASS][53], [PASS][54]) ([i915#4386])
   [5]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-apl8/boot.html
   [6]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-apl1/boot.html
   [7]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-apl1/boot.html
   [8]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-apl1/boot.html
   [9]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-apl1/boot.html
   [10]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-apl2/boot.html
   [11]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-apl2/boot.html
   [12]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-apl2/boot.html
   [13]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-apl2/boot.html
   [14]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-apl3/boot.html
   [15]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-apl3/boot.html
   [16]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-apl3/boot.html
   [17]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-apl3/boot.html
   [18]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-apl6/boot.html
   [19]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-apl6/boot.html
   [20]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-apl6/boot.html
   [21]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-apl6/boot.html
   [22]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-apl6/boot.html
   [23]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-apl8/boot.html
   [24]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-apl8/boot.html
   [25]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-apl8/boot.html
   [26]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-apl7/boot.html
   [27]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-apl7/boot.html
   [28]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-apl7/boot.html
   [29]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-apl7/boot.html
   [30]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110495v1/shard-apl1/boot.html
   [31]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110495v1/shard-apl1/boot.html
   [32]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110495v1/shard-apl1/boot.html
   [33]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110495v1/shard-apl1/boot.html
   [34]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110495v1/shard-apl2/boot.html
   [35

Re: [Intel-gfx] [PATCH v2] drm/i915/mtl: Add Wa_14017073508 for SAMedia

2022-11-04 Thread Rodrigo Vivi


On Fri, Nov 04, 2022 at 12:15:59AM +0530, Badal Nilawar wrote:
> This workaround is added for the Media tile of MTL A step. It helps the
> pcode workaround that handles the hardware issue seen during package C2/C3
> transitions due to RC6 entry/exit transitions on the Media tile. As part of
> the workaround, pcode expects the KMD to send the mailbox message "media
> busy" when components of the Media tile are in use and "media idle" otherwise.
> As per the workaround description, GuC RC needs to be disabled, so
> host-based RC is enabled for the Media tile.
> 
> v2:
>  - Correct workaround id (Matt)
>  - Fix review comments (Rodrigo)
> 
> Cc: Rodrigo Vivi 
> Cc: Radhakrishna Sripada 
> Cc: Vinay Belgaumkar 
> Cc: Chris Wilson 
> Signed-off-by: Badal Nilawar 

Reviewed-by: Rodrigo Vivi 
> ---
>  drivers/gpu/drm/i915/gt/intel_gt_pm.c | 27 +++
>  drivers/gpu/drm/i915/gt/uc/intel_guc_rc.c | 13 ++-
>  drivers/gpu/drm/i915/i915_drv.h   |  4 
>  drivers/gpu/drm/i915/i915_reg.h   |  9 
>  4 files changed, 52 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c 
> b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> index f553e2173bda..833b7682643f 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> @@ -19,10 +19,31 @@
>  #include "intel_rc6.h"
>  #include "intel_rps.h"
>  #include "intel_wakeref.h"
> +#include "intel_pcode.h"
>  #include "pxp/intel_pxp_pm.h"
>  
>  #define I915_GT_SUSPEND_IDLE_TIMEOUT (HZ / 2)
>  
> +static void mtl_media_busy(struct intel_gt *gt)
> +{
> + /* Wa_14017073508: mtl */
> + if (IS_MTL_GRAPHICS_STEP(gt->i915, P, STEP_A0, STEP_B0) &&
> + gt->type == GT_MEDIA)
> + snb_pcode_write_p(gt->uncore, PCODE_MBOX_GT_STATE,
> +   PCODE_MBOX_GT_STATE_MEDIA_BUSY,
> +   PCODE_MBOX_GT_STATE_DOMAIN_MEDIA, 0);
> +}
> +
> +static void mtl_media_idle(struct intel_gt *gt)
> +{
> + /* Wa_14017073508: mtl */
> + if (IS_MTL_GRAPHICS_STEP(gt->i915, P, STEP_A0, STEP_B0) &&
> + gt->type == GT_MEDIA)
> + snb_pcode_write_p(gt->uncore, PCODE_MBOX_GT_STATE,
> +   PCODE_MBOX_GT_STATE_MEDIA_NOT_BUSY,
> +   PCODE_MBOX_GT_STATE_DOMAIN_MEDIA, 0);
> +}
> +
>  static void user_forcewake(struct intel_gt *gt, bool suspend)
>  {
>   int count = atomic_read(&gt->user_wakeref);
> @@ -70,6 +91,9 @@ static int __gt_unpark(struct intel_wakeref *wf)
>  
>   GT_TRACE(gt, "\n");
>  
> + /* Wa_14017073508: mtl */
> + mtl_media_busy(gt);
> +
>   /*
>* It seems that the DMC likes to transition between the DC states a lot
>* when there are no connected displays (no active power domains) during
> @@ -119,6 +143,9 @@ static int __gt_park(struct intel_wakeref *wf)
>   GEM_BUG_ON(!wakeref);
>   intel_display_power_put_async(i915, POWER_DOMAIN_GT_IRQ, wakeref);
>  
> + /* Wa_14017073508: mtl */
> + mtl_media_idle(gt);
> +
>   return 0;
>  }
>  
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_rc.c 
> b/drivers/gpu/drm/i915/gt/uc/intel_guc_rc.c
> index 8f8dd05835c5..b5855091cf6a 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_rc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_rc.c
> @@ -11,9 +11,20 @@
>  
>  static bool __guc_rc_supported(struct intel_guc *guc)
>  {
> + struct intel_gt *gt = guc_to_gt(guc);
> +
> + /*
> +  * Wa_14017073508: mtl
> +  * Do not enable gucrc to avoid additional interrupts which
> +  * may disrupt pcode wa.
> +  */
> + if (IS_MTL_GRAPHICS_STEP(gt->i915, P, STEP_A0, STEP_B0) &&
> + gt->type == GT_MEDIA)
> + return false;
> +
>   /* GuC RC is unavailable for pre-Gen12 */
>   return guc->submission_supported &&
> - GRAPHICS_VER(guc_to_gt(guc)->i915) >= 12;
> + GRAPHICS_VER(gt->i915) >= 12;
>  }
>  
>  static bool __guc_rc_selected(struct intel_guc *guc)
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 05b3300cc4ed..659b92382ff2 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -740,6 +740,10 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915,
>  #define IS_XEHPSDV_GRAPHICS_STEP(__i915, since, until) \
>   (IS_XEHPSDV(__i915) && IS_GRAPHICS_STEP(__i915, since, until))
>  
> +#define IS_MTL_GRAPHICS_STEP(__i915, variant, since, until) \
> + (IS_SUBPLATFORM(__i915, INTEL_METEORLAKE, INTEL_SUBPLATFORM_##variant) && \
> +  IS_GRAPHICS_STEP(__i915, since, until))
> +
>  /*
>   * DG2 hardware steppings are a bit unusual.  The hardware design was forked to
>   * create three variants (G10, G11, and G12) which each have distinct
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index 765a10e0de88..23d732413919 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @

[Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Add GT oriented dmesg output

2022-11-04 Thread Patchwork
== Series Details ==

Series: Add GT oriented dmesg output
URL   : https://patchwork.freedesktop.org/series/110550/
State : warning

== Summary ==

Error: dim checkpatch failed
3c6bc62f46e5 drm/i915/gt: Add GT oriented dmesg output
-:22: CHECK:MACRO_ARG_REUSE: Macro argument reuse '_gt' - possible side-effects?
#22: FILE: drivers/gpu/drm/i915/gt/intel_gt.h:16:
+#define GT_ERR(_gt, _fmt, ...) \
+   drm_err(&(_gt)->i915->drm, "GT%u: " _fmt, (_gt)->info.id, ##__VA_ARGS__)

-:25: CHECK:MACRO_ARG_REUSE: Macro argument reuse '_gt' - possible side-effects?
#25: FILE: drivers/gpu/drm/i915/gt/intel_gt.h:19:
+#define GT_WARN(_gt, _fmt, ...) \
+   drm_warn(&(_gt)->i915->drm, "GT%u: " _fmt, (_gt)->info.id, ##__VA_ARGS__)

-:28: CHECK:MACRO_ARG_REUSE: Macro argument reuse '_gt' - possible side-effects?
#28: FILE: drivers/gpu/drm/i915/gt/intel_gt.h:22:
+#define GT_NOTICE(_gt, _fmt, ...) \
+   drm_notice(&(_gt)->i915->drm, "GT%u: " _fmt, (_gt)->info.id, ##__VA_ARGS__)

-:31: CHECK:MACRO_ARG_REUSE: Macro argument reuse '_gt' - possible side-effects?
#31: FILE: drivers/gpu/drm/i915/gt/intel_gt.h:25:
+#define GT_INFO(_gt, _fmt, ...) \
+   drm_info(&(_gt)->i915->drm, "GT%u: " _fmt, (_gt)->info.id, ##__VA_ARGS__)

-:34: CHECK:MACRO_ARG_REUSE: Macro argument reuse '_gt' - possible side-effects?
#34: FILE: drivers/gpu/drm/i915/gt/intel_gt.h:28:
+#define GT_DBG(_gt, _fmt, ...) \
+   drm_dbg(&(_gt)->i915->drm, "GT%u: " _fmt, (_gt)->info.id, ##__VA_ARGS__)

total: 0 errors, 0 warnings, 5 checks, 21 lines checked
bfd7d8c0cc66 drm/i915/uc: Update the gt/uc code to use GT_ERR and friends
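
The MACRO_ARG_REUSE checks above flag that '_gt' appears twice in each expansion, so an argument with side effects would be evaluated twice. Below is a minimal userspace sketch of that effect, not the driver code: struct fake_gt, GT_MSG_REUSE, GT_MSG_ONCE and next_gt() are hypothetical names invented for illustration, and snprintf() stands in for the drm_*() printers. The single-evaluation variant binds the argument to a local, the same pattern the existing GT_TRACE macro uses with gt__. Note that ##__VA_ARGS__ is a GNU extension (accepted by gcc and clang).

```c
#include <stdio.h>

/* Hypothetical stand-in for struct intel_gt; only the fields used here. */
struct fake_gt {
	unsigned int id;
	const char *dev_name;
};

/* Same shape as the flagged macros: '_gt' is expanded twice. */
#define GT_MSG_REUSE(_buf, _len, _gt, _fmt, ...) \
	snprintf(_buf, _len, "%s: GT%u: " _fmt, (_gt)->dev_name, (_gt)->id, ##__VA_ARGS__)

/* One possible alternative: evaluate '_gt' once by binding it to a local. */
#define GT_MSG_ONCE(_buf, _len, _gt, _fmt, ...) do {			\
	const struct fake_gt *gt__ = (_gt);				\
	snprintf(_buf, _len, "%s: GT%u: " _fmt,				\
		 gt__->dev_name, gt__->id, ##__VA_ARGS__);		\
} while (0)

static struct fake_gt gts[2] = { { 0, "card0" }, { 1, "card0" } };
static int lookups;

/* An argument expression with a side effect: it counts how often it runs. */
struct fake_gt *next_gt(void)
{
	return &gts[lookups++ % 2];
}

/* Returns how many times next_gt() ran when passed to the reusing macro. */
int lookups_with_reuse(char *buf, size_t len)
{
	lookups = 0;
	GT_MSG_REUSE(buf, len, next_gt(), "reset %d\n", 1);
	return lookups;
}

/* Returns how many times next_gt() ran when passed to the binding macro. */
int lookups_with_once(char *buf, size_t len)
{
	lookups = 0;
	GT_MSG_ONCE(buf, len, next_gt(), "reset %d\n", 1);
	return lookups;
}
```

With the reusing macro the helper runs twice, and which GT id ends up in the message depends on argument evaluation order. In practice callers of such wrappers usually pass a plain pointer, which is presumably why checkpatch reports this as a CHECK rather than an error.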




Re: [Intel-gfx] [PATCH] drm/i915: Don't wait forever in drop_caches

2022-11-04 Thread John Harrison

On 11/4/2022 03:01, Tvrtko Ursulin wrote:

On 03/11/2022 19:16, John Harrison wrote:

On 11/3/2022 02:38, Tvrtko Ursulin wrote:

On 03/11/2022 09:18, Tvrtko Ursulin wrote:

On 03/11/2022 01:33, John Harrison wrote:

On 11/2/2022 07:20, Tvrtko Ursulin wrote:

On 02/11/2022 12:12, Jani Nikula wrote:

On Tue, 01 Nov 2022, john.c.harri...@intel.com wrote:

From: John Harrison 

At the end of each test, IGT does a drop caches call via sysfs 
with


sysfs?
Sorry, that was meant to say debugfs. I've also been working on 
some sysfs IGT issues and evidently got my wires crossed!




special flags set. One of the possible paths waits for idle with an
infinite timeout. That causes problems for debugging issues when CI
catches a "can't go idle" test failure. Best case, the CI system times
out (after 90s), attempts a bunch of state dump actions and then
reboots the system to recover it. Worst case, the CI system can't do
anything at all and then times out (after 1000s) and simply reboots.
Sometimes a serial port log of dmesg might be available, sometimes not.

So rather than making life hard for ourselves, change the timeout to
be 10s rather than infinite. Also, trigger the standard
wedge/reset/recover sequence so that testing can continue with a
working system (if possible).

Signed-off-by: John Harrison 
---
  drivers/gpu/drm/i915/i915_debugfs.c | 7 ++-
  1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
b/drivers/gpu/drm/i915/i915_debugfs.c

index ae987e92251dd..9d916fbbfc27c 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -641,6 +641,9 @@ 
DEFINE_SIMPLE_ATTRIBUTE(i915_perf_noa_delay_fops,

    DROP_RESET_ACTIVE | \
    DROP_RESET_SEQNO | \
    DROP_RCU)
+
+#define DROP_IDLE_TIMEOUT    (HZ * 10)


I915_IDLE_ENGINES_TIMEOUT is defined in i915_drv.h. It's also only used
here.


So move here, dropping i915 prefix, next to the newly proposed one?

Sure, can do that.




I915_GEM_IDLE_TIMEOUT is defined in i915_gem.h. It's only used in
gt/intel_gt.c.


Move there and rename to GT_IDLE_TIMEOUT?

I915_GT_SUSPEND_IDLE_TIMEOUT is defined and used only in intel_gt_pm.c.


No action needed, maybe drop i915 prefix if wanted.

These two are totally unrelated and in code not being touched by 
this change. I would rather not conflate changing random other 
things with fixing this specific issue.



I915_IDLE_ENGINES_TIMEOUT is in ms, the rest are in jiffies.


Add _MS suffix if wanted.


My head spins.


I follow and raise that the newly proposed DROP_IDLE_TIMEOUT 
applies to DROP_ACTIVE and not only DROP_IDLE.
My original intention for the name was that is the 'drop caches 
timeout for intel_gt_wait_for_idle'. Which is quite the mouthful 
and hence abbreviated to DROP_IDLE_TIMEOUT. But yes, I realised 
later that name can be conflated with the DROP_IDLE flag. Will 
rename.
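
The unit mismatch raised earlier in this thread (one timeout in milliseconds, the others in jiffies) can be sketched in userspace as follows. This is only an illustration under stated assumptions: TOY_HZ is an assumed tick rate, and toy_jiffies_to_ms() and toy_ms_to_jiffies() are hypothetical helpers written for this sketch, not the kernel's conversion routines. TOY_DROP_TIMEOUT mirrors the proposed (HZ * 10).

```c
/* Assumed tick rate for illustration; a real kernel fixes HZ at build time. */
#define TOY_HZ 250UL

/* Mirrors the proposed (HZ * 10): ten seconds expressed in jiffies. */
#define TOY_DROP_TIMEOUT (TOY_HZ * 10UL)

/* Hypothetical helper: convert a jiffies count to milliseconds. */
unsigned long toy_jiffies_to_ms(unsigned long jiffies)
{
	return jiffies * 1000UL / TOY_HZ;
}

/* Hypothetical helper: convert milliseconds to jiffies, rounding up. */
unsigned long toy_ms_to_jiffies(unsigned long ms)
{
	return (ms * TOY_HZ + 999UL) / 1000UL;
}
```

Under this assumed tick rate, a value of 10000 meant as milliseconds but handed to an interface expecting jiffies would be treated as 40 seconds rather than 10, which is the kind of confusion an _MS suffix is meant to prevent.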





Things get refactored, code moves around, bits get left behind, 
who knows. No reason to get too worked up. :) As long as people 
are taking a wider view when touching the code base, and are not 
afraid to send cleanups, things should be good.
On the other hand, if every patch gets blocked in code review 
because someone points out some completely unrelated piece of code 
could be a bit better then nothing ever gets fixed. If you spot 
something that you think should be improved, isn't the general 
idea that you should post a patch yourself to improve it?


There's two maintainers per branch and an order of magnitude or two 
more developers so it'd be nice if cleanups would just be incoming 
on self-initiative basis. ;)


For the actual functional change at hand - it would be nice if 
code paths in question could handle SIGINT and then we could punt 
the decision on how long someone wants to wait purely to 
userspace. But it's probably hard and it's only debugfs so whatever.


The code paths in question will already abort on a signal won't 
they? Both intel_gt_wait_for_idle() and 
intel_guc_wait_for_pending_msg(), which is where the 
uc_wait_for_idle eventually ends up, have an 'if(signal_pending) 
return -EINTR;' check. Beyond that, it sounds like what you are 
asking for is a change in the IGT libraries and/or CI framework to 
start sending signals after some specific timeout. That seems like 
a significantly more complex change (in terms of the number of 
entities affected and number of groups involved) and unnecessary.


If you say so, I haven't looked at them all. But if the code path 
in question already aborts on signals then I am not sure what is 
the patch fixing? I assumed you are trying to avoid the write stuck 
in D forever, which then prevents driver unload and everything, 
requiring the test runner to eventually reboot. If you say SIGINT 
works then you can already recover from userspace, no?


Whether or not 10s is enough CI will hopefully tell us. I'd 
probably err on the side of safety and make it longer,

Re: [Intel-gfx] [PATCH v2 5/5] drm/i915/mtl: don't expose GSC command streamer to the user

2022-11-04 Thread Matt Roper
On Wed, Nov 02, 2022 at 10:10:47AM -0700, Daniele Ceraolo Spurio wrote:
> There is no userspace user for this CS yet, we only need it for internal
> kernel ops (e.g. HuC, PXP), so don't expose it.
> 
> v2: even if it's not exposed, rename the engine so it is easier to
> identify in the debug logs (Matt)
> 
> Signed-off-by: Daniele Ceraolo Spurio 
> Cc: Matt Roper 

Reviewed-by: Matt Roper 

> ---
>  drivers/gpu/drm/i915/gt/intel_engine_user.c | 27 -
>  1 file changed, 21 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c 
> b/drivers/gpu/drm/i915/gt/intel_engine_user.c
> index 79312b734690..cd4f1b126f75 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
> @@ -191,6 +191,15 @@ static void add_legacy_ring(struct legacy_ring *ring,
>   ring->instance++;
>  }
>  
> +static void engine_rename(struct intel_engine_cs *engine, const char *name, u16 instance)
> +{
> + char old[sizeof(engine->name)];
> +
> + memcpy(old, engine->name, sizeof(engine->name));
> + scnprintf(engine->name, sizeof(engine->name), "%s%u", name, instance);
> + drm_dbg(&engine->i915->drm, "renamed %s to %s\n", old, engine->name);
> +}
> +
>  void intel_engines_driver_register(struct drm_i915_private *i915)
>  {
>   struct legacy_ring ring = {};
> @@ -206,11 +215,19 @@ void intel_engines_driver_register(struct 
> drm_i915_private *i915)
>   struct intel_engine_cs *engine =
>   container_of((struct rb_node *)it, typeof(*engine),
>uabi_node);
> - char old[sizeof(engine->name)];
>  
>   if (intel_gt_has_unrecoverable_error(engine->gt))
>   continue; /* ignore incomplete engines */
>  
> + /*
> +  * We don't want to expose the GSC engine to the users, but we
> +  * still rename it so it is easier to identify in the debug logs
> +  */
> + if (engine->id == GSC0) {
> + engine_rename(engine, "gsc", 0);
> + continue;
> + }
> +
>   GEM_BUG_ON(engine->class >= ARRAY_SIZE(uabi_classes));
>   engine->uabi_class = uabi_classes[engine->class];
>  
> @@ -220,11 +237,9 @@ void intel_engines_driver_register(struct 
> drm_i915_private *i915)
>   i915->engine_uabi_class_count[engine->uabi_class]++;
>  
>   /* Replace the internal name with the final user facing name */
> - memcpy(old, engine->name, sizeof(engine->name));
> - scnprintf(engine->name, sizeof(engine->name), "%s%u",
> -   intel_engine_class_repr(engine->class),
> -   engine->uabi_instance);
> - DRM_DEBUG_DRIVER("renamed %s to %s\n", old, engine->name);
> + engine_rename(engine,
> +   intel_engine_class_repr(engine->class),
> +   engine->uabi_instance);
>  
>   rb_link_node(&engine->uabi_node, prev, p);
>   rb_insert_color(&engine->uabi_node, &i915->uabi_engines);
> -- 
> 2.37.3
> 

-- 
Matt Roper
Graphics Software Engineer
VTT-OSGC Platform Enablement
Intel Corporation


[Intel-gfx] [PATCH 0/2] Add GT oriented dmesg output

2022-11-04 Thread John . C . Harrison
From: John Harrison 

When trying to analyse bug reports from CI, customers, etc. it can be
difficult to work out exactly what is happening on which GT in a
multi-GT system. So add GT oriented debug/error message wrappers. If
used instead of the drm_ equivalents, you get the same output but with
a GT# prefix on it.

This patch also updates the gt/uc files to use the new helpers as a
first step. The intention would be to convert all output messages that
have access to a GT structure.

Signed-off-by: John Harrison 


John Harrison (2):
  drm/i915/gt: Add GT oriented dmesg output
  drm/i915/uc: Update the gt/uc code to use GT_ERR and friends

 drivers/gpu/drm/i915/gt/intel_gt.h| 15 +++
 drivers/gpu/drm/i915/gt/uc/intel_guc.c| 25 +++--
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c|  9 +-
 .../gpu/drm/i915/gt/uc/intel_guc_capture.c| 50 --
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c |  9 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c | 17 ++--
 drivers/gpu/drm/i915/gt/uc/intel_guc_log.c| 49 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_rc.c |  3 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c   |  6 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 56 ++--
 drivers/gpu/drm/i915/gt/uc/intel_huc.c| 20 ++--
 drivers/gpu/drm/i915/gt/uc/intel_uc.c | 84 -
 drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c  | 91 +--
 drivers/gpu/drm/i915/gt/uc/selftest_guc.c | 36 
 .../drm/i915/gt/uc/selftest_guc_hangcheck.c   | 22 ++---
 .../drm/i915/gt/uc/selftest_guc_multi_lrc.c   | 10 +-
 16 files changed, 243 insertions(+), 259 deletions(-)

-- 
2.37.3



[Intel-gfx] [PATCH 2/2] drm/i915/uc: Update the gt/uc code to use GT_ERR and friends

2022-11-04 Thread John . C . Harrison
From: John Harrison 

Use the new GT oriented output message helpers where possible.

Signed-off-by: John Harrison 
---
 drivers/gpu/drm/i915/gt/uc/intel_guc.c| 25 +++--
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c|  9 +-
 .../gpu/drm/i915/gt/uc/intel_guc_capture.c| 50 --
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c |  9 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c | 17 ++--
 drivers/gpu/drm/i915/gt/uc/intel_guc_log.c| 49 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_rc.c |  3 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c   |  6 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 56 ++--
 drivers/gpu/drm/i915/gt/uc/intel_huc.c| 20 ++--
 drivers/gpu/drm/i915/gt/uc/intel_uc.c | 84 -
 drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c  | 91 +--
 drivers/gpu/drm/i915/gt/uc/selftest_guc.c | 36 
 .../drm/i915/gt/uc/selftest_guc_hangcheck.c   | 22 ++---
 .../drm/i915/gt/uc/selftest_guc_multi_lrc.c   | 10 +-
 15 files changed, 228 insertions(+), 259 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
index 27b09ba1d295f..36983a2cc20e8 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
@@ -330,7 +330,7 @@ static void guc_init_params(struct intel_guc *guc)
params[GUC_CTL_DEVID] = guc_ctl_devid(guc);
 
for (i = 0; i < GUC_CTL_MAX_DWORDS; i++)
-   DRM_DEBUG_DRIVER("param[%2d] = %#x\n", i, params[i]);
+   GT_DBG(guc_to_gt(guc), "param[%2d] = %#x\n", i, params[i]);
 }
 
 /*
@@ -475,8 +475,8 @@ void intel_guc_fini(struct intel_guc *guc)
 int intel_guc_send_mmio(struct intel_guc *guc, const u32 *request, u32 len,
u32 *response_buf, u32 response_buf_size)
 {
-   struct drm_i915_private *i915 = guc_to_gt(guc)->i915;
-   struct intel_uncore *uncore = guc_to_gt(guc)->uncore;
+   struct intel_gt *gt = guc_to_gt(guc);
+   struct intel_uncore *uncore = gt->uncore;
u32 header;
int i;
int ret;
@@ -510,8 +510,7 @@ int intel_guc_send_mmio(struct intel_guc *guc, const u32 
*request, u32 len,
   10, 10, &header);
if (unlikely(ret)) {
 timeout:
-   drm_err(&i915->drm, "mmio request %#x: no reply %x\n",
-   request[0], header);
+   GT_ERR(gt, "mmio request %#x: no reply %x\n", request[0], header);
goto out;
}
 
@@ -532,8 +531,7 @@ int intel_guc_send_mmio(struct intel_guc *guc, const u32 
*request, u32 len,
if (FIELD_GET(GUC_HXG_MSG_0_TYPE, header) == 
GUC_HXG_TYPE_NO_RESPONSE_RETRY) {
u32 reason = FIELD_GET(GUC_HXG_RETRY_MSG_0_REASON, header);
 
-   drm_dbg(&i915->drm, "mmio request %#x: retrying, reason %u\n",
-   request[0], reason);
+   GT_DBG(gt, "mmio request %#x: retrying, reason %u\n", request[0], reason);
goto retry;
}
 
@@ -541,16 +539,14 @@ int intel_guc_send_mmio(struct intel_guc *guc, const u32 
*request, u32 len,
u32 hint = FIELD_GET(GUC_HXG_FAILURE_MSG_0_HINT, header);
u32 error = FIELD_GET(GUC_HXG_FAILURE_MSG_0_ERROR, header);
 
-   drm_err(&i915->drm, "mmio request %#x: failure %x/%u\n",
-   request[0], error, hint);
+   GT_ERR(gt, "mmio request %#x: failure %x/%u\n", request[0], error, hint);
ret = -ENXIO;
goto out;
}
 
if (FIELD_GET(GUC_HXG_MSG_0_TYPE, header) != 
GUC_HXG_TYPE_RESPONSE_SUCCESS) {
 proto:
-   drm_err(&i915->drm, "mmio request %#x: unexpected reply %#x\n",
-   request[0], header);
+   GT_ERR(gt, "mmio request %#x: unexpected reply %#x\n", request[0], header);
ret = -EPROTO;
goto out;
}
@@ -592,9 +588,9 @@ int intel_guc_to_host_process_recv_msg(struct intel_guc 
*guc,
msg = payload[0] & guc->msg_enabled_mask;
 
if (msg & INTEL_GUC_RECV_MSG_CRASH_DUMP_POSTED)
-   drm_err(&guc_to_gt(guc)->i915->drm, "Received early GuC crash dump notification!\n");
+   GT_ERR(guc_to_gt(guc), "Received early GuC crash dump notification!\n");
if (msg & INTEL_GUC_RECV_MSG_EXCEPTION)
-   drm_err(&guc_to_gt(guc)->i915->drm, "Received early GuC exception notification!\n");
+   GT_ERR(guc_to_gt(guc), "Received early GuC exception notification!\n");
 
return 0;
 }
@@ -648,7 +644,8 @@ int intel_guc_suspend(struct intel_guc *guc)
 */
ret = intel_guc_send_mmio(guc, action, ARRAY_SIZE(action), 
NULL, 0);
if (ret)
-   DRM_ERROR("GuC suspend: RESET_CLIENT action failed with 
error %d!\n", ret);
+   GT_ERR(guc_to_gt(guc),
+  

[Intel-gfx] [PATCH 1/2] drm/i915/gt: Add GT oriented dmesg output

2022-11-04 Thread John . C . Harrison
From: John Harrison 

When trying to analyse bug reports from CI, customers, etc. it can be
difficult to work out exactly what is happening on which GT in a
multi-GT system. So add GT oriented debug/error message wrappers. If
used instead of the drm_ equivalents, you get the same output but with
a GT# prefix on it.

Signed-off-by: John Harrison 
---
 drivers/gpu/drm/i915/gt/intel_gt.h | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h 
b/drivers/gpu/drm/i915/gt/intel_gt.h
index e0365d5562484..1e016fb0117a4 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt.h
@@ -13,6 +13,21 @@
 struct drm_i915_private;
 struct drm_printer;
 
+#define GT_ERR(_gt, _fmt, ...) \
+   drm_err(&(_gt)->i915->drm, "GT%u: " _fmt, (_gt)->info.id, ##__VA_ARGS__)
+
+#define GT_WARN(_gt, _fmt, ...) \
+   drm_warn(&(_gt)->i915->drm, "GT%u: " _fmt, (_gt)->info.id, ##__VA_ARGS__)
+
+#define GT_NOTICE(_gt, _fmt, ...) \
+   drm_notice(&(_gt)->i915->drm, "GT%u: " _fmt, (_gt)->info.id, ##__VA_ARGS__)
+
+#define GT_INFO(_gt, _fmt, ...) \
+   drm_info(&(_gt)->i915->drm, "GT%u: " _fmt, (_gt)->info.id, ##__VA_ARGS__)
+
+#define GT_DBG(_gt, _fmt, ...) \
+   drm_dbg(&(_gt)->i915->drm, "GT%u: " _fmt, (_gt)->info.id, ##__VA_ARGS__)
+
 #define GT_TRACE(gt, fmt, ...) do {\
const struct intel_gt *gt__ __maybe_unused = (gt);  \
GEM_TRACE("%s " fmt, dev_name(gt__->i915->drm.dev), \
-- 
2.37.3



[Intel-gfx] ✓ Fi.CI.IGT: success for drm/fb-helper: Untangle fbdev emulation and helpers (rev3)

2022-11-04 Thread Patchwork
== Series Details ==

Series: drm/fb-helper: Untangle fbdev emulation and helpers (rev3)
URL   : https://patchwork.freedesktop.org/series/109942/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_12339_full -> Patchwork_109942v3_full


Summary
---

  **SUCCESS**

  No regressions found.

  

Participating hosts (11 -> 11)
--

  No changes in participating hosts

Possible new issues
---

  Here are the unknown changes that may have been introduced in 
Patchwork_109942v3_full:

### IGT changes ###

 Suppressed 

  The following results come from untrusted machines, tests, or statuses.
  They do not affect the overall result.

  * igt@kms_cursor_legacy@single-move@pipe-b:
- {shard-rkl}:[PASS][1] -> ([INCOMPLETE][2], [PASS][3])
   [1]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-rkl-1/igt@kms_cursor_legacy@single-m...@pipe-b.html
   [2]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109942v3/shard-rkl-5/igt@kms_cursor_legacy@single-m...@pipe-b.html
   [3]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109942v3/shard-rkl-6/igt@kms_cursor_legacy@single-m...@pipe-b.html

  
Known issues


  Here are the changes found in Patchwork_109942v3_full that come from known 
issues:

### CI changes ###

 Possible fixes 

  * boot:
- shard-snb:  ([PASS][4], [PASS][5], [PASS][6], [PASS][7], 
[PASS][8], [PASS][9], [PASS][10], [PASS][11], [PASS][12], [PASS][13], 
[PASS][14], [PASS][15], [PASS][16], [PASS][17], [PASS][18], [PASS][19], 
[FAIL][20], [PASS][21], [PASS][22], [PASS][23], [PASS][24], [PASS][25], 
[PASS][26], [PASS][27], [PASS][28]) ([i915#4338]) -> ([PASS][29], [PASS][30], 
[PASS][31], [PASS][32], [PASS][33], [PASS][34], [PASS][35], [PASS][36], 
[PASS][37], [PASS][38], [PASS][39], [PASS][40], [PASS][41], [PASS][42], 
[PASS][43], [PASS][44], [PASS][45], [PASS][46], [PASS][47], [PASS][48], 
[PASS][49], [PASS][50], [PASS][51], [PASS][52], [PASS][53])
   [4]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-snb2/boot.html
   [5]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-snb2/boot.html
   [6]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-snb2/boot.html
   [7]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-snb7/boot.html
   [8]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-snb7/boot.html
   [9]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-snb7/boot.html
   [10]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-snb7/boot.html
   [11]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-snb7/boot.html
   [12]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-snb6/boot.html
   [13]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-snb6/boot.html
   [14]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-snb6/boot.html
   [15]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-snb6/boot.html
   [16]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-snb5/boot.html
   [17]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-snb5/boot.html
   [18]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-snb5/boot.html
   [19]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-snb5/boot.html
   [20]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-snb5/boot.html
   [21]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-snb5/boot.html
   [22]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-snb4/boot.html
   [23]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-snb4/boot.html
   [24]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-snb4/boot.html
   [25]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-snb4/boot.html
   [26]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-snb4/boot.html
   [27]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-snb2/boot.html
   [28]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-snb2/boot.html
   [29]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109942v3/shard-snb4/boot.html
   [30]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109942v3/shard-snb4/boot.html
   [31]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109942v3/shard-snb4/boot.html
   [32]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109942v3/shard-snb4/boot.html
   [33]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109942v3/shard-snb2/boot.html
   [34]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109942v3/shard-snb2/boot.html
   [35]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109942v3/shard-snb2/boot.html
   [36]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109942v3/shard-snb2/boot.html
   [37]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109942v3/shard-snb7/boot.h

Re: [Intel-gfx] [RFC][PATCH v3 00/33] timers: Use timer_shutdown*() before freeing timers

2022-11-04 Thread Linus Torvalds
On Thu, Nov 3, 2022 at 10:48 PM Steven Rostedt  wrote:
>
> Ideally, I would have the first patch go into this rc cycle, which is mostly
> non functional as it will allow the other patches to come in via the
> respective subsystems in the next merge window.

Ack.

I also wonder if we could do the completely trivially correct
conversions immediately.

I'm talking about the scripted ones where it's currently a
"del_timer_sync()", and the very next action is freeing whatever data
structure the timer is in (possibly with something like free_irq() in
between - my point is that there's an unconditional free that is very
clear and unambiguous), so that there is absolutely no question about
whether they should use "timer_shutdown_sync()" or not.

IOW, things like patches 03, 17 and 31, and at least parts of others in
this series.

This series clearly has several much more complex cases that need
actual real code review, and I think it would help to have the
completely unambiguous cases out of the way, just to get rid of noise.

So I'd take that first patch, and a scripted set of "this cannot
change any semantics" patches early.

Linus
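
The "trivially correct" case described above, a synchronous timer delete followed directly by an unconditional free, relies on the shutdown variant also refusing any later re-arm. Here is a toy userspace model of that difference, offered as a sketch only. Every toy_* name is hypothetical, and modelling shutdown by clearing the callback pointer is an assumption made for illustration, not a statement about how the kernel implements timer_shutdown_sync().

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a timer; NOT the kernel's struct timer_list. */
struct toy_timer {
	void (*function)(struct toy_timer *t);
	bool pending;
};

static void toy_callback(struct toy_timer *t)
{
	(void)t;
}

/* Re-arm request: refused once the timer has been shut down. */
bool toy_mod_timer(struct toy_timer *t)
{
	if (!t->function)
		return false;
	t->pending = true;
	return true;
}

/* Models a plain synchronous delete: dequeued, but can be re-armed later. */
void toy_del_timer_sync(struct toy_timer *t)
{
	t->pending = false;
}

/* Models a shutdown: dequeued and (by assumption) re-arming is blocked. */
void toy_timer_shutdown_sync(struct toy_timer *t)
{
	t->pending = false;
	t->function = NULL;
}

/* Can the timer be queued again after a plain delete? */
bool rearm_after_del(void)
{
	struct toy_timer t = { toy_callback, true };

	toy_del_timer_sync(&t);
	return toy_mod_timer(&t);
}

/* Can the timer be queued again after a shutdown? */
bool rearm_after_shutdown(void)
{
	struct toy_timer t = { toy_callback, true };

	toy_timer_shutdown_sync(&t);
	return toy_mod_timer(&t);
}
```

In this model a stray re-arm after a plain delete succeeds, which is the window that makes freeing the containing structure unsafe; after shutdown the re-arm is refused, so an immediately following free cannot race with a newly queued timer.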


[Intel-gfx] ✓ Fi.CI.BAT: success for vfio-ccw parent rework (rev3)

2022-11-04 Thread Patchwork
== Series Details ==

Series: vfio-ccw parent rework (rev3)
URL   : https://patchwork.freedesktop.org/series/109899/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_12344 -> Patchwork_109899v3


Summary
---

  **SUCCESS**

  No regressions found.

  External URL: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109899v3/index.html

Participating hosts (41 -> 29)
--

  Additional (1): fi-cml-u2 
  Missing(13): fi-bdw-samus fi-tgl-dsi bat-dg2-8 bat-dg2-9 bat-adlp-6 
bat-adlp-4 fi-ctg-p8600 bat-adln-1 bat-rplp-1 bat-rpls-1 bat-rpls-2 bat-dg2-11 
bat-jsl-1 

Possible new issues
---

  Here are the unknown changes that may have been introduced in 
Patchwork_109899v3:

### IGT changes ###

 Suppressed 

  The following results come from untrusted machines, tests, or statuses.
  They do not affect the overall result.

  * {igt@debugfs_test@basic-hwmon}:
- fi-cml-u2:  NOTRUN -> [SKIP][1]
   [1]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109899v3/fi-cml-u2/igt@debugfs_t...@basic-hwmon.html

  
Known issues


  Here are the changes found in Patchwork_109899v3 that come from known issues:

### IGT changes ###

 Issues hit 

  * igt@gem_exec_fence@basic-busy@bcs0:
- fi-cml-u2:  NOTRUN -> [SKIP][2] ([i915#1208]) +1 similar issue
   [2]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109899v3/fi-cml-u2/igt@gem_exec_fence@basic-b...@bcs0.html

  * igt@gem_exec_gttfill@basic:
- fi-pnv-d510:[PASS][3] -> [FAIL][4] ([i915#7229])
   [3]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12344/fi-pnv-d510/igt@gem_exec_gttf...@basic.html
   [4]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109899v3/fi-pnv-d510/igt@gem_exec_gttf...@basic.html

  * igt@gem_huc_copy@huc-copy:
- fi-cml-u2:  NOTRUN -> [SKIP][5] ([i915#2190])
   [5]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109899v3/fi-cml-u2/igt@gem_huc_c...@huc-copy.html

  * igt@gem_lmem_swapping@verify-random:
- fi-cml-u2:  NOTRUN -> [SKIP][6] ([i915#4613]) +3 similar issues
   [6]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109899v3/fi-cml-u2/igt@gem_lmem_swapp...@verify-random.html

  * igt@i915_selftest@live@hangcheck:
- fi-hsw-4770:[PASS][7] -> [INCOMPLETE][8] ([i915#4785])
   [7]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12344/fi-hsw-4770/igt@i915_selftest@l...@hangcheck.html
   [8]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109899v3/fi-hsw-4770/igt@i915_selftest@l...@hangcheck.html

  * igt@i915_suspend@basic-s3-without-i915:
- fi-rkl-11600:   [PASS][9] -> [INCOMPLETE][10] ([i915#4817])
   [9]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12344/fi-rkl-11600/igt@i915_susp...@basic-s3-without-i915.html
   [10]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109899v3/fi-rkl-11600/igt@i915_susp...@basic-s3-without-i915.html

  * igt@kms_chamelium@vga-hpd-fast:
- fi-cml-u2:  NOTRUN -> [SKIP][11] ([fdo#109284] / [fdo#111827]) +8 
similar issues
   [11]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109899v3/fi-cml-u2/igt@kms_chamel...@vga-hpd-fast.html

  * igt@kms_cursor_legacy@basic-busy-flip-before-cursor:
- fi-cml-u2:  NOTRUN -> [SKIP][12] ([i915#4213])
   [12]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109899v3/fi-cml-u2/igt@kms_cursor_leg...@basic-busy-flip-before-cursor.html

  * igt@kms_force_connector_basic@force-load-detect:
- fi-cml-u2:  NOTRUN -> [SKIP][13] ([fdo#109285])
   [13]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109899v3/fi-cml-u2/igt@kms_force_connector_ba...@force-load-detect.html

  * igt@kms_pipe_crc_basic@suspend-read-crc@pipe-a-edp-1:
- fi-cml-u2:  NOTRUN -> [INCOMPLETE][14] ([i915#7379])
   [14]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109899v3/fi-cml-u2/igt@kms_pipe_crc_basic@suspend-read-...@pipe-a-edp-1.html

  * igt@kms_setmode@basic-clone-single-crtc:
- fi-cml-u2:  NOTRUN -> [SKIP][15] ([i915#3555])
   [15]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109899v3/fi-cml-u2/igt@kms_setm...@basic-clone-single-crtc.html

  * igt@prime_vgem@basic-userptr:
- fi-cml-u2:  NOTRUN -> [SKIP][16] ([fdo#109295] / [i915#3301])
   [16]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109899v3/fi-cml-u2/igt@prime_v...@basic-userptr.html

  * igt@runner@aborted:
- fi-hsw-4770:NOTRUN -> [FAIL][17] ([fdo#109271] / [i915#4312] / 
[i915#5594])
   [17]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109899v3/fi-hsw-4770/igt@run...@aborted.html

  
 Possible fixes 

  * igt@i915_selftest@live@gt_heartbeat:
- fi-apl-guc: [DMESG-FAIL][18] ([i915#5334]) -> [PASS][19]
   [18]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12344/fi-apl-guc/igt@i915_selftest@live@gt_heartbeat.html
   [19]: 
https://intel-gfx-ci.01.org/tree/

Re: [Intel-gfx] [PATCH v5 02/31] drm/i915: Don't register backlight when another backlight should be used (v2)

2022-11-04 Thread Hans de Goede
Hi Matthew, Rafael,

On 10/27/22 14:09, Rafael J. Wysocki wrote:
> On Thu, Oct 27, 2022 at 12:37 PM Hans de Goede  wrote:
>>
>> Hi,
>>
>> On 10/27/22 11:52, Matthew Garrett wrote:
>>> On Thu, Oct 27, 2022 at 11:39:38AM +0200, Hans de Goede wrote:
>>>
 The *only* behavior which actually is new in 6.1 is the native GPU
 drivers now doing the equivalent of:

  if (acpi_video_get_backlight_type() != acpi_backlight_native)
  return;

 In their backlight register paths (i), which is causing the native
 backlight to disappear on your custom laptop setup and on Chromebooks
 (with the Chromebooks case being already solved I hope.).
>>>
>>> It's causing the backlight control to vanish on any machine that isn't
>>> ((acpi_video || vendor interface) || !acpi). Most machines that fall
>>> into that are either weird or Chromebooks or old, but there are machines
>>> that fall into that.
>>
>> I acknowledge that there are machines that fall into this category,
>> but I expect / hope there to be so few of them that we can just DMI
>> quirk our way out of this.
>>
>> I believe the old group to be small because:
>>
>> 1. Generally speaking the "native" control method is usually not
>> present on the really old (pre ACPI video spec) mobile GPUs.
>>
>> 2. On most old laptops I would still expect there to be a vendor
>> interface too, and if both get registered standard desktop environments
>> will prefer the vendor one, so then we need a native DMI quirk to
>> disable the vendor interface anyways and we already have a bunch of
>> those, so some laptops in this group are already covered by DMI quirks.
>>
>> And a fix for the Chromebook case is already in Linus' tree, which
>> just leaves the weird case, of which there will hopefully be only
>> a few.
>>
>> I do share your worry that this might break some machines, but
>> the only way to really find out is to get this code out there
>> I'm afraid.
>>
>> I have just written a blog post asking for people to check if
>> their laptop might be affected; and to report various details
>> to me if their laptop is affected:
>>
>> https://hansdegoede.dreamwidth.org/26548.html
>>
>> Let's wait and see how this goes. If I get (too) many reports then
>> I will send a revert of the addition of the:
>>
>> if (acpi_video_get_backlight_type() != acpi_backlight_native)
>> return;
>>
>> check to the i915 / radeon / amd / nouveau drivers.
>>
>> (And if I only get a couple of reports I will probably just submit
>> DMI quirks for the affected models).
> 
> Sounds reasonable to me, FWIW.

I have received quite a few test reports as a result of my blogpost
(and of the blogpost's mention in an arstechnica article).

Long story short, Matthew, you are right. Quite a few laptop models
will end up with an empty /sys/class/backlight because of the native
backlight class devices no longer registering when
acpi_video_backlight_use_native() returns false.

I will submit a patch-set later today to fix this (by making 
acpi_video_backlight_use_native() always return true for now).
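
To make the failure mode concrete, here is a minimal userspace C sketch of
the check described above. It is a toy model, not kernel code: the `laptop`
struct, `native_registers()`, `sysfs_entries()` and the `native_only` example
machine are hypothetical names invented for illustration, and only the
comparison against the native type mirrors the quoted snippet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the backlight types the heuristics can pick. */
enum backlight_type { BL_VIDEO, BL_VENDOR, BL_NATIVE };

struct laptop {
	const char *name;
	enum backlight_type detected; /* what the heuristics return */
	bool has_video;               /* a working acpi_video interface exists */
	bool has_vendor;              /* a working vendor interface exists */
	bool has_native;              /* a working native (GPU) interface exists */
};

/* Models the 6.1 check in the native driver's register path. */
static bool native_registers(const struct laptop *l, bool always_use_native)
{
	assert(l != NULL);
	if (!l->has_native)
		return false;
	if (always_use_native) /* models the proposed workaround */
		return true;
	return l->detected == BL_NATIVE;
}

/* Number of entries that would show up under /sys/class/backlight. */
static int sysfs_entries(const struct laptop *l, bool always_use_native)
{
	int n = 0;

	assert(l != NULL);
	if (l->has_video && l->detected == BL_VIDEO)
		n++;
	if (l->has_vendor && l->detected == BL_VENDOR)
		n++;
	if (native_registers(l, always_use_native))
		n++;
	return n;
}

/* Hypothetical machine: only a native interface, heuristics say "vendor". */
static const struct laptop native_only = {
	.name = "example laptop",
	.detected = BL_VENDOR,
	.has_video = false,
	.has_vendor = false,
	.has_native = true,
};
```

For a machine like `native_only`, `sysfs_entries(&native_only, false)` yields
0 (the empty /sys/class/backlight seen in the test reports), while passing
`true`, which models the planned workaround, yields 1.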

More detailed summary/analysis of the received test reports:

-30 unaffected models

-The following laptop models:
 Acer Aspire 1640
 Apple MacBook 2.1
 Apple MacBook 4.1
 Apple MacBook Pro 7.1 (uses nv_backlight instead of intel_backlight!)
 HP Compaq nc6120
 IBM ThinkPad X40
 System76 Starling Star1

 All only have a native intel_backlight interface, and the heuristics from
 acpi_video_get_backlight_type() return acpi_backlight_vendor there. Because
 the changes in 6.1 do not register native backlights when
 acpi_video_backlight_use_native() returns false, this results in an empty
 /sys/class/backlight, breaking users' ability to control their laptop
 panel's brightness.

 I will submit a patch to always make acpi_video_backlight_use_native()
 return true for now to work around this for 6.1.

 I do plan to try to re-introduce that change later. First I need to
 change the heuristics to return native on more models, so that models
 where the native backlight is the only (working) entry will
 return native.

-The Dell N1410 has acpi_video support and acpi_osi_is_win8() returns false
 so acpi_video_get_backlight_type() returns acpi_video, but acpi_video
 fails to register a backlight device due to a _BCM eval error.
 The intel_backlight interface works fine, but this model is going to need
 a DMI-use-native-quirk to avoid intel_backlight disappearing when
 acpi_video_backlight_use_native() is changed back.

-The following laptop models actually use a vendor backlight control method,
 while also having a native backlight entry under /sys/class/backlight:

 Asus EeePC 901   -> native backlight confirmed to also work
 Dell Latitude D610   -> native backlight confirmed to work better than vendor
 Sony Vaio PCG-FRV3   -> native backlight not tested

 Note these will keep working the same as before in 6.1, independent of
 the revert. I've tracked these seperatel

[Intel-gfx] ✗ Fi.CI.DOCS: warning for vfio-ccw parent rework (rev3)

2022-11-04 Thread Patchwork
== Series Details ==

Series: vfio-ccw parent rework (rev3)
URL   : https://patchwork.freedesktop.org/series/109899/
State : warning

== Summary ==

Error: make htmldocs had i915 warnings
./drivers/gpu/drm/i915/i915_perf_types.h:319: warning: Function parameter or member 'lock' not described in 'i915_perf_stream'




[Intel-gfx] ✗ Fi.CI.IGT: failure for Add KUnit support for i915 driver

2022-11-04 Thread Patchwork
== Series Details ==

Series: Add KUnit support for i915 driver
URL   : https://patchwork.freedesktop.org/series/110483/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_12339_full -> Patchwork_110483v1_full


Summary
---

  **FAILURE**

  Serious unknown changes coming with Patchwork_110483v1_full absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_110483v1_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  

Participating hosts (11 -> 11)
--

  No changes in participating hosts

Possible new issues
---

  Here are the unknown changes that may have been introduced in 
Patchwork_110483v1_full:

### IGT changes ###

 Possible regressions 

  * igt@gem_exec_parallel@fds@bcs0:
- shard-skl:  [PASS][1] -> [INCOMPLETE][2]
   [1]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-skl5/igt@gem_exec_parallel@f...@bcs0.html
   [2]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110483v1/shard-skl3/igt@gem_exec_parallel@f...@bcs0.html

  * igt@kms_atomic_interruptible@legacy-dpms@edp-1-pipe-a:
- shard-skl:  NOTRUN -> [INCOMPLETE][3]
   [3]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110483v1/shard-skl2/igt@kms_atomic_interruptible@legacy-d...@edp-1-pipe-a.html

  * igt@kms_frontbuffer_tracking@psr-1p-primscrn-shrfb-msflip-blt:
- shard-tglb: [PASS][4] -> [INCOMPLETE][5]
   [4]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-tglb6/igt@kms_frontbuffer_track...@psr-1p-primscrn-shrfb-msflip-blt.html
   [5]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110483v1/shard-tglb8/igt@kms_frontbuffer_track...@psr-1p-primscrn-shrfb-msflip-blt.html

  
 Suppressed 

  The following results come from untrusted machines, tests, or statuses.
  They do not affect the overall result.

  * igt@kms_cursor_legacy@single-move@pipe-b:
- {shard-rkl}:[PASS][6] -> ([PASS][7], [INCOMPLETE][8])
   [6]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-rkl-1/igt@kms_cursor_legacy@single-m...@pipe-b.html
   [7]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110483v1/shard-rkl-4/igt@kms_cursor_legacy@single-m...@pipe-b.html
   [8]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110483v1/shard-rkl-5/igt@kms_cursor_legacy@single-m...@pipe-b.html

  * igt@perf_pmu@rc6-runtime-pm-long:
- {shard-rkl}:([PASS][9], [PASS][10]) -> ([INCOMPLETE][11], 
[PASS][12]) +1 similar issue
   [9]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-rkl-4/igt@perf_...@rc6-runtime-pm-long.html
   [10]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-rkl-3/igt@perf_...@rc6-runtime-pm-long.html
   [11]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110483v1/shard-rkl-2/igt@perf_...@rc6-runtime-pm-long.html
   [12]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110483v1/shard-rkl-5/igt@perf_...@rc6-runtime-pm-long.html

  
Known issues


  Here are the changes found in Patchwork_110483v1_full that come from known 
issues:

### CI changes ###

 Possible fixes 

  * boot:
- shard-snb:  ([PASS][13], [PASS][14], [PASS][15], [PASS][16], 
[PASS][17], [PASS][18], [PASS][19], [PASS][20], [PASS][21], [PASS][22], 
[PASS][23], [PASS][24], [PASS][25], [PASS][26], [PASS][27], [PASS][28], 
[FAIL][29], [PASS][30], [PASS][31], [PASS][32], [PASS][33], [PASS][34], 
[PASS][35], [PASS][36], [PASS][37]) ([i915#4338]) -> ([PASS][38], [PASS][39], 
[PASS][40], [PASS][41], [PASS][42], [PASS][43], [PASS][44], [PASS][45], 
[PASS][46], [PASS][47], [PASS][48], [PASS][49], [PASS][50], [PASS][51], 
[PASS][52], [PASS][53], [PASS][54], [PASS][55], [PASS][56], [PASS][57], 
[PASS][58], [PASS][59], [PASS][60], [PASS][61], [PASS][62])
   [13]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-snb2/boot.html
   [14]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-snb2/boot.html
   [15]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-snb2/boot.html
   [16]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-snb7/boot.html
   [17]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-snb7/boot.html
   [18]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-snb7/boot.html
   [19]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-snb7/boot.html
   [20]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-snb7/boot.html
   [21]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-snb6/boot.html
   [22]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-snb6/boot.html
   [23]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-snb6/boot.html
   [24]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12339/shard-sn

Re: [Intel-gfx] [PATCH] drm/i915/selftest: Bump up sample period for busy stats selftest

2022-11-04 Thread Tvrtko Ursulin



On 04/11/2022 14:58, Umesh Nerlige Ramappa wrote:

On Fri, Nov 04, 2022 at 08:29:38AM +, Tvrtko Ursulin wrote:


On 03/11/2022 18:08, Umesh Nerlige Ramappa wrote:

On Thu, Nov 03, 2022 at 12:28:46PM +, Tvrtko Ursulin wrote:


On 03/11/2022 00:11, Umesh Nerlige Ramappa wrote:

Engine busyness samples around a 10ms period is failing with busyness
ranging approx. from 87% to 115%. The expected range is +/- 5% of the
sample period.

When determining busyness of active engine, the GuC based engine
busyness implementation relies on a 64 bit timestamp register read. The
latency incurred by this register read causes the failure.

On DG1, when the test fails, the observed latencies range from 900us -
1.5ms.


Do I read this right - that the latency of a 64 bit timestamp 
register read is 0.9 - 1.5ms? That would be the read in 
guc_update_pm_timestamp?


Correct. That is total time taken by intel_uncore_read64_2x32() 
measured with local_clock().


One other thing I missed out in the comments is that enable_dc=0 also 
resolves the issue, but display team confirmed there is no relation 
to display in this case other than that it somehow introduces a 
latency in the reg read.


Could it be the DMC wreaking havoc something similar to b68763741aa2 
("drm/i915: Restore GT performance in headless mode with DMC loaded")?




__gt_unpark is already doing a
gt->awake = intel_display_power_get(i915, POWER_DOMAIN_GT_IRQ);

I would assume that __gt_unpark was called prior to running the 
selftest, need to confirm that though.


Right, I meant maybe something similar but not necessarily the same. 
Similar in the sense that it may be DMC doing many MMIO invisible to 
i915 and so introducing latency.



One solution tried was to reduce the latency between reg read and
CPU timestamp capture, but such optimization does not add value to user
since the CPU timestamp obtained here is only used for (1) selftest and
(2) i915 rps implementation specific to execlist scheduler. Also, this
solution only reduces the frequency of failure and does not eliminate
it.


Note that this solution is here - 
https://patchwork.freedesktop.org/patch/509991/?series=110497&rev=1


but I am not intending to use it since it just reduces the frequency 
of failures, but the inherent issue still exists.


Right, I'd just go with that as well if it makes a significant 
improvement. Or even just refactor intel_uncore_read64_2x32 to be 
under one spinlock/fw. I don't see that it can have an excuse to be 
less efficient since there's a loop in there.


The patch did reduce the failure to once in 200 runs vs once in 10 runs.
I will refactor the helper in that case.


Yeah it makes sense to make it efficient. But feel free to go with the 
msleep increase as well to work around the issue fully.
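
A back-of-the-envelope check of the numbers in this thread, as a small C
sketch. It assumes, simplistically, that the whole register-read latency
shows up as error in a single busyness sample; `error_pct()`,
`within_tolerance()` and `print_table()` are hypothetical helpers written for
this illustration, not i915 code.

```c
#include <assert.h>
#include <stdio.h>

#define TOLERANCE_PCT 5.0 /* the +/- 5% window quoted in the commit message */

/* Relative error (percent) if latency_us leaks into a period_us sample. */
static double error_pct(double latency_us, double period_us)
{
	assert(period_us > 0.0);
	return 100.0 * latency_us / period_us;
}

/* Nonzero if the modelled error stays inside the tolerance window. */
static int within_tolerance(double latency_us, double period_us)
{
	return error_pct(latency_us, period_us) <= TOLERANCE_PCT;
}

/* Print the error for the latencies and sample periods from the thread. */
static void print_table(void)
{
	const double latencies_us[] = { 900.0, 1500.0 };
	const double periods_us[] = { 10000.0, 100000.0 };
	size_t i, j;

	for (i = 0; i < 2; i++)
		for (j = 0; j < 2; j++)
			printf("latency %6.0f us, period %6.0f us: %5.2f%% (%s)\n",
			       latencies_us[i], periods_us[j],
			       error_pct(latencies_us[i], periods_us[j]),
			       within_tolerance(latencies_us[i], periods_us[j]) ?
			       "ok" : "out of range");
}
```

Under that assumption a 0.9 - 1.5 ms latency is a 9 - 15% error on a 10 ms
sample, outside the +/- 5% window, but only 0.9 - 1.5% on a 100 ms sample.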


Regards,

Tvrtko


Re: [Intel-gfx] ✗ Fi.CI.BAT: failure for drm/i915: Set PROBE_PREFER_ASYNCHRONOUS

2022-11-04 Thread Matthew Auld
On Thu, 3 Nov 2022 at 00:14, Brian Norris  wrote:
>
> On Wed, Nov 02, 2022 at 12:18:37PM +, Matthew Auld wrote:
> > On Tue, 1 Nov 2022 at 21:58, Brian Norris  wrote:
> > >
> > > On Fri, Oct 28, 2022 at 5:24 PM Patchwork
> > >  wrote:
> > > >
> > > > Patch Details
> > > > Series:drm/i915: Set PROBE_PREFER_ASYNCHRONOUS
> > > > URL:https://patchwork.freedesktop.org/series/110277/
> > > > State:failure
> > > > Details:https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110277v1/index.html
> > > >
> > > > CI Bug Log - changes from CI_DRM_12317 -> Patchwork_110277v1
> > > >
> > > > Summary
> > > >
> > > > FAILURE
> > > >
> > > > Serious unknown changes coming with Patchwork_110277v1 absolutely need 
> > > > to be
> > > > verified manually.
> > > >
> > > > If you think the reported changes have nothing to do with the changes
> > > > introduced in Patchwork_110277v1, please notify your bug team to allow 
> > > > them
> > > > to document this new failure mode, which will reduce false positives in 
> > > > CI.
> > > >
> > > > External URL: 
> > > > https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110277v1/index.html
> > >
> > > For the record, I have almost zero idea what to do with this. From
> > > what I can tell, most (all?) of these failures are flaky(?) already
> > > and are probably not related to my change.
> >
> > https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110277v1/index.html
> >
> > According to that link, this change appears to break every platform
> > when running the live selftests (looking at the purple squares).
> > Running the selftests normally involves loading and unloading the
> > module. Looking at the logs there is scary stuff like:
> >
> [...]
>
> Ah, thanks. I'm not sure what made me think the tests were failing the
> same way on drm-tip, but maybe just chalk that up to my unfamiliarity
> with this particular dashboard... (There are a few isolated failure
> and/or flakes on drm-tip, but they don't look like this.)
>
> Anyway, I think I managed to run some of these tests on my own platforms
> [1], and I don't reproduce those failures. I do see other failures
> (crashes) though, like in i915_gem_mman_live_selftests/igt_mmap, where
> igt_mmap_offset() (selftest-only code) -> vm_mmap() assumes we have a
> valid |current->mm|. But that's borrowing the modprobe process's memory
> map, and with async probe, the selftest sequence happens in a kernel
> worker instead (and current->mm is NULL). So that clearly won't work.

Semi related:
https://lore.kernel.org/intel-gfx/20221104134703.3770b371@maurocar-mobl2/T/#m888972bb1ffb0a913e3db8b4099dffdc2ec7a0dc

Sounds like a similar issue when trying to convert the live selftests
over to kunit.

>
> I suppose I could disable async probe when built as a module (I believe
> it doesn't really have any value, since the module load task just waits
> for the async task anyway). I'm not familiar enough with MM to know what
> the vm_mmap() alternatives are, but this particular bit of code does
> feel odd.
>
> Additionally, I think this implies that live_selftests will break if
> i915 is built-in (i.e., =y, not =m), as we'll again run in a
> kernel-thread context at boot time. But I would hope nobody is trying to
> run them that way? I guess this gets even hairier, because even if the
> driver is built into the kernel, it's possible to kick them off from a
> process context by tweaking the module parameters later, and then
> re-binding the device... So all in all, this bug leaves an ugly
> situation, with or without my patch.
>
> I'm still curious about the reported failures, but maybe they require
> some particular sequence of tests? I also don't have the full
> igt-gpu-tools set running, so maybe they do something a little
> differently than my steps in [1]?
>
> Brian
>
> [1] I have a GLk system, if it matters. I figured I can run some of
> these with any one of the following:
>
>   modprobe i915 live_selftests=1
>   modprobe i915 live_selftests=1 igt__20__live_workarounds=Y
>   modprobe i915 live_selftests=1 igt__19__live_uncore=Y
>   modprobe i915 live_selftests=1 igt__18__live_sanitycheck=Y
>   ...


Re: [Intel-gfx] [PATCH] drm/i915/selftest: Bump up sample period for busy stats selftest

2022-11-04 Thread Umesh Nerlige Ramappa

On Fri, Nov 04, 2022 at 08:29:38AM +, Tvrtko Ursulin wrote:


On 03/11/2022 18:08, Umesh Nerlige Ramappa wrote:

On Thu, Nov 03, 2022 at 12:28:46PM +, Tvrtko Ursulin wrote:


On 03/11/2022 00:11, Umesh Nerlige Ramappa wrote:

Engine busyness samples around a 10ms period is failing with busyness
ranging approx. from 87% to 115%. The expected range is +/- 5% of the
sample period.

When determining busyness of active engine, the GuC based engine
busyness implementation relies on a 64 bit timestamp register read. The
latency incurred by this register read causes the failure.

On DG1, when the test fails, the observed latencies range from 900us -
1.5ms.


Do I read this right - that the latency of a 64 bit timestamp 
register read is 0.9 - 1.5ms? That would be the read in 
guc_update_pm_timestamp?


Correct. That is total time taken by intel_uncore_read64_2x32() 
measured with local_clock().


One other thing I missed out in the comments is that enable_dc=0 
also resolves the issue, but display team confirmed there is no 
relation to display in this case other than that it somehow 
introduces a latency in the reg read.


Could it be the DMC wreaking havoc something similar to b68763741aa2 
("drm/i915: Restore GT performance in headless mode with DMC loaded")?




__gt_unpark is already doing a
gt->awake = intel_display_power_get(i915, POWER_DOMAIN_GT_IRQ);

I would assume that __gt_unpark was called prior to running the 
selftest, need to confirm that though.



One solution tried was to reduce the latency between reg read and
CPU timestamp capture, but such optimization does not add value to user
since the CPU timestamp obtained here is only used for (1) selftest and
(2) i915 rps implementation specific to execlist scheduler. Also, this
solution only reduces the frequency of failure and does not eliminate
it.


Note that this solution is here - 
https://patchwork.freedesktop.org/patch/509991/?series=110497&rev=1


but I am not intending to use it since it just reduces the frequency 
of failures, but the inherent issue still exists.


Right, I'd just go with that as well if it makes a significant 
improvement. Or even just refactor intel_uncore_read64_2x32 to be 
under one spinlock/fw. I don't see that it can have an excuse to be 
less efficient since there's a loop in there.


The patch did reduce the failure to once in 200 runs vs once in 10 runs.  


I will refactor the helper in that case.

Thanks,
Umesh



Regards,

Tvrtko


Regards,
Umesh



In order to make the selftest more robust and account for such
latencies, increase the sample period to 100 ms.

Signed-off-by: Umesh Nerlige Ramappa 
---
 drivers/gpu/drm/i915/gt/selftest_engine_pm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_pm.c 
b/drivers/gpu/drm/i915/gt/selftest_engine_pm.c

index 0dcb3ed44a73..87c94314cf67 100644
--- a/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
+++ b/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
@@ -317,7 +317,7 @@ static int live_engine_busy_stats(void *arg)
 ENGINE_TRACE(engine, "measuring busy time\n");
 preempt_disable();
 de = intel_engine_get_busy_time(engine, &t[0]);
-    mdelay(10);
+    mdelay(100);
 de = ktime_sub(intel_engine_get_busy_time(engine, &t[1]), de);
 preempt_enable();
 dt = ktime_sub(t[1], t[0]);


Re: [Intel-gfx] [PATCH] i915/pmu: Use a faster read for 2x32 mmio reads

2022-11-04 Thread Umesh Nerlige Ramappa

On Thu, Nov 03, 2022 at 10:10:14PM -0700, Dixit, Ashutosh wrote:

On Thu, 03 Nov 2022 11:07:05 -0700, Umesh Nerlige Ramappa wrote:




Hi Umesh,


PMU reads the GT timestamp as a 2x32 mmio read and since upper and lower
32 bit registers are read in a loop, there is a latency involved in
getting the GT timestamp. To reduce the latency, define another version
of the helper that requires caller to acquire uncore->spinlock and
necessary forcewakes.


Why does this reduce the latency compared to intel_uncore_read64_2x32?


Most of the error introduced is between the time we capture GPU and CPU 
timestamps. I believe, with intel_uncore_read64_2x32, the time taken for 
forcewake is also included in that time, so that adds up.


Regards,
Umesh


Thanks.
--
Ashutosh


Signed-off-by: Umesh Nerlige Ramappa 
---
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 26 ---
 drivers/gpu/drm/i915/intel_uncore.h   | 24 +
 2 files changed, 47 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 693b07a97789..64b0193c9ee4 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1252,6 +1252,28 @@ static u32 gpm_timestamp_shift(struct intel_gt *gt)
return 3 - shift;
 }

+static u64 gpm_timestamp(struct intel_uncore *uncore, ktime_t *now)
+{
+   enum forcewake_domains fw_domains;
+   u64 reg;
+
+   /* Assume MISC_STATUS0 and MISC_STATUS1 are in the same fw_domain */
+   fw_domains = intel_uncore_forcewake_for_reg(uncore,
+   MISC_STATUS0,
+   FW_REG_READ);
+
+   spin_lock_irq(&uncore->lock);
+   intel_uncore_forcewake_get__locked(uncore, fw_domains);
+
+   reg = intel_uncore_read64_2x32_fw(uncore, MISC_STATUS0, MISC_STATUS1);
+   *now = ktime_get();
+
+   intel_uncore_forcewake_put__locked(uncore, fw_domains);
+   spin_unlock_irq(&uncore->lock);
+
+   return reg;
+}
+
 static void guc_update_pm_timestamp(struct intel_guc *guc, ktime_t *now)
 {
struct intel_gt *gt = guc_to_gt(guc);
@@ -1261,10 +1283,8 @@ static void guc_update_pm_timestamp(struct intel_guc 
*guc, ktime_t *now)
lockdep_assert_held(&guc->timestamp.lock);

gt_stamp_hi = upper_32_bits(guc->timestamp.gt_stamp);
-   gpm_ts = intel_uncore_read64_2x32(gt->uncore, MISC_STATUS0,
- MISC_STATUS1) >> guc->timestamp.shift;
+   gpm_ts = gpm_timestamp(gt->uncore, now) >> guc->timestamp.shift;
gt_stamp_lo = lower_32_bits(gpm_ts);
-   *now = ktime_get();

if (gt_stamp_lo < lower_32_bits(guc->timestamp.gt_stamp))
gt_stamp_hi++;
diff --git a/drivers/gpu/drm/i915/intel_uncore.h 
b/drivers/gpu/drm/i915/intel_uncore.h
index 5449146a0624..dd0cf7d4ce6c 100644
--- a/drivers/gpu/drm/i915/intel_uncore.h
+++ b/drivers/gpu/drm/i915/intel_uncore.h
@@ -455,6 +455,30 @@ static inline void intel_uncore_rmw_fw(struct intel_uncore 
*uncore,
intel_uncore_write_fw(uncore, reg, val);
 }

+/*
+ * Introduce a _fw version of intel_uncore_read64_2x32 so that the 64 bit
+ * register read is as quick as possible.
+ *
+ * NOTE:
+ * Prior to calling this function, the caller must
+ * 1. obtain the uncore->lock
+ * 2. acquire forcewakes for the upper and lower register
+ */
+static inline u64
+intel_uncore_read64_2x32_fw(struct intel_uncore *uncore,
+   i915_reg_t lower_reg, i915_reg_t upper_reg)
+{
+   u32 upper, lower, old_upper, loop = 0;
+
+   upper = intel_uncore_read_fw(uncore, upper_reg);
+   do {
+   old_upper = upper;
+   lower = intel_uncore_read_fw(uncore, lower_reg);
+   upper = intel_uncore_read_fw(uncore, upper_reg);
+   } while (upper != old_upper && loop++ < 2);
+   return (u64)upper << 32 | lower;
+}
+
 static inline int intel_uncore_write_and_verify(struct intel_uncore *uncore,
i915_reg_t reg, u32 val,
u32 mask, u32 expected_val)
--
2.36.1
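
For readers unfamiliar with the upper/lower/upper pattern used by
intel_uncore_read64_2x32_fw() above, here is a minimal userspace sketch of
why the retry loop matters. The loop body follows the patch; everything else
(the simulated counter, `sim_read_lower()`, `sim_read_upper()`,
`read64_naive()`, `demo_rollover()`) is hypothetical scaffolding, and the
naive variant exists only as a contrast, not as something the driver does.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated free-running 64-bit counter; it advances on every 32-bit read. */
static uint64_t sim_counter;
static const uint64_t sim_step = 1;

static uint32_t sim_read_lower(void)
{
	uint32_t v = (uint32_t)sim_counter;

	sim_counter += sim_step;
	return v;
}

static uint32_t sim_read_upper(void)
{
	uint32_t v = (uint32_t)(sim_counter >> 32);

	sim_counter += sim_step;
	return v;
}

/* Contrast case: one read of each half, no protection against rollover. */
static uint64_t read64_naive(void)
{
	uint32_t lower = sim_read_lower();
	uint32_t upper = sim_read_upper();

	return (uint64_t)upper << 32 | lower;
}

/* Same loop shape as the patch: re-read until the upper half is stable. */
static uint64_t read64_retry(void)
{
	uint32_t upper, lower, old_upper, loop = 0;

	upper = sim_read_upper();
	do {
		old_upper = upper;
		lower = sim_read_lower();
		upper = sim_read_upper();
	} while (upper != old_upper && loop++ < 2);

	return (uint64_t)upper << 32 | lower;
}

/* Start just below a 32-bit rollover and try both strategies. */
static void demo_rollover(uint64_t *naive, uint64_t *retry)
{
	assert(naive && retry);

	sim_counter = 0xFFFFFFFFULL;
	*naive = read64_naive();

	sim_counter = 0xFFFFFFFFULL;
	*retry = read64_retry();
}
```

Starting at 0xFFFFFFFF, the naive read pairs a pre-rollover lower half with a
post-rollover upper half and returns 0x1FFFFFFFF, almost 2^32 ticks ahead of
the counter. The retry version notices the upper half changed, reads again,
and returns 0x100000002, a value the counter actually passed through.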



Re: [Intel-gfx] ✗ Fi.CI.BAT: failure for drm/i915: Set PROBE_PREFER_ASYNCHRONOUS

2022-11-04 Thread Matthew Auld
On Thu, 3 Nov 2022 at 00:14, Brian Norris  wrote:
>
> On Wed, Nov 02, 2022 at 12:18:37PM +, Matthew Auld wrote:
> > On Tue, 1 Nov 2022 at 21:58, Brian Norris  wrote:
> > >
> > > On Fri, Oct 28, 2022 at 5:24 PM Patchwork
> > >  wrote:
> > > >
> > > > Patch Details
> > > > Series:drm/i915: Set PROBE_PREFER_ASYNCHRONOUS
> > > > URL:https://patchwork.freedesktop.org/series/110277/
> > > > State:failure
> > > > Details:https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110277v1/index.html
> > > >
> > > > CI Bug Log - changes from CI_DRM_12317 -> Patchwork_110277v1
> > > >
> > > > Summary
> > > >
> > > > FAILURE
> > > >
> > > > Serious unknown changes coming with Patchwork_110277v1 absolutely need 
> > > > to be
> > > > verified manually.
> > > >
> > > > If you think the reported changes have nothing to do with the changes
> > > > introduced in Patchwork_110277v1, please notify your bug team to allow 
> > > > them
> > > > to document this new failure mode, which will reduce false positives in 
> > > > CI.
> > > >
> > > > External URL: 
> > > > https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110277v1/index.html
> > >
> > > For the record, I have almost zero idea what to do with this. From
> > > what I can tell, most (all?) of these failures are flaky(?) already
> > > and are probably not related to my change.
> >
> > https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110277v1/index.html
> >
> > According to that link, this change appears to break every platform
> > when running the live selftests (looking at the purple squares).
> > Running the selftests normally involves loading and unloading the
> > module. Looking at the logs there is scary stuff like:
> >
> [...]
>
> Ah, thanks. I'm not sure what made me think the tests were failing the
> same way on drm-tip, but maybe just chalk that up to my unfamiliarity
> with this particular dashboard... (There are a few isolated failure
> and/or flakes on drm-tip, but they don't look like this.)
>
> Anyway, I think I managed to run some of these tests on my own platforms
> [1], and I don't reproduce those failures. I do see other failures
> (crashes) though, like in i915_gem_mman_live_selftests/igt_mmap, where
> igt_mmap_offset() (selftest-only code) -> vm_mmap() assumes we have a
> valid |current->mm|. But that's borrowing the modprobe process's memory
> map, and with async probe, the selftest sequence happens in a kernel
> worker instead (and current->mm is NULL). So that clearly won't work.
>
> I suppose I could disable async probe when built as a module (I believe
> it doesn't really have any value, since the module load task just waits
> for the async task anyway). I'm not familiar enough with MM to know what
> the vm_mmap() alternatives are, but this particular bit of code does
> feel odd.
>
> Additionally, I think this implies that live_selftests will break if
> i915 is built-in (i.e., =y, not =m), as we'll again run in a
> kernel-thread context at boot time. But I would hope nobody is trying to
> run them that way? I guess this gets even hairier, because even if the
> driver is built into the kernel, it's possible to kick them off from a
> process context by tweaking the module parameters later, and then
> re-binding the device... So all in all, this bug leaves an ugly
> situation, with or without my patch.
>
> I'm still curious about the reported failures, but maybe they require
> some particular sequence of tests? I also don't have the full
> igt-gpu-tools set running, so maybe they do something a little
> differently than my steps in [1]?
>
> Brian
>
> [1] I have a GLk system, if it matters. I figured I can run some of
> these with any one of the following:
>
>   modprobe i915 live_selftests=1
>   modprobe i915 live_selftests=1 igt__20__live_workarounds=Y
>   modprobe i915 live_selftests=1 igt__19__live_uncore=Y
>   modprobe i915 live_selftests=1 igt__18__live_sanitycheck=Y
>   ...

CI should be using the IGT wrapper to run them, AFAIK. So something like:

./build/tests/i915_selftest

Or to just run the live, mock or perf:

./build/tests/i915_selftest --run-subtest live
./build/tests/i915_selftest --run-subtest mock
./build/tests/i915_selftest --run-subtest perf

Or if you want to run some particular selftest, like live mman tests:

./build/tests/i915_selftest --run-subtest live --dyn mman


[Intel-gfx] [PATCH v3 7/7] vfio: Remove vfio_free_device

2022-11-04 Thread Eric Farman
With the "mess" sorted out, we should be able to inline the
vfio_free_device call introduced by commit cb9ff3f3b84c
("vfio: Add helpers for unifying vfio_device life cycle")
and remove it from the driver release callbacks.

Signed-off-by: Eric Farman 
Reviewed-by: Jason Gunthorpe 
Reviewed-by: Kevin Tian 
Reviewed-by: Cornelia Huck 
Reviewed-by: Tony Krowiak   # vfio-ap part
Acked-by: Alex Williamson 
Reviewed-by: Matthew Rosato 
---
 drivers/gpu/drm/i915/gvt/kvmgt.c  |  1 -
 drivers/s390/cio/vfio_ccw_ops.c   |  2 --
 drivers/s390/crypto/vfio_ap_ops.c |  6 --
 drivers/vfio/fsl-mc/vfio_fsl_mc.c |  1 -
 drivers/vfio/pci/vfio_pci_core.c  |  1 -
 drivers/vfio/platform/vfio_amba.c |  1 -
 drivers/vfio/platform/vfio_platform.c |  1 -
 drivers/vfio/vfio_main.c  | 22 --
 include/linux/vfio.h  |  1 -
 samples/vfio-mdev/mbochs.c|  1 -
 samples/vfio-mdev/mdpy.c  |  1 -
 samples/vfio-mdev/mtty.c  |  1 -
 12 files changed, 4 insertions(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 7a45e5360caf..eee6805e67de 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -1461,7 +1461,6 @@ static void intel_vgpu_release_dev(struct vfio_device *vfio_dev)
struct intel_vgpu *vgpu = vfio_dev_to_vgpu(vfio_dev);
 
intel_gvt_destroy_vgpu(vgpu);
-   vfio_free_device(vfio_dev);
 }
 
 static const struct vfio_device_ops intel_vgpu_dev_ops = {
diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
index 1155f8bcedd9..598a3814d428 100644
--- a/drivers/s390/cio/vfio_ccw_ops.c
+++ b/drivers/s390/cio/vfio_ccw_ops.c
@@ -143,8 +143,6 @@ static void vfio_ccw_mdev_release_dev(struct vfio_device *vdev)
kmem_cache_free(vfio_ccw_io_region, private->io_region);
kfree(private->cp.guest_cp);
mutex_destroy(&private->io_mutex);
-
-   vfio_free_device(vdev);
 }
 
 static void vfio_ccw_mdev_remove(struct mdev_device *mdev)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 0b4cc8c597ae..f108c0f14712 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -765,11 +765,6 @@ static void vfio_ap_mdev_unlink_fr_queues(struct ap_matrix_mdev *matrix_mdev)
}
 }
 
-static void vfio_ap_mdev_release_dev(struct vfio_device *vdev)
-{
-   vfio_free_device(vdev);
-}
-
 static void vfio_ap_mdev_remove(struct mdev_device *mdev)
 {
struct ap_matrix_mdev *matrix_mdev = dev_get_drvdata(&mdev->dev);
@@ -1784,7 +1779,6 @@ static const struct attribute_group vfio_queue_attr_group = {
 
 static const struct vfio_device_ops vfio_ap_matrix_dev_ops = {
.init = vfio_ap_mdev_init_dev,
-   .release = vfio_ap_mdev_release_dev,
.open_device = vfio_ap_mdev_open_device,
.close_device = vfio_ap_mdev_close_device,
.ioctl = vfio_ap_mdev_ioctl,
diff --git a/drivers/vfio/fsl-mc/vfio_fsl_mc.c b/drivers/vfio/fsl-mc/vfio_fsl_mc.c
index b16874e913e4..7b8889f55007 100644
--- a/drivers/vfio/fsl-mc/vfio_fsl_mc.c
+++ b/drivers/vfio/fsl-mc/vfio_fsl_mc.c
@@ -568,7 +568,6 @@ static void vfio_fsl_mc_release_dev(struct vfio_device *core_vdev)
 
vfio_fsl_uninit_device(vdev);
mutex_destroy(&vdev->igate);
-   vfio_free_device(core_vdev);
 }
 
 static int vfio_fsl_mc_remove(struct fsl_mc_device *mc_dev)
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index badc9d828cac..9be2d5be5d95 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -2109,7 +2109,6 @@ void vfio_pci_core_release_dev(struct vfio_device *core_vdev)
mutex_destroy(&vdev->vma_lock);
kfree(vdev->region);
kfree(vdev->pm_save);
-   vfio_free_device(core_vdev);
 }
 EXPORT_SYMBOL_GPL(vfio_pci_core_release_dev);
 
diff --git a/drivers/vfio/platform/vfio_amba.c b/drivers/vfio/platform/vfio_amba.c
index eaea63e5294c..18faf2678b99 100644
--- a/drivers/vfio/platform/vfio_amba.c
+++ b/drivers/vfio/platform/vfio_amba.c
@@ -95,7 +95,6 @@ static void vfio_amba_release_dev(struct vfio_device *core_vdev)
 
vfio_platform_release_common(vdev);
kfree(vdev->name);
-   vfio_free_device(core_vdev);
 }
 
 static void vfio_amba_remove(struct amba_device *adev)
diff --git a/drivers/vfio/platform/vfio_platform.c b/drivers/vfio/platform/vfio_platform.c
index 82cedcebfd90..9910451dc341 100644
--- a/drivers/vfio/platform/vfio_platform.c
+++ b/drivers/vfio/platform/vfio_platform.c
@@ -83,7 +83,6 @@ static void vfio_platform_release_dev(struct vfio_device *core_vdev)
container_of(core_vdev, struct vfio_platform_device, vdev);
 
vfio_platform_release_common(vdev);
-   vfio_free_device(core_vdev);
 }
 
 static int vfio_platform_remove(struct platform_device *pdev)
diff --git a/drivers/vfio/vfio_main.c b/driv

[Intel-gfx] [PATCH v3 1/7] vfio/ccw: create a parent struct

2022-11-04 Thread Eric Farman
Move the stuff associated with the mdev parent (and thus the
subchannel struct) into its own struct, and leave the rest in
the existing private structure.

The subchannel will point to the parent, and the parent will point
to the private, for the areas where one or both are needed. Further
separation of these structs will follow.

Signed-off-by: Eric Farman 
Reviewed-by: Matthew Rosato 
---
 drivers/s390/cio/vfio_ccw_drv.c | 98 +++--
 drivers/s390/cio/vfio_ccw_ops.c |  8 ++-
 drivers/s390/cio/vfio_ccw_private.h | 20 --
 3 files changed, 101 insertions(+), 25 deletions(-)

diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
index 7f5402fe857a..444b32047397 100644
--- a/drivers/s390/cio/vfio_ccw_drv.c
+++ b/drivers/s390/cio/vfio_ccw_drv.c
@@ -36,10 +36,19 @@ debug_info_t *vfio_ccw_debug_trace_id;
  */
 int vfio_ccw_sch_quiesce(struct subchannel *sch)
 {
-   struct vfio_ccw_private *private = dev_get_drvdata(&sch->dev);
+   struct vfio_ccw_parent *parent = dev_get_drvdata(&sch->dev);
+   struct vfio_ccw_private *private = dev_get_drvdata(&parent->dev);
DECLARE_COMPLETION_ONSTACK(completion);
int iretry, ret = 0;
 
+   /*
+* Probably an impossible situation, after being called through
+* FSM callbacks. But in the event it did, register a warning
+* and return as if things were fine.
+*/
+   if (WARN_ON(!private))
+   return 0;
+
iretry = 255;
do {
 
@@ -121,7 +130,23 @@ static void vfio_ccw_crw_todo(struct work_struct *work)
  */
 static void vfio_ccw_sch_irq(struct subchannel *sch)
 {
-   struct vfio_ccw_private *private = dev_get_drvdata(&sch->dev);
+   struct vfio_ccw_parent *parent = dev_get_drvdata(&sch->dev);
+   struct vfio_ccw_private *private = dev_get_drvdata(&parent->dev);
+
+   /*
+* The subchannel should still be disabled at this point,
+* so an interrupt would be quite surprising. As with an
+* interrupt while the FSM is closed, let's attempt to
+* disable the subchannel again.
+*/
+   if (!private) {
+   VFIO_CCW_MSG_EVENT(2, "sch %x.%x.%04x: unexpected interrupt\n",
+  sch->schid.cssid, sch->schid.ssid,
+  sch->schid.sch_no);
+
+   cio_disable_subchannel(sch);
+   return;
+   }
 
inc_irq_stat(IRQIO_CIO);
vfio_ccw_fsm_event(private, VFIO_CCW_EVENT_INTERRUPT);
@@ -201,10 +226,19 @@ static void vfio_ccw_free_private(struct vfio_ccw_private *private)
mutex_destroy(&private->io_mutex);
kfree(private);
 }
+
+static void vfio_ccw_free_parent(struct device *dev)
+{
+   struct vfio_ccw_parent *parent = container_of(dev, struct vfio_ccw_parent, dev);
+
+   kfree(parent);
+}
+
 static int vfio_ccw_sch_probe(struct subchannel *sch)
 {
struct pmcw *pmcw = &sch->schib.pmcw;
struct vfio_ccw_private *private;
+   struct vfio_ccw_parent *parent;
int ret = -ENOMEM;
 
if (pmcw->qf) {
@@ -213,38 +247,58 @@ static int vfio_ccw_sch_probe(struct subchannel *sch)
return -ENODEV;
}
 
+   parent = kzalloc(sizeof(*parent), GFP_KERNEL);
+   if (!parent)
+   return -ENOMEM;
+
+   dev_set_name(&parent->dev, "parent");
+   parent->dev.parent = &sch->dev;
+   parent->dev.release = &vfio_ccw_free_parent;
+   ret = device_register(&parent->dev);
+   if (ret)
+   goto out_free;
+
private = vfio_ccw_alloc_private(sch);
-   if (IS_ERR(private))
+   if (IS_ERR(private)) {
+   device_unregister(&parent->dev);
return PTR_ERR(private);
+   }
 
-   dev_set_drvdata(&sch->dev, private);
+   dev_set_drvdata(&sch->dev, parent);
+   dev_set_drvdata(&parent->dev, private);
 
-   private->mdev_type.sysfs_name = "io";
-   private->mdev_type.pretty_name = "I/O subchannel (Non-QDIO)";
-   private->mdev_types[0] = &private->mdev_type;
-   ret = mdev_register_parent(&private->parent, &sch->dev,
+   parent->mdev_type.sysfs_name = "io";
+   parent->mdev_type.pretty_name = "I/O subchannel (Non-QDIO)";
+   parent->mdev_types[0] = &parent->mdev_type;
+   ret = mdev_register_parent(&parent->parent, &sch->dev,
   &vfio_ccw_mdev_driver,
-  private->mdev_types, 1);
+  parent->mdev_types, 1);
if (ret)
-   goto out_free;
+   goto out_unreg;
 
VFIO_CCW_MSG_EVENT(4, "bound to subchannel %x.%x.%04x\n",
   sch->schid.cssid, sch->schid.ssid,
   sch->schid.sch_no);
return 0;
 
+out_unreg:
+   device_unregister(&parent->dev);
 out_free:
+   dev_set_drvdata(&parent->dev, NULL);
dev_set_drvdata(&sch->dev,

[Intel-gfx] [PATCH v3 6/7] vfio/ccw: replace vfio_init_device with _alloc_

2022-11-04 Thread Eric Farman
Now that we have a reasonable separation of structs that follow
the subchannel and mdev lifecycles, there's no reason we can't
call the official vfio_alloc_device routine for our private data,
and behave like everyone else.

Signed-off-by: Eric Farman 
Reviewed-by: Kevin Tian 
Acked-by: Alex Williamson 
Reviewed-by: Matthew Rosato 
---
 drivers/s390/cio/vfio_ccw_drv.c | 18 --
 drivers/s390/cio/vfio_ccw_ops.c | 28 ++--
 drivers/s390/cio/vfio_ccw_private.h |  2 --
 drivers/vfio/vfio_main.c| 10 +-
 include/linux/vfio.h|  2 --
 5 files changed, 23 insertions(+), 37 deletions(-)

diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
index 9fbd1b27a1ac..c2a65808605a 100644
--- a/drivers/s390/cio/vfio_ccw_drv.c
+++ b/drivers/s390/cio/vfio_ccw_drv.c
@@ -152,24 +152,6 @@ static void vfio_ccw_sch_irq(struct subchannel *sch)
vfio_ccw_fsm_event(private, VFIO_CCW_EVENT_INTERRUPT);
 }
 
-void vfio_ccw_free_private(struct vfio_ccw_private *private)
-{
-   struct vfio_ccw_crw *crw, *temp;
-
-   list_for_each_entry_safe(crw, temp, &private->crw, next) {
-   list_del(&crw->next);
-   kfree(crw);
-   }
-
-   kmem_cache_free(vfio_ccw_crw_region, private->crw_region);
-   kmem_cache_free(vfio_ccw_schib_region, private->schib_region);
-   kmem_cache_free(vfio_ccw_cmd_region, private->cmd_region);
-   kmem_cache_free(vfio_ccw_io_region, private->io_region);
-   kfree(private->cp.guest_cp);
-   mutex_destroy(&private->io_mutex);
-   kfree(private);
-}
-
 static void vfio_ccw_free_parent(struct device *dev)
 {
struct vfio_ccw_parent *parent = container_of(dev, struct vfio_ccw_parent, dev);
diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
index 8a929a9cf3c6..1155f8bcedd9 100644
--- a/drivers/s390/cio/vfio_ccw_ops.c
+++ b/drivers/s390/cio/vfio_ccw_ops.c
@@ -102,15 +102,10 @@ static int vfio_ccw_mdev_probe(struct mdev_device *mdev)
struct vfio_ccw_private *private;
int ret;
 
-   private = kzalloc(sizeof(*private), GFP_KERNEL);
-   if (!private)
-   return -ENOMEM;
-
-   ret = vfio_init_device(&private->vdev, &mdev->dev, &vfio_ccw_dev_ops);
-   if (ret) {
-   kfree(private);
-   return ret;
-   }
+   private = vfio_alloc_device(vfio_ccw_private, vdev, &mdev->dev,
+   &vfio_ccw_dev_ops);
+   if (IS_ERR(private))
+   return PTR_ERR(private);
 
dev_set_drvdata(&parent->dev, private);
 
@@ -135,8 +130,21 @@ static void vfio_ccw_mdev_release_dev(struct vfio_device *vdev)
 {
struct vfio_ccw_private *private =
container_of(vdev, struct vfio_ccw_private, vdev);
+   struct vfio_ccw_crw *crw, *temp;
+
+   list_for_each_entry_safe(crw, temp, &private->crw, next) {
+   list_del(&crw->next);
+   kfree(crw);
+   }
+
+   kmem_cache_free(vfio_ccw_crw_region, private->crw_region);
+   kmem_cache_free(vfio_ccw_schib_region, private->schib_region);
+   kmem_cache_free(vfio_ccw_cmd_region, private->cmd_region);
+   kmem_cache_free(vfio_ccw_io_region, private->io_region);
+   kfree(private->cp.guest_cp);
+   mutex_destroy(&private->io_mutex);
 
-   vfio_ccw_free_private(private);
+   vfio_free_device(vdev);
 }
 
 static void vfio_ccw_mdev_remove(struct mdev_device *mdev)
diff --git a/drivers/s390/cio/vfio_ccw_private.h b/drivers/s390/cio/vfio_ccw_private.h
index 2278fd38d34e..b441ae6700fd 100644
--- a/drivers/s390/cio/vfio_ccw_private.h
+++ b/drivers/s390/cio/vfio_ccw_private.h
@@ -131,8 +131,6 @@ int vfio_ccw_sch_quiesce(struct subchannel *sch);
 void vfio_ccw_sch_io_todo(struct work_struct *work);
 void vfio_ccw_crw_todo(struct work_struct *work);
 
-void vfio_ccw_free_private(struct vfio_ccw_private *private);
-
 extern struct mdev_driver vfio_ccw_mdev_driver;
 
 /*
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 2d168793d4e1..2901b8ad5be9 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -348,6 +348,9 @@ static void vfio_device_release(struct device *dev)
device->ops->release(device);
 }
 
+static int vfio_init_device(struct vfio_device *device, struct device *dev,
+   const struct vfio_device_ops *ops);
+
 /*
  * Allocate and initialize vfio_device so it can be registered to vfio
  * core.
@@ -386,11 +389,9 @@ EXPORT_SYMBOL_GPL(_vfio_alloc_device);
 
 /*
  * Initialize a vfio_device so it can be registered to vfio core.
- *
- * Only vfio-ccw driver should call this interface.
  */
-int vfio_init_device(struct vfio_device *device, struct device *dev,
-const struct vfio_device_ops *ops)
+static int vfio_init_device(struct vfio_device *device, struct device *dev,
+   const struct vfio_device_ops 

[Intel-gfx] [PATCH v3 4/7] vfio/ccw: move private to mdev lifecycle

2022-11-04 Thread Eric Farman
Now that the mdev parent data is split out into its own struct,
it is safe to move the remaining private data to follow the
mdev probe/remove lifecycle. The mdev parent data will remain
where it is, and follow the subchannel and the css driver
interfaces.

Signed-off-by: Eric Farman 
Reviewed-by: Matthew Rosato 
---
 drivers/s390/cio/vfio_ccw_drv.c | 16 +---
 drivers/s390/cio/vfio_ccw_ops.c | 26 +-
 drivers/s390/cio/vfio_ccw_private.h |  2 ++
 3 files changed, 16 insertions(+), 28 deletions(-)

diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
index fbc26338ceab..9fbd1b27a1ac 100644
--- a/drivers/s390/cio/vfio_ccw_drv.c
+++ b/drivers/s390/cio/vfio_ccw_drv.c
@@ -152,7 +152,7 @@ static void vfio_ccw_sch_irq(struct subchannel *sch)
vfio_ccw_fsm_event(private, VFIO_CCW_EVENT_INTERRUPT);
 }
 
-static void vfio_ccw_free_private(struct vfio_ccw_private *private)
+void vfio_ccw_free_private(struct vfio_ccw_private *private)
 {
struct vfio_ccw_crw *crw, *temp;
 
@@ -180,7 +180,6 @@ static void vfio_ccw_free_parent(struct device *dev)
 static int vfio_ccw_sch_probe(struct subchannel *sch)
 {
struct pmcw *pmcw = &sch->schib.pmcw;
-   struct vfio_ccw_private *private;
struct vfio_ccw_parent *parent;
int ret = -ENOMEM;
 
@@ -201,14 +200,7 @@ static int vfio_ccw_sch_probe(struct subchannel *sch)
if (ret)
goto out_free;
 
-   private = kzalloc(sizeof(*private), GFP_KERNEL);
-   if (!private) {
-   device_unregister(&parent->dev);
-   return -ENOMEM;
-   }
-
dev_set_drvdata(&sch->dev, parent);
-   dev_set_drvdata(&parent->dev, private);
 
parent->mdev_type.sysfs_name = "io";
parent->mdev_type.pretty_name = "I/O subchannel (Non-QDIO)";
@@ -227,25 +219,19 @@ static int vfio_ccw_sch_probe(struct subchannel *sch)
 out_unreg:
device_unregister(&parent->dev);
 out_free:
-   dev_set_drvdata(&parent->dev, NULL);
dev_set_drvdata(&sch->dev, NULL);
-   if (private)
-   vfio_ccw_free_private(private);
return ret;
 }
 
 static void vfio_ccw_sch_remove(struct subchannel *sch)
 {
struct vfio_ccw_parent *parent = dev_get_drvdata(&sch->dev);
-   struct vfio_ccw_private *private = dev_get_drvdata(&parent->dev);
 
mdev_unregister_parent(&parent->parent);
 
device_unregister(&parent->dev);
dev_set_drvdata(&sch->dev, NULL);
 
-   vfio_ccw_free_private(private);
-
VFIO_CCW_MSG_EVENT(4, "unbound from subchannel %x.%x.%04x\n",
   sch->schid.cssid, sch->schid.ssid,
   sch->schid.sch_no);
diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
index eb0b8cc210bb..e45d4acb109b 100644
--- a/drivers/s390/cio/vfio_ccw_ops.c
+++ b/drivers/s390/cio/vfio_ccw_ops.c
@@ -100,15 +100,20 @@ static int vfio_ccw_mdev_probe(struct mdev_device *mdev)
 {
struct subchannel *sch = to_subchannel(mdev->dev.parent);
struct vfio_ccw_parent *parent = dev_get_drvdata(&sch->dev);
-   struct vfio_ccw_private *private = dev_get_drvdata(&parent->dev);
+   struct vfio_ccw_private *private;
int ret;
 
-   if (private->state == VFIO_CCW_STATE_NOT_OPER)
-   return -ENODEV;
+   private = kzalloc(sizeof(*private), GFP_KERNEL);
+   if (!private)
+   return -ENOMEM;
 
ret = vfio_init_device(&private->vdev, &mdev->dev, &vfio_ccw_dev_ops);
-   if (ret)
+   if (ret) {
+   kfree(private);
return ret;
+   }
+
+   dev_set_drvdata(&parent->dev, private);
 
VFIO_CCW_MSG_EVENT(2, "sch %x.%x.%04x: create\n",
   sch->schid.cssid,
@@ -122,6 +127,7 @@ static int vfio_ccw_mdev_probe(struct mdev_device *mdev)
return 0;
 
 err_put_vdev:
+   dev_set_drvdata(&parent->dev, NULL);
vfio_put_device(&private->vdev);
return ret;
 }
@@ -131,15 +137,6 @@ static void vfio_ccw_mdev_release_dev(struct vfio_device *vdev)
struct vfio_ccw_private *private =
container_of(vdev, struct vfio_ccw_private, vdev);
 
-   /*
-* We cannot free vfio_ccw_private here because it includes
-* parent info which must be free'ed by css driver.
-*
-* Use a workaround by memset'ing the core device part and
-* then notifying the remove path that all active references
-* to this device have been released.
-*/
-   memset(vdev, 0, sizeof(*vdev));
complete(&private->release_comp);
 }
 
@@ -156,6 +153,7 @@ static void vfio_ccw_mdev_remove(struct mdev_device *mdev)
 
vfio_unregister_group_dev(&private->vdev);
 
+   dev_set_drvdata(&parent->dev, NULL);
vfio_put_device(&private->vdev);
/*
 * Wait for all active references on mdev are released so it
@@ -166,6 +164,8 @@ static 

[Intel-gfx] [PATCH v3 2/7] vfio/ccw: remove private->sch

2022-11-04 Thread Eric Farman
These places all rely on the ability to jump from a private
struct back to the subchannel struct. Rather than keeping a
copy in our back pocket, let's use the relationship provided
by the vfio_device embedded within the private.

Signed-off-by: Eric Farman 
Reviewed-by: Matthew Rosato 
---
 drivers/s390/cio/vfio_ccw_chp.c |  5 +++--
 drivers/s390/cio/vfio_ccw_drv.c |  3 +--
 drivers/s390/cio/vfio_ccw_fsm.c | 27 ---
 drivers/s390/cio/vfio_ccw_ops.c | 12 ++--
 drivers/s390/cio/vfio_ccw_private.h |  7 ---
 5 files changed, 26 insertions(+), 28 deletions(-)

diff --git a/drivers/s390/cio/vfio_ccw_chp.c b/drivers/s390/cio/vfio_ccw_chp.c
index 13b26a1c7988..d3f3a611f95b 100644
--- a/drivers/s390/cio/vfio_ccw_chp.c
+++ b/drivers/s390/cio/vfio_ccw_chp.c
@@ -16,6 +16,7 @@ static ssize_t vfio_ccw_schib_region_read(struct vfio_ccw_private *private,
  char __user *buf, size_t count,
  loff_t *ppos)
 {
+   struct subchannel *sch = to_subchannel(private->vdev.dev->parent);
unsigned int i = VFIO_CCW_OFFSET_TO_INDEX(*ppos) - VFIO_CCW_NUM_REGIONS;
loff_t pos = *ppos & VFIO_CCW_OFFSET_MASK;
struct ccw_schib_region *region;
@@ -27,12 +28,12 @@ static ssize_t vfio_ccw_schib_region_read(struct vfio_ccw_private *private,
mutex_lock(&private->io_mutex);
region = private->region[i].data;
 
-   if (cio_update_schib(private->sch)) {
+   if (cio_update_schib(sch)) {
ret = -ENODEV;
goto out;
}
 
-   memcpy(region, &private->sch->schib, sizeof(*region));
+   memcpy(region, &sch->schib, sizeof(*region));
 
if (copy_to_user(buf, (void *)region + pos, count)) {
ret = -EFAULT;
diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
index 444b32047397..2c680a556383 100644
--- a/drivers/s390/cio/vfio_ccw_drv.c
+++ b/drivers/s390/cio/vfio_ccw_drv.c
@@ -160,7 +160,6 @@ static struct vfio_ccw_private *vfio_ccw_alloc_private(struct subchannel *sch)
if (!private)
return ERR_PTR(-ENOMEM);
 
-   private->sch = sch;
mutex_init(&private->io_mutex);
private->state = VFIO_CCW_STATE_STANDBY;
INIT_LIST_HEAD(&private->crw);
@@ -395,7 +394,7 @@ static int vfio_ccw_chp_event(struct subchannel *sch,
if (!private || !mask)
return 0;
 
-   trace_vfio_ccw_chp_event(private->sch->schid, mask, event);
+   trace_vfio_ccw_chp_event(sch->schid, mask, event);
VFIO_CCW_MSG_EVENT(2, "sch %x.%x.%04x: mask=0x%x event=%d\n",
   sch->schid.cssid,
   sch->schid.ssid, sch->schid.sch_no,
diff --git a/drivers/s390/cio/vfio_ccw_fsm.c b/drivers/s390/cio/vfio_ccw_fsm.c
index a59c758869f8..e67fad897af3 100644
--- a/drivers/s390/cio/vfio_ccw_fsm.c
+++ b/drivers/s390/cio/vfio_ccw_fsm.c
@@ -18,15 +18,13 @@
 
 static int fsm_io_helper(struct vfio_ccw_private *private)
 {
-   struct subchannel *sch;
+   struct subchannel *sch = to_subchannel(private->vdev.dev->parent);
union orb *orb;
int ccode;
__u8 lpm;
unsigned long flags;
int ret;
 
-   sch = private->sch;
-
spin_lock_irqsave(sch->lock, flags);
 
orb = cp_get_orb(&private->cp, (u32)(addr_t)sch, sch->lpm);
@@ -80,13 +78,11 @@ static int fsm_io_helper(struct vfio_ccw_private *private)
 
 static int fsm_do_halt(struct vfio_ccw_private *private)
 {
-   struct subchannel *sch;
+   struct subchannel *sch = to_subchannel(private->vdev.dev->parent);
unsigned long flags;
int ccode;
int ret;
 
-   sch = private->sch;
-
spin_lock_irqsave(sch->lock, flags);
 
VFIO_CCW_TRACE_EVENT(2, "haltIO");
@@ -121,13 +117,11 @@ static int fsm_do_halt(struct vfio_ccw_private *private)
 
 static int fsm_do_clear(struct vfio_ccw_private *private)
 {
-   struct subchannel *sch;
+   struct subchannel *sch = to_subchannel(private->vdev.dev->parent);
unsigned long flags;
int ccode;
int ret;
 
-   sch = private->sch;
-
spin_lock_irqsave(sch->lock, flags);
 
VFIO_CCW_TRACE_EVENT(2, "clearIO");
@@ -160,7 +154,7 @@ static int fsm_do_clear(struct vfio_ccw_private *private)
 static void fsm_notoper(struct vfio_ccw_private *private,
enum vfio_ccw_event event)
 {
-   struct subchannel *sch = private->sch;
+   struct subchannel *sch = to_subchannel(private->vdev.dev->parent);
 
VFIO_CCW_MSG_EVENT(2, "sch %x.%x.%04x: notoper event %x state %x\n",
   sch->schid.cssid,
@@ -228,7 +222,7 @@ static void fsm_async_retry(struct vfio_ccw_private *private,
 static void fsm_disabled_irq(struct vfio_ccw_private *private,
 enum vfio_ccw_event event)
 {
-   struct subchannel *sch = private->sch;
+   str

[Intel-gfx] [PATCH v3 5/7] vfio/ccw: remove release completion

2022-11-04 Thread Eric Farman
There's enough separation between the parent and private structs now
that it is fine to remove the release completion hack.

Signed-off-by: Eric Farman 
Reviewed-by: Kevin Tian 
Reviewed-by: Matthew Rosato 
---
 drivers/s390/cio/vfio_ccw_ops.c | 14 +-
 drivers/s390/cio/vfio_ccw_private.h |  3 ---
 2 files changed, 1 insertion(+), 16 deletions(-)

diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
index e45d4acb109b..8a929a9cf3c6 100644
--- a/drivers/s390/cio/vfio_ccw_ops.c
+++ b/drivers/s390/cio/vfio_ccw_ops.c
@@ -54,7 +54,6 @@ static int vfio_ccw_mdev_init_dev(struct vfio_device *vdev)
INIT_LIST_HEAD(&private->crw);
INIT_WORK(&private->io_work, vfio_ccw_sch_io_todo);
INIT_WORK(&private->crw_work, vfio_ccw_crw_todo);
-   init_completion(&private->release_comp);
 
private->cp.guest_cp = kcalloc(CCWCHAIN_LEN_MAX, sizeof(struct ccw1),
   GFP_KERNEL);
@@ -137,7 +136,7 @@ static void vfio_ccw_mdev_release_dev(struct vfio_device *vdev)
struct vfio_ccw_private *private =
container_of(vdev, struct vfio_ccw_private, vdev);
 
-   complete(&private->release_comp);
+   vfio_ccw_free_private(private);
 }
 
 static void vfio_ccw_mdev_remove(struct mdev_device *mdev)
@@ -155,17 +154,6 @@ static void vfio_ccw_mdev_remove(struct mdev_device *mdev)
 
dev_set_drvdata(&parent->dev, NULL);
vfio_put_device(&private->vdev);
-   /*
-* Wait for all active references on mdev are released so it
-* is safe to defer kfree() to a later point.
-*
-* TODO: the clean fix is to split parent/mdev info from ccw
-* private structure so each can be managed in its own life
-* cycle.
-*/
-   wait_for_completion(&private->release_comp);
-
-   vfio_ccw_free_private(private);
 }
 
 static int vfio_ccw_mdev_open_device(struct vfio_device *vdev)
diff --git a/drivers/s390/cio/vfio_ccw_private.h b/drivers/s390/cio/vfio_ccw_private.h
index 747aba5f5272..2278fd38d34e 100644
--- a/drivers/s390/cio/vfio_ccw_private.h
+++ b/drivers/s390/cio/vfio_ccw_private.h
@@ -102,7 +102,6 @@ struct vfio_ccw_parent {
  * @req_trigger: eventfd ctx for signaling userspace to return device
  * @io_work: work for deferral process of I/O handling
  * @crw_work: work for deferral process of CRW handling
- * @release_comp: synchronization helper for vfio device release
  */
 struct vfio_ccw_private {
struct vfio_device vdev;
@@ -126,8 +125,6 @@ struct vfio_ccw_private {
struct eventfd_ctx  *req_trigger;
struct work_struct  io_work;
struct work_struct  crw_work;
-
-   struct completion   release_comp;
 } __aligned(8);
 
 int vfio_ccw_sch_quiesce(struct subchannel *sch);
-- 
2.34.1



[Intel-gfx] [PATCH v3 3/7] vfio/ccw: move private initialization to callback

2022-11-04 Thread Eric Farman
There's already a device initialization callback that is used to
initialize the release completion workaround that was introduced
by commit ebb72b765fb49 ("vfio/ccw: Use the new device life cycle
helpers").

Move the other elements of the vfio_ccw_private struct that
require distinct initialization over to that routine.

With that done, the vfio_ccw_alloc_private routine only does a
kzalloc, so fold it inline.

Signed-off-by: Eric Farman 
Reviewed-by: Matthew Rosato 
---
 drivers/s390/cio/vfio_ccw_drv.c | 74 -
 drivers/s390/cio/vfio_ccw_ops.c | 43 +
 drivers/s390/cio/vfio_ccw_private.h |  7 ++-
 3 files changed, 58 insertions(+), 66 deletions(-)

diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
index 2c680a556383..fbc26338ceab 100644
--- a/drivers/s390/cio/vfio_ccw_drv.c
+++ b/drivers/s390/cio/vfio_ccw_drv.c
@@ -23,10 +23,10 @@
 #include "vfio_ccw_private.h"
 
 struct workqueue_struct *vfio_ccw_work_q;
-static struct kmem_cache *vfio_ccw_io_region;
-static struct kmem_cache *vfio_ccw_cmd_region;
-static struct kmem_cache *vfio_ccw_schib_region;
-static struct kmem_cache *vfio_ccw_crw_region;
+struct kmem_cache *vfio_ccw_io_region;
+struct kmem_cache *vfio_ccw_cmd_region;
+struct kmem_cache *vfio_ccw_schib_region;
+struct kmem_cache *vfio_ccw_crw_region;
 
 debug_info_t *vfio_ccw_debug_msg_id;
 debug_info_t *vfio_ccw_debug_trace_id;
@@ -79,7 +79,7 @@ int vfio_ccw_sch_quiesce(struct subchannel *sch)
return ret;
 }
 
-static void vfio_ccw_sch_io_todo(struct work_struct *work)
+void vfio_ccw_sch_io_todo(struct work_struct *work)
 {
struct vfio_ccw_private *private;
struct irb *irb;
@@ -115,7 +115,7 @@ static void vfio_ccw_sch_io_todo(struct work_struct *work)
eventfd_signal(private->io_trigger, 1);
 }
 
-static void vfio_ccw_crw_todo(struct work_struct *work)
+void vfio_ccw_crw_todo(struct work_struct *work)
 {
struct vfio_ccw_private *private;
 
@@ -152,62 +152,6 @@ static void vfio_ccw_sch_irq(struct subchannel *sch)
vfio_ccw_fsm_event(private, VFIO_CCW_EVENT_INTERRUPT);
 }
 
-static struct vfio_ccw_private *vfio_ccw_alloc_private(struct subchannel *sch)
-{
-   struct vfio_ccw_private *private;
-
-   private = kzalloc(sizeof(*private), GFP_KERNEL);
-   if (!private)
-   return ERR_PTR(-ENOMEM);
-
-   mutex_init(&private->io_mutex);
-   private->state = VFIO_CCW_STATE_STANDBY;
-   INIT_LIST_HEAD(&private->crw);
-   INIT_WORK(&private->io_work, vfio_ccw_sch_io_todo);
-   INIT_WORK(&private->crw_work, vfio_ccw_crw_todo);
-
-   private->cp.guest_cp = kcalloc(CCWCHAIN_LEN_MAX, sizeof(struct ccw1),
-  GFP_KERNEL);
-   if (!private->cp.guest_cp)
-   goto out_free_private;
-
-   private->io_region = kmem_cache_zalloc(vfio_ccw_io_region,
-  GFP_KERNEL | GFP_DMA);
-   if (!private->io_region)
-   goto out_free_cp;
-
-   private->cmd_region = kmem_cache_zalloc(vfio_ccw_cmd_region,
-   GFP_KERNEL | GFP_DMA);
-   if (!private->cmd_region)
-   goto out_free_io;
-
-   private->schib_region = kmem_cache_zalloc(vfio_ccw_schib_region,
- GFP_KERNEL | GFP_DMA);
-
-   if (!private->schib_region)
-   goto out_free_cmd;
-
-   private->crw_region = kmem_cache_zalloc(vfio_ccw_crw_region,
-   GFP_KERNEL | GFP_DMA);
-
-   if (!private->crw_region)
-   goto out_free_schib;
-   return private;
-
-out_free_schib:
-   kmem_cache_free(vfio_ccw_schib_region, private->schib_region);
-out_free_cmd:
-   kmem_cache_free(vfio_ccw_cmd_region, private->cmd_region);
-out_free_io:
-   kmem_cache_free(vfio_ccw_io_region, private->io_region);
-out_free_cp:
-   kfree(private->cp.guest_cp);
-out_free_private:
-   mutex_destroy(&private->io_mutex);
-   kfree(private);
-   return ERR_PTR(-ENOMEM);
-}
-
 static void vfio_ccw_free_private(struct vfio_ccw_private *private)
 {
struct vfio_ccw_crw *crw, *temp;
@@ -257,10 +201,10 @@ static int vfio_ccw_sch_probe(struct subchannel *sch)
if (ret)
goto out_free;
 
-   private = vfio_ccw_alloc_private(sch);
-   if (IS_ERR(private)) {
+   private = kzalloc(sizeof(*private), GFP_KERNEL);
+   if (!private) {
device_unregister(&parent->dev);
-   return PTR_ERR(private);
+   return -ENOMEM;
}
 
dev_set_drvdata(&sch->dev, parent);
diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
index 79c50cb7dcb8..eb0b8cc210bb 100644
--- a/drivers/s390/cio/vfio_ccw_ops.c
+++ b/drivers/s390/cio/vfio_ccw_ops.c
@@ -49,8 +49,51 @@ static int vfio_ccw_mdev_init_dev(struct vfio_device *vdev)

[Intel-gfx] [PATCH v3 0/7] vfio-ccw parent rework

2022-11-04 Thread Eric Farman
Hi Alex,

Here's the (last?) update to the vfio-ccw lifecycle changes that I've sent
recently, and that were previously discussed at various points [1][2].

Patches 1-5 rework the behavior of the vfio-ccw driver's private struct.
In summary, the mdev pieces are split out of vfio_ccw_private and into a
new vfio_ccw_parent struct that will continue to follow today's lifecycle.
The remainder (bulk) of the private struct moves to follow the mdev
probe/remove pair. There's opportunity for further separation of the
things in the private struct, which would simplify some of the vfio-ccw
code, but it got too hairy as I started that. Once vfio-ccw is no longer
considered unique, those cleanups can happen at our leisure. 
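
As a rough illustration of the split described above, here is a minimal
user-space sketch. It is not the driver code: every model_* name is
hypothetical, and struct device is reduced to a bare drvdata slot. It only
shows the chain subchannel -> parent -> private, with the parent following
the css driver probe/remove and the private following the mdev probe/remove,
plus the NULL check that an interrupt needs while no private exists.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct device: only the drvdata slot is modelled. */
struct model_device {
	void *drvdata;
};

/* Follows the subchannel (css driver probe/remove). */
struct model_parent {
	struct model_device dev;
};

/* Follows the mdev (mdev probe/remove). */
struct model_private {
	int irq_count;
};

struct model_subchannel {
	struct model_device dev;
	int disable_count;	/* how often the "no private" path ran */
};

/* css probe: the subchannel's drvdata now points at the parent. */
int model_sch_probe(struct model_subchannel *sch)
{
	struct model_parent *parent = calloc(1, sizeof(*parent));

	if (!parent)
		return -1;
	sch->dev.drvdata = parent;
	return 0;
}

/* mdev probe: the parent's drvdata now points at a fresh private. */
int model_mdev_probe(struct model_subchannel *sch)
{
	struct model_parent *parent = sch->dev.drvdata;
	struct model_private *private = calloc(1, sizeof(*private));

	if (!private)
		return -1;
	parent->dev.drvdata = private;
	return 0;
}

/* mdev remove: the private is freed, the parent stays registered. */
void model_mdev_remove(struct model_subchannel *sch)
{
	struct model_parent *parent = sch->dev.drvdata;

	free(parent->dev.drvdata);
	parent->dev.drvdata = NULL;
}

/* css remove: the parent goes away with the subchannel binding. */
void model_sch_remove(struct model_subchannel *sch)
{
	free(sch->dev.drvdata);
	sch->dev.drvdata = NULL;
}

/* Interrupt path: returns 1 if a private handled it, 0 otherwise. */
int model_sch_irq(struct model_subchannel *sch)
{
	struct model_parent *parent = sch->dev.drvdata;
	struct model_private *private = parent->dev.drvdata;

	if (!private) {
		sch->disable_count++;	/* stands in for disabling the subchannel */
		return 0;
	}
	private->irq_count++;
	return 1;
}
```

In this model an interrupt that arrives before model_mdev_probe() or after
model_mdev_remove() takes the "no private" branch, which is the situation the
added !private checks in patch 1 are meant to cover.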

Patch 6 removes the trickery where vfio-ccw uses vfio_init_device instead of
vfio_alloc_device, and thus removes vfio_init_device from the outside world.
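
The difference between the two approaches can be sketched in user space
roughly as follows. This is only a model under stated assumptions: the
model_* names are hypothetical, the embedded core is assumed to be the first
member of the driver structure, and that assumption is checked at run time
here. The point is that a single helper owns allocation, core setup, and the
driver's init callback, so the driver no longer pairs its own kzalloc with a
separate init call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct model_core;

struct model_ops {
	int (*init)(struct model_core *core);	/* optional, 0 on success */
};

/* Hypothetical stand-in for struct vfio_device. */
struct model_core {
	const struct model_ops *ops;
};

/* Hypothetical stand-in for struct vfio_ccw_private. */
struct model_private {
	struct model_core vdev;	/* assumed to be the first member */
	int state;
};

/*
 * Allocate the whole driver structure, set up the embedded core, then
 * let the driver's init callback fill in the rest. Returns NULL on
 * failure; nothing is leaked.
 */
struct model_core *model_alloc_core(size_t size, size_t core_offset,
				    const struct model_ops *ops)
{
	struct model_core *core;

	if (core_offset != 0 || size < sizeof(*core))
		return NULL;

	core = calloc(1, size);
	if (!core)
		return NULL;

	core->ops = ops;
	if (ops->init && ops->init(core) != 0) {
		free(core);
		return NULL;
	}
	return core;
}

/* Typed wrapper: callers name their structure and the embedded member. */
#define model_alloc_device(type, member, ops) \
	((type *)model_alloc_core(sizeof(type), offsetof(type, member), (ops)))

/* Driver init callback: everything except the allocation lives here. */
int model_private_init(struct model_core *core)
{
	struct model_private *private = (struct model_private *)core;

	private->state = 1;	/* arbitrary "standby" value for the sketch */
	return 0;
}

const struct model_ops model_private_ops = {
	.init = model_private_init,
};
```

A caller would then write something like
model_alloc_device(struct model_private, vdev, &model_private_ops) and only
needs to handle one failure case, which is the simplification patch 6 makes
in vfio_ccw_mdev_probe.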

Patch 7 removes vfio_free_device from vfio-ccw and the other drivers (hello,
CC list!), letting it be handled by vfio_device_release directly.
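
A minimal user-space sketch of that ownership change, assuming a simple
integer refcount and hypothetical model_* names (this is not the vfio core
code): the driver's release callback only tears down what the driver itself
allocated, and the core frees the containing structure once the callback
returns. The callback is treated as optional here, which is what dropping
the empty vfio-ap release callback in patch 7 suggests.

```c
#include <assert.h>
#include <stdlib.h>

struct model_core;

struct model_ops {
	void (*release)(struct model_core *core);	/* optional */
};

/* Hypothetical stand-in for struct vfio_device. */
struct model_core {
	const struct model_ops *ops;
	int refcount;
};

/* Hypothetical driver structure; the core is assumed to be the first member. */
struct model_private {
	struct model_core vdev;
	char *guest_cp;
};

int model_release_calls;	/* lets a caller observe the callback */

/* Driver callback: frees driver-owned resources, not the structure itself. */
void model_private_release(struct model_core *core)
{
	struct model_private *private = (struct model_private *)core;

	free(private->guest_cp);
	private->guest_cp = NULL;
	model_release_calls++;
}

const struct model_ops model_private_ops = {
	.release = model_private_release,
};

/* A driver with nothing of its own to clean up needs no callback at all. */
const struct model_ops model_no_release_ops = {
	.release = NULL,
};

/* Core side: run the optional callback, then free the allocation. */
void model_core_release(struct model_core *core)
{
	if (core->ops->release)
		core->ops->release(core);
	free(core);
}

/* Drop one reference; returns 1 if this was the last one. */
int model_put_device(struct model_core *core)
{
	if (--core->refcount > 0)
		return 0;
	model_core_release(core);
	return 1;
}
```

With the free centralized like this, a driver cannot forget it or call it
twice, and drivers whose callback did nothing else can drop the callback.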

I believe this covers everything in this space; let me know if not!

Thanks,
Eric

[1] https://lore.kernel.org/kvm/0-v3-57c1502c62fd+2190-ccw_mdev_...@nvidia.com/
[2] https://lore.kernel.org/kvm/20220602171948.2790690-1-far...@linux.ibm.com/

v2->v3:
 - [MR] Added r-b to remaining patches (Thank you!)
 - Patch 1:
   [gfx checkpatch] Whitespace
   [EF] Remove put_device(&parent->dev)
   [MR] Fix error exit when alloc of parent fails
   [MR] Check for !private on sch_probe error path
 - Patch 3:
   [EF] Fix error exit when alloc of private fails
 - Patch 6:
   [AW] Added ack (Thank you!)
 - Patch 7:
   [CH, AK] Added r-b (Thank you!)
   [AW] Added ack (Thank you!)
v2: https://lore.kernel.org/kvm/20221102150152.2521475-1-far...@linux.ibm.com/
v1: https://lore.kernel.org/kvm/20221019162135.798901-1-far...@linux.ibm.com/

Eric Farman (7):
  vfio/ccw: create a parent struct
  vfio/ccw: remove private->sch
  vfio/ccw: move private initialization to callback
  vfio/ccw: move private to mdev lifecycle
  vfio/ccw: remove release completion
  vfio/ccw: replace vfio_init_device with _alloc_
  vfio: Remove vfio_free_device

 drivers/gpu/drm/i915/gvt/kvmgt.c  |   1 -
 drivers/s390/cio/vfio_ccw_chp.c   |   5 +-
 drivers/s390/cio/vfio_ccw_drv.c   | 173 +++---
 drivers/s390/cio/vfio_ccw_fsm.c   |  27 ++--
 drivers/s390/cio/vfio_ccw_ops.c   | 107 +++-
 drivers/s390/cio/vfio_ccw_private.h   |  37 --
 drivers/s390/crypto/vfio_ap_ops.c |   6 -
 drivers/vfio/fsl-mc/vfio_fsl_mc.c |   1 -
 drivers/vfio/pci/vfio_pci_core.c  |   1 -
 drivers/vfio/platform/vfio_amba.c |   1 -
 drivers/vfio/platform/vfio_platform.c |   1 -
 drivers/vfio/vfio_main.c  |  32 ++---
 include/linux/vfio.h  |   3 -
 samples/vfio-mdev/mbochs.c|   1 -
 samples/vfio-mdev/mdpy.c  |   1 -
 samples/vfio-mdev/mtty.c  |   1 -
 16 files changed, 196 insertions(+), 202 deletions(-)

-- 
2.34.1



[Intel-gfx] ✓ Fi.CI.BAT: success for KUnit issues - Was: [igt-dev] [PATCH RFC v2 8/8] drm/i915: check if current->mm is not NULL (rev2)

2022-11-04 Thread Patchwork
== Series Details ==

Series: KUnit issues - Was: [igt-dev] [PATCH RFC v2 8/8] drm/i915: check if 
current->mm is not NULL (rev2)
URL   : https://patchwork.freedesktop.org/series/110492/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_12342 -> Patchwork_110492v2


Summary
---

  **SUCCESS**

  No regressions found.

  External URL: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110492v2/index.html

Participating hosts (27 -> 29)
--

  Additional (4): fi-adl-ddr5 fi-rkl-11600 fi-tgl-dsi fi-ilk-650 
  Missing(2): fi-ctg-p8600 fi-bdw-samus 

Possible new issues
---

  Here are the unknown changes that may have been introduced in 
Patchwork_110492v2:

### IGT changes ###

 Suppressed 

  The following results come from untrusted machines, tests, or statuses.
  They do not affect the overall result.

  * {igt@debugfs_test@basic-hwmon}:
- fi-adl-ddr5:NOTRUN -> [SKIP][1]
   [1]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110492v2/fi-adl-ddr5/igt@debugfs_t...@basic-hwmon.html

  
Known issues


  Here are the changes found in Patchwork_110492v2 that come from known issues:

### IGT changes ###

 Issues hit 

  * igt@gem_exec_gttfill@basic:
- fi-pnv-d510:[PASS][2] -> [FAIL][3] ([i915#7229])
   [2]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12342/fi-pnv-d510/igt@gem_exec_gttf...@basic.html
   [3]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110492v2/fi-pnv-d510/igt@gem_exec_gttf...@basic.html

  * igt@gem_huc_copy@huc-copy:
- fi-rkl-11600:   NOTRUN -> [SKIP][4] ([i915#2190])
   [4]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110492v2/fi-rkl-11600/igt@gem_huc_c...@huc-copy.html

  * igt@gem_lmem_swapping@parallel-random-engines:
- fi-rkl-11600:   NOTRUN -> [SKIP][5] ([i915#4613]) +3 similar issues
   [5]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110492v2/fi-rkl-11600/igt@gem_lmem_swapp...@parallel-random-engines.html
- fi-adl-ddr5:NOTRUN -> [SKIP][6] ([i915#4613]) +3 similar issues
   [6]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110492v2/fi-adl-ddr5/igt@gem_lmem_swapp...@parallel-random-engines.html

  * igt@gem_tiled_pread_basic:
- fi-rkl-11600:   NOTRUN -> [SKIP][7] ([i915#3282])
   [7]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110492v2/fi-rkl-11600/igt@gem_tiled_pread_basic.html
- fi-adl-ddr5:NOTRUN -> [SKIP][8] ([i915#3282])
   [8]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110492v2/fi-adl-ddr5/igt@gem_tiled_pread_basic.html

  * igt@i915_pm_backlight@basic-brightness:
- fi-adl-ddr5:NOTRUN -> [SKIP][9] ([i915#1155])
   [9]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110492v2/fi-adl-ddr5/igt@i915_pm_backli...@basic-brightness.html
- fi-rkl-11600:   NOTRUN -> [SKIP][10] ([i915#3012])
   [10]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110492v2/fi-rkl-11600/igt@i915_pm_backli...@basic-brightness.html

  * igt@i915_pm_rpm@basic-pci-d3-state:
- fi-ilk-650: NOTRUN -> [SKIP][11] ([fdo#109271]) +19 similar issues
   [11]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110492v2/fi-ilk-650/igt@i915_pm_...@basic-pci-d3-state.html

  * igt@i915_selftest@live@hangcheck:
- fi-hsw-g3258:   [PASS][12] -> [INCOMPLETE][13] ([i915#3303] / 
[i915#4785])
   [12]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12342/fi-hsw-g3258/igt@i915_selftest@l...@hangcheck.html
   [13]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110492v2/fi-hsw-g3258/igt@i915_selftest@l...@hangcheck.html

  * igt@i915_suspend@basic-s3-without-i915:
- fi-rkl-11600:   NOTRUN -> [INCOMPLETE][14] ([i915#4817])
   [14]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110492v2/fi-rkl-11600/igt@i915_susp...@basic-s3-without-i915.html

  * igt@kms_chamelium@dp-edid-read:
- fi-adl-ddr5:NOTRUN -> [SKIP][15] ([fdo#111827]) +8 similar issues
   [15]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110492v2/fi-adl-ddr5/igt@kms_chamel...@dp-edid-read.html

  * igt@kms_chamelium@hdmi-edid-read:
- fi-ilk-650: NOTRUN -> [SKIP][16] ([fdo#109271] / [fdo#111827]) +8 
similar issues
   [16]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110492v2/fi-ilk-650/igt@kms_chamel...@hdmi-edid-read.html

  * igt@kms_chamelium@hdmi-hpd-fast:
- fi-rkl-11600:   NOTRUN -> [SKIP][17] ([fdo#111827]) +7 similar issues
   [17]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110492v2/fi-rkl-11600/igt@kms_chamel...@hdmi-hpd-fast.html

  * igt@kms_cursor_legacy@basic-busy-flip-before-cursor:
- fi-rkl-11600:   NOTRUN -> [SKIP][18] ([i915#4103])
   [18]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110492v2/fi-rkl-11600/igt@kms_cursor_leg...@basic-busy-flip-before-cursor.html
- fi-adl-ddr5:NOTRUN -> [SKIP][19] ([i915#4103])
   [19]: 
https://intel-gfx-

[Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for KUnit issues - Was: [igt-dev] [PATCH RFC v2 8/8] drm/i915: check if current->mm is not NULL (rev2)

2022-11-04 Thread Patchwork
== Series Details ==

Series: KUnit issues - Was: [igt-dev] [PATCH RFC v2 8/8] drm/i915: check if 
current->mm is not NULL (rev2)
URL   : https://patchwork.freedesktop.org/series/110492/
State : warning

== Summary ==

Error: dim checkpatch failed
1bcd7d5d971a KUnit issues - Was: [igt-dev] [PATCH RFC v2 8/8] drm/i915: check 
if current->mm is not NULL
-:18: WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description 
(prefer a maximum 75 chars per line)
#18: 
> > > I'm facing a couple of issues when testing KUnit with the i915 driver.

-:261: ERROR:MISSING_SIGN_OFF: Missing Signed-off-by: line(s)

total: 1 errors, 1 warnings, 0 checks, 41 lines checked




[Intel-gfx] ✓ Fi.CI.BAT: success for series starting with [CI,1/2] Revert "freezer, sched: Rewrite core freezer logic fix"

2022-11-04 Thread Patchwork
== Series Details ==

Series: series starting with [CI,1/2] Revert "freezer, sched: Rewrite core 
freezer logic fix"
URL   : https://patchwork.freedesktop.org/series/110529/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_12342 -> Patchwork_110529v1


Summary
---

  **SUCCESS**

  No regressions found.

  External URL: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110529v1/index.html

Participating hosts (27 -> 28)
--

  Additional (3): fi-adl-ddr5 fi-rkl-11600 fi-ilk-650 
  Missing(2): fi-ctg-p8600 fi-bdw-samus 

Possible new issues
---

  Here are the unknown changes that may have been introduced in 
Patchwork_110529v1:

### IGT changes ###

 Suppressed 

  The following results come from untrusted machines, tests, or statuses.
  They do not affect the overall result.

  * {igt@debugfs_test@basic-hwmon}:
- fi-adl-ddr5:NOTRUN -> [SKIP][1]
   [1]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110529v1/fi-adl-ddr5/igt@debugfs_t...@basic-hwmon.html

  * igt@i915_selftest@live@hangcheck:
- {fi-jsl-1}: [PASS][2] -> [INCOMPLETE][3]
   [2]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12342/fi-jsl-1/igt@i915_selftest@l...@hangcheck.html
   [3]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110529v1/fi-jsl-1/igt@i915_selftest@l...@hangcheck.html

  
Known issues


  Here are the changes found in Patchwork_110529v1 that come from known issues:

### IGT changes ###

 Issues hit 

  * igt@gem_huc_copy@huc-copy:
- fi-rkl-11600:   NOTRUN -> [SKIP][4] ([i915#2190])
   [4]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110529v1/fi-rkl-11600/igt@gem_huc_c...@huc-copy.html

  * igt@gem_lmem_swapping@parallel-random-engines:
- fi-rkl-11600:   NOTRUN -> [SKIP][5] ([i915#4613]) +3 similar issues
   [5]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110529v1/fi-rkl-11600/igt@gem_lmem_swapp...@parallel-random-engines.html
- fi-adl-ddr5:NOTRUN -> [SKIP][6] ([i915#4613]) +3 similar issues
   [6]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110529v1/fi-adl-ddr5/igt@gem_lmem_swapp...@parallel-random-engines.html

  * igt@gem_tiled_pread_basic:
- fi-rkl-11600:   NOTRUN -> [SKIP][7] ([i915#3282])
   [7]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110529v1/fi-rkl-11600/igt@gem_tiled_pread_basic.html
- fi-adl-ddr5:NOTRUN -> [SKIP][8] ([i915#3282])
   [8]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110529v1/fi-adl-ddr5/igt@gem_tiled_pread_basic.html

  * igt@i915_pm_backlight@basic-brightness:
- fi-adl-ddr5:NOTRUN -> [SKIP][9] ([i915#1155])
   [9]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110529v1/fi-adl-ddr5/igt@i915_pm_backli...@basic-brightness.html
- fi-rkl-11600:   NOTRUN -> [SKIP][10] ([i915#3012])
   [10]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110529v1/fi-rkl-11600/igt@i915_pm_backli...@basic-brightness.html

  * igt@i915_pm_rpm@basic-pci-d3-state:
- fi-ilk-650: NOTRUN -> [SKIP][11] ([fdo#109271]) +19 similar issues
   [11]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110529v1/fi-ilk-650/igt@i915_pm_...@basic-pci-d3-state.html

  * igt@kms_chamelium@dp-edid-read:
- fi-adl-ddr5:NOTRUN -> [SKIP][12] ([fdo#111827]) +8 similar issues
   [12]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110529v1/fi-adl-ddr5/igt@kms_chamel...@dp-edid-read.html

  * igt@kms_chamelium@hdmi-edid-read:
- fi-ilk-650: NOTRUN -> [SKIP][13] ([fdo#109271] / [fdo#111827]) +8 
similar issues
   [13]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110529v1/fi-ilk-650/igt@kms_chamel...@hdmi-edid-read.html

  * igt@kms_chamelium@hdmi-hpd-fast:
- fi-rkl-11600:   NOTRUN -> [SKIP][14] ([fdo#111827]) +8 similar issues
   [14]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110529v1/fi-rkl-11600/igt@kms_chamel...@hdmi-hpd-fast.html

  * igt@kms_cursor_legacy@basic-busy-flip-before-cursor:
- fi-rkl-11600:   NOTRUN -> [SKIP][15] ([i915#4103])
   [15]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110529v1/fi-rkl-11600/igt@kms_cursor_leg...@basic-busy-flip-before-cursor.html
- fi-adl-ddr5:NOTRUN -> [SKIP][16] ([i915#4103])
   [16]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110529v1/fi-adl-ddr5/igt@kms_cursor_leg...@basic-busy-flip-before-cursor.html

  * igt@kms_force_connector_basic@force-load-detect:
- fi-rkl-11600:   NOTRUN -> [SKIP][17] ([fdo#109285] / [i915#4098])
   [17]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110529v1/fi-rkl-11600/igt@kms_force_connector_ba...@force-load-detect.html
- fi-adl-ddr5:NOTRUN -> [SKIP][18] ([fdo#109285])
   [18]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110529v1/fi-adl-ddr5/igt@kms_force_connector_ba...@force-load-detect.html

  * igt@kms_psr@cursor_plane_move:
- fi-a

Re: [Intel-gfx] KUnit issues - Was: [igt-dev] [PATCH RFC v2 8/8] drm/i915: check if current->mm is not NULL

2022-11-04 Thread Mauro Carvalho Chehab
On Fri, 4 Nov 2022 08:49:55 +0100
Mauro Carvalho Chehab  wrote:

> On Thu, 3 Nov 2022 15:43:26 -0700
> Daniel Latypov  wrote:
> 
> > On Thu, Nov 3, 2022 at 8:23 AM Mauro Carvalho Chehab
> >  wrote:  
> > >
> > > Hi,
> > >
> > > I'm facing a couple of issues when testing KUnit with the i915 driver.
> > >
> > > The DRM subsystem and the i915 driver have, for a long time, had their
> > > own way to do unit tests, which seems to have been added before KUnit.
> > >
> > > I'm now checking whether it is worth starting to use KUnit in i915. So,
> > > I wrote an RFC with some patches adding support for the tests we have
> > > to be reported using Kernel TAP and KUnit.
> > >
> > > There are basically 3 groups of tests there:
> > >
> > > - mock tests - check i915 hardware-independent logic;
> > > - live tests - run some hardware-specific tests;
> > > - perf tests - check perf support - also hardware-dependent.
> > >
> > > As they depend on i915 driver, they run only on x86, with PCI
> > > stack enabled, but the mock tests run nicely via qemu.
> > >
> > > The live and perf tests require real hardware. As we run them
> > > together with our CI, which, among other things, test module
> > > unload/reload and test loading i915 driver with different
> > > modprobe parameters, the KUnit tests should be able to run as
> > > a module.
> > >
> > > While testing KUnit, I noticed a couple of issues:
> > >
> > > 1. kunit.py parser is currently broken when used with modules
> > >
> > > the parser expects "TAP version xx" output, but this won't
> > > happen when loading the kunit test driver.
> > >
> > > Are there any plans or patches fixing this issue?
> > 
> > Partially.
> > Note: we need a header to look for so we can strip prefixes (like 
> > timestamps).
> > 
> > But there is a patch in the works to add a TAP header for each
> > subtest, hopefully in time for 6.2.  
> 
> Good to know.
> 
> > This is to match the KTAP spec:
> > https://kernel.org/doc/html/latest/dev-tools/ktap.html  
> 
> I see.
> 
> > That should fix it so you can parse one suite's results at a time.
> > I'm pretty sure it won't fix the case where there's multiple suites
> > and/or you're trying to parse all test results at once via
> > 
> > $ find /sys/kernel/debug/kunit/ -type f | xargs cat |
> > ./tools/testing/kunit/kunit.py parse  
> 
> Could you point me to the changeset? perhaps I can write a followup
> patch addressing this case.
> 
> > I think that in-kernel code change + some more python changes could
> > make the above command work, but no one has actively started looking
> > at that just yet.
> > Hopefully we can pick this up and also get it done for 6.2 (unless I'm
> > underestimating how complicated this is).
> >   
> > >
> > > 2. current->mm is not initialized
> > >
> > > Some tests do mmap(). They need the mm user context to be initialized,
> > > but this is not happening right now.
> > >
> > > Is there a way to properly initialize it for KUnit?
> > 
> > Right, this is a consequence of how early built-in KUnit tests are run
> > after boot.
> > I think for now, the answer is to make the test module-only.
> > 
> > I know David had some ideas here, but I can't speak to them.  
> 
> This is happening when test-i915 is built as module as well.
> 
> I suspect that the function which initializes it is mm_alloc() inside 
> kernel/fork.c:
> 
>   struct mm_struct *mm_alloc(void)
>   {
>   struct mm_struct *mm;
> 
>   mm = allocate_mm();
>   if (!mm)
>   return NULL;
> 
>   memset(mm, 0, sizeof(*mm));
>   return mm_init(mm, current, current_user_ns());
>   }
> 
> As modprobing a test won't fork until all tests run, this never runs.
> 
> It seems that the normal usage is at fs/exec.c:
> 
>   fs/exec.c:  bprm->mm = mm = mm_alloc();
> 
> but other places also call it:
> 
>   arch/arm/mach-rpc/ecard.c:  struct mm_struct * mm = mm_alloc();
>   drivers/dma-buf/dma-resv.c: struct mm_struct *mm = mm_alloc();
>   include/linux/sched/mm.h:extern struct mm_struct *mm_alloc(void);
>   mm/debug_vm_pgtable.c:  args->mm = mm_alloc();
> 
> Probably the solution would be to call it inside kunit executor code,
> adding support for modules to use it.
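
On the parser point quoted above (issue 1): the header matters because it tells a parser how many leading bytes (for example a dmesg timestamp) to drop from every later line. The following is a minimal user-space sketch of that idea only. It is not the kunit.py implementation, and the helper names tap_prefix_len() and strip_prefix() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the number of bytes that precede a "KTAP version" or
 * "TAP version" header on @line, or -1 if the line has no header.
 * "KTAP version" is checked first so the leading 'K' is not counted
 * as part of the prefix.
 */
static long tap_prefix_len(const char *line)
{
	const char *hdr = strstr(line, "KTAP version");

	if (!hdr)
		hdr = strstr(line, "TAP version");
	return hdr ? (long)(hdr - line) : -1;
}

/*
 * Skip @prefix_len bytes of @line. Lines shorter than the prefix
 * yield an empty string instead of reading past the terminator.
 */
static const char *strip_prefix(const char *line, size_t prefix_len)
{
	size_t len = strlen(line);

	return line + (prefix_len < len ? prefix_len : len);
}
```

With a 15-byte timestamp prefix learned from a header line, a line such as `[  152.816463] 1..1` from the log below is reduced to `1..1`. Without any header line, as in the module case described above, there is nothing to derive the prefix length from.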


Hmm... it is not that simple... I tried the enclosed patch, but it caused
another issue at the live/mman/mmap test:


[  152.815543] test_i915: :00:02.0: it is a i915 device.
[  152.816456] # Subtest: i915 live selftests
[  152.816463] 1..1
[  152.816835] kunit_try_run_case: allocating user context
[  152.816978] CPU: 1 PID: 1139 Comm: kunit_try_catch Tainted: G
 N 6.1.0-rc2-drm-110e9bebcbcc+ #20
[  152.817063] Hardware name: Intel Corporation Tiger Lake Client 
Platform/TigerLake Y LPDDR4x T4 Crb, BIOS TGLSFWI1.R00.3243.A01.2006102133 
06/10/2020
[  152.817583] i915: Performing live_mman selftests with 
st_random_seed=0x11aaba4d st_timeout=500
[  152.817735] test_i915: Setting dangerous o

[Intel-gfx] ✗ Fi.CI.DOCS: warning for series starting with [CI,1/2] Revert "freezer, sched: Rewrite core freezer logic fix"

2022-11-04 Thread Patchwork
== Series Details ==

Series: series starting with [CI,1/2] Revert "freezer, sched: Rewrite core 
freezer logic fix"
URL   : https://patchwork.freedesktop.org/series/110529/
State : warning

== Summary ==

Error: make htmldocs had i915 warnings
./drivers/gpu/drm/i915/i915_perf_types.h:319: warning: Function parameter or 
member 'lock' not described in 'i915_perf_stream'




Re: [Intel-gfx] [PATCH v2 4/7] vfio/ccw: move private to mdev lifecycle

2022-11-04 Thread Eric Farman
On Thu, 2022-11-03 at 19:22 -0400, Matthew Rosato wrote:
> On 11/2/22 11:01 AM, Eric Farman wrote:
> > Now that the mdev parent data is split out into its own struct,
> > it is safe to move the remaining private data to follow the
> > mdev probe/remove lifecycle. The mdev parent data will remain
> > where it is, and follow the subchannel and the css driver
> > interfaces.
> > 
> > Signed-off-by: Eric Farman 
> > ---
> >  drivers/s390/cio/vfio_ccw_drv.c | 15 +--
> >  drivers/s390/cio/vfio_ccw_ops.c | 26 +
> > -
> >  drivers/s390/cio/vfio_ccw_private.h |  2 ++
> >  3 files changed, 16 insertions(+), 27 deletions(-)
> > 
> 
> ...
> 
> > diff --git a/drivers/s390/cio/vfio_ccw_ops.c
> > b/drivers/s390/cio/vfio_ccw_ops.c
> > index eb0b8cc210bb..e45d4acb109b 100644
> > --- a/drivers/s390/cio/vfio_ccw_ops.c
> > +++ b/drivers/s390/cio/vfio_ccw_ops.c
> > @@ -100,15 +100,20 @@ static int vfio_ccw_mdev_probe(struct mdev_device *mdev)
> >  {
> > struct subchannel *sch = to_subchannel(mdev->dev.parent);
> > struct vfio_ccw_parent *parent = dev_get_drvdata(&sch->dev);
> > -   struct vfio_ccw_private *private = dev_get_drvdata(&parent->dev);
> > +   struct vfio_ccw_private *private;
> > int ret;
> >  
> > -   if (private->state == VFIO_CCW_STATE_NOT_OPER)
> > -   return -ENODEV;
> > +   private = kzalloc(sizeof(*private), GFP_KERNEL);
> > +   if (!private)
> > +   return -ENOMEM;
> 
> Ha, looks like you time traveled and took my advice :)

Ha, I forgot I did this in the future. :)

> 
> In fact it looks like some of my other comments from patch 1 get
> cleaned up here too -- but would still be good to make those changes
> in patch 1 for completeness/bisect.

Agreed, I'll pull those down to patch 1; thanks.

> 
> Reviewed-by: Matthew Rosato 
> 



Re: [Intel-gfx] [PATCH v2 0/7] vfio-ccw parent rework

2022-11-04 Thread Eric Farman
On Thu, 2022-11-03 at 19:43 -0400, Matthew Rosato wrote:
> On 11/3/22 5:56 PM, Alex Williamson wrote:
> > On Wed,  2 Nov 2022 16:01:45 +0100
> > Eric Farman  wrote:
> > 
> > > Hi all,
> > > 
> > > Here is an update to the vfio-ccw lifecycle changes that have been
> > > discussed in various forms over the past year [1][2] or so, and
> > > which I dusted off recently.
> > > 
> > > Patches 1-5 rework the behavior of the vfio-ccw driver's private
> > > struct. In summary, the mdev pieces are split out of
> > > vfio_ccw_private and into a new vfio_ccw_parent struct that will
> > > continue to follow today's lifecycle. The remainder (bulk) of the
> > > private struct moves to follow the mdev probe/remove pair. There's
> > > opportunity for further separation of the things in the private
> > > struct, which would simplify some of the vfio-ccw code, but it got
> > > too hairy as I started that. Once vfio-ccw is no longer considered
> > > unique, those cleanups can happen at our leisure.
> > > 
> > > Patch 6 removes the trickery where vfio-ccw uses vfio_init_device
> > > instead of vfio_alloc_device, and thus removes vfio_init_device
> > > from the outside world.
> > > 
> > > Patch 7 removes vfio_free_device from vfio-ccw and the other
> > > drivers (hello, CC list!), letting it be handled by
> > > vfio_device_release directly.
> > 
> > Looks like another spin is pending, but the vfio core and collateral
> > changes in 6 and 7 look good to me.  Would this go in through the
> > vfio or s390 tree?  I'd be happy to merge or provide a branch,
> > depending on the route.
> > 
> > For 6 & 7:
> > Acked-by: Alex Williamson 
> > 
> > Thanks,
> > Alex
> 
> LGTM with those few comments addressed -- @Eric please send a v3 and
> I think it's ready.

Will do that now; thanks Matt.

> 
> I would suggest vfio tree to reduce the chance of conflicts; this
> touches various vfio drivers (and main) with the last patches while
> the s390 hits are at least all contained to the vfio-ccw driver code.
> 

Agreed. Thanks to you both.


Re: [Intel-gfx] [PATCH 2/9] drm/i915: Use kmap_local_page() in gem/i915_gem_pyhs.c

2022-11-04 Thread Zhao Liu
On Sat, Oct 29, 2022 at 03:32:08PM +0200, Fabio M. De Francesco wrote:
> Date: Sat, 29 Oct 2022 15:32:08 +0200
> From: "Fabio M. De Francesco" 
> Subject: Re: [PATCH 2/9] drm/i915: Use kmap_local_page() in
>  gem/i915_gem_pyhs.c
> 
> On Monday, 17 October 2022 11:37:18 CEST Zhao Liu wrote:
> > From: Zhao Liu 
> > 
> > The use of kmap_atomic() is being deprecated in favor of
> > kmap_local_page()[1].
> > 
> > The main difference between atomic and local mappings is that local
> > mappings don't disable page faults or preemption.
> > 
> > In drm/i915/gem/i915_gem_phys.c, the functions
> > i915_gem_object_get_pages_phys() and i915_gem_object_put_pages_phys()
> > don't need to disable pagefaults and preemption for mapping because of
> > these 2 reasons:
> > 
> > 1. The flush operation is safe for CPU hotplug when preemption is not
> > disabled. In drm/i915/gem/i915_gem_phys.c, the functions
> > i915_gem_object_get_pages_phys() and i915_gem_object_put_pages_phys()
> > call drm_clflush_virt_range() to use CLFLUSHOPT or WBINVD to flush.
> > Since CLFLUSHOPT is global on x86 and WBINVD is called on each CPU in
> > drm_clflush_virt_range(), the flush operation is global and any issue
> > with CPUs being added or removed can be handled safely.
> > 
> > 2. Any context switch caused by preemption or sleep (pagefault may
> > cause sleep) doesn't affect the validity of local mapping.
> > 
> > Therefore, i915_gem_object_get_pages_phys() and
> > i915_gem_object_put_pages_phys() are two functions where the use of
> > kmap_local_page() in place of kmap_atomic() is correctly suited.
> > 
> > Convert the calls of kmap_atomic() / kunmap_atomic() to
> > kmap_local_page() / kunmap_local().
> > 
> 
> I have here the same questions as in 1/9.
> 
> > [1]: https://lore.kernel.org/all/20220813220034.806698-1-ira.we...@intel.com
> > 
> > Suggested-by: Dave Hansen 
> > Suggested-by: Ira Weiny 
> > Suggested-by: Fabio M. De Francesco 
> > Signed-off-by: Zhao Liu 
> > ---
> > Suggested by credits:
> >   Dave: Referred to his explanation about cache flush.
> >   Ira: Referred to his task document, review comments and explanation about
> >cache flush.
> >   Fabio: Referred to his boiler plate commit message.
> > ---
> >  drivers/gpu/drm/i915/gem/i915_gem_phys.c | 8 
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_phys.c b/drivers/gpu/drm/i915/gem/i915_gem_phys.c
> > index 0d0e46dae559..d602ba19ecb2 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_phys.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_phys.c
> > @@ -66,10 +66,10 @@ static int i915_gem_object_get_pages_phys(struct drm_i915_gem_object *obj)
> > if (IS_ERR(page))
> > goto err_st;
> > 
> > -   src = kmap_atomic(page);
> > +   src = kmap_local_page(page);
> > memcpy(dst, src, PAGE_SIZE);
> > drm_clflush_virt_range(dst, PAGE_SIZE);
> > -   kunmap_atomic(src);
> > +   kunmap_local(src);
> 
> Please use memcpy_from_page() instead of open coding mapping + memcpy() + 
> unmapping.

Ok.

> 
> > 
> > put_page(page);
> > dst += PAGE_SIZE;
> > @@ -114,10 +114,10 @@ i915_gem_object_put_pages_phys(struct drm_i915_gem_object *obj,
> > if (IS_ERR(page))
> > continue;
> > 
> > -   dst = kmap_atomic(page);
> > +   dst = kmap_local_page(page);
> > drm_clflush_virt_range(src, PAGE_SIZE);
> > memcpy(dst, src, PAGE_SIZE);
> > -   kunmap_atomic(dst);
> > +   kunmap_local(dst);
> 
> For the same reasons given above, memcpy_to_page() should be used here to 
> avoid open coding three functions.
> 
> Using those helpers forces you to move drm_clflush_virt_range() out of the 
> mapping / un-mapping region. I may be wrong, however I'm pretty sure that the 
> relative positions of those call sites cannot be chosen at random.

I agree. Will use memcpy_to_page().

Thanks,
Zhao

> 
> Thanks,
> 
> Fabio
> 
> > 
> > set_page_dirty(page);
> > if (obj->mm.madv == I915_MADV_WILLNEED)
> 
> 
> 
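
For readers following the review comments above, here is a minimal user-space sketch of what switching to memcpy_from_page() / memcpy_to_page() could look like. The helper signatures match the kernel ones in include/linux/highmem.h, but struct page, kmap_local_page(), kunmap_local() and drm_clflush_virt_range() are stand-in stubs so the example builds outside the kernel, and copy_page_out() / copy_page_back() are hypothetical names for the two loop bodies. The flush is kept on the same side of the copy as in the quoted diff, which is an assumption about what the eventual patch would do, not the patch itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096

/* Stand-in for the kernel's struct page: just a page-sized buffer. */
struct page {
	char data[PAGE_SIZE];
};

/* Stubs: in this model a "local mapping" is simply the buffer itself. */
static void *kmap_local_page(struct page *page) { return page->data; }
static void kunmap_local(const void *addr) { (void)addr; }

/* Stub for drm_clflush_virt_range(); it only counts how often it ran. */
static int flush_calls;
static void drm_clflush_virt_range(void *addr, unsigned long length)
{
	(void)addr;
	(void)length;
	flush_calls++;
}

/* Same shape as the kernel helper: map, copy out of the page, unmap. */
static void memcpy_from_page(char *to, struct page *page,
			     size_t offset, size_t len)
{
	char *from = kmap_local_page(page);

	assert(offset + len <= PAGE_SIZE);
	memcpy(to, from + offset, len);
	kunmap_local(from);
}

/*
 * Same shape as the kernel helper: map, copy into the page, unmap.
 * The real helper also calls flush_dcache_page(), omitted in this model.
 */
static void memcpy_to_page(struct page *page, size_t offset,
			   const char *from, size_t len)
{
	char *to = kmap_local_page(page);

	assert(offset + len <= PAGE_SIZE);
	memcpy(to + offset, from, len);
	kunmap_local(to);
}

/* get_pages side: copy the page out, then flush the destination. */
static void copy_page_out(char *dst, struct page *page)
{
	memcpy_from_page(dst, page, 0, PAGE_SIZE);
	drm_clflush_virt_range(dst, PAGE_SIZE);
}

/* put_pages side: flush the source first, then copy it back. */
static void copy_page_back(struct page *page, char *src)
{
	drm_clflush_virt_range(src, PAGE_SIZE);
	memcpy_to_page(page, 0, src, PAGE_SIZE);
}
```

In both directions the flush acts on the phys-object buffer (dst or src) rather than on the mapped address, so moving it outside the map/unmap window keeps its order relative to the memcpy() unchanged in this model.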


[Intel-gfx] [PATCH CI 2/2] freezer, sched: Rewrite core freezer logic v2

2022-11-04 Thread Ville Syrjala
From: Peter Zijlstra 

On Wed, Nov 02, 2022 at 06:57:51PM +0200, Ville Syrjälä wrote:
> On Thu, Oct 27, 2022 at 06:53:23PM +0200, Peter Zijlstra wrote:
> > On Thu, Oct 27, 2022 at 04:09:01PM +0300, Ville Syrjälä wrote:
> > > On Wed, Oct 26, 2022 at 01:43:00PM +0200, Peter Zijlstra wrote:
> >
> > > > Could you please give the below a spin?
> > >
> > > Thanks. I've added this to our CI branch. I'll try to keep an eye
> > > on it in the coming days and let you know if anything still trips.
> > > And I'll report back maybe ~middle of next week if we haven't caught
> > > anything by then.
> >
> > Thanks!
>
> Looks like we haven't caught anything since I put the patch in.
> So the fix seems good.

While writing up the Changelog, it occurred to me it might be possible to
fix another way, could I bother you to also run the below patch for a
bit?

Link: 
https://lore.kernel.org/all/y2lsuifbuiy2a...@hirez.programming.kicks-ass.net/
Signed-off-by: Ville Syrjälä 
---
 kernel/sched/core.c | 52 ++---
 1 file changed, 35 insertions(+), 17 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index cb2aa2b54c7a..daff72f00385 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4200,6 +4200,40 @@ try_to_wake_up(struct task_struct *p, unsigned int 
state, int wake_flags)
return success;
 }
 
+static bool __task_needs_rq_lock(struct task_struct *p)
+{
+   unsigned int state = READ_ONCE(p->__state);
+
+   /*
+* Since pi->lock blocks try_to_wake_up(), we don't need rq->lock when
+* the task is blocked. Make sure to check @state since ttwu() can drop
+* locks at the end, see ttwu_queue_wakelist().
+*/
+   if (state == TASK_RUNNING || state == TASK_WAKING)
+   return true;
+
+   /*
+* Ensure we load p->on_rq after p->__state, otherwise it would be
+* possible to, falsely, observe p->on_rq == 0.
+*
+* See try_to_wake_up() for a longer comment.
+*/
+   smp_rmb();
+   if (p->on_rq)
+   return true;
+
+#ifdef CONFIG_SMP
+   /*
+* Ensure the task has finished __schedule() and will not be referenced
+* anymore. Again, see try_to_wake_up() for a longer comment.
+*/
+   smp_rmb();
+   smp_cond_load_acquire(&p->on_cpu, !VAL);
+#endif
+
+   return false;
+}
+
 /**
  * task_call_func - Invoke a function on task in fixed state
  * @p: Process for which the function is to be invoked, can be @current.
@@ -4217,28 +4251,12 @@ try_to_wake_up(struct task_struct *p, unsigned int 
state, int wake_flags)
 int task_call_func(struct task_struct *p, task_call_f func, void *arg)
 {
struct rq *rq = NULL;
-   unsigned int state;
struct rq_flags rf;
int ret;
 
raw_spin_lock_irqsave(&p->pi_lock, rf.flags);
 
-   state = READ_ONCE(p->__state);
-
-   /*
-* Ensure we load p->on_rq after p->__state, otherwise it would be
-* possible to, falsely, observe p->on_rq == 0.
-*
-* See try_to_wake_up() for a longer comment.
-*/
-   smp_rmb();
-
-   /*
-* Since pi->lock blocks try_to_wake_up(), we don't need rq->lock when
-* the task is blocked. Make sure to check @state since ttwu() can drop
-* locks at the end, see ttwu_queue_wakelist().
-*/
-   if (state == TASK_RUNNING || state == TASK_WAKING || p->on_rq)
+   if (__task_needs_rq_lock(p))
rq = __task_rq_lock(p, &rf);
 
/*
-- 
2.37.4



[Intel-gfx] [PATCH CI 1/2] Revert "freezer, sched: Rewrite core freezer logic fix"

2022-11-04 Thread Ville Syrjala
From: Ville Syrjälä 

This reverts commit f3387d5883ad92e9a54306fa3dff97d4f0581d78.
---
 kernel/sched/core.c | 49 -
 1 file changed, 17 insertions(+), 32 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f519f44cd4c7..cb2aa2b54c7a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4200,37 +4200,6 @@ try_to_wake_up(struct task_struct *p, unsigned int 
state, int wake_flags)
return success;
 }
 
-static bool __task_needs_rq_lock(struct task_struct *p)
-{
-   unsigned int state = READ_ONCE(p->__state);
-
-   /*
-* Since pi->lock blocks try_to_wake_up(), we don't need rq->lock when
-* the task is blocked. Make sure to check @state since ttwu() can drop
-* locks at the end, see ttwu_queue_wakelist().
-*/
-   if (state == TASK_RUNNING || state == TASK_WAKING)
-   return true;
-
-   /*
-* Ensure we load p->on_rq after p->__state, otherwise it would be
-* possible to, falsely, observe p->on_rq == 0.
-*
-* See try_to_wake_up() for a longer comment.
-*/
-   smp_rmb();
-   if (p->on_rq)
-   return true;
-
-#ifdef CONFIG_SMP
-   smp_rmb();
-   if (p->on_cpu)
-   return true;
-#endif
-
-   return false;
-}
-
 /**
  * task_call_func - Invoke a function on task in fixed state
  * @p: Process for which the function is to be invoked, can be @current.
@@ -4248,12 +4217,28 @@ static bool __task_needs_rq_lock(struct task_struct *p)
 int task_call_func(struct task_struct *p, task_call_f func, void *arg)
 {
struct rq *rq = NULL;
+   unsigned int state;
struct rq_flags rf;
int ret;
 
raw_spin_lock_irqsave(&p->pi_lock, rf.flags);
 
-   if (__task_needs_rq_lock(p))
+   state = READ_ONCE(p->__state);
+
+   /*
+* Ensure we load p->on_rq after p->__state, otherwise it would be
+* possible to, falsely, observe p->on_rq == 0.
+*
+* See try_to_wake_up() for a longer comment.
+*/
+   smp_rmb();
+
+   /*
+* Since pi->lock blocks try_to_wake_up(), we don't need rq->lock when
+* the task is blocked. Make sure to check @state since ttwu() can drop
+* locks at the end, see ttwu_queue_wakelist().
+*/
+   if (state == TASK_RUNNING || state == TASK_WAKING || p->on_rq)
rq = __task_rq_lock(p, &rf);
 
/*
-- 
2.37.4



[Intel-gfx] ✗ Fi.CI.BAT: failure for series starting with [1/2] drm/i915/pps: Add get_pps_idx() hook as part of pps_get_register() cleanup (rev2)

2022-11-04 Thread Patchwork
== Series Details ==

Series: series starting with [1/2] drm/i915/pps: Add get_pps_idx() hook as part 
of pps_get_register() cleanup (rev2)
URL   : https://patchwork.freedesktop.org/series/109820/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_12342 -> Patchwork_109820v2


Summary
---

  **FAILURE**

  Serious unknown changes coming with Patchwork_109820v2 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_109820v2, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109820v2/index.html

Participating hosts (27 -> 35)
--

  Additional (10): bat-dg2-8 bat-dg2-9 bat-adlp-6 bat-adlp-4 bat-adln-1 
bat-rplp-1 bat-rpls-1 bat-rpls-2 bat-dg2-11 bat-jsl-1 
  Missing(2): fi-ctg-p8600 fi-bdw-samus 

Possible new issues
---

  Here are the unknown changes that may have been introduced in 
Patchwork_109820v2:

### IGT changes ###

 Possible regressions 

  * igt@i915_module_load@load:
- bat-adlp-4: NOTRUN -> [DMESG-WARN][1]
   [1]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109820v2/bat-adlp-4/igt@i915_module_l...@load.html

  
 Suppressed 

  The following results come from untrusted machines, tests, or statuses.
  They do not affect the overall result.

  * igt@i915_module_load@load:
- {bat-rplp-1}:   NOTRUN -> [DMESG-WARN][2]
   [2]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109820v2/bat-rplp-1/igt@i915_module_l...@load.html

  
Known issues


  Here are the changes found in Patchwork_109820v2 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@core_hotunplug@unbind-rebind:
- fi-apl-guc: [PASS][3] -> [INCOMPLETE][4] ([i915#7073])
   [3]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12342/fi-apl-guc/igt@core_hotunp...@unbind-rebind.html
   [4]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109820v2/fi-apl-guc/igt@core_hotunp...@unbind-rebind.html

  * igt@gem_exec_gttfill@basic:
- fi-pnv-d510:[PASS][5] -> [FAIL][6] ([i915#7229])
   [5]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12342/fi-pnv-d510/igt@gem_exec_gttf...@basic.html
   [6]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109820v2/fi-pnv-d510/igt@gem_exec_gttf...@basic.html

  * igt@gem_linear_blits@basic:
- fi-pnv-d510:[PASS][7] -> [SKIP][8] ([fdo#109271])
   [7]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12342/fi-pnv-d510/igt@gem_linear_bl...@basic.html
   [8]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109820v2/fi-pnv-d510/igt@gem_linear_bl...@basic.html

  * igt@i915_selftest@live@hangcheck:
- fi-hsw-4770:[PASS][9] -> [INCOMPLETE][10] ([i915#4785])
   [9]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12342/fi-hsw-4770/igt@i915_selftest@l...@hangcheck.html
   [10]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109820v2/fi-hsw-4770/igt@i915_selftest@l...@hangcheck.html
- fi-hsw-g3258:   [PASS][11] -> [INCOMPLETE][12] ([i915#3303] / 
[i915#4785])
   [11]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12342/fi-hsw-g3258/igt@i915_selftest@l...@hangcheck.html
   [12]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109820v2/fi-hsw-g3258/igt@i915_selftest@l...@hangcheck.html

  * igt@runner@aborted:
- fi-hsw-4770:NOTRUN -> [FAIL][13] ([fdo#109271] / [i915#4312] / 
[i915#5594])
   [13]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109820v2/fi-hsw-4770/igt@run...@aborted.html
- bat-adlp-4: NOTRUN -> [FAIL][14] ([i915#4312])
   [14]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109820v2/bat-adlp-4/igt@run...@aborted.html
- fi-hsw-g3258:   NOTRUN -> [FAIL][15] ([fdo#109271] / [i915#4312] / 
[i915#4991])
   [15]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109820v2/fi-hsw-g3258/igt@run...@aborted.html

  
#### Possible fixes ####

  * 
igt@kms_cursor_legacy@basic-busy-flip-before-cursor@atomic-transitions-varying-size:
- fi-bsw-kefka:   [FAIL][16] ([i915#6298]) -> [PASS][17]
   [16]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12342/fi-bsw-kefka/igt@kms_cursor_legacy@basic-busy-flip-before-cur...@atomic-transitions-varying-size.html
   [17]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109820v2/fi-bsw-kefka/igt@kms_cursor_legacy@basic-busy-flip-before-cur...@atomic-transitions-varying-size.html

  
  {name}: This element is suppressed. This means it is ignored when computing
  the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109285]: https://bugs.freedesktop.org/show_bug.cgi?id=109285
  [fdo#109295]: https://bugs.freedesktop.org/show_bug.cgi?id=109295

Re: [Intel-gfx] [PATCH 1/9] drm/i915: Use kmap_local_page() in gem/i915_gem_object.c

2022-11-04 Thread Zhao Liu
On Thu, Nov 03, 2022 at 08:22:04PM +0100, Fabio M. De Francesco wrote:
> Date: Thu, 03 Nov 2022 20:22:04 +0100
> From: "Fabio M. De Francesco" 
> Subject: Re: [PATCH 1/9] drm/i915: Use kmap_local_page() in
>  gem/i915_gem_object.c
> 
> On Thursday, 3 November 2022 17:51:23 CET Ira Weiny wrote:
> > On Sat, Oct 29, 2022 at 01:17:03PM +0200, Fabio M. De Francesco wrote:
> > > On Monday, 17 October 2022 11:37:17 CEST Zhao Liu wrote:
> > > > From: Zhao Liu 
> > > > 
> > > > The use of kmap_atomic() is being deprecated in favor of
> > > > kmap_local_page()[1].
> > > > 
> > > > The main difference between atomic and local mappings is that local
> > > > mappings doesn't disable page faults or preemption.
> > > 
> > > > You are right about page faults which are never disabled by
> > > kmap_local_page(). However kmap_atomic might not disable preemption. It
> > > depends on CONFIG_PREEMPT_RT.
> > > 
> > > Please refer to how kmap_atomic_prot() works (this function is called by
> > > kmap_atomic() when kernels have HIGHMEM enabled).
> > > 
> > > > There're 2 reasons why i915_gem_object_read_from_page_kmap() doesn't
> > > > need to disable pagefaults and preemption for mapping:
> > > > 
> > > > 1. The flush operation is safe for CPU hotplug when preemption is not
> > > > disabled.
> > > 
> > > I'm confused here. Why are you talking about CPU hotplug?
> > 
> > I agree with Fabio here.  I'm not making the connection between cpu hotplug and
> > this code path.
> > 
> > Ira
> 
> @Zhao,
> 
> I'd like to add that I was about to put my reviewed-by tag. The other things
> I objected to are minor nits. Please just clarify this connection.

Thanks Fabio for your comments! Sorry I missed the mails that day. This connection
was my misunderstanding. For my other thoughts, please refer to my reply to your
first email in this thread.

Thanks,
Zhao



Re: [Intel-gfx] [PATCH 1/9] drm/i915: Use kmap_local_page() in gem/i915_gem_object.c

2022-11-04 Thread Zhao Liu
On Thu, Nov 03, 2022 at 09:51:23AM -0700, Ira Weiny wrote:
> Date: Thu, 3 Nov 2022 09:51:23 -0700
> From: Ira Weiny 
> Subject: Re: [PATCH 1/9] drm/i915: Use kmap_local_page() in
>  gem/i915_gem_object.c
> 
> On Sat, Oct 29, 2022 at 01:17:03PM +0200, Fabio M. De Francesco wrote:
> > On Monday, 17 October 2022 11:37:17 CEST Zhao Liu wrote:
> > > From: Zhao Liu 
> > > 
> > > The use of kmap_atomic() is being deprecated in favor of
> > > kmap_local_page()[1].
> > > 
> > > The main difference between atomic and local mappings is that local
> > > mappings doesn't disable page faults or preemption.
> > 
> > You are right about page faults which are never disabled by 
> > kmap_local_page(). However kmap_atomic might not disable preemption. It 
> > depends on CONFIG_PREEMPT_RT.
> > 
> > Please refer to how kmap_atomic_prot() works (this function is called by 
> > kmap_atomic() when kernels have HIGHMEM enabled).
> > 
> > > 
> > > There're 2 reasons why i915_gem_object_read_from_page_kmap() doesn't
> > > need to disable pagefaults and preemption for mapping:
> > > 
> > > 1. The flush operation is safe for CPU hotplug when preemption is not
> > > disabled. 
> > 
> > I'm confused here. Why are you talking about CPU hotplug?
> 
> I agree with Fabio here.  I'm not making the connection between cpu hotplug and
> this code path.

Sorry, my misunderstanding. Will delete this wrong explanation.

Thanks,
Zhao


Re: [Intel-gfx] [PATCH 1/9] drm/i915: Use kmap_local_page() in gem/i915_gem_object.c

2022-11-04 Thread Zhao Liu
On Sat, Oct 29, 2022 at 01:17:03PM +0200, Fabio M. De Francesco wrote:
> Date: Sat, 29 Oct 2022 13:17:03 +0200
> From: "Fabio M. De Francesco" 
> Subject: Re: [PATCH 1/9] drm/i915: Use kmap_local_page() in
>  gem/i915_gem_object.c
> 
> On Monday, 17 October 2022 11:37:17 CEST Zhao Liu wrote:
> > From: Zhao Liu 
> > 
> > The use of kmap_atomic() is being deprecated in favor of
> > kmap_local_page()[1].
> > 
> > The main difference between atomic and local mappings is that local
> > mappings doesn't disable page faults or preemption.
> 
> You are right about page faults which are never disabled by 
> kmap_local_page(). However kmap_atomic might not disable preemption. It 
> depends on CONFIG_PREEMPT_RT.
> 
> Please refer to how kmap_atomic_prot() works (this function is called by 
> kmap_atomic() when kernels have HIGHMEM enabled).

Yes, there is some ambiguity here. What about "The main difference between
atomic and local mappings is that local mappings never disable page faults
or preemption"?

> 
> > 
> > There're 2 reasons why i915_gem_object_read_from_page_kmap() doesn't
> > need to disable pagefaults and preemption for mapping:
> > 
> > 1. The flush operation is safe for CPU hotplug when preemption is not
> > disabled. 
> 
> I'm confused here. Why are you talking about CPU hotplug?
> In any case, developers should never rely on implicit calls of 
> preempt_disable() for the reasons said above. Therefore, flush operations 
> should be allowed regardless of that potential side effect of kmap_atomic().

Sorry, it's my fault; I misunderstood the connection between hotplug and flush
here. While a mapping exists, the cpu cannot be unplugged via CPU-hotplug.
But whether a cpu is plugged or unplugged has nothing to do with flush. I will
delete this wrong description.

My initial consideration is that this interface of flush may require an atomic
context, so I want to explain more from the details of its implementation
that cache consistency can be guaranteed without atomic context. Is this
consideration redundant?
Also, do I need to state that migration is still ok for this flush interface
here (since __kmap_local_page_prot() doesn't always disable migration)?

> > In drm/i915/gem/i915_gem_object.c, the function
> > i915_gem_object_read_from_page_kmap() calls drm_clflush_virt_range()
> 
> If I recall correctly, drm_clflush_virt_range() can always be called with
> page faults and preemption enabled. If so, this is enough to say that the
> conversion is safe.
> 
> Is this code explicitly related to flushing the cache lines before removing /
> adding CPUs? If I recall correctly, there are several other reasons behind
> the need to issue cache line flushes. Am I wrong about this?
> 
> Can you please say more about what I'm missing here?
> 
> > to
> > use CLFLUSHOPT or WBINVD to flush. Since CLFLUSHOPT is global on x86
> > and WBINVD is called on each cpu in drm_clflush_virt_range(), the flush
> > operation is global and any issue with cpu's being added or removed
> > can be handled safely.
> 
> Again your main concern is about CPU hotplug.
> 
> Even if I'm missing something, do we really need all these details about the 
> inner workings of drm_clflush_virt_range()? 
> 
> I'm not an expert, so may be that I'm wrong about all I wrote above.
> 
> Therefore, can you please elaborate a little more for readers with very
> little knowledge of these kinds of things (like me and perhaps others)?
>  
> > 2. Any context switch caused by preemption or sleep (pagefault may
> > cause sleep) doesn't affect the validity of local mapping.
> 
> I'd replace "preemption or sleep" with "preemption and page faults" since
> you yourself then added that page faults lead to tasks being put to sleep.

Thanks, good advice.

Zhao



Re: [Intel-gfx] [PATCH 0/9] drm/i915: Replace kmap_atomic() with kmap_local_page()

2022-11-04 Thread Zhao Liu
On Sat, Oct 29, 2022 at 09:12:27AM +0200, Fabio M. De Francesco wrote:
> Date: Sat, 29 Oct 2022 09:12:27 +0200
> From: "Fabio M. De Francesco" 
> Subject: Re: [PATCH 0/9] drm/i915: Replace kmap_atomic() with
>  kmap_local_page()

Hi Fabio, thanks for your review!! (I'm sorry I missed the previous mails).

> 
> On Monday, 17 October 2022 11:37:16 CEST Zhao Liu wrote:
> > From: Zhao Liu 
> > 
> > The use of kmap_atomic() is being deprecated in favor of
> > kmap_local_page()[1].
> 
> Some words to explain why kmap_atomic was deprecated won't hurt. Many 
> maintainers and reviewers, and also casual readers might not yet be aware of 
> the reasons behind that deprecation.
>  
> > In the following patches, we can convert the calls of kmap_atomic() /
> > kunmap_atomic() to kmap_local_page() / kunmap_local(), which can
> > instead do the mapping / unmapping regardless of the context.
> 
> Readers are probably much more interested in what you did in the following 
> patches and why you did it, instead of being informed about what "we can" do.
> 
> I would suggest something like "The following patches convert the calls to 
> kmap_atomic() to kmap_local_page() [the rest looks OK]".
> 
> This could also be the place to say something about why we prefer 
> kmap_local_page() to kmap_atomic(). 
> 
> Are you sure that the reasons that motivate your conversions can be merely
> summarized as kmap_local_page() being able to do mappings regardless of
> context? I think you are missing the real reasons why.

Thanks for your reminder, I'll emphasize the motivation here.

> What about avoiding the often unwanted side effect of unnecessary page faults 
> disables?

Good suggestion! I'll add this into this cover message.

What I think is that we have two reasons to do the replacement work:
1. (main motivation) Avoid unnecessary pagefault and preemption disabling to gain
performance benefits.
2. We are trying to deprecate the old kmap/kmap_atomic interface. A maintainer
said it's also a good reason, especially for the case where performance is not
critical [1].

In addition, also from [1], I find that in some cases people choose kmap_atomic()
because they want the atomic context. So, is the explanation of why the atomic
context is not needed also a reason? I understand that I need to give a specific
explanation in each commit depending on the situation (in which case, it is not
suitable to describe in the cover letter?).

[1]: https://lore.kernel.org/lkml/YzRVaJA0EyfcVisW@liuwe-devbox-debian-v2/#t

> 
> > 
> > With kmap_local_page(), the mapping is per thread, CPU local and not
> > globally visible.
> 
> No news here. kmap_atomic() is "per thread, CPU local and not globally
> visible". I cannot see any difference here between kmap_atomic() and
> kmap_local_page().

What about the description below, which refers to your doc?
"kmap_atomic() in the kernel creates a non-preemptible section
and disables pagefaults. This could be a source of unwanted latency.
kmap_local_page() effectively overcomes this issue because it doesn't
disable pagefaults or preemption."
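
To make the difference concrete, here is a toy user-space model of the two
mapping styles. It is only a sketch: every model_* name is a hypothetical
stand-in invented for illustration, not a kernel function, and the counters
merely record the side effects described above (which, per the review, are
not guaranteed for preemption on PREEMPT_RT).

```c
#include <stddef.h>
#include <string.h>

/*
 * User-space stand-ins, NOT the kernel functions: they only count the
 * side effects that the description above attributes to each mapping.
 */
static int model_preempt_off;   /* depth of "preemption disabled" */
static int model_pagefault_off; /* depth of "pagefaults disabled" */

static void *model_kmap_atomic(void *page)
{
	model_preempt_off++;   /* per the review, not guaranteed on PREEMPT_RT */
	model_pagefault_off++;
	return page;
}

static void model_kunmap_atomic(void *vaddr)
{
	(void)vaddr;
	model_pagefault_off--;
	model_preempt_off--;
}

static void *model_kmap_local_page(void *page)
{
	return page; /* no preemption or pagefault side effects */
}

static void model_kunmap_local(void *vaddr)
{
	(void)vaddr;
}

/* Copy len bytes out of the page; return how many things were disabled while mapped. */
static int model_read_atomic(void *page, void *dst, size_t len)
{
	void *vaddr = model_kmap_atomic(page);
	int disabled = model_preempt_off + model_pagefault_off;

	memcpy(dst, vaddr, len);
	model_kunmap_atomic(vaddr);
	return disabled;
}

/* Same copy through the local mapping; nothing is disabled while mapped. */
static int model_read_local(void *page, void *dst, size_t len)
{
	void *vaddr = model_kmap_local_page(page);
	int disabled = model_preempt_off + model_pagefault_off;

	memcpy(dst, vaddr, len);
	model_kunmap_local(vaddr);
	return disabled;
}
```

In this model the copy itself is identical in both variants; only the
bookkeeping around it differs, which is where the unwanted latency would
come from.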

Thanks,
Zhao



Re: [Intel-gfx] [PATCH v3 23/23] drm/fb-helper: Clarify use of last_close and output_poll_changed

2022-11-04 Thread Javier Martinez Canillas
On 11/3/22 16:14, Thomas Zimmermann wrote:
> Clarify documentation in the use of struct drm_driver.last_close and
> struct drm_mode_config_funcs.output_poll_changed. Those callbacks should
> not be set for fbdev implementations on top of struct drm_client_funcs.
> 
> Signed-off-by: Thomas Zimmermann 
> ---

Reviewed-by: Javier Martinez Canillas 

-- 
Best regards,

Javier Martinez Canillas
Core Platforms
Red Hat



Re: [Intel-gfx] [PATCH v3 20/23] drm/fb-helper: Set flag in struct drm_fb_helper for leaking physical addresses

2022-11-04 Thread Javier Martinez Canillas
On 11/3/22 16:14, Thomas Zimmermann wrote:
> Uncouple the parameter drm_leak_fbdev_smem from the implementation by
> setting a flag in struct drm_fb_helper. This will help to move the
> generic fbdev emulation into its own source file, while keeping the
> parameter in drm_fb_helper.c. No functional changes.
> 
> Signed-off-by: Thomas Zimmermann 
> ---

Reviewed-by: Javier Martinez Canillas 

-- 
Best regards,

Javier Martinez Canillas
Core Platforms
Red Hat



Re: [Intel-gfx] [PATCH] drm/i915/dsc: Add is_dsc_supported()

2022-11-04 Thread Jani Nikula
On Thu, 03 Nov 2022, "Navare, Manasi"  wrote:
> On Thu, Nov 03, 2022 at 11:32:22AM +0530, Swati Sharma wrote:
>> Let's use RUNTIME_INFO->has_dsc since platforms supporting dsc have this
>> flag enabled.
>> 
>> This is done based on the review comments received on
>> https://patchwork.freedesktop.org/patch/509393/

I don't think that's necessary. If it were an idea worth crediting, the
usual way is using Suggested-by: tag.

>> 
>> Signed-off-by: Swati Sharma 
>> ---
>>  drivers/gpu/drm/i915/display/intel_dp.c   | 6 +++---
>>  drivers/gpu/drm/i915/display/intel_vdsc.c | 7 ++-
>>  drivers/gpu/drm/i915/display/intel_vdsc.h | 2 ++
>>  3 files changed, 11 insertions(+), 4 deletions(-)
>> 
>> diff --git a/drivers/gpu/drm/i915/display/intel_dp.c b/drivers/gpu/drm/i915/display/intel_dp.c
>> index 7400d6b4c587..eb908da80f2b 100644
>> --- a/drivers/gpu/drm/i915/display/intel_dp.c
>> +++ b/drivers/gpu/drm/i915/display/intel_dp.c
>> @@ -1012,7 +1012,7 @@ intel_dp_mode_valid(struct drm_connector *_connector,
>>   * Output bpp is stored in 6.4 format so right shift by 4 to get the
>>   * integer value since we support only integer values of bpp.
>>   */
>> -if (DISPLAY_VER(dev_priv) >= 10 &&
>> +if (is_dsc_supported(dev_priv) &&
>>  drm_dp_sink_supports_dsc(intel_dp->dsc_dpcd)) {
>>  /*
>>   * TBD pass the connector BPC,
>> @@ -2906,7 +2906,7 @@ intel_edp_init_dpcd(struct intel_dp *intel_dp)
>>  intel_dp_set_max_sink_lane_count(intel_dp);
>>  
>>  /* Read the eDP DSC DPCD registers */
>> -if (DISPLAY_VER(dev_priv) >= 10)
>> +if (is_dsc_supported(dev_priv))
>>  intel_dp_get_dsc_sink_cap(intel_dp);
>>  
>>  /*
>> @@ -4691,7 +4691,7 @@ intel_dp_detect(struct drm_connector *connector,
>>  }
>>  
>>  /* Read DP Sink DSC Cap DPCD regs for DP v1.4 */
>> -if (DISPLAY_VER(dev_priv) >= 11)
>> +if (is_dsc_supported(dev_priv))
>>  intel_dp_get_dsc_sink_cap(intel_dp);
>>  
>>  intel_dp_configure_mst(intel_dp);
>> diff --git a/drivers/gpu/drm/i915/display/intel_vdsc.c b/drivers/gpu/drm/i915/display/intel_vdsc.c
>> index 269f9792390d..e7c1169538da 100644
>> --- a/drivers/gpu/drm/i915/display/intel_vdsc.c
>> +++ b/drivers/gpu/drm/i915/display/intel_vdsc.c
>> @@ -338,13 +338,18 @@ static const struct rc_parameters *get_rc_params(u16 compressed_bpp,
>>  return &rc_parameters[row_index][column_index];
>>  }
>>  
>> +bool is_dsc_supported(struct drm_i915_private *dev_priv)
>> +{
>> +return RUNTIME_INFO(dev_priv)->has_dsc;
>> +}
>> +

All of the wrappers to runtime/device info members are of the form:

#define HAS_DSC(__i915) (RUNTIME_INFO(__i915)->has_dsc)

in i915_drv.h.

>>  bool intel_dsc_source_support(const struct intel_crtc_state *crtc_state)
>>  {
>>  const struct intel_crtc *crtc = to_intel_crtc(crtc_state->uapi.crtc);
>>  struct drm_i915_private *i915 = to_i915(crtc->base.dev);
>>  enum transcoder cpu_transcoder = crtc_state->cpu_transcoder;
>>  
>> -if (!RUNTIME_INFO(i915)->has_dsc)
>> +if (!is_dsc_supported(i915))
>>  return false;
>>  
>>  if (DISPLAY_VER(i915) >= 12)
>
> In Runtime info, Gen 12 should have Gen 11 runtime has dsc set, so makes
> this check here redundant.

As it is, it's not redundant. It's tied to the transcoder check.

But this could be simplified as:

if (!HAS_DSC(i915))
return false;

if (DISPLAY_VER(i915) == 11 && cpu_transcoder == TRANSCODER_A)
return false;

return true;

It could be condensed even further, but at the cost of losing clarity.
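
As a minimal user-space sketch of the simplified check above: the struct,
the enum and the two macros below are stand-ins defined only for this
example (they mirror the names used in the thread but are not the i915
definitions), so the logic can be compiled and tried outside the driver.

```c
#include <stdbool.h>

/* Stand-in for enum transcoder; only the values needed here. */
enum model_transcoder { TRANSCODER_A, TRANSCODER_B, TRANSCODER_C };

/* Stand-in for the device info the real macros would read. */
struct model_i915 {
	int display_ver;
	bool has_dsc;
};

/* Same shape as the wrapper form suggested above, applied to the stand-in. */
#define HAS_DSC(__i915)     ((__i915)->has_dsc)
#define DISPLAY_VER(__i915) ((__i915)->display_ver)

/* The simplified source-support check, written out as a function. */
static bool model_dsc_source_support(const struct model_i915 *i915,
				     enum model_transcoder cpu_transcoder)
{
	if (!HAS_DSC(i915))
		return false;

	/* On display version 11, transcoder A is excluded. */
	if (DISPLAY_VER(i915) == 11 && cpu_transcoder == TRANSCODER_A)
		return false;

	return true;
}
```

With has_dsc set, the only rejected combination in this model is display
version 11 with transcoder A; without has_dsc everything is rejected.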

BR,
Jani.


>
> Manasi
>
>> diff --git a/drivers/gpu/drm/i915/display/intel_vdsc.h b/drivers/gpu/drm/i915/display/intel_vdsc.h
>> index 8763f00fa7e2..049e8b95fdde 100644
>> --- a/drivers/gpu/drm/i915/display/intel_vdsc.h
>> +++ b/drivers/gpu/drm/i915/display/intel_vdsc.h
>> @@ -12,7 +12,9 @@ enum transcoder;
>>  struct intel_crtc;
>>  struct intel_crtc_state;
>>  struct intel_encoder;
>> +struct drm_i915_private;
>>  
>> +bool is_dsc_supported(struct drm_i915_private *dev_priv);
>>  bool intel_dsc_source_support(const struct intel_crtc_state *crtc_state);
>>  void intel_uncompressed_joiner_enable(const struct intel_crtc_state *crtc_state);
>>  void intel_dsc_enable(const struct intel_crtc_state *crtc_state);
>> -- 
>> 2.25.1
>> 

-- 
Jani Nikula, Intel Open Source Graphics Center


[Intel-gfx] ✓ Fi.CI.IGT: success for Add DP MST DSC support to i915 (rev16)

2022-11-04 Thread Patchwork
== Series Details ==

Series: Add DP MST DSC support to i915 (rev16)
URL   : https://patchwork.freedesktop.org/series/101492/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_12337_full -> Patchwork_101492v16_full


Summary
---

  **SUCCESS**

  No regressions found.

  

Participating hosts (11 -> 11)
--

  No changes in participating hosts

Possible new issues
---

  Here are the unknown changes that may have been introduced in Patchwork_101492v16_full:

### IGT changes ###

#### Suppressed ####

  The following results come from untrusted machines, tests, or statuses.
  They do not affect the overall result.

  * igt@gem_ctx_isolation@preservation-reset@bcs0:
- {shard-rkl}:([PASS][1], [PASS][2]) -> ([INCOMPLETE][3], [PASS][4])
   [1]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12337/shard-rkl-4/igt@gem_ctx_isolation@preservation-re...@bcs0.html
   [2]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12337/shard-rkl-1/igt@gem_ctx_isolation@preservation-re...@bcs0.html
   [3]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101492v16/shard-rkl-5/igt@gem_ctx_isolation@preservation-re...@bcs0.html
   [4]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101492v16/shard-rkl-1/igt@gem_ctx_isolation@preservation-re...@bcs0.html

  * igt@gen7_exec_parse@batch-without-end:
- {shard-dg1}:[SKIP][5] ([fdo#109289]) -> [INCOMPLETE][6]
   [5]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12337/shard-dg1-17/igt@gen7_exec_pa...@batch-without-end.html
   [6]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101492v16/shard-dg1-19/igt@gen7_exec_pa...@batch-without-end.html

  
Known issues


  Here are the changes found in Patchwork_101492v16_full that come from known issues:

### CI changes ###

#### Possible fixes ####

  * boot:
- shard-apl:  ([PASS][7], [PASS][8], [PASS][9], [PASS][10], 
[PASS][11], [PASS][12], [PASS][13], [PASS][14], [PASS][15], [PASS][16], 
[PASS][17], [PASS][18], [PASS][19], [PASS][20], [FAIL][21], [PASS][22], 
[PASS][23], [PASS][24], [PASS][25], [PASS][26], [PASS][27], [PASS][28], 
[PASS][29], [PASS][30], [PASS][31]) ([i915#4386]) -> ([PASS][32], [PASS][33], 
[PASS][34], [PASS][35], [PASS][36], [PASS][37], [PASS][38], [PASS][39], 
[PASS][40], [PASS][41], [PASS][42], [PASS][43], [PASS][44], [PASS][45], 
[PASS][46], [PASS][47], [PASS][48], [PASS][49], [PASS][50], [PASS][51], 
[PASS][52], [PASS][53], [PASS][54], [PASS][55], [PASS][56])
   [7]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12337/shard-apl2/boot.html
   [8]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12337/shard-apl2/boot.html
   [9]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12337/shard-apl2/boot.html
   [10]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12337/shard-apl3/boot.html
   [11]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12337/shard-apl3/boot.html
   [12]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12337/shard-apl3/boot.html
   [13]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12337/shard-apl3/boot.html
   [14]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12337/shard-apl6/boot.html
   [15]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12337/shard-apl6/boot.html
   [16]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12337/shard-apl6/boot.html
   [17]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12337/shard-apl6/boot.html
   [18]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12337/shard-apl7/boot.html
   [19]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12337/shard-apl7/boot.html
   [20]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12337/shard-apl2/boot.html
   [21]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12337/shard-apl2/boot.html
   [22]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12337/shard-apl1/boot.html
   [23]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12337/shard-apl1/boot.html
   [24]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12337/shard-apl1/boot.html
   [25]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12337/shard-apl1/boot.html
   [26]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12337/shard-apl7/boot.html
   [27]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12337/shard-apl7/boot.html
   [28]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12337/shard-apl8/boot.html
   [29]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12337/shard-apl8/boot.html
   [30]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12337/shard-apl8/boot.html
   [31]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12337/shard-apl8/boot.html
   [32]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101492v16/shard-apl1/boot.html
   [33]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101492v16/shard-apl1/boot.html
   [34]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101492v16/shard-apl1/boot.html
   [35]: 
https://intel-gfx-ci

Re: [Intel-gfx] [PATCH] drm/i915: Don't wait forever in drop_caches

2022-11-04 Thread Tvrtko Ursulin



On 03/11/2022 19:16, John Harrison wrote:

On 11/3/2022 02:38, Tvrtko Ursulin wrote:

On 03/11/2022 09:18, Tvrtko Ursulin wrote:

On 03/11/2022 01:33, John Harrison wrote:

On 11/2/2022 07:20, Tvrtko Ursulin wrote:

On 02/11/2022 12:12, Jani Nikula wrote:

On Tue, 01 Nov 2022, john.c.harri...@intel.com wrote:

From: John Harrison 

At the end of each test, IGT does a drop caches call via sysfs with


sysfs?
Sorry, that was meant to say debugfs. I've also been working on some 
sysfs IGT issues and evidently got my wires crossed!





special flags set. One of the possible paths waits for idle with an
infinite timeout. That causes problems for debugging issues when CI
catches a "can't go idle" test failure. Best case, the CI system times
out (after 90s), attempts a bunch of state dump actions and then
reboots the system to recover it. Worst case, the CI system can't do
anything at all and then times out (after 1000s) and simply reboots.
Sometimes a serial port log of dmesg might be available, sometimes not.

So rather than making life hard for ourselves, change the timeout to
be 10s rather than infinite. Also, trigger the standard
wedge/reset/recover sequence so that testing can continue with a
working system (if possible).

Signed-off-by: John Harrison 
---
  drivers/gpu/drm/i915/i915_debugfs.c | 7 ++-
  1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index ae987e92251dd..9d916fbbfc27c 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -641,6 +641,9 @@ DEFINE_SIMPLE_ATTRIBUTE(i915_perf_noa_delay_fops,
    DROP_RESET_ACTIVE | \
    DROP_RESET_SEQNO | \
    DROP_RCU)
+
+#define DROP_IDLE_TIMEOUT    (HZ * 10)


I915_IDLE_ENGINES_TIMEOUT is defined in i915_drv.h. It's also only used
here.


So move here, dropping i915 prefix, next to the newly proposed one?

Sure, can do that.




I915_GEM_IDLE_TIMEOUT is defined in i915_gem.h. It's only used in
gt/intel_gt.c.


Move there and rename to GT_IDLE_TIMEOUT?

I915_GT_SUSPEND_IDLE_TIMEOUT is defined and used only in intel_gt_pm.c.


No action needed, maybe drop i915 prefix if wanted.

These two are totally unrelated and in code not being touched by 
this change. I would rather not conflate changing random other 
things with fixing this specific issue.



I915_IDLE_ENGINES_TIMEOUT is in ms, the rest are in jiffies.


Add _MS suffix if wanted.


My head spins.


I follow and raise that the newly proposed DROP_IDLE_TIMEOUT 
applies to DROP_ACTIVE and not only DROP_IDLE.
My original intention for the name was that is the 'drop caches 
timeout for intel_gt_wait_for_idle'. Which is quite the mouthful and 
hence abbreviated to DROP_IDLE_TIMEOUT. But yes, I realised later 
that name can be conflated with the DROP_IDLE flag. Will rename.





Things get refactored, code moves around, bits get left behind, who 
knows. No reason to get too worked up. :) As long as people are 
taking a wider view when touching the code base, and are not afraid 
to send cleanups, things should be good.
On the other hand, if every patch gets blocked in code review 
because someone points out some completely unrelated piece of code 
could be a bit better then nothing ever gets fixed. If you spot 
something that you think should be improved, isn't the general idea 
that you should post a patch yourself to improve it?


There's two maintainers per branch and an order of magnitude or two 
more developers so it'd be nice if cleanups would just be incoming on 
self-initiative basis. ;)


For the actual functional change at hand - it would be nice if code 
paths in question could handle SIGINT and then we could punt the 
decision on how long someone wants to wait purely to userspace. But 
it's probably hard and it's only debugfs so whatever.


The code paths in question will already abort on a signal won't 
they? Both intel_gt_wait_for_idle() and 
intel_guc_wait_for_pending_msg(), which is where the 
uc_wait_for_idle eventually ends up, have an 'if(signal_pending) 
return -EINTR;' check. Beyond that, it sounds like what you are 
asking for is a change in the IGT libraries and/or CI framework to 
start sending signals after some specific timeout. That seems like a 
significantly more complex change (in terms of the number of 
entities affected and number of groups involved) and unnecessary.


If you say so, I haven't looked at them all. But if the code path in 
question already aborts on signals then I am not sure what is the 
patch fixing? I assumed you are trying to avoid the write stuck in D 
forever, which then prevents driver unload and everything, requiring 
the test runner to eventually reboot. If you say SIGINT works then 
you can already recover from userspace, no?


Whether or not 10s is enough CI will hopefully tell us. I'd 
probably err on the side of safety and make it longer, but at most 
half from the test runner timeout.

Re: [Intel-gfx] [PATCH 11/11] drm/i915: Create resized LUTs for ivb+ split gamma mode

2022-11-04 Thread Ville Syrjälä
On Fri, Nov 04, 2022 at 10:49:39AM +0530, Nautiyal, Ankit K wrote:
> Patch looks good to me.
> 
> Minor suggestions inline:
> 
> On 10/26/2022 5:09 PM, Ville Syrjala wrote:
> > From: Ville Syrjälä 
> >
> > Currently when opeating in split gamma mode we do the
> nitpick: 'operating' typo.
> > "skip every other sw LUT entry" trick in the low level
> > LUT programming/readout functions. That is very annoying
> > and a big hindrance to revamping the color management
> > uapi.
> >
> > Let's get rid of that problem by making half sized copies
> > of the software LUTs and plugging those into the internal
> > {pre,post}_csc_lut attachment points (instead of sticking
> > the uapi provided sw LUTs there directly).
> >
> > With this the low level stuff will operate purely in terms
> > of the hardware LUT sizes, and all uapi nonsense is contained
> > to the atomic check phase. The one thing we do lose is
> > intel_color_assert_luts() since we no longer have a way to
> > check that the uapi LUTs were correctly used when generating
> > the internal copies. But that seems like a price worth paying.
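
The index mapping used for the half-sized copies can be tried in user space
with a small sketch like the one below. The struct is a stand-in with only
three channels, and the function name and any sizes a caller picks (for
example 1024 entries in, 512 out) are assumptions for illustration; only the
index expression is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a LUT entry; only the fields needed for the example. */
struct model_color_lut {
	uint16_t red, green, blue;
};

/*
 * Fill out[] by sampling in[] at evenly spaced positions, so that the
 * first and last input entries are always kept. This is the same index
 * expression as in the patch: i * (in_size - 1) / (out_size - 1).
 */
static void model_resize_lut(const struct model_color_lut *in, size_t in_size,
			     struct model_color_lut *out, size_t out_size)
{
	/* out_size of 1 would divide by zero; in[] must not be empty. */
	assert(in_size >= 1 && out_size >= 2);

	for (size_t i = 0; i < out_size; i++)
		out[i] = in[i * (in_size - 1) / (out_size - 1)];
}
```

Because the division truncates, halving a table does not pick exactly every
second entry everywhere, but the endpoints of the input table always
survive.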
> >
> > Signed-off-by: Ville Syrjälä 
> > ---
> >   drivers/gpu/drm/i915/display/intel_color.c | 81 +-
> >   1 file changed, 64 insertions(+), 17 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/display/intel_color.c b/drivers/gpu/drm/i915/display/intel_color.c
> > index 33871bfacee7..d48904f90e3a 100644
> > --- a/drivers/gpu/drm/i915/display/intel_color.c
> > +++ b/drivers/gpu/drm/i915/display/intel_color.c
> > @@ -597,6 +597,30 @@ create_linear_lut(struct drm_i915_private *i915, int lut_size)
> > return blob;
> >   }
> >   
> > +static struct drm_property_blob *
> > +create_resized_lut(struct drm_i915_private *i915,
> > +  const struct drm_property_blob *blob_in, int lut_out_size)
> > +{
> > +   int i, lut_in_size = drm_color_lut_size(blob_in);
> > +   struct drm_property_blob *blob_out;
> > +   const struct drm_color_lut *lut_in;
> > +   struct drm_color_lut *lut_out;
> > +
> > +   blob_out = drm_property_create_blob(&i915->drm,
> > +   sizeof(lut_out[0]) * lut_out_size,
> > +   NULL);
> > +   if (IS_ERR(blob_out))
> > +   return blob_out;
> > +
> > +   lut_in = blob_in->data;
> > +   lut_out = blob_out->data;
> > +
> > +   for (i = 0; i < lut_out_size; i++)
> > +   lut_out[i] = lut_in[i * (lut_in_size - 1) / (lut_out_size - 1)];
> > +
> > +   return blob_out;
> > +}
> > +
> >   static void i9xx_load_lut_8(struct intel_crtc *crtc,
> > const struct drm_property_blob *blob)
> >   {
> > @@ -723,19 +747,14 @@ static void ivb_load_lut_10(struct intel_crtc *crtc,
> > u32 prec_index)
> >   {
> > struct drm_i915_private *i915 = to_i915(crtc->base.dev);
> > -   int hw_lut_size = ivb_lut_10_size(prec_index);
> > const struct drm_color_lut *lut = blob->data;
> > int i, lut_size = drm_color_lut_size(blob);
> > enum pipe pipe = crtc->pipe;
> >   
> > -   for (i = 0; i < hw_lut_size; i++) {
> > -   /* We discard half the user entries in split gamma mode */
> > -   const struct drm_color_lut *entry =
> > -   &lut[i * (lut_size - 1) / (hw_lut_size - 1)];
> > -
> > +   for (i = 0; i < lut_size; i++) {
> > intel_de_write_fw(i915, PREC_PAL_INDEX(pipe), prec_index++);
> > intel_de_write_fw(i915, PREC_PAL_DATA(pipe),
> > - ilk_lut_10(entry));
> > + ilk_lut_10(&lut[i]));
> > }
> >   
> > /*
> > @@ -751,7 +770,6 @@ static void bdw_load_lut_10(struct intel_crtc *crtc,
> > u32 prec_index)
> >   {
> > struct drm_i915_private *i915 = to_i915(crtc->base.dev);
> > -   int hw_lut_size = ivb_lut_10_size(prec_index);
> > const struct drm_color_lut *lut = blob->data;
> > int i, lut_size = drm_color_lut_size(blob);
> > enum pipe pipe = crtc->pipe;
> > @@ -759,14 +777,9 @@ static void bdw_load_lut_10(struct intel_crtc *crtc,
> > intel_de_write_fw(i915, PREC_PAL_INDEX(pipe),
> >   prec_index | PAL_PREC_AUTO_INCREMENT);
> >   
> > -   for (i = 0; i < hw_lut_size; i++) {
> > -   /* We discard half the user entries in split gamma mode */
> > -   const struct drm_color_lut *entry =
> > -   &lut[i * (lut_size - 1) / (hw_lut_size - 1)];
> > -
> > +   for (i = 0; i < lut_size; i++)
> > intel_de_write_fw(i915, PREC_PAL_DATA(pipe),
> > - ilk_lut_10(entry));
> > -   }
> > + ilk_lut_10(&lut[i]));
> >   
> > /*
> >  * Reset the index, otherwise it prevents the legacy palette to be
> > @@ -1343,7 +1356,7 @@ void intel_color_assert_luts(const struct 
> > intel_crtc_state *crtc_state)
> > crtc_state->pre_csc_lut != 
> > i915->display.color.glk_linear_degamma_lut);
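
As an aside on the create_resized_lut() hunk quoted above: the resampling is
plain index decimation with integer division, so the first and last entries
are always kept. A minimal userspace sketch of just that index arithmetic
follows. The helper name lut_resample_index() is made up for illustration and
does not exist in i915; only the expression itself is taken from the patch.

```c
#include <assert.h>

/*
 * Hypothetical helper, not part of i915: returns the input LUT index that
 * the loop in create_resized_lut() would copy into output slot i.
 * Both sizes must be at least 2, otherwise the divisor would be zero.
 */
static int lut_resample_index(int i, int lut_in_size, int lut_out_size)
{
	assert(lut_in_size > 1 && lut_out_size > 1);
	assert(i >= 0 && i < lut_out_size);

	/* Endpoints map to endpoints; everything in between scales linearly. */
	return i * (lut_in_size - 1) / (lut_out_size - 1);
}
```

For example, squeezing a 1024 entry LUT into 512 hardware entries maps slot 0
to entry 0, slot 1 to entry 2 and slot 511 to entry 1023, i.e. roughly every
other user entry is dropped.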

Re: [Intel-gfx] [RFC][PATCH v3 13/33] timers: drm: Use timer_shutdown_sync() before freeing timer

2022-11-04 Thread Tvrtko Ursulin



Hi,

On 04/11/2022 05:41, Steven Rostedt wrote:

From: "Steven Rostedt (Google)" 

Before a timer is freed, timer_shutdown_sync() must be called.

Link: https://lore.kernel.org/all/20220407161745.7d675...@gandalf.local.home/

Cc: "Noralf Trønnes" 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Jani Nikula 
Cc: Joonas Lahtinen 
Cc: Rodrigo Vivi 
Cc: Tvrtko Ursulin 
Cc: dri-de...@lists.freedesktop.org
Cc: intel-gfx@lists.freedesktop.org
Signed-off-by: Steven Rostedt (Google) 
---
  drivers/gpu/drm/gud/gud_pipe.c   | 2 +-
  drivers/gpu/drm/i915/i915_sw_fence.c | 2 +-


If it stays as all DRM drivers in one patch then I guess it needs to go via 
drm-misc, which for i915 would be okay I think in this case since the patch 
is extremely unlikely to clash with anything. Or split it up per driver 
and then we can handle it in drm-intel-next once core functionality is in.


We do however have some more calls to del_timer_sync, where freeing is 
perhaps not immediately next to the site in code, but things definitely 
get freed like on module unload. Would we need to convert all of them to 
avoid some, presumably new, warnings?


Regards,

Tvrtko


  2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/gud/gud_pipe.c b/drivers/gpu/drm/gud/gud_pipe.c
index 7c6dc2bcd14a..08429bdd57cf 100644
--- a/drivers/gpu/drm/gud/gud_pipe.c
+++ b/drivers/gpu/drm/gud/gud_pipe.c
@@ -272,7 +272,7 @@ static int gud_usb_bulk(struct gud_device *gdrm, size_t len)
  
  	usb_sg_wait(&ctx.sgr);
  
-	if (!del_timer_sync(&ctx.timer))

+   if (!timer_shutdown_sync(&ctx.timer))
ret = -ETIMEDOUT;
else if (ctx.sgr.status < 0)
ret = ctx.sgr.status;
diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c 
b/drivers/gpu/drm/i915/i915_sw_fence.c
index 6fc0d1b89690..bfaa9a67dc35 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence.c
+++ b/drivers/gpu/drm/i915/i915_sw_fence.c
@@ -465,7 +465,7 @@ static void irq_i915_sw_fence_work(struct irq_work *wrk)
struct i915_sw_dma_fence_cb_timer *cb =
container_of(wrk, typeof(*cb), work);
  
-	del_timer_sync(&cb->timer);

+   timer_shutdown_sync(&cb->timer);
dma_fence_put(cb->dma);
  
  	kfree_rcu(cb, rcu);
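
To make the distinction behind the question above concrete: as I understand
the series, timer_shutdown_sync() differs from del_timer_sync() in that the
timer cannot be armed again afterwards. Below is a toy userspace model of
that idea only. struct toy_timer and the toy_*() functions are invented for
illustration, they are not kernel APIs, and they ignore locking and the
"wait for a running callback" part entirely.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct timer_list, reduced to what the model needs. */
struct toy_timer {
	void (*function)(struct toy_timer *t);	/* NULL once shut down */
	bool pending;
};

/* Arm the timer; refuses quietly if the timer has been shut down. */
static bool toy_mod_timer(struct toy_timer *t)
{
	if (!t->function)
		return false;
	t->pending = true;
	return true;
}

/* Models del_timer_sync(): deactivate, but arming again is still allowed. */
static bool toy_del_timer_sync(struct toy_timer *t)
{
	bool was_pending = t->pending;

	t->pending = false;
	return was_pending;
}

/* Models timer_shutdown_sync(): deactivate and make re-arming impossible. */
static bool toy_timer_shutdown_sync(struct toy_timer *t)
{
	bool was_pending = toy_del_timer_sync(t);

	t->function = NULL;
	return was_pending;
}

/* Dummy callback so that a freshly set up toy timer can be armed. */
static void toy_callback(struct toy_timer *t)
{
	(void)t;
}
```

In this model a del_timer_sync() call that is followed by a free somewhere
far away (module unload, say) still leaves a window in which other code could
arm the timer again, which is the case the shutdown variant is meant to close.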


Re: [Intel-gfx] [PATCH] drm/i915/selftest: Bump up sample period for busy stats selftest

2022-11-04 Thread Tvrtko Ursulin



On 03/11/2022 18:08, Umesh Nerlige Ramappa wrote:

On Thu, Nov 03, 2022 at 12:28:46PM +, Tvrtko Ursulin wrote:


On 03/11/2022 00:11, Umesh Nerlige Ramappa wrote:

Engine busyness sampling over a 10ms period is failing, with busyness
ranging approx. from 87% to 115%. The expected range is +/- 5% of the
sample period.

When determining busyness of an active engine, the GuC based engine
busyness implementation relies on a 64 bit timestamp register read. The
latency incurred by this register read causes the failure.

On DG1, when the test fails, the observed latencies range from 900us -
1.5ms.


Do I read this right - that the latency of a 64 bit timestamp register 
read is 0.9 - 1.5ms? That would be the read in guc_update_pm_timestamp?


Correct. That is total time taken by intel_uncore_read64_2x32() measured 
with local_clock().


One other thing I missed out in the comments is that enable_dc=0 also 
resolves the issue, but display team confirmed there is no relation to 
display in this case other than that it somehow introduces a latency in 
the reg read.


Could it be the DMC wreaking havoc, similar to b68763741aa2 
("drm/i915: Restore GT performance in headless mode with DMC loaded")?



One solution tried was to reduce the latency between reg read and
CPU timestamp capture, but such optimization does not add value to the user
since the CPU timestamp obtained here is only used for (1) selftest and
(2) i915 rps implementation specific to execlist scheduler. Also, this
solution only reduces the frequency of failure and does not eliminate
it.


Note that this solution is here - 
https://patchwork.freedesktop.org/patch/509991/?series=110497&rev=1


but I am not intending to use it since it just reduces the frequency of 
failures, but the inherent issue still exists.


Right, I'd just go with that as well if it makes a significant 
improvement. Or even just refactor intel_uncore_read64_2x32 to be under 
one spinlock/fw. I don't see that it can have an excuse to be less 
efficient since there's a loop in there.
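
For readers who do not have the function in front of them, the loop referred
to above is the usual "read upper, read lower, re-read upper until stable"
pattern for a 64 bit counter exposed as two 32 bit registers. Here is a
userspace sketch of that pattern, written from memory and not copied from
the driver. struct fake_counter and the read_*() helpers are stand-ins made
up for this example; in i915 each 32 bit read goes through the normal uncore
accessor, which is where the per-read lock/forcewake cost comes from.

```c
#include <stdint.h>

/* Hypothetical free-running counter that advances on every register read. */
struct fake_counter {
	uint64_t value;	/* current counter value */
	uint64_t step;	/* how far the counter moves per register read */
};

static uint32_t read_lower(struct fake_counter *c)
{
	uint32_t v = (uint32_t)c->value;

	c->value += c->step;
	return v;
}

static uint32_t read_upper(struct fake_counter *c)
{
	uint32_t v = (uint32_t)(c->value >> 32);

	c->value += c->step;
	return v;
}

/* Retry while the upper half changed underneath us, at most twice. */
static uint64_t read64_2x32(struct fake_counter *c)
{
	uint32_t upper, lower, old_upper;
	int loop = 0;

	upper = read_upper(c);
	do {
		old_upper = upper;
		lower = read_lower(c);
		upper = read_upper(c);
	} while (upper != old_upper && loop++ < 2);

	return ((uint64_t)upper << 32) | lower;
}
```

If the lower half wraps between the two upper reads, the first pass is
discarded and the second pass returns a consistent value. The cost is at
least three register reads per call, so whatever latency a single read has
is paid several times over.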


Regards,

Tvrtko


Regards,
Umesh



In order to make the selftest more robust and account for such
latencies, increase the sample period to 100 ms.

Signed-off-by: Umesh Nerlige Ramappa 
---
 drivers/gpu/drm/i915/gt/selftest_engine_pm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_pm.c 
b/drivers/gpu/drm/i915/gt/selftest_engine_pm.c

index 0dcb3ed44a73..87c94314cf67 100644
--- a/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
+++ b/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
@@ -317,7 +317,7 @@ static int live_engine_busy_stats(void *arg)
 ENGINE_TRACE(engine, "measuring busy time\n");
 preempt_disable();
 de = intel_engine_get_busy_time(engine, &t[0]);
-    mdelay(10);
+    mdelay(100);
 de = ktime_sub(intel_engine_get_busy_time(engine, &t[1]), de);
 preempt_enable();
 dt = ktime_sub(t[1], t[0]);
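
A back-of-the-envelope check of why the longer period helps, under the
simplifying assumption that the whole error comes from a fixed register read
latency being added to (or missing from) one sample. The function names are
made up for this sketch and the model is deliberately crude.

```c
#include <stdint.h>

/*
 * Worst-case relative error in tenths of a percent (permille) when a fixed
 * latency is attributed to a sample of the given period.
 */
static uint32_t busy_error_permille(uint32_t latency_us, uint32_t period_us)
{
	return (uint32_t)(((uint64_t)latency_us * 1000) / period_us);
}

/* Non-zero if the error stays within the allowed tolerance. */
static int within_tolerance(uint32_t latency_us, uint32_t period_us,
			    uint32_t tolerance_permille)
{
	return busy_error_permille(latency_us, period_us) <= tolerance_permille;
}
```

With the numbers from the commit message, 1.5ms of latency on a 10ms period
is a 15% error, roughly in line with the 115% upper bound reported, while the
same latency on a 100ms period is 1.5% and fits inside the +/- 5% window.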


Re: [Intel-gfx] [PATCH] mei: add timeout to send

2022-11-04 Thread Tvrtko Ursulin



Hi,

Not really a driver I have looked at before, but since you copied me, some 
comments below.


On 03/11/2022 15:55, Alexander Usyskin wrote:

When the driver wakes up the firmware from the low power state,
it sends a memory ready message.
The send is done via a synchronous/blocking function to ensure
that the firmware is in the ready state. However, the firmware might be
in an unstable state and the send might block forever.
To address this issue a timeout is added to the blocking write command on
the internal bus.


It would be preferable to consistently wrap at 75 as per kernel coding 
style so it looks tidy.
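
The timeout convention described in the commit message above (and documented
in the patch as "0 for infinite timeout") can be sketched in userspace as
follows. Everything here is a toy: struct toy_fw and the toy_*() functions
are invented for the example, the timeout is counted in polls because the
unit is not visible in the quoted hunks, and ETIMEDOUT merely stands in for
whatever error code the driver actually returns.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical firmware model: ready after a given number of polls. */
struct toy_fw {
	unsigned long polls_until_ready;	/* 0 means ready right now */
};

static bool toy_tx_done(struct toy_fw *fw)
{
	if (fw->polls_until_ready == 0)
		return true;
	fw->polls_until_ready--;
	return false;
}

/*
 * Blocking send with an optional timeout. A timeout of 0 keeps the old
 * behaviour and waits for as long as it takes.
 */
static int toy_blocking_send(struct toy_fw *fw, unsigned long timeout)
{
	unsigned long polls = 0;

	while (!toy_tx_done(fw)) {
		polls++;
		if (timeout && polls >= timeout)
			return -ETIMEDOUT;
	}
	return 0;
}
```

Callers that pass 0 behave as before, which is what the unchanged call sites
in the patch appear to rely on, while the memory ready path gets a bounded
wait and can report the failure.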



Signed-off-by: Alexander Usyskin 
---
  drivers/misc/mei/bus-fixup.c | 19 +++
  drivers/misc/mei/bus.c   |  9 +
  drivers/misc/mei/client.c| 21 +
  drivers/misc/mei/client.h|  2 +-
  drivers/misc/mei/main.c  |  2 +-
  drivers/misc/mei/mei_dev.h   |  2 +-
  6 files changed, 36 insertions(+), 19 deletions(-)

diff --git a/drivers/misc/mei/bus-fixup.c b/drivers/misc/mei/bus-fixup.c
index 71fbf0bc8453..3174cad8a5cc 100644
--- a/drivers/misc/mei/bus-fixup.c
+++ b/drivers/misc/mei/bus-fixup.c
@@ -128,7 +128,7 @@ static int mei_osver(struct mei_cl_device *cldev)
os_ver = (struct mei_os_ver *)fwcaps->data;
os_ver->os_type = OSTYPE_LINUX;
  
-	return __mei_cl_send(cldev->cl, buf, size, 0, mode);

+   return __mei_cl_send(cldev->cl, buf, size, 0, mode, 0);
  }
  
  #define MKHI_FWVER_BUF_LEN (sizeof(struct mkhi_msg_hdr) + \

@@ -149,7 +149,7 @@ static int mei_fwver(struct mei_cl_device *cldev)
req.hdr.command = MKHI_GEN_GET_FW_VERSION_CMD;
  
  	ret = __mei_cl_send(cldev->cl, (u8 *)&req, sizeof(req), 0,

-   MEI_CL_IO_TX_BLOCKING);
+   MEI_CL_IO_TX_BLOCKING, 0);
if (ret < 0) {
dev_err(&cldev->dev, "Could not send ReqFWVersion cmd\n");
return ret;
@@ -188,17 +188,19 @@ static int mei_fwver(struct mei_cl_device *cldev)
return ret;
  }
  
+#define GFX_MEMORY_READY_TIMEOUT 200

+
  static int mei_gfx_memory_ready(struct mei_cl_device *cldev)
  {
struct mkhi_gfx_mem_ready req = {0};
-   unsigned int mode = MEI_CL_IO_TX_INTERNAL;
+   unsigned int mode = MEI_CL_IO_TX_INTERNAL | MEI_CL_IO_TX_BLOCKING;
  
  	req.hdr.group_id = MKHI_GROUP_ID_GFX;

req.hdr.command = MKHI_GFX_MEMORY_READY_CMD_REQ;
req.flags = MKHI_GFX_MEM_READY_PXP_ALLOWED;
  
  	dev_dbg(&cldev->dev, "Sending memory ready command\n");

-   return __mei_cl_send(cldev->cl, (u8 *)&req, sizeof(req), 0, mode);
+   return __mei_cl_send(cldev->cl, (u8 *)&req, sizeof(req), 0, mode, 
GFX_MEMORY_READY_TIMEOUT);
  }
  
  static void mei_mkhi_fix(struct mei_cl_device *cldev)

@@ -263,12 +265,13 @@ static void mei_gsc_mkhi_fix_ver(struct mei_cl_device 
*cldev)
  
  	if (cldev->bus->pxp_mode == MEI_DEV_PXP_INIT) {

ret = mei_gfx_memory_ready(cldev);
-   if (ret < 0)
+   if (ret < 0) {
dev_err(&cldev->dev, "memory ready command failed 
%d\n", ret);
-   else
+   } else {
dev_dbg(&cldev->dev, "memory ready command sent\n");
+   cldev->bus->pxp_mode = MEI_DEV_PXP_SETUP;
+   }
/* we go to reset after that */
-   cldev->bus->pxp_mode = MEI_DEV_PXP_SETUP;
goto out;
}
  
@@ -374,7 +377,7 @@ static int mei_nfc_if_version(struct mei_cl *cl,

WARN_ON(mutex_is_locked(&bus->device_lock));
  
  	ret = __mei_cl_send(cl, (u8 *)&cmd, sizeof(cmd), 0,

-   MEI_CL_IO_TX_BLOCKING);
+   MEI_CL_IO_TX_BLOCKING, 0);
if (ret < 0) {
dev_err(bus->dev, "Could not send IF version cmd\n");
return ret;
diff --git a/drivers/misc/mei/bus.c b/drivers/misc/mei/bus.c
index 1fbe127ff633..136b45192904 100644
--- a/drivers/misc/mei/bus.c
+++ b/drivers/misc/mei/bus.c
@@ -29,11 +29,12 @@
   * @length: buffer length
   * @vtag: virtual tag
   * @mode: sending mode
+ * @timeout: send timeout for blocking writes, 0 for infinite timeout
   *
   * Return: written size bytes or < 0 on error
   */
  ssize_t __mei_cl_send(struct mei_cl *cl, const u8 *buf, size_t length, u8 
vtag,
- unsigned int mode)
+ unsigned int mode, unsigned long timeout)
  {
struct mei_device *bus;
struct mei_cl_cb *cb;
@@ -108,7 +109,7 @@ ssize_t __mei_cl_send(struct mei_cl *cl, const u8 *buf, 
size_t length, u8 vtag,
cb->buf.size = 0;
}
  
-	rets = mei_cl_write(cl, cb);

+   rets = mei_cl_write(cl, cb, timeout);
  
  	if (mode & MEI_CL_IO_SGL && rets == 0)

rets = length;
@@ -254,7 +255,7 @@ ssize_t mei_cldev_send_vtag(struct mei_cl_device *cldev, 
const u8 *buf,
  {
struct mei_cl *cl = cldev->cl;
  
-	return __mei_cl_send(cl, buf, length, vtag, MEI_

Re: [Intel-gfx] [PATCH] drm/i915/mtl: Media GT and Render GT share common GGTT

2022-11-04 Thread Iddamsetty, Aravind
Hi Lucas,

> -Original Message-
> From: De Marchi, Lucas 
> Sent: Friday, November 4, 2022 12:36 PM
> To: Iddamsetty, Aravind 
> Cc: intel-gfx@lists.freedesktop.org; dri-de...@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH] drm/i915/mtl: Media GT and Render GT share
> common GGTT
> 
> On Mon, Oct 31, 2022 at 06:01:11PM +0530, Aravind Iddamsetty wrote:
> >On XE_LPM+ platforms the media engines are carved out into a separate
> >GT but have a common GGTMMADR address range which essentially makes
> the
> >GGTT address space to be shared between media and render GT.
> >
> >BSPEC: 63834
> >
> >Cc: Matt Roper 
> >Signed-off-by: Aravind Iddamsetty 
> >---
> > drivers/gpu/drm/i915/gt/intel_ggtt.c  | 49 +++---
> > drivers/gpu/drm/i915/gt/intel_gt.c| 15 +-
> > drivers/gpu/drm/i915/gt/intel_gt_types.h  |  3 ++
> > drivers/gpu/drm/i915/gt/intel_gtt.h   |  3 ++
> > drivers/gpu/drm/i915/i915_driver.c| 19 +--
> > drivers/gpu/drm/i915/i915_gem_evict.c | 63 +--
> > drivers/gpu/drm/i915/i915_vma.c   |  5 +-
> > drivers/gpu/drm/i915/selftests/i915_gem.c |  2 +
> >drivers/gpu/drm/i915/selftests/mock_gtt.c |  1 +
> > 9 files changed, 115 insertions(+), 45 deletions(-)
> >
> >diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c
> >b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> >index 2518cebbf931..f5c2f3c58627 100644
> >--- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
> >+++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> >@@ -196,10 +196,13 @@ void i915_ggtt_suspend_vm(struct
> >i915_address_space *vm)
> >
> > void i915_ggtt_suspend(struct i915_ggtt *ggtt) {
> >+struct intel_gt *gt;
> >+
> > i915_ggtt_suspend_vm(&ggtt->vm);
> > ggtt->invalidate(ggtt);
> >
> >-intel_gt_check_and_clear_faults(ggtt->vm.gt);
> >+list_for_each_entry(gt, &ggtt->gt_list, ggtt_link)
> >+intel_gt_check_and_clear_faults(gt);
> > }
> >
> > void gen6_ggtt_invalidate(struct i915_ggtt *ggtt) @@ -214,27 +217,36
> >@@ void gen6_ggtt_invalidate(struct i915_ggtt *ggtt)
> >
> > static void gen8_ggtt_invalidate(struct i915_ggtt *ggtt)  {
> >-struct intel_uncore *uncore = ggtt->vm.gt->uncore;
> >+struct intel_uncore *uncore;
> >+struct intel_gt *gt;
> >
> >-/*
> >- * Note that as an uncached mmio write, this will flush the
> >- * WCB of the writes into the GGTT before it triggers the invalidate.
> >- */
> >-intel_uncore_write_fw(uncore, GFX_FLSH_CNTL_GEN6,
> GFX_FLSH_CNTL_EN);
> >+list_for_each_entry(gt, &ggtt->gt_list, ggtt_link) {
> >+uncore = gt->uncore;
> >+/*
> >+ * Note that as an uncached mmio write, this will flush the
> >+ * WCB of the writes into the GGTT before it triggers the
> invalidate.
> >+ */
> >+intel_uncore_write_fw(uncore, GFX_FLSH_CNTL_GEN6,
> GFX_FLSH_CNTL_EN);
> >+}
> > }
> >
> > static void guc_ggtt_invalidate(struct i915_ggtt *ggtt)  {
> >-struct intel_uncore *uncore = ggtt->vm.gt->uncore;
> > struct drm_i915_private *i915 = ggtt->vm.i915;
> >
> > gen8_ggtt_invalidate(ggtt);
> >
> >-if (GRAPHICS_VER(i915) >= 12)
> >-intel_uncore_write_fw(uncore, GEN12_GUC_TLB_INV_CR,
> >-  GEN12_GUC_TLB_INV_CR_INVALIDATE);
> >-else
> >-intel_uncore_write_fw(uncore, GEN8_GTCR,
> GEN8_GTCR_INVALIDATE);
> >+if (GRAPHICS_VER(i915) >= 12) {
> >+struct intel_gt *gt;
> >+
> >+list_for_each_entry(gt, &ggtt->gt_list, ggtt_link)
> >+intel_uncore_write_fw(gt->uncore,
> >+  GEN12_GUC_TLB_INV_CR,
> >+
> GEN12_GUC_TLB_INV_CR_INVALIDATE);
> >+} else {
> >+intel_uncore_write_fw(ggtt->vm.gt->uncore,
> >+  GEN8_GTCR, GEN8_GTCR_INVALIDATE);
> >+}
> > }
> >
> > u64 gen8_ggtt_pte_encode(dma_addr_t addr, @@ -986,8 +998,6 @@
> static
> >int gen8_gmch_probe(struct i915_ggtt *ggtt)
> >
> > ggtt->vm.pte_encode = gen8_ggtt_pte_encode;
> >
> >-setup_private_pat(ggtt->vm.gt);
> >-
> > return ggtt_probe_common(ggtt, size);  }
> >
> >@@ -1186,7 +1196,7 @@ static int ggtt_probe_hw(struct i915_ggtt *ggtt,
> struct intel_gt *gt)
> > (u64)ggtt->mappable_end >> 20);
> > drm_dbg(&i915->drm, "DSM size = %lluM\n",
> > (u64)resource_size(&intel_graphics_stolen_res) >> 20);
> >-
> >+INIT_LIST_HEAD(&ggtt->gt_list);
> > return 0;
> > }
> >
> >@@ -1296,9 +1306,11 @@ bool i915_ggtt_resume_vm(struct
> >i915_address_space *vm)
> >
> > void i915_ggtt_resume(struct i915_ggtt *ggtt) {
> >+struct intel_gt *gt;
> > bool flush;
> >
> >-intel_gt_check_and_clear_faults(ggtt->vm.gt);
> >+list_for_each_entry(gt, &ggtt->gt_list, ggtt_link)
> >+intel_gt_check_and_clear_faults(gt);
> >
> > flush = i915_ggtt_resume_vm(&ggtt->vm);
> >
> >@@ -1307,9 +1319,6 @@ void i915_ggtt_resume(struct i915_ggtt *ggtt)
> > if (

Re: [Intel-gfx] KUnit issues - Was: [igt-dev] [PATCH RFC v2 8/8] drm/i915: check if current->mm is not NULL

2022-11-04 Thread Mauro Carvalho Chehab
On Thu, 3 Nov 2022 15:43:26 -0700
Daniel Latypov  wrote:

> On Thu, Nov 3, 2022 at 8:23 AM Mauro Carvalho Chehab
>  wrote:
> >
> > Hi,
> >
> > I'm facing a couple of issues when testing KUnit with the i915 driver.
> >
> > The DRM subsystem and the i915 driver have, for a long time, had their own
> > way to do unit tests, which seems to have been added before KUnit.
> >
> > I'm now checking if it is worth starting to use KUnit in i915. So, I wrote
> > an RFC with some patches adding support for the tests we have to be
> > reported using Kernel TAP and KUnit.
> >
> > There are basically 3 groups of tests there:
> >
> > - mock tests - check i915 hardware-independent logic;
> > - live tests - run some hardware-specific tests;
> > - perf tests - check perf support - also hardware-dependent.
> >
> > As they depend on i915 driver, they run only on x86, with PCI
> > stack enabled, but the mock tests run nicely via qemu.
> >
> > The live and perf tests require real hardware. As we run them
> > together with our CI, which, among other things, test module
> > unload/reload and test loading i915 driver with different
> > modprobe parameters, the KUnit tests should be able to run as
> > a module.
> >
> > While testing KUnit, I noticed a couple of issues:
> >
> > 1. kunit.py parser is currently broken when used with modules
> >
> > the parser expects "TAP version xx" output, but this won't
> > happen when loading the kunit test driver.
> >
> > Are there any plans or patches fixing this issue?  
> 
> Partially.
> Note: we need a header to look for so we can strip prefixes (like timestamps).
> 
> But there is a patch in the works to add a TAP header for each
> subtest, hopefully in time for 6.2.

Good to know.

> This is to match the KTAP spec:
> https://kernel.org/doc/html/latest/dev-tools/ktap.html

I see.

> That should fix it so you can parse one suite's results at a time.
> I'm pretty sure it won't fix the case where there's multiple suites
> and/or you're trying to parse all test results at once via
> 
> $ find /sys/kernel/debug/kunit/ -type f | xargs cat |
> ./tools/testing/kunit/kunit.py parse

Could you point me to the changeset? Perhaps I can write a followup
patch addressing this case.

> I think that in-kernel code change + some more python changes could
> make the above command work, but no one has actively started looking
> at that just yet.
> Hopefully we can pick this up and also get it done for 6.2 (unless I'm
> underestimating how complicated this is).
> 
> >
> > 2. current->mm is not initialized
> >
> > Some tests do mmap(). They need the mm user context to be initialized,
> > but this is not happening right now.
> >
> > Is there a way to properly initialize it for KUnit?
> 
> Right, this is a consequence of how early built-in KUnit tests are run
> after boot.
> I think for now, the answer is to make the test module-only.
> 
> I know David had some ideas here, but I can't speak to them.

This is happening when test-i915 is built as a module as well.

I suspect that the function which initializes it is mm_alloc() inside 
kernel/fork.c:

struct mm_struct *mm_alloc(void)
{
struct mm_struct *mm;

mm = allocate_mm();
if (!mm)
return NULL;

memset(mm, 0, sizeof(*mm));
return mm_init(mm, current, current_user_ns());
}

As modprobing a test won't fork until all tests run, this never runs.

It seems that the normal usage is at fs/exec.c:

fs/exec.c:  bprm->mm = mm = mm_alloc();

but other places also call it:

arch/arm/mach-rpc/ecard.c:  struct mm_struct * mm = mm_alloc();
drivers/dma-buf/dma-resv.c: struct mm_struct *mm = mm_alloc();
include/linux/sched/mm.h:extern struct mm_struct *mm_alloc(void);
mm/debug_vm_pgtable.c:  args->mm = mm_alloc();

Probably the solution would be to call it inside kunit executor code,
adding support for modules to use it.

> > 3. there are no test filters for modules
> >
> > In order to be able to do proper CI automation, it is necessary to
> > be able to control which tests will run. That's especially
> > interesting at development time, where some tests may not apply
> > or may not run properly on new hardware.
> >
> > Are there any plans to add support for it at kunit_test_suites()
> > when the driver is built as module? Ideally, the best would be to
> > export a per-module filter_glob parameter on such cases.  
> 
> I think this is a good idea and is doable. (I think I said as much on
> the other thread).
> 
> The thinking before was that people would group tests together in
> modules.
> But if you want to share a single module for many tests, this becomes
> more useful.
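
To illustrate what a per-module filter_glob could look like from the test
side, here is a minimal userspace sketch of glob based test selection. It is
an assumption-laden toy: it uses POSIX fnmatch() as a stand-in for the
kernel's own glob matcher, matches the whole "suite.test" string in one go
instead of splitting it, and the suite and test names used with it are
hypothetical.

```c
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Decide whether a test identified as "suite.test" should run.
 * A missing or empty filter selects everything.
 */
static bool test_selected(const char *filter_glob, const char *full_name)
{
	if (filter_glob == NULL || *filter_glob == '\0')
		return true;

	/* Flags of 0: '*' may also match the '.' between suite and test. */
	return fnmatch(filter_glob, full_name, 0) == 0;
}
```

With a single module carrying all tests, a filter such as "i915_mock.*" would
then be enough to run only the hardware independent group.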

At least for this RFC, I opted to place everything we have already on
a single module. 

Perhaps I could create, instead, 3 separate modules. This way, I would gain
a "third level" and a poor man's way of filtering what test type
will run (mock, li

[Intel-gfx] ✗ Fi.CI.BUILD: failure for drm/i915/mtl: Media GT and Render GT share common GGTT (rev2)

2022-11-04 Thread Patchwork
== Series Details ==

Series: drm/i915/mtl: Media GT and Render GT share common GGTT (rev2)
URL   : https://patchwork.freedesktop.org/series/110321/
State : failure

== Summary ==

Error: patch 
https://patchwork.freedesktop.org/api/1.0/series/110321/revisions/2/mbox/ not 
applied
Applying: drm/i915/mtl: Media GT and Render GT share common GGTT
error: sha1 information is lacking or useless 
(drivers/gpu/drm/i915/gt/intel_ggtt.c).
error: could not build fake ancestor
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 drm/i915/mtl: Media GT and Render GT share common GGTT
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".




Re: [Intel-gfx] [PATCH] drm/i915/mtl: Media GT and Render GT share common GGTT

2022-11-04 Thread Lucas De Marchi

On Mon, Oct 31, 2022 at 06:01:11PM +0530, Aravind Iddamsetty wrote:

On XE_LPM+ platforms the media engines are carved out into a separate
GT but have a common GGTMMADR address range which essentially makes
the GGTT address space to be shared between media and render GT.
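
The recurring change in the diff below is "walk every GT attached to this
GGTT" instead of "use the one GT in ggtt->vm.gt". A self-contained userspace
model of that shape is sketched here, with a hand rolled intrusive list
instead of the kernel's list.h. The *_model types and ggtt_*() helpers are
invented for illustration; only the field names gt_list and ggtt_link are
borrowed from the patch.

```c
#include <stddef.h>

/* Minimal circular singly linked list node, embedded in each GT. */
struct list_node {
	struct list_node *next;
};

struct gt_model {
	int id;
	int faults_cleared;
	struct list_node ggtt_link;	/* links this GT into ggtt->gt_list */
};

struct ggtt_model {
	struct list_node gt_list;	/* list head, points to itself when empty */
};

/* Recover the containing gt_model from its embedded list node. */
#define node_to_gt(ptr) \
	((struct gt_model *)((char *)(ptr) - offsetof(struct gt_model, ggtt_link)))

static void ggtt_init(struct ggtt_model *ggtt)
{
	ggtt->gt_list.next = &ggtt->gt_list;
}

static void ggtt_add_gt(struct ggtt_model *ggtt, struct gt_model *gt)
{
	gt->ggtt_link.next = ggtt->gt_list.next;
	ggtt->gt_list.next = &gt->ggtt_link;
}

/* Counterpart of the list_for_each_entry() loops: visit every sharing GT. */
static int ggtt_clear_faults_all(struct ggtt_model *ggtt)
{
	struct list_node *pos;
	int visited = 0;

	for (pos = ggtt->gt_list.next; pos != &ggtt->gt_list; pos = pos->next) {
		node_to_gt(pos)->faults_cleared = 1;
		visited++;
	}
	return visited;
}
```

With a render GT and a media GT both added to the same list, a single
suspend or resume call on the shared GGTT reaches both, which is the
behaviour the commit message asks for.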

BSPEC: 63834

Cc: Matt Roper 
Signed-off-by: Aravind Iddamsetty 
---
drivers/gpu/drm/i915/gt/intel_ggtt.c  | 49 +++---
drivers/gpu/drm/i915/gt/intel_gt.c| 15 +-
drivers/gpu/drm/i915/gt/intel_gt_types.h  |  3 ++
drivers/gpu/drm/i915/gt/intel_gtt.h   |  3 ++
drivers/gpu/drm/i915/i915_driver.c| 19 +--
drivers/gpu/drm/i915/i915_gem_evict.c | 63 +--
drivers/gpu/drm/i915/i915_vma.c   |  5 +-
drivers/gpu/drm/i915/selftests/i915_gem.c |  2 +
drivers/gpu/drm/i915/selftests/mock_gtt.c |  1 +
9 files changed, 115 insertions(+), 45 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c 
b/drivers/gpu/drm/i915/gt/intel_ggtt.c
index 2518cebbf931..f5c2f3c58627 100644
--- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
@@ -196,10 +196,13 @@ void i915_ggtt_suspend_vm(struct i915_address_space *vm)

void i915_ggtt_suspend(struct i915_ggtt *ggtt)
{
+   struct intel_gt *gt;
+
i915_ggtt_suspend_vm(&ggtt->vm);
ggtt->invalidate(ggtt);

-   intel_gt_check_and_clear_faults(ggtt->vm.gt);
+   list_for_each_entry(gt, &ggtt->gt_list, ggtt_link)
+   intel_gt_check_and_clear_faults(gt);
}

void gen6_ggtt_invalidate(struct i915_ggtt *ggtt)
@@ -214,27 +217,36 @@ void gen6_ggtt_invalidate(struct i915_ggtt *ggtt)

static void gen8_ggtt_invalidate(struct i915_ggtt *ggtt)
{
-   struct intel_uncore *uncore = ggtt->vm.gt->uncore;
+   struct intel_uncore *uncore;
+   struct intel_gt *gt;

-   /*
-* Note that as an uncached mmio write, this will flush the
-* WCB of the writes into the GGTT before it triggers the invalidate.
-*/
-   intel_uncore_write_fw(uncore, GFX_FLSH_CNTL_GEN6, GFX_FLSH_CNTL_EN);
+   list_for_each_entry(gt, &ggtt->gt_list, ggtt_link) {
+   uncore = gt->uncore;
+   /*
+* Note that as an uncached mmio write, this will flush the
+* WCB of the writes into the GGTT before it triggers the 
invalidate.
+*/
+   intel_uncore_write_fw(uncore, GFX_FLSH_CNTL_GEN6, 
GFX_FLSH_CNTL_EN);
+   }
}

static void guc_ggtt_invalidate(struct i915_ggtt *ggtt)
{
-   struct intel_uncore *uncore = ggtt->vm.gt->uncore;
struct drm_i915_private *i915 = ggtt->vm.i915;

gen8_ggtt_invalidate(ggtt);

-   if (GRAPHICS_VER(i915) >= 12)
-   intel_uncore_write_fw(uncore, GEN12_GUC_TLB_INV_CR,
- GEN12_GUC_TLB_INV_CR_INVALIDATE);
-   else
-   intel_uncore_write_fw(uncore, GEN8_GTCR, GEN8_GTCR_INVALIDATE);
+   if (GRAPHICS_VER(i915) >= 12) {
+   struct intel_gt *gt;
+
+   list_for_each_entry(gt, &ggtt->gt_list, ggtt_link)
+   intel_uncore_write_fw(gt->uncore,
+ GEN12_GUC_TLB_INV_CR,
+ GEN12_GUC_TLB_INV_CR_INVALIDATE);
+   } else {
+   intel_uncore_write_fw(ggtt->vm.gt->uncore,
+ GEN8_GTCR, GEN8_GTCR_INVALIDATE);
+   }
}

u64 gen8_ggtt_pte_encode(dma_addr_t addr,
@@ -986,8 +998,6 @@ static int gen8_gmch_probe(struct i915_ggtt *ggtt)

ggtt->vm.pte_encode = gen8_ggtt_pte_encode;

-   setup_private_pat(ggtt->vm.gt);
-
return ggtt_probe_common(ggtt, size);
}

@@ -1186,7 +1196,7 @@ static int ggtt_probe_hw(struct i915_ggtt *ggtt, struct 
intel_gt *gt)
(u64)ggtt->mappable_end >> 20);
drm_dbg(&i915->drm, "DSM size = %lluM\n",
(u64)resource_size(&intel_graphics_stolen_res) >> 20);
-
+   INIT_LIST_HEAD(&ggtt->gt_list);
return 0;
}

@@ -1296,9 +1306,11 @@ bool i915_ggtt_resume_vm(struct i915_address_space *vm)

void i915_ggtt_resume(struct i915_ggtt *ggtt)
{
+   struct intel_gt *gt;
bool flush;

-   intel_gt_check_and_clear_faults(ggtt->vm.gt);
+   list_for_each_entry(gt, &ggtt->gt_list, ggtt_link)
+   intel_gt_check_and_clear_faults(gt);

flush = i915_ggtt_resume_vm(&ggtt->vm);

@@ -1307,9 +1319,6 @@ void i915_ggtt_resume(struct i915_ggtt *ggtt)
if (flush)
wbinvd_on_all_cpus();

-   if (GRAPHICS_VER(ggtt->vm.i915) >= 8)
-   setup_private_pat(ggtt->vm.gt);
-
intel_ggtt_restore_fences(ggtt);
}

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c 
b/drivers/gpu/drm/i915/gt/intel_gt.c
index 2e796ffad911..d72efb74563a 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -110,9 +110,17 @@ static int intel_gt_probe_lmem(struct intel_gt *gt)

int intel_gt_assig
