[Intel-gfx] [PATCH] i915/i915_gem_context: Remove debug message in i915_gem_context_create_ioctl

2022-10-25 Thread Karolina Drobnik
We know that as long as the GEM context create ioctl succeeds, a context was
created. There is no need to log it, especially when such a message
heavily pollutes dmesg and makes debugging actual errors harder.
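
For illustration, a minimal userspace-side sketch (uAPI names as in
include/uapi/drm/i915_drm.h; the open i915 DRM fd is an assumption) showing
that creation is already reported to the caller without any dmesg line:

#include <stdio.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

/* fd is assumed to be an open i915 render-node file descriptor */
static void create_context_example(int fd)
{
	struct drm_i915_gem_context_create create = {};

	/*
	 * On success the new context id is returned in create.ctx_id, so the
	 * kernel-side debug message carried no additional information.
	 */
	if (ioctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_CREATE, &create) == 0)
		printf("HW context %u created\n", create.ctx_id);
}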

Suggested-by: Chris Wilson 
Signed-off-by: Karolina Drobnik 
Cc: Andi Shyti 
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 1e29b1e6d186..1456ca87c04e 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -2298,7 +2298,6 @@ int i915_gem_context_create_ioctl(struct drm_device *dev, 
void *data,
}
 
args->ctx_id = id;
-   drm_dbg(&i915->drm, "HW context %d created\n", args->ctx_id);
 
return 0;
 
-- 
2.25.1



Re: [Intel-gfx] [topic/core-for-CI] Revert "iommu/dma: Fix race condition during iova_domain initialization"

2022-09-19 Thread Karolina Drobnik

On 16.09.2022 22:32, Lucas De Marchi wrote:

On Fri, Sep 16, 2022 at 02:24:00PM +0200, Karolina Drobnik wrote:

On 14.09.2022 17:54, Robin Murphy wrote:

On 2022-09-14 16:01, Lucas De Marchi wrote:

On Wed, Sep 14, 2022 at 02:40:45PM +0200, Karolina Drobnik wrote:

This reverts commit ac9a5d522bb80be50ea84965699e1c8257d745ce.

This change introduces a regression on Alder Lake that
completely blocks testing. To enable CI and avoid possible
circular locking warning, revert the patch.


We are already on rc5. Are iommu authors involved aware of this 
issue? We could do this in our "for CI only" branch, but it's 
equally important that this is fixed for 6.0


Cc'ing them.


The lockdep report doesn't make much sense to me - the deadlock cycle
it's reporting doesn't even involve the mutex added by that commit,
and otherwise the lock ordering between the IOMMU bus notifier(s) and
cpu_hotplug_lock has existed for ages. Has lockdep somehow got
multiple different and unrelated bus notifiers mixed up, maybe?

FWIW nobody else has reported anything, and that mutex addresses a 
real-world concurrency issue, so I'm not convinced a revert is 
appropriate without at least a much clearer justification.


I'll share more background on this regression. We've noticed that no
tests were run for Alder Lake platforms. This may happen when, for
example, there is a kernel taint or lockdep warning.

Links:
https://intel-gfx-ci.01.org/tree/drm-tip/bat-adlm-1.html
https://intel-gfx-ci.01.org/tree/drm-tip/bat-adlp-6.html

The CI logs (which can be found for example here[1], boot0 file)
revealed a lockdep warning. One of the recent changes in the area was
commit ac9a5d522bb8 ("iommu/dma: Fix race condition during iova_domain
initialization"), and I sent a revert patch to test it on CI[2]. This
proved to be effective, as the tests started running on Alder Lake
platform:
https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_108474v1/index.html?hosts=adlp 



To be clear, that revert is just a way of unblocking CI testing, the
problem requires a specific fix.

Lucas, would it be possible to merge this revert to the topic branch to
unblock Alder Lake until this issue is fixed? I'm afraid that some
regressions could slip through the cracks if we don't do it soon enough.


Yeah. Let's have CI running with the revert so we can see if on the next runs
it will really show it was a regression or if it's something else. I
think it will help us understand why it's failing.


Thanks for the merge. It looks like all adls are doing better now
(revert went in CI_DRM_12147):
https://intel-gfx-ci.01.org/tree/drm-tip/bat-adlp-6.html
https://intel-gfx-ci.01.org/tree/drm-tip/bat-adln-1.html
https://intel-gfx-ci.01.org/tree/drm-tip/bat-adlm-1.html (CI_DRM_12149 
seems to show a different problem)


All the best,
Karolina


Lucas De Marchi




Thanks,
Karolina


[1] -
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12145/bat-adlm-1/igt@run...@aborted.html 


[2] - https://patchwork.freedesktop.org/series/108474/


Robin.


thanks Lucas De Marchi



kernel log:

==
WARNING: possible circular locking dependency detected
6.0.0-rc5-CI_DRM_12132-g6c93e979e542+ #1 Not tainted
--
cpuhp/0/15 is trying to acquire lock:
8881013df278 (&(>bus_notifier)->rwsem){}-{3:3}, at: blocking_notifier_call_chain+0x20/0x50
 but task is already holding lock:
826490c0 (cpuhp_state-up){+.+.}-{0:0}, at: cpuhp_thread_fun+0x48/0x1f0
 which lock already depends on the new loc
 the existing dependency chain (in reverse order) is:
 -> #3 (cpuhp_state-up){+.+.}-{0:0}:
  lock_acquire+0xd3/0x310
  cpuhp_thread_fun+0xa6/0x1f0
  smpboot_thread_fn+0x1b5/0x260
  kthread+0xed/0x120
  ret_from_fork+0x1f/0x30
 -> #2 (cpu_hotplug_lock){}-{0:0}:
  lock_acquire+0xd3/0x310
  __cpuhp_state_add_instance+0x43/0x1c0
  iova_domain_init_rcaches+0x199/0x1c0
  iommu_setup_dma_ops+0x130/0x440
  bus_iommu_probe+0x26a/0x2d0
  bus_set_iommu+0x82/0xd0
  intel_iommu_init+0xe33/0x1039
  pci_iommu_init+0x9/0x31
  do_one_initcall+0x53/0x2f0
  kernel_init_freeable+0x18f/0x1e1
  kernel_init+0x11/0x120
  ret_from_fork+0x1f/0x30
 -> #1 (>iova_cookie->mutex){+.+.}-{3:3}:
  lock_acquire+0xd3/0x310
  __mutex_lock+0x97/0xf10
  iommu_setup_dma_ops+0xd7/0x440
  iommu_probe_device+0xa4/0x180
  iommu_bus_notifier+0x2d/0x40
  notifier_call_chain+0x31/0x90
  blocking_notifier_call_chain+0x3a/0x50
  device_add+0x3c1/0x900
  pci_device_add+0x255/0x580
  pci_scan_single_device+0xa6/0xd0
  pci_scan_slot+0x7a/0x1b0
  pci_scan_child_bus_extend+0x35/0x2a0
  vmd_probe+0x5cd/0x970
  pci_device_probe+0x95/0x110
  really_probe+0xd6/0x350
  __driver_probe_device+0x73/0x170
  driver_probe_device+0x1a/0x90
  __driver_attach+0xbc/0x190
  bus_for_each_dev+0x72/0xc0
  bus_add_driver+0x1bb/0x210
  driver_register+0x66/0xc0
  do_one_initcall+0x53/0x2f0
  kernel_init_freeable+0x18f/0x1e1
  kernel_init+0x11/0x120
  ret_from_fork+0x1f/0x3

Re: [Intel-gfx] [topic/core-for-CI] Revert "iommu/dma: Fix race condition during iova_domain initialization"

2022-09-16 Thread Karolina Drobnik

On 14.09.2022 17:54, Robin Murphy wrote:

On 2022-09-14 16:01, Lucas De Marchi wrote:

On Wed, Sep 14, 2022 at 02:40:45PM +0200, Karolina Drobnik wrote:

This reverts commit ac9a5d522bb80be50ea84965699e1c8257d745ce.

This change introduces a regression on Alder Lake that
completely blocks testing. To enable CI and avoid possible
circular locking warning, revert the patch.


We are already on rc5. Are iommu authors involved aware of this 
issue? We could do this in our "for CI only" branch, but it's 
equally important that this is fixed for 6.0


Cc'ing them.


The lockdep report doesn't make much sense to me - the deadlock cycle
it's reporting doesn't even involve the mutex added by that commit,
and otherwise the lock ordering between the IOMMU bus notifier(s) and
cpu_hotplug_lock has existed for ages. Has lockdep somehow got
multiple different and unrelated bus notifiers mixed up, maybe?

FWIW nobody else has reported anything, and that mutex addresses a 
real-world concurrency issue, so I'm not convinced a revert is 
appropriate without at least a much clearer justification.


I'll share more background on this regression. We've noticed that no
tests were run for Alder Lake platforms. This may happen when, for
example, there is a kernel taint or lockdep warning.

Links:
https://intel-gfx-ci.01.org/tree/drm-tip/bat-adlm-1.html
https://intel-gfx-ci.01.org/tree/drm-tip/bat-adlp-6.html

The CI logs (which can be found for example here[1], boot0 file)
revealed a lockdep warning. One of the recent changes in the area was
commit ac9a5d522bb8 ("iommu/dma: Fix race condition during iova_domain
initialization"), and I sent a revert patch to test it on CI[2]. This
proved to be effective, as the tests started running on Alder Lake
platform:
https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_108474v1/index.html?hosts=adlp

To be clear, that revert is just a way of unblocking CI testing, the
problem requires a specific fix.

Lucas, would it be possible to merge this revert to the topic branch to
unblock Alder Lake until this issue is fixed? I'm afraid that some
regressions could slip through the cracks if we don't do it soon enough.

Thanks,
Karolina


[1] -
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12145/bat-adlm-1/igt@run...@aborted.html
[2] - https://patchwork.freedesktop.org/series/108474/


Robin.


thanks Lucas De Marchi



kernel log:

==
WARNING: possible circular locking dependency detected
6.0.0-rc5-CI_DRM_12132-g6c93e979e542+ #1 Not tainted
--
cpuhp/0/15 is trying to acquire lock:
8881013df278 (&(>bus_notifier)->rwsem){}-{3:3}, at: blocking_notifier_call_chain+0x20/0x50
 but task is already holding lock:
826490c0 (cpuhp_state-up){+.+.}-{0:0}, at: cpuhp_thread_fun+0x48/0x1f0
 which lock already depends on the new loc
 the existing dependency chain (in reverse order) is:
 -> #3 (cpuhp_state-up){+.+.}-{0:0}:
  lock_acquire+0xd3/0x310
  cpuhp_thread_fun+0xa6/0x1f0
  smpboot_thread_fn+0x1b5/0x260
  kthread+0xed/0x120
  ret_from_fork+0x1f/0x30
 -> #2 (cpu_hotplug_lock){}-{0:0}:
  lock_acquire+0xd3/0x310
  __cpuhp_state_add_instance+0x43/0x1c0
  iova_domain_init_rcaches+0x199/0x1c0
  iommu_setup_dma_ops+0x130/0x440
  bus_iommu_probe+0x26a/0x2d0
  bus_set_iommu+0x82/0xd0
  intel_iommu_init+0xe33/0x1039
  pci_iommu_init+0x9/0x31
  do_one_initcall+0x53/0x2f0
  kernel_init_freeable+0x18f/0x1e1
  kernel_init+0x11/0x120
  ret_from_fork+0x1f/0x30
 -> #1 (>iova_cookie->mutex){+.+.}-{3:3}:
  lock_acquire+0xd3/0x310
  __mutex_lock+0x97/0xf10
  iommu_setup_dma_ops+0xd7/0x440
  iommu_probe_device+0xa4/0x180
  iommu_bus_notifier+0x2d/0x40
  notifier_call_chain+0x31/0x90
  blocking_notifier_call_chain+0x3a/0x50
  device_add+0x3c1/0x900
  pci_device_add+0x255/0x580
  pci_scan_single_device+0xa6/0xd0
  pci_scan_slot+0x7a/0x1b0
  pci_scan_child_bus_extend+0x35/0x2a0
  vmd_probe+0x5cd/0x970
  pci_device_probe+0x95/0x110
  really_probe+0xd6/0x350
  __driver_probe_device+0x73/0x170
  driver_probe_device+0x1a/0x90
  __driver_attach+0xbc/0x190
  bus_for_each_dev+0x72/0xc0
  bus_add_driver+0x1bb/0x210
  driver_register+0x66/0xc0
  do_one_initcall+0x53/0x2f0
  kernel_init_freeable+0x18f/0x1e1
  kernel_init+0x11/0x120
  ret_from_fork+0x1f/0x30
 -> #0 (&(>bus_notifier)->rwsem){}-{3:3}:
  validate_chain+0xb3f/0x2000
  __lock_acquire+0x5a4/0xb70
  lock_acquire+0xd3/0x310
  down_read+0x39/0x140
  blocking_notifier_call_chain+0x20/0x50
  device_add+0x3c1/0x900
  platform_device_add+0x108/0x240
  coretemp_cpu_online+0xe1/0x15e [coretemp]
  cpuhp_invoke_callback+0x181/0x8a0
  cpuhp_thread_fun+0x188/0x1f0
  smpboot_thread_fn+0x1b5/0x260
  kthread+0xed/0x120
  ret_from_fork+0x1f/0x30
 other info that might help us debug thi
Chain exists of &(>bus_notifier)->rwsem --> cpu_hotplug_lock --> cpuhp_state-
 Possible unsafe locking scenari
  CPU0    CPU1
  
 lock(cpuhp_state-up);
  lock(cpu_hotplug_

Re: [Intel-gfx] [topic/core-for-CI] Revert "iommu/dma: Fix race condition during iova_domain initialization"

2022-09-14 Thread Karolina Drobnik

On 14.09.2022 17:01, Lucas De Marchi wrote:

On Wed, Sep 14, 2022 at 02:40:45PM +0200, Karolina Drobnik wrote:

This reverts commit ac9a5d522bb80be50ea84965699e1c8257d745ce.

This change introduces a regression on Alder Lake that completely
blocks testing. To enable CI and avoid possible circular locking
warning, revert the patch.


We are already on rc5. Are iommu authors involved aware of this issue?
We could do this in our "for CI only" branch, but it's equally important
that this is fixed for 6.0


I planned to reach out to them after merging this revert on "CI only" 
branch (hence the topic tag) with more justification. And yes, I'm fully 
aware we're quite late in the cycle, so that's also why I went with this 
patch first.


Many thanks,
Karolina


Cc'ing them.

thanks
Lucas De Marchi



kernel log:

==
WARNING: possible circular locking dependency detected
6.0.0-rc5-CI_DRM_12132-g6c93e979e542+ #1 Not tainted
--
cpuhp/0/15 is trying to acquire lock:
8881013df278 (&(>bus_notifier)->rwsem){}-{3:3}, at: 
blocking_notifier_call_chain+0x20/0x50

 but task is already holding lock:
826490c0 (cpuhp_state-up){+.+.}-{0:0}, at: 
cpuhp_thread_fun+0x48/0x1f0

 which lock already depends on the new loc
 the existing dependency chain (in reverse order) is:
 -> #3 (cpuhp_state-up){+.+.}-{0:0}:
  lock_acquire+0xd3/0x310
  cpuhp_thread_fun+0xa6/0x1f0
  smpboot_thread_fn+0x1b5/0x260
  kthread+0xed/0x120
  ret_from_fork+0x1f/0x30
 -> #2 (cpu_hotplug_lock){}-{0:0}:
  lock_acquire+0xd3/0x310
  __cpuhp_state_add_instance+0x43/0x1c0
  iova_domain_init_rcaches+0x199/0x1c0
  iommu_setup_dma_ops+0x130/0x440
  bus_iommu_probe+0x26a/0x2d0
  bus_set_iommu+0x82/0xd0
  intel_iommu_init+0xe33/0x1039
  pci_iommu_init+0x9/0x31
  do_one_initcall+0x53/0x2f0
  kernel_init_freeable+0x18f/0x1e1
  kernel_init+0x11/0x120
  ret_from_fork+0x1f/0x30
 -> #1 (>iova_cookie->mutex){+.+.}-{3:3}:
  lock_acquire+0xd3/0x310
  __mutex_lock+0x97/0xf10
  iommu_setup_dma_ops+0xd7/0x440
  iommu_probe_device+0xa4/0x180
  iommu_bus_notifier+0x2d/0x40
  notifier_call_chain+0x31/0x90
  blocking_notifier_call_chain+0x3a/0x50
  device_add+0x3c1/0x900
  pci_device_add+0x255/0x580
  pci_scan_single_device+0xa6/0xd0
  pci_scan_slot+0x7a/0x1b0
  pci_scan_child_bus_extend+0x35/0x2a0
  vmd_probe+0x5cd/0x970
  pci_device_probe+0x95/0x110
  really_probe+0xd6/0x350
  __driver_probe_device+0x73/0x170
  driver_probe_device+0x1a/0x90
  __driver_attach+0xbc/0x190
  bus_for_each_dev+0x72/0xc0
  bus_add_driver+0x1bb/0x210
  driver_register+0x66/0xc0
  do_one_initcall+0x53/0x2f0
  kernel_init_freeable+0x18f/0x1e1
  kernel_init+0x11/0x120
  ret_from_fork+0x1f/0x30
 -> #0 (&(>bus_notifier)->rwsem){}-{3:3}:
  validate_chain+0xb3f/0x2000
  __lock_acquire+0x5a4/0xb70
  lock_acquire+0xd3/0x310
  down_read+0x39/0x140
  blocking_notifier_call_chain+0x20/0x50
  device_add+0x3c1/0x900
  platform_device_add+0x108/0x240
  coretemp_cpu_online+0xe1/0x15e [coretemp]
  cpuhp_invoke_callback+0x181/0x8a0
  cpuhp_thread_fun+0x188/0x1f0
  smpboot_thread_fn+0x1b5/0x260
  kthread+0xed/0x120
  ret_from_fork+0x1f/0x30
 other info that might help us debug thi
Chain exists of &(>bus_notifier)->rwsem --> 
cpu_hotplug_lock --> cpuhp_state-

Possible unsafe locking scenari
  CPU0    CPU1
      
 lock(cpuhp_state-up);
  lock(cpu_hotplug_lock);
  lock(cpuhp_state-up);
 lock(&(>bus_notifier)->rwsem);
  *** DEADLOCK *
2 locks held by cpuhp/0/15:
#0: 82648f10 (cpu_hotplug_lock){}-{0:0}, at: 
cpuhp_thread_fun+0x48/0x1f0
#1: 826490c0 (cpuhp_state-up){+.+.}-{0:0}, at: 
cpuhp_thread_fun+0x48/0x1f0

 stack backtrace:
CPU: 0 PID: 15 Comm: cpuhp/0 Not tainted 
6.0.0-rc5-CI_DRM_12132-g6c93e979e542+ #1
Hardware name: Intel Corporation Alder Lake Client 
Platform/AlderLake-P DDR4 RVP, BIOS ADLPFWI1.R00.3135.A00.2203251419 
03/25/2022

Call Trace:

dump_stack_lvl+0x56/0x7f
check_noncircular+0x132/0x150
validate_chain+0xb3f/0x2000
__lock_acquire+0x5a4/0xb70
lock_acquire+0xd3/0x310
? blocking_notifier_call_chain+0x20/0x50
down_read+0x39/0x140
? blocking_notifier_call_chain+0x20/0x50
blocking_notifier_call_chain+0x20/0x50
device_add+0x3c1/0x900
? dev_set_name+0x4e/0x70
platform_device_add+0x108/0x240
coretemp_cpu_online+0xe1/0x15e [coretemp]
? create_core_data+0x550/0x550 [coretemp]
cpuhp_invoke_callback+0x181/0x8a0
cpuhp_thread_fun+0x188/0x1f0

[Intel-gfx] [topic/core-for-CI] Revert "iommu/dma: Fix race condition during iova_domain initialization"

2022-09-14 Thread Karolina Drobnik
This reverts commit ac9a5d522bb80be50ea84965699e1c8257d745ce.

This change introduces a regression on Alder Lake that completely
blocks testing. To enable CI and avoid possible circular locking
warning, revert the patch.

kernel log:

==
WARNING: possible circular locking dependency detected
6.0.0-rc5-CI_DRM_12132-g6c93e979e542+ #1 Not tainted
--
cpuhp/0/15 is trying to acquire lock:
8881013df278 (&(>bus_notifier)->rwsem){}-{3:3}, at: 
blocking_notifier_call_chain+0x20/0x50
  but task is already holding lock:
826490c0 (cpuhp_state-up){+.+.}-{0:0}, at: cpuhp_thread_fun+0x48/0x1f0
  which lock already depends on the new loc
  the existing dependency chain (in reverse order) is:
  -> #3 (cpuhp_state-up){+.+.}-{0:0}:
   lock_acquire+0xd3/0x310
   cpuhp_thread_fun+0xa6/0x1f0
   smpboot_thread_fn+0x1b5/0x260
   kthread+0xed/0x120
   ret_from_fork+0x1f/0x30
  -> #2 (cpu_hotplug_lock){}-{0:0}:
   lock_acquire+0xd3/0x310
   __cpuhp_state_add_instance+0x43/0x1c0
   iova_domain_init_rcaches+0x199/0x1c0
   iommu_setup_dma_ops+0x130/0x440
   bus_iommu_probe+0x26a/0x2d0
   bus_set_iommu+0x82/0xd0
   intel_iommu_init+0xe33/0x1039
   pci_iommu_init+0x9/0x31
   do_one_initcall+0x53/0x2f0
   kernel_init_freeable+0x18f/0x1e1
   kernel_init+0x11/0x120
   ret_from_fork+0x1f/0x30
  -> #1 (>iova_cookie->mutex){+.+.}-{3:3}:
   lock_acquire+0xd3/0x310
   __mutex_lock+0x97/0xf10
   iommu_setup_dma_ops+0xd7/0x440
   iommu_probe_device+0xa4/0x180
   iommu_bus_notifier+0x2d/0x40
   notifier_call_chain+0x31/0x90
   blocking_notifier_call_chain+0x3a/0x50
   device_add+0x3c1/0x900
   pci_device_add+0x255/0x580
   pci_scan_single_device+0xa6/0xd0
   pci_scan_slot+0x7a/0x1b0
   pci_scan_child_bus_extend+0x35/0x2a0
   vmd_probe+0x5cd/0x970
   pci_device_probe+0x95/0x110
   really_probe+0xd6/0x350
   __driver_probe_device+0x73/0x170
   driver_probe_device+0x1a/0x90
   __driver_attach+0xbc/0x190
   bus_for_each_dev+0x72/0xc0
   bus_add_driver+0x1bb/0x210
   driver_register+0x66/0xc0
   do_one_initcall+0x53/0x2f0
   kernel_init_freeable+0x18f/0x1e1
   kernel_init+0x11/0x120
   ret_from_fork+0x1f/0x30
  -> #0 (&(>bus_notifier)->rwsem){}-{3:3}:
   validate_chain+0xb3f/0x2000
   __lock_acquire+0x5a4/0xb70
   lock_acquire+0xd3/0x310
   down_read+0x39/0x140
   blocking_notifier_call_chain+0x20/0x50
   device_add+0x3c1/0x900
   platform_device_add+0x108/0x240
   coretemp_cpu_online+0xe1/0x15e [coretemp]
   cpuhp_invoke_callback+0x181/0x8a0
   cpuhp_thread_fun+0x188/0x1f0
   smpboot_thread_fn+0x1b5/0x260
   kthread+0xed/0x120
   ret_from_fork+0x1f/0x30
  other info that might help us debug thi
Chain exists of &(>bus_notifier)->rwsem --> 
cpu_hotplug_lock --> cpuhp_state-
 Possible unsafe locking scenari
   CPU0CPU1
   
  lock(cpuhp_state-up);
   lock(cpu_hotplug_lock);
   lock(cpuhp_state-up);
  lock(&(>bus_notifier)->rwsem);
   *** DEADLOCK *
2 locks held by cpuhp/0/15:
 #0: 82648f10 (cpu_hotplug_lock){}-{0:0}, at: 
cpuhp_thread_fun+0x48/0x1f0
 #1: 826490c0 (cpuhp_state-up){+.+.}-{0:0}, at: 
cpuhp_thread_fun+0x48/0x1f0
  stack backtrace:
CPU: 0 PID: 15 Comm: cpuhp/0 Not tainted 6.0.0-rc5-CI_DRM_12132-g6c93e979e542+ 
#1
Hardware name: Intel Corporation Alder Lake Client Platform/AlderLake-P DDR4 
RVP, BIOS ADLPFWI1.R00.3135.A00.2203251419 03/25/2022
Call Trace:
 
 dump_stack_lvl+0x56/0x7f
 check_noncircular+0x132/0x150
 validate_chain+0xb3f/0x2000
 __lock_acquire+0x5a4/0xb70
 lock_acquire+0xd3/0x310
 ? blocking_notifier_call_chain+0x20/0x50
 down_read+0x39/0x140
 ? blocking_notifier_call_chain+0x20/0x50
 blocking_notifier_call_chain+0x20/0x50
 device_add+0x3c1/0x900
 ? dev_set_name+0x4e/0x70
 platform_device_add+0x108/0x240
 coretemp_cpu_online+0xe1/0x15e [coretemp]
 ? create_core_data+0x550/0x550 [coretemp]
 cpuhp_invoke_callback+0x181/0x8a0
 cpuhp_thread_fun+0x188/0x1f0
 ? smpboot_thread_fn+0x1e/0x260
 smpboot_thread_fn+0x1b5/0x260
 ? sort_range+0x20/0x20
 kthread+0xed/0x120
 ? kthread_complete_and_exit+0x20/0x20
 ret_from_fork+0x1f/0x30
 

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/6641

Signed-off-by: Karolina Drobnik 
Cc: Lucas De Marchi 
---
 drivers/iommu/dma-iommu.c | 17 -
 1 file changed, 4 insertions(+), 13 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 17dd683b2fce..9616b473e4c7 100644
--- a/drivers/

[Intel-gfx] [PATCH v5 3/4] drm/i915/selftest: Always cancel semaphore on error

2022-09-13 Thread Karolina Drobnik
From: Chris Wilson 

Ensure that we always signal the semaphore when timing out, so that if it
happens to be stuck waiting for the semaphore we will quickly recover
without having to wait for a reset.

Reported-by: CQ Tang 
Signed-off-by: Chris Wilson 
Cc: CQ Tang 
cc: Joonas Lahtinen 
Signed-off-by: Ramalingam C 
Reviewed-by: Thomas Hellstrom 
---
 drivers/gpu/drm/i915/gt/selftest_lrc.c | 17 -
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c 
b/drivers/gpu/drm/i915/gt/selftest_lrc.c
index 8320eab7ac36..71e664fc87e9 100644
--- a/drivers/gpu/drm/i915/gt/selftest_lrc.c
+++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c
@@ -1460,18 +1460,17 @@ static int __lrc_isolation(struct intel_engine_cs 
*engine, u32 poison)
}
 
err = poison_registers(B, poison, sema);
-   if (err) {
-   WRITE_ONCE(*sema, -1);
-   i915_request_put(rq);
-   goto err_result1;
-   }
-
-   if (i915_request_wait(rq, 0, HZ / 2) < 0) {
-   i915_request_put(rq);
+   if (err == 0 && i915_request_wait(rq, 0, HZ / 2) < 0) {
+   pr_err("%s(%s): wait for results timed out\n",
+  __func__, engine->name);
err = -ETIME;
-   goto err_result1;
}
+
+   /* Always cancel the semaphore wait, just in case the GPU gets stuck */
+   WRITE_ONCE(*sema, -1);
i915_request_put(rq);
+   if (err)
+   goto err_result1;
 
err = compare_isolation(engine, ref, result, A, poison);
 
-- 
2.25.1



[Intel-gfx] [PATCH v5 2/4] drm/i915/selftests: Check for incomplete LRI from the context image

2022-09-13 Thread Karolina Drobnik
From: Chris Wilson 

In order to keep the context image parser simple, we assume that all
commands follow a similar format. A few, especially the non-MI commands on
the render engines, have fixed lengths that are not encoded in a length field.
This caused us to incorrectly skip over 3D state commands, and start
interpreting context data as instructions. Eventually, as Daniele
discovered, this would lead us to find additional LRI as part of the data
and mistakenly add invalid LRI commands to the context probes.

Stop parsing after we see the first !MI command, as we know we will have
seen all the context registers by that point. (Mostly true for all gen
so far, though the render context does have LRI after the first page
that we have been ignoring so far. It would be useful to extract those
as well so that we have the full list of user accessible registers.)

Similarly, emit a warning if we do try to emit an invalid zero-length
LRI.
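
To make the parsing rule concrete, here is a sketch (not the selftest code
itself) of how a single context-image dword is classified. LRI_HEADER and
LRI_LENGTH_MASK mirror the definitions added in the diff below; MI_INSTR(),
INSTR_CLIENT_SHIFT, INSTR_MI_CLIENT, GENMASK() and u32 are assumed to come
from the usual i915/kernel headers (gt/intel_gpu_commands.h, linux/bits.h,
linux/types.h):

#define LRI_HEADER	MI_INSTR(0x22, 0)
#define LRI_LENGTH_MASK	GENMASK(7, 0)

enum ctx_dword {
	CTX_STOP,	/* first non-MI command: all LRIs already seen */
	CTX_PADDING,	/* zero dword: skip one */
	CTX_OTHER_MI,	/* skip (dword & LRI_LENGTH_MASK) + 2 dwords */
	CTX_BAD_LRI,	/* zero-length LRI: warn and stop */
	CTX_LRI,	/* (len + 1) / 2 register/value pairs follow */
};

static enum ctx_dword classify_ctx_dword(u32 cmd)
{
	/* Non-MI commands do not keep their length in bits 7:0 */
	if ((cmd >> INSTR_CLIENT_SHIFT) != INSTR_MI_CLIENT)
		return CTX_STOP;
	if (cmd == 0)
		return CTX_PADDING;
	if ((cmd & GENMASK(31, 23)) != LRI_HEADER)
		return CTX_OTHER_MI;
	return (cmd & LRI_LENGTH_MASK) ? CTX_LRI : CTX_BAD_LRI;
}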

Testcase: igt@i915_selftest@live@gt_lrc
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/6580
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/6670

Reported-by: Daniele Ceraolo Spurio 
Signed-off-by: Chris Wilson 
Cc: Daniele Ceraolo Spurio 
Signed-off-by: Ramalingam C 
Acked-by: Thomas Hellstrom 
Signed-off-by: Karolina Drobnik 
Reviewed-by: Gwan-gyeong Mun 
---
 drivers/gpu/drm/i915/gt/selftest_lrc.c | 61 +++---
 1 file changed, 54 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c 
b/drivers/gpu/drm/i915/gt/selftest_lrc.c
index 9fc031267e0f..8320eab7ac36 100644
--- a/drivers/gpu/drm/i915/gt/selftest_lrc.c
+++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c
@@ -27,6 +27,9 @@
 #define NUM_GPR 16
 #define NUM_GPR_DW (NUM_GPR * 2) /* each GPR is 2 dwords */
 
+#define LRI_HEADER MI_INSTR(0x22, 0)
+#define LRI_LENGTH_MASK GENMASK(7, 0)
+
 static struct i915_vma *create_scratch(struct intel_gt *gt)
 {
	return __vm_create_scratch_for_read_pinned(&gt->ggtt->vm, PAGE_SIZE);
@@ -202,7 +205,7 @@ static int live_lrc_layout(void *arg)
continue;
}
 
-   if ((lri & GENMASK(31, 23)) != MI_INSTR(0x22, 0)) {
+   if ((lri & GENMASK(31, 23)) != LRI_HEADER) {
pr_err("%s: Expected LRI command at dword %d, 
found %08x\n",
   engine->name, dw, lri);
err = -EINVAL;
@@ -992,18 +995,40 @@ store_context(struct intel_context *ce, struct i915_vma 
*scratch)
hw = defaults;
hw += LRC_STATE_OFFSET / sizeof(*hw);
do {
-   u32 len = hw[dw] & 0x7f;
+   u32 len = hw[dw] & LRI_LENGTH_MASK;
+
+   /*
+* Keep it simple, skip parsing complex commands
+*
+* At present, there are no more MI_LOAD_REGISTER_IMM
+* commands after the first 3D state command. Rather
+* than include a table (see i915_cmd_parser.c) of all
+* the possible commands and their instruction lengths
+* (or mask for variable length instructions), assume
+* we have gathered the complete list of registers and
+* bail out.
+*/
+   if ((hw[dw] >> INSTR_CLIENT_SHIFT) != INSTR_MI_CLIENT)
+   break;
 
if (hw[dw] == 0) {
dw++;
continue;
}
 
-   if ((hw[dw] & GENMASK(31, 23)) != MI_INSTR(0x22, 0)) {
+   if ((hw[dw] & GENMASK(31, 23)) != LRI_HEADER) {
+   /* Assume all other MI commands match LRI length mask */
dw += len + 2;
continue;
}
 
+   if (!len) {
+   pr_err("%s: invalid LRI found in context image\n",
+  ce->engine->name);
+   igt_hexdump(defaults, PAGE_SIZE);
+   break;
+   }
+
dw++;
len = (len + 1) / 2;
while (len--) {
@@ -1155,18 +1180,29 @@ static struct i915_vma *load_context(struct 
intel_context *ce, u32 poison)
hw = defaults;
hw += LRC_STATE_OFFSET / sizeof(*hw);
do {
-   u32 len = hw[dw] & 0x7f;
+   u32 len = hw[dw] & LRI_LENGTH_MASK;
+
+   /* For simplicity, break parsing at the first complex command */
+   if ((hw[dw] >> INSTR_CLIENT_SHIFT) != INSTR_MI_CLIENT)
+   break;
 
if (hw[dw] == 0) {
dw++;
continue;
}
 
-   if ((hw[dw] & GENMASK(31, 23)) != MI_INSTR(0x22, 0)) {
+   if ((hw[dw] & GENMASK(31, 23)) !

[Intel-gfx] [PATCH v5 4/4] drm/i915/selftest: Clear the output buffers before GPU writes

2022-09-13 Thread Karolina Drobnik
From: Chris Wilson 

When testing whether we can get the GPU to leak information about
non-privileged state, we first need to ensure that the output buffer is
set to a known value as the HW may opt to skip the write into memory for
a non-privileged read of a sensitive register. We chose POISON_INUSE (0x5a)
so that it is both non-zero and distinct from the poison values used during
the test.

v2:
  Use i915_gem_object_pin_map_unlocked

Reported-by: CQ Tang 
Signed-off-by: Chris Wilson 
Cc: CQ Tang 
cc: Joonas Lahtinen 
Signed-off-by: Ramalingam C 
Reviewed-by: Thomas Hellstrom 
---
 drivers/gpu/drm/i915/gt/selftest_lrc.c | 32 ++
 1 file changed, 28 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c 
b/drivers/gpu/drm/i915/gt/selftest_lrc.c
index 71e664fc87e9..82d3f8058995 100644
--- a/drivers/gpu/drm/i915/gt/selftest_lrc.c
+++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c
@@ -1395,6 +1395,30 @@ static int compare_isolation(struct intel_engine_cs 
*engine,
return err;
 }
 
+static struct i915_vma *
+create_result_vma(struct i915_address_space *vm, unsigned long sz)
+{
+   struct i915_vma *vma;
+   void *ptr;
+
+   vma = create_user_vma(vm, sz);
+   if (IS_ERR(vma))
+   return vma;
+
+   /* Set the results to a known value distinct from the poison */
+   ptr = i915_gem_object_pin_map_unlocked(vma->obj, I915_MAP_WC);
+   if (IS_ERR(ptr)) {
+   i915_vma_put(vma);
+   return ERR_CAST(ptr);
+   }
+
+   memset(ptr, POISON_INUSE, vma->size);
+   i915_gem_object_flush_map(vma->obj);
+   i915_gem_object_unpin_map(vma->obj);
+
+   return vma;
+}
+
 static int __lrc_isolation(struct intel_engine_cs *engine, u32 poison)
 {
u32 *sema = memset32(engine->status_page.addr + 1000, 0, 1);
@@ -1413,13 +1437,13 @@ static int __lrc_isolation(struct intel_engine_cs 
*engine, u32 poison)
goto err_A;
}
 
-   ref[0] = create_user_vma(A->vm, SZ_64K);
+   ref[0] = create_result_vma(A->vm, SZ_64K);
if (IS_ERR(ref[0])) {
err = PTR_ERR(ref[0]);
goto err_B;
}
 
-   ref[1] = create_user_vma(A->vm, SZ_64K);
+   ref[1] = create_result_vma(A->vm, SZ_64K);
if (IS_ERR(ref[1])) {
err = PTR_ERR(ref[1]);
goto err_ref0;
@@ -1441,13 +1465,13 @@ static int __lrc_isolation(struct intel_engine_cs 
*engine, u32 poison)
}
i915_request_put(rq);
 
-   result[0] = create_user_vma(A->vm, SZ_64K);
+   result[0] = create_result_vma(A->vm, SZ_64K);
if (IS_ERR(result[0])) {
err = PTR_ERR(result[0]);
goto err_ref1;
}
 
-   result[1] = create_user_vma(A->vm, SZ_64K);
+   result[1] = create_result_vma(A->vm, SZ_64K);
if (IS_ERR(result[1])) {
err = PTR_ERR(result[1]);
goto err_result0;
-- 
2.25.1



[Intel-gfx] [PATCH v5 0/4] lrc selftest fixes

2022-09-13 Thread Karolina Drobnik
A few bug fixes for the lrc selftest.

v5:
  - Add Reviewed-by tag to "drm/i915/selftests: Check
for incomplete LRI from the context image" patch, as
it got reviewed in different series:
https://patchwork.freedesktop.org/series/108487/

Chris Wilson (4):
  drm/i915/gt: Explicitly clear BB_OFFSET for new contexts
  drm/i915/selftests: Check for incomplete LRI from the context image
  drm/i915/selftest: Always cancel semaphore on error
  drm/i915/selftest: Clear the output buffers before GPU writes

 drivers/gpu/drm/i915/gt/intel_engine_regs.h |   1 +
 drivers/gpu/drm/i915/gt/intel_lrc.c |  20 
 drivers/gpu/drm/i915/gt/selftest_lrc.c  | 115 
 3 files changed, 116 insertions(+), 20 deletions(-)

--
2.25.1


[Intel-gfx] [PATCH v5 1/4] drm/i915/gt: Explicitly clear BB_OFFSET for new contexts

2022-09-13 Thread Karolina Drobnik
From: Chris Wilson 

Even though the initial protocontext we load onto HW has the register
cleared, by the time we save it into the default image, BB_OFFSET has
had the enable bit set. Reclear BB_OFFSET for each new context.

Testcase: igt/i915_selftests/gt_lrc

v2:
  Extend it for gen8.
v3:
  BB_OFFSET is recorded per engine from Gen9 onwards

Signed-off-by: Chris Wilson 
Cc: Mika Kuoppala 
Signed-off-by: Ramalingam C 
Reviewed-by: Thomas Hellstrom 
---
 drivers/gpu/drm/i915/gt/intel_engine_regs.h |  1 +
 drivers/gpu/drm/i915/gt/intel_lrc.c | 20 
 drivers/gpu/drm/i915/gt/selftest_lrc.c  |  5 +
 3 files changed, 26 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_regs.h 
b/drivers/gpu/drm/i915/gt/intel_engine_regs.h
index 889f0df3940b..fe1a0d5fd4b1 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_regs.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_regs.h
@@ -110,6 +110,7 @@
 #define RING_SBBSTATE(base)_MMIO((base) + 0x118) /* hsw+ */
 #define RING_SBBADDR_UDW(base) _MMIO((base) + 0x11c) /* gen8+ 
*/
 #define RING_BBADDR(base)  _MMIO((base) + 0x140)
+#define RING_BB_OFFSET(base)   _MMIO((base) + 0x158)
 #define RING_BBADDR_UDW(base)  _MMIO((base) + 0x168) /* gen8+ 
*/
 #define CCID(base) _MMIO((base) + 0x180)
 #define   CCID_EN  BIT(0)
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c 
b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 08214683e130..3955292483a6 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -662,6 +662,21 @@ static int lrc_ring_mi_mode(const struct intel_engine_cs 
*engine)
return -1;
 }
 
+static int lrc_ring_bb_offset(const struct intel_engine_cs *engine)
+{
+   if (GRAPHICS_VER_FULL(engine->i915) >= IP_VER(12, 50))
+   return 0x80;
+   else if (GRAPHICS_VER(engine->i915) >= 12)
+   return 0x70;
+   else if (GRAPHICS_VER(engine->i915) >= 9)
+   return 0x64;
+   else if (GRAPHICS_VER(engine->i915) >= 8 &&
+engine->class == RENDER_CLASS)
+   return 0xc4;
+   else
+   return -1;
+}
+
 static int lrc_ring_gpr0(const struct intel_engine_cs *engine)
 {
if (GRAPHICS_VER_FULL(engine->i915) >= IP_VER(12, 50))
@@ -768,6 +783,7 @@ static void init_common_regs(u32 * const regs,
 bool inhibit)
 {
u32 ctl;
+   int loc;
 
ctl = _MASKED_BIT_ENABLE(CTX_CTRL_INHIBIT_SYN_CTX_SWITCH);
ctl |= _MASKED_BIT_DISABLE(CTX_CTRL_ENGINE_CTX_RESTORE_INHIBIT);
@@ -779,6 +795,10 @@ static void init_common_regs(u32 * const regs,
regs[CTX_CONTEXT_CONTROL] = ctl;
 
regs[CTX_TIMESTAMP] = ce->stats.runtime.last;
+
+   loc = lrc_ring_bb_offset(engine);
+   if (loc != -1)
+   regs[loc + 1] = 0;
 }
 
 static void init_wa_bb_regs(u32 * const regs,
diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c 
b/drivers/gpu/drm/i915/gt/selftest_lrc.c
index 1109088fe8f6..9fc031267e0f 100644
--- a/drivers/gpu/drm/i915/gt/selftest_lrc.c
+++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c
@@ -357,6 +357,11 @@ static int live_lrc_fixed(void *arg)
lrc_ring_cmd_buf_cctl(engine),
"RING_CMD_BUF_CCTL"
},
+   {
+   
i915_mmio_reg_offset(RING_BB_OFFSET(engine->mmio_base)),
+   lrc_ring_bb_offset(engine),
+   "RING_BB_OFFSET"
+   },
{ },
}, *t;
u32 *hw;
-- 
2.25.1



[Intel-gfx] [PATCH] drm/i915/selftests: Check for incomplete LRI from the context image

2022-09-13 Thread Karolina Drobnik
From: Chris Wilson 

In order to keep the context image parser simple, we assume that all
commands follow a similar format. A few, especially the non-MI commands on
the render engines, have fixed lengths that are not encoded in a length field.
This caused us to incorrectly skip over 3D state commands, and start
interpreting context data as instructions. Eventually, as Daniele
discovered, this would lead us to find additional LRI as part of the data
and mistakenly add invalid LRI commands to the context probes.

Stop parsing after we see the first !MI command, as we know we will have
seen all the context registers by that point. (Mostly true for all gen so far,
though the render context does have LRI after the first page that we
have been ignoring so far. It would be useful to extract those as well
so that we have the full list of user accessible registers.)

Similarly, emit a warning if we do try to emit an invalid zero-length
LRI.

Reported-by: Daniele Ceraolo Spurio 
Signed-off-by: Chris Wilson 
Cc: Daniele Ceraolo Spurio 
Signed-off-by: Karolina Drobnik 
---
 drivers/gpu/drm/i915/gt/selftest_lrc.c | 61 +++---
 1 file changed, 54 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c 
b/drivers/gpu/drm/i915/gt/selftest_lrc.c
index 1109088fe8f6..954a1c5c10ef 100644
--- a/drivers/gpu/drm/i915/gt/selftest_lrc.c
+++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c
@@ -27,6 +27,9 @@
 #define NUM_GPR 16
 #define NUM_GPR_DW (NUM_GPR * 2) /* each GPR is 2 dwords */
 
+#define LRI_HEADER MI_INSTR(0x22, 0)
+#define LRI_LENGTH_MASK GENMASK(7, 0)
+
 static struct i915_vma *create_scratch(struct intel_gt *gt)
 {
	return __vm_create_scratch_for_read_pinned(&gt->ggtt->vm, PAGE_SIZE);
@@ -202,7 +205,7 @@ static int live_lrc_layout(void *arg)
continue;
}
 
-   if ((lri & GENMASK(31, 23)) != MI_INSTR(0x22, 0)) {
+   if ((lri & GENMASK(31, 23)) != LRI_HEADER) {
pr_err("%s: Expected LRI command at dword %d, 
found %08x\n",
   engine->name, dw, lri);
err = -EINVAL;
@@ -987,18 +990,40 @@ store_context(struct intel_context *ce, struct i915_vma 
*scratch)
hw = defaults;
hw += LRC_STATE_OFFSET / sizeof(*hw);
do {
-   u32 len = hw[dw] & 0x7f;
+   u32 len = hw[dw] & LRI_LENGTH_MASK;
+
+   /*
+* Keep it simple, skip parsing complex commands
+*
+* At present, there are no more MI_LOAD_REGISTER_IMM
+* commands after the first 3D state command. Rather
+* than include a table (see i915_cmd_parser.c) of all
+* the possible commands and their instruction lengths
+* (or mask for variable length instructions), assume
+* we have gathered the complete list of registers and
+* bail out.
+*/
+   if ((hw[dw] >> INSTR_CLIENT_SHIFT) != INSTR_MI_CLIENT)
+   break;
 
if (hw[dw] == 0) {
dw++;
continue;
}
 
-   if ((hw[dw] & GENMASK(31, 23)) != MI_INSTR(0x22, 0)) {
+   if ((hw[dw] & GENMASK(31, 23)) != LRI_HEADER) {
+   /* Assume all other MI commands match LRI length mask */
dw += len + 2;
continue;
}
 
+   if (!len) {
+   pr_err("%s: invalid LRI found in context image\n",
+  ce->engine->name);
+   igt_hexdump(defaults, PAGE_SIZE);
+   break;
+   }
+
dw++;
len = (len + 1) / 2;
while (len--) {
@@ -1150,18 +1175,29 @@ static struct i915_vma *load_context(struct 
intel_context *ce, u32 poison)
hw = defaults;
hw += LRC_STATE_OFFSET / sizeof(*hw);
do {
-   u32 len = hw[dw] & 0x7f;
+   u32 len = hw[dw] & LRI_LENGTH_MASK;
+
+   /* For simplicity, break parsing at the first complex command */
+   if ((hw[dw] >> INSTR_CLIENT_SHIFT) != INSTR_MI_CLIENT)
+   break;
 
if (hw[dw] == 0) {
dw++;
continue;
}
 
-   if ((hw[dw] & GENMASK(31, 23)) != MI_INSTR(0x22, 0)) {
+   if ((hw[dw] & GENMASK(31, 23)) != LRI_HEADER) {
dw += len + 2;
continue;
}
 
+   if (!len) {
+   pr_err("%s: invalid LRI found in context image\n",
+  ce->en

Re: [Intel-gfx] [PATCH] dma-buf: revert "return only unsignaled fences in dma_fence_unwrap_for_each v3"

2022-07-12 Thread Karolina Drobnik

Hi Christian,

On 12.07.2022 12:28, Christian König wrote:

This reverts commit 8f61973718485f3e89bc4f408f929048b7b47c83.

It turned out that this is not correct. Especially the sync_file info
IOCTL needs to see even signaled fences to correctly report back their
status to userspace.

Instead add the filter in the merge function again where it makes sense.

Signed-off-by: Christian König 


After applying the patch, fence merging works and all sw_sync subtests 
are passing. Thanks for taking care of this.


Tested-by: Karolina Drobnik 


---
  drivers/dma-buf/dma-fence-unwrap.c | 3 ++-
  include/linux/dma-fence-unwrap.h   | 6 +-
  2 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/dma-buf/dma-fence-unwrap.c 
b/drivers/dma-buf/dma-fence-unwrap.c
index 502a65ea6d44..7002bca792ff 100644
--- a/drivers/dma-buf/dma-fence-unwrap.c
+++ b/drivers/dma-buf/dma-fence-unwrap.c
@@ -72,7 +72,8 @@ struct dma_fence *__dma_fence_unwrap_merge(unsigned int 
num_fences,
count = 0;
for (i = 0; i < num_fences; ++i) {
		dma_fence_unwrap_for_each(tmp, &iter[i], fences[i])
-   ++count;
+   if (!dma_fence_is_signaled(tmp))
+   ++count;
}
  
  	if (count == 0)

diff --git a/include/linux/dma-fence-unwrap.h b/include/linux/dma-fence-unwrap.h
index 390de1ee9d35..66b1e56fbb81 100644
--- a/include/linux/dma-fence-unwrap.h
+++ b/include/linux/dma-fence-unwrap.h
@@ -43,14 +43,10 @@ struct dma_fence *dma_fence_unwrap_next(struct 
dma_fence_unwrap *cursor);
   * Unwrap dma_fence_chain and dma_fence_array containers and deep dive into 
all
   * potential fences in them. If @head is just a normal fence only that one is
   * returned.
- *
- * Note that signalled fences are opportunistically filtered out, which
- * means the iteration is potentially over no fence at all.
   */
  #define dma_fence_unwrap_for_each(fence, cursor, head)
\
for (fence = dma_fence_unwrap_first(head, cursor); fence;   \
-fence = dma_fence_unwrap_next(cursor)) \
-   if (!dma_fence_is_signaled(fence))
+fence = dma_fence_unwrap_next(cursor))
  
  struct dma_fence *__dma_fence_unwrap_merge(unsigned int num_fences,

   struct dma_fence **fences,


Re: [Intel-gfx] [PATCH v2 3/3] drm/i915/gt: Only kick the signal worker if there's been an update

2022-07-12 Thread Karolina Drobnik

Hi Rodrigo,

Many thanks for taking another look at the patches.

On 08.07.2022 16:40, Rodrigo Vivi wrote:

On Fri, Jul 08, 2022 at 04:20:13PM +0200, Karolina Drobnik wrote:

From: Chris Wilson 

One impact of commit 047a1b877ed4 ("dma-buf & drm/amdgpu: remove
dma_resv workaround") is that it stores many, many more fences. Whereas
adding an exclusive fence used to remove the shared fence list, that
list is now preserved and the write fences included into the list. Not
just a single write fence, but now a write/read fence per context. That
causes us to have to track more fences than before (albeit half of those
are redundant), and we trigger more interrupts for multi-engine
workloads.

As part of reducing the impact from handling more signaling, we observe
we only need to kick the signal worker after adding a fence iff we have


s/iff/if


This is fine, it means "if, and only if"


good cause to believe that there is work to be done in processing the
fence i.e. we either need to enable the interrupt or the request is
already complete but we don't know if we saw the interrupt and so need
to check signaling.

References: 047a1b877ed4 ("dma-buf & drm/amdgpu: remove dma_resv workaround")
Signed-off-by: Chris Wilson 
Signed-off-by: Karolina Drobnik 
---
  drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c 
b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
index 9dc9dccf7b09..ecc990ec1b95 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
@@ -399,7 +399,8 @@ static void insert_breadcrumb(struct i915_request *rq)
 * the request as it may have completed and raised the interrupt as
 * we were attaching it into the lists.
 */
-   irq_work_queue(&b->irq_work);
+   if (!b->irq_armed || __i915_request_is_complete(rq))


would we need the READ_ONCE(irq_armed) ?
would we need to use the irq_lock?


I'll rephrase Chris' answer here:

No, it doesn't need either, the workqueuing is unrelated to the 
irq_lock. The worker enables the interrupt if there are any breadcrumbs 
at the end of its task. When queuing the work, we have to consider the 
race conditions:


  - If the worker is running and b->irq_armed at this point, we know the
irq will remain armed
  - If the worker is running and !b->irq_armed at this point, we will
kick the worker again -- it doesn't make any difference then if the
worker is in the process of trying to arm the irq
  - If the worker is not running, b->irq_armed is constant, no race

Ergo, the only race condition is where the worker is trying to arm the 
irq, and we end up running the worker a second time.


The only danger to consider is _not_ running the worker when we need to. 
Once we put the breadcrumb on the signal, it has to be removed at some 
point. Normally this is only performed by the worker, so we have to 
confident that the worker will be run. We know that if the irq is armed 
(after we have attached this breadcrumb) there must be another run of 
the worker.


The other condition then: if the irq is armed but the breadcrumb is
already completed, we may not see an interrupt from the gpu, as the
breadcrumb may have completed as we attached it, keeping the worker
alive but not noticing the completed breadcrumb. In that case, we have
to simulate the interrupt ourselves and give the worker a kick.


The irq_lock is immaterial in both cases.


+   irq_work_queue(&b->irq_work);
  }
  
  bool i915_request_enable_breadcrumb(struct i915_request *rq)

--
2.25.1
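
Restating the discussion above as a sketch (names as in intel_breadcrumbs.c
from the quoted diff; the wrapper function is hypothetical, the real change is
the one-line condition shown above; the comments map to the three races):

/* Sketch only -- the actual code is the guarded irq_work_queue() above. */
static void maybe_kick_signal_worker(struct intel_breadcrumbs *b,
				     struct i915_request *rq)
{
	/*
	 * 1) worker running && b->irq_armed: the irq stays armed, another
	 *    worker run is guaranteed, no kick needed.
	 * 2) worker running && !b->irq_armed: kick again; at worst the worker
	 *    runs a second time while it is trying to arm the irq.
	 * 3) worker not running: b->irq_armed is constant, no race.
	 *
	 * The request may also have completed as we attached it, in which
	 * case no interrupt will follow -- simulate it with a kick.
	 */
	if (!b->irq_armed || __i915_request_is_complete(rq))
		irq_work_queue(&b->irq_work);
}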



Re: [Intel-gfx] ✗ Fi.CI.IGT: failure for drm/i915: Apply waitboosting before fence wait (rev2)

2022-07-10 Thread Karolina Drobnik




On 09.07.2022 04:00, Patchwork wrote:

*Patch Details*
*Series:* drm/i915: Apply waitboosting before fence wait (rev2)
*URL:* https://patchwork.freedesktop.org/series/105925/
*State:* failure
*Details:* https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105925v2/index.html




  CI Bug Log - changes from CI_DRM_11862_full -> Patchwork_105925v2_full


Summary

*FAILURE*

Serious unknown changes coming with Patchwork_105925v2_full absolutely need to
be verified manually.

If you think the reported changes have nothing to do with the changes
introduced in Patchwork_105925v2_full, please notify your bug team to allow them
to document this new failure mode, which will reduce false positives in CI.


Participating hosts (10 -> 13)

Additional (3): shard-rkl shard-dg1 shard-tglu


Possible new issues

Here are the unknown changes that may have been introduced in 
Patchwork_105925v2_full:



  IGT changes


Possible regressions

  * igt@i915_pm_rc6_residency@rc6-idle@vcs0:
  o shard-apl: PASS -> WARN

  * igt@kms_vblank@pipe-d-wait-forked-busy-hang:
  o shard-tglb: PASS -> INCOMPLETE (+1 similar issue)



These issues are unrelated to the patchset changes


Suppressed

The following results come from untrusted machines, tests, or statuses.
They do not affect the overall result.

  * igt@kms_cursor_crc@cursor-offscreen:
  o {shard-rkl}: NOTRUN -> SKIP (+1 similar issue)

  * {igt@kms_invalid_mode@clock-too-high@edp-1-pipe-d}:
  o shard-tglb: NOTRUN -> SKIP (+3 similar issues)


New tests

New tests have been introduced between CI_DRM_11862_full and 
Patchwork_105925v2_full:



  New IGT tests (1)

  * igt@kms_legacy_colorkey@basic:
  o Statuses : 1 skip(s)
  o Exec time: [0.0] s


Known issues

Here are the changes found in Patchwork_105925v2_full that come from 
known issues:



  CI changes


Issues hit

  * boot:
  o shard-glk: (PASS, PASS, PASS, ...)

Re: [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915: Apply waitboosting before fence wait (rev2)

2022-07-10 Thread Karolina Drobnik



On 08.07.2022 16:55, Patchwork wrote:

== Series Details ==

Series: drm/i915: Apply waitboosting before fence wait (rev2)
URL   : https://patchwork.freedesktop.org/series/105925/
State : warning

== Summary ==

Error: dim checkpatch failed
e35c61e9e46c drm/i915/gem: Look for waitboosting across the whole object prior 
to individual waits
85a4558b5d11 drm/i915: Bump GT idling delay to 2 jiffies
0ec610488c77 drm/i915/gt: Only kick the signal worker if there's been an update
-:23: WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description 
(prefer a maximum 75 chars per line)
#23:
References: 047a1b877ed4 ("dma-buf & drm/amdgpu: remove dma_resv workaround")


Tags shouldn't be wrapped


-:23: ERROR:GIT_COMMIT_ID: Please use git commit description style 'commit <12+ chars of sha1> 
("")' - ie: 'commit 047a1b877ed4 ("dma-buf & drm/amdgpu: remove dma_resv 
workaround")'
#23:
References: 047a1b877ed4 ("dma-buf & drm/amdgpu: remove dma_resv workaround")


"References:" line doesn't need "commit" keyword, it's a false positive


total: 1 errors, 1 warnings, 0 checks, 9 lines checked




[Intel-gfx] [PATCH v2 3/3] drm/i915/gt: Only kick the signal worker if there's been an update

2022-07-08 Thread Karolina Drobnik
From: Chris Wilson 

One impact of commit 047a1b877ed4 ("dma-buf & drm/amdgpu: remove
dma_resv workaround") is that it stores many, many more fences. Whereas
adding an exclusive fence used to remove the shared fence list, that
list is now preserved and the write fences included into the list. Not
just a single write fence, but now a write/read fence per context. That
causes us to have to track more fences than before (albeit half of those
are redundant), and we trigger more interrupts for multi-engine
workloads.

As part of reducing the impact from handling more signaling, we observe
we only need to kick the signal worker after adding a fence iff we have
good cause to believe that there is work to be done in processing the
fence i.e. we either need to enable the interrupt or the request is
already complete but we don't know if we saw the interrupt and so need
to check signaling.

References: 047a1b877ed4 ("dma-buf & drm/amdgpu: remove dma_resv workaround")
Signed-off-by: Chris Wilson 
Signed-off-by: Karolina Drobnik 
---
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c 
b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
index 9dc9dccf7b09..ecc990ec1b95 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
@@ -399,7 +399,8 @@ static void insert_breadcrumb(struct i915_request *rq)
 * the request as it may have completed and raised the interrupt as
 * we were attaching it into the lists.
 */
-   irq_work_queue(&b->irq_work);
+   if (!b->irq_armed || __i915_request_is_complete(rq))
+   irq_work_queue(&b->irq_work);
 }
 
 bool i915_request_enable_breadcrumb(struct i915_request *rq)
-- 
2.25.1



[Intel-gfx] [PATCH v2 2/3] drm/i915: Bump GT idling delay to 2 jiffies

2022-07-08 Thread Karolina Drobnik
From: Chris Wilson 

In monitoring a transcode pipeline that is latency sensitive (it waits
between submitting frames, and each frame requires work on rcs/vcs/vecs
engines), it was found that it took longer than a single jiffy for it to
sustain its workload. Allowing an extra jiffy headroom for the userspace
prevents us from prematurely parking and having to exit powersaving
immediately.

Link: https://gitlab.freedesktop.org/drm/intel/-/issues/6284
Signed-off-by: Chris Wilson 
Signed-off-by: Karolina Drobnik 
Reviewed-by: Rodrigo Vivi 
Reviewed-by: Andi Shyti 
---
 drivers/gpu/drm/i915/i915_active.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_active.c 
b/drivers/gpu/drm/i915/i915_active.c
index ee2b3a375362..7412abf166a8 100644
--- a/drivers/gpu/drm/i915/i915_active.c
+++ b/drivers/gpu/drm/i915/i915_active.c
@@ -974,7 +974,7 @@ void i915_active_acquire_barrier(struct i915_active *ref)
 
GEM_BUG_ON(!intel_engine_pm_is_awake(engine));
	llist_add(barrier_to_ll(node), &engine->barrier_tasks);
-   intel_engine_pm_put_delay(engine, 1);
+   intel_engine_pm_put_delay(engine, 2);
}
 }
 
-- 
2.25.1



[Intel-gfx] [PATCH v2 0/3] drm/i915: Apply waitboosting before fence wait

2022-07-08 Thread Karolina Drobnik
Waitboost is a heuristic that detects latency sensitive workloads waiting for
the results from previous execution. The wait can be seen as GPU
under-utilisation by RPS, Render Power State management, which might lower the
GPU frequency to save power. Limiting the frequency means more waiting for
results, which is undesirable for submissions with tight time constraints.
To circumvent this, with waitboost we iteratively check the list of fences
during gem_wait to see if any of them is stalled waiting for GPU. If such is
found, and the request hasn't yet started its execution, we temporarily bump up
the GPU frequency, so we get the required results as soon as possible.

Commit 047a1b877ed4 ("dma-buf & drm/amdgpu: remove dma_resv workaround") changes
the fences order and how they are iterated. Under this new scheme, we would wait
on each fence that starts executing, rendering them not suitable for waitboost.

To avoid situations like this, inspect the entire list of fences in the dma-resv
earlier, before gem_wait, instead of sequentially waiting for each of them,
applying the boost when needed.
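
Condensed into a minimal sketch (the real change is patch 1 below; the dma-resv
iterator and rps helpers are the ones used there, error handling omitted):

/* Sketch: boost every i915 fence that has not started yet, before waiting. */
static void boost_unstarted_fences(struct dma_resv *resv, unsigned int flags)
{
	struct dma_resv_iter cursor;
	struct dma_fence *fence;

	dma_resv_iter_begin(&cursor, resv,
			    dma_resv_usage_rw(flags & I915_WAIT_ALL));
	dma_resv_for_each_fence_unlocked(&cursor, fence)
		if (dma_fence_is_i915(fence) &&
		    !i915_request_started(to_request(fence)))
			intel_rps_boost(to_request(fence));
	dma_resv_iter_end(&cursor);
}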

v2:
  - Fixed a style issue in i915_gem_object_boost

Chris Wilson (3):
  drm/i915/gem: Look for waitboosting across the whole object prior to
individual waits
  drm/i915: Bump GT idling delay to 2 jiffies
  drm/i915/gt: Only kick the signal worker if there's been an update

 drivers/gpu/drm/i915/gem/i915_gem_wait.c| 34 +
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.c |  3 +-
 drivers/gpu/drm/i915/i915_active.c  |  2 +-
 3 files changed, 37 insertions(+), 2 deletions(-)

--
2.25.1


[Intel-gfx] [PATCH v2 1/3] drm/i915/gem: Look for waitboosting across the whole object prior to individual waits

2022-07-08 Thread Karolina Drobnik
From: Chris Wilson 

We employ a "waitboost" heuristic to detect when userspace is stalled
waiting for results from earlier execution. Under latency sensitive work
mixed between the gpu/cpu, the GPU is typically under-utilised and so
RPS sees that low utilisation as a reason to downclock the frequency,
causing longer stalls and lower throughput. The user left waiting for
the results is not impressed.

On applying commit 047a1b877ed4 ("dma-buf & drm/amdgpu: remove dma_resv
workaround") it was observed that deinterlacing h264 on Haswell
performance dropped by 2-5x. The reason being that the natural workload
was not intense enough to trigger RPS (using HW evaluation intervals) to
upclock, and so it was depending on waitboosting for the throughput.

Commit 047a1b877ed4 ("dma-buf & drm/amdgpu: remove dma_resv workaround")
changes the composition of dma-resv from keeping a single write fence +
multiple read fences, to a single array of multiple write and read
fences (a maximum of one pair of write/read fences per context). The
iteration order was also changed implicitly from all-read fences then
the single write fence, to a mix of write fences followed by read
fences. It is that ordering change that belied the fragility of
waitboosting.

Currently, a waitboost is inspected at the point of waiting on an
outstanding fence. If the GPU is backlogged such that we haven't yet
started the request we need to wait on, we force the GPU to upclock until
the completion of that request. By changing the order in which we waited
upon requests, we ended up waiting on those requests in sequence and as
such we saw that each request was already started and so not a suitable
candidate for waitboosting.

Instead of asking whether to boost each fence in turn, we can look at
whether boosting is required for the dma-resv ensemble prior to waiting
on any fence, making the heuristic more robust to the order in which
fences are stored in the dma-resv.

Reported-by: Thomas Voegtle 
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/6284
Fixes: 047a1b877ed4 ("dma-buf & drm/amdgpu: remove dma_resv workaround")
Signed-off-by: Chris Wilson 
Cc: Tvrtko Ursulin 
Signed-off-by: Karolina Drobnik 
Tested-by: Thomas Voegtle 
Reviewed-by: Andi Shyti 
---
 drivers/gpu/drm/i915/gem/i915_gem_wait.c | 34 
 1 file changed, 34 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_wait.c 
b/drivers/gpu/drm/i915/gem/i915_gem_wait.c
index 319936f91ac5..e6e01c2a74a6 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_wait.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_wait.c
@@ -9,6 +9,7 @@
 #include 
 
 #include "gt/intel_engine.h"
+#include "gt/intel_rps.h"
 
 #include "i915_gem_ioctls.h"
 #include "i915_gem_object.h"
@@ -31,6 +32,37 @@ i915_gem_object_wait_fence(struct dma_fence *fence,
  timeout);
 }
 
+static void
+i915_gem_object_boost(struct dma_resv *resv, unsigned int flags)
+{
+   struct dma_resv_iter cursor;
+   struct dma_fence *fence;
+
+   /*
+* Prescan all fences for potential boosting before we begin waiting.
+*
+* When we wait, we wait on outstanding fences serially. If the
+* dma-resv contains a sequence such as 1:1, 1:2 instead of a reduced
+* form 1:2, then as we look at each wait in turn we see that each
+* request is currently executing and not worthy of boosting. But if
+* we only happen to look at the final fence in the sequence (because
+* of request coalescing or splitting between read/write arrays by
+* the iterator), then we would boost. As such our decision to boost
+* or not is delicately balanced on the order we wait on fences.
+*
+* So instead of looking for boosts sequentially, look for all boosts
+* upfront and then wait on the outstanding fences.
+*/
+
+   dma_resv_iter_begin(&cursor, resv,
+   dma_resv_usage_rw(flags & I915_WAIT_ALL));
+   dma_resv_for_each_fence_unlocked(&cursor, fence)
+   if (dma_fence_is_i915(fence) &&
+   !i915_request_started(to_request(fence)))
+   intel_rps_boost(to_request(fence));
+   dma_resv_iter_end(&cursor);
+}
+
 static long
 i915_gem_object_wait_reservation(struct dma_resv *resv,
 unsigned int flags,
@@ -40,6 +72,8 @@ i915_gem_object_wait_reservation(struct dma_resv *resv,
struct dma_fence *fence;
long ret = timeout ?: 1;
 
+   i915_gem_object_boost(resv, flags);
+
	dma_resv_iter_begin(&cursor, resv,
	dma_resv_usage_rw(flags & I915_WAIT_ALL));
	dma_resv_for_each_fence_unlocked(&cursor, fence) {
-- 
2.25.1



Re: [Intel-gfx] [PATCH 1/3] drm/i915/gem: Look for waitboosting across the whole object prior to individual waits

2022-07-08 Thread Karolina Drobnik

Hi Andi,

On 08.07.2022 13:38, Andi Shyti wrote:

Hi Karolina,

[...]


+   dma_resv_for_each_fence_unlocked(&cursor, fence) {
+   if (dma_fence_is_i915(fence) &&
+   !i915_request_started(to_request(fence)))
+   intel_rps_boost(to_request(fence));
+   }


you can remove the brackets here.

Andi


Would you like me to send v2 for it?


if the committer takes care of removing it, then no need,
otherwise, please yes, resend it. Even if it's a stupid nitpick,
if it gets applied it would be very difficult to get it fixed[*].

Didn't checkpatch.pl complain about it?


Right, thanks for explaining this. checkpatch.pl only complained about 
unwrapped References tag (a false positive), but I can delete the braces 
and resend the patchset.



If you are going to resend it, you can add my:

Reviewed-by: Andi Shyti 

also here.


OK, will do so, thanks


All the best,
Karolina


Thanks,
Andi

[*] Because just minor coding style patches are generally
rejected, the only way for fixing style issues would be if:

  1. someone is working in that part of the code
  2. someone will sneak in the code fix in some unrelated patch
 screwing up git blame
  3. someone will send a big series on this file and have some
 trivial coding style patches in it.

Amongst the three above, number '2' is the one I dislike the
most, but unfortunately that's also the most used.


Re: [Intel-gfx] [PATCH 1/3] drm/i915/gem: Look for waitboosting across the whole object prior to individual waits

2022-07-08 Thread Karolina Drobnik

Hi Rodrigo and Andi,

Thank you very much for your reviews.

On 07.07.2022 23:50, Andi Shyti wrote:

Hi Rodrigo, Chris and Karolina,

On Thu, Jul 07, 2022 at 01:57:52PM -0400, Rodrigo Vivi wrote:

On Tue, Jul 05, 2022 at 12:57:17PM +0200, Karolina Drobnik wrote:

From: Chris Wilson 

We employ a "waitboost" heuristic to detect when userspace is stalled
waiting for results from earlier execution. Under latency sensitive work
mixed between the gpu/cpu, the GPU is typically under-utilised and so
RPS sees that low utilisation as a reason to downclock the frequency,
causing longer stalls and lower throughput. The user left waiting for
the results is not impressed.


you can also write here "... is not impressed, was sad and cried"


:)


On applying commit 047a1b877ed4 ("dma-buf & drm/amdgpu: remove dma_resv
workaround") it was observed that deinterlacing h264 on Haswell
performance dropped by 2-5x. The reason being that the natural workload
was not intense enough to trigger RPS (using HW evaluation intervals) to
upclock, and so it was depending on waitboosting for the throughput.

Commit 047a1b877ed4 ("dma-buf & drm/amdgpu: remove dma_resv workaround")
changes the composition of dma-resv from keeping a single write fence +
multiple read fences, to a single array of multiple write and read
fences (a maximum of one pair of write/read fences per context). The
iteration order was also changed implicitly from all-read fences then
the single write fence, to a mix of write fences followed by read
fences. It is that ordering change that exposed the fragility of
waitboosting.

Currently, a waitboost is inspected at the point of waiting on an
outstanding fence. If the GPU is backlogged such that we haven't yet
started the request we need to wait on, we force the GPU to upclock until
the completion of that request. By changing the order in which we waited
upon requests, we ended up waiting on those requests in sequence and as
such we saw that each request was already started and so not a suitable
candidate for waitboosting.

Instead of


Okay, all the explanation makes sense. But this commit message and
the cover letter tells that we are doing X *Instead* *of* Y.
That would mean code for Y would be removed. But this patch just add X.


The boost we have right now is applied in i915_request_wait_timeout, 
which is at a lower level than i915_gem_object_wait, and works for all 
users, not just gem_object(s).
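
Just for reference, a rough sketch of how the two boost sites relate 
(illustrative only - the exact guard conditions inside 
i915_request_wait_timeout are elided here):

static long gem_object_wait_sketch(struct dma_resv *resv, unsigned int flags,
                                   long timeout)
{
        /* new in this series: scan the whole dma-resv ensemble upfront */
        i915_gem_object_boost(resv, flags);

        /*
         * Then wait on each outstanding fence in turn; every such wait
         * funnels into i915_request_wait_timeout(), which is where the
         * pre-existing per-request boost is applied for all waiters.
         */
        return timeout;
}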



So it looks to me that we are adding extra boosts with the code below.


That's true - we'll have a redundant boost check for gem_object, but 
this is fine. In this case it wouldn't apply the boost again because 
either (1) the request already started execution, or (2) intel_rps_boost 
returns early because i915_request_has_waitboost(rq) is true.
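
A minimal sketch of that argument, relying only on the behaviour of the 
helpers described above (illustrative, not the driver's actual boost 
path):

static void boost_once_sketch(struct i915_request *rq)
{
        /* (1) the request is already executing - nothing left to boost */
        if (i915_request_started(rq))
                return;

        /*
         * (2) the upfront scan already boosted it; the real
         * intel_rps_boost() performs this check itself and returns early.
         */
        if (i915_request_has_waitboost(rq))
                return;

        intel_rps_boost(rq);    /* bump the GPU clocks until rq completes */
}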




What am I missing?


I think the two things are unrelated and they are not mutually
exclusive.


Exactly


What this patch does is scan the fences upfront and boost
those requests that would not otherwise be boosted (which is what
the current code is supposed to do, but has now regressed), so as
not to leave the sad user above crying for long.


That is correct (especially the crying part)


Am I right? If so I would r-b this patch as it looks good to me.


asking whether to boost each fence in turn, we can look at

whether boosting is required for the dma-resv ensemble prior to waiting
on any fence, making the heuristic more robust to the order in which
fences are stored in the dma-resv.

Reported-by: Thomas Voegtle 
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/6284
Fixes: 047a1b877ed4 ("dma-buf & drm/amdgpu: remove dma_resv workaround")
Signed-off-by: Chris Wilson 
Cc: Tvrtko Ursulin 
Signed-off-by: Karolina Drobnik 
Tested-by: Thomas Voegtle 
---
  drivers/gpu/drm/i915/gem/i915_gem_wait.c | 35 
  1 file changed, 35 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_wait.c 
b/drivers/gpu/drm/i915/gem/i915_gem_wait.c
index 319936f91ac5..3fbb464746e1 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_wait.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_wait.c
@@ -9,6 +9,7 @@
  #include 
  
  #include "gt/intel_engine.h"

+#include "gt/intel_rps.h"
  
  #include "i915_gem_ioctls.h"

  #include "i915_gem_object.h"
@@ -31,6 +32,38 @@ i915_gem_object_wait_fence(struct dma_fence *fence,
  timeout);
  }
  
+static void

+i915_gem_object_boost(struct dma_resv *resv, unsigned int flags)
+{
+   struct dma_resv_iter cursor;
+   struct dma_fence *fence;
+
+   /*
+* Prescan all fences for potential boosting before we begin waiting.
+*
+* When we wait, we wait on outstanding fences serially. If the
+* dma-resv contains a sequence such as 1:1, 1:2 instead of a reduced
+* form 1:2, then as we look at each wait in turn we see that each
+* request is currently executing a

[Intel-gfx] [PATCH 0/3] drm/i915: Apply waitboosting before fence wait

2022-07-05 Thread Karolina Drobnik
Waitboost is a heuristic that detects latency sensitive workloads waiting for
the results of previous execution. The wait can be seen as GPU
under-utilisation by RPS (Render Power State management), which might lower the
GPU frequency to save power. Limiting the frequency means more waiting for
results, which is undesirable for submissions with tight time constraints.
To circumvent this, with waitboost we iteratively check the list of fences
during gem_wait to see if any of them is stalled waiting for the GPU. If such a
fence is found and its request hasn't yet started execution, we temporarily
bump up the GPU frequency, so we get the required results as soon as possible.
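
A minimal sketch of that per-fence check, using only the helpers that
appear in the patches below (dma_fence_is_i915, to_request,
i915_request_started, intel_rps_boost):

static void maybe_waitboost(struct dma_fence *fence)
{
        /* only fences backed by i915 requests can be boosted */
        if (!dma_fence_is_i915(fence))
                return;

        /* a request that has already started execution needs no help */
        if (i915_request_started(to_request(fence)))
                return;

        /* otherwise raise the GPU frequency until the request runs */
        intel_rps_boost(to_request(fence));
}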

Commit 047a1b877ed4 ("dma-buf & drm/amdgpu: remove dma_resv workaround") changes
the order in which fences are stored and iterated. Under this new scheme, each
fence we wait on in turn has typically already started executing, rendering it
unsuitable for waitboost.
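
To make this concrete, take a made-up context 1 with request 1:1 running
on the GPU and 1:2 queued behind it (the same example as in the code
comment of patch 1):

/*
 * Illustration only, not code from the series:
 *
 *   dma-resv holds { 1:1, 1:2 }  (new, unreduced form)
 *     wait on 1:1: already started                  -> no boost
 *     wait on 1:2: started by the time we reach it  -> no boost
 *
 *   dma-resv holds { 1:2 } only  (old, reduced form)
 *     wait on 1:2: not yet started                  -> boost
 *
 * A serial per-wait decision therefore depends on which fences happen
 * to be stored; scanning the whole ensemble upfront does not.
 */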

To avoid situations like this, inspect the entire list of fences in the dma-resv
upfront, before we start waiting on them sequentially, and apply the boost where
needed.

Test-with: 20220705103551.3720180-1-karolina.drob...@intel.com

Chris Wilson (3):
  drm/i915/gem: Look for waitboosting across the whole object prior to
individual waits
  drm/i915: Bump GT idling delay to 2 jiffies
  drm/i915/gt: Only kick the signal worker if there's been an update

 drivers/gpu/drm/i915/gem/i915_gem_wait.c| 35 +
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.c |  3 +-
 drivers/gpu/drm/i915/i915_active.c  |  2 +-
 3 files changed, 38 insertions(+), 2 deletions(-)

--
2.25.1


[Intel-gfx] [PATCH 1/3] drm/i915/gem: Look for waitboosting across the whole object prior to individual waits

2022-07-05 Thread Karolina Drobnik
From: Chris Wilson 

We employ a "waitboost" heuristic to detect when userspace is stalled
waiting for results from earlier execution. Under latency sensitive work
mixed between the gpu/cpu, the GPU is typically under-utilised and so
RPS sees that low utilisation as a reason to downclock the frequency,
causing longer stalls and lower throughput. The user left waiting for
the results is not impressed.

On applying commit 047a1b877ed4 ("dma-buf & drm/amdgpu: remove dma_resv
workaround") it was observed that deinterlacing h264 on Haswell
performance dropped by 2-5x. The reason being that the natural workload
was not intense enough to trigger RPS (using HW evaluation intervals) to
upclock, and so it was depending on waitboosting for the throughput.

Commit 047a1b877ed4 ("dma-buf & drm/amdgpu: remove dma_resv workaround")
changes the composition of dma-resv from keeping a single write fence +
multiple read fences, to a single array of multiple write and read
fences (a maximum of one pair of write/read fences per context). The
iteration order was also changed implicitly from all-read fences then
the single write fence, to a mix of write fences followed by read
fences. It is that ordering change that exposed the fragility of
waitboosting.

Currently, a waitboost is inspected at the point of waiting on an
outstanding fence. If the GPU is backlogged such that we haven't yet
started the request we need to wait on, we force the GPU to upclock until
the completion of that request. By changing the order in which we waited
upon requests, we ended up waiting on those requests in sequence and as
such we saw that each request was already started and so not a suitable
candidate for waitboosting.

Instead of asking whether to boost each fence in turn, we can look at
whether boosting is required for the dma-resv ensemble prior to waiting
on any fence, making the heuristic more robust to the order in which
fences are stored in the dma-resv.

Reported-by: Thomas Voegtle 
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/6284
Fixes: 047a1b877ed4 ("dma-buf & drm/amdgpu: remove dma_resv workaround")
Signed-off-by: Chris Wilson 
Cc: Tvrtko Ursulin 
Signed-off-by: Karolina Drobnik 
Tested-by: Thomas Voegtle 
---
 drivers/gpu/drm/i915/gem/i915_gem_wait.c | 35 
 1 file changed, 35 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_wait.c 
b/drivers/gpu/drm/i915/gem/i915_gem_wait.c
index 319936f91ac5..3fbb464746e1 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_wait.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_wait.c
@@ -9,6 +9,7 @@
 #include 
 
 #include "gt/intel_engine.h"
+#include "gt/intel_rps.h"
 
 #include "i915_gem_ioctls.h"
 #include "i915_gem_object.h"
@@ -31,6 +32,38 @@ i915_gem_object_wait_fence(struct dma_fence *fence,
  timeout);
 }
 
+static void
+i915_gem_object_boost(struct dma_resv *resv, unsigned int flags)
+{
+   struct dma_resv_iter cursor;
+   struct dma_fence *fence;
+
+   /*
+* Prescan all fences for potential boosting before we begin waiting.
+*
+* When we wait, we wait on outstanding fences serially. If the
+* dma-resv contains a sequence such as 1:1, 1:2 instead of a reduced
+* form 1:2, then as we look at each wait in turn we see that each
+* request is currently executing and not worthy of boosting. But if
+* we only happen to look at the final fence in the sequence (because
+* of request coalescing or splitting between read/write arrays by
+* the iterator), then we would boost. As such our decision to boost
+* or not is delicately balanced on the order we wait on fences.
+*
+* So instead of looking for boosts sequentially, look for all boosts
+* upfront and then wait on the outstanding fences.
+*/
+
+   dma_resv_iter_begin(&cursor, resv,
+   dma_resv_usage_rw(flags & I915_WAIT_ALL));
+   dma_resv_for_each_fence_unlocked(&cursor, fence) {
+   if (dma_fence_is_i915(fence) &&
+   !i915_request_started(to_request(fence)))
+   intel_rps_boost(to_request(fence));
+   }
+   dma_resv_iter_end(&cursor);
+}
+
 static long
 i915_gem_object_wait_reservation(struct dma_resv *resv,
 unsigned int flags,
@@ -40,6 +73,8 @@ i915_gem_object_wait_reservation(struct dma_resv *resv,
struct dma_fence *fence;
long ret = timeout ?: 1;
 
+   i915_gem_object_boost(resv, flags);
+
dma_resv_iter_begin(&cursor, resv,
dma_resv_usage_rw(flags & I915_WAIT_ALL));
dma_resv_for_each_fence_unlocked(&cursor, fence) {
-- 
2.25.1



[Intel-gfx] [PATCH 3/3] drm/i915/gt: Only kick the signal worker if there's been an update

2022-07-05 Thread Karolina Drobnik
From: Chris Wilson 

One impact of commit 047a1b877ed4 ("dma-buf & drm/amdgpu: remove
dma_resv workaround") is that it stores many, many more fences. Whereas
adding an exclusive fence used to remove the shared fence list, that
list is now preserved and the write fences included into the list. Not
just a single write fence, but now a write/read fence per context. That
causes us to have to track more fences than before (albeit half of those
are redundant), and we trigger more interrupts for multi-engine
workloads.

As part of reducing the impact of handling more signaling, we observe
that we only need to kick the signal worker after adding a fence iff we
have good cause to believe that there is work to be done in processing
the fence, i.e. we either need to enable the interrupt, or the request is
already complete but we don't know if we saw the interrupt and so need
to check signaling.
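
The new guard can be read as a small predicate (a sketch of the condition
used in the diff below):

static bool breadcrumb_needs_kick(const struct intel_breadcrumbs *b,
                                  const struct i915_request *rq)
{
        return !b->irq_armed ||                /* must enable the interrupt */
               __i915_request_is_complete(rq); /* may have missed its interrupt */
}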

References: 047a1b877ed4 ("dma-buf & drm/amdgpu: remove dma_resv workaround")
Signed-off-by: Chris Wilson 
Signed-off-by: Karolina Drobnik 
---
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c 
b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
index 9dc9dccf7b09..ecc990ec1b95 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
@@ -399,7 +399,8 @@ static void insert_breadcrumb(struct i915_request *rq)
 * the request as it may have completed and raised the interrupt as
 * we were attaching it into the lists.
 */
-   irq_work_queue(&b->irq_work);
+   if (!b->irq_armed || __i915_request_is_complete(rq))
+   irq_work_queue(&b->irq_work);
 }
 
 bool i915_request_enable_breadcrumb(struct i915_request *rq)
-- 
2.25.1



[Intel-gfx] [PATCH 2/3] drm/i915: Bump GT idling delay to 2 jiffies

2022-07-05 Thread Karolina Drobnik
From: Chris Wilson 

In monitoring a transcode pipeline that is latency sensitive (it waits
between submitting frames, and each frame requires work on rcs/vcs/vecs
engines), it is found that it took longer than a single jiffy for it to
sustain its workload. Allowing an extra jiffy headroom for the userspace
prevents us from prematurely parking and having to exit powersaving
immediately.

Link: https://gitlab.freedesktop.org/drm/intel/-/issues/6284
Signed-off-by: Chris Wilson 
Signed-off-by: Karolina Drobnik 
---
 drivers/gpu/drm/i915/i915_active.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_active.c 
b/drivers/gpu/drm/i915/i915_active.c
index ee2b3a375362..7412abf166a8 100644
--- a/drivers/gpu/drm/i915/i915_active.c
+++ b/drivers/gpu/drm/i915/i915_active.c
@@ -974,7 +974,7 @@ void i915_active_acquire_barrier(struct i915_active *ref)
 
GEM_BUG_ON(!intel_engine_pm_is_awake(engine));
llist_add(barrier_to_ll(node), &engine->barrier_tasks);
-   intel_engine_pm_put_delay(engine, 1);
+   intel_engine_pm_put_delay(engine, 2);
}
 }
 
-- 
2.25.1