[Intel-gfx] [topic/core-for-CI] Revert "iommu/dma: Fix race condition during iova_domain initialization"

2022-09-14 Thread Karolina Drobnik
This reverts commit ac9a5d522bb80be50ea84965699e1c8257d745ce.

This change introduces a regression on Alder Lake that completely
blocks testing. To enable CI and avoid possible circular locking
warning, revert the patch.

kernel log:

======================================================================
WARNING: possible circular locking dependency detected
6.0.0-rc5-CI_DRM_12132-g6c93e979e542+ #1 Not tainted
----------------------------------------------------------------------
cpuhp/0/15 is trying to acquire lock:
ffff8881013df278 (&(&priv->bus_notifier)->rwsem){++++}-{3:3}, at: 
blocking_notifier_call_chain+0x20/0x50
  but task is already holding lock:
ffffffff826490c0 (cpuhp_state-up){+.+.}-{0:0}, at: cpuhp_thread_fun+0x48/0x1f0
  which lock already depends on the new lock.
  the existing dependency chain (in reverse order) is:
  -> #3 (cpuhp_state-up){+.+.}-{0:0}:
   lock_acquire+0xd3/0x310
   cpuhp_thread_fun+0xa6/0x1f0
   smpboot_thread_fn+0x1b5/0x260
   kthread+0xed/0x120
   ret_from_fork+0x1f/0x30
  -> #2 (cpu_hotplug_lock){++++}-{0:0}:
   lock_acquire+0xd3/0x310
   __cpuhp_state_add_instance+0x43/0x1c0
   iova_domain_init_rcaches+0x199/0x1c0
   iommu_setup_dma_ops+0x130/0x440
   bus_iommu_probe+0x26a/0x2d0
   bus_set_iommu+0x82/0xd0
   intel_iommu_init+0xe33/0x1039
   pci_iommu_init+0x9/0x31
   do_one_initcall+0x53/0x2f0
   kernel_init_freeable+0x18f/0x1e1
   kernel_init+0x11/0x120
   ret_from_fork+0x1f/0x30
  -> #1 (&domain->iova_cookie->mutex){+.+.}-{3:3}:
   lock_acquire+0xd3/0x310
   __mutex_lock+0x97/0xf10
   iommu_setup_dma_ops+0xd7/0x440
   iommu_probe_device+0xa4/0x180
   iommu_bus_notifier+0x2d/0x40
   notifier_call_chain+0x31/0x90
   blocking_notifier_call_chain+0x3a/0x50
   device_add+0x3c1/0x900
   pci_device_add+0x255/0x580
   pci_scan_single_device+0xa6/0xd0
   pci_scan_slot+0x7a/0x1b0
   pci_scan_child_bus_extend+0x35/0x2a0
   vmd_probe+0x5cd/0x970
   pci_device_probe+0x95/0x110
   really_probe+0xd6/0x350
   __driver_probe_device+0x73/0x170
   driver_probe_device+0x1a/0x90
   __driver_attach+0xbc/0x190
   bus_for_each_dev+0x72/0xc0
   bus_add_driver+0x1bb/0x210
   driver_register+0x66/0xc0
   do_one_initcall+0x53/0x2f0
   kernel_init_freeable+0x18f/0x1e1
   kernel_init+0x11/0x120
   ret_from_fork+0x1f/0x30
  -> #0 (&(&priv->bus_notifier)->rwsem){++++}-{3:3}:
   validate_chain+0xb3f/0x2000
   __lock_acquire+0x5a4/0xb70
   lock_acquire+0xd3/0x310
   down_read+0x39/0x140
   blocking_notifier_call_chain+0x20/0x50
   device_add+0x3c1/0x900
   platform_device_add+0x108/0x240
   coretemp_cpu_online+0xe1/0x15e [coretemp]
   cpuhp_invoke_callback+0x181/0x8a0
   cpuhp_thread_fun+0x188/0x1f0
   smpboot_thread_fn+0x1b5/0x260
   kthread+0xed/0x120
   ret_from_fork+0x1f/0x30
  other info that might help us debug this:

 Chain exists of:
   &(&priv->bus_notifier)->rwsem --> cpu_hotplug_lock --> cpuhp_state-up

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(cpuhp_state-up);
                                lock(cpu_hotplug_lock);
                                lock(cpuhp_state-up);
   lock(&(&priv->bus_notifier)->rwsem);

   *** DEADLOCK ***
2 locks held by cpuhp/0/15:
 #0: ffffffff82648f10 (cpu_hotplug_lock){++++}-{0:0}, at: 
cpuhp_thread_fun+0x48/0x1f0
 #1: ffffffff826490c0 (cpuhp_state-up){+.+.}-{0:0}, at: 
cpuhp_thread_fun+0x48/0x1f0
  stack backtrace:
CPU: 0 PID: 15 Comm: cpuhp/0 Not tainted 6.0.0-rc5-CI_DRM_12132-g6c93e979e542+ 
#1
Hardware name: Intel Corporation Alder Lake Client Platform/AlderLake-P DDR4 
RVP, BIOS ADLPFWI1.R00.3135.A00.2203251419 03/25/2022
Call Trace:
 <TASK>
 dump_stack_lvl+0x56/0x7f
 check_noncircular+0x132/0x150
 validate_chain+0xb3f/0x2000
 __lock_acquire+0x5a4/0xb70
 lock_acquire+0xd3/0x310
 ? blocking_notifier_call_chain+0x20/0x50
 down_read+0x39/0x140
 ? blocking_notifier_call_chain+0x20/0x50
 blocking_notifier_call_chain+0x20/0x50
 device_add+0x3c1/0x900
 ? dev_set_name+0x4e/0x70
 platform_device_add+0x108/0x240
 coretemp_cpu_online+0xe1/0x15e [coretemp]
 ? create_core_data+0x550/0x550 [coretemp]
 cpuhp_invoke_callback+0x181/0x8a0
 cpuhp_thread_fun+0x188/0x1f0
 ? smpboot_thread_fn+0x1e/0x260
 smpboot_thread_fn+0x1b5/0x260
 ? sort_range+0x20/0x20
 kthread+0xed/0x120
 ? kthread_complete_and_exit+0x20/0x20
 ret_from_fork+0x1f/0x30
 </TASK>

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/6641

Signed-off-by: Karolina Drobnik 
Cc: Lucas De Marchi 
---
 drivers/iommu/dma-iommu.c | 17 ++++-------------
 1 file changed, 4 insertions(+), 13 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 17dd683b2fce..9616b473e4c7 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iom

Re: [Intel-gfx] [topic/core-for-CI] Revert "iommu/dma: Fix race condition during iova_domain initialization"

2022-09-14 Thread Lucas De Marchi

On Wed, Sep 14, 2022 at 02:40:45PM +0200, Karolina Drobnik wrote:

This reverts commit ac9a5d522bb80be50ea84965699e1c8257d745ce.

This change introduces a regression on Alder Lake that completely
blocks testing. To enable CI and avoid possible circular locking
warning, revert the patch.


We are already on rc5. Are iommu authors involved aware of this issue?
We could do this in our "for CI only" branch, but it's equally important
that this is fixed for 6.0

Cc'ing them.

thanks
Lucas De Marchi



[snip: quoted kernel log and patch]

Re: [Intel-gfx] [topic/core-for-CI] Revert "iommu/dma: Fix race condition during iova_domain initialization"

2022-09-14 Thread Karolina Drobnik

On 14.09.2022 17:01, Lucas De Marchi wrote:

On Wed, Sep 14, 2022 at 02:40:45PM +0200, Karolina Drobnik wrote:

This reverts commit ac9a5d522bb80be50ea84965699e1c8257d745ce.

This change introduces a regression on Alder Lake that completely
blocks testing. To enable CI and avoid possible circular locking
warning, revert the patch.


We are already on rc5. Are iommu authors involved aware of this issue?
We could do this in our "for CI only" branch, but it's equally important
that this is fixed for 6.0


I planned to reach out to them after merging this revert to the "CI only" 
branch (hence the topic tag), with more justification. And yes, I'm fully 
aware we're quite late in the cycle, which is also why I went with this 
patch first.


Many thanks,
Karolina


[snip: quoted trailer and kernel log]

Re: [Intel-gfx] [topic/core-for-CI] Revert "iommu/dma: Fix race condition during iova_domain initialization"

2022-09-16 Thread Karolina Drobnik

On 14.09.2022 17:54, Robin Murphy wrote:

On 2022-09-14 16:01, Lucas De Marchi wrote:

On Wed, Sep 14, 2022 at 02:40:45PM +0200, Karolina Drobnik wrote:

This reverts commit ac9a5d522bb80be50ea84965699e1c8257d745ce.

This change introduces a regression on Alder Lake that
completely blocks testing. To enable CI and avoid possible
circular locking warning, revert the patch.


We are already on rc5. Are iommu authors involved aware of this 
issue? We could do this in our "for CI only" branch, but it's 
equally important that this is fixed for 6.0


Cc'ing them.


The lockdep report doesn't make much sense to me - the deadlock cycle
it's reporting doesn't even involve the mutex added by that commit,
and otherwise the lock ordering between the IOMMU bus notifier(s) and
cpu_hotplug_lock has existed for ages. Has lockdep somehow got
multiple different and unrelated bus notifiers mixed up, maybe?

FWIW nobody else has reported anything, and that mutex addresses a 
real-world concurrency issue, so I'm not convinced a revert is 
appropriate without at least a much clearer justification.


I'll share more background on this regression. We've noticed that no
tests were run for Alder Lake platforms. This may happen when, for
example, there is a kernel taint or a lockdep warning.

Links:
https://intel-gfx-ci.01.org/tree/drm-tip/bat-adlm-1.html
https://intel-gfx-ci.01.org/tree/drm-tip/bat-adlp-6.html

The CI logs (which can be found for example here[1], boot0 file)
revealed a lockdep warning. One of the recent changes in the area was
commit ac9a5d522bb8 ("iommu/dma: Fix race condition during iova_domain
initialization"), and I sent a revert patch to test it on CI[2]. This
proved to be effective, as the tests started running on the Alder Lake
platforms:
https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_108474v1/index.html?hosts=adlp

To be clear, that revert is just a way of unblocking CI testing; the
problem requires a specific fix.

Lucas, would it be possible to merge this revert to the topic branch to
unblock Alder Lake until this issue is fixed? I'm afraid that some
regressions could slip through the cracks if we don't do it soon enough.

Thanks,
Karolina


[1] -
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12145/bat-adlm-1/igt@run...@aborted.html
[2] - https://patchwork.freedesktop.org/series/108474/


[snip: quoted signatures and kernel log]

Re: [Intel-gfx] [topic/core-for-CI] Revert "iommu/dma: Fix race condition during iova_domain initialization"

2022-09-16 Thread Lucas De Marchi

On Fri, Sep 16, 2022 at 02:24:00PM +0200, Karolina Drobnik wrote:

On 14.09.2022 17:54, Robin Murphy wrote:

On 2022-09-14 16:01, Lucas De Marchi wrote:

On Wed, Sep 14, 2022 at 02:40:45PM +0200, Karolina Drobnik wrote:

This reverts commit ac9a5d522bb80be50ea84965699e1c8257d745ce.

This change introduces a regression on Alder Lake that
completely blocks testing. To enable CI and avoid possible
circular locking warning, revert the patch.


We are already on rc5. Are iommu authors involved aware of this 
issue? We could do this in our "for CI only" branch, but it's 
equally important that this is fixed for 6.0


Cc'ing them.


The lockdep report doesn't make much sense to me - the deadlock cycle
it's reporting doesn't even involve the mutex added by that commit,
and otherwise the lock ordering between the IOMMU bus notifier(s) and
cpu_hotplug_lock has existed for ages. Has lockdep somehow got
multiple different and unrelated bus notifiers mixed up, maybe?

FWIW nobody else has reported anything, and that mutex addresses a 
real-world concurrency issue, so I'm not convinced a revert is 
appropriate without at least a much clearer justification.


I'll share more background on this regression. We've noticed that no
tests were run for Alder Lake platforms. This may happen when, for
example, there is a kernel taint or a lockdep warning.

Links:
https://intel-gfx-ci.01.org/tree/drm-tip/bat-adlm-1.html
https://intel-gfx-ci.01.org/tree/drm-tip/bat-adlp-6.html

The CI logs (which can be found for example here[1], boot0 file)
revealed a lockdep warning. One of the recent changes in the area was
commit ac9a5d522bb8 ("iommu/dma: Fix race condition during iova_domain
initialization"), and I sent a revert patch to test it on CI[2]. This
proved to be effective, as the tests started running on the Alder Lake
platforms:
https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_108474v1/index.html?hosts=adlp

To be clear, that revert is just a way of unblocking CI testing; the
problem requires a specific fix.

Lucas, would it be possible to merge this revert to the topic branch to
unblock Alder Lake until this issue is fixed? I'm afraid that some
regressions could slip through the cracks if we don't do it soon enough.


Yeah. Let's have CI running with the revert so we can see on the next runs
whether it really was a regression or something else. I think it will help
us understand why it's failing.


Lucas De Marchi




[snip: quoted signatures, links and kernel log]

Re: [Intel-gfx] [topic/core-for-CI] Revert "iommu/dma: Fix race condition during iova_domain initialization"

2022-09-19 Thread Robin Murphy

On 2022-09-14 16:01, Lucas De Marchi wrote:

On Wed, Sep 14, 2022 at 02:40:45PM +0200, Karolina Drobnik wrote:

This reverts commit ac9a5d522bb80be50ea84965699e1c8257d745ce.

This change introduces a regression on Alder Lake that completely
blocks testing. To enable CI and avoid possible circular locking
warning, revert the patch.


We are already on rc5. Are iommu authors involved aware of this issue?
We could do this in our "for CI only" branch, but it's equally important
that this is fixed for 6.0

Cc'ing them.


The lockdep report doesn't make much sense to me - the deadlock cycle 
it's reporting doesn't even involve the mutex added by that commit, and 
otherwise the lock ordering between the IOMMU bus notifier(s) and 
cpu_hotplug_lock has existed for ages. Has lockdep somehow got multiple 
different and unrelated bus notifiers mixed up, maybe?


FWIW nobody else has reported anything, and that mutex addresses a 
real-world concurrency issue, so I'm not convinced a revert is 
appropriate without at least a much clearer justification.


Robin.
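
For anyone skimming the report above: it pairs a CPU bring-up path that
takes the hotplug locks and then the bus notifier rwsem (coretemp's
CPU-online callback calling device_add()) against a probe path that takes
the notifier rwsem and then, via iova_domain_init_rcaches(), the
cpu_hotplug_lock. A minimal userspace analogy of that inversion,
illustrative only and obviously not kernel code:

	/* Build with: gcc -pthread inversion.c -o inversion */
	#include <pthread.h>

	/* hp stands in for cpu_hotplug_lock/cpuhp_state-up,
	 * nb for the bus notifier rwsem. */
	static pthread_mutex_t hp = PTHREAD_MUTEX_INITIALIZER;
	static pthread_mutex_t nb = PTHREAD_MUTEX_INITIALIZER;

	static void *cpu_online(void *arg)	/* hotplug, then notifier */
	{
		pthread_mutex_lock(&hp);
		pthread_mutex_lock(&nb);	/* order: hp -> nb */
		pthread_mutex_unlock(&nb);
		pthread_mutex_unlock(&hp);
		return NULL;
	}

	static void *probe(void *arg)		/* notifier, then hotplug */
	{
		pthread_mutex_lock(&nb);
		pthread_mutex_lock(&hp);	/* order: nb -> hp, inverted */
		pthread_mutex_unlock(&hp);
		pthread_mutex_unlock(&nb);
		return NULL;
	}

	int main(void)
	{
		pthread_t a, b;
		pthread_create(&a, NULL, cpu_online, NULL);
		pthread_create(&b, NULL, probe, NULL);
		pthread_join(a, NULL);
		pthread_join(b, NULL);
		/* Usually exits fine; the inverted order is the bug. */
		return 0;
	}

Like lockdep, an order checker flags the two orders the first time it sees
both, whether or not the timing ever produces a real hang, which is also
why a single splat is enough for CI to stop testing a machine.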


[snip: quoted signature and kernel log]

Re: [Intel-gfx] [topic/core-for-CI] Revert "iommu/dma: Fix race condition during iova_domain initialization"

2022-09-19 Thread Karolina Drobnik

On 16.09.2022 22:32, Lucas De Marchi wrote:

On Fri, Sep 16, 2022 at 02:24:00PM +0200, Karolina Drobnik wrote:

On 14.09.2022 17:54, Robin Murphy wrote:

On 2022-09-14 16:01, Lucas De Marchi wrote:

On Wed, Sep 14, 2022 at 02:40:45PM +0200, Karolina Drobnik wrote:

This reverts commit ac9a5d522bb80be50ea84965699e1c8257d745ce.

This change introduces a regression on Alder Lake that
completely blocks testing. To enable CI and avoid possible
circular locking warning, revert the patch.


We are already on rc5. Are iommu authors involved aware of this 
issue? We could do this in our "for CI only" branch, but it's 
equally important that this is fixed for 6.0


Cc'ing them.


The lockdep report doesn't make much sense to me - the deadlock cycle
it's reporting doesn't even involve the mutex added by that commit,
and otherwise the lock ordering between the IOMMU bus notifier(s) and
cpu_hotplug_lock has existed for ages. Has lockdep somehow got
multiple different and unrelated bus notifiers mixed up, maybe?

FWIW nobody else has reported anything, and that mutex addresses a 
real-world concurrency issue, so I'm not convinced a revert is 
appropriate without at least a much clearer justification.


I'll share more background on this regression. We've noticed that no
tests were run for Alder Lake platforms. This may happen when, for
example, there is a kernel taint or a lockdep warning.

Links:
https://intel-gfx-ci.01.org/tree/drm-tip/bat-adlm-1.html
https://intel-gfx-ci.01.org/tree/drm-tip/bat-adlp-6.html

The CI logs (which can be found for example here[1], boot0 file)
revealed a lockdep warning. One of the recent changes in the area was
commit ac9a5d522bb8 ("iommu/dma: Fix race condition during iova_domain
initialization"), and I sent a revert patch to test it on CI[2]. This
proved to be effective, as the tests started running on the Alder Lake
platforms:
https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_108474v1/index.html?hosts=adlp 



To be clear, that revert is just a way of unblocking CI testing; the
problem requires a specific fix.

Lucas, would it be possible to merge this revert to the topic branch to
unblock Alder Lake until this issue is fixed? I'm afraid that some
regressions could slip through the cracks if we don't do it soon enough.


Yeah. Let's have CI running with the revert so we can see on the next runs
whether it really was a regression or something else. I think it will help
us understand why it's failing.


Thanks for the merge. It looks like all the ADL machines are doing better
now (the revert went in with CI_DRM_12147):
https://intel-gfx-ci.01.org/tree/drm-tip/bat-adlp-6.html
https://intel-gfx-ci.01.org/tree/drm-tip/bat-adln-1.html
https://intel-gfx-ci.01.org/tree/drm-tip/bat-adlm-1.html (CI_DRM_12149 
seems to show a different problem)


All the best,
Karolina


[snip: quoted earlier messages and kernel log]

Re: [Intel-gfx] [topic/core-for-CI] Revert "iommu/dma: Fix race condition during iova_domain initialization"

2022-09-21 Thread Janusz Krzysztofik
Hi Robin,

On Wednesday, 14 September 2022 17:54:36 CEST Robin Murphy wrote:
> On 2022-09-14 16:01, Lucas De Marchi wrote:
> > On Wed, Sep 14, 2022 at 02:40:45PM +0200, Karolina Drobnik wrote:
> >> This reverts commit ac9a5d522bb80be50ea84965699e1c8257d745ce.
> >>
> >> This change introduces a regression on Alder Lake that completely
> >> blocks testing. To enable CI and avoid possible circular locking
> >> warning, revert the patch.
> > 
> > We are already on rc5. Are iommu authors involved aware of this issue?
> > We could do this in our "for CI only" branch, but it's equally important
> > that this is fixed for 6.0
> > 
> > Cc'ing them.
> 
> The lockdep report doesn't make much sense to me - the deadlock cycle 
> it's reporting doesn't even involve the mutex added by that commit, and 
> otherwise the lock ordering between the IOMMU bus notifier(s) and 
> cpu_hotplug_lock has existed for ages. 

Indeed, that lockdep report is not quite related, but there are other lockdep 
reports in our CI results that better justify the revert.

https://intel-gfx-ci.01.org/tree/drm-tip/TrybotIGT_595/bat-dg2-8/igt@core_hotunp...@unplug-rescan.html

...
<7> [181.565177] [IGT] core_hotunplug: unplugging the device
<7> [181.565372] i915 0000:03:00.0: [drm:intel_power_well_enable [i915]] enabling DC_off
<7> [181.565708] i915 0000:03:00.0: [drm:gen9_set_dc_state.part.15 [i915]] Setting DC state from 01 to 00
<7> [181.566060] i915 0000:03:00.0: [drm:intel_power_well_enable [i915]] enabling PW_2
<7> [181.566216] i915 0000:03:00.0: [drm:intel_power_well_enable [i915]] enabling PW_A
<7> [181.566447] i915 0000:03:00.0: [drm:intel_power_well_enable [i915]] enabling PW_B
<7> [181.566607] i915 0000:03:00.0: [drm:intel_power_well_enable [i915]] enabling PW_C
<7> [181.566765] i915 0000:03:00.0: [drm:intel_power_well_enable [i915]] enabling PW_D
<7> [181.570683] intel_gt_set_wedged called from intel_gt_set_wedged_on_fini+0x9/0x30 [i915]
<7> [181.659480] i915 0000:03:00.0: [drm:drm_client_release] drm_fb_helper
<6> [181.773310] pci 0000:03:00.0: Removing from iommu group 1
<7> [181.774416] [IGT] core_hotunplug: rediscovering the device
<6> [181.775800] pci 0000:03:00.0: [8086:56a0] type 00 class 0x030000
<6> [181.775833] pci 0000:03:00.0: reg 0x10: [mem 0x90000000-0x90ffffff 64bit]
<6> [181.775852] pci 0000:03:00.0: reg 0x18: [mem 0x4000000000-0x43ffffffff 64bit pref]
<6> [181.775886] pci 0000:03:00.0: reg 0x30: [mem 0xffe00000-0xffffffff pref]
<6> [181.776058] pci 0000:03:00.0: ASPM: overriding L1 acceptable latency from 0x0 to 0x7
<6> [181.776073] pci 0000:03:00.0: Video device with shadowed ROM at [mem 0x000c0000-0x000dffff]
<6> [181.776209] pci 0000:03:00.0: PME# supported from D0 D3hot
<6> [181.776869] pci 0000:03:00.0: vgaarb: setting as boot VGA device
<6> [181.776878] pci 0000:03:00.0: vgaarb: bridge control possible
<6> [181.776881] pci 0000:03:00.0: vgaarb: VGA device added: decodes=io+mem,owns=io+mem,locks=none
<6> [181.777052] pci 0000:03:00.0: Adding to iommu group 1
<4> [181.777164] 
<4> [181.777169] ======================================================================
<4> [181.777176] WARNING: possible circular locking dependency detected
<4> [181.777182] 6.0.0-rc5-CI_DRM_12145-g2dc9ea03abff+ #1 Not tainted
<4> [181.777189] ----------------------------------------------------------------------
<4> [181.777196] core_hotunplug/1240 is trying to acquire lock:
<4> [181.777202] ffff8881029b33b0 (&domain->iova_cookie->mutex){+.+.}-{3:3}, 
at: iommu_setup_dma_ops+0xd7/0x440
<4> [181.777218] 
but task is already holding lock:
<4> [181.777225] ffff8881010c9e78 (&(&priv->bus_notifier)->rwsem){++++}-{3:3}, 
at: blocking_notifier_call_chain+0x20/0x50
<4> [181.777242] 
which lock already depends on the new lock.
<4> [181.777250] 
the existing dependency chain (in reverse order) is:
<4> [181.777258] 
-> #3 (&(&priv->bus_notifier)->rwsem){++++}-{3:3}:
<4> [181.777268]lock_acquire+0xd3/0x310
<4> [181.777274]down_read+0x39/0x140
<4> [181.777281]blocking_notifier_call_chain+0x20/0x50
<4> [181.777289]device_add+0x3c1/0x900
<4> [181.777295]platform_device_add+0x108/0x240
<4> [181.777302]coretemp_cpu_online+0xe1/0x15e [coretemp]
<4> [181.777310]cpuhp_invoke_callback+0x181/0x8a0
<4> [181.777318]cpuhp_thread_fun+0x188/0x1f0
<4> [181.777325]smpboot_thread_fn+0x1b5/0x260
<4> [181.777332]kthread+0xed/0x120
<4> [181.777337]ret_from_fork+0x1f/0x30
<4> [181.777343] 
-> #2 (cpuhp_state-up){+.+.}-{0:0}:
<4> [181.777352]lock_acquire+0xd3/0x310
<4> [181.777358]cpuhp_thread_fun+0xa6/0x1f0
<4> [181.777364]smpboot_thread_fn+0x1b5/0x260
<4> [181.777370]kthread+0xed/0x120
<4> [181.777375]ret_from_fork+0x1f/0x30
<4> [181.777381] 
-> #1 (cpu_hotplug_lock){++++}-{0:0}:
<4> [181.777390]lock_acquire+0xd3/0x310
<4> [181.777395]__cpuhp_state_add_instance+0x43/0x1c0
<4> [181.777402]iova_domain_init_rcaches+0x199/0x1c0
<4> [181.777409]
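
The quoted report is cut off in the archive, but its header already shows
the difference from the first splat: here the task holds the bus notifier
rwsem (inside device_add() while re-discovering the GPU) and is trying to
take &domain->iova_cookie->mutex, so the mutex added by ac9a5d522bb8 really
is one edge of this cycle. Spelled out as acquisition order (a
reconstruction from the visible chains, not text from the log):

	re-adding the unplugged GPU (this task):
	  device_add()                    holds the bus notifier rwsem
	    iommu_setup_dma_ops()         wants cookie->mutex
	      iova_domain_init_rcaches()  wants cpu_hotplug_lock

	CPU bring-up (recorded earlier by lockdep):
	  cpu_hotplug_lock -> cpuhp_state-up
	    coretemp_cpu_online() -> device_add() -> bus notifier rwsem

	cycle: cookie->mutex -> cpu_hotplug_lock -> cpuhp_state-up
	       -> bus notifier rwsem -> cookie->mutex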