Hello Sathyanarayanan,

Hope you are doing well. I am Chaitanya from the Linux graphics team at Intel.

This mail is regarding a regression we are seeing in our CI runs [1] on the
drm-tip repository.

Since the backmerge of Linux 6.19-rc1, we are seeing the following regression on the PTL machines.

`````````````````````````````````````````````````````````````````````````````````
<4>[    8.197433] ============================================
<4>[    8.197437] WARNING: possible recursive locking detected
<4>[    8.197440] 6.19.0-rc1-lgci-xe-xe-4242-05b7c58b3367dca84+ #1 Not tainted
<4>[    8.197444] --------------------------------------------
<4>[    8.197447] cpuhp/0/20 is trying to acquire lock:
<4>[    8.197450] ffffffff83487870 (cpu_hotplug_lock){++++}-{0:0}, at: rapl_package_add_pmu+0x37/0x370 [intel_rapl_common]
<4>[    8.197463]
                  but task is already holding lock:
<4>[    8.197466] ffffffff83487870 (cpu_hotplug_lock){++++}-{0:0}, at: cpuhp_thread_fun+0x6d/0x290
<4>[    8.197477]
                  other info that might help us debug this:
<4>[    8.197480]  Possible unsafe locking scenario:

<4>[    8.197483]        CPU0
<4>[    8.197485]        ----
<4>[    8.197487]   lock(cpu_hotplug_lock);
<4>[    8.197490]   lock(cpu_hotplug_lock);
<4>[    8.197493]
                   *** DEADLOCK ***

<4>[    8.197496]  May be due to missing lock nesting notation

<4>[    8.197499] 2 locks held by cpuhp/0/20:
<4>[    8.197503]  #0: ffffffff83487870 (cpu_hotplug_lock){++++}-{0:0}, at: cpuhp_thread_fun+0x6d/0x290
<4>[    8.197513]  #1: ffffffff83489f60 (cpuhp_state-up){+.+.}-{0:0}, at: cpuhp_thread_fun+0x6d/0x290
<4>[    8.197523]
                  stack backtrace:
<4>[    8.197528] CPU: 0 UID: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.19.0-rc1-lgci-xe-xe-4242-05b7c58b3367dca84+ #1 PREEMPT(voluntary)
<4>[    8.197530] Hardware name: Intel Corporation Panther Lake Client Platform/PTL-UH LP5 T3 RVP1, BIOS PTLPFWI1.R00.3383.D10.2510222219 10/22/2025
<4>[    8.197532] Call Trace:
<4>[    8.197532]  <TASK>
<4>[    8.197533]  dump_stack_lvl+0x91/0xf0
<4>[    8.197537]  dump_stack+0x10/0x20
<4>[    8.197538]  print_deadlock_bug+0x23f/0x320
<4>[    8.197542]  __lock_acquire+0x146e/0x2790
<4>[    8.197548]  lock_acquire+0xc4/0x2c0
<4>[    8.197550]  ? rapl_package_add_pmu+0x37/0x370 [intel_rapl_common]
<4>[    8.197556]  cpus_read_lock+0x41/0x110
<4>[    8.197558]  ? rapl_package_add_pmu+0x37/0x370 [intel_rapl_common]
<4>[    8.197561]  rapl_package_add_pmu+0x37/0x370 [intel_rapl_common]
<4>[    8.197565]  rapl_cpu_online+0x85/0x87 [intel_rapl_msr]
<4>[    8.197568]  ? __pfx_rapl_cpu_online+0x10/0x10 [intel_rapl_msr]
<4>[    8.197570]  cpuhp_invoke_callback+0x41f/0x6c0
<4>[    8.197573]  ? cpuhp_thread_fun+0x6d/0x290
<4>[    8.197575]  cpuhp_thread_fun+0x1e2/0x290
<4>[    8.197578]  ? smpboot_thread_fn+0x26/0x290
<4>[    8.197581]  smpboot_thread_fn+0x12f/0x290
<4>[    8.197584]  ? __pfx_smpboot_thread_fn+0x10/0x10
<4>[    8.197586]  kthread+0x11f/0x250
<4>[    8.197589]  ? __pfx_kthread+0x10/0x10
<4>[    8.197592]  ret_from_fork+0x344/0x3a0
<4>[    8.197595]  ? __pfx_kthread+0x10/0x10
<4>[    8.197597]  ret_from_fork_asm+0x1a/0x30
<4>[    8.197604]  </TASK>
`````````````````````````````````````````````````````````````````````````````````
The detailed log can be found in [2].

After bisecting the tree, the following patch [3] seems to be the first "bad" commit:

`````````````````````````````````````````````````````````````````````````````````````````````````````````
commit 748d6ba43afde7e9ac27443233203995cc15d235
Author: Kuppuswamy Sathyanarayanan <[email protected]>
Date:   Thu Nov 20 16:05:39 2025 -0800

    powercap: intel_rapl: Enable MSR-based RAPL PMU support
`````````````````````````````````````````````````````````````````````````````````````````````````````````

We also verified that the issue is not seen when the patch is reverted.

Could you please check why the patch causes this regression and provide a fix if necessary?

Thank you.

Regards

Chaitanya

[1] https://intel-gfx-ci.01.org/tree/intel-xe/combined-alt.html?
[2] https://intel-gfx-ci.01.org/tree/intel-xe/xe-4242-05b7c58b3367dca84d4745dfcac3b5d4ee142404/bat-ptl-2/boot0.txt
[3] https://cgit.freedesktop.org/drm-tip/commit/?id=748d6ba43afde7e9ac27443233203995cc15d235