On 11/05/26 3:20 PM, Shivang Upadhyay wrote:
During DLPAR CPU hotplug, newly added CPUs start in RTAS stopped state
(quiesced). If a kexec crash occurs before the guest starts these CPUs
via start-cpu RTAS call, H_SIGNAL_SYS_RESET_ALL_OTHERS will reset them
anyway, causing the kdump kernel to hang:
[ 5.519483][ T1] Processor 0 is stuck.
[ 11.089481][ T1] Processor 1 is stuck.
The hypervisor should only reset CPUs that the guest has started. The
cpu->env.quiesced flag tracks RTAS stopped state - CPUs in this state
are already inactive and should not be reset.
Skip system reset for quiesced CPUs to prevent kdump hangs during CPU
hotplug operations.
Cc: Sourabh Jain <[email protected]>
Cc: Harsh Prateek Bora <[email protected]>
Cc: Mahesh J Salgaonkar <[email protected]>
Reported-by: Anushree Mathur <[email protected]>
Suggested-by: Vishal Chourasia <[email protected]>
Reviewed-by: Vishal Chourasia <[email protected]>
Signed-off-by: Shivang Upadhyay <[email protected]>
---
Changelog:
v2:
* added braces to adhere to style guide.
* rebase to master
v1:
* https://lore.kernel.org/all/[email protected]/
---
hw/ppc/spapr_hcall.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index 032805a8d0..613dd893bb 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -1105,6 +1105,12 @@ static target_ulong h_signal_sys_reset(PowerPCCPU *cpu,
                     continue;
                 }
             }
+
+            /* Skip quiesced CPUs */
+            if (c->env.quiesced) {
+                continue;
+            }
+
             run_on_cpu(cs, spapr_do_system_reset_on_cpu, RUN_ON_CPU_NULL);
         }
         return H_SUCCESS;
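
The skip logic in the hunk above can be modelled outside QEMU. The following is only a minimal sketch under stated assumptions: ModelCPU and model_reset_all_others() are hypothetical names made up for illustration, not QEMU types or functions. Only the idea of a per-CPU quiesced flag comes from the patch, and the caller being excluded from the broadcast is assumed from the name of the hcall target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a vCPU. Only the idea of a "quiesced" flag
 * (CPU is in RTAS stopped state) is taken from the patch; everything
 * else here is an illustrative assumption.
 */
typedef struct ModelCPU {
    int index;
    bool quiesced;      /* true while the CPU is in RTAS stopped state */
    int resets_seen;    /* number of system resets delivered to this CPU */
} ModelCPU;

/*
 * Model of a "reset all others" broadcast: deliver a reset to every CPU
 * except the caller and except CPUs that are still quiesced.
 * Returns the number of CPUs that were reset.
 */
static size_t model_reset_all_others(ModelCPU *cpus, size_t n, size_t caller)
{
    size_t count = 0;

    assert(cpus != NULL && caller < n);

    for (size_t i = 0; i < n; i++) {
        if (i == caller) {
            continue;   /* the caller does not reset itself */
        }
        if (cpus[i].quiesced) {
            continue;   /* never started by the guest: leave it alone */
        }
        cpus[i].resets_seen++;
        count++;
    }
    return count;
}
```

In this model, a hot-added CPU that the guest has not yet started (quiesced set to true) is never touched by the broadcast, which is the behaviour the commit message argues for.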
Hi Shivang,
Thanks for working on the reported issue. With the patch applied, the
originally reported problem (the guest hanging when kdump is triggered
while CPU hotplug is in progress) is fixed. However, repeated attempts
of the same scenario exposed several other issues, the most serious of
which is a QEMU crash. I believe this needs to be fixed.
Here is my analysis of this issue with and without the patch:
1) Without applying the patch:
i) Start the guest with maxvcpus set to 64 and current vcpus set to 8.
ii) Start CPU hotplug [virsh setvcpus guest_name 64] and, at the same
time, trigger kdump on the guest [echo c > /proc/sysrq-trigger].
The guest hangs:
[ 32.930453][ T1208] NIP [00007fffbe35b3c4] 0x7fffbe35b3c4
[ 32.930528][ T1208] LR [00007fffbe35b3c4] 0x7fffbe35b3c4
[ 32.930638][ T1208] ---- interrupt: 3000
[ 9.857410][ T1] Processor 0 is stuck.
2) After applying the patch:
Several different issues showed up across repeated attempts of this scenario:
i) In the 4th attempt I saw DLPAR-related traces along with an Oops after
triggering kdump:
[ 6.071156][ T121] pseries-hotplug-cpu: Cannot add cpu /cpus/PowerPC,POWER11@20; this system configuration supports 32 logical cpus.
[ 6.071313][ T121] OF: changeset notifier error @/cpus/PowerPC,POWER11@20
[ 6.074099][ T121] BUG: Unable to handle kernel data access at 0x151591241bba0bb6
[ 6.074232][ T121] Faulting instruction address: 0xc0000000211e5b98
[ 6.074311][ T121] Oops: Kernel access of bad area, sig: 11 [#1]
[ 6.076695][ T121] Call Trace:
[ 6.076741][ T121] [c000000026a6baf0] [c000000026a6bb30] 0xc000000026a6bb30 (unreliable)
[ 6.076834][ T121] [c000000026a6bb60] [0000000010000021] 0x10000021
[ 6.076930][ T121] [c000000026a6bb90] [c000000020e55b14] of_get_next_child+0x64/0xd0
[ 6.077034][ T121] [c000000026a6bbd0] [c0000000201cd1dc] dlpar_cpu_add+0xbc/0x5e0
[ 6.077148][ T121] [c000000026a6bcb0] [c0000000201ce9d0] dlpar_cpu+0x60/0x1f0
[ 6.077241][ T121] [c000000026a6bd40] [c0000000201c5914] handle_dlpar_errorlog+0x1f4/0x6e0
[ 6.077333][ T121] [c000000026a6be20] [c0000000201c5e28] pseries_hp_work_fn+0x28/0x60
[ 6.077425][ T121] [c000000026a6be50] [c000000020259e6c] process_one_work+0x1dc/0x540
[ 6.077516][ T121] [c000000026a6bf00] [c00000002025ae0c] worker_thread+0x36c/0x4d0
[ 6.077608][ T121] [c000000026a6bf90] [c000000020269978] kthread+0x168/0x190
[ 6.077700][ T121] [c000000026a6bfe0] [c00000002000de58] start_kernel_thread+0x14/0x18
ii) In the 7th attempt I saw XIVE errors:
[ 61.692603][ T1909] ---- interrupt: 3000
[ 0.010215][ T1] xive: H_INT_GET_QUEUE_INFO cpu=62 prio=6 failed -55
[ 0.013834][ T1] xive: Error -55 getting queue info CPU 62 prio 6
iii) QEMU crashed after 10 attempts with the following error message in
the libvirt/qemu logs:
qemu-system-ppc64: ../hw/ppc/spapr.c:4396: spapr_cpu_index_to_props: Assertion `core_slot' failed.
2026-05-13 09:46:29.656+0000: shutting down, reason=crashed
Thank you!
Anushree Mathur