On 11/05/26 3:20 PM, Shivang Upadhyay wrote:
During DLPAR CPU hotplug, newly added CPUs start in RTAS stopped state
(quiesced). If a kexec crash occurs before the guest starts these CPUs
via start-cpu RTAS call, H_SIGNAL_SYS_RESET_ALL_OTHERS will reset them
anyway, causing the kdump kernel to hang:

   [    5.519483][    T1] Processor 0 is stuck.
   [   11.089481][    T1] Processor 1 is stuck.

The hypervisor should only reset CPUs that the guest has started. The
cpu->env.quiesced flag tracks RTAS stopped state - CPUs in this state
are already inactive and should not be reset.

Skip system reset for quiesced CPUs to prevent kdump hangs during CPU
hotplug operations.

Cc: Sourabh Jain <[email protected]>
Cc: Harsh Prateek Bora <[email protected]>
Cc: Mahesh J Salgaonkar <[email protected]>
Reported-by: Anushree Mathur <[email protected]>
Suggested-by: Vishal Chourasia <[email protected]>
Reviewed-by: Vishal Chourasia <[email protected]>
Signed-off-by: Shivang Upadhyay <[email protected]>
---
Changelog:

v2:
  * added braces to adhere to style guide.
  * rebase to master

v1:
  * https://lore.kernel.org/all/[email protected]/
---
  hw/ppc/spapr_hcall.c | 6 ++++++
  1 file changed, 6 insertions(+)

diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index 032805a8d0..613dd893bb 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -1105,6 +1105,12 @@ static target_ulong h_signal_sys_reset(PowerPCCPU *cpu,
                      continue;
                  }
              }
+
+            /* Skip quiesced CPUs */
+            if (c->env.quiesced) {
+                continue;
+            }
+
              run_on_cpu(cs, spapr_do_system_reset_on_cpu, RUN_ON_CPU_NULL);
          }
          return H_SUCCESS;


Hi Shivang,
Thanks for working on the reported issue. With the patch applied, the
originally reported problem (the guest hanging when kdump is triggered
while CPU hotplug is in progress) is fixed. However, repeating the same
scenario several times exposes other issues, the most serious being a
QEMU crash. I believe this needs to be fixed.


Here is my analysis of this issue with and without the patch:

1) Without applying the patch:

i) Start the guest with maxvcpus set to 64 and current vcpus set to 8.
ii) Start CPU hotplug [virsh setvcpus guest_name 64] and, at the same time, trigger kdump in the guest [echo c > /proc/sysrq-trigger].

The guest hangs:


[   32.930453][ T1208] NIP [00007fffbe35b3c4] 0x7fffbe35b3c4
[   32.930528][ T1208] LR [00007fffbe35b3c4] 0x7fffbe35b3c4
[   32.930638][ T1208] ---- interrupt: 3000
[    9.857410][    T1] Processor 0 is stuck.
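
The reproduction steps above could be scripted roughly as follows. This is only a sketch: the guest name, the SSH target used to reach the guest, and the DRY_RUN switch are assumptions for illustration (the crash can equally be triggered from the guest console). By default the helpers only print the commands they would run.

```shell
#!/bin/sh
# Hypothetical reproducer sketch. GUEST, MAX_VCPUS, GUEST_SSH and DRY_RUN
# are illustrative names, not part of the original report.
GUEST="${GUEST:-guest_name}"
MAX_VCPUS="${MAX_VCPUS:-64}"
GUEST_SSH="${GUEST_SSH:-root@guest_name}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hotplug vCPUs up to the configured maximum (run on the host).
hotplug_cpus() {
    run virsh setvcpus "$GUEST" "$MAX_VCPUS"
}

# Crash the guest kernel so that kdump starts (assumes SSH access as root).
trigger_kdump() {
    run ssh "$GUEST_SSH" 'echo c > /proc/sysrq-trigger'
}

# Start the hotplug in the background and trigger the crash while it runs.
reproduce_once() {
    hotplug_cpus &
    trigger_kdump
    wait
}
```

With DRY_RUN=0, calling reproduce_once issues both commands so that the crash races with the hotplug, which is the window in which the hang was observed.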



2) After applying the patch


Several different issues showed up across repeated attempts of this scenario:


i) In the 4th attempt I saw DLPAR-related traces along with an Oops after triggering kdump:

[    6.071156][  T121] pseries-hotplug-cpu: Cannot add cpu /cpus/PowerPC,POWER11@20; this system configuration supports 32 logical cpus.
[    6.071313][  T121] OF: changeset notifier error @/cpus/PowerPC,POWER11@20
[    6.074099][  T121] BUG: Unable to handle kernel data access at 0x151591241bba0bb6
[    6.074232][  T121] Faulting instruction address: 0xc0000000211e5b98
[    6.074311][  T121] Oops: Kernel access of bad area, sig: 11 [#1]

[    6.076695][  T121] Call Trace:
[    6.076741][  T121] [c000000026a6baf0] [c000000026a6bb30] 0xc000000026a6bb30 (unreliable)
[    6.076834][  T121] [c000000026a6bb60] [0000000010000021] 0x10000021
[    6.076930][  T121] [c000000026a6bb90] [c000000020e55b14] of_get_next_child+0x64/0xd0
[    6.077034][  T121] [c000000026a6bbd0] [c0000000201cd1dc] dlpar_cpu_add+0xbc/0x5e0
[    6.077148][  T121] [c000000026a6bcb0] [c0000000201ce9d0] dlpar_cpu+0x60/0x1f0
[    6.077241][  T121] [c000000026a6bd40] [c0000000201c5914] handle_dlpar_errorlog+0x1f4/0x6e0
[    6.077333][  T121] [c000000026a6be20] [c0000000201c5e28] pseries_hp_work_fn+0x28/0x60
[    6.077425][  T121] [c000000026a6be50] [c000000020259e6c] process_one_work+0x1dc/0x540
[    6.077516][  T121] [c000000026a6bf00] [c00000002025ae0c] worker_thread+0x36c/0x4d0
[    6.077608][  T121] [c000000026a6bf90] [c000000020269978] kthread+0x168/0x190
[    6.077700][  T121] [c000000026a6bfe0] [c00000002000de58] start_kernel_thread+0x14/0x18

ii) In the 7th attempt I saw XIVE errors in the kdump kernel:


[   61.692603][ T1909] ---- interrupt: 3000
[    0.010215][    T1] xive: H_INT_GET_QUEUE_INFO cpu=62 prio=6 failed -55
[    0.013834][    T1] xive: Error -55 getting queue info CPU 62 prio 6



iii) QEMU crashed after 10 attempts, with the following error message in the libvirt/qemu logs:

qemu-system-ppc64: ../hw/ppc/spapr.c:4396: spapr_cpu_index_to_props: Assertion `core_slot' failed.
2026-05-13 09:46:29.656+0000: shutting down, reason=crashed
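
When repeating the scenario many times, a small helper can check the log for this assertion after each attempt. The sketch below is hypothetical: the function name is made up, and the default log path is only the usual libvirt location for a guest called guest_name, so pass the real path as the first argument if it differs.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) if the given QEMU log
# contains the core_slot assertion from spapr_cpu_index_to_props.
# The default path is an assumption about the libvirt setup.
qemu_hit_core_slot_assert() {
    log="${1:-/var/log/libvirt/qemu/guest_name.log}"
    # -F: match the text literally, -q: report only via the exit status
    grep -qF "spapr_cpu_index_to_props: Assertion" "$log"
}
```

For example, `qemu_hit_core_slot_assert && echo "QEMU assertion hit"` can be run after each attempt to stop a test loop as soon as the crash occurs.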


Thank you!
Anushree Mathur
