On 26/06/24 15:10, Gautam Menghani wrote:
Without this patch, we had an issue where if we have some cpus disabled in the system and we try to do a 2 stage kexec as follows:

    kexec -l vmlinux ....
    kexec -e

we would hit the following Oops

[ 2598.923098] kernel BUG at arch/powerpc/kernel/exceptions-64s.S:501!
[ 2598.923103] Oops: Exception in kernel mode, sig: 5 [#1]
[ 2598.923107] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
[ 2598.923111] Modules linked in: rpcrdma rdma_cm iw_cm ib_cm ib_core xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables bridge stp llc kvm_hv kvm bonding tls rfkill binfmt_misc tg3 vmx_crypto aes_gcm_p10_crypto ibmveth crct10dif_vpmsum pseries_rng nfsd auth_rpcgss nfs_acl lockd grace sunrpc fuse loop dm_multipath nfnetlink zram xfs ibmvscsi scsi_transport_srp crc32c_vpmsum pseries_wdt scsi_dh_rdac scsi_dh_emc scsi_dh_alua ip6_tables ip_tables
[ 2598.923167] CPU: 11 PID: 1548 Comm: systemd-journal Not tainted 6.9.0+ #4
[ 2598.923171] Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1060.00 (NH1060_022) hv:phyp pSeries
[ 2598.923176] NIP: c0000000000089e4 LR: 00007fffaa1427c4 CTR: c0000000000089b0
[ 2598.923180] REGS: c0000008dfe7fd60 TRAP: 0700 Not tainted (6.9.0+)
[ 2598.923184] MSR: 8000000000021031 <SF,ME,IR,DR,LE> CR: 28002413 XER: 00000000
[ 2598.923192] CFAR: c0000000000089dc IRQMASK: 0
[ 2598.923192] GPR00: 0000000000000003 00007ffff40fb110 0000000000000000 0000000000000009
[ 2598.923192] GPR04: 00007ffff40fbcf0 0000000000002000 00007ffff40fdcc0 0000000000000000
[ 2598.923192] GPR08: 00007fffaabc3b80 0000000048002413 00007ffff40fb3e0 0000000000017000
[ 2598.923192] GPR12: 8000000000009003 c0000008dfff2b00 0000000000000000 0000000000000000
[ 2598.923192] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 2598.923192] GPR20: 0000000000000000 0000000000000000 0000000000000000 00007fffaabaf448
[ 2598.923192] GPR24: 000000011bc72700 00007ffff40fddf8 0000000132490ea0 00007ffff40fddf0
[ 2598.923192] GPR28: 0000000000000000 00007ffff40fbcf0 0000000000002000 0000000000000009
[ 2598.923238] NIP [c0000000000089e4] data_access_common_virt+0x14/0x220
[ 2598.923245] LR [00007fffaa1427c4] 0x7fffaa1427c4
[ 2598.923251] Call Trace:
[ 2598.923253] Code: 2c0a0000 39400300 408242c0 e94d0020 694a0002 7d400164 60420000 718a4000 7c2a0b78 3821fd30 41c20008 e82d0910 <0981fd30> f9210160 f9610130 f9810138
[ 2598.923269] ---[ end trace 0000000000000000 ]---
[ 2598.926662] pstore: backend (nvram) writing error (-1)

With this patch, the disabled cpus are woken up and kexec goes through fine.

I verified the same on an LPAR and observed the same behavior that Gautam described above.
Thanks for the fix Nick.

Tested-by: Sourabh Jain <sourabhj...@linux.ibm.com>

- Sourabh