On 05/11/25 7:15 pm, Christophe Leroy wrote:


On 23/10/2025 at 06:54, Venkat Rao Bagalkote wrote:
Greetings!!!


IBM CI has reported a kernel crash while running the mce selftests from tools/testing/selftests/powerpc/mce/ on the mainline kernel.


The issue is hit only when CONFIG_KASAN is enabled. If it is disabled, the test passes.


Traces:


[ 8041.225432] BUG: Unable to handle kernel data access on read at 0xc00e0001a1ad6103
[ 8041.225453] Faulting instruction address: 0xc0000000008c54d8
[ 8041.225461] Oops: Kernel access of bad area, sig: 11 [#1]
[ 8041.225467] LE PAGE_SIZE=64K MMU=Radix  SMP NR_CPUS=8192 NUMA pSeries
[ 8041.225475] Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack bonding tls nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables nfnetlink pseries_rng vmx_crypto dax_pmem fuse ext4 crc16 mbcache jbd2 nd_pmem papr_scm sd_mod libnvdimm sg ibmvscsi ibmveth scsi_transport_srp pseries_wdt
[ 8041.225558] CPU: 17 UID: 0 PID: 877869 Comm: inject-ra-err Kdump: loaded Not tainted 6.18.0-rc2+ #1 VOLUNTARY
[ 8041.225569] Hardware name: IBM,9080-HEX Power11 (architected) 0x820200 0xf000007 of:IBM,FW1110.01 (NH1110_069) hv:phyp pSeries
[ 8041.225576] NIP:  c0000000008c54d8 LR: c00000000004e464 CTR: 0000000000000000
[ 8041.225583] REGS: c0000000fff778d0 TRAP: 0300   Not tainted (6.18.0-rc2+)
[ 8041.225590] MSR:  8000000000001003 <SF,ME,RI,LE>  CR: 48002828  XER: 00000000
[ 8041.225607] CFAR: c00000000004e460 DAR: c00e0001a1ad6103 DSISR: 40000000 IRQMASK: 3
[ 8041.225607] GPR00: c0000000019d0598 c0000000fff77b70 c00000000244a400 c000000d0d6b0818
[ 8041.225607] GPR04: 0000000000004d43 0000000000000008 c00000000004e464 004d424900000000
[ 8041.225607] GPR08: 0000000000000001 18000001a1ad6103 a80e000000000000 0000000003000048
[ 8041.225607] GPR12: 0000000000000000 c000000d0ddf3300 0000000000000000 0000000000000000
[ 8041.225607] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 8041.225607] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 8041.225607] GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 8041.225607] GPR28: c000000d0d6b0888 c000000d0d6b0800 0000000000004d43 c000000d0d6b0818
[ 8041.225701] NIP [c0000000008c54d8] __asan_load2+0x54/0xd8
[ 8041.225712] LR [c00000000004e464] pseries_errorlog_id+0x20/0x3c
[ 8041.225722] Call Trace:
[ 8041.225726] [c0000000fff77b90] [c0000000001f8748] fwnmi_get_errinfo+0xd4/0x104
[ 8041.225738] [c0000000fff77bc0] [c0000000019d0598] get_pseries_errorlog+0xa8/0x110
[ 8041.225750] [c0000000fff77c00] [c0000000001f8f68] pseries_machine_check_realmode+0x11c/0x214
[ 8041.225762] [c0000000fff77ce0] [c000000000049ca4] machine_check_early+0x74/0xc0
[ 8041.225771] [c0000000fff77d30] [c0000000000084a4] machine_check_early_common+0x1b4/0x2c0

Is it a new problem, or has it always been there?


It's not a new problem. I enabled KASAN in the config only recently, and that is when I started seeing this issue.

I have tested the 6.17, 6.16 and 6.15 kernels and the issue is present on all of them.


Regards,

Venkat.


The problem is that KASAN is not compatible with real mode (MMU translation is off).

pseries_machine_check_realmode() is located in arch/powerpc/platforms/pseries/ras.c, which is built with KASAN_SANITIZE_ras.o := n

But pseries_machine_check_realmode() calls mce_handle_error(), which calls get_pseries_errorlog().

get_pseries_errorlog() is in arch/powerpc/kernel/rtas.c, which is _not_ built with KASAN instrumentation disabled, hence the Oops.

Unrelated, but it looks like there is also a problem with commit cc15ff327569 ("powerpc/mce: Avoid using irq_work_queue() in realmode"), which removed the re-enabling of translation but left the call to mce_handle_err_virtmode().

Christophe

