Hi Colin,

Thank you for the insights.

On 5/6/25 8:16 오전, Colin King (gmail) wrote:
> On 05/05/2025 23:29, Yunseong Kim wrote:
>> Hi all,
>>
>> I encountered a kernel panic in the RCU core subsystem while running a 
>> stress-ng on a virtualized ARM64 system.
>>
>> This panic consistently occurs regardless of whether I increase or decrease 
>> the memory size.
>>
>> The crash seems to originate from rcu_do_batch(), jumping to a pointer 
>> (0xffff00003a114000) that appears to be non-executable.
>> The PTE for the address confirms XN=1. Given the heavy binderfs workload, I 
>> suspect there may be a use-after-free or dangling pointer involved in a 
>> callback invocation.
>>
>> Platform:
>>   Architecture: arm64
>>   Virtualized environment: Apple Silicon M2 (Apple Virtualization Framework)
>>   Kernel version: 6.15.0-rc4+
>>   Attached Config: CONFIG_PREEMPT_VOLUNTARY=y, CONFIG_KASAN=y
>>
>> Reproducer:
>>   sudo ./stress-ng --binderfs 8 --binderfs-ops 10000 -t 15 \
>>    --pathological --timestamp --tz --syslog --perf --no-rand-seed \
>>    --times --metrics --klog-check --status 5 -x smi -v --interrupts 
>> --change-cpu
> 
> 
> I suspect --change-cpu is required to trigger this issue. Does it trigger 
> without this option?  Can you reproduce the issue when reducing the number of 
> --binderfs intances?

As you suggested, I've been testing combinations of enabling and disabling 
'--binderfs' and '--change-cpu' separately.

While I'm not deeply familiar with the internal mechanisms of binderfs,
I found that the panic still occurs consistently with --binderfs, even without 
the --change-cpu option.

I've also tried replacing --binderfs with other core stressors like --procfs, 
--ramfs, and --file...,
but so far the issue does not reproduce with these alternatives — the kernel 
panic seems exclusive to binderfs.

I ran tests for --binderfs intance 1 through 3 about three times each, and the 
panic did not occur in those runs.

However, the panic occurred when I used 4 instances.

Reproducer with reduced instance from 8 to 4 and without change-cpu option:
 sudo ./stress-ng --binderfs 4 --binderfs-ops 10000 -t 15 \
  --pathological --timestamp --tz --syslog --perf --no-rand-seed \
  --times --metrics --klog-check --status 5 -x smi -v --interrupts

Here's the panic log from the test with the options mentioned above,
the call stack looks nearly identical to what I observed in the previous crash:

[ 1517.764550] stress-ng-thras (18990): drop_caches: 3
[ 1535.016245] stress-ng-thras (18990): drop_caches: 1
[ 1549.753497] stress-ng-thras (18990): drop_caches: 2
[ 1562.702095] stress-ng-thras (18990): drop_caches: 3
[ 1612.728092] stress-ng-thras (18990): drop_caches: 1
[ 1674.033654] stress-ng-thras (18990): drop_caches: 2
[ 1977.262956] Unable to handle kernel execute from non-executable memory at 
virtual address ffff00003a114000
[ 1977.262980] Mem abort info:
[ 1977.262988]   ESR = 0x000000008600000f
[ 1977.262998]   EC = 0x21: IABT (current EL), IL = 32 bits
[ 1977.263008]   SET = 0, FnV = 0
[ 1977.263017]   EA = 0, S1PTW = 0
[ 1977.263026]   FSC = 0x0f: level 3 permission fault
[ 1977.263036] swapper pgtable: 4k pages, 48-bit VAs, pgdp=00000000dfd88000
[ 1977.263047] [ffff00003a114000] pgd=18000000effff403, p4d=18000000effff403, 
pud=18000000efffe403, pmd=18000000effad403, pte=006800007a114707
[ 1977.263088] Internal error: Oops: 000000008600000f [#1]  SMP
[ 1977.263097] Modules linked in: pcbc lrw xcbc wp512 nhpoly1305_neon 
nhpoly1305 libpoly1305 michael_mic md4 streebog_generic rmd160 crc32_generic 
twofish_generic twofish_common serpent_generic fcrypt cast6_generic 
cast5_generic cast_common camellia_generic blowfish_generic blowfish_common 
ecrdsa_generic des_generic libdes aegis128 overlay isofs uinput snd_seq_dummy 
snd_hrtimer nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet 
nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 
nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack rfkill nf_defrag_ipv6 
nf_defrag_ipv4 ip_set nf_tables qrtr sunrpc virtio_snd snd_seq snd_seq_device 
snd_pcm virtio_net snd_timer snd virtio_balloon net_failover soundcore failover 
vfat fat joydev loop nfnetlink vsock_loopback vmw_vsock_virtio_transport_common 
zram lz4hc_compress lz4_compress vmw_vsock_vmci_transport vmw_vmci vsock uas 
polyval_ce polyval_generic usb_storage ghash_ce sha3_ce sha512_ce sha512_arm64 
virtio_gpu virtio_dma_buf apple_mfi_fastcharge
[ 1977.263372]  fuse
[ 1977.263387] CPU: 2 UID: 0 PID: 27 Comm: ksoftirqd/2 Kdump: loaded Not 
tainted 6.15.0-rc4+ #1 PREEMPT(voluntary) 
[ 1977.263398] Hardware name: Apple Inc. Apple Virtualization Generic Platform, 
BIOS 2075.101.2.0.0 03/12/2025
[ 1977.263406] pstate: 21400805 (nzCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=-c)
[ 1977.263416] pc : 0xffff00003a114000
[ 1977.263443] lr : rcu_do_batch+0x2dc/0x860
[ 1977.263457] sp : ffff800080143c90
[ 1977.263462] x29: ffff800080143cb0 x28: ffff000048608000 x27: ffff00003a114000
[ 1977.263478] x26: ffff800084442000 x25: 0000000000000000 x24: ffff8000843d9b18
[ 1977.263492] x23: ffff800082150ac0 x22: 0000000000000007 x21: 000000000000000a
[ 1977.263506] x20: ffff000030e08000 x19: ffff0000af4cfe00 x18: 0000000000000002
[ 1977.263521] x17: 0000000000000000 x16: 0000000000000001 x15: 0000000000000017
[ 1977.263535] x14: 0000000000000004 x13: ffff0000af4cfed0 x12: 0000000000000002
[ 1977.263549] x11: 0000000000110009 x10: 0000000000ff0100 x9 : ffff80008385a580
[ 1977.263563] x8 : 0000000100000100 x7 : 0000000000000000 x6 : ffff8000803f89bc
[ 1977.263577] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000002
[ 1977.263591] x2 : 0000000000000000 x1 : ffff800082a4aeb8 x0 : ffff000048608000
[ 1977.263605] Call trace:
[ 1977.263611]  0xffff00003a114000 (P)
[ 1977.263623]  rcu_core+0x2a0/0x4e8
[ 1977.263635]  rcu_core_si+0x1c/0x30
[ 1977.263646]  handle_softirqs+0x1b4/0x588
[ 1977.263661]  run_ksoftirqd+0x5c/0xf8
[ 1977.263670]  smpboot_thread_fn+0x27c/0x490
[ 1977.263683]  kthread+0x2ac/0x318
[ 1977.263697]  ret_from_fork+0x10/0x20
[ 1977.263714] Code: dff29fc3 00200000 dff28fc3 00200000 (48608000) 
[ 1977.263723] SMP: stopping secondary CPUs
[ 1977.264081] Starting crashdump kernel...
[ 1977.264090] Bye!

Please let me know if there's any specific option or debugging mechanism you'd 
recommend to help isolate this further.

Best regards,
Yunseong Kim


Reply via email to