On Mon, Jul 22, 2024 at 3:58 PM Lianbo Jiang <liji...@redhat.com> wrote:
> On 7/18/24 6:21 PM, devel-requ...@lists.crash-utility.osci.io wrote:
>
> > Date: Thu, 18 Jul 2024 07:26:02 -0000
> > From:qiwu.c...@transsion.com
> > Subject: [Crash-utility] Re: [PATCH] arm64: fix a potential segfault
> > in arm64_unwind_frame
> > To:devel@lists.crash-utility.osci.io
> > Message-ID:<20240718072602.21739.62...@lists.crash-utility.osci.io>
> > Content-Type: text/plain; charset="utf-8"
> >
> > Hi Lianbo,
> >
> > 1. The current issue can be reproduced with arm64_unwind_frame_v2():
>
> Thank you for the confirmation, qiwu.
>
> If so, the same changes should be done in the arm64_unwind_frame_v2().
> What do you think?
>

In addition, I just noticed that there is a similar call trace here (see another patch):
https://www.mail-archive.com/devel@lists.crash-utility.osci.io/msg00858.html

I'm curious whether they are the same situation. Could you check whether KASAN is enabled in your case as well?

Thanks
Lianbo

>
> Thanks
>
> Lianbo
>
> > crash> bt
> > [Detaching after fork from child process 4778]
> >
> > Thread 1 "crash" received signal SIGSEGV, Segmentation fault.
> > 0x0000555555826dae in arm64_unwind_frame_v2 (bt=0x7fffffffd8f0, frame=0x7fffffffd060, ofp=0x555559909970) at arm64.c:3048
> > 3048 frame->pc = GET_STACK_ULONG(fp + 8);
> > (gdb) bt
> > #0 0x0000555555826dae in arm64_unwind_frame_v2 (bt=0x7fffffffd8f0, frame=0x7fffffffd060, ofp=0x555559909970) at arm64.c:3048
> > #1 0x0000555555827d99 in arm64_back_trace_cmd_v2 (bt=0x7fffffffd8f0) at arm64.c:3426
> > #2 0x00005555557df95e in back_trace (bt=0x7fffffffd8f0) at kernel.c:3240
> > #3 0x00005555557dd8b8 in cmd_bt () at kernel.c:2881
> > #4 0x000055555573696b in exec_command () at main.c:893
> > #5 0x000055555573673e in main_loop () at main.c:840
> > #6 0x0000555555aa4a61 in captured_main (data=<optimized out>) at main.c:1284
> > #7 gdb_main (args=<optimized out>) at main.c:1313
> > #8 0x0000555555aa4ae0 in gdb_main_entry (argc=<optimized out>, argv=<optimized out>) at main.c:1338
> > #9 0x00005555558021df in gdb_main_loop (argc=2, argv=0x7fffffffe248) at gdb_interface.c:81
> > #10 0x0000555555736401 in main (argc=3, argv=0x7fffffffe248) at main.c:721
> > (gdb) p/x *(struct arm64_stackframe *)0x7fffffffd060
> > $1 = {fp = 0xffffffc008003f50, sp = 0xffffffc008003f40, pc = 0xffffffdfd669447c}
> > (gdb) p/x *(struct bt_info *)0x7fffffffd8f0
> > $2 = {task = 0xffffff8118012500, flags = 0x0, instptr =
> > 0xffffffdfd669447c, stkptr = 0xffffffc008003f40, bptr = 0x0, stackbase =
> > 0xffffffc01b5b0000, stacktop = 0xffffffc01b5b4000,
> > stackbuf = 0x555556117a80, tc = 0x55557a3b3480, hp = 0x0, textlist =
> > 0x0, ref = 0x0, frameptr = 0xffffffc008003f50, call_target = 0x0,
> > machdep = 0x0, debug = 0x0, eframe_ip = 0x0, radix = 0x0,
> > cpumask = 0x0}
> >
> > 2. The issue can be easily reproduced by "echo c > /proc/sysrq-trigger"
> > on the Android GKI-5.10 platform.
> > From the reproduced dump we can see that the current fp/sp of the crashing
> > cpu1 is outside the task's stack range and is located in the irq stack of cpu0:
> > KERNEL: vmlinux [TAINTED]
> > DUMPFILE: SYS_COREDUMP
> > CPUS: 8 [OFFLINE: 7]
> > MACHINE: aarch64 (unknown Mhz)
> > MEMORY: 8 GB
> > PANIC: "Kernel panic - not syncing: sysrq triggered crash"
> > PID: 9089
> > COMMAND: "sh"
> > TASK: ffffff8118012500 [THREAD_INFO: ffffff8118012500]
> > CPU: 1
> > STATE: TASK_RUNNING (PANIC)
> > crash> help -m |grep irq
> > irq_stack_size: 16384
> > irq_stacks[0]: ffffffc008000000
> > irq_stacks[1]: ffffffc008008000
> > irq_stacks[2]: ffffffc008010000
> > irq_stacks[3]: ffffffc008018000
> > irq_stacks[4]: ffffffc008020000
> > irq_stacks[5]: ffffffc008028000
> > irq_stacks[6]: ffffffc008030000
> > irq_stacks[7]: ffffffc008038000
> > crash> task_struct.thread -x ffffff8118012500
> > thread = {
> >   cpu_context = {
> >     x19 = 0xffffff80c01ea500,
> >     x20 = 0xffffff8118012500,
> >     x21 = 0xffffff8118012500,
> >     x22 = 0xffffff80c01ea500,
> >     x23 = 0xffffff8118012500,
> >     x24 = 0xffffff81319ac270,
> >     x25 = 0xffffffdfd8f87000,
> >     x26 = 0xffffff8118012500,
> >     x27 = 0xffffffdfd88ea180,
> >     x28 = 0xffffffdfd7e1b4b8,
> >     fp = 0xffffffc01b5b3a10,
> >     sp = 0xffffffc01b5b3a10,
> >     pc = 0xffffffdfd667b89c
> >   },
> > crash> bt -S 0xffffffc01b5b3a10
> > PID: 9089 TASK: ffffff8118012500 CPU: 1 COMMAND: "sh"
> > #0 [ffffffc008003f50] local_cpu_stop at ffffffdfd6694478
> > crash> bt -S ffffffc008003f50
> > PID: 9089 TASK: ffffff8118012500 CPU: 1 COMMAND: "sh"
> > bt: non-process stack address for this task: ffffffc008003f50
> >     (valid range: ffffffc01b5b0000 - ffffffc01b5b4000)
> >
> > From the second frame on, the unwind switches to the irq stack of cpu0.
> > crash> rd ffffffc008003f50 2
> > ffffffc008003f50: ffffffc008003f70 ffffffdfd68125d0  p?.......%......
> > crash> dis -x ffffffdfd68125d0
> > 0xffffffdfd68125d0 <handle_percpu_devid_fasteoi_ipi+0xb0>: b 0xffffffdfd6812730 <handle_percpu_devid_fasteoi_ipi+0x210>
> > crash> rd ffffffc008003f70 2
> > ffffffc008003f70: ffffffc008003fb0 ffffffdfd680352c  .?......,5......
> > crash> dis ffffffdfd680352c -x
> > 0xffffffdfd680352c <__handle_domain_irq+0x114>: bl 0xffffffdfd673d204 <__irq_exit_rcu>
> > crash> rd ffffffc008003fb0 2
> > ffffffc008003fb0: ffffffc008003fe0 ffffffdfd6610380  .?........a.....
> > crash> dis -x ffffffdfd6610380
> > 0xffffffdfd6610380 <gic_handle_irq.30555+0x6c>: cbz w0, 0xffffffdfd6610348 <gic_handle_irq.30555+0x34>
> > crash> rd ffffffc008003fe0 2
> > ffffffc008003fe0: ffffffdfd8d83e20 ffffffdfd6612624   >......$&a.....
> > crash> dis -x ffffffdfd6612624
> > 0xffffffdfd6612624 <el1_irq+0xe4>: mov sp, x19
> > crash> rd ffffffdfd8d83e20 2
> > ffffffdfd8d83e20: ffffffdfd8d83e80 ffffffdfd768c690  .>........h.....
> > crash> dis -x ffffffdfd768c690
> > 0xffffffdfd768c690 <cpuidle_enter_state+0x3a4>: tbnz w19, #31, 0xffffffdfd768c720 <cpuidle_enter_state+0x434>
> > crash> rd ffffffdfd8d83e80 2
> > ffffffdfd8d83e80: ffffffdfd8d83ef0 ffffffdfd67ab4f4  .>........z.....
> > crash> dis -x ffffffdfd67ab4f4
> > 0xffffffdfd67ab4f4 <do_idle+0x308>: str xzr, [x19, #8]
> > crash> rd ffffffdfd8d83ef0 2
> > ffffffdfd8d83ef0: ffffffdfd8d83f50 ffffffdfd67ab7e4  P?........z.....
> > crash> dis -x ffffffdfd67ab7e4
> > 0xffffffdfd67ab7e4 <cpu_startup_entry+0x84>: b 0xffffffdfd67ab7e0 <cpu_startup_entry+0x80>
> >
> > It is unreasonable for cpu1 to be in cpu0's irq context, which is far from
> > the backtrace shown by "bt -T", so we must avoid this case.
> > crash> bt -T
> > PID: 9089 TASK: ffffff8118012500 CPU: 1 COMMAND: "sh"
> >   [ffffffc01b5b3238] vsnprintf at ffffffdfd7075c10
> >   [ffffffc01b5b32b8] sprintf at ffffffdfd707b9e4
> >   [ffffffc01b5b3398] __sprint_symbol at ffffffdfd68abff4
> >   [ffffffc01b5b33c8] symbol_string at ffffffdfd70774ac
> >   [ffffffc01b5b33d8] symbol_string at ffffffdfd7077510
> >   [ffffffc01b5b34c8] string at ffffffdfd70767a8
> >   [ffffffc01b5b34d8] vsnprintf at ffffffdfd7075c2c
> >   [ffffffc01b5b34e8] vsnprintf at ffffffdfd7075fdc
> >   [ffffffc01b5b3518] vscnprintf at ffffffdfd707b8b4
> >   [ffffffc01b5b3558] ktime_get_ts64 at ffffffdfd686a2f8
> >   [ffffffc01b5b3598] data_alloc at ffffffdfd68009b4
> >   [ffffffc01b5b35d8] prb_reserve at ffffffdfd68011b4
> >   [ffffffc01b5b35e8] prb_reserve at ffffffdfd68010a0
> >   [ffffffc01b5b3648] log_store at ffffffdfd67fb024
> >   [ffffffc01b5b3698] number at ffffffdfd7076ea4
> >   [ffffffc01b5b36d8] number at ffffffdfd7076ea4
> >   [ffffffc01b5b3738] vsnprintf at ffffffdfd7075c10
> >   [ffffffc01b5b3778] number at ffffffdfd7076ea4
> >   [ffffffc01b5b37c8] number at ffffffdfd7076ea4
> >   [ffffffc01b5b3828] vsnprintf at ffffffdfd7075c10
> >   [ffffffc01b5b3868] vsnprintf at ffffffdfd7075c2c
> >   [ffffffc01b5b3888] number at ffffffdfd7076ea4
> >   [ffffffc01b5b38e8] vsnprintf at ffffffdfd7075c10
> >   [ffffffc01b5b3928] vsnprintf at ffffffdfd7075c2c
> >   [ffffffc01b5b3968] aee_nested_printf at ffffffdfd3d05d7c [mrdump]
> >   [ffffffc01b5b3a48] mrdump_common_die at ffffffdfd3d05a98 [mrdump]
> >   [ffffffc01b5b3ac8] ipanic at ffffffdfd3d06078 [mrdump]
> >   [ffffffc01b5b3ae8] __typeid__ZTSFiP14notifier_blockmPvE_global_addr at ffffffdfd7e3c118
> >   [ffffffc01b5b3af0] ipanic.cfi_jt at ffffffdfd3d0ab40 [mrdump]
> >   [ffffffc01b5b3b18] atomic_notifier_call_chain at ffffffdfd678236c
> >   [ffffffc01b5b3b28] panic at ffffffdfd672aa28
> >   [ffffffc01b5b3bf8] rcu_read_unlock.34874 at ffffffdfd718ddfc
> >   [ffffffc01b5b3c58] __handle_sysrq at ffffffdfd718d3c0
> >   [ffffffc01b5b3c68] write_sysrq_trigger at ffffffdfd718ea20
> >   [ffffffc01b5b3cb0] __typeid__ZTSFlP4filePKcmPxE_global_addr at ffffffdfd7e2fa88
> >   [ffffffc01b5b3cc8] proc_reg_write at ffffffdfd6c56f44
> >   [ffffffc01b5b3cd8] file_start_write at ffffffdfd6b3f884
> >   [ffffffc01b5b3d08] vfs_write at ffffffdfd6b403b4
> >   [ffffffc01b5b3da8] ksys_write at ffffffdfd6b40240
> >   [ffffffc01b5b3df8] __arm64_sys_write at ffffffdfd6b401b4
> >   [ffffffc01b5b3e20] __typeid__ZTSFlPK7pt_regsE_global_addr at ffffffdfd7e25240
> >   [ffffffc01b5b3e38] el0_svc_common at ffffffdfd6695980
> >   [ffffffc01b5b3e48] el0_da at ffffffdfd7d43e30
> >   [ffffffc01b5b3e58] el0_svc at ffffffdfd7d43d9c
> >   [ffffffc01b5b3e98] el0_sync_handler at ffffffdfd7d43d10
> >   [ffffffc01b5b3ea8] el0_sync at ffffffdfd66128b8
> >
> > Thanks
>
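
For reference, the range check under discussion could look roughly like the sketch below. It is not the actual patch. The struct and function names (stack_range, read_in_range, frame_record_readable) are made up for illustration and are not part of crash. The sketch assumes that the read at arm64.c:3048 is only safe when the whole 16-byte frame record (saved fp at fp, saved pc at fp + 8) lies inside the stack that was copied into bt->stackbuf, i.e. within [stackbase, stacktop).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the stackbase/stacktop pair of a bt_info. */
struct stack_range {
    uint64_t base;  /* lowest valid address, inclusive */
    uint64_t top;   /* one past the highest valid address */
};

/* True if an n-byte read starting at addr stays inside [base, top). */
static bool read_in_range(const struct stack_range *r, uint64_t addr, uint64_t n)
{
    /* Compare against the remaining space so addr + n cannot overflow. */
    return addr >= r->base && addr <= r->top && n <= r->top - addr;
}

/*
 * An arm64 frame record is two 8-byte words: the saved fp at fp and the
 * saved pc at fp + 8. Both must be readable before the unwinder
 * dereferences them.
 */
static bool frame_record_readable(const struct stack_range *r, uint64_t fp)
{
    return read_in_range(r, fp, 16);
}
```

With the values from the dump above, a range of ffffffc01b5b0000 - ffffffc01b5b4000 rejects fp = 0xffffffc008003f50 (it belongs to irq_stacks[0]) and accepts fp = 0xffffffc01b5b3a10 from cpu_context, so an unwinder using such a check would stop instead of reading outside the stack buffer.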