Re: BUG: corrupted list in freeary

Dmitry Vyukov Fri, 30 Nov 2018 09:59:22 -0800

On Thu, Nov 29, 2018 at 9:13 AM, Manfred Spraul
<manf...@colorfullife.com> wrote:
> Hello together,
>
> On 11/27/18 4:52 PM, syzbot wrote:
>
> Hello,
>
> syzbot found the following crash on:
>
> HEAD commit:    e195ca6cb6f2 Merge branch 'for-linus' of git://git.kernel...
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=10d3e6a3400000
>
> From the console output:
>
> 20:36:14 executing program 4:
> semget$private(0x12000000, 0x39d0, 0x0)
>
>
> I don't understand the 0x12000000.
>
> What does that mean? What is the actual syscall?


Hi Manfred,

The syscall is semget with the first argument 0x12000000.


>
> Is 0x39d0 the number of semaphores in the array, i.e. create ~13.000
> semaphores?

If the second argument of 0x39d0 relates to creation of 0x39d0
semaphores, then yes.



> kernel config:  https://syzkaller.appspot.com/x/.config?x=73e2bc0cb6463446
> dashboard link: https://syzkaller.appspot.com/bug?extid=c92d3646e35bc5d1a909
> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
>
> Unfortunately, I don't have any reproducer for this crash yet.
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+c92d3646e35bc5d1a...@syzkaller.appspotmail.com
>
> input: syz1 as /devices/virtual/input/input670
> input: syz1 as /devices/virtual/input/input671
> list_del corruption. prev->next should be ffff8881dae2cdb8, but was
> 0000000000100000
> ------------[ cut here ]------------
> kernel BUG at lib/list_debug.c:53!
> invalid opcode: 0000 [#1] PREEMPT SMP KASAN
> CPU: 0 PID: 6194 Comm: syz-executor5 Not tainted 4.20.0-rc3+ #348
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> RIP: 0010:__list_del_entry_valid.cold.1+0x48/0x4a lib/list_debug.c:51
> Code: d0 60 88 e8 b2 31 d2 fd 0f 0b 48 89 de 48 c7 c7 00 d2 60 88 e8 a1 31
> d2 fd 0f 0b 48 89 de 48 c7 c7 a0 d1 60 88 e8 90 31 d2 fd <0f> 0b 48 89 d9 48
> c7 c7 60 d2 60 88 e8 7f 31 d2 fd 0f 0b 48 89 f1
> RSP: 0018:ffff8881848fee80 EFLAGS: 00010286
> RAX: 0000000000000054 RBX: ffff8881dae2cdb8 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: ffffffff8165eaf5 RDI: 0000000000000005
> RBP: ffff8881848fee98 R08: ffff8881848f26c0 R09: 0000000000000006
> R10: 0000000000000000 R11: ffff8881848f26c0 R12: ffff8881c3173a00
> R13: ffff8881be118118 R14: ffff8881848ff280 R15: dffffc0000000000
> FS:  00000000020b2940(0000) GS:ffff8881dae00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000625208 CR3: 00000001c10d3000 CR4: 00000000001406f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  __list_del_entry include/linux/list.h:117 [inline]
>  list_del include/linux/list.h:125 [inline]
>  unlink_queue ipc/sem.c:786 [inline]
>
> Unlink_queue means transfer all waiting threads to the wake-q.
>
> There are 2*(1+<semaphores in the array>) linked lists in an array.
>
> And this fails, because one linked list contains 0x100000 instead of a real
> pointer.
>
> I could not find any semop() in the log --> all lists must be empty.
>
> Actually, the lists were initialized in newary(), and then never touched.
>
>  freeary+0xbd1/0x1a40 ipc/sem.c:1160
>
> Free a semaphore array
>
>  free_ipcs+0x9f/0x1c0 ipc/namespace.c:112
>  sem_exit_ns+0x20/0x40 ipc/sem.c:237
>  free_ipc_ns ipc/namespace.c:120 [inline]
>
> Free all ipc ids in the name space
>
>  put_ipc_ns+0x66/0x180 ipc/namespace.c:152
>  free_nsproxy+0xcf/0x220 kernel/nsproxy.c:180
>
> Free the name space
>
>  switch_task_namespaces+0xb3/0xd0 kernel/nsproxy.c:229
>  exit_task_namespaces+0x17/0x20 kernel/nsproxy.c:234
>  do_exit+0x1ad1/0x26d0 kernel/exit.c:866
>  do_group_exit+0x177/0x440 kernel/exit.c:970
>  get_signal+0x8b0/0x1980 kernel/signal.c:2517
>  do_signal+0x9c/0x21c0 arch/x86/kernel/signal.c:816
>  exit_to_usermode_loop+0x2e5/0x380 arch/x86/entry/common.c:162
>  prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline]
>  syscall_return_slowpath arch/x86/entry/common.c:268 [inline]
>  do_syscall_64+0x6be/0x820 arch/x86/entry/common.c:293
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x410fa0
>
> This is time code 604.599748 in the console output:
>
> [  604.599748] RIP: 0033:0x410fa0
>
>
> Questions:
>
> 1) What is this?
>
> [  600.924691]  entry_SYSCALL_64_after_hwframe+0x49/0xbe^M
> [  600.929872] RIP: 0033:0x7f3e597d0120^M
> [  600.933576] Code: Bad RIP value.^M
> [  600.936920] RSP: 002b:00007ffc2d83e008 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000002^M
> [  600.944608] RAX: ffffffffffffffda RBX: 000055ca2995b436 RCX:
> 00007f3e597d0120^M
> [  600.951856] RDX: 00007ffc2d83e244 RSI: 0000000000080000 RDI:
> 00007ffc2d83e220^M
> [  600.959107] RBP: 000055ca2995b1e0 R08: 0000000000000000 R09:
> 000055ca2995b099^M
> [  600.966355] R10: 0000000000000000 R11: 0000000000000246 R12:
> 0000000000000001^M
> [  600.973628] R13: 000055ca2995b090 R14: 000055ca2995b190 R15:
> 00007ffc2d83e220^M
>
> Isn't this a kernel stack overrun?
>
> RSP: 0x..83e008. Assuming 8 kB kernel stack, and 8 kB alignment, we have
> used up everything.

I don't exact answer, that's just the kernel output that we captured
from console.

FWIW with KASAN stacks are 16K:
https://elixir.bootlin.com/linux/latest/source/arch/x86/include/asm/page_64_types.h#L10


But it looks like the OOM may have been caused by a page fault, so the
IP/SP are legitimately in user-space, e.g. in the same log:

[  443.339635]  __do_page_fault+0x5e8/0xe60
[  443.343683]  ? trace_hardirqs_off+0xb8/0x310
[  443.348079]  do_page_fault+0xf2/0x7e0
[  443.351867]  ? vmalloc_sync_all+0x30/0x30
[  443.356002]  ? error_entry+0x70/0xd0
[  443.359702]  ? trace_hardirqs_off_caller+0xbb/0x310
[  443.364702]  ? trace_hardirqs_on_caller+0xc0/0x310
[  443.369620]  ? syscall_return_slowpath+0x5e0/0x5e0
[  443.374537]  ? trace_hardirqs_off_thunk+0x1a/0x1c
[  443.379390]  ? trace_hardirqs_on_caller+0x310/0x310
[  443.384397]  ? __bpf_trace_preemptirq_template+0x30/0x30
[  443.389837]  ? __sanitizer_cov_trace_const_cmp4+0x16/0x20
[  443.395366]  ? prepare_exit_to_usermode+0x291/0x3b0
[  443.400374]  ? page_fault+0x8/0x30
[  443.403901]  ? trace_hardirqs_off_thunk+0x1a/0x1c
[  443.408729]  ? page_fault+0x8/0x30
[  443.412255]  page_fault+0x1e/0x30
[  443.415693] RIP: 0033:0x43d8e8
[  443.418884] Code: Bad RIP value.
[  443.422229] RSP: 002b:000000c43a053490 EFLAGS: 00010206




> - How/where are namespaces used by the bot?
>
> I.e. what triggered the namespace exit?

It creates whole set of namespaces per test process:
https://github.com/google/syzkaller/blob/master/executor/common_linux.h#L1523

but then each test can also execute unshare as well.

Namespace exits usually happen when process terminates.
In the reported stack it seems to be the case:

 exit_task_namespaces+0x17/0x20 kernel/nsproxy.c:234
 do_exit+0x1ad1/0x26d0 kernel/exit.c:866
 do_group_exit+0x177/0x440 kernel/exit.c:970
 get_signal+0x8b0/0x1980 kernel/signal.c:2517
 do_signal+0x9c/0x21c0 arch/x86/kernel/signal.c:816
 exit_to_usermode_loop+0x2e5/0x380 arch/x86/entry/common.c:162


> - There are ~370 calls to semget(), most with large (>10.000) semaphores in
> the arrays.
>
> Starting from [442.544635], the OOM killer starts to kill processes.
>
> Is this as intended?

Well, generally everything except for kernel crashes is expected.

We actually sandbox it with memcg quite aggressively:
https://github.com/google/syzkaller/blob/master/executor/common_linux.h#L2159
But it seems to manage to either break the limits, or cause some
massive memory leaks. The nature of that is yet unknown.


> - Which stress tests are enabled? By chance, I found:
>
> [  433.304586] FAULT_INJECTION: forcing a failure.^M
> [  433.304586] name fail_page_alloc, interval 1, probability 0, space 0,
> times 0^M
> [  433.316471] CPU: 1 PID: 19653 Comm: syz-executor4 Not tainted 4.20.0-rc3+
> #348^M
> [  433.323841] Hardware name: Google Google Compute Engine/Google Compute
> Engine, BIOS Google 01/01/2011^M
>
> I need some more background, then I can review the code.

What exactly do you mean by "Which stress tests"?
Fault injection is enabled. Also random workload from userspace.


> Right now, I would put it into my "unknown syzcaller finding" folder.
>
> --
>
>     Manfred

Re: BUG: corrupted list in freeary

Reply via email to