Re: kernel panic: stack is corrupted in udp4_lib_lookup2
On Fri, Jan 4, 2019 at 7:05 PM Stefano Brivio wrote: > > On Fri, 4 Jan 2019 18:26:16 +0100 > Dmitry Vyukov wrote: > > > On Fri, Jan 4, 2019 at 6:14 PM Stefano Brivio wrote: > > > > > > On Fri, 4 Jan 2019 12:05:04 +0100 > > > Dmitry Vyukov wrote: > > > > > > > I've added these as tests: > > > > > > > > https://github.com/google/syzkaller/blob/master/pkg/report/testdata/linux/report/341 > > > > https://github.com/google/syzkaller/blob/master/pkg/report/testdata/linux/report/342 > > > > https://github.com/google/syzkaller/blob/master/pkg/report/testdata/linux/report/343 > > > > https://github.com/google/syzkaller/blob/master/pkg/report/testdata/linux/report/344 > > > > > > > > Will try to figure out how to distinguish them from true corrupted > > > > reports. Usually when Call Trace does not have any frames, it's a sign > > > > of a corrupted report, and in other crashes we see the same report but > > > > with a stack trace. But some stack-corruption-related reliably don't > > > > have stack traces (not corrupted). But then some other > > > > stack-corruption-related crashes do have stack traces, and for these > > > > no stack trace again means a corrupted kernel output. Amusingly this > > > > is one of the most complex parts of syzkaller. > > > > > > I'm not sure how complicated that would be, but what about some metric > > > based on valid symbol names being reported? > > > > Please elaborate. What do you mean by "valid symbol names"? > > I mean a symbol name listed in /proc/kallsyms on the running system. > > This is usually my minimum threshold for "I can do something with this > report" -- which doesn't mean it's necessarily valid, but well, if you > have that, it means that at least something worked in the reporting, > and you can at least start having a look at a specific function. > > > Note that corrupted output detection solves 2 problems: > > 1. Do we think the output is truncated to the point of being not useful? > > E.g. 
sometimes kernel produces just 1 line:
> >
> > general protection fault: [#1] PREEMPT SMP KASAN
> >
> > This is sure a crash, but it's not too useful to report.
>
> Sure. In those tests above you have:
> - 341: udp6_lib_lookup2+0x622, handle_irq+0x2cb
> - 342: __sanitizer_cov_trace_pc+0x8, handle_irq+0x2cb
> - 343: __udp6_lib_err, etc.
> - 344: __udp6_lib_lookup+0x1d, etc.
>
> and this makes all those reports at least minimally useful.
>
> > 2. Do we have any reasons to think we extracted bogus crash identity?
> > E.g. crash intermixed with output from another thread so that we say
> > "something-bad in function foo", when in fact function foo come from
> > output of the second non-crashing thread.
>
> Okay, this looks way more complicated :)

Yeah, unfortunately, it's quite complicated.
Just today this gem popped up. You won't find any ODEBUG checks in that
stack; it's completely unrelated and comes from another task.

[ cut here ]
ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x90 kernel/workqueue.c:4916
WARNING: CPU: 1 PID: 45 at lib/debugobjects.c:325 debug_print_object+0x16a/0x250 lib/debugobjects.c:325
CPU: 0 PID: 13619 Comm: syz-executor1 Not tainted 4.20.0+ #13
Kernel panic - not syncing: panic_on_warn set ...
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1db/0x2d0 lib/dump_stack.c:113
 warn_alloc.cold+0xc2/0x1c8 mm/page_alloc.c:3570
 __vmalloc_node_range+0x57a/0x910 mm/vmalloc.c:1766
 __vmalloc_node mm/vmalloc.c:1795 [inline]
 __vmalloc_node_flags mm/vmalloc.c:1809 [inline]
 vmalloc+0x6b/0x90 mm/vmalloc.c:1831
 sel_write_load+0x1de/0x470 security/selinux/selinuxfs.c:557
 __vfs_write+0x116/0xb40 fs/read_write.c:485
 vfs_write+0x20c/0x580 fs/read_write.c:549
 ksys_write+0x105/0x260 fs/read_write.c:598
 __do_sys_write fs/read_write.c:610 [inline]
 __se_sys_write fs/read_write.c:607 [inline]
 __x64_sys_write+0x73/0xb0 fs/read_write.c:607
 do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
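
The report above shows one cheap signal of intermixed output: the WARNING line names CPU 1 / PID 45, while the "CPU: ... Comm:" line that follows names CPU 0 / PID 13619. A minimal sketch of that comparison follows. The function name looks_interleaved and both regular expressions are hypothetical illustrations, not syzkaller's actual logic, and a match between the two lines does not prove the report is clean.

```python
import re

# Hypothetical patterns for the two header lines seen in the reports in this thread.
WARN_RE = re.compile(r"WARNING: CPU: (\d+) PID: (\d+) at")
INFO_RE = re.compile(r"CPU: (\d+) PID: (\d+) Comm:")


def looks_interleaved(report):
    """Return True if the WARNING line and the CPU/Comm line disagree on CPU or PID."""
    warn = WARN_RE.search(report)
    info = INFO_RE.search(report)
    if not warn or not info:
        return False  # not enough information to compare
    return warn.groups() != info.groups()


# Header lines copied from the ODEBUG report quoted above.
mixed = (
    "WARNING: CPU: 1 PID: 45 at lib/debugobjects.c:325 debug_print_object+0x16a/0x250\n"
    "CPU: 0 PID: 13619 Comm: syz-executor1 Not tainted 4.20.0+ #13\n"
)
print(looks_interleaved(mixed))
```

A check like this only catches the case where both header lines survive; it says nothing about frames from a second task being spliced into an otherwise consistent trace.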
Re: kernel panic: stack is corrupted in udp4_lib_lookup2
On Fri, 4 Jan 2019 18:26:16 +0100 Dmitry Vyukov wrote: > On Fri, Jan 4, 2019 at 6:14 PM Stefano Brivio wrote: > > > > On Fri, 4 Jan 2019 12:05:04 +0100 > > Dmitry Vyukov wrote: > > > > > I've added these as tests: > > > > > > https://github.com/google/syzkaller/blob/master/pkg/report/testdata/linux/report/341 > > > https://github.com/google/syzkaller/blob/master/pkg/report/testdata/linux/report/342 > > > https://github.com/google/syzkaller/blob/master/pkg/report/testdata/linux/report/343 > > > https://github.com/google/syzkaller/blob/master/pkg/report/testdata/linux/report/344 > > > > > > Will try to figure out how to distinguish them from true corrupted > > > reports. Usually when Call Trace does not have any frames, it's a sign > > > of a corrupted report, and in other crashes we see the same report but > > > with a stack trace. But some stack-corruption-related reliably don't > > > have stack traces (not corrupted). But then some other > > > stack-corruption-related crashes do have stack traces, and for these > > > no stack trace again means a corrupted kernel output. Amusingly this > > > is one of the most complex parts of syzkaller. > > > > I'm not sure how complicated that would be, but what about some metric > > based on valid symbol names being reported? > > Please elaborate. What do you mean by "valid symbol names"? I mean a symbol name listed in /proc/kallsyms on the running system. This is usually my minimum threshold for "I can do something with this report" -- which doesn't mean it's necessarily valid, but well, if you have that, it means that at least something worked in the reporting, and you can at least start having a look at a specific function. > Note that corrupted output detection solves 2 problems: > 1. Do we think the output is truncated to the point of being not useful? > E.g. sometimes kernel produces just 1 line: > > general protection fault: [#1] PREEMPT SMP KASAN > > This is sure a crash, but it's not too useful to report. Sure. 
In those tests above you have:
- 341: udp6_lib_lookup2+0x622, handle_irq+0x2cb
- 342: __sanitizer_cov_trace_pc+0x8, handle_irq+0x2cb
- 343: __udp6_lib_err, etc.
- 344: __udp6_lib_lookup+0x1d, etc.

and this makes all those reports at least minimally useful.

> 2. Do we have any reasons to think we extracted bogus crash identity?
> E.g. crash intermixed with output from another thread so that we say
> "something-bad in function foo", when in fact function foo come from
> output of the second non-crashing thread.

Okay, this looks way more complicated :)

-- 
Stefano
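
The "valid symbol names" metric suggested above could be sketched roughly as follows: collect every symbol+0xoffset token in a report and count how many of the names appear in /proc/kallsyms. This is only an illustration under assumptions. The helper names load_kallsyms and symbol_score are made up here, the kallsyms sample is a two-line stand-in (addresses shown as zeros, as an unprivileged reader typically sees them), and nothing below reflects how syzkaller actually classifies reports.

```python
import re

# Matches "name+0xOFF" with an optional "/0xSIZE", as printed in kernel traces.
SYMBOL_RE = re.compile(r"([A-Za-z_][A-Za-z0-9_.]*)\+0x[0-9a-f]+(?:/0x[0-9a-f]+)?")


def load_kallsyms(text):
    """Return the set of symbol names from /proc/kallsyms-formatted text."""
    names = set()
    for line in text.splitlines():
        fields = line.split()
        # Each line is "address type name [module]".
        if len(fields) >= 3:
            names.add(fields[2])
    return names


def symbol_score(report, known):
    """Return (valid, total): symbol references found, and how many are known."""
    found = SYMBOL_RE.findall(report)
    valid = sum(1 for name in found if name in known)
    return valid, len(found)


# Stand-in for the contents of /proc/kallsyms on the crashing kernel.
sample_kallsyms = (
    "0000000000000000 T udp6_lib_lookup2\n"
    "0000000000000000 T handle_irq\n"
)
# Fragments taken from the do_IRQ() overflow report quoted in this thread.
sample_report = (
    "ip:udp6_lib_lookup2+0x622/0xb20)\n"
    "WARNING: CPU: 0 PID: 14788 at arch/x86/kernel/irq_64.c:61 handle_irq+0x2cb/0x3d8\n"
)
print(symbol_score(sample_report, load_kallsyms(sample_kallsyms)))
```

On a live system the text would come from reading /proc/kallsyms of the kernel that produced the report. A high score only says the reporting path worked well enough to name real functions, which matches the "minimum threshold" described above rather than proving the report is intact.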
Re: kernel panic: stack is corrupted in udp4_lib_lookup2
On Fri, 4 Jan 2019 12:24:18 -0500 Willem de Bruijn wrote: > On Fri, Jan 4, 2019 at 12:14 PM Stefano Brivio wrote: > > > > On Fri, 4 Jan 2019 12:05:04 +0100 > > Dmitry Vyukov wrote: > > > > > On Fri, Jan 4, 2019 at 11:54 AM Stefano Brivio > > > wrote: > > > > > > > > On Fri, 4 Jan 2019 11:32:12 +0100 > > > > Dmitry Vyukov wrote: > > > > > > > > > On Thu, Jan 3, 2019 at 10:54 PM Stefano Brivio > > > > > wrote: > > > > > > > > > > > > On Thu, 3 Jan 2019 15:15:06 -0600 > > > > > > Willem de Bruijn wrote: > > > > > > > > > > > > > syzbot generated stack traces with > > > > > > > > > > > > > > [ 183.517380] udpv6_err+0x46/0x60 > > > > > > > [ 183.520739] ? __udp6_lib_err+0x1890/0x1890 > > > > > > > [ 183.525054] gue6_err_proto_handler+0x199/0x280 > > > > > > > > > > > > Where? I can't find that in any logs linked from the dashboard at > > > > > > https://syzkaller.appspot.com/bug?extid=4ad25edc7a33e4ab91e0 :( > > > > > > > > > > Stefano, there are these 4 bugs reported that have similarly looking > > > > > reproducers involving udp sockets and that crash modes that looks like > > > > > stack corruption/overflow: > > > > > > > > > > https://syzkaller.appspot.com/bug?extid=14005fa30c9a07192934 > > > > > https://syzkaller.appspot.com/bug?extid=d14090007dc9ba5fa9b7 > > > > > https://syzkaller.appspot.com/bug?extid=137ed32ec9a6d5b0d5fe > > > > > https://syzkaller.appspot.com/bug?id=d5bc3e0c66d200d72216ab343a67c4327e4a3452 > > > > > > > > > > Are these the same bug as this? > > > > > > > > Judging from the reproducers for the first three, they seem to be. > > > > > > OK, then I will mark them as dups of this one. > > > > syzbot just finished the tests I requested and couldn't reproduce the > > first three issues with the fix I posted (fou6: Prevent unbounded > > recursion in GUE error handler). > > Thanks for preparing the fixes so quickly, Stefano. > > I also noticed one trace that seemingly goes through an ip6erspan > tunnel as well as gue6. > > [ 760.618683] ? 
__udp6_lib_err+0xcb/0x640
> [ 760.622716] ? udplitev6_err+0x46/0x60
> [ 760.626573] ? gue6_err+0x105/0x270
> [ 760.630170] ? udp_lib_close+0x20/0x20
> [ 760.634027] ? ip6erspan_tunnel_xmit+0xdc0/0xdc0
>
> Without knowing the err_handler code too well: is it possible that
> packets with an intermediate IPIP or other tunnel still bypass the
> checks (which check for strictly UDP in GUE)?

Yes, I also noticed that, and concluded it's not an issue, but thanks
for pointing that out.

Recursion can't happen there because other handlers don't forward the
exception to the exception handler of the inner layer. For ERSPAN, e.g.,
see ip6gre_err(): it "simply" looks up the tunnel and calls
ip6_update_pmtu() and ip6_redirect().

For FoU and GUE this is not possible as we don't maintain enough state
to be reasonably sure the exception is legitimate.

-- 
Stefano
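
The recursion pattern discussed above (a UDP error handler handing the error to GUE, which forwards it to the inner protocol's handler, which may be UDP again) can be illustrated with a toy model. Everything here is an assumption made for illustration: the function names, the dictionary standing in for a packet, and the depth cut-off that stands in for stack exhaustion. It is not the kernel code and not the actual patch; it only shows why refusing to forward when the inner protocol is UDP bounds the call depth.

```python
IPPROTO_UDP = 17  # IANA protocol numbers
IPPROTO_GRE = 47

MAX_DEPTH = 50  # toy stand-in for "the kernel stack is exhausted"


def udp_err(packet, depth=0, guard=True):
    """Toy UDP error handler: the socket lookup hits a GUE tunnel socket."""
    return gue_err(packet, depth + 1, guard)


def gue_err(packet, depth, guard):
    """Toy GUE error handler: forwards the error to the inner protocol."""
    inner = packet["inner_proto"]
    if guard and inner == IPPROTO_UDP:
        # Modelled fix: do not hand the error back to the UDP handler.
        return ("refused", depth)
    if inner == IPPROTO_UDP:
        if depth >= MAX_DEPTH:
            return ("overflow", depth)
        return udp_err(packet, depth, guard)
    # Other inner handlers in this model do not forward further.
    return ("handled", depth)


# A crafted packet whose inner header always claims to be UDP-in-GUE again.
crafted = {"inner_proto": IPPROTO_UDP}
print(udp_err(crafted, guard=False))
print(udp_err(crafted, guard=True))
```

In this model a GRE inner protocol terminates after one hop with or without the guard, which mirrors the point made above that other handlers do not forward the exception to an inner layer.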
Re: kernel panic: stack is corrupted in udp4_lib_lookup2
On Fri, Jan 4, 2019 at 6:14 PM Stefano Brivio wrote: > > On Fri, 4 Jan 2019 12:05:04 +0100 > Dmitry Vyukov wrote: > > > On Fri, Jan 4, 2019 at 11:54 AM Stefano Brivio wrote: > > > > > > On Fri, 4 Jan 2019 11:32:12 +0100 > > > Dmitry Vyukov wrote: > > > > > > > On Thu, Jan 3, 2019 at 10:54 PM Stefano Brivio > > > > wrote: > > > > > > > > > > On Thu, 3 Jan 2019 15:15:06 -0600 > > > > > Willem de Bruijn wrote: > > > > > > > > > > > syzbot generated stack traces with > > > > > > > > > > > > [ 183.517380] udpv6_err+0x46/0x60 > > > > > > [ 183.520739] ? __udp6_lib_err+0x1890/0x1890 > > > > > > [ 183.525054] gue6_err_proto_handler+0x199/0x280 > > > > > > > > > > Where? I can't find that in any logs linked from the dashboard at > > > > > https://syzkaller.appspot.com/bug?extid=4ad25edc7a33e4ab91e0 :( > > > > > > > > Stefano, there are these 4 bugs reported that have similarly looking > > > > reproducers involving udp sockets and that crash modes that looks like > > > > stack corruption/overflow: > > > > > > > > https://syzkaller.appspot.com/bug?extid=14005fa30c9a07192934 > > > > https://syzkaller.appspot.com/bug?extid=d14090007dc9ba5fa9b7 > > > > https://syzkaller.appspot.com/bug?extid=137ed32ec9a6d5b0d5fe > > > > https://syzkaller.appspot.com/bug?id=d5bc3e0c66d200d72216ab343a67c4327e4a3452 > > > > > > > > Are these the same bug as this? > > > > > > Judging from the reproducers for the first three, they seem to be. > > > > OK, then I will mark them as dups of this one. > > syzbot just finished the tests I requested and couldn't reproduce the > first three issues with the fix I posted (fou6: Prevent unbounded > recursion in GUE error handler). > > This should prove they are in fact the same issue. > > > > I > > > guess I can trigger tests also for those by sending a (sharp)syz > > > test ... e-mail with the patch to the Reported-by: addresses, right? > > > > Correct. > > These should be on LKML, but as you noted you can just add the syzbot > > email with tag to TO/CC. 
That email is available in the Reported-by
> > tag (and also shown on the dashboard).
>
> Okay, thanks for confirming.
>
> > > And the three reports you pointed out from the pile of corrupted
> > > reports also seem to match, others look unrelated.
> >
> > I've added these as tests:
> >
> > https://github.com/google/syzkaller/blob/master/pkg/report/testdata/linux/report/341
> > https://github.com/google/syzkaller/blob/master/pkg/report/testdata/linux/report/342
> > https://github.com/google/syzkaller/blob/master/pkg/report/testdata/linux/report/343
> > https://github.com/google/syzkaller/blob/master/pkg/report/testdata/linux/report/344
> >
> > Will try to figure out how to distinguish them from true corrupted
> > reports. Usually when Call Trace does not have any frames, it's a sign
> > of a corrupted report, and in other crashes we see the same report but
> > with a stack trace. But some stack-corruption-related reliably don't
> > have stack traces (not corrupted). But then some other
> > stack-corruption-related crashes do have stack traces, and for these
> > no stack trace again means a corrupted kernel output. Amusingly this
> > is one of the most complex parts of syzkaller.
>
> I'm not sure how complicated that would be, but what about some metric
> based on valid symbol names being reported?

Please elaborate. What do you mean by "valid symbol names"?

Note that corrupted output detection solves 2 problems:
1. Do we think the output is truncated to the point of being not useful?
   E.g. sometimes the kernel produces just 1 line:

   general protection fault: [#1] PREEMPT SMP KASAN

   This is sure a crash, but it's not too useful to report.
2. Do we have any reasons to think we extracted bogus crash identity?
   E.g. crash intermixed with output from another thread so that we say
   "something-bad in function foo", when in fact function foo comes from
   the output of the second non-crashing thread.
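
For the first problem (output truncated to the point of being not useful), one simple measurement is how many frame-like lines follow "Call Trace:". The sketch below is a hypothetical helper, not syzkaller code; count_trace_frames and its regular expression are assumptions. As noted earlier in the thread, a count of zero is not conclusive on its own, since some stack-corruption crashes reliably print no frames.

```python
import re

# A frame-like token: "name+0xOFF/0xSIZE", as printed in kernel call traces.
FRAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_.]*\+0x[0-9a-f]+/0x[0-9a-f]+")


def count_trace_frames(report):
    """Count frame-like lines that follow the first 'Call Trace:' line."""
    lines = report.splitlines()
    for i, line in enumerate(lines):
        if "Call Trace:" in line:
            return sum(1 for rest in lines[i + 1:] if FRAME_RE.search(rest))
    return 0  # no "Call Trace:" line at all


# Tail of the original syzbot report from this thread: the trace is empty.
empty_trace = (
    "CPU: 1 PID: 17960 Comm: syz-executor2 Not tainted 4.20.0+ #176\n"
    "Call Trace:\n"
    "Kernel Offset: disabled\n"
    "Rebooting in 86400 seconds..\n"
)
print(count_trace_frames(empty_trace))
```

The one-line "general protection fault" example above also yields zero, because it has no "Call Trace:" line at all; a caller that needs to tell those two cases apart would have to check for the marker separately.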
Re: kernel panic: stack is corrupted in udp4_lib_lookup2
On Fri, Jan 4, 2019 at 12:14 PM Stefano Brivio wrote: > > On Fri, 4 Jan 2019 12:05:04 +0100 > Dmitry Vyukov wrote: > > > On Fri, Jan 4, 2019 at 11:54 AM Stefano Brivio wrote: > > > > > > On Fri, 4 Jan 2019 11:32:12 +0100 > > > Dmitry Vyukov wrote: > > > > > > > On Thu, Jan 3, 2019 at 10:54 PM Stefano Brivio > > > > wrote: > > > > > > > > > > On Thu, 3 Jan 2019 15:15:06 -0600 > > > > > Willem de Bruijn wrote: > > > > > > > > > > > syzbot generated stack traces with > > > > > > > > > > > > [ 183.517380] udpv6_err+0x46/0x60 > > > > > > [ 183.520739] ? __udp6_lib_err+0x1890/0x1890 > > > > > > [ 183.525054] gue6_err_proto_handler+0x199/0x280 > > > > > > > > > > Where? I can't find that in any logs linked from the dashboard at > > > > > https://syzkaller.appspot.com/bug?extid=4ad25edc7a33e4ab91e0 :( > > > > > > > > Stefano, there are these 4 bugs reported that have similarly looking > > > > reproducers involving udp sockets and that crash modes that looks like > > > > stack corruption/overflow: > > > > > > > > https://syzkaller.appspot.com/bug?extid=14005fa30c9a07192934 > > > > https://syzkaller.appspot.com/bug?extid=d14090007dc9ba5fa9b7 > > > > https://syzkaller.appspot.com/bug?extid=137ed32ec9a6d5b0d5fe > > > > https://syzkaller.appspot.com/bug?id=d5bc3e0c66d200d72216ab343a67c4327e4a3452 > > > > > > > > Are these the same bug as this? > > > > > > Judging from the reproducers for the first three, they seem to be. > > > > OK, then I will mark them as dups of this one. > > syzbot just finished the tests I requested and couldn't reproduce the > first three issues with the fix I posted (fou6: Prevent unbounded > recursion in GUE error handler). Thanks for preparing the fixes so quickly, Stefano. I also noticed one trace that seemingly goes through an ip6erspan tunnel as well as gue6. [ 760.618683] ? __udp6_lib_err+0xcb/0x640 [ 760.622716] ? udplitev6_err+0x46/0x60 [ 760.626573] ? gue6_err+0x105/0x270 [ 760.630170] ? udp_lib_close+0x20/0x20 [ 760.634027] ? 
ip6erspan_tunnel_xmit+0xdc0/0xdc0

Without knowing the err_handler code too well: is it possible that
packets with an intermediate IPIP or other tunnel still bypass the
checks (which check for strictly UDP in GUE)?
Re: kernel panic: stack is corrupted in udp4_lib_lookup2
On Fri, 4 Jan 2019 12:05:04 +0100 Dmitry Vyukov wrote: > On Fri, Jan 4, 2019 at 11:54 AM Stefano Brivio wrote: > > > > On Fri, 4 Jan 2019 11:32:12 +0100 > > Dmitry Vyukov wrote: > > > > > On Thu, Jan 3, 2019 at 10:54 PM Stefano Brivio > > > wrote: > > > > > > > > On Thu, 3 Jan 2019 15:15:06 -0600 > > > > Willem de Bruijn wrote: > > > > > > > > > syzbot generated stack traces with > > > > > > > > > > [ 183.517380] udpv6_err+0x46/0x60 > > > > > [ 183.520739] ? __udp6_lib_err+0x1890/0x1890 > > > > > [ 183.525054] gue6_err_proto_handler+0x199/0x280 > > > > > > > > Where? I can't find that in any logs linked from the dashboard at > > > > https://syzkaller.appspot.com/bug?extid=4ad25edc7a33e4ab91e0 :( > > > > > > Stefano, there are these 4 bugs reported that have similarly looking > > > reproducers involving udp sockets and that crash modes that looks like > > > stack corruption/overflow: > > > > > > https://syzkaller.appspot.com/bug?extid=14005fa30c9a07192934 > > > https://syzkaller.appspot.com/bug?extid=d14090007dc9ba5fa9b7 > > > https://syzkaller.appspot.com/bug?extid=137ed32ec9a6d5b0d5fe > > > https://syzkaller.appspot.com/bug?id=d5bc3e0c66d200d72216ab343a67c4327e4a3452 > > > > > > Are these the same bug as this? > > > > Judging from the reproducers for the first three, they seem to be. > > OK, then I will mark them as dups of this one. syzbot just finished the tests I requested and couldn't reproduce the first three issues with the fix I posted (fou6: Prevent unbounded recursion in GUE error handler). This should prove they are in fact the same issue. > > I > > guess I can trigger tests also for those by sending a (sharp)syz > > test ... e-mail with the patch to the Reported-by: addresses, right? > > Correct. > These should be on LKML, but as you noted you can just add the syzbot > email with tag to TO/CC. That email is available in the Reported-by > tag (and also shown on the dashboard). Okay, thanks for confirming. 

> > And the three reports you pointed out from the pile of corrupted
> > reports also seem to match, others look unrelated.
>
> I've added these as tests:
>
> https://github.com/google/syzkaller/blob/master/pkg/report/testdata/linux/report/341
> https://github.com/google/syzkaller/blob/master/pkg/report/testdata/linux/report/342
> https://github.com/google/syzkaller/blob/master/pkg/report/testdata/linux/report/343
> https://github.com/google/syzkaller/blob/master/pkg/report/testdata/linux/report/344
>
> Will try to figure out how to distinguish them from true corrupted
> reports. Usually when Call Trace does not have any frames, it's a sign
> of a corrupted report, and in other crashes we see the same report but
> with a stack trace. But some stack-corruption-related reliably don't
> have stack traces (not corrupted). But then some other
> stack-corruption-related crashes do have stack traces, and for these
> no stack trace again means a corrupted kernel output. Amusingly this
> is one of the most complex parts of syzkaller.

I'm not sure how complicated that would be, but what about some metric
based on valid symbol names being reported?

-- 
Stefano
Re: kernel panic: stack is corrupted in udp4_lib_lookup2
On Fri, Jan 4, 2019 at 11:54 AM Stefano Brivio wrote: > > On Fri, 4 Jan 2019 11:32:12 +0100 > Dmitry Vyukov wrote: > > > On Thu, Jan 3, 2019 at 10:54 PM Stefano Brivio wrote: > > > > > > On Thu, 3 Jan 2019 15:15:06 -0600 > > > Willem de Bruijn wrote: > > > > > > > syzbot generated stack traces with > > > > > > > > [ 183.517380] udpv6_err+0x46/0x60 > > > > [ 183.520739] ? __udp6_lib_err+0x1890/0x1890 > > > > [ 183.525054] gue6_err_proto_handler+0x199/0x280 > > > > > > Where? I can't find that in any logs linked from the dashboard at > > > https://syzkaller.appspot.com/bug?extid=4ad25edc7a33e4ab91e0 :( > > > > Stefano, there are these 4 bugs reported that have similarly looking > > reproducers involving udp sockets and that crash modes that looks like > > stack corruption/overflow: > > > > https://syzkaller.appspot.com/bug?extid=14005fa30c9a07192934 > > https://syzkaller.appspot.com/bug?extid=d14090007dc9ba5fa9b7 > > https://syzkaller.appspot.com/bug?extid=137ed32ec9a6d5b0d5fe > > https://syzkaller.appspot.com/bug?id=d5bc3e0c66d200d72216ab343a67c4327e4a3452 > > > > Are these the same bug as this? > > Judging from the reproducers for the first three, they seem to be. OK, then I will mark them as dups of this one. > I > guess I can trigger tests also for those by sending a (sharp)syz > test ... e-mail with the patch to the Reported-by: addresses, right? Correct. These should be on LKML, but as you noted you can just add the syzbot email with tag to TO/CC. That email is available in the Reported-by tag (and also shown on the dashboard). > And the three reports you pointed out from the pile of corrupted > reports also seem to match, others look unrelated. 

I've added these as tests:

https://github.com/google/syzkaller/blob/master/pkg/report/testdata/linux/report/341
https://github.com/google/syzkaller/blob/master/pkg/report/testdata/linux/report/342
https://github.com/google/syzkaller/blob/master/pkg/report/testdata/linux/report/343
https://github.com/google/syzkaller/blob/master/pkg/report/testdata/linux/report/344

Will try to figure out how to distinguish them from true corrupted
reports. Usually when Call Trace does not have any frames, it's a sign
of a corrupted report, and in other crashes we see the same report but
with a stack trace. But some stack-corruption-related crashes reliably
don't have stack traces (and are not corrupted). Then again, some other
stack-corruption-related crashes do have stack traces, and for these
no stack trace again means corrupted kernel output. Amusingly, this
is one of the most complex parts of syzkaller.
Re: kernel panic: stack is corrupted in udp4_lib_lookup2
On Fri, 4 Jan 2019 11:32:12 +0100
Dmitry Vyukov wrote:

> On Thu, Jan 3, 2019 at 10:54 PM Stefano Brivio wrote:
> >
> > On Thu, 3 Jan 2019 15:15:06 -0600
> > Willem de Bruijn wrote:
> >
> > > syzbot generated stack traces with
> > >
> > > [ 183.517380] udpv6_err+0x46/0x60
> > > [ 183.520739] ? __udp6_lib_err+0x1890/0x1890
> > > [ 183.525054] gue6_err_proto_handler+0x199/0x280
> >
> > Where? I can't find that in any logs linked from the dashboard at
> > https://syzkaller.appspot.com/bug?extid=4ad25edc7a33e4ab91e0 :(
>
> Stefano, there are these 4 bugs reported that have similarly looking
> reproducers involving udp sockets and that crash modes that looks like
> stack corruption/overflow:
>
> https://syzkaller.appspot.com/bug?extid=14005fa30c9a07192934
> https://syzkaller.appspot.com/bug?extid=d14090007dc9ba5fa9b7
> https://syzkaller.appspot.com/bug?extid=137ed32ec9a6d5b0d5fe
> https://syzkaller.appspot.com/bug?id=d5bc3e0c66d200d72216ab343a67c4327e4a3452
>
> Are these the same bug as this?

Judging from the reproducers for the first three, they seem to be. I
guess I can trigger tests also for those by sending a (sharp)syz
test ... e-mail with the patch to the Reported-by: addresses, right?

And the three reports you pointed out from the pile of corrupted
reports also seem to match, others look unrelated.

-- 
Stefano
Re: kernel panic: stack is corrupted in udp4_lib_lookup2
On Fri, Jan 4, 2019 at 11:32 AM Dmitry Vyukov wrote:
>
> On Thu, Jan 3, 2019 at 10:54 PM Stefano Brivio wrote:
> >
> > On Thu, 3 Jan 2019 15:15:06 -0600
> > Willem de Bruijn wrote:
> >
> > > syzbot generated stack traces with
> > >
> > > [ 183.517380] udpv6_err+0x46/0x60
> > > [ 183.520739] ? __udp6_lib_err+0x1890/0x1890
> > > [ 183.525054] gue6_err_proto_handler+0x199/0x280
> >
> > Where? I can't find that in any logs linked from the dashboard at
> > https://syzkaller.appspot.com/bug?extid=4ad25edc7a33e4ab91e0 :(

We are ignoring too many bug reports and don't have a formal bug triage
process, so unsurprisingly lots get lost/unnoticed/unconnected/etc.

I've looked at the pile of corrupted reports:
https://syzkaller.appspot.com/bug?id=d5bc3e0c66d200d72216ab343a67c4327e4a3452
and spotted these 4 that look relevant:

[ 1431.820738] [ cut here ]
[ 1431.825561] do_IRQ(): syz-executor3 has overflown the kernel stack (cur:88805370,sp:8880ac1651b8,irq stk top-bottom:8880ae600080-8880ae608000,exception stk top-bottom:fe006080-fe01,ip:udp6_lib_lookup2+0x622/0xb20)
[ 1431.848168] WARNING: CPU: 0 PID: 14788 at arch/x86/kernel/irq_64.c:61 handle_irq+0x2cb/0x3d8
[ 1431.848178] Kernel panic - not syncing: panic_on_warn set ...
[ 1431.862633] CPU: 0 PID: 14788 Comm: syz-executor3 Not tainted 4.20.0+ #6
[ 1431.869494] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 1431.878863] Call Trace:
[ 1431.882758] Kernel Offset: disabled
[ 1431.886385] Rebooting in 86400 seconds..

[ 343.370355] [ cut here ]
[ 343.375254] do_IRQ(): syz-executor1 has overflown the kernel stack (cur:88806e81,sp:8880ac8e0c80,irq stk top-bottom:8880ae600080-8880ae608000,exception stk top-bottom:fe006080-fe01,ip:__sanitizer_cov_trace_pc+0x8/0x50)
[ 343.398335] WARNING: CPU: 0 PID: 17088 at arch/x86/kernel/irq_64.c:61 handle_irq+0x2cb/0x3d8
[ 343.398345] Kernel panic - not syncing: panic_on_warn set ...
[ 343.412823] CPU: 0 PID: 17088 Comm: syz-executor1 Not tainted 4.20.0+ #6
[ 343.419670] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 343.429024] Call Trace:
[ 343.433016] Kernel Offset: disabled
[ 343.436648] Rebooting in 86400 seconds..

[ 183.310893] ==
[ 183.318584] BUG: KASAN: stack-out-of-bounds in debug_lockdep_rcu_enabled.part.0+0x50/0x60
[ 183.326896] Read of size 4 at addr 8880a9eb8cbc by task 8�멀���d/1/356348210
[ 183.334536]
[ 183.336165] CPU: 1 PID: 356348210 Comm: 8�멀���d/1 Not tainted 4.20.0+ #2
[ 183.343169] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 183.352518] Call Trace:
[ 183.355108] dump_stack+0x1db/0x2d0
[ 183.358743] ? dump_stack_print_info.cold+0x20/0x20
[ 183.364297] ? debug_lockdep_rcu_enabled.part.0+0x50/0x60
[ 183.369835] print_address_description.cold+0x7c/0x20d
[ 183.375117] ? debug_lockdep_rcu_enabled.part.0+0x50/0x60
[ 183.380654] kasan_report.cold+0x8c/0x2ba
[ 183.384811] ? gue6_err_proto_handler+0x280/0x280
[ 183.389651] __asan_report_load4_noabort+0x14/0x20
[ 183.394589] debug_lockdep_rcu_enabled.part.0+0x50/0x60
[ 183.399146] list_add corruption. next->prev should be prev (8880ae72d8d8), but was 8880a9eb8600. (next=8880a9eb84f0).
[ 183.411727] debug_lockdep_rcu_enabled+0x71/0xa0
[ 183.416475] __udp6_lib_err+0xbc9/0x1890
[ 183.420537] ? udp6_lib_lookup+0xa0/0xa0
[ 183.424595] ? __sanitizer_cov_trace_const_cmp4+0x16/0x20
[ 183.430126] ? __sanitizer_cov_trace_const_cmp4+0x16/0x20
[ 183.435658] ? check_preemption_disabled+0x48/0x290
[ 183.440668] ? gue6_err_proto_handler+0x280/0x280
[ 183.445505] ? rcu_lockdep_current_cpu_online+0x1aa/0x220
[ 183.451033] ? rcu_pm_notify+0xd0/0xd0
[ 183.454912] udpv6_err+0x46/0x60
[ 183.458277] ? __udp6_lib_err+0x1890/0x1890
[ 183.462593] gue6_err_proto_handler+0x199/0x280
[ 183.467252] ? gre_rcv+0x1600/0x1600
[ 183.470971] ? check_preemption_disabled+0x48/0x290
[ 183.475983] gue6_err+0x4c1/0x6b0
[ 183.479435] ? gue6_err_proto_handler+0x280/0x280
[ 183.484287] __udp6_lib_err+0xc40/0x1890
[ 183.488352] ? udp6_lib_lookup+0xa0/0xa0
[ 183.492411] ? __sanitizer_cov_trace_const_cmp4+0x16/0x20
[ 183.497941] ? __sanitizer_cov_trace_const_cmp4+0x16/0x20
[ 183.503472] ? check_preemption_disabled+0x48/0x290
[ 183.508483] ? gue6_err_proto_handler+0x280/0x280
[ 183.513320] ? __lock_is_held+0xb6/0x140
[ 183.517380] udpv6_err+0x46/0x60
[ 183.520739] ? __udp6_lib_err+0x1890/0x1890
[ 183.525054] gue6_err_proto_handler+0x199/0x280
[ 183.529719] ? gre_rcv+0x1600/0x1600
[ 183.533429] ? check_preemption_disabled+0x48/0x290
[ 183.538459] gue6_err+0x4c1/0x6b0
[ 183.541915] ? gue6_err_proto_handler+0x280/0x280
[ 183.546749] __udp6_lib_err+0xc40/0x1890
[ 183.550815] ?
Re: kernel panic: stack is corrupted in udp4_lib_lookup2
On Thu, Jan 3, 2019 at 10:54 PM Stefano Brivio wrote:
>
> On Thu, 3 Jan 2019 15:15:06 -0600
> Willem de Bruijn wrote:
>
> > syzbot generated stack traces with
> >
> > [ 183.517380] udpv6_err+0x46/0x60
> > [ 183.520739] ? __udp6_lib_err+0x1890/0x1890
> > [ 183.525054] gue6_err_proto_handler+0x199/0x280
>
> Where? I can't find that in any logs linked from the dashboard at
> https://syzkaller.appspot.com/bug?extid=4ad25edc7a33e4ab91e0 :(

Stefano, there are these 4 reported bugs that have similar-looking
reproducers involving UDP sockets and crash modes that look like
stack corruption/overflow:

https://syzkaller.appspot.com/bug?extid=14005fa30c9a07192934
https://syzkaller.appspot.com/bug?extid=d14090007dc9ba5fa9b7
https://syzkaller.appspot.com/bug?extid=137ed32ec9a6d5b0d5fe
https://syzkaller.appspot.com/bug?id=d5bc3e0c66d200d72216ab343a67c4327e4a3452

Are these the same bug as this?
Re: kernel panic: stack is corrupted in udp4_lib_lookup2
On Thu, 3 Jan 2019 15:15:06 -0600
Willem de Bruijn wrote:

> syzbot generated stack traces with
>
> [ 183.517380] udpv6_err+0x46/0x60
> [ 183.520739] ? __udp6_lib_err+0x1890/0x1890
> [ 183.525054] gue6_err_proto_handler+0x199/0x280

Where? I can't find that in any logs linked from the dashboard at
https://syzkaller.appspot.com/bug?extid=4ad25edc7a33e4ab91e0 :(

-- 
Stefano
Re: kernel panic: stack is corrupted in udp4_lib_lookup2
On Thu, Jan 3, 2019 at 2:07 PM Stefano Brivio wrote: > > On Thu, 3 Jan 2019 12:01:29 -0800 > Eric Dumazet wrote: > > > On 01/03/2019 05:07 AM, syzbot wrote: > > > Hello, > > > > > > syzbot found the following crash on: > > > > > > HEAD commit:195303136f19 Merge tag 'kconfig-v4.21-2' of > > > git://git.kern.. > > > git tree: upstream > > > console output: https://syzkaller.appspot.com/x/log.txt?x=12245d8f40 > > > kernel config: https://syzkaller.appspot.com/x/.config?x=5e7dc790609552d7 > > > dashboard link: > > > https://syzkaller.appspot.com/bug?extid=4ad25edc7a33e4ab91e0 > > > compiler: gcc (GCC) 8.0.1 20180413 (experimental) > > > > > > Unfortunately, I don't have any reproducer for this crash yet. > > > > > > IMPORTANT: if you fix the bug, please add the following tag to the commit: > > > Reported-by: syzbot+4ad25edc7a33e4ab9...@syzkaller.appspotmail.com > > > > > > protocol 88fb is buggy, dev hsr_slave_1 > > > protocol 88fb is buggy, dev hsr_slave_0 > > > protocol 88fb is buggy, dev hsr_slave_1 > > > FAT-fs (loop0): invalid media value (0x00) > > > FAT-fs (loop0): Can't find a valid FAT filesystem > > > Kernel panic - not syncing: stack-protector: Kernel stack is corrupted > > > in: udp4_lib_lookup2+0x7ea/0x7f0 net/ipv4/udp.c:455 > > > CPU: 1 PID: 17960 Comm: syz-executor2 Not tainted 4.20.0+ #176 > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS > > > Google 01/01/2011 > > > Call Trace: > > > Kernel Offset: disabled > > > Rebooting in 86400 seconds.. > > > > > > > > > --- > > > This bug is generated by a bot. It may contain errors. > > > See https://goo.gl/tpsmEJ for more information about syzbot. > > > syzbot engineers can be reached at syzkal...@googlegroups.com. > > > > > > syzbot will keep track of this bug report. See: > > > https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with > > > syzbot. 
> >
> > Maybe commit 11789039da536fea96c98a40c2b441decf2e7323
> > Author: Stefano Brivio
> > Date: Tue Dec 18 00:13:17 2018 +0100
> >
> >     fou: Prevent unbounded recursion in GUE error handler
> >
> > Forgot to deal with IPv6 ?
>
> Damn, yes. :( Thanks both for pointing that out, patch coming.
>
> Still, I can't be sure this is the same issue.

syzbot generated stack traces with

[ 183.517380] udpv6_err+0x46/0x60
[ 183.520739] ? __udp6_lib_err+0x1890/0x1890
[ 183.525054] gue6_err_proto_handler+0x199/0x280

so it is quite likely
Re: kernel panic: stack is corrupted in udp4_lib_lookup2
On Thu, 3 Jan 2019 12:01:29 -0800 Eric Dumazet wrote: > On 01/03/2019 05:07 AM, syzbot wrote: > > Hello, > > > > syzbot found the following crash on: > > > > HEAD commit: 195303136f19 Merge tag 'kconfig-v4.21-2' of git://git.kern.. > > git tree: upstream > > console output: https://syzkaller.appspot.com/x/log.txt?x=12245d8f40 > > kernel config: https://syzkaller.appspot.com/x/.config?x=5e7dc790609552d7 > > dashboard link: https://syzkaller.appspot.com/bug?extid=4ad25edc7a33e4ab91e0 > > compiler: gcc (GCC) 8.0.1 20180413 (experimental) > > > > Unfortunately, I don't have any reproducer for this crash yet. > > > > IMPORTANT: if you fix the bug, please add the following tag to the commit: > > Reported-by: syzbot+4ad25edc7a33e4ab9...@syzkaller.appspotmail.com > > > > protocol 88fb is buggy, dev hsr_slave_1 > > protocol 88fb is buggy, dev hsr_slave_0 > > protocol 88fb is buggy, dev hsr_slave_1 > > FAT-fs (loop0): invalid media value (0x00) > > FAT-fs (loop0): Can't find a valid FAT filesystem > > Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: > > udp4_lib_lookup2+0x7ea/0x7f0 net/ipv4/udp.c:455 > > CPU: 1 PID: 17960 Comm: syz-executor2 Not tainted 4.20.0+ #176 > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS > > Google 01/01/2011 > > Call Trace: > > Kernel Offset: disabled > > Rebooting in 86400 seconds.. > > > > > > --- > > This bug is generated by a bot. It may contain errors. > > See https://goo.gl/tpsmEJ for more information about syzbot. > > syzbot engineers can be reached at syzkal...@googlegroups.com. > > > > syzbot will keep track of this bug report. See: > > https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with > > syzbot. > > Maybe commit 11789039da536fea96c98a40c2b441decf2e7323 > Author: Stefano Brivio > Date: Tue Dec 18 00:13:17 2018 +0100 > > fou: Prevent unbounded recursion in GUE error handler > > Forgot to deal with IPv6 ? Damn, yes. :( Thanks both for pointing that out, patch coming. 
Still, I can't be sure this is the same issue. -- Stefano
Re: kernel panic: stack is corrupted in udp4_lib_lookup2
On 01/03/2019 05:07 AM, syzbot wrote: > Hello, > > syzbot found the following crash on: > > HEAD commit: 195303136f19 Merge tag 'kconfig-v4.21-2' of git://git.kern.. > git tree: upstream > console output: https://syzkaller.appspot.com/x/log.txt?x=12245d8f40 > kernel config: https://syzkaller.appspot.com/x/.config?x=5e7dc790609552d7 > dashboard link: https://syzkaller.appspot.com/bug?extid=4ad25edc7a33e4ab91e0 > compiler: gcc (GCC) 8.0.1 20180413 (experimental) > > Unfortunately, I don't have any reproducer for this crash yet. > > IMPORTANT: if you fix the bug, please add the following tag to the commit: > Reported-by: syzbot+4ad25edc7a33e4ab9...@syzkaller.appspotmail.com > > protocol 88fb is buggy, dev hsr_slave_1 > protocol 88fb is buggy, dev hsr_slave_0 > protocol 88fb is buggy, dev hsr_slave_1 > FAT-fs (loop0): invalid media value (0x00) > FAT-fs (loop0): Can't find a valid FAT filesystem > Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: > udp4_lib_lookup2+0x7ea/0x7f0 net/ipv4/udp.c:455 > CPU: 1 PID: 17960 Comm: syz-executor2 Not tainted 4.20.0+ #176 > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS > Google 01/01/2011 > Call Trace: > Kernel Offset: disabled > Rebooting in 86400 seconds.. > > > --- > This bug is generated by a bot. It may contain errors. > See https://goo.gl/tpsmEJ for more information about syzbot. > syzbot engineers can be reached at syzkal...@googlegroups.com. > > syzbot will keep track of this bug report. See: > https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with syzbot. Maybe commit 11789039da536fea96c98a40c2b441decf2e7323 Author: Stefano Brivio Date: Tue Dec 18 00:13:17 2018 +0100 fou: Prevent unbounded recursion in GUE error handler Forgot to deal with IPv6 ?
Re: kernel panic: stack is corrupted in udp4_lib_lookup2
Hi Willem, On Thu, 3 Jan 2019 13:41:43 -0600 Willem de Bruijn wrote: > On Thu, Jan 3, 2019 at 1:39 PM Willem de Bruijn > wrote: > > > > On Thu, Jan 3, 2019 at 7:07 AM syzbot > > wrote: > > > > > > Hello, > > > > > > syzbot found the following crash on: > > > > > > HEAD commit:195303136f19 Merge tag 'kconfig-v4.21-2' of > > > git://git.kern.. > > > git tree: upstream > > > console output: https://syzkaller.appspot.com/x/log.txt?x=12245d8f40 > > > kernel config: https://syzkaller.appspot.com/x/.config?x=5e7dc790609552d7 > > > dashboard link: > > > https://syzkaller.appspot.com/bug?extid=4ad25edc7a33e4ab91e0 > > > compiler: gcc (GCC) 8.0.1 20180413 (experimental) > > > > > > Unfortunately, I don't have any reproducer for this crash yet. > > > > > > IMPORTANT: if you fix the bug, please add the following tag to the commit: > > > Reported-by: syzbot+4ad25edc7a33e4ab9...@syzkaller.appspotmail.com > > > > > > protocol 88fb is buggy, dev hsr_slave_1 > > > protocol 88fb is buggy, dev hsr_slave_0 > > > protocol 88fb is buggy, dev hsr_slave_1 > > > FAT-fs (loop0): invalid media value (0x00) > > > FAT-fs (loop0): Can't find a valid FAT filesystem > > > Kernel panic - not syncing: stack-protector: Kernel stack is corrupted > > > in: > > > > This sounds similar to the stack corruption fixed recently in commit > > e7cc082455cb ("udp: Support for error handlers of tunnels ..."). > > > > That fix is for ipv4 gue_err(). ipv6 gue6_err() probably needs the same. > > Correction. The fix is 11789039da ("fou: prevent unbounded recursion > in GUE error handler") Yes, I looked into this, the fix for that issue is on the tree tested by syzbot, and I think this is unrelated, also because KASan should say something before we hit that. By the way, do you happen to know if objects from kernels tested by syzbot are stored anywhere? It would be helpful to know for sure what's at udp4_lib_lookup2+0x7ea. -- Stefano
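Even without the kernel objects, the location string from the report, udp4_lib_lookup2+0x7ea/0x7f0, can be split into its parts. The sketch below is only an illustration: the helper name parse_location and the regular expression are assumptions made for this example and are not part of syzkaller or the kernel. It shows that the reported offset lies 6 bytes before the end of the function, which would be consistent with a return address just after the stack-protector failure call in the epilogue, though that reading is a guess until the actual object is inspected.

```python
import re

# Hypothetical helper for this example: matches the "symbol+0xOFFSET/0xSIZE"
# form that appears in the report (e.g. "udp4_lib_lookup2+0x7ea/0x7f0").
LOCATION_RE = re.compile(
    r"^(?P<symbol>[A-Za-z_][\w.]*)\+0x(?P<offset>[0-9a-fA-F]+)/0x(?P<size>[0-9a-fA-F]+)$"
)


def parse_location(text):
    """Return (symbol, offset, size) parsed from a symbol+offset/size string."""
    match = LOCATION_RE.match(text.strip())
    if match is None:
        raise ValueError("not a symbol+offset/size string: %r" % text)
    return (
        match.group("symbol"),
        int(match.group("offset"), 16),
        int(match.group("size"), 16),
    )


# The location string is taken verbatim from the syzbot report in this thread.
symbol, offset, size = parse_location("udp4_lib_lookup2+0x7ea/0x7f0")
bytes_before_end = size - offset
print(symbol, hex(offset), hex(size), bytes_before_end)
```

With a matching vmlinux built with debug information, the same string can usually be resolved to a source line, for instance with the kernel's scripts/faddr2line or with gdb's "list *(udp4_lib_lookup2+0x7ea)"; whether such a vmlinux is available for syzbot builds is exactly the open question above.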
Re: kernel panic: stack is corrupted in udp4_lib_lookup2
On Thu, Jan 3, 2019 at 1:39 PM Willem de Bruijn wrote: > > On Thu, Jan 3, 2019 at 7:07 AM syzbot > wrote: > > > > Hello, > > > > syzbot found the following crash on: > > > > HEAD commit:195303136f19 Merge tag 'kconfig-v4.21-2' of git://git.kern.. > > git tree: upstream > > console output: https://syzkaller.appspot.com/x/log.txt?x=12245d8f40 > > kernel config: https://syzkaller.appspot.com/x/.config?x=5e7dc790609552d7 > > dashboard link: https://syzkaller.appspot.com/bug?extid=4ad25edc7a33e4ab91e0 > > compiler: gcc (GCC) 8.0.1 20180413 (experimental) > > > > Unfortunately, I don't have any reproducer for this crash yet. > > > > IMPORTANT: if you fix the bug, please add the following tag to the commit: > > Reported-by: syzbot+4ad25edc7a33e4ab9...@syzkaller.appspotmail.com > > > > protocol 88fb is buggy, dev hsr_slave_1 > > protocol 88fb is buggy, dev hsr_slave_0 > > protocol 88fb is buggy, dev hsr_slave_1 > > FAT-fs (loop0): invalid media value (0x00) > > FAT-fs (loop0): Can't find a valid FAT filesystem > > Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: > > This sounds similar to the stack corruption fixed recently in commit > e7cc082455cb ("udp: Support for error handlers of tunnels ..."). > > That fix is for ipv4 gue_err(). ipv6 gue6_err() probably needs the same. Correction. The fix is 11789039da ("fou: prevent unbounded recursion in GUE error handler")
Re: kernel panic: stack is corrupted in udp4_lib_lookup2
On Thu, Jan 3, 2019 at 7:07 AM syzbot wrote: > > Hello, > > syzbot found the following crash on: > > HEAD commit:195303136f19 Merge tag 'kconfig-v4.21-2' of git://git.kern.. > git tree: upstream > console output: https://syzkaller.appspot.com/x/log.txt?x=12245d8f40 > kernel config: https://syzkaller.appspot.com/x/.config?x=5e7dc790609552d7 > dashboard link: https://syzkaller.appspot.com/bug?extid=4ad25edc7a33e4ab91e0 > compiler: gcc (GCC) 8.0.1 20180413 (experimental) > > Unfortunately, I don't have any reproducer for this crash yet. > > IMPORTANT: if you fix the bug, please add the following tag to the commit: > Reported-by: syzbot+4ad25edc7a33e4ab9...@syzkaller.appspotmail.com > > protocol 88fb is buggy, dev hsr_slave_1 > protocol 88fb is buggy, dev hsr_slave_0 > protocol 88fb is buggy, dev hsr_slave_1 > FAT-fs (loop0): invalid media value (0x00) > FAT-fs (loop0): Can't find a valid FAT filesystem > Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: This sounds similar to the stack corruption fixed recently in commit e7cc082455cb ("udp: Support for error handlers of tunnels ..."). That fix is for ipv4 gue_err(). ipv6 gue6_err() probably needs the same.