Re: kernel panic: stack is corrupted in udp4_lib_lookup2

2019-01-07 Thread Dmitry Vyukov
On Fri, Jan 4, 2019 at 7:05 PM Stefano Brivio  wrote:
>
> On Fri, 4 Jan 2019 18:26:16 +0100
> Dmitry Vyukov  wrote:
>
> > On Fri, Jan 4, 2019 at 6:14 PM Stefano Brivio  wrote:
> > >
> > > On Fri, 4 Jan 2019 12:05:04 +0100
> > > Dmitry Vyukov  wrote:
> > >
> > > > I've added these as tests:
> > > >
> > > > https://github.com/google/syzkaller/blob/master/pkg/report/testdata/linux/report/341
> > > > https://github.com/google/syzkaller/blob/master/pkg/report/testdata/linux/report/342
> > > > https://github.com/google/syzkaller/blob/master/pkg/report/testdata/linux/report/343
> > > > https://github.com/google/syzkaller/blob/master/pkg/report/testdata/linux/report/344
> > > >
> > > > Will try to figure out how to distinguish them from true corrupted
> > > > reports. Usually when Call Trace does not have any frames, it's a sign
> > > > of a corrupted report, and in other crashes we see the same report but
> > > > with a stack trace. But some stack-corruption-related reliably don't
> > > > have stack traces (not corrupted). But then some other
> > > > stack-corruption-related crashes do have stack traces, and for these
> > > > no stack trace again means a corrupted kernel output. Amusingly this
> > > > is one of the most complex parts of syzkaller.
> > >
> > > I'm not sure how complicated that would be, but what about some metric
> > > based on valid symbol names being reported?
> >
> > Please elaborate. What do you mean by "valid symbol names"?
>
> I mean a symbol name listed in /proc/kallsyms on the running system.
>
> This is usually my minimum threshold for "I can do something with this
> report" -- which doesn't mean it's necessarily valid, but well, if you
> have that, it means that at least something worked in the reporting,
> and you can at least start having a look at a specific function.
>
> > Note that corrupted output detection solves 2 problems:
> > 1. Do we think the output is truncated to the point of being not useful?
> > E.g. sometimes kernel produces just 1 line:
> >
> > general protection fault:  [#1] PREEMPT SMP KASAN
> >
> > This is sure a crash, but it's not too useful to report.
>
> Sure. In those tests above you have:
> - 341: udp6_lib_lookup2+0x622, handle_irq+0x2cb
> - 342: __sanitizer_cov_trace_pc+0x8, handle_irq+0x2cb
> - 343: __udp6_lib_err, etc.
> - 344: __udp6_lib_lookup+0x1d, etc.
>
> and this makes all those reports at least minimally useful.
>
> > 2. Do we have any reasons to think we extracted bogus crash identity?
> > E.g. crash intermixed with output from another thread so that we say
> > "something-bad in function foo", when in fact function foo come from
> > output of the second non-crashing thread.
>
> Okay, this looks way more complicated :)

Yeah, unfortunately, it's quite complicated.
Just today this gem popped up. You won't find any ODEBUG checks in
that stack; it's completely unrelated and comes from another task.

[ cut here ]
ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x90 kernel/workqueue.c:4916
WARNING: CPU: 1 PID: 45 at lib/debugobjects.c:325 debug_print_object+0x16a/0x250 lib/debugobjects.c:325
CPU: 0 PID: 13619 Comm: syz-executor1 Not tainted 4.20.0+ #13
Kernel panic - not syncing: panic_on_warn set ...
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1db/0x2d0 lib/dump_stack.c:113
 warn_alloc.cold+0xc2/0x1c8 mm/page_alloc.c:3570
 __vmalloc_node_range+0x57a/0x910 mm/vmalloc.c:1766
 __vmalloc_node mm/vmalloc.c:1795 [inline]
 __vmalloc_node_flags mm/vmalloc.c:1809 [inline]
 vmalloc+0x6b/0x90 mm/vmalloc.c:1831
 sel_write_load+0x1de/0x470 security/selinux/selinuxfs.c:557
 __vfs_write+0x116/0xb40 fs/read_write.c:485
 vfs_write+0x20c/0x580 fs/read_write.c:549
 ksys_write+0x105/0x260 fs/read_write.c:598
 __do_sys_write fs/read_write.c:610 [inline]
 __se_sys_write fs/read_write.c:607 [inline]
 __x64_sys_write+0x73/0xb0 fs/read_write.c:607
 do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
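
One cheap signal for this kind of intermixing is visible in the output above: the WARNING header names CPU 1 / PID 45, while the CPU line that follows names CPU 0 / PID 13619. A minimal sketch of such a check follows, assuming the log format shown here. The regexes and the helper name identity_looks_mixed are hypothetical and are not syzkaller's actual parser.

```python
import re

# Assumed formats, taken from the log lines quoted in this thread.
WARNING_RE = re.compile(r"WARNING: CPU: (\d+) PID: (\d+) at")
# The optional group tolerates a leading "[ 1431.862633] " timestamp.
CPU_LINE_RE = re.compile(
    r"^(?:\[\s*\d+\.\d+\]\s*)?CPU: (\d+) PID: (\d+) Comm:", re.MULTILINE
)


def identity_looks_mixed(report):
    """Return True when the WARNING header and the CPU line name different tasks."""
    warning = WARNING_RE.search(report)
    cpu_line = CPU_LINE_RE.search(report)
    if warning is None or cpu_line is None:
        return False  # not enough information to compare
    return warning.groups() != cpu_line.groups()


mixed = (
    "WARNING: CPU: 1 PID: 45 at lib/debugobjects.c:325 "
    "debug_print_object+0x16a/0x250 lib/debugobjects.c:325\n"
    "CPU: 0 PID: 13619 Comm: syz-executor1 Not tainted 4.20.0+ #13\n"
)
consistent = (
    "[ 1431.848168] WARNING: CPU: 0 PID: 14788 at "
    "arch/x86/kernel/irq_64.c:61 handle_irq+0x2cb/0x3d8\n"
    "[ 1431.862633] CPU: 0 PID: 14788 Comm: syz-executor3 Not tainted 4.20.0+ #6\n"
)
print(identity_looks_mixed(mixed))       # True
print(identity_looks_mixed(consistent))  # False
```

A mismatch only suggests that two tasks were printing at the same time; it does not say which of them produced the stack trace.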


Re: kernel panic: stack is corrupted in udp4_lib_lookup2

2019-01-04 Thread Stefano Brivio
On Fri, 4 Jan 2019 18:26:16 +0100
Dmitry Vyukov  wrote:

> On Fri, Jan 4, 2019 at 6:14 PM Stefano Brivio  wrote:
> >
> > On Fri, 4 Jan 2019 12:05:04 +0100
> > Dmitry Vyukov  wrote:
> >
> > > I've added these as tests:
> > >
> > > https://github.com/google/syzkaller/blob/master/pkg/report/testdata/linux/report/341
> > > https://github.com/google/syzkaller/blob/master/pkg/report/testdata/linux/report/342
> > > https://github.com/google/syzkaller/blob/master/pkg/report/testdata/linux/report/343
> > > https://github.com/google/syzkaller/blob/master/pkg/report/testdata/linux/report/344
> > >
> > > Will try to figure out how to distinguish them from true corrupted
> > > reports. Usually when Call Trace does not have any frames, it's a sign
> > > of a corrupted report, and in other crashes we see the same report but
> > > with a stack trace. But some stack-corruption-related reliably don't
> > > have stack traces (not corrupted). But then some other
> > > stack-corruption-related crashes do have stack traces, and for these
> > > no stack trace again means a corrupted kernel output. Amusingly this
> > > is one of the most complex parts of syzkaller.  
> >
> > I'm not sure how complicated that would be, but what about some metric
> > based on valid symbol names being reported?  
> 
> Please elaborate. What do you mean by "valid symbol names"?

I mean a symbol name listed in /proc/kallsyms on the running system.

This is usually my minimum threshold for "I can do something with this
report" -- which doesn't mean it's necessarily valid, but well, if you
have that, it means that at least something worked in the reporting,
and you can at least start having a look at a specific function.

> Note that corrupted output detection solves 2 problems:
> 1. Do we think the output is truncated to the point of being not useful?
> E.g. sometimes kernel produces just 1 line:
> 
> general protection fault:  [#1] PREEMPT SMP KASAN
> 
> This is sure a crash, but it's not too useful to report.

Sure. In those tests above you have:
- 341: udp6_lib_lookup2+0x622, handle_irq+0x2cb
- 342: __sanitizer_cov_trace_pc+0x8, handle_irq+0x2cb
- 343: __udp6_lib_err, etc.
- 344: __udp6_lib_lookup+0x1d, etc.

and this makes all those reports at least minimally useful.
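
As a rough illustration of that metric, the sketch below collects "symbol+0xoffset" tokens from a report and keeps those present in a kallsyms listing. It assumes the usual "address type name [module]" layout of /proc/kallsyms. The addresses in the sample are made up, and the helper names are hypothetical.

```python
import re

# Matches "symbol+0xoff" and, optionally, "/0xsize" as printed in kernel traces.
SYMBOL_RE = re.compile(r"\b([A-Za-z_][\w.]*)\+0x[0-9a-f]+(?:/0x[0-9a-f]+)?")


def load_kallsyms(text):
    """Parse kallsyms content: one 'address type name [module]' entry per line."""
    names = set()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            names.add(fields[2])
    return names


def known_symbols(report, kallsyms):
    """Return report symbols found in kallsyms, in order, without duplicates."""
    found = []
    for name in SYMBOL_RE.findall(report):
        if name in kallsyms and name not in found:
            found.append(name)
    return found


# Made-up addresses; on a real system this text would come from /proc/kallsyms.
kallsyms_text = (
    "ffffffff81000000 T handle_irq\n"
    "ffffffff81000100 t udp6_lib_lookup2\n"
)
report = (
    "do_IRQ(): syz-executor3 has overflown the kernel stack "
    "(...,ip:udp6_lib_lookup2+0x622/0xb20)\n"
    "WARNING: CPU: 0 PID: 14788 at arch/x86/kernel/irq_64.c:61 "
    "handle_irq+0x2cb/0x3d8\n"
)
symbols = known_symbols(report, load_kallsyms(kallsyms_text))
print(symbols)  # ['udp6_lib_lookup2', 'handle_irq']
```

A report with at least one such symbol would pass the "I can do something with this" threshold; a lone "general protection fault" line would not.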

> 2. Do we have any reasons to think we extracted bogus crash identity?
> E.g. crash intermixed with output from another thread so that we say
> "something-bad in function foo", when in fact function foo come from
> output of the second non-crashing thread.

Okay, this looks way more complicated :)

-- 
Stefano


Re: kernel panic: stack is corrupted in udp4_lib_lookup2

2019-01-04 Thread Stefano Brivio
On Fri, 4 Jan 2019 12:24:18 -0500
Willem de Bruijn  wrote:

> On Fri, Jan 4, 2019 at 12:14 PM Stefano Brivio  wrote:
> >
> > On Fri, 4 Jan 2019 12:05:04 +0100
> > Dmitry Vyukov  wrote:
> >  
> > > On Fri, Jan 4, 2019 at 11:54 AM Stefano Brivio  
> > > wrote:  
> > > >
> > > > On Fri, 4 Jan 2019 11:32:12 +0100
> > > > Dmitry Vyukov  wrote:
> > > >  
> > > > > On Thu, Jan 3, 2019 at 10:54 PM Stefano Brivio  
> > > > > wrote:  
> > > > > >
> > > > > > On Thu, 3 Jan 2019 15:15:06 -0600
> > > > > > Willem de Bruijn  wrote:
> > > > > >  
> > > > > > > syzbot generated stack traces with
> > > > > > >
> > > > > > > [  183.517380]  udpv6_err+0x46/0x60
> > > > > > > [  183.520739]  ? __udp6_lib_err+0x1890/0x1890
> > > > > > > [  183.525054]  gue6_err_proto_handler+0x199/0x280  
> > > > > >
> > > > > > Where? I can't find that in any logs linked from the dashboard at
> > > > > > https://syzkaller.appspot.com/bug?extid=4ad25edc7a33e4ab91e0 :(  
> > > > >
> > > > > Stefano, there are these 4 bugs reported that have similarly looking
> > > > > reproducers involving udp sockets and that crash modes that looks like
> > > > > stack corruption/overflow:
> > > > >
> > > > > https://syzkaller.appspot.com/bug?extid=14005fa30c9a07192934
> > > > > https://syzkaller.appspot.com/bug?extid=d14090007dc9ba5fa9b7
> > > > > https://syzkaller.appspot.com/bug?extid=137ed32ec9a6d5b0d5fe
> > > > > https://syzkaller.appspot.com/bug?id=d5bc3e0c66d200d72216ab343a67c4327e4a3452
> > > > >
> > > > > Are these the same bug as this?  
> > > >
> > > > Judging from the reproducers for the first three, they seem to be.  
> > >
> > > OK, then I will mark them as dups of this one.  
> >
> > syzbot just finished the tests I requested and couldn't reproduce the
> > first three issues with the fix I posted (fou6: Prevent unbounded
> > recursion in GUE error handler).  
> 
> Thanks for preparing the fixes so quickly, Stefano.
> 
> I also noticed one trace that seemingly goes through an ip6erspan
> tunnel as well as gue6.
> 
> [  760.618683]  ? __udp6_lib_err+0xcb/0x640
> [  760.622716]  ? udplitev6_err+0x46/0x60
> [  760.626573]  ? gue6_err+0x105/0x270
> [  760.630170]  ? udp_lib_close+0x20/0x20
> [  760.634027]  ? ip6erspan_tunnel_xmit+0xdc0/0xdc0
> 
> Without knowing the err_handler code too well: is it possible that
> packets with an intermediate IPIP or other tunnel still bypass the
> checks (which check for strictly UDP in GUE)?

Yes, I also noticed that, and concluded it's not an issue, but thanks
for pointing that out.

Recursion can't happen there because other handlers don't forward the
exception to the exception handler of the inner layer. For ERSPAN, e.g.,
see ip6gre_err(): it "simply" looks up the tunnel and calls
ip6_update_pmtu() and ip6_redirect().

For FoU and GUE this is not possible as we don't maintain enough state
to be reasonably sure the exception is legitimate.
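
The difference between the two kinds of handlers can be pictured with a toy model. This is only a sketch of the argument above, not kernel code: the layer names, the handle_error function and the guard_udp_in_gue flag are all hypothetical.

```python
def handle_error(layers, depth=0, guard_udp_in_gue=True):
    """Walk encapsulation layers outermost-first; return the handler depth reached."""
    if not layers:
        return depth
    outer, inner = layers[0], layers[1:]
    if outer == "gue":
        # No tunnel state is kept here, so the error is passed on to the
        # handler of the inner protocol.
        if guard_udp_in_gue and inner and inner[0] == "udp":
            return depth  # skip: UDP directly inside GUE could loop back into GUE
        return handle_error(inner, depth + 1, guard_udp_in_gue)
    if outer == "udp":
        # A UDP encapsulation socket passes the error on to GUE.
        if inner and inner[0] == "gue":
            return handle_error(inner, depth + 1, guard_udp_in_gue)
        return depth
    # Stateful tunnels (ip6gre/ERSPAN in the message above) look up the
    # tunnel and stop, without forwarding to the inner layer.
    return depth


crafted = ["udp", "gue"] * 50
print(handle_error(crafted, guard_udp_in_gue=False))  # 100: depth follows the packet
print(handle_error(crafted))                          # 1: stops at the first GUE layer
print(handle_error(["udp", "gue", "ip6gre", "udp", "gue"]))  # 2: ip6gre ends the chain
```

In the model, the depth is attacker-controlled only when a handler forwards to the inner layer unconditionally; a handler that resolves the error itself acts as a stop.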

-- 
Stefano


Re: kernel panic: stack is corrupted in udp4_lib_lookup2

2019-01-04 Thread Dmitry Vyukov
On Fri, Jan 4, 2019 at 6:14 PM Stefano Brivio  wrote:
>
> On Fri, 4 Jan 2019 12:05:04 +0100
> Dmitry Vyukov  wrote:
>
> > On Fri, Jan 4, 2019 at 11:54 AM Stefano Brivio  wrote:
> > >
> > > On Fri, 4 Jan 2019 11:32:12 +0100
> > > Dmitry Vyukov  wrote:
> > >
> > > > On Thu, Jan 3, 2019 at 10:54 PM Stefano Brivio  
> > > > wrote:
> > > > >
> > > > > On Thu, 3 Jan 2019 15:15:06 -0600
> > > > > Willem de Bruijn  wrote:
> > > > >
> > > > > > syzbot generated stack traces with
> > > > > >
> > > > > > [  183.517380]  udpv6_err+0x46/0x60
> > > > > > [  183.520739]  ? __udp6_lib_err+0x1890/0x1890
> > > > > > [  183.525054]  gue6_err_proto_handler+0x199/0x280
> > > > >
> > > > > Where? I can't find that in any logs linked from the dashboard at
> > > > > https://syzkaller.appspot.com/bug?extid=4ad25edc7a33e4ab91e0 :(
> > > >
> > > > Stefano, there are these 4 bugs reported that have similarly looking
> > > > reproducers involving udp sockets and that crash modes that looks like
> > > > stack corruption/overflow:
> > > >
> > > > https://syzkaller.appspot.com/bug?extid=14005fa30c9a07192934
> > > > https://syzkaller.appspot.com/bug?extid=d14090007dc9ba5fa9b7
> > > > https://syzkaller.appspot.com/bug?extid=137ed32ec9a6d5b0d5fe
> > > > https://syzkaller.appspot.com/bug?id=d5bc3e0c66d200d72216ab343a67c4327e4a3452
> > > >
> > > > Are these the same bug as this?
> > >
> > > Judging from the reproducers for the first three, they seem to be.
> >
> > OK, then I will mark them as dups of this one.
>
> syzbot just finished the tests I requested and couldn't reproduce the
> first three issues with the fix I posted (fou6: Prevent unbounded
> recursion in GUE error handler).
>
> This should prove they are in fact the same issue.
>
> > > I
> > > guess I can trigger tests also for those by sending a (sharp)syz
> > > test ... e-mail with the patch to the Reported-by: addresses, right?
> >
> > Correct.
> > These should be on LKML, but as you noted you can just add the syzbot
> > email with tag to TO/CC. That email is available in the Reported-by
> > tag (and also shown on the dashboard).
>
> Okay, thanks for confirming.
>
> > > And the three reports you pointed out from the pile of corrupted
> > > reports also seem to match, others look unrelated.
> >
> > I've added these as tests:
> >
> > https://github.com/google/syzkaller/blob/master/pkg/report/testdata/linux/report/341
> > https://github.com/google/syzkaller/blob/master/pkg/report/testdata/linux/report/342
> > https://github.com/google/syzkaller/blob/master/pkg/report/testdata/linux/report/343
> > https://github.com/google/syzkaller/blob/master/pkg/report/testdata/linux/report/344
> >
> > Will try to figure out how to distinguish them from true corrupted
> > reports. Usually when Call Trace does not have any frames, it's a sign
> > of a corrupted report, and in other crashes we see the same report but
> > with a stack trace. But some stack-corruption-related reliably don't
> > have stack traces (not corrupted). But then some other
> > stack-corruption-related crashes do have stack traces, and for these
> > no stack trace again means a corrupted kernel output. Amusingly this
> > is one of the most complex parts of syzkaller.
>
> I'm not sure how complicated that would be, but what about some metric
> based on valid symbol names being reported?

Please elaborate. What do you mean by "valid symbol names"?

Note that corrupted output detection solves 2 problems:
1. Do we think the output is truncated to the point of not being useful?
E.g. sometimes the kernel produces just 1 line:

general protection fault:  [#1] PREEMPT SMP KASAN

This is surely a crash, but it's not too useful to report.

2. Do we have any reasons to think we extracted bogus crash identity?
E.g. a crash intermixed with output from another thread, so that we say
"something-bad in function foo" when in fact function foo comes from
the output of the second, non-crashing thread.


Re: kernel panic: stack is corrupted in udp4_lib_lookup2

2019-01-04 Thread Willem de Bruijn
On Fri, Jan 4, 2019 at 12:14 PM Stefano Brivio  wrote:
>
> On Fri, 4 Jan 2019 12:05:04 +0100
> Dmitry Vyukov  wrote:
>
> > On Fri, Jan 4, 2019 at 11:54 AM Stefano Brivio  wrote:
> > >
> > > On Fri, 4 Jan 2019 11:32:12 +0100
> > > Dmitry Vyukov  wrote:
> > >
> > > > On Thu, Jan 3, 2019 at 10:54 PM Stefano Brivio  
> > > > wrote:
> > > > >
> > > > > On Thu, 3 Jan 2019 15:15:06 -0600
> > > > > Willem de Bruijn  wrote:
> > > > >
> > > > > > syzbot generated stack traces with
> > > > > >
> > > > > > [  183.517380]  udpv6_err+0x46/0x60
> > > > > > [  183.520739]  ? __udp6_lib_err+0x1890/0x1890
> > > > > > [  183.525054]  gue6_err_proto_handler+0x199/0x280
> > > > >
> > > > > Where? I can't find that in any logs linked from the dashboard at
> > > > > https://syzkaller.appspot.com/bug?extid=4ad25edc7a33e4ab91e0 :(
> > > >
> > > > Stefano, there are these 4 bugs reported that have similarly looking
> > > > reproducers involving udp sockets and that crash modes that looks like
> > > > stack corruption/overflow:
> > > >
> > > > https://syzkaller.appspot.com/bug?extid=14005fa30c9a07192934
> > > > https://syzkaller.appspot.com/bug?extid=d14090007dc9ba5fa9b7
> > > > https://syzkaller.appspot.com/bug?extid=137ed32ec9a6d5b0d5fe
> > > > https://syzkaller.appspot.com/bug?id=d5bc3e0c66d200d72216ab343a67c4327e4a3452
> > > >
> > > > Are these the same bug as this?
> > >
> > > Judging from the reproducers for the first three, they seem to be.
> >
> > OK, then I will mark them as dups of this one.
>
> syzbot just finished the tests I requested and couldn't reproduce the
> first three issues with the fix I posted (fou6: Prevent unbounded
> recursion in GUE error handler).

Thanks for preparing the fixes so quickly, Stefano.

I also noticed one trace that seemingly goes through an ip6erspan
tunnel as well as gue6.

[  760.618683]  ? __udp6_lib_err+0xcb/0x640
[  760.622716]  ? udplitev6_err+0x46/0x60
[  760.626573]  ? gue6_err+0x105/0x270
[  760.630170]  ? udp_lib_close+0x20/0x20
[  760.634027]  ? ip6erspan_tunnel_xmit+0xdc0/0xdc0

Without knowing the err_handler code too well: is it possible that
packets with an intermediate IPIP or other tunnel still bypass the
checks (which check for strictly UDP in GUE)?


Re: kernel panic: stack is corrupted in udp4_lib_lookup2

2019-01-04 Thread Stefano Brivio
On Fri, 4 Jan 2019 12:05:04 +0100
Dmitry Vyukov  wrote:

> On Fri, Jan 4, 2019 at 11:54 AM Stefano Brivio  wrote:
> >
> > On Fri, 4 Jan 2019 11:32:12 +0100
> > Dmitry Vyukov  wrote:
> >  
> > > On Thu, Jan 3, 2019 at 10:54 PM Stefano Brivio  
> > > wrote:  
> > > >
> > > > On Thu, 3 Jan 2019 15:15:06 -0600
> > > > Willem de Bruijn  wrote:
> > > >  
> > > > > syzbot generated stack traces with
> > > > >
> > > > > [  183.517380]  udpv6_err+0x46/0x60
> > > > > [  183.520739]  ? __udp6_lib_err+0x1890/0x1890
> > > > > [  183.525054]  gue6_err_proto_handler+0x199/0x280  
> > > >
> > > > Where? I can't find that in any logs linked from the dashboard at
> > > > https://syzkaller.appspot.com/bug?extid=4ad25edc7a33e4ab91e0 :(  
> > >
> > > Stefano, there are these 4 bugs reported that have similarly looking
> > > reproducers involving udp sockets and that crash modes that looks like
> > > stack corruption/overflow:
> > >
> > > https://syzkaller.appspot.com/bug?extid=14005fa30c9a07192934
> > > https://syzkaller.appspot.com/bug?extid=d14090007dc9ba5fa9b7
> > > https://syzkaller.appspot.com/bug?extid=137ed32ec9a6d5b0d5fe
> > > https://syzkaller.appspot.com/bug?id=d5bc3e0c66d200d72216ab343a67c4327e4a3452
> > >
> > > Are these the same bug as this?  
> >
> > Judging from the reproducers for the first three, they seem to be.  
> 
> OK, then I will mark them as dups of this one.

syzbot just finished the tests I requested and couldn't reproduce the
first three issues with the fix I posted (fou6: Prevent unbounded
recursion in GUE error handler).

This should prove they are in fact the same issue.

> > I
> > guess I can trigger tests also for those by sending a (sharp)syz
> > test ... e-mail with the patch to the Reported-by: addresses, right?  
> 
> Correct.
> These should be on LKML, but as you noted you can just add the syzbot
> email with tag to TO/CC. That email is available in the Reported-by
> tag (and also shown on the dashboard).

Okay, thanks for confirming.

> > And the three reports you pointed out from the pile of corrupted
> > reports also seem to match, others look unrelated.  
> 
> I've added these as tests:
> 
> https://github.com/google/syzkaller/blob/master/pkg/report/testdata/linux/report/341
> https://github.com/google/syzkaller/blob/master/pkg/report/testdata/linux/report/342
> https://github.com/google/syzkaller/blob/master/pkg/report/testdata/linux/report/343
> https://github.com/google/syzkaller/blob/master/pkg/report/testdata/linux/report/344
> 
> Will try to figure out how to distinguish them from true corrupted
> reports. Usually when Call Trace does not have any frames, it's a sign
> of a corrupted report, and in other crashes we see the same report but
> with a stack trace. But some stack-corruption-related reliably don't
> have stack traces (not corrupted). But then some other
> stack-corruption-related crashes do have stack traces, and for these
> no stack trace again means a corrupted kernel output. Amusingly this
> is one of the most complex parts of syzkaller.

I'm not sure how complicated that would be, but what about some metric
based on valid symbol names being reported?

-- 
Stefano



Re: kernel panic: stack is corrupted in udp4_lib_lookup2

2019-01-04 Thread Dmitry Vyukov
On Fri, Jan 4, 2019 at 11:54 AM Stefano Brivio  wrote:
>
> On Fri, 4 Jan 2019 11:32:12 +0100
> Dmitry Vyukov  wrote:
>
> > On Thu, Jan 3, 2019 at 10:54 PM Stefano Brivio  wrote:
> > >
> > > On Thu, 3 Jan 2019 15:15:06 -0600
> > > Willem de Bruijn  wrote:
> > >
> > > > syzbot generated stack traces with
> > > >
> > > > [  183.517380]  udpv6_err+0x46/0x60
> > > > [  183.520739]  ? __udp6_lib_err+0x1890/0x1890
> > > > [  183.525054]  gue6_err_proto_handler+0x199/0x280
> > >
> > > Where? I can't find that in any logs linked from the dashboard at
> > > https://syzkaller.appspot.com/bug?extid=4ad25edc7a33e4ab91e0 :(
> >
> > Stefano, there are these 4 bugs reported that have similarly looking
> > reproducers involving udp sockets and that crash modes that looks like
> > stack corruption/overflow:
> >
> > https://syzkaller.appspot.com/bug?extid=14005fa30c9a07192934
> > https://syzkaller.appspot.com/bug?extid=d14090007dc9ba5fa9b7
> > https://syzkaller.appspot.com/bug?extid=137ed32ec9a6d5b0d5fe
> > https://syzkaller.appspot.com/bug?id=d5bc3e0c66d200d72216ab343a67c4327e4a3452
> >
> > Are these the same bug as this?
>
> Judging from the reproducers for the first three, they seem to be.

OK, then I will mark them as dups of this one.

> I
> guess I can trigger tests also for those by sending a (sharp)syz
> test ... e-mail with the patch to the Reported-by: addresses, right?

Correct.
These should be on LKML, but as you noted you can just add the syzbot
email with tag to TO/CC. That email is available in the Reported-by
tag (and also shown on the dashboard).

> And the three reports you pointed out from the pile of corrupted
> reports also seem to match, others look unrelated.

I've added these as tests:

https://github.com/google/syzkaller/blob/master/pkg/report/testdata/linux/report/341
https://github.com/google/syzkaller/blob/master/pkg/report/testdata/linux/report/342
https://github.com/google/syzkaller/blob/master/pkg/report/testdata/linux/report/343
https://github.com/google/syzkaller/blob/master/pkg/report/testdata/linux/report/344

Will try to figure out how to distinguish them from truly corrupted
reports. Usually when Call Trace does not have any frames, it's a sign
of a corrupted report, and in other crashes we see the same report but
with a stack trace. But some stack-corruption-related crashes reliably
don't have stack traces (and are not corrupted). Then again, some other
stack-corruption-related crashes do have stack traces, and for these
no stack trace again means corrupted kernel output. Amusingly, this
is one of the most complex parts of syzkaller.
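
The "Call Trace without frames" signal itself is easy to compute; the hard part described above is deciding what an empty trace means for a given crash type. A minimal sketch of the counting step follows, assuming the timestamped log format from the reports in this thread. The helper count_frames is hypothetical and is not syzkaller code.

```python
import re

TIMESTAMP_RE = re.compile(r"^\[\s*\d+\.\d+\]\s?")
# After the timestamp is removed, frame lines stay indented; other lines do not.
FRAME_RE = re.compile(r"^\s+(?:\? )?[A-Za-z_]")


def count_frames(report):
    """Return the number of frames after 'Call Trace:', or None if it is absent."""
    lines = [TIMESTAMP_RE.sub("", line) for line in report.splitlines()]
    for i, line in enumerate(lines):
        if line.strip() == "Call Trace:":
            break
    else:
        return None
    count = 0
    for line in lines[i + 1:]:
        if not FRAME_RE.match(line):
            break
        count += 1
    return count


empty_trace = (
    "[ 1431.878863] Call Trace:\n"
    "[ 1431.882758] Kernel Offset: disabled\n"
    "[ 1431.886385] Rebooting in 86400 seconds..\n"
)
with_frames = (
    "[  183.352518] Call Trace:\n"
    "[  183.355108]  dump_stack+0x1db/0x2d0\n"
    "[  183.358743]  ? dump_stack_print_info.cold+0x20/0x20\n"
    "[  183.364297]  ? debug_lockdep_rcu_enabled.part.0+0x50/0x60\n"
)
print(count_frames(empty_trace))  # 0
print(count_frames(with_frames))  # 3
```

A count of 0 would then have to be interpreted per crash type, since for some stack-corruption crashes an empty trace is the normal, uncorrupted form.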


Re: kernel panic: stack is corrupted in udp4_lib_lookup2

2019-01-04 Thread Stefano Brivio
On Fri, 4 Jan 2019 11:32:12 +0100
Dmitry Vyukov  wrote:

> On Thu, Jan 3, 2019 at 10:54 PM Stefano Brivio  wrote:
> >
> > On Thu, 3 Jan 2019 15:15:06 -0600
> > Willem de Bruijn  wrote:
> >  
> > > syzbot generated stack traces with
> > >
> > > [  183.517380]  udpv6_err+0x46/0x60
> > > [  183.520739]  ? __udp6_lib_err+0x1890/0x1890
> > > [  183.525054]  gue6_err_proto_handler+0x199/0x280  
> >
> > Where? I can't find that in any logs linked from the dashboard at
> > https://syzkaller.appspot.com/bug?extid=4ad25edc7a33e4ab91e0 :(  
> 
> Stefano, there are these 4 bugs reported that have similarly looking
> reproducers involving udp sockets and that crash modes that looks like
> stack corruption/overflow:
> 
> https://syzkaller.appspot.com/bug?extid=14005fa30c9a07192934
> https://syzkaller.appspot.com/bug?extid=d14090007dc9ba5fa9b7
> https://syzkaller.appspot.com/bug?extid=137ed32ec9a6d5b0d5fe
> https://syzkaller.appspot.com/bug?id=d5bc3e0c66d200d72216ab343a67c4327e4a3452
> 
> Are these the same bug as this?

Judging from the reproducers for the first three, they seem to be. I
guess I can trigger tests also for those by sending a (sharp)syz
test ... e-mail with the patch to the Reported-by: addresses, right?

And the three reports you pointed out from the pile of corrupted
reports also seem to match, others look unrelated.

-- 
Stefano


Re: kernel panic: stack is corrupted in udp4_lib_lookup2

2019-01-04 Thread Dmitry Vyukov
On Fri, Jan 4, 2019 at 11:32 AM Dmitry Vyukov  wrote:
>
> On Thu, Jan 3, 2019 at 10:54 PM Stefano Brivio  wrote:
> >
> > On Thu, 3 Jan 2019 15:15:06 -0600
> > Willem de Bruijn  wrote:
> >
> > > syzbot generated stack traces with
> > >
> > > [  183.517380]  udpv6_err+0x46/0x60
> > > [  183.520739]  ? __udp6_lib_err+0x1890/0x1890
> > > [  183.525054]  gue6_err_proto_handler+0x199/0x280
> >
> > Where? I can't find that in any logs linked from the dashboard at
> > https://syzkaller.appspot.com/bug?extid=4ad25edc7a33e4ab91e0 :(

We are ignoring too many bug reports and don't have a formal bug
triage process, so unsurprisingly lots get
lost/unnoticed/unconnected/etc.

I've looked at the pile of corrupted reports:
https://syzkaller.appspot.com/bug?id=d5bc3e0c66d200d72216ab343a67c4327e4a3452
and spotted these 4 that look relevant:

[ 1431.820738] [ cut here ]
[ 1431.825561] do_IRQ(): syz-executor3 has overflown the kernel stack (cur:88805370,sp:8880ac1651b8,irq stk top-bottom:8880ae600080-8880ae608000,exception stk top-bottom:fe006080-fe01,ip:udp6_lib_lookup2+0x622/0xb20)
[ 1431.848168] WARNING: CPU: 0 PID: 14788 at arch/x86/kernel/irq_64.c:61 handle_irq+0x2cb/0x3d8
[ 1431.848178] Kernel panic - not syncing: panic_on_warn set ...
[ 1431.862633] CPU: 0 PID: 14788 Comm: syz-executor3 Not tainted 4.20.0+ #6
[ 1431.869494] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 1431.878863] Call Trace:
[ 1431.882758] Kernel Offset: disabled
[ 1431.886385] Rebooting in 86400 seconds..

[  343.370355] [ cut here ]
[  343.375254] do_IRQ(): syz-executor1 has overflown the kernel stack (cur:88806e81,sp:8880ac8e0c80,irq stk top-bottom:8880ae600080-8880ae608000,exception stk top-bottom:fe006080-fe01,ip:__sanitizer_cov_trace_pc+0x8/0x50)
[  343.398335] WARNING: CPU: 0 PID: 17088 at arch/x86/kernel/irq_64.c:61 handle_irq+0x2cb/0x3d8
[  343.398345] Kernel panic - not syncing: panic_on_warn set ...
[  343.412823] CPU: 0 PID: 17088 Comm: syz-executor1 Not tainted 4.20.0+ #6
[  343.419670] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[  343.429024] Call Trace:
[  343.433016] Kernel Offset: disabled
[  343.436648] Rebooting in 86400 seconds..

[  183.310893] ==
[  183.318584] BUG: KASAN: stack-out-of-bounds in debug_lockdep_rcu_enabled.part.0+0x50/0x60
[  183.326896] Read of size 4 at addr 8880a9eb8cbc by task
8�멀���d/1/356348210
[  183.334536]
[  183.336165] CPU: 1 PID: 356348210 Comm: 8�멀���d/1 Not tainted 4.20.0+ #2
[  183.343169] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[  183.352518] Call Trace:
[  183.355108]  dump_stack+0x1db/0x2d0
[  183.358743]  ? dump_stack_print_info.cold+0x20/0x20
[  183.364297]  ? debug_lockdep_rcu_enabled.part.0+0x50/0x60
[  183.369835]  print_address_description.cold+0x7c/0x20d
[  183.375117]  ? debug_lockdep_rcu_enabled.part.0+0x50/0x60
[  183.380654]  kasan_report.cold+0x8c/0x2ba
[  183.384811]  ? gue6_err_proto_handler+0x280/0x280
[  183.389651]  __asan_report_load4_noabort+0x14/0x20
[  183.394589]  debug_lockdep_rcu_enabled.part.0+0x50/0x60
[  183.399146] list_add corruption. next->prev should be prev (8880ae72d8d8), but was 8880a9eb8600. (next=8880a9eb84f0).
[  183.411727]  debug_lockdep_rcu_enabled+0x71/0xa0
[  183.416475]  __udp6_lib_err+0xbc9/0x1890
[  183.420537]  ? udp6_lib_lookup+0xa0/0xa0
[  183.424595]  ? __sanitizer_cov_trace_const_cmp4+0x16/0x20
[  183.430126]  ? __sanitizer_cov_trace_const_cmp4+0x16/0x20
[  183.435658]  ? check_preemption_disabled+0x48/0x290
[  183.440668]  ? gue6_err_proto_handler+0x280/0x280
[  183.445505]  ? rcu_lockdep_current_cpu_online+0x1aa/0x220
[  183.451033]  ? rcu_pm_notify+0xd0/0xd0
[  183.454912]  udpv6_err+0x46/0x60
[  183.458277]  ? __udp6_lib_err+0x1890/0x1890
[  183.462593]  gue6_err_proto_handler+0x199/0x280
[  183.467252]  ? gre_rcv+0x1600/0x1600
[  183.470971]  ? check_preemption_disabled+0x48/0x290
[  183.475983]  gue6_err+0x4c1/0x6b0
[  183.479435]  ? gue6_err_proto_handler+0x280/0x280
[  183.484287]  __udp6_lib_err+0xc40/0x1890
[  183.488352]  ? udp6_lib_lookup+0xa0/0xa0
[  183.492411]  ? __sanitizer_cov_trace_const_cmp4+0x16/0x20
[  183.497941]  ? __sanitizer_cov_trace_const_cmp4+0x16/0x20
[  183.503472]  ? check_preemption_disabled+0x48/0x290
[  183.508483]  ? gue6_err_proto_handler+0x280/0x280
[  183.513320]  ? __lock_is_held+0xb6/0x140
[  183.517380]  udpv6_err+0x46/0x60
[  183.520739]  ? __udp6_lib_err+0x1890/0x1890
[  183.525054]  gue6_err_proto_handler+0x199/0x280
[  183.529719]  ? gre_rcv+0x1600/0x1600
[  183.533429]  ? check_preemption_disabled+0x48/0x290
[  183.538459]  gue6_err+0x4c1/0x6b0
[  183.541915]  ? gue6_err_proto_handler+0x280/0x280
[  183.546749]  __udp6_lib_err+0xc40/0x1890
[  183.550815]  ?

Re: kernel panic: stack is corrupted in udp4_lib_lookup2

2019-01-04 Thread Dmitry Vyukov
On Thu, Jan 3, 2019 at 10:54 PM Stefano Brivio  wrote:
>
> On Thu, 3 Jan 2019 15:15:06 -0600
> Willem de Bruijn  wrote:
>
> > syzbot generated stack traces with
> >
> > [  183.517380]  udpv6_err+0x46/0x60
> > [  183.520739]  ? __udp6_lib_err+0x1890/0x1890
> > [  183.525054]  gue6_err_proto_handler+0x199/0x280
>
> Where? I can't find that in any logs linked from the dashboard at
> https://syzkaller.appspot.com/bug?extid=4ad25edc7a33e4ab91e0 :(

Stefano, these 4 reported bugs have similar-looking reproducers
involving UDP sockets and crash modes that look like
stack corruption/overflow:

https://syzkaller.appspot.com/bug?extid=14005fa30c9a07192934
https://syzkaller.appspot.com/bug?extid=d14090007dc9ba5fa9b7
https://syzkaller.appspot.com/bug?extid=137ed32ec9a6d5b0d5fe
https://syzkaller.appspot.com/bug?id=d5bc3e0c66d200d72216ab343a67c4327e4a3452

Are these the same bug as this?


Re: kernel panic: stack is corrupted in udp4_lib_lookup2

2019-01-03 Thread Stefano Brivio
On Thu, 3 Jan 2019 15:15:06 -0600
Willem de Bruijn  wrote:

> syzbot generated stack traces with
> 
> [  183.517380]  udpv6_err+0x46/0x60
> [  183.520739]  ? __udp6_lib_err+0x1890/0x1890
> [  183.525054]  gue6_err_proto_handler+0x199/0x280

Where? I can't find that in any logs linked from the dashboard at
https://syzkaller.appspot.com/bug?extid=4ad25edc7a33e4ab91e0 :(

-- 
Stefano


Re: kernel panic: stack is corrupted in udp4_lib_lookup2

2019-01-03 Thread Willem de Bruijn
On Thu, Jan 3, 2019 at 2:07 PM Stefano Brivio  wrote:
>
> On Thu, 3 Jan 2019 12:01:29 -0800
> Eric Dumazet  wrote:
>
> > On 01/03/2019 05:07 AM, syzbot wrote:
> > > Hello,
> > >
> > > syzbot found the following crash on:
> > >
> > > HEAD commit:195303136f19 Merge tag 'kconfig-v4.21-2' of 
> > > git://git.kern..
> > > git tree:   upstream
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=12245d8f40
> > > kernel config:  https://syzkaller.appspot.com/x/.config?x=5e7dc790609552d7
> > > dashboard link: 
> > > https://syzkaller.appspot.com/bug?extid=4ad25edc7a33e4ab91e0
> > > compiler:   gcc (GCC) 8.0.1 20180413 (experimental)
> > >
> > > Unfortunately, I don't have any reproducer for this crash yet.
> > >
> > > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > > Reported-by: syzbot+4ad25edc7a33e4ab9...@syzkaller.appspotmail.com
> > >
> > > protocol 88fb is buggy, dev hsr_slave_1
> > > protocol 88fb is buggy, dev hsr_slave_0
> > > protocol 88fb is buggy, dev hsr_slave_1
> > > FAT-fs (loop0): invalid media value (0x00)
> > > FAT-fs (loop0): Can't find a valid FAT filesystem
> > > Kernel panic - not syncing: stack-protector: Kernel stack is corrupted 
> > > in: udp4_lib_lookup2+0x7ea/0x7f0 net/ipv4/udp.c:455
> > > CPU: 1 PID: 17960 Comm: syz-executor2 Not tainted 4.20.0+ #176
> > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS 
> > > Google 01/01/2011
> > > Call Trace:
> > > Kernel Offset: disabled
> > > Rebooting in 86400 seconds..
> > >
> > >
> > > ---
> > > This bug is generated by a bot. It may contain errors.
> > > See https://goo.gl/tpsmEJ for more information about syzbot.
> > > syzbot engineers can be reached at syzkal...@googlegroups.com.
> > >
> > > syzbot will keep track of this bug report. See:
> > > https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with 
> > > syzbot.
> >
> > Maybe commit 11789039da536fea96c98a40c2b441decf2e7323
> > Author: Stefano Brivio 
> > Date:   Tue Dec 18 00:13:17 2018 +0100
> >
> > fou: Prevent unbounded recursion in GUE error handler
> >
> > Forgot to deal with IPv6 ?
>
> Damn, yes. :( Thanks both for pointing that out, patch coming.
>
> Still, I can't be sure this is the same issue.

syzbot generated stack traces with

[  183.517380]  udpv6_err+0x46/0x60
[  183.520739]  ? __udp6_lib_err+0x1890/0x1890
[  183.525054]  gue6_err_proto_handler+0x199/0x280

so it is quite likely the same issue.
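
For what it's worth, a rough way to pull the confirmed frames out of a trace like the one above. The regex and the helper name reliable_frames are only illustrative, and this assumes the usual x86 oops format, where a leading "?" marks an address that was found on the stack but that the unwinder could not confirm as part of the call chain.

```python
import re

# One trace line: "[ timestamp ]  [? ]symbol+0xOFFSET/0xSIZE"
TRACE_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<unreliable>\? )?"
    r"(?P<sym>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)

# The three lines quoted in this message, nothing more.
LINES = """\
[  183.517380]  udpv6_err+0x46/0x60
[  183.520739]  ? __udp6_lib_err+0x1890/0x1890
[  183.525054]  gue6_err_proto_handler+0x199/0x280
"""

def reliable_frames(text):
    """Return symbol names of frames that are not marked with '?'."""
    frames = []
    for line in text.splitlines():
        m = TRACE_RE.search(line)
        if m and not m.group("unreliable"):
            frames.append(m.group("sym"))
    return frames

print(reliable_frames(LINES))
```

With the lines above this keeps udpv6_err and gue6_err_proto_handler and drops the "?" entry, which is the IPv6 GUE error path in question.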


Re: kernel panic: stack is corrupted in udp4_lib_lookup2

2019-01-03 Thread Stefano Brivio
On Thu, 3 Jan 2019 12:01:29 -0800
Eric Dumazet  wrote:

> On 01/03/2019 05:07 AM, syzbot wrote:
> > Hello,
> > 
> > syzbot found the following crash on:
> > 
> > HEAD commit:    195303136f19 Merge tag 'kconfig-v4.21-2' of git://git.kern..
> > git tree:   upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=12245d8f40
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=5e7dc790609552d7
> > dashboard link: https://syzkaller.appspot.com/bug?extid=4ad25edc7a33e4ab91e0
> > compiler:   gcc (GCC) 8.0.1 20180413 (experimental)
> > 
> > Unfortunately, I don't have any reproducer for this crash yet.
> > 
> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > Reported-by: syzbot+4ad25edc7a33e4ab9...@syzkaller.appspotmail.com
> > 
> > protocol 88fb is buggy, dev hsr_slave_1
> > protocol 88fb is buggy, dev hsr_slave_0
> > protocol 88fb is buggy, dev hsr_slave_1
> > FAT-fs (loop0): invalid media value (0x00)
> > FAT-fs (loop0): Can't find a valid FAT filesystem
> > > Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: udp4_lib_lookup2+0x7ea/0x7f0 net/ipv4/udp.c:455
> > CPU: 1 PID: 17960 Comm: syz-executor2 Not tainted 4.20.0+ #176
> > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> > Call Trace:
> > Kernel Offset: disabled
> > Rebooting in 86400 seconds..
> > 
> > 
> > ---
> > This bug is generated by a bot. It may contain errors.
> > See https://goo.gl/tpsmEJ for more information about syzbot.
> > syzbot engineers can be reached at syzkal...@googlegroups.com.
> > 
> > syzbot will keep track of this bug report. See:
> > > https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with syzbot.
> 
> Maybe commit 11789039da536fea96c98a40c2b441decf2e7323
> Author: Stefano Brivio 
> Date:   Tue Dec 18 00:13:17 2018 +0100
> 
> fou: Prevent unbounded recursion in GUE error handler
> 
> Forgot to deal with IPv6 ?

Damn, yes. :( Thanks both for pointing that out, patch coming.

Still, I can't be sure this is the same issue.

-- 
Stefano


Re: kernel panic: stack is corrupted in udp4_lib_lookup2

2019-01-03 Thread Eric Dumazet



On 01/03/2019 05:07 AM, syzbot wrote:
> Hello,
> 
> syzbot found the following crash on:
> 
> HEAD commit:    195303136f19 Merge tag 'kconfig-v4.21-2' of git://git.kern..
> git tree:   upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=12245d8f40
> kernel config:  https://syzkaller.appspot.com/x/.config?x=5e7dc790609552d7
> dashboard link: https://syzkaller.appspot.com/bug?extid=4ad25edc7a33e4ab91e0
> compiler:   gcc (GCC) 8.0.1 20180413 (experimental)
> 
> Unfortunately, I don't have any reproducer for this crash yet.
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+4ad25edc7a33e4ab9...@syzkaller.appspotmail.com
> 
> protocol 88fb is buggy, dev hsr_slave_1
> protocol 88fb is buggy, dev hsr_slave_0
> protocol 88fb is buggy, dev hsr_slave_1
> FAT-fs (loop0): invalid media value (0x00)
> FAT-fs (loop0): Can't find a valid FAT filesystem
> Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: udp4_lib_lookup2+0x7ea/0x7f0 net/ipv4/udp.c:455
> CPU: 1 PID: 17960 Comm: syz-executor2 Not tainted 4.20.0+ #176
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Call Trace:
> Kernel Offset: disabled
> Rebooting in 86400 seconds..
> 
> 
> ---
> This bug is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkal...@googlegroups.com.
> 
> syzbot will keep track of this bug report. See:
> https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with syzbot.

Maybe commit 11789039da536fea96c98a40c2b441decf2e7323
Author: Stefano Brivio 
Date:   Tue Dec 18 00:13:17 2018 +0100

fou: Prevent unbounded recursion in GUE error handler

Forgot to deal with IPv6 ?



Re: kernel panic: stack is corrupted in udp4_lib_lookup2

2019-01-03 Thread Stefano Brivio
Hi Willem,

On Thu, 3 Jan 2019 13:41:43 -0600
Willem de Bruijn  wrote:

> On Thu, Jan 3, 2019 at 1:39 PM Willem de Bruijn
>  wrote:
> >
> > On Thu, Jan 3, 2019 at 7:07 AM syzbot
> >  wrote:  
> > >
> > > Hello,
> > >
> > > syzbot found the following crash on:
> > >
> > > > HEAD commit:    195303136f19 Merge tag 'kconfig-v4.21-2' of git://git.kern..
> > > git tree:   upstream
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=12245d8f40
> > > kernel config:  https://syzkaller.appspot.com/x/.config?x=5e7dc790609552d7
> > > > dashboard link: https://syzkaller.appspot.com/bug?extid=4ad25edc7a33e4ab91e0
> > > compiler:   gcc (GCC) 8.0.1 20180413 (experimental)
> > >
> > > Unfortunately, I don't have any reproducer for this crash yet.
> > >
> > > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > > Reported-by: syzbot+4ad25edc7a33e4ab9...@syzkaller.appspotmail.com
> > >
> > > protocol 88fb is buggy, dev hsr_slave_1
> > > protocol 88fb is buggy, dev hsr_slave_0
> > > protocol 88fb is buggy, dev hsr_slave_1
> > > FAT-fs (loop0): invalid media value (0x00)
> > > FAT-fs (loop0): Can't find a valid FAT filesystem
> > > > Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in:
> >
> > This sounds similar to the stack corruption fixed recently in commit
> > e7cc082455cb ("udp: Support for error handlers of tunnels ...").
> >
> > That fix is for ipv4 gue_err(). ipv6 gue6_err() probably needs the same.  
> 
> Correction. The fix is 11789039da ("fou: prevent unbounded recursion
> in GUE error handler")

Yes, I looked into this: the fix for that issue is already in the tree
tested by syzbot, so I think this is unrelated, also because KASAN should
say something before we hit that.

By the way, do you happen to know if objects from kernels tested by
syzbot are stored anywhere? It would be helpful to know for sure what's
at udp4_lib_lookup2+0x7ea.
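
Without the objects, the panic line only tells us where in the function the reported address sits. A minimal sketch of decoding it (parse_frame is a made-up helper name, not part of any tool):

```python
import re

# Matches the "symbol+0xOFFSET/0xSIZE" form printed in kernel oops output.
FRAME_RE = re.compile(
    r"(?P<sym>[A-Za-z_][\w.]*)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)

def parse_frame(text):
    """Return (symbol, offset, function size), or None if nothing matches."""
    m = FRAME_RE.search(text)
    if m is None:
        return None
    return m.group("sym"), int(m.group("off"), 16), int(m.group("size"), 16)

sym, off, size = parse_frame("udp4_lib_lookup2+0x7ea/0x7f0 net/ipv4/udp.c:455")
bytes_before_end = size - off  # distance from the reported address to the end
print(sym, hex(off), hex(size), bytes_before_end)
```

So the address is 6 bytes before the end of udp4_lib_lookup2, which would fit the return address of a stack-protector failure call placed at the tail of the function, but that is a guess until it can be checked against a vmlinux built from the same commit and config (for instance with scripts/faddr2line).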

-- 
Stefano


Re: kernel panic: stack is corrupted in udp4_lib_lookup2

2019-01-03 Thread Willem de Bruijn
On Thu, Jan 3, 2019 at 1:39 PM Willem de Bruijn
 wrote:
>
> On Thu, Jan 3, 2019 at 7:07 AM syzbot
>  wrote:
> >
> > Hello,
> >
> > syzbot found the following crash on:
> >
> > > HEAD commit:    195303136f19 Merge tag 'kconfig-v4.21-2' of git://git.kern..
> > git tree:   upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=12245d8f40
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=5e7dc790609552d7
> > dashboard link: https://syzkaller.appspot.com/bug?extid=4ad25edc7a33e4ab91e0
> > compiler:   gcc (GCC) 8.0.1 20180413 (experimental)
> >
> > Unfortunately, I don't have any reproducer for this crash yet.
> >
> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > Reported-by: syzbot+4ad25edc7a33e4ab9...@syzkaller.appspotmail.com
> >
> > protocol 88fb is buggy, dev hsr_slave_1
> > protocol 88fb is buggy, dev hsr_slave_0
> > protocol 88fb is buggy, dev hsr_slave_1
> > FAT-fs (loop0): invalid media value (0x00)
> > FAT-fs (loop0): Can't find a valid FAT filesystem
> > Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in:
>
> This sounds similar to the stack corruption fixed recently in commit
> e7cc082455cb ("udp: Support for error handlers of tunnels ...").
>
> That fix is for ipv4 gue_err(). ipv6 gue6_err() probably needs the same.

Correction. The fix is 11789039da ("fou: prevent unbounded recursion
in GUE error handler")


Re: kernel panic: stack is corrupted in udp4_lib_lookup2

2019-01-03 Thread Willem de Bruijn
On Thu, Jan 3, 2019 at 7:07 AM syzbot
 wrote:
>
> Hello,
>
> syzbot found the following crash on:
>
> HEAD commit:    195303136f19 Merge tag 'kconfig-v4.21-2' of git://git.kern..
> git tree:   upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=12245d8f40
> kernel config:  https://syzkaller.appspot.com/x/.config?x=5e7dc790609552d7
> dashboard link: https://syzkaller.appspot.com/bug?extid=4ad25edc7a33e4ab91e0
> compiler:   gcc (GCC) 8.0.1 20180413 (experimental)
>
> Unfortunately, I don't have any reproducer for this crash yet.
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+4ad25edc7a33e4ab9...@syzkaller.appspotmail.com
>
> protocol 88fb is buggy, dev hsr_slave_1
> protocol 88fb is buggy, dev hsr_slave_0
> protocol 88fb is buggy, dev hsr_slave_1
> FAT-fs (loop0): invalid media value (0x00)
> FAT-fs (loop0): Can't find a valid FAT filesystem
> Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in:

This sounds similar to the stack corruption fixed recently in commit
e7cc082455cb ("udp: Support for error handlers of tunnels ...").

That fix is for ipv4 gue_err(). ipv6 gue6_err() probably needs the same.