Re: INFO: rcu detected stall in sys_sendfile64 (2)

2019-03-20 Thread Dmitry Vyukov
On Thu, Mar 14, 2019 at 11:52 AM Tetsuo Handa
 wrote:
>
> On 2019/03/14 8:40, Eric Biggers wrote:
> > On Wed, Mar 13, 2019 at 07:43:38AM +0100, 'Dmitry Vyukov' via 
> > syzkaller-bugs wrote:
> >>> Also, humans can sometimes find simpler C reproducers from the
> >>> syzbot-provided ones. It would be nice if syzbot could accept and
> >>> use a user-defined C reproducer for testing.
> >>
> >> It would be more useful to accept patches from these people that make
> >> syzkaller create better reproducers. Manual work is not scalable. We
> >> would need 10 reproducers per day for a dozen OSes (incl. some
> >> private kernels/branches). Anybody is free to run syzkaller manually
> >> and do full manual (perfect) reporting. But for us it became clear
> >> very early that this won't work. Then see above: while that human is
> >> sleeping/on weekend/vacation, syzbot will already have bisected its own
> >> reproducer. Adding a manual reproducer later won't help in any way.
> >> syzkaller already does lots of smart work for reproducers. Let's not
> >> give up on the last mile and switch back to all-manual work.
> >>
> >
> > Well, it's very tough and not many people are familiar with the syzkaller
> > codebase, let alone have time to contribute.
>
> Right. I don't read/write Go programs. I don't have access to environments
> for running syzbot. But instead I try to write kernel patches.
>
> Also, although anybody is free to do full manual (perfect) reporting,
> I can't afford checking such reports posted to e.g. LKML. I can afford
> checking only https://syzkaller.appspot.com/ .
>
> I have seen a Japanese article which explains how to run syzbot, but I felt
> that the article lacks what to do when syzbot finds a bug. If people find a
> crash by running syzbot in their environments, it would be nice if they could
> export the report and import it into https://syzkaller.appspot.com/ (i.e. the
> dashboard acts as a bugzilla).


Problem 1 (smaller). Neither providing a custom program nor manually
specifying the bisection range (as you suggested in another thread,
https://groups.google.com/d/msg/syzkaller-bugs/nFeC8-UG1gg/1OTVIuzBAgAJ)
will make kernel bug bisection reliable. The problems with kernel
bisection run deeper. Consider a bug that is inherently hard to
trigger: even with a hand-crafted reproducer it is still hard to
trigger, and the bisection can diverge. That is what happened in the
other bug: the bisection diverged because the reproducer triggered
another bug. Now consider that this happens within the bisection
range. Even if you provide your own range, it won't help. And there
are lots of other problems, like, say, large ranges where the kernel
build is broken.
This would also introduce problems of its own: e.g. it's very easy to
give syzbot a reproducer that doesn't actually trigger the bug for it
(because you can't match its environment precisely).
Also: if you can't bisect locally and can't test, how do you know the
right range in general? Again, that one bug was a single corner case.
Also: a semi-manual process will still lead to some suboptimal
results, and then other kernel developers will come and ask questions,
and somebody will need to answer them. But in this case syzbot is not
even accountable for what happened.
I don't think there is a simple substitute for a qualified engineer
doing their job (guiding each step of the bisection manually).
It's possible to imagine a very complex workflow (super hard to
implement, test and maintain too) that would allow that. But it
amounts mostly to offloading the build/boot/test of a given
configuration to the cloud. And this brings us to the second problem.

Problem 2. What you are proposing effectively looks like some kind of
custom workload offloading service for kernel developers. Just instead
of console commands (raw cloud VMs) it has a somewhat higher-level
interface (e.g. here is a kernel config, compiler, command line,
sysctls, machine configuration and test case; go build and test it).
I don't think this should be bolted on top of syzbot.
Developing and running syzbot is already a _huge_ amount of work
(frequently thankless). I simply cannot take on developing, testing,
deploying, maintaining and operating another service. And that service
would involve much more complex human interactions, so it would be
much more complex overall.
If such a service is provided, I think it needs to run on the Linux
Foundation infrastructure that runs CI and other testing. Yes, I know,
that infrastructure does not exist. But it would be the right place,
and it would benefit work on all other kernel bugs too. Lots of things
people attribute to syzbot are really not specific to syzbot in any
way. For example, that service would help with bisection of all other
bugs too. And it seems that a much simpler solution would be just to
provide free VMs for developers, because your main point seems to be
"I would like to do something custom, but I don't have resources for
that". This is out of scope for syzbot.
The current syzbot scope is: automating as 

Re: INFO: rcu detected stall in sys_sendfile64 (2)

2019-03-20 Thread Dmitry Vyukov
On Thu, Mar 14, 2019 at 11:52 AM Tetsuo Handa
 wrote:
>
> On 2019/03/14 8:40, Eric Biggers wrote:
> > On Wed, Mar 13, 2019 at 07:43:38AM +0100, 'Dmitry Vyukov' via 
> > syzkaller-bugs wrote:
> >>> Also, humans can sometimes find simpler C reproducers from the
> >>> syzbot-provided ones. It would be nice if syzbot could accept and
> >>> use a user-defined C reproducer for testing.
> >>
> >> It would be more useful to accept patches from these people that make
> >> syzkaller create better reproducers. Manual work is not scalable. We
> >> would need 10 reproducers per day for a dozen OSes (incl. some
> >> private kernels/branches). Anybody is free to run syzkaller manually
> >> and do full manual (perfect) reporting. But for us it became clear
> >> very early that this won't work. Then see above: while that human is
> >> sleeping/on weekend/vacation, syzbot will already have bisected its own
> >> reproducer. Adding a manual reproducer later won't help in any way.
> >> syzkaller already does lots of smart work for reproducers. Let's not
> >> give up on the last mile and switch back to all-manual work.
> >>
> >
> > Well, it's very tough and not many people are familiar with the syzkaller
> > codebase, let alone have time to contribute.
>
> Right. I don't read/write Go programs. I don't have access to environments
> for running syzbot. But instead I try to write kernel patches.
>
> Also, although anybody is free to do full manual (perfect) reporting,
> I can't afford checking such reports posted to e.g. LKML. I can afford
> checking only https://syzkaller.appspot.com/ .
>
> I have seen a Japanese article which explains how to run syzbot, but I felt
> that the article lacks what to do when syzbot finds a bug. If people find a
> crash by running syzbot in their environments, it would be nice if they could
> export the report and import it into https://syzkaller.appspot.com/ (i.e. the
> dashboard acts as a bugzilla).
>
> >   But having simplified a lot of
> > the syzkaller reproducers manually, the main things I do are:
>
> Yes. I'm doing similar things. Other things not listed here are:
>
>   Try to remove a syscall() which passes EOF as the fd argument, for it
>   should be unrelated to the problem unless such a call affects subtle timing.
>
>   Try to remove code for testing fuse / tun etc. if the problem seems to be
>   unrelated to fuse / tun etc.
>
> syzbot is pleased once it finds one C reproducer, but I wish that syzbot
> would continue trying to find smaller C reproducers by e.g. eliminating
> unrelated calls.
>
> >
> > - Replace bare system calls with proper C library calls.  For example:
> >
> >   #include <sys/syscall.h>
> >
> >   syscall(__NR_socket, 0xa, 6, 0);
> >
> > becomes:
> >
> >   #include <sys/socket.h>
> >
> >   socket(AF_INET6, SOCK_DCCP, 0);
>
> Yes. It would be nice if C reproducers were provided using symbols. I run
> syzbot-provided C reproducers under strace because strace gives me more
> hints about symbols and structures.


I will answer re reproducers first.

Thanks for the suggestions, I've filed
https://github.com/google/syzkaller/issues/1070 for this. Lots of them
are implementable within the current framework. Things on kernel
mailing lists (suggestions, bug reports, patches) get lost very
quickly.

As far as I can see, most of them are cosmetic (not saying that's not
useful, just that it won't affect bisection results).
From my experience, the most powerful simplifications are possible only
when I have already root-caused the bug and understand its mechanics.
Then it's possible to reorder syscalls, remove all/most of the
threading, etc. But at that point bisection is not so useful anymore.

Some of them can alter the load on the kernel (e.g. using libc structs
and syscall wrappers), which can lead to triggering a different bug...


Re: INFO: rcu detected stall in sys_sendfile64 (2)

2019-03-14 Thread Tetsuo Handa
On 2019/03/14 8:40, Eric Biggers wrote:
> On Wed, Mar 13, 2019 at 07:43:38AM +0100, 'Dmitry Vyukov' via syzkaller-bugs 
> wrote:
>>> Also, humans can sometimes find simpler C reproducers from the
>>> syzbot-provided ones. It would be nice if syzbot could accept and
>>> use a user-defined C reproducer for testing.
>>
>> It would be more useful to accept patches from these people that make
>> syzkaller create better reproducers. Manual work is not scalable. We
>> would need 10 reproducers per day for a dozen OSes (incl. some
>> private kernels/branches). Anybody is free to run syzkaller manually
>> and do full manual (perfect) reporting. But for us it became clear
>> very early that this won't work. Then see above: while that human is
>> sleeping/on weekend/vacation, syzbot will already have bisected its own
>> reproducer. Adding a manual reproducer later won't help in any way.
>> syzkaller already does lots of smart work for reproducers. Let's not
>> give up on the last mile and switch back to all-manual work.
>>
> 
> Well, it's very tough and not many people are familiar with the syzkaller
> codebase, let alone have time to contribute.

Right. I don't read/write Go programs. I don't have access to environments
for running syzbot. But instead I try to write kernel patches.

Also, although anybody is free to do full manual (perfect) reporting,
I can't afford checking such reports posted to e.g. LKML. I can afford
checking only https://syzkaller.appspot.com/ .

I have seen a Japanese article which explains how to run syzbot, but I felt
that the article lacks what to do when syzbot finds a bug. If people find a
crash by running syzbot in their environments, it would be nice if they could
export the report and import it into https://syzkaller.appspot.com/ (i.e. the
dashboard acts as a bugzilla).

>   But having simplified a lot of
> the syzkaller reproducers manually, the main things I do are:

Yes. I'm doing similar things. Other things not listed here are:

  Try to remove a syscall() which passes EOF as the fd argument, for it
  should be unrelated to the problem unless such a call affects subtle timing.

  Try to remove code for testing fuse / tun etc. if the problem seems to be
  unrelated to fuse / tun etc.

syzbot is pleased once it finds one C reproducer, but I wish that syzbot
would continue trying to find smaller C reproducers by e.g. eliminating
unrelated calls.
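The call-elimination loop described above can be sketched as a simple greedy reduction. This is an illustrative sketch only, not syzkaller's actual minimization code; still_crashes is a hypothetical callback that would rebuild and rerun the candidate reproducer and report whether the kernel bug still triggers:

```c
#include <stddef.h>

/* Greedily drop one call at a time, keeping each drop only if the crash
 * still reproduces without it. keep[] marks which calls survive; the
 * return value is the number of calls kept. */
static size_t minimize_calls(size_t ncalls, int keep[],
                             int (*still_crashes)(const int keep[], size_t n))
{
    size_t kept = ncalls;
    for (size_t i = 0; i < ncalls; i++) {
        if (!keep[i])
            continue;
        keep[i] = 0;                      /* tentatively drop call i */
        if (still_crashes(keep, ncalls))
            kept--;                       /* crash persists: keep the drop */
        else
            keep[i] = 1;                  /* crash gone: call i was needed */
    }
    return kept;
}
```

In practice each still_crashes invocation is expensive (a full build/boot/run cycle), which is one reason syzbot stops after the first working reproducer.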

> 
> - Replace bare system calls with proper C library calls.  For example:
>
>   #include <sys/syscall.h>
>
>   syscall(__NR_socket, 0xa, 6, 0);
>
> becomes:
>
>   #include <sys/socket.h>
>
>   socket(AF_INET6, SOCK_DCCP, 0);

Yes. It would be nice if C reproducers were provided using symbols. I run
syzbot-provided C reproducers under strace because strace gives me more
hints about symbols and structures.



Re: INFO: rcu detected stall in sys_sendfile64 (2)

2019-03-13 Thread Eric Biggers
On Wed, Mar 13, 2019 at 07:43:38AM +0100, 'Dmitry Vyukov' via syzkaller-bugs 
wrote:
> > Also, humans can sometimes find simpler C reproducers from the
> > syzbot-provided ones. It would be nice if syzbot could accept and
> > use a user-defined C reproducer for testing.
> 
> It would be more useful to accept patches from these people that make
> syzkaller create better reproducers. Manual work is not scalable. We
> would need 10 reproducers per day for a dozen OSes (incl. some
> private kernels/branches). Anybody is free to run syzkaller manually
> and do full manual (perfect) reporting. But for us it became clear
> very early that this won't work. Then see above: while that human is
> sleeping/on weekend/vacation, syzbot will already have bisected its own
> reproducer. Adding a manual reproducer later won't help in any way.
> syzkaller already does lots of smart work for reproducers. Let's not
> give up on the last mile and switch back to all-manual work.
> 

Well, it's very tough and not many people are familiar with the syzkaller
codebase, let alone have time to contribute.  But having simplified a lot of
the syzkaller reproducers manually, the main things I do are:

- Replace bare system calls with proper C library calls.  For example:

#include <sys/syscall.h>

syscall(__NR_socket, 0xa, 6, 0);

becomes:

#include <sys/socket.h>

socket(AF_INET6, SOCK_DCCP, 0);

- Do the same for structs.  Use the appropriate C header rather than filling in
  each struct manually.  For example:

*(uint16_t*)0x2000 = 0xa;
*(uint16_t*)0x2002 = htobe16(0x4e20);
*(uint32_t*)0x2004 = 0;
*(uint8_t*)0x2008 = 0;
*(uint8_t*)0x2009 = 0;
*(uint8_t*)0x200a = 0;
*(uint8_t*)0x200b = 0;
*(uint8_t*)0x200c = 0;
*(uint8_t*)0x200d = 0;
*(uint8_t*)0x200e = 0;
*(uint8_t*)0x200f = 0;
*(uint8_t*)0x2010 = 0;
*(uint8_t*)0x2011 = 0;
*(uint8_t*)0x2012 = 0;
*(uint8_t*)0x2013 = 0;
*(uint8_t*)0x2014 = 0;
*(uint8_t*)0x2015 = 0;
*(uint8_t*)0x2016 = 0;
*(uint8_t*)0x2017 = 0;
*(uint32_t*)0x2018 = 0;

becomes:

struct sockaddr_in6 addr = { .sin6_family = AF_INET6,
                             .sin6_port = htobe16(0x4e20) };

- Put arguments on the stack rather than in a mmap'd region, if possible.

- Simplify any calls to the helper functions that syzkaller emits, e.g.
  syz_open_dev(), syz_kvm_setup_vcpu(), or the networking setup stuff.  Usually
  the reproducer needs a small subset of the functionality to work.

- For multithreaded reproducers, try to incrementally simplify the threading
  strategy.  For example, reduce the number of threads by combining operations.
  Also try running the operations in loops.  Also, using fork() can often result
  in a simpler reproducer than pthreads.

- Instead of using the 'r[]' array to hold all integer return values, give them
  appropriate names.

- Remove duplicate #includes.

- Considering the actual kernel code and the bug, if possible find a different
  way to trigger the same bug that's simpler or more reliable.  If the problem
  is obvious it may be possible to jump right to this step from the beginning.
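The fork()-based strategy from the list above can be sketched roughly as follows. op_a and op_b are hypothetical stand-ins for whatever two operations race in a given reproducer; this is an illustrative skeleton, not code taken from any actual syzkaller reproducer:

```c
#include <sys/wait.h>
#include <unistd.h>

static void op_a(void) { /* e.g. the sendfile() side of the race */ }
static void op_b(void) { /* e.g. the write() side of the race */ }

/* Race op_a (parent) against op_b (child) `iters` times. fork() gives
 * each attempt a fresh process, which often yields a simpler reproducer
 * than an equivalent pthread setup. Returns 0 when every child was
 * spawned and reaped successfully. */
static int run_race(int iters)
{
    for (int i = 0; i < iters; i++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0) {          /* child: one side of the race */
            op_b();
            _exit(0);
        }
        op_a();                  /* parent: the other side */
        if (waitpid(pid, 0, 0) != pid)
            return -1;
    }
    return 0;
}
```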

Some gotchas:

- fault-nth injections are fragile, since the number of memory allocations in a
  particular system call varies by kernel config and kernel version.
  Incrementing n starting from 1 is more reliable.

- Some of the perf_event_open() reproducers are fragile because they hardcode a
  trace event ID, which can change in every kernel version.  Reading the trace
  event ID from /sys/kernel/debug/tracing/events/ is more reliable.

- Reproducers using the KVM API sometimes only work on certain processors (e.g.
  Intel but not AMD) or even depend on the host kernel.

- Reproducers that access the local filesystem sometimes assume that it's ext4.


Re: INFO: rcu detected stall in sys_sendfile64 (2)

2019-03-13 Thread Dmitry Vyukov
On Wed, Mar 13, 2019 at 5:37 PM Theodore Ts'o  wrote:
>
> On Wed, Mar 13, 2019 at 07:43:38AM +0100, Dmitry Vyukov wrote:
> > It would be more useful to accept patches from these people that make
> > syzkaller create better reproducers. Manual work is not scalable. We
> > would need 10 reproducers per day for a dozen OSes (incl. some
> > private kernels/branches). Anybody is free to run syzkaller manually
> > and do full manual (perfect) reporting. But for us it became clear
> > very early that this won't work. Then see above: while that human is
> > sleeping/on weekend/vacation, syzbot will already have bisected its own
> > reproducer. Adding a manual reproducer later won't help in any way.
> > syzkaller already does lots of smart work for reproducers. Let's not
> > give up on the last mile and switch back to all-manual work.
>
> I suspect a scalable solution that would significantly improve things
> is one where Syzbot tries N times for a "good" result to make sure
> it's not a flaky pass.  N could either be hard-coded to some value
> like 8 or 10, or Syzbot could experimentally try to figure out how
> reliable the reproducer happens to be, and figure out what an ideal
> "N" value should be for a particular reproducer.

It currently tries 8 times, see e.g.:
https://syzkaller.appspot.com/text?tag=Log=13354d9d20


Re: INFO: rcu detected stall in sys_sendfile64 (2)

2019-03-13 Thread Theodore Ts'o
On Wed, Mar 13, 2019 at 07:43:38AM +0100, Dmitry Vyukov wrote:
> It would be more useful to accept patches from these people that make
> syzkaller create better reproducers. Manual work is not scalable. We
> would need 10 reproducers per day for a dozen OSes (incl. some
> private kernels/branches). Anybody is free to run syzkaller manually
> and do full manual (perfect) reporting. But for us it became clear
> very early that this won't work. Then see above: while that human is
> sleeping/on weekend/vacation, syzbot will already have bisected its own
> reproducer. Adding a manual reproducer later won't help in any way.
> syzkaller already does lots of smart work for reproducers. Let's not
> give up on the last mile and switch back to all-manual work.

I suspect a scalable solution that would significantly improve things
is one where Syzbot tries N times for a "good" result to make sure
it's not a flaky pass.  N could either be hard-coded to some value
like 8 or 10, or Syzbot could experimentally try to figure out how
reliable the reproducer happens to be, and figure out what an ideal
"N" value should be for a particular reproducer.

- Ted
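Ted's suggestion can be made concrete with a small back-of-the-envelope calculation. This is an illustrative sketch only, not syzbot code: if the reproducer triggers the bug with per-run probability p, then N clean runs wrongly mark a buggy commit "good" with probability (1-p)^N, and N can be chosen to push that below a threshold eps:

```c
/* Smallest N such that the chance of N clean runs on a buggy commit,
 * (1 - p)^N, drops below eps; p is the per-run reproduction
 * probability. Capped at 1000 to guard against tiny p. */
static int runs_needed(double p, double eps)
{
    double miss = 1.0;   /* probability all runs so far missed the bug */
    int n = 0;
    while (miss > eps && n < 1000) {
        miss *= 1.0 - p;
        n++;
    }
    return n;
}
```

For a reproducer that fires half the time, runs_needed(0.5, 0.01) gives 7, in the same ballpark as a hard-coded 8 or 10.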


Re: INFO: rcu detected stall in sys_sendfile64 (2)

2019-03-13 Thread Dmitry Vyukov
On Tue, Mar 12, 2019 at 10:11 PM Tetsuo Handa
 wrote:
>
> On 2019/03/13 2:15, Dmitry Vyukov wrote:
> >> Also, this bisection is finding multiple different crash patterns, which
> >> suggests that the crashed tests are not giving correct feedback to syzbot.
> >
> > Treating different crashes as just "crash" is intended. Kernel bugs
> > can manifest in very different ways.
> > If you want some fun, search for "bpf: sockhash, disallow bpf_tcp_close
> > and update in parallel" in https://syzkaller.appspot.com/?fixed=upstream
> > It led to 50+ different failure modes.
> >
>
> But syzbot already found a rather simple C reproducer
> ( https://syzkaller.appspot.com/text?tag=ReproC=116fc7a8c0 ) for this bug.
> Was this reproducer used for bisection?

The C reproducer used for bisection is provided as "C reproducer" in
the bisection report.

> I guess that if this reproducer was used,
> syzbot did not hit "WARNING: ODEBUG bug in netdev_freemem" cases.

Maybe. But we won't have more than one in the future. Currently syzbot
bisects over a backlog of crashes, some of which accumulated multiple
reproducers over weeks/months/years. When it bisects newly reported
bugs as they are found, there will be only one reproducer. E.g. the
two for this bug were found within a month.

> Also, humans can sometimes find simpler C reproducers from the
> syzbot-provided ones. It would be nice if syzbot could accept and
> use a user-defined C reproducer for testing.

It would be more useful to accept patches from these people that make
syzkaller create better reproducers. Manual work is not scalable. We
would need 10 reproducers per day for a dozen OSes (incl. some
private kernels/branches). Anybody is free to run syzkaller manually
and do full manual (perfect) reporting. But for us it became clear
very early that this won't work. Then see above: while that human is
sleeping/on weekend/vacation, syzbot will already have bisected its own
reproducer. Adding a manual reproducer later won't help in any way.
syzkaller already does lots of smart work for reproducers. Let's not
give up on the last mile and switch back to all-manual work.


Re: INFO: rcu detected stall in sys_sendfile64 (2)

2019-03-12 Thread Tetsuo Handa
On 2019/03/13 2:15, Dmitry Vyukov wrote:
>> Also, this bisection is finding multiple different crash patterns, which
>> suggests that the crashed tests are not giving correct feedback to syzbot.
> 
> Treating different crashes as just "crash" is intended. Kernel bugs
> can manifest in very different ways.
> If you want some fun, search for "bpf: sockhash, disallow bpf_tcp_close
> and update in parallel" in https://syzkaller.appspot.com/?fixed=upstream
> It led to 50+ different failure modes.
> 

But syzbot already found a rather simple C reproducer
( https://syzkaller.appspot.com/text?tag=ReproC=116fc7a8c0 ) for this bug.
Was this reproducer used for bisection? I guess that if this reproducer was
used, syzbot did not hit the "WARNING: ODEBUG bug in netdev_freemem" cases.

Also, humans can sometimes find simpler C reproducers from the
syzbot-provided ones. It would be nice if syzbot could accept and use a
user-defined C reproducer for testing.



Re: INFO: rcu detected stall in sys_sendfile64 (2)

2019-03-12 Thread Dmitry Vyukov
On Tue, Mar 12, 2019 at 3:30 PM Tetsuo Handa
 wrote:
>
> (Moving most recipients to bcc: in order to avoid flooding.)
>
> On 2019/03/12 13:08, Al Viro wrote:
> > Umm...  Might be a good idea to add some plausibility filters - it is,
> > in theory, possible that adding a line in a comment changes behaviour
> > (without compiler bugs, even - playing with __LINE__ is all it would
> > take), but the odds that it's _not_ a false positive are very low.
>
> Well, 108 out of 168 tests done during this bisection failed to run at all.
> With such a high failure ratio, it is possible that by chance no crash
> happened during the few tests for a specific commit, causing a wrong
> bisection result. I expect that when trying to conclude "git bisect good"
> for a specific commit, the tests should be repeated until no crash happens
> during 8 successful tests.

Added to https://github.com/google/syzkaller/issues/1051:

Tetsuo points out that if lots of the tests (say, 7/8) failed with infra
problems, then we should retry/skip or something. That nullifies the
benefit of having multiple independent tests.

Thanks.
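The retry/skip rule could look something like the following sketch. The threshold and the names here are purely illustrative assumptions, not syzbot's actual policy:

```c
/* Decide what to do with one bisection step: trust it only if at least
 * half of its runs actually executed (the rest failed with
 * infrastructure errors); otherwise retry while budget remains, else
 * skip the commit, as "git bisect skip" would. */
enum step_verdict { STEP_OK, STEP_RETRY, STEP_SKIP };

static enum step_verdict judge_step(int total_runs, int infra_failures,
                                    int retries_left)
{
    int executed = total_runs - infra_failures;
    if (2 * executed >= total_runs)     /* at least half the runs ran */
        return STEP_OK;
    return retries_left > 0 ? STEP_RETRY : STEP_SKIP;
}
```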

> Also, this bisection is finding multiple different crash patterns, which
> suggests that the crashed tests are not giving correct feedback to syzbot.

Treating different crashes as just "crash" is intended. Kernel bugs
can manifest in very different ways.
If you want some fun, search for "bpf: sockhash, disallow bpf_tcp_close
and update in parallel" in https://syzkaller.appspot.com/?fixed=upstream
It led to 50+ different failure modes.

> $ grep -F 'run #' bisect.txt\?x\=1322028320 | wc -l
> 168
> $ grep -F 'Connection timed out' bisect.txt\?x\=1322028320 | wc -l
> 108
> $ grep -F 'crashed' bisect.txt\?x\=1322028320
> run #0: crashed: WARNING: ODEBUG bug in netdev_freemem
> run #0: crashed: WARNING: ODEBUG bug in netdev_freemem
> run #1: crashed: INFO: rcu detected stall in corrupted
> run #0: crashed: INFO: rcu detected stall in sys_sendfile64
> run #0: crashed: INFO: rcu detected stall in corrupted
> run #4: crashed: INFO: rcu detected stall in sys_sendfile64
> run #0: crashed: INFO: rcu detected stall in corrupted
> run #1: crashed: INFO: rcu detected stall in corrupted
> run #0: crashed: INFO: rcu detected stall in ext4_file_write_iter
> run #0: crashed: INFO: rcu detected stall in corrupted
> run #0: crashed: INFO: rcu detected stall in sendfile64
> run #0: crashed: INFO: rcu detected stall in corrupted
> run #1: crashed: INFO: rcu detected stall in sendfile64
> run #0: crashed: INFO: rcu detected stall in ext4_file_write_iter
> run #1: crashed: INFO: rcu detected stall in corrupted
> run #0: crashed: INFO: rcu detected stall in corrupted
> run #0: crashed: INFO: rcu detected stall in corrupted
> run #0: crashed: INFO: rcu detected stall in corrupted
> run #1: crashed: INFO: rcu detected stall in corrupted
> run #0: crashed: INFO: rcu detected stall in corrupted
> run #3: crashed: INFO: rcu detected stall in corrupted
> run #0: crashed: INFO: rcu detected stall in do_iter_write


Re: INFO: rcu detected stall in sys_sendfile64 (2)

2019-03-12 Thread Tetsuo Handa
(Moving most recipients to bcc: in order to avoid flooding.)

On 2019/03/12 13:08, Al Viro wrote:
> Umm...  Might be a good idea to add some plausibility filters - it is,
> in theory, possible that adding a line in a comment changes behaviour
> (without compiler bugs, even - playing with __LINE__ is all it would
> take), but the odds that it's _not_ a false positive are very low.

Well, 108 out of 168 tests done during this bisection failed to run at all.
With such a high failure ratio, it is possible that by chance no crash
happened during the few tests for a specific commit, causing a wrong
bisection result. I expect that when trying to conclude "git bisect good"
for a specific commit, the tests should be repeated until no crash happens
during 8 successful tests.

Also, this bisection is finding multiple different crash patterns, which
suggests that the crashed tests are not giving correct feedback to syzbot.

$ grep -F 'run #' bisect.txt\?x\=1322028320 | wc -l
168
$ grep -F 'Connection timed out' bisect.txt\?x\=1322028320 | wc -l
108
$ grep -F 'crashed' bisect.txt\?x\=1322028320
run #0: crashed: WARNING: ODEBUG bug in netdev_freemem
run #0: crashed: WARNING: ODEBUG bug in netdev_freemem
run #1: crashed: INFO: rcu detected stall in corrupted
run #0: crashed: INFO: rcu detected stall in sys_sendfile64
run #0: crashed: INFO: rcu detected stall in corrupted
run #4: crashed: INFO: rcu detected stall in sys_sendfile64
run #0: crashed: INFO: rcu detected stall in corrupted
run #1: crashed: INFO: rcu detected stall in corrupted
run #0: crashed: INFO: rcu detected stall in ext4_file_write_iter
run #0: crashed: INFO: rcu detected stall in corrupted
run #0: crashed: INFO: rcu detected stall in sendfile64
run #0: crashed: INFO: rcu detected stall in corrupted
run #1: crashed: INFO: rcu detected stall in sendfile64
run #0: crashed: INFO: rcu detected stall in ext4_file_write_iter
run #1: crashed: INFO: rcu detected stall in corrupted
run #0: crashed: INFO: rcu detected stall in corrupted
run #0: crashed: INFO: rcu detected stall in corrupted
run #0: crashed: INFO: rcu detected stall in corrupted
run #1: crashed: INFO: rcu detected stall in corrupted
run #0: crashed: INFO: rcu detected stall in corrupted
run #3: crashed: INFO: rcu detected stall in corrupted
run #0: crashed: INFO: rcu detected stall in do_iter_write


Re: INFO: rcu detected stall in sys_sendfile64 (2)

2019-01-19 Thread Tetsuo Handa
On 2019/01/19 20:41, Dmitry Vyukov wrote:
> On Sat, Jan 19, 2019 at 12:32 PM syzbot
>  wrote:
>>
>> Hello,
>>
>> syzbot found the following crash on:
>>
>> HEAD commit:2339e91d0e66 Merge tag 'media/v5.0-1' of git://git.kernel...
>> git tree:   upstream
>> console output: https://syzkaller.appspot.com/x/log.txt?x=175f2638c0
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=abc3dc9b7a900258
>> dashboard link: https://syzkaller.appspot.com/bug?extid=1505c80c74256c6118a5
>> compiler:   gcc (GCC) 9.0.0 20181231 (experimental)
>> syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=12c4dc28c0
>> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=15df4108c0
> 
> Looking at the reproducer it looks like something with scheduler as it
> involves perf_event_open and sched_setattr. So +Peter and Mingo.
> Is it the same root cause as the other stalls involving sched_setattr?

Yes, I think sched_setattr() is involved in this problem.

The reproducers from "BUG: workqueue lockup (4)" at
https://syzkaller.appspot.com/text?tag=ReproC=13ec31a540
involve sched_setattr(SCHED_DEADLINE), and
https://syzkaller.appspot.com/text?tag=ReproC=104da690c0
also involves sched_setattr().



Re: INFO: rcu detected stall in sys_sendfile64 (2)

2019-01-19 Thread Dmitry Vyukov
On Sat, Jan 19, 2019 at 12:32 PM syzbot
 wrote:
>
> Hello,
>
> syzbot found the following crash on:
>
> HEAD commit:2339e91d0e66 Merge tag 'media/v5.0-1' of git://git.kernel...
> git tree:   upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=175f2638c0
> kernel config:  https://syzkaller.appspot.com/x/.config?x=abc3dc9b7a900258
> dashboard link: https://syzkaller.appspot.com/bug?extid=1505c80c74256c6118a5
> compiler:   gcc (GCC) 9.0.0 20181231 (experimental)
> syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=12c4dc28c0
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=15df4108c0

Looking at the reproducer, it looks like something scheduler-related, as
it involves perf_event_open and sched_setattr. So +Peter and Mingo.
Is it the same root cause as the other stalls involving sched_setattr?

> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+1505c80c74256c611...@syzkaller.appspotmail.com
>
> audit: type=1400 audit(1547895693.874:36): avc:  denied  { map } for
> pid=8427 comm="syz-executor786" path="/root/syz-executor786610373"
> dev="sda1" ino=1426 scontext=unconfined_u:system_r:insmod_t:s0-s0:c0.c1023
> tcontext=unconfined_u:object_r:user_home_t:s0 tclass=file permissive=1
> hrtimer: interrupt took 42996 ns
> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> rcu:(detected by 0, t=10502 jiffies, g=6469, q=3)
> rcu: All QSes seen, last rcu_preempt kthread activity 10502
> (4295051508-4295041006), jiffies_till_next_fqs=1, root ->qsmask 0x0
> syz-executor786 R  running task21544  8437   8433 0x
> Call Trace:
>   
>   sched_show_task kernel/sched/core.c:5293 [inline]
>   sched_show_task.cold+0x273/0x2d5 kernel/sched/core.c:5268
INFO: rcu detected stall in sys_sendfile64 (2)

2019-01-19 Thread syzbot

Hello,

syzbot found the following crash on:

HEAD commit:    2339e91d0e66 Merge tag 'media/v5.0-1' of git://git.kernel...
git tree:   upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=175f2638c0
kernel config:  https://syzkaller.appspot.com/x/.config?x=abc3dc9b7a900258
dashboard link: https://syzkaller.appspot.com/bug?extid=1505c80c74256c6118a5
compiler:   gcc (GCC) 9.0.0 20181231 (experimental)
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=12c4dc28c0
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=15df4108c0

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+1505c80c74256c611...@syzkaller.appspotmail.com

audit: type=1400 audit(1547895693.874:36): avc:  denied  { map } for   
pid=8427 comm="syz-executor786" path="/root/syz-executor786610373"  
dev="sda1" ino=1426 scontext=unconfined_u:system_r:insmod_t:s0-s0:c0.c1023  
tcontext=unconfined_u:object_r:user_home_t:s0 tclass=file permissive=1

hrtimer: interrupt took 42996 ns
rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu: (detected by 0, t=10502 jiffies, g=6469, q=3)
rcu: All QSes seen, last rcu_preempt kthread activity 10502  
(4295051508-4295041006), jiffies_till_next_fqs=1, root ->qsmask 0x0

syz-executor786 R  running task    21544  8437   8433 0x
Call Trace:
 <IRQ>
 sched_show_task kernel/sched/core.c:5293 [inline]
 sched_show_task.cold+0x273/0x2d5 kernel/sched/core.c:5268
 print_other_cpu_stall.cold+0x7f2/0x8bb kernel/rcu/tree.c:1301
 check_cpu_stall kernel/rcu/tree.c:1429 [inline]
 rcu_pending kernel/rcu/tree.c:3018 [inline]
 rcu_check_callbacks+0xf36/0x1380 kernel/rcu/tree.c:2521
 update_process_times+0x32/0x80 kernel/time/timer.c:1635
 tick_sched_handle+0xa2/0x190 kernel/time/tick-sched.c:161
 tick_sched_timer+0x47/0x130 kernel/time/tick-sched.c:1271
 __run_hrtimer kernel/time/hrtimer.c:1389 [inline]
 __hrtimer_run_queues+0x3a7/0x1050 kernel/time/hrtimer.c:1451
 hrtimer_interrupt+0x314/0x770 kernel/time/hrtimer.c:1509
 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1035 [inline]
 smp_apic_timer_interrupt+0x18d/0x760 arch/x86/kernel/apic/apic.c:1060
 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:807
 </IRQ>
RIP: 0010:preempt_count arch/x86/include/asm/preempt.h:26 [inline]
RIP: 0010:check_kcov_mode kernel/kcov.c:67 [inline]
RIP: 0010:write_comp_data+0x9/0x70 kernel/kcov.c:122
Code: 12 00 00 8b 80 dc 12 00 00 48 8b 11 48 83 c2 01 48 39 d0 76 07 48 89  
34 d1 48 89 11 5d c3 0f 1f 00 65 4c 8b 04 25 40 ee 01 00 <65> 8b 05 80 ee  
7f 7e a9 00 01 1f 00 75 51 41 8b 80 d8 12 00 00 83

RSP: 0018:888080466f58 EFLAGS: 0246 ORIG_RAX: ff13
RAX: 226d RBX: 888080467140 RCX: 87f08bdf
RDX: 0002 RSI: 0002 RDI: 0007
RBP: 888080466f60 R08: 888096338480 R09: ed1015cc5b90
R10: ed1015cc5b8f R11: 8880ae62dc7b R12: 11101008cdf0
R13: 888092b64102 R14: 888092b64102 R15: 888080467158
 xa_is_node include/linux/xarray.h:946 [inline]
 xas_start+0x1cf/0x720 lib/xarray.c:183
 xas_load+0x21/0x160 lib/xarray.c:227
 find_get_entry+0x350/0x10a0 mm/filemap.c:1476
 pagecache_get_page+0xe6/0x1020 mm/filemap.c:1579
 find_get_page include/linux/pagemap.h:272 [inline]
 generic_file_buffered_read mm/filemap.c:2076 [inline]
 generic_file_read_iter+0x7b2/0x2d40 mm/filemap.c:2350
 ext4_file_read_iter+0x180/0x3c0 fs/ext4/file.c:77
 call_read_iter include/linux/fs.h:1856 [inline]
 generic_file_splice_read+0x5c4/0xa90 fs/splice.c:308
 do_splice_to+0x12a/0x190 fs/splice.c:880
 splice_direct_to_actor+0x31b/0x9d0 fs/splice.c:957
 do_splice_direct+0x2c7/0x420 fs/splice.c:1066
 do_sendfile+0x61a/0xe60 fs/read_write.c:1436
 __do_sys_sendfile64 fs/read_write.c:1491 [inline]
 __se_sys_sendfile64 fs/read_write.c:1483 [inline]
 __x64_sys_sendfile64+0x15a/0x240 fs/read_write.c:1483
 do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x446a19
Code: e8 dc e6 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7  
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff  
ff 0f 83 4b 07 fc ff c3 66 2e 0f 1f 84 00 00 00 00

RSP: 002b:7f53e33cfda8 EFLAGS: 0297 ORIG_RAX: 0028
RAX: ffda RBX: 006dcc28 RCX: 00446a19
RDX: 2380 RSI: 0003 RDI: 0003
RBP: 006dcc20 R08:  R09: 
R10: 8080fffe R11: 0297 R12: 006dcc2c
R13: 8080fffe R14: 7f53e33d09c0 R15: 006dcd2c
rcu: rcu_preempt kthread starved for 10502 jiffies! g6469 f0x2  
RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1

rcu: RCU grace-period kthread stack dump:
rcu_preempt     R  running task    26200    10      2 0x8000
Call Trace:
 context_switch kernel/sched/core.c:2831 [inline]
 __schedule+0x897/0x1e60 kernel/sched/core.c:3472
 schedule+0xfe/0x350 kernel/sched/core.c:3516