Re: INFO: rcu detected stall in sys_sendfile64 (2)
On Thu, Mar 14, 2019 at 11:52 AM Tetsuo Handa wrote:
> On 2019/03/14 8:40, Eric Biggers wrote:
> > On Wed, Mar 13, 2019 at 07:43:38AM +0100, 'Dmitry Vyukov' via syzkaller-bugs wrote:
> >>> Also, humans can sometimes find simpler C reproducers from syzbot provided reproducers. It would be nice if syzbot can accept and use a user defined C reproducer for testing.
> >>
> >> It would be more useful to accept patches that make syzkaller create better reproducers from these people. Manual work is not scalable. We would need 10 reproducers per day for a dozen OSes (incl. some private kernels/branches). Anybody is free to run syzkaller manually and do full manual (perfect) reporting. But for us it became clear very early that it won't work. Then see above: while that human is sleeping/on weekend/vacation, syzbot will already bisect its own reproducer. Adding a manual reproducer later won't help in any way. syzkaller already does lots of smart work for reproducers. Let's not give up on the last mile and switch back to all manual work.
> >
> > Well, it's very tough and not many people are familiar with the syzkaller codebase, let alone have time to contribute.
>
> Right. I don't read/write Go programs. I don't have access to environments for running syzbot. But instead I try to write kernel patches.
>
> Also, although anybody is free to do full manual (perfect) reporting, I can't afford checking such reports posted to e.g. LKML. I can afford checking only https://syzkaller.appspot.com/ .
>
> I have seen a Japanese article which explains how to run syzbot. But I felt that that article lacks what to do if syzbot found a bug. If people find a crash by running syzbot in their environments, it would be nice if they could export the report and import it to https://syzkaller.appspot.com/ (i.e. the dashboard acts as a bugzilla).

Problem 1 (smaller).
Neither providing a custom program nor manually specifying a bisection range (as you suggested in another thread https://groups.google.com/d/msg/syzkaller-bugs/nFeC8-UG1gg/1OTVIuzBAgAJ) will make kernel bug bisection reliable. The problems with kernel bisection are deeper. Consider a bug that is inherently hard to trigger: even if one provides one's own reproducer, it is still hard to trigger and bisection can diverge. What happened in the other bug is that bisection diverged because the reproducer triggered another bug. Now consider that this happens within the bisection range: even if you give your own range, it won't help. And there are lots of other problems, like, say, large ranges where the kernel build is broken.

This would also introduce problems of its own: e.g. it's very easy to give syzbot a reproducer that doesn't actually trigger the bug for it (because you can't match its environment precisely). Also: if you can't bisect locally and can't test, how do you know the right range in general? Again, that one bug was a single corner case. Also: a semi-manual process will still lead to some suboptimal results, and then other kernel developers will come and ask questions, and somebody will need to answer them. But in this case syzbot is not even accountable for what happened.

I don't think there is a simple substitute for a qualified engineer doing their job (guiding each step of bisection manually). It's possible to imagine a very complex workflow (super hard to implement, test and maintain too) that would allow that. But it becomes mostly offloading build/boot/test of a given configuration to the cloud. And this brings us to the second problem.

Problem 2.

What you are proposing effectively looks like some kind of custom workload offloading service for kernel developers. Just instead of console commands (raw cloud VMs) it has a somewhat higher-level interface (e.g. here is a kernel config, compiler, command line, sysctls, machine configuration and test case; go build and test it).
I don't think this should be bolted on top of syzbot. Developing and running syzbot is already a _huge_ amount of work (frequently ungrateful). I simply cannot take on developing, testing, deploying, maintaining and operating another service. And that service would involve much more complex human interactions, so it would be much more complex overall.

If such a service is provided, I think it needs to run on the Linux Foundation infrastructure that runs CI and other testing. Yes, I know, it does not exist. But that would be the right place. It would benefit work on all other kernel bugs too. Lots of things people attribute to syzbot are really not specific to syzbot in any way. For example, that service would help with bisection of all other bugs too.

And it seems that a much simpler solution would be to just provide free VMs for developers, because your main point seems to be "I would like to do something custom, but I don't have resources for that". This is out of scope for syzbot. The current syzbot scope is: automating as
Re: INFO: rcu detected stall in sys_sendfile64 (2)
On Thu, Mar 14, 2019 at 11:52 AM Tetsuo Handa wrote:
> [...]
> > But having simplified a lot of the syzkaller reproducers manually, the main things I do are:
>
> Yes. I'm doing similar things. Other things not listed here are:
>
> Try to remove a syscall() which passes EOF as the fd argument, for it should be unrelated to the problem unless such a call affects subtle timing.
>
> Try to remove the code for testing fuse / tun etc. if the problem seems to be unrelated to fuse / tun etc.
>
> syzbot gets pleased with finding one C reproducer, but I wish that syzbot continued trying to find smaller C reproducers by e.g. eliminating unrelated calls.
>
> > - Replace bare system calls with proper C library calls. For example:
> >
> >       #include <unistd.h>
> >       #include <sys/syscall.h>
> >
> >       syscall(__NR_socket, 0xa, 6, 0);
> >
> >   becomes:
> >
> >       #include <sys/socket.h>
> >
> >       socket(AF_INET6, SOCK_DCCP, 0);
>
> Yes. It would be nice if C reproducers were provided using symbols. I run syzbot-provided C reproducers under strace because strace gives me more hints about symbols and structures.

I will answer re reproducers first. Thanks for the suggestions; I've filed https://github.com/google/syzkaller/issues/1070 for this. Lots of them are implementable within the current framework. Things on kernel mailing lists (suggestions, bug reports, patches) get lost very quickly. As far as I can see, most of them are related to cosmetics (not saying that's not useful, just that it won't affect bisection results).

From my experience, the most powerful simplifications are possible only when I have already root-caused the bug and understand its mechanics. Then it's possible to reorder syscalls, remove all/most of the threading, etc. But in that case bisection is not so useful already. Some of them can alter the load on the kernel (e.g. using libc structs and syscall wrappers), which can lead to triggering a different bug...
Re: INFO: rcu detected stall in sys_sendfile64 (2)
On 2019/03/14 8:40, Eric Biggers wrote:
> On Wed, Mar 13, 2019 at 07:43:38AM +0100, 'Dmitry Vyukov' via syzkaller-bugs wrote:
>>> Also, humans can sometimes find simpler C reproducers from syzbot provided reproducers. It would be nice if syzbot can accept and use a user defined C reproducer for testing.
>>
>> It would be more useful to accept patches that make syzkaller create better reproducers from these people. Manual work is not scalable. We would need 10 reproducers per day for a dozen OSes (incl. some private kernels/branches). Anybody is free to run syzkaller manually and do full manual (perfect) reporting. But for us it became clear very early that it won't work. Then see above: while that human is sleeping/on weekend/vacation, syzbot will already bisect its own reproducer. Adding a manual reproducer later won't help in any way. syzkaller already does lots of smart work for reproducers. Let's not give up on the last mile and switch back to all manual work.
>
> Well, it's very tough and not many people are familiar with the syzkaller codebase, let alone have time to contribute.

Right. I don't read/write Go programs. I don't have access to environments for running syzbot. But instead I try to write kernel patches.

Also, although anybody is free to do full manual (perfect) reporting, I can't afford checking such reports posted to e.g. LKML. I can afford checking only https://syzkaller.appspot.com/ .

I have seen a Japanese article which explains how to run syzbot. But I felt that that article lacks what to do if syzbot found a bug. If people find a crash by running syzbot in their environments, it would be nice if they could export the report and import it to https://syzkaller.appspot.com/ (i.e. the dashboard acts as a bugzilla).

> But having simplified a lot of the syzkaller reproducers manually, the main things I do are:

Yes. I'm doing similar things.
Other things not listed here are:

Try to remove a syscall() which passes EOF as the fd argument, for it should be unrelated to the problem unless such a call affects subtle timing.

Try to remove the code for testing fuse / tun etc. if the problem seems to be unrelated to fuse / tun etc.

syzbot gets pleased with finding one C reproducer, but I wish that syzbot continued trying to find smaller C reproducers by e.g. eliminating unrelated calls.

> - Replace bare system calls with proper C library calls. For example:
>
>       #include <unistd.h>
>       #include <sys/syscall.h>
>
>       syscall(__NR_socket, 0xa, 6, 0);
>
>   becomes:
>
>       #include <sys/socket.h>
>
>       socket(AF_INET6, SOCK_DCCP, 0);

Yes. It would be nice if C reproducers were provided using symbols. I run syzbot-provided C reproducers under strace because strace gives me more hints about symbols and structures.
Re: INFO: rcu detected stall in sys_sendfile64 (2)
On Wed, Mar 13, 2019 at 07:43:38AM +0100, 'Dmitry Vyukov' via syzkaller-bugs wrote:
> > Also, humans can sometimes find simpler C reproducers from syzbot provided reproducers. It would be nice if syzbot can accept and use a user defined C reproducer for testing.
>
> It would be more useful to accept patches that make syzkaller create better reproducers from these people. Manual work is not scalable. We would need 10 reproducers per day for a dozen OSes (incl. some private kernels/branches). Anybody is free to run syzkaller manually and do full manual (perfect) reporting. But for us it became clear very early that it won't work. Then see above: while that human is sleeping/on weekend/vacation, syzbot will already bisect its own reproducer. Adding a manual reproducer later won't help in any way. syzkaller already does lots of smart work for reproducers. Let's not give up on the last mile and switch back to all manual work.

Well, it's very tough and not many people are familiar with the syzkaller codebase, let alone have time to contribute. But having simplified a lot of the syzkaller reproducers manually, the main things I do are:

- Replace bare system calls with proper C library calls. For example:

      #include <unistd.h>
      #include <sys/syscall.h>

      syscall(__NR_socket, 0xa, 6, 0);

  becomes:

      #include <sys/socket.h>

      socket(AF_INET6, SOCK_DCCP, 0);

- Do the same for structs. Use the appropriate C header rather than filling in each struct manually.
  For example:

      *(uint16_t*)0x2000 = 0xa;
      *(uint16_t*)0x2002 = htobe16(0x4e20);
      *(uint32_t*)0x2004 = 0;
      *(uint8_t*)0x2008 = 0;
      *(uint8_t*)0x2009 = 0;
      *(uint8_t*)0x200a = 0;
      *(uint8_t*)0x200b = 0;
      *(uint8_t*)0x200c = 0;
      *(uint8_t*)0x200d = 0;
      *(uint8_t*)0x200e = 0;
      *(uint8_t*)0x200f = 0;
      *(uint8_t*)0x2010 = 0;
      *(uint8_t*)0x2011 = 0;
      *(uint8_t*)0x2012 = 0;
      *(uint8_t*)0x2013 = 0;
      *(uint8_t*)0x2014 = 0;
      *(uint8_t*)0x2015 = 0;
      *(uint8_t*)0x2016 = 0;
      *(uint8_t*)0x2017 = 0;
      *(uint32_t*)0x2018 = 0;

  becomes:

      struct sockaddr_in6 addr = {
          .sin6_family = AF_INET6,
          .sin6_port = htobe16(0x4e20)
      };

- Put arguments on the stack rather than in a mmap'd region, if possible.

- Simplify any calls to the helper functions that syzkaller emits, e.g. syz_open_dev(), syz_kvm_setup_vcpu(), or the networking setup stuff. Usually the reproducer needs a small subset of the functionality to work.

- For multithreaded reproducers, try to incrementally simplify the threading strategy. For example, reduce the number of threads by combining operations. Also try running the operations in loops. Also, using fork() can often result in a simpler reproducer than pthreads.

- Instead of using the 'r[]' array to hold all integer return values, give them appropriate names.

- Remove duplicate #includes.

- Considering the actual kernel code and the bug, if possible find a different way to trigger the same bug that's simpler or more reliable. If the problem is obvious it may be possible to jump right to this step from the beginning.

Some gotchas:

- fault-nth injections are fragile, since the number of memory allocations in a particular system call varies by kernel config and kernel version. Incrementing n starting from 1 is more reliable.

- Some of the perf_event_open() reproducers are fragile because they hardcode a trace event ID, which can change in every kernel version. Reading the trace event ID from /sys/kernel/debug/tracing/events/ is more reliable.

- Reproducers using the KVM API sometimes only work on certain processors (e.g. Intel but not AMD) or even depend on the host kernel.

- Reproducers that access the local filesystem sometimes assume that it's ext4.
Re: INFO: rcu detected stall in sys_sendfile64 (2)
On Wed, Mar 13, 2019 at 5:37 PM Theodore Ts'o wrote: > > On Wed, Mar 13, 2019 at 07:43:38AM +0100, Dmitry Vyukov wrote: > > It would be more useful to accept patches that make syzkaller create > > better reproducers from these people. Manual work is not scalable. We > > would need 10 reproducers per day for a dozen of OSes (incl some > > private kernels/branches). Anybody is free to run syzkaller manually > > and do full manual (perfect) reporting. But for us it become clear > > very early that it won't work. Then see above, while that human is > > sleeping/on weekend/vacation, syzbot will already bisect own > > reproducer. Adding manual reproducer later won't help in any way. > > syzkaller already does lots of smart work for reproducers. Let's not > > give up on the last mile and switch back to all manual work. > > I suspect a scalable solution that would significantly improve things > is one where Syzbot tries N times for a "good" result to make sure > it's not a flaky pass. N could either be hard-coded to some value > like 8 or 10, or Syzbot could experimentally try to figure out how > reliable the reproducer happens to be, and figure out what an ideal > "N" value should be for a particular reproducer. It currently tries 8 times, see e.g.: https://syzkaller.appspot.com/text?tag=Log=13354d9d20
Re: INFO: rcu detected stall in sys_sendfile64 (2)
On Wed, Mar 13, 2019 at 07:43:38AM +0100, Dmitry Vyukov wrote: > It would be more useful to accept patches that make syzkaller create > better reproducers from these people. Manual work is not scalable. We > would need 10 reproducers per day for a dozen of OSes (incl some > private kernels/branches). Anybody is free to run syzkaller manually > and do full manual (perfect) reporting. But for us it become clear > very early that it won't work. Then see above, while that human is > sleeping/on weekend/vacation, syzbot will already bisect own > reproducer. Adding manual reproducer later won't help in any way. > syzkaller already does lots of smart work for reproducers. Let's not > give up on the last mile and switch back to all manual work. I suspect a scalable solution that would significantly improve things is one where Syzbot tries N times for a "good" result to make sure it's not a flaky pass. N could either be hard-coded to some value like 8 or 10, or Syzbot could experimentally try to figure out how reliable the reproducer happens to be, and figure out what an ideal "N" value should be for a particular reproducer. - Ted
Re: INFO: rcu detected stall in sys_sendfile64 (2)
On Tue, Mar 12, 2019 at 10:11 PM Tetsuo Handa wrote:
> On 2019/03/13 2:15, Dmitry Vyukov wrote:
> >> Also, this bisection is finding multiple different crash patterns, which suggests that the crashed tests are not giving correct feedback to syzbot.
> >
> > Treating different crashes as just "crash" is intended. Kernel bugs can manifest in very different ways.
> > For fun, search for "bpf: sockhash, disallow bpf_tcp_close and update in parallel" in https://syzkaller.appspot.com/?fixed=upstream
> > It led to 50+ different failure modes.
>
> But syzbot already found a rather simple C reproducer ( https://syzkaller.appspot.com/text?tag=ReproC=116fc7a8c0 ) for this bug. Was this reproducer used for bisection?

The C reproducer used for bisection is provided as "C reproducer" in the bisection report.

> I guess that if this reproducer was used, syzbot did not hit "WARNING: ODEBUG bug in netdev_freemem" cases.

Maybe. But we won't have more than one in the future. Currently syzbot bisects over a backlog of crashes, some of which accumulated multiple reproducers over weeks/months/years. When it bisects newly reported bugs as they are found, there will be only one reproducer. E.g. these two for this bug were found within a month.

> Also, humans can sometimes find simpler C reproducers from syzbot provided reproducers. It would be nice if syzbot can accept and use a user defined C reproducer for testing.

It would be more useful to accept patches that make syzkaller create better reproducers from these people. Manual work is not scalable. We would need 10 reproducers per day for a dozen OSes (incl. some private kernels/branches). Anybody is free to run syzkaller manually and do full manual (perfect) reporting. But for us it became clear very early that it won't work. Then see above: while that human is sleeping/on weekend/vacation, syzbot will already bisect its own reproducer. Adding a manual reproducer later won't help in any way.
syzkaller already does lots of smart work for reproducers. Let's not give up on the last mile and switch back to all manual work.
Re: INFO: rcu detected stall in sys_sendfile64 (2)
On 2019/03/13 2:15, Dmitry Vyukov wrote:
>> Also, this bisection is finding multiple different crash patterns, which suggests that the crashed tests are not giving correct feedback to syzbot.
>
> Treating different crashes as just "crash" is intended. Kernel bugs can manifest in very different ways.
> For fun, search for "bpf: sockhash, disallow bpf_tcp_close and update in parallel" in https://syzkaller.appspot.com/?fixed=upstream
> It led to 50+ different failure modes.

But syzbot already found a rather simple C reproducer ( https://syzkaller.appspot.com/text?tag=ReproC=116fc7a8c0 ) for this bug. Was this reproducer used for bisection? I guess that if this reproducer was used, syzbot did not hit "WARNING: ODEBUG bug in netdev_freemem" cases.

Also, humans can sometimes find simpler C reproducers from syzbot provided reproducers. It would be nice if syzbot could accept and use a user-defined C reproducer for testing.
Re: INFO: rcu detected stall in sys_sendfile64 (2)
On Tue, Mar 12, 2019 at 3:30 PM Tetsuo Handa wrote:
> (Moving most recipients to bcc: in order to avoid flooding.)
>
> On 2019/03/12 13:08, Al Viro wrote:
> > Umm... Might be a good idea to add some plausibility filters - it is, in theory, possible that adding a line in a comment changes behaviour (without compiler bugs, even - playing with __LINE__ is all it would take), but the odds that it's _not_ a false positive are very low.
>
> Well, 108 out of 168 tests done during this bisection failed to test. With such a high failure ratio, it is possible that by chance no crash happened during the few tests for a specific commit, causing a wrong bisection result. I expect that when trying to conclude "git bisect good" for a specific commit, the tests should be repeated until no crash happened during 8 successful tests.

Added to https://github.com/google/syzkaller/issues/1051: Tetsuo points out that if most tests (say, 7 of 8) failed with infra problems, then we should retry/skip or something. This zeroes the effect of having multiple independent tests. Thanks.

> Also, this bisection is finding multiple different crash patterns, which suggests that the crashed tests are not giving correct feedback to syzbot.

Treating different crashes as just "crash" is intended. Kernel bugs can manifest in very different ways. For fun, search for "bpf: sockhash, disallow bpf_tcp_close and update in parallel" in https://syzkaller.appspot.com/?fixed=upstream . It led to 50+ different failure modes.
> $ grep -F 'run #' bisect.txt\?x\=1322028320 | wc -l
> 168
> $ grep -F 'Connection timed out' bisect.txt\?x\=1322028320 | wc -l
> 108
> $ grep -F 'crashed' bisect.txt\?x\=1322028320
> run #0: crashed: WARNING: ODEBUG bug in netdev_freemem
> run #0: crashed: WARNING: ODEBUG bug in netdev_freemem
> run #1: crashed: INFO: rcu detected stall in corrupted
> run #0: crashed: INFO: rcu detected stall in sys_sendfile64
> run #0: crashed: INFO: rcu detected stall in corrupted
> run #4: crashed: INFO: rcu detected stall in sys_sendfile64
> run #0: crashed: INFO: rcu detected stall in corrupted
> run #1: crashed: INFO: rcu detected stall in corrupted
> run #0: crashed: INFO: rcu detected stall in ext4_file_write_iter
> run #0: crashed: INFO: rcu detected stall in corrupted
> run #0: crashed: INFO: rcu detected stall in sendfile64
> run #0: crashed: INFO: rcu detected stall in corrupted
> run #1: crashed: INFO: rcu detected stall in sendfile64
> run #0: crashed: INFO: rcu detected stall in ext4_file_write_iter
> run #1: crashed: INFO: rcu detected stall in corrupted
> run #0: crashed: INFO: rcu detected stall in corrupted
> run #0: crashed: INFO: rcu detected stall in corrupted
> run #0: crashed: INFO: rcu detected stall in corrupted
> run #1: crashed: INFO: rcu detected stall in corrupted
> run #0: crashed: INFO: rcu detected stall in corrupted
> run #3: crashed: INFO: rcu detected stall in corrupted
> run #0: crashed: INFO: rcu detected stall in do_iter_write
Re: INFO: rcu detected stall in sys_sendfile64 (2)
(Moving most recipients to bcc: in order to avoid flooding.)

On 2019/03/12 13:08, Al Viro wrote:
> Umm... Might be a good idea to add some plausibility filters - it is, in theory, possible that adding a line in a comment changes behaviour (without compiler bugs, even - playing with __LINE__ is all it would take), but the odds that it's _not_ a false positive are very low.

Well, 108 out of 168 tests done during this bisection failed to test. With such a high failure ratio, it is possible that by chance no crash happened during the few tests for a specific commit, causing a wrong bisection result. I expect that when trying to conclude "git bisect good" for a specific commit, the tests should be repeated until no crash happened during 8 successful tests.

Also, this bisection is finding multiple different crash patterns, which suggests that the crashed tests are not giving correct feedback to syzbot.

$ grep -F 'run #' bisect.txt\?x\=1322028320 | wc -l
168
$ grep -F 'Connection timed out' bisect.txt\?x\=1322028320 | wc -l
108
$ grep -F 'crashed' bisect.txt\?x\=1322028320
run #0: crashed: WARNING: ODEBUG bug in netdev_freemem
run #0: crashed: WARNING: ODEBUG bug in netdev_freemem
run #1: crashed: INFO: rcu detected stall in corrupted
run #0: crashed: INFO: rcu detected stall in sys_sendfile64
run #0: crashed: INFO: rcu detected stall in corrupted
run #4: crashed: INFO: rcu detected stall in sys_sendfile64
run #0: crashed: INFO: rcu detected stall in corrupted
run #1: crashed: INFO: rcu detected stall in corrupted
run #0: crashed: INFO: rcu detected stall in ext4_file_write_iter
run #0: crashed: INFO: rcu detected stall in corrupted
run #0: crashed: INFO: rcu detected stall in sendfile64
run #0: crashed: INFO: rcu detected stall in corrupted
run #1: crashed: INFO: rcu detected stall in sendfile64
run #0: crashed: INFO: rcu detected stall in ext4_file_write_iter
run #1: crashed: INFO: rcu detected stall in corrupted
run #0: crashed: INFO: rcu detected stall in corrupted
run #0: crashed: INFO: rcu detected stall in corrupted
run #0: crashed: INFO: rcu detected stall in corrupted
run #1: crashed: INFO: rcu detected stall in corrupted
run #0: crashed: INFO: rcu detected stall in corrupted
run #3: crashed: INFO: rcu detected stall in corrupted
run #0: crashed: INFO: rcu detected stall in do_iter_write
Re: INFO: rcu detected stall in sys_sendfile64 (2)
On 2019/01/19 20:41, Dmitry Vyukov wrote:
> On Sat, Jan 19, 2019 at 12:32 PM syzbot wrote:
>>
>> Hello,
>>
>> syzbot found the following crash on:
>>
>> HEAD commit:    2339e91d0e66 Merge tag 'media/v5.0-1' of git://git.kernel...
>> git tree:       upstream
>> console output: https://syzkaller.appspot.com/x/log.txt?x=175f2638c0
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=abc3dc9b7a900258
>> dashboard link: https://syzkaller.appspot.com/bug?extid=1505c80c74256c6118a5
>> compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=12c4dc28c0
>> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=15df4108c0
>
> Looking at the reproducer, it looks like something with the scheduler, as it involves perf_event_open and sched_setattr. So +Peter and Mingo.
> Is it the same root cause as the other stalls involving sched_setattr?

Yes. I think sched_setattr() is involved in this problem. The reproducers from "BUG: workqueue lockup (4)" at https://syzkaller.appspot.com/text?tag=ReproC=13ec31a540 involve sched_setattr(SCHED_DEADLINE), and https://syzkaller.appspot.com/text?tag=ReproC=104da690c0 also involves sched_setattr().
Re: INFO: rcu detected stall in sys_sendfile64 (2)
On Sat, Jan 19, 2019 at 12:32 PM syzbot wrote: > > Hello, > > syzbot found the following crash on: > > HEAD commit:2339e91d0e66 Merge tag 'media/v5.0-1' of git://git.kernel... > git tree: upstream > console output: https://syzkaller.appspot.com/x/log.txt?x=175f2638c0 > kernel config: https://syzkaller.appspot.com/x/.config?x=abc3dc9b7a900258 > dashboard link: https://syzkaller.appspot.com/bug?extid=1505c80c74256c6118a5 > compiler: gcc (GCC) 9.0.0 20181231 (experimental) > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=12c4dc28c0 > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=15df4108c0 Looking at the reproducer it looks like something with scheduler as it involves perf_event_open and sched_setattr. So +Peter and Mingo. Is it the same root cause as the other stalls involving sched_setattr? > IMPORTANT: if you fix the bug, please add the following tag to the commit: > Reported-by: syzbot+1505c80c74256c611...@syzkaller.appspotmail.com > > audit: type=1400 audit(1547895693.874:36): avc: denied { map } for > pid=8427 comm="syz-executor786" path="/root/syz-executor786610373" > dev="sda1" ino=1426 scontext=unconfined_u:system_r:insmod_t:s0-s0:c0.c1023 > tcontext=unconfined_u:object_r:user_home_t:s0 tclass=file permissive=1 > hrtimer: interrupt took 42996 ns > rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: > rcu:(detected by 0, t=10502 jiffies, g=6469, q=3) > rcu: All QSes seen, last rcu_preempt kthread activity 10502 > (4295051508-4295041006), jiffies_till_next_fqs=1, root ->qsmask 0x0 > syz-executor786 R running task21544 8437 8433 0x > Call Trace: > > sched_show_task kernel/sched/core.c:5293 [inline] > sched_show_task.cold+0x273/0x2d5 kernel/sched/core.c:5268 > print_other_cpu_stall.cold+0x7f2/0x8bb kernel/rcu/tree.c:1301 > check_cpu_stall kernel/rcu/tree.c:1429 [inline] > rcu_pending kernel/rcu/tree.c:3018 [inline] > rcu_check_callbacks+0xf36/0x1380 kernel/rcu/tree.c:2521 > update_process_times+0x32/0x80 kernel/time/timer.c:1635 > 
tick_sched_handle+0xa2/0x190 kernel/time/tick-sched.c:161 > tick_sched_timer+0x47/0x130 kernel/time/tick-sched.c:1271 > __run_hrtimer kernel/time/hrtimer.c:1389 [inline] > __hrtimer_run_queues+0x3a7/0x1050 kernel/time/hrtimer.c:1451 > hrtimer_interrupt+0x314/0x770 kernel/time/hrtimer.c:1509 > local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1035 [inline] > smp_apic_timer_interrupt+0x18d/0x760 arch/x86/kernel/apic/apic.c:1060 > apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:807 > > RIP: 0010:preempt_count arch/x86/include/asm/preempt.h:26 [inline] > RIP: 0010:check_kcov_mode kernel/kcov.c:67 [inline] > RIP: 0010:write_comp_data+0x9/0x70 kernel/kcov.c:122 > Code: 12 00 00 8b 80 dc 12 00 00 48 8b 11 48 83 c2 01 48 39 d0 76 07 48 89 > 34 d1 48 89 11 5d c3 0f 1f 00 65 4c 8b 04 25 40 ee 01 00 <65> 8b 05 80 ee > 7f 7e a9 00 01 1f 00 75 51 41 8b 80 d8 12 00 00 83 > RSP: 0018:888080466f58 EFLAGS: 0246 ORIG_RAX: ff13 > RAX: 226d RBX: 888080467140 RCX: 87f08bdf > RDX: 0002 RSI: 0002 RDI: 0007 > RBP: 888080466f60 R08: 888096338480 R09: ed1015cc5b90 > R10: ed1015cc5b8f R11: 8880ae62dc7b R12: 11101008cdf0 > R13: 888092b64102 R14: 888092b64102 R15: 888080467158 > xa_is_node include/linux/xarray.h:946 [inline] > xas_start+0x1cf/0x720 lib/xarray.c:183 > xas_load+0x21/0x160 lib/xarray.c:227 > find_get_entry+0x350/0x10a0 mm/filemap.c:1476 > pagecache_get_page+0xe6/0x1020 mm/filemap.c:1579 > find_get_page include/linux/pagemap.h:272 [inline] > generic_file_buffered_read mm/filemap.c:2076 [inline] > generic_file_read_iter+0x7b2/0x2d40 mm/filemap.c:2350 > ext4_file_read_iter+0x180/0x3c0 fs/ext4/file.c:77 > call_read_iter include/linux/fs.h:1856 [inline] > generic_file_splice_read+0x5c4/0xa90 fs/splice.c:308 > do_splice_to+0x12a/0x190 fs/splice.c:880 > splice_direct_to_actor+0x31b/0x9d0 fs/splice.c:957 > do_splice_direct+0x2c7/0x420 fs/splice.c:1066 > do_sendfile+0x61a/0xe60 fs/read_write.c:1436 > __do_sys_sendfile64 fs/read_write.c:1491 [inline] > __se_sys_sendfile64 
fs/read_write.c:1483 [inline] > __x64_sys_sendfile64+0x15a/0x240 fs/read_write.c:1483 > do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290 > entry_SYSCALL_64_after_hwframe+0x49/0xbe > RIP: 0033:0x446a19 > Code: e8 dc e6 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 > 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff > ff 0f 83 4b 07 fc ff c3 66 2e 0f 1f 84 00 00 00 00 > RSP: 002b:7f53e33cfda8 EFLAGS: 0297 ORIG_RAX: 0028 > RAX: ffda RBX: 006dcc28 RCX: 00446a19 > RDX: 2380 RSI: 0003 RDI: 0003 > RBP: 006dcc20 R08: R09: > R10: 8080fffe R11: 0297 R12: 006dcc2c > R13:
INFO: rcu detected stall in sys_sendfile64 (2)
Hello,

syzbot found the following crash on:

HEAD commit:    2339e91d0e66 Merge tag 'media/v5.0-1' of git://git.kernel...
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=175f2638c0
kernel config:  https://syzkaller.appspot.com/x/.config?x=abc3dc9b7a900258
dashboard link: https://syzkaller.appspot.com/bug?extid=1505c80c74256c6118a5
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=12c4dc28c0
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=15df4108c0

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+1505c80c74256c611...@syzkaller.appspotmail.com

audit: type=1400 audit(1547895693.874:36): avc: denied { map } for pid=8427 comm="syz-executor786" path="/root/syz-executor786610373" dev="sda1" ino=1426 scontext=unconfined_u:system_r:insmod_t:s0-s0:c0.c1023 tcontext=unconfined_u:object_r:user_home_t:s0 tclass=file permissive=1
hrtimer: interrupt took 42996 ns
rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu: (detected by 0, t=10502 jiffies, g=6469, q=3)
rcu: All QSes seen, last rcu_preempt kthread activity 10502 (4295051508-4295041006), jiffies_till_next_fqs=1, root ->qsmask 0x0
syz-executor786 R running task 21544  8437   8433 0x
Call Trace:
 sched_show_task kernel/sched/core.c:5293 [inline]
 sched_show_task.cold+0x273/0x2d5 kernel/sched/core.c:5268
 print_other_cpu_stall.cold+0x7f2/0x8bb kernel/rcu/tree.c:1301
 check_cpu_stall kernel/rcu/tree.c:1429 [inline]
 rcu_pending kernel/rcu/tree.c:3018 [inline]
 rcu_check_callbacks+0xf36/0x1380 kernel/rcu/tree.c:2521
 update_process_times+0x32/0x80 kernel/time/timer.c:1635
 tick_sched_handle+0xa2/0x190 kernel/time/tick-sched.c:161
 tick_sched_timer+0x47/0x130 kernel/time/tick-sched.c:1271
 __run_hrtimer kernel/time/hrtimer.c:1389 [inline]
 __hrtimer_run_queues+0x3a7/0x1050 kernel/time/hrtimer.c:1451
 hrtimer_interrupt+0x314/0x770 kernel/time/hrtimer.c:1509
 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1035 [inline]
 smp_apic_timer_interrupt+0x18d/0x760 arch/x86/kernel/apic/apic.c:1060
 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:807
RIP: 0010:preempt_count arch/x86/include/asm/preempt.h:26 [inline]
RIP: 0010:check_kcov_mode kernel/kcov.c:67 [inline]
RIP: 0010:write_comp_data+0x9/0x70 kernel/kcov.c:122
Code: 12 00 00 8b 80 dc 12 00 00 48 8b 11 48 83 c2 01 48 39 d0 76 07 48 89 34 d1 48 89 11 5d c3 0f 1f 00 65 4c 8b 04 25 40 ee 01 00 <65> 8b 05 80 ee 7f 7e a9 00 01 1f 00 75 51 41 8b 80 d8 12 00 00 83
RSP: 0018:888080466f58 EFLAGS: 0246 ORIG_RAX: ff13
RAX: 226d RBX: 888080467140 RCX: 87f08bdf
RDX: 0002 RSI: 0002 RDI: 0007
RBP: 888080466f60 R08: 888096338480 R09: ed1015cc5b90
R10: ed1015cc5b8f R11: 8880ae62dc7b R12: 11101008cdf0
R13: 888092b64102 R14: 888092b64102 R15: 888080467158
 xa_is_node include/linux/xarray.h:946 [inline]
 xas_start+0x1cf/0x720 lib/xarray.c:183
 xas_load+0x21/0x160 lib/xarray.c:227
 find_get_entry+0x350/0x10a0 mm/filemap.c:1476
 pagecache_get_page+0xe6/0x1020 mm/filemap.c:1579
 find_get_page include/linux/pagemap.h:272 [inline]
 generic_file_buffered_read mm/filemap.c:2076 [inline]
 generic_file_read_iter+0x7b2/0x2d40 mm/filemap.c:2350
 ext4_file_read_iter+0x180/0x3c0 fs/ext4/file.c:77
 call_read_iter include/linux/fs.h:1856 [inline]
 generic_file_splice_read+0x5c4/0xa90 fs/splice.c:308
 do_splice_to+0x12a/0x190 fs/splice.c:880
 splice_direct_to_actor+0x31b/0x9d0 fs/splice.c:957
 do_splice_direct+0x2c7/0x420 fs/splice.c:1066
 do_sendfile+0x61a/0xe60 fs/read_write.c:1436
 __do_sys_sendfile64 fs/read_write.c:1491 [inline]
 __se_sys_sendfile64 fs/read_write.c:1483 [inline]
 __x64_sys_sendfile64+0x15a/0x240 fs/read_write.c:1483
 do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x446a19
Code: e8 dc e6 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 4b 07 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:7f53e33cfda8 EFLAGS: 0297 ORIG_RAX: 0028
RAX: ffda RBX: 006dcc28 RCX: 00446a19
RDX: 2380 RSI: 0003 RDI: 0003
RBP: 006dcc20 R08: R09:
R10: 8080fffe R11: 0297 R12: 006dcc2c
R13: 8080fffe R14: 7f53e33d09c0 R15: 006dcd2c
rcu: rcu_preempt kthread starved for 10502 jiffies! g6469 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
rcu: RCU grace-period kthread stack dump:
rcu_preempt R running task 26200    10      2 0x8000
Call Trace:
 context_switch kernel/sched/core.c:2831 [inline]
 __schedule+0x897/0x1e60 kernel/sched/core.c:3472
 schedule+0xfe/0x350 kernel/sched/core.c:3516