On Thu, Jan 18, 2018 at 5:01 AM, Dmitry Vyukov <dvyu...@google.com> wrote:
> On Thu, Jan 18, 2018 at 2:09 AM, Theodore Ts'o <ty...@mit.edu> wrote:
>> On Wed, Jan 17, 2018 at 04:21:13PM -0800, Alexei Starovoitov wrote:
>>>
>>> If syzkaller can only test one tree then linux-next should be the one.
>>
>> Well, there's been some controversy about that. The problem is that
>> it's often not clear if this is a long-standing bug, or a bug which is
>> in a particular subsystem tree --- and if so, *which* subsystem tree,
>> etc. So it gets blasted to linux-kernel, and to get_maintainer.pl,
>> which is often not accurate --- since the location of the crash
>> doesn't necessarily point out where the problem originated, and hence
>> who should look at the syzbot report. And so this has caused
>> some.... irritation.
>
>
> Re set of tested trees.
>
> We now have an interesting spectrum of opinions.
>
> Some assorted thoughts on this:
>
> 1. First, "upstream is clean" won't happen any time soon. There are
> several reasons for this:
> - Currently syzkaller only tests a subset of subsystems that it knows
> how to test, and even the ones that it tests, it tests poorly. Over
> time it will be improved to test most subsystems, and to test existing
> subsystems better. Just a few weeks ago I added some descriptions for
> the crypto subsystem and it uncovered 20+ old bugs.
> - syzkaller is a guided, genetic fuzzer: over time it learns how to do
> more complex things by small steps. It takes time.
> - We have more bug detection tools coming: LEAKCHECK, KMSAN (uninit
> memory), KTSAN (data races).
> - Generic syzkaller smartness will be improved over time.
> - It will get more CPU resources.
> The effect of all of these things is multiplicative: we test more code,
> smarter, with more bug-detection tools, with more resources. So I
> think we need to plan for a mix of old and new bugs for the
> foreseeable future.
>
> 2. get_maintainer.pl and the mix of old and new bugs were mentioned as
> harming attribution.
> I don't see what will change when/if we test only
> upstream. Then the same mix of old/new bugs will be detected just on
> upstream, with all of the same problems for old/new, maintainers,
> which subsystem, etc. I think the number of bugs in the kernel is a
> significant part of the problem, but the exact boundary where we
> decide to start killing them won't affect the number of bugs.
>
> 3. If we test only upstream, we increase the chances of new security
> bugs sinking into releases. We sure could raise the perceived security
> value of the bugs by keeping them private, letting them sink into a
> release, letting them sink into distros, and then reporting a
> high-profile vulnerability. I think that's wrong. There is something
> broken with value measuring in the security community. A bug that is
> killed before sinking into any release is the highest-impact thing. As
> Alexei noted, fixing bugs as early as possible also reduces fix costs,
> backporting burden, etc. This can also eliminate the need for
> bisection in some cases: say, if you accepted a large change to some
> files and a bunch of crashes soon appear for these files on your tree,
> it's obvious what happened.
>
> 4. It was mentioned that linux-next can have a broken slab allocator
> and that this will manifest as multiple random crashes. FWIW I don't
> remember ever seeing this. Yes, sometimes it does not build/boot,
> but these builds are just rejected for testing.
>
> I don't mind dropping linux-next specifically if that's the common
> decision. However, (1) Alexei and Guenter expressed the opposite opinion,
My opinion does not really mean much, if anything. While my personal opinion is that it would be beneficial to test -next, my understanding also was that -next was not supposed to be a playground but a collection of patches which are ready for upstream. Quite obviously, as this exchange has shown, this is not, or no longer, the case. The result is that your testing of -next does not have the desired effect of improving the Linux kernel and of finding problems _before_ they hit mainline. Instead, your efforts are seen as noise, and syzkaller's reputation is negatively affected.

With that in mind, I would suggest that you stop testing -next. If you ever have spare CPU capacity, you can start adding subtrees from -next which are known never to be rebased, such as net-next, taking the subtrees tested by 0day as a baseline.

Thanks,
Guenter

> (2) I don't see that it will change anything dramatically, (3) as far
> as I understand, Linus actually relies on linux-next giving some
> concrete testing to the code there.
> But I think that testing bpf-next is a positive thing provided that
> there is explicit interest from maintainers. And note that this will
> be testing targeted specifically at the bpf subsystem, so that instance
> will not generate bugs in SCSI, USB, etc. (though it will cover a part
> of net). Also note that the latest email format includes the set of
> trees where the crash happened, so if you see "upstream" or "upstream
> and bpf-next", nothing really changes: you still know that it happens
> upstream. Or if you see only "bpf-next", then you know that it's only
> in that tree.