Re: [PATCH 3.15 33/37] Fix gcc-4.9.0 miscompilation of load_balance() in scheduler
On Wed, Aug 6, 2014 at 1:14 AM, Jakub Jelinek wrote: > On Tue, Aug 05, 2014 at 03:36:39PM -0700, Linus Torvalds wrote: d>> I don't understand how you guys can be so cavalier about a compiler >> bug that has already resulted in actual real problems. You bring up > > I have no problem with a -fno-var-tracking-assignments workaround for > compilers that have the PR61801 wrong-code bug. What I have problem with > is with disabling it even for compilers that have that bug fixed. > That is in essence disabling a useful feature just because it could have > other bugs. If my memory serves me well, PR61801 is the only wrong-code > I remember caused by -fvar-tracking-assignments during the 5 years since > it has been introduced into gcc. Sure, there have been several > -fcompare-debug bugs, where we generated slightly different code between > -g and -g0, and as you mentioned we have one still pending (Vladimir is > working on it right now), but that is mainly relevant to the case where I think gcc guys are taking a wrong lesson out of this. kernel doesn't care too much whether gcc produces the same binary with -g and -g0. kernel developers also don't care about amount debug info for variables, but they care about hard to find compiler bugs. In this case sched2 mishap around debug_insn was a symptom. The root cause is lack of attention to -mno-red-zone. Kernel is not another user space program where data/control flow analysis is all compiler need to make things right. -mno-red-zone lesson exposes lack of 'interrupt' concept in compiler. I think there has to be some infra put in place to make sure that it's not just a scheduling barrier. Otherwise next bug will pop much sooner than 5 years and it will not be related to debug info at all. In this sense Steven's perl script to detect red-zone violations did more to re-enable var-tracking than -fcompare-debug fixes. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3.15 33/37] Fix gcc-4.9.0 miscompilation of load_balance() in scheduler
Jakub Jelinek writes: > There have been several man-years of work to get from the 25% var coverage > to 67%, several DWARF extensions (most of them to be available in DWARF5 or > work in progress on that) and with -fno-var-tracking-assignments that is > just returned to the old state. This is a typical "but look at all the work we've put in, it *has* to work" argument. As always, it is completely without merit. >> In other words, anybody who relies on it has already learnt to work >> around it. Or, more likely, there just isn't anybody who relies on it. >> >> I don't understand how you guys can be so cavalier about a compiler >> bug that has already resulted in actual real problems. You bring up > > I have no problem with a -fno-var-tracking-assignments workaround for > compilers that have the PR61801 wrong-code bug. Are there any that with reasonable confidence do not? -- Måns Rullgård m...@mansr.com -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3.15 33/37] Fix gcc-4.9.0 miscompilation of load_balance() in scheduler
On Tue, Aug 05, 2014 at 03:36:39PM -0700, Linus Torvalds wrote: > On Tue, Aug 5, 2014 at 2:07 PM, Frank Ch. Eigler wrote: > > > > Actually, "perf probe" does (via HAVE_DWARF_SUPPORT), to place probes > > and to extract variables at those probes, much as systemtap does. > > Without var-tracking, probes placed at most interior points of > > functions will make variables inaccessible. > > .. and as mentioned, -O2 already does that for many things, even > *with* tracking. Sure, debug info coverage for highly optimized code is never going to be perfect, but there is a difference if you have 75% of vars or just 33% of vars (see the numbers I've posted, I've picked ext4.o just randomly because it was one of the largest modules, can post more numbers if needed). BTW, because var-tracking is not performed at -O0, sometimes debug info is actually worse in -O0 than at higher optimization levels, because variable locations are bogus in prologues, epilogues and for variables with register keyword anywhere. There have been several man-years of work to get from the 25% var coverage to 67%, several DWARF extensions (most of them to be available in DWARF5 or work in progress on that) and with -fno-var-tracking-assignments that is just returned to the old state. > In other words, anybody who relies on it has already learnt to work > around it. Or, more likely, there just isn't anybody who relies on it. > > I don't understand how you guys can be so cavalier about a compiler > bug that has already resulted in actual real problems. You bring up I have no problem with a -fno-var-tracking-assignments workaround for compilers that have the PR61801 wrong-code bug. What I have problem with is with disabling it even for compilers that have that bug fixed. That is in essence disabling a useful feature just because it could have other bugs. If my memory serves me well, PR61801 is the only wrong-code I remember caused by -fvar-tracking-assignments during the 5 years since it has been introduced into gcc. Sure, there have been several -fcompare-debug bugs, where we generated slightly different code between -g and -g0, and as you mentioned we have one still pending (Vladimir is working on it right now), but that is mainly relevant to the case where you'd ship -g0 built binaries (== kernel) and then only if bugs appear wanted someone else to build kernel with -g and get identical binary, so that you could debug it. I believe if people build kernel with -g, then they usually build it with -g from the beginning, and either save the kernel with debug info somewhere, strip it to file or handle it similarly. If there is a fear there could be other wrong-code bugs with -fno-var-tracking-assignments, from the past experience that would be ~ another 5 years to discover it. Compare that to the frequency of -O2 wrong code issues, with that you'd need to disable -O2 because of the fear of unknown compiler bugs first. And, we had various wrong-code bugs even at -O0, so even that wouldn't help. Compiler bugs are just that, bugs that need to be reported, fixed, fixed compiler distributed to users, it is the same thing with kernel bugs, libc bugs etc. > theoretical cases that nobody has actually reported, and are > apparently ignoring the fact that the compiler generates INCORRECT > CODE. So on one hand we have known breakage, on the other we have It actually isn't theoretical, actually various -fvar-tracking-assignments changes have been done because people complained about important variables in the kernel being optimized away, the whole DW_OP_GNU_entry_value DWARF extension (DW_OP_entry_value in DWARF5 when it is released) was added because of bugreports from people trying to debug the kernel. Jakub -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3.15 33/37] Fix gcc-4.9.0 miscompilation of load_balance() in scheduler
On Tue, Aug 5, 2014 at 4:30 PM, Frank Ch. Eigler wrote: > > Would you consider a patch that does a gcc COMPARE_DEBUG-based test? Yes. But as mentioned, we don't have a really good way to do that. I guess we can do something similar to what "cc-option" does, but that will end up doing it for every single kernel build. I'd love to have something like this be done at kconfig time (along with all the *other* crazy conditional compiler options we do), but I guess that's a largely independent issue. And afaik, it will basically disallow every gcc version since 4.5 up until 4.9.2 gets released. And afaik, even current gcc actually fails the compare-debug test for at least some inputs, so ... Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3.15 33/37] Fix gcc-4.9.0 miscompilation of load_balance() in scheduler
Hi - On Tue, Aug 05, 2014 at 03:36:39PM -0700, Linus Torvalds wrote: > > Actually, "perf probe" does (via HAVE_DWARF_SUPPORT), to place probes > > and to extract variables at those probes, much as systemtap does. > > Without var-tracking, probes placed at most interior points of > > functions will make variables inaccessible. > > .. and as mentioned, -O2 already does that for many things, even > *with* tracking. The whole point of variable tracking was to make -O2 usable (though still imperfect) for those who use debuggers and such tools. > [...] I don't understand how you guys can be so cavalier about a > compiler bug that has already resulted in actual real problems. No one is minimizing the problem. We are looking for a knob for those who know that their compiler does not have that bug. (Plus, those who don't care about debug data could use CONFIG_DEBUG_INFO=n with the bad compiler.) > You bring up theoretical cases that nobody has actually reported > [...] I assure you that the years of effort that went into gcc variable tracking was justified with actual reports. > Do you compile without -O2 too? Because I *guarantee* you that with > -O2 (even with tracking), you'll get "local variable 'xyz' optimized > away" cases. One gets many fewer than without it, and also fewer false positives (where the non-var-tracking debuginfo claims a variable may be available, but points to the wrong place). > [...] Until you can get the compiler people to have some sane way > to know the problem is gone, I'm not going to maintain a kernel that > uses a known-broken compiler feature. It's that simple. Would you consider a patch that does a gcc COMPARE_DEBUG-based test? - FChE -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3.15 33/37] Fix gcc-4.9.0 miscompilation of load_balance() in scheduler
On Tue, Aug 5, 2014 at 2:07 PM, Frank Ch. Eigler wrote: > > Actually, "perf probe" does (via HAVE_DWARF_SUPPORT), to place probes > and to extract variables at those probes, much as systemtap does. > Without var-tracking, probes placed at most interior points of > functions will make variables inaccessible. .. and as mentioned, -O2 already does that for many things, even *with* tracking. In other words, anybody who relies on it has already learnt to work around it. Or, more likely, there just isn't anybody who relies on it. I don't understand how you guys can be so cavalier about a compiler bug that has already resulted in actual real problems. You bring up theoretical cases that nobody has actually reported, and are apparently ignoring the fact that the compiler generates INCORRECT CODE. So on one hand we have known breakage, on the other we have theoretical "I can make an example" arguments of behavior that no sane user has ever done. Do you compile without -O2 too? Because I *guarantee* you that with -O2 (even with tracking), you'll get "local variable 'xyz' optimized away" cases. Christ. I have asked the compiler people at least twice whether there is some way to limit it to just gcc-4.9.0/1, but the compiler people didn't step up and say that it was safe. In the meantime, crazy people can choose to compile with known-broken compilers. Until you can get the compiler people to have some sane way to know the problem is gone, I'm not going to maintain a kernel that uses a known-broken compiler feature. It's that simple. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3.15 33/37] Fix gcc-4.9.0 miscompilation of load_balance() in scheduler
Hi - > >>. I don't disagree it should be > >> disabled by default, but making it unconditional is going to force the > >> distributions that care about perf, systemtap, and debuggers to > >> manually revert this. > > > > Bah. I bet I use 'perf' more than most, and it doesn't care about > > debug info. Actually, "perf probe" does (via HAVE_DWARF_SUPPORT), to place probes and to extract variables at those probes, much as systemtap does. Without var-tracking, probes placed at most interior points of functions will make variables inaccessible. Do you need a fully worked out example to see this? - FChE -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3.15 33/37] Fix gcc-4.9.0 miscompilation of load_balance() in scheduler
On Tue, Aug 5, 2014 at 12:49 PM, Linus Torvalds wrote: > On Tue, Aug 5, 2014 at 4:31 AM, Josh Boyer wrote: >> >> Sorry to bring this back up after the fact, but it's important for a >> number of things in various distros > > You said that before, and I ignored you before, because you didn't > actually give any examples. Actually, I haven't said anything on this until now. You are perhaps thinking of someone else. I haven't participated in this thread until today. >>. I don't disagree it should be >> disabled by default, but making it unconditional is going to force the >> distributions that care about perf, systemtap, and debuggers to >> manually revert this. > > Bah. I bet I use 'perf' more than most, and it doesn't care about > debug info. Sure, if you want the annotated source code, or put probes > in place, you want the line number information, but the amount of > debug information it needs is miniscule, and not impacted bu this at > all afaik. > > And systemtap people have more problems than this. They might have more problems, but they're telling me this breaks systemtap full stop. AFAIK, systemtap needs to put probes in place for it to work. I've added a couple systemtap developers on CC again, and Jakub has an earlier post as well. Hopefully they can expand upon it. If they can't, then things become simpler. > Debuggers? Again, people who actually use kgdb have bigger issues than > some slightly worse local variable tracking. People care about the > frame pointer information and type information, but if you use kgdb on > the kernel you can damn well look at the assembly code and source code > annotation for local variable information. > > So I call bullshit. Give a real example of real-world use, not some > random handwaving of cases that happen to use debug info but - at > least for the kernel - don't actually care about the variable > tracking. I'm not here to convince you that kernel developers need this option set. I doubt they do, and I'm not sure I could make a convincing argument around that anyway. But I'm also being told by the teams I work with that this breaks something that previously worked for them, and I'm asking if it can be conditionally enabled instead of blindly defaulting to off. Distros do terrible things like patch their gcc with pre-release patches to fix issues, and giving the distros the option of enabling this is all I'm after. > The variable tracking is absolutely the *least* important part of the > debug info. By a huge margin. To the point of being entirely > irrelevant for the kernel. You make it sound like you lose all debug > information, when in reality that's not the case at all. I don't think I said that. I said distros that care about the things this enables in perf and systemtap are going to have to carry a revert (assuming what I've been told is valid). It's unnecessary deviation form upstream because you decided you didn't give a crap about this particular thing and thinks it's a waste of time. Frankly, I don't give a crap about it either but I do give a crap about people no longer being able to use my distro kernels without that revert for something that worked just fine before. The distro knows when their gcc packages have been fixed. > Trust me, you lose *way* more debug information because the kernel > uses "-O2" rather than "-O". I'm not arguing it's useful. I'm asking you to let distros to continue to make the horrible choices they make so the things they ship will work. I don't think that's being unreasonable. josh -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3.15 33/37] Fix gcc-4.9.0 miscompilation of load_balance() in scheduler
On Tue, Aug 5, 2014 at 4:31 AM, Josh Boyer wrote: > > Sorry to bring this back up after the fact, but it's important for a > number of things in various distros You said that before, and I ignored you before, because you didn't actually give any examples. >. I don't disagree it should be > disabled by default, but making it unconditional is going to force the > distributions that care about perf, systemtap, and debuggers to > manually revert this. Bah. I bet I use 'perf' more than most, and it doesn't care about debug info. Sure, if you want the annotated source code, or put probes in place, you want the line number information, but the amount of debug information it needs is miniscule, and not impacted bu this at all afaik. And systemtap people have more problems than this. Debuggers? Again, people who actually use kgdb have bigger issues than some slightly worse local variable tracking. People care about the frame pointer information and type information, but if you use kgdb on the kernel you can damn well look at the assembly code and source code annotation for local variable information. So I call bullshit. Give a real example of real-world use, not some random handwaving of cases that happen to use debug info but - at least for the kernel - don't actually care about the variable tracking. The variable tracking is absolutely the *least* important part of the debug info. By a huge margin. To the point of being entirely irrelevant for the kernel. You make it sound like you lose all debug information, when in reality that's not the case at all. Trust me, you lose *way* more debug information because the kernel uses "-O2" rather than "-O". Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3.15 33/37] Fix gcc-4.9.0 miscompilation of load_balance() in scheduler
On Tue, Aug 05, 2014 at 01:46:49PM +0200, Markus Trippelsdorf wrote: > On 2014.08.05 at 07:31 -0400, Josh Boyer wrote: > > Sorry to bring this back up after the fact, but it's important for a > > number of things in various distros. I don't disagree it should be > > disabled by default, but making it unconditional is going to force the > > distributions that care about perf, systemtap, and debuggers to > > manually revert this. That deviation is concerning because the > > upstream kernel won't easily be buildable the same way distros build > > it. > > > > I'm happy to come up with a config option patch, but I'm not sure if > > it would be accepted. Is that a possibility at this point? > > Please note that https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61923 But that isn't a wrong-code bug, in some cases it is no code generation difference between -g and -g0 at all (just :TI on the dumps being different), in some cases it can affect scheduling decisions for real, but still not in a way that disregards the actual dependencies in between instructions. You can get such scheduling differences also if you just use slightly different compiler revision. > isn't fixed yet. So it would be premature to manually revert Linus' > patch yet. > > When PR61923 gets fixed (and backported) its testcase could be combined > with the testcase Jakub posted earlier in this thread. Just some data to show that the debug info differences are nothing close to insignificant. I've built a 3.16 kernel (from Fedora 21 rpm) both normally (which includes the -fno-var-tracking-assignments change) and with that line in Makefile commented out. E.g. on fs/ext4/ext4.o, dwlocstat says for -fno-var-tracking-assignments: dwlocstat --tabulate=0.0:10,99,100 fs/ext4/ext4.o cov%samples cumul 0.0 19620/75% 19620/75% 0..10 518/1% 20138/77% 11..20 317/1% 20455/78% 21..30 362/1% 20817/79% 31..40 349/1% 21166/81% 41..50 240/0% 21406/82% 51..60 297/1% 21703/83% 61..70 293/1% 21996/84% 71..80 355/1% 22351/85% 81..90 402/1% 22753/87% 91..99 901/3% 23654/90% 100 2425/9% 26079/100% and without it: dwlocstat --tabulate=0.0:10,99,100 fs/ext4/ext4.o cov%samples cumul 0.0 8697/33%8697/33% 0..10 431/1% 9128/35% 11..20 371/1% 9499/36% 21..30 608/2% 10107/38% 31..40 525/2% 10632/40% 41..50 461/1% 11093/42% 51..60 881/3% 11974/45% 61..70 678/2% 12652/48% 71..80 894/3% 13546/51% 81..90 1036/3% 14582/55% 91..99 949/3% 15531/59% 100 10548/40% 26079/100% So, with -fno-var-tracking-assignments, about 75% of parameters/variables have no location info at all in ext4.o, 9% of parameters/variables have coverage for all instructions where the parameter/variable is in scope, the rest something in between. Without -fno-var-tracking-assignments, only 33% of params/vars have no location info at all, and 40% have coverage on all instructions where they are in scope. Even for the params/vars without 100% coverage there can be seen improvements, e.g. in the 51%-90% coverage. Or e.g. -fno-var-tracking-assignments: dwlocstat --tabulate=0.0:10,99,100 lib/built-in.o cov%samples cumul 0.0 6323/64%6323/64% 0..10 144/1% 6467/66% 11..20 138/1% 6605/67% 21..30 123/1% 6728/68% 31..40 144/1% 6872/70% 41..50 169/1% 7041/72% 51..60 130/1% 7171/73% 61..70 146/1% 7317/74% 71..80 198/2% 7515/76% 81..90 258/2% 7773/79% 91..99 541/5% 8314/85% 100 1448/14%9762/100% without -fno-var-tracking-assignments: dwlocstat --tabulate=0.0:10,99,100 lib/built-in.o cov%samples cumul 0.0 2954/30%2954/30% 0..10 131/1% 3085/31% 11..20 110/1% 3195/32% 21..30 141/1% 3336/34% 31..40 212/2% 3548/36% 41..50 226/2% 3774/38% 51..60 254/2% 4028/41% 61..70 237/2% 4265/43% 71..80 325/3% 4590/47% 81..90 420/4% 5010/51% 91..99 425/4% 5435/55% 100 4328/44%9763/100% It is visible also e.g. in the section sizes on vmlinux: -fno-var-tracking-assignments: readelf -WS vmlinux 2>&1 | awk '/\.debug_/{printf "%s %fMB\n", $2, strtonum("0x"$6)/1024.0/1024.0}' .debug_aranges 0.115387MB .debug_info 96.625183MB .debug_abbrev 2.585021MB .debug_line 5.936069MB .debug_frame 1.710655MB .debug_str 1.956143MB .debug_loc 8.713246MB .debug_ranges 3.020081MB without -fno-var-tracking-assignments: readelf -WS vmlinux 2>&1 | awk '/\.debug_/{printf "%s %fMB\n", $2, strtonum("0x"$6)/1024.0/1024.0}' .debug_aranges 0.115387MB .debug_info 99.564449MB .debug_abbrev 2.665213MB .debug_line 5.936069MB .debug_frame 1.710655MB .debug_str 1.955960MB .debug_loc 22.607447MB .debug_ranges 3.020264MB .debug_info growth is only very small (not even 3MB), only few vars have the same value everywhere, but .debug_loc growth is significant (2.6 times bigger). Jakub -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read
Re: [PATCH 3.15 33/37] Fix gcc-4.9.0 miscompilation of load_balance() in scheduler
On Tue, Aug 05, 2014 at 07:31:22AM -0400, Josh Boyer wrote: > On Wed, Jul 30, 2014 at 11:47 AM, Linus Torvalds > wrote: > > On Tue, Jul 29, 2014 at 11:53 PM, Jakub Jelinek wrote: > >> > >> IMNSHO this is a too big hammer approach. The bug happened on a single > >> file only (right?) > > > > Very dubious. We happened to see it in a single case, and _maybe_ that > > was the only one in the whole kernel. But it's much more likely that > > it wasn't - it's not like the code in question was even all that > > unusual (just a percpu access triggering an asm - but we have tons of > > asms in the kernel). > > > > I'd argue that we were very lucky to get the problem happening > > reliably enough for a couple of people who then cared enoiugh to do > > good bug reports (considering that it needed an interrupt in *just* > > the right place) that we could debug it at all. In some code that gets > > run much less than the scheduler, it could easily have been one of > > those "people report it once in a blue moon, looks like memory > > corruption". > > > > Now, it would be interesting to hear if there is something very > > special that made that instruction scheduling bug trigger just for > > 4.9.x, or if there is something else that made it very particular to > > that code sequence. But in the absence of good reasoning to the > > contrary, I'd much rather say "let's just avoid the bug entirely". > > > > And that's partly because we really don't care that much about the > > debug info. Yes, it gets used, but it's not *that* common, and the > > last time the issue of debug info sucking up tons of resources came > > up, the biggest users were people who just wanted line information for > > oopses. Yes, there are people running kgdb etc, but on the whole it's > > rare, and quite frankly, from everything I have _ever_ seen, that's > > not how the real kernel bugs are ever really discovered. So the kind > > of debug information that the variable tracking logic adds just isn't > > all that important for the kernel. > > Sorry to bring this back up after the fact, but it's important for a > number of things in various distros. I don't disagree it should be > disabled by default, but making it unconditional is going to force the > distributions that care about perf, systemtap, and debuggers to > manually revert this. That deviation is concerning because the > upstream kernel won't easily be buildable the same way distros build > it. Why does this patch affect perf and other debuggers? thanks, greg k-h -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3.15 33/37] Fix gcc-4.9.0 miscompilation of load_balance() in scheduler
On 2014.08.05 at 07:31 -0400, Josh Boyer wrote: > Sorry to bring this back up after the fact, but it's important for a > number of things in various distros. I don't disagree it should be > disabled by default, but making it unconditional is going to force the > distributions that care about perf, systemtap, and debuggers to > manually revert this. That deviation is concerning because the > upstream kernel won't easily be buildable the same way distros build > it. > > I'm happy to come up with a config option patch, but I'm not sure if > it would be accepted. Is that a possibility at this point? Please note that https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61923 isn't fixed yet. So it would be premature to manually revert Linus' patch yet. When PR61923 gets fixed (and backported) its testcase could be combined with the testcase Jakub posted earlier in this thread. -- Markus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3.15 33/37] Fix gcc-4.9.0 miscompilation of load_balance() in scheduler
On Wed, Jul 30, 2014 at 11:47 AM, Linus Torvalds wrote: > On Tue, Jul 29, 2014 at 11:53 PM, Jakub Jelinek wrote: >> >> IMNSHO this is a too big hammer approach. The bug happened on a single >> file only (right?) > > Very dubious. We happened to see it in a single case, and _maybe_ that > was the only one in the whole kernel. But it's much more likely that > it wasn't - it's not like the code in question was even all that > unusual (just a percpu access triggering an asm - but we have tons of > asms in the kernel). > > I'd argue that we were very lucky to get the problem happening > reliably enough for a couple of people who then cared enoiugh to do > good bug reports (considering that it needed an interrupt in *just* > the right place) that we could debug it at all. In some code that gets > run much less than the scheduler, it could easily have been one of > those "people report it once in a blue moon, looks like memory > corruption". > > Now, it would be interesting to hear if there is something very > special that made that instruction scheduling bug trigger just for > 4.9.x, or if there is something else that made it very particular to > that code sequence. But in the absence of good reasoning to the > contrary, I'd much rather say "let's just avoid the bug entirely". > > And that's partly because we really don't care that much about the > debug info. Yes, it gets used, but it's not *that* common, and the > last time the issue of debug info sucking up tons of resources came > up, the biggest users were people who just wanted line information for > oopses. Yes, there are people running kgdb etc, but on the whole it's > rare, and quite frankly, from everything I have _ever_ seen, that's > not how the real kernel bugs are ever really discovered. So the kind > of debug information that the variable tracking logic adds just isn't > all that important for the kernel. Sorry to bring this back up after the fact, but it's important for a number of things in various distros. I don't disagree it should be disabled by default, but making it unconditional is going to force the distributions that care about perf, systemtap, and debuggers to manually revert this. That deviation is concerning because the upstream kernel won't easily be buildable the same way distros build it. I'm happy to come up with a config option patch, but I'm not sure if it would be accepted. Is that a possibility at this point? josh -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3.15 33/37] Fix gcc-4.9.0 miscompilation of load_balance() in scheduler
On Tue, Jul 29, 2014 at 11:53 PM, Jakub Jelinek wrote: > > IMNSHO this is a too big hammer approach. The bug happened on a single > file only (right?) Very dubious. We happened to see it in a single case, and _maybe_ that was the only one in the whole kernel. But it's much more likely that it wasn't - it's not like the code in question was even all that unusual (just a percpu access triggering an asm - but we have tons of asms in the kernel). I'd argue that we were very lucky to get the problem happening reliably enough for a couple of people who then cared enoiugh to do good bug reports (considering that it needed an interrupt in *just* the right place) that we could debug it at all. In some code that gets run much less than the scheduler, it could easily have been one of those "people report it once in a blue moon, looks like memory corruption". Now, it would be interesting to hear if there is something very special that made that instruction scheduling bug trigger just for 4.9.x, or if there is something else that made it very particular to that code sequence. But in the absence of good reasoning to the contrary, I'd much rather say "let's just avoid the bug entirely". And that's partly because we really don't care that much about the debug info. Yes, it gets used, but it's not *that* common, and the last time the issue of debug info sucking up tons of resources came up, the biggest users were people who just wanted line information for oopses. Yes, there are people running kgdb etc, but on the whole it's rare, and quite frankly, from everything I have _ever_ seen, that's not how the real kernel bugs are ever really discovered. So the kind of debug information that the variable tracking logic adds just isn't all that important for the kernel. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3.15 33/37] Fix gcc-4.9.0 miscompilation of load_balance() in scheduler
On 2014.07.30 at 09:21 +0200, Jakub Jelinek wrote: > On Wed, Jul 30, 2014 at 09:13:08AM +0200, Markus Trippelsdorf wrote: > > On 2014.07.30 at 08:53 +0200, Jakub Jelinek wrote: > > > On Tue, Jul 29, 2014 at 06:49:09PM -0700, Greg Kroah-Hartman wrote: > > > > 3.15-stable review patch. If anyone has any objections, please let me > > > > know. > > > > > > IMNSHO this is a too big hammer approach. The bug happened on a single > > > file only (right?), so if anything, IMHO it could be disabled for that > > > single file only, and better do it only for compilers with the bug. > > > > No. There are many more files affected. It just happens that Linus > > analyzed the assembly of this single file (fair.c) and found a bug. > > Just build your redhat distro kernel with GCC_COMPARE_DEBUG=1 and you'll > > see. So unless someone analyzes the assembly output of all other > > affected files by hand and finds no issues, one has to assume the worst. > > I'm talking about wrong-code issues. For -fcompare-debug, we indeed check > it primarily during gcc bootstrap (through bootstrap-debug) and some > testcases, and we'll certainly try to build some more code with > -fcompare-debug and fix the issues. Yes, I'm talking about wrong-code issues, too. For example the pr61801.c testcase was reduced from kernel/exit.c. -- Markus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3.15 33/37] Fix gcc-4.9.0 miscompilation of load_balance() in scheduler
On Wed, Jul 30, 2014 at 09:13:08AM +0200, Markus Trippelsdorf wrote: > On 2014.07.30 at 08:53 +0200, Jakub Jelinek wrote: > > On Tue, Jul 29, 2014 at 06:49:09PM -0700, Greg Kroah-Hartman wrote: > > > 3.15-stable review patch. If anyone has any objections, please let me > > > know. > > > > IMNSHO this is a too big hammer approach. The bug happened on a single > > file only (right?), so if anything, IMHO it could be disabled for that > > single file only, and better do it only for compilers with the bug. > > No. There are many more files affected. It just happens that Linus > analyzed the assembly of this single file (fair.c) and found a bug. > Just build your redhat distro kernel with GCC_COMPARE_DEBUG=1 and you'll > see. So unless someone analyzes the assembly output of all other > affected files by hand and finds no issues, one has to assume the worst. I'm talking about wrong-code issues. For -fcompare-debug, we indeed check it primarily during gcc bootstrap (through bootstrap-debug) and some testcases, and we'll certainly try to build some more code with -fcompare-debug and fix the issues. Jakub -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3.15 33/37] Fix gcc-4.9.0 miscompilation of load_balance() in scheduler
On 2014.07.30 at 08:53 +0200, Jakub Jelinek wrote: > On Tue, Jul 29, 2014 at 06:49:09PM -0700, Greg Kroah-Hartman wrote: > > 3.15-stable review patch. If anyone has any objections, please let me know. > > IMNSHO this is a too big hammer approach. The bug happened on a single > file only (right?), so if anything, IMHO it could be disabled for that > single file only, and better do it only for compilers with the bug. No. There are many more files affected. It just happens that Linus analyzed the assembly of this single file (fair.c) and found a bug. Just build your redhat distro kernel with GCC_COMPARE_DEBUG=1 and you'll see. So unless someone analyzes the assembly output of all other affected files by hand and finds no issues, one has to assume the worst. > If there are wrong code bugs with -O2 (we've fixed many in the past), you > also don't turn off -O2 everywhere, similarly for -Os or any other options. > Disabling it just in case the same bug happens elsewhere when it actually > took 5 years before the bug caused miscompilation of something is too > defensive. We had at least 15 other wrong-code bugfixes just in between > 4.9.0 and 4.9.1. -fvar-tracking-assignments doesn't make a small difference > in debug info, but significant for optimized code. > If you build the kernel without and with -fno-var-tracking-assignments, > you can use e.g. the dwlocstat tool to see what kind of difference it makes > for the kernel in particular in variable debug info coverage. I'm sure it would be possible to backport a proper check based on the gcc testcase to the stable kernels, once it gets implemented. -- Markus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3.15 33/37] Fix gcc-4.9.0 miscompilation of load_balance() in scheduler
On Tue, Jul 29, 2014 at 06:49:09PM -0700, Greg Kroah-Hartman wrote: > 3.15-stable review patch. If anyone has any objections, please let me know. IMNSHO this is a too big hammer approach. The bug happened on a single file only (right?), so if anything, IMHO it could be disabled for that single file only, and better do it only for compilers with the bug. If there are wrong code bugs with -O2 (we've fixed many in the past), you also don't turn off -O2 everywhere, similarly for -Os or any other options. Disabling it just in case the same bug happens elsewhere when it actually took 5 years before the bug caused miscompilation of something is too defensive. We had at least 15 other wrong-code bugfixes just in between 4.9.0 and 4.9.1. -fvar-tracking-assignments doesn't make a small difference in debug info, but significant for optimized code. If you build the kernel without and with -fno-var-tracking-assignments, you can use e.g. the dwlocstat tool to see what kind of difference it makes for the kernel in particular in variable debug info coverage. > > --- a/Makefile > +++ b/Makefile > @@ -669,6 +669,8 @@ KBUILD_CFLAGS += -fomit-frame-pointer > endif > endif > > +KBUILD_CFLAGS += $(call cc-option, -fno-var-tracking-assignments) > + > ifdef CONFIG_DEBUG_INFO > KBUILD_CFLAGS+= -g > KBUILD_AFLAGS+= -Wa,--gdwarf-2 > Jakub -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/