Re: [PATCH 3.15 33/37] Fix gcc-4.9.0 miscompilation of load_balance() in scheduler

2014-08-06 Thread Alexei Starovoitov
On Wed, Aug 6, 2014 at 1:14 AM, Jakub Jelinek  wrote:
> On Tue, Aug 05, 2014 at 03:36:39PM -0700, Linus Torvalds wrote:
d>> I don't understand how you guys can be so cavalier about a compiler
>> bug that has already resulted in actual real problems. You bring up
>
> I have no problem with a -fno-var-tracking-assignments workaround for
> compilers that have the PR61801 wrong-code bug.  What I have problem with
> is with disabling it even for compilers that have that bug fixed.
> That is in essence disabling a useful feature just because it could have
> other bugs.  If my memory serves me well, PR61801 is the only wrong-code
> I remember caused by -fvar-tracking-assignments during the 5 years since
> it has been introduced into gcc.  Sure, there have been several
> -fcompare-debug bugs, where we generated slightly different code between
> -g and -g0, and as you mentioned we have one still pending (Vladimir is
> working on it right now), but that is mainly relevant to the case where

I think gcc guys are taking a wrong lesson out of this. kernel doesn't
care too much whether gcc produces the same binary with -g and -g0.
kernel developers also don't care about amount debug info for variables,
but they care about hard to find compiler bugs. In this case sched2
mishap around debug_insn was a symptom. The root cause is lack
of attention to -mno-red-zone. Kernel is not another user space program
where data/control flow analysis is all compiler need to make things right.
-mno-red-zone lesson exposes lack of 'interrupt' concept in compiler.
I think there has to be some infra put in place to make sure that it's not
just a scheduling barrier. Otherwise next bug will pop much sooner
than 5 years and it will not be related to debug info at all.
In this sense Steven's perl script to detect red-zone violations did more
to re-enable var-tracking than -fcompare-debug fixes.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3.15 33/37] Fix gcc-4.9.0 miscompilation of load_balance() in scheduler

2014-08-06 Thread Måns Rullgård
Jakub Jelinek  writes:

> There have been several man-years of work to get from the 25% var coverage
> to 67%, several DWARF extensions (most of them to be available in DWARF5 or
> work in progress on that) and with -fno-var-tracking-assignments that is
> just returned to the old state.

This is a typical "but look at all the work we've put in, it *has* to
work" argument.  As always, it is completely without merit.

>> In other words, anybody who relies on it has already learnt to work
>> around it. Or, more likely, there just isn't anybody who relies on it.
>> 
>> I don't understand how you guys can be so cavalier about a compiler
>> bug that has already resulted in actual real problems. You bring up
>
> I have no problem with a -fno-var-tracking-assignments workaround for
> compilers that have the PR61801 wrong-code bug.

Are there any that with reasonable confidence do not?

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3.15 33/37] Fix gcc-4.9.0 miscompilation of load_balance() in scheduler

2014-08-06 Thread Jakub Jelinek
On Tue, Aug 05, 2014 at 03:36:39PM -0700, Linus Torvalds wrote:
> On Tue, Aug 5, 2014 at 2:07 PM, Frank Ch. Eigler  wrote:
> >
> > Actually, "perf probe" does (via HAVE_DWARF_SUPPORT), to place probes
> > and to extract variables at those probes, much as systemtap does.
> > Without var-tracking, probes placed at most interior points of
> > functions will make variables inaccessible.
> 
> .. and as mentioned, -O2 already does that for many things, even
> *with* tracking.

Sure, debug info coverage for highly optimized code is never going to be
perfect, but there is a difference if you have 75% of vars 
or just 33% of vars (see the numbers I've posted, I've picked ext4.o just
randomly because it was one of the largest modules, can post more numbers if
needed).  BTW, because var-tracking is not performed at -O0, sometimes debug
info is actually worse in -O0 than at higher optimization levels, because
variable locations are bogus in prologues, epilogues and for variables with
register keyword anywhere.
There have been several man-years of work to get from the 25% var coverage
to 67%, several DWARF extensions (most of them to be available in DWARF5 or
work in progress on that) and with -fno-var-tracking-assignments that is
just returned to the old state.

> In other words, anybody who relies on it has already learnt to work
> around it. Or, more likely, there just isn't anybody who relies on it.
> 
> I don't understand how you guys can be so cavalier about a compiler
> bug that has already resulted in actual real problems. You bring up

I have no problem with a -fno-var-tracking-assignments workaround for
compilers that have the PR61801 wrong-code bug.  What I have problem with
is with disabling it even for compilers that have that bug fixed.
That is in essence disabling a useful feature just because it could have
other bugs.  If my memory serves me well, PR61801 is the only wrong-code
I remember caused by -fvar-tracking-assignments during the 5 years since
it has been introduced into gcc.  Sure, there have been several
-fcompare-debug bugs, where we generated slightly different code between
-g and -g0, and as you mentioned we have one still pending (Vladimir is
working on it right now), but that is mainly relevant to the case where
you'd ship -g0 built binaries (== kernel) and then only if bugs appear
wanted someone else to build kernel with -g and get identical binary, so
that you could debug it.  I believe if people build kernel with -g, then
they usually build it with -g from the beginning, and either save the kernel
with debug info somewhere, strip it to file or handle it similarly.
If there is a fear there could be other wrong-code bugs with
-fno-var-tracking-assignments, from the past experience that would be
~ another 5 years to discover it.  Compare that to the frequency of -O2
wrong code issues, with that you'd need to disable -O2 because of the fear
of unknown compiler bugs first.  And, we had various wrong-code bugs even at
-O0, so even that wouldn't help.  Compiler bugs are just that, bugs that
need to be reported, fixed, fixed compiler distributed to users, it is the
same thing with kernel bugs, libc bugs etc.

> theoretical cases that nobody has actually reported, and are
> apparently ignoring the fact that the compiler generates INCORRECT
> CODE. So on one hand we have known breakage, on the other we have

It actually isn't theoretical, actually various -fvar-tracking-assignments
changes have been done because people complained about important variables
in the kernel being optimized away, the whole DW_OP_GNU_entry_value DWARF
extension (DW_OP_entry_value in DWARF5 when it is released) was added
because of bugreports from people trying to debug the kernel.

Jakub
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3.15 33/37] Fix gcc-4.9.0 miscompilation of load_balance() in scheduler

2014-08-05 Thread Linus Torvalds
On Tue, Aug 5, 2014 at 4:30 PM, Frank Ch. Eigler  wrote:
>
> Would you consider a patch that does a gcc COMPARE_DEBUG-based test?

Yes. But as mentioned, we don't have a really good way to do that. I
guess we can do something similar to what "cc-option" does, but that
will end up doing it for every single kernel build. I'd love to have
something like this be done at kconfig time (along with all the
*other* crazy conditional compiler options we do), but I guess that's
a largely independent issue.

And afaik, it will basically disallow every gcc version since 4.5 up
until 4.9.2 gets released. And afaik, even current gcc actually fails
the compare-debug test for at least some inputs, so ...

 Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3.15 33/37] Fix gcc-4.9.0 miscompilation of load_balance() in scheduler

2014-08-05 Thread Frank Ch. Eigler
Hi -

On Tue, Aug 05, 2014 at 03:36:39PM -0700, Linus Torvalds wrote:
> > Actually, "perf probe" does (via HAVE_DWARF_SUPPORT), to place probes
> > and to extract variables at those probes, much as systemtap does.
> > Without var-tracking, probes placed at most interior points of
> > functions will make variables inaccessible.
> 
> .. and as mentioned, -O2 already does that for many things, even
> *with* tracking.

The whole point of variable tracking was to make -O2 usable (though
still imperfect) for those who use debuggers and such tools.


> [...]  I don't understand how you guys can be so cavalier about a
> compiler bug that has already resulted in actual real problems.

No one is minimizing the problem.  We are looking for a knob for those
who know that their compiler does not have that bug.  (Plus, those who
don't care about debug data could use CONFIG_DEBUG_INFO=n with the bad
compiler.)


> You bring up theoretical cases that nobody has actually reported
> [...]

I assure you that the years of effort that went into gcc variable
tracking was justified with actual reports.


> Do you compile without -O2 too? Because I *guarantee* you that with
> -O2 (even with tracking), you'll get "local variable 'xyz' optimized
> away" cases.

One gets many fewer than without it, and also fewer false positives
(where the non-var-tracking debuginfo claims a variable may be
available, but points to the wrong place).


> [...]  Until you can get the compiler people to have some sane way
> to know the problem is gone, I'm not going to maintain a kernel that
> uses a known-broken compiler feature. It's that simple.

Would you consider a patch that does a gcc COMPARE_DEBUG-based test?


- FChE
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3.15 33/37] Fix gcc-4.9.0 miscompilation of load_balance() in scheduler

2014-08-05 Thread Linus Torvalds
On Tue, Aug 5, 2014 at 2:07 PM, Frank Ch. Eigler  wrote:
>
> Actually, "perf probe" does (via HAVE_DWARF_SUPPORT), to place probes
> and to extract variables at those probes, much as systemtap does.
> Without var-tracking, probes placed at most interior points of
> functions will make variables inaccessible.

.. and as mentioned, -O2 already does that for many things, even
*with* tracking.

In other words, anybody who relies on it has already learnt to work
around it. Or, more likely, there just isn't anybody who relies on it.

I don't understand how you guys can be so cavalier about a compiler
bug that has already resulted in actual real problems. You bring up
theoretical cases that nobody has actually reported, and are
apparently ignoring the fact that the compiler generates INCORRECT
CODE. So on one hand we have known breakage, on the other we have
theoretical "I can make an example" arguments of behavior that no sane
user has ever done.

Do you compile without -O2 too? Because I *guarantee* you that with
-O2 (even with tracking), you'll get "local variable 'xyz' optimized
away" cases.

Christ.

I have asked the compiler people at least twice whether there is some
way to limit it to just gcc-4.9.0/1, but the compiler people didn't
step up and say that it was safe.

In the meantime, crazy people can choose to compile with known-broken
compilers. Until you can get the compiler people to have some sane way
to know the problem is gone, I'm not going to maintain a kernel that
uses a known-broken compiler feature. It's that simple.

  Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3.15 33/37] Fix gcc-4.9.0 miscompilation of load_balance() in scheduler

2014-08-05 Thread Frank Ch. Eigler
Hi -

> >>.  I don't disagree it should be
> >> disabled by default, but making it unconditional is going to force the
> >> distributions that care about perf, systemtap, and debuggers to
> >> manually revert this.
> >
> > Bah. I bet I use 'perf' more than most, and it doesn't care about
> > debug info. 

Actually, "perf probe" does (via HAVE_DWARF_SUPPORT), to place probes
and to extract variables at those probes, much as systemtap does.
Without var-tracking, probes placed at most interior points of
functions will make variables inaccessible.

Do you need a fully worked out example to see this?

- FChE
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3.15 33/37] Fix gcc-4.9.0 miscompilation of load_balance() in scheduler

2014-08-05 Thread Josh Boyer
On Tue, Aug 5, 2014 at 12:49 PM, Linus Torvalds
 wrote:
> On Tue, Aug 5, 2014 at 4:31 AM, Josh Boyer  wrote:
>>
>> Sorry to bring this back up after the fact, but it's important for a
>> number of things in various distros
>
> You said that before, and I ignored you before, because you didn't
> actually give any examples.

Actually, I haven't said anything on this until now.  You are perhaps
thinking of someone else.  I haven't participated in this thread until
today.

>>.  I don't disagree it should be
>> disabled by default, but making it unconditional is going to force the
>> distributions that care about perf, systemtap, and debuggers to
>> manually revert this.
>
> Bah. I bet I use 'perf' more than most, and it doesn't care about
> debug info. Sure, if you want the annotated source code, or put probes
> in place, you want the line number information, but the amount of
> debug information it needs is miniscule, and not impacted bu this at
> all afaik.
>
> And systemtap people have more problems than this.

They might have more problems, but they're telling me this breaks
systemtap full stop.  AFAIK, systemtap needs to put probes in place
for it to work.  I've added a couple systemtap developers on CC again,
and Jakub has an earlier post as well.  Hopefully they can expand upon
it.  If they can't, then things become simpler.

> Debuggers? Again, people who actually use kgdb have bigger issues than
> some slightly worse local variable tracking.  People care about the
> frame pointer information and type information, but if you use kgdb on
> the kernel you can damn well look at the assembly code and source code
> annotation for local variable information.
>
> So I call bullshit. Give a real example of real-world use, not some
> random handwaving of cases that happen to use debug info but - at
> least for the kernel - don't actually care about the variable
> tracking.

I'm not here to convince you that kernel developers need this option
set.  I doubt they do, and I'm not sure I could make a convincing
argument around that anyway.  But I'm also being told by the teams I
work with that this breaks something that previously worked for them,
and I'm asking if it can be conditionally enabled instead of blindly
defaulting to off.  Distros do terrible things like patch their gcc
with pre-release patches to fix issues, and giving the distros the
option of enabling this is all I'm after.

> The variable tracking is absolutely the *least* important part of the
> debug info. By a huge margin. To the point of being entirely
> irrelevant for the kernel. You make it sound like you lose all debug
> information, when in reality that's not the case at all.

I don't think I said that.  I said distros that care about the things
this enables in perf and systemtap are going to have to carry a revert
(assuming what I've been told is valid).  It's unnecessary deviation
form upstream because you decided you didn't give a crap about this
particular thing and thinks it's a waste of time.  Frankly, I don't
give a crap about it either but I do give a crap about people no
longer being able to use my distro kernels without that revert for
something that worked just fine before.  The distro knows when their
gcc packages have been fixed.

> Trust me, you lose *way* more debug information because the kernel
> uses "-O2" rather than "-O".

I'm not arguing it's useful.  I'm asking you to let distros to
continue to make the horrible choices they make so the things they
ship will work.  I don't think that's being unreasonable.

josh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3.15 33/37] Fix gcc-4.9.0 miscompilation of load_balance() in scheduler

2014-08-05 Thread Linus Torvalds
On Tue, Aug 5, 2014 at 4:31 AM, Josh Boyer  wrote:
>
> Sorry to bring this back up after the fact, but it's important for a
> number of things in various distros

You said that before, and I ignored you before, because you didn't
actually give any examples.

>.  I don't disagree it should be
> disabled by default, but making it unconditional is going to force the
> distributions that care about perf, systemtap, and debuggers to
> manually revert this.

Bah. I bet I use 'perf' more than most, and it doesn't care about
debug info. Sure, if you want the annotated source code, or put probes
in place, you want the line number information, but the amount of
debug information it needs is miniscule, and not impacted bu this at
all afaik.

And systemtap people have more problems than this.

Debuggers? Again, people who actually use kgdb have bigger issues than
some slightly worse local variable tracking.  People care about the
frame pointer information and type information, but if you use kgdb on
the kernel you can damn well look at the assembly code and source code
annotation for local variable information.

So I call bullshit. Give a real example of real-world use, not some
random handwaving of cases that happen to use debug info but - at
least for the kernel - don't actually care about the variable
tracking.

The variable tracking is absolutely the *least* important part of the
debug info. By a huge margin. To the point of being entirely
irrelevant for the kernel. You make it sound like you lose all debug
information, when in reality that's not the case at all.

Trust me, you lose *way* more debug information because the kernel
uses "-O2" rather than "-O".

   Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3.15 33/37] Fix gcc-4.9.0 miscompilation of load_balance() in scheduler

2014-08-05 Thread Jakub Jelinek
On Tue, Aug 05, 2014 at 01:46:49PM +0200, Markus Trippelsdorf wrote:
> On 2014.08.05 at 07:31 -0400, Josh Boyer wrote:
> > Sorry to bring this back up after the fact, but it's important for a
> > number of things in various distros.  I don't disagree it should be
> > disabled by default, but making it unconditional is going to force the
> > distributions that care about perf, systemtap, and debuggers to
> > manually revert this.  That deviation is concerning because the
> > upstream kernel won't easily be buildable the same way distros build
> > it.
> > 
> > I'm happy to come up with a config option patch, but I'm not sure if
> > it would be accepted.  Is that a possibility at this point?
> 
> Please note that https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61923

But that isn't a wrong-code bug, in some cases it is no code generation 
difference
between -g and -g0 at all (just :TI on the dumps being different), in some
cases it can affect scheduling decisions for real, but still not in a way
that disregards the actual dependencies in between instructions.  You can
get such scheduling differences also if you just use slightly different
compiler revision.

> isn't fixed yet. So it would be premature to manually revert Linus'
> patch yet. 
> 
> When PR61923 gets fixed (and backported) its testcase could be combined
> with the testcase Jakub posted earlier in this thread.

Just some data to show that the debug info differences are nothing close to
insignificant.  I've built a 3.16 kernel (from Fedora 21 rpm) both normally
(which includes the -fno-var-tracking-assignments change) and with that line
in Makefile commented out.  E.g. on fs/ext4/ext4.o, dwlocstat says for
-fno-var-tracking-assignments:
dwlocstat --tabulate=0.0:10,99,100 fs/ext4/ext4.o
cov%samples cumul
0.0 19620/75%   19620/75%
0..10   518/1%  20138/77%
11..20  317/1%  20455/78%
21..30  362/1%  20817/79%
31..40  349/1%  21166/81%
41..50  240/0%  21406/82%
51..60  297/1%  21703/83%
61..70  293/1%  21996/84%
71..80  355/1%  22351/85%
81..90  402/1%  22753/87%
91..99  901/3%  23654/90%
100 2425/9% 26079/100%
and without it:
dwlocstat --tabulate=0.0:10,99,100 fs/ext4/ext4.o
cov%samples cumul
0.0 8697/33%8697/33%
0..10   431/1%  9128/35%
11..20  371/1%  9499/36%
21..30  608/2%  10107/38%
31..40  525/2%  10632/40%
41..50  461/1%  11093/42%
51..60  881/3%  11974/45%
61..70  678/2%  12652/48%
71..80  894/3%  13546/51%
81..90  1036/3% 14582/55%
91..99  949/3%  15531/59%
100 10548/40%   26079/100%

So, with -fno-var-tracking-assignments, about 75% of parameters/variables have
no location info at all in ext4.o, 9% of parameters/variables have coverage
for all instructions where the parameter/variable is in scope, the rest
something in between.  Without -fno-var-tracking-assignments, only 33% of
params/vars have no location info at all, and 40% have coverage on all
instructions where they are in scope.  Even for the params/vars without 100%
coverage there can be seen improvements, e.g. in the 51%-90% coverage.

Or e.g. -fno-var-tracking-assignments:
dwlocstat --tabulate=0.0:10,99,100 lib/built-in.o
cov%samples cumul
0.0 6323/64%6323/64%
0..10   144/1%  6467/66%
11..20  138/1%  6605/67%
21..30  123/1%  6728/68%
31..40  144/1%  6872/70%
41..50  169/1%  7041/72%
51..60  130/1%  7171/73%
61..70  146/1%  7317/74%
71..80  198/2%  7515/76%
81..90  258/2%  7773/79%
91..99  541/5%  8314/85%
100 1448/14%9762/100%
without -fno-var-tracking-assignments:
dwlocstat --tabulate=0.0:10,99,100 lib/built-in.o
cov%samples cumul
0.0 2954/30%2954/30%
0..10   131/1%  3085/31%
11..20  110/1%  3195/32%
21..30  141/1%  3336/34%
31..40  212/2%  3548/36%
41..50  226/2%  3774/38%
51..60  254/2%  4028/41%
61..70  237/2%  4265/43%
71..80  325/3%  4590/47%
81..90  420/4%  5010/51%
91..99  425/4%  5435/55%
100 4328/44%9763/100%

It is visible also e.g. in the section sizes on vmlinux:
-fno-var-tracking-assignments:
readelf -WS vmlinux 2>&1 | awk '/\.debug_/{printf "%s %fMB\n", $2, 
strtonum("0x"$6)/1024.0/1024.0}'
.debug_aranges 0.115387MB
.debug_info 96.625183MB
.debug_abbrev 2.585021MB
.debug_line 5.936069MB
.debug_frame 1.710655MB
.debug_str 1.956143MB
.debug_loc 8.713246MB
.debug_ranges 3.020081MB
without -fno-var-tracking-assignments:
readelf -WS vmlinux 2>&1 | awk '/\.debug_/{printf "%s %fMB\n", $2, 
strtonum("0x"$6)/1024.0/1024.0}'
.debug_aranges 0.115387MB
.debug_info 99.564449MB
.debug_abbrev 2.665213MB
.debug_line 5.936069MB
.debug_frame 1.710655MB
.debug_str 1.955960MB
.debug_loc 22.607447MB
.debug_ranges 3.020264MB

.debug_info growth is only very small (not even 3MB), only few vars have the
same value everywhere, but .debug_loc growth is significant (2.6 times
bigger).

Jakub
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read

Re: [PATCH 3.15 33/37] Fix gcc-4.9.0 miscompilation of load_balance() in scheduler

2014-08-05 Thread Greg Kroah-Hartman
On Tue, Aug 05, 2014 at 07:31:22AM -0400, Josh Boyer wrote:
> On Wed, Jul 30, 2014 at 11:47 AM, Linus Torvalds
>  wrote:
> > On Tue, Jul 29, 2014 at 11:53 PM, Jakub Jelinek  wrote:
> >>
> >> IMNSHO this is a too big hammer approach.  The bug happened on a single
> >> file only (right?)
> >
> > Very dubious. We happened to see it in a single case, and _maybe_ that
> > was the only one in the whole kernel. But it's much more likely that
> > it wasn't - it's not like the code in question was even all that
> > unusual (just a percpu access triggering an asm - but we have tons of
> > asms in the kernel).
> >
> > I'd argue that we were very lucky to get the problem happening
> > reliably enough for a couple of people who then cared enoiugh to do
> > good bug reports (considering that it needed an interrupt in *just*
> > the right place) that we could debug it at all. In some code that gets
> > run much less than the scheduler, it could easily have been one of
> > those "people report it once in a blue moon, looks like memory
> > corruption".
> >
> > Now, it would be interesting to hear if there is something very
> > special that made that instruction scheduling bug trigger just for
> > 4.9.x, or if there is something else that made it very particular to
> > that code sequence. But in the absence of good reasoning to the
> > contrary, I'd much rather say "let's just avoid the bug entirely".
> >
> > And that's partly because we really don't care that much about the
> > debug info. Yes, it gets used, but it's not *that* common, and the
> > last time the issue of debug info sucking up tons of resources came
> > up, the biggest users were people who just wanted line information for
> > oopses. Yes, there are people running kgdb etc, but on the whole it's
> > rare, and quite frankly, from everything I have _ever_ seen, that's
> > not how the real kernel bugs are ever really discovered. So the kind
> > of debug information that the variable tracking logic adds just isn't
> > all that important for the kernel.
> 
> Sorry to bring this back up after the fact, but it's important for a
> number of things in various distros.  I don't disagree it should be
> disabled by default, but making it unconditional is going to force the
> distributions that care about perf, systemtap, and debuggers to
> manually revert this.  That deviation is concerning because the
> upstream kernel won't easily be buildable the same way distros build
> it.

Why does this patch affect perf and other debuggers?

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3.15 33/37] Fix gcc-4.9.0 miscompilation of load_balance() in scheduler

2014-08-05 Thread Markus Trippelsdorf
On 2014.08.05 at 07:31 -0400, Josh Boyer wrote:
> Sorry to bring this back up after the fact, but it's important for a
> number of things in various distros.  I don't disagree it should be
> disabled by default, but making it unconditional is going to force the
> distributions that care about perf, systemtap, and debuggers to
> manually revert this.  That deviation is concerning because the
> upstream kernel won't easily be buildable the same way distros build
> it.
> 
> I'm happy to come up with a config option patch, but I'm not sure if
> it would be accepted.  Is that a possibility at this point?

Please note that https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61923
isn't fixed yet. So it would be premature to manually revert Linus'
patch yet. 

When PR61923 gets fixed (and backported) its testcase could be combined
with the testcase Jakub posted earlier in this thread.

-- 
Markus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3.15 33/37] Fix gcc-4.9.0 miscompilation of load_balance() in scheduler

2014-08-05 Thread Josh Boyer
On Wed, Jul 30, 2014 at 11:47 AM, Linus Torvalds
 wrote:
> On Tue, Jul 29, 2014 at 11:53 PM, Jakub Jelinek  wrote:
>>
>> IMNSHO this is a too big hammer approach.  The bug happened on a single
>> file only (right?)
>
> Very dubious. We happened to see it in a single case, and _maybe_ that
> was the only one in the whole kernel. But it's much more likely that
> it wasn't - it's not like the code in question was even all that
> unusual (just a percpu access triggering an asm - but we have tons of
> asms in the kernel).
>
> I'd argue that we were very lucky to get the problem happening
> reliably enough for a couple of people who then cared enoiugh to do
> good bug reports (considering that it needed an interrupt in *just*
> the right place) that we could debug it at all. In some code that gets
> run much less than the scheduler, it could easily have been one of
> those "people report it once in a blue moon, looks like memory
> corruption".
>
> Now, it would be interesting to hear if there is something very
> special that made that instruction scheduling bug trigger just for
> 4.9.x, or if there is something else that made it very particular to
> that code sequence. But in the absence of good reasoning to the
> contrary, I'd much rather say "let's just avoid the bug entirely".
>
> And that's partly because we really don't care that much about the
> debug info. Yes, it gets used, but it's not *that* common, and the
> last time the issue of debug info sucking up tons of resources came
> up, the biggest users were people who just wanted line information for
> oopses. Yes, there are people running kgdb etc, but on the whole it's
> rare, and quite frankly, from everything I have _ever_ seen, that's
> not how the real kernel bugs are ever really discovered. So the kind
> of debug information that the variable tracking logic adds just isn't
> all that important for the kernel.

Sorry to bring this back up after the fact, but it's important for a
number of things in various distros.  I don't disagree it should be
disabled by default, but making it unconditional is going to force the
distributions that care about perf, systemtap, and debuggers to
manually revert this.  That deviation is concerning because the
upstream kernel won't easily be buildable the same way distros build
it.

I'm happy to come up with a config option patch, but I'm not sure if
it would be accepted.  Is that a possibility at this point?

josh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3.15 33/37] Fix gcc-4.9.0 miscompilation of load_balance() in scheduler

2014-07-30 Thread Linus Torvalds
On Tue, Jul 29, 2014 at 11:53 PM, Jakub Jelinek  wrote:
>
> IMNSHO this is a too big hammer approach.  The bug happened on a single
> file only (right?)

Very dubious. We happened to see it in a single case, and _maybe_ that
was the only one in the whole kernel. But it's much more likely that
it wasn't - it's not like the code in question was even all that
unusual (just a percpu access triggering an asm - but we have tons of
asms in the kernel).

I'd argue that we were very lucky to get the problem happening
reliably enough for a couple of people who then cared enoiugh to do
good bug reports (considering that it needed an interrupt in *just*
the right place) that we could debug it at all. In some code that gets
run much less than the scheduler, it could easily have been one of
those "people report it once in a blue moon, looks like memory
corruption".

Now, it would be interesting to hear if there is something very
special that made that instruction scheduling bug trigger just for
4.9.x, or if there is something else that made it very particular to
that code sequence. But in the absence of good reasoning to the
contrary, I'd much rather say "let's just avoid the bug entirely".

And that's partly because we really don't care that much about the
debug info. Yes, it gets used, but it's not *that* common, and the
last time the issue of debug info sucking up tons of resources came
up, the biggest users were people who just wanted line information for
oopses. Yes, there are people running kgdb etc, but on the whole it's
rare, and quite frankly, from everything I have _ever_ seen, that's
not how the real kernel bugs are ever really discovered. So the kind
of debug information that the variable tracking logic adds just isn't
all that important for the kernel.

 Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3.15 33/37] Fix gcc-4.9.0 miscompilation of load_balance() in scheduler

2014-07-30 Thread Markus Trippelsdorf
On 2014.07.30 at 09:21 +0200, Jakub Jelinek wrote:
> On Wed, Jul 30, 2014 at 09:13:08AM +0200, Markus Trippelsdorf wrote:
> > On 2014.07.30 at 08:53 +0200, Jakub Jelinek wrote:
> > > On Tue, Jul 29, 2014 at 06:49:09PM -0700, Greg Kroah-Hartman wrote:
> > > > 3.15-stable review patch.  If anyone has any objections, please let me 
> > > > know.
> > > 
> > > IMNSHO this is a too big hammer approach.  The bug happened on a single
> > > file only (right?), so if anything, IMHO it could be disabled for that
> > > single file only, and better do it only for compilers with the bug.
> > 
> > No. There are many more files affected. It just happens that Linus
> > analyzed the assembly of this single file (fair.c) and found a bug. 
> > Just build your redhat distro kernel with GCC_COMPARE_DEBUG=1 and you'll
> > see. So unless someone analyzes the assembly output of all other
> > affected files by hand and finds no issues, one has to assume the worst.
> 
> I'm talking about wrong-code issues.  For -fcompare-debug, we indeed check
> it primarily during gcc bootstrap (through bootstrap-debug) and some
> testcases, and we'll certainly try to build some more code with
> -fcompare-debug and fix the issues.

Yes, I'm talking about wrong-code issues, too. For example the pr61801.c
testcase was reduced from kernel/exit.c.

-- 
Markus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3.15 33/37] Fix gcc-4.9.0 miscompilation of load_balance() in scheduler

2014-07-30 Thread Jakub Jelinek
On Wed, Jul 30, 2014 at 09:13:08AM +0200, Markus Trippelsdorf wrote:
> On 2014.07.30 at 08:53 +0200, Jakub Jelinek wrote:
> > On Tue, Jul 29, 2014 at 06:49:09PM -0700, Greg Kroah-Hartman wrote:
> > > 3.15-stable review patch.  If anyone has any objections, please let me 
> > > know.
> > 
> > IMNSHO this is a too big hammer approach.  The bug happened on a single
> > file only (right?), so if anything, IMHO it could be disabled for that
> > single file only, and better do it only for compilers with the bug.
> 
> No. There are many more files affected. It just happens that Linus
> analyzed the assembly of this single file (fair.c) and found a bug. 
> Just build your redhat distro kernel with GCC_COMPARE_DEBUG=1 and you'll
> see. So unless someone analyzes the assembly output of all other
> affected files by hand and finds no issues, one has to assume the worst.

I'm talking about wrong-code issues.  For -fcompare-debug, we indeed check
it primarily during gcc bootstrap (through bootstrap-debug) and some
testcases, and we'll certainly try to build some more code with
-fcompare-debug and fix the issues.

Jakub
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3.15 33/37] Fix gcc-4.9.0 miscompilation of load_balance() in scheduler

2014-07-30 Thread Markus Trippelsdorf
On 2014.07.30 at 08:53 +0200, Jakub Jelinek wrote:
> On Tue, Jul 29, 2014 at 06:49:09PM -0700, Greg Kroah-Hartman wrote:
> > 3.15-stable review patch.  If anyone has any objections, please let me know.
> 
> IMNSHO this is a too big hammer approach.  The bug happened on a single
> file only (right?), so if anything, IMHO it could be disabled for that
> single file only, and better do it only for compilers with the bug.

No. There are many more files affected. It just happens that Linus
analyzed the assembly of this single file (fair.c) and found a bug. 
Just build your redhat distro kernel with GCC_COMPARE_DEBUG=1 and you'll
see. So unless someone analyzes the assembly output of all other
affected files by hand and finds no issues, one has to assume the worst.

> If there are wrong code bugs with -O2 (we've fixed many in the past), you
> also don't turn off -O2 everywhere, similarly for -Os or any other options.
> Disabling it just in case the same bug happens elsewhere when it actually
> took 5 years before the bug caused miscompilation of something is too
> defensive.  We had at least 15 other wrong-code bugfixes just in between
> 4.9.0 and 4.9.1.  -fvar-tracking-assignments doesn't make a small difference
> in debug info, but significant for optimized code.
> If you build the kernel without and with -fno-var-tracking-assignments,
> you can use e.g. the dwlocstat tool to see what kind of difference it makes
> for the kernel in particular in variable debug info coverage.

I'm sure it would be possible to backport a proper check based on the
gcc testcase to the stable kernels, once it gets implemented.

-- 
Markus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3.15 33/37] Fix gcc-4.9.0 miscompilation of load_balance() in scheduler

2014-07-29 Thread Jakub Jelinek
On Tue, Jul 29, 2014 at 06:49:09PM -0700, Greg Kroah-Hartman wrote:
> 3.15-stable review patch.  If anyone has any objections, please let me know.

IMNSHO this is a too big hammer approach.  The bug happened on a single
file only (right?), so if anything, IMHO it could be disabled for that
single file only, and better do it only for compilers with the bug.
If there are wrong code bugs with -O2 (we've fixed many in the past), you
also don't turn off -O2 everywhere, similarly for -Os or any other options.
Disabling it just in case the same bug happens elsewhere when it actually
took 5 years before the bug caused miscompilation of something is too
defensive.  We had at least 15 other wrong-code bugfixes just in between
4.9.0 and 4.9.1.  -fvar-tracking-assignments doesn't make a small difference
in debug info, but significant for optimized code.
If you build the kernel without and with -fno-var-tracking-assignments,
you can use e.g. the dwlocstat tool to see what kind of difference it makes
for the kernel in particular in variable debug info coverage.
> 
> --- a/Makefile
> +++ b/Makefile
> @@ -669,6 +669,8 @@ KBUILD_CFLAGS += -fomit-frame-pointer
>  endif
>  endif
>  
> +KBUILD_CFLAGS   += $(call cc-option, -fno-var-tracking-assignments)
> +
>  ifdef CONFIG_DEBUG_INFO
>  KBUILD_CFLAGS+= -g
>  KBUILD_AFLAGS+= -Wa,--gdwarf-2
> 

Jakub
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/