from:"andi\-gcc at firstfloor dot org"

[Bug lto/83388] New: reference statement index not found error with -fsanitize=null

2017-12-11 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83388

Bug ID: 83388
   Summary: reference statement index not found error with
-fsanitize=null
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: lto
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
CC: marxin at gcc dot gnu.org
  Target Milestone: ---

Created attachment 42844
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42844&action=edit
test case

With the attached test case

gcc8  -m32 -O2 -flto -fsanitize=null -c core.i
gcc8 -r -nostdlib core.o

gives

In function 'i':
lto1: fatal error: Reference statement index not found
compilation terminated.

Happens with gcc 7 and trunk

[Bug lto/83376] ICE in LTO streamer

2017-12-11 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83376

Andi Kleen  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |INVALID

--- Comment #1 from Andi Kleen  ---
Looks like it was a case of incompatible LTO object file from a different gcc
build. With a clean build it doesn't happen anymore.

[Bug lto/83380] New: disk full while writing LTO files leads to ICE

2017-12-11 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83380

Bug ID: 83380
   Summary: disk full while writing LTO files leads to ICE
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: lto
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
CC: marxin at gcc dot gnu.org
  Target Milestone: ---

lto1: fatal error: error writing to vmlinux.ltrans15.s: No space left on device
gcc: internal compiler error: Aborted signal terminated program lto1
Please submit a full bug report,
with preprocessed source if appropriate.
See <https://gcc.gnu.org/bugs/> for instructions.

Should just exit in this case

[Bug gcov-profile/83355] autofdo g++.dg/bprob/g++-bprob-1.C FAILS with ICE

2017-12-11 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83355

--- Comment #3 from Andi Kleen  ---
patch checked in

[Bug lto/83376] New: ICE in LTO streamer

2017-12-11 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83376

Bug ID: 83376
   Summary: ICE in LTO streamer
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: lto
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
CC: marxin at gcc dot gnu.org
  Target Milestone: ---

Don't have a small test case right now, but will bisect

When building Linux kernel LTO with gcc 8 I currently get an ICE. Doesn't
happen on 7 and I think it's also recent on 8.

In this case data_in->current_decl_data is NULL while reading a reference.

0xa58fe7 crash_signal
../../gcc/gcc/toplev.c:325
0x957a39 lto_file_decl_data_get_var_decl
../../gcc/gcc/lto-streamer.h:1210
0x957a39 lto_input_tree_ref(lto_input_block*, data_in*, function*, LTO_tags)
../../gcc/gcc/lto-streamer-in.c:366
0x957c1d lto_input_tree_1(lto_input_block*, data_in*, LTO_tags, unsigned int)
../../gcc/gcc/lto-streamer-in.c:1475
0x6bdc8c lto_read_decls
../../gcc/gcc/lto/lto.c:1791
0x6bdc8c lto_file_finalize
../../gcc/gcc/lto/lto.c:2055
0x6bdc8c lto_create_files_from_ids
../../gcc/gcc/lto/lto.c:2065
0x6bdc8c lto_file_read
../../gcc/gcc/lto/lto.c:2106
0x6bdc8c read_cgraph_and_symbols
../../gcc/gcc/lto/lto.c:2818
0x6bfdb1 lto_main()
../../gcc/gcc/lto/lto.c:3323

[Bug lto/83375] partitioner partitions static arrays with label references

2017-12-11 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83375

--- Comment #1 from Andi Kleen  ---
Actually -flto-partition=max

[Bug lto/83375] New: partitioner partitions static arrays with label references

2017-12-11 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83375

Bug ID: 83375
   Summary: partitioner partitions static arrays with label
references
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: lto
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
CC: marxin at gcc dot gnu.org
  Target Milestone: ---

Created attachment 42842
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42842&action=edit
test case

I thought there was already a bug for this, but can't find it right now.

When label addresses (&&label) are put into static arrays, the LTO partitioner
can put the static array into a different partition from the function, which
causes an assembler error because the code labels are local.

This breaks Linux kernel LTO builds.

See attached test case.

I think ipa-comdats should put the function and the static into the same
partition, but for some reason it doesn't work.

Attached test case shows the problem with -flto-partition=1to1 -flto -O2
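
The attachment itself is not reproduced here. As a rough illustration of the pattern being described, the following is a minimal sketch using the GNU C "labels as values" extension; the function name, label names and table are invented for this example and are not taken from the attached test case.

```c
/* A static array initialized with addresses of labels that are local to
   this function.  The array is a separate static object, while the labels
   only exist inside the assembly generated for run_op().  */
int run_op(int op, int a, int b)
{
    static void *const targets[] = { &&do_add, &&do_sub };

    if (op < 0 || op > 1)
        return 0;               /* unknown operation */
    goto *targets[op];          /* computed goto through the table */

do_add:
    return a + b;
do_sub:
    return a - b;
}
```

If the table and the function body were emitted into different LTO partitions, the table would refer to labels that are not visible in its own assembly file, which matches the assembler error described above.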

[Bug gcov-profile/83355] New: autofdo g++.dg/bprob/g++-bprob-1.C FAILS with ICE

2017-12-10 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83355

Bug ID: 83355
   Summary: autofdo g++.dg/bprob/g++-bprob-1.C FAILS with ICE
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: gcov-profile
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
CC: marxin at gcc dot gnu.org
  Target Milestone: ---

Running in gdb shows that there is a very deep recursion in get_index_by_decl
until it overflows the stack.

This patch seems to fix it (but not sure why the abstract origin would point to
itself)

diff --git a/gcc/auto-profile.c b/gcc/auto-profile.c
index 5134a795331..403709bad6b 100644
--- a/gcc/auto-profile.c
+++ b/gcc/auto-profile.c
@@ -477,7 +477,7 @@ string_table::get_index_by_decl (tree decl) const
   ret = get_index (lang_hooks.dwarf_name (decl, 0));
   if (ret != -1)
 return ret;
-  if (DECL_ABSTRACT_ORIGIN (decl))
+  if (DECL_ABSTRACT_ORIGIN (decl) && DECL_ABSTRACT_ORIGIN (decl) != decl)
 return get_index_by_decl (DECL_ABSTRACT_ORIGIN (decl));

   return -1;
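
To illustrate why the extra comparison stops the runaway recursion, here is a small stand-alone sketch in C. The struct and the lookup table are hypothetical stand-ins for GCC's tree declarations and the autofdo string table; only the shape of the guard mirrors the patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a declaration node.  */
struct decl {
    const char *name;
    struct decl *abstract_origin;   /* NULL, another decl, or the decl itself */
};

/* Return the index of NAME in TABLE, or -1 if it is not present.  */
static int get_index(const char *const *table, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(table[i], name) == 0)
            return (int)i;
    return -1;
}

/* Look up D by name, falling back to its abstract origin.  The second half
   of the condition is the guard: without it, a decl whose origin points
   back at itself would recurse until the stack overflows.  */
int get_index_by_decl(const char *const *table, size_t n, const struct decl *d)
{
    int ret = get_index(table, n, d->name);
    if (ret != -1)
        return ret;
    if (d->abstract_origin && d->abstract_origin != d)
        return get_index_by_decl(table, n, d->abstract_origin);
    return -1;
}
```

Note that this only guards against a direct self-reference; a longer cycle of origins would still recurse without end.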


Backtrace:


Program received signal SIGSEGV, Segmentation fault.
0x016c4ab2 in pp_emit_prefix (pp=0x229b1a0 ) at
/home/andi/gcc/git/gcc/gcc/pretty-print.c:1485
1485{
(gdb) up
#1  0x016c4c90 in pp_append_text(pretty_printer*, char const*, char
const*) ()
at /home/andi/gcc/git/gcc/gcc/pretty-print.c:1556
1556  pp_emit_prefix (pp);
(gdb) bt
#0  0x016c4ab2 in pp_emit_prefix (pp=0x229b1a0 )
at /home/andi/gcc/git/gcc/gcc/pretty-print.c:1485
#1  0x016c4c90 in pp_append_text(pretty_printer*, char const*, char
const*) ()
at /home/andi/gcc/git/gcc/gcc/pretty-print.c:1556
#2  0x00b12c83 in pp_c_identifier (pp=0x229b1a0
, id=)
at /home/andi/gcc/git/gcc/gcc/c-family/c-pretty-print.c:1203
#3  0x00992b46 in dump_decl (flags=0, t=0x76d2ce40, pp=0x229b1a0
)
at /home/andi/gcc/git/gcc/gcc/tree.h:3226
#4  dump_function_name(cxx_pretty_printer*, tree_node*, int) () at
/home/andi/gcc/git/gcc/gcc/cp/error.c:1852
#5  0x009940a4 in lang_decl_name(tree_node*, int, bool) () at
/home/andi/gcc/git/gcc/gcc/cp/error.c:3005
#6  0x00994133 in lang_decl_dwarf_name (decl=,
v=, translate=)
at /home/andi/gcc/git/gcc/gcc/cp/error.c:2977
#7  0x0156762a in autofdo::string_table::get_index_by_decl(tree_node*)
const ()
at /home/andi/gcc/git/gcc/gcc/auto-profile.c:477

[Bug ipa/83346] inliner crash with always inline and templates

2017-12-09 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83346

--- Comment #1 from Andi Kleen  ---
This fixes it. Don't know why that node has no decl.

Will submit after a test cycle.

diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
index 7846e93d119..dcd8a3de1ac 100644
--- a/gcc/ipa-inline.c
+++ b/gcc/ipa-inline.c
@@ -2391,7 +2391,8 @@ ipa_inline (void)
 entry of cycles, possibly cloning that entry point and
 try to flatten itself turning it into a self-recursive
 function.  */
-  if (lookup_attribute ("flatten",
+  if (node->decl
+&& lookup_attribute ("flatten",
DECL_ATTRIBUTES (node->decl)) != NULL)
{
  if (dump_file)

[Bug ipa/83346] New: inliner crash with always inline and templates

2017-12-09 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83346

Bug ID: 83346
   Summary: inliner crash with always inline and templates
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: ipa
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
CC: marxin at gcc dot gnu.org
  Target Milestone: ---

Created attachment 42820
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42820&action=edit
test case

Attached test case segfaults with -O2 on gcc 7 and 8 trunk
g++ -O2 -S ch-crash.i

ch-crash.i:30:1: internal compiler error: Segmentation fault
 }
 ^
0xc030f7 crash_signal
../../gcc/gcc/toplev.c:325
0x125b189 ipa_inline
../../gcc/gcc/ipa-inline.c:2388
0x125b189 execute
../../gcc/gcc/ipa-inline.c:2807

[Bug target/83052] [8 Regression] ICE in extract_insn, at recog.c:2305 starting from r254560

2017-11-20 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83052

--- Comment #1 from Andi Kleen  ---
I'm not sure why you call it a regression? You must be running the test suite
manually with the new option. 

I haven't tested, but likely it will fail if you run that test with
-mcmodel=large too. The -mforce-indirect-call patch is really only a subset
of -mcmodel=large.  Then it would be more a latent bug.

[Bug tree-optimization/82854] New: more missing simplifcations

2017-11-05 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82854

Bug ID: 82854
   Summary: more missing simplifcations
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

These all come from a paper

"Optgen: A Generator for Local Optimizations" (Buchwald et al.).
https://pp.info.uni-karlsruhe.de/uploads/publikationen/buchwald15cc.pdf

These were found by a SAT solver.

I wrote them in partial pseudo match.pd syntax (untested, likely buggy).

I'm not sure how useful they really are for real programs, but since the
auto-generated matchers scale well to more rules, they wouldn't hurt, I suppose.

/* x + (x & 0x8000) -> x & 0x7fff */
(simplify
  (plus:c @0 (bit_and @0 integer_msb_onlyp@1))
  (bit_and @0 { @1 - 1; } ))

/* (x | 0x8000) + 0x8000 -> x & 0x7FFF */
(simplify
  (plus:c (bit_ior @0 integer_msb_onlyp) msb_setp)
  (bit_and @0 { msb_minus_one_val(type); } ))

/* x & (x + 0x8000) -> x & 0x7FFF */
(simplify
  (bit_and:c (plus @0 msb_setp) @0)
  (bit_and @0 { msb_minus_one_val(type); } ))

/* x & (0x7FFF - x) -> x & 0x8000 */
(simplify
  (bit_and:c @0 (minus msb_minus_onep @0))
  (bit_and @0 { msb_val(type); } ))

/* is_power_of_2(c1) && c0 & (2 * c1 - 1) == c1 - 1 ->
   (c0 - x) & c1 -> x & c1 */

/* x | (x + 0x8000) -> x | 0x8000 */
(simplify
  (bit_ior:c @0 (plus @0 msb_onlyp))
  (bit_ior @0 { msb_val(type); } ))

/* x | (0x7FFF - x) -> x | 0x7FFF */
(simplify
  (bit_ior:c @0 (minus 0x7FFF @0))
  (bit_ior @0 0x7FFF))

/* x | (x ^ y) -> x | y */
(simplify
  (bit_ior:c @0 (bit_xor:c @0 @1))
  (bit_ior @0 @1))

/* ((c0 | -c0) & ∼c1) == 0 AND (x + c0) | c1 -> x | c1 */

/* is_power_of_2(∼c1) && c0 & (2 * ∼c1 - 1) == ∼c1 - 1 AND
   (c0 - x) | c1 ->
   x | c1 */

/* -x | 0xFFFE -> x | 0xFFFE */
(simplify
  (bit_ior (negate @0) 0xFFFE)
  (bit_ior @0 0xFFFE))

/* 0 - (x & 0x8000) -> x & 0x8000 */
(simplify
  (minus 0 (bit_and:c @0 0x8000))
  (bit_and @0 0x8000))

/* 0x7FFF - (x & 0x8000) -> x | 0x7FFF */
(simplify
  (minus 0x7FFF (bit_and @0 0x8000))
  (bit_ior @0 0x7FFF))

/* 0x7FFF - (x | 0x7FFF) -> x & 0x8000 */
(simplify
  (minus 0x7FFF (bit_ior:c @0 0x7FFF))
  (bit_and @0 0x8000))

/* 0xFFFE - (x | 0x7FFF) -> x | 0x7FFF */
(simplify
  (minus 0xFFFE (bit_ior:c @0 0x7FFF))
  (bit_ior @0 0x7FFF))

/* (x & 0x7FFF) - x -> x & 0x8000 */
(simplify
  (minus (bit_and:c @0 0x7FFF) @0)
  (bit_and @0 0x8000))

/* x ^ (x + 0x8000) -> 0x8000 */
(simplify
  (bit_xor:c @0 (plus:c @0 0x8000))
  0x8000)

/* x ^ (0x7FFF - x) -> 0x7FFF */
(simplify
  (bit_xor:c @0 (minus 0x7FFF @0))
  0x7FFF)

/* (x + 0x7FFF) ^ 0x7FFF -> -x */
(simplify
  (bit_xor:c (plus:c @0 0x7FFF) 0x7FFF)
  (negate @0))

/* -x ^ 0x8000 -> 0x8000 - x */
(simplify
  (bit_xor:c (negate @0) 0x8000)
  (minus 0x8000 @0))

/* (0x7FFF - x) ^ 0x7FFF -> x */
(simplify
  (bit_xor:c (minus 0x7FFF @0) 0x7FFF)
  @0)

/* ~(x + c) -> ~c - x */
(simplify
  (bit_not (plus:c @0 CONSTANT_CLASS_P@1))
  (minus (bit_not @1) @0))

/* -x ^ 0x7FFF -> x + 0x7FFF */
(simplify
  (bit_xor (negate @0) 0x7FFF)
  (plus @0 0x7FFF))

/* (x | c) - c -> x & ∼c */
(simplify
  (minus (bit_ior @0 CONSTANT_CLASS_P@1) @1)
  (bit_and @0 (bit_not @1)))

/* ~(c - x) -> x + ∼c */
(simplify
  (bit_not (minus CONSTANT_CLASS_P@0 @1))
  (plus @1 (bit_not @0)))

/* -c0 == c1 AND (x | c0) + c1 -> x & ∼c0 */
(simplify
  (plus (bit_ior @0 CONSTANT_CLASS_P@1) CONSTANT_CLASS_P@2)
  (if (...)
   (bit_and @0 (bit_not @1))))

/* (c0 & ∼c1) == 0 AND (x ^ c0) | c1 -> x | c1 */

/* 0x7FFF - (x ^ c) -> x ^ (0x7FFF - c) */
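
Since the rules above are untested, a brute-force check is a cheap sanity test. The following is a minimal sketch that takes the constants at face value as 16-bit quantities (0x8000 as the most significant bit, 0x7FFF as everything below it) and checks a selection of the identities for every 16-bit x. The helper names and the sample values used for y and c are arbitrary choices for this example.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint16_t u16;

/* Return nonzero if every identity below holds for this x, y and constant c,
   with all arithmetic truncated to 16 bits.  */
static int identities_hold(u16 x, u16 y, u16 c)
{
    const u16 msb = 0x8000, rest = 0x7FFF;

    return (u16)(x + (x & msb)) == (u16)(x & rest)        /* x + (x & msb)   */
        && (u16)(x & (u16)(x + msb)) == (u16)(x & rest)   /* x & (x + msb)   */
        && (u16)(x | (u16)(x + msb)) == (u16)(x | msb)    /* x | (x + msb)   */
        && (u16)(x ^ (u16)(x + msb)) == msb               /* x ^ (x + msb)   */
        && (u16)(x | (x ^ y)) == (u16)(x | y)             /* x | (x ^ y)     */
        && (u16)(-x | 0xFFFE) == (u16)(x | 0xFFFE)        /* -x | 0xFFFE     */
        && (u16)((x & rest) - x) == (u16)(x & msb)        /* (x & rest) - x  */
        && (u16)((u16)(rest - x) ^ rest) == x             /* (rest - x)^rest */
        && (u16)~(x + c) == (u16)((u16)~c - x)            /* ~(x + c)        */
        && (u16)((x | c) - c) == (u16)(x & ~c)            /* (x | c) - c     */
        && (u16)~(c - x) == (u16)(x + (u16)~c);           /* ~(c - x)        */
}

/* Try every 16-bit x against a handful of sample values for y and c and
   return how many combinations violate at least one identity.  */
unsigned long count_identity_failures(void)
{
    static const u16 samples[] = {
        0, 1, 0x00FF, 0x1234, 0x7FFF, 0x8000, 0xFFFE, 0xFFFF
    };
    const size_t n = sizeof samples / sizeof samples[0];
    unsigned long failures = 0;

    for (uint32_t x = 0; x <= 0xFFFF; x++)
        for (size_t i = 0; i < n; i++)
            for (size_t j = 0; j < n; j++)
                if (!identities_hold((u16)x, samples[i], samples[j]))
                    failures++;
    return failures;
}
```

A result of zero from count_identity_failures() only says the identities hold at this width for the sampled constants; it says nothing about whether the match.pd patterns themselves are written correctly.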

[Bug tree-optimization/82854] more missing simplifcations

2017-11-05 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82854

--- Comment #1 from Andi Kleen  ---
Also I suppose a lot of them could be generalized to 8/16/64bit.

[Bug middle-end/82853] Optimize x % 3 == 0 without modulo

2017-11-05 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853

--- Comment #8 from Andi Kleen  ---
I'm not sure if it works with other numbers too.

(need to dig through Hacker's Delight & Matters Computational to see if they
have anything on it)

But it could be extended for other word lengths at least

BTW there are some other cases, will file a bug shortly on those too.

[Bug middle-end/82853] Optimize x % 3 == 0 without modulo

2017-11-05 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853

--- Comment #5 from Andi Kleen  ---
Also I'm not sure why you would want it in the middle end. It should all work
at the tree level

[Bug middle-end/82853] Optimize x % 3 == 0 without modulo

2017-11-05 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853

--- Comment #4 from Andi Kleen  ---
Right it's about special casing the complete expression

[Bug tree-optimization/82853] New: Optimize x % 3 == 0 without modulo

2017-11-05 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853

Bug ID: 82853
   Summary: Optimize x % 3 == 0 without modulo
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

Ralph Levien pointed out as part of FizzBuzz optimization:

Turns out you can compute x%3 == 0 with even fewer steps, it's (x*0xaaaaaaab)
< 0x55555556 (assuming wrapping unsigned 32 bit arithmetic).

gcc currently generates the full modulo and then checks.

Could be done in match.pd I suppose.

Test case

unsigned mod3(unsigned a) { return 0==(a%3); }
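
For reference, a minimal sketch of the multiply-and-compare form with the full-width 32-bit constants: 0xaaaaaaab is the multiplicative inverse of 3 modulo 2^32, and 0x55555556 is one more than the largest quotient 0xffffffff / 3. The function names are made up for this example; count_mismatches() is only a brute-force cross-check against the plain modulo.

```c
#include <stdint.h>

/* x is a multiple of 3 exactly when x * inverse(3), computed with 32-bit
   wraparound, lands in the range of valid quotients [0, 0x55555555].  */
int is_multiple_of_3(uint32_t x)
{
    return (uint32_t)(x * UINT32_C(0xaaaaaaab)) < UINT32_C(0x55555556);
}

/* Count the values in [lo, hi] (requires lo <= hi) for which the trick
   disagrees with x % 3 == 0.  */
uint32_t count_mismatches(uint32_t lo, uint32_t hi)
{
    uint32_t bad = 0;

    for (uint32_t x = lo; ; x++) {
        if (is_multiple_of_3(x) != (x % 3 == 0))
            bad++;
        if (x == hi)            /* test here so hi == UINT32_MAX terminates */
            break;
    }
    return bad;
}
```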

[Bug other/82784] Remove semicolon after "do {} while (0)" macros

2017-11-04 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82784

Andi Kleen  changed:

   What|Removed |Added

 CC||andi-gcc at firstfloor dot org

--- Comment #5 from Andi Kleen  ---
Sounds like a good candidate for a new warning

[Bug c/82013] New: better error message for missing semicolon in prototype

2017-08-28 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82013

Bug ID: 82013
   Summary: better error message for missing semicolon in
prototype
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

gcc gives quite poor error messages when a semicolon is forgotten after a
prototype (a common mistake when cut'n'pasting a function definition into a
header).

It's especially confusing when the prototype is the last in the include file,
because then the errors appear in another file.

As a minimum it should warn about a missing semicolon at the end of a file.

Possibly this could be also used for fix-it, but that's likely more
complicated.

[Bug target/80742] New: attribute target no- does not work

2017-05-14 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80742

Bug ID: 80742
   Summary: attribute target no- does not work
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

Disabling ISAs with attribute target doesn't seem to work on x86_64

e.g. 

typedef float __m128 __attribute__ ((vector_size (16)));

__attribute__((target("no-sse2"))) __m128 func (__m128 x, __m128 y)
{
__m128 xmm0 = x, xmm1 = y, xmm2;
xmm0 = __builtin_ia32_xorps (xmm1, xmm1);
return xmm0;
}

does not error out.

[Bug testsuite/79067] gcc.dg/tree-prof/cold_partition_label.c runs a million times longer than it used to and times out

2017-05-10 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79067

--- Comment #3 from Andi Kleen  ---
sandra,

does this patch fix it?

diff --git a/gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c
b/gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c
index 6214e3629f2..924a270e1bd 100644
--- a/gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c
+++ b/gcc/testsuite/gcc.dg/tree-prof/cold_partition_label.c
@@ -2,6 +2,7 @@
gets a label.  */
 /* { dg-require-effective-target freorder } */
 /* { dg-options "-O2 -freorder-blocks-and-partition -save-temps" } */
+/* { dg-require-profiling "-fprofile-generate" } */

 #define SIZE 1

[Bug testsuite/79067] gcc.dg/tree-prof/cold_partition_label.c runs a million times longer than it used to and times out

2017-05-10 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79067

--- Comment #2 from Andi Kleen  ---
There's a separate fix for the random failures (or, as a workaround, increase
/proc/sys/kernel/perf_event_mlock_kb), see PR 77684.

Not running the test on systems without FDO seems best. I don't think it does
anything useful there anyways.

[Bug testsuite/77684] many tree-prof testsuite failures in parallel make check

2017-05-10 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77684

--- Comment #5 from Andi Kleen  ---
Created attachment 41337
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41337&action=edit
limit perf buffer size

This patch allows parallelism up to 16 with the default setting.
Currently testing.

[Bug testsuite/77684] many tree-prof testsuite failures in parallel make check

2017-05-05 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77684

--- Comment #4 from Andi Kleen  ---
Thanks for tracing that down. 

So perf runs out of memory for the locked trace buffers

Increasing the limit is a good workaround
ulimit -l may also work, but also needs root.

We could just pass a smaller -m value to perf

Does it work when you change the last line in config/i386/gcc-auto-profile
to add -m 128k 

(or possibly other values, have to be power of two)

[Bug c/80378] Extend alloc_size attribute for better Linux kernel checking

2017-04-24 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80378

--- Comment #8 from Andi Kleen  ---
__builtin_constant_p does not cover variable range information, which is what
we're looking for here to prevent security bugs.

Also, in my experience these explicit expressions tend to be somewhat fragile
and are not well specified.  They have to assume that the optimizer does
specific operations which are nowhere guaranteed.

An explicit builtin could be much more tightly defined.

[Bug c/80378] Extend alloc_size attribute for better Linux kernel checking

2017-04-24 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80378

--- Comment #6 from Andi Kleen  ---
In the kernel there is also an upper limit on allocations.

Perhaps just a generic assert builtin that:
- uses value range information
- uses constant propagation
- is a nop when the compiler doesn't have either of these available
- otherwise warns at build time

__builtin_compile_assert(size >= 0 && size < MAX_ALLOC_SIZE);

[Bug c/80378] Extend alloc_size attribute for better Linux kernel checking

2017-04-24 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80378

--- Comment #4 from Andi Kleen  ---
I tested it now and the inline trick doesn't work. Here's a test case

extern void *do_alloc(int a, int b);

static inline __attribute__((alloc_size(1))) void check_alloc_size(int size)
{
}

static inline void *alloc(int a, int b)
{
check_alloc_size(a + b);
return do_alloc(a, b);
}

void func(void)
{
alloc(-1, 0);
}

[Bug c/80378] Extend alloc_size attribute for better Linux kernel checking

2017-04-09 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80378

--- Comment #3 from Andi Kleen  ---
Hmm, that trick may work for the shift too. Let me try.

[Bug c/80378] Extend alloc_size attribute for better Linux kernel checking

2017-04-09 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80378

--- Comment #1 from Andi Kleen  ---
Small correction: argument 4 would need to be a constant for the shifted-by case.

[Bug lto/80379] New: Redundant note: code may be misoptimized unless -fno-strict-aliasing is used

2017-04-09 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80379

Bug ID: 80379
   Summary: Redundant  note: code may be misoptimized unless
-fno-strict-aliasing is used
   Product: gcc
   Version: 6.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: lto
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

I get an extra

 note: code may be misoptimized unless -fno-strict-aliasing is used

note for type mismatches in LTO builds. But -fno-strict-aliasing is already
set. In this case the extra note is pointless and should be suppressed.

[Bug c/80378] New: Extend alloc_size attribute for better Linux kernel checking

2017-04-09 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80378

Bug ID: 80378
   Summary: Extend alloc_size attribute for better Linux kernel
checking
   Product: gcc
   Version: 5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

I've been adding alloc_size attributes to the Linux kernel allocators.

However there are some allocator patterns that can currently not be correctly
described. It would be nice if the attribute could be extended with more
parameters to handle this.

One is 

void *alloc(int size_a, int size_b)

where the allocation size is size_a + size_b

The other is

void *alloc_order(int order)

where the allocation size is constant << order

This could be handled by two extra parameters to alloc_size, one to give a sum
argument and another to give a shifted-by argument. Arguments 2 and 3 would
also need to support an "ignore" value (e.g. -1).
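
For context, here is a minimal sketch of what the existing attribute can and cannot express. The one- and two-argument forms of alloc_size are documented GCC behavior; the wrapper names and the BASE_SIZE constant are invented for this example.

```c
#include <stdlib.h>

#define BASE_SIZE 4096u   /* hypothetical base unit for the order allocator */

/* Expressible today: the size is argument 1.  */
__attribute__((alloc_size(1)))
void *alloc_bytes(size_t size)
{
    return malloc(size);
}

/* Expressible today: the size is argument 1 multiplied by argument 2.  */
__attribute__((alloc_size(1, 2)))
void *alloc_array(size_t count, size_t elem_size)
{
    return calloc(count, elem_size);
}

/* Not expressible: the size is size_a + size_b, so the function has to
   stay unannotated and the compiler learns nothing about the result.  */
void *alloc_sum(size_t size_a, size_t size_b)
{
    return malloc(size_a + size_b);
}

/* Not expressible either: the size is a constant shifted by ORDER.  */
void *alloc_order(unsigned order)
{
    return malloc((size_t)BASE_SIZE << order);
}
```

The proposed extra parameters would cover the last two cases, which today can only be described by leaving the attribute off.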

[Bug lto/60016] gcc-nm does not report static symbols

2016-09-12 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60016

--- Comment #2 from Andi Kleen  ---
This is needed, for example, to generate backtraces if the symbol table should
be built in instead of read from the binary.

The Linux kernel cannot read its own binary, so the symbol table has to be built
in.

[Bug gcov-profile/71672] New: inlining indirect calls does not work with autofdo

2016-06-27 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71672

Bug ID: 71672
   Summary: inlining indirect calls does not work with autofdo
   Product: gcc
   Version: 7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: gcov-profile
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

The current mainline version of autofdo doesn't inline indirect calls based on
profiling data.

I instrumented a bootstrap and it never triggers.

gcc.dg/tree-prof/indir-call-prof.c

also fails (needs the patch kit in
https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01786.html applied first). 

I did some debugging and it seems to give up in update_inlined_ind_target()
here

  /* Program behavior changed, original promoted (and inlined) target is not
     hot any more. Will avoid promote the original target.

     To check if original promoted target is still hot, we check the total
     count of the unpromoted targets (stored in old_info). If it is no less
     than half of the callsite count (stored in INFO), the original promoted
     target is considered not hot any more.  */
  if (total >= info->count / 2)

but even with the test commented out it doesn't work.

[Bug target/71659] New: _xgetbv intrinsic missing

2016-06-25 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71659

Bug ID: 71659
   Summary: _xgetbv intrinsic missing
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

icc and Microsoft have a _xgetbv intrinsic for the XGETBV instruction, which is
needed to check if AVX or MPX are supported by the kernel.

gcc is missing an intrinsic for that, so everyone has to write inline
assembler. Should add one.
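
For illustration, this is roughly the kind of inline assembler people end up writing in the absence of an intrinsic. It is a minimal sketch, not a reference implementation: the function names are invented, it relies on the __get_cpuid helper from GCC's cpuid.h, and on non-x86 targets it simply reports no AVX support.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>

/* Read extended control register 0 (XCR0) with XGETBV.  Only safe to call
   once CPUID has reported OSXSAVE, otherwise the instruction faults.  */
static uint64_t read_xcr0(void)
{
    uint32_t eax, edx;

    __asm__ volatile ("xgetbv" : "=a" (eax), "=d" (edx) : "c" (0));
    return ((uint64_t)edx << 32) | eax;
}

/* Return 1 if the CPU has AVX and the OS saves/restores the AVX state.  */
int os_supports_avx(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    if (!(ecx & (1u << 27)) || !(ecx & (1u << 28)))   /* OSXSAVE, AVX */
        return 0;
    return (read_xcr0() & 0x6) == 0x6;                /* XMM and YMM state */
}
#else
int os_supports_avx(void)
{
    return 0;
}
#endif
```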

[Bug c/70618] New: better error messages for missing/too many arguments

2016-04-10 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70618

Bug ID: 70618
   Summary: better error messages for missing/too many arguments
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

When doing API refactorings it is reasonably common to have too many or not
enough arguments in function calls. The existing errors in gcc/g++ are not very
good for that; I get at least two consecutive ones and they are not very clear.

Since that is common it would be much better if the compiler could compute the
minimum edit distance to the real prototype (or the nearest for C++) and then
directly suggest which arguments are missing or which are too many.

void foo(int *xp, float *yp, double *zp)
{
}

int x;
float y;
double z;
short k;

void f2(void)
{
foo(, );/* forgot x */
foo(, );/* forgot y */
foo(, );/* forgot z */
foo();/* forgot y and z */
foo();/* forgot x and y*/

foo(, , , );/* x too many at end */
foo(, , , );/* x too many at start */
foo(, , , );/* y too much in the middle */
foo(, , , );/* different y in middle */
foo(, , , );/* different x at start */
foo(, , , );/* different x at end */
}
gcc/tsrc/tmissing.c: In function ‘f2’:
gcc/tsrc/tmissing.c:14:6: warning: passing argument 1 of ‘foo’ from
incompatible pointer type [-Wincompatible-pointer-types]
  foo(, );  /* forgot x */
  ^
gcc/tsrc/tmissing.c:3:6: note: expected ‘int *’ but argument is of type ‘float
*’
 void foo(int *xp, float *yp, double *zp)
  ^
gcc/tsrc/tmissing.c:14:10: warning: passing argument 2 of ‘foo’ from
incompatible pointer type [-Wincompatible-pointer-types]
  foo(, );  /* forgot x */
  ^
gcc/tsrc/tmissing.c:3:6: note: expected ‘float *’ but argument is of type
‘double *’
 void foo(int *xp, float *yp, double *zp)
  ^
gcc/tsrc/tmissing.c:14:2: error: too few arguments to function ‘foo’
  foo(, );  /* forgot x */
  ^
gcc/tsrc/tmissing.c:3:6: note: declared here
 void foo(int *xp, float *yp, double *zp)
  ^
gcc/tsrc/tmissing.c:15:10: warning: passing argument 2 of ‘foo’ from
incompatible pointer type [-Wincompatible-pointer-types]
  foo(, ); /* forgot y */
  ^
gcc/tsrc/tmissing.c:3:6: note: expected ‘float *’ but argument is of type
‘double *’
 void foo(int *xp, float *yp, double *zp)
  ^
gcc/tsrc/tmissing.c:15:2: error: too few arguments to function ‘foo’
  foo(, ); /* forgot y */
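
As a rough idea of what the suggested minimum-edit-distance computation could look like, here is a small sketch that treats the argument types at a call site and the parameter types of the prototype as two sequences of type names and computes the Levenshtein distance between them. Representing types as strings, the MAX_ARGS limit and the function name are simplifications made up for this example; a real implementation would compare compiler type nodes and also record which edits were chosen.

```c
#include <stddef.h>
#include <string.h>

#define MAX_ARGS 16

/* Minimum number of insertions (missing argument), deletions (extra
   argument) and substitutions (wrong type) needed to turn the ARGS type
   list into the PARAMS type list.  Returns -1 if either list is longer
   than MAX_ARGS.  */
int arg_edit_distance(const char *const args[], size_t nargs,
                      const char *const params[], size_t nparams)
{
    int d[MAX_ARGS + 1][MAX_ARGS + 1];

    if (nargs > MAX_ARGS || nparams > MAX_ARGS)
        return -1;

    for (size_t i = 0; i <= nargs; i++)
        d[i][0] = (int)i;                     /* drop all remaining args */
    for (size_t j = 0; j <= nparams; j++)
        d[0][j] = (int)j;                     /* all params are missing */

    for (size_t i = 1; i <= nargs; i++) {
        for (size_t j = 1; j <= nparams; j++) {
            int subst = d[i - 1][j - 1]
                        + (strcmp(args[i - 1], params[j - 1]) != 0);
            int extra = d[i - 1][j] + 1;      /* argument i is superfluous */
            int missing = d[i][j - 1] + 1;    /* parameter j has no argument */
            int best = subst;

            if (extra < best)
                best = extra;
            if (missing < best)
                best = missing;
            d[i][j] = best;
        }
    }
    return d[nargs][nparams];
}
```

For the first call in the test case, where x was forgotten, the argument types are float * and double * against int *, float *, double *, giving a distance of 1: a single "missing first argument" diagnostic instead of two type warnings plus an error.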

[Bug tree-optimization/70427] autofdo bootstrap generates wrong code

2016-03-27 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70427

--- Comment #3 from Andi Kleen  ---

After analyzing the code further, it looks like the compiler generates it
correctly; the edge returned should not be 0 here.

[Bug tree-optimization/70427] autofdo bootstrap generates wrong code

2016-03-27 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70427

--- Comment #2 from Andi Kleen  ---
Created attachment 38110
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38110&action=edit
somewhat reduced input file, only single function

[Bug tree-optimization/70427] autofdo bootstrap generates wrong code

2016-03-27 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70427

--- Comment #1 from Andi Kleen  ---
Created attachment 38109
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38109&action=edit
ipa-profile input

Here's the source of the miscompiled file from the compiler

cc1plus -O2 ipa-profile.i  -S

unfortunately have to inspect assembler to see the miscompilation:

look for ipa_generate_profile_summary

then look for get_edge

        call    _ZN11cgraph_node8get_edgeEP6gimple
        testq   %rax, %rax
        movq    %rax, %r15
        je      .L836            <--- jump if rax/r15 is 0
        testb   $2, 96(%rax)
        je      .L837
.L836:                           <--- it can be here
        movq    16(%r12), %rax
        movq    64(%r15), %rsi   <--- BAD

same miscompilation here (just with another register). r15 is referenced after
being tested for NULL.

[Bug tree-optimization/70427] New: autofdo bootstrap generates wrong code

2016-03-27 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70427

Bug ID: 70427
   Summary: autofdo bootstrap generates wrong code
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

I've been working on building gcc with an autofdo bootstrap.

Currently I always run into a crash while rebuilding tree.c with the stage2
compiler and the autofdo information.

Looking at the code it is clearly miscompiled in ipa_profile_generate_summary:

struct cgraph_edge * e = node->get_edge (stmt);
if (e && !e->indirect_unknown_callee)
  continue;


   0x0093bb16 <+326>:   callq  0x7be530 <_ZN11cgraph_node8get_edgeEP6gimple>
   0x0093bb1b <+331>:   test   %rax,%rax          # check for NULL
   0x0093bb1e <+334>:   mov    %rax,%r8
   0x0093bb21 <+337>:   je     0x93bb2d <_ZL28ipa_profile_generate_summaryv+349>
   0x0093bb23 <+339>:   testb  $0x2,0x60(%rax)
   0x0093bb27 <+343>:   je     0x93baa7 <_ZL28ipa_profile_generate_summaryv+215>
   0x0093bb2d <+349>:   mov    0x10(%r13),%rax    # go here because of NULL
=> 0x0093bb31 <+353>:   mov    0x40(%r8),%rsi     # but we still reference!

(gdb) p $r8
$4 = 0

The crash is on bb31 because r8 is NULL. The code checked the return value of
the call, but then references it afterwards before doing the continue.

Command line option:

cc1plus -fauto-profile=cc1plus.fda  -g -O2 tree.i

cc1plus.fda is at http://halobates.de/cc1plus.fda (too big to attach)

[Bug c/28901] -Wunused-variable ignores unused const initialised variables

2015-11-30 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=28901

--- Comment #17 from Andi Kleen  ---
There were a few false or useless ones (e.g. related to macros and specific
build configs).  I didn't look through them all, but various were semi
legitimate, but also very minor (small) so fixing it won't help much. I think
one or two of the ones I looked at may have been real bugs.

I still think the warning should not be in -Wall. A thousand-plus warnings in
real projects is just not acceptable.

[Bug c/28901] -Wunused-variable ignores unused const initialised variables

2015-11-29 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=28901

Andi Kleen  changed:

   What|Removed |Added

 CC||andi-gcc at firstfloor dot org

--- Comment #14 from Andi Kleen  ---
I'm building a current Linux kernel with allyesconfig, and this new warning
causes 1383(!) new warnings in the build.

I think this should be revisited and the warning be turned off again.

[Bug target/68602] New: i386: -mtune/arch options not all output by -v --help

2015-11-28 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68602

Bug ID: 68602
   Summary: i386: -mtune/arch options not all output by -v --help
   Product: gcc
   Version: 5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

gcc -v --help does not output all the possible options for -mtune=/-march=

For example corei7-avx is missing for arch, which is Sandy Bridge. tune is also
missing all cpu names.



  -march=CPU[,+EXTENSION...]
  generate code for CPU and EXTENSION, CPU is one of:
   generic32, generic64, i386, i486, i586, i686,
   pentium, pentiumpro, pentiumii, pentiumiii, pentium4,
   prescott, nocona, core, core2, corei7, l1om, k1om,
   k6, k6_2, athlon, opteron, k8, amdfam10, bdver1,
   bdver2, bdver3, bdver4, btver1, btver2
  EXTENSION is combination of:
   8087, 287, 387, no87, mmx, nommx, sse, sse2, sse3,
   ssse3, sse4.1, sse4.2, sse4, nosse, avx, avx2,
   avx512f, avx512cd, avx512er, avx512pf, avx512dq,
   avx512bw, avx512vl, noavx, vmx, vmfunc, smx, xsave,
   xsaveopt, xsavec, xsaves, aes, pclmul, fsgsbase,
   rdrnd, f16c, bmi2, fma, fma4, xop, lwp, movbe, cx16,
   ept, lzcnt, hle, rtm, invpcid, clflush, nop, syscall,
   rdtscp, 3dnow, 3dnowa, padlock, svme, sse4a, abm,
   bmi, tbm, adx, rdseed, prfchw, smap, mpx, sha,
   clflushopt, prefetchwt1, se1, clwb, pcommit,
   avx512ifma, avx512vbmi
  -mtune=CPU  optimize for CPU, CPU is one of:
   generic32, generic64, i8086, i186, i286, i386, i486,
   i586, i686, pentium, pentiumpro, pentiumii,
   pentiumiii, pentium4, prescott, nocona, core, core2,
   corei7, l1om, k1om, k6, k6_2, athlon, opteron, k8,
   amdfam10, bdver1, bdver2, bdver3, bdver4, btver1,
   btver2

[Bug lto/66229] LTO fails with -fauto-profile on mcf

2015-11-28 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66229

--- Comment #2 from Andi Kleen  ---
Some analysis of the problem:

At the time cc1 is streaming out profile_data it is not set to anything in
autofdo. So the LTO files contain all 0 profile data, which later causes the
ICE here.

Seems to be some kind of ordering problem.

Strangely the autofdo pass gets executed in the frontend run, but for unknown
reasons the profile data doesn't survive until the LTO data is written.

[Bug bootstrap/60946] Current 4.9 branch does not boot strap on FC20 with systemtap-devel installed

2015-09-28 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60946

Andi Kleen  changed:

   What|Removed |Added

 Status|WAITING |RESOLVED
 Resolution|--- |INVALID

--- Comment #10 from Andi Kleen  ---
Turned out to be a binutils issue with an old binutils

[Bug bootstrap/60946] Current 4.9 branch does not boot strap on FC20 with systemtap-devel installed

2015-09-25 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60946

--- Comment #9 from Andi Kleen  ---
Created attachment 36391
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36391&action=edit
workaround

This workaround fixes it: disable --gc-sections for libstdc++.

It seems like a linker bug. I opened a binutils bug report
https://sourceware.org/bugzilla/show_bug.cgi?id=19008

[Bug lto/50676] Partitioning may fail with presence of static variables referring to function labels

2015-07-18 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50676

--- Comment #6 from Andi Kleen andi-gcc at firstfloor dot org ---
The patch doesn't seem to be checked in yet. Is there a reason for that?

[Bug rtl-optimization/66890] function splitting only works with profile feedback

2015-07-17 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66890

--- Comment #4 from Andi Kleen andi-gcc at firstfloor dot org ---
Created attachment 36008
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36008&action=edit
Updated patch with documentation and param

I updated the patch with proper documentation and a param for the cut off.
In some tests it appears to do the right thing when building a Linux kernel.

[Bug rtl-optimization/66890] function splitting only works with profile feedback

2015-07-16 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66890

--- Comment #3 from Andi Kleen andi-gcc at firstfloor dot org ---
I suspect the patch may be too simple because it could get stuck in unlikely,
but high frequency edges in the cold area. Perhaps it needs to adapt more of
the code from the non-partitioning reordering.

[Bug rtl-optimization/66890] function splitting only works with profile feedback

2015-07-16 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66890

--- Comment #2 from Andi Kleen andi-gcc at firstfloor dot org ---
Created attachment 35993
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35993&action=edit
Potential patch


This patch fixes the problem for my simple test case. It adds a fallback path
to the partition check: if no profile information is available, only the edges
are checked, and every block whose incoming edges have a frequency of 20% or
less is considered cold.

20% is fairly arbitrary, likely needs tuning and should be a param. But it
seems to work for the test case.

Comments?
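
A rough sketch of the fallback idea, not the patch itself: the function name, the percent representation, and the reading that every incoming edge must be at or below the cutoff are assumptions made for illustration.

```c
#include <stddef.h>

/* Assumed cutoff in percent; the comment above calls 20% fairly arbitrary. */
#define COLD_EDGE_CUTOFF_PERCENT 20

/* edge_percent[i] is the estimated probability (0..100) of the i-th
   incoming edge.  Returns 1 if every incoming edge is at or below the
   cutoff, i.e. the block counts as cold in this sketch.  A block without
   incoming edges is not classified as cold here. */
int block_is_cold(const int *edge_percent, size_t n_edges)
{
    size_t i;

    if (n_edges == 0)
        return 0;
    for (i = 0; i < n_edges; i++)
        if (edge_percent[i] > COLD_EDGE_CUTOFF_PERCENT)
            return 0;
    return 1;
}
```

Making the cutoff a named constant mirrors the suggestion that the value should become a tunable param.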

[Bug rtl-optimization/66890] function splitting only works with profile feedback

2015-07-15 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66890

--- Comment #1 from Andi Kleen andi-gcc at firstfloor dot org ---

The problem seems to be that
bb-reorder.c:find_rarely_executed_basic_blocks_and_crossing_edges
returns no edges without profile feedback, which prevents generation of a
section split note.

[Bug rtl-optimization/66890] New: function splitting only works with profile feedback

2015-07-15 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66890

Bug ID: 66890
   Summary: function splitting only works with profile feedback
   Product: gcc
   Version: 5.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org
  Target Milestone: ---

Consider this simple example:

volatile int count;

int main()
{
    int i;
    for (i = 0; i < 10; i++) {
        if (i == 999)
            count *= 2;
        count++;
    }
}

The default "EQ is unlikely" heuristic in predict.* predicts that the
if (i == 999) is unlikely. So the tracer moves the count *= 2 basic block out
of line to save instruction cache.

gcc50 -O2 -S thotcold.c

        movl    $1, %edx
        jmp     .L2
        .p2align 4,,10
        .p2align 3
.L4:
        addl    $1, %edx
.L2:
        cmpl    $1000, %edx
        movl    count(%rip), %eax
        je      .L6
        addl    $1, %eax
        cmpl    $10, %edx
        movl    %eax, count(%rip)
        jne     .L4
        xorl    %eax, %eax
        ret
# out of line code
.L6:
        addl    %eax, %eax
        movl    %eax, count(%rip)
        movl    count(%rip), %eax
        addl    $1, %eax
        movl    %eax, count(%rip)
        jmp     .L4


Now if we enable -freorder-blocks-and-partition I would expect it to also be
put into .text.unlikely to give an even better cache layout. But that is not
what happens: it generates the same code.

Only when I use actual profile feedback and -freorder-blocks-and-partition does
the code actually end up in a separate section.

(it also unrolled the loop, so the code looks a bit different)

gcc -O2 -fprofile-generate -freorder-blocks-and-partition thotcold.c
./a.out 
gcc -O2 -fprofile-use -freorder-blocks-and-partition thotcold.c 
...
        .cfi_endproc
        .section        .text.unlikely
        .cfi_startproc
.L55:
        movl    count(%rip), %ecx
        addl    $1, %eax
        addl    $1, %ecx
        cmpl    $10, %eax
        movl    %ecx, count(%rip)
        je      .L6
        cmpl    $1, %edx
        je      .L5
        cmpl    $2, %edx
        je      .L28
        cmpl    $3, %edx


-freorder-blocks-and-partition should already use the extra section even
without profile feedback. 

I tested some larger programs and without profile feedback the unlikely section
is always empty.

The heuristics in predict.* often work quite well and a lot of code would
benefit from moving cold code out of the way of the caches.

This would allow using the option to improve frontend-bound code without
needing full profile feedback.

[Bug lto/61635] LTO partitioner does not handle label in statics

2015-03-29 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61635

--- Comment #7 from Andi Kleen andi-gcc at firstfloor dot org ---
Still happens with current trunk and with newer LTO Linux kernels (4.0-rc*)

[Bug bootstrap/60946] Current 4.9 branch does not boot strap on FC20 with systemtap-devel installed

2015-03-29 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60946

--- Comment #8 from Andi Kleen andi-gcc at firstfloor dot org ---
I still get that one with current trunk on my fedora 21 system.

[Bug c/65620] New: Incorrect warning for !! with -Wlogical-not-parentheses

2015-03-29 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65620

Bug ID: 65620
   Summary: Incorrect warning for !! with
-Wlogical-not-parentheses
   Product: gcc
   Version: 5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org

Created attachment 35172
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35172&action=edit
test case

When building the Linux 4.0-rc5 kernel with gcc 5.0 there are several imho
bogus warnings like

warning: logical not is only applied to the left hand side of comparison
[-Wlogical-not-parentheses]

for constructs like this:

  !!test_bit(...) != ...

The warning shouldn't warn for !! which is reasonably common. Looking at the
c/cp parsers there is already code to check for this, but it doesn't seem to
work here.

In the kernel case test_bit actually expands to a complex macro like

 if (usage->type == 0x01 && !!(__builtin_constant_p((usage->code)) ?
      constant_test_bit((usage->code), (input->key)) :
      variable_test_bit((usage->key),
                        (input->key)))

I'm attaching an (already delta'ed but still quite big) test case

C++ likely has the same problem (but not tested)
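
A small sketch of the idiom in question. raw_bit() is a hypothetical stand-in for test_bit() that returns the masked value rather than 0 or 1. The double negation normalizes that value before the comparison, which is why the pattern is intentional and differs from the `!a != b` mistake the warning is aimed at.

```c
/* Hypothetical stand-in for test_bit(): returns the raw masked value,
   which is nonzero but not necessarily 1 when the bit is set. */
unsigned long raw_bit(unsigned long word, int nr)
{
    return word & (1UL << nr);
}

/* !! collapses any nonzero value to 1, so both sides of != are 0/1
   values.  Returns 1 if the bit's current state differs from old_state. */
int bit_state_changed(unsigned long word, int nr, int old_state)
{
    return !!raw_bit(word, nr) != old_state;
}
```

Without the `!!`, comparing the raw value 8 against an old state of 1 would report a change even though the bit is still set.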

[Bug bootstrap/65621] New: boot strap with checking enabled ICEs

2015-03-29 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65621

Bug ID: 65621
   Summary: boot strap with checking enabled ICEs
   Product: gcc
   Version: 5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: bootstrap
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org

target: x86_64-linux

../../../../gcc/libstdc++-v3/libsupc++/tinfo.cc:82:1: internal compiler error:
in mark_functions_to_output, at cgraphunit.c:1307
 }
 ^
0xb25f0b mark_functions_to_output
../../gcc/gcc/cgraphunit.c:1302
0xb29137 symbol_table::compile()
../../gcc/gcc/cgraphunit.c:2330
0xb29313 symbol_table::finalize_compilation_unit()
../../gcc/gcc/cgraphunit.c:2444
0x884c9a cp_write_global_declarations()
../../gcc/gcc/cp/decl2.c:4755

[Bug bootstrap/65621] boot strap with checking enabled ICEs

2015-03-29 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65621

Andi Kleen andi-gcc at firstfloor dot org changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |INVALID

--- Comment #1 from Andi Kleen andi-gcc at firstfloor dot org ---
Never mind. Was caused by a local modification.

[Bug ipa/64963] IPA Cloning/Splitting does not copy function section attributes resulting in kernel miscompilation

2015-02-10 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64963

Andi Kleen andi-gcc at firstfloor dot org changed:

   What|Removed |Added

 CC||andi-gcc at firstfloor dot org

--- Comment #6 from Andi Kleen andi-gcc at firstfloor dot org ---
In theory the kernel could mark __init functions with noclone.

But I think sticky behavior would be better. That's the behavior that the
kernel expects. There isn't any code as far as I know that would expect only a
single function per section.

[Bug ipa/64963] [5 Regression] IPA Cloning/Splitting does not copy function section attributes resulting in kernel miscompilation

2015-02-10 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64963

--- Comment #10 from Andi Kleen andi-gcc at firstfloor dot org ---
Yes it has to be fixed. For example with the kernel __kprobes attribute it
could cause a real bug (__kprobes marks functions that cannot be safely
instrumented).

We shouldn't inline over different section names either, this could also cause
problems for the same reason.

[Bug tree-optimization/64130] New: vrp: handle non zero constant divided by range cannot be zero.

2014-11-30 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64130

Bug ID: 64130
   Summary: vrp: handle non zero constant divided by range cannot
be zero.
   Product: gcc
   Version: 5.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org

The following two functions should always be optimized to return 0
because x > 0, x / a cannot be 0. But VRP misses this case for unknown
reasons, even though it has some code for it in ranges_from_anti_range().

int fsigned(int a)
{
return 100 / a == 0;
}

int funsigned(unsigned a)
{
return 100 / a == 0;
}

gcc50 -fno-non-call-exceptions -O2 -S tvrpdiv.c

gcc version 5.0.0 2014 (experimental) (GCC) 

        movl    $100, %eax
        cltd
        idivl   %edi
        testl   %eax, %eax
        sete    %al
        movzbl  %al, %eax
        ret

        xorl    %edx, %edx
        movl    $100, %eax
        divl    %edi
        testl   %eax, %eax
        sete    %al
        movzbl  %al, %eax

[Bug tree-optimization/64130] vrp: handle non zero constant divided by range cannot be zero.

2014-11-30 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64130

--- Comment #3 from Andi Kleen andi-gcc at firstfloor dot org ---
You're right. I actually meant

x >= maxval(typeof(a)), x / a cannot be 0.

Corrected test case (assuming 64bit target):

#include <limits.h>

int fsigned(int a)
{
return 0x1fffL / a == 0;
}

int funsigned(unsigned a)
{
return 0x1fffL / a == 0;
}

> So this should be optimized to a > 100 instead.

Yes this would make sense too.
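
The suggested rewrite for the original constant 100 can be checked by brute force in the unsigned case. This is only a sanity-check sketch with a hypothetical function name. For the signed variant a single a > 100 test would not be equivalent, since negative divisors such as -200 also give a zero quotient.

```c
/* Returns 1 if (100 / a == 0) and (a > 100) agree for every unsigned
   divisor in 1..limit, 0 otherwise.  Zero is skipped because the division
   is undefined there.  limit must be smaller than UINT_MAX so the loop
   terminates. */
int division_matches_compare(unsigned limit)
{
    unsigned a;

    for (a = 1; a <= limit; a++)
        if ((100 / a == 0) != (a > 100))
            return 0;
    return 1;
}
```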

[Bug tree-optimization/63844] [4.8/4.9/5 Regression] open mp parallelization prevents vectorization

2014-11-18 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63844

--- Comment #12 from Andi Kleen andi-gcc at firstfloor dot org ---
Yes should have been omp parallel for
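
For clarity, a sketch of the first loop with the intended directive. The function name add_arrays is made up for this example; the arrays are the ones from the test case.

```c
#define N 1000
int a[N], b[N], c[N];

/* 'parallel for' divides the iterations among the threads; a bare
   'parallel' would make every thread run the whole loop.  Without
   -fopenmp the pragma is ignored and the loop simply runs serially. */
void add_arrays(void)
{
    int i;

#pragma omp parallel for num_threads(4)
    for (i = 0; i < N; i++)
        a[i] = b[i] + c[i];
}
```

Compile with gcc -O3 -fopenmp to get the parallelized version.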

[Bug tree-optimization/63844] [4.8/4.9/5 Regression] open mp parallelization prevents vectorization

2014-11-18 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63844

--- Comment #13 from Andi Kleen andi-gcc at firstfloor dot org ---

I think aggregate IPA-CP does that, IPA-SRA cannot as the function has
its address taken.

Perhaps that case (only passing address to gomp runtime) could be special cased
in the escape analysis.

[Bug bootstrap/63933] Build stage1 with -O2 during bootstrap if host compiler is a recent gcc version

2014-11-18 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63933

Andi Kleen andi-gcc at firstfloor dot org changed:

   What|Removed |Added

 CC||andi-gcc at firstfloor dot org

--- Comment #4 from Andi Kleen andi-gcc at firstfloor dot org ---
Perhaps using -Og (or -O1) if available?

I actually like to use unoptimized stage1 gcc to debug things with gdb.

The last time I checked the worst offenders were some of the C++ inlines not
getting inlined, and especially the new RTL code very heavily relies on that.
Perhaps just 

#define inline __attribute__((always_inline)) inline

for stage1 would be good enough to fix the worst.

[Bug tree-optimization/63844] open mp parallelization prevents vectorization

2014-11-17 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63844

--- Comment #2 from Andi Kleen andi-gcc at firstfloor dot org ---
Regression, doesn't happen on 4.8

[Bug tree-optimization/63844] [4.8/4.9/5 Regression] open mp parallelization prevents vectorization

2014-11-17 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63844

--- Comment #4 from Andi Kleen andi-gcc at firstfloor dot org ---
I had a typo in the test case (remove += to make the loops identical)

#define N 1000
int a[N], b[N], c[N];

main()
{

    int i;

#pragma omp parallel num_threads(4)
    for (i = 0; i < N; i++) {
        a[i] = b[i] + c[i];
    }
    for (i = 0; i < N; i++) {
        a[i] = b[i] + c[i];
    }
}

The case I saw vectorized on 4.8 (opensuse 13.1 compiler), but not on 5.0, was
slightly different, auto parallelized

#define N 1000
int a[N], b[N], c[N];

main()
{

    int i;
    for (i = 0; i < N; i++) {
        a[i] = b[i] + c[i];
    }
}
With -O3 -ftree-parallelize-loops=4.
I understand this will just internally generate OpenMP.

[Bug tree-optimization/63844] New: open mp parallelization prevents vectorization

2014-11-12 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63844

Bug ID: 63844
   Summary: open mp parallelization prevents vectorization
   Product: gcc
   Version: 4.9.3
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org

#define N 1000
int a[N], b[N], c[N];

main()
{

    int i;

#pragma omp parallel num_threads(4)
    for (i = 0; i < N; i++) {
        a[i] = b[i] + c[i];
    }
    for (i = 0; i < N; i++) {
        a[i] += b[i] + c[i];
    }
}


compiled with gcc -O3 -fopenmp

The first loop gets parallelized by openmp, the second loop gets vectorized.
But why does the parallelized loop not get vectorized too?

[Bug c/60804] Another CilkPlus ICE in gimplify_expr, at gimplify.c:8335

2014-11-11 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60804

Andi Kleen andi-gcc at firstfloor dot org changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #12 from Andi Kleen andi-gcc at firstfloor dot org ---
Should be all fixed now in mainline.

[Bug target/63672] New: xbegin/xend/xabort missing memory barriers

2014-10-28 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63672

Bug ID: 63672
   Summary: xbegin/xend/xabort missing memory barriers
   Product: gcc
   Version: 5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org

Created attachment 33835
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33835&action=edit
proposed patch adding barriers

No test case currently, but we got a report that the builtins for x86 RTM
xbegin/xend/xabort are missing implicit memory barriers. This can cause code to
be moved outside the critical sections, breaking the program.
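
To illustrate what a barrier buys here, a sketch using a plain compiler barrier around a region. It does not use the RTM builtins: region_begin() and region_end() are hypothetical stand-ins for the begin/end primitives, and the empty asm with a "memory" clobber is the usual GCC idiom for stopping the compiler from moving memory accesses across a point.

```c
/* Compiler-level barrier (GCC/Clang extended asm): emits no instruction,
   but the "memory" clobber stops the compiler from caching or moving
   memory accesses across it. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

static int in_region;     /* hypothetical stand-in for transaction state */
static int shared_value;

/* Stand-ins for the begin/end primitives; real RTM code would use the
   xbegin/xend builtins here instead of a flag. */
static void region_begin(void) { in_region = 1; compiler_barrier(); }
static void region_end(void)   { compiler_barrier(); in_region = 0; }

int store_in_region(int v)
{
    region_begin();
    shared_value = v;      /* must stay between the two barriers */
    region_end();
    return shared_value;
}
```

If the primitives carry no such barrier, the compiler is free to sink or hoist the store past them, which is the failure mode described in the report.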

[Bug middle-end/63556] New: gcc should dedup string postfixes

2014-10-16 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63556

Bug ID: 63556
   Summary: gcc should dedup string postfixes
   Product: gcc
   Version: 4.9.2
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org

With this code:

extern void func(char *a, char *b);

void f(void)
{
    func("abc", "xabc");
    func("abc", "abc");
}

we get:

.LC0:
        .string "xabc"
.LC1:
        .string "abc"

So the "abc"s get deduped. But it could also dedup the postfix by pointing
"abc" to "xabc" + 1. This would save some space.
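
Done by hand, the requested sharing looks like the sketch below. shared_abc() is a hypothetical name. It only shows that a pointer one byte into the longer literal is already a valid NUL-terminated "abc".

```c
#include <string.h>

static const char xabc[] = "xabc";

/* A pointer one byte into "xabc" is a complete "abc" string, so a second
   copy of the bytes is not strictly needed. */
const char *shared_abc(void)
{
    return xabc + 1;
}
```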

[Bug middle-end/63556] gcc should dedup string postfixes

2014-10-16 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63556

Andi Kleen andi-gcc at firstfloor dot org changed:

   What|Removed |Added

   Severity|normal  |enhancement

[Bug c/63543] New: incomplete type error should suppress duplicates

2014-10-15 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63543

Bug ID: 63543
   Summary: incomplete type error should suppress duplicates
   Product: gcc
   Version: 4.9.2
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org

For a test case like this

struct undefined;

int f(struct undefined *f)
{
    int x = f->a;
    return x + f->a + f->b;
}

tmissing-type.c: In function 'f':
tmissing-type.c:5:11: error: dereferencing pointer to incomplete type
  int x = f->a;
           ^
tmissing-type.c:6:14: error: dereferencing pointer to incomplete type
  return x + f->a + f->b;
              ^
tmissing-type.c:6:21: error: dereferencing pointer to incomplete type
  return x + f->a + f->b;


gcc outputs three different errors for each reference of the undefined type.
It would be better if it remembered that it already gave an error for
referencing that type and suppress the later errors (similar to undefined
symbols). This would avoid cascading errors.
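
One way the "remember what was already reported" idea could look, as a standalone sketch. The table, its size, and the function name are assumptions for illustration and are not taken from the GCC sources.

```c
#include <string.h>

#define MAX_REPORTED 16

static const char *reported[MAX_REPORTED];
static int n_reported;

/* Returns 1 the first time a type name is seen (the caller should print
   the error) and 0 on later calls (the caller should stay quiet).  Names
   must outlive the table; once the table is full, every new name is
   reported each time. */
int should_report_incomplete(const char *type_name)
{
    int i;

    for (i = 0; i < n_reported; i++)
        if (strcmp(reported[i], type_name) == 0)
            return 0;
    if (n_reported < MAX_REPORTED)
        reported[n_reported++] = type_name;
    return 1;
}
```

With such a check the test case above would print one error for struct undefined instead of three.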

[Bug lto/61969] [4.8/4.9/5 Regression] wrong code by LTO on i?86-linux-gnu (affecting trunk, 4.9.x, and 4.8.x)

2014-10-07 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61969

--- Comment #6 from Andi Kleen andi-gcc at firstfloor dot org ---
I looked at this a bit more. It's definitely the nrv pass that causes the
problem.

When I disable it in the source code the 32bit version compiles correctly.
I also tried disabling the next pass (cfgcleanup), but that didn't make a
difference.

It converts the local variable to be a value-expr.

It's still not exactly clear who deletes the variable declaration though.

There are two possibilities:
- nrv shouldn't convert the variable in the first place
- someone who messes with the variables forgets to check for value-exprs.

;; Function func_52 (func_52, funcdef_no=86, decl_uid=2858, cgraph_uid=54,
symbol_order=1152)

NRV Replaced: l_55  with: <retval>
func_52 (uint32_t p_53)
{
  extern const struct S0 l_55 = {.f0=4, .f1=40290, .f2=10, .f3=4} [value-expr:
<retval>];

  <bb 2>:
  return <retval>;

}

[Bug lto/61969] [4.8/4.9/5 Regression] wrong code by LTO on i?86-linux-gnu (affecting trunk, 4.9.x, and 4.8.x)

2014-10-07 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61969

--- Comment #9 from Andi Kleen andi-gcc at firstfloor dot org ---
Patch fixes the test case.

[Bug c/63462] [RFC] gcc should prevent from overwriting source file

2014-10-06 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63462

Andi Kleen andi-gcc at firstfloor dot org changed:

   What|Removed |Added

 CC||andi-gcc at firstfloor dot org

--- Comment #1 from Andi Kleen andi-gcc at firstfloor dot org ---
Agreed this would be a useful feature. Happened to me at least once too.

[Bug libstdc++/63466] New: sstream is very slow

2014-10-06 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63466

Bug ID: 63466
   Summary: sstream is very slow
   Product: gcc
   Version: 5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org

sstream is very slow. Comparing two simple programs that parse a stream with C
and with sstream. The sstream version is an order of magnitude slower.

gcc version 4.9.1 20140423 (prerelease) (GCC) 

# C++
% time ./a.out < testfile

real    0m0.893s
user    0m0.888s
sys     0m0.004s


# C
% time ./tstream-c < testfile

real    0m0.032s
user    0m0.030s
sys     0m0.001s

Here's a profile.

16.13%  a.out  libc-2.18.so         [.] _IO_getc
10.39%  a.out  libc-2.18.so         [.] _IO_ungetc
 9.15%  a.out  libstdc++.so.6.0.20  [.] std::basic_istream<char, std::char_traits<char> >& std::getline<char, std::char_traits<char>, std::allocator<char> >(std::basic_istream<char, std::char_traits<char> >&, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, char)
 7.87%  a.out  libstdc++.so.6.0.20  [.] __dynamic_cast
 4.99%  a.out  libc-2.18.so         [.] __GI___strcmp_ssse3
 3.95%  a.out  libstdc++.so.6.0.20  [.] std::basic_istream<char, std::char_traits<char> >& std::operator>><char, std::char_traits<char>, std::allocator<char> >(std::basic_istream<char, std::char_traits<char> >&, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&)
 3.89%  a.out  libc-2.18.so         [.] _int_free
 2.79%  a.out  libstdc++.so.6.0.20  [.] __cxxabiv1::__vmi_class_type_info::__do_dyncast(long, __cxxabiv1::__class_type_info::__sub_kind, __cxxabiv1::__class_type_info const*, void const*, __cxxabiv1::__class_type_info const*, void const*, __cxxabiv1::__class_type_info::__dyncast_result&) const
 2.65%  a.out  a.out                [.] main
 2.58%  a.out  libc-2.18.so         [.] malloc
 2.30%  a.out  libstdc++.so.6.0.20  [.] __cxxabiv1::__si_class_type_info::__do_dyncast(long, __cxxabiv1::__class_type_info::__sub_kind, __cxxabiv1::__class_type_info const*, void const*, __cxxabiv1::__class_type_info const*, void const*, __cxxabiv1::__class_type_info::__dyncast_result&) const
 1.96%  a.out  libc-2.18.so         [.] _int_malloc
 1.86%  a.out  libstdc++.so.6.0.20  [.] std::istream::sentry::sentry(std::istream&, bool)
 1.55%  a.out  libc-2.18.so         [.] _IO_sputbackc
 1.51%  a.out  libstdc++.so.6.0.20  [.] __gnu_cxx::stdio_sync_filebuf<char, std::char_traits<char> >::underflow()

Test case:

Generate test file:

 perl -e 'for($i=0;$i<100;$i++) { printf("%d %d\n", $i, $i); } ' > testfile

C++ version:

#include <iostream>
#include <string>
#include <sstream>

using namespace std;

void __attribute__((noinline, noclone)) func(string &, string &)
{
}

int main()
{
    string line;
    while (getline(cin, line)) {
        istringstream iss(line);
        string index, s;

        if (!(iss >> index >> s))
            continue;
        func(index, s);
    }
    return 0;
}

C version:

#define _GNU_SOURCE 1
#include <stdio.h>
#include <string.h>

void __attribute__((noinline, noclone)) func(char *a, char *b)
{
}

int main()
{
    char *line = NULL;
    size_t linelen = 0;
    while (getline(&line, &linelen, stdin) > 0) {
        char *p = line;
        char *a = strsep(&p, " \t\n");
        char *b = strsep(&p, " \t\n");
        func(a, b);
    }
    return 0;
}

[Bug tree-optimization/63467] New: should have asm statement that does not prevent vectorization

2014-10-06 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63467

Bug ID: 63467
   Summary: should have asm statement that does not prevent
vectorization
   Product: gcc
   Version: 5.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org

Currently any inline asm statement in a loop prevents vectorization, like

#define N 100
int a[N], b[N], c[N];

main()
{
    int i;
    for (i = 0; i < N; i++) {
        asm("");
        a[i] = b[i] + c[i];
    }
}

Without the asm the loop vectorizes fine.

This is a problem if you want to add markers into the loop body for static
assembler code analysis (for example with IACA,
https://software.intel.com/en-us/articles/intel-architecture-code-analyzer)

Should have some way to tell the compiler that a particular inline asm
statement does not have any side effects that prevent vectorization or other
loop transformations.

Perhaps an asm const ?

[Bug tree-optimization/63467] should have asm statement that does not prevent vectorization

2014-10-06 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63467

--- Comment #2 from Andi Kleen andi-gcc at firstfloor dot org ---
It's the same with asm("" :::);

At least the vectorizer bombs out on any asm.

[Bug tree-optimization/63467] should have asm statement that does not prevent vectorization

2014-10-06 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63467

--- Comment #6 from Andi Kleen andi-gcc at firstfloor dot org ---
For the marker case it's enough if it just stays in the same position in the
basic block and gets duplicated if the BB does too.

That's somewhat special semantics, that is why I think it would need some way
to annotate (asm const?)

Ok maybe Andrew's trick works, but it seems fragile. Would that work for other
loop transformations (like graphite) too?

[Bug libstdc++/63466] sstream is very slow

2014-10-06 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63466

--- Comment #2 from Andi Kleen andi-gcc at firstfloor dot org ---
Looking at the profile there's plenty of room for optimization. e.g. not using
getc/ungetc, but directly accessing the buffer, or maybe even some kind of
template specialization.

With the variables pulled out it's faster, but still a lot slower than C:

% time ./a.out < testfile
real    0m0.400s
user    0m0.397s
sys     0m0.002s
% time ./tstream-c < testfile

real    0m0.033s
user    0m0.028s
sys     0m0.004s

[Bug c/61898] Variadic functions accept va_list without warning

2014-10-04 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61898

--- Comment #4 from Andi Kleen andi-gcc at firstfloor dot org ---
The patch has several issues (making it currently fail bootstrap):
- it warns for vfprintf too (fixed)
- on i386 it gets confused between va_list * and char *, so something like

char *format;
char buf[100];

printf(format, buf)

warns too because the underlying types are the same.
Not sure about a good solution for this, need a new type attribute?
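
For context, the bug class behind this report is handing a va_list to a variadic function instead of its v* counterpart. A minimal sketch of the correct pattern follows. format_message() is a hypothetical wrapper, not something from the patch.

```c
#include <stdarg.h>
#include <stdio.h>

/* Formats into buf using a va_list.  vsnprintf is the variant that
   accepts a va_list; passing ap to snprintf or printf instead would be
   the kind of mistake the warning is meant to catch. */
int format_message(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(buf, size, fmt, ap);
    va_end(ap);
    return n;
}
```

The return value is the length vsnprintf would have written, so callers can detect truncation by comparing it against the buffer size.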

[Bug c/63450] Optimizing -O3 generates rep ret on an almost empty function

2014-10-03 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63450

Andi Kleen andi-gcc at firstfloor dot org changed:

   What|Removed |Added

 CC||andi-gcc at firstfloor dot org

--- Comment #2 from Andi Kleen andi-gcc at firstfloor dot org ---
This is a feature in -mtune=generic because it helps branch prediction in some
older AMD CPUs. If you're optimizing for Atom you'll get even more padding due
to other reasons. Optimizing e.g. for nehalem should avoid it.

[Bug c/61898] Variadic functions accept va_list without warning

2014-09-30 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61898

Andi Kleen andi-gcc at firstfloor dot org changed:

   What|Removed |Added

 CC||andi-gcc at firstfloor dot org

--- Comment #3 from Andi Kleen andi-gcc at firstfloor dot org ---
Created attachment 33633
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33633&action=edit
Proposed patch

This patch implements the warning for the non constant format case.

Not done for passing va_list to a real format, but I assume that is rare and in
most cases caught by the normal type checking.

Let me know if it works.

[Bug lto/61969] [4.8/4.9/5 Regression] wrong code by LTO on i?86-linux-gnu (affecting trunk, 4.9.x, and 4.8.x)

2014-09-28 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61969

Andi Kleen andi-gcc at firstfloor dot org changed:

   What|Removed |Added

 CC||andi-gcc at firstfloor dot org

--- Comment #3 from Andi Kleen andi-gcc at firstfloor dot org ---
I did some experiments. I can reproduce it with trunk for 32bit.

The interesting part is that the printed value seems to be uninitialized on the
stack and changes on every run. A valgrind run gives:


==23130== Use of uninitialised value of size 4
==23130==    at 0x40B102B: _itoa_word (in /lib/libc-2.18.so)
==23130==    by 0x40B474A: vfprintf (in /lib/libc-2.18.so)
==23130==    by 0x40BAFCE: printf (in /lib/libc-2.18.so)
==23130==    by 0x40879D2: (below main) (in /lib/libc-2.18.so)
==23130==  Uninitialised value was created by a stack allocation
==23130==    at 0x80482F4: main (in /home/andi/Downloads/pr61969/t)
==23130== 
... more warnings like this ...

[Bug lto/61969] [4.8/4.9/5 Regression] wrong code by LTO on i?86-linux-gnu (affecting trunk, 4.9.x, and 4.8.x)

2014-09-28 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61969

--- Comment #4 from Andi Kleen andi-gcc at firstfloor dot org ---

The problem is when returning a struct from func_52:

const struct S0 func_52 (uint32_t p_53)
{
  const struct S0 l_55 = { 4, 40290, 10, 4 };
  return l_55;
}

The main code stores the struct value from the stack into the global variable
and eventually prints it

 80482f4:   83 ec 38                sub    $0x38,%esp
 80482f7:   0f b6 15 4c da 04 08    movzbl 0x804da4c,%edx
 80482fe:   8b 1d 20 d1 04 08       mov    0x804d120,%ebx
 8048304:   0f b6 35 70 da 04 08    movzbl 0x804da70,%esi
 804830b:   e8 b0 0c 00 00          call   8048fc0 <func_52>
 8048310:   0f b7 45 d2             movzwl -0x2e(%ebp),%eax
 8048314:   ba 01 00 00 00          mov    $0x1,%edx
 8048319:   c7 05 20 d1 04 08 48    movl   $0x804da48,0x804d120
 8048320:   da 04 08 
 8048323:   66 a3 5c da 04 08       mov    %ax,0x804da5c


But func_52 has been completely optimized away and puts nothing onto the stack:


08048fc0 <func_52>:
 8048fc0:   f3 c3                   repz ret 
 8048fc2:   8d b4 26 00 00 00 00    lea    0x0(%esi,%eiz,1),%esi
 8048fc9:   8d bc 27 00 00 00 00    lea    0x0(%edi,%eiz,1),%edi

So the value is random stack garbage.
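
A reduced sketch of the pattern, for anyone trying to reproduce it by hand.
Only func_52 and its initializer come from the test case; the field types of
struct S0, the global g_result and the helper store_result are assumptions
made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Field types are assumed; the report only shows the initializer values. */
struct S0 {
    int32_t  f0;
    uint16_t f1;
    int32_t  f2;
    int32_t  f3;
};

/* Hypothetical global standing in for the one the test case prints. */
struct S0 g_result;

const struct S0 func_52(uint32_t p_53)
{
    const struct S0 l_55 = { 4, 40290, 10, 4 };
    (void)p_53;
    return l_55;   /* returned through the caller-provided return slot */
}

void store_result(void)
{
    /* The caller copies the returned struct from its stack slot. */
    g_result = func_52(0);
    /* With a correct compiler the copied value is the initializer. */
    assert(g_result.f1 == 40290);
}
```

If the callee body is dropped as in the disassembly above, nothing writes the
return slot and the assertion would be checking whatever was on the stack.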

[Bug lto/61969] [4.8/4.9/5 Regression] wrong code by LTO on i?86-linux-gnu (affecting trunk, 4.9.x, and 4.8.x)

2014-09-28 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61969

--- Comment #5 from Andi Kleen andi-gcc at firstfloor dot org ---
func_52 disappears during/after nrv:

in 173t.nrv:

;; Function func_52 (func_52, funcdef_no=86, decl_uid=2858, cgraph_uid=54,
symbol_order=1152)

func_52 (uint32_t p_53)
{
  extern const struct S0 l_55 = {.f0=4, .f1=40290, .f2=10, .f3=4} [value-expr:
retval];

  <bb 2>:
  return retval;

}

in 174t.optimized

;; Function func_52 (func_52, funcdef_no=86, decl_uid=2858, cgraph_uid=54,
symbol_order=1152)

func_52 (uint32_t p_53)
{
  <bb 2>:
  return retval;

}

[Bug rtl-optimization/61605] Potential optimization: Keep unclobbered argument registers live across function calls

2014-09-28 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61605

Andi Kleen andi-gcc at firstfloor dot org changed:

   What|Removed |Added

 CC||andi-gcc at firstfloor dot org,
   ||tom at codesourcery dot com

--- Comment #1 from Andi Kleen andi-gcc at firstfloor dot org ---
This is in theory implemented in mainline with -fuse-caller-save.
It doesn't seem to work for me though. I also didn't see the option doing
anything on a larger program.

[Bug rtl-optimization/61605] Potential optimization: Keep unclobbered argument registers live across function calls

2014-09-28 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61605

--- Comment #3 from Andi Kleen andi-gcc at firstfloor dot org ---
It was supposed to be enabled with 

Date:   Fri May 30 11:39:49 2014 +

-fuse-caller-save - Enable for i386

2014-05-30  Tom de Vries  t...@codesourcery.com

* config/i386/i386.c (TARGET_CALL_FUSAGE_CONTAINS_NON_CALLEE_CLOBBERS):
Redefine as true.

* gcc.target/i386/fuse-caller-save.c: New test.
* gcc.dg/ira-shrinkwrap-prep-1.c: Run with -fno-use-caller-save.
* gcc.dg/ira-shrinkwrap-prep-2.c: Same.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@211078
138bc75d-0d04-0410-961f-82ee72b054a4

[Bug rtl-optimization/63384] selective scheduling on x86 takes very long

2014-09-28 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63384

--- Comment #3 from Andi Kleen andi-gcc at firstfloor dot org ---
It loops (forever?) on this in sched2:


Scheduling on fences: (uid:28;seqno:7;) 
Fence 28[2] has not changed

Scheduling on fences: (uid:28;seqno:7;) 
Fence 28[2] has not changed

Scheduling on fences: (uid:28;seqno:7;) 
Fence 28[2] has not changed

Scheduling on fences: (uid:28;seqno:7;) 
Fence 28[2] has not changed

Scheduling on fences: (uid:28;seqno:7;) 
Fence 28[2] has not changed

Scheduling on fences: (uid:28;seqno:7;) 
Fence 28[2] has not changed

Scheduling on fences: (uid:28;seqno:7;) 
Fence 28[2] has not changed

Scheduling on fences: (uid:28;seqno:7;) 
Fence 28[2] has not changed

[Bug tree-optimization/36602] memset should be optimized into an empty CONSTRUCTOR

2014-09-28 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=36602

Andi Kleen andi-gcc at firstfloor dot org changed:

   What|Removed |Added

 CC||andi-gcc at firstfloor dot org

--- Comment #9 from Andi Kleen andi-gcc at firstfloor dot org ---
Any progress on fixing the test case, so that this can be finally fixed?

[Bug rtl-optimization/63384] scheduler loops on endless fence list with -fselective-scheduling2 on x86

2014-09-28 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63384

--- Comment #4 from Andi Kleen andi-gcc at firstfloor dot org ---
It loops forever in this loop in sel_sched_region_2

  while (fences)
    {
      int min_seqno, max_seqno;
      ilist_t scheduled_insns = NULL;
      ilist_t *scheduled_insns_tailp = &scheduled_insns;

      find_min_max_seqno (fences, &min_seqno, &max_seqno);
      schedule_on_fences (fences, max_seqno, &scheduled_insns_tailp);
      fences = calculate_new_fences (fences, orig_max_seqno, &max_time);
      highest_seqno_in_use = update_seqnos_and_stage (min_seqno, max_seqno,
                                                      highest_seqno_in_use,
                                                      &scheduled_insns);
    }

because calculate_new_fences always comes up with a list which is the same as
before. In move_fence_to_fences it always goes into the else

  f = flist_lookup (FLIST_TAIL_HEAD (new_fences),
                    FENCE_INSN (FLIST_FENCE (old_fences)));
  if (f)
    {
      merge_fences (f, old->insn, old->state, old->dc, old->tc,
                    old->last_scheduled_insn, old->executing_insns,
                    old->ready_ticks, old->ready_ticks_size,
                    old->sched_next, old->cycle, old->issue_more,
                    old->after_stall_p);
    }
  else
    {
      _list_add (tailp);
      FLIST_TAIL_TAILP (new_fences) = &FLIST_NEXT (*tailp);


So something is going wrong in flist_lookup.
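
To make the shape of that code easier to follow, here is a stand-alone sketch
of the same "look up, then merge or append" pattern on a plain singly linked
list. It is not the sel-sched code: struct node, list_lookup, move_to_list and
list_length are hypothetical names, and "merging" is reduced to a counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a fence; key plays the role of the insn. */
struct node {
    int key;
    int merged;          /* how many duplicates were folded into this node */
    struct node *next;
};

/* Return the node with the given key, or NULL if there is none. */
struct node *list_lookup(struct node *head, int key)
{
    for (; head != NULL; head = head->next)
        if (head->key == key)
            return head;
    return NULL;
}

/* Move one entry into the list whose end is *tailp.
   Returns the tail pointer to use for the next call. */
struct node **move_to_list(struct node *head, struct node **tailp, int key)
{
    struct node *f = list_lookup(head, key);

    assert(tailp != NULL);
    if (f != NULL) {
        f->merged++;                 /* already present: merge */
        return tailp;
    }
    *tailp = calloc(1, sizeof **tailp);   /* not present: append */
    assert(*tailp != NULL);
    (*tailp)->key = key;
    return &(*tailp)->next;
}

size_t list_length(const struct node *head)
{
    size_t n = 0;
    for (; head != NULL; head = head->next)
        n++;
    return n;
}
```

In this sketch, moving the same key twice leaves one node with merged == 1.
Whether the real loop fails to terminate for a related reason is not
established here; the sketch only shows which branch the comment says is
always taken.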

[Bug middle-end/61848] [5 Regression] a previous declaration causes the section attribute to be lost

2014-09-28 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61848

--- Comment #16 from Andi Kleen andi-gcc at firstfloor dot org ---
Can Alan's patch be submitted please?

I always need to apply it now before compiling a kernel.

[Bug middle-end/61848] [5 Regression] a previous declaration causes the section attribute to be lost

2014-09-28 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61848

--- Comment #20 from Andi Kleen andi-gcc at firstfloor dot org ---
So the only problem was the missing test case, which you supplied?

[Bug middle-end/63404] New: gcc 5 miscompiles linux block layer

2014-09-28 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63404

Bug ID: 63404
   Summary: gcc 5 miscompiles linux block layer
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org

When I boot a current Linux mainline kernel compiled with mainline gcc 
and the section fix patch applied I always get a crash at boot in the block
layer.

gcc version 5.0.0 20140926 (experimental) (GCC) 

[1.318801] EXT4-fs (sda1): write access will be enabled during recovery
[1.367592] [ cut here ]
[1.369061] kernel BUG at /home/andi/lsrc/linux/block/blk-flush2.c:80!
[1.370910] invalid opcode:  [#1] SMP 


I narrowed it down to one function. When only the function is compiled with gcc
4.9 the kernel boots.

Attached is a test case with only the function.
It doesn't quite run by itself yet, so the code has to be examined.

[Bug middle-end/63404] gcc 5 miscompiles linux block layer

2014-09-28 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63404

--- Comment #1 from Andi Kleen andi-gcc at firstfloor dot org ---
Created attachment 33607
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33607&action=edit
not quite yet runnable test case


In the real execution blk_flush_complete_seq always ends up in the default case
in the switch and crashes.

[Bug target/63404] gcc 5 miscompiles linux block layer

2014-09-28 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63404

Andi Kleen andi-gcc at firstfloor dot org changed:

   What|Removed |Added

  Component|middle-end  |target

--- Comment #2 from Andi Kleen andi-gcc at firstfloor dot org ---
The switch is miscompiled and destroys the flags register in the middle of a
comparison:

.LVL2:
        .loc 1 49 0
        cmpl    $2, %eax        #, seq
        je      .L5     #,
        shrb    $2, %r12b       #, D.32130   BAD1
        andl    $1, %r12d       #, D.32130   BAD2
        jbe     .L24    #,
        cmpl    $4, %eax        #, seq
        je      .L7     #,
        cmpl    $8, %eax        #, seq
        jne     .L4     #,


gcc 4.9 creates the same code except for the two instructions marked
BAD1/BAD2. The JBE relies on the CF/ZF set by the CMP being preserved, but
SHR can overwrite ZF/CF, which breaks the JBE after the CMP.

So somehow the backend lost track of these two flag bits.
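
A small model of the flag behavior, to show why the inserted instructions
change the branch. This is a sketch, not compiler output: struct flags and the
do_*/jbe_* helpers are hypothetical, and only CF and ZF are modeled. It also
models the AND, which on x86 clears CF and sets ZF from its result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct flags { bool cf, zf; };

/* cmpl $src, dst: flags from dst - src (unsigned view). */
static struct flags do_cmp(uint32_t dst, uint32_t src)
{
    struct flags f = { dst < src, dst == src };
    return f;
}

/* shrb $n, v: CF is the last bit shifted out, ZF from the result. */
static struct flags do_shr8(uint8_t *v, unsigned n)
{
    struct flags f;
    assert(n >= 1 && n <= 7);
    f.cf = ((*v >> (n - 1)) & 1) != 0;
    *v = (uint8_t)(*v >> n);
    f.zf = (*v == 0);
    return f;
}

/* andl $mask, v: CF is cleared, ZF from the result. */
static struct flags do_and32(uint32_t *v, uint32_t mask)
{
    struct flags f;
    *v &= mask;
    f.cf = false;
    f.zf = (*v == 0);
    return f;
}

/* jbe is taken when CF or ZF is set. */
static bool jbe_taken(struct flags f) { return f.cf || f.zf; }

/* Expected sequence: cmp $2, seq; jbe. */
bool jbe_after_cmp(uint32_t seq)
{
    return jbe_taken(do_cmp(seq, 2));
}

/* Miscompiled sequence: cmp $2, seq; shrb $2; andl $1; jbe. */
bool jbe_after_clobber(uint32_t seq, uint8_t r12b)
{
    struct flags f = do_cmp(seq, 2);
    uint32_t r12d;

    f = do_shr8(&r12b, 2);      /* overwrites the flags from the cmp */
    r12d = r12b;
    f = do_and32(&r12d, 1);     /* overwrites them again */
    return jbe_taken(f);
}
```

In this model, with seq == 4 the branch should not be taken, but
jbe_after_clobber(4, 0) takes it because the AND result is zero. The outcome
depends on the unrelated byte in r12b rather than on seq.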

[Bug rtl-optimization/63384] ICE in moveup_expr_chached-sel_bb_head-bb_node with special options

2014-09-27 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63384

--- Comment #1 from Andi Kleen andi-gcc at firstfloor dot org ---
With a newer compiler version

gcc version 5.0.0 20140926 (experimental) (GCC) 


the test case doesn't crash anymore, but just runs for a very long time. I
killed it after 20s. This happens with the following two options:


g++50 matrix.i -o outfile -O2  -fvar-tracking-assignments-toggle 
-fselective-scheduling2

The overhead is mostly in the scheduler:

  - sched_analyze_insn(deps_desc*, rtx_def*, rtx_insn*)
  - 99.39% deps_analyze_insn(deps_desc*, rtx_insn*)
       tick_check_p(_expr*, deps_desc*, _fence*)
       fill_insns(_fence*, int, _list_node***)
       sel_sched_region_2(int)
       sel_sched_region(int)
       run_selective_scheduling()
       (anonymous namespace)::pass_sched2::execute(function*)
       execute_one_pass(opt_pass*)
       execute_pass_list_1(opt_pass*)
       execute_pass_list_1(opt_pass*)
       execute_pass_list_1(opt_pass*)
       execute_pass_list(function*, opt_pass*)
       cgraph_node::expand()
       symbol_table::compile()
       symbol_table::finalize_compilation_unit()
       cp_write_global_declarations()
       compile_file()
       toplev_main(int, char**)
       __libc_start_main
  + 0.61% tick_check_p(_expr*, deps_desc*, _fence*)


sched_analyze_insn(deps_desc*, rtx_def*, rtx_insn*) 


   │
   │for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
 12.84 │ 748:   add    $0x1,%r13d
  0.07 │        add    $0x30,%r14
   │        cmp    $0x4d,%r13d
   │      ↓ je     7e5
   │  if (TEST_HARD_REG_BIT (implicit_reg_pending_uses, i))
  0.06 │ 75a:   mov    %r13d,%eax
 12.45 │        shr    $0x6,%eax
  0.17 │        mov    0x1828100(,%rax,8),%rax
  6.06 │        bt     %r13,%rax
  6.21 │      ↑ jae    748
   │{
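
The hot loop above is a per-register bit test over a set stored as 64-bit
words (the shr $0x6 picks the word, the bt tests the bit). A stand-alone
sketch of that kind of loop follows; reg_set, set_reg_bit, test_reg_bit and
count_pending are hypothetical names, not GCC's, and the bound 0x4d is taken
from the cmp in the annotated output.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_HARD_REGS 0x4d   /* loop bound seen in the cmp $0x4d above */
#define WORD_BITS     64

/* Hypothetical register set: one bit per hard register. */
typedef struct {
    uint64_t words[(NUM_HARD_REGS + WORD_BITS - 1) / WORD_BITS];
} reg_set;

void set_reg_bit(reg_set *s, unsigned i)
{
    assert(i < NUM_HARD_REGS);
    s->words[i / WORD_BITS] |= (uint64_t)1 << (i % WORD_BITS);
}

int test_reg_bit(const reg_set *s, unsigned i)
{
    assert(i < NUM_HARD_REGS);
    /* i / 64 selects the word, i % 64 selects the bit within it. */
    return (int)((s->words[i / WORD_BITS] >> (i % WORD_BITS)) & 1);
}

/* Walk every register and count those whose bit is set. */
unsigned count_pending(const reg_set *s)
{
    unsigned i, n = 0;

    for (i = 0; i < NUM_HARD_REGS; i++)
        if (test_reg_bit(s, i))
            n++;
    return n;
}
```

Each call is cheap, so a profile dominated by a loop like this suggests it is
being entered a very large number of times rather than being slow per
iteration.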

[Bug rtl-optimization/63384] selective scheduling on x86 takes very long

2014-09-27 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63384

Andi Kleen andi-gcc at firstfloor dot org changed:

   What|Removed |Added

  Attachment #33585|0   |1
is obsolete||

--- Comment #2 from Andi Kleen andi-gcc at firstfloor dot org ---
Created attachment 33600
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33600&action=edit
Reduced test case for long compile time

Oddly the problem goes away when the variable allocation that is not used is
commented out.

[Bug target/63382] New: gcc 5 breaks linux early bootup in QEMU

2014-09-26 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63382

Bug ID: 63382
   Summary: gcc 5 breaks linux early bootup in QEMU
   Product: gcc
   Version: 5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andi-gcc at firstfloor dot org

No debug so far. But a gcc 5 compiled x86 Linux kernel cannot boot in qemu/KVM
with -kernel bzImage. qemu always resets and loops directly after starting to
execute the kernel image. The same kernel compiled with an older compiler works
fine.

[Bug middle-end/61848] [5 Regression] a previous declaration causes the section attribute to be lost

2014-09-26 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61848

Andi Kleen andi-gcc at firstfloor dot org changed:

   What|Removed |Added

 CC||andi-gcc at firstfloor dot org

--- Comment #15 from Andi Kleen andi-gcc at firstfloor dot org ---
*** Bug 63382 has been marked as a duplicate of this bug. ***

[Bug target/63382] gcc 5 breaks linux early bootup in QEMU

2014-09-26 Thread andi-gcc at firstfloor dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63382

Andi Kleen andi-gcc at firstfloor dot org changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #2 from Andi Kleen andi-gcc at firstfloor dot org ---
Yes, Alan's patch fixes it. So it's a dup.

*** This bug has been marked as a duplicate of bug 61848 ***
