[Bug tree-optimization/86625] funroll-loops doesn't unroll, producing >3x assembly and running 10x slower than manual complete unrolling

2018-07-22 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86625

Alexander Monakov  changed:

   What|Removed |Added

 CC||amonakov at gcc dot gnu.org
  Component|rtl-optimization|tree-optimization

--- Comment #1 from Alexander Monakov  ---
Please supply testcase(s) as Bugzilla attachments, not external links.

At -O3/-Ofast the main issue is early unrolling ('cunrolli') splatting all
simple 16-iteration inner loops. After that imho all hope is lost, and yeah,
looks like we try to vectorize across the other dimension.

With -O3 -fdisable-tree-cunrolli, or with -O2 -ftree-vectorize we do get the
correct vectorization pattern, but a couple of problems remain: after vect,
tree optimizations cannot hoist/sink memory references out of the outer loop,
leaving 2 loads, 1 load-broadcast and 1 store per fma. Later, RTL PRE
cleans up redundant vector loads, but load-broadcasts and stores remain.

[Bug c/86617] [6/7/8/9 Regression] Volatile qualifier is ignored sometimes for unsigned char

2018-07-21 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86617

Alexander Monakov  changed:

           What    |Removed                     |Added

           Keywords|                            |wrong-code
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2018-07-21
                 CC|                            |amonakov at gcc dot gnu.org
            Summary|Volatile qualifier is       |[6/7/8/9 Regression]
                   |ignored sometimes for       |Volatile qualifier is
                   |unsigned char               |ignored sometimes for
                   |                            |unsigned char
     Ever confirmed|0                           |1

--- Comment #1 from Alexander Monakov  ---
Confirmed, 'unsigned short' is similarly mishandled, but not wider integer
types. gcc-4.9 got this right. Appears like over-eager folding in the frontend:
in the .original dump I get

{
  u8 = u8 * 2;
  u8 = u8, 0;
}

[Bug c++/86586] [6/7/8/9 Regression] -Wsign-compare affects code generation

2018-07-19 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86586

--- Comment #7 from Alexander Monakov  ---
Another possible compromise is to add a 'bool for_warnings = false' argument
to maybe_constant_value and store it along with the reduced tree in cv_cache
(perhaps even by setting a flag on the tree itself?). Then, when a lookup with
!for_warnings retrieves a tree that has the flag set, throw it away and
recompute.

That should be a fairly simple change that keeps the current speed when the
warnings are disabled, or when main code generation needs the reduced tree
before some of the warnings do.

[Bug c++/86586] New: [6/7/8/9 Regression] -Wsign-compare affects code generation

2018-07-19 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86586

Bug ID: 86586
   Summary: [6/7/8/9 Regression] -Wsign-compare affects code
generation
   Product: gcc
   Version: 6.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: amonakov at gcc dot gnu.org
Blocks: 86518
  Target Milestone: ---

void f ()
{
  __builtin_cpu_supports ("avx2") && __builtin_cpu_supports ("ssse3");
}

ICEs with 'g++ -std=c++98 -fcompare-debug=-Wsign-compare'. This is minimized
from mv1.C in the testsuite.

I know it's inconvenient that this test depends on an x86-specific builtin, but
unfortunately I don't see other tests failing (apart from cp/mangle.c
miscomparing on bootstrap with/without the warning).

This may be similar to PR 86567: there's a use of maybe_constant_value guarded
by warn_sign_compare.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86518
[Bug 86518] Strengthen bootstrap comparison by not enabling warnings at stage3

[Bug bootstrap/86518] Strengthen bootstrap comparison by not enabling warnings at stage3

2018-07-18 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86518

--- Comment #9 from Alexander Monakov  ---
One more: -Wimplicit-fallthrough issue uncovered by the testsuite: PR 86575.

So far all issues appeared in gcc-6 or more recent.

[Bug middle-end/86575] New: -Wimplicit-fallthrough affects code generation

2018-07-18 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86575

Bug ID: 86575
   Summary: -Wimplicit-fallthrough affects code generation
   Product: gcc
   Version: 7.3.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: amonakov at gcc dot gnu.org
  Target Milestone: ---

void
f2 (int a, int b, int c, int d)
{
  switch (b)
    {
    default:
      for (int e = 0; e < c; ++e)
        if (e == d)
          break;
    }
}

ICEs as both C and C++ using 'gcc -fcompare-debug=-Wimplicit-fallthrough'. This
is minimized from pr81275-1.C in the testsuite (the -2 and -3 variants of the
original test also fail).

[Bug bootstrap/86518] Strengthen bootstrap comparison by not enabling warnings at stage3

2018-07-18 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86518

--- Comment #8 from Alexander Monakov  ---
Other files seem to miscompare due to -Wnonnull-compare: PR 86569.

[Bug c++/86569] New: -Wnonnull-compare affects code generation

2018-07-18 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86569

Bug ID: 86569
   Summary: -Wnonnull-compare affects code generation
   Product: gcc
   Version: 6.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: amonakov at gcc dot gnu.org
  Target Milestone: ---

bool b;

int main ()
{
  return ((!b) != 0);
}

ICEs with g++ -fcompare-debug=-Wnonnull-compare (this is bool6.C in the
testsuite). It looks as if the warning prevents folding '!b != 0' to '!b'.

[Bug bootstrap/86518] Strengthen bootstrap comparison by not enabling warnings at stage3

2018-07-18 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86518

--- Comment #7 from Alexander Monakov  ---
cp/mangle.o miscompares due to -Wsign-compare, possibly due to caching in
maybe_constant_value as in the above PR.

[Bug bootstrap/86518] Strengthen bootstrap comparison by not enabling warnings at stage3

2018-07-18 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86518

--- Comment #6 from Alexander Monakov  ---
GCC 7 sadly has a similar list of miscomparing files. Did not check GCC 6 yet.

So far I managed to catch one set of misbehaving warnings by checking testsuite
fallout with -fcompare-debug=-Wall, but unfortunately fixing those would not
reduce the number of bootstrap miscompares: PR 86567.

[Bug c++/86567] New: [8/9 Regression] -Wnonnull/-Wformat/-Wrestrict affect code generation

2018-07-18 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86567

Bug ID: 86567
   Summary: [8/9 Regression] -Wnonnull/-Wformat/-Wrestrict affect
code generation
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: amonakov at gcc dot gnu.org
  Target Milestone: ---

#include <vector>

std::vector
f()
{
  std::vector r;
  return r;
}

Starting with gcc-8, this ICEs with 'g++ -fcompare-debug=-Wnonnull' (as well as
with -Wformat, -Wrestrict, -Wsuggest-attribute=format).


cp/call.c:build_over_call() has:

  if (warn_nonnull
      || warn_format
      || warn_suggest_attribute_format
      || warn_restrict)
    {
      tree *fargs = (!nargs ? argarray
                            : (tree *) alloca (nargs * sizeof (tree)));
      for (j = 0; j < nargs; j++)
        {
          /* For -Wformat undo the implicit passing by hidden reference
             done by convert_arg_to_ellipsis.  */
          if (TREE_CODE (argarray[j]) == ADDR_EXPR
              && TYPE_REF_P (TREE_TYPE (argarray[j])))
            fargs[j] = TREE_OPERAND (argarray[j], 0);
          else
            fargs[j] = maybe_constant_value (argarray[j]);
        }

      warned_p = check_function_arguments (input_location, fn, TREE_TYPE (fn),
                                           nargs, fargs, NULL);
    }


Bypassing this block avoids the ICE, which indicates that something in the
snippet may affect code generation (I have not investigated further).

[Bug bootstrap/86518] Strengthen bootstrap comparison by not enabling warnings at stage3

2018-07-16 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86518

--- Comment #4 from Alexander Monakov  ---
Yep, that's correct: -Wno-narrowing is necessary for build to succeed at all.

[Bug bootstrap/86518] New: Strengthen bootstrap comparison by not enabling warnings at stage3

2018-07-13 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86518

Bug ID: 86518
   Summary: Strengthen bootstrap comparison by not enabling
warnings at stage3
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: bootstrap
  Assignee: unassigned at gcc dot gnu.org
  Reporter: amonakov at gcc dot gnu.org
  Target Milestone: ---

Currently stage2 and stage3 use the same warning options, but that is
redundant: if any warnings are generated, they will already be present at
stage2 (and stop the bootstrap). By not enabling any warnings for stage3, we
would additionally check that warnings do not affect code generation.

Note that simply adding -w at stage3 doesn't work, as it only suppresses the
warnings at print time.

I tried leaving only -Wno-narrowing in warning flags and got many comparison
failures:

Comparing stages 2 and 3
warning: gcc/cc1obj-checksum.o differs
Bootstrap comparison failure!
gcc/calls.o differs
gcc/dwarf2out.o differs
gcc/loop-iv.o differs
gcc/generic-match.o differs
gcc/ipa-inline.o differs
gcc/builtins.o differs
gcc/optabs.o differs
gcc/tree-vrp.o differs
gcc/profile.o differs
gcc/i386.o differs
gcc/cfgexpand.o differs
gcc/simplify-rtx.o differs
gcc/gimple-ssa-sprintf.o differs
gcc/expr.o differs
gcc/print-tree.o differs
gcc/gimple-match.o differs
gcc/godump.o differs
gcc/gimple-ssa-nonnull-compare.o differs
gcc/targhooks.o differs
gcc/tree-ssa-live.o differs
gcc/gimple-ssa-warn-restrict.o differs
gcc/tree-ssa-ccp.o differs
gcc/gimplify.o differs
gcc/tree-cfg.o differs
gcc/tree-pretty-print.o differs
make: *** [compare] Error 1

[Bug lto/86490] lto1: fatal error: multiple prevailing defs

2018-07-11 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86490

--- Comment #8 from Alexander Monakov  ---
(In reply to H.J. Lu from comment #7)
> It is to be consistent for common symbol linked against .a or .so.

That seems like a really strange reason because without --whole-archive there
are other ways to arrive at an apparent "inconsistency", while with
--whole-archive there's no need for special treatment as the "consistent"
result is achieved automatically.

[Bug lto/86490] lto1: fatal error: multiple prevailing defs

2018-07-11 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86490

--- Comment #6 from Alexander Monakov  ---
(In reply to H.J. Lu from comment #5)
> When ld sees a common symbol, it will use a non-common definiton
> in a library, .a or .so, to override it.

This is surprising, is it documented somewhere? I don't think the ELF spec
suggests something like that needs to happen.

> Do you have a testcase?

No, it would take some time to prepare.

[Bug lto/86490] lto1: fatal error: multiple prevailing defs

2018-07-11 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86490

--- Comment #4 from Alexander Monakov  ---
(In reply to H.J. Lu from comment #3)
> It is because gold doesn't check archive for a common definition.

Please elaborate - does ld.bfd try to extract static archive members when it
already has a common definition? Why?

> Is there a common symbol involved?

I don't think so, but I'm not sure. We've also seen other pain points like the
same member extracted and given to the plugin multiple times, even though the
second extraction cannot possibly satisfy any unresolved references.

[Bug lto/86490] lto1: fatal error: multiple prevailing defs

2018-07-11 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86490

Alexander Monakov  changed:

   What|Removed |Added

 CC||amonakov at gcc dot gnu.org

--- Comment #2 from Alexander Monakov  ---
Note that Gold does not exhibit this issue. I think ld.bfd is at fault here.

We've hit similar issues with some internal plugin development. The main issue
is, ld.bfd feeds the plugin with objects extracted from static archives, but
those objects do not satisfy any unresolved references and would not be
extracted in the first place in non-LTO link. So ld.bfd is causing useless
extra work both for itself and the compiler plugin.

It would be nice to fix this on ld.bfd side so future plugin writers don't need
to wrestle with this issue.

[Bug lto/86442] Wrong error: global register variable follows a function definition when using LTO

2018-07-09 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86442

Alexander Monakov  changed:

   What|Removed |Added

 CC||amonakov at gcc dot gnu.org

--- Comment #2 from Alexander Monakov  ---
Indeed, I think the build system should be passing -fno-lto for such
translation units, as their ABI is different from the rest.

I'm not sure limiting the inliner is enough, there's also IPA-RA and it's not
obviously safe w.r.t global-reg-var differences.

If there's a desire to support such usage "seamlessly", I'd really like to see
the same solution for this and toplevel asms: making such input translation
unit one LTO partition (i.e. not splitting or merging it with anything else).

[Bug tree-optimization/86435] -fsemantic-interposition does not appear to have any effect

2018-07-08 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86435

Alexander Monakov  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 CC||amonakov at gcc dot gnu.org
 Resolution|--- |INVALID

--- Comment #1 from Alexander Monakov  ---
Without -fpic, f1 is considered not interposable. With -fpic, gcc needs
-fno-semantic-interposition to optimize f2 to 'return 0;'.

[Bug c/86420] [9 regression] nextafter(0x1p-1022,0) is constant folded

2018-07-06 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86420

Alexander Monakov  changed:

   What|Removed |Added

 CC||amonakov at gcc dot gnu.org,
   ||jakub at gcc dot gnu.org

--- Comment #2 from Alexander Monakov  ---
I think it's intended for -ftrapping-math to cover this.

Jakub's patch adding this folding functionality handles over/underflow cases,
but looks like the situation in comment #0 is not handled:

https://gcc.gnu.org/ml/gcc-patches/2018-04/msg01027.html

[Bug tree-optimization/86214] [8/9 Regression] Strongly increased stack usage

2018-07-04 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86214

Alexander Monakov  changed:

   What|Removed |Added

 Status|WAITING |NEW

--- Comment #8 from Alexander Monakov  ---
Removing the 'waiting' status.

[Bug tree-optimization/86214] [8/9 Regression] Strongly increased stack usage

2018-07-04 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86214

--- Comment #5 from Alexander Monakov  ---
Sorry, this still seems over-reduced: the 'cmp' variable is uninitialized, and
gcc-7 completely optimizes out the block with large stack usage guarded by 'cmp
== 0' test, so gcc-7 vs gcc-8 is not directly comparable. It's strange that
gcc-7 optimizes that out, but it's a different issue.

Can you attach the unreduced preprocessed source, and if you make another
attempt at reducing, perhaps enable most warnings?

That said, it seems gcc is not very good at re-discovering non-overlapping
stack allocations introduced by inlining. Looking at your testcase I came up
with the following minimal test:

struct S{~S();};

void f(void *);

inline void ff()
{
  char c[1000];
  f(c);
}

void g(int n)
{
  S s;
  char c[100];
  f(c);
  if (n) ff(), ff();
}

(there's no regression vs. gcc-7 on this example, but gcc-4.6 used to get a
better result by consuming 1100 bytes rather than 2100).

[Bug driver/86388] Enhancement: sort "valid arguments to '-march=' switch" suggestions alphabetically

2018-07-03 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86388

Alexander Monakov  changed:

   What|Removed |Added

 CC||amonakov at gcc dot gnu.org

--- Comment #1 from Alexander Monakov  ---
I'd prefer the existing ordering over alphabetical: the list produced by GCC
is mostly ordered first by manufacturer, then by generation/capabilities.
Placement of the 'x86-64' entry seems odd, but, other than that, I like the
order.

The manual also orders -march entries in this "manufacturer-capabilities"
style.

[Bug fortran/86350] Missed optimization with multiplication by zero

2018-06-28 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86350

Alexander Monakov  changed:

   What|Removed |Added

 CC||amonakov at gcc dot gnu.org

--- Comment #2 from Alexander Monakov  ---
The multiplication is optimized out under -ffinite-math-only -fno-signed-zeros
(otherwise y can be NaN if bar returns infinity, for example).

Why is it ok to optimize out the call to bar even though it's impure?

[Bug middle-end/86311] gcc_qsort calls memcpy with overlaps

2018-06-25 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86311

Alexander Monakov  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #2 from Alexander Monakov  ---
Thanks! Fixed by using memmove where memcpy might have been called with
dst==src.

[Bug middle-end/86311] gcc_qsort calls memcpy with overlaps

2018-06-25 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86311

--- Comment #1 from Alexander Monakov  ---
Author: amonakov
Date: Mon Jun 25 17:44:15 2018
New Revision: 262092

URL: https://gcc.gnu.org/viewcvs?rev=262092&root=gcc&view=rev
Log:
gcc_qsort: avoid overlapping memcpy (PR 86311)

PR middle-end/86311
* sort.cc (REORDER_23): Avoid memcpy with same destination and source.
(REORDER_45): Likewise.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/sort.cc

[Bug tree-optimization/86214] [8/9 Regression] Strongly increased stack usage

2018-06-21 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86214

Richard Biener  changed:

           What    |Removed                     |Added

           Keywords|                            |missed-optimization
   Target Milestone|---                         |8.2
            Summary|[8 Regression] Strongly     |[8/9 Regression] Strongly
                   |increased stack usage       |increased stack usage

Alexander Monakov  changed:

   What|Removed |Added

 Status|UNCONFIRMED |WAITING
   Last reconfirmed||2018-06-21
 CC||amonakov at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #1 from Alexander Monakov  ---
Inlining decisions are not so different between gcc-7 and gcc-8; the main
difference is that gcc-8 translates b::x() into __builtin_unreachable and
warns accordingly:

warning: no return statement in function returning non-void [-Wreturn-type]

   b x(b) {}
   ^

and with that change gcc-8 no longer manages to prove that big arrays have
non-overlapping lifetimes.

If I change the source to well-formed 'void x(b) {}', it compiles as desired.

So, assuming the original MySQL source is free of that warning, the testcase is
too aggressively reduced and no longer reflects the original issue. Can you
please re-reduce?

[Bug lto/86175] LTO code generator does not respect ld -u option to force symbol inclusion in the link product

2018-06-18 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86175

Alexander Monakov  changed:

   What|Removed |Added

 CC||amonakov at gcc dot gnu.org

--- Comment #3 from Alexander Monakov  ---
> but really gets an empty blob from the LTO plugin for foo.

Are you sure about this? Compiling with -save-temps shows that the symbol is
present in GCC's assembly output; specifying --print-gc-sections also shows
that the linker is discarding it:

/usr/bin/ld.bfd: Removing unused section '.text.KeepMe' in file
'/tmp/ccWbtSKK.ltrans0.ltrans.o'


Gold linker does not exhibit this (try -fuse-ld=gold). Can you report it
against the BFD linker at sourceware.org/bugzilla?

[Bug c/86174] Poor vectorization/register allocation with omp simd, FMA

2018-06-16 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86174

Alexander Monakov  changed:

   What|Removed |Added

 CC||amonakov at gcc dot gnu.org

--- Comment #1 from Alexander Monakov  ---
It might be useful to note that what the testcase "wants" to happen is for the
compiler to notice that the temporary array 'double C[Si][Sk]' does not need to
live in memory - ideally it would correspond to 8 256-bit (or 4 512-bit)
registers.

[Bug c/86150] Trunk Segmentation Fault

2018-06-14 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86150

Alexander Monakov  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 CC||amonakov at gcc dot gnu.org
 Resolution|--- |INVALID

--- Comment #1 from Alexander Monakov  ---
This is the *assembler* segfaulting, not the *compiler*. The assembly produced
by trunk is not different from gcc-8 output on empty input, so it's probably
some weird issue with Binutils installation for gcc-trunk worker(s) on Godbolt
side.

[Bug c++/86094] [8/9 Regression] Call ABI changed for small objects with defaulted ctor

2018-06-11 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86094

--- Comment #3 from Alexander Monakov  ---
-fabi-version=12 is not documented, not mentioned in release notes, and not
wired up in -Wabi.

[Bug rtl-optimization/86096] ICE: qsort checking failed (error: qsort comparator non-negative on sorted output: 0)

2018-06-09 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86096

Alexander Monakov  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2018-06-09
 CC||amonakov at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #1 from Alexander Monakov  ---
df_mw_compare has:

  if (mw1->mw_reg != mw2->mw_reg)
    return mw1->mw_order - mw2->mw_order;

Note mw_reg in the 'if' vs mw_order in the 'return'. This is invalid.

It's simpler and more efficient to just use mw_order as the last tie-breaker
regardless of mw_reg value.

[Bug c++/86094] New: [8/9 Regression] Call ABI changed for small objects with defaulted ctor

2018-06-08 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86094

Bug ID: 86094
   Summary: [8/9 Regression] Call ABI changed for small objects
with defaulted ctor
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Keywords: ABI, wrong-code
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: amonakov at gcc dot gnu.org
  Target Milestone: ---

When compiling the following with -O2 -std=c++11:

struct S {
    S(S&&) = default;
    int i;
};

S foo(S s)
{
    return s;
}

gcc-7 and earlier emit

_Z3foo1S:
        movl    %edi, %eax
        ret

but gcc-8 and trunk emit

_Z3foo1S:
        movl    (%rsi), %edx
        movq    %rdi, %rax
        movl    %edx, (%rdi)
        ret

i.e. the object is now passed in memory rather than in a register. This appears
to be a silent ABI change.

(Clang generates the same code as gcc-7)

[Bug c/86093] [8/9 Regression] volatile ignored on pointer in C

2018-06-08 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86093

Alexander Monakov  changed:

           What    |Removed                     |Added

             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2018-06-08
                 CC|                            |amonakov at gcc dot gnu.org
      Known to work|                            |7.3.0
            Summary|volatile ignored on pointer |[8/9 Regression] volatile
                   |in C                        |ignored on pointer in C
     Ever confirmed|0                           |1
      Known to fail|                            |8.1.0, 9.0

--- Comment #1 from Alexander Monakov  ---
gcc-7 got this right.

[Bug tree-optimization/86072] Poor codegen with atomics

2018-06-07 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86072

Alexander Monakov  changed:

   What|Removed |Added

 CC||amonakov at gcc dot gnu.org

--- Comment #3 from Alexander Monakov  ---
As for the segfault mentioned in comment 0, this is not a compiler bug: it's
the assembler segfaulting, and it segfaults even with an empty source, so it's
probably an issue/misconfiguration on the godbolt.org side.

[Bug tree-optimization/86071] -O0 -foptimize-sibling-calls doesn't optimize

2018-06-06 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86071

Alexander Monakov  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 CC||amonakov at gcc dot gnu.org
 Resolution|--- |INVALID

--- Comment #1 from Alexander Monakov  ---
In GCC there's no way to selectively enable a few optimizations with their -f
flags at -O0 level: -O0 means that optimizations are completely disabled,
regardless of -f flags. This is mentioned in the manual:

  "Most optimizations are only enabled if an -O level is set on the command
line.  Otherwise they are disabled, even if individual optimization flags are
specified."


Tail call optimization sometimes is not applied because there's an escaping
local variable (possibly from an inlined function), and GCC does not take into
account its life range. This might be what you're seeing at -O3. There's a
recent report: PR 86050.

[Bug rtl-optimization/49330] Integer arithmetic on addresses optimised with pointer arithmetic rules

2018-06-04 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49330

Alexander Monakov  changed:

   What|Removed |Added

 CC||amonakov at gcc dot gnu.org

--- Comment #16 from Alexander Monakov  ---
What do you think about the suggestion made in the most recent duplicate,
namely expanding GIMPLE pointer-to-integer casts to non-transparent RTL
assignments, i.e. going from

  val = (intptr_t) ptr;

to

  asm ("" : "=g" (rval) : "0" (rptr));

Wouldn't this plug the hole in one shot instead of chasing down missing
REG_POINTERs in multiple RTL passes?

[Bug c/86026] Document and/or change allowed operations on integer representation of pointers

2018-06-01 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86026

--- Comment #3 from Alexander Monakov  ---
Tree optimizations already manage to avoid "optimizing" f_intadd, but
unfortunately types and casts are not visible in the IR at the RTL level, and
various passes make no distinction between (char*)((uintptr_t)t + o) and
(t + o).

Perhaps GCC should consider lowering pointer-to-integer casts to a
non-transparent assignment, making the result alias all for the purposes of RTL
alias analysis, akin to

char __attribute__ ((noinline)) f_intadd1(ptrdiff_t o) {
  g = 1;
  uintptr_t t1 = (uintptr_t)t;
  asm("" : "+g"(t1));
  *(char*)(t1 + o) = 2;
  return g;
}

[Bug c/86026] Document and/or change allowed operations on integer representation of pointers

2018-06-01 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86026

Alexander Monakov  changed:

   What|Removed |Added

 CC||amonakov at gcc dot gnu.org

--- Comment #1 from Alexander Monakov  ---
Please add full testcase source, the snippet is missing (at least) declarations
of 'g' and 't'. The Godbolt link does not work correctly for me right now, and
in general such links are not reliable long-term.

[Bug target/85994] Comparison failure in 64-bit libgcc *_{sav,res}ms64*.o on Solaris/x86

2018-05-30 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85994

Alexander Monakov  changed:

   What|Removed |Added

 CC||amonakov at gcc dot gnu.org

--- Comment #2 from Alexander Monakov  ---
Why does this affect only new files, i.e. how did existing libgcc .S files
avoid running into the same issue?

[Bug c/85957] i686: Integers appear to be different, but compare as equal

2018-05-30 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85957

--- Comment #10 from Alexander Monakov  ---
Also note that both the original and the reduced testcase can be tweaked to
exhibit the surprising transformation even when -fexcess-precision=standard is
enabled. A "lazy" way is via -mpc64, but I think it's possible even without the
additional option (by making the code more convoluted to enforce rounding to
double). Here's what happens on the reduced testcase:

$ gcc -m32 d.c -O -fdisable-tree-dom3 && ./a.out 
cc1: note: disable pass tree-dom3 for functions in the range of [0, 4294967295]
1 == 0

$ gcc -m32 d.c -O -fdisable-tree-dom3 -fexcess-precision=standard -mpc64 &&
./a.out 
cc1: note: disable pass tree-dom3 for functions in the range of [0, 4294967295]
0 == 1

[Bug c/85957] i686: Integers appear to be different, but compare as equal

2018-05-29 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85957

--- Comment #9 from Alexander Monakov  ---
Sorry, the above comment should have said 'b * 1e6' every time it said 'b'.

[Bug c/85957] i686: Integers appear to be different, but compare as equal

2018-05-29 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85957

--- Comment #8 from Alexander Monakov  ---
To expand a bit: DOM makes the small testcase behave as if 'b' and 'ib' are
evaluated twice:

* one time, 'b' is evaluated in precision matching 'a' (either infinite or
double), and 'ib' is evaluated to 1; this instance is used in 'ia == ib'
comparison;
* a second time, 'b' is evaluated in extended precision and 'ib' is evaluated
to 0; this instance is passed as the last argument to printf.

This is surprising as the original program clearly evaluates 'b' and 'ib' just
once.

If there's no bug in DOM and the observed transformation is allowed to happen
when -fexcess-precision=fast is in effect, I think it would be nice to mention
that in the compiler manual.

[Bug target/85961] scratch register rsi used after function call

2018-05-29 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85961

Alexander Monakov  changed:

   What|Removed |Added

 CC||amonakov at gcc dot gnu.org

--- Comment #3 from Alexander Monakov  ---
You'd need to disable IPA-RA after forcing -O2 with the pragma, i.e.:

#pragma GCC optimize "O2"
#pragma GCC optimize "no-ipa-ra"

We already have logic to disable IPA-RA when instrumentation/profiling is
active, but it's done once in toplev.c. Here the pragma re-enables IPA-RA after
toplev.c:process_options() has disabled it.

Do we want to adjust it given that '#pragma GCC optimize' is documented as "not
suitable for production use"?

[Bug c/85957] i686: Integers appear to be different, but compare as equal

2018-05-29 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85957

Alexander Monakov  changed:

   What|Removed |Added

 Status|RESOLVED|REOPENED
   Last reconfirmed||2018-05-29
 CC||amonakov at gcc dot gnu.org
 Resolution|DUPLICATE   |---
 Ever confirmed|0   |1

--- Comment #7 from Alexander Monakov  ---
Reopening, the issue here is way more subtle than bug 323 and points to a
possible issue in DOM. Hopefully Richi can have a look and comment.

It appears dom2 pass performs something like jump threading based on
compile-time-evaluated floating-point expression values without also
substituting those expressions in IR. At run time, they are evaluated to
different values, leading to an inconsistency. Namely, dom2 creates bb 10:

  <bb 9>:
  # iftmp.1_1 = PHI <"true"(7), "false"(8), "true"(10)>
  printf ("(a6 == b6) = %s\n", iftmp.1_1);
  return 0;

  <bb 10>:
  _24 = __n2_13 * 1.0e+6;
  b6_25 = (guint64) _24;
  printf ("a6 = %llu\n", 1);
  printf ("b6 = %llu\n", b6_25);
  goto <bb 9>;

where the jump to bb 9 implies that _24 evaluates to 1.0 and b6_25 to 1, but
they are not substituted as such, and at run time they evaluate to 0.99... and
0 due to excess precision.

The following reduced testcase demonstrates the same issue, but requires
-fdisable-tree-dom3 (on gcc-6 at least, as otherwise dom3 substitutes results
of compile-time evaluation).

__attribute__((noinline,noclone))
static double f(void)
{
  return 1e-6;
}

int main(void)
{
  double a = 1e-6, b = f();

  if (a != b) __builtin_printf("uneq");

  unsigned long long ia = a * 1e6, ib = b * 1e6;

  __builtin_printf("%lld %s %lld\n", ia, ia == ib ? "==" : "!=", ib);
}
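
The arithmetic behind the mismatch can be modelled without involving dom2 at
all. The following is only a sketch: the helper names are made up for
illustration, and using long double as a stand-in for x87 excess precision is
an assumption that holds only where long double is wider than double (e.g. the
80-bit format on x86).

```c
#include <float.h>

/* Hypothetical helper: product rounded to double before truncation,
   which is what compile-time evaluation assumes. */
static unsigned long long scale_in_double(double a)
{
    volatile double p = a * 1e6;   /* volatile store forces rounding to double */
    return (unsigned long long)p;
}

/* Hypothetical helper: product kept in long double before truncation,
   standing in for a value left in an x87 register at run time. */
static unsigned long long scale_in_extended(double a)
{
    long double p = (long double)a * 1e6L;
    return (unsigned long long)p;
}

/* Nonzero when long double carries more mantissa bits than double. */
static int extended_is_wider(void)
{
    return LDBL_MANT_DIG > DBL_MANT_DIG;
}
```

Since the double nearest to 1e-6 is slightly below 1e-6, scale_in_double(1e-6)
rounds the product up to 1.0, while scale_in_extended(1e-6) is expected to
truncate to 0 wherever extended_is_wider() is true.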

[Bug bootstrap/85921] /gcc/c-family/c-warn.c fails to build

2018-05-25 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85921

Alexander Monakov  changed:

   What|Removed |Added

 CC||amonakov at gcc dot gnu.org

--- Comment #5 from Alexander Monakov  ---
Glibc bits/sigcontext.h should not include Linux asm/sigcontext.h (but it used
to on i386).

This was fixed back in 2012 for Glibc 2.16 by this Glibc commit:
https://sourceware.org/git/?p=glibc.git;a=commit;h=48495318fa5ae223a8b777ed144bd769d9f6c67f

I doubt this warrants a change on GCC side, given that a workaround is simple.

[Bug rtl-optimization/85099] [meta-bug] selective scheduling issues

2018-05-23 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85099
Bug 85099 depends on bug 79985, which changed state.

Bug 79985 Summary: ICE in code_motion_path_driver, at sel-sched.c:6580
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79985

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

[Bug rtl-optimization/79985] ICE in code_motion_path_driver, at sel-sched.c:6580

2018-05-23 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79985

Alexander Monakov  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #10 from Alexander Monakov  ---
Fixed.

[Bug rtl-optimization/79985] ICE in code_motion_path_driver, at sel-sched.c:6580

2018-05-23 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79985

--- Comment #9 from Alexander Monakov  ---
Author: amonakov
Date: Wed May 23 15:01:28 2018
New Revision: 260613

URL: https://gcc.gnu.org/viewcvs?rev=260613&root=gcc&view=rev
Log:
df-scan: remove ad-hoc handling of global regs in asms

PR rtl-optimization/79985
* df-scan.c (df_insn_refs_collect): Remove special case for
global registers and asm statements.

testsuite/
* gcc.dg/pr79985.c: New testcase.

Added:
trunk/gcc/testsuite/gcc.dg/pr79985.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/df-scan.c
trunk/gcc/testsuite/ChangeLog

[Bug rtl-optimization/80318] GCC takes too much RAM and time compiling a template file (var-tracking)

2018-05-22 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80318

Alexander Monakov  changed:

   What|Removed |Added

 CC||amonakov at gcc dot gnu.org

--- Comment #5 from Alexander Monakov  ---
Second largest seems to be the frontend, as with -fsyntax-only we still need
18s and 1.8GB (this is 8.1 with release checking):

Time variable                        usr           sys          wall                GGC
 phase setup                    :   0.00 (  0%)   0.00 (  0%)   0.00 (  0%)     1381 kB (  0%)
 phase parsing                  :   4.01 ( 22%)   0.80 ( 30%)   4.82 ( 23%)   519422 kB ( 27%)
 phase lang. deferred           :  13.96 ( 78%)   1.83 ( 70%)  15.82 ( 77%)  1414614 kB ( 73%)
 |name lookup                   :   1.89 ( 11%)   0.35 ( 13%)   2.08 ( 10%)    99986 kB (  5%)
 |overload resolution           :   8.94 ( 50%)   1.29 ( 49%)  10.10 ( 49%)   934750 kB ( 48%)
 garbage collection             :   1.79 ( 10%)   0.00 (  0%)   1.80 (  9%)        0 kB (  0%)
 preprocessing                  :   0.14 (  1%)   0.12 (  5%)   0.37 (  2%)     2890 kB (  0%)
 parser (global)                :   0.58 (  3%)   0.21 (  8%)   0.73 (  4%)   115783 kB (  6%)
 parser struct body             :   0.74 (  4%)   0.09 (  3%)   0.77 (  4%)    81383 kB (  4%)
 parser enumerator list         :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)      364 kB (  0%)
 parser function body           :   0.05 (  0%)   0.03 (  1%)   0.07 (  0%)     4688 kB (  0%)
 parser inl. func. body         :   0.10 (  1%)   0.01 (  0%)   0.12 (  1%)     6402 kB (  0%)
 parser inl. meth. body         :   0.41 (  2%)   0.06 (  2%)   0.39 (  2%)    27538 kB (  1%)
 template instantiation         :  13.86 ( 77%)   2.06 ( 78%)  16.02 ( 78%)  1694216 kB ( 88%)
 constant expression evaluation :   0.26 (  1%)   0.04 (  2%)   0.27 (  1%)      729 kB (  0%)
 varconst                       :   0.01 (  0%)   0.00 (  0%)   0.03 (  0%)       39 kB (  0%)
 symout                         :   0.02 (  0%)   0.01 (  0%)   0.07 (  0%)        0 kB (  0%)
 TOTAL                          :  17.97          2.63         20.65         1935427 kB

[Bug c++/85783] alloc-size-larger-than fires incorrectly with new[] and can't be disabled

2018-05-16 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85783

Alexander Monakov  changed:

   What|Removed |Added

 Status|RESOLVED|REOPENED
   Last reconfirmed||2018-05-16
 CC||amonakov at gcc dot gnu.org
 Resolution|WONTFIX |---
 Ever confirmed|0   |1

--- Comment #10 from Alexander Monakov  ---
Reopening: the request to be able to disable the warning (via
-Wno-alloc-size-larger-than) is valid and should be addressed.

[Bug target/41084] Filling xmm register with all bit set is not optimized

2018-05-15 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=41084

Alexander Monakov  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 CC||amonakov at gcc dot gnu.org
 Resolution|--- |FIXED

--- Comment #2 from Alexander Monakov  ---
Starting from gcc-4.5 (released in 2010) GCC emits pcmpeq for the
explicit-constructor variant (where it would previously emit a load) as well as
for a more concise form:

  __m128i r = {-1, -1};

The implicit variant with _mm_cmpeq_epi32 is optimized as expected starting
with gcc-5 (released in 2015).

So as far as I can see both issues raised in this report have been addressed in
the meantime. If there are other cases that are not well optimized, please let
us know (they deserve separate bug reports).

[Bug tree-optimization/85758] New: questionable bitwise folding (missing single use check?)

2018-05-12 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85758

Bug ID: 85758
   Summary: questionable bitwise folding (missing single use
check?)
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: amonakov at gcc dot gnu.org
  Target Milestone: ---

The following should be translated as-is:

void f(int a, int b);
void g(int a, int b, int m, int s)
{
    m &= s;
    a += m;
    m ^= s;
    b += m;
    f(a, b);
}

However instead of and/add/xor/add we get mov/not/and/and/add/add:

        movl    %edx, %eax
        notl    %edx
        andl    %ecx, %eax
        andl    %edx, %ecx
        addl    %eax, %edi
        addl    %ecx, %esi
        jmp     f

This is because forwprop applies an identity to m = (m & s) ^ s:

g (int a, int b, int m, int s)
{
  <bb 2> :
  m_3 = m_1(D) & s_2(D);
  a_5 = a_4(D) + m_3;
  m_6 = m_3 ^ s_2(D);
  b_8 = b_7(D) + m_6;
  f (a_5, b_8);
  return;
}

gimple_simplified to _11 = ~m_1(D);
m_6 = s_2(D) & _11;
g (int a, int b, int m, int s)
{
  int _11;

  <bb 2> :
  m_3 = m_1(D) & s_2(D);
  a_5 = m_3 + a_4(D);
  _11 = ~m_1(D);
  m_6 = s_2(D) & _11;
  b_8 = m_6 + b_7(D);
  f (a_5, b_8);
  return;
}

However since m_3 is used, this is more costly. Shouldn't this folding check
for single use of the intermediate expr? From a quick look, this is probably
match.pd:/* Fold (X & Y) ^ Y and (X ^ Y) & Y as ~X & Y.  */
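
For reference, the identity itself is easy to check in isolation. The sketch
below uses made-up helper names and unsigned operands (the testcase uses int);
it only confirms that the two forms agree, which is why the question here is
about cost rather than correctness.

```c
/* Form as written in the source: reuses the already-computed (m & s). */
static unsigned xor_form(unsigned m, unsigned s)
{
    return (m & s) ^ s;
}

/* Form produced by the fold: needs an extra 'not' and a second 'and'. */
static unsigned andnot_form(unsigned m, unsigned s)
{
    return ~m & s;
}

/* Exhaustively compare both forms over all pairs of 8-bit values.
   Returns 1 if they agree everywhere, 0 otherwise. */
static int forms_agree_8bit(void)
{
    for (unsigned m = 0; m < 256; m++)
        for (unsigned s = 0; s < 256; s++)
            if (xor_form(m, s) != andnot_form(m, s))
                return 0;
    return 1;
}
```

When (m & s) has another use, as in g() above, xor_form costs one extra
instruction on top of the shared 'and', whereas andnot_form costs two.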

[Bug tree-optimization/85757] New: tree optimizers fail to fully clean up fixed-size memcpy

2018-05-12 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85757

Bug ID: 85757
   Summary: tree optimizers fail to fully clean up fixed-size
memcpy
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: amonakov at gcc dot gnu.org
  Target Milestone: ---

This is minimized from one of suboptimal stack consumption issues in gcc_qsort.

gcc_qsort uses code similar to this to move potentially-unaligned data:

void f(int n, char *p0, char *p1, char *p2, char *o)
{
    int t0, t1;
    __builtin_memcpy(&t0, p0, 1);
    __builtin_memcpy(&t1, p1, 1);
    if (n==3) __builtin_memcpy(o+2, p2, 1);
    __builtin_memcpy(o+0, &t0, 1);
    __builtin_memcpy(o+1, &t1, 1);
}

Note the mismatch between memcpy size (1) and temporaries' size (4).

If the sizes match, there's no problem. If not, tree optimizers fail to fully
clean up the copies (and, unlike in this minimal testcase, in full gcc_qsort
RTL optimizers can't clean it up either and we get dead stack stores). The
.optimized dump reads (note dead writes to t0 and t1 in BB 2):

f (int n, char * p0, char * p1, char * p2, char * o)
{
  int t1;
  int t0;
  unsigned char _4;
  unsigned char _7;
  unsigned char _12;

  <bb 2> [local count: 1073741825]:
  _4 = MEM[(char * {ref-all})p0_3(D)];
  MEM[(char * {ref-all})&t0] = _4;
  _7 = MEM[(char * {ref-all})p1_6(D)];
  MEM[(char * {ref-all})&t1] = _7;
  if (n_9(D) == 3)
    goto <bb 3>; [34.00%]
  else
    goto <bb 4>; [66.00%]

  <bb 3> [local count: 365072220]:
  _12 = MEM[(char * {ref-all})p2_11(D)];
  MEM[(char * {ref-all})o_10(D) + 2B] = _12;

  <bb 4> [local count: 1073741825]:
  MEM[(char * {ref-all})o_10(D)] = _4;
  MEM[(char * {ref-all})o_10(D) + 1B] = _7;
  t0 ={v} {CLOBBER};
  t1 ={v} {CLOBBER};
  return;

}
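
For comparison, a sketch of the matching-size variant mentioned above, where
the temporaries are as wide as the copies. The name f_matched is made up for
illustration; this is not code from gcc_qsort.

```c
/* Hypothetical variant of f(): temporaries are 1 byte, same as each memcpy. */
static void f_matched(int n, char *p0, char *p1, char *p2, char *o)
{
    char t0, t1;
    __builtin_memcpy(&t0, p0, 1);            /* load first byte */
    __builtin_memcpy(&t1, p1, 1);            /* load second byte */
    if (n == 3) __builtin_memcpy(o + 2, p2, 1);
    __builtin_memcpy(o + 0, &t0, 1);         /* store after the possible alias */
    __builtin_memcpy(o + 1, &t1, 1);
}
```

Comparing the .optimized dumps of f and f_matched (-fdump-tree-optimized)
should show whether the stores to the temporaries survive in each case.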

[Bug target/85683] [8 Regression] GCC 8 stopped using RMW (Read Modify Write) instructions on x86[_64]

2018-05-07 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85683

Alexander Monakov  changed:

   What|Removed |Added

   Keywords||missed-optimization
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2018-05-07
 CC||amonakov at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #1 from Alexander Monakov  ---
Smaller testcase:

void f(void);

void g(int *p)
{
    if (!--*p)
        f();
}

On gcc-7.3 this is optimized by the peephole2 pass so it doesn't really help
with register pressure (combine pass seems more suitable for that); don't know
why the peephole doesn't trigger on gcc-8.

[Bug rtl-optimization/85673] ICE in create_pre_exit, at mode-switching.c:451

2018-05-06 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85673

Alexander Monakov  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2018-05-06
 CC||abel at gcc dot gnu.org,
   ||amonakov at gcc dot gnu.org
 Blocks||84301
   Assignee|unassigned at gcc dot gnu.org  |amonakov at gcc dot 
gnu.org
 Ever confirmed|0   |1

--- Comment #1 from Alexander Monakov  ---
PR 84301 is related (not backported to 6/7 so failure is expected there).

The fix was incomplete because 'cant_move' insn flag only restricts inter-block
motion (argh!), so sel-sched is still free to move %eax assignment up. Oops.

Perhaps we can additionally set sched_group_p in add_branch_dependences for
pre-RA sel-sched to ensure insns stay at the end of basic block; after reload
that would also pin mutex_p cond-exec insns to BB end as well.

(apropos: flag_sched_group_heuristic should be removed, the way it's used in
rank_for_schedule is not a heuristic, but a correctness requirement)

Overall I'm concerned that mode-switching is making unreasonable assumptions:
if it really needs some insns to stay in sequence just before function
return, they should be arranged to have a barrier insn or SCHED_GROUP_P from
the beginning. So maybe it's better to adjust mode-switching instead, but
unfortunately it's not quite obvious how it works :)


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84301
[Bug 84301] [6/7 Regression] ICE in create_pre_exit, at mode-switching.c:451

[Bug rtl-optimization/84842] [7/8/9 Regression] ICE in verify_target_availability, at sel-sched.c:1569

2018-04-30 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84842

--- Comment #14 from Alexander Monakov  ---
Thanks. I think the root cause on this x86_64 testcase is different.

Arseny, in the meantime if by chance you have another x86_64 variant of this
failure that doesn't require -funroll-all-loops, please post it as well.

[Bug inline-asm/85546] GCC assumes volatile asm block returns same value in loop

2018-04-27 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85546

Alexander Monakov  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 CC||amonakov at gcc dot gnu.org
 Resolution|--- |INVALID

--- Comment #3 from Alexander Monakov  ---
I'm not sure Richard is correct about the definition of volatile asms: similar
to reads of volatile objects, volatile asms can produce different output on
each invocation (iow they are not pure/const).

In any case the inline asm in io() is missing clobbers for rcx, r11 and memory,
which makes the bug invalid.
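
As an illustration of the clobber list in question, here is a sketch of an
x86-64 Linux 'syscall' wrapper. The io() function from the report is not
reproduced here: the function name, the choice of getpid (syscall number 39 on
x86-64 Linux) and the non-x86 fallback are all assumptions made so the example
is self-contained.

```c
#include <unistd.h>

/* Hypothetical wrapper, not the io() from the report. */
static long getpid_via_syscall(void)
{
#if defined(__x86_64__) && defined(__linux__)
    long ret;
    /* The 'syscall' instruction overwrites rcx and r11, and the kernel may
       read or write memory, so all three must be listed as clobbers. */
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "0"(39L)            /* getpid on x86-64 Linux */
                      : "rcx", "r11", "memory");
    return ret;
#else
    return (long)getpid();   /* fallback so the sketch builds elsewhere */
#endif
}
```

Without the clobbers the compiler is free to keep live values in rcx or r11
across the asm, or to cache memory contents around it.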

[Bug rtl-optimization/85099] [meta-bug] selective scheduling issues

2018-04-24 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85099
Bug 85099 depends on bug 85423, which changed state.

Bug 85423 Summary: [8 Regression] ICE in code_motion_process_successors, at 
sel-sched.c:6403
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85423

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

[Bug rtl-optimization/80463] [6/7 Regression] ICE with -fselective-scheduling2 and -fvar-tracking-assignments

2018-04-24 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80463
Bug 80463 depends on bug 85423, which changed state.

Bug 85423 Summary: [8 Regression] ICE in code_motion_process_successors, at 
sel-sched.c:6403
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85423

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

[Bug rtl-optimization/85423] [8 Regression] ICE in code_motion_process_successors, at sel-sched.c:6403

2018-04-24 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85423

Alexander Monakov  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Blocks||80463
 Resolution|--- |FIXED

--- Comment #7 from Alexander Monakov  ---
Thanks. I've added one more "Blocks" edge to indicate that this should be taken
when backporting the earlier patch.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80463
[Bug 80463] [6/7 Regression] ICE with -fselective-scheduling2 and
-fvar-tracking-assignments

[Bug rtl-optimization/79985] ICE in code_motion_path_driver, at sel-sched.c:6580

2018-04-21 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79985

--- Comment #8 from Alexander Monakov  ---
Unfortunately the above doesn't fully address the issue, as schedulers and
other passes still have no idea that DF makes those assumptions and will allow
reordering of asms:

register int r asm("ebx");

int f(int x, int y)
{
    int t = x/y/r;
    asm("#asm");
    return t-x;
}

_Z1fii:
#APP
#asm
#NO_APP
        movl    %edi, %eax
        cltd
        idivl   %esi
        cltd
        idivl   %ebx
        subl    %edi, %eax
        ret

See how the asm is first, even though from DF point of view it should remain
after the read of %ebx for division by r; here cprop_hardreg makes the
offending propagation.

So currently GCC has a rather split personality when it comes to deps w.r.t.
global reg vars in asm statements. The documentation should spell out the
intended behavior. My suggestion is to require that references are exposed to
the compiler via constraints, which would allow removing the ad-hoc treatment
in DF. I intend to do that early in stage 1.

[Bug rtl-optimization/79985] ICE in code_motion_path_driver, at sel-sched.c:6580

2018-04-20 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79985

--- Comment #7 from Alexander Monakov  ---
Or rather like this:

diff --git a/gcc/df-scan.c b/gcc/df-scan.c
index 95e1e0df2d5..732705c0385 100644
--- a/gcc/df-scan.c
+++ b/gcc/df-scan.c
@@ -3207,11 +3207,11 @@ df_insn_refs_collect (struct df_collection_rec
*collection_rec,
   if (CALL_P (insn_info->insn))
 df_get_call_refs (collection_rec, bb, insn_info, flags);

-  if (asm_noperands (PATTERN (insn_info->insn)) >= 0)
+  if (GET_CODE (PATTERN (insn_info->insn)) == ASM_INPUT)
 for (unsigned i = 0; i < FIRST_PSEUDO_REGISTER; i++)
   if (global_regs[i])
{
- /* As with calls, asm statements reference all global regs. */
+ /* As with calls, basic asms reference all global regs. */
  df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i],
 NULL, bb, insn_info, DF_REF_REG_USE, flags);
  df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i],

[Bug rtl-optimization/84842] [7/8 Regression] ICE in verify_target_availability, at sel-sched.c:1569

2018-04-17 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84842

Alexander Monakov  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2018-04-17
 Ever confirmed|0   |1

--- Comment #11 from Alexander Monakov  ---
Thanks, I managed to reproduce it. The unusual thing here is hardreg 63 being
considered call-clobbered in its reg_raw_mode=TImode but not narrower modes. We
have

(insn 97 29 98 4 (set (reg:DI 63 31 [160])
(unspec:DI [
(reg:SI 29 29)
] UNSPEC_LFIWAX)) "pr84842.i":5 344 {lfiwax}
 (expr_list:REG_DEAD (reg:SI 29 29)
(nil)))

and sched-deps noting a REG_DEP_OUTPUT dependence on regno 63 against a
preceding call insn according to rs6000_hard_regno_call_part_clobbered
(regno=63, mode=E_TImode). I assume what the backend is conveying there is that
only the low part of the register will be preserved by callees.

However, when we move up the instruction we don't have a dependence. The LHS is
DImode, so that seems correct as well: sched-deps had a more conservative
answer because its dependence lists are not separated per mode.

Andrey, does the above make sense? Can the assert be relaxed?

[Bug tree-optimization/85416] Massive performance regression when switching on "-march=native"

2018-04-17 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85416

Alexander Monakov  changed:

   What|Removed |Added

 Status|WAITING |RESOLVED
 Resolution|--- |INVALID

--- Comment #14 from Alexander Monakov  ---
Ah, the linked report actually says very clearly that fixes landed in Glibc
2.25, so I'll close this bug: nothing to do on GCC side about this.

[Bug tree-optimization/85416] Massive performance regression when switching on "-march=native"

2018-04-17 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85416

--- Comment #13 from Alexander Monakov  ---
This is most likely a variant of 

  https://bugzilla.redhat.com/show_bug.cgi?id=1421121

so hitting this bug requires a specific CPU model.

It looks as if SSE-AVX transition penalties appear when switching between
pure-SSE sinf code and VEX-prefixed SSE code in the main program after the
ld.so runtime resolver affects AVX state tracking in the CPU.

I'm not sure if any patches have landed on Glibc side to avoid this, but in any
case this should be re-reported against Glibc if needed, GCC cannot improve the
situation.

An easy workaround would be to pass -Wl,-z,now when linking.

[Bug tree-optimization/85416] Massive performance regression when switching on "-march=native"

2018-04-17 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85416

Alexander Monakov  changed:

   What|Removed |Added

 CC||amonakov at gcc dot gnu.org

--- Comment #8 from Alexander Monakov  ---
Can you also run the tests under 'perf stat'?

[Bug rtl-optimization/84842] ICE in verify_target_availability, at sel-sched.c:1569

2018-04-16 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84842

--- Comment #8 from Alexander Monakov  ---
Or as Jakub (thanks!) noted on IRC, gcc/auto-host.h from the build tree may be
also helpful and simpler for us to work with.

[Bug rtl-optimization/84842] ICE in verify_target_availability, at sel-sched.c:1569

2018-04-16 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84842

--- Comment #7 from Alexander Monakov  ---
The testcase is not easily reproducible because the rs6000 backend has some
implicit dependencies on capabilities of configure-time binutils, and they are
not visible as 'gcc -v' flags.

So, to reproduce this we need to know the version and configure flags of cross
binutils that were found and checked by gcc's configure.

[Bug rtl-optimization/84842] ICE in verify_target_availability, at sel-sched.c:1569

2018-04-13 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84842

Alexander Monakov  changed:

   What|Removed |Added

 CC||amonakov at gcc dot gnu.org

--- Comment #4 from Alexander Monakov  ---
Can you please share tree and rtl dumps for the nice testcase in comment #3 by
re-running it with -fdump-tree-all -fdump-rtl-all and attaching a tar.gz with
those? I could not reproduce it either, so having the dumps might help us see
what's different on our side.

(and an additional archive for a non-failing run without
-fselective-scheduling2 might be helpful too)

[Bug rtl-optimization/79985] ICE in code_motion_path_driver, at sel-sched.c:6580

2018-04-13 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79985

Alexander Monakov  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |amonakov at gcc dot 
gnu.org

--- Comment #6 from Alexander Monakov  ---
Candidate patch for gcc-9 stage 1:

diff --git a/gcc/df-scan.c b/gcc/df-scan.c
index 95e1e0df2d5..4708fc328c6 100644
--- a/gcc/df-scan.c
+++ b/gcc/df-scan.c
@@ -3207,7 +3207,8 @@ df_insn_refs_collect (struct df_collection_rec
*collection_rec,
   if (CALL_P (insn_info->insn))
 df_get_call_refs (collection_rec, bb, insn_info, flags);

-  if (asm_noperands (PATTERN (insn_info->insn)) >= 0)
+  if (asm_noperands (PATTERN (insn_info->insn)) >= 0
+  && volatile_insn_p (PATTERN (insn_info->insn)))
 for (unsigned i = 0; i < FIRST_PSEUDO_REGISTER; i++)
   if (global_regs[i])
{

[Bug rtl-optimization/84659] [6/7 Regression] ICE: Segmentation fault (stack overflow in bb_note) w/ selective scheduling

2018-04-12 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84659
Bug 84659 depends on bug 85354, which changed state.

Bug 85354 Summary: [8 regression] ICE with gcc.dg/graphite/pr84872.c starting 
with r259313
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85354

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

[Bug rtl-optimization/85354] [8 regression] ICE with gcc.dg/graphite/pr84872.c starting with r259313

2018-04-12 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85354

Alexander Monakov  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Blocks||84659
 Resolution|--- |FIXED

--- Comment #5 from Alexander Monakov  ---
Fixed.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84659
[Bug 84659] [6/7 Regression] ICE: Segmentation fault (stack overflow in
bb_note) w/ selective scheduling

[Bug rtl-optimization/85354] [8 regression] ICE with gcc.dg/graphite/pr84872.c starting with r259313

2018-04-12 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85354

--- Comment #4 from Alexander Monakov  ---
Author: amonakov
Date: Thu Apr 12 15:40:44 2018
New Revision: 259348

URL: https://gcc.gnu.org/viewcvs?rev=259348&root=gcc&view=rev
Log:
sel-sched: move cleanup_cfg before calculate_dominance_info (PR 85354)

PR rtl-optimization/85354
* sel-sched-ir.c (sel_init_pipelining): Move cfg_cleanup call...
* sel-sched.c (sel_global_init): ... here.



Modified:
trunk/gcc/ChangeLog
trunk/gcc/sel-sched-ir.c
trunk/gcc/sel-sched.c

[Bug rtl-optimization/85354] [8 regression] ICE with gcc.dg/graphite/pr84872.c starting with r259313

2018-04-11 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85354

Alexander Monakov  changed:

   What|Removed |Added

 CC||abel at gcc dot gnu.org
   Assignee|unassigned at gcc dot gnu.org  |amonakov at gcc dot 
gnu.org

--- Comment #1 from Alexander Monakov  ---
Thanks. Judging from the backtrace, we shouldn't call cleanup_cfg after
dominators are computed: it will invalidate dominators without freeing or
fixing them. I wonder if that's "by design".

A simple way out is to run cleanup_cfg early enough. I'll bootstrap/regtest the
following on gcc112:

diff --git a/gcc/sel-sched-ir.c b/gcc/sel-sched-ir.c
index 50a7daafba6..ee970522890 100644
--- a/gcc/sel-sched-ir.c
+++ b/gcc/sel-sched-ir.c
@@ -30,7 +30,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "cfgrtl.h"
 #include "cfganal.h"
 #include "cfgbuild.h"
-#include "cfgcleanup.h"
 #include "insn-config.h"
 #include "insn-attr.h"
 #include "recog.h"
@@ -6122,9 +6121,6 @@ make_regions_from_loop_nest (struct loop *loop)
 void
 sel_init_pipelining (void)
 {
-  /* Remove empty blocks: their presence can break assumptions elsewhere,
- e.g. the logic to invoke update_liveness_on_insn in sel_region_init.  */
-  cleanup_cfg (0);
   /* Collect loop information to be used in outer loops pipelining.  */
   loop_optimizer_init (LOOPS_HAVE_PREHEADERS
| LOOPS_HAVE_FALLTHRU_PREHEADERS
diff --git a/gcc/sel-sched.c b/gcc/sel-sched.c
index cd29df35666..59762964c6e 100644
--- a/gcc/sel-sched.c
+++ b/gcc/sel-sched.c
@@ -28,6 +28,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tm_p.h"
 #include "regs.h"
 #include "cfgbuild.h"
+#include "cfgcleanup.h"
 #include "insn-config.h"
 #include "insn-attr.h"
 #include "params.h"
@@ -7661,6 +7662,10 @@ sel_sched_region (int rgn)
 static void
 sel_global_init (void)
 {
+  /* Remove empty blocks: their presence can break assumptions elsewhere,
+ e.g. the logic to invoke update_liveness_on_insn in sel_region_init.  */
+  cleanup_cfg (0);
+
   calculate_dominance_info (CDI_DOMINATORS);
   alloc_sched_pools ();

[Bug rtl-optimization/84566] error: qsort comparator not anti-commutative: -1, -1 on aarch64 in sched1

2018-04-11 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84566

Alexander Monakov  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #5 from Alexander Monakov  ---
Fixed.

[Bug middle-end/82407] [meta-bug] qsort_chk fallout tracking

2018-04-11 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82407
Bug 82407 depends on bug 84566, which changed state.

Bug 84566 Summary: error: qsort comparator not anti-commutative: -1, -1 on 
aarch64 in sched1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84566

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

[Bug rtl-optimization/85099] [meta-bug] selective scheduling issues

2018-04-11 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85099
Bug 85099 depends on bug 84566, which changed state.

Bug 84566 Summary: error: qsort comparator not anti-commutative: -1, -1 on 
aarch64 in sched1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84566

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

[Bug target/84301] [6/7 Regression] ICE in create_pre_exit, at mode-switching.c:451

2018-04-11 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84301

Alexander Monakov  changed:

   What|Removed |Added

  Known to work||8.0
   Assignee|unassigned at gcc dot gnu.org  |amonakov at gcc dot 
gnu.org
Summary|[6/7/8 Regression] ICE in   |[6/7 Regression] ICE in
   |create_pre_exit, at |create_pre_exit, at
   |mode-switching.c:451|mode-switching.c:451
  Known to fail|8.0 |

--- Comment #6 from Alexander Monakov  ---
Fixed on the trunk.

[Bug rtl-optimization/84566] error: qsort comparator not anti-commutative: -1, -1 on aarch64 in sched1

2018-04-11 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84566

--- Comment #4 from Alexander Monakov  ---
Author: amonakov
Date: Wed Apr 11 14:36:04 2018
New Revision: 259322

URL: https://gcc.gnu.org/viewcvs?rev=259322&root=gcc&view=rev
Log:
sched-deps: respect deps->readonly in macro-fusion (PR 84566)

PR rtl-optimization/84566
* sched-deps.c (sched_analyze_insn): Check deps->readonly when invoking
sched_macro_fuse_insns.



Modified:
trunk/gcc/ChangeLog
trunk/gcc/sched-deps.c

[Bug target/84301] [6/7/8 Regression] ICE in create_pre_exit, at mode-switching.c:451

2018-04-11 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84301

--- Comment #5 from Alexander Monakov  ---
Author: amonakov
Date: Wed Apr 11 14:32:32 2018
New Revision: 259321

URL: https://gcc.gnu.org/viewcvs?rev=259321&root=gcc&view=rev
Log:
sched-rgn: run add_branch_dependencies for sel-sched (PR 84301)

PR target/84301
* sched-rgn.c (add_branch_dependences): Move sel_sched_p check here...
(compute_block_dependences): ... from here.

testsuite/
* gcc.target/i386/pr84301.c: New test.


Added:
trunk/gcc/testsuite/gcc.target/i386/pr84301.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/sched-rgn.c
trunk/gcc/testsuite/ChangeLog

[Bug rtl-optimization/84659] [6/7 Regression] ICE: Segmentation fault (stack overflow in bb_note) w/ selective scheduling

2018-04-11 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84659

Alexander Monakov  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |amonakov at gcc dot 
gnu.org
Summary|[6/7/8 Regression] ICE: |[6/7 Regression] ICE:
   |Segmentation fault (stack   |Segmentation fault (stack
   |overflow in bb_note) w/ |overflow in bb_note) w/
   |selective scheduling|selective scheduling

--- Comment #3 from Alexander Monakov  ---
Fixed on the trunk. Unfortunately the Changelog entry had a typo in the PR#:

Author: amonakov
Date: Wed Apr 11 10:40:07 2018
New Revision: 259313

URL: https://gcc.gnu.org/viewcvs?rev=259313&root=gcc&view=rev
Log:
sel-sched: run cleanup_cfg just before loop_optimizer_init (PR 84659)

  PR rtl-optimization/84659
  * sel-sched-ir.c (sel_init_pipelining): Invoke cleanup_cfg.

testsuite/
  * gcc.dg/pr84659.c: New test.


Added:
trunk/gcc/testsuite/gcc.dg/pr84659.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/sel-sched-ir.c
trunk/gcc/testsuite/ChangeLog

[Bug rtl-optimization/84659] [6/7 Regression] ICE: Segmentation fault (stack overflow in bb_note) w/ selective scheduling

2018-04-11 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84659

--- Comment #4 from Alexander Monakov  ---
Author: amonakov
Date: Wed Apr 11 10:48:42 2018
New Revision: 259314

URL: https://gcc.gnu.org/viewcvs?rev=259314&root=gcc&view=rev
Log:
fix PR 84659 references in ChangeLog files

Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog

[Bug tree-optimization/85275] New: copyheader peels off almost the entire iteration

2018-04-07 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85275

Bug ID: 85275
   Summary: copyheader peels off almost the entire iteration
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: amonakov at gcc dot gnu.org
  Target Milestone: ---

I expected predcom to eliminate one of the loads in this loop at -O3:

int is_sorted(int *a, int n)
{
  for (int i = 0; i < n - 1; i++)
    if (a[i] > a[i + 1])
      return 0;
  return 1;
}

Unfortunately, predcom bails out since the loads it sees are not
always-executed. Ideally loop header copying would make this a suitable
do-while loop, but in this case it duplicates too much:


;; Loop 1
;;  header 5, latch 4
;;  depth 1, outer 0
;;  nodes: 5 4 3
;; 2 succs { 5 }
;; 3 succs { 6 4 }
;; 4 succs { 5 }
;; 5 succs { 3 6 }
;; 6 succs { 1 }
Analyzing loop 1
Loop 1 is not do-while loop: latch is not empty.
Will duplicate bb 5
Will duplicate bb 3
  Not duplicating bb 4: it is single succ.
Duplicating header of the loop 1 up to edge 3->4, 12 insns.
[...]
   [local count: 114863532]:
  _17 = n_12(D) + -1;
  if (_17 > 0)
goto ; [94.50%]
  else
goto ; [5.50%]

   [local count: 108546038]:
  _18 = 0;
  _19 = _18 * 4;
  _20 = a_13(D) + _19;
  _21 = *_20;
  _22 = _18 + 1;
  _23 = _22 * 4;
  _24 = a_13(D) + _23;
  _25 = *_24;
  if (_21 > _25)
goto ; [5.50%]
  else
goto ; [94.50%]

   [local count: 906139986]:
  _1 = (long unsigned int) i_15;
  _2 = _1 * 4;
  _3 = a_13(D) + _2;
  _4 = *_3;
  _5 = _1 + 1;
  _6 = _5 * 4;
  _7 = a_13(D) + _6;
  _8 = *_7;
  if (_4 > _8)
goto ; [5.50%]
  else
goto ; [94.50%]

   [local count: 958878293]:
  # i_26 = PHI <0(3), i_15(4)>
  i_15 = i_26 + 1;
  _9 = n_12(D) + -1;
  if (_9 > i_15)
goto ; [94.50%]
  else
goto ; [5.50%]



(throttling it down with --param max-loop-header-insns=5 gives the expected
optimization)
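
For reference, here is a hand-written sketch of the loop shape I would expect predcom to reach once the loads are always-executed. The function name is_sorted_carried and the variables prev/next are mine, for illustration only; this is not compiler output.

```c
/* Hand-written equivalent of what predictive commoning could produce:
   the value loaded as a[i + 1] is carried over into the next iteration,
   so each iteration performs one load instead of two.  */
int is_sorted_carried(int *a, int n)
{
  /* Same guard as the original loop condition: nothing is loaded
     when there are fewer than two elements.  */
  if (n - 1 <= 0)
    return 1;

  int prev = a[0];
  for (int i = 0; i < n - 1; i++)
    {
      int next = a[i + 1];   /* the only load in the loop body */
      if (prev > next)
        return 0;
      prev = next;           /* reuse it as a[i] on the next iteration */
    }
  return 1;
}
```

It returns the same results as is_sorted above for any n, and never reads past a[n - 1].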

[Bug c++/85091] Compiler generates different code depending on whether -Wnonnull -Woverloaded-virtual given or not

2018-03-27 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85091

--- Comment #13 from Alexander Monakov  ---
> (in the diffs, plus-lines correspond to -Wnonnull added to command line)

No, sorry, it was the other way around. Here's the reverse diff with more
context:

   if (0)
 {
   <;
 }
-  if (0)
-{
-  <;
-}
 }

It corresponds to

if(!(!std::signbit(bourn_cast( From(0) { lmi_test::record_error();
};
if(!(std::signbit(bourn_cast(-From(0) { lmi_test::record_error();
};

in template instantiation test_floating_conversions.
Essentially, with -Wnonnull the second condition seems to be folded to truth
value.

[Bug c++/85091] Compiler generates different code depending on whether -Wnonnull -Woverloaded-virtual given or not

2018-03-27 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85091

--- Comment #12 from Alexander Monakov  ---
I can reproduce it with downloaded Debian's cc1plus, and for me -Wnonnull alone
is sufficient to cause diverging codegen. It diverges very early, in the
frontend: diff of .tu dumps starts with:

--- a/1/16795.cpp.001t.tu
+++ b/2/16795.cpp.001t.tu
@@ -110354,336 +110354,337 @@
 @56158  bind_exprtype: @27  body: @59125
 @56159  cond_exprtype: @27  op 0: @5106op 1: @59126
  op 2: @59127
-@56160  cleanup_point_expr type: @27  op 0: @59128
-@56161  convert_expr type: @27  op 0: @59129
-@56162  call_exprtype: @109 fn  : @59130   0   : @59131
- 1   : @59132
-@56163  expr_stmttype: @27  line: 732  expr: @59133
-@56164  cleanup_point_expr type: @109 op 0: @59134
+@56160  cond_exprtype: @27  op 0: @5106op 1: @59128
+ op 2: @59129
+@56161  convert_expr type: @27  op 0: @59130
+@56162  call_exprtype: @109 fn  : @59131   0   : @59132
+ 1   : @59133
+@56163  expr_stmttype: @27  line: 732  expr: @59134
+@56164  cleanup_point_expr type: @109 op 0: @59135

and .original diff has the following hunk:

@@ -17695,8 +17695,11 @@ return  = __out;
   <;
 }
-  <;
+}
 }


(in the diffs, plus-lines correspond to -Wnonnull added to command line)

[Bug c++/85091] Compiler generates different code depending on whether -Wnonnull -Woverloaded-virtual given or not

2018-03-27 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85091

Alexander Monakov  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amonakov at gcc dot gnu.org

--- Comment #8 from Alexander Monakov  ---
Vadim, can you please check if the issue is reproducible on preprocessed (-E)
input as well, and if so, attach the preprocessed testcase so people can try to
repro it without downloading Debian's MinGW headers? Thanks.

[Bug sanitizer/84761] AddressSanitizer is not compatible with glibc 2.27 on x86

2018-03-19 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84761

Alexander Monakov  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amonakov at gcc dot gnu.org

--- Comment #7 from Alexander Monakov  ---
Is it possible that a distribution would backport glob() changes together with
its symver update (without also backporting the regparm change)? In that case
the dlvsym check shown above will be wrong I think.

Would the approach with confstr query for glibc version (discussed on irc) be
less fragile?
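
A minimal sketch of what such a confstr query could look like, assuming glibc's _CS_GNU_LIBC_VERSION name and its usual "glibc MAJOR.MINOR" result string. The helper name glibc_runtime_version is hypothetical, and how the sanitizer runtime would actually use the result is not shown here.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: query the running glibc version via confstr.
   Returns 0 and fills *major / *minor on success, -1 if the query is
   unavailable (e.g. a non-glibc libc) or the string has an unexpected
   shape.  */
int glibc_runtime_version(int *major, int *minor)
{
#ifdef _CS_GNU_LIBC_VERSION
  char buf[64];
  size_t len = confstr(_CS_GNU_LIBC_VERSION, buf, sizeof buf);

  /* 0 means the name is not supported; a value larger than the buffer
     means the result was truncated.  */
  if (len == 0 || len > sizeof buf)
    return -1;
  if (sscanf(buf, "glibc %d.%d", major, minor) != 2)
    return -1;
  return 0;
#else
  (void) major;
  (void) minor;
  return -1;
#endif
}
```

Unlike a dlvsym probe, this depends only on the reported version string, not on which symbol versions a distribution chose to backport.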

[Bug inline-asm/84861] -flto with asm() optimizes too much

2018-03-14 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84861

Alexander Monakov  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amonakov at gcc dot gnu.org

--- Comment #3 from Alexander Monakov  ---
PR 57703 seems to be the "canonical instance" for the toplevel-asms-with-lto
issue.

[Bug middle-end/84681] New: tree-ter moving code too much

2018-03-02 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84681

Bug ID: 84681
   Summary: tree-ter moving code too much
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: amonakov at gcc dot gnu.org
  Target Milestone: ---
Target: x86_64

The following code (derived from a hot loop in a Huffman encoder, reported by
Fabian Giesen) suffers too much from TER activity on x86-64: TER lifts
loads+zero_extends to the BB head, sinking the variable-length shifts and
increasing register pressure badly.

Not being very familiar with TER, I think it would be good to understand why
loads are lifted all the way up to BB head like that. That's probably not
supposed to happen (and may be fixable without a TER overhaul?)

unsigned long long f(unsigned char *from,
                     unsigned char *from_end,
                     unsigned long long *codes,
                     unsigned char *lens)
{
    unsigned char sym0, sym1, sym2;
    unsigned long long bits0=0, bits1=0, bits2=0;
    unsigned char count0=0, count1=0, count2=0;
    do {
        sym0 = *from++; bits0 |= codes[sym0] << count0; count0 += lens[sym0];
        sym1 = *from++; bits1 |= codes[sym1] << count1; count1 += lens[sym1];
        sym2 = *from++; bits2 |= codes[sym2] << count2; count2 += lens[sym2];
        sym0 = *from++; bits0 |= codes[sym0] << count0; count0 += lens[sym0];
        sym1 = *from++; bits1 |= codes[sym1] << count1; count1 += lens[sym1];
        sym2 = *from++; bits2 |= codes[sym2] << count2; count2 += lens[sym2];
        sym0 = *from++; bits0 |= codes[sym0] << count0; count0 += lens[sym0];
        sym1 = *from++; bits1 |= codes[sym1] << count1; count1 += lens[sym1];
        sym2 = *from++; bits2 |= codes[sym2] << count2; count2 += lens[sym2];
        sym0 = *from++; bits0 |= codes[sym0] << count0; count0 += lens[sym0];
        sym1 = *from++; bits1 |= codes[sym1] << count1; count1 += lens[sym1];
        sym2 = *from++; bits2 |= codes[sym2] << count2; count2 += lens[sym2];
        sym0 = *from++; bits0 |= codes[sym0] << count0; count0 += lens[sym0];
        sym1 = *from++; bits1 |= codes[sym1] << count1; count1 += lens[sym1];
        sym2 = *from++; bits2 |= codes[sym2] << count2; count2 += lens[sym2];
    } while(from != from_end);
    return bits0+bits1+bits2;
}

[Bug tree-optimization/84562] -faggressive-loop-optimizations makes decisions based on weak data structures

2018-02-27 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84562

Alexander Monakov  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amonakov at gcc dot gnu.org

--- Comment #3 from Alexander Monakov  ---
It's not just -faggressive-loop-optimizations, it seems that constructors of
weak globals are available for folding, and I really doubt that's actually
intended; after all, GCC does always consider weak function interposable, so
why not objects? Compare:

__attribute__((weak)) const int x=0; int f(){return x==0;}

f:
        movl    $1, %eax
        ret

vs.

__attribute__((weak)) int x(void){return 0;} int f(){return x()==0;}

f:
        subq    $8, %rsp
        call    x
        testl   %eax, %eax
        sete    %al
        movzbl  %al, %eax
        popq    %rdx
        ret

[Bug rtl-optimization/84566] error: qsort comparator not anti-commutative: -1, -1 on aarch64 in sched1

2018-02-26 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84566

Alexander Monakov  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2018-02-26
                 CC|                            |abel at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #3 from Alexander Monakov  ---
Confirmed. Our comparator breaks here:

  /* Prefer SCHED_GROUP_P insns to any others.  */
  if (SCHED_GROUP_P (tmp_insn) != SCHED_GROUP_P (tmp2_insn))
    {
      if (VINSN_UNIQUE_P (tmp_vinsn) && VINSN_UNIQUE_P (tmp2_vinsn))
        return SCHED_GROUP_P (tmp2_insn) ? 1 : -1;

      /* Now uniqueness means SCHED_GROUP_P is set, because schedule groups
         cannot be cloned.  */
      if (VINSN_UNIQUE_P (tmp2_vinsn))
        return 1;
      return -1;
    }

when we have two non-unique insns such that one is in a sched group. That is
not supposed to happen actually, since SCHED_GROUP_P should imply
VINSN_UNIQUE_P. This invariant is broken when sched_macro_fuse_insns sets
SCHED_GROUP_P without looking at deps->readonly.

So while we could get rid of the issue by rewriting the problematic sel-sched
code in terms of SCHED_GROUP_P only, lack of deps->readonly check for
macro-fusion seems like a bigger issue and should be fixed too.
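
To make the failure mode concrete, here is a toy model of the branch quoted above. It is a sketch, not sel-sched code: struct toy_insn and all function names are hypothetical, and the two flags stand in for SCHED_GROUP_P and VINSN_UNIQUE_P.

```c
/* Toy stand-in for an insn: only the two flags the comparator looks at.  */
struct toy_insn
{
  int sched_group;   /* models SCHED_GROUP_P */
  int unique;        /* models VINSN_UNIQUE_P */
};

/* Same decision structure as the quoted branch.  */
int cmp_as_quoted(const struct toy_insn *a, const struct toy_insn *b)
{
  if (a->sched_group != b->sched_group)
    {
      if (a->unique && b->unique)
        return b->sched_group ? 1 : -1;
      if (b->unique)
        return 1;
      return -1;
    }
  return 0;
}

/* The same preference expressed in terms of the group flag only.  */
int cmp_group_only(const struct toy_insn *a, const struct toy_insn *b)
{
  if (a->sched_group != b->sched_group)
    return b->sched_group ? 1 : -1;
  return 0;
}

/* A qsort comparator must satisfy cmp(a, b) == -cmp(b, a).  */
int is_anti_commutative(int (*cmp)(const struct toy_insn *,
                                   const struct toy_insn *),
                        const struct toy_insn *a, const struct toy_insn *b)
{
  return cmp(a, b) == -cmp(b, a);
}
```

With two non-unique insns of which exactly one is in a sched group, e.g. {1, 0} and {0, 0}, cmp_as_quoted falls through to the final return -1 in both argument orders, which matches the "-1, -1" in the error message; cmp_group_only gives -1 and 1.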

[Bug rtl-optimization/84566] error: qsort comparator not anti-commutative: -1, -1 on aarch64 in sched1

2018-02-26 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84566

--- Comment #2 from Alexander Monakov  ---
Bah, built a wrong branch, not the trunk. I'll recheck later, sorry for the
noise.

[Bug rtl-optimization/84566] error: qsort comparator not anti-commutative: -1, -1 on aarch64 in sched1

2018-02-26 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84566

--- Comment #1 from Alexander Monakov  ---
Sorry, I cannot reproduce this. I've built a cross-compiler from today's trunk
via 'configure --target aarch64-linux-gnu && make all-gcc' (i.e. just to
cc1plus, no binutils etc.) and it doesn't abort.

If possible please add 'g++ -v' output, svn revision, and any other info that
can help me reproduce the issue.

[Bug target/84301] [6/7/8 Regression] ICE in create_pre_exit, at mode-switching.c:451

2018-02-09 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84301

--- Comment #4 from Alexander Monakov  ---
Moreover, without --param selsched-max-lookahead=2 sel-sched moves both the
assignment and the use into the middle of BB 2, breaking the assumption in
mode-switching that the retval use is the last insn:

  /* If this function returns a value at the end, we have to
     insert the final mode switch before the return value copy
     to its hard register.  */
  if (EDGE_COUNT (EXIT_BLOCK_PTR_FOR_FN (cfun)->preds) == 1
      && NONJUMP_INSN_P ((last_insn = BB_END (src_bb)))
      && GET_CODE (PATTERN (last_insn)) == USE
      && GET_CODE ((ret_reg = XEXP (PATTERN (last_insn), 0))) == REG)

(independently of max-pending-list-length being 0 or not).

It seems a bit surprising that mode-switching needs to treat return value
specially, but more importantly, are the restrictions on return value register
set/use placement written down somewhere?

I don't see any explicit dependencies or barriers either, so isn't this like a
repeat of cc0 situation? What are other (dozens of) RTL passes doing to avoid
disturbing the required order?

Looking via gdb, apparently what pins those uses/clobbers to BB end for
haifa-sched is:

  /* Selective scheduling handles control dependencies by itself.  */
  if (!sel_sched_p ())
    add_branch_dependences (head, tail);

but the function doesn't do what it says on the tin:

/* Add dependences so that branches are scheduled to run last in their
   block.  */
static void
add_branch_dependences (rtx_insn *head, rtx_insn *tail)
{
  rtx_insn *insn, *last;

  /* For all branches, calls, uses, clobbers, cc0 setters, and instructions
     that can throw exceptions, force them to remain in order at the end of
     the block by adding dependencies and giving the last a high priority.
     There may be notes present, and prev_head may also be a note.

[Bug c++/84191] Compiler ICEs when trying to resolve impossible arithmetic operations

2018-02-05 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84191

Alexander Monakov  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amonakov at gcc dot gnu.org

--- Comment #2 from Alexander Monakov  ---
Testcase needs -march=znver1 for builtins to be available (comment #0 shows
-march=native which is unfortunately ambiguous).

[Bug c/70952] Missing warning for likely-erroneous octal escapes in string literals

2018-01-29 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70952

--- Comment #7 from Alexander Monakov  ---
Code in comment #0 is also valid, it's just rather questionable (the octal
literal is \00) and most likely unintended (or intentionally misleading).

[Bug c/70952] Missing warning for likely-erroneous octal escapes in string literals

2018-01-29 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70952

Alexander Monakov  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|DUPLICATE                   |---

--- Comment #5 from Alexander Monakov  ---
No, it's not a dup? Invalid octal literals outside of strings are already
properly diagnosed, so the other bug talks about warning about them _as a
matter of style_. This bug is about confusing use of octal literals in string
constants. Compare:

char c=008;

error: invalid digit "8" in octal constant

char c[]="\008";

[silently accepted with -Wall -Wextra, emits a string literal of size 3]
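
The size of 3 can be checked directly. A small sketch follows; the names octal_then_digit and octal_then_digit_size are only for illustration.

```c
#include <stddef.h>

/* "\008" is parsed as the octal escape \00 followed by the ordinary
   character '8', plus the implicit terminating NUL.  */
const char octal_then_digit[] = "\008";

/* Size of the array including the terminator.  */
size_t octal_then_digit_size(void)
{
  return sizeof octal_then_digit;
}
```

So the array holds the bytes 0, '8', 0, and a strlen on it gives 0, which is unlikely to be what the author of such a literal had in mind.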

[Bug gcov-profile/84107] New: indirect call profiling broken with multiple DSOs

2018-01-29 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84107

Bug ID: 84107
   Summary: indirect call profiling broken with multiple DSOs
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Keywords: visibility, wrong-code
  Severity: normal
  Priority: P3
 Component: gcov-profile
  Assignee: unassigned at gcc dot gnu.org
  Reporter: amonakov at gcc dot gnu.org
CC: marxin at gcc dot gnu.org
  Target Milestone: ---

Created attachment 43272
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43272&action=edit
testcase archive

(Marxin, on IRC you've requested this bug to be filed; enjoy!)

The finely crafted testcase in the attachment segfaults with null pointer
dereference in __gcov_indirect_call_profiler_v2.

In general libgcov should have "hidden" visibility on small symbols that have
no need to inter-operate between different shared objects and can be freely
duplicated in user-built shared libraries (thus indirect profiling symbols
probably all miss the visibility annotation).

Large symbols and symbols that must exist in exactly one instance in the
running program probably should be a part of (nonexistent) libgcov.so.0.

[Bug rtl-optimization/83913] [6/7/8 Regression] Compile time and memory hog w/ selective scheduling

2018-01-17 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83913

--- Comment #2 from Alexander Monakov  ---
Thanks. While I could not find why we blow up with Haswell tuning but not say
Sandybridge, the main problem is that with all those -fno-... flags we have a
few insns of the form rK = rN where rN is loop-invariant and rK is unused, so
the insns are movable anywhere, including across the loop backedge (since
pipelining is enabled). We try to fill schedule holes (caused by long-latency
integer division insns) by repeatedly pipelining them. Eventually sched_times
cut off should prevent that, but it doesn't grow as intended because
bookkeeping copies get sched_times 0, and expr merging takes the minimum of two
sched_times.
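
A toy model of why the cutoff never triggers, assuming only what is described above: the counter goes up by one each time the expression is pipelined, and merging with a bookkeeping copy (counter 0) takes the minimum. This is not sel-sched code; rounds_until_cutoff and its parameters are hypothetical.

```c
/* Merging two expressions keeps the smaller of the two counters.  */
static int min_int(int a, int b)
{
  return a < b ? a : b;
}

/* Returns the round in which the counter reaches CUTOFF, or -1 if it
   has not done so after MAX_ROUNDS rounds.  */
int rounds_until_cutoff(int cutoff, int merge_with_bookkeeping_copy,
                        int max_rounds)
{
  int sched_times = 0;
  for (int round = 1; round <= max_rounds; round++)
    {
      sched_times++;   /* the expression was pipelined once more */
      if (merge_with_bookkeeping_copy)
        /* The bookkeeping copy starts at 0, so the minimum resets
           the merged counter.  */
        sched_times = min_int(sched_times, 0);
      if (sched_times >= cutoff)
        return round;
    }
  return -1;
}
```

Without the merge the cutoff is reached after exactly cutoff rounds; with it the counter stays at 0 no matter how many rounds are allowed.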
