[Bug testsuite/82951] gcc.c-torture/execute/20040409-1.c undefined behavior

2017-11-11 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82951

--- Comment #1 from Marc Glisse  ---
Or I should just add -fwrapv since those tests were added to test an RTL
transformation ( https://gcc.gnu.org/ml/gcc-patches/2004-04/msg00615.html ).

[Bug testsuite/82951] New: gcc.c-torture/execute/20040409-1.c undefined behavior

2017-11-11 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82951

Bug ID: 82951
   Summary: gcc.c-torture/execute/20040409-1.c undefined behavior
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: testsuite
  Assignee: unassigned at gcc dot gnu.org
  Reporter: glisse at gcc dot gnu.org
  Target Milestone: ---

While testing a VRP patch, I had failures for
gcc.c-torture/execute/20040409-[1-3].c. If I run them with
-fsanitize=undefined, I get

20040409-1.c:27:12: runtime error: signed integer overflow: 0 - -2147483648
cannot be represented in type 'int'
20040409-1.c:17:12: runtime error: signed integer overflow: -2147483648 +
-2147483648 cannot be represented in type 'int'

20040409-2.c:47:13: runtime error: signed integer overflow: 0 - -2147483648
cannot be represented in type 'int'
20040409-2.c:57:23: runtime error: signed integer overflow: 4660 - -2147483648
cannot be represented in type 'int'
20040409-2.c:27:13: runtime error: signed integer overflow: -2147483648 +
-2147483648 cannot be represented in type 'int'
20040409-2.c:37:23: runtime error: signed integer overflow: -2147478988 +
-2147483648 cannot be represented in type 'int'
20040409-2.c:111:18: runtime error: signed integer overflow: -2147483648 +
-2147478988 cannot be represented in type 'int'

20040409-3.c:27:14: runtime error: signed integer overflow: 0 - -2147483648
cannot be represented in type 'int'
20040409-3.c:17:14: runtime error: signed integer overflow: -2147483648 +
-2147483648 cannot be represented in type 'int'

Unless someone volunteers to improve the tests, I'll likely remove the
offending cases (and probably more since this is a grid and I don't want to
look for every cell) from those 3 files.

[Bug bootstrap/82948] New: [8 Regression] prefix.c:202:15: error: 'char* strncpy(char*, const char*, size_t)' destination unchanged after copying no bytes [-Werror=stringop-truncation]

2017-11-11 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82948

Bug ID: 82948
   Summary: [8 Regression] prefix.c:202:15: error: 'char*
strncpy(char*, const char*, size_t)' destination
unchanged after copying no bytes
[-Werror=stringop-truncation]
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Keywords: diagnostic
  Severity: normal
  Priority: P3
 Component: bootstrap
  Assignee: unassigned at gcc dot gnu.org
  Reporter: glisse at gcc dot gnu.org
  Target Milestone: ---

Hello,

I cannot bootstrap currently (r254649) on gcc112
(powerpc64le-unknown-linux-gnu) with --with-system-zlib --disable-nls
--enable-languages=all,obj-c++,go --enable-host-shared

/home/glisse/pristine/gcc/prefix.c: In function 'char* translate_name(char*)':
/home/glisse/pristine/gcc/prefix.c:202:15: error: 'char* strncpy(char*, const
char*, size_t)' destination unchanged after copying no bytes
[-Werror=stringop-truncation]
   strncpy (key, [1], keylen);
   ^~~
/home/glisse/pristine/gcc/prefix.c:202:15: error: 'char* strncpy(char*, const
char*, size_t)' destination unchanged after copying no bytes
[-Werror=stringop-truncation]
/home/glisse/pristine/gcc/prefix.c:202:15: error: 'char* strncpy(char*, const
char*, size_t)' destination unchanged after copying no bytes
[-Werror=stringop-truncation]
/home/glisse/pristine/gcc/prefix.c:202:15: error: 'char* strncpy(char*, const
char*, size_t)' destination unchanged after copying no bytes
[-Werror=stringop-truncation]
cc1plus: all warnings being treated as errors
make[3]: *** [prefix.o] Error 1
make[3]: *** Waiting for unfinished jobs
rm gfortran.pod fsf-funding.pod gcov.pod gpl.pod cpp.pod gfdl.pod gccgo.pod
gcc.pod gcov-dump.pod gcov-tool.pod
make[3]: Leaving directory `/home/glisse/test/pristine/build/gcc'
make[2]: *** [all-stage2-gcc] Error 2

[Bug preprocessor/82939] genmatch fills up terminal with endless printing of periods

2017-11-10 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82939

--- Comment #1 from Marc Glisse  ---
Is that during stage 1 or in a later stage?

[Bug target/82935] Unnecessary "sub rsp, 8", "call" and "add rsp, 8" instructions

2017-11-10 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82935

--- Comment #4 from Marc Glisse  ---
We keep

  *a1_2(D) = *a2_3(D);

and only at expansion time turn it into a call to memcpy, so the gimple pass
that detects tail calls doesn't have a chance to notice this case.

[Bug middle-end/82898] Aliasing knowledge is not used to replace memmove with memcpy

2017-11-08 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82898

--- Comment #1 from Marc Glisse  ---
At least in the gcc model, the type of a pointer is meaningless as long as you
do not dereference it using that type, so I am not sure what can be done here.

[Bug c++/82888] terrible code generation for initialization of POD array members vs. clang

2017-11-07 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82888

--- Comment #4 from Marc Glisse  ---
The front-end internally uses VEC_INIT_EXPR, and gimplifies it to a loop. I
believe we should end up with an empty CONSTRUCTOR instead.

[Bug middle-end/82885] memcpy does not propagate aliasing knowledge

2017-11-07 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82885

--- Comment #1 from Marc Glisse  ---
gcc (illegally) generates some calls to memcpy(p,q,n) where p and q may be the
same pointer, although they mustn't overlap in any more complicated way. That
makes such an optimization problematic (although this memcpy generation seems
to happen at expansion time, so doing the optimization earlier might be ok).

[Bug middle-end/82853] Optimize x % 3 == 0 without modulo

2017-11-06 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853

--- Comment #11 from Marc Glisse  ---
(In reply to Wilco from comment #9)
> It works for any C where (divisor*C) MOD 2^32 == 1 (or -1).

For x%3==0, i.e. z==0 for x==3*y+z with 0<=y<5556 and 0<=z<3. 
Indeed, x*0xaaab==y+z*0xaaab is in the right range precisely for z==0
and the same can be done for any odd number.
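A sketch of the 16-bit case described above (divisible_by_3_u16 is a hypothetical name). 0xaaab is the multiplicative inverse of 3 modulo 2^16 (3 * 0xaaab == 0x20001), so multiplying by it maps each multiple 3*y onto y, i.e. onto [0, 0xffff/3] == [0, 0x5555]. Because multiplication by an odd constant is a bijection modulo 2^16, every non-multiple lands above 0x5555. For a general odd divisor d, the same works with d's inverse and the bound 0xffff / d.

```c
#include <stdint.h>

/* x % 3 == 0 without a modulo, for 16-bit unsigned x.  */
int divisible_by_3_u16(uint16_t x) {
  /* Multiply in 32-bit unsigned arithmetic (a plain x * 0xaaab would be
     done in int after promotion and could overflow), then reduce mod 2^16. */
  uint16_t m = (uint16_t)((uint32_t)x * 0xaaabu);
  return m <= 0x5555u;  /* 0x5555 == 0xffff / 3 */
}
```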

> You can support any kind of comparison, it doesn't need to be with 0 (but 
> zero is the easiest).

Any ==cst will yield a range test. It is less obvious that inequalities are
transformed to a contiguous range... (try x%7<3 maybe)

> I forgot whether I made it work for signed too, but it's certainly
> possible to skip the sign handling in x % 4 == 0 even if x is signed.

4 is a completely different story, as a power of 2.

[Bug target/82858] __builtin_add_overflow() generates suboptimal code with unsigned types on x86

2017-11-06 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82858

--- Comment #4 from Marc Glisse  ---
unsigned c;
unsigned d = __builtin_add_overflow(a, b, )?-1:0;
return c|d;

gives the expected asm. Ideally phiopt would recognize a saturating add pattern,
but we have nothing to model it in gimple. We could turn it into the branchless
BIT_IOR form though.

(the problem isn't with __builtin_add_overflow but with what comes afterwards)
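A self-contained version of the branchless form, as a sketch (sat_add is a hypothetical name). __builtin_add_overflow is the GCC/Clang builtin: it stores the wrapped sum through its third argument and returns true on overflow, so c | d yields UINT_MAX on overflow and the plain sum otherwise.

```c
#include <limits.h>

/* Saturating unsigned addition in the branchless BIT_IOR form.  */
unsigned sat_add(unsigned a, unsigned b) {
  unsigned c;                                               /* wrapped sum */
  unsigned d = __builtin_add_overflow(a, b, &c) ? UINT_MAX : 0;
  return c | d;                          /* all-ones on overflow, else sum */
}
```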

[Bug middle-end/82853] Optimize x % 3 == 0 without modulo

2017-11-05 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853

--- Comment #7 from Marc Glisse  ---
Is that a special case of a more generic transformation, which might apply for
other values of 3, 0, == etc, or is this meant only literally for x%3==0?

[Bug middle-end/56888] memcpy implementation optimized as a call to memcpy

2017-11-05 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56888

--- Comment #38 from Marc Glisse  ---
*** Bug 82845 has been marked as a duplicate of this bug. ***

[Bug c/82845] -ftree-loop-distribute-patterns creates recursive loops on function called "memset"

2017-11-05 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82845

Marc Glisse  changed:

   What|Removed |Added

 Resolution|FIXED   |DUPLICATE

--- Comment #3 from Marc Glisse  ---
Please don't touch the status field, I marked it as "duplicate" pointing to the
other PR, that's more useful than "fixed" (which is false).

Indeed we can hope that it will serve as a reminder for people working on PR
56888.

*** This bug has been marked as a duplicate of bug 56888 ***

[Bug c/82845] -ftree-loop-distribute-patterns creates recursive loops on function called "memset"

2017-11-05 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82845

Marc Glisse  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #1 from Marc Glisse  ---
Richard's patch seems to have been forgotten :-(

*** This bug has been marked as a duplicate of bug 56888 ***

[Bug middle-end/56888] memcpy implementation optimized as a call to memcpy

2017-11-05 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56888

Marc Glisse  changed:

   What|Removed |Added

 CC||david at westcontrol dot com

--- Comment #37 from Marc Glisse  ---
*** Bug 82845 has been marked as a duplicate of this bug. ***

[Bug middle-end/82839] missing -Wmaybe-uninitialized warning

2017-11-04 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82839

--- Comment #1 from Marc Glisse  ---
You can simplify the function to

  int ts;
  g();
  *t = ts;
  h();

Part of the analysis is not flow-sensitive: we see that ts escapes, we deduce
that g() can write to it, so ts might be initialized and we do not warn. We
miss the fact that the escape cannot happen before the call to g.

[Bug target/82830] New: short rotate with truncated length

2017-11-04 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82830

Bug ID: 82830
   Summary: short rotate with truncated length
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: enhancement
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: glisse at gcc dot gnu.org
  Target Milestone: ---
Target: x86_64-*-*

#include 
unsigned short f(unsigned short x,int n){
  return _rotwl(x,n&15);
}

andl    $15, %ecx
rolw    %cl, %ax

I believe the masking is unnecessary. We have some related things in i386.md,
but only for SWI48.
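For comparison, a portable sketch of the same function without the _rotwl intrinsic (rotl16 is a hypothetical name). The count is masked to 0..15 as in the report, and the right-shift count is masked as well, so a count of 0 never shifts by 16. The sketch makes no claim about whether GCC drops the andl for it; that is the question this PR raises.

```c
#include <stdint.h>

/* 16-bit rotate left with the count reduced to 0..15.  */
uint16_t rotl16(uint16_t x, int n) {
  unsigned s = (unsigned)n & 15;
  unsigned v = x;  /* widen explicitly before shifting */
  return (uint16_t)((v << s) | (v >> ((16 - s) & 15)));
}
```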

[Bug c++/82818] Bad Codegen, delete does not check for nullptrs

2017-11-03 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82818

--- Comment #3 from Marc Glisse  ---
Please read the documentation for -flifetime-dse, your code is invalid.

[Bug tree-optimization/82776] Unable to optimize the loop when iteration count is unavailable.

2017-11-01 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82776

--- Comment #8 from Marc Glisse  ---
At some point, we could also think of taking advantage of what the C++ standard
(for instance) says:

"[intro.progress]
The implementation may assume that any thread will eventually do one of the
following:
(1.1) — terminate,
(1.2) — make a call to a library I/O function,
(1.3) — perform an access through a volatile glvalue, or
(1.4) — perform a synchronization operation or an atomic operation.
[ Note: This is intended to allow compiler transformations such as removal of
empty loops, even when termination cannot be proven. — end note ]"

The only potential "progress" in this loop is the call to
__builtin_ia32_pmovmskb128, but replacing it with a call to a function with
attribute((const)) does not help. And if there is no progress in the loop, the
loop must be finite.

(we could have some new flag if people insist on for(;;); not being optimized
away. I would even use a flag -fno-infinite-loop that says that no loop can be
infinite, or -fmain-returns that says that no loop is infinite and the program
cannot trap or terminate, etc, but that's getting a bit far from this PR)

[Bug c++/82781] [6/7/8 Regression] Vector extension operators return wrong result in constexpr

2017-10-31 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82781

Marc Glisse  changed:

   What|Removed |Added

  Known to work||5.4.0
   Target Milestone|--- |6.5
Summary|Vector extension operators  |[6/7/8 Regression] Vector
   |return wrong result in  |extension operators return
   |constexpr   |wrong result in constexpr

--- Comment #1 from Marc Glisse  ---
Have to write static_assert( ... ,"") with earlier compilers, but gcc-6 is the
first that fails.

[Bug c++/82781] Vector extension operators return wrong result in constexpr

2017-10-31 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82781

Marc Glisse  changed:

   What|Removed |Added

   Keywords||wrong-code
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2017-10-31
 Ever confirmed|0   |1

[Bug tree-optimization/82776] Unable to optimize the loop when iteration count is unavailable.

2017-10-31 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82776

--- Comment #1 from Marc Glisse  ---
That could be because gcc sadly refuses to optimize away infinite loops
(happens for other cases, and cddce2 dump (the pass that removes the whole
thing when the macro is defined) says "can not prove finiteness of loop 2").
Although ++chunk_ should be enough to prove that the loop terminates (otherwise
chunk_ eventually overflows).

(the unaligned vector use in this code seems strange)

[Bug c++/82760] Incorrect code generated for aligned new

2017-10-28 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82760

--- Comment #2 from Marc Glisse  ---
In cp/call.c:

-  (**args)[0] = *size;
+  const_cast((*cand->args)[0]) = *size;

since in the aligned case we are using a copy align_args of the arguments. Of
course it should be done in a way that doesn't require a const_cast.

[Bug c++/82760] Incorrect code generated for aligned new

2017-10-28 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82760

Marc Glisse  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2017-10-28
 Ever confirmed|0   |1

--- Comment #1 from Marc Glisse  ---
If I make the destructor do something (print hello) and I delete[] gFoo, then I
get a crash with c++17 and not c++14, indeed. In c++17 we don't seem to
allocate any extra space to store the array size.

[Bug target/82735] _mm256_zeroupper does not invalidate previously computed registers

2017-10-26 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82735

--- Comment #3 from Marc Glisse  ---
Actually, what CSE1 does might be fine, and it is LRA that should have noticed
that the register it assigned was clobbered, so it should have spilled (or
better rematerialized). Assuming the i386 backend does say that this unspec
clobbers the registers, which I am not seeing right now (but I may not be
looking in the right place).

[Bug target/82735] _mm256_zeroupper does not invalidate previously computed registers

2017-10-26 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82735

Marc Glisse  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2017-10-26
 Ever confirmed|0   |1

--- Comment #2 from Marc Glisse  ---
CSE1 happily turns uses of the second constant, loaded after vzeroupper, into
uses of the first constant, loaded before, ignoring the fact that vzeroupper
clobbers (the upper part of) all avx registers.

[Bug tree-optimization/82732] malloc+zeroing other than memset not optimized to calloc, so asm output is malloc+memset

2017-10-26 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82732

--- Comment #2 from Marc Glisse  ---
If you use size_t consistently (for size and i), then the resulting code is a
call to calloc.
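A sketch of the kind of loop meant here, with both the size and the index i of type size_t (alloc_zeroed_ints is a hypothetical name). The overflow and NULL checks are additions of this sketch for safety; they may affect whether a given GCC version actually turns the code into calloc.

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocate an int array and zero it with a plain loop.  */
int *alloc_zeroed_ints(size_t size) {
  if (size > SIZE_MAX / sizeof(int))  /* avoid overflow in the byte count */
    return NULL;
  int *p = malloc(size * sizeof *p);
  if (p == NULL)
    return NULL;
  for (size_t i = 0; i < size; i++)  /* same type as size: no casts needed */
    p[i] = 0;
  return p;
}
```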

[Bug tree-optimization/82732] malloc+zeroing other than memset not optimized to calloc, so asm output is malloc+memset

2017-10-26 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82732

--- Comment #1 from Marc Glisse  ---
We do recognize the memset early enough. What we fail to recognize is that the
size argument to malloc is the same as the length of the memset:

  _1 = (long unsigned int) size_8(D);
  _2 = _1 * 4;
  p_11 = malloc (_2);
  if (size_8(D) != 0)
goto ; [85.00%] [count: INV]
  else
goto ; [15.00%] [count: INV]

   [12.75%] [count: INV]:
  _18 = size_8(D) + 4294967295;
  _21 = (sizetype) _18;
  _7 = _21 + 1;
  _6 = _7 * 4;
  __builtin_memset (p_11, 0, _6);

VRP could be taught to simplify (unsigned long)(u-1)+1 to (unsigned long)u for
unsigned int u non-zero (though there is no VRP between ldist and strlen), or
we could try to generate some simpler code in ldist...
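The simplification VRP would need can be checked directly. A sketch modelling unsigned int as 32 bits and unsigned long as 64 bits, as in the dump (widened_len is a hypothetical name): the subtraction wraps in 32 bits and the +1 happens in 64 bits. The result equals (unsigned long)u for every nonzero u, which is what the size_8(D) != 0 test guarantees, and differs only for u == 0.

```c
#include <stdint.h>

/* (unsigned long)(u - 1) + 1, with u - 1 computed in 32 bits.  */
uint64_t widened_len(uint32_t u) {
  return (uint64_t)(uint32_t)(u - 1u) + 1u;
}
```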

[Bug inline-asm/82677] Many projects (linux, coreutils, GMP, gcrypt, openSSL, etc) are misusing asm(divq/divl) etc, potentially resulting in faulty/unintended optimisations

2017-10-23 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82677

--- Comment #3 from Marc Glisse  ---
On x86, by default, the compiler already assumes that flags are clobbered.
That's explained in a comment in GMP's longlong.h at least.

[Bug libstdc++/81797] gcc 7.1.0 fails to build on macOS 10.13 (High Sierra):

2017-10-19 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81797

--- Comment #32 from Marc Glisse  ---
(In reply to Misty De Meo from comment #31)
> For what it's worth, Apple's response was: "We analyzed the issue and
> determined the problem to be a latent bug in gcc’s build system that is
> revealed by changes in macOS High Sierra. The FSF will need up issue a fix
> in gcc."

Thanks for forwarding. Their response is oh so precise and helpful... "bug on
your side, washing my hands". I can't complain since I basically did the same
thing in my previous comment, but if they really did analyze the issue, one
might expect that they would share what the bug actually is :-(

[Bug libstdc++/81797] gcc 7.1.0 fails to build on macOS 10.13 (High Sierra):

2017-10-19 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81797

--- Comment #30 from Marc Glisse  ---
(In reply to Francois-Xavier Coudert from comment #29)
> The result of "make -d --trace -j8 all-target-libstdc++-v3", in a build
> where x86_64-apple-darwin17.0.0/libstdc++-v3 was entirely removed, can be
> found here:
> https://gist.github.com/fxcoudert/b621465a794d968593bc7ed90c0fc1fb

make's I/O is not exactly a reliable way to debug multithreading issues, but
the output looks right to me.

If --disable-libstdcxx-pch works (does it?), and until someone can investigate
more, I'd be tempted to consider it a mac bug and recommend that option in
https://gcc.gnu.org/install/specific.html .

[Bug libstdc++/81797] gcc 7.1.0 fails to build on macOS 10.13 (High Sierra):

2017-10-18 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81797

--- Comment #28 from Marc Glisse  ---
I am also failing to see how this can happen without a bug in make or macos.
The failing command is the recipe for ${pch1b_output}. That rule has
${allstamped} as a dependency, which includes stamp-bits-sup, whose recipe does
link the header. At least, disabling precompiled headers should work around it
(--disable-libstdcxx-pch IIRC)

You could always remove the @ sign on the $(STAMP) lines (and the ones before)
so it gets printed in the output, maybe that would show something suspicious.
If you are building in a clean directory (the headers aren't there yet), you
could also remove '-' at the beginning of the $(LN_S) lines, to make sure that
no error occurs. Running make in verbose mode might also hint at something.
Maybe print the date in the pch rule (or use the creation date of
${pch1_output_builddir}), and compare it to the creation date of the symlinks,
etc.

If the issue was with make, you could try replacing
all-local: ${allstamped} ${allcreated}

with
all-local:
	$(MAKE) ${allstamped}
	$(MAKE) ${allcreated}

Generally, I don't understand why we are linking sources in the build directory
instead of passing -I flags pointing directly to the source directory.

[Bug tree-optimization/80511] [8 Regression] gcc.dg/Wstrict-overflow-18.c gcc.dg/Wstrict-overflow-7.c gcc.dg/pragma-diag-3.c

2017-10-14 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80511

Marc Glisse  changed:

   What|Removed |Added

Summary|[8 Regression]  |[8 Regression]
   |gcc.dg/Wstrict-overflow-18. |gcc.dg/Wstrict-overflow-18.
   |c   |c
   ||gcc.dg/Wstrict-overflow-7.c
   ||gcc.dg/pragma-diag-3.c

--- Comment #3 from Marc Glisse  ---
https://gcc.gnu.org/viewcvs/gcc?view=revision=253642

2 more testcases got xfailed: gcc.dg/Wstrict-overflow-7.c and
gcc.dg/pragma-diag-3.c.

Some possibilities:
- add the warning in match.pd: users keep complaining about those
strict-overflow warnings, so we would have to take it out of Wall.
- add the warning in match.pd, restricted to GENERIC: that gets us close to the
gcc-7 situation.
- reimplement the warning in the front-end. In general, telling users that we
simplified x+1

[Bug target/82498] Missed optimization for x86 rotate instruction

2017-10-12 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82498

--- Comment #10 from Marc Glisse  ---
f1...f6 already have a LROTATE_EXPR in the .original dump. The others don't get
one until forwprop1, which is after einline, so there is a small chance of
inlining causing other optimizations that mess with rotate detection (or the
large-ish code before rotate is recognized may prevent early inlining, missing
optimizations). I guess without going through the large job of moving the
rotate code from forwprop to match.pd it would be possible to add one basic
transform to recognize precisely the case in those intrinsics, if we pick one
in f7...f11.

[Bug target/82498] Missed optimization for x86 rotate instruction

2017-10-12 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82498

--- Comment #7 from Marc Glisse  ---
(In reply to Uroš Bizjak from comment #6)
> You can use __rol{b,w,d,q} and __ror{b,w,d,q} (and their aliases) from
> ia32intrin.h. These are standardized; you have to include x86intrin.h header.

Some of those break if you use -fsanitize=undefined.

#include 

int main(){
  unsigned i = 0;
  return __rold(i,0);
}

/usr/lib/gcc-snapshot/lib/gcc/x86_64-linux-gnu/8/include/ia32intrin.h:150:30:
runtime error: shift exponent 32 is too large for 32-bit type 'unsigned int'
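A commonly used formulation that avoids the shift by 32 reported above reduces both shift counts modulo 32, so a count of 0 shifts by 0 on both sides. This is only a sketch; rotl32 is a hypothetical name, not one of the ia32intrin.h functions.

```c
#include <stdint.h>

/* Rotate left, defined for every count n (no shift ever reaches 32).  */
uint32_t rotl32(uint32_t x, unsigned n) {
  return (x << (n & 31)) | (x >> (-n & 31));
}
```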

[Bug target/82498] Missed optimization for x86 rotate instruction

2017-10-10 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82498

Marc Glisse  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2017-10-10
 Ever confirmed|0   |1

--- Comment #1 from Marc Glisse  ---
Looks like https://stackoverflow.com/q/44000956/1918193 . During combine, we
try to match

(set (reg:SI 97)
(rotate:SI (reg/v:SI 90 [ input ])
(and:QI (subreg:QI (reg:SI 92 [ rot ]) 0)
(const_int 31 [0x1f]

But the pattern in i386.md has 'and' and 'subreg' reversed.

For the other part, we have a very limited transform that removes the test in
this case:
uint32_t rotate_left(uint32_t input, int rot)
{
  if(rot == 0)
return input;
  return static_cast((input << rot) | (input >>
(8*sizeof(uint32_t)-rot)));;
}
But it only works when there is a single gimple insn involved, not
and+cast+rotate.

[Bug c++/82505] g++ -O3 -funroll-loops generates weird code

2017-10-10 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82505

--- Comment #2 from Marc Glisse  ---
dest/src might alias anything (even themselves), so the compiler can't really
optimize much.

[Bug middle-end/82504] Optimize away exception allocation and throws handled by catch(...){}

2017-10-10 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82504

--- Comment #3 from Marc Glisse  ---
Dup of PR53294?

[Bug libstdc++/82470] Structured bindings don't work with std::tuple if a type has a get member function

2017-10-08 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82470

--- Comment #3 from Marc Glisse  ---
As with all the issues caused by the EBCO in std::tuple, I believe the answer
is PR 63579 (I think it can be done in a way that preserves the layout of
tuple).

[Bug ipa/82476] C++: Inlining fails for a simple function

2017-10-08 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82476

--- Comment #2 from Marc Glisse  ---
What is the point of inlining it? It isn't a hot call (called once from main).
And unless you are using something like -flto or -fwhole-program (which would
turn the function static), it has to be emitted as a separate function as well,
so inlining it increases code size.

[Bug tree-optimization/82434] -fstore-merging does not work reliably.

2017-10-05 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82434

--- Comment #2 from Marc Glisse  ---
-Dbool=char lets it merge the stores, I guess this is because bool has
precision < bitsize.

[Bug target/82418] Division on a constant is suboptimal because of not using imul instruction

2017-10-04 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82418

Marc Glisse  changed:

   What|Removed |Added

 Target||x86_64-*-*
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2017-10-04
 Ever confirmed|0   |1

[Bug target/82418] Division on a constant is suboptimal because of not using imul instruction

2017-10-04 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82418

--- Comment #4 from Marc Glisse  ---
(In reply to Alexander Monakov from comment #3)
> it's likely that your test measured something else,

You are right, my test was bogus and clang's version is faster.

[Bug target/82418] Division on a constant is suboptimal because of not using imul instruction

2017-10-03 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82418

--- Comment #1 from Marc Glisse  ---
If I time it, gcc's code is several times faster than clang's on skylake. Why
is clang's version supposed to be better?

[Bug libstdc++/82417] Macros from defined in C++11

2017-10-03 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82417

--- Comment #3 from Marc Glisse  ---
(In reply to Jonathan Wakely from comment #2)
> Thinking about this further, I think we must not include  at all
> for strict -std=c++1* modes,

Yes.

Can we get a #warning in that case which explains that including  in
strict C++11+ mode makes no sense? Actually,  could also do with a
#warning explaining that it never makes sense to include it.

[Bug libstdc++/82417] Macros from defined in C++11

2017-10-03 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82417

--- Comment #1 from Marc Glisse  ---
(In reply to Jonathan Wakely from comment #0)
> The C++11 standard says that  should just include the C++
>  header and completely ignore the C library's header.

I am very surprised that nobody has cared enough to get the standard fixed. But
I can't complain, I didn't write a proposal either.

> For C++11 mode we should #undef the macros that  defines with
> non-reserved names, and maybe consider not including  at all for
> -std=c++1* modes.

I guess so.

[Bug c++/82394] Pointer imposes an optimization barrier

2017-10-02 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82394

--- Comment #1 from Marc Glisse  ---
What compiler flags? At -O3 we do optimize both the same.

[Bug target/79709] Subobtimal code with -mavx and explicit vector

2017-09-29 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79709

--- Comment #8 from Marc Glisse  ---
Thomas, the code generated by gcc has changed (after some patches by Jakub
IIRC). Do you consider the issue fixed or is the generated asm still
problematic?

.L13:
vpextrq $1, %xmm2, %rax
testq   %rax, %rax
je  .L2
vextractf128 $0x1, %ymm2, %xmm2
vmovq   %xmm2, %rax
testq   %rax, %rax
jne .L2
vpextrq $1, %xmm2, %rax
vmovapd %ymm4, %ymm3
testq   %rax, %rax
jne .L2
.L3:
vmulpd  %ymm3, %ymm3, %ymm4
vmulpd  %ymm8, %ymm3, %ymm3
vsubpd  %ymm10, %ymm4, %ymm4
vmulpd  %ymm9, %ymm3, %ymm3
vaddpd  %ymm0, %ymm4, %ymm4
vaddpd  %ymm1, %ymm3, %ymm9
vaddpd  %ymm4, %ymm4, %ymm2
vmulpd  %ymm9, %ymm9, %ymm10
vaddpd  %ymm10, %ymm2, %ymm2
vcmpltpd %ymm7, %ymm2, %ymm2
vpaddq  %xmm2, %xmm5, %xmm3
vextractf128 $1, %ymm2, %xmm6
vmovq   %xmm2, %rax
vextractf128 $1, %ymm5, %xmm5
testq   %rax, %rax
vpaddq  %xmm6, %xmm5, %xmm5
vinsertf128 $0x1, %xmm5, %ymm3, %ymm5
jne .L13

[Bug target/68924] No intrinsic for x86 `MOVQ m64, %xmm` in 32bit mode.

2017-09-27 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68924

--- Comment #2 from Marc Glisse  ---
Does anything bad happen if you remove the #ifdef/#endif for _mm_cvtsi64_si128?
(2 files in the testsuite would need updating for a proper patch)

[Bug target/82261] x86: missing peephole for SHLD / SHRD

2017-09-19 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82261

--- Comment #1 from Marc Glisse  ---
Related to PR 55583.

[Bug target/82242] x86_64 bad optimization with -march

2017-09-19 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82242

--- Comment #2 from Marc Glisse  ---
Nothing gets vectorized :-(
Note that to fill the vector, this would be better

  std::vector array(size, 1e-9);

In the reduction, we seem to do strange things with the accumulator.

addsd   (%rax), %xmm1
addq    $8, %rax
cmpq    %rbx, %rax
movsd   %xmm1, (%rsp)
jne .L13

or

vmovq   %rbp, %xmm2
vaddsd  (%rax), %xmm2, %xmm1
addq    $8, %rax
vmovq   %xmm1, %rbp
cmpq    %rbx, %rax
jne .L13

We aren't happy keeping the accumulator in xmm1: we save the value to memory in
the first case, and to an integer register in the second case, where we even
restore the value from that register...

[Bug middle-end/82223] Incorrect optimization for lossy round trips of arithmetic types

2017-09-15 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82223

--- Comment #2 from Marc Glisse  ---
(float)INT_MAX gets rounded to 2^31. When you try to convert it to int, it
doesn't fit, so the compiler is at liberty to return INT_MAX if it likes.
clang's -fsanitize=undefined does complain on your code (not gcc's though).
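A sketch of a range check that avoids the undefined conversion (float_to_int_checked is a hypothetical name). The bounds (float)INT_MIN and -(float)INT_MIN are powers of two and therefore exact in float, so the comparison is exact; NaN fails both comparisons. This assumes two's-complement int, which is what GCC supports.

```c
#include <limits.h>
#include <stdbool.h>

/* Convert f to int only when the truncated value is representable.  */
bool float_to_int_checked(float f, int *out) {
  const float lo = (float)INT_MIN;   /* -2^(N-1), exact */
  const float hi = -(float)INT_MIN;  /*  2^(N-1), exact, one past INT_MAX */
  if (!(f >= lo && f < hi))          /* also rejects NaN */
    return false;
  *out = (int)f;
  return true;
}
```

For instance, (float)INT_MAX is 2147483648.0f, so it is rejected rather than converted.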

[Bug tree-optimization/81346] Missed constant propagation into comparison

2017-09-14 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81346

--- Comment #18 from Marc Glisse  ---
(In reply to Gergö Barany from comment #17)
> the division used to be replaced by a shift that updated the condition code
> register (again, on ARM; r250337):

(just my opinion)
At a high level (gimple), (unsigned)x+3<=6 seems like a more canonical way to
represent an interval than x/4==0. If the second one turns out to be more
efficient on some targets, it sounds like we could later turn (unsigned)x+3<=6
into x/4==0 (even if the user did not write it that way), i.e. add a new
transform at RTL time. Looks like a separate enhancement request would be
appropriate.

[Bug target/82170] gcc optimizes int range-checking poorly on x86-64

2017-09-11 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82170

--- Comment #2 from Marc Glisse  ---
Note that n==(int)n (gcc documents that this must work) may work with more gcc
versions and is more readable.
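A minimal sketch of that idiom (`fits_in_int` is a hypothetical name). It relies on GCC's documented behavior that converting an out-of-range value to a signed type reduces it modulo 2^N; C++20 also specifies the conversion this way.

```cpp
#include <cassert>
#include <climits>

// True if n is representable as int: the round trip through int is lossless.
bool fits_in_int(long long n) { return n == static_cast<int>(n); }
```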

[Bug c++/82146] if () is always true error

2017-09-08 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82146

Marc Glisse  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |INVALID

--- Comment #2 from Marc Glisse  ---
Null references are illegal; use pointers if you want to use null pointers.
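A small sketch of the pointer-based alternative (`value_or` is a hypothetical helper). Since a reference must always bind to an object, the compiler may assume its address is non-null and fold a check like `&r == nullptr` to false; with a pointer parameter, "no object" is a legitimate state that the check actually tests.

```cpp
#include <cassert>

// Returns *p, or `fallback` when there is no object to read.
int value_or(const int* p, int fallback) {
  return p != nullptr ? *p : fallback;
}
```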

[Bug tree-optimization/82135] Missed constant propagation through possible unsigned wraparound, with std::align() variable pointer, constant everything else.

2017-09-07 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82135

--- Comment #1 from Marc Glisse  ---
This PR is a bit messy; please minimize your examples...
Looking at the dse2 dump (before reassoc messes things up):

  __intptr_2 = (const long unsigned int) voidp_9(D);
  _3 = __intptr_2 + 63;
  __aligned_4 = _3 & 18446744073709551552;
  __diff_5 = __aligned_4 - __intptr_2;
  _6 = __diff_5 + 64;
  if (_6 > 1024)

IIUC, essentially, you would like gcc to realize that __diff_5 is in [0,63], so
the condition is always false.

If __aligned were not reused, we could simplify ((x+63)&-64)-x to 63&-x, but we
don't want to do it in general. Maybe we could add a very special case in VRP
(or CCP for nonzero bits)...
(we could also add if(__diff>__align)__builtin_unreachable() in  but
that's getting really specific)
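A quick sketch checking the arithmetic, assuming 64-bit unsigned pointer arithmetic as in the dump (function names are hypothetical): ((x+63)&-64)-x equals 63&-x for every x, even when x+63 wraps, so __diff_5 is always in [0,63] and __diff_5+64 can never exceed 1024.

```cpp
#include <cassert>
#include <cstdint>

// Distance from x up to the next multiple of 64, computed as in the dump.
std::uint64_t diff_aligned(std::uint64_t x) {
  std::uint64_t aligned = (x + 63) & ~std::uint64_t{63};  // & 18446744073709551552
  return aligned - x;
}

// The simplified form: 63 & -x (unsigned negation wraps modulo 2^64).
std::uint64_t diff_simplified(std::uint64_t x) {
  return std::uint64_t{63} & (0 - x);
}
```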

[Bug lto/82027] [5/6/7/8 Regression] wrong code with -O3 -flto

2017-08-29 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82027

Marc Glisse  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2017-08-29
Summary|wrong code with -O3 -flto   |[5/6/7/8 Regression] wrong
   ||code with -O3 -flto
 Ever confirmed|0   |1

--- Comment #1 from Marc Glisse  ---
gcc mistakenly thinks it found some UB (division by zero) and inserts a trap.

[Bug c++/82021] Unnecessary null pointer check in global placement new (and also in any class-specific placement new operator declared as noexcept)

2017-08-29 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82021

--- Comment #3 from Marc Glisse  ---
You can search for "Ville Voutilainen", the patch was this year, not long
before the release so maybe March.

[Bug c++/82021] Unnecessary null pointer check in global placement new (and also in any class-specific placement new operator declared as noexcept)

2017-08-29 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82021

--- Comment #1 from Marc Glisse  ---
Did you try with -std=c++1z? (if that solves your issue, this is a DUP; it
should be enabled in all modes, but it isn't yet)

[Bug c++/82000] Missed optimization of char_traits::length() on constant string

2017-08-28 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82000

--- Comment #3 from Marc Glisse  ---
(In reply to Louis Dionne from comment #2)
> > Downloading the one from godbolt, we simplify it to: [...]
> 
> I have no idea what this is and how you feed that to GCC, but I'm curious.

That's what -fdump-tree-optimized shows (end of high-level optimizations). You
don't feed it to gcc (it is missing all information about internal_buffer for
instance), although with -fgimple there are variants that gcc could read.

[Bug c++/82000] Missed optimization of char_traits::length() on constant string

2017-08-27 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82000

Marc Glisse  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2017-08-28
 Ever confirmed|0   |1

--- Comment #1 from Marc Glisse  ---
The example you wrote in the bug report makes no sense: missing includes, and
with the includes added it optimizes to return 0. Downloading the one from
godbolt, we simplify it to:

int main() ()
{
  struct string_view D.32298;
  long unsigned int _15;

   [14.44%] [count: INV]:
  _15 = __builtin_strlen (_buffer);
  MEM[(struct string_view *)] = _15;
  MEM[(struct string_view *) + 8B] = _buffer;
  __asm__ __volatile__("" :  : "i,r,m" D.32298 : "memory");
  return 0;
}

Indeed we don't seem to manage folding strlen there. I think there is a DUP
asking to transform the buffer into STRING_CST or something like that.

(btw, why do you use "g" for clang and not for gcc?)

[Bug c++/69433] missing -Wreturn-local-addr assigning address of a local to a static

2017-08-26 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69433

--- Comment #3 from Marc Glisse  ---
f3: the inliner silently removes s (and the assignment to it) as write-only.
You need to add a function that reads s (we don't warn in that case either, of
course, but that's a first step).

f2: the (atomic) initialization of the static is a lot of hard-to-optimize
code. Still, since we manage to warn for f1:

  # iftmp.0_1 = PHI <(2), "def"(3)>
  a ={v} {CLOBBER};
  return iftmp.0_1;

we would probably manage it for f2:

  # prephitmp_14 = PHI 
  a ={v} {CLOBBER};
  return prephitmp_14;

... if there was an isolate-path pass after PRE, since before that we only see:

  s = 
  __cxa_guard_release (&_ZGVZ2f2vE1s);

   [100.00%] [count: INV]:
  _10 = s;
  a ={v} {CLOBBER};
  return _10;

IMO we should look into why this optimization doesn't happen before PRE (why
not FRE for instance?).

[Bug tree-optimization/81948] New: vectorize exp2 using exp

2017-08-23 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81948

Bug ID: 81948
   Summary: vectorize exp2 using exp
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: enhancement
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: glisse at gcc dot gnu.org
  Target Milestone: ---

Using -Ofast -mavx2 and a recent glibc, g++ vectorizes

#include <cmath>
void f(double*d){
  d=(double*)__builtin_assume_aligned(d,256);
  for(int i=0;i<1024;++i)
d[i]=std::exp(d[i]*std::log(2));
}

However, if I write d[i]=std::exp2(d[i]) instead, it fails to vectorize
(libmvec does not provide a vector version of exp2). It would be good, when
checking if a standard function like exp2 has a vector version, to also check
related, more canonical functions (exp in this case).

(this could also be vaguely related to PR 81706)
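For comparison, the rewrite the vectorizer could look for, sketched as a scalar function (`exp2_via_exp` is a hypothetical name): 2^x is e^(x*ln 2), the same form as the loop above that does get vectorized. It matches std::exp2 up to a few ulps of rounding in the product, which is presumably acceptable under -Ofast.

```cpp
#include <cassert>
#include <cmath>

// exp2 expressed through exp, as in d[i]=std::exp(d[i]*std::log(2)).
double exp2_via_exp(double x) { return std::exp(x * std::log(2.0)); }
```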

[Bug libstdc++/81912] std::distance not constexpr in C++17 mode

2017-08-23 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81912

Marc Glisse  changed:

   What|Removed |Added

 CC||alexbaroni68 at gmail dot com

--- Comment #3 from Marc Glisse  ---
*** Bug 81944 has been marked as a duplicate of this bug. ***

[Bug c++/81944] constexpr std::distance

2017-08-23 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81944

Marc Glisse  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #1 from Marc Glisse  ---
.

*** This bug has been marked as a duplicate of bug 81912 ***

[Bug c++/81906] [7/8 Regression] Calls to rint() wrongly optimized away starting in g++ 6

2017-08-20 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81906

--- Comment #8 from Marc Glisse  ---
(In reply to Vadim Zeitlin from comment #5)
> Perhaps you could consider this as a QoI issue, but it would be really great
> if gcc could give a warning if the code tries to use fesetround() without
> -frounding-math being on.

First note that even with -frounding-math, there are several bugs related to
rounding (maybe rint isn't considered pure, but operators like +-*/ are). Also,
there are ways (inline asm that hides optimization opportunities) to use
fesetround safely even with -fno-rounding-math (and it avoids the perf penalty
in places where we don't care about the rounding). Still, I guess we could
consider such a warning, if someone is willing to implement it...
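A minimal sketch of using fesetround around rint, meant to be compiled with -frounding-math (`rint_in_mode` is a hypothetical helper). As an assumption of this sketch, the volatile temporaries play the role of the "hide it from the optimizer" trick mentioned above: they stop rint from being folded or moved across the fesetround calls.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>

// Round x to an integer under a temporarily changed rounding mode.
double rint_in_mode(double x, int mode) {
  const int old = std::fegetround();
  std::fesetround(mode);
  volatile double in = x;               // value unknown to the optimizer
  volatile double out = std::rint(in);  // must be computed in `mode`
  std::fesetround(old);                 // restore the caller's mode
  return out;
}
```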

[Bug c++/81906] [7/8 Regression] Calls to rint() wrongly optimized away starting in g++ 6

2017-08-20 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81906

Marc Glisse  changed:

   What|Removed |Added

 Status|RESOLVED|NEW
   Last reconfirmed||2017-08-20
 Resolution|INVALID |---
Summary|Calls to rint() wrongly |[7/8 Regression] Calls to
   |optimized away starting in  |rint() wrongly optimized
   |g++ 6   |away starting in g++ 6
 Ever confirmed|0   |1

--- Comment #2 from Marc Glisse  ---
Indeed you want -frounding-math, and with gcc-6 that makes things work, but
starting with gcc-7 it doesn't anymore. (gimple looks fine, the problem comes
later)

[Bug libstdc++/81905] New: partial_sort slower than sort

2017-08-20 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81905

Bug ID: 81905
   Summary: partial_sort slower than sort
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: enhancement
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: glisse at gcc dot gnu.org
  Target Milestone: ---

(from https://stackoverflow.com/q/45455345/1918193 )

std::partial_sort of half an array can be slower than std::sort of the whole
array, because it uses heap sort vs introsort. There may be a size threshold
above which we could use a different algorithm than heap_select+sort_heap (say
a variant of introsort where after partitioning (possibly with a biased pivot),
depending where the pivot ends up, either we partial_sort the left and ignore
the right, or we sort the left and partial_sort the right), or some other
compromise.
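A rough sketch of the quicksort-style variant described above. It is a hypothetical illustration, not libstdc++ code: it uses a middle-element pivot and no introsort depth limit, so its worst case is quadratic. After a three-way partition, it either drops the right part or sorts the left part and continues on the right.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>

// Puts the smallest (middle - first) elements of [first, last), sorted, into
// [first, middle), like std::partial_sort.
template <class It>
void partial_quicksort(It first, It middle, It last) {
  using T = typename std::iterator_traits<It>::value_type;
  while (first < middle && last - first > 16) {
    const T pivot = *(first + (last - first) / 2);  // copy: partition moves elements
    // Three-way partition: [first, lo) < pivot, [lo, hi) == pivot, [hi, last) > pivot.
    It lo = std::partition(first, last, [&](const T& v) { return v < pivot; });
    It hi = std::partition(lo, last, [&](const T& v) { return !(pivot < v); });
    if (middle <= lo) {
      last = lo;                 // everything we need is on the left: ignore the right
    } else {
      std::sort(first, lo);      // the whole left part is needed: sort it
      if (middle <= hi) return;  // [lo, hi) holds copies of the pivot, already in place
      first = hi;                // continue partially sorting the right part
    }
  }
  if (first < middle)
    std::partial_sort(first, middle, last);
}
```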

[Bug target/81904] New: FMA and addsub instructions

2017-08-20 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81904

Bug ID: 81904
   Summary: FMA and addsub instructions
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: enhancement
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: glisse at gcc dot gnu.org
  Target Milestone: ---
Target: x86_64-*-*

(asked in
https://stackoverflow.com/questions/45298855/how-to-write-portable-simd-code-for-complex-multiplicative-reduction/45401182#comment77780455_45401182
)

Intel has instructions like vfmaddsubps. Gcc manages, under certain
circumstances, to merge mult and plus or mult and minus into FMA, but not mult
and this strange addsub mix.

#include <immintrin.h>
__m128d f(__m128d x, __m128d y, __m128d z){
  return _mm_addsub_pd(_mm_mul_pd(x,y),z);
}
__m128d g(__m128d x, __m128d y, __m128d z){
  return _mm_fmaddsub_pd(x,y,z);
}

(the order of the arguments is probably not right)

My first guess as to how this could be implemented without too much trouble is
in ix86_gimple_fold_builtin: for IX86_BUILTIN_ADDSUBPD and others, check that
we are late enough in the optimization pipeline (roughly where "widening_mul"
is), that contractions are enabled, and that the first (?) argument is a
single-use MULT_EXPR.

I didn't check what the situation is with the vectorizer (which IIRC can now
generate code that ends up as addsub).
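A portable scalar model of the lane semantics, based on Intel's documented behavior of ADDSUBPD and VFMADDSUBPD. The names `mul`, `addsub`, and `fmaddsub` are hypothetical stand-ins for the intrinsics. Under this model, f(x,y,z) and g(x,y,z) compute the same lanes except that g does not round x*y before the subtraction or addition, which is exactly the contraction the combine would have to justify.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using V2d = std::array<double, 2>;  // stands in for __m128d: {lane 0, lane 1}

// _mm_mul_pd: lane-wise product, rounded.
V2d mul(V2d a, V2d b) { return {a[0] * b[0], a[1] * b[1]}; }

// _mm_addsub_pd(a, b): subtract in the even lane, add in the odd lane.
V2d addsub(V2d a, V2d b) { return {a[0] - b[0], a[1] + b[1]}; }

// _mm_fmaddsub_pd(a, b, c): same pattern, but a*b is not rounded first.
V2d fmaddsub(V2d a, V2d b, V2d c) {
  return {std::fma(a[0], b[0], -c[0]), std::fma(a[1], b[1], c[1])};
}
```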

[Bug c/81389] _mm_cmpestri segfault on -O0

2017-08-17 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81389

--- Comment #13 from Marc Glisse  ---
(In reply to rockeet from comment #7)
> @Marc @Jakub @Martin
> Intel CPU document says: operand of _mm_cmpestri can be memory or mm
> register, when the operand is memory, it does not require alignment.

That's the doc for the CPU instruction. The intrinsic, as a C function, always
takes an object of type __m128i, not a register or memory. The only question is
what the alignment of the type __m128i is. In gcc, it is 16 bytes. What does
alignof (or _Alignof or whatever variant you can get working) return with
Intel's compiler?

> The issue is: GCC does not know this knowledge(memory operand need not
> memory align), and there is no way to enforce gcc to generate a _mm_cmpestri
> which always use memory operand, not mm register.

Use inline asm? Intrinsics are not quite as low level as you seem to expect.

> If I manually load the unaligned memory into an aligned `__m128i`, it has
> performance penalty on optimizing compilation.

Uh? With -O1, the compiler merges the unaligned load with pcmpestri (it knows
that the insn can read unaligned memory). Did you mean to talk about the
performance of code generated with -O0? We explicitly do not care about that.

[Bug c/81630] powl returns values with insufficient accuracy

2017-07-31 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81630

Marc Glisse  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |INVALID

--- Comment #1 from Marc Glisse  ---
> Apple LLVM version 8.1.0 (clang-802.0.42)

That's not gcc.

[Bug tree-optimization/81607] Conditional operator: "type mismatch in shift expression" error

2017-07-29 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81607

Marc Glisse  changed:

   What|Removed |Added

   Keywords||ice-checking
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2017-07-29
   Target Milestone|--- |8.0
 Ever confirmed|0   |1

--- Comment #1 from Marc Glisse  ---
The original dump has
(void) (f = (char) ((long int) d.c << a))

(using long instead of int) This happens only for a bitfield of size 32, not 31
or 33, we probably get confused about a NOP conversion somewhere along the way.

[Bug c++/81606] A small program works as expected with -O0 but fails with -O1 on all tested gcc versions

2017-07-28 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81606

Marc Glisse  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |INVALID

--- Comment #1 from Marc Glisse  ---
What do you expect "A" >= "B" to mean? You are comparing addresses, the result
is arbitrary.
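If the intent is to compare the contents of the strings, a minimal sketch (`greater_or_equal` is a hypothetical name): compare the characters with strcmp (or std::string_view) instead of comparing the addresses of two unrelated string literals.

```cpp
#include <cassert>
#include <cstring>

// Lexicographic comparison of the characters, not of the pointers.
bool greater_or_equal(const char* a, const char* b) {
  return std::strcmp(a, b) >= 0;
}
```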

[Bug c++/81597] returns link to temporary value

2017-07-28 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81597

--- Comment #3 from Marc Glisse  ---
-Werror=return-local-addr
(we cannot reject those programs by default: if the caller ignores what the
function returns, the program may be valid)

[Bug c++/81597] returns link to temporary value

2017-07-28 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81597

--- Comment #1 from Marc Glisse  ---
Sorry, what change are you asking for?

Compiling with current gcc, we get plenty of warnings, and at runtime:

int &&
zsh: segmentation fault  ./a.out

[Bug tree-optimization/81555] Wrong code at -O1

2017-07-26 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81555

--- Comment #3 from Marc Glisse  ---
(In reply to Dmitry Babokin from comment #2)
> Hmmm, but this one is triggered at -O1, another only at -O2.

-fno-tree-reassoc should help both.

It is often a combination of optimizations that causes the bug. Reassoc is
doing a good transformation, but it leaves wrong information around, which only
matters if some other pass (rightfully) takes advantage of that information.
Still, it was good to report both, and I expect we may add (a modified version
of) both to the testsuite once this is fixed, thanks.

[Bug tree-optimization/81555] Wrong code at -O1

2017-07-26 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81555

--- Comment #1 from Marc Glisse  ---
Same reassoc issue as PR 81556 it seems.

[Bug tree-optimization/81556] Wrong code at -O2

2017-07-26 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81556

Marc Glisse  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2017-07-26
 Ever confirmed|0   |1

--- Comment #1 from Marc Glisse  ---
Reassoc not clearing the VRP info?

[Bug tree-optimization/81503] [8 Regression] Wrong code at -O2

2017-07-21 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81503

--- Comment #4 from Marc Glisse  ---
  if (a + b * -2)
c = (b-1073741824)*-2;

might let you find an earlier culprit.

[Bug tree-optimization/81503] Wrong code at -O2

2017-07-21 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81503

--- Comment #1 from Marc Glisse  ---
Looks like SLSR does an overflow-unsafe transformation, then VRP2 takes
advantage of it. Maybe.

[Bug middle-end/81502] In some cases the data is moved to memory unnecessarily [partial regression]

2017-07-21 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81502

Marc Glisse  changed:

   What|Removed |Added

   Keywords||missed-optimization
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2017-07-21
 Ever confirmed|0   |1

--- Comment #1 from Marc Glisse  ---
.optimized dump:

int bar(void*) (void * ptr)
{
  int res;
  __m128i word;
  long unsigned int _2;
  vector(2) long long int word.3_3;
  unsigned int _4;

   [100.00%] [count: INV]:
  _2 = (long unsigned int) ptr_9(D);
  word = { 0, 0 };
  MEM[(char * {ref-all})] = _2;
  word.3_3 = word;
  word ={v} {CLOBBER};
  _4 = BIT_FIELD_REF ;
  res_5 = (int) _4;
  return res_5;

}

We missed turning the memory write into a BIT_INSERT_EXPR, and passes like PRE
missed following the bit_field_expr all the way to _2.

.combine dump:
[...]
(insn 8 3 10 2 (set (reg/v:V2DI 90 [ word ])
(vec_concat:V2DI (reg/v/f:DI 92 [ ptr ])
(const_int 0 [0]))) "b.c":16 3712 {vec_concatv2di}
 (expr_list:REG_DEAD (reg/v/f:DI 92 [ ptr ])
(nil)))
(insn 10 8 15 2 (set (reg:SI 94 [ res ])
(vec_select:SI (subreg:V4SI (reg/v:V2DI 90 [ word ]) 0)
(parallel [
(const_int 0 [0])
]))) "b.c":20 3697 {*vec_extractv4si_0}
 (expr_list:REG_DEAD (reg/v:V2DI 90 [ word ])
(nil)))
[...]

combine tries
(set (reg:SI 94 [ res ])
(vec_select:SI (subreg:V4SI (vec_concat:V2DI (reg/v/f:DI 92 [ ptr ])
(const_int 0 [0])) 0)
(parallel [
(const_int 0 [0])
])))
which we fail to simplify. The xmm1-xmm0 mov is not considered a mov by the
compiler but concatenation with 0, so not a RA problem.

The change of mode (64-bit pointer to 32-bit int) seems to play a big role in
confusing things here.

[Bug tree-optimization/81396] [7/8 Regression] Optimization of reading Little-Endian 64-bit number with portable code has a regression

2017-07-20 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81396

--- Comment #9 from Marc Glisse  ---
Should we open a separate PR for the transformation you suggested in comment 4,
or does that seem not useful enough now, or will it be part of bitfield gimple
lowering when that lands?

[Bug libstdc++/81476] severe slow-down with range-v3 library compared to clang

2017-07-19 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81476

--- Comment #17 from Marc Glisse  ---
(In reply to Jonathan Wakely from comment #14)
> The advantage of doing it as in comment 13, rather than:
> [comment #11]
> is that when inserting the inputrange causes reallocations we only have to
> transfer the already inserted elements of the inputrange to the new storage,
> not the elements preceding the insertion point ("the beginning of the
> vector" and "what we already inserted at the end").

I see what you mean. Note that as soon as there is some reallocation going on
at any point, we should be able to avoid calling (in-place) rotate, which is
quite a bit more expensive than a simple range move.

[Bug libstdc++/81476] severe slow-down with range-v3 library compared to clang

2017-07-19 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81476

--- Comment #11 from Marc Glisse  ---
Or one could (not legal) directly start a new allocation, copy the beginning of
the vector, append the range, then append the end of the vector. Or a
combination of all that: first try appending the range to the vector. If that
works without reallocating, rotate. If a reallocation is necessary, switch to
the "new allocation" strategy, create a new vector, copy the beginning of the
vector, copy what we already inserted at the end, append the rest of the
inputrange, copy the rest of the original vector, and finally adopt this new
vector.

[Bug libstdc++/81476] severe slow-down with range-v3 library compared to clang

2017-07-19 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81476

--- Comment #10 from Marc Glisse  ---
Inserting an InputRange (not even Forward) at the beginning of a vector is
really a misuse of vector. It is true that we can do better than what libstdc++
currently does, though we shouldn't encourage the practice. A trivial idea would
be to first copy the InputRange to some array (either something dynamic like a
vector, or by block to a fixed size buffer), then insert that. Variants include
tricks like inserting the InputRange at the end of the vector, then calling
std::rotate to move it to the right position.
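A minimal sketch of the append-then-rotate variant (`insert_input_range` is a hypothetical helper, not libstdc++ code). Each element of the single-pass range is appended with push_back, so nothing is shifted per element, and one std::rotate then moves the new block to the insertion point.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <sstream>
#include <vector>

// Insert [first, last), an input range, at index pos of v.
template <class T, class InputIt>
void insert_input_range(std::vector<T>& v, std::size_t pos,
                        InputIt first, InputIt last) {
  const std::size_t old_size = v.size();
  for (; first != last; ++first)
    v.push_back(*first);  // append at the end: amortized O(1) per element
  // Move [old_size, end) in front of [pos, old_size).
  std::rotate(v.begin() + pos, v.begin() + old_size, v.end());
}
```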

[Bug tree-optimization/81346] Missed constant propagation into comparison

2017-07-18 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81346

--- Comment #13 from Marc Glisse  ---
(In reply to Jakub Jelinek from comment #12)
> Created attachment 41781 [details]
> gcc8-pr81346-2.patch
> 
> Further optimization from build_range_check.

I wonder if "1" is that special: this optimization basically applies to any
range that ends at INT_MAX, turning (X-C1)<=C2 into (signed)X>=C3. Or do we
consider that only the case that yields a simple sign check is a win?
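A small sketch of the general case, with a hypothetical lower bound C1 = 5. For the range [C1, INT_MAX], the unsigned check (X-C1)<=(INT_MAX-C1) is the same test as the signed comparison X>=C1. With C1 = 1 this becomes X>0, the simple sign check.

```cpp
#include <cassert>
#include <climits>

constexpr int C1 = 5;  // hypothetical lower bound of the range [C1, INT_MAX]

// Range check in the usual build_range_check shape, done in unsigned arithmetic.
bool range_check(int x) {
  return static_cast<unsigned>(x) - static_cast<unsigned>(C1)
         <= static_cast<unsigned>(INT_MAX - C1);
}

// Because the range ends at INT_MAX, a single signed comparison suffices.
bool signed_check(int x) { return x >= C1; }
```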

[Bug target/81389] _mm_cmpestri segfault on -O0

2017-07-18 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81389

--- Comment #4 from Marc Glisse  ---
(In reply to rockeet from comment #3)
> @Martin Liška Yes, my use case is:
> 
> __m128i key128 = { key }; // key is an unsigned char
> int idx = _mm_cmpestri(key128, 1,
> *(const __m128i*)(data), // don't require memory align
> len,
> _SIDD_UBYTE_OPS|_SIDD_CMP_EQUAL_ORDERED|_SIDD_LEAST_SIGNIFICANT);
> 
> // 

You should load the unaligned data using one of the loadu intrinsics and pass
that to _mm_cmpestri. When optimizing, it should generate the code you want,
but in a safe way.
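A sketch of that advice for x86 with SSE4.2 (`find_byte` is a hypothetical helper). The unaligned load is done explicitly with _mm_loadu_si128, and the resulting __m128i value is passed to _mm_cmpestri; when optimizing, GCC can fold the load into pcmpestri. The GCC/Clang target attribute is used so the file builds without -msse4.2, which assumes the running CPU supports SSE4.2.

```cpp
#include <cassert>
#include <nmmintrin.h>  // SSE4.2 intrinsics (x86 only)

// Index of the first `key` among the first `len` (<= 16) bytes of `data`, or 16
// if absent. `data` need not be aligned, but 16 bytes must be readable.
__attribute__((target("sse4.2")))
int find_byte(unsigned char key, const unsigned char* data, int len) {
  const __m128i needle = _mm_set1_epi8(static_cast<char>(key));
  const __m128i hay = _mm_loadu_si128(reinterpret_cast<const __m128i*>(data));
  return _mm_cmpestri(needle, 1, hay, len,
                      _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_ORDERED |
                          _SIDD_LEAST_SIGNIFICANT);
}
```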

[Bug lto/78795] LTO causes undefined reference errors when linking with GMP "make check"

2017-07-18 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78795

--- Comment #12 from Marc Glisse  ---
(In reply to Vincent Lefèvre from comment #11)
> On Debian, after path canonicalization, this is /usr/lib/bfd-plugins, but
> only packages should manage files under /usr/lib (unlike /usr/local, for
> instance). I've sent a mail to the Debian GCC Maintainers so that they
> provide a symlink:
> 
>   https://lists.debian.org/debian-gcc/2016/12/msg00122.html

For the record, Debian (testing+unstable) recently added this symlink.

[Bug middle-end/81445] Dynamic stack allocation not optimized into static allocation

2017-07-14 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81445

--- Comment #6 from Marc Glisse  ---
(In reply to Wilco from comment #5)
> Also it doesn't support these simple cases:
> 
> void vla2(int x)
> {
>   if (x == 10)
>   {
> int arr[x];
> t (arr);
>   }
> }

Again, try something smaller. When the allocation is not always executed, the
threshold is even lower.

[Bug tree-optimization/81346] Missed constant propagation into comparison

2017-07-14 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81346

--- Comment #10 from Marc Glisse  ---
(In reply to Jakub Jelinek from comment #9)
> (In reply to Marc Glisse from comment #8)
> > I think always using an unsigned type for the range check would be simpler.
> > If we try to check that x>=INT_MIN+2 && x<=INT_MAX-2 with -fwrapv, int is
> > still not a suitable type in which to do
> > x-(INT_MIN+2)<=INT_MAX-2-(INT_MIN+2), while the issue doesn't exist with an
> > unsigned type.
> 
> I'm trying to preserve what we did before, it can be tweaked incrementally
> if needed.

Then you may need to check for overflow in "hi = const_binop (MINUS_EXPR,
etype, hi, lo);", current build_range_check has "if (value != 0 &&
!TREE_OVERFLOW (value))" for the result of that operation. That should matter
for instance when simplifying X/INT_MAX==0.

[Bug middle-end/81445] Dynamic stack allocation not optimized into static allocation

2017-07-14 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81445

--- Comment #4 from Marc Glisse  ---
(In reply to Wilco from comment #2)
> I don't see it happen for the simplest case in current trunk:

400 bytes is too large, try again with something smaller. (I'm with you if you
want to increase the threshold)

[Bug tree-optimization/81346] Missed constant propagation into comparison

2017-07-14 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81346

--- Comment #8 from Marc Glisse  ---
I think always using an unsigned type for the range check would be simpler. If
we try to check that x>=INT_MIN+2 && x<=INT_MAX-2 with -fwrapv, int is still
not a suitable type in which to do x-(INT_MIN+2)<=INT_MAX-2-(INT_MIN+2), while
the issue doesn't exist with an unsigned type.
I notice you call build_range_check in GENERIC (and new code for GIMPLE). Is
that temporary until match.pd can optimize range checks?
Do we want :s on trunc_div?

[Bug middle-end/81445] Dynamic stack allocation not optimized into static allocation

2017-07-14 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81445

--- Comment #1 from Marc Glisse  ---
Note that we already do it for VLA (aka BUILT_IN_ALLOCA_WITH_ALIGN) in CCP.

[Bug tree-optimization/81396] [7/8 Regression] Optimization of reading Little-Endian 64-bit number with portable code has a regression

2017-07-14 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81396

--- Comment #6 from Marc Glisse  ---
(In reply to Jakub Jelinek from comment #5)
> Or both this bswap change and the match.pd addition.

Doing both sounds good to me :-)

[Bug bootstrap/81425] Bootstrap broken since r250158

2017-07-13 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81425

--- Comment #1 from Marc Glisse  ---
Isn't that already fixed?
https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00614.html

[Bug c++/81410] [5/6/7/8 Regression] -O3 breaks code

2017-07-12 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81410

--- Comment #5 from Marc Glisse  ---
Seems related to vectorization. These lines look suspicious:

  vect__37.14_78 = MEM[(long int *)_30];
  vect__37.15_72 = MEM[(long int *)_30 + 16B];
  vect__37.16_70 = MEM[(long int *)_30 + 32B];
  vect__37.17_68 = MEM[(long int *)_30 + 48B];
  MEM[(long int *)_28] = vect__37.14_78;
  MEM[(long int *)_28 + 16B] = vect__37.15_72;
  MEM[(long int *)_28 + 32B] = vect__37.16_70;
  MEM[(long int *)_28 + 48B] = vect__37.17_68;

where _30 is for b, _28 is for a, and I would expect to see gaps in the reads
from b (+24, +48, +72 instead of +16, +32 and +48). But I haven't checked, this
is only a first guess.

[Bug tree-optimization/81409] Inefficient loops generated from range-v3 code

2017-07-12 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81409

--- Comment #1 from Marc Glisse  ---
The most obvious thing I notice is

   [100.00%] [count: INV]:
  # it$_M_current_23 = PHI 
  _20 = _7 == it$_M_current_23;
  _5 = _20 | _53;
  if (_5 != 0)
goto ; [7.36%] [count: INV]
  else
goto ; [92.64%] [count: INV]

   [92.60%] [count: INV]:
  _27 = it$_M_current_23 + 4;
  if (_7 != _27)
goto ; [3.75%] [count: INV]
  else
goto ; [96.25%] [count: INV]

where 7 -> 6 means that _7 == _27 == it$_M_current_23 so _5 != 0 has to be
true. However, we do not thread that (at thread4 time, we go from 7 to 12
(empty latch) to 6 instead of directly to 6).

[Bug tree-optimization/81403] [8 Regression] wrong code at -O3

2017-07-12 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81403

--- Comment #3 from Marc Glisse  ---
/* x & C -> x if we know that x & ~C == 0.  */

Not clear where it is getting the bogus range/nonzero information from, I
thought we had fixed all the places reusing SSA_NAMEs with stale information.
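To spell out the quoted rule, a small sketch (`and_is_redundant` is a hypothetical name). If every bit that may be set in x is also set in C (x & ~C == 0), then x & C == x and the AND can be dropped. GCC bases this on the nonzero-bits information attached to x (like the `NONZERO 10393` annotations in the dump below), so stale information makes the rule drop an AND that was still needed.

```cpp
#include <cassert>

// The precondition of "x & C -> x": no bit of x lies outside the mask C.
bool and_is_redundant(unsigned x, unsigned c) { return (x & ~c) == 0; }
```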

[Bug tree-optimization/81403] wrong code at -O3

2017-07-12 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81403

--- Comment #1 from Marc Glisse  ---
PRE losing "& 10393" at -O3 but not -O2 (the previous dumps are identical)

@@ -611,6 +639,7 @@
 ;;6 [100.0%]  (FALLTHRU,EXECUTABLE)
   # .MEM_21 = PHI <.MEM_26(5), .MEM_25(6)>
   # prephitmp_34 = PHI <_30(5), _30(6)>
+  # prephitmp_35 = PHI <_30(5), _30(6)>
   # VUSE <.MEM_21>
   var_33.4_11 = var_33D.35372;
   if (var_33.4_11 != 0)
@@ -624,9 +653,7 @@
 ;;prev block 7, next block 9, flags: (NEW, REACHABLE, VISITED)
 ;;pred:   7 [54.0%]  (TRUE_VALUE,EXECUTABLE)
   # RANGE [0, 10393] NONZERO 10393
-  _29 = prephitmp_34 & 10393;
-  # RANGE [0, 10393] NONZERO 10393
-  _15 = (long intD.12) _29;
+  _15 = (long intD.12) prephitmp_35;

[Bug tree-optimization/81396] [7/8 Regression] Optimization of reading Little-Endian 64-bit number with portable code has a regression

2017-07-11 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81396

--- Comment #2 from Marc Glisse  ---
bswap was happy dealing with

  _2 = MEM[(const unsigned char *)];
  _3 = (uint64_t) _2;
  _4 = MEM[(const unsigned char *) + 1B];
  _5 = (uint64_t) _4;
  _6 = _5 << 8;
  _8 = MEM[(const unsigned char *) + 2B];
  _9 = (uint64_t) _8;
  _10 = _9 << 16;
  _32 = _6 | _10;
  _11 = _3 | _32;

etc, but has trouble with

  _21 = word_31(D) & 255;
  _1 = BIT_FIELD_REF ;
  _23 = (uint64_t) _1;
  _2 = _23 << 8;
  _4 = BIT_FIELD_REF ;
  _24 = (uint64_t) _4;
  _5 = _24 << 16;
  _32 = _2 | _5;
  _6 = _21 | _32;
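For context, a sketch of the kind of portable little-endian read whose lowered form appears in the first dump above (`read_le64` is a hypothetical name; the report's exact source is not shown here). Each byte is loaded, widened, shifted, and ORed in. This is the byte-by-byte shape the bswap pass was happy to merge into a single 64-bit load on little-endian targets.

```cpp
#include <cassert>
#include <cstdint>

// Portable little-endian 64-bit read from a byte buffer.
std::uint64_t read_le64(const unsigned char* p) {
  return static_cast<std::uint64_t>(p[0])
       | (static_cast<std::uint64_t>(p[1]) << 8)
       | (static_cast<std::uint64_t>(p[2]) << 16)
       | (static_cast<std::uint64_t>(p[3]) << 24)
       | (static_cast<std::uint64_t>(p[4]) << 32)
       | (static_cast<std::uint64_t>(p[5]) << 40)
       | (static_cast<std::uint64_t>(p[6]) << 48)
       | (static_cast<std::uint64_t>(p[7]) << 56);
}
```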
