[Bug tree-optimization/82337] [5/6/7/8 Regression] ICE: SSA corruption at tree-ssa-coalesce.c:1010

2017-09-27 Thread ivo.raisr at oracle dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82337

--- Comment #10 from Ivo Raisr  ---
(In reply to Bill Schmidt from comment #9)

I confirm this fixes the problem also in the original full-blown source.

[Bug c++/66601] RFE: improve diagnostics for failure to deduce template parameter pack that is not in the last position in the parameter list

2017-09-27 Thread egallager at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66601

Eric Gallager  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2017-09-28
 CC||egallager at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #2 from Eric Gallager  ---
Confirmed.

[Bug tree-optimization/65461] -Warray-bounds warnings in the linux kernel (free_area_init_nodes)

2017-09-27 Thread egallager at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65461

Eric Gallager  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2017-09-28
 CC||egallager at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #2 from Eric Gallager  ---
Confirmed that I get the -Warray-bounds warning too.

[Bug middle-end/65041] Improve -Wclobbered

2017-09-27 Thread egallager at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65041

Eric Gallager  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2017-09-28
 CC||egallager at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #2 from Eric Gallager  ---
Confirmed that gcc warns for a2 instead of fd.

[Bug target/71727] -O3 -mstrict-align produces code which assumes unaligned vector accesses work

2017-09-27 Thread clyon at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71727

--- Comment #4 from Christophe Lyon  ---
Author: clyon
Date: Wed Sep 27 23:52:58 2017
New Revision: 253242

URL: https://gcc.gnu.org/viewcvs?rev=253242&root=gcc&view=rev
Log:
[AArch64] PR71727 fix -mstrict-align

2017-09-27  Christophe Lyon  

PR target/71727
gcc/
* config/aarch64/aarch64.c
(aarch64_builtin_support_vector_misalignment): Always return false
when misalignment is unknown.

gcc/testsuite/
* gcc.target/aarch64/pr71727-2.c: New test


Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/aarch64/aarch64.c
trunk/gcc/testsuite/ChangeLog

[Bug c++/82347] New: Class Name Injection and Constructor Typenames

2017-09-27 Thread ahuszagh at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82347

Bug ID: 82347
   Summary: Class Name Injection and Constructor Typenames
   Product: gcc
   Version: 7.0.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ahuszagh at gmail dot com
  Target Milestone: ---

G++ (including 7.0.1 and 6.3.0) compiles the following code without any
warning, allowing a type alias to name the constructor of a class.


Source File
---

a.cpp

```
#include <iostream>
#include <typeinfo>
#include <vector>

int main()
{
using a = typename std::vector<int>::vector;
std::cout << typeid(a).name() << std::endl;
return 0;
}
```

G++ Information
---

Using built-in specs.
COLLECT_GCC=g++-7
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/7/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu
7-20170407-0ubuntu2' --with-bugurl=file:///usr/share/doc/gcc-7/README.Bugs
--enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr
--with-gcc-major-version-only --program-suffix=-7
--program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
--libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu
--enable-libstdcxx-debug --enable-libstdcxx-time=yes
--with-default-libstdcxx-abi=new --enable-gnu-unique-object
--disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie
--with-system-zlib --with-target-system-zlib --enable-objc-gc=auto
--enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64
--with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic
--enable-offload-targets=nvptx-none --without-cuda-driver
--enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu
--target=x86_64-linux-gnu
Thread model: posix
gcc version 7.0.1 20170407 (experimental) [trunk revision 246759] (Ubuntu
7-20170407-0ubuntu2)


Complete Command
---

g++-7 a.cpp

Compiler Output
---

N/A

Preprocessed File
---

(Not applicable?)

Description
---

ISO C++ [class.qual]/2 states that a qualified name of the form `x::x`, where
the nested name specifier nominates the class itself, names the constructor and
not the class, so using `typename x::x` as a type is not standards compliant.
More information can be found on the StackOverflow post I made on the topic:
https://stackoverflow.com/questions/46412754/class-name-injection-and-constructors
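
For contrast, here is a minimal sketch of a context where naming the
constructor through `Base::Base` is well-formed: an inheriting-constructor
using-declaration. The `Base` and `Derived` types are hypothetical and are not
taken from the report; the sketch does not reproduce the bug itself.

```cpp
#include <cassert>

struct Base {
  explicit Base(int v) : value(v) {}
  int value;
  // Inside the class, the unqualified injected-class-name `Base` names the type.
  Base doubled() const { return Base(value * 2); }
};

struct Derived : Base {
  // Here `Base::Base` names Base's constructors (an inheriting-constructor
  // using-declaration), not the type Base.
  using Base::Base;
};

int inherited_value() {
  Derived d(21);  // uses the constructor inherited from Base
  assert(d.value == 21);
  return d.doubled().value;
}
```

The report's concern is the opposite case: a declaration that uses the same
qualified form where a type is expected.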

[Bug c++/82343] internal compiler error: Segmentation fault - template recurrency, SFINAE

2017-09-27 Thread p1006680 at mvrht dot net
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82343

Mark  changed:

   What|Removed |Added

  Attachment #42244|0   |1
is obsolete||

--- Comment #1 from Mark  ---
Created attachment 42251
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42251&action=edit
(simplified) Preprocessed source code, generated by adding -save-temps

I managed to narrow down the problem (see attachment). I was wrong about the
function template recursion. The problem is associated with a parameter pack of
size 0 or 1 in the SFINAE test.

Confirmed with gcc: 8.0.0, 7.2.0, 7.1.0, 6.3.0, 6.2.0, 6.1.0, 5.4.0, 5.3.0,
5.2.0, 5.1.0, 4.9.3, 4.9.2, 4.9.1, 4.9.0, 4.8.5, 4.8.4, 4.8.3, 4.8.2, 4.8.1.
Below version 4.8.1 (4.7.4 and below) everything is okay.

[Bug lto/82172] Destruction of basic_string in basic_stringbuf::overflow with _GLIBCXX_USE_CXX11_ABI=0, -flto, and C++17 mode results in invalid delete

2017-09-27 Thread dave.gittins at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82172

--- Comment #21 from Gubbins  ---
(In reply to Martin Liška from comment #20)
> Your failure happens even w/o LTO, am I right?
> But yes, the problem looks very similar to what happens for ld.bfd.

You are right.

Does anyone know how I would raise this with someone who can fix it on the
Darwin side? Or could it be worked around by gcc?

[Bug fortran/81509] Wrong compilation error: iand/ieor/ior + boz + -std=f2008

2017-09-27 Thread sgk at troutmask dot apl.washington.edu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81509

--- Comment #6 from Steve Kargl  ---
On Wed, Sep 27, 2017 at 10:59:56PM +, dominiq at lps dot ens.fr wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81509
> 
> --- Comment #5 from Dominique d'Humieres  ---
> pr45513 and pr54072 could be duplicates.
> 

I don't recall either of those PR's, and have no idea
why I would have missed them.  :-\

pr45513 should be covered by my patch.  A portion of pr54072
is also covered, but pr54072 indicates a BOZ can be used with
other intrinsic subprograms, e.g., TRANSFER.  This isn't surprising
as a boz is marked as a BT_INTEGER.  That is, for

gfc_expr x;

a boz has

x->ts.type = BT_INTEGER
x->is_boz = 1

In hindsight, we probably should have introduced BT_BOZ and
treated it as some opaque entity with helper functions.  For
example, gfc_boz2int(x,kind) would convert the BOZ in x to
an INTEGER with kind type parameter 'kind'.
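
The opaque-entity idea can be sketched outside of gfortran as follows. This is
only an illustration in C++: the `Boz` struct and `boz_to_int` function are
hypothetical stand-ins for the suggested BT_BOZ and gfc_boz2int, and the
truncate-and-sign-extend behaviour is an assumption made for the example, not
a statement of what gfortran or the Fortran standard does.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical opaque BOZ literal: just digits and a radix, no type yet.
struct Boz {
  std::string digits;
  int radix;  // 2, 8, or 16
};

// Hypothetical analogue of gfc_boz2int(x, kind): give the bit pattern a type
// only at the point of use.  `kind` is the integer size in bytes.
std::int64_t boz_to_int(const Boz &b, int kind) {
  assert(kind == 1 || kind == 2 || kind == 4 || kind == 8);
  std::uint64_t bits = 0;
  for (char c : b.digits) {
    int d;
    if (c >= '0' && c <= '9') d = c - '0';
    else if (c >= 'a' && c <= 'f') d = c - 'a' + 10;
    else if (c >= 'A' && c <= 'F') d = c - 'A' + 10;
    else throw std::invalid_argument("not a BOZ digit");
    if (d >= b.radix) throw std::invalid_argument("digit out of range for radix");
    bits = bits * static_cast<std::uint64_t>(b.radix) + static_cast<std::uint64_t>(d);
  }
  const int width = kind * 8;
  if (width < 64) {
    const std::uint64_t mask = (std::uint64_t{1} << width) - 1;
    bits &= mask;  // assumed: keep only the low `width` bits
    if (bits & (std::uint64_t{1} << (width - 1)))
      bits |= ~mask;  // assumed: sign-extend into the 64-bit result
  }
  return static_cast<std::int64_t>(bits);
}

void boz_example() {
  assert(boz_to_int(Boz{"FF", 16}, 4) == 255);
  assert(boz_to_int(Boz{"FF", 16}, 1) == -1);
  assert(boz_to_int(Boz{"101", 2}, 4) == 5);
}
```

The point of the design is that the literal carries no integer type until a
helper is asked for one, so callers such as TRANSFER could request a different
interpretation.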

[Bug target/80210] ICE in in extract_insn, at recog.c:2311 on ppc64 for with __builtin_pow

2017-09-27 Thread bergner at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80210

--- Comment #16 from Peter Bergner  ---
While investigating the new failure in Comment 15, I modified the test case
slightly to move the #pragma to the beginning of the test case.  I found I get
another similar looking ICE, but which isn't the same as the bug reported in
Comment 15:

bergner@bns:~/gcc/BUGS/PR80210>
/home/bergner/gcc/build/gcc-fsf-mainline-pr80210-64-base/gcc/xgcc
-B/home/bergner/gcc/build/gcc-fsf-mainline-pr80210-64-base/gcc -O2 -S no-sqrt.i 
no-sqrt.i: In function ‘foo’:
no-sqrt.i:6:1: error: unrecognizable insn:
 }
 ^
(insn 6 3 7 2 (set (reg:DF 121 [  ])
(sqrt:DF (reg/v:DF 122 [ a ]))) "no-sqrt.i":5 -1
 (nil))
during RTL pass: vregs
no-sqrt.i:6:1: internal compiler error: in extract_insn, at recog.c:2304
0x101330c7 _fatal_insn(char const*, rtx_def const*, char const*, int, char
const*)
/home/bergner/gcc/gcc-fsf-mainline-pr80210-base/gcc/rtl-error.c:108
0x1013310b _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
/home/bergner/gcc/gcc-fsf-mainline-pr80210-base/gcc/rtl-error.c:116
0x1085747b extract_insn(rtx_insn*)
/home/bergner/gcc/gcc-fsf-mainline-pr80210-base/gcc/recog.c:2304
0x10555bff instantiate_virtual_regs_in_insn
/home/bergner/gcc/gcc-fsf-mainline-pr80210-base/gcc/function.c:1591
0x10555bff instantiate_virtual_regs
/home/bergner/gcc/gcc-fsf-mainline-pr80210-base/gcc/function.c:1959
0x10555bff execute
/home/bergner/gcc/gcc-fsf-mainline-pr80210-base/gcc/function.c:2008

After debugging this, I have found that this is a problem saving and restoring
the optab values, so basically the opposite of the problem we had before.

I have a patch that I am testing that fixes both new problems.

[Bug libstdc++/82346] [5.5 Regression] String is not detected as a part of std

2017-09-27 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82346

--- Comment #7 from Jonathan Wakely  ---
The condition for std::to_string being declared in gcc-5 is:

#if __cplusplus >= 201103L && defined(_GLIBCXX_USE_C99)

So presumably _GLIBCXX_USE_C99 is false. If you're using glibc 2.26 you might
have hit https://sourceware.org/bugzilla/show_bug.cgi?id=22146 and so need a
glibc fix so that libstdc++ correctly detects C99 support.
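
Until such a fix is installed, user code can sidestep the missing declaration.
A minimal sketch, assuming only that std::ostringstream is usable; the name
`to_string_fallback` is hypothetical and is not part of libstdc++:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical replacement for std::to_string that does not depend on the
// library's C99 detection.  Formatting follows operator<<, so floating-point
// output will not match std::to_string exactly.
template <typename T>
std::string to_string_fallback(const T &value) {
  std::ostringstream out;
  out << value;
  assert(!out.fail());
  return out.str();
}

std::string perfect_message() {
  return to_string_fallback(1 + 2 + 4 + 7 + 14) + " is a perfect number";
}
```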

[Bug target/80210] ICE in in extract_insn, at recog.c:2311 on ppc64 for with __builtin_pow

2017-09-27 Thread bergner at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80210

Peter Bergner  changed:

   What|Removed |Added

 Status|CLOSED  |ASSIGNED
 CC||schwab at gcc dot gnu.org
 Resolution|FIXED   |---

--- Comment #15 from Peter Bergner  ---
David and Andreas have reported that they are seeing ICEs on powerpc-aix and
powerpc-linux (32-bit) with the test case added as part of the fix for this
bug.  I didn't see this on my BE builds, because my build scripts were using
the --with-cpu=... configure command and using any -mcpu=... option will work
around the bug.  The bug isn't due to my earlier patch: the ICE existed before
it, we just didn't have a test case to notice it.

The problem they are seeing is due to a mismatch between TARGET_DEFAULT, which
contains MASK_PPC_GPOPT and the ISA flags for the default "powerpc64" cpu,
which does not contain MASK_PPC_GPOPT and how rs6000_option_override_internal()
decides which one to use.  The failure scenario is:

Early on, we call init_all_optabs(), which sets up a table describing
which patterns that generate some HW insns are "valid".  Before we call
init_all_optabs(), rs6000_option_override_internal() gets called with
global_init_p arg set to "true" and we basically set rs6000_isa_flags to
TARGET_DEFAULT.  This is because we do not have a -mcpu= value nor do we have
an "implicit_cpu", which forces us to use TARGET_DEFAULT.  With this,
init_all_optabs() thinks we can generate a HW sqrt, so it enables generating
its pattern.

Later, after we've scanned the entire file, we go to expand our function into
RTL and we reset our compiler options and we end up calling
rs6000_option_override_internal() again, but with global_init_p arg now false
and we encounter this code:

  struct cl_target_option *main_target_opt
= ((global_init_p || target_option_default_node == NULL)
   ? NULL : TREE_TARGET_OPTION (target_option_default_node));

This ends up setting main_target_opt to a non-NULL value, then:

  ...
  else if (main_target_opt != NULL && main_target_opt->x_rs6000_cpu_index >= 0)
{
  rs6000_cpu_index = cpu_index = main_target_opt->x_rs6000_cpu_index;
  have_cpu = true;
}

So now we act as if the user explicitly passed in a -mcpu= option, then:

  ...
  /* If we have a cpu, either through an explicit -mcpu= or if the
 compiler was configured with --with-cpu=, replace all of the ISA bits
 with those from the cpu, except for options that were explicitly set.  If
 we don't have a cpu, do not override the target bits set in
 TARGET_DEFAULT.  */
  if (have_cpu)
{
  rs6000_isa_flags &= ~set_masks;
  rs6000_isa_flags |= (processor_target_table[cpu_index].target_enable
   & set_masks);
}
  else
{
  /* If no -mcpu=, inherit any default options that were cleared via
 POWERPC_MASKS.  Originally, TARGET_DEFAULT was used to initialize
 target_flags via the TARGET_DEFAULT_TARGET_FLAGS hook.  When we switched
 to using rs6000_isa_flags, we need to do the initialization here.

 If there is a TARGET_DEFAULT, use that.  Otherwise fall back to using
 -mcpu=powerpc, -mcpu=powerpc64, or -mcpu=powerpc64le defaults.  */
  HOST_WIDE_INT flags = ((TARGET_DEFAULT) ? TARGET_DEFAULT
 : processor_target_table[cpu_index].target_enable);
  rs6000_isa_flags |= (flags & ~rs6000_isa_flags_explicit);
}

So the first time through here with global_init_p == true, have_cpu is set to
false and we get TARGET_DEFAULT.  The next time we come here, global_init_p ==
false and we set have_cpu to true because main_target_opt is non-NULL and the
cpu_index value is set to "powerpc64" (for -m64 compiles) or "powerpc" (for
-m32 compiles).  This causes us to now grab the ISA flags from:

  processor_target_table[cpu_index].target_enable

...instead of from TARGET_DEFAULT and neither "powerpc64" nor "powerpc" contain
the MASK_PPC_GPOPT flag, which leads us to ICE because the optabs allows us to
generate the HW sqrt pattern, but our ISA flags don't allow it.

This doesn't affect LE builds, because it has a TARGET_DEFAULT value that
matches the "powerpc64le" default masks.  We also enforce passing a
-mcpu=power8 option when the user doesn't explicitly use one, so again, not a
problem.

This also doesn't affect --target=powerpc-linux builds or
--target=powerpc64-linux builds that default to 32-bit binaries, because we use
a value of TARGET_DEFAULT == 0 (for both -m32 and -m64), so the first time
through rs6000_option_override_internal(), we end up using
processor_target_table[cpu_index].target_enable right from the beginning.
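
The two-pass mismatch described in this comment can be modelled in a few lines.
This is a deliberately simplified sketch, not the rs6000 code: `Config`,
`isa_flags_for_pass`, and the mask values are hypothetical, and the model
assumes the second pass always behaves as if a cpu had been selected.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t MASK_GPOPT = 1u << 0;  // stand-in for MASK_PPC_GPOPT
constexpr std::uint64_t MASK_OTHER = 1u << 1;  // stand-in for any other ISA bit

struct Config {
  std::uint64_t target_default;   // models TARGET_DEFAULT
  std::uint64_t cpu_table_flags;  // models processor_target_table[...].target_enable
};

// Models which flag source each call to the option override ends up using.
std::uint64_t isa_flags_for_pass(const Config &c, bool global_init_p) {
  const bool have_cpu = !global_init_p;  // simplification of the main_target_opt check
  if (have_cpu)
    return c.cpu_table_flags;
  return c.target_default ? c.target_default : c.cpu_table_flags;
}

// True when the optabs (built after pass 1) allow a HW sqrt that the ISA
// flags in effect during RTL expansion (pass 2) do not.
bool optabs_mismatch(const Config &c) {
  const bool optab_has_sqrt = (isa_flags_for_pass(c, true) & MASK_GPOPT) != 0;
  const bool isa_has_sqrt = (isa_flags_for_pass(c, false) & MASK_GPOPT) != 0;
  return optab_has_sqrt && !isa_has_sqrt;
}

void rs6000_model_example() {
  assert(optabs_mismatch(Config{MASK_GPOPT | MASK_OTHER, MASK_OTHER}));   // BE 64-bit case
  assert(!optabs_mismatch(Config{0, MASK_OTHER}));                        // TARGET_DEFAULT == 0
  assert(!optabs_mismatch(Config{MASK_GPOPT | MASK_OTHER,
                                 MASK_GPOPT | MASK_OTHER}));              // LE: masks agree
}
```

The three cases in `rs6000_model_example` correspond to the failing BE
configuration and the two unaffected configurations described above.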

[Bug fortran/81509] Wrong compilation error: iand/ieor/ior + boz + -std=f2008

2017-09-27 Thread dominiq at lps dot ens.fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81509

--- Comment #5 from Dominique d'Humieres  ---
pr45513 and pr54072 could be duplicates.

[Bug libstdc++/82346] [5.5 Regression] String is not detected as a part of std

2017-09-27 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82346

Jonathan Wakely  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |WORKSFORME

--- Comment #6 from Jonathan Wakely  ---
This must be an Ubuntu bug. It works fine here.

[Bug rtl-optimization/82338] valgrind error in inherit_in_ebb

2017-09-27 Thread dcb314 at hotmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82338

--- Comment #1 from David Binderman  ---
12 hours reducing leads to this C++ code:

extern "C" {
void a();
void *memset(void *, int, unsigned long);
}
struct b {
  int c;
  int d;
} e[5000], *f;
int g;
int h;
int i;
int j, k;
void l(int);
int m;
int o;
void load() {
  int n;
  memset(e, 0, sizeof(e));
  for (; k;)
;
  a();
  for (; m < o; m++)
for (; i; n++) {
  e[j].c = g;
  if (f[h].d && n == i)
l(-1);
}
}

[Bug lto/82302] LTO producing bad code

2017-09-27 Thread krzysio.kurek at wp dot pl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82302

--- Comment #9 from krzysio.kurek at wp dot pl ---
I think I located the issue. It works fine on my machine, but I found an
error using glslangValidator.
Please try pulling and compiling again.

[Bug target/68924] No intrinsic for x86 `MOVQ m64, %xmm` in 32bit mode.

2017-09-27 Thread peter at cordes dot ca
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68924

--- Comment #3 from Peter Cordes  ---
(In reply to Marc Glisse from comment #2)
> Does anything bad happen if you remove the #ifdef/#endif for
> _mm_cvtsi64_si128? (2 files in the testsuite would need updating for a
> proper patch)

It's just a wrapper for

_mm_cvtsi64_si128 (long long __A) {
  return _mm_set_epi64x (0, __A);
}

and _mm_set_epi64x is already available in 32-bit mode.

I tried using _mm_set_epi64x(0, i) (https://godbolt.org/g/24AYPk), and got the
expected results (same as with _mm_loadl_epi64());

__m128i movq_test(uint64_t *p) {
  return _mm_set_epi64x( 0, *p );
}

movl    4(%esp), %eax
vmovq   (%eax), %xmm0
ret

For the test where we shift before movq, it still uses 32-bit integer
double-precision shifts, stores to the stack, then vmovq (instead of optimizing
to  vmovq / vpsllq)


For the reverse, we get:

long long extract(__m128i v) {
return ((__v2di)v)[0];
}

subl    $28, %esp
vmovq   %xmm0, 8(%esp)
movl    8(%esp), %eax
movl    12(%esp), %edx
addl    $28, %esp
ret

MOVD / PEXTRD might be better, but gcc does handle it.  It's all using syntax
that's available in 32-bit mode, not a special built-in.
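
A small round-trip sketch of the same idea, using only SSE2 intrinsics that are
declared in both 32-bit and 64-bit mode. The helper names are hypothetical, and
the sketch assumes an x86 target with SSE2 enabled; it says nothing about which
instructions the compiler will pick.

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics

// Load a 64-bit integer into the low half of a vector, zeroing the high half.
__m128i load_low_qword(const std::uint64_t *p) {
  return _mm_set_epi64x(0, static_cast<long long>(*p));
}

// Extract the low 64 bits without the 64-bit-only _mm_cvtsi128_si64.
std::uint64_t extract_low_qword(__m128i v) {
  std::uint64_t out;
  _mm_storel_epi64(reinterpret_cast<__m128i *>(&out), v);
  return out;
}

void movq_roundtrip_example() {
  const std::uint64_t x = 0x0123456789abcdefULL;
  const __m128i v = load_low_qword(&x);
  assert(extract_low_qword(v) == x);
  // The upper 64 bits were zeroed by _mm_set_epi64x(0, ...).
  assert(extract_low_qword(_mm_unpackhi_epi64(v, v)) == 0);
}
```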

I don't think it's helpful to disable the 64-bit integer intrinsics for 32-bit
mode, even though they are no longer always single instructions.  I guess it
could be worse if someone used it without thinking, assuming it would be the
same cost as MOVD, and didn't really need the full 64 bits.  In that case, a
compile-time error would prompt them to port more optimally to 32-bit.  But
it's not usually gcc's job to refuse to compile code that might be sub-optimal!

[Bug libstdc++/82346] [5.5 Regression] String is not detected as a part of std

2017-09-27 Thread krzysio.kurek at wp dot pl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82346

--- Comment #5 from krzysio.kurek at wp dot pl ---
$ g++-5 -std=c++11 main.cpp -o string -v
Using built-in specs.
COLLECT_GCC=g++-5
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/5/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 5.4.1-12ubuntu4'
--with-bugurl=file:///usr/share/doc/gcc-5/README.Bugs
--enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr
--program-suffix=-5 --enable-shared --enable-linker-build-id
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
--libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu
--enable-libstdcxx-debug --enable-libstdcxx-time=yes
--with-default-libstdcxx-abi=new --enable-gnu-unique-object
--disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie
--with-system-zlib --disable-browser-plugin --enable-java-awt=gtk
--enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-5-amd64/jre
--enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-5-amd64
--with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-5-amd64
--with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar
--enable-objc-gc --enable-multiarch --disable-werror --with-arch-32=i686
--with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib
--with-tune=generic --enable-checking=release --build=x86_64-linux-gnu
--host=x86_64-linux-gnu --target=x86_64-linux-gnu   
Thread model: posix 
gcc version 5.4.1 20170906 (Ubuntu 5.4.1-12ubuntu4) 
COLLECT_GCC_OPTIONS='-std=c++11' '-o' 'string' '-v' '-shared-libgcc'
'-mtune=generic' '-march=x86-64'
 /usr/lib/gcc/x86_64-linux-gnu/5/cc1plus -quiet -v -imultiarch x86_64-linux-gnu
-D_GNU_SOURCE main.cpp -quiet -dumpbase main.cpp -mtune=generic -march=x86-64
-auxbase main -std=c++11 -version -fstack-protector-strong -Wformat
-Wformat-security -o /tmp/ccUSBFkt.s
GNU C++11 (Ubuntu 5.4.1-12ubuntu4) version 5.4.1 20170906 (x86_64-linux-gnu)
compiled by GNU C version 5.4.1 20170906, GMP version 6.1.2, MPFR
version 3.1.6-rc1, MPC version 1.0.3
warning: MPFR header version 3.1.6-rc1 differs from library version 3.1.6.  
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072  
ignoring duplicate directory "/usr/include/x86_64-linux-gnu/c++/5"  
ignoring nonexistent directory "/usr/local/include/x86_64-linux-gnu"
ignoring nonexistent directory
"/usr/lib/gcc/x86_64-linux-gnu/5/../../../../x86_64-linux-gnu/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/include/c++/5
 /usr/include/x86_64-linux-gnu/c++/5
 /usr/include/c++/5/backward
 /usr/lib/gcc/x86_64-linux-gnu/5/include
 /usr/local/include
 /usr/lib/gcc/x86_64-linux-gnu/5/include-fixed
 /usr/include/x86_64-linux-gnu
 /usr/include
End of search list.
GNU C++11 (Ubuntu 5.4.1-12ubuntu4) version 5.4.1 20170906 (x86_64-linux-gnu)
compiled by GNU C version 5.4.1 20170906, GMP version 6.1.2, MPFR
version 3.1.6-rc1, MPC version 1.0.3
warning: MPFR header version 3.1.6-rc1 differs from library version 3.1.6.
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: 4fa6e505be5bb1fff9039a541d8268ee
main.cpp: In function ‘int main()’:
main.cpp:6:25: error: ‘to_string’ is not a member of ‘std’
   std::string perfect = std::to_string(1+2+4+7+14) + " is a perfect number";
 ^

[Bug libstdc++/82346] [5.5 Regression] String is not detected as a part of std

2017-09-27 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82346

Andrew Pinski  changed:

   What|Removed |Added

  Component|c++ |libstdc++
   Target Milestone|--- |5.5
Summary|String is not detected as a |[5.5 Regression] String is
   |part of std |not detected as a part of
   ||std

--- Comment #3 from Andrew Pinski  ---
Can you provide the exact output of g++ then?

[Bug libstdc++/82346] [5.5 Regression] String is not detected as a part of std

2017-09-27 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82346

--- Comment #4 from Andrew Pinski  ---
(In reply to Andrew Pinski from comment #3)
> Can you provide the exact output of g++ then?

Can you provide the exact output of g++ -v then?


Sorry for the typo.

[Bug c++/82346] String is not detected as a part of std

2017-09-27 Thread krzysio.kurek at wp dot pl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82346

krzysio.kurek at wp dot pl changed:

   What|Removed |Added

 Status|RESOLVED|UNCONFIRMED
 Resolution|INVALID |---

--- Comment #2 from krzysio.kurek at wp dot pl ---
Oh yeah I could have mentioned, my bad.
-std=c++11/-std=c++14 do not fix the issue. It only fails on 5.4.1, and works
on 5.4.0.

[Bug c++/82346] String is not detected as a part of std

2017-09-27 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82346

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |INVALID

--- Comment #1 from Andrew Pinski  ---
This compiles for me with GCC 7 without any -std=* options.  GCC 6 and above
default to C++14 while GCC 5 defaults to C++03 so you might need -std=c++11 or
-std=gnu++11 to make std::to_string work correctly.

[Bug c++/82346] New: String is not detected as a part of std

2017-09-27 Thread krzysio.kurek at wp dot pl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82346

Bug ID: 82346
   Summary: String is not detected as a part of std
   Product: gcc
   Version: 5.4.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: krzysio.kurek at wp dot pl
  Target Milestone: ---

Most basic code fails to compile.

#include 
#include 

int main ()
{
  std::string perfect = std::to_string(1+2+4+7+14) + " is a perfect number";
  std::cout << perfect << '\n';
  return 0;
}

[Bug target/82339] Inefficient movabs instruction

2017-09-27 Thread peter at cordes dot ca
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82339

--- Comment #5 from Peter Cordes  ---
(In reply to Richard Biener from comment #2)
> I always wondered if it is more efficient to have constant pools per function
> in .text so we can do %rip relative loads with short displacement?

There's no rel8 encoding for RIP-relative; it's always RIP+rel32, so this
doesn't save code-size.  (AMD64 hacked it in by repurposing one of the two
redundant ways to encode a 32-bit absolute address with no base or index
register; the ModRM machine-code encoding is otherwise the same between x86-32
and x86-64.)

> I suppose the assembler could even optimize things if there's the desired
> constant somewhere near in the code itself... (in case data loads from icache
> do not occur too much of a penalty).

There's no penalty for loads AFAIK, only stores to addresses near RIP are
snooped and cause self-modifying-code machine clears.

Code will often be hot in L2 cache as well as L1I, so an L1D miss could hit
there.  But L1dTLB is separate from L1iTLB, so you could TLB miss even when
loading from the instruction you're running.

(The L2TLB is usually a victim cache, IIRC, so a TLB miss that loaded the
translation into the L1iTLB doesn't also put it into L2TLB.)

>  The assembler could also replace
> .palign space before function start with (small) constant(s).

This could be a win in some cases, if L1D pressure is low or there wasn't any
locality with other constants anyway.  If there could have been locality,
you're just wasting space in L1D by having your data spread out across more
cache lines.

But in general on x86, it's probably not a good strategy.


BTW, gcc could do a lot better with vector constants.  e.g. set1_ps(1.0f) could
compile to a vbroadcastss load (which is the same cost as a normal vmovaps). 
But instead it actually repeats the 1.0f in memory 8 times.  That's useful if
you want to use it as a memory operand, because before AVX512 you can't have
broadcast memory operands to ALU instructions.  But if it's only ever loaded
ahead of a loop, a broadcast load or a PMOVZX load can save a lot of space.  In
a function with multiple vector constants, this is the difference between one
vs. multiple cache lines for all its data. 

(vpbroadcastd/q, ss/sd, and 128-bit are handled in the load ports on Intel and
AMD, but vector PMOVZX/SX with a memory operand is still a micro-fused
load+ALU.  Still, it could easily be worth it for e.g.
_mm256_set_epi32(1,2,3,4,5,6,7,8), storing that as .byte 1,2,3,4,5,6,7,8.)

The downside is lost opportunities for different functions to share the same
constant like with string-literal deduplication.  If one function wants the
full constant in memory for use as a memory operand, it's probably better for
all functions to use that copy.  Except that putting all the constants for a
given function into a couple cache lines is good for locality when it runs.  If
the full copy somewhere else isn't generally hot when a function that could use
a broadcast or pmovzx/pmovsx load runs, it might be better for it to use a
separate copy stored with the constants it does touch.

[Bug libfortran/66756] libgfortran: ThreadSanitizer: lock-order-inversion

2017-09-27 Thread tkoenig at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66756

--- Comment #12 from Thomas Koenig  ---
Correction... the patch does not work with a simple
example such as

program main
  !$OMP PARALLEL NUM_THREADS(4)
  print *,"Hello, world"
  !$OMP END PARALLEL 
end program main

Some more digging to do...

[Bug fortran/81509] Wrong compilation error: iand/ieor/ior + boz + -std=f2008

2017-09-27 Thread kargl at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81509

--- Comment #4 from kargl at gcc dot gnu.org ---
A patch has been submitted.  See

https://gcc.gnu.org/ml/fortran/2017-09/msg00124.html

[Bug tree-optimization/82337] [5/6/7/8 Regression] ICE: SSA corruption at tree-ssa-coalesce.c:1010

2017-09-27 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82337

--- Comment #9 from Bill Schmidt  ---
Revised and tested patch posted here: 
https://gcc.gnu.org/ml/gcc-patches/2017-09/msg01836.html

[Bug testsuite/82324] Problem in new trunk test case gfortran.dg/promotion_4.f90

2017-09-27 Thread janus at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82324

janus at gcc dot gnu.org changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED
   Target Milestone|--- |8.0

--- Comment #7 from janus at gcc dot gnu.org ---
I hope all failures should be fixed with r253214. If not, please reopen.

[Bug libfortran/66756] libgfortran: ThreadSanitizer: lock-order-inversion

2017-09-27 Thread tkoenig at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66756

--- Comment #11 from Thomas Koenig  ---
Created attachment 42250
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42250&action=edit
Proposed patch

This patch is an attempt at getting rid of the lock-order
inversion.  It seems to do the right thing, and survives
both regression-testing and the thread sanitizer.

It is not yet complete (comments are not adjusted).

I would be grateful if somebody had a way to stress-test it.
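
For readers unfamiliar with the term, a generic sketch of how a lock-order
inversion is usually avoided when two locks must be held together. This is not
the libgfortran code or the attached patch; `Unit` and `move_value` are
hypothetical names chosen for the illustration.

```cpp
#include <cassert>
#include <mutex>

struct Unit {
  std::mutex lock;
  int value = 0;
};

// If one thread took from.lock then to.lock while another took them in the
// opposite order, the two could deadlock.  std::lock acquires both with a
// deadlock-avoidance algorithm, so the call order no longer matters.
void move_value(Unit &from, Unit &to, int amount) {
  if (&from == &to)
    return;  // locking the same mutex twice would be undefined behaviour
  std::lock(from.lock, to.lock);
  std::lock_guard<std::mutex> hold_from(from.lock, std::adopt_lock);
  std::lock_guard<std::mutex> hold_to(to.lock, std::adopt_lock);
  from.value -= amount;
  to.value += amount;
}

void lock_order_example() {
  Unit a, b;
  a.value = 10;
  move_value(a, b, 3);
  assert(a.value == 7 && b.value == 3);
  move_value(b, a, 3);  // opposite argument order is equally safe
  assert(a.value == 10 && b.value == 0);
}
```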

[Bug target/82339] Inefficient movabs instruction

2017-09-27 Thread peter at cordes dot ca
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82339

--- Comment #4 from Peter Cordes  ---
(In reply to Jakub Jelinek from comment #0)
> At least on i7-5960X in the following testcase:
> 
> baz is fastest as well as shortest.
> So I think we should consider using movl $cst, %edx; shlq $shift, %rdx
> instead of movabsq $(cst << shift), %rdx.
> 
> Unfortunately I can't find in Agner Fog MOVABS and for MOV r64,i64 there is
> too little information, so it is unclear on which CPUs it is beneficial.

Agner uses Intel syntax, where imm64 doesn't have a special mnemonic.  It's
part of the  mov r,i  entry in the tables.  But those tables are throughput for
a flat sequence of the instruction repeated many times, not mixed with others
where front-end effects can be different.  Agner probably didn't actually test
mov r64,imm64, because its throughput is different when tested in a long
sequence (not in a small loop).  According to
http://users.atw.hu/instlatx64/GenuineIntel00506E3_Skylake2_InstLatX64.txt, a
regular desktop Skylake has 0.64c throughput for mov r64, imm64, vs. 0.25 for
mov r32, imm32.  (They don't test mov r/m64, imm32, the 7-byte encoding for
something like mov rax,-1).

Skylake with up-to-date microcode (including all SKX CPUs) disables the loop
buffer (LSD), and has to read uops from the uop cache every time even in short
loops.

Uop-cache effects could be a problem for instructions with a 64-bit immediate. 
Agner only did detailed testing for Sandybridge; it's likely that Skylake still
mostly works the same (although the uop cache read bandwidth is higher).

mov r64, imm64 takes 2 entries in the uop cache (because of the 64-bit
immediate that's outside the signed 32-bit range), and takes 2 cycles to read
from the uop cache, according to Agner's Table 9.1 in his microarch pdf.  It
can borrow space from another entry in the same uop cache line, but still takes
extra cycles to read.

See
https://stackoverflow.com/questions/46433208/which-is-faster-imm64-or-m64-for-x86-64
for an SO question the other day about loading constants from memory vs. imm64.
 (Although I didn't have anything very wise to say there, just that it depends
on surrounding code as always!)

> Peter, any information on what the MOV r64,i64 latency/throughput on various
> CPUs vs. MOV r32,i32; SHL r64,i8 is?

When not bottlenecked on the front-end,  mov r64,i64  is a single ALU uop with
1c latency.  I think it's pretty much universal that it's the best choice when
you bottleneck on anything else.

Some loops *do* bottleneck on the front-end, though, especially without
unrolling.  But then it comes down to whether we have a uop-cache read
bottleneck, or a decode bottleneck, or an issue bottleneck (4 fused-domain uops
per clock renamed/issued).  For issue/retire bandwidth mov/shl is 2 uops
instead of 1.

But for code that bottlenecks on reading the uop-cache, it's really hard to say
if one is better in general.  I think if the imm64 can borrow space in other
uops in the cache line, it's better for uop-cache density than mov/shl.  Unless
the extra code-size means one fewer instruction fits into a uop cache line that
wasn't nearly full (6 uops).

Front-end stuff is *very* context-sensitive.  :/  Calling a very short
non-inline function from a tiny loop is probably making the uop-cache issues
worse, and is probably favouring the mov/shift over the mov r64,imm64 approach
more than you'd see as part of a larger contiguous block.

I *think*  mov r64,imm64  should still generally be preferred in most cases. 
Usually the issue queue (IDQ) between the uop cache and the issue/rename stage
can absorb uop-cache read bubbles.

A constant pool might be worth considering if code-size is getting huge
(average instruction length much greater than 4).

Normally of course you'd really want to hoist an imm64 out of a loop, if you
have a spare register.  When optimizing small loops, you can usually avoid
front-end bottlenecks.  It's a lot harder for medium-sized loops involving
separate functions.  I'm not confident this noinline case is very
representative of real code.

---

Note that in this special case, you can save another byte of code by using 
ror rax  (implicit by-one encoding).

Also worth considering for tune=sandybridge or later: xor eax,eax / bts rax,
63.   2B + 5B = 7B.  BTS has 0.5c throughput, and xor-zeroing doesn't need an
ALU on SnB-family (so it has zero latency; the BTS can execute right away even
if it issues in the same cycle as xor-zeroing).  BTS runs on the same ports as
shifts (p0/p6 in HSW+, or p0/p5 in SnB/IvB).  On older Intel, it has 1 per
clock throughput for the reg,imm form.  On AMD, it's 2 uops, with 1c throughput
(0.5c on Ryzen), so it's not bad if used on AMD CPUs, but it doesn't look good
for tune=generic.
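
To make the equivalence of these sequences concrete, here is a minimal C sketch that models only the register results. It assumes the special case under discussion is the constant 1ULL << 63 built from a 32-bit mov; the function names are illustrative and not taken from GCC, and the model says nothing about timing or encoding size.

```c
#include <stdint.h>

/* Rotate right, modelling the result of x86 ROR on a 64-bit register. */
static uint64_t ror64(uint64_t v, unsigned n)
{
  n &= 63;
  return n ? (v >> n) | (v << (64 - n)) : v;
}

/* Set one bit, modelling only the register result of BTS (not the carry flag). */
static uint64_t bts64(uint64_t v, unsigned bit)
{
  return v | (UINT64_C(1) << (bit & 63));
}

/* mov eax, 1 ; shl rax, 63   (writing eax zero-extends into rax) */
uint64_t via_mov_shl(void)
{
  uint64_t rax = (uint32_t)1;
  return rax << 63;
}

/* mov eax, 1 ; ror rax   (implicit by-one rotate) */
uint64_t via_mov_ror(void)
{
  uint64_t rax = (uint32_t)1;
  return ror64(rax, 1);
}

/* xor eax, eax ; bts rax, 63 */
uint64_t via_xor_bts(void)
{
  uint64_t rax = 0;
  return bts64(rax, 63);
}
```

All three functions return 0x8000000000000000, which is what a single mov r64, imm64 would load.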

At -Os, you could consider  or eax, -1;  shl rax,63.  (Also 7 bytes, and works
for constants with multiple consecutive high-bits set). The false dependency on
the old RAX value is often not a bottleneck, and gcc 

[Bug target/69493] Poor code generation for return of struct containing vectors on PPC64LE

2017-09-27 Thread bergner at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69493

Peter Bergner  changed:

   What|Removed |Added

 CC||bergner at gcc dot gnu.org

--- Comment #6 from Peter Bergner  ---
A simpler test case that shows the same problem when compiling for POWER8. When
compiling for POWER9, we get the code we want/expect:

bergner@pike:~/gcc/BUGS/PR70053$ cat pr69493-2.c 
typedef struct
{
  __vector double vx0;
  __vector double vx1;
} vec_t;

vec_t
foo (__vector double a, __vector double b)
{
  vec_t result;
  result.vx0 = a;
  result.vx1 = b;
  return result;
}

bergner@pike:~/gcc/BUGS/PR70053$
/home/bergner/gcc/build/gcc-fsf-mainline-pr70053-debug/gcc/xgcc
-B/home/bergner/gcc/build/gcc-fsf-mainline-pr70053-debug/gcc -S -O2
-mcpu=power8 pr69493-2.c 
bergner@pike:~/gcc/BUGS/PR70053$ cat pr69493-2.s 
...
foo:
addi 8,1,-96
li 10,32
xxpermdi 34,34,34,2
xxpermdi 35,35,35,2
li 9,48
stxvd2x 34,8,10
stxvd2x 35,8,9
lxvd2x 34,8,10
lxvd2x 35,8,9
xxpermdi 34,34,34,2
xxpermdi 35,35,35,2
blr


bergner@pike:~/gcc/BUGS/PR70053$
/home/bergner/gcc/build/gcc-fsf-mainline-pr70053-debug/gcc/xgcc
-B/home/bergner/gcc/build/gcc-fsf-mainline-pr70053-debug/gcc -S -O2
-mcpu=power9 pr69493-2.c 

bergner@pike:~/gcc/BUGS/PR70053$ cat pr69493-2.s 
...
foo:
blr

[Bug fortran/82258] [8 regression] allocate_zerosize_3.f fails since r251949

2017-09-27 Thread clyon at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82258

--- Comment #9 from Christophe Lyon  ---

I get:
   1   2   1   0  -2  -3  -4
   3   4   5   0   7   8   9

[Bug c++/82345] low performance (comparing to clang)

2017-09-27 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82345

Jonathan Wakely  changed:

   What|Removed |Added

 Status|WAITING |NEW

--- Comment #5 from Jonathan Wakely  ---
(In reply to Eugene from comment #3)
> Created attachment 42249 [details]
> source code

Thanks, GCC does indeed perform worse for the version using boost::string_view.

[Bug c++/82345] low performance (comparing to clang)

2017-09-27 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82345

--- Comment #4 from Jonathan Wakely  ---
When I compare the performance of this similar program on a text file of 4
million lines I see gcc performs slightly better:

#include <experimental/string_view>
#include <fstream>
#include <string>

int main(int , char**argv) {
  std::ifstream in(argv[1]);

  std::string line;
  while (std::getline(in, line)) {
    auto pos = std::experimental::string_view(line).find("http");
  }
}

[Bug c++/82345] low performance (comparing to clang)

2017-09-27 Thread claprix at yandex dot ru
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82345

--- Comment #3 from Eugene  ---
Created attachment 42249
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42249=edit
source code

[Bug c++/82345] low performance (comparing to clang)

2017-09-27 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82345

Jonathan Wakely  changed:

   What|Removed |Added

 Status|UNCONFIRMED |WAITING
   Last reconfirmed||2017-09-27
 Ever confirmed|0   |1

--- Comment #2 from Jonathan Wakely  ---
Please attach the source here, don't link to somewhere else. Compress it if
needed, or better still, reduce it:
https://gcc.gnu.org/wiki/A_guide_to_testcase_reduction

https://gcc.gnu.org/bugs/

[Bug c++/82345] low performance (comparing to clang)

2017-09-27 Thread claprix at yandex dot ru
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82345

--- Comment #1 from Eugene  ---
Source file https://yadi.sk/d/FqXH-4Y63NGeSw

[Bug c++/82345] New: low performance (comparing to clang)

2017-09-27 Thread claprix at yandex dot ru
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82345

Bug ID: 82345
   Summary: low performance (comparing to clang)
   Product: gcc
   Version: 7.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: claprix at yandex dot ru
  Target Milestone: ---

String search is slow for this code.

$ g++ -O2 -DNDEBUG -std=c++14 gcc_perf_buf.cc && time ./a.out shodan/huge01.txt 

real    0m0.470s
user    0m0.367s
sys     0m0.104s
$ clang++ -O2 -DNDEBUG -std=c++14 gcc_perf_buf.cc && time ./a.out
shodan/huge01.txt 

real    0m0.248s
user    0m0.179s
sys     0m0.069s

$ gcc --version
gcc (Ubuntu 7.2.0-7ubuntu1) 7.2.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 

$ lsb_release -d
Description:    Ubuntu Artful Aardvark (development branch)

[Bug c++/63392] poor error recovery with missing typename

2017-09-27 Thread egallager at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63392

Eric Gallager  changed:

   What|Removed |Added

   Keywords||error-recovery
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2017-09-27
 CC||egallager at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #1 from Eric Gallager  ---

Confirmed.

[Bug target/82341] [8 regression] i386/pr80732.c fail

2017-09-27 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82341

Jakub Jelinek  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 CC||jakub at gcc dot gnu.org
 Resolution|--- |FIXED

--- Comment #2 from Jakub Jelinek  ---
Can't reproduce this.
I believe this test used to FAIL since r252976 till r253100, but shouldn't be
broken anymore.

[Bug tree-optimization/82337] [5/6/7/8 Regression] ICE: SSA corruption at tree-ssa-coalesce.c:1010

2017-09-27 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82337

--- Comment #8 from Bill Schmidt  ---
Created attachment 42248
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42248=edit
Proposed patch

Here's what I'm testing -- looks like it fixes this particular case.

[Bug tree-optimization/82337] [5/6/7/8 Regression] ICE: SSA corruption at tree-ssa-coalesce.c:1010

2017-09-27 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82337

--- Comment #7 from Bill Schmidt  ---
I think we can do something simpler by just keeping these abnormal SSA names
out of the basis chains in the table.  Working on a patch.

[Bug target/82342] [8 regression] i386/pr82260-2.c fail

2017-09-27 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82342

Jakub Jelinek  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2017-09-27
   Assignee|unassigned at gcc dot gnu.org  |jakub at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #2 from Jakub Jelinek  ---
Created attachment 42247
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42247=edit
gcc8-pr82342.patch

Untested fix.  BMI2 sarx is not something the test intends to test, so disable
it.

[Bug rtl-optimization/82344] New: [8 Regression] SPEC CPU2006 435.gromacs ~10% performance regression with trunk@250855

2017-09-27 Thread alexander.nesterovskiy at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82344

Bug ID: 82344
   Summary: [8 Regression] SPEC CPU2006 435.gromacs ~10%
performance regression with trunk@250855
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: alexander.nesterovskiy at intel dot com
  Target Milestone: ---

Created attachment 42246
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42246=edit
r250854 vs r250855 generated code comparison

Compilation options that affect the regression: "-Ofast -march=core-avx2
-mfpmath=sse"

The regression happened after r250855, though it looks like this commit is not
guilty by itself but reveals something in other stages.

Changes in the 123t.reassoc1 stage lead to slightly different code generation
during the stages that follow it.

Place of interest is in "inl1130" subroutine (file "innerf.f") - it's a part of
a big loop with 9 similar expressions with 4-byte float variables:
---
y1 = 1.0/sqrt(x1)
y2 = 1.0/sqrt(x2)
y3 = 1.0/sqrt(x3)
y4 = 1.0/sqrt(x4)
y5 = 1.0/sqrt(x5)
y6 = 1.0/sqrt(x6)
y7 = 1.0/sqrt(x7)
y8 = 1.0/sqrt(x8)
y9 = 1.0/sqrt(x9)
---

When compiled with "-ffast-math", 1/sqrt is calculated with a "vrsqrtss"
instruction followed by a Newton-Raphson step with four "vmulss", one "vaddss"
and two constants used.
Like here (part of r250854 code):
---
vrsqrtss xmm12, xmm12, xmm7
vmulss   xmm7,  xmm12, xmm7
vmulss   xmm0,  xmm12, DWORD PTR .LC2[rip]
vmulss   xmm8,  xmm7,  xmm12
vaddss   xmm5,  xmm8,  DWORD PTR .LC1[rip]
vmulss   xmm1,  xmm5,  xmm0
---
Input values (x1-x9) are in xmm registers mostly (x2 and x7 in memory), output
values (y1-y9) are in xmm registers.
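
For reference, the arithmetic of that six-instruction sequence can be written out in C. This is a sketch under assumptions: the values of .LC1 and .LC2 are not shown above, and -3.0 and -0.5 are assumed here because they give the standard Newton-Raphson refinement for 1/sqrt(x). The function name is made up, and the caller supplies the rough estimate that vrsqrtss would produce.

```c
/* One Newton-Raphson refinement of an estimate r0 of 1/sqrt(x),
   with one C statement per arithmetic instruction in the listing. */
float rsqrt_refine(float x, float r0)
{
  const float c_add = -3.0f;  /* assumed value of .LC1 */
  const float c_mul = -0.5f;  /* assumed value of .LC2 */

  float t = r0 * x;           /* vmulss: r0 * x */
  float h = r0 * c_mul;       /* vmulss: -0.5 * r0 */
  t = t * r0;                 /* vmulss: x * r0 * r0 */
  t = t + c_add;              /* vaddss: x * r0^2 - 3 */
  return t * h;               /* vmulss: 0.5 * r0 * (3 - x * r0^2) */
}
```

With x = 4 and a deliberately poor estimate r0 = 0.49, one step gives roughly 0.4997 against the exact 0.5. Each y needs its own temporaries for t and h, which is consistent with the register pressure described in this report.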

After r250855 the .LC2 constant goes into xmm7 and x7 also goes into an xmm
register. This leads to a lack of temporary registers and, as a result, worse
instruction interleaving.
See attached picture with part of assembly listings where corresponding
y=1/sqrt parts are painted the same color.

In the end these 9 lines of code execute about twice as slowly, which leads to
a ~10% performance regression for the whole test.

I've made two independent attempts to change code in order to verify the above.

1. To be sure that we lose performance exactly in this part of the loop, I just
pasted ~60 assembly instructions from the previous revision into the new one
(after proper renaming of course). This helped to restore performance.

2. To be sure that the problem is due to a lack of temporary registers I moved
calculation of 1/sqrt for one last line into function call. Like here:
---
... in other module to disable inlining:
function myrsqrt(x)
  implicit none
  real*4 x
  real*4 myrsqrt
  myrsqrt = 1.0/sqrt(x);
  return
end function myrsqrt

...

y1 = 1.0/sqrt(x1)
y2 = 1.0/sqrt(x2)
y3 = 1.0/sqrt(x3)
y4 = 1.0/sqrt(x4)
y5 = 1.0/sqrt(x5)
y6 = 1.0/sqrt(x6)
y7 = 1.0/sqrt(x7)
y8 = 1.0/sqrt(x8)
y9 = myrsqrt(x9)
---
Even with call/ret overhead this also helped to restore performance since it
freed some registers.

[Bug target/82012] [8 Regression] libitm build fails for s390x-linux-gnu

2017-09-27 Thread krebbel at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82012

Andreas Krebbel  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #12 from Andreas Krebbel  ---
Commit in Comment #9

[Bug target/68924] No intrinsic for x86 `MOVQ m64, %xmm` in 32bit mode.

2017-09-27 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68924

--- Comment #2 from Marc Glisse  ---
Does anything bad happen if you remove the #ifdef/#endif for _mm_cvtsi64_si128?
(2 files in the testsuite would need updating for a proper patch)

[Bug c++/82159] [6/7/8 Regression] ICE: in assign_temp, at function.c:961

2017-09-27 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82159

--- Comment #4 from Jakub Jelinek  ---
Author: jakub
Date: Wed Sep 27 14:19:57 2017
New Revision: 253230

URL: https://gcc.gnu.org/viewcvs?rev=253230=gcc=rev
Log:
PR c++/82159
* gimplify.c (gimplify_modify_expr): Don't optimize away zero sized
lhs from calls if the lhs has addressable type.

* g++.dg/opt/pr82159.C: New test.

Added:
trunk/gcc/testsuite/g++.dg/opt/pr82159.C
Modified:
trunk/gcc/ChangeLog
trunk/gcc/gimplify.c
trunk/gcc/testsuite/ChangeLog

[Bug c/82340] volatile ignored in compound literal

2017-09-27 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82340

Jakub Jelinek  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2017-09-27
   Assignee|unassigned at gcc dot gnu.org  |jakub at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #3 from Jakub Jelinek  ---
Created attachment 42245
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42245=edit
gcc8-pr82340.patch

Untested fix.

[Bug target/82339] Inefficient movabs instruction

2017-09-27 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82339

Alexander Monakov  changed:

   What|Removed |Added

 CC||amonakov at gcc dot gnu.org

--- Comment #3 from Alexander Monakov  ---
In addition to Agner Fog's manuals, the instlatx64 resource provides plenty of
latency/throughput data: http://users.atw.hu/instlatx64/

The benchmark in comment 0 measures throughput (including call/return overhead,
which seems a bit strange); latency-wise, movabs should be preferable.

So I think this indicates that a "real fix" should try to evaluate if a 64-bit
immediate move starts a critical-ish dependency chain, if yes, then we should
be trying to reduce latency and should prefer movabs, if not, we may prefer the
mov+shl combo that trades latency for overall throughput (i.e. assuming the
additional latency can be hidden by compiler scheduling and CPU reordering).

[Bug c++/81398] Complaining about 'partial specialization of '...' after instantiation' in c++1z

2017-09-27 Thread d25fe0be at outlook dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81398

d25fe0be@  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |INVALID

--- Comment #2 from d25fe0be@  ---
Let's close it then. Sorry for the noise.

[Bug tree-optimization/82337] [5/6/7/8 Regression] ICE: SSA corruption at tree-ssa-coalesce.c:1010

2017-09-27 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82337

Bill Schmidt  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |wschmidt at gcc dot 
gnu.org

--- Comment #6 from Bill Schmidt  ---
You bet.  I'll try to have a look today.

[Bug c++/82343] New: internal compiler error: Segmentation fault - template recurrency, SFINAE

2017-09-27 Thread p1006680 at mvrht dot net
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82343

Bug ID: 82343
   Summary: internal compiler error: Segmentation fault - template
recurrency, SFINAE
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: p1006680 at mvrht dot net
  Target Milestone: ---

Created attachment 42244
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42244=edit
Preprocessed source code, generated by adding -save-temps

Segmentation fault while compiling code (see attachment). The bug is probably
associated with function template recursion.

Tested on Wandbox (https://wandbox.org). Confirmed with gcc: 8.0, 7.2, 7.1.

Exact version of GCC (gcc -v output):
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/7/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu
7.2.0-1ubuntu1~16.04' --with-bugurl=file:///usr/share/doc/gcc-7/README.Bugs
--enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr
--with-gcc-major-version-only --program-suffix=-7
--program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
--libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu
--enable-libstdcxx-debug --enable-libstdcxx-time=yes
--with-default-libstdcxx-abi=new --enable-gnu-unique-object
--disable-vtable-verify --enable-libmpx --enable-plugin --with-system-zlib
--with-target-system-zlib --enable-objc-gc=auto --enable-multiarch
--disable-werror --with-arch-32=i686 --with-abi=m64
--with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic
--enable-offload-targets=nvptx-none --without-cuda-driver
--enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu
--target=x86_64-linux-gnu
Thread model: posix
gcc version 7.2.0 (Ubuntu 7.2.0-1ubuntu1~16.04)
System type (lsb_release -a output):
No LSB modules are available.
Distributor ID: Ubuntu
Description:Ubuntu 16.04.3 LTS
Release:16.04
Codename:   xenial
Complete command line that triggers the bug:
g++ prog.cpp -std=gnu++1z
Compiler output (exit code: 4):
g++: internal compiler error: Segmentation fault signal terminated program
cc1plus
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.
Attachment:
prog.ii - preprocessed source code, generated by adding -save-temps

[Bug target/82341] [8 regression] i386/pr80732.c fail

2017-09-27 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82341

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |8.0

--- Comment #1 from Richard Biener  ---
Confirmed.

[Bug target/82342] [8 regression] i386/pr82260-2.c fail

2017-09-27 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82342

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |8.0

--- Comment #1 from Richard Biener  ---
Confirmed.

[Bug middle-end/82095] [8 Regression] ICE in tree_nop_conversion at tree.c:11793 on ppc64le

2017-09-27 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82095

Jakub Jelinek  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from Jakub Jelinek  ---
Fixed.

[Bug c++/82115] [8 Regression] ICE on (valid) C++11 code: Segmentation fault signal terminated program cc1plus

2017-09-27 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82115

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jason at gcc dot gnu.org,
   ||nathan at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek  ---
Anyway, if value_dependent_expression_p needs to recurse on DECL_INITIAL that
can contain arbitrary stuff, including the VAR_DECL with that DECL_INITIAL in
it, we need to avoid the recursion.
"it is a constant with literal type and is initialized with an expression that
is value-dependent."
is what applies here.
So do we need some hash-map that will track VAR_DECLs  on whose DECL_INITIAL
we've already recursed?  The problem is that value_dependent_expression_p calls
type_dependent_expression_p and vice versa, so it is unclear when that hash-map
should be saved/restored.  Even if it is not possible to reach similar infinite
recursion through both of those functions (i.e. when we could save/restore the
hash-map in the "toplevel" value_dependent_expression_p call and use
value_dependent_expression_p_1 recursing to itself from it), another question is
how to do that efficiently; how common are VAR_DECLs on which we recurse on
DECL_INITIAL?  How many there are on average during one top-level
value_dependent_expression_p call?  E.g. if the common cases are 0 or 1 times,
perhaps we could have next to the hash-map a single tree which we'd compare and
only create hash-map if seeing another one.
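
As a rough illustration of that idea, here is a minimal C sketch of a recursion guard with a single-slot fast path. Everything in it is hypothetical: struct decl stands in for a VAR_DECL with its DECL_INITIAL, a small fixed array stands in for the hash-map, and none of the names are GCC's.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a VAR_DECL; `init` plays the role of DECL_INITIAL. */
struct decl {
  struct decl *init;  /* initializer, which may lead back to this decl */
  bool dependent;     /* the property the walk is looking for */
};

#define MAX_SEEN 64

/* Decls whose initializer we have already walked in this top-level call. */
struct seen_set {
  const struct decl *first;           /* fast path for the common 0-or-1 case */
  const struct decl *more[MAX_SEEN];  /* stand-in for the hash-map */
  size_t n_more;
};

/* Record d; return true if it was already recorded (or the set is full,
   in which case we conservatively stop rather than risk looping forever). */
static bool seen_before(struct seen_set *s, const struct decl *d)
{
  if (s->first == NULL) {
    s->first = d;
    return false;
  }
  if (s->first == d)
    return true;
  for (size_t i = 0; i < s->n_more; i++)
    if (s->more[i] == d)
      return true;
  if (s->n_more == MAX_SEEN)
    return true;
  s->more[s->n_more++] = d;
  return false;
}

static bool dependent_1(const struct decl *d, struct seen_set *s)
{
  if (d == NULL)
    return false;
  if (d->dependent)
    return true;
  if (seen_before(s, d))
    return false;                   /* cycle: do not recurse again */
  return dependent_1(d->init, s);   /* recurse into the initializer */
}

/* Top-level entry point: owns the seen-set for the whole walk. */
bool decl_is_dependent(const struct decl *d)
{
  struct seen_set s = {0};
  return dependent_1(d, &s);
}
```

The fallback array is only touched once a second decl is encountered, which mirrors the suggestion of creating the hash-map lazily. The sketch does not address the mutual recursion with type_dependent_expression_p raised above.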

[Bug target/82138] [8 Regression] Assembler messages: Error: can't resolve `.got2' {.got2 section} - `.LCF0' {.text.unlikely section}

2017-09-27 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82138

Jakub Jelinek  changed:

   What|Removed |Added

   Priority|P3  |P4
 CC||jakub at gcc dot gnu.org

[Bug c++/82148] [7/8 Regression] ICE in assign_temp, at function.c:968

2017-09-27 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82148

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #3 from Jakub Jelinek  ---
The gimplifier is handed a call to foo with an argument of type Derived; the
conversion has been omitted.  As the call expects a type (in this case empty
class) passed by value, but the argument is one that should be passed by
invisible reference, this obviously crashes during expansion.

[Bug c/82340] volatile ignored in compound literal

2017-09-27 Thread pascal_cuoq at hotmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82340

--- Comment #2 from Pascal Cuoq  ---
Richard:

>  I don't see how a volatile compound literal could make any sense or how 
> you'd observe the side-effect of multiple stores to it

Well, I have the same question about volatile variables the address of which is
not taken. But this is off-topic for this bug report, in which the volatile
object's address is taken.

> (IIRC compound literals are constant!?).

The C11 standard invites the programmer to use the const qualifier if they want
a constant compound literal, and gives an explicit example of a “modifiable”
non-const compound literal: https://port70.net/~nsz/c/c11/n1570.html#6.5.2.5p12

[Bug target/82342] New: [8 regression] i386/pr82260-2.c fail

2017-09-27 Thread andrey.y.guskov at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82342

Bug ID: 82342
   Summary: [8 regression] i386/pr82260-2.c fail
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andrey.y.guskov at intel dot com
  Target Milestone: ---

r253050 triggered these fails:

FAIL: gcc.target/i386/pr82260-2.c scan-assembler \\mmovl\\t%esi, %ecx
FAIL: gcc.target/i386/pr82260-2.c scan-assembler \\mmovb\\t%dl, %cl
FAIL: gcc.target/i386/pr82260-2.c scan-assembler \\mmovb\\t%r8b, %cl

Option set:
-with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-shared
--enable-host-shared --enable-clocale=gnu --enable-cloog-backend=isl
--enable-languages=c,c++,fortran,jit,lto --with-arch=haswell --with-cpu=haswell

[Bug target/82341] New: [8 regression] i386/pr80732.c fail

2017-09-27 Thread andrey.y.guskov at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82341

Bug ID: 82341
   Summary: [8 regression] i386/pr80732.c fail
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andrey.y.guskov at intel dot com
  Target Milestone: ---

r253037 triggered the following fail:

FAIL: gcc.target/i386/pr80732.c (test for excess errors)
Excess errors:
gcc.target/i386/pr80732.c:46:8: warning: 'f2' 'ifunc' resolver should return a
function pointer [-Wattributes]

Option set:
-with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-shared
--enable-host-shared --enable-clocale=gnu --enable-cloog-backend=isl
--enable-languages=c,c++,fortran,jit,lto --with-arch=haswell --with-cpu=haswell

[Bug other/82327] [7 Regression] ICE in equal_mem_array_ref_p, at tree-ssa-scopedtables.c:429 (i686-linux-gnu)

2017-09-27 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82327

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #3 from Jakub Jelinek  ---
Can't reproduce, neither with current gcc-7-branch, nor with r253107 (both
x86_64-linux with -m32 -march=i686 -mtune=generic -mno-sse), nor Sep 15th build
of i686-linux compiler.  Whether the compiler defaults to PIE or not should not
matter given the explicit -fPIC.

Can you reproduce it with vanilla gcc-7-branch?

[Bug middle-end/81657] [8 Regression] FAIL: gcc.dg/20050503-1.c scan-assembler-not call

2017-09-27 Thread andrey.y.guskov at intel dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81657

Andrey Guskov  changed:

   What|Removed |Added

 CC||andrey.y.guskov at intel dot 
com

--- Comment #2 from Andrey Guskov  ---
Also seeing this.

[Bug target/82339] Inefficient movabs instruction

2017-09-27 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82339

Richard Biener  changed:

   What|Removed |Added

 Target||x86_64-*-*, i?86-*-*

--- Comment #2 from Richard Biener  ---
I always wondered if it is more efficient to have constant pools per function
in .text so we can do %rip relative loads with short displacement?

I suppose the assembler could even optimize things if there's the desired
constant somewhere near in the code itself... (in case data loads from icache
do not incur too much of a penalty).  The assembler could also replace
.palign space before function start with (small) constant(s).

Nothing we can really do given x86 has no idea of instruction sizes.

[Bug libfortran/66756] libgfortran: ThreadSanitizer: lock-order-inversion

2017-09-27 Thread dominiq at lps dot ens.fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66756

--- Comment #10 from Dominique d'Humieres  ---
> Could this still be fixed / filtered out in the ThreadSanitizer somehow?

Should it be moved to the sanitizer component?

[Bug c/82340] volatile ignored in compound literal

2017-09-27 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82340

Richard Biener  changed:

   What|Removed |Added

  Component|middle-end  |c

--- Comment #1 from Richard Biener  ---
Well.  I don't see how a volatile compound literal could make any sense or how
you'd observe the side-effect of multiple stores to it (IIRC compound literals
are constant!?).

So you need to come up with sth more clever to be convincing ;)

Implementation-wise we end up with

  volatile char * p;
  volatile char D.1801[1];

   [0.00%]:
  D.1801[0] = 1;
  p_6 = 
  i_7 = 1;
  goto ; [0.00%]

   [0.00%]:
  *p_6 ={v} 4;
  i_10 = i_2 + 1;

   [0.00%]:
  # i_2 = PHI 
  if (i_2 <= 9)
goto ; [0.00%]
  else
goto ; [0.00%]

   [0.00%]:
  _1 ={v} *p_6;
  _8 = (int) _1;

which is mostly fine but the initialization of D.1801[0] is not a volatile
access.  And after optimization

  volatile char D.1801[1];
  char _1;
  unsigned int ivtmp_2;
  int _6;
  unsigned int ivtmp_10;

   [10.00%]:

   [90.00%]:
  # ivtmp_2 = PHI 
  MEM[(volatile char *)] ={v} 4;
  ivtmp_10 = ivtmp_2 + 4294967295;
  if (ivtmp_10 != 0)
goto ; [88.89%]
  else
goto ; [11.11%]

   [10.00%]:
  _1 ={v} MEM[(volatile char *)];
  _6 = (int) _1;

so GIMPLE does what you want but somewhere on RTL we are too clever in the
end.  D.1801 ends up being expanded as a register.  Probably the variable
itself is _not_ marked volatile but only the array member type.

[Bug libfortran/66756] libgfortran: ThreadSanitizer: lock-order-inversion

2017-09-27 Thread tkoenig at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66756

Thomas Koenig  changed:

   What|Removed |Added

 Status|RESOLVED|REOPENED
 Resolution|INVALID |---
   Severity|normal  |enhancement

--- Comment #9 from Thomas Koenig  ---
Well, if it is indeed a serious problem, we can re-open
as an enhancement request.

Unfortunately, I don't know if it is possible to shut up the
thread sanitizer somehow.

A possibility would be to lock the unit only after the global
lock has been released, and possibly keep around the global
lock for longer. If we still are in the process of opening the
file in the original thread, then there should be no problem
(at least I hope so...)

[Bug lto/82172] Destruction of basic_string in basic_stringbuf::overflow with _GLIBCXX_USE_CXX11_ABI=0, -flto, and C++17 mode results in invalid delete

2017-09-27 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82172

--- Comment #20 from Martin Liška  ---
(In reply to Gubbins from comment #19)
> (In reply to Martin Liška from comment #18)
> > Issue solved, ld.bfd is responsible.
> 
> Unfortunately, the same test program also crashes when built and linked on
> OSX.
> 
> I tested with Sierra (OSX 10.12.5), gcc 7.2.0, compiling the original sample
> here with:
> 
> g++-7 -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++17 -O1 test.cpp
> 
> Result:
> 
> > ./a.out
> a.out(608,0x7fffbda8a3c0) malloc: *** error for object 0x10c68d080: pointer
> being freed was not allocated
> *** set a breakpoint in malloc_error_break to debug
> Abort trap: 6
> 
> 
> I think this means the Darwin linker has a similar problem.

Your failure happens even w/o LTO, am I right?
But yes, the problem looks very similar to what happens for ld.bfd.

[Bug target/82339] Inefficient movabs instruction

2017-09-27 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82339

--- Comment #1 from Jakub Jelinek  ---
Created attachment 42243
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42243=edit
gcc8-pr82339.patch

Patch for -Os where movl $cst, %eXX; shlq $shift, %rXX is 1 byte shorter than
movabsq $(cst << shift), %rXX.
For speed, there is yet another option of movq $cst, %rXX; shlq $shift, %rXX
for constants like 0x12345670 which have a sequence of 1 bit, followed
by at most 31 arbitrary bits and then the rest is all 0s,
movabsq $12345670, %r8 is equivalent to movq $-249346713, %r8; shlq
$20, %r8 which is longer, but perhaps faster (on which CPUs).
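
To illustrate the -Os case, here is a small C sketch of the kind of check involved: it tests whether a constant that would otherwise need movabsq can be written as a 32-bit immediate shifted left. This is only an assumption about the shape of the test, not the logic of the attached patch, and split_imm32_shl is a made-up name.

```c
#include <stdbool.h>
#include <stdint.h>

/* If c does not fit in 32 bits but equals imm << shift for some 32-bit imm,
   store imm and shift and return true; otherwise return false. */
bool split_imm32_shl(uint64_t c, uint32_t *imm, unsigned *shift)
{
  if (c <= UINT32_MAX)
    return false;            /* a plain movl already suffices (covers c == 0) */

  unsigned s = 0;
  while ((c & 1) == 0) {     /* strip trailing zero bits; c is non-zero here */
    c >>= 1;
    s++;
  }
  if (c > UINT32_MAX)
    return false;            /* significant bits span more than 32 bits */

  *imm = (uint32_t)c;        /* movl $imm, %eXX */
  *shift = s;                /* shlq $shift, %rXX */
  return true;
}
```

For 1ULL << 63 this yields imm = 1 and shift = 63; a constant with set bits spread over more than 32 positions is rejected and would still need movabsq.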

[Bug c/82340] New: volatile ignored in compound literal

2017-09-27 Thread pascal_cuoq at hotmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82340

Bug ID: 82340
   Summary: volatile ignored in compound literal
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pascal_cuoq at hotmail dot com
  Target Milestone: ---

Consider the function f below:

int f(void) {
  volatile char *p = (volatile char[1]){1};
  for (int i=1; i<10; i++) *p=4;
  return *p;
}

Volatile access is a visible side-effect, so one may expect the generated code
for the function f to do “something” nine times, for some definition of
“something”.

In GCC 7.2 and in gcc.godbolt.org's current snapshot of “gcc (trunk)”, the
function f is compiled to:

f:
movl $4, %eax
ret

Command: gcc -O3 -std=c11 -xc -pedantic -S t.c
Link: https://godbolt.org/g/4Ua1Ud

I would expect function f to be compiled to something that resembles the code
produced for function g, or the code produced by Clang for f:

int g(void) {
  volatile char t[1] = {1};
  volatile char *p = t;
  for (int i=1; i<10; i++) *p=4;
  return *p;
}

g:
movb $1, -1(%rsp)
movb $4, -1(%rsp)
movb $4, -1(%rsp)
movb $4, -1(%rsp)
movb $4, -1(%rsp)
movb $4, -1(%rsp)
movb $4, -1(%rsp)
movb $4, -1(%rsp)
movb $4, -1(%rsp)
movb $4, -1(%rsp)
movsbl -1(%rsp), %eax
ret

[Bug lto/82172] Destruction of basic_string in basic_stringbuf::overflow with _GLIBCXX_USE_CXX11_ABI=0, -flto, and C++17 mode results in invalid delete

2017-09-27 Thread dave.gittins at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82172

Gubbins  changed:

   What|Removed |Added

 CC||dave.gittins at gmail dot com

--- Comment #19 from Gubbins  ---
(In reply to Martin Liška from comment #18)
> Issue solved, ld.bfd is responsible.

Unfortunately, the same test program also crashes when built and linked on OSX.

I tested with Sierra (OSX 10.12.5), gcc 7.2.0, compiling the original sample
here with:

g++-7 -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++17 -O1 test.cpp

Result:

> ./a.out
a.out(608,0x7fffbda8a3c0) malloc: *** error for object 0x10c68d080: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
Abort trap: 6


I think this means the Darwin linker has a similar problem.

[Bug target/82339] New: Inefficient movabs instruction

2017-09-27 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82339

Bug ID: 82339
   Summary: Inefficient movabs instruction
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jakub at gcc dot gnu.org
  Target Milestone: ---

At least on i7-5960X in the following testcase:
__attribute__((noinline, noclone)) unsigned long long int
foo (int x)
{
  asm volatile ("" : : : "memory");
  return 1ULL << (63 - x);
}

__attribute__((noinline, noclone)) unsigned long long int
bar (int x)
{
  asm volatile ("" : : : "memory");
  return (1ULL << 63) >> x;
}

__attribute__((noinline, noclone)) unsigned long long int
baz (int x)
{
  unsigned long long int y = 1;
  asm volatile ("" : "+r" (y) : : "memory");
  return (y << 63) >> x;
}

int
main (int argc, const char **argv)
{
  int i;
  if (argc == 1)
    for (i = 0; i < 10; i++)
      asm volatile ("" : : "r" (foo (13)));
  else if (argc == 2)
    for (i = 0; i < 10; i++)
      asm volatile ("" : : "r" (bar (13)));
  else if (argc == 3)
    for (i = 0; i < 10; i++)
      asm volatile ("" : : "r" (baz (13)));
  return 0;
}

baz is fastest as well as shortest.
So I think we should consider using movl $cst, %edx; shlq $shift, %rdx instead
of movabsq $(cst << shift), %rdx.

Unfortunately I can't find MOVABS in Agner Fog's tables, and for MOV r64,i64
there is too little information, so it is unclear on which CPUs it is
beneficial.
For -Os, if the destination is one of the %rax to %rsp registers, it is one
byte shorter (5+4 vs 10); for %r8 to %r15 it is the same size.
For speed optimization, the disadvantage is obviously that the shift clobbers
the flags register.

Peter, any information on what the MOV r64,i64 latency/throughput on various
CPUs vs. MOV r32,i32; SHL r64,i8 is?
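
The replacement proposed in this report can be sketched in C as follows.
This is only an illustration under stated assumptions, not GCC code: the
names split_zext_imm32_shift, mov_shl_size and MOVABS_SIZE are hypothetical,
and the byte counts are simply the figures quoted in the report (5+4 vs 10
for %rax to %rsp, the same size as movabsq for %r8 to %r15).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Size of movabsq $imm64, %rXX as quoted in the report.  */
enum { MOVABS_SIZE = 10 };

/* Hypothetical helper: split VALUE so that
   VALUE == (uint64_t) cst << shift with CST fitting in 32 bits,
   i.e. what movl $cst, %eXX; shlq $shift, %rXX would compute.
   When the resulting shift is 0, a plain movl already suffices.  */
static bool
split_zext_imm32_shift (uint64_t value, uint32_t *cst, unsigned *shift)
{
  unsigned s = 0;

  assert (cst && shift);
  if (value == 0)
    return false;

  /* Move all trailing zero bits into the shift count.  */
  while ((value & 1) == 0)
    {
      value >>= 1;
      s++;
    }
  if (value > UINT32_MAX)
    return false;

  *cst = (uint32_t) value;
  *shift = s;
  return true;
}

/* Size in bytes of the movl + shlq pair, using the report's figures:
   5 + 4 for %rax..%rsp, and the same as movabsq for %r8..%r15.  */
static unsigned
mov_shl_size (bool dest_is_r8_to_r15)
{
  return dest_is_r8_to_r15 ? MOVABS_SIZE : 5 + 4;
}
```

For the constant used by the testcase, 1ULL << 63, the helper yields cst 1
and shift 63, and mov_shl_size shows the one-byte saving only for the
%rax to %rsp destinations.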

[Bug c/65892] gcc fails to implement N685 aliasing of union members

2017-09-27 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65892

--- Comment #27 from rguenther at suse dot de  ---
On Wed, 27 Sep 2017, david at westcontrol dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65892
> 
> David Brown  changed:
> 
>What|Removed |Added
> 
>  CC||david at westcontrol dot com
> 
> --- Comment #26 from David Brown  ---
> (In reply to rguent...@suse.de from comment #24)
> > On Wed, 2 Nov 2016, txr at alumni dot caltech.edu wrote:
> > 
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65892
> > > 
> > > --- Comment #22 from Tim Rentsch  ---
> > > [responding to comments from rguent...@suse.de in Comment 20]
> > > 
> > > > GCC already implements this if you specify -fno-strict-aliasing.
> > > 
> > > The main point of my comments is that the ISO C standard requires
> > > the behavior in this case (and similar cases) be defined and not
> > > subject to any reordering.  In other words the result must be the
> > > same as an unoptimized version.  If a -fstrict-aliasing gcc /does/
> > > transform the code so that the behavior is not the same as an
> > > unoptimized version, then gcc is not a conforming implementation.
> > 
> > GCC has various optimization options that make it a not strictly
> > conforming implementation (-ffast-math for example), various
> > GNU extensions to the language, etc.
> > 
> > > Or is it your position that gcc is conforming only when operated
> > > in the -fno-strict-aliasing mode?  That position seems contrary to
> > > the documented description of the -fstrict-aliasing option.
> > 
> > Well, N685 is still disputed in this bug.  I was just pointing out
> > that GCC has a switch to make it conforming to your interpretation
> > of the standard (and this switch is the default at -O0 and -O1).
> 
> A key difference with non-conformance options like -ffast-math is that these
> are not default options.  A user must actively choose to use them.  A user
> should not need particular options in order to get correct object code from
> their correct source code - or at least the user should get obvious error
> messages when using default options but where their source code hits an oddity
> in gcc (as they would get if they happened to use a gcc extension keyword like
> "asm" as an identifier in conforming C code).  What should not happen is for
> the compiler to silently break good code unless the user has given specific
> flags.
> 
> I am not sure whether this particular case really is a bug or not.  However, I
> wonder if there has been too much emphasis on trying to understand exactly what
> the standards say.  If the gcc developers here, who are amongst the most
> knowledgeable C and C++ experts around, have trouble with the details - then
> consider the position of the average C developer.  Maybe it is better to try to
> see it from their viewpoint - would a programmer expect these accesses to alias
> or not?  If it is likely that programmers would expect aliasing here, and see
> that behaviour in other compilers, then the /useful/ default behaviour for gcc
> would be to treat code in the way programmers expect - even with -O3.  Then
> have a "-fI-know-what-I-am-doing" flag for those that want to squeeze out the
> last bit of performance.

Unfortunately it's not the "last bit of performance"; otherwise it would
indeed be a no-brainer.

People expect fast code from a compiler and do not want to enable
dozens of -fIm-writing-reasonable-code options.  Some benchmarks even have
rules as to how many options you are allowed to enable...

Richard.

[Bug c/65892] gcc fails to implement N685 aliasing of union members

2017-09-27 Thread david at westcontrol dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65892

David Brown  changed:

   What|Removed |Added

 CC||david at westcontrol dot com

--- Comment #26 from David Brown  ---
(In reply to rguent...@suse.de from comment #24)
> On Wed, 2 Nov 2016, txr at alumni dot caltech.edu wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65892
> > 
> > --- Comment #22 from Tim Rentsch  ---
> > [responding to comments from rguent...@suse.de in Comment 20]
> > 
> > > GCC already implements this if you specify -fno-strict-aliasing.
> > 
> > The main point of my comments is that the ISO C standard requires
> > the behavior in this case (and similar cases) be defined and not
> > subject to any reordering.  In other words the result must be the
> > same as an unoptimized version.  If a -fstrict-aliasing gcc /does/
> > transform the code so that the behavior is not the same as an
> > unoptimized version, then gcc is not a conforming implementation.
> 
> GCC has various optimization options that make it a not strictly
> conforming implementation (-ffast-math for example), various
> GNU extensions to the language, etc.
> 
> > Or is it your position that gcc is conforming only when operated
> > in the -fno-strict-aliasing mode?  That position seems contrary to
> > the documented description of the -fstrict-aliasing option.
> 
> Well, N685 is still disputed in this bug.  I was just pointing out
> that GCC has a switch to make it conforming to your interpretation
> of the standard (and this switch is the default at -O0 and -O1).

A key difference with non-conformance options like -ffast-math is that these
are not default options.  A user must actively choose to use them.  A user
should not need particular options in order to get correct object code from
their correct source code - or at least the user should get obvious error
messages when using default options but where their source code hits an oddity
in gcc (as they would get if they happened to use a gcc extension keyword like
"asm" as an identifier in conforming C code).  What should not happen is for
the compiler to silently break good code unless the user has given specific
flags.

I am not sure whether this particular case really is a bug or not.  However, I
wonder if there has been too much emphasis on trying to understand exactly what
the standards say.  If the gcc developers here, who are amongst the most
knowledgeable C and C++ experts around, have trouble with the details - then
consider the position of the average C developer.  Maybe it is better to try to
see it from their viewpoint - would a programmer expect these accesses to alias
or not?  If it is likely that programmers would expect aliasing here, and see
that behaviour in other compilers, then the /useful/ default behaviour for gcc
would be to treat code in the way programmers expect - even with -O3.  Then
have a "-fI-know-what-I-am-doing" flag for those that want to squeeze out the
last bit of performance.

[Bug tree-optimization/82337] [5/6/7/8 Regression] ICE: SSA corruption at tree-ssa-coalesce.c:1010

2017-09-27 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82337

Richard Biener  changed:

   What|Removed |Added

 CC||wschmidt at gcc dot gnu.org

--- Comment #5 from Richard Biener  ---
It's SLSR, so a workaround is -fno-tree-slsr.

It's a bit iffy to fix given one of the suitable points to fence off is
alloc_cand_and_find_basis where we should "reject"
SSA_NAME_OCCURS_IN_ABNORMAL_PHI (base) but this function isn't supposed
to "FAIL".

I'm also not sure it covers the cases fully.

Basically when doing replacements SLSR may _never_ end up with
a SSA_NAME_OCCURS_IN_ABNORMAL_PHI SSA name in the replacement expression.

Costing doesn't seem to apply to unconditional candidates so fending it
off there doesn't seem viable.

The easiest thing is to try to fend it off during the stmt walk, as with the
following big hammer.  Not sure if that's enough or whether we walk SSA def
stmts from other places.  Bill, can you take over with the hint below?

Index: gcc/gimple-ssa-strength-reduction.c
===================================================================
--- gcc/gimple-ssa-strength-reduction.c (revision 253203)
+++ gcc/gimple-ssa-strength-reduction.c (working copy)
@@ -802,6 +802,8 @@ slsr_process_phi (gphi *phi, bool speed)
      definitions must be in the same position in the loop hierarchy
      as PHI.  */

+  if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (gimple_phi_result (phi)))
+    return;
   for (i = 0; i < gimple_phi_num_args (phi); i++)
     {
       slsr_cand_t arg_cand;
@@ -810,7 +812,8 @@ slsr_process_phi (gphi *phi, bool speed)
       gimple *arg_stmt = NULL;
       basic_block arg_bb = NULL;

-      if (TREE_CODE (arg) != SSA_NAME)
+      if (TREE_CODE (arg) != SSA_NAME
+          || SSA_NAME_OCCURS_IN_ABNORMAL_PHI (arg))
         return;

       arg_cand = base_cand_from_table (arg);
@@ -1738,6 +1741,18 @@ find_candidates_dom_walker::before_dom_c
     {
       gimple *gs = gsi_stmt (gsi);

+      tree op;
+      ssa_op_iter iter;
+      bool abnormal_found = false;
+      FOR_EACH_SSA_TREE_OPERAND (op, gs, iter, SSA_OP_USE|SSA_OP_DEF)
+        if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (op))
+          {
+            abnormal_found = true;
+            break;
+          }
+      if (abnormal_found)
+        continue;
+
       if (gimple_vuse (gs) && gimple_assign_single_p (gs))
         slsr_process_ref (gs);

[Bug c++/82338] New: valgrind error in inherit_in_ebb

2017-09-27 Thread dcb314 at hotmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82338

Bug ID: 82338
   Summary: valgrind error in inherit_in_ebb
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: dcb314 at hotmail dot com
  Target Milestone: ---

Created attachment 42242
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42242&action=edit
C++ source code

I built a recent trunk version of gcc with valgrind.

I compiled the attached C++ code with flags -O3 -fPIC -fno-strict-aliasing -c
and got the following:

$ ~/gcc/results.253187.valgrind/bin/gcc -O3 -fPIC -fno-strict-aliasing -c
bug386.cc
==17152== Conditional jump or move depends on uninitialised value(s)
==17152==    at 0x9DCE01: inherit_in_ebb (lra-constraints.c:6224)
==17152==    by 0x9DCE01: lra_inheritance() (lra-constraints.c:6474)
==17152==    by 0x9C8518: lra(_IO_FILE*) (lra.c:2430)
==17152==    by 0x986161: do_reload (ira.c:5440)

The bug seems to have been introduced sometime before revision 249539.

svn blame claims that lra-constraints.c:6224 is as follows:

192719   vmakarov && usage_insns[regno].calls_num == calls_num - 1

I'll have a go at reducing the code.

[Bug c++/82336] [5/6/7/8 Regression] GCC requires but does not emit defaulted constructors in certain cases

2017-09-27 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82336

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P2
   Target Milestone|--- |5.5

[Bug target/82333] [8 Regression] powerpc64le _Float128 ICE in as_a, at machmode.h:345

2017-09-27 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82333

Richard Biener  changed:

   What|Removed |Added

Version|7.0 |8.0
   Target Milestone|--- |8.0

[Bug c++/82331] [7/8 Regression] ICE specializing template for function pointer

2017-09-27 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82331

Richard Biener  changed:

   What|Removed |Added

   Keywords||ice-on-valid-code
   Priority|P3  |P2
  Known to work||7.1.0
   Target Milestone|--- |7.3
Summary|ICE specializing|[7/8 Regression] ICE
   |template for function |specializing template
   |pointer |for function pointer
  Known to fail||7.2.0

[Bug middle-end/82329] #pragma GCC target/optimize incurs high compilation time cost

2017-09-27 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82329

Richard Biener  changed:

   What|Removed |Added

 Target||x86_64-*-*, i?86-*-*
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2017-09-27
 Ever confirmed|0   |1

--- Comment #1 from Richard Biener  ---
Confirmed.

[Bug lto/82172] Destruction of basic_string in basic_stringbuf::overflow with _GLIBCXX_USE_CXX11_ABI=0, -flto, and C++17 mode results in invalid delete

2017-09-27 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82172

Martin Liška  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED

--- Comment #18 from Martin Liška  ---
Issue solved, ld.bfd is responsible. Gold properly marks the symbols as:
1322 aef281150a4b024f PREVAILING_DEF_IRONLY_EXP _ZNSs4_Rep20_S_empty_rep_storageE

I'm going to create a binutils issue for that.

[Bug other/82327] [7 Regression] ICE in equal_mem_array_ref_p, at tree-ssa-scopedtables.c:429 (i686-linux-gnu)

2017-09-27 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82327

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P1

[Bug c++/82326] static_cast for vector extension not working?

2017-09-27 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82326

--- Comment #2 from Richard Biener  ---
Agreed.  One could allow changes in signedness as an extension.

[Bug tree-optimization/82321] [8 Regression] ICE in check_loop_closed_ssa_use, at tree-ssa-loop-manip.c:707

2017-09-27 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82321

Richard Biener  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from Richard Biener  ---
Fixed.

[Bug c/82323] circular ifunc attribute on a function definition silently accepted

2017-09-27 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82323

Martin Liška  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |marxin at gcc dot gnu.org

--- Comment #4 from Martin Liška  ---
Good, I'm taking that for stage4.

[Bug tree-optimization/82337] [5/6/7/8 Regression] ICE: SSA corruption at tree-ssa-coalesce.c:1010

2017-09-27 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82337

Martin Liška  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2017-09-27
 CC||marxin at gcc dot gnu.org,
   ||rguenth at gcc dot gnu.org
Summary|-O2: ICE: SSA corruption at |[5/6/7/8 Regression] ICE:
   |tree-ssa-coalesce.c:1010|SSA corruption at
   ||tree-ssa-coalesce.c:1010
 Ever confirmed|0   |1

--- Comment #4 from Martin Liška  ---
Started with r214941.