[Bug ipa/98594] [11 Regression] IPA modref codegen bug

2021-01-27 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98594

--- Comment #4 from rguenther at suse dot de  ---
On Wed, 27 Jan 2021, hubicka at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98594
> 
> --- Comment #3 from Jan Hubicka  ---
> The initialization is removed by dse1 pass.  We get:
> ipa-modref: call stmt D.3199 = bitCount::bitCount_bitfield<1, int,
> glm::packed_highp> (&D.3185); [return slot optimization]
> ipa-modref: call to glm::vec bitCount::bitCount_bitfield(const
> glm::vec&) [with int L = 1; T = int; glm::qualifier Q =
> glm::packed_highp]/8 does not use ref: D.3185.D.3097.x alias sets: 3->1
>   Deleted dead store: D.3185.D.3097.x = x_2(D);
> 
> ipa-modref: call stmt D.3199 = bitCount::bitCount_bitfield<1, int,
> glm::packed_highp> (&D.3185); [return slot optimization]
> ipa-modref: call to glm::vec bitCount::bitCount_bitfield(const
> glm::vec&) [with int L = 1; T = int; glm::qualifier Q =
> glm::packed_highp]/8 does not use ref: D.3185 alias sets: 3->3
>   Deleted dead store: D.3185 ={v} {CLOBBER};
> 
> Now the modref summary for function is
>   loads:
>     Limits: 32 bases, 16 refs
>       Base 0: alias set 5
>         Ref 0: alias set 5
>           access: Parm 0 param offset:0 offset:0 size:32 max_size:32
> 
> alias set 5 corresponds to const struct vec, but a different instantiation
> than alias set 3 used in the store.
> There is reinterpret cast:
> 
>   glm::vec::type, Q>
>   x(*reinterpret_cast<
>   glm::vec::type, Q> const *>(&v));
> 
> turning it to
> 
>   glm::vec::type, Q> x(*(&v));
> 
> makes the aliasing difference go away.  So it seems to me that the testcase
> simply includes TBAA violation?

Not sure, but if my eyes do not deceive me the difference is only
const qualification, so it should not matter for TBAA?  Of course the
question is what type 'v' has, since this may invoke a different CTOR?
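
To make the two cases in this exchange concrete, here is a minimal sketch. Vec1, read_const_only and to_unsigned are hypothetical names, not GLM's definitions or the testcase from the bug. It assumes only that the cast in question may cross from one template instantiation to another: a difference in const qualification alone is harmless for TBAA, while reading an object through an unrelated instantiation is not, and a byte copy is one way to avoid that.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for a one-component vector template.
template <typename T>
struct Vec1 {
    T x;
};

// Case 1: the pointer type differs from the object type only by const.
// No cast is needed and the access is valid under type-based aliasing.
int read_const_only(Vec1<int>& v) {
    const Vec1<int>* p = &v;
    return p->x;
}

// Case 2: Vec1<int> and Vec1<unsigned> are unrelated class types, so
// dereferencing a reinterpret_cast'ed pointer between them would be an
// aliasing violation. Copying the bytes avoids the problem.
Vec1<unsigned> to_unsigned(const Vec1<int>& v) {
    static_assert(sizeof(Vec1<unsigned>) == sizeof(Vec1<int>),
                  "layouts must match for a byte copy");
    static_assert(std::is_trivially_copyable<Vec1<int>>::value,
                  "memcpy requires a trivially copyable type");
    Vec1<unsigned> out;
    std::memcpy(&out, &v, sizeof out);
    return out;
}
```

Whether the testcase falls under case 1 or case 2 depends on the type of 'v', which is exactly the open question above.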

[Bug tree-optimization/98856] [11 Regression] botan AES-128/XTS is slower by ~17% since r11-6649-g285fa338b06b804e72997c4d876ecf08a9c083af

2021-01-27 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856

--- Comment #2 from Richard Biener  ---
The cxx bench Botan doesn't know --cxxflags; what Botan version are you looking
at?

[Bug c++/98861] I want deterministic exceptions (Herbception)

2021-01-27 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98861

Richard Biener  changed:

               What|Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
   Last reconfirmed|                            |2021-01-28
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1

[Bug bootstrap/98860] [11 Regression] boostrap failure on MinGW-w64 windows 10

2021-01-27 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98860

Richard Biener  changed:

               What|Removed                     |Added
----------------------------------------------------------------------------
            Summary|boostrap failure on         |[11 Regression] boostrap
                   |MinGW-w64 windows 10        |failure on MinGW-w64
                   |                            |windows 10
   Target Milestone|---                         |11.0

[Bug bootstrap/98860] boostrap failure on MinGW-w64 windows 10

2021-01-27 Thread unlvsur at live dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98860

--- Comment #6 from cqwrteur  ---
configure:4069: ./conftest.exe
/home/unlvs/mcf_build/src/gcc-git/libgomp/configure: line 4071: ./conftest.exe:
cannot execute binary file: Exec format error
configure:4073: $? = 126
configure:4080: error: in
`/home/unlvs/mcf_build/src/build-x86_64-w64-mingw32/x86_64-w64-mingw32/libgomp':
configure:4082: error: cannot run C compiled programs.
If you meant to cross compile, use `--host'.
See `config.log' for more details

[Bug fortran/93524] [ISO C Binding][F2018] CFI_allocate – elem_size mishandled + sm wrongly set?

2021-01-27 Thread tkoenig at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93524

Thomas Koenig  changed:

               What|Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tkoenig at gcc dot gnu.org

--- Comment #3 from Thomas Koenig  ---
A related patch was applied at

https://gcc.gnu.org/g:1cdca4261e88f4dc9c3293c6b3c2fff3071ca32b .

[Bug target/98799] [11 Regression] vector_set_var ICE

2021-01-27 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98799

--- Comment #6 from CVS Commits  ---
The master branch has been updated by Xiong Hu Luo :

https://gcc.gnu.org/g:fbe37371cf372b84d5b7f1a6f5f0971a513dd5fa

commit r11-6947-gfbe37371cf372b84d5b7f1a6f5f0971a513dd5fa
Author: Xionghu Luo 
Date:   Wed Jan 27 20:24:03 2021 -0600

rs6000: Fix vec insert ilp32 ICE and test failures [PR98799]

UNSPEC_SI_FROM_SF is not supported when TARGET_DIRECT_MOVE_64BIT
is false for -m32, don't generate VIEW_CONVERT_EXPR(ARRAY_REF) for
variable vector insert.  Remove rs6000_expand_vector_set_var helper
function, adjust the p8 and p9 definitions position and make them
static.

The previous commit r11-6858 missed check m32, This patch is tested pass
on P7BE{m32,m64}/P8BE{m32,m64}/P8LE/P9LE with
RUNTESTFLAGS="--target_board =unix'{-m32,-m64}'" for BE targets.

gcc/ChangeLog:

2021-01-27  Xionghu Luo  
David Edelsohn  

PR target/98799
* config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
Don't generate VIEW_CONVERT_EXPR for fcode
ALTIVEC_BUILTIN_VEC_INSERT
when -m32.
* config/rs6000/rs6000-protos.h (rs6000_expand_vector_set_var):
Delete.
* config/rs6000/rs6000.c (rs6000_expand_vector_set): Remove the
wrapper call rs6000_expand_vector_set_var for cleanup.  Call
rs6000_expand_vector_set_var_p9 and rs6000_expand_vector_set_var_p8
directly.
(rs6000_expand_vector_set_var): Delete.
(rs6000_expand_vector_set_var_p9): Make static.
(rs6000_expand_vector_set_var_p8): Make static.

gcc/testsuite/ChangeLog:

2021-01-27  Xionghu Luo  

PR target/98827
* gcc.target/powerpc/fold-vec-insert-char-p8.c: Adjust ilp32.
* gcc.target/powerpc/fold-vec-insert-char-p9.c: Likewise.
* gcc.target/powerpc/fold-vec-insert-double.c: Likewise.
* gcc.target/powerpc/fold-vec-insert-float-p8.c: Likewise.
* gcc.target/powerpc/fold-vec-insert-float-p9.c: Likewise.
* gcc.target/powerpc/fold-vec-insert-int-p8.c: Likewise.
* gcc.target/powerpc/fold-vec-insert-int-p9.c: Likewise.
* gcc.target/powerpc/fold-vec-insert-longlong.c: Likewise.
* gcc.target/powerpc/fold-vec-insert-short-p8.c: Likewise.
* gcc.target/powerpc/fold-vec-insert-short-p9.c: Likewise.
* gcc.target/powerpc/pr79251.p8.c: Likewise.
* gcc.target/powerpc/pr79251.p9.c: Likewise.
* gcc.target/powerpc/vsx-builtin-7.c: Likewise.
* gcc.target/powerpc/pr79251-run.c: Build and run with vsx
option.

[Bug target/98827] [11 regression] gcc.target/powerpc/vsx-builtin-7.c assembler counts off after r11-6857

2021-01-27 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98827

--- Comment #4 from CVS Commits  ---
The master branch has been updated by Xiong Hu Luo :

https://gcc.gnu.org/g:fbe37371cf372b84d5b7f1a6f5f0971a513dd5fa

commit r11-6947-gfbe37371cf372b84d5b7f1a6f5f0971a513dd5fa
Author: Xionghu Luo 
Date:   Wed Jan 27 20:24:03 2021 -0600

rs6000: Fix vec insert ilp32 ICE and test failures [PR98799]

UNSPEC_SI_FROM_SF is not supported when TARGET_DIRECT_MOVE_64BIT
is false for -m32, don't generate VIEW_CONVERT_EXPR(ARRAY_REF) for
variable vector insert.  Remove rs6000_expand_vector_set_var helper
function, adjust the p8 and p9 definitions position and make them
static.

The previous commit r11-6858 missed check m32, This patch is tested pass
on P7BE{m32,m64}/P8BE{m32,m64}/P8LE/P9LE with
RUNTESTFLAGS="--target_board =unix'{-m32,-m64}'" for BE targets.

gcc/ChangeLog:

2021-01-27  Xionghu Luo  
David Edelsohn  

PR target/98799
* config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
Don't generate VIEW_CONVERT_EXPR for fcode
ALTIVEC_BUILTIN_VEC_INSERT
when -m32.
* config/rs6000/rs6000-protos.h (rs6000_expand_vector_set_var):
Delete.
* config/rs6000/rs6000.c (rs6000_expand_vector_set): Remove the
wrapper call rs6000_expand_vector_set_var for cleanup.  Call
rs6000_expand_vector_set_var_p9 and rs6000_expand_vector_set_var_p8
directly.
(rs6000_expand_vector_set_var): Delete.
(rs6000_expand_vector_set_var_p9): Make static.
(rs6000_expand_vector_set_var_p8): Make static.

gcc/testsuite/ChangeLog:

2021-01-27  Xionghu Luo  

PR target/98827
* gcc.target/powerpc/fold-vec-insert-char-p8.c: Adjust ilp32.
* gcc.target/powerpc/fold-vec-insert-char-p9.c: Likewise.
* gcc.target/powerpc/fold-vec-insert-double.c: Likewise.
* gcc.target/powerpc/fold-vec-insert-float-p8.c: Likewise.
* gcc.target/powerpc/fold-vec-insert-float-p9.c: Likewise.
* gcc.target/powerpc/fold-vec-insert-int-p8.c: Likewise.
* gcc.target/powerpc/fold-vec-insert-int-p9.c: Likewise.
* gcc.target/powerpc/fold-vec-insert-longlong.c: Likewise.
* gcc.target/powerpc/fold-vec-insert-short-p8.c: Likewise.
* gcc.target/powerpc/fold-vec-insert-short-p9.c: Likewise.
* gcc.target/powerpc/pr79251.p8.c: Likewise.
* gcc.target/powerpc/pr79251.p9.c: Likewise.
* gcc.target/powerpc/vsx-builtin-7.c: Likewise.
* gcc.target/powerpc/pr79251-run.c: Build and run with vsx
option.

[Bug c++/98862] New: Complex reduction support in offload region

2021-01-27 Thread xw111luoye at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98862

Bug ID: 98862
   Summary: Complex reduction support in offload region
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: xw111luoye at gmail dot com
  Target Milestone: ---

Using the std::complex type in an offload region is highly desired.

$ g++ -fopenmp complex_reduction.cpp
ptxas /tmp/cceLNaYr.o, line 484; error   : Label expected for argument 0 of
instruction 'call'
ptxas /tmp/cceLNaYr.o, line 484; error   : Function '_ZNSt7complexIfEC1Eff' not
declared in this scope
ptxas /tmp/cceLNaYr.o, line 484; fatal   : Call target not recognized
ptxas fatal   : Ptx assembly aborted due to errors
nvptx-as: ptxas returned 255 exit status
mkoffload: fatal error: x86_64-pc-linux-gnu-accel-nvptx-none-gcc returned 1
exit status
compilation terminated.
lto-wrapper: fatal error:
/soft/gcc/gcc-11-dev-2021-01-27/bin/../libexec/gcc/x86_64-pc-linux-gnu/11.0.0//accel/nvptx-none/mkoffload
returned 1 exit status
compilation terminated.
/usr/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status

$ g++ -fopenmp -O2 complex_reduction.cpp
unresolved symbol __atomic_compare_exchange_16
collect2: error: ld returned 1 exit status
mkoffload: fatal error: x86_64-pc-linux-gnu-accel-nvptx-none-gcc returned 1
exit status
compilation terminated.
lto-wrapper: fatal error:
/soft/gcc/gcc-11-dev-2021-01-27/bin/../libexec/gcc/x86_64-pc-linux-gnu/11.0.0//accel/nvptx-none/mkoffload
returned 1 exit status
compilation terminated.
/usr/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status

The -O2 case is more useful for production. Fixing both is desired.

source code:
https://github.com/ye-luo/openmp-target/blob/master/hands-on/tests/complex/complex_reduction.cpp
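
For readers without the linked file at hand, the kind of code this report is about can be sketched as follows. This is not the linked test: sum_ones and the loop body are made up for illustration, and the user-defined reduction is only one possible way to express a std::complex<float> reduction. Built without -fopenmp, the pragmas are ignored and the loop runs serially on the host.

```cpp
#include <cassert>
#include <complex>

// User-defined "+" reduction for std::complex<float>, since OpenMP only
// predefines reductions for arithmetic types in C++.
#pragma omp declare reduction(+ : std::complex<float> : omp_out += omp_in) \
    initializer(omp_priv = std::complex<float>(0.0f, 0.0f))

// Hypothetical example: sums n copies of (1, 1) inside a target region.
std::complex<float> sum_ones(int n) {
    std::complex<float> sum(0.0f, 0.0f);
#pragma omp target teams distribute parallel for map(tofrom : sum) reduction(+ : sum)
    for (int i = 0; i < n; ++i) {
        sum += std::complex<float>(1.0f, 1.0f);
    }
    return sum;
}
```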

[Bug bootstrap/98860] boostrap failure on MinGW-w64 windows 10

2021-01-27 Thread unlvsur at live dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98860

--- Comment #5 from cqwrteur  ---
I do not know whether it has to do with the CRLF issue because GCC on Linux
emits the same result as it does on MinGW-w64 or msys2.

conftestx.c

#ifdef __x86_64__
#ifndef __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4
#error need -march=i486
#endif
#ifndef __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16
#error need -mcx16
#endif
#else
#ifndef __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8
#error need -march=i686
#endif
#endif
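
As a side note, the same predefined macros can be inspected from a small program instead of through #error. The sketch below assumes only that the compiler predefines __GCC_HAVE_SYNC_COMPARE_AND_SWAP_N as GCC does; the function name sync_cas_widths is made up for illustration.

```cpp
#include <cassert>

// Returns a bitmask of the compare-and-swap widths (in bytes) that the
// compiler advertises for the current target and flags.
unsigned sync_cas_widths() {
    unsigned mask = 0;
#ifdef __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4
    mask |= 4u;   // 32-bit CAS available
#endif
#ifdef __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8
    mask |= 8u;   // 64-bit CAS available
#endif
#ifdef __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16
    mask |= 16u;  // 128-bit CAS available (needs -mcx16 on x86_64)
#endif
    return mask;
}
```

On x86_64 without -mcx16 the 16-byte bit would be expected to be clear, which corresponds to the "need -mcx16" errors in the output below.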


MinGW32

unlvs@DESKTOP-DFHPDC1 MINGW32 ~/gcc_bug
$  gcc -E conftestx.c
# 1 "conftestx.c"
# 1 ""
# 1 ""
# 1 "conftestx.c"

unlvs@DESKTOP-DFHPDC1 MINGW32 ~/gcc_bug
$  gcc -E conftestx.c -march=i486
# 1 "conftestx.c"
# 1 ""
# 1 ""
# 1 "conftestx.c"
conftestx.c:10:2: error: #error need -march=i686
   10 | #error need -march=i686
      |  ^

MinGW64

unlvs@DESKTOP-DFHPDC1 MINGW64 ~/gcc_bug
$  gcc -E conftestx.c -m32
# 1 "conftestx.c"
# 1 ""
# 1 ""
# 1 "conftestx.c"

unlvs@DESKTOP-DFHPDC1 MINGW64 ~/gcc_bug
$  gcc -E conftestx.c -march=i486 -mtune=generic
# 1 "conftestx.c"
cc1.exe: error: CPU you selected does not support x86-64 instruction set

MSYS (which is x86_64 with CYGWIN)

unlvs@DESKTOP-DFHPDC1 MSYS ~/gcc_bug
$  gcc -E conftestx.c
# 1 "conftestx.c"
# 1 ""
# 1 ""
# 1 "conftestx.c"
conftestx.c:6:2: error: #error need -mcx16
    6 | #error need -mcx16
      |  ^

unlvs@DESKTOP-DFHPDC1 MSYS ~/gcc_bug
$  gcc -E conftestx.c -march=i486
# 1 "conftestx.c"
cc1: error: CPU you selected does not support x86-64 instruction set


The result on Linux:

cqwrteur@DESKTOP-DFHPDC1:/mnt/d/msys64/home/unlvs/gcc_bug$ gcc -E conftestx.c
# 0 "conftestx.c"
# 0 ""
# 0 ""
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 0 "" 2
# 1 "conftestx.c"
conftestx.c:6:2: error: #error need -mcx16
    6 | #error need -mcx16
      |  ^
cqwrteur@DESKTOP-DFHPDC1:/mnt/d/msys64/home/unlvs/gcc_bug$ gcc -E conftestx.c
-march=i486
# 0 "conftestx.c"
cc1: error: CPU you selected does not support x86-64 instruction set

[Bug bootstrap/98860] boostrap failure on MinGW-w64 windows 10

2021-01-27 Thread unlvsur at live dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98860

--- Comment #4 from cqwrteur  ---
Created attachment 50071
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50071&action=edit
bootstrap failure picture

[Bug bootstrap/98860] boostrap failure on MinGW-w64 windows 10

2021-01-27 Thread unlvsur at live dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98860

--- Comment #3 from cqwrteur  ---
After reverting to the previous commit, compilation succeeds:

https://github.com/gcc-mirror/gcc/commit/bfab355012ca0f5219da8beb04f2fdaf757d34b7

I think it has to do with the script you changed, Jakub.

[Bug c++/98861] New: I want deterministic exceptions (Herbception)

2021-01-27 Thread unlvsur at live dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98861

Bug ID: 98861
   Summary: I want deterministic exceptions (Herbception)
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: unlvsur at live dot com
  Target Milestone: ---

The mailing list asked me to file the feature request here, so I am doing so.
https://www.mail-archive.com/gcc@gcc.gnu.org/msg94104.html
http://open-std.org/JTC1/SC22/WG21/docs/papers/2019/p0709r4.pdf

[Bug bootstrap/98860] boostrap failure on MinGW-w64 windows 10

2021-01-27 Thread unlvsur at live dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98860

--- Comment #2 from cqwrteur  ---
I guess it is because of this commit:

https://github.com/gcc-mirror/gcc/commit/0411ae7f08e0f5a8b02ff313d26d27a0f6d1bb34

https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=0411ae7f08e0f5a8b02ff313d26d27a0f6d1bb34

[Bug bootstrap/98860] boostrap failure on MinGW-w64 windows 10

2021-01-27 Thread unlvsur at live dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98860

--- Comment #1 from cqwrteur  ---
The question is why it says we are not cross-compiling. I am using the
same script I used before.
https://bitbucket.org/ejsvifq_mabmip/mingw-gcc-mcf-gthread/src/master/PKGBUILD

It is so weird.


checking whether we are cross compiling... configure: error: in
`/home/unlvs/mcf_build/src/build-x86_64-w64-mingw32/x86_64-w64-mingw32/libgomp':
configure: error: cannot run C compiled programs.
If you meant to cross compile, use `--host'.
See `config.log' for more details
configure: error: in
`/home/unlvs/mcf_build/src/build-x86_64-w64-mingw32/x86_64-w64-mingw32/libatomic':
configure: error: cannot run C compiled programs.
If you meant to cross compile, use `--host'.
See `config.log' for more details
.exe
checking whether we are cross compiling... make[1]: *** [Makefile:15606:
configure-target-libgomp] Error 1
make[1]: *** Waiting for unfinished jobs
configure: error: in
`/home/unlvs/mcf_build/src/build-x86_64-w64-mingw32/x86_64-w64-mingw32/libssp':
configure: error: cannot run C compiled programs.
If you meant to cross compile, use `--host'.
See `config.log' for more details
make[1]: *** [Makefile:16174: configure-target-libatomic] Error 1
configure: error: in
`/home/unlvs/mcf_build/src/build-x86_64-w64-mingw32/x86_64-w64-mingw32/libquadmath':
configure: error: cannot run C compiled programs.
If you meant to cross compile, use `--host'.
See `config.log' for more details
make[1]: *** [Makefile:13329: configure-target-libssp] Error 1
make[1]: *** [Makefile:14375: configure-target-libquadmath] Error 1
make[1]: Leaving directory '/home/unlvs/mcf_build/src/build-x86_64-w64-mingw32'
make: *** [Makefile:973: all] Error 2
==> ERROR: A failure occurred in build().
Aborting...

[Bug ipa/98594] [11 Regression] IPA modref codegen bug

2021-01-27 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98594

--- Comment #3 from Jan Hubicka  ---
The initialization is removed by dse1 pass.  We get:
ipa-modref: call stmt D.3199 = bitCount::bitCount_bitfield<1, int,
glm::packed_highp> (&D.3185); [return slot optimization]
ipa-modref: call to glm::vec bitCount::bitCount_bitfield(const
glm::vec&) [with int L = 1; T = int; glm::qualifier Q =
glm::packed_highp]/8 does not use ref: D.3185.D.3097.x alias sets: 3->1
  Deleted dead store: D.3185.D.3097.x = x_2(D); 

ipa-modref: call stmt D.3199 = bitCount::bitCount_bitfield<1, int,
glm::packed_highp> (&D.3185); [return slot optimization]
ipa-modref: call to glm::vec bitCount::bitCount_bitfield(const
glm::vec&) [with int L = 1; T = int; glm::qualifier Q =
glm::packed_highp]/8 does not use ref: D.3185 alias sets: 3->3
  Deleted dead store: D.3185 ={v} {CLOBBER};

Now the modref summary for function is
  loads:
    Limits: 32 bases, 16 refs
      Base 0: alias set 5
        Ref 0: alias set 5
          access: Parm 0 param offset:0 offset:0 size:32 max_size:32

alias set 5 corresponds to const struct vec, but a different instantiation
than alias set 3 used in the store.
There is reinterpret cast:

  glm::vec::type, Q> x(*reinterpret_cast<glm::vec::type, Q> const *>(&v));

turning it to

  glm::vec::type, Q> x(*(&v));

makes the aliasing difference go away.  So it seems to me that the testcase
simply includes TBAA violation?

[Bug c/97172] [11 Regression] ICE: tree code ‘ssa_name’ is not supported in LTO streams since r11-3303-g6450f07388f9fe57

2021-01-27 Thread msebor at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97172

--- Comment #25 from Martin Sebor  ---
Patch v3: https://gcc.gnu.org/pipermail/gcc-patches/2021-January/564411.html

[Bug bootstrap/98860] New: boostrap failure on MinGW-w64 windows 10

2021-01-27 Thread unlvsur at live dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98860

Bug ID: 98860
   Summary: boostrap failure on MinGW-w64 windows 10
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: bootstrap
  Assignee: unassigned at gcc dot gnu.org
  Reporter: unlvsur at live dot com
  Target Milestone: ---

checking whether we are cross compiling... configure: error: in
`/home/unlvs/mcf_build/src/build-x86_64-w64-mingw32/x86_64-w64-mingw32/libatomic':
configure: error: cannot run C compiled programs.
If you meant to cross compile, use `--host'.
See `config.log' for more details
.exe
checking whether we are cross compiling... make[1]: *** [Makefile:15606:
configure-target-libgomp] Error 1
make[1]: *** Waiting for unfinished jobs
make[1]: *** [Makefile:16174: configure-target-libatomic] Error 1
configure: error: in
`/home/unlvs/mcf_build/src/build-x86_64-w64-mingw32/x86_64-w64-mingw32/libssp':
configure: error: cannot run C compiled programs.
If you meant to cross compile, use `--host'.
See `config.log' for more details
configure: error: in
`/home/unlvs/mcf_build/src/build-x86_64-w64-mingw32/x86_64-w64-mingw32/libquadmath':
configure: error: cannot run C compiled programs.
If you meant to cross compile, use `--host'.
See `config.log' for more details
make[1]: *** [Makefile:13329: configure-target-libssp] Error 1
make[1]: *** [Makefile:14375: configure-target-libquadmath] Error 1
make[1]: Leaving directory '/home/unlvs/mcf_build/src/build-x86_64-w64-mingw32'
make: *** [Makefile:973: all] Error 2
==> ERROR: A failure occurred in build().
Aborting...

[Bug rtl-optimization/80960] [8/9/10/11 Regression] Huge memory use when compiling a very large test case

2021-01-27 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80960

--- Comment #26 from Segher Boessenkool  ---
(In reply to Richard Biener from comment #23)
> (that combine number prevails on trunk as well, I can't spot any code
> that disables combine on large BBs so not sure what goes on here)

There is no such thing, indeed.  And the instruction combiner is
"mostly linear", so it shouldn't actually matter.

[Bug fortran/86470] [8/9/10/11 Regression] [OOP] ICE with OMP

2021-01-27 Thread anlauf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86470

anlauf at gcc dot gnu.org changed:

               What|Removed                        |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org  |anlauf at gcc dot gnu.org
                 CC|                               |anlauf at gcc dot gnu.org
             Status|NEW                            |ASSIGNED

--- Comment #8 from anlauf at gcc dot gnu.org ---
Submitted: https://gcc.gnu.org/pipermail/fortran/2021-January/055647.html

[Bug rtl-optimization/97684] [11 Regression] ICE in reg_preferred_class, at reginfo.c:789 by r11-4577

2021-01-27 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97684

--- Comment #6 from CVS Commits  ---
The master branch has been updated by Vladimir Makarov :

https://gcc.gnu.org/g:081c96621da658760b4a67c07530805f770fa22c

commit r11-6943-g081c96621da658760b4a67c07530805f770fa22c
Author: Vladimir N. Makarov 
Date:   Wed Jan 27 14:53:28 2021 -0500

[PR97684] IRA: Recalculate pseudo classes if we added new pseduos since
last calculation before updating equiv regs

update_equiv_regs can use reg classes of pseudos and they are set up in
register pressure sensitive scheduling and loop invariant motion and in
live range shrinking.  This info can become obsolete if we add new pseudos
since the last set up.  Recalculate it again if the new pseudos were
added.

gcc/ChangeLog:

PR rtl-optimization/97684
* ira.c (ira): Call ira_set_pseudo_classes before
update_equiv_regs when it is necessary.

gcc/testsuite/ChangeLog:

PR rtl-optimization/97684
* gcc.target/i386/pr97684.c: New.

[Bug libstdc++/70303] Value-initialized debug iterators

2021-01-27 Thread fdumont at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70303

François Dumont  changed:

               What|Removed                        |Added
----------------------------------------------------------------------------
                 CC|                               |fdumont at gcc dot gnu.org
           Assignee|unassigned at gcc dot gnu.org  |fdumont at gcc dot gnu.org

--- Comment #6 from François Dumont  ---
After fixing the duplicate PR 98466 std::vector::iterator is ok but
std::deque::iterator seems to be broken still.

Taking it.

[Bug c++/98859] pedantic error on use of __VA_OPT__ before C++20 is unnecessary and counterproductive

2021-01-27 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98859

Marek Polacek  changed:

               What|Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mpolacek at gcc dot gnu.org
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2021-01-27
           Keywords|                            |diagnostic

--- Comment #1 from Marek Polacek  ---
That sounds reasonable.

[Bug c++/98859] New: pedantic error on use of __VA_OPT__ before C++20 is unnecessary and counterproductive

2021-01-27 Thread richard-gccbugzilla at metafoo dot co.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98859

Bug ID: 98859
   Summary: pedantic error on use of __VA_OPT__ before C++20 is
unnecessary and counterproductive
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: richard-gccbugzilla at metafoo dot co.uk
  Target Milestone: ---

There's no good way in ISO C or C++ to express what the GNU ,##__VA_ARGS__
extension does prior to the addition of __VA_OPT__. However, code targeting new
compilers (that doesn't want to use GNU C / GNU C++) cannot reliably use
__VA_OPT__ instead of the comma paste extension, because GCC's -pedantic-errors
mode rejects it outside C++20.

Such rejection is unnecessary: __VA_OPT__ is a reserved identifier in other
language modes, so there is no conformance reason to issue a diagnostic on its
use. I think it'd be useful for GCC to unconditionally allow using __VA_OPT__
in all language modes. (I'm changing Clang to do the same.)
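
A minimal sketch of the usage in question, with a hypothetical macro name (FORMAT_TO) and helper functions that are not from any real code base: __VA_OPT__(,) emits the comma only when variadic arguments are present, which is what the GNU ,##__VA_ARGS__ extension is typically used for.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical formatting macro. The comma before __VA_ARGS__ is produced
// only when at least one variadic argument is passed.
#define FORMAT_TO(buf, fmt, ...) \
    std::snprintf(buf, sizeof buf, fmt __VA_OPT__(,) __VA_ARGS__)

// No variadic arguments: expands to snprintf(buf, sizeof buf, "hello").
std::string format_no_args() {
    char buf[32];
    FORMAT_TO(buf, "hello");
    return buf;
}

// Two variadic arguments: expands to snprintf(buf, sizeof buf, "%d-%d", 1, 2).
std::string format_with_args() {
    char buf[32];
    FORMAT_TO(buf, "%d-%d", 1, 2);
    return buf;
}
```

Per the report, GCC's -pedantic-errors mode rejects this outside C++20.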

[Bug c++/98570] [8/9/10/11 Regression] ICE: canonical types differ for identical types

2021-01-27 Thread jason at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98570

Jason Merrill  changed:

               What|Removed                        |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org  |jason at gcc dot gnu.org
             Status|NEW                            |ASSIGNED
                 CC|                               |jason at gcc dot gnu.org

[Bug lto/85574] [8/9 Regression] LTO bootstapped binaries differ

2021-01-27 Thread ebotcazou at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85574

--- Comment #38 from Eric Botcazou  ---
> Feel free to improve things - I do not have any Windows system to
> test on or an idea what you think needs to be improved.  I would
> guess similar things apply to compare-debug which it was derived from.

That's even more broken than initially thought: nobody sets $(exeext) at top
level so gcc/lto1 is passed and then the behavior is random since some tools
append the missing .exe implicitly and some don't.

[Bug c++/97874] [11 Regression] ICE: tree check: expected record_type or union_type or qual_union_type, have template_type_parm in lookup_using_decl, at cp/name-lookup.c:4652

2021-01-27 Thread jason at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97874

Jason Merrill  changed:

               What|Removed                     |Added
----------------------------------------------------------------------------
           Keywords|ice-on-invalid-code         |ice-on-valid-code
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #3 from Jason Merrill  ---
Fixed.

[Bug c++/97874] [11 Regression] ICE: tree check: expected record_type or union_type or qual_union_type, have template_type_parm in lookup_using_decl, at cp/name-lookup.c:4652

2021-01-27 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97874

--- Comment #2 from CVS Commits  ---
The master branch has been updated by Jason Merrill :

https://gcc.gnu.org/g:9cd7c32549fa334885b716fe98b674f6447fa7c0

commit r11-6942-g9cd7c32549fa334885b716fe98b674f6447fa7c0
Author: Jason Merrill 
Date:   Wed Jan 27 00:51:01 2021 -0500

c++: Dependent using enum [PR97874]

The handling of dependent scopes and unsuitable scopes in lookup_using_decl
was a bit convoluted; I tweaked it for a while and then eventually
reorganized much of the function to hopefully be clearer.  Along the way I
noticed a couple of ways we were mishandling inherited constructors.

The local binding for a dependent using is the USING_DECL.

Implement instantiation of a dependent USING_DECL at function scope.

gcc/cp/ChangeLog:

PR c++/97874
* name-lookup.c (lookup_using_decl): Clean up handling
of dependency and inherited constructors.
(finish_nonmember_using_decl): Handle DECL_DEPENDENT_P.
* pt.c (tsubst_expr): Handle DECL_DEPENDENT_P.

gcc/testsuite/ChangeLog:

PR c++/97874
* g++.dg/lookup/using4.C: No error in C++20.
* g++.dg/cpp0x/decltype37.C: Adjust message.
* g++.dg/template/crash75.C: Adjust message.
* g++.dg/template/crash76.C: Adjust message.
* g++.dg/cpp0x/inh-ctor36.C: New test.
* g++.dg/cpp1z/inh-ctor39.C: New test.
* g++.dg/cpp2a/using-enum-7.C: New test.

[Bug target/98853] [9/10 Regression] wrong use of bfxil at -O1

2021-01-27 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98853

Jakub Jelinek  changed:

               What|Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2021-01-27
            Summary|[9/10/11 Regression] wrong  |[9/10 Regression] wrong use
                   |use of bfxil at -O1         |of bfxil at -O1
             Status|UNCONFIRMED                 |ASSIGNED
     Ever confirmed|0                           |1

--- Comment #5 from Jakub Jelinek  ---
Fixed on the trunk so far.

[Bug target/98853] [9/10/11 Regression] wrong use of bfxil at -O1

2021-01-27 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98853

--- Comment #4 from CVS Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:55163419211c6f17e3e22c68304384eba35782a3

commit r11-6941-g55163419211c6f17e3e22c68304384eba35782a3
Author: Jakub Jelinek 
Date:   Wed Jan 27 20:35:21 2021 +0100

aarch64: Fix up *aarch64_bfxilsi_uxtw [PR98853]

The https://gcc.gnu.org/legacy-ml/gcc-patches/2018-07/msg01895.html
patch that introduced this pattern claimed:
Would generate:

combine_balanced_int:
bfxil   w0, w1, 0, 16
uxtw    x0, w0
ret

But with this patch generates:

combine_balanced_int:
bfxil   w0, w1, 0, 16
ret
and it is indeed what it should generate, but it doesn't do that,
it emits bfxil  x0, x1, 0, 16
instead which doesn't zero extend from 32 to 64 bits, but preserves
the bits from the destination register.

2021-01-27  Jakub Jelinek  

PR target/98853
* config/aarch64/aarch64.md (*aarch64_bfxilsi_uxtw): Use
%w0, %w1 and %2 instead of %0, %1 and %2.

* gcc.c-torture/execute/pr98853-1.c: New test.
* gcc.c-torture/execute/pr98853-2.c: New test.
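
The difference between the two instruction forms described in the commit message can be modelled in portable C++. This is only an illustrative sketch: the source shape assumed for combine_balanced_int and the helper bfxil_x_model are guesses for illustration, not code from the patch or its tests.

```cpp
#include <cassert>
#include <cstdint>

// Assumed source shape: merge the high half of a with the low half of b in
// 32 bits, then zero-extend to 64 bits. This is the effect of the 32-bit
// form "bfxil w0, w1, 0, 16", because writing a w register clears the upper
// 32 bits of the corresponding x register.
std::uint64_t combine_balanced_int(std::uint32_t a, std::uint32_t b) {
    std::uint32_t merged = (a & 0xffff0000u) | (b & 0x0000ffffu);
    return merged;
}

// Model of the 64-bit form "bfxil x0, x1, 0, 16": the low 16 bits come from
// src and the remaining 48 bits of dest are preserved, so stale upper bits
// survive instead of being zeroed.
std::uint64_t bfxil_x_model(std::uint64_t dest, std::uint64_t src) {
    return (dest & ~0xffffull) | (src & 0xffffull);
}
```

With upper bits set in the destination register, the two results differ, which is the wrong-code case the fix addresses.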

[Bug c++/98295] [8/9/10/11 Regression] ICE in verify_ctor_sanity, at cp/constexpr.c:4312

2021-01-27 Thread ppalka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98295

Patrick Palka  changed:

               What|Removed                        |Added
----------------------------------------------------------------------------
             Status|NEW                            |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org  |ppalka at gcc dot gnu.org
                 CC|                               |ppalka at gcc dot gnu.org

[Bug tree-optimization/60770] disappearing clobbers

2021-01-27 Thread orgads at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60770

--- Comment #15 from Orgad Shaneh  ---
test.cpp: In function ‘int f(int)’:
test.cpp:7:11: warning: ‘q’ is used uninitialized in this function
[-Wuninitialized]
7 |   return *p;
  |   ^

Is this the intended description? It doesn't refer to the real problem (storing
a pointer to a variable that is out of scope).
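
For context, the pattern behind such a diagnostic might look like the sketch below. It is a reconstruction consistent with the quoted warning (a read of *p that names ‘q’), not the testcase from the bug, and f_fixed is a hypothetical correction added for contrast.

```cpp
#include <cassert>

// Hypothetical reconstruction: p outlives the variable it points to.
int f_dangling(int n) {
    int* p;
    {
        int q = n;
        p = &q;  // p points at q
    }            // q goes out of scope here
    return *p;   // undefined behavior: q no longer exists
}

// Corrected variant: copy the value out before the scope ends.
int f_fixed(int n) {
    int result;
    {
        int q = n;
        result = q;
    }
    return result;
}
```

Only f_fixed is safe to call. f_dangling is shown solely to illustrate what the warning text is reacting to.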

[Bug fortran/98858] OpenMP offload target data ICE at use_device_ptr

2021-01-27 Thread xw111luoye at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98858

--- Comment #1 from Ye Luo  ---
GNU Fortran (GCC) 11.0.0 20210127 (experimental)

[Bug fortran/98858] New: OpenMP offload target data ICE at use_device_ptr

2021-01-27 Thread xw111luoye at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98858

Bug ID: 98858
   Summary: OpenMP offload target data ICE at use_device_ptr
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: fortran
  Assignee: unassigned at gcc dot gnu.org
  Reporter: xw111luoye at gmail dot com
  Target Milestone: ---

Getting an ICE:


yeluo@ryzen-box:~/opt/openmp-target/hands-on/tests/fortran_use_device_ptr$
gfortran -fopenmp test_use_device_ptr_target.f90 
test_use_device_ptr_target.f90:15:41:

   15 |   !$omp target data use_device_ptr(a)
  | ^
internal compiler error: Segmentation fault
0xf55ee3 crash_signal

Source code at:
https://github.com/ye-luo/openmp-target/blob/master/hands-on/tests/fortran_use_device_ptr/test_use_device_ptr_target.f90

[Bug target/98849] [11 Regression] ICE in expand_shift_1, at expmed.c:2658 since g:7432f255b70811dafaf325d9403

2021-01-27 Thread clyon at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98849

--- Comment #11 from Christophe Lyon  ---
Yes, MVE is incompatible with iWMMXt.

Regarding the pattern name, quoting what I wrote in the commit message:
I kept the mve_vshlq_ naming instead of renaming it to
ashl3__ as discussed because the reference in
arm_mve_builtins.def automatically inserts the "mve_" prefix and I
didn't want to make a special case for this.

[Bug c++/98841] wrong ‘operator=’ should return a reference to ‘*this’ [-Weffc++]

2021-01-27 Thread o.mandel at menlosystems dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98841

--- Comment #7 from Olaf Mandel  ---
(In reply to Olaf Mandel from comment #0)
> In the minimal demo used here this only happens for a template member
> function, but in larger code it can also be observed for a plain member
> function: see e.g. https://github.com/jbeder/yaml-cpp/issues/970
> 
I have to retract that statement: I cannot reproduce this and the two line
numbers in the larger code in question are very similar: 212 and 221. Maybe I
just confused them?

[Bug tree-optimization/60770] disappearing clobbers

2021-01-27 Thread glisse at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60770

--- Comment #14 from Marc Glisse  ---
(In reply to Orgad Shaneh from comment #13)
> The case described in comment 1 doesn't issue a warning with GCC 10.

It does for me with -Wall -O (you need at least some optimization). If there is
still a problem, you need to open a new issue.

[Bug tree-optimization/60770] disappearing clobbers

2021-01-27 Thread orgads at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60770

Orgad Shaneh  changed:

   What|Removed |Added

 CC||orgads at gmail dot com

--- Comment #13 from Orgad Shaneh  ---
The case described in comment 1 doesn't issue a warning with GCC 10.

Looks like it's a different case than bug 60517.

[Bug c++/98295] [8/9/10/11 Regression] ICE in verify_ctor_sanity, at cp/constexpr.c:4312

2021-01-27 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98295

--- Comment #4 from Jakub Jelinek  ---
Still ICEs even when that other bug is fixed.

[Bug testsuite/98351] [11 regression] gcc.target/powerpc/sse-andnps-1.c and sse2-andnpd-1.c fail after r11-3308

2021-01-27 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98351

Jakub Jelinek  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 CC||jakub at gcc dot gnu.org
 Status|UNCONFIRMED |RESOLVED

--- Comment #2 from Jakub Jelinek  ---
Should be fixed with r11-6869-gd08677c11dc4b43cc8bab862d1c986563897ce3f and
r11-6871-g70ab52b8cafffedb05b55c68c847173ff80f2652 and
https://gcc.gnu.org/g:e80f1f6b7a339bce1db03567e497658ae32d135e  

commit r11-6917-ge80f1f6b7a339bce1db03567e497658ae32d135e   
Author: Jakub Jelinek 
Date:   Tue Jan 26 20:02:29 2021 +0100  

testsuite: Fix TBAA in sse*and*p[sd]*.c tests   

This patch drops the no-strict-aliasing hack in m128-check.h and instead
ensures the tests read objects with the right dynamic type. 

2021-01-26  Jakub Jelinek 

* gcc.target/powerpc/m128-check.h (CHECK_EXP): Remove   
optimize ("no-strict-aliasing") attribute.  
* gcc.target/powerpc/sse-andnps-1.c (TEST): Copy e into float[4]
array to avoid violating TBAA.  
* gcc.target/powerpc/sse2-andpd-1.c (TEST): Copy e.d into double[2] 
array to avoid violating TBAA.  
* gcc.target/powerpc/sse-andps-1.c (TEST): Copy e.f into float[4]   
array to avoid violating TBAA.  
* gcc.target/powerpc/sse2-andnpd-1.c (TEST): Copy e into double[2]  
array to avoid violating TBAA.
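
For readers unfamiliar with the pattern named in the ChangeLog above, here is a minimal sketch of "copy into an array of the right type" as opposed to reading through a cast pointer. It is not the code from the testsuite patch; the helper name copy_as_floats is hypothetical, and it assumes 32-bit unsigned int and 32-bit IEEE-754 float.

```cpp
#include <cstring>

// Hypothetical helper (not from the GCC testsuite): reinterpret four 32-bit
// words as four floats by copying the object representation.  std::memcpy
// is always allowed to copy bytes between objects, so no access is made
// through an lvalue of the wrong type and TBAA is not violated.
inline void copy_as_floats(const unsigned int (&src)[4], float (&dst)[4]) {
  static_assert(sizeof(src) == sizeof(dst), "assumes 32-bit unsigned and float");
  std::memcpy(dst, src, sizeof(dst));
}
```

The checks can then compare dst[i] directly, since dst really is an array of float.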

[Bug testsuite/98349] [11 regression] gcc.target/powerpc/sse-movhps-1.c and sse-movlps.c fail after r11-3434

2021-01-27 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98349

Jakub Jelinek  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED
 CC||jakub at gcc dot gnu.org

--- Comment #3 from Jakub Jelinek  ---
Should be fixed by:
https://gcc.gnu.org/g:c63f091db89a56ae56b2bfa2ba4d9e956bd9693f  

commit r11-6879-gc63f091db89a56ae56b2bfa2ba4d9e956bd9693f   
Author: Jakub Jelinek 
Date:   Sat Jan 23 09:41:58 2021 +0100  

rs6000: Fix up __m64 typedef in mmintrin.h [PR97301]

The x86 __m64 type is defined as:   
/* The Intel API is flexible enough that we must allow aliasing with other  
   vector types, and their scalar components.  */   
typedef int __m64 __attribute__ ((__vector_size__ (8), __may_alias__)); 
and so matches the comment above it in that reads and stores through
pointers to __m64 can alias anything.   
But in the rs6000 headers that is the case only for __m128, but not __m64.  

The following patch adds that attribute, which fixes the
FAIL: gcc.target/powerpc/sse-movhps-1.c execution test  
FAIL: gcc.target/powerpc/sse-movlps-1.c execution test  
regressions that appeared when Honza improved ipa-modref.   

2021-01-23  Jakub Jelinek 

PR testsuite/97301  
* config/rs6000/mmintrin.h (__m64): Add __may_alias__ attribute.
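
As an illustration of what the __may_alias__ attribute mentioned above buys, here is a small sketch. It is not taken from mmintrin.h; the typedef name u32_may_alias and the function float_bits are made up for this example, and it assumes a GCC-compatible compiler and a 32-bit float.

```cpp
#include <cstdint>

// GCC/Clang extension: accesses through this type are treated like
// accesses through a character type, i.e. they may alias any object.
typedef std::uint32_t u32_may_alias __attribute__((__may_alias__));

// Hypothetical helper: read the bit pattern of a float through the
// may_alias typedef.  Without the attribute this cast-and-dereference
// would be a strict-aliasing violation.
inline std::uint32_t float_bits(const float &f) {
  static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes 32-bit float");
  return *reinterpret_cast<const u32_may_alias *>(&f);
}
```

This is the same mechanism the commit message describes for __m64: loads and stores through pointers to a may_alias type can alias anything.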

[Bug middle-end/98829] Different results with -O3 and custom quiet NaN

2021-01-27 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98829

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #6 from Jakub Jelinek  ---
Your custom quiet NaN is not a quiet NaN, but signaling NaN.
And, as documented, -fno-signaling-nans is the default.
If you change your custom signaling NaN into a quiet NaN,
static constexpr std::uint64_t kCustomNaN = 0x7ff8 | kMagicNumber;
or if you compile with -fsignaling-nans, this works fine, so I'd say this is
just a user error.
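
To make the quiet vs. signaling distinction concrete, here is a small sketch that classifies a 64-bit pattern as IEEE-754 binary64. It assumes the usual convention (x86 and most current targets) that the most significant mantissa bit set means "quiet". The helper names and the payload 0x1234 used in the checks are illustrative only, not the reporter's kMagicNumber.

```cpp
#include <cstdint>

// IEEE-754 binary64 layout: 1 sign bit, 11 exponent bits, 52 mantissa bits.
constexpr std::uint64_t kExponentMask = 0x7ff0000000000000ULL;
constexpr std::uint64_t kMantissaMask = 0x000fffffffffffffULL;
constexpr std::uint64_t kQuietBit     = 0x0008000000000000ULL;  // mantissa MSB

// A NaN has all exponent bits set and a non-zero mantissa.
constexpr bool is_nan_bits(std::uint64_t b) {
  return (b & kExponentMask) == kExponentMask && (b & kMantissaMask) != 0;
}

// Quiet NaN: a NaN with the mantissa MSB set.
constexpr bool is_quiet_nan_bits(std::uint64_t b) {
  return is_nan_bits(b) && (b & kQuietBit) != 0;
}

// Signaling NaN: a NaN with the mantissa MSB clear.
constexpr bool is_signaling_nan_bits(std::uint64_t b) {
  return is_nan_bits(b) && (b & kQuietBit) == 0;
}
```

With these, 0x7ff0000000000000 | payload is a signaling NaN for any small non-zero payload, while or-ing in the quiet bit as well makes it quiet.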

[Bug libfortran/98825] Unexpected behavior of FORTRAN FORMAT expressions when suppressing new line with '$'

2021-01-27 Thread max.pd at gmx dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98825

--- Comment #5 from max.pd at gmx dot de ---
The -fdec compiler flag provides a possible workaround. When a unit is opened
with CARRIAGECONTROL='NONE' (an option available with the DEC extensions in
gfortran), the program no longer shows the unexpected behavior. However, there
is then no way to re-enable the carriage return between records for that I/O
unit by omitting '$' from the format expression. The workaround also makes it
necessary to open a new unit for stdout output:

   OPEN (UNIT=7, FILE='/dev/stdout', CARRIAGECONTROL='NONE')

 This feature is documented in:

https://gcc.gnu.org/onlinedocs/gcc-10.2.0/gfortran/Extended-I_002fO-specifiers.html

The '$' format works well in gfortran for single records without any
compiler flags. So it would be consistent to make a patch that covers the full
scope of '$' format expressions, including code compiled without the -fdec
compiler flag. Such a patch would cover all possible occurrences of the
unexpected behavior when writing multiple-record output.

[Bug rtl-optimization/97684] [11 Regression] ICE in reg_preferred_class, at reginfo.c:789 by r11-4577

2021-01-27 Thread vmakarov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97684

Vladimir Makarov  changed:

   What|Removed |Added

 CC||vmakarov at gcc dot gnu.org

--- Comment #5 from Vladimir Makarov  ---
I've reproduced the x86-64 case and started to work on it.  I think the patch
will be ready soon.

[Bug tree-optimization/98854] [11 Regression] cray benchmark is about 15% slower since r11-4428-g4a369d199bf2f34e

2021-01-27 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98854

Richard Biener  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #10 from Richard Biener  ---
Fixed.

[Bug tree-optimization/98854] [11 Regression] cray benchmark is about 15% slower since r11-4428-g4a369d199bf2f34e

2021-01-27 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98854

--- Comment #9 from CVS Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:c91db798ec65b3e55f2380ca1530ecb71544f1bb

commit r11-6934-gc91db798ec65b3e55f2380ca1530ecb71544f1bb
Author: Richard Biener 
Date:   Wed Jan 27 15:20:58 2021 +0100

tree-optimization/98854 - avoid some PHI BB vectorization

This avoids cases of PHI node vectorization that just causes us
to insert vector CTORs inside loops for values only required
outside of the loop.

2021-01-27  Richard Biener  

PR tree-optimization/98854
* tree-vect-slp.c (vect_build_slp_tree_2): Also build
PHIs from scalars when the number of CTORs matches the
number of children.

* gcc.dg/vect/bb-slp-pr98854.c: New testcase.

[Bug inline-asm/98847] Miscompilation with c++17, templates, and register keyword

2021-01-27 Thread programmerjake at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98847

--- Comment #4 from programmerjake at gmail dot com ---
(In reply to Jakub Jelinek from comment #3)
> Created attachment 50066 [details]
> gcc11-pr98847.patch
> 
> Untested fix.

That will probably also fix bug #98846

[Bug target/98849] [11 Regression] ICE in expand_shift_1, at expmed.c:2658 since g:7432f255b70811dafaf325d9403

2021-01-27 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98849

--- Comment #10 from Jakub Jelinek  ---
./cc1 -quiet -nostdinc -O3 -mcpu=iwmmxt pr98849-2.c -fdump-tree-all-folding
-mfpu=neon
cc1: error: iWMMXt and NEON are incompatible
So I think TARGET_NEON && TARGET_REALLY_IWMMXT is never true.
Don't know if TARGET_HAVE_MVE && TARGET_REALLY_IWMMXT is similarly never true,
but I'd guess so.  So perhaps just add && !TARGET_REALLY_IWMMXT to the two
conditions.  It is also unclear why you call the pattern mve_* when it is used
by both neon and mve.

[Bug c++/98824] [C++-20] function template non-type-class-arg deduction fails with a reason that looks bogus

2021-01-27 Thread dimitri.gorokhovik at free dot fr via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98824

--- Comment #1 from Dimitri Gorokhovik  ---
It doesn't seem to contradict N4868 :-(

Modifying the code slightly (adding refs, splitting deduction across two fn
templates) didn't show any other differences from clang: all other modifications
either both pass or both fail with equivalent messages ("'i' cannot be
deduced"). This is the only one.

clang version:
Ubuntu clang version
12.0.0-++20201102052620+327bf5c2d91-1~exp1~20201102163303.210
Target: x86_64-pc-linux-gnu

[Bug tree-optimization/98854] [11 Regression] cray benchmark is about 15% slower since r11-4428-g4a369d199bf2f34e

2021-01-27 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98854

--- Comment #8 from rguenther at suse dot de  ---
On Wed, 27 Jan 2021, marxin at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98854
> 
> --- Comment #7 from Martin Liška  ---
> > I used -O3 but -O2 -ftree-slp-vectorize also vectorizes it.
> 
> I must be blind, but I see for the current master:
> 
> gcc pr98854.c -c -O2 -ftree-slp-vectorize -fdump-tree-optimized=/dev/stdout
> 
> foo (int n)
> {
>   unsigned long ivtmp.8;
>   double y;
>   double x;
>   double _6;
>   double _8;
>   double _9;
>   double _11;
>   int _14;
>   void * _29;
>   unsigned long _31;
> 
>   :
>   ivtmp.8_28 = (unsigned long) &MEM[(void *)&a + 8184B];
>   _31 = (unsigned long) &a;
> 
>   :
>   # x_1 = PHI <0.0(2), x_10(5)>
>   # y_2 = PHI <0.0(2), y_12(5)>
>   # ivtmp.8_18 = PHI 
>   _29 = (void *) ivtmp.8_18;
>   _6 = MEM[base: _29, offset: 0B];
>   _8 = MEM[base: _29, offset: 8B];
>   _9 = _6 + _8;
>   x_10 = _9 + x_1;
>   _11 = _6 / _8;
>   y_12 = _11 + y_2;
>   _14 = bar ();
>   if (_14 != 0)
> goto ;
>   else
> goto ;
> 
>   :
>   a[0] = x_10;
>   a[1] = y_12;
>   return;
> 
>   :
>   ivtmp.8_27 = ivtmp.8_18 - 8;
>   if (ivtmp.8_27 != _31)
> goto ;
>   else
> goto ;
> 
> }

Hmm, maybe my dev tree has related adjustments to SLP ... at least
the posted patch fixes the regression for me.

[Bug tree-optimization/98854] [11 Regression] cray benchmark is about 15% slower since r11-4428-g4a369d199bf2f34e

2021-01-27 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98854

--- Comment #7 from Martin Liška  ---
> I used -O3 but -O2 -ftree-slp-vectorize also vectorizes it.

I must be blind, but I see for the current master:

gcc pr98854.c -c -O2 -ftree-slp-vectorize -fdump-tree-optimized=/dev/stdout

foo (int n)
{
  unsigned long ivtmp.8;
  double y;
  double x;
  double _6;
  double _8;
  double _9;
  double _11;
  int _14;
  void * _29;
  unsigned long _31;

  :
  ivtmp.8_28 = (unsigned long) &MEM[(void *)&a + 8184B];
  _31 = (unsigned long) &a;

  :
  # x_1 = PHI <0.0(2), x_10(5)>
  # y_2 = PHI <0.0(2), y_12(5)>
  # ivtmp.8_18 = PHI 
  _29 = (void *) ivtmp.8_18;
  _6 = MEM[base: _29, offset: 0B];
  _8 = MEM[base: _29, offset: 8B];
  _9 = _6 + _8;
  x_10 = _9 + x_1;
  _11 = _6 / _8;
  y_12 = _11 + y_2;
  _14 = bar ();
  if (_14 != 0)
goto ;
  else
goto ;

  :
  a[0] = x_10;
  a[1] = y_12;
  return;

  :
  ivtmp.8_27 = ivtmp.8_18 - 8;
  if (ivtmp.8_27 != _31)
goto ;
  else
goto ;

}

[Bug tree-optimization/98854] [11 Regression] cray benchmark is about 15% slower since r11-4428-g4a369d199bf2f34e

2021-01-27 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98854

--- Comment #6 from rguenther at suse dot de  ---
On Wed, 27 Jan 2021, marxin at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98854
> 
> --- Comment #5 from Martin Liška  ---
> (In reply to Richard Biener from comment #4)
> > Little bit convoluted testcase:
> > 
> > double a[1024];
> > 
> > int bar();
> > void foo (int n)
> > {
> >   double x = 0, y = 0;
> >   int i = 1023;
> >   do
> > {
> >   x += a[i] + a[i+1];
> >   y += a[i] / a[i+1];
> >   if (bar ())
> > break;
> > }
> >   while (--i);
> >   a[0] = x;
> >   a[1] = y;
> > }
> > 
> 
> What compiler (ISA options) do you use in order to vectorize this?

I used -O3 but -O2 -ftree-slp-vectorize also vectorizes it.

[Bug tree-optimization/98856] [11 Regression] botan AES-128/XTS is slower by ~17% since r11-6649-g285fa338b06b804e72997c4d876ecf08a9c083af

2021-01-27 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856

Richard Biener  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Target Milestone|--- |11.0
   Keywords||missed-optimization
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org

--- Comment #1 from Richard Biener  ---
I will have a look.

[Bug tree-optimization/98766] [10 Regression] SVE: ICE in tree_to_shwi with -O3 --param=avoid-fma-max-bits

2021-01-27 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98766

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED
  Known to fail|11.0|

--- Comment #7 from ktkachov at gcc dot gnu.org ---
Fixed on branch too.

[Bug libstdc++/98466] Debug Mode iterators for unordered containers do not implement N3644

2021-01-27 Thread dmjpp at hotmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98466

--- Comment #3 from Dimitrij Mijoski  ---
(In reply to Jonathan Wakely from comment #2)
> This was already fixed on master by r11-6682
> 05a30af3f237984b4dcf1dbbc17fdac583c46506

Yes, that patch mostly fixes bug 70303, too. With that patch, the asserts
presented in bug 70303 pass for vector::iterator but not for
deque::iterator.

[Bug tree-optimization/98855] [11 Regression] botan XTEA is 100% slower on znver2 since r11-4428-g4a369d199bf2f34e

2021-01-27 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98855

Richard Biener  changed:

   What|Removed |Added

   Keywords||missed-optimization

--- Comment #2 from Richard Biener  ---
OK, let's see whether the fix for 98854 makes a difference before investigating
closer.

[Bug tree-optimization/98766] [10 Regression] SVE: ICE in tree_to_shwi with -O3 --param=avoid-fma-max-bits

2021-01-27 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98766

--- Comment #6 from CVS Commits  ---
The releases/gcc-10 branch has been updated by Kyrylo Tkachov:

https://gcc.gnu.org/g:e753db89ddcc7f005fd54f861375bcdc85f23335

commit r10-9305-ge753db89ddcc7f005fd54f861375bcdc85f23335
Author: Kyrylo Tkachov 
Date:   Thu Jan 21 16:33:49 2021 +

tree-ssa-mathopts: Use proper poly_int64 comparison with
param_avoid_fma_max_bits [PR 98766]

We ICE here because we end up comparing a poly_int64 with a scalar using
<= rather than maybe_le.
This patch fixes that in the way rich suggests in the PR.

gcc/ChangeLog:

PR tree-optimization/98766
* tree-ssa-math-opts.c (convert_mult_to_fma): Use maybe_le when
comparing against type size with param_avoid_fma_max_bits.

gcc/testsuite/ChangeLog:

PR tree-optimization/98766
* gcc.dg/pr98766.c: New test.

(cherry picked from commit 9d33785f57daf29dc0c106c919da319fe1906bc6)
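
A toy model may help show why a plain <= is not meaningful for the poly_int64 comparison described in the commit message above. This is not GCC's poly-int.h; the type toy_poly and the toy_* functions are invented for illustration. It models a value c0 + c1 * x where x >= 0 is only known at run time (think of an SVE vector length).

```cpp
#include <cstdint>

// Toy stand-in for a two-coefficient polynomial integer: c0 + c1 * x,
// where x is a non-negative runtime value unknown at compile time.
struct toy_poly {
  std::int64_t c0;
  std::int64_t c1;
};

// True if a < b holds for at least one x >= 0: either already at x == 0,
// or eventually because b grows strictly faster than a.
constexpr bool toy_maybe_lt(toy_poly a, toy_poly b) {
  return a.c0 < b.c0 || a.c1 < b.c1;
}

// True if a <= b holds for at least one x >= 0.
constexpr bool toy_maybe_le(toy_poly a, toy_poly b) {
  return a.c0 <= b.c0 || a.c1 < b.c1;
}

// True if a <= b holds for every x >= 0, i.e. b < a is never possible.
constexpr bool toy_known_le(toy_poly a, toy_poly b) {
  return !toy_maybe_lt(b, a);
}
```

For a size of 16 + 16x bits compared against a constant limit of 128, "maybe" is true (x = 0 gives 16) while "known" is false (x = 8 gives 144), so the caller has to choose which question it is asking.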

[Bug tree-optimization/98854] [11 Regression] cray benchmark is about 15% slower since r11-4428-g4a369d199bf2f34e

2021-01-27 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98854

--- Comment #5 from Martin Liška  ---
(In reply to Richard Biener from comment #4)
> Little bit convoluted testcase:
> 
> double a[1024];
> 
> int bar();
> void foo (int n)
> {
>   double x = 0, y = 0;
>   int i = 1023;
>   do
> {
>   x += a[i] + a[i+1];
>   y += a[i] / a[i+1];
>   if (bar ())
> break;
> }
>   while (--i);
>   a[0] = x;
>   a[1] = y;
> }
> 

What compiler (ISA options) do you use in order to vectorize this?

[Bug c++/83417] Pointer-to-member template parameter with auto member type dependent container type does not work (C++17)

2021-01-27 Thread davveston at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83417

David Friberg  changed:

   What|Removed |Added

 CC||davveston at gmail dot com

--- Comment #3 from David Friberg  ---
The same holds for the case of function pointers.

Given the following function:

 void f(int) {}

Both examples (A) and (B) below are well-formed, as per [temp.deduct.type]/13
(and for (B): also as per [temp.arg.nontype]/1).

 // Example (A)
 template 
 struct A;

 template 
 struct A { };

 A a{};  // #1: OK

 // Example (B)
 template 
 struct B;

 template 
 struct B { };

 B b{};  // #2: Rejected (type deduction failure in partial specialization)

Clang accepts both, whereas GCC (trunk/any version I've tried that supports
C++17) rejects example (B), as #2 is resolved to the primary (non-defined)
class template after failing to deduce the dependent 'T' from 'auto (*fp)(T)'
in the partial specialization, given the argument 'f' to the latter (non-type)
template parameter.
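
The template parameter lists in the examples above did not survive in this archive copy. The following is one plausible reading, reconstructed only from the prose ('auto (*fp)(T)' in the partial specialization of B); the exact declarations in the original comment may differ, and the member "specialized" is added here purely so the result can be checked. It needs -std=c++17 or later. Example (B) is left behind a hypothetical TRY_EXAMPLE_B macro, since it is the case reported as rejected.

```cpp
void f(int) {}

// Example (A), assumed form: T is deduced from the type of the non-type
// template argument, with the return type spelled out as void.
template <auto>
struct A;

template <typename T, void (*fp)(T)>
struct A<fp> { static constexpr bool specialized = true; };

A<f> a{};  // #1: picks the partial specialization

#ifdef TRY_EXAMPLE_B
// Example (B), assumed form: same as (A) but with a placeholder return
// type in the function pointer parameter.
template <auto>
struct B;

template <typename T, auto (*fp)(T)>
struct B<fp> { static constexpr bool specialized = true; };

B<f> b{};  // #2: the case reported above as rejected by GCC
#endif
```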

[Bug rtl-optimization/80960] [8/9/10/11 Regression] Huge memory use when compiling a very large test case

2021-01-27 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80960

Richard Biener  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org

--- Comment #25 from Richard Biener  ---
Oh, so it's not actually those plus_constant calls but the ones called via
get_addr from true_dependence_1 which is called 60 million times from
check_mem_read_use.  That does:

/* Convert the address X into something we can use.  This is done by returning
   it unchanged unless it is a VALUE or VALUE +/- constant; for VALUE
   we call cselib to get a more useful rtx.  */

rtx
get_addr (rtx x)
{
  cselib_val *v;
  struct elt_loc_list *l;

  if (GET_CODE (x) != VALUE)
{
  if ((GET_CODE (x) == PLUS || GET_CODE (x) == MINUS)
  && GET_CODE (XEXP (x, 0)) == VALUE
  && CONST_SCALAR_INT_P (XEXP (x, 1)))
{
  rtx op0 = get_addr (XEXP (x, 0));
  if (op0 != XEXP (x, 0))
{
  poly_int64 c;
  if (GET_CODE (x) == PLUS
  && poly_int_rtx_p (XEXP (x, 1), &c))
return plus_constant (GET_MODE (x), op0, c);

thus undoing the valueization DSE does.  Since it unconditionally does
this I guess DSE could do it itself instead.  That helps tremendously:

 dead store elim2   :   6.34 ( 11%)   0.02 (  7%)   6.38 ( 11%)   170M ( 45%)
 TOTAL              :  56.96          0.27          57.26          381M
56.96user 0.29system 0:57.27elapsed 99%CPU (0avgtext+0avgdata 825148maxresident)k
0inputs+0outputs (0major+210372minor)pagefaults 0swaps

diff --git a/gcc/dse.c b/gcc/dse.c
index c88587e7d94..da0df54a2dd 100644
--- a/gcc/dse.c
+++ b/gcc/dse.c
@@ -2219,6 +2219,11 @@ check_mem_read_rtx (rtx *loc, bb_info_t bb_info)
 }
   if (maybe_ne (offset, 0))
 mem_addr = plus_constant (get_address_mode (mem), mem_addr, offset);
+  /* Avoid passing VALUE RTXen as mem_addr to canon_true_dependence
+ which will over and over re-create proper RTL and re-apply the
+ offset above.  See PR80960 where we almost allocate 1.6GB of PLUS
+ RTXen that way.  */
+  mem_addr = get_addr (mem_addr);

   if (group_id >= 0)
 {

[Bug c++/98857] New: Add support for function attributes applied to function pointers from non-capturing lambdas

2021-01-27 Thread koncek.marian at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98857

Bug ID: 98857
   Summary: Add support for function attributes applied to
function pointers from non-capturing lambdas
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: koncek.marian at gmail dot com
  Target Milestone: ---

Created attachment 50070
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50070&action=edit
example

Since a non-capturing lambda has to be convertible to a function pointer, it may
be useful to be able to specify function attributes (such as
[[gnu::aligned(N)]]) which apply to the function pointer obtained from such a
lambda.

Example use: https://godbolt.org/z/ds1G6z
(also attached)

This does raise some questions about which of the following an attribute applies to:
1) lambda object
2) member application function: &decltype(lambda)::operator()
3) function pointer from non-capturing lambdas

and where to place the attribute specifier.
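
For reference, the three entities listed above can be told apart in plain C++ today, without any attributes. This sketch is not the attached example; the function name call_all_three is made up. It only shows where each entity comes from, which is what the attribute placement question is about.

```cpp
// Calls the same non-capturing lambda through each of the three entities
// named in the list above and returns the sum of the results.
inline int call_all_three(int x) {
  auto lambda = [](int v) { return v + 1; };

  // 1) the lambda (closure) object itself
  int r1 = lambda(x);

  // 2) the closure type's member call operator, as a pointer to member
  auto mem = &decltype(lambda)::operator();
  int r2 = (lambda.*mem)(x);

  // 3) the function pointer obtained via the implicit conversion that
  //    exists only for non-capturing lambdas
  int (*fp)(int) = lambda;
  int r3 = fp(x);

  return r1 + r2 + r3;
}
```

Note that 2) and 3) are distinct functions as far as the language is concerned: the conversion yields a pointer to a function with the same effect as the call operator, not a pointer to the call operator itself.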

[Bug target/98853] [9/10/11 Regression] wrong use of bfxil at -O1

2021-01-27 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98853

--- Comment #3 from Jakub Jelinek  ---
Created attachment 50069
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50069&action=edit
gcc11-pr98853.patch

Untested fix.

[Bug tree-optimization/98855] [11 Regression] botan XTEA is 100% slower on znver2 since r11-4428-g4a369d199bf2f34e

2021-01-27 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98855

--- Comment #1 from Martin Liška  ---
And likely something similar happens since the same revision:

botan/KASUMI decrypt
https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=245.694.1&plot.1=171.694.1

[Bug tree-optimization/98856] [11 Regression] botan AES-128/XTS is slower by ~17% since r11-6649-g285fa338b06b804e72997c4d876ecf08a9c083af

2021-01-27 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856

Martin Liška  changed:

   What|Removed |Added

  Known to fail||11.0
   Last reconfirmed||2021-01-27
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
  Known to work||10.2.0

[Bug tree-optimization/98856] New: [11 Regression] botan AES-128/XTS is slower by ~17% since r11-6649-g285fa338b06b804e72997c4d876ecf08a9c083af

2021-01-27 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856

Bug ID: 98856
   Summary: [11 Regression] botan AES-128/XTS is slower by ~17%
since
r11-6649-g285fa338b06b804e72997c4d876ecf08a9c083af
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: marxin at gcc dot gnu.org
CC: rguenth at gcc dot gnu.org
  Target Milestone: ---

Since the revision the following is slower:

$ make clean && ./configure.py --cxxflags="-Ofast -march=znver2 -fno-checking"
&& make -j16 && ./botan speed AES-128/XTS

as seen here:
https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=226.721.1&plot.1=14.721.1&;

[Bug libstdc++/98466] Debug Mode iterators for unordered containers do not implement N3644

2021-01-27 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98466

--- Comment #2 from Jonathan Wakely  ---
This was already fixed on master by r11-6682
05a30af3f237984b4dcf1dbbc17fdac583c46506

[Bug libstdc++/98466] Debug Mode iterators for unordered containers do not implement N3644

2021-01-27 Thread dmjpp at hotmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98466

Dimitrij Mijoski  changed:

   What|Removed |Added

 CC||dmjpp at hotmail dot com

--- Comment #1 from Dimitrij Mijoski  ---
This bug looks like a duplicate of bug 70303. The asserts presented there
should be used on random-access iterators (vector, deque) to test if N3644 is
implemented.
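
For context, the core guarantee of N3644 (C++14) is that value-initialized forward iterators of the same type compare equal. A minimal check could look like the sketch below; it is not the set of asserts from bug 70303, and the helper name is made up.

```cpp
#include <deque>
#include <vector>

// Hypothetical helper: value-initialize two iterators of the container's
// iterator type and compare them.  Per N3644 this must be well-defined
// and yield true for forward iterators.
template <typename Container>
bool value_initialized_iterators_equal() {
  typename Container::iterator a{};
  typename Container::iterator b{};
  return a == b;
}
```

Instantiating it for std::vector<int> and std::deque<int>, once normally and once with -D_GLIBCXX_DEBUG, would exercise the Debug Mode iterators this report is about.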

[Bug target/98853] [9/10/11 Regression] wrong use of bfxil at -O1

2021-01-27 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98853

Jakub Jelinek  changed:

   What|Removed |Added

   Target Milestone|11.0|9.4
   Assignee|unassigned at gcc dot gnu.org  |jakub at gcc dot gnu.org
Summary|[11 Regression] wrong use   |[9/10/11 Regression] wrong
   |of bfxil at -O1 |use of bfxil at -O1

--- Comment #2 from Jakub Jelinek  ---
That change has been introduced in
r9-2905-g2dc09f66b3b49d821e4bd68d3c97ff51d5e080d4 , so I think we have at least
latent wrong-code in 9 and 10 too.

[Bug tree-optimization/98854] [11 Regression] cray benchmark is about 15% slower since r11-4428-g4a369d199bf2f34e

2021-01-27 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98854

--- Comment #4 from Richard Biener  ---
Little bit convoluted testcase:

double a[1024];

int bar();
void foo (int n)
{
  double x = 0, y = 0;
  int i = 1023;
  do
{
  x += a[i] + a[i+1];
  y += a[i] / a[i+1];
  if (bar ())
break;
}
  while (--i);
  a[0] = x;
  a[1] = y;
}

where we end up with the {x, y} vector CTOR inside the loop (and even
spill/reload it because of the call).  We have a PHI node-only feed
for the vectorized store:

t.c:16:8: note: Vectorizing SLP tree:
t.c:16:8: note: node 0x3b21ee0 (max_nunits=2, refcnt=1)
t.c:16:8: note: op template: a[0] = x_22;
t.c:16:8: note: stmt 0 a[0] = x_22;
t.c:16:8: note: stmt 1 a[1] = y_21;
t.c:16:8: note: children 0x3b21f68
t.c:16:8: note: node 0x3b21f68 (max_nunits=2, refcnt=1)
t.c:16:8: note: op template: x_22 = PHI 
t.c:16:8: note: stmt 0 x_22 = PHI 
t.c:16:8: note: stmt 1 y_21 = PHI 
t.c:16:8: note: children 0x3b21ff0 0x3b22210
t.c:16:8: note: node 0x3b21ff0 (max_nunits=2, refcnt=1)
t.c:16:8: note: op template: x_26 = PHI 
t.c:16:8: note: stmt 0 x_26 = PHI 
t.c:16:8: note: stmt 1 y_24 = PHI 
t.c:16:8: note: children 0x3b22320
t.c:16:8: note: node (external) 0x3b22320 (max_nunits=1, refcnt=1)
t.c:16:8: note: { x_14, y_15 }
t.c:16:8: note: node 0x3b22210 (max_nunits=2, refcnt=1)
t.c:16:8: note: op template: x_25 = PHI 
t.c:16:8: note: stmt 0 x_25 = PHI 
t.c:16:8: note: stmt 1 y_23 = PHI 
t.c:16:8: note: children 0x3b223a8
t.c:16:8: note: node (external) 0x3b223a8 (max_nunits=1, refcnt=1)
t.c:16:8: note: { x_14, y_15 }

fixing this issue fixes the slowdown.  Testing a patch.

[Bug target/98853] [11 Regression] wrong use of bfxil at -O1

2021-01-27 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98853

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #1 from Jakub Jelinek  ---
I admit I know next to nothing about AArch64, but the
https://gcc.gnu.org/legacy-ml/gcc-patches/2018-07/msg01895.html
patch certainly doesn't emit what it claims to, and from a brief look at the
assembler guide it appears that emitting what the patch claims to shall fix it.

So I think this should be:
--- gcc/config/aarch64/aarch64.md.jj	2021-01-04 10:25:46.435147744 +0100
+++ gcc/config/aarch64/aarch64.md   2021-01-27 15:13:13.993275204 +0100
@@ -5724,10 +5724,10 @@ (define_insn "*aarch64_bfxilsi_uxtw"
 {
   case 0:
operands[3] = GEN_INT (ctz_hwi (~INTVAL (operands[3])));
-   return "bfxil\\t%0, %1, 0, %3";
+   return "bfxil\\t%w0, %w1, 0, %3";
   case 1:
operands[3] = GEN_INT (ctz_hwi (~INTVAL (operands[4])));
-   return "bfxil\\t%0, %2, 0, %3";
+   return "bfxil\\t%w0, %w2, 0, %3";
   default:
gcc_unreachable ();
 }
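
To illustrate why the register width matters here, below is a small C++ model of BFXIL. It is a sketch of the architectural behaviour as commonly documented (copy <width> bits starting at <lsb> of the source into the low bits of the destination, leave the other destination bits alone; a write to a W register zeroes the upper 32 bits of the X register). The function names are made up, and callers are assumed to pass width >= 1 with lsb + width within the register size.

```cpp
#include <cstdint>

// Model of "bfxil Xd, Xn, #lsb, #width": the upper bits of the 64-bit
// destination are preserved.
inline std::uint64_t bfxil_x(std::uint64_t dst, std::uint64_t src,
                             unsigned lsb, unsigned width) {
  const std::uint64_t mask =
      width >= 64 ? ~std::uint64_t{0} : ((std::uint64_t{1} << width) - 1);
  return (dst & ~mask) | ((src >> lsb) & mask);
}

// Model of "bfxil Wd, Wn, #lsb, #width": operates on the low 32 bits only
// and the result is zero-extended into the full 64-bit register.
inline std::uint64_t bfxil_w(std::uint64_t dst, std::uint64_t src,
                             unsigned lsb, unsigned width) {
  const std::uint32_t d = static_cast<std::uint32_t>(dst);
  const std::uint32_t s = static_cast<std::uint32_t>(src);
  const std::uint32_t mask =
      width >= 32 ? ~std::uint32_t{0} : ((std::uint32_t{1} << width) - 1);
  return (d & ~mask) | ((s >> lsb) & mask);
}
```

If the destination register holds garbage in its upper half, only the W form produces the zero-extended SImode result that a *_uxtw pattern promises.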

[Bug ipa/98815] Redundant free_dominance_info in cgraph_node::analyze()

2021-01-27 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98815

--- Comment #4 from Martin Liška  ---
I can confirm the patch survives bootstrap and regression tests.
I'm going to send it at the beginning of the next stage1.

[Bug libstdc++/66414] string::find ten times slower than strstr

2021-01-27 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66414

--- Comment #10 from CVS Commits  ---
The master branch has been updated by Jonathan Wakely :

https://gcc.gnu.org/g:a199da782fc165fd45f42a15cc9020994efd455d

commit r11-6931-ga199da782fc165fd45f42a15cc9020994efd455d
Author: Jonathan Wakely 
Date:   Wed Jan 27 13:21:52 2021 +

libstdc++: Optimize std::string_view::find [PR 66414]

This reuses the code from std::string::find, which was improved by
r244225, but string_view was not changed to match.

libstdc++-v3/ChangeLog:

PR libstdc++/66414
* include/bits/string_view.tcc
(basic_string_view::find(const CharT*, size_type, size_type)):
Optimize.
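
For readers who want a feel for this kind of optimization, here is a sketch of a common substring-search strategy built on char_traits: locate the first character with traits::find (typically memchr), then verify the rest with traits::compare. This is not the libstdc++ code from the commit; find_sketch is a made-up name and the function is limited to char. It needs C++17 for std::string_view.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Returns the index of the first occurrence of needle in hay at or after
// pos, or std::string_view::npos if there is none.
inline std::size_t find_sketch(std::string_view hay, std::string_view needle,
                               std::size_t pos = 0) {
  using traits = std::char_traits<char>;
  const std::size_t npos = std::string_view::npos;

  if (needle.empty())
    return pos <= hay.size() ? pos : npos;
  if (pos >= hay.size() || needle.size() > hay.size() - pos)
    return npos;

  const char first = needle[0];
  const char *p = hay.data() + pos;
  const char *const end = hay.data() + hay.size();
  std::size_t remaining = hay.size() - pos;  // characters from p to end

  while (remaining >= needle.size()) {
    // Only scan start positions where the whole needle still fits.
    p = traits::find(p, remaining - needle.size() + 1, first);
    if (!p)
      return npos;
    // First character matches; check the full needle at this position.
    if (traits::compare(p, needle.data(), needle.size()) == 0)
      return static_cast<std::size_t>(p - hay.data());
    ++p;
    remaining = static_cast<std::size_t>(end - p);
  }
  return npos;
}
```

The gain over a naive character-by-character loop comes from letting the library's optimized single-character search skip over non-matching positions quickly.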

[Bug tree-optimization/98854] [11 Regression] cray benchmark is about 15% slower since r11-4428-g4a369d199bf2f34e

2021-01-27 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98854

--- Comment #3 from Richard Biener  ---
OK, one can see it with BB vectorization enabled vs. disabled.

Bad:

Samples: 7K of event 'cycles:u', Event count (approx.): 7540324763  
Overhead   Samples  Command  Shared Object   Symbol 
  53.11%  3711  a.outa.out   [.] shade
  25.39%  1774  a.outa.out   [.] trace
  18.16%  1271  a.outa.out   [.] render_scanline
   1.56%   109  a.outlibm-2.26.so[.] __ieee754_pow_sse2

Good:

Samples: 6K of event 'cycles:u', Event count (approx.): 6673802579  
Overhead   Samples  Command  Shared Object   Symbol 
  61.21%  3857  a.outa.out   [.] shade
  20.44%  1288  a.outa.out   [.] trace
  14.42%   912  a.outa.out   [.] render_scanline
   1.81%   114  a.outlibm-2.26.so[.] __ieee754_pow_sse2

With added -fwhole-program we have

c-ray-mt.c:624:18: optimized: basic block part vectorized using 32 byte vectors
c-ray-mt.c:372:13: optimized: basic block part vectorized using 32 byte vectors
c-ray-mt.c:372:13: optimized: basic block part vectorized using 32 byte vectors
c-ray-mt.c:432:9: optimized: basic block part vectorized using 32 byte vectors
c-ray-mt.c:656:7: optimized: basic block part vectorized using 32 byte vectors
c-ray-mt.c:656:7: optimized: basic block part vectorized using 32 byte vectors
c-ray-mt.c:265:23: optimized: basic block part vectorized using 32 byte vectors

:372 is bad and then :656

For the first we vectorize a store

   [local count: 31445960]:
  # nearest_obj_239 = PHI 
...
  _816 = {nearest_sp_pos_x_lsm.258_78, nearest_sp_pos_y_lsm.259_174,
nearest_sp_pos_z_lsm.260_201, nearest_sp_normal_x_lsm.261_200};
  _820 = {nearest_sp_normal_y_lsm.262_122, nearest_sp_normal_z_lsm.263_293,
nearest_sp_vref_x_lsm.264_124, nearest_sp_vref_y_lsm.265_148};
  iter_231 = iter_363->next;
  if (iter_231 != 0B)
goto ; [89.00%]
  else
goto ; [11.00%]

   [local count: 27986904]:
  goto ; [100.00%]

   [local count: 3459055]:
  # nearest_sp_dist_lsm.257_228 = PHI 
  # nearest_sp_pos_x_lsm.258_226 = PHI 
  # nearest_sp_normal_y_lsm.262_343 = PHI 
  # nearest_sp_vref_x_lsm.264_238 = PHI 
  # nearest_sp_vref_y_lsm.265_237 = PHI 
  # nearest_sp_vref_z_lsm.266_236 = PHI 
  # nearest_sp_pos_y_lsm.259_342 = PHI 
  # nearest_sp_normal_x_lsm.261_351 = PHI 
  # nearest_sp_pos_z_lsm.260_304 = PHI 
  # nearest_obj_197 = PHI 
  # nearest_sp_normal_z_lsm.263_821 = PHI 
  # vect_nearest_sp_pos_x_lsm.258_226.268_815 = PHI <_816(26)>
  # vect_nearest_sp_pos_x_lsm.258_226.268_814 = PHI <_820(26)>
  nearest_sp.vref.z = nearest_sp_vref_z_lsm.266_236;
  MEM  [(double *)&nearest_sp] =
vect_nearest_sp_pos_x_lsm.258_226.268_815;
  _812 = &nearest_sp.pos.x + 32;
  MEM  [(double *)_812] =
vect_nearest_sp_pos_x_lsm.258_226.268_814;

but we insert the vector CTOR on a path that's more often executed than
the use.  And since there's no sinking pass after vectorization nothing
fixes this up.

[Bug tree-optimization/98855] [11 Regression] botan XTEA is 100% slower on znver2 since r11-4428-g4a369d199bf2f34e

2021-01-27 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98855

Martin Liška  changed:

   What|Removed |Added

   Target Milestone|--- |11.0
 CC||rguenth at gcc dot gnu.org
   See Also||https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98854

[Bug target/98849] [11 Regression] ICE in expand_shift_1, at expmed.c:2658 since g:7432f255b70811dafaf325d9403

2021-01-27 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98849

--- Comment #9 from Jakub Jelinek  ---
(In reply to Christophe Lyon from comment #6)
> so to answer your question arm does have vector shift by scalar.

If it does, it doesn't advertise them:
make mddump
grep '"v\?ashlv[0-9qhsdi]*3"' tmp-mddump.md 
(define_expand ("vashlv8qi3")
(define_expand ("vashlv16qi3")
(define_expand ("vashlv4hi3")
(define_expand ("vashlv8hi3")
(define_expand ("vashlv2si3")
(define_expand ("vashlv4si3")
Ditto for ashr and lshr instead of ashl.

[Bug tree-optimization/98855] New: [11 Regression] botan XTEA is 100% slower on znver2 since r11-4428-g4a369d199bf2f34e

2021-01-27 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98855

Bug ID: 98855
   Summary: [11 Regression] botan XTEA is 100% slower on znver2
since r11-4428-g4a369d199bf2f34e
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: marxin at gcc dot gnu.org
  Target Milestone: ---

Since the revision the following is now slower:

$ make clean && ./configure.py --cxxflags="-Ofast -march=znver2" && make -j16 && ./botan speed XTEA

as seen here:
https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=245.710.1&plot.1=171.710.1&;

Algorithm is implemented here:
src/lib/block/xtea/xtea.cpp

[Bug target/98849] [11 Regression] ICE in expand_shift_1, at expmed.c:2658 since g:7432f255b70811dafaf325d9403

2021-01-27 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98849

--- Comment #8 from Jakub Jelinek  ---
Seems vec_init optab is supported if TARGET_NEON || TARGET_HAVE_MVE, so maybe
guard the shift expander also on && (TARGET_NEON || TARGET_HAVE_MVE)?
Or && !TARGET_REALLY_IWMMXT.  Dunno if one can mix iwmmxt with neon or mve etc.

[Bug target/98849] [11 Regression] ICE in expand_shift_1, at expmed.c:2658 since g:7432f255b70811dafaf325d9403

2021-01-27 Thread clyon at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98849

--- Comment #7 from Christophe Lyon  ---
(In reply to ktkachov from comment #5)
> Looks like after the refactoring to introduce MVE shifts (which doesn't ICE)
> we need to make sure the optab is still disabled for iwmmxt?

So that would mean that ARM_HAVE_<MODE>_ARITH shouldn't be defined for iwmmxt
(only for shifts?) ?

[Bug target/98849] [11 Regression] ICE in expand_shift_1, at expmed.c:2658 since g:7432f255b70811dafaf325d9403

2021-01-27 Thread clyon at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98849

--- Comment #6 from Christophe Lyon  ---
I'm not familiar with iwmmxt, but the testcase in comment #2 is vectorized
with:
* -mcpu=cortex-a9 -mfpu=auto -mfloat-abi=hard (uses Neon FPU)
* -mcpu=cortex-m55 -mfpu=auto -mfloat-abi=hard (uses MVE/Helium FPU)
in both cases -mfloat-abi=hard is required.

Using -mcpu=iwmmxt -mfpu=auto -mfloat-abi=hard fails because:
cc1: error: '-mfloat-abi=hard': selected processor lacks an FPU

so to answer your question arm does have vector shift by scalar.

But the Neon/MVE patterns use a const_vector constraint (see
mve_vshlq_ and vashl3 in vec-common.md and ashl3_iwmmxt
in iwmmxt.md)

[Bug target/98849] [11 Regression] ICE in expand_shift_1, at expmed.c:2658 since g:7432f255b70811dafaf325d9403

2021-01-27 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98849

--- Comment #5 from ktkachov at gcc dot gnu.org ---
Looks like after the refactoring to introduce MVE shifts (which doesn't ICE) we
need to make sure the optab is still disabled for iwmmxt?

[Bug c++/98531] [11 Regression] g++.dg/modules/xtreme-header-2_a.H etc. FAIL

2021-01-27 Thread nathan at acm dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98531

--- Comment #8 from Nathan Sidwell  ---
On 1/27/21 8:30 AM, ro at CeBiTec dot Uni-Bielefeld.DE wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98531
> 
> --- Comment #7 from ro at CeBiTec dot Uni-Bielefeld.DE ---
> Nathan,
> 
> last night I've tried the patch you posted on both i386-pc-solaris2.11
> and sparc-sun-solaris2.11, with mixed results:
> 
> * The new g++.dg/modules/pr98531_* testcases PASS.
> 
> * However, there's a libstdc++ regression:
> 
> +FAIL: 17_intro/headers/c++1998/all_attributes.cc (test for excess errors)
> +FAIL: 17_intro/headers/c++2011/all_attributes.cc (test for excess errors)
> +FAIL: 17_intro/headers/c++2014/all_attributes.cc (test for excess errors)
> +FAIL: 17_intro/headers/c++2017/all_attributes.cc (test for excess errors)
> 
> Excess errors:
> /vol/gcc/src/hg/master/local/libstdc++-v3/libsupc++/cxxabi.h:129: error:
> declaration of 'int __cxxabiv1::__cxa_atexit(void (*)(void*), void*, void*)
> throw ()' has a different exception specifier

thanks, I'm finding this too -- thankful I didn't push the patch!  this
is indicative of a mismatch between the runtime library and the
compiler's idea of it.
> 
>i.e.
> 
> In file included from
> /vol/gcc/src/hg/master/local/libstdc++-v3/testsuite/17_intro/headers/c++1998/all_attributes.cc:40:
> /vol/gcc/src/hg/master/local/libstdc++-v3/libsupc++/cxxabi.h:129: error:
> declaration of 'int __cxxabiv1::__cxa_atexit(void (*)(void*), void*, void*)
> throw ()' has a different exception specifier
> In file included from
> /var/gcc/regression/master/11.4-gcc/build/i386-pc-solaris2.11/libstdc++-v3/include/i386-pc-solaris2.11/bits/extc++.h:68,
>   from
> /vol/gcc/src/hg/master/local/libstdc++-v3/testsuite/17_intro/headers/c++1998/all_attributes.cc:39:
> /var/gcc/regression/master/11.4-gcc/build/i386-pc-solaris2.11/libstdc++-v3/include/ext/throw_allocator.h:371:
> note: from previous declaration 'int __cxxabiv1::__cxa_atexit(void (*)(void*),
> void*, void*)'
> 
>where cxxabi.h has
> 
> #ifdef _GLIBCXX_CDTOR_CALLABI
>__cxa_atexit(void (_GLIBCXX_CDTOR_CALLABI *)(void*), void*, void*)
> _GLIBCXX_NOTHROW;
> #else
>__cxa_atexit(void (*)(void*), void*, void*) _GLIBCXX_NOTHROW;
> #endif
> 
> * Besides, the ICE in the original testcases remains:
> 
> /vol/gcc/src/hg/master/local/gcc/testsuite/g++.dg/modules/xtreme-header-2_a.H:
> internal compiler error: in tree_node, at cp/module.cc:9137
> 
> 
>I'm uncertain if the patch was just meant as a preparatory step to fix
>those or something else is amiss.

thanks, I was going to revisit the original report to see if there were 
further issues.

nathan

[Bug target/98849] [11 Regression] ICE in expand_shift_1, at expmed.c:2658 since g:7432f255b70811dafaf325d9403

2021-01-27 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98849

Jakub Jelinek  changed:

   What|Removed |Added

   Priority|P3  |P1

[Bug libstdc++/66414] string::find ten times slower than strstr

2021-01-27 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66414

--- Comment #9 from Jonathan Wakely  ---
(In reply to AK from comment #8)
> Should we consider this fixed?

I think we can still do better, by using GNU memmem when it's available:

https://gcc.gnu.org/pipermail/gcc-patches/2017-January/466460.html
https://gcc.gnu.org/pipermail/gcc-patches/2017-January/466469.html
https://gcc.gnu.org/pipermail/gcc-patches/2017-January/466471.html

For now we should also use the new code in basic_string_view::find which is
currently much slower.

[Bug c++/98531] [11 Regression] g++.dg/modules/xtreme-header-2_a.H etc. FAIL

2021-01-27 Thread ro at CeBiTec dot Uni-Bielefeld.DE via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98531

--- Comment #7 from ro at CeBiTec dot Uni-Bielefeld.DE  ---
Nathan,

last night I've tried the patch you posted on both i386-pc-solaris2.11
and sparc-sun-solaris2.11, with mixed results:

* The new g++.dg/modules/pr98531_* testcases PASS.

* However, there's a libstdc++ regression:

+FAIL: 17_intro/headers/c++1998/all_attributes.cc (test for excess errors)
+FAIL: 17_intro/headers/c++2011/all_attributes.cc (test for excess errors)
+FAIL: 17_intro/headers/c++2014/all_attributes.cc (test for excess errors)
+FAIL: 17_intro/headers/c++2017/all_attributes.cc (test for excess errors)

Excess errors:
/vol/gcc/src/hg/master/local/libstdc++-v3/libsupc++/cxxabi.h:129: error:
declaration of 'int __cxxabiv1::__cxa_atexit(void (*)(void*), void*, void*)
throw ()' has a different exception specifier

  i.e.

In file included from
/vol/gcc/src/hg/master/local/libstdc++-v3/testsuite/17_intro/headers/c++1998/all_attributes.cc:40:
/vol/gcc/src/hg/master/local/libstdc++-v3/libsupc++/cxxabi.h:129: error:
declaration of 'int __cxxabiv1::__cxa_atexit(void (*)(void*), void*, void*)
throw ()' has a different exception specifier
In file included from
/var/gcc/regression/master/11.4-gcc/build/i386-pc-solaris2.11/libstdc++-v3/include/i386-pc-solaris2.11/bits/extc++.h:68,
 from
/vol/gcc/src/hg/master/local/libstdc++-v3/testsuite/17_intro/headers/c++1998/all_attributes.cc:39:
/var/gcc/regression/master/11.4-gcc/build/i386-pc-solaris2.11/libstdc++-v3/include/ext/throw_allocator.h:371:
note: from previous declaration 'int __cxxabiv1::__cxa_atexit(void (*)(void*),
void*, void*)'

  where cxxabi.h has

#ifdef _GLIBCXX_CDTOR_CALLABI
  __cxa_atexit(void (_GLIBCXX_CDTOR_CALLABI *)(void*), void*, void*)
_GLIBCXX_NOTHROW;
#else
  __cxa_atexit(void (*)(void*), void*, void*) _GLIBCXX_NOTHROW;
#endif

* Besides, the ICE in the original testcases remains:

/vol/gcc/src/hg/master/local/gcc/testsuite/g++.dg/modules/xtreme-header-2_a.H:
internal compiler error: in tree_node, at cp/module.cc:9137


  I'm uncertain if the patch was just meant as a preparatory step to fix
  those or something else is amiss.

[Bug target/98849] [11 Regression] ICE in expand_shift_1, at expmed.c:2658 since g:7432f255b70811dafaf325d9403

2021-01-27 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98849

Jakub Jelinek  changed:

   What|Removed |Added

 CC||ktkachov at gcc dot gnu.org,
   ||rearnsha at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek  ---
Ah, so on powerpc64le this works fine, expand_binop has:
  /* If this is a vector shift by a scalar, see if we can do a vector
     shift by a vector.  If so, broadcast the scalar into a vector.  */
  if (mclass == MODE_VECTOR_INT)
    {
      optab otheroptab = unknown_optab;

      if (binoptab == ashl_optab)
        otheroptab = vashl_optab;
      else if (binoptab == ashr_optab)
        otheroptab = vashr_optab;
      else if (binoptab == lshr_optab)
        otheroptab = vlshr_optab;
      else if (binoptab == rotl_optab)
        otheroptab = vrotl_optab;
      else if (binoptab == rotr_optab)
        otheroptab = vrotr_optab;

      if (otheroptab
          && (icode = optab_handler (otheroptab, mode)) != CODE_FOR_nothing)
        {
          /* The scalar may have been extended to be too wide.  Truncate
             it back to the proper size to fit in the broadcast vector.  */
          scalar_mode inner_mode = GET_MODE_INNER (mode);
          if (!CONST_INT_P (op1)
              && (GET_MODE_BITSIZE (as_a  (GET_MODE (op1)))
                  > GET_MODE_BITSIZE (inner_mode)))
            op1 = force_reg (inner_mode,
                             simplify_gen_unary (TRUNCATE, inner_mode, op1,
                                                 GET_MODE (op1)));
          rtx vop1 = expand_vector_broadcast (mode, op1);
          if (vop1)
            {
              temp = expand_binop_directly (icode, mode, otheroptab, op0, vop1,
                                            target, unsignedp, methods, last);
              if (temp)
                return temp;
            }
        }
    }
code for this.  It doesn't work in the ARM case, because it supports
neither vec_duplicate_optab nor vec_init_optab for the mode.

I'm declaring this a backend bug, it shouldn't advertise such vector shifts in
configurations in which it can't even init such vectors.
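
The fallback quoted above (broadcast the scalar shift count into a vector,
then use the vector-by-vector shift) can be imitated at source level with
GCC's generic vector extensions.  This is only a rough sketch: the type
v4si and both helper names are chosen here for illustration, and it is not
what expand_binop actually emits.

```c
/* Four 32-bit lanes, using the GCC/Clang generic vector extension.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Vector shifted by a scalar count (the "vector x scalar" form).  */
static v4si
shift_by_scalar (v4si v, int count)
{
  return v << count;
}

/* Same operation, but with the count broadcast into every lane first
   (the "vector x vector" form).  Hypothetical helper, for illustration.  */
static v4si
shift_by_broadcast (v4si v, int count)
{
  v4si vcount = { count, count, count, count };
  return v << vcount;
}
```

Both helpers compute the same lanes; which form a target can expand
directly depends on the optabs it advertises, which is the point of the
comment above.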

[Bug c/98852] [11 Regression] Conditional expression wrongly rejected for arm_neon.h vectors

2021-01-27 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98852

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |11.0

[Bug target/98853] [11 Regression] wrong use of bfxil at -O1

2021-01-27 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98853

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |11.0

[Bug tree-optimization/98854] [11 Regression] cray benchmark is about 15% slower since r11-4428-g4a369d199bf2f34e

2021-01-27 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98854

Richard Biener  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2021-01-27
   Target Milestone|--- |11.0

--- Comment #2 from Richard Biener  ---
I will have a look.

[Bug rtl-optimization/80960] [8/9/10/11 Regression] Huge memory use when compiling a very large test case

2021-01-27 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80960

--- Comment #24 from Richard Biener  ---
And we allocate

plus 66M  1606M

66 million PLUS RTXen via

explow.c:200 (plus_constant)    0 :  0.0%    1596M: 92.0%    0 :  0.0%    0 :  0.0%    66M

called by DSE check_mem_read_rtx and record_store.  Ideally we'd not need
any of that via an interface change to canon_true_dependence and friends
(pass in an optional offset).

Most of the time the plus RTX is already present in the original MEM.  Like

Breakpoint 6, record_store (body=0x742caa98, bb_info=0x3ea3b60)
    at /home/rguenther/src/gcc2/gcc/dse.c:1529
1529          mem_addr = plus_constant (get_address_mode (mem), mem_addr, offset);
(reg/f:DI 19 frame)
$14 = void
(gdb) p debug_rtx (mem)
(mem/c:DI (plus:DI (reg/f:DI 19 frame)
(const_int -440 [0xfe48])) [1 MEM[(struct __st_parameter_dt
*)_13].format_len+0 S8 A64])
$15 = void
(gdb) p offset
$16 = {> = {coeffs = {-440}}, }

trivially pattern matching existing PLUS like

  if (MEM_P (mem)
      && GET_CODE (XEXP (mem, 0)) == PLUS
      && XEXP (XEXP (mem, 0), 0) == mem_addr
      && CONST_INT_P (XEXP (XEXP (mem, 0), 1))
      && known_eq (offset, INTVAL (XEXP (XEXP (mem, 0), 1))))
    mem_addr = XEXP (mem, 0);
  else
    mem_addr = plus_constant (get_address_mode (mem), mem_addr, offset);

doesn't help much.  Most cases seem to be built over (value:...) RTXen,
those we could ggc_free I presume.  Doing that in check_mem_read_rtx
doesn't help though.

[Bug target/98849] [11 Regression] ICE in expand_shift_1, at expmed.c:2658 since g:7432f255b70811dafaf325d9403

2021-01-27 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98849

--- Comment #3 from Jakub Jelinek  ---
For #c2 I've tried:
--- gcc/tree-vect-generic.c.jj  2021-01-04 10:25:38.289239984 +0100
+++ gcc/tree-vect-generic.c 2021-01-27 13:53:28.457752505 +0100
@@ -2147,16 +2147,21 @@ expand_vector_operations_1 (gimple_stmt_
   || code == LROTATE_EXPR
   || code == RROTATE_EXPR)
 {
-  optab opv;
+  optab opv = optab_for_tree_code (code, type, optab_vector);

   /* Check whether we have vector  {x,x,x,x} where x
  could be a scalar variable or a constant.  Transform
- vector  {x,x,x,x} ==> vector  scalar.  */
+ vector  {x,x,x,x} ==> vector  scalar, unless
+the backend only supports vector  by vector and not
+vecot  by scalar.  */
   if (VECTOR_INTEGER_TYPE_P (TREE_TYPE (rhs2)))
 {
   tree first;

-  if ((first = ssa_uniform_vector_p (rhs2)) != NULL_TREE)
+  op = optab_for_tree_code (code, type, optab_scalar);
+  if ((first = ssa_uniform_vector_p (rhs2)) != NULL_TREE
+ && (get_compute_type (code, opv, type) != type
+ || get_compute_type (code, op, type) == type))
 {
   gimple_assign_set_rhs2 (stmt, first);
   update_stmt (stmt);
@@ -2164,7 +2169,6 @@ expand_vector_operations_1 (gimple_stmt_
 }
 }

-  opv = optab_for_tree_code (code, type, optab_vector);
   if (VECTOR_INTEGER_TYPE_P (TREE_TYPE (rhs2)))
op = opv;
   else
but that doesn't really help, because while veclower21 doesn't undo what the
vectorizer carefully did, match.pd during fre5 breaks it again:
 /* Prefer vector1 << scalar to vector1 << vector2
if vector2 is uniform.  */
 (for vec (VECTOR_CST CONSTRUCTOR)
  (simplify
   (shiftrotate @0 vec@1)
   (with { tree tem = uniform_vector_p (@1); }
(if (tem)
 (shiftrotate @0 { tem; }))

So, does ARM really only have vector shifts and not scalar?
Though, PowerPC seems to have that too, I'll check out what it does on these
testcases.

[Bug c++/98843] Building simple c++ modules example fails but successful with -save-temps

2021-01-27 Thread nathan at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98843

--- Comment #2 from Nathan Sidwell  ---
thanks Gary, I expect to be able to reproduce the iostream.ii myself, and
particularly as (the lack of) -save-temps seems to be significant, I'll
probably need to.

[Bug tree-optimization/98854] [11 Regression] cray benchmark is about 15% slower since r11-4428-g4a369d199bf2f34e

2021-01-27 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98854

--- Comment #1 from Martin Liška  ---
One can see it here:

https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=245.639.0&plot.1=171.639.0&;

[Bug tree-optimization/98854] New: [11 Regression] cray benchmark is about 15% slower since r11-4428-g4a369d199bf2f34e

2021-01-27 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98854

Bug ID: 98854
   Summary: [11 Regression] cray benchmark is about 15% slower
since r11-4428-g4a369d199bf2f34e
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: marxin at gcc dot gnu.org
CC: rguenth at gcc dot gnu.org
  Target Milestone: ---

Since the revision, the following is slower on znver2:

$ make clean && m CFLAGS="-Ofast -march=znver2 -g" && cat sphfract | ./c-ray-mt -o /dev/null
c-ray-mt v1.1
Rendering took: 1 seconds (1798 milliseconds)

while GCC 10 has:

c-ray-mt v1.1
Rendering took: 1 seconds (1585 milliseconds)

[Bug tree-optimization/80198] [8/9/10/11 Regression] does not vectorize generic inplace integer operation

2021-01-27 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80198

--- Comment #21 from rguenther at suse dot de  ---
On Wed, 27 Jan 2021, rsandifo at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80198
> 
> --- Comment #20 from rsandifo at gcc dot gnu.org ---
> (In reply to Richard Biener from comment #19)
> > So I think when you consider
> > 
> > void __attribute__((noinline)) fun(int * a, int * b, int c)
> > {
> >   int i;
> >   for (i=0; i < 256; i++) {
> >  a[i] = b[i] | c;
> >   }
> > }
> > 
> > we can improve the versioning condition to allow a dependence distance
> > of zero.
> This one was fixed by r10-4803.  E.g. for aarch64 we now have:
> 
> add x3, x1, 4
> sub x3, x0, x3
> cmp x3, 8
> bls .L5

Ah, yeah - I failed to decipher the generated check:

  _7 = b_12 + 4;
  _22 = a_10 - _7;
  _23 = (sizetype) _22;
  if (_23 > 8)

the difference is -4U and thus > 8 when a == b.
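
The wrap-around in that check can be reproduced outside the compiler.  A
minimal sketch, assuming 4-byte int and modelling the addresses as plain
integers; the function name passes_alias_check is made up for this note
and is not GCC's:

```c
#include <stdint.h>

/* Mirrors the generated versioning check
     _7 = b + 4;  _22 = a - _7;  if ((sizetype) _22 > 8)
   on byte addresses passed as integers, so the subtraction is well
   defined.  Returns nonzero when the vectorized loop may be entered.  */
static int
passes_alias_check (uintptr_t a, uintptr_t b)
{
  uintptr_t diff = a - (b + 4);   /* wraps to a huge value when a == b */
  return diff > 8;
}
```

With a == b the unsigned difference wraps around, so the check passes; it
only fails when a lies 4, 8 or 12 bytes above b.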

> > Likewise with
> > 
> > void __attribute__((noipa)) generic(int * a, int * b, int c)
> > {
> >   int i;
> >   a = __builtin_assume_aligned (a, 16);
> >   b = __builtin_assume_aligned (b, 16);
> >   for (i=0; i < 256; i++) {
> >   a[i] = b[i] | c;
> >   }
> > }
> > 
> > we fail to realize no versioning check is required - the distance is
> > either zero or a multiple of 16.
> > 
> > Richard - ISTR you added some alignment considerations to the alias
> > versioning code, but it doesn't seem to help?
> I don't remember adding anything for that, but yeah, I agree it looks
> like we need it.

[Bug target/98853] New: [11 Regression] wrong use of bfxil at -O1

2021-01-27 Thread zsojka at seznam dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98853

Bug ID: 98853
   Summary: [11 Regression] wrong use of bfxil at -O1
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Keywords: wrong-code
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: zsojka at seznam dot cz
  Target Milestone: ---
  Host: x86_64-pc-linux-gnu
Target: aarch64-unknown-linux-gnu

Created attachment 50068
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50068&action=edit
reduced testcase

Output:
$ aarch64-unknown-linux-gnu-gcc -O testcase.c -static
$ ./a.out 
qemu: uncaught target signal 6 (Aborted) - core dumped
Aborted

$ aarch64-unknown-linux-gnu-gcc -v
Using built-in specs.
COLLECT_GCC=/repo/gcc-trunk/binary-latest-aarch64/bin/aarch64-unknown-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/repo/gcc-trunk/binary-trunk-r11-6925-20210127102218-g6cf43433750-checking-yes-rtl-df-extra-aarch64/bin/../libexec/gcc/aarch64-unknown-linux-gnu/11.0.0/lto-wrapper
Target: aarch64-unknown-linux-gnu
Configured with: /repo/gcc-trunk//configure --enable-languages=c,c++
--enable-valgrind-annotations --disable-nls --enable-checking=yes,rtl,df,extra
--with-cloog --with-ppl --with-isl
--with-sysroot=/usr/aarch64-unknown-linux-gnu --build=x86_64-pc-linux-gnu
--host=x86_64-pc-linux-gnu --target=aarch64-unknown-linux-gnu
--with-ld=/usr/bin/aarch64-unknown-linux-gnu-ld
--with-as=/usr/bin/aarch64-unknown-linux-gnu-as --disable-libstdcxx-pch
--prefix=/repo/gcc-trunk//binary-trunk-r11-6925-20210127102218-g6cf43433750-checking-yes-rtl-df-extra-aarch64
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.0.0 20210127 (experimental) (GCC)

[Bug rtl-optimization/80960] [8/9/10/11 Regression] Huge memory use when compiling a very large test case

2021-01-27 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80960

Richard Biener  changed:

   What|Removed |Added

  Known to work||4.3.4
  Known to fail||4.8.5

--- Comment #23 from Richard Biener  ---
So now I see

> /usr/bin/time gfortran-4.3 t.f90 -fdefault-integer-8 -O2 -ftime-report
 combiner  :   0.25 ( 2%) usr   0.00 ( 0%) sys   0.24 ( 2%) wall    9947 kB ( 5%) ggc
 TOTAL     :  15.43             0.21             15.65            220667 kB
15.59user 0.24system 0:15.84elapsed 99%CPU (0avgtext+0avgdata 607492maxresident)k
0inputs+0outputs (0major+164981minor)pagefaults 0swaps

> /usr/bin/time gfortran-4.8 t.f90 -fdefault-integer-8 -O2 -ftime-report
 combiner  :  90.22 (48%) usr   1.07 (63%) sys  91.33 (48%) wall 1757344 kB (88%) ggc
 TOTAL     : 188.29             1.70            190.04           2000994 kB
188.43user 1.73system 3:10.21elapsed 99%CPU (0avgtext+0avgdata 6523136maxresident)k
0inputs+0outputs (0major+1727565minor)pagefaults 0swaps

> /usr/bin/time gfortran-7 t.f90 -fdefault-integer-8 -O2 -fno-checking -ftime-report
 combiner  :  67.18 (64%) usr   0.56 (60%) sys  67.76 (64%) wall 2701121 kB (60%) ggc
 TOTAL     : 105.40             0.93            106.36           4530486 kB
105.54user 0.99system 1:46.58elapsed 99%CPU (0avgtext+0avgdata 3297696maxresident)k
48248inputs+0outputs (7major+835050minor)pagefaults 0swaps

> /usr/bin/time gfortran-10 t.f90 -fdefault-integer-8 -O2 -fno-checking -ftime-report
 combiner   :   0.24 (  0%)   0.00 (  0%)   0.22 (  0%)   10376 kB (  1%)
 TOTAL      :  52.02          0.49         52.52        1876905 kB
52.16user 0.52system 0:52.71elapsed 99%CPU (0avgtext+0avgdata 1831392maxresident)k
55032inputs+0outputs (8major+539965minor)pagefaults 0swaps

(that combine number prevails on trunk as well, I can't spot any code
that disables combine on large BBs so not sure what goes on here)

At least clearly GCC 4.8.5 is bad as well and there's clear progression
on both memory use and compile-time, still not up to the level of GCC 4.3.

Interestingly memory-wise it all points to RTL DSE (GCC 10), likely
because of DF.  Eventually post-reload we can simplify some things...

 dead store elim2   :   6.90 ( 12%)   0.20 ( 27%)   7.12 ( 12%) 1641076 kB ( 87%)

[Bug c/98852] New: [11 Regression] Conditional expression wrongly rejected for arm_neon.h vectors

2021-01-27 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98852

Bug ID: 98852
   Summary: [11 Regression] Conditional expression wrongly
rejected for arm_neon.h vectors
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: rsandifo at gcc dot gnu.org
  Target Milestone: ---
Target: aarch64*-*-*

#include <arm_neon.h>

uint8x16_t
foo (int c, uint8x16_t x, uint8x16_t y)
{
  return c ? x + 1 : y;
}

is wrongly rejected for C, but not C++.  This is extracted from
comment 8 of PR96377.

[Bug tree-optimization/80198] [8/9/10/11 Regression] does not vectorize generic inplace integer operation

2021-01-27 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80198

--- Comment #20 from rsandifo at gcc dot gnu.org ---
(In reply to Richard Biener from comment #19)
> So I think when you consider
> 
> void __attribute__((noinline)) fun(int * a, int * b, int c)
> {
>   int i;
>   for (i=0; i < 256; i++) {
>  a[i] = b[i] | c;
>   }
> }
> 
> we can improve the versioning condition to allow a dependence distance
> of zero.
This one was fixed by r10-4803.  E.g. for aarch64 we now have:

add x3, x1, 4
sub x3, x0, x3
cmp x3, 8
bls .L5

> Likewise with
> 
> void __attribute__((noipa)) generic(int * a, int * b, int c)
> {
>   int i;
>   a = __builtin_assume_aligned (a, 16);
>   b = __builtin_assume_aligned (b, 16);
>   for (i=0; i < 256; i++) {
>   a[i] = b[i] | c;
>   }
> }
> 
> we fail to realize no versioning check is required - the distance is
> either zero or a multiple of 16.
> 
> Richard - ISTR you added some alignment considerations to the alias
> versioning code, but it doesn't seem to help?
I don't remember adding anything for that, but yeah, I agree it looks
like we need it.

[Bug tree-optimization/80198] [8/9/10/11 Regression] does not vectorize generic inplace integer operation

2021-01-27 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80198

Richard Biener  changed:

   What|Removed |Added

 CC||rsandifo at gcc dot gnu.org
   Last reconfirmed|2017-03-27 00:00:00 |2021-1-27

--- Comment #19 from Richard Biener  ---
So I think when you consider

void __attribute__((noinline)) fun(int * a, int * b, int c)
{
  int i;
  for (i=0; i < 256; i++) {
 a[i] = b[i] | c;
  }
}

we can improve the versioning condition to allow a dependence distance
of zero.  Likewise with

void __attribute__((noipa)) generic(int * a, int * b, int c)
{
  int i;
  a = __builtin_assume_aligned (a, 16);
  b = __builtin_assume_aligned (b, 16);
  for (i=0; i < 256; i++) {
  a[i] = b[i] | c;
  }
}

we fail to realize no versioning check is required - the distance is
either zero or a multiple of 16.

Richard - ISTR you added some alignment considerations to the alias
versioning code, but it doesn't seem to help?
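
The alignment argument can be checked numerically.  A minimal sketch,
assuming 4-byte int and a 16-byte (four-lane) vector; addresses are
modelled as integers and the helper name is invented for this note:

```c
#include <stdint.h>

/* For the loop a[i] = b[i] | c vectorized with 16-byte vectors, returns
   nonzero if the vector store at a could clobber elements of b that the
   same vector iteration has already loaded but that the scalar loop
   would have read after the store, i.e. if a lies strictly between b
   and b + 16.  */
static int
store_clobbers_later_load (uintptr_t a, uintptr_t b)
{
  uintptr_t dist = a - b;          /* unsigned, wraps if a < b */
  return dist > 0 && dist < 16;
}
```

When both addresses are multiples of 16 the distance is zero or a
multiple of 16, so the helper can never return nonzero; that is why no
runtime versioning check should be needed for the aligned variant.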

[Bug target/98849] [11 Regression] ICE in expand_shift_1, at expmed.c:2658 since g:7432f255b70811dafaf325d9403

2021-01-27 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98849

--- Comment #2 from Jakub Jelinek  ---
int a[1024], b[1024];

void
foo (void)
{
  for (int i = 0; i < 1024; i++)
    a[i] = b[i] << 3;
}

void
bar (int x)
{
  for (int i = 0; i < 1024; i++)
    a[i] = b[i] << x;
}

ICEs with -O3 -mcpu=iwmmxt too.  Here the vectorizer understands the target has
vector x vector shift and not vector x scalar, so we get:
  vect_cst__13 = { 3, 3 };
...
  vect__2.7_14 = vect__1.6_8 << vect_cst__13;
but veclower carelessly undoes that:
  vect__2.7_14 = vect__1.6_8 << 3;
