[Bug ipa/98594] [11 Regression] IPA modref codegen bug
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98594 --- Comment #4 from rguenther at suse dot de --- On Wed, 27 Jan 2021, hubicka at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98594 > > --- Comment #3 from Jan Hubicka --- > The initialization is removed by dse1 pass. We get: > ipa-modref: call stmt D.3199 = bitCount::bitCount_bitfield<1, int, > glm::packed_highp> (&D.3185); [return slot optimization] > ipa-modref: call to glm::vec bitCount::bitCount_bitfield(const > glm::vec&) [with int L = 1; T = int; glm::qualifier Q = > glm::packed_highp]/8 does not use ref: D.3185.D.3097.x alias sets: 3->1 > Deleted dead store: D.3185.D.3097.x = x_2(D); > > > ipa-modref: call stmt D.3199 = bitCount::bitCount_bitfield<1, int, > glm::packed_highp> (&D.3185); [return slot optimization] > ipa-modref: call to glm::vec bitCount::bitCount_bitfield(const > glm::vec&) [with int L = 1; T = int; glm::qualifier Q = > glm::packed_highp]/8 does not use ref: D.3185 alias sets: 3->3 > Deleted dead store: D.3185 ={v} {CLOBBER}; > > > Now the modref summary for function is > loads: > > Limits: 32 bases, 16 refs > > Base 0: alias set 5 > > Ref 0: alias set 5 > > access: Parm 0 param offset:0 offset:0 size:32 max_size:32 > > > alias set 5 corresponds to const struct vec but different instantiation than > alias set 3 used in the store. > There is reinterpret cast: > > glm::vec::type, Q> x(*reinterpret_cast< glm::vec::type, Q> const *>(&v)); > > turning it to > > glm::vec::type, Q> x(*(&v)); > > makes the aliasing difference go away. So it seems to me that the testcase > simply includes TBAA violation? Not sure, but if my eyes do not deceive me, the difference is only the const qualification, so it should not matter for TBAA? Of course the question is what type 'v' has, since this may invoke a different CTOR.
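To illustrate the remark about const qualification: under the C++ aliasing rules, an object may be accessed through a glvalue whose type is a cv-qualified version of its own type, so adding const by itself is not a TBAA violation. The sketch below uses a hypothetical `Vec1` struct as a stand-in; it is not the glm type or the testcase from the report.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for a one-component vector type (not glm::vec).
struct Vec1 {
  int x;
};

// Reading a Vec1 object through a const-qualified lvalue is well defined.
int read_through_const(const Vec1& v) { return v.x; }

// A cast that only adds const still refers to the same underlying type,
// so the access below does not break the aliasing rules.
int read_via_const_cast(Vec1& v) {
  const Vec1* p = reinterpret_cast<const Vec1*>(&v);
  return p->x;
}

// const Vec1 and Vec1 differ only in qualification.
static_assert(std::is_same<std::remove_const<const Vec1>::type, Vec1>::value,
              "const qualification does not create an unrelated type");
```

If the two types in the testcase differ in more than qualification, this reasoning does not apply.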
[Bug tree-optimization/98856] [11 Regression] botan AES-128/XTS is slower by ~17% since r11-6649-g285fa338b06b804e72997c4d876ecf08a9c083af
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856 --- Comment #2 from Richard Biener --- The cxx bench Botan doesn't know --cxxflags, what Botan version are you looking at?
[Bug c++/98861] I want deterministic exceptions (Herbception)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98861 Richard Biener changed: What|Removed |Added Severity|normal |enhancement Last reconfirmed||2021-01-28 Status|UNCONFIRMED |NEW Ever confirmed|0 |1
[Bug bootstrap/98860] [11 Regression] boostrap failure on MinGW-w64 windows 10
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98860 Richard Biener changed: What|Removed |Added Summary|boostrap failure on |[11 Regression] boostrap |MinGW-w64 windows 10|failure on MinGW-w64 ||windows 10 Target Milestone|--- |11.0
[Bug bootstrap/98860] boostrap failure on MinGW-w64 windows 10
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98860 --- Comment #6 from cqwrteur --- configure:4069: ./conftest.exe /home/unlvs/mcf_build/src/gcc-git/libgomp/configure: line 4071: ./conftest.exe: cannot execute binary file: Exec format error configure:4073: $? = 126 configure:4080: error: in `/home/unlvs/mcf_build/src/build-x86_64-w64-mingw32/x86_64-w64-mingw32/libgomp': configure:4082: error: cannot run C compiled programs. If you meant to cross compile, use `--host'. See `config.log' for more details
[Bug fortran/93524] [ISO C Binding][F2018] CFI_allocate – elem_size mishandled + sm wrongly set?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93524 Thomas Koenig changed: What|Removed |Added CC||tkoenig at gcc dot gnu.org --- Comment #3 from Thomas Koenig --- A related patch was applied at https://gcc.gnu.org/g:1cdca4261e88f4dc9c3293c6b3c2fff3071ca32b .
[Bug target/98799] [11 Regression] vector_set_var ICE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98799 --- Comment #6 from CVS Commits --- The master branch has been updated by Xiong Hu Luo : https://gcc.gnu.org/g:fbe37371cf372b84d5b7f1a6f5f0971a513dd5fa commit r11-6947-gfbe37371cf372b84d5b7f1a6f5f0971a513dd5fa Author: Xionghu Luo Date: Wed Jan 27 20:24:03 2021 -0600 rs6000: Fix vec insert ilp32 ICE and test failures [PR98799] UNSPEC_SI_FROM_SF is not supported when TARGET_DIRECT_MOVE_64BIT is false for -m32, don't generate VIEW_CONVERT_EXPR(ARRAY_REF) for variable vector insert. Remove rs6000_expand_vector_set_var helper function, adjust the p8 and p9 definitions position and make them static. The previous commit r11-6858 missed check m32, This patch is tested pass on P7BE{m32,m64}/P8BE{m32,m64}/P8LE/P9LE with RUNTESTFLAGS="--target_board =unix'{-m32,-m64}'" for BE targets. gcc/ChangeLog: 2021-01-27 Xionghu Luo David Edelsohn PR target/98799 * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin): Don't generate VIEW_CONVERT_EXPR for fcode ALTIVEC_BUILTIN_VEC_INSERT when -m32. * config/rs6000/rs6000-protos.h (rs6000_expand_vector_set_var): Delete. * config/rs6000/rs6000.c (rs6000_expand_vector_set): Remove the wrapper call rs6000_expand_vector_set_var for cleanup. Call rs6000_expand_vector_set_var_p9 and rs6000_expand_vector_set_var_p8 directly. (rs6000_expand_vector_set_var): Delete. (rs6000_expand_vector_set_var_p9): Make static. (rs6000_expand_vector_set_var_p8): Make static. gcc/testsuite/ChangeLog: 2021-01-27 Xionghu Luo PR target/98827 * gcc.target/powerpc/fold-vec-insert-char-p8.c: Adjust ilp32. * gcc.target/powerpc/fold-vec-insert-char-p9.c: Likewise. * gcc.target/powerpc/fold-vec-insert-double.c: Likewise. * gcc.target/powerpc/fold-vec-insert-float-p8.c: Likewise. * gcc.target/powerpc/fold-vec-insert-float-p9.c: Likewise. * gcc.target/powerpc/fold-vec-insert-int-p8.c: Likewise. * gcc.target/powerpc/fold-vec-insert-int-p9.c: Likewise. * gcc.target/powerpc/fold-vec-insert-longlong.c: Likewise. 
* gcc.target/powerpc/fold-vec-insert-short-p8.c: Likewise. * gcc.target/powerpc/fold-vec-insert-short-p9.c: Likewise. * gcc.target/powerpc/pr79251.p8.c: Likewise. * gcc.target/powerpc/pr79251.p9.c: Likewise. * gcc.target/powerpc/vsx-builtin-7.c: Likewise. * gcc.target/powerpc/pr79251-run.c: Build and run with vsx option.
[Bug target/98827] [11 regression] gcc.target/powerpc/vsx-builtin-7.c assembler counts off after r11-6857
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98827 --- Comment #4 from CVS Commits --- The master branch has been updated by Xiong Hu Luo : https://gcc.gnu.org/g:fbe37371cf372b84d5b7f1a6f5f0971a513dd5fa commit r11-6947-gfbe37371cf372b84d5b7f1a6f5f0971a513dd5fa Author: Xionghu Luo Date: Wed Jan 27 20:24:03 2021 -0600 rs6000: Fix vec insert ilp32 ICE and test failures [PR98799] UNSPEC_SI_FROM_SF is not supported when TARGET_DIRECT_MOVE_64BIT is false for -m32, don't generate VIEW_CONVERT_EXPR(ARRAY_REF) for variable vector insert. Remove rs6000_expand_vector_set_var helper function, adjust the p8 and p9 definitions position and make them static. The previous commit r11-6858 missed check m32, This patch is tested pass on P7BE{m32,m64}/P8BE{m32,m64}/P8LE/P9LE with RUNTESTFLAGS="--target_board =unix'{-m32,-m64}'" for BE targets. gcc/ChangeLog: 2021-01-27 Xionghu Luo David Edelsohn PR target/98799 * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin): Don't generate VIEW_CONVERT_EXPR for fcode ALTIVEC_BUILTIN_VEC_INSERT when -m32. * config/rs6000/rs6000-protos.h (rs6000_expand_vector_set_var): Delete. * config/rs6000/rs6000.c (rs6000_expand_vector_set): Remove the wrapper call rs6000_expand_vector_set_var for cleanup. Call rs6000_expand_vector_set_var_p9 and rs6000_expand_vector_set_var_p8 directly. (rs6000_expand_vector_set_var): Delete. (rs6000_expand_vector_set_var_p9): Make static. (rs6000_expand_vector_set_var_p8): Make static. gcc/testsuite/ChangeLog: 2021-01-27 Xionghu Luo PR target/98827 * gcc.target/powerpc/fold-vec-insert-char-p8.c: Adjust ilp32. * gcc.target/powerpc/fold-vec-insert-char-p9.c: Likewise. * gcc.target/powerpc/fold-vec-insert-double.c: Likewise. * gcc.target/powerpc/fold-vec-insert-float-p8.c: Likewise. * gcc.target/powerpc/fold-vec-insert-float-p9.c: Likewise. * gcc.target/powerpc/fold-vec-insert-int-p8.c: Likewise. * gcc.target/powerpc/fold-vec-insert-int-p9.c: Likewise. * gcc.target/powerpc/fold-vec-insert-longlong.c: Likewise. 
* gcc.target/powerpc/fold-vec-insert-short-p8.c: Likewise. * gcc.target/powerpc/fold-vec-insert-short-p9.c: Likewise. * gcc.target/powerpc/pr79251.p8.c: Likewise. * gcc.target/powerpc/pr79251.p9.c: Likewise. * gcc.target/powerpc/vsx-builtin-7.c: Likewise. * gcc.target/powerpc/pr79251-run.c: Build and run with vsx option.
[Bug c++/98862] New: Complex reduction support in offload region
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98862 Bug ID: 98862 Summary: Complex reduction support in offload region Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: xw111luoye at gmail dot com Target Milestone: ---
Using the std::complex type in an offload region is highly desired.
$ g++ -fopenmp complex_reduction.cpp
ptxas /tmp/cceLNaYr.o, line 484; error : Label expected for argument 0 of instruction 'call'
ptxas /tmp/cceLNaYr.o, line 484; error : Function '_ZNSt7complexIfEC1Eff' not declared in this scope
ptxas /tmp/cceLNaYr.o, line 484; fatal : Call target not recognized
ptxas fatal : Ptx assembly aborted due to errors
nvptx-as: ptxas returned 255 exit status
mkoffload: fatal error: x86_64-pc-linux-gnu-accel-nvptx-none-gcc returned 1 exit status
compilation terminated.
lto-wrapper: fatal error: /soft/gcc/gcc-11-dev-2021-01-27/bin/../libexec/gcc/x86_64-pc-linux-gnu/11.0.0//accel/nvptx-none/mkoffload returned 1 exit status
compilation terminated.
/usr/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status
$ g++ -fopenmp -O2 complex_reduction.cpp
unresolved symbol __atomic_compare_exchange_16
collect2: error: ld returned 1 exit status
mkoffload: fatal error: x86_64-pc-linux-gnu-accel-nvptx-none-gcc returned 1 exit status
compilation terminated.
lto-wrapper: fatal error: /soft/gcc/gcc-11-dev-2021-01-27/bin/../libexec/gcc/x86_64-pc-linux-gnu/11.0.0//accel/nvptx-none/mkoffload returned 1 exit status
compilation terminated.
/usr/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status
The -O2 case is more relevant for production. Fixing both is desired.
Source code: https://github.com/ye-luo/openmp-target/blob/master/hands-on/tests/complex/complex_reduction.cpp
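For readers without the linked file at hand, the following is a rough sketch of what a std::complex reduction in an offload region can look like. It is not the reporter's test (that is at the URL above); the function name `sum_complex` and the user-defined reduction are assumptions made for illustration, and whether this exact sketch triggers the reported errors has not been checked. Without -fopenmp the pragmas are ignored and the loop runs serially on the host.

```cpp
#include <cassert>
#include <complex>
#include <vector>

// User-defined reduction for std::complex<float>; assumed here for
// illustration, the reporter's test may declare it differently.
#pragma omp declare reduction(+ : std::complex<float> : omp_out += omp_in) initializer(omp_priv = std::complex<float>(0.0f, 0.0f))

// Sums the elements of v, offloading the loop when OpenMP target
// offloading is enabled and available.
std::complex<float> sum_complex(const std::vector<std::complex<float>>& v) {
  std::complex<float> sum(0.0f, 0.0f);
  const std::complex<float>* data = v.data();
  const int n = static_cast<int>(v.size());
#pragma omp target teams distribute parallel for map(to : data[0 : n]) map(tofrom : sum) reduction(+ : sum)
  for (int i = 0; i < n; ++i) {
    sum += data[i];
  }
  return sum;
}
```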
[Bug bootstrap/98860] boostrap failure on MinGW-w64 windows 10
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98860 --- Comment #5 from cqwrteur ---
I do not know whether it has to do with the CRLF issue, because GCC on Linux emits the same result as it does on MinGW-w64 or msys2.

conftestx.c:
#ifdef __x86_64__
#ifndef __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4
#error need -march=i486
#endif
#ifndef __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16
#error need -mcx16
#endif
#else
#ifndef __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8
#error need -march=i686
#endif
#endif

MinGW32
unlvs@DESKTOP-DFHPDC1 MINGW32 ~/gcc_bug
$ gcc -E conftestx.c
# 1 "conftestx.c"
# 1 ""
# 1 ""
# 1 "conftestx.c"
unlvs@DESKTOP-DFHPDC1 MINGW32 ~/gcc_bug
$ gcc -E conftestx.c -march=i486
# 1 "conftestx.c"
# 1 ""
# 1 ""
# 1 "conftestx.c"
conftestx.c:10:2: error: #error need -march=i686
10 | #error need -march=i686
| ^

MinGW64
unlvs@DESKTOP-DFHPDC1 MINGW64 ~/gcc_bug
$ gcc -E conftestx.c -m32
# 1 "conftestx.c"
# 1 ""
# 1 ""
# 1 "conftestx.c"
unlvs@DESKTOP-DFHPDC1 MINGW64 ~/gcc_bug
$ gcc -E conftestx.c -march=i486 -mtune=generic
# 1 "conftestx.c"
cc1.exe: error: CPU you selected does not support x86-64 instruction set

MSYS (which is x86_64 with CYGWIN)
unlvs@DESKTOP-DFHPDC1 MSYS ~/gcc_bug
$ gcc -E conftestx.c
# 1 "conftestx.c"
# 1 ""
# 1 ""
# 1 "conftestx.c"
conftestx.c:6:2: error: #error need -mcx16
6 | #error need -mcx16
| ^
unlvs@DESKTOP-DFHPDC1 MSYS ~/gcc_bug
$ gcc -E conftestx.c -march=i486
# 1 "conftestx.c"
cc1: error: CPU you selected does not support x86-64 instruction set

The result on Linux:
cqwrteur@DESKTOP-DFHPDC1:/mnt/d/msys64/home/unlvs/gcc_bug$ gcc -E conftestx.c
# 0 "conftestx.c"
# 0 ""
# 0 ""
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 0 "" 2
# 1 "conftestx.c"
conftestx.c:6:2: error: #error need -mcx16
6 | #error need -mcx16
| ^
cqwrteur@DESKTOP-DFHPDC1:/mnt/d/msys64/home/unlvs/gcc_bug$ gcc -E conftestx.c -march=i486
# 0 "conftestx.c"
cc1: error: CPU you selected does not support x86-64 instruction set
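The preprocessor test above depends on GCC's predefined `__GCC_HAVE_SYNC_COMPARE_AND_SWAP_N` macros, which vary with the target and with flags such as -march and -mcx16. A small helper like the one below (the name `sync_cas_widths` is made up for this sketch) can be built with different flags to see which of them a given compiler invocation defines; it is not part of the configure script.

```cpp
#include <cassert>
#include <vector>

// Returns the operand widths (in bytes) for which the compiler predefines
// __GCC_HAVE_SYNC_COMPARE_AND_SWAP_N for the current target and flags.
std::vector<int> sync_cas_widths() {
  std::vector<int> widths;
#ifdef __GCC_HAVE_SYNC_COMPARE_AND_SWAP_1
  widths.push_back(1);
#endif
#ifdef __GCC_HAVE_SYNC_COMPARE_AND_SWAP_2
  widths.push_back(2);
#endif
#ifdef __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4
  widths.push_back(4);
#endif
#ifdef __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8
  widths.push_back(8);
#endif
#ifdef __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16
  widths.push_back(16);
#endif
  return widths;
}
```

On x86_64, for instance, one would expect 16 to appear in the result only when the build enables the 16-byte compare-and-swap instruction (for example with -mcx16), which matches the "need -mcx16" error in the output above.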
[Bug bootstrap/98860] boostrap failure on MinGW-w64 windows 10
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98860 --- Comment #4 from cqwrteur --- Created attachment 50071 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50071&action=edit bootstrap failure picture
[Bug bootstrap/98860] boostrap failure on MinGW-w64 windows 10
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98860 --- Comment #3 from cqwrteur --- After reverting to the previous commit, compilation succeeds: https://github.com/gcc-mirror/gcc/commit/bfab355012ca0f5219da8beb04f2fdaf757d34b7 I think it has to do with the script you changed, Jakub.
[Bug c++/98861] New: I want deterministic exceptions (Herbception)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98861 Bug ID: 98861 Summary: I want deterministic exceptions (Herbception) Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: unlvsur at live dot com Target Milestone: --- The mailing list requires me to request the feature here. I put it here. https://www.mail-archive.com/gcc@gcc.gnu.org/msg94104.html http://open-std.org/JTC1/SC22/WG21/docs/papers/2019/p0709r4.pdf
[Bug bootstrap/98860] boostrap failure on MinGW-w64 windows 10
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98860 --- Comment #2 from cqwrteur --- I guess it is because of this commit: https://github.com/gcc-mirror/gcc/commit/0411ae7f08e0f5a8b02ff313d26d27a0f6d1bb34 https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=0411ae7f08e0f5a8b02ff313d26d27a0f6d1bb34
[Bug bootstrap/98860] boostrap failure on MinGW-w64 windows 10
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98860 --- Comment #1 from cqwrteur --- The question is why it says we are not cross-compiling. I am using the same script I used before: https://bitbucket.org/ejsvifq_mabmip/mingw-gcc-mcf-gthread/src/master/PKGBUILD It is so weird. checking whether we are cross compiling... configure: error: in `/home/unlvs/mcf_build/src/build-x86_64-w64-mingw32/x86_64-w64-mingw32/libgomp': configure: error: cannot run C compiled programs. If you meant to cross compile, use `--host'. See `config.log' for more details configure: error: in `/home/unlvs/mcf_build/src/build-x86_64-w64-mingw32/x86_64-w64-mingw32/libatomic': configure: error: cannot run C compiled programs. If you meant to cross compile, use `--host'. See `config.log' for more details .exe checking whether we are cross compiling... make[1]: *** [Makefile:15606: configure-target-libgomp] Error 1 make[1]: *** Waiting for unfinished jobs configure: error: in `/home/unlvs/mcf_build/src/build-x86_64-w64-mingw32/x86_64-w64-mingw32/libssp': configure: error: cannot run C compiled programs. If you meant to cross compile, use `--host'. See `config.log' for more details make[1]: *** [Makefile:16174: configure-target-libatomic] Error 1 configure: error: in `/home/unlvs/mcf_build/src/build-x86_64-w64-mingw32/x86_64-w64-mingw32/libquadmath': configure: error: cannot run C compiled programs. If you meant to cross compile, use `--host'. See `config.log' for more details make[1]: *** [Makefile:13329: configure-target-libssp] Error 1 make[1]: *** [Makefile:14375: configure-target-libquadmath] Error 1 make[1]: Leaving directory '/home/unlvs/mcf_build/src/build-x86_64-w64-mingw32' make: *** [Makefile:973: all] Error 2 ==> ERROR: A failure occurred in build(). Aborting...
[Bug ipa/98594] [11 Regression] IPA modref codegen bug
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98594 --- Comment #3 from Jan Hubicka --- The initialization is removed by dse1 pass. We get: ipa-modref: call stmt D.3199 = bitCount::bitCount_bitfield<1, int, glm::packed_highp> (&D.3185); [return slot optimization] ipa-modref: call to glm::vec bitCount::bitCount_bitfield(const glm::vec&) [with int L = 1; T = int; glm::qualifier Q = glm::packed_highp]/8 does not use ref: D.3185.D.3097.x alias sets: 3->1 Deleted dead store: D.3185.D.3097.x = x_2(D); ipa-modref: call stmt D.3199 = bitCount::bitCount_bitfield<1, int, glm::packed_highp> (&D.3185); [return slot optimization] ipa-modref: call to glm::vec bitCount::bitCount_bitfield(const glm::vec&) [with int L = 1; T = int; glm::qualifier Q = glm::packed_highp]/8 does not use ref: D.3185 alias sets: 3->3 Deleted dead store: D.3185 ={v} {CLOBBER}; Now the modref summary for function is loads: Limits: 32 bases, 16 refs Base 0: alias set 5 Ref 0: alias set 5 access: Parm 0 param offset:0 offset:0 size:32 max_size:32 alias set 5 corresponds to const struct vec but different instantiation than alias set 3 used in the store. There is reinterpret cast: glm::vec::type, Q>x(*reinterpret_cast::type, Q> const *>(&v)); turning it to glm::vec::type, Q> x(*(&v)); makes the aliasing difference go away. So it seems to me that the testcase simply includes TBAA violation?
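For illustration only, here is a minimal sketch of the general pattern described above, using a hypothetical `Vec1<T>` template rather than the glm types from the testcase. Two instantiations of a class template are unrelated types for type-based alias analysis even when their layout is identical, so reading one through a `reinterpret_cast` to the other is undefined behavior. Copying the bytes with `std::memcpy` is one well-defined alternative.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical one-component vector template (not glm::vec).
template <typename T>
struct Vec1 {
  T x;
};

// Different instantiations are distinct, unrelated types ...
static_assert(!std::is_same<Vec1<int>, Vec1<unsigned>>::value,
              "instantiations are distinct types");
// ... even though they have the same size here.
static_assert(sizeof(Vec1<int>) == sizeof(Vec1<unsigned>),
              "same size is assumed by the byte copy below");

// Converts by copying the object representation instead of reading a
// Vec1<int> through an lvalue of type Vec1<unsigned>.
Vec1<unsigned> to_unsigned(const Vec1<int>& v) {
  Vec1<unsigned> out;
  std::memcpy(&out, &v, sizeof out);
  return out;
}
```

Compilers typically optimize the fixed-size `memcpy` into a plain register move, so the well-defined form need not cost anything at run time.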
[Bug c/97172] [11 Regression] ICE: tree code ‘ssa_name’ is not supported in LTO streams since r11-3303-g6450f07388f9fe57
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97172 --- Comment #25 from Martin Sebor --- Patch v3: https://gcc.gnu.org/pipermail/gcc-patches/2021-January/564411.html
[Bug bootstrap/98860] New: boostrap failure on MinGW-w64 windows 10
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98860 Bug ID: 98860 Summary: boostrap failure on MinGW-w64 windows 10 Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: bootstrap Assignee: unassigned at gcc dot gnu.org Reporter: unlvsur at live dot com Target Milestone: --- checking whether we are cross compiling... configure: error: in `/home/unlvs/mcf_build/src/build-x86_64-w64-mingw32/x86_64-w64-mingw32/libatomic': configure: error: cannot run C compiled programs. If you meant to cross compile, use `--host'. See `config.log' for more details .exe checking whether we are cross compiling... make[1]: *** [Makefile:15606: configure-target-libgomp] Error 1 make[1]: *** Waiting for unfinished jobs make[1]: *** [Makefile:16174: configure-target-libatomic] Error 1 configure: error: in `/home/unlvs/mcf_build/src/build-x86_64-w64-mingw32/x86_64-w64-mingw32/libssp': configure: error: cannot run C compiled programs. If you meant to cross compile, use `--host'. See `config.log' for more details configure: error: in `/home/unlvs/mcf_build/src/build-x86_64-w64-mingw32/x86_64-w64-mingw32/libquadmath': configure: error: cannot run C compiled programs. If you meant to cross compile, use `--host'. See `config.log' for more details make[1]: *** [Makefile:13329: configure-target-libssp] Error 1 make[1]: *** [Makefile:14375: configure-target-libquadmath] Error 1 make[1]: Leaving directory '/home/unlvs/mcf_build/src/build-x86_64-w64-mingw32' make: *** [Makefile:973: all] Error 2 ==> ERROR: A failure occurred in build(). Aborting...
[Bug rtl-optimization/80960] [8/9/10/11 Regression] Huge memory use when compiling a very large test case
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80960 --- Comment #26 from Segher Boessenkool --- (In reply to Richard Biener from comment #23) > (that combine number prevails on trunk as well, I can't spot any code > that disables combine on large BBs so not sure what goes on here) There is no such thing, indeed. And the instruction combiner is "mostly linear", so it shouldn't actually matter.
[Bug fortran/86470] [8/9/10/11 Regression] [OOP] ICE with OMP
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86470 anlauf at gcc dot gnu.org changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |anlauf at gcc dot gnu.org CC||anlauf at gcc dot gnu.org Status|NEW |ASSIGNED --- Comment #8 from anlauf at gcc dot gnu.org --- Submitted: https://gcc.gnu.org/pipermail/fortran/2021-January/055647.html
[Bug rtl-optimization/97684] [11 Regression] ICE in reg_preferred_class, at reginfo.c:789 by r11-4577
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97684 --- Comment #6 from CVS Commits --- The master branch has been updated by Vladimir Makarov : https://gcc.gnu.org/g:081c96621da658760b4a67c07530805f770fa22c commit r11-6943-g081c96621da658760b4a67c07530805f770fa22c Author: Vladimir N. Makarov Date: Wed Jan 27 14:53:28 2021 -0500 [PR97684] IRA: Recalculate pseudo classes if we added new pseudos since last calculation before updating equiv regs update_equiv_regs can use reg classes of pseudos and they are set up in register pressure sensitive scheduling and loop invariant motion and in live range shrinking. This info can become obsolete if we add new pseudos since the last set up. Recalculate it again if the new pseudos were added. gcc/ChangeLog: PR rtl-optimization/97684 * ira.c (ira): Call ira_set_pseudo_classes before update_equiv_regs when it is necessary. gcc/testsuite/ChangeLog: PR rtl-optimization/97684 * gcc.target/i386/pr97684.c: New.
[Bug libstdc++/70303] Value-initialized debug iterators
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70303 François Dumont changed: What|Removed |Added CC||fdumont at gcc dot gnu.org Assignee|unassigned at gcc dot gnu.org |fdumont at gcc dot gnu.org --- Comment #6 from François Dumont --- After fixing the duplicate PR 98466 std::vector::iterator is ok but std::deque::iterator seems to be broken still. Taking it.
[Bug c++/98859] pedantic error on use of __VA_OPT__ before C++20 is unnecessary and counterproductive
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98859 Marek Polacek changed: What|Removed |Added CC||mpolacek at gcc dot gnu.org Ever confirmed|0 |1 Status|UNCONFIRMED |NEW Last reconfirmed||2021-01-27 Keywords||diagnostic --- Comment #1 from Marek Polacek --- That sounds reasonable.
[Bug c++/98859] New: pedantic error on use of __VA_OPT__ before C++20 is unnecessary and counterproductive
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98859 Bug ID: 98859 Summary: pedantic error on use of __VA_OPT__ before C++20 is unnecessary and counterproductive Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: richard-gccbugzilla at metafoo dot co.uk Target Milestone: --- There's no good way in ISO C or C++ to express what the GNU ,##__VA_ARGS__ extension does prior to the addition of __VA_OPT__. However, code targeting new compilers (that doesn't want to use GNU C / GNU C++) cannot reliably use __VA_OPT__ instead of the comma paste extension, because GCC's -pedantic-errors mode rejects it outside C++20. Such rejection is unnecessary: __VA_OPT__ is a reserved identifier in other language modes, so there is no conformance reason to issue a diagnostic on its use. I think it'd be useful for GCC to unconditionally allow using __VA_OPT__ in all language modes. (I'm changing Clang to do the same.)
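As a concrete illustration of the request, the macro below uses `__VA_OPT__(,)` where older code would write `, ##__VA_ARGS__`. The macro name `FORMAT_INTO` is invented for this sketch. It is standard in C++20; in earlier language modes, acceptance depends on the compiler version and on whether pedantic diagnostics are treated as errors, which is the subject of this report.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Formats into a fixed-size char array. __VA_OPT__(,) expands to a comma
// only when variadic arguments are present, so FORMAT_INTO(buf, "text")
// does not leave a trailing comma in the snprintf call.
#define FORMAT_INTO(buf, fmt, ...) \
  std::snprintf(buf, sizeof(buf), fmt __VA_OPT__(, ) __VA_ARGS__)
```

The GNU spelling of the same macro body would be `std::snprintf(buf, sizeof(buf), fmt, ##__VA_ARGS__)`, which is the extension that code avoiding GNU C++ cannot rely on.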
[Bug c++/98570] [8/9/10/11 Regression] ICE: canonical types differ for identical types
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98570 Jason Merrill changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |jason at gcc dot gnu.org Status|NEW |ASSIGNED CC||jason at gcc dot gnu.org
[Bug lto/85574] [8/9 Regression] LTO bootstapped binaries differ
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85574 --- Comment #38 from Eric Botcazou --- > Feel free to improve things - I do not have any Windows system to > test on or an idea what you think needs to be improved. I would > guess similar things apply to compare-debug which it was derived from. That's even more broken than initially thought: nobody sets $(exeext) at top level so gcc/lto1 is passed and then the behavior is random since some tools append the missing .exe implicitly and some don't.
[Bug c++/97874] [11 Regression] ICE: tree check: expected record_type or union_type or qual_union_type, have template_type_parm in lookup_using_decl, at cp/name-lookup.c:4652
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97874 Jason Merrill changed: What|Removed |Added Keywords|ice-on-invalid-code |ice-on-valid-code Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #3 from Jason Merrill --- Fixed.
[Bug c++/97874] [11 Regression] ICE: tree check: expected record_type or union_type or qual_union_type, have template_type_parm in lookup_using_decl, at cp/name-lookup.c:4652
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97874 --- Comment #2 from CVS Commits --- The master branch has been updated by Jason Merrill : https://gcc.gnu.org/g:9cd7c32549fa334885b716fe98b674f6447fa7c0 commit r11-6942-g9cd7c32549fa334885b716fe98b674f6447fa7c0 Author: Jason Merrill Date: Wed Jan 27 00:51:01 2021 -0500 c++: Dependent using enum [PR97874] The handling of dependent scopes and unsuitable scopes in lookup_using_decl was a bit convoluted; I tweaked it for a while and then eventually reorganized much of the function to hopefully be clearer. Along the way I noticed a couple of ways we were mishandling inherited constructors. The local binding for a dependent using is the USING_DECL. Implement instantiation of a dependent USING_DECL at function scope. gcc/cp/ChangeLog: PR c++/97874 * name-lookup.c (lookup_using_decl): Clean up handling of dependency and inherited constructors. (finish_nonmember_using_decl): Handle DECL_DEPENDENT_P. * pt.c (tsubst_expr): Handle DECL_DEPENDENT_P. gcc/testsuite/ChangeLog: PR c++/97874 * g++.dg/lookup/using4.C: No error in C++20. * g++.dg/cpp0x/decltype37.C: Adjust message. * g++.dg/template/crash75.C: Adjust message. * g++.dg/template/crash76.C: Adjust message. * g++.dg/cpp0x/inh-ctor36.C: New test. * g++.dg/cpp1z/inh-ctor39.C: New test. * g++.dg/cpp2a/using-enum-7.C: New test.
[Bug target/98853] [9/10 Regression] wrong use of bfxil at -O1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98853 Jakub Jelinek changed: What|Removed |Added Last reconfirmed||2021-01-27 Summary|[9/10/11 Regression] wrong |[9/10 Regression] wrong use |use of bfxil at -O1 |of bfxil at -O1 Status|UNCONFIRMED |ASSIGNED Ever confirmed|0 |1 --- Comment #5 from Jakub Jelinek --- Fixed on the trunk so far.
[Bug target/98853] [9/10/11 Regression] wrong use of bfxil at -O1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98853 --- Comment #4 from CVS Commits --- The master branch has been updated by Jakub Jelinek : https://gcc.gnu.org/g:55163419211c6f17e3e22c68304384eba35782a3 commit r11-6941-g55163419211c6f17e3e22c68304384eba35782a3 Author: Jakub Jelinek Date: Wed Jan 27 20:35:21 2021 +0100
aarch64: Fix up *aarch64_bfxilsi_uxtw [PR98853]
The https://gcc.gnu.org/legacy-ml/gcc-patches/2018-07/msg01895.html patch that introduced this pattern claimed:
Would generate:
combine_balanced_int:
    bfxil w0, w1, 0, 16
    uxtw x0, w0
    ret
But with this patch generates:
combine_balanced_int:
    bfxil w0, w1, 0, 16
    ret
and it is indeed what it should generate, but it doesn't do that, it emits bfxil x0, x1, 0, 16 instead which doesn't zero extend from 32 to 64 bits, but preserves the bits from the destination register.
2021-01-27 Jakub Jelinek
PR target/98853
* config/aarch64/aarch64.md (*aarch64_bfxilsi_uxtw): Use %w0, %w1 and %2 instead of %0, %1 and %2.
* gcc.c-torture/execute/pr98853-1.c: New test.
* gcc.c-torture/execute/pr98853-2.c: New test.
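The difference between the two encodings can be modelled in a few lines of C++. This is only a software sketch of the architectural behavior described in the commit message (the 32-bit form writes a W register, which clears bits 63:32, while the 64-bit form leaves the untouched destination bits in place). The function names are invented, and the model assumes lsb + width stays within the register size.

```cpp
#include <cassert>
#include <cstdint>

// Model of "bfxil xd, xn, lsb, width": copy `width` bits of src starting at
// `lsb` into the low bits of dst, leaving all other 64 bits of dst unchanged.
std::uint64_t bfxil_x(std::uint64_t dst, std::uint64_t src, unsigned lsb,
                      unsigned width) {
  const std::uint64_t mask =
      (width >= 64) ? ~std::uint64_t{0} : ((std::uint64_t{1} << width) - 1);
  return (dst & ~mask) | ((src >> lsb) & mask);
}

// Model of "bfxil wd, wn, lsb, width": the same operation on the low
// 32 bits; writing a W register zeroes the upper half of the X register.
std::uint64_t bfxil_w(std::uint64_t dst, std::uint64_t src, unsigned lsb,
                      unsigned width) {
  const std::uint32_t d = static_cast<std::uint32_t>(dst);
  const std::uint32_t s = static_cast<std::uint32_t>(src);
  const std::uint32_t mask =
      (width >= 32) ? ~std::uint32_t{0} : ((std::uint32_t{1} << width) - 1);
  return static_cast<std::uint64_t>((d & ~mask) | ((s >> lsb) & mask));
}
```

With a destination whose upper half holds stale data, the two models return different 64-bit values, which is the wrong-code symptom the fix addresses by printing the operands as %w0 and %w1.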
[Bug c++/98295] [8/9/10/11 Regression] ICE in verify_ctor_sanity, at cp/constexpr.c:4312
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98295 Patrick Palka changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |ppalka at gcc dot gnu.org CC||ppalka at gcc dot gnu.org
[Bug tree-optimization/60770] disappearing clobbers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60770 --- Comment #15 from Orgad Shaneh --- test.cpp: In function ‘int f(int)’: test.cpp:7:11: warning: ‘q’ is used uninitialized in this function [-Wuninitialized] 7 | return *p; | ^ Is this the intended description? It doesn't refer to the real problem (storing a pointer to a variable that is out of scope).
[Bug fortran/98858] OpenMP offload target data ICE at use_device_ptr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98858 --- Comment #1 from Ye Luo --- GNU Fortran (GCC) 11.0.0 20210127 (experimental)
[Bug fortran/98858] New: OpenMP offload target data ICE at use_device_ptr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98858 Bug ID: 98858 Summary: OpenMP offload target data ICE at use_device_ptr Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: fortran Assignee: unassigned at gcc dot gnu.org Reporter: xw111luoye at gmail dot com Target Milestone: ---
Getting an ICE:
yeluo@ryzen-box:~/opt/openmp-target/hands-on/tests/fortran_use_device_ptr$ gfortran -fopenmp test_use_device_ptr_target.f90
test_use_device_ptr_target.f90:15:41:
15 | !$omp target data use_device_ptr(a)
| ^
internal compiler error: Segmentation fault
0xf55ee3 crash_signal
Source code: https://github.com/ye-luo/openmp-target/blob/master/hands-on/tests/fortran_use_device_ptr/test_use_device_ptr_target.f90
[Bug target/98849] [11 Regression] ICE in expand_shift_1, at expmed.c:2658 since g:7432f255b70811dafaf325d9403
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98849 --- Comment #11 from Christophe Lyon --- Yes MVE is incompatible with iWMMXt. Regarding the pattern name, quoting what I wrote in the commit message: I kept the mve_vshlq_ naming instead of renaming it to ashl3__ as discussed because the reference in arm_mve_builtins.def automatically inserts the "mve_" prefix and I didn't want to make a special case for this.
[Bug c++/98841] wrong ‘operator=’ should return a reference to ‘*this’ [-Weffc++]
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98841 --- Comment #7 from Olaf Mandel --- (In reply to Olaf Mandel from comment #0) > In the minimal demo used here this only happens for a template member > function, but in larger code it can also be observed for a plain member > function: see e.g. https://github.com/jbeder/yaml-cpp/issues/970 > I have to retract that statement: I cannot reproduce this and the two line numbers in the larger code in question are very similar: 212 and 221. Maybe I just confused them?
[Bug tree-optimization/60770] disappearing clobbers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60770 --- Comment #14 from Marc Glisse --- (In reply to Orgad Shaneh from comment #13) > The case described in comment 1 doesn't issue a warning with GCC 10. It does for me with -Wall -O (you need at least some optimization). If there is still a problem, you need to open a new issue.
[Bug tree-optimization/60770] disappearing clobbers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60770 Orgad Shaneh changed: What|Removed |Added CC||orgads at gmail dot com --- Comment #13 from Orgad Shaneh --- The case described in comment 1 doesn't issue a warning with GCC 10. Looks like it's a different case than bug 60517.
[Bug c++/98295] [8/9/10/11 Regression] ICE in verify_ctor_sanity, at cp/constexpr.c:4312
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98295 --- Comment #4 from Jakub Jelinek --- Still ICEs even when that other bug is fixed.
[Bug testsuite/98351] [11 regression] gcc.target/powerpc/sse-andnps-1.c and sse2-andnpd-1.c fail after r11-3308
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98351 Jakub Jelinek changed: What|Removed |Added Resolution|--- |FIXED CC||jakub at gcc dot gnu.org Status|UNCONFIRMED |RESOLVED --- Comment #2 from Jakub Jelinek --- Should be fixed with r11-6869-gd08677c11dc4b43cc8bab862d1c986563897ce3f and r11-6871-g70ab52b8cafffedb05b55c68c847173ff80f2652 and https://gcc.gnu.org/g:e80f1f6b7a339bce1db03567e497658ae32d135e commit r11-6917-ge80f1f6b7a339bce1db03567e497658ae32d135e Author: Jakub Jelinek Date: Tue Jan 26 20:02:29 2021 +0100 testsuite: Fix TBAA in sse*and*p[sd]*.c tests This patch drops the no-strict-aliasing hack in m128-check.h and instead ensures the tests read objects with the right dynamic type. 2021-01-26 Jakub Jelinek * gcc.target/powerpc/m128-check.h (CHECK_EXP): Remove optimize ("no-strict-aliasing") attribute. * gcc.target/powerpc/sse-andnps-1.c (TEST): Copy e into float[4] array to avoid violating TBAA. * gcc.target/powerpc/sse2-andpd-1.c (TEST): Copy e.d into double[2] array to avoid violating TBAA. * gcc.target/powerpc/sse-andps-1.c (TEST): Copy e.f into float[4] array to avoid violating TBAA. * gcc.target/powerpc/sse2-andnpd-1.c (TEST): Copy e into double[2] array to avoid violating TBAA.
[Bug testsuite/98349] [11 regression] gcc.target/powerpc/sse-movhps-1.c and sse-movlps.c fail after r11-3434
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98349 Jakub Jelinek changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED CC||jakub at gcc dot gnu.org --- Comment #3 from Jakub Jelinek --- Should be fixed by: https://gcc.gnu.org/g:c63f091db89a56ae56b2bfa2ba4d9e956bd9693f commit r11-6879-gc63f091db89a56ae56b2bfa2ba4d9e956bd9693f Author: Jakub Jelinek Date: Sat Jan 23 09:41:58 2021 +0100 rs6000: Fix up __m64 typedef in mmintrin.h [PR97301] The x86 __m64 type is defined as: /* The Intel API is flexible enough that we must allow aliasing with other vector types, and their scalar components. */ typedef int __m64 __attribute__ ((__vector_size__ (8), __may_alias__)); and so matches the comment above it in that reads and stores through pointers to __m64 can alias anything. But in the rs6000 headers that is the case only for __m128, but not __m64. The following patch adds that attribute, which fixes the FAIL: gcc.target/powerpc/sse-movhps-1.c execution test FAIL: gcc.target/powerpc/sse-movlps-1.c execution test regressions that appeared when Honza improved ipa-modref. 2021-01-23 Jakub Jelinek PR testsuite/97301 * config/rs6000/mmintrin.h (__m64): Add __may_alias__ attribute.
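As a rough illustration of what the __may_alias__ attribute buys in the commit above, here is a small GCC-specific sketch. The typedef name v2si_alias and the helper store_low_pair are hypothetical; the typedef is only modeled on the x86 __m64 definition quoted in the message, and the sketch assumes 4-byte int and float.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 8-byte vector type modeled on the quoted __m64 typedef.
   __may_alias__ tells GCC that accesses through this type may alias
   objects of any other type, similar to accesses through char. */
typedef int v2si_alias __attribute__ ((__vector_size__ (8), __may_alias__));

/* Overwrite the first two floats at dst with the raw bits of lo and hi.
   dst must be 8-byte aligned.  Without __may_alias__ on the typedef,
   this store through a non-float type would break the aliasing rules
   and the compiler could assume it does not touch any float object. */
static void store_low_pair(float *dst, int lo, int hi)
{
    assert((uintptr_t) dst % sizeof(v2si_alias) == 0);
    v2si_alias v = { lo, hi };
    *(v2si_alias *) dst = v;
}
```

A caller that later reads dst[0] and dst[1] as float is then expected to see the stored bits, which is the behavior the movhps/movlps tests rely on.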
[Bug middle-end/98829] Different results with -O3 and custom quiet NaN
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98829 Jakub Jelinek changed: What|Removed |Added CC||jakub at gcc dot gnu.org --- Comment #6 from Jakub Jelinek --- Your custom quiet NaN is not a quiet NaN, but signaling NaN. And, as documented, -fno-signaling-nans is the default. If you change your custom signaling NaN into a quiet NaN, static constexpr std::uint64_t kCustomNaN = 0x7ff8 | kMagicNumber; or if you compile with -fsignaling-nans, this works fine, so I'd say this is just a user error.
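To make the quiet versus signaling distinction above concrete, here is a small sketch that classifies a binary64 bit pattern. It assumes the common IEEE 754-2008 convention where the top fraction bit set means "quiet" (some legacy targets invert this), and the helper names, including make_quiet_nan, are hypothetical and not taken from the reporter's code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(double) == sizeof(uint64_t),
              "this sketch assumes a 64-bit double");

/* binary64 layout: 1 sign bit, 11 exponent bits, 52 fraction bits. */
#define EXP_MASK  UINT64_C(0x7ff0000000000000)
#define FRAC_MASK UINT64_C(0x000fffffffffffff)
#define QUIET_BIT UINT64_C(0x0008000000000000) /* top fraction bit */

/* NaN: all exponent bits set and a nonzero fraction. */
static int is_nan_bits(uint64_t bits)
{
    return (bits & EXP_MASK) == EXP_MASK && (bits & FRAC_MASK) != 0;
}

static int is_quiet_nan_bits(uint64_t bits)
{
    return is_nan_bits(bits) && (bits & QUIET_BIT) != 0;
}

static int is_signaling_nan_bits(uint64_t bits)
{
    return is_nan_bits(bits) && (bits & QUIET_BIT) == 0;
}

/* Hypothetical helper: embed a payload (at most 51 bits) in a quiet NaN. */
static uint64_t make_quiet_nan(uint64_t payload)
{
    return EXP_MASK | QUIET_BIT | (payload & (QUIET_BIT - 1));
}

/* Reinterpret the bits as a double without violating aliasing rules. */
static double bits_to_double(uint64_t bits)
{
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}
```

A custom NaN whose payload leaves the quiet bit clear is a signaling NaN, which the compiler may treat differently unless -fsignaling-nans is given.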
[Bug libfortran/98825] Unexpected behavior of FORTRAN FORMAT expressions when suppressing new line with '$'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98825 --- Comment #5 from max.pd at gmx dot de --- The -fdec compiler flag provides a possible workaround. When opening a unit with CARRIAGECONTROL='NONE' (an option available with DEC extensions in gfortran), the program no longer shows the unexpected behavior. But there would be no way to enable the carriage return between records for that io-unit omitting '$' in the format expression. This workaround makes it necessary to open a new unit for stdout output: OPEN (UNIT=7, FILE='/dev/stdout', CARRIAGECONTROL='NONE') This feature is documented in: https://gcc.gnu.org/onlinedocs/gcc-10.2.0/gfortran/Extended-I_002fO-specifiers.html The '$'-fin format works well for gfortran on single records without any compiler flags. So it might be coherent to make a patch that affects the full scope of the '$' format expressions, even those compiled without the -fdec compiler flag. That way the patch would cover all possible occurrences of the unexpected behavior when writing multiple-record output.
[Bug rtl-optimization/97684] [11 Regression] ICE in reg_preferred_class, at reginfo.c:789 by r11-4577
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97684 Vladimir Makarov changed: What|Removed |Added CC||vmakarov at gcc dot gnu.org --- Comment #5 from Vladimir Makarov --- I've reproduced x86-64 case and started to work on it. I think the patch will be ready soon.
[Bug tree-optimization/98854] [11 Regression] cray benchmark is about 15% slower since r11-4428-g4a369d199bf2f34e
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98854 Richard Biener changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #10 from Richard Biener --- Fixed.
[Bug tree-optimization/98854] [11 Regression] cray benchmark is about 15% slower since r11-4428-g4a369d199bf2f34e
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98854 --- Comment #9 from CVS Commits --- The master branch has been updated by Richard Biener : https://gcc.gnu.org/g:c91db798ec65b3e55f2380ca1530ecb71544f1bb commit r11-6934-gc91db798ec65b3e55f2380ca1530ecb71544f1bb Author: Richard Biener Date: Wed Jan 27 15:20:58 2021 +0100 tree-optimization/98854 - avoid some PHI BB vectorization This avoids cases of PHI node vectorization that just causes us to insert vector CTORs inside loops for values only required outside of the loop. 2021-01-27 Richard Biener PR tree-optimization/98854 * tree-vect-slp.c (vect_build_slp_tree_2): Also build PHIs from scalars when the number of CTORs matches the number of children. * gcc.dg/vect/bb-slp-pr98854.c: New testcase.
[Bug inline-asm/98847] Miscompilation with c++17, templates, and register keyword
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98847 --- Comment #4 from programmerjake at gmail dot com --- (In reply to Jakub Jelinek from comment #3) > Created attachment 50066 [details] > gcc11-pr98847.patch > > Untested fix. That will probably also fix bug #98846
[Bug target/98849] [11 Regression] ICE in expand_shift_1, at expmed.c:2658 since g:7432f255b70811dafaf325d9403
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98849 --- Comment #10 from Jakub Jelinek --- ./cc1 -quiet -nostdinc -O3 -mcpu=iwmmxt pr98849-2.c -fdump-tree-all-folding -mfpu=neon cc1: error: iWMMXt and NEON are incompatible So I think TARGET_NEON && TARGET_REALLY_IWMMXT is never true. Don't know if TARGET_HAVE_MVE && TARGET_REALLY_IWMMXT is similarly never true, but I'd guess so. So perhaps just add && !TARGET_REALLY_IWMMXT to the two conditions. It is also unclear why you call the pattern mve_* when it is used by both neon and mve.
[Bug c++/98824] [C++-20] function template non-type-class-arg deduction fails with a reason that looks bogus
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98824 --- Comment #1 from Dimitri Gorokhovik --- It doesn't seem to contradict N4868 :-( Modifying the code slightly (adding refs, splitting deduction across two fn templates) didn't show any other differences from clang: all other modifications either both pass or both fail with equivalent messages ("'i' cannot be deduced"). This is the only one. clang version: Ubuntu clang version 12.0.0-++20201102052620+327bf5c2d91-1~exp1~20201102163303.210 Target: x86_64-pc-linux-gnu
[Bug tree-optimization/98854] [11 Regression] cray benchmark is about 15% slower since r11-4428-g4a369d199bf2f34e
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98854 --- Comment #8 from rguenther at suse dot de --- On Wed, 27 Jan 2021, marxin at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98854 > > --- Comment #7 from Martin Liška --- > > I used -O3 but -O2 -ftree-slp-vectorize also vectorizes it. > > I must be blind, but I see for the current master: > > gcc pr98854.c -c -O2 -ftree-slp-vectorize -fdump-tree-optimized=/dev/stdout > > foo (int n) > { > unsigned long ivtmp.8; > double y; > double x; > double _6; > double _8; > double _9; > double _11; > int _14; > void * _29; > unsigned long _31; > > : > ivtmp.8_28 = (unsigned long) &MEM[(void *)&a + 8184B]; > _31 = (unsigned long) &a; > > : > # x_1 = PHI <0.0(2), x_10(5)> > # y_2 = PHI <0.0(2), y_12(5)> > # ivtmp.8_18 = PHI > _29 = (void *) ivtmp.8_18; > _6 = MEM[base: _29, offset: 0B]; > _8 = MEM[base: _29, offset: 8B]; > _9 = _6 + _8; > x_10 = _9 + x_1; > _11 = _6 / _8; > y_12 = _11 + y_2; > _14 = bar (); > if (_14 != 0) > goto ; > else > goto ; > > : > a[0] = x_10; > a[1] = y_12; > return; > > : > ivtmp.8_27 = ivtmp.8_18 - 8; > if (ivtmp.8_27 != _31) > goto ; > else > goto ; > > } Hmm, maybe my dev tree has related adjustments to SLP ... at least the posted patch fixes the regression for me.
[Bug tree-optimization/98854] [11 Regression] cray benchmark is about 15% slower since r11-4428-g4a369d199bf2f34e
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98854 --- Comment #7 from Martin Liška --- > I used -O3 but -O2 -ftree-slp-vectorize also vectorizes it. I must be blind, but I see for the current master: gcc pr98854.c -c -O2 -ftree-slp-vectorize -fdump-tree-optimized=/dev/stdout foo (int n) { unsigned long ivtmp.8; double y; double x; double _6; double _8; double _9; double _11; int _14; void * _29; unsigned long _31; : ivtmp.8_28 = (unsigned long) &MEM[(void *)&a + 8184B]; _31 = (unsigned long) &a; : # x_1 = PHI <0.0(2), x_10(5)> # y_2 = PHI <0.0(2), y_12(5)> # ivtmp.8_18 = PHI _29 = (void *) ivtmp.8_18; _6 = MEM[base: _29, offset: 0B]; _8 = MEM[base: _29, offset: 8B]; _9 = _6 + _8; x_10 = _9 + x_1; _11 = _6 / _8; y_12 = _11 + y_2; _14 = bar (); if (_14 != 0) goto ; else goto ; : a[0] = x_10; a[1] = y_12; return; : ivtmp.8_27 = ivtmp.8_18 - 8; if (ivtmp.8_27 != _31) goto ; else goto ; }
[Bug tree-optimization/98854] [11 Regression] cray benchmark is about 15% slower since r11-4428-g4a369d199bf2f34e
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98854 --- Comment #6 from rguenther at suse dot de --- On Wed, 27 Jan 2021, marxin at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98854 > > --- Comment #5 from Martin Liška --- > (In reply to Richard Biener from comment #4) > > Little bit convoluted testcase: > > > > double a[1024]; > > > > int bar(); > > void foo (int n) > > { > > double x = 0, y = 0; > > int i = 1023; > > do > > { > > x += a[i] + a[i+1]; > > y += a[i] / a[i+1]; > > if (bar ()) > > break; > > } > > while (--i); > > a[0] = x; > > a[1] = y; > > } > > > > What compiler (ISA options) do you use in order to vectorize this? I used -O3 but -O2 -ftree-slp-vectorize also vectorizes it.
[Bug tree-optimization/98856] [11 Regression] botan AES-128/XTS is slower by ~17% since r11-6649-g285fa338b06b804e72997c4d876ecf08a9c083af
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856 Richard Biener changed: What|Removed |Added Status|NEW |ASSIGNED Target Milestone|--- |11.0 Keywords||missed-optimization Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org --- Comment #1 from Richard Biener --- I will have a look.
[Bug tree-optimization/98766] [10 Regression] SVE: ICE in tree_to_shwi with -O3 --param=avoid-fma-max-bits
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98766 ktkachov at gcc dot gnu.org changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED Known to fail|11.0| --- Comment #7 from ktkachov at gcc dot gnu.org --- Fixed on branch too.
[Bug libstdc++/98466] Debug Mode iterators for unordered containers do not implement N3644
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98466 --- Comment #3 from Dimitrij Mijoski --- (In reply to Jonathan Wakely from comment #2) > This was already fixed on master by r11-6682 > 05a30af3f237984b4dcf1dbbc17fdac583c46506 Yes, that patch mostly fixes bug 70303, too. With that patch, the asserts presented in bug 70303 pass for vector::iterator but not for deque::iterator.
[Bug tree-optimization/98855] [11 Regression] botan XTEA is 100% slower on znver2 since r11-4428-g4a369d199bf2f34e
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98855 Richard Biener changed: What|Removed |Added Keywords||missed-optimization --- Comment #2 from Richard Biener --- OK, let's see whether the fix for 98854 makes a difference before investigating closer.
[Bug tree-optimization/98766] [10 Regression] SVE: ICE in tree_to_shwi with -O3 --param=avoid-fma-max-bits
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98766 --- Comment #6 from CVS Commits --- The releases/gcc-10 branch has been updated by Kyrylo Tkachov : https://gcc.gnu.org/g:e753db89ddcc7f005fd54f861375bcdc85f23335 commit r10-9305-ge753db89ddcc7f005fd54f861375bcdc85f23335 Author: Kyrylo Tkachov Date: Thu Jan 21 16:33:49 2021 + tree-ssa-mathopts: Use proper poly_int64 comparison with param_avoid_fma_max_bits [PR 98766] We ICE here because we end up comparing a poly_int64 with a scalar using <= rather than maybe_le. This patch fixes that in the way rich suggests in the PR. gcc/ChangeLog: PR tree-optimization/98766 * tree-ssa-math-opts.c (convert_mult_to_fma): Use maybe_le when comparing against type size with param_avoid_fma_max_bits. gcc/testsuite/ChangeLog: PR tree-optimization/98766 * gcc.dg/pr98766.c: New test. (cherry picked from commit 9d33785f57daf29dc0c106c919da319fe1906bc6)
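The difference between a plain <= and maybe_le in the commit above can be pictured with a toy model. The sketch below is not GCC's poly_int implementation: toy_poly, toy_maybe_le and toy_known_le are hypothetical names, and the model assumes a single runtime indeterminate x >= 0 with no upper bound, so a value is c0 + c1 * x.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy two-coefficient polynomial value: c0 + c1 * x, where x >= 0 is only
   known at run time (for SVE it depends on the hardware vector length). */
typedef struct { int64_t c0, c1; } toy_poly;

/* True if a <= b holds for at least one x >= 0: either it already holds
   at x == 0, or a grows more slowly than b and is eventually overtaken. */
static bool toy_maybe_le(toy_poly a, toy_poly b)
{
    return a.c0 <= b.c0 || a.c1 < b.c1;
}

/* True if a <= b holds for every x >= 0: it must hold at x == 0 and a
   must not grow faster than b. */
static bool toy_known_le(toy_poly a, toy_poly b)
{
    return a.c0 <= b.c0 && a.c1 <= b.c1;
}
```

For example, a size of 128 + 128x bits compared against a constant limit of 256 is "maybe" within the limit but not "known" to be, which is why a single scalar <= cannot express the comparison.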
[Bug tree-optimization/98854] [11 Regression] cray benchmark is about 15% slower since r11-4428-g4a369d199bf2f34e
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98854 --- Comment #5 from Martin Liška --- (In reply to Richard Biener from comment #4) > Little bit convoluted testcase: > > double a[1024]; > > int bar(); > void foo (int n) > { > double x = 0, y = 0; > int i = 1023; > do > { > x += a[i] + a[i+1]; > y += a[i] / a[i+1]; > if (bar ()) > break; > } > while (--i); > a[0] = x; > a[1] = y; > } > What compiler (ISA options) do you use in order to vectorize this?
[Bug c++/83417] Pointer-to-member template parameter with auto member type dependent container type does not work (C++17)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83417 David Friberg changed: What|Removed |Added CC||davveston at gmail dot com --- Comment #3 from David Friberg --- The same holds for the case of function pointers. Given the following function: void f(int) {} Both examples (A) and (B) below are well-formed, as per [temp.deduct.type]/13 (and for (B): also as per [temp.arg.nontype]/1). // Example (A) template struct A; template struct A { }; A a{}; // #1: OK // Example (B) template struct B; template struct B { }; B b{}; // #2: Rejected (type deduction failure in partial specialization) Clang accepts both, whereas GCC (trunk/any version I've tried that supports C++17) rejects example (B), as #2 is resolved to the primary (non-defined) class template after failing to deduce the dependent 'T' from 'auto (*fp)(T)' in the partial specialization, given the argument 'f' to the latter (non-type) template parameter.
[Bug rtl-optimization/80960] [8/9/10/11 Regression] Huge memory use when compiling a very large test case
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80960 Richard Biener changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org --- Comment #25 from Richard Biener --- Oh, so it's not actually that plus_constant calls but the ones called via get_addr from true_dependence_1 which is called 60 million times from check_mem_read_use. That does: /* Convert the address X into something we can use. This is done by returning it unchanged unless it is a VALUE or VALUE +/- constant; for VALUE we call cselib to get a more useful rtx. */ rtx get_addr (rtx x) { cselib_val *v; struct elt_loc_list *l; if (GET_CODE (x) != VALUE) { if ((GET_CODE (x) == PLUS || GET_CODE (x) == MINUS) && GET_CODE (XEXP (x, 0)) == VALUE && CONST_SCALAR_INT_P (XEXP (x, 1))) { rtx op0 = get_addr (XEXP (x, 0)); if (op0 != XEXP (x, 0)) { poly_int64 c; if (GET_CODE (x) == PLUS && poly_int_rtx_p (XEXP (x, 1), &c)) return plus_constant (GET_MODE (x), op0, c); thus undoing the valueization DSE does. Since it unconditionally does this I guess DSE could do it itself instead. That helps tremendously: dead store elim2 : 6.34 ( 11%) 0.02 ( 7%) 6.38 ( 11%) 170M ( 45%) TOTAL : 56.96 0.27 57.26 381M 56.96user 0.29system 0:57.27elapsed 99%CPU (0avgtext+0avgdata 825148maxresident)k 0inputs+0outputs (0major+210372minor)pagefaults 0swaps diff --git a/gcc/dse.c b/gcc/dse.c index c88587e7d94..da0df54a2dd 100644 --- a/gcc/dse.c +++ b/gcc/dse.c @@ -2219,6 +2219,11 @@ check_mem_read_rtx (rtx *loc, bb_info_t bb_info) } if (maybe_ne (offset, 0)) mem_addr = plus_constant (get_address_mode (mem), mem_addr, offset); + /* Avoid passing VALUE RTXen as mem_addr to canon_true_dependence + which will over and over re-create proper RTL and re-apply the + offset above. See PR80960 where we almost allocate 1.6GB of PLUS + RTXen that way. */ + mem_addr = get_addr (mem_addr); if (group_id >= 0) {
[Bug c++/98857] New: Add support for function attributes applied to function pointers from non-capturing lambdas
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98857 Bug ID: 98857 Summary: Add support for function attributes applied to function pointers from non-capturing lambdas Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: koncek.marian at gmail dot com Target Milestone: --- Created attachment 50070 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50070&action=edit example Since a non-capturing lambda has to be convertible to a function pointer, it may be useful to be able to specify function attributes (such as [[gnu::aligned(N)]]) which apply to the function pointer obtained from such a lambda. Example use: https://godbolt.org/z/ds1G6z (also attached) This does open some questions about which of the following an attribute applies to: 1) lambda object 2) member application function: &decltype(lambda)::operator() 3) function pointer from non-capturing lambdas and about where to place the attribute specifier.
[Bug target/98853] [9/10/11 Regression] wrong use of bfxil at -O1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98853 --- Comment #3 from Jakub Jelinek --- Created attachment 50069 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50069&action=edit gcc11-pr98853.patch Untested fix.
[Bug tree-optimization/98855] [11 Regression] botan XTEA is 100% slower on znver2 since r11-4428-g4a369d199bf2f34e
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98855 --- Comment #1 from Martin Liška --- And likely something similar happens since the same revision: botan/KASUMI decrypt https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=245.694.1&plot.1=171.694.1
[Bug tree-optimization/98856] [11 Regression] botan AES-128/XTS is slower by ~17% since r11-6649-g285fa338b06b804e72997c4d876ecf08a9c083af
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856 Martin Liška changed: What|Removed |Added Known to fail||11.0 Last reconfirmed||2021-01-27 Status|UNCONFIRMED |NEW Ever confirmed|0 |1 Known to work||10.2.0
[Bug tree-optimization/98856] New: [11 Regression] botan AES-128/XTS is slower by ~17% since r11-6649-g285fa338b06b804e72997c4d876ecf08a9c083af
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856 Bug ID: 98856 Summary: [11 Regression] botan AES-128/XTS is slower by ~17% since r11-6649-g285fa338b06b804e72997c4d876ecf08a9c083af Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: marxin at gcc dot gnu.org CC: rguenth at gcc dot gnu.org Target Milestone: --- Since the revision the following is slower: $ make clean && ./configure.py --cxxflags="-Ofast -march=znver2 -fno-checking" && make -j16 && ./botan speed AES-128/XTS as seen here: https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=226.721.1&plot.1=14.721.1&;
[Bug libstdc++/98466] Debug Mode iterators for unordered containers do not implement N3644
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98466 --- Comment #2 from Jonathan Wakely --- This was already fixed on master by r11-6682 05a30af3f237984b4dcf1dbbc17fdac583c46506
[Bug libstdc++/98466] Debug Mode iterators for unordered containers do not implement N3644
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98466 Dimitrij Mijoski changed: What|Removed |Added CC||dmjpp at hotmail dot com --- Comment #1 from Dimitrij Mijoski --- This bug looks like a duplicate of bug 70303. The asserts presented there should be used on random-access iterators (vector, deque) to test if N3644 is implemented.
[Bug target/98853] [9/10/11 Regression] wrong use of bfxil at -O1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98853 Jakub Jelinek changed: What|Removed |Added Target Milestone|11.0|9.4 Assignee|unassigned at gcc dot gnu.org |jakub at gcc dot gnu.org Summary|[11 Regression] wrong use |[9/10/11 Regression] wrong |of bfxil at -O1 |use of bfxil at -O1 --- Comment #2 from Jakub Jelinek --- That change has been introduced in r9-2905-g2dc09f66b3b49d821e4bd68d3c97ff51d5e080d4 , so I think we have at least latent wrong-code in 9 and 10 too.
[Bug tree-optimization/98854] [11 Regression] cray benchmark is about 15% slower since r11-4428-g4a369d199bf2f34e
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98854 --- Comment #4 from Richard Biener --- Little bit convoluted testcase: double a[1024]; int bar(); void foo (int n) { double x = 0, y = 0; int i = 1023; do { x += a[i] + a[i+1]; y += a[i] / a[i+1]; if (bar ()) break; } while (--i); a[0] = x; a[1] = y; } where we end up with the {x, y} vector CTOR inside the loop (and even spill/reload it because of the call). We have a PHI node-only feed for the vectorized store: t.c:16:8: note: Vectorizing SLP tree: t.c:16:8: note: node 0x3b21ee0 (max_nunits=2, refcnt=1) t.c:16:8: note: op template: a[0] = x_22; t.c:16:8: note: stmt 0 a[0] = x_22; t.c:16:8: note: stmt 1 a[1] = y_21; t.c:16:8: note: children 0x3b21f68 t.c:16:8: note: node 0x3b21f68 (max_nunits=2, refcnt=1) t.c:16:8: note: op template: x_22 = PHI t.c:16:8: note: stmt 0 x_22 = PHI t.c:16:8: note: stmt 1 y_21 = PHI t.c:16:8: note: children 0x3b21ff0 0x3b22210 t.c:16:8: note: node 0x3b21ff0 (max_nunits=2, refcnt=1) t.c:16:8: note: op template: x_26 = PHI t.c:16:8: note: stmt 0 x_26 = PHI t.c:16:8: note: stmt 1 y_24 = PHI t.c:16:8: note: children 0x3b22320 t.c:16:8: note: node (external) 0x3b22320 (max_nunits=1, refcnt=1) t.c:16:8: note: { x_14, y_15 } t.c:16:8: note: node 0x3b22210 (max_nunits=2, refcnt=1) t.c:16:8: note: op template: x_25 = PHI t.c:16:8: note: stmt 0 x_25 = PHI t.c:16:8: note: stmt 1 y_23 = PHI t.c:16:8: note: children 0x3b223a8 t.c:16:8: note: node (external) 0x3b223a8 (max_nunits=1, refcnt=1) t.c:16:8: note: { x_14, y_15 } fixing this issue fixes the slowdown. Testing a patch.
[Bug target/98853] [11 Regression] wrong use of bfxil at -O1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98853 Jakub Jelinek changed: What|Removed |Added CC||jakub at gcc dot gnu.org --- Comment #1 from Jakub Jelinek --- I admit I know next to nothing about AArch64, but the https://gcc.gnu.org/legacy-ml/gcc-patches/2018-07/msg01895.html patch certainly doesn't emit what it claims to, and from a brief look at the assembler guide it appears that emitting what the patch claims to shall fix it. So I think this should be: --- gcc/config/aarch64/aarch64.md.jj2021-01-04 10:25:46.435147744 +0100 +++ gcc/config/aarch64/aarch64.md 2021-01-27 15:13:13.993275204 +0100 @@ -5724,10 +5724,10 @@ (define_insn "*aarch64_bfxilsi_uxtw" { case 0: operands[3] = GEN_INT (ctz_hwi (~INTVAL (operands[3]))); - return "bfxil\\t%0, %1, 0, %3"; + return "bfxil\\t%w0, %w1, 0, %3"; case 1: operands[3] = GEN_INT (ctz_hwi (~INTVAL (operands[4]))); - return "bfxil\\t%0, %2, 0, %3"; + return "bfxil\\t%w0, %w2, 0, %3"; default: gcc_unreachable (); }
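A small C model may help show why the register width in the template above matters. This is only a sketch of the instruction semantics as I understand them, restricted to lsb == 0 as in the pattern; bfxil_x and bfxil_w are hypothetical helper names, not anything in GCC.

```c
#include <assert.h>
#include <stdint.h>

/* Model of "bfxil Xd, Xn, 0, width": copy the low `width` bits of n into
   d and leave all other bits of the 64-bit destination unchanged. */
static uint64_t bfxil_x(uint64_t d, uint64_t n, unsigned width)
{
    assert(width >= 1 && width <= 64);
    uint64_t mask = width == 64 ? ~UINT64_C(0)
                                : (UINT64_C(1) << width) - 1;
    return (d & ~mask) | (n & mask);
}

/* Model of "bfxil Wd, Wn, 0, width": the same insertion on the low 32
   bits only; writing a W register clears bits 63:32 of the X register. */
static uint64_t bfxil_w(uint64_t d, uint64_t n, unsigned width)
{
    assert(width >= 1 && width <= 32);
    uint32_t mask = width == 32 ? ~UINT32_C(0)
                                : (UINT32_C(1) << width) - 1;
    uint32_t lo = ((uint32_t) d & ~mask) | ((uint32_t) n & mask);
    return (uint64_t) lo;
}
```

With stale data in the upper half of the destination, the X form keeps it while the W form yields a zero-extended 32-bit result, which is presumably what a pattern with uxtw in its name is meant to produce.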
[Bug ipa/98815] Redundant free_dominance_info in cgraph_node::analyze()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98815 --- Comment #4 from Martin Liška --- I can confirm the patch survives bootstrap and regression tests. I'm going to send it at the beginning of the next stage1.
[Bug libstdc++/66414] string::find ten times slower than strstr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66414 --- Comment #10 from CVS Commits --- The master branch has been updated by Jonathan Wakely : https://gcc.gnu.org/g:a199da782fc165fd45f42a15cc9020994efd455d commit r11-6931-ga199da782fc165fd45f42a15cc9020994efd455d Author: Jonathan Wakely Date: Wed Jan 27 13:21:52 2021 + libstdc++: Optimize std::string_view::find [PR 66414] This reuses the code from std::string::find, which was improved by r244225, but string_view was not changed to match. libstdc++-v3/ChangeLog: PR libstdc++/66414 * include/bits/string_view.tcc (basic_string_view::find(const CharT*, size_type, size_type)): Optimize.
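For context, the general shape of the faster search (scan for the first character with an optimized primitive, then compare the rest) can be sketched in C as follows. This is an illustration of the technique under my own assumptions, not the libstdc++ source; find_sub and NPOS are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NPOS ((size_t) -1)

/* Return the index of the first occurrence of needle[0..n) in
   hay[0..hay_len) at or after pos, or NPOS if there is none. */
static size_t find_sub(const char *hay, size_t hay_len,
                       const char *needle, size_t n, size_t pos)
{
    assert(hay != NULL && needle != NULL);
    if (n == 0)
        return pos <= hay_len ? pos : NPOS;
    if (pos >= hay_len || n > hay_len - pos)
        return NPOS;

    const char *first = hay + pos;
    const char *const last = hay + hay_len;
    size_t len = hay_len - pos; /* bytes remaining from first */

    while (len >= n) {
        /* Let memchr find the next candidate start; only positions where
           a full match still fits are scanned. */
        first = memchr(first, needle[0], len - n + 1);
        if (first == NULL)
            return NPOS;
        /* Candidate found: compare the whole needle in one call. */
        if (memcmp(first, needle, n) == 0)
            return (size_t) (first - hay);
        ++first;
        len = (size_t) (last - first);
    }
    return NPOS;
}
```

The gain over a naive character-by-character loop comes from memchr and memcmp typically being vectorized in the C library.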
[Bug tree-optimization/98854] [11 Regression] cray benchmark is about 15% slower since r11-4428-g4a369d199bf2f34e
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98854 --- Comment #3 from Richard Biener --- OK, one can see it with BB vectorization enabled vs. disabled. Bad: Samples: 7K of event 'cycles:u', Event count (approx.): 7540324763 Overhead Samples Command Shared Object Symbol 53.11% 3711 a.outa.out [.] shade 25.39% 1774 a.outa.out [.] trace 18.16% 1271 a.outa.out [.] render_scanline 1.56% 109 a.outlibm-2.26.so[.] __ieee754_pow_sse2 Good: Samples: 6K of event 'cycles:u', Event count (approx.): 6673802579 Overhead Samples Command Shared Object Symbol 61.21% 3857 a.outa.out [.] shade 20.44% 1288 a.outa.out [.] trace 14.42% 912 a.outa.out [.] render_scanline 1.81% 114 a.outlibm-2.26.so[.] __ieee754_pow_sse2 With added -fwhole-program we have c-ray-mt.c:624:18: optimized: basic block part vectorized using 32 byte vectors c-ray-mt.c:372:13: optimized: basic block part vectorized using 32 byte vectors c-ray-mt.c:372:13: optimized: basic block part vectorized using 32 byte vectors c-ray-mt.c:432:9: optimized: basic block part vectorized using 32 byte vectors c-ray-mt.c:656:7: optimized: basic block part vectorized using 32 byte vectors c-ray-mt.c:656:7: optimized: basic block part vectorized using 32 byte vectors c-ray-mt.c:265:23: optimized: basic block part vectorized using 32 byte vectors :372 is bad and then :656 For the first we vectorize a store [local count: 31445960]: # nearest_obj_239 = PHI ... 
_816 = {nearest_sp_pos_x_lsm.258_78, nearest_sp_pos_y_lsm.259_174, nearest_sp_pos_z_lsm.260_201, nearest_sp_normal_x_lsm.261_200}; _820 = {nearest_sp_normal_y_lsm.262_122, nearest_sp_normal_z_lsm.263_293, nearest_sp_vref_x_lsm.264_124, nearest_sp_vref_y_lsm.265_148}; iter_231 = iter_363->next; if (iter_231 != 0B) goto ; [89.00%] else goto ; [11.00%] [local count: 27986904]: goto ; [100.00%] [local count: 3459055]: # nearest_sp_dist_lsm.257_228 = PHI # nearest_sp_pos_x_lsm.258_226 = PHI # nearest_sp_normal_y_lsm.262_343 = PHI # nearest_sp_vref_x_lsm.264_238 = PHI # nearest_sp_vref_y_lsm.265_237 = PHI # nearest_sp_vref_z_lsm.266_236 = PHI # nearest_sp_pos_y_lsm.259_342 = PHI # nearest_sp_normal_x_lsm.261_351 = PHI # nearest_sp_pos_z_lsm.260_304 = PHI # nearest_obj_197 = PHI # nearest_sp_normal_z_lsm.263_821 = PHI # vect_nearest_sp_pos_x_lsm.258_226.268_815 = PHI <_816(26)> # vect_nearest_sp_pos_x_lsm.258_226.268_814 = PHI <_820(26)> nearest_sp.vref.z = nearest_sp_vref_z_lsm.266_236; MEM [(double *)&nearest_sp] = vect_nearest_sp_pos_x_lsm.258_226.268_815; _812 = &nearest_sp.pos.x + 32; MEM [(double *)_812] = vect_nearest_sp_pos_x_lsm.258_226.268_814; but we insert the vector CTOR on a path that's more often executed than the use. And since there's no sinking pass after vectorization nothing fixes this up.
[Bug tree-optimization/98855] [11 Regression] botan XTEA is 100% slower on znver2 since r11-4428-g4a369d199bf2f34e
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98855 Martin Liška changed: What|Removed |Added Target Milestone|--- |11.0 CC||rguenth at gcc dot gnu.org See Also||https://gcc.gnu.org/bugzill ||a/show_bug.cgi?id=98854
[Bug target/98849] [11 Regression] ICE in expand_shift_1, at expmed.c:2658 since g:7432f255b70811dafaf325d9403
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98849 --- Comment #9 from Jakub Jelinek --- (In reply to Christophe Lyon from comment #6) > so to answer your question arm does have vector shift by scalar. If it does, it doesn't advertize them: make mddump grep '"v\?ashlv[0-9qhsdi]*3"' tmp-mddump.md (define_expand ("vashlv8qi3") (define_expand ("vashlv16qi3") (define_expand ("vashlv4hi3") (define_expand ("vashlv8hi3") (define_expand ("vashlv2si3") (define_expand ("vashlv4si3") Ditto for ashr and lshr instead of ashl.
[Bug tree-optimization/98855] New: [11 Regression] botan XTEA is 100% slower on znver2 since r11-4428-g4a369d199bf2f34e
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98855 Bug ID: 98855 Summary: [11 Regression] botan XTEA is 100% slower on znver2 since r11-4428-g4a369d199bf2f34e Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: marxin at gcc dot gnu.org Target Milestone: --- Since the revision the following is now slower: $ make clean && ./configure.py --cxxflags="-Ofast -march=znver2" && make -j16 && ./botan speed XTEA as seen here: https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=245.710.1&plot.1=171.710.1&; Algorithm is implemented here: src/lib/block/xtea/xtea.cpp
[Bug target/98849] [11 Regression] ICE in expand_shift_1, at expmed.c:2658 since g:7432f255b70811dafaf325d9403
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98849 --- Comment #8 from Jakub Jelinek --- Seems vec_init optab is supported if TARGET_NEON || TARGET_HAVE_MVE, so maybe guard the shift expander also on && (TARGET_NEON || TARGET_HAVE_MVE)? Or && !TARGET_REALLY_IWMMXT. Dunno if one can mix iwmmxt with neon or mve etc.
[Bug target/98849] [11 Regression] ICE in expand_shift_1, at expmed.c:2658 since g:7432f255b70811dafaf325d9403
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98849 --- Comment #7 from Christophe Lyon --- (In reply to ktkachov from comment #5) > Looks like after the refactoring to introduce MVE shifts (which doesn't ICE) > we need to make sure the optab is still disabled for iwmmxt? So that would mean that ARM_HAVE__ARITH shouldn't be defined for iwmmxt (only for shifts?) ?
[Bug target/98849] [11 Regression] ICE in expand_shift_1, at expmed.c:2658 since g:7432f255b70811dafaf325d9403
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98849 --- Comment #6 from Christophe Lyon --- I'm not familiar with iwmmxt, but the testcase in comment #2 is vectorized with: * -mcpu=cortex-a9 -mfpu=auto -mfloat-abi=hard (uses Neon FPU) * -mcpu=cortex-m55 -mfpu=auto -mfloat-abi=hard (uses MVE/Helium FPU) in both cases -mfloat-abi=hard is required. Using -mcpu=iwmmxt -mfpu=auto -mfloat-abi=hard fails because: cc1: error: '-mfloat-abi=hard': selected processor lacks an FPU so to answer your question arm does have vector shift by scalar. But the Neon/MVE patterns use a const_vector constraint (see mve_vshlq_ and vashl3 in vec-common.md and ashl3_iwmmxt in iwmmxt.md)
[Bug target/98849] [11 Regression] ICE in expand_shift_1, at expmed.c:2658 since g:7432f255b70811dafaf325d9403
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98849 --- Comment #5 from ktkachov at gcc dot gnu.org --- Looks like after the refactoring to introduce MVE shifts (which doesn't ICE) we need to make sure the optab is still disabled for iwmmxt?
[Bug c++/98531] [11 Regression] g++.dg/modules/xtreme-header-2_a.H etc. FAIL
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98531 --- Comment #8 from Nathan Sidwell --- On 1/27/21 8:30 AM, ro at CeBiTec dot Uni-Bielefeld.DE wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98531 > > --- Comment #7 from ro at CeBiTec dot Uni-Bielefeld.DE --- > Nathan, > > last night I've tried the patch you posted on both i386-pc-solaris2.11 > and sparc-sun-solaris2.11, with mixed results: > > * The new g++.dg/modules/pr98531_* testcases PASS. > > * However, there's a libstdc++ regression: > > +FAIL: 17_intro/headers/c++1998/all_attributes.cc (test for excess errors) > +FAIL: 17_intro/headers/c++2011/all_attributes.cc (test for excess errors) > +FAIL: 17_intro/headers/c++2014/all_attributes.cc (test for excess errors) > +FAIL: 17_intro/headers/c++2017/all_attributes.cc (test for excess errors) > > Excess errors: > /vol/gcc/src/hg/master/local/libstdc++-v3/libsupc++/cxxabi.h:129: error: > declaration of 'int __cxxabiv1::__cxa_atexit(void (*)(void*), void*, void*) > throw ()' has a different exception specifier Thanks, I'm finding this too -- thankful I didn't push the patch! This indicates a mismatch between the runtime library and the compiler's idea of it. > >i.e. 
> > In file included from > /vol/gcc/src/hg/master/local/libstdc++-v3/testsuite/17_intro/headers/c++1998/all_attributes.cc:40: > /vol/gcc/src/hg/master/local/libstdc++-v3/libsupc++/cxxabi.h:129: error: > declaration of 'int __cxxabiv1::__cxa_atexit(void (*)(void*), void*, void*) > throw ()' has a different exception specifier > In file included from > /var/gcc/regression/master/11.4-gcc/build/i386-pc-solaris2.11/libstdc++-v3/include/i386-pc-solaris2.11/bits/extc++.h:68, > from > /vol/gcc/src/hg/master/local/libstdc++-v3/testsuite/17_intro/headers/c++1998/all_attributes.cc:39: > /var/gcc/regression/master/11.4-gcc/build/i386-pc-solaris2.11/libstdc++-v3/include/ext/throw_allocator.h:371: > note: from previous declaration 'int __cxxabiv1::__cxa_atexit(void (*)(void*), > void*, void*)' > >where cxxabi.h has > > #ifdef _GLIBCXX_CDTOR_CALLABI >__cxa_atexit(void (_GLIBCXX_CDTOR_CALLABI *)(void*), void*, void*) > _GLIBCXX_NOTHROW; > #else >__cxa_atexit(void (*)(void*), void*, void*) _GLIBCXX_NOTHROW; > #endif > > * Besides, the ICE in the original testcases remains: > > /vol/gcc/src/hg/master/local/gcc/testsuite/g++.dg/modules/xtreme-header-2_a.H: > internal compiler error: in tree_node, at cp/module.cc:9137 > > >I'm uncertain if the patch was just meant as a preparatory step to fix >those or something else is amiss. thanks, I was going to revisit the original report to see if there were further issues. nathan
[Bug target/98849] [11 Regression] ICE in expand_shift_1, at expmed.c:2658 since g:7432f255b70811dafaf325d9403
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98849

Jakub Jelinek changed:

    What     |Removed |Added
    Priority |P3      |P1
[Bug libstdc++/66414] string::find ten times slower than strstr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66414 --- Comment #9 from Jonathan Wakely --- (In reply to AK from comment #8) > Should we consider this fixed? I think we can still do better, by using GNU memmem when it's available: https://gcc.gnu.org/pipermail/gcc-patches/2017-January/466460.html https://gcc.gnu.org/pipermail/gcc-patches/2017-January/466469.html https://gcc.gnu.org/pipermail/gcc-patches/2017-January/466471.html For now we should also use the new code in basic_string_view::find which is currently much slower.
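To make the memmem suggestion above concrete, here is a minimal sketch in C of a substring search built on GNU memmem. It is not the libstdc++ patch from the linked threads; the names find_with_memmem and NOT_FOUND are hypothetical, and it assumes the C library provides memmem (glibc does).

```c
#include <stddef.h>
#include <string.h>

/* memmem is a GNU extension.  It is declared explicitly here in case the
   feature-test macros in effect do not expose it; this assumes the C
   library actually provides the function.  */
void *memmem(const void *haystack, size_t haystack_len,
             const void *needle, size_t needle_len);

/* Hypothetical stand-in for std::string::npos.  */
#define NOT_FOUND ((size_t)-1)

/* Return the offset of the first occurrence of needle in hay,
   or NOT_FOUND if there is none.  */
static size_t find_with_memmem(const char *hay, size_t hay_len,
                               const char *needle, size_t needle_len)
{
    if (needle_len == 0)
        return 0;               /* an empty needle matches at offset 0 */
    if (needle_len > hay_len)
        return NOT_FOUND;       /* needle cannot fit */

    const void *hit = memmem(hay, hay_len, needle, needle_len);
    if (hit == NULL)
        return NOT_FOUND;
    return (size_t)((const char *)hit - hay);
}
```

Because both lengths are passed explicitly, the sketch also works on buffers with embedded NUL bytes, which strstr cannot handle.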
[Bug c++/98531] [11 Regression] g++.dg/modules/xtreme-header-2_a.H etc. FAIL
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98531

--- Comment #7 from ro at CeBiTec dot Uni-Bielefeld.DE ---
Nathan,

last night I tried the patch you posted on both i386-pc-solaris2.11
and sparc-sun-solaris2.11, with mixed results:

* The new g++.dg/modules/pr98531_* testcases PASS.

* However, there's a libstdc++ regression:

+FAIL: 17_intro/headers/c++1998/all_attributes.cc (test for excess errors)
+FAIL: 17_intro/headers/c++2011/all_attributes.cc (test for excess errors)
+FAIL: 17_intro/headers/c++2014/all_attributes.cc (test for excess errors)
+FAIL: 17_intro/headers/c++2017/all_attributes.cc (test for excess errors)

Excess errors:
/vol/gcc/src/hg/master/local/libstdc++-v3/libsupc++/cxxabi.h:129: error:
declaration of 'int __cxxabiv1::__cxa_atexit(void (*)(void*), void*, void*)
throw ()' has a different exception specifier

  i.e.

In file included from
/vol/gcc/src/hg/master/local/libstdc++-v3/testsuite/17_intro/headers/c++1998/all_attributes.cc:40:
/vol/gcc/src/hg/master/local/libstdc++-v3/libsupc++/cxxabi.h:129: error:
declaration of 'int __cxxabiv1::__cxa_atexit(void (*)(void*), void*, void*)
throw ()' has a different exception specifier
In file included from
/var/gcc/regression/master/11.4-gcc/build/i386-pc-solaris2.11/libstdc++-v3/include/i386-pc-solaris2.11/bits/extc++.h:68,
from
/vol/gcc/src/hg/master/local/libstdc++-v3/testsuite/17_intro/headers/c++1998/all_attributes.cc:39:
/var/gcc/regression/master/11.4-gcc/build/i386-pc-solaris2.11/libstdc++-v3/include/ext/throw_allocator.h:371:
note: from previous declaration 'int __cxxabiv1::__cxa_atexit(void (*)(void*),
void*, void*)'

  where cxxabi.h has

#ifdef _GLIBCXX_CDTOR_CALLABI
__cxa_atexit(void (_GLIBCXX_CDTOR_CALLABI *)(void*), void*, void*)
_GLIBCXX_NOTHROW;
#else
__cxa_atexit(void (*)(void*), void*, void*) _GLIBCXX_NOTHROW;
#endif

* Besides, the ICE in the original testcases remains:

/vol/gcc/src/hg/master/local/gcc/testsuite/g++.dg/modules/xtreme-header-2_a.H:
internal compiler error: in tree_node, at cp/module.cc:9137

  I'm uncertain if the patch was just meant as a preparatory step to fix
  those or something else is amiss.
[Bug target/98849] [11 Regression] ICE in expand_shift_1, at expmed.c:2658 since g:7432f255b70811dafaf325d9403
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98849

Jakub Jelinek changed:

    What |Removed |Added
    CC   |        |ktkachov at gcc dot gnu.org,
         |        |rearnsha at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek ---
Ah, so on powerpc64le this works fine, expand_binop has:

  /* If this is a vector shift by a scalar, see if we can do a vector
     shift by a vector.  If so, broadcast the scalar into a vector.  */
  if (mclass == MODE_VECTOR_INT)
    {
      optab otheroptab = unknown_optab;

      if (binoptab == ashl_optab)
        otheroptab = vashl_optab;
      else if (binoptab == ashr_optab)
        otheroptab = vashr_optab;
      else if (binoptab == lshr_optab)
        otheroptab = vlshr_optab;
      else if (binoptab == rotl_optab)
        otheroptab = vrotl_optab;
      else if (binoptab == rotr_optab)
        otheroptab = vrotr_optab;

      if (otheroptab
          && (icode = optab_handler (otheroptab, mode)) != CODE_FOR_nothing)
        {
          /* The scalar may have been extended to be too wide.  Truncate
             it back to the proper size to fit in the broadcast vector.  */
          scalar_mode inner_mode = GET_MODE_INNER (mode);
          if (!CONST_INT_P (op1)
              && (GET_MODE_BITSIZE (as_a (GET_MODE (op1)))
                  > GET_MODE_BITSIZE (inner_mode)))
            op1 = force_reg (inner_mode,
                             simplify_gen_unary (TRUNCATE, inner_mode, op1,
                                                 GET_MODE (op1)));
          rtx vop1 = expand_vector_broadcast (mode, op1);
          if (vop1)
            {
              temp = expand_binop_directly (icode, mode, otheroptab, op0, vop1,
                                            target, unsignedp, methods, last);
              if (temp)
                return temp;
            }
        }
    }

code for this.  It doesn't work in the ARM case, because the backend
supports neither vec_duplicate_optab nor vec_init_optab for the mode.
I'm declaring this a backend bug: it shouldn't advertise such vector
shifts in configurations in which it can't even init such vectors.
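The broadcast that expand_binop performs can be illustrated at the source level. This is only a sketch using the GNU C vector extensions (so it assumes GCC or a compatible compiler), not the RTL expansion itself; the type v4si and the helper names are made up for the illustration.

```c
/* Four 32-bit lanes; relies on the GNU vector_size extension.  */
typedef int v4si __attribute__((vector_size(16)));

/* "vector << scalar" form: the compiler treats the scalar count as if it
   had been splat into every lane.  */
static v4si shift_by_scalar(v4si v, int count)
{
    return v << count;
}

/* "vector << vector" form: the scalar is broadcast by hand first, which
   is the transformation the expand_binop code quoted above attempts.  */
static v4si shift_by_broadcast(v4si v, int count)
{
    v4si counts = { count, count, count, count };
    return v << counts;
}

/* Lane-by-lane comparison of two vectors; returns 1 if all lanes match.  */
static int same_lanes(v4si a, v4si b)
{
    for (int i = 0; i < 4; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}
```

Both functions should produce identical lanes for any in-range shift count. A backend that advertises only the vector-by-vector form therefore needs a way to build the broadcast vector, which is the missing piece in the ARM configuration discussed above.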
[Bug c/98852] [11 Regression] Conditional expression wrongly rejected for arm_neon.h vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98852

Richard Biener changed:

    What             |Removed |Added
    Target Milestone |---     |11.0
[Bug target/98853] [11 Regression] wrong use of bfxil at -O1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98853

Richard Biener changed:

    What             |Removed |Added
    Target Milestone |---     |11.0
[Bug tree-optimization/98854] [11 Regression] cray benchmark is about 15% slower since r11-4428-g4a369d199bf2f34e
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98854

Richard Biener changed:

    What             |Removed                       |Added
    Ever confirmed   |0                             |1
    Assignee         |unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
    Status           |UNCONFIRMED                   |ASSIGNED
    Last reconfirmed |                              |2021-01-27
    Target Milestone |---                           |11.0

--- Comment #2 from Richard Biener ---
I will have a look.
[Bug rtl-optimization/80960] [8/9/10/11 Regression] Huge memory use when compiling a very large test case
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80960

--- Comment #24 from Richard Biener ---
And we allocate

plus                            66M        1606M

66 million PLUS RTXen via

explow.c:200 (plus_constant)    0 : 0.0%   1596M: 92.0%   0 : 0.0%   0 : 0.0%   66M

called by DSE check_mem_read_rtx and record_store.  Ideally we'd not need
any of that via an interface change to canon_true_dependence and friends
(pass in an optional offset).  Most of the time the plus RTX is already
present in the original MEM.  Like

Breakpoint 6, record_store (body=0x742caa98, bb_info=0x3ea3b60)
    at /home/rguenther/src/gcc2/gcc/dse.c:1529
1529        mem_addr = plus_constant (get_address_mode (mem), mem_addr, offset);
(reg/f:DI 19 frame)
$14 = void
(gdb) p debug_rtx (mem)
(mem/c:DI (plus:DI (reg/f:DI 19 frame)
        (const_int -440 [0xfe48])) [1 MEM[(struct __st_parameter_dt *)_13].format_len+0 S8 A64])
$15 = void
(gdb) p offset
$16 = {> = {coeffs = {-440}}, }

trivially pattern matching existing PLUS like

      if (MEM_P (mem)
          && GET_CODE (XEXP (mem, 0)) == PLUS
          && XEXP (XEXP (mem, 0), 0) == mem_addr
          && CONST_INT_P (XEXP (XEXP (mem, 0), 1))
          && known_eq (offset, INTVAL (XEXP (XEXP (mem, 0), 1))))
        mem_addr = XEXP (mem, 0);
      else
        mem_addr = plus_constant (get_address_mode (mem), mem_addr, offset);

doesn't help much.  Most cases seem to be built over (value:...) RTXen,
those we could ggc_free I presume.  Doing that in check_mem_read_rtx
doesn't help though.
[Bug target/98849] [11 Regression] ICE in expand_shift_1, at expmed.c:2658 since g:7432f255b70811dafaf325d9403
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98849

--- Comment #3 from Jakub Jelinek ---
For #c2 I've tried:
--- gcc/tree-vect-generic.c.jj 2021-01-04 10:25:38.289239984 +0100
+++ gcc/tree-vect-generic.c 2021-01-27 13:53:28.457752505 +0100
@@ -2147,16 +2147,21 @@ expand_vector_operations_1 (gimple_stmt_
       || code == LROTATE_EXPR
       || code == RROTATE_EXPR)
     {
-      optab opv;
+      optab opv = optab_for_tree_code (code, type, optab_vector);

       /* Check whether we have vector {x,x,x,x} where x
          could be a scalar variable or a constant.  Transform
-         vector {x,x,x,x} ==> vector scalar.  */
+         vector {x,x,x,x} ==> vector scalar, unless
+         the backend only supports vector by vector and not
+         vecot by scalar.  */
       if (VECTOR_INTEGER_TYPE_P (TREE_TYPE (rhs2)))
         {
           tree first;

-          if ((first = ssa_uniform_vector_p (rhs2)) != NULL_TREE)
+          op = optab_for_tree_code (code, type, optab_scalar);
+          if ((first = ssa_uniform_vector_p (rhs2)) != NULL_TREE
+              && (get_compute_type (code, opv, type) != type
+                  || get_compute_type (code, op, type) == type))
             {
               gimple_assign_set_rhs2 (stmt, first);
               update_stmt (stmt);
@@ -2164,7 +2169,6 @@ expand_vector_operations_1 (gimple_stmt_
             }
         }

-      opv = optab_for_tree_code (code, type, optab_vector);
       if (VECTOR_INTEGER_TYPE_P (TREE_TYPE (rhs2)))
         op = opv;
       else

but that doesn't really help, because while veclower21 doesn't undo what
the vectorizer carefully did, match.pd during fre5 breaks it again:

/* Prefer vector1 << scalar to vector1 << vector2
   if vector2 is uniform.  */
(for vec (VECTOR_CST CONSTRUCTOR)
 (simplify
  (shiftrotate @0 vec@1)
  (with { tree tem = uniform_vector_p (@1); }
   (if (tem)
    (shiftrotate @0 { tem; })))))

So, does ARM really only have vector shifts and not scalar?
Though, PowerPC seems to have that too, I'll check out what it does on
these testcases.
[Bug c++/98843] Building simple c++ modules example fails but successful with -save-temps
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98843

--- Comment #2 from Nathan Sidwell ---
Thanks Gary, I expect to be able to reproduce the iostream.ii myself, and
particularly as (the lack of) -save-temps seems to be significant, I'll
probably need to.
[Bug tree-optimization/98854] [11 Regression] cray benchmark is about 15% slower since r11-4428-g4a369d199bf2f34e
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98854 --- Comment #1 from Martin Liška --- One can see it here: https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=245.639.0&plot.1=171.639.0&;
[Bug tree-optimization/98854] New: [11 Regression] cray benchmark is about 15% slower since r11-4428-g4a369d199bf2f34e
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98854

            Bug ID: 98854
           Summary: [11 Regression] cray benchmark is about 15% slower since
                    r11-4428-g4a369d199bf2f34e
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: marxin at gcc dot gnu.org
                CC: rguenth at gcc dot gnu.org
  Target Milestone: ---

Since the revision, the following is slower on znver2:

$ make clean && m CFLAGS="-Ofast -march=znver2 -g" && cat sphfract | ./c-ray-mt -o /dev/null
c-ray-mt v1.1
Rendering took: 1 seconds (1798 milliseconds)

while GCC 10 has:

c-ray-mt v1.1
Rendering took: 1 seconds (1585 milliseconds)
[Bug tree-optimization/80198] [8/9/10/11 Regression] does not vectorize generic inplace integer operation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80198

--- Comment #21 from rguenther at suse dot de ---
On Wed, 27 Jan 2021, rsandifo at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80198
>
> --- Comment #20 from rsandifo at gcc dot gnu.org ---
> (In reply to Richard Biener from comment #19)
> > So I think when you consider
> >
> > void __attribute__((noinline)) fun(int * a, int * b, int c)
> > {
> >   int i;
> >   for (i=0; i < 256; i++) {
> >     a[i] = b[i] | c;
> >   }
> > }
> >
> > we can improve the versioning condition to allow a dependence distance
> > of zero.
> This one was fixed by r10-4803.  E.g. for aarch64 we now have:
>
>         add     x3, x1, 4
>         sub     x3, x0, x3
>         cmp     x3, 8
>         bls     .L5

Ah, yeah - I failed to decipher the generated check:

  _7 = b_12 + 4;
  _22 = a_10 - _7;
  _23 = (sizetype) _22;
  if (_23 > 8)

the difference is -4U and thus > 8 when a == b.

> > Likewise with
> >
> > void __attribute__((noipa)) generic(int * a, int * b, int c)
> > {
> >   int i;
> >   a = __builtin_assume_aligned (a, 16);
> >   b = __builtin_assume_aligned (b, 16);
> >   for (i=0; i < 256; i++) {
> >     a[i] = b[i] | c;
> >   }
> > }
> >
> > we fail to realize no versioning check is required - the distance is
> > either zero or a multiple of 16.
> >
> > Richard - ISTR you added some alignment considerations to the alias
> > versioning code, but it doesn't seem to help?
> I don't remember adding anything for that, but yeah, I agree it looks
> like we need it.
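The unsigned wraparound in the generated check is easy to misread, so here is a small sketch in C that mirrors it on plain byte addresses. The function name alias_check_passes is hypothetical, and the use of uintptr_t in place of the compiler's sizetype is an assumption for illustration.

```c
#include <stdint.h>

/* Mirror of the generated versioning check, on byte addresses.
   Returns nonzero when the vectorized loop version would be chosen.  */
static int alias_check_passes(uintptr_t a_addr, uintptr_t b_addr)
{
    uintptr_t b_plus_4 = b_addr + 4;      /* _7  = b_12 + 4;                 */
    uintptr_t diff = a_addr - b_plus_4;   /* _22 and _23: wraps modulo 2^N   */
    return diff > 8;                      /* if (_23 > 8)                    */
}
```

With a == b the subtraction yields -4, which wraps to a very large unsigned value, so the check passes and a dependence distance of zero is accepted. It only fails when a lies 4 to 12 bytes (one to three ints) past b, which is exactly the case where a store to a[i] would overwrite elements of b that the same vector iteration has yet to read.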
[Bug target/98853] New: [11 Regression] wrong use of bfxil at -O1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98853

            Bug ID: 98853
           Summary: [11 Regression] wrong use of bfxil at -O1
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Keywords: wrong-code
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: zsojka at seznam dot cz
  Target Milestone: ---
              Host: x86_64-pc-linux-gnu
            Target: aarch64-unknown-linux-gnu

Created attachment 50068
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50068&action=edit
reduced testcase

Output:
$ aarch64-unknown-linux-gnu-gcc -O testcase.c -static
$ ./a.out
qemu: uncaught target signal 6 (Aborted) - core dumped
Aborted

$ aarch64-unknown-linux-gnu-gcc -v
Using built-in specs.
COLLECT_GCC=/repo/gcc-trunk/binary-latest-aarch64/bin/aarch64-unknown-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/repo/gcc-trunk/binary-trunk-r11-6925-20210127102218-g6cf43433750-checking-yes-rtl-df-extra-aarch64/bin/../libexec/gcc/aarch64-unknown-linux-gnu/11.0.0/lto-wrapper
Target: aarch64-unknown-linux-gnu
Configured with: /repo/gcc-trunk//configure --enable-languages=c,c++
--enable-valgrind-annotations --disable-nls --enable-checking=yes,rtl,df,extra
--with-cloog --with-ppl --with-isl --with-sysroot=/usr/aarch64-unknown-linux-gnu
--build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu
--target=aarch64-unknown-linux-gnu
--with-ld=/usr/bin/aarch64-unknown-linux-gnu-ld
--with-as=/usr/bin/aarch64-unknown-linux-gnu-as --disable-libstdcxx-pch
--prefix=/repo/gcc-trunk//binary-trunk-r11-6925-20210127102218-g6cf43433750-checking-yes-rtl-df-extra-aarch64
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.0.0 20210127 (experimental) (GCC)
[Bug rtl-optimization/80960] [8/9/10/11 Regression] Huge memory use when compiling a very large test case
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80960

Richard Biener changed:

    What          |Removed |Added
    Known to work |        |4.3.4
    Known to fail |        |4.8.5

--- Comment #23 from Richard Biener ---
So now I see

> /usr/bin/time gfortran-4.3 t.f90 -fdefault-integer-8 -O2 -ftime-report
 combiner              :   0.25 ( 2%) usr   0.00 ( 0%) sys   0.24 ( 2%) wall    9947 kB ( 5%) ggc
 TOTAL                 :  15.43             0.21            15.65              220667 kB
15.59user 0.24system 0:15.84elapsed 99%CPU (0avgtext+0avgdata 607492maxresident)k
0inputs+0outputs (0major+164981minor)pagefaults 0swaps

> /usr/bin/time gfortran-4.8 t.f90 -fdefault-integer-8 -O2 -ftime-report
 combiner              :  90.22 (48%) usr   1.07 (63%) sys  91.33 (48%) wall 1757344 kB (88%) ggc
 TOTAL                 : 188.29             1.70           190.04             2000994 kB
188.43user 1.73system 3:10.21elapsed 99%CPU (0avgtext+0avgdata 6523136maxresident)k
0inputs+0outputs (0major+1727565minor)pagefaults 0swaps

> /usr/bin/time gfortran-7 t.f90 -fdefault-integer-8 -O2 -fno-checking
> -ftime-report
 combiner              :  67.18 (64%) usr   0.56 (60%) sys  67.76 (64%) wall 2701121 kB (60%) ggc
 TOTAL                 : 105.40             0.93           106.36             4530486 kB
105.54user 0.99system 1:46.58elapsed 99%CPU (0avgtext+0avgdata 3297696maxresident)k
48248inputs+0outputs (7major+835050minor)pagefaults 0swaps

> /usr/bin/time gfortran-10 t.f90 -fdefault-integer-8 -O2 -fno-checking
> -ftime-report
 combiner              :   0.24 ( 0%)   0.00 ( 0%)   0.22 ( 0%)   10376 kB ( 1%)
 TOTAL                 :  52.02         0.49        52.52       1876905 kB
52.16user 0.52system 0:52.71elapsed 99%CPU (0avgtext+0avgdata 1831392maxresident)k
55032inputs+0outputs (8major+539965minor)pagefaults 0swaps

(that combine number prevails on trunk as well, I can't spot any code that
disables combine on large BBs so not sure what goes on here)

At least clearly GCC 4.8.5 is bad as well and there's clear progression
on both memory use and compile-time, still not up to the level of GCC 4.3.

Interestingly memory-wise it all points to RTL DSE (GCC 10), likely because
of DF.  Eventually post-reload we can simplify some things...

 dead store elim2      :   6.90 ( 12%)   0.20 ( 27%)   7.12 ( 12%) 1641076 kB ( 87%)
[Bug c/98852] New: [11 Regression] Conditional expression wrongly rejected for arm_neon.h vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98852

            Bug ID: 98852
           Summary: [11 Regression] Conditional expression wrongly rejected
                    for arm_neon.h vectors
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rsandifo at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64*-*-*

#include <arm_neon.h>

uint8x16_t foo (int c, uint8x16_t x, uint8x16_t y)
{
  return c ? x + 1 : y;
}

is wrongly rejected for C, but not C++.  This is extracted from comment 8
of PR96377.
[Bug tree-optimization/80198] [8/9/10/11 Regression] does not vectorize generic inplace integer operation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80198

--- Comment #20 from rsandifo at gcc dot gnu.org ---
(In reply to Richard Biener from comment #19)
> So I think when you consider
>
> void __attribute__((noinline)) fun(int * a, int * b, int c)
> {
>   int i;
>   for (i=0; i < 256; i++) {
>     a[i] = b[i] | c;
>   }
> }
>
> we can improve the versioning condition to allow a dependence distance
> of zero.
This one was fixed by r10-4803.  E.g. for aarch64 we now have:

        add     x3, x1, 4
        sub     x3, x0, x3
        cmp     x3, 8
        bls     .L5

> Likewise with
>
> void __attribute__((noipa)) generic(int * a, int * b, int c)
> {
>   int i;
>   a = __builtin_assume_aligned (a, 16);
>   b = __builtin_assume_aligned (b, 16);
>   for (i=0; i < 256; i++) {
>     a[i] = b[i] | c;
>   }
> }
>
> we fail to realize no versioning check is required - the distance is
> either zero or a multiple of 16.
>
> Richard - ISTR you added some alignment considerations to the alias
> versioning code, but it doesn't seem to help?
I don't remember adding anything for that, but yeah, I agree it looks
like we need it.
[Bug tree-optimization/80198] [8/9/10/11 Regression] does not vectorize generic inplace integer operation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80198

Richard Biener changed:

    What             |Removed             |Added
    CC               |                    |rsandifo at gcc dot gnu.org
    Last reconfirmed |2017-03-27 00:00:00 |2021-1-27

--- Comment #19 from Richard Biener ---
So I think when you consider

void __attribute__((noinline)) fun(int * a, int * b, int c)
{
  int i;
  for (i=0; i < 256; i++) {
    a[i] = b[i] | c;
  }
}

we can improve the versioning condition to allow a dependence distance
of zero.  Likewise with

void __attribute__((noipa)) generic(int * a, int * b, int c)
{
  int i;
  a = __builtin_assume_aligned (a, 16);
  b = __builtin_assume_aligned (b, 16);
  for (i=0; i < 256; i++) {
    a[i] = b[i] | c;
  }
}

we fail to realize no versioning check is required - the distance is
either zero or a multiple of 16.

Richard - ISTR you added some alignment considerations to the alias
versioning code, but it doesn't seem to help?
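The alignment argument can be checked with a little address arithmetic. The following is only a sketch of the reasoning, not vectorizer code: it assumes 16-byte vectors (the VECTOR_BYTES constant is an assumption) and a loop that walks both arrays in increasing order, and all function names are hypothetical.

```c
#include <stdint.h>

/* Assumed vector width in bytes; matches the 16-byte alignment promised
   by __builtin_assume_aligned in the example above.  */
#define VECTOR_BYTES 16u

static int is_aligned(uintptr_t addr)
{
    return addr % VECTOR_BYTES == 0;
}

/* A distance of zero (pure in-place update) or of at least one full
   vector is treated as safe for a 16-byte vector loop.  */
static int distance_is_safe(uintptr_t a_addr, uintptr_t b_addr)
{
    uintptr_t dist = a_addr > b_addr ? a_addr - b_addr : b_addr - a_addr;
    return dist == 0 || dist >= VECTOR_BYTES;
}

/* Try every pair of 16-byte-aligned addresses in a small window and
   report whether all of them are safe.  Returns 1 if so.  */
static int all_aligned_pairs_safe(uintptr_t base, unsigned slots)
{
    for (unsigned i = 0; i < slots; i++)
        for (unsigned j = 0; j < slots; j++) {
            uintptr_t a = base + (uintptr_t)i * VECTOR_BYTES;
            uintptr_t b = base + (uintptr_t)j * VECTOR_BYTES;
            if (!is_aligned(a) || !is_aligned(b) || !distance_is_safe(a, b))
                return 0;
        }
    return 1;
}
```

Two 16-byte-aligned addresses always differ by a multiple of 16, so the unsafe distances of 4, 8 and 12 bytes cannot occur. That is why a runtime versioning check would be redundant under these assumptions.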
[Bug target/98849] [11 Regression] ICE in expand_shift_1, at expmed.c:2658 since g:7432f255b70811dafaf325d9403
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98849

--- Comment #2 from Jakub Jelinek ---
int a[1024], b[1024];

void
foo (void)
{
  for (int i = 0; i < 1024; i++)
    a[i] = b[i] << 3;
}

void
bar (int x)
{
  for (int i = 0; i < 1024; i++)
    a[i] = b[i] << x;
}

ICEs with -O3 -mcpu=iwmmxt too.
Here the vectorizer understands the target has vector x vector shift and
not vector x scalar, so we get:
  vect_cst__13 = { 3, 3 };
...
  vect__2.7_14 = vect__1.6_8 << vect_cst__13;
but veclower carelessly undoes that:
  vect__2.7_14 = vect__1.6_8 << 3;