[Bug target/98399] x86: Awful code generation for shifting vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98399

Bug 98399 depends on bug 98434, which changed state.

Bug 98434 Summary: [AVX512] Missing expander for vashl, vlshr, vashr{v32hi,v16hi,v4di,v8di}
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98434

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED
[Bug other/98375] [meta bug] GCC 12 pending patches
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98375

Bug 98375 depends on bug 98434, which changed state.

Bug 98434 Summary: [AVX512] Missing expander for vashl, vlshr, vashr{v32hi,v16hi,v4di,v8di}
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98434

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED
[Bug target/98434] [AVX512] Missing expander for vashl, vlshr, vashr{v32hi,v16hi,v4di,v8di}
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98434

Hongtao.liu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #4 from Hongtao.liu ---
Fixed in GCC12.
[Bug tree-optimization/101187] New: enhancement for vector shift with constant bigger than element precision
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101187

            Bug ID: 101187
           Summary: enhancement for vector shift with constant bigger than element precision
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: crazylht at gmail dot com
                CC: jakub at redhat dot com, rguenth at gcc dot gnu.org
  Target Milestone: ---
              Host: x86_64-pc-linux-gnu

using T = unsigned char; // or ushort, or uint
using V [[gnu::vector_size(8)]] = T;
V f(V x) { return x >> 8 * sizeof(T); }

r12-1764 regresses pr91838.C when built with the extra option -march=cascadelake.

cat pr91838.C

/* { dg-do compile { target c++11 } } */
/* { dg-additional-options "-O2 -Wno-psabi -w" } */
/* { dg-additional-options "-masm=att" { target i?86-*-* x86_64-*-* } } */

using T = unsigned char; // or ushort
using V [[gnu::vector_size(8)]] = T;
V f(V x) { return x >> 8 * sizeof(T); }

/* { dg-final { scan-assembler {pxor\s+%xmm0,\s+%xmm0} { target { { i?86-*-* x86_64-*-* } && lp64 } } } } */

Without vlshr_optab, the vector operation is lowered to scalar operations and optimized by pass_ccp4 in GIMPLE. With vlshr_optab, it is not optimized there and is left to the backend, and in the backend we don't optimize it (just like what we did in ix86_expand_vec_shift_qihi_constant).

As suggested by Richi, we may need to add a GIMPLE simplification pattern to handle this.
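For illustration only, the scalar lowering described in this report can be sketched as follows. This is not GCC code: `lshr_by_width` is a hypothetical name, and `std::array` stands in for the `gnu::vector_size` type so the sketch stays portable. It shows why the result can be folded to zero for `unsigned char` and `unsigned short`: each element is promoted to `int` before the shift, so shifting right by the element width leaves nothing. The sketch assumes `int` is wider than `T`, so it does not cover the `uint` variant.

```cpp
#include <array>
#include <cstddef>

// Element-wise logical right shift by the full width of T.
// Each x[i] is promoted to int first, so the shift count is in range
// and every resulting element is 0.
template <typename T, std::size_t N>
std::array<T, N> lshr_by_width(const std::array<T, N>& x) {
    static_assert(sizeof(T) < sizeof(int),
                  "sketch relies on integer promotion to int");
    std::array<T, N> r{};
    for (std::size_t i = 0; i < N; ++i) {
        r[i] = static_cast<T>(x[i] >> (8 * sizeof(T)));
    }
    return r;
}
```

Since every element is known to be zero regardless of the input, a simplification pattern could in principle replace the whole shift with a zero vector, which is what the `pxor` the test scans for amounts to.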
[Bug target/98434] [AVX512] Missing expander for vashl, vlshr, vashr{v32hi,v16hi,v4di,v8di}
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98434 --- Comment #3 from CVS Commits --- The master branch has been updated by hongtao Liu : https://gcc.gnu.org/g:3bd86940c428de9dde53e41265fb1435ed236f5e commit r12-1764-g3bd86940c428de9dde53e41265fb1435ed236f5e Author: liuhongt Date: Tue Jan 26 16:29:32 2021 +0800 i386: Add vashlm3/vashrm3/vlshrm3 to enable vectorization of vector shift vector. [PR98434] Add expanders for vashl, vlshr, vashr and vashr. Besides there's some assumption in expand_mult_const that mul and add must be available at the same time, but for i386, addv8qi is restricted under TARGET_64BIT, but mulv8qi not, that could cause ICE. So restrict mulv8qi and shiftv8qi under TARGET_64BIT. gcc/ChangeLog: PR target/98434 * config/i386/i386-expand.c (ix86_expand_vec_interleave): Adjust comments for ix86_expand_vecop_qihi2. (ix86_expand_vecmul_qihi): Renamed to .. (ix86_expand_vecop_qihi2): Adjust function prototype to support shift operation, add static to definition. (ix86_expand_vec_shift_qihi_constant): Add static to definition. (ix86_expand_vecop_qihi): Call ix86_expand_vecop_qihi2 and ix86_expand_vec_shift_qihi_constant. * config/i386/i386-protos.h (ix86_expand_vecmul_qihi): Deleted. (ix86_expand_vec_shift_qihi_constant): Deleted. * config/i386/sse.md (VI12_256_512_AVX512VL): New mode iterator. (mulv8qi3): Call ix86_expand_vecop_qihi directly, add condition TARGET_64BIT. (mul3): Ditto. (3): Ditto. (vlshr3): Extend to support avx512 vlshr. (v3): New expander for vashr/vlshr/vashl. (vv8qi3): Ditto. (vashrv8hi3): Renamed to .. (vashr3): And extend to support V16QImode for avx512. (vashrv16qi3): Deleted. (vashrv2di3): Extend expander to support avx512 instruction. gcc/testsuite/ChangeLog: PR target/98434 * gcc.target/i386/pr98434-1.c: New test. * gcc.target/i386/pr98434-2.c: New test. * gcc.target/i386/avx512vl-pr95488-1.c: Adjust testcase.
[Bug tree-optimization/101186] predictable comparison of integer variables not folded
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101186

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
           Keywords|                            |missed-optimization
[Bug tree-optimization/101186] New: predictable comparison of integer variables not folded
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101186

            Bug ID: 101186
           Summary: predictable comparison of integer variables not folded
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: dizhao at os dot amperecomputing.com
  Target Milestone: ---

GCC fails to remove dead code in the following cases:

#include <stdio.h>

void f (unsigned int a, unsigned int b, unsigned int c)
// if a,b,c are signed, VRP can remove dead code
{
  if (a == b)
    {
      printf ("a");
      if (c < a)
        {
          printf ("b");
          if (c >= b)
            printf ("Unreachable!");
        }
    }
}

void g (int a, int b, int x, int y)
{
  int c = y;
  if (a != 0)
    c = x;
  while (b < 1000) // without this loop, jump thread & VRP can remove dead code
    {
      if (a != 0)
        {
          if (c > x)
            printf ("Unreachable!");
        }
      else
        printf ("a\n");
      b++;
    }
}
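To make the claim about f() concrete: once a == b and c < a hold, c < b follows, so c >= b cannot be true. The sketch below is a brute-force check over a small range rather than a proof, and `count_reachable` is a hypothetical helper name, not part of the report.

```cpp
// Counts how many (a, b, c) triples below `limit` would reach the innermost
// printf of f() in the report, i.e. satisfy all three nested conditions.
unsigned count_reachable(unsigned limit) {
    unsigned hits = 0;
    for (unsigned a = 0; a < limit; ++a)
        for (unsigned b = 0; b < limit; ++b)
            for (unsigned c = 0; c < limit; ++c)
                if (a == b && c < a && c >= b)
                    ++hits;
    return hits;
}
```

The count stays at zero for any limit, which is the property the optimizer would need to derive from the equivalence a == b.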
[Bug target/99488] dwz: /usr/lib/gcc/mips64el-linux-gnuabi64/11/go1: Found two copies of .debug_line_str section
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99488

--- Comment #18 from YunQiang Su ---
(In reply to Andrew Pinski from comment #15)
> (In reply to YunQiang Su from comment #14)
> > The problem seems to be due to some problem with LTO.
>
> So if I understand correctly, this binutils patch fixes the issue? If so,
> please close this bug as moved, open up a binutils bug, and submit the
> patch there.

Yes, it should be the fix. However, it concerns the MIPS psABI and the DWARF spec, so we still need some comments on this solution.
[Bug target/99488] dwz: /usr/lib/gcc/mips64el-linux-gnuabi64/11/go1: Found two copies of .debug_line_str section
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99488 --- Comment #17 from YunQiang Su --- (In reply to Jakub Jelinek from comment #16) > Are you sure about the .. in one of the zdebug section names? It is a typo.
[Bug target/101185] pr96814.c failed after r12-1669 on non-avx512 platform
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101185

--- Comment #3 from Hongtao.liu ---
(In reply to Hongtao.liu from comment #1)
> Alloc order is just another kind of cost which can be compensated by
> increasing cost of mask->integer and integer->mask.
>
> With the patch below, pr96814 wouldn't generate any mask instructions except
> for
>
> kmovd %eax, %k1
> vpcmpeqd %ymm1, %ymm1, %ymm1
> vmovdqu8 %ymm1, %ymm0{%k1}{z}
>
> which is what we want.
>
>
> modified gcc/config/i386/i386.md
> @@ -1335,7 +1335,7 @@
>  (define_insn "*cmp_ccz_1"
>    [(set (reg FLAGS_REG)
>      (compare (match_operand:SWI1248_AVX512BWDQ_64 0
> -      "nonimmediate_operand" ",?m,$k")
> +      "nonimmediate_operand" ",?m,*k")
>        (match_operand:SWI1248_AVX512BWDQ_64 1 "const0_operand")))]
>    "TARGET_AVX512F && ix86_match_ccmode (insn, CCZmode)"
>    "@
> modified gcc/config/i386/x86-tune-costs.h
> @@ -2768,7 +2768,7 @@ struct processor_costs intel_cost = {
>    {6, 6, 6, 6, 6}, /* cost of storing SSE registers
>                        in 32,64,128,256 and 512-bit */
>    4, 4, /* SSE->integer and integer->SSE moves */
> -  4, 4, /* mask->integer and integer->mask moves */
> +  6, 6, /* mask->integer and integer->mask moves */

I changed intel_cost just to validate that 1 unit more cost is also enough for -mtune=intel to prevent generation of mask instructions.
[Bug tree-optimization/101173] [9/10/11/12 Regression] wrong code at -O3 on x86_64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101173

--- Comment #5 from bin cheng ---
(In reply to Richard Biener from comment #3)
> So we're exchanging the inner two loops
>
>   a[1][3] = 8;
>   for (int b = 1; b <= 5; b++)
>     for (int d = 0; d <= 5; d++)
>       for (c = 0; c <= 5; c++)
>         a[b][c] = a[b][c + 2] & 216;
>
> to
>
>   a[1][3] = 8;
>   for (int b = 1; b <= 5; b++)
>     for (c = 0; c <= 5; c++)
>       for (int d = 0; d <= 5; d++)
>         a[b][c] = a[b][c + 2] & 216;
>
> but that looks wrong from a dependence analysis perspective. We have
>
> (compute_affine_dependence
>   ref_a: a[b_33][_1], stmt_a: _2 = a[b_33][_1];
>   ref_b: a[b_33][c.3_32], stmt_b: a[b_33][c.3_32] = _3;
> (analyze_overlapping_iterations
>   (chrec_a = {2, +, 1}_5)
>   (chrec_b = {0, +, 1}_5)
> (analyze_siv_subscript
> (analyze_subscript_affine_affine
>   (overlaps_a = [0 + 1 * x_1])
>   (overlaps_b = [2 + 1 * x_1]))
> )
>   (overlap_iterations_a = [0 + 1 * x_1])
>   (overlap_iterations_b = [2 + 1 * x_1]))
> (analyze_overlapping_iterations
>   (chrec_a = {1, +, 1}_1)
>   (chrec_b = {1, +, 1}_1)
>   (overlap_iterations_a = [0])
>   (overlap_iterations_b = [0]))
> (analyze_overlapping_iterations
>   (chrec_a = {0, +, 1}_5)
>   (chrec_b = {2, +, 1}_5)
> (analyze_siv_subscript
> (analyze_subscript_affine_affine
>   (overlaps_a = [2 + 1 * x_1])
>   (overlaps_b = [0 + 1 * x_1]))
> )
>   (overlap_iterations_a = [2 + 1 * x_1])
>   (overlap_iterations_b = [0 + 1 * x_1]))
> (analyze_overlapping_iterations
>   (chrec_a = {1, +, 1}_1)
>   (chrec_b = {1, +, 1}_1)
>   (overlap_iterations_a = [0])
>   (overlap_iterations_b = [0]))
> (build_classic_dist_vector
>   dist_vector = ( 0 0 2
>   )
> )
> )
>
> I don't see anything wrong with that at a first glance so the bug must be in
> tree_loop_interchange::valid_data_dependences it checks
>
>       /* Be conservative, skip case if either direction at i_idx/o_idx
>          levels is not '=' or '<'.  */
>       if (dist_vect[i_idx] < 0 || dist_vect[o_idx] < 0)
>         return false;
>
> dist_vect is [0 0 2], i_idx 2 and o_idx 1 but I think that dist_vect[o_idx]
> should exclude zero, thus
>
> diff --git a/gcc/gimple-loop-interchange.cc b/gcc/gimple-loop-interchange.cc
> index f45b9364644..265e36c48d4 100644
> --- a/gcc/gimple-loop-interchange.cc
> +++ b/gcc/gimple-loop-interchange.cc
> @@ -1043,8 +1043,8 @@ tree_loop_interchange::valid_data_dependences
> (unsigned i_idx, unsigned o_idx,
>             continue;
>
>           /* Be conservative, skip case if either direction at i_idx/o_idx
> -            levels is not '=' or '<'.  */
> -         if (dist_vect[i_idx] < 0 || dist_vect[o_idx] < 0)
> +            levels is not '=' (for the inner loop) or '<'.  */
> +         if (dist_vect[i_idx] < 0 || dist_vect[o_idx] <= 0)
>             return false;
>         }
>      }
>
> Bin - does this analysis look sound?

Hi Richard,
Thanks very much for helping on this. Sorry, I need a bit more time to answer this question. Thanks again.
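That the two loop orders quoted above are not equivalent can be checked directly. The following is a minimal sketch, not the original testcase: the array dimensions (6 by 8, zero-initialized) and the helper name `run_nest` are assumptions, since the report's full source is not shown here.

```cpp
// Runs the loop nest on a fresh array and returns a[1][1].
// interchanged == false: original order (b, d, c).
// interchanged == true:  interchanged order (b, c, d).
int run_nest(bool interchanged) {
    int a[6][8] = {};  // assumed dimensions; c + 2 reaches index 7
    a[1][3] = 8;
    for (int b = 1; b <= 5; b++) {
        if (!interchanged) {
            for (int d = 0; d <= 5; d++)
                for (int c = 0; c <= 5; c++)
                    a[b][c] = a[b][c + 2] & 216;
        } else {
            for (int c = 0; c <= 5; c++)
                for (int d = 0; d <= 5; d++)
                    a[b][c] = a[b][c + 2] & 216;
        }
    }
    return a[1][1];
}
```

In the original order, the first sweep over c copies the 8 into a[1][1] and clears a[1][3], and the second sweep overwrites a[1][1] with 0. In the interchanged order, a[1][1] is written six times in a row while a[1][3] still holds 8, and is never touched again, so the value 8 survives.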
[Bug target/101185] pr96814 failed after r12-1669 on non-avx512 platform
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101185

--- Comment #2 from Hongtao.liu ---
About the long-term part: I'm working slowly on that; it's tracked in PR98478.
[Bug target/101185] pr96814 failed after r12-1669 on non-avx512 platform
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101185

--- Comment #1 from Hongtao.liu ---
Alloc order is just another kind of cost which can be compensated by increasing cost of mask->integer and integer->mask.

With the patch below, pr96814 wouldn't generate any mask instructions except for

kmovd %eax, %k1
vpcmpeqd %ymm1, %ymm1, %ymm1
vmovdqu8 %ymm1, %ymm0{%k1}{z}

which is what we want.


modified gcc/config/i386/i386.md
@@ -1335,7 +1335,7 @@
 (define_insn "*cmp_ccz_1"
   [(set (reg FLAGS_REG)
     (compare (match_operand:SWI1248_AVX512BWDQ_64 0
-      "nonimmediate_operand" ",?m,$k")
+      "nonimmediate_operand" ",?m,*k")
       (match_operand:SWI1248_AVX512BWDQ_64 1 "const0_operand")))]
   "TARGET_AVX512F && ix86_match_ccmode (insn, CCZmode)"
   "@
modified gcc/config/i386/x86-tune-costs.h
@@ -2768,7 +2768,7 @@ struct processor_costs intel_cost = {
   {6, 6, 6, 6, 6}, /* cost of storing SSE registers
                       in 32,64,128,256 and 512-bit */
   4, 4, /* SSE->integer and integer->SSE moves */
-  4, 4, /* mask->integer and integer->mask moves */
+  6, 6, /* mask->integer and integer->mask moves */
   {4, 4, 4}, /* cost of loading mask register
                 in QImode, HImode, SImode. */
   {6, 6, 6}, /* cost if storing mask register
@@ -2882,7 +2882,7 @@ struct processor_costs generic_cost = {
   {6, 6, 6, 10, 15}, /* cost of storing SSE registers
                         in 32,64,128,256 and 512-bit */
   6, 6, /* SSE->integer and integer->SSE moves */
-  6, 6, /* mask->integer and integer->mask moves */
+  8, 8, /* mask->integer and integer->mask moves */
   {6, 6, 6}, /* cost of loading mask register
                 in QImode, HImode, SImode. */
   {6, 6, 6}, /* cost if storing mask register

So would increasing the cost of mask->integer and integer->mask by one more unit (or maybe more), as compensation for changing the alloc order, be acceptable to you? Or do you insist on reverting the x86_order_regs_for_local_alloc part?
[Bug target/101185] New: pr96814 failed after r12-1669 on non-avx512 platform
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101185

            Bug ID: 101185
           Summary: pr96814 failed after r12-1669 on non-avx512 platform
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: crazylht at gmail dot com
                CC: uros at gcc dot gnu.org
  Target Milestone: ---
              Host: x86_64-pc-linux-gnu
            Target: x86_64-*-* i?86-*-*

> > > Running target unix/-m32
> > > FAIL: gcc.target/i386/avx512bw-pr70329-1.c execution test
> > > FAIL: gcc.target/i386/pr96814.c execution test
> > >
> > > Debugging pr96814 failure:
> > >
> > >    0x0804921d <+66>:    mov    %edx,%ecx
> > >    0x0804921f <+68>:    cpuid
> > > => 0x08049221 <+70>:    kmovd  %edx,%k0
> > >    0x08049225 <+74>:    mov    %eax,-0x8(%ebp)
> > >    0x08049228 <+77>:    mov    %ebx,-0xc(%ebp)
> > >
> > > Mask instructions are generated after cpuid, which raises SIGILL on a non-avx512 platform.
[Bug middle-end/95189] [9/10 Regression] memcmp being wrongly stripped like strcmp
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95189 --- Comment #30 from Rich Felker --- This is a critical codegen issue. Is it really still not fixed in 9.4.0?
[Bug fortran/93524] [ISO C Binding][F2018] CFI_allocate – elem_size mishandled + sm wrongly set?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93524 --- Comment #7 from sandra at gcc dot gnu.org --- Now applied to GCC 11 too. The other two patches referenced in this issue were put on mainline before GCC 11 branched and not on GCC 10 or any older branch, so I think I'm done here and the issue can be closed.
[Bug fortran/93524] [ISO C Binding][F2018] CFI_allocate – elem_size mishandled + sm wrongly set?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93524 --- Comment #6 from CVS Commits --- The releases/gcc-11 branch has been updated by Sandra Loosemore : https://gcc.gnu.org/g:1a2bbc08d9e5e4837d33afbb8c8347a182223a43 commit r11-8648-g1a2bbc08d9e5e4837d33afbb8c8347a182223a43 Author: Sandra Loosemore Date: Tue Jun 22 12:42:17 2021 -0700 Fortran: fix sm computation in CFI_allocate [PR93524] Backported from trunk. This patch fixes a bug in setting the step multiplier field in the C descriptor for array dimensions > 2. 2021-06-21 Sandra Loosemore Tobias Burnus libgfortran/ PR fortran/93524 * runtime/ISO_Fortran_binding.c (CFI_allocate): Fix sm computation. gcc/testsuite/ PR fortran/93524 * gfortran.dg/pr93524.c: New. * gfortran.dg/pr93524.f90: New.
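For readers unfamiliar with the field being fixed: in a C descriptor, the step multiplier (sm) of a dimension is the distance in bytes between successive elements along that dimension. The sketch below is not the libgfortran code; it only illustrates what the values should look like for a contiguous, column-major array, and `contiguous_sm` is a hypothetical helper name.

```cpp
#include <cstddef>
#include <vector>

// For a contiguous column-major array, the byte stride of dimension i is
// elem_len multiplied by the extents of all lower dimensions.
std::vector<std::size_t> contiguous_sm(std::size_t elem_len,
                                       const std::vector<std::size_t>& extents) {
    std::vector<std::size_t> sm(extents.size());
    std::size_t stride = elem_len;
    for (std::size_t i = 0; i < extents.size(); ++i) {
        sm[i] = stride;        // stride of this dimension
        stride *= extents[i];  // next dimension steps over this whole one
    }
    return sm;
}
```

For 4-byte elements and extents (2, 3, 4) this gives strides of 4, 8 and 24 bytes. The running product only starts to matter from the third dimension on, which is consistent with the commit message saying the bug showed up for arrays with more than two dimensions.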
[Bug tree-optimization/56223] Integer ABS is not recognized for more complicated pattern
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56223 --- Comment #7 from Andrew Pinski --- (In reply to Andrew Pinski from comment #5) > I also noticed that factor_out_conditional_conversion has a similar issue > where the cast is inside both if and else part. I have a fix for that, though we don't remove one of the BB if it becomes empty.
[Bug c++/101184] New: [modules] ICE and unexpected behavior when using precisely 5 stl-memory includes.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101184

            Bug ID: 101184
           Summary: [modules] ICE and unexpected behavior when using precisely 5 stl-memory includes.
           Product: gcc
           Version: 11.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: 1lumin at protonmail dot com
  Target Milestone: ---

The following five modules trigger an ICE when exporting module test_impl, compiled with the commands below. The five-file example below is the smallest I could reduce it to; the standard memory include is used. Clang handles this fine with both libc++ and libstdc++ memory includes.

gcc version 11.1.0
Target: x86_64-pc-linux-gnu

FILE: filler1.cc
/
module;
#include <memory>
export module filler1;
/

FILE: filler2.cc
/
module;
#include <memory>
export module filler2;
/

FILE: filler3.cc
/
module;
#include <memory>
export module filler3;
/

FILE: vec.cc
/
module;
#include <memory>
export module vec;
export template struct ivec{};
/

FILE: test_impl.cc
/
module;
#include <memory>
import filler1;
import filler2;
import filler3;
import vec;
export module test_impl;
ivec g_vec2;
/

Commands to compile:
mkdir build
g++ -std=c++2a -fmodules-ts -c filler1.cc -o build/filler1.pcm
g++ -std=c++2a -fmodules-ts -c filler2.cc -o build/filler2.pcm
g++ -std=c++2a -fmodules-ts -c filler3.cc -o build/filler3.pcm
g++ -std=c++2a -fmodules-ts -c vec.cc -o build/vec.pcm
g++ -std=c++2a -fmodules-ts -c test_impl.cc -o build/test_impl.pcm

NOTES:
/
test_impl must be a module. If lines 1 and 10 are removed, it compiles fine.

Despite not exporting anything, filler 1, 2, and 3 must be included for the error to occur. If any or all are removed, it compiles fine.

ivec must be a template. Even though the template argument is not used, if it is of a non-template type, it compiles fine.

From what I have gathered, the shared header specifically has to be memory; if it is swapped with vector, array, functional, etc., it compiles fine.

g_vec2 must be defined in the module test_impl. The scope of g_vec2 within the module does not seem to matter.
/

MISC:
Configured with: /build/gcc/src/gcc/configure --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/ --enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++,d --with-isl --with-linker-hash-style=gnu --with-system-zlib --enable-__cxa_atexit --enable-cet=auto --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object --enable-install-libiberty --enable-linker-build-id --enable-lto --enable-multilib --enable-plugin --enable-shared --enable-threads=posix --disable-libssp --disable-libstdcxx-pch --disable-libunwind-exceptions --disable-werror gdc_include_dir=/usr/include/dlang/gdc

test_impl.cc:10:8: internal compiler error: in write_location, at cp/module.cc:15605
   10 | export module test_impl;
      |^~
0x1797368 internal_error(char const*, ...)
        ???:0
0x67f8f9 fancy_abort(char const*, int, char const*)
        ???:0
0x7653b2 trees_out::core_vals(tree_node*)
        ???:0
0x765dd1 trees_out::tree_node_vals(tree_node*)
        ???:0
0x766d3a trees_out::decl_value(tree_node*, depset*)
        ???:0
0x761a9a trees_out::decl_node(tree_node*, walk_kind)
        ???:0
0x761e43 trees_out::tree_node(tree_node*)
        ???:0
0x764d30 module_state::write_cluster(elf_out*, depset**, unsigned int, depset::hash&, unsigned int*, unsigned int*)
        ???:0
0x767ec9 module_state::write(elf_out*, cpp_reader*)
        ???:0
0x768d86 finish_module_processing(cpp_reader*)
        ???:0
0x713d1b c_parse_final_cleanups()
        ???:0

The .ii of the test_impl module is attached.
[Bug pch/101183] [compiler ICE]gcc mingw for precompiled header file. MapViewOfFileEx: Attempt to access invalid address.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101183

--- Comment #3 from Andrew Pinski ---
(In reply to cqwrteur from comment #2)
> https://github.com/msys2/MINGW-packages/blob/master/mingw-w64-gcc/0010-Fix-using-large-PCH.patch
> But why not add these patches to GCC itself?

You have to ask the mingw project that question. Have them submit them to gcc.
[Bug pch/101183] [compiler ICE]gcc mingw for precompiled header file. MapViewOfFileEx: Attempt to access invalid address.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101183

--- Comment #2 from cqwrteur ---
(In reply to Andrew Pinski from comment #1)
> Dup of bug 91440.
>
> https://github.com/msys2/MINGW-packages/issues/5719
>
> So you have to manually setdllcharacteristics on cc1.exe and cc1plus.exe
>
> *** This bug has been marked as a duplicate of bug 91440 ***

https://github.com/msys2/MINGW-packages/blob/master/mingw-w64-gcc/0010-Fix-using-large-PCH.patch
But why not add these patches to GCC itself?
[Bug c++/101182] [concepts] ICE with ++ in non-template requires-expression
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101182

Patrick Palka changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                   |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org |ppalka at gcc dot gnu.org
   Last reconfirmed|                              |2021-06-23
     Ever confirmed|0                             |1
                 CC|                              |ppalka at gcc dot gnu.org
[Bug tree-optimization/25290] PHI-OPT could be rewritten so that is uses match
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=25290 --- Comment #24 from Andrew Pinski --- Next patch series can be found here which removes abs_replacement: https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573558.html
[Bug c/101176] valgrind error for c-c++-common/builtin-has-attribute.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101176

Jakub Jelinek changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
   Last reconfirmed|                              |2021-06-23
             Status|UNCONFIRMED                   |ASSIGNED
     Ever confirmed|0                             |1
           Assignee|unassigned at gcc dot gnu.org |jakub at gcc dot gnu.org

--- Comment #1 from Jakub Jelinek ---
Created attachment 51058
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51058=edit
gcc12-pr101176.patch

Untested fix.
[Bug pch/91440] Precompiled headers regression in 9.2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91440

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |unlvsur at live dot com

--- Comment #7 from Andrew Pinski ---
*** Bug 101183 has been marked as a duplicate of this bug. ***
[Bug pch/101183] [compiler ICE]gcc mingw for precompiled header file. MapViewOfFileEx: Attempt to access invalid address.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101183

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |DUPLICATE

--- Comment #1 from Andrew Pinski ---
Dup of bug 91440.

https://github.com/msys2/MINGW-packages/issues/5719

So you have to manually setdllcharacteristics on cc1.exe and cc1plus.exe

*** This bug has been marked as a duplicate of bug 91440 ***
[Bug preprocessor/101183] New: gcc mingw for precompiled header file. MapViewOfFileEx: Attempt to access invalid address.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101183

            Bug ID: 101183
           Summary: gcc mingw for precompiled header file. MapViewOfFileEx: Attempt to access invalid address.
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: preprocessor
          Assignee: unassigned at gcc dot gnu.org
          Reporter: unlvsur at live dot com
  Target Milestone: ---

D:\hg\fast_io\examples\0001.helloworld>g++ -o helloworld helloworld.cc -Ofast -std=c++20 -s -flto -march=native -I../../include
internal error in mingw32_gt_pch_use_address, at config/i386/host-mingw32.c:192: MapViewOfFileEx: Attempt to access invalid address.
[Bug c++/101174] [12 Regression] CTAD causes instantiation of invalid class specialization since r12-926
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101174

Patrick Palka changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED

--- Comment #4 from Patrick Palka ---
Fixed.
[Bug c++/101174] [12 Regression] CTAD causes instantiation of invalid class specialization since r12-926
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101174 --- Comment #3 from CVS Commits --- The master branch has been updated by Patrick Palka : https://gcc.gnu.org/g:7da4eae3dcef6fd5d955eb2c80c453aa52368004 commit r12-1762-g7da4eae3dcef6fd5d955eb2c80c453aa52368004 Author: Patrick Palka Date: Wed Jun 23 17:23:39 2021 -0400 c++: excessive instantiation during CTAD [PR101174] We set DECL_CONTEXT on implicitly generated deduction guides so that their access is consistent with that of the constructor. But this apparently leads to excessive instantiation in some cases, ultimately because instantiation of a deduction guide should be independent of instantiation of the resulting class specialization, but setting the DECL_CONTEXT of the former to the latter breaks this independence. To fix this, this patch makes push_access_scope handle artificial deduction guides specifically rather than setting their DECL_CONTEXT in build_deduction_guide. We could alternatively make the class befriend the guide via DECL_BEFRIENDING_CLASSES, but that wouldn't be a complete fix and would break class-deduction-access3.C below since friendship isn't transitive. PR c++/101174 gcc/cp/ChangeLog: * pt.c (push_access_scope): For artificial deduction guides, set the access scope to that of the constructor. (pop_access_scope): Likewise. (build_deduction_guide): Don't set DECL_CONTEXT on the guide. libstdc++-v3/ChangeLog: * testsuite/23_containers/multiset/cons/deduction.cc: Uncomment CTAD example that was rejected by this bug. * testsuite/23_containers/set/cons/deduction.cc: Likewise. gcc/testsuite/ChangeLog: * g++.dg/cpp1z/class-deduction-access3.C: New test. * g++.dg/cpp1z/class-deduction91.C: New test.
[Bug c++/101182] New: [concepts] ICE with ++ in non-template requires-expression
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101182

            Bug ID: 101182
           Summary: [concepts] ICE with ++ in non-template requires-expression
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Keywords: ice-on-valid-code
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jason at gcc dot gnu.org
            Blocks: 67491
  Target Milestone: ---

int f() {
  if (auto a_ = f(); requires{a_++;}) {}
  return 0;
}


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67491
[Bug 67491] [meta-bug] concepts issues
[Bug tree-optimization/101179] y % (x ? 16 : 4) and y % (4 << (2 * (bool)x)) produce different code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101179 --- Comment #5 from Andrew Pinski --- (In reply to Andrew Pinski from comment #4) > Only the last one produces the best code. So for clang, f1-f3 produces the same code but f4 is bad. It was only fixed in clang 10.
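For context, the two expressions named in the bug title compute the same value, which is why differing code generation counts as a missed optimization. A minimal sketch follows; the function names and the choice of `unsigned` operands are assumptions, since f1 through f4 from the report are not reproduced in this message.

```cpp
// Modulus selected by a conditional: 16 when x is nonzero, 4 otherwise.
unsigned mod_select(unsigned y, int x) {
    return y % (x ? 16u : 4u);
}

// Same modulus built by shifting: 4 << 2 == 16, 4 << 0 == 4.
unsigned mod_shift(unsigned y, int x) {
    return y % (4u << (2 * (bool)x));
}
```

Both divisors are powers of two, so either form can be reduced to a bitwise AND with 15 or 3.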
[Bug gcov-profile/80223] RFE: Exclude functions from profile instrumentation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80223 --- Comment #21 from Fangrui Song --- (In reply to Fangrui Song from comment #20) > For example, if an inlining pass happens after instrumentation, then the > function attribute doesn't necessarily need to suppress inlining. After > instrumentation is done, we can even treat the noprofile attribute as a > no-op. Sent too early:) Amendment: a smart inliner can inline the noprofile callee and then drop instrumentation code. That will also be an approach which does not break the "no instrumenting my code" contract. Other approaches can be (probably more relevant to function specialization/clones): the instrumentation pass can leave an un-instrumented copy which can be used by a subsequent inliner. As we can see, all these approaches are much more complex than simply "suppressing inlining". So I agree that "suppressing inlining" is a good implementation detail here.
[Bug target/100866] PPC: Inefficient code for vec_revb(vector unsigned short) < P9
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100866

--- Comment #14 from Segher Boessenkool ---
(In reply to luoxhu from comment #13)
> It is not visible in combine due to the constant data is in *.LC0 and

combine can see things in the constant pool in various ways though (just like many other parts of the compiler). But yeah, unspecs are always a big hurdle to optimisation.

If we were to express this as some "real" RTL we would need a few variants: one that takes only one register as data input and another that takes two; one that has all permutation indices in range and another that masks them; and maybe a few more.
[Bug middle-end/101134] Bogus -Wstringop-overflow warning about non-existent overflow
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101134 --- Comment #7 from Martin Sebor --- Changing the warning text from "does X" to "may do X" wouldn't help because all instances of it (or all warnings) would have to use the latter form, and that's already implied by the former. Every GCC warning already means "something looks fishy here" and not "this is definitely a bug." Not just because not every suspicious piece of code is necessarily a bug, or because no warning is completely free of false positives, but also because every flow-sensitive warning also depends on whether control can reach the construct it warns about (as in: is the function where X occurs ever called?) Users who expect otherwise simply need to adjust their expectations (as per the manual).
[Bug gcov-profile/80223] RFE: Exclude functions from profile instrumentation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80223 --- Comment #20 from Fangrui Song --- (In reply to Marco Elver from comment #19) I am ok with "inlining suppression" as an implementation strategy and I agree that it should be useful. What I objected strongly is "promised inlining suppression". For example, if an inlining pass happens after instrumentation, then the function attribute doesn't necessarily need to suppress inlining. After instrumentation is done, we can even treat the noprofile attribute as a no-op. The example applies to the non-LTO case -fsanitize-coverage= . (We don't actually use the noprofile function attribute for -fsanitize-coverage=, but I cannot find a better example in LLVM; I think all other noprofile affected instrumentations happen before the inliner pipeline). So in a documentation, it can be said that the inlined copy (if any) will not get instrumentation, but it **should not** say that a noprofile function cannot be inlined into a function without the attribute.
[Bug gcov-profile/80223] RFE: Exclude functions from profile instrumentation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80223 --- Comment #19 from Marco Elver --- (In reply to Fangrui Song from comment #18) [...] > Our problem is that a boolean attribute with 1 bit information cannot > express whether a neg attribute function can be inlined into a pos attribute > function. > > Let's agree to disagree. I don't see why a no_profile_instrument_function > function suppress inlining into a function without the attribute. For the > use cases where users want to suppress inlining, they can add noinline. What > I worry about is that now GCC has an attitude and if the LLVM side doesn't > follow it is like diverging. However, the GCC patch is still in review. I > think a similar topic may need to be raided on llvm-dev side as I feel this > is the tip of the iceberg - more attributes can be similarly leveraged. So, > how about a llvm-dev discussion? I have mentioned this several times now, but it seems nobody is listening: It's _not_ about inlining -- the inlining discussion is about a particular implementation strategy. It's about the _contract an attribute promises_, which is treating the code in the function a certain way (e.g. do not instrument). That can be done by either: a) even if the code is inlined, respect the original attribute for the inlined code (e.g. do not instrument), or b) just don't inline. It looks like (b) is easier to do. I probably do not understand how hard (a) is. If you break the contract because it's too hard to do (a), then that's your problem. Just don't break the contract. Because that's how we get impossible-to-diagnose bugs. Correctness comes first: if it is impossible for a user to reason about the behaviour of their code due to unspecified behaviour (viz. breaking the contract) of an attribute, then the code is doomed to be incorrect. Therefore, do _not_ implement attributes with either unspecified or ridiculously specified behaviour. 
Ridiculous in this case is saying "this attribute only does what it promises if you also add noinline". It's ridiculous, because the user will then rightfully wonder "?!?!? Why doesn't it imply noinline then?!?!?!". Thanks.
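To make the disputed contract concrete, here is a minimal sketch of what a user writes today if they want the "not instrumented" guarantee regardless of how the compiler treats inlining. It assumes the GCC-style attribute spelling; the function names and the checksum are hypothetical, and the explicit noinline is spelled out only because whether the first attribute implies it is exactly the open question in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: GCC-style attribute spelling. noinline is added by hand here;
// the thread debates whether no_profile_instrument_function should imply it.
__attribute__((no_profile_instrument_function, noinline))
std::uint32_t hot_checksum(const std::uint8_t *data, std::size_t len) {
    std::uint32_t sum = 0;
    for (std::size_t i = 0; i < len; ++i)
        sum = sum * 31u + data[i];
    return sum;
}

// An ordinary function that may be instrumented under -fprofile-generate.
// Option (a) in the comment above would keep hot_checksum's body free of
// instrumentation even if it were inlined here; option (b) refuses to inline.
std::uint32_t instrumented_caller() {
    static const std::uint8_t payload[] = {1, 2, 3};
    return hot_checksum(payload, sizeof payload);
}

// Spot check of the hypothetical checksum arithmetic: ((1 * 31) + 2) * 31 + 3.
void check_checksum() {
    assert(instrumented_caller() == 1026u);
}
```

The sketch says nothing about which option GCC or LLVM ends up implementing; it only shows the two attributes a user can combine in the meantime.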
[Bug rtl-optimization/100328] IRA doesn't model matching constraint well
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100328 --- Comment #2 from Vladimir Makarov --- (In reply to Kewen Lin from comment #1)
> Created attachment 50715 [details]
> ira:consider matching cstr in all alternatives
>
> With little understanding on ira, I am not quite sure this patch is on the reasonable direction. It aims to check the matching constraint in all alternatives, if there is one alternative with matching constraint and matches the current preferred regclass, it will record the output operand number and further create one copy for it. Normally it can get the priority against shuffle copies and the matching constraint will get satisfied with higher possibility, reload doesn't create extra copies to meet the matching constraint or the desirable register class when it has to.
>
> For FMA A,B,C,D, I think ideally copies A/B, A/C, A/D can firstly stay as shuffle copies, and later any of A,B,C,D gets assigned by one hardware register which is a VSX register but not a FP register, which means it has to pay costs once we can NOT go with VSX alternatives, so at that time we can increase the freq for the remaining copies related to this, once the matching constraint gets satisfied further, there aren't any extra costs to pay. This idea seems a bit complicated in the current framework, so the proposed patch aggressively emphasizes the matching constraint at the time of creating copies.
>
> FWIW bootstrapped/regtested on powerpc64le-linux-gnu P9. The evaluation with Power9 SPEC2017 all run shows 505.mcf_r +2.98%, 508.namd_r +3.37%, 519.lbm_r +2.51%, no remarkable degradation is observed.

Thank you for working on this issue. The current implementation of ira_get_dup_out_num was specifically tuned for better register allocation for x86-64 div insns. Your patch definitely improves code for power9 and I would say significantly (congratulations!). The patch you proposed makes me think that it might work for major targets as well.
I would prefer to avoid introducing a new parameter, because there are already too many of them and its description is cryptic. It would be nice if you could benchmark the patch on x86-64 too. If there is no overall degradation with the new behaviour, we could remove the parameter and make the new behaviour the default. If there is, well, we will keep the parameter. As for the patch itself, I don't like some of the variable names. Sorry. Could you use op_regno, out_regno, and present_alt instead of op_no, out_no, and tot? In general, please use longer variable names that reflect their purpose, as GCC developers read code many more times than they write it.
[Bug target/101175] builtin_clz generates wrong bsr instruction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101175 --- Comment #6 from CVS Commits --- The releases/gcc-11 branch has been updated by Uros Bizjak : https://gcc.gnu.org/g:e99256fc5eab1cf8958223d79b23e359b6d5ca60 commit r11-8644-ge99256fc5eab1cf8958223d79b23e359b6d5ca60 Author: Uros Bizjak Date: Wed Jun 23 12:50:53 2021 +0200 i386: Prevent unwanted combine from LZCNT to BSR [PR101175] The current RTX pattern for BSR allows combine pass to convert LZCNT insn to BSR. Note that the LZCNT has a defined behavior to return the operand size when operand is zero, where BSR has not. Add a BSR specific setting of zero-flag to RTX pattern of BSR insn in order to avoid matching unwanted combinations. 2021-06-23 Uroš Bizjak gcc/ PR target/101175 * config/i386/i386.md (bsr_rex64): Add zero-flag setting RTX. (bsr): Ditto. (*bsrhi): Remove. (clz2): Update RTX pattern for additions. gcc/testsuite/ PR target/101175 * gcc.target/i386/pr101175.c: New test. (cherry picked from commit 1e16f2b472c7d253d564556a048dc4ae16119c00)
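The difference between the two instructions that the commit message relies on can be sketched in portable terms. This is not the i386.md pattern, only a software model under stated assumptions: a 32-bit unsigned int, the GCC/Clang builtin __builtin_clz (whose result is undefined for zero), and hypothetical helper names lzcnt32 and bsr32.

```cpp
#include <cassert>
#include <cstdint>

static_assert(sizeof(unsigned int) == 4, "sketch assumes 32-bit unsigned int");

// LZCNT-like semantics: defined for zero, where it returns the operand size.
inline int lzcnt32(std::uint32_t x) {
    return x == 0 ? 32 : __builtin_clz(x);
}

// BSR-like semantics: index of the highest set bit. The hardware result for
// zero is not defined, so this model returns -1 as an explicit marker.
inline int bsr32(std::uint32_t x) {
    return x == 0 ? -1 : 31 - __builtin_clz(x);
}

// For nonzero inputs the two are related by bsr == 31 - lzcnt; for zero they
// are not, which is why substituting one for the other is unsafe.
void check_lzcnt_bsr() {
    assert(lzcnt32(0) == 32);
    assert(lzcnt32(1) == 31);
    assert(lzcnt32(0x80000000u) == 0);
    assert(bsr32(1) == 0);
    assert(bsr32(0x80000000u) == 31);
    assert(bsr32(0x00010000u) == 31 - lzcnt32(0x00010000u));
}
```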
[Bug tree-optimization/101179] y % (x ? 16 : 4) and y % (4 << (2 * (bool)x)) produce different code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101179 --- Comment #4 from Andrew Pinski --- But here are two other functions which should all have the same code gen as the original two:
int f3(int y) { const bool x = y % 100 == 0; return (x ? y%16 : y%4) == 0; }
int f4(int y) { const bool x = y % 100 == 0; return (x ? (y%16) == 0 : (y%4) == 0); }
Only the last one produces the best code.
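As a sanity check that the four variants really compute the same thing, the sketch below puts f3 and f4 next to f1 and f2 from the original report and compares them over a range of inputs. The helper all_variants_agree is an assumption added for illustration; it says nothing about the generated code, only about the results.

```cpp
#include <cassert>

int f1(int y) { const bool x = y % 100 == 0; return y % (x ? 16 : 4) == 0; }
int f2(int y) { const bool x = y % 100 == 0; return y % (4 << (x * 2)) == 0; }
int f3(int y) { const bool x = y % 100 == 0; return (x ? y % 16 : y % 4) == 0; }
int f4(int y) { const bool x = y % 100 == 0; return (x ? (y % 16) == 0 : (y % 4) == 0); }

// Hypothetical helper: true when all four variants agree for every y in [lo, hi].
bool all_variants_agree(int lo, int hi) {
    for (int y = lo; y <= hi; ++y) {
        const int r = f1(y);
        if (f2(y) != r || f3(y) != r || f4(y) != r)
            return false;
    }
    return true;
}

void check_variants() {
    assert(all_variants_agree(-5000, 5000));
    assert(f1(400) == 1);  // multiple of 100 and of 16
    assert(f1(100) == 0);  // multiple of 100 but not of 16
    assert(f1(8) == 1);    // not a multiple of 100, multiple of 4
}
```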
[Bug c++/101181] New: ICE when using an alias template
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101181 Bug ID: 101181 Summary: ICE when using an alias template Product: gcc Version: 12.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: webrown.cpp at gmail dot com Target Milestone: ---
The following program produces "Segmentation fault: 11 signal terminated program cc1plus" when compiled with flags -std=c++23 -fmodules-ts -pedantic-errors -O0 -c using gcc trunk version (Homebrew GCC HEAD-da13e4e_1) 12.0.0 20210623 (experimental):
template< class T, bool = requires { typename T::pointer; } > struct p { using type = void; };
template< class T > struct p<T, true> { using type = T::pointer; };
template< class T > using P = typename p<T>::type;
Without the final alias template, all seems well.
[Bug tree-optimization/101179] y % (x ? 16 : 4) and y % (4 << (2 * (bool)x)) produce different code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101179 --- Comment #3 from Jonathan Wakely --- the ?: one seems to produce better code currently though, so I'm not sure transforming it to the shift is what we want.
[Bug tree-optimization/101179] y % (x ? 16 : 4) and y % (4 << (2 * (bool)x)) produce different code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101179 Andrew Pinski changed: What|Removed |Added Severity|normal |enhancement Last reconfirmed||2021-06-23 Status|UNCONFIRMED |NEW Ever confirmed|0 |1 --- Comment #2 from Andrew Pinski --- (x ? 16 : 4) to: (4 << (x * 2)) Should be easy to add to match.pd's /* A few simplifications of "a ? CST1 : CST2". */ And PHI-OPT will use it without you doing anything extra.
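The rewrite suggested here generalizes to any pair of power-of-two constants. The following is only a source-level sketch of that arithmetic, not match.pd syntax; log2_pow2 and select_by_shift are hypothetical names, and the sketch assumes CST1 >= CST2 so that the shift count is non-negative.

```cpp
#include <cassert>

// Hypothetical helper: log2 of a power of two.
constexpr int log2_pow2(unsigned v) {
    int n = 0;
    while (v > 1) {
        v >>= 1;
        ++n;
    }
    return n;
}

// a ? CST1 : CST2 rewritten as CST2 << (a * (log2(CST1) - log2(CST2))).
template <unsigned CST1, unsigned CST2>
constexpr unsigned select_by_shift(bool a) {
    static_assert(CST1 != 0 && (CST1 & (CST1 - 1)) == 0, "CST1 must be a power of two");
    static_assert(CST2 != 0 && (CST2 & (CST2 - 1)) == 0, "CST2 must be a power of two");
    static_assert(CST1 >= CST2, "sketch only handles CST1 >= CST2");
    return CST2 << (a * (log2_pow2(CST1) - log2_pow2(CST2)));
}

// The case from this bug: (x ? 16 : 4) becomes (4 << (x * 2)).
static_assert(select_by_shift<16, 4>(true) == 16, "true arm");
static_assert(select_by_shift<16, 4>(false) == 4, "false arm");

void check_select() {
    assert((select_by_shift<64, 8>(true) == 64u));
    assert((select_by_shift<64, 8>(false) == 8u));
}
```

Whether the shift form is actually preferable is a separate question; comment #3 notes that the ?: form currently produces better code.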
[Bug gcov-profile/80223] RFE: Exclude functions from profile instrumentation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80223 --- Comment #18 from Fangrui Song --- (In reply to Nick Desaulniers from comment #15)
> (In reply to Fangrui Song from comment #14)
> > Can a no_profile_instrument_function function be inlined into a function without the attribute? This may be controversial but I'd argue that it can. GCC no_stack_protector behaves this way. no_profile_instrument_function can mean that user does not want profiling when the function is called with its entity, not via another entity.
>
> I respectfully but strongly disagree. It's surprising to developers when they ask for no stack protector, or no profiling instrumentation, then get one anyways. For long call chains, it's hard for developers to diagnose on their own which function they called that missed such function attribute.
>
> This reminds me of "what color is your function?" https://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/ As suddenly a developer would need to verify for a no_* attributed function that they only call no_* attributed functions, or add noinline (which is a big hammer to all call sites, and games with aliases that have the noinline attribute are kind of ridiculous).
>
> It's less surprising to prevent inline substitution upon function attribute mismatch. Then a developer can self diagnose with -Rpass=inline. Either way, some form of diagnostics would be helpful for these kinds of issues, and has been requested by Android platform developers working on Zygote.
>
> For no_stack_protector in LLVM, I implemented the rules: upon mismatch, prevent inline substitution unless the user specified always_inline. This fixed suspend/resume bugs in x86 Linux kernels when built with LTO.
>
> Though, I'm happy to revisit that behavior in LLVM; we could add
>
> #define noinline_for_lto __attribute__((__noinline__))
>
> then use that in the Linux kernel instead.
Our problem is that a boolean attribute with 1 bit of information cannot express whether a neg attribute function can be inlined into a pos attribute function. Let's agree to disagree. I don't see why a no_profile_instrument_function function should suppress inlining into a function without the attribute. For the use cases where users want to suppress inlining, they can add noinline. What I worry about is that GCC now has a stance, and if the LLVM side doesn't follow, it amounts to diverging. However, the GCC patch is still in review. I think a similar topic may need to be raised on the llvm-dev side, as I feel this is the tip of the iceberg - more attributes can be similarly leveraged. So, how about an llvm-dev discussion?
[Bug c/101171] [12 Regression] ICE: tree check: expected class ‘type’, have ‘exceptional’ (error_mark) in c_expr_sizeof_expr, at c/c-typeck.c:3006
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101171 Jakub Jelinek changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |jakub at gcc dot gnu.org Status|NEW |ASSIGNED --- Comment #4 from Jakub Jelinek --- Created attachment 51057 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51057&action=edit gcc12-pr101171.patch Untested fix.
[Bug c++/101174] [12 Regression] CTAD causes instantiation of invalid class specialization since r12-926
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101174 Patrick Palka changed: What|Removed |Added Last reconfirmed||2021-06-23 Status|UNCONFIRMED |ASSIGNED Assignee|unassigned at gcc dot gnu.org |ppalka at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #2 from Patrick Palka --- The particular problem here is that during dguide overload resolution for multiset(42), we briefly consider the implicit deduction guide for the second ctor: template multiset(U) -> multiset which after substituting deduced template arguments becomes multiset(int) -> multiset and after r12-926, its (substituted) DECL_CONTEXT is also multiset rather than empty. Since DECL_CLASS_SCOPE_P is now true for implicit deduction guides, we try to complete/instantiate its DECL_CONTEXT via the call to DERIVED_FROM_P in joust(): /* F1 is a member of a class D, F2 is a member of a base class B of D, and for all arguments the corresponding parameters of F1 and F2 have the same type (CWG 2273/2277). */ if (DECL_P (cand1->fn) && DECL_CLASS_SCOPE_P (cand1->fn) && !DECL_CONV_FN_P (cand1->fn) && DECL_P (cand2->fn) && DECL_CLASS_SCOPE_P (cand2->fn) && !DECL_CONV_FN_P (cand2->fn)) { tree base1 = DECL_CONTEXT (strip_inheriting_ctors (cand1->fn)); tree base2 = DECL_CONTEXT (strip_inheriting_ctors (cand2->fn)); bool used1 = false; bool used2 = false; if (base1 == base2) /* No difference. */; else if (DERIVED_FROM_P (base1, base2)) // XXX used1 = true; else if (DERIVED_FROM_P (base2, base1)) used2 = true; which results in the hard error seen. I'm testing setting DECL_BEFRIENDING_CLASSES instead of DECL_CONTEXT on an implicit deduction guide, to avoid such accidental instantiations
[Bug middle-end/26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163 Bug 26163 depends on bug 101014, which changed state. Bug 101014 Summary: [12 Regression] Big compile time hog with -O3 since r12-1268-g9858cd1a6827ee7a https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101014 What|Removed |Added Status|REOPENED|RESOLVED Resolution|--- |FIXED
[Bug tree-optimization/101014] [12 Regression] Big compile time hog with -O3 since r12-1268-g9858cd1a6827ee7a
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101014 Andrew Macleod changed: What|Removed |Added Status|REOPENED|RESOLVED Resolution|--- |FIXED --- Comment #20 from Andrew Macleod --- Hopefully this closes it for good. The final patch needed to adjust the propagation engine to avoid propagating the failed value more than once. The original patch simply stopped propagating immediately, and that caused other issues.
[Bug tree-optimization/101148] [12 Regression] ranger compile-tme hog when building 527.cam4_r
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101148 Andrew Macleod changed: What|Removed |Added Resolution|--- |FIXED Status|NEW |RESOLVED --- Comment #8 from Andrew Macleod --- Hopefully this closes it. The final patch is slightly different than the proposed one in 101014, as it had to change the propagation engine slightly as well.
[Bug tree-optimization/101014] [12 Regression] Big compile time hog with -O3 since r12-1268-g9858cd1a6827ee7a
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101014 --- Comment #19 from CVS Commits --- The master branch has been updated by Andrew Macleod : https://gcc.gnu.org/g:a03e944e92ee51ae583382079d4739b64bd93b35 commit r12-1750-ga03e944e92ee51ae583382079d4739b64bd93b35 Author: Andrew MacLeod Date: Tue Jun 22 17:46:05 2021 -0400 Do not continue propagating values which cannot be set properly. If the on-entry cache cannot properly represent a range, do not continue trying to propagate it. PR tree-optimization/101148 PR tree-optimization/101014 * gimple-range-cache.cc (ranger_cache::ranger_cache): Adjust. (ranger_cache::~ranger_cache): Adjust. (ranger_cache::block_range): Check if propagation disallowed. (ranger_cache::propagate_cache): Disallow propagation if new value can't be stored properly. * gimple-range-cache.h (ranger_cache::m_propfail): New member.
[Bug tree-optimization/101148] [12 Regression] ranger compile-tme hog when building 527.cam4_r
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101148 --- Comment #7 from CVS Commits --- The master branch has been updated by Andrew Macleod : https://gcc.gnu.org/g:a03e944e92ee51ae583382079d4739b64bd93b35 commit r12-1750-ga03e944e92ee51ae583382079d4739b64bd93b35 Author: Andrew MacLeod Date: Tue Jun 22 17:46:05 2021 -0400 Do not continue propagating values which cannot be set properly. If the on-entry cache cannot properly represent a range, do not continue trying to propagate it. PR tree-optimization/101148 PR tree-optimization/101014 * gimple-range-cache.cc (ranger_cache::ranger_cache): Adjust. (ranger_cache::~ranger_cache): Adjust. (ranger_cache::block_range): Check if propagation disallowed. (ranger_cache::propagate_cache): Disallow propagation if new value can't be stored properly. * gimple-range-cache.h (ranger_cache::m_propfail): New member.
[Bug target/89021] Implement mmintrin.h in SSE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89021 --- Comment #58 from CVS Commits --- The master branch has been updated by Uros Bizjak : https://gcc.gnu.org/g:37e93925366676201b526624e9f8dc32d82b4ff2 commit r12-1746-g37e93925366676201b526624e9f8dc32d82b4ff2 Author: Uros Bizjak Date: Wed Jun 23 16:14:31 2021 +0200 i386: Add PPERM two-operand 64bit vector permutation [PR89021] Add emulation of V8QI PPERM permutations for TARGET_XOP target. Similar to PSHUFB, the permutation is performed with V16QI PPERM instruction, where selector is defined in V16QI mode with inactive elements set to 0x80. Specific to two operand permutations is the remapping of elements from the second operand (e.g. e[8] -> e[16]), as we have to account for the inactive elements from the first operand. 2021-06-23 Uroš Bizjak gcc/ PR target/89021 * config/i386/i386-expand.c (expand_vec_perm_pshufb): Handle 64bit modes for TARGET_XOP. Use indirect gen_* functions. * config/i386/mmx.md (mmx_ppermv64): New insn pattern. * config/i386/i386.md (unspec): Move UNSPEC_XOP_PERMUTE from ... * config/i386/sse.md (unspec): ... here.
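The selector construction described in this commit message can be modeled in software. The sketch below is not the code in i386-expand.c: it assumes a two-operand V8QI permutation uses indices 0..7 for the first operand and 8..15 for the second, that each 8-byte operand sits in the low half of a 16-byte register, and it models only two selector behaviours (plain byte select, and 0x80 producing zero). All names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using V8 = std::array<std::uint8_t, 8>;
using V16 = std::array<std::uint8_t, 16>;

// Remap one index into the 16-byte numbering: elements of the first operand
// keep their index, elements of the second move up by 8 (e[8] -> e[16]).
std::uint8_t remap_index(std::uint8_t e) {
    e &= 15;
    return static_cast<std::uint8_t>(e < 8 ? e : e + 8);
}

// Build the 16-byte selector; the inactive upper elements are set to 0x80.
V16 make_selector(const V8 &perm) {
    V16 sel;
    sel.fill(0x80);
    for (std::size_t i = 0; i < perm.size(); ++i)
        sel[i] = remap_index(perm[i]);
    return sel;
}

// Simplified model of the byte permute: bit 7 set yields zero, otherwise the
// low 5 bits index the 32 bytes formed by op0 followed by op1.
V16 permute_model(const V16 &op0, const V16 &op1, const V16 &sel) {
    V16 out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        if (sel[i] & 0x80)
            continue;  // element stays zero
        const unsigned idx = sel[i] & 31u;
        out[i] = idx < 16 ? op0[idx] : op1[idx - 16];
    }
    return out;
}

// Emulate the 8-byte two-operand permutation through the 16-byte model.
V8 permute_v8(const V8 &a, const V8 &b, const V8 &perm) {
    V16 wide_a{}, wide_b{};
    for (std::size_t i = 0; i < 8; ++i) {
        wide_a[i] = a[i];
        wide_b[i] = b[i];
    }
    const V16 wide = permute_model(wide_a, wide_b, make_selector(perm));
    V8 out{};
    for (std::size_t i = 0; i < 8; ++i)
        out[i] = wide[i];
    return out;
}

void check_permute() {
    const V8 a = {0, 1, 2, 3, 4, 5, 6, 7};
    const V8 b = {10, 11, 12, 13, 14, 15, 16, 17};
    const V8 perm = {0, 8, 1, 9, 2, 10, 3, 11};  // interleave the low halves
    const V8 expect = {0, 10, 1, 11, 2, 12, 3, 13};
    assert(permute_v8(a, b, perm) == expect);
    assert(remap_index(8) == 16);
}
```

Without the remap, index 8 would select the zero padding in the upper half of the first operand instead of the first byte of the second, which is the pitfall the commit message points out.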
[Bug tree-optimization/101179] y % (x ? 16 : 4) and y % (4 << (2 * (bool)x)) produce different code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101179 --- Comment #1 from Jonathan Wakely --- On IRC Richi said: "VRP has code to do that but maybe for some reason shifts are not handled"
[Bug tree-optimization/94084] Optimizer produces suboptimal code related to loop-invariant
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94084 vfdff changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |INVALID No valid bug report
[Bug target/98636] [ARM] ICE on passing incompatible options for fp16 - global_options’ are modified in local context
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98636 Martin Liška changed: What|Removed |Added Resolution|--- |FIXED Status|REOPENED|RESOLVED --- Comment #23 from Martin Liška --- Fixed again.
[Bug target/98636] [ARM] ICE on passing incompatible options for fp16 - global_options’ are modified in local context
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98636 --- Comment #22 from CVS Commits --- The master branch has been updated by Martin Liska : https://gcc.gnu.org/g:371c1992624c9269e2d5747561a8b27b30e485ee commit r12-1745-g371c1992624c9269e2d5747561a8b27b30e485ee Author: Martin Liska Date: Wed Jun 23 15:30:17 2021 +0200 arm: Revert partially ebd5e86c0f41dc1d692f9b2b68a510b1f6835a3e PR target/98636 gcc/ChangeLog: * optc-save-gen.awk: Put back arm_fp16_format to checked_options.
[Bug middle-end/101167] Miscompilation of task_reduction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101167 --- Comment #2 from CVS Commits --- The releases/gcc-11 branch has been updated by Jakub Jelinek : https://gcc.gnu.org/g:f50a222dffb448ef5c69a64b6945acafc6b16e12 commit r11-8643-gf50a222dffb448ef5c69a64b6945acafc6b16e12 Author: Jakub Jelinek Date: Wed Jun 23 10:03:28 2021 +0200 openmp: Fix up *_reduction clause handling with UDRs on PARM_DECLs [PR101167] The following testcase FAILs, because the UDR combiner is invoked incorrectly. lower_omp_rec_clauses expects that when it sets DECL_VALUE_EXPR/DECL_HAS_VALUE_EXPR_P for both the placeholder and the var that everything will be properly regimplified, but as the variable in question is a PARM_DECL rather than VAR_DECL, lower_omp_regimplify_p doesn't say that it should be regimplified and so it is not. 2021-06-23 Jakub Jelinek PR middle-end/101167 * omp-low.c (lower_omp_regimplify_p): Regimplify also PARM_DECLs and RESULT_DECLs that have DECL_HAS_VALUE_EXPR_P set. * testsuite/libgomp.c-c++-common/task-reduction-15.c: New test. (cherry picked from commit 679506c3830ea1a93c755413609bfac3538e2cbd)
[Bug inline-asm/100785] [9/10/11/12 Regression] ICE: in expand_asm_stmt with "m" and bitfield
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100785 --- Comment #9 from CVS Commits --- The releases/gcc-11 branch has been updated by Jakub Jelinek : https://gcc.gnu.org/g:b6e4453172e6502318d31517b7d3771b157ae71a commit r11-8642-gb6e4453172e6502318d31517b7d3771b157ae71a Author: Jakub Jelinek Date: Mon Jun 21 13:30:42 2021 +0200 inline-asm: Fix ICE with bitfields in "m" operands [PR100785] Bitfields, while they live in memory, aren't something inline-asm can easily operate on. For C and "=m" or "+m", we were diagnosing bitfields in the past in the FE, where c_mark_addressable had: case COMPONENT_REF: if (DECL_C_BIT_FIELD (TREE_OPERAND (x, 1))) { error ("cannot take address of bit-field %qD", TREE_OPERAND (x, 1)); return false; } but that check got moved in GCC 6 to build_unary_op instead and now we emit an error during expansion and ICE afterwards (i.e. error-recovery). For "m" it used to be diagnosed in c_mark_addressable too, but since GCC 6 it is ice-on-invalid. For C++, this was never diagnosed in the FE, but used to be diagnosed in the gimplifier and/or during expansion before 4.8. The following patch does multiple things: 1) diagnoses it in the FEs 2) simplifies during expansion the inline asm if any errors have been reported (similarly how e.g. vregs pass if it detects errors on inline-asm either deletes them or simplifies to bare minimum - just labels), so that we don't have error-recovery ICEs there 2021-06-11 Jakub Jelinek PR inline-asm/100785 gcc/ * cfgexpand.c (expand_asm_stmt): If errors are emitted, remove all inputs, outputs and clobbers from the asm and set template to "". gcc/c/ * c-typeck.c (c_mark_addressable): Diagnose trying to make bit-fields addressable. gcc/cp/ * typeck.c (cxx_mark_addressable): Diagnose trying to make bit-fields addressable. gcc/testsuite/ * c-c++-common/pr100785.c: New test. (cherry picked from commit 644c2cc5f2c09506a7bfef293a7f90efa8d7e5fa)
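For code that trips over the new diagnostic, one possible workaround is to route the bit-field through an addressable temporary. This is a sketch assuming GCC-style extended asm; the struct, the function names, and the empty asm template are illustrative and not taken from the PR's testcase.

```cpp
#include <cassert>

struct Packet {
    unsigned ready : 1;
    unsigned count : 7;
};

// The form the patch diagnoses would pass the bit-field directly:
//   __asm__ volatile("" : "+m"(p.count));
// A bit-field has no address, so copy it to a plain variable first.
unsigned pass_count_through_asm(Packet &p) {
    unsigned tmp = p.count;            // addressable copy of the bit-field
    __asm__ volatile("" : "+m"(tmp));  // empty template, memory operand only
    p.count = tmp;                     // write the value back
    return p.count;
}

void check_bitfield_asm() {
    Packet p = {1u, 42u};
    assert(pass_count_through_asm(p) == 42u);
    assert(p.ready == 1u);
}
```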
[Bug c++/101180] [12 Regression] Rejected code since r12-299-ga0fdff3cf33f7284
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101180 Martin Liška changed: What|Removed |Added Last reconfirmed||2021-06-23 Known to work||11.1.0 Target Milestone|--- |12.0 Known to fail||12.0 Status|UNCONFIRMED |NEW Ever confirmed|0 |1
[Bug c++/101180] New: [12 Regression] Rejected code since r12-299-ga0fdff3cf33f7284
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101180 Bug ID: 101180 Summary: [12 Regression] Rejected code since r12-299-ga0fdff3cf33f7284 Product: gcc Version: 12.0 Status: UNCONFIRMED Keywords: rejects-valid Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: marxin at gcc dot gnu.org CC: jason at gcc dot gnu.org Target Milestone: --- The code is reduced from Skia (part of chromium package): $ cat enc.ii #pragma GCC target "avx" template struct Simd {}; #pragma GCC push_options #pragma GCC target "avx,avx2,bmi,bmi2,fma,f16c" template using Full256 = Simd; template struct BitCastFromInteger256; template <> struct BitCastFromInteger256 { __attribute__((always_inline)) float operator()(long) { return .0f; } }; long BitCastFromByte_v_0; template void BitCastFromByte(Full256) { T{BitCastFromInteger256()(BitCastFromByte_v_0)}; } template void BitCast(T d, FromT) { BitCastFromByte(d); } int EstimateEntropy___trans_tmp_3; void EstimateEntropy() { Simd df; BitCast(df, EstimateEntropy___trans_tmp_3); } #pragma GCC pop_options $ g++ enc.ii -c enc.ii: In function ‘void BitCastFromByte(Full256) [with T = float]’: enc.ii:8:40: error: inlining failed in call to ‘always_inline’ ‘float BitCastFromInteger256::operator()(long int)’: target specific option mismatch 8 | __attribute__((always_inline)) float operator()(long) { return .0f; } |^~~~ enc.ii:12:31: note: called from here 12 | T{BitCastFromInteger256()(BitCastFromByte_v_0)}; | ~~^
[Bug c++/86439] CTAD with deleted copy constructor fails due to deduction-guide taking by value
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86439 --- Comment #2 from CVS Commits --- The master branch has been updated by Patrick Palka : https://gcc.gnu.org/g:3eecc1db4c691a87ef4a229d059aa863066d9a1c commit r12-1744-g3eecc1db4c691a87ef4a229d059aa863066d9a1c Author: Patrick Palka Date: Wed Jun 23 08:24:34 2021 -0400 c++: CTAD and deduction guide selection [PR86439] During CTAD, we select the best viable deduction guide using build_new_function_call, which performs overload resolution on the set of candidate guides and then forms a call to the guide. As the PR points out, this latter step is unnecessary and occasionally incorrect since a call to the selected guide may be ill-formed, or forming the call may have side effects such as prematurely deducing the type of a {}. So this patch introduces a specialized subroutine based on build_new_function_call that stops short of building a call to the selected function, and makes do_class_deduction use this subroutine instead. And since a call is no longer built, do_class_deduction doesn't need to set tf_decltype or cp_unevaluated_operand anymore. This change causes us to reject some container CTAD examples in the libstdc++ testsuite due to deduction failure for {}, which AFAICT is the correct behavior. Previously in e.g. the first removed example std::map{{std::pair{1, 2.0}, {2, 3.0}, {3, 4.0}}, {}}, the type of the {} would get deduced to less as a side effect of forming a call to the chosen guide template, typename _Allocator = allocator>> map(initializer_list>, _Compare = _Compare(), _Allocator = _Allocator()) -> map<_Key, _Tp, _Compare, _Allocator>; which made later overload resolution for the constructor call unambiguous. 
Now, the type of the {} remains undeduced until constructor overload resolution, and we complain about ambiguity for the two equally good constructor candidates map(initializer_list, const _Compare& = _Compare(), const allocator_type& = allocator_type()) map(initializer_list, const allocator_type&). This patch fixes these problematic container CTAD examples by giving the {} an appropriate concrete type. Two of these adjusted CTAD examples (one for std::set and one for std::multiset) end up triggering an unrelated CTAD bug on trunk, PR101174, so these two adjusted examples are commented out for now. PR c++/86439 gcc/cp/ChangeLog: * call.c (print_error_for_call_failure): Constify 'args' parameter. (perform_dguide_overload_resolution): Define. * cp-tree.h: (perform_dguide_overload_resolution): Declare. * pt.c (do_class_deduction): Use perform_dguide_overload_resolution instead of build_new_function_call. Don't use tf_decltype or set cp_unevaluated_operand. Remove unnecessary NULL_TREE tests. libstdc++-v3/ChangeLog: * testsuite/23_containers/map/cons/deduction.cc: Replace ambiguous CTAD examples. * testsuite/23_containers/multimap/cons/deduction.cc: Likewise. * testsuite/23_containers/multiset/cons/deduction.cc: Likewise. Mention one of the replaced examples is broken due to PR101174. * testsuite/23_containers/set/cons/deduction.cc: Likewise. * testsuite/23_containers/unordered_map/cons/deduction.cc: Replace ambiguous CTAD examples. * testsuite/23_containers/unordered_multimap/cons/deduction.cc: Likewise. * testsuite/23_containers/unordered_multiset/cons/deduction.cc: Likewise. * testsuite/23_containers/unordered_set/cons/deduction.cc: Likewise. gcc/testsuite/ChangeLog: * g++.dg/cpp1z/class-deduction88.C: New test. * g++.dg/cpp1z/class-deduction89.C: New test. * g++.dg/cpp1z/class-deduction90.C: New test.
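The kind of adjustment described for the testsuite ("giving the {} an appropriate concrete type") can be sketched as follows. This is an illustration rather than a copy of the adjusted libstdc++ tests, it needs C++17 for CTAD, and the choice of std::less<int>{} as the concrete comparator is an assumption based on the first example quoted in the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <type_traits>
#include <utility>

std::size_t deduced_map_size() {
    // The second argument is spelled std::less<int>{} instead of a bare {},
    // so neither guide selection nor constructor overload resolution has to
    // deduce a type for an empty braced list.
    std::map m{{std::pair{1, 2.0}, {2, 3.0}, {3, 4.0}}, std::less<int>{}};
    static_assert(std::is_same_v<decltype(m), std::map<int, double>>,
                  "expected std::map<int, double>");
    return m.size();
}

void check_ctad() {
    assert(deduced_map_size() == 3);
}
```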
[Bug c/101171] [12 Regression] ICE: tree check: expected class ‘type’, have ‘exceptional’ (error_mark) in c_expr_sizeof_expr, at c/c-typeck.c:3006
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101171 --- Comment #3 from John X --- (In reply to Richard Biener from comment #1)
> Is your GCC 11 compiler checking-enabled? I doubt it is a regression.

gcc 11 build command:
```
configure --prefix=install_path --enable-languages=c --disable-multilib
```
Platform: Ubuntu 20.04 x64
[Bug gcov-profile/80223] RFE: Exclude functions from profile instrumentation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80223 --- Comment #17 from Martin Liška --- All right, similarly to the sanitizer flags, I sent a patch that prevents inlining when -fprofile-generate is used: https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573511.html Note that one typically uses the attribute when a function is hot and would delay instrumentation a lot. That's why we don't want the function to be inlined. Moreover, each hot function excluded from instrumentation should likely be decorated with the 'hot' attribute.
[Bug tree-optimization/101179] New: y % (x ? 16 : 4) and y % (4 << (2 * (bool)x)) produce different code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101179 Bug ID: 101179 Summary: y % (x ? 16 : 4) and y % (4 << (2 * (bool)x)) produce different code Product: gcc Version: 12.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: redi at gcc dot gnu.org Target Milestone: ---
These produce different assembly:
int f1(int y) { const bool x = y % 100 == 0; return y % (x ? 16 : 4) == 0; }
int f2(int y) { const bool x = y % 100 == 0; return y % (4 << (x * 2)) == 0; }
Since they do the same calculation, I would expect them to produce the same code. Currently f1 produces slightly smaller code for aarch64 and x86_64. With Clang they produce the same code (but using cmov, which might not be optimal).
[Bug target/101175] builtin_clz generates wrong bsr instruction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101175 --- Comment #5 from CVS Commits --- The master branch has been updated by Uros Bizjak : https://gcc.gnu.org/g:1e16f2b472c7d253d564556a048dc4ae16119c00 commit r12-1743-g1e16f2b472c7d253d564556a048dc4ae16119c00 Author: Uros Bizjak Date: Wed Jun 23 12:50:53 2021 +0200 i386: Prevent unwanted combine from LZCNT to BSR [PR101175] The current RTX pattern for BSR allows combine pass to convert LZCNT insn to BSR. Note that the LZCNT has a defined behavior to return the operand size when operand is zero, where BSR has not. Add a BSR specific setting of zero-flag to RTX pattern of BSR insn in order to avoid matching unwanted combinations. 2021-06-23 Uroš Bizjak gcc/ PR target/101175 * config/i386/i386.md (bsr_rex64): Add zero-flag setting RTX. (bsr): Ditto. (*bsrhi): Remove. (clz2): Update RTX pattern for additions. gcc/testsuite/ PR target/101175 * gcc.target/i386/pr101175.c: New test.
[Bug middle-end/101170] [12 Regression] ICE in df_ref_record building libgomp for ColdFire
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101170 Jakub Jelinek changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |jakub at gcc dot gnu.org --- Comment #5 from Jakub Jelinek --- Created attachment 51056 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51056&action=edit gcc12-pr101170.patch Untested fix.
Re: RISC-V: Parsing custom extension that is version 0
In our fork of gcc we go from "xpulpv0" to "xpulpv3". Technically, the versioning was not done 100% correctly (since some changes didn't require a major version bump), but either way I hit this issue when porting our patches to a newer gcc. Currently, I work around it with an additional check.

Robert

On 6/23/21 11:19 AM, Kito Cheng wrote:
> Hi Robert:
>
> My assumption is that the version should never be 0.0 (at least 0.1), so it is treated as 2p0, but I didn't check whether the input is really 0p0 or 0; that's the kind of bug we need to fix.
>
> And I am not familiar with the PULP stuff. Does it mean PULP really uses version 0.0, and intends to implement multiple versions of that in GCC?
>
> On Mon, Jun 21, 2021 at 10:07 AM Robert Balas via Gcc-bugs wrote:
>>
>> When giving gcc a -march string with a custom extension of version 0 (for example pulpv0), gcc will assign it the default version of 2p0.
>>
>> In gcc/common/config/riscv/riscv-common.c the function riscv_subset_list::parsing_subset_version falls back to the default version (2p0) when parsing if the major and minor version are both zero (which is the case for the string "pulpv0"). This means both "pulpv0" and "pulpv2" will get assigned the version 2p0. Looks wrong to me.
>>
>> Robert
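The distinction at stake, "no version given" versus "version 0 given explicitly", can be sketched with a small standalone parser. This is not the code in riscv-common.c: parse_extension and ExtVersion are hypothetical, the accepted syntax is assumed to be a name followed by an optional major number and an optional p-separated minor number, and the 2p0 default is taken from the thread.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

struct ExtVersion {
    std::string name;
    int major_version;
    int minor_version;
    bool explicit_version;  // true when the string carried its own version
};

static bool is_digit_at(const std::string &s, std::size_t i) {
    return std::isdigit(static_cast<unsigned char>(s[i])) != 0;
}

// Hypothetical parser: only a missing version falls back to the default 2p0.
ExtVersion parse_extension(const std::string &ext) {
    const std::size_t end = ext.size();
    std::size_t pos = end;
    while (pos > 0 && is_digit_at(ext, pos - 1))
        --pos;  // skip the trailing run of digits
    if (pos == end)
        return {ext, 2, 0, false};  // no digits at all: use the default
    const int last = std::stoi(ext.substr(pos));
    // <major>p<minor>: a 'p' directly before the digits, preceded by a digit.
    if (pos >= 2 && ext[pos - 1] == 'p' && is_digit_at(ext, pos - 2)) {
        const std::size_t major_end = pos - 1;
        std::size_t major_begin = major_end;
        while (major_begin > 0 && is_digit_at(ext, major_begin - 1))
            --major_begin;
        const int major = std::stoi(ext.substr(major_begin, major_end - major_begin));
        return {ext.substr(0, major_begin), major, last, true};
    }
    return {ext.substr(0, pos), last, 0, true};  // major only, minor is 0
}

void check_parse() {
    const ExtVersion v0 = parse_extension("xpulpv0");
    assert(v0.name == "xpulpv" && v0.major_version == 0 && v0.explicit_version);
    const ExtVersion none = parse_extension("xpulpv");
    assert(none.major_version == 2 && none.minor_version == 0 && !none.explicit_version);
    const ExtVersion full = parse_extension("xpulpv2p1");
    assert(full.name == "xpulpv" && full.major_version == 2 && full.minor_version == 1);
}
```

Tracking an explicit flag, rather than testing for major == 0 and minor == 0, is what keeps "xpulpv0" and "xpulpv2" from collapsing into the same version.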
[Bug middle-end/101170] [12 Regression] ICE in df_ref_record building libgomp for ColdFire
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101170 --- Comment #4 from Jakub Jelinek --- Slightly cleaned up testcase:
struct S s;
__builtin_va_list ap;
int i;
long long l;
struct S { int a; int b[]; };
struct S foo (int x) { struct S a = {}; do if (x) return a; while (1); }
void bar (void) { for (; i; i++) l |= __builtin_va_arg (ap, long long) << s.b[i]; if (l) foo (l); }
In fwprop1 we have:
(insn 81 80 82 7 (set (subreg:SI (reg:DI 35 [ _5 ]) 0) (reg:SI 65)) "pr101170.c":22:7 56 {*movsi_cf} (expr_list:REG_DEAD (reg:SI 65) (nil)))
(insn 82 81 44 7 (set (subreg:SI (reg:DI 35 [ _5 ]) 4) (reg:SI 66 [+4 ])) "pr101170.c":22:7 56 {*movsi_cf} (expr_list:REG_DEAD (reg:SI 66 [+4 ]) (nil)))
(Note that m68k is big endian.) Eventually we have:
(debug_insn 65 64 66 10 (var_location:SI x (subreg:SI (reg:DI 35 [ _5 ]) 4)) "pr101170.c":24:5 -1 (nil))
During cprop1 we get:
(debug_insn 84 80 81 8 (var_location:DI D#1 (clobber (const_int 0 [0]))) -1 (nil))
(insn 81 84 83 8 (set (subreg:SI (reg:DI 35 [ _5 ]) 0) (reg:SI 65)) "pr101170.c":22:7 56 {*movsi_cf} (expr_list:REG_DEAD (reg:SI 65) (nil)))
(debug_insn 83 81 82 8 (var_location:DI D#1 (subreg:DI (reg:SI 66 [+4 ]) 0)) -1 (nil))
(insn 82 83 44 8 (set (subreg:SI (reg:DI 35 [ _5 ]) 4) (reg:SI 66 [+4 ])) "pr101170.c":22:7 56 {*movsi_cf} (expr_list:REG_DEAD (reg:SI 66 [+4 ]) (nil)))
...
(debug_insn 65 64 66 12 (var_location:SI x (subreg:SI (debug_expr:DI D#1) 4)) "pr101170.c":24:5 -1 (nil))
out of this, which is very strange, because pseudo DI 35, which D#1 supposedly stands for, consists not just of pseudo 66 (that is only the lower half) but also of pseudo 65 (the upper half). So we possibly have wrong-debug somewhere. Anyway, a paradoxical subreg of a reg is generally not wrong. Later on, reload puts pseudo 66 into the %d0 register, which shouldn't be wrong either, since reload decisions shouldn't be affected by debug insns. At the end of reload (LRA?), df is computed and the new assertion doesn't like that subreg, where regno becomes -1 (but unsigned).
[Bug tree-optimization/101178] SLP permute propagation doesn't handle VEC_PERM
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101178 Richard Biener changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org Ever confirmed|0 |1 CC||tnfchris at gcc dot gnu.org Keywords||missed-optimization Last reconfirmed||2021-06-23 Status|UNCONFIRMED |ASSIGNED
[Bug tree-optimization/101178] New: SLP permute propagation doesn't handle VEC_PERM
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101178 Bug ID: 101178 Summary: SLP permute propagation doesn't handle VEC_PERM Product: gcc Version: 12.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: rguenth at gcc dot gnu.org Target Milestone: --- The current permute propagation code simply treats VEC_PERM nodes as materialization points (they can consume incoming permutes) but it does neither handle them as sources for permutes nor does it consider propagating a common source permute through itself. The latter can be seen for double x[2], y[2], z[2], w[2]; void foo () { double tem0 = x[1] + y[1]; double tem1 = x[0] - y[0]; double tem2 = z[1] * tem0; double tem3 = z[0] * tem1; z[0] = tem2 - w[0]; z[1] = tem3 + w[1]; } where we do not end up materializing the x[], y[] and w[] permute at the last +- node but instead materialize at the first +- node and thus end up with incoming permute differences at the second +- one: [local count: 1073741824]: _21 = [1] + 18446744073709551608; vect__3.9_22 = MEM [(double *)_21]; _1 = x[1]; _23 = [1] + 18446744073709551608; vect__4.12_24 = MEM [(double *)_23]; vect_tem1_13.14_26 = vect__3.9_22 - vect__4.12_24; vect_tem0_12.13_25 = vect__3.9_22 + vect__4.12_24; _27 = VEC_PERM_EXPR ; _2 = y[1]; tem0_12 = _1 + _2; _3 = x[0]; _4 = y[0]; tem1_13 = _3 - _4; _18 = [1] + 18446744073709551608; vect__5.5_19 = MEM [(double *)_18]; vect__6.6_20 = VEC_PERM_EXPR ; vect_tem2_14.15_28 = vect__6.6_20 * _27; _5 = z[1]; tem2_14 = _5 * tem0_12; _6 = z[0]; tem3_15 = _6 * tem1_13; vect__7.18_29 = MEM [(double *)]; vect__10.20_31 = vect_tem2_14.15_28 + vect__7.18_29; vect__8.19_30 = vect_tem2_14.15_28 - vect__7.18_29; _32 = VEC_PERM_EXPR ; _7 = w[0]; _8 = tem2_14 - _7; _9 = w[1]; _10 = _9 + tem3_15; MEM [(double *)] = _32; The permute vect__6.6_20 = VEC_PERM_EXPR could have been elided.
[Bug tree-optimization/101105] [11/12 Regression] wrong code at -O3 on x86_64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101105 --- Comment #5 from Richard Biener --- Created attachment 51055 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51055=edit patch So like this.
[Bug tree-optimization/101105] [11/12 Regression] wrong code at -O3 on x86_64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101105 rsandifo at gcc dot gnu.org changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |rsandifo at gcc dot gnu.org --- Comment #4 from rsandifo at gcc dot gnu.org --- Mine.
[Bug tree-optimization/101105] [11/12 Regression] wrong code at -O3 on x86_64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101105 --- Comment #3 from Richard Biener --- So what happens is that vect_compile_time_alias fails to perform the offset adjustment for the negative step DR #(Data Ref: # bb: 3 # stmt: b[g_40][0] = 0; # ref: b[g_40][0]; # base_object: b; # Access function 0: 0 # Access function 1: {3, +, -1}_1 which is done via /* For negative step, we need to adjust address range by TYPE_SIZE_UNIT bytes, e.g., int a[3] -> a[1] range is [a+4, a+16) instead of [a, a+12) */ if (tree_int_cst_compare (DR_STEP (a->dr), size_zero_node) < 0) { const_length_a = (-wi::to_poly_wide (segment_length_a)).force_uhwi (); offset_a -= const_length_a; } since we zero segment_length_a because of ignore_step_p. But that adjustment cannot be ignored. I suppose we need to track a separate "offset segment length" for this purpose?
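The arithmetic behind the quoted source comment can be sketched as follows. This is one reading of the example (accesses a[3], a[2], a[1] of an int array with 4-byte elements), not the vectorizer's actual code; the struct and function names are hypothetical.

```c
#include <stddef.h>

/* Half-open byte range [low, high) relative to the array base.  */
struct byte_range { ptrdiff_t low; ptrdiff_t high; };

/* Bytes touched by `niters` accesses of `access_size` bytes each, the
   first at byte offset `start` and each following one `step` bytes
   further on.  Assumes niters >= 1.  */
static struct byte_range accessed_range(ptrdiff_t start, ptrdiff_t step,
                                        ptrdiff_t niters, ptrdiff_t access_size)
{
    ptrdiff_t last = start + step * (niters - 1);
    struct byte_range r;
    /* With a negative step the last access has the lowest address, so
       the range has to be shifted down accordingly.  */
    r.low = step < 0 ? last : start;
    r.high = (step < 0 ? start : last) + access_size;
    return r;
}
```

With start 12, step -4 and three iterations this gives [4, 16), matching the [a+4, a+16) in the comment, while the same three accesses with a positive step from offset 0 give [0, 12). Dropping the step information therefore loses the downward shift of the range.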
Re: RISC-V: Parsing custom extension that is version 0
The gcc-bugs mailing list is for automated mails from our Bugzilla database. Bug reports should be entered into Bugzilla, and discussions should happen in Bugzilla or on a more appropriate mailing list (because most GCC devs do not routinely read the gcc-bugs mails).
Re: RISC-V: Parsing custom extension that is version 0
Hi Robert: My assumption was that the version should never be 0.0 (it should be at least 0.1), so 0.0 is treated as the default 2p0. But I didn't check whether the input really is 0p0 or 0; that's the kind of bug we need to fix. I am not familiar with PULP: does it really use version 0.0, and do you intend to implement multiple versions of it in GCC? On Mon, Jun 21, 2021 at 10:07 AM Robert Balas via Gcc-bugs wrote: > > When giving gcc a -march string with a custom extension of > version 0 (for example pulpv0) then gcc will assign it the > default version of 2p0. > > In gcc/common/config/riscv/riscv-common.c the function > riscv_subset_list::parsing_subset_version falls back to the > default version (2p0) when parsing if the major and minor version > are both zero (which is the case for the string "pulpv0"). This > means both "pulpv0" and "pulpv2" will get assigned the version > 2p0. Looks wrong to me. > > Robert
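The fallback described in this thread can be sketched in C. This is not the code from riscv-common.c; the types and function names are hypothetical. The second function shows one possible way to tell an explicit version 0 apart from a missing version, which is an assumption about a fix rather than what GCC does.

```c
#include <stdbool.h>

struct ext_version { int major; int minor; };

/* Default version mentioned in the thread (2p0).  */
static const struct ext_version DEFAULT_VERSION = { 2, 0 };

/* Behaviour as described in the report: a parsed 0.0 cannot be told
   apart from "no version given", so it becomes the default.  */
static struct ext_version resolve_conflating(int major, int minor)
{
    if (major == 0 && minor == 0)
        return DEFAULT_VERSION;
    struct ext_version v = { major, minor };
    return v;
}

/* Possible alternative: record whether any version digits were seen
   and fall back to the default only when none were.  */
static struct ext_version resolve_explicit(bool version_given,
                                           int major, int minor)
{
    if (!version_given)
        return DEFAULT_VERSION;
    struct ext_version v = { major, minor };
    return v;
}
```

In the first variant "pulpv0" and "pulpv2" both end up as 2p0; in the second, "pulpv0" keeps major version 0.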
[Bug middle-end/101172] [11/12 Regression] ICE Segmentation fault
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101172 --- Comment #4 from Jakub Jelinek --- Created attachment 51054 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51054=edit gcc12-pr101172.patch Untested fix.
[Bug middle-end/101172] [11/12 Regression] ICE Segmentation fault
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101172 Jakub Jelinek changed: What|Removed |Added Component|c |middle-end Status|NEW |ASSIGNED
[Bug c++/98401] Temporaries passed to co_await sometimes cause an extraneous call to destructor at incorrect address
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98401 Victor Burckel changed: What|Removed |Added CC||victor.burckel at gmail dot com --- Comment #4 from Victor Burckel --- I'm also seeing the same behavior, destructor of lambda captures seems to get called twice https://godbolt.org/z/zxnhM3x47
[Bug c++/99576] [coroutines] destructor of a temporary called too early within co_await expression
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99576 --- Comment #4 from Victor Burckel --- I'm also seeing the same behavior, destructor of lambda captures seems to get called twice https://godbolt.org/z/zxnhM3x47
[Bug target/101175] builtin_clz generates wrong bsr instruction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101175 --- Comment #4 from Mikael Pettersson --- (In reply to Uroš Bizjak from comment #3) > (In reply to Mikael Pettersson from comment #2) > > (In reply to Iru Cai from comment #0) > > > Built with '-march=x86-64-v3 -O1', the following code generates a bsr > > > instruction, which has undefined behavior when the source operand is zero, > > > thus gives wrong result > > > > The documentation for __builtin_clz(x) states "If x is 0, the result is > > undefined". > > The testcase from Comment #0 does: > > if (value != 0) { > return __builtin_clz(value); Yes I just noticed. My mistake.
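For reference, the guarded pattern from the testcase looks roughly like this. It is a minimal sketch: the function name and the choice to return the bit width for a zero input are assumptions, not taken from the report.

```c
#include <limits.h>

/* Count leading zeros of `value`.  __builtin_clz (a GCC/Clang builtin)
   has an undefined result for 0, so the call is guarded and 0 is mapped
   to the full bit width instead.  */
static int clz_or_width(unsigned int value)
{
    if (value != 0)
        return __builtin_clz(value);
    return (int)(sizeof(unsigned int) * CHAR_BIT);
}
```

Because the builtin is reached only for nonzero inputs, a wrong result from such code would point at the compiler rather than at the caller's use of the builtin.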
[Bug target/101175] builtin_clz generates wrong bsr instruction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101175 --- Comment #3 from Uroš Bizjak --- (In reply to Mikael Pettersson from comment #2) > (In reply to Iru Cai from comment #0) > > Built with '-march=x86-64-v3 -O1', the following code generates a bsr > > instruction, which has undefined behavior when the source operand is zero, > > thus gives wrong result > > The documentation for __builtin_clz(x) states "If x is 0, the result is > undefined". The testcase from Comment #0 does: if (value != 0) { return __builtin_clz(value);
[Bug target/101175] builtin_clz generates wrong bsr instruction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101175 --- Comment #2 from Mikael Pettersson --- (In reply to Iru Cai from comment #0) > Built with '-march=x86-64-v3 -O1', the following code generates a bsr > instruction, which has undefined behavior when the source operand is zero, > thus gives wrong result The documentation for __builtin_clz(x) states "If x is 0, the result is undefined".
[Bug target/99488] dwz: /usr/lib/gcc/mips64el-linux-gnuabi64/11/go1: Found two copies of .debug_line_str section
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99488 --- Comment #16 from Jakub Jelinek --- Are you sure about the .. in one of the zdebug section names?
[Bug fortran/100337] Should be able to pass non-present optional arguments to CO_BROADCAST
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100337 --- Comment #4 from Andre Vehreschild --- Waiting two weeks before backporting to gcc-11.
[Bug fortran/100337] Should be able to pass non-present optional arguments to CO_BROADCAST
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100337 --- Comment #3 from CVS Commits --- The master branch has been updated by Andre Vehreschild : https://gcc.gnu.org/g:da13e4ebebb07a47d5fb50eab8893f8fe38683df commit r12-1741-gda13e4ebebb07a47d5fb50eab8893f8fe38683df Author: Andre Vehreschild Date: Wed Jun 23 10:09:29 2021 +0200 fortran: Fix deref of optional in gen. code. [PR100337] gcc/fortran/ChangeLog: PR fortran/100337 * trans-intrinsic.c (conv_co_collective): Check stat for null ptr before dereferrencing. gcc/testsuite/ChangeLog: PR fortran/100337 * gfortran.dg/coarray_collectives_17.f90: New test.
[Bug middle-end/101170] [12 Regression] ICE in df_ref_record building libgomp for ColdFire
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101170 Martin Liška changed: What|Removed |Added Status|UNCONFIRMED |NEW CC||marxin at gcc dot gnu.org Ever confirmed|0 |1 Last reconfirmed||2021-06-23 --- Comment #3 from Martin Liška --- Yes, happens with the current master. Reduced test-case: $ cat pr101170.c struct gomp_doacross_work_share GOMP_doacross_ull_wait_ws_1_0; __builtin_va_list GOMP_doacross_ull_wait_ap; int GOMP_doacross_ull_wait_i; long long GOMP_doacross_ull_wait_flattened; struct gomp_doacross_work_share { int ncounts; int shift_counts[]; }; struct gomp_doacross_work_share doacross_spin(int expected) { struct gomp_doacross_work_share a; do if (expected) return a; while (1); } void GOMP_doacross_ull_wait() { for (; GOMP_doacross_ull_wait_i; GOMP_doacross_ull_wait_i++) GOMP_doacross_ull_wait_flattened |= __builtin_va_arg(GOMP_doacross_ull_wait_ap, long long) << GOMP_doacross_ull_wait_ws_1_0.shift_counts[GOMP_doacross_ull_wait_i]; if (GOMP_doacross_ull_wait_flattened) doacross_spin(GOMP_doacross_ull_wait_flattened); }
[Bug middle-end/101167] Miscompilation of task_reduction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101167 --- Comment #1 from CVS Commits --- The master branch has been updated by Jakub Jelinek : https://gcc.gnu.org/g:679506c3830ea1a93c755413609bfac3538e2cbd commit r12-1740-g679506c3830ea1a93c755413609bfac3538e2cbd Author: Jakub Jelinek Date: Wed Jun 23 10:03:28 2021 +0200 openmp: Fix up *_reduction clause handling with UDRs on PARM_DECLs [PR101167] The following testcase FAILs, because the UDR combiner is invoked incorrectly. lower_omp_rec_clauses expects that when it sets DECL_VALUE_EXPR/DECL_HAS_VALUE_EXPR_P for both the placeholder and the var that everything will be properly regimplified, but as the variable in question is a PARM_DECL rather than VAR_DECL, lower_omp_regimplify_p doesn't say that it should be regimplified and so it is not. 2021-06-23 Jakub Jelinek PR middle-end/101167 * omp-low.c (lower_omp_regimplify_p): Regimplify also PARM_DECLs and RESULT_DECLs that have DECL_HAS_VALUE_EXPR_P set. * testsuite/libgomp.c-c++-common/task-reduction-15.c: New test.
[Bug c/101172] [11/12 Regression] ICE Segmentation fault
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101172 Andrew Pinski changed: What|Removed |Added Severity|normal |minor Target Milestone|--- |11.2 Summary|ICE Segmentation fault |[11/12 Regression] ICE ||Segmentation fault
[Bug target/99488] dwz: /usr/lib/gcc/mips64el-linux-gnuabi64/11/go1: Found two copies of .debug_line_str section
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99488 --- Comment #15 from Andrew Pinski --- (In reply to YunQiang Su from comment #14) > The problem seems to be due to some problem with LTO. So if I understand correctly, this binutils patch fixes the issue? If so, please close this bug as moved, open a binutils bug, and submit the patch there.
[Bug tree-optimization/101173] [9/10/11/12 Regression] wrong code at -O3 on x86_64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101173 --- Comment #4 from Richard Biener --- Created attachment 51053 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51053=edit patch For reference this is the patch that completed bootstrap & regtest on x86_64-unknown-linux-gnu.
[Bug c/101172] ICE Segmentation fault
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101172 Martin Liška changed: What|Removed |Added CC||jakub at gcc dot gnu.org, ||marxin at gcc dot gnu.org --- Comment #3 from Martin Liška --- Btw. started with r11-2225-ge4f1cbc35b1e823a.
[Bug c/101171] [12 Regression] ICE: tree check: expected class ‘type’, have ‘exceptional’ (error_mark) in c_expr_sizeof_expr, at c/c-typeck.c:3006
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101171 Martin Liška changed: What|Removed |Added Last reconfirmed||2021-06-23 Ever confirmed|0 |1 Status|UNCONFIRMED |NEW CC||jsm28 at gcc dot gnu.org, ||marxin at gcc dot gnu.org --- Comment #2 from Martin Liška --- Started with r10-5922-g3d77686d2eddf76d.
[Bug target/99488] dwz: /usr/lib/gcc/mips64el-linux-gnuabi64/11/go1: Found two copies of .debug_line_str section
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99488 --- Comment #14 from YunQiang Su --- The problem seems to be due to some problem with LTO.
Index: binutils-2.36.50.20210618/bfd/elfxx-mips.c
===
--- binutils-2.36.50.20210618.orig/bfd/elfxx-mips.c
+++ binutils-2.36.50.20210618/bfd/elfxx-mips.c
@@ -7448,7 +7448,9 @@ _bfd_mips_elf_section_from_shdr (bfd *ab
       break;
     case SHT_MIPS_DWARF:
       if (! startswith (name, ".debug_")
-          && ! startswith (name, ".zdebug_"))
+          && ! startswith (name, ".gnu.debuglto_.debug_")
+          && ! startswith (name, ".zdebug_")
+          && ! startswith (name, ".gnu.debuglto_..zdebug_"))
         return false;
       break;
     case SHT_MIPS_SYMBOL_LIB:
@@ -7669,7 +7671,9 @@ _bfd_mips_elf_fake_sections (bfd *abfd,
       hdr->sh_entsize = sizeof (Elf_External_ABIFlags_v0);
     }
   else if (startswith (name, ".debug_")
-           || startswith (name, ".zdebug_"))
+           || startswith (name, ".gnu.debuglto_.debug_")
+           || startswith (name, ".zdebug_")
+           || startswith (name, ".gnu.debuglto_.zdebug_"))
     {
       hdr->sh_type = SHT_MIPS_DWARF;
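The prefix test that the patch extends can be sketched as a standalone helper. This is an illustration only: has_prefix and looks_like_dwarf_section are hypothetical names, not binutils functions, and the prefix list assumes the single-dot spelling ".gnu.debuglto_.zdebug_" from the second hunk, which is itself an open question in this bug.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if `name` begins with `prefix`.  */
static bool has_prefix(const char *name, const char *prefix)
{
    return strncmp(name, prefix, strlen(prefix)) == 0;
}

/* True if `name` matches one of the debug section prefixes that the
   patch accepts for SHT_MIPS_DWARF (spellings are an assumption).  */
static bool looks_like_dwarf_section(const char *name)
{
    static const char *const prefixes[] = {
        ".debug_",
        ".zdebug_",
        ".gnu.debuglto_.debug_",
        ".gnu.debuglto_.zdebug_",
    };
    for (size_t i = 0; i < sizeof prefixes / sizeof prefixes[0]; i++)
        if (has_prefix(name, prefixes[i]))
            return true;
    return false;
}
```

Keeping the prefixes in one table would make a spelling difference between the two call sites harder to introduce, though that is a design suggestion and not part of the posted patch.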
[Bug target/101175] builtin_clz generates wrong bsr instruction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101175 Uroš Bizjak changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |ubizjak at gmail dot com Last reconfirmed||2021-06-23 Status|UNCONFIRMED |ASSIGNED Ever confirmed|0 |1 --- Comment #1 from Uroš Bizjak --- Created attachment 51052 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51052=edit Proposed patch Patch enhances BSR insn pattern with ZF setting to prevent unwanted combinations with LZCNT insn pattern.
[Bug tree-optimization/101173] [9/10/11/12 Regression] wrong code at -O3 on x86_64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101173 --- Comment #3 from Richard Biener --- So we're exchanging the inner two loops a[1][3] = 8; for (int b = 1; b <= 5; b++) for (int d = 0; d <= 5; d++) for (c = 0; c <= 5; c++) a[b][c] = a[b][c + 2] & 216; to a[1][3] = 8; for (int b = 1; b <= 5; b++) for (c = 0; c <= 5; c++) for (int d = 0; d <= 5; d++) a[b][c] = a[b][c + 2] & 216; but that looks wrong from a dependence analysis perspective. We have (compute_affine_dependence ref_a: a[b_33][_1], stmt_a: _2 = a[b_33][_1]; ref_b: a[b_33][c.3_32], stmt_b: a[b_33][c.3_32] = _3; (analyze_overlapping_iterations (chrec_a = {2, +, 1}_5) (chrec_b = {0, +, 1}_5) (analyze_siv_subscript (analyze_subscript_affine_affine (overlaps_a = [0 + 1 * x_1]) (overlaps_b = [2 + 1 * x_1])) ) (overlap_iterations_a = [0 + 1 * x_1]) (overlap_iterations_b = [2 + 1 * x_1])) (analyze_overlapping_iterations (chrec_a = {1, +, 1}_1) (chrec_b = {1, +, 1}_1) (overlap_iterations_a = [0]) (overlap_iterations_b = [0])) (analyze_overlapping_iterations (chrec_a = {0, +, 1}_5) (chrec_b = {2, +, 1}_5) (analyze_siv_subscript (analyze_subscript_affine_affine (overlaps_a = [2 + 1 * x_1]) (overlaps_b = [0 + 1 * x_1])) ) (overlap_iterations_a = [2 + 1 * x_1]) (overlap_iterations_b = [0 + 1 * x_1])) (analyze_overlapping_iterations (chrec_a = {1, +, 1}_1) (chrec_b = {1, +, 1}_1) (overlap_iterations_a = [0]) (overlap_iterations_b = [0])) (build_classic_dist_vector dist_vector = ( 0 0 2 ) ) ) I don't see anything wrong with that at a first glance so the bug must be in tree_loop_interchange::valid_data_dependences it checks /* Be conservative, skip case if either direction at i_idx/o_idx levels is not '=' or '<'. 
*/ if (dist_vect[i_idx] < 0 || dist_vect[o_idx] < 0) return false; dist_vect is [0 0 2], i_idx 2 and o_idx 1 but I think that dist_vect[o_idx] should exclude zero, thus diff --git a/gcc/gimple-loop-interchange.cc b/gcc/gimple-loop-interchange.cc index f45b9364644..265e36c48d4 100644 --- a/gcc/gimple-loop-interchange.cc +++ b/gcc/gimple-loop-interchange.cc @@ -1043,8 +1043,8 @@ tree_loop_interchange::valid_data_dependences (unsigned i_idx, unsigned o_idx, continue; /* Be conservative, skip case if either direction at i_idx/o_idx -levels is not '=' or '<'. */ - if (dist_vect[i_idx] < 0 || dist_vect[o_idx] < 0) +levels is not '=' (for the inner loop) or '<'. */ + if (dist_vect[i_idx] < 0 || dist_vect[o_idx] <= 0) return false; } } Bin - does this analysis look sound?
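The effect of exchanging the two inner loops can be reproduced by hand, without relying on the optimizer. The sketch below writes both loop orders from this comment as separate functions. The function names and the use of a local counter c instead of the global from the testcase are choices of the sketch.

```c
#include <string.h>

enum { ROWS = 6, COLS = 9 };

/* Loop order of the original testcase: d outside, c inside.  */
static void run_original(int a[ROWS][COLS])
{
    memset(a, 0, sizeof(int) * ROWS * COLS);
    a[1][3] = 8;
    for (int b = 1; b <= 5; b++)
        for (int d = 0; d <= 5; d++)
            for (int c = 0; c <= 5; c++)
                a[b][c] = a[b][c + 2] & 216;
}

/* Loop order after the interchange: c outside, d inside.  */
static void run_interchanged(int a[ROWS][COLS])
{
    memset(a, 0, sizeof(int) * ROWS * COLS);
    a[1][3] = 8;
    for (int b = 1; b <= 5; b++)
        for (int c = 0; c <= 5; c++)
            for (int d = 0; d <= 5; d++)
                a[b][c] = a[b][c + 2] & 216;
}

/* Number of nonzero elements left in the array.  */
static int count_nonzero(int a[ROWS][COLS])
{
    int n = 0;
    for (int e = 0; e < ROWS; e++)
        for (int f = 0; f < COLS; f++)
            n += a[e][f] != 0;
    return n;
}
```

In the original order the value 8 copied into a[1][1] during the first d iteration is overwritten with 0 in the second, because a[1][3] has been cleared by then. In the interchanged order a[1][1] is written six times from the still-intact a[1][3] and is never revisited, so it stays 8. That is consistent with a distance of zero at the outer interchanged level not being safe here.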
[Bug target/101177] New: sh3: internal compiler error: Illegal instruction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101177 Bug ID: 101177 Summary: sh3: internal compiler error: Illegal instruction Product: gcc Version: 9.3.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: roland.illig at gmx dot de Target Milestone: --- $ cat lex.c int lex_input(void); int lex_character_constant(void); int lex_character_constant(void) { int c = lex_input(); if (c == 7) return c; c &= 255; return c == 0 ? -1 : c; } $ /home/rillig/builds/sh3-tools/bin/sh--netbsdelf-gcc --version sh--netbsdelf-gcc (NetBSD nb1 20200907) 9.3.0 Copyright (C) 2019 Free Software Foundation, Inc. $ /home/rillig/builds/sh3-tools/bin/sh--netbsdelf-gcc lex.c -O1 sh--netbsdelf-gcc: internal compiler error: Illegal instruction signal terminated program cc1 $ gdb --args /home/rillig/builds/sh3-tools/libexec/gcc/sh--netbsdelf/9.3.0/cc1 lex.c -O1 (gdb) r (gdb) bt #0 0x00675ddc in df_ref_create_structure(df_ref_class, df_collection_rec*, rtx_def*, rtx_def**, basic_block_def*, df_insn_info*, df_ref_type, int) () #1 0x00676b90 in df_ref_record(df_ref_class, df_collection_rec*, rtx_def*, rtx_def**, basic_block_def*, df_insn_info*, df_ref_type, int) () #2 0x00676dc9 in df_uses_record(df_collection_rec*, rtx_def**, df_ref_type, basic_block_def*, df_insn_info*, int) () #3 0x00676e91 in df_uses_record(df_collection_rec*, rtx_def**, df_ref_type, basic_block_def*, df_insn_info*, int) () #4 0x00678274 in df_insn_refs_collect(df_collection_rec*, basic_block_def*, df_insn_info*) () #5 0x0067b617 in df_insn_rescan(rtx_insn*) () #6 0x006dc3a1 in emit_pattern_after_noloc(rtx_def*, rtx_insn*, basic_block_def*, rtx_insn* (*)(rtx_def*)) () #7 0x006dc3d3 in emit_pattern_after_setloc(rtx_def*, rtx_insn*, unsigned int, rtx_insn* (*)(rtx_def*)) () #8 0x006dcc12 in emit_insn_after_setloc(rtx_def*, rtx_insn*, unsigned int) () #9 0x006dce92 in try_split(rtx_def*, rtx_insn*, int) () #10 0x006dd275 in try_split(rtx_def*, rtx_insn*, int) () #11 
0x006dd275 in try_split(rtx_def*, rtx_insn*, int) () ... #1979 0x006dd275 in try_split(rtx_def*, rtx_insn*, int) () #1980 0x006dd275 in try_split(rtx_def*, rtx_insn*, int) () #1981 0x00905d33 in split_insn(rtx_insn*) () #1982 0x00909c96 in split_all_insns() () #1983 0x00909dc3 in (anonymous namespace)::pass_split_all_insns::execute(function*) () #1984 0x008d2044 in execute_one_pass(opt_pass*) () #1985 0x008d29a8 in execute_pass_list_1(opt_pass*) () #1986 0x008d29ba in execute_pass_list_1(opt_pass*) () #1987 0x008d29e0 in execute_pass_list(function*, opt_pass*) () #1988 0x0064fe9f in cgraph_node::expand() () #1989 0x00651375 in symbol_table::compile() () #1990 0x00652c7f in symbol_table::finalize_compilation_unit() () #1991 0x0098b57d in compile_file() () #1992 0x0098da13 in toplev::main(int, char**) () #1993 0x00f36e1c in main () The code is extracted from: https://github.com/NetBSD/src/blob/3f158578dbda380f096b448cf750251299159488/usr.bin/xlint/lint1/lex.c
[Bug c/101176] New: valgrind error for c-c++-common/builtin-has-attribute.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101176 Bug ID: 101176 Summary: valgrind error for c-c++-common/builtin-has-attribute.c Product: gcc Version: 12.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: dcb314 at hotmail dot com Target Milestone: --- For the gcc testsuite file c-c++-common/builtin-has-attribute.c and a recent build of gcc trunk with valgrind, I get: ./c-c++-common/builtin-has-attribute.c:31:32: error: ‘foobar’ undeclared (first use in this function) 31 | b = __builtin_has_attribute (foobar, aligned); /* { dg-error ".foobar . \(undeclared|was not declared\)" } */ |^~ ==104199== Use of uninitialised value of size 8 ==104199== at 0x16EDF70: htab_find_slot_with_hash (hashtab.c:654) ==104199== by 0x169B7CD: get_combined_adhoc_loc(line_maps*, unsigned int, source_range, void*) (line-map.c:208) Since this valgrind error is for C code with errors in it, I think this is reduced priority.
[Bug tree-optimization/101173] [9/10/11/12 Regression] wrong code at -O3 on x86_64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101173 Richard Biener changed: What|Removed |Added Target Milestone|--- |9.5 Keywords||wrong-code CC||amker at gcc dot gnu.org Summary|wrong code at -O3 on|[9/10/11/12 Regression] |x86_64-linux-gnu|wrong code at -O3 on ||x86_64-linux-gnu Ever confirmed|0 |1 Version|unknown |12.0 Known to work||7.5.0 Priority|P3 |P2 Last reconfirmed||2021-06-23 Status|UNCONFIRMED |NEW Known to fail||8.5.0 --- Comment #2 from Richard Biener --- Confirmed. int a[6][9]; int c; int main() { a[1][3] = 8; for (int b = 1; b <= 5; b++) for (int d = 0; d <= 5; d++) for (c = 0; c <= 5; c++) a[b][c] = a[b][c + 2] & 216; for (int e = 0; e < 6; e++) for (int f = 0; f < 9; f++) if (a[e][f] != 0) __builtin_abort (); return 0; } Fails with -O -floop-interchange already.
[Bug c/101172] ICE Segmentation fault
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101172 Richard Biener changed: What|Removed |Added Summary|[12 regression] ICE |ICE Segmentation fault |Segmentation fault | --- Comment #2 from Richard Biener --- Note that "36.c:17: confused by earlier errors, bailing out" is what release builds print for an internal compiler error that happens after an error has been reported.
[Bug c/101172] [12 regression] ICE Segmentation fault
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101172 Richard Biener changed: What|Removed |Added Last reconfirmed||2021-06-23 Status|UNCONFIRMED |NEW Ever confirmed|0 |1 Keywords||error-recovery, ||ice-on-invalid-code --- Comment #1 from Richard Biener --- Confirmed. The DECL_BIT_FIELD_REPRESENTATIVE of the field is not NULL but its type is.
[Bug c/101171] [12 Regression] ICE: tree check: expected class ‘type’, have ‘exceptional’ (error_mark) in c_expr_sizeof_expr, at c/c-typeck.c:3006
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101171 Richard Biener changed: What|Removed |Added Keywords||error-recovery, ||ice-checking, ||ice-on-invalid-code --- Comment #1 from Richard Biener --- Is your GCC 11 compiler checking-enabled? I doubt it is a regression.