[Bug target/96941] New: Initial PPC64LE transcendental auto-vectorization functionality
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96941

Bug ID: 96941
Summary: Initial PPC64LE transcendental auto-vectorization functionality
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: dje at gcc dot gnu.org
Target Milestone: ---
Target: powerpc64le-*-linux

Demonstrate basic auto-vectorization of single- and double-precision
transcendental functions using libmvec.
[Bug c++/95164] [9/10/11 Regression] ICE regression starting with 9.3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95164

Marek Polacek changed:

What     |Removed |Added
Keywords |        |patch

--- Comment #4 from Marek Polacek ---
Patch posted:
https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553311.html
[Bug gcov-profile/96913] gcc-11: __gcov_merge_topn hangs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96913

--- Comment #3 from Sergei Trofimovich ---
Specifically I think this is already a wrong format on disk:

> _json.gcda:01a7: 0:COUNTERS topn 0 counts
> _json.gcda:01a9: 48:COUNTERS indirect_call 24 counts
> _json.gcda: 0: 1 1 140325305737168 1 1 140325305737200 0 0
> _json.gcda: 8: 0 0 0 0 0 0 0 0
> _json.gcda: 16: 0 0 0 0 0 0 0 0
> ...

Assuming indirect_call is in a 'hist' value format it should be in form of:
  [total_executions, N, value1, counter1, ..., valueN, counterN]

Main problem: we have more than one entry here (which might be ok):
- record1 (ok): total_executions=1 N=1 value1=140325305737168 counter1=1
- record2 (bad): total_executions=1 N=140325305737200 counter=0 ...

This is where we trip over enormous N.
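The record layout described in this comment can be checked mechanically. The following is a minimal sketch, assuming the counters are available as a flat array of 64-bit words; `first_bad_hist_record` is a hypothetical helper written for illustration and is not part of libgcov or gcov-dump.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of libgcov.  Walk a buffer laid out as
   [total_executions, N, value1, counter1, ..., valueN, counterN] records
   and return the index of the first record whose N cannot fit in the
   words that remain, or -1 if every record is plausible.  */
static long
first_bad_hist_record (const int64_t *words, size_t n_words)
{
  size_t pos = 0;
  long record = 0;

  while (pos + 2 <= n_words)
    {
      int64_t n = words[pos + 1];
      size_t remaining = n_words - (pos + 2);

      /* Each of the N entries occupies two words (value, counter).  */
      if (n < 0 || (uint64_t) n > remaining / 2)
        return record;

      pos += 2 + 2 * (size_t) n;
      record++;
    }
  return -1;
}
```

Fed the 24 words from the dump above, the walk accepts the first record (N=1) and rejects the second, where N is read as 140325305737200. A merge loop without such a bound simply keeps reading, which is consistent with the reported hang.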
[Bug d/96924] d: ICE in create_tmp_var, at gimple-expr.c:482
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96924

Iain Buclaw changed:

What       |Removed     |Added
Status     |UNCONFIRMED |RESOLVED
Resolution |---         |FIXED

--- Comment #3 from Iain Buclaw ---
Fix committed.
[Bug d/96924] d: ICE in create_tmp_var, at gimple-expr.c:482
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96924

--- Comment #2 from CVS Commits ---
The releases/gcc-10 branch has been updated by Iain Buclaw:

https://gcc.gnu.org/g:40af8b2eff82f28d83b2a5fe153cbc53af665956

commit r10-8711-g40af8b2eff82f28d83b2a5fe153cbc53af665956
Author: Iain Buclaw
Date:   Fri Sep 4 22:54:22 2020 +0200

    d: Fix ICE in create_tmp_var, at gimple-expr.c:482

    Array concatenate expressions were creating more SAVE_EXPRs than what was
    necessary.  The internal error itself was the result of a forced temporary
    being made on a TREE_ADDRESSABLE type.

    gcc/d/ChangeLog:

            PR d/96924
            * expr.cc (ExprVisitor::visit (CatAssignExp *)): Don't force
            temporaries needlessly.

    gcc/testsuite/ChangeLog:

            PR d/96924
            * gdc.dg/pr96924.d: New test.

    (cherry picked from commit 52908b8de15a1c762a73063f1162bcedfcc993b4)
[Bug d/96924] d: ICE in create_tmp_var, at gimple-expr.c:482
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96924

--- Comment #1 from CVS Commits ---
The master branch has been updated by Iain Buclaw:

https://gcc.gnu.org/g:f8eabd47ac5335ebab0d83ff61fb680a46888be8

commit r11-3015-gf8eabd47ac5335ebab0d83ff61fb680a46888be8
Author: Iain Buclaw
Date:   Fri Sep 4 22:54:22 2020 +0200

    d: Fix ICE in create_tmp_var, at gimple-expr.c:482

    Array concatenate expressions were creating more SAVE_EXPRs than what was
    necessary.  The internal error itself was the result of a forced temporary
    being made on a TREE_ADDRESSABLE type.

    gcc/d/ChangeLog:

            PR d/96924
            * expr.cc (ExprVisitor::visit (CatAssignExp *)): Don't force
            temporaries needlessly.

    gcc/testsuite/ChangeLog:

            PR d/96924
            * gdc.dg/simd13927b.d: Removed.
            * gdc.dg/pr96924.d: New test.
[Bug preprocessor/96940] ICE in linemap_compare_locations, at libcpp/line-map.c:1359
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96940

--- Comment #2 from Jan Smets ---
This is the workaround I currently have. It avoids calling min_location().

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 90111e4c786..f49019e81d0 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -11005,8 +11005,11 @@ grokdeclarator (const cp_declarator *declarator,
   if (initialized > 1)
     funcdef_flag = true;

-  location_t typespec_loc = smallest_type_location (type_quals,
+  location_t typespec_loc = smallest_type_quals_location (type_quals,
                                                       declspecs->locations);
+  // using smallest_type_quals_location() iso. smallest_type_quals_location()
+  // basically removes the usage of min_location on the result of smallest_type_quals_location().
+  // typespec_loc = min_location (typespec_loc, declspecs->locations[ds_type_spec]);
   if (typespec_loc == UNKNOWN_LOCATION)
     typespec_loc = input_location;
[Bug target/96939] LTO vs. different arm arch options
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96939 --- Comment #7 from Jakub Jelinek --- AFAIK targetm.override_options_after_change is called at the end of switching optimization (but not target) options. So, that is a good hook to e.g. adjust something cached from those non-target Optimization options. targetm.target_option.save is called during cl_target_option_save, targetm.target_option.restore during cl_target_option_restore. Then there is the targetm.set_current_function hook that is called during set_cfun, i.e. when switching functions. So, just from quick look, it seems wrong that arm targetm.override_options_after_change calls arm_configure_build_target which is a function which deals with target specific options (and furthermore it calls it with the target_option_default_node node, so the command line options rather than whatever options the function has). I see that the rationale is probably that you want TARGET_THUMB to be reliable for the arm_override_options_after_change_1, but I'd think you instead want to just call the _1 and nothing else in there, and in arm_set_current_function (perhaps at the end of it) call it again, so it is updated properly even if the target options change. And maybe also call it at the start of arm_set_current_function even when nothing changed if that isn't sufficient. I think it was a mistake to have separate OPTIMIZATION_NODE and TARGET_OPTION_NODE, in retrospect I think that causes a lot of pain and I think it would be better if there was just one that would hold both and would be updated together, then have generic code deal with those changes and afterwards a target hook. Because when they are separate, at one time only one of them changes, some hooks are run, then the other one changes and some options/cached variables etc. can be dependent on both Optimization and Target options.
[Bug target/85830] vec_popcntd is improperly defined in altivec.h
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85830

Carl Love changed:

What   |Removed  |Added
Status |RESOLVED |CLOSED

--- Comment #9 from Carl Love ---
Issue fixed, closing.
[Bug target/85830] vec_popcntd is improperly defined in altivec.h
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85830

Carl Love changed:

What       |Removed  |Added
Status     |ASSIGNED |RESOLVED
Resolution |---      |FIXED

--- Comment #8 from Carl Love ---
The fix has been applied to the current mainline and backported to GCC 10.
Closing the bug as fixed.
[Bug target/85830] vec_popcntd is improperly defined in altivec.h
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85830

--- Comment #7 from CVS Commits ---
The releases/gcc-10 branch has been updated by Carl Love:

https://gcc.gnu.org/g:e86814328251ea7da83038605df01d8def8d873a

commit r10-8710-ge86814328251ea7da83038605df01d8def8d873a
Author: Carl Love
Date:   Thu Aug 27 13:36:13 2020 -0500

    rs6000, remove improperly defined and unsupported builtins.

    gcc/ChangeLog

    2020-08-31  Carl Love

            PR target/85830
            * config/rs6000/altivec.h (vec_popcntb, vec_popcnth, vec_popcntw,
            vec_popcntd): Remove defines.
[Bug target/96939] LTO vs. different arm arch options
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96939

--- Comment #6 from Richard Earnshaw ---
(In reply to Jakub Jelinek from comment #4)
> Doesn't seem to be related to me, in the other PR everything is compiled
> with one set of options and no target attribute is involved either.

No, that's a completely different problem. The problem there is some calls to
the back-end pass a fntype and some a fndecl. The ones with a fndecl can work
out if a function is local and pick a different ABI, but the ones with only a
type cannot. So we get inconsistent results.
[Bug target/96939] LTO vs. different arm arch options
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96939 --- Comment #5 from Richard Earnshaw --- I batted my head against this when reworking the command line options stuff a couple of years back, but the documentation on how the different hooks should interact (especially for LTO and streaming) is, quite frankly, woeful. How any back-end maintainer is supposed to support this is beyond me.
[Bug target/96939] LTO vs. different arm arch options
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96939 --- Comment #4 from Jakub Jelinek --- Doesn't seem to be related to me, in the other PR everything is compiled with one set of options and no target attribute is involved either.
[Bug preprocessor/96940] ICE in linemap_compare_locations, at libcpp/line-map.c:1359
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96940 --- Comment #1 from Jan Smets --- Likely duplicate of Bug 96391 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96391) That one has a testcase for i686-w64-mingw32
[Bug target/96939] LTO vs. different arm arch options
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96939 --- Comment #3 from Andrew Pinski --- I think this is related to or a dup of bug 96882.
[Bug preprocessor/96391] [10/11 Regression] internal compiler error: in linemap_compare_locations, at libcpp/line-map.c:1359
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96391

Jan Smets changed:

What |Removed |Added
CC   |        |jan.smets at nokia dot com

--- Comment #5 from Jan Smets ---
Similar issue @ https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96935 (with bisect
to the 'last known good' version)
[Bug target/96898] [nvptx] libatomic support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96898 --- Comment #5 from Jakub Jelinek --- It wouldn't be a fallback. omp-low.c just decides if it is going to use GOMP_atomic_{start,end} synchronization, __atomic_* or __sync_* to perform the reduction. And whether that uses the same or different lock doesn't matter, because for one reduction omp-low.c will only use one way.
[Bug c++/83591] -Wduplicated-branches fires in system headers in template instantiation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83591

--- Comment #8 from Manuel López-Ibáñez ---
(In reply to Tony E Lewis from comment #7)
> Manuel López-Ibáñez: are you happy that all underlying issues are resolved
> and this can be closed?

Sure.
[Bug target/96898] [nvptx] libatomic support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96898

--- Comment #4 from Tom de Vries ---
(In reply to Jakub Jelinek from comment #3)
> For OpenMP reductions, we really don't care what kind of mutex protects the
> updates, as long as it is the same for all updates of the same reduction.
> I believe we don't rely on any other synchronization effects.
> So, I think we should change omp-low.c so that it emits __atomic_* calls
> with __ATOMIC_RELAXED rather than __sync_* calls.

That sounds like a good idea.

> And could just use
> libatomic with its own locking if we didn't go the GOMP_atomic_{start,end}
> route (that one is done if there are multiple reductions or the atomics
> aren't available or there are user defined reductions we don't understand
> (or all?), perhaps we should consider also using atomics perhaps even for
> two simple reductions or similar.
> And nvptx certainly could just use libatomic...

If we use libatomic as fallback for openmp, shouldn't we then use the same lock
in both?
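For readers unfamiliar with the two builtin families being compared, the difference can be sketched at source level. This is only an illustration of the builtins themselves, assuming a `long` sum reduction; `combine_relaxed` and `combine_sync` are hypothetical names, and this is not the code omp-low.c actually emits.

```c
/* Fold one thread's partial result into the shared reduction variable.
   Relaxed ordering asks only for atomicity of the update itself, which
   matches the comment that no other synchronization effect is needed.  */
static void
combine_relaxed (long *shared, long partial)
{
  __atomic_fetch_add (shared, partial, __ATOMIC_RELAXED);
}

/* The legacy builtin performs the same update but implies a full
   memory barrier.  */
static void
combine_sync (long *shared, long partial)
{
  __sync_fetch_and_add (shared, partial);
}
```

Both calls produce the same final sum. On a target where the operation is not available inline, the `__atomic_*` form can resolve to a libatomic call, which is why the question of which lock libatomic uses comes up in this thread.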
[Bug target/96939] LTO vs. different arm arch options
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96939

Jakub Jelinek changed:

What             |Removed     |Added
Ever confirmed   |0           |1
Last reconfirmed |            |2020-09-04
Status           |UNCONFIRMED |NEW

--- Comment #2 from Jakub Jelinek ---
Maybe the problem isn't that arm_option_reconfigure_globals isn't called, it
is, but nothing has updated arm_active_target.
E.g. put a breakpoint on arm_option_reconfigure_globals and
arm_set_current_function and see what global_options.x_arm_arch_string and
arm_active_target and arm_arch_crc is at the end of each
arm_option_reconfigure_globals.

Breakpoint 8, arm_option_reconfigure_globals () at ../../gcc/config/arm/arm.c:3772
3772      arm_arch6kz = arm_arch6k && bitmap_bit_p (arm_active_target.isa,
1: arm_arch_crc = 0
2: arm_active_target = {core_name = 0x0, arch_name = 0x24fb770 "armv7-a", arch_pp_name = 0x24fb778 "7A", base_arch = BASE_ARCH_7, profile = 65 'A', isa = 0x2a6aaf0, tune_flags = 1, tune = 0x2213260 , tune_core = TARGET_CPU_genericv7a}
3: global_options.x_arm_arch_string = 0x2a6ba40 "armv7-a+fp"
(gdb) c
Continuing.

Breakpoint 5, arm_set_current_function (fndecl=) at ../../gcc/config/arm/arm.c:32315
32315     if (!fndecl || fndecl == arm_previous_fndecl)
1: arm_arch_crc = 0
2: arm_active_target = {core_name = 0x0, arch_name = 0x24fb770 "armv7-a", arch_pp_name = 0x24fb778 "7A", base_arch = BASE_ARCH_7, profile = 65 'A', isa = 0x2a6aaf0, tune_flags = 1, tune = 0x2213260 , tune_core = TARGET_CPU_genericv7a}
3: global_options.x_arm_arch_string = 0x2a6ba40 "armv7-a+fp"
(gdb) c
Continuing.

Breakpoint 8, arm_option_reconfigure_globals () at ../../gcc/config/arm/arm.c:3772
3772      arm_arch6kz = arm_arch6k && bitmap_bit_p (arm_active_target.isa,
1: arm_arch_crc = 1
2: arm_active_target = {core_name = 0x0, arch_name = 0x24fb7a6 "armv8-a", arch_pp_name = 0x24fb7ae "8A", base_arch = BASE_ARCH_8A, profile = 65 'A', isa = 0x2a6aaf0, tune_flags = 1, tune = 0x2213260 , tune_core = TARGET_CPU_genericv7a}
3: global_options.x_arm_arch_string = 0x2a38d50 "armv8-a+crc+simd"
(gdb) c
Continuing.

Breakpoint 8, arm_option_reconfigure_globals () at ../../gcc/config/arm/arm.c:3772
3772      arm_arch6kz = arm_arch6k && bitmap_bit_p (arm_active_target.isa,
1: arm_arch_crc = 0
2: arm_active_target = {core_name = 0x0, arch_name = 0x24fb770 "armv7-a", arch_pp_name = 0x24fb778 "7A", base_arch = BASE_ARCH_7, profile = 65 'A', isa = 0x2a6aaf0, tune_flags = 1, tune = 0x2213260 , tune_core = TARGET_CPU_genericv7a}
3: global_options.x_arm_arch_string = 0x2a38d50 "armv8-a+crc+simd"
(gdb) c
Continuing.

Breakpoint 8, arm_option_reconfigure_globals () at ../../gcc/config/arm/arm.c:3772
3772      arm_arch6kz = arm_arch6k && bitmap_bit_p (arm_active_target.isa,
1: arm_arch_crc = 0
2: arm_active_target = {core_name = 0x0, arch_name = 0x24fb770 "armv7-a", arch_pp_name = 0x24fb778 "7A", base_arch = BASE_ARCH_7, profile = 65 'A', isa = 0x2a6aaf0, tune_flags = 1, tune = 0x2213260 , tune_core = TARGET_CPU_genericv7a}
3: global_options.x_arm_arch_string = 0x2a38d50 "armv8-a+crc+simd"
(gdb)
Continuing.

Breakpoint 8, arm_option_reconfigure_globals () at ../../gcc/config/arm/arm.c:3772
3772      arm_arch6kz = arm_arch6k && bitmap_bit_p (arm_active_target.isa,
1: arm_arch_crc = 0
2: arm_active_target = {core_name = 0x0, arch_name = 0x24fb770 "armv7-a", arch_pp_name = 0x24fb778 "7A", base_arch = BASE_ARCH_7, profile = 65 'A', isa = 0x2a6aaf0, tune_flags = 1, tune = 0x2213260 , tune_core = TARGET_CPU_genericv7a}
3: global_options.x_arm_arch_string = 0x2a38d50 "armv8-a+crc+simd"
(gdb)
Continuing.

Breakpoint 5, arm_set_current_function (fndecl=) at ../../gcc/config/arm/arm.c:32315
32315     if (!fndecl || fndecl == arm_previous_fndecl)
1: arm_arch_crc = 0
2: arm_active_target = {core_name = 0x0, arch_name = 0x24fb770 "armv7-a", arch_pp_name = 0x24fb778 "7A", base_arch = BASE_ARCH_7, profile = 65 'A', isa = 0x2a6aaf0, tune_flags = 1, tune = 0x2213260 , tune_core = TARGET_CPU_genericv7a}
3: global_options.x_arm_arch_string = 0x2a38d50 "armv8-a+crc+simd"
(gdb)
Continuing.

Breakpoint 8, arm_option_reconfigure_globals () at ../../gcc/config/arm/arm.c:3772
3772      arm_arch6kz = arm_arch6k && bitmap_bit_p (arm_active_target.isa,
1: arm_arch_crc = 0
2: arm_active_target = {core_name = 0x0, arch_name = 0x24fb770 "armv7-a", arch_pp_name = 0x24fb778 "7A", base_arch = BASE_ARCH_7, profile = 65 'A', isa = 0x2a6aaf0, tune_flags = 1, tune = 0x2213260 , tune_core = TARGET_CPU_genericv7a}
3: global_options.x_arm_arch_string = 0x2a38d50 "armv8-a+crc+simd"
(gdb)
Continuing.

Breakpoint 5, arm_set_current_function (fndecl=) at ../../gcc/config/arm/arm.c:32315
32315     if (!fndecl || fndecl == arm_previous_fndecl)
1: arm_arch_crc = 0
2: arm_active_target = {core_name = 0x0, arch_name = 0x24fb770 "armv7-a", arch_pp_name = 0x24
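The hypothesis in this comment (the reconfigure hook runs, but the target description it reads was never refreshed) can be illustrated with a toy model. Everything below is hypothetical: none of the `toy_*` names exist in GCC, and the model only mirrors the relationship between a cached flag such as arm_arch_crc and the structure it is derived from.

```c
/* Toy model only: none of these names exist in GCC.  */
struct toy_target { int has_crc; };

/* Plays the role of arm_active_target.  */
static struct toy_target toy_active_target;
/* Plays the role of the cached arm_arch_crc flag.  */
static int toy_arch_crc;

/* Recompute the cached flag from whatever the active target holds.  */
static void
toy_reconfigure_globals (void)
{
  toy_arch_crc = toy_active_target.has_crc;
}

/* Suspected failure mode: the hook runs, but the active target was
   never refreshed from the options of the function being entered.  */
static void
toy_switch_stale (const struct toy_target *fn_target)
{
  (void) fn_target;
  toy_reconfigure_globals ();
}

/* Refresh the active target first, then recompute the cache.  */
static void
toy_switch_fresh (const struct toy_target *fn_target)
{
  toy_active_target = *fn_target;
  toy_reconfigure_globals ();
}
```

In the stale variant the flag stays 0 even when the function's own options enable CRC, which is the shape of the later breakpoints in the trace: the arch string reads "armv8-a+crc+simd" while arm_arch_crc is 0.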
[Bug tree-optimization/96938] Failure to optimize bit-setting pattern when not using temporary
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96938 --- Comment #1 from Marc Glisse --- With "char tmp" instead of "int tmp", we get the same code as the first function.
[Bug preprocessor/96940] New: ICE in linemap_compare_locations, at libcpp/line-map.c:1359
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96940

Bug ID: 96940
Summary: ICE in linemap_compare_locations, at libcpp/line-map.c:1359
Product: gcc
Version: 10.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: preprocessor
Assignee: unassigned at gcc dot gnu.org
Reporter: jan.smets at nokia dot com
Target Milestone: ---
Target: x86_64-linux-gnu

Configured with: /usr/src/gcc/configure --build=x86_64-linux-gnu --disable-multilib --enable-languages=c,c++,fortran,go

Reproduces with: 10.2, 10.1
Works with: 9.3, 9.1

Bisect traced it to

commit 4593483f15ca2a82049500b9434e736996bb0891
Author: Paolo Carlini
Date:   Tue May 14 11:43:55 2019 +

    Reapply r270597.

    2019-05-14  Paolo Carlini

            PR preprocessor/90382
            * decl.c (grokdeclarator): Fix value assigned to typespec_loc, use
            min_location.

    2019-05-14  Paolo Carlini

            PR preprocessor/90382
            * g++.dg/diagnostic/trailing1.C: New test.

    From-SVN: r271164

/x/bcm_sdk/sdk/include/shared/bitop.h:73: internal compiler error: in linemap_compare_locations, at libcpp/line-map.c:1359
   73 | CONST SHR_BITDCL *c,
      |
0x2233e01 linemap_compare_locations(line_maps*, unsigned int, unsigned int)
        /jasmets/git/tools/gcc/libcpp/line-map.c:1359
0x9b9bbb linemap_location_before_p(line_maps*, unsigned int, unsigned int)
        /jasmets/git/tools/gcc/gcc/../libcpp/include/line-map.h:1247
0x9a6998 min_location
        /jasmets/git/tools/gcc/gcc/cp/decl.c:10641
0x9a6a67 smallest_type_location
        /jasmets/git/tools/gcc/gcc/cp/decl.c:10673
0x9a759e grokdeclarator(cp_declarator const*, cp_decl_specifier_seq*, decl_context, int, tree_node**)
        /jasmets/git/tools/gcc/gcc/cp/decl.c:11009
0xa6115d cp_parser_parameter_declaration_list
        /jasmets/git/tools/gcc/gcc/cp/parser.c:22618
0xa60fd2 cp_parser_parameter_declaration_clause
        /jasmets/git/tools/gcc/gcc/cp/parser.c:22531
0xa5ea98 cp_parser_direct_declarator
        /jasmets/git/tools/gcc/gcc/cp/parser.c:21203
0xa5e8a5 cp_parser_declarator
        /jasmets/git/tools/gcc/gcc/cp/parser.c:21069
0xa5da7b cp_parser_init_declarator
        /jasmets/git/tools/gcc/gcc/cp/parser.c:20570
0xa5281d cp_parser_simple_declaration
        /jasmets/git/tools/gcc/gcc/cp/parser.c:13749
0xa5240a cp_parser_block_declaration
        /jasmets/git/tools/gcc/gcc/cp/parser.c:13566
0xa52105 cp_parser_declaration
        /jasmets/git/tools/gcc/gcc/cp/parser.c:13438
0xa521ea cp_parser_toplevel_declaration
        /jasmets/git/tools/gcc/gcc/cp/parser.c:13466
0xa40e07 cp_parser_translation_unit
        /jasmets/git/tools/gcc/gcc/cp/parser.c:4734
0xa926f8 c_parse_file()
        /jasmets/git/tools/gcc/gcc/cp/parser.c:44001
0xbc9712 c_common_parse_file()
        /jasmets/git/tools/gcc/gcc/c-family/c-opts.c:1190
[Bug target/87767] Missing AVX512 memory broadcast for constant vector
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87767

--- Comment #14 from Hongtao.liu ---
(In reply to Jakub Jelinek from comment #12)
> What I mean is that we should try to simplify the md file, instead of adding
> hundreds of new *_bcst patterns.
> We have e.g.
> (define_insn "*3"
>   [(set (match_operand:VI_AVX2 0 "register_operand" "=x,v")
>         (plusminus:VI_AVX2
>           (match_operand:VI_AVX2 1 "vector_operand" "0,v")
>           (match_operand:VI_AVX2 2 "vector_operand" "xBm,vm")))]
>   "TARGET_SSE2 && ix86_binary_operator_ok (, mode, operands)"
>   "@
>    p\t{%2, %0|%0, %2}
>    vp\t{%2, %1, %0|%0, %1, %2}"
>   [(set_attr "isa" "noavx,avx")
>    (set_attr "type" "sseiadd")
>    (set_attr "prefix_data16" "1,*")
>    (set_attr "prefix" "orig,vex")
>    (set_attr "mode" "")])
>
> (define_insn "*sub3_bcst"
>   [(set (match_operand:VI48_AVX512VL 0 "register_operand" "=v")
>         (minus:VI48_AVX512VL
>           (match_operand:VI48_AVX512VL 1 "register_operand" "v")
>           (vec_duplicate:VI48_AVX512VL
>             (match_operand: 2 "memory_operand" "m"]
>   "TARGET_AVX512F && ix86_binary_operator_ok (MINUS, mode, operands)"
>   "vpsub\t{%2, %1, %0|%0, %1, %2}"
>   [(set_attr "type" "sseiadd")
>    (set_attr "prefix" "evex")
>    (set_attr "mode" "")])
>
> What I meant is we could have just:
> (define_insn "*3"
>   [(set (match_operand:VI_AVX2 0 "register_operand" "=x,v")
>         (plusminus:VI_AVX2
>           (match_operand:VI_AVX2 1 "vector_bcst_operand" "0,v")
>           (match_operand:VI_AVX2 2 "vector_bcst_operand" "xBm,vBb")))]
>   "TARGET_SSE2 && ix86_binary_operator_ok (, mode, operands)"
>   "@
>    p\t{%2, %0|%0, %2}
>    vp\t{%2, %1, %0|%0, %1, %2}"
>   [(set_attr "isa" "noavx,avx")
>    (set_attr "type" "sseiadd")
>    (set_attr "prefix_data16" "1,*")
>    (set_attr "prefix" "orig,vex")
>    (set_attr "mode" "")])
> where vector_bcst_operand is either vector_operand, or for TARGET_AVX512F
> a VEC_DUPLICATE of the right mode with a MEM inside of it with the element
> mode of the VEC_DUPLICATE mode, similarly Bb constraint is either m, or for
> TARGET_AVX512F also again the VEC_DUPLICATE with MEM inside of it, and that
> ix86_binary_operator_ok would treat a VEC_DUPLICATE wrapping MEM the same as
> MEM (in particular ensure one e.g. doesn't have one VEC_DUPLICATE and one
> MEM operand, or two VEC_DUPLICATE operands) and that the output code would
> handle emitting an operand with VEC_DUPLICATE of a MEM properly.
> Or perhaps the constraint there could be just for the broadcast and one
> could write vmBb. Still, I think the predicate needs to be accurate, i.e.
> for some instructions we want e.g. vector_operand or TARGET_AVX512F and
> bcst_mem_operand,
> for others vector_operand or TARGET_AVX512VL and bcst_mem_operand etc.
>
> Anyway, if we go down this route, might be best to handle just a couple of
> patterns, then ask for review and see what Kirill (or if Uros would be
> interested) think about it and only later convert more.

Is there any way to add a preference to constraint "Bb"? We always want to
choose "Bb" when a vec_duplicate exists, but sometimes pass_reload chooses
'v', which produces a redundant broadcast instruction.

E.g., with the attached patch, testcase avx512f-add-df-zmm-1.c fails to
generate an embedded broadcast with -m32.
[Bug target/87767] Missing AVX512 memory broadcast for constant vector
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87767 --- Comment #13 from Hongtao.liu --- Created attachment 49182 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49182&action=edit bcst_vector_operand
[Bug c++/87530] copy elision in return statement doesn't check for rvalue reference to object type
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87530 --- Comment #3 from Marek Polacek --- No longer accepted since r11-2411. The test should probably be added.
[Bug gcov-profile/96913] gcc-11: __gcov_merge_topn hangs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96913

--- Comment #2 from Sergei Trofimovich ---
(In reply to Sergei Trofimovich from comment #0)
> The hang happens on real tauthon-2.8.2 interpreter from PR96394 (no nice
> reproducer yet).
>
> In this instance I tried to build tauthon-2.8.2 against gcc-master. It hangs
> early when tries to merge topn entry:
>
> """
> #0 0x7fd73865e0ce in __GI___libc_read (fd=3, buf=0x406284
> <__gcov_var+36>, nbytes=4096) at ../sysdeps/unix/sysv/linux/read.c:26
> #1 0x7fd7385f0090 in __GI__IO_file_xsgetn (fp=0x1295bc0,
> data=, n=4096) at libioP.h:948
> #2 0x7fd7385e4d1f in __GI__IO_fread (buf=0x406284 <__gcov_var+36>,
> size=1, count=4096, fp=0x1295bc0) at iofread.c:38
> #3 0x7fd72ab4237e in gcov_read_words (words=2) at
> ../../../gcc/libgcc/../gcc/gcov-io.c:491
> #4 0x7fd72ab42483 in __gcov_read_counter () at
> ../../../gcc/libgcc/../gcc/gcov-io.c:528
> #5 0x7fd72ab4190e in gcov_get_counter_target () at
> ../../../gcc/libgcc/libgcov.h:383
> #6 0x7fd72ab41c6b in __gcov_merge_topn (counters=0x7fd72ab4d320
> <__gcov4.encoder_clear>, n_counters=24) at
> ../../../gcc/libgcc/libgcov-merge.c:114
> #7 0x7fd72ab43569 in merge_one_data (
> filename=0x1154290
> "/tmp/portage/dev-lang/tauthon-2.8.2/work/x86_64-pc-linux-gnu/build/temp.
> linux-x86_64-2.8/tmp/portage/dev-lang/tauthon-2.8.2/work/tauthon-2.8.2/
> Modules/_json.gcda", gi_ptr=0x7fd72ab4a180, summary=0x7ffde7d6e9c0) at
> ../../../gcc/libgcc/libgcov-driver.c:314
> #8 0x7fd72ab43b1a in dump_one_gcov (gi_ptr=0x7fd72ab4a180,
> gf=0x7ffde7d6ea00, run_counted=0, run_max=125)
> at ../../../gcc/libgcc/libgcov-driver.c:492
> #9 0x7fd72ab43cba in gcov_do_dump (list=0x7fd72ab4a180, run_counted=0)
> at ../../../gcc/libgcc/libgcov-driver.c:555
> #10 0x7fd72ab43d28 in __gcov_dump_one (root=0x7fd72ab4e5c0
> <__gcov_root>) at ../../../gcc/libgcc/libgcov-driver.c:578 ...
>
> Looks like the problem is in decoding of some tag in __gcov_merge_topn().
> There count of entries is huge:
>
> """
> (gdb) frame 6
> #6 0x7fd72ab41c6b in __gcov_merge_topn (counters=0x7fd72ab4d320
> <__gcov4.encoder_clear>, n_counters=24) at
> ../../../gcc/libgcc/libgcov-merge.c:114
> 114           gcov_type value = gcov_get_counter_target ();
> (gdb) list __gcov_merge_topn
> 96         -- counter
> 97      */
> 98
> 99      void
> 100     __gcov_merge_topn (gcov_type *counters, unsigned n_counters)
> 101     {
> 102       gcc_assert (!(n_counters % GCOV_TOPN_MEM_COUNTERS));
> 103
> 104       for (unsigned i = 0; i < (n_counters / GCOV_TOPN_MEM_COUNTERS);
> i++)
> 105         {
> (gdb)
> 106           /* First value is number of total executions of the profiler.
> */
> 107           gcov_type all = gcov_get_counter_ignore_scaling (-1);
> 108           gcov_type n = gcov_get_counter_ignore_scaling (-1);
> 109
> 110           counters[GCOV_TOPN_MEM_COUNTERS * i] += all;
> 111
> 112           for (unsigned j = 0; j < n; j++)
> 113             {
> 114               gcov_type value = gcov_get_counter_target ();
> 115               gcov_type count = gcov_get_counter_ignore_scaling (-1);
> (gdb)
> 116
> 117               // TODO: we should use atomic here
> 118               gcov_topn_add_value (counters + GCOV_TOPN_MEM_COUNTERS *
> i, value,
> 119                                    count, 0, 0);
> 120             }
> 121         }
> 122     }
> 123     #endif /* L_gcov_merge_topn */
> 124
> 125     #endif /* inhibit_libc */
> (gdb) print n
> $1 = 140325305737200

It looks like __gcov_merge_topn() is applied to 'indirect_call' counters
contents. But its content does not seem to match dynamic topn structure:

$ x86_64-pc-linux-gnu-gcov-dump -l _json.gcda
...
_json.gcda:01a7: 0:COUNTERS topn 0 counts
_json.gcda:01a9: 48:COUNTERS indirect_call 24 counts
_json.gcda: 0: 1 1 140325305737168 1 1 140325305737200 0 0
_json.gcda: 8: 0 0 0 0 0 0 0 0
_json.gcda: 16: 0 0 0 0 0 0 0 0
...

Note how 140325305737200 is in the middle of topn.

My wild guess is that before

commit 871e5ada6d53d5eb495cc9f323983f347487c1b2
Author: Martin Liska
Date:   Fri Jan 31 13:10:14 2020 +0100

    Make TOPN counter dynamically allocated.

both indirect_call and topn had the same fixed n-value structure and were ~ok
to be merged with __gcov_merge_topn(). But now topn got a special 0-values
case (why did we emit it at all?) that merger can't handle and gets slightly
past the beginning of 'indirect_call' section.
[Bug target/96939] LTO vs. different arm arch options
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96939

Jeffrey A. Law changed:

What |Removed |Added
CC   |        |law at redhat dot com

--- Comment #1 from Jeffrey A. Law ---
I suspect this is the same thing we're seeing with the dozen or so armv7/NEON
failures with LTO in Fedora. It was on my list to reduce, but hadn't gotten to
it yet.
[Bug target/96939] New: LTO vs. different arm arch options
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96939

Bug ID: 96939
Summary: LTO vs. different arm arch options
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jakub at gcc dot gnu.org
Target Milestone: ---

$ cat a1.c
extern unsigned crc (unsigned, const void *);
typedef unsigned (*fnptr) (unsigned, const void *);
volatile fnptr fn;

int
main ()
{
  fn = crc;
  return 0;
}
$ cat a2.c
#include

unsigned
crc (unsigned x, const void *y)
{
  return __crc32cw (x, *(unsigned *) y);
}
$ ./xgcc -B ./ -O2 -march=armv7-a -mfpu=vfpv3-d16 -mtune=generic-armv7-a -mabi=aapcs-linux -mfloat-abi=hard -c a1.c -flto
$ ./xgcc -B ./ -O2 -march=armv7-a -mfpu=vfpv3-d16 -mtune=generic-armv7-a -mabi=aapcs-linux -mfloat-abi=hard -march=armv8-a+crc -c a2.c -flto
$ ./xgcc -B ./ -r -march=armv7-a -mfpu=vfpv3-d16 -mtune=generic-armv7-a -mabi=aapcs-linux -mfloat-abi=hard -o a a1.o a2.o -flto

results in:

a2.c: In function ‘crc’:
a2.c:6:10: error: this builtin is not supported for this target
    6 |   return __crc32cw (x, *(unsigned *) y);
      |          ^

Adding __attribute__((target ("arch=armv8-a+crc"))) to crc function doesn't
help.

In gdb I see

(gdb) p global_options.x_arm_arch_string
$2 = 0x2a38d50 "armv8-a+crc+simd"
(gdb) p arm_arch_crc
$3 = 0

which means the function got proper target attribute even if it didn't have
one, TARGET_OPTIONS and the like, but arm_option_reconfigure_globals wasn't
really called when changing current function from the armv7 built one (or the
default) and the armv8-a+crc+simd one.

I'm afraid this makes LTO not work at all on arm when one mixes command line
options between TUs or uses target attribute.
[Bug tree-optimization/96938] New: Failure to optimize bit-setting pattern when not using temporary
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96938

Bug ID: 96938
Summary: Failure to optimize bit-setting pattern when not using temporary
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---

void g(char *f, int offset, char value)
{
    *f = (int)(*f & ~(1 << (offset & 0x1F))) | (value << (offset & 0x1F));
}

This has much worse code generation than this:

void g(char *f, int offset, char value)
{
    int tmp = *f & ~(1 << (offset & 0x1F));
    *f = tmp | (value << (offset & 0x1F));
}

Which should be equivalent to the first example.

Example of the worse code generation, on x86 the first example compiles to
this:

g(char*, int, char):
  movzx ecx, BYTE PTR [rdi]
  mov eax, 1
  movsx edx, dl
  shlx eax, eax, esi
  shlx edx, edx, esi
  andn eax, eax, ecx
  or eax, edx
  mov BYTE PTR [rdi], al
  ret

Whereas the second example compiles to this:

g(char*, int, char):
  movsx eax, BYTE PTR [rdi]
  movsx edx, dl
  shlx edx, edx, esi
  btr eax, esi
  or eax, edx
  mov BYTE PTR [rdi], al
  ret
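The claim that the two forms are equivalent can be checked by brute force. The sketch below copies both functions from the report under the hypothetical names `g_single` and `g_tmp` and compares the stored byte over a restricted input range; it says nothing about code generation, only about the values computed.

```c
/* First form from the report: everything in one expression.  */
static void
g_single (char *f, int offset, char value)
{
  *f = (int)(*f & ~(1 << (offset & 0x1F))) | (value << (offset & 0x1F));
}

/* Second form: the masked value goes through an int temporary.  */
static void
g_tmp (char *f, int offset, char value)
{
  int tmp = *f & ~(1 << (offset & 0x1F));
  *f = tmp | (value << (offset & 0x1F));
}

/* Return 1 if both forms store the same byte for every starting byte,
   offsets 0..30 and value 0 or 1.  Offset 31 and negative values are
   skipped because shifting into or past the sign bit is not portable C.  */
static int
forms_agree (void)
{
  for (int byte = -128; byte <= 127; byte++)
    for (int offset = 0; offset <= 30; offset++)
      for (int value = 0; value <= 1; value++)
        {
          char a = (char) byte;
          char b = (char) byte;

          g_single (&a, offset, (char) value);
          g_tmp (&b, offset, (char) value);
          if (a != b)
            return 0;
        }
  return 1;
}
```

Since the only difference is whether the intermediate `int` is named, `forms_agree` is expected to return 1, so any difference in the generated assembly is a missed optimization rather than a semantic difference.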
[Bug debug/96937] Duplicate DW_TAG_formal_parameter in out-of-line DW_TAG_subprogram instance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96937 --- Comment #2 from Simon Marchi --- Created attachment 49181 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49181&action=edit Output from creduce I compile the reproducer program with: /opt/gcc/git/bin/g++ -x c++ -g3 -O2 -c bug.c
[Bug debug/96937] Duplicate DW_TAG_formal_parameter in out-of-line DW_TAG_subprogram instance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96937 --- Comment #1 from Simon Marchi --- I passed the program in creduce, the result is not pretty but it's not too big and still reproduces the problem, so I'll attach it anyway.
[Bug debug/96937] New: Duplicate DW_TAG_formal_parameter in out-of-line DW_TAG_subprogram instance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96937

Bug ID: 96937
Summary: Duplicate DW_TAG_formal_parameter in out-of-line DW_TAG_subprogram instance
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: debug
Assignee: unassigned at gcc dot gnu.org
Reporter: simon.marchi at polymtl dot ca
Target Milestone: ---

While debugging GDB (compiled with GCC master and -O2) with GDB, I get:

972     do_examine (struct format_data fmt, struct gdbarch *gdbarch, CORE_ADDR addr)
(top-gdb) frame
#0  do_examine (gdbarch=0x249a6b0, addr=0x400636, fmt=..., fmt=...) at /home/smarchi/src/binutils-gdb/gdb/printcmd.c:972
972     do_examine (struct format_data fmt, struct gdbarch *gdbarch, CORE_ADDR addr)

Note that the parameters shown by GDB aren't in the same order as in the
function declaration, and that "fmt" is duplicated.

0x0004e0bd:   DW_TAG_subprogram
                DW_AT_abstract_origin [DW_FORM_ref4] (0x0004a912 "do_examine")
                ...

0x0004e0cd:     DW_TAG_formal_parameter
                  DW_AT_abstract_origin [DW_FORM_ref4] (0x0004a92e "gdbarch")
                  ...

0x0004e0d6:     DW_TAG_formal_parameter
                  DW_AT_abstract_origin [DW_FORM_ref4] (0x0004a93b "addr")
                  ...

0x0004e0df:     DW_TAG_variable
                  DW_AT_abstract_origin [DW_FORM_ref4] (0x0004a948 "format")
                  ...

0x0004e0e8:     DW_TAG_variable
                  DW_AT_abstract_origin [DW_FORM_ref4] (0x0004a955 "size")
                  ...

... some more variables ...

0x0004e130:     DW_TAG_formal_parameter
                  DW_AT_abstract_origin [DW_FORM_ref4] (0x0004a921 "fmt")

0x0004e135:     DW_TAG_formal_parameter
                  DW_AT_abstract_origin [DW_FORM_ref4] (0x0004a921 "fmt")

This matches what we see in GDB: the parameters are not in the same order as in
the function declaration and fmt is duplicated. The abstract origin has them
correct:

0x0004a912:   DW_TAG_subprogram
                DW_AT_name [DW_FORM_strp] ("do_examine")
                ...

0x0004a921:     DW_TAG_formal_parameter
                  DW_AT_name [DW_FORM_string] ("fmt")
                  ...

0x0004a92e:     DW_TAG_formal_parameter
                  DW_AT_name [DW_FORM_strp] ("gdbarch")
                  ...

0x0004a93b:     DW_TAG_formal_parameter
                  DW_AT_name [DW_FORM_strp] ("addr")
                  ...

After reading bug #49828, I presume that the fact that the parameters are not
in the right order is not considered a bug. GDB could cope with that by
sorting them to be in the same order as what's in the abstract origin.
However, having fmt there twice could maybe be considered a bug, hence I am
reporting it.

GDB commit (used as the debugged program): c5cd900e4f197870812c2d3e2c194871c171ef42
GCC commit: 8ad3fc6ca46c603d9c3efe8e6d4a8f2ff1a893a4
[Bug tree-optimization/96920] [10 Regression] ICE segmentation fault in tree-vectorizer at -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96920 Richard Biener changed: What|Removed |Added Known to work||11.0 Known to fail|11.0| Summary|[10/11 Regression] ICE |[10 Regression] ICE |segmentation fault in |segmentation fault in |tree-vectorizer at -O3 |tree-vectorizer at -O3 --- Comment #6 from Richard Biener --- Fixed on trunk so far.
[Bug tree-optimization/96698] [10 Regression] ICE during GIMPLE pass:vect
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96698 --- Comment #3 from CVS Commits --- The master branch has been updated by Richard Biener : https://gcc.gnu.org/g:46a58c779af3055a4b10b285a1f4be28abe4351c commit r11-3013-g46a58c779af3055a4b10b285a1f4be28abe4351c Author: Richard Biener Date: Fri Sep 4 14:35:39 2020 +0200 tree-optimization/96920 - another ICE when vectorizing nested cycles This refines the previous fix for PR96698 by re-doing how and where we arrange for setting vectorized cycle PHI backedge values. 2020-09-04 Richard Biener PR tree-optimization/96698 PR tree-optimization/96920 * tree-vectorizer.h (loop_vec_info::reduc_latch_defs): Remove. (loop_vec_info::reduc_latch_slp_defs): Likewise. * tree-vect-stmts.c (vect_transform_stmt): Remove vectorized cycle PHI latch code. * tree-vect-loop.c (maybe_set_vectorized_backedge_value): New helper to set vectorized cycle PHI latch values. (vect_transform_loop): Walk over all PHIs again after vectorizing them, calling maybe_set_vectorized_backedge_value. Call maybe_set_vectorized_backedge_value for each vectorized stmt. Remove delayed update code. * tree-vect-slp.c (vect_analyze_slp_instance): Initialize SLP instance reduc_phis member. (vect_schedule_slp): Set vectorized cycle PHI latch values. * gfortran.dg/vect/pr96920.f90: New testcase. * gcc.dg/vect/pr96920.c: Likewise.
[Bug tree-optimization/96920] [10/11 Regression] ICE segmentation fault in tree-vectorizer at -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96920 --- Comment #5 from CVS Commits --- The master branch has been updated by Richard Biener : https://gcc.gnu.org/g:46a58c779af3055a4b10b285a1f4be28abe4351c commit r11-3013-g46a58c779af3055a4b10b285a1f4be28abe4351c Author: Richard Biener Date: Fri Sep 4 14:35:39 2020 +0200 tree-optimization/96920 - another ICE when vectorizing nested cycles This refines the previous fix for PR96698 by re-doing how and where we arrange for setting vectorized cycle PHI backedge values. 2020-09-04 Richard Biener PR tree-optimization/96698 PR tree-optimization/96920 * tree-vectorizer.h (loop_vec_info::reduc_latch_defs): Remove. (loop_vec_info::reduc_latch_slp_defs): Likewise. * tree-vect-stmts.c (vect_transform_stmt): Remove vectorized cycle PHI latch code. * tree-vect-loop.c (maybe_set_vectorized_backedge_value): New helper to set vectorized cycle PHI latch values. (vect_transform_loop): Walk over all PHIs again after vectorizing them, calling maybe_set_vectorized_backedge_value. Call maybe_set_vectorized_backedge_value for each vectorized stmt. Remove delayed update code. * tree-vect-slp.c (vect_analyze_slp_instance): Initialize SLP instance reduc_phis member. (vect_schedule_slp): Set vectorized cycle PHI latch values. * gfortran.dg/vect/pr96920.f90: New testcase. * gcc.dg/vect/pr96920.c: Likewise.
[Bug target/96933] rs6000: inefficient code for char/short vec CTOR
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96933 --- Comment #4 from Segher Boessenkool --- Yes, timing suggests there is some SHL/LHS flush. On p9 and later we can use mtvsrdd instead of mtvsrd (moving two bytes into place at once), which reduces the number of moves from 16 to 8, and the number of merges from 15 to 7 (and reduces path length by 1). This sounds like a no-brainer win with that :-)
[Bug c++/96936] brace initialization of const char* from string literal in specific cases doesn't compile
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96936 Marek Polacek changed: What|Removed |Added CC||mpolacek at gcc dot gnu.org Resolution|--- |DUPLICATE Status|UNCONFIRMED |RESOLVED --- Comment #1 from Marek Polacek --- Dup. I should get to this soon. *** This bug has been marked as a duplicate of bug 84930 ***
[Bug c++/84930] Brace-closed initialization of cstring (i.e."abcdefghi") to coresponding aggregate types fails in certain situation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84930 Marek Polacek changed: What|Removed |Added CC||kirshamir at gmail dot com --- Comment #6 from Marek Polacek --- *** Bug 96936 has been marked as a duplicate of this bug. ***
[Bug preprocessor/96935] ICE in subspan, at input.h:69
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96935 --- Comment #3 from Jan Smets --- A bisect resulted in this commit : commit 0d48e8779c6a9ac88f5efd1b4a2d40f43ef75faf Author: David Malcolm Date: Fri Oct 5 19:02:17 2018 + Support string locations for C++ in -Wformat (PR c++/56856) -Wformat in the C++ FE doesn't work as well as it could: (a) it doesn't report precise locations within the string literal, and (b) it doesn't underline arguments for those arguments !CAN_HAVE_LOCATION_P, despite having location wrapper nodes. Your suggestion doesn't trigger it for me. I've built GCC with -g -O0, but the standard provided backtrace didn't include function call arguments. A printf confirms your suspicion about start.column == 0 => start.column=0 literal_length=1
[Bug c++/96936] New: brace initialization of const char* from string literal in specific cases doesn't compile
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96936 Bug ID: 96936 Summary: brace initialization of const char* from string literal in specific cases doesn't compile Product: gcc Version: 10.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: kirshamir at gmail dot com Target Milestone: ---

template<typename T, typename U>
auto convert(U&& t) {
    // fails - see link to compiler explorer:
    return T{std::forward<U>(t)};
    // succeeds:
    // return T(std::forward<U>(t));
}

Code: https://godbolt.org/z/5q5sfb
Related to (seems to be the same issue): https://stackoverflow.com/questions/63740618/need-for-stddecay-in-noexcept-operator
[Bug tree-optimization/96820] ICE in verify_sra_access_forest with array and out of bounds reference
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96820 Martin Jambor changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED|RESOLVED --- Comment #9 from Martin Jambor --- Fixed.
[Bug tree-optimization/96820] ICE in verify_sra_access_forest with array and out of bounds reference
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96820 --- Comment #8 from CVS Commits --- The releases/gcc-10 branch has been updated by Martin Jambor : https://gcc.gnu.org/g:75f5776b3fc4dad7453f8b9cf1690bd2ad628991 commit r10-8709-g75f5776b3fc4dad7453f8b9cf1690bd2ad628991 Author: Martin Jambor Date: Fri Sep 4 14:31:16 2020 +0200 sra: Avoid SRAing if there is an out-of-bounds access (PR 96820) The testcase causes an ICE in the SRA verifier on x86_64 when compiling with -m32 because build_user_friendly_ref_for_offset looks at an out-of-bounds array_ref within an array_ref which accesses an offset which does not fit into a signed 32bit integer and turns it into an array-ref with a negative index. The best thing is probably to bail out early when encountering an out of bounds access to a local stack-allocated aggregate (and let the DSE just delete such statements) which is what the patch does. I also glanced over to the initial candidate vetting routine to make sure the size would fit into HWI and noticed that it uses unsigned variants whereas the rest of SRA operates on signed offsets and sizes (because get_ref_and_extent does) and so changed that for the sake of consistency. These ancient checks operate on sizes of types as opposed to DECLs but I hope that any issues potentially arising from that are basically hypothetical. gcc/ChangeLog: 2020-08-28 Martin Jambor PR tree-optimization/96820 * tree-sra.c (create_access): Disqualify candidates with accesses beyond the end of the original aggregate. (maybe_add_sra_candidate): Check that candidate type size fits signed uhwi for the sake of consistency. gcc/testsuite/ChangeLog: 2020-08-28 Martin Jambor PR tree-optimization/96820 * gcc.dg/tree-ssa/pr96820.c: New test. (cherry picked from commit 8ad3fc6ca46c603d9c3efe8e6d4a8f2ff1a893a4)
[Bug tree-optimization/96920] [10/11 Regression] ICE segmentation fault in tree-vectorizer at -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96920 --- Comment #4 from Richard Biener --- Another example:

int a[1024];
int b[2048];

void foo (int x, int y)
{
  for (int i = 0; i < 1024; ++i)
    {
      int tem0 = b[2*i];
      int tem1 = b[2*i+1];
      for (int j = 0; j < 32; ++j)
        {
          int tem = tem0;
          tem0 = tem1;
          tem1 = tem;
          a[i] += tem0;
        }
    }
}
[Bug tree-optimization/96929] Failure to optimize right shift of -1 to -1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96929 Jakub Jelinek changed: What|Removed |Added Ever confirmed|0 |1 Status|UNCONFIRMED |NEW Last reconfirmed||2020-09-04 CC||jakub at gcc dot gnu.org --- Comment #1 from Jakub Jelinek --- We already have a rule for this:

/* Optimize -1 >> x for arithmetic right shifts. */
(simplify
 (rshift integer_all_onesp@0 @1)
 (if (!TYPE_UNSIGNED (type)
      && tree_expr_nonnegative_p (@1))
  @0))

but the rule requires that the shift count is non-negative. That was added in PR38359 fix. Even current wide_int_binop has:

    case RSHIFT_EXPR:
    case LSHIFT_EXPR:
      if (wi::neg_p (arg2))
        {
          tmp = -arg2;
          if (code == RSHIFT_EXPR)
            code = LSHIFT_EXPR;
          else
            code = RSHIFT_EXPR;
        }
      else
        tmp = arg2;

and so it treats rshift shifts by negative values as left shifts. So, if we wanted to fix this PR, we'd need to remove the tree_expr_nonnegative_p and change wide_int_binop to perhaps best punt on negative arg2 instead of trying to handle it. Wonder what it will break.
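The identity behind the rule quoted above can be checked with a small C sketch. It assumes a compiler, such as GCC, that implements >> on negative signed values as an arithmetic shift (ISO C leaves this implementation-defined). The helper names shift_all_ones and all_counts_give_minus_one are made up for illustration, and only counts from 0 up to the bit width of int minus one are exercised, since anything else is undefined behaviour.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: shift an all-ones int right by n bits.
   Assumes >> on negative ints is an arithmetic shift (true for GCC,
   implementation-defined in ISO C).  */
static int shift_all_ones(int n)
{
    /* Negative or too-large counts are undefined behaviour in C.  */
    assert(n >= 0 && n < (int)(sizeof(int) * CHAR_BIT));
    return -1 >> n;
}

/* Returns 1 if every valid shift count leaves -1 unchanged.  */
static int all_counts_give_minus_one(void)
{
    for (int n = 0; n < (int)(sizeof(int) * CHAR_BIT); ++n)
        if (shift_all_ones(n) != -1)
            return 0;
    return 1;
}
```

The sketch only covers non-negative counts, which is exactly the case the existing rule already handles; the negative-count case discussed above has no defined meaning at the C level.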
[Bug target/96933] rs6000: inefficient code for char/short vec CTOR
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96933 --- Comment #3 from Richard Biener --- very likely the byte stores and then the following vector load will also trigger STLF issues.
[Bug preprocessor/96935] ICE in subspan, at input.h:69
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96935 --- Comment #2 from Richard Biener --- Guess the error is simply that we fall back to no columns and thus start.column == 0 and we do

  char_span literal = line.subspan (start.column - 1, literal_length);

which means input.c:1467 should check whether start.column is >= 1. Might also trigger with printf ("\ ..") who knows. Your backtrace doesn't contain function argument values to verify. Maybe you can build GCC with -O0 once and check?
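The check suggested above can be illustrated with a small stand-alone C sketch. Nothing here is GCC's actual code: span, span_sub and literal_span are hypothetical stand-ins for char_span, char_span::subspan and the caller in input.c, and returning an empty span for a column of 0 is just one possible way to bail out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for GCC's char_span: a pointer plus a length.  */
struct span { const char *ptr; size_t len; };

/* Hypothetical stand-in for char_span::subspan: the offset must be
   non-negative and the requested range must lie inside the span.  */
static struct span span_sub(struct span s, int offset, int n)
{
    assert(offset >= 0 && n >= 0);
    assert((size_t)offset + (size_t)n <= s.len);
    struct span r = { s.ptr + offset, (size_t)n };
    return r;
}

/* Columns are 1-based; 0 means "no column information".  Bail out with
   an empty span instead of computing start_column - 1 == -1.  */
static struct span literal_span(struct span line, int start_column,
                                int literal_length)
{
    struct span empty = { NULL, 0 };
    if (start_column < 1)
        return empty;
    if (literal_length < 0
        || (size_t)(start_column - 1) + (size_t)literal_length > line.len)
        return empty;
    return span_sub(line, start_column - 1, literal_length);
}
```

With this guard, a call such as literal_span(line, 0, 1) returns the empty span rather than tripping the assertion in span_sub, which mirrors the start.column == 0 case from the report.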
[Bug target/96769] -mpure-code produces suboptimal code for immediate generation for thumb-1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96769 Christophe Lyon changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED|RESOLVED --- Comment #3 from Christophe Lyon --- Fixed on trunk.
[Bug target/96769] -mpure-code produces suboptimal code for immediate generation for thumb-1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96769 --- Comment #2 from CVS Commits --- The master branch has been updated by Christophe Lyon : https://gcc.gnu.org/g:2033a63cbd0aab27d3a8450b4a4a5b371d583c85 commit r11-3011-g2033a63cbd0aab27d3a8450b4a4a5b371d583c85 Author: Christophe Lyon Date: Fri Sep 4 11:48:36 2020 +

arm: Improve immediate generation for thumb-1 with -mpure-code [PR96769]

This patch moves the move-immediate splitter after the regular ones so that it has lower precedence, and updates its constraints. For

int f3 (void) { return 0x11000000; }
int f3_2 (void) { return 0x12345678; }

we now generate:

* with -O2 -mcpu=cortex-m0 -mpure-code:
f3:
        movs r0, #136
        lsls r0, r0, #21
        bx lr
f3_2:
        movs r0, #18
        lsls r0, r0, #8
        adds r0, r0, #52
        lsls r0, r0, #8
        adds r0, r0, #86
        lsls r0, r0, #8
        adds r0, r0, #121
        bx lr

* with -O2 -mcpu=cortex-m23 -mpure-code:
f3:
        movs r0, #136
        lsls r0, r0, #21
        bx lr
f3_2:
        movw r0, #22136
        movt r0, 4660
        bx lr

2020-09-04 Christophe Lyon
PR target/96769
gcc/
* config/arm/thumb1.md: Move movsi splitter for arm_disable_literal_pool after the other movsi splitters.
gcc/testsuite/
* gcc.target/arm/pure-code/pr96769.c: New test.
[Bug preprocessor/96935] ICE in subspan, at input.h:69
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96935 --- Comment #1 from Jan Smets --- Proper backtrace (10.2)

x.cpp: In function ‘void a()’:
x.cpp:3: internal compiler error: in subspan, at input.h:69
    3 | #define DB_PRINTF(str, fmt, args...) db_printf(indent_len, 50, fmt, str, ##args)
      |
x.cpp:7: note: in expansion of macro ‘DB_PRINTF’
    7 | DB_PRINTF("", "%llu", 0);
      |
0x168ee7b char_span::subspan(int, int) const
        /jasmets/git/tools/gcc/gcc/input.h:69
0x168ee7b get_substring_ranges_for_loc
        /jasmets/git/tools/gcc/gcc/input.c:1467
0x168ee7b get_location_within_string(cpp_reader*, string_concat_db*, unsigned int, cpp_ttype, int, int, int, unsigned int*)
        /jasmets/git/tools/gcc/gcc/input.c:1553
0x8fad84 c_get_substring_location(substring_loc const&, unsigned int*)
        /jasmets/git/tools/gcc/gcc/c-family/c-common.c:903
0x92e3ad get_corrected_substring
        /jasmets/git/tools/gcc/gcc/c-family/c-format.c:4505
0x92e3ad format_type_warning
        /jasmets/git/tools/gcc/gcc/c-family/c-format.c:4721
0x93142b check_format_types
        /jasmets/git/tools/gcc/gcc/c-family/c-format.c:4266
0x93142b argument_parser::check_argument_type(format_char_info const*, length_modifier const&, tree_node*&, char const*&, bool, unsigned long&, tree_node*&, int, char const*, char const*, unsigned int, char)
        /jasmets/git/tools/gcc/gcc/c-family/c-format.c:2859
0x9332e0 check_format_info_main
        /jasmets/git/tools/gcc/gcc/c-family/c-format.c:3998
0x9332e0 check_format_arg
        /jasmets/git/tools/gcc/gcc/c-family/c-format.c:1821
0x92f3a2 check_format_info
        /jasmets/git/tools/gcc/gcc/c-family/c-format.c:1543
0x92f3a2 check_function_format(tree_node const*, tree_node*, int, tree_node**, vec*)
        /jasmets/git/tools/gcc/gcc/c-family/c-format.c:1197
0x922f09 check_function_arguments(unsigned int, tree_node const*, tree_node const*, int, tree_node**, vec*)
        /jasmets/git/tools/gcc/gcc/c-family/c-common.c:5730
0x77d86f build_over_call
        /jasmets/git/tools/gcc/gcc/cp/call.c:8901
0x77f2ea build_new_function_call(tree_node*, vec**, int)
        /jasmets/git/tools/gcc/gcc/cp/call.c:4613
0x8baac6 finish_call_expr(tree_node*, vec**, bool, bool, int)
        /jasmets/git/tools/gcc/gcc/cp/semantics.c:2672
0x864abf cp_parser_postfix_expression
        /jasmets/git/tools/gcc/gcc/cp/parser.c:7468
0x84d261 cp_parser_unary_expression
        /jasmets/git/tools/gcc/gcc/cp/parser.c:8563
0x846d11 cp_parser_cast_expression
        /jasmets/git/tools/gcc/gcc/cp/parser.c:9459
0x8473e1 cp_parser_binary_expression
        /jasmets/git/tools/gcc/gcc/cp/parser.c:9562
[Bug preprocessor/96935] New: ICE in subspan, at input.h:69
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96935 Bug ID: 96935 Summary: ICE in subspan, at input.h:69 Product: gcc Version: 10.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: preprocessor Assignee: unassigned at gcc dot gnu.org Reporter: jan.smets at nokia dot com Target Milestone: --- Following ICE is seen : x.cpp: In function 'void a()': x.cpp:3: internal compiler error: in subspan, at input.h:69 3 | #define DB_PRINTF(str, fmt, args...) db_printf(indent_len, 50, fmt, str, ##args) | x.cpp:7: note: in expansion of macro 'DB_PRINTF' 7 | DB_PRINTF("", "%llu", 0); | Target: x86_64-linux-gnu Configured with: /usr/src/gcc/configure --build=x86_64-linux-gnu --disable-multilib --enable-languages=c,c++,fortran,go Compiled with: -O2 -Wformat Reproduces with 10.2, 10.1 9.3, 9.1 Works with 8.4 7.5 The reduced testcase is : #include "x.h" #define DB_PRINTF(str, fmt, args...) db_printf(indent_len, 50, fmt, str, ##args) extern "C" void db_printf(unsigned indent_len, unsigned column_split, const char * fmt, const char * str, ...) __attribute__ ((format (printf, 3, 5))); void a() { unsigned int indent_len = 0; DB_PRINTF("", "%llu", 0); // preprocesses to: db_printf(indent_len, 50, "%llu", ""); } But I suspect the testcase is just garbage. I tried various ways trying to reduce x.h, but even the slightest change makes the problem go away. "x.h" recursively includes about 200 other header files. A "flat" x.h (700k lines) (-fdirectives-only) does not reproduce the issue. This ICE occurs on couple of dozen files in my project. Some print "during GIMPLE pass: strlen". Goes away with --enable-checking=no, but then other problems show up (ICE in linemap_compare_locations, at libcpp/line-map.c:1359 - which may or may not be related) I suppose my next best option is start a bisect between 8.x and 9.x ? Thanks
[Bug tree-optimization/96512] wrong code generated with avx512 intrinsics in some cases
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96512 H.J. Lu changed: What|Removed |Added Status|WAITING |RESOLVED See Also||https://sourceware.org/bugz ||illa/show_bug.cgi?id=23465 Resolution|--- |MOVED --- Comment #7 from H.J. Lu --- Binutils bug: https://sourceware.org/bugzilla/show_bug.cgi?id=23465
[Bug tree-optimization/96512] wrong code generated with avx512 intrinsics in some cases
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96512 --- Comment #6 from N Schaeffer --- Hello,

Working further on this, it seems to be a problem in the assembler step, but only on some installations. I have one system where gcc 8.3 to 9 and 10 are good (no bug), and another system where gcc 8.3, 9.1 and 10.1 are NOT good (bug!)

On the buggy system, when doing:

  gcc -O1 -D_GCC_VEC_=1 -march=skylake-avx512 -c bug_gcc_avx512.c

and disassembling with gdb, one can see that the offending instruction has been generated:

  vbroadcastsd 0x1(,%r8,8),%zmm

but when outputting assembly code like so:

  gcc -O1 -D_GCC_VEC_=1 -march=skylake-avx512 -S bug_gcc_avx512.c

the instruction in the bug_gcc_avx512.s file reads:

  vbroadcastsd 8(,%r8,8), %zmm0

Invoking the assembler now (as bug_gcc_avx512.s), the offending instruction is indeed generated:

  vbroadcastsd 0x1(,%r8,8),%zmm0

So here are the "as --version" outputs on various systems:

  GNU assembler version 2.27-41.base.el7_7.3 ==> NO BUG
  GNU assembler (GNU Binutils for Debian) 2.28 ==> NO BUG
  GNU assembler version 2.30-58.el8_1.2 ==> BUG!
  Assembleur GNU (GNU Binutils) 2.34 ==> NO BUG
  Assembleur GNU (GNU Binutils) 2.35 ==> NO BUG

Maybe I should post this bug report somewhere else?
[Bug middle-end/91490] [9 Regression] bogus argument missing terminating nul warning on strlen of a flexible array member
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91490 Gustaw Smolarczyk changed: What|Removed |Added CC||wielkiegie at gmail dot com --- Comment #8 from Gustaw Smolarczyk --- Bug 96934 is potentially related to this issue, but I have also found a miscompilation of strcmp call that is possibly related.
[Bug c++/96934] Copy initialization of struct involving aggregate array initialization miscompiles in GCC 9
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96934 --- Comment #1 from Gustaw Smolarczyk --- It seems that part of this issue was already reported in another bug report (though the report is about flexible array members, the comment does not reference them): https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91490#c6 However, I don't see any mention of the miscompilation in the thread. Possibly the issue didn't surface in comment 6 as the tested string has only a single character (+ the terminating null character). Or strcmp is needed in order to trigger it.
[Bug target/96933] rs6000: inefficient code for char/short vec CTOR
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96933 --- Comment #2 from Kewen Lin --- (In reply to Segher Boessenkool from comment #1) > Is that actually faster though? The original has shorter dependency > chains. Or is this to avoid some LHS/SHL? Yes, I tested it with one constructed case, the original version takes 18.20s while the optimized version takes 8.40s. And yes, I guess it's due to LHS/SHL similar to the vec_insert issue xionghu is working on.
[Bug libstdc++/96731] uniform_int_distribution requirement that its type is_integral is too strict
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96731 --- Comment #5 from Tony E Lewis --- Thanks very much for your work on this. That's a shame but I appreciate the problems you've highlighted. > I don't plan to work on this any further for now. Yes, fair enough.
[Bug target/96933] rs6000: inefficient code for char/short vec CTOR
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96933 --- Comment #1 from Segher Boessenkool --- Is that actually faster though? The original has shorter dependency chains. Or is this to avoid some LHS/SHL?
[Bug tree-optimization/96931] [11 Regression] ICE in add_phi_arg, at tree-phinodes.c:359
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96931 --- Comment #4 from CVS Commits --- The master branch has been updated by Richard Biener : https://gcc.gnu.org/g:fab77644842869adc8871e133e4c3f4c35b2b245 commit r11-3009-gfab77644842869adc8871e133e4c3f4c35b2b245 Author: Richard Biener Date: Fri Sep 4 12:18:38 2020 +0200 tree-optimization/96931 - clear ctrl-altering flag more aggressively The testcase shows that we fail to clear gimple_call_ctrl_altering_p when the last abnormal edge goes away, causing an edge insert to a loop header edge when we have preheaders to split the edge unnecessarily. The following addresses this by more aggressively clearing the flag in cleanup_call_ctrl_altering_flag. 2020-09-04 Richard Biener PR tree-optimization/96931 * tree-cfgcleanup.c (cleanup_call_ctrl_altering_flag): If there's a fallthru edge and no abnormal edge the call is no longer control-altering. (cleanup_control_flow_bb): Pass down the BB to cleanup_call_ctrl_altering_flag. * gcc.dg/pr96931.c: New testcase.
[Bug tree-optimization/96931] [11 Regression] ICE in add_phi_arg, at tree-phinodes.c:359
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96931 Richard Biener changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED|RESOLVED --- Comment #5 from Richard Biener --- Fixed.
[Bug c++/96934] New: Copy initialization of struct involving aggregate array initialization miscompiles in GCC 9
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96934 Bug ID: 96934 Summary: Copy initialization of struct involving aggregate array initialization miscompiles in GCC 9 Product: gcc Version: 9.3.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: wielkiegie at gmail dot com Target Milestone: --- GCC 9 (but not <= 8 and not >= 10, including trunk) miscompiles the following piece of code. I have tested GCC 9.1 (which produces even worse code, as it assumes the const char* has only a single character) and GCC 9.2, 9.3 (explained below). I was not able to test the current GCC 9 branch. Test case: https://godbolt.org/z/jPq4h7 The Code struct holds an array of chars that is meant to be always null-terminated. A simple constructor is provided to simulate how the array should be initialized (this is a reduced real world scenario). It uses aggregate initialization in order to store the "12" string. There are two problems, probably originating from the same underlying issue: 1. Bogus -Wstringop-overflow warning saying the _buffer is unterminated, while it most certainly is (as you can see in the assembly). 2. std::strcmp call miscompiled as if _buffer == "1" (and not "12"). Switching the TEST define on line 17 into T1, T2, T3, T4 doesn't change the outcome. However, T5 and T6 fix the issue. What seems to be the difference is the copy-initialization [1] being involved in T1 and T2 (and T3, T4 "inheriting" the buggy state). T5 doesn't do copy-initialization, and T6 just copies it. [1] https://en.cppreference.com/w/cpp/language/copy_initialization
[Bug tree-optimization/96920] [10/11 Regression] ICE segmentation fault in tree-vectorizer at -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96920 --- Comment #3 from Richard Biener --- It's similar to PR96698 where we had a nested cycle where a cycle PHI was fed by an induction. Here we're feeding the cycle PHI by another cycle PHI so the fancy detection of computing a latch value doesn't work since the def is both the PHI of a cycle _and_ the latch def (in PR96698 we were safe because the def was an induction, not clashing with data structures). I never liked the current code much but now have to think about sth more reliable and workable for SLP.
[Bug target/96933] rs6000: inefficient code for char/short vec CTOR
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96933 Kewen Lin changed: What|Removed |Added Status|UNCONFIRMED |ASSIGNED CC||bergner at gcc dot gnu.org, ||linkw at gcc dot gnu.org, ||segher at gcc dot gnu.org, ||wschmidt at gcc dot gnu.org Summary|inefficient code for|rs6000: inefficient code |char/short vec CTOR |for char/short vec CTOR Last reconfirmed||2020-09-04 Target||powerpc Ever confirmed|0 |1 Assignee|unassigned at gcc dot gnu.org |linkw at gcc dot gnu.org
[Bug target/96933] New: inefficient code for char/short vec CTOR
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96933 Bug ID: 96933 Summary: inefficient code for char/short vec CTOR Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: linkw at gcc dot gnu.org Target Milestone: ---

While investigating the vectorization cost for vec_construct, I happened to find that the code generated for vector construction is inefficient with DIRECT_MOVE support. The test case looks like:

vector unsigned char test_char(unsigned char f1, unsigned char f2,
                               unsigned char f3, unsigned char f4,
                               unsigned char f5, unsigned char f6,
                               unsigned char f7, unsigned char f8,
                               unsigned char f9, unsigned char f10,
                               unsigned char f11, unsigned char f12,
                               unsigned char f13, unsigned char f14,
                               unsigned char f15, unsigned char f16)
{
  vector unsigned char v = {f1, f2, f3, f4, f5, f6, f7, f8,
                            f9, f10, f11, f12, f13, f14, f15, f16};
  return v;
}

The generated code currently with -mcpu=power9:

   0: e8 ff a1 fb  std r29,-24(r1)
   4: f0 ff c1 fb  std r30,-16(r1)
   8: f8 ff e1 fb  std r31,-8(r1)
   c: 60 00 a1 8b  lbz r29,96(r1)
  10: 68 00 c1 8b  lbz r30,104(r1)
  14: 70 00 e1 8b  lbz r31,112(r1)
  18: d1 ff 81 98  stb r4,-47(r1)
  1c: d2 ff a1 98  stb r5,-46(r1)
  20: 78 00 81 89  lbz r12,120(r1)
  24: 80 00 01 88  lbz r0,128(r1)
  28: 88 00 61 89  lbz r11,136(r1)
  2c: 90 00 81 88  lbz r4,144(r1)
  30: 98 00 a1 88  lbz r5,152(r1)
  34: d0 ff 61 98  stb r3,-48(r1)
  38: d3 ff c1 98  stb r6,-45(r1)
  3c: d4 ff e1 98  stb r7,-44(r1)
  40: d8 ff a1 9b  stb r29,-40(r1)
  44: d5 ff 01 99  stb r8,-43(r1)
  48: d6 ff 21 99  stb r9,-42(r1)
  4c: d7 ff 41 99  stb r10,-41(r1)
  50: d9 ff c1 9b  stb r30,-39(r1)
  54: da ff e1 9b  stb r31,-38(r1)
  58: db ff 81 99  stb r12,-37(r1)
  5c: dc ff 01 98  stb r0,-36(r1)
  60: dd ff 61 99  stb r11,-35(r1)
  64: de ff 81 98  stb r4,-34(r1)
  68: df ff a1 98  stb r5,-33(r1)
  6c: e8 ff a1 eb  ld r29,-24(r1)
  70: f0 ff c1 eb  ld r30,-16(r1)
  74: f8 ff e1 eb  ld r31,-8(r1)
  78: d9 ff 41 f4  lxv vs34,-48(r1)
  7c: 20 00 80 4e  blr

But it can be more efficient with direct move and vector merge, such as:

   0: 67 01 43 7c  mtvsrd vs34,r3
   4: 68 00 61 80  lwz r3,104(r1)
   8: 60 00 61 81  lwz r11,96(r1)
   c: 67 01 64 7c  mtvsrd vs35,r4
  10: 70 00 81 80  lwz r4,112(r1)
  14: 67 01 03 7d  mtvsrd vs40,r3
  18: 78 00 61 80  lwz r3,120(r1)
  1c: 67 01 85 7c  mtvsrd vs36,r5
  20: 67 01 a6 7c  mtvsrd vs37,r6
  24: 67 01 07 7c  mtvsrd vs32,r7
  28: 67 01 28 7c  mtvsrd vs33,r8
  2c: 67 01 24 7d  mtvsrd vs41,r4
  30: 80 00 81 80  lwz r4,128(r1)
  34: 0c 10 43 10  vmrghb v2,v3,v2
  38: 67 01 63 7c  mtvsrd vs35,r3
  3c: 88 00 61 80  lwz r3,136(r1)
  40: 67 01 eb 7c  mtvsrd vs39,r11
  44: 0c 20 85 10  vmrghb v4,v5,v4
  48: 67 01 a4 7c  mtvsrd vs37,r4
  4c: 90 00 81 80  lwz r4,144(r1)
  50: 0c 00 01 10  vmrghb v0,v1,v0
  54: 67 01 23 7c  mtvsrd vs33,r3
  58: 98 00 61 80  lwz r3,152(r1)
  5c: 67 01 c9 7c  mtvsrd vs38,r9
  60: 0c 38 e8 10  vmrghb v7,v8,v7
  64: 67 01 04 7d  mtvsrd vs40,r4
  68: 0c 48 63 10  vmrghb v3,v3,v9
  6c: 67 01 23 7d  mtvsrd vs41,r3
  70: 0c 28 a1 10  vmrghb v5,v1,v5
  74: 67 01 2a 7c  mtvsrd vs33,r10
  78: 0c 40 09 11  vmrghb v8,v9,v8
  7c: 0c 30 21 10  vmrghb v1,v1,v6
  80: 4c 11 44 10  vmrglh v2,v4,v2
  84: 4c 39 63 10  vmrglh v3,v3,v7
  88: 4c 29 88 10  vmrglh v4,v8,v5
  8c: 4c 01 a1 10  vmrglh v5,v1,v0
  90: 8c 19 64 10  vmrglw v3,v4,v3
  94: 8c 11 45 10  vmrglw v2,v5,v2
  98: 57 13 43 f0  xxmrgld vs34,vs35,vs34
[Bug target/95535] Failure to optimize out cdqe after __bultin_ctz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95535 Gabriel Ravier changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #4 from Gabriel Ravier --- Looks like this is fixed but not marked as such, so I'll make it so.
[Bug c++/83591] -Wduplicated-branches fires in system headers in template instantiation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83591 --- Comment #7 from Tony E Lewis --- Thanks for this comment T vd Sijs. Yes - I'm also able to compile this without problem in 9.3 (and in 10.1). Manuel López-Ibáñez: are you happy that all underlying issues are resolved and this can be closed?
[Bug tree-optimization/94893] Sign function not getting optimized to simple compare
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94893 Gabriel Ravier changed: What|Removed |Added Target|x86_64-*-* | Blocks||19987 --- Comment #1 from Gabriel Ravier --- No idea why it was marked as x86-specific, it isn't as far as I can see, so I removed the target here. Do tell me if this is wrong. Referenced Bugs: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=19987 [Bug 19987] [meta-bug] fold missing optimizations in general
[Bug libstdc++/86419] codecvt::in() and out() incorrectly return ok in some cases.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86419 --- Comment #9 from Dimitrij Mijoski --- Ignore my last comment, here is it fixed. Looking again at my proposed fix in comment #7, i concluded it is not the best fix. It will fix the testsuite in the same comment #7, but I discovered another class of errors related to the lines I am touching in that proposed fix. The error is when we have an incomplete sequence which is in the middle of the from range, and not at the end. In such cases codecvt_base::error should be returned. The bug exists in UTF8->UTF16, UTF8->UCS4 and UTF16->UCS4. I guess some more test need to be written about returning error.
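The distinction drawn in the comment above, between an incomplete sequence at the end of the input and one in the middle of it, can be sketched in plain C. This is not the libstdc++ implementation: utf8_status, utf8_seq_len and utf8_classify are hypothetical names, the status values only mirror codecvt_base::ok, partial and error, and the checker deliberately ignores overlong encodings and surrogates to stay short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status values mirroring codecvt_base::ok/partial/error.  */
enum utf8_status { UTF8_OK, UTF8_PARTIAL, UTF8_ERROR };

/* Number of bytes announced by a lead byte, or 0 if the byte cannot
   start a sequence (for example a stray continuation byte).  */
static size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80) return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 0;
}

/* Simplified classifier: overlong forms and surrogates are not checked.  */
static enum utf8_status utf8_classify(const unsigned char *s, size_t n)
{
    assert(s != NULL || n == 0);
    size_t i = 0;
    while (i < n) {
        size_t len = utf8_seq_len(s[i]);
        if (len == 0)
            return UTF8_ERROR;
        for (size_t k = 1; k < len; ++k) {
            if (i + k == n)
                return UTF8_PARTIAL;   /* input ended inside a sequence */
            if ((s[i + k] & 0xC0) != 0x80)
                return UTF8_ERROR;     /* sequence cut short mid-input */
        }
        i += len;
    }
    return UTF8_OK;
}
```

Under these assumptions, a lone lead byte at the very end of the buffer is reported as partial (more input might complete it), while the same lead byte followed by an ordinary character is reported as an error, which is the behaviour the comment asks for.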
[Bug tree-optimization/94880] Failure to recognize andn pattern
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94880 Gabriel Ravier changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #3 from Gabriel Ravier --- This seems to be fixed, but it isn't closed. I'll close it myself, but do tell me if I somehow missed something.
[Bug libstdc++/86419] codecvt::in() and out() incorrectly return ok in some cases.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86419 --- Comment #8 from Dimitrij Mijoski --- Looking again at my proposed fix in comment #6, i concluded it is not the best fix. It will fix the testsuite in the same comment #6, but I discovered another class of errors related to the lines I am touching in that proposed fix. The error is when we have an incomplete sequence which is in the middle of the from range, and not at the end. In such cases codecvt_base::error should be returned. The bug exists in UTF8->UTF16, UTF8->UCS4 and UTF16->UCS4. I guess some more test need to be written about returning error.
[Bug target/96898] [nvptx] libatomic support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96898 Jakub Jelinek changed: What|Removed |Added CC||jakub at gcc dot gnu.org --- Comment #3 from Jakub Jelinek --- For OpenMP reductions, we really don't care what kind of mutex protects the updates, as long as it is the same for all updates of the same reduction. I believe we don't rely on any other synchronization effects. So, I think we should change omp-low.c so that it emits __atomic_* calls with __ATOMIC_RELAXED rather than __sync_* calls. And could just use libatomic with its own locking if we didn't go the GOMP_atomic_{start,end} route (that one is done if there are multiple reductions or the atomics aren't available or there are user defined reductions we don't understand (or all?), perhaps we should consider also using atomics perhaps even for two simple reductions or similar. And nvptx certainly could just use libatomic...
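The difference between the two families of builtins mentioned above can be seen in a small C sketch. Both __sync_fetch_and_add and __atomic_fetch_add are existing GCC builtins; the function names reduce_sync and reduce_relaxed are hypothetical, and this is not what omp-low.c emits, only an illustration of a sum reduction whose updates need atomicity but no ordering.

```c
#include <assert.h>
#include <stddef.h>

/* Accumulate with the legacy builtin, which GCC documents as a
   full barrier.  */
static void reduce_sync(long *sum, const long *vals, size_t n)
{
    assert(sum != NULL);
    for (size_t i = 0; i < n; ++i)
        __sync_fetch_and_add(sum, vals[i]);
}

/* Accumulate with a relaxed atomic add: the update is still atomic,
   but no ordering with other memory operations is requested.  */
static void reduce_relaxed(long *sum, const long *vals, size_t n)
{
    assert(sum != NULL);
    for (size_t i = 0; i < n; ++i)
        __atomic_fetch_add(sum, vals[i], __ATOMIC_RELAXED);
}
```

Both functions produce the same total; the relaxed variant simply gives the target more freedom, which matches the observation that reductions do not rely on other synchronization effects.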
[Bug tree-optimization/96931] [11 Regression] ICE in add_phi_arg, at tree-phinodes.c:359
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96931 --- Comment #3 from Richard Biener ---
So the testcase only triggers on trunk because store commoning is new there and it transforms (interestingly!)

  [local count: 10631108]:
  p3 ();
  bl = 0;

  [local count: 1073741824]:
  bl.0_1 = bl;
  _2 = bl.0_1 + 1;
  bl = _2;
  goto ; [100.00%]

to

  [local count: 10631108]:
  p3 ();

  [local count: 1073741824]:
  # _8 = PHI <0(2), _2(3)>
  bl = _8;
  bl.0_1 = bl;
  _2 = bl.0_1 + 1;
  goto ; [100.00%]

which predcom tries to "improve" to

  [local count: 10631108]:
  p3 ();
  _10 = bl;

  [local count: 1073741824]:
  # _8 = PHI <0(2), _2(4)>
  # bl_lsm0.3_7 = PHI <_10(2), bl_lsm0.3_6(4)>
  bl = _8;
  bl_lsm0.3_6 = _8;
  bl.0_1 = bl_lsm0.3_6;
  _2 = bl.0_1 + 1;

  [local count: 1073741824]:
  goto ; [100.00%]

so the situation is quite arcane and eventually not worth fixing on branches.
[Bug target/96932] New: [nvptx] atomic_exchange missing barrier
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96932 Bug ID: 96932 Summary: [nvptx] atomic_exchange missing barrier Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: vries at gcc dot gnu.org Target Milestone: ---

After digging into GOMP_atomic_start/end I realized these also imply barrier semantics. And looking at the source code used for nvptx in libgomp/config/accel/mutex.h, that should be fine:
...
static inline void
gomp_mutex_lock (gomp_mutex_t *mutex)
{
  while (__sync_lock_test_and_set (mutex, 1))
    /* spin */ ;
}

static inline void
gomp_mutex_unlock (gomp_mutex_t *mutex)
{
  __sync_lock_release (mutex);
}
...

However, when looking at the resulting code in libgomp.a we see there's no barrier for GOMP_atomic_start:
...
.visible .func GOMP_atomic_start
{
        .reg .u32 %r22;
        .reg .pred %r23;
$L2:
        .loc 1 51 10
        atom.global.exch.b32 %r22,[atomic_lock],1;
        .loc 1 51 9
        setp.ne.u32 %r23,%r22,0;
        @ %r23 bra $L2;
        .loc 2 43 1
        ret;
}
...

While there is for GOMP_atomic_end:
...
.visible .func GOMP_atomic_end
{
        .reg .u32 %r22;
        .loc 1 58 3
        membar.sys;
        mov.u32 %r22,0;
        st.global.u32 [atomic_lock],%r22;
        .loc 2 49 1
        ret;
}
...
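For reference, GCC documents __sync_lock_test_and_set as an acquire barrier and __sync_lock_release as a release barrier. The sketch below spells out that intended ordering with the __atomic builtins. It is an illustration only, not the libgomp source and not a proposed nvptx fix: spin_mutex_t, spin_lock, spin_try_lock and spin_unlock are hypothetical names.

```cpp
#include <cassert>

// Hypothetical stand-in for gomp_mutex_t: 0 means unlocked, 1 means locked.
typedef int spin_mutex_t;

// Spin until the exchange observes 0. Acquire ordering keeps accesses in
// the critical section from moving above the point where the lock is taken.
inline void spin_lock(spin_mutex_t* m) {
    while (__atomic_exchange_n(m, 1, __ATOMIC_ACQUIRE))
        ; /* spin */
}

// Single attempt: returns true if the lock was free and is now held.
inline bool spin_try_lock(spin_mutex_t* m) {
    return __atomic_exchange_n(m, 1, __ATOMIC_ACQUIRE) == 0;
}

// Release ordering keeps accesses in the critical section from moving
// below the store that frees the lock.
inline void spin_unlock(spin_mutex_t* m) {
    assert(__atomic_load_n(m, __ATOMIC_RELAXED) == 1);
    __atomic_store_n(m, 0, __ATOMIC_RELEASE);
}
```

In terms of the PTX listings above, the membar.sys before the store in GOMP_atomic_end plays the release role; the report is that nothing comparable follows the atom.global.exch loop in GOMP_atomic_start.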
[Bug tree-optimization/96931] [11 Regression] ICE in add_phi_arg, at tree-phinodes.c:359
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96931 --- Comment #2 from Richard Biener ---
diff --git a/gcc/tree-predcom.c b/gcc/tree-predcom.c
index b1d6e63559c..af71c269f4b 100644
--- a/gcc/tree-predcom.c
+++ b/gcc/tree-predcom.c
@@ -1960,7 +1960,8 @@ initialize_root_vars_lm (class loop *loop, dref root, bool written,
 
   init = force_gimple_operand (init, &stmts, written, NULL_TREE);
   if (stmts)
-    gsi_insert_seq_on_edge_immediate (entry, stmts);
+    if (gsi_insert_seq_on_edge_immediate (entry, stmts))
+      entry = loop_preheader_edge (loop);
 
   if (written)
     {

I guess that, even with simple preheaders, passes must not assume that inserting on the entry edge will not split it (the call in the preheader ends the BB because we've had returns_twice functions). Now the ie() call was removed and so was the abnormal edge from p3(), so likely gimple_call_ctrl_altering_p should have been cleared from it, which would be a missed optimization. Thus an alternative fix is there.
[Bug target/96898] [nvptx] libatomic support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96898 --- Comment #2 from Tom de Vries ---
Hmm, I found this difference:
- AFAIU, GOMP_atomic_start/end have barrier semantics
- libatomic's protect_start/end are always paired with explicit barriers, so presumably these don't have barrier semantics

So, using GOMP_atomic_start for protect_start in libatomic will have the effect of issuing the barrier twice, which might be a performance problem.
[Bug tree-optimization/96931] [11 Regression] ICE in add_phi_arg, at tree-phinodes.c:359
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96931 Richard Biener changed:

What|Removed |Added
Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
Last reconfirmed||2020-09-04
Ever confirmed|0 |1
Status|UNCONFIRMED |ASSIGNED
Target Milestone|--- |11.0

--- Comment #1 from Richard Biener --- Mine.