[Bug middle-end/88971] Branch optimization inconsistency (missed optimization)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88971 Andrew Pinski changed: What|Removed |Added Component|libstdc++ |middle-end Severity|normal |enhancement
[Bug tree-optimization/83190] missing strlen optimization of the empty string
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83190 Andrew Pinski changed: What|Removed |Added Severity|normal |enhancement
[Bug tree-optimization/82911] missing strlen optimization for strncpy with constant strings and constant bound
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82911

Andrew Pinski changed:

           What    |Removed      |Added
     Ever confirmed|0            |1
   Last reconfirmed|             |2021-09-05
             Status|UNCONFIRMED  |NEW
           Severity|normal       |enhancement

--- Comment #1 from Andrew Pinski ---
Confirmed.
A related testcase is:
void f1 (char *d, char *e, bool b)
{
  d[2] = 0;
  if (__builtin_strlen (d) > 2)   // not eliminated but could be
    __builtin_abort ();
}

where the strlen's range should be [0,2].
Maybe we can add a class to the ranger for strings and do the optimization that way instead, so that after the null store to d[2] the range for the string length becomes [0,2].
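To make the range argument in the comment concrete, here is a minimal sketch in plain C. The function name len_after_nul_store is made up for illustration and is not part of the bug report; the point is only that once a terminator is stored at index 2, strlen cannot return more than 2, whatever the buffer held before.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the PR): store a terminator at index 2,
   then measure the string.  The caller must pass a buffer of at least
   3 bytes.  The result is always in [0,2]. */
size_t len_after_nul_store(char *d)
{
    d[2] = '\0';
    return strlen(d);
}
```

A compiler that tracks this range could fold a later `strlen (d) > 2` test to false, which is what the testcase asks for.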
[Bug rtl-optimization/93525] Left shift and arithmetic shift could be futher simplified in simplify-rtx.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93525 Andrew Pinski changed: What|Removed |Added Severity|normal |enhancement
[Bug tree-optimization/93539] memmove over self with result of string function not eliminated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93539

Andrew Pinski changed:

           What    |Removed      |Added
   Last reconfirmed|             |2021-09-05
     Ever confirmed|0            |1
           Severity|normal       |enhancement
         Depends on|             |82991
             Status|UNCONFIRMED  |NEW
           See Also|             |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82991

--- Comment #1 from Andrew Pinski ---
Confirmed, PR 82991 is related and will most likely solve this too.

Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82991
[Bug 82991] memcpy and strcpy return value can be assumed to be equal to first argument
[Bug tree-optimization/93556] lower mempcpy to memcpy when result is unused
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93556 Andrew Pinski changed: What|Removed |Added Status|UNCONFIRMED |NEW Ever confirmed|0 |1 Last reconfirmed||2021-09-05 Severity|normal |enhancement --- Comment #1 from Andrew Pinski --- Confirmed.
[Bug tree-optimization/93560] strstr(s, s) not folded to s
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93560 Andrew Pinski changed: What|Removed |Added Severity|normal |enhancement Status|UNCONFIRMED |NEW Last reconfirmed||2021-09-05 Ever confirmed|0 |1 --- Comment #1 from Andrew Pinski --- Confirmed, LLVM does this.
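As a small illustration of why the fold in the summary is valid: every string occurs in itself at offset 0, so strstr(s, s) must return s (this also holds for the empty string). The wrapper name find_self below is an assumption for the example, not something from the PR.

```c
#include <string.h>

/* Hypothetical wrapper: searching a string for itself always matches
   at offset 0, so the result is the haystack pointer. */
const char *find_self(const char *s)
{
    return strstr(s, s);
}
```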
[Bug target/93737] inline memmove for insertion into small arrays
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93737 Andrew Pinski changed: What|Removed |Added Last reconfirmed||2021-09-05 Ever confirmed|0 |1 Status|UNCONFIRMED |NEW Severity|normal |enhancement --- Comment #5 from Andrew Pinski --- Confirmed.
[Bug target/93396] [RX] tail call optimization does not work with indirect call
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93396 Andrew Pinski changed: What|Removed |Added Severity|normal |enhancement
[Bug middle-end/91409] Missed optimization on `labels as values` expression
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91409 Andrew Pinski changed: What|Removed |Added Severity|normal |enhancement Component|target |middle-end
[Bug rtl-optimization/52082] Memory loads not rematerialized
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52082

Andrew Pinski changed:

           What    |Removed  |Added
           Severity|normal   |enhancement

--- Comment #3 from Andrew Pinski ---
One thing I noticed that LLVM does to reduce the register pressure is that
  (z ? v4 [k] : v3 [k])
gets pulled out of the loop such that it becomes:
  tmpaddr = z ? v4 : v3;
and then inside the loop it does:
  (tmpaddr)[k]

GCC still has (I changed the bb order just so it is easier to see what is going on):
  if (z_39(D) != 0)
    goto ; [50.00%]
  else
    goto ; [50.00%]

  [local count: 5427362]:
  _21 = v3.3_18 + _157;
  iftmp.1_40 = *_21;
  goto ; [100.00%]

  [local count: 5427362]:
  _17 = v4.2_14 + _157;
  iftmp.1_41 = *_17;

  [local count: 10854724]:
  # m_8 = PHI
  if (m_8 != 0B)
    goto ; [94.50%]
  else
    goto ; [5.50%]

It seems we should be able to do something similar and need two fewer registers: one to hold z and one to hold either v3 or v4.
This won't be enough for this testcase but it will be something.
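The hoisting described in the comment can be sketched at the source level as below. This is only an illustration under assumptions: the function names, the long element type, and the summing loop are invented here and are not the PR's testcase. It shows that selecting the base pointer once before the loop gives the same result as selecting the element on every iteration, while keeping only one pointer live inside the loop.

```c
#include <stddef.h>

/* Select inside the loop: z, v3 and v4 all stay live across iterations. */
long sum_select_in_loop(int z, const long *v3, const long *v4, size_t n)
{
    long sum = 0;
    for (size_t k = 0; k < n; k++)
        sum += z ? v4[k] : v3[k];
    return sum;
}

/* Select hoisted: only the chosen base pointer stays live in the loop. */
long sum_select_hoisted(int z, const long *v3, const long *v4, size_t n)
{
    const long *tmpaddr = z ? v4 : v3;
    long sum = 0;
    for (size_t k = 0; k < n; k++)
        sum += tmpaddr[k];
    return sum;
}
```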
[Bug target/91103] AVX512 vector element extract uses more than 1 shuffle instruction; VALIGND can grab any element
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91103 Andrew Pinski changed: What|Removed |Added Severity|normal |enhancement
[Bug ipa/88231] aligned functions laid down inefficiently
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88231 Andrew Pinski changed: What|Removed |Added Severity|minor |enhancement
[Bug tree-optimization/89043] strcat (strcpy (d, a), b) not folded to stpcpy (strcpy (d, a), b)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89043 Andrew Pinski changed: What|Removed |Added Severity|normal |enhancement
[Bug tree-optimization/86604] phiopt missed optimization of conditional add
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86604

Andrew Pinski changed:

           What    |Removed  |Added
                 CC|         |pinskia at gcc dot gnu.org
           Severity|normal   |enhancement

--- Comment #2 from Andrew Pinski ---
Maybe something like:
(simplify
 (cond (ne bool@0 integer_zerop) (plus @1 integer_onep) @1)
 (plus (convert @0) @1))

Where bool is defined to be a var that is in the range of [0,1].
This seems like what LLVM does.
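In source terms, the proposed simplification rewrites a conditional increment into an addition of the converted boolean. A minimal sketch, with function names chosen here for illustration and unsigned arithmetic assumed so that wraparound is well defined:

```c
#include <stdbool.h>

/* Form matched by the pattern: b != 0 ? x + 1 : x */
unsigned cond_add_branchy(bool b, unsigned x)
{
    return b != 0 ? x + 1 : x;
}

/* Form produced by the pattern: (convert b) + x */
unsigned cond_add_folded(bool b, unsigned x)
{
    return (unsigned)b + x;
}
```

The two agree because b is known to be in [0,1], which is the precondition the comment states for the pattern.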
[committed] Improve H8/300 C bit handling
These are various minor improvements to our C bit setcc handling. First the mode of the operands being compared can be independent of the mode of the destination. This allows us to pick up a few more cases. Second, the result of an setcc can feed a negate insn. Producing -1,0 is actually cheaper for the C bit than 1,0, so clearly something we should be supporting and it happens fairly regularly. Third we can use bst, bist and rotxr to store the C bit into a variety of bit positions in the destination which allows us to combine the setcc with a subsequent left shift. I haven't really seen this with C, but I have seen it semi-regularly with Z (shifting it to the sign bit in the destination in particular). Regardless, the bits are in place to handle it for C. Overall it saves a few bytes & cycles here and there. Nothing near as effective as the initial C support. Tested without regressions. Installed on the trunk, Jeff commit b27416a7a91b7e6b6b018411ac85cad556ff9903 Author: Jeff Law Date: Sun Sep 5 00:08:34 2021 -0400 Improve handling of C bit for setcc insns gcc/ * config/h8300/h8300.md (QHSI2 mode iterator): New mode iterator. * config/h8300/testcompare.md (store_c): Update name, use new QHSI2 iterator. (store_neg_c, store_shifted_c): New patterns. diff --git a/gcc/config/h8300/h8300.md b/gcc/config/h8300/h8300.md index 89bfcf11126..e81e21b103e 100644 --- a/gcc/config/h8300/h8300.md +++ b/gcc/config/h8300/h8300.md @@ -223,6 +223,7 @@ (define_mode_iterator HSI [HI SI]) (define_mode_iterator QHSI [QI HI SI]) +(define_mode_iterator QHSI2 [QI HI SI]) (define_mode_iterator QHSIF [QI HI SI SF]) diff --git a/gcc/config/h8300/testcompare.md b/gcc/config/h8300/testcompare.md index 9ff7a51077e..0ee3e360bea 100644 --- a/gcc/config/h8300/testcompare.md +++ b/gcc/config/h8300/testcompare.md @@ -212,11 +212,96 @@ } [(set (attr "length") (symbol_ref "mode == SImode ? 
6 : 4"))]) +;; Similarly, but with a negated result +(define_insn "*store_neg_c_" + [(set (match_operand:QHSI 0 "register_operand" "=r") + (neg:QHSI (ne:QHSI (reg:CCC CC_REG) (const_int 0] + "reload_completed" + { +if (mode == QImode) + return "subx\t%X0,%X0"; +else if (mode == HImode) + return "subx\t%X0,%X0\;exts.w\t%T0"; +else if (mode == SImode) + return "subx\t%X0,%X0\;exts.w\t%T0\;exts.l\t%S0"; +gcc_unreachable (); + } + [(set + (attr "length") + (symbol_ref "(mode == SImode ? 6 : mode == HImode ? 4 : 2)"))]) + +;; Using b[i]st we can store the C bit into any of the low 16 bits of +;; a destination. We can also rotate it up into the high bit of a 32 bit +;; destination. +(define_insn "*store_shifted_c" + [(set (match_operand:QHSI 0 "register_operand" "=r") + (ashift:QHSI (eqne:QHSI (reg:CCC CC_REG) (const_int 0)) +(match_operand 1 "immediate_operand" "n")))] + "(reload_completed +&& (INTVAL (operands[1]) == 31 || INTVAL (operands[1]) <= 15))" + { +if ( == NE) + { + if (mode == QImode) + return "xor.b\t%X0,%X0\;bst\t%1,%X0"; + else if (mode == HImode && INTVAL (operands[1]) < 8) + return "xor.w\t%T0,%T0\;bst\t%1,%X0"; + else if (mode == HImode) + { + operands[1] = GEN_INT (INTVAL (operands[1]) - 8); + output_asm_insn ("xor.w\t%T0,%T0\;bst\t%1,%t0", operands); + return ""; + } + else if (mode == SImode && INTVAL (operands[1]) == 31) + return "xor.l\t%S0,%S0\;rotxr.l\t%S0"; + else if (mode == SImode && INTVAL (operands[1]) < 8) + return "xor.l\t%S0,%S0\;bst\t%1,%X0"; + else if (mode == SImode) + { + operands[1] = GEN_INT (INTVAL (operands[1]) - 8); + output_asm_insn ("xor.l\t%S0,%S0\;bst\t%1,%t0", operands); + return ""; + } + gcc_unreachable (); + } +else if ( == EQ) + { + if (mode == QImode) + return "xor.b\t%X0,%X0\;bist\t%1,%X0"; + else if (mode == HImode && INTVAL (operands[1]) < 8) + return "xor.w\t%T0,%T0\;bist\t%1,%X0"; + else if (mode == HImode) + { + operands[1] = GEN_INT (INTVAL (operands[1]) - 8); + output_asm_insn ("xor.w\t%T0,%T0\;bist\t%1,%t0", 
operands); + return ""; + } + else if (mode == SImode && INTVAL (operands[1]) == 31) + return "xor.l\t%S0,%S0\;bixor\t#0,%X0\;rotxr.l\t%S0"; + else if (mode == SImode && INTVAL (operands[1]) < 8) + return "xor.l\t%S0,%S0\;bist\t%1,%X0"; + else if (mode == SImode) + { + operands[1] = GEN_INT (INTVAL (operands[1]) - 8); + output_asm_insn ("xor.l\t%S0,%S0\;bist\t%1,%t0", operands); + return ""; + } + gcc_unreachable (); + } +gcc_unreachable (); + } + [(set + (attr "length") + (symbol_ref "(mode == QImode ? 4 + : mode ==
[Bug tree-optimization/86241] duplicate strlen-like snprintf calls not folded
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86241 Andrew Pinski changed: What|Removed |Added Ever confirmed|0 |1 Severity|normal |enhancement Status|UNCONFIRMED |NEW Last reconfirmed||2021-09-05 --- Comment #3 from Andrew Pinski --- Confirmed.
[Bug tree-optimization/86339] DOM does not handle RHS COND_EXPRs well
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86339 Andrew Pinski changed: What|Removed |Added Severity|normal |enhancement
[Bug tree-optimization/85116] std::min_element does not optimize well with inlined predicate
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85116 Andrew Pinski changed: What|Removed |Added Severity|normal |enhancement Last reconfirmed|2018-03-29 00:00:00 |2021-9-4 Component|libstdc++ |tree-optimization
[Bug middle-end/86085] I/O built-ins considered argument clobbers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86085 --- Comment #3 from Andrew Pinski --- I thought builtin_fnspec and friends would have optimized this case but no. In fact starting with GCC 10, f even regresses, starting with r10-2814.
[Bug middle-end/86085] I/O built-ins considered argument clobbers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86085 Andrew Pinski changed: What|Removed |Added Last reconfirmed|2018-06-13 00:00:00 |2021-9-4 Severity|normal |enhancement Component|tree-optimization |middle-end
[Bug target/102205] New: vec + 1 could be done as vec - (-1)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102205

            Bug ID: 102205
           Summary: vec + 1 could be done as vec - (-1)
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: enhancement
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pinskia at gcc dot gnu.org
  Target Milestone: ---
            Target: x86_64

Take:
template <typename T> using V [[gnu::vector_size(16)]] = T;

auto a1(V< int> b) { return 1 + b; }
CUT
Currently GCC produces:
a1(int __vector(4)):
        paddd   .LC0(%rip), %xmm0
        ret
        .cfi_endproc
.LFE0:
        .size   a1(int __vector(4)), .-a1(int __vector(4))
        .section        .rodata.cst16,"aM",@progbits,16
        .align 16
.LC0:
        .long   1
        .long   1
        .long   1
        .long   1

But it might be best if GCC produces (like LLVM):
a1(int __vector(4)):
        pcmpeqd %xmm1, %xmm1
        psubd   %xmm1, %xmm0
        retq
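The identity behind the suggested sequence is that comparing a register with itself (pcmpeqd) yields all-ones lanes, i.e. -1, and subtracting -1 is the same as adding 1, so no constant needs to be loaded from memory. The sketch below models that lane by lane on a plain array; it is an illustration only, the function names are invented, and unsigned 32-bit lanes are assumed so that wraparound is well defined.

```c
#include <stdint.h>

/* Straightforward form: add the constant 1 to each of the 4 lanes. */
void add_one_const(uint32_t v[4])
{
    for (int i = 0; i < 4; i++)
        v[i] += 1u;
}

/* pcmpeqd/psubd form: build an all-ones lane (what x == x gives in a
   vector compare), then subtract it.  v - 0xFFFFFFFF == v + 1 mod 2^32. */
void add_one_via_sub(uint32_t v[4])
{
    for (int i = 0; i < 4; i++) {
        uint32_t all_ones = UINT32_MAX; /* models the pcmpeqd result */
        v[i] -= all_ones;
    }
}
```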
[Bug target/85324] missing constant propagation on SSE/AVX conversion intrinsics
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85324 Andrew Pinski changed: What|Removed |Added Severity|normal |enhancement Last reconfirmed|2018-04-11 00:00:00 |2021-9-4
[Bug c++/102204] New: OpenMP offload map type restriction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102204 Bug ID: 102204 Summary: OpenMP offload map type restriction Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: xw111luoye at gmail dot com Target Milestone: --- With branch devel/omp/gcc-11 I'm getting /home/yeluo/opt/qmcpack/build_rtx3060_gcc_offload_real/src/config.h:42:29: error: array section does not have mappable type in ‘map’ clause 42 | #define PRAGMA_OFFLOAD(x) _Pragma(x) | ^~~ /home/yeluo/opt/qmcpack/src/Particle/SoaDistanceTableAAOMPTarget.h:84:5: note: in expansion of macro ‘PRAGMA_OFFLOAD’ 84 | PRAGMA_OFFLOAD("omp target enter data map(to : this[:1])") | ^~ In file included from /home/yeluo/opt/qmcpack/src/Particle/createDistanceTableAAOMPTarget.cpp:19: /home/yeluo/opt/qmcpack/src/Particle/SoaDistanceTableAAOMPTarget.h:31:8: note: type ‘qmcplusplus::SoaDistanceTableAAOMPTarget’ with virtual members is not mappable 31 | struct SoaDistanceTableAAOMPTarget : public DTD_BConds, public DistanceTableData |^~~ because SoaDistanceTableAAOMPTarget is a derived class and there is virtual function overriding. https://github.com/QMCPACK/qmcpack/blob/1a7af8e589726a91da94e5f6ad8b4e8d9e2acd4d/src/Particle/SoaDistanceTableAAOMPTarget.h#L31 In my case virtual functions are never called in offload region and I map "this[:1]" for easy access a fixed data set. So I'm expecting just bit wise copy to the device. please remove this restriction.
[Bug c++/98869] Allowing mapping this in OpenMP target
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98869 --- Comment #3 from Ye Luo --- This doesn't work with gcc 11.2 but works on devel/omp/gcc-11 branch.
[Bug middle-end/84756] Multiplication done twice just to get upper and lower parts of product
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84756

Andrew Pinski changed:

           What    |Removed      |Added
   Last reconfirmed|             |2021-09-05
          Component|target       |middle-end
             Status|UNCONFIRMED  |NEW
           Severity|normal       |enhancement
     Ever confirmed|0            |1

--- Comment #2 from Andrew Pinski ---
Confirmed, we should be able to do part (all?) of this at the gimple level:
  _3 = a_6(D) w* b_7(D);
  _4 = _3 >> 64;
  _5 = (long unsigned int) _4;
  *upper_9(D) = _5;
  _11 = a_6(D) * b_7(D);
  return _11;

(long unsigned int)_3 is the same as _11.
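A source-level sketch of the observation in the comment, assuming a 64-bit target where GCC or Clang provide the unsigned __int128 extension. The function names are hypothetical and this is not the PR's original testcase. The first function mirrors the gimple above (a widening multiply plus a second ordinary multiply); the second reuses the truncated wide product for the low half.

```c
#include <stdint.h>

/* Mirrors the gimple dump: one widening multiply for the upper half and
   a second, separate multiply for the lower half. */
uint64_t mul_two_multiplies(uint64_t a, uint64_t b, uint64_t *upper)
{
    unsigned __int128 wide = (unsigned __int128)a * b;
    *upper = (uint64_t)(wide >> 64);
    return a * b;
}

/* Single multiply: the low 64 bits of the wide product are the same
   value as the truncating 64-bit multiply. */
uint64_t mul_one_multiply(uint64_t a, uint64_t b, uint64_t *upper)
{
    unsigned __int128 wide = (unsigned __int128)a * b;
    *upper = (uint64_t)(wide >> 64);
    return (uint64_t)wide;
}
```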
[Bug ipa/84312] Variadic function without named argument not inlined
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84312 Andrew Pinski changed: What|Removed |Added Known to fail||9.4.0 Resolution|--- |FIXED Status|NEW |RESOLVED CC||marxin at gcc dot gnu.org Component|tree-optimization |ipa See Also||https://gcc.gnu.org/bugzill ||a/show_bug.cgi?id=70929 Severity|normal |enhancement Known to work||10.1.0 --- Comment #2 from Andrew Pinski --- Fixed in GCC 10 by r10-.
[Bug tree-optimization/85406] Unnecessary blend when vectorizing short-cutted calculations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85406

Andrew Pinski changed:

           What    |Removed  |Added
           Severity|normal   |enhancement

--- Comment #7 from Andrew Pinski ---
I noticed that neither clang/LLVM nor ICC does this either.
[PATCH, Fortran] Skip gfortran.dg/PR100914.f90 on targets that don't provide quadmath.h
The testcase gfortran.dg/PR100914.f90 that I recently checked in (originally written by José Rui Faustino de Sousa) depends on the header file <quadmath.h> to obtain a typedef for __complex128. It appears not to be possible to define an equivalent type in a portable way in the testcase itself (see https://gcc.gnu.org/onlinedocs/gcc-11.2.0/gcc/Floating-Types.html) so this patch skips the test entirely on targets where quadmath.h is not available.

The target-supports.exp change was cut-and-pasted from similar code in that file, but I haven't figured out how to test this change in a build that doesn't provide quadmath.h (e.g., my aarch64-linux-gnu toolchain build attempt croaked with an unrelated compilation error in glibc). Perhaps someone who previously encountered the FAILs on this testcase can confirm that it's skipped with this change?

-Sandra

commit 41fe3b50b3d92931fc99ef15f86cc9299e0c617e
Author: Sandra Loosemore
Date:   Sat Sep 4 18:36:39 2021 -0700

    Skip gfortran.dg/PR100914.f90 on targets that don't provide quadmath.h.

    This test uses the __complex128 type, which is provided by the <quadmath.h> header which may not be available on all targets.

    2021-09-04  Sandra Loosemore

    gcc/testsuite/
    * lib/target-supports.exp (check_effective_target_quadmath_h): New function.
    * gfortran.dg/PR100914.f90: Use it.  Add comments.

diff --git a/gcc/testsuite/gfortran.dg/PR100914.f90 b/gcc/testsuite/gfortran.dg/PR100914.f90
index 64b3335..aff405a 100644
--- a/gcc/testsuite/gfortran.dg/PR100914.f90
+++ b/gcc/testsuite/gfortran.dg/PR100914.f90
@@ -1,7 +1,10 @@
 ! Fails on x86 targets where sizeof(long double) == 16.
 ! { dg-do run { xfail { { x86_64*-*-* i?86*-*-* } && longdouble128 } } }
-! { dg-additional-sources PR100914.c }
+! Requires Fortran support for __float128.
 ! { dg-require-effective-target fortran_real_c_float128 }
+! Requires __complex128 type from quadmath.h.
+! { dg-require-effective-target quadmath_h }
+! { dg-additional-sources PR100914.c }
 !
 ! Test the fix for PR100914
 !
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index ad8f011..072b776 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -8340,6 +8340,14 @@ proc check_effective_target_libc_has_complex_functions {} {
     }]
 }
 
+# Return true if this target has the quadmath.h header.
+
+proc check_effective_target_quadmath_h {} {
+    return [check_no_compiler_messages quadmath_h object {
+	#include <quadmath.h>
+    }]
+}
+
 # Return 1 if
 #   (a) an error of a few ULP is expected in string to floating-point
 #       conversion functions; and
[Bug rtl-optimization/80301] Sub-optimal code with an array of structs offsetted inside a struct global on x86/x86_64 at -O2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80301 Andrew Pinski changed: What|Removed |Added Severity|normal |enhancement --- Comment #4 from Andrew Pinski --- We are able to do the 2->2 combine now (after r9-2064): Trying 9 -> 10: 9: {r87:DI=r86:DI+0x2;clobber flags:CC;} REG_DEAD r86:DI REG_UNUSED flags:CC 10: flags:CCZ=cmp([r87:DI*0x8+`m'],r83:SI) Failed to match this instruction: (parallel [ (set (reg:CCZ 17 flags) (compare:CCZ (mem:SI (plus:DI (mult:DI (reg:DI 86 [ indexD.2442 ]) (const_int 8 [0x8])) (const:DI (plus:DI (symbol_ref:DI ("m") [flags 0x2] ) (const_int 16 [0x10] [1 mD.2375.sD.2374[index_4(D)].aD.2372+0 S4 A64]) (reg:SI 83 [ ]))) (set (reg:DI 87) (plus:DI (reg:DI 86 [ indexD.2442 ]) (const_int 2 [0x2]))) ]) Failed to match this instruction: (parallel [ (set (reg:CCZ 17 flags) (compare:CCZ (mem:SI (plus:DI (mult:DI (reg:DI 86 [ indexD.2442 ]) (const_int 8 [0x8])) (const:DI (plus:DI (symbol_ref:DI ("m") [flags 0x2] ) (const_int 16 [0x10] [1 mD.2375.sD.2374[index_4(D)].aD.2372+0 S4 A64]) (reg:SI 83 [ ]))) (set (reg:DI 87) (plus:DI (reg:DI 86 [ indexD.2442 ]) (const_int 2 [0x2]))) ]) Successfully matched this instruction: (set (reg:DI 87) (plus:DI (reg:DI 86 [ indexD.2442 ]) (const_int 2 [0x2]))) Successfully matched this instruction: (set (reg:CCZ 17 flags) (compare:CCZ (mem:SI (plus:DI (mult:DI (reg:DI 86 [ indexD.2442 ]) (const_int 8 [0x8])) (const:DI (plus:DI (symbol_ref:DI ("m") [flags 0x2] ) (const_int 16 [0x10] [1 mD.2375.sD.2374[index_4(D)].aD.2372+0 S4 A64]) (reg:SI 83 [ ]))) allowing combination of insns 9 and 10 original costs 4 + 13 = 17 replacement costs 4 + 13 = 17 modifying insn i2 9: r87:DI=r86:DI+0x2 deferring rescan insn with uid = 9. modifying insn i310: flags:CCZ=cmp([r86:DI*0x8+const(`m'+0x10)],r83:SI) REG_DEAD r86:DI deferring rescan insn with uid = 10. But then we don't sink the add into the conditional and do the combine there. 
The code we get now is:
func(unsigned int):
        movl    %edi, %edx
        movq    %rdx, %rax
        leaq    2(%rdx), %rcx
        cmpl    %edx, m+16(,%rdx,8)
        je      .L1
        movl    m+4(,%rcx,8), %eax
.L1:
        ret
[Bug target/102203] New: __builtin_memset and __builtin_memcpy could be expanded inline if range is known to be small
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102203

            Bug ID: 102203
           Summary: __builtin_memset and __builtin_memcpy could be expanded inline if range is known to be small
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: enhancement
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pinskia at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64*-*-*

Take:
typedef decltype(sizeof(0)) size_t;
void g(size_t a, char *d, char *e)
{
  if (a>16)__builtin_unreachable();
  __builtin_memcpy(d, e, a);
}
- CUT
This could be inlined like it is on x86_64.
[Bug target/102202] Inefficent expansion of memset when range is [0,1]
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102202

--- Comment #2 from Andrew Pinski ---
I wonder if we could do this expansion at the gimple level, though introducing branches might not make everyone happy.
[Bug target/102202] Inefficent expansion of memset when range is [0,1]
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102202

--- Comment #1 from Andrew Pinski ---
Likewise for memcpy:
typedef decltype(sizeof(0)) size_t;
void g(size_t a, char *d, char *e)
{
  __builtin_memcpy(d, e, a&1);
}
[Bug target/102202] New: Inefficent expansion of memset when range is [0,1]
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102202

            Bug ID: 102202
           Summary: Inefficent expansion of memset when range is [0,1]
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: enhancement
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pinskia at gcc dot gnu.org
  Target Milestone: ---
            Target: x86_64-*-*

Take:
void g(int a, char *d)
{
  if (a < 0 || a > 1)
    __builtin_unreachable();
  __builtin_memset(d, 0, a);
}
- CUT -
GCC compiles on x86_64 to:
g(int, char*):
        .cfi_startproc
        testl   %edi, %edi
        je      .L1
        xorl    %eax, %eax
.L2:
        movl    %eax, %edx
        addl    $1, %eax
        movb    $0, (%rsi,%rdx)
        cmpl    %edi, %eax
        jb      .L2
.L1:
        ret

Which is better than what clang/LLVM/ICC produce, but the loop is not needed, as a will be either 0 or 1 and we already jump around the loop.

Here is another example not using __builtin_unreachable:
void g(int a, char *d)
{
  __builtin_memset(d, 0, a&1);
}
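For a length known to be 0 or 1, the loop-free expansion the report asks for amounts to a single conditional byte store. The sketch below shows the equivalence at the source level; both function names are invented for this example, and it says nothing about how GCC would actually implement the expansion.

```c
#include <stddef.h>
#include <string.h>

/* Reference form from the report: memset with a length in [0,1]. */
void clear_with_memset(int a, char *d)
{
    memset(d, 0, (size_t)(a & 1));
}

/* Loop-free form: at most one byte is written, so one guarded store
   is enough. */
void clear_if_one(int a, char *d)
{
    if (a & 1)
        *d = 0;
}
```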
[Bug tree-optimization/66646] small loop turned into memmove because of tree ldist
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66646 Andrew Pinski changed: What|Removed |Added Severity|normal |enhancement Last reconfirmed|2015-06-24 00:00:00 |2021-9-4
[Bug target/101059] v4sf reduction not optimal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101059 Andrew Pinski changed: What|Removed |Added Severity|normal |enhancement
[Bug tree-optimization/93745] Redundant store not eliminated with intermediate instruction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93745 Andrew Pinski changed: What|Removed |Added Severity|normal |enhancement
[Bug tree-optimization/95410] Failure to optimize compare next to and properly
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95410 Andrew Pinski changed: What|Removed |Added Severity|normal |enhancement
[Bug tree-optimization/84011] Optimize switch table with run-time relocation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84011 Andrew Pinski changed: What|Removed |Added CC||jengelh at inai dot de --- Comment #14 from Andrew Pinski --- *** Bug 99383 has been marked as a duplicate of this bug. ***
[Bug tree-optimization/99383] No tree-switch-conversion under PIC
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99383 Andrew Pinski changed: What|Removed |Added Resolution|--- |DUPLICATE Status|NEW |RESOLVED --- Comment #8 from Andrew Pinski --- Dup of bug 84011. *** This bug has been marked as a duplicate of bug 84011 ***
[Bug tree-optimization/99383] No tree-switch-conversion under PIC
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99383 Andrew Pinski changed: What|Removed |Added Severity|normal |enhancement See Also||https://gcc.gnu.org/bugzill ||a/show_bug.cgi?id=93326, ||https://gcc.gnu.org/bugzill ||a/show_bug.cgi?id=36881
[Bug tree-optimization/93326] switch optimisation of multiple jumptables into a lookup
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93326

--- Comment #6 from Andrew Pinski ---
(In reply to Andrew Pinski from comment #5)
> So for the -fPIC case, we don't want to increase the number of runtime
> relocations done. The number of runtime locations will happen in the
> constable load table. I think we don't want to change that.

And that is PR 99383.
[Bug tree-optimization/85316] [meta-bug] VRP range propagation missed cases
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85316 Bug 85316 depends on bug 98357, which changed state. Bug 98357 Summary: Bounds check not eliminated https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98357 What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED
[Bug tree-optimization/98357] Bounds check not eliminated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98357 Andrew Pinski changed: What|Removed |Added Resolution|--- |FIXED Target Milestone|--- |12.0 Status|NEW |RESOLVED --- Comment #4 from Andrew Pinski --- Fixed on the trunk by some of the improvements to VRP (range).
[Bug tree-optimization/94846] Failure to optimize jnc+inc into adc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94846

Andrew Pinski changed:

           What    |Removed  |Added
           Severity|normal   |enhancement

--- Comment #5 from Andrew Pinski ---
After r12-897 (which added a late sink pass), we get the following in .optimized:
  if (_10 != 0)
    goto ; [50.00%]
  else
    goto ; [50.00%]

  [local count: 536870913]:
  _2 = _1 + 1;

  [local count: 1073741824]:
  # prephitmp_11 = PHI <_1(2), _2(3)>
  # _13 = PHI <_1(2), _2(3)>
  *p_5(D) = _13;
  return prephitmp_11;

Notice how prephitmp_11 and _13 are the same, but no RTL optimizer handles that.
[Bug target/98453] aarch64: Missed opportunity for STP for vec_duplicate
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98453

Andrew Pinski changed:

           What    |Removed      |Added
   Last reconfirmed|             |2021-09-05
           Severity|normal       |enhancement
             Status|UNCONFIRMED  |NEW
     Ever confirmed|0            |1

--- Comment #1 from Andrew Pinski ---
Confirmed.
Plus these functions too:
typedef double v2df __attribute__((vector_size (16)));
typedef float v2sf __attribute__((vector_size (8)));

void
food (v2df *x, double a)
{
  v2df tmp = {a, a};
  *x = tmp;
}

void
foof (v2sf *x, float a)
{
  v2sf tmp = {a, a};
  *x = tmp;
}
[Bug middle-end/19987] [meta-bug] fold missing optimizations in general
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=19987 Bug 19987 depends on bug 95527, which changed state. Bug 95527 Summary: Failure to optimize __builtin_ffs == 0 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95527 What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED
[Bug tree-optimization/95527] Failure to optimize __builtin_ffs == 0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95527 Andrew Pinski changed: What|Removed |Added Resolution|--- |FIXED Target Milestone|--- |11.0 Status|NEW |RESOLVED --- Comment #6 from Andrew Pinski --- Fixed.
[Bug tree-optimization/85316] [meta-bug] VRP range propagation missed cases
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85316 Bug 85316 depends on bug 85375, which changed state. Bug 85375 Summary: possible missed optimisation / regression from 6.3 with while (__builtin_ffs(x) && x) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85375 What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED
[Bug tree-optimization/85375] possible missed optimisation / regression from 6.3 with while (__builtin_ffs(x) && x)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85375

Andrew Pinski changed:

           What    |Removed  |Added
      Known to fail|         |10.3.0
           See Also|         |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95527
   Target Milestone|---      |11.0
         Resolution|---      |FIXED
             Status|NEW      |RESOLVED
      Known to work|         |11.1.0

--- Comment #3 from Andrew Pinski ---
After r11-1080 (PR 95527), __builtin_ffs(x) && x becomes just x != 0 and is optimized.
So yes, this is fixed for GCC 11.
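The fold mentioned in the comment relies on __builtin_ffs(x) being zero exactly when x is zero, so the conjunction carries no more information than x != 0. A small sketch of the equivalence, with function names made up for illustration (__builtin_ffs itself is the GCC/Clang builtin named in the bug):

```c
/* Condition as written in the PR title. */
int ffs_and_nonzero(int x)
{
    return __builtin_ffs(x) && x;
}

/* What it simplifies to: ffs(x) is 0 only for x == 0. */
int is_nonzero(int x)
{
    return x != 0;
}
```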
[Bug rtl-optimization/94798] Failure to optimize subtraction and 0 literal properly
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94798 Andrew Pinski changed: What|Removed |Added Ever confirmed|0 |1 Status|UNCONFIRMED |NEW Severity|normal |enhancement Last reconfirmed||2021-09-04
[Bug rtl-optimization/97603] Failure to optimize out compare into reuse of subtraction result
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97603 Andrew Pinski changed: What|Removed |Added Severity|normal |enhancement
[PATCH 2/2 v2] jit : Generate debug info for variables, testcase
>From 87d081f6b4233446f8a45f76dfd674f1e0b6aafe Mon Sep 17 00:00:00 2001 From: Petter Tomner Date: Sun, 5 Sep 2021 00:18:10 +0200 Subject: [PATCH 2/2] libgccjit: Test cases for debug info Assure that debug info is available for a local and global variable and a function with GDB. Signed-off-by: 2021-09-05 Petter Tomner gcc/testsuite/jit.dg/ * jit.exp: Helper function * test-debuginfo.c: New file --- gcc/testsuite/jit.dg/jit.exp | 25 ++ gcc/testsuite/jit.dg/test-debuginfo.c | 72 +++ 2 files changed, 97 insertions(+) create mode 100644 gcc/testsuite/jit.dg/test-debuginfo.c diff --git a/gcc/testsuite/jit.dg/jit.exp b/gcc/testsuite/jit.dg/jit.exp index 005ba01601a..905ebe62fbd 100644 --- a/gcc/testsuite/jit.dg/jit.exp +++ b/gcc/testsuite/jit.dg/jit.exp @@ -377,6 +377,31 @@ proc dg-jit-set-exe-params { args } { } } +# For test-debuginfo.c. Starts gdb, does cmds and checks the output against match +proc jit-check-debug-info { obj_file cmds match } { +verbose "Checking debug info for $obj_file with match: $match" + +if { [catch {exec gdb -v} fid] } { +verbose "No gdb seems to be in path. Can't check debug info. Reporting 'unsupported'." +unsupported "No gdb seems to be in path. Can't check debug info" +return +} + +spawn gdb $obj_file + +foreach cmd $cmds { +send $cmd +} +expect { +-re $match { pass OK } +default { fail FAIL } +} + +# Quit gdb +send "set confirm off\n" +send "q\n" +} + proc jit-dg-test { prog do_what extra_tool_flags } { verbose "within jit-dg-test..." verbose " prog: $prog" diff --git a/gcc/testsuite/jit.dg/test-debuginfo.c b/gcc/testsuite/jit.dg/test-debuginfo.c new file mode 100644 index 000..49e8834a0ba --- /dev/null +++ b/gcc/testsuite/jit.dg/test-debuginfo.c @@ -0,0 +1,72 @@ +/* Essentially this test checks that debug info are generated for globals + locals and functions, including type info. The comment bellow is used + as fake code (does not affect the test, use for manual debugging). 
*/ +/* +int a_global_for_test_debuginfo; +int main (int argc, char **argv) +{ +int a_local_for_test_debuginfo = 2; +return a_global_for_test_debuginfo + a_local_for_test_debuginfo; +} +*/ +#include "libgccjit.h" + +/* We don't want set_options() in harness.h to set -O3 so our little local + is optimized away. */ +#define TEST_ESCHEWS_SET_OPTIONS +static void set_options (gcc_jit_context *ctxt, const char *argv0) +{ +gcc_jit_context_set_bool_option(ctxt, GCC_JIT_BOOL_OPTION_DEBUGINFO, 1); +} + +#define TEST_COMPILING_TO_FILE +#define OUTPUT_KIND GCC_JIT_OUTPUT_KIND_EXECUTABLE +#define OUTPUT_FILENAME "jit-debuginfo.o" +#include "harness.h" + +#define LOC(row, col) gcc_jit_context_new_location(ctxt, "test-debuginfo.c", row, col) + +void +create_code (gcc_jit_context *ctxt, void* p) +{ + gcc_jit_type *int_type = gcc_jit_context_get_type(ctxt, GCC_JIT_TYPE_INT); + + gcc_jit_lvalue *bar = gcc_jit_context_new_global(ctxt, +LOC(5,1), GCC_JIT_GLOBAL_EXPORTED, +int_type, "a_global_for_test_debuginfo"); + + gcc_jit_param *argc_para = gcc_jit_context_new_param(ctxt, LOC(6,15), +int_type, "argc"); + gcc_jit_param *argv_para = gcc_jit_context_new_param(ctxt, LOC(6,28), +gcc_jit_type_get_pointer( + gcc_jit_type_get_pointer( +gcc_jit_context_get_type(ctxt, GCC_JIT_TYPE_CHAR))), +"argc"); + + gcc_jit_param *params[] = {argc_para, argv_para}; + + gcc_jit_function *foo_fn = gcc_jit_context_new_function(ctxt, LOC(6,5), +GCC_JIT_FUNCTION_EXPORTED, int_type, "main", 2, params, 0); + gcc_jit_block *start_block = gcc_jit_function_new_block(foo_fn, +"start_block"); + + gcc_jit_lvalue *a = gcc_jit_function_new_local(foo_fn, LOC(8,5), +int_type, "a_local_for_test_debuginfo"); + gcc_jit_block_add_assignment(start_block, LOC(8,36), a, +gcc_jit_context_new_rvalue_from_int(ctxt, int_type, 2)); + gcc_jit_rvalue *add = gcc_jit_context_new_binary_op(ctxt, LOC(9,40), +GCC_JIT_BINARY_OP_PLUS, int_type, +gcc_jit_lvalue_as_rvalue(a), gcc_jit_lvalue_as_rvalue(bar)); + + 
gcc_jit_block_end_with_return(start_block, LOC(9,5), add); +} + +#undef LOC + +/* jit-check-debug-info fires up gdb and checks that the variables have + debug info */ + +/* { dg-final { jit-check-debug-info "jit-debuginfo.o" {"info variables\n"} "int\\s+a_global_for_test_debuginfo;" } } */ +/* { dg-final { jit-check-debug-info "jit-debuginfo.o" {"pt main\n"} "int\\s*\\(\\s*int\\s*,\\s*char\\s*\\*\\*\\s*\\)"} } */ +/* { dg-final { jit-check-debug-info "jit-debuginfo.o" {"start\n" "info locals\n"} "a_local_for_test_debuginfo"} } */ +/* { dg-final { jit-check-debug-info "jit-debuginfo.o" {"start\n" "pt a_local_for_test_debuginfo\n"} "int"} } */ \ No newline at end of file -- 2.20.1
[PATCH 1/2 v2] jit : Generate debug info for variables
>From 521349806136bef9096d094f4785f5868854a19d Mon Sep 17 00:00:00 2001 From: Petter Tomner Date: Sat, 4 Sep 2021 23:55:34 +0200 Subject: [PATCH 1/2] libgccjit: Generate debug info for variables Finalize declares via available helpers after location is set. Set TYPE_NAME of primitives and friends to "int" etc. Debug info is now set properly for variables. Signed-off-by: 2021-09-05 Petter Tomner gcc/jit/ * jit-playback.c: Moved global var processing to after loc handling. Setting TYPE_NAME for fundamental types. Using common functions for finalizing globals. * jit-playback.h: New method init_types(). Changed get_tree_node_for_type() to method. gcc/testsuite/jit.dg/ * test-error-array-bounds.c: Array is not unsigned --- gcc/jit/jit-playback.c| 70 +++ gcc/jit/jit-playback.h| 5 ++ .../jit.dg/test-error-array-bounds.c | 2 +- 3 files changed, 62 insertions(+), 15 deletions(-) diff --git a/gcc/jit/jit-playback.c b/gcc/jit/jit-playback.c index 79ac525e5df..bf1bd10dedd 100644 --- a/gcc/jit/jit-playback.c +++ b/gcc/jit/jit-playback.c @@ -165,7 +165,8 @@ gt_ggc_mx () /* Given an enum gcc_jit_types value, get a "tree" type. 
*/ -static tree +tree +playback::context:: get_tree_node_for_type (enum gcc_jit_types type_) { switch (type_) @@ -192,11 +193,7 @@ get_tree_node_for_type (enum gcc_jit_types type_) return short_unsigned_type_node; case GCC_JIT_TYPE_CONST_CHAR_PTR: - { - tree const_char = build_qualified_type (char_type_node, - TYPE_QUAL_CONST); - return build_pointer_type (const_char); - } + return m_const_char_ptr; case GCC_JIT_TYPE_INT: return integer_type_node; @@ -579,10 +576,6 @@ playback::lvalue * playback::context:: global_finalize_lvalue (tree inner) { - varpool_node::get_create (inner); - - varpool_node::finalize_decl (inner); - m_globals.safe_push (inner); return new lvalue (this, inner); @@ -2952,9 +2945,7 @@ replay () { JIT_LOG_SCOPE (get_logger ()); - m_const_char_ptr -= build_pointer_type (build_qualified_type (char_type_node, - TYPE_QUAL_CONST)); + init_types (); /* Replay the recorded events: */ timevar_push (TV_JIT_REPLAY); @@ -2984,10 +2975,17 @@ replay () { int i; function *func; - + tree global; /* No GC can happen yet; process the cached source locations. */ handle_locations (); + /* Finalize globals. See how FORTRAN 95 does it in gfc_be_parse_file() + for a simple reference. */ + FOR_EACH_VEC_ELT (m_globals, i, global) +rest_of_decl_compilation (global, true, true); + + wrapup_global_declarations (m_globals.address(), m_globals.length()); + /* We've now created tree nodes for the stmts in the various blocks in each function, but we haven't built each function's single stmt list yet. Do so now. */ @@ -3081,6 +3079,50 @@ location_comparator (const void *lhs, const void *rhs) return loc_lhs->get_column_num () - loc_rhs->get_column_num (); } +/* Initialize the NAME_TYPE of the primitive types as well as some + others. */ +void +playback::context:: +init_types () +{ + /* See lto_init() in lto-lang.c or void visit (TypeBasic *t) in D's types.cc + for reference. 
If TYPE_NAME is not set, debug info will not contain types */ +#define NAME_TYPE(t,n) \ +if (t) \ + TYPE_NAME (t) = build_decl (UNKNOWN_LOCATION, TYPE_DECL, \ + get_identifier (n), t) + + NAME_TYPE (integer_type_node, "int"); + NAME_TYPE (char_type_node, "char"); + NAME_TYPE (long_integer_type_node, "long int"); + NAME_TYPE (unsigned_type_node, "unsigned int"); + NAME_TYPE (long_unsigned_type_node, "long unsigned int"); + NAME_TYPE (long_long_integer_type_node, "long long int"); + NAME_TYPE (long_long_unsigned_type_node, "long long unsigned int"); + NAME_TYPE (short_integer_type_node, "short int"); + NAME_TYPE (short_unsigned_type_node, "short unsigned int"); + if (signed_char_type_node != char_type_node) +NAME_TYPE (signed_char_type_node, "signed char"); + if (unsigned_char_type_node != char_type_node) +NAME_TYPE (unsigned_char_type_node, "unsigned char"); + NAME_TYPE (float_type_node, "float"); + NAME_TYPE (double_type_node, "double"); + NAME_TYPE (long_double_type_node, "long double"); + NAME_TYPE (void_type_node, "void"); + NAME_TYPE (boolean_type_node, "bool"); + NAME_TYPE (complex_float_type_node, "complex float"); + NAME_TYPE (complex_double_type_node, "complex double"); + NAME_TYPE (complex_long_double_type_node, "complex long double"); + + m_const_char_ptr = build_pointer_type( +build_qualified_type (char_type_node, TYPE_QUAL_CONST)); + + NAME_TYPE (m_const_char_ptr, "char"); + NAME_TYPE (size_type_node, "size_t"); + NAME_TYPE
[PATCH 0/2 v2] jit : Generate debug info for variables
Hi,

This is a revision of my patch for debug info. The patches are posted as mails to this thread. Make check-jit runs fine on Debian x64. Below is the original mail and under it a rehash of the review comments.

Regards, Petter

-

Hi,

This is a patch to generate debug info for local variables as well as globals. With this, "ptype foo", "info variables", "info locals" etc. work when debugging in GDB.

Finalizing of global variable declarations is moved to after locations are handled, and is done the way Fortran, C, Go etc. do it. Also, primitive types have their TYPE_NAME set so that debug info on types works.

Below is the patch, and I attached a testcase. Since it requires GDB to run it might not be suitable? Make check-jit runs fine on Debian x64.

Regards,

-

> Can you write non-empty ChangeLog entries please.

Done. I think the python script chokes on asd/qwe/jit.db/ (the dot) though.

> @@ -2984,15 +2975,22 @@ replay ()
> Looks like some whitespace churn above

Fixed.

> I don't see "Signed-off-by" tags in the patches.

Added.

> I think this should be "unsupported" rather than "xfail".

Changed.

> This is OK, but maybe using gcc_jit_context_dump_to_file with update_locations == 1 might be more sustainable in the long run?

Ye, I didn't remember that flag. Entering loc manually ain't no fun.
[Bug middle-end/19987] [meta-bug] fold missing optimizations in general
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=19987 Bug 19987 depends on bug 95433, which changed state. Bug 95433 Summary: Failure to completely optimize simple compare after operations https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95433 What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED
[Bug tree-optimization/95433] Failure to completely optimize simple compare after operations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95433 Andrew Pinski changed: What|Removed |Added Resolution|--- |FIXED Target Milestone|--- |11.0 Status|NEW |RESOLVED --- Comment #8 from Andrew Pinski --- Fixed in GCC 11 by the commits.
gcc-11-20210904 is now available
Snapshot gcc-11-20210904 is now available on https://gcc.gnu.org/pub/gcc/snapshots/11-20210904/ and on various mirrors, see http://gcc.gnu.org/mirrors.html for details. This snapshot has been generated from the GCC 11 git branch with the following options: git://gcc.gnu.org/git/gcc.git branch releases/gcc-11 revision 9c3a4753acfa1dde12aa1a935a01b8387ca017ec You'll find: gcc-11-20210904.tar.xz Complete GCC SHA256=2ffeb49b3238e57ff02b1e45a4388dba6208fa98b40472e3e7cca66fcbf02a22 SHA1=b0e64f10992449ebcd57cc3132cb9608c98aed65 Diffs from 11-20210828 are available in the diffs/ subdirectory. When a particular snapshot is ready for public consumption the LATEST-11 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way.
[Bug tree-optimization/18487] Warnings for pure and const functions that are not actually pure or const
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=18487 --- Comment #30 from Federico Kircheis ---

It seems to me we are not going to agree, as we tend to repeat ourselves; let's see if we go around and around in circles or if it is more like a spiral ;)

Your view is more about the compiler, how it is interpreting the attributes and thus why the warning is unneeded; mine is more about the developers writing and (most importantly) reading the code.

> The only functions GCC can warn about are those that don’t need the attributes in the first place. The way any warning would work is to detect whether it is pure/const, and then see how the user marked it. So anything it can properly detect as right or wrong didn’t need an attribute to begin with - the compiler could already tell if it was pure/const

My knowledge about how GCC (or other compilers) works is very limited, but if the function is implemented in another

* translation unit
* library
* pre-compiled library
* pre-compiled library created by another compiler

does GCC know it can avoid calling it multiple times?

Whole-program-optimization might help in some of those cases (I admit I have no idea; can the linker remove multiple function calls and replace them with a variable?), but depending on the project size it might add up a lot in terms of compile times. So even for simple functions, where GCC can clearly determine purity, it can be useful to add the attribute.

And even assuming that whole-program-optimization helps in most of those cases (which do not depend on the complexity or length of a function), how does someone know if adding those attributes to a function that is pure makes sense or not? Adding pure to `inline int answer_of_life(){return 42;}` might not make any difference (both for programmers and compiler, because of its simplicity and because it is inline), but where should the line be drawn?
Should I mark my functions (with something else, since you too are suggesting the attribute might do more harm than good), add dummy tests for all of them, and check in the generated assembly whether GCC recognizes them as pure and elides the second call? There must surely be a better way, but I currently know no other.

> Rather than tell the user they got it wrong, you might as well tell the user to remove the attribute because it isn’t necessary and won’t be necessary.

No, removing it as unnecessary would be wrong. Then you can no longer tell the difference between functions that are pure by accident and by design. And you can no longer prevent a pure function from becoming non-pure, except by reading the code. It is useful for programmers (yes, even they look at the code), even for those functions where GCC does not need the attribute.

> Giving a bunch of really contrived examples where users may update things wrong doesn’t seem like a good motivation to make a warning that can only possibly have a really high false positive rate.

Just adding a "printf" statement for debugging, or increasing/decreasing a global counter, invalidates the pure attribute. Thus by trying to understand/analyze a bug, another is added.

> It is a tool for experts.

And I see no harm in making it more developer-friendly. Why would that be a bad idea, as you claimed previously?

Because it is difficult to implement? I do not know if it is, but that would not make it a bad idea.

Because of false positives? Developers can handle them, case by case by documenting and disabling (or ignoring) the diagnostic, or globally by not turning the diagnostic on. Just like any other diagnostic.

Because it adds nothing from a compiler perspective? I'm still not convinced that it has no added value, especially when interacting with "extern" code/libraries. But it definitely has some value for developers.
It's part of the API of a function, just like declaring a member function of a class const (or a parameter of a function). Adding const might even prevent some optimizations, and it leads to code duplication when one needs overloads (like for operator[] in container-like classes), but from a developer perspective it's great. It helps to catch errors. Of course one could never use it; for the compiler it would be the same.

And it would not invalidate the attributes' original use case: it would still be possible to use them like today if someone wants to, and they would not even need to change a thing.
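The contract being argued over can be sketched in C. This is a minimal illustration, not code from the bug report: `square`, `next_id`, `sum_squares`, `sum_ids` and `call_count` are hypothetical names. `square` is genuinely const, so the attribute is a safe promise; `next_id` has a side effect, so marking it const or pure would be exactly the unchecked mistake discussed above, and it is deliberately left unmarked here.

```c
/* Hypothetical example; none of these names come from the bug report. */

static int call_count = 0;

/* Genuinely const: the result depends only on the argument, so the
   compiler is allowed (not required) to merge repeated calls. */
__attribute__((const)) static int square(int x)
{
    return x * x;
}

/* NOT pure: it modifies a global.  Adding __attribute__((const)) or
   __attribute__((pure)) here would be a wrong promise that GCC does
   not diagnose. */
static int next_id(void)
{
    return ++call_count;
}

/* Two calls to a const function; one call is enough to compute this. */
int sum_squares(int x)
{
    return square(x) + square(x);
}

/* Two calls to an impure function; both must really happen. */
int sum_ids(void)
{
    int i = next_id();
    int j = next_id();
    return i + j;
}
```

Whether GCC actually emits one or two calls to `square` depends on the optimization level and on inlining; the attribute only grants permission. The result of `sum_ids` is well defined only because `next_id` carries no such attribute.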
[Bug target/94789] Failure to take advantage of shift operand semantics to turn subtraction into negate
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94789 Andrew Pinski changed: What|Removed |Added Severity|normal |enhancement
[Bug middle-end/92080] Missed CSE of _mm512_set1_epi8(c) with _mm256_set1_epi8(c)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92080 Andrew Pinski changed: What|Removed |Added Last reconfirmed|2019-10-14 00:00:00 |2021-9-4 Severity|normal |enhancement --- Comment #5 from Andrew Pinski --- This gives good code:

#include <immintrin.h>
__m512i sinkz;
__m256i sinky;
void foo(char c)
{
  __m512i a = _mm512_set1_epi8(c);
  sinkz = a;
  sinky = *((__m256i*)&a);
}
[Bug target/93346] gcc does not generate BZHI
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93346 Andrew Pinski changed: What|Removed |Added CC||peter at cordes dot ca --- Comment #8 from Andrew Pinski --- *** Bug 82298 has been marked as a duplicate of this bug. ***
[Bug target/82298] x86 BMI: no peephole for BZHI
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82298 Andrew Pinski changed: What|Removed |Added Target Milestone|--- |10.0 Resolution|--- |DUPLICATE Status|UNCONFIRMED |RESOLVED --- Comment #1 from Andrew Pinski --- Fixed in GCC 10. Dup of bug 93346. *** This bug has been marked as a duplicate of bug 93346 ***
[Bug target/93346] gcc does not generate BZHI
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93346 Andrew Pinski changed: What|Removed |Added Target Milestone|--- |10.0
[PATCH] x86: Enable FMA in unsigned SI to SF expanders
Enable FMA in scalar/vector unsigned SI to SF expanders. gcc/ PR target/85819 * config/i386/i386-expand.c (ix86_expand_convert_uns_sisf_sse): Enable FMA. (ix86_expand_vector_convert_uns_vsivsf): Likewise. gcc/testsuite/ PR target/85819 * gcc.target/i386/pr85819-1.c: New test. * gcc.target/i386/pr85819-2a.c: Likewise. * gcc.target/i386/pr85819-2b.c: Likewise. * gcc.target/i386/pr85819-2c.c: Likewise. * gcc.target/i386/pr85819-3.c: Likewise. --- gcc/config/i386/i386-expand.c | 44 -- gcc/testsuite/gcc.target/i386/pr85819-1.c | 11 ++ gcc/testsuite/gcc.target/i386/pr85819-2a.c | 17 + gcc/testsuite/gcc.target/i386/pr85819-2b.c | 6 +++ gcc/testsuite/gcc.target/i386/pr85819-2c.c | 7 gcc/testsuite/gcc.target/i386/pr85819-3.c | 18 + 6 files changed, 91 insertions(+), 12 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/pr85819-1.c create mode 100644 gcc/testsuite/gcc.target/i386/pr85819-2a.c create mode 100644 gcc/testsuite/gcc.target/i386/pr85819-2b.c create mode 100644 gcc/testsuite/gcc.target/i386/pr85819-2c.c create mode 100644 gcc/testsuite/gcc.target/i386/pr85819-3.c diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c index 2500dbfa7fb..26263bbe1af 100644 --- a/gcc/config/i386/i386-expand.c +++ b/gcc/config/i386/i386-expand.c @@ -1851,12 +1851,21 @@ ix86_expand_convert_uns_sisf_sse (rtx target, rtx input) fp_lo = gen_reg_rtx (SFmode); emit_insn (gen_floatsisf2 (fp_hi, int_hi)); emit_insn (gen_floatsisf2 (fp_lo, int_lo)); - fp_hi = expand_simple_binop (SFmode, MULT, fp_hi, x, fp_hi, - 0, OPTAB_DIRECT); - fp_hi = expand_simple_binop (SFmode, PLUS, fp_hi, fp_lo, target, - 0, OPTAB_DIRECT); - if (!rtx_equal_p (target, fp_hi)) -emit_move_insn (target, fp_hi); + if (TARGET_FMA || TARGET_AVX512F) +{ + x = validize_mem (force_const_mem (SFmode, x)); + fp_hi = gen_rtx_FMA (SFmode, fp_hi, x, fp_lo); + emit_move_insn (target, fp_hi); +} + else +{ + fp_hi = expand_simple_binop (SFmode, MULT, fp_hi, x, fp_hi, + 0, OPTAB_DIRECT); + fp_hi = 
expand_simple_binop (SFmode, PLUS, fp_hi, fp_lo, target, + 0, OPTAB_DIRECT); + if (!rtx_equal_p (target, fp_hi)) + emit_move_insn (target, fp_hi); +} } /* floatunsv{4,8}siv{4,8}sf2 expander. Expand code to convert @@ -1888,12 +1897,23 @@ ix86_expand_vector_convert_uns_vsivsf (rtx target, rtx val) real_ldexp (&TWO16r, &dconst1, 16); tmp[5] = const_double_from_real_value (TWO16r, SFmode); tmp[5] = force_reg (fltmode, ix86_build_const_vector (fltmode, 1, tmp[5])); - tmp[6] = expand_simple_binop (fltmode, MULT, tmp[4], tmp[5], NULL_RTX, 1, - OPTAB_DIRECT); - tmp[7] = expand_simple_binop (fltmode, PLUS, tmp[3], tmp[6], target, 1, - OPTAB_DIRECT); - if (tmp[7] != target) -emit_move_insn (target, tmp[7]); + unsigned vector_size = GET_MODE_SIZE (fltmode); + if (TARGET_FMA + || (TARGET_AVX512F && vector_size == 64) + || (TARGET_AVX512VL && (vector_size == 32 || vector_size == 16))) +{ + tmp[6] = gen_rtx_FMA (fltmode, tmp[4], tmp[5], tmp[3]); + emit_move_insn (target, tmp[6]); +} + else +{ + tmp[6] = expand_simple_binop (fltmode, MULT, tmp[4], tmp[5], + NULL_RTX, 1, OPTAB_DIRECT); + tmp[7] = expand_simple_binop (fltmode, PLUS, tmp[3], tmp[6], + target, 1, OPTAB_DIRECT); + if (tmp[7] != target) + emit_move_insn (target, tmp[7]); +} } /* Adjust a V*SFmode/V*DFmode value VAL so that *sfix_trunc* resp. fix_trunc* diff --git a/gcc/testsuite/gcc.target/i386/pr85819-1.c b/gcc/testsuite/gcc.target/i386/pr85819-1.c new file mode 100644 index 000..db02282d100 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr85819-1.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mno-avx512f -mfma -mfpmath=sse" } */ + +float +foo (unsigned int x) +{ + return x; +} + +/* { dg-final { scan-assembler "vfmadd132ss" { target ia32 } } } */ +/* { dg-final { scan-assembler "vcvtsi2ssq" { target { ! 
ia32 } } } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr85819-2a.c b/gcc/testsuite/gcc.target/i386/pr85819-2a.c new file mode 100644 index 000..cea599fe416 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr85819-2a.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mno-avx512f -mavx2 -mfma -mfpmath=sse" } */ + +typedef float To __attribute__ ((__vector_size__ (32))); +typedef unsigned int From __attribute__ ((__vector_size__ (32))); + +#define A2(I) (float)a[I], (float)a[1+I] +#define A4(I) A2(I), A2(2+I) +#define A8(I) A4(I), A4(4+I) + +To +f(From a) +{ + return __extension__ (To) {A8(0)}; +} + +/*
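The arithmetic behind the scalar expander can be sketched in plain C. The hunk above only shows the final `fp_hi * x + fp_lo` step; the sketch assumes (this is not visible in the quoted context) that `x` is 2^16 and that the two halves are `input >> 16` and `input & 0xffff`. `u32_to_float_split` is a hypothetical name, not a GCC function. Both halves convert to float exactly and scaling by 2^16 is exact, so the only rounding happens in the final addition; that is why one fused multiply-add can stand in for the separate multiply and add.

```c
#include <stdint.h>

/* Hypothetical helper illustrating the hi/lo split; not GCC code. */
static float u32_to_float_split(uint32_t x)
{
    /* Each half fits in 16 bits, so a signed int-to-float conversion
       represents it exactly. */
    float fp_hi = (float)(int32_t)(x >> 16);
    float fp_lo = (float)(int32_t)(x & 0xffffu);

    /* fp_hi * 65536.0f is exact (power-of-two scaling), so only the
       addition rounds.  An FMA computes the same value with the same
       single rounding. */
    return fp_hi * 65536.0f + fp_lo;
}
```

Under the default round-to-nearest mode the result should agree with a direct `(float)x` conversion, including for values above `INT32_MAX`, where a signed-only conversion instruction cannot be used directly.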
[PATCH] x86: Add non-destructive source to @xorsign3_1
Add non-destructive source alternative to @xorsign3_1 for AVX. gcc/ PR target/89984 * config/i386/i386-expand.c (ix86_split_xorsign): Use operands[2]. * config/i386/i386.md (@xorsign3_1): Add non-destructive source alternative for AVX. gcc/testsuite/ PR target/89984 * gcc.target/i386/pr89984-1.c: New test. * gcc.target/i386/pr89984-2.c: Likewise. * gcc.target/i386/xorsign-avx.c: Likewise. --- gcc/config/i386/i386-expand.c | 13 - gcc/config/i386/i386.md | 11 ++- gcc/testsuite/gcc.target/i386/pr89984-1.c | 8 gcc/testsuite/gcc.target/i386/pr89984-2.c | 10 ++ gcc/testsuite/gcc.target/i386/xorsign-avx.c | 4 5 files changed, 36 insertions(+), 10 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/pr89984-1.c create mode 100644 gcc/testsuite/gcc.target/i386/pr89984-2.c create mode 100644 gcc/testsuite/gcc.target/i386/xorsign-avx.c diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c index 2500dbfa7fb..273a0ba8e3d 100644 --- a/gcc/config/i386/i386-expand.c +++ b/gcc/config/i386/i386-expand.c @@ -2279,21 +2279,24 @@ void ix86_split_xorsign (rtx operands[]) { machine_mode mode, vmode; - rtx dest, op0, mask, x; + rtx dest, op0, op1, mask, x; dest = operands[0]; op0 = operands[1]; + op1 = operands[2]; mask = operands[3]; mode = GET_MODE (dest); vmode = GET_MODE (mask); - dest = lowpart_subreg (vmode, dest, mode); - x = gen_rtx_AND (vmode, dest, mask); - emit_insn (gen_rtx_SET (dest, x)); + op1 = lowpart_subreg (vmode, op1, mode); + x = gen_rtx_AND (vmode, op1, mask); + emit_insn (gen_rtx_SET (op1, x)); op0 = lowpart_subreg (vmode, op0, mode); - x = gen_rtx_XOR (vmode, dest, op0); + x = gen_rtx_XOR (vmode, op1, op0); + + dest = lowpart_subreg (vmode, dest, mode); emit_insn (gen_rtx_SET (dest, x)); } diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 0cd151ce4e5..18b91c77937 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -10806,17 +10806,18 @@ (define_expand "xorsign3" "ix86_expand_xorsign (operands); 
DONE;") (define_insn_and_split "@xorsign3_1" - [(set (match_operand:MODEF 0 "register_operand" "=Yv") + [(set (match_operand:MODEF 0 "register_operand" "=Yv,Yv") (unspec:MODEF - [(match_operand:MODEF 1 "register_operand" "Yv") - (match_operand:MODEF 2 "register_operand" "0") - (match_operand: 3 "nonimmediate_operand" "Yvm")] + [(match_operand:MODEF 1 "register_operand" "Yv,Yv") + (match_operand:MODEF 2 "register_operand" "0,Yv") + (match_operand: 3 "nonimmediate_operand" "Yvm,Yvm")] UNSPEC_XORSIGN))] "SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH" "#" "&& reload_completed" [(const_int 0)] - "ix86_split_xorsign (operands); DONE;") + "ix86_split_xorsign (operands); DONE;" + [(set_attr "isa" "noavx,avx")]) ;; One complement instructions diff --git a/gcc/testsuite/gcc.target/i386/pr89984-1.c b/gcc/testsuite/gcc.target/i386/pr89984-1.c new file mode 100644 index 000..d77691c0da0 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr89984-1.c @@ -0,0 +1,8 @@ +/* { dg-do compile { target { ! ia32 } } } */ +/* { dg-options "-O2 -mno-avx -msse2" } */ + +float +check_f_pos (float x, float y) +{ + return x * __builtin_copysignf (1.0f, y); +} diff --git a/gcc/testsuite/gcc.target/i386/pr89984-2.c b/gcc/testsuite/gcc.target/i386/pr89984-2.c new file mode 100644 index 000..ff6a8e50573 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr89984-2.c @@ -0,0 +1,10 @@ +/* { dg-do compile { target { ! ia32 } } } */ +/* { dg-options "-O2 -mavx" } */ + +float +check_f_pos (float x, float y) +{ + return x * __builtin_copysignf (1.0f, y); +} + +/* { dg-final { scan-assembler-not "vmovaps" } } */ diff --git a/gcc/testsuite/gcc.target/i386/xorsign-avx.c b/gcc/testsuite/gcc.target/i386/xorsign-avx.c new file mode 100644 index 000..f2e2054b6fb --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/xorsign-avx.c @@ -0,0 +1,4 @@ +/* { dg-do run { target avx_runtime } } */ +/* { dg-options "-O2 -mavx -mfpmath=sse -ftree-vectorize" } */ + +#include "xorsign.c" -- 2.31.1
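As a plain-C illustration of what the split computes: the sign bit of the second operand is isolated with a mask and XORed into the first operand, which for non-NaN inputs matches the `x * __builtin_copysignf (1.0f, y)` expression used in the new tests. `xorsign_bits` and `xorsign_ref` are hypothetical helpers, not GCC functions, and the sketch assumes a 32-bit IEEE-754 `float`.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: AND the second operand with the sign mask, then
   XOR the result into the first operand. */
static float xorsign_bits(float x, float y)
{
    uint32_t xb, yb;

    memcpy(&xb, &x, sizeof xb);
    memcpy(&yb, &y, sizeof yb);

    yb &= UINT32_C(0x80000000);  /* keep only the sign bit of y */
    xb ^= yb;                    /* flip the sign of x when y is negative */

    memcpy(&x, &xb, sizeof x);
    return x;
}

/* Reference form, taken from the expression in the pr89984 tests. */
static float xorsign_ref(float x, float y)
{
    return x * __builtin_copysignf(1.0f, y);
}
```

Because the masked sign is a temporary here, neither input is overwritten; that is the same freedom the AVX alternative gives the register allocator.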
[Bug tree-optimization/99082] manual bit-field creation followed by manual extraction does not always produce good code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99082 Andrew Pinski changed: What|Removed |Added Severity|normal |enhancement
[Bug target/97286] GCC sometimes uses an extra xmm register for the destination of _mm_blend_ps
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97286 Andrew Pinski changed: What|Removed |Added Severity|normal |enhancement Keywords||ra
[Bug target/88473] AVX512: constant folding on mask does not remove unnecessary instructions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88473 Andrew Pinski changed: What|Removed |Added Blocks||93885 --- Comment #7 from Andrew Pinski --- The UNSPEC_MASKOP ones are still there. PR 93885 is the same issue. Referenced Bugs: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93885 [Bug 93885] Spurious instruction kshiftlw issued
[Bug target/95974] AArch64 arm_neon.h stores interfere with gimple optimisations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95974 Andrew Pinski changed: What|Removed |Added Status|UNCONFIRMED |NEW Ever confirmed|0 |1 Severity|normal |enhancement Last reconfirmed||2021-09-04 --- Comment #1 from Andrew Pinski --- Confirmed, maybe adding some access attributes will help this.
[Bug tree-optimization/89811] uint32_t load is not recognized if shifts are done in a fixed-size loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89811 Andrew Pinski changed: What|Removed |Added CC||gabravier at gmail dot com --- Comment #3 from Andrew Pinski --- *** Bug 94834 has been marked as a duplicate of this bug. ***
[Bug tree-optimization/94834] Failure to optimize loop bswap pattern
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94834 Andrew Pinski changed: What|Removed |Added Resolution|--- |DUPLICATE Status|NEW |RESOLVED --- Comment #3 from Andrew Pinski --- This is a dup of bug 89811. *** This bug has been marked as a duplicate of bug 89811 ***
[Bug target/93885] Spurious instruction kshiftlw issued
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93885 Andrew Pinski changed: What|Removed |Added Severity|normal |enhancement Ever confirmed|0 |1 Last reconfirmed||2021-09-04 Status|UNCONFIRMED |NEW --- Comment #1 from Andrew Pinski --- Confirmed, this is due to UNSPEC_MASKOP on the shift which most likely can be removed these days.
[Bug middle-end/91899] Merge constant literals
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91899 Andrew Pinski changed: What|Removed |Added Resolution|--- |WONTFIX Status|NEW |RESOLVED --- Comment #6 from Andrew Pinski --- You need to use -fmerge-all-constants and the linker will merge them.
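A small way to observe the effect of that option, as a sketch only: the arrays and function names below are hypothetical and not taken from the bug report. Built normally, the two arrays are distinct objects; built with -fmerge-all-constants, the compiler and linker may place them at the same address (which is not conforming C, and is why it is opt-in).

```c
#include <stdint.h>
#include <string.h>

/* Two constant arrays with identical contents (hypothetical names). */
static const char first_copy[]  = "identical constant data";
static const char second_copy[] = "identical constant data";

/* Always 1: the contents are the same whatever the flags. */
int same_contents(void)
{
    return sizeof first_copy == sizeof second_copy
           && memcmp(first_copy, second_copy, sizeof first_copy) == 0;
}

/* 1 if both arrays share storage, 0 otherwise.  The addresses go
   through volatile integers so the comparison is made at run time
   rather than folded by the compiler. */
int same_address(void)
{
    volatile uintptr_t a = (uintptr_t)first_copy;
    volatile uintptr_t b = (uintptr_t)second_copy;
    return a == b;
}
```

Comparing the result of `same_address` between a default build and one with -fmerge-all-constants shows whether the merge happened on a given toolchain; no particular outcome is guaranteed.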
[Bug target/85539] x86_64: loads are not always narrowed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85539 Andrew Pinski changed: What|Removed |Added Resolution|--- |FIXED See Also||https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92180 Known to fail||10.3.0 Status|NEW |RESOLVED Known to work||11.1.0 --- Comment #3 from Andrew Pinski ---

Trying 6 -> 7:
    6: r86:DI=[r87:DI]
      REG_DEAD r87:DI
    7: r85:SI=r86:DI#0
      REG_DEAD r86:DI
Successfully matched this instruction:
(set (reg:SI 85 [ *p_3(D) ])
    (mem:SI (reg:DI 87) [1 *p_3(D)+0 S4 A64]))
allowing combination of insns 6 and 7
original costs 4 + 4 = 8
replacement cost 4
deferring deletion of insn with uid = 6.
modifying insn i3     7: r85:SI=[r87:DI]
      REG_DEAD r87:DI
deferring rescan insn with uid = 7.
starting the processing of deferred insns
rescanning insn with uid = 7.
ending the processing of deferred insns

This is because cse no longer props the subreg into the last move:

(insn 7 6 8 2 (set (reg:SI 85) (subreg:SI (reg:DI 86) 0)) "/app/example.cpp":7:13 67 {*movsi_internal} (nil))
(insn 8 7 12 2 (set (reg:SI 83 [ ]) (reg:SI 85)) "/app/example.cpp":7:13 67 {*movsi_internal} (nil))
(insn 12 8 13 2 (set (reg/i:SI 0 ax) (reg:SI 83 [ ])) "/app/example.cpp":8:1 67 {*movsi_internal} (nil))

And this was due to the patch which fixes PR 92180, and it was an expected outcome too.
[Bug middle-end/90424] memcpy into vector builtin not optimized
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90424 Andrew Pinski changed: What|Removed |Added Last reconfirmed|2019-05-13 00:00:00 |2021-9-4 Severity|normal |enhancement Component|target |middle-end --- Comment #8 from Andrew Pinski --- Happens on aarch64 also.
[Bug tree-optimization/18487] Warnings for pure and const functions that are not actually pure or const
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=18487 --- Comment #29 from Daniel Berlin ---

Let me try to explain a different way:

The only functions GCC can warn about are those that don’t need the attributes in the first place. The way any warning would work is to detect whether it is pure/const, and then see how the user marked it. So anything it can properly detect as right or wrong didn’t need an attribute to begin with - the compiler could already tell if it was pure/const.

Rather than tell the user they got it wrong, you might as well tell the user to remove the attribute because it isn’t necessary and won’t be necessary.

This is precisely why attributes are meant for when you are sure you know more than the compiler can tell, and *no other time*. It is a tool for experts.

Giving a bunch of really contrived examples where users may update things wrong doesn’t seem like a good motivation to make a warning that can only possibly have a really high false positive rate.

The same logic applies to a lot of expert-use-only attributes. It is assumed you know what you are doing, because the compiler can’t tell you you are wrong accurately.

On Sat, Sep 4, 2021 at 4:40 PM federico.kircheis at gmail dot com <gcc-bugzi...@gcc.gnu.org> wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=18487
>
> --- Comment #28 from Federico Kircheis <federico.kircheis at gmail dot com> ---
> >Edit: sorry, my last comment about what GCC thinks is wrong.
>
> Unless it is going to inline the function call, in that case the attributes are
> as-if ignored (at least the case I've tested with GCC 11.2).
[Bug tree-optimization/89811] uint32_t load is not recognized if shifts are done in a fixed-size loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89811 Andrew Pinski changed: What|Removed |Added Severity|normal |enhancement Last reconfirmed|2019-03-25 00:00:00 |2021-9-4
[Bug tree-optimization/93040] gcc doesn't optimize unaligned accesses to a 16-bit value on the x86 as well as it does a 32-bit value (or clang)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93040 Andrew Pinski changed: What|Removed |Added CC||nok.raven at gmail dot com --- Comment #6 from Andrew Pinski --- *** Bug 89809 has been marked as a duplicate of this bug. ***
[Bug tree-optimization/89809] movzwl is not utilized when uint16_t is loaded with bit-shifts (while memcpy does)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89809 Andrew Pinski changed: What|Removed |Added Known to fail||9.4.0 Resolution|--- |DUPLICATE Status|NEW |RESOLVED Known to work||10.1.0 Target Milestone|--- |10.0 --- Comment #4 from Andrew Pinski --- Fixed for GCC 10. Dup of bug 93040. *** This bug has been marked as a duplicate of bug 93040 ***
[Bug tree-optimization/93040] gcc doesn't optimize unaligned accesses to a 16-bit value on the x86 as well as it does a 32-bit value (or clang)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93040 Andrew Pinski changed: What|Removed |Added Target Milestone|--- |10.0
[Bug tree-optimization/18487] Warnings for pure and const functions that are not actually pure or const
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=18487 --- Comment #28 from Federico Kircheis ---
>Edit: sorry, my last comment about what GCC thinks is wrong.

Unless it is going to inline the function call; in that case the attributes are as-if ignored (at least in the case I've tested with GCC 11.2).
[Bug tree-optimization/18487] Warnings for pure and const functions that are not actually pure or const
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=18487 --- Comment #27 from Federico Kircheis --- Edit: sorry, my last comment about what GCC thinks is wrong. GCC seems to follow the gnu::pure/gnu::const directive to the letter; it does not ignore it when it sees the implementation of the function, so my comment about the information already being available can be ignored.
[Bug tree-optimization/18487] Warnings for pure and const functions that are not actually pure or const
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=18487 --- Comment #26 from Federico Kircheis ---

As multiple people commented on this ticket, I do not know to whom the last message was addressed, but I would like to give my opinion on it again, as I would really like to use those attributes in non-toy projects.

> This seems like a bad idea

I think there are valid use-cases for those warnings.

> and is impossible in general

Let me quote myself:

> ... a warning that even only works for trivial case is much better than
> nothing, because at least I know I can safely use the attribute for some
> functions as a contract to the caller, and have it checked.

There are now two possible outcomes if a compiler emits a warning.

1) I look at the definition, and *gasp*, the compiler is actually right. The function was pure before, but the last changes made it impure. Either I did not realize it, or I forgot to change the function declaration. Thank you GCC for making me aware of the issue, I'll fix it.

2) I look at the definition and think that GCC is wrong. I know better, and the function is pure. I can either try to simplify the function in such a way that GCC does not complain anymore (which might be a good idea), or I can use a pragma to ignore this one warning (and comment why it's ignored), or remove the attribute altogether, as GCC might call the function multiple times if it thinks it's impure (see example at the end).

With the first approach, I can still benefit from warnings if the function changes again. With the second I can't, but at least I can still grep the entire codebase and check periodically which warnings have been disabled locally, just like I do for other warnings. With the third, yes, I would probably report a bug with a minimal example. This would (hopefully) improve GCC's analysis capabilities.

> The whole point of the attributes is to tell the compiler things are
> pure/const in cases it can't already prove.
That does not mean that it is not useful to let it do the check, *especially if it can prove that the attribute is used incorrectly*, but even if it can't prove anything. And also see the example at the end why this is not completely true. > It can already prove a lot, and doesn't need help in most of the simple > examples being given (in other bugs). But programmers (at least for the most use-cases I've seen) needs that type of support. I would like to know if a function has side effects. It's great if the compiler can see it automatically, but when reading and writing code, especially code not written by me or maintained by multiple authors, we might want to restrict the functionality of some functions. For side-effect free functions, the attributes const and pure are great, but using them is more harmful, because if used wrongly it introduces UB, thus 1) they do not really document if a function is pure, as there is no tooling checking if the statement is true 2) they introduce bugs that no-one can explain (see at the end). Thus a comment "this function is pure", is by contrast much better, as it does not introduce UB, but we all know that those kind of commends do not age well. Thus at the end, they get ignored because not trustworthy, and one need always to look at the implementation. > You are basically going to warn in the cases the compiler can't prove it [...] And for many use-cases it is fine. Also the second example I gave: // bar.hpp [[gnu::const]] int get_value(); // bar.cpp int get_value(){static int i = 0; return ++i;} // foo.cpp int foo(){ int i = get_value(); int j = get_value(); return i+j; } The compiler will still optimize the call to get_value, (unless it is able to see the definition of get_value and see that there are side effects). 
Thus, if the function is marked pure, the compiler * will not call it a second time if it does not see the implementation of `get_value` * will call it a second time if it sees the implementation of `get_value` and notices it is not pure. This is one of those bugs that no-one can explain, as simply moving code (making a function, for example, inline, or move it to another file), or changing optimization level, changes the behavior of the program. Thus, given main.cpp [[gnu::const]] int foo(); // foo.cpp int main(){ int i = foo(); int j = foo(); return i+j; } how many times is GCC going to call foo? If GCC thinks that the function is pure, then only once. If it thinks it is not pure, twice. I have no idea what GCC thinks, because there are no diagnostics for it! And look, it does not even matter if foo is pure or not, it matters if GCC thinks if it is pure or not. I can similarly tell GCC to inline functions, but if GCC doesn't at least it will tell me he didn't.(warning: 'always_inline' function might not be inlinable [-Wattributes]) We can of course say "those attributes are only for those people that really know better", but as the compiler is
[Bug target/56309] conditional moves instead of compare and branch result in almost 2x slower code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56309 --- Comment #37 from Peter Cordes --- Correction, PR82666 is that the cmov on the critical path happens even at -O2 (with GCC7 and later). Not just with -O3 -fno-tree-vectorize. Anyway, that's related, but probably separate from choosing to do if-conversion or not after inlining.
[Bug target/56309] conditional moves instead of compare and branch result in almost 2x slower code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56309 Peter Cordes changed: What|Removed |Added CC||peter at cordes dot ca --- Comment #36 from Peter Cordes --- Related: a similar case of cmov being a worse choice, for a threshold condition with an array input that happens to already be sorted: https://stackoverflow.com/questions/28875325/gcc-optimization-flag-o3-makes-code-slower-than-o2 GCC with -fprofile-generate / -fprofile-use does correctly decide to use branches. GCC7 and later (including current trunk) with -O3 -fno-tree-vectorize de-optimizes by putting the CMOV on the critical path, instead of as part of creating a zero/non-zero input for the ADD. PR82666. If you do allow full -O3, then vectorization is effective, though.
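For readers without the linked question at hand, the kind of loop under discussion can be sketched as follows. This is a hypothetical threshold sum written for illustration, not the code from the Stack Overflow question; the name `sum_at_least` and the test values are assumptions. The `if` in the loop body is the statement a compiler may lower either to a compare-and-branch or to a branchless select (cmov on x86) whose result feeds the add.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical threshold loop (illustration only, not taken from the PR or
// from the linked question).
std::int64_t sum_at_least(const std::vector<int>& data, int threshold)
{
    std::int64_t sum = 0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        // This condition is the if-conversion candidate: it can become a
        // branch or a conditional move, depending on the compiler's choice.
        if (data[i] >= threshold)
            sum += data[i];
    }
    return sum;
}

// Small usage check: only 200 and 250 reach the threshold of 128.
void demo_sum_at_least()
{
    assert(sum_at_least({1, 200, 3, 250}, 128) == 450);
}
```

With sorted input the condition flips only once over the whole array, so a branch is predicted almost perfectly, which is why a cmov on the loop-carried dependency chain of `sum` can end up slower in that case.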
[Bug c/29970] mixing ({...}) with VLA leads to massive breakage
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=29970 --- Comment #13 from Martin Uecker --- The remaining problem with constant index 0 for the patch mentioned above appears to be related to fold_binary_loc, which transforms (a + (x, 0)) to (x, a). This breaks if 'x' depends on something in 'a'.
pdp11: 'src' may be used uninitialized in this function
Hi Paul,

building GCC with a recent GCC results in a possibly uninitialized access:

$ .../gcc/configure --target=pdp11-aout --enable-werror-always --enable-languages=all --disable-gcov --disable-shared --disable-threads --without-headers
[...]
$ make all-gcc
[...]
[all 2021-09-04 17:21:14] /usr/lib/gcc-snapshot/bin/g++ -fno-PIE -c -g -O2 -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wno-error=format-diag -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror -fno-common -DHAVE_CONFIG_H -I. -I. -I../../gcc/gcc -I../../gcc/gcc/. -I../../gcc/gcc/../include -I../../gcc/gcc/../libcpp/include -I../../gcc/gcc/../libcody -I../../gcc/gcc/../libdecnumber -I../../gcc/gcc/../libdecnumber/dpd -I../libdecnumber -I../../gcc/gcc/../libbacktrace -o xcoffout.o -MT xcoffout.o -MMD -MP -MF ./.deps/xcoffout.TPo ../../gcc/gcc/xcoffout.c
[all 2021-09-04 17:21:16] /usr/lib/gcc-snapshot/bin/g++ -fno-PIE -c -g -O2 -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wno-error=format-diag -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror -fno-common -DHAVE_CONFIG_H -I. -I. -I../../gcc/gcc -I../../gcc/gcc/. -I../../gcc/gcc/../include -I../../gcc/gcc/../libcpp/include -I../../gcc/gcc/../libcody -I../../gcc/gcc/../libdecnumber -I../../gcc/gcc/../libdecnumber/dpd -I../libdecnumber -I../../gcc/gcc/../libbacktrace -o pdp11.o -MT pdp11.o -MMD -MP -MF ./.deps/pdp11.TPo ../../gcc/gcc/config/pdp11/pdp11.c
[all 2021-09-04 17:21:19] ../../gcc/gcc/config/pdp11/pdp11.c: In function 'bool pdp11_rtx_costs(rtx, machine_mode, int, int, int*, bool)':
[all 2021-09-04 17:21:19] ../../gcc/gcc/config/pdp11/pdp11.c:1113:39: error: 'src' may be used uninitialized in this function [-Werror=maybe-uninitialized]
[all 2021-09-04 17:21:19]  1113 |       *total += pdp11_addr_cost (src, mode, ADDR_SPACE_GENERIC, speed);
[all 2021-09-04 17:21:19]       |                                  ^~
[all 2021-09-04 17:21:20] cc1plus: all warnings being treated as errors
[all 2021-09-04 17:21:20] make[1]: *** [Makefile:2421: pdp11.o] Error 1
[all 2021-09-04 17:21:20] make[1]: Leaving directory '/var/lib/laminar/run/gcc-pdp11-aout/11/toolchain-build/gcc'
[all 2021-09-04 17:21:20] make: *** [Makefile:4422: all-gcc] Error 2

The warning seems legitimate, though the path is hardly reachable in practice. Maybe fix it anyway, to be safe?

Thanks,
  Jan-Benedict
Re: [PATCH] x86-64: Remove HAVE_LD_PIE_COPYRELOC
PING^4 https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570139.html One major design goal of PIE was to avoid copy relocations. The original patch for GCC 5 caused problems for many years. On Wed, Aug 18, 2021 at 11:54 PM Fāng-ruì Sòng wrote: > PING^3 https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570139.html > > On Fri, Jun 4, 2021 at 3:04 PM Fāng-ruì Sòng wrote: > > > > PING^2 https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570139.html > > > > On Mon, May 24, 2021 at 9:43 AM Fāng-ruì Sòng > wrote: > > > > > > Ping https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570139.html > > > > > > On Tue, May 11, 2021 at 8:29 PM Fangrui Song > wrote: > > > > > > > > This was introduced in 2014-12 to use local binding for external > symbols > > > > for -fPIE. Now that we have H.J. Lu's GOTPCRELX for years which > mostly > > > > nullify the benefit of HAVE_LD_PIE_COPYRELOC, HAVE_LD_PIE_COPYRELOC > > > > should retire now. > > > > > > > > One design goal of -fPIE was to avoid copy relocations. > > > > HAVE_LD_PIE_COPYRELOC has deviated from the goal. With this change, > the > > > > -fPIE behavior of x86-64 will be closer to x86-32 and other targets. > > > > > > > > --- > > > > > > > > See https://gcc.gnu.org/legacy-ml/gcc/2019-05/msg00215.html for a > list > > > > of fixed and unfixed (e.g. gold incompatibility with protected > > > > https://sourceware.org/bugzilla/show_bug.cgi?id=19823) issues. 
> > > > > > > > If you prefer a longer write-up, see > > > > > https://maskray.me/blog/2021-01-09-copy-relocations-canonical-plt-entries-and-protected > > > > --- > > > > gcc/config.in | 6 --- > > > > gcc/config/i386/i386.c| 11 +--- > > > > gcc/configure | 52 > --- > > > > gcc/configure.ac | 48 > - > > > > gcc/doc/sourcebuild.texi | 3 -- > > > > .../gcc.target/i386/pie-copyrelocs-1.c| 14 - > > > > .../gcc.target/i386/pie-copyrelocs-2.c| 14 - > > > > .../gcc.target/i386/pie-copyrelocs-3.c| 14 - > > > > .../gcc.target/i386/pie-copyrelocs-4.c| 17 -- > > > > gcc/testsuite/lib/target-supports.exp | 47 - > > > > 10 files changed, 2 insertions(+), 224 deletions(-) > > > > delete mode 100644 gcc/testsuite/gcc.target/i386/pie-copyrelocs-1.c > > > > delete mode 100644 gcc/testsuite/gcc.target/i386/pie-copyrelocs-2.c > > > > delete mode 100644 gcc/testsuite/gcc.target/i386/pie-copyrelocs-3.c > > > > delete mode 100644 gcc/testsuite/gcc.target/i386/pie-copyrelocs-4.c > > > > > > > > diff --git a/gcc/config.in b/gcc/config.in > > > > index e54f59ce0c3..a65bf5d4176 100644 > > > > --- a/gcc/config.in > > > > +++ b/gcc/config.in > > > > @@ -1659,12 +1659,6 @@ > > > > #endif > > > > > > > > > > > > -/* Define 0/1 if your linker supports -pie option with copy reloc. > */ > > > > -#ifndef USED_FOR_TARGET > > > > -#undef HAVE_LD_PIE_COPYRELOC > > > > -#endif > > > > - > > > > - > > > > /* Define if your PowerPC linker has .gnu.attributes long double > support. 
*/ > > > > #ifndef USED_FOR_TARGET > > > > #undef HAVE_LD_PPC_GNU_ATTR_LONG_DOUBLE > > > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c > > > > index 915f89f571a..5ec3c6fd0c9 100644 > > > > --- a/gcc/config/i386/i386.c > > > > +++ b/gcc/config/i386/i386.c > > > > @@ -10579,11 +10579,7 @@ legitimate_pic_address_disp_p (rtx disp) > > > > return true; > > > > } > > > > else if (!SYMBOL_REF_FAR_ADDR_P (op0) > > > > - && (SYMBOL_REF_LOCAL_P (op0) > > > > - || (HAVE_LD_PIE_COPYRELOC > > > > - && flag_pie > > > > - && !SYMBOL_REF_WEAK (op0) > > > > - && !SYMBOL_REF_FUNCTION_P (op0))) > > > > + && SYMBOL_REF_LOCAL_P (op0) > > > >&& ix86_cmodel != CM_LARGE_PIC) > > > > return true; > > > > break; > > > > @@ -22892,10 +22888,7 @@ ix86_atomic_assign_expand_fenv (tree *hold, > tree *clear, tree *update) > > > > static bool > > > > ix86_binds_local_p (const_tree exp) > > > > { > > > > - return default_binds_local_p_3 (exp, flag_shlib != 0, true, true, > > > > - (!flag_pic > > > > - || (TARGET_64BIT > > > > - && HAVE_LD_PIE_COPYRELOC != > 0))); > > > > + return default_binds_local_p_3 (exp, flag_shlib != 0, true, true, > !flag_pic); > > > > } > > > > #endif > > > > > > > > diff --git a/gcc/configure b/gcc/configure > > > > index f03fe888384..c500f5ca11e 100755 > > > > --- a/gcc/configure > > > > +++ b/gcc/configure > > > > @@ -29968,58 +29968,6 @@ fi > > > > { $as_echo "$as_me:${as_lineno-$LINENO}: result: $gcc_cv_ld_pie" >&5 > > > > $as_echo "$gcc_cv_ld_pie" >&6; } > > > > > > > > -{ $as_echo
nvptx: error: array subscript -1 is below array bounds of 'short int [2][16]'
Hi!

Running automated tests again, I found that building current GCC (2fcfc03459a907c0237ea6e2c6e4ce4871034bed) with a recent GCC fails at make all-gcc when configured with --target=nvptx-none --enable-werror-always --enable-languages=all --disable-gcov --disable-shared --disable-threads --without-headers:

[all 2021-09-04 16:33:59] /usr/lib/gcc-snapshot/bin/g++ -fno-PIE -c -g -O2 -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wno-error=format-diag -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror -fno-common -DHAVE_CONFIG_H -I. -I. -I../../gcc/gcc -I../../gcc/gcc/. -I../../gcc/gcc/../include -I../../gcc/gcc/../libcpp/include -I../../gcc/gcc/../libcody -I../../gcc/gcc/../libdecnumber -I../../gcc/gcc/../libdecnumber/dpd -I../libdecnumber -I../../gcc/gcc/../libbacktrace -o lra-constraints.o -MT lra-constraints.o -MMD -MP -MF ./.deps/lra-constraints.TPo ../../gcc/gcc/lra-constraints.c
[all 2021-09-04 16:34:07] In function 'bool check_and_process_move(bool*, bool*)',
[all 2021-09-04 16:34:07]     inlined from 'bool curr_insn_transform(bool)' at ../../gcc/gcc/lra-constraints.c:4092:33:
[all 2021-09-04 16:34:07] ../../gcc/gcc/lra-constraints.c:1317:56: error: array subscript -1 is below array bounds of 'short int [2][16]' [-Werror=array-bounds]
[all 2021-09-04 16:34:07]  1317 |           reg_renumber[dregno] = ira_class_hard_regs[dclass][0];
[all 2021-09-04 16:34:07] In file included from ../../gcc/gcc/lra-constraints.c:123:
[all 2021-09-04 16:34:07] ../../gcc/gcc/ira.h: In function 'bool curr_insn_transform(bool)':
[all 2021-09-04 16:34:07] ../../gcc/gcc/ira.h:85:9: note: while referencing 'target_ira::x_ira_class_hard_regs'
[all 2021-09-04 16:34:07]    85 |   short x_ira_class_hard_regs[N_REG_CLASSES][FIRST_PSEUDO_REGISTER];
[all 2021-09-04 16:34:07]       |         ^
[all 2021-09-04 16:34:07] In function 'bool check_and_process_move(bool*, bool*)',
[all 2021-09-04 16:34:07]     inlined from 'bool curr_insn_transform(bool)' at ../../gcc/gcc/lra-constraints.c:4092:33:
[all 2021-09-04 16:34:07] ../../gcc/gcc/lra-constraints.c:1324:56: error: array subscript -1 is below array bounds of 'short int [2][16]' [-Werror=array-bounds]
[all 2021-09-04 16:34:07]  1324 |           reg_renumber[sregno] = ira_class_hard_regs[sclass][0];
[all 2021-09-04 16:34:07] In file included from ../../gcc/gcc/lra-constraints.c:123:
[all 2021-09-04 16:34:07] ../../gcc/gcc/ira.h: In function 'bool curr_insn_transform(bool)':
[all 2021-09-04 16:34:07] ../../gcc/gcc/ira.h:85:9: note: while referencing 'target_ira::x_ira_class_hard_regs'
[all 2021-09-04 16:34:07]    85 |   short x_ira_class_hard_regs[N_REG_CLASSES][FIRST_PSEUDO_REGISTER];
[all 2021-09-04 16:34:07]       |         ^
[all 2021-09-04 16:34:13] cc1plus: all warnings being treated as errors
[all 2021-09-04 16:34:13] make[1]: *** [Makefile:1142: lra-constraints.o] Error 1
[all 2021-09-04 16:34:13] make[1]: Leaving directory '/var/lib/laminar/run/gcc-nvptx-none/5/toolchain-build/gcc'
[all 2021-09-04 16:34:13] make: *** [Makefile:4407: all-gcc] Error 2

Thanks,
  Jan-Benedict
[Bug c++/102201] Accepts invalid C++98 with nested class and sizeof of outer's non-static field
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102201

Harald van Dijk changed:

What|Removed |Added
CC||harald at gigawatt dot nl

--- Comment #1 from Harald van Dijk ---
This doesn't need inner classes; a simpler reproducer is:

struct S { int i; };
int j = sizeof S::i;

GCC has accepted this in all modes ever since the C++11 rule for non-static members in unevaluated contexts was implemented (4.4). clang says in C++98 mode:

test.cc:2:19: error: invalid use of non-static data member 'i'
int j = sizeof S::i;
               ~~~^
1 error generated.
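The difference between the two language modes can be sketched as below. The function names are made up for this illustration; the first spelling relies on the C++11 rule mentioned in the comment, and the second goes through an object expression, which is a form that C++98 also accepts.

```cpp
#include <cassert>
#include <cstddef>

struct S { int i; };

// C++11 and later: a non-static data member may be named directly inside an
// unevaluated operand such as sizeof.
std::size_t member_size_cxx11() { return sizeof S::i; }

// Also valid in C++98: name the member through an object expression. The
// operand of sizeof is never evaluated, so the null pointer is not
// dereferenced at run time.
std::size_t member_size_cxx98() { return sizeof(static_cast<S*>(0)->i); }

// Small usage check: both spellings give the size of an int.
void demo_member_size()
{
    assert(member_size_cxx11() == sizeof(int));
    assert(member_size_cxx11() == member_size_cxx98());
}
```

Only the first function is affected by the accepts-invalid report: per the clang diagnostic quoted above, a strict C++98 compiler is expected to reject `sizeof S::i`.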
[Bug c++/101355] incorrect `this' in destructor calls when compiling coroutines with ubsan
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101355 Dan Klishch changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #5 from Dan Klishch --- GCC stopped instrumenting destructors in this particular case, so I guess the bug is fixed. https://godbolt.org/z/KGa6aGf5x
[Bug tree-optimization/102196] -Wmaybe-uninitialized: Maybe generate helpful hints?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102196 --- Comment #6 from Jan-Benedict Glaw --- Calling the compiler again with just adding -fanalyzer doesn't add more information to the output. Do I need to turn on extra warnings to enable static analysis for access to possibly uninitialized variables?
[Bug tree-optimization/102200] [12 Regression] ice in put_ref, at pointer-query.cc:1351
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102200 Andrew Pinski changed: What|Removed |Added Ever confirmed|0 |1 Status|UNCONFIRMED |NEW Last reconfirmed||2021-09-04 --- Comment #1 from Andrew Pinski --- (In reply to David Binderman from comment #0) > The bug first seems to occur sometime between git hash 7a6f40d0452ec76e > and 9695e1c23be5b5c5. Only 21 commits. Most likely r12-3300-ece28da924dd Confirmed.
[Bug c++/102199] is_default_constructible incorrect for an inner type with NSDMI
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102199 --- Comment #3 from Eyal Rozenberg --- Andrew: What you're saying would be plausible if g++ found the structure to be incomplete. It does not. The completeness check passes, and that is why adding the explicit default ctor makes the assertion pass, despite your rationale applying to that case just as well.
[Bug c++/102201] New: Accepts invalid C++98 with nested class and sizeof of outer's non-static field
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102201

Bug ID: 102201
Summary: Accepts invalid C++98 with nested class and sizeof of outer's non-static field
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Keywords: accepts-invalid
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: pinskia at gcc dot gnu.org
Target Milestone: ---

Take:

struct outer
{
  struct inner
  {
    inner() :x(sizeof(y)) { }
    unsigned int x;
  };
  int y;
};

---- CUT ----

The above code is valid C++11 but invalid C++98 because the field y is non-static.
[Bug c++/102199] is_default_constructible incorrect for an inner type with NSDMI
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102199

--- Comment #2 from Andrew Pinski ---
This is because the following is still valid C++11:

struct outer
{
  struct inner
  {
    // inner() { }
    unsigned int x = y;
  };
  static constexpr int y =10;
};

That is, inner is not complete until outer is complete.
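A minimal sketch of the uncontroversial half of this: once outer's closing brace has been seen, both classes are complete and the trait can be queried safely. The helper `default_inner_x` is a made-up name for this illustration. The case the bug report is about, asking the trait from inside outer before it is complete, is deliberately not reproduced here.

```cpp
#include <cassert>
#include <type_traits>

struct outer
{
  struct inner
  {
    // NSDMI that names a member of outer declared further down; it is only
    // processed once outer is complete.
    unsigned int x = y;
  };
  static constexpr int y = 10;
};

// Queried after outer's closing brace, so inner's initializer is usable.
static_assert(std::is_default_constructible<outer::inner>::value,
              "inner is default constructible once outer is complete");

// Hypothetical helper: default-construct an inner and read the member.
unsigned int default_inner_x() { return outer::inner().x; }

// Small usage check: the NSDMI picked up outer::y.
void demo_default_inner()
{
    assert(default_inner_x() == 10u);
}
```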
[Bug c++/102199] is_default_constructible incorrect for an inner type with NSDMI
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102199 Andrew Pinski changed: What|Removed |Added Component|libstdc++ |c++ --- Comment #1 from Andrew Pinski --- This comes down to when the struct is complete.
[Bug tree-optimization/102200] [12 Regression] ice in put_ref, at pointer-query.cc:1351
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102200 Andrew Pinski changed: What|Removed |Added Keywords||ice-on-valid-code Target Milestone|--- |12.0 Component|c |tree-optimization Summary|ice in put_ref, at |[12 Regression] ice in |pointer-query.cc:1351 |put_ref, at ||pointer-query.cc:1351
[Bug tree-optimization/18487] Warnings for pure and const functions that are not actually pure or const
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=18487 --- Comment #25 from Daniel Berlin --- This seems like a bad idea, and is impossible in general. The whole point of the attributes is to tell the compiler things are pure/const in cases it can't already prove. It can already prove a lot, and doesn't need help in most of the simple examples being given (in other bugs). You are basically going to warn in the cases the compiler can't prove it (IE sees something it thinks makes the function not pure/const), and those are *exactly* the cases the attribute exists for - where the compiler doesn't know, but you do.
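As a baseline for the get_value example discussed elsewhere in this PR, here is a sketch of the well-defined variant, with no attribute on the impure function. It is adapted for illustration: the file-scope `counter` replaces the function-local static of the original example so that the number of calls can be observed, and `demo_call_count` is a made-up name.

```cpp
#include <cassert>

// Assumption for this sketch: a file-scope counter, so the test can see how
// many times get_value ran.
static int counter = 0;

// No [[gnu::const]] or [[gnu::pure]] here: the function modifies global
// state, so either attribute would be a false promise to the compiler.
int get_value() { return ++counter; }

int foo()
{
    int i = get_value();
    int j = get_value();
    return i + j;
}

// Small usage check: without the attribute both calls must be performed.
void demo_call_count()
{
    int before = counter;
    foo();
    assert(counter == before + 2);
}
```

On a fresh program the first call to `foo()` returns 1 + 2 = 3. If the declaration carried `[[gnu::const]]` and the definition were not visible, the compiler would be entitled to merge the two calls, and the result would then depend on what GCC can see. That is the situation both sides of this thread are arguing about.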