[Bug tree-optimization/113718] New: std::bit_cast making the compiler generate unnecessary code.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113718 Bug ID: 113718 Summary: std::bit_cast making the compiler generate unnecessary code. Product: gcc Version: 13.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: cassio.neri at gmail dot com Target Milestone: --- Consider: #include void f(); auto const p1 = auto const p2 = std::bit_cast(); bool a() { return p1 == p2; } The code emitted for `a` should be the same as for `return true;` but the use of a "no-op" `std::bit_cast` muddies the waters and the compiler generates: a(): cmp QWORD PTR p2[rip], OFFSET FLAT:_Z1fv sete al ret FWIW: Any of the following changes makes the compiler generate more efficient code: 1. Move `p1` and `p2` inside the body of `a`. 2. Replace `std::bit_cast` with `static_cast`. 3. Remove the cast altogether. Things get much worse if `p1` and `p2` are made `static` and moved inside the body of `a`. Given that the compiler can get confused by a "no-op" `std::bit_cast`, I wonder if it would do the same for more interesting code than this toy example. https://godbolt.org/z/daWe5Yod8
[Bug middle-end/110906] New: __attribute__((optimize("no-math-errno"))) has no effect.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110906 Bug ID: 110906 Summary: __attribute__((optimize("no-math-errno"))) has no effect. Product: gcc Version: 13.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: cassio.neri at gmail dot com Target Milestone: --- Consider this C++ code compiled with -O3: double g(double x) { return std::sqrt(x); } Usually this does call the library function std::sqrt because x might be negative and errno needs to be set accordingly. Moreover, with -fno-math-errno a single sqrtsd instruction is emitted. However, annotating g with __attribute__((optimize("no-math-errno"))) has no effect. This attribute (and #pragma GCC optimize("no-math-errno") ) used to work up to gcc 5.5. https://godbolt.org/z/T1nb11bv5
[Bug tree-optimization/107564] New: Fail to recognize overflow check for addition of __uint128_t operands
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107564 Bug ID: 107564 Summary: Fail to recognize overflow check for addition of __uint128_t operands Product: gcc Version: 12.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: cassio.neri at gmail dot com Target Milestone: --- Consider: char f128(__uint128_t m, __uint128_t n) { #if !defined(USE_BUILTIN_ADD_OVERFLOW) m += n; return m < n; #else __uint128_t r; return __builtin_add_overflow(m, n, ); #endif } When USE_BUILTIN_ADD_OVERFLOW is undefined, GCC fails to recognise this is an overflow check and with -O3 generates this: mov r8, rdi mov rax, rsi mov rdi, rax mov rsi, r8 mov rax, rdx mov rdx, rcx add rsi, rax adc rdi, rcx cmp rsi, rax mov rcx, rdi sbb rcx, rdx setc al ret When USE_BUILTIN_ADD_OVERFLOW is defined, it generates better code but still suboptimal: mov r8, rdi mov rax, rsi mov rsi, r8 mov rdi, rax add rsi, rdx adc rdi, rcx setc al ret For other unsigned integer types GCC generates the same optimal code for both methods. For instance for uint64_t: add rdi, rsi setc al ret https://godbolt.org/z/bj4M5no4j
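For readers who want to try the two overflow checks side by side, here is a minimal sketch for uint64_t, the type for which the report says both forms already compile to the same code. The function names add_overflows_cmp and add_overflows_builtin are illustrative and not from the report; __builtin_add_overflow is a GCC/Clang extension that takes a pointer to the result as its third argument.

```cpp
#include <cstdint>

// Overflow check by comparison: after wrap-around, the sum is smaller
// than an operand exactly when the addition carried out of the top bit.
inline bool add_overflows_cmp(std::uint64_t m, std::uint64_t n) {
    m += n;
    return m < n;
}

// Same check through the GCC/Clang builtin. The wrapped sum is written
// to r and ignored here; only the overflow flag is returned.
inline bool add_overflows_builtin(std::uint64_t m, std::uint64_t n) {
    std::uint64_t r;
    return __builtin_add_overflow(m, n, &r);
}
```

Both functions should agree for every pair of inputs; the report is about the 128-bit case not being recognised in the same way.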
[Bug tree-optimization/104539] Failed to inline a very simple template function when it's explicit instantiated.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104539 --- Comment #1 from Cassio Neri --- Sorry, the last snippet above should be template inline int f() { return 0; }
[Bug tree-optimization/104539] New: Failed to inline a very simple template function when it's explicit instantiated.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104539 Bug ID: 104539 Summary: Failed to inline a very simple template function when it's explicit instantiated. Product: gcc Version: 11.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: cassio.neri at gmail dot com Target Milestone: --- Consider: template //inline int f() { return 0; } int g() { return f<0>() + 1; } Using -O3, I'd expect f to be inlined in g and this is indeed the case: g(): mov eax, 1 ret However, if f is explicitly instantiated: template unsigned f<0>(); then we get a function call (or a jmp if tail call optimisation is possible): g(): sub rsp, 8 call int f<0>() add rsp, 8 add eax, 1 ret A (quite unusual, IMHO) workaround is declaring f as inline: template inline unsigned f() { return n; } https://godbolt.org/z/TarsTY3zb
[Bug tree-optimization/104444] New: Missing constant folding in shift expression.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104444 Bug ID: 104444 Summary: Missing constant folding in shift expression. Product: gcc Version: 11.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: cassio.neri at gmail dot com Target Milestone: --- #include inline bool f(uint32_t m, int n) { return (m >> n) != 0; } bool g(int n) { return f(1 << 24, n); } g can be optimised to "return n <= 24". LLVM does that but gcc doesn't. The example above led me to another missed optimisation opportunity based on undefined behaviour. (Perhaps a matter for another report?) bool h(uint32_t m, int n) { return (n >= 0 && n < 32) || (m >> n) != 0; } If (n >= 0 && n < 32) is false, then (m >> n) is UB (in C++, probably also in C). Therefore, h can be optimised to "return true" but gcc doesn't do that (neither does LLVM). See here: https://godbolt.org/z/hx9vGe6Kj If confirmed, these bugs could be added to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=19987 Potentially related: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95817 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94789#c1
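The claim that g can be folded to "return n <= 24" can be checked mechanically for every shift count at which the shift is defined. The following is a minimal sketch, assuming C++14 or later for the loop in a constexpr function; f and g are copied from the report with constexpr added, and the helper g_matches_folded_form is an illustrative name that is not part of the report.

```cpp
#include <cstdint>

constexpr bool f(std::uint32_t m, int n) { return (m >> n) != 0; }
constexpr bool g(int n) { return f(1 << 24, n); }

// Compare g with the proposed folded form for every shift count at
// which shifting a 32-bit operand is defined (0 <= n < 32).
constexpr bool g_matches_folded_form() {
    for (int n = 0; n < 32; ++n)
        if (g(n) != (n <= 24)) return false;
    return true;
}

static_assert(g_matches_folded_form(), "g(n) == (n <= 24) for 0 <= n < 32");
```

Shift counts outside that range are undefined behaviour, which is what leaves the compiler free to use the simpler comparison for all n.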
[Bug tree-optimization/101436] Yet another bogus "array subscript is partly outside array bounds"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101436 --- Comment #3 from Cassio Neri --- Because of the typeid check the unsafe static_cast never happens and I think the compiler should not be warning about a problem that doesn't exist. Besides, there's no array involved in this code. I appreciate the attempt to emit a good warning that might improve my code but the message is completely misleading and makes me scratch my head. Here the code is minimal and it is easy to see that there's no array. In a large code base I could spend a long time looking for an array that doesn't exist or I could find an array that has no issue but the compiler makes me think it has. Re using a dynamic_cast: I could surely use a dynamic_cast in real code but this is a compiler test case. IMHO, it should be minimal and straight to the point, at the expense of neglecting other aspects of the language (e.g. better practices) that could otherwise divert the attention. As I said, the virtual destructor in A and the typeid check were there to avoid the obvious UB that would happen had I unconditionally performed the static_cast. In that UB case (provided the message were clearer and not misleadingly talking about arrays) I'd be very grateful for getting the warning. Notice also that I provided a couple of changes that don't make the code any better w.r.t. an unsafe static_cast (which, again, is never performed). These changes make the spurious warning go away (which is good) and this shows that there's certainly something wrong with the logic that decides to emit the warning for the code as originally posted. What about this example which involves no virtual method and where dynamic_cast cannot help? 
struct A { int type; }; struct C1 { int i; int j; }; struct C2 { int i; }; template struct B : A { B() : A{i} {} T x; }; using BC1 = B; using BC2 = B; void do_something(int); BC2 get_BC(); void h(A& a) { if (a.type == 1) { BC1& b = static_cast(a); int i = b.x.i; do_something(i); } else if (a.type == 2) { BC2& b = static_cast(a); int i = b.x.i; do_something(i); } } void foo() { auto x = get_BC(); h(x); } Here again, there are changes that make the code no better w.r.t. a potential unsafe cast but do make the warning go away: 1) Change the return type of get_BC to BC1. 2) Remove C1::j. 3) Remove the extra level of indirection given by template class B. (See [2].) [1] Example above: https://godbolt.org/z/Tha3M6xq3 [2] Example with no template: https://godbolt.org/z/nWsPvTrYr
[Bug tree-optimization/101436] New: Yet another bogus "array subscript is partly outside array bounds"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101436 Bug ID: 101436 Summary: Yet another bogus "array subscript is partly outside array bounds" Product: gcc Version: 11.1.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: cassio.neri at gmail dot com Target Milestone: --- This bogus warning was reported at least twice recently: #98266 and #101374. Below is a new case that, it seems, hasn't been addressed yet. #include struct A { virtual ~A(); }; template struct B : A { T x; }; struct C1 { int i; double j; }; struct C2 { int i; }; void do_something(int); B get_BC2(); void h(A& a) { if (typeid(a) == typeid(B)) { B& b = static_cast&>(a); int i = b.x.i; do_something(i); } } void foo() { B x = get_BC2(); h(x); } Compiling with '-O3 -Warray-bounds' yields: : In function 'void foo()': :27:9: warning: array subscript 'B[0]' is partly outside array bounds of 'B [1]' [-Warray-bounds] 27 | int i = b.x.i; | ^ :33:9: note: while referencing 'x' 33 | B x = get_BC2(); FWIW: 1) This is a regression from GCC 10.3. 2) The warning goes away if any of the following changes are made: * Remove C1::j. * Change type of C1::j to any of int, char, bool, unsigned or float. (Perhaps any type T such that sizeof(T) <= sizeof(int)). * Compile with '-fPIC' (however, if h is marked inline then the warning comes back). 3) If b is declared as B (as opposed to B&), then the warning points to line 'struct B: A {'. 4) The test case could be simplified further by removing A's virtual destructor and the typeid check. However, this would make the code invoke UB and I hope the code above doesn't. 5) #98266 concerns virtual inheritance, which does not appear here, and a test case therein issues no warning when compiled with GCC 11.1. 6) IIUC the warning reported by #101374 happens in GCC's own code and was caused by some recent change that is not part of GCC 11.1. 
Indeed a test case reported therein compiles fine with GCC 11.1 whereas the one above doesn't. See also: Test case above: https://godbolt.org/z/n4obaohPs Test case from #98266: https://godbolt.org/z/PEjfhs3T6 Test case from #101374: https://godbolt.org/z/Ebb8YszT5
[Bug tree-optimization/101225] New: Example where y % 16 == 0 seems more expensive than y % 400 == 0.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101225 Bug ID: 101225 Summary: Example where y % 16 == 0 seems more expensive than y % 400 == 0. Product: gcc Version: 11.1.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: cassio.neri at gmail dot com Target Milestone: --- Consider this implementation of is_leap_year: bool is_leap_year_1(short year) { return year % 100 == 0 ? year % 400 == 0 : year % 4 == 0; } If a number is a multiple of 100, then it's divisible by 400 if and only if it's divisible by 16. Since checking divisibility by 16 is cheap, one would expect the following version to be more efficient (at least, not worse): bool is_leap_year_2(short year) { return year % 100 == 0 ? year % 16 == 0 : year % 4 == 0; } According to [1] the latter is 1.4x slower than the former. The emitted instructions with -O3 [2] don't seem bad and, except for a leal vs. addw, the difference is a localized strength-reduction from "y % 400 == 0" to "y % 16 == 0": is_leap_year_1(short): imulw $23593, %di, %ax leal 1308(%rax), %edx rorw $2, %dx cmpw $654, %dx ja .L2 addw $1296, %ax # Begin: year % 400 == 0 rorw $4, %ax # cmpw $162, %ax # setbe %al # End : year % 400 == 0 ret .L2: andl $3, %edi sete %al ret is_leap_year_2(short): imulw $23593, %di, %ax addw $1308, %ax rorw $2, %ax cmpw $654, %ax ja .L6 andl $15, %edi # Begin: y % 16 == 0 sete %al # End : y % 16 == 0 ret .L6: andl $3, %edi sete %al ret FWIW: My educated **guess** is that the issue is the choice of registers: for version 1 just after leal, the register rax/ax/al is free and regardless of the branch taken, the CPU can continue the calculation of "y % 100 == 0" in parallel with the other divisibility check, up to "sete %al". For version 2, rax/ax/al is busy during the whole execution of "y % 100" and "sete %al" can't be preemptively executed. 
As a test for my theory I reimplemented half of is_leap_year_2 in inline asm (see in [1] and [2]) using similar choices of registers as in is_leap_year_1 and I got the performance boost that I was expecting. [1] https://quick-bench.com/q/3U8t4qzXxtSpsehbWNOh3SWxBGQ [2] https://godbolt.org/z/jfK3j5777 Note: [1] runs GCC 10.2 but the same happens on GCC 11.0.0.
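The equivalence the report relies on follows from 100k = 4 * 25k: since 25 is odd, 16 divides 100k exactly when 4 divides k, which is exactly when 400 divides 100k. A minimal sketch that confirms the two versions agree over the whole range of short is given below; the two functions are copied from the report with constexpr added, and leap_versions_agree is an illustrative helper name.

```cpp
#include <limits>

constexpr bool is_leap_year_1(short year) {
    return year % 100 == 0 ? year % 400 == 0 : year % 4 == 0;
}

constexpr bool is_leap_year_2(short year) {
    return year % 100 == 0 ? year % 16 == 0 : year % 4 == 0;
}

// Exhaustively compare both versions for every value a short can hold.
// The loop variable is an int so that the increment cannot overflow.
inline bool leap_versions_agree() {
    for (int y = std::numeric_limits<short>::min();
         y <= std::numeric_limits<short>::max(); ++y) {
        const short s = static_cast<short>(y);
        if (is_leap_year_1(s) != is_leap_year_2(s)) return false;
    }
    return true;
}
```

This only checks that the results match; it says nothing about the timing difference, which is the subject of the report.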
[Bug tree-optimization/88797] [9 Regression] Unneeded branch added when function is inlined (function runs faster if not inlined)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88797 --- Comment #13 from Cassio Neri --- FWIW: This seems to have been fixed since 10.1. As we can see in [1], on version 10.1, test_f has no unnecessary branches, as opposed to version 9.3. [1] https://godbolt.org/z/h87Efbanb As far as I'm concerned, you could close the ticket.
[Bug middle-end/93634] Improving modular calculations (e.g. divisibility tests).
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93634 --- Comment #1 from Cassio Neri --- FYI, this is what clang trunk generates: imull $-1431655765, %edi, %eax # imm = 0xAAAAAAAB addl $1431655764, %eax # imm = 0x55555554 rorl %eax cmpl $715827882, %eax # imm = 0x2AAAAAAA setb %al retq
[Bug middle-end/93634] New: Improving modular calculations (e.g. divisibility tests).
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93634 Bug ID: 93634 Summary: Improving modular calculations (e.g. divisibility tests). Product: gcc Version: 9.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: cassio.neri at gmail dot com Target Milestone: --- Consider: bool f(unsigned n) { return n % 6 == 4; } at -O3 the code generated for x86_64 is mov %edi,%eax mov $0xaaaaaaab,%edx imul %rdx,%rax shr $0x22,%rax lea (%rax,%rax,2),%eax add %eax,%eax sub %eax,%edi cmp $0x4,%edi sete %al retq whereas it could be sub $0x4,%edi imul $0xaaaaaaab,%edi,%edi ror %edi cmp $0x2aaaaaa9,%edi setbe %al retq Notice the latter is quite similar to what gcc generates for n % 6 == 3: imul $0xaaaaaaab,%edi,%edi sub $0x1,%edi ror %edi cmp $0x2aaaaaaa,%edi setbe %al retq It's true that there's a small mathematical difference between the cases r <= 3 and r >= 4 but not enough to throw away the faster algorithm. I reckon this is not obvious and I refer to https://accu.org/var/uploads/journals/Overload154.pdf#page=13 which presents the overall idea and some benchmarks. In addition, it makes some comments on gcc's generated code for other cases of n % d == r. References therein provide mathematical proofs and extra benchmarks. FWIW: 1) This relates to bug 82853 and bug 12849 and to a lesser extent bug 89845. 2) Specifically, it confirms the idea (for unsigned integers) described by Orr Shalom Dvory in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853#c33
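The proposed instruction sequence for n % 6 == 4 (subtract, multiply by the inverse of 3 modulo 2^32, rotate right by one, compare) can be written directly in C++. The sketch below assumes 32-bit unsigned arithmetic; the names rotate_right_1, rem6_is_4 and rem6_sample_agrees are illustrative and not from the report, and the stride used for sampling is an arbitrary choice to keep the check fast.

```cpp
#include <cstdint>

constexpr std::uint32_t rotate_right_1(std::uint32_t v) {
    return (v >> 1) | (v << 31);
}

// n % 6 == 4 without a division: 0xAAAAAAAB is the inverse of 3 modulo
// 2^32, so for m = n - 4 a multiple of 6 the product is m / 3 (an even
// number) and the rotation yields m / 6. The bound 0x2AAAAAA9 is
// floor((2^32 - 1 - 4) / 6); it also rejects n < 4, where n - 4 wraps.
constexpr bool rem6_is_4(std::uint32_t n) {
    return rotate_right_1((n - 4u) * 0xAAAAAAABu) <= 0x2AAAAAA9u;
}

// Compare against the plain expression on a strided sample of inputs.
inline bool rem6_sample_agrees() {
    for (std::uint64_t i = 0; i <= 0xFFFFFFFFu; i += 65521) {
        const std::uint32_t n = static_cast<std::uint32_t>(i);
        if (rem6_is_4(n) != (n % 6 == 4)) return false;
    }
    return true;
}
```

The edge cases worth looking at are n = 0, where n - 4 wraps to a multiple of 6 and must be rejected by the bound, and the largest n with remainder 4, which is 2^32 - 6.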
[Bug libstdc++/92124] New: std::vector copy-assigning when it should move-assign.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92124 Bug ID: 92124 Summary: std::vector copy-assigning when it should move-assign. Product: gcc Version: 9.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: libstdc++ Assignee: unassigned at gcc dot gnu.org Reporter: cassio.neri at gmail dot com Target Milestone: --- Consider two vectors a and rv. In the situation below a = std::move(rv) copy-assigns elements of rv into a, violating [container.requirements.general]/4, Table 83: "All existing elements of a are either move assigned to or destroyed" (See [1].) It happens with std::vector> such that: 1) X's move-constructor might throw (though I'm assigning and not constructing); 2) A does not propagate on move-assignment and allocators used by source and target vectors do not compare equal. The following MCVE contains some boilerplate and the most important parts are indicated by comments. #include #include #include #include struct X { X() = default; X(const X&) = default; // Move constructor might throw X(X&&) noexcept(false) {} // "= default" changes reported behaviour // Tracking calls to assignment functions X& operator=(const X&) { putchar('c'); return *this; } X& operator=(X&&) noexcept(true) { putchar('m'); return *this; } }; unsigned counter = 0; template struct A : std::allocator { template struct rebind { using other = A; }; A() : std::allocator(), id(++counter) {} // Does not propagate using propagate_on_container_move_assignment = std::false_type; // Does not always compare equal using is_always_equal = std::false_type; bool operator ==(const A& o) { return id == o.id; } bool operator !=(const A& o) { return id != o.id; } unsigned id; }; int main() { std::vector> a(2), rv(2); a = std::move(rv); } Running the code above outputs "cc" (instead of "mm") confirming the two elements of rv are copy-assigned into a. 
See relevant discussion in [2] (with link to possible culprit lines of code in libstdc++) and a live example of the code above in [3] [1] https://timsong-cpp.github.io/cppwp/n4659/container.requirements#tab:containers.container.requirements [2] https://stackoverflow.com/questions/58378051/issue-when-compiling-libstdc-with-clang?noredirect=1#comment103136248_58378051 [3] https://godbolt.org/z/EgkPrP
[Bug c++/91158] "if (__builtin_constant_p(n))" versus "if constexpr (__builtin_constant_p(n))"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91158 --- Comment #7 from Cassio Neri --- (In reply to Jakub Jelinek from comment #4) Got it! Thank you, Mark and Jonathan. Please, feel free to close the ticket.
[Bug c++/91158] "if (__builtin_constant_p(n))" versus "if constexpr (__builtin_constant_p(n))"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91158 --- Comment #3 from Cassio Neri --- Forget my use case and comments on dead code elimination. That was a digression. (My bad.) In general, I don't expect `if` and `if constexpr` to behave the same but I do in this particular case. (I might be wrong.) Finally, since `__builtin_constant_p` is not standard this ticket is a feature request, not a bug report. My reasoning is this: when the compiler sees `static_assert(f1(1));` it enters a "constexpr evaluation mode" (not sure this is the right terminology but you get the point). At this moment, regardless of optimization level the compiler must propagate `1` to `f1` otherwise (generally speaking) it cannot evaluate `f1(1)` and decide whether the `static_assert` passes or not. Therefore, when it enters `f1` and sees `if constexpr (__builtin_constant_p(n))` it is already in "constexpr evaluation mode" (so `constexpr` here is redundant) and it knows `n == 1`. Hence, it should evaluate `__builtin_constant_p(n)` to `1`. To make clear that my point is not that `if` and `if constexpr` should always work the same, please, contrast with this program: int main() { if (f0(1)) puts("if : yes"); else puts("if : no"); if constexpr (f0(1)) puts("if ce: yes"); else puts("if ce: no"); } The output in -O0 mode is `if : no` and `if ce: yes`. Since the first `if` is not `constexpr`, the compiler doesn't need to enter "constexpr evaluation mode" and -O0 is too low for `1` to be propagated to `f0`. The second `if`, on the other hand, is `constexpr`. The compiler enters "constexpr evaluation mode", propagates `1` to `f0` and evaluates `if (__builtin_constant_p(n))` to `1` regardless that this `if` is not `constexpr`. Also, to make clear I'm OK with `if constexpr (__builtin_constant_p(n))` evaluating to `0` even in -O3 level, consider this: int main() { if (f0(1)) puts("if : yes"); else puts("if : no"); } The output is `if : no`. 
Since the `if` is not `constexpr`, constant propagation is up to the optimizer (QoI issue) and I'm OK if it enters "constexpr evaluation mode" only inside `f1` (when it sees `if constexpr (__builtin_constant_p(n))`), at which point it is too late to know the value of `n` and it considers `n` as non constant. Finally, I would like to link this issue with bug 70552 comment 5. Martin Sebor, commenting on a patch for another related issue says: "The patch referenced from it sets a precedent for the intrinsic treating constant expressions as constant despite its late evaluation under "normal" circumstances". IIUC, it says that `__builtin_constant_p(expr)` always evaluates to `1` if expr is a C++ constant expression (e.g. a call to a `constexpr` function). Similarly, I believe that in "constexpr evaluation mode", almost every evaluation of `__builtin_constant_p(expr)` in the taken path should yield `1`. (There are exceptions, notably, when `expr` is a non `constexpr` local variable.)
[Bug c++/91158] New: "if (__builtin_constant_p(n))" versus "if constexpr (__builtin_constant_p(n))"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91158 Bug ID: 91158 Summary: "if (__builtin_constant_p(n))" versus "if constexpr (__builtin_constant_p(n))" Product: gcc Version: 9.1.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: cassio.neri at gmail dot com Target Milestone: --- Consider: constexpr bool f0(int n) { if (__builtin_constant_p(n)) return true; return false; // Alternatively: // return __builtin_constant_p(n) ? true : false; // return __builtin_constant_p(n); } constexpr bool f1(int n) { if constexpr (__builtin_constant_p(n)) return true; return false; } static_assert( f0(1)); static_assert( f1(1)); // gcc 9.1 fails I would expect both static_asserts to pass, that is, I would expect no difference in behaviour between f0 and f1. (FWIW, for gcc 9.1, -O0, -O1, -O2 and -O3 all behave the same.) This might be an issue with the different moments at which 'if constexpr' and __builtin_constant_p are evaluated. (Similarly to bug 19449 comment 2.) In any case, I find f1 very misleading. My real use case is like if (__builtin_constant_p(n)) // efficient code else // less efficient code However, branching at runtime is unacceptable and, if the compiler does not know the value of n, it's preferable to drop the 'if-else' altogether and live with the less efficient code. Wanting to avoid branching at runtime is a big hint for using 'if constexpr' but, as things stand, this implies *never* using the more efficient code. Although a regular 'if' does what I want, I don't get the assurance that 'if constexpr' provides about no branching at runtime. Instead, I need to rely on the optimizer rather than on the semantics of C++. See also bug 54021.
[Bug middle-end/12849] testing divisibility by constant
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=12849 --- Comment #7 from Cassio Neri --- Thanks for implementing the modular inverse algorithm in gcc. However, the implementation has an issue. In some cases, for no obvious reason, the compiler falls back to the old algorithm. For instance, bool f1(unsigned n) { return n % 10 == 5; } as expected, uses the modular inverse algorithm and translates to f1(unsigned int): imull $-858993459, %edi, %edi subl $1, %edi rorl %edi cmpl $429496729, %edi setbe %al ret whereas bool f2(unsigned n) { return n % 10 == 6; } doesn't use the modular inverse algorithm and is the same as in older versions of gcc: f2(unsigned int): movl %edi, %eax movl $3435973837, %edx imulq %rdx, %rax shrq $35, %rax leal (%rax,%rax,4), %eax addl %eax, %eax subl %eax, %edi cmpl $6, %edi sete %al ret See on godbolt: https://godbolt.org/z/u-C54I I would like to make another observation. For some divisors (e.g. 7, 19, 21) the modular inverse algorithm seems to be faster than the traditional one even when the remainder r (in n % d == r) is not a compile time constant. In general this happens in cases where the "magic number" M used by the traditional algorithm to replace the division "n / d" with "n * M >> k" is such that M doesn't fit in a register and extra operations are required to overcome this problem. In other words, these are the divisors for which '"Add" indicator' in https://www.hackersdelight.org/magic.htm shows 1. I made some measurements and I hope to make my results available for your consideration soon.
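To illustrate that nothing in the modular inverse approach is specific to the remainder 5, here is a minimal sketch for d = 10 with an arbitrary remainder r < 10, covering both f1 (r = 5) and f2 (r = 6) from the comment. The constant 0xCCCCCCCD is the unsigned form of the -858993459 seen in the emitted code, i.e. the inverse of 5 modulo 2^32. The names ror32_by_1, rem10_is and rem10_sample_agrees are illustrative, and this is not a description of what gcc implements internally.

```cpp
#include <cstdint>

constexpr std::uint32_t ror32_by_1(std::uint32_t v) {
    return (v >> 1) | (v << 31);
}

// n % 10 == r for r < 10, without dividing n. With m = n - r, a multiple
// of 10 maps to m / 10 after the multiply and rotate. The bound
// floor((2^32 - 1 - r) / 10) also rejects n < r, where n - r wraps.
// For r = 5 the bound is 429496729, matching the constant in f1's code.
constexpr bool rem10_is(std::uint32_t n, std::uint32_t r) {
    return ror32_by_1((n - r) * 0xCCCCCCCDu) <= (0xFFFFFFFFu - r) / 10u;
}

// Compare against the plain expression on a strided sample of inputs
// and for every remainder; the stride is an arbitrary choice.
inline bool rem10_sample_agrees() {
    for (std::uint64_t i = 0; i <= 0xFFFFFFFFu; i += 65521) {
        const std::uint32_t n = static_cast<std::uint32_t>(i);
        for (std::uint32_t r = 0; r < 10; ++r)
            if (rem10_is(n, r) != (n % 10 == r)) return false;
    }
    return true;
}
```

With r fixed at compile time the bound is a constant, so the r = 6 case costs the same handful of instructions as r = 5.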
[Bug tree-optimization/90447] Missed opportunities to use adc (worse when -1 is involved)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90447 --- Comment #1 from Cassio Neri --- Forgot to mention this discussion on SO: https://stackoverflow.com/questions/56101507/is-there-anything-special-about-1-0x-regarding-adc
[Bug tree-optimization/90447] New: Missed opportunities to use adc (worse when -1 is involved)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90447 Bug ID: 90447 Summary: Missed opportunities to use adc (worse when -1 is involved) Product: gcc Version: 9.1.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: cassio.neri at gmail dot com Target Milestone: --- The following are three attempts to get gcc to generate adc instructions from C++: #include unsigned constexpr X = 0; unsigned f1(unsigned a, unsigned b) { b += a; auto c = b < a; b += X + c; return b; } unsigned f2(unsigned a, unsigned b) { b += a; b += X + (b < a); return b; } unsigned f3(unsigned a, unsigned b) { b += a; unsigned char c = b < a; _addcarry_u32(c, b, X, ); return b; } The 3 functions above (-O3 -std=c++17) generate: addl %edi, %esi movl %esi, %eax adcl $0, %eax ret This is great and I would expect that changing X would only affect the immediate value and nothing more. I was wrong. Changing X to 1 makes f1 and f3 change as I expected but f2 becomes: f2(unsigned int, unsigned int): xorl %eax, %eax addl %edi, %esi setc %al addl $1, %eax addl %esi, %eax ret I thought I could blame "b += X + (b < a);" for being undefined behaviour. However, I believe that, at least in C++17, this is not the case given the addition of this sentence: "The right operand is sequenced before the left operand." to [expr.ass]. As far as Standard C++ is concerned, I expect f1 to be equivalent to f2. Things got worse when X == -1: f1(unsigned int, unsigned int): xorl %eax, %eax addl %edi, %esi setc %al leal -1(%rax,%rsi), %eax ret f2(unsigned int, unsigned int): xorl %eax, %eax addl %edi, %esi setnc %al subl %eax, %esi movl %esi, %eax ret f3(unsigned int, unsigned int): addl %esi, %edi movl $-1, %eax setc %dl addb $-1, %dl adcl %edi, %eax ret No adc whatsoever. 
I'm not an assembly guy but if I understand f3 correctly, "setc %dl / addb $-1, %dl" is simply storing the CF in dl and adding dl to 0xff to force CF to get the same value it already had before instruction setc was executed. Basically, this is a convoluted, register-wasteful nop. I thought the problem could be related to issue [1] but that one has already been resolved in trunk, where this issue also happens, and -fno-split-paths doesn't seem to change anything. The example in godbolt is https://godbolt.org/z/3GUyLj but if you play with the site's settings (particularly, lib.f) be aware of their issue [2]. [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88797 [2] https://github.com/mattgodbolt/compiler-explorer/issues/1377
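The expectation that f1 and f2 are equivalent, whatever the value of X, can be checked against a wider-integer reference. The sketch below is an adaptation and not the report's exact code: X is turned into a template parameter so that several values can be tested in one translation unit, fixed-width types replace unsigned, and reference_sum is an illustrative helper that computes a + b + X + carry using 64-bit arithmetic.

```cpp
#include <cstdint>

// f1 from the report: the carry is stored in a named variable first.
template <std::uint32_t X>
constexpr std::uint32_t f1(std::uint32_t a, std::uint32_t b) {
    b += a;
    auto c = b < a;  // true exactly when a + b wrapped around
    b += X + c;
    return b;
}

// f2 from the report: the carry is computed inside the expression.
template <std::uint32_t X>
constexpr std::uint32_t f2(std::uint32_t a, std::uint32_t b) {
    b += a;
    b += X + (b < a);
    return b;
}

// Reference: take the carry out of a 64-bit sum, then add X and carry.
template <std::uint32_t X>
constexpr std::uint32_t reference_sum(std::uint32_t a, std::uint32_t b) {
    const std::uint64_t wide = std::uint64_t{a} + b;
    const std::uint32_t carry = static_cast<std::uint32_t>(wide >> 32);
    return static_cast<std::uint32_t>(wide) + X + carry;
}
```

The value X == -1 from the report corresponds to the template argument 0xFFFFFFFFu here.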
[Bug c++/89960] New: Implicit derived to base conversion considered type punning.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89960 Bug ID: 89960 Summary: Implicit derived to base conversion considered type punning. Product: gcc Version: 8.3.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: cassio.neri at gmail dot com Target Milestone: --- Consider: struct base { int i; void f(){} }; template struct derived : base { void g1() { return (this->*F)(); } void g2() { base* p = this; return (p->*F)(); } }; void h() { derived<::f> x; x.g1(); x.g2(); } Compiling with -O2 -Wstrict-aliasing gives a warning: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing] return (this->*F)(); ~~^~ It looks like the implicit conversion from derived to base is considered type-punning. Remarks: The warning goes away if any of the following holds: 1) -O2 is not used. 2) -Wstrict-aliasing is not used. 3) base has no non-static data members. 4) F is not a template parameter. 5) x.g1() is not called. (In contrast, x.g2() compiles fine and this is a workaround for the issue.) 6) Another compiler is used (another vendor's, but also gcc 4.6.4 or earlier).
[Bug tree-optimization/88797] Unneeded branch added when function is inlined (function runs faster if not inlined)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88797 --- Comment #5 from Cassio Neri --- There's a (fragile) workaround: void use(unsigned); #define VERSION 0 bool f(unsigned x, unsigned y) { #if VERSION == 0 return x < + (y <= ); #else bool b = y <= ; return x < + b; #endif } void test_f(unsigned x, unsigned y) { for (unsigned i = 0; i < ; ++i) use(f(x++, y++)); } f is still the same. Version 0 of test_f has 4 jumps whereas version 1 has only one. https://godbolt.org/z/gZZQ2f
[Bug tree-optimization/88797] Unneeded branch added when function is inlined (function runs faster if not inlined)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88797 --- Comment #4 from Cassio Neri --- Comment on attachment 45408 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45408 Running example The magic numbers 4, 6, 7, 0x24924924u and 0xb6db6db7u were chosen in an attempt to maximize the probability of making branch prediction harder and the difference in performance clearer.
[Bug tree-optimization/88797] Unneeded branch added when function is inlined (function runs faster if not inlined)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88797 --- Comment #3 from Cassio Neri --- The attached file is a running example that shows that performance is damaged. The code runs faster when test_f calls g instead of f where g is bool g(unsigned x, unsigned y) { if (x >= y) return false; return f(n, r); } even in the case where x < y and g does call f. Depending on #defines the example runs either f, g or both. These are the timings: $ g++ -O3 -o gcc_issue gcc_issue.cpp -D RUN_SIMPLE && time ./gcc_issue Running simple function... real 0m3.646s user 0m3.645s sys 0m0.000s $ g++ -O3 -o gcc_issue gcc_issue.cpp -D RUN_COMPLEX && time ./gcc_issue Running complex function... real 0m1.165s user 0m1.161s sys 0m0.003s $ g++ -O3 -o gcc_issue gcc_issue.cpp -D RUN_BOTH && time ./gcc_issue Running simple function... Running complex function... real 0m3.059s user 0m3.051s sys 0m0.007s Notice that running both is faster than running f only! This is so because then the compiler gives up inlining and calls the (good) generated code for f in isolation.
[Bug tree-optimization/88797] Unneeded branch added when function is inlined (function runs faster if not inlined)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88797 --- Comment #2 from Cassio Neri --- Created attachment 45408 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45408&action=edit Running example
[Bug rtl-optimization/88797] New: Unneeded branch added when function is inlined (function runs faster if not inlined)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88797 Bug ID: 88797 Summary: Unneeded branch added when function is inlined (function runs faster if not inlined) Product: gcc Version: 8.2.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: cassio.neri at gmail dot com Target Milestone: --- Consider: void use(unsigned); bool f(unsigned x, unsigned y) { return x < + (y <= ); } void test_f(unsigned x, unsigned y) { for (unsigned i = 0; i < ; ++i) use(f(x++, y++)); } The generated code for f seems fine and there's no branch to test y <= : f(unsigned int, unsigned int): xorl %eax, %eax cmpl $, %esi setbe %al addl $, %eax cmpl %edi, %eax seta %al ret However, when f is inlined in test_f, a branch is introduced to decide whether x should be compared to or 1112 (code cut for brevity): test_f(unsigned int, unsigned int): [...] jmp .L6 .L14: cmpl $, %eax .L12: [...] .L6: [...] cmpl $, %ebx jbe .L14 cmpl $1110, %eax jmp .L12 [...] See https://godbolt.org/z/_EC992 and use -O3. This seems to be a regression: it used to be OK up to 6.3 and then degraded in 7.1 (according to godbolt).
[Bug middle-end/12849] testing divisibility by constant
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=12849

Cassio Neri changed:

  What |Removed |Added
  -----+--------+----------------------------
  CC   |        |cassio.neri at gmail dot com

--- Comment #4 from Cassio Neri ---
A simple mathematical proof that the algorithm works is found here:

http://clomont.com/efficient-divisibility-testing/

See also https://stackoverflow.com/a/49264279/1137388.
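For readers who do not want to follow the links, here is a minimal sketch of one well-known way to test divisibility by a constant without dividing. It assumes the algorithm under discussion is the multiplicative-inverse test for odd divisors on 32-bit unsigned values; the comment itself does not spell the algorithm out. The names inverse_mod_2_32 and is_multiple_of are hypothetical, and the code needs C++14 for the loop in a constexpr function.

```cpp
#include <cstdint>

// Hypothetical helper: multiplicative inverse of an odd d modulo 2^32,
// computed by Newton iteration. Each step doubles the number of correct
// low-order bits.
constexpr std::uint32_t inverse_mod_2_32(std::uint32_t d) {
  std::uint32_t x = d;         // d * d == 1 (mod 8) for odd d: 3 bits correct
  for (int i = 0; i < 4; ++i)  // 3 -> 6 -> 12 -> 24 -> 48 correct bits
    x *= 2u - d * x;
  return x;
}

// Hypothetical helper: true if n is a multiple of d. Assumes d is odd.
// Multiplying by the inverse is a bijection modulo 2^32 that sends the
// multiples of d onto 0, 1, ..., UINT32_MAX / d; every other value lands
// above that threshold.
constexpr bool is_multiple_of(std::uint32_t n, std::uint32_t d) {
  return n * inverse_mod_2_32(d) <= UINT32_MAX / d;
}

static_assert(inverse_mod_2_32(3) == 0xAAAAAAABu, "inverse of 3 mod 2^32");
static_assert(is_multiple_of(9, 3), "9 is a multiple of 3");
static_assert(!is_multiple_of(10, 3), "10 is not a multiple of 3");
static_assert(is_multiple_of(0, 7), "0 is a multiple of 7");
```

With a constant d, both the inverse and the threshold fold to constants, so the test is one multiplication and one comparison. The sketch covers odd divisors only.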
[Bug tree-optimization/84648] New: Missed optimization : loop not removed.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84648

Bug ID: 84648
Summary: Missed optimization : loop not removed.
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: cassio.neri at gmail dot com
Target Milestone: ---

The loop below is not eliminated:

int main() {
  for (unsigned i = 0; i < (1u << 31); ++i) {
  }
  return 0;
}

Compiled with -O3:

main:
  xor eax, eax
.L2:
  add eax, 1
  jns .L2
  xor eax, eax
  ret

The loop is removed for other bounds, e.g. (1u << 31) + 1 or (1u << 31) - 1, or when < is replaced with <=.

My guess at the underlying problem: the optimization that uses jns to detect when i reaches (10...0)_2 ends up blocking the other optimization that eliminates the loop altogether.

The same issue occurs when using unsigned long long and (1ull << 63).

FWIW: clang has the same issue (in C but not in C++).
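As a small illustration of why a sign-flag test such as jns can stand in for the comparison, the sketch below checks that, for a 32-bit unsigned value, being below 1u << 31 is the same as having the top bit clear. The helper names are hypothetical and a 32-bit unsigned type is assumed; this only illustrates the guess above and says nothing about GCC internals.

```cpp
#include <cstdint>

// Hypothetical helper mirroring the loop condition from the report.
constexpr bool below_bound(std::uint32_t i) {
  return i < (std::uint32_t{1} << 31);
}

// The same predicate phrased as a test of the top (sign) bit, which is
// the bit the sign flag reflects after `add eax, 1`.
constexpr bool sign_bit_clear(std::uint32_t i) {
  return (i >> 31) == 0;
}

static_assert(below_bound(0) && sign_bit_clear(0), "lowest value");
static_assert(below_bound(0x7FFFFFFFu) && sign_bit_clear(0x7FFFFFFFu),
              "last value inside the loop");
static_assert(!below_bound(0x80000000u) && !sign_bit_clear(0x80000000u),
              "first value that ends the loop");
```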
[Bug c++/59238] New: Dynamic allocating a list-initialized object of a type with private destructor fails.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59238

Bug ID: 59238
Summary: Dynamic allocating a list-initialized object of a type with private destructor fails.
Product: gcc
Version: 4.8.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: cassio.neri at gmail dot com

Consider:

class foo {
  ~foo() {}
};

int main() {
  new foo;   // OK
  new foo(); // OK
  new foo{}; // error: 'foo::~foo()' is private
}

The last line shouldn't fail to compile since the destructor is not invoked.

FWIW, it compiles fine with clang. It also compiles fine with gcc 4.9.0 20131109 if foo has a user declared default constructor.
[Bug c++/58170] New: Crash when aliasing a template class that is a member of its template base class.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58170

Bug ID: 58170
Summary: Crash when aliasing a template class that is a member of its template base class.
Product: gcc
Version: 4.8.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: cassio.neri at gmail dot com

The code below crashes gcc 4.8.1 (coincidentally, it also crashes clang 3.3).

-
template <typename T, typename U>
struct base {
  template <typename V>
  struct derived;
};

template <typename T, typename U>
template <typename V>
struct base<T, U>::derived : public base<T, V> {
};

// This (wrong?) alias declaration provokes the crash.
template <typename T, typename U, typename V>
using derived = base<T, U>::derived<V>;

// This one works:
// template <typename T, typename U, typename V>
// using derived = typename base<T, U>::template derived<V>;

template <typename T>
void f() {
  derived<T, bool, char> m{};
  (void) m;
}

int main() {
  f<int>();
}
-

$ g++ -v -save-temps -std=c++11 -Wall -pedantic main.cpp
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-unknown-linux-gnu/4.8.1/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: /build/gcc/src/gcc-4.8-20130725/configure --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/ --enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++ --enable-shared --enable-threads=posix --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-clocale=gnu --disable-libstdcxx-pch --enable-gnu-unique-object --enable-linker-build-id --enable-cloog-backend=isl --disable-cloog-version-check --enable-lto --enable-gold --enable-ld=default --enable-plugin --with-plugin-ld=ld.gold --with-linker-hash-style=gnu --disable-install-libiberty --disable-multilib --disable-libssp --disable-werror --enable-checking=release
Thread model: posix
gcc version 4.8.1 20130725 (prerelease) (GCC)
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-std=c++11' '-Wall' '-Wpedantic' '-shared-libgcc' '-mtune=generic' '-march=x86-64'
 /usr/lib/gcc/x86_64-unknown-linux-gnu/4.8.1/cc1plus -E -quiet -v -D_GNU_SOURCE main.cpp -mtune=generic -march=x86-64 -std=c++11 -Wall -Wpedantic -fpch-preprocess -o main.ii
ignoring nonexistent directory /usr/lib/gcc/x86_64-unknown-linux-gnu/4.8.1/../../../../x86_64-unknown-linux-gnu/include
#include "..." search starts here:
#include <...> search starts here:
 /usr/lib/gcc/x86_64-unknown-linux-gnu/4.8.1/../../../../include/c++/4.8.1
 /usr/lib/gcc/x86_64-unknown-linux-gnu/4.8.1/../../../../include/c++/4.8.1/x86_64-unknown-linux-gnu
 /usr/lib/gcc/x86_64-unknown-linux-gnu/4.8.1/../../../../include/c++/4.8.1/backward
 /usr/lib/gcc/x86_64-unknown-linux-gnu/4.8.1/include
 /usr/local/include
 /usr/lib/gcc/x86_64-unknown-linux-gnu/4.8.1/include-fixed
 /usr/include
End of search list.
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-std=c++11' '-Wall' '-Wpedantic' '-shared-libgcc' '-mtune=generic' '-march=x86-64'
 /usr/lib/gcc/x86_64-unknown-linux-gnu/4.8.1/cc1plus -fpreprocessed main.ii -quiet -dumpbase main.cpp -mtune=generic -march=x86-64 -auxbase main -Wall -Wpedantic -std=c++11 -version -o main.s
GNU C++ (GCC) version 4.8.1 20130725 (prerelease) (x86_64-unknown-linux-gnu)
        compiled by GNU C version 4.8.1 20130725 (prerelease), GMP version 5.1.2, MPFR version 3.1.2, MPC version 1.0.1
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
GNU C++ (GCC) version 4.8.1 20130725 (prerelease) (x86_64-unknown-linux-gnu)
        compiled by GNU C version 4.8.1 20130725 (prerelease), GMP version 5.1.2, MPFR version 3.1.2, MPC version 1.0.1
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: fcec9480bd3d120c7e8d40d79394317d
main.cpp: In substitution of ‘template<class T, class U, class V> using derived = base<T, U>::derived<V> [with T = T; U = bool; V = char]’:
main.cpp:24:24: required from here
main.cpp:16:39: internal compiler error: Segmentation fault
 using derived = base<T, U>::derived<V>;
                                       ^
Please submit a full bug report, with preprocessed source if appropriate.
See https://bugs.archlinux.org/ for instructions.
[Bug c++/56693] New: Fail to ignore const qualification on top of a function type.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56693

Bug #: 56693
Summary: Fail to ignore const qualification on top of a function type.
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: cassio.n...@gmail.com

The following code:

void f() {}

template <typename T>
void g(const T*) { }

int main() {
  g(f);
}

raises an error with this note:

types 'const T' and 'void()' have incompatible cv-qualifiers

Attempting to instantiate g creates a function that takes a pointer to a const T where T = void(). Since there's no such thing as a const function, this explains the note. However, C++11 8.3.5/6 says:

"The effect of a cv-qualifier-seq in a function declarator is not the same as adding cv-qualification on top of the function type. In the latter case, the cv-qualifiers are ignored."

Hence, the const qualifier should be ignored and the code should compile. (It does compile with clang and Visual Studio.)

For more information see:
http://stackoverflow.com/questions/15578298/can-a-const-t-match-a-pointer-to-free-function
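The rule quoted from 8.3.5/6 can be observed without involving template argument deduction. The sketch below uses std::add_const, which is specified to leave function types unchanged, to show that const on top of a function type has no effect. The alias function_type is a hypothetical name introduced for illustration; the sketch does not exercise the deduction in g(f) that the report is about.

```cpp
#include <type_traits>

// Hypothetical alias for the type of f in the report.
using function_type = void();

// Adding const on top of a function type yields the same function type...
static_assert(
    std::is_same<std::add_const<function_type>::type, function_type>::value,
    "const on top of a function type is ignored");

// ...so the result is never const-qualified.
static_assert(!std::is_const<std::add_const<function_type>::type>::value,
              "there is no such thing as a const function type");

// For contrast, an object type does pick up the qualifier.
static_assert(std::is_const<std::add_const<int>::type>::value,
              "const int is const-qualified");
```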
[Bug c++/55101] New: Invalid implicit conversion in initialization when source type is a template argument type
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55101

Bug #: 55101
Summary: Invalid implicit conversion in initialization when source type is a template argument type
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: cassio.n...@gmail.com

Calling f(b) below implies an implicit call to an explicit conversion operator. gcc 4.8.0 accepts this, but clang 3.1 and 3.2 (trunk) reject it. Notice that gcc complains (as it should) in other similar circumstances.

struct A { };

struct B {
  explicit operator int() const { return 1; }
  explicit operator A() const { return A(); }
};

template <typename T>
void f(T b) {
  int x = b;
}

template <typename T>
void g(T b) {
  A y = b;
}

int main() {
  B b;
  //int x = b; // Error: cannot convert 'B' to 'int' in initialization
  f(b);        // OK for gcc 4.8.0, even though 'int x = b;' occurs inside f
               // Error for clang 3.1 and 3.2.
  //A y = b;   // Error: conversion from 'B' to non-scalar type 'A' requested
  //g(b);      // Error: conversion from 'B' to non-scalar type 'A' requested
}
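The distinction the report relies on is between copy-initialization ('int x = b;'), which must not use explicit conversion operators, and direct-initialization or an explicit cast, which may. A minimal sketch of that distinction using standard type traits follows; it repeats A and B from the report and does not reproduce the template case that gcc 4.8.0 wrongly accepted.

```cpp
#include <type_traits>

struct A { };

struct B {
  explicit operator int() const { return 1; }
  explicit operator A() const { return A(); }
};

// std::is_convertible models copy-initialization, so explicit conversion
// operators are not considered.
static_assert(!std::is_convertible<B, int>::value,
              "'int x = b;' must be rejected");
static_assert(!std::is_convertible<B, A>::value,
              "'A y = b;' must be rejected");

// std::is_constructible models direct-initialization ('int x(b);'), where
// the explicit operator is allowed.
static_assert(std::is_constructible<int, B>::value,
              "'int x(b);' is fine");
```

Inside f, 'int x = b;' is still copy-initialization after T is replaced by B, which is why the call f(b) is expected to be rejected.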
[Bug libstdc++/54722] New: std::is_nothrow_default_constructible<T>::value depends on whether destructor throws or not.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54722

Bug #: 54722
Summary: std::is_nothrow_default_constructible<T>::value depends on whether destructor throws or not.
Classification: Unclassified
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libstdc++
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: cassio.n...@gmail.com

Consider:

#include <iostream>
#include <type_traits>

struct foo {
  foo() noexcept {}
  ~foo() {}
};

int main() {
  std::cout << std::boolalpha;
  std::cout << std::is_nothrow_default_constructible<foo>::value << std::endl;
  return 0;
}

This should output 'true' but it outputs 'false'. Adding a 'noexcept' specification to ~foo() makes the code output the expected result.

Looking at the source, I guess the reason lies in the implementation of this helper class:

template<typename _Tp>
  struct __is_nt_default_constructible_atom
  : public integral_constant<bool, noexcept(_Tp())>
  { };

Indeed, the expression '_Tp()', if executed, creates a temporary of type _Tp whose lifetime ends immediately, at which point ~_Tp is called. Therefore, 'noexcept(_Tp())' is 'true' if and only if neither the constructor nor the destructor throws.

I believe the implementation below fixes the problem (at least it does for the example above):

template<typename _Tp>
  struct __is_nt_default_constructible_atom
  : public std::integral_constant<bool, noexcept(new (std::nothrow) _Tp)>
  { };
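The difference between the two expressions can be checked directly with the noexcept operator, without going through the library trait. This is a minimal sketch, not the libstdc++ implementation. One assumption differs from the report: the destructor is declared noexcept(false) explicitly, because on current compilers a destructor without an exception specification is implicitly noexcept, so the plain '~foo() {}' from the example would no longer reproduce the effect.

```cpp
#include <new>

struct foo {
  foo() noexcept {}
  // Spelled out explicitly; see the note above.
  ~foo() noexcept(false) {}
};

// 'foo()' creates a temporary that is destroyed at the end of the full
// expression, so the potentially throwing destructor is taken into account.
static_assert(!noexcept(foo()),
              "the temporary's destructor makes the expression throwing");

// The new-expression constructs the object but never destroys it, and the
// nothrow allocation function does not throw, so only the constructor's
// exception specification matters.
static_assert(noexcept(new (std::nothrow) foo),
              "only the noexcept constructor is involved");
```

Both checks are compile-time only; the operand of noexcept is never evaluated, so nothing is allocated.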