[Bug rtl-optimization/114452] Functions invoked through compile-time table of function pointers not inlined
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114452

--- Comment #7 from Paweł Bylica ---
(In reply to Martin Jambor from comment #6)
> (In reply to Paweł Bylica from comment #5)
> > (In reply to Martin Jambor from comment #4)
> > > In this testcase all (well, both) functions referenced from the array
> > > are semantically equivalent which is recognized by ICF but making it
> > > be able to pass this information to the inliner would be
> > > non-trivial... and is this the common case worth optimizing for?
> >
> > I reduced the original code to the array of two identical functions.
> > Originally, there weren't identical. I can update the test case if this make
> > more sense.
>
> Probably not. But how many elements does the array have in the original
> code? Perhaps we could speculatively inline them if there are only few.

5. These are boolean functions from RIPEMD160.
[Bug rtl-optimization/114452] Functions invoked through compile-time table of function pointers not inlined
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114452

--- Comment #5 from Paweł Bylica ---
(In reply to Martin Jambor from comment #4)
> In this testcase all (well, both) functions referenced from the array
> are semantically equivalent which is recognized by ICF but making it
> be able to pass this information to the inliner would be
> non-trivial... and is this the common case worth optimizing for?

I reduced the original code to an array of two identical functions. Originally, they weren't identical. I can update the test case if this makes more sense.
[Bug rtl-optimization/114452] Functions invoked through compile-time table of function pointers not inlined
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114452

--- Comment #2 from Paweł Bylica ---
I don't think this is related to lambdas. The following is also not optimized:

using F = int (*)(int) noexcept;

inline int impl(int x) noexcept { return x; }

void test(int z[2]) noexcept {
    static constexpr F fs[]{
        impl,
        impl,
    };

    for (int i = 0; i < 2; ++i) {
        z[i] = fs[i](z[i]);
    }
}

https://godbolt.org/z/9hPbzo4Px
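A possible source-level workaround, sketched here as an assumption and not taken from the bug thread, is to make every table index a compile-time constant so that each call target is a constant expression. The names `fs` (moved to namespace scope), `apply_all` and `apply_table` are hypothetical, and whether a given GCC version inlines the calls in this form has not been verified here.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

using F = int (*)(int) noexcept;

inline int impl(int x) noexcept { return x; }

// Same table as in the comment above, moved to namespace scope (assumption).
inline constexpr F fs[]{impl, impl};
inline constexpr std::size_t fs_size = sizeof(fs) / sizeof(fs[0]);

// Hypothetical helper: the fold expression expands to
//   z[0] = fs[0](z[0]); z[1] = fs[1](z[1]);
// so every index is a constant expression.
template <std::size_t... I>
void apply_all(int* z, std::index_sequence<I...>) noexcept {
    ((z[I] = fs[I](z[I])), ...);
}

void apply_table(int z[2]) noexcept {
    apply_all(z, std::make_index_sequence<fs_size>{});
}

// Small usage example: both table entries are identity functions.
inline void demo() {
    int z[2] = {3, 4};
    apply_table(z);
    assert(z[0] == 3 && z[1] == 4);
}
```

The sketch needs C++17 for the `noexcept` function pointer type, the inline variable and the fold expression.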
[Bug rtl-optimization/114452] New: Functions invoked through compile-time table of function pointers not inlined
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114452

Bug ID: 114452
Summary: Functions invoked through compile-time table of function pointers not inlined
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: chfast at gmail dot com
Target Milestone: ---

In the following example there is a compile-time table of pointers to simple functions. When the table is used in a simple unrolled loop with constant trip count the functions invoked by pointers are not inlined.

using F = int (*)(int) noexcept;

void test(int z[2]) noexcept {
    static constexpr F fs[]{
        [](int x) noexcept { return x; },
        [](int x) noexcept { return x; },
    };

    for (int i = 0; i < 2; ++i) {
        z[i] = fs[i](z[i]);
    }
}

Generated assembly:

test(int*)::{lambda(int)#1}::_FUN(int):
        mov     eax, edi
        ret
test(int*)::{lambda(int)#2}::_FUN(int):
        mov     eax, edi
        ret
test(int*):
        mov     rdx, rdi
        mov     edi, DWORD PTR [rdi]
        call    test(int*)::{lambda(int)#1}::_FUN(int)
        mov     edi, DWORD PTR [rdx+4]
        mov     DWORD PTR [rdx], eax
        call    test(int*)::{lambda(int)#2}::_FUN(int)
        mov     DWORD PTR [rdx+4], eax
        ret

https://godbolt.org/z/fGqPKh81j
[Bug target/113764] New: [X86] Generates lzcnt when bsr is sufficient
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113764

Bug ID: 113764
Summary: [X86] Generates lzcnt when bsr is sufficient
Product: gcc
Version: 13.2.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: chfast at gmail dot com
Target Milestone: ---

When the lzcnt instruction is enabled (-mlzcnt), the compiler generates lzcnt for __builtin_clz() in a context where the bsr instruction is sufficient and better.

unsigned bsr(unsigned x) {
    return __builtin_clz(x) ^ 31;
}

bsr:
        xor     eax, eax
        lzcnt   eax, edi
        xor     eax, 31
        ret

Without -mlzcnt the generated code is optimal.

bsr:
        bsr     eax, edi
        ret

https://godbolt.org/z/5qcTq18nr
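As background for the expression in the report, the following sketch shows why `__builtin_clz(x) ^ 31` is the index of the highest set bit, which is what bsr computes. It assumes a 32-bit `unsigned` and a nonzero argument (`__builtin_clz(0)` is undefined); the names `bsr_xor` and `bsr_sub` are illustrative only.

```cpp
#include <cassert>

// Index of the highest set bit, written as in the report.
// Precondition: x != 0.
inline unsigned bsr_xor(unsigned x) { return __builtin_clz(x) ^ 31; }

// The same value written as a subtraction. For a count c in [0, 31],
// c ^ 31 == 31 - c because 31 has all five low bits set.
inline unsigned bsr_sub(unsigned x) { return 31 - __builtin_clz(x); }

inline void demo() {
    assert(bsr_xor(1u) == 0);            // clz == 31
    assert(bsr_xor(0x80000000u) == 31);  // clz == 0
    assert(bsr_xor(16u) == bsr_sub(16u));
}
```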
[Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173 --- Comment #15 from Paweł Bylica --- For what it's worth, clang's __builtin_addc is implemented in the frontend only, as a pair of __builtin_add_overflow calls. The commit from 11 years ago does not explain why they were added. https://github.com/llvm/llvm-project/commit/54398015bf8cbdc3af54dda74807d6f3c8436164 Producing a chain of ADC instructions out of __builtin_add_overflow patterns was done quite recently (~1 year ago), and this work is not fully finished yet. On the other hand, Go recently added "addc"-like "builtins" in math/bits (https://pkg.go.dev/math/bits), and they are a real pleasure to use in multi-precision arithmetic.
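The "pair of __builtin_add_overflow" pattern mentioned in the comment can be sketched as follows. This is a minimal illustration and not clang's actual lowering; the names `AddResult`, `addc`, `u128` and `add128` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

struct AddResult {
    std::uint64_t sum;
    bool carry;
};

// Add with carry built from two overflow-checked additions.
// At most one of the two additions can overflow, so the carries are OR-ed.
inline AddResult addc(std::uint64_t x, std::uint64_t y, bool carry_in) {
    std::uint64_t s = 0;
    std::uint64_t t = 0;
    const bool c1 = __builtin_add_overflow(x, y, &s);
    const bool c2 =
        __builtin_add_overflow(s, static_cast<std::uint64_t>(carry_in), &t);
    return {t, c1 || c2};
}

// Two-limb (128-bit) addition as a usage example.
struct u128 {
    std::uint64_t lo;
    std::uint64_t hi;
};

inline u128 add128(u128 a, u128 b) {
    const AddResult lo = addc(a.lo, b.lo, false);
    const AddResult hi = addc(a.hi, b.hi, lo.carry);
    return {lo.sum, hi.sum};
}

inline void demo() {
    const u128 r = add128({~std::uint64_t{0}, 0}, {1, 0});
    assert(r.lo == 0 && r.hi == 1);  // carry propagated into the high limb
}
```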
[Bug tree-optimization/110020] [13/14 Regression] SHA2 misscompilation at -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110020 --- Comment #2 from Paweł Bylica --- Yes, you are right. Sorry for taking your time.
[Bug tree-optimization/110020] New: [13/14 Regression] SHA2 misscompilation at -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110020

Bug ID: 110020
Summary: [13/14 Regression] SHA2 misscompilation at -O3
Product: gcc
Version: 13.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: chfast at gmail dot com
Target Milestone: ---

This is a test case reduced from a C implementation of SHA256.

void test(unsigned h[8]) {
    for (unsigned i = 0; i < 2; i++) {
        unsigned w[16];
        for (unsigned j = 0; j < 16; j++) {
            if (i == 0)
                w[j] = 0;

            h[7] = h[6];
            h[6] = h[5];
            h[5] = h[4];
            h[4] = h[3];
            h[3] = h[2];
            h[2] = h[1];
            h[1] = h[0];
            h[0] += w[j];
        }
    }
}

It looks like at -O3 the compiler loses track of w[j] = 0 and uses uninitialized stack storage.

test:
        movl    -36(%rsp), %ecx
        movl    -68(%rsp), %eax
        movq    %rdi, %rdx
        movl    -32(%rsp), %esi
        addl    -72(%rsp), %eax
        addl    -64(%rsp), %eax
        addl    -60(%rsp), %eax
        addl    -56(%rsp), %eax
        addl    -52(%rsp), %eax
        addl    -48(%rsp), %eax
        addl    -44(%rsp), %eax
        addl    -40(%rsp), %eax
        addl    (%rdi), %eax
        addl    %eax, %ecx
        movl    -28(%rsp), %edi
        movl    -24(%rsp), %r8d
        movl    %eax, 28(%rdx)
        addl    %ecx, %esi
        movl    -20(%rsp), %r9d
        movl    -16(%rsp), %r10d
        movl    %ecx, 24(%rdx)
        addl    %esi, %edi
        movl    -12(%rsp), %r11d
        movl    %esi, 20(%rdx)
        addl    %edi, %r8d
        movl    %edi, 16(%rdx)
        addl    %r8d, %r9d
        movl    %r8d, 12(%rdx)
        addl    %r9d, %r10d
        movl    %r9d, 8(%rdx)
        addl    %r10d, %r11d
        movl    %r10d, 4(%rdx)
        movl    %r11d, (%rdx)
        ret

https://godbolt.org/z/ff7E9sd94
[Bug rtl-optimization/109845] New: Addition overflow/carry flag unnecessarily put in a temporary register
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109845

Bug ID: 109845
Summary: Addition overflow/carry flag unnecessarily put in a temporary register
Product: gcc
Version: 13.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: chfast at gmail dot com
Target Milestone: ---

When an addition is followed by an overflow check and the overflow flag is combined with some other condition, the compiler may generate a variant in which the overflow flag is stored in a temporary register.

unsigned s = y + z;
_Bool ov = s < y;
if (x || ov)
    return;

This produces

        add     esi, edx
        setc    al
        test    edi, edi
        jne     .L1
        test    eax, eax
        jne     .L1

while it could be

        add     esi, edx
        jc      .L6
        test    edi, edi
        jne     .L6

There are easy workarounds in the C code that make the assembly optimal:
1. Change the order of checks: if (ov || x)
2. Split the if into two: if (x) return; if (ov) return;

https://godbolt.org/z/rxsrnhPdc
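The two workarounds above can be written out as complete functions. The report only shows a fragment, so the function shapes and the names `stop_original`, `stop_swapped` and `stop_split` are assumptions made for illustration; the sketch only demonstrates that the three forms are logically equivalent, not what code any compiler emits for them.

```cpp
#include <cassert>

// Form from the report: x is tested first, then the carry.
inline bool stop_original(unsigned x, unsigned y, unsigned z) {
    unsigned s = y + z;
    bool ov = s < y;  // true when y + z wrapped around
    if (x || ov) return true;
    return false;
}

// Workaround 1: test the carry first.
inline bool stop_swapped(unsigned x, unsigned y, unsigned z) {
    unsigned s = y + z;
    bool ov = s < y;
    if (ov || x) return true;
    return false;
}

// Workaround 2: split the condition into two separate checks.
inline bool stop_split(unsigned x, unsigned y, unsigned z) {
    unsigned s = y + z;
    bool ov = s < y;
    if (x) return true;
    if (ov) return true;
    return false;
}

inline void demo() {
    assert(!stop_original(0, 1, 2));
    assert(stop_original(0, ~0u, 1));  // y + z wraps to 0
    assert(stop_swapped(0, ~0u, 1) && stop_split(0, ~0u, 1));
}
```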
[Bug rtl-optimization/49054] useless cmp+jmp generated for switch when "default:" is unreachable
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49054

Paweł Bylica changed:

What |Removed |Added
CC   |        |chfast at gmail dot com

--- Comment #7 from Paweł Bylica ---
GCC 13 generates optimal decision tree for the mentioned modified case.

if id == 3:
    i()
elif id <= 3:
    if id == 0:
        f()
    else:  # 1
        g()
else:
    if id == 4:
        j()
    else:  # 23456
        h()

https://godbolt.org/z/9j6b88qKE

So I think this issue is fixed.
[Bug middle-end/109844] New: Unnecessary basic block with single jmp instruction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109844

Bug ID: 109844
Summary: Unnecessary basic block with single jmp instruction
Product: gcc
Version: 13.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: chfast at gmail dot com
Target Milestone: ---

The code

void err(void);

void merge_bb(int y) {
    if (y)
        return err();
}

is compiled to

merge_bb:
        test    edi, edi
        jne     .L4
        ret
.L4:
        jmp     err

but could be

merge_bb:
        test    edi, edi
        jne     err
        ret

https://godbolt.org/z/eafPa4o4T
[Bug target/105354] __builtin_shuffle for alignr generates suboptimal code unless SSSE3 is enabled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105354 Paweł Bylica changed: What|Removed |Added CC||chfast at gmail dot com --- Comment #6 from Paweł Bylica --- Confirmed fixed. https://godbolt.org/z/rEqcMqKaz
[Bug middle-end/104151] [10/11/12/13/14 Regression] x86: excessive code generated for 128-bit byteswap
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104151 Paweł Bylica changed: What|Removed |Added CC||chfast at gmail dot com --- Comment #18 from Paweł Bylica --- Not sure if this helps in any way, but this is a 256-bit variant: https://godbolt.org/z/84fMTs1YP.
[Bug rtl-optimization/109771] New: Unnecessary pblendw for vectorized or
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109771

Bug ID: 109771
Summary: Unnecessary pblendw for vectorized or
Product: gcc
Version: 13.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: chfast at gmail dot com
Target Milestone: ---

I have an example of vectorization of a 4x64-bit struct (a representation of a 256-bit integer). The implementation just uses a for loop with a trip count of 4. This is vectorized in isolation; however, when combined with some non-trivial control flow and additional wrapping functions, the final assembly contains weird pblendw instructions.

pblendw xmm1, xmm3, 240            (GCC 13, x86-64-v2)
movlpd  xmm1, QWORD PTR [rdi+16]   (GCC 13, x86-64-v1)
shufpd  xmm1, xmm3, 2              (GCC 12)

I believe this is some kind of regression in GCC 13 because I have a bigger context where GCC 12 was optimizing it "correctly". However, I lost this information during test reduction.

https://godbolt.org/z/jzK44h3js

cpp:

struct u256 {
    unsigned long w[4];
};

inline u256 or_(u256 x, u256 y) {
    u256 z;
    for (int i = 0; i < 4; ++i)
        z.w[i] = x.w[i] | y.w[i];
    return z;
}

inline void or_to(u256& z, u256 y) { z = or_(z, y); }

void op_or(u256* t) { or_to(t[1], t[0]); }

void test(u256* t) {
    void* tbl[]{&, &};
CLOBBER:
    goto * 0;
OR:
    op_or(t);
    goto * 0;
}

x86-64-v2 asm:

test(u256*):
        xorl    %eax, %eax
        jmp     *%rax
        movdqu  32(%rdi), %xmm3
        movdqu  (%rdi), %xmm1
        movdqu  16(%rdi), %xmm2
        movdqu  48(%rdi), %xmm0
        por     %xmm3, %xmm1
        movups  %xmm1, 32(%rdi)
        movdqa  %xmm2, %xmm1
        pblendw $240, %xmm0, %xmm1
        pblendw $240, %xmm2, %xmm0
        por     %xmm1, %xmm0
        movups  %xmm0, 48(%rdi)
        jmp     *%rax
[Bug target/92140] clang vs gcc optimizing with adc/sbb
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140 --- Comment #32 from Paweł Bylica --- For what it's worth, the original code is compiled the same as in Clang since GCC 10. https://godbolt.org/z/vxorYW815
[Bug tree-optimization/109667] New: [12/13/14 Regression] Unnecessary temporary storage used for 32-byte struct
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109667

Bug ID: 109667
Summary: [12/13/14 Regression] Unnecessary temporary storage used for 32-byte struct
Product: gcc
Version: 12.3.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: chfast at gmail dot com
Target Milestone: ---

Reduced reproducer:

struct i256 {
    long v[4];
};

void assign(struct i256 *v, long z) {
    struct i256 r = {};
    for (int i = 0; i < 1; ++i)
        r.v[i] = z;
    *v = r;
}

https://godbolt.org/z/avM74o3r6

The compiler allocates temporary storage on the stack for `r`:

assign:
        pxor    xmm0, xmm0
        mov     QWORD PTR [rsp-40], rsi
        movups  XMMWORD PTR [rsp-32], xmm0
        movdqa  xmm1, XMMWORD PTR [rsp-40]
        mov     QWORD PTR [rsp-16], 0
        movdqa  xmm2, XMMWORD PTR [rsp-24]
        movups  XMMWORD PTR [rdi], xmm1
        movups  XMMWORD PTR [rdi+16], xmm2
        ret

This is a regression since GCC 12. GCC 11 compiles it nicely to:

assign:
        mov     QWORD PTR [rdi], rsi
        mov     QWORD PTR [rdi+8], 0
        mov     QWORD PTR [rdi+16], 0
        mov     QWORD PTR [rdi+24], 0
        ret
[Bug tree-optimization/106786] [12/13 Regression] SRA regression causes extra instructions sometimes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106786 --- Comment #4 from Paweł Bylica --- Any update on this? I've identified some other similar cases where this hurts performance.
[Bug tree-optimization/107837] New: Missed optimization: Using memcpy to load a struct unnecessary uses stack space
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107837

Bug ID: 107837
Summary: Missed optimization: Using memcpy to load a struct unnecessary uses stack space
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: chfast at gmail dot com
Target Milestone: ---

I have a simple struct wrapping a uint64_t[4] array. When memcpy() is used to load it from a byte buffer and some additional operations are then performed, a temporary object is created on the stack.

struct uint256 {
    unsigned long v[4];
};

void load_bad(uint256* o, const char* src) noexcept {
    uint256 x;
    __builtin_memcpy(&x, src, sizeof(x));
    uint256 y;
    y.v[0] = __builtin_bswap64(x.v[3]);
    y.v[1] = __builtin_bswap64(x.v[2]);
    y.v[2] = __builtin_bswap64(x.v[1]);
    y.v[3] = __builtin_bswap64(x.v[0]);
    *o = y;
}

load_bad(uint256*, char const*):
        movdqu  xmm0, XMMWORD PTR [rsi]
        movdqu  xmm1, XMMWORD PTR [rsi+16]
        movaps  XMMWORD PTR [rsp-40], xmm0
        mov     rdx, QWORD PTR [rsp-32]
        mov     rax, QWORD PTR [rsp-40]
        movaps  XMMWORD PTR [rsp-24], xmm1
        mov     rsi, QWORD PTR [rsp-16]
        mov     rcx, QWORD PTR [rsp-24]
        bswap   rdx
        bswap   rax
        mov     QWORD PTR [rdi+16], rdx
        bswap   rsi
        bswap   rcx
        mov     QWORD PTR [rdi], rsi
        mov     QWORD PTR [rdi+8], rcx
        mov     QWORD PTR [rdi+24], rax
        ret

The workaround is to use reinterpret_cast.

https://godbolt.org/z/WevYch8nv
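For readers who want to experiment with the report, here is a sketch of the whole-struct load next to a per-word variant that copies each 64-bit word with its own memcpy. The per-word variant and the names `load_whole`, `load_word_swapped` and `load_words` are assumptions added for illustration and are not from the report. The sketch only checks that both forms compute the same value; it makes no claim about the code GCC generates for either.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

struct uint256 {
    std::uint64_t v[4];
};

// As in the report: copy all 32 bytes, then reverse word order and bytes.
inline uint256 load_whole(const char* src) noexcept {
    uint256 x;
    std::memcpy(&x, src, sizeof(x));
    uint256 y;
    y.v[0] = __builtin_bswap64(x.v[3]);
    y.v[1] = __builtin_bswap64(x.v[2]);
    y.v[2] = __builtin_bswap64(x.v[1]);
    y.v[3] = __builtin_bswap64(x.v[0]);
    return y;
}

// Hypothetical helper: load one 8-byte word and byte-swap it.
inline std::uint64_t load_word_swapped(const char* src) noexcept {
    std::uint64_t w;
    std::memcpy(&w, src, sizeof(w));
    return __builtin_bswap64(w);
}

// Hypothetical per-word variant of the same load.
inline uint256 load_words(const char* src) noexcept {
    uint256 y;
    y.v[0] = load_word_swapped(src + 24);
    y.v[1] = load_word_swapped(src + 16);
    y.v[2] = load_word_swapped(src + 8);
    y.v[3] = load_word_swapped(src);
    return y;
}

inline bool same(const uint256& a, const uint256& b) noexcept {
    return std::memcmp(&a, &b, sizeof(a)) == 0;
}

inline void demo() {
    char bytes[32];
    for (int i = 0; i < 32; ++i) bytes[i] = static_cast<char>(i + 1);
    assert(same(load_whole(bytes), load_words(bytes)));
}
```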
[Bug c++/96868] C++20 designated initializer erroneous warnings
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96868 --- Comment #6 from Paweł Bylica --- The workaround is MyObj obj = {}; which at least suggests some inconsistency in the compiler internals. For me this warning should be disabled in C++ when designated initializers are used and all other fields are value initialized.
[Bug c++/107434] New: Wrong -Wmissing-field-initializers for C++ designated initializers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107434 Bug ID: 107434 Summary: Wrong -Wmissing-field-initializers for C++ designated initializers Product: gcc Version: 13.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: chfast at gmail dot com Target Milestone: --- If a struct S has a field c of type C having user constructor the "missing-field-initializers" is reported for this field even though designated initializers are used. struct C { int x = 0; }; struct S { C c; bool flag = false; }; S test() { return {.flag = true}; } : In function 'S test()': :15:25: warning: missing initializer for member 'S::c' [-Wmissing-field-initializers] 15 | return {.flag = true}; | ^ https://godbolt.org/z/sxc8PP7Pq
[Bug tree-optimization/106786] New: Regression in cmp+sbb
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106786

Bug ID: 106786
Summary: Regression in cmp+sbb
Product: gcc
Version: 12.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: chfast at gmail dot com
Target Milestone: ---

I noticed a regression when using the builtin for the sbb instruction (__builtin_ia32_sbb_u64).

typedef unsigned long long u64;

struct R {
    u64 value;
    bool carry;
};

inline R subc(u64 x, u64 y, bool carry) noexcept {
    u64 d;
    const u64 carryout = __builtin_ia32_sbb_u64(carry, x, y, &d);
    return {d, carryout != 0};
}

bool bad(u64 x, u64 y) {
    const R z = subc(x, y, false);
    R a = subc(x, y, z.carry);
    return a.carry;
}

https://godbolt.org/z/f41KKe19q

The expected assembly is

        cmp     rdi, rsi
        sbb     rdi, rsi

But GCC 12.2.0 and trunk produce

        cmp     rdi, rsi
        setb    al
        movzx   eax, al
        add     al, -1
        sbb     rdi, rsi

The regression is present in 12.2.0; 11.3.0 optimizes properly.

There are simple changes which bring back the expected optimization:
- change `const R z` to `R z`,
- change `bool carry` to `u64 carry`.

This may be related to the calling convention / ABI because I noticed in one of the tree optimization outputs for 12.2.0 that the `bool carry` is forced to be in memory: `MEM [(struct R *) + 8B]`.

https://godbolt.org/z/7zh7GxraK
[Bug rtl-optimization/96475] direct threaded interpreter with computed gotos generates suboptimal dispatch loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96475 Paweł Bylica changed: What|Removed |Added CC||chfast at gmail dot com --- Comment #25 from Paweł Bylica --- Is this issue resolved then?
[Bug c++/105481] New: ICE: unexpected expression of kind template_parm_index
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105481 Bug ID: 105481 Summary: ICE: unexpected expression of kind template_parm_index Product: gcc Version: 11.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: chfast at gmail dot com Target Milestone: --- I get intx_reduced.cpp: In substitution of ‘template uint f(const T&) [with unsigned int N = N; T = uint; = ]’: intx_reduced.cpp:18:31: required from here intx_reduced.cpp:13:5: internal compiler error: unexpected expression ‘N’ of kind template_parm_index 13 | typename = typename std::enable_if>::value>::type> | ^~~~ for code: #include template struct uint { int words_[N]; }; template uint f(const uint& y) noexcept; template >::value>::type> uint f(const T& y) noexcept; using X = uint<1>; X (*fp)(X const&) noexcept = The reduced version (cvise): template struct integral_constant { static constexpr _Tp value = __v; }; using true_type = integral_constant; using false_type = integral_constant; template using __bool_constant = integral_constant; template struct conditional; template struct __or_; template struct __or_<_B1, _B2> : conditional<_B1::value, _B1, _B2>::type {}; template struct is_const; template struct is_array : false_type {}; template struct is_function : __bool_constant::value> {}; template struct is_const : true_type {}; template , is_array<_To>>::value> struct __is_convertible_helper { template static true_type __test(int); typedef decltype(__test<_To>(0)) type; }; template struct is_convertible : __is_convertible_helper<_From, _To>::type {}; template struct enable_if { typedef _Tp type; }; template struct conditional { typedef _Iffalse type; }; template struct uint; template uint f(const uint &); template < unsigned N, typename T, typename = typename enable_if>::value>::type> uint f(T); using X = uint<1>; X (*fp)(X const &) = f;
[Bug target/100119] New: [x86] Conversion unsigned int -> double produces -0 (-m32 -msse2 -mfpmath=sse)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100119

Bug ID: 100119
Summary: [x86] Conversion unsigned int -> double produces -0 (-m32 -msse2 -mfpmath=sse)
Product: gcc
Version: 10.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: chfast at gmail dot com
Target Milestone: ---

When building for 32-bit x86 with SSE2 floating-point enabled (-m32 -msse2 -mfpmath=sse), the conversion from unsigned int 0 to double produces -0.0 when the floating-point rounding mode is set to FE_DOWNWARD. I used -frounding-math and #pragma STDC FENV_ACCESS ON. This bug is present in neither x87 nor x86_64 builds. The bug seems to be present at least since GCC 5.

#include <cfenv>

#pragma STDC FENV_ACCESS ON

__attribute__((noinline)) double u32_to_f64(unsigned x) {
    return static_cast<double>(x);
}

int main() {
    fesetround(FE_DOWNWARD);
    double d = u32_to_f64(0);
    return __builtin_signbit(d) != 0;  // signbit should be 0
}

The assembly:

u32_to_f64(unsigned int):
        sub     esp, 12
        pxor    xmm0, xmm0
        mov     eax, DWORD PTR [esp+16]
        add     eax, -2147483648
        cvtsi2sd xmm0, eax
        addsd   xmm0, QWORD PTR .LC0
        movsd   QWORD PTR [esp], xmm0
        fld     QWORD PTR [esp]
        add     esp, 12
        ret
main:
        lea     ecx, [esp+4]
        and     esp, -16
        push    DWORD PTR [ecx-4]
        push    ebp
        mov     ebp, esp
        push    ecx
        sub     esp, 32
        push    1024
        call    fesetround
        mov     DWORD PTR [esp], 0
        call    u32_to_f64(unsigned int)
        mov     ecx, DWORD PTR [ebp-4]
        add     esp, 16
        fstp    QWORD PTR [ebp-16]
        movsd   xmm0, QWORD PTR [ebp-16]
        leave
        lea     esp, [ecx-4]
        movmskpd eax, xmm0
        and     eax, 1
        ret
.LC0:
        .long   0
        .long   1105199104

https://godbolt.org/z/rrMWY9jsG
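One reading of the listing, offered here as an interpretation and not as a statement from the report, is that the conversion biases the input by -2^31, converts it as a signed integer, and then adds 2^31 back (the constant at .LC0 has high word 0x41E00000, which encodes 2^31 as a double). For an input of 0 the final addition is (-2^31) + 2^31, an exact zero whose sign under IEEE 754 is negative when rounding toward negative infinity. The sketch below emulates those three steps in portable C++; the names `u32_to_f64_biased` and `zero_is_negative_when_rounding_down` are hypothetical, and `volatile` is used only to keep the arithmetic at run time.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>

// Emulates the three steps visible in the listing:
//   add eax, -2147483648 ; cvtsi2sd ; addsd .LC0
inline double u32_to_f64_biased(unsigned x) {
    // x - 2^31 always fits in a signed 32-bit integer.
    const int biased =
        static_cast<int>(static_cast<long long>(x) - 2147483648LL);
    volatile double converted = static_cast<double>(biased);
    volatile double bias = 2147483648.0;  // 2^31
    volatile double result = converted + bias;
    return result;
}

// Runs the emulated conversion of 0 under FE_DOWNWARD and reports the sign.
inline bool zero_is_negative_when_rounding_down() {
    const int old_mode = std::fegetround();
    std::fesetround(FE_DOWNWARD);
    const double d = u32_to_f64_biased(0);
    std::fesetround(old_mode);
    return std::signbit(d);
}

inline void demo() {
    // In the default rounding mode the emulated conversion yields +0.0.
    assert(!std::signbit(u32_to_f64_biased(0)));
    assert(u32_to_f64_biased(5) == 5.0);
}
```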
[Bug target/99620] Subtract with borrow (SBB) missed optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99620 --- Comment #4 from Paweł Bylica --- Can you give me an introduction to where and how to fix it? I have a longer list of similar issues, so maybe it's a good time to learn how to fix them myself. FYI, clang is unifying both cases by changing `k = l > a.l` into `k = a.l < b.l` and only having SUB_OVERFLOW match for the `k = a.l < b.l` case.
[Bug rtl-optimization/99620] New: Subtract with borrow (SBB) missed optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99620

Bug ID: 99620
Summary: Subtract with borrow (SBB) missed optimization
Product: gcc
Version: 10.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: chfast at gmail dot com
Target Milestone: ---

Hi. For 128-bit subtraction (SUB + SBB), the optimization depends on how the carry bit condition is specified in the code. In the first case below everything works nicely, but in the second there is an unnecessary CMP in the final code. I believe the second carry bit condition is simpler (it does not require unsigned integer wrapping behavior) and does not depend on the first subtraction.

using u64 = unsigned long;

struct u128 {
    u64 l;
    u64 h;
};

auto sub_good(u128 a, u128 b) {
    auto l = a.l - b.l;
    auto k = l > a.l;
    auto h = a.h - b.h - k;
    return u128{l, h};
}

auto sub_bad(u128 a, u128 b) {
    auto l = a.l - b.l;
    auto k = a.l < b.l;
    auto h = a.h - b.h - k;
    return u128{l, h};
}

sub_good(u128, u128):
        mov     rax, rdi
        sub     rax, rdx
        sbb     rsi, rcx
        mov     rdx, rsi
        ret
sub_bad(u128, u128):
        cmp     rdi, rdx
        mov     rax, rdi
        sbb     rsi, rcx
        sub     rax, rdx
        mov     rdx, rsi
        ret

If you think this is easy to fix, I would like to give it a try if I could get some pointers on where to start.
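The two borrow conditions in the report are equivalent for unsigned operands: the wrapped difference `l = a.l - b.l` exceeds `a.l` exactly when `a.l < b.l`. The sketch below restates both functions with `std::uint64_t` (an assumption, the report uses `unsigned long`) and adds a hypothetical `equal` helper so the equivalence can be checked on a few inputs.

```cpp
#include <cassert>
#include <cstdint>

using u64 = std::uint64_t;

struct u128 {
    u64 l;
    u64 h;
};

// Borrow detected from the wrapped result: l > a.l.
inline u128 sub_good(u128 a, u128 b) {
    const u64 l = a.l - b.l;
    const u64 k = l > a.l;
    const u64 h = a.h - b.h - k;
    return u128{l, h};
}

// Borrow detected from the inputs: a.l < b.l.
inline u128 sub_bad(u128 a, u128 b) {
    const u64 l = a.l - b.l;
    const u64 k = a.l < b.l;
    const u64 h = a.h - b.h - k;
    return u128{l, h};
}

// Hypothetical helper for comparing results.
inline bool equal(u128 x, u128 y) { return x.l == y.l && x.h == y.h; }

inline void demo() {
    const u128 a{0, 1};  // 2^64
    const u128 b{1, 0};  // 1
    const u128 r = sub_good(a, b);
    assert(r.l == ~u64{0} && r.h == 0);  // 2^64 - 1
    assert(equal(r, sub_bad(a, b)));
}
```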
[Bug c++/97145] Sanitizer pointer-subtract breaks constexpr functions subtracting pointers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97145 Paweł Bylica changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #6 from Paweł Bylica --- This looks to be fixed in trunk. Thanks.
[Bug middle-end/51839] GCC not generating adc instruction for canonical multi-precision add sequence
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51839 Paweł Bylica changed: What|Removed |Added CC||chfast at gmail dot com --- Comment #1 from Paweł Bylica --- This is fixed in GCC 8.1 (at least for add+adc pair). https://godbolt.org/z/9j4f6r
[Bug libstdc++/97659] Invalid pointer subtraction in vector::insert() (reported by pointer-subtract AddressSanitizer)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97659

--- Comment #4 from Paweł Bylica ---
I'd like to explain some things here (to the best of my knowledge):

1. The "pointer-subtract" check is an ASan extension, not enabled by default. When running with this check enabled in my application I have not detected any issues in std::vector.

2. The "pointer-subtract" check verifies that the operands of a pointer subtraction come from the same memory allocation. Allowed values are all pointers into the memory region plus the "end" pointer one element past the region. Other subtractions are UB in C, to my knowledge.

3. The issue shows up only when "pointer-subtract" is combined with _GLIBCXX_SANITIZE_VECTOR. Moreover, the report looks like a false positive because the subtraction is between the "end" pointer and a pointer from inside the memory region.
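Point 2 can be illustrated with a small sketch. It shows only the language rule described above (subtracting pointers into the same array, including the one-past-the-end pointer) and does not model what the sanitizer or _GLIBCXX_SANITIZE_VECTOR do internally; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Distance from an interior element to the one-past-the-end pointer.
inline std::ptrdiff_t remaining_after(std::size_t index) {
    int buf[4] = {};
    int* inside = buf + index;  // requires index <= 4
    int* end = buf + 4;         // one past the last element: valid to form
                                // and subtract, but not to dereference
    return end - inside;
}

// Subtracting pointers into two unrelated arrays would be undefined
// behavior, so it is intentionally not shown as executable code here.

inline void demo() {
    assert(remaining_after(0) == 4);
    assert(remaining_after(3) == 1);
}
```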
[Bug libstdc++/97659] Invalid pointer subtraction in vector::insert() (reported by pointer-subtract AddressSanitizer)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97659

--- Comment #2 from Paweł Bylica ---
Created attachment 49482
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49482&action=edit
Minimal test case source code

It turned out the problem is related to vector's internal instrumentation, _GLIBCXX_SANITIZE_VECTOR. The minimal test case is the following:

#define _GLIBCXX_SANITIZE_VECTOR 1
#include <vector>

int main() {
    std::vector<char> v;
    v.reserve(1);
    char in[1] = {};
    v.insert(v.end(), in, in + 1);
    return 0;
}

export ASAN_OPTIONS=detect_invalid_pointer_pairs=1
g++ pointer_subtract_bug.cpp -fsanitize=address,pointer-subtract
./a.out
[Bug libstdc++/97659] New: Invalid pointer subtraction in vector::insert() (reported by pointer-subtract AddressSanitizer)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97659 Bug ID: 97659 Summary: Invalid pointer subtraction in vector::insert() (reported by pointer-subtract AddressSanitizer) Product: gcc Version: 10.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: libstdc++ Assignee: unassigned at gcc dot gnu.org Reporter: chfast at gmail dot com Target Milestone: --- When vector::insert(iterator pos, InputIt first, InputIt last) is used the AddressSanitizer additional check "pointer-subtract" reports invalid pointer pair in c++/10/bits/vector.tcc:729. The relevant code is this: template template void vector<_Tp, _Alloc>:: _M_range_insert(iterator __position, _ForwardIterator __first, _ForwardIterator __last, std::forward_iterator_tag) { if (__first != __last) { const size_type __n = std::distance(__first, __last); if (size_type(this->_M_impl._M_end_of_storage - this->_M_impl._M_finish) >= __n) // FAILS HERE! { My core code causing the problem is this: void push(std::vector& b, uint32_t value) { uint8_t storage[sizeof(value)]; __builtin_memcpy(storage, , sizeof(value)); b.insert(b.end(), std::begin(storage), std::end(storage)); } My program is pushing single bytes and uint32_t value using the above helper to a vector, without preallocation. But I was not able to reproduce this issues on a side. I will need more time to reduce my code to a proper regression test. 
gcc-10 (Ubuntu 10.2.0-5ubuntu1~20.04) 10.2.0 export ASAN_OPTIONS=detect_invalid_pointer_pairs=1 = ==3327279==ERROR: AddressSanitizer: invalid-pointer-pair: 0x60206e5c 0x60206e5a #0 0x556e32bfecbf in void std::vector >::_M_range_insert(__gnu_cxx::__normal_iterator > >, unsigned char*, unsigned char*, std::forward_iterator_tag) /usr/include/c++/10/bits/vector.tcc:729 #1 0x556e32bfecbf in void std::vector >::_M_insert_dispatch(__gnu_cxx::__normal_iterator > >, unsigned char*, unsigned char*, std::__false_type) /usr/include/c++/10/bits/stl_vector.h:1665 #2 0x556e32bfecbf in __gnu_cxx::__normal_iterator > > std::vector >::insert(__gnu_cxx::__normal_iterator > >, unsigned char*, unsigned char*) /usr/include/c++/10/bits/stl_vector.h:1383 #3 0x556e32bfecbf in push /home/chfast/Projects/wasmx/fizzy/lib/fizzy/parser_expr.cpp:26 ... 0x60206e5c is located 0 bytes to the right of 12-byte region [0x60206e50,0x60206e5c) allocated by thread T0 here: #0 0x7f0bfa861f17 in operator new(unsigned long) (/lib/x86_64-linux-gnu/libasan.so.6+0xb1f17) #1 0x556e32bff1e1 in __gnu_cxx::new_allocator::allocate(unsigned long, void const*) /usr/include/c++/10/ext/new_allocator.h:115 #2 0x556e32bff1e1 in std::allocator_traits >::allocate(std::allocator&, unsigned long) /usr/include/c++/10/bits/alloc_traits.h:460 #3 0x556e32bff1e1 in std::_Vector_base >::_M_allocate(unsigned long) /usr/include/c++/10/bits/stl_vector.h:346 #4 0x556e32bff1e1 in void std::vector >::_M_range_insert(__gnu_cxx::__normal_iterator > >, unsigned char*, unsigned char*, std::forward_iterator_tag) /usr/include/c++/10/bits/vector.tcc:769 #5 0x556e32bff1e1 in void std::vector >::_M_insert_dispatch(__gnu_cxx::__normal_iterator > >, unsigned char*, unsigned char*, std::__false_type) /usr/include/c++/10/bits/stl_vector.h:1665 #6 0x556e32bff1e1 in __gnu_cxx::__normal_iterator > > std::vector >::insert(__gnu_cxx::__normal_iterator > >, unsigned char*, unsigned char*) /usr/include/c++/10/bits/stl_vector.h:1383 #7 0x556e32bff1e1 in 
push /home/chfast/Projects/wasmx/fizzy/lib/fizzy/parser_expr.cpp:26 ... 0x60206e5a is located 10 bytes inside of 12-byte region [0x60206e50,0x60206e5c) allocated by thread T0 here: #0 0x7f0bfa861f17 in operator new(unsigned long) (/lib/x86_64-linux-gnu/libasan.so.6+0xb1f17) #1 0x556e32bff1e1 in __gnu_cxx::new_allocator::allocate(unsigned long, void const*) /usr/include/c++/10/ext/new_allocator.h:115 #2 0x556e32bff1e1 in std::allocator_traits >::allocate(std::allocator&, unsigned long) /usr/include/c++/10/bits/alloc_traits.h:460 #3 0x556e32bff1e1 in std::_Vector_base >::_M_allocate(unsigned long) /usr/include/c++/10/bits/stl_vector.h:346 #4 0x556e32bff1e1 in void std::vector >::_M_range_insert(__gnu_cxx::__normal_iterator > >, unsigned char*, unsigned char*, std::forward_iterator_tag) /usr/include/c++/10/bits/vector.tcc:769 #5 0x556e32bff1e1 in void std::vector >::_M_insert_dispatch(__gnu_cxx::__normal_iterator > >, unsigned char*, unsigned char*, std::__false_type) /usr/include/c++/10/bits/stl_vector.h:1665 #6 0x556e32bff1e1 in __gnu_cxx::__normal_iterator > > std::vector >::insert(__gnu_cxx::__normal_iterator > >, unsigned char*, unsigned
[Bug libstdc++/97415] New: Invalid pointer comparison in stringbuf::str() (reported by pointer-compare AddressSanitizer)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97415 Bug ID: 97415 Summary: Invalid pointer comparison in stringbuf::str() (reported by pointer-compare AddressSanitizer) Product: gcc Version: 10.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: libstdc++ Assignee: unassigned at gcc dot gnu.org Reporter: chfast at gmail dot com Target Milestone: --- When my application is instrumented with -fsanitize=address,pointer-compare and running under ASAN_OPTIONS=detect_invalid_pointer_pairs=2, I get for following failure in basic_stringbuf::str() ==3879==ERROR: AddressSanitizer: invalid-pointer-pair: 0x7ffcdf273b66 0x #0 0x5597a6c6d786 in std::__cxx11::basic_stringbuf, std::allocator >::str() const /usr/include/c++/10/sstream:184 #1 0x5597a6c6d786 in std::__cxx11::basic_ostringstream, std::allocator >::str() const /usr/include/c++/10/sstream:678 #2 0x5597a6c6d786 in std::basic_ostream >& std::__detail::operator<< , std::__cxx11::basic_string, std::allocator > const&>(std::basic_ostream >&, std::__detail::_Quoted_string, std::allocator > const&, char> const&) /usr/include/c++/10/bits/quoted_string.h:130 #3 0x5597a6c6d786 in std::basic_ostream >& std::filesystem::__cxx11::operator<< >(std::basic_ostream >&, std::filesystem::__cxx11::path const&) /usr/include/c++/10/bits/fs_path.h:441 #4 0x5597a6c6d786 in log_total /home/builder/project/test/spectests/spectests.cpp:675 #5 0x5597a6c48939 in run_tests_from_dir /home/builder/project/test/spectests/spectests.cpp:708 #6 0x5597a6c48939 in main /home/builder/project/test/spectests/spectests.cpp:750 Here is the implementation of basic_stringbuf::str() used for compilation: __string_type str() const { __string_type __ret(_M_string.get_allocator()); if (this->pptr()) { // The current egptr() may not be the actual string end. 
if (this->pptr() > this->egptr()) __ret.assign(this->pbase(), this->pptr()); else __ret.assign(this->pbase(), this->egptr()); } else __ret = _M_string; return __ret; } In the line `if (this->pptr() > this->egptr())`, the `this->egptr()` may be nullptr and therefore AddressSanitizer complains about this comparison. I don't have handy repro code for the issue, but I can try to build one if desired. GCC version: cpp (Debian 10.2.0-15) 10.2.0
[Bug sanitizer/97414] New: AddressSanitizer CHECK failed: detect_stack_use_after_return and detect_invalid_pointer_pairs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97414 Bug ID: 97414 Summary: AddressSanitizer CHECK failed: detect_stack_use_after_return and detect_invalid_pointer_pairs Product: gcc Version: 10.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: sanitizer Assignee: unassigned at gcc dot gnu.org Reporter: chfast at gmail dot com CC: dodji at gcc dot gnu.org, dvyukov at gcc dot gnu.org, jakub at gcc dot gnu.org, kcc at gcc dot gnu.org, marxin at gcc dot gnu.org Target Milestone: --- ==638106==AddressSanitizer CHECK failed: ../../../../src/libsanitizer/asan/asan_thread.cpp:369 "((bottom)) != (0)" (0x0, 0x0) #0 0x7f00888e08b8 (/lib/x86_64-linux-gnu/libasan.so.6+0xb98b8) #1 0x7f00889007ce (/lib/x86_64-linux-gnu/libasan.so.6+0xd97ce) #2 0x7f00888e64f0 (/lib/x86_64-linux-gnu/libasan.so.6+0xbf4f0) #3 0x7f00888dd68b (/lib/x86_64-linux-gnu/libasan.so.6+0xb668b) #4 0x7f00888e0269 in __sanitizer_ptr_sub (/lib/x86_64-linux-gnu/libasan.so.6+0xb9269) #5 0x55e8cd6641f2 in pointer_diff(int const*, int const*) /home/chfast/Projects/compiler_bugs/sanitizers/pointer_subtract_crash/pointer_subtract_crash.cpp:2 #6 0x55e8cd664248 in main /home/chfast/Projects/compiler_bugs/sanitizers/pointer_subtract_crash/pointer_subtract_crash.cpp:10 #7 0x7f008865c0b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2) #8 0x55e8cd66410d in _start (/home/chfast/Projects/compiler_bugs/sanitizers/pointer_subtract_crash/a.out+0x110d) When running the program [[gnu::noinline]] auto pointer_diff(const int *begin, const int *end) { return end - begin; } int main() { constexpr auto size = (2048 / sizeof(int)) + 1; auto buf = new int[size]; auto end = buf + size; pointer_diff(end, buf); delete[] buf; return 0; } compiled with gcc -fsanitize=address,pointer-subtract -g pointer_subtract_crash.cpp To reproduce the crash, both runtime options must be enabled: ASAN_OPTIONS=detect_stack_use_after_return=1:detect_invalid_pointer_pairs=1 This bug was previously reported in LLVM's AddressSanitizer 
project https://bugs.llvm.org/show_bug.cgi?id=47626, but pointer-subtract is not supported there.