[Bug rtl-optimization/114452] Functions invoked through compile-time table of function pointers not inlined

2024-04-11 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114452

--- Comment #7 from Paweł Bylica  ---
(In reply to Martin Jambor from comment #6)
> (In reply to Paweł Bylica from comment #5)
> > (In reply to Martin Jambor from comment #4)
> > > In this testcase all (well, both) functions referenced from the array
> > > are semantically equivalent which is recognized by ICF but making it
> > > be able to pass this information to the inliner would be
> > > non-trivial... and is this the common case worth optimizing for?
> > 
> > I reduced the original code to the array of two identical functions.
> > Originally, there weren't identical. I can update the test case if this make
> > more sense.
> 
> Probably not.  But how many elements does the array have in the original
> code?  Perhaps we could speculatively inline them if there are only few.

5. These are boolean functions from RIPEMD160.
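
For context, RIPEMD-160 uses five bitwise boolean functions of three 32-bit words, one per round. The sketch below shows what such a five-entry compile-time table might look like. It is not the original code behind this report; the names u32, BoolFn, f1..f5, round_fns and apply_round are hypothetical, and C++17 is assumed (noexcept as part of the function pointer type).

```cpp
#include <cassert>
#include <cstdint>

using u32 = std::uint32_t;
using BoolFn = u32 (*)(u32, u32, u32) noexcept;

// The five RIPEMD-160 round functions (bitwise, on 32-bit words).
inline u32 f1(u32 x, u32 y, u32 z) noexcept { return x ^ y ^ z; }
inline u32 f2(u32 x, u32 y, u32 z) noexcept { return (x & y) | (~x & z); }
inline u32 f3(u32 x, u32 y, u32 z) noexcept { return (x | ~y) ^ z; }
inline u32 f4(u32 x, u32 y, u32 z) noexcept { return (x & z) | (y & ~z); }
inline u32 f5(u32 x, u32 y, u32 z) noexcept { return x ^ (y | ~z); }

// Compile-time table with one entry per round (hypothetical layout).
constexpr BoolFn round_fns[]{f1, f2, f3, f4, f5};

// Indirect call through the table; `round` must be in [0, 5).
inline u32 apply_round(int round, u32 x, u32 y, u32 z) noexcept {
    assert(round >= 0 && round < 5);
    return round_fns[round](x, y, z);
}
```

With a table like this, a loop over the rounds with a constant trip count has the same shape as the reduced test case: each call target is known at compile time but is reached through a pointer.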

[Bug rtl-optimization/114452] Functions invoked through compile-time table of function pointers not inlined

2024-04-11 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114452

--- Comment #5 from Paweł Bylica  ---
(In reply to Martin Jambor from comment #4)
> In this testcase all (well, both) functions referenced from the array
> are semantically equivalent which is recognized by ICF but making it
> be able to pass this information to the inliner would be
> non-trivial... and is this the common case worth optimizing for?

I reduced the original code to an array of two identical functions.
Originally, they weren't identical. I can update the test case if this makes
more sense.

[Bug rtl-optimization/114452] Functions invoked through compile-time table of function pointers not inlined

2024-03-25 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114452

--- Comment #2 from Paweł Bylica  ---
I don't think this is related to lambdas. The following is also not optimized:


using F = int (*)(int) noexcept;

inline int impl(int x) noexcept { return x; }

void test(int z[2]) noexcept {
    static constexpr F fs[]{
        impl,
        impl,
    };

    for (int i = 0; i < 2; ++i) {
        z[i] = fs[i](z[i]);
    }
}

https://godbolt.org/z/9hPbzo4Px
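
One possible way to sidestep the indirect calls, sketched here and not taken from the report, is to expand the loop at compile time so that every table index is a constant expression. The names dbl, neg, apply_all and test_unrolled are hypothetical, C++17 is assumed, and whether a given GCC version then inlines the callees has not been verified here.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

using F = int (*)(int) noexcept;

inline int dbl(int x) noexcept { return 2 * x; }
inline int neg(int x) noexcept { return -x; }

// Compile-time table of function pointers, as in the report.
constexpr F fs[]{dbl, neg};

// Expands to one statement per index: z[0] = fs[0](z[0]), z[1] = fs[1](z[1]), ...
// Every fs[I] is a constant expression, so each call target is statically known.
template <std::size_t... I>
inline void apply_all(int* z, std::index_sequence<I...>) noexcept {
    ((z[I] = fs[I](z[I])), ...);
}

void test_unrolled(int z[2]) noexcept {
    apply_all(z, std::make_index_sequence<2>{});
}
```

The trade-off is that the trip count must be a compile-time constant and the loop body is duplicated in the source expansion, which is what the optimizer was expected to do on its own in the original test case.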

[Bug rtl-optimization/114452] New: Functions invoked through compile-time table of function pointers not inlined

2024-03-25 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114452

Bug ID: 114452
   Summary: Functions invoked through compile-time table of
            function pointers not inlined
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: chfast at gmail dot com
  Target Milestone: ---

In the following example there is a compile-time table of pointers to simple
functions. When the table is used in a simple unrolled loop with a constant trip
count, the functions invoked through the pointers are not inlined.

using F = int (*)(int) noexcept;

void test(int z[2]) noexcept {
    static constexpr F fs[]{
        [](int x) noexcept { return x; },
        [](int x) noexcept { return x; },
    };

    for (int i = 0; i < 2; ++i) {
        z[i] = fs[i](z[i]);
    }
}

Generated assembly:

test(int*)::{lambda(int)#1}::_FUN(int):
        mov     eax, edi
        ret
test(int*)::{lambda(int)#2}::_FUN(int):
        mov     eax, edi
        ret
test(int*):
        mov     rdx, rdi
        mov     edi, DWORD PTR [rdi]
        call    test(int*)::{lambda(int)#1}::_FUN(int)
        mov     edi, DWORD PTR [rdx+4]
        mov     DWORD PTR [rdx], eax
        call    test(int*)::{lambda(int)#2}::_FUN(int)
        mov     DWORD PTR [rdx+4], eax
        ret


https://godbolt.org/z/fGqPKh81j

[Bug target/113764] New: [X86] Generates lzcnt when bsr is sufficient

2024-02-05 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113764

Bug ID: 113764
   Summary: [X86] Generates lzcnt when bsr is sufficient
   Product: gcc
   Version: 13.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: chfast at gmail dot com
  Target Milestone: ---

When the lzcnt instruction is enabled (-mlzcnt), the compiler generates lzcnt for
__builtin_clz() in a context where the bsr instruction is sufficient and
better.

unsigned bsr(unsigned x)
{
    return __builtin_clz(x) ^ 31;
}

bsr:
  xor eax, eax
  lzcnt eax, edi
  xor eax, 31
  ret


Without -mlzcnt the generated code is optimal.

bsr:
  bsr eax, edi
  ret


https://godbolt.org/z/5qcTq18nr
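
Why bsr is sufficient: for a 32-bit unsigned value, __builtin_clz returns a count c in [0, 31], and for any c in that range c ^ 31 equals 31 - c, which is the index of the highest set bit (what bsr computes). A small sketch of the identity, assuming a 32-bit unsigned; the function names are hypothetical.

```cpp
#include <cassert>

// Index of the highest set bit, written as in the report.
// Precondition: x != 0 (the result of __builtin_clz(0) is undefined).
inline unsigned highest_set_bit_xor(unsigned x) {
    assert(x != 0);
    return static_cast<unsigned>(__builtin_clz(x)) ^ 31u;
}

// The same value written as a subtraction: 31 is all ones in the low
// five bits, so XOR with 31 and subtraction from 31 agree on [0, 31].
inline unsigned highest_set_bit_sub(unsigned x) {
    assert(x != 0);
    return 31u - static_cast<unsigned>(__builtin_clz(x));
}
```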

[Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)

2023-06-05 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

--- Comment #15 from Paweł Bylica  ---
For what it's worth, clang's __builtin_addc is implemented in the frontend only, as
a pair of __builtin_add_overflow calls. The commit from 11 years ago does not explain
why they were added.
https://github.com/llvm/llvm-project/commit/54398015bf8cbdc3af54dda74807d6f3c8436164

Producing a chain of ADC instructions out of __builtin_add_overflow patterns
has been done quite recently (~1 year ago). And this work is not fully finished
yet.

On the other hand, Go recently added "addc"-like "builtins" in
https://pkg.go.dev/math/bits. They are a real pleasure to use in
multi-precision arithmetic.
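
To make the "pair of __builtin_add_overflow" pattern concrete, here is a minimal sketch of an add-with-carry helper and a two-limb addition built on it. This is an illustration, not clang's actual implementation; the names AddcResult, addc and add128 are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

struct AddcResult {
    std::uint64_t sum;
    bool carry;
};

// Add-with-carry expressed as two __builtin_add_overflow calls.
inline AddcResult addc(std::uint64_t x, std::uint64_t y, bool carry_in) noexcept {
    std::uint64_t partial = 0;
    std::uint64_t sum = 0;
    const bool c1 = __builtin_add_overflow(x, y, &partial);
    const bool c2 =
        __builtin_add_overflow(partial, static_cast<std::uint64_t>(carry_in), &sum);
    // At most one of the two additions can overflow, so OR is enough.
    return {sum, c1 || c2};
}

// Two-limb (128-bit) addition, least significant limb first.
inline void add128(const std::uint64_t a[2], const std::uint64_t b[2],
                   std::uint64_t out[2]) noexcept {
    const AddcResult lo = addc(a[0], b[0], false);
    const AddcResult hi = addc(a[1], b[1], lo.carry);
    out[0] = lo.sum;
    out[1] = hi.sum;
}
```

Whether a compiler turns the chain into ADD followed by ADC depends on its pattern matching, which is the subject of this bug.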

[Bug tree-optimization/110020] [13/14 Regression] SHA2 misscompilation at -O3

2023-05-29 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110020

--- Comment #2 from Paweł Bylica  ---
Yes, you are right. Sorry for taking your time.

[Bug tree-optimization/110020] New: [13/14 Regression] SHA2 misscompilation at -O3

2023-05-29 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110020

Bug ID: 110020
   Summary: [13/14 Regression] SHA2 misscompilation at -O3
   Product: gcc
   Version: 13.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: chfast at gmail dot com
  Target Milestone: ---

This is a test case reduced from a C implementation of SHA256.

void test(unsigned h[8]) {
    for (unsigned i = 0; i < 2; i++) {

        unsigned w[16];
        for (unsigned j = 0; j < 16; j++) {
            if (i == 0)
                w[j] = 0;

            h[7] = h[6];
            h[6] = h[5];
            h[5] = h[4];
            h[4] = h[3];
            h[3] = h[2];
            h[2] = h[1];
            h[1] = h[0];
            h[0] += w[j];
        }
    }
}

It looks like at -O3 the compiler loses track of w[j] = 0 and uses uninitialized
stack storage.

test:
        movl    -36(%rsp), %ecx
        movl    -68(%rsp), %eax
        movq    %rdi, %rdx
        movl    -32(%rsp), %esi
        addl    -72(%rsp), %eax
        addl    -64(%rsp), %eax
        addl    -60(%rsp), %eax
        addl    -56(%rsp), %eax
        addl    -52(%rsp), %eax
        addl    -48(%rsp), %eax
        addl    -44(%rsp), %eax
        addl    -40(%rsp), %eax
        addl    (%rdi), %eax
        addl    %eax, %ecx
        movl    -28(%rsp), %edi
        movl    -24(%rsp), %r8d
        movl    %eax, 28(%rdx)
        addl    %ecx, %esi
        movl    -20(%rsp), %r9d
        movl    -16(%rsp), %r10d
        movl    %ecx, 24(%rdx)
        addl    %esi, %edi
        movl    -12(%rsp), %r11d
        movl    %esi, 20(%rdx)
        addl    %edi, %r8d
        movl    %edi, 16(%rdx)
        addl    %r8d, %r9d
        movl    %r8d, 12(%rdx)
        addl    %r9d, %r10d
        movl    %r9d, 8(%rdx)
        addl    %r10d, %r11d
        movl    %r10d, 4(%rdx)
        movl    %r11d, (%rdx)
        ret


https://godbolt.org/z/ff7E9sd94
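
Note that in the reduced test case `w` is declared inside the outer loop, so on the second iteration (i == 1) it is a new object whose elements are never written before `h[0] += w[j]` reads them. For comparison, here is a minimal sketch of a variant in which every read is defined, assuming the zeroed buffer is meant to persist across both outer iterations (test_defined is a hypothetical name, not part of the report).

```cpp
#include <cassert>

// Variant of the reduced test case with `w` hoisted out of the outer loop
// and zero-initialized, so that w[j] always holds a defined value.
void test_defined(unsigned h[8]) {
    unsigned w[16] = {};
    for (unsigned i = 0; i < 2; i++) {
        for (unsigned j = 0; j < 16; j++) {
            h[7] = h[6];
            h[6] = h[5];
            h[5] = h[4];
            h[4] = h[3];
            h[3] = h[2];
            h[2] = h[1];
            h[1] = h[0];
            h[0] += w[j];  // adds zero; h[0] is unchanged
        }
    }
}
```

With all of `w` zero, each inner iteration only shifts the state by one slot, so after 32 iterations every element of `h` equals the initial h[0].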

[Bug rtl-optimization/109845] New: Addition overflow/carry flag unnecessarily put in a temporary register

2023-05-13 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109845

Bug ID: 109845
   Summary: Addition overflow/carry flag unnecessarily put in a
            temporary register
   Product: gcc
   Version: 13.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: chfast at gmail dot com
  Target Milestone: ---

When an addition is followed by an overflow check and the overflow flag is
combined with some other condition, the compiler may generate a variant in which
the overflow flag is materialized in a temporary register.

unsigned s = y + z;
_Bool ov = s < y;

if (x || ov)
    return;

This produces

        add     esi, edx
        setc    al
        test    edi, edi
        jne     .L1
        test    eax, eax
        jne     .L1

while it could be

        add     esi, edx
        jc      .L6
        test    edi, edi
        jne     .L6


There are easy workarounds in the C code that make the assembly optimal:

1. Change the order of checks
if (ov || x)

2. Split the if into two
if (x)
    return;
if (ov)
    return;

https://godbolt.org/z/rxsrnhPdc
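
The two workarounds written out as complete functions, as a sketch. The fragments in the report return void; here the functions return bool so the early-exit decision can be observed, which is an assumption made for illustration, as are the names bail_reordered and bail_split.

```cpp
#include <cassert>

// Workaround 1: test the overflow condition before x.
bool bail_reordered(unsigned x, unsigned y, unsigned z) {
    unsigned s = y + z;
    bool ov = s < y;  // unsigned wrap-around: true iff the addition carried
    if (ov || x)
        return true;
    return false;
}

// Workaround 2: split the combined condition into two separate checks.
bool bail_split(unsigned x, unsigned y, unsigned z) {
    unsigned s = y + z;
    bool ov = s < y;
    if (x)
        return true;
    if (ov)
        return true;
    return false;
}
```

Both forms make the same decision as `if (x || ov)`; only the order in which the conditions are evaluated differs.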

[Bug rtl-optimization/49054] useless cmp+jmp generated for switch when "default:" is unreachable

2023-05-13 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49054

Paweł Bylica  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |chfast at gmail dot com

--- Comment #7 from Paweł Bylica  ---
GCC 13 generates an optimal decision tree for the mentioned modified case.

if id == 3:
    i()
elif id <= 3:
    if id == 0:
        f()
    else:  # 1
        g()
else:
    if id == 4:
        j()
    else:  # 23456
        h()

https://godbolt.org/z/9j6b88qKE

So I think this issue is fixed.

[Bug middle-end/109844] New: Unnecessary basic block with single jmp instruction

2023-05-13 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109844

Bug ID: 109844
   Summary: Unnecessary basic block with single jmp instruction
   Product: gcc
   Version: 13.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: chfast at gmail dot com
  Target Milestone: ---

The code

void err(void);

void merge_bb(int y) {
    if (y)
        return err();
}

is compiled to

merge_bb:
        test    edi, edi
        jne     .L4
        ret
.L4:
        jmp     err


but could be

merge_bb:
        test    edi, edi
        jne     err
        ret

https://godbolt.org/z/eafPa4o4T

[Bug target/105354] __builtin_shuffle for alignr generates suboptimal code unless SSSE3 is enabled

2023-05-11 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105354

Paweł Bylica  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |chfast at gmail dot com

--- Comment #6 from Paweł Bylica  ---
Confirmed fixed. https://godbolt.org/z/rEqcMqKaz

[Bug middle-end/104151] [10/11/12/13/14 Regression] x86: excessive code generated for 128-bit byteswap

2023-05-11 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104151

Paweł Bylica  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |chfast at gmail dot com

--- Comment #18 from Paweł Bylica  ---
Not sure if this helps in any way, but this is a 256-bit variant:
https://godbolt.org/z/84fMTs1YP.

[Bug rtl-optimization/109771] New: Unnecessary pblendw for vectorized or

2023-05-08 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109771

Bug ID: 109771
   Summary: Unnecessary pblendw for vectorized or
   Product: gcc
   Version: 13.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: chfast at gmail dot com
  Target Milestone: ---

I have an example of vectorization of a 4x64-bit struct (a representation of a
256-bit integer). The implementation just uses a for loop with a trip count of 4.

This is vectorized in isolation. However, when combined with some non-trivial
control flow and additional wrapping functions, the final assembly contains
weird pblendw instructions.

pblendw xmm1, xmm3, 240            (GCC 13, x86-64-v2)
movlpd  xmm1, QWORD PTR [rdi+16]   (GCC 13, x86-64-v1)
shufpd  xmm1, xmm3, 2              (GCC 12)

I believe this is some kind of regression in GCC 13 because I have a bigger
context where GCC 12 was optimizing it "correctly". However, I lost this
information during test reduction.

https://godbolt.org/z/jzK44h3js

cpp:

struct u256 {
    unsigned long w[4];
};

inline u256 or_(u256 x, u256 y) {
    u256 z;
    for (int i = 0; i < 4; ++i)
        z.w[i] = x.w[i] | y.w[i];
    return z;
}

inline void or_to(u256& z, u256 y) { z = or_(z, y); }

void op_or(u256* t) { or_to(t[1], t[0]); }

void test(u256* t) {
    void* tbl[]{&&CLOBBER, &&OR};
CLOBBER:
    goto * 0;
OR:
    op_or(t);
    goto * 0;
}


x86-64-v2 asm:

test(u256*):
        xorl    %eax, %eax
        jmp     *%rax
        movdqu  32(%rdi), %xmm3
        movdqu  (%rdi), %xmm1
        movdqu  16(%rdi), %xmm2
        movdqu  48(%rdi), %xmm0
        por     %xmm3, %xmm1
        movups  %xmm1, 32(%rdi)
        movdqa  %xmm2, %xmm1
        pblendw $240, %xmm0, %xmm1
        pblendw $240, %xmm2, %xmm0
        por     %xmm1, %xmm0
        movups  %xmm0, 48(%rdi)
        jmp     *%rax

[Bug target/92140] clang vs gcc optimizing with adc/sbb

2023-05-07 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140

--- Comment #32 from Paweł Bylica  ---
For what it's worth, the original code is compiled the same as in Clang since
GCC 10. https://godbolt.org/z/vxorYW815

[Bug tree-optimization/109667] New: [12/13/14 Regression] Unnecessary temporary storage used for 32-byte struct

2023-04-28 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109667

Bug ID: 109667
   Summary: [12/13/14 Regression] Unnecessary temporary storage
            used for 32-byte struct
   Product: gcc
   Version: 12.3.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: chfast at gmail dot com
  Target Milestone: ---

Reduced reproducer:

struct i256 {
    long v[4];
};
void assign(struct i256 *v, long z) {
    struct i256 r = {};
    for (int i = 0; i < 1; ++i)
        r.v[i] = z;
    *v = r;
}

https://godbolt.org/z/avM74o3r6

The compiler allocates temporary storage on stack for `r`:

assign:
        pxor    xmm0, xmm0
        mov     QWORD PTR [rsp-40], rsi
        movups  XMMWORD PTR [rsp-32], xmm0
        movdqa  xmm1, XMMWORD PTR [rsp-40]
        mov     QWORD PTR [rsp-16], 0
        movdqa  xmm2, XMMWORD PTR [rsp-24]
        movups  XMMWORD PTR [rdi], xmm1
        movups  XMMWORD PTR [rdi+16], xmm2
        ret

This is a regression since GCC 12. GCC 11 compiles it nicely to:

assign:
        mov     QWORD PTR [rdi], rsi
        mov     QWORD PTR [rdi+8], 0
        mov     QWORD PTR [rdi+16], 0
        mov     QWORD PTR [rdi+24], 0
        ret

[Bug tree-optimization/106786] [12/13 Regression] SRA regression causes extra instructions sometimes

2022-11-29 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106786

--- Comment #4 from Paweł Bylica  ---
Any update on this? I've identified some other similar cases where this is hurting
performance.

[Bug tree-optimization/107837] New: Missed optimization: Using memcpy to load a struct unnecessary uses stack space

2022-11-23 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107837

Bug ID: 107837
   Summary: Missed optimization: Using memcpy to load a struct
            unnecessary uses stack space
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: chfast at gmail dot com
  Target Milestone: ---

I have a simple struct with an array uint64_t[4]. When using memcpy() to load it
from a storage of bytes and then performing some additional operations, a temporary
object on the stack is created.


struct uint256
{
    unsigned long v[4];
};

void load_bad(uint256* o, const char* src) noexcept
{
    uint256 x;
    __builtin_memcpy(&x, src, sizeof(x));
    uint256 y;
    y.v[0] = __builtin_bswap64(x.v[3]);
    y.v[1] = __builtin_bswap64(x.v[2]);
    y.v[2] = __builtin_bswap64(x.v[1]);
    y.v[3] = __builtin_bswap64(x.v[0]);
    *o = y;
}


load_bad(uint256*, char const*):
        movdqu  xmm0, XMMWORD PTR [rsi]
        movdqu  xmm1, XMMWORD PTR [rsi+16]
        movaps  XMMWORD PTR [rsp-40], xmm0
        mov     rdx, QWORD PTR [rsp-32]
        mov     rax, QWORD PTR [rsp-40]
        movaps  XMMWORD PTR [rsp-24], xmm1
        mov     rsi, QWORD PTR [rsp-16]
        mov     rcx, QWORD PTR [rsp-24]
        bswap   rdx
        bswap   rax
        mov     QWORD PTR [rdi+16], rdx
        bswap   rsi
        bswap   rcx
        mov     QWORD PTR [rdi], rsi
        mov     QWORD PTR [rdi+8], rcx
        mov     QWORD PTR [rdi+24], rax
        ret


The workaround is to use reinterpret_cast.

https://godbolt.org/z/WevYch8nv
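
A sketch of what the reinterpret_cast workaround might look like; the exact code is behind the link above, so this is an assumption and load_cast is a hypothetical name. Note the caveat: unlike memcpy, this form is only well defined when src really points to a live, suitably aligned uint256 object. For an arbitrary byte buffer it violates the aliasing and alignment rules.

```cpp
#include <cassert>

struct uint256
{
    unsigned long v[4];
};

// ASSUMPTION: src points to a live, suitably aligned uint256 object.
// Otherwise dereferencing the cast pointer is undefined behavior.
void load_cast(uint256* o, const char* src) noexcept
{
    const uint256& x = *reinterpret_cast<const uint256*>(src);
    uint256 y;
    y.v[0] = __builtin_bswap64(x.v[3]);
    y.v[1] = __builtin_bswap64(x.v[2]);
    y.v[2] = __builtin_bswap64(x.v[1]);
    y.v[3] = __builtin_bswap64(x.v[0]);
    *o = y;
}
```

This is why the memcpy form is the preferred way to express the load, and why it is worth having the compiler optimize it well.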

[Bug c++/96868] C++20 designated initializer erroneous warnings

2022-10-29 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96868

--- Comment #6 from Paweł Bylica  ---
The workaround is 

MyObj obj = {};

which at least suggests some inconsistency in the compiler internals.

In my view, this warning should be disabled in C++ when designated initializers
are used and all other fields are value-initialized.

[Bug c++/107434] New: Wrong -Wmissing-field-initializers for C++ designated initializers

2022-10-27 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107434

Bug ID: 107434
   Summary: Wrong -Wmissing-field-initializers for C++ designated
            initializers
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: chfast at gmail dot com
  Target Milestone: ---

If a struct S has a field c of type C that has a user constructor, the
"missing-field-initializers" warning is reported for this field even though
designated initializers are used.

struct C
{
    int x = 0;
};

struct S
{
    C c;
    bool flag = false;

};

S test()
{
    return {.flag = true};
}

: In function 'S test()':
:15:25: warning: missing initializer for member 'S::c'
[-Wmissing-field-initializers]
   15 | return {.flag = true};
  | ^

https://godbolt.org/z/sxc8PP7Pq
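
A possible way to avoid relying on the omitted member, sketched under the assumption that naming every member explicitly keeps the warning quiet (this has not been verified against the affected GCC versions). The name test_explicit is hypothetical; designated initializers are C++20, accepted by GCC as an extension in earlier modes.

```cpp
#include <cassert>

struct C
{
    int x = 0;
};

struct S
{
    C c;
    bool flag = false;
};

// Every member is named, in declaration order; `c` is initialized from an
// empty list, so its default member initializer (x = 0) is used.
S test_explicit()
{
    return {.c = {}, .flag = true};
}
```

The resulting object is the same as with `{.flag = true}`, which is the point of the report: the shorter form is equally well defined.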

[Bug tree-optimization/106786] New: Regression in cmp+sbb

2022-08-31 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106786

Bug ID: 106786
   Summary: Regression in cmp+sbb
   Product: gcc
   Version: 12.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: chfast at gmail dot com
  Target Milestone: ---

I noticed a regression when using the builtin for sbb instruction
(__builtin_ia32_sbb_u64).

typedef unsigned long long u64;

struct R {
    u64 value;
    bool carry;
};

inline R subc(u64 x, u64 y, bool carry) noexcept {
    u64 d;
    const u64 carryout = __builtin_ia32_sbb_u64(carry, x, y, &d);
    return {d, carryout != 0};
}

bool bad(u64 x, u64 y) {
    const R z = subc(x, y, false);
    R a = subc(x, y, z.carry);
    return a.carry;
}

https://godbolt.org/z/f41KKe19q

The expected assembly is
        cmp     rdi, rsi
        sbb     rdi, rsi

But GCC 12.2.0 and trunk produce
        cmp     rdi, rsi
        setb    al
        movzx   eax, al
        add     al, -1
        sbb     rdi, rsi

The regression is present in 12.2.0; 11.3.0 optimizes properly.

There are simple changes which will bring back the expected optimization:
- change `const R z` to `R z`,
- change `bool carry` to `u64 carry`.

This may be related to calling convention / ABI because I noticed in one of the
tree optimization outputs for 12.2.0 that the `bool carry` is forced to be in
memory: `MEM  [(struct R *) + 8B]`.

https://godbolt.org/z/7zh7GxraK

[Bug rtl-optimization/96475] direct threaded interpreter with computed gotos generates suboptimal dispatch loop

2022-08-22 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96475

Paweł Bylica  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |chfast at gmail dot com

--- Comment #25 from Paweł Bylica  ---
Is this issue resolved then?

[Bug c++/105481] New: ICE: unexpected expression of kind template_parm_index

2022-05-04 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105481

Bug ID: 105481
   Summary: ICE: unexpected expression of kind template_parm_index
   Product: gcc
   Version: 11.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: chfast at gmail dot com
  Target Milestone: ---

I get 

intx_reduced.cpp: In substitution of ‘template
uint f(const T&) [with unsigned int N = N; T = uint;
 = ]’:
intx_reduced.cpp:18:31:   required from here
intx_reduced.cpp:13:5: internal compiler error: unexpected expression ‘N’ of
kind template_parm_index
   13 | typename = typename std::enable_if>::value>::type>
  | ^~~~


for code:

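#include <type_traits>

template <unsigned N>
struct uint
{
   int words_[N];
};

template <unsigned N>
uint<N> f(const uint<N>& y) noexcept;

template <unsigned N, typename T,
    typename = typename std::enable_if<std::is_convertible<T, uint<N>>::value>::type>
uint<N> f(const T& y) noexcept;

using X = uint<1>;

X (*fp)(X const&) noexcept = &f;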


The reduced version (cvise):

template  struct integral_constant {
  static constexpr _Tp value = __v;
};
using true_type = integral_constant;
using false_type = integral_constant;
template  using __bool_constant = integral_constant;
template  struct conditional;
template  struct __or_;
template 
struct __or_<_B1, _B2> : conditional<_B1::value, _B1, _B2>::type {};
template  struct is_const;
template  struct is_array : false_type {};
template 
struct is_function : __bool_constant::value> {};
template  struct is_const : true_type {};
template , is_array<_To>>::value>
struct __is_convertible_helper {
  template  static true_type __test(int);
  typedef decltype(__test<_To>(0)) type;
};
template 
struct is_convertible : __is_convertible_helper<_From, _To>::type {};
template  struct enable_if { typedef _Tp type; };
template 
struct conditional {
  typedef _Iffalse type;
};
template  struct uint;
template  uint f(const uint &);
template <
unsigned N, typename T,
typename = typename enable_if>::value>::type>
uint f(T);
using X = uint<1>;
X (*fp)(X const &) = f;

[Bug target/100119] New: [x86] Conversion unsigned int -> double produces -0 (-m32 -msse2 -mfpmath=sse)

2021-04-16 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100119

Bug ID: 100119
   Summary: [x86] Conversion unsigned int -> double produces -0
            (-m32 -msse2 -mfpmath=sse)
   Product: gcc
   Version: 10.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: chfast at gmail dot com
  Target Milestone: ---

When building for 32-bit x86 but with SSE2 floating-point enabled:
-m32 -msse2 -mfpmath=sse

the conversion from unsigned int 0 to double produces the result of -0.0 when
floating-point rounding mode is set to FE_DOWNWARD.

I used -frounding-math and #pragma STDC FENV_ACCESS ON.

This bug is not present on x87 nor x86_64 builds.

The bug seems to be present at least since GCC 5.


#include <fenv.h>

#pragma STDC FENV_ACCESS ON

__attribute__((noinline)) double u32_to_f64(unsigned x) {
  return static_cast<double>(x);
}

int main() {
  fesetround(FE_DOWNWARD);

  double d = u32_to_f64(0);

  return __builtin_signbit(d) != 0;  // signbit should be 0
}


The assembly:

u32_to_f64(unsigned int):
        sub     esp, 12
        pxor    xmm0, xmm0
        mov     eax, DWORD PTR [esp+16]
        add     eax, -2147483648
        cvtsi2sd xmm0, eax
        addsd   xmm0, QWORD PTR .LC0
        movsd   QWORD PTR [esp], xmm0
        fld     QWORD PTR [esp]
        add     esp, 12
        ret
main:
        lea     ecx, [esp+4]
        and     esp, -16
        push    DWORD PTR [ecx-4]
        push    ebp
        mov     ebp, esp
        push    ecx
        sub     esp, 32
        push    1024
        call    fesetround
        mov     DWORD PTR [esp], 0
        call    u32_to_f64(unsigned int)
        mov     ecx, DWORD PTR [ebp-4]
        add     esp, 16
        fstp    QWORD PTR [ebp-16]
        movsd   xmm0, QWORD PTR [ebp-16]
        leave
        lea     esp, [ecx-4]
        movmskpd eax, xmm0
        and     eax, 1
        ret
.LC0:
        .long   0
        .long   1105199104


https://godbolt.org/z/rrMWY9jsG
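
Reading the assembly, the conversion appears to be done by flipping the top bit of the input (add eax, -2147483648), converting the result as a signed integer (cvtsi2sd), and adding back 2^31 (.LC0 holds the double 2147483648.0). For x = 0 this computes -2147483648.0 + 2147483648.0, an exact zero sum of operands with opposite signs, which IEEE 754 defines as -0 when rounding toward negative infinity and +0 in the other rounding modes. The sketch below emulates that sequence in portable code; the function names are hypothetical, and the volatile objects are only there to keep the arithmetic at run time and ordered with respect to the fesetround() calls.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>
#include <cstdint>

// Emulates: add eax, -2147483648 ; cvtsi2sd ; addsd .LC0   (.LC0 == 2^31)
double u32_to_f64_biased(std::uint32_t x) {
    // Flip the top bit and reinterpret as signed, i.e. subtract 2^31.
    // (The unsigned-to-signed conversion is modular on GCC and Clang.)
    const std::int32_t biased = static_cast<std::int32_t>(x ^ 0x80000000u);
    volatile double as_double = static_cast<double>(biased);  // exact
    volatile double offset = 2147483648.0;                    // 2^31
    volatile double result = as_double + offset;  // rounded in the current mode
    return result;
}

// Returns true if converting 0 under FE_DOWNWARD yields negative zero.
bool zero_is_negative_when_rounding_down() {
    const int saved = std::fegetround();
    std::fesetround(FE_DOWNWARD);
    const double d = u32_to_f64_biased(0);
    std::fesetround(saved);
    return std::signbit(d);
}
```

Every other input gives an exact, non-zero sum, so zero is the only value whose result depends on the rounding mode.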

[Bug target/99620] Subtract with borrow (SBB) missed optimization

2021-03-17 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99620

--- Comment #4 from Paweł Bylica  ---
Can you give me an introduction to where and how to fix it? I have a longer list of
similar issues, so maybe it's a good time to learn how to fix them myself.

FYI, clang is unifying both cases by changing `k = l > a.l` into `k = a.l <
b.l` and only having SUB_OVERFLOW match for `k = a.l < b.l` case.

[Bug rtl-optimization/99620] New: Subtract with borrow (SBB) missed optimization

2021-03-16 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99620

Bug ID: 99620
   Summary: Subtract with borrow (SBB) missed optimization
   Product: gcc
   Version: 10.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: chfast at gmail dot com
  Target Milestone: ---

Hi.

For the 128-bit precision subtraction (SUB + SBB), the optimization depends on
how the carry bit condition is specified in the code. In the first case
below everything works nicely, but in the second we have an unnecessary CMP in the
final code.

I believe the second carry bit condition is simpler (it does not require unsigned
integer wrapping behavior) and does not depend on the first
subtraction.


using u64 = unsigned long;

struct u128
{
    u64 l;
    u64 h;
};

auto sub_good(u128 a, u128 b)
{
    auto l = a.l - b.l;
    auto k = l > a.l;
    auto h = a.h - b.h - k;
    return u128{l, h};
}

auto sub_bad(u128 a, u128 b)
{
    auto l = a.l - b.l;
    auto k = a.l < b.l;
    auto h = a.h - b.h - k;
    return u128{l, h};
}


sub_good(u128, u128):
        mov     rax, rdi
        sub     rax, rdx
        sbb     rsi, rcx
        mov     rdx, rsi
        ret
sub_bad(u128, u128):
        cmp     rdi, rdx
        mov     rax, rdi
        sbb     rsi, rcx
        sub     rax, rdx
        mov     rdx, rsi
        ret


If you think this is easy to fix, I would like to give it a try if I could get
some pointers on where to start.
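
A third way to write the borrow, shown as a sketch, is the GCC/Clang builtin __builtin_sub_overflow, which returns the borrow of the low-limb subtraction directly. This variant is not part of the report, no claim is made about the code generated for it, and the name sub_builtin is hypothetical.

```cpp
#include <cassert>

using u64 = unsigned long long;

struct u128
{
    u64 l;
    u64 h;
};

// Two-limb subtraction: the builtin yields both the wrapped low limb
// and the borrow, which is then subtracted from the high limb.
inline u128 sub_builtin(u128 a, u128 b)
{
    u64 l = 0;
    const bool k = __builtin_sub_overflow(a.l, b.l, &l);
    const u64 h = a.h - b.h - k;
    return u128{l, h};
}
```

For unsigned operands the builtin's result equals both `l > a.l` and `a.l < b.l`, so all three formulations compute the same value.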

[Bug c++/97145] Sanitizer pointer-subtract breaks constexpr functions subtracting pointers

2021-02-23 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97145

Paweł Bylica  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #6 from Paweł Bylica  ---
This looks to be fixed in trunk. Thanks.

[Bug middle-end/51839] GCC not generating adc instruction for canonical multi-precision add sequence

2021-02-17 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51839

Paweł Bylica  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |chfast at gmail dot com

--- Comment #1 from Paweł Bylica  ---
This is fixed in GCC 8.1 (at least for add+adc pair).
https://godbolt.org/z/9j4f6r

[Bug libstdc++/97659] Invalid pointer subtraction in vector::insert() (reported by pointer-subtract AddressSanitizer)

2020-11-01 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97659

--- Comment #4 from Paweł Bylica  ---
I'd like to explain some things here (to the best of my knowledge):

1. The "pointer-subtract" check is an ASan extension, not enabled by default.
When running with this check enabled in my application I have not detected any
issues in std::vector.

2. The "pointer-subtract" check verifies that the pointer subtraction operands
are from the same memory allocation. Allowed values are all pointers into the
memory region plus the "end" pointer one element past the region. Other
subtractions are UB in C, to my knowledge.

3. The issue shows up only when "pointer-subtract" is combined with
_GLIBCXX_SANITIZE_VECTOR. Moreover, the report looks like a false positive
because the subtraction is between the "end" pointer and a pointer from inside
of a memory region.
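
The rule in point 2 can be illustrated with a small sketch: subtracting a pointer into an array from that array's one-past-the-end pointer is valid. The sizes are illustrative (a 12-byte region with a pointer 10 bytes inside it), and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Number of elements between `pos` and `end`. Both pointers must refer to
// the same array, where `end` may be the one-past-the-end pointer.
std::ptrdiff_t remaining(const unsigned char* pos, const unsigned char* end) {
    return end - pos;
}

std::ptrdiff_t demo_remaining() {
    unsigned char region[12] = {};
    const unsigned char* end = region + 12;     // one past the end: valid operand
    const unsigned char* inside = region + 10;  // 10 bytes inside the region
    return remaining(inside, end);
}
```

A subtraction of this shape is what a checker is expected to accept, which is why the report against vector::insert() looks like a false positive.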

[Bug libstdc++/97659] Invalid pointer subtraction in vector::insert() (reported by pointer-subtract AddressSanitizer)

2020-10-31 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97659

--- Comment #2 from Paweł Bylica  ---
Created attachment 49482
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49482&action=edit
Minimal test case source code

It turned out the problem is related to vector's internal instrumentation
_GLIBCXX_SANITIZE_VECTOR.

The minimal test case is the following:

#define _GLIBCXX_SANITIZE_VECTOR 1
#include <vector>

int main()
{
    std::vector<char> v;
    v.reserve(1);

    char in[1] = {};
    v.insert(v.end(), in, in + 1);

    return 0;
}


export ASAN_OPTIONS=detect_invalid_pointer_pairs=1
g++ pointer_subtract_bug.cpp -fsanitize=address,pointer-subtract
./a.out

[Bug libstdc++/97659] New: Invalid pointer subtraction in vector::insert() (reported by pointer-subtract AddressSanitizer)

2020-10-31 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97659

Bug ID: 97659
   Summary: Invalid pointer subtraction in vector::insert()
            (reported by pointer-subtract AddressSanitizer)
   Product: gcc
   Version: 10.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: chfast at gmail dot com
  Target Milestone: ---

When vector::insert(iterator pos, InputIt first, InputIt last) is used,
the additional AddressSanitizer check "pointer-subtract" reports an invalid
pointer pair in c++/10/bits/vector.tcc:729.

The relevant code is this:

  template<typename _Tp, typename _Alloc>
    template<typename _ForwardIterator>
      void
      vector<_Tp, _Alloc>::
      _M_range_insert(iterator __position, _ForwardIterator __first,
                      _ForwardIterator __last, std::forward_iterator_tag)
      {
        if (__first != __last)
          {
            const size_type __n = std::distance(__first, __last);
            if (size_type(this->_M_impl._M_end_of_storage
                          - this->_M_impl._M_finish) >= __n)  // FAILS HERE!
              {


My core code causing the problem is this:

void push(std::vector<uint8_t>& b, uint32_t value)
{
    uint8_t storage[sizeof(value)];
    __builtin_memcpy(storage, &value, sizeof(value));
    b.insert(b.end(), std::begin(storage), std::end(storage));
}


My program pushes single bytes and uint32_t values to a vector using the above
helper, without preallocation. However, I was not able to reproduce this issue in
isolation. I will need more time to reduce my code to a proper regression test.

gcc-10 (Ubuntu 10.2.0-5ubuntu1~20.04) 10.2.0
export ASAN_OPTIONS=detect_invalid_pointer_pairs=1 

=
==3327279==ERROR: AddressSanitizer: invalid-pointer-pair: 0x60206e5c
0x60206e5a
#0 0x556e32bfecbf in void std::vector >::_M_range_insert(__gnu_cxx::__normal_iterator > >, unsigned char*, unsigned char*,
std::forward_iterator_tag) /usr/include/c++/10/bits/vector.tcc:729
#1 0x556e32bfecbf in void std::vector >::_M_insert_dispatch(__gnu_cxx::__normal_iterator > >, unsigned char*, unsigned char*,
std::__false_type) /usr/include/c++/10/bits/stl_vector.h:1665
#2 0x556e32bfecbf in __gnu_cxx::__normal_iterator > >
std::vector >::insert(__gnu_cxx::__normal_iterator > >, unsigned char*,
unsigned char*) /usr/include/c++/10/bits/stl_vector.h:1383
#3 0x556e32bfecbf in push
/home/chfast/Projects/wasmx/fizzy/lib/fizzy/parser_expr.cpp:26
...

0x60206e5c is located 0 bytes to the right of 12-byte region
[0x60206e50,0x60206e5c)
allocated by thread T0 here:
#0 0x7f0bfa861f17 in operator new(unsigned long)
(/lib/x86_64-linux-gnu/libasan.so.6+0xb1f17)
#1 0x556e32bff1e1 in __gnu_cxx::new_allocator::allocate(unsigned long, void const*)
/usr/include/c++/10/ext/new_allocator.h:115
#2 0x556e32bff1e1 in std::allocator_traits
>::allocate(std::allocator&, unsigned long)
/usr/include/c++/10/bits/alloc_traits.h:460
#3 0x556e32bff1e1 in std::_Vector_base >::_M_allocate(unsigned long)
/usr/include/c++/10/bits/stl_vector.h:346
#4 0x556e32bff1e1 in void std::vector >::_M_range_insert(__gnu_cxx::__normal_iterator > >, unsigned char*, unsigned char*,
std::forward_iterator_tag) /usr/include/c++/10/bits/vector.tcc:769
#5 0x556e32bff1e1 in void std::vector >::_M_insert_dispatch(__gnu_cxx::__normal_iterator > >, unsigned char*, unsigned char*,
std::__false_type) /usr/include/c++/10/bits/stl_vector.h:1665
#6 0x556e32bff1e1 in __gnu_cxx::__normal_iterator > >
std::vector >::insert(__gnu_cxx::__normal_iterator > >, unsigned char*,
unsigned char*) /usr/include/c++/10/bits/stl_vector.h:1383
#7 0x556e32bff1e1 in push
/home/chfast/Projects/wasmx/fizzy/lib/fizzy/parser_expr.cpp:26
...

0x60206e5a is located 10 bytes inside of 12-byte region
[0x60206e50,0x60206e5c)
allocated by thread T0 here:
#0 0x7f0bfa861f17 in operator new(unsigned long)
(/lib/x86_64-linux-gnu/libasan.so.6+0xb1f17)
#1 0x556e32bff1e1 in __gnu_cxx::new_allocator::allocate(unsigned long, void const*)
/usr/include/c++/10/ext/new_allocator.h:115
#2 0x556e32bff1e1 in std::allocator_traits
>::allocate(std::allocator&, unsigned long)
/usr/include/c++/10/bits/alloc_traits.h:460
#3 0x556e32bff1e1 in std::_Vector_base >::_M_allocate(unsigned long)
/usr/include/c++/10/bits/stl_vector.h:346
#4 0x556e32bff1e1 in void std::vector >::_M_range_insert(__gnu_cxx::__normal_iterator > >, unsigned char*, unsigned char*,
std::forward_iterator_tag) /usr/include/c++/10/bits/vector.tcc:769
#5 0x556e32bff1e1 in void std::vector >::_M_insert_dispatch(__gnu_cxx::__normal_iterator > >, unsigned char*, unsigned char*,
std::__false_type) /usr/include/c++/10/bits/stl_vector.h:1665
#6 0x556e32bff1e1 in __gnu_cxx::__normal_iterator > >
std::vector >::insert(__gnu_cxx::__normal_iterator > >, unsigned char*,
unsigned 

[Bug libstdc++/97415] New: Invalid pointer comparison in stringbuf::str() (reported by pointer-compare AddressSanitizer)

2020-10-14 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97415

Bug ID: 97415
   Summary: Invalid pointer comparison in stringbuf::str()
(reported by pointer-compare AddressSanitizer)
   Product: gcc
   Version: 10.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: chfast at gmail dot com
  Target Milestone: ---

When my application is instrumented with -fsanitize=address,pointer-compare
and run with ASAN_OPTIONS=detect_invalid_pointer_pairs=2,
I get the following failure in basic_stringbuf::str():

==3879==ERROR: AddressSanitizer: invalid-pointer-pair: 0x7ffcdf273b66
0x
#0 0x5597a6c6d786 in std::__cxx11::basic_stringbuf, std::allocator >::str() const
/usr/include/c++/10/sstream:184
#1 0x5597a6c6d786 in std::__cxx11::basic_ostringstream, std::allocator >::str() const
/usr/include/c++/10/sstream:678
#2 0x5597a6c6d786 in std::basic_ostream >&
std::__detail::operator<< ,
std::__cxx11::basic_string, std::allocator >
const&>(std::basic_ostream >&,
std::__detail::_Quoted_string, std::allocator > const&, char> const&)
/usr/include/c++/10/bits/quoted_string.h:130
#3 0x5597a6c6d786 in std::basic_ostream >&
std::filesystem::__cxx11::operator<< 
>(std::basic_ostream >&,
std::filesystem::__cxx11::path const&) /usr/include/c++/10/bits/fs_path.h:441
#4 0x5597a6c6d786 in log_total
/home/builder/project/test/spectests/spectests.cpp:675
#5 0x5597a6c48939 in run_tests_from_dir
/home/builder/project/test/spectests/spectests.cpp:708
#6 0x5597a6c48939 in main
/home/builder/project/test/spectests/spectests.cpp:750

Here is the implementation of basic_stringbuf::str() used for compilation:

  __string_type
  str() const
  {
    __string_type __ret(_M_string.get_allocator());
    if (this->pptr())
      {
        // The current egptr() may not be the actual string end.
        if (this->pptr() > this->egptr())
          __ret.assign(this->pbase(), this->pptr());
        else
          __ret.assign(this->pbase(), this->egptr());
      }
    else
      __ret = _M_string;
    return __ret;
  }

In the line `if (this->pptr() > this->egptr())`, `this->egptr()` may be
nullptr while `this->pptr()` is not, so AddressSanitizer reports the
comparison as an invalid pointer pair.
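
To illustrate the state described above, here is a minimal sketch of a stringbuf opened for output only, with a variant of the str() logic that evaluates the relational comparison only when egptr() is non-null. The names probe_buf, has_put_area, has_get_area and guarded_str are hypothetical and exist only for this illustration; this is one possible way to avoid the comparison, not a claim about how libstdc++ should be changed.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper (not part of libstdc++): a stringbuf opened for output
// only that exposes its protected buffer pointers for inspection.
struct probe_buf : std::stringbuf
{
    probe_buf() : std::stringbuf(std::ios_base::out) {}

    bool has_put_area() const { return pptr() != nullptr; }
    bool has_get_area() const { return egptr() != nullptr; }

    // Same idea as str(), but pptr() > egptr() is only evaluated when
    // egptr() is non-null, so a null pointer is never compared relationally.
    std::string guarded_str() const
    {
        if (const char* p = pptr())
        {
            const char* lo = pbase();
            const char* e = egptr();
            const char* hi = (e == nullptr || p > e) ? p : e;
            return std::string(lo, hi);
        }
        return str();  // no put area: defer to the library implementation
    }
};
```

After writing "abc" through a std::ostream attached to a probe_buf, has_put_area() is true and guarded_str() returns "abc"; whether has_get_area() is false at that point is what I would expect from the report, but I have not verified it on every standard library.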

I don't have a reproducer at hand, but I can try to build one if
desired.
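
As a starting point, the stack trace suggests the call chain begins with streaming a std::filesystem::path (frames #3 to #0). The sketch below exercises that same chain. It is untested as a reproducer: I am assuming that building it with -fsanitize=address,pointer-compare and running with ASAN_OPTIONS=detect_invalid_pointer_pairs=2 is enough, and format_path is a hypothetical name.

```cpp
#include <filesystem>
#include <sstream>
#include <string>

// Hypothetical helper: formats a path through std::filesystem's operator<<,
// the entry point seen in frame #3 of the trace.
std::string format_path(const std::filesystem::path& p)
{
    std::ostringstream os;
    os << p;  // specified as equivalent to streaming std::quoted(p.string())
    return os.str();
}
```

Calling format_path("spectests") yields the path wrapped in double quotes.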

GCC version: cpp (Debian 10.2.0-15) 10.2.0

[Bug sanitizer/97414] New: AddressSanitizer CHECK failed: detect_stack_use_after_return and detect_invalid_pointer_pairs

2020-10-14 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97414

Bug ID: 97414
   Summary: AddressSanitizer CHECK failed:
detect_stack_use_after_return and
detect_invalid_pointer_pairs
   Product: gcc
   Version: 10.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: sanitizer
  Assignee: unassigned at gcc dot gnu.org
  Reporter: chfast at gmail dot com
CC: dodji at gcc dot gnu.org, dvyukov at gcc dot gnu.org,
jakub at gcc dot gnu.org, kcc at gcc dot gnu.org,
marxin at gcc dot gnu.org
  Target Milestone: ---

==638106==AddressSanitizer CHECK failed:
../../../../src/libsanitizer/asan/asan_thread.cpp:369 "((bottom)) != (0)" (0x0,
0x0)
#0 0x7f00888e08b8  (/lib/x86_64-linux-gnu/libasan.so.6+0xb98b8)
#1 0x7f00889007ce  (/lib/x86_64-linux-gnu/libasan.so.6+0xd97ce)
#2 0x7f00888e64f0  (/lib/x86_64-linux-gnu/libasan.so.6+0xbf4f0)
#3 0x7f00888dd68b  (/lib/x86_64-linux-gnu/libasan.so.6+0xb668b)
#4 0x7f00888e0269 in __sanitizer_ptr_sub
(/lib/x86_64-linux-gnu/libasan.so.6+0xb9269)
#5 0x55e8cd6641f2 in pointer_diff(int const*, int const*)
/home/chfast/Projects/compiler_bugs/sanitizers/pointer_subtract_crash/pointer_subtract_crash.cpp:2
#6 0x55e8cd664248 in main
/home/chfast/Projects/compiler_bugs/sanitizers/pointer_subtract_crash/pointer_subtract_crash.cpp:10
#7 0x7f008865c0b2 in __libc_start_main
(/lib/x86_64-linux-gnu/libc.so.6+0x270b2)
#8 0x55e8cd66410d in _start
(/home/chfast/Projects/compiler_bugs/sanitizers/pointer_subtract_crash/a.out+0x110d)


The failure above occurs when running this program:

[[gnu::noinline]] auto pointer_diff(const int *begin, const int *end) {
  return end - begin;
}

int main() {
  constexpr auto size = (2048 / sizeof(int)) + 1;

  auto buf = new int[size];
  auto end = buf + size;
  pointer_diff(end, buf);
  delete[] buf;

  return 0;
}


compiled with
gcc -fsanitize=address,pointer-subtract -g pointer_subtract_crash.cpp

To reproduce the crash, both runtime options must be enabled:
ASAN_OPTIONS=detect_stack_use_after_return=1:detect_invalid_pointer_pairs=1

This bug was previously reported in LLVM's AddressSanitizer project
https://bugs.llvm.org/show_bug.cgi?id=47626, but pointer-subtract is not
supported there.
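
For reference, a sketch of why the subtraction itself looks well-defined to me: both pointers refer to the same array (one of them is the one-past-the-end pointer), so I would expect no sanitizer report at all. The code repeats the steps of main() from the test case; run_reproducer_steps is a hypothetical name, and the explicit std::ptrdiff_t return type is my addition.

```cpp
#include <cstddef>

// Same computation as pointer_diff in the test case, with an explicit
// return type instead of auto.
std::ptrdiff_t pointer_diff(const int* begin, const int* end)
{
    return end - begin;
}

// Hypothetical helper: repeats the steps of main() and returns the difference.
std::ptrdiff_t run_reproducer_steps()
{
    constexpr std::size_t size = (2048 / sizeof(int)) + 1;

    int* buf = new int[size];
    int* end = buf + size;  // one past the last element, valid to form
    // Arguments are passed in swapped order, exactly as in the test case,
    // so the result is the negated element count.
    const std::ptrdiff_t diff = pointer_diff(end, buf);
    delete[] buf;
    return diff;
}
```

Without the sanitizer runtime options, the function simply returns the negated element count.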