[Bug tree-optimization/102392] Failure to optimize a sign extension to a zero extension
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102392

Gabriel Ravier changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|X86_64-linux-gnu            |x86_64-linux-gnu
            Version|12.0                        |15.0

--- Comment #5 from Gabriel Ravier ---
I've wound up stumbling upon a very similar bug (which I think is the same bug
at its core) while examining the following code (stdint.h and stddef.h added
here so the testcase compiles standalone):

#include <stdint.h>
#include <stddef.h>

static uint32_t f(int8_t x)
{
    return (~(uint32_t)x) & 1;
}

void floop(uint32_t *r, int8_t *x, size_t n)
{
#ifndef __clang__
    _Pragma("GCC unroll 0")
    _Pragma("GCC novector")
#else
    _Pragma("clang loop unroll(disable) vectorize(disable)")
#endif
    for (size_t i = 0; i < n; ++i)
        r[i] = f(x[i]);
}

For the loop, GCC generates:

.L3:
        movsx   eax, BYTE PTR [rsi+rdx]     # <--- sign extension
        not     eax
        and     eax, 1
        mov     DWORD PTR [rdi+rdx*4], eax
        add     rdx, 1
        cmp     rcx, rdx
        jne     .L3

whereas LLVM manages:

.LBB0_2:                                # =>This Inner Loop Header: Depth=1
        movzx   ecx, byte ptr [rsi + rax]   # <--- zero extension
        not     ecx
        and     ecx, 1
        mov     dword ptr [rdi + 4*rax], ecx
        inc     rax
        cmp     rdx, rax
        jne     .LBB0_2

This makes LLVM's output slightly faster (according to llvm-mca), for the same
reason as the original report: the sign extension is not converted to a zero
extension.
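As a sketch of why a zero extension is legal here (my illustration, not part
of the bug report): f() only depends on bit 0 of x, which is unaffected by the
kind of extension used for the load, so an explicit sign-changing cast yields
the same results while making compilers emit movzx:

#include <stdint.h>

/* Hypothetical variant (the name f_zext is made up): the (uint8_t) cast
   cannot change the result, since only bit 0 survives the & 1, but it
   turns the byte load into a zero extension (movzx). */
static uint32_t f_zext(int8_t x)
{
    return (~(uint32_t)(uint8_t)x) & 1;
}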
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102392

--- Comment #4 from Richard Biener ---
Note that without explicit ZEXT_EXPR / SEXT_EXPR, the GIMPLE for a
non-"natural" extension is more costly (more statements) than the "natural"
extension implied by the sign of the object, because it requires an
intermediate sign-changing conversion.
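A minimal sketch of that statement-count difference (my illustration; the
function names are made up and the GIMPLE in the comments is approximate):

#include <stdint.h>

/* "Natural" extension: matches the signedness of the object, so it is a
   single conversion statement in GIMPLE. */
uint32_t ext_natural(int8_t x)
{
    return (uint32_t) x;            /* roughly: _1 = (uint32_t) x_2(D); */
}

/* Non-"natural" (zero) extension of a signed object: needs an
   intermediate sign-changing conversion first, i.e. one extra statement. */
uint32_t ext_nonnatural(int8_t x)
{
    return (uint32_t) (uint8_t) x;  /* roughly: _1 = (uint8_t) x_2(D);
                                                _2 = (uint32_t) _1;    */
}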
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102392

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2021-09-18
             Status|UNCONFIRMED                 |NEW
          Component|target                      |tree-optimization
           Keywords|ABI                         |

--- Comment #3 from Andrew Pinski ---
A better testcase, which shows what LLVM is really doing (stdint.h added here
so it compiles standalone):

#include <stdint.h>

void g(int32_t *x, int64_t *y)
{
    if (*x < 0)
        __builtin_unreachable();
    *y = *x;
}

GCC:
        movslq  (%rdi), %rax
        movq    %rax, (%rsi)
        ret

vs LLVM:
        movl    (%rdi), %eax
        movq    %rax, (%rsi)
        retq

There might be a dup of this bug somewhere too.
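Why the plain movl suffices (my sketch, not from the comment): once *x is
known to be non-negative, sign and zero extension agree, and zero extension is
free on x86-64 because a 32-bit load already clears the upper 32 bits of the
destination register. The hypothetical g_zext below makes that rewrite
explicit and should compile to the same movl/movq pair LLVM emits:

#include <stdint.h>

/* Hypothetical rewrite (the name g_zext is made up): with *x >= 0, the
   sign extension is equivalent to a zero extension, which costs nothing
   because movl implicitly zeroes the upper half of the 64-bit register. */
void g_zext(int32_t *x, int64_t *y)
{
    uint32_t v = (uint32_t) *x;   /* movl (%rdi), %eax */
    *y = (int64_t) v;             /* movq %rax, (%rsi) */
}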