https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118657
Bug ID: 118657
Summary: Missed optimization (unreachable branch could be
pruned after taking into account the possible values
of a constexpr lookup table)
Product: gcc
Version: 14.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: nicula.iccc at gmail dot com
Target Milestone: ---
For the following code, GCC (14.2 and the current trunk) generates suboptimal
assembly:
#include <cstddef>
#include <array>
#include <cstdint>
#include <cassert>
using u8 = uint8_t;
constexpr size_t DATA_SIZE = 1024;
constexpr std::array<size_t, 256> TO_DATA_INDEX = {};
bool foo(int* data, u8 first_idx)
{
size_t second_idx = TO_DATA_INDEX[first_idx];
if (second_idx >= DATA_SIZE)
return false;
data[second_idx] = 1;
return true;
}
Assembly from godbolt.org with -O3 (https://godbolt.org/z/5f17onGYT):
foo(int*, unsigned char):
movzx esi, sil
xor eax, eax
mov rdx, QWORD PTR TO_DATA_INDEX[0+rsi*8]
cmp rdx, 1023
ja .L1
mov DWORD PTR [rdi+rdx*4], 1
mov eax, 1
.L1:
ret
TO_DATA_INDEX:
.zero 2048
GCC misses an optimization because it doesn't realize that the 'cmp rax, 1023'
check is redundant, because the only possible value that 'second_idx' can have
is 0.
Clang, on the other hand, sees this, and removes the dead branch. Clang's
assembly with -O3 (same link):
foo(int*, unsigned char):
mov dword ptr [rdi], 1
mov al, 1
ret
Those checks/branches could become significant, especially if you have multiple
layers through which you run this kind of transformation (first_idx ->
second_idx -> third_idx -> ... -> last_idx), and with bounds checking after
each layer. Given that in some contexts the value of 'last_index' is derivable
at compile-time (like in the example I gave above), it would be great if GCC
applied this optimization.