https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119917
Bug ID: 119917
Summary: [Optimization Opportunity] Inefficient code generation
for multiplication overflow detection with explicit
zero checks
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: a1343922569 at outlook dot com
Target Milestone: ---
Description:
I've identified an opportunity for optimization in GCC where the compiler
generates suboptimal code for multiplication overflow detection functions that
include explicit division-by-zero checks. The generated assembly contains
unnecessary conditional checks and jumps which could be eliminated.
This is not a bug, but rather a missed optimization opportunity where explicit
zero checks are not fully utilized to generate more efficient code.
Environment Information:
①With GCC 14.2.0
- GCC version: GNU C++17 (Compiler-Explorer-Build-gcc--binutils-2.42) version
14.2.0 (x86_64-linux-gnu)
- System type: x86_64-linux-gnu
- Configuration: compiled by GNU C version 11.4.0, GMP version 6.2.1, MPFR
version 4.1.0, MPC version 1.2.1, isl version isl-0.24-GMP
②With GCC 16.0.0
- GCC version: GNU C++17 GNU C++17
(Compiler-Explorer-Build-gcc-3a3bcb6a233352ce2bfa9f6f49dc44d8ae5aa6cb-binutils-2.42)
version 16.0.0 20250423 (experimental) (x86_64-linux-gnu)
- System type: x86_64-linux-gnu
- Configuration: compiled by GNU C version 11.4.0, GMP version 6.2.1, MPFR
version 4.1.0, MPC version 1.2.1, isl version isl-0.24-GMP
Complete Command Line:
g++ -O2 -S test.cpp
Compiler Output:
ASM generation compiler returned: 0
Execution build compiler returned: 0
Program returned: 0
Preprocessed File:
The test case doesn't require a preprocessed file as it doesn't use any macros
or include directives. The optimization issue can be seen directly in the
assembly output from the source code provided.
Reproduction Code (C++):
bool mulOverflow64(unsigned long long a, unsigned long long b) {
if (b == 0) return false;
return a > (~0ULL) / b;
}
bool mulOverflow32(unsigned int a, unsigned int b) {
if (b == 0) return false;
return a > (~0U) / b;
}
bool mulOverflow16(unsigned short a, unsigned short b) {
if (b == 0) return false;
return a > 0xFFFF / b;
}
bool mulOverflow8(unsigned char a, unsigned char b) {
if (b == 0) return false;
return a > 0xFF / b;
}
int main()
{
return 0;
}
Current Generated Assembly:
mulOverflow64(unsigned long long, unsigned long long):
xor eax, eax
test rsi, rsi
je .L1
mov rax, rsi
mul rdi
seto al
.L1:
ret
mulOverflow32(unsigned int, unsigned int):
xor eax, eax
test esi, esi
je .L8
mov eax, esi
mul edi
seto al
.L8:
ret
mulOverflow16(unsigned short, unsigned short):
xor eax, eax
test si, si
je .L14
xor edx, edx
movzx esi, si
mov eax, 65535
movzx edi, di
idiv esi
cmp edi, eax
setg al
.L14:
ret
mulOverflow8(unsigned char, unsigned char):
xor eax, eax
test sil, sil
je .L17
xor edx, edx
movzx esi, sil
mov eax, 255
movzx edi, dil
idiv esi
cmp edi, eax
setg al
.L17:
ret
main:
xor eax, eax
ret
Expected Optimized Assembly:
mulOverflow64(unsigned long long, unsigned long long):
mov rax, rsi
mul rdi
seto al
ret
mulOverflow32(unsigned int, unsigned int):
mov eax, esi
mul edi
seto al
ret
mulOverflow16(unsigned short, unsigned short):
mov eax, esi
mul di
seto al
ret
mulOverflow8(unsigned char, unsigned char):
mov eax, esi
mul dil
seto al
ret
main:
xor eax, eax
ret
Issue:
When a function explicitly checks for b==0, the compiler should be able to
optimize away the redundant runtime check in the assembly output. Additionally,
for 16-bit and 8-bit versions, the compiler isn't using the more efficient
multiplication and overflow detection strategy that it uses for 32-bit and
64-bit versions, but the more inefficient division.
Performance Impact:
These functions are often used in performance-critical code paths. The
redundant checks and branches can negatively impact performance, especially in
tight loops and on processors with limited branch prediction capabilities.
I've verified that this issue indeed exists in both GCC 14.2.0 and the latest
trunk version (GCC 16.0.0).
Related Previous Bug:
This optimization opportunity is related to Bug ID: 117529 ("Missed
optimization: (y != 0 && x > (unsigned)(-1) / y) (multiplication overflow
check)") reported at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117529.
However, my report differs in several important aspects:
1. This report provides a more comprehensive analysis across multiple integer
widths (8, 16, 32, and 64-bit).
2. It demonstrates that the issue is more widespread than previously
documented.
3. It highlights an inconsistency in optimization strategy between different
integer widths (32/64-bit vs 8/16-bit).
4. It includes concrete examples of the expected optimized assembly for each
case.
While the previous report focused only on the 64-bit case and didn't provide
detailed expectation for the assembly result, this expanded report shows the
optimization opportunity affects all integer widths, with the smaller widths
(8/16-bit) having an even larger performance gap due to the use of idiv
instructions instead of mul with overflow detection.
I hope this more comprehensive analysis provides sufficient motivation to
address this optimization opportunity, which could improve performance for a
common pattern in security-critical and performance-sensitive code.