https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124589
Bug ID: 124589
Summary: Missed optimization for std::ranges::contains
Product: gcc
Version: 15.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gcc at hazardy dot de
Target Milestone: ---
This simple code
#include <algorithm>
#include <array>
struct S {
int I;
};
bool algo(std::array<S, 11>& s) {
return std::ranges::contains(s, 5, &S::I);
}
results with -O2 in this x86_64 assembly:
"algo(std::array<S, 11ul>&)":
lea rax, [rdi+44]
jmp .L3
.L6:
add rdi, 4
cmp rax, rdi
je .L2
.L3:
cmp DWORD PTR [rdi], 5
jne .L6
.L2:
cmp rax, rdi
setne al
ret
This is not optimal, since the compare at L2 isn't neccessary, it seems GCC
doesn't see that the find, if returning within the loop never gets to end, and
the check isn't necessary.
This is outperformed by a simple raw loop:
bool raw(std::array<S, 11>& s) {
for (auto iter = std::ranges::begin(s); iter != std::ranges::end(s);
++iter) {
if (iter->I == 5) {
return true;
}
}
return false;
}
results in
"raw(std::array<S, 11ul>&)":
lea rax, [rdi+44]
.L9:
cmp DWORD PTR [rdi], 5
je .L10
add rdi, 4
cmp rdi, rax
jne .L9
xor eax, eax
ret
.L10:
mov eax, 1
ret
Kind of interesting is that
bool raw2(std::array<S, 11>& s) {
auto iter = std::ranges::begin(s);
const auto end = std::ranges::end(s);
for (; iter != end; ++iter) {
if (iter->I == 5) {
break;
}
}
return iter != end;
}
is even better:
"raw2(std::array<S, 11ul>&)":
lea rax, [rdi+44]
.L14:
cmp DWORD PTR [rdi], 5
je .L13
add rdi, 4
cmp rax, rdi
jne .L14
.L13:
cmp rax, rdi
setne al
ret
With -O3 there's a similar mismatch, here a godbolt link showing trunk and 13.2
GCC: https://godbolt.org/z/4bz5zP5Kq
Using quickbench (which has only GCC 13.2) I get, unit in factor, smaller is
better.
-O Algo Raw1 Raw2
--- ---- ---- ----
-O0 4.3 2.0 1.0
-O1 1.0 1.0 1.0
-O2 1.9 1.5 1.0
-O3 9.9 1.0 8.8
-Og 1.6 1.3 1.0
Ideally all of those would produce the same assembly.
This is the code for the benchmark: https://godbolt.org/z/9ofYEsc8b