https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124589

            Bug ID: 124589
           Summary: Missed optimization for std::ranges::contains
           Product: gcc
           Version: 15.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: gcc at hazardy dot de
  Target Milestone: ---

This simple code

#include <algorithm>
#include <array>

struct S {
    int I;
};

bool algo(std::array<S, 11>& s) {
    return std::ranges::contains(s, 5, &S::I);
}

results with -O2 in this x86_64 assembly:
"algo(std::array<S, 11ul>&)":
        lea     rax, [rdi+44]
        jmp     .L3
.L6:
        add     rdi, 4
        cmp     rax, rdi
        je      .L2
.L3:
        cmp     DWORD PTR [rdi], 5
        jne     .L6
.L2:
        cmp     rax, rdi
        setne   al
        ret

This is not optimal, since the compare at L2 isn't neccessary, it seems GCC
doesn't see that the find, if returning within the loop never gets to end, and
the check isn't necessary.
This is outperformed by a simple raw loop:

bool raw(std::array<S, 11>& s) {
    for (auto iter = std::ranges::begin(s); iter != std::ranges::end(s);
++iter) {
        if (iter->I == 5) {
            return true;
        }
    }

    return false;
}

results in

"raw(std::array<S, 11ul>&)":
        lea     rax, [rdi+44]
.L9:
        cmp     DWORD PTR [rdi], 5
        je      .L10
        add     rdi, 4
        cmp     rdi, rax
        jne     .L9
        xor     eax, eax
        ret
.L10:
        mov     eax, 1
        ret

Kind of interesting is that

bool raw2(std::array<S, 11>& s) {
    auto iter = std::ranges::begin(s);
    const auto end = std::ranges::end(s);
    for (; iter != end; ++iter) {
        if (iter->I == 5) {
            break;
        }
    }

    return iter != end;
}

is even better:

"raw2(std::array<S, 11ul>&)":
        lea     rax, [rdi+44]
.L14:
        cmp     DWORD PTR [rdi], 5
        je      .L13
        add     rdi, 4
        cmp     rax, rdi
        jne     .L14
.L13:
        cmp     rax, rdi
        setne   al
        ret

With -O3 there's a similar mismatch, here a godbolt link showing trunk and 13.2
GCC: https://godbolt.org/z/4bz5zP5Kq

Using quickbench (which has only GCC 13.2) I get, unit in factor, smaller is
better.

-O   Algo Raw1 Raw2
---  ---- ---- ----    
-O0   4.3  2.0  1.0
-O1   1.0  1.0  1.0
-O2   1.9  1.5  1.0
-O3   9.9  1.0  8.8
-Og   1.6  1.3  1.0

Ideally all of those would produce the same assembly.

This is the code for the benchmark: https://godbolt.org/z/9ofYEsc8b

Reply via email to