https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103071

            Bug ID: 103071
           Summary: Missed optimization for symmetric subset: (a & b) == a
                    || (a & b) == b
           Product: gcc
           Version: 11.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: steinar+gcc at gunderson dot no
  Target Milestone: ---

This is a bit of a long shot, but I'll file it anyway :-)

I have this function in a hot path (of course, in the real project, it's
inlined):

#include <stdbool.h>
#include <stdint.h>

void foo();
void bar();

void EitherIsSubset(uint64_t v0, uint64_t v1) {
        if ((v0 & v1) == v0 || (v0 & v1) == v1) {
                foo();
        } else {
                bar();
        }
}

It is intended to treat v0 and v1 as bit sets, and then test whether either v0
or v1 is a subset of each other (or that they are equal). (An equivalent
formulation happens to be replacing & with |.)

GCC compiles (with -O2, x86-64) this to:

EitherIsSubset:
        movq    %rdi, %rax
        andq    %rsi, %rax
        cmpq    %rsi, %rax
        je      .L4
        cmpq    %rdi, %rax
        je      .L4
        xorl    %eax, %eax
        jmp     bar@PLT
.L4:
        xorl    %eax, %eax
        jmp     foo@PLT

This is pretty straight-forward, but feels like it's using two (relatively
hard-to-predict) branches where it should be possible to deal with one. And
indeed, GNU superopt (!) found this amazing sequence instead, with v0 in eax
and v1 in edx (this is, of course, trivially portable to 64-bit):

14:     mov     %edx,%ecx
        or      %eax,%edx
        cmp     %edx,%eax
        sbb     %ebx,%ebx
        sbb     %ecx,%edx
        adc     $1,%ebx

I can't claim to understand fully what it does, but after this, ebx contains
either 0 or 1 with the right answer, and one would assume that after this, the
zero flag is also usable to branch on (leaving us with one branch instead of
two, in all).

Is it possible to teach GCC this sequence? I tried using it as inline
assembler, and while it works, it seems it becomes suboptimal and slower,
because I can't return a condition code (so I get a redundant test):

inline bool EitherIsSubsetAsm(uint64_t v0, uint64_t v1) {
        uint64_t tmp = v0 | v1;
        bool result;
        asm("cmp %1, %2 ; sbb %0, %0 ; sbb %3, %1 ; adc $1, %0"
            : "=r"(result), "+&r"(tmp)
            : "r"(v0), "r"(v1)
            : "cc");
        return result;
}

void EitherIsSubset(uint64_t v0, uint64_t v1) {
        if (EitherIsSubsetAsm(v0, v1)) {
                foo();
        } else {
                bar();
        }
}

Reply via email to