On Mon, 24 Aug 2020, Richard Biener via Gcc wrote:

> Whether or not the conditional branch sequence is faster depends on whether
> the branch is well-predicted which very much depends on the data you
> feed the isWhitespace function with but I guess since this is the
> c == ' ' test it _will_ be a well-predicted branch which means the
> conditional branch sequence will be usually faster.  The proposed
> change turns the control into a data dependence which constrains
> instruction scheduling and retirement.  Indeed a mispredicted branch
> will likely be more costly.

There's also the question of how the caller uses the return value. In all
likelihood, the caller branches on it, so making isWhitespace branchless
just moves the misprediction cost to the caller.

On x86, we should be aiming to produce the BT instruction. GIMPLE reassoc
nicely transforms multiple branches into a bit test, but unfortunately it
uses a right shift, while the RTL patterns match only a left shift, not a
right one. With hand-written code it's easy to make GCC produce BT as desired:

void is_ws_cb(unsigned char c, void f(void))
{
        unsigned long long mask = 1ll<<' ' | 1<<'\t' | 1<<'\r' | 1<<'\n';
        if (c <= 32 && (mask & (1ll<<c)))
            f();
}

        cmpb    $32, %dil
        ja      .L5
        movabsq $4294977024, %rax
        btq     %rdi, %rax
        jnc     .L5
        jmp     *%rsi
.L5:
        ret
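
For anyone who wants to convince themselves that the mask test is equivalent
to the obvious four-way comparison, here is a minimal sketch. The names
is_ws_ref, is_ws_mask and count_mismatches are made up for illustration, and
is_ws_mask returns the result rather than calling f as is_ws_cb does above:

```c
#include <stdbool.h>

/* Reference: the straightforward four-way comparison. */
static bool is_ws_ref(unsigned char c)
{
    return c == ' ' || c == '\t' || c == '\r' || c == '\n';
}

/* Same mask test as is_ws_cb, but returning the result (hypothetical helper). */
static bool is_ws_mask(unsigned char c)
{
    /* Bits 32, 9, 13 and 10 set: 4294977024, the movabsq constant. */
    const unsigned long long mask =
        1ull << ' ' | 1ull << '\t' | 1ull << '\r' | 1ull << '\n';

    /* The range check keeps the shift count below 64, so the shift is defined. */
    return c <= 32 && (mask & (1ull << c));
}

/* Number of byte values on which the two versions disagree. */
static int count_mismatches(void)
{
    int bad = 0;
    for (int c = 0; c < 256; c++)
        bad += is_ws_ref((unsigned char)c) != is_ws_mask((unsigned char)c);
    return bad;
}
```

count_mismatches() should return 0 if the two agree on every byte value.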

In PR 96633 I also outline what efficient branchless code could look like.
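
To give a rough idea of the general shape, here is one possible branchless
formulation in C. This is only a sketch and is not copied from the PR; the
name is_ws_branchless is made up, and whether the compiler actually emits
branch-free code for it depends on the target and optimization level:

```c
/* Hypothetical branchless variant: combine the range check and the bit
   test with a bitwise AND instead of a short-circuit &&. */
static unsigned is_ws_branchless(unsigned char c)
{
    const unsigned long long mask =
        1ull << ' ' | 1ull << '\t' | 1ull << '\r' | 1ull << '\n';

    /* Masking the count to 0..63 keeps the shift defined for every c. */
    unsigned bit = (unsigned)((mask >> (c & 63)) & 1);

    /* c <= 32 evaluates to 0 or 1, so this rejects aliases such as c == 73. */
    return bit & (unsigned)(c <= 32);
}
```

The range check is still needed because c & 63 maps bytes above 63 onto low
bit positions; for example 73 & 63 == 9, which would otherwise alias '\t'.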

Alexander
