On Tue, Feb 17, 2026 at 3:51 PM Roger Sayle <[email protected]> wrote:
>
>
> Perhaps the easiest way to demonstrate that tree-ssa's isel pass isn't a
> replacement for GCC's RTL expansion pass is with a concrete example where
> ISEL hurts performance (on x86_64), which should be unsurprising given
> that gimple-isel.cc doesn't once mention rtx_costs (or cost).
>
> Consider the example:
>
> void foo(char c[])
> {
>     for (int i = 0; i < 16; i++)
>         c[i] = c[i] != 'a';
> }
>
> Currently, when compiled with -O2 -mavx2, this generates:
>
> foo:    movl    $1633771873, %eax
>         vpxor   %xmm1, %xmm1, %xmm1
>         vmovd   %eax, %xmm0
>         vpbroadcastd    %xmm0, %xmm0
>         vpcmpeqb        (%rdi), %xmm0, %xmm0
>         vpcmpeqb        %xmm1, %xmm0, %xmm0
>         vpcmpeqd        %xmm1, %xmm1, %xmm1
>         vpabsb  %xmm1, %xmm1
>         vpand   %xmm1, %xmm0, %xmm0
>         vmovdqu %xmm0, (%rdi)
>         ret
>
> With the attached patch, applied on top of the previously
> posted https://gcc.gnu.org/pipermail/gcc-patches/2026-February/708351.html
> we generate the improved sequence:
>
>         movl    $1633771873, %eax
>         vpxor   %xmm1, %xmm1, %xmm1
>         vmovd   %eax, %xmm0
>         vpbroadcastd    %xmm0, %xmm0
>         vpcmpeqb        (%rdi), %xmm0, %xmm0
>         vpcmpeqb        %xmm1, %xmm0, %xmm0
>         vpabsb  %xmm0, %xmm0
>         vmovdqu %xmm0, (%rdi)
>         ret
>
> The difference is that to convert a vector of 0 and -1 values to a
> vector of 0 and 1 values, we don't use AND as in "cond & {1,1,1,1...}"
> but can use (in this case) ABS or a vector logical right shift when
> available.  Using vpabsb directly is clearly faster: materializing the
> vector "{1,1,1,1,1...}" already takes a vpcmpeqd and a vpabsb before the
> vpand, so the new sequence is two instructions shorter.
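For illustration, here is a minimal scalar C sketch that models each 8-bit lane of the vector sequences above. It is not code from the patch, and all function names are hypothetical. It checks that AND with 1, ABS, and a logical right shift by (element width - 1) all map a mask lane of 0 or -1 to 0 or 1. It then emulates the improved sequence lane by lane and compares the result with the original loop. The constant 1633771873 is 0x61616161, i.e. four 'a' bytes that vpbroadcastd replicates. x86 has no byte-granularity vector shift, which is presumably why the byte case uses vpabsb rather than a shift.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Three ways to turn a comparison-mask lane (0 or -1) into 0 or 1.
   They agree ONLY for mask values, hence the precondition asserts.  */
static int and_one(int8_t m)  { assert(m == 0 || m == -1); return m & 1; }
static int abs_lane(int8_t m) { assert(m == 0 || m == -1); return abs(m); }
/* Logical (unsigned) right shift by element width - 1.  */
static int shr8(int8_t m)     { assert(m == 0 || m == -1); return (uint8_t)m >> 7; }
static int shr16(int16_t m)   { assert(m == 0 || m == -1); return (uint16_t)m >> 15; }
static int shr32(int32_t m)   { assert(m == 0 || m == -1); return (int)((uint32_t)m >> 31); }

/* Returns 1 if every strategy agrees for both possible mask values.  */
static int strategies_agree(void)
{
    for (int v = -1; v <= 0; v++) {
        int expect = (v != 0);
        if (and_one((int8_t)v) != expect || abs_lane((int8_t)v) != expect
            || shr8((int8_t)v) != expect || shr16((int16_t)v) != expect
            || shr32((int32_t)v) != expect)
            return 0;
    }
    return 1;
}

/* Reference: the scalar loop from foo().  */
static void foo_ref(char c[16])
{
    for (int i = 0; i < 16; i++)
        c[i] = c[i] != 'a';
}

/* Lane-by-lane model of the improved x86 sequence.  */
static void foo_abs(char c[16])
{
    for (int i = 0; i < 16; i++) {
        int8_t eq = (c[i] == 'a') ? -1 : 0; /* vpcmpeqb against broadcast 'a' */
        int8_t ne = (eq == 0) ? -1 : 0;     /* vpcmpeqb against zero: invert  */
        c[i] = (char)abs_lane(ne);          /* vpabsb: -1 -> 1, 0 -> 0        */
    }
}

/* Runs both versions on copies of 16 input bytes and compares them.  */
static int foo_matches(const char in[16])
{
    char a[16], b[16];
    memcpy(a, in, 16);
    memcpy(b, in, 16);
    foo_ref(a);
    foo_abs(b);
    return memcmp(a, b, 16) == 0;
}
```

The asserts encode the precondition that these identities hold only for comparison masks. For an arbitrary byte value, `x & 1`, `abs(x)` and a logical `x >> 7` generally differ.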
>
> Unfortunately, the i386-expand.cc change (which understands the various
> instruction availabilities and implicit costs) on its own is insufficient,
> because isel's gimple_expand_vec_cond_expr blindly lowers IFN_VCOND_MASK
> without letting expand or the target backend decide on the best possible
> implementation.  The patch removes these premature optimizations (the
> root of all evil).  As an aside, I suspect that one cause for confusion is the
> poor naming; the "isel" pass has little to do with "instruction selection",
> so perhaps internal-fn-lowering or similar would be better.  Even the
> comment at the top of gimple-isel describes it as "Schedule GIMPLE
> vector statements".  Perhaps once tree-ssa has a way of querying the
> backend for instruction costs things will improve, but until then RTL
> expansion makes far more sense.
>
> This patch has been tested (on top of the patch mentioned above) on
> x86_64-pc-linux-gnu with make bootstrap and make -k check, both with
> and without --target_board=unix{-m32} with no new failures.
> Thoughts?  (Both) Ok for stage1?
>
>
> 2026-02-17  Roger Sayle  <[email protected]>
>
> gcc/ChangeLog
>         * config/i386/i386-expand.cc (ix86_expand_sse_movcc): Optimize
>         case where op_false is a vector of zeros, and op_true is a vector
>         of ones, using either vector logical right shifts or vector ABS.
>         * gimple-isel.cc (gimple_expand_vec_cond_expr): Always lower
>         VEC_COND_EXPR to IFN_VCOND_MASK.  Remove the "optimization" of
>         special cases as these are best performed (by the backend) during
>         RTL expansion.

Removing this "optimization" means you need to add the optimizations
to the other backends (at least RISCV, aarch64 and PowerPC) and it
will definitely regress aarch64 and powerpc testcases (it was added
for powerpc and later also used for aarch64 when I changed
aarch64's pattern names to use andn).

Thanks,
Andrew

>
>
> Roger
> --
>
