https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123238

--- Comment #12 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <[email protected]>:

https://gcc.gnu.org/g:ae80ad655d514d7c275b21f5d1ad155793ae4cc0

commit r17-1923-gae80ad655d514d7c275b21f5d1ad155793ae4cc0
Author: Roger Sayle <[email protected]>
Date:   Fri Jun 26 16:20:05 2026 +0100

    i386: ix86_expand_sse_movcc improvements

    This patch implements Alexander Monakov's suggestion from PR 123238.
    Traditionally, the x86_64 backend implements VCOND_MASK using a
    three-instruction sequence of pand, pandn and por (requiring three
    registers).  However, when op_true and op_false are both constant
    vectors, this can be done using just two instructions, pand and pxor
    (requiring only two registers).  This requires delaying the forcing of
    const_vector operands to memory (the constant pool) for as long as
    possible, including changing the predicates on the define_expand
    patterns that call ix86_expand_sse_movcc to (consistently) accept
    vector_or_const_vector_operand.
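
    The two-instruction form rests on a bitwise identity: for any mask,
    (mask & t) | (~mask & f) equals (mask & (t ^ f)) ^ f, and t ^ f can be
    folded at compile time when t and f are both constants.  The following
    is a minimal scalar sketch of that identity, not the code in
    ix86_expand_sse_movcc; the helper names are hypothetical and a
    uint64_t stands in for an 8-byte vector.

```c
#include <assert.h>
#include <stdint.h>

/* Three-operation select, mirroring pand/pandn/por. */
static uint64_t select_and_or(uint64_t mask, uint64_t t, uint64_t f)
{
    return (mask & t) | (~mask & f);
}

/* Two-operation select, mirroring pand/pxor.  When t and f are
   constants, t ^ f is itself a constant, so only the AND and the
   final XOR remain at run time. */
static uint64_t select_and_xor(uint64_t mask, uint64_t t, uint64_t f)
{
    const uint64_t diff = t ^ f;
    return (mask & diff) ^ f;
}

/* Compare both forms for every combination of all-ones/all-zeros
   byte lanes (hypothetical self-check helper). */
static void check_select_identity(uint64_t t, uint64_t f)
{
    for (unsigned lanes = 0; lanes < 256; lanes++) {
        uint64_t mask = 0;
        for (int i = 0; i < 8; i++)
            if (lanes & (1u << i))
                mask |= UINT64_C(0xff) << (8 * i);
        assert(select_and_or(mask, t, f) == select_and_xor(mask, t, f));
    }
}
```

    Calling check_select_identity with two broadcast byte constants
    exercises all 256 lane masks; the identity holds bit by bit, so it
    does not depend on the lane width.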

    void f(char c[])
    {
        for (int i = 0; i < 8; i++)
            c[i] = c[i] ? 'a' : 'c';
    }

    Before with -O2 (12 instructions):
    f:      movq    (%rdi), %xmm0
            pxor    %xmm1, %xmm1
            movabsq $7016996765293437281, %rdx  // {'a','a','a'...}
            movabsq $7161677110969590627, %rax  // {'c','c','c'...}
            movq    %rdx, %xmm2
            pcmpeqb %xmm1, %xmm0
            movq    %rax, %xmm1
            pand    %xmm0, %xmm1
            pandn   %xmm2, %xmm0
            por     %xmm1, %xmm0
            movq    %xmm0, (%rdi)
            ret

    After with -O2 (11 instructions):
    f:      movq    (%rdi), %xmm0
            pxor    %xmm1, %xmm1
            movabsq $144680345676153346, %rdx  // {2,2,2...}
            movabsq $7016996765293437281, %rax  // {'a','a','a'...}
            pcmpeqb %xmm1, %xmm0
            movq    %rdx, %xmm1
            pand    %xmm1, %xmm0
            movq    %rax, %xmm1
            pxor    %xmm1, %xmm0
            movq    %xmm0, (%rdi)
            ret
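
    The movabsq immediates above are single bytes replicated across eight
    lanes, and {2,2,2...} is 'a' ^ 'c'.  As a rough scalar model of the
    "After" sequence (assuming ASCII; broadcast_byte, f_reference and
    f_and_xor_model are hypothetical names used only for illustration):

```c
#include <stdint.h>
#include <string.h>

/* Replicate one byte into all eight lanes of a 64-bit value. */
static uint64_t broadcast_byte(uint8_t b)
{
    return b * UINT64_C(0x0101010101010101);
}

/* Reference: the loop from the commit message. */
static void f_reference(char c[])
{
    for (int i = 0; i < 8; i++)
        c[i] = c[i] ? 'a' : 'c';
}

/* Scalar model of the "After" instruction sequence. */
static void f_and_xor_model(char c[])
{
    uint64_t v, mask = 0;
    memcpy(&v, c, 8);                      /* movq (%rdi), %xmm0 */
    for (int i = 0; i < 8; i++)            /* pcmpeqb against zero */
        if (((v >> (8 * i)) & 0xff) == 0)
            mask |= UINT64_C(0xff) << (8 * i);
    v = mask & broadcast_byte('a' ^ 'c');  /* pand with {2,2,2...} */
    v ^= broadcast_byte('a');              /* pxor with {'a','a','a'...} */
    memcpy(c, &v, 8);                      /* movq %xmm0, (%rdi) */
}
```

    Lanes where c[i] is zero keep the 2 from the AND and become 'c' after
    the XOR; all other lanes are cleared by the AND and become 'a'.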

    2026-06-26  Roger Sayle  <[email protected]>
                Hongtao Liu  <[email protected]>

    gcc/ChangeLog
            PR target/123238
            * config/i386/i386-expand.cc (ix86_expand_sse_movcc): Delay
            calling force_reg on op_true and op_false.  Generate an AND
            then XOR sequence if op_true and op_false are both
            CONST_VECTOR_P.
            * config/i386/mmx.md (vcond_mask_<mode>v4hi): Allow operands
            1 and 2 to be vector_or_const_vector_operand.
            (vcond_mask_<mode>v2hi): Likewise.
            (vcond_mask_<mode><mmxintvecmodelower>): Likewise.
            (vcond_mask_<mode><mode>): Likewise.
            * config/i386/sse.md (vcond_mask_<mode><sseintvecmodelower>):
            Likewise.
            (vcond_mask_<mode><sseintvecmodelower>): Likewise.
            (vcond_mask_v1tiv1ti): Likewise.
            (vcond_mask_<mode><sseintvecmodelower>): Likewise.
            (vcond_mask_<mode><sseintvecmodelower>): Likewise.
            * config/i386/predicates.md (vector_or_0_or_1s_operand): Delete
            predicate with no remaining uses.

    gcc/testsuite/ChangeLog
            PR target/123238
            * gcc.target/i386/pr123238-2.c: New test case.
