https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90483

--- Comment #4 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <[email protected]>:

https://gcc.gnu.org/g:a87cdfd2ca3260126d3c75ddfb5cdea6e721d8d0

commit r17-597-ga87cdfd2ca3260126d3c75ddfb5cdea6e721d8d0
Author: Roger Sayle <[email protected]>
Date:   Tue May 19 07:29:08 2026 -0400

    i386: Optimize ptestz(x,-1) as ptestz(x,x) on x86

    This patch, inspired by PR target/90483 and libstdc++/118416, implements
    some RTL expansion-time simplifications of ptest. A common idiom for
    testing a vector against zero is to use ptestz(mask,-1).  Alas, the code
    generated for this is suboptimal, as it requires materializing an all-ones
    vector.  Given that ptestz(x,y) is defined as (x & y) == 0, an equivalent
    form is ptestz(mask,mask), which saves an instruction whenever an all-ones
    vector isn't already available in a register.

    Consider the function:

    typedef long long v2di __attribute__ ((__vector_size__ (16)));

    int foo (v2di x)
    {
      return __builtin_ia32_ptestz128(x,~(v2di){0,0});
    }

    with -O2 -mavx2, GCC currently generates:

    foo:    vpcmpeqd        %xmm1, %xmm1, %xmm1
            xorl    %eax, %eax
            vptest  %xmm1, %xmm0
            sete    %al
            ret

    with this patch, it now generates:

    foo:    xorl    %eax, %eax
            vptest  %xmm0, %xmm0
            sete    %al
            ret

    2026-05-19  Roger Sayle  <[email protected]>

    gcc/ChangeLog
            PR target/90483
            PR libstdc++/118416
            * config/i386/i386-expand.cc (ix86_expand_sse_ptest):  Refactor
            with optimizations for PTESTZ*, PTESTC* and PTESTNZC*, including
            transforming ptestz(x,-1) into ptestz(x,x).

    gcc/testsuite/ChangeLog
            PR target/90483
            PR libstdc++/118416
            * gcc.target/i386/sse4_1-ptest-8.c: New test case.
            * gcc.target/i386/sse4_1-ptest-9.c: Likewise.
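
The equivalence the commit message relies on, (x & -1) == 0 exactly when
(x & x) == 0, can be checked with a small scalar model. The sketch below is
an illustration only and is not taken from the patch: the vec128 struct and
the model_ptestz and forms_agree helpers are hypothetical names, and the
model covers just the "zero" result of ptestz as defined above, not the
real instruction or the PTESTC/PTESTNZC variants.

```c
#include <stdint.h>

/* Hypothetical scalar stand-in for a 128-bit vector: two 64-bit lanes. */
typedef struct { uint64_t lo, hi; } vec128;

/* Models ptestz(x,y) as defined in the commit message:
   returns 1 when (x & y) is all zeros, otherwise 0. */
static int model_ptestz(vec128 x, vec128 y)
{
    return ((x.lo & y.lo) | (x.hi & y.hi)) == 0;
}

/* Returns 1 when ptestz(x,-1) and ptestz(x,x) give the same answer for x. */
static int forms_agree(vec128 x)
{
    const vec128 all_ones = { UINT64_MAX, UINT64_MAX };
    return model_ptestz(x, all_ones) == model_ptestz(x, x);
}
```

Since x & -1 and x & x both reduce to x, forms_agree should hold for every
input, which is why the all-ones operand never needs to be materialized.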
