http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56876



             Bug #: 56876
           Summary: Combine does not invent new moves
    Classification: Unclassified
           Product: gcc
           Version: 4.9.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: enhancement
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: gli...@gcc.gnu.org
            Target: x86_64-linux-gnu





Hello,



I am looking at this testcase:



typedef unsigned long long vec __attribute__((vector_size(16)));
vec g;
vec f1(vec a, vec b){
  return ~a&b;
}
vec f2(vec a, vec b){
  return ~g&b;
}



which compiles to:



f1:
    pandn    %xmm1, %xmm0

f2:
    pcmpeqd    %xmm0, %xmm0
    pxor    g(%rip), %xmm0
    pand    %xmm1, %xmm0



whereas I would like to get, like I do with the _mm_andnot_si128 builtin:



    movdqa    g(%rip), %xmm0
    pandn    %xmm1, %xmm0



It seems that combine cannot match the pandn pattern because the first operand
is a memory load and not a register. In this case, it would be better if it
emitted a move to put the value in a register so the pattern can match, instead
of giving up. I don't know if there is a good way to characterize the
situations where such an extra move is worth it.
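
For comparison, here is a minimal sketch of the intrinsic variant referred to above, next to the generic vector form of f2. The names f2_intrinsic and forms_agree are hypothetical and only serve this illustration. The casts between vec and __m128i assume GCC's same-size vector conversions. Both functions should compute the same value; the difference described in this report is only in the instructions generated.

```c
#include <emmintrin.h>  /* SSE2: __m128i, _mm_andnot_si128 */
#include <string.h>     /* memcmp */

typedef unsigned long long vec __attribute__((vector_size(16)));

vec g;

/* Generic vector form from the testcase above. */
vec f2(vec a, vec b) {
  (void)a;  /* unused, kept to match the testcase signature */
  return ~g & b;
}

/* Intrinsic form (hypothetical name): _mm_andnot_si128(x, y)
   computes (~x) & y, so this is the same operation as f2. */
vec f2_intrinsic(vec a, vec b) {
  (void)a;
  return (vec)_mm_andnot_si128((__m128i)g, (__m128i)b);
}

/* Hypothetical helper: sets g, then returns 1 when both forms
   produce identical 16-byte results for the given b. */
int forms_agree(vec gval, vec b) {
  g = gval;
  vec x = f2(b, b);
  vec y = f2_intrinsic(b, b);
  return memcmp(&x, &y, sizeof x) == 0;
}
```

Compiling both functions with -O2 -S and comparing the output for f2 and f2_intrinsic is one way to check whether a given GCC version still shows the difference.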
