------- Comment #3 from jamborm at gcc dot gnu dot org  2009-05-21 16:02 -------
With the new SRA, the optimized dump looks like this:

  D.6886_10 = {1, 1, 1, 1};
  D.6887_11 = VIEW_CONVERT_EXPR<vector long long int>(D.6886_10);
  D.6893_12 = VIEW_CONVERT_EXPR<vector int>(D.6887_11);
  D.6891_14 = __builtin_ia32_pcmpeqd128 (D.6893_12, D.6893_12);
  D.6890_15 = VIEW_CONVERT_EXPR<vector long long int>(D.6891_14);
  D.6897_16 = VIEW_CONVERT_EXPR<vector char>(D.6890_15);
  D.6896_17 = __builtin_ia32_pmovmskb128 (D.6897_16);
  D.6933_21 = D.6896_17 != 65535;
  return D.6933_21;


x is completely gone.

The relevant assembly output is:

main:
        movdqa  .LC0, %xmm0
        pcmpeqd %xmm0, %xmm0
        pmovmskb        %xmm0, %eax
        cmpl    $65535, %eax
        pushl   %ebp
        setne   %al
        movl    %esp, %ebp
        movzbl  %al, %eax
        popl    %ebp
        ret

So even though I don't really understand the SSE instructions, I
believe the new SRA does indeed help.  I'll add a testcase checking
that x vanishes to the patch series, as I am finalizing the patch set
now.
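
For readers equally unsure about the two SSE instructions, here is a
rough portable C sketch of what the dump above computes.  It is not the
testcase from this PR; the function names (emu_pcmpeqd, emu_pmovmskb,
emu_main_result) are made up for illustration, and the instructions are
emulated on plain arrays rather than real vector registers, assuming the
usual x86 little-endian byte order within each 32-bit lane.

```c
#include <stdint.h>

/* Emulates pcmpeqd: each 32-bit lane of the result is all ones when the
   corresponding lanes of a and b are equal, and zero otherwise. */
void emu_pcmpeqd(const uint32_t a[4], const uint32_t b[4], uint32_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (a[i] == b[i]) ? 0xFFFFFFFFu : 0u;
}

/* Emulates pmovmskb: collects the most significant bit of each of the
   16 bytes into bits 0..15 of the result. */
int emu_pmovmskb(const uint32_t v[4])
{
    int mask = 0;
    for (int i = 0; i < 16; i++) {
        /* Byte i of the register is byte (i % 4) of lane (i / 4). */
        uint8_t byte = (uint8_t)(v[i / 4] >> (8 * (i % 4)));
        mask |= (byte >> 7) << i;
    }
    return mask;
}

/* Mirrors the optimized dump: compare {1, 1, 1, 1} with itself, take
   the byte mask, and test it against 65535. */
int emu_main_result(void)
{
    const uint32_t x[4] = {1, 1, 1, 1};
    uint32_t eq[4];

    emu_pcmpeqd(x, x, eq);
    return emu_pmovmskb(eq) != 65535;
}
```

Comparing a vector with itself sets every lane to all ones, so the byte
mask has all 16 bits set (65535) and the function returns 0, which is
what the setne/movzbl sequence in the assembly produces.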


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40122
