------- Comment #3 from jamborm at gcc dot gnu dot org 2009-05-21 16:02 -------
With the new SRA, the optimized dump looks like:

  D.6886_10 = {1, 1, 1, 1};
  D.6887_11 = VIEW_CONVERT_EXPR<vector long long int>(D.6886_10);
  D.6893_12 = VIEW_CONVERT_EXPR<vector int>(D.6887_11);
  D.6891_14 = __builtin_ia32_pcmpeqd128 (D.6893_12, D.6893_12);
  D.6890_15 = VIEW_CONVERT_EXPR<vector long long int>(D.6891_14);
  D.6897_16 = VIEW_CONVERT_EXPR<vector char>(D.6890_15);
  D.6896_17 = __builtin_ia32_pmovmskb128 (D.6897_16);
  D.6933_21 = D.6896_17 != 65535;
  return D.6933_21;

x is completely gone.  The (relevant) assembly output is

main:
        movdqa  .LC0, %xmm0
        pcmpeqd %xmm0, %xmm0
        pmovmskb        %xmm0, %eax
        cmpl    $65535, %eax
        pushl   %ebp
        setne   %al
        movl    %esp, %ebp
        movzbl  %al, %eax
        popl    %ebp
        ret

So even though I don't really understand the SSE instructions I
believe the new SRA does indeed help.  I'll add a testcase checking
that x vanishes to the patch series as I am finalizing the final patch
set now.

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40122
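
For readers who, like the comment author, are not familiar with the two SSE2 instructions in the dump, here is a rough scalar model in plain C of what the optimized code computes. This is only a sketch: the original test case source is not shown in this comment, the function names (model_pcmpeqd, model_pmovmskb, model_main) are made up for illustration, and the byte numbering assumes the usual little-endian layout of an x86 vector register.

```c
#include <stdint.h>

/* Scalar model of pcmpeqd: compare four 32-bit lanes pairwise;
   a lane becomes all ones if equal, all zeros otherwise. */
static void model_pcmpeqd(const uint32_t a[4], const uint32_t b[4],
                          uint32_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (a[i] == b[i]) ? 0xFFFFFFFFu : 0u;
}

/* Scalar model of pmovmskb: collect the most significant bit of each
   of the 16 bytes into a 16-bit mask (byte 0 maps to bit 0). */
static unsigned model_pmovmskb(const uint32_t v[4])
{
    unsigned mask = 0;
    for (int i = 0; i < 16; i++) {
        unsigned byte = (v[i / 4] >> (8 * (i % 4))) & 0xFFu;
        mask |= (byte >> 7) << i;
    }
    return mask;
}

/* Hypothetical stand-in for the optimized main in the dump:
   the constant {1, 1, 1, 1} is compared with itself. */
static int model_main(void)
{
    const uint32_t v[4] = {1, 1, 1, 1};
    uint32_t eq[4];

    model_pcmpeqd(v, v, eq);
    return model_pmovmskb(eq) != 65535;
}
```

Under this model, comparing the vector with itself sets every lane to all ones, so the byte mask is 0xFFFF (65535) and model_main returns 0, which matches the cmpl $65535 / setne sequence in the assembly above.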