"H. J. Lu" <[EMAIL PROTECTED]> wrote on 24/04/2007 01:03:25:
...
>
> There are
>
> [EMAIL PROTECTED] vect]$ cat pmovzxbw.c
> typedef unsigned char vec_t;
> typedef unsigned short vecx_t;
>
> in
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31667
>

By the way, this PR says "Integer extensions aren't vectorized" - but I
think the testcase you are referring to does get vectorized, only not as
efficiently as you would want it to (right?).

> > * Also I wonder what the gcc code looks like when complete unrolling is
> > applied (did you use -funroll-loops?). (Like the point above, this is
> > just so that we compare apples with apples.)
>
> It is similar. I am enclosing it at the end.
>

thanks

> > * I don't entirely follow the code that gcc generates
> > what's that for exactly?:
> >       pxor    %xmm2, %xmm2
> >       movdqa  %xmm2, %xmm1
> >       pcmpgtb %xmm0, %xmm1
> > Is this part of the vec_unpack_hi, and if so - I wonder if there's a
> > better way to model the vec_unpack_hi using the new SSE4 instructions?
>
> That is for sign extension.  I tried to model vec_unpack_hi with SSE4.
> It isn't easy to move the N/2 high elements to the N/2 low elements.

just curious - why is it difficult? (couldn't you use a psrldq? is it too
expensive?)

dorit

> The best way
> to do it is to combine
>
>    movdqa  x(%rip), %xmm9
>         pmovsxbw   %xmm9, %xmm11
>
> into
>
>    pmovsxbw x(%rip),%xmm11
>
> and repeat it for N/2 elements. Of course, we should only do it if
> vec_unpack_lo is a single instruction.
>
> However, I think we need a more general approach based on the number
> of elements in the resulting vector to handle vec_extend, like:
>
> V4QI -> V4SI
> V2QI -> V2DI
> V2HI -> V2DI
>
> They should be independent of vec_unpack.
>
>
> H.J.
> ----
>    .file   "pmovsxbw.c"
>    .text
>    .p2align 4,,15
> .globl foo
>    .type   foo, @function
> foo:
>    pxor   %xmm2, %xmm2
>    movdqa   x(%rip), %xmm9
>    movdqa   x+16(%rip), %xmm6
>    movdqa   %xmm2, %xmm10
>    movdqa   %xmm2, %xmm7
>    movdqa   x+32(%rip), %xmm3
>    movdqa   %xmm2, %xmm4
>    pmovsxbw   %xmm9, %xmm11
>    movdqa   x+48(%rip), %xmm0
>    pcmpgtb   %xmm9, %xmm10
>    pcmpgtb   %xmm6, %xmm7
>    pmovsxbw   %xmm6, %xmm8
>    pcmpgtb   %xmm3, %xmm4
>    pmovsxbw   %xmm3, %xmm5
>    pcmpgtb   %xmm0, %xmm2
>    pmovsxbw   %xmm0, %xmm1
>    punpckhbw   %xmm10, %xmm9
>    punpckhbw   %xmm7, %xmm6
>    punpckhbw   %xmm4, %xmm3
>    punpckhbw   %xmm2, %xmm0
>    movdqa   %xmm11, y(%rip)
>    movdqa   %xmm9, y+16(%rip)
>    movdqa   %xmm8, y+32(%rip)
>    movdqa   %xmm6, y+48(%rip)
>    movdqa   %xmm5, y+64(%rip)
>    movdqa   %xmm3, y+80(%rip)
>    movdqa   %xmm1, y+96(%rip)
>    movdqa   %xmm0, y+112(%rip)
>    ret
>    .size   foo, .-foo
>    .ident   "GCC: (GNU) 4.3.0 20070423 (experimental) [trunk revision 124056]"
>    .section   .note.GNU-stack,"",@progbits
