http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52607

--- Comment #9 from Marc Glisse <marc.glisse at normalesup dot org> 2012-03-19 18:29:50 UTC ---
(In reply to comment #8)
> I'm not very keen on having too many different routines, the more generic they
> are, the better.

Agreed, that was one of my concerns from the first message in this bug, but to
experiment it was easier to have separate functions.

> So IMHO e.g. the two insn sequence, vperm2[if]128 + some one
> insn shuffle could look like:
> 
> /* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to expand
>    a vector permutation using two instructions, vperm2f128 resp.
>    vperm2i128 followed by any single in-lane permutation.  */

I haven't yet looked at it closely enough to understand what it does (those
functions are surprisingly confusing when you don't write them yourself), but
that looks interesting.

My first idea in order to make things more generic was to tentatively turn
__builtin_shuffle(x,m) into __builtin_shuffle(x,vperm2f128(x,x,33),mm) where mm
avoids any cross-lane movement. The 2-vector no-cross-lane shuffle should take at most 3
instructions in v4df or v8sf (I haven't checked if it works now) and that's
where most of the work would happen (instead of having many routines for
single-vector shuffles that almost all start with vperm2f128). Then you would
probably want to check how many instructions it used, since it could be more or
less than one of the few instruction sequences that don't start with
vperm2f128.
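
To make the rewrite above concrete, here is a small scalar model in C for the
v4df case. It is only a sketch: the helper names (vperm2f128_model,
make_inlane_mask, mask_is_inlane, shuffle2_model) are made up for illustration
and are not GCC functions, and the two-operand mask convention (indices 0..3
pick from the first operand, 4..7 from the second) is assumed to match
__builtin_shuffle. It shows that immediate 33 swaps the two 128-bit lanes and
how mm can be derived from m so that no element crosses a lane.

```c
#include <assert.h>

#define NELT 4            /* v4df: four doubles in two 128-bit lanes */
#define LANE (NELT / 2)   /* elements per 128-bit lane */

/* Scalar model of vperm2f128: for each destination lane, imm bits [1:0]
   (low lane) or [5:4] (high lane) pick one of the four source lanes, and
   bit 3 or bit 7 zeroes that destination lane.  */
static void vperm2f128_model(const double a[NELT], const double b[NELT],
                             unsigned imm, double out[NELT])
{
  for (unsigned lane = 0; lane < 2; lane++) {
    unsigned ctl = (imm >> (4 * lane)) & 0xf;
    const double *src = (ctl & 2) ? b : a;
    unsigned half = ctl & 1;
    for (unsigned k = 0; k < LANE; k++)
      out[lane * LANE + k] = (ctl & 8) ? 0.0 : src[half * LANE + k];
  }
}

/* Hypothetical helper: turn a one-operand mask m into a two-operand mask mm
   for (x, y), where y is x with its lanes swapped.  An element already in
   the destination's lane is taken from x; otherwise the same element is
   taken from y, where it sits at index src ^ LANE.  */
static void make_inlane_mask(const unsigned m[NELT], unsigned mm[NELT])
{
  for (unsigned i = 0; i < NELT; i++) {
    unsigned src = m[i] % NELT;
    if (src / LANE == i / LANE)
      mm[i] = src;
    else
      mm[i] = NELT + (src ^ LANE);
  }
}

/* Returns 1 if no element selected by mm crosses a 128-bit lane.  */
static int mask_is_inlane(const unsigned mm[NELT])
{
  for (unsigned i = 0; i < NELT; i++)
    if ((mm[i] % NELT) / LANE != i / LANE)
      return 0;
  return 1;
}

/* Scalar model of a two-operand shuffle with the mask convention above.  */
static void shuffle2_model(const double a[NELT], const double b[NELT],
                           const unsigned mm[NELT], double out[NELT])
{
  for (unsigned i = 0; i < NELT; i++) {
    assert(mm[i] < 2 * NELT);
    out[i] = mm[i] < NELT ? a[mm[i]] : b[mm[i] - NELT];
  }
}
```

For example, m = {3,0,1,2} becomes mm = {5,0,7,2}: elements 0 and 2 of the
result come from the lane-swapped copy, elements 1 and 3 directly from x. The
model says nothing about how many instructions the resulting two-vector
in-lane shuffle needs; that part would still have to be checked.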

From a quick look, it looks like you may be doing something even more
generic...

> This will handle e.g. vperm2f128 + {vshufpd,vblendpd,vunpcklpd,vunpckhpd} etc.

Cool!
