> +The elements of the input vectors are numbered from left to right across
> +one or both of the vectors. Each element in the mask specifies a number
> +of element from the input vector(s). Consider the following example.

It would be more preferable to talk about the memory ordering of the
elements rather than "left" and "right" which are ambiguous at best.

> +  if (TREE_CODE (mask) == VECTOR_CST)
> +    {
> +      tree m_type, call;
> +      tree fn = targetm.vectorize.builtin_vec_perm (TREE_TYPE (v0), &m_type);
> +      /*rtx t;*/

Leftover crap.

> +
> +      if (!fn)
> +     goto vshuffle;
> +
> +      if (m_type != TREE_TYPE (TREE_TYPE (mask)))
> +     {
> +       int units = TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask));
> +       tree cvt = build_vector_type (m_type, units);
> +       mask = fold_convert (cvt, mask);
> +     }
> +
> +      call = fold_build1 (ADDR_EXPR, build_pointer_type (TREE_TYPE (fn)), 
> fn);
> +      call = build_call_nary (type /* ? */, call, 3, v0, v1, mask);
> +
> +      return expand_expr_real_1 (call, target, VOIDmode, EXPAND_NORMAL, 
> NULL);
> +    }
> +
> +vshuffle:
> +  gcc_assert (operand_equal_p (v0, v1, 0));

Why can't a non-constant shuffle have different V0 and V1?  That seems
like a direct violation of the documentation, and any sort of usefulness.

Also, while I'm ok with the use of builtin_vec_perm here in the short
term, I think that in the long term we should simply force the named
pattern to handle constants.  Then the vectorizer can simply use the
predicate and the tree code and we can drop the large redundancy of
builtins with different argument types.

Indeed, once this patch is applied, I think that ought to be the very
next task in this domain.

> +/* Vector shuffle expression.  A = VEC_SHUFFLE_EXPR<v0, v1, maks>

Typo in "mask".

> +   foreach i in length (mask):
> +     A = mask[i] < length (v0) ? v0[mask[i]] : v1[mask[i]]

Surely it's v1[mask[i] - length].

> +      if (TREE_CODE (vect) == VECTOR_CST)
> +        {
> +            unsigned i;

Indentation is off all through this function.

> +  mask = gen_rtx_AND (maskmode, mask, mm);
> +  
> +  /* Convert mask to vector of chars.  */
> +  mask = simplify_gen_subreg (V16QImode, mask, maskmode, 0);
> +  mask = force_reg (V16QImode, mask);

Why are you using force_reg to do all the dirty work?  Seems to
me this should be using expand_normal.  All throughout this 
function.  That would also avoid the need for all of the extra
force_reg stuff that ought not be there for -O0.

I also see that you're not even attempting to use xop_pperm.

Is ssse3_pshufb why you do the wrong thing in the expander for v0 != v1?
And give the vshuffle named pattern the wrong number of arguments?
It's certainly possible to handle it, though it takes a few more steps,
and might well be more efficient as a libgcc function rather than inline.


r~


Reply via email to