On Thu, Sep 15, 2011 at 8:05 PM, Richard Henderson <r...@redhat.com> wrote:
>> +The elements of the input vectors are numbered from left to right across
>> +one or both of the vectors.  Each element in the mask specifies a number
>> +of element from the input vector(s).  Consider the following example.
>
> It would be more preferable to talk about the memory ordering of the
> elements rather than "left" and "right" which are ambiguous at best.
>
>> +      if (TREE_CODE (mask) == VECTOR_CST)
>> +        {
>> +          tree m_type, call;
>> +          tree fn = targetm.vectorize.builtin_vec_perm (TREE_TYPE (v0),
>> +                                                        &m_type);
>> +          /*rtx t;*/
>
> Leftover crap.
Fixed.

>> +
>> +          if (!fn)
>> +            goto vshuffle;
>> +
>> +          if (m_type != TREE_TYPE (TREE_TYPE (mask)))
>> +            {
>> +              int units = TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask));
>> +              tree cvt = build_vector_type (m_type, units);
>> +              mask = fold_convert (cvt, mask);
>> +            }
>> +
>> +          call = fold_build1 (ADDR_EXPR, build_pointer_type (TREE_TYPE (fn)),
>> +                              fn);
>> +          call = build_call_nary (type /* ? */, call, 3, v0, v1, mask);
>> +
>> +          return expand_expr_real_1 (call, target, VOIDmode, EXPAND_NORMAL,
>> +                                     NULL);
>> +        }
>> +
>> +    vshuffle:
>> +      gcc_assert (operand_equal_p (v0, v1, 0));
>
> Why can't a non-constant shuffle have different V0 and V1?  That seems
> like a direct violation of the documentation, and any sort of usefulness.

OK, I agree.  The assert is there because nothing in the middle-end
currently generates code that violates it.  We definitely want to support
v0 != v1 in the upcoming patches, but it would be nice to start with the
simple case first.

> Also, while I'm ok with the use of builtin_vec_perm here in the short
> term, I think that in the long term we should simply force the named
> pattern to handle constants.  Then the vectorizer can simply use the
> predicate and the tree code and we can drop the large redundancy of
> builtins with different argument types.
>
> Indeed, once this patch is applied, I think that ought to be the very
> next task in this domain.
>
>> +/* Vector shuffle expression.  A = VEC_SHUFFLE_EXPR<v0, v1, maks>
>
> Typo in "mask".
>
>> +   foreach i in length (mask):
>> +     A = mask[i] < length (v0) ? v0[mask[i]] : v1[mask[i]]
>
> Surely it's v1[mask[i] - length].
>
>> +  if (TREE_CODE (vect) == VECTOR_CST)
>> +    {
>> +      unsigned i;
>
> Indentation is off all through this function.

Fixed.

>> +  mask = gen_rtx_AND (maskmode, mask, mm);
>> +
>> +  /* Convert mask to vector of chars.  */
>> +  mask = simplify_gen_subreg (V16QImode, mask, maskmode, 0);
>> +  mask = force_reg (V16QImode, mask);
>
> Why are you using force_reg to do all the dirty work?  Seems to
> me this should be using expand_normal.  All throughout this
> function.  That would also avoid the need for all of the extra
> force_reg stuff that ought not be there for -O0.

I don't quite follow.  As far as I know, expand_normal converts a tree to
rtx, but all my computations here happen at the RTL level, and force_reg
is needed only to bring an rtx expression into a register of the correct
mode.  If I am missing something, could you give an example of how I
could use expand_normal instead of force_reg in this particular code?

> I also see that you're not even attempting to use xop_pperm.

As I said, I am happy to experiment with the v0 != v1 cases in the
upcoming patches.  Let's start with the simple case and see what kind of
issues it brings.

> Is ssse3_pshufb why you do the wrong thing in the expander for v0 != v1?

My feeling is that in the v0 != v1 case it may be more efficient to
perform piecewise shuffling rather than bitwise dances around the masks.

> And give the vshuffle named pattern the wrong number of arguments?

If I make vshuffle accept only two arguments -- a vector and a mask --
would that be OK?

> It's certainly possible to handle it, though it takes a few more steps,
> and might well be more efficient as a libgcc function rather than inline.

I don't see why that would be more efficient.  I thought inlining gives
the final RTL optimisations more opportunities.

>
> r~

Thanks,
Artem.