On Mon, 30 Sep 2013, Vidya Praveen wrote:

> On Fri, Sep 27, 2013 at 04:19:45PM +0100, Vidya Praveen wrote:
> > On Fri, Sep 27, 2013 at 03:50:08PM +0100, Vidya Praveen wrote:
> > [...]
> > > > > I can't really insist on the single lane load.. something like:
> > > > >
> > > > > vc:V4SI[0] = c
> > > > > vt:V4SI = vec_duplicate:V4SI (vec_select:SI vc:V4SI 0)
> > > > > va:V4SI = vb:V4SI <op> vt:V4SI
> > > > >
> > > > > Or is there any other way to do this?
> > > >
> > > > Can you elaborate on "I can't really insist on the single lane load"?
> > > > What's the single lane load in your example?
> > >
> > > Loading just one lane of the vector like this:
> > >
> > > vc:V4SI[0] = c // from the above scalar example
> > >
> > > or
> > >
> > > vc:V4SI[0] = c[2]
> > >
> > > is what I meant by single lane load. In this example:
> > >
> > > t = c[2]
> > > ...
> > > vb:v4si = b[0:3]
> > > vc:v4si = { t, t, t, t }
> > > va:v4si = vb:v4si <op> vc:v4si
> > >
> > > If we are expanding the CONSTRUCTOR as vec_duplicate at vec_init, I
> > > cannot insist 't' to be vector and t = c[2] to be vect_t[0] = c[2]
> > > (which could be seen as vec_select:SI (vect_t 0) ).
> > >
> > > > I'd expect the instruction
> > > > pattern as quoted to just work (and I hope we expand an uniform
> > > > constructor { a, a, a, a } properly using vec_duplicate).
> > >
> > > As much as I went through the code, this is only done using vect_init.
> > > It is not expanded as vec_duplicate from, for example,
> > > store_constructor() of expr.c
> >
> > Do you see any issues if we expand such constructor as vec_duplicate
> > directly instead of going through vect_init way?
>
> Sorry, that was a bad question.
>
> But here's what I would like to propose as a first step. Please tell me
> if this is acceptable or if it makes sense:
>
> - Introduce standard pattern names
>
> "vmulim4" - vector multiply with second operand as indexed operand
>
> Example:
>
> (define_insn "vmuliv4si4"
>   [(set (match_operand:V4SI 0 "register_operand")
>         (mul:V4SI (match_operand:V4SI 1 "register_operand")
>                   (vec_duplicate:V4SI
>                     (vec_select:SI
>                       (match_operand:V4SI 2 "register_operand")
>                       (match_operand:V4SI 3 "immediate_operand")))))]
>   ...
> )

We could factor this by providing a standard pattern name for

(define_insn "vdupi<mode>"
  [(set (match_operand:<mode> 0 "register_operand")
        (vec_duplicate:<mode>
          (vec_select:<scalarmode>
            (match_operand:<mode> 1 "register_operand")
            (match_operand:SI 2 "immediate_operand"))))]

(you use V4SI for the immediate?  Ideally vdupi has another custom
mode for the vector index).

Note that this factored pattern is already available as vec_perm_const!
It is simply (vec_perm_const:V4SI <source> <source> <immediate-selector>).

Which means that on the GIMPLE level we should try to combine

  el_4 = BIT_FIELD_REF <v_3, ...>;
  v_5 = { el_4, el_4, ... };

into

  v_5 = VEC_PERM_EXPR <v_3, v_3, ...>;

which it should already do with simplify_permutation.

But I'm not sure what you are after in the end ;)

Richard.
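
A minimal sketch of the two equivalent forms discussed above, written with the C vector extensions rather than GIMPLE or RTL. The names dup_lane2_ctor, dup_lane2_perm, mul_by_lane2 and same_v4si are hypothetical, and lane 2 is chosen only to match the c[2] example. The sketch shows that the two forms compute the same value; it makes no claim about which instructions a given target emits for either.

```c
#include <stdbool.h>

/* Four 32-bit ints via the generic vector extension; this stands in
   for the V4SI mode used in the thread.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Broadcast lane 2 the "constructor" way: extract one element, then
   build the uniform vector { el, el, el, el } from it.  */
v4si
dup_lane2_ctor (v4si v)
{
  int el = v[2];
  return (v4si) { el, el, el, el };
}

/* Broadcast lane 2 as a permutation: both inputs are the same vector
   and the selector is the constant { 2, 2, 2, 2 }.  */
v4si
dup_lane2_perm (v4si v)
{
#if defined (__clang__)
  /* Clang spells the constant-selector shuffle differently.  */
  return __builtin_shufflevector (v, v, 2, 2, 2, 2);
#else
  const v4si sel = { 2, 2, 2, 2 };
  return __builtin_shuffle (v, v, sel);
#endif
}

/* Hypothetical helper modelling a multiply whose second operand is
   one lane of vc duplicated across the vector: vb * { vc[2], ... }.  */
v4si
mul_by_lane2 (v4si vb, v4si vc)
{
  return vb * dup_lane2_perm (vc);
}

/* Lane-by-lane comparison, used to check that both forms agree.  */
bool
same_v4si (v4si a, v4si b)
{
  for (int i = 0; i < 4; i++)
    if (a[i] != b[i])
      return false;
  return true;
}
```

Calling dup_lane2_ctor and dup_lane2_perm on the same input and comparing the results with same_v4si checks the equivalence at the source level.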