On Mon, Sep 09, 2013 at 07:02:52PM +0100, Marc Glisse wrote:
> On Mon, 9 Sep 2013, Vidya Praveen wrote:
>
> > Hello,
> >
> > This post details some thoughts on an enhancement to the vectorizer that
> > could take advantage of SIMD instructions that allow an indexed element
> > as an operand, thus reducing the need for duplication and possibly
> > improving reuse of previously loaded data.
> >
> > Appreciate your opinion on this.
> >
> > ---
> >
> > A phrase like this:
> >
> >   for(i=0;i<4;i++)
> >     a[i] = b[i] <op> c[2];
> >
> > is usually vectorized as:
> >
> >   va:V4SI = a[0:3]
> >   vb:V4SI = b[0:3]
> >   t = c[2]
> >   vc:V4SI = { t, t, t, t } // typically expanded as vec_duplicate at vec_init
> >   ...
> >   va:V4SI = vb:V4SI <op> vc:V4SI
> >
> > But this could be simplified further if a target has instructions that
> > support an indexed element as a parameter. For example an instruction
> > like this:
> >
> >   mul v0.4s, v1.4s, v2.4s[2]
> >
> > can perform multiplication of each element of v1.4s with the third element
> > of v2.4s (specified as v2.4s[2]) and store the results in the corresponding
> > elements of v0.4s.
> >
> > For this to happen, the vectorizer needs to understand this idiom and
> > treat the operand c[2] specially (taking into consideration whether the
> > machine supports an indexed element as an operand for <op>, through a
> > target hook or macro) and consider this a vectorizable statement without
> > having to duplicate the elements explicitly.
> >
> > There are a few ways this could be represented at gimple:
> >
> >   ...
> >   va:V4SI = vb:V4SI <op> VEC_DUPLICATE_EXPR (VEC_SELECT_EXPR (vc:V4SI 2))
> >   ...
> >
> > or by allowing the vectorizer to treat an indexed element as a valid
> > operand in a vectorizable statement:
>
> Might as well allow any scalar then...

Yes, I had given an example below.

> >   ...
> >   va:V4SI = vb:V4SI <op> VEC_SELECT_EXPR (vc:V4SI 2)
> >   ...
> >
> > For the sake of explanation, the above two representations assume that
> > c[0:3] is loaded in vc for some other use and reused here. But when c[2]
> > is the only use of 'c' then it may be safer to just load one element and
> > use it like this:
> >
> >   vc:V4SI[0] = c[2]
> >   va:V4SI = vb:V4SI <op> VEC_SELECT_EXPR (vc:V4SI 0)
> >
> > This could also mean that expressions involving a scalar could be treated
> > similarly. For example,
> >
> >   for(i=0;i<4;i++)
> >     a[i] = b[i] <op> c
> >
> > could be vectorized as:
> >
> >   vc:V4SI[0] = c
> >   va:V4SI = vb:V4SI <op> VEC_SELECT_EXPR (vc:V4SI 0)
> >
> > Such a change would also require new standard pattern names to be defined
> > for each <op>.
> >
> > Alternatively, having something like this:
> >
> >   ...
> >   vt:V4SI = VEC_DUPLICATE_EXPR (VEC_SELECT_EXPR (vc:V4SI 2))
> >   va:V4SI = vb:V4SI <op> vt:V4SI
> >   ...
> >
> > would remove the need to introduce several new standard pattern names but
> > have just one to represent vec_duplicate(vec_select()), but of course this
> > will expect the target to have combiner patterns.
>
> The cost estimation wouldn't be very good, but aren't combine patterns
> enough for the whole thing? Don't you model your mul instruction as:
>
> (mult:V4SI
>   (match_operand:V4SI)
>   (vec_duplicate:V4SI (vec_select:SI (match_operand:V4SI))))
>
> anyway? Seems that combine should be able to handle it. What currently
> happens that we fail to generate the right instruction?

At vec_init, I can recognize an idiom in order to generate vec_duplicate, but
I can't really insist on the single-lane load. Something like:

  vc:V4SI[0] = c
  vt:V4SI = vec_duplicate:V4SI (vec_select:SI vc:V4SI 0)
  va:V4SI = vb:V4SI <op> vt:V4SI

Or is there any other way to do this?

Cheers
VP

> In gimple, we already have BIT_FIELD_REF for vec_select and CONSTRUCTOR
> for vec_duplicate, adding new nodes is always painful.
>
> > This enhancement could possibly help further optimizing larger scenarios
> > such as linear systems.
> >
> > Regards
> > VP
>
> --
> Marc Glisse
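
To make the equivalence under discussion concrete, here is a minimal scalar C sketch of the two forms, with <op> taken to be multiplication. It is only an illustration, not GCC code: the type v4si and the functions mul_via_duplicate and mul_by_lane are hypothetical names standing in for a V4SI register, the vec_duplicate expansion, and an indexed-element instruction such as the mul example quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a V4SI vector register: four 32-bit lanes. */
typedef struct { int32_t lane[4]; } v4si;

/* Conventional form: build vc = { t, t, t, t } (vec_duplicate),
   then do an ordinary element-wise multiply. */
static v4si mul_via_duplicate(v4si vb, int32_t t)
{
    v4si vc, va;
    for (int i = 0; i < 4; i++)
        vc.lane[i] = t;                        /* explicit duplication */
    for (int i = 0; i < 4; i++)
        va.lane[i] = vb.lane[i] * vc.lane[i];
    return va;
}

/* Indexed-element form: every lane of vb is multiplied by one selected
   lane of vc, with no duplicated vector materialized. */
static v4si mul_by_lane(v4si vb, v4si vc, int idx)
{
    v4si va;
    assert(idx >= 0 && idx < 4);               /* lane index must be valid */
    for (int i = 0; i < 4; i++)
        va.lane[i] = vb.lane[i] * vc.lane[idx];
    return va;
}
```

Calling mul_via_duplicate(vb, vc.lane[2]) and mul_by_lane(vb, vc, 2) on the same inputs gives identical lanes; the second form simply skips the intermediate { t, t, t, t } vector, which is the saving the proposal is after.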