On Mon, Sep 09, 2013 at 07:02:52PM +0100, Marc Glisse wrote:
> On Mon, 9 Sep 2013, Vidya Praveen wrote:
> 
> > Hello,
> >
> > This post details some thoughts on an enhancement to the vectorizer that
> > could take advantage of SIMD instructions that allow an indexed element
> > as an operand, thus reducing the need for duplication and possibly improving
> > reuse of previously loaded data.
> >
> > Appreciate your opinion on this.
> >
> > ---
> >
> > A phrase like this:
> >
> > for(i=0;i<4;i++)
> >   a[i] = b[i] <op> c[2];
> >
> > is usually vectorized as:
> >
> >  va:V4SI = a[0:3]
> >  vb:V4SI = b[0:3]
> >  t = c[2]
> >  vc:V4SI = { t, t, t, t } // typically expanded as vec_duplicate at vec_init
> >  ...
> >  va:V4SI = vb:V4SI <op> vc:V4SI
> >
> > But this could be simplified further if a target has instructions that
> > support an indexed element as an operand. For example, an instruction like
> > this:
> >
> >  mul v0.4s, v1.4s, v2.4s[2]
> >
> > can multiply each element of v1.4s by the third element of v2.4s
> > (specified as v2.4s[2]) and store the results in the corresponding
> > elements of v0.4s.
> >
> > For this to happen, the vectorizer needs to understand this idiom and treat
> > the operand c[2] specially (taking into consideration, through a target hook
> > or macro, whether the machine supports an indexed element as an operand for
> > <op>) and consider this a vectorizable statement without having to duplicate
> > the elements explicitly.
> >
> > There are a few ways this could be represented in gimple:
> >
> >  ...
> >  va:V4SI = vb:V4SI <op> VEC_DUPLICATE_EXPR (VEC_SELECT_EXPR (vc:V4SI 2))
> >  ...
> >
> > or by allowing the vectorizer to treat an indexed element as a valid operand
> > in a vectorizable statement:
> 
> Might as well allow any scalar then...

Yes, I had given an example below.

> 
> >  ...
> >  va:V4SI = vb:V4SI <op> VEC_SELECT_EXPR (vc:V4SI 2)
> >  ...
> >
> > For the sake of explanation, the above two representations assume that
> > c[0:3] is loaded in vc for some other use and reused here. But when c[2] is
> > the only use of 'c', it may be safer to just load one element and use it
> > like this:
> >
> >  vc:V4SI[0] = c[2]
> >  va:V4SI = vb:V4SI <op> VEC_SELECT_EXPR (vc:V4SI 0)
> >
> > This could also mean that expressions involving a scalar could be treated
> > similarly. For example,
> >
> >  for(i=0;i<4;i++)
> >    a[i] = b[i] <op> c
> >
> > could be vectorized as:
> >
> >  vc:V4SI[0] = c
> >  va:V4SI = vb:V4SI <op> VEC_SELECT_EXPR (vc:V4SI 0)
> >
> > Such a change would also require new standard pattern names to be defined
> > for each <op>.
> >
> > Alternatively, having something like this:
> >
> >  ...
> >  vt:V4SI = VEC_DUPLICATE_EXPR (VEC_SELECT_EXPR (vc:V4SI 2))
> >  va:V4SI = vb:V4SI <op> vt:V4SI
> >  ...
> >
> > would remove the need to introduce several new standard pattern names,
> > leaving just one to represent vec_duplicate(vec_select()), but of course
> > this expects the target to have combiner patterns.
> 
> The cost estimation wouldn't be very good, but aren't combine patterns 
> enough for the whole thing? Don't you model your mul instruction as:
> 
> (mult:V4SI
>    (match_operand:V4SI)
>    (vec_duplicate:V4SI (vec_select:SI (match_operand:V4SI))))
> 
> anyway? Seems that combine should be able to handle it. What currently 
> happens that we fail to generate the right instruction?

At vec_init, I can recognize an idiom in order to generate vec_duplicate, but
I can't really insist on the single-lane load. Something like:

vc:V4SI[0] = c
vt:V4SI = vec_duplicate:V4SI (vec_select:SI vc:V4SI 0)
va:V4SI = vb:V4SI <op> vt:V4SI

Or is there any other way to do this?
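
To make the intent concrete, here is a minimal sketch in plain C that models
the sequence above. The v4si struct and the helper names (load_lane0,
dup_lane, mul_lanes, mul_by_lane, self_check) are hypothetical stand-ins for
illustration, not GCC internals or target intrinsics, and <op> is assumed to
be multiplication. It only shows that the two-step form and a fused
by-element operation compute the same lanes:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 4-lane vector standing in for V4SI. */
typedef struct { int32_t lane[4]; } v4si;

/* Models "vc:V4SI[0] = c": only lane 0 is loaded; the others are
   zeroed here (an assumption, a real target may leave them undefined). */
v4si load_lane0(int32_t c)
{
    v4si v = { { c, 0, 0, 0 } };
    return v;
}

/* Models vec_duplicate (vec_select v idx): broadcast one lane. */
v4si dup_lane(v4si v, int idx)
{
    v4si r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = v.lane[idx];
    return r;
}

/* Models "va = vb <op> vt", with <op> assumed to be multiplication. */
v4si mul_lanes(v4si x, v4si y)
{
    v4si r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = x.lane[i] * y.lane[i];
    return r;
}

/* Models the fused form: every lane of x times one selected lane of y,
   as a by-element multiply instruction would compute it. */
v4si mul_by_lane(v4si x, v4si y, int idx)
{
    v4si r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = x.lane[i] * y.lane[idx];
    return r;
}

/* Checks that the two-step and fused forms agree lane by lane. */
void self_check(void)
{
    v4si vb = { { 1, 2, 3, 4 } };
    v4si vc = load_lane0(5);
    v4si two_step = mul_lanes(vb, dup_lane(vc, 0));
    v4si fused = mul_by_lane(vb, vc, 0);
    for (int i = 0; i < 4; i++)
        assert(two_step.lane[i] == fused.lane[i]);
}
```

Calling self_check() from a test driver exercises both paths. The sketch
says nothing about how combine or the vectorizer would actually match the
pattern; it only pins down the semantics being asked for.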

Cheers
VP

> 
> In gimple, we already have BIT_FIELD_REF for vec_select and CONSTRUCTOR 
> for vec_duplicate, adding new nodes is always painful.
> 
> > This enhancement could possibly help further optimize larger scenarios
> > such as linear systems.
> >
> > Regards
> > VP
> 
> -- 
> Marc Glisse
>

