On Mon, Nov 10, 2014 at 05:36:24PM -0500, Michael Meissner wrote:
> However, the double pattern is completely broken.  This cannot go in.

[snip]

> It is unacceptable to have to do the inner loop doing a load, vector add, and
> store in the loop.

Before the patch, the final reduction used *vsx_reduc_splus_v2df; after
the patch, it is *vsx_reduc_plus_v2df_scalar.  The former does a vector
add, the latter a scalar float add.  And it uses the same pseudoregister
for the accumulator throughout.  IRA decides a register is more expensive
than memory for that pseudo, I suppose because it is wanted in both V2DF
and DF mode?  It doesn't seem to like the subreg very much.
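
For reference, a rough sketch of the kind of loop under discussion.  This
is not the testcase from the thread; the function names are made up, and
the second function is only a hand-written model of what the vectorizer
does (two lanes standing in for a V2DF accumulator), not compiler output.

```c
#include <stddef.h>

/* Plain sum reduction: the sort of loop the vectorizer turns into a
   V2DF accumulate followed by a final reduction.  */
double sum_scalar(const double *a, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Hypothetical model of the vectorized form.  acc[] plays the role of
   the V2DF accumulator; ideally it stays in one register for the whole
   loop instead of being loaded and stored on every iteration.  */
double sum_two_lanes(const double *a, size_t n)
{
    double acc[2] = { 0.0, 0.0 };
    size_t i;

    for (i = 0; i + 1 < n; i += 2) {
        acc[0] += a[i];       /* lane 0 of the vector add */
        acc[1] += a[i + 1];   /* lane 1 of the vector add */
    }

    /* Final reduction: a single scalar (DF) add of the two lanes.  */
    double sum = acc[0] + acc[1];

    /* Leftover element when n is odd.  */
    if (i < n)
        sum += a[i];

    return sum;
}
```

The point is that the accumulator is used as a vector inside the loop and
as a scalar only once, after it.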

The new code does look nicer otherwise :-)


Segher
