On Mon, Nov 10, 2014 at 05:36:24PM -0500, Michael Meissner wrote:
> However, the double pattern is completely broken.  This cannot go in.

[snip]

> It is unacceptable to have to do the inner loop doing a load, vector add, and
> store in the loop.

Before the patch, the final reduction used *vsx_reduc_splus_v2df; after
the patch, it is *vsx_reduc_plus_v2df_scalar.  The former does a vector
add, the latter a float add.  And it uses the same pseudoregister for the
accumulator throughout.  IRA decides a register is more expensive than
memory for this, I suppose because it wants both V2DF and DF?  It doesn't
seem to like the subreg very much.

The new code does look nicer otherwise :-)


Segher
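
For readers following along, the shape of the reduction being discussed can be sketched roughly as below. This is an assumed illustration, not the testcase from the thread: the `v2df` struct and the `sum_v2df` function are hypothetical stand-ins for a V2DF register and the vectorized loop. The point is only that the inner loop should do one vector add per iteration on an accumulator that stays in a register, with a single scalar (DF) add of the two lanes at the very end.

```c
#include <stddef.h>

/* Hypothetical two-lane value standing in for a V2DF vector register. */
typedef struct { double lane[2]; } v2df;

/* Sum n doubles the way a vectorized reduction would: keep a two-lane
   accumulator inside the loop, then add the two lanes once at the end. */
double sum_v2df(const double *a, size_t n)
{
    v2df acc = { { 0.0, 0.0 } };
    size_t i;

    /* Inner loop: conceptually one vector add per iteration.  Ideally the
       accumulator never leaves its register here. */
    for (i = 0; i + 1 < n; i += 2) {
        acc.lane[0] += a[i];
        acc.lane[1] += a[i + 1];
    }

    /* Final reduction: a single scalar add of the two lanes. */
    double total = acc.lane[0] + acc.lane[1];

    /* Scalar tail for an odd element count. */
    if (i < n)
        total += a[i];

    return total;
}
```

Whether a compiler actually keeps the accumulator in a register for code like this is a register allocation question; the sketch shows only the intended structure, not what any particular GCC revision generates.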