It's nice to use the processor's vector arithmetic to good effect, but
it's all for naught when there are too many moves from/to general
registers cluttering up the loop. With a double-vector reduction
variable, the standard final reduction code got so awkward that the
register allocator decided that the reduction variable must live in
general purpose registers, not only after the loop, but across the
loop patch.

Splitting the reduction to force the first step to be done as a vector
operation seemed the obvious solution. The hook was called, but the
vectorizer still generated the vanilla final reduction code. It turns
out that the reduction splitting was calculated, but the result was not
used, and the calculation started anew. The attached patch fixes this.
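
For anyone not familiar with the idea, here is a scalar C model of what
"splitting the reduction" means for a sum. It is only a sketch: the lane
counts and all function names are made up for illustration and do not
appear in tree-vect-loop.c, and plain loops stand in for the vector
instructions a target would actually use.

```c
#define WIDE 8   /* lanes in the hypothetical double-vector accumulator */
#define HALF 4   /* lanes in one native vector (assumed) */

/* Split step: add the upper half to the lower half lane by lane.
   On a real target this is meant to be a single vector add.  */
static void
split_halves (const int wide[WIDE], int half[HALF])
{
  for (int i = 0; i < HALF; i++)
    half[i] = wide[i] + wide[i + HALF];
}

/* Final reduction: fold the N lanes of V into one scalar.  */
static int
final_reduce (const int *v, int n)
{
  int sum = 0;
  for (int i = 0; i < n; i++)
    sum += v[i];
  return sum;
}

/* Vanilla epilogue: reduce the full-width accumulator directly.  */
int
reduce_vanilla (const int wide[WIDE])
{
  return final_reduce (wide, WIDE);
}

/* Epilogue with splitting: one lane-wise add first, then the final
   reduction runs on the half-width result only.  */
int
reduce_with_split (const int wide[WIDE])
{
  int half[HALF];
  split_halves (wide, half);
  return final_reduce (half, HALF);
}
```

Both functions return the same sum. The second one only helps if the
final reduction really consumes the half-width result; if that result is
computed and then ignored, the epilogue ends up working on the full-width
accumulator again.
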
Bootstrapped and regression tested on x86_64-pc-linux-gnu.

2018-10-31  Joern Rennecke  <joern.renne...@riscy-ip.com>

	* tree-vect-loop.c (vect_create_epilog_for_reduction):
	If we split the reduction, use the result in Case 3 too.

Index: tree-vect-loop.c
===================================================================
--- tree-vect-loop.c (revision 266008)
+++ tree-vect-loop.c (working copy)
@@ -5139,6 +5139,7 @@ vect_create_epilog_for_reduction (vec<tree> vect_d
while (sz > sz1)
{
gcc_assert (!slp_reduc);
+ gcc_assert (new_phis.length () == 1);
sz /= 2;
vectype1 = get_vectype_for_scalar_type_and_size (scalar_type, sz);
@@ -5301,7 +5302,12 @@ vect_create_epilog_for_reduction (vec<tree> vect_d
FOR_EACH_VEC_ELT (new_phis, i, new_phi)
{
int bit_offset;
- if (gimple_code (new_phi) == GIMPLE_PHI)
+
+ /* If we did reduction splitting, make sure to use the result. */
+ if (!slp_reduc && new_phis.length () == 1
+ && new_temp != new_phi_result)
+ vec_temp = new_temp;
+ else if (gimple_code (new_phi) == GIMPLE_PHI)
vec_temp = PHI_RESULT (new_phi);
else
vec_temp = gimple_assign_lhs (new_phi);