Hi,
This test case:
void foo(long long *d, int *f)
{
  int i;
  for (i = 0; i < 16; i++)
    {
      d[i] = f[i];
    }
}
when vectorized for big-endian mode, generates this sequence of widening
operations:
...
_33 = (void *) ivtmp.22_25;
vect__11.5_39 = MEM[base: _33, offset: 0B];
vect__12.6_40 = [vec_unpack_hi_expr] vect__11.5_39;
vect__12.6_41 = [vec_unpack_lo_expr] vect__11.5_39;
_29 = (void *) ivtmp.25_26;
MEM[base: _29, offset: 0B] = vect__12.6_40;
_5 = _29 + 16;
MEM[base: _5, offset: 0B] = vect__12.6_41;
...
I tried this on two targets configured for big-endian (aarch64 and powerpc).
From the IR above, it seems that the result of widening the high part
(vect__12.6_40) is stored at offset 0 from _29 and the result of widening the
low part goes into *(_29 + 16). Shouldn't this be the other way around?
The source of this seems to be code in
tree-vect-stmts.c:supportable_widening_operation() that swaps the tree codes for
high and low part widening on big-endian targets:
if (BYTES_BIG_ENDIAN && c1 != VEC_WIDEN_MULT_EVEN_EXPR)
  {
    enum tree_code ctmp = c1;
    c1 = c2;
    c2 = ctmp;
  }
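To make the effect of that swap concrete, here is a minimal standalone sketch. It is not GCC code: the enum, the struct and choose_unpack_order() are hypothetical stand-ins, and it assumes that before the swap c1 holds the "lo" code and c2 the "hi" code, which is what the IR dump above suggests.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for VEC_UNPACK_LO_EXPR / VEC_UNPACK_HI_EXPR. */
enum unpack_code { UNPACK_LO, UNPACK_HI };

/* The two codes in the order the vector statements are emitted. */
struct unpack_pair {
  enum unpack_code first;
  enum unpack_code second;
};

/* Assumption: the default order is (lo, hi); big-endian swaps it,
   mirroring the c1/c2 swap quoted above.  */
struct unpack_pair choose_unpack_order(bool bytes_big_endian)
{
  struct unpack_pair p = { UNPACK_LO, UNPACK_HI };
  if (bytes_big_endian)
    {
      enum unpack_code tmp = p.first;
      p.first = p.second;
      p.second = tmp;
    }
  return p;
}
```

Under that assumption, choose_unpack_order(true) yields (hi, lo), matching the order of vect__12.6_40 and vect__12.6_41 in the dump.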
During vectorization of the scalar widening operation, it is transformed into
two vector operations, one for widening the high part and one for widening the
low part, and these are stored as a linked list in the STMT_VINFO(stmt) of the
scalar gimple statement. What's interesting here is the order in which they are
stored:
scalar_gimple_stmt.vect_info->vectorized_stmt
points to this list:
[vec_unpack_hi_expr] vect__11.5_39->[vec_unpack_lo_expr] vect__11.5_39
When the store of the widened results is vectorized, the single store is split
into ncopies = VF/nunits stores, where VF is the vectorization factor and nunits
is the number of units of the bigger data type. For the first of the stores,
when
vec_oprnd = vect_get_vec_def_for_operand (op, next_stmt, NULL);
is called, vect_get_def returns the stmt_vinfo which is
[vec_unpack_hi_expr] vect__11.5_39
which gets stored in *(_29 + 0) and for the second store,
vec_oprnd = vect_get_vec_def_for_stmt_copy (dt, op);
is called, which returns the stmt_related_vinfo(), i.e. the second part of the
vectorized widening operation:
[vec_unpack_lo_expr] vect__11.5_39
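The walk described above can be modelled in a few lines of standalone C. This is only a sketch of my reading of the behaviour, not the vectorizer's code: struct vec_stmt, struct planned_store and plan_stores() are hypothetical names, and the values VF = 4, nunits = 2 and a 16-byte vector are assumptions for a 128-bit vector (the "+ 16" in the dump suggests this).

```c
#include <stddef.h>

/* Hypothetical stand-ins for the two widening tree codes. */
enum widen_code { WIDEN_UNPACK_LO, WIDEN_UNPACK_HI };

/* One vectorized statement and its link to the next copy. */
struct vec_stmt {
  enum widen_code code;
  const struct vec_stmt *related;   /* next copy, or NULL */
};

/* One vector store: where it goes and which statement feeds it. */
struct planned_store {
  int byte_offset;
  enum widen_code source;
};

/* ncopies = VF / nunits, as described in the text. */
int ncopies_for(int vf, int nunits)
{
  return vf / nunits;
}

/* Copy j takes the j-th statement of the chain and stores it at
   offset j * vec_bytes.  Returns the number of stores planned.  */
int plan_stores(const struct vec_stmt *head, int vf, int nunits,
                int vec_bytes, struct planned_store *out)
{
  int ncopies = ncopies_for(vf, nunits);
  const struct vec_stmt *cur = head;
  int j;
  for (j = 0; j < ncopies && cur != NULL; j++)
    {
      out[j].byte_offset = j * vec_bytes;
      out[j].source = cur->code;
      cur = cur->related;
    }
  return j;
}
```

With a chain of hi followed by lo, this model plans two stores: the hi result at offset 0 and the lo result at offset 16, which is the order seen in the dump.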
Isn't this an inconsistency in the ordering of the stores of the high and low
parts of a widened vector operation? Is there something I'm missing?
Thanks,
Tejas Belagod.
ARM.