Re: [RFC] Inconsistency in ordering vector widening operations on big-endian targets?

2013-06-14 Thread Tejas Belagod

Alan Modra wrote:
> On Wed, Jun 12, 2013 at 04:22:46PM +0100, Tejas Belagod wrote:
>> From the IR above, it seems that result of widening the high part
>> (vect__12.6_40) is being stored at offset 0 from _29 and result of
>> widening the low part goes into *(_29 + 16). Shouldn't this be the
>> other way around?
>
> Big-endian targets store the high part of multi-byte values at the low
> address.  Why should vectors be different?



True. But the memory order of individual elements cannot be changed.

I did a bit more digging into this and realized this was because of the
big-endian representation of vectors in gcc, where the order of lanes is
reversed. I will need to fix up my back-end to reflect this reverse ordering.
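
A minimal sketch of that convention in C, for illustration only. It assumes a 4 x int32 input vector widened to two 2 x int64 halves, and it models the "high" half on a big-endian target as the pair of lanes at the lowest addresses. The names unpack_half and enum half are hypothetical and are not GCC code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum half { HALF_LO, HALF_HI };

/* Hypothetical model: widen one half of a 4 x int32 vector to 2 x int64.
   SRC holds the lanes in memory order.  Assumption: on a big-endian
   target the "high" half is the pair of lanes at the lowest addresses;
   on a little-endian target it is the pair at the highest addresses.  */
void unpack_half(const int32_t src[4], enum half which, int big_endian,
                 int64_t dst[2])
{
    assert(src != NULL && dst != NULL);

    /* Nonzero when the requested half sits at src[0..1].  */
    int first_pair = big_endian ? (which == HALF_HI) : (which == HALF_LO);
    size_t base = first_pair ? 0 : 2;

    dst[0] = src[base];      /* sign-extending int32 -> int64 */
    dst[1] = src[base + 1];
}
```

Under this model, storing the big-endian "hi" result at offset 0 and the "lo" result at offset 16 leaves the widened elements in the same memory order as the source elements, which is what the scalar loop requires.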


Thanks for your input.

Tejas.



[RFC] Inconsistency in ordering vector widening operations on big-endian targets?

2013-06-12 Thread Tejas Belagod

Hi,

This test case:

void foo(long long *d, int *f)
{
  int i;
  for (i=0; i < 16; i++)
  {
    d[i] = f[i];
  }
}

when vectorized for big-endian mode, generates this sequence of widening 
operations:

  ...
  _33 = (void *) ivtmp.22_25;
  vect__11.5_39 = MEM[base: _33, offset: 0B];
  vect__12.6_40 = [vec_unpack_hi_expr] vect__11.5_39;
  vect__12.6_41 = [vec_unpack_lo_expr] vect__11.5_39;
  _29 = (void *) ivtmp.25_26;
  MEM[base: _29, offset: 0B] = vect__12.6_40;
  _5 = _29 + 16;
  MEM[base: _5, offset: 0B] = vect__12.6_41;
  ...

I tried this on two targets configured for big-endian (aarch64 and powerpc).

From the IR above, it seems that result of widening the high part 
(vect__12.6_40) is being stored at offset 0 from _29 and result of widening the 
low part goes into *(_29 + 16). Shouldn't this be the other way around?


The source of this seems to be code in 
tree-vect-stmts.c:supportable_widening_operation() that swaps the tree codes for 
high and low part widening for Big Endian targets.


if (BYTES_BIG_ENDIAN && c1 != VEC_WIDEN_MULT_EVEN_EXPR)
{
  enum tree_code ctmp = c1;
  c1 = c2;
  c2 = ctmp;
}
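
The effect of that swap can be sketched in isolation. This is a simplified stand-in, not the GCC source: the enum widen_code and the function pick_widen_codes are hypothetical, the default order (lo first, hi second) is an assumption inferred from the swap, and the VEC_WIDEN_MULT_EVEN_EXPR exclusion is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the GCC tree codes named above.  */
enum widen_code { UNPACK_LO, UNPACK_HI };

/* Choose the codes for the first (c1) and second (c2) vector statement.
   Assumption: the default order is lo then hi, and on big-endian the
   two are swapped so that the hi statement comes first.  */
void pick_widen_codes(int bytes_big_endian,
                      enum widen_code *c1, enum widen_code *c2)
{
    assert(c1 != NULL && c2 != NULL);

    *c1 = UNPACK_LO;
    *c2 = UNPACK_HI;

    if (bytes_big_endian) {
        enum widen_code ctmp = *c1;
        *c1 = *c2;
        *c2 = ctmp;
    }
}
```

With bytes_big_endian set, c1 comes back as UNPACK_HI, which matches the IR above where the hi result is the first statement and feeds the store at offset 0.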

During vectorization of the scalar widening operation, it is transformed into 
two vector operations - one for widening high part and one for widening low part 
and these are stored as a linked list in STMT_VINFO(stmt) of a scalar gimple 
statement. What's interesting here is the order in which they are stored:


  scalar_gimple_stmt.vect_info->vectorized_stmt

points to this list:

  [vec_unpack_hi_expr] vect__11.5_39 -> [vec_unpack_lo_expr] vect__11.5_39

When the store of the widened results is vectorized, the single store is split 
into ncopies = VF/nunits stores, where VF is the vectorization factor and nunits 
is the number of units of the bigger data type. For the first of the stores, 
when


  vec_oprnd = vect_get_vec_def_for_operand (op, next_stmt, NULL);

is called, vect_get_def returns the stmt_vinfo which is

   [vec_unpack_hi_expr] vect__11.5_39

which gets stored in *(_29 + 0) and for the second store,

   vec_oprnd = vect_get_vec_def_for_stmt_copy (dt, op);

is called which returns the stmt_related_vinfo() which is the second part of the 
vectorized widened operation.


   [vec_unpack_lo_expr] vect__11.5_39
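
As a worked example of the ncopies arithmetic, here is a small sketch in C. The concrete numbers are assumptions, not taken from the dump: 16-byte vectors (consistent with the +16 offset in the IR), VF = 4 for the int source and nunits = 2 for the long long destination. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Number of vector stores for one vectorized iteration:
   ncopies = VF / nunits.  */
unsigned store_copies(unsigned vf, unsigned nunits)
{
    assert(nunits != 0 && vf % nunits == 0);
    return vf / nunits;
}

/* Byte offset of the j-th store, assuming each copy writes one full
   vector of VECTOR_BYTES bytes directly after the previous one.  */
size_t store_offset(unsigned j, size_t vector_bytes)
{
    return (size_t)j * vector_bytes;
}
```

With those assumed values, store_copies(4, 2) gives two stores, placed at store_offset(0, 16) and store_offset(1, 16), i.e. at _29 + 0 and _29 + 16 as in the IR.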

Isn't this an inconsistency in ordering stores of high and low parts of a 
widened vector operation? Is there something I'm missing?


Thanks,
Tejas Belagod.
ARM.



Re: [RFC] Inconsistency in ordering vector widening operations on big-endian targets?

2013-06-12 Thread Alan Modra

On Wed, Jun 12, 2013 at 04:22:46PM +0100, Tejas Belagod wrote:
> From the IR above, it seems that result of widening the high part
> (vect__12.6_40) is being stored at offset 0 from _29 and result of
> widening the low part goes into *(_29 + 16). Shouldn't this be the
> other way around?

Big-endian targets store the high part of multi-byte values at the low
address.  Why should vectors be different?
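
For a scalar, the point can be illustrated with a short C sketch. The helper store_be32 is a hypothetical name; it writes a 32-bit value in big-endian byte order by hand, so it behaves the same on any host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write VALUE into OUT in big-endian byte order: the most significant
   ("high") byte lands at the lowest address, out[0].  */
void store_be32(uint8_t out[4], uint32_t value)
{
    assert(out != NULL);

    out[0] = (uint8_t)(value >> 24);
    out[1] = (uint8_t)(value >> 16);
    out[2] = (uint8_t)(value >> 8);
    out[3] = (uint8_t)value;
}
```

Storing 0x11223344 this way puts 0x11 at offset 0 and 0x44 at offset 3, the same "high part first" layout that the vector stores in the IR follow.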

-- 
Alan Modra
Australia Development Lab, IBM