Hi,

This test case:

void foo(long long *d, int *f)
{
  int i;
  for (i=0; i< 16; i++)
  {
    d[i] = f[i];
  }
}

when vectorized for big-endian mode, generates this sequence of widening 
operations:

  ...
  _33 = (void *) ivtmp.22_25;
  vect__11.5_39 = MEM[base: _33, offset: 0B];
  vect__12.6_40 = [vec_unpack_hi_expr] vect__11.5_39;
  vect__12.6_41 = [vec_unpack_lo_expr] vect__11.5_39;
  _29 = (void *) ivtmp.25_26;
  MEM[base: _29, offset: 0B] = vect__12.6_40;
  _5 = _29 + 16;
  MEM[base: _5, offset: 0B] = vect__12.6_41;
  ...

I tried this on two targets configured for big-endian (aarch64 and powerpc).

From the IR above, it seems that the result of widening the high part (vect__12.6_40) is being stored at offset 0 from _29, and the result of widening the low part (vect__12.6_41) goes into *(_29 + 16). Shouldn't this be the other way around?
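
To make the expected memory layout concrete, here is a minimal scalar sketch of one vector iteration. It assumes 128-bit vectors (which the +16 store offset in the IR suggests), a 4-byte int and an 8-byte long long; widen_one_vector is a hypothetical name used only for this illustration. Whichever unpack code selects f[0] and f[1] has to feed the store at offset 0, and whether "hi" or "lo" names those elements on a big-endian target is exactly what I am unsure about.

```c
/* Scalar reference for one vector iteration of foo(): four ints in,
   four long longs out.  Assumes 128-bit vectors, so each vector store
   covers two long longs (16 bytes).  */
static void widen_one_vector(long long *d, const int *f)
{
  /* First vector store, byte offset 0: widened f[0] and f[1].  */
  d[0] = f[0];
  d[1] = f[1];

  /* Second vector store, byte offset 16: widened f[2] and f[3].  */
  d[2] = f[2];
  d[3] = f[3];
}
```

The vectorized code must leave memory looking the same as this, regardless of how the target names the two halves.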

The source of this seems to be the code in tree-vect-stmts.c:supportable_widening_operation() that swaps the tree codes for high and low part widening on big-endian targets.

  if (BYTES_BIG_ENDIAN && c1 != VEC_WIDEN_MULT_EVEN_EXPR)
    {
      enum tree_code ctmp = c1;
      c1 = c2;
      c2 = ctmp;
    }

During vectorization, the scalar widening operation is transformed into two vector operations, one widening the high part and one widening the low part. These are stored as a linked list in STMT_VINFO(stmt) of the scalar gimple statement. What's interesting here is the order in which they are stored:

  scalar_gimple_stmt.vect_info->vectorized_stmt

points to this list:

  [vec_unpack_hi_expr] vect__11.5_39->[vec_unpack_lo_expr] vect__11.5_39

When the store of the widened results is vectorized, the single store is split into two stores according to ncopies = VF/nunits, where VF is the vectorization factor and nunits is the number of units of the wider data type. For the first of the stores, when

  vec_oprnd = vect_get_vec_def_for_operand (op, next_stmt, NULL);

is called, vect_get_def returns the stmt_vinfo which is

   [vec_unpack_hi_expr] vect__11.5_39

which gets stored in *(_29 + 0) and for the second store,

   vec_oprnd = vect_get_vec_def_for_stmt_copy (dt, op);

is called, which returns the stmt_related_vinfo(), i.e. the second part of the vectorized widening operation:

    [vec_unpack_lo_expr] vect__11.5_39
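
As a rough illustration of the behaviour described above, here is a toy model in C. It is only a sketch: struct vec_stmt, operand_for_copy and store_offset are hypothetical names made up for this mail and are not GCC's data structures or functions. It assumes 128-bit vectors, so VF = 4 ints per iteration and nunits = 2 long longs per vector.

```c
#include <stddef.h>

/* Toy stand-in for a vectorized statement and its "related" link.
   Illustrative only; these are not GCC's types.  */
struct vec_stmt
{
  const char *code;          /* e.g. "vec_unpack_hi_expr" */
  struct vec_stmt *related;  /* next copy in the chain, or NULL */
};

/* ncopies = VF / nunits, with nunits counted in the wider type.  */
static unsigned ncopies(unsigned vf, unsigned nunits)
{
  return vf / nunits;
}

/* Operand used by store copy number COPY: copy 0 takes the head of
   the chain, each later copy follows one more related link.  */
static const struct vec_stmt *
operand_for_copy(const struct vec_stmt *head, unsigned copy)
{
  const struct vec_stmt *s = head;
  for (unsigned i = 0; i < copy && s != NULL; i++)
    s = s->related;
  return s;
}

/* Byte offset of store copy number COPY for VECTOR_BYTES-wide vectors.  */
static unsigned store_offset(unsigned copy, unsigned vector_bytes)
{
  return copy * vector_bytes;
}
```

With the chain ordered hi -> lo as shown earlier, copy 0 (offset 0) picks up the vec_unpack_hi_expr result and copy 1 (offset 16) picks up the vec_unpack_lo_expr result, which matches the IR. So the store order simply follows the order of the chain.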

Isn't this an inconsistency in ordering stores of high and low parts of a widened vector operation? Is there something I'm missing?

Thanks,
Tejas Belagod.
ARM.
