On Thu, Feb 27, 2020 at 11:56:49AM +0100, Richard Biener wrote:
> > > This calling convention would also be useful in the future for vectorizing
> > > functions that return complex values either by value or by reference.
> >
> > Only by value; if something is passed by reference, you really don't know
> > what the code does with it: whether it is read, written into, or both, etc.
> > And for _Complex {float,double}, e.g. the Intel ABI already specifies how to
> > pass them; GCC just isn't able to do that right now.
> 
> Ah, ok.  So what's missing is a standard cexpi function that both GCC and
> libmvec can use.

That, plus adjusting omp-simd-clone.c and the backends so that they support
the complex modes and essentially transform them into passing/returning
either a vector of the complex elements with twice as many subparts, or twice
as many vectors, as e.g. the Intel ABI specifies.  For example, for the return
type adjustment we currently have:
  t = TREE_TYPE (TREE_TYPE (fndecl));
  if (INTEGRAL_TYPE_P (t) || POINTER_TYPE_P (t))
    veclen = node->simdclone->vecsize_int;
  else
    veclen = node->simdclone->vecsize_float;
  veclen /= GET_MODE_BITSIZE (SCALAR_TYPE_MODE (t));
  if (veclen > node->simdclone->simdlen)
    veclen = node->simdclone->simdlen;
  if (POINTER_TYPE_P (t))
    t = pointer_sized_int_node;
  if (veclen == node->simdclone->simdlen)
    t = build_vector_type (t, node->simdclone->simdlen);
  else
    {
      t = build_vector_type (t, veclen);
      t = build_array_type_nelts (t, node->simdclone->simdlen / veclen);
    }
and we'd need to handle complex types there accordingly.
And then, of course, we'd need to teach the vectorizer about it.
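A minimal standalone sketch of how the veclen/array computation quoted above
might extend to complex types, assuming a complex return value is simply
treated as twice as many scalar components of its component type.  This is
not GCC code: complex_ret_layout, struct ret_layout and its parameters are
hypothetical stand-ins for the simdclone fields and the SCALAR_TYPE_MODE
lookup, and the real omp-simd-clone.c change may well look different.

```c
#include <stdbool.h>

/* Hypothetical result: VECLEN components per vector, NVECTORS vectors.
   NVECTORS == 1 corresponds to the plain build_vector_type case above,
   NVECTORS > 1 to the build_array_type_nelts case.  */
struct ret_layout
{
  unsigned veclen;
  unsigned nvectors;
};

/* Hypothetical extension of the return type adjustment to complex types:
   each complex lane counts as two components (real and imaginary) of
   COMPONENT_BITS each.  Assumes VECSIZE_BITS is a multiple of
   COMPONENT_BITS and that all quantities are powers of two.  */
static struct ret_layout
complex_ret_layout (unsigned vecsize_bits, unsigned simdlen,
		    unsigned component_bits, bool is_complex)
{
  struct ret_layout l;
  unsigned ncomponents = simdlen * (is_complex ? 2 : 1);

  l.veclen = vecsize_bits / component_bits;
  /* Like the original code caps veclen at simdlen, cap it here at the
     total number of scalar components.  */
  if (l.veclen > ncomponents)
    l.veclen = ncomponents;
  l.nvectors = ncomponents / l.veclen;
  return l;
}
```

For example, with 128-bit vectors, simdlen 4 and double _Complex (64-bit
components) this gives veclen 2 and nvectors 4, i.e. an array of four
2-element double vectors, in line with the 4*MD128 entry for double complex
at VLEN=4 in the Intel table.  Whether the real and imaginary parts are
interleaved or split across those vectors is a separate question.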

For SSE2 (their 'x' letter, which roughly matches our 'b' letter), the
Intel ABI e.g. has:
                sizeof  VLEN=2   VLEN=4   VLEN=8   VLEN=16
float           4       1*MS128  1*MS128  2*MS128  4*MS128
double          8       1*MD128  2*MD128  4*MD128  8*MD128
float complex   8       1*MS128  2*MS128  4*MS128  8*MS128
double complex  16      2*MD128  4*MD128  8*MD128  16*MD128
where MS128 is __m128 and MD128 __m128d, i.e. float
__attribute__((vector_size (16))) and double __attribute__((vector_size (16))).
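The table follows a simple pattern, which a small sketch can check.  The
assumption here is our reading of the table, not wording from the Intel ABI
document: VLEN values are packed densely into 16-byte vectors and a
partially filled vector still takes a whole register, so the count is
ceil (VLEN * sizeof / 16).  abi_nvectors is a hypothetical helper; ms128 and
md128 mirror the vector_size types just described.

```c
#include <stddef.h>

/* Same types as MS128 (__m128) and MD128 (__m128d), via GCC's
   vector_size extension.  */
typedef float ms128 __attribute__ ((vector_size (16)));
typedef double md128 __attribute__ ((vector_size (16)));

/* Hypothetical helper: number of VEC_SIZE-byte vectors needed for VLEN
   values of ELEM_SIZE bytes each, rounding a partial vector up to one.  */
static size_t
abi_nvectors (size_t elem_size, size_t vlen, size_t vec_size)
{
  size_t bytes = elem_size * vlen;
  return (bytes + vec_size - 1) / vec_size;
}
```

With sizeof (float _Complex) == 8 and sizeof (double _Complex) == 16 on
x86, this reproduces every entry above, e.g. 2*MD128 for double complex at
VLEN=2 and 1*MS128 for float at VLEN=2.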

I'll need to check on godbolt how ICC actually passes the complex values:
whether the layout is real0 imag0 real1 imag1 real2 imag2 real3 imag3, or
real0 real1 real2 real3 imag0 imag1 imag2 imag3.
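To make the two candidate layouts concrete, a minimal sketch converting N
float complex values between the interleaved form (real0 imag0 real1 imag1
...) and the split form (real0 real1 ... imag0 imag1 ...).  The helper names
are hypothetical, and nothing here says which layout ICC actually uses.  Note
that C stores a float _Complex like a float[2] holding the real then the
imaginary part, so an array of them in memory is already interleaved.

```c
#include <stddef.h>

/* Hypothetical helper: IN holds 2*N floats as re0 im0 re1 im1 ...;
   OUT receives re0 re1 ... re(N-1) followed by im0 im1 ... im(N-1).  */
static void
interleaved_to_split (const float *in, float *out, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      out[i] = in[2 * i];		/* Real part of element i.  */
      out[n + i] = in[2 * i + 1];	/* Imaginary part of element i.  */
    }
}

/* Hypothetical helper: the inverse transformation.  */
static void
split_to_interleaved (const float *in, float *out, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      out[2 * i] = in[i];
      out[2 * i + 1] = in[n + i];
    }
}
```

For VLEN=4 float complex, the interleaved form fills two MS128 vectors as
{re0 im0 re1 im1} {re2 im2 re3 im3}, while the split form gives
{re0 re1 re2 re3} {im0 im1 im2 im3}; the vector count is the same either way,
only the lane assignment differs.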

        Jakub
