On Thu, Feb 27, 2020 at 11:56:49AM +0100, Richard Biener wrote:
> > > This calling convention would also be useful in the future for vectorizing
> > > functions that return complex values either by value or by reference.
> >
> > Only by value, you really don't know what the code does if something is
> > passed by reference, whether it is read, written into, or both etc.
> > And for _Complex {float,double}, e.g. the Intel ABI already specifies how to
> > pass them, just GCC isn't able to do that right now.
>
> Ah, ok.  So what's missing is the standard function cexpi both GCC and
> libmvec can use.

That, plus adjusting omp-simd-clone.c and the backends so that they support
the complex modes and essentially transform those into passing/returning
either a vector of the complex elts with twice as many subparts, or twice as
many vectors, as e.g. the Intel ABI specifies.

E.g. for return type adjustment, right now we have:

  t = TREE_TYPE (TREE_TYPE (fndecl));
  if (INTEGRAL_TYPE_P (t) || POINTER_TYPE_P (t))
    veclen = node->simdclone->vecsize_int;
  else
    veclen = node->simdclone->vecsize_float;
  veclen /= GET_MODE_BITSIZE (SCALAR_TYPE_MODE (t));
  if (veclen > node->simdclone->simdlen)
    veclen = node->simdclone->simdlen;
  if (POINTER_TYPE_P (t))
    t = pointer_sized_int_node;
  if (veclen == node->simdclone->simdlen)
    t = build_vector_type (t, node->simdclone->simdlen);
  else
    {
      t = build_vector_type (t, veclen);
      t = build_array_type_nelts (t, node->simdclone->simdlen / veclen);
    }

and we'd need to deal with the complex types accordingly.  And of course
then teach the vectorizer.

E.g. for SSE2 (their 'x' letter, which roughly matches our 'b' letter), the
Intel ABI has:

                sizeof  VLEN=2   VLEN=4   VLEN=8   VLEN=16
float           4       1*MS128  1*MS128  2*MS128  4*MS128
double          8       1*MD128  2*MD128  4*MD128  8*MD128
float complex   8       1*MS128  2*MS128  4*MS128  8*MS128
double complex  16      2*MD128  4*MD128  8*MD128  16*MD128

where MS128 is __m128 and MD128 is __m128d, i.e.
float __attribute__((vector_size (16))) and
double __attribute__((vector_size (16))).

I'll need to check with ICC on godbolt how they actually pass the complex
values, whether it is
  real0 imag0 real1 imag1 real2 imag2 real3 imag3
or
  real0 real1 real2 real3 imag0 imag1 imag2 imag3.

	Jakub
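
The vector counts in the SSE2 table above appear to follow from simple
arithmetic: the total payload is sizeof (element) * VLEN bytes, split into
16-byte vectors and rounded up to at least one.  A minimal sketch of that
arithmetic follows.  The helper name num_vectors is made up for
illustration; it is not a GCC or Intel ABI function, and it says nothing
about how the real and imaginary parts are laid out within the vectors.

```c
#include <stddef.h>

/* Hypothetical helper (not part of GCC): number of VEC_BYTES-wide vectors
   needed to pass VLEN elements of ELT_SIZE bytes each.  A complex element
   is counted with its full size, i.e. 8 for float complex and 16 for
   double complex.  A partially filled vector still counts as one.  */
static size_t
num_vectors (size_t elt_size, size_t vlen, size_t vec_bytes)
{
  size_t total_bytes = elt_size * vlen;

  /* Round up to a whole number of vectors.  */
  return (total_bytes + vec_bytes - 1) / vec_bytes;
}
```

For instance, num_vectors (16, 4, 16) gives 4, which matches the
4*MD128 entry for double complex at VLEN=4, and num_vectors (4, 2, 16)
gives 1, which matches the 1*MS128 entry for float at VLEN=2.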