On Thu, Aug 10, 2023 at 03:08:31PM +0000, Jiang, Haochen via Gcc-patches wrote:
> There are lots of discussions on arch level and ABIs and I really appreciate 
> that.
> 
> For the arch level issue, it might be a little early to discuss and should
> not block these patches.
> 
> For the ABI issue, the problem actually comes from GCC and clang/LLVM
> currently behaving differently in the return value for m512 w/o 512 bit
> support.  Then it becomes a question of getting unified, and we get the
> whole discussion.  However, it is a corner case.

What LLVM does looks just wrong to me.

Try:

typedef int V256 __attribute__((vector_size (32)));
typedef int V512 __attribute__((vector_size (64)));
typedef int V1024 __attribute__((vector_size (128)));

V256
foo256 (V256 x, V256 y)
{
  return x + y;
}

V512
foo512 (V512 x, V512 y)
{
  return x + y;
}

V1024
foo1024 (V1024 x, V1024 y)
{
  return x + y;
}

with -msse4, -mavx2 and -mavx512f.
GCC passes all arguments and all return values in memory, with warnings,
in the first case (-msse4), everything except foo256 in the second case
(-mavx2), and only foo1024 in the last case (-mavx512f).  That matches the
psABI without/with the __m256 and/or __m512 additions.  It is unfortunate
that there is no interoperability between pre-AVX2 vs. AVX2+ resp.
pre-AVX512F vs. AVX512F+ passing/returning, but that is a consequence of
wanting to get fast code on new ISAs.
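
To make the GCC side concrete, here is a minimal sketch of the three cases
described above.  The helper name gcc_returns_in_memory is made up for
illustration.  It only encodes the behavior described in this mail, keyed on
the __AVX__ and __AVX512F__ macros GCC predefines, and does not inspect
actual code generation.  The 16-byte threshold in the fallback branch is an
addition of mine, assuming x86-64 where SSE2 is the baseline.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: would GCC return a generic vector of BYTES bytes
   in memory under the ISA flags this file is compiled with?  */
static bool
gcc_returns_in_memory (size_t bytes)
{
#if defined (__AVX512F__)
  /* -mavx512f: only foo1024 (128 bytes) goes through memory.  */
  return bytes > 64;
#elif defined (__AVX__)
  /* -mavx2 (which implies AVX): everything but foo256 goes through memory.  */
  return bytes > 32;
#else
  /* -msse4: foo256, foo512 and foo1024 all go through memory.
     Assumption: x86-64, where 16-byte vectors are returned in xmm0.  */
  return bytes > 16;
#endif
}
```

Calling gcc_returns_in_memory (32), (64) and (128) in a file built with each
of the three flags should reproduce the pattern above.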

LLVM passes all the arguments the same way as GCC (though without
warnings), but the return values differ.  For foo256 it returns the result
in the xmm0/xmm1 pair with -msse4 and in ymm0 with -mavx2 and later.  For
foo512 it returns the result in the xmm0/xmm1/xmm2/xmm3 quadruplet with
-msse4, in the ymm0/ymm1 pair with -mavx2 and finally in zmm0 with
-mavx512f.  And for foo1024 it returns the result in memory with -msse4,
in the ymm0/ymm1/ymm2/ymm3 quadruplet with -mavx2 and in the zmm0/zmm1 pair
with -mavx512f.  I have no idea what in the psABI that would be based on,
in particular the different treatment of passing arguments vs. returning
the result.  More importantly, this means not 2 different ABIs for one
function depending on ISA flags, but 3, maybe 4 (with -mno-sse?).
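
The LLVM return conventions listed above can be tabulated to count how many
distinct ones a single function ends up with.  This is only a sketch: the
struct, table and function names are made up for illustration, and the table
simply transcribes the cases from this mail rather than anything derived
from LLVM itself.

```c
#include <stddef.h>

/* Hypothetical encoding: register class ('x' = xmm, 'y' = ymm, 'z' = zmm,
   'm' = memory) and number of registers used for the return value.  */
struct ret_conv { char reg_class; int count; };

/* Rows: -msse4, -mavx2, -mavx512f.  Columns: foo256, foo512, foo1024.  */
static const struct ret_conv llvm_ret[3][3] = {
  { { 'x', 2 }, { 'x', 4 }, { 'm', 0 } },
  { { 'y', 1 }, { 'y', 2 }, { 'y', 4 } },
  { { 'y', 1 }, { 'z', 1 }, { 'z', 2 } },
};

/* Count the distinct return conventions of function FN (0 = foo256,
   1 = foo512, 2 = foo1024) across the three ISA flag settings.  */
static int
distinct_return_conventions (int fn)
{
  int distinct = 0;
  for (int i = 0; i < 3; i++)
    {
      int seen = 0;
      /* Was the same class/count pair already used by an earlier row?  */
      for (int j = 0; j < i; j++)
        if (llvm_ret[i][fn].reg_class == llvm_ret[j][fn].reg_class
            && llvm_ret[i][fn].count == llvm_ret[j][fn].count)
          seen = 1;
      if (!seen)
        distinct++;
    }
  return distinct;
}
```

For foo512 and foo1024 this yields 3 distinct conventions, before even
considering -mno-sse.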

        Jakub
