Re: [PATCH 1/3] vect: Pass stmt_vec_info to TARGET_SIMD_CLONE_USABLE

Andre Vieira (lists) Thu, 01 Feb 2024 09:02:02 -0800



On 01/02/2024 07:19, Richard Biener wrote:

On Wed, 31 Jan 2024, Andre Vieira (lists) wrote:


The patch didn't come with a testcase so it's really hard to tell
what goes wrong now and how it is fixed ...


My bad! I had a testcase locally but never added it...

However... now that I've looked at it and run it past Richard S, the codegen isn't 'wrong', but it does have the potential to lead to some pretty slow codegen, especially for inbranch simdclones, where it transforms the SVE predicate into an Advanced SIMD vector by inserting the elements one at a time...


An example of which can be seen if you do:

gcc -O3 -march=armv8-a+sve -msve-vector-bits=128  -fopenmp-simd t.c -S

with the following t.c:
#pragma omp declare simd simdlen(4) inbranch
int __attribute__ ((const)) fn5(int);

void fn4 (int *a, int *b, int n)
{
    for (int i = 0; i < n; ++i)
        b[i] = fn5(a[i]);
}
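
To make the slow path concrete, here is a rough plain-C model of what the vectorized loop has to do around the call. It is only a sketch: fn5_scalar, fn5_clone_model and fn4_model are made-up names, the body of fn5 is invented (t.c only declares it), and the mask encoding (nonzero int per active lane) is an assumption for illustration rather than a statement of the ABI.

```c
#include <stdbool.h>
#include <stddef.h>

#define LANES 4  /* matches simdlen(4) in t.c */

/* Hypothetical scalar body; the real fn5 is only declared in t.c. */
static int fn5_scalar(int x) { return x + 1; }

/* Model of the inbranch clone: 'mask' is a full vector of ints
   (assumed: nonzero means the lane is active), not a predicate. */
static void fn5_clone_model(const int in[LANES], const int mask[LANES],
                            int out[LANES])
{
    for (size_t l = 0; l < LANES; ++l)
        out[l] = mask[l] ? fn5_scalar(in[l]) : 0;
}

/* Model of fn4 vectorized with a loop predicate that covers the tail. */
static void fn4_model(const int *a, int *b, size_t n)
{
    for (size_t i = 0; i < n; i += LANES) {
        bool pred[LANES];
        int in[LANES], mask[LANES], out[LANES];

        /* Predicated load: inactive lanes read nothing. */
        for (size_t l = 0; l < LANES; ++l) {
            pred[l] = i + l < n;
            in[l] = pred[l] ? a[i + l] : 0;
        }

        /* The costly step: the predicate becomes a mask vector,
           built one lane at a time. */
        for (size_t l = 0; l < LANES; ++l)
            mask[l] = pred[l] ? -1 : 0;

        fn5_clone_model(in, mask, out);

        /* Predicated store: only active lanes are written back. */
        for (size_t l = 0; l < LANES; ++l)
            if (pred[l])
                b[i + l] = out[l];
    }
}
```

Calling fn4_model (a, b, 6) runs two iterations, the second with only two active lanes; the per-lane mask build in the middle is the part that corresponds to the element-by-element inserts.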

Now I do have to say, for our main usecase of libmvec we won't have any 'inbranch' Advanced SIMD clones, so we avoid that issue... But of course that doesn't mean user-code won't.

I'm gonna remove this patch and run another regression test to see if it catches anything weird, but if not then I guess we do have the option to not use this patch and aim to solve the costing or codegen issue in GCC-15. We don't currently do any simdclone costing and I don't have a clear suggestion for how, given openmp has no mechanism that I know of to expose the speedup of a simdclone over its scalar variant. So how would we 'compare' a simdclone call with the extra overhead of argument preparation vs scalar? Though at least we could prefer a call to a different simdclone with less argument preparation. Anyways, I digress.
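
Just to illustrate that last idea of preferring the clone with less argument preparation, a minimal sketch. Everything here is hypothetical: struct clone_candidate, arg_prep_cost and pick_cheapest_clone are names made up for this mail and are not GCC code, and the cost numbers would have to come from somewhere I haven't worked out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one usable clone; not a GCC type. */
struct clone_candidate {
    const char *name;        /* label for debugging only */
    unsigned arg_prep_cost;  /* assumed estimate of argument marshaling */
};

/* Return the index of the candidate with the lowest preparation
   cost; ties go to the earliest candidate. */
static size_t pick_cheapest_clone(const struct clone_candidate *c, size_t n)
{
    assert(n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; ++i)
        if (c[i].arg_prep_cost < c[best].arg_prep_cost)
            best = i;
    return best;
}
```

This only ranks clones against each other; it says nothing about whether any of them beats the scalar call, which is the part with no obvious answer.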

Other tests; these require aarch64-autovec-preference=2, so that also has me worried less...

gcc -O3 -march=armv8-a+sve -msve-vector-bits=128 --param aarch64-autovec-preference=2 -fopenmp-simd t.c -S


t.c:
#pragma omp declare simd simdlen(2) notinbranch
float __attribute__ ((const)) fn1(double);

void fn0 (float *a, float *b, int n)
{
    for (int i = 0; i < n; ++i)
        b[i] = fn1((double) a[i]);
}

#pragma omp declare simd simdlen(2) notinbranch
float __attribute__ ((const)) fn3(float);

void fn2 (float *a, double *b, int n)
{
    for (int i = 0; i < n; ++i)
        b[i] = (double) fn3(a[i]);
}

Richard.


That said, I wonder how we end up mixing things up in the first place.

Richard.
