On Thu, 21 Mar 2024, Rainer Orth wrote:

> gcc.dg/vect/bb-slp-32.c currently XPASSes on 32 and 64-bit Solaris/SPARC:
> 
> XPASS: gcc.dg/vect/bb-slp-32.c -flto -ffat-lto-objects  scan-tree-dump slp2 "vectorization is not profitable"
> XPASS: gcc.dg/vect/bb-slp-32.c scan-tree-dump slp2 "vectorization is not profitable"
> 
> At least on SPARC, the current xfail can simply go, but I'm highly
> uncertain if this is right in general.
> 
> Tested on sparc-sun-solaris2.11 and i386-pc-solaris2.11.
> 
> Ok for trunk?

The condition was made for the case where vectorization fails even when
not considering costing.  But given we now do

  p = __builtin_assume_aligned (p, __BIGGEST_ALIGNMENT__);

that condition doesn't make sense anymore (forgot to update it in my
r11-6715-gb36c9cd09472c8 change).
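
To illustrate why alignment no longer matters for that condition, here is
a minimal sketch of the pattern.  It is not the testcase itself: the names
sum4_store and demo are made up for illustration, and the arithmetic is
only assumed to resemble what bb-slp-32.c does.  The point is that once
the pointer is declared aligned, the vectorizer need not worry about
misaligned accesses, so a target lacking misaligned vector loads is no
longer a reason for vectorization to fail.

```c
#include <assert.h>

/* Hypothetical function: four loads from P feed both a store group
   (OUT[0..3]) and a plus-reduction (SUM).  */
int
sum4_store (int *p, int a, int b, int *out)
{
  int sum = 0;

  /* Promise the compiler that P is aligned to the largest alignment
     the target supports.  Passing a less-aligned pointer is undefined
     behavior.  */
  p = __builtin_assume_aligned (p, __BIGGEST_ALIGNMENT__);

  out[0] = p[0] + 1 + a;
  sum += out[0];
  out[1] = p[1] + 2 + b;
  sum += out[1];
  out[2] = p[2] + 3 + b;
  sum += out[2];
  out[3] = p[3] + 4 + a;
  sum += out[3];
  return sum;
}

/* Hypothetical driver: the input buffer is explicitly given the
   alignment that sum4_store assumes.  */
int
demo (void)
{
  static int buf[4] __attribute__ ((aligned (__BIGGEST_ALIGNMENT__)))
    = { 10, 20, 30, 40 };
  int x[4];
  int sum = sum4_store (buf, 1, 2, x);

  /* The stored values and the reduction come from the same scalars.  */
  assert (x[0] + x[1] + x[2] + x[3] == sum);
  return sum;
}
```

Whether a given compiler actually vectorizes this still depends on
costing, as discussed below.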

In principle the testcase should be profitable to vectorize with
the SLP reduction support now (and we'd vectorize it that way).
But we fail to apply SLP node CSE when merging the SLP instances
into a common subgraph, so we over-estimate the cost (and perform
double code generation that's later CSEd).

On x86_64 it is still not profitable for me, but only by a narrow margin:

  Vector cost: 144
  Scalar cost: 140

So ideally we'd key the FAIL on .REDUC_PLUS not being available for
V4SImode, but then we also try V2SImode, where the reduction isn't
recognized.  So the testcase wouldn't work well for targets that
compare costs.

I'd say we remove the dg-final completely for now.  I filed PR114413
about the costing/CSE issue above.

Richard.
