Steve Ellcey <sell...@marvell.com> writes:
> I am testing the latest GCC with not-yet-submitted GLIBC changes that
> implement libmvec on Aarch64.
>
> While trying to run SPEC 2017 (specifically 521.wrf_r) I ran into a
> case where GCC was generating a call to _ZGVnN2vv_powf, that is a
> vectorized powf call for 2 (not 4) elements.  This was a problem
> because I only implemented a 4 element 32 bit vectorized powf function
> for libmvec and not a 2 element version.
>
> I think this is due to aarch64_simd_clone_compute_vecsize_and_simdlen
> which allows for (element count * element size) to be either 64
> or 128.
>
> I would like some thoughts on what we should do about this, should
> we require glibc/libmvec to provide 2 element 32 bit floating point
> vector functions (as well as the 4 element ones) or should we change
> aarch64_simd_clone_compute_vecsize_and_simdlen to only allow 4
> element (128 total bit size) vectors and not 2 element (64 total bit
> size) ones?
>
> This is obviously a question for the pre-SVE vector instructions,
> I am not sure how this would be handled in SVE.

The vector ABI says that "#pragma omp declare simd" without a simdlen
declares both 64-bit and 128-bit functions, so I think the GCC code
is doing the right thing.  If glibc only implements 128-bit functions
for powf then it should use simdlen(4).

It would be nice to support simdlen(2) as well though.  Low-trip-count
loops like the one below would be one use case.  Another would be SLP.
And hopefully at some point in the future we'll be able to turn
vect-epilogues-nomask on by default, in which case we would also have
64-bit vectorisation in the tail of a loop vectorised at 128 bits.

Thanks,
Richard

>
> Steve Ellcey
> sell...@marvell.com
>
> P.S. Here a test case in Fortran that generated the 2 element
>      vector call.  It unrolled the loop into one vector call
>      of 2 elements and one scalar call.
>
> SUBROUTINE FOO(B,W,P)
> REAL, DIMENSION (3) :: W, P
> DO 10 I = 1, 3
> P(I) = W(I) ** B
> 10 CONTINUE
> END SUBROUTINE FOO
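
For illustration only, the simdlen(4) suggestion in the reply might look
roughly like the C sketch below.  Everything named here is an assumption:
scaled_square is a hypothetical stand-in for a libmvec-style routine (it
is not powf and not glibc code), and foo is a loose C analogue of the
Fortran test case with an added length parameter n.  The intent is that,
with an explicit simdlen(4), the declaration advertises only a 4-lane
variant, so a 3-iteration loop like this one should not be turned into a
call to a 2-lane variant.

```c
#include <stddef.h>

/* Hypothetical scalar routine standing in for a libmvec-style function.
   The explicit simdlen(4) declares only a 4-lane vector variant; without
   a simdlen clause the AArch64 vector ABI, as described in the reply,
   implies both 64-bit and 128-bit variants.  */
#pragma omp declare simd simdlen(4) notinbranch
float scaled_square(float w, float b)
{
    return w * w * b;
}

/* Loose C analogue of the Fortran test case: a short loop that calls
   the routine once per element.  */
void foo(float b, const float *w, float *p, size_t n)
{
    #pragma omp simd
    for (size_t i = 0; i < n; i++)
        p[i] = scaled_square(w[i], b);
}
```

The pragmas only take effect when OpenMP SIMD support is enabled (for
example with GCC's -fopenmp-simd); otherwise they are ignored and the
code runs as plain scalar C.  Calling foo(2.0f, w, p, 3) with
w = {1, 2, 3} fills p with the scalar results in either case.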