Suppress exceptions (when specified), by saving, manipulating, and restoring the FPSCR. Similarly, save, set, and restore the floating-point rounding mode when required.
No attempt is made to optimize writing the FPSCR (by checking if the new value would be the same), other than using lighter weight instructions when possible. Note that explicit instruction scheduling "barriers" are added to prevent floating-point computations from being moved before or after the explicit FPSCR manipulations. (That these are required has been reported as an issue in GCC: PR102783.) The scalar versions naively use the parallel versions to compute the single scalar result and then construct the remainder of the result. Of minor note, the values of _MM_FROUND_TO_NEG_INF and _MM_FROUND_TO_ZERO are swapped from the corresponding values on x86 so as to match the corresponding rounding mode values in the Power ISA. Move implementations of _mm_ceil* and _mm_floor* into _mm_round*, and convert _mm_ceil* and _mm_floor* into macros. This matches the current analogous implementations in config/i386/smmintrin.h. Function signatures match the analogous functions in config/i386/smmintrin.h. Add tests for _mm_round_pd, _mm_round_ps, _mm_round_sd, _mm_round_ss, modeled after the very similar "floor" and "ceil" tests. Include basic tests, plus tests at the boundaries for floating-point representation, positive and negative, test all of the parameterized rounding modes as well as the C99 rounding modes and interactions between the two. Exceptions are not explicitly tested. 2021-10-18 Paul A. Clarke <p...@us.ibm.com> gcc * config/rs6000/smmintrin.h (_mm_round_pd, _mm_round_ps, _mm_round_sd, _mm_round_ss, _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_ZERO, _MM_FROUND_TO_POS_INF, _MM_FROUND_TO_NEG_INF, _MM_FROUND_CUR_DIRECTION, _MM_FROUND_RAISE_EXC, _MM_FROUND_NO_EXC, _MM_FROUND_NINT, _MM_FROUND_FLOOR, _MM_FROUND_CEIL, _MM_FROUND_TRUNC, _MM_FROUND_RINT, _MM_FROUND_NEARBYINT): New. * config/rs6000/smmintrin.h (_mm_ceil_pd, _mm_ceil_ps, _mm_ceil_sd, _mm_ceil_ss, _mm_floor_pd, _mm_floor_ps, _mm_floor_sd, _mm_floor_ss): Convert from function to macro. gcc/testsuite * gcc.target/powerpc/sse4_1-round3.h: New. * gcc.target/powerpc/sse4_1-roundpd.c: New. * gcc.target/powerpc/sse4_1-roundps.c: New. * gcc.target/powerpc/sse4_1-roundsd.c: New. * gcc.target/powerpc/sse4_1-roundss.c: New. Okay. Thanks, David