On Thu, Aug 17, 2017 at 08:40:34PM -0500, Steven Munroe wrote:
> > > +/* Convert the lower SPFP value to a 32-bit integer according to the current
> > > +   rounding mode.  */
> > > +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> > > +_mm_cvtss_si32 (__m128 __A)
> > > +{
> > > +  __m64 res = 0;
> > > +#ifdef _ARCH_PWR8
> > > +  __m128 vtmp;
> > > +  __asm__(
> > > +      "xxsldwi %x1,%x2,%x2,3;\n"
> > > +      "xscvspdp %x1,%x1;\n"
> > > +      "fctiw  %1,%1;\n"
> > > +      "mfvsrd  %0,%x1;\n"
> > > +      : "=r" (res),
> > > +        "=&wi" (vtmp)
> > > +      : "wa" (__A)
> > > +      : );
> > > +#endif
> > > +  return (res);
> > > +}
> >
> > Maybe it could do something better than return the wrong answer for non-p8?
>
> Ok this gets tricky. Before _ARCH_PWR8 the vector to scalar transfer
> would go through storage. But that is not the worst of it.

Float to int conversion goes through storage on older systems, too.

> The semantic of cvtss requires rint or llrint. But __builtin_rint will
> generate a call to libm unless we assert -ffast-math.

Yeah, we should fix that some day.  If we can.

> And we don't have
> builtins to generate fctiw/fctid directly.

Yup.  Well, __builtin_rint*, but that currently calls out to libm.

> So I will add the #else using __builtin_rint if that libm dependency is
> ok (this will pop in the DG test for older machines.

Another option is to not support this intrinsic for < POWER8.  I don't
have a big (or well-informed) opinion on which is best; but I doubt
always returning 0 is the best we can do ;-)


Segher
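
For illustration only, here is a minimal sketch of what a libm-free #else
branch could look like.  It is not from the patch: the name
cvtss_si32_sketch is hypothetical, and instead of __builtin_rint it swaps
in the usual add-and-subtract-2^23 trick, assuming IEEE single precision.
Returning INT_MIN for NaN and out-of-range inputs mirrors what x86
cvtss2si produces, which is an assumption about the behaviour wanted here.

```c
#include <limits.h>

/* Hypothetical helper (not part of the patch): convert a float to int
   using the current rounding mode, without calling into libm.  */
static inline int
cvtss_si32_sketch (float x)
{
  const float two23 = 8388608.0f;	/* 2^23: floats this large are integral.  */

  /* NaN or outside the int range: return the x86 "integer indefinite"
     value (assumed to be the desired result).  */
  if (!(x >= -2147483648.0f && x < 2147483648.0f))
    return INT_MIN;

  if (x > -two23 && x < two23)
    {
      /* Pushing the value into [2^23, 2^24] (or the negative mirror)
	 makes the FPU round it to an integer in the current rounding
	 mode.  The volatile store keeps the compiler from folding the
	 pair of operations or carrying excess precision.  */
      volatile float t = (x < 0.0f) ? x - two23 : x + two23;
      x = (x < 0.0f) ? t + two23 : t - two23;
    }

  /* x is now integral and in range, so the truncating cast is exact.  */
  return (int) x;
}
```

With the default round-to-nearest-even mode, cvtss_si32_sketch (2.5f)
gives 2 and cvtss_si32_sketch (3.5f) gives 4.  Where the libm dependency
is acceptable, lrintf from <math.h> expresses the same conversion more
directly.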