On Thu, Aug 17, 2017 at 08:40:34PM -0500, Steven Munroe wrote:
> > > +/* Convert the lower SPFP value to a 32-bit integer according to the current
> > > +   rounding mode.  */
> > > +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> > > +_mm_cvtss_si32 (__m128 __A)
> > > +{
> > > +  __m64 res = 0;
> > > +#ifdef _ARCH_PWR8
> > > +  __m128 vtmp;
> > > +  __asm__(
> > > +      "xxsldwi %x1,%x2,%x2,3;\n"
> > > +      "xscvspdp %x1,%x1;\n"
> > > +      "fctiw  %1,%1;\n"
> > > +      "mfvsrd  %0,%x1;\n"
> > > +      : "=r" (res),
> > > +        "=&wi" (vtmp)
> > > +      : "wa" (__A)
> > > +      : );
> > > +#endif
> > > +  return (res);
> > > +}
> > 
> > Maybe it could do something better than return the wrong answer for non-p8?
> 
> Ok this gets tricky. Before _ARCH_PWR8 the vector to scalar transfer
> would go through storage. But that is not the worst of it.

Float to int conversion goes through storage on older systems, too.

> The semantic of cvtss requires rint or llrint. But __builtin_rint will
> generate a call to libm unless we assert -ffast-math.

Yeah, we should fix that some day.  If we can.

> And we don't have
> builtins to generate fctiw/fctid directly.

Yup.  Well, __builtin_rint*, but that currently calls out to libm.

> So I will add the #else using __builtin_rint if that libm dependency is
> ok (this will pop in the DG test for older machines).

Another option is to not support this intrinsic for < POWER8.

I don't have a big (or well-informed) opinion on which is best; but I
doubt always returning 0 is the best we can do ;-)


Segher
