Re: [PATCH] Allow {nearby,r}int{,f} vectorization on x86 with sse4.1 and later (PR target/93078)

Uros Bizjak Sat, 28 Dec 2019 05:20:36 -0800

On Sat, Dec 28, 2019 at 12:02 PM Jakub Jelinek <ja...@redhat.com> wrote:
>
> On Sat, Dec 28, 2019 at 11:48:12AM +0100, Uros Bizjak wrote:
> > On Sat, Dec 28, 2019 at 10:33 AM Jakub Jelinek <ja...@redhat.com> wrote:
> > >
> > > Hi!
> > >
> > > In i386.md, we have nearbyint<mode>2 and rint<mode>2 patterns that expand
> > > SF/DF/XF mode patterns to rounding instructions.  For pre-sse4.1 that is
> > > done using XFmode and so inappropriate for vectorization, but for sse4.1
> > > and later we can just use the {,v}{round,rndscale}p{s,d} instructions
> > > when we emit {,v}rounds{s,d} for SF/DF mode.
> >
> > In i386-builtins.c, ix86_builtin_vectorized_function, we already have:
> >
> > --cut here--
> >     CASE_CFN_RINT:
> >       /* The round insn does not trap on denormals.  */
> >       if (flag_trapping_math || !TARGET_SSE4_1)
> >         break;
> >
> >       if (out_mode == DFmode && in_mode == DFmode)
> >         {
> >           if (out_n == 2 && in_n == 2)
> >             return ix86_get_builtin (IX86_BUILTIN_RINTPD);
> >           else if (out_n == 4 && in_n == 4)
> >             return ix86_get_builtin (IX86_BUILTIN_RINTPD256);
> >         }
> >       if (out_mode == SFmode && in_mode == SFmode)
> >         {
> >           if (out_n == 4 && in_n == 4)
> >             return ix86_get_builtin (IX86_BUILTIN_RINTPS);
> >           else if (out_n == 8 && in_n == 8)
> >             return ix86_get_builtin (IX86_BUILTIN_RINTPS256);
> >         }
> >       break;
> > --cut here--
>
> Ok, will test removing that stuff, seems nothing in the headers uses that.
>
> > which is converting rint functions to corresponding x86 builtin. If we
> > want to go through generic path, then the above code is probably
> > redundant and should be removed together with corresponding builtins.
> > OTOH, the existing code also bails out for flag_trapping_math, so this
> > condition should also be considered in named expanders.
>
> The conditions are:
> (define_expand "nearbyint<mode>2"
>   [(use (match_operand:MODEF 0 "register_operand"))
>    (use (match_operand:MODEF 1 "nonimmediate_operand"))]
>   "(TARGET_USE_FANCY_MATH_387
>     && (!(SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH)
>           || TARGET_MIX_SSE_I387)
>     && !flag_trapping_math)
>    || (TARGET_SSE4_1 && TARGET_SSE_MATH)"
> and:
> (define_expand "rint<mode>2"
>   [(use (match_operand:MODEF 0 "register_operand"))
>    (use (match_operand:MODEF 1 "nonimmediate_operand"))]
>   "TARGET_USE_FANCY_MATH_387
>    || (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH)"
> Only nearbyint tests flag_trapping_math, and only for the pre-sse4.1 case,


This is correct, since x87 frndint always generates precision
(inexact) exceptions, but nearbyint should not generate any.
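
For illustration only (this is not part of the patch): with SSE4.1 the
same distinction is selected by bit 3 of the ROUND immediate, which
suppresses the precision exception. A minimal sketch, assuming an x86
target and a GCC-compatible compiler; the helper name
inexact_after_round is hypothetical:

```c
#include <assert.h>
#include <smmintrin.h>   /* SSE4.1: _mm_round_sd, _MM_FROUND_* */

/* Round x in the current MXCSR rounding mode and report whether the
   sticky inexact (precision) flag was set by the rounding.
   suppress == 0: rint-like, precision exception is reported.
   suppress != 0: nearbyint-like, precision exception is suppressed.
   Hypothetical helper, for illustration only.  */
__attribute__((target("sse4.1")))
int inexact_after_round(double x, int suppress, double *out)
{
    volatile double in = x;   /* keep the compiler from folding the rounding */
    volatile double result;

    assert(out != 0);

    /* Clear the sticky exception flags in MXCSR.  */
    _mm_setcsr(_mm_getcsr() & ~(unsigned)_MM_EXCEPT_MASK);

    __m128d v = _mm_set_sd(in);
    if (suppress)
        v = _mm_round_sd(v, v, _MM_FROUND_NEARBYINT);
    else
        v = _mm_round_sd(v, v, _MM_FROUND_RINT);
    result = _mm_cvtsd_f64(v);   /* volatile store orders the rounding
                                    before the MXCSR read below */

    *out = result;
    return (_mm_getcsr() & _MM_EXCEPT_INEXACT) != 0;
}
```

Calling it with 2.5 should report the inexact flag only in the rint-like
case, while both variants return the same rounded value.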

On a related note, trap on denormal is not an IEEE exception, and the
documentation explicitly says that -fno-trapping-math affects only
division by zero, overflow, underflow, inexact result and invalid
operation. So, do we need to check flag_trapping_math in
ix86_builtin_vectorized_function for the other builtins that use the
ROUND insn? Also, perhaps floor/ceil/trunc can be reimplemented using
standard named expanders instead.
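
To make the floor/ceil/trunc case concrete: all three map to the same
ROUND insn and differ only in the immediate. A rough sketch using the
SSE4.1 intrinsics, assuming an x86 target and a GCC-compatible
compiler; it only approximates what vectorized code could look like,
it is not what GCC emits, and the function name floor_ceil_trunc_pd is
hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <smmintrin.h>   /* SSE4.1: _mm_round_pd, _mm_round_sd */

/* Compute floor, ceil and trunc of n doubles, two at a time with
   roundpd and a roundsd tail for an odd element count.
   Hypothetical helper, for illustration only.  */
__attribute__((target("sse4.1")))
void floor_ceil_trunc_pd(const double *in, double *fl, double *ce,
                         double *tr, size_t n)
{
    size_t i = 0;

    assert(n == 0 || (in && fl && ce && tr));

    /* Vector body: one roundpd per result, only the immediate differs.  */
    for (; i + 2 <= n; i += 2) {
        __m128d v = _mm_loadu_pd(in + i);
        _mm_storeu_pd(fl + i, _mm_round_pd(v, _MM_FROUND_FLOOR));
        _mm_storeu_pd(ce + i, _mm_round_pd(v, _MM_FROUND_CEIL));
        _mm_storeu_pd(tr + i, _mm_round_pd(v, _MM_FROUND_TRUNC));
    }

    /* Scalar tail: same immediates with roundsd.  */
    for (; i < n; i++) {
        __m128d v = _mm_set_sd(in[i]);
        fl[i] = _mm_cvtsd_f64(_mm_round_sd(v, v, _MM_FROUND_FLOOR));
        ce[i] = _mm_cvtsd_f64(_mm_round_sd(v, v, _MM_FROUND_CEIL));
        tr[i] = _mm_cvtsd_f64(_mm_round_sd(v, v, _MM_FROUND_TRUNC));
    }
}
```

The _MM_FROUND_FLOOR, _MM_FROUND_CEIL and _MM_FROUND_TRUNC macros used
here leave the precision exception enabled, which is the part the
flag_trapping_math question above is about.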

> with sse4.1 it is enabled regardless of that (just depends on
> TARGET_SSE_MATH, but I think for vectorization we don't really test that,
> vectorization is always done in sse*).

Your patch with stuff removed from ix86_builtin_vectorized_function is OK.

Thanks,
Uros.
