[Bug tree-optimization/88713] Vectorized code slow vs. flang

rguenther at suse dot de Wed, 23 Jan 2019 06:06:19 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713


--- Comment #39 from rguenther at suse dot de <rguenther at suse dot de> ---
On Wed, 23 Jan 2019, hjl.tools at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
> 
> --- Comment #38 from H.J. Lu <hjl.tools at gmail dot com> ---
> (In reply to rguent...@suse.de from comment #37)
> > On Wed, 23 Jan 2019, hjl.tools at gmail dot com wrote:
> > 
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
> > > 
> > > --- Comment #36 from H.J. Lu <hjl.tools at gmail dot com> ---
> > > (In reply to Richard Biener from comment #34)
> > > > GCC definitely fails to see the FMA use as opportunity in
> > > > ix86_emit_swsqrtsf, the a == 0 checking is because of the missing
> > > > expander w/o avx512er where we could still use the NR sequence
> > > > with the other instruction.  HJ?
> > > 
> > > Like this?
> > 
> > Yes.  The lack of an expander for the rsqrt operation is probably
> > more severe though (causing sqrt + approx recip to appear).
> > 
> 
> Can we use UNSPEC_RSQRT14 here if UNSPEC_RSQRT28 isn't available?

I think we can, but we lack an expander for this.  IIRC the RTL pattern
of the following existing expander is ignored (its body emits the whole
sequence itself and ends in DONE), so we could simply replace the
TARGET_AVX512ER check with TARGET_AVX512F?

(define_expand "rsqrtv16sf2"
  [(set (match_operand:V16SF 0 "register_operand")
        (unspec:V16SF
          [(match_operand:V16SF 1 "vector_operand")]
          UNSPEC_RSQRT28))]
  "TARGET_SSE_MATH && TARGET_AVX512ER"
{
  ix86_emit_swsqrtsf (operands[0], operands[1], V16SFmode, true);
  DONE;
})
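
For reference, a minimal C sketch of the Newton-Raphson (NR) step under
discussion, assuming the formulas from the comment above
ix86_emit_swsqrtsf as I recall them.  The function names nr_rsqrt_step
and nr_sqrt_step are hypothetical and exist only for illustration; x0
stands for whatever low-precision estimate the hardware instruction
returns.

```c
/* One Newton-Raphson refinement of an estimate x0 of 1/sqrt(a):
     rsqrt(a) ~= -0.5 * x0 * (a * x0 * x0 - 3.0)
   Hypothetical helper for illustration, not GCC code.  */
float nr_rsqrt_step(float a, float x0)
{
    float e = a * x0;        /* a * x0 */
    e = e * x0 - 3.0f;       /* a * x0 * x0 - 3.0: a multiply-add, so an FMA candidate */
    return -0.5f * x0 * e;
}

/* Same step, scaled by a to obtain sqrt(a):
     sqrt(a) ~= -0.5 * (a * x0) * (a * x0 * x0 - 3.0)
   For a == 0 the estimate x0 is +inf and a * x0 is NaN, so a real
   sequence has to special-case zero for the sqrt variant.  */
float nr_sqrt_step(float a, float x0)
{
    float ax = a * x0;       /* a * x0 */
    float e = ax * x0 - 3.0f;
    return -0.5f * ax * e;
}
```

With an estimate that has a relative error of about 2^-14, e.g.
x0 = 0.5f * (1.0f + 6.1e-5f) for a = 4.0f, one step brings the result
to roughly single-precision accuracy, since the relative error is
approximately squared by each step.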
