https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
--- Comment #39 from rguenther at suse dot de <rguenther at suse dot de> ---
On Wed, 23 Jan 2019, hjl.tools at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
>
> --- Comment #38 from H.J. Lu <hjl.tools at gmail dot com> ---
> (In reply to rguent...@suse.de from comment #37)
> > On Wed, 23 Jan 2019, hjl.tools at gmail dot com wrote:
> >
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
> > >
> > > --- Comment #36 from H.J. Lu <hjl.tools at gmail dot com> ---
> > > (In reply to Richard Biener from comment #34)
> > > > GCC definitely fails to see the FMA use as opportunity in
> > > > ix86_emit_swsqrtsf, the a == 0 checking is because of the missing
> > > > expander w/o avx512er where we could still use the NR sequence
> > > > with the other instruction.  HJ?
> > >
> > > Like this?
> >
> > Yes.  The lack of an expander for the rsqrt operation is probably
> > more severe though (causing sqrt + approx recip to appear)
> >
>
> Can we use UNSPEC_RSQRT14 here if UNSPEC_RSQRT28 isn't available?

I think we can, but we lack an expander for this.  IIRC for the following
existing expander the RTL is ignored, and thus we could simply replace
the TARGET_AVX512ER check with TARGET_AVX512F?

(define_expand "rsqrtv16sf2"
  [(set (match_operand:V16SF 0 "register_operand")
        (unspec:V16SF
          [(match_operand:V16SF 1 "vector_operand")]
          UNSPEC_RSQRT28))]
  "TARGET_SSE_MATH && TARGET_AVX512ER"
{
  ix86_emit_swsqrtsf (operands[0], operands[1], V16SFmode, true);
  DONE;
})