Re: [PATCH 0/2] Initial support for AVX512FP16

H.J. Lu via Gcc-patches Tue, 06 Jul 2021 05:12:36 -0700

On Tue, Jul 6, 2021 at 3:15 AM Richard Biener via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Tue, Jul 6, 2021 at 10:46 AM Hongtao Liu <crazy...@gmail.com> wrote:
> >
> > On Thu, Jul 1, 2021 at 9:04 PM Jakub Jelinek via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > >
> > > On Thu, Jul 01, 2021 at 02:58:01PM +0200, Richard Biener wrote:
> > > > > The main issue is complex _Float16 functions in libgcc.  If _Float16
> > > > > doesn't require -mavx512fp16, we need to compile complex _Float16
> > > > > functions in libgcc without -mavx512fp16.  Complex _Float16
> > > > > performance is very important for our _Float16 usage.  _Float16
> > > > > performance has to be very fast.  There should be no emulation
> > > > > anywhere when -mavx512fp16 is used.  That is why _Float16 is
> > > > > available only with -mavx512fp16.
> > > >
> > > > It should be possible to emulate scalar _Float16 using _Float32 with a
> > > > reasonable performance trade-off.  I think users caring for _Float16
> > > > performance will use vector intrinsics anyway, since for scalar code
> > > > _Float32 code will likely perform the same (at double the storage cost).
> > >
> > > Only if it is allowed to have excess precision for _Float16.  If not, then
> > > one would need to (expensively?) round after every operation at least.
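A worked example of why excess precision matters, assuming IEEE binary16 with round-to-nearest-even, where rnd16 denotes rounding to binary16 and u is an illustrative operand chosen here (not taken from the thread):

$$
\begin{aligned}
u &= 2^{-11} \quad (\text{half an ulp of } 1 \text{ in binary16, since } \mathrm{ulp}(1) = 2^{-10})\\
\text{excess precision (round once):}\quad \mathrm{rnd}_{16}\big((1+u)+u\big) &= \mathrm{rnd}_{16}\big(1+2^{-10}\big) = 1+2^{-10}\\
\text{round after every operation:}\quad \mathrm{rnd}_{16}\big(\mathrm{rnd}_{16}(1+u)+u\big) &= \mathrm{rnd}_{16}(1+u) = 1
\end{aligned}
$$

Here 1 + 2^-11 lies exactly halfway between 1 and 1 + 2^-10, so ties-to-even picks 1 at each step, while in float (1 + u) + u is computed exactly and the single final rounding keeps 1 + 2^-10.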
> > There may be inconsistent behavior between soft-fp and AVX512FP16
> > instructions if we emulate _Float16 with float, e.g.:
> >   1) For a + b - c, where b and c are variables with the same large value
> > and a + b overflows to infinity in _Float16 but is a finite value in
> > float, the AVX512FP16 instruction will raise an exception but soft-fp
> > won't (unless the result is rounded after every operation).
> >   2) For a / b, where b is a denormal value, AVX512FP16 won't flush it to
> > zero even with -Ofast, but when it's extended to float and divss is
> > used, it will be flushed to zero and raise an exception when compiling
> > with -Ofast.
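To make case 1) concrete, here is a minimal C sketch of emulating binary16 arithmetic with float, assuming IEEE-754 binary32 float, the default round-to-nearest-even mode, and no -ffast-math. The helpers (float_to_half_bits, half_bits_to_float, f16_round and the two a + b - c variants) are hypothetical names for illustration, not libgcc or soft-fp code, and the sketch compares only results; it does not model the FE_OVERFLOW flag that a native binary16 add would raise.

```c
#include <stdint.h>
#include <string.h>

/* Round a float to the nearest binary16 value (ties to even) and return its
   16-bit encoding.  Overflow produces Inf; tiny values become subnormal or 0. */
static uint16_t float_to_half_bits(float f)
{
    uint32_t x;
    memcpy(&x, &f, sizeof x);
    uint16_t sign = (uint16_t)((x >> 16) & 0x8000u);
    uint32_t bexp = (x >> 23) & 0xffu;
    uint32_t mant = x & 0x7fffffu;

    if (bexp == 0xffu)                        /* Inf or NaN */
        return (uint16_t)(sign | 0x7c00u | (mant ? 0x200u : 0u));

    int e = (int)bexp - 127 + 15;             /* rebias the exponent */
    if (e >= 31)                              /* too large: overflow to Inf */
        return (uint16_t)(sign | 0x7c00u);

    if (e <= 0) {                             /* binary16 subnormal or zero */
        if (e < -10)                          /* below half the smallest subnormal */
            return sign;
        mant |= 0x800000u;                    /* make the implicit bit explicit */
        int shift = 14 - e;                   /* between 14 and 24 */
        uint32_t h = mant >> shift;
        uint32_t rem = mant & ((1u << shift) - 1u);
        uint32_t halfway = 1u << (shift - 1);
        if (rem > halfway || (rem == halfway && (h & 1u)))
            h++;                              /* may carry into the smallest normal */
        return (uint16_t)(sign | h);
    }

    uint32_t h = ((uint32_t)e << 10) | (mant >> 13);
    uint32_t rem = mant & 0x1fffu;
    if (rem > 0x1000u || (rem == 0x1000u && (h & 1u)))
        h++;                                  /* 0x7bff + 1 carries into 0x7c00 (Inf) */
    return (uint16_t)(sign | h);
}

/* Decode a binary16 encoding back to float (always exact). */
static float half_bits_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t bexp = (h >> 10) & 0x1fu;
    uint32_t mant = h & 0x3ffu;
    uint32_t x;

    if (bexp == 0) {                          /* zero or subnormal: mant * 2^-24 */
        float mag = (float)mant * 0x1p-24f;   /* exact, since mant < 2^10 */
        return sign ? -mag : mag;
    }
    if (bexp == 0x1fu)                        /* Inf or NaN */
        x = sign | 0x7f800000u | (mant << 13);
    else                                      /* normal: rebias 15 -> 127 */
        x = sign | ((bexp + 112u) << 23) | (mant << 13);
    float f;
    memcpy(&f, &x, sizeof f);
    return f;
}

/* Hypothetical helper: the value of f after a round trip through binary16. */
static float f16_round(float f)
{
    return half_bits_to_float(float_to_half_bits(f));
}

/* a + b - c evaluated in float, rounded to binary16 once (excess precision). */
static float sub_add_excess_precision(float a, float b, float c)
{
    return f16_round(a + b - c);
}

/* a + b - c rounded to binary16 after every operation, as native binary16
   instructions do; the intermediate a + b can overflow to Inf here. */
static float sub_add_per_op_rounding(float a, float b, float c)
{
    float t = f16_round(a + b);
    return f16_round(t - c);
}
```

With a = 64 and b = c = 65504 (the largest finite binary16 value), sub_add_excess_precision returns 64, while sub_add_per_op_rounding returns +Inf because 65568 overflows binary16 before c is subtracted. Rounding after every operation is what keeps the emulated result consistent with native binary16 instructions, at the cost Jakub mentions.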
> >
> > To solve the above issues, I tried to add full emulation for _Float16
> > (for everything under libgcc/soft-fp/, i.e. add/sub/mul/div/cmp, etc.);
> > the problem is that pass_expand always tries a wider mode first instead
> > of using soft-fp:
> >
> >   /* Look for a wider mode of the same class for which we think we
> >      can open-code the operation.  Check for a widening multiply at the
> >      wider mode as well.  */
> >
> >   if (CLASS_HAS_WIDER_MODES_P (mclass)
> >       && methods != OPTAB_DIRECT && methods != OPTAB_LIB)
> >     FOR_EACH_WIDER_MODE (wider_mode, mode)
> >
> > I think pass_expand does this for a reason, so I'm a little afraid
> > to touch this part of the code.
>
> It might be the first time we hit this ;)  I don't think it's safe for
> non-integer modes, or even for anything but a small set of operations.
> Just consider ssadd, besides the rounding issues for FP.
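A minimal C sketch of the ssadd point, assuming 8-bit and 16-bit signed saturating adds as stand-ins for a narrow mode and its wider mode. The function names are hypothetical and this is not how GCC's expander lowers ssadd; it only shows that performing the operation in a wider mode and truncating back is not equivalent, because the wider mode saturates at a different bound.

```c
#include <stdint.h>

/* Signed saturating add in the narrow (8-bit) mode: clamp to [-128, 127]. */
static int8_t ssadd8(int8_t a, int8_t b)
{
    int sum = (int)a + (int)b;            /* exact: |sum| <= 256 */
    if (sum > INT8_MAX) return INT8_MAX;
    if (sum < INT8_MIN) return INT8_MIN;
    return (int8_t)sum;
}

/* Signed saturating add in the wider (16-bit) mode. */
static int16_t ssadd16(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (int16_t)sum;
}

/* Naive lowering: widen to 16 bits, saturate there, truncate back to 8 bits.
   8-bit inputs never reach the 16-bit bounds, so nothing is clamped and the
   truncation wraps instead (GCC defines this conversion as modulo 2^8). */
static int8_t ssadd8_via_wider_mode(int8_t a, int8_t b)
{
    return (int8_t)ssadd16(a, b);
}
```

For a = b = 100, ssadd8 clamps to 127, while ssadd8_via_wider_mode computes 200 in 16 bits and truncation then yields -56. The two agree only when the narrow result does not saturate.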
>
> > So the key point is that soft-fp and AVX512FP16 instructions may not
> > behave the same with respect to exceptions; is this acceptable?
>
> I think that's quite often the case for soft-fp.


So this is a GCC limitation.  Please document the different behaviors
of _Float16 with and without AVX512FP16, similar to

---
 The '__fp16' type may only be used as an argument to intrinsics defined
in '<arm_fp16.h>', or as a storage format.  For purposes of arithmetic
and other operations, '__fp16' values in C or C++ expressions are
automatically promoted to 'float'.

 The ARM target provides hardware support for conversions between
'__fp16' and 'float' values as an extension to VFP and NEON (Advanced
SIMD), and from ARMv8-A provides hardware support for conversions
between '__fp16' and 'double' values.  GCC generates code using these
hardware instructions if you compile with options to select an FPU that
provides them; for example, '-mfpu=neon-fp16 -mfloat-abi=softfp', in
addition to the '-mfp16-format' option to select a half-precision
format.

 Language-level support for the '__fp16' data type is independent of
whether GCC generates code using hardware floating-point instructions.
In cases where hardware support is not specified, GCC implements
conversions between '__fp16' and other types as library calls.

 It is recommended that portable code use the '_Float16' type defined by
ISO/IEC TS 18661-3:2015.  *Note Floating Types::.
---

The documentation should recommend AVX512FP16 for portable _Float16 code.

> > BTW, I've finished an initial patch to enable _Float16 on SSE2 and
> > emulate _Float16 operations with float; it passes all 312 new tests
> > related to _Float16, but those unit tests don't cover the
> > scenario I'm talking about.
> > >
> > >         Jakub
> > >
> >
> >
> > --
> > BR,
> > Hongtao



-- 
H.J.
