On Tue, Jul 6, 2021 at 3:15 AM Richard Biener via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Tue, Jul 6, 2021 at 10:46 AM Hongtao Liu <crazy...@gmail.com> wrote:
> >
> > On Thu, Jul 1, 2021 at 9:04 PM Jakub Jelinek via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > >
> > > On Thu, Jul 01, 2021 at 02:58:01PM +0200, Richard Biener wrote:
> > > > > The main issue is complex _Float16 functions in libgcc. If _Float16
> > > > > doesn't
> > > > > require -mavx512fp16, we need to compile complex _Float16 functions in
> > > > > libgcc without -mavx512fp16. Complex _Float16 performance is very
> > > > > important for our _Float16 usage. _Float16 performance has to be
> > > > > very fast. There should be no emulation anywhere when -mavx512fp16
> > > > > is used. That is why _Float16 is available only with -mavx512fp16.
> > > >
> > > > It should be possible to emulate scalar _Float16 using _Float32 with a
> > > > reasonable
> > > > performance trade-off. I think users caring for _Float16 performance
> > > > will
> > > > use vector intrinsics anyway since for scalar code _Float32 code will
> > > > likely
> > > > perform the same (at double storage cost)
> > >
> > > Only if it is allowed to have excess precision for _Float16. If not, then
> > > one would need to (expensively?) round after every operation at least.
> > There may be inconsistent behavior between soft-fp and avx512fp16
> > instructions if we emulate _Float16 w/ float .
> > i.e
> > 1) for a + b - c where b and c are variables with the same big value
> > and a + b is NAN at _Float16 and real value at float, avx512fp16
> > instruction will raise an exception but soft-fp won't(unless it's
> > rounded after every operation.)
> > 2) a / b where b is denormal value and AVX512FP16 won't flush it to
> > zero even w/ -Ofast, but when it's extended to float and using divss,
> > it will be flushed to zero and raise an exception when compiling w/
> > Ofast
> >
> > To solve the upper issue, i try to add full emulation for _Float16(for
> > all those under libgcc/soft-fp/, i.e. add/sub/mul/div/cmp, .etc),
> > problem is in pass_expand, it always try wider mode first instead of
> > using soft-fp
> >
> > /* Look for a wider mode of the same class for which we think we
> > can open-code the operation. Check for a widening multiply at the
> > wider mode as well. */
> >
> > if (CLASS_HAS_WIDER_MODES_P (mclass)
> > && methods != OPTAB_DIRECT && methods != OPTAB_LIB)
> > FOR_EACH_WIDER_MODE (wider_mode, mode)
> >
> > I think pass_expand did this for some reason, so I'm a little afraid
> > to touch this part of the code.
>
> It might be the first time we hit this ;) I don't think it's safe for
> non-integer modes or even anything but a small set of operations.
> Just consider ssadd besides rounding issues or FP.
>
> > So the key point is that the soft-fp and avx512fp16 instructions may
> > do not behave the same on the exception, is this acceptable?
>
> I think that's quite often the case for soft-fp.
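
The a + b - c case quoted above can be reproduced outside the compiler.
What follows is a minimal sketch under stated assumptions, not GCC's
soft-fp: Python's struct 'e' format stands in for rounding to _Float16,
Python's binary64 float stands in for the wider evaluation type, and the
helper names (to_half, per_op_rounding, excess_precision) are made up for
illustration. With a = b = c = 40000, a + b exceeds the largest finite
binary16 value (65504), so rounding after every operation overflows to
infinity, while evaluating in the wider type and rounding once does not.

```python
import struct


def to_half(x: float) -> float:
    """Round x to the nearest IEEE binary16 value (illustrative stand-in
    for storing a result in _Float16); overflow becomes +/-infinity."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses finite values that round past the binary16 range;
        # IEEE round-to-nearest would deliver infinity here.
        return float("inf") if x > 0 else float("-inf")


def per_op_rounding(a: float, b: float, c: float) -> float:
    """a + b - c with the result rounded to binary16 after every operation."""
    return to_half(to_half(a + b) - c)


def excess_precision(a: float, b: float, c: float) -> float:
    """a + b - c evaluated in the wider type, rounded to binary16 once."""
    return to_half((a + b) - c)


if __name__ == "__main__":
    a = b = c = 40000.0  # each value is exactly representable in binary16
    print(per_op_rounding(a, b, c))   # inf
    print(excess_precision(a, b, c))  # 40000.0
```

The two results differ only because the intermediate sum is kept in the
wider type, which is the inconsistency being discussed. The sketch does not
model exception flags or the flush-to-zero behavior in case 2).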

So this is a GCC limitation. Please document the different behaviors of
_Float16 with and without AVX512FP16, similar to the existing text for
'__fp16':

---
The '__fp16' type may only be used as an argument to intrinsics defined
in '<arm_fp16.h>', or as a storage format. For purposes of arithmetic
and other operations, '__fp16' values in C or C++ expressions are
automatically promoted to 'float'.

The ARM target provides hardware support for conversions between
'__fp16' and 'float' values as an extension to VFP and NEON (Advanced
SIMD), and from ARMv8-A provides hardware support for conversions
between '__fp16' and 'double' values. GCC generates code using these
hardware instructions if you compile with options to select an FPU that
provides them; for example, '-mfpu=neon-fp16 -mfloat-abi=softfp', in
addition to the '-mfp16-format' option to select a half-precision format.

Language-level support for the '__fp16' data type is independent of
whether GCC generates code using hardware floating-point instructions.
In cases where hardware support is not specified, GCC implements
conversions between '__fp16' and other types as library calls.

It is recommended that portable code use the '_Float16' type defined by
ISO/IEC TS 18661-3:2015. *Note Floating Types::.
---

We recommend that portable code use _Float16 with AVX512FP16.

> > BTW, i've finished a initial patch to enable _Float16 on sse2, and
> > emulate _Float16 operation w/ float, and it passes all 312 new tests
> > which are related to _Float16, but those units tests doesn't cover the
> > scenario I'm talking about.
> > >
> > > Jakub
> > >
> >
> >
> > --
> > BR,
> > Hongtao

--
H.J.