Re: [PATCH 0/2] Initial support for AVX512FP16

Joseph Myers Thu, 01 Jul 2021 13:46:34 -0700

Some general comments, following what I said on libc-alpha:


1. Can you confirm that the ABI being used for 64-bit, for _Float16 and 
_Complex _Float16 argument passing and return, follows the current x86_64 
ABI document?


2. Can you confirm that if you build with this instruction set extension 
enabled by default, and run GCC tests for a corresponding (emulated?) 
processor, all the existing float16 tests in the testsuite are enabled and 
PASS, for both compilation and execution, in both 64-bit and 32-bit 
testing?


3. There's an active 32-bit ABI mailing list (ia32-...@googlegroups.com).  
If you want to support _Float16 in the 32-bit case, please work with it to 
get the corresponding ABI documented (using only memory and 
general-purpose registers seems like a good idea, so that the ABI can be 
supported for the base architecture without depending on SSE registers 
being present).  In the absence of 32-bit ABI support it might be better 
to disable the HFmode support for 32-bit.


4. Support for _Float16 really ought not to depend on whether a particular 
instruction set extension is present, just like with other floating-point 
types; it makes sense, as an API, for all x86 processors (and like many 
APIs, it will be faster on some processors than on others).  More specific 
points here are:

(a) Basic arithmetic (+-*/) can be done by converting to SFmode, doing 
arithmetic there and converting back to HFmode; the results of doing so 
will be correctly rounded.  Indeed, I think optabs.c handles that 
automatically when operations are available on a wider mode but not on the 
desired mode (but you'd need to check carefully that all the expected 
conversions do occur).

(b) Conversions to/from all other floating-point modes will always be 
needed, whether in hardware or in software.

(c) In the F16C (Ivy Bridge and later) case, where you have hardware 
conversions to/from float (only), it's fine to convert to double (or long 
double) via float.  (On efficiency grounds, widening from HFmode to TFmode 
should be a pure software operation; that should be faster than having an 
intermediate conversion to SFmode when the SFmode-to-TFmode conversion is 
a software operation.)

(d) In the F16C case (where there are hardware conversions only from 
SFmode, not from wider modes), conversion *from* DFmode (or XFmode or 
TFmode) to HFmode should be a software operation, to avoid double 
rounding; an intermediate conversion to SFmode would be incorrect.

(e) It's OK for conversions to/from integer modes to go via SFmode 
(although I don't know if that's efficient or not).  Any case where a 
conversion from integer to SFmode is inexact would overflow HFmode, so 
there are no double rounding issues.

(f) In the F16C case, it seems the hardware instructions only work on 
vectors, not scalars, so care would need to be taken to use them for 
scalar conversions only if the other elements of the vector register are 
known to be safe to convert without raising any exceptions (e.g. all zero 
bits, or -fno-trapping-math in effect).

(g) If concerned about efficiency of intermediate truncations on 
processors without hardware _Float16 arithmetic, look at 
aarch64_excess_precision; you have the option of using excess precision 
for _Float16 by default, though that only really helps for C given the 
lack of excess precision support in the C++ front end.  (Enabling this can 
cause trouble for code that only expects C99/C11 values of 
FLT_EVAL_METHOD, however; see the -fpermitted-flt-eval-methods option for 
more details.)
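
To make points (d) and (e) above concrete, here is a small sketch in 
plain C.  It is not GCC code: it models a significand in [1, 2) as a 
fixed-point integer with 52 fraction bits and rounds it to nearest, ties 
to even, so it needs no _Float16 support to run.  All function names are 
hypothetical, and the model ignores exponents, subnormals and overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Fraction bits of IEEE binary64, binary32 and binary16. */
enum { DOUBLE_FRAC_BITS = 52, FLOAT_FRAC_BITS = 23, HALF_FRAC_BITS = 10 };

/* Round n to a multiple of 2^drop: round to nearest, ties to even. */
static uint64_t round_nearest_even(uint64_t n, int drop)
{
    assert(drop > 0 && drop < 63);
    uint64_t unit = (uint64_t)1 << drop;   /* weight of the last kept bit */
    uint64_t half = unit >> 1;             /* exactly half of that weight */
    uint64_t rem = n & (unit - 1);         /* the bits being discarded */
    uint64_t base = n - rem;
    if (rem > half || (rem == half && (base & unit) != 0))
        base += unit;
    return base;
}

/* Point (d): a single rounding from the 52-bit significand to 10 bits. */
static uint64_t to_half_direct(uint64_t sig)
{
    return round_nearest_even(sig, DOUBLE_FRAC_BITS - HALF_FRAC_BITS);
}

/* Point (d): the incorrect route, rounding to 23 bits and then to 10. */
static uint64_t to_half_via_float(uint64_t sig)
{
    uint64_t as_float =
        round_nearest_even(sig, DOUBLE_FRAC_BITS - FLOAT_FRAC_BITS);
    return round_nearest_even(as_float, DOUBLE_FRAC_BITS - HALF_FRAC_BITS);
}

/* Point (e): returns 1 if every integer in [-limit, limit] survives a
   round trip through float unchanged, 0 otherwise. */
static int ints_exact_in_float_up_to(int32_t limit)
{
    for (int32_t n = -limit; n <= limit; n++) {
        volatile float f = (float)n;       /* force rounding to binary32 */
        if ((int32_t)f != n)
            return 0;
    }
    return 1;
}
```

For sig = 2^52 + 2^41 + 2^22 (the value 1 + 2^-11 + 2^-30), 
to_half_direct gives 2^52 + 2^42 (that is, 1 + 2^-10), while 
to_half_via_float gives 2^52 (that is, 1.0): the first rounding lands 
exactly on a tie, which the second rounding then resolves downwards.  
ints_exact_in_float_up_to(1 << 24) returns 1, so an integer that is 
inexact in float has magnitude above 2^24, far beyond the largest finite 
binary16 value (65504).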


5. Suppose that in some cases you do disable _Float16 support (whether 
that's just for 32-bit until the ABI has been defined, or also in the 
absence of instruction set support despite my comments above).  Then the 
way you do that in this patch series, enabling the type in 
ix86_scalar_mode_supported_p and ix86_libgcc_floating_mode_supported_p and 
giving an error later in ix86_expand_move, is a bad idea.

Errors in expanders are generally problematic (they don't have good 
location information available).  But apart from that, ordinary user code 
should be able to tell whether _Float16 is supported by testing whether 
e.g. __FLT16_MANT_DIG__ is defined (like float.h does), or by including 
float.h (with __STDC_WANT_IEC_60559_TYPES_EXT__ defined) and then testing 
whether one of the FLT16_* macros is defined, or in a configure test by 
just declaring something using the _Float16 type.  Patch 1 changes 
check_effective_target_float16 to work around your technique for disabling 
_Float16 in ix86_expand_move, but it should be considered a stable user 
API that any of the above methods can be used in user code to check for 
_Float16 support.  User code shouldn't need to know the implementation 
detail that it has to do something that goes through ix86_expand_move to 
see whether _Float16 is supported or not.  Nor should user code need a 
configure test at all for this: testing FLT16_* after including float.h 
should work as a fully portable way of testing it, using only ISO C 
facilities.
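
As an illustration of the user-side check described above, here is a 
minimal sketch.  HAVE_FLOAT16, narrow_float_t and check_narrow_float are 
hypothetical names chosen for this example; only 
__STDC_WANT_IEC_60559_TYPES_EXT__, float.h, FLT16_MANT_DIG and _Float16 
come from the discussion.  It assumes the compiler defines the FLT16_* 
macros exactly when the type is fully usable, which is the property being 
asked for here.

```c
/* Must be defined before float.h is first included. */
#define __STDC_WANT_IEC_60559_TYPES_EXT__
#include <float.h>
#include <assert.h>

#ifdef FLT16_MANT_DIG
/* float.h advertises _Float16, so user code may rely on the type. */
#define HAVE_FLOAT16 1
typedef _Float16 narrow_float_t;
#else
/* No _Float16 advertised: fall back to float in this sketch. */
#define HAVE_FLOAT16 0
typedef float narrow_float_t;
#endif

/* Sanity check: whichever type was chosen is no wider than float.
   Returns HAVE_FLOAT16 so callers can branch on the result. */
static int check_narrow_float(void)
{
    assert(sizeof(narrow_float_t) <= sizeof(float));
    return HAVE_FLOAT16;
}
```

Testing the predefined macro __FLT16_MANT_DIG__ directly, without 
including float.h, is the other option mentioned above.  If the macros 
were defined while a later stage rejected the type, the first branch would 
be selected and then fail to compile, which is the breakage this point 
warns about.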

So enable HFmode in ix86_scalar_mode_supported_p and 
ix86_libgcc_floating_mode_supported_p exactly when all operations are 
supported in the rest of the compiler - don't enable it there and then 
disable it elsewhere, because that will break user code testing for 
whether _Float16 is available using FLT16_* macros.

-- 
Joseph S. Myers
jos...@codesourcery.com
