RE: [llvm-dev] [PATCH 0/2] Initial support for AVX512FP16

2021-07-15 Thread Wang, Pengfei via Gcc-patches
It seems Clang doesn't support -fexcess-precision=xxx:
https://github.com/llvm/llvm-project/blob/main/clang/test/Driver/clang_f_opts.c#L403

Thanks
Pengfei

-----Original Message-----
From: Hongtao Liu  
Sent: Thursday, July 15, 2021 2:35 PM
To: Wang, Pengfei 
Cc: Craig Topper ; Jakub Jelinek ; 
Liu, Hongtao ; gcc-patches@gcc.gnu.org; Joseph Myers 

Subject: Re: [llvm-dev] [PATCH 0/2] Initial support for AVX512FP16

On Thu, Jul 15, 2021 at 10:07 AM Wang, Pengfei  wrote:
>
> Clang for AArch64 promotes each individual operation and rounds immediately 
> afterwards. https://godbolt.org/z/qzGfv6nvo note the fcvts between the two 
> fadd operations. It's implemented in the LLVM backend where we can't see what 
> was originally a single expression.
>
>
>
> Yes, but this is not consistent with the Clang documentation. I think we
> should ask the Clang FE to do the promotion and truncation.
>
>
>
> Thanks
>
> Pengfei
>
>
>
> From: llvm-dev  On Behalf Of Craig 
> Topper via llvm-dev
> Sent: Wednesday, July 14, 2021 11:32 PM
> To: Hongtao Liu 
> Cc: Jakub Jelinek ; llvm-dev 
> ; Liu, Hongtao ; 
> gcc-patches@gcc.gnu.org; Joseph Myers 
> Subject: Re: [llvm-dev] [PATCH 0/2] Initial support for AVX512FP16
>
>
>
> On Wed, Jul 14, 2021 at 12:45 AM Hongtao Liu via llvm-dev 
>  wrote:
>
> > >
> > Setting excess_precision_type to FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 to
> > round after each operation would keep the semantics right.
> > And I'll document the behavior difference between soft-fp and
> > AVX512FP16 instruction for exceptions.
> I got some feedback from my colleague who's working on supporting
> _Float16 for llvm.
> The LLVM side wants to set FLT_EVAL_METHOD_PROMOTE_TO_FLOAT for
> soft-fp so that the code can be more efficient.
> i.e.
> _Float16 a, b, c, d;
> d = a + b + c;
>
> would be transformed to
> float tmp, tmp1, a1, b1, c1;
> a1 = (float) a;
> b1 = (float) b;
> c1 = (float) c;
> tmp = a1 + b1;
> tmp1 = tmp + c1;
> d = (_Float16) tmp1;
>
> so there's only 1 truncation in the end.
>
> If users want to round back after every operation, the code should be
> written explicitly as
> _Float16 a, b, c, d, e;
> e = a + b;
> d = e + c;
>
> That's what Clang does, quote from [1]
>  _Float16 arithmetic will be performed using native half-precision 
> support when available on the target (e.g. on ARMv8.2a); otherwise it 
> will be performed at a higher precision (currently always float) and 
> then truncated down to _Float16. Note that C and C++ allow 
> intermediate floating-point operands of an expression to be computed 
> with greater precision than is expressible in their type, so Clang may 
> avoid intermediate truncations in certain cases; this may lead to 
> results that are inconsistent with native arithmetic.
>
>
>
> Clang for AArch64 promotes each individual operation and rounds immediately 
> afterwards. https://godbolt.org/z/qzGfv6nvo note the fcvts between the two 
> fadd operations. It's implemented in the LLVM backend where we can't see what 
> was originally a single expression.
>
>
I was reading the documentation for the excess-precision option at
https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html

-fexcess-precision=style

This option allows further control over excess precision on machines where 
floating-point operations occur in a format with more precision or range than 
the IEEE standard and interchange floating-point types.
By default, -fexcess-precision=fast is in effect; this means that operations 
may be carried out in a wider precision than the types specified in the source 
if that would result in faster code, and it is unpredictable when rounding to 
the types specified in the source code takes place. When compiling C, if 
-fexcess-precision=standard is specified then excess precision follows the 
rules specified in ISO C99; in particular, both casts and assignments cause 
values to be rounded to their semantic types (whereas -ffloat-store only 
affects assignments). This option is enabled by default for C if a strict 
conformance option such as -std=c99 is used. -ffast-math enables 
-fexcess-precision=fast by default regardless of whether a strict conformance 
option is used.

For -fexcess-precision=fast, we should set flt_eval_method to
FLT_EVAL_METHOD_PROMOTE_TO_FLOAT for soft-fp, and to
FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 for AVX512FP16.

For -fexcess-precision=standard, should we set
FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when TARGET_SSE2, so that soft-fp rounds
back after every operation?
>
>
> and so does arm gcc
> quote from arm.c
>
> /* We can calculate either in 16-bit range and precision or
>    32-bit range and precision.  Make that decision based on whether
>    we have native support for the ARMv8.2-A 16-bit floating-point
>    instructions or not.  */
> return (TARGET_VFP_FP16INST
>         ? FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16
>         : FLT_EVAL_METHOD_PROMOTE_TO_FLOAT);
>
>
> [1]https://clang.llvm.org/docs/LanguageExtensions.html
> > > --
> > > Joseph S. Myers
> > > jos...@codesourcery.com
> >
> >
> >
> > --
> > BR,
> 

RE: [llvm-dev] [PATCH 0/2] Initial support for AVX512FP16

2021-07-14 Thread Wang, Pengfei via Gcc-patches
> Clang for AArch64 promotes each individual operation and rounds
> immediately afterwards. https://godbolt.org/z/qzGfv6nvo note the fcvts between
> the two fadd operations. It's implemented in the LLVM backend where we can't
> see what was originally a single expression.

Yes, but this is not consistent with the Clang documentation. I think we
should ask the Clang FE to do the promotion and truncation.

Thanks
Pengfei

From: llvm-dev  On Behalf Of Craig Topper via 
llvm-dev
Sent: Wednesday, July 14, 2021 11:32 PM
To: Hongtao Liu 
Cc: Jakub Jelinek ; llvm-dev ; Liu, 
Hongtao ; gcc-patches@gcc.gnu.org; Joseph Myers 

Subject: Re: [llvm-dev] [PATCH 0/2] Initial support for AVX512FP16

On Wed, Jul 14, 2021 at 12:45 AM Hongtao Liu via llvm-dev 
<llvm-...@lists.llvm.org> wrote:
> >
> Setting excess_precision_type to FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 to
> round after each operation would keep the semantics right.
> And I'll document the behavior difference between soft-fp and
> AVX512FP16 instruction for exceptions.
I got some feedback from my colleague who's working on supporting
_Float16 for llvm.
The LLVM side wants to set FLT_EVAL_METHOD_PROMOTE_TO_FLOAT for
soft-fp so that the code can be more efficient.
i.e.
_Float16 a, b, c, d;
d = a + b + c;

would be transformed to
float tmp, tmp1, a1, b1, c1;
a1 = (float) a;
b1 = (float) b;
c1 = (float) c;
tmp = a1 + b1;
tmp1 = tmp + c1;
d = (_Float16) tmp1;

so there's only 1 truncation in the end.

If users want to round back after every operation, the code should be
written explicitly as
_Float16 a, b, c, d, e;
e = a + b;
d = e + c;

That's what Clang does, quote from [1]
 _Float16 arithmetic will be performed using native half-precision
support when available on the target (e.g. on ARMv8.2a); otherwise it
will be performed at a higher precision (currently always float) and
then truncated down to _Float16. Note that C and C++ allow
intermediate floating-point operands of an expression to be computed
with greater precision than is expressible in their type, so Clang may
avoid intermediate truncations in certain cases; this may lead to
results that are inconsistent with native arithmetic.

Clang for AArch64 promotes each individual operation and rounds immediately 
afterwards. https://godbolt.org/z/qzGfv6nvo note the fcvts between the two fadd 
operations. It's implemented in the LLVM backend where we can't see what was 
originally a single expression.


and so does arm gcc
quote from arm.c

/* We can calculate either in 16-bit range and precision or
   32-bit range and precision.  Make that decision based on whether
   we have native support for the ARMv8.2-A 16-bit floating-point
   instructions or not.  */
return (TARGET_VFP_FP16INST
        ? FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16
        : FLT_EVAL_METHOD_PROMOTE_TO_FLOAT);


[1]https://clang.llvm.org/docs/LanguageExtensions.html
> > --
> > Joseph S. Myers
> > jos...@codesourcery.com
>
>
>
> --
> BR,
> Hongtao



--
BR,
Hongtao
___
LLVM Developers mailing list
llvm-...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev


RE: [llvm-dev] [PATCH] Add optional _Float16 support

2021-07-13 Thread Wang, Pengfei via Gcc-patches
Hi H.J.,

Our LLVM implementation currently uses %xmm0 for both the real part and the
imaginary part of a _Complex _Float16. Is there a particular reason to use two
registers? We are using one register on x86-64. Considering performance,
especially register pressure, would it be better to use one register for
_Complex _Float16 on 32-bit targets as well?

Thanks
Pengfei

-----Original Message-----
From: H.J. Lu  
Sent: Tuesday, July 13, 2021 10:26 PM
To: Wang, Pengfei ; llvm-...@lists.llvm.org
Cc: Joseph Myers ; GCC Patches 
; GNU C Library ; IA32 
System V Application Binary Interface 
Subject: Re: [llvm-dev] [PATCH] Add optional _Float16 support

On Mon, Jul 12, 2021 at 8:59 PM Wang, Pengfei  wrote:
>
> > Return _Float16 and _Complex _Float16 values in %xmm0/%xmm1 registers.
>
> Can you please explain the behavior here? Is there a difference between
> _Float16 and _Complex _Float16 when they are returned? I.e.:
> 1. In which case will _Float16 values be returned in both %xmm0 and %xmm1?
> 2. For a single _Complex _Float16 value, are both the real part and the
> imaginary part returned in %xmm0, or are they returned in %xmm0 and %xmm1
> respectively?

Here is the v2 patch to add the missing _Float16 bits.   The PDF file is at

https://gitlab.com/x86-psABIs/i386-ABI/-/wikis/Intel386-psABI

> Thanks
> Pengfei
>
> -----Original Message-----
> From: llvm-dev  On Behalf Of H.J. Lu 
> via llvm-dev
> Sent: Friday, July 2, 2021 6:28 AM
> To: Joseph Myers 
> Cc: llvm-...@lists.llvm.org; GCC Patches ; 
> GNU C Library ; IA32 System V Application 
> Binary Interface 
> Subject: Re: [llvm-dev] [PATCH] Add optional _Float16 support
>
> On Thu, Jul 1, 2021 at 3:10 PM Joseph Myers  wrote:
> >
> > On Thu, 1 Jul 2021, H.J. Lu via Gcc-patches wrote:
> >
> > > 2. Return _Float16 and _Complex _Float16 values in %xmm0/%xmm1 registers.
> >
> > That restricts use of _Float16 to processors with SSE.  Is that what 
> > we want in the ABI, or should _Float16 be available with base 32-bit
> > x86 architecture features only, much like _Float128 and the decimal 
> > FP types
>
> Yes, _Float16 requires XMM registers.
>
> > are?  (If it is restricted to SSE, we can of course ensure relevant 
> > libgcc functions are built with SSE enabled, and likewise in glibc 
> > if that gains
> > _Float16 functions, though maybe with some extra complications to 
> > get relevant testcases to run whenever possible.)
> >
>
> _Float16 functions in libgcc should be compiled with SSE enabled.
>
> BTW, _Float16 software emulation may require more than just SSE, since we
> need to do _Float16 loads and stores with XMM registers.
> There is no 16-bit load/store for XMM registers without AVX512FP16.
>
> --
> H.J.



--
H.J.


RE: [llvm-dev] [PATCH] Add optional _Float16 support

2021-07-12 Thread Wang, Pengfei via Gcc-patches
> Return _Float16 and _Complex _Float16 values in %xmm0/%xmm1 registers.

Can you please explain the behavior here? Is there a difference between
_Float16 and _Complex _Float16 when they are returned? I.e.:
1. In which case will _Float16 values be returned in both %xmm0 and %xmm1?
2. For a single _Complex _Float16 value, are both the real part and the
imaginary part returned in %xmm0, or are they returned in %xmm0 and %xmm1
respectively?

Thanks
Pengfei

-----Original Message-----
From: llvm-dev  On Behalf Of H.J. Lu via 
llvm-dev
Sent: Friday, July 2, 2021 6:28 AM
To: Joseph Myers 
Cc: llvm-...@lists.llvm.org; GCC Patches ; GNU C 
Library ; IA32 System V Application Binary Interface 

Subject: Re: [llvm-dev] [PATCH] Add optional _Float16 support

On Thu, Jul 1, 2021 at 3:10 PM Joseph Myers  wrote:
>
> On Thu, 1 Jul 2021, H.J. Lu via Gcc-patches wrote:
>
> > 2. Return _Float16 and _Complex _Float16 values in %xmm0/%xmm1 registers.
>
> That restricts use of _Float16 to processors with SSE.  Is that what 
> we want in the ABI, or should _Float16 be available with base 32-bit 
> x86 architecture features only, much like _Float128 and the decimal FP 
> types

Yes, _Float16 requires XMM registers.

> are?  (If it is restricted to SSE, we can of course ensure relevant 
> libgcc functions are built with SSE enabled, and likewise in glibc if 
> that gains
> _Float16 functions, though maybe with some extra complications to get 
> relevant testcases to run whenever possible.)
>

_Float16 functions in libgcc should be compiled with SSE enabled.

BTW, _Float16 software emulation may require more than just SSE, since we need
to do _Float16 loads and stores with XMM registers.
There is no 16-bit load/store for XMM registers without AVX512FP16.

--
H.J.