On Thu, Jul 1, 2021 at 2:40 PM H.J. Lu <hjl.to...@gmail.com> wrote:
>
> On Thu, Jul 1, 2021 at 4:10 AM Uros Bizjak <ubiz...@gmail.com> wrote:
> >
> > [Sorry for double post, gcc-patches address was wrong in original post]
> >
> > On Thu, Jul 1, 2021 at 7:48 AM liuhongt <hongtao....@intel.com> wrote:
> > >
> > > Hi:
> > >   AVX512FP16 has been disclosed; see [1].
> > >   There are 100+ AVX512FP16 instructions and 67 gcc patches. For the
> > > convenience of review, we divide the 67 patches into 2 major parts.
> > >   The first part is 2 patches containing basic support for AVX512FP16
> > > (options, cpuid, _Float16 type, libgcc, etc.), and the second part is 65
> > > patches covering all AVX512FP16 instructions (including intrinsic
> > > support and some optimizations).
> > >   There is a problem with the first part: _Float16 is not part of the C++
> > > standard, so the C++ front-end does not support this type or its
> > > mangling. We therefore "make up" a _Float16 type in the back-end and use
> > > _DF16 as its mangling. The purpose of this is to align with the llvm
> > > side, because the llvm C++ FE already supports _Float16 [2].
> > >
> > > [1] https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html
> > > [2] https://reviews.llvm.org/D33719
> >
> > Looking through the implementation of _Float16 support, I think there is
> > no need for _Float16 support to depend on AVX512FP16.
> >
> > The compiler is smart enough to use a named pattern that describes the
> > instruction when one is available, or otherwise to divert to a library
> > call into a soft-fp implementation. So, I think that general _Float16
> > support should be implemented first (similar to __float128) and then
> > upgraded with AVX512FP16-specific instructions.
> >
> > MOVW loads/stores to an XMM register can be emulated with MOVD and an
> > SImode secondary_reload register.
> >
> > The soft-fp library already includes all the infrastructure to implement
> > _Float16 (see half.h), so basic HFmode operations should be trivial to
> > implement (I went through this exercise personally years ago when
> > implementing __float128 soft-fp support).
> >
> > Looking through patch 1/2, it looks like a new ABI is introduced,
> > where FP16 values are passed through XMM registers, but I don't think
> > there is updated psABI documentation available (for x86_64 as well as
>
> _Float16 support was added to the x86-64 psABI 2 years ago:
>
> https://gitlab.com/x86-psABIs/x86-64-ABI/-/commit/71d1183e7bb95e9f8ad732e0f2b5a4f127796e2a

Uh, sorry, my psABI link [1] is way out of date, but this is what
Google returns for "x86_64 psABI pdf"...

[1] https://uclibc.org/docs/psABI-x86_64.pdf

>
> > i386, where FP16 values will probably be passed through memory).
>
> That is correct.
>
> > So, the net effect of the above proposal(s) is that x86 will support
> > _Float16 out of the box, emulate it via soft-fp without AVX512FP16, and
> > use AVX512FP16 instructions with -mavx512fp16.
> >
>
> The main issue is the complex _Float16 functions in libgcc. If _Float16
> doesn't require -mavx512fp16, we need to compile the complex _Float16
> functions in libgcc without -mavx512fp16. Complex _Float16 performance is
> very important for our _Float16 usage, and _Float16 operations have to be
> very fast. There should be no emulation anywhere when -mavx512fp16 is
> used. That is why _Float16 is available only with -mavx512fp16.

If this performance is important, then the best way is to provide, in
addition to the generic versions, variants of these functions that are
recompiled for the AVX512FP16 target, or even implemented in assembly.
The compiler can then call these specific functions when -mavx512fp16 is
used. Please see how alpha implements calls to its X_floating library.
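A minimal C sketch of the generic/specific split described above, for
illustration only. Assumptions: binary16 values are carried as raw
uint16_t bit patterns; half_to_float, float_to_half, hc_mul_generic,
hc_mul_native and hc_mul are hypothetical names, not libgcc entry points;
the generic path widens to float instead of using soft-fp's integer
routines from half.h; and a preprocessor switch on __AVX512FP16__ stands
in for the compiler choosing which library function to call.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: binary16 bits -> float. Exact, since every
   binary16 value is representable in binary32. */
static float half_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp = (h >> 10) & 0x1fu;
    uint32_t mant = h & 0x3ffu;
    uint32_t bits;
    float out;

    if (exp == 0x1fu) {                      /* Inf or NaN */
        bits = sign | 0x7f800000u | (mant << 13);
    } else if (exp != 0) {                   /* normal: rebias 15 -> 127 */
        bits = sign | ((exp + 112u) << 23) | (mant << 13);
    } else {                                 /* zero or subnormal: mant * 2^-24 */
        out = (float)mant * 0x1p-24f;
        memcpy(&bits, &out, sizeof bits);
        bits |= sign;
    }
    memcpy(&out, &bits, sizeof out);
    return out;
}

/* Hypothetical helper: float -> binary16 bits, round to nearest even,
   integer operations only. */
static uint16_t float_to_half(float f)
{
    uint32_t x;
    memcpy(&x, &f, sizeof x);
    uint16_t sign = (uint16_t)((x >> 16) & 0x8000u);
    uint32_t a = x & 0x7fffffffu;

    if (a >= 0x7f800000u)                    /* Inf stays Inf, NaN -> quiet NaN */
        return (uint16_t)(sign | 0x7c00u | (a > 0x7f800000u ? 0x200u : 0u));
    if (a >= 0x477ff000u)                    /* >= 65520 rounds up to Inf */
        return (uint16_t)(sign | 0x7c00u);
    if (a >= 0x38800000u) {                  /* >= 2^-14: normal binary16 */
        a -= 112u << 23;                     /* rebias exponent 127 -> 15 */
        a += 0xfffu + ((a >> 13) & 1u);      /* round to nearest, ties to even */
        return (uint16_t)(sign | (a >> 13));
    }
    uint32_t e = a >> 23;                    /* biased binary32 exponent */
    if (e < 102u)                            /* below 2^-25: rounds to zero */
        return sign;
    uint32_t full = (a & 0x7fffffu) | 0x800000u;  /* 24-bit significand */
    uint32_t shift = 126u - e;                    /* 14..24, units of 2^-24 */
    uint32_t q = full >> shift;
    uint32_t rem = full & ((1u << shift) - 1u);
    uint32_t halfway = 1u << (shift - 1u);
    if (rem > halfway || (rem == halfway && (q & 1u)))
        q++;                                 /* may carry into 0x400 = 2^-14 */
    return (uint16_t)(sign | q);
}

/* Hypothetical generic (emulated) complex multiply: widen to float,
   compute, round each part back to binary16. Unlike libgcc's complex
   multiply routines, it skips the C Annex G recovery of infinities. */
static void hc_mul_generic(uint16_t ar, uint16_t ai, uint16_t br, uint16_t bi,
                           uint16_t *cr, uint16_t *ci)
{
    float a = half_to_float(ar), b = half_to_float(ai);
    float c = half_to_float(br), d = half_to_float(bi);
    *cr = float_to_half(a * c - b * d);
    *ci = float_to_half(a * d + b * c);
}

#ifdef __AVX512FP16__
/* Hypothetical variant built only with -mavx512fp16: native _Float16
   arithmetic, no emulation. */
static void hc_mul_native(uint16_t ar, uint16_t ai, uint16_t br, uint16_t bi,
                          uint16_t *cr, uint16_t *ci)
{
    _Float16 a, b, c, d, re, im;
    memcpy(&a, &ar, 2); memcpy(&b, &ai, 2);
    memcpy(&c, &br, 2); memcpy(&d, &bi, 2);
    re = a * c - b * d;
    im = a * d + b * c;
    memcpy(cr, &re, 2); memcpy(ci, &im, 2);
}
#define hc_mul hc_mul_native
#else
#define hc_mul hc_mul_generic
#endif
```

For example, hc_mul(0x3c00, 0x4000, 0x4200, 0x3800, &cr, &ci) computes
(1+2i)*(3+0.5i) and stores the bits of 2 and 6.5. Callers always write
hc_mul; only the build flags decide whether they get the emulated or
the native version, which is the property the generic-plus-specific
approach is meant to preserve.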

Uros.
