On Mon, Jul 5, 2021 at 3:21 AM Hongtao Liu via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Fri, Jul 2, 2021 at 4:03 PM Uros Bizjak <ubiz...@gmail.com> wrote:
> >
> > On Fri, Jul 2, 2021 at 8:25 AM Hongtao Liu <crazy...@gmail.com> wrote:
> >
> > > > >   AVX512FP16 has been disclosed; see [1].
> > > > >   AVX512FP16 adds 100+ instructions, covered here by 67 GCC patches.
> > > > > For convenience of review, we divide the 67 patches into 2 major parts.
> > > > >   The first part is 2 patches containing basic support for AVX512FP16
> > > > > (options, cpuid, _Float16 type, libgcc, etc.), and the second part is
> > > > > 65 patches covering all AVX512FP16 instructions (including
> > > > > intrinsic support and some optimizations).
> > > > >   There is a problem with the first part: _Float16 is not a standard
> > > > > C++ type, so the front end does not support this type or its
> > > > > mangling. We therefore "make up" a _Float16 type in the back end and
> > > > > use _DF16 as its mangling. The purpose is to align with the LLVM
> > > > > side, because the LLVM C++ FE already supports _Float16 [2].
> > > > >
> > > > > [1] 
> > > > > https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html
> > > > > [2] https://reviews.llvm.org/D33719
> > > >
> > > > Looking through implementation of _Float16 support, I think, there is
> > > > no need for _Float16 support to depend on AVX512FP16.
> > > >
> > > > The compiler is smart enough to use either a named pattern that
> > > > describes the instruction when available or diverts to a library call
> > > > to a soft-fp implementation. So, I think that general _Float16 support
> > > > should be implemented first (similar to __float128) and then upgraded
> > > > with AVX512FP16 specific instructions.
> > > >
> > > > MOVW loads/stores to XMM reg can be emulated with MOVD and a SImode
> > > > secondary_reload register.
> > > >
> > > MOVD requires SSE2, and so does PINSRW, which means that if we want XMM
> > > loads/stores for HFmode, SSE2 is the minimum requirement.
> > > Also, we support PEXTRW reg/m16, xmm, imm8 under SSE4_1, which gives us
> > > a direct 16-bit load/store for HFmode with no need for a secondary
> > > reload.
> > > So for simplicity, can we just restrict _Float16 to SSE4_1?
> >
> > When baseline is not met, the equivalent integer calling convention is
> > used, for example:
> The problem is that under TARGET_SSE with -mno-sse2, the float calling
> convention is available for SSE registers. That is OK for float, since
> SSE has movss, but there is no 16-bit load/store for SSE registers, nor
> any move between GPRs and SSE registers.

You can always spill though; that's preferred on some archs
over xmm <-> gpr moves anyway.

Richard.
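
For readers following the MOVD versus PINSRW point in this thread, here is
a minimal sketch of the two SSE2-only ways to get a 16-bit value into an
XMM register, plus a way to store it back. It uses standard <emmintrin.h>
intrinsics; the function names are hypothetical, and the instructions named
in the comments are only what a compiler would typically pick, not a
guarantee. It says nothing about how the GCC back end itself implements
HFmode moves.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: zero-extend the 16-bit value in a GPR, then move
   32 bits into the XMM register (the "movzwl + movd" route). */
__m128i load_u16_via_movd(const uint16_t *p)
{
    int widened = (int)*p;              /* typically movzwl from memory */
    return _mm_cvtsi32_si128(widened);  /* typically movd gpr -> xmm    */
}

/* Hypothetical helper: insert the 16-bit value into lane 0 of a zeroed
   register (the "pinsrw" route, also available with plain SSE2). */
__m128i load_u16_via_pinsrw(const uint16_t *p)
{
    return _mm_insert_epi16(_mm_setzero_si128(), *p, 0);
}

/* Hypothetical helper: store the low 16 bits without SSE4.1 by going
   through a 32-bit GPR move and truncating. */
void store_u16_from_xmm(__m128i v, uint16_t *p)
{
    *p = (uint16_t)_mm_cvtsi128_si32(v);
}
```

Both load helpers leave the same register contents (the value in the low 16
bits, zeros elsewhere), so which one is faster is purely a question of the
instruction tables for a given CPU.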

> >
> > --cut here--
> > typedef int __v2si __attribute__ ((vector_size (8)));
> >
> > __v2si foo (__v2si a, __v2si b)
> > {
> >   return a + b;
> > }
> > --cut here--
> >
> > will still compile with -m32 -mno-mmx with warnings:
> >
> > mmx1.c: In function ‘foo’:
> > mmx1.c:4:1: warning: MMX vector return without MMX enabled changes the
> > ABI [-Wpsabi]
> > mmx1.c:3:8: warning: MMX vector argument without MMX enabled changes
> > the ABI [-Wpsabi]
> >
> > So, by setting the baseline to SSE4.1, a big pool of targets will be
> > forced to use the alternative ABI. This is quite inconvenient, and we
> > revert to the alternative ABI only if we *really* can't satisfy ABI
> > requirements (e.g. register type is not available, basic move insn
> > can't be implemented). Based on your analysis, I think that SSE2
> > should be the baseline.
> Agreed.
> >
> > Also, looking at insn tables, it looks like movzwl from memory + movd
> > is faster than pinsrw (and similarly for pextrw to memory), but I have
> > no hard data here.
> >
> > Regarding secondary_reload, a scratch register is needed in the case of
> > HImode moves between memory and an XMM reg, since the scratch register
> > needs a different mode than the source and destination. Please see
> > TARGET_SECONDARY_RELOAD documentation and several examples in the
> > source.
> >
> > Uros.
>
>
>
> --
> BR,
> Hongtao
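
As an illustration of the soft-fp fallback mentioned earlier in the thread
(a library call used when no hardware pattern is available), here is a
small portable sketch of converting between IEEE binary16 bit patterns and
float. This is not libgcc's implementation; the function names are
hypothetical, and the narrowing conversion assumes round-to-nearest-even
with every NaN mapped to a single quiet NaN.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: widen binary16 bits to float (always exact). */
float half_bits_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1fu;
    uint32_t man  = h & 0x3ffu;
    uint32_t bits;
    float f;

    if (exp == 0x1fu) {                       /* infinity or NaN */
        bits = sign | 0x7f800000u | (man << 13);
    } else if (exp != 0) {                    /* normal: rebias 15 -> 127 */
        bits = sign | ((exp + 112u) << 23) | (man << 13);
    } else if (man == 0) {                    /* signed zero */
        bits = sign;
    } else {                                  /* subnormal: normalize */
        uint32_t e = 113u;
        while ((man & 0x400u) == 0) { man <<= 1; e--; }
        bits = sign | (e << 23) | ((man & 0x3ffu) << 13);
    }
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Hypothetical helper: narrow float to binary16 bits, ties to even. */
uint16_t float_to_half_bits(float f)
{
    uint32_t x, mag, e, m, shift, rem, halfway, half;
    uint16_t sign;

    memcpy(&x, &f, sizeof x);
    sign = (uint16_t)((x >> 16) & 0x8000u);
    mag  = x & 0x7fffffffu;

    if (mag > 0x7f800000u)  return (uint16_t)(sign | 0x7e00u); /* NaN */
    if (mag >= 0x477ff000u) return (uint16_t)(sign | 0x7c00u); /* inf/overflow */

    e = mag >> 23;
    m = mag & 0x7fffffu;
    if (e >= 113u) {                          /* result is a normal half */
        shift = 13u;
        half  = ((e - 112u) << 10) | (m >> 13);
    } else if (e >= 102u) {                   /* result is subnormal */
        m |= 0x800000u;                       /* restore implicit bit */
        shift = 126u - e;
        assert(shift >= 14u && shift <= 24u);
        half  = m >> shift;
    } else {
        return sign;                          /* underflows to zero */
    }

    rem     = m & ((1u << shift) - 1u);
    halfway = 1u << (shift - 1u);
    if (rem > halfway || (rem == halfway && (half & 1u)))
        half++;                               /* a carry may bump the exponent */
    return (uint16_t)(sign | half);
}
```

A caller would pass the 16-bit pattern around as an integer and widen to
float only for arithmetic, which matches the thread's idea of using the
integer calling convention when the SSE baseline is not met.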
