On Mon, Jul 12, 2021 at 8:59 PM Wang, Pengfei <pengfei.w...@intel.com> wrote:
>
> > Return _Float16 and _Complex _Float16 values in %xmm0/%xmm1 registers.
>
> Can you please explain the behavior here? Is there difference between
> _Float16 and _Complex _Float16 when return? I.e.,
> 1, In which case will _Float16 values return in both %xmm0 and %xmm1?
> 2, For a single _Float16 value, are both real part and imaginary part
> returned in %xmm0? Or returned in %xmm0 and %xmm1 respectively?

Here is the v2 patch to add the missing _Float16 bits. The PDF file is at

https://gitlab.com/x86-psABIs/i386-ABI/-/wikis/Intel386-psABI

> Thanks
> Pengfei
>
> -----Original Message-----
> From: llvm-dev <llvm-dev-boun...@lists.llvm.org> On Behalf Of H.J. Lu via
> llvm-dev
> Sent: Friday, July 2, 2021 6:28 AM
> To: Joseph Myers <jos...@codesourcery.com>
> Cc: llvm-...@lists.llvm.org; GCC Patches <gcc-patches@gcc.gnu.org>; GNU C
> Library <libc-al...@sourceware.org>; IA32 System V Application Binary
> Interface <ia32-...@googlegroups.com>
> Subject: Re: [llvm-dev] [PATCH] Add optional _Float16 support
>
> On Thu, Jul 1, 2021 at 3:10 PM Joseph Myers <jos...@codesourcery.com> wrote:
> >
> > On Thu, 1 Jul 2021, H.J. Lu via Gcc-patches wrote:
> >
> > > 2. Return _Float16 and _Complex _Float16 values in %xmm0/%xmm1 registers.
> >
> > That restricts use of _Float16 to processors with SSE. Is that what
> > we want in the ABI, or should _Float16 be available with base 32-bit
> > x86 architecture features only, much like _Float128 and the decimal FP
> > types
>
> Yes, _Float16 requires XMM registers.
>
> > are? (If it is restricted to SSE, we can of course ensure relevant
> > libgcc functions are built with SSE enabled, and likewise in glibc if
> > that gains
> > _Float16 functions, though maybe with some extra complications to get
> > relevant testcases to run whenever possible.)
> >
>
> _Float16 functions in libgcc should be compiled with SSE enabled.
>
> BTW, _Float16 software emulation may require more than just SSE since we need
> to do _Float16 load and store with XMM registers.
> There is no 16bit load/store for XMM registers without AVX512FP16.
>
> --
> H.J.
> _______________________________________________
> LLVM Developers mailing list
> llvm-...@lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

--
H.J.
From b48c361b939ef9216184f1a58a9d5052bbeb7551 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.to...@gmail.com>
Date: Thu, 1 Jul 2021 13:58:00 -0700
Subject: [PATCH v2] Add optional _Float16 support

1. Pass _Float16 and _Complex _Float16 values on stack.
2. Return _Float16 and _Complex _Float16 values in %xmm0/%xmm1 registers.
---
 low-level-sys-info.tex | 76 ++++++++++++++++++++++++++++++------------
 1 file changed, 54 insertions(+), 22 deletions(-)

diff --git a/low-level-sys-info.tex b/low-level-sys-info.tex
index acaf30e..157509b 100644
--- a/low-level-sys-info.tex
+++ b/low-level-sys-info.tex
@@ -30,7 +30,8 @@ object, and the term \emph{\textindex{\sixteenbyte{}}} refers to a
 \subsubsection{Fundamental Types}

 Table~\ref{basic-types} shows the correspondence between ISO C
-scalar types and the processor scalar types. \code{__float80},
+scalar types and the processor scalar types. \code{_Float16},
+\code{__float80},
 \code{__float128}, \code{__m64}, \code{__m128}, \code{__m256} and
 \code{__m512} types are optional.

@@ -79,22 +80,27 @@ scalar types and the processor scalar types. \code{__float80},
  & \texttt{\textit{any-type} *} & 4 & 4 & unsigned \fourbyte \\
  & \texttt{\textit{any-type} (*)()} & & \\
  \hline
- Floating-& \texttt{float} & 4 & 4 & single (IEEE-754) \\
+ & \texttt{_Float16}$^{\dagger\dagger\dagger\dagger\dagger\dagger}$ & 2 & 2 & 16-bit (IEEE-754) \\
  \cline{2-5}
- point & \texttt{double} & 8 & 4 & double (IEEE-754) \\
- & \texttt{long double}$^{\dagger\dagger\dagger\dagger}$ & & & \\
+ & \texttt{float} & 4 & 4 & single (IEEE-754) \\
+ \cline{2-5}
+ Floating- & \texttt{double} & 8
+ & 8$^{\dagger\dagger\dagger\dagger}$ & double (IEEE-754) \\
  \cline{2-5}
- & \texttt{__float80}$^{\dagger\dagger}$ & 12 & 4 & 80-bit extended (IEEE-754) \\
- & \texttt{long double}$^{\dagger\dagger\dagger\dagger}$ & & & \\
+ point & \texttt{__float80}$^{\dagger\dagger}$ & 16 & 16 & 80-bit extended (IEEE-754) \\
+ & \texttt{long double}$^{\dagger\dagger\dagger\dagger\dagger}$ & 16 & 16 & 80-bit extended (IEEE-754) \\
  \cline{2-5}
  & \texttt{__float128}$^{\dagger\dagger}$ & 16 & 16 & 128-bit extended (IEEE-754) \\
  \hline
- Complex& \texttt{_Complex float} & 8 & 4 & complex single (IEEE-754) \\
+ & \texttt{_Complex _Float16} $^{\dagger\dagger\dagger\dagger\dagger\dagger}$ & 2 & 2 & complex 16-bit (IEEE-754) \\
  \cline{2-5}
- Floating-& \texttt{_Complex double} & 16 & 4 & complex double (IEEE-754) \\
- point & \texttt{_Complex long double}$^{\dagger\dagger\dagger\dagger}$ & & & \\
+ & \texttt{_Complex float} & 8 & 4 & complex single (IEEE-754) \\
  \cline{2-5}
- & \texttt{_Complex __float80}$^{\dagger\dagger}$ & 24 & 4 & complex 80-bit extended (IEEE-754) \\
+ Complex& \texttt{_Complex double} & 16 & 4 & complex double (IEEE-754) \\
+ Floating-& \texttt{_Complex long double}$^{\dagger\dagger\dagger\dagger}$ & & & \\
+ \cline{2-5}
+ point & \texttt{_Complex __float80}$^{\dagger\dagger}$ & 24 & 4 & complex 80-bit extended (IEEE-754) \\
  & \texttt{_Complex long double}$^{\dagger\dagger\dagger\dagger}$ & & & \\
  \cline{2-5}
  & \texttt{_Complex __float128}$^{\dagger\dagger}$ & 32 & 16 & complex 128-bit extended (IEEE-754) \\
@@ -125,6 +131,8 @@ The \texttt{long double} type is 64-bit, the same as the \texttt{double}
 type, on the Android{\texttrademark} platform. More information
 on the Android{\texttrademark} platform is available from
 \url{http://www.android.com/}.}\\
+\multicolumn{5}{p{13cm}}{\myfontsize $^{\dagger\dagger\dagger\dagger\dagger\dagger}$
+The \texttt{_Float16} type, from ISO/IEC TS 18661-3:2015, is optional.}\\
 \end{tabular}
 }
 \end{table}
@@ -323,6 +331,7 @@ at the time of the call.
 \begin{table}
 \Hrule
 \caption{Register Usage}
+ \myfontsize
 \label{fig-reg-usage}
 \begin{center}
 \begin{tabular}{l|p{8.35cm}|l}
@@ -346,13 +355,29 @@ of some 64bit return types & No \\
 \EBP & callee-saved register; optionally used as frame pointer & Yes \\
 \ESI & callee-saved register & yes \\
 \EDI & callee-saved register & yes \\
-\reg{xmm0}, \reg{ymm0} & scratch registers; also used to pass and return
-\code{__m128}, \code{__m256} parameters & No\\
-\reg{xmm1}--\reg{xmm2},& scratch registers; also used to pass
-\code{__m128}, & No \\
-\reg{ymm1}--\reg{ymm2} & \code{__m256} parameters & \\
-\reg{xmm3}--\reg{xmm7},& scratch registers & No \\
-\reg{ymm3}--\reg{ymm7} & & \\
+\reg{xmm0} & scratch register; also used to pass the first \code{__m128}
+ parameter and return \code{__m128}, \code{_Float16},
+ the real part of \code{_Complex _Float16} & No \\
+\reg{ymm0} & scratch register; also used to pass the first \code{__m256}
+ parameter and return \code{__m256} & No \\
+\reg{zmm0} & scratch register; also used to pass the first \code{__m512}
+ parameter and return \code{__m512} & No \\
+\reg{xmm1} & scratch register; also used to pass the second \code{__m128}
+ parameter and return the imaginary part of
+ \code{_Complex _Float16} & No \\
+\reg{ymm1} & scratch register; also used to pass the second \code{__m256}
+ parameters & No \\
+\reg{zmm1} & scratch register; also used to pass the second \code{__m512}
+ parameters & No \\
+\reg{xmm2} & scratch register; also used to pass the third \code{__m128}
+ parameters & No \\
+\reg{ymm2} & scratch register; also used to pass the third \code{__m256}
+ parameters & No \\
+\reg{zmm2} & scratch register; also used to pass the third \code{__m512}
+ parameters & No \\
+\reg{xmm3}--\reg{xmm7} & scratch registers & No \\
+\reg{ymm3}--\reg{ymm7} & scratch registers & No \\
+\reg{zmm3}--\reg{zmm7} & scratch registers & No \\
 \reg{mm0} & scratch register; also used to pass and return
 \code{__m64} parameter & No\\
 \reg{mm1}--\reg{mm2} & used to pass \code{__m64} parameters & No\\
@@ -420,6 +445,8 @@ and \texttt{unions}) are always returned in memory.
  & \texttt{\textit{any-type} *} & \EAX \\
  & \texttt{\textit{any-type} (*)()} & \\
  \hline
+ & \texttt{_Float16} & \reg{xmm0} \\
+ \cline{2-3}
  & \texttt{float} & \reg{st0} \\
  \cline{2-3}
  Floating- & \texttt{double} & \reg{st0} \\
@@ -430,14 +457,19 @@ and \texttt{unions}) are always returned in memory.
  \cline{2-3}
  & \texttt{__float128} & memory \\
  \hline
- & \texttt{_Complex float} & \EDX:\EAX \\
- & & The real part is returned in \EAX. The imaginary part is
+ & \texttt{_Complex _Float16} & \reg{xmm0}:\reg{xmm1} \\
+ & & The real part is returned in \reg{xmm0}. The imaginary part is
+ returned \\
+ & & in \reg{xmm1}.\\
+ \cline{2-3}
+ Complex & \texttt{_Complex float} & \EDX:\EAX \\
+ floating- & & The real part is returned in \EAX. The imaginary part is
  returned \\
- Complex & & in \EDX.\\
+ point & & in \EDX.\\
  \cline{2-3}
- floating- & \texttt{_Complex double} & memory \\
+ & \texttt{_Complex double} & memory \\
  \cline{2-3}
- point & \texttt{_Complex long double} & memory \\
+ & \texttt{_Complex long double} & memory \\
  \cline{2-3}
  & \texttt{_Complex __float80} & memory \\
  \cline{2-3}
--
2.31.1
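
The thread notes that _Float16 software emulation has to deal with 16-bit
loads and stores on its own when AVX512FP16 is not available. As a rough
illustration of what such emulation involves, here is a minimal sketch in
portable C that widens an IEEE-754 binary16 bit pattern to float using
integer operations only. The helper name half_bits_to_float is hypothetical;
it is not part of the psABI, libgcc, or glibc, and it says nothing about
which registers a real implementation would use.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (an assumption for illustration, not an existing
   libgcc or glibc function): convert the 16-bit storage format of an
   IEEE-754 binary16 value to float without any _Float16 hardware support. */
float half_bits_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16; /* sign moves to bit 31 */
    uint32_t exp  = (h >> 10) & 0x1Fu;             /* 5-bit biased exponent */
    uint32_t frac = h & 0x3FFu;                    /* 10-bit fraction */
    uint32_t bits;
    float f;

    if (exp == 0) {
        if (frac == 0) {
            bits = sign;                           /* signed zero */
        } else {
            /* Subnormal half: shift until the implicit bit appears,
               then rebias the exponent for binary32. */
            uint32_t shift = 0;
            while ((frac & 0x400u) == 0) {
                frac <<= 1;
                shift++;
            }
            frac &= 0x3FFu;
            bits = sign | ((127u - 15u + 1u - shift) << 23) | (frac << 13);
        }
    } else if (exp == 0x1Fu) {
        bits = sign | 0x7F800000u | (frac << 13);  /* infinity or NaN */
    } else {
        /* Normal number: rebias from 15 to 127, widen the fraction. */
        bits = sign | ((exp + (127u - 15u)) << 23) | (frac << 13);
    }

    memcpy(&f, &bits, sizeof f);                   /* reinterpret the bits */
    return f;
}
```

For example, half_bits_to_float(0x3C00) yields 1.0f and
half_bits_to_float(0xC000) yields -2.0f. The 16-bit value is handled as a
plain integer throughout, which is why this path does not need a 16-bit XMM
load or store.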