Hello all,

I am interested in revisiting the return ABI of _Float16 on i386.
Currently it is returned in xmm0, meaning SSE is required for the type.
This is rather inconvenient when _Float16 is otherwise quite well
supported. Compilers need to pick between hacking together a custom ABI
that works on the baseline target and passing the burden on to users to
gate every use of the type on SSE support.

Is there any interest in adjusting the specification so that _Float16
is returned in a GPR rather than an SSE register?

This was brought up before in the thread at [1], where the concern was
the efficiency of 16-bit moves between GPRs (or memory) and XMM
registers. That doesn't seem relevant, however: there isn't any reason to
have a _Float16 in XMM unless F16C is available, which implies SSE2 and
SSE4.1 for PINSRW and PEXTRW to/from memory (unless I am missing
something?).

A sample patch to the psABI is below. Needless to say, such a change
raises compatibility concerns, but given that workarounds already exist
(e.g. in LLVM), it seems worth considering whether something should be
codified to make this simpler for everyone.

Best regards,
Trevor

[1]: https://inbox.sourceware.org/gcc-patches/[email protected]/

(some CCs added from the linked discussion)

--- patch follows ---

From 1af72db89f9a10b93569fa0b9f64f65f2dd73334 Mon Sep 17 00:00:00 2001
From: Trevor Gross <[email protected]>
Date: Fri, 23 Jan 2026 21:11:43 +0000
Subject: [PATCH] Return _Float16 and _Complex _Float16 in GPRs

Currently the ABI specifies that _Float16 is to be passed on the stack
and returned in xmm0, meaning SSE is required to support the type.
Adjust both _Float16 and _Complex _Float16 to return in eax, dropping
the SSE requirement.

This has the benefit of making _Float16 ABI-compatible with `short`.
---
 low-level-sys-info.tex | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/low-level-sys-info.tex b/low-level-sys-info.tex
index 0015c8c..a2d8d6d 100644
--- a/low-level-sys-info.tex
+++ b/low-level-sys-info.tex
@@ -384,8 +384,7 @@ of some 64bit return types & No \\
 \ESI & callee-saved register & yes \\
 \EDI & callee-saved register & yes \\
 \reg{xmm0} & scratch register; also used to pass the first \code{__m128}
-             parameter and return \code{__m128}, \code{_Float16},
-            \code{_Complex _Float16} & No \\
+             parameter and return \code{__m128} & No \\
 \reg{ymm0} & scratch register; also used to pass the first \code{__m256}
              parameter and return \code{__m256} & No \\
 \reg{zmm0} & scratch register; also used to pass the first \code{__m512}
@@ -472,7 +471,11 @@ and \texttt{unions}) are always returned in memory.
     & \texttt{\textit{any-type} *} & \EAX \\
     & \texttt{\textit{any-type} (*)()} & \\
     \hline
-    & \texttt{_Float16} & \reg{xmm0} \\
+    & \texttt{_Float16} & \reg{ax} \\
+    & & The upper 16 bits of \EAX are undefined.
+    The caller must not \\
+    & & rely on these being set in a predefined
+    way by the called function. \\
     \cline{2-3}
     & \texttt{float} & \reg{st0} \\
     \cline{2-3}
@@ -484,7 +487,7 @@ and \texttt{unions}) are always returned in memory.
     \cline{2-3}
     & \texttt{__float128} & memory \\
     \hline
-    & \texttt{_Complex _Float16} & \reg{xmm0} \\
+    & \texttt{_Complex _Float16} & \reg{eax} \\
     & & The real part is returned in bits 0..15. The imaginary part is
         returned \\
     & & in bits 16..31.\\
--
2.50.1 (Apple Git-155)
