On 6/9/26 08:23, Peter Maydell wrote:
On Wed, 3 Jun 2026 at 04:29, Richard Henderson
<[email protected]> wrote:

Use softfloat-parts.h so that we can more naturally
perform the required operations with a single rounding step.
This happens to also simplify the NaN detection step.

Signed-off-by: Richard Henderson <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
Message-Id: <[email protected]>
---

diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c
index 376fbd48d4..7ef6f5d71b 100644
--- a/target/arm/tcg/sme_helper.c
+++ b/target/arm/tcg/sme_helper.c
@@ -27,6 +27,7 @@
  #include "accel/tcg/helper-retaddr.h"
  #include "qemu/int128.h"
  #include "fpu/softfloat.h"
+#include "fpu/softfloat-parts.h"
  #include "vec_internal.h"
  #include "sve_ldst_internal.h"
      } else {
-        float64 e1r = float16_to_float64(h1r, true, s_f16);
-        float64 e1c = float16_to_float64(h1c, true, s_f16);
-        float64 e2r = float16_to_float64(h2r, true, s_f16);
-        float64 e2c = float16_to_float64(h2c, true, s_f16);
-        float64 t64;
-
          /*
           * The ARM pseudocode function FPDot performs both multiplies
-         * and the add with a single rounding operation.  Emulate this
-         * by performing the first multiply in round-to-odd, then doing
-         * the second multiply as fused multiply-add, and rounding to
-         * float32 all in one step.
+         * and the add with a single rounding operation.
           */
-        t64 = float64_mul(e1r, e2r, s_odd);
-        t64 = float64r32_muladd(e1c, e2c, t64, 0, s_std);
+        FloatParts64 tmp = parts64_mul(&p1r, &p2r, s_std);
+        tmp = parts64_muladd(&p1c, &p2c, &tmp, 0, s_std);

This change results in our incorrectly reporting an input-denormal-used
exception when FPCR.AH=1 and some of the halfprec inputs are denormal.
(For AH=1 we are supposed to set IDC for input-denormal-used only for
single and double precision inputs, not for halfprec. We achieve that by
having vfp_get_fpsr_from_host() mask out the input_denormal_used flag
from FPST_STD_F16.)

Oh, right.
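
As a rough illustration of the masking described above: this is a standalone sketch, not the actual vfp_get_fpsr_from_host() code. The DEMO_* flag names and demo_merge_flags() are hypothetical, and the sketch assumes the bit is simply dropped from the half-precision status before its flags are merged with those of the standard status.

```c
#include <assert.h>

/* Hypothetical flag bits for illustration; not QEMU's float_flag_* values. */
enum {
    DEMO_FLAG_INEXACT = 1 << 0,
    DEMO_FLAG_INPUT_DENORMAL_USED = 1 << 1,
};

/*
 * Merge the accumulated flags of a "standard" status and an "f16" status.
 * Assumption: input-denormal-used is never reported for halfprec inputs,
 * so that bit is cleared from the f16 flags before merging.
 */
static unsigned demo_merge_flags(unsigned std_flags, unsigned f16_flags)
{
    f16_flags &= ~(unsigned)DEMO_FLAG_INPUT_DENORMAL_USED;
    return std_flags | f16_flags;
}
```

In this model a denormal consumed under the f16 status never reaches the merged flags, while one consumed under the standard status does. That matches the symptom reported above when halfprec inputs are processed with s_std.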


I think this worked previously because we did a float_to_float
conversion using the f16 status (float_to_float consumes an
input denormal, which we don't report because we use the f16
status), and any f16 denormal is not a denormal in f64 and
so the 64-bit ops never see an input denormal, so using s_std
wasn't a problem?

That sounds right.
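
The claim that no f16 denormal is a denormal in f64 can be checked exhaustively. The following is a minimal sketch using plain C doubles rather than softfloat; f16_bits_to_double() and demo_count_f16_denormals_still_denormal() are hypothetical helpers written for this illustration only.

```c
#include <float.h>
#include <math.h>
#include <stdint.h>

/* Decode an IEEE binary16 bit pattern into a double (exact, no rounding). */
static double f16_bits_to_double(uint16_t bits)
{
    int sign = bits >> 15;
    int exp = (bits >> 10) & 0x1f;
    int frac = bits & 0x3ff;
    double mag;

    if (exp == 0) {
        /* Zero or subnormal: frac * 2^-24 */
        mag = (double)frac * 0x1p-24;
    } else if (exp == 0x1f) {
        mag = frac ? NAN : INFINITY;
    } else {
        /* Normal: (1024 + frac) * 2^(exp - 25) */
        mag = (double)(frac | 0x400) * 0x1p-24 * (double)(1u << (exp - 1));
    }
    return sign ? -mag : mag;
}

/*
 * Count the f16 subnormal bit patterns (both signs) whose double value
 * is smaller in magnitude than DBL_MIN, the smallest normal double.
 */
static int demo_count_f16_denormals_still_denormal(void)
{
    int count = 0;

    for (int sign = 0; sign < 2; sign++) {
        for (int frac = 1; frac <= 0x3ff; frac++) {
            double d = f16_bits_to_double((uint16_t)((sign << 15) | frac));
            double mag = d < 0 ? -d : d;

            if (mag < DBL_MIN) {
                count++;
            }
        }
    }
    return count;
}
```

The smallest f16 subnormal is 2^-24, far above the smallest normal double (2^-1022), so the count is zero and the 64-bit operations never see a denormal input.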

(Also, for the operation using s_odd that
is a copy of an fp_status so any exception flags set there
never get copied back to the live version. Presumably we
relied on it never being able to generate an exception?)

I didn't think we were relying on that, only avoiding swapping the rounding mode for each vector element.
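
To make the copied-status point concrete, here is a small sketch. DemoStatus and the demo_* names are hypothetical stand-ins, not QEMU's float_status API. A copy is set up once with a different rounding mode, so the mode need not be swapped per element, but flags raised on the copy stay on the copy.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for a floating-point status block. */
typedef struct DemoStatus {
    int rounding_mode;
    unsigned flags;
} DemoStatus;

enum { DEMO_ROUND_NEAREST_EVEN, DEMO_ROUND_TO_ODD };
enum { DEMO_FLAG_INEXACT = 1 };

/* Stand-in for an operation that raises a flag in the status it is given. */
static void demo_op_raising_inexact(DemoStatus *s)
{
    s->flags |= DEMO_FLAG_INEXACT;
}

/* Returns the live status flags after running operations on a copy. */
static unsigned demo_live_flags_after_ops_on_copy(void)
{
    DemoStatus live = { DEMO_ROUND_NEAREST_EVEN, 0 };
    DemoStatus odd = live;              /* copied once, outside the loop */

    odd.rounding_mode = DEMO_ROUND_TO_ODD;
    for (int i = 0; i < 4; i++) {       /* one iteration per vector element */
        demo_op_raising_inexact(&odd);  /* flag lands in the copy only */
    }
    return live.flags;                  /* nothing was copied back */
}
```

Unless the caller explicitly merges the copy's flags back, the live status is unchanged after the loop.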

I think it doesn't matter whether we use s_std or s_f16
in the final round-and-pack step, because the only differences
between the F16 and normal fpstatus are to do with input
denormals (separate flush-to-zero control, and different
denormal-consumed reporting), and that happens in the
"unpack" and "do operation" steps; nothing cares in "pack".
I went with s_f16 because we usually do the whole
"unpack; operation; pack" sequence with a single fp status.

Yep, thanks.


r~
