On Tue, 19 May 2026 at 11:16, Peter Maydell <[email protected]> wrote:
>
> On Sun, 17 May 2026 at 01:32, Richard Henderson
> <[email protected]> wrote:
> >
> > Signed-off-by: Richard Henderson <[email protected]>
> > ---
>
>
> > +static uint8_t fcvt_fp8_e4m3_output(FloatParts64 *p, int scale,
> > +                                    bool saturate, float_status *s)
> > +{
> > +    *p = parts64_scalbn(p, scale, s);
> > +    /*
> > +     * Saturating Inf -> Max handled in uncanon_e4m3_overflow
> > +     * because there is no infinity encoding.
> > +     */
> > +    return float8_e4m3_round_pack_canonical(p, s, saturate);
> > +}
>
> The E4M3 conversion code doesn't seem to produce the right
> output for an input NaN. This looks like it's because the
> fpu/ code isn't special-casing "there's only one NaN value
> for this format". Here, if we input a QNaN the scalbn operation
> will ignore the specific input NaN and return the default NaN.
> There's no special-casing of E4M3 in partsN(default_nan), so
> you get a frac value that's based on the default_nan_pattern,
> which for Arm has the MSB frac bit clear. Then in partsN(uncanon)
> we just move that frac value down into the right place for E4M3,
> without enforcing that it really is the all-ones NaN.
>
> I think this should fix this, but maybe there's a better way?
>
> @@ -519,6 +519,14 @@ static void partsN(uncanon)(FloatPartsN *p,
> float_status *s,
>          case float_class_snan:
>              assert(fmt->exp_max_kind != float_expmax_normal);
>              p->exp = fmt->exp_max;
> +            if (fmt->exp_max_kind == float_expmax_e4m3) {
> +                /*
> +                 * E4M3 has exactly one NaN, with the frac field all-ones.
> +                 * Using the frac from the input NaN might result in
> +                 * a non-NaN value.
> +                 */
> +                fracN(allones)(p);
> +            }
>              fracN(shr)(p, fmt->frac_shift);
>              return;
>          default:

If we do take this approach we also need something like:

--- a/fpu/softfloat-parts.c.inc
+++ b/fpu/softfloat-parts.c.inc
@@ -288,6 +288,7 @@ static void
partsN(uncanon_e4m3_overflow)(FloatPartsN *p, float_status *s,
         p->frac_hi = E4M3_NORMAL_FRAC_MAX;
     } else {
         *p = partsN(default_nan)(s);
+        fracN(allones)(p);
     }
 }

because the NaN value from uncanon_e4m3_overflow isn't passed
through the NaN handling part of uncanon.

Though there's an argument for making default_nan() for the
format return the right thing I guess.

-- PMM

Reply via email to