On Sun, 17 May 2026 at 01:32, Richard Henderson
<[email protected]> wrote:
>
> Signed-off-by: Richard Henderson <[email protected]>
> ---
> +static uint8_t fcvt_fp8_e4m3_output(FloatParts64 *p, int scale,
> + bool saturate, float_status *s)
> +{
> + *p = parts64_scalbn(p, scale, s);
> + /*
> + * Saturating Inf -> Max handled in uncanon_e4m3_overflow
> + * because there is no infinity encoding.
> + */
> + return float8_e4m3_round_pack_canonical(p, s, saturate);
> +}
The E4M3 conversion code doesn't seem to produce the right
output for an input NaN. This looks like it's because the
fpu/ code isn't special-casing "there's only one NaN value
for this format". Here, if we input a QNaN the scalbn operation
will ignore the specific input NaN and return the default NaN.
There's no special-casing of E4M3 in partsN(default_nan), so
you get a frac value that's based on the default_nan_pattern,
which for Arm has the MSB frac bit clear. Then in partsN(uncanon)
we just move that frac value down into the right place for E4M3,
without enforcing that it really is the all-ones NaN.
I think this should fix this, but maybe there's a better way?
@@ -519,6 +519,14 @@ static void partsN(uncanon)(FloatPartsN *p,
float_status *s,
case float_class_snan:
assert(fmt->exp_max_kind != float_expmax_normal);
p->exp = fmt->exp_max;
+ if (fmt->exp_max_kind == float_expmax_e4m3) {
+ /*
+ * E4M3 has exactly one NaN, with the frac field all-ones.
+ * Using the frac from the input NaN might result in
+ * a non-NaN value.
+ */
+ fracN(allones)(p);
+ }
fracN(shr)(p, fmt->frac_shift);
return;
default:
-- PMM