Hi everyone,
I'm experimenting with adding _Float16 support to s390x and I'm
wondering how targetm.c.excess_precision() should behave when called for
EXCESS_PRECISION_TYPE_FLOAT16. The default implementation
unconditionally returns FLT_EVAL_METHOD_PROMOTE_TO_FLOAT. If I keep
this behavior, gcc.dg/tree-ssa/pow_fold_1.c fails: for test
pow1over_f16, compiled with -fexcess-precision=16, we emit
void pow1over_f16 (_Float16 x, _Float16 y)
{
_Float16 t4;
_Float16 t3;
_Float16 t2;
_Float16 t1;
float _1;
float _2;
<bb 2> [local count: 1073741824]:
_1 = (float) x_4(D);
_2 = 1.0e+0 / _1;
t1_5 = (_Float16) _2;
t2_7 = __builtin_powf16 (t1_5, y_6(D));
t3_8 = -y_6(D);
t4_9 = __builtin_powf16 (x_4(D), t3_8);
if (t2_7 != t4_9)
goto <bb 3>; [33.00%]
else
goto <bb 4>; [67.00%]
<bb 3> [local count: 354334800]:
link_error (); [tail call]
<bb 4> [local count: 1073741824]:
return;
}
Since the division 1.0f16 / x is performed by first upcasting to
float, FRE cannot fold t2 != t4, so the call to link_error() remains
and the test fails.
Looking at targets like aarch64, i386, and riscv, they all return
FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 for EXCESS_PRECISION_TYPE_FLOAT16 and
therefore do not introduce the upcast.
First for soundness, and second to streamline back ends, I'm wondering
whether it would make sense for s390x to also have
targetm.c.excess_precision (EXCESS_PRECISION_TYPE_FLOAT16) return
FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16. My understanding is that for
-fexcess-precision=16 it is sound for targetm.c.excess_precision
(EXCESS_PRECISION_TYPE_FLOAT16) to return either
FLT_EVAL_METHOD_PROMOTE_TO_FLOAT---which would align with the default
implementation---or FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16, but I might be
wrong here. Therefore, any feedback is very welcome.
Cheers,
Stefan