On 5/23/23 06:57, BALATON Zoltan wrote:
> This solves the softfloat related usages, the rest probably are lower overhead, I could
> not measure any more improvement with removing asserts on top of this patch. I still have
> these functions high in my profiling result:
>
>   children    self  command          symbol
>     11.40%  10.86%  qemu-system-ppc  helper_compute_fprf_float64
You might need to dig in with perf here, but my first guess is
    #define COMPUTE_CLASS(tp)                                      \
    static int tp##_classify(tp arg)                               \
    {                                                              \
        int ret = tp##_is_neg(arg) * is_neg;                       \
        if (unlikely(tp##_is_any_nan(arg))) {                      \
            float_status dummy = { };  /* snan_bit_is_one = 0 */   \
            ret |= (tp##_is_signaling_nan(arg, &dummy)             \
                    ? is_snan : is_qnan);                          \
        } else if (unlikely(tp##_is_infinity(arg))) {              \
            ret |= is_inf;                                         \
        } else if (tp##_is_zero(arg)) {                            \
            ret |= is_zero;                                        \
        } else if (tp##_is_zero_or_denormal(arg)) {                \
            ret |= is_denormal;                                    \
        } else {                                                   \
            ret |= is_normal;                                      \
        }                                                          \
        return ret;                                                \
    }
The tests are poorly ordered, testing many unlikely things before the most likely thing
(normal). A better ordering would be
    if (likely(tp##_is_normal(arg))) {
    } else if (tp##_is_zero(arg)) {
    } else if (tp##_is_zero_or_denormal(arg)) {
    } else if (tp##_is_infinity(arg)) {
    } else {
        // nan case
    }
Secondly, we compute the classify bitmask, and then deconstruct the mask again in
set_fprf_from_class. Since we don't use the classify bitmask for anything else, it would be
better to compute the fprf value directly in the if-ladder.
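Roughly what I have in mind, as a standalone sketch rather than a patch. It works on the
raw float64 bits instead of the softfloat predicates so that it compiles on its own. The
function names and the FPRF_* constants are made up for illustration, and the encodings
are the Power ISA FPRF values as I remember them, so check them against the table in
set_fprf_from_class before relying on them. Returning 0 for SNaN is an assumption.

```c
#include <stdint.h>
#include <string.h>

#if defined(__GNUC__)
#define likely(x) __builtin_expect(!!(x), 1)
#else
#define likely(x) (x)
#endif

/* FPRF encodings (C, FL, FG, FE, FU); hypothetical names, verify the values. */
enum {
    FPRF_POS_NORMAL = 0x04, FPRF_NEG_NORMAL = 0x08,
    FPRF_POS_ZERO   = 0x02, FPRF_NEG_ZERO   = 0x12,
    FPRF_POS_DENORM = 0x14, FPRF_NEG_DENORM = 0x18,
    FPRF_POS_INF    = 0x05, FPRF_NEG_INF    = 0x09,
    FPRF_QNAN       = 0x11,
    FPRF_SNAN       = 0x00, /* assumption: flags undefined for SNaN */
};

/* Compute the FPRF value directly, testing the common case (normal) first. */
static unsigned fprf_from_float64_bits(uint64_t bits)
{
    unsigned neg = (unsigned)(bits >> 63);
    unsigned exp = (unsigned)(bits >> 52) & 0x7ff;
    uint64_t frac = bits & 0x000fffffffffffffULL;

    if (likely(((exp + 1) & 0x7ff) >= 2)) {
        /* Exponent is neither all-zeros nor all-ones: normal number. */
        return neg ? FPRF_NEG_NORMAL : FPRF_POS_NORMAL;
    } else if (exp == 0 && frac == 0) {
        return neg ? FPRF_NEG_ZERO : FPRF_POS_ZERO;
    } else if (exp == 0) {
        return neg ? FPRF_NEG_DENORM : FPRF_POS_DENORM;
    } else if (frac == 0) {
        return neg ? FPRF_NEG_INF : FPRF_POS_INF;
    } else {
        /* NaN case; quiet bit is the top fraction bit (snan_bit_is_one = 0). */
        return (frac & (1ULL << 51)) ? FPRF_QNAN : FPRF_SNAN;
    }
}

/* Convenience wrapper for host doubles. */
static unsigned fprf_from_double(double d)
{
    uint64_t bits;

    memcpy(&bits, &d, sizeof(bits));
    return fprf_from_float64_bits(bits);
}
```

In the real helper the return value would be stored into the FPRF field of env->fpscr, and
the same ladder would be instantiated per type the way COMPUTE_CLASS does it now.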
>     11.25%   0.61%  qemu-system-ppc  helper_fmadds
This is unsurprising, and there is not much that can be done about it.
All of the work is in muladd doing the arithmetic.
> Unrelated to this patch I also started to see random crashes with a DSI on a dcbz
> instruction now which did not happen before (or not frequently enough for me to notice). I
> did not bisect that as it happens randomly but I wonder if it could be related to recent
> unaligned access changes or some other TCG change? Any idea what to check?
No idea.
r~