On Wed, 26 Feb 2020, Alex Bennée wrote:
> That's the wrong way around. We have regression tests for a reason. I'll
> happily accept patches to turn on hardfloat for PPC if:
> a) they don't cause regressions in our fairly extensive floating point
> tests
Where are these tests and how do I run them? I'm not aware of such tests,
so I've only tried running simple guest code to test changes, but if there
are more extensive FP tests I'd better use those.
> b) the PPC maintainers are happy with the new performance profile
> The way forward would be to:
> 1. patch to drop #if defined(TARGET_PPC) || defined(__FAST_MATH__)
This is simple, but I've found that while it seems to make some vector
instructions faster, it also makes most simple FP ops slower: every op goes
through the check for whether it can use hardfloat, but it never can,
because fp_status is cleared before every FP op. That's why my patch sets
the inexact bit: to let it use hardfloat and to test whether it would work
at all.
That's all my RFC patch did. I also had a second version that tried to
avoid the slowdown by dropping the above #if defined() but setting
hardfloat=false, so it only uses softfloat as before, but it did not work
out too well; some tests said v2 was even slower. Maybe, to avoid the
overhead, we should replace the QEMU_NO_HARDFLOAT define with a variable
that can be set at runtime, but that probably won't be faster either. Thus
it seems there's no way to enable hardfloat for PPC without making most FP
ops slower unless we also do some of the other points below (even if it's
beneficial for vector ops).
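To make the slowdown concrete, here is a rough C sketch of the gating logic as I understand it. Everything in it is hypothetical and heavily simplified: fp_status_sketch stands in for float_status, can_use_host_fpu() models the idea behind softfloat's check that inexact is already set and rounding is nearest-even (the real code has further checks, e.g. for denormal inputs and tiny results), the slow path uses Knuth's TwoSum error term instead of real softfloat, and hardfloat_enabled is the runtime variable suggested above, not something QEMU currently has.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical runtime switch, standing in for the compile-time
 * QEMU_NO_HARDFLOAT define. */
static bool hardfloat_enabled = true;

/* Hypothetical, heavily simplified stand-in for QEMU's float_status. */
typedef struct {
    bool inexact;              /* sticky inexact flag */
    bool round_nearest_even;   /* rounding mode is round-to-nearest-even */
} fp_status_sketch;

/* Counters so we can observe which path each operation took. */
static unsigned hard_ops, soft_ops;

/* The host FPU result is only used when inexact is already set, because
 * then there is no need to find out whether this particular op was inexact. */
static bool can_use_host_fpu(const fp_status_sketch *s)
{
    return hardfloat_enabled && s->inexact && s->round_nearest_even;
}

static double add_sketch(fp_status_sketch *s, double a, double b)
{
    if (can_use_host_fpu(s)) {
        hard_ops++;
        return a + b;                  /* fast path: plain host add */
    }
    soft_ops++;
    /* Slow-path stand-in: instead of real softfloat, detect inexactness with
     * Knuth's TwoSum error term (valid for round-to-nearest, no overflow). */
    double sum = a + b;
    double bb = sum - a;
    double err = (a - (sum - bb)) + (b - bb);
    if (err != 0.0) {
        s->inexact = true;
    }
    return sum;
}

/* PPC-style helper: clears the accumulated flags before every op so the
 * non-sticky FI bit can be derived afterwards. */
static double ppc_style_add(fp_status_sketch *s, double a, double b)
{
    s->inexact = false;
    return add_sketch(s, a, b);
}
```

With the PPC-style helper every call lands on the slow path, so the gate check is pure overhead; without the clearing, only the first inexact op takes the slow path. The runtime switch in this sketch still adds a load and a branch to every op.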
> 2. audit target/ppc/fpu_helper.c w.r.t. the chip manual and fix any
> unneeded splatting of flags (if any)
This would need either someone who knows the PPC FPU or someone who can
take the time to learn it and go through the code. I'm not sure I want to
volunteer for that. But I think the clearing of the flags is mainly there
to emulate the FI bit, which is a non-sticky inexact bit that should show
the inexact status of the last FP op. (There's another similar bit for
fraction rounded (FR) as well, but that does not disable hardfloat.) The
question is whether we really want to emulate these bits accurately. Is
there any software we care about that relies on them? If we can live
without correct FI bit emulation (which was the case for a long time until
these bits were fixed), then we could have an easy way to enable hardfloat
without more extensive changes. If we also want to emulate these bits
accurately, then we will probably need changes to softfloat to allow
registering FP exception handlers, so we don't have to clear and check
bits but can get an exception from the FPU and then set those bits, but I
have no idea how to do that.
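The FI trade-off can be sketched the same way. Again all names and bit positions are made up for illustration (this is not the real FPSCR layout or the real fpu_helper.c code), and the softfloat add is replaced by a TwoSum-based stand-in. Accurate FI needs the status cleared before each op; the relaxed variant never clears it and therefore leaves FI stale after an exact op.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions (not the real FPSCR layout): XX is the sticky
 * inexact exception bit, FI the non-sticky "fraction inexact" bit that
 * reflects only the most recent operation. */
#define FPSCR_XX 0x1u
#define FPSCR_FI 0x2u

/* Simplified stand-in for softfloat's status: just a sticky inexact flag. */
typedef struct {
    bool inexact;
} fp_status_sketch;

/* Stand-in for a softfloat add: sets the sticky flag when the result is
 * inexact, detected with Knuth's TwoSum error term (round-to-nearest, no
 * overflow assumed). It never clears the flag. */
static double sf_add(fp_status_sketch *s, double a, double b)
{
    double sum = a + b;
    double bb = sum - a;
    double err = (a - (sum - bb)) + (b - bb);
    if (err != 0.0) {
        s->inexact = true;
    }
    return sum;
}

/* Accurate FI: clear the accumulated status first so that afterwards it
 * holds only this op's inexactness (the pattern that keeps a hardfloat
 * fast path from ever being taken). */
static double fadd_accurate_fi(fp_status_sketch *s, uint32_t *fpscr,
                               double a, double b)
{
    s->inexact = false;
    double r = sf_add(s, a, b);
    if (s->inexact) {
        *fpscr |= FPSCR_XX | FPSCR_FI;
    } else {
        *fpscr &= ~FPSCR_FI;   /* exact result: FI cleared, XX stays sticky */
    }
    return r;
}

/* Relaxed FI: never clear the status, so a hardfloat fast path would stay
 * usable, but once inexact has been seen FI stays set even after exact ops. */
static double fadd_relaxed_fi(fp_status_sketch *s, uint32_t *fpscr,
                              double a, double b)
{
    double r = sf_add(s, a, b);
    if (s->inexact) {
        *fpscr |= FPSCR_XX | FPSCR_FI;
    }
    return r;
}
```

After 0.1 + 0.2 (inexact) followed by 1.0 + 2.0 (exact), the accurate helper leaves only XX set, while the relaxed one still reports FI. Whether any guest software notices that stale FI is exactly the open question above.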
Regards,
BALATON Zoltan