On Tue, 11 Dec 2018, David Gibson wrote:
> On Mon, Dec 10, 2018 at 09:54:51PM +0100, BALATON Zoltan wrote:
>> Yes, I don't really know what these tests use, but I think the "lame" test
>> is mostly floating point. I also tried "lame_vmx", which should at least
>> use some vector ops, and the "mplayer -benchmark" test is more VMX
>> dependent based on my previous profiling and testing with hardfloat, but
>> I'm not sure. (When testing these with hardfloat I found that lame
>> benefited from hardfloat but mplayer didn't, and more VMX related
>> functions showed up with mplayer, so I assumed it's more VMX bound.)
> I should clarify here. When I say "floating point" above, I'm not
> meaning things using the regular FPU instead of the vector unit. I'm
> saying *anything* involving floating point calculations whether
> they're done in the FPU or the vector unit.

OK, that clarifies it. I admit I only ran these tests and didn't have
time to look at what exactly changed.

> The patches here don't convert all VMX instructions to use vector TCG
> ops - they only convert a few, and those few are about using the
> vector unit for integer (and logical) operations. VMX instructions
> involving floating point calculations are unaffected and will still
> use soft-float.

What I said above about the lame test being more FPU intensive and mplayer
more VMX intensive probably still holds: I've retried now on a Haswell i5
and got a 1-2% difference with lame_vmx and ~6% with mplayer. That's very
little improvement, but if only some VMX instructions are expected to get
faster then this may make sense.

>> These tests are not the best, maybe there are better ways to measure this
>> but I don't know of any,
>> Maybe the PPC softmmu should be reviewed and optimised by someone who knows
>> it...
> I'm not sure there is anyone who knows it at this point. I probably
> know it as well as anybody, and the ppc32 code scares me. It's a
> crufty mess and it would be nice to clean up, but that requires
> someone with enough time and interest.

At least this seems to be a big bottleneck in PPC emulation, and one that's
not being worked on. (Others like hardfloat and VMX are not finished and
there is still a lot to do, but there are already some results, whereas no
one is looking at softmmu.) I was just trying to draw some attention to the
fact that softmmu may also need some optimisation, and hoped someone would
notice. I have some interest but not much time these days, and if it scares
you, what should I say? I don't even understand most of it, so it would
take a lot of time just to work out how it works and what would need to be
done. So I hope someone with more time or knowledge shows up and maybe at
least provides some hints on what may need to be done.

Regards,
BALATON Zoltan