On 20.08.2015 at 19:19, Richard Henderson wrote:
> This isn't surprising, because at the moment tcg optimizations are almost
> completely ineffective for sparc. The way the register windows are implemented
> means that there are very few proper tcg temporaries to optimize.
>
> I've just updated an old branch that attempts to cure this. It creates proper
> tcg temporaries for the windowed registers, and uses a bit of recursion to find
> the place at which they should be stored.
>
>   git://github.com/rth7680/qemu.git tcg-indirect
>
> With a few quick unscientific tests, it appears to help. It would be nice to
> put that branch side-by-side with your tests above.
tcg-indirect does not seem to improve things overall (the stream test degrades even more).
"without-optimization" means qemu.org-git with USE_TCG_OPTIMIZATIONS undefined.
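For anyone who wants to reproduce the without-optimization build, here is a minimal sketch of one way to switch the define off. It assumes the define sits on its own line in tcg/tcg.c, which is an assumption about the tree and not something stated in this thread; disable_define is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: comment out a "#define NAME" line in a C source file.
# Usage: disable_define NAME FILE
disable_define() {
    name="$1"
    file="$2"
    tmp="$file.tmp.$$"
    # Wrap the matching define in a C comment; every other line is copied unchanged.
    sed "s|^#define $name[[:space:]]*\$|/* #define $name */|" "$file" > "$tmp" \
        && mv "$tmp" "$file"
}

# Assumed location of the define (not confirmed in this thread):
# disable_define USE_TCG_OPTIMIZATIONS tcg/tcg.c
```

After the edit, qemu has to be rebuilt for the change to take effect.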
git clone git://github.com/rth7680/qemu.git
cd qemu
git checkout tcg-indirect
g++ src/pugixml.cpp -g -Wall -Wextra -Werror -pedantic -std=c++0x -c -MMD -MP
tcg-indirect: ~2:46.5
qemu.org-git: ~2:51.2 (worst result)
without-optimization: ~2:14.1 (best result)
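To put the three compile times on a common scale, a small sketch like the following converts min:sec to seconds and prints the change relative to qemu.org-git. The helper names to_seconds and pct_change are made up for illustration; the input values are the ones listed above.

```shell
#!/bin/sh
# Convert a "min:sec" time such as 2:46.5 to seconds.
to_seconds() {
    echo "$1" | LC_ALL=C awk -F: '{ printf "%.1f\n", $1 * 60 + $2 }'
}

# Signed percentage change of a new time relative to a baseline (both in seconds).
pct_change() {
    LC_ALL=C awk -v base="$1" -v new="$2" \
        'BEGIN { printf "%+.1f\n", (new - base) / base * 100 }'
}

base=$(to_seconds 2:51.2)    # qemu.org-git is used as the baseline
for entry in tcg-indirect=2:46.5 without-optimization=2:14.1; do
    label=${entry%%=*}
    secs=$(to_seconds "${entry#*=}")
    echo "$label: ${secs}s ($(pct_change "$base" "$secs")% vs qemu.org-git)"
done
```

With these numbers, tcg-indirect comes out roughly 3% faster than qemu.org-git, and without-optimization roughly 22% faster.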
gcc prime.c -o prime.out -lm
prime.out runtime
tcg-indirect: ~9.3 sec (best result)
qemu.org-git: ~11 sec (worst result)
without-optimization: ~9.9 sec
stream results (STREAM version $Revision: 5.10 $)
tcg-indirect: (worst result)
Your clock granularity/precision appears to be 41 microseconds.
Each test below will take on the order of 632527 microseconds.
(= 15427 clock ticks)
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:             320.8     0.511297     0.498785     0.590214
Scale:            187.0     0.858693     0.855465     0.863527
Add:              218.2     1.104654     1.099698     1.110341
Triad:            169.5     1.433273     1.416321     1.502248
qemu.org-git: (best result)
Your clock granularity/precision appears to be 42 microseconds.
Each test below will take on the order of 330428 microseconds.
(= 7867 clock ticks)
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:             771.5     0.214717     0.207377     0.244214
Scale:            288.1     0.573320     0.555401     0.660161
Add:              423.5     0.633523     0.566661     1.092067
Triad:            242.9     1.053032     0.987970     1.499563
without-optimization:
Your clock granularity/precision appears to be 41 microseconds.
Each test below will take on the order of 745254 microseconds.
(= 18176 clock ticks)
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:             316.6     0.524065     0.505313     0.580103
Scale:            200.5     0.813356     0.798024     0.840986
Add:              243.9     1.010247     0.984025     1.119149
Triad:            182.9     1.345601     1.312236     1.427459
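
As a sanity check on the tables above: STREAM derives "Best Rate MB/s" from the minimum time and the number of bytes each kernel moves. The sketch below recomputes it, assuming the arrays hold 10,000,000 elements of 8 bytes each. That size is not stated in the output above; it is an assumption that happens to match the reported figures. stream_rate is a hypothetical helper name.

```shell
#!/bin/sh
# Recompute a STREAM "Best Rate MB/s" value from the reported minimum time.
# Usage: stream_rate ARRAYS_TOUCHED MIN_TIME_SECONDS [ELEMENTS] [BYTES_PER_ELEMENT]
# The defaults for ELEMENTS and BYTES_PER_ELEMENT are assumptions (see above).
stream_rate() {
    LC_ALL=C awk -v arrays="$1" -v t="$2" -v n="${3:-10000000}" -v size="${4:-8}" \
        'BEGIN { printf "%.1f\n", arrays * size * n / t / 1e6 }'
}

# Copy and Scale touch 2 arrays per iteration, Add and Triad touch 3.
stream_rate 2 0.498785   # tcg-indirect Copy
stream_rate 3 0.987970   # qemu.org-git Triad
```

Under that assumption the two calls give back 320.8 and 242.9, the same values as in the tables, so rate and min time carry the same information and either column can be used to compare the three builds.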