On 21.08.2015 at 07:49, Richard Henderson wrote:
On 08/20/2015 09:32 PM, Dennis Luehring wrote:
> gcc prime.c -o prime.out -lm
>
> prime.out runtime
>
> tcg-indirect: ~9.3 sec (best result)
> qemu.org-git: ~11 sec
> without-optimization: ~9.9 sec (worst result)
I presume this is integer prime factoring?
Aurelien Jarno extracted this code from sysbench (just for my qemu
sparc64 tests):
#include <math.h>

unsigned long long max_prime = 2000;

void prime_test(void)
{
    unsigned long long c;
    unsigned long long l, t;
    unsigned long long n = 0;

    /* So far we're using very simple test prime number tests in 64bit */
    for (c = 3; c < max_prime; c++)
    {
        t = sqrt(c);
        for (l = 2; l <= t; l++)
            if (c % l == 0)
                break;
        if (l > t)
            n++;
    }
}

int main(void)
{
    int i;

    for (i = 0; i < 10000; i++)
    {
        prime_test();
    }
    return 0;
}
> g++ src/pugixml.cpp -g -Wall -Wextra -Werror -pedantic -std=c++0x -c -MMD -MP
>
> tcg-indirect: ~2:46.5
> qemu.org-git: ~2:51.2 (worst result)
> without-optimization: ~2:14.1 (best result)
No compiler optimization? I wouldn't expect there to be much for tcg to
optimize there -- dropping values to memory all the time doesn't leave much.
"without-optimization" means the qemu.org-git release build with
USE_TCG_OPTIMIZATIONS undefined in tcg/tcg.c.
Or which compiler do you mean?
>
> stream results (STREAM version $Revision: 5.10 $)
>
> tcg-indirect: (worst result)
>
> Your clock granularity/precision appears to be 41 microseconds.
> Each test below will take on the order of 632527 microseconds.
> (= 15427 clock ticks)
> Function    Best Rate MB/s    Avg time    Min time    Max time
> Copy:                320.8    0.511297    0.498785    0.590214
> Scale:               187.0    0.858693    0.855465    0.863527
> Add:                 218.2    1.104654    1.099698    1.110341
> Triad:               169.5    1.433273    1.416321    1.502248
>
> qemu.org-git: (best result)
>
> Your clock granularity/precision appears to be 42 microseconds.
> Each test below will take on the order of 330428 microseconds.
> (= 7867 clock ticks)
> Function    Best Rate MB/s    Avg time    Min time    Max time
> Copy:                771.5    0.214717    0.207377    0.244214
> Scale:               288.1    0.573320    0.555401    0.660161
> Add:                 423.5    0.633523    0.566661    1.092067
> Triad:               242.9    1.053032    0.987970    1.499563
>
> without-optimization:
>
> Your clock granularity/precision appears to be 41 microseconds.
> Each test below will take on the order of 745254 microseconds.
> (= 18176 clock ticks)
> Function    Best Rate MB/s    Avg time    Min time    Max time
> Copy:                316.6    0.524065    0.505313    0.580103
> Scale:               200.5    0.813356    0.798024    0.840986
> Add:                 243.9    1.010247    0.984025    1.119149
> Triad:               182.9    1.345601    1.312236    1.427459
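For readers who don't know STREAM: the four rows in each table correspond to four simple loops over arrays of doubles. The following is a sketch of those kernels, not the benchmark itself. The array size N and the function name stream_kernels are illustrative assumptions; the real benchmark uses far larger arrays and times each loop separately.

```c
#include <stddef.h>

#define N 1000   /* illustrative only; STREAM uses far larger arrays */

static double a[N], b[N], c[N];

/* Runs one pass of the four STREAM kernels with the given scalar.
 * (Hypothetical helper; STREAM itself times each loop separately.) */
void stream_kernels(double scalar)
{
    size_t j;

    for (j = 0; j < N; j++)          /* Copy:  c = a              */
        c[j] = a[j];
    for (j = 0; j < N; j++)          /* Scale: b = scalar * c     */
        b[j] = scalar * c[j];
    for (j = 0; j < N; j++)          /* Add:   c = a + b          */
        c[j] = a[j] + b[j];
    for (j = 0; j < N; j++)          /* Triad: a = b + scalar * c */
        a[j] = b[j] + scalar * c[j];
}
```

All four loops are dominated by loads and stores with very little arithmetic in between, which is worth keeping in mind when comparing these numbers with the prime test above.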
These results are weird. Unoptimized less than half the speed of mainline?
Improving optimization (with no extra work, mind) brings the results back down?
Yep, they are. It seems that the involved developers' assumptions about
where speed can be improved, or where the slowness comes from, are not correct.
How are SPARC64 benchmarks usually done?
r~