On Tue, 26 Oct 1999, Chris Pirih wrote:

> [ lots of bogus and incommensurable stream benchmarks deleted ]
> 
> Come on guys, read the whole output of stream (not just the last
> five lines) and you'll notice that your datasets are not large 
> enough to produce meaningful results.  You should be taking at 
> least 20 clock ticks per test.  It looks like some of you are 
> taking 1 or 2 clock ticks instead.
> -
> Linux SMP list: FIRST see FAQ at http://www.irisa.fr/prive/dmentre/smp-howto/
> To Unsubscribe: send "unsubscribe linux-smp" to [EMAIL PROTECTED]
> 


Thanks for the reminder - we do have to be careful about this.  The system I
was using had 128 MB of memory, and the maximum "ticks per test" I could get
was 18 or 19, with STREAM configured to use every last drop of memory.
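
A rough sketch of why 20 ticks is out of reach on a 128 MB machine, using the array size and tick count from the stream_cpu output in this message. It assumes that test time scales linearly with array size, which is only an approximation, and the helper names are made up for illustration.

```python
# Values copied from the stream_cpu output in this message.
ARRAY_SIZE = 4_000_000   # elements per array
BYTES_PER_WORD = 8       # "8 bytes per DOUBLE PRECISION word"
NUM_ARRAYS = 3           # STREAM works on three arrays
TICKS_OBSERVED = 15      # "(= 15 clock ticks)"
TICKS_WANTED = 20        # STREAM's suggested minimum


def memory_mib(n_elements):
    """Memory taken by the three arrays, in units of 2**20 bytes."""
    return NUM_ARRAYS * BYTES_PER_WORD * n_elements / 2**20


def elements_for_ticks(ticks_wanted):
    """Array size needed for ticks_wanted ticks.

    ASSUMPTION: run time grows linearly with array size.
    """
    # Ceiling division, so the result never falls short of the target.
    return -(-ARRAY_SIZE * ticks_wanted // TICKS_OBSERVED)


if __name__ == "__main__":
    n = elements_for_ticks(TICKS_WANTED)
    print(f"current run : {memory_mib(ARRAY_SIZE):.1f} MB")
    print(f"for 20 ticks: {n} elements, {memory_mib(n):.1f} MB")
```

Under that assumption, 20 ticks would need about 122 MB for the arrays alone, which leaves almost nothing of the 128 MB for the kernel and everything else.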

To check the consistency of the results (dual 500-MHz Pentium III 
machine, Intel N440BX motherboard, egcs-2.91.66, libc 2.1.1, built with
'-O2') I ran versions of STREAM built with second_cpu.c, second_wall.c,
and second_cycle.c.  In the last of these I used the cycle counter to get
very high precision timings (2 nsec resolution good enough for you?).
See the results below.
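
The 2 nsec figure is just the reciprocal of the clock rate. A small sketch of that arithmetic, using the "Cycles/second" value printed by stream_cycle further down; the function names are hypothetical and this is not the code in second_cycle.c.

```python
# "Cycles/second" as printed by stream_cycle in this message.
CYCLES_PER_SECOND = 498756675.125641


def resolution_ns(cycles_per_second):
    """Duration of one cycle in nanoseconds."""
    return 1e9 / cycles_per_second


def cycles_to_seconds(start_cycles, end_cycles, cycles_per_second):
    """Convert a difference of two cycle-counter readings to seconds."""
    return (end_cycles - start_cycles) / cycles_per_second


if __name__ == "__main__":
    print(f"resolution: {resolution_ns(CYCLES_PER_SECOND):.3f} ns")
```

One cycle at this clock rate lasts about 2.005 ns.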

If anyone wants to try the binaries, they're at
  ftp://linux-rep.fnal.gov/pub/stream_d/
along with the source code and checksums.  The binaries are statically 
linked and stripped.  I presume the version with second_cycle.c won't 
work on an Athlon, but the other two should work fine.  I would really 
appreciate someone reporting Athlon results (let's factor out the 
compiler, as Alan Cox suggests).



The results (redundant info edited out):

=====================================================================
STREAM with second_cpu.c:

pcqcd1:~/benchmarks/stream_code$ ./stream_cpu
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 4000000, Offset = 0
Total memory required = 91.6 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 9999 microseconds.
Each test below will take on the order of 150000 microseconds.
   (= 15 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:         336.8421       0.1990       0.1900       0.2000
Scale:        336.8421       0.1990       0.1900       0.2000
Add:          384.0000       0.2550       0.2500       0.2600
Triad:        309.6774       0.3210       0.3100       0.3300
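
For anyone checking the table above by hand: the Rate column follows from the Min time column and the number of bytes each kernel moves. The sketch below assumes the usual STREAM accounting (Copy and Scale touch two arrays, Add and Triad touch three, and 1 MB = 10^6 bytes); the names are illustrative only.

```python
ARRAY_SIZE = 4_000_000   # from "Array size = 4000000"
BYTES_PER_WORD = 8       # double precision

# Arrays read plus arrays written by each kernel (assumed accounting).
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# "Min time" column from the stream_cpu table, in seconds.
MIN_TIME = {"Copy": 0.1900, "Scale": 0.1900, "Add": 0.2500, "Triad": 0.3100}


def rate_mb_per_s(kernel, min_time):
    """Bandwidth in MB/s (1 MB = 10**6 bytes) from the best time."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_WORD * ARRAY_SIZE
    return bytes_moved / min_time / 1e6


if __name__ == "__main__":
    for kernel, t in MIN_TIME.items():
        print(f"{kernel:6s} {rate_mb_per_s(kernel, t):9.4f} MB/s")
```

This reproduces the four rates of the stream_cpu table. For the other two timers the printed Min time is rounded to four decimals, so the same calculation matches only to about three significant digits.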


=====================================================================
STREAM with second_wall.c:

pcqcd1:~/benchmarks/stream_code$ ./stream_wall
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 156932 microseconds.
   (= 156932 clock ticks)
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:         324.2230       0.1979       0.1974       0.1993
Scale:        322.6782       0.1987       0.1983       0.2003
Add:          379.3222       0.2535       0.2531       0.2542
Triad:        292.5438       0.3286       0.3282       0.3295


=====================================================================
STREAM with second_cycle.c:

pcqcd1:~/benchmarks/stream_code$ ./stream_cycle
-------------------------------------------------------------
Cycles/second = 498756675.125641
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 157104 microseconds.
   (= 157104 clock ticks)
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:         324.3966       0.1976       0.1973       0.1984
Scale:        322.9465       0.1986       0.1982       0.2001
Add:          379.1082       0.2535       0.2532       0.2552
Triad:        293.3682       0.3280       0.3272       0.3286
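
One way to read the three tables together is to express the gap between the CPU-timer and cycle-counter minimum times in units of the 9999 microsecond CPU clock tick. The sketch below is plain arithmetic on the numbers posted above and makes no claim about the cause of the gap.

```python
TICK_SECONDS = 9999e-6   # CPU-timer granularity reported by stream_cpu

# "Min time" columns copied from the tables above, in seconds.
CPU_MIN = {"Copy": 0.1900, "Scale": 0.1900, "Add": 0.2500, "Triad": 0.3100}
CYCLE_MIN = {"Copy": 0.1973, "Scale": 0.1982, "Add": 0.2532, "Triad": 0.3272}


def gap_in_ticks(kernel):
    """Cycle-counter minimum minus CPU-timer minimum, in CPU clock ticks."""
    return (CYCLE_MIN[kernel] - CPU_MIN[kernel]) / TICK_SECONDS


if __name__ == "__main__":
    for kernel in CPU_MIN:
        print(f"{kernel:6s} {gap_in_ticks(kernel):5.2f} ticks")
```

In every case the CPU-timer minimum is the smaller of the two, and the gap stays under two ticks.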



Don Holmgren
Fermilab



