Re: Machines are getting too damn fast

2001-03-16 Thread Joachim Strömbergson

Aloha!

In an earlier mail to the thread I pointed to the STREAM benchmark for
memory subsystems. Additionally, I wrote that I knew there was another
benchmark that tries to analyze word sizes and access latencies for the
different memories in the memory subsystem. I can now name that benchmark
(or at least one such benchmark): MOB. Check out:

http://steamboat.cs.ucsb.edu/mob/

The benchmark is currently not in the ports, but it has been tested (see
the mob home page) on FreeBSD. I have downloaded it, compiled it and am
measuring my own system while writing this. [1]

Additionally, the following page is a pretty comprehensive list of links
to tools and documentation relating to performance measurement, analysis
and optimization. 
 
http://www.cs.virginia.edu/~clc5q/perflinks.html

[1] MOB reported the following for my Dual Celeron 533 system:

Allocated 64MB of memory for benchmarking
Performing benchmark: Cache Size/Levels 
Data Caches:
Found L1: [16384]
Found L2: [131072]
Found L3: [8388608]
Instruction Caches:
Found L1: [16384]
Found L2: [131072]
Performing benchmark: Cache Share 
Level 1 cache is not shared 
Level 2 cache is shared 
Performing benchmark: Cache Line Size 
Data Caches:
Elected line size:  32
Elected line size:  28
Elected line size:  108
Instruction Caches:
Elected line size:  100
Performing benchmark: Cache Associativity 
Data Caches:
Detected L1 is 8-way associative.
Detected L2 is 4-way associative.
Detected L3 is 4-way associative.
Instruction Caches:
Detected L1 is 4-way associative.
Performing benchmark: Cache Replacement Policy 
Data Caches:
Detected L1 is random replacement policy
Detected L2 is random replacement policy
Detected L3 is random replacement policy
Instruction Caches:
Detected L1 has LRU replacement policy
Performing benchmark: Cache Write Policy 
Data Caches:
Found L1 replacement policy is write-back/allocate
Found L2 replacement policy is write-back/allocate
Found L3 replacement policy is write-through/no-allocate
Performing benchmark: Cache Indexing (Virtual/Physical) 
Performing benchmark: TLB Page Size 
Data TLBs:
Instruction TLBs:
Performing benchmark: TLB Entry Count 
Data TLBs:
Num entries: 10
Instruction TLBs:
Number of entries not detected
Performing benchmark: TLB Associativity 
Data TLBs:
Found associativity 32 
Instruction TLBs:
Found associativity 32 

# MOB Config file 
# Date Fri Mar 16 17:50:35 2001
# Host: fetis.ninja.se 
# Run params: trials=3 runTime=100 verbosity=2 
[CACHE]
level = 1
type = Data
size = 16384
lineSize = 32
associativity = 8
replacement = random
writeMode = writeBack-alloc
latency = 

[CACHE]
level = 2
type = Shared
size = 131072
lineSize = 28
associativity = 4
replacement = random
writeMode = writeBack-alloc
latency = 1.0869

[CACHE]
level = 3
type = Data
size = 8388608
lineSize = 108
associativity = 4
replacement = random
writeMode = writeThrough-noalloc
latency = 235.7955

[CACHE]
level = 1
type = Instruction
size = 16384
lineSize = 100
associativity = 4
replacement = lru
writeMode = 
latency = 1.0397

[TLB]
type = Data
pageSize = 524288
numEntries = 10
associativity = 32
latency = 

[TLB]
type = Instruction
pageSize = 1024
numEntries = 
associativity = 32
latency = 

-- 
Cheers!
Joachim - Alltid i harmonisk svängning
--- FairLight -- FairLight -- FairLight -- FairLight ---
Joachim Strömbergson ASIC SoC designer, nice to CUTE animals
Phone: +46(0)31 - 27 98 47    Web: http://www.ludd.luth.se/~watchman
--- Spamfodder: [EMAIL PROTECTED] ---

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message



Re: Machines are getting too damn fast

2001-03-16 Thread Dan Nelson

In the last episode (Mar 16), Joachim Strömbergson said:
 In an earlier mail to the thread I pointed to the STREAM benchmark
 for memory subsystems. Additionally, I wrote that I knew there was
 another benchmark that tries to analyze word sizes and access latencies
 for the different memories in the memory subsystem. I can now name
 that benchmark (or at least one such benchmark): MOB. Check out:
 
 http://steamboat.cs.ucsb.edu/mob/
 
 The benchmark is currently not in the ports, but it has been tested
 (see the mob home page) on FreeBSD. I have downloaded it, compiled it
 and am measuring my own system while writing this. [1]

It's sort of misfiled:

$ cat /usr/ports/devel/mob/pkg-descr

This is a port of mob, that tries to figure out memory system
characteristics at run-time.
$

-- 
Dan Nelson
[EMAIL PROTECTED]




Re: RE: RE: Machines are getting too damn fast

2001-03-12 Thread Matt Dillon


:
:Noted.
:
:Is there a gcc PR associated with this?
:
:http://gcc.gnu.org/cgi-bin/gnatsweb.pl
:
:A GNATS search for "freebsd kernel" didn't return anything.
:
:-Charles

No idea.  Somewhere around 4.1 my -O2 and -Os kernel compiles just 
stopped working.  There was a bunch of stuff on the list about it.

-Matt




Re: Machines are getting too damn fast

2001-03-06 Thread Michael Sinz

Matt Dillon [EMAIL PROTECTED] wrote:
 Subject: Re: Machines are getting too damn fast
 
 :throughput.  For example, on the PIII-850 (116MHz FSB and SDRAM, it's
 :overclocked) here on my desk with 256KB L2 cache:
 :
 :dd if=/dev/zero of=/dev/null bs=512k count=4000
 :4000+0 records in
 :4000+0 records out
 :2097152000 bytes transferred in 8.229456 secs (254834825 bytes/sec)
 :
 :dd if=/dev/zero of=/dev/null bs=128k count=16000
 :16000+0 records in
 :16000+0 records out
 :2097152000 bytes transferred in 1.204001 secs (1741819224 bytes/sec)
 :
 :Now THAT is a significant difference.  :-)
 
 Interesting.  I get very different results with the 1.3 GHz P4.  The
 best I seem to get is 1.4 GBytes/sec.  I'm not sure what the L2 cache
 is on the box, but it's definitely a consumer model.
 
 dd if=/dev/zero of=/dev/null bs=512k count=4000
 2097152000 bytes transferred in 2.363903 secs (887156520 bytes/sec)
 
 dd if=/dev/zero of=/dev/null bs=128k count=16000
 2097152000 bytes transferred in 1.471046 secs (1425619621 bytes/sec)
 
 If I use lower block sizes the syscall overhead blows up the
 performance (it gets lower rather than higher).  So I figure I don't
 have as much L2 as on your system.

The P4 has other issues when you don't do straight line code.
Any branch mis-predictions cost a minimum of 20 cycles due to the
pipeline plus whatever cache/fetch/decode hits you may get on the
actual target.  This may be why you get lower values than a PIII or
Athlon.  (Both have a significantly lower penalty for branch misprediction.)

-- 
Michael Sinz  Worldgate Communications  [EMAIL PROTECTED]
A master's secrets are only as good as
the master's ability to explain them to others.




Re: Machines are getting too damn fast

2001-03-06 Thread Andrew Gallatin


Matt Dillon writes:
  
  I modified my original C program again, this time to simply read
  the data from memory given a block size in kilobytes as an argument.  
  I had to throw in a little __asm to do it right, but here are my results.
  It shows about 3.2 GBytes/sec from the L2 (well, insofar as my
  3-instruction loop goes), and about 1.4 GBytes/sec from main memory.
  
  
  NOTE:  cc x.c -O2 -o x
  
  ./x 4
  3124.96 MBytes/sec (read)
...
  ./x 1024
  1397.90 MBytes/sec (read)
  
  In contrast I get 1052.50 MBytes/sec on the Dell 2400 from the L2,
  and 444 MBytes/sec from main memory.
  

FWIW: 1.2GHz Athlon, VIA Apollo KT133 chipset, Asus A7V motherboard,
(PC133 ECC Registered Dimms)

./x 4
2393.70 MBytes/sec (read)
./x 8
2398.19 MBytes/sec (read)
...
./x 1024
627.32 MBytes/sec (read)


And a Dual 933MHz PIII SuperMicro 370DER Serverworks HE-SL Chipset
(2-way interleaved PC133 ECC Registered DIMMS)

./x 4
1853.54 MBytes/sec (read)
./x 1024
526.19 MBytes/sec (read)


There's something diabolic about your previous bw test, though.  I
think it only hits one bank of interleaved ram.  On the 370DER it gets
only 167MB/sec.  Every other bw test I've run on the box shows copy
perf at around 260MB/sec (Hbench, lmbench).  I see the same problem on
a PE4400 (also 2-way interleaved); it shows copy perf as 111MB/sec.
Every other test has it at 230MB/sec.

The Athlon copies at 174MB/sec, which is right about what lmbench, hbench,
etc, and your test show.

How's your P4 for floating point?  Is real-life perf as good as the
specbench numbers would indicate, or do you need a better compiler
than GCC to get any benefit from it?  My wife is a statistician, and
she runs some really fp intensive workloads.  This Athlon is faster
than the Serverworks box and (barely) faster than a year-old Alpha
UP1000 for her code.

Drew


--
Andrew Gallatin, Sr Systems Programmer  http://www.cs.duke.edu/~gallatin
Duke University Email: [EMAIL PROTECTED]
Department of Computer Science  Phone: (919) 660-6590




Re: Machines are getting too damn fast

2001-03-06 Thread Peter Seebach

In message [EMAIL PROTECTED], Andrew Gallatin 
writes:
FWIW: 1.2GHz Athlon, VIA Apollo KT133 chipset, Asus A7V motherboard,
(PC133 ECC Registered Dimms)

Note that the KT does *NOT* support ECC.  A few places have claimed it does,
but the VIA chipset spec says it doesn't.  The KV or KX does (I forget the
model #), but the KT is secretly doing no error correcting at all.  I got
burned on this with an ABit VP6, which proclaims loudly that it supports ECC,
but doesn't actually *do* any.

-s




Re: Machines are getting too damn fast

2001-03-06 Thread Matt Dillon

:How's your P4 for floating point?  Is real-life perf as good as the
:specbench numbers would indicate, or do you need a better compiler
:than GCC to get any benefit from it?  My wife is a statistician, and
:she runs some really fp intensive workloads.  This Athlon is faster
:than the Serverworks box and (barely) faster than a year-old Alpha
:UP1000 for her code.
:
:Drew
:
:--
:Andrew Gallatin, Sr Systems Programmer http://www.cs.duke.edu/~gallatin

My understanding is that Intel focused on FP performance in the P4,
and that it is very, very good at it.  I dunno how to test it though.

GCC generally does not produce very good code, but I would expect that
it would get reasonably close in regards to FP because Intel's FP 
instruction set is a good fit with it.

-Matt





RE: Machines are getting too damn fast

2001-03-06 Thread Charles Randall

From: Matt Dillon [mailto:[EMAIL PROTECTED]]
My understanding is that Intel focused on FP performance in the P4,
and that it is very, very good at it.  I dunno how to test it though.

GCC generally does not produce very good code, but I would expect that
it would get reasonably close in regards to FP because Intel's FP 
instruction set is a good fit with it.

Which begs the question I've tried to ask a number of times in different
forums. Who's working on P4 optimizations and code generation for the P4?

Sure, i386 code will run but the benchmarks seem to indicate that peak
performance is heavily dependent on a good optimizing compiler.

A query to the gcc mailing list returned no responses.

Charles




Re: Machines are getting too damn fast

2001-03-06 Thread Mike Silbersack


On Tue, 6 Mar 2001, Matt Dillon wrote:

 My understanding is that Intel focused on FP performance in the P4,
 and that it is very, very good at it.  I dunno how to test it though.

From the benchmarks tom's hardware / others did, I got the impression that
SSE2 performance is awesome, but x87 FPU operations aren't really
improved, so the Athlon still has the advantage there.

 GCC generally does not produce very good code, but I would expect that
 it would get reasonably close in regards to FP because Intel's FP
 instruction set is a good fit with it.

   -Matt

I'm quite confused about Intel's strategy wrt that compiler.  Every time
someone does a benchmark showing Intel's newest processor getting beat at
something, they send code compiled with it to the benchmarker.  However,
they haven't even attempted to make it a popular compiler.  Everything
I've seen/heard indicates that msvc and gcc are all that gets really used
on x86.

My only guess is that part of the company wants to have everyone use it to
get optimal performance out of intel processors, while the other half
wants people to be forced to buy faster processors.  This would explain
why it's still sold, but in such a way that nobody will really buy it.

(The reason I mention this is because someone was talking about trying to
compile the kernel with sun's CC.  Maybe rigging intel's compiler to do so
would be fruitful.)

Mike "Silby" Silbersack






Re: Machines are getting too damn fast

2001-03-06 Thread Kenneth D. Merry

On Tue, Mar 06, 2001 at 10:56:46 -0500, Andrew Gallatin wrote:
 Matt Dillon writes:
   
   I modified my original C program again, this time to simply read
   the data from memory given a block size in kilobytes as an argument.  
   I had to throw in a little __asm to do it right, but here are my results.
   It shows about 3.2 GBytes/sec from the L2 (well, insofar as my
   3-instruction loop goes), and about 1.4 GBytes/sec from main memory.
   
   
   NOTE:  cc x.c -O2 -o x
   
   ./x 4
   3124.96 MBytes/sec (read)
 ...
   ./x 1024
   1397.90 MBytes/sec (read)
   
   In contrast I get 1052.50 MBytes/sec on the Dell 2400 from the L2,
   and 444 MBytes/sec from main memory.
   
 
 FWIW: 1.2GHz Athlon, VIA Apollo KT133 chipset, Asus A7V motherboard,
 (PC133 ECC Registered Dimms)
 
 ./x 4
 2393.70 MBytes/sec (read)
 ./x 8
 2398.19 MBytes/sec (read)
 ...
 ./x 1024
 627.32 MBytes/sec (read)
 
 
 And a Dual 933MHz PIII SuperMicro 370DER Serverworks HE-SL Chipset
 (2-way interleaved PC133 ECC Registered DIMMS)
 
 ./x 4
 1853.54 MBytes/sec (read)
 ./x 1024
 526.19 MBytes/sec (read)

Dell Precision 420 (i840 chipset) with a single PIII 800 and probably one
RIMM, unknown speed:

{rivendell:/usr/home/ken/src:76:0} ./memspeed 4   
1049.51 MBytes/sec (read)
{rivendell:/usr/home/ken/src:77:0} ./memspeed 1024
378.41 MBytes/sec (read)

The above machine may not have been completely idle, it seems a little
slow.

Dual 1GHz PIII SuperMicro 370DE6 Serverworks HE-SL chipset, 4x256MB PC133
ECC Registered DIMMs:

{gondolin:/usr/home/ken/src:51:0} ./memspeed 4
1985.95 MBytes/sec (read)
{gondolin:/usr/home/ken/src:52:0} ./memspeed 1024
516.62 MBytes/sec (read)

 There's something diabolic about your previous bw test, though.  I
 think it only hits one bank of interleaved ram.  On the 370DER it gets
 only 167MB/sec.  Every other bw test I've run on the box shows copy
 perf at around 260MB/sec (Hbench, lmbench).  I see the same problem on
 a PE4400 (also 2-way interleaved); it shows copy perf as 111MB/sec.
 Every other test has it at 230MB/sec.

The previous test showed about 270MB/sec on my Serverworks box:

{gondolin:/usr/home/ken/src:53:0} ./memory_speed
269.23 MBytes/sec (copy)

Ken
-- 
Kenneth Merry
[EMAIL PROTECTED]




Re: RE: Machines are getting too damn fast

2001-03-06 Thread Matt Dillon

:Which begs the question I've tried to ask a number of times in different
:forums. Who's working on P4 optimizations and code generation for the P4?

I'd be happy if GCC -O2 just worked without introducing bugs.  I want to
be able to compile the kernel with it again.

-Matt

:Sure, i386 code will run but the benchmarks seem to indicate that peak
:performance is heavily dependent on a good optimizing compiler.
:
:A query to the gcc mailing list returned no responses.
:
:Charles







RE: Machines are getting too damn fast

2001-03-06 Thread Brandon Gale

This explains in great detail exactly why people are seeing the performance
they are from the P4 etc.  The author knows his stuff.

http://www.emulators.com/pentium4.htm

Brandon

:-Original Message-
:From: [EMAIL PROTECTED]
:[mailto:[EMAIL PROTECTED]]On Behalf Of Kenneth D. Merry
:Sent: Tuesday, March 06, 2001 1:08 PM
:To: Andrew Gallatin
:Cc: Matt Dillon; [EMAIL PROTECTED]
:Subject: Re: Machines are getting too damn fast
:
:
:On Tue, Mar 06, 2001 at 10:56:46 -0500, Andrew Gallatin wrote:
: Matt Dillon writes:
:  
:   I modified my original C program again, this time to simply read
:   the data from memory given a block size in kilobytes as
:an argument.
:   I had to throw in a little __asm to do it right, but here
:are my results.
:   It shows about 3.2 GBytes/sec from the L2 (well, insofar as my
:   3-instruction loop goes), and about 1.4 GBytes/sec from
:main memory.
:  
:  
:   NOTE:  cc x.c -O2 -o x
:  
:   ./x 4
:   3124.96 MBytes/sec (read)
: ...
:   ./x 1024
:   1397.90 MBytes/sec (read)
:  
:   In contrast I get 1052.50 MBytes/sec on the Dell 2400 from the L2,
:   and 444 MBytes/sec from main memory.
:  
:
: FWIW: 1.2GHz Athlon, VIA Apollo KT133 chipset, Asus A7V motherboard,
: (PC133 ECC Registered Dimms)
:
: ./x 4
: 2393.70 MBytes/sec (read)
: ./x 8
: 2398.19 MBytes/sec (read)
: ...
: ./x 1024
: 627.32 MBytes/sec (read)
:
:
: And a Dual 933MHz PIII SuperMicro 370DER Serverworks HE-SL Chipset
: (2-way interleaved PC133 ECC Registered DIMMS)
:
: ./x 4
: 1853.54 MBytes/sec (read)
: ./x 1024
: 526.19 MBytes/sec (read)
:
:Dell Precision 420 (i840 chipset) with a single PIII 800 and probably one
:RIMM, unknown speed:
:
:{rivendell:/usr/home/ken/src:76:0} ./memspeed 4
:1049.51 MBytes/sec (read)
:{rivendell:/usr/home/ken/src:77:0} ./memspeed 1024
:378.41 MBytes/sec (read)
:
:The above machine may not have been completely idle, it seems a little
:slow.
:
:Dual 1GHz PIII SuperMicro 370DE6 Serverworks HE-SL chipset, 4x256MB PC133
:ECC Registered DIMMs:
:
:{gondolin:/usr/home/ken/src:51:0} ./memspeed 4
:1985.95 MBytes/sec (read)
:{gondolin:/usr/home/ken/src:52:0} ./memspeed 1024
:516.62 MBytes/sec (read)
:
: There's something diabolic about your previous bw test, though.  I
: think it only hits one bank of interleaved ram.  On the 370DER it gets
: only 167MB/sec.  Every other bw test I've run on the box shows copy
: perf at around 260MB/sec (Hbench, lmbench).  I see the same problem on
: a PE4400 (also 2-way interleaved); it shows copy perf as 111MB/sec.
: Every other test has it at 230MB/sec.
:
:The previous test showed about 270MB/sec on my Serverworks box:
:
:{gondolin:/usr/home/ken/src:53:0} ./memory_speed
:269.23 MBytes/sec (copy)
:
:Ken
:--
:Kenneth Merry
:[EMAIL PROTECTED]
:
:To Unsubscribe: send mail to [EMAIL PROTECTED]
:with "unsubscribe freebsd-hackers" in the body of the message
:





Re: RE: Machines are getting too damn fast

2001-03-06 Thread Matt Dillon


:
:This explains in great detail exactly why people are seeing the performance
:they are from the P4 etc.  The author knows his stuff.
:
:http://www.emulators.com/pentium4.htm
:
:Brandon

Heh heh.  You can practically see the sweat popping off his face while
reading his article.

-Matt




RE: RE: Machines are getting too damn fast

2001-03-06 Thread Charles Randall

Noted.

Is there a gcc PR associated with this?

http://gcc.gnu.org/cgi-bin/gnatsweb.pl

A GNATS search for "freebsd kernel" didn't return anything.

-Charles

-Original Message-
From: Matt Dillon [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, March 06, 2001 11:44 AM
To: Charles Randall
Cc: Andrew Gallatin; [EMAIL PROTECTED]
Subject: Re: RE: Machines are getting too damn fast


:Which begs the question I've tried to ask a number of times in different
:forums. Who's working on P4 optimizations and code generation for the P4?

I'd be happy if GCC -O2 just worked without introducing bugs.  I want to
be able to compile the kernel with it again.

-Matt

:Sure, i386 code will run but the benchmarks seem to indicate that peak
:performance is heavily dependent on a good optimizing compiler.
:
:A query to the gcc mailing list returned no responses.
:
:Charles






Re: Machines are getting too damn fast

2001-03-05 Thread Joachim Strömbergson

Aloha!

(Sorry for jumping right in to the thread here...)

This might be a good time to mention the STREAM benchmark developed by
John McCalpin. It measures sustained bandwidth of systems. The official
web page is at:

http://www.cs.virginia.edu/stream/

If you check the section under "PC-compatible Results" you should find
machines comparable to what you are measuring. Also, there is a tool
out there that will measure effective latency for cache levels and main
store for a system - both first page and burst transfers. Need to search
my ever inflating bookmarks... 

The news group comp.arch might be a source of info and would probably be
interested in your results too.

Matt Dillon wrote:
 
 :You should see what speed RamBus they were using, 600 or 800 Mhz. It is
 :pretty fast for large memory writes and reads. It'd be cool to see how
 :the different speeds stack up against one another. DDR comparisons would
 :be cool too. Yeah, for the frequency, you have to take into account that
 :these are different chips than your PIII or Athlons and the performance
 :difference is not simply a linear relation to the frequency rating
 :(i.e.: 1.3Ghz is not really over one-billion instructions per second,
 :just clocks per second). We installed Linux at a UC Free OS User Group
 :installfest here in cincinnati, it was pretty sweet. The machine was a
 :Dell and the case was freakin' huge. It also came with a 21" monitor and
 :stuff. The performance was really good, but not really any better than I
 :had gleaned from the newer 1Ghz Athlons or PIII's.
 
 It says 800 MHz (PC-800 RIMMs) on the side of the box.
 
 The technical reviews basically say that bulk transfer rates for
 RamBus blow DDR away, but DDR wins for random reads and writes
 due to RamBus's higher startup latency.  I don't have any DDR
 systems to test but I can devise a test program.
 
 Celeron 650 MHz (HP desktop) (DIMM)
 16.16 MBytes/sec (copy)
 
 Pentium III 550 MHz (Dell 2400) (DIMM)
 25.90 MBytes/sec (copy)
 
 Pentium 4 1.3 GHz / PC-800 RIMMs (Sony VAIO)
 32.38 MBytes/sec (copy)
 
 -Matt

-- 
Cheers!
Joachim - Alltid i harmonisk svängning
--- FairLight -- FairLight -- FairLight -- FairLight ---
Joachim Strömbergson ASIC SoC designer, nice to CUTE animals
Phone: +46(0)31 - 27 98 47    Web: http://www.ludd.luth.se/~watchman
--- Spamfodder: [EMAIL PROTECTED] ---




Re: Machines are getting too damn fast

2001-03-05 Thread Coleman Kane

I believe DDR still has it beat (PC2100, that is). Rambus serializes all writes
to 32 bit or 16 bit (and even 8 bit) depending on the modules. I haven't seen any
64-bit RIMMs. Also, DDR scales higher (up to 333MHz, I think). Just wait for
that. I've pretty much given up on RamBus. I saw it back in '96 when they talked
about how great it would be, and I thought it was crap back then. The idea is that
it can queue 28 mem ops, where SDRAM can only do like 4. Typical environments
don't do the kind of writes that it looks good on, just certain special cases.
That's where you get into the fact that RamBus sucks for use in PCs. Too
specialized. Too proprietary.

Matt Dillon had the audacity to say:
 
 
 :You should see what speed RamBus they were using, 600 or 800 Mhz. It is
 :pretty fast for large memory writes and reads. It'd be cool to see how
 :the different speeds stack up against one another. DDR comparisons would
 :be cool too. Yeah, for the frequency, you have to take into account that
 :these are different chips than your PIII or Athlons and the performance
 :difference is not simply a linear relation to the frequency rating
 :(i.e.: 1.3Ghz is not really over one-billion instructions per second,
 :just clocks per second). We installed Linux at a UC Free OS User Group
 :installfest here in cincinnati, it was pretty sweet. The machine was a
 :Dell and the case was freakin' huge. It also came with a 21" monitor and
 :stuff. The performance was really good, but not really any better than I
 :had gleaned from the newer 1Ghz Athlons or PIII's.
 
 It says 800 MHz (PC-800 RIMMs) on the side of the box.
 
 The technical reviews basically say that bulk transfer rates for
 RamBus blow DDR away, but DDR wins for random reads and writes
 due to RamBus's higher startup latency.  I don't have any DDR
 systems to test but I can devise a test program.
 
 Celeron 650 MHz (HP desktop) (DIMM)
   16.16 MBytes/sec (copy)
 
 Pentium III 550 MHz (Dell 2400) (DIMM)
   25.90 MBytes/sec (copy)
 
 Pentium 4 1.3 GHz / PC-800 RIMMs (Sony VAIO)
   32.38 MBytes/sec (copy)
 
 
   -Matt
 
 Compile with -O2; changing the two occurrences of '512' to '4' will reproduce
 the original bulk-transfer rates.  By default this program tests 
 single-transfer (always cache miss).
 
 #include <sys/types.h>
 #include <sys/time.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <stdarg.h>
 #include <string.h>
 #include <strings.h>
 #include <unistd.h>
 
 #define NLOOP 100
 
 char Buf1[2 * 1024 * 1024];
 char Buf2[2 * 1024 * 1024];
 
 int deltausecs(struct timeval *tv1, struct timeval *tv2);
 
 int
 main(int ac, char **av)
 {
     int i;
     double dtime;
     struct timeval tv1;
     struct timeval tv2;
 
     /* Touch both buffers before the timed section. */
     memset(Buf1, 1, sizeof(Buf1));
     for (i = 0; i < 10; ++i)
         bcopy(Buf1, Buf2, sizeof(Buf1));
 
     gettimeofday(&tv1, NULL);
     for (i = 0; i < NLOOP; ++i) {
         int j;
         int k;
         /* Copy one int per 512-byte stride; each pass shifts the offset. */
         for (k = sizeof(int); k <= 512; k += sizeof(int)) {
             for (j = sizeof(Buf1) - k; j >= 0; j -= 512)
                 *(int *)(Buf2 + j) = *(int *)(Buf1 + j);
         }
     }
     gettimeofday(&tv2, NULL);
 
     /* Bytes per microsecond equals MBytes per second. */
     dtime = (double)deltausecs(&tv1, &tv2);
     printf("%6.2f MBytes/sec (copy)\n", (double)sizeof(Buf1) * NLOOP / dtime);
     return(0);
 }
 
 int
 deltausecs(struct timeval *tv1, struct timeval *tv2)
 {
     int usec;
 
     usec = (tv2->tv_usec + 1000000 - tv1->tv_usec);
     usec += (tv2->tv_sec - tv1->tv_sec - 1) * 1000000;
     return(usec);
 }
 



Re: Machines are getting too damn fast

2001-03-05 Thread Chris Dillon

On Mon, 5 Mar 2001, E.B. Dreger wrote:

  Date: Sun, 04 Mar 2001 19:39:09 -0600
  From: David A. Gobeille [EMAIL PROTECTED]
 
  It would also be interesting to see the numbers for an Athlon/PIII
  system with DDR, if anyone has such a machine.

 Personally, I'd be [more] interested in a ServerWorks III HE core chipset
 with four-way interleaved SDRAM. :-)

I've got a ServerWorks III HE-SL system with 512MB of two-way
interleaved PC133 SDRAM and dual PIII-800's.  Is that close enough?
:-)

Here is my "memory bandwidth test", much much simpler and less
scientific than Matt's:

dd if=/dev/zero of=/dev/null bs=10m count=1000
1000+0 records in
1000+0 records out
10485760000 bytes transferred in 23.716504 secs (442129245 bytes/sec)

I just did a recent 4.2-STABLE 'make -j 4 buildworld' on that system
in just over 34 minutes.  Here's the time output:
1980.707u 768.223s 34:20.89 133.3%  1297+1456k 39517+6202io 1661pf+0w

 If one _truly_ needs the bandwidth of Rambus (which, IIRC, is
 higher real-world latency than SDRAM), then how about having the
 bus bandwidth to back it up?

The higher real-world latency of RDRAM over SDRAM is what makes the
benefits of its higher bandwidth so questionable.  PC2100 DDR-SDRAM --
which has higher latencies than regular SDRAM but still lower than
RDRAM -- should have it beat soundly, though we'll have to wait for
some systems that are actually designed to take advantage of it to say
for sure.  :-)
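
The latency-versus-bandwidth trade-off above can be sketched with a very simple model: transfer time = startup latency + size / sustained bandwidth. This is only an illustration under that assumption. The function names are hypothetical, and the figures in the note below are made up, not measurements of any real SDRAM, DDR or RDRAM part.

```c
/* Time in ns to move `bytes`, given a startup latency in ns and a
 * sustained bandwidth in bytes per ns. Simplified model (an assumption). */
double transfer_ns(double latency_ns, double bytes, double bytes_per_ns)
{
    return latency_ns + bytes / bytes_per_ns;
}

/* Transfer size in bytes at which memory A (higher bandwidth) catches up
 * with memory B (lower latency).
 * Returns -1.0 if A has no bandwidth advantage and never catches up,
 * and 0.0 if A is at least as good on latency too (A wins at any size). */
double crossover_bytes(double lat_a, double bw_a, double lat_b, double bw_b)
{
    if (bw_a <= bw_b)
        return -1.0;
    if (lat_a <= lat_b)
        return 0.0;
    return (lat_a - lat_b) / (1.0 / bw_b - 1.0 / bw_a);
}
```

With hypothetical figures of 60 ns and 1.6 bytes/ns for A against 40 ns and 0.8 bytes/ns for B, crossover_bytes(60.0, 1.6, 40.0, 0.8) comes out at 32 bytes. Below that size the lower-latency memory finishes first, which is the random-access case; above it the higher-bandwidth memory wins, which is the bulk-transfer case.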


-- Chris Dillon - [EMAIL PROTECTED] - [EMAIL PROTECTED]
   FreeBSD: The fastest and most stable server OS on the planet.
   For IA32 and Alpha architectures. IA64, PPC, and ARM under development.
   http://www.freebsd.org







Re: Machines are getting too damn fast

2001-03-05 Thread Matt Dillon

:On Mon, 5 Mar 2001, E.B. Dreger wrote:
:
:I've got a ServerWorks III HE-SL system with 512MB of two-way
:interleaved PC133 SDRAM and dual PIII-800's.  Is that close enough?
::-)
:
:Here is my "memory bandwidth test", much much simpler and less
:scientific than Matt's:
:
:dd if=/dev/zero of=/dev/null bs=10m count=1000
:1000+0 records in
:1000+0 records out
:10485760000 bytes transferred in 23.716504 secs (442129245 bytes/sec)
:
:I just did a recent 4.2-STABLE 'make -j 4 buildworld' on that system
:in just over 34 minutes.  Here's the time output:
:1980.707u 768.223s 34:20.89 133.3%  1297+1456k 39517+6202io 1661pf+0w

That is quite impressive for SDRAM, though I'm not exactly sure what's
being measured due to the way /dev/zero and /dev/null operate.  On
my system the above dd test returns around 883MB/sec so I would guess
that it is only doing a read-swipe on the memory.

(sony 1.3G / RIMM)
apollo:/home/dillon dd if=/dev/zero of=/dev/null bs=10m count=1000
1000+0 records in
1000+0 records out
10485760000 bytes transferred in 11.867550 secs (883565697 bytes/sec)

On the DELL 2400 I get:

1048576000 bytes transferred in 2.737955 secs (382977810 bytes/sec)

The only thing I don't like about this baby is the IBM IDE hard drive's
write performance.  I only get 10-12 MBytes/sec.  Read performance is
incredible, though... I get 37MB/sec dd'ing from /dev/ad0s1a to
/dev/null.

ad0: 58644MB IBM-DTLA-307060 [119150/16/63] at ata0-master UDMA100

-Matt

: If one _truly_ needs the bandwidth of Rambus (which, IIRC, is
: higher real-world latency than SDRAM), then how about having the
: bus bandwidth to back it up?
:
:The higher real-world latency of RDRAM over SDRAM is what makes the
:benefits of its higher bandwidth so questionable.  PC2100 DDR-SDRAM --
:which has higher latencies than regular SDRAM but still lower than
:RDRAM -- should have it beat soundly, though we'll have to wait for
:some systems that are actually designed to take advantage of it to say
:for sure.  :-)
:
:
:-- Chris Dillon - [EMAIL PROTECTED] - [EMAIL PROTECTED]





Re: Machines are getting too damn fast

2001-03-05 Thread Chris Dillon

On Mon, 5 Mar 2001, Matt Dillon wrote:

 :On Mon, 5 Mar 2001, E.B. Dreger wrote:
 :
 :I've got a ServerWorks III HE-SL system with 512MB of two-way
 :interleaved PC133 SDRAM and dual PIII-800's.  Is that close enough?
 ::-)
 :
 :Here is my "memory bandwidth test", much much simpler and less
 :scientific than Matt's:
 :
 :dd if=/dev/zero of=/dev/null bs=10m count=1000
 :1000+0 records in
 :1000+0 records out
 :10485760000 bytes transferred in 23.716504 secs (442129245 bytes/sec)
 :
 :I just did a recent 4.2-STABLE 'make -j 4 buildworld' on that system
 :in just over 34 minutes.  Here's the time output:
 :1980.707u 768.223s 34:20.89 133.3%  1297+1456k 39517+6202io 1661pf+0w

 That is quite impressive for SDRAM, though I'm not exactly sure what's
 being measured due to the way /dev/zero and /dev/null operate.  On
 my system the above dd test returns around 883MB/sec so I would guess
 that it is only doing a read-swipe on the memory.

I figured if /dev/null and /dev/zero had relatively little memory
impact, which I figure is the case, it would matter more how dd does
its blocking.  If you reduce the blocksize to something that will fit
in your L2 caches, you'll see a very significant increase in
throughput.  For example, on the PIII-850 (116MHz FSB and SDRAM, it's
overclocked) here on my desk with 256KB L2 cache:

dd if=/dev/zero of=/dev/null bs=10m count=200
200+0 records in
200+0 records out
2097152000 bytes transferred in 10.032157 secs (209042982 bytes/sec)

dd if=/dev/zero of=/dev/null bs=512k count=4000
4000+0 records in
4000+0 records out
2097152000 bytes transferred in 8.229456 secs (254834825 bytes/sec)

dd if=/dev/zero of=/dev/null bs=128k count=16000
16000+0 records in
16000+0 records out
2097152000 bytes transferred in 1.204001 secs (1741819224 bytes/sec)

Now THAT is a significant difference.  :-)
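
The block-size effect can be reproduced without dd.  What follows is a
minimal sketch, assuming a POSIX system that provides /dev/zero and
/dev/null.  The helper name copy_zero_to_null() is hypothetical and this
is not how dd itself is implemented; it only mimics the read/write loop
so that different block sizes can be timed against each other.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper: read `count` blocks of `bs` bytes from /dev/zero
 * and write each one to /dev/null, roughly the loop behind
 * "dd if=/dev/zero of=/dev/null bs=... count=...".
 * Returns the number of bytes moved (short if a read or write fails),
 * or -1 if setup fails.  Elapsed wall time in seconds goes to *secs.
 */
long long
copy_zero_to_null(size_t bs, long count, double *secs)
{
    struct timeval t1, t2;
    long long total = 0;
    char *buf;
    int in, out;
    long i;

    assert(bs > 0 && count >= 0 && secs != NULL);
    if ((buf = malloc(bs)) == NULL)
        return -1;
    in = open("/dev/zero", O_RDONLY);
    out = open("/dev/null", O_WRONLY);
    if (in < 0 || out < 0) {
        if (in >= 0)
            close(in);
        if (out >= 0)
            close(out);
        free(buf);
        return -1;
    }

    gettimeofday(&t1, NULL);
    for (i = 0; i < count; ++i) {
        ssize_t n = read(in, buf, bs);

        if (n <= 0 || write(out, buf, (size_t)n) != n)
            break;
        total += n;
    }
    gettimeofday(&t2, NULL);

    *secs = (t2.tv_sec - t1.tv_sec) + (t2.tv_usec - t1.tv_usec) / 1e6;
    close(in);
    close(out);
    free(buf);
    return total;
}
```

Calling it once with bs = 10 * 1024 * 1024 and once with bs = 128 * 1024
(adjusting count so the total stays the same) and dividing the returned
byte count by *secs should show the same kind of gap as the dd runs
above, provided the smaller block fits in L2.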


> (sony 1.3G / RIMM)
> apollo:/home/dillon dd if=/dev/zero of=/dev/null bs=10m count=1000
> 1000+0 records in
> 1000+0 records out
> 10485760000 bytes transferred in 11.867550 secs (883565697 bytes/sec)

Very impressive.  Looks about right given the theoretical maximum
bandwidth of your memory system.

From what I've been gathering, that dd-test shows roughly 1/4 of the
total theoretical memory bandwidth (two reads and two writes
happening?), minus any chipset deficiencies when it comes to memory
transfers.

For example, this PIII-850 on a BX board is actually an overclocked
700MHz PIII with a 100MHz FSB.  So, it is using a 116MHz FSB and the
SDRAM is running at that speed as well.  The theoretical maximum
bandwidth of this should be 8(bytes-per-clock)*116(MHz)=928MB/sec.  I'm
seeing about 210MB/sec on the dd-test, so that's pretty close to 1/4.

In your case, you are using PC800 RDRAM on a Pentium-4 system on an
i850 board, which IIRC uses a dual-channel RDRAM setup, theoretically
doubling the bandwidth that you could get with only one channel.
Since you're using dual RDRAM channels of PC800 RDRAM, that would be
4(bytes-per-clock)*800(MHz)=3200MB/sec.  Again, the dd-test is pretty
close to 1/4 the total theoretical bandwidth (actually a little over),
since you achieved about 883MB/sec.

The HE-SL system I have here uses dual-interleaved PC133 SDRAM.
That would be 16(bytes-per-clock)*133(MHz)=2128MB/sec.  The dd-test is
a little off the mark of 1/4 the theoretical maximum bandwidth at
442MB/sec, but this thing is the only system I've seen the dd-test on
so far that hasn't used an Intel chipset.  The Intel chipsets have
shown in other benchmarks to be more efficient than anybody else's
chipsets (VIA, AMD, RCC, etc) at doing the memory thing, so that makes
sense.  I think I'm on to something here with this 'cheap' dd-test.
:-)

A PC2100 DDR-SDRAM system will have a theoretical bandwidth of
2100MB/sec (duh), so a dd-test should come out to around 525MB/sec.
I'm guessing it will actually be a bit less since the AMD and
especially VIA chipsets aren't going to be anywhere near as efficient
as the Intels have been.  I'm guessing 95% efficiency for the AMD, so
about 500MB/sec from the dd-test.  85% efficient for the VIA, so
around 450MB/sec.  Anyone care to test that on one or both of the new
AMD-chipset and VIA-chipset DDR systems and see how close it comes?
:-)
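
The arithmetic behind these estimates is small enough to write down.
The sketch below only restates the rule of thumb from this message
(peak = bytes-per-clock * MHz, dd-test result about 1/4 of peak); the
function names are made up for illustration and the efficiency factor
is the same guess used above, not a measured value.

```c
#include <assert.h>

/* Theoretical peak in MB/sec: bytes moved per clock times clock in MHz. */
double
peak_mb_per_sec(double bytes_per_clock, double mhz)
{
    assert(bytes_per_clock > 0 && mhz > 0);
    return bytes_per_clock * mhz;
}

/*
 * Expected dd-test figure under the 1/4-of-peak rule of thumb, scaled
 * by an assumed chipset efficiency (1.0 for the Intel baseline).
 */
double
dd_estimate(double peak, double efficiency)
{
    assert(peak > 0 && efficiency > 0);
    return peak / 4.0 * efficiency;
}

/* Fraction of the theoretical peak that a measured figure represents. */
double
fraction_of_peak(double measured, double peak)
{
    assert(peak > 0);
    return measured / peak;
}
```

With the numbers quoted here, peak_mb_per_sec(8, 116) is 928,
peak_mb_per_sec(4, 800) is 3200 and peak_mb_per_sec(16, 133) is 2128.
fraction_of_peak() then gives roughly 0.23 for the BX board (210/928),
0.28 for the i850 (883/3200) and 0.21 for the HE-SL (442/2128), and
dd_estimate(2100, 0.95) and dd_estimate(2100, 0.85) land just under 500
and just under 450 for the DDR guesses.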

> The only thing I don't like about this baby is the IBM IDE hard drive's
> write performance.  I only get 10-12 MBytes/sec.  Read performance is
> incredible, though... I get 37MB/sec dd'ing from /dev/ad0s1a to
> /dev/null.
>
> ad0: 58644MB IBM-DTLA-307060 [119150/16/63] at ata0-master UDMA100

That is a 75GXP series, which is supposed to be one of the fastest, if
not the fastest, IDE drive on the market right now.  Hmm.


-- Chris Dillon - [EMAIL PROTECTED] - [EMAIL PROTECTED]
   FreeBSD: The fastest and most stable server OS on the planet.
   For IA32 and Alpha architectures. IA64, PPC, and ARM under development.
   http://www.freebsd.org




Re: Machines are getting too damn fast

2001-03-05 Thread Matt Dillon

:throughput.  For example, on the PIII-850 (116MHz FSB and SDRAM, it's
:overclocked) here on my desk with 256KB L2 cache:
:
:dd if=/dev/zero of=/dev/null bs=512k count=4000
:4000+0 records in
:4000+0 records out
:2097152000 bytes transferred in 8.229456 secs (254834825 bytes/sec)
:
:dd if=/dev/zero of=/dev/null bs=128k count=16000
:16000+0 records in
:16000+0 records out
:2097152000 bytes transferred in 1.204001 secs (1741819224 bytes/sec)
:
:Now THAT is a significant difference.  :-)

Interesting.  I get very different results with the 1.3 GHz P4.  The
best I seem to get is 1.4 GBytes/sec.  I'm not sure what the L2 cache
is on the box, but it's definitely a consumer model.

dd if=/dev/zero of=/dev/null bs=512k count=4000
2097152000 bytes transferred in 2.363903 secs (887156520 bytes/sec)

dd if=/dev/zero of=/dev/null bs=128k count=16000
2097152000 bytes transferred in 1.471046 secs (1425619621 bytes/sec)

If I use lower block sizes the syscall overhead blows up the
performance (it gets lower rather than higher).  So I figure I don't
have as much L2 as on your system.

-Matt




Re: Machines are getting too damn fast

2001-03-05 Thread Matt Dillon

:IIRC, Intel is using a very different caching method on the P4 from
:what we are used to on just about every other x86 processor we've
:seen.  Well, I can't remember if the data cache has changed much, but
:the instruction cache has.  I doubt the difference in instruction
:cache behaviour would make a difference here though.  Hmm.
:
:I wonder if it makes any difference that I'm using -march=pentium
:-mcpu=pentium for my CFLAGS?  Actually, the kernel I tested on might
:even be using -march/-mcpu=pentiumpro, since I only recently changed
:it to =pentium to allow me to do buildworlds for another Pentium-class
:machine.  I did wonder the same thing a while back and did the same
:test with and without the optimizations, and with pentiumpro opts the
:big block size transfer rate went _down_ a little bit, which was odd.
:I didn't compare with L2-cache-friendly blocks, though.
:
:-- Chris Dillon - [EMAIL PROTECTED] - [EMAIL PROTECTED]

I modified my original C program again, this time to simply read
the data from memory given a block size in kilobytes as an argument.  
I had to throw in a little __asm to do it right, but here are my results.
It shows about 3.2 GBytes/sec from the L2 (well, insofar as my
3-instruction loop goes), and about 1.4 GBytes/sec from main memory.


NOTE:  cc x.c -O2 -o x

./x 4
3124.96 MBytes/sec (read)

./x 8
3242.45 MBytes/sec (read)

./x 16
3060.93 MBytes/sec (read)

./x 32
3359.97 MBytes/sec (read)

./x 64
3362.06 MBytes/sec (read)

./x 128
3365.53 MBytes/sec (read)

./x 240
3307.86 MBytes/sec (read)

./x 256
3232.33 MBytes/sec (read)

./x 512
1396.45 MBytes/sec (read)

./x 1024
1397.90 MBytes/sec (read)

In contrast I get 1052.50 MBytes/sec on the Dell 2400 from the L2,
and 444 MBytes/sec from main memory.

-Matt

/*
 * NOTE:  cc x.c -O2 -o x
 */

#include <sys/types.h>
#include <sys/time.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
#include <string.h>
#include <unistd.h>

int deltausecs(struct timeval *tv1, struct timeval *tv2);

int
main(int ac, char **av)
{
    int i;
    int bytes;
    double dtime;
    struct timeval tv1;
    struct timeval tv2;
    char *buf;

    if (ac == 1) {
        fprintf(stderr, "%s numKB\n", av[0]);
        exit(1);
    }
    bytes = strtol(av[1], NULL, 0) * 1024;
    if (bytes < 4 * 1024 || bytes > 256 * 1024 * 1024) {
        fprintf(stderr, "Oh please.  Try a reasonable value\n");
        exit(1);
    }
    buf = malloc(bytes);
    if (buf == NULL) {
        perror("malloc");
        exit(1);
    }
    bzero(buf, bytes);

    /* Sweep the buffer repeatedly until about 1e9 bytes have been read. */
    gettimeofday(&tv1, NULL);
    for (i = 0; i < 1000000000; i += bytes) {
        register int j;

        /* One 32-bit load per word, top down (i386 GCC inline asm). */
        for (j = bytes - 4; j >= 0; j -= 4)
            __asm __volatile("movl (%0,%1),%%eax" : 
                "=r" (buf), "=r" (j) :
                "0" (buf), "1" (j) : "ax" );
    }
    gettimeofday(&tv2, NULL);

    /* Bytes per microsecond is the same as MBytes/sec. */
    dtime = (double)deltausecs(&tv1, &tv2);
    printf("%6.2f MBytes/sec (read)\n", (double)1000000000 / dtime);
    return(0);
}

int
deltausecs(struct timeval *tv1, struct timeval *tv2)
{
    int usec;

    /* Borrow one second so the microsecond term cannot go negative. */
    usec = (tv2->tv_usec + 1000000 - tv1->tv_usec);
    usec += (tv2->tv_sec - tv1->tv_sec - 1) * 1000000;
    return(usec);
}
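
The inline assembly in the program above is tied to 32-bit x86 and GCC.
For other platforms, a rough portable stand-in is sketched below.  It is
an assumption-laden illustration, not the code that produced the numbers
in this message: read_sweep() is a hypothetical name, it relies on the
compiler honoring volatile loads, and it adds one add per load, so its
figures will not be directly comparable to the 3-instruction loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical portable replacement for the movl loop: load every
 * 32-bit word of buf, top down, through a volatile pointer so the
 * compiler cannot discard the reads.  The sum of the words is returned
 * so a caller can confirm that the whole buffer was visited.
 * buf must be suitably aligned for uint32_t (malloc'd memory is).
 */
uint32_t
read_sweep(const void *buf, size_t bytes)
{
    const volatile uint32_t *p = buf;
    size_t n = bytes / sizeof(uint32_t);
    uint32_t sum = 0;

    assert(buf != NULL);
    while (n-- > 0)
        sum += p[n];
    return sum;
}
```

Dropping this into the timing loop in place of the __asm statement keeps
the rest of the program unchanged.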





Machines are getting too damn fast

2001-03-04 Thread Matt Dillon

I was browsing CompUSA today and noticed they were selling Sony
VAIO 1.3 and 1.5 GHz desktops, among other things.  It's amazing
how fast processors have gotten just in the last two years!  I just
had to pick up one of these babies and give it a run through to see
how fast the RamBus memory is.

I'm suitably impressed, at least when comparing it against other Intel
cpu's.  Intel is finally getting some decent memory bandwidth.  I've
included some memory copying tests below.  The actual memory bandwidth
is 2x what the test reports since it's a copy test.

Sony 1.3 GHz Pentium 4 VAIO w/ 128MB RamBus memory (two 64MB RIMMs)
    571.20 MBytes/sec (copy)

650 MHz Celeron (HP desktop, DIMM)
    114.65 MBytes/sec (copy)

750 MHz P-III (2U VALINUX box, 2-cpu, 1024M ECC-DIMM)
    162.20 MBytes/sec (copy)

700 MHz Celeron(?) (1U VALINUX box, 1-cpu, 128MB DIMM)
    93.56 MBytes/sec (copy)  yuch

550 MHz P-III (4U Dell 2400, 1-cpu, 256MB DIMM)
    225.92 MBytes/sec (copy)

600 MHz P-III (2U Dell 2450, 2-cpus, 512MB DIMM)
    228.91 MBytes/sec (copy)

I was somewhat disappointed with the VALINUX boxes; I expected them to
be on par with the DELLs.  In any case, the Sony VAIO workstation with
the RamBus memory blew the field away.  The cpu is so fast that a
buildworld I did was essentially I/O bound.  I'll have to go and buy some
more RamBus memory for the thing (it only came with 128MB), which is
kind of annoying seeing as I have a gigabyte worth of DIMMs just sitting
on my desk :-( that I can't use.

I'm trying to imagine 1.3 GHz.  That's over a billion instructions
a second.  And in a few years with the new chip fab lithography
standards it's going to be 10 GHz.

We need to find something more interesting than buildworlds to do on
these machines.

-Matt


/*
 * Attempt to test memory copy speeds.  Use a buffer large enough to
 * defeat the on-cpu L1 and L2 caches.
 */

#include <sys/types.h>
#include <sys/time.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
#include <string.h>
#include <unistd.h>

#define NLOOP   100

char Buf1[2 * 1024 * 1024];
char Buf2[2 * 1024 * 1024];

int deltausecs(struct timeval *tv1, struct timeval *tv2);

int
main(int ac, char **av)
{
    int i;
    double dtime;
    struct timeval tv1;
    struct timeval tv2;

    /* Touch both buffers and warm up before timing. */
    memset(Buf1, 1, sizeof(Buf1));
    for (i = 0; i < 10; ++i)
        bcopy(Buf1, Buf2, sizeof(Buf1));

    gettimeofday(&tv1, NULL);
    for (i = 0; i < NLOOP; ++i)
        bcopy(Buf1, Buf2, sizeof(Buf1));
    gettimeofday(&tv2, NULL);

    /* Bytes per microsecond is the same as MBytes/sec. */
    dtime = (double)deltausecs(&tv1, &tv2);
    printf("%6.2f MBytes/sec (copy)\n", (double)sizeof(Buf1) * NLOOP / dtime);
    return(0);
}

int
deltausecs(struct timeval *tv1, struct timeval *tv2)
{
    int usec;

    /* Borrow one second so the microsecond term cannot go negative. */
    usec = (tv2->tv_usec + 1000000 - tv1->tv_usec);
    usec += (tv2->tv_sec - tv1->tv_sec - 1) * 1000000;
    return(usec);
}





Re: Machines are getting too damn fast

2001-03-04 Thread Tyler K McGeorge


- Original Message - 
From: Matt Dillon [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Sunday, March 04, 2001 3:34 AM
Subject: Machines are getting too damn fast


| We need to find something more interesting than buildworlds to do on
| these machines.

Let's just complicate the code more.





RE: Machines are getting too damn fast

2001-03-04 Thread Dominic Marks

> Let's just complicate the code more.

Hey, it works for Microsoft after all! Perhaps if buildworld takes 3 days
even on an eight-CPU AlphaServer, FreeBSD will rocket to almost complete
domination of the OS market?

Then again - maybe not :)

Dominic




Re: Machines are getting too damn fast

2001-03-04 Thread Mark Huizer

On Sun, Mar 04, 2001 at 04:21:02AM -0600, Tyler K McGeorge wrote:
 
> - Original Message -
> From: Matt Dillon [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Sent: Sunday, March 04, 2001 3:34 AM
> Subject: Machines are getting too damn fast
>
> | We need to find something more interesting than buildworlds to do on
> | these machines.
>
> Let's just complicate the code more.
 
We could vendor-import win2000?

Mark
-- 
Nice testing in little China...




Re: Machines are getting too damn fast

2001-03-04 Thread David A. Gobeille

Tyler K McGeorge wrote:
 
> - Original Message -
> From: Matt Dillon [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Sent: Sunday, March 04, 2001 3:34 AM
> Subject: Machines are getting too damn fast
>
> | We need to find something more interesting than buildworlds to do on
> | these machines.
>
> Let's just complicate the code more.
 

How about bw/s -- buildworlds per second ;-)

It would also be interesting to see the numbers for an Athlon/PIII
system with DDR, if anyone has such a machine.

-- 

Dave




Re: Machines are getting too damn fast

2001-03-04 Thread Coleman Kane

You should see what speed RamBus they were using, 600 or 800 MHz. It is
pretty fast for large memory writes and reads. It'd be cool to see how
the different speeds stack up against one another. DDR comparisons would
be cool too. Yeah, for the frequency, you have to take into account that
these are different chips than your PIII or Athlons and the performance
difference is not simply a linear relation to the frequency rating
(i.e.: 1.3GHz is not really over one billion instructions per second,
just clocks per second).

We installed Linux at a UC Free OS User Group installfest here in
Cincinnati; it was pretty sweet. The machine was a Dell and the case was
freakin' huge. It also came with a 21" monitor and stuff. The
performance was really good, but not really any better than what I had
gleaned from the newer 1GHz Athlons or PIII's.



Re: Machines are getting too damn fast

2001-03-04 Thread E.B. Dreger

> Date: Sun, 04 Mar 2001 19:39:09 -0600
> From: David A. Gobeille [EMAIL PROTECTED]
>
> It would also be interesting to see the numbers for an Athlon/PIII
> system with DDR, if anyone has such a machine.

Personally, I'd be [more] interested in a ServerWorks III HE core chipset
with four-way interleaved SDRAM. :-)

If one _truly_ needs the bandwidth of Rambus (which, IIRC, is higher
real-world latency than SDRAM), then how about having the bus bandwidth to
back it up?


Eddy

---

Brotsman & Dreger, Inc.
EverQuick Internet / EternalCommerce Division

E-Mail: [EMAIL PROTECTED]
Phone: (316) 794-8922

---





Re: Machines are getting too damn fast

2001-03-04 Thread Matt Dillon


:You should see what speed RamBus they were using, 600 or 800 MHz. It is
:pretty fast for large memory writes and reads. It'd be cool to see how
:the different speeds stack up against one another. DDR comparisons would
:be cool too. Yeah, for the frequency, you have to take into account that
:these are different chips than your PIII or Athlons and the performance
:difference is not simply a linear relation to the frequency rating
:(i.e.: 1.3GHz is not really over one billion instructions per second,
:just clocks per second). We installed Linux at a UC Free OS User Group
:installfest here in Cincinnati, it was pretty sweet. The machine was a
:Dell and the case was freakin' huge. It also came with a 21" monitor and
:stuff. The performance was really good, but not really any better than I
:had gleaned from the newer 1GHz Athlons or PIII's.

It says 800 MHz (PC-800 RIMMs) on the side of the box.

The technical reviews basically say that bulk transfer rates for
RamBus blow DDR away, but DDR wins for random reads and writes
due to RamBus's higher startup latency.  I don't have any DDR
systems to test but I can devise a test program.

Celeron 650 MHz (HP desktop) (DIMM)
16.16 MBytes/sec (copy)

Pentium III 550 MHz (Dell 2400) (DIMM)
25.90 MBytes/sec (copy)

Pentium 4 1.3 GHz / PC-800 RIMMs (Sony VAIO)
32.38 MBytes/sec (copy)


-Matt

Compile with -O2.  Changing the two occurrences of '512' to '4' will
reproduce the original bulk-transfer rates.  By default this program
tests single-transfer (always cache miss).

#include <sys/types.h>
#include <sys/time.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
#include <string.h>
#include <unistd.h>

#define NLOOP   100

char Buf1[2 * 1024 * 1024];
char Buf2[2 * 1024 * 1024];

int deltausecs(struct timeval *tv1, struct timeval *tv2);

int
main(int ac, char **av)
{
    int i;
    double dtime;
    struct timeval tv1;
    struct timeval tv2;

    /* Touch both buffers and warm up before timing. */
    memset(Buf1, 1, sizeof(Buf1));
    for (i = 0; i < 10; ++i)
        bcopy(Buf1, Buf2, sizeof(Buf1));

    gettimeofday(&tv1, NULL);
    for (i = 0; i < NLOOP; ++i) {
        int j;
        int k;

        /*
         * Copy one int at a time, stepping 512 bytes between
         * accesses; each pass of k shifts the start by one int.
         */
        for (k = sizeof(int); k <= 512; k += sizeof(int)) {
            for (j = sizeof(Buf1) - k; j >= 0; j -= 512)
                *(int *)(Buf2 + j) = *(int *)(Buf1 + j);
        }
    }
    gettimeofday(&tv2, NULL);

    /* Bytes per microsecond is the same as MBytes/sec. */
    dtime = (double)deltausecs(&tv1, &tv2);
    printf("%6.2f MBytes/sec (copy)\n", (double)sizeof(Buf1) * NLOOP / dtime);
    return(0);
}

int
deltausecs(struct timeval *tv1, struct timeval *tv2)
{
    int usec;

    /* Borrow one second so the microsecond term cannot go negative. */
    usec = (tv2->tv_usec + 1000000 - tv1->tv_usec);
    usec += (tv2->tv_sec - tv1->tv_sec - 1) * 1000000;
    return(usec);
}
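
It may not be obvious that the strided loop still copies every int in
the buffer exactly once per outer iteration, which is what makes its
MBytes/sec figure comparable to the bulk test.  The sketch below replays
just the index pattern on a small buffer and counts visits.  The name
count_visits() and its parameters are hypothetical; `stride` stands in
for the constant 512 in the program above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Replay the index pattern of the strided copy loop and count how
 * often each int-sized slot is touched.  `visits` must have room for
 * bytes / sizeof(int) counters.  Returns the total number of int
 * copies the pattern would perform.
 */
size_t
count_visits(size_t bytes, int stride, unsigned char *visits)
{
    size_t total = 0;
    long j;
    int k;

    assert(visits != NULL);
    assert(stride >= (int)sizeof(int) && stride % (int)sizeof(int) == 0);
    assert(bytes % (size_t)stride == 0);

    memset(visits, 0, bytes / sizeof(int));
    for (k = (int)sizeof(int); k <= stride; k += (int)sizeof(int)) {
        for (j = (long)bytes - k; j >= 0; j -= stride) {
            visits[(size_t)j / sizeof(int)]++;
            total++;
        }
    }
    return total;
}
```

With stride 512 the slots are covered in 128 passes whose accesses are
512 bytes apart; with stride 4 there is a single sequential pass.  In
both cases the total is bytes / sizeof(int) and every counter ends up
at 1.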

