Re: Machines are getting too damn fast
Aloha!

In an earlier mail to the thread I pointed to the STREAM benchmark for
memory subsystems. Additionally, I wrote that I knew there was another
benchmark that tries to analyze word sizes and access latencies for the
different memories in the memory subsystem. I can now name that
benchmark (or at least one such benchmark): MOB. Check out:

http://steamboat.cs.ucsb.edu/mob/

The benchmark is currently not in the ports, but it has been tested (see
the MOB home page) on FreeBSD. I have downloaded it, compiled it, and am
measuring my own system while writing this. [1]

Additionally, the following page is a pretty comprehensive list of links
to tools and documentation relating to performance measurement, analysis
and optimization:

http://www.cs.virginia.edu/~clc5q/perflinks.html

[1] MOB reported the following for my dual Celeron 533 system:

Allocated 64MB of memory for benchmarking

Performing benchmark: Cache Size/Levels
  Data Caches:
    Found L1: [16384]
    Found L2: [131072]
    Found L3: [8388608]
  Instruction Caches:
    Found L1: [16384]
    Found L2: [131072]

Performing benchmark: Cache Share
  Level 1 cache is not shared
  Level 2 cache is shared

Performing benchmark: Cache Line Size
  Data Caches:
    Elected line size: 32
    Elected line size: 28
    Elected line size: 108
  Instruction Caches:
    Elected line size: 100

Performing benchmark: Cache Associativity
  Data Caches:
    Detected L1 is 8-way associative.
    Detected L2 is 4-way associative.
    Detected L3 is 4-way associative.
  Instruction Caches:
    Detected L1 is 4-way associative.
Performing benchmark: Cache Replacement Policy
  Data Caches:
    Detected L1 is random replacement policy
    Detected L2 is random replacement policy
    Detected L3 is random replacement policy
  Instruction Caches:
    Detected L1 has LRU replacement policy

Performing benchmark: Cache Write Policy
  Data Caches:
    Found L1 replacement policy is write-back/allocate
    Found L2 replacement policy is write-back/allocate
    Found L3 replacement policy is write-through/no-allocate

Performing benchmark: Cache Indexing (Virtual/Physical)

Performing benchmark: TLB Page Size
  Data TLBs:
  Instruction TLBs:

Performing benchmark: TLB Entry Count
  Data TLBs:
    Num entries: 10
  Instruction TLBs:
    Number of entries not detected

Performing benchmark: TLB Associativity
  Data TLBs:
    Found associativity 32
  Instruction TLBs:
    Found associativity 32

# MOB Config file
# Date Fri Mar 16 17:50:35 2001
# Host: fetis.ninja.se
# Run params: trials=3 runTime=100 verbosity=2

[CACHE]
level = 1
type = Data
size = 16384
lineSize = 32
associativity = 8
replacement = random
writeMode = writeBack-alloc
latency =

[CACHE]
level = 2
type = Shared
size = 131072
lineSize = 28
associativity = 4
replacement = random
writeMode = writeBack-alloc
latency = 1.0869

[CACHE]
level = 3
type = Data
size = 8388608
lineSize = 108
associativity = 4
replacement = random
writeMode = writeThrough-noalloc
latency = 235.7955

[CACHE]
level = 1
type = Instruction
size = 16384
lineSize = 100
associativity = 4
replacement = lru
writeMode =
latency = 1.0397

[TLB]
type = Data
pageSize = 524288
numEntries = 10
associativity = 32
latency =

[TLB]
type = Instruction
pageSize = 1024
numEntries =
associativity = 32
latency =

--
Cheers!
Joachim - Always in harmonic oscillation
--- FairLight -- FairLight -- FairLight -- FairLight ---
Joachim Strömbergson  ASIC SoC designer, nice to CUTE animals
Phone: +46(0)31 - 27 98 47  Web: http://www.ludd.luth.se/~watchman
--- Spamfodder: [EMAIL PROTECTED] ---

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message
Re: Machines are getting too damn fast
In the last episode (Mar 16), Joachim Strömbergson said:
> In an earlier mail to the thread I pointed to the STREAM benchmark for
> memory subsystems. Additionally, I wrote that I knew there was another
> benchmark that tries to analyze word sizes and access latencies for
> the different memories in the memory subsystem. I can now name that
> benchmark (or at least one such benchmark): MOB. Check out:
> http://steamboat.cs.ucsb.edu/mob/
> The benchmark is currently not in the ports, but it has been tested
> (see the MOB home page) on FreeBSD. I have downloaded it, compiled it,
> and am measuring my own system while writing this. [1]

It's sort of misfiled:

$ cat /usr/ports/devel/mob/pkg-descr
This is a port of mob, that tries to figure out memory system
characteristics at run-time.
$

--
Dan Nelson
[EMAIL PROTECTED]
Re: RE: RE: Machines are getting too damn fast
:Noted.
:
:Is there a gcc PR associated with this?
:
:http://gcc.gnu.org/cgi-bin/gnatsweb.pl
:
:A GNATS search for "freebsd kernel" didn't return anything.
:
:-Charles

    No idea.  Somewhere around 4.1 my -O2 and -Os kernel compiles just
    stopped working.  There was a bunch of stuff on the list about it.

                                        -Matt
Re: Machines are getting too damn fast
Matt Dillon [EMAIL PROTECTED] wrote:
Subject: Re: Machines are getting too damn fast

> :throughput.  For example, on the PIII-850 (116MHz FSB and SDRAM, it's
> :overclocked) here on my desk with 256KB L2 cache:
> :
> :dd if=/dev/zero of=/dev/null bs=512k count=4000
> :4000+0 records in
> :4000+0 records out
> :2097152000 bytes transferred in 8.229456 secs (254834825 bytes/sec)
> :
> :dd if=/dev/zero of=/dev/null bs=128k count=16000
> :16000+0 records in
> :16000+0 records out
> :2097152000 bytes transferred in 1.204001 secs (1741819224 bytes/sec)
> :
> :Now THAT is a significant difference. :-)
>
>     Interesting.  I get very different results with the 1.3 GHz P4.
>     The best I seem to get is 1.4 GBytes/sec.  I'm not sure what the
>     L2 cache is on the box, but it's definitely a consumer model.
>
>     dd if=/dev/zero of=/dev/null bs=512k count=4000
>     2097152000 bytes transferred in 2.363903 secs (887156520 bytes/sec)
>
>     dd if=/dev/zero of=/dev/null bs=128k count=16000
>     2097152000 bytes transferred in 1.471046 secs (1425619621 bytes/sec)
>
>     If I use lower block sizes the syscall overhead blows up the
>     performance (it gets lower rather than higher).  So I figure I
>     don't have as much L2 as on your system.

The P4 has other issues when you don't do straight-line code.  Any
branch misprediction costs a minimum of 20 cycles due to the pipeline,
plus whatever cache/fetch/decode hits you may get on the actual target.
This may be why you get lower values than a PIII or Athlon.  (Both have
a significantly lower penalty for branch misprediction.)

--
Michael Sinz ---- Worldgate Communications ---- [EMAIL PROTECTED]
A master's secrets are only as good as
the master's ability to explain them to others.
Re: Machines are getting too damn fast
Matt Dillon writes:
>     I modified my original C program again, this time to simply read
>     the data from memory given a block size in kilobytes as an
>     argument.  I had to throw in a little __asm to do it right, but
>     here are my results.  It shows about 3.2 GBytes/sec from the L2
>     (well, insofar as my 3-instruction loop goes), and about 1.4
>     GBytes/sec from main memory.
>
>     NOTE: cc x.c -O2 -o x
>
>     ./x 4
>     3124.96 MBytes/sec (read)
>     ...
>     ./x 1024
>     1397.90 MBytes/sec (read)
>
>     In contrast I get 1052.50 MBytes/sec on the Dell 2400 from the L2,
>     and 444 MBytes/sec from main memory.

FWIW: 1.2GHz Athlon, VIA Apollo KT133 chipset, Asus A7V motherboard
(PC133 ECC Registered DIMMs):

./x 4
2393.70 MBytes/sec (read)
./x 8
2398.19 MBytes/sec (read)
...
./x 1024
627.32 MBytes/sec (read)

And a dual 933MHz PIII SuperMicro 370DER, Serverworks HE-SL chipset
(2-way interleaved PC133 ECC Registered DIMMs):

./x 4
1853.54 MBytes/sec (read)
./x 1024
526.19 MBytes/sec (read)

There's something diabolic about your previous bw test, though.  I
think it only hits one bank of interleaved RAM.  On the 370DER it gets
only 167MB/sec.  Every other bw test I've run on the box shows copy
perf at around 260MB/sec (Hbench, lmbench).  I see the same problem on
a PE4400 (also 2-way interleaved); it shows copy perf as 111MB/sec.
Every other test has it at 230MB/sec.  The Athlon copies at 174MB/sec,
which is right about what lmbench, hbench, etc., and your test show.

How's your P4 for floating point?  Is real-life perf as good as the
specbench numbers would indicate, or do you need a better compiler than
GCC to get any benefit from it?  My wife is a statistician, and she
runs some really FP-intensive workloads.  This Athlon is faster than
the Serverworks box and (barely) faster than a year-old Alpha UP1000
for her code.
Drew

--
Andrew Gallatin, Sr Systems Programmer  http://www.cs.duke.edu/~gallatin
Duke University                         Email: [EMAIL PROTECTED]
Department of Computer Science          Phone: (919) 660-6590
Re: Machines are getting too damn fast
In message [EMAIL PROTECTED], Andrew Gallatin writes:
> FWIW: 1.2GHz Athlon, VIA Apollo KT133 chipset, Asus A7V motherboard,
> (PC133 ECC Registered DIMMs)

Note that the KT does *NOT* support ECC.  A few places have claimed it
does, but the VIA chipset spec says it doesn't.  The KV or KX does (I
forget the model #), but the KT is secretly doing no error correcting
at all.  I got burned on this with an ABit VP6, which proclaims loudly
that it supports ECC, but doesn't actually *do* any.

-s
Re: Machines are getting too damn fast
:How's your P4 for floating point?  Is real-life perf as good as the
:specbench numbers would indicate, or do you need a better compiler
:than GCC to get any benefit from it?  My wife is a statistician, and
:she runs some really FP-intensive workloads.  This Athlon is faster
:than the Serverworks box and (barely) faster than a year-old Alpha
:UP1000 for her code.
:
:Drew
:
:--
:Andrew Gallatin, Sr Systems Programmer  http://www.cs.duke.edu/~gallatin

    My understanding is that Intel focused on FP performance in the P4,
    and that it is very, very good at it.  I dunno how to test it
    though.  GCC generally does not produce very good code, but I would
    expect that it would get reasonably close in regards to FP because
    Intel's FP instruction set is a good fit with it.

                                        -Matt
RE: Machines are getting too damn fast
From: Matt Dillon [mailto:[EMAIL PROTECTED]]
> My understanding is that Intel focused on FP performance in the P4,
> and that it is very, very good at it.  I dunno how to test it though.
> GCC generally does not produce very good code, but I would expect
> that it would get reasonably close in regards to FP because Intel's
> FP instruction set is a good fit with it.

Which raises the question I've tried to ask a number of times in
different forums: who's working on P4 optimizations and code
generation?  Sure, i386 code will run, but the benchmarks seem to
indicate that peak performance is heavily dependent on a good
optimizing compiler.

A query to the gcc mailing list returned no responses.

Charles
Re: Machines are getting too damn fast
On Tue, 6 Mar 2001, Matt Dillon wrote:
>     My understanding is that Intel focused on FP performance in the
>     P4, and that it is very, very good at it.  I dunno how to test it
>     though.

From the benchmarks Tom's Hardware and others did, I got the impression
that SSE2 performance is awesome, but x87 FPU operations aren't really
improved, so the Athlon still has the advantage there.

>     GCC generally does not produce very good code, but I would expect
>     that it would get reasonably close in regards to FP because
>     Intel's FP instruction set is a good fit with it.
>
>                                         -Matt

I'm quite confused about Intel's strategy wrt that compiler.  Every
time someone does a benchmark showing Intel's newest processor getting
beat at something, they send code compiled with it to the benchmarker.
However, they haven't even attempted to make it a popular compiler.
Everything I've seen/heard indicates that MSVC and gcc are all that get
really used on x86.

My only guess is that part of the company wants everyone to use it to
get optimal performance out of Intel processors, while the other half
wants people to be forced to buy faster processors.  This would explain
why it's still sold, but in such a way that nobody will really buy it.

(The reason I mention this is because someone was talking about trying
to compile the kernel with Sun's CC.  Maybe rigging Intel's compiler to
do so would be fruitful.)

Mike "Silby" Silbersack
Re: Machines are getting too damn fast
On Tue, Mar 06, 2001 at 10:56:46 -0500, Andrew Gallatin wrote:
> Matt Dillon writes:
> >     I modified my original C program again, this time to simply
> >     read the data from memory given a block size in kilobytes as an
> >     argument.  I had to throw in a little __asm to do it right, but
> >     here are my results.  It shows about 3.2 GBytes/sec from the L2
> >     (well, insofar as my 3-instruction loop goes), and about 1.4
> >     GBytes/sec from main memory.
> >
> >     NOTE: cc x.c -O2 -o x
> >
> >     ./x 4
> >     3124.96 MBytes/sec (read)
> >     ...
> >     ./x 1024
> >     1397.90 MBytes/sec (read)
> >
> >     In contrast I get 1052.50 MBytes/sec on the Dell 2400 from the
> >     L2, and 444 MBytes/sec from main memory.
>
> FWIW: 1.2GHz Athlon, VIA Apollo KT133 chipset, Asus A7V motherboard
> (PC133 ECC Registered DIMMs):
>
> ./x 4
> 2393.70 MBytes/sec (read)
> ./x 8
> 2398.19 MBytes/sec (read)
> ...
> ./x 1024
> 627.32 MBytes/sec (read)
>
> And a dual 933MHz PIII SuperMicro 370DER, Serverworks HE-SL chipset
> (2-way interleaved PC133 ECC Registered DIMMs):
>
> ./x 4
> 1853.54 MBytes/sec (read)
> ./x 1024
> 526.19 MBytes/sec (read)

Dell Precision 420 (i840 chipset) with a single PIII 800 and probably
one RIMM, unknown speed:

{rivendell:/usr/home/ken/src:76:0} ./memspeed 4
1049.51 MBytes/sec (read)
{rivendell:/usr/home/ken/src:77:0} ./memspeed 1024
378.41 MBytes/sec (read)

The above machine may not have been completely idle; it seems a little
slow.

Dual 1GHz PIII SuperMicro 370DE6, Serverworks HE-SL chipset, 4x256MB
PC133 ECC Registered DIMMs:

{gondolin:/usr/home/ken/src:51:0} ./memspeed 4
1985.95 MBytes/sec (read)
{gondolin:/usr/home/ken/src:52:0} ./memspeed 1024
516.62 MBytes/sec (read)

> There's something diabolic about your previous bw test, though.  I
> think it only hits one bank of interleaved RAM.  On the 370DER it
> gets only 167MB/sec.  Every other bw test I've run on the box shows
> copy perf at around 260MB/sec (Hbench, lmbench).  I see the same
> problem on a PE4400 (also 2-way interleaved); it shows copy perf as
> 111MB/sec.  Every other test has it at 230MB/sec.
The previous test showed about 270MB/sec on my Serverworks box:

{gondolin:/usr/home/ken/src:53:0} ./memory_speed
269.23 MBytes/sec (copy)

Ken
--
Kenneth Merry
[EMAIL PROTECTED]
Re: RE: Machines are getting too damn fast
:Which begs the question I've tried to ask a number of times in
:different forums. Who's working on P4 optimizations and code
:generation for the P4?

    I'd be happy if GCC -O2 just worked without introducing bugs.  I
    want to be able to compile the kernel with it again.

                                        -Matt

:Sure, i386 code will run but the benchmarks seem to indicate that peak
:performance is heavily dependent on a good optimizing compiler.
:
:A query to the gcc mailing list returned no responses.
:
:Charles
RE: Machines are getting too damn fast
This explained in great detail exactly why people are seeing the
performance they are from the P4 etc.  The author knows his stuff.

http://www.emulators.com/pentium4.htm

Brandon

:-----Original Message-----
:From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On Behalf Of
:Kenneth D. Merry
:Sent: Tuesday, March 06, 2001 1:08 PM
:To: Andrew Gallatin
:Cc: Matt Dillon; [EMAIL PROTECTED]
:Subject: Re: Machines are getting too damn fast
:
:[...]
Re: RE: Machines are getting too damn fast
:This explained in great detail exactly why people are seeing the
:performance they are from the P4 etc.  The author knows his stuff.
:
:http://www.emulators.com/pentium4.htm
:
:Brandon

    Heh heh.  You can practically see the sweat popping off his face
    while reading his article.

                                        -Matt
RE: RE: Machines are getting too damn fast
Noted.

Is there a gcc PR associated with this?

http://gcc.gnu.org/cgi-bin/gnatsweb.pl

A GNATS search for "freebsd kernel" didn't return anything.

-Charles

-----Original Message-----
From: Matt Dillon [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, March 06, 2001 11:44 AM
To: Charles Randall
Cc: Andrew Gallatin; [EMAIL PROTECTED]
Subject: Re: RE: Machines are getting too damn fast

>     I'd be happy if GCC -O2 just worked without introducing bugs.  I
>     want to be able to compile the kernel with it again.
>
>                                         -Matt
Re: Machines are getting too damn fast
Aloha!

(Sorry for jumping right into the thread here...)

This might be a good time to mention the STREAM benchmark developed by
John McCalpin.  It measures sustained bandwidth of systems.  The
official web page is at:

http://www.cs.virginia.edu/stream/

If you check the section under "PC-compatible Results" you should find
machines comparable to what you are measuring.

Also, there is a tool out there that will measure effective latency for
cache levels and main store for a system - both first-page and burst
transfers.  I need to search my ever-inflating bookmarks...

The newsgroup comp.arch might be a source of info and would probably be
interested in your results too.

Matt Dillon wrote:
> :You should see what speed RamBus they were using, 600 or 800 MHz.
> :It is pretty fast for large memory writes and reads.  It'd be cool
> :to see how the different speeds stack up against one another.  DDR
> :comparisons would be cool too.  Yeah, for the frequency, you have to
> :take into account that these are different chips than your PIII or
> :Athlons and the performance difference is not simply a linear
> :relation to the frequency rating (i.e.: 1.3GHz is not really over
> :one billion instructions per second, just clocks per second).  We
> :installed Linux at a UC Free OS User Group installfest here in
> :Cincinnati; it was pretty sweet.  The machine was a Dell and the
> :case was freakin' huge.  It also came with a 21" monitor and stuff.
> :The performance was really good, but not really any better than I
> :had gleaned from the newer 1GHz Athlons or PIIIs.
>
>     It says 800 MHz (PC-800 RIMMs) on the side of the box.  The
>     technical reviews basically say that bulk transfer rates for
>     RamBus blow DDR away, but DDR wins for random reads and writes
>     due to RamBus's higher startup latency.  I don't have any DDR
>     systems to test but I can devise a test program.
>     Celeron 650 MHz (HP desktop) (DIMM)           16.16 MBytes/sec (copy)
>     Pentium III 550 MHz (Dell 2400) (DIMM)        25.90 MBytes/sec (copy)
>     Pentium 4 1.3 GHz / PC-800 RIMMs (Sony VAIO)  32.38 MBytes/sec (copy)
>
>                                         -Matt

--
Cheers!
Joachim - Always in harmonic oscillation
--- FairLight -- FairLight -- FairLight -- FairLight ---
Joachim Strömbergson  ASIC SoC designer, nice to CUTE animals
Phone: +46(0)31 - 27 98 47  Web: http://www.ludd.luth.se/~watchman
--- Spamfodder: [EMAIL PROTECTED] ---
Re: Machines are getting too damn fast
I believe DDR still has it beat (PC2100, that is).  Rambus serializes
all writes to 32-bit or 16-bit (and even 8-bit) depending on the
modules; I haven't seen any 64-bit RIMMs.  Also, DDR scales higher (up
to 333MHz, I think).  Just wait for that.

I've pretty much given up on RamBus.  I remember back in '96 when they
talked about how great it would be; I thought it was crap back then.
The idea is that it can queue 28 mem ops, where SDRAM can only do
something like 4.  Typical workloads don't do the writes that it looks
good on, just certain special cases.  That's where you get into the
fact that RamBus sucks for use in PCs.  Too specialized.  Too
proprietary.

Matt Dillon had the audacity to say:
>     It says 800 MHz (PC-800 RIMMs) on the side of the box.  The
>     technical reviews basically say that bulk transfer rates for
>     RamBus blow DDR away, but DDR wins for random reads and writes
>     due to RamBus's higher startup latency.  I don't have any DDR
>     systems to test but I can devise a test program.
>     Celeron 650 MHz (HP desktop) (DIMM)           16.16 MBytes/sec (copy)
>     Pentium III 550 MHz (Dell 2400) (DIMM)        25.90 MBytes/sec (copy)
>     Pentium 4 1.3 GHz / PC-800 RIMMs (Sony VAIO)  32.38 MBytes/sec (copy)
>
>                                         -Matt
>
>     Compile -O2; changing the two occurrences of '512' to '4' will
>     reproduce the original bulk-transfer rates.  By default this
>     program tests single-transfer (always cache miss).

#include <sys/types.h>
#include <sys/time.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
#include <string.h>
#include <unistd.h>

#define NLOOP 100

char Buf1[2 * 1024 * 1024];
char Buf2[2 * 1024 * 1024];

int deltausecs(struct timeval *tv1, struct timeval *tv2);

int
main(int ac, char **av)
{
    int i;
    double dtime;
    struct timeval tv1;
    struct timeval tv2;

    /* warm up the buffers */
    memset(Buf1, 1, sizeof(Buf1));
    for (i = 0; i < 10; ++i)
        bcopy(Buf1, Buf2, sizeof(Buf1));
    gettimeofday(&tv1, NULL);
    for (i = 0; i < NLOOP; ++i) {
        int j;
        int k;

        for (k = sizeof(int); k <= 512; k += sizeof(int)) {
            for (j = sizeof(Buf1) - k; j >= 0; j -= 512)
                *(int *)(Buf2 + j) = *(int *)(Buf1 + j);
        }
    }
    gettimeofday(&tv2, NULL);
    dtime = (double)deltausecs(&tv1, &tv2);
    printf("%6.2f MBytes/sec (copy)\n",
        (double)sizeof(Buf1) * NLOOP / dtime);
    return(0);
}

int
deltausecs(struct timeval *tv1, struct timeval *tv2)
{
    int usec;

    usec = (tv2->tv_usec + 1000000 - tv1->tv_usec);
    usec += (tv2->tv_sec - tv1->tv_sec - 1) * 1000000;
    return(usec);
}
Re: Machines are getting too damn fast
On Mon, 5 Mar 2001, E.B. Dreger wrote:
> > Date: Sun, 04 Mar 2001 19:39:09 -0600
> > From: David A. Gobeille [EMAIL PROTECTED]
> >
> > It would also be interesting to see the numbers for an Athlon/PIII
> > system with DDR, if anyone has such a machine.
>
> Personally, I'd be [more] interested in a ServerWorks III HE core
> chipset with four-way interleaved SDRAM. :-)

I've got a ServerWorks III HE-SL system with 512MB of two-way
interleaved PC133 SDRAM and dual PIII-800's.  Is that close enough?
:-)

Here is my "memory bandwidth test", much much simpler and less
scientific than Matt's:

dd if=/dev/zero of=/dev/null bs=10m count=1000
1000+0 records in
1000+0 records out
10485760000 bytes transferred in 23.716504 secs (442129245 bytes/sec)

I just did a recent 4.2-STABLE 'make -j 4 buildworld' on that system in
just over 34 minutes.  Here's the time output:

1980.707u 768.223s 34:20.89 133.3% 1297+1456k 39517+6202io 1661pf+0w

> If one _truly_ needs the bandwidth of Rambus (which, IIRC, has
> higher real-world latency than SDRAM), then how about having the bus
> bandwidth to back it up?

The higher real-world latency of RDRAM over SDRAM is what makes the
benefits of its higher bandwidth so questionable.  PC2100 DDR-SDRAM --
which has higher latencies than regular SDRAM but still lower than
RDRAM -- should have it beat soundly, though we'll have to wait for
some systems that are actually designed to take advantage of it to say
for sure. :-)

--
Chris Dillon - [EMAIL PROTECTED] - [EMAIL PROTECTED]
FreeBSD: The fastest and most stable server OS on the planet.
For IA32 and Alpha architectures.  IA64, PPC, and ARM under development.
http://www.freebsd.org
Re: Machines are getting too damn fast
:On Mon, 5 Mar 2001, E.B. Dreger wrote:
:
:I've got a ServerWorks III HE-SL system with 512MB of two-way
:interleaved PC133 SDRAM and dual PIII-800's.  Is that close enough?
::-)
:
:Here is my "memory bandwidth test", much much simpler and less
:scientific than Matt's:
:
:dd if=/dev/zero of=/dev/null bs=10m count=1000
:1000+0 records in
:1000+0 records out
:10485760000 bytes transferred in 23.716504 secs (442129245 bytes/sec)
:
:I just did a recent 4.2-STABLE 'make -j 4 buildworld' on that system
:in just over 34 minutes.  Here's the time output:
:1980.707u 768.223s 34:20.89 133.3% 1297+1456k 39517+6202io 1661pf+0w

    That is quite impressive for SDRAM, though I'm not exactly sure
    what's being measured due to the way /dev/zero and /dev/null
    operate.  On my system the above dd test returns around 883MB/sec
    so I would guess that it is only doing a read-swipe on the memory.

    (sony 1.3G / RIMM)
    apollo:/home/dillon dd if=/dev/zero of=/dev/null bs=10m count=1000
    1000+0 records in
    1000+0 records out
    10485760000 bytes transferred in 11.867550 secs (883565697 bytes/sec)

    On the DELL 2400 I get:

    1048576000 bytes transferred in 2.737955 secs (382977810 bytes/sec)

    The only thing I don't like about this baby is the IBM IDE hard
    drive's write performance.  I only get 10-12 MBytes/sec.  Read
    performance is incredible, though... I get 37MB/sec dd'ing from
    /dev/ad0s1a to /dev/null.

    ad0: 58644MB <IBM-DTLA-307060> [119150/16/63] at ata0-master UDMA100

                                        -Matt

:> If one _truly_ needs the bandwidth of Rambus (which, IIRC, has
:> higher real-world latency than SDRAM), then how about having the
:> bus bandwidth to back it up?
:
:The higher real-world latency of RDRAM over SDRAM is what makes the
:benefits of its higher bandwidth so questionable.  PC2100 DDR-SDRAM --
:which has higher latencies than regular SDRAM but still lower than
:RDRAM -- should have it beat soundly, though we'll have to wait for
:some systems that are actually designed to take advantage of it to
:say for sure.
::-)
:
:-- Chris Dillon - [EMAIL PROTECTED] - [EMAIL PROTECTED]
Re: Machines are getting too damn fast
On Mon, 5 Mar 2001, Matt Dillon wrote:
> :On Mon, 5 Mar 2001, E.B. Dreger wrote:
> :
> :I've got a ServerWorks III HE-SL system with 512MB of two-way
> :interleaved PC133 SDRAM and dual PIII-800's.  Is that close enough?
> ::-)
> :
> :Here is my "memory bandwidth test", much much simpler and less
> :scientific than Matt's:
> :
> :dd if=/dev/zero of=/dev/null bs=10m count=1000
> :1000+0 records in
> :1000+0 records out
> :10485760000 bytes transferred in 23.716504 secs (442129245 bytes/sec)
> :
> :I just did a recent 4.2-STABLE 'make -j 4 buildworld' on that system
> :in just over 34 minutes.  Here's the time output:
> :1980.707u 768.223s 34:20.89 133.3% 1297+1456k 39517+6202io 1661pf+0w
>
>     That is quite impressive for SDRAM, though I'm not exactly sure
>     what's being measured due to the way /dev/zero and /dev/null
>     operate.  On my system the above dd test returns around 883MB/sec
>     so I would guess that it is only doing a read-swipe on the memory.

I figured if /dev/null and /dev/zero had relatively little memory
impact, which I figure is the case, it would matter more how dd does
its blocking.  If you reduce the blocksize to something that will fit
in your L2 cache, you'll see a very significant increase in throughput.
For example, on the PIII-850 (116MHz FSB and SDRAM, it's overclocked)
here on my desk with 256KB L2 cache:

dd if=/dev/zero of=/dev/null bs=10m count=200
200+0 records in
200+0 records out
2097152000 bytes transferred in 10.032157 secs (209042982 bytes/sec)

dd if=/dev/zero of=/dev/null bs=512k count=4000
4000+0 records in
4000+0 records out
2097152000 bytes transferred in 8.229456 secs (254834825 bytes/sec)

dd if=/dev/zero of=/dev/null bs=128k count=16000
16000+0 records in
16000+0 records out
2097152000 bytes transferred in 1.204001 secs (1741819224 bytes/sec)

Now THAT is a significant difference.
:-)

>     (sony 1.3G / RIMM)
>     apollo:/home/dillon dd if=/dev/zero of=/dev/null bs=10m count=1000
>     1000+0 records in
>     1000+0 records out
>     10485760000 bytes transferred in 11.867550 secs (883565697 bytes/sec)

Very impressive.  Looks about right given the theoretical maximum
bandwidth of your memory system.

From what I've been gathering, that dd-test shows roughly 1/4 of the
total theoretical memory bandwidth (two reads and two writes
happening?), minus any chipset deficiencies when it comes to memory
transfers.

For example, this PIII-850 on a BX board is actually an overclocked
700MHz 100MHz-FSB PIII, so it is using a 116MHz FSB and the SDRAM is
running at that speed as well.  The theoretical maximum bandwidth of
this should be 8 (bytes per clock) * 116 (MHz) = 928MB/sec.  I'm seeing
about 210MB/sec on the dd-test, so that's pretty close to 1/4.

In your case, you are using PC800 RDRAM on a Pentium 4 system with an
i850 board, which IIRC uses a dual-channel RDRAM setup, theoretically
doubling the bandwidth you could get with only one channel.  Since
you're using dual channels of PC800 RDRAM, that would be 4 (bytes per
clock) * 800 (MHz) = 3200MB/sec.  Again, the dd-test is pretty close to
1/4 the total theoretical bandwidth (actually a little over), since you
achieved about 883MB/sec.

An HE-SL system I have here uses dual-interleaved PC133 SDRAM.  That
would be 16 (bytes per clock) * 133 (MHz) = 2128MB/sec.  The dd-test is
a little off the mark of 1/4 the theoretical maximum bandwidth at
442MB/sec, but this thing is the only system I've seen the dd-test on
so far that hasn't used an Intel chipset.  The Intel chipsets have
shown in other benchmarks to be more efficient than anybody else's
(VIA, AMD, RCC, etc.) at doing the memory thing, so that makes sense.
I think I'm on to something here with this 'cheap' dd-test. :-)

A PC2100 DDR-SDRAM system will have a theoretical bandwidth of
2100MB/sec (duh), so a dd-test should come out to around 525MB/sec.
I'm guessing it will actually be a bit less, since the AMD and
especially the VIA chipsets aren't going to be anywhere near as
efficient as the Intel ones have been. I'm guessing 95% efficiency for
the AMD, so about 500MB/sec from the dd-test, and 85% efficiency for
the VIA, so around 450MB/sec. Anyone care to test that on one or both
of the new AMD-chipset and VIA-chipset DDR systems and see how close it
comes? :-)

:The only thing I don't like about this baby is the IBM IDE hard
:drive's write performance. I only get 10-12 MBytes/sec. Read
:performance is incredible, though... I get 37MB/sec dd'ing from
:/dev/ad0s1a to /dev/null.
:
:ad0: 58644MB IBM-DTLA-307060 [119150/16/63] at ata0-master UDMA100

That is a 75GXP series drive, which is supposed to be one of the
fastest, if not the fastest, IDE drives on the market right now. Hmm.

-- Chris Dillon - [EMAIL PROTECTED] - [EMAIL PROTECTED]
   FreeBSD: The fastest and most stable server OS on the planet.
   For IA32 and Alpha architectures. IA64, PPC, and ARM under
   development. http://www.freebsd.org

To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe
freebsd-hackers" in the body of the message
Re: Machines are getting too damn fast
:throughput. For example, on the PIII-850 (116MHz FSB and SDRAM; it's
:overclocked) here on my desk with 256KB L2 cache:
:
:dd if=/dev/zero of=/dev/null bs=512k count=4000
:4000+0 records in
:4000+0 records out
:2097152000 bytes transferred in 8.229456 secs (254834825 bytes/sec)
:
:dd if=/dev/zero of=/dev/null bs=128k count=16000
:16000+0 records in
:16000+0 records out
:2097152000 bytes transferred in 1.204001 secs (1741819224 bytes/sec)
:
:Now THAT is a significant difference. :-)

Interesting. I get very different results with the 1.3 GHz P4. The best
I seem to get is 1.4 GBytes/sec. I'm not sure what the L2 cache is on
this box, but it's definitely a consumer model.

dd if=/dev/zero of=/dev/null bs=512k count=4000
2097152000 bytes transferred in 2.363903 secs (887156520 bytes/sec)

dd if=/dev/zero of=/dev/null bs=128k count=16000
2097152000 bytes transferred in 1.471046 secs (1425619621 bytes/sec)

If I use lower block sizes, the syscall overhead blows up the
performance (it gets lower rather than higher), so I figure I don't
have as much L2 as on your system.

-Matt
Re: Machines are getting too damn fast
:IIRC, Intel is using a very different caching method on the P4 from
:what we are used to on just about every other x86 processor we've
:seen. Well, I can't remember if the data cache has changed much, but
:the instruction cache has. I doubt the difference in instruction
:cache behaviour would make a difference here, though. Hmm.
:
:I wonder if it makes any difference that I'm using -march=pentium
:-mcpu=pentium for my CFLAGS? Actually, the kernel I tested on might
:even be using -march/-mcpu=pentiumpro, since I only recently changed
:it to =pentium to allow me to do buildworlds for another Pentium-class
:machine. I did wonder the same thing a while back and did the same
:test with and without the optimizations, and with the pentiumpro opts
:the big-block-size transfer rate went _down_ a little bit, which was
:odd. I didn't compare with L2-cache-friendly blocks, though.
:
:-- Chris Dillon - [EMAIL PROTECTED] - [EMAIL PROTECTED]

I modified my original C program again, this time to simply read the
data from memory, given a block size in kilobytes as an argument. I had
to throw in a little __asm to do it right, but here are my results. It
shows about 3.2 GBytes/sec from the L2 (well, insofar as my
3-instruction loop goes), and about 1.4 GBytes/sec from main memory.

NOTE: cc x.c -O2 -o x

./x 4       3124.96 MBytes/sec (read)
./x 8       3242.45 MBytes/sec (read)
./x 16      3060.93 MBytes/sec (read)
./x 32      3359.97 MBytes/sec (read)
./x 64      3362.06 MBytes/sec (read)
./x 128     3365.53 MBytes/sec (read)
./x 240     3307.86 MBytes/sec (read)
./x 256     3232.33 MBytes/sec (read)
./x 512     1396.45 MBytes/sec (read)
./x 1024    1397.90 MBytes/sec (read)

In contrast, I get 1052.50 MBytes/sec from the L2 on the Dell 2400, and
444 MBytes/sec from main memory.
-Matt

/*
 * NOTE: cc x.c -O2 -o x
 */
#include <sys/types.h>
#include <sys/time.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
#include <unistd.h>

int deltausecs(struct timeval *tv1, struct timeval *tv2);

int
main(int ac, char **av)
{
    int i;
    int bytes;
    double dtime;
    struct timeval tv1;
    struct timeval tv2;
    char *buf;

    if (ac == 1) {
        fprintf(stderr, "%s numKB\n", av[0]);
        exit(1);
    }
    bytes = strtol(av[1], NULL, 0) * 1024;
    if (bytes < 4 * 1024 || bytes > 256 * 1024 * 1024) {
        fprintf(stderr, "Oh please. Try a reasonable value\n");
        exit(1);
    }
    buf = malloc(bytes);
    if (buf == NULL) {
        perror("malloc");
        exit(1);
    }
    bzero(buf, bytes);
    gettimeofday(&tv1, NULL);
    for (i = 0; i < 100000000; i += bytes) {
        register int j;

        for (j = bytes - 4; j >= 0; j -= 4) {
            __asm __volatile("movl (%0,%1),%%eax" :
                "=r" (buf), "=r" (j) :
                "0" (buf), "1" (j) :
                "ax"
            );
        }
    }
    gettimeofday(&tv2, NULL);
    dtime = (double)deltausecs(&tv1, &tv2);
    printf("%6.2f MBytes/sec (read)\n", (double)100000000 / dtime);
    return(0);
}

int
deltausecs(struct timeval *tv1, struct timeval *tv2)
{
    int usec;

    usec = (tv2->tv_usec + 1000000 - tv1->tv_usec);
    usec += (tv2->tv_sec - tv1->tv_sec - 1) * 1000000;
    return(usec);
}
Machines are getting too damn fast
I was browsing CompUSA today and noticed they were selling Sony VAIO
1.3 and 1.5 GHz desktops, among other things. It's amazing how fast
processors have gotten in just the last two years! I just had to pick
up one of these babies and give it a run-through to see how fast the
RamBus memory is.

I'm suitably impressed, at least when comparing it against other Intel
CPUs. Intel is finally getting some decent memory bandwidth. I've
included some memory copying tests below. The actual memory bandwidth
is 2x what the test reports, since it's a copy test.

Sony 1.3 GHz Pentium 4 VAIO w/ 128MB RamBus memory (two 64MB RIMMs)
    571.20 MBytes/sec (copy)
650 MHz Celeron (HP desktop, DIMM)
    114.65 MBytes/sec (copy)
750 MHz P-III (2U VALINUX box, 2-cpu, 1024M ECC-DIMM)
    162.20 MBytes/sec (copy)
700 MHz Celeron(?) (1U VALINUX box, 1-cpu, 128MB DIMM)
    93.56 MBytes/sec (copy)  yuch
550 MHz P-III (4U Dell 2400, 1-cpu, 256MB DIMM)
    225.92 MBytes/sec (copy)
600 MHz P-III (2U Dell 2450, 2-cpus, 512MB DIMM)
    228.91 MBytes/sec (copy)

I was somewhat disappointed with the VALINUX boxes; I expected them to
be on par with the Dells. In any case, the Sony VAIO workstation with
the RamBus memory blew the field away. The CPU is so fast that a
buildworld I did was essentially I/O bound. I'll have to go and buy
some more RamBus memory for the thing (it only came with 128MB), which
is kind of annoying seeing as I have a gigabyte worth of DIMMs just
sitting on my desk :-( that I can't use.

I'm trying to imagine 1.3 GHz. That's over a billion instructions a
second. And in a few years, with the new chip-fab lithography
standards, it's going to be 10 GHz. We need to find something more
interesting than buildworlds to do on these machines.

-Matt

/*
 * Attempt to test memory copy speeds. Use a buffer large enough to
 * defeat the on-cpu L1 and L2 caches.
 */
#include <sys/types.h>
#include <sys/time.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
#include <unistd.h>

#define NLOOP 100

char Buf1[2 * 1024 * 1024];
char Buf2[2 * 1024 * 1024];

int deltausecs(struct timeval *tv1, struct timeval *tv2);

int
main(int ac, char **av)
{
    int i;
    double dtime;
    struct timeval tv1;
    struct timeval tv2;

    memset(Buf1, 1, sizeof(Buf1));
    for (i = 0; i < 10; ++i)
        bcopy(Buf1, Buf2, sizeof(Buf1));
    gettimeofday(&tv1, NULL);
    for (i = 0; i < NLOOP; ++i)
        bcopy(Buf1, Buf2, sizeof(Buf1));
    gettimeofday(&tv2, NULL);
    dtime = (double)deltausecs(&tv1, &tv2);
    printf("%6.2f MBytes/sec (copy)\n",
        (double)sizeof(Buf1) * NLOOP / dtime);
    return(0);
}

int
deltausecs(struct timeval *tv1, struct timeval *tv2)
{
    int usec;

    usec = (tv2->tv_usec + 1000000 - tv1->tv_usec);
    usec += (tv2->tv_sec - tv1->tv_sec - 1) * 1000000;
    return(usec);
}
Re: Machines are getting too damn fast
----- Original Message -----
From: Matt Dillon [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Sunday, March 04, 2001 3:34 AM
Subject: Machines are getting too damn fast

| We need to find something more interesting than buildworlds to do on
| these machines.

Let's just complicate the code more.
RE: Machines are getting too damn fast
> Let's just complicate the code more.

Hey, it works for Microsoft, after all! Perhaps if buildworld takes 3
days even on an eight-CPU AlphaServer, FreeBSD will rocket to almost
complete domination of the OS market?

Then again - maybe not :)

Dominic
Re: Machines are getting too damn fast
On Sun, Mar 04, 2001 at 04:21:02AM -0600, Tyler K McGeorge wrote:

> | We need to find something more interesting than buildworlds to do
> | on these machines.
>
> Let's just complicate the code more.

We could vendor-import win2000?

Mark
--
Nice testing in little China...
Re: Machines are getting too damn fast
Tyler K McGeorge wrote:

> Let's just complicate the code more.

How about bw/s -- buildworlds per second ;-)

It would also be interesting to see the numbers for an Athlon/PIII
system with DDR, if anyone has such a machine.

-- Dave
Re: Machines are getting too damn fast
You should see what speed RamBus they were using, 600 or 800 MHz. It is
pretty fast for large memory writes and reads. It'd be cool to see how
the different speeds stack up against one another. DDR comparisons
would be cool too.

Yeah, for the frequency, you have to take into account that these are
different chips than your PIIIs or Athlons, and the performance
difference is not simply a linear relation to the frequency rating
(i.e.: 1.3GHz is not really over one billion instructions per second,
just clocks per second).

We installed Linux at a UC Free OS User Group installfest here in
Cincinnati; it was pretty sweet. The machine was a Dell and the case
was freakin' huge. It also came with a 21" monitor and stuff. The
performance was really good, but not really any better than I had
gleaned from the newer 1GHz Athlons or PIIIs.

Matt Dillon had the audacity to say:

I was browsing CompUSA today and noticed they were selling Sony VAIO
1.3 and 1.5 GHz desktops, among other things. It's amazing how fast
processors have gotten in just the last two years! I just had to pick
up one of these babies and give it a run-through to see how fast the
RamBus memory is.

I'm suitably impressed, at least when comparing it against other Intel
CPUs. Intel is finally getting some decent memory bandwidth. I've
included some memory copying tests below. The actual memory bandwidth
is 2x what the test reports, since it's a copy test.

Sony 1.3 GHz Pentium 4 VAIO w/ 128MB RamBus memory (two 64MB RIMMs)
    571.20 MBytes/sec (copy)
650 MHz Celeron (HP desktop, DIMM)
    114.65 MBytes/sec (copy)
750 MHz P-III (2U VALINUX box, 2-cpu, 1024M ECC-DIMM)
    162.20 MBytes/sec (copy)
700 MHz Celeron(?) (1U VALINUX box, 1-cpu, 128MB DIMM)
    93.56 MBytes/sec (copy)  yuch
550 MHz P-III (4U Dell 2400, 1-cpu, 256MB DIMM)
    225.92 MBytes/sec (copy)
600 MHz P-III (2U Dell 2450, 2-cpus, 512MB DIMM)
    228.91 MBytes/sec (copy)

I was somewhat disappointed with the VALINUX boxes; I expected them to
be on par with the Dells.
In any case, the Sony VAIO workstation with the RamBus memory blew the
field away. The CPU is so fast that a buildworld I did was essentially
I/O bound. I'll have to go and buy some more RamBus memory for the
thing (it only came with 128MB), which is kind of annoying seeing as I
have a gigabyte worth of DIMMs just sitting on my desk :-( that I
can't use.

I'm trying to imagine 1.3 GHz. That's over a billion instructions a
second. And in a few years, with the new chip-fab lithography
standards, it's going to be 10 GHz. We need to find something more
interesting than buildworlds to do on these machines.

-Matt

[benchmark program snipped]
Re: Machines are getting too damn fast
Date: Sun, 04 Mar 2001 19:39:09 -0600
From: David A. Gobeille [EMAIL PROTECTED]

> It would also be interesting to see the numbers for an Athlon/PIII
> system with DDR, if anyone has such a machine.

Personally, I'd be [more] interested in a ServerWorks III HE core
chipset with four-way interleaved SDRAM. :-) If one _truly_ needs the
bandwidth of Rambus (which, IIRC, has higher real-world latency than
SDRAM), then how about having the bus bandwidth to back it up?

Eddy

---
Brotsman & Dreger, Inc.
EverQuick Internet / EternalCommerce Division

E-Mail: [EMAIL PROTECTED]
Phone: (316) 794-8922
---
Re: Machines are getting too damn fast
:You should see what speed RamBus they were using, 600 or 800 MHz. It
:is pretty fast for large memory writes and reads. It'd be cool to see
:how the different speeds stack up against one another. DDR
:comparisons would be cool too. Yeah, for the frequency, you have to
:take into account that these are different chips than your PIIIs or
:Athlons, and the performance difference is not simply a linear
:relation to the frequency rating (i.e.: 1.3GHz is not really over one
:billion instructions per second, just clocks per second).

It says 800 MHz (PC-800 RIMMs) on the side of the box. The technical
reviews basically say that bulk transfer rates for RamBus blow DDR
away, but DDR wins for random reads and writes due to RamBus's higher
startup latency. I don't have any DDR systems to test, but I can devise
a test program.

Celeron 650 MHz (HP desktop) (DIMM)            16.16 MBytes/sec (copy)
Pentium III 550 MHz (Dell 2400) (DIMM)         25.90 MBytes/sec (copy)
Pentium 4 1.3 GHz / PC-800 RIMMs (Sony VAIO)   32.38 MBytes/sec (copy)

-Matt

Compile with -O2. Changing the two occurrences of '512' to '4' will
reproduce the original bulk-transfer rates; by default this program
tests single-transfer performance (always a cache miss).
#include <sys/types.h>
#include <sys/time.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
#include <unistd.h>

#define NLOOP 100

char Buf1[2 * 1024 * 1024];
char Buf2[2 * 1024 * 1024];

int deltausecs(struct timeval *tv1, struct timeval *tv2);

int
main(int ac, char **av)
{
    int i;
    double dtime;
    struct timeval tv1;
    struct timeval tv2;

    memset(Buf1, 1, sizeof(Buf1));
    for (i = 0; i < 10; ++i)
        bcopy(Buf1, Buf2, sizeof(Buf1));
    gettimeofday(&tv1, NULL);
    for (i = 0; i < NLOOP; ++i) {
        int j;
        int k;

        for (k = sizeof(int); k <= 512; k += sizeof(int)) {
            for (j = sizeof(Buf1) - k; j >= 0; j -= 512)
                *(int *)(Buf2 + j) = *(int *)(Buf1 + j);
        }
    }
    gettimeofday(&tv2, NULL);
    dtime = (double)deltausecs(&tv1, &tv2);
    printf("%6.2f MBytes/sec (copy)\n",
        (double)sizeof(Buf1) * NLOOP / dtime);
    return(0);
}

int
deltausecs(struct timeval *tv1, struct timeval *tv2)
{
    int usec;

    usec = (tv2->tv_usec + 1000000 - tv1->tv_usec);
    usec += (tv2->tv_sec - tv1->tv_sec - 1) * 1000000;
    return(usec);
}