Re: Google Terasort Benchmark

2008-11-22 Thread Patrick McCormack

I reckon it's all about spindles. I took a quick look at the fairly detailed 
hardware config that Owen released with the Hadoop benchmark: it was run on 
nodes with 4 SATA drives each. The Google blog hints at 12 disks per node (the 
disk count per node was only given for their 1PB experiment). So Google got 
roughly 3 times the performance with 3 times the number of disks per node. 
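
A rough back-of-envelope check of that claim, as a sketch only. The node counts and run times are the ones quoted from the Google blog below, and 4 disks per node is the figure from the Hadoop config. The 12 disks per node for Google's 1TB run is an assumption (the blog only gave it for the 1PB run), and the helper name per_disk_mb_per_s is made up for illustration. It measures sorted data per spindle, not actual disk I/O, since a sort reads and writes the data more than once.

```python
# Back-of-envelope comparison of the two 1TB sort results quoted in this thread.
# ASSUMPTION: Google's 1TB run used 12 disks per node; the blog only states
# that figure for the 1PB run.

DATA_BYTES = 10 * 10**9 * 100  # 10 billion 100-byte records = 1e12 bytes


def per_disk_mb_per_s(nodes, disks_per_node, seconds):
    """Sorted data per second per spindle, in decimal MB/s (hypothetical helper)."""
    total_disks = nodes * disks_per_node
    return DATA_BYTES / (total_disks * seconds) / 1e6


hadoop = {"nodes": 910, "disks_per_node": 4, "seconds": 209}
google = {"nodes": 1000, "disks_per_node": 12, "seconds": 68}  # 12 is assumed

# How much faster the Google run finished.
speedup = hadoop["seconds"] / google["seconds"]

# How many more spindles the Google cluster had in total.
disk_ratio = (google["nodes"] * google["disks_per_node"]) / (
    hadoop["nodes"] * hadoop["disks_per_node"]
)

if __name__ == "__main__":
    print(f"speedup:            {speedup:.2f}x")
    print(f"total disk ratio:   {disk_ratio:.2f}x")
    print(f"Hadoop per spindle: {per_disk_mb_per_s(**hadoop):.2f} MB/s")
    print(f"Google per spindle: {per_disk_mb_per_s(**google):.2f} MB/s")
```

Under those assumptions the speedup and the ratio of total spindles both come out close to 3, and the sorted data per spindle is about the same for the two runs, which fits the idea that disk count explains most of the gap.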

Patrick.





From: Tom White <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org; [EMAIL PROTECTED]
Sent: Saturday, November 22, 2008 1:26:25 AM
Subject: Google Terasort Benchmark

From the Google Blog,
http://googleblog.blogspot.com/2008/11/sorting-1pb-with-mapreduce.html

"We are excited to announce we were able to sort 1TB (stored on the
Google File System as 10 billion 100-byte records in uncompressed text
files) on 1,000 computers in 68 seconds. By comparison, the previous
1TB sorting record [using Hadoop] is 209 seconds on 910 computers."

Something for the Hadoop community to aim for: a threefold performance increase.

Tom



  
