I reckon it's all about spindles. I took a quick look at the fairly detailed
hardware config that Owen released with the Hadoop benchmark, and it was run on
nodes with 4 SATA drives. The Google blog hints at 12 disks per node (the
number of disks per node was only given for their 1PB experiment). Google got
a threefold performance increase with three times the number of disks per node.
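
Here is a rough back-of-envelope check of the spindle argument, as a sketch only. The times and node counts are the ones quoted from the Google blog below. The 4 disks per node is from Owen's config. The 12 disks per node for Google's 1TB run is an assumption, since the blog only gives that figure for the 1PB run. The function and variable names are just mine for illustration.

```python
# Back-of-envelope: sort throughput per spindle for the two 1TB runs.
# Assumption: Google's 1TB run used 12 disks per node (only stated for 1PB).
# Assumption: all 910 Hadoop nodes had 4 disks each.

TB = 10**12  # 10 billion 100-byte records


def per_disk_mb_per_s(seconds, nodes, disks_per_node):
    """Bytes sorted per second per disk, expressed in MB/s."""
    total_disks = nodes * disks_per_node
    return TB / seconds / total_disks / 1e6


hadoop = per_disk_mb_per_s(209, 910, 4)
google = per_disk_mb_per_s(68, 1000, 12)

speedup = 209 / 68                     # how much faster the Google run was
disk_ratio = (1000 * 12) / (910 * 4)   # how many more disks it had in total

print(f"speedup:            {speedup:.2f}x")
print(f"total disk ratio:   {disk_ratio:.2f}x")
print(f"Hadoop MB/s per disk: {hadoop:.2f}")
print(f"Google MB/s per disk: {google:.2f}")
```

If the 12-disk assumption holds, the per-disk numbers come out close to each other, which is what you would expect if disk count is the main difference between the two runs.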
Patrick.
From: Tom White <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org; [EMAIL PROTECTED]
Sent: Saturday, November 22, 2008 1:26:25 AM
Subject: Google Terasort Benchmark
From the Google Blog:
http://googleblog.blogspot.com/2008/11/sorting-1pb-with-mapreduce.html
"We are excited to announce we were able to sort 1TB (stored on the
Google File System as 10 billion 100-byte records in uncompressed text
files) on 1,000 computers in 68 seconds. By comparison, the previous
1TB sorting record [using Hadoop] is 209 seconds on 910 computers."
Something for the Hadoop community to aim for: a threefold performance increase.
Tom