?? wrote:
Actually, there's a widespread misunderstanding of this "Common PC" term. "Common PC"
doesn't mean PCs that are used day to day; it means that the performance of each node
can be measured against a common PC's computing power.
As a matter of fact, we don't use gigabit Ethernet to connect everyday PCs, we don't use
Linux for our document processing, and most importantly, Hadoop cannot run effectively on
those "daily PCs".
Hadoop is designed for high-performance computing equipment, but is "claimed" to be fit for "daily PCs".
Hadoop for PCs? What a joke.
Hadoop is designed to build a high-throughput data-processing
infrastructure from commodity PC parts: SATA rather than RAID or SAN, x86 + Linux
rather than supercomputer hardware and OS. You can bring it up on lighter-weight
systems, but it has a minimum overhead that is quite steep for small
datasets. I've been doing MapReduce work over small in-memory datasets
using Erlang, which works very well in such a context.
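The Erlang code itself isn't shown here, but a rough Python sketch of the same
in-memory map/reduce idea (word count as a stand-in workload; all names are just
illustrative) looks like this:

from collections import defaultdict

def map_phase(records):
    """Map step: emit (word, 1) pairs from each input line."""
    for line in records:
        for word in line.split():
            yield word.lower(), 1

def reduce_phase(pairs):
    """Reduce step: sum the counts for each key, entirely in memory."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

if __name__ == "__main__":
    data = ["the quick brown fox", "the lazy dog", "the fox"]
    print(reduce_phase(map_phase(data)))

For a dataset that fits in RAM this runs in one process, with none of the job
scheduling, HDFS, or startup cost that makes Hadoop's minimum overhead steep for
small inputs.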
-you need a good network, with DNS working (fast), good backbone and
switches
-the faster your disks, the better your throughput
-ECC memory makes a lot of sense
-you need a good cluster management setup unless you like SSH-ing to 20
boxes to find out which one is playing up (see the sketch after this list)
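For comparison, the manual approach that last bullet warns about amounts to
something like the loop below (the hostnames and the uptime check are
placeholders; a proper monitoring / cluster-management setup replaces this):

import subprocess

HOSTS = ["node01", "node02", "node03"]  # placeholder hostnames

def check_host(host):
    """Run a trivial health check on one node over SSH."""
    result = subprocess.run(
        ["ssh", "-o", "ConnectTimeout=5", host, "uptime"],
        capture_output=True, text=True,
    )
    ok = result.returncode == 0
    return ok, (result.stdout or result.stderr).strip()

if __name__ == "__main__":
    for host in HOSTS:
        ok, detail = check_host(host)
        print(("OK   " if ok else "FAIL ") + host + ": " + detail)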