On May 16, 2011, at 11:03 AM, Evert Lammerts wrote:
> Hi all,
>
> What acceptance tests are people using when buying clusters for Hadoop? Any
> pointers to relevant methods?

We get some test nodes from various manufacturers. We do some raw IO benchmarking against our other nodes. We then add them to our various grids to see how they perform in the real world, paying attention to average task turnaround time for certain jobs. Since we know where our current machines stand, we can look at price-per-performance improvements.

Other random things that I think are important:

a) Unless someone shares their entire *-site.xml data, most published benchmarks on the net are mostly useless. Simple things like block size have a big impact.

b) Test your actual workload. Synthetic benchmarks are just that--synthetic. They may not reflect the particular nuances of your job.

c) Establish a baseline. If you have no hardware today, then at least establish something on EC2 to compare against.

d) Make sure you talk to multiple vendors.

e) Any advice anyone gives you on config is likely going to be wrong.
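
To make the price-per-performance comparison concrete, here is a rough sketch of the arithmetic, assuming you have collected task durations for the same job on a baseline node and on a candidate node. All names (NodeTrial, speedup, price_per_perf) and all numbers are made up for illustration; they are not measurements from any real hardware, and your own metric may well differ.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class NodeTrial:
    """Hypothetical record of one node type's price and observed task times."""
    name: str
    price: float               # purchase price per node, any currency
    task_times_s: List[float]  # durations of the same job's tasks, in seconds


def avg_task_time(trial: NodeTrial) -> float:
    """Average task turnaround time for the trial."""
    return mean(trial.task_times_s)


def speedup(baseline: NodeTrial, candidate: NodeTrial) -> float:
    """How many times faster the candidate finishes a task than the baseline."""
    return avg_task_time(baseline) / avg_task_time(candidate)


def price_per_perf(baseline: NodeTrial, candidate: NodeTrial) -> float:
    """Price divided by speedup relative to the baseline; lower is better."""
    return candidate.price / speedup(baseline, candidate)


# Illustrative, invented numbers only.
baseline = NodeTrial("current", 4000.0, [120.0, 130.0, 110.0])
candidate = NodeTrial("vendor-a", 5000.0, [90.0, 100.0, 80.0])

for trial in (baseline, candidate):
    print(trial.name, round(price_per_perf(baseline, trial), 2))
```

A candidate is only interesting if its figure comes out lower than the baseline's own price. The comparison is meaningful only when both sets of task times come from the same job run under the same config, which is the point of (a) and (b) above.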