On May 16, 2011, at 11:03 AM, Evert Lammerts wrote:

> Hi all,
> 
> What acceptance tests are people using when buying clusters for Hadoop? Any 
> pointers to relevant methods?


        We get test nodes from several manufacturers and run raw I/O 
benchmarks against our existing nodes.  We then add them to our various grids 
to see how they perform on real-world work, paying attention to average task 
turnaround time for certain jobs.  Since we know where our current machines 
stand, we can compare price per performance improvement.
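
A rough sketch of that price-per-performance comparison, assuming you have 
collected task times (in seconds) for the same job on the current nodes and on 
each candidate.  The function name, the vendor labels, the prices and the 
timings are all made up for illustration; they are not measurements from any 
real cluster.

```python
from statistics import mean


def price_per_speedup(price, task_seconds, baseline_task_seconds):
    """Cost of a node divided by its speedup over the baseline nodes.

    Speedup is the ratio of average baseline task time to average
    candidate task time, so a lower result means better value.
    """
    speedup = mean(baseline_task_seconds) / mean(task_seconds)
    return price / speedup


if __name__ == "__main__":
    # Hypothetical task times for one job on the machines we already own.
    baseline = [120.0, 118.0, 122.0]

    # Hypothetical candidates: (price per node, task times on that node).
    candidates = {
        "vendor_a": (3000.0, [60.0, 62.0, 58.0]),
        "vendor_b": (2500.0, [80.0, 79.0, 81.0]),
    }

    for name, (price, times) in sorted(candidates.items()):
        score = price_per_speedup(price, times, baseline)
        print(f"{name}: {score:.2f} per unit of speedup")
```

This only captures one job and one metric.  It is worth repeating per workload, 
since a node that wins on one job may lose on another.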

        Other random things that I think are important:

                a) Unless someone shares their entire *-site.xml data, most 
published benchmarks on the net are close to useless.  Simple settings such as 
block size have a big impact.

                b) Test your actual workload.  Synthetic benchmarks are just 
that--synthetic.  They may not reflect the particular nuances of your job.

                c) Establish a baseline. If you have no hardware today, then at 
least establish something on EC2 to compare.

                d) Make sure you talk to multiple vendors.

                e) Any advice anyone gives you on config is likely going to be 
wrong.
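
On point (a), a minimal sketch of how one might compare two *-site.xml files 
before trusting a benchmark comparison.  It assumes the usual Hadoop layout of 
<configuration> containing <property> elements with <name> and <value> 
children.  The helper names and the sample values are hypothetical, and 
dfs.block.size is used only as an example property.

```python
import xml.etree.ElementTree as ET


def load_site_config(xml_text):
    """Parse the text of a *-site.xml file into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def diff_configs(ours, theirs):
    """Return {name: (our_value, their_value)} for settings that differ.

    A setting missing on one side shows up as None on that side.
    """
    names = sorted(set(ours) | set(theirs))
    return {
        n: (ours.get(n), theirs.get(n))
        for n in names
        if ours.get(n) != theirs.get(n)
    }


if __name__ == "__main__":
    # Hypothetical configs: same replication, different block size.
    ours = load_site_config(
        "<configuration>"
        "<property><name>dfs.block.size</name><value>67108864</value></property>"
        "<property><name>dfs.replication</name><value>3</value></property>"
        "</configuration>"
    )
    theirs = load_site_config(
        "<configuration>"
        "<property><name>dfs.block.size</name><value>134217728</value></property>"
        "<property><name>dfs.replication</name><value>3</value></property>"
        "</configuration>"
    )
    for name, (a, b) in diff_configs(ours, theirs).items():
        print(f"{name}: ours={a} theirs={b}")
```

If the diff is not empty, the two sets of numbers were not measured under the 
same conditions and should not be compared directly.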
