Has anyone done extensive testing of which Amazon EC2 instance types give
you the most bang for the buck?

Given the usual Hadoop recommendation of beefy machines, I expected the
best performance per dollar from the extra-large instances, but our testing
showed otherwise. We did some rough tests when we were just getting started
with a roughly 10-node cluster, and found that the extra-large instance
doesn't come close to twice the actual performance of the large instance,
even though it costs twice as much ($0.80 vs. $0.40 per hour). My
rationalization is that some resources are shared: the extra-large instance
corresponds to the whole physical machine, while a large instance can
sometimes use more than 50% of the host's I/O and network bandwidth when the
other tenant isn't doing much.
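To make the price/performance comparison concrete: since the extra-large
costs exactly twice as much per hour, it only wins on cost per unit of work
if it delivers more than 2x the large instance's throughput. A minimal
sketch, assuming cost scales linearly with instance-hours; the function
names and the speedup values in the usage lines are hypothetical, not
measured results.

```python
# Hourly prices quoted in the post (large vs. extra-large).
LARGE_PRICE = 0.40
XL_PRICE = 0.80


def cost_per_unit_work(hourly_price, relative_throughput):
    """Dollars per unit of work, where throughput is relative to one large instance."""
    return hourly_price / relative_throughput


def xl_is_better(xl_speedup):
    """Hypothetical helper: True if the extra-large is cheaper per unit of work.

    xl_speedup is the extra-large's measured throughput divided by the
    large instance's throughput on the same job.
    """
    return cost_per_unit_work(XL_PRICE, xl_speedup) < cost_per_unit_work(LARGE_PRICE, 1.0)


# Speedup the extra-large needs just to break even on price.
BREAKEVEN_SPEEDUP = XL_PRICE / LARGE_PRICE

if __name__ == "__main__":
    print("break-even speedup:", BREAKEVEN_SPEEDUP)
    for speedup in (1.5, 2.5):  # illustrative values only
        print(speedup, xl_is_better(speedup))
```

Plugging in your own measured speedup shows which side of the 2x line a
given workload falls on.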

I'm revisiting our config because we're deploying HBase soon, and I'm not
sure whether I'd be better off moving to extra-large instances so that I
can co-locate the TaskTrackers and the RegionServers on the same nodes, or
sticking with large instances and putting HBase on separate servers. Mostly
I'm wondering whether my results were a fluke.
