On Thu, Dec 9, 2010 at 5:17 PM, Mark <static.void....@gmail.com> wrote:

> Does anyone have any thoughts/experiences on running Hadoop in AWS? What
> are some pros/cons?
>
EMR (Elastic MapReduce) is one possibility. If you just want to try out an MR
job, it's fine, but if you want to reuse the instances you have started, it's
better to have your own setup. Especially for small jobs, it is inefficient to
start and stop new instances every time; that's why I am not using EMR.
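To make the small-job point concrete, here is a rough back-of-the-envelope sketch. It assumes EC2 bills each instance per started hour (the hourly rounding in effect at the time) and uses hypothetical figures for startup time, job length and cluster size; it is an illustration, not an EMR pricing calculator.

```python
import math

# Assumption: each instance's runtime is rounded UP to a whole hour for billing.
# All numbers used below are hypothetical illustrations.

def instance_hours_fresh_cluster(num_jobs: int, job_minutes: float,
                                 startup_minutes: float, nodes: int) -> int:
    """Each job starts its own cluster, runs, then terminates it."""
    hours_per_job = math.ceil((startup_minutes + job_minutes) / 60)
    return num_jobs * hours_per_job * nodes

def instance_hours_reused_cluster(num_jobs: int, job_minutes: float,
                                  startup_minutes: float, nodes: int) -> int:
    """One cluster is started once and all jobs run on it back to back."""
    total_minutes = startup_minutes + num_jobs * job_minutes
    return math.ceil(total_minutes / 60) * nodes

if __name__ == "__main__":
    # Ten 5-minute jobs on a 4-node cluster, ~10 minutes of startup each time.
    print(instance_hours_fresh_cluster(10, 5, 10, 4))   # 40
    print(instance_hours_reused_cluster(10, 5, 10, 4))  # 4
```

With these made-up numbers, starting a fresh cluster per job bills ten times as many instance-hours as reusing one cluster, which is the inefficiency described above.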

Cons:
The network bandwidth between standard instances is limited, which in some
cases can reduce overall performance. You also cannot guarantee rack locality:
your instances are placed randomly across different racks, which makes the
network bandwidth problem worse.

Pros:
You can easily choose the size of your cluster.



>
> Are there any good AMI's out there for this?
>
I am using a Whirr-based setup of the Cloudera distribution. Cluster creation
always starts from a clean Amazon Linux AMI (or another one you select), an
image that is not tied to Hadoop at all, so you don't need any special AMI.
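For reference, a Whirr cluster is described by a small properties file. The fragment below is a minimal sketch of the generic keys only; the cluster name, node counts and credentials are placeholders. The provider id and role names changed between Whirr releases, and selecting the Cloudera (CDH) packages takes extra keys that also vary by version, so treat this as an outline and check the Hadoop/CDH recipe shipped with your Whirr release.

```properties
# Hypothetical cluster name; any identifier works.
whirr.cluster-name=myhadoopcluster
# One master (namenode + jobtracker) and three workers (datanode + tasktracker).
# Role names follow later Whirr releases; older ones used different names.
whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,3 hadoop-datanode+hadoop-tasktracker
# Provider id for EC2 (older Whirr releases used a different value).
whirr.provider=aws-ec2
# AWS credentials read from environment variables.
whirr.identity=${env:AWS_ACCESS_KEY_ID}
whirr.credential=${env:AWS_SECRET_ACCESS_KEY}
```

The cluster is then started with `whirr launch-cluster --config <file>`; no Hadoop-specific AMI is involved, since Whirr installs Hadoop onto the base image at launch.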


>
> Thanks for any advice.
>
