On Thu, Dec 9, 2010 at 5:17 PM, Mark <static.void....@gmail.com> wrote:
> Does anyone have any thoughts/experiences on running Hadoop in AWS? What
> are some pros/cons?
>

EMR is a possibility. If you just want to try some MR jobs, it is fine, but if you want to reuse the started instances it is better to have your own setup. Especially for small jobs, it is inefficient to start and stop new instances every time, which is why I am not using EMR.

Cons: The network bandwidth between standard instances is limited, which in some cases can reduce overall performance. You cannot guarantee rack locality: your instances are picked randomly from diverse racks, which further aggravates the network bandwidth problem.

Pros: You can easily choose the size of your cluster.

>
> Are there any good AMI's out there for this?
>

I am using a Whirr-based setup of the Cloudera distribution. Cluster creation always starts from a clean Amazon Linux AMI (or you may select another one), an image that is not tied to Hadoop at all. So you don't need any special AMI.

>
> Thanks for any advice.
>
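
To make the point about reusing started instances for small jobs concrete, here is a rough sketch of the arithmetic. The function name and all of the numbers (job length, startup time, job count) are made up for illustration and are not measurements from EMR or from my own cluster.

```python
def total_minutes(num_jobs, job_minutes, startup_minutes, reuse_cluster):
    """Wall-clock minutes needed to run num_jobs jobs one after another.

    With reuse_cluster=True the cluster is started once and kept running;
    otherwise a fresh cluster is started (and stopped) for every job.
    """
    if reuse_cluster:
        startups = min(1, num_jobs)  # one startup, or none if there are no jobs
    else:
        startups = num_jobs          # one startup per job
    return startups * startup_minutes + num_jobs * job_minutes


# Hypothetical workload: 10 small jobs of 3 minutes each,
# with an assumed 5 minutes to bring a cluster up.
fresh = total_minutes(10, 3, 5, reuse_cluster=False)
reused = total_minutes(10, 3, 5, reuse_cluster=True)
print("fresh cluster per job:", fresh, "minutes")
print("reused cluster:", reused, "minutes")
```

The shorter the jobs are relative to the startup time, the larger the share of time lost to starting instances, which is the case where keeping your own running setup pays off.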