We recently crossed this bridge, and here are some insights. We did an
extensive study comparing costs and benchmarking a local cluster against
EMR, both for our current needs and for our projected growth.

- The scalability you get with EMR is unmatched, although you need to look
at your requirements and decide whether it is something you need.

- When using EMR, it's cheaper to use reserved instances than to start
nodes on the fly. You can always add more nodes when required. I suggest
looking at your current computing needs, reserving instances for a year or
two, using these to run EMR, and adding nodes for peak needs. In your cost
estimate you will need to factor in data transfer time and costs, unless
you are dealing with public datasets on S3.

- EMR performed about as well as our local cluster on CPU benchmarks (we
used MRBench to benchmark map/reduce); however, it was slower on IO
benchmarks (we used the DFSIO benchmark). For IO-intensive jobs you will
need to add more nodes to compensate.

- Compared to a local cluster, you will need to factor in the time it takes
for the EMR cluster to set up when starting a job. This includes data
transfer time, cluster replication time, etc.

- The EMR API is very flexible; however, you will need to build a custom
interface on top of it to suit your job management and monitoring needs.

- EMR bootstrap actions can satisfy most of your native library needs, so
there are no drawbacks there.
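To make the cost point above concrete, here is a rough sketch of the kind
of estimate I mean: reserved instances for the baseline, on-demand nodes
for peaks, plus data transfer. Every name and rate in it is a hypothetical
placeholder (these are not real AWS prices, and this is not any AWS API),
and it assumes the baseline nodes run all year and ignores any EMR
surcharge, so treat it as a starting point only.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 24 * 365


@dataclass
class Pricing:
    # All rates are hypothetical placeholders; look up current prices yourself.
    on_demand_hourly: float   # cost per node-hour for an on-demand node
    reserved_upfront: float   # one-time yearly fee per reserved node
    reserved_hourly: float    # cost per node-hour for a reserved node
    transfer_per_gb: float    # cost per GB moved in or out


def yearly_cost(pricing, baseline_nodes, peak_extra_nodes, peak_hours,
                transfer_gb, reserve_baseline):
    """Estimate one year of cluster cost under the assumptions above."""
    if reserve_baseline:
        # Baseline nodes are reserved: upfront fee plus a lower hourly rate.
        baseline = baseline_nodes * (
            pricing.reserved_upfront
            + pricing.reserved_hourly * HOURS_PER_YEAR
        )
    else:
        # Baseline nodes are paid for at the on-demand rate all year.
        baseline = baseline_nodes * pricing.on_demand_hourly * HOURS_PER_YEAR
    # Extra nodes added only during peak hours, always on demand.
    peak = peak_extra_nodes * pricing.on_demand_hourly * peak_hours
    # Data transfer; set transfer_gb to 0 for public datasets already on S3.
    transfer = transfer_gb * pricing.transfer_per_gb
    return baseline + peak + transfer


if __name__ == "__main__":
    example = Pricing(on_demand_hourly=1.0, reserved_upfront=1000.0,
                      reserved_hourly=0.5, transfer_per_gb=0.25)
    for reserve in (True, False):
        cost = yearly_cost(example, baseline_nodes=2, peak_extra_nodes=3,
                           peak_hours=100, transfer_gb=1000,
                           reserve_baseline=reserve)
        print("reserved baseline" if reserve else "on-demand baseline", cost)
```

Running it with your own node counts and the current published rates shows
quickly whether reserving the baseline pays off for your utilization.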


-- Sudhir


On 12/26/10 5:26 AM, "common-user-digest-h...@hadoop.apache.org"
<common-user-digest-h...@hadoop.apache.org> wrote:

> From: Otis Gospodnetic <otis_gospodne...@yahoo.com>
> Date: Fri, 24 Dec 2010 04:41:46 -0800 (PST)
> To: <common-user@hadoop.apache.org>
> Subject: Re: Hadoop/Elastic MR on AWS
> 
> Hello Amandeep,
> 
> 
> 
> ----- Original Message ----
>> From: Amandeep Khurana <ama...@gmail.com>
>> To: common-user@hadoop.apache.org
>> Sent: Fri, December 10, 2010 1:14:45 AM
>> Subject: Re: Hadoop/Elastic MR on AWS
>> 
>> Mark,
>> 
>> Using EMR makes it very easy to start a cluster and add/reduce  capacity as
>> and when required. There are certain optimizations that make EMR  an
>> attractive choice as compared to building your own cluster out. Using  EMR
> 
> 
> Could you please point out what optimizations you are referring to?
> 
> Thanks,
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
> Hadoop ecosystem search :: http://search-hadoop.com/
> 
>> also ensures you are using a production quality, stable system backed by  the
>> EMR engineers. You can always use bootstrap actions to put your own  tweaked
>> version of Hadoop in there if you want to do that.
>> 
>> Also, you  don't have to tear down your cluster after every job. You can set
>> the alive  option when you start your cluster and it will stay there even
>> after your  Hadoop job completes.
>> 
>> If you face any issues with EMR, send me a mail  offline and I'll be happy to
>> help.
>> 
>> -Amandeep
>> 
>> 
>> On Thu, Dec 9,  2010 at 9:47 PM, Mark <static.void....@gmail.com>  wrote:
>> 
>>> Does anyone have any thoughts/experiences on running Hadoop  in AWS? What
>>> are some pros/cons?
>>> 
>>> Are there any good  AMI's out there for this?
>>> 
>>> Thanks for any advice.
>>> 
>> 

