Re: AWS MapReduce

Mohit Anchlia Mon, 05 Mar 2012 09:29:29 -0800

On Mon, Mar 5, 2012 at 7:40 AM, John Conwell <j...@iamjohn.me> wrote:


> AWS MapReduce (EMR) does not use S3 for its HDFS persistence. If it did,
> your S3 billing would be massive :) EMR reads all input jar files and
> input data from S3, but it copies these files down to its local disk. It
> then starts the MR process, doing all HDFS reads and writes to the
> local disks. At the end of the MR job, it copies the MR job output and all
> process logs to S3, and then tears down the VM instances.
>
> You can see this for yourself if you spin up a small EMR cluster, but turn
> off the configuration flag that kills the VMs at the end of the MR job.
> Then look at the Hadoop configuration files to see how Hadoop is
> configured.
>
> I really like EMR. Amazon has done a lot of work to optimize the Hadoop
> configurations and VM instance AMIs to execute MR jobs fairly efficiently
> on a VM cluster. I had to do a lot of (expensive) trial and error work to
> figure out an optimal Hadoop / VM configuration to run our MR jobs without
> crashing / timing out the jobs. The only reason we didn't standardize on
> EMR was that it strongly binds your code base / process to using EMR for
> Hadoop processing, vs a flexible infrastructure that could use a local
> cluster or a cluster on a different cloud provider.
>
Thanks for your input. I am assuming HDFS is created on ephemeral disks
and not EBS. Also, is it possible to share some of your findings?
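
On the suggestion below to keep a small cluster alive and look at the
Hadoop configuration files: a minimal sketch of how one might pull the
HDFS storage directories out of a *-site.xml file, so they can be compared
against the mounted volumes on the node. The property names dfs.data.dir
and dfs.name.dir are the Hadoop 1.x-era names; the sample XML and its
paths are made up for illustration and are not taken from an EMR cluster,
and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample content; a real cluster's file and values will differ.
SAMPLE_HDFS_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/mnt/hdfs/data,/mnt1/hdfs/data</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/mnt/hdfs/name</value>
  </property>
</configuration>
"""


def read_hadoop_properties(xml_text):
    """Return a dict of property name -> value from a Hadoop *-site.xml."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def storage_dirs(props, key="dfs.data.dir"):
    """Split a comma-separated directory property into a list of paths."""
    raw = props.get(key, "")
    return [d.strip() for d in raw.split(",") if d.strip()]


if __name__ == "__main__":
    props = read_hadoop_properties(SAMPLE_HDFS_SITE)
    for path in storage_dirs(props):
        print("HDFS data dir:", path)
```

On a real node one would read the actual configuration file instead of the
sample string, then check which device each listed directory is mounted on
to tell local instance storage from EBS.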

>
> On Sun, Mar 4, 2012 at 8:51 AM, Mohit Anchlia <mohitanch...@gmail.com
> >wrote:
>
> > As far as I can see in the docs, it looks like you could also use HDFS
> > instead of S3. But what I am not sure about is whether these are local
> > disks or EBS.
> >
> > On Sun, Mar 4, 2012 at 2:27 AM, Hannes Carl Meyer <
> > hannesc...@googlemail.com
> > > wrote:
> >
> > > Hi,
> > >
> > > > yes, it's loaded from S3. IMHO Amazon AWS MapReduce is pretty slow.
> > > > The setup is done pretty fast and there are some configuration
> > > > parameters you can bypass - for example block sizes etc. - but in the
> > > > end, IMHO, setting up EC2 instances by copying images is the better
> > > > alternative.
> > >
> > > Kind Regards
> > >
> > > Hannes
> > >
> > > On Sun, Mar 4, 2012 at 2:31 AM, Mohit Anchlia <mohitanch...@gmail.com
> > > >wrote:
> > >
> > > > > I think I found the answer to this question. However, it's still not
> > > > > clear if HDFS is on local disk or EBS volumes. Does anyone know?
> > > >
> > > > > On Sat, Mar 3, 2012 at 3:54 PM, Mohit Anchlia <mohitanch...@gmail.com>
> > > > > wrote:
> > > >
> > > > > > Just want to check how many are using AWS MapReduce and understand
> > > > > > the pros and cons of Amazon's MapReduce machines. Is it true that
> > > > > > these MapReduce machines are really reading from and writing to S3
> > > > > > instead of local disks? Has anyone found issues with Amazon
> > > > > > MapReduce, and how does running MapReduce on locally attached disks
> > > > > > compare with using S3?
> > > >
> > >
> > > ---
> > > www.informera.de
> > > Hadoop & Big Data Services
> > >
> >
>
>
>
> --
>
> Thanks,
> John C
>
