On Mon, Mar 5, 2012 at 7:40 AM, John Conwell <j...@iamjohn.me> wrote:
> AWS MapReduce (EMR) does not use S3 for its HDFS persistence. If it did
> your S3 billing would be massive :) EMR reads all input jar files and
> input data from S3, but it copies these files down to its local disk. It
> then starts the MR process, doing all HDFS reads and writes to the
> local disks. At the end of the MR job, it copies the MR job output and all
> process logs to S3, and then tears down the VM instances.
>
> You can see this for yourself if you spin up a small EMR cluster, but turn
> off the configuration flag that kills the VMs at the end of the MR job.
> Then look at the hadoop configuration files to see how hadoop is
> configured.
>
> I really like EMR. Amazon has done a lot of work to optimize the hadoop
> configurations and VM instance AMIs to execute MR jobs fairly efficiently
> on a VM cluster. I had to do a lot of (expensive) trial and error work to
> figure out an optimal hadoop / VM configuration to run our MR jobs without
> crashing / timing out the jobs. The only reason we didn't standardize on
> EMR was that it strongly bound your code base / process to using EMR for
> hadoop processing, vs a flexible infrastructure that could use a local
> cluster or cluster on a different cloud provider.

Thanks for your input. I am assuming HDFS is created on ephemeral disks and
not EBS. Also, is it possible to share some of your findings?

> On Sun, Mar 4, 2012 at 8:51 AM, Mohit Anchlia <mohitanch...@gmail.com>
> wrote:
>
> > As far as I see in the docs it looks like you could also use hdfs instead
> > of s3. But what I am not sure of is whether these are local disks or EBS.
> >
> > On Sun, Mar 4, 2012 at 2:27 AM, Hannes Carl Meyer
> > <hannesc...@googlemail.com> wrote:
> >
> > > Hi,
> > >
> > > yes, it's loaded from S3. Imho Amazon AWS Map-Reduce is pretty slow.
> > > The setup is done pretty fast and there are some configuration
> > > parameters you can bypass - for example blocksizes etc. - but in the
> > > end imho setting up ec2 instances by copying images is the better
> > > alternative.
> > >
> > > Kind Regards
> > >
> > > Hannes
> > >
> > > On Sun, Mar 4, 2012 at 2:31 AM, Mohit Anchlia <mohitanch...@gmail.com>
> > > wrote:
> > >
> > > > I think I found the answer to this question. However, it's still not
> > > > clear if HDFS is on local disk or EBS volumes. Does anyone know?
> > > >
> > > > On Sat, Mar 3, 2012 at 3:54 PM, Mohit Anchlia
> > > > <mohitanch...@gmail.com> wrote:
> > > >
> > > > > Just want to check how many are using AWS mapreduce and understand
> > > > > the pros and cons of Amazon's MapReduce machines? Is it true that
> > > > > these map reduce machines are really reading and writing from S3
> > > > > instead of local disks? Has anyone found issues with Amazon
> > > > > MapReduce, and how does using MapReduce on locally attached disks
> > > > > compare with using S3?
> > >
> > > ---
> > > www.informera.de
> > > Hadoop & Big Data Services
>
> --
> Thanks,
> John C
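
For anyone who wants to follow John's suggestion of keeping a small cluster alive and reading the hadoop configuration files, here is a rough sketch of how one might pull the HDFS storage directories out of an hdfs-site.xml. It is only an illustration: the sample XML and its /mnt paths are made up and are not taken from a real EMR node, and the helper names parse_hadoop_conf and datanode_dirs are hypothetical. The property names dfs.data.dir (Hadoop 0.20/1.x) and dfs.datanode.data.dir (2.x) are the standard ones.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the standard Hadoop *-site.xml layout. The values are
# invented for illustration and do not come from a real EMR cluster.
SAMPLE_HDFS_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/mnt/var/lib/hadoop/dfs,/mnt1/var/lib/hadoop/dfs</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
"""


def parse_hadoop_conf(xml_text):
    """Return a dict of property name -> value from Hadoop configuration XML."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def datanode_dirs(props):
    """List the directories where the DataNode keeps HDFS blocks."""
    # dfs.data.dir is the Hadoop 0.20/1.x name; dfs.datanode.data.dir is the 2.x name.
    raw = props.get("dfs.data.dir") or props.get("dfs.datanode.data.dir") or ""
    return [d.strip() for d in raw.split(",") if d.strip()]


if __name__ == "__main__":
    conf = parse_hadoop_conf(SAMPLE_HDFS_SITE)
    for directory in datanode_dirs(conf):
        print(directory)
```

On a real node you would read the actual file contents instead of the sample string, then compare the listed directories against the mount table to see which devices back them. That comparison, not the config file alone, is what would tell you whether HDFS sits on ephemeral disks or EBS.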