Malcolm Matalka wrote:
If this is not the correct place to ask Hadoop + EC2 questions please
let me know.
I am trying to get a handle on how to use Hadoop on EC2 before
committing any money to it. My question is: how do I maintain a
persistent HDFS between restarts of instances? Most of the tutorials I
have found involve the cluster being wiped once all the instances are
shut down, but in my particular case I will be feeding the output of
the previous day's run in as the input of the current day's run, and
this data will grow large over time. I see I can use S3 as the file
system; alternatively, would I just create an EBS volume for each
instance? What are my options?
EBS would cost you more, and you'd lose the locality of
storage-per-machine. If you stick the output of each run back into S3,
then the next job has no locality and a higher startup overhead while
it pulls the data down, but you don't pay for that download (just the
time it takes).
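For what it's worth, a rough sketch of the S3 route: point Hadoop at an
s3n:// URI, either per-job on the command line or as the default
filesystem in your config (hadoop-site.xml or core-site.xml depending
on your version). The bucket name and key values below are placeholders
you'd substitute with your own:

  <property>
    <name>fs.default.name</name>
    <value>s3n://your-bucket</value>
  </property>
  <property>
    <name>fs.s3n.awsAccessKeyId</name>
    <value>YOUR_AWS_ACCESS_KEY</value>
  </property>
  <property>
    <name>fs.s3n.awsSecretAccessKey</name>
    <value>YOUR_AWS_SECRET_KEY</value>
  </property>

A common middle ground is to keep HDFS as the working filesystem while
the cluster is up, then copy each day's output to S3 with distcp before
shutting down, and copy it back in as input the next day (paths here
are made up for the example):

  hadoop distcp hdfs://namenode:9000/jobs/2009-05-01/output \
      s3n://your-bucket/jobs/2009-05-01/output

That way the job itself still gets local-disk locality, and S3 is only
paying the persistence role between cluster lifetimes.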