If this is not the correct place to ask Hadoop + EC2 questions please let me know.
I am trying to get a handle on how to use Hadoop on EC2 before committing any money to it. My question is: how do I maintain a persistent HDFS between restarts of instances? Most of the tutorials I have found involve the cluster being wiped once all the instances are shut down, but in my particular case I will be feeding the output of the previous day's run in as the input of the current day's run, and this data will grow large over time. I see I can use S3 as the file system; alternatively, would I just create an EBS volume for each instance? What are my options? Thanks
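For concreteness, here is a sketch of what I think the S3-as-filesystem option would look like, based on what I've read (the bucket name is made up, and I'm assuming the `s3n://` connector is the right one to use):

```xml
<!-- core-site.xml: point Hadoop's default filesystem at an S3 bucket
     instead of HDFS, so data survives instance termination.
     Bucket name "my-hadoop-data" is hypothetical. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>s3n://my-hadoop-data</value>
  </property>
  <property>
    <name>fs.s3n.awsAccessKeyId</name>
    <value>YOUR_AWS_ACCESS_KEY</value>
  </property>
  <property>
    <name>fs.s3n.awsSecretAccessKey</name>
    <value>YOUR_AWS_SECRET_KEY</value>
  </property>
</configuration>
```

Is this roughly the right idea, or is the usual practice to keep HDFS on EBS volumes and only copy results to S3 between runs?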