This is what we do. If you configure HDFS to store its data on an attached Elastic Block Store (EBS) volume, you can take periodic snapshots of that volume, which are stored in S3. With this setup, you can easily launch additional clusters from the snapshotted data for testing and simulation purposes.
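For reference, pointing HDFS at the EBS volume is just a matter of setting the data directory in hdfs-site.xml. A minimal sketch, assuming the volume is mounted at /mnt/ebs (the mount point is illustrative, not from the original message):

```xml
<!-- hdfs-site.xml: store HDFS block data on the attached EBS volume.
     /mnt/ebs is an assumed mount point used here for illustration. -->
<property>
  <name>dfs.data.dir</name>
  <value>/mnt/ebs/hadoop/dfs/data</value>
</property>
```

From there, a periodic ec2-create-snapshot on the volume captures it to S3; EBS snapshots are incremental, so frequent snapshots stay cheap. It's safest to snapshot while writes are quiesced so the copied blocks are consistent.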
On Wed, Oct 14, 2009 at 2:25 PM, Andrew Hitchcock <[email protected]> wrote:
> My suggestion would be to run HDFS on your cluster, tell HBase to
> write to HDFS, and then make periodic snapshots of your data to S3.
>
> Regards,
> Andrew
