The new scripts in trunk at src/contrib/ec2 will offer this approach soon. Right now they simply back HDFS with (volatile) instance storage and rely on fewer instances than the HDFS replication factor (default = 3) crashing or terminating at one time. Using EBS is a big win for its persistence and its transparent background snapshot facility. One thing our scripts will still have to deal with is how to back a cluster of ~100 nodes with EBS volumes while also supporting elastic operation, creating volumes on the fly as necessary.
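To put a rough number on the instance-storage risk, here is a minimal sketch, not something taken from the scripts. It assumes each block's replicas sit on distinct DataNodes chosen uniformly at random, which ignores HDFS's actual rack-aware placement, and both function names are made up for illustration.

```python
from math import comb


def can_lose_blocks(failed: int, replication: int = 3) -> bool:
    """A block can only vanish if every one of its replicas is on a failed node."""
    return failed >= replication


def block_loss_probability(nodes: int, failed: int, replication: int = 3) -> float:
    """Chance that one given block loses all of its replicas when `failed`
    out of `nodes` DataNodes disappear at the same time.

    Assumption (hypothetical model): replicas live on `replication` distinct
    nodes picked uniformly at random, and the failed set is also uniform.
    """
    if replication > nodes:
        raise ValueError("replication factor cannot exceed the node count")
    if not 0 <= failed <= nodes:
        raise ValueError("failed must be between 0 and nodes")
    if not can_lose_blocks(failed, replication):
        return 0.0
    # Count the failure sets that contain all replica nodes of the block,
    # divided by the count of all possible failure sets of that size.
    return comb(nodes - replication, failed - replication) / comb(nodes, failed)


if __name__ == "__main__":
    for failed in (2, 3, 10):
        p = block_loss_probability(nodes=100, failed=failed)
        print(f"{failed} of 100 nodes lost at once: per-block loss probability {p:.2e}")
```

The per-block figure is small, but a cluster holds a great many blocks, so the chance of losing at least one of them grows quickly once the number of simultaneous failures reaches the replication factor.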
Also in the cards is performance and stability testing with the HBase root filesystem on Hadoop's S3N fs (http://wiki.apache.org/hadoop/AmazonS3). I tried some limited testing with the S3 fs option just for basic filesystem operations -- albeit on a 209 GB file -- and had an unhappy result, so I will avoid that for now.

Some time ago Clint Morgan ran a simple performance comparison, and here were his results:

http://markmail.org/message/xqhwgdw25oi7u3rb

"So to summarize:
loading data: almost twice as slow
A long scan is about 1.5 times slower
short scans are over an order of magnitude slower
and random reads (done on the sorted "scan") are over 2 orders of magnitude slower"

In some fairly short time we should have a replacement for the HBase S3 related page up on the wiki. In the meantime you may consider perusing http://www.google.com/search?hl=en&q=hbase+s3

   - Andy

________________________________
From: Vaibhav Puranik <vpura...@gmail.com>
To: hbase-user@hadoop.apache.org
Sent: Mon, November 16, 2009 9:46:52 AM
Subject: Re: Hbase on Amazon S3?

We have had HBase 0.20.0 running on EC2 with an EBS volume since July 2009. We are using m1.large machines for all 4 nodes. All of our data resides on the EBS volume. This helps us back up the data. It also helps us bring up a separate cluster with the same data for QA purposes. So far no problems. If you have any specific questions, please let us know.

Regards,
Vaibhav Puranik
Gumgum

On Mon, Nov 16, 2009 at 9:38 AM, Something Something <mailinglist...@gmail.com> wrote:

> Anyone installed HBase on S3 (or EC2 for that matter)? Any pointers would
> be greatly appreciated. Thanks.