We (GumGum) have been happily running HBase on EC2 for the past 8 months. Here is why we chose EC2:
(Jon has mentioned all of these points before, but I am reiterating them here. I think it's important that people opposing EC2 understand that *these were the most important considerations for us*.)

1) The product we were using was experimental.
2) We had no sys admins.
3) A few minutes of downtime (to bring the cluster back up from a failure) would not cost us much.
4) We were given a short time to bring our software to market.

In those eight months we have had only one major failure because of EC2, and that failure did not happen suddenly. *EC2 warned us that the machine we were running our namenode on had gone bad and that we should replace it.* We simply booted a new instance (with our hadoop/hbase bundled in it) and pointed the rest of the nodes to the new namenode. With EBS, our data is safe. In fact, because of the *new EBS-backed instance feature*, it has become even easier to manage hadoop/hbase on EC2.

So it's not true that people using HBase on EC2 are unaware of the risks involved. They are absolutely OK with the risks! It's a choice we have made deliberately.

Regards,
Vaibhav Puranik
http://aws-musings.com/

On Sat, Mar 13, 2010 at 11:28 AM, Bradford Stephens <bradfordsteph...@gmail.com> wrote:

> I don't recommend our customers use EC2 -- especially when you can buy
> last-gen 8 core / 8 GB-16GB boxes for $1000-$2000 each, which is all
> HBase needs to be happy (unless you're running something like su.pr).
>
> That being said, we're prototyping and building EC2 mgmt scripts,
> because a lot of customers want to try out our platform there.
>
> In fact, we're rolling out EBS + HBase management on Crane, which is
> cloud management using Clojure.
>
> -B
>
> On Sat, Mar 13, 2010 at 1:13 PM, Jonathan Gray <jl...@streamy.com> wrote:
> > Prasen,
> >
> > You could definitely do something like that.
> > As long as you keep everything
> > for your Hadoop/HBase setup to use EBS volumes, you should be able to spin
> > the cluster down, turn off the nodes, and then bring them back up at a later
> > time with all the data still intact.
> >
> > JG
> >
> >> -----Original Message-----
> >> From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
> >> Sent: Saturday, March 13, 2010 8:42 AM
> >> To: hbase-user@hadoop.apache.org
> >> Subject: Re: on Hadoop reliability wrt. EC2 (was: Re: [databasepro-48] HUG9)
> >>
> >> On Sat, Mar 13, 2010 at 7:36 AM, prasenjit mukherjee <prasen....@gmail.com> wrote:
> >> > I agree that running 24/7 hbase servers on ec2 is not advisable. But I
> >> > need some suggestions for running mapred-jobs ( in batches ) followed
> >> > by updating the results on an existing hbase server.
> >> >
> >> > Is it advisable to use EBS drives ( attached to each different slave
> >> > ) and have them configured as HDFS Storage Directory ? And then use
> >> > hbase on top of it. I am assuming that ec2 clusters can be shutdown
> >> > and restarted ( at a later point of time ) to use the same hbase.
> >> >
> >> > -Prasen
> >> >
> >> > On Sat, Mar 13, 2010 at 1:56 AM, Andrew Purtell <apurt...@apache.org> wrote:
> >> >> During the Q&A period after my presentation at HUG9, it was
> >> >> interesting that some in the audience indicated they are running
> >> >> production Hadoop and/or HBase clusters on EC2. I want to follow up
> >> >> on some comments I made there.
> >> >>
> >> >> This is a little surprising, because currently the HDFS NameNode is
> >> >> a single point of failure which can bring the whole service down.
> >> >> That the NameNode is a SPOF is not quite so large a concern if you
> >> >> have the ability to engineer the particular server hosting the
> >> >> NameNode to be especially reliable. However, when architecting
> >> >> services on EC2, you must be mindful of its guarantees, or lack thereof.
> >> >> On EC2 the reliability of any given instance is not guaranteed, only
> >> >> the service in the aggregate.
> >> >>
> >> >> Running Hadoop on top of EC2 in production is thus not advised until
> >> >> there is a good hot fail over solution for the NameNode.
> >> >>
> >> >> AWS offers a form of hosted Hadoop called Elastic MapReduce:
> >> >> http://aws.amazon.com/elasticmapreduce/. Note this service treats the
> >> >> Hadoop/HDFS cluster as a transient unreliable construction. So should
> >> >> you.
> >> >>
> >> >> Regarding a hot fail over solution for the NameNode, there is some
> >> >> really interesting work ongoing at the moment -- "AvatarNode",
> >> >> possibly with inclusion of "BookKeeper" in the architecture.
> >> >>
> >> >> http://hadoopblog.blogspot.com/2010/02/hadoop-namenode-high-availability.html
> >> >>
> >> >> http://issues.apache.org/jira/browse/HDFS-976
> >> >>
> >> >> http://issues.apache.org/jira/browse/HDFS-234
> >> >> http://issues.apache.org/jira/secure/attachment/12399656/create.png
> >> >> https://issues.apache.org/jira/browse/ZOOKEEPER-276
> >> >>
> >> >> Once something like the above is vetted and tested, of course my
> >> >> above advice changes and it would become possible to architect
> >> >> reliable Hadoop/HBase clusters on top of EC2 and similar IaaS clouds.
> >> >>
> >> >> In the meantime, EC2 and similar IaaS clouds are a great resource
> >> >> for prototyping, research and development, and hosting ephemeral
> >> >> clusters for QA or end to end system tests. The HBase EC2 scripts are
> >> >> a useful tool for doing such things with relative ease.
> >> >>
> >> >> Best regards,
> >> >>
> >> >> - Andy
> >> >>
> >> >> ----- Original Message ----
> >> >> From: Jonathan Gray
> >> >> To: hbase-user@hadoop.apache.org
> >> >> Sent: Thu, March 11, 2010 3:01:22 PM
> >> >> Subject: RE: [databasepro-48] HUG9
> >> >>
> >> >> Pardon the link vomit, hopefully this comes across okay...
> >> >>
> >> >> HBase Project Update by Jonathan Gray
> >> >> http://wiki.apache.org/hadoop/HBase/HBasePresentations?action=AttachFile&do=get&target=HUG9_HBaseUpdate_JonathanGray.pdf
> >> >>
> >> >> HBase and HDFS by Todd Lipcon of Cloudera
> >> >> http://wiki.apache.org/hadoop/HBase/HBasePresentations?action=AttachFile&do=get&target=HUG9_HBaseAndHDFS_ToddLipcon_Cloudera.pdf
> >> >>
> >> >> HBase on EC2 by Andrew Purtell of Trend Micro
> >> >> http://hbase.s3.amazonaws.com/hbase/HBase-EC2-HUG9.pdf
> >> >>
> >> >
> >>
> >> I have not used EC2 extensively but some of the things you can do are
> >> very impressive in terms of spin up.
> >>
> >> As a sys-admin and a guy who worked at a data center, I would suggest
> >> to shop around. Do not fall in love with EC2 because it's hip. If you
> >> are short on cash, you can get 6 dedicated services for $375.00 USD
> >> per month: http://www.leeware.com/services.html. (I use leeware for
> >> some hosting.) That is a big difference: 6 servers for $375 vs 1 VM
> >> for $500.
> >>
> >> I am not saying use service X or service Y, but I do not see much
> >> value. If you have a small strong ops team with
> >> kickstart+(puppet/CFEngine) you can get that "fast spin up" magic.
> >
>
> --
> http://www.drawntoscalehq.com -- The intuitive, cloud-scale data
> solution. Process, store, query, search, and serve all your data.
>
> http://www.roadtofailure.com -- The Fringes of Scalability, Social
> Media, and Computer Science
>