I don't recommend that our customers use EC2 -- especially when you can buy last-gen 8-core boxes with 8-16 GB of RAM for $1000-$2000 each, which is all HBase needs to be happy (unless you're running something like su.pr).
That being said, we're prototyping and building EC2 mgmt scripts, because a lot of customers want to try out our platform there. In fact, we're rolling out EBS + HBase management on Crane, which is cloud management using Clojure.

-B

On Sat, Mar 13, 2010 at 1:13 PM, Jonathan Gray <jl...@streamy.com> wrote:
> Prasen,
>
> You could definitely do something like that. As long as you keep everything
> for your Hadoop/HBase setup to use EBS volumes, you should be able to spin
> the cluster down, turn off the nodes, and then bring them back up at a later
> time with all the data still intact.
>
> JG
>
>> -----Original Message-----
>> From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
>> Sent: Saturday, March 13, 2010 8:42 AM
>> To: hbase-user@hadoop.apache.org
>> Subject: Re: on Hadoop reliability wrt. EC2 (was: Re: [databasepro-48]
>> HUG9)
>>
>> On Sat, Mar 13, 2010 at 7:36 AM, prasenjit mukherjee
>> <prasen....@gmail.com> wrote:
>> > I agree that running 24/7 hbase servers on ec2 is not advisable. But I
>> > need some suggestions for running mapred-jobs (in batches) followed
>> > by updating the results on an existing hbase server.
>> >
>> > Is it advisable to use EBS drives (attached to each different slave)
>> > and have them configured as HDFS Storage Directory? And then use
>> > hbase on top of it. I am assuming that ec2 clusters can be shutdown
>> > and restarted (at a later point of time) to use the same hbase.
>> >
>> > -Prasen
>> >
>> > On Sat, Mar 13, 2010 at 1:56 AM, Andrew Purtell <apurt...@apache.org> wrote:
>> >> During the Q&A period after my presentation at HUG9, it was
>> >> interesting that some in the audience indicated they are running
>> >> production Hadoop and/or HBase clusters on EC2. I want to follow up
>> >> on some comments I made there.
>> >>
>> >> This is a little surprising, because currently the HDFS NameNode is a
>> >> single point of failure which can bring the whole service down.
>> >> That the NameNode is a SPOF is not quite so large a concern if you
>> >> have the ability to engineer the particular server hosting the
>> >> NameNode to be especially reliable. However, when architecting
>> >> services on EC2, you must be mindful of its guarantees, or lack
>> >> thereof. On EC2 the reliability of any given instance is not
>> >> guaranteed, only the service in the aggregate.
>> >>
>> >> Running Hadoop on top of EC2 in production is thus not advised until
>> >> there is a good hot fail over solution for the NameNode.
>> >>
>> >> AWS offers a form of hosted Hadoop called Elastic MapReduce:
>> >> http://aws.amazon.com/elasticmapreduce/. Note this service treats the
>> >> Hadoop/HDFS cluster as a transient unreliable construction. So should
>> >> you.
>> >>
>> >> Regarding a hot fail over solution for the NameNode, there is some
>> >> really interesting work ongoing at the moment -- "AvatarNode",
>> >> possibly with inclusion of "BookKeeper" in the architecture.
>> >>
>> >> http://hadoopblog.blogspot.com/2010/02/hadoop-namenode-high-availability.html
>> >>
>> >> http://issues.apache.org/jira/browse/HDFS-976
>> >>
>> >> http://issues.apache.org/jira/browse/HDFS-234
>> >> http://issues.apache.org/jira/secure/attachment/12399656/create.png
>> >> https://issues.apache.org/jira/browse/ZOOKEEPER-276
>> >>
>> >> Once something like the above is vetted and tested, of course my
>> >> above advice changes and it would become possible to architect
>> >> reliable Hadoop/HBase clusters on top of EC2 and similar IaaS clouds.
>> >>
>> >> In the meantime, EC2 and similar IaaS clouds are a great resource
>> >> for prototyping, research and development, and hosting ephemeral
>> >> clusters for QA or end to end system tests. The HBase EC2 scripts are
>> >> a useful tool for doing such things with relative ease.
>> >>
>> >> Best regards,
>> >>
>> >> - Andy
>> >>
>> >> ----- Original Message ----
>> >> From: Jonathan Gray
>> >> To: hbase-user@hadoop.apache.org
>> >> Sent: Thu, March 11, 2010 3:01:22 PM
>> >> Subject: RE: [databasepro-48] HUG9
>> >>
>> >> Pardon the link vomit, hopefully this comes across okay...
>> >>
>> >> HBase Project Update by Jonathan Gray
>> >> http://wiki.apache.org/hadoop/HBase/HBasePresentations?action=AttachFile&do=get&target=HUG9_HBaseUpdate_JonathanGray.pdf
>> >>
>> >> HBase and HDFS by Todd Lipcon of Cloudera
>> >> http://wiki.apache.org/hadoop/HBase/HBasePresentations?action=AttachFile&do=get&target=HUG9_HBaseAndHDFS_ToddLipcon_Cloudera.pdf
>> >>
>> >> HBase on EC2 by Andrew Purtell of Trend Micro
>> >> http://hbase.s3.amazonaws.com/hbase/HBase-EC2-HUG9.pdf
>> >
>>
>> I have not used EC2 extensively but some of the things you can do are
>> very impressive in terms of spin up.
>>
>> As a sys-admin and a guy who worked at a data center, I would suggest
>> to shop around. Do not fall in love with EC2 because it's hip. If you
>> are short on cash, you can get 6 dedicated servers for $375.00 USD
>> per month: http://www.leeware.com/services.html. (I use leeware for
>> some hosting.) That is a big difference: 6 servers for $375 vs 1 VM
>> for $500.
>>
>> I am not saying use service X or service Y, but I do not see much
>> value. If you have a small strong ops team with
>> kickstart+(puppet/CFEngine) you can get that "fast spin up" magic.
>

-- 
http://www.drawntoscalehq.com -- The intuitive, cloud-scale data solution. Process, store, query, search, and serve all your data.

http://www.roadtofailure.com -- The Fringes of Scalability, Social Media, and Computer Science