We (GumGum) have been happily using HBase on EC2 for the past 8 months. Here
is why we chose EC2:

(All of these points have been mentioned by Jon before, but I am reiterating
them here. I think it's important that people opposing EC2 understand that
*these considerations were the most important ones for us*.)

1) The product we were using was experimental.
2) We had no sys admins.
3) A downtime of a few minutes (to bring the cluster back up from a failure)
would not cost us a lot.
4) We were given a short time to bring our software to market.

During those eight months we have had only one major failure because of EC2.
And the failure did not happen suddenly. *EC2 warned us that the machine we
were running our namenode on had gone bad and that we should replace it.* We
simply booted a new instance (with our hadoop/hbase bundled in it) and
pointed the rest of the nodes to the new namenode.

With EBS, our data is safe. In fact, because of the *new EBS-backed instance
feature*, it has become even easier to manage hadoop/hbase on EC2.

Thus, it's not true that the people using HBase on EC2 are not aware of the
risks involved. They are absolutely ok with the risks! It's a choice we have
made deliberately.

Regards,
Vaibhav Puranik
http://aws-musings.com/




On Sat, Mar 13, 2010 at 11:28 AM, Bradford Stephens <
bradfordsteph...@gmail.com> wrote:

> I don't recommend our customers use EC2 -- especially when you can buy
> last-gen 8 core / 8 GB-16GB boxes for $1000-$2000 each, which is all
> HBase needs to be happy (unless you're running something like su.pr).
>
> That being said, we're prototyping and building EC2 mgmt scripts,
> because a lot of customers want to try out our platform there.
>
> In fact, we're rolling out EBS + HBase management on Crane, which is
> cloud management using Clojure.
>
> -B
>
> On Sat, Mar 13, 2010 at 1:13 PM, Jonathan Gray <jl...@streamy.com> wrote:
> > Prasen,
> >
> > You could definitely do something like that.  As long as you configure
> > everything in your Hadoop/HBase setup to use EBS volumes, you should be
> > able to spin the cluster down, turn off the nodes, and then bring them
> > back up at a later time with all the data still intact.
> >
> > JG
> >
> >> -----Original Message-----
> >> From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
> >> Sent: Saturday, March 13, 2010 8:42 AM
> >> To: hbase-user@hadoop.apache.org
> >> Subject: Re: on Hadoop reliability wrt. EC2 (was: Re: [databasepro-48]
> >> HUG9)
> >>
> >> On Sat, Mar 13, 2010 at 7:36 AM, prasenjit mukherjee
> >> <prasen....@gmail.com> wrote:
> >> > I agree that running 24/7 hbase servers on ec2 is not advisable. But
> >> > I need some suggestions for running mapred jobs (in batches) followed
> >> > by updating the results on an existing hbase server.
> >> >
> >> > Is it advisable to use EBS drives (one attached to each slave) and
> >> > have them configured as the HDFS storage directory, and then use
> >> > hbase on top of it? I am assuming that ec2 clusters can be shut down
> >> > and restarted (at a later point in time) to use the same hbase.
> >> >
> >> > -Prasen
> >> >
> >> > On Sat, Mar 13, 2010 at 1:56 AM, Andrew Purtell <apurt...@apache.org>
> >> wrote:
> >> >> During the Q&A period after my presentation at HUG9, it was
> >> >> interesting that some in the audience indicated they are running
> >> >> production Hadoop and/or HBase clusters on EC2. I want to follow up
> >> >> on some comments I made there.
> >> >>
> >> >> This is a little surprising, because currently the HDFS NameNode is
> >> >> a single point of failure which can bring the whole service down.
> >> >> That the NameNode is a SPOF is not quite so large a concern if you
> >> >> have the ability to engineer the particular server hosting the
> >> >> NameNode to be especially reliable. However, when architecting
> >> >> services on EC2, you must be mindful of its guarantees, or lack
> >> >> thereof. On EC2 the reliability of any given instance is not
> >> >> guaranteed, only the service in the aggregate.
> >> >>
> >> >> Running Hadoop on top of EC2 in production is thus not advised until
> >> >> there is a good hot fail over solution for the NameNode.
> >> >>
> >> >> AWS offers a form of hosted Hadoop called Elastic MapReduce:
> >> >> http://aws.amazon.com/elasticmapreduce/. Note this service treats
> >> >> the Hadoop/HDFS cluster as a transient unreliable construction. So
> >> >> should you.
> >> >>
> >> >> Regarding a hot fail over solution for the NameNode, there is some
> >> >> really interesting work ongoing at the moment -- "AvatarNode",
> >> >> possibly with inclusion of "BookKeeper" in the architecture.
> >> >>
> >> >>
> >> >>    http://hadoopblog.blogspot.com/2010/02/hadoop-namenode-high-availability.html
> >> >>
> >> >>
> >> >>    http://issues.apache.org/jira/browse/HDFS-976
> >> >>
> >> >>    http://issues.apache.org/jira/browse/HDFS-234
> >> >>
> >> >>    http://issues.apache.org/jira/secure/attachment/12399656/create.png
> >> >>
> >> >>    https://issues.apache.org/jira/browse/ZOOKEEPER-276
> >> >>
> >> >> Once something like the above is vetted and tested, of course my
> >> >> above advice changes and it would become possible to architect
> >> >> reliable Hadoop/HBase clusters on top of EC2 and similar IaaS clouds.
> >> >>
> >> >> In the meantime, EC2 and similar IaaS clouds are a great resource
> >> >> for prototyping, research and development, and hosting ephemeral
> >> >> clusters for QA or end to end system tests. The HBase EC2 scripts
> >> >> are a useful tool for doing such things with relative ease.
> >> >>
> >> >> Best regards,
> >> >>
> >> >>   - Andy
> >> >>
> >> >>
> >> >>
> >> >> ----- Original Message ----
> >> >> From: Jonathan Gray
> >> >> To: hbase-user@hadoop.apache.org
> >> >> Sent: Thu, March 11, 2010 3:01:22 PM
> >> >> Subject: RE: [databasepro-48] HUG9
> >> >>
> >> >> Pardon the link vomit, hopefully this comes across okay...
> >> >>
> >> >>
> >> >> HBase Project Update by Jonathan Gray
> >> >>
> >> >>
> >> >> http://wiki.apache.org/hadoop/HBase/HBasePresentations?action=AttachFile&do=get&target=HUG9_HBaseUpdate_JonathanGray.pdf
> >> >>
> >> >>
> >> >> HBase and HDFS by Todd Lipcon of Cloudera
> >> >>
> >> >>
> >> >> http://wiki.apache.org/hadoop/HBase/HBasePresentations?action=AttachFile&do=get&target=HUG9_HBaseAndHDFS_ToddLipcon_Cloudera.pdf
> >> >>
> >> >>
> >> >> HBase on EC2 by Andrew Purtell of Trend Micro
> >> >>
> >> >> http://hbase.s3.amazonaws.com/hbase/HBase-EC2-HUG9.pdf
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >
> >>
> >> I have not used EC2 extensively but some of the things you can do are
> >> very impressive in terms of spin up.
> >>
> >> As a sys-admin and a guy who worked at a data center, I would suggest
> >> shopping around. Do not fall in love with EC2 because it's hip. If you
> >> are short on cash, you can get 6 dedicated servers for $375.00 USD
> >> per month:
> >> http://www.leeware.com/services.html (I use leeware for some hosting).
> >> That is a big difference: 6 servers for $375 vs. 1 VM for $500.
> >>
> >> I am not saying use service X or service Y, but I do not see much
> >> value in EC2. If you have a small, strong ops team with
> >> kickstart+(puppet/CFEngine), you can get that "fast spin up" magic.
> >
> >
> >
>
>
>
> --
> http://www.drawntoscalehq.com --  The intuitive, cloud-scale data
> solution. Process, store, query, search, and serve all your data.
>
> http://www.roadtofailure.com -- The Fringes of Scalability, Social
> Media, and Computer Science
>
