Re: on Hadoop reliability wrt. EC2 (was: Re: [databasepro-48] HUG9)

Bradford Stephens Sat, 13 Mar 2010 11:29:10 -0800

I don't recommend our customers use EC2 -- especially when you can buy
last-gen 8-core / 8-16 GB boxes for $1000-$2000 each, which is all
HBase needs to be happy (unless you're running something like su.pr).


That being said, we're prototyping and building EC2 mgmt scripts,
because a lot of customers want to try out our platform there.

In fact, we're rolling out EBS + HBase management on Crane, which is
cloud management using Clojure.

-B

On Sat, Mar 13, 2010 at 1:13 PM, Jonathan Gray <jl...@streamy.com> wrote:
> Prasen,
>
> You could definitely do something like that.  As long as you configure
> everything in your Hadoop/HBase setup to use EBS volumes, you should be able
> to spin the cluster down, turn off the nodes, and then bring them back up at
> a later time with all the data still intact.
>
> JG
>
>> -----Original Message-----
>> From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
>> Sent: Saturday, March 13, 2010 8:42 AM
>> To: hbase-user@hadoop.apache.org
>> Subject: Re: on Hadoop reliability wrt. EC2 (was: Re: [databasepro-48]
>> HUG9)
>>
>> On Sat, Mar 13, 2010 at 7:36 AM, prasenjit mukherjee
>> <prasen....@gmail.com> wrote:
>> > I agree that running 24/7 hbase servers on ec2 is not advisable. But I
>> > need some suggestions for running mapred-jobs (in batches) followed
>> > by updating the results on an existing hbase server.
>> >
>> > Is it advisable to use EBS drives (attached to each slave) and have
>> > them configured as the HDFS storage directories? And then use
>> > hbase on top of it. I am assuming that ec2 clusters can be shut down
>> > and restarted (at a later point in time) to use the same hbase.
>> >
>> > -Prasen
>> >
>> > On Sat, Mar 13, 2010 at 1:56 AM, Andrew Purtell <apurt...@apache.org> wrote:
>> >> During the Q&A period after my presentation at HUG9, it was
>> >> interesting that some in the audience indicated they are running
>> >> production Hadoop and/or HBase clusters on EC2. I want to follow up
>> >> on some comments I made there.
>> >>
>> >> This is a little surprising, because currently the HDFS NameNode is
>> >> a single point of failure which can bring the whole service down.
>> >> That the NameNode is a SPOF is not quite so large a concern if you
>> >> have the ability to engineer the particular server hosting the
>> >> NameNode to be especially reliable. However, when architecting
>> >> services on EC2, you must be mindful of its guarantees, or lack
>> >> thereof. On EC2 the reliability of any given instance is not
>> >> guaranteed, only the service in the aggregate.
>> >>
>> >> Running Hadoop on top of EC2 in production is thus not advised until
>> >> there is a good hot fail over solution for the NameNode.
>> >>
>> >> AWS offers a form of hosted Hadoop called Elastic MapReduce:
>> >> http://aws.amazon.com/elasticmapreduce/. Note this service treats
>> >> the Hadoop/HDFS cluster as a transient unreliable construction. So
>> >> should you.
>> >>
>> >> Regarding a hot fail over solution for the NameNode, there is some
>> >> really interesting work ongoing at the moment -- "AvatarNode",
>> >> possibly with inclusion of "BookKeeper" in the architecture.
>> >>
>> >>
>> >>    http://hadoopblog.blogspot.com/2010/02/hadoop-namenode-high-availability.html
>> >>
>> >>
>> >>    http://issues.apache.org/jira/browse/HDFS-976
>> >>
>> >>    http://issues.apache.org/jira/browse/HDFS-234
>> >>
>> >>    http://issues.apache.org/jira/secure/attachment/12399656/create.png
>> >>
>> >>    https://issues.apache.org/jira/browse/ZOOKEEPER-276
>> >>
>> >> Once something like the above is vetted and tested, of course my
>> >> above advice changes and it would become possible to architect
>> >> reliable Hadoop/HBase clusters on top of EC2 and similar IaaS clouds.
>> >>
>> >> In the meantime, EC2 and similar IaaS clouds are a great resource
>> >> for prototyping, research and development, and hosting ephemeral
>> >> clusters for QA or end to end system tests. The HBase EC2 scripts
>> >> are a useful tool for doing such things with relative ease.
>> >>
>> >> Best regards,
>> >>
>> >>   - Andy
>> >>
>> >>
>> >>
>> >> ----- Original Message ----
>> >> From: Jonathan Gray
>> >> To: hbase-user@hadoop.apache.org
>> >> Sent: Thu, March 11, 2010 3:01:22 PM
>> >> Subject: RE: [databasepro-48] HUG9
>> >>
>> >> Pardon the link vomit, hopefully this comes across okay...
>> >>
>> >>
>> >> HBase Project Update by Jonathan Gray
>> >>
>> >>
>> >> http://wiki.apache.org/hadoop/HBase/HBasePresentations?action=AttachFile&do=get&target=HUG9_HBaseUpdate_JonathanGray.pdf
>> >>
>> >>
>> >> HBase and HDFS by Todd Lipcon of Cloudera
>> >>
>> >>
>> >> http://wiki.apache.org/hadoop/HBase/HBasePresentations?action=AttachFile&do=get&target=HUG9_HBaseAndHDFS_ToddLipcon_Cloudera.pdf
>> >>
>> >>
>> >> HBase on EC2 by Andrew Purtell of Trend Micro
>> >>
>> >> http://hbase.s3.amazonaws.com/hbase/HBase-EC2-HUG9.pdf
>> >>
>> >>
>> >>
>> >>
>> >>
>> >
>>
>> I have not used EC2 extensively but some of the things you can do are
>> very impressive in terms of spin up.
>>
>> As a sys-admin and a guy who worked at a data center, I would suggest
>> shopping around. Do not fall in love with EC2 because it's hip. If you
>> are short on cash, you can get 6 dedicated servers for $375.00 USD per
>> month: http://www.leeware.com/services.html (I use leeware for some
>> hosting). That is a big difference: 6 servers for $375 vs 1 VM for $500.
>>
>> I am not saying use service X or service Y, but I do not see much
>> value. If you have a small, strong ops team with
>> kickstart+(puppet/CFEngine) you can get that "fast spin up" magic.
>
>
>
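
On Prasen's EBS question, here is a minimal sketch of what "HDFS storage
directories on EBS" could look like as configuration. It assumes the
Hadoop 0.20-era property names dfs.name.dir and dfs.data.dir. The mount
points (/mnt/ebs0, /mnt/ebs1) and the helper name hdfs_site_xml are made up
for illustration and do not come from anyone's actual setup in this thread.

```python
from xml.sax.saxutils import escape


def hdfs_site_xml(name_dirs, data_dirs):
    """Render a minimal hdfs-site.xml that keeps HDFS state in the given dirs.

    name_dirs: directories for NameNode metadata (dfs.name.dir)
    data_dirs: directories for DataNode blocks (dfs.data.dir)
    Both are expected to be paths on EBS-backed mounts (an assumption here).
    """
    props = {
        "dfs.name.dir": ",".join(name_dirs),
        "dfs.data.dir": ",".join(data_dirs),
    }
    lines = ['<?xml version="1.0"?>', "<configuration>"]
    for name, value in props.items():
        lines += [
            "  <property>",
            f"    <name>{escape(name)}</name>",
            f"    <value>{escape(value)}</value>",
            "  </property>",
        ]
    lines.append("</configuration>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical mount points for two EBS volumes attached to one node.
    print(hdfs_site_xml(
        name_dirs=["/mnt/ebs0/hdfs/name"],
        data_dirs=["/mnt/ebs0/hdfs/data", "/mnt/ebs1/hdfs/data"],
    ))
```

Each slave would list its own attached volumes in dfs.data.dir. As JG notes
above, the data only survives a shutdown if every such directory lives on
EBS rather than on instance-local disk.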



-- 
http://www.drawntoscalehq.com --  The intuitive, cloud-scale data
solution. Process, store, query, search, and serve all your data.

http://www.roadtofailure.com -- The Fringes of Scalability, Social
Media, and Computer Science
