Hey!

I strongly disagree with Tatsuya's assessment of HBase, specifically the points below:


On Wed, Oct 14, 2009 at 12:31 AM, Tatsuya Kawano
<tatsuy...@snowcocoa.info> wrote:
> HI Keith,
>
> On Wed, Oct 14, 2009 at 11:58 AM, Keith Thomas <keith.tho...@gmail.com> wrote:
>> Am I correct in understanding that a farm of EC2 instances with Hadoop and
>> HBase installed and configured individually by myself are the quickest and
>> most effective way to progress with this effort?
>
> Well, you're not wrong. To run HBase on Amazon Web Services, you
> should use EC2 instances and configure them by yourself. Make sure you
> pick Extra Large instances from EC2 (see:
> http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A8), and you may
> also want EBS volumes as the storage devices rather than S3. (S3 is
> good for archiving data)
>
>
> But...
>
> Are you really sure you want to use HBase for your Grail based web
> application on the cloud? I would definitely recommend MySQL which
> should be more suitable for both web applications and Amazon Web
> Services environment. HBase is not a cloud database and is currently
> more suitable for batch processing with billions of records.

This is not a correct assessment. First off, what does it mean to be
a "cloud database"? And secondly, HBase is suitable for serving
real-time queries; that is a major use case for us here at
StumbleUpon.

>
> If you use HBase for this purpose, you will
>
> -- loose the Object Relational Mapping support from Grails.
> -- have to take care of database transactions and secondary indices by 
> yourself.

You do "lose" the transactions (if you even used them) and you may
have to maintain secondary indexes, but you gain a flexible
schema-less column-oriented datastore that scales far beyond anything
MySQL can do.
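
To give a rough idea of what "maintaining a secondary index yourself"
involves, here is a minimal sketch. It assumes the common pattern of a
second table whose row key is the indexed value plus the primary row
key. Plain Python dicts stand in for the two tables; this is not the
HBase client API, and the names (users, users_by_email, put_user,
find_by_email) are made up for illustration.

```python
# Plain dicts stand in for two tables; this is NOT the HBase client API.
users = {}           # data table:  row key -> {column: value}
users_by_email = {}  # index table: "<email>\x00<row key>" -> row key


def index_key(email, row_key):
    """Build the index row key: indexed value, a separator, then the primary key."""
    return f"{email}\x00{row_key}"


def put_user(row_key, columns):
    """Write a row and keep the email index in step with it."""
    old = users.get(row_key)
    if old and "email" in old and old["email"] != columns.get("email"):
        # The email changed (or was dropped): delete the stale index entry.
        users_by_email.pop(index_key(old["email"], row_key), None)
    users[row_key] = dict(columns)
    if "email" in columns:
        users_by_email[index_key(columns["email"], row_key)] = row_key


def find_by_email(email):
    """Look up row keys by email via a prefix match on the index table."""
    prefix = email + "\x00"
    return sorted(rk for key, rk in users_by_email.items() if key.startswith(prefix))
```

Note that the data write and the index write are two separate
operations with nothing tying them together, which is exactly the part
you have to take care of yourself.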

> -- likely suffered from a latency of data retrieval, unless you use memcached.

This is not correct. HBase has good caching built in and takes full
advantage of Linux's disk buffer cache. This is much more effective
than MySQL, because it is easier to get more RAM across 10-20 machines
(or more) than in 1-2 machines.


> -- need more server resources than MySQL. MySQL can run on 1 EC2
> instance, while HBase requires about 12 EC2 instances (2 for masters
> and DFS namenodes, 5 for region servers and DFS datanodes, 5 for
> ZooKeeper)

Again, this is not entirely correct; you are overspeccing quite a bit.
3 ZK nodes is fine, and they should be able to run on the "master"
nodes. You also reveal a misunderstanding by suggesting to the OP
that you can run the namenode on 2 hosts and that is that. The
situation for HDFS is (unfortunately) more complicated than that.

It is entirely possible to run an HBase cluster on 4 EC2 instances:
1 master and 3 datanodes. Maybe even fewer, but then you are
sacrificing data reliability.


I appreciate your enthusiasm for HBase, but please don't mislead our
users so badly!

Thanks,
-ryan

>
>
> Is there any special reason to use HBase for you web application?
>
> Thanks,
>
> --
> Tatsuya Kawano (Mr.)
> Tokyo, Japan
>
