Re: Hbase VS Hive

Edward Capriolo Thu, 04 Mar 2010 09:41:13 -0800

On Thu, Mar 4, 2010 at 12:13 PM, Fitrah Elly Firdaus
<fitrah.fird...@gmail.com> wrote:
> On 03/04/2010 01:19 AM, Michael Segel wrote:
>>
>>
>>
>>>
>>> Date: Thu, 4 Mar 2010 00:42:11 +0700
>>> From: fitrah.fird...@gmail.com
>>> To: common-user@hadoop.apache.org
>>> Subject: Hbase VS Hive
>>>
>>> Hello Everyone
>>>
>>> I want to ask about Hbase and Hive.
>>>
>>> What is the difference between Hbase and Hive? And what should I
>>> consider when choosing between Hbase and Hive?
>>>
>>>
>>
>> Hi,
>>
>> HBase is a column oriented database that sits on top of HDFS.
>> You really want to use it if you're thinking about 'transactional' or
>> random access of data within the data set.
>>
>> Hive sits on HDFS and generates Hadoop jobs to process data.
>>
>> I believe that Hive supports a SQL query type language (or am I confusing
>> it with Pig) so you tend to write a query to walk through your data sets and
>> perform a map reduce.
>>
>> HBase, you want to pull subsets or even individual rows of data from a
>> very large data set.
>> It can be used as part of Hadoop jobs or as a separate application.
>>
>> If you're asking if you want to choose one or the other, I'd say think
>> about knowing both.
>> Also how do you want to persist the data? In HDFS as flat files, or within
>> a column oriented database.
>>
>> HTH
>>
>> -Mike
>>
>>
>
> Thanks for Your Reply,
>
> Based on your explanation, if I want to build a data mining project for a
> Decision Support System, I should choose Hive. Is that correct?
>
> Kind Regards
>
> Firdaus
>
Hive and Hbase are very different. I am basically doing a rehash of
what some people have said above with my own thoughts. It is hard to
sum up complex projects in a few words.


To some people, Hbase looks like a distributed, persistent memcached
with hadoop as a storage backend.

It is designed for fast puts, fast scans, fast gets, and auto-sharding
for linear scalability and performance. It works in "real time".
If you have a data store that you need to scale past a couple of
nodes, but need "real-time" put/get/scan, HBase might be for you.
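
To make the put/get/scan access pattern concrete, here is a minimal
sketch in Python. It is a toy in-memory model, not the HBase client
API: the class name ToySortedStore, its methods, and the row keys are
all made up for illustration. It only shows the idea of rows kept in
sorted key order, so that a range scan is cheap.

```python
import bisect


class ToySortedStore:
    """Toy in-memory model of a sorted key-value table (NOT the HBase API)."""

    def __init__(self):
        self._keys = []  # row keys, kept in sorted order
        self._rows = {}  # row key -> {column: value}

    def put(self, row_key, column, value):
        """Insert or overwrite one cell."""
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key):
        """Return a copy of one row, or an empty dict if it is absent."""
        return dict(self._rows.get(row_key, {}))

    def scan(self, start_key, stop_key):
        """Yield (row_key, columns) for start_key <= row_key < stop_key."""
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, stop_key)
        for key in self._keys[lo:hi]:
            yield key, dict(self._rows[key])


store = ToySortedStore()
store.put("user#0002", "info:name", "bob")
store.put("user#0001", "info:name", "alice")
store.put("video#0001", "info:title", "intro")

one_row = store.get("user#0001")
user_keys = [key for key, _ in store.scan("user#", "user$")]
```

Because the keys are sorted, the scan over the "user#" prefix touches
only the matching rows. A real deployment would split that sorted key
space across many nodes; the sketch above leaves that part out.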

To some people, Hive looks like a distributed RDBMS with Hadoop as a
storage backend.

Hive has a query language that looks like SQL. HQL is missing some
things that relational databases do, but adds some things that
relational databases cannot easily do.
http://wiki.apache.org/hadoop/Hive/LanguageManual

Hive is not a real-time system. Its goals do not include real-time
selects or insertions, and it has no updates. Hive allows you to wrap
a schema around a file in HDFS, treat it as a "table", and then
"query it". These files can be (and usually are) really, really large.
Hive can effectively parallelize queries using map/reduce, something
that a traditional single-node database cannot normally do.
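
The "wrap a schema around a file" idea can be sketched in a few lines
of Python. This is not Hive and not HQL, just a single-process toy:
the SCHEMA list, the function names, and the sample data are
assumptions made up for illustration. The point is that the file stays
a plain delimited text file, and the column names and types are only
applied when it is read.

```python
import csv
import io
from collections import Counter

# Hypothetical schema: (column name, type) pairs applied at read time.
SCHEMA = [("ts", int), ("user", str), ("status", int)]


def read_table(fileobj, schema, delimiter="\t"):
    """Yield one dict per line, casting each field according to the schema."""
    for fields in csv.reader(fileobj, delimiter=delimiter):
        if len(fields) != len(schema):
            continue  # simplification: silently skip malformed lines
        yield {name: cast(raw) for (name, cast), raw in zip(schema, fields)}


def count_by(rows, column):
    """Rough analogue of SELECT column, COUNT(*) ... GROUP BY column."""
    return Counter(row[column] for row in rows)


# Stand-in for a flat, tab-delimited file; in practice this would be a
# (much larger) file opened from storage.
RAW = (
    "1267724400\talice\t200\n"
    "1267724401\tbob\t404\n"
    "1267724402\talice\t200\n"
)

rows = read_table(io.StringIO(RAW), SCHEMA)
counts = count_by(rows, "status")
```

What the sketch does not show is the part that matters at scale: the
per-line work and the grouping would be spread over many machines as
map and reduce tasks instead of running in one loop.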

Because Hive can work over files in HDFS, it can deal with input in
many formats. Below is a new feature being added that will allow Hive
to run queries over HBase data (an HBase input format):

https://issues.apache.org/jira/browse/HIVE-705

This is something the Hive community is very excited about. It opens
up many doors, for both hbase and Hive.
