On Thu, Mar 4, 2010 at 12:13 PM, Fitrah Elly Firdaus <fitrah.fird...@gmail.com> wrote:
> On 03/04/2010 01:19 AM, Michael Segel wrote:
>>
>>> Date: Thu, 4 Mar 2010 00:42:11 +0700
>>> From: fitrah.fird...@gmail.com
>>> To: common-user@hadoop.apache.org
>>> Subject: Hbase VS Hive
>>>
>>> Hello Everyone
>>>
>>> I want to ask about Hbase and Hive.
>>>
>>> What is the different between Hbase and Hive? and then what is the
>>> consideration for
>>> choose between Hbase or Hive?
>>>
>>
>> Hi,
>>
>> HBase is a column oriented database that sits on top of HDFS.
>> You really want to use it if you're thinking about 'transactional' or
>> random access of data within the data set.
>>
>> Hive sits on HDFS and generates Hadoop jobs to process data.
>>
>> I believe that Hive supports a SQL query type language (or am I confusing
>> it with Pig) so you tend to write a query to walk through your data sets and
>> perform a map reduce.
>>
>> HBase, you want to pull subsets or even individual rows of data from a
>> very large data set.
>> It can be used as part of Hadoop jobs or as a separate application.
>>
>> If you're asking if you want to choose one or the other, I'd say think
>> about knowing both.
>> Also how do you want to persist the data? In HDFS as flat files, or within
>> a column oriented database.
>>
>> HTH
>>
>> -Mike
>>
>
> Thanks for Your Reply,
>
> Based on your explain,if I want to build Data mining project for Decision
> Support System,I should choose hive,is it correct?
>
> Kind Regards
>
> Firdaus
>

Hive and HBase are very different. I am basically rehashing what some people have said above, with my own thoughts added. It is hard to sum up complex projects in a few words.

To some people, HBase looks like a distributed, persistent memcached with Hadoop as a storage backend. It is designed for fast puts, fast scans, fast gets, and auto-sharding for linear scalability and performance. It works in "real time". If you have a data store that you need to scale past a couple of nodes, but you still need "real-time" put/get/scan, HBase might be for you.

To some people, Hive looks like a distributed RDBMS with Hadoop as a storage backend. Hive has a query language that looks like SQL. HQL is missing some things that relational databases do, but adds some things that relational databases cannot easily do.

http://wiki.apache.org/hadoop/Hive/LanguageManual

Hive is not a real-time system. Its goals do not include real-time selects or insertions, and there are no updates. Hive allows you to wrap a schema around a file in HDFS, treat it as a "table", and then "query it". These files can be (and usually are) really, really, really large. Hive can effectively parallelize queries using map/reduce, something that a traditional single-node database cannot normally do. Because Hive can work over files in HDFS, it can deal with input in many formats.

Below is a new feature being added that will allow Hive to do queries over HBase data (HBase input format):

https://issues.apache.org/jira/browse/HIVE-705

This is something the Hive community is very excited about. It opens up many doors for both HBase and Hive.
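
As a rough illustration of the two access patterns described above, here is a toy Python sketch. It uses neither the HBase nor the Hive API; every name in it (ToyKeyValueStore, query_flat_file, the sample keys and log lines) is made up for this example. The first part mimics keyed put/get/scan over sorted row keys, and the second wraps a schema around tab-delimited lines and answers a query with a full pass over the data, which is the part Hive would hand to map/reduce.

```python
import bisect
from collections import defaultdict


class ToyKeyValueStore:
    """In-memory stand-in for keyed access (hypothetical, not the HBase API)."""

    def __init__(self):
        self._keys = []   # row keys, kept in sorted order
        self._rows = {}   # row key -> value

    def put(self, key, value):
        # Insert the key in sorted position the first time it is seen.
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = value

    def get(self, key):
        # Single-row lookup by key.
        return self._rows.get(key)

    def scan(self, start, stop):
        # Yield (key, value) for start <= key < stop, in key order.
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            yield key, self._rows[key]


def query_flat_file(lines, schema, group_by, sum_column):
    """Wrap a schema around delimited lines and aggregate in one full pass
    (hypothetical helper, roughly a GROUP BY with SUM)."""
    totals = defaultdict(int)
    for line in lines:
        record = dict(zip(schema, line.rstrip("\n").split("\t")))
        totals[record[group_by]] += int(record[sum_column])
    return dict(totals)


# Keyed access: touch only the rows you ask for.
store = ToyKeyValueStore()
store.put("user#002", "bob")
store.put("user#001", "alice")
store.put("user#010", "carol")
one_row = store.get("user#001")
key_range = list(store.scan("user#001", "user#010"))

# Batch query: read every line, then aggregate.
log_lines = ["alice\t3", "bob\t5", "alice\t4"]
totals = query_flat_file(log_lines, ["user", "clicks"], "user", "clicks")
```

The point of the sketch is only the shape of the work: the store answers a question by going straight to a key or key range, while the query function has to read all of its input before it can answer anything.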