-----Original Message-----
From: Ricky Ho [mailto:r...@adobe.com]
Sent: Wednesday, May 06, 2009 3:56 PM
To: core-user@hadoop.apache.org
Subject: RE: PIG and Hive
Thanks Amr,
I don't know the details of Hive, but one constraint of the SQL
model is that you can never generate more than one record from
a single record. I don't know how Hive handles this.
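To make the one-record-to-many-records concern concrete, here is a minimal Python sketch. It is not Hive or Pig code, and both function names are hypothetical. It contrasts a one-to-one projection, which a plain SQL SELECT expresses naturally, with the one-to-many expansion that the map step of word count needs (one line in, one record per word out).

```python
from typing import Iterable, Iterator


def project_one_to_one(lines: Iterable[str]) -> Iterator[int]:
    # Exactly one output record per input record, like a plain SELECT expression.
    for line in lines:
        yield len(line.split())


def expand_one_to_many(lines: Iterable[str]) -> Iterator[str]:
    # Zero or more output records per input record: each word becomes its own record.
    for line in lines:
        for word in line.split():
            yield word


lines = ["the quick fox", "the lazy dog"]
print(list(project_one_to_one(lines)))  # [3, 3]
print(list(expand_one_to_many(lines)))  # ['the', 'quick', 'fox', 'the', 'lazy', 'dog']
```

Whether Hive or Pig offers a built-in operator for the second shape is exactly the question being asked here; the sketch only shows the difference in shape.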
Another question: can a Hive script make use of
user-defined functions?
Using the following word count as an example, can you show
me what the Pig script and the Hive script would look like?
Map:
Input: a line (a collection of words)
Output: multiple [word, 1]
Reduce:
Input: [word, [1, 1, 1, ...]]
Output: [word, count]
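For reference, the Map/Reduce contract above can be simulated in plain Python. This is a minimal, single-process sketch, not Hadoop code: the function names are hypothetical, and the shuffle step stands in for the grouping that the framework performs between map and reduce.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Input: a line (a collection of words); output: one [word, 1] pair per word.
    for word in line.split():
        yield (word, 1)


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group all values by key, as the framework does between map and reduce.
    groups: Dict[str, List[int]] = defaultdict(list)
    for word, one in pairs:
        groups[word].append(one)
    return groups


def reduce_phase(word: str, ones: List[int]) -> Tuple[str, int]:
    # Input: [word, [1, 1, 1, ...]]; output: [word, count].
    return (word, sum(ones))


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    # Sorting keys mimics the sorted order in which reducers see their keys.
    return dict(reduce_phase(w, ones) for w, ones in sorted(grouped.items()))


print(word_count(["the quick fox", "the lazy dog"]))
# {'dog': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 2}
```

A higher-level language (Pig or Hive) would be expected to generate roughly this map/shuffle/reduce pipeline from a much shorter script.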
Rgds,
Ricky
-----Original Message-----
From: Amr Awadallah [mailto:a...@cloudera.com]
Sent: Wednesday, May 06, 2009 3:14 PM
To: core-user@hadoop.apache.org
Subject: Re: PIG and Hive
> The difference between PIG and Hive seems to be pretty
> insignificant.
The difference between Pig and Hive is significant, specifically:
(1) Pig doesn't require any underlying structure in the data,
while Hive does impose structure via a metastore. This has its
pros and cons. It makes Pig more suitable for ETL-type tasks,
where the input data is still a mish-mash and you want to
convert it into something structured. On the other hand,
Hive's metastore provides a dictionary that lets you easily
see what columns exist in which tables, which can be very
handy.
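A rough Python sketch of the two styles described in (1). Everything in it is hypothetical and illustrative (the PageView schema, the parse_line helper, the sample lines, and a plain dict standing in for Hive's metastore); it is not how either tool is implemented. The Pig-style half imposes structure while reading messy input and drops rows that don't fit; the Hive-style half keeps the column list in a catalog that anyone can inspect.

```python
from typing import Dict, List, NamedTuple, Optional


class PageView(NamedTuple):
    # Hypothetical target schema for the cleaned-up data.
    user: str
    url: str
    seconds: int


def parse_line(line: str) -> Optional[PageView]:
    # ETL-style: impose structure while reading; accept "," or ";" separators
    # and reject rows that don't fit the schema.
    fields = [f.strip() for f in line.replace(";", ",").split(",")]
    if len(fields) != 3 or not fields[2].isdecimal():
        return None
    return PageView(fields[0], fields[1], int(fields[2]))


# Metastore-style: the structure is declared up front in a catalog
# (a plain dict here, standing in for the real metastore).
catalog: Dict[str, List[str]] = {"page_views": list(PageView._fields)}

raw = ["alice, /home, 12", "bob;/search;7", "garbage line", "carol,/home,abc"]
rows = [r for r in map(parse_line, raw) if r is not None]
print(rows)
# [PageView(user='alice', url='/home', seconds=12), PageView(user='bob', url='/search', seconds=7)]
print(catalog["page_views"])  # ['user', 'url', 'seconds']
```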
(2) Pig is a new language, easy to learn if you know
languages similar to Perl. Hive is a subset of SQL with very
simple variations to enable map-reduce-like computation. So,
if you come from a SQL background you will find Hive QL
extremely easy to pick up (many of your SQL queries will run
as is), while if you come from a procedural programming
background (without SQL knowledge) then Pig will be much more
suitable for you. Furthermore, Hive is a bit easier to
integrate with other systems and tools since it speaks the
language they already speak (i.e. SQL).
You're right that HBase is a completely different game. HBase
is not a high-level language that compiles to map-reduce; it
is about allowing Hadoop to support lookups/transactions on
key/value pairs. HBase allows you to (1) do quick random
lookups, versus scanning all of the data sequentially, and
(2) do inserts/updates/deletes in the middle, not just
add/append.
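To illustrate the two access patterns in (1) and (2), here is a toy Python sketch. It is NOT the HBase API, and HBase's actual storage layout is quite different; the SortedKV class and scan_lookup function are hypothetical. A table kept sorted by key allows binary-search lookups and in-place insert/update/delete, while a batch job without an index has to read rows sequentially until it finds the key.

```python
import bisect
from typing import List, Optional, Tuple


class SortedKV:
    """Toy key/value table kept sorted by key (illustrative only, not HBase)."""

    def __init__(self) -> None:
        self._keys: List[str] = []
        self._values: List[str] = []

    def _find(self, key: str) -> int:
        # Binary search for the key's position (or its insertion point).
        return bisect.bisect_left(self._keys, key)

    def put(self, key: str, value: str) -> None:
        # Insert or update anywhere in key order, not just append at the end.
        i = self._find(key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value
        else:
            self._keys.insert(i, key)
            self._values.insert(i, value)

    def get(self, key: str) -> Optional[str]:
        # Random lookup: O(log n) comparisons instead of reading every row.
        i = self._find(key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def delete(self, key: str) -> None:
        i = self._find(key)
        if i < len(self._keys) and self._keys[i] == key:
            del self._keys[i]
            del self._values[i]


def scan_lookup(rows: List[Tuple[str, str]], key: str) -> Optional[str]:
    # What a batch job does without an index: read rows sequentially.
    for k, v in rows:
        if k == key:
            return v
    return None


table = SortedKV()
table.put("row-b", "2")
table.put("row-a", "1")
table.put("row-b", "20")  # update in the middle
table.delete("row-a")     # delete from the middle
print(table.get("row-b"))  # 20
print(table.get("row-a"))  # None
```

The Python list inserts here are O(n); the sketch only illustrates the access pattern, not how a real store achieves it at scale.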
-- amr
Ricky Ho wrote:
Jeff,
Thanks for the pointer.
It is pretty clear that Hive and PIG are the same kind of
tool and HBase is a different kind.
The difference between PIG and Hive seems to be pretty
insignificant. Layering a tool on top of them could completely
hide their differences.
I am watching your PIG and Hive tutorial and hope to extract
some technical details from it.
Rgds,
Ricky
-----Original Message-----
From: Jeff Hammerbacher [mailto:ham...@cloudera.com]
Sent: Wednesday, May 06, 2009 1:38 PM
To: core-user@hadoop.apache.org
Subject: Re: PIG and Hive
Here's a permalink for the thread on MarkMail:
http://markmail.org/thread/ee4hpcji74higqvk
On Wed, May 6, 2009 at 4:55 AM, Sharad Agarwal
<shara...@yahoo-inc.com> wrote:
See the core-user mail thread with the subject "HBase, Hive,
Pig and other Hadoop based technologies".
- Sharad
Ricky Ho wrote:
Are they competing technologies for providing a higher-level
language for Map/Reduce programming?
Or are they complementary?
Is there any comparison between them?
Rgds,
Ricky