Hi Ricky,

Here is how the word count looks in Pig:

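-- Load each line of the input as a single chararray field.
A = load 'textdoc' using TextLoader() as (sentence: chararray);
-- Split each line into words; flatten emits one record per word.
B = foreach A generate flatten(TOKENIZE(sentence)) as word;
-- Group identical words, then count the records in each group.
C = group B by word;
D = foreach C generate group, COUNT(B);
store D into 'wordcount';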

The Pig training tutorial (http://www.cloudera.com/hadoop-training-pig-tutorial)
explains how the example above works.
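
As a rough illustration only (this is plain Python, not Pig, and not how
Pig executes the script), the relations A through D can be mimicked in
memory as below. The function names are made up for this sketch, and
tokenize() splits on whitespace only, which is a simplification of what
Pig's TOKENIZE does.

```python
def tokenize(sentence):
    # Assumption: whitespace-only splitting, a simplified stand-in
    # for Pig's TOKENIZE.
    return sentence.split()


def word_count(lines):
    # A: one record per input line (like TextLoader).
    a = [line.rstrip("\n") for line in lines]

    # B: flatten(TOKENIZE(sentence)) yields one record per word,
    # which corresponds to the Map step emitting multiple words.
    b = [word for sentence in a for word in tokenize(sentence)]

    # C: group B by word, collecting each word's records into a bag.
    c = {}
    for word in b:
        c.setdefault(word, []).append(word)

    # D: for each group, emit (group, COUNT(B)), the Reduce step.
    d = {word: len(bag) for word, bag in c.items()}
    return d


if __name__ == "__main__":
    print(word_count(["the quick fox", "the lazy dog"]))
```

The point of the sketch is that one input record (a line) can produce
many output records (words), which is what flatten does in relation B.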

Let me know if you have further questions.

Olga
 

> -----Original Message-----
> From: Ricky Ho [mailto:r...@adobe.com] 
> Sent: Wednesday, May 06, 2009 3:56 PM
> To: core-user@hadoop.apache.org
> Subject: RE: PIG and Hive
> 
> Thanks Amr,
> 
> Without knowing the details of Hive, one constraint of the SQL
> model is that you can never generate more than one record from a
> single record.  I don't know how this is done in Hive.
> Another question is whether a Hive script can take in
> user-defined functions?
> 
> Using the following word count as an example, can you show
> me what the Pig script and Hive script look like?
> 
> Map:
>   Input: a line (a collection of words)
>   Output: multiple [word, 1]
> 
> Reduce:
>   Input: [word, [1, 1, 1, ...]]
>   Output: [word, count] 
> 
> Rgds,
> Ricky
> 
> -----Original Message-----
> From: Amr Awadallah [mailto:a...@cloudera.com]
> Sent: Wednesday, May 06, 2009 3:14 PM
> To: core-user@hadoop.apache.org
> Subject: Re: PIG and Hive
> 
> > The difference between PIG and Hive seems to be pretty
> > insignificant.
> 
> The difference between Pig and Hive is significant, specifically:
> 
> (1) Pig doesn't require underlying structure to the data;
> Hive does imply structure via a metastore. This has its pros
> and cons. It makes Pig more suitable for ETL-type
> tasks where the input data is still a mish-mash and you want
> to convert it to be structured. On the other hand, Hive's
> metastore provides a dictionary that lets you easily see what
> columns exist in which tables, which can be very handy.
> 
> (2) Pig is a new language, easy to learn if you know
> languages similar to Perl. Hive is a subset of SQL with very
> simple variations to enable map-reduce-like computation. So,
> if you come from a SQL background you will find Hive QL
> extremely easy to pick up (many of your SQL queries will run
> as is), while if you come from a procedural programming
> background (w/o SQL knowledge) then Pig will be much more
> suitable for you. Furthermore, Hive is a bit easier to
> integrate with other systems and tools since it speaks the
> language they already speak (i.e. SQL).
> 
> You're right that HBase is a completely different game. HBase
> is not about being a high-level language that compiles to
> map-reduce; it is about allowing Hadoop to support
> lookups/transactions on key/value pairs. HBase allows you to
> (1) do quick random lookups, versus scanning all of the data
> sequentially, and (2) do insert/update/delete from the middle, not
> just add/append.
> 
> -- amr
> 
> Ricky Ho wrote:
> > Jeff,
> >
> > Thanks for the pointer.
> > It is pretty clear that Hive and PIG are the same kind and
> > HBase is a different kind.
> > The difference between PIG and Hive seems to be pretty
> > insignificant.  Layering a tool on top of them can completely
> > hide their difference.
> >
> > I am viewing your PIG and Hive tutorial and hopefully can
> > extract some technical details there.
> >
> > Rgds,
> > Ricky
> > -----Original Message-----
> > From: Jeff Hammerbacher [mailto:ham...@cloudera.com]
> > Sent: Wednesday, May 06, 2009 1:38 PM
> > To: core-user@hadoop.apache.org
> > Subject: Re: PIG and Hive
> >
> > Here's a permalink for the thread on MarkMail:
> > http://markmail.org/thread/ee4hpcji74higqvk
> >
> > On Wed, May 6, 2009 at 4:55 AM, Sharad Agarwal
> > <shara...@yahoo-inc.com> wrote:
> >
> >> see core-user mail thread with subject "HBase, Hive, Pig and other 
> >> Hadoop based technologies"
> >>
> >> - Sharad
> >>
> >> Ricky Ho wrote:
> >>> Are they competing technologies of providing a higher level language
> >>> for Map/Reduce programming ?
> >>> Or are they complementary ?
> >>>
> >>> Any comparison between them ?
> >>>
> >>> Rgds,
> >>> Ricky
> 
