Hi Ricky,

This is how the code will look in Pig:

A = load 'textdoc' using TextLoader() as (sentence: chararray);  -- one record per input line
B = foreach A generate flatten(TOKENIZE(sentence)) as word;     -- one record per word
C = group B by word;                                            -- one bag of records per word
D = foreach C generate group, COUNT(B);                         -- (word, count)
store D into 'wordcount';

Pig training (http://www.cloudera.com/hadoop-training-pig-tutorial) explains how the example above works.

Let me know if you have further questions.

Olga

> -----Original Message-----
> From: Ricky Ho [mailto:r...@adobe.com]
> Sent: Wednesday, May 06, 2009 3:56 PM
> To: core-user@hadoop.apache.org
> Subject: RE: PIG and Hive
>
> Thanks Amr,
>
> Without knowing the details of Hive, one constraint of the SQL
> model is that you can never generate more than one record from a
> single record. I don't know how this is done in Hive.
> Another question is whether the Hive script can take in
> user-defined functions.
>
> Using the following word count as an example, can you show
> me what the Pig script and the Hive script look like?
>
> Map:
> Input: a line (a collection of words)
> Output: multiple [word, 1]
>
> Reduce:
> Input: [word, [1, 1, 1, ...]]
> Output: [word, count]
>
> Rgds,
> Ricky
>
> -----Original Message-----
> From: Amr Awadallah [mailto:a...@cloudera.com]
> Sent: Wednesday, May 06, 2009 3:14 PM
> To: core-user@hadoop.apache.org
> Subject: Re: PIG and Hive
>
> > The difference between PIG and Hive seems to be pretty insignificant.
>
> The difference between Pig and Hive is significant, specifically:
>
> (1) Pig doesn't require underlying structure to the data,
> while Hive does imply structure via a metastore. This has its pros
> and cons. It makes Pig more suitable for ETL-type tasks where
> the input data is still a mish-mash and you want to convert it
> to be structured. On the other hand, Hive's metastore provides
> a dictionary that lets you easily see what columns exist in
> which tables, which can be very handy.
>
> (2) Pig is a new language, easy to learn if you know
> languages similar to Perl. Hive is a subset of SQL with very
> simple variations to enable map-reduce-like computation. So,
> if you come from a SQL background you will find Hive QL
> extremely easy to pick up (many of your SQL queries will run
> as is), while if you come from a procedural programming
> background (w/o SQL knowledge) then Pig will be much more
> suitable for you. Furthermore, Hive is a bit easier to
> integrate with other systems and tools since it speaks the
> language they already speak (i.e. SQL).
>
> You're right that HBase is a completely different game. HBase
> is not about being a high-level language that compiles to
> map-reduce; HBase is about allowing Hadoop to support
> lookups/transactions on key/value pairs. HBase allows you to
> (1) do quick random lookups, versus scanning all of the data
> sequentially, and (2) do inserts/updates/deletes in the middle,
> not just add/append.
>
> -- amr
>
> Ricky Ho wrote:
> > Jeff,
> >
> > Thanks for the pointer.
> > It is pretty clear that Hive and PIG are the same kind of tool
> > and HBase is a different kind.
> > The difference between PIG and Hive seems to be pretty
> > insignificant. Layering a tool on top of them can completely
> > hide their differences.
> >
> > I am viewing your PIG and Hive tutorials and hopefully can
> > extract some technical details from them.
> >
> > Rgds,
> > Ricky
> >
> > -----Original Message-----
> > From: Jeff Hammerbacher [mailto:ham...@cloudera.com]
> > Sent: Wednesday, May 06, 2009 1:38 PM
> > To: core-user@hadoop.apache.org
> > Subject: Re: PIG and Hive
> >
> > Here's a permalink for the thread on MarkMail:
> > http://markmail.org/thread/ee4hpcji74higqvk
> >
> > On Wed, May 6, 2009 at 4:55 AM, Sharad Agarwal
> > <shara...@yahoo-inc.com> wrote:
> >
> >> See the core-user mail thread with the subject "HBase, Hive, Pig and other
> >> Hadoop based technologies".
> >>
> >> - Sharad
> >>
> >> Ricky Ho wrote:
> >>
> >>> Are they competing technologies for providing a higher-level
> >>> language for Map/Reduce programming?
> >>>
> >>> Or are they complementary?
> >>>
> >>> Any comparison between them?
> >>>
> >>> Rgds,
> >>> Ricky
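
The Pig script at the top of this message lines up with the Map/Reduce outline in Ricky's message. As a rough illustration only, here is a minimal plain-Python sketch that simulates the same word count in memory. It is not Hadoop's or Pig's API. The function names (map_line, shuffle, reduce_word, word_count) are hypothetical, and splitting lines on whitespace is an assumption, since Pig's TOKENIZE may also treat some punctuation as a delimiter. Each step corresponds to one Pig statement. map_line plays the role of flatten(TOKENIZE(...)) and emits several records from one input line. shuffle corresponds to "group B by word", and reduce_word corresponds to COUNT(B).

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_line(line: str) -> Iterator[Tuple[str, int]]:
    """Map: one input line -> zero or more (word, 1) pairs.

    Like flatten(TOKENIZE(sentence)) in the Pig script, a single input
    record can produce many output records. Whitespace splitting is an
    assumption made for this sketch.
    """
    for word in line.split():
        yield (word, 1)


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group mapper output by key (roughly 'C = group B by word')."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for word, one in pairs:
        groups[word].append(one)
    return groups


def reduce_word(word: str, ones: List[int]) -> Tuple[str, int]:
    """Reduce: (word, [1, 1, 1, ...]) -> (word, count), like COUNT(B)."""
    return (word, sum(ones))


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence over all input lines."""
    mapped = (pair for line in lines for pair in map_line(line))
    return dict(reduce_word(w, ones) for w, ones in shuffle(mapped).items())


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "the end"]
    print(sorted(word_count(docs).items()))
    # [('brown', 1), ('dog', 1), ('end', 1), ('fox', 1), ('lazy', 1), ('quick', 1), ('the', 3)]
```

The dictionary in shuffle stands in for the sort-and-group step that Hadoop performs between the map and reduce phases. A real job would distribute that step across machines rather than hold everything in one process.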