Re: Frequency count in pig

2014-05-16 Thread Serega Sheypak
Sample pseudocode. The idea is to group the tuples by movie_id and count the size of each group's bag.
movieAlias = LOAD 'path/to/movie/files' AS (user_id:long, movie_id:chararray, timestamp:long);
groupedByMovie = GROUP movieAlias BY movie_id;
counted = FOREACH groupedByMovie GENERATE group AS movie_id,

RE: Frequency count in pig

2014-05-16 Thread Steve Bernstein
Really easy, fundamental actually.
a = GROUP your_data BY (user_id, movie_id);
b = FOREACH a GENERATE FLATTEN(group), COUNT($1);
-----Original Message----- From: Chengi Liu [mailto:chengi.liu...@gmail.com] Sent: Wednesday, May 14, 2014 1:25 PM To: user@pig.apache.org Subject: Frequency count in pig

Frequency count in pig

2014-05-16 Thread jamal sasha
Hi, my data is in the format user_id,movie_id,timestamp:
123, abc, unix_timestamp
123, def, ...
123, abc, ...
234, sda, ...
Now I want to compute, in Pig, the number of times each movie is played by each user. So the output I am expecting is:
123,abc,2
123,def,1
234,sda,1
and

Re: Frequency count in pig

2014-05-16 Thread Shengjun Xin
such as the following:
movie = LOAD '$input' AS (user_id:int, movie_id:chararray, timestamp:int);
movie_group = GROUP movie BY (user_id, movie_id);
movie_count = FOREACH movie_group GENERATE FLATTEN(group) AS (user_id, movie_id), COUNT($1) AS MovieCount;
On Thu, May 15, 2014 at 4:25 AM, Chengi Liu
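Putting the pieces of this thread together, a complete script that produces user_id,movie_id,count lines might look like the sketch below. It assumes the input is comma-separated text (PigStorage splits on tabs by default, so ',' is passed explicitly) and that fields can carry a leading space, as in the sample rows; the paths 'movies.csv' and 'movie_counts' and the aliases are hypothetical placeholders.

```pig
-- Hypothetical input path; rows look like "123, abc, 1400000000".
raw = LOAD 'movies.csv' USING PigStorage(',')
      AS (user_id:chararray, movie_id:chararray, ts:chararray);

-- The sample rows have a space after each comma, so trim the keys;
-- otherwise ' abc' and 'abc' would land in different groups.
plays = FOREACH raw GENERATE TRIM(user_id) AS user_id, TRIM(movie_id) AS movie_id;

-- One group per (user_id, movie_id) pair.
grouped = GROUP plays BY (user_id, movie_id);

-- FLATTEN(group) turns the grouping tuple back into two columns.
-- COUNT skips tuples whose first field is null; COUNT_STAR would not.
counts = FOREACH grouped GENERATE FLATTEN(group) AS (user_id, movie_id),
         COUNT(plays) AS play_count;

-- Hypothetical output directory; part files contain lines such as "123,abc,2".
STORE counts INTO 'movie_counts' USING PigStorage(',');
```

Grouping on the pair rather than on movie_id alone is what yields one output row per user and movie; grouping on movie_id only would give one total per movie instead.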

HCatLoader Table not found

2014-05-16 Thread Patcharee Thongtra
Hi, I am using HCatLoader to load data from a table that exists in Hive:
A = load 'rwf_data' USING org.apache.hcatalog.pig.HCatLoader();
describe A;
I got Error 1115: Table not found : ... It is weird. Any suggestions on this? Thanks Patcharee

Re: store to defined filename

2014-05-16 Thread Raviteja Chirala
You can either do a Hadoop mv if it's a wrapper script, or use getmerge to merge all the part files into a single file with the name you want. On May 14, 2014, at 2:11 AM, Patcharee Thongtra patcharee.thong...@uni.no wrote: Hi, Is it possible to store results into a file with a specified filename, instead
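Both suggestions can also be run from inside a Pig script, because Pig's fs command invokes Hadoop FsShell commands. A minimal sketch, assuming a single reducer writes exactly one part file; the paths, the alias names, and the part-file name part-r-00000 are hypothetical (map-only jobs write part-m-* files instead).

```pig
-- Hypothetical input; stands in for the relation you want to store.
records = LOAD '/data/input' USING PigStorage(',');

-- ORDER ... PARALLEL 1 uses a single reducer, so one part file is written.
sorted = ORDER records BY $0 PARALLEL 1;
STORE sorted INTO '/data/out' USING PigStorage(',');

-- Force the STORE above to execute before the fs commands read its output.
exec;

-- Either command alone is enough. getmerge runs first here because it only
-- reads the HDFS output, while mv moves the part file away.

-- getmerge concatenates all part files into one file on the local disk
-- of the machine running Pig.
fs -getmerge /data/out /tmp/result.csv

-- mv renames the single part file within HDFS.
fs -mv /data/out/part-r-00000 /data/result.csv
```

If the job can produce several part files, getmerge is the safer of the two, since mv only moves the one file it names.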

Re: store to defined filename

2014-05-16 Thread Mohammad Tariq
Hi there, You could do that with the help of the MultipleOutputFormat (http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/lib/MultipleOutputFormat.html) class. It extends FileOutputFormat and allows us to write the output data to different output files. Warm regards, Mohammad Tariq