Your table has five "F1 = 9887" rows and joining will make 25 rows with the same F1 value.
I cannot imagine what you're intended to do. 2013/6/12 Gourav Sengupta <gourav.had...@gmail.com>: > Hi, > > I had initially forwarded this request to the user group but am yet to > receive any response. > > I will be grateful if someone can help me out in resolving the issue or > pointing out any mistakes that I may be doing. > > It took me around 5 to 6 hours to generate the test data of around 20 GB > (or more) and there must be a better alternative. > > Regards, > Gourav > > ---------- Forwarded message ---------- > From: Gourav Sengupta <gourav.had...@gmail.com> > Date: Mon, Jun 10, 2013 at 4:10 PM > Subject: GROUP BY Issue > To: u...@hive.apache.org > > > Hi, > > On running the following query I am getting multiple records with same > value of F1 > > SELECT F1, COUNT(*) > FROM > ( > SELECT F1, F2, COUNT(*) > FROM TABLE1 > GROUP BY F1, F2 > ) a > GROUP BY F1; > > As per what I understand there are multiple number of records based on > number of reducers. > > Replicating the test scenario: > STEP1: get the dataset as available in > http://snap.stanford.edu/data/amazon0302.html > > STEP2: Open the file and delete the heading > > STEP3: hadoop fs -mkdir /test > > STEP4: hadoop fs -put amazon0302.txt /test > > STEP5: create external table test (f1 int, f2 int) row format delimited > fields terminated by '\t' lines terminated by '\n' stored as textfile > location '/test'; > > STEP6: create table test1 location '/test1' as select left_table.* from > (select * from test where f1<10000) left_table join (select * from test > where f1 < 10000) right_table; > > STEP7: hadoop fs -mkdir /test2 > > STEP8: create table test2 location '/test2' as select f1, count(*) from > (select f1, f2, count(*) from test1 group by f1, f2) a group by f1; > > STEP9: select * from test2 where f1 = 9887; > > ENVIRONMENT: > HADOOP 2.0.4 > HIVE 0.11 > > Please do let me know whether I am doing anything wrong. > > > Thanks and Regards, > Gourav Sengupta