Shashi,
Here you'd definitely need one MapReduce job to aggregate the values in the reducer. Then, to sort the output, in very simple terms, use a second MapReduce job in which the map output key is the value from the first job's output and the map output value is the key from the first job's output. Because the framework sorts map output by key before it reaches the reducer, the book IDs then arrive ordered by their average rating. One more MapReduce job is certainly expensive. Keep watching this thread, as the experts may comment if there are better solutions to your problem.
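The two-job idea can be sketched as a local, in-memory simulation in plain Python. This is not the Hadoop API: `run_job` is a hypothetical helper that stands in for map, shuffle (sort by key) and reduce with a single reducer, and the mapper/reducer names are illustrative only. It assumes the `<bookid>,<eid>,<rating>` input layout from Shashi's mail, with integer ratings.

```python
from itertools import groupby
from operator import itemgetter


def run_job(records, mapper, reducer):
    """Hypothetical in-memory stand-in for one MapReduce job with a single
    reducer: map every record, sort by key (the shuffle), reduce each group."""
    mapped = [pair for record in records for pair in mapper(record)]
    mapped.sort(key=itemgetter(0))  # the framework sorts map output by key
    output = []
    for key, group in groupby(mapped, key=itemgetter(0)):
        output.extend(reducer(key, [value for _, value in group]))
    return output


# Job 1: average rating per book
def average_mapper(line):
    # "<bookid>,<eid>,<rating>" -> (bookid, rating)
    book_id, _eid, rating = line.split(",")
    yield book_id, int(rating)


def average_reducer(book_id, ratings):
    # sum of ratings / number of entries for this book
    yield book_id, sum(ratings) / len(ratings)


# Job 2: sort books by their average
def swap_mapper(record):
    # Job 1's value (the average) becomes the key, so the shuffle sorts by it
    book_id, average = record
    yield average, book_id


def restore_reducer(average, book_ids):
    # Emit <bookid, average> again, now in ascending order of average
    for book_id in book_ids:
        yield book_id, average


if __name__ == "__main__":
    sample = ["0062059017,2344,5", "0075546701,2344,1", "0062059017,2346,3",
              "0075546701,2345,2", "0062059017,9754,4", "0075546701,7890,6"]
    averages = run_job(sample, average_mapper, average_reducer)
    for book_id, average in run_job(averages, swap_mapper, restore_reducer):
        print(f"{book_id},{average:.1f}")
    # prints 0075546701,3.0 then 0062059017,4.0
```

In an actual Hadoop job, the second pass would typically use a `DoubleWritable` map output key and `job.setNumReduceTasks(1)` so the whole result lands in one sorted file; with several reducers, each output file is sorted only within itself. Books with equal averages reach the reducer together, and their relative order is not something this sketch guarantees for the real framework.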
Regards,
Bejoy.K.S

On Wed, Sep 14, 2011 at 12:04 PM, ksgupta misc <ksgupta.m...@gmail.com> wrote:

> Hi Guys,
>
> Thank you for your valuable suggestions.
> I see this works fine in cases where the key values are unique.
>
> In my use case the values are as follows:
>
> *<bookid>,<eid>,<rating>*
> 0000012742,3244,1
> 0028604164,2344,3
> 0062059017,2344,5
> 0075546701,2344,1
> 0130213268,2344,8
> 0140105425,5675,3
> 0141304286,5677,6
> 0195052668,3453,8
> 0198775024,2342,9
> 0000012742,2346,2
> 0028604164,9789,4
> 0062059017,2346,3
> 0075546701,2345,2
> 0130213268,8907,4
> 0140105425,5675,5
> 0141304286,3457,6
> 0195052668,5678,7
> 0198775024,8975,8
> 0000012742,6798,3
> 0028604164,5434,7
> 0062059017,9754,4
> 0075546701,7890,6
> 0130213268,7655,7
> 0140105425,7564,8
> 0141304286,8433,3
> 0195052668,3252,6
> 0198775024,7765,7
>
> My goal here is to write a program that outputs the book IDs sorted
> (ascending) by their average rating.
> I am done with the following steps:
> 1. Map: create the key/value pairs and context.write(key, value)
> 2. Reduce: for each key, compute the sum of ratings / number of book entries
>    and context.write(key, avg_rating)
>
> Example output will be like:
> 0075546701,4.6
> 0062059017,2.1
> 0195052668,6.1
> 0198775024,2.7
>
> My next step is to sort the book IDs in ascending order of the
> average rating.
> How do I write the program to get output like the following?
>
> 0062059017,2.1
> 0198775024,2.7
> 0075546701,4.6
> 0195052668,6.1
>
> Please let me know if my approach is wrong, as I am new to Hadoop.
>
> Thanks in advance,
> --Shashi.
>
> On Wed, Sep 14, 2011 at 11:32 AM, Sudharsan Sampath <sudha...@gmail.com> wrote:
>
>> One way is to reverse the <key,value> output in the mapper to emit <1,
>> 10050>, and in the reducer use a TreeSet to order your values; for each value,
>> output <value, key> from the reducer.
>>
>> With this, the output will be sorted as per your needs within each reducer.
>> If you need a totally sorted output, you can use a single reducer or design
>> your partitioning logic accordingly.
>>
>> Thanks
>> Sudhan S
>>
>> On Wed, Sep 14, 2011 at 6:14 AM, ksgupta misc <ksgupta.m...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I have content like:
>>> *10103*,1042279,*4*
>>> *10070*,1001089,*5*
>>> *10102*,1015504,*7*
>>> *10080*,1024369,*7*
>>> *10050*,1025671,*1*
>>> ...
>>> from which I separated the key/value pairs and, after a
>>> single map and reduce, got the following output:
>>>
>>> 10050 1
>>> 10070 5
>>> 10080 7
>>> 10102 7
>>> 10103 4
>>> ...
>>>
>>> I need to sort the output <key,value> pairs by value (in ascending
>>> order).
>>> Please let me know how I can go ahead.
>>>
>>> Required output:
>>> 10050 1
>>> 10103 4
>>> 10070 5
>>> 10080 7
>>> 10102 7
>>>
>>> Thanks in advance,
>>> --Shashi
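Sudhan's suggestion (swap the pair in the mapper, order the keys in the reducer, and use one reducer or suitable partitioning for a total order) can be sketched as a local simulation in plain Python, assuming integer IDs and values. This is not Hadoop code: `run_sort_job` and `range_partition` are hypothetical helpers, and `sorted()` stands in for Java's `TreeSet`. The range partitioner illustrates why a total order needs each reducer to own a contiguous range of values; Hadoop's default hash partitioning only yields output sorted within each reducer.

```python
from itertools import groupby
from operator import itemgetter


def swap_mapper(key, value):
    # Reverse the pair: emit <value, key> so the shuffle sorts on the value
    yield value, key


def sorted_keys_reducer(value, keys):
    # sorted() stands in for Java's TreeSet (which would also drop duplicates):
    # order the original keys sharing this value, then emit <key, value>
    for key in sorted(keys):
        yield key, value


def range_partition(value, boundaries):
    # Hypothetical range partitioner: reducer i receives values below
    # boundaries[i]; anything larger goes to the last reducer
    for i, upper in enumerate(boundaries):
        if value < upper:
            return i
    return len(boundaries)


def run_sort_job(pairs, boundaries):
    """Simulate map -> partition -> per-reducer sort -> reduce, then concatenate
    the reducer outputs in partition order (like reading part files in order)."""
    partitions = [[] for _ in range(len(boundaries) + 1)]
    for key, value in pairs:
        for out_key, out_value in swap_mapper(key, value):
            partitions[range_partition(out_key, boundaries)].append((out_key, out_value))
    output = []
    for part in partitions:
        part.sort(key=itemgetter(0))  # each reducer sees its keys in sorted order
        for value, group in groupby(part, key=itemgetter(0)):
            output.extend(sorted_keys_reducer(value, [key for _, key in group]))
    return output


if __name__ == "__main__":
    pairs = [(10050, 1), (10070, 5), (10080, 7), (10102, 7), (10103, 4)]
    # boundaries=[] models a single reducer; [5] models two range-partitioned ones
    for key, value in run_sort_job(pairs, boundaries=[5]):
        print(key, value)
```

Run as a script, this prints the "Required output" from the original question (10050 1, 10103 4, 10070 5, 10080 7, 10102 7). In Hadoop itself, `job.setNumReduceTasks(1)` needs no special partitioner, while `TotalOrderPartitioner` (typically fed by `InputSampler`) covers the multi-reducer, range-partitioned case.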