Hey, thanks a lot. That's basically what I needed.

2008/4/11 Zhang, jian <[EMAIL PROTECTED]>:
> Hi,
>
> Please read this: you need to implement a partitioner.
> It controls which key is sent to which reducer. If you want to get unique
> key results, you need to implement a partitioner, and the compareTo()
> function should work properly.
>
> [WIKI]
> Partitioner
>
> Partitioner partitions the key space.
>
> Partitioner controls the partitioning of the keys of the intermediate
> map-outputs. The key (or a subset of the key) is used to derive the
> partition, typically by a hash function. The total number of partitions is
> the same as the number of reduce tasks for the job. Hence this controls
> which of the m reduce tasks the intermediate key (and hence the record) is
> sent to for reduction.
>
> HashPartitioner is the default Partitioner.
>
> Best Regards
>
> Jian Zhang
>
> -----Original Message-----
> From: Harish Mallipeddi [mailto:[EMAIL PROTECTED]
> Sent: April 11, 2008 19:06
> To: core-user@hadoop.apache.org
> Subject: Problem with key aggregation when number of reduce tasks is more than 1
>
> Hi all,
>
> I wrote a custom key class (implements WritableComparable) and implemented
> the compareTo() method inside this class. Everything works fine when I run
> the m/r job with 1 reduce task (via setNumReduceTasks). Keys are sorted
> correctly in the output files.
>
> But when I increase the number of reduce tasks, keys don't get aggregated
> properly; the same keys seem to end up in separate output files
> (output/part-00000, output/part-00001, etc.). Shouldn't this be impossible?
> Right before reduce() gets called, all (k,v) pairs from all map outputs with
> the same 'k' are aggregated, and the reduce function just iterates over the
> values (v1, v2, etc.).
>
> Do I need to implement anything else inside my custom key class other than
> compareTo()? I also tried implementing equals(), but that didn't help either.
> Then I came across setOutputKeyComparator(), so I added a custom Comparator
> class inside the key class and tried setting it on the JobConf object. But
> that didn't work either. What could be wrong?
>
> Cheers,
>
> --
> Harish Mallipeddi
> circos.com : poundbang.in/blog/