Yes, and as such, we've found better load balancing when the number of reduces is a prime number, although String.hashCode() isn't great for short strings.
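To illustrate the prime-number observation, here is a minimal Java sketch. It reproduces the formula used by Hadoop's default HashPartitioner, `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`, outside of Hadoop so it runs on its own. The class name, method names and key set are hypothetical. The keys are `Integer`s that are all multiples of 4, an assumed and deliberately skewed distribution. With 8 reduces, only 2 partitions receive records. With 7 reduces, every partition does, because 4 and 7 share no common factor.

```java
import java.util.Arrays;

public class PartitionSpread {

    // Same formula as Hadoop's default HashPartitioner: clear the sign bit,
    // then take the remainder modulo the number of reduce tasks.
    public static int partition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Counts how many records land in each of numReduceTasks partitions.
    public static int[] histogram(Object[] keys, int numReduceTasks) {
        int[] counts = new int[numReduceTasks];
        for (Object key : keys) {
            counts[partition(key, numReduceTasks)]++;
        }
        return counts;
    }

    // Number of partitions that receive at least one record.
    public static int partitionsUsed(int[] counts) {
        int used = 0;
        for (int c : counts) {
            if (c > 0) used++;
        }
        return used;
    }

    // Hypothetical skewed key set: Integer.hashCode() is the value itself,
    // so every hash code here is a multiple of stride.
    public static Object[] strideKeys(int count, int stride) {
        Object[] keys = new Object[count];
        for (int i = 0; i < count; i++) {
            keys[i] = stride * i;
        }
        return keys;
    }

    public static void main(String[] args) {
        Object[] keys = strideKeys(100, 4);
        for (int reduces : new int[] {8, 7}) {
            int[] counts = histogram(keys, reduces);
            System.out.println(reduces + " reduces -> " + partitionsUsed(counts)
                    + " partitions used " + Arrays.toString(counts));
        }
    }
}
```

A prime count avoids this kind of common-factor clumping. It won't help if the hash codes themselves are poorly spread, which is the concern with short strings.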
On 4/11/08 4:16 AM, "Zhang, jian" <[EMAIL PROTECTED]> wrote:

> Hi,
>
> Please read this: you need to implement a partitioner.
> It controls which key is sent to which reducer. If you want to get a unique
> key result, you need to implement a partitioner, and the compareTo function
> should work properly.
>
> [WIKI]
> Partitioner
>
> Partitioner partitions the key space.
>
> Partitioner controls the partitioning of the keys of the intermediate
> map-outputs. The key (or a subset of the key) is used to derive the partition,
> typically by a hash function. The total number of partitions is the same as
> the number of reduce tasks for the job. Hence this controls which of the m
> reduce tasks the intermediate key (and hence the record) is sent to for
> reduction.
>
> HashPartitioner is the default Partitioner.
>
> Best Regards
>
> Jian Zhang
>
> -----Original Message-----
> From: Harish Mallipeddi [mailto:[EMAIL PROTECTED]
> Sent: April 11, 2008 19:06
> To: core-user@hadoop.apache.org
> Subject: Problem with key aggregation when number of reduce tasks is more than 1
>
> Hi all,
>
> I wrote a custom key class (implements WritableComparable) and implemented
> the compareTo() method inside this class. Everything works fine when I run
> the m/r job with 1 reduce task (via setNumReduceTasks). Keys are sorted
> correctly in the output files.
>
> But when I increase the number of reduce tasks, keys don't get aggregated
> properly; the same keys seem to end up in separate output files
> (output/part-00000, output/part-00001, etc.). Shouldn't this be impossible,
> since right before reduce() gets called, all (k,v) pairs from all map outputs
> with the same 'k' are aggregated and the reduce function just iterates over
> the values (v1, v2, etc.)?
>
> Do I need to implement anything else inside my custom key class other than
> compareTo? I also tried implementing equals() but that didn't help either.
> Then I came across setOutputKeyComparator(). So I added a custom Comparator
> class inside the key class and tried setting this on the JobConf object. But
> that didn't work either. What could be wrong?
>
> Cheers,
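One likely cause of the symptom Harish describes is offered here as a hedged suggestion, not something confirmed in this thread. The default HashPartitioner routes a record by `key.hashCode()`, not by `compareTo()`. If the custom key class doesn't override `hashCode()`, it inherits `Object.hashCode()`, which is typically different for two instances with the same contents. Equal keys can then land in different partitions and therefore in different part-NNNNN files.

The sketch below is plain Java so it compiles without Hadoop. `PairKey` and its fields are hypothetical. A real key would implement `org.apache.hadoop.io.WritableComparable` and also provide `write()` and `readFields()`.

```java
import java.util.Objects;

// Hypothetical two-field key; in a real job it would implement
// org.apache.hadoop.io.WritableComparable and add write()/readFields().
public class PairKey implements Comparable<PairKey> {
    private final String first;
    private final int second;

    public PairKey(String first, int second) {
        this.first = first;
        this.second = second;
    }

    // Orders by first, then by second.
    @Override
    public int compareTo(PairKey other) {
        int c = first.compareTo(other.first);
        return c != 0 ? c : Integer.compare(second, other.second);
    }

    // equals() and hashCode() must agree with compareTo(): keys that compare
    // as equal must hash to the same value, or a hash-based partitioner may
    // send them to different reducers.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof PairKey)) return false;
        PairKey k = (PairKey) o;
        return second == k.second && first.equals(k.first);
    }

    @Override
    public int hashCode() {
        return Objects.hash(first, second);
    }

    // Same formula as the default HashPartitioner.
    public static int partition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        PairKey a = new PairKey("apple", 1);
        PairKey b = new PairKey("apple", 1); // distinct object, same contents
        System.out.println(a.compareTo(b) == 0);                // true
        System.out.println(partition(a, 5) == partition(b, 5)); // true
    }
}
```

A custom sort comparator set on the JobConf changes how keys are ordered within each reducer, not which reducer they are sent to. That would explain why it had no effect on this symptom.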