Yes and as such, we've found better load balancing when the #of reduces is a
prime #.  Although the string.hashCode isn't great for short strings.


On 4/11/08 4:16 AM, "Zhang, jian" <[EMAIL PROTECTED]> wrote:

> Hi,
> 
> Please read this, you need to implement partitioner.
> It controls which key is sent to which reducer, if u want to get unique key
> result, you need to implement partitioner and the compareTO function should
> work properly. 
> [WIKI]
> Partitioner
> 
> Partitioner partitions the key space.
> 
> Partitioner controls the partitioning of the keys of the intermediate
> map-outputs. The key (or a subset of the key) is used to derive the partition,
> typically by a hash function. The total number of partitions is the same as
> the number of reduce tasks for the job. Hence this controls which of the m
> reduce tasks the intermediate key (and hence the record) is sent to for
> reduction.
> 
> HashPartitioner is the default Partitioner.
> 
> 
> 
> Best Regards
> 
> Jian Zhang
> 
> 
> -----邮件原件-----
> 发件人: Harish Mallipeddi [mailto:[EMAIL PROTECTED]
> 发送时间: 2008年4月11日 19:06
> 收件人: core-user@hadoop.apache.org
> 主题: Problem with key aggregation when number of reduce tasks is more than 1
> 
> Hi all,
> 
> I wrote a custom key class (implements WritableComparable) and implemented
> the compareTo() method inside this class. Everything works fine when I run
> the m/r job with 1 reduce task (via setNumReduceTasks). Keys are sorted
> correctly in the output files.
> 
> But when I increase the number of reduce tasks, keys don't get aggregated
> properly; same keys seem to end up in separate output files
> (output/part-00000, output/part-00001, etc). This should not happen because
> right before reduce() gets called, all (k,v) pairs from all map outputs with
> the same 'k' are aggregated and the reduce function just iterates over the
> values (v1, v2, etc)?
> 
> Do I need to implement anything else inside my custom key class other than
> compareTo? I also tried implementing equals() but that didn't help either.
> Then I came across setOutputKeyComparator(). So I added a custom Comparator
> class inside the key class and tried setting this on the JobConf object. But
> that didn't work either. What could be wrong?
> 
> Cheers,

Reply via email to