Hi all,

I wrote a custom key class (it implements WritableComparable) and implemented the compareTo() method in it. Everything works fine when I run the MapReduce job with 1 reduce task (set via setNumReduceTasks): keys are sorted correctly in the output.
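For reference, a minimal sketch of the kind of key class described above. This is not the actual code from the job: the class name `PairKey` and its two fields are hypothetical, and the sketch implements plain `java.lang.Comparable` instead of Hadoop's `WritableComparable` so that it compiles without Hadoop on the classpath. The `write()` and `readFields()` methods use the same signatures that `Writable` declares.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Hypothetical key class. In a real job this would implement
// org.apache.hadoop.io.WritableComparable<PairKey> instead of Comparable.
final class PairKey implements Comparable<PairKey> {
    private String name = "";
    private int id;

    // Hadoop instantiates keys reflectively, so a no-arg constructor is needed.
    PairKey() {}

    PairKey(String name, int id) {
        this.name = name;
        this.id = id;
    }

    // Same signature as Writable.write(DataOutput): serialize the fields.
    public void write(DataOutput out) throws IOException {
        out.writeUTF(name);
        out.writeInt(id);
    }

    // Same signature as Writable.readFields(DataInput): read the fields
    // back in the order they were written.
    public void readFields(DataInput in) throws IOException {
        name = in.readUTF();
        id = in.readInt();
    }

    // Sort order: by name first, then by id.
    @Override
    public int compareTo(PairKey other) {
        int byName = name.compareTo(other.name);
        return byName != 0 ? byName : Integer.compare(id, other.id);
    }
}
```

Note that this sketch defines only the sort order; it does not override equals() or hashCode().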
But when I increase the number of reduce tasks, keys don't get aggregated properly: identical keys seem to end up in separate output files (output/part-00000, output/part-00001, etc.). As I understand it, this should not happen, because right before reduce() gets called, all (k,v) pairs from all map outputs with the same 'k' are aggregated, and the reduce function just iterates over the values (v1, v2, etc.). Is that right?

Do I need to implement anything else in my custom key class besides compareTo()? I also tried implementing equals(), but that didn't help. Then I came across setOutputKeyComparator(), so I added a custom Comparator class inside the key class and set it on the JobConf object. That didn't work either.

What could be wrong?

Cheers,
--
Harish Mallipeddi
circos.com : poundbang.in/blog/