Hi. I had difficulties getting Reduce sorting to work - it took me a good part of a day to figure out what was going wrong, so I'm sharing this in hopes of learning something from the community or getting hadoop improved to avoid this kind of error for future users.
I have 2 key classes: one holds a String, the other extends it and adds a boolean.

I implemented the first key class (let's call it Super):

    public class Super implements WritableComparable<Super> {
      . . .
      public int compareTo(Super o) {
        // sort on string value
        . . .
      }

I implemented the 2nd key class (let's call it Sub):

    public class Sub extends Super {
      . . .
      public int compareTo(Sub o) {
        // sort on boolean value
        . . .
        // if equal, use the super:
        ... else
          return super.compareTo(o);
      }

With this setup, I used the "Sub" class as a mapper output key, and expected the sort on the boolean value to happen first, and then, for keys with equal boolean values, the sort on the string values.

What actually happened was that the sort on the boolean value was skipped completely, and only the sort on the string was done.

The reason for this is that (in the 0.19.1 release) the WritableComparator instance that is created (using the defaults - no custom Comparator) knows the class is "Sub", and calls the compareTo method on one key instance it created, passing it the other key. Both of these keys are of type Sub. However, they are passed via this code in WritableComparator:

    public int compare(WritableComparable a, WritableComparable b) {
      return a.compareTo(b);
    }

Java uses the interface spec for WritableComparable that was declared, in this case WritableComparable<Super>, and infers that the arg type for compareTo is Super. So it "skips" calling the compareTo in Sub, and just calls the one in Super.

The workaround is to change the signature of Sub's compareTo method to match the spec in the interface: it has to take a Super as its argument, and then cast it to Sub.

This seems like a very error-prone design. Am I doing something wrong, or can this be improved so that this kind of error is avoided?

-Marshall Schor
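P.S. For anyone hitting the same thing, here is a minimal sketch of the workaround described above. To keep it self-contained it uses plain java.lang.Comparable<Super> as a stand-in for WritableComparable<Super>, so it compiles without Hadoop. The field names (text, flag) and the SubSortDemo helper are made up for illustration and are not from my real classes. I believe the overload-versus-override behaviour is the same with the Hadoop interface, but treat that as an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Stand-in for the Hadoop key: plain Comparable<Super> instead of
// WritableComparable<Super>, so the sketch compiles without Hadoop.
class Super implements Comparable<Super> {
    final String text; // hypothetical field name

    Super(String text) { this.text = text; }

    @Override
    public int compareTo(Super o) {
        return text.compareTo(o.text); // sort on string value
    }
}

class Sub extends Super {
    final boolean flag; // hypothetical field name

    Sub(String text, boolean flag) { super(text); this.flag = flag; }

    // Takes Super, matching the interface, so this overrides Super.compareTo.
    // Declaring compareTo(Sub o) instead would only add an overload.
    @Override
    public int compareTo(Super o) {
        Sub other = (Sub) o; // assumes every key being compared is a Sub
        int byFlag = Boolean.compare(flag, other.flag); // false sorts before true
        return byFlag != 0 ? byFlag : super.compareTo(o);
    }
}

class SubSortDemo {
    // Sorts through the Comparable<Super> interface, as a generic comparator would.
    static List<String> sortedLabels() {
        List<Super> keys = new ArrayList<>();
        keys.add(new Sub("b", true));
        keys.add(new Sub("a", true));
        keys.add(new Sub("c", false));
        Collections.sort(keys);
        List<String> labels = new ArrayList<>();
        for (Super k : keys) {
            labels.add(((Sub) k).flag + ":" + k.text);
        }
        return labels;
    }
}
```

With the compareTo(Super) signature, SubSortDemo.sortedLabels() orders the keys by the boolean first and then by the string: false:c, true:a, true:b. The @Override annotation is worth keeping on Sub's method, since the compiler rejects it on a compareTo(Sub o) signature and so catches the mistake at build time.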