> But it actually adds duplicate data (i.e., the value column which needs
> sorting) to the key.
Why? You can always take it out of the value to remove the redundancy.

> Also, I wonder what is the benefit to sort values before reaching
> reducers. It can be achieved in the reduce phase anyway.

The reduce phase only merges sorted segments. Each segment has to be sorted on all the sort fields before the merge itself; otherwise you can't do a merge. (I hope I understood the question correctly.)

-----Original Message-----
From: Qiong Zhang [mailto:[EMAIL PROTECTED]
Sent: Wednesday, February 06, 2008 11:25 AM
To: core-user@hadoop.apache.org
Subject: sort by value

Hi, All,

Is there a better way to sort by value in the same key before reaching reducers? I know it can be achieved by using setOutputValueGroupingComparator/setOutputKeyComparatorClass. But it actually adds duplicate data (i.e., the value column which needs sorting) to the key.

Also, I wonder what is the benefit to sort values before reaching reducers. It can be achieved in the reduce phase anyway.

Thanks,
James
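
For reference, the composite-key idea discussed in this thread can be sketched in plain Java without any Hadoop classes. This is only an illustration under stated assumptions: `Pair`, `SecondarySortSketch`, `SORT_ORDER`, and `sortAndGroup` are hypothetical names, not part of the Hadoop API. The comparator stands in for what setOutputKeyComparatorClass would order by (natural key, then value), and the grouping step stands in for what setOutputValueGroupingComparator would do (one reduce call per natural key).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a map output record; not a Hadoop class.
final class Pair {
    final String key;  // natural key
    final int value;   // value column that should reach the reducer in sorted order

    Pair(String key, int value) {
        this.key = key;
        this.value = value;
    }
}

class SecondarySortSketch {
    // Plays the role of the output key comparator:
    // order by natural key first, then by the value column.
    static final Comparator<Pair> SORT_ORDER =
            Comparator.comparing((Pair p) -> p.key).thenComparingInt(p -> p.value);

    // Sorts on the composite (key, value) order, then collects the values of
    // each natural key into one list, as a grouping comparator would present
    // them to a single reduce call.
    static Map<String, List<Integer>> sortAndGroup(List<Pair> records) {
        List<Pair> sorted = new ArrayList<>(records);
        sorted.sort(SORT_ORDER);

        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (Pair p : sorted) {
            groups.computeIfAbsent(p.key, k -> new ArrayList<>()).add(p.value);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Pair> records = List.of(
                new Pair("b", 7), new Pair("a", 3), new Pair("a", 1), new Pair("b", 2));
        System.out.println(sortAndGroup(records)); // prints {a=[1, 3], b=[2, 7]}
    }
}
```

Because the records are already ordered on both fields when they are grouped, the per-key lists come out sorted with no extra work in the reduce step, which is the point made in the reply above about merging sorted segments.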