RE: sort by value

Joydeep Sen Sarma Wed, 06 Feb 2008 11:58:10 -0800

> But it actually adds duplicate data (i.e., the value column which
> needs sorting) to the key.


Why? Once the sort field is in the key, you can always take it out of
the value to remove the redundancy.
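
A rough sketch of that idea in plain Java (no Hadoop classes, so it runs
on its own). Everything named here (SecondarySortSketch, Rec, SORT_ORDER,
GROUPING, sortAndGroup) is hypothetical and only for illustration. The
two comparators stand in for the roles of the classes you would register
with setOutputKeyComparatorClass and setOutputValueGroupingComparator;
this is not the framework's actual code path.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SecondarySortSketch {

    // Hypothetical record: the sort field lives only in the composite key;
    // the value keeps just the remaining payload, so nothing is duplicated.
    static final class Rec {
        final String naturalKey; // key part used for grouping
        final int sortField;     // key part that was moved out of the value
        final String payload;    // what is left of the value

        Rec(String naturalKey, int sortField, String payload) {
            this.naturalKey = naturalKey;
            this.sortField = sortField;
            this.payload = payload;
        }
    }

    // Stand-in for the key (sort) comparator: natural key, then sort field.
    static final Comparator<Rec> SORT_ORDER = (a, b) -> {
        int c = a.naturalKey.compareTo(b.naturalKey);
        return c != 0 ? c : Integer.compare(a.sortField, b.sortField);
    };

    // Stand-in for the grouping comparator: natural key only.
    static final Comparator<Rec> GROUPING =
        (a, b) -> a.naturalKey.compareTo(b.naturalKey);

    // Sorts on the full key, then starts a new group whenever the grouping
    // comparator reports a different natural key. One string per group.
    static List<String> sortAndGroup(List<Rec> input) {
        List<Rec> sorted = new ArrayList<>(input);
        sorted.sort(SORT_ORDER);
        List<String> groups = new ArrayList<>();
        StringBuilder current = null;
        Rec prev = null;
        for (Rec r : sorted) {
            if (prev == null || GROUPING.compare(prev, r) != 0) {
                if (current != null) {
                    groups.add(current.toString());
                }
                current = new StringBuilder(r.naturalKey + ":");
            }
            current.append(' ').append(r.sortField).append('/').append(r.payload);
            prev = r;
        }
        if (current != null) {
            groups.add(current.toString());
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Rec> input = Arrays.asList(
            new Rec("b", 2, "x"),
            new Rec("a", 9, "y"),
            new Rec("b", 1, "z"),
            new Rec("a", 4, "w"));
        for (String group : sortAndGroup(input)) {
            System.out.println(group);
        }
    }
}
```

Each group sees its entries already ordered by the sort field, and the
payload no longer carries a second copy of that field.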

> Also, I wonder what is the benefit to sort values before reaching
> reducers. It can be achieved in the reduce phase anyway.

The reduce side only merges already-sorted segments. Each segment has to
be sorted on all the sort fields before the merge itself; otherwise you
can't do a merge. (I hope I understood the question correctly.)
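
To illustrate why the segments must already be sorted, here is a small
k-way merge sketch in plain Java. It is not Hadoop's merger; the names
(SegmentMergeSketch, Cursor, merge) are made up for this example, and the
segments are just in-memory lists of integers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

public class SegmentMergeSketch {

    // Hypothetical cursor over one segment: remembers the read position.
    static final class Cursor {
        final List<Integer> segment;
        int pos; // index of the next unread element, starts at 0

        Cursor(List<Integer> segment) {
            this.segment = segment;
        }

        int head() {
            return segment.get(pos);
        }
    }

    // Repeatedly emits the smallest segment head. This only looks at the
    // current head of each segment, so the output is globally sorted only
    // if every input segment was sorted beforehand.
    static List<Integer> merge(List<List<Integer>> segments) {
        PriorityQueue<Cursor> heap =
            new PriorityQueue<>((a, b) -> Integer.compare(a.head(), b.head()));
        for (List<Integer> s : segments) {
            if (!s.isEmpty()) {
                heap.add(new Cursor(s));
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();
            out.add(c.head());
            c.pos++;
            if (c.pos < c.segment.size()) {
                heap.add(c); // re-insert with its new head
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Sorted segments merge into a sorted result.
        System.out.println(merge(Arrays.asList(
            Arrays.asList(1, 4, 7), Arrays.asList(2, 5), Arrays.asList(3, 6))));
        // An unsorted segment breaks the merge: the result is not sorted.
        System.out.println(merge(Arrays.asList(
            Arrays.asList(4, 1, 7), Arrays.asList(2, 5))));
    }
}
```

The merge never looks past the head of a segment, so sorting inside the
reduce call would mean buffering and re-sorting all values for a key
rather than streaming them.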


-----Original Message-----
From: Qiong Zhang [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, February 06, 2008 11:25 AM
To: core-user@hadoop.apache.org
Subject: sort by value


Hi, All,

Is there a better way to sort by value in the same key before reaching
reducers?

I know it can be achieved by using
setOutputValueGroupingComparator/setOutputKeyComparatorClass.

But it actually adds duplicate data (i.e., the value column which needs
sorting) to the key.

Also, I wonder what is the benefit to sort values before reaching
reducers. It can be achieved in the reduce phase anyway.

Thanks,
James
