On Feb 21, 2008, at 5:47 PM, Ted Dunning wrote:
It may be sorted within the output for a single reducer and, indeed,
you can even guarantee that it is sorted but *only* by the reduce key.
The order that values appear will not be deterministic.
Actually, there is a better answer for this. If you put both the
primary and secondary key into the key, you can use
JobConf.setOutputValueGroupingComparator to set a comparator that
only compares the primary key. Reduce will be called once per
primary key, but all of the values will be sorted by the secondary key.
See http://tinyurl.com/32gld4
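
To make the grouping idea concrete, here is a rough sketch in plain Java
that simulates it without depending on Hadoop at all. Everything in it
(CompositeKey, SORT, GROUP, simulateReduce) is a hypothetical name made up
for illustration, not part of the Hadoop API. In a real job the grouping
comparator would be the one passed to
JobConf.setOutputValueGroupingComparator, and the values would be separate
objects rather than the secondary part of the key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SecondarySortSketch {
    // Hypothetical composite key: primary and secondary parts travel together.
    static final class CompositeKey {
        final String primary;
        final int secondary;

        CompositeKey(String primary, int secondary) {
            this.primary = primary;
            this.secondary = secondary;
        }
    }

    // Full sort order: primary key first, then secondary key.
    static final Comparator<CompositeKey> SORT =
        Comparator.comparing((CompositeKey k) -> k.primary)
                  .thenComparingInt(k -> k.secondary);

    // Grouping order: looks at the primary key only.
    static final Comparator<CompositeKey> GROUP =
        Comparator.comparing((CompositeKey k) -> k.primary);

    // Sorts by the full key, then starts a new simulated reduce call
    // whenever the grouping comparator sees a different primary key.
    static List<String> simulateReduce(List<CompositeKey> keys) {
        List<CompositeKey> sorted = new ArrayList<>(keys);
        sorted.sort(SORT);

        List<String> calls = new ArrayList<>();
        int i = 0;
        while (i < sorted.size()) {
            CompositeKey first = sorted.get(i);
            List<Integer> values = new ArrayList<>();
            while (i < sorted.size() && GROUP.compare(first, sorted.get(i)) == 0) {
                values.add(sorted.get(i).secondary);
                i++;
            }
            calls.add(first.primary + "=" + values);
        }
        return calls;
    }

    public static void main(String[] args) {
        List<CompositeKey> keys = Arrays.asList(
            new CompositeKey("b", 2), new CompositeKey("a", 3),
            new CompositeKey("a", 1), new CompositeKey("b", 1));
        // prints [a=[1, 3], b=[1, 2]]
        System.out.println(simulateReduce(keys));
    }
}
```

Each entry in the result stands for one reduce call: one per primary key,
with its values already in secondary-key order.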
-- Owen