Hey Schubert,

You need at least two new classes, a Partitioner and a Comparator, so that grouping and sorting can follow different rules. There is an example in Hadoop's source code that deals with this sort of problem. Download the latest release of Hadoop (version 0.20.0) and check out src/examples/SecondarySort.java. BTW, KeyFieldBasedPartitioner and KeyFieldBasedComparator may also do the job for you; however, they have some bugs.
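In case it helps, here is a rough plain-Java sketch of the idea, using your "city+userid+time" key. It does not touch the Hadoop API at all, it only simulates what the sort and grouping steps do on the reduce side. All names in it (SecondarySortSketch, Key, SORT, GROUP, partition, sortAndGroup) are made up for illustration. In a real job the same three rules would live in your Partitioner, your sort comparator, and your grouping comparator, as SecondarySort.java does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Plain-Java simulation of secondary sort; hypothetical names, no Hadoop classes.
class SecondarySortSketch {

    /** Composite map key, modelled on "city+userid+time" from the thread. */
    static final class Key {
        final String city;
        final String userId;
        final long time;

        Key(String city, String userId, long time) {
            this.city = city;
            this.userId = userId;
            this.time = time;
        }

        @Override
        public String toString() {
            return city + "+" + userId + "+" + time;
        }
    }

    /** Sort rule: compares the full key, so time is ordered inside each group. */
    static final Comparator<Key> SORT = Comparator
            .comparing((Key k) -> k.city)
            .thenComparing(k -> k.userId)
            .thenComparingLong(k -> k.time);

    /** Grouping rule: compares only city+userid and ignores time. */
    static final Comparator<Key> GROUP = Comparator
            .comparing((Key k) -> k.city)
            .thenComparing(k -> k.userId);

    /** Partition rule: hashes only the grouping fields, so a group never splits. */
    static int partition(Key k, int numPartitions) {
        int h = 31 * k.city.hashCode() + k.userId.hashCode();
        return (h & Integer.MAX_VALUE) % numPartitions;
    }

    /** Sorts by SORT, then starts a new group whenever GROUP sees a change. */
    static List<List<Key>> sortAndGroup(List<Key> keys) {
        List<Key> sorted = new ArrayList<>(keys);
        sorted.sort(SORT);

        List<List<Key>> groups = new ArrayList<>();
        List<Key> current = null;
        Key previous = null;
        for (Key k : sorted) {
            if (previous == null || GROUP.compare(previous, k) != 0) {
                current = new ArrayList<>();
                groups.add(current);
            }
            current.add(k);
            previous = k;
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Key> input = Arrays.asList(
                new Key("nyc", "u1", 30L),
                new Key("sfo", "u2", 5L),
                new Key("nyc", "u1", 10L));
        // Each printed line is one group, with its keys ordered by time.
        for (List<Key> group : sortAndGroup(input)) {
            System.out.println(group);
        }
    }
}
```

The point to notice is that the partition rule and the grouping rule look at the same fields (city+userid), while the sort rule looks at one more (time). If the partition rule hashed the whole key, records of one city+userid could land on different reducers and the grouping would never see them together.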
On Mon, May 11, 2009 at 7:42 PM, zsongbo <zson...@gmail.com> wrote:

> Thanks Jothi,
>
> For example, I have a dataset with map key="city+userid+time". The output of mapper are sorted by this map key.
>
> Than, I group the reduce output according to "city+userid" by define my OutputValueGroupingComparator which just compare "city+userid" in the mapkey. I still want the output are sorted by time in each group.
>
> It works fine.
>
> But to improve the performance, I want to use combiner which should also group as "city+userid", but sorted by "city+userid+time".
>
> I do not know if this requirement is reasonable.
>
> Schubert
>
> On Thu, May 7, 2009 at 7:53 PM, Jothi Padmanabhan <joth...@yahoo-inc.com> wrote:
>
> > OutputValueGroupingComparator is used only at the reducer. AFAIK, I do not think you can have a different comparator for combiners.
> >
> > Jothi
> >
> > On 5/7/09 3:32 PM, "zsongbo" <zson...@gmail.com> wrote:
> >
> > > Hi all,
> > > I have a application want the rules of sorting and grouping use different Comparator.
> > >
> > > I had tested 0.19.1 and 0.20.0 about this function, but both do not work for Combiner.
> > >
> > > In 0.19.1, I use job.setOutputValueGroupingComparator(), and
> > > in 0.20.0, I use job.setGroupingComparatorClass()
> > >
> > > This function is ok for reduce phase, the reduce phase can group the keys by above Comparator, and sort by default comparator of the key class.
> > >
> > > But I want the combiner can use a separator comparator for group, different from sorting, is it possible?
> > >
> > > Schubert

Min
--
My research interests are distributed systems, parallel computing and bytecode based virtual machine.
My profile: http://www.linkedin.com/in/coderplay
My blog: http://coderplay.javaeye.com