Oops, I misunderstood your problem. Before the combine operation runs on the map
output keys, those keys are sorted, by default with a quicksort sorter, according
to the rule you set with jobConf.setOutputKeyComparatorClass(). It's impossible
to achieve your goal without modifying some of Hadoop's source code and
rebuilding it. Even then, it wouldn't make much sense.
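
A rough way to picture this, as a plain-Java simulation and not actual Hadoop
code: the spill is sorted with the output key comparator, and the combiner is
then called once per run of keys that compare as equal under that same
comparator. The Rec record, its fields, combine() and the sample data are all
hypothetical; the sketch assumes Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical simulation (not Hadoop code) of the map-side sort-then-combine:
 * records are sorted by the full "city+userid+time" key, and each combiner
 * call covers one run of records whose FULL keys are equal.
 */
public class CombineSimulation {

    /** A map output record: composite key fields plus a count value. */
    record Rec(String city, String userId, long time, int count) {}

    /** Full-key ordering, standing in for the output key comparator. */
    static final Comparator<Rec> SORT_COMPARATOR =
        Comparator.comparing(Rec::city)
                  .thenComparing(Rec::userId)
                  .thenComparingLong(Rec::time);

    /** Returns one summed count per simulated combiner call, in sorted order. */
    static List<Integer> combine(List<Rec> mapOutput) {
        List<Rec> sorted = new ArrayList<>(mapOutput);
        // Stand-in for the spill sort; only the resulting order matters here.
        sorted.sort(SORT_COMPARATOR);
        List<Integer> combinedSums = new ArrayList<>();
        int i = 0;
        while (i < sorted.size()) {
            Rec first = sorted.get(i);
            int sum = 0;
            // Run boundaries come from the SAME comparator used for sorting,
            // so only identical city+userid+time keys share a combiner call.
            while (i < sorted.size() && SORT_COMPARATOR.compare(first, sorted.get(i)) == 0) {
                sum += sorted.get(i).count();
                i++;
            }
            combinedSums.add(sum);
        }
        return combinedSums;
    }

    public static void main(String[] args) {
        List<Rec> out = List.of(
            new Rec("beijing", "u1", 200L, 1),
            new Rec("beijing", "u1", 100L, 1),
            new Rec("beijing", "u1", 100L, 1),
            new Rec("shanghai", "u2", 50L, 1));
        System.out.println(combine(out));
    }
}
```

With this sample data it prints [2, 1, 1]: the two beijing/u1 records with
time 100 are combined, but the beijing/u1 record with time 200 gets its own
combiner call, because the run boundary is decided by the full-key comparator
rather than by a "city+userid" grouping.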


On Mon, May 11, 2009 at 7:42 PM, zsongbo <zson...@gmail.com> wrote:

> Thanks Jothi,
> For example, I have a dataset with map key="city+userid+time". The mapper
> output is sorted by this map key.
>
> Then, by defining my OutputValueGroupingComparator, which compares only the
> "city+userid" part of the map key, I group the keys in the reduce phase by
> "city+userid". I still want the output sorted by time within each group.
>
> It works fine.
>
> But to improve performance, I want to use a combiner that also groups by
> "city+userid" but sorts by "city+userid+time".
>
> I do not know if this requirement is reasonable.
>
>
> Schubert
>
> On Thu, May 7, 2009 at 7:53 PM, Jothi Padmanabhan <joth...@yahoo-inc.com> wrote:
>
> > OutputValueGroupingComparator is used only at the reducer. AFAIK, you
> > cannot have a different comparator for combiners.
> >
> > Jothi
> >
> >
> > On 5/7/09 3:32 PM, "zsongbo" <zson...@gmail.com> wrote:
> >
> > > Hi all,
> > > I have an application that needs different comparators for sorting and
> > > grouping.
> > >
> > > I have tested this in both 0.19.1 and 0.20.0, but in neither version
> > > does it work for the combiner.
> > >
> > > In 0.19.1, I use job.setOutputValueGroupingComparator(), and
> > > in 0.20.0, I use job.setGroupingComparatorClass().
> > >
> > > This works for the reduce phase: the reducer groups the keys with the
> > > comparator above and sorts them with the default comparator of the key
> > > class.
> > >
> > > But I want the combiner to use a separate comparator for grouping,
> > > different from the one used for sorting. Is that possible?
> > >
> > > Schubert
> >
> >
>
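
For reference, the reduce-side part that does work for you (sort by
city+userid+time, group by city+userid) can be pictured with a plain-Java
simulation. This is not Hadoop code: Key, SORT, GROUP and reduceGroups() are
hypothetical names, and the two comparators only mimic what the sort comparator
and the grouping comparator do. It assumes Java 16+ for records. Per Jothi's
reply, this grouping step is applied only at the reducer, not to the combiner.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical simulation (not Hadoop code) of reduce-side secondary sort:
 * sort by city+userid+time, but group by the city+userid prefix only.
 */
public class SecondarySortSimulation {

    /** Composite map output key. */
    record Key(String city, String userId, long time) {}

    /** Full ordering, mimicking the sort (output key) comparator. */
    static final Comparator<Key> SORT =
        Comparator.comparing(Key::city)
                  .thenComparing(Key::userId)
                  .thenComparingLong(Key::time);

    /** Prefix-only ordering, mimicking the grouping comparator. */
    static final Comparator<Key> GROUP =
        Comparator.comparing(Key::city)
                  .thenComparing(Key::userId);

    /** Returns the times seen by each simulated reduce call, in order. */
    static List<List<Long>> reduceGroups(List<Key> keys) {
        List<Key> sorted = new ArrayList<>(keys);
        sorted.sort(SORT);
        List<List<Long>> groups = new ArrayList<>();
        Key groupStart = null;
        for (Key k : sorted) {
            // A new group starts when the city+userid prefix changes. GROUP is
            // consistent with SORT, so equal prefixes are always contiguous.
            if (groupStart == null || GROUP.compare(groupStart, k) != 0) {
                groups.add(new ArrayList<>());
                groupStart = k;
            }
            groups.get(groups.size() - 1).add(k.time());
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Key> keys = List.of(
            new Key("beijing", "u1", 300L),
            new Key("shanghai", "u2", 50L),
            new Key("beijing", "u1", 100L),
            new Key("beijing", "u1", 200L));
        System.out.println(reduceGroups(keys));
    }
}
```

Running main prints [[100, 200, 300], [50]]: one group per city+userid, with
the times already in ascending order inside each group.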


Min
-- 
My research interests are distributed systems, parallel computing and
bytecode-based virtual machines.

My profile:
http://www.linkedin.com/in/coderplay
My blog:
http://coderplay.javaeye.com
