Hi Jay,

AFAIK, when the MR job does not have a reduce phase (i.e. the number of reducers is 0), the output from the Mapper is not sorted.
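To make this concrete, here is a toy, single-process Python model. All function names are hypothetical, and partitioning, multiple mappers, and disk I/O are deliberately left out, so this is a sketch of the idea rather than a description of Hadoop's actual classes. With zero reducers, the mapper's pairs come out in emission order. With a reduce phase, each spill is sorted by key and the sorted runs are k-way merged. Equal keys then end up adjacent, so the reducer can consume one key group at a time from a stream. That is one plausible answer to the "why sort?" question in the quoted mail.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def run_mapper(records, mapper):
    # Collect every (key, value) pair the mapper emits, in emission order.
    return [kv for record in records for kv in mapper(record)]


def spill(pairs, spill_size):
    # Hypothetical stand-in for buffer spills: cut the output into fixed-size chunks.
    return [pairs[i:i + spill_size] for i in range(0, len(pairs), spill_size)]


def run_job(records, mapper, reducer=None, spill_size=2):
    emitted = run_mapper(records, mapper)
    if reducer is None:
        # Map-only job (zero reducers): output is passed through unsorted.
        return emitted
    # Reduce phase present: sort each spill by key, then k-way merge the sorted runs.
    runs = [sorted(chunk, key=itemgetter(0)) for chunk in spill(emitted, spill_size)]
    merged = heapq.merge(*runs, key=itemgetter(0))
    # The merged stream is sorted, so all values for a key are contiguous and
    # the reducer can be called once per key while streaming through the data.
    return [(key, reducer(key, [v for _, v in group]))
            for key, group in groupby(merged, key=itemgetter(0))]


def word_mapper(line):
    for word in line.split():
        yield (word, 1)


def sum_reducer(key, values):
    return sum(values)


lines = ["b a c", "a b"]
print(run_job(lines, word_mapper))               # map-only
print(run_job(lines, word_mapper, sum_reducer))  # with a reduce phase
```

Running it, the map-only call returns the pairs in the order the mapper produced them (`b, a, c, a, b`), while the call with a reducer returns one aggregated entry per key in key order.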
HTH,
Anil

On Fri, Oct 19, 2012 at 8:19 PM, Jay Vyas <jayunit...@gmail.com> wrote:
> Is there any documentation on the internals of the shuffle and sort phase?
> The elephant book seems to be the best source, but it appears to only
> lightly touch upon the "magic" part (i.e. the distributed merge sorting and
> mapper spilling).
>
> Also... What is the rationale behind the sortedness of mapper outputs? Is
> the reason to optimize the streaming of mapper values to reducers? In
> simple scenarios, i.e. when there is no reducing to be done, it seems that
> we may not care to have sorted mapper outputs: a random merge of all
> spilled records would be sufficient.
>
> I've noticed that the Shuffle and Sort classes in Hadoop have almost no
> comments and appear to simply wrap other classes.
>
> --
> Jay Vyas
> http://jayunit100.blogspot.com

--
Thanks & Regards,
Anil Gupta