The input of each reducer is not the same; it depends on the input data distribution and on the Partitioner. And the running time of each reduce task consists of three phases: copy, sort, and reduce.
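To make that concrete, here is a toy model in plain Java (not Hadoop's actual `HashPartitioner` class, though the hash-mod logic below matches its default behavior): with a skewed key distribution, the partitioner sends very different record counts to each reducer, so their running times differ even though every reducer runs the same code.

```java
import java.util.*;

public class PartitionSkewDemo {

    // Same logic as Hadoop's default hash partitioning:
    // partition = (hash(key) & Integer.MAX_VALUE) % numReduceTasks
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        int numReducers = 3;

        // A skewed input: "apple" dominates the key distribution.
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 70; i++) keys.add("apple");
        for (int i = 0; i < 20; i++) keys.add("banana");
        for (int i = 0; i < 10; i++) keys.add("cherry");

        // Count how many records land on each reducer.
        int[] counts = new int[numReducers];
        for (String k : keys) counts[partition(k, numReducers)]++;

        // The per-reducer record counts differ, so copy/sort/reduce
        // times differ too, even though each reducer runs the same code.
        System.out.println(Arrays.toString(counts));
    }
}
```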
2010/6/18 李钰 <car...@gmail.com>:
> Hi Todd and Jeff,
>
> Thanks a lot for your discussion, it's really helpful to me. I'd like to
> express my special appreciation for Todd's patient explanation; you helped
> me see the working mechanism of Sort more clearly. And Jeff, thank you for
> reminding me that Sort uses the TotalOrderPartitioner to do partitioning.
> Based on your discussion I have updated my understanding as follows:
> the sorting happens on the map side during the spill process of each map
> task; after that, the map outputs are partitioned by the
> TotalOrderPartitioner, which decides the input range of each reducer.
> Reducers fetch the map outputs assigned to them by the partitioner, merge
> them, and write the results into HDFS.
> Is this understanding right? Please correct me if you find any faults.
> If it is right, then my question rolls back to the original one: since the
> scale of the input data and the operations of each reduce task are the
> same, what may cause the execution times of the reduce tasks to differ?
> All nodes used in my experiment are on the same rack, and they are
> homogeneous.
> Any suggestion will be highly appreciated, thanks.
>
> Best Regards,
> Carp
>
> 2010/6/18 Todd Lipcon <t...@cloudera.com>
>
>> On Thu, Jun 17, 2010 at 9:37 AM, Jeff Zhang <zjf...@gmail.com> wrote:
>>
>> > Todd,
>> >
>> > Why is there a sort in the map task? The sorting here seems useless
>> > to me.
>> >
>> For map-only jobs there isn't. For jobs with a reduce phase, the number
>> of reduce tasks is typically smaller than the number of map tasks, so
>> parallelizing the sort on the mappers and doing only a merge on the
>> reducers is beneficial. Second, this allows the combiner to run on the
>> mapper by identifying when it has multiple outputs for the same key.
>> Third, it allows improved compression of the map output (and thus less
>> intermediate data transfer) by putting similar keys near each other,
>> hopefully within the compression window. Fourth, it kills two birds with
>> one stone, since the mappers already have to group their outputs by
>> partition.
>>
>> -Todd
>>
>> > On Thu, Jun 17, 2010 at 9:26 AM, Todd Lipcon <t...@cloudera.com> wrote:
>> > > On Thu, Jun 17, 2010 at 12:43 AM, Jeff Zhang <zjf...@gmail.com> wrote:
>> > >
>> > >> Your understanding of Sort is not right. The key concept of Sort is
>> > >> the TotalOrderPartitioner. Before the map-reduce job starts, the
>> > >> client side samples the input data to estimate its distribution.
>> > >> The mapper does nothing, and each reducer fetches its data according
>> > >> to the TotalOrderPartitioner. The data in each reducer is locally
>> > >> sorted, and the reducers themselves are ordered (r0 < r1 < r2 ...),
>> > >> so the overall result is sorted.
>> > >>
>> > > The sorting happens on the map side, actually, during the spill
>> > > process. The mapper itself is an identity function, but the map task
>> > > code does perform a sort (on a <partition, key> tuple) as originally
>> > > described in this thread. Reducers just do a merge of the mapper
>> > > outputs.
>> > >
>> > > -Todd
>> > >
>> > >> On Thu, Jun 17, 2010 at 12:13 AM, 李钰 <car...@gmail.com> wrote:
>> > >> > Hi all,
>> > >> >
>> > >> > I'm doing some tuning of the sort benchmark of Hadoop. To be more
>> > >> > specific, I am running tests against the
>> > >> > org.apache.hadoop.examples.Sort class. Looking through the source
>> > >> > code, I think the map tasks take responsibility for sorting the
>> > >> > input data, and the reduce tasks just merge the map outputs and
>> > >> > write them into HDFS.
>> > >> > But here I've got a question I couldn't understand: the time cost
>> > >> > of the reduce phase of each reduce task, that is, of writing data
>> > >> > into HDFS, differs from task to task. Since the input data and
>> > >> > the operations of each reduce task are the same, what could cause
>> > >> > the execution times to differ? Is there anything wrong with my
>> > >> > understanding? Does anybody have any experience with this? I badly
>> > >> > need your help, thanks.
>> > >> >
>> > >> > Best Regards,
>> > >> > Carp
>> > >>
>> > >> --
>> > >> Best Regards
>> > >>
>> > >> Jeff Zhang
>> > >
>> > > --
>> > > Todd Lipcon
>> > > Software Engineer, Cloudera
>> >
>> > --
>> > Best Regards
>> >
>> > Jeff Zhang
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera

--
Best Regards

Jeff Zhang
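To pull the thread's explanation together, here is a self-contained sketch in plain Java (no Hadoop dependencies; the class and method names are illustrative, not Hadoop's real API). It models the mechanism described above: sampled split points give a total-order range partitioner, each map task emits per-partition sorted runs (Hadoop does this in one pass by sorting on the <partition, key> tuple during the spill), and each reducer only merges its runs; concatenating the reducer outputs r0, r1, r2 yields a globally sorted result.

```java
import java.util.*;

public class SortMechanismSketch {

    // Range partitioner in the spirit of TotalOrderPartitioner: the split
    // points would come from sampling the input, so the per-reducer key
    // ranges satisfy r0 < r1 < r2.
    static int partition(String key, String[] splits) {
        for (int i = 0; i < splits.length; i++) {
            if (key.compareTo(splits[i]) < 0) return i;
        }
        return splits.length;
    }

    // What a reducer does: a k-way merge of pre-sorted mapper runs.
    static List<String> merge(List<List<String>> runs) {
        // Min-heap of [runIndex, positionInRun], ordered by current element.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
                Comparator.comparing((int[] e) -> runs.get(e[0]).get(e[1])));
        for (int r = 0; r < runs.size(); r++) {
            if (!runs.get(r).isEmpty()) heap.add(new int[]{r, 0});
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            out.add(runs.get(top[0]).get(top[1]));
            if (top[1] + 1 < runs.get(top[0]).size()) {
                heap.add(new int[]{top[0], top[1] + 1});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] splits = {"g", "p"};      // sampled split points -> 3 reducers
        int numReducers = splits.length + 1;

        List<List<String>> mappers = Arrays.asList(
                Arrays.asList("banana", "quince", "grape", "apple"),
                Arrays.asList("melon", "cherry", "zebra", "kiwi"));

        // Map side: each map task groups its output by partition and sorts
        // within each partition (Hadoop does both in one pass during the
        // spill by sorting on the <partition, key> tuple).
        List<List<List<String>>> runsPerReducer = new ArrayList<>();
        for (int r = 0; r < numReducers; r++) runsPerReducer.add(new ArrayList<>());
        for (List<String> mapOutput : mappers) {
            List<List<String>> byPartition = new ArrayList<>();
            for (int r = 0; r < numReducers; r++) byPartition.add(new ArrayList<>());
            for (String key : mapOutput) byPartition.get(partition(key, splits)).add(key);
            for (int r = 0; r < numReducers; r++) {
                Collections.sort(byPartition.get(r));
                runsPerReducer.get(r).add(byPartition.get(r));
            }
        }

        // Reduce side: each reducer only merges its sorted runs. Because
        // every key in reducer r is smaller than every key in reducer r+1,
        // concatenating the reducer outputs gives a total order.
        List<String> total = new ArrayList<>();
        for (int r = 0; r < numReducers; r++) total.addAll(merge(runsPerReducer.get(r)));
        System.out.println(total);
        // prints [apple, banana, cherry, grape, kiwi, melon, quince, zebra]
    }
}
```

Note that this sketch does not explain the run-time variance asked about in the thread: with a well-sampled total-order partitioner the ranges are balanced by design, so skew, if any, comes from the actual key distribution versus the sample, plus per-node effects such as disk and HDFS write throughput.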