I too had similar problems. I guess we should also set debug mode for that specific class in the log4j.properties file, shouldn't we?
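For example, something along these lines in conf/log4j.properties might do it (a sketch only, not verified; the class names assume the Hadoop 1.x MapTask/ReduceTask classes discussed in this thread):

```properties
# Hypothetical example: raise the log level for the specific task classes
log4j.logger.org.apache.hadoop.mapred.MapTask=DEBUG
log4j.logger.org.apache.hadoop.mapred.ReduceTask=DEBUG
```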
And I didn't quite get what you meant by the task's userlogs. Where are these logs located? In the logs directory I only see logs for all the daemons. Thanks.

On Sun, Jul 8, 2012 at 6:27 PM, Grandl Robert <rgra...@yahoo.com> wrote:
> I see. I was looking into the tasktracker log :).
>
> Thanks a lot,
> Robert
>
> ------------------------------
> *From:* Harsh J <ha...@cloudera.com>
> *To:* Grandl Robert <rgra...@yahoo.com>; mapreduce-user <mapreduce-user@hadoop.apache.org>
> *Sent:* Sunday, July 8, 2012 9:16 PM
> *Subject:* Re: Basic question on how reducer works
>
> The changes should appear in your Task's userlogs (not the TaskTracker
> logs). Have you deployed your changed code properly (i.e. do you
> generate a new tarball, or perhaps use the MRMiniCluster to do this)?
>
> On Mon, Jul 9, 2012 at 4:57 AM, Grandl Robert <rgra...@yahoo.com> wrote:
> > Hi Harsh,
> >
> > Your comments were extremely helpful.
> >
> > Still, I am wondering why, if I add LOG.info entries into MapTask.java or
> > ReduceTask.java in most of the functions (including Old/NewOutputCollector),
> > the logs are not shown. This makes it hard for me to track which
> > functions are called and which are not, even more so in ReduceTask.java.
> >
> > Do you have any ideas?
> >
> > Thanks a lot for your answer,
> > Robert
> >
> > ________________________________
> > From: Harsh J <ha...@cloudera.com>
> > To: mapreduce-user@hadoop.apache.org; Grandl Robert <rgra...@yahoo.com>
> > Sent: Sunday, July 8, 2012 1:34 AM
> > Subject: Re: Basic question on how reducer works
> >
> > Hi Robert,
> >
> > Inline. (The answers are specific to Hadoop 1.x since you asked about
> > that alone, but certain things may vary for Hadoop 2.x.)
> >
> > On Sun, Jul 8, 2012 at 7:07 AM, Grandl Robert <rgra...@yahoo.com> wrote:
> >> Hi,
> >>
> >> I have some questions related to basic functionality in Hadoop.
> >>
> >> 1. When a Mapper processes its intermediate output data, how does it
> >> know how many partitions to create (i.e. how many reducers there will
> >> be) and how much data should go into each partition for each reducer?
> >
> > The number of reducers is non-dynamic and user-specified; it is set in
> > the job configuration. Hence the Partitioner knows the value it needs
> > to use for its numPartitions (== numReduces for the job).
> >
> > For this one in the 1.x code, look at MapTask.java, in the constructors
> > of the internal classes OldOutputCollector (Stable API) and
> > NewOutputCollector (New API).
> >
> > The data estimated to be going into a partition, for limit/scheduling
> > checks, is currently a naive computation, done by summing up the
> > estimated output sizes of each map. See
> > ResourceEstimator#getEstimatedReduceInputSize for the overall
> > estimation across maps, and Task#calculateOutputSize for the per-map
> > estimation code.
> >
> >> 2. When the JobTracker assigns a task to a reducer, it will also
> >> specify the locations of the intermediate output data it should
> >> retrieve, right? But how does a reducer know, for each remote location
> >> holding intermediate output, which portion it alone has to retrieve?
> >
> > The JT does not send the location information when a reduce is
> > scheduled. When the reducers begin their shuffle phase, they query the
> > TaskTracker for the map completion events, via the
> > TaskTracker#getMapCompletionEvents protocol call. The TaskTracker by
> > itself calls the JobTracker#getTaskCompletionEvents protocol call to
> > get this info underneath. The returned structure carries the host that
> > has completed the map successfully, which the reduce's copier relies
> > on to fetch the data from the right host's TT.
> >
> > The reduce merely asks each TT for the data assigned to it from the
> > specific completed maps. Note that a reduce task's ID is also its
> > partition ID, so it merely has to ask for the data for its own task ID,
> > and the TT serves, over HTTP, the right parts of the intermediate data
> > to it.
> >
> > Feel free to ping back if you need some more clarification! :)
> >
> > --
> > Harsh J
>
> --
> Harsh J

--
With Regards
Pavan Kulkarni
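For anyone following along, the map-side partitioning Harsh describes can be sketched roughly like this (a standalone illustration of the classic hash-partitioning idea, not the actual Hadoop source; the class and variable names here are made up):

```java
// Standalone sketch: how a map-side partitioner assigns a record to a reducer.
// numPartitions comes from the job configuration (== number of reduces),
// which is why the mapper "knows" how many partitions to produce.
public class PartitionSketch {

    static int getPartition(String key, int numPartitions) {
        // Mask off the sign bit so a negative hashCode cannot yield a
        // negative partition index.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int numReduces = 4; // user-specified in the job configuration
        String[] keys = {"apple", "banana", "cherry"};
        for (String k : keys) {
            System.out.println(k + " -> partition " + getPartition(k, numReduces));
        }
        // Every key deterministically lands in [0, numReduces); reduce task i
        // then fetches exactly the records of partition i from each
        // completed map's output, which is the task-ID-as-partition-ID
        // behaviour described above.
    }
}
```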