Pavan,

This is covered in the MR tutorial doc:
http://hadoop.apache.org/common/docs/stable/mapred_tutorial.html#Task+Logs
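As an aside on the log4j question: a per-class override in
conf/log4j.properties is usually enough to raise the task-side log
level. A minimal sketch, assuming a stock Hadoop 1.x tarball deploy
where the conf directory ends up on the task classpath (swap in
whichever classes you are instrumenting):

    # Raise MapTask/ReduceTask logging to DEBUG in the task JVMs
    log4j.logger.org.apache.hadoop.mapred.MapTask=DEBUG
    log4j.logger.org.apache.hadoop.mapred.ReduceTask=DEBUG

The resulting logs land under the userlogs tree, not among the daemon
logs you were looking at, e.g.
${hadoop.log.dir}/userlogs/<job-id>/<attempt-id>/syslog (with stdout
and stderr alongside).

(I have also appended, after the quoted thread, two short sketches of
the partitioner and shuffle-fetch mechanics discussed below.)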
On Mon, Jul 9, 2012 at 8:26 AM, Pavan Kulkarni <pavan.babu...@gmail.com> wrote:
> I too had similar problems.
> I guess we should also set debug mode for that specific class
> in the log4j.properties file, shouldn't we?
>
> And I didn't quite get what you mean by a task's userlogs.
> Where are these logs located? In the logs directory I only see
> logs for all the daemons. Thanks
>
>
> On Sun, Jul 8, 2012 at 6:27 PM, Grandl Robert <rgra...@yahoo.com> wrote:
>>
>> I see. I was looking into the TaskTracker log :).
>>
>> Thanks a lot,
>> Robert
>>
>> ________________________________
>> From: Harsh J <ha...@cloudera.com>
>> To: Grandl Robert <rgra...@yahoo.com>; mapreduce-user
>> <mapreduce-user@hadoop.apache.org>
>> Sent: Sunday, July 8, 2012 9:16 PM
>>
>> Subject: Re: Basic question on how reducer works
>>
>> The changes should appear in your task's userlogs (not the TaskTracker
>> logs). Have you deployed your changed code properly (i.e. do you
>> generate a new tarball, or perhaps use the MRMiniCluster to do this)?
>>
>> On Mon, Jul 9, 2012 at 4:57 AM, Grandl Robert <rgra...@yahoo.com> wrote:
>> > Hi Harsh,
>> >
>> > Your comments were extremely helpful.
>> >
>> > Still, I am wondering why, if I add LOG.info entries to most of the
>> > functions in MapTask.java or ReduceTask.java (including
>> > Old/NewOutputCollector), the logs are not shown. This makes it hard
>> > for me to track which functions are called and which are not, even
>> > more so in ReduceTask.java.
>> >
>> > Do you have any ideas?
>> >
>> > Thanks a lot for your answer,
>> > Robert
>> >
>> > ________________________________
>> > From: Harsh J <ha...@cloudera.com>
>> > To: mapreduce-user@hadoop.apache.org; Grandl Robert <rgra...@yahoo.com>
>> > Sent: Sunday, July 8, 2012 1:34 AM
>> >
>> > Subject: Re: Basic question on how reducer works
>> >
>> > Hi Robert,
>> >
>> > Inline. (The answer is specific to Hadoop 1.x, since you asked about
>> > that alone; certain things may vary in Hadoop 2.x.)
>> >
>> > On Sun, Jul 8, 2012 at 7:07 AM, Grandl Robert <rgra...@yahoo.com> wrote:
>> >> Hi,
>> >>
>> >> I have some questions related to basic functionality in Hadoop.
>> >>
>> >> 1. When a Mapper produces its intermediate output data, how does it
>> >> know how many partitions to create (i.e. how many reducers there
>> >> will be), and how much data goes into each partition for each
>> >> reducer?
>> >
>> > The number of reducers is non-dynamic: it is user-specified and set
>> > in the job configuration. Hence the Partitioner knows the value it
>> > needs to use for its numPartitions (== numReduces for the job).
>> >
>> > For this, in the 1.x code, look at MapTask.java, in the constructors
>> > of the internal classes OldOutputCollector (Stable API) and
>> > NewOutputCollector (New API).
>> >
>> > The data estimated to be going into a partition, for limit/scheduling
>> > checks, is currently a naive computation, done by summing the
>> > estimated output sizes of each map. See
>> > ResourceEstimator#getEstimatedReduceInputSize for the overall
>> > estimation across maps, and Task#calculateOutputSize for the
>> > per-map estimation code.
>> >
>> >> 2. When the JobTracker assigns a task to a reducer, will it also
>> >> specify the locations of the intermediate output data the reducer
>> >> should retrieve? And how will a reducer know which portion of the
>> >> intermediate output it has to retrieve from each remote location?
>> >
>> > The JT does not send location information when a reduce is
>> > scheduled. When the reducers begin their shuffle phase, they query
>> > the TaskTracker to get the map completion events, via the
>> > TaskTracker#getMapCompletionEvents protocol call. The TaskTracker by
>> > itself calls the JobTracker#getTaskCompletionEvents protocol call to
>> > get this info underneath. The returned structure carries the host
>> > that has completed the map successfully, which the reduce's copier
>> > relies on to fetch the data from the right host's TT.
>> >
>> > The reduce merely asks each TT for the data assigned to it for the
>> > specific completed maps. Note that a reduce task's ID is also its
>> > partition ID, so it simply asks for the data for its own task ID,
>> > and the TT serves, over HTTP, the right parts of the intermediate
>> > data to it.
>> >
>> > Feel free to ping back if you need some more clarification! :)
>> >
>> > --
>> > Harsh J
>> >
>> >
>>
>>
>> --
>> Harsh J
>>
>>
>
>
>
> --
>
> --With Regards
> Pavan Kulkarni
>

--
Harsh J
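To make the numPartitions point from the thread above concrete, here is
a minimal, self-contained sketch against the Hadoop 1.x stable API. The
class names are the real 1.x ones; the job wiring and the record
key/value are made up for illustration:

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.lib.HashPartitioner;

    public class PartitionSketch {
      public static void main(String[] args) {
        JobConf conf = new JobConf();
        // User-specified and non-dynamic; this is the numPartitions
        // that the map-side collector hands to the Partitioner.
        conf.setNumReduceTasks(4);

        // The default partitioner: hash of the key, modulo the
        // number of reduce tasks.
        HashPartitioner<String, String> partitioner =
            new HashPartitioner<String, String>();
        int partition = partitioner.getPartition(
            "some-key", "some-value", conf.getNumReduceTasks());
        System.out.println("record goes to partition " + partition);
      }
    }

OldOutputCollector/NewOutputCollector in MapTask.java do essentially
this for every emitted record, which is why the reduce count has to be
fixed before the job starts.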
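And for the "a reduce task's ID is also its partition ID" point: in 1.x
the reduce-side copier pulls its slice from the serving TaskTracker's
embedded HTTP server (MapOutputServlet). The fetch URL it builds is
roughly of this shape; the IDs below are placeholders and the default
TT HTTP port (50060) is assumed:

    http://<tt-host>:50060/mapOutput
        ?job=job_201207090001_0001
        &map=attempt_201207090001_0001_m_000003_0
        &reduce=2

The only reduce-specific piece is the trailing partition number. Each
reducer asks every completed map's TT for its own partition, and the TT
uses the per-partition index written at spill time to serve just that
byte range of the map's output file.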