Oh. Thanks a lot, Harsh.

On Sun, Jul 8, 2012 at 11:38 PM, Harsh J <ha...@cloudera.com> wrote:
> Pavan,
>
> This is covered in the MR tutorial doc:
> http://hadoop.apache.org/common/docs/stable/mapred_tutorial.html#Task+Logs
>
> On Mon, Jul 9, 2012 at 8:26 AM, Pavan Kulkarni <pavan.babu...@gmail.com> wrote:
> > I too had similar problems.
> > I guess we should also set the debug mode for
> > that specific class in the log4j.properties file. Isn't it?
> >
> > And I didn't quite get what you mean by the task's userlogs.
> > Where are these logs located? In the logs directory I only see
> > logs for all the daemons. Thanks
> >
> > On Sun, Jul 8, 2012 at 6:27 PM, Grandl Robert <rgra...@yahoo.com> wrote:
> >>
> >> I see. I was looking into the tasktracker log :).
> >>
> >> Thanks a lot,
> >> Robert
> >>
> >> ________________________________
> >> From: Harsh J <ha...@cloudera.com>
> >> To: Grandl Robert <rgra...@yahoo.com>; mapreduce-user <mapreduce-user@hadoop.apache.org>
> >> Sent: Sunday, July 8, 2012 9:16 PM
> >>
> >> Subject: Re: Basic question on how reducer works
> >>
> >> The changes should appear in your Task's userlogs (not the TaskTracker
> >> logs). Have you deployed your changed code properly (i.e. do you
> >> generate a new tarball, or perhaps use the MRMiniCluster to do this)?
> >>
> >> On Mon, Jul 9, 2012 at 4:57 AM, Grandl Robert <rgra...@yahoo.com> wrote:
> >> > Hi Harsh,
> >> >
> >> > Your comments were extremely helpful.
> >> >
> >> > Still I am wondering why, if I add LOG.info entries into MapTask.java
> >> > or ReduceTask.java in most of the functions (including
> >> > Old/NewOutputCollector), the logs are not shown. This makes it hard
> >> > for me to track which functions are called and which are not, even
> >> > more so in ReduceTask.java.
> >> >
> >> > Do you have any ideas?
> >> >
> >> > Thanks a lot for your answer,
> >> > Robert
> >> >
> >> > ________________________________
> >> > From: Harsh J <ha...@cloudera.com>
> >> > To: mapreduce-user@hadoop.apache.org; Grandl Robert <rgra...@yahoo.com>
> >> > Sent: Sunday, July 8, 2012 1:34 AM
> >> >
> >> > Subject: Re: Basic question on how reducer works
> >> >
> >> > Hi Robert,
> >> >
> >> > Inline. (The answer is specific to Hadoop 1.x since you asked about
> >> > that alone; certain things may vary for Hadoop 2.x.)
> >> >
> >> > On Sun, Jul 8, 2012 at 7:07 AM, Grandl Robert <rgra...@yahoo.com> wrote:
> >> >> Hi,
> >> >>
> >> >> I have some questions related to basic functionality in Hadoop.
> >> >>
> >> >> 1. When a Mapper processes the intermediate output data, how does it
> >> >> know how many partitions to make (i.e. how many reducers there will
> >> >> be), and how much data should go into each partition for each reducer?
> >> >
> >> > The number of reducers is non-dynamic and user-specified; it is set
> >> > in the job configuration. Hence the Partitioner knows the value it
> >> > needs to use for its numPartitions (== numReduces for the job).
> >> >
> >> > For this one in the 1.x code, look at MapTask.java, in the
> >> > constructors of the internal classes OldOutputCollector (Stable API)
> >> > and NewOutputCollector (New API).
> >> >
> >> > The data estimated to be going into a partition, for limit/scheduling
> >> > checks, is currently a naive computation, done by summing up the
> >> > estimated output sizes of each map. See
> >> > ResourceEstimator#getEstimatedReduceInputSize for the overall
> >> > estimation across maps, and Task#calculateOutputSize for the per-map
> >> > estimation code.
> >> >
> >> >> 2. When the JobTracker assigns a task to a reducer, does it also
> >> >> specify the locations of the intermediate output data to retrieve?
> >> >> But how does a reducer know which portion of the intermediate
> >> >> output it has to retrieve from each remote location?
> >> >
> >> > The JT does not send the location information when a reduce is
> >> > scheduled. When the reducers begin their shuffle phase, they query
> >> > the TaskTracker for the map completion events, via the
> >> > TaskTracker#getMapCompletionEvents protocol call. The TaskTracker
> >> > itself calls the JobTracker#getTaskCompletionEvents protocol call to
> >> > get this info underneath. The returned structure carries the host
> >> > that completed the map successfully, which the Reduce's copier
> >> > relies on to fetch the data from the right host's TT.
> >> >
> >> > The reduce merely asks each TT for the data assigned to it for the
> >> > specific completed maps. Note that a reduce task's ID is also its
> >> > partition ID, so it merely has to ask for the data for its own task
> >> > ID, and the TT serves, over HTTP, the right parts of the
> >> > intermediate data to it.
> >> >
> >> > Feel free to ping back if you need some more clarification! :)
> >> >
> >> > --
> >> > Harsh J
> >>
> >> --
> >> Harsh J
> >
> > --
> > --With Regards
> > Pavan Kulkarni
>
> --
> Harsh J

--
--With Regards
Pavan Kulkarni
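Harsh's first answer (the Partitioner already knows numPartitions == the configured number of reduces) can be sketched in a few lines. This is a toy standalone version, not Hadoop code: the class name PartitionSketch and the sample key are invented for illustration, but the formula mirrors the one used by Hadoop 1.x's default HashPartitioner.

```java
// Toy sketch of map-side partitioning: numPartitions equals the number of
// reduce tasks the user configured for the job, so every mapper computes
// the same partition for the same key without any coordination.
public class PartitionSketch {

    // Mirrors Hadoop's default HashPartitioner: mask off the sign bit of
    // the key's hash, then take the modulo over the reduce count.
    static int getPartition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduces = 4; // user-specified, e.g. via JobConf#setNumReduceTasks(4)
        // All values for the key "hello" land at the same reducer.
        System.out.println(getPartition("hello", numReduces)); // prints 2
    }
}
```

Because the reduce count is fixed in the job configuration before any map runs, no partition can ever point at a reducer that does not exist.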
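The second answer's key point (a reduce task's ID is also its partition ID, and the TT serves map output over HTTP) can be illustrated with a sketch of the fetch URL a reduce copier might build. The host, port, and IDs below are made-up example values, and the exact query string is an assumption modeled loosely on the 1.x TaskTracker's map-output servlet; treat it as illustrative, not authoritative.

```java
// Illustrative sketch: a reduce copier asking a TaskTracker, over HTTP,
// for its own slice of a completed map's output. The reduce identifies
// the slice it wants simply by its own partition (== task ID) number.
public class ShuffleUrlSketch {

    static String mapOutputUrl(String ttHost, int httpPort, String jobId,
                               String mapAttemptId, int reducePartition) {
        // The parameter names here are assumptions for illustration; the
        // idea is that the reduce names the completed map attempt it wants
        // and its own partition number, and the TT serves just that part.
        return "http://" + ttHost + ":" + httpPort
                + "/mapOutput?job=" + jobId
                + "&map=" + mapAttemptId
                + "&reduce=" + reducePartition;
    }

    public static void main(String[] args) {
        // Reduce #2 fetching the output of map attempt #3 (example IDs).
        System.out.println(mapOutputUrl("tt-node-1.example.com", 50060,
                "job_201207080001_0042",
                "attempt_201207080001_0042_m_000003_0", 2));
    }
}
```

This is why the JT never has to push location data to the reduces: each reduce learns the successful hosts from the completion events and already knows which partition number to ask for.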