Re: Hadoop Debugging in LocalMode (Breakpoints not reached)

Björn-Elmar Macek Fri, 25 May 2012 01:54:03 -0700

Although the responses did not give me the feeling that there was much interest in my case, I have found a "solution" and some reasons for my problem. You might be interested in the discussion on Stack Overflow:

http://stackoverflow.com/questions/10720132/hadoop-reducer-is-waiting-for-mapper-inputs



On 23.05.2012 10:47, Björn-Elmar Macek wrote:

OK, I have looked at the logs some more and googled every tiny bit of them, hoping to find an answer out there.
I fear that the following line largely pins down my problem:
12/05/22 01:30:21 INFO mapred.ReduceTask: attempt_local_0001_r_000000_0 Need another 2 map output(s) where 0 is already in progress
I found several discussions of problems that also had this line in their logs. I have checked my code for the following:
* All inputs are collected in the mapper (though not all would be necessary)
* The Comparators run well and return proper values for all inputs
* The Partitioner always returns proper values
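Checks like the ones in this list can also be automated outside of Hadoop. The following is only a minimal sketch in plain Java, assuming the comparison and partitioning logic can be called on deserialized sample keys; `ContractCheck`, `isConsistent` and `partitionsInRange` are hypothetical names and not part of the Hadoop API. It tests that a comparator is reflexive, antisymmetric and transitive on a sample, and that a partition function stays within `[0, numPartitions)`.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToIntFunction;

public class ContractCheck {

    /** Returns true if cmp is reflexive, antisymmetric and transitive on the given samples. */
    public static <T> boolean isConsistent(Comparator<T> cmp, List<T> samples) {
        for (T a : samples) {
            // An element must compare equal to itself.
            if (cmp.compare(a, a) != 0) return false;
            for (T b : samples) {
                int ab = Integer.signum(cmp.compare(a, b));
                int ba = Integer.signum(cmp.compare(b, a));
                // Swapping the arguments must flip the sign.
                if (ab != -ba) return false;
                for (T c : samples) {
                    // a > b and b > c must imply a > c.
                    if (ab > 0 && cmp.compare(b, c) > 0 && cmp.compare(a, c) <= 0) return false;
                }
            }
        }
        return true;
    }

    /** Returns true if every sample is assigned a partition in [0, numPartitions). */
    public static <T> boolean partitionsInRange(ToIntFunction<T> partitioner, List<T> samples, int numPartitions) {
        for (T s : samples) {
            int p = partitioner.applyAsInt(s);
            if (p < 0 || p >= numPartitions) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical sample keys; real keys would come from the job's input.
        List<String> keys = Arrays.asList("alice", "bob", "carol", "bob");

        Comparator<String> good = Comparator.naturalOrder();
        Comparator<String> bad = (a, b) -> 1; // broken on purpose: never returns 0

        System.out.println(isConsistent(good, keys)); // prints true
        System.out.println(isConsistent(bad, keys));  // prints false

        // One reducer, as in the log (numReduceTasks: 1); masking the sign bit avoids negative results.
        System.out.println(partitionsInRange(k -> (k.hashCode() & Integer.MAX_VALUE) % 1, keys, 1)); // prints true
    }
}
```

A check like this only covers the sample it is given, so passing it does not prove the classes are correct; it merely rules out the most common contract violations.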

Please, I would really need a hint as to where I have to look.
On 22.05.2012 16:57, Björn-Elmar Macek wrote:
Hi Jayaseelan,

thanks for the bump! ;)
I have continued working on the problem, but with no further success. I emptied the log directory and started the debugging all over again, resulting in no new log files, so I guess the program did not run into serious problems. Also, all the code of the other classes, namely ...
* Mapper
* Partitioner
* OutputKeyComparatorClass
is executed and can easily be debugged. Still, the Reducer and the OutputValueGroupingComparator do NOT work. After the comparisons made by OutputKeyComparatorClass have been executed, I get a lot of active processes in my debugging view in Eclipse:
OpenJDK Client VM[localhost:5002]
    Thread [main] (Running)
    Thread [Thread-2] (Running)
    Daemon Thread [communication thread] (Running)
    Thread [MapOutputCopier attempt_local_0001_r_000000_0.0] (Running)
    Daemon Thread [Thread for merging in memory files] (Running)
    Thread [MapOutputCopier attempt_local_0001_r_000000_0.4] (Running)
    Thread [MapOutputCopier attempt_local_0001_r_000000_0.3] (Running)
    Thread [MapOutputCopier attempt_local_0001_r_000000_0.1] (Running)
    Thread [MapOutputCopier attempt_local_0001_r_000000_0.2] (Running)
    Daemon Thread [Thread for merging on-disk files] (Running)
    Daemon Thread [Thread for polling Map Completion Events] (Running)
And those processes are running, but obviously waiting for something, since no output is produced. And it is not due to a heavy load of input data, since this is a 10-line CSV file, which shouldn't cause any problems.
I somehow have the feeling that the framework cannot handle my classes, but I don't understand why.
I would really appreciate a decent hint, how to fix that.
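Eclipse labels a thread "Running" whenever it is not suspended by the debugger, so that label alone does not show whether a thread is doing work or blocked. One way to find out what such threads are waiting on is to dump their JVM states and stack traces. The following is a minimal sketch using only the standard `Thread.getAllStackTraces()` call; the class name `ThreadDump` is hypothetical, and the method would have to be called from inside the JVM that runs the job (for example from a helper thread started in the driver).

```java
import java.util.Map;

public class ThreadDump {

    /** Formats the name, JVM state and stack trace of every live thread in this JVM. */
    public static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            // The state (RUNNABLE, WAITING, TIMED_WAITING, BLOCKED, ...) is what the debug view does not show.
            sb.append(t.getName())
              .append(" [").append(t.getState()).append("]")
              .append(t.isDaemon() ? " daemon" : "")
              .append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
    }
}
```

Suspending a single thread in the Eclipse debug view, or running `jstack` against the process ID, shows the same information without any code changes.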

Thank you for your time and help!
Björn-Elmar
On 22.05.2012 12:38, Jayaseelan E wrote:
------------------------------------------------------------------------
*From:* Björn-Elmar Macek [mailto:ma...@cs.uni-kassel.de]
*Sent:* Tuesday, May 22, 2012 3:12 PM
*To:* hdfs-user@hadoop.apache.org
*Subject:* Hadoop Debugging in LocalMode (Breakpoints not reached)

Hi there,
I am currently trying to get rid of bugs in my Hadoop program by debugging it. Everything went fine until some point yesterday. I don't know what exactly happened, but my program does not stop at breakpoints within the Reducer, and also not within the RawComparator for the values, which I use for sorting my inputs in the ReducerIterator.
(see the classes set for the conf below:)

conf.setOutputValueGroupingComparator(TwitterValueGroupingComparator.class);
conf.setReducerClass(RetweetReducer.class);
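For readers unfamiliar with the grouping comparator set above: in the old `JobConf` API it is applied to the map output keys after they have been sorted, and it decides which consecutive keys end up in the same reduce call. The following plain-Java sketch only simulates that idea on in-memory strings; it is not Hadoop code, the real framework compares serialized bytes through a `RawComparator`, and `GroupingSketch`, `sortAndGroup` and the `user:timestamp` records are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class GroupingSketch {

    /**
     * Sorts the records with sortCmp, then opens a new group whenever
     * groupCmp reports a difference between two consecutive records.
     */
    public static <T> List<List<T>> sortAndGroup(List<T> records, Comparator<T> sortCmp, Comparator<T> groupCmp) {
        List<T> sorted = new ArrayList<>(records);
        sorted.sort(sortCmp);

        List<List<T>> groups = new ArrayList<>();
        T previous = null;
        for (T r : sorted) {
            if (groups.isEmpty() || groupCmp.compare(previous, r) != 0) {
                groups.add(new ArrayList<>()); // corresponds to one reduce() call
            }
            groups.get(groups.size() - 1).add(r);
            previous = r;
        }
        return groups;
    }

    public static void main(String[] args) {
        // Hypothetical records of the form "user:timestamp".
        List<String> records = Arrays.asList("bob:2", "alice:3", "bob:1", "alice:1");

        // Grouping looks at the user only; sorting looks at user, then timestamp.
        Comparator<String> byUser = Comparator.comparing(s -> s.split(":")[0]);
        Comparator<String> byUserThenTime = byUser.thenComparingInt(s -> Integer.parseInt(s.split(":")[1]));

        System.out.println(sortAndGroup(records, byUserThenTime, byUser));
        // prints [[alice:1, alice:3], [bob:1, bob:2]]
    }
}
```

In this simulation the grouping comparator only runs once all records are sorted. In the job described here the reducer never gets past its copy phase, which would be consistent with breakpoints in the grouping comparator and the Reducer never being reached.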

The log looks like this:

Warning: $HADOOP_HOME is deprecated.

Listening for transport dt_socket at address: 5002
12/05/21 19:24:20 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/05/21 19:24:20 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/05/21 19:24:20 WARN snappy.LoadSnappy: Snappy native library not loaded
12/05/21 19:24:20 INFO mapred.FileInputFormat: Total input paths to process : 2
12/05/21 19:24:20 WARN conf.Configuration: file:/tmp/hadoop-ema/mapred/local/localRunner/job_local_0001.xml:a attempt to override final parameter: fs.default.name; Ignoring.
12/05/21 19:24:20 WARN conf.Configuration: file:/tmp/hadoop-ema/mapred/local/localRunner/job_local_0001.xml:a attempt to override final parameter: mapred.job.tracker; Ignoring.
12/05/21 19:24:20 INFO mapred.JobClient: Running job: job_local_0001

12/05/21 19:24:20 INFO util.ProcessTree: setsid exited with exit code 0
12/05/21 19:24:21 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1c4ff2c
12/05/21 19:24:21 INFO mapred.MapTask: numReduceTasks: 1

12/05/21 19:24:21 INFO mapred.MapTask: io.sort.mb = 100

12/05/21 19:24:22 INFO mapred.JobClient: map 0% reduce 0%

12/05/21 19:24:22 INFO mapred.MapTask: data buffer = 79691776/99614720

12/05/21 19:24:22 INFO mapred.MapTask: record buffer = 262144/327680

12/05/21 19:24:22 INFO mapred.MapTask: Starting flush of map output

12/05/21 19:24:22 INFO mapred.MapTask: Finished spill 0
12/05/21 19:24:22 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
12/05/21 19:24:23 INFO mapred.LocalJobRunner: file:/home/ema/INPUT-H/tweets_ext:0+968
12/05/21 19:24:23 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done.
12/05/21 19:24:23 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1e8c585
12/05/21 19:24:23 INFO mapred.MapTask: numReduceTasks: 1

12/05/21 19:24:23 INFO mapred.MapTask: io.sort.mb = 100

12/05/21 19:24:24 INFO mapred.MapTask: data buffer = 79691776/99614720

12/05/21 19:24:24 INFO mapred.MapTask: record buffer = 262144/327680

12/05/21 19:24:24 INFO mapred.MapTask: Starting flush of map output
12/05/21 19:24:24 INFO mapred.Task: Task:attempt_local_0001_m_000001_0 is done. And is in the process of commiting
12/05/21 19:24:24 INFO mapred.JobClient: map 100% reduce 0%
12/05/21 19:24:26 INFO mapred.LocalJobRunner: file:/home/ema/INPUT-H/tweets~:0+0
12/05/21 19:24:26 INFO mapred.Task: Task 'attempt_local_0001_m_000001_0' done.
12/05/21 19:24:26 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@191e4c
12/05/21 19:24:26 INFO mapred.ReduceTask: ShuffleRamManager: MemoryLimit=709551680, MaxSingleShuffleLimit=177387920
12/05/21 19:24:27 INFO mapred.ReduceTask: attempt_local_0001_r_000000_0 Need another 2 map output(s) where 0 is already in progress
12/05/21 19:24:27 INFO mapred.ReduceTask: attempt_local_0001_r_000000_0 Thread started: Thread for merging on-disk files
12/05/21 19:24:27 INFO mapred.ReduceTask: attempt_local_0001_r_000000_0 Thread waiting: Thread for merging on-disk files
12/05/21 19:24:27 INFO mapred.ReduceTask: attempt_local_0001_r_000000_0 Scheduled 0 outputs (0 slow hosts and0 dup hosts)
12/05/21 19:24:27 INFO mapred.ReduceTask: attempt_local_0001_r_000000_0 Thread started: Thread for merging in memory files
12/05/21 19:24:27 INFO mapred.ReduceTask: attempt_local_0001_r_000000_0 Thread started: Thread for polling Map Completion Events
12/05/21 19:24:32 INFO mapred.LocalJobRunner: reduce > copy >

12/05/21 19:24:35 INFO mapred.LocalJobRunner: reduce > copy >

12/05/21 19:24:42 INFO mapred.LocalJobRunner: reduce > copy >

12/05/21 19:24:48 INFO mapred.LocalJobRunner: reduce > copy >

12/05/21 19:24:51 INFO mapred.LocalJobRunner: reduce > copy >

12/05/21 19:24:57 INFO mapred.LocalJobRunner: reduce > copy >

... etc ...

Is there something I have missed?

Thanks for your help in advance!

Best regards,
Björn-Elmar
