[
https://issues.apache.org/jira/browse/HADOOP-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638203#action_12638203
]
Joydeep Sen Sarma commented on HADOOP-3155:
-------------------------------------------
i don't know why the ipc layer ignores interruptedexceptions. but the practical
matter is that it does - and my naiive suspicion is that changing that would be
non trivial. also - since the ipc layer always times out anyway - interrupts
are not required to break out of RPC calls.
as i mentioned - it's no longer a race condition. the assumption that the
thread.interrupt() works to kill the mapeventsfetcher is inherently wrong. my
speculation is that both the threads run forever - which is why new jobs/tasks
also hit the same problem when they run on this TT (i am assuming this since we
had many many jobs that hit the same problem - and they started hours apart).
we need a shutdown flag in addition to the interrupt and need to check that
flag in the eventsfetcher loop. otherwise thread.join will hang. (also - if u
look at general documentation on the web about using interrupts in java -
everyone advises using a flag in addition as general practice).
i would be curious if there are other parts of the hadoop code base that suffer
from the same assumptions (about thread.interrupt being caught and thrown).
> reducers stuck at shuffling
> ----------------------------
>
> Key: HADOOP-3155
> URL: https://issues.apache.org/jira/browse/HADOOP-3155
> Project: Hadoop Core
> Issue Type: Bug
> Reporter: Runping Qi
> Assignee: Amareshwari Sriramadasu
> Fix For: 0.19.0
>
> Attachments: events-job_200807311630_0007.txt, fetcherThread.patch,
> fetcherThread.patch, hadoop-3155-logs.tar.gz, jobevents_1007.txt,
> patch-3155-debug-0.16.txt, patch-3155-debug-0.17.txt,
> task_200807311630_0007_r_000006_0.syslog.gz
>
>
> This happened with hadoop-0.16.2:
> In relatively small job (a few hundreds of mappers and reducers), reducers
> were stuck at shuffling.
> I saw the lines like the following repeated hundreds of thousands of times
> over a few hours:
> 2008-04-02 17:17:44,640 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0 Need 2 map output(s)
> 2008-04-02 17:17:44,641 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0: Got 0 new map-outputs & 0 obsolete
> map-outputs from tasktracker and 0 map-outputs from previous failures
> 2008-04-02 17:17:44,641 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0 Got 0 known map output location(s);
> scheduling...
> 2008-04-02 17:17:44,641 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0 Scheduled 0 of 0 known outputs (0 slow
> hosts and 0 dup hosts)
> 2008-04-02 17:17:46,643 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0 Need 2 map output(s)
> 2008-04-02 17:17:46,643 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0: Got 0 new map-outputs & 0 obsolete
> map-outputs from tasktracker and 0 map-outputs from previous failures
> 2008-04-02 17:17:46,643 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0 Got 0 known map output location(s);
> scheduling...
> 2008-04-02 17:17:46,643 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0 Scheduled 0 of 0 known outputs (0 slow
> hosts and 0 dup hosts)
> 2008-04-02 17:17:48,645 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0 Need 2 map output(s)
> 2008-04-02 17:17:48,645 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0: Got 0 new map-outputs & 0 obsolete
> map-outputs from tasktracker and 0 map-outputs from previous failures
> 2008-04-02 17:17:48,645 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0 Got 0 known map output location(s);
> scheduling...
> 2008-04-02 17:17:48,645 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0 Scheduled 0 of 0 known outputs (0 slow
> hosts and 0 dup hosts)
> 2008-04-02 17:17:50,647 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0 Need 2 map output(s)
> 2008-04-02 17:17:50,647 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0: Got 0 new map-outputs & 0 obsolete
> map-outputs from tasktracker and 0 map-outputs from previous failures
> 2008-04-02 17:17:50,647 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0 Got 0 known map output location(s);
> scheduling...
> 2008-04-02 17:17:50,647 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0 Scheduled 0 of 0 known outputs (0 slow
> hosts and 0 dup hosts)
> 2008-04-02 17:17:52,649 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0 Need 2 map output(s)
> 2008-04-02 17:17:52,650 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0: Got 0 new map-outputs & 0 obsolete
> map-outputs from tasktracker and 0 map-outputs from previous failures
> 2008-04-02 17:17:52,650 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0 Got 0 known map output location(s);
> scheduling...
> 2008-04-02 17:17:52,650 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0 Scheduled 0 of 0 known outputs (0 slow
> hosts and 0 dup hosts)
> 2008-04-02 17:17:54,651 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0 Need 2 map output(s)
> 2008-04-02 17:17:54,652 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0: Got 0 new map-outputs & 0 obsolete
> map-outputs from tasktracker and 0 map-outputs from previous failures
> 2008-04-02 17:17:54,652 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0 Got 0 known map output location(s);
> scheduling...
> 2008-04-02 17:17:54,652 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0 Scheduled 0 of 0 known outputs (0 slow
> hosts and 0 dup hosts)
> 2008-04-02 17:17:56,654 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0 Need 2 map output(s)
> 2008-04-02 17:17:56,654 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0: Got 0 new map-outputs & 0 obsolete
> map-outputs from tasktracker and 0 map-outputs from previous failures
> 2008-04-02 17:17:56,654 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0 Got 0 known map output location(s);
> scheduling...
> 2008-04-02 17:17:56,654 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0 Scheduled 0 of 0 known outputs (0 slow
> hosts and 0 dup hosts)
> 2008-04-02 17:17:58,656 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0 Need 2 map output(s)
> 2008-04-02 17:17:58,656 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0: Got 0 new map-outputs & 0 obsolete
> map-outputs from tasktracker and 0 map-outputs from previous failures
> 2008-04-02 17:17:58,656 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0 Got 0 known map output location(s);
> scheduling...
> 2008-04-02 17:17:58,656 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0 Scheduled 0 of 0 known outputs (0 slow
> hosts and 0 dup hosts)
> 2008-04-02 17:18:00,658 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0 Need 2 map output(s)
> 2008-04-02 17:18:00,658 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0: Got 0 new map-outputs & 0 obsolete
> map-outputs from tasktracker and 0 map-outputs from previous failures
> 2008-04-02 17:18:00,658 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0 Got 0 known map output location(s);
> scheduling...
> 2008-04-02 17:18:00,658 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0 Scheduled 0 of 0 known outputs (0 slow
> hosts and 0 dup hosts)
> 2008-04-02 17:18:02,660 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0 Need 2 map output(s)
> 2008-04-02 17:18:02,661 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0: Got 0 new map-outputs & 0 obsolete
> map-outputs from tasktracker and 0 map-outputs from previous failures
> 2008-04-02 17:18:02,661 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0 Got 0 known map output location(s);
> scheduling...
> 2008-04-02 17:18:02,661 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0 Scheduled 0 of 0 known outputs (0 slow
> hosts and 0 dup hosts)
> 2008-04-02 17:18:04,662 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0 Need 2 map output(s)
> 2008-04-02 17:18:04,663 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0: Got 0 new map-outputs & 0 obsolete
> map-outputs from tasktracker and 0 map-outputs from previous failures
> 2008-04-02 17:18:04,663 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0 Got 0 known map output location(s);
> scheduling...
> 2008-04-02 17:18:04,663 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0 Scheduled 0 of 0 known outputs (0 slow
> hosts and 0 dup hosts)
> 2008-04-02 17:18:06,664 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0 Need 2 map output(s)
> 2008-04-02 17:18:06,665 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0: Got 0 new map-outputs & 0 obsolete
> map-outputs from tasktracker and 0 map-outputs from previous failures
> 2008-04-02 17:18:06,665 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0 Got 0 known map output location(s);
> scheduling...
> 2008-04-02 17:18:06,665 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0 Scheduled 0 of 0 known outputs (0 slow
> hosts and 0 dup hosts)
> 2008-04-02 17:18:08,667 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0 Need 2 map output(s)
> 2008-04-02 17:18:08,667 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0: Got 0 new map-outputs & 0 obsolete
> map-outputs from tasktracker and 0 map-outputs from previous failures
> 2008-04-02 17:18:08,667 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0 Got 0 known map output location(s);
> scheduling...
> 2008-04-02 17:18:08,667 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0 Scheduled 0 of 0 known outputs (0 slow
> hosts and 0 dup hosts)
> 2008-04-02 17:18:10,669 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0 Need 2 map output(s)
> 2008-04-02 17:18:10,669 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0: Got 0 new map-outputs & 0 obsolete
> map-outputs from tasktracker and 0 map-outputs from previous failures
> 2008-04-02 17:18:10,669 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0 Got 0 known map output location(s);
> scheduling...
> 2008-04-02 17:18:10,669 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0 Scheduled 0 of 0 known outputs (0 slow
> hosts and 0 dup hosts)
> 2008-04-02 17:18:12,671 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0 Need 2 map output(s)
> 2008-04-02 17:18:12,671 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0: Got 0 new map-outputs & 0 obsolete
> map-outputs from tasktracker and 0 map-outputs from previous failures
> 2008-04-02 17:18:12,671 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0 Got 0 known map output location(s);
> scheduling...
> 2008-04-02 17:18:12,671 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0 Scheduled 0 of 0 known outputs (0 slow
> hosts and 0 dup hosts)
> 2008-04-02 17:18:14,673 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0 Need 2 map output(s)
> 2008-04-02 17:18:14,674 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0: Got 0 new map-outputs & 0 obsolete
> map-outputs from tasktracker and 0 map-outputs from previous failures
> 2008-04-02 17:18:14,674 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0 Got 0 known map output location(s);
> scheduling...
> 2008-04-02 17:18:14,674 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0 Scheduled 0 of 0 known outputs (0 slow
> hosts and 0 dup hosts)
> 2008-04-02 17:18:16,675 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0 Need 2 map output(s)
> 2008-04-02 17:18:16,676 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0: Got 0 new map-outputs & 0 obsolete
> map-outputs from tasktracker and 0 map-outputs from previous failures
> 2008-04-02 17:18:16,676 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0 Got 0 known map output location(s);
> scheduling...
> 2008-04-02 17:18:16,676 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0 Scheduled 0 of 0 known outputs (0 slow
> hosts and 0 dup hosts)
> 2008-04-02 17:18:18,678 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0 Need 2 map output(s)
> 2008-04-02 17:18:18,678 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0: Got 0 new map-outputs & 0 obsolete
> map-outputs from tasktracker and 0 map-outputs from previous failures
> 2008-04-02 17:18:18,678 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0 Got 0 known map output location(s);
> scheduling...
> 2008-04-02 17:18:18,678 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0 Scheduled 0 of 0 known outputs (0 slow
> hosts and 0 dup hosts)
> 2008-04-02 17:18:20,680 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0 Need 2 map output(s)
> 2008-04-02 17:18:20,680 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0: Got 0 new map-outputs & 0 obsolete
> map-outputs from tasktracker and 0 map-outputs from previous failures
> 2008-04-02 17:18:20,680 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0 Got 0 known map output location(s);
> scheduling...
> 2008-04-02 17:18:20,680 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0 Scheduled 0 of 0 known outputs (0 slow
> hosts and 0 dup hosts)
> 2008-04-02 17:18:22,682 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0 Need 2 map output(s)
> 2008-04-02 17:18:22,682 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0: Got 0 new map-outputs & 0 obsolete
> map-outputs from tasktracker and 0 map-outputs from previous failures
> 2008-04-02 17:18:22,682 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200804021200_0337_r_000008_0 Got 0 known map output location(s);
> scheduling...
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.