[
https://issues.apache.org/jira/browse/MAPREDUCE-1617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848535#action_12848535
]
Amareshwari Sriramadasu commented on MAPREDUCE-1617:
----------------------------------------------------
After going through the failure log, I think the following is the cause of the
failure.
The test expects the first three attempts of the task to fail with System.exit,
RuntimeException and a timeout (failed to report status in 30 seconds)
respectively, and the fourth attempt to succeed. But in the test log, the
fourth attempt also timed out.
Here is the log for the fourth attempt:
{noformat}
2010-03-22 01:25:51,560 INFO mapred.JobTracker
(JobTracker.java:createTaskEntry(2484)) - Adding task (MAP)
'attempt_20100322012429762_0001_m_000000_3' to tip
task_20100322012429762_0001_m_000000, for tracker
'tracker_host1.foo.com:localhost/127.0.0.1:49080'
2010-03-22 01:25:51,562 INFO mapred.TaskTracker
(TaskTracker.java:registerTask(2125)) - LaunchTaskAction
(registerTask): attempt_20100322012429762_0001_m_000000_3 task's
state:UNASSIGNED
2010-03-22 01:25:51,562 INFO mapred.TaskTracker (TaskTracker.java:run(2062)) -
Trying to launch :
attempt_20100322012429762_0001_m_000000_3 which needs 1 slots
2010-03-22 01:25:51,562 INFO mapred.TaskTracker (TaskTracker.java:run(2094)) -
In TaskLauncher, current free slots : 2
and trying to launch attempt_20100322012429762_0001_m_000000_3 which needs 1
slots
2010-03-22 01:26:21,595 INFO mapred.TaskTracker
(TaskTracker.java:markUnresponsiveTasks(1682)) -
attempt_20100322012429762_0001_m_000000_3: Task
attempt_20100322012429762_0001_m_000000_3 failed to report status for
30 seconds. Killing!
2010-03-22 01:26:21,616 INFO mapred.TaskTracker
(TaskTracker.java:purgeTask(1827)) - About to purge task:
attempt_20100322012429762_0001_m_000000_3
2010-03-22 01:26:26,619 INFO mapred.TaskRunner (MapTaskRunner.java:close(43)) -
attempt_20100322012429762_0001_m_000000_3 done; removing files.
2010-03-22 01:26:26,620 INFO mapred.IndexCache
(IndexCache.java:removeMap(140)) - Map ID
attempt_20100322012429762_0001_m_000000_3 not found in cache
{noformat}
For the fourth attempt, attempt_20100322012429762_0001_m_000000_3, I don't see
the log line "JVM with ID:xxxx is given
task: attempt_20100322012429762_0001_m_000000_3".
This means the child JVM's getTask() call did not return within 30 seconds (the
task timeout configured in the test). This is most likely because of
HADOOP-5130. We avoid this in our clusters by setting
-Djava.net.preferIPv4Stack=true in mapred.child.java.opts.
Shall we set the same in the unit test(s) as well?
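As a rough illustration of what setting this in a test could look like, here is a minimal sketch that builds the value for mapred.child.java.opts so that the IPv4 flag is added once and any existing options are kept. The class name ChildOpts, the helper withIpv4Stack and the sample value -Xmx200m are hypothetical and used only for illustration; they are not taken from the test code.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Hadoop or TestBadRecords.
class ChildOpts {
    // The JVM flag mentioned above for working around HADOOP-5130.
    static final String IPV4_FLAG = "-Djava.net.preferIPv4Stack=true";

    // Returns the given child JVM options with the IPv4 flag appended,
    // unless that exact flag is already one of the options.
    static String withIpv4Stack(String existingOpts) {
        if (existingOpts == null || existingOpts.trim().isEmpty()) {
            return IPV4_FLAG;
        }
        String trimmed = existingOpts.trim();
        boolean present = Arrays.asList(trimmed.split("\\s+")).contains(IPV4_FLAG);
        return present ? trimmed : trimmed + " " + IPV4_FLAG;
    }

    public static void main(String[] args) {
        // In a test, this string would be the value set for
        // "mapred.child.java.opts" on the job configuration.
        System.out.println(withIpv4Stack("-Xmx200m"));
    }
}
```

Appending rather than overwriting keeps whatever heap or debug options a test already passes to its child JVMs.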
> TestBadRecords failed once in our test runs
> -------------------------------------------
>
> Key: MAPREDUCE-1617
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1617
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: test
> Reporter: Amareshwari Sriramadasu
> Fix For: 0.22.0
>
> Attachments: TestBadRecords.txt
>
>
> org.apache.hadoop.mapred.TestBadRecords.testBadMapRed failed with the following exception:
> java.io.IOException: Job failed!
> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1142)
> at org.apache.hadoop.mapred.TestBadRecords.runMapReduce(TestBadRecords.java:94)
> at org.apache.hadoop.mapred.TestBadRecords.testBadMapRed(TestBadRecords.java:211)