[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848535#action_12848535
 ] 

Amareshwari Sriramadasu commented on MAPREDUCE-1617:
----------------------------------------------------

After going through the failure log, I think the following is the cause of the
failure.

The test expects the first three attempts of the task to fail with System.exit,
a RuntimeException, and a timeout (failed to report status for 30 seconds),
respectively, and the fourth attempt to succeed. In the test log, however, the
fourth attempt also timed out.
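
As a rough illustration of the sequence described above, the expected outcome per attempt can be sketched as follows. The class AttemptPlan, its enum, and its methods are hypothetical names used only for this sketch and are not taken from TestBadRecords; only the order of outcomes and the attempt ID format come from this comment.

```java
/** Hypothetical sketch: maps a task attempt number to the outcome the test expects. */
public class AttemptPlan {
    enum Outcome { SYSTEM_EXIT, RUNTIME_EXCEPTION, TIMEOUT, SUCCESS }

    /** Attempts are numbered from 0, so attempt 3 is the fourth attempt. */
    static Outcome expectedOutcome(int attemptNumber) {
        switch (attemptNumber) {
            case 0: return Outcome.SYSTEM_EXIT;        // first attempt
            case 1: return Outcome.RUNTIME_EXCEPTION;  // second attempt
            case 2: return Outcome.TIMEOUT;            // third attempt
            default: return Outcome.SUCCESS;           // fourth attempt onwards
        }
    }

    /** Extracts the trailing attempt number from an ID such as attempt_..._m_000000_3. */
    static int attemptNumber(String attemptId) {
        return Integer.parseInt(attemptId.substring(attemptId.lastIndexOf('_') + 1));
    }

    public static void main(String[] args) {
        String id = "attempt_20100322012429762_0001_m_000000_3";
        System.out.println(id + " -> " + expectedOutcome(attemptNumber(id)));
    }
}
```

Under these assumptions, the attempt ending in _3 is the one expected to succeed, which is why its timeout makes the job fail.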

Here is the log for the fourth attempt:
{noformat}
2010-03-22 01:25:51,560 INFO  mapred.JobTracker (JobTracker.java:createTaskEntry(2484)) - Adding task (MAP) 'attempt_20100322012429762_0001_m_000000_3' to tip task_20100322012429762_0001_m_000000, for tracker 'tracker_host1.foo.com:localhost/127.0.0.1:49080'
2010-03-22 01:25:51,562 INFO  mapred.TaskTracker (TaskTracker.java:registerTask(2125)) - LaunchTaskAction (registerTask): attempt_20100322012429762_0001_m_000000_3 task's state:UNASSIGNED
2010-03-22 01:25:51,562 INFO  mapred.TaskTracker (TaskTracker.java:run(2062)) - Trying to launch : attempt_20100322012429762_0001_m_000000_3 which needs 1 slots
2010-03-22 01:25:51,562 INFO  mapred.TaskTracker (TaskTracker.java:run(2094)) - In TaskLauncher, current free slots : 2 and trying to launch attempt_20100322012429762_0001_m_000000_3 which needs 1 slots
2010-03-22 01:26:21,595 INFO  mapred.TaskTracker (TaskTracker.java:markUnresponsiveTasks(1682)) - attempt_20100322012429762_0001_m_000000_3: Task attempt_20100322012429762_0001_m_000000_3 failed to report status for 30 seconds. Killing!
2010-03-22 01:26:21,616 INFO  mapred.TaskTracker (TaskTracker.java:purgeTask(1827)) - About to purge task: attempt_20100322012429762_0001_m_000000_3
2010-03-22 01:26:26,619 INFO  mapred.TaskRunner (MapTaskRunner.java:close(43)) - attempt_20100322012429762_0001_m_000000_3 done; removing files.
2010-03-22 01:26:26,620 INFO  mapred.IndexCache (IndexCache.java:removeMap(140)) - Map ID attempt_20100322012429762_0001_m_000000_3 not found in cache
{noformat}

For the fourth attempt, attempt_20100322012429762_0001_m_000000_3, I don't see
the log line "JVM with ID:xxxx is given task:
attempt_20100322012429762_0001_m_000000_3".
This suggests that the JVM's getTask() call did not return within 30 seconds
(the task timeout configured in the test), most likely because of HADOOP-5130.
We avoid this in our clusters by setting -Djava.net.preferIPv4Stack=true in
mapred.child.java.opts.

Shall we set the same in the unit test(s) as well?
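
A minimal sketch of how a test could add the flag without clobbering options that are already set. The helper class ChildJavaOpts and its method are hypothetical, and "-Xmx200m" is only an example value; the flag and the mapred.child.java.opts property name are the ones mentioned above.

```java
import java.util.Arrays;

/** Hypothetical helper: appends the IPv4 flag to a child-JVM options string if absent. */
public class ChildJavaOpts {
    static final String PREFER_IPV4 = "-Djava.net.preferIPv4Stack=true";

    static String withIPv4Preferred(String existingOpts) {
        String opts = existingOpts == null ? "" : existingOpts.trim();
        if (opts.isEmpty()) {
            return PREFER_IPV4;
        }
        // Only an exact match of the flag counts as "already present".
        boolean present = Arrays.asList(opts.split("\\s+")).contains(PREFER_IPV4);
        return present ? opts : opts + " " + PREFER_IPV4;
    }

    public static void main(String[] args) {
        // In a test, the resulting string would be stored under the
        // "mapred.child.java.opts" property of the job configuration.
        System.out.println(withIPv4Preferred("-Xmx200m"));
    }
}
```

This sketch does not handle the case where the options already contain -Djava.net.preferIPv4Stack=false; a real change would have to decide whether to override that.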


> TestBadRecords failed once in our test runs
> -------------------------------------------
>
>                 Key: MAPREDUCE-1617
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1617
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: test
>            Reporter: Amareshwari Sriramadasu
>             Fix For: 0.22.0
>
>         Attachments: TestBadRecords.txt
>
>
> org.apache.hadoop.mapred.TestBadRecords.testBadMapRed failed with the following exception:
> java.io.IOException: Job failed!
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1142)
>         at org.apache.hadoop.mapred.TestBadRecords.runMapReduce(TestBadRecords.java:94)
>         at org.apache.hadoop.mapred.TestBadRecords.testBadMapRed(TestBadRecords.java:211)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
