[ 
https://issues.apache.org/jira/browse/HADOOP-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12645389#action_12645389
 ] 

Aaron Kimball commented on HADOOP-4595:
---------------------------------------

Two reducers are allowed per node, so maxFreeSlots is 2. The node was in the 
middle of running tasks at the time: numFreeSlots hit 1, even though two JVMs 
were already running and marked as busy.

The attempt never got launched; the launch process itself was what triggered 
the exception.
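
To make the inconsistency concrete, here is a minimal sketch of the accounting 
I would expect to hold. This is not the actual JvmManager code: the class and 
method names are hypothetical, and it assumes each busy JVM occupies exactly 
one slot.

```java
// Hypothetical sketch, NOT the real org.apache.hadoop.mapred.JvmManager.
// Assumption: every busy JVM occupies exactly one slot.
public class SlotAccounting {
    /** Free slots implied by the number of busy JVMs. */
    static int expectedFreeSlots(int maxSlots, int busyJvms) {
        return maxSlots - busyJvms;
    }

    /** True when the recorded free-slot counter agrees with the busy-JVM count. */
    static boolean isConsistent(int maxSlots, int busyJvms, int recordedFreeSlots) {
        return recordedFreeSlots == expectedFreeSlots(maxSlots, busyJvms);
    }

    public static void main(String[] args) {
        // Values from this report: 2 slots, 2 busy JVMs, counter reading 1.
        System.out.println(expectedFreeSlots(2, 2));
        System.out.println(isConsistent(2, 2, 1));
    }
}
```

With the numbers above, the expected free-slot count is 0, so a counter 
reading of 1 fails the check.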

> JVM Reuse triggers RuntimeException("Invalid state")
> ----------------------------------------------------
>
>                 Key: HADOOP-4595
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4595
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Aaron Kimball
>            Assignee: Devaraj Das
>
> A Reducer triggers the following exception:
> 08/11/05 08:58:50 INFO mapred.JobClient: Task Id : attempt_200811040110_0230_r_000008_1, Status : FAILED
> java.lang.RuntimeException: Inconsistent state!!! JVM Manager reached an unstable state while reaping a JVM for task: attempt_200811040110_0230_r_000008_1 Number of active JVMs:2
>  JVMId jvm_200811040110_0230_r_-735233075 #Tasks ran: 0 Currently busy? true Currently running: attempt_200811040110_0230_r_000012_0
>  JVMId jvm_200811040110_0230_r_-1716942642 #Tasks ran: 0 Currently busy? true Currently running: attempt_200811040110_0230_r_000040_0
>    at java.lang.Throwable.<init>(Throwable.java:67)
>    at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.reapJvm(JvmManager.java:245)
>    at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.access$000(JvmManager.java:113)
>    at org.apache.hadoop.mapred.JvmManager.launchJvm(JvmManager.java:78)
>    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:410)
> Other clues:
> In the three reduce task attempts where this was observed, the failing 
> attempt was _1. Attempt _0 had started and eventually switched to 
> "SUCCEEDED." So I think this is happening only on speculatively-executed 
> reduce task attempts. The reduce output (part-XXXXX) gets lost when this 
> attempt fails, even though the other (earlier) attempt succeeded.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
