[jira] [Updated] (MAPREDUCE-4890) Invalid TaskImpl state transitions when task fails while speculating

2012-12-21 Thread Jason Lowe (JIRA)

 [ https://issues.apache.org/jira/browse/MAPREDUCE-4890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated MAPREDUCE-4890:
--

   Resolution: Fixed
Fix Version/s: 0.23.6, 2.0.3-alpha
 Hadoop Flags: Reviewed
       Status: Resolved  (was: Patch Available)

Thanks for the review, Tom.  I committed this to trunk, branch-2, and 
branch-0.23.

 Invalid TaskImpl state transitions when task fails while speculating
 

 Key: MAPREDUCE-4890
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4890
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am
Affects Versions: 2.0.2-alpha, 0.23.5
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Critical
 Fix For: 2.0.3-alpha, 0.23.6

 Attachments: MAPREDUCE-4890.patch


 There are two issues when a task fails while speculating (i.e., while 
 multiple attempts are active):
 # The other active attempts are not killed.
 # TaskImpl's FAILED state does not handle the T_ATTEMPT_* set of events, which 
 the other active attempts can send asynchronously.  All of these events need 
 to be handled in the FAILED state.
 As a result, jobs that both speculate and are configured to tolerate failures 
 via mapreduce.map.failures.maxpercent or mapreduce.reduce.failures.maxpercent 
 can easily end up failing due to invalid state transitions rather than 
 completing successfully with a few explicitly allowed task failures.
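
 To illustrate the impact described above, here is a minimal sketch, not code 
 from Hadoop.  It assumes the maxpercent settings are read as a percentage of 
 the total task count, and that an internal error such as an invalid state 
 transition fails the job regardless of that tolerance.  The class, the method 
 names, and the outcome strings are all hypothetical.

```java
public class FailureTolerance {
    // Hypothetical helper: true when the share of failed tasks stays within
    // the allowed percentage (assumed meaning of the *.failures.maxpercent settings).
    static boolean withinTolerance(int failedTasks, int totalTasks, int maxPercent) {
        return failedTasks * 100 <= maxPercent * totalTasks;
    }

    // Hypothetical job outcome: an internal error (for example an invalid
    // state transition) wins over the configured failure tolerance.
    static String outcome(int failedTasks, int totalTasks, int maxPercent,
                          boolean internalError) {
        if (internalError) {
            return "ERROR";
        }
        return withinTolerance(failedTasks, totalTasks, maxPercent)
                ? "SUCCEEDED" : "FAILED";
    }

    public static void main(String[] args) {
        // 2 failed tasks out of 100 with 5 percent allowed: tolerated.
        System.out.println(outcome(2, 100, 5, false));
        // Same failure count, but an invalid transition occurred along the way.
        System.out.println(outcome(2, 100, 5, true));
    }
}
```

 In this sketch the second call reports an error even though only 2 of 100 
 tasks failed, which is the kind of outcome the report describes.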

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-4890) Invalid TaskImpl state transitions when task fails while speculating

2012-12-19 Thread Jason Lowe (JIRA)

 [ https://issues.apache.org/jira/browse/MAPREDUCE-4890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated MAPREDUCE-4890:
--

Attachment: MAPREDUCE-4890.patch

Patch to kill active attempts when a task transitions to the FAILED state and 
to ignore all T_ATTEMPT_* events while in the FAILED state.
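
The two behaviors described above can be sketched roughly as follows.  This is 
a toy model and not the attached patch: the states, the event names beyond the 
T_ATTEMPT_ prefix, the maxAttempts parameter, and the killRequests list are 
simplified, hypothetical stand-ins for what TaskImpl actually does.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class SpeculativeTaskSketch {
    // Simplified, hypothetical stand-ins for the task states and T_ATTEMPT_* events.
    enum TaskState { RUNNING, FAILED }
    enum EventType { T_ATTEMPT_LAUNCHED, T_ATTEMPT_FAILED, T_ATTEMPT_KILLED }

    private final int maxAttempts;
    private int failedAttempts = 0;
    private TaskState state = TaskState.RUNNING;
    private final Set<String> activeAttempts = new LinkedHashSet<>();
    private final List<String> killRequests = new ArrayList<>();

    SpeculativeTaskSketch(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

    TaskState getState() { return state; }
    List<String> getKillRequests() { return killRequests; }

    void handle(EventType type, String attemptId) {
        if (state == TaskState.FAILED) {
            // Second behavior: events arriving late from the other attempts
            // are ignored, so they cannot trigger an invalid transition.
            return;
        }
        switch (type) {
            case T_ATTEMPT_LAUNCHED:
                activeAttempts.add(attemptId);
                break;
            case T_ATTEMPT_KILLED:
                activeAttempts.remove(attemptId);
                break;
            case T_ATTEMPT_FAILED:
                activeAttempts.remove(attemptId);
                failedAttempts++;
                if (failedAttempts >= maxAttempts) {
                    // First behavior: request a kill for every attempt that is
                    // still active before entering FAILED.
                    killRequests.addAll(activeAttempts);
                    activeAttempts.clear();
                    state = TaskState.FAILED;
                }
                break;
        }
    }

    public static void main(String[] args) {
        SpeculativeTaskSketch task = new SpeculativeTaskSketch(1);
        task.handle(EventType.T_ATTEMPT_LAUNCHED, "attempt_0");
        task.handle(EventType.T_ATTEMPT_LAUNCHED, "attempt_1"); // speculative attempt
        task.handle(EventType.T_ATTEMPT_FAILED, "attempt_0");   // task fails
        task.handle(EventType.T_ATTEMPT_KILLED, "attempt_1");   // late event, ignored
        System.out.println(task.getState() + " " + task.getKillRequests());
    }
}
```

In the sketch, the late T_ATTEMPT_KILLED event from the speculative attempt 
leaves the task in FAILED instead of raising an error.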

 Invalid TaskImpl state transitions when task fails while speculating
 

 Key: MAPREDUCE-4890
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4890
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am
Affects Versions: 2.0.2-alpha, 0.23.5
Reporter: Jason Lowe
Priority: Critical
 Attachments: MAPREDUCE-4890.patch





[jira] [Updated] (MAPREDUCE-4890) Invalid TaskImpl state transitions when task fails while speculating

2012-12-19 Thread Jason Lowe (JIRA)

 [ https://issues.apache.org/jira/browse/MAPREDUCE-4890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated MAPREDUCE-4890:
--

Assignee: Jason Lowe
Target Version/s: 2.0.3-alpha, 0.23.6
  Status: Patch Available  (was: Open)

 Invalid TaskImpl state transitions when task fails while speculating
 

 Key: MAPREDUCE-4890
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4890
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am
Affects Versions: 0.23.5, 2.0.2-alpha
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Critical
 Attachments: MAPREDUCE-4890.patch


