[jira] [Commented] (TEZ-2954) Container launch timeouts should count towards node blacklisting
    [ https://issues.apache.org/jira/browse/TEZ-2954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15194355#comment-15194355 ]

Siddharth Seth commented on TEZ-2954:
-------------------------------------

[~ozawa] - the problem is highlighted in https://issues.apache.org/jira/browse/TEZ-925?focusedCommentId=13932292&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13932292

If we receive a container timeout - we would have received a task timeout as well - which is factored in.
The problem is that a launch failure on the NM will be reported back via the RM. When that happens, we lose track of the fact that the launch failed.
If there's a timeout while talking to the NM - that will register as a task failure.

The JIRA description should have been better.

> Container launch timeouts should count towards node blacklisting
> ----------------------------------------------------------------
>
>                 Key: TEZ-2954
>                 URL: https://issues.apache.org/jira/browse/TEZ-2954
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Siddharth Seth
>            Assignee: Tsuyoshi Ozawa
>         Attachments: TEZ-2954.001.patch
>
>
> Currently, only task failures count towards blacklisting. A container timing out should do the same.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (TEZ-2954) Container launch timeouts should count towards node blacklisting
    [ https://issues.apache.org/jira/browse/TEZ-2954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15183434#comment-15183434 ]

Siddharth Seth commented on TEZ-2954:
-------------------------------------

[~ozawa] - I'll try looking at the patch by the end of the week.
[jira] [Commented] (TEZ-2954) Container launch timeouts should count towards node blacklisting
    [ https://issues.apache.org/jira/browse/TEZ-2954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15179606#comment-15179606 ]

TezQA commented on TEZ-2954:
----------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12791422/TEZ-2954.001.patch
  against master revision 91e24d7.

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:green}+1 tests included{color}.  The patch appears to include 1 new or modified test files.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

    {color:red}-1 findbugs{color}.  The patch appears to introduce 1 new Findbugs (version 3.0.1) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.

    {color:red}-1 core tests{color}.  The patch failed these unit tests in :
                   org.apache.tez.test.TestFaultTolerance
                   org.apache.tez.dag.app.dag.impl.TestDAGImpl

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/1542//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/1542//artifact/patchprocess/newPatchFindbugsWarningstez-dag.html
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1542//console

This message is automatically generated.
[jira] [Commented] (TEZ-2954) Container launch timeouts should count towards node blacklisting
    [ https://issues.apache.org/jira/browse/TEZ-2954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15179515#comment-15179515 ]

Tsuyoshi Ozawa commented on TEZ-2954:
-------------------------------------

I think the patch includes TEZ-925. [~sseth] could you take a look?
[jira] [Commented] (TEZ-2954) Container launch timeouts should count towards node blacklisting
    [ https://issues.apache.org/jira/browse/TEZ-2954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15057551#comment-15057551 ]

Jeff Zhang commented on TEZ-2954:
---------------------------------

Move to 0.7.2

> Container launch timeouts should count towards node blacklisting
> ----------------------------------------------------------------
>
>                 Key: TEZ-2954
>                 URL: https://issues.apache.org/jira/browse/TEZ-2954
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Siddharth Seth
>
> Currently, only task failures count towards blacklisting. A container timing out should do the same.