[jira] [Commented] (TEZ-3161) Allow task to report different kinds of errors - fatal / kill
[ https://issues.apache.org/jira/browse/TEZ-3161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15223111#comment-15223111 ] TezQA commented on TEZ-3161: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12796701/TEZ-3161.5.txt against master revision 0c7e1c5. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 18 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 35 javac compiler warnings (more than the master's current 33 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in : org.apache.tez.test.TestFaultTolerance Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/1605//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/1605//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1605//console This message is automatically generated. > Allow task to report different kinds of errors - fatal / kill > - > > Key: TEZ-3161 > URL: https://issues.apache.org/jira/browse/TEZ-3161 > Project: Apache Tez > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: TEZ-3161.1.txt, TEZ-3161.2.txt, TEZ-3161.3.txt, > TEZ-3161.4.txt, TEZ-3161.5.txt > > > In some cases, task failures will be the same across all attempts - e.g. > exceeding memory utilization on an operation. In this case, there's no point > in running another attempt of the same task. 
> There's other cases where a task may want to mark itself as KILLED - i.e. a > temporary error. An example of this is pipelined shuffle. > Tez should allow both operations. > cc [~vikram.dixit], [~rajesh.balamohan] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Failed: TEZ-3161 PreCommit Build #1605
Jira: https://issues.apache.org/jira/browse/TEZ-3161 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/1605/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 4542 lines...] [ERROR] [Help 2] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException [ERROR] [ERROR] After correcting the problems, you can resume the build with the command [ERROR] mvn -rf :tez-tests [INFO] Build failures were ignored. {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12796701/TEZ-3161.5.txt against master revision 0c7e1c5. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 18 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 35 javac compiler warnings (more than the master's current 33 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in : org.apache.tez.test.TestFaultTolerance Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/1605//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/1605//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1605//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. 48c37534326b0b8ef5192101c13894b1de8379a6 logged out == == Finished build. == == Build step 'Execute shell' marked build as failure Archiving artifacts [description-setter] Could not determine description. 
Recording test results Email was triggered for: Failure - Any Sending email for trigger: Failure - Any ### ## FAILED TESTS (if any) ## 7 tests failed. FAILED: org.apache.tez.test.TestFaultTolerance.testRandomFailingInputs Error Message: expected: but was: Stack Trace: java.lang.AssertionError: expected: but was: at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:141) at org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:124) at org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:120) at org.apache.tez.test.TestFaultTolerance.testRandomFailingInputs(TestFaultTolerance.java:763) FAILED: org.apache.tez.test.TestFaultTolerance.testBasicInputFailureWithExit Error Message: TezSession has already shutdown. No cluster diagnostics found. Stack Trace: org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. No cluster diagnostics found. at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:849) at org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:129) at org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:124) at org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:120) at org.apache.tez.test.TestFaultTolerance.testBasicInputFailureWithExit(TestFaultTolerance.java:261) FAILED: org.apache.tez.test.TestFaultTolerance.testInputFailureRerunCanSendOutputToTwoDownstreamVertices Error Message: TezSession has already shutdown. No cluster diagnostics found. Stack Trace: org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. No cluster diagnostics found. 
at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:849) at org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:129) a
[jira] [Updated] (TEZ-3161) Allow task to report different kinds of errors - fatal / kill
[ https://issues.apache.org/jira/browse/TEZ-3161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated TEZ-3161: Attachment: TEZ-3161.5.txt Updated patch with the test fixed. > Allow task to report different kinds of errors - fatal / kill > - > > Key: TEZ-3161 > URL: https://issues.apache.org/jira/browse/TEZ-3161 > Project: Apache Tez > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: TEZ-3161.1.txt, TEZ-3161.2.txt, TEZ-3161.3.txt, > TEZ-3161.4.txt, TEZ-3161.5.txt > > > In some cases, task failures will be the same across all attempts - e.g. > exceeding memory utilization on an operation. In this case, there's no point > in running another attempt of the same task. > There's other cases where a task may want to mark itself as KILLED - i.e. a > temporary error. An example of this is pipelined shuffle. > Tez should allow both operations. > cc [~vikram.dixit], [~rajesh.balamohan] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3161) Allow task to report different kinds of errors - fatal / kill
[ https://issues.apache.org/jira/browse/TEZ-3161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15223070#comment-15223070 ] TezQA commented on TEZ-3161: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12796697/TEZ-3161.4.txt against master revision 0c7e1c5. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 18 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 35 javac compiler warnings (more than the master's current 33 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in : org.apache.tez.test.TestFaultTolerance org.apache.tez.dag.history.logging.ats.TestHistoryEventTimelineConversion org.apache.tez.dag.app.rm.TestContainerReuse Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/1604//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/1604//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1604//console This message is automatically generated. > Allow task to report different kinds of errors - fatal / kill > - > > Key: TEZ-3161 > URL: https://issues.apache.org/jira/browse/TEZ-3161 > Project: Apache Tez > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: TEZ-3161.1.txt, TEZ-3161.2.txt, TEZ-3161.3.txt, > TEZ-3161.4.txt > > > In some cases, task failures will be the same across all attempts - e.g. > exceeding memory utilization on an operation. 
In this case, there's no point > in running another attempt of the same task. > There's other cases where a task may want to mark itself as KILLED - i.e. a > temporary error. An example of this is pipelined shuffle. > Tez should allow both operations. > cc [~vikram.dixit], [~rajesh.balamohan] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Failed: TEZ-3161 PreCommit Build #1604
Jira: https://issues.apache.org/jira/browse/TEZ-3161 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/1604/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 4784 lines...] [ERROR] After correcting the problems, you can resume the build with the command [ERROR] mvn -rf :tez-dag [INFO] Build failures were ignored. {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12796697/TEZ-3161.4.txt against master revision 0c7e1c5. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 18 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 35 javac compiler warnings (more than the master's current 33 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in : org.apache.tez.test.TestFaultTolerance org.apache.tez.dag.history.logging.ats.TestHistoryEventTimelineConversion org.apache.tez.dag.app.rm.TestContainerReuse Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/1604//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/1604//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1604//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. ff8e880f9cab8ceaa402fc136f0abb85c2cea747 logged out == == Finished build. == == Build step 'Execute shell' marked build as failure Archiving artifacts [description-setter] Could not determine description. 
Recording test results Email was triggered for: Failure - Any Sending email for trigger: Failure - Any ### ## FAILED TESTS (if any) ## 9 tests failed. FAILED: org.apache.tez.dag.app.rm.TestContainerReuse.testReuseConflictLocalResources Error Message: Wanted but not invoked: taskSchedulerManagerForTest.taskAllocated( 0, Mock for TA attempt_0_0001_0_01_04_1, , Container: [ContainerId: container_1_0001_01_01, NodeId: host1:0, NodeHttpAddress: host1:0, Resource: , Priority: 1, Token: null, ] ); -> at org.apache.tez.dag.app.rm.TestContainerReuse.testReuseConflictLocalResources(TestContainerReuse.java:1272) However, there were other interactions with this mock: taskSchedulerManagerForTest.init( Configuration: core-default.xml, core-site.xml, yarn-default.xml, yarn-site.xml ); -> at org.apache.tez.dag.app.rm.TestContainerReuse.testReuseConflictLocalResources(TestContainerReuse.java:1143) taskSchedulerManagerForTest.setConfig( Configuration: core-default.xml, core-site.xml, yarn-default.xml, yarn-site.xml ); -> at org.apache.tez.dag.app.rm.TestContainerReuse.testReuseConflictLocalResources(TestContainerReuse.java:1143) taskSchedulerManagerForTest.serviceInit( Configuration: core-default.xml, core-site.xml, yarn-default.xml, yarn-site.xml ); -> at org.apache.tez.dag.app.rm.TestContainerReuse.testReuseConflictLocalResources(TestContainerReuse.java:1143) taskSchedulerManagerForTest.start(); -> at org.apache.tez.dag.app.rm.TestContainerReuse.testReuseConflictLocalResources(TestContainerReuse.java:1144) taskSchedulerManagerForTest.serviceStart(); -> at org.apache.tez.dag.app.rm.TestContainerReuse.testReuseConflictLocalResources(TestContainerReuse.java:1144) taskSchedulerManagerForTest.instantiateSchedulers( "host", 0, "", Mock for AppContext, hashCode: 833038353 ); -> at org.apache.tez.dag.app.rm.TestContainerReuse.testReuseConflictLocalResources(TestContainerReuse.java:1144) taskSchedulerManagerForTest.getContainerSignatureMatcher(); -> at 
org.apache.tez.dag.app.rm.TestContainerReuse.testReuseConflictLocalResources(TestCon
[jira] [Commented] (TEZ-3077) TezClient.waitTillReady should support timeout
[ https://issues.apache.org/jira/browse/TEZ-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15223058#comment-15223058 ]

Siddharth Seth commented on TEZ-3077:
-------------------------------------

Thanks for the updated patch [~kshukla]. This looks a lot better in terms of the APIs. Some comments.

- Can the existing preWarm method be changed to invoke the new one with a timeout of 0? Similar to what has been done for the existing waitTillReady method.
- In waitTillReady
{code}
+ if ((timeout > 0) &&
+     Time.monotonicNow() - startTime >= timeout) {
+   return false;
{code}
This check should be after checking the updated status to be READY. Otherwise we could end up timing out in the last iteration even if the state did change to READY.
{code}
long sleepTime = (SLEEP_FOR_READY > timeout) ? SLEEP_FOR_READY - timeout : SLEEP_FOR_READY;
{code}
Should this be
{code}
long sleepTime = (SLEEP_FOR_READY > timeout) ? timeout : SLEEP_FOR_READY;
{code}
Even better would be to sleep for whatever time is actually left.
{code}
long now = Time.monotonicNow();
if (startTime + timeout > now) {
  long sleepTime = Math.min(SLEEP_FOR_READY, startTime + timeout - now);
  Thread.sleep(sleepTime);
} else {
  return false;
}
{code}

On the unit test, could you please look at testStopRetriesUntilTimeout - and see if a test can be added along these lines. i.e. it actually validates that attempts were made to get the appReport, and a final timeout - rather than returning success.

> TezClient.waitTillReady should support timeout
> ----------------------------------------------
>
> Key: TEZ-3077
> URL: https://issues.apache.org/jira/browse/TEZ-3077
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Sergey Shelukhin
> Assignee: Kuhu Shukla
> Attachments: TEZ-3077.001.patch, TEZ-3077.002.patch
>
>
> Also preWarm.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
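
The ordering suggested in the TEZ-3077 review above (check for READY first, then sleep only for the time that is left) can be sketched as a standalone loop. This is a minimal illustration and not the TezClient code: the class name, the `BooleanSupplier` readiness probe, and the 50 ms value standing in for SLEEP_FOR_READY are all assumptions made for the example, and `System.nanoTime()` is used in place of Hadoop's `Time.monotonicNow()` so the sketch has no dependencies.

```java
import java.util.function.BooleanSupplier;

public class WaitWithTimeout {
    // Hypothetical polling interval, standing in for SLEEP_FOR_READY in the review.
    static final long SLEEP_FOR_READY_MS = 50;

    /**
     * Polls isReady until it returns true or timeoutMs has elapsed.
     * A timeout of 0 or less means "no timeout", mirroring the (timeout > 0) guard in the patch.
     * Returns false on timeout or if the waiting thread is interrupted.
     */
    static boolean waitTillReady(BooleanSupplier isReady, long timeoutMs) {
        long startNanos = System.nanoTime();
        while (true) {
            // Check the state before the deadline, so a READY seen on the
            // final iteration is never reported as a timeout.
            if (isReady.getAsBoolean()) {
                return true;
            }
            long sleepMs = SLEEP_FOR_READY_MS;
            if (timeoutMs > 0) {
                long elapsedMs = (System.nanoTime() - startNanos) / 1_000_000L;
                long remainingMs = timeoutMs - elapsedMs;
                if (remainingMs <= 0) {
                    return false;
                }
                // Sleep only for whatever time is actually left.
                sleepMs = Math.min(SLEEP_FOR_READY_MS, remainingMs);
            }
            try {
                Thread.sleep(sleepMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

With this shape, a caller that passes a timeout shorter than the polling interval still waits no longer than the timeout, which is the case the `SLEEP_FOR_READY - timeout` expression in the patch did not handle.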
[jira] [Commented] (TEZ-3177) Non-DAG events should use the session domain or no domain if the data does not need protection
[ https://issues.apache.org/jira/browse/TEZ-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15223051#comment-15223051 ]

Siddharth Seth commented on TEZ-3177:
-------------------------------------

{code}
+ if (historyACLPolicyManager != null
+     && sessionDomainId != null && !sessionDomainId.isEmpty()
+     && domainId != null && !domainId.isEmpty()) {
+   if (HistoryEventType.isDAGSpecificEvent(event.getHistoryEvent().getEventType())) {
      historyACLPolicyManager.updateTimelineEntityDomain(entities[i], domainId);
+   } else {
+     historyACLPolicyManager.updateTimelineEntityDomain(entities[i], sessionDomainId);
    }
{code}
Will the DAG-specific domain id and the session domain id always both be set, or both be unset? Do we end up missing the domainId in some cases if both are unset?
It may be better to flip the check to say - if(dagDomainType) - null check the dagDomainId, otherwise null check the sessionDomainId.

Rest looks good.

> Non-DAG events should use the session domain or no domain if the data does not need protection
> ----------------------------------------------------------------------------------------------
>
> Key: TEZ-3177
> URL: https://issues.apache.org/jira/browse/TEZ-3177
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Hitesh Shah
> Assignee: Hitesh Shah
> Attachments: TEZ-3177.1.patch
>
>
> There have been issues noticed where when using dag specific domains, container events get generated under different dags causing issues as they are updated using different domains.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
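
The "flipped" check proposed in the TEZ-3177 review above can be illustrated in isolation: pick the candidate domain by event type first, then null/empty-check only that candidate. This is a sketch under assumptions, not the patch itself. The class and method names are hypothetical, and the boolean parameter stands in for the result of `HistoryEventType.isDAGSpecificEvent(...)`.

```java
public class DomainSelection {
    /**
     * Returns the domain id that should be applied to a timeline entity,
     * or null when the relevant id is unset and no domain should be applied.
     *
     * dagSpecificEvent stands in for HistoryEventType.isDAGSpecificEvent(...).
     */
    static String chooseDomain(boolean dagSpecificEvent,
                               String dagDomainId,
                               String sessionDomainId) {
        // Select by event type first, then validate only the id that will be used.
        String candidate = dagSpecificEvent ? dagDomainId : sessionDomainId;
        return (candidate == null || candidate.isEmpty()) ? null : candidate;
    }
}
```

Compared with the combined condition in the quoted diff, a DAG-specific event here still gets its DAG domain when the session domain id is unset, and a non-DAG event still gets the session domain when the DAG domain id is unset.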
[jira] [Updated] (TEZ-3161) Allow task to report different kinds of errors - fatal / kill
[ https://issues.apache.org/jira/browse/TEZ-3161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth Seth updated TEZ-3161:
--------------------------------
Attachment: TEZ-3161.4.txt

Updated patch with the following changes.
- FailureType renamed to TaskFailureType
- Have retained the APIs introduced in the patch. The existing API is going to get confusing otherwise. Added specific javadocs on fatalError explaining the behaviour, along with deprecation. This seems like the least confusing to me.
- Marked killSelf as private
- Renamed unsuccessfulEnd to taskFailureType
- Added writing to history. Is there some place that ATS data is being read back as well? I couldn't find that.
- Changed the TaskImpl log line to be easier to understand

bq. Wouldnt there be only one specific termination cause to indicate that the user-code told the framework to abort itself or kill itself?
The TaskAttemptEndReason is set based on which component reported the error - Input / Processor / Output - at least from the task. There's a bunch of other EndReasons which are independent of this. FailureType would now indicate the FailureType on top of whatever EndReason is set.

> Allow task to report different kinds of errors - fatal / kill
> -------------------------------------------------------------
>
> Key: TEZ-3161
> URL: https://issues.apache.org/jira/browse/TEZ-3161
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Siddharth Seth
> Assignee: Siddharth Seth
> Attachments: TEZ-3161.1.txt, TEZ-3161.2.txt, TEZ-3161.3.txt, TEZ-3161.4.txt
>
>
> In some cases, task failures will be the same across all attempts - e.g. exceeding memory utilization on an operation. In this case, there's no point in running another attempt of the same task.
> There's other cases where a task may want to mark itself as KILLED - i.e. a temporary error. An example of this is pipelined shuffle.
> Tez should allow both operations.
> cc [~vikram.dixit], [~rajesh.balamohan]

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
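
The motivation in the TEZ-3161 description (a failure that would repeat on every attempt should not be retried) can be sketched as a small retry decision. This is only an illustration of the idea: the enum constants, the class name, and the `shouldRetry` helper are assumptions made for the example and are not taken from the attached patches, which are not shown in this thread.

```java
public class RetryDecision {
    // Assumed values for illustration; the thread only names the type TaskFailureType.
    enum TaskFailureType { NON_FATAL, FATAL }

    /**
     * Decides whether another attempt of the same task is worth scheduling.
     * A FATAL failure is expected to repeat on every attempt, so it is never
     * retried; a NON_FATAL failure is retried until maxAttempts is reached.
     */
    static boolean shouldRetry(TaskFailureType type, int failedAttempts, int maxAttempts) {
        if (type == TaskFailureType.FATAL) {
            return false;
        }
        return failedAttempts < maxAttempts;
    }
}
```

In this sketch the failure type is deliberately separate from whatever end reason is recorded, in line with the comment that FailureType sits on top of the EndReason rather than replacing it.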
[jira] [Commented] (TEZ-3182) linux superuser use maven compile bower always fail
[ https://issues.apache.org/jira/browse/TEZ-3182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15222898#comment-15222898 ]

shenxianqiang commented on TEZ-3182:
------------------------------------

Thanks Sreenath Somarajapuram. My bad.
Would it be OK if I set the default value of bower-allow-root to an empty value?

> linux superuser use maven compile bower always fail
> ---------------------------------------------------
>
> Key: TEZ-3182
> URL: https://issues.apache.org/jira/browse/TEZ-3182
> Project: Apache Tez
> Issue Type: Bug
> Components: UI
> Affects Versions: 0.6.2, 0.8.2
> Environment: linux rh6
> Reporter: shenxianqiang
> Assignee: shenxianqiang
> Priority: Trivial
> Attachments: TEZ-3182.1.patch, TEZ-3182.patch
>
> Original Estimate: 96h
> Remaining Estimate: 96h
>
> When I am root, the 'mvn clean package -DskipTests=true' command always fails.
> [INFO] --- exec-maven-plugin:1.3.2:exec (Bower install) @ tez-ui ---
> bower ESUDO Cannot be run with sudo
> Additional error details:
> Since bower is a user command, there is no need to execute it with superuser permissions.
> If you're having permission errors when using bower without sudo, please spend a few minutes learning more about how your system should work and make any necessary repairs.
> http://www.joyent.com/blog/installing-node-and-npm
> https://gist.github.com/isaacs/579814
> You can however run a command with sudo using --allow-root option
> I have to modify pom.xml. Why not modify pom.xml in future?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)