[ https://issues.apache.org/jira/browse/TEZ-1893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jeff Zhang updated TEZ-1893:
----------------------------
    Target Version/s: 0.5.4  (was: 0.7.0)

> Some vertex init fail are still not propagated to clients
> ---------------------------------------------------------
>
>                 Key: TEZ-1893
>                 URL: https://issues.apache.org/jira/browse/TEZ-1893
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Jeff Zhang
>            Assignee: Jeff Zhang
>
> {code}
> throw new TezUncheckedException(vertex.getLogIdentifier() +
>     " has -1 tasks but does not have input initializers, " +
>     "1-1 uninited sources or custom vertex manager to set it at runtime");
> {code}
> IMO, this kind of verification could be done on the client side (DAG.verify).
> The following are the messages on the client side. The client could not get
> the real status of the DAG because the Tez AM is killed due to this vertex
> init error.
> {code}
> 19:25:33,716 - Thread( main) - (RMProxy.java:98) - Connecting to ResourceManager at /0.0.0.0:8032
> 19:25:33,717 - Thread( main) - (AHSProxy.java:42) - Connecting to Application History server at /0.0.0.0:10200
> 19:25:34,724 - Thread( main) - (Client.java:858) - Retrying connect to server: localhost/127.0.0.1:6000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 19:25:35,725 - Thread( main) - (Client.java:858) - Retrying connect to server: localhost/127.0.0.1:6000. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 19:25:36,726 - Thread( main) - (Client.java:858) - Retrying connect to server: localhost/127.0.0.1:6000. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 19:25:36,846 - Thread( main) - (DAGClientImpl.java:463) - DAG initialized: CurrentState=Running
> 19:25:38,351 - Thread( main) - (Client.java:858) - Retrying connect to server: localhost/127.0.0.1:6000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 19:25:39,352 - Thread( main) - (Client.java:858) - Retrying connect to server: localhost/127.0.0.1:6000. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 19:25:40,354 - Thread( main) - (Client.java:858) - Retrying connect to server: localhost/127.0.0.1:6000. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 19:25:41,356 - Thread( main) - (Client.java:858) - Retrying connect to server: localhost/127.0.0.1:6000. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 19:25:42,357 - Thread( main) - (Client.java:858) - Retrying connect to server: localhost/127.0.0.1:6000. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 19:25:43,358 - Thread( main) - (Client.java:858) - Retrying connect to server: localhost/127.0.0.1:6000. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 19:25:44,359 - Thread( main) - (Client.java:858) - Retrying connect to server: localhost/127.0.0.1:6000. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 19:25:45,360 - Thread( main) - (Client.java:858) - Retrying connect to server: localhost/127.0.0.1:6000. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 19:25:46,361 - Thread( main) - (Client.java:858) - Retrying connect to server: localhost/127.0.0.1:6000. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 19:25:47,362 - Thread( main) - (Client.java:858) - Retrying connect to server: localhost/127.0.0.1:6000. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 19:25:47,369 - Thread( main) - (DAGClientImpl.java:463) - DAG completed. FinalState=FAILED
> 19:25:47,369 - Thread( main) - (TezWordCount.java:203) - status=FAILED, progress=null, diagnostics=Session stats:submittedDAGs=0, successfulDAGs=0, failedDAGs=0, killedDAGs=0
> , counters=null
> 19:25:47,372 - Thread( main) - (TezClient.java:470) - Shutting down Tez Session, sessionName=commonName, applicationId=application_1420335690331_0007
> 19:25:47,374 - Thread( main) - (TezClientUtils.java:838) - Application not running, applicationId=application_1420335690331_0007, yarnApplicationState=FINISHED, finalApplicationStatus=FAILED, trackingUrl=http://localhost:8088/proxy/application_1420335690331_0007/A, diagnostics=Session stats:submittedDAGs=0, successfulDAGs=0, failedDAGs=0, killedDAGs=0
> 19:25:47,375 - Thread( main) - (TezClient.java:484) - Failed to shutdown Tez Session via proxy
> org.apache.tez.dag.api.SessionNotRunning: Application not running, applicationId=application_1420335690331_0007, yarnApplicationState=FINISHED, finalApplicationStatus=FAILED, trackingUrl=http://localhost:8088/proxy/application_1420335690331_0007/A, diagnostics=Session stats:submittedDAGs=0, successfulDAGs=0, failedDAGs=0, killedDAGs=0
>       at org.apache.tez.client.TezClientUtils.getSessionAMProxy(TezClientUtils.java:839)
>       at org.apache.tez.client.TezClient.getSessionAMProxy(TezClient.java:669)
>       at org.apache.tez.client.TezClient.stop(TezClient.java:476)
>       at com.zjffdu.tez.tutorial.TezWordCount.main(TezWordCount.java:204)
> 19:25:47,377 - Thread( main) - (TezClient.java:489) - Could not connect to AM, killing session via YARN, sessionName=commonName, applicationId=application_1420335690331_0007
> 19:25:47,381 - Thread( main) - (YarnClientImpl.java:364) - Killed application application_1420335690331_0007
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
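
For illustration only, the client-side check suggested in the description might look roughly like the sketch below. It does not use the Tez API: VertexSpec, its fields, and verify are hypothetical stand-ins that mirror the condition in the exception message (a vertex with -1 tasks and no input initializer, no 1-1 uninitialized source, and no custom vertex manager). The idea is that the error would surface before submission instead of after the AM has died.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch of a pre-submission check. All types here are hypothetical, not Tez classes. */
class ClientSideVertexCheck {

    /** What the client is assumed to know about a vertex before submitting the DAG. */
    static final class VertexSpec {
        final String name;
        final int parallelism;               // -1 means "to be decided at runtime"
        final boolean hasInputInitializer;
        final boolean hasOneToOneUninitedSource;
        final boolean hasCustomVertexManager;

        VertexSpec(String name, int parallelism, boolean hasInputInitializer,
                   boolean hasOneToOneUninitedSource, boolean hasCustomVertexManager) {
            this.name = name;
            this.parallelism = parallelism;
            this.hasInputInitializer = hasInputInitializer;
            this.hasOneToOneUninitedSource = hasOneToOneUninitedSource;
            this.hasCustomVertexManager = hasCustomVertexManager;
        }
    }

    /** Throws if any vertex has -1 tasks and nothing that could set the count at runtime. */
    static void verify(List<VertexSpec> vertices) {
        for (VertexSpec v : vertices) {
            boolean canBeSetAtRuntime = v.hasInputInitializer
                    || v.hasOneToOneUninitedSource
                    || v.hasCustomVertexManager;
            if (v.parallelism == -1 && !canBeSetAtRuntime) {
                throw new IllegalStateException(v.name
                        + " has -1 tasks but does not have input initializers, "
                        + "1-1 uninited sources or custom vertex manager to set it at runtime");
            }
        }
    }

    public static void main(String[] args) {
        List<VertexSpec> dag = Arrays.asList(
                new VertexSpec("tokenizer", -1, true, false, false),   // fine: has an initializer
                new VertexSpec("summation", -1, false, false, false)); // rejected by verify
        try {
            verify(dag);
        } catch (IllegalStateException e) {
            // The client sees the reason directly instead of retrying against a dead AM.
            System.out.println("Rejected before submission: " + e.getMessage());
        }
    }
}
```

Whether every one of these conditions is actually knowable on the client at DAG.verify time is an assumption of the sketch, not something the report establishes.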