[ https://issues.apache.org/jira/browse/TEZ-1929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14296588#comment-14296588 ]
Hadoop QA commented on TEZ-1929:
--------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12695215/TEZ-1929.3.patch
against master revision e84c1aa.

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. There were no new javadoc warning messages.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/90//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/90//console

This message is automatically generated.

> AM intermittently sending kill signal to running task in heartbeat
> ------------------------------------------------------------------
>
> Key: TEZ-1929
> URL: https://issues.apache.org/jira/browse/TEZ-1929
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.6.0
> Reporter: Rajesh Balamohan
> Assignee: Bikas Saha
> Attachments: Screen Shot 2015-01-08 at 2.09.11 PM.png, Screen Shot 2015-01-08 at 2.28.04 PM.png, TEZ-1929.1.patch, TEZ-1929.2.patch, TEZ-1929.3.patch, applog.txt.gz, tasklog.txt
>
>
> Observed this behavior 3 or 4 times:
> - Ran a Hive query with Tez (query_17 at 10 TB scale).
> - Occasionally, the Map_7 task goes into the failed state in the middle of fetching data from other sources (only one task is available in Map_7).
> {code}
> 2015-01-08 00:19:10,289 INFO [Fetcher [Map_6] #0] impl.ShuffleManager: Completed fetch for attempt: InputAttemptIdentifier [inputIdentifier=InputIdentifier [inputIndex=0], attemptNumber=0, pathComponent=attempt_1420000126204_0233_1_06_000000_0_10003] to MEMORY, CompressedSize=6757, DecompressedSize=16490,EndTime=1420705150289, TimeTaken=5, Rate=1.29 MB/s
> 2015-01-08 00:19:10,290 INFO [Fetcher [Map_6] #0] impl.ShuffleManager: All inputs fetched for input vertex : Map 6
> 2015-01-08 00:19:10,290 INFO [Fetcher [Map_6] #0] impl.ShuffleManager: copy(0 of 1. Transfer rate (CumulativeDataFetched/TimeSinceInputStarted)) 0.01 MB/s)
> 2015-01-08 00:19:10,290 INFO [ShuffleRunner [Map_6]] impl.ShuffleManager: Shutting down FetchScheduler, Was Interrupted: false
> 2015-01-08 00:19:10,290 INFO [ShuffleRunner [Map_6]] impl.ShuffleManager: Scheduler thread completed
> 2015-01-08 00:19:41,986 INFO [TaskHeartbeatThread] task.TaskReporter: Received should die response from AM
> 2015-01-08 00:19:41,986 INFO [TaskHeartbeatThread] task.TaskReporter: Asked to die via task heartbeat
> 2015-01-08 00:19:41,987 INFO [main] task.TezTaskRunner: Interrupted while waiting for task to complete. Interrupting task
> 2015-01-08 00:19:41,987 INFO [main] task.TezTaskRunner: Shutdown requested... returning
> 2015-01-08 00:19:41,987 INFO [main] task.TezChild: Got a shouldDie notification via hearbeats. Shutting down
> 2015-01-08 00:19:41,990 ERROR [TezChild] tez.TezProcessor: java.lang.InterruptedException
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2048)
>     at org.apache.tez.runtime.InputReadyTracker$InputReadyMonitor.awaitCondition(InputReadyTracker.java:120)
>     at org.apache.tez.runtime.InputReadyTracker.waitForAnyInputReady(InputReadyTracker.java:83)
>     at org.apache.tez.runtime.api.impl.TezProcessorContextImpl.waitForAnyInputReady(TezProcessorContextImpl.java:106)
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:153)
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
>     at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:328)
>     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
>     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
> {code}
> At first look, it appears that TaskAttemptListenerImpTezDag.heartbeat is unable to find the containerId in registeredContainers. This still needs to be verified.
> I will attach the sample task log and the Tez UI details.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
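The hypothesis in the report, that `TaskAttemptListenerImpTezDag.heartbeat` cannot find the containerId in `registeredContainers` and therefore answers with a should-die response, can be illustrated with a minimal Java sketch. It is a simplified model under stated assumptions, not Tez's actual code: `ShouldDieSketch`, `HeartbeatListener`, `InputWaiter`, `simulate` and the container/attempt IDs are all hypothetical. It shows why a missing registry entry would surface on the task side as the `InterruptedException` from `Condition.await` seen in the stack trace: once the heartbeat says "die", the runner interrupts the thread that is still blocked waiting for input.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical, simplified model of the suspected TEZ-1929 failure mode.
// These are NOT the real Tez classes; every name here is illustrative only.
class ShouldDieSketch {

    /** Hypothetical AM-side listener that tracks which containers are registered. */
    static class HeartbeatListener {
        private final ConcurrentHashMap<String, String> registeredContainers = new ConcurrentHashMap<>();

        void register(String containerId, String taskAttemptId) {
            registeredContainers.put(containerId, taskAttemptId);
        }

        void unregister(String containerId) {
            registeredContainers.remove(containerId);
        }

        /** Returns true ("should die") when the container is unknown to the AM. */
        boolean heartbeat(String containerId) {
            return !registeredContainers.containsKey(containerId);
        }
    }

    /** Hypothetical task-side wait, loosely modelled on waiting for any input to be ready. */
    static class InputWaiter {
        private final ReentrantLock lock = new ReentrantLock();
        private final Condition inputReady = lock.newCondition();
        private boolean ready = false;

        void awaitInput() throws InterruptedException {
            lock.lock();
            try {
                while (!ready) {
                    // Condition.await() throws InterruptedException if the thread is
                    // interrupted before or while it waits.
                    inputReady.await();
                }
            } finally {
                lock.unlock();
            }
        }

        void markReady() {
            lock.lock();
            try {
                ready = true;
                inputReady.signalAll();
            } finally {
                lock.unlock();
            }
        }
    }

    /** Runs one heartbeat; returns true if the waiting task ended with InterruptedException. */
    public static boolean simulate(boolean containerMissingFromRegistry) {
        HeartbeatListener am = new HeartbeatListener();
        String containerId = "container_hypothetical_01";
        am.register(containerId, "attempt_hypothetical_01");

        InputWaiter waiter = new InputWaiter();
        AtomicBoolean killed = new AtomicBoolean(false);
        Thread task = new Thread(() -> {
            try {
                waiter.awaitInput();
            } catch (InterruptedException e) {
                killed.set(true);
            }
        });
        task.start();

        if (containerMissingFromRegistry) {
            // Models the suspected bug: the container is absent from the registry.
            am.unregister(containerId);
        }

        if (am.heartbeat(containerId)) {
            task.interrupt(); // "should die": interrupt the running task
        } else {
            waiter.markReady(); // healthy path: input arrives and the task proceeds
        }

        try {
            task.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for task", e);
        }
        return killed.get();
    }
}
```

Calling `ShouldDieSketch.simulate(true)` returns `true` (the waiting thread is interrupted), while `simulate(false)` returns `false` (the input becomes ready and the thread exits normally). The sketch forces the missing entry with `unregister`; how the entry could actually go missing in the AM is exactly what the report says still needs to be verified.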