[ https://issues.apache.org/jira/browse/HIVE-17908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16239039#comment-16239039 ]
Hive QA commented on HIVE-17908: -------------------------------- Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12895972/HIVE-17908.6.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 19 failed/errored test(s), 11354 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=62) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[unionDistinct_1] (batchId=146) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast] (batchId=157) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] (batchId=156) org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testCliDriver[ct_noperm_loc] (batchId=94) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_multi] (batchId=111) org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut (batchId=206) org.apache.hadoop.hive.ql.exec.tez.TestWorkloadManager.testAmPoolInteractions (batchId=281) org.apache.hadoop.hive.ql.exec.tez.TestWorkloadManager.testApplyPlanQpChanges (batchId=281) org.apache.hadoop.hive.ql.exec.tez.TestWorkloadManager.testApplyPlanUserMapping (batchId=281) org.apache.hadoop.hive.ql.exec.tez.TestWorkloadManager.testAsyncSessionInitFailures (batchId=281) org.apache.hadoop.hive.ql.exec.tez.TestWorkloadManager.testClusterFractions (batchId=281) org.apache.hadoop.hive.ql.exec.tez.TestWorkloadManager.testDestroyAndReturn (batchId=281) org.apache.hadoop.hive.ql.exec.tez.TestWorkloadManager.testQueueing (batchId=281) org.apache.hadoop.hive.ql.exec.tez.TestWorkloadManager.testReopen (batchId=281) org.apache.hadoop.hive.ql.exec.tez.TestWorkloadManager.testReuse (batchId=281) org.apache.hadoop.hive.ql.exec.tez.TestWorkloadManager.testReuseWithDifferentPool (batchId=281) org.apache.hadoop.hive.ql.exec.tez.TestWorkloadManager.testReuseWithQueueing (batchId=281) org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints (batchId=223) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7636/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7636/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7636/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 19 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12895972 - PreCommit-HIVE-Build > LLAP External client not correctly handling killTask for pending requests > ------------------------------------------------------------------------- > > Key: HIVE-17908 > URL: https://issues.apache.org/jira/browse/HIVE-17908 > Project: Hive > Issue Type: Bug > Components: llap > Reporter: Jason Dere > Assignee: Jason Dere > Priority: Major > Attachments: HIVE-17908.1.patch, HIVE-17908.2.patch, > HIVE-17908.3.patch, HIVE-17908.4.patch, HIVE-17908.5.patch, HIVE-17908.6.patch > > > Hitting "Timed out waiting for heartbeat for task ID" errors with the LLAP > external client. > HIVE-17393 fixed some of these errors, however it is also occurring because > the client is not correctly handling the killTask notification when the > request is accepted but still waiting for the first task heartbeat. In this > situation the client should retry the request, similar to what the LLAP AM > does. Current logic is ignoring the killTask in this situation, which results > in a heartbeat timeout - no heartbeats are sent by LLAP because of the > killTask notification. > {noformat} > 17/08/09 05:36:02 WARN TaskSetManager: Lost task 10.0 in stage 4.0 (TID 14, > cn114-10.l42scl.hortonworks.com, executor 5): java.io.IOException: Received > reader event error: Timed out waiting for heartbeat for task ID > attempt_7739111832518812959_0005_0_00_000010_0 > at > org.apache.hadoop.hive.llap.LlapBaseRecordReader.next(LlapBaseRecordReader.java:178) > at > org.apache.hadoop.hive.llap.LlapBaseRecordReader.next(LlapBaseRecordReader.java:50) > at > org.apache.hadoop.hive.llap.LlapRowRecordReader.next(LlapRowRecordReader.java:121) > at > org.apache.hadoop.hive.llap.LlapRowRecordReader.next(LlapRowRecordReader.java:68) > at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:266) > at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:211) > at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73) > at > org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.agg_doAggregateWithKeys$(Unknown > Source) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) > at > org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:126) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53) > at org.apache.spark.scheduler.Task.run(Task.scala:99) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.io.IOException: > LlapTaskUmbilicalExternalClient(attempt_7739111832518812959_0005_0_00_000010_0): > Error while attempting to read chunk length > at > org.apache.hadoop.hive.llap.io.ChunkedInputStream.read(ChunkedInputStream.java:82) > at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) > at java.io.BufferedInputStream.read(BufferedInputStream.java:265) > at java.io.FilterInputStream.read(FilterInputStream.java:83) > at > org.apache.hadoop.hive.llap.LlapBaseRecordReader.hasInput(LlapBaseRecordReader.java:267) > at > org.apache.hadoop.hive.llap.LlapBaseRecordReader.next(LlapBaseRecordReader.java:142) > ... 22 more > Caused by: java.net.SocketException: Socket closed > at java.net.SocketInputStream.socketRead0(Native Method) > at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)