[ https://issues.apache.org/jira/browse/HIVE-20134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16538504#comment-16538504 ]

Sahil Takiar commented on HIVE-20134:
-------------------------------------

This probably requires fixing HIVE-18921 too.

> Improve logging when HoS Driver is killed due to exceeding memory limits
> ------------------------------------------------------------------------
>
>                 Key: HIVE-20134
>                 URL: https://issues.apache.org/jira/browse/HIVE-20134
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Sahil Takiar
>            Priority: Major
>
> This was improved in HIVE-18093, but more can be done. If a HoS Driver gets
> killed because it exceeds its memory limits, YARN will issue a SIGTERM on the
> process. The SIGTERM triggers the shutdown hook in the HoS Driver, which
> kills all of the Driver's jobs, even those that are still running. The user
> ends up seeing an error like the one below, which isn't very informative. We
> should propagate the error from the Driver shutdown hook to the user.
> {code:java}
> INFO : 2018-07-09 17:48:42,580 Stage-64_0: 526/526 Finished Stage-65_0: 1405/1405 Finished Stage-66_0: 0(+759)/1102 Stage-67_0: 0/1099 Stage-68_0: 0/1099 Stage-69_0: 0/1
> INFO : 2018-07-09 17:48:44,589 Stage-64_0: 526/526 Finished Stage-65_0: 1405/1405 Finished Stage-66_0: 1(+759)/1102 Stage-67_0: 0/1099 Stage-68_0: 0/1099 Stage-69_0: 0/1
> INFO : 2018-07-09 17:48:45,591 Stage-64_0: 526/526 Finished Stage-65_0: 1405/1405 Finished Stage-66_0: 2(+759)/1102 Stage-67_0: 0/1099 Stage-68_0: 0/1099 Stage-69_0: 0/1
> INFO : 2018-07-09 17:48:48,596 Stage-64_0: 526/526 Finished Stage-65_0: 1405/1405 Finished Stage-66_0: 2(+759)/1102 Stage-67_0: 0/1099 Stage-68_0: 0/1099 Stage-69_0: 0/1
> ERROR : Spark job[23] failed
> java.lang.InterruptedException: null
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:998) ~[?:1.8.0_141]
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304) ~[?:1.8.0_141]
>     at scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:202) ~[scala-library-2.11.8.jar:?]
>     at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218) ~[scala-library-2.11.8.jar:?]
>     at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:153) ~[scala-library-2.11.8.jar:?]
>     at org.apache.spark.SimpleFutureAction.ready(FutureAction.scala:125) ~[spark-core_2.11-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>     at org.apache.spark.SimpleFutureAction.ready(FutureAction.scala:114) ~[spark-core_2.11-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>     at org.apache.spark.util.ThreadUtils$.awaitReady(ThreadUtils.scala:222) ~[spark-core_2.11-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>     at org.apache.spark.JavaFutureActionWrapper.getImpl(FutureAction.scala:264) ~[spark-core_2.11-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>     at org.apache.spark.JavaFutureActionWrapper.get(FutureAction.scala:277) ~[spark-core_2.11-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>     at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:391) ~[hive-exec-2.1.1-SNAPSHOT.jar:2.1.1-SNAPSHOT]
>     at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:352) ~[hive-exec-2.1.1-SNAPSHOT.jar:2.1.1-SNAPSHOT]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_141]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_141]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_141]
>     at java.lang.Thread.run(Thread.java:748) [?:1.8.0_141]
> ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. null
> INFO : Completed executing command(queryId=hive_20180709174140_0f64ee17-f793-441a-9a77-3ee0cd0a9c32); Time taken: 249.727 seconds
> Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. null (state=08S01,code=1){code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
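The propagation the description asks for could take the shape below: the shutdown hook records *why* the Driver is going down before cancelling jobs, and the spot that currently surfaces the bare `InterruptedException: null` attaches that reason. This is only a minimal sketch, not Hive's actual implementation; `ShutdownReasonDemo`, `onSigterm`, and `annotate` are hypothetical names, and the real fix would live in `RemoteDriver` and its shutdown hook.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: record the shutdown reason so job failures caused by
// a driver kill carry a meaningful message instead of a bare
// "java.lang.InterruptedException: null".
public class ShutdownReasonDemo {

    // Written once by the shutdown hook, read by job wrappers on failure.
    static final AtomicReference<String> shutdownReason = new AtomicReference<>();

    // Body of the shutdown hook that runs when YARN SIGTERMs the driver
    // (e.g. for exceeding its memory limits).
    static void onSigterm() {
        shutdownReason.set("HoS Driver received SIGTERM (likely killed by YARN "
                + "for exceeding memory limits); cancelling running jobs");
        // ... cancel running Spark jobs here, as the real hook does ...
    }

    // Called where the job wrapper currently rethrows the raw
    // InterruptedException: wrap it with the recorded reason if one exists.
    static RuntimeException annotate(InterruptedException e) {
        String reason = shutdownReason.get();
        return reason == null
                ? new RuntimeException(e)
                : new RuntimeException(reason, e);
    }

    public static void main(String[] args) {
        onSigterm(); // simulate the shutdown hook firing
        RuntimeException wrapped = annotate(new InterruptedException());
        // The client now sees the reason instead of an unexplained interrupt.
        System.out.println(wrapped.getMessage());
    }
}
```

The key design point is ordering: the hook must publish the reason before interrupting job threads, so any thread that wakes up with an `InterruptedException` can already see it.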