Sahil Takiar created HIVE-20134:
-----------------------------------
Summary: Improve logging when HoS Driver is killed due to exceeding memory limits
Key: HIVE-20134
URL: https://issues.apache.org/jira/browse/HIVE-20134
Project: Hive
Issue Type: Sub-task
Components: Spark
Reporter: Sahil Takiar
This was improved in HIVE-18093, but more can be done. If a HoS Driver gets
killed because it exceeds its memory limits, YARN issues a SIGTERM to the
process. The SIGTERM triggers the shutdown hook in the HoS Driver, which
kills all jobs, including ones that are still running. The user ends up
seeing an error like the one below, which isn't very informative. We should
propagate the error from the Driver shutdown hook to the user.
{code:java}
INFO : 2018-07-09 17:48:42,580 Stage-64_0: 526/526 Finished Stage-65_0: 1405/1405 Finished Stage-66_0: 0(+759)/1102 Stage-67_0: 0/1099 Stage-68_0: 0/1099 Stage-69_0: 0/1
INFO : 2018-07-09 17:48:44,589 Stage-64_0: 526/526 Finished Stage-65_0: 1405/1405 Finished Stage-66_0: 1(+759)/1102 Stage-67_0: 0/1099 Stage-68_0: 0/1099 Stage-69_0: 0/1
INFO : 2018-07-09 17:48:45,591 Stage-64_0: 526/526 Finished Stage-65_0: 1405/1405 Finished Stage-66_0: 2(+759)/1102 Stage-67_0: 0/1099 Stage-68_0: 0/1099 Stage-69_0: 0/1
INFO : 2018-07-09 17:48:48,596 Stage-64_0: 526/526 Finished Stage-65_0: 1405/1405 Finished Stage-66_0: 2(+759)/1102 Stage-67_0: 0/1099 Stage-68_0: 0/1099 Stage-69_0: 0/1
ERROR : Spark job[23] failed
java.lang.InterruptedException: null
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:998) ~[?:1.8.0_141]
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304) ~[?:1.8.0_141]
	at scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:202) ~[scala-library-2.11.8.jar:?]
	at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218) ~[scala-library-2.11.8.jar:?]
	at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:153) ~[scala-library-2.11.8.jar:?]
	at org.apache.spark.SimpleFutureAction.ready(FutureAction.scala:125) ~[spark-core_2.11-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
	at org.apache.spark.SimpleFutureAction.ready(FutureAction.scala:114) ~[spark-core_2.11-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
	at org.apache.spark.util.ThreadUtils$.awaitReady(ThreadUtils.scala:222) ~[spark-core_2.11-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
	at org.apache.spark.JavaFutureActionWrapper.getImpl(FutureAction.scala:264) ~[spark-core_2.11-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
	at org.apache.spark.JavaFutureActionWrapper.get(FutureAction.scala:277) ~[spark-core_2.11-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
	at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:391) ~[hive-exec-2.1.1-SNAPSHOT.jar:2.1.1-SNAPSHOT]
	at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:352) ~[hive-exec-2.1.1-SNAPSHOT.jar:2.1.1-SNAPSHOT]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_141]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_141]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_141]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_141]
ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. null
INFO : Completed executing command(queryId=hive_20180709174140_0f64ee17-f793-441a-9a77-3ee0cd0a9c32); Time taken: 249.727 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. null (state=08S01,code=1)
{code}
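One possible shape for the fix, as a hypothetical sketch (not actual Hive code; the class and method names below are illustrative): have the shutdown hook record a human-readable reason before cancelling jobs, and attach that reason to the exception a cancelled job reports, so the user sees the cause instead of a bare {{InterruptedException: null}}:

```java
import java.util.concurrent.atomic.AtomicReference;

// Illustrative sketch: propagate a shutdown reason from a JVM shutdown
// hook to jobs that get interrupted while the driver shuts down.
public class ShutdownReasonDemo {

    // Holds a human-readable reason once shutdown begins; null otherwise.
    private static final AtomicReference<String> shutdownReason =
            new AtomicReference<>();

    // Would be called first inside the driver's shutdown hook,
    // before any running jobs are cancelled.
    public static void beginShutdown(String reason) {
        shutdownReason.compareAndSet(null, reason);
    }

    // Wraps a bare InterruptedException with the recorded reason, if any,
    // so the error surfaced to the user explains why the job was killed.
    public static RuntimeException describeFailure(InterruptedException cause) {
        String reason = shutdownReason.get();
        String msg = (reason != null)
                ? "Spark job cancelled because the driver is shutting down: " + reason
                : "Spark job interrupted";
        return new RuntimeException(msg, cause);
    }

    public static void main(String[] args) {
        // Simulate the SIGTERM path: the hook records the reason, then a
        // job's InterruptedException gets wrapped with that context.
        beginShutdown("YARN sent SIGTERM (container exceeded memory limits)");
        RuntimeException e = describeFailure(new InterruptedException());
        System.out.println(e.getMessage());
    }
}
```

With something like this, the {{ERROR : Spark job[23] failed}} line above would carry the shutdown reason rather than {{null}}.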
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)