Sahil Takiar created HIVE-20134:
-----------------------------------

             Summary: Improve logging when HoS Driver is killed due to exceeding memory limits
                 Key: HIVE-20134
                 URL: https://issues.apache.org/jira/browse/HIVE-20134
             Project: Hive
          Issue Type: Sub-task
          Components: Spark
            Reporter: Sahil Takiar


This was improved in HIVE-18093, but more can be done. If a HoS Driver is killed because it exceeds its memory limits, YARN issues a SIGTERM to the process. The SIGTERM triggers the shutdown hook in the HoS Driver, which kills all jobs, even those still running. The user ends up seeing an error like the one below, which isn't very informative. We should propagate the error from the Driver shutdown hook to the user.
{code:java}
INFO : 2018-07-09 17:48:42,580 Stage-64_0: 526/526 Finished Stage-65_0: 1405/1405 Finished Stage-66_0: 0(+759)/1102 Stage-67_0: 0/1099 Stage-68_0: 0/1099 Stage-69_0: 0/1
INFO : 2018-07-09 17:48:44,589 Stage-64_0: 526/526 Finished Stage-65_0: 1405/1405 Finished Stage-66_0: 1(+759)/1102 Stage-67_0: 0/1099 Stage-68_0: 0/1099 Stage-69_0: 0/1
INFO : 2018-07-09 17:48:45,591 Stage-64_0: 526/526 Finished Stage-65_0: 1405/1405 Finished Stage-66_0: 2(+759)/1102 Stage-67_0: 0/1099 Stage-68_0: 0/1099 Stage-69_0: 0/1
INFO : 2018-07-09 17:48:48,596 Stage-64_0: 526/526 Finished Stage-65_0: 1405/1405 Finished Stage-66_0: 2(+759)/1102 Stage-67_0: 0/1099 Stage-68_0: 0/1099 Stage-69_0: 0/1
ERROR : Spark job[23] failed
java.lang.InterruptedException: null
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:998) ~[?:1.8.0_141]
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304) ~[?:1.8.0_141]
	at scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:202) ~[scala-library-2.11.8.jar:?]
	at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218) ~[scala-library-2.11.8.jar:?]
	at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:153) ~[scala-library-2.11.8.jar:?]
	at org.apache.spark.SimpleFutureAction.ready(FutureAction.scala:125) ~[spark-core_2.11-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
	at org.apache.spark.SimpleFutureAction.ready(FutureAction.scala:114) ~[spark-core_2.11-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
	at org.apache.spark.util.ThreadUtils$.awaitReady(ThreadUtils.scala:222) ~[spark-core_2.11-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
	at org.apache.spark.JavaFutureActionWrapper.getImpl(FutureAction.scala:264) ~[spark-core_2.11-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
	at org.apache.spark.JavaFutureActionWrapper.get(FutureAction.scala:277) ~[spark-core_2.11-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
	at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:391) ~[hive-exec-2.1.1-SNAPSHOT.jar:2.1.1-SNAPSHOT]
	at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:352) ~[hive-exec-2.1.1-SNAPSHOT.jar:2.1.1-SNAPSHOT]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_141]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_141]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_141]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_141]
ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. null
INFO : Completed executing command(queryId=hive_20180709174140_0f64ee17-f793-441a-9a77-3ee0cd0a9c32); Time taken: 249.727 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. null (state=08S01,code=1)
{code}
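One possible shape for the fix, as a minimal sketch only: the shutdown hook records why the driver is going down, and the job-failure path reports that reason instead of the bare {{InterruptedException}}. All names here ({{DriverShutdownReason}}, {{recordShutdown}}, {{describeFailure}}) are hypothetical, not existing Hive APIs.
{code:java}
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: capture the shutdown cause once, then surface it
// in the error shown to the user when running jobs are cancelled.
public class DriverShutdownReason {
  private static final AtomicReference<String> REASON = new AtomicReference<>();

  // Called from the shutdown hook before running jobs are cancelled;
  // only the first recorded reason wins.
  static void recordShutdown(String reason) {
    REASON.compareAndSet(null, reason);
  }

  // Wraps a bare InterruptedException (or any failure) with the recorded
  // shutdown cause, if one exists.
  static String describeFailure(Throwable t) {
    String reason = REASON.get();
    if (reason != null) {
      return "Spark job failed because the driver is shutting down: " + reason;
    }
    return "Spark job failed: " + t;
  }

  public static void main(String[] args) {
    // Simulate what the SIGTERM-triggered hook would record.
    recordShutdown("driver received SIGTERM (likely killed by YARN for exceeding memory limits)");
    System.out.println(describeFailure(new InterruptedException()));
  }
}
{code}
With something like this in place, the user would see the shutdown reason in the SparkTask error instead of {{java.lang.InterruptedException: null}}.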



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)