[ https://issues.apache.org/jira/browse/HIVE-20134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16538504#comment-16538504 ]

Sahil Takiar commented on HIVE-20134:
-------------------------------------

This probably requires fixing HIVE-18921 too.

> Improve logging when HoS Driver is killed due to exceeding memory limits
> ------------------------------------------------------------------------
>
>                 Key: HIVE-20134
>                 URL: https://issues.apache.org/jira/browse/HIVE-20134
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Sahil Takiar
>            Priority: Major
>
> This was improved in HIVE-18093, but more can be done. If a HoS Driver gets 
> killed because it exceeds its memory limits, YARN will issue a SIGTERM on the 
> process. The SIGTERM will cause the shutdown hook in the HoS Driver to be 
> triggered, which causes the Driver to cancel every job, including those still 
> running. The user then sees an uninformative error like the one below. We 
> should propagate the error from the Driver shutdown hook to the user so the 
> actual cause of the failure is visible.
> {code:java}
> INFO : 2018-07-09 17:48:42,580 Stage-64_0: 526/526 Finished Stage-65_0: 
> 1405/1405 Finished Stage-66_0: 0(+759)/1102 Stage-67_0: 0/1099 Stage-68_0: 
> 0/1099 Stage-69_0: 0/1
> INFO : 2018-07-09 17:48:44,589 Stage-64_0: 526/526 Finished Stage-65_0: 
> 1405/1405 Finished Stage-66_0: 1(+759)/1102 Stage-67_0: 0/1099 Stage-68_0: 
> 0/1099 Stage-69_0: 0/1
> INFO : 2018-07-09 17:48:45,591 Stage-64_0: 526/526 Finished Stage-65_0: 
> 1405/1405 Finished Stage-66_0: 2(+759)/1102 Stage-67_0: 0/1099 Stage-68_0: 
> 0/1099 Stage-69_0: 0/1
> INFO : 2018-07-09 17:48:48,596 Stage-64_0: 526/526 Finished Stage-65_0: 
> 1405/1405 Finished Stage-66_0: 2(+759)/1102 Stage-67_0: 0/1099 Stage-68_0: 
> 0/1099 Stage-69_0: 0/1
> ERROR : Spark job[23] failed
> java.lang.InterruptedException: null
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:998)
>  ~[?:1.8.0_141]
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>  ~[?:1.8.0_141]
> at scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:202) 
> ~[scala-library-2.11.8.jar:?]
> at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218) 
> ~[scala-library-2.11.8.jar:?]
> at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:153) 
> ~[scala-library-2.11.8.jar:?]
> at org.apache.spark.SimpleFutureAction.ready(FutureAction.scala:125) 
> ~[spark-core_2.11-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.spark.SimpleFutureAction.ready(FutureAction.scala:114) 
> ~[spark-core_2.11-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.spark.util.ThreadUtils$.awaitReady(ThreadUtils.scala:222) 
> ~[spark-core_2.11-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.spark.JavaFutureActionWrapper.getImpl(FutureAction.scala:264) 
> ~[spark-core_2.11-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.spark.JavaFutureActionWrapper.get(FutureAction.scala:277) 
> ~[spark-core_2.11-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:391)
>  ~[hive-exec-2.1.1-SNAPSHOT.jar:2.1.1-SNAPSHOT]
> at 
> org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:352)
>  ~[hive-exec-2.1.1-SNAPSHOT.jar:2.1.1-SNAPSHOT]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_141]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [?:1.8.0_141]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [?:1.8.0_141]
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_141]
> ERROR : FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask. null
> INFO : Completed executing 
> command(queryId=hive_20180709174140_0f64ee17-f793-441a-9a77-3ee0cd0a9c32); 
> Time taken: 249.727 seconds
> Error: Error while processing statement: FAILED: Execution Error, return code 
> 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. null 
> (state=08S01,code=1){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
