[ 
https://issues.apache.org/jira/browse/SPARK-25020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ricky Saltzer updated SPARK-25020:
----------------------------------
    Description: 
Opening this up to give you guys some insight in an issue that will occur when 
using Spark Streaming with Hadoop 2.8. 

Hadoop 2.8 added HADOOP-12950 which adds a upper limit of a 10 second timeout 
for its shutdown hook. From our tests, if the Spark job takes longer than 10 
seconds to shutdown gracefully, the Hadoop shutdown thread seems to trample 
over the graceful shutdown and throw an exception like
{code:java}
18/08/03 17:21:04 ERROR Utils: Uncaught exception in thread pool-1-thread-1
java.lang.InterruptedException
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1039)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
at 
org.apache.spark.streaming.scheduler.ReceiverTracker.stop(ReceiverTracker.scala:177)
at 
org.apache.spark.streaming.scheduler.JobScheduler.stop(JobScheduler.scala:114)
at 
org.apache.spark.streaming.StreamingContext$$anonfun$stop$1.apply$mcV$sp(StreamingContext.scala:682)
at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1317)
at org.apache.spark.streaming.StreamingContext.stop(StreamingContext.scala:681)
at 
org.apache.spark.streaming.StreamingContext.org$apache$spark$streaming$StreamingContext$$stopOnShutdown(StreamingContext.scala:715)
at 
org.apache.spark.streaming.StreamingContext$$anonfun$start$1.apply$mcV$sp(StreamingContext.scala:599)
at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:216)
at 
org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188)
at 
org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
at 
org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1948)
at 
org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:188)
at 
org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
at 
org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
at scala.util.Try$.apply(Try.scala:192)
at 
org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
at 
org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748){code}
The reason I hit this issue is because we recently upgraded to EMR 5.15, which 
has both Spark 2.3 & Hadoop 2.8. The following workaround has proven successful 
to us (in limited testing)

Instead of just running
{code:java}
...
ssc.start()
ssc.awaitTermination(){code}
We needed to do the following
{code:java}
...
ssc.start()
sys.ShutdownHookThread {
  ssc.stop(true, true)
}
ssc.awaitTermination(){code}
As far as I can tell, there is no way to override the default {{10 second}} 
timeout in HADOOP-12950, which is why we had to go with the workaround. 

Note: I also verified this bug exists even with EMR 5.12.1 which runs Spark 
2.2.x & Hadoop 2.8. 

Ricky
 Epic Games

  was:
Opening this up to give you guys some insight in an issue that will occur when 
using Spark Streaming with Hadoop 2.8. 

Hadoop 2.8 added HADOOP-12950 which adds a upper limit of a 10 second timeout 
for its shutdown hook. From our tests, if the Spark job takes longer than 10 
seconds to shutdown gracefully, the Hadoop shutdown thread seems to trample 
over the graceful shutdown and throw an exception like
{code:java}
18/08/03 17:21:04 ERROR Utils: Uncaught exception in thread pool-1-thread-1
java.lang.InterruptedException
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1039)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
at 
org.apache.spark.streaming.scheduler.ReceiverTracker.stop(ReceiverTracker.scala:177)
at 
org.apache.spark.streaming.scheduler.JobScheduler.stop(JobScheduler.scala:114)
at 
org.apache.spark.streaming.StreamingContext$$anonfun$stop$1.apply$mcV$sp(StreamingContext.scala:682)
at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1317)
at org.apache.spark.streaming.StreamingContext.stop(StreamingContext.scala:681)
at 
org.apache.spark.streaming.StreamingContext.org$apache$spark$streaming$StreamingContext$$stopOnShutdown(StreamingContext.scala:715)
at 
org.apache.spark.streaming.StreamingContext$$anonfun$start$1.apply$mcV$sp(StreamingContext.scala:599)
at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:216)
at 
org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188)
at 
org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
at 
org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1948)
at 
org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:188)
at 
org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
at 
org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
at scala.util.Try$.apply(Try.scala:192)
at 
org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
at 
org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748){code}
The reason I hit this issue is because we recently upgraded to EMR 5.15, which 
has both Spark 2.3 & Hadoop 2.8. The following workaround has proven successful 
to us (in limited testing)

Instead of just running
{code:java}
...
ssc.start()
ssc.awaitTermination(){code}
We needed to do the following
{code:java}
...
ssc.start()
{
  ssc.stop(true, true)
}
ssc.awaitTermination(){code}
As far as I can tell, there is no way to override the default {{10 second}} 
timeout in HADOOP-12950, which is why we had to go with the workaround. 

Note: I also verified this bug exists even with EMR 5.12.1 which runs Spark 
2.2.x & Hadoop 2.8. 

Ricky
Epic Games


> Unable to Perform Graceful Shutdown in Spark Streaming with Hadoop 2.8
> ----------------------------------------------------------------------
>
>                 Key: SPARK-25020
>                 URL: https://issues.apache.org/jira/browse/SPARK-25020
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.2.0, 2.2.1, 2.2.2, 2.3.0, 2.3.1
>         Environment: Spark Streaming
> -- Tested on 2.2 & 2.3 (more than likely affects all versions with graceful 
> shutdown) 
> Hadoop 2.8
>            Reporter: Ricky Saltzer
>            Priority: Major
>
> Opening this up to give you guys some insight in an issue that will occur 
> when using Spark Streaming with Hadoop 2.8. 
> Hadoop 2.8 added HADOOP-12950 which adds a upper limit of a 10 second timeout 
> for its shutdown hook. From our tests, if the Spark job takes longer than 10 
> seconds to shutdown gracefully, the Hadoop shutdown thread seems to trample 
> over the graceful shutdown and throw an exception like
> {code:java}
> 18/08/03 17:21:04 ERROR Utils: Uncaught exception in thread pool-1-thread-1
> java.lang.InterruptedException
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1039)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
> at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
> at 
> org.apache.spark.streaming.scheduler.ReceiverTracker.stop(ReceiverTracker.scala:177)
> at 
> org.apache.spark.streaming.scheduler.JobScheduler.stop(JobScheduler.scala:114)
> at 
> org.apache.spark.streaming.StreamingContext$$anonfun$stop$1.apply$mcV$sp(StreamingContext.scala:682)
> at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1317)
> at 
> org.apache.spark.streaming.StreamingContext.stop(StreamingContext.scala:681)
> at 
> org.apache.spark.streaming.StreamingContext.org$apache$spark$streaming$StreamingContext$$stopOnShutdown(StreamingContext.scala:715)
> at 
> org.apache.spark.streaming.StreamingContext$$anonfun$start$1.apply$mcV$sp(StreamingContext.scala:599)
> at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:216)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
> at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1948)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:188)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
> at scala.util.Try$.apply(Try.scala:192)
> at 
> org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748){code}
> The reason I hit this issue is because we recently upgraded to EMR 5.15, 
> which has both Spark 2.3 & Hadoop 2.8. The following workaround has proven 
> successful to us (in limited testing)
> Instead of just running
> {code:java}
> ...
> ssc.start()
> ssc.awaitTermination(){code}
> We needed to do the following
> {code:java}
> ...
> ssc.start()
> sys.ShutdownHookThread {
>   ssc.stop(true, true)
> }
> ssc.awaitTermination(){code}
> As far as I can tell, there is no way to override the default {{10 second}} 
> timeout in HADOOP-12950, which is why we had to go with the workaround. 
> Note: I also verified this bug exists even with EMR 5.12.1 which runs Spark 
> 2.2.x & Hadoop 2.8. 
> Ricky
>  Epic Games



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to