GitHub user gaborgsomogyi opened a pull request:

    https://github.com/apache/spark/pull/20807

    SPARK-23660: Fix exception in yarn cluster mode when application ended fast

    ## What changes were proposed in this pull request?
    
    Yarn throws the following exception in cluster mode when the application finishes very quickly:
    
    ```
    18/03/07 23:34:22 WARN netty.NettyRpcEnv: Ignored failure: 
java.util.concurrent.RejectedExecutionException: Task 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@7c974942 
rejected from 
java.util.concurrent.ScheduledThreadPoolExecutor@1eea9d2d[Terminated, pool size 
= 0, active threads = 0, queued tasks = 0, completed tasks = 0]
    18/03/07 23:34:22 ERROR yarn.ApplicationMaster: Uncaught exception: 
    org.apache.spark.SparkException: Exception thrown in awaitResult: 
        at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
        at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
        at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:92)
        at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:76)
        at 
org.apache.spark.deploy.yarn.YarnAllocator.<init>(YarnAllocator.scala:102)
        at 
org.apache.spark.deploy.yarn.YarnRMClient.register(YarnRMClient.scala:77)
        at 
org.apache.spark.deploy.yarn.ApplicationMaster.registerAM(ApplicationMaster.scala:450)
        at 
org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:493)
        at 
org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:345)
        at 
org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply$mcV$sp(ApplicationMaster.scala:260)
        at 
org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
        at 
org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
        at 
org.apache.spark.deploy.yarn.ApplicationMaster$$anon$5.run(ApplicationMaster.scala:810)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
        at 
org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:809)
        at 
org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:259)
        at 
org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:834)
        at 
org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
    Caused by: org.apache.spark.rpc.RpcEnvStoppedException: RpcEnv already 
stopped.
        at 
org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:158)
        at 
org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:135)
        at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:229)
        at 
org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:523)
        at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:91)
        ... 17 more
    18/03/07 23:34:22 INFO yarn.ApplicationMaster: Final app status: FAILED, 
exitCode: 13, (reason: Uncaught exception: org.apache.spark.SparkException: 
Exception thrown in awaitResult: )
    ```
    
    Example application:
    
    ```scala
    import org.apache.spark.{SparkConf, SparkContext}

    object ExampleApp {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("ExampleApp")
        val sc = new SparkContext(conf)
        try {
          // Do nothing
        } finally {
          sc.stop()
        }
      }
    }
    ```
    
    This PR makes `initialExecutorIdCounter` lazy, so that `YarnAllocator` 
can be instantiated even if the driver has already stopped.
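    
    The idea behind the fix can be illustrated with a minimal sketch (hypothetical 
names, not the actual `YarnAllocator` code): a `lazy val` defers the blocking 
call to the driver until the counter is first read, so merely constructing the 
allocator no longer fails when the driver's RpcEnv is already stopped.
    
    ```scala
    // Minimal sketch of the fix (hypothetical names, not Spark's real classes).
    // Before: the counter was initialized eagerly in the constructor, so
    // construction triggered an RPC to the (possibly already stopped) driver.
    // After: `lazy val` defers that call until the value is first accessed.
    class Allocator(askDriver: () => Int) {
      // Evaluated only on first access, not at construction time.
      lazy val initialExecutorIdCounter: Int = askDriver()
    }

    object LazySketch {
      def main(args: Array[String]): Unit = {
        // A driver stub that throws, simulating "RpcEnv already stopped".
        val allocator = new Allocator(() => sys.error("RpcEnv already stopped"))
        // Construction succeeds because the lazy val was never evaluated.
        println("allocator constructed")
      }
    }
    ```
    
    Accessing `initialExecutorIdCounter` on such an allocator would still fail, 
but that only happens once the allocator is actually used, by which point a 
live driver is expected.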
    
    ## How was this patch tested?
    
    Automated: additional unit test added
    Manual: application submitted to a small cluster


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gaborgsomogyi/spark SPARK-23660

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20807.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20807
    
----
commit 114ac05102c9d563c922447423ec8445bb37e9ef
Author: Gabor Somogyi <gabor.g.somogyi@...>
Date:   2018-03-13T04:23:59Z

    SPARK-23660: Fix exception in yarn cluster mode when application ended fast

----


---
