[ https://issues.apache.org/jira/browse/KUDU-3099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Grant Henke reassigned KUDU-3099:
---------------------------------

    Assignee: Waleed Fateem

> KuduBackup/KuduRestore System.exit(0) results in Spark on YARN failure with exitCode: 16
> ----------------------------------------------------------------------------------------
>
>                 Key: KUDU-3099
>                 URL: https://issues.apache.org/jira/browse/KUDU-3099
>             Project: Kudu
>          Issue Type: Bug
>          Components: backup, spark
>    Affects Versions: 1.10.0, 1.11.0
>            Reporter: Waleed Fateem
>            Assignee: Waleed Fateem
>            Priority: Major
>
> When running KuduBackup/KuduRestore, the underlying Spark application can fail on YARN even when the backup/restore tasks complete successfully. The following is from the Spark driver log:
> {code:java}
> INFO spark.SparkContext: Submitted application: Kudu Table Backup
> ..
> INFO spark.SparkContext: Starting job: save at KuduBackup.scala:90
> INFO scheduler.DAGScheduler: Got job 0 (save at KuduBackup.scala:90) with 200 output partitions
> INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (save at KuduBackup.scala:90)
> ..
> INFO scheduler.DAGScheduler: Submitting 200 missing tasks from ResultStage 0 (MapPartitionsRDD[2] at save at KuduBackup.scala:90) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14))
> INFO cluster.YarnClusterScheduler: Adding task set 0.0 with 200 tasks
> ..
> INFO cluster.YarnClusterScheduler: Removed TaskSet 0.0, whose tasks have all completed, from pool
> INFO scheduler.DAGScheduler: Job 0 finished: save at KuduBackup.scala:90, took 20.007488 s
> ..
> INFO spark.SparkContext: Invoking stop() from shutdown hook
> ..
> INFO cluster.YarnClusterSchedulerBackend: Shutting down all executors
> ..
> INFO spark.SparkContext: Successfully stopped SparkContext
> INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 16, (reason: Shutdown hook called before final status was reported.)
> INFO util.ShutdownHookManager: Shutdown hook called{code}
> Spark explicitly registers this shutdown hook to catch System.exit() calls; if the hook runs before the SparkContext is stopped, the application status is reported as a failure:
> https://github.com/apache/spark/blob/branch-2.3/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L299
> The System.exit() call added as part of KUDU-2787 can trigger this race condition; that change was merged into the 1.10.x and 1.11.x branches.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
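To illustrate the race described above, here is a minimal driver sketch (not the actual KuduBackup code; the object name and job body are hypothetical) showing the ordering that avoids the failure: stop the SparkContext before calling System.exit(), so the YARN ApplicationMaster can report its final status before Spark's shutdown hook fires.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical driver sketch: demonstrates stopping the SparkContext
// before exiting the JVM, rather than calling System.exit() while the
// context is still live (which trips Spark's shutdown hook on YARN and
// marks the application FAILED with exitCode 16).
object BackupDriverSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("Kudu Table Backup")
      .getOrCreate()
    try {
      // ... run the backup/restore job here ...
    } finally {
      // Stop the context first so the ApplicationMaster reports the
      // final status (SUCCEEDED) before any shutdown hook runs.
      spark.stop()
    }
    // Only exit after the SparkContext has been stopped cleanly.
    System.exit(0)
  }
}
```

Running this requires a Spark-on-YARN deployment; the point of the sketch is only the ordering of spark.stop() relative to System.exit().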