[jira] [Updated] (OOZIE-3286) [spark-action] Launch Spark application from Oozie server JVM
[ https://issues.apache.org/jira/browse/OOZIE-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andras Piros updated OOZIE-3286: Summary: [spark-action] Launch Spark application from Oozie server JVM (was: [spark-action] Call SparkLauncher from Oozie server JVM) > [spark-action] Launch Spark application from Oozie server JVM > - > > Key: OOZIE-3286 > URL: https://issues.apache.org/jira/browse/OOZIE-3286 > Project: Oozie > Issue Type: New Feature > Components: action, core >Affects Versions: 5.0.0 >Reporter: Andras Piros >Assignee: Andras Piros >Priority: Major > > This is a major refactor / rework of the Spark action. > Today Oozie Spark actions run as follows: > # {{SparkActionExecutor extends JavaActionExecutor}} (running as part of > Oozie server code) launches a {{SparkMain}} (running as YARN application) the > usual way on YARN using {{LauncherAM}} > # {{SparkMain}} runs on a YARN NodeManager container and calls from the same > JVM > [*{{SparkSubmit}}*|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala] > # {{SparkSubmit}} fires up a Spark driver: > ** in {{local}} or {{yarn-client}} mode in the same YARN NodeManager JVM > where {{SparkMain}} runs. In {{yarn-client}} mode Spark's > [*{{ApplicationMaster}}*|https://github.com/apache/spark/blob/master/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala] > runs inside the same JVM where the driver runs > ** [*in {{yarn-cluster}} > mode*|https://spark.apache.org/docs/latest/running-on-yarn.html#launching-spark-on-yarn] > Spark driver runs in a different JVM than Spark's ApplicationMaster, maybe > on a different host. It can go away after the YARN application has been > submitted > Problems with this approach are: > * too many levels of indirection cause lots of latency, and a whole new world > of communication errors can happen > * since {{SparkSubmit}} is launched from the same JVM where {{SparkMain}} > runs, the Spark application will share all the environment variables, > classpath, sharelib dependencies etc. with Oozie's Spark launcher code, > causing hard-to-nail-down environment and classpath issues > The future of Oozie Spark action launching looks like this: > # {{SparkActionExecutor}} (running as part of Oozie server code) launches an > [*{{InProcessLauncher}}*|https://github.com/apache/spark/blob/master/launcher/src/main/java/org/apache/spark/launcher/InProcessLauncher.java] > (available since Spark 2.3) always in {{yarn-cluster}} mode > # {{InProcessLauncher}} calls either {{InProcessSparkSubmit}} or > {{SparkSubmit}}. We translate any Spark application modes to > {{yarn-cluster}}, so that Spark driver will run in a JVM different than the > Spark driver > # this allows for much better resource usage, since we have one YARN > ApplicationMaster (Oozie's {{LauncherAM}}) less > # since the Spark driver and executor are always launched in different JVMs, > we don't have any interference of environment variables, driver, or executor > classpath -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (OOZIE-3286) [spark-action] Launch Spark application from Oozie server
[ https://issues.apache.org/jira/browse/OOZIE-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andras Piros updated OOZIE-3286: Summary: [spark-action] Launch Spark application from Oozie server (was: [spark-action] Launch Spark application from Oozie server JVM) > [spark-action] Launch Spark application from Oozie server > - > > Key: OOZIE-3286 > URL: https://issues.apache.org/jira/browse/OOZIE-3286 > Project: Oozie > Issue Type: New Feature > Components: action, core >Affects Versions: 5.0.0 >Reporter: Andras Piros >Assignee: Andras Piros >Priority: Major > > This is a major refactor / rework of the Spark action. > Today Oozie Spark actions run as follows: > # {{SparkActionExecutor extends JavaActionExecutor}} (running as part of > Oozie server code) launches a {{SparkMain}} (running as YARN application) the > usual way on YARN using {{LauncherAM}} > # {{SparkMain}} runs on a YARN NodeManager container and calls from the same > JVM > [*{{SparkSubmit}}*|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala] > # {{SparkSubmit}} fires up a Spark driver: > ** in {{local}} or {{yarn-client}} mode in the same YARN NodeManager JVM > where {{SparkMain}} runs. In {{yarn-client}} mode Spark's > [*{{ApplicationMaster}}*|https://github.com/apache/spark/blob/master/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala] > runs inside the same JVM where the driver runs > ** [*in {{yarn-cluster}} > mode*|https://spark.apache.org/docs/latest/running-on-yarn.html#launching-spark-on-yarn] > Spark driver runs in a different JVM than Spark's ApplicationMaster, maybe > on a different host. It can go away after the YARN application has been > submitted > Problems with this approach are: > * too many levels of indirection cause lots of latency, and a whole new world > of communication errors can happen > * since {{SparkSubmit}} is launched from the same JVM where {{SparkMain}} > runs, the Spark application will share all the environment variables, > classpath, sharelib dependencies etc. with Oozie's Spark launcher code, > causing hard-to-nail-down environment and classpath issues > The future of Oozie Spark action launching looks like this: > # {{SparkActionExecutor}} (running as part of Oozie server code) launches an > [*{{InProcessLauncher}}*|https://github.com/apache/spark/blob/master/launcher/src/main/java/org/apache/spark/launcher/InProcessLauncher.java] > (available since Spark 2.3) always in {{yarn-cluster}} mode > # {{InProcessLauncher}} calls either {{InProcessSparkSubmit}} or > {{SparkSubmit}}. We translate any Spark application modes to > {{yarn-cluster}}, so that Spark driver will run in a JVM different than the > Spark driver > # this allows for much better resource usage, since we have one YARN > ApplicationMaster (Oozie's {{LauncherAM}}) less > # since the Spark driver and executor are always launched in different JVMs, > we don't have any interference of environment variables, driver, or executor > classpath -- This message was sent by Atlassian JIRA (v7.6.3#76005)