[ https://issues.apache.org/jira/browse/SPARK-37572 ]
Dagang Wei updated SPARK-37572:
-------------------------------
    Affects Version/s: 3.2.0
                       (was: 3.3.0)

> Flexible ways of launching executors
> ------------------------------------
>
>                 Key: SPARK-37572
>                 URL: https://issues.apache.org/jira/browse/SPARK-37572
>             Project: Spark
>          Issue Type: New Feature
>          Components: Deploy
>    Affects Versions: 3.2.0
>            Reporter: Dagang Wei
>            Priority: Major
>
> Currently Spark launches executor processes by constructing and running shell commands [1], for example:
> {code:bash}
> /usr/lib/jvm/adoptopenjdk-8-hotspot-amd64/jre/bin/java -cp /opt/spark-3.2.0-bin-hadoop3.2/conf/:/opt/spark-3.2.0-bin-hadoop3.2/jars/* -Xmx1024M -Dspark.driver.port=35729 org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://coarsegrainedschedu...@dagang.svl.corp.google.com:35729 --executor-id 0 --hostname 100.116.124.193 --cores 6 --app-id app-20211207131146-0002 --worker-url spark://Worker@100.116.124.193:45287
> {code}
> But there are use cases that require more flexibility in how executors are launched. In particular, we run Spark in standalone mode, with the Spark master and workers running in VMs. We want to allow Spark app developers to provide custom container images that customize the job runtime environment (typically Java and Python dependencies), so executors (which run the job code) need to run in Docker containers.
> After investigating the source code, we found that the concept of a Spark command runner might be a good solution. Basically, we want to introduce an optional command runner in Spark: instead of running the executor launch command directly, the worker passes the command to the runner, and the runner executes it with its own strategy, which could be running it in a Docker container or, by default, running it directly.
> The runner should be customizable through an environment variable `SPARK_COMMAND_RUNNER`, which by default could be a simple pass-through script like:
> {code:bash}
> #!/bin/bash
> exec "$@"
> {code}
> or, in the case of a Docker container, a script along the lines of (sketches follow at the end of this description):
> {code:bash}
> #!/bin/bash
> docker run ... -- "$@"
> {code}
> I already have a patch for this feature and have tested it in our environment.
>
> [1]: https://github.com/apache/spark/blob/v3.2.0/core/src/main/scala/org/apache/spark/deploy/worker/CommandUtils.scala#L52
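>
> To make the proposed mechanism concrete: when `SPARK_COMMAND_RUNNER` is unset (or points at the default pass-through script), the worker runs the launch command as before; when it is set, the same command is simply prefixed by the runner. A minimal sketch, assuming a hypothetical runner path:
> {code:bash}
> # Hypothetical runner location; any executable path would work.
> export SPARK_COMMAND_RUNNER=/opt/spark/command-runner.sh
>
> # Without the runner, the worker execs the launch command directly:
> #   java -cp ... org.apache.spark.executor.CoarseGrainedExecutorBackend ...
> # With the runner set, the effective invocation becomes:
> #   /opt/spark/command-runner.sh java -cp ... org.apache.spark.executor.CoarseGrainedExecutorBackend ...
> {code}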
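> For the Docker case, a fuller runner might look like the sketch below. The image name, Spark install path, and network settings are illustrative assumptions, not part of the patch:
> {code:bash}
> #!/bin/bash
> # Hypothetical image providing the customized runtime (Java/Python deps);
> # it is assumed to provide the JVM at the path used in the launch command.
> IMAGE="example.com/my-spark-runtime:latest"
> # Assumed Spark install path, mounted read-only so the classpath in the
> # launch command resolves inside the container.
> SPARK_DIR="/opt/spark-3.2.0-bin-hadoop3.2"
>
> # --network host keeps the executor reachable at the host address and
> # ports that the driver and worker expect; "$@" is the original launch
> # command constructed by the worker.
> exec docker run --rm \
>   --network host \
>   -v "${SPARK_DIR}:${SPARK_DIR}:ro" \
>   "${IMAGE}" \
>   "$@"
> {code}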