[ 
https://issues.apache.org/jira/browse/SPARK-37572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dagang Wei updated SPARK-37572:
-------------------------------
    Affects Version/s: 3.3.0
                           (was: 3.2.0)

> Flexible ways of launching executors
> ------------------------------------
>
>                 Key: SPARK-37572
>                 URL: https://issues.apache.org/jira/browse/SPARK-37572
>             Project: Spark
>          Issue Type: New Feature
>          Components: Deploy
>    Affects Versions: 3.3.0
>            Reporter: Dagang Wei
>            Priority: Major
>
> Currently Spark launches executor processes by constructing and running 
> commands [1], for example:
> {code:java}
> /usr/lib/jvm/adoptopenjdk-8-hotspot-amd64/jre/bin/java -cp 
> /opt/spark-3.2.0-bin-hadoop3.2/conf/:/opt/spark-3.2.0-bin-hadoop3.2/jars/* 
> -Xmx1024M -Dspark.driver.port=35729 
> org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url 
> spark://coarsegrainedschedu...@dagang.svl.corp.google.com:35729 --executor-id 
> 0 --hostname 100.116.124.193 --cores 6 --app-id app-20211207131146-0002 
> --worker-url spark://Worker@100.116.124.193:45287 {code}
> But there are use cases that require more flexible ways of launching 
> executors. In particular, our use case is running Spark in standalone mode, 
> with the Spark master and workers running in VMs. We want to let Spark app 
> developers provide custom container images to customize the job runtime 
> environment (typically Java and Python dependencies), so executors (which run 
> the job code) need to run in Docker containers.
> After investigating the source code, we found that the concept of a Spark 
> command runner might be a good solution. Basically, we want to introduce an 
> optional command runner in Spark, so that instead of running the executor 
> launch command directly, the worker passes the command to the runner, and the 
> runner executes it with its own strategy, which could be running it in a 
> Docker container or, by default, running it directly.
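> For illustration, here is a minimal Scala sketch (not the actual patch) of 
> what such a hook could look like on the worker side; the object and helper 
> names ({{CommandRunnerSketch}}, {{wrapWithRunner}}) are made up for this 
> example, and the real change would live in CommandUtils [1]:
> {code:java}
> // Sketch only: prepend the configured runner, if any, to the executor
> // launch command; otherwise fall back to running the command directly.
> object CommandRunnerSketch {
>   def wrapWithRunner(command: Seq[String], env: Map[String, String]): Seq[String] =
>     env.get("SPARK_COMMAND_RUNNER") match {
>       case Some(runner) if runner.nonEmpty => runner +: command
>       case _ => command // default behavior: unchanged launch path
>     }
>
>   def main(args: Array[String]): Unit = {
>     val launchCmd = Seq("java", "-cp", "/opt/spark/jars/*",
>       "org.apache.spark.executor.CoarseGrainedExecutorBackend", "--executor-id", "0")
>     // With SPARK_COMMAND_RUNNER=/usr/local/bin/spark-command-runner.sh this prints
>     // List(/usr/local/bin/spark-command-runner.sh, java, -cp, ...)
>     println(wrapWithRunner(launchCmd, sys.env))
>   }
> } {code}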
> The runner should be customizable through an env variable 
> `SPARK_COMMAND_RUNNER`, which by default could be a simple script like:
> {code:java}
> #!/bin/bash
> exec "$@" {code}
> or in the case of Docker container:
> {code:java}
> #!/bin/bash
> docker run ... -- "$@" {code}
>  
> I already have a patch for this feature and have tested it in our environment.
>  
> [1]: 
> [https://github.com/apache/spark/blob/v3.2.0/core/src/main/scala/org/apache/spark/deploy/worker/CommandUtils.scala#L52]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
