[ https://issues.apache.org/jira/browse/SPARK-24655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16821270#comment-16821270 ]

Thomas Graves commented on SPARK-24655:
---------------------------------------

From the linked issues it seems the goals would be:
 * Support more than the Alpine base image, i.e. allow a glibc-based base image.
 * Allow adding support for things like GPUs, although this may just mean making the base image configurable.
 * Allow overriding the start commands, for things like using Jupyter Docker images.
 * Add in Python pip requirements, and I assume the same would be nice for R; is there something generic we can do to make this easy? (See the sketch after this list.)
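For that last bullet, one generic option might be to let users layer pip dependencies on top of the published image rather than us baking anything in. A rough sketch in bash; the {{spark-py:2.4.0}} tag and the {{requirements.txt}} file are placeholders, not a published contract:

{code:bash}
# Sketch: layer Python dependencies on top of a published Spark image.
# "spark-py:2.4.0" and requirements.txt are placeholders, not real names.
cat > Dockerfile <<'EOF'
FROM spark-py:2.4.0
COPY requirements.txt /opt/requirements.txt
RUN pip install -r /opt/requirements.txt
EOF
docker build -t my-spark-py:latest .
{code}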

Correct me if I'm wrong, but anything Spark-related you should be able to set via Spark confs, like env variables: {{spark.kubernetes.driverEnv.[EnvironmentVariableName]}} and {{spark.executorEnv.[EnvironmentVariableName]}}. Otherwise you could just use the Dockerfile built here as a base and build on it.
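For example, a spark-submit sketch setting the same env variable on the driver and the executors; the master URL, image name, env value, and example jar path are all placeholders:

{code:bash}
# MY_ENV is set on the driver via spark.kubernetes.driverEnv.* and on the
# executors via spark.executorEnv.*; every other value here is a placeholder.
spark-submit \
  --master k8s://https://<apiserver-host>:<port> \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.kubernetes.container.image=<your-image> \
  --conf spark.kubernetes.driverEnv.MY_ENV=driver-value \
  --conf spark.executorEnv.MY_ENV=executor-value \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.4.0.jar
{code}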

I think we would just want to make the common cases easy and allow users to override anything we may have hardcoded, so they can reuse the image as a base.

[~mcheah] From the original description, why do we want to avoid rebuilding the image when the Spark version changes? It seems fine to let them override and point to their own Spark version (which they could then use to do this), but I would think you would normally build a new Docker image for a new version of Spark? Dependencies may have changed, the Docker template may have changed, etc. It seems that if they really wanted this, they would just specify their own Docker image as a base and add the Spark pieces; is that what you are getting at? We can make the base image an argument to the docker-image-tool.sh script.
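Something along these lines, where the {{-b BASE_IMAGE=...}} build arg is hypothetical and would still have to be wired into the Dockerfiles and the script:

{code:bash}
# Hypothetical: forward the base image as a Docker build arg.
# -r, -t and the build subcommand exist today; the -b/BASE_IMAGE plumbing
# is the proposed addition, not an existing flag.
./bin/docker-image-tool.sh -r my-repo -t 2.4.0 -b BASE_IMAGE=centos:7 build
{code}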

> [K8S] Custom Docker Image Expectations and Documentation
> --------------------------------------------------------
>
>                 Key: SPARK-24655
>                 URL: https://issues.apache.org/jira/browse/SPARK-24655
>             Project: Spark
>          Issue Type: Improvement
>          Components: Kubernetes
>    Affects Versions: 2.3.1
>            Reporter: Matt Cheah
>            Priority: Major
>
> A common use case we want to support with Kubernetes is the usage of custom 
> Docker images. Some examples include:
>  * A user builds an application using Gradle or Maven, using Spark as a 
> compile-time dependency. The application's jars (both the custom-written jars 
> and the dependencies) need to be packaged in a docker image that can be run 
> via spark-submit.
>  * A user builds a PySpark or R application and desires to include custom 
> dependencies.
>  * A user wants to switch the base image from Alpine to CentOS while using 
> either built-in or custom jars.
>
> We currently do not document how these custom Docker images are supposed to 
> be built, nor do we guarantee stability of these Docker images with various 
> spark-submit versions. To illustrate how this can break down, suppose for 
> example we decide to change the names of environment variables that denote 
> the driver/executor extra JVM options specified by 
> {{spark.[driver|executor].extraJavaOptions}}. If we change the environment 
> variable spark-submit provides, then the user must update their custom 
> Dockerfile and build new images.
>
> Rather than jumping to an implementation immediately though, it's worth 
> taking a step back and considering these matters from the perspective of the 
> end user. Towards that end, this ticket will serve as a forum where we can 
> answer at least the following questions, and any others pertaining to the 
> matter:
>  # What would be the steps a user would need to take to build a custom Docker 
> image, given their desire to customize the dependencies and the content (OS 
> or otherwise) of said images?
>  # How can we ensure the user does not need to rebuild the image if only the 
> spark-submit version changes?
>
> The end deliverable for this ticket is a design document, and then we'll 
> create sub-issues for the technical implementation and documentation of the 
> contract.


