[ https://issues.apache.org/jira/browse/SPARK-24655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16770115#comment-16770115 ]

Ondrej Kokes commented on SPARK-24655:
--------------------------------------

I would expect there to be a Dockerfile that would accept my 
requirements.txt/Pipfile (in the case of PySpark) and install everything in it 
using the standard tooling, so that I wouldn't have to do anything other than 
docker build. And as I noted in the duplicate issue, the base distro would need 
to be glibc-based (musl-based images like Alpine can't use standard prebuilt 
manylinux wheels).
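
To sketch what I mean (the base image name here is hypothetical - any 
glibc-based image with Spark and Python would do):

{code}
# Hypothetical glibc-based base image with Spark and Python preinstalled
FROM spark-py:2.3.1-debian

# Install the application's dependencies with standard tooling -
# no Spark-specific steps involved
COPY requirements.txt /opt/app/requirements.txt
RUN pip install --no-cache-dir -r /opt/app/requirements.txt

# Add the application code itself
COPY src/ /opt/app/
{code}

With something like that in place, {{docker build -t my-spark-app .}} would be 
the whole workflow.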

The only deviation from this workflow would be if I needed to add anything 
extra to the image - say custom certificates, injected environment variables, 
or custom package repositories (though some of this could be handled by 
Kubernetes itself). But at least I'd have a starting point - a Dockerfile.
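
Those extras would just be a few more lines on top of the same base, e.g. 
(paths and URLs made up for illustration):

{code}
# Trust an internal CA, e.g. for a private PyPI mirror
# (update-ca-certificates assumes a Debian/Ubuntu base)
COPY certs/internal-ca.crt /usr/local/share/ca-certificates/internal-ca.crt
RUN update-ca-certificates

# Point pip at an internal package repository
ENV PIP_INDEX_URL=https://pypi.internal.example.com/simple
{code}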

Slightly off topic: you mention build-time dependencies, but there could be 
cases where we'd need to install packages at runtime - e.g. in a 
Zeppelin/Jupyter scenario. I'm not sure if that affects this in any way, but 
it's a workflow that should be supported as well.
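
For what it's worth, the runtime case today looks something like the following 
in a notebook cell - and it only reaches the driver, which is part of the 
problem ({{requests}} is just an example package):

{code}
import subprocess
import sys

# Install a package into the driver's Python environment at runtime.
# Executors are unaffected - any executor-side import would still fail
# unless the package is also baked into the executor image.
subprocess.check_call([sys.executable, "-m", "pip", "install", "requests"])

import requests  # now importable, but only on the driver
{code}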

> [K8S] Custom Docker Image Expectations and Documentation
> --------------------------------------------------------
>
>                 Key: SPARK-24655
>                 URL: https://issues.apache.org/jira/browse/SPARK-24655
>             Project: Spark
>          Issue Type: Improvement
>          Components: Kubernetes
>    Affects Versions: 2.3.1
>            Reporter: Matt Cheah
>            Priority: Major
>
> A common use case we want to support with Kubernetes is the usage of custom 
> Docker images. Some examples include:
>  * A user builds an application using Gradle or Maven, using Spark as a 
> compile-time dependency. The application's jars (both the custom-written jars 
> and the dependencies) need to be packaged in a Docker image that can be run 
> via spark-submit.
>  * A user builds a PySpark or R application and desires to include custom 
> dependencies.
>  * A user wants to switch the base image from Alpine to CentOS while using 
> either built-in or custom jars.
>
> We currently do not document how these custom Docker images are supposed to 
> be built, nor do we guarantee stability of these Docker images with various 
> spark-submit versions. To illustrate how this can break down, suppose for 
> example we decide to change the names of environment variables that denote 
> the driver/executor extra JVM options specified by 
> {{spark.[driver|executor].extraJavaOptions}}. If we change the environment 
> variables that spark-submit provides, then the user must update their custom 
> Dockerfile and rebuild their images.
> Rather than jumping to an implementation immediately, though, it's worth 
> taking a step back and considering these matters from the perspective of the 
> end user. Towards that end, this ticket will serve as a forum where we can 
> answer at least the following questions, and any others pertaining to the 
> matter:
>  # What would be the steps a user would need to take to build a custom Docker 
> image, given their desire to customize the dependencies and the content (OS 
> or otherwise) of said images?
>  # How can we ensure the user does not need to rebuild the image if only the 
> spark-submit version changes?
> The end deliverable for this ticket is a design document, and then we'll 
> create sub-issues for the technical implementation and documentation of the 
> contract.


