[ https://issues.apache.org/jira/browse/SPARK-31666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17151363#comment-17151363 ]

Dongjoon Hyun commented on SPARK-31666:
---------------------------------------

First of all, the following has been Spark 2.4's behavior since 2.4.0; it's not a bug.
{quote}In Spark 2.4, the `LocalDirsFeatureStep` iterates through the list of 
paths in `spark.local.dir`. For each one, it creates a Kubernetes volume of 
mount type `emptyDir` with the name `spark-local-dir-${index}`.
{quote}
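To illustrate, with `spark.local.dir=/tmp1`, Spark 2.4 itself generates an `emptyDir` volume mounted at `/tmp1`, which then collides with a user-supplied volume at the same path. The following is a simplified sketch of the resulting executor pod fragment (field layout abbreviated; the exact spec Spark emits may differ slightly):
{code:java}
# Sketch of the executor pod spec generated for spark.local.dir=/tmp1
# plus a user-supplied hostPath volume mounted at the same path.
volumes:
  - name: spark-local-dir-1   # created by LocalDirsFeatureStep (emptyDir)
    emptyDir: {}
  - name: spark-local-dir-2   # created from the user's hostPath conf
    hostPath:
      path: /tmp1
containers:
  - name: executor
    volumeMounts:
      - name: spark-local-dir-1
        mountPath: /tmp1      # from spark.local.dir
      - name: spark-local-dir-2
        mountPath: /tmp1      # duplicate mountPath -> K8s rejects it (422)
{code}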
The following is an unsupported use case: the Spark 2.4.x features were not 
designed for it.
{quote}The issue is that I need my Spark job to use paths from my host machine 
that are on a mount point that isn't part of the directory which Kubernetes 
uses to allocate space for `emptyDir` volumes. Therefore, I mount these paths 
as type `hostPath` and ask Spark to use them as local directory space.
{quote}
Please note that the error message came from K8s itself, not from Spark. Your 
use case is supported starting with Apache Spark 3.0, via newly added features.
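For example, starting with Spark 3.0, a Kubernetes volume whose name begins with `spark-local-dir-` is used directly for local storage, so Spark no longer generates a conflicting `emptyDir` mount. A minimal sketch of the 3.0-style options (based on the Spark 3.0 documentation; names and paths are illustrative, so verify against your version):
{code:java}
# Sketch only (Spark 3.0+): a volume named spark-local-dir-* is used as
# local storage directly; spark.local.dir should not be set on top of it.
bin/spark-submit \
 --master k8s://https://my-k8s-server:443 \
 --deploy-mode cluster \
 --conf spark.kubernetes.executor.volumes.hostPath.spark-local-dir-1.mount.path=/tmp1 \
 --conf spark.kubernetes.executor.volumes.hostPath.spark-local-dir-1.options.path=/tmp1 \
 ...
{code}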

I guess you are confused about issue types in open source projects. Not only 
Apache Spark but all Apache projects distinguish `Improvement` and `New 
Feature` from `Bug`. Many improvements and new features add exactly this kind 
of support that older versions lack, and we cannot backport everything into 
old branches. All committers and developers are already moving on to Apache 
Spark 3.1.0.

Regarding the following, you misunderstand again. We didn't end the 2.4.x line 
the way we ended 1.6.x; historically, 1.6 stopped at 1.6.3. For 2.4.x, you can 
still use Apache Spark 2.4.6 and later releases. That's why the Apache Spark 
community declared 2.4 an LTS (long term support) line: 
[https://spark.apache.org/versioning-policy.html]. We will maintain it with 
critical bug fixes and security fixes (see 
[https://spark.apache.org/security.html]). However, 2.4.7 (or 2.4.8) will be 
the same as 2.4.0~2.4.6 in terms of features. That's the community policy.

> I feel giving folks 6 months to migrate from one Spark release to the next 
> is fair, especially now considering how mature Spark is as a project. What 
> are your thoughts on this?

> Cannot map hostPath volumes to container
> ----------------------------------------
>
>                 Key: SPARK-31666
>                 URL: https://issues.apache.org/jira/browse/SPARK-31666
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes, Spark Core
>    Affects Versions: 2.4.5
>            Reporter: Stephen Hopper
>            Priority: Major
>
> I'm trying to mount additional hostPath directories as seen in a couple of 
> places:
> [https://aws.amazon.com/blogs/containers/optimizing-spark-performance-on-kubernetes/]
> [https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/user-guide.md#using-volume-for-scratch-space]
> [https://spark.apache.org/docs/latest/running-on-kubernetes.html#using-kubernetes-volumes]
>  
> However, whenever I try to submit my job, I run into this error:
> {code:java}
> Uncaught exception in thread kubernetes-executor-snapshots-subscribers-1
>  io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: 
> POST at: https://kubernetes.default.svc/api/v1/namespaces/my-spark-ns/pods. 
> Message: Pod "spark-pi-1588970477877-exec-1" is invalid: 
> spec.containers[0].volumeMounts[1].mountPath: Invalid value: "/tmp1": must be 
> unique. Received status: Status(apiVersion=v1, code=422, 
> details=StatusDetails(causes=[StatusCause(field=spec.containers[0].volumeMounts[1].mountPath,
>  message=Invalid value: "/tmp1": must be unique, reason=FieldValueInvalid, 
> additionalProperties={})], group=null, kind=Pod, 
> name=spark-pi-1588970477877-exec-1, retryAfterSeconds=null, uid=null, 
> additionalProperties={}), kind=Status, message=Pod 
> "spark-pi-1588970477877-exec-1" is invalid: 
> spec.containers[0].volumeMounts[1].mountPath: Invalid value: "/tmp1": must be 
> unique, metadata=ListMeta(_continue=null, remainingItemCount=null, 
> resourceVersion=null, selfLink=null, additionalProperties={}), 
> reason=Invalid, status=Failure, additionalProperties={}).{code}
>  
> This is my spark-submit command (note: I've used my own build of Spark for 
> Kubernetes as well as a few other images I've seen floating around, such as 
> seedjeffwan/spark:v2.4.5, and they all have this same issue):
> {code:java}
> bin/spark-submit \
>  --master k8s://https://my-k8s-server:443 \
>  --deploy-mode cluster \
>  --name spark-pi \
>  --class org.apache.spark.examples.SparkPi \
>  --conf spark.executor.instances=2 \
>  --conf spark.kubernetes.container.image=my-spark-image:my-tag \
>  --conf spark.kubernetes.driver.pod.name=sparkpi-test-driver \
>  --conf spark.kubernetes.namespace=my-spark-ns \
>  --conf spark.kubernetes.executor.volumes.hostPath.spark-local-dir-2.mount.path=/tmp1 \
>  --conf spark.kubernetes.executor.volumes.hostPath.spark-local-dir-2.options.path=/tmp1 \
>  --conf spark.local.dir="/tmp1" \
>  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
>  local:///opt/spark/examples/jars/spark-examples_2.11-2.4.5.jar 20000{code}
> Any ideas on what's causing this?
>  


