[jira] [Commented] (SPARK-38223) PersistentVolumeClaim does not work in clusters with multiple nodes

Zimo Li (Jira) Wed, 16 Feb 2022 06:35:07 -0800


    [ https://issues.apache.org/jira/browse/SPARK-38223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17493259#comment-17493259 ]


Zimo Li commented on SPARK-38223:
---------------------------------

[~dongjoon] this may be related to your changes, since you added
{{ownPersistentVolumeClaim}}.

> PersistentVolumeClaim does not work in clusters with multiple nodes
> -------------------------------------------------------------------
>
>                 Key: SPARK-38223
>                 URL: https://issues.apache.org/jira/browse/SPARK-38223
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes
>    Affects Versions: 3.2.1
>         Environment: 
> [https://spark.apache.org/docs/latest/running-on-kubernetes.html#how-it-works]
> [https://spark.apache.org/docs/latest/running-on-kubernetes.html#using-kubernetes-volumes]
> [https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes]
>  
>            Reporter: Zimo Li
>            Priority: Minor
>
> We are using {{spark-submit}} to establish a ThriftServer warehouse on Google 
> Kubernetes Engine. The Spark documentation on running on Kubernetes suggests 
> that we can use 
> [persistentVolumeClaim|https://kubernetes.io/docs/concepts/storage/volumes/#persistentvolumeclaim]
>  for Spark applications.
> {code:bash}
> spark-submit \
>   --master k8s://$KUBERNETES_SERVICE_HOST \
>   --deploy-mode cluster \
>   --class $THRIFTSERVER \
>   --conf spark.sql.catalogImplementation=hive \
>   --conf spark.sql.hive.metastore.sharedPrefixes=org.postgresql \
>   --conf spark.hadoop.hive.metastore.schema.verification=false \
>   --conf spark.hadoop.datanucleus.schema.autoCreateTables=true \
>   --conf spark.hadoop.datanucleus.autoCreateSchema=false \
>   --conf spark.sql.parquet.int96RebaseModeInWrite=CORRECTED \
>   --conf spark.hadoop.javax.jdo.option.ConnectionDriverName=org.postgresql.Driver \
>   --conf spark.hadoop.javax.jdo.option.ConnectionUserName=spark \
>   --conf spark.hadoop.javax.jdo.option.ConnectionPassword=Password1! \
>   --conf spark.sql.warehouse.dir=$MOUNT_PATH \
>   --conf spark.kubernetes.driver.pod.name=spark-hive-thriftserver-driver \
>   --conf spark.kubernetes.driver.label.app.kubernetes.io/name=thriftserver \
>   --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.$VOLUME_NAME.options.claimName=$CLAIM_NAME \
>   --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.$VOLUME_NAME.mount.path=$MOUNT_PATH \
>   --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.$VOLUME_NAME.mount.readOnly=false \
>   --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.$VOLUME_NAME.options.claimName=$CLAIM_NAME \
>   --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.$VOLUME_NAME.mount.path=$MOUNT_PATH \
>   --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.$VOLUME_NAME.mount.readOnly=false \
>   --conf spark.kubernetes.executor.deleteOnTermination=true \
>   --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-kube \
>   --conf spark.kubernetes.container.image=$IMAGE \
>   --conf spark.kubernetes.container.image.pullPolicy=Always \
>   --conf spark.executor.memory=2g \
>   --conf spark.driver.memory=2g \
>   local:///$JAR
> {code}
> When it ran, it created one driver and two executors, all of which tried to
> mount the same PVC. Unfortunately, at least one of these pods was scheduled on
> a different node from the others. Because GKE attaches a PV to the node on
> which a pod using its PVC runs, the odd pod out was unable to attach the PV:
> {noformat}
> FailedMount
> Unable to attach or mount volumes: unmounted volumes=[spark-warehouse], unattached volumes=[kube-api-access-grfld spark-conf-volume-exec spark-warehouse spark-local-dir-1]: timed out waiting for the condition
> {noformat}
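> This is because GKE, like many cloud providers, does not support
> {{ReadWriteMany}} for its default persistent-disk-backed PVs, so such a PV can
> be mounted read-write by only one node at a time.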
> ----
> I suggest changing the documentation so that it no longer suggests using PVCs
> for ThriftServers.
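The mount failure described above needs two conditions at once: the Spark pods land on more than one node, and the shared PVC lacks the {{ReadWriteMany}} access mode. Below is a minimal diagnostic sketch for checking both. It assumes {{kubectl}} is configured for the cluster and that the pods carry the {{spark-role}} label that Spark on Kubernetes sets on driver and executor pods. The helper names {{pods_span_multiple_nodes}} and {{supports_rwx}} are hypothetical and exist only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic helpers (a sketch, not part of Spark or kubectl).
# Assumes kubectl targets the cluster and Spark pods carry the spark-role label.

# Reads "POD NODE" lines on stdin and succeeds only if the scheduled pods
# span more than one node. Unscheduled pods (NODE shown as <none>) are skipped.
pods_span_multiple_nodes() {
  local nodes
  nodes=$(awk 'NF >= 2 && $2 != "<none>" && !($2 in seen) { seen[$2] = 1; n++ }
               END { print n + 0 }')
  [ "$nodes" -gt 1 ]
}

# Takes a PVC's access modes as printed by kubectl's jsonpath output
# (e.g. ["ReadWriteOnce"]) and succeeds only if ReadWriteMany is listed.
supports_rwx() {
  case "$1" in
    *ReadWriteMany*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example usage against a live cluster (commented out so the sketch runs offline):
# kubectl get pods -l spark-role --no-headers \
#   -o custom-columns=POD:.metadata.name,NODE:.spec.nodeName \
#   | pods_span_multiple_nodes && echo "Spark pods are spread across nodes"
# modes=$(kubectl get pvc "$CLAIM_NAME" -o jsonpath='{.spec.accessModes}')
# supports_rwx "$modes" || echo "PVC $CLAIM_NAME is not ReadWriteMany: $modes"
```

If both checks report a problem, pods on the second node will hit the same {{FailedMount}} timeout.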



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
