[ https://issues.apache.org/jira/browse/SPARK-39965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17576516#comment-17576516 ]
pralabhkumar commented on SPARK-39965:
--------------------------------------

[~dongjoon] Thx for replying. We don't see a functional issue beyond the exception in the logs (mentioned above). However, please note that prior to this fix we were not getting any exception in the logs. In scenarios where PVs are not used by Spark (as in our case), why should this exception appear in the logs? Currently there is no way to skip running

{code:java}
Utils.tryLogNonFatalError {
  kubernetesClient
    .persistentVolumeClaims()
    .withLabel(SPARK_APP_ID_LABEL, applicationId())
    .delete()
}
{code}

IMHO, there should be a configuration that checks whether the driver owns the PVCs (i.e. whether Spark uses PVs at all). For example:

{code:java}
if (conf.get(KUBERNETES_DRIVER_OWN_PVC)) {
  Utils.tryLogNonFatalError {
    kubernetesClient
      .persistentVolumeClaims()
      .withLabel(SPARK_APP_ID_LABEL, applicationId())
      .delete()
  }
}
{code}

> Spark on K8s deletes PVCs even though they are not being used
> -------------------------------------------------------------
>
>                 Key: SPARK-39965
>                 URL: https://issues.apache.org/jira/browse/SPARK-39965
>             Project: Spark
>          Issue Type: Bug
>      Components: Kubernetes
>    Affects Versions: 3.3.0
>            Reporter: pralabhkumar
>            Priority: Minor
>
> Since Spark 3.2, as part of [https://github.com/apache/spark/pull/32288], functionality was added to delete PVCs if the Spark driver dies:
> [https://github.com/apache/spark/blob/786a70e710369b195d7c117b33fe9983044014d6/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala#L144]
>
> However, there are cases where Spark on K8s doesn't use PVCs and instead uses a host path for storage:
> [https://spark.apache.org/docs/latest/running-on-kubernetes.html#using-kubernetes-volumes]
>
> In those cases:
> * It requests deletion of PVCs (which is not required).
> * It also tries to delete them when the driver doesn't own the PVs (i.e. spark.kubernetes.driver.ownPersistentVolumeClaim is false).
> * Moreover, in clusters where the Spark user doesn't have access to list or delete PVCs, it throws an exception:
>
> io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: https://kubernetes.default.svc/api/v1/namespaces/<>/persistentvolumeclaims?labelSelector=spark-app-selector%3Dspark-332bd09284b3442f8a6a214fabcd6ab1. Message: Forbidden! Configured service account doesn't have access. Service account may have been revoked. persistentvolumeclaims is forbidden: User "system:serviceaccount:dpi-dev:spark" cannot list resource "persistentvolumeclaims" in API group "" in the namespace "<>".
>
> *Solution*
> Ideally there should be a configuration, spark.kubernetes.driver.pvc.deleteOnTermination, or the existing spark.kubernetes.driver.ownPersistentVolumeClaim should be checked before calling delete on the PVCs. If the user has not set up PVs, or if the driver doesn't own them, there is no need to call the API to delete PVCs.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
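For readers following this thread: the gating logic proposed above can be modeled as a small predicate. This is a minimal, self-contained sketch, not Spark's actual code — the object name `PvcCleanupSketch`, the method `shouldDeletePvcs`, and its boolean parameters are hypothetical stand-ins for reading `spark.kubernetes.driver.ownPersistentVolumeClaim` and detecting whether the application mounted any PVC-backed volumes.

```scala
// Minimal sketch of the proposed guard. Assumption: names here are
// illustrative; the real logic would live in
// KubernetesClusterSchedulerBackend.stop() and read SparkConf.
object PvcCleanupSketch {
  // driverOwnsPvc models spark.kubernetes.driver.ownPersistentVolumeClaim;
  // appUsesPvc models whether the app mounted any PVC-backed volumes.
  // Only when both hold is a delete call against the Kubernetes API warranted.
  def shouldDeletePvcs(driverOwnsPvc: Boolean, appUsesPvc: Boolean): Boolean =
    driverOwnsPvc && appUsesPvc

  def main(args: Array[String]): Unit = {
    // The reporter's case: host-path storage only, so no PVC API call
    // and therefore no Forbidden exception in the driver logs.
    println(shouldDeletePvcs(driverOwnsPvc = false, appUsesPvc = false))
  }
}
```

With such a guard, clusters where the service account lacks `list`/`delete` on `persistentvolumeclaims` would never hit the API in the first place, instead of merely swallowing the exception via `Utils.tryLogNonFatalError`.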