[ 
https://issues.apache.org/jira/browse/SPARK-42837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-42837:
---------------------------------
    Component/s: Kubernetes

> spark-submit - issue when resolving dependencies hosted on a private 
> repository in kubernetes cluster mode
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-42837
>                 URL: https://issues.apache.org/jira/browse/SPARK-42837
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes, Spark Submit
>    Affects Versions: 3.3.2
>            Reporter: lione Herbet
>            Priority: Minor
>
> When using the [spark 
> operator|https://github.com/GoogleCloudPlatform/spark-on-k8s-operator], if 
> dependencies are hosted on a private repository that requires authentication 
> (like S3 or OCI), the spark operator submitting the job needs all the 
> secrets to access every dependency; otherwise the spark-submit fails.
> On a multi-tenant Kubernetes cluster where the spark operator and the spark 
> jobs run in separate namespaces, this means duplicating all the secrets, or 
> it won't work.
> It seems that spark-submit needs to access the dependencies (with 
> credentials) only to resolveGlobPaths 
> ([https://github.com/apache/spark/blob/v3.3.2/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L364-L367]).
>  It seems to me (though this needs to be confirmed by someone more familiar 
> with Spark internals than I am) that this resolveGlobPaths step is performed 
> again when the driver downloads the jars.
> Would it be possible to skip this resolveGlobPaths step when running on a 
> Kubernetes cluster in cluster mode?
> For example, add a condition like this around lines 364-367:
> {code:java}
> if (isKubernetesCluster) {
>   ...
> }
> {code}
> For compatibility with the old behavior, we could even additionally gate it 
> on a Spark parameter like this:
> {code:java}
> if (isKubernetesCluster &&
>     sparkConf.getBoolean("spark.kubernetes.resolveGlobPathsInSubmit", true)) {
>   ...
> }
> {code}
> I tested both solutions locally and they seem to resolve the case.
> Do you think I need to consider other elements?
> I may submit a patch depending on your feedback.
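
The proposed gating can be sketched as a standalone predicate. This is only an illustration of the intent described above, not the actual SparkSubmit code; the flag name `spark.kubernetes.resolveGlobPathsInSubmit` is the one proposed in this ticket and does not exist in Spark, and the default of `true` preserves the old behavior:

```scala
// Hypothetical sketch of the proposed gating logic (not actual Spark code).
object GlobResolutionGate {
  // Returns true when spark-submit should resolve glob paths (the current
  // behavior), and false when that step should be skipped and left to the
  // driver, as proposed for Kubernetes cluster mode.
  def shouldResolveGlobPaths(
      isKubernetesCluster: Boolean,
      conf: Map[String, String]): Boolean = {
    // Proposed (hypothetical) flag; defaulting to "true" keeps the old behavior.
    val resolveOnK8s = conf
      .getOrElse("spark.kubernetes.resolveGlobPathsInSubmit", "true")
      .toBoolean
    // Always resolve outside Kubernetes cluster mode; on Kubernetes,
    // only when the flag allows it.
    !isKubernetesCluster || resolveOnK8s
  }
}
```

With the default configuration nothing changes; only an operator that explicitly sets the flag to `false` on a Kubernetes cluster deployment would skip glob resolution at submit time.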



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
