Github user liyinan926 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20059#discussion_r158568089

    --- Diff: docs/running-on-kubernetes.md ---
    @@ -120,6 +120,23 @@ by their appropriate remote URIs. Also, application dependencies can be pre-moun
     Those dependencies can be added to the classpath by referencing them with `local://` URIs and/or setting the
     `SPARK_EXTRA_CLASSPATH` environment variable in your Dockerfiles.
    +
    +### Using Remote Dependencies
    +When there are application dependencies hosted in remote locations like HDFS or HTTP servers, the driver and executor
    +pods need a Kubernetes [init-container](https://kubernetes.io/docs/concepts/workloads/pods/init-containers/) to
    +download the dependencies so the driver and executor containers can use them locally. This requires users to specify
    +the container image for the init-container using the configuration property `spark.kubernetes.initContainer.image`.
    +For example, users add the following option to the `spark-submit` command to specify the init-container image:
    --- End diff --

    Regarding examples, I can add one spark-submit example showing how to use remote jars/files on HTTP/HTTPS and HDFS. But GCS requires the connector in the init-container, which is non-trivial, and I'm not sure about S3. I think we should avoid doing so.
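A sketch of the kind of `spark-submit` example the reviewer proposes, showing remote jars on HTTPS and HDFS downloaded by the init-container. The image names, hostnames, and paths are placeholders; only `spark.kubernetes.initContainer.image` is the property named in the diff, and the other configuration keys are assumptions, not confirmed by this thread.

```shell
# Hypothetical invocation: <...> values and URLs are placeholders.
# The init-container built from the given image fetches the remote
# dependencies before the driver and executor containers start.
bin/spark-submit \
  --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.kubernetes.initContainer.image=<init-container-image> \
  --jars https://example.com/libs/extra-dep.jar \
  hdfs://<namenode>:8020/path/to/examples.jar
```

Per the comment above, GCS and S3 URIs are deliberately left out of such an example, since they would require extra connector jars baked into the init-container image.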