Hi Rob,

Interesting topic, and one that affects UX a lot. I have provided my
thoughts in the related JIRA.

Best,
Stavros

On Fri, Oct 5, 2018 at 5:53 PM, Rob Vesse <rve...@dotnetrdf.org> wrote:

> Folks
>
>
>
> One of the big limitations of the current Spark on K8S implementation is
> that it isn’t possible to use local dependencies (SPARK-23153 [1]) i.e.
> code, JARs, data etc. that only live on the submission client.  This
> basically leaves end users with several options for how to actually run
> their Spark jobs under K8S:
>
>
>
>    1. Store local dependencies on some external distributed file system
>    e.g. HDFS (see the sketch just after this list)
>    2. Build custom images with their local dependencies
>    3. Mount local dependencies into volumes that are mounted by the K8S
>    pods
>
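> As a concrete illustration of Option 1, a minimal sketch of a submission
> whose dependencies already live on HDFS might look like the following
> (the image name, namenode and paths are placeholders, not real values):
>
>     spark-submit \
>       --master k8s://https://<k8s-api-server>:6443 \
>       --deploy-mode cluster \
>       --conf spark.kubernetes.container.image=<approved-image> \
>       --jars hdfs://<namenode>/deps/extra-lib.jar \
>       hdfs://<namenode>/apps/my-app.jar
>
> Nothing is copied from the submission client here; the driver and
> executors fetch everything from HDFS themselves, which is why this option
> only works where such a file system is actually available.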
>
>
> In all cases the onus is on the end user to do the prep work.  Option 1 is
> unfortunately rare in the environments into which we’re looking to deploy
> Spark, and Option 2 tends to be a non-starter as many of our customers
> whitelist approved images i.e. custom images are not permitted.
>
>
>
> Option 3 is more workable but still requires users to provide a bunch of
> extra config options for simple cases, or to rely upon the pending pod
> template feature for complex cases.
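>
> To make the “bunch of extra config options” concrete, a hostPath based
> sketch of Option 3 might look like the following, assuming the
> spark.kubernetes.*.volumes.* properties from SPARK-23529 (the volume
> name, host path and mount path are placeholder values):
>
>     spark-submit \
>       --master k8s://https://<k8s-api-server>:6443 \
>       --deploy-mode cluster \
>       --conf spark.kubernetes.container.image=<approved-image> \
>       --conf spark.kubernetes.driver.volumes.hostPath.deps.mount.path=/opt/deps \
>       --conf spark.kubernetes.driver.volumes.hostPath.deps.options.path=/data/deps \
>       --conf spark.kubernetes.executor.volumes.hostPath.deps.mount.path=/opt/deps \
>       --conf spark.kubernetes.executor.volumes.hostPath.deps.options.path=/data/deps \
>       local:///opt/deps/my-app.jar
>
> Note that this presumes someone has already staged /data/deps on every
> node that might host a pod, which is exactly the prep work burden
> described above.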
>
>
>
> Ideally this would all just be handled automatically for users, in the way
> that all other resource managers handle it.  The K8S backend even did this
> at one point in the downstream fork, but after a long discussion [2] it got
> dropped in favour of using the standard Spark mechanism i.e. spark-submit.
> Unfortunately this apparently was never followed through on, as it doesn’t
> work with master as of today.  Moreover, I am unclear how this would work
> for Spark on K8S in cluster mode, where the driver itself is inside a pod:
> the spark-submit mechanism copies files from the driver’s filesystem to the
> executors via a file server on the driver, and if the driver is inside a
> pod it won’t be able to see local files on the submission client.  I think
> this may work out of the box with client mode, but I haven’t dug into that
> enough to verify yet.
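>
> For reference, the client mode case I have in mind would be something
> along these lines (a sketch only; the host names and paths are
> placeholders), where the driver runs on the submission client and can
> therefore serve local files to the executors itself:
>
>     spark-submit \
>       --master k8s://https://<k8s-api-server>:6443 \
>       --deploy-mode client \
>       --conf spark.kubernetes.container.image=<approved-image> \
>       --conf spark.driver.host=<host-routable-from-the-executor-pods> \
>       --jars /local/path/to/extra-lib.jar \
>       /local/path/to/my-app.jar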
>
>
>
> I would like to start work on addressing this problem but, to be honest, I
> am unclear where to begin.  Using the standard spark-submit mechanism seems
> to be the way to go, but I’m not sure how to get around the driver pod
> issue.  I would appreciate any pointers from folks who’ve looked at this
> previously.
>
>
>
> Cheers,
>
>
>
> Rob
>
>
>
> [1] https://issues.apache.org/jira/browse/SPARK-23153
>
> [2] https://lists.apache.org/thread.html/82b4ae9a2eb5ddeb3f7240ebf154f06f19b830f8b3120038e5d687a1@%3Cdev.spark.apache.org%3E
>
