Hi Rob,

Interesting topic, and one that affects UX a lot. I have provided my thoughts in the related JIRA.
Best,
Stavros

On Fri, Oct 5, 2018 at 5:53 PM, Rob Vesse <rve...@dotnetrdf.org> wrote:

> Folks
>
> One of the big limitations of the current Spark on K8S implementation is
> that it isn't possible to use local dependencies (SPARK-23153 [1]), i.e.
> code, JARs, data, etc. that live only on the submission client. This
> basically leaves end users with several options for how to actually run
> their Spark jobs under K8S:
>
> 1. Store local dependencies on some external distributed file system,
>    e.g. HDFS
> 2. Build custom images with their local dependencies baked in
> 3. Mount local dependencies into volumes that are mounted by the K8S
>    pods
>
> In all cases the onus is on the end user to do the prep work. Option 1 is
> unfortunately rare in the environments where we're looking to deploy
> Spark, and Option 2 tends to be a non-starter as many of our customers
> whitelist approved images, i.e. custom images are not permitted.
>
> Option 3 is more workable but still requires users to provide a bunch of
> extra config options for simple cases, or to rely upon the pending pod
> template feature for complex cases.
>
> Ideally this would all just be handled automatically for users, the way
> all other resource managers do it. The K8S backend even did this at one
> point in the downstream fork, but after a long discussion [2] that
> approach was dropped in favour of using standard Spark mechanisms, i.e.
> spark-submit. Unfortunately this apparently was never followed through
> on, as it doesn't work with master as of today. Moreover, I am unclear
> how this would work in Spark on K8S cluster mode, where the driver itself
> is inside a pod: the spark-submit mechanism is based on copying from the
> driver's filesystem to the executors via a file server on the driver, and
> if the driver is inside a pod it won't be able to see local files on the
> submission client. I think this may work out of the box in client mode,
> but I haven't dug into that enough to verify yet.
>
> I would like to start work on addressing this problem, but to be honest I
> am unclear where to start. It seems using the standard spark-submit
> mechanism is the way to go, but I'm not sure how to get around the driver
> pod issue. I would appreciate any pointers from folks who've looked at
> this previously on how and where to start.
>
> Cheers,
>
> Rob
>
> [1] https://issues.apache.org/jira/browse/SPARK-23153
> [2] https://lists.apache.org/thread.html/82b4ae9a2eb5ddeb3f7240ebf154f06f19b830f8b3120038e5d687a1@%3Cdev.spark.apache.org%3E
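To make the options Rob lists concrete, here are some hedged sketches. All host
names, paths, class names, and image names below are placeholders, not anything
from the thread. Option 1 stages dependencies on an external file system so that
nothing needs to live on the submission client; the pods fetch everything by
remote URI:

    # Option 1 sketch: dependencies pre-uploaded to HDFS, referenced remotely.
    # <api-server-host>, <namenode>, and the image name are placeholders.
    spark-submit \
      --master k8s://https://<api-server-host>:6443 \
      --deploy-mode cluster \
      --class com.example.MyApp \
      --conf spark.kubernetes.container.image=<approved-spark-image> \
      --jars hdfs://<namenode>:8020/deps/helper.jar \
      hdfs://<namenode>:8020/apps/my-app.jar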
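Option 3 is where the "bunch of extra config options" comes in. A sketch using
the spark.kubernetes.*.volumes.* options (added for Spark 2.4); the volume name
"deps" and the paths are illustrative, and the host directory must already
contain the dependencies on every node that can schedule the pods:

    # Option 3 sketch: mount a pre-populated hostPath volume into both the
    # driver and executor pods, then reference the JAR with the local://
    # scheme, which tells Spark the file is already inside the container.
    spark-submit \
      --master k8s://https://<api-server-host>:6443 \
      --deploy-mode cluster \
      --class com.example.MyApp \
      --conf spark.kubernetes.container.image=<approved-spark-image> \
      --conf spark.kubernetes.driver.volumes.hostPath.deps.mount.path=/opt/deps \
      --conf spark.kubernetes.driver.volumes.hostPath.deps.options.path=/mnt/deps \
      --conf spark.kubernetes.executor.volumes.hostPath.deps.mount.path=/opt/deps \
      --conf spark.kubernetes.executor.volumes.hostPath.deps.options.path=/mnt/deps \
      local:///opt/deps/my-app.jar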
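And the client-mode case Rob suspects may work out of the box: with the driver
running on the submission client rather than in a pod, spark-submit's built-in
file server can serve local files to the executor pods. This assumes the
executors can route back to the driver (hence spark.driver.host); again, every
path and host name here is a placeholder:

    # Client-mode sketch: local jars/files are served to executors by the
    # driver's file server, since the driver can see the client filesystem.
    spark-submit \
      --master k8s://https://<api-server-host>:6443 \
      --deploy-mode client \
      --class com.example.MyApp \
      --conf spark.kubernetes.container.image=<approved-spark-image> \
      --conf spark.driver.host=<client-routable-address> \
      --jars ./lib/helper.jar \
      --files ./conf/app.properties \
      ./target/my-app.jar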