Hello, me again. I was playing some more with the Kubernetes backend and the whole init container thing seemed unnecessary to me.
Currently it's used to download remote jars and files, mount the volume into the driver / executor, and place those jars in the classpath / move the files to the working directory. This is all stuff that spark-submit already does without needing extra help.

So I spent some time hacking stuff and removing the init container code, and launching the driver inside Kubernetes using spark-submit (similar to how standalone and Mesos cluster modes work):

https://github.com/vanzin/spark/commit/k8s-no-init

I'd like to point out the output of "git show --stat" for that diff:

 29 files changed, 130 insertions(+), 1560 deletions(-)

You get massive code reuse by simply using spark-submit. The remote dependencies are downloaded in the driver, and the driver does the job of serving them to executors.

So I guess my question is: is there any advantage in using an init container?

The current init container code can download stuff in parallel, but that's an easy improvement to make in spark-submit, and one that would benefit everybody. You can argue that executors downloading from external servers would be faster than downloading from the driver, but I'm not sure I'd agree; it can go both ways.

Also, the same idea could probably be applied to starting executors; Mesos starts executors using "spark-class" already, so doing that would both improve code sharing and potentially simplify some code in the k8s backend.

--
Marcelo