Hello,

Me again. I was playing some more with the kubernetes backend and the
whole init container thing seemed unnecessary to me.

Currently it's used to download remote jars and files, mount the
volume into the driver / executor, and place those jars in the
classpath / move the files to the working directory. This is all stuff
that spark-submit already does without needing extra help.

So I spent some time hacking stuff and removing the init container
code, and launching the driver inside kubernetes using spark-submit
(similar to how standalone and mesos cluster mode works):

https://github.com/vanzin/spark/commit/k8s-no-init

I'd like to point out the output of "git show --stat" for that diff:
 29 files changed, 130 insertions(+), 1560 deletions(-)

You get massive code reuse by simply using spark-submit. The remote
dependencies are downloaded in the driver, and the driver does the job
of service them to executors.

So I guess my question is: is there any advantage in using an init container?

The current init container code can download stuff in parallel, but
that's an easy improvement to make in spark-submit and that would
benefit everybody. You can argue that executors downloading from
external servers would be faster than downloading from the driver, but
I'm not sure I'd agree - it can go both ways.

Also the same idea could probably be applied to starting executors;
Mesos starts executors using "spark-class" already, so doing that
would both improve code sharing and potentially simplify some code in
the k8s backend.

-- 
Marcelo

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org

Reply via email to