On Wed, Jan 10, 2018 at 3:00 PM, Anirudh Ramanathan
<ramanath...@google.com> wrote:
> We can start by getting a PR going perhaps, and start augmenting the
> integration testing to ensure that there are no surprises - with/without
> credentials, accessing GCS, S3 etc as well.
> When we get enough confidence and test coverage, let's merge this in.
> Does that sound like a reasonable path forward?

I think it's beneficial to separate this into two distinct things as
far as the discussion goes:

- using spark-submit: the code should definitely start the driver
using spark-submit, and potentially the executor using spark-class
(a rough sketch of what I mean is below).

- separately, we can decide on whether to keep or remove init containers.

Unfortunately, code-wise, those are not separate. If you get rid of
init containers, my current p.o.c. has most of the needed changes
(only lightly tested).
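
To make the first point concrete, here's a rough, untested sketch of
how the container commands could be assembled. The spark-submit /
spark-class invocations and the CoarseGrainedExecutorBackend flags are
standard Spark; every other name below (paths, main class, driver URL,
etc.) is an illustrative placeholder, not the actual k8s code:

    object LaunchCommandSketch {
      val sparkHome = "/opt/spark"

      // Driver pod: run the user app through spark-submit in client
      // mode so all of spark-submit's conf and dependency handling
      // applies inside the pod.
      def driverCommand(mainClass: String,
                        propsFile: String,
                        appResource: String): Seq[String] =
        Seq(
          s"$sparkHome/bin/spark-submit",
          "--deploy-mode", "client",
          "--class", mainClass,
          "--properties-file", propsFile,
          appResource)

      // Executor pod: go through spark-class instead of hand-building
      // the java command line, so the launcher's classpath and JVM
      // setup are reused.
      def executorCommand(driverUrl: String,
                          executorId: String,
                          hostname: String,
                          cores: Int,
                          appId: String): Seq[String] =
        Seq(
          s"$sparkHome/bin/spark-class",
          "org.apache.spark.executor.CoarseGrainedExecutorBackend",
          "--driver-url", driverUrl,
          "--executor-id", executorId,
          "--hostname", hostname,
          "--cores", cores.toString,
          "--app-id", appId)
    }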

But if you keep init containers, you'll need to mess with the
configuration so that spark-submit never sees spark.jars /
spark.files and therefore doesn't trigger its dependency download
code. (YARN does something similar, btw.) That will surely mean
different changes in the current k8s code (which I wanted to
double-check anyway, because I remember seeing some oddities related
to those configs in the logs).
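
A minimal sketch of that scrubbing, assuming made-up internal key
names (the spark.kubernetes.internal.* properties below are just for
illustration; the real code would pick its own):

    import org.apache.spark.SparkConf

    object DepScrubSketch {
      // Move spark.jars / spark.files out of the way before invoking
      // spark-submit, stashing them under internal keys that only the
      // init container would read.
      def stashRemoteDeps(userConf: SparkConf): SparkConf = {
        val conf = userConf.clone()
        Seq("spark.jars"  -> "spark.kubernetes.internal.jars",
            "spark.files" -> "spark.kubernetes.internal.files").foreach {
          case (publicKey, internalKey) =>
            conf.getOption(publicKey).foreach { value =>
              // With the public key gone, spark-submit's dependency
              // download code never fires for these URIs.
              conf.set(internalKey, value)
              conf.remove(publicKey)
            }
        }
        conf
      }
    }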

To comment on one point made by Andrew:
> there's almost a parallel here with spark.yarn.archive, where that configures 
> the cluster (YARN) to do distribution pre-runtime

That's more of a parallel to the docker image; spark.yarn.archive
points to a jar file with Spark jars in it so that YARN can make Spark
available to the driver / executors running in the cluster.

Like the docker image, you could include other stuff that is not
really part of standard Spark in that archive too, or even not have
Spark at all there, if you want things to just fail. :-)
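
For reference, spark.yarn.archive is normally set cluster-wide in
spark-defaults.conf (or via --conf at submit time); as a rough Scala
sketch of the equivalent setting, with a placeholder path:

    import org.apache.spark.SparkConf

    // Illustration only: the HDFS path is a placeholder. With this
    // set, YARN localizes the pre-built archive of Spark jars into
    // each container instead of uploading the Spark jars per app.
    val conf = new SparkConf()
      .set("spark.yarn.archive", "hdfs:///apps/spark/spark-libs.zip")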

-- 
Marcelo
