+1 on the first release being marked experimental. Many major features coming into Spark in the past have gone through a similar stabilization process.
On Fri, Jan 12, 2018 at 1:18 PM, Marcelo Vanzin <van...@cloudera.com> wrote:
> BTW I most probably will not have time to get back to this at any time
> soon, so if anyone is interested in doing some clean up, I'll leave my
> branch up.
>
> I'm seriously thinking about proposing that we document the k8s
> backend as experimental in 2.3; it seems there is still a lot to be
> cleaned up in terms of user interface (as in extensibility and
> customizability), documentation, and mainly testing, and we're pretty
> far into the 2.3 cycle for all of those to be sorted out.
>
> On Thu, Jan 11, 2018 at 8:19 AM, Anirudh Ramanathan
> <ramanath...@google.com> wrote:
> > If we can separate those concerns out, that might make sense in the
> > short term IMO.
> > There are several benefits to reusing spark-submit and spark-class,
> > as you pointed out previously,
> > so we should be looking to leverage those irrespective of how we do
> > dependency management -
> > in the interest of conformance with the other cluster managers.
> >
> > I like the idea of passing arguments through in a way that doesn't
> > trigger the dependency management code for now.
> > In the interest of time for 2.3, if we could target just that (and
> > revisit the init containers afterwards),
> > there should be enough time to make the change, test, and release
> > with confidence.
> >
> > On Wed, Jan 10, 2018 at 3:45 PM, Marcelo Vanzin <van...@cloudera.com> wrote:
> >>
> >> On Wed, Jan 10, 2018 at 3:00 PM, Anirudh Ramanathan
> >> <ramanath...@google.com> wrote:
> >> > We can start by getting a PR going perhaps, and start augmenting the
> >> > integration testing to ensure that there are no surprises -
> >> > with/without credentials, accessing GCS, S3 etc. as well.
> >> > When we get enough confidence and test coverage, let's merge this in.
> >> > Does that sound like a reasonable path forward?
> >> > >> I think it's beneficial to separate this into two separate things as > >> far as discussion goes: > >> > >> - using spark-submit: the code should definitely be starting the > >> driver using spark-submit, and potentially the executor using > >> spark-class. > >> > >> - separately, we can decide on whether to keep or remove init > containers. > >> > >> Unfortunately, code-wise, those are not separate. If you get rid of > >> init containers, my current p.o.c. has most of the needed changes > >> (only lightly tested). > >> > >> But if you keep init containers, you'll need to mess with the > >> configuration so that spark-submit never sees spark.jars / > >> spark.files, so it doesn't trigger its dependency download code. (YARN > >> does something similar, btw.) That will surely mean different changes > >> in the current k8s code (which I wanted to double check anyway because > >> I remember seeing some oddities related to those configs in the logs). > >> > >> To comment on one point made by Andrew: > >> > there's almost a parallel here with spark.yarn.archive, where that > >> > configures the cluster (YARN) to do distribution pre-runtime > >> > >> That's more of a parallel to the docker image; spark.yarn.archive > >> points to a jar file with Spark jars in it so that YARN can make Spark > >> available to the driver / executors running in the cluster. > >> > >> Like the docker image, you could include other stuff that is not > >> really part of standard Spark in that archive too, or even not have > >> Spark at all there, if you want things to just fail. :-) > >> > >> -- > >> Marcelo > > > > > > > > > > -- > > Anirudh Ramanathan > > > > -- > Marcelo > > --------------------------------------------------------------------- > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org > >