+1 on the first release being marked experimental. Many major features coming into Spark in the past have gone through a similar stabilization process.
On Fri, Jan 12, 2018 at 1:18 PM, Marcelo Vanzin <van...@cloudera.com> wrote:
> BTW I most probably will not have time to get back to this at any time
> soon, so if anyone is interested in doing some clean up, I'll leave my
> branch up.
>
> I'm seriously thinking about proposing that we document the k8s
> backend as experimental in 2.3; it seems there is still a lot to be
> cleaned up in terms of user interface (as in extensibility and
> customizability), documentation, and mainly testing, and we're pretty
> far into the 2.3 cycle for all of those to be sorted out.
>
> On Thu, Jan 11, 2018 at 8:19 AM, Anirudh Ramanathan
> <ramanath...@google.com> wrote:
> > If we can separate those concerns out, that might make sense in the
> > short term IMO.
> > There are several benefits to reusing spark-submit and spark-class,
> > as you pointed out previously,
> > so we should be looking to leverage those irrespective of how we do
> > dependency management -
> > in the interest of conformance with the other cluster managers.
> >
> > I like the idea of passing arguments through in a way that doesn't
> > trigger the dependency management code for now.
> > In the interest of time for 2.3, if we could target just that (and
> > revisit the init containers afterwards),
> > there should be enough time to make the change, test, and release
> > with confidence.
> >
> > On Wed, Jan 10, 2018 at 3:45 PM, Marcelo Vanzin <van...@cloudera.com> wrote:
> >>
> >> On Wed, Jan 10, 2018 at 3:00 PM, Anirudh Ramanathan
> >> <ramanath...@google.com> wrote:
> >> > We can start by getting a PR going perhaps, and start augmenting the
> >> > integration testing to ensure that there are no surprises -
> >> > with/without credentials, accessing GCS, S3 etc. as well.
> >> > When we get enough confidence and test coverage, let's merge this in.
> >> > Does that sound like a reasonable path forward?
> >> > >> I think it's beneficial to separate this into two separate things as > >> far as discussion goes: > >> > >> - using spark-submit: the code should definitely be starting the > >> driver using spark-submit, and potentially the executor using > >> spark-class. > >> > >> - separately, we can decide on whether to keep or remove init > containers. > >> > >> Unfortunately, code-wise, those are not separate. If you get rid of > >> init containers, my current p.o.c. has most of the needed changes > >> (only lightly tested). > >> > >> But if you keep init containers, you'll need to mess with the > >> configuration so that spark-submit never sees spark.jars / > >> spark.files, so it doesn't trigger its dependency download code. (YARN > >> does something similar, btw.) That will surely mean different changes > >> in the current k8s code (which I wanted to double check anyway because > >> I remember seeing some oddities related to those configs in the logs). > >> > >> To comment on one point made by Andrew: > >> > there's almost a parallel here with spark.yarn.archive, where that > >> > configures the cluster (YARN) to do distribution pre-runtime > >> > >> That's more of a parallel to the docker image; spark.yarn.archive > >> points to a jar file with Spark jars in it so that YARN can make Spark > >> available to the driver / executors running in the cluster. > >> > >> Like the docker image, you could include other stuff that is not > >> really part of standard Spark in that archive too, or even not have > >> Spark at all there, if you want things to just fail. :-) > >> > >> -- > >> Marcelo > > > > > > > > > > -- > > Anirudh Ramanathan > > > > -- > Marcelo > > --------------------------------------------------------------------- > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org > >