To respond to Nick's original suggestion: I always thought it would be useful to have a Docker image on which we run the tests and build releases, so that we could have a consistent environment that other packagers, or people trying to exhaustively run Spark's tests, could replicate (or at least look at) to understand exactly how we recommend building Spark. Sean, do you think that is too much overhead?
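A minimal sketch of the kind of image that could encode; the base image, JDK, Maven version, and build profiles below are illustrative assumptions, not a project recommendation:

    # Pins one "recommended" Spark build/test environment in code (illustrative).
    FROM centos:6

    # Toolchain for building Spark and running its tests.
    RUN yum install -y java-1.7.0-openjdk-devel python git curl tar which \
        && yum clean all

    # A pinned Maven release (not available in the CentOS 6 repos).
    RUN curl -fsSL https://archive.apache.org/dist/maven/maven-3/3.2.5/binaries/apache-maven-3.2.5-bin.tar.gz \
        | tar xz -C /opt
    ENV PATH /opt/apache-maven-3.2.5/bin:$PATH
    ENV MAVEN_OPTS -Xmx2g -XX:MaxPermSize=512m

    # Build (or test) whatever Spark checkout is mounted at /spark.
    WORKDIR /spark
    CMD ["mvn", "-Pyarn", "-Phadoop-2.4", "-DskipTests", "clean", "package"]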
In terms of providing images that we encourage as standard deployment images of Spark and want to make portable across environments, that's a much larger project, and one with higher associated maintenance overhead. So I'd be interested in seeing that evolve as its own project (spark-deploy) or something associated with Bigtop, etc.

- Patrick

On Tue, Jan 20, 2015 at 10:30 PM, Paolo Platter <paolo.plat...@agilelab.it> wrote:
> Hi all,
> I also tried the Docker way and it works well. I suggest looking at the
> sequenceiq/spark Docker images; they are very active in that field.
>
> Paolo
>
> Sent from my Windows Phone
> ________________________________
> From: jay vyas<mailto:jayunit100.apa...@gmail.com>
> Sent: 21/01/2015 04:45
> To: Nicholas Chammas<mailto:nicholas.cham...@gmail.com>
> Cc: Will Benton<mailto:wi...@redhat.com>; Spark dev list<mailto:dev@spark.apache.org>
> Subject: Re: Standardized Spark dev environment
>
> I can comment on both... hi Will and Nate :)
>
> 1) Will's Dockerfile solution is the simplest, most direct solution to the
> dev environment question: it's an efficient way to build and develop Spark
> environments for dev/test. It would be cool to put that Dockerfile (and/or
> maybe a shell script which uses it) in the top level of Spark as the build
> entry point. For total platform portability, you could wrap it in a
> Vagrantfile to launch a lightweight VM, so that Windows worked equally well.
>
> 2) However, since Nate mentioned Vagrant and Bigtop, I have to chime in :)
> The Vagrant recipes in Bigtop are a nice reference deployment showing how
> to deploy Spark in a heterogeneous Hadoop-style environment, and tighter
> integration testing with Bigtop for Spark releases would be lovely! The
> Vagrant recipes use Puppet to deploy an n-node VM- or Docker-based cluster,
> in which users can easily select components (including Spark, YARN, HBase,
> Hadoop, etc.) by simply editing a YAML file:
> https://github.com/apache/bigtop/blob/master/bigtop-deploy/vm/vagrant-puppet/vagrantconfig.yaml
> As Nate said, it would be a lot of fun to get more cross-collaboration
> between the Spark and Bigtop communities. Input on how we can better
> integrate Spark (whether it's Spork, HBase integration, smoke tests around
> the MLlib stuff, or whatever) is always welcome.
>
> On Tue, Jan 20, 2015 at 10:21 PM, Nicholas Chammas
> <nicholas.cham...@gmail.com> wrote:
>
>> How many profiles (Hadoop / Hive / Scala) would this development
>> environment support?
>>
>> As many as we want. We probably want to cover a good chunk of the build
>> matrix <https://issues.apache.org/jira/browse/SPARK-2004> that Spark
>> officially supports.
>>
>> What does this provide, concretely?
>>
>> It provides a reliable way to create a "good" Spark development
>> environment. Roughly speaking, this should probably mean an environment
>> that matches Jenkins, since that's where we run "official" testing and
>> builds.
>>
>> For example, Spark has to run on Java 6 and Python 2.6. When devs build
>> and run Spark locally, we can make sure they're doing it on these
>> versions of the languages with a simple vagrant up.
>>
>> Nate, could you comment on how something like this would relate to the
>> Bigtop effort?
>>
>> http://chapeau.freevariable.com/2014/08/jvm-test-docker.html
>>
>> Will, that's pretty sweet. I tried something similar a few months ago as
>> an experiment to try building/testing Spark within a container. Here's
>> the shell script I used
>> <https://gist.github.com/nchammas/60b04141f3b9f053faaa> against the base
>> CentOS Docker image to set up an environment ready to build and test
>> Spark.
>>
>> We want to run Spark unit tests within containers on Jenkins, so it
>> might make sense to develop a single Docker image that can be used as
>> both a "dev environment" and an execution container on Jenkins.
>>
>> Perhaps that's the approach to take instead of looking into Vagrant.
>>
>> Nick
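Concretely, the dual-purpose image Nick describes would be used along these lines (a sketch; the image name spark-dev is an assumption, while dev/run-tests is Spark's existing test entry point):

    # 1) Dev environment: interactive shell with the local checkout mounted in.
    docker run -it --rm -v "$PWD":/spark spark-dev bash

    # 2) Jenkins execution container: the same image runs the test suite
    #    non-interactively, so the dev and CI environments can't drift apart.
    docker run --rm -v "$PWD":/spark spark-dev ./dev/run-tests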
>> On Tue Jan 20 2015 at 8:22:41 PM Will Benton <wi...@redhat.com> wrote:
>>
>> > Hey Nick,
>> >
>> > I did something similar with a Docker image last summer; I haven't
>> > updated the images to cache the dependencies for the current Spark
>> > master, but it would be trivial to do so:
>> >
>> > http://chapeau.freevariable.com/2014/08/jvm-test-docker.html
>> >
>> > best,
>> > wb
>> >
>> > ----- Original Message -----
>> > > From: "Nicholas Chammas" <nicholas.cham...@gmail.com>
>> > > To: "Spark dev list" <dev@spark.apache.org>
>> > > Sent: Tuesday, January 20, 2015 6:13:31 PM
>> > > Subject: Standardized Spark dev environment
>> > >
>> > > What do y'all think of creating a standardized Spark development
>> > > environment, perhaps encoded as a Vagrantfile, and publishing it
>> > > under `dev/`?
>> > >
>> > > The goal would be to make it easier for new developers to get
>> > > started with all the right configs and tools pre-installed.
>> > >
>> > > If we use something like Vagrant, we may even be able to make it so
>> > > that a single Vagrantfile creates equivalent development
>> > > environments across OS X, Linux, and Windows, without having to do
>> > > much (or any) OS-specific work.
>> > >
>> > > I imagine for committers and regular contributors, this exercise may
>> > > seem pointless, since y'all are probably already very comfortable
>> > > with your workflow.
>> > >
>> > > I wonder, though, if any of you think this would be worthwhile as an
>> > > improvement to the "new Spark developer" experience.
>> > >
>> > > Nick
>
> --
> jay vyas
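To give the proposed dev/Vagrantfile some shape, here is a minimal sketch; the box, memory size, and provisioning packages are all assumptions rather than a tested configuration:

    # Sketch only: one Vagrantfile giving the same dev VM on OS X, Linux, and Windows.
    Vagrant.configure("2") do |config|
      config.vm.box = "ubuntu/trusty64"

      config.vm.provider "virtualbox" do |vb|
        vb.memory = 4096   # Spark builds are memory-hungry
      end

      # Share the Spark checkout with the guest.
      config.vm.synced_folder ".", "/spark"

      # Install the toolchain with a plain shell provisioner to keep it transparent.
      config.vm.provision "shell", inline: <<-SHELL
        apt-get update
        apt-get install -y openjdk-7-jdk maven git python2.7
      SHELL
    end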