To respond to Nick's original suggestion: I always thought it would be useful to have a Docker image on which we run the tests and build releases, so that we could have a consistent environment that other packagers, or people trying to exhaustively run Spark's tests, could replicate (or at least look at) to understand exactly how we recommend building Spark. Sean, do you think that is too much overhead?
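A minimal sketch of the kind of image that could encode; the base image, JDK, Maven version, and build profiles below are illustrative assumptions, not a project recommendation:

    # Pins one "recommended" Spark build/test environment in code (illustrative).
    FROM centos:6

    # Toolchain for building Spark and running its tests.
    RUN yum install -y java-1.7.0-openjdk-devel python git curl tar which \
        && yum clean all

    # A pinned Maven release (not available in the CentOS 6 repos).
    RUN curl -fsSL https://archive.apache.org/dist/maven/maven-3/3.2.5/binaries/apache-maven-3.2.5-bin.tar.gz \
        | tar xz -C /opt
    ENV PATH /opt/apache-maven-3.2.5/bin:$PATH
    ENV MAVEN_OPTS -Xmx2g -XX:MaxPermSize=512m

    # Build (or test) whatever Spark checkout is mounted at /spark.
    WORKDIR /spark
    CMD ["mvn", "-Pyarn", "-Phadoop-2.4", "-DskipTests", "clean", "package"]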
In terms of providing images that we encourage as standard deployment images of Spark and want to make portable across environments, that's a much larger project, and one with higher associated maintenance overhead. So I'd be interested in seeing that evolve as its own project (spark-deploy) or something associated with Bigtop, etc.

- Patrick

On Tue, Jan 20, 2015 at 10:30 PM, Paolo Platter <paolo.plat...@agilelab.it> wrote:
> Hi all,
> I also tried the Docker way and it works well. I suggest looking at the
> sequenceiq/spark Docker images; they are very active in that field.
>
> Paolo
>
> Sent from my Windows Phone
> ________________________________
> From: jay vyas<mailto:jayunit100.apa...@gmail.com>
> Sent: 21/01/2015 04:45
> To: Nicholas Chammas<mailto:nicholas.cham...@gmail.com>
> Cc: Will Benton<mailto:wi...@redhat.com>; Spark dev list<mailto:dev@spark.apache.org>
> Subject: Re: Standardized Spark dev environment
>
> I can comment on both... hi Will and Nate :)
>
> 1) Will's Dockerfile solution is the simplest, most direct solution to the
> dev environment question: it's an efficient way to build and develop Spark
> environments for dev/test. It would be cool to put that Dockerfile (and/or
> maybe a shell script which uses it) in the top level of Spark as the build
> entry point. For total platform portability, you could wrap it in a
> Vagrantfile to launch a lightweight VM, so that Windows worked equally well.
>
> 2) However, since Nate mentioned Vagrant and Bigtop, I have to chime in :)
> The Vagrant recipes in Bigtop are a nice reference deployment showing how
> to deploy Spark in a heterogeneous Hadoop-style environment, and tighter
> integration testing with Bigtop for Spark releases would be lovely! The
> Vagrant recipes use Puppet to deploy an n-node VM- or Docker-based cluster,
> in which users can easily select components (including Spark, YARN, HBase,
> Hadoop, etc.) by simply editing a YAML file:
> https://github.com/apache/bigtop/blob/master/bigtop-deploy/vm/vagrant-puppet/vagrantconfig.yaml
> As Nate said, it would be a lot of fun to get more cross-collaboration
> between the Spark and Bigtop communities. Input on how we can better
> integrate Spark (whether it's Spork, HBase integration, smoke tests around
> the MLlib stuff, or whatever) is always welcome.
>
> On Tue, Jan 20, 2015 at 10:21 PM, Nicholas Chammas
> <nicholas.cham...@gmail.com> wrote:
>
>> How many profiles (Hadoop / Hive / Scala) would this development
>> environment support?
>>
>> As many as we want. We probably want to cover a good chunk of the build
>> matrix <https://issues.apache.org/jira/browse/SPARK-2004> that Spark
>> officially supports.
>>
>> What does this provide, concretely?
>>
>> It provides a reliable way to create a "good" Spark development
>> environment. Roughly speaking, this should probably mean an environment
>> that matches Jenkins, since that's where we run "official" testing and
>> builds.
>>
>> For example, Spark has to run on Java 6 and Python 2.6. When devs build
>> and run Spark locally, we can make sure they're doing it on these
>> versions of the languages with a simple vagrant up.
>>
>> Nate, could you comment on how something like this would relate to the
>> Bigtop effort?
>>
>> http://chapeau.freevariable.com/2014/08/jvm-test-docker.html
>>
>> Will, that's pretty sweet. I tried something similar a few months ago as
>> an experiment to try building/testing Spark within a container. Here's
>> the shell script I used
>> <https://gist.github.com/nchammas/60b04141f3b9f053faaa> against the base
>> CentOS Docker image to set up an environment ready to build and test
>> Spark.
>>
>> We want to run Spark unit tests within containers on Jenkins, so it
>> might make sense to develop a single Docker image that can be used as
>> both a "dev environment" and an execution container on Jenkins.
>>
>> Perhaps that's the approach to take instead of looking into Vagrant.
>>
>> Nick
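Concretely, the dual-purpose image Nick describes would be used along these lines (a sketch; the image name spark-dev is an assumption, while dev/run-tests is Spark's existing test entry point):

    # 1) Dev environment: interactive shell with the local checkout mounted in.
    docker run -it --rm -v "$PWD":/spark spark-dev bash

    # 2) Jenkins execution container: the same image runs the test suite
    #    non-interactively, so the dev and CI environments can't drift apart.
    docker run --rm -v "$PWD":/spark spark-dev ./dev/run-tests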
>> On Tue Jan 20 2015 at 8:22:41 PM Will Benton <wi...@redhat.com> wrote:
>>
>> > Hey Nick,
>> >
>> > I did something similar with a Docker image last summer; I haven't
>> > updated the images to cache the dependencies for the current Spark
>> > master, but it would be trivial to do so:
>> >
>> > http://chapeau.freevariable.com/2014/08/jvm-test-docker.html
>> >
>> > best,
>> > wb
>> >
>> > ----- Original Message -----
>> > > From: "Nicholas Chammas" <nicholas.cham...@gmail.com>
>> > > To: "Spark dev list" <dev@spark.apache.org>
>> > > Sent: Tuesday, January 20, 2015 6:13:31 PM
>> > > Subject: Standardized Spark dev environment
>> > >
>> > > What do y'all think of creating a standardized Spark development
>> > > environment, perhaps encoded as a Vagrantfile, and publishing it
>> > > under `dev/`?
>> > >
>> > > The goal would be to make it easier for new developers to get
>> > > started with all the right configs and tools pre-installed.
>> > >
>> > > If we use something like Vagrant, we may even be able to make it so
>> > > that a single Vagrantfile creates equivalent development
>> > > environments across OS X, Linux, and Windows, without having to do
>> > > much (or any) OS-specific work.
>> > >
>> > > I imagine for committers and regular contributors, this exercise may
>> > > seem pointless, since y'all are probably already very comfortable
>> > > with your workflow.
>> > >
>> > > I wonder, though, if any of you think this would be worthwhile as an
>> > > improvement to the "new Spark developer" experience.
>> > >
>> > > Nick
>
> --
> jay vyas
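To give the proposed dev/Vagrantfile some shape, here is a minimal sketch; the box, memory size, and provisioning packages are all assumptions rather than a tested configuration:

    # Sketch only: one Vagrantfile giving the same dev VM on OS X, Linux, and Windows.
    Vagrant.configure("2") do |config|
      config.vm.box = "ubuntu/trusty64"

      config.vm.provider "virtualbox" do |vb|
        vb.memory = 4096   # Spark builds are memory-hungry
      end

      # Share the Spark checkout with the guest.
      config.vm.synced_folder ".", "/spark"

      # Install the toolchain with a plain shell provisioner to keep it transparent.
      config.vm.provision "shell", inline: <<-SHELL
        apt-get update
        apt-get install -y openjdk-7-jdk maven git python2.7
      SHELL
    end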