Re: Standardized Spark dev environment

2015-01-21 Thread Patrick Wendell
Yep,

I think it's only useful (and likely to be maintained) if we actually
use this on Jenkins. So that was my proposal: basically, give people a
Dockerfile so they can see exactly what versions of everything we use
for our reference build. And if they don't want to use Docker
directly, it will at least serve as an up-to-date list of
packages/versions they should try to install locally in whatever
environment they have.

- Patrick

On Wed, Jan 21, 2015 at 5:42 AM, Will Benton  wrote:
> - Original Message -
>> From: "Patrick Wendell" 
>> To: "Sean Owen" 
>> Cc: "dev" , "jay vyas" , 
>> "Paolo Platter"
>> , "Nicholas Chammas" 
>> , "Will Benton" 
>> Sent: Wednesday, January 21, 2015 2:09:35 AM
>> Subject: Re: Standardized Spark dev environment
>
>> But the issue is when users can't reproduce Jenkins failures.
>
> Yeah, to answer Sean's question, this was part of the problem I was trying to
> solve.  The other part was teasing out differences between the Fedora Java
> environment and a more conventional Java environment.  I agree with Sean (and
> I think this is your suggestion as well, Patrick) that making the environment
> Jenkins runs in a standard image that is available for public consumption
> would be useful in general.
>
>
>
> best,
> wb




Re: Standardized Spark dev environment

2015-01-21 Thread Will Benton
- Original Message -
> From: "Patrick Wendell" 
> To: "Sean Owen" 
> Cc: "dev" , "jay vyas" , 
> "Paolo Platter"
> , "Nicholas Chammas" , 
> "Will Benton" 
> Sent: Wednesday, January 21, 2015 2:09:35 AM
> Subject: Re: Standardized Spark dev environment

> But the issue is when users can't reproduce Jenkins failures.

Yeah, to answer Sean's question, this was part of the problem I was trying to
solve.  The other part was teasing out differences between the Fedora Java
environment and a more conventional Java environment.  I agree with Sean (and I
think this is your suggestion as well, Patrick) that making the environment
Jenkins runs in a standard image that is available for public consumption would
be useful in general.



best,
wb




Re: Standardized Spark dev environment

2015-01-21 Thread Sean Owen
Sure, can Jenkins use this new image too? If not, then it doesn't help with
reproducing a Jenkins failure, most of which even Jenkins can't reproduce.
But if it does, and the image can be used for builds, then it does seem to
reduce rather than increase the number of environment configurations, which
is good.

That's different from developer setup. Surely that is a large number of
permutations to maintain? Windows, Linux, and OS X at least. Whereas I have
not needed, and probably would not want, a whole second toolchain on my
machine for Spark... for me it doesn't solve a problem. So I'm just wondering
how many people this will help as devs versus the apparently large
maintenance overhead.

Although if this could replace the scripts that try to fetch sbt and mvn et
al., that alone could save enough complexity to make it worthwhile. Would it
do that?
On Jan 21, 2015 9:09 AM, "Patrick Wendell"  wrote:

> > If the goal is a reproducible test environment then I think that is what
> > Jenkins is. Granted you can only ask it for a test. But presumably you
> get
> > the same result if you start from the same VM image as Jenkins and run
> the
> > same steps.
>
> But the issue is when users can't reproduce Jenkins failures. We don't
> publish anywhere the exact set of packages and versions installed on
> Jenkins. And it can change, since it's shared infrastructure with other
> projects. So why not publish this manifest as a Dockerfile and then have
> it run on Jenkins using that image? My point is that this "VM image +
> steps" is not public anywhere.
>
> > I bet it is not hard to set up and maintain. I bet it is easier than a
> VM.
> > But unless Jenkins is using it aren't we just making another different
> > standard build env in an effort to standardize? If it is not the same
> then
> > it loses value as being exactly the same as the reference build env. Has
> a
> > problem come up that this solves?
>
> Right now the reference build env is an AMI I created and keep adding
> stuff to when Spark gets new dependencies (e.g. the version of Ruby we
> need to create the docs, new Python stats libraries, etc.). So if we
> had a Docker image, then I would use that for making the RCs as well,
> and it could serve as a definitive reference for people who want to
> understand exactly what set of things they need to build Spark.
>
> >
> > If the goal is just easing developer set up then what does a Docker
> image do
> > - what does it set up for me? I don't know of stuff I need set up on OS X
> > for me beyond the IDE.
>
> There are actually a good number of packages you need to do a full
> build of Spark, including compliant Python and Java versions, certain
> Python packages, and Ruby and Jekyll for the docs (as mentioned a bit
> earlier).
>
> - Patrick
>


Re: Standardized Spark dev environment

2015-01-21 Thread Patrick Wendell
> If the goal is a reproducible test environment then I think that is what
> Jenkins is. Granted you can only ask it for a test. But presumably you get
> the same result if you start from the same VM image as Jenkins and run the
> same steps.

But the issue is when users can't reproduce Jenkins failures. We don't
publish anywhere the exact set of packages and versions installed on
Jenkins. And it can change, since it's shared infrastructure with other
projects. So why not publish this manifest as a Dockerfile and then have
it run on Jenkins using that image? My point is that this "VM image +
steps" is not public anywhere.

> I bet it is not hard to set up and maintain. I bet it is easier than a VM.
> But unless Jenkins is using it aren't we just making another different
> standard build env in an effort to standardize? If it is not the same then
> it loses value as being exactly the same as the reference build env. Has a
> problem come up that this solves?

Right now the reference build env is an AMI I created and keep adding
stuff to when Spark gets new dependencies (e.g. the version of Ruby we
need to create the docs, new Python stats libraries, etc.). So if we
had a Docker image, then I would use that for making the RCs as well,
and it could serve as a definitive reference for people who want to
understand exactly what set of things they need to build Spark.

>
> If the goal is just easing developer set up then what does a Docker image do
> - what does it set up for me? I don't know of stuff I need set up on OS X
> for me beyond the IDE.

There are actually a good number of packages you need to do a full
build of Spark, including compliant Python and Java versions, certain
Python packages, and Ruby and Jekyll for the docs (as mentioned a bit
earlier).

- Patrick
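
For illustration, a minimal sketch of the kind of Dockerfile described here.
The base image, package names, and versions are assumptions chosen for the
example; the real manifest would be whatever the reference build environment
(the Jenkins AMI mentioned above) actually contains.

FROM ubuntu:14.04

# JDK, build tooling, and the Python/Ruby pieces a full Spark build needs.
# Exact packages and versions are illustrative, not the actual Jenkins setup.
RUN apt-get update && apt-get install -y \
    openjdk-7-jdk \
    maven \
    git \
    curl \
    python2.7 \
    python-numpy \
    ruby \
    ruby-dev

ENV JAVA_HOME /usr/lib/jvm/java-7-openjdk-amd64

# Doc tooling (jekyll and friends) and any extra Python libraries would be
# pinned here as well, once the versions used on Jenkins are known.
WORKDIR /opt/spark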




Re: Standardized Spark dev environment

2015-01-20 Thread Sean Owen
If the goal is a reproducible test environment then I think that is what
Jenkins is. Granted you can only ask it for a test. But presumably you get
the same result if you start from the same VM image as Jenkins and run the
same steps.

I bet it is not hard to set up and maintain. I bet it is easier than a VM.
But unless Jenkins is using it aren't we just making another different
standard build env in an effort to standardize? If it is not the same then
it loses value as being exactly the same as the reference build env. Has a
problem come up that this solves?

If the goal is just easing developer set up then what does a Docker image
do - what does it set up for me? I don't know of stuff I need set up on OS
X for me beyond the IDE.
On Jan 21, 2015 7:30 AM, "Patrick Wendell"  wrote:

> To respond to the original suggestion by Nick. I always thought it
> would be useful to have a Docker image on which we run the tests and
> build releases, so that we could have a consistent environment that
> other packagers or people trying to exhaustively run Spark tests could
> replicate (or at least look at) to understand exactly how we recommend
> building Spark. Sean - do you think that is too high of overhead?
>
> In terms of providing images that we encourage as standard deployment
> images of Spark and want to make portable across environments, that's
> a much larger project and one with higher associated maintenance
> overhead. So I'd be interested in seeing that evolve as its own
> project (spark-deploy) or something associated with bigtop, etc.
>
> - Patrick
>
> On Tue, Jan 20, 2015 at 10:30 PM, Paolo Platter
>  wrote:
> > Hi all,
> > I also tried the Docker way and it works well.
> > I suggest looking at the sequenceiq/spark Docker images; they are very
> > active in that field.
> >
> > Paolo
> >
> > Sent from my Windows Phone
> > 
> > From: jay vyas<mailto:jayunit100.apa...@gmail.com>
> > Sent: 21/01/2015 04:45
> > To: Nicholas Chammas<mailto:nicholas.cham...@gmail.com>
> > Cc: Will Benton<mailto:wi...@redhat.com>; Spark dev list<mailto:dev@spark.apache.org>
> > Subject: Re: Standardized Spark dev environment
> >
> > I can comment on both... hi Will and Nate :)
> >
> > 1) Will's Dockerfile solution is the simplest, most direct solution to
> > the dev environment question: it's an efficient way to build and develop
> > Spark environments for dev/test. It would be cool to put that Dockerfile
> > (and/or maybe a shell script which uses it) in the top level of Spark as
> > the build entry point. For total platform portability, you could wrap it
> > in a Vagrantfile to launch a lightweight VM, so that Windows worked
> > equally well.
> >
> > 2) However, since Nate mentioned Vagrant and Bigtop, I have to chime in
> > :) The Vagrant recipes in Bigtop are a nice reference deployment of how
> > to deploy Spark in a heterogeneous Hadoop-style environment, and tighter
> > integration testing with Bigtop for Spark releases would be lovely! The
> > Vagrant stuff uses Puppet to deploy an n-node VM- or Docker-based
> > cluster, in which users can easily select components (including Spark,
> > YARN, HBase, Hadoop, etc.) by simply editing a YAML file:
> > https://github.com/apache/bigtop/blob/master/bigtop-deploy/vm/vagrant-puppet/vagrantconfig.yaml
> > As Nate said, it would be a lot of fun to get more cross-collaboration
> > between the Spark and Bigtop communities. Input on how we can better
> > integrate Spark (whether it's Spork, HBase integration, smoke tests
> > around the MLlib stuff, or whatever) is always welcome.
> >
> >
> >
> >
> >
> >
> > On Tue, Jan 20, 2015 at 10:21 PM, Nicholas Chammas <
> > nicholas.cham...@gmail.com> wrote:
> >
> >> How many profiles (hadoop / hive /scala) would this development
> environment
> >> support ?
> >>
> >> As many as we want. We probably want to cover a good chunk of the build
> >> matrix <https://issues.apache.org/jira/browse/SPARK-2004> that Spark
> >> officially supports.
> >>
> >> What does this provide, concretely?
> >>
> >> It provides a reliable way to create a "good" Spark development
> >> environment. Roughly speaking, this probably should mean an environment
> >> that matches Jenkins, since that's where we run "official" testing and
> >> builds.
> >>
> >> For example, Spark has to run on Java 6 and Python 2.6. When devs build
> and
> >> run Spark locally, we can make sure t

Re: Standardized Spark dev environment

2015-01-20 Thread Patrick Wendell
To respond to Nick's original suggestion: I always thought it would be
useful to have a Docker image on which we run the tests and build
releases, so that we could have a consistent environment that other
packagers or people trying to exhaustively run Spark tests could
replicate (or at least look at) to understand exactly how we recommend
building Spark. Sean - do you think that is too much overhead?

In terms of providing images that we encourage as standard deployment
images of Spark and want to make portable across environments, that's
a much larger project and one with higher associated maintenance
overhead. So I'd be interested in seeing that evolve as its own
project (spark-deploy) or something associated with bigtop, etc.

- Patrick

On Tue, Jan 20, 2015 at 10:30 PM, Paolo Platter
 wrote:
> Hi all,
> I also tried the Docker way and it works well.
> I suggest looking at the sequenceiq/spark Docker images; they are very
> active in that field.
>
> Paolo
>
> Sent from my Windows Phone
> 
> From: jay vyas<mailto:jayunit100.apa...@gmail.com>
> Sent: 21/01/2015 04:45
> To: Nicholas Chammas<mailto:nicholas.cham...@gmail.com>
> Cc: Will Benton<mailto:wi...@redhat.com>; Spark dev 
> list<mailto:dev@spark.apache.org>
> Subject: Re: Standardized Spark dev environment
>
> I can comment on both... hi Will and Nate :)
>
> 1) Will's Dockerfile solution is the simplest, most direct solution to the
> dev environment question: it's an efficient way to build and develop Spark
> environments for dev/test. It would be cool to put that Dockerfile (and/or
> maybe a shell script which uses it) in the top level of Spark as the build
> entry point. For total platform portability, you could wrap it in a
> Vagrantfile to launch a lightweight VM, so that Windows worked equally
> well.
>
> 2) However, since Nate mentioned Vagrant and Bigtop, I have to chime in :)
> The Vagrant recipes in Bigtop are a nice reference deployment of how to
> deploy Spark in a heterogeneous Hadoop-style environment, and tighter
> integration testing with Bigtop for Spark releases would be lovely! The
> Vagrant stuff uses Puppet to deploy an n-node VM- or Docker-based cluster,
> in which users can easily select components (including Spark, YARN, HBase,
> Hadoop, etc.) by simply editing a YAML file:
> https://github.com/apache/bigtop/blob/master/bigtop-deploy/vm/vagrant-puppet/vagrantconfig.yaml
> As Nate said, it would be a lot of fun to get more cross-collaboration
> between the Spark and Bigtop communities. Input on how we can better
> integrate Spark (whether it's Spork, HBase integration, smoke tests around
> the MLlib stuff, or whatever) is always welcome.
>
>
>
>
>
>
> On Tue, Jan 20, 2015 at 10:21 PM, Nicholas Chammas <
> nicholas.cham...@gmail.com> wrote:
>
>> How many profiles (hadoop / hive /scala) would this development environment
>> support ?
>>
>> As many as we want. We probably want to cover a good chunk of the build
>> matrix <https://issues.apache.org/jira/browse/SPARK-2004> that Spark
>> officially supports.
>>
>> What does this provide, concretely?
>>
>> It provides a reliable way to create a "good" Spark development
>> environment. Roughly speaking, this probably should mean an environment
>> that matches Jenkins, since that's where we run "official" testing and
>> builds.
>>
>> For example, Spark has to run on Java 6 and Python 2.6. When devs build and
>> run Spark locally, we can make sure they're doing it on these versions of
>> the languages with a simple vagrant up.
>>
>> Nate, could you comment on how something like this would relate to the
>> Bigtop effort?
>>
>> http://chapeau.freevariable.com/2014/08/jvm-test-docker.html
>>
>> Will, that's pretty sweet. I tried something similar a few months ago as an
>> experiment to try building/testing Spark within a container. Here's the
>> shell script I used <https://gist.github.com/nchammas/60b04141f3b9f053faaa
>> >
>> against the base CentOS Docker image to set up an environment ready to build
>> and test Spark.
>>
>> We want to run Spark unit tests within containers on Jenkins, so it might
>> make sense to develop a single Docker image that can be used as both a "dev
>> environment" as well as execution container on Jenkins.
>>
>> Perhaps that's the approach to take instead of looking into Vagrant.
>>
>> Nick
>>
>> On Tue Jan 20 2015 at 8:22:41 PM Will Benton  wrote:
>>
>> Hey Nick,
>> >
>> > I did something similar with a Docker image l

Re: Standardized Spark dev environment

2015-01-20 Thread Paolo Platter
Hi all,
I also tried the Docker way and it works well.
I suggest looking at the sequenceiq/spark Docker images; they are very active
in that field.

Paolo

Sent from my Windows Phone

From: jay vyas<mailto:jayunit100.apa...@gmail.com>
Sent: 21/01/2015 04:45
To: Nicholas Chammas<mailto:nicholas.cham...@gmail.com>
Cc: Will Benton<mailto:wi...@redhat.com>; Spark dev 
list<mailto:dev@spark.apache.org>
Subject: Re: Standardized Spark dev environment

I can comment on both... hi Will and Nate :)

1) Will's Dockerfile solution is the simplest, most direct solution to the
dev environment question: it's an efficient way to build and develop Spark
environments for dev/test. It would be cool to put that Dockerfile (and/or
maybe a shell script which uses it) in the top level of Spark as the build
entry point. For total platform portability, you could wrap it in a
Vagrantfile to launch a lightweight VM, so that Windows worked equally
well.

2) However, since Nate mentioned Vagrant and Bigtop, I have to chime in :)
The Vagrant recipes in Bigtop are a nice reference deployment of how to
deploy Spark in a heterogeneous Hadoop-style environment, and tighter
integration testing with Bigtop for Spark releases would be lovely! The
Vagrant stuff uses Puppet to deploy an n-node VM- or Docker-based cluster,
in which users can easily select components (including Spark, YARN, HBase,
Hadoop, etc.) by simply editing a YAML file:
https://github.com/apache/bigtop/blob/master/bigtop-deploy/vm/vagrant-puppet/vagrantconfig.yaml
As Nate said, it would be a lot of fun to get more cross-collaboration
between the Spark and Bigtop communities. Input on how we can better
integrate Spark (whether it's Spork, HBase integration, smoke tests around
the MLlib stuff, or whatever) is always welcome.






On Tue, Jan 20, 2015 at 10:21 PM, Nicholas Chammas <
nicholas.cham...@gmail.com> wrote:

> How many profiles (hadoop / hive /scala) would this development environment
> support ?
>
> As many as we want. We probably want to cover a good chunk of the build
> matrix <https://issues.apache.org/jira/browse/SPARK-2004> that Spark
> officially supports.
>
> What does this provide, concretely?
>
> It provides a reliable way to create a “good” Spark development
> environment. Roughly speaking, this probably should mean an environment
> that matches Jenkins, since that’s where we run “official” testing and
> builds.
>
> For example, Spark has to run on Java 6 and Python 2.6. When devs build and
> run Spark locally, we can make sure they’re doing it on these versions of
> the languages with a simple vagrant up.
>
> Nate, could you comment on how something like this would relate to the
> Bigtop effort?
>
> http://chapeau.freevariable.com/2014/08/jvm-test-docker.html
>
> Will, that’s pretty sweet. I tried something similar a few months ago as an
> experiment to try building/testing Spark within a container. Here’s the
> shell script I used <https://gist.github.com/nchammas/60b04141f3b9f053faaa
> >
> against the base CentOS Docker image to set up an environment ready to build
> and test Spark.
>
> We want to run Spark unit tests within containers on Jenkins, so it might
> make sense to develop a single Docker image that can be used as both a “dev
> environment” as well as execution container on Jenkins.
>
> Perhaps that’s the approach to take instead of looking into Vagrant.
>
> Nick
>
> On Tue Jan 20 2015 at 8:22:41 PM Will Benton  wrote:
>
> Hey Nick,
> >
> > I did something similar with a Docker image last summer; I haven't
> updated
> > the images to cache the dependencies for the current Spark master, but it
> > would be trivial to do so:
> >
> > http://chapeau.freevariable.com/2014/08/jvm-test-docker.html
> >
> >
> > best,
> > wb
> >
> >
> > - Original Message -
> > > From: "Nicholas Chammas" 
> > > To: "Spark dev list" 
> > > Sent: Tuesday, January 20, 2015 6:13:31 PM
> > > Subject: Standardized Spark dev environment
> > >
> > > What do y'all think of creating a standardized Spark development
> > > environment, perhaps encoded as a Vagrantfile, and publishing it under
> > > `dev/`?
> > >
> > > The goal would be to make it easier for new developers to get started
> > with
> > > all the right configs and tools pre-installed.
> > >
> > > If we use something like Vagrant, we may even be able to make it so
> that
> > a
> > > single Vagrantfile creates equivalent development environments across
> OS
> > X,
> > > Linux, and Windows, without having to do much (or any) OS-specific
> work.
> > >
> > > I imagine for committers and regular contributors, this exercise may
> seem
> > > pointless, since y'all are probably already very comfortable with your
> > > workflow.
> > >
> > > I wonder, though, if any of you think this would be worthwhile as an
> > > improvement to the "new Spark developer" experience.
> > >
> > > Nick
> > >
> >
>



--
jay vyas


Re: Standardized Spark dev environment

2015-01-20 Thread jay vyas
I can comment on both... hi Will and Nate :)

1) Will's Dockerfile solution is the simplest, most direct solution to the
dev environment question: it's an efficient way to build and develop Spark
environments for dev/test. It would be cool to put that Dockerfile (and/or
maybe a shell script which uses it) in the top level of Spark as the build
entry point. For total platform portability, you could wrap it in a
Vagrantfile to launch a lightweight VM, so that Windows worked equally
well.

2) However, since Nate mentioned Vagrant and Bigtop, I have to chime in :)
The Vagrant recipes in Bigtop are a nice reference deployment of how to
deploy Spark in a heterogeneous Hadoop-style environment, and tighter
integration testing with Bigtop for Spark releases would be lovely! The
Vagrant stuff uses Puppet to deploy an n-node VM- or Docker-based cluster,
in which users can easily select components (including Spark, YARN, HBase,
Hadoop, etc.) by simply editing a YAML file:
https://github.com/apache/bigtop/blob/master/bigtop-deploy/vm/vagrant-puppet/vagrantconfig.yaml
As Nate said, it would be a lot of fun to get more cross-collaboration
between the Spark and Bigtop communities. Input on how we can better
integrate Spark (whether it's Spork, HBase integration, smoke tests around
the MLlib stuff, or whatever) is always welcome.






On Tue, Jan 20, 2015 at 10:21 PM, Nicholas Chammas <
nicholas.cham...@gmail.com> wrote:

> How many profiles (hadoop / hive /scala) would this development environment
> support ?
>
> As many as we want. We probably want to cover a good chunk of the build
> matrix <https://issues.apache.org/jira/browse/SPARK-2004> that Spark
> officially supports.
>
> What does this provide, concretely?
>
> It provides a reliable way to create a “good” Spark development
> environment. Roughly speaking, this probably should mean an environment
> that matches Jenkins, since that’s where we run “official” testing and
> builds.
>
> For example, Spark has to run on Java 6 and Python 2.6. When devs build and
> run Spark locally, we can make sure they’re doing it on these versions of
> the languages with a simple vagrant up.
>
> Nate, could you comment on how something like this would relate to the
> Bigtop effort?
>
> http://chapeau.freevariable.com/2014/08/jvm-test-docker.html
>
> Will, that’s pretty sweet. I tried something similar a few months ago as an
> experiment to try building/testing Spark within a container. Here’s the
> shell script I used <https://gist.github.com/nchammas/60b04141f3b9f053faaa
> >
> against the base CentOS Docker image to set up an environment ready to build
> and test Spark.
>
> We want to run Spark unit tests within containers on Jenkins, so it might
> make sense to develop a single Docker image that can be used as both a “dev
> environment” as well as execution container on Jenkins.
>
> Perhaps that’s the approach to take instead of looking into Vagrant.
>
> Nick
>
> On Tue Jan 20 2015 at 8:22:41 PM Will Benton  wrote:
>
> Hey Nick,
> >
> > I did something similar with a Docker image last summer; I haven't
> updated
> > the images to cache the dependencies for the current Spark master, but it
> > would be trivial to do so:
> >
> > http://chapeau.freevariable.com/2014/08/jvm-test-docker.html
> >
> >
> > best,
> > wb
> >
> >
> > - Original Message -
> > > From: "Nicholas Chammas" 
> > > To: "Spark dev list" 
> > > Sent: Tuesday, January 20, 2015 6:13:31 PM
> > > Subject: Standardized Spark dev environment
> > >
> > > What do y'all think of creating a standardized Spark development
> > > environment, perhaps encoded as a Vagrantfile, and publishing it under
> > > `dev/`?
> > >
> > > The goal would be to make it easier for new developers to get started
> > with
> > > all the right configs and tools pre-installed.
> > >
> > > If we use something like Vagrant, we may even be able to make it so
> that
> > a
> > > single Vagrantfile creates equivalent development environments across
> OS
> > X,
> > > Linux, and Windows, without having to do much (or any) OS-specific
> work.
> > >
> > > I imagine for committers and regular contributors, this exercise may
> seem
> > > pointless, since y'all are probably already very comfortable with your
> > > workflow.
> > >
> > > I wonder, though, if any of you think this would be worthwhile as an
> > > improvement to the "new Spark developer" experience.
> > >
> > > Nick
> > >
> >
>



-- 
jay vyas


Re: Standardized Spark dev environment

2015-01-20 Thread Nicholas Chammas
How many profiles (hadoop / hive / scala) would this development environment
support?

As many as we want. We probably want to cover a good chunk of the build
matrix <https://issues.apache.org/jira/browse/SPARK-2004> that Spark
officially supports.

What does this provide, concretely?

It provides a reliable way to create a “good” Spark development
environment. Roughly speaking, this probably should mean an environment
that matches Jenkins, since that’s where we run “official” testing and
builds.

For example, Spark has to run on Java 6 and Python 2.6. When devs build and
run Spark locally, we can make sure they’re doing it on these versions of
the languages with a simple vagrant up.

Nate, could you comment on how something like this would relate to the
Bigtop effort?

http://chapeau.freevariable.com/2014/08/jvm-test-docker.html

Will, that’s pretty sweet. I tried something similar a few months ago as an
experiment to try building/testing Spark within a container. Here’s the
shell script I used <https://gist.github.com/nchammas/60b04141f3b9f053faaa>
against the base CentOS Docker image to set up an environment ready to build
and test Spark.

We want to run Spark unit tests within containers on Jenkins, so it might
make sense to develop a single Docker image that can be used as both a “dev
environment” as well as execution container on Jenkins.

Perhaps that’s the approach to take instead of looking into Vagrant.

Nick
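
For illustration, a minimal sketch of how one image could serve both roles.
The image names and the use of dev/run-tests as the Jenkins entry point are
assumptions for the example, not a settled design:

# Hypothetical base image with the full build toolchain (like the sketch
# earlier in this document); this Dockerfile's output is tagged, say,
# spark/dev-env.
FROM spark/dev-env-base

WORKDIR /spark

# Jenkins runs the default command against a mounted checkout, e.g.:
#   docker run -v /path/to/spark:/spark spark/dev-env ./dev/run-tests
# A developer drops into a shell in the exact same environment, e.g.:
#   docker run -it -v /path/to/spark:/spark spark/dev-env bash
CMD ["./dev/run-tests"]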

On Tue Jan 20 2015 at 8:22:41 PM Will Benton  wrote:

Hey Nick,
>
> I did something similar with a Docker image last summer; I haven't updated
> the images to cache the dependencies for the current Spark master, but it
> would be trivial to do so:
>
> http://chapeau.freevariable.com/2014/08/jvm-test-docker.html
>
>
> best,
> wb
>
>
> - Original Message -
> > From: "Nicholas Chammas" 
> > To: "Spark dev list" 
> > Sent: Tuesday, January 20, 2015 6:13:31 PM
> > Subject: Standardized Spark dev environment
> >
> > What do y'all think of creating a standardized Spark development
> > environment, perhaps encoded as a Vagrantfile, and publishing it under
> > `dev/`?
> >
> > The goal would be to make it easier for new developers to get started
> with
> > all the right configs and tools pre-installed.
> >
> > If we use something like Vagrant, we may even be able to make it so that
> a
> > single Vagrantfile creates equivalent development environments across OS
> X,
> > Linux, and Windows, without having to do much (or any) OS-specific work.
> >
> > I imagine for committers and regular contributors, this exercise may seem
> > pointless, since y'all are probably already very comfortable with your
> > workflow.
> >
> > I wonder, though, if any of you think this would be worthwhile as an
> > improvement to the "new Spark developer" experience.
> >
> > Nick
> >
>
​


Re: Standardized Spark dev environment

2015-01-20 Thread Will Benton
Hey Nick,

I did something similar with a Docker image last summer; I haven't updated the 
images to cache the dependencies for the current Spark master, but it would be 
trivial to do so:

http://chapeau.freevariable.com/2014/08/jvm-test-docker.html


best,
wb
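
For illustration, a sketch of the dependency-caching step described above,
assuming a hypothetical base image that already has the JDK, Maven, and git
installed; the repository URL is the upstream Spark repo and the build flags
are arbitrary:

# Hypothetical base image with JDK, Maven, and git.
FROM spark/dev-env-base

# Clone current master and resolve its dependencies into the image's local
# Maven repository, so later builds against master mostly hit the cache.
RUN git clone https://github.com/apache/spark.git /opt/spark-master
WORKDIR /opt/spark-master
RUN mvn -DskipTests dependency:go-offline
# (Running a full `mvn -DskipTests package` here instead would cache the same
# artifacts at the cost of one complete build.)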


- Original Message -
> From: "Nicholas Chammas" 
> To: "Spark dev list" 
> Sent: Tuesday, January 20, 2015 6:13:31 PM
> Subject: Standardized Spark dev environment
> 
> What do y'all think of creating a standardized Spark development
> environment, perhaps encoded as a Vagrantfile, and publishing it under
> `dev/`?
> 
> The goal would be to make it easier for new developers to get started with
> all the right configs and tools pre-installed.
> 
> If we use something like Vagrant, we may even be able to make it so that a
> single Vagrantfile creates equivalent development environments across OS X,
> Linux, and Windows, without having to do much (or any) OS-specific work.
> 
> I imagine for committers and regular contributors, this exercise may seem
> pointless, since y'all are probably already very comfortable with your
> workflow.
> 
> > I wonder, though, if any of you think this would be worthwhile as an
> improvement to the "new Spark developer" experience.
> 
> Nick
> 




RE: Standardized Spark dev environment

2015-01-20 Thread nate
If there is some interest in more standardization and setup of dev/test
environments, the Spark community might be interested in starting to
participate in the Apache Bigtop effort:

http://bigtop.apache.org/

While the project had its start and initial focus on packaging, testing, and
deploying the Hadoop/HDFS-related stack, it looks like we will be targeting
"data engineers" going forward; thus Spark is set to become a bigger, more
central piece of the Bigtop effort as the project moves towards a "v1"
release.

We will be doing a Bigtop/big data workshop in late Feb at the SoCal Linux
conference:

http://www.socallinuxexpo.org/scale/13x

Right now we're scoping some getting-started Spark content for the event,
with a targeted intro of Bigtop/Spark Puppet-powered deployment components
going into the event as well.

Also, the group will be holding a meetup at Amazon's Palo Alto office on Jan
27th if any folks are interested.

Nate

-Original Message-
From: Sean Owen [mailto:so...@cloudera.com] 
Sent: Tuesday, January 20, 2015 5:09 PM
To: Nicholas Chammas
Cc: dev
Subject: Re: Standardized Spark dev environment

My concern would mostly be maintenance. It adds to an already very complex 
build. It only assists developers, who are a small audience. What does this 
provide, concretely?
On Jan 21, 2015 12:14 AM, "Nicholas Chammas" 
wrote:

> What do y'all think of creating a standardized Spark development 
> environment, perhaps encoded as a Vagrantfile, and publishing it under 
> `dev/`?
>
> The goal would be to make it easier for new developers to get started 
> with all the right configs and tools pre-installed.
>
> If we use something like Vagrant, we may even be able to make it so 
> that a single Vagrantfile creates equivalent development environments 
> across OS X, Linux, and Windows, without having to do much (or any) 
> OS-specific work.
>
> I imagine for committers and regular contributors, this exercise may 
> seem pointless, since y'all are probably already very comfortable with 
> your workflow.
>
> I wonder, though, if any of you think this would be worthwhile as an
> improvement to the "new Spark developer" experience.
>
> Nick
>





Re: Standardized Spark dev environment

2015-01-20 Thread Sean Owen
My concern would mostly be maintenance. It adds to an already very complex
build. It only assists developers, who are a small audience. What does this
provide, concretely?
On Jan 21, 2015 12:14 AM, "Nicholas Chammas" 
wrote:

> What do y'all think of creating a standardized Spark development
> environment, perhaps encoded as a Vagrantfile, and publishing it under
> `dev/`?
>
> The goal would be to make it easier for new developers to get started with
> all the right configs and tools pre-installed.
>
> If we use something like Vagrant, we may even be able to make it so that a
> single Vagrantfile creates equivalent development environments across OS X,
> Linux, and Windows, without having to do much (or any) OS-specific work.
>
> I imagine for committers and regular contributors, this exercise may seem
> pointless, since y'all are probably already very comfortable with your
> workflow.
>
> I wonder, though, if any of you think this would be worthwhile as an
> improvement to the "new Spark developer" experience.
>
> Nick
>


Re: Standardized Spark dev environment

2015-01-20 Thread Ted Yu
How many profiles (hadoop / hive / scala) would this development environment
support?

Cheers

On Tue, Jan 20, 2015 at 4:13 PM, Nicholas Chammas <
nicholas.cham...@gmail.com> wrote:

> What do y'all think of creating a standardized Spark development
> environment, perhaps encoded as a Vagrantfile, and publishing it under
> `dev/`?
>
> The goal would be to make it easier for new developers to get started with
> all the right configs and tools pre-installed.
>
> If we use something like Vagrant, we may even be able to make it so that a
> single Vagrantfile creates equivalent development environments across OS X,
> Linux, and Windows, without having to do much (or any) OS-specific work.
>
> I imagine for committers and regular contributors, this exercise may seem
> pointless, since y'all are probably already very comfortable with your
> workflow.
>
> I wonder, though, if any of you think this would be worthwhile as an
> improvement to the "new Spark developer" experience.
>
> Nick
>


Re: Standardized Spark dev environment

2015-01-20 Thread shenyan zhen
Great suggestion.
On Jan 20, 2015 7:14 PM, "Nicholas Chammas" 
wrote:

> What do y'all think of creating a standardized Spark development
> environment, perhaps encoded as a Vagrantfile, and publishing it under
> `dev/`?
>
> The goal would be to make it easier for new developers to get started with
> all the right configs and tools pre-installed.
>
> If we use something like Vagrant, we may even be able to make it so that a
> single Vagrantfile creates equivalent development environments across OS X,
> Linux, and Windows, without having to do much (or any) OS-specific work.
>
> I imagine for committers and regular contributors, this exercise may seem
> pointless, since y'all are probably already very comfortable with your
> workflow.
>
> I wonder, though, if any of you think this would be worthwhile as an
> improvement to the "new Spark developer" experience.
>
> Nick
>


Standardized Spark dev environment

2015-01-20 Thread Nicholas Chammas
What do y'all think of creating a standardized Spark development
environment, perhaps encoded as a Vagrantfile, and publishing it under
`dev/`?

The goal would be to make it easier for new developers to get started with
all the right configs and tools pre-installed.

If we use something like Vagrant, we may even be able to make it so that a
single Vagrantfile creates equivalent development environments across OS X,
Linux, and Windows, without having to do much (or any) OS-specific work.

I imagine for committers and regular contributors, this exercise may seem
pointless, since y'all are probably already very comfortable with your
workflow.

I wonder, though, if any of you think this would be worthwhile as an
improvement to the "new Spark developer" experience.

Nick