Re: Should spark-ec2 get its own repo?

2015-08-03 Thread Shivaram Venkataraman
I sent a note to the Mesos developers and created
https://github.com/apache/spark/pull/7899 to change the repository
pointer. There are 3-4 open PRs right now in the mesos/spark-ec2
repository and I'll work on migrating them to amplab/spark-ec2 later
today.

My thinking on moving the Python script is that we should have a
wrapper shell script that just fetches the latest version of
spark_ec2.py for the corresponding Spark branch. We already have
separate branches in our spark-ec2 repository for different Spark
versions, so it can just be a call to `wget
https://github.com/amplab/spark-ec2/tree/spark-version/driver/spark_ec2.py`.
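A hedged sketch of such a wrapper (the `raw.githubusercontent.com` host, the `branch-1.5` branch name, and the `driver/` path are illustrative assumptions; note that a `github.com/.../tree/...` URL returns an HTML page rather than the file itself, so a raw URL is used here):

```shell
#!/usr/bin/env bash
# Hypothetical shim launcher: fetch the spark_ec2.py that matches this
# Spark branch, then delegate to it. The branch name and repository
# layout below are assumptions for illustration only.
set -euo pipefail

SPARK_EC2_BRANCH="${SPARK_EC2_BRANCH:-branch-1.5}"

# Build the raw-download URL; github.com/.../tree/... serves HTML,
# so the raw host is needed to get the file contents themselves.
spark_ec2_url() {
  echo "https://raw.githubusercontent.com/amplab/spark-ec2/$1/driver/spark_ec2.py"
}

# Download the branch's spark_ec2.py and exec it with the user's arguments.
run_spark_ec2() {
  wget -q -O spark_ec2.py "$(spark_ec2_url "$SPARK_EC2_BRANCH")"
  exec python spark_ec2.py "$@"
}
```

A real `ec2/spark-ec2` shim would end with `run_spark_ec2 "$@"`; keeping the URL construction in its own function just makes it easy to test.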

Thanks
Shivaram

On Sun, Aug 2, 2015 at 11:34 AM, Nicholas Chammas
nicholas.cham...@gmail.com wrote:
 On Sat, Aug 1, 2015 at 1:09 PM Matt Goodman meawo...@gmail.com wrote:

 I am considering porting some of this to a more general spark-cloud
 launcher, including google/aliyun/rackspace.  It shouldn't be hard at all
 given the current approach for setup/install.


 FWIW, there are already some tools for launching Spark clusters on GCE and
 Azure:

 http://spark-packages.org/?q=tags%3A%22Deployment%22

 Nick


-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Should spark-ec2 get its own repo?

2015-08-02 Thread Nicholas Chammas
On Sat, Aug 1, 2015 at 1:09 PM Matt Goodman meawo...@gmail.com wrote:

 I am considering porting some of this to a more general spark-cloud
 launcher, including google/aliyun/rackspace.  It shouldn't be hard at all
 given the current approach for setup/install.


FWIW, there are already some tools for launching Spark clusters on GCE and
Azure:

http://spark-packages.org/?q=tags%3A%22Deployment%22

Nick


Re: Should spark-ec2 get its own repo?

2015-08-01 Thread Matt Goodman
I think that is a good idea, and slated to happen.  At the very least a
README or some such.  Is this a use case for git submodules?  I am
considering porting some of this to a more general spark-cloud launcher,
including google/aliyun/rackspace.  It shouldn't be hard at all given the
current approach for setup/install.

--Matthew Goodman

=
Check Out My Website: http://craneium.net
Find me on LinkedIn: http://tinyurl.com/d6wlch

On Fri, Jul 31, 2015 at 6:50 PM, Patrick Wendell pwend...@gmail.com wrote:

 Hey All,

 I've mostly kept quiet since I am not very active in maintaining this
 code anymore. However, it is a bit odd that the project is
 split-brained with a lot of the code being on github and some in the
 Spark repo.

 If the consensus is to migrate everything to github, that seems okay
 with me. I would vouch for having user continuity, for instance still
 have a shim ec2/spark-ec2 script that could perhaps just download
 and unpack the real script from github.

 - Patrick

 On Fri, Jul 31, 2015 at 2:13 PM, Shivaram Venkataraman
 shiva...@eecs.berkeley.edu wrote:
  Yes - It is still in progress, but I have just not gotten time to get to
  this. I think getting the repo moved from mesos to amplab in the
 codebase by
  1.5 should be possible.
 
  Thanks
  Shivaram
 
  On Fri, Jul 31, 2015 at 3:08 AM, Sean Owen so...@cloudera.com wrote:
 
  PS is this still in progress? it feels like something that would be
  good to do before 1.5.0, if it's going to happen soon.
 
  On Wed, Jul 22, 2015 at 6:59 AM, Shivaram Venkataraman
  shiva...@eecs.berkeley.edu wrote:
   Yeah I'll send a note to the mesos dev list just to make sure they are
   informed.
  
   Shivaram
  
   On Tue, Jul 21, 2015 at 11:47 AM, Sean Owen so...@cloudera.com
 wrote:
  
   I agree it's worth informing Mesos devs and checking that there are
 no
   big objections. I presume Shivaram is plugged in enough to Mesos that
   there won't be any surprises there, and that the project would also
   agree with moving this Spark-specific bit out. they may also want to
   leave a pointer to the new location in the mesos repo of course.
  
   I don't think it is something that requires a formal vote. It's not a
   question of ownership -- neither Apache nor the project PMC owns the
   code. I don't think it's different from retiring or removing any
 other
   code.
  
  
  
  
  
   On Tue, Jul 21, 2015 at 7:03 PM, Mridul Muralidharan 
 mri...@gmail.com
   wrote:
If I am not wrong, since the code was hosted within mesos project
repo, I assume (at least part of it) is owned by mesos project and
 so
its PMC ?
   
- Mridul
   
On Tue, Jul 21, 2015 at 9:22 AM, Shivaram Venkataraman
shiva...@eecs.berkeley.edu wrote:
There is technically no PMC for the spark-ec2 project (I guess we
are
kind
of establishing one right now). I haven't heard anything from the
Spark
PMC
on the dev list that might suggest a need for a vote so far. I
 will
send
another round of email notification to the dev list when we have a
JIRA
/ PR
that actually moves the scripts (right now the only thing that
changed
is
the location of some scripts in mesos/ to amplab/).
   
Thanks
Shivaram
   
  
  
 
 





Re: Should spark-ec2 get its own repo?

2015-08-01 Thread Josh Rosen
I don't think that using git submodules is a good idea here:

   - The extra `git submodule init && git submodule update` step can lead
   to confusing problems in certain workflows.
   - We'd wind up with many commits that serve only to bump the submodule
   SHA; these commits will be hard to review since they won't contain line
   diffs (the author will have to manually provide a link to the diff of code
   changes).


On Sat, Aug 1, 2015 at 10:08 AM, Matt Goodman meawo...@gmail.com wrote:

 I think that is a good idea, and slated to happen.  At the very least a
 README or some such.  Is this a use case for git submodules?  I am
 considering porting some of this to a more general spark-cloud launcher,
 including google/aliyun/rackspace.  It shouldn't be hard at all given the
 current approach for setup/install.

 --Matthew Goodman

 =
 Check Out My Website: http://craneium.net
 Find me on LinkedIn: http://tinyurl.com/d6wlch

 On Fri, Jul 31, 2015 at 6:50 PM, Patrick Wendell pwend...@gmail.com
 wrote:

 Hey All,

 I've mostly kept quiet since I am not very active in maintaining this
 code anymore. However, it is a bit odd that the project is
 split-brained with a lot of the code being on github and some in the
 Spark repo.

 If the consensus is to migrate everything to github, that seems okay
 with me. I would vouch for having user continuity, for instance still
 have a shim ec2/spark-ec2 script that could perhaps just download
 and unpack the real script from github.

 - Patrick

 On Fri, Jul 31, 2015 at 2:13 PM, Shivaram Venkataraman
 shiva...@eecs.berkeley.edu wrote:
  Yes - It is still in progress, but I have just not gotten time to get to
  this. I think getting the repo moved from mesos to amplab in the
 codebase by
  1.5 should be possible.
 
  Thanks
  Shivaram
 
  On Fri, Jul 31, 2015 at 3:08 AM, Sean Owen so...@cloudera.com wrote:
 
  PS is this still in progress? it feels like something that would be
  good to do before 1.5.0, if it's going to happen soon.
 
  On Wed, Jul 22, 2015 at 6:59 AM, Shivaram Venkataraman
  shiva...@eecs.berkeley.edu wrote:
   Yeah I'll send a note to the mesos dev list just to make sure they
 are
   informed.
  
   Shivaram
  
   On Tue, Jul 21, 2015 at 11:47 AM, Sean Owen so...@cloudera.com
 wrote:
  
   I agree it's worth informing Mesos devs and checking that there are
 no
   big objections. I presume Shivaram is plugged in enough to Mesos
 that
   there won't be any surprises there, and that the project would also
   agree with moving this Spark-specific bit out. they may also want to
   leave a pointer to the new location in the mesos repo of course.
  
   I don't think it is something that requires a formal vote. It's not
 a
   question of ownership -- neither Apache nor the project PMC owns the
   code. I don't think it's different from retiring or removing any
 other
   code.
  
  
  
  
  
   On Tue, Jul 21, 2015 at 7:03 PM, Mridul Muralidharan 
 mri...@gmail.com
   wrote:
If I am not wrong, since the code was hosted within mesos project
repo, I assume (at least part of it) is owned by mesos project and
 so
its PMC ?
   
- Mridul
   
On Tue, Jul 21, 2015 at 9:22 AM, Shivaram Venkataraman
shiva...@eecs.berkeley.edu wrote:
There is technically no PMC for the spark-ec2 project (I guess we
are
kind
of establishing one right now). I haven't heard anything from the
Spark
PMC
on the dev list that might suggest a need for a vote so far. I
 will
send
another round of email notification to the dev list when we have
 a
JIRA
/ PR
that actually moves the scripts (right now the only thing that
changed
is
the location of some scripts in mesos/ to amplab/).
   
Thanks
Shivaram
   
  
  
 
 






Re: Should spark-ec2 get its own repo?

2015-07-31 Thread Patrick Wendell
Hey All,

I've mostly kept quiet since I am not very active in maintaining this
code anymore. However, it is a bit odd that the project is
split-brained with a lot of the code being on github and some in the
Spark repo.

If the consensus is to migrate everything to github, that seems okay
with me. I would vouch for having user continuity, for instance still
have a shim ec2/spark-ec2 script that could perhaps just download
and unpack the real script from github.

- Patrick

On Fri, Jul 31, 2015 at 2:13 PM, Shivaram Venkataraman
shiva...@eecs.berkeley.edu wrote:
 Yes - It is still in progress, but I have just not gotten time to get to
 this. I think getting the repo moved from mesos to amplab in the codebase by
 1.5 should be possible.

 Thanks
 Shivaram

 On Fri, Jul 31, 2015 at 3:08 AM, Sean Owen so...@cloudera.com wrote:

 PS is this still in progress? it feels like something that would be
 good to do before 1.5.0, if it's going to happen soon.

 On Wed, Jul 22, 2015 at 6:59 AM, Shivaram Venkataraman
 shiva...@eecs.berkeley.edu wrote:
  Yeah I'll send a note to the mesos dev list just to make sure they are
  informed.
 
  Shivaram
 
  On Tue, Jul 21, 2015 at 11:47 AM, Sean Owen so...@cloudera.com wrote:
 
  I agree it's worth informing Mesos devs and checking that there are no
  big objections. I presume Shivaram is plugged in enough to Mesos that
  there won't be any surprises there, and that the project would also
  agree with moving this Spark-specific bit out. they may also want to
  leave a pointer to the new location in the mesos repo of course.
 
  I don't think it is something that requires a formal vote. It's not a
  question of ownership -- neither Apache nor the project PMC owns the
  code. I don't think it's different from retiring or removing any other
  code.
 
 
 
 
 
  On Tue, Jul 21, 2015 at 7:03 PM, Mridul Muralidharan mri...@gmail.com
  wrote:
   If I am not wrong, since the code was hosted within mesos project
   repo, I assume (at least part of it) is owned by mesos project and so
   its PMC ?
  
   - Mridul
  
   On Tue, Jul 21, 2015 at 9:22 AM, Shivaram Venkataraman
   shiva...@eecs.berkeley.edu wrote:
   There is technically no PMC for the spark-ec2 project (I guess we
   are
   kind
   of establishing one right now). I haven't heard anything from the
   Spark
   PMC
   on the dev list that might suggest a need for a vote so far. I will
   send
   another round of email notification to the dev list when we have a
   JIRA
   / PR
   that actually moves the scripts (right now the only thing that
   changed
   is
   the location of some scripts in mesos/ to amplab/).
  
   Thanks
   Shivaram
  
 
 






Re: Should spark-ec2 get its own repo?

2015-07-31 Thread Shivaram Venkataraman
Yes - It is still in progress, but I have just not gotten time to get to
this. I think getting the repo moved from mesos to amplab in the codebase
by 1.5 should be possible.

Thanks
Shivaram

On Fri, Jul 31, 2015 at 3:08 AM, Sean Owen so...@cloudera.com wrote:

 PS is this still in progress? it feels like something that would be
 good to do before 1.5.0, if it's going to happen soon.

 On Wed, Jul 22, 2015 at 6:59 AM, Shivaram Venkataraman
 shiva...@eecs.berkeley.edu wrote:
  Yeah I'll send a note to the mesos dev list just to make sure they are
  informed.
 
  Shivaram
 
  On Tue, Jul 21, 2015 at 11:47 AM, Sean Owen so...@cloudera.com wrote:
 
  I agree it's worth informing Mesos devs and checking that there are no
  big objections. I presume Shivaram is plugged in enough to Mesos that
  there won't be any surprises there, and that the project would also
  agree with moving this Spark-specific bit out. they may also want to
  leave a pointer to the new location in the mesos repo of course.
 
  I don't think it is something that requires a formal vote. It's not a
  question of ownership -- neither Apache nor the project PMC owns the
  code. I don't think it's different from retiring or removing any other
  code.
 
 
 
 
 
  On Tue, Jul 21, 2015 at 7:03 PM, Mridul Muralidharan mri...@gmail.com
  wrote:
   If I am not wrong, since the code was hosted within mesos project
   repo, I assume (at least part of it) is owned by mesos project and so
   its PMC ?
  
   - Mridul
  
   On Tue, Jul 21, 2015 at 9:22 AM, Shivaram Venkataraman
   shiva...@eecs.berkeley.edu wrote:
   There is technically no PMC for the spark-ec2 project (I guess we are
   kind
   of establishing one right now). I haven't heard anything from the
 Spark
   PMC
   on the dev list that might suggest a need for a vote so far. I will
   send
   another round of email notification to the dev list when we have a
 JIRA
   / PR
   that actually moves the scripts (right now the only thing that
 changed
   is
   the location of some scripts in mesos/ to amplab/).
  
   Thanks
   Shivaram
  
 
 



Re: Should spark-ec2 get its own repo?

2015-07-31 Thread Sean Owen
PS: is this still in progress? It feels like something that would be
good to do before 1.5.0, if it's going to happen soon.

On Wed, Jul 22, 2015 at 6:59 AM, Shivaram Venkataraman
shiva...@eecs.berkeley.edu wrote:
 Yeah I'll send a note to the mesos dev list just to make sure they are
 informed.

 Shivaram

 On Tue, Jul 21, 2015 at 11:47 AM, Sean Owen so...@cloudera.com wrote:

 I agree it's worth informing Mesos devs and checking that there are no
 big objections. I presume Shivaram is plugged in enough to Mesos that
 there won't be any surprises there, and that the project would also
 agree with moving this Spark-specific bit out. they may also want to
 leave a pointer to the new location in the mesos repo of course.

 I don't think it is something that requires a formal vote. It's not a
 question of ownership -- neither Apache nor the project PMC owns the
 code. I don't think it's different from retiring or removing any other
 code.





 On Tue, Jul 21, 2015 at 7:03 PM, Mridul Muralidharan mri...@gmail.com
 wrote:
  If I am not wrong, since the code was hosted within mesos project
  repo, I assume (at least part of it) is owned by mesos project and so
  its PMC ?
 
  - Mridul
 
  On Tue, Jul 21, 2015 at 9:22 AM, Shivaram Venkataraman
  shiva...@eecs.berkeley.edu wrote:
  There is technically no PMC for the spark-ec2 project (I guess we are
  kind
  of establishing one right now). I haven't heard anything from the Spark
  PMC
  on the dev list that might suggest a need for a vote so far. I will
  send
  another round of email notification to the dev list when we have a JIRA
  / PR
  that actually moves the scripts (right now the only thing that changed
  is
  the location of some scripts in mesos/ to amplab/).
 
  Thanks
  Shivaram
 






Re: Should spark-ec2 get its own repo?

2015-07-22 Thread Shivaram Venkataraman
Yeah I'll send a note to the mesos dev list just to make sure they are
informed.

Shivaram

On Tue, Jul 21, 2015 at 11:47 AM, Sean Owen so...@cloudera.com wrote:

 I agree it's worth informing Mesos devs and checking that there are no
 big objections. I presume Shivaram is plugged in enough to Mesos that
 there won't be any surprises there, and that the project would also
 agree with moving this Spark-specific bit out. they may also want to
 leave a pointer to the new location in the mesos repo of course.

 I don't think it is something that requires a formal vote. It's not a
 question of ownership -- neither Apache nor the project PMC owns the
 code. I don't think it's different from retiring or removing any other
 code.





 On Tue, Jul 21, 2015 at 7:03 PM, Mridul Muralidharan mri...@gmail.com
 wrote:
  If I am not wrong, since the code was hosted within mesos project
  repo, I assume (at least part of it) is owned by mesos project and so
  its PMC ?
 
  - Mridul
 
  On Tue, Jul 21, 2015 at 9:22 AM, Shivaram Venkataraman
  shiva...@eecs.berkeley.edu wrote:
  There is technically no PMC for the spark-ec2 project (I guess we are
 kind
  of establishing one right now). I haven't heard anything from the Spark
 PMC
  on the dev list that might suggest a need for a vote so far. I will send
  another round of email notification to the dev list when we have a JIRA
 / PR
  that actually moves the scripts (right now the only thing that changed
 is
  the location of some scripts in mesos/ to amplab/).
 
  Thanks
  Shivaram
 



Re: Should spark-ec2 get its own repo?

2015-07-21 Thread Shivaram Venkataraman
There is technically no PMC for the spark-ec2 project (I guess we are kind
of establishing one right now). I haven't heard anything from the Spark PMC
on the dev list that might suggest a need for a vote so far. I will send
another round of email notification to the dev list when we have a JIRA /
PR that actually moves the scripts (right now the only thing that changed
is the location of some scripts in mesos/ to amplab/).

Thanks
Shivaram

On Mon, Jul 20, 2015 at 12:55 PM, Mridul Muralidharan mri...@gmail.com
wrote:

 Might be a good idea to get the PMC's of both projects to sign off to
 prevent future issues with apache.

 Regards,
 Mridul

 On Mon, Jul 20, 2015 at 12:01 PM, Shivaram Venkataraman
 shiva...@eecs.berkeley.edu wrote:
  I've created https://github.com/amplab/spark-ec2 and added an initial
 set of
  committers. Note that this is not a fork of the existing
  github.com/mesos/spark-ec2 and users will need to fork from here. This
 is
  mostly to avoid the base-fork in pull requests being set incorrectly etc.
 
  I'll be migrating some PRs / closing them in the old repo and will also
  update the README in that repo.
 
  Thanks
  Shivaram
 
  On Fri, Jul 17, 2015 at 3:00 PM, Sean Owen so...@cloudera.com wrote:
 
  On Fri, Jul 17, 2015 at 6:58 PM, Shivaram Venkataraman
  shiva...@eecs.berkeley.edu wrote:
   I am not sure why the ASF JIRA can be only used to track one set of
   artifacts that are packaged and released together. I agree that
 marking
   a
   fix version as 1.5 for a change in another repo doesn't make a lot of
   sense,
   but we could just not use fix versions for the EC2 issues ?
 
  *shrug* it just seems harder and less natural to use ASF JIRA. What's
  the benefit? I agree it's not a big deal either way but it's a small
  part of the problem we're solving in the first place. I suspect that
  one way or the other, there would be issues filed both places, so this
  probably isn't worth debating.
 
 
   My concerns are less about it being pushed out etc. For better or
 worse
   we
   have had EC2 scripts be a part of the Spark distribution from a very
   early
   stage (from version 0.5.0 if my git history reading is correct).  So
   users
   will assume that any error with EC2 scripts belong to the Spark
 project.
   In
   addition almost all the contributions to the EC2 scripts come from
 Spark
   developers and so keeping the issues in the same mailing list / JIRA
   seems
   natural. This I guess again relates to the question of managing issues
   for
   code that isn't part of the Spark release artifact.
 
  Yeah good question -- Github doesn't give you a mailing list. I think
  dev@ would still be where it's discussed which is ... again 'part of
  the problem' but as you say, probably beneficial. It's a pretty low
  traffic topic anyway.
 
 
   I'll create the amplab/spark-ec2 repo over the next couple of days
   unless
   there are more comments on this thread. This will at least alleviate
   some of
   the naming confusion over using a repository in mesos and I'll give
   Sean,
   Nick, Matthew commit access to it. I am still not convinced about
 moving
   the
   issues over though.
 
  I won't move the issues. Maybe time tells whether one approach is
  better, or that it just doesn't matter.
 
  However it'd be a great opportunity to review and clear stale EC2
 issues.
 
 



Re: Should spark-ec2 get its own repo?

2015-07-21 Thread Mridul Muralidharan
If I am not wrong, since the code was hosted within mesos project
repo, I assume (at least part of it) is owned by mesos project and so
its PMC?

- Mridul

On Tue, Jul 21, 2015 at 9:22 AM, Shivaram Venkataraman
shiva...@eecs.berkeley.edu wrote:
 There is technically no PMC for the spark-ec2 project (I guess we are kind
 of establishing one right now). I haven't heard anything from the Spark PMC
 on the dev list that might suggest a need for a vote so far. I will send
 another round of email notification to the dev list when we have a JIRA / PR
 that actually moves the scripts (right now the only thing that changed is
 the location of some scripts in mesos/ to amplab/).

 Thanks
 Shivaram

 On Mon, Jul 20, 2015 at 12:55 PM, Mridul Muralidharan mri...@gmail.com
 wrote:

 Might be a good idea to get the PMC's of both projects to sign off to
 prevent future issues with apache.

 Regards,
 Mridul

 On Mon, Jul 20, 2015 at 12:01 PM, Shivaram Venkataraman
 shiva...@eecs.berkeley.edu wrote:
  I've created https://github.com/amplab/spark-ec2 and added an initial
  set of
  committers. Note that this is not a fork of the existing
  github.com/mesos/spark-ec2 and users will need to fork from here. This
  is
  mostly to avoid the base-fork in pull requests being set incorrectly
  etc.
 
  I'll be migrating some PRs / closing them in the old repo and will also
  update the README in that repo.
 
  Thanks
  Shivaram
 
  On Fri, Jul 17, 2015 at 3:00 PM, Sean Owen so...@cloudera.com wrote:
 
  On Fri, Jul 17, 2015 at 6:58 PM, Shivaram Venkataraman
  shiva...@eecs.berkeley.edu wrote:
   I am not sure why the ASF JIRA can be only used to track one set of
   artifacts that are packaged and released together. I agree that
   marking
   a
   fix version as 1.5 for a change in another repo doesn't make a lot of
   sense,
   but we could just not use fix versions for the EC2 issues ?
 
  *shrug* it just seems harder and less natural to use ASF JIRA. What's
  the benefit? I agree it's not a big deal either way but it's a small
  part of the problem we're solving in the first place. I suspect that
  one way or the other, there would be issues filed both places, so this
  probably isn't worth debating.
 
 
   My concerns are less about it being pushed out etc. For better or
   worse
   we
   have had EC2 scripts be a part of the Spark distribution from a very
   early
   stage (from version 0.5.0 if my git history reading is correct).  So
   users
   will assume that any error with EC2 scripts belong to the Spark
   project.
   In
   addition almost all the contributions to the EC2 scripts come from
   Spark
   developers and so keeping the issues in the same mailing list / JIRA
   seems
   natural. This I guess again relates to the question of managing
   issues
   for
   code that isn't part of the Spark release artifact.
 
  Yeah good question -- Github doesn't give you a mailing list. I think
  dev@ would still be where it's discussed which is ... again 'part of
  the problem' but as you say, probably beneficial. It's a pretty low
  traffic topic anyway.
 
 
   I'll create the amplab/spark-ec2 repo over the next couple of days
   unless
   there are more comments on this thread. This will at least alleviate
   some of
   the naming confusion over using a repository in mesos and I'll give
   Sean,
   Nick, Matthew commit access to it. I am still not convinced about
   moving
   the
   issues over though.
 
  I won't move the issues. Maybe time tells whether one approach is
  better, or that it just doesn't matter.
 
  However it'd be a great opportunity to review and clear stale EC2
  issues.
 
 






Re: Should spark-ec2 get its own repo?

2015-07-21 Thread Shivaram Venkataraman
That's part of the confusion we are trying to fix here -- the repository
used to live in the mesos github account but was never a part of the Apache
Mesos project. It was a remnant part of Spark from when Spark used to live
at github.com/mesos/spark.

Shivaram

On Tue, Jul 21, 2015 at 11:03 AM, Mridul Muralidharan mri...@gmail.com
wrote:

 If I am not wrong, since the code was hosted within mesos project
 repo, I assume (at least part of it) is owned by mesos project and so
 its PMC ?

 - Mridul

 On Tue, Jul 21, 2015 at 9:22 AM, Shivaram Venkataraman
 shiva...@eecs.berkeley.edu wrote:
  There is technically no PMC for the spark-ec2 project (I guess we are
 kind
  of establishing one right now). I haven't heard anything from the Spark
 PMC
  on the dev list that might suggest a need for a vote so far. I will send
  another round of email notification to the dev list when we have a JIRA
 / PR
  that actually moves the scripts (right now the only thing that changed is
  the location of some scripts in mesos/ to amplab/).
 
  Thanks
  Shivaram
 
  On Mon, Jul 20, 2015 at 12:55 PM, Mridul Muralidharan mri...@gmail.com
  wrote:
 
  Might be a good idea to get the PMC's of both projects to sign off to
  prevent future issues with apache.
 
  Regards,
  Mridul
 
  On Mon, Jul 20, 2015 at 12:01 PM, Shivaram Venkataraman
  shiva...@eecs.berkeley.edu wrote:
   I've created https://github.com/amplab/spark-ec2 and added an initial
   set of
   committers. Note that this is not a fork of the existing
   github.com/mesos/spark-ec2 and users will need to fork from here.
 This
   is
   mostly to avoid the base-fork in pull requests being set incorrectly
   etc.
  
   I'll be migrating some PRs / closing them in the old repo and will
 also
   update the README in that repo.
  
   Thanks
   Shivaram
  
   On Fri, Jul 17, 2015 at 3:00 PM, Sean Owen so...@cloudera.com
 wrote:
  
   On Fri, Jul 17, 2015 at 6:58 PM, Shivaram Venkataraman
   shiva...@eecs.berkeley.edu wrote:
I am not sure why the ASF JIRA can be only used to track one set of
artifacts that are packaged and released together. I agree that
marking
a
fix version as 1.5 for a change in another repo doesn't make a lot
 of
sense,
but we could just not use fix versions for the EC2 issues ?
  
   *shrug* it just seems harder and less natural to use ASF JIRA. What's
   the benefit? I agree it's not a big deal either way but it's a small
   part of the problem we're solving in the first place. I suspect that
   one way or the other, there would be issues filed both places, so
 this
   probably isn't worth debating.
  
  
My concerns are less about it being pushed out etc. For better or
worse
we
have had EC2 scripts be a part of the Spark distribution from a
 very
early
stage (from version 0.5.0 if my git history reading is correct).
 So
users
will assume that any error with EC2 scripts belong to the Spark
project.
In
addition almost all the contributions to the EC2 scripts come from
Spark
developers and so keeping the issues in the same mailing list /
 JIRA
seems
natural. This I guess again relates to the question of managing
issues
for
code that isn't part of the Spark release artifact.
  
   Yeah good question -- Github doesn't give you a mailing list. I think
   dev@ would still be where it's discussed which is ... again 'part of
   the problem' but as you say, probably beneficial. It's a pretty low
   traffic topic anyway.
  
  
I'll create the amplab/spark-ec2 repo over the next couple of days
unless
there are more comments on this thread. This will at least
 alleviate
some of
the naming confusion over using a repository in mesos and I'll give
Sean,
Nick, Matthew commit access to it. I am still not convinced about
moving
the
issues over though.
  
   I won't move the issues. Maybe time tells whether one approach is
   better, or that it just doesn't matter.
  
   However it'd be a great opportunity to review and clear stale EC2
   issues.
  
  
 
 



Re: Should spark-ec2 get its own repo?

2015-07-21 Thread Mridul Muralidharan
That sounds good. Thanks for clarifying !


Regards,
Mridul

On Tue, Jul 21, 2015 at 11:09 AM, Shivaram Venkataraman
shiva...@eecs.berkeley.edu wrote:
 That's part of the confusion we are trying to fix here -- the repository used
 to live in the mesos github account but was never a part of the Apache Mesos
 project. It was a remnant part of Spark from when Spark used to live at
 github.com/mesos/spark.

 Shivaram

 On Tue, Jul 21, 2015 at 11:03 AM, Mridul Muralidharan mri...@gmail.com
 wrote:

 If I am not wrong, since the code was hosted within mesos project
 repo, I assume (at least part of it) is owned by mesos project and so
 its PMC ?

 - Mridul

 On Tue, Jul 21, 2015 at 9:22 AM, Shivaram Venkataraman
 shiva...@eecs.berkeley.edu wrote:
  There is technically no PMC for the spark-ec2 project (I guess we are
  kind
  of establishing one right now). I haven't heard anything from the Spark
  PMC
  on the dev list that might suggest a need for a vote so far. I will send
  another round of email notification to the dev list when we have a JIRA
  / PR
  that actually moves the scripts (right now the only thing that changed
  is
  the location of some scripts in mesos/ to amplab/).
 
  Thanks
  Shivaram
 
  On Mon, Jul 20, 2015 at 12:55 PM, Mridul Muralidharan mri...@gmail.com
  wrote:
 
  Might be a good idea to get the PMC's of both projects to sign off to
  prevent future issues with apache.
 
  Regards,
  Mridul
 
  On Mon, Jul 20, 2015 at 12:01 PM, Shivaram Venkataraman
  shiva...@eecs.berkeley.edu wrote:
   I've created https://github.com/amplab/spark-ec2 and added an initial
   set of
   committers. Note that this is not a fork of the existing
   github.com/mesos/spark-ec2 and users will need to fork from here.
   This
   is
   mostly to avoid the base-fork in pull requests being set incorrectly
   etc.
  
   I'll be migrating some PRs / closing them in the old repo and will
   also
   update the README in that repo.
  
   Thanks
   Shivaram
  
   On Fri, Jul 17, 2015 at 3:00 PM, Sean Owen so...@cloudera.com
   wrote:
  
   On Fri, Jul 17, 2015 at 6:58 PM, Shivaram Venkataraman
   shiva...@eecs.berkeley.edu wrote:
I am not sure why the ASF JIRA can be only used to track one set
of
artifacts that are packaged and released together. I agree that
marking
a
fix version as 1.5 for a change in another repo doesn't make a lot
of
sense,
but we could just not use fix versions for the EC2 issues ?
  
   *shrug* it just seems harder and less natural to use ASF JIRA.
   What's
   the benefit? I agree it's not a big deal either way but it's a small
   part of the problem we're solving in the first place. I suspect that
   one way or the other, there would be issues filed both places, so
   this
   probably isn't worth debating.
  
  
 My concerns are less about it being pushed out etc. For better or worse
 we have had EC2 scripts be a part of the Spark distribution from a very
 early stage (from version 0.5.0 if my git history reading is correct). So
 users will assume that any error with EC2 scripts belongs to the Spark
 project. In addition almost all the contributions to the EC2 scripts come
 from Spark developers, and so keeping the issues in the same mailing list /
 JIRA seems natural. This I guess again relates to the question of managing
 issues for code that isn't part of the Spark release artifact.
  
    Yeah good question -- Github doesn't give you a mailing list. I think
    dev@ would still be where it's discussed which is ... again 'part of
    the problem' but as you say, probably beneficial. It's a pretty low
    traffic topic anyway.
  
  
 I'll create the amplab/spark-ec2 repo over the next couple of days unless
 there are more comments on this thread. This will at least alleviate some of
 the naming confusion over using a repository in mesos and I'll give Sean,
 Nick, Matthew commit access to it. I am still not convinced about moving the
 issues over though.
  
   I won't move the issues. Maybe time tells whether one approach is
   better, or that it just doesn't matter.
  
   However it'd be a great opportunity to review and clear stale EC2
   issues.
  
  
 
 



-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Should spark-ec2 get its own repo?

2015-07-20 Thread Shivaram Venkataraman
I've created https://github.com/amplab/spark-ec2 and added an initial set
of committers. Note that this is not a fork of the existing
github.com/mesos/spark-ec2 and users will need to fork from here. This is
mostly to avoid the base-fork in pull requests being set incorrectly etc.

I'll be migrating some PRs / closing them in the old repo and will also
update the README in that repo.

Thanks
Shivaram

On Fri, Jul 17, 2015 at 3:00 PM, Sean Owen so...@cloudera.com wrote:

 On Fri, Jul 17, 2015 at 6:58 PM, Shivaram Venkataraman
 shiva...@eecs.berkeley.edu wrote:
  I am not sure why the ASF JIRA can only be used to track one set of
  artifacts that are packaged and released together. I agree that marking a
  fix version as 1.5 for a change in another repo doesn't make a lot of
 sense,
  but we could just not use fix versions for the EC2 issues ?

 *shrug* it just seems harder and less natural to use ASF JIRA. What's
 the benefit? I agree it's not a big deal either way but it's a small
 part of the problem we're solving in the first place. I suspect that
 one way or the other, there would be issues filed both places, so this
 probably isn't worth debating.


  My concerns are less about it being pushed out etc. For better or worse
 we
  have had EC2 scripts be a part of the Spark distribution from a very
 early
  stage (from version 0.5.0 if my git history reading is correct).  So
 users
  will assume that any error with EC2 scripts belongs to the Spark project.
 In
  addition almost all the contributions to the EC2 scripts come from Spark
  developers and so keeping the issues in the same mailing list / JIRA
 seems
  natural. This I guess again relates to the question of managing issues
 for
  code that isn't part of the Spark release artifact.

 Yeah good question -- Github doesn't give you a mailing list. I think
 dev@ would still be where it's discussed which is ... again 'part of
 the problem' but as you say, probably beneficial. It's a pretty low
 traffic topic anyway.


  I'll create the amplab/spark-ec2 repo over the next couple of days unless
  there are more comments on this thread. This will at least alleviate
 some of
  the naming confusion over using a repository in mesos and I'll give Sean,
  Nick, Matthew commit access to it. I am still not convinced about moving
 the
  issues over though.

 I won't move the issues. Maybe time tells whether one approach is
 better, or that it just doesn't matter.

 However it'd be a great opportunity to review and clear stale EC2 issues.



Re: Should spark-ec2 get its own repo?

2015-07-20 Thread Mridul Muralidharan
It might be a good idea to get the PMCs of both projects to sign off to
prevent future issues with Apache.

Regards,
Mridul

On Mon, Jul 20, 2015 at 12:01 PM, Shivaram Venkataraman
shiva...@eecs.berkeley.edu wrote:
 I've created https://github.com/amplab/spark-ec2 and added an initial set of
 committers. Note that this is not a fork of the existing
 github.com/mesos/spark-ec2 and users will need to fork from here. This is
 mostly to avoid the base-fork in pull requests being set incorrectly etc.

 I'll be migrating some PRs / closing them in the old repo and will also
 update the README in that repo.

 Thanks
 Shivaram

 On Fri, Jul 17, 2015 at 3:00 PM, Sean Owen so...@cloudera.com wrote:

 On Fri, Jul 17, 2015 at 6:58 PM, Shivaram Venkataraman
 shiva...@eecs.berkeley.edu wrote:
  I am not sure why the ASF JIRA can only be used to track one set of
  artifacts that are packaged and released together. I agree that marking
  a
  fix version as 1.5 for a change in another repo doesn't make a lot of
  sense,
  but we could just not use fix versions for the EC2 issues ?

 *shrug* it just seems harder and less natural to use ASF JIRA. What's
 the benefit? I agree it's not a big deal either way but it's a small
 part of the problem we're solving in the first place. I suspect that
 one way or the other, there would be issues filed both places, so this
 probably isn't worth debating.


  My concerns are less about it being pushed out etc. For better or worse
  we
  have had EC2 scripts be a part of the Spark distribution from a very
  early
  stage (from version 0.5.0 if my git history reading is correct).  So
  users
  will assume that any error with EC2 scripts belongs to the Spark project.
  In
  addition almost all the contributions to the EC2 scripts come from Spark
  developers and so keeping the issues in the same mailing list / JIRA
  seems
  natural. This I guess again relates to the question of managing issues
  for
  code that isn't part of the Spark release artifact.

 Yeah good question -- Github doesn't give you a mailing list. I think
 dev@ would still be where it's discussed which is ... again 'part of
 the problem' but as you say, probably beneficial. It's a pretty low
 traffic topic anyway.


  I'll create the amplab/spark-ec2 repo over the next couple of days
  unless
  there are more comments on this thread. This will at least alleviate
  some of
  the naming confusion over using a repository in mesos and I'll give
  Sean,
  Nick, Matthew commit access to it. I am still not convinced about moving
  the
  issues over though.

 I won't move the issues. Maybe time tells whether one approach is
 better, or that it just doesn't matter.

 However it'd be a great opportunity to review and clear stale EC2 issues.






Re: Should spark-ec2 get its own repo?

2015-07-17 Thread Shivaram Venkataraman
Some replies inline

On Wed, Jul 15, 2015 at 1:08 AM, Sean Owen so...@cloudera.com wrote:

 The code can continue to be a good reference implementation, no matter
where it lives. In fact, it can be a better, more complete one, and
 easier to update.

 I agree that ec2/ needs to retain some kind of pointer to the new
 location. Yes, maybe a script as well that does the checkout as you
 say. We have to be careful that the effect here isn't to make people
 think this code is still part of the blessed bits of a Spark release,
 since it isn't. But I suppose the point is that it isn't quite now
 either (isn't tested, isn't fully contained in apache/spark) and
 that's what we're fixing.

 I still don't like the idea of using the ASF JIRA for Spark to track
 issues in a separate project, as these kinds of splits are what we're
 trying to get rid of. I think it's a plus to be able to only bother
 with the Github PR/issue system, and not parallel JIRAs as well. I
 also worry that this blurs the line between code that is formally
 tested and blessed in a Spark release, and that which is not. You fix
an issue in this separate repo and mark it as fixed in Spark 1.5 --
 what does that imply?

 I am not sure why the ASF JIRA can only be used to track one set of
artifacts that are packaged and released together. I agree that marking a
fix version as 1.5 for a change in another repo doesn't make a lot of
sense, but we could just not use fix versions for the EC2 issues ?


 I think the issue is people don't like the sense this is getting
 pushed outside the wall, or 'removed' from Spark. On the one hand I
 argue it hasn't really properly been part of Spark -- that's why we
 need this change to happen. But, I also think this is easy to resolve
 other ways: spark-packages.org, the pointer in the repo, prominent
 notes in the wiki, etc.

 My concerns are less about it being pushed out etc. For better or worse we
have had EC2 scripts be a part of the Spark distribution from a very early
stage (from version 0.5.0 if my git history reading is correct).  So users
will assume that any error with EC2 scripts belongs to the Spark project. In
addition almost all the contributions to the EC2 scripts come from Spark
developers and so keeping the issues in the same mailing list / JIRA seems
natural. This I guess again relates to the question of managing issues for
code that isn't part of the Spark release artifact.

I suggest Shivaram owns this, and that amplab/spark-ec2 is used to
 host? I'm not qualified to help make the new copy or repo admin but
 would be happy to help with the rest, like triaging, if you can give
 me rights to open issues.

 I'll create the amplab/spark-ec2 repo over the next couple of days unless
there are more comments on this thread. This will at least alleviate some
of the naming confusion over using a repository in mesos and I'll give
Sean, Nick, Matthew commit access to it. I am still not convinced about
moving the issues over though.

Thanks
Shivaram


Re: Should spark-ec2 get its own repo?

2015-07-17 Thread Sean Owen
On Fri, Jul 17, 2015 at 6:58 PM, Shivaram Venkataraman
shiva...@eecs.berkeley.edu wrote:
 I am not sure why the ASF JIRA can only be used to track one set of
 artifacts that are packaged and released together. I agree that marking a
 fix version as 1.5 for a change in another repo doesn't make a lot of sense,
 but we could just not use fix versions for the EC2 issues ?

*shrug* it just seems harder and less natural to use ASF JIRA. What's
the benefit? I agree it's not a big deal either way but it's a small
part of the problem we're solving in the first place. I suspect that
one way or the other, there would be issues filed both places, so this
probably isn't worth debating.


 My concerns are less about it being pushed out etc. For better or worse we
 have had EC2 scripts be a part of the Spark distribution from a very early
 stage (from version 0.5.0 if my git history reading is correct).  So users
 will assume that any error with EC2 scripts belongs to the Spark project. In
 addition almost all the contributions to the EC2 scripts come from Spark
 developers and so keeping the issues in the same mailing list / JIRA seems
 natural. This I guess again relates to the question of managing issues for
 code that isn't part of the Spark release artifact.

Yeah good question -- Github doesn't give you a mailing list. I think
dev@ would still be where it's discussed which is ... again 'part of
the problem' but as you say, probably beneficial. It's a pretty low
traffic topic anyway.


 I'll create the amplab/spark-ec2 repo over the next couple of days unless
 there are more comments on this thread. This will at least alleviate some of
 the naming confusion over using a repository in mesos and I'll give Sean,
 Nick, Matthew commit access to it. I am still not convinced about moving the
 issues over though.

I won't move the issues. Maybe time tells whether one approach is
better, or that it just doesn't matter.

However it'd be a great opportunity to review and clear stale EC2 issues.




Re: Should spark-ec2 get its own repo?

2015-07-15 Thread Sean Owen
The code can continue to be a good reference implementation, no matter
where it lives. In fact, it can be a better, more complete one, and
easier to update.

I agree that ec2/ needs to retain some kind of pointer to the new
location. Yes, maybe a script as well that does the checkout as you
say. We have to be careful that the effect here isn't to make people
think this code is still part of the blessed bits of a Spark release,
since it isn't. But I suppose the point is that it isn't quite now
either (isn't tested, isn't fully contained in apache/spark) and
that's what we're fixing.

I still don't like the idea of using the ASF JIRA for Spark to track
issues in a separate project, as these kinds of splits are what we're
trying to get rid of. I think it's a plus to be able to only bother
with the Github PR/issue system, and not parallel JIRAs as well. I
also worry that this blurs the line between code that is formally
tested and blessed in a Spark release, and that which is not. You fix
an issue in this separate repo and mark it as fixed in Spark 1.5 --
what does that imply?

I think the issue is people don't like the sense this is getting
pushed outside the wall, or 'removed' from Spark. On the one hand I
argue it hasn't really properly been part of Spark -- that's why we
need this change to happen. But, I also think this is easy to resolve
other ways: spark-packages.org, the pointer in the repo, prominent
notes in the wiki, etc.

I suggest Shivaram owns this, and that amplab/spark-ec2 is used to
host? I'm not qualified to help make the new copy or repo admin but
would be happy to help with the rest, like triaging, if you can give
me rights to open issues.


On Wed, Jul 15, 2015 at 5:35 AM, Matt Goodman meawo...@gmail.com wrote:
 I concur with the things Sean said about keeping the same JIRA.  Frankly,
 it's a pretty small part of spark, and as mentioned by Nicholas, a reference
 implementation of getting Spark running in ec2.

 I can see wanting to grow it into a little more general tool that implements
 launchers for other compute platforms.  Porting this over to
 Google/M$/rackspace offerings would not be too far out of reach.

 --Matthew Goodman

 =
 Check Out My Website: http://craneium.net
 Find me on LinkedIn: http://tinyurl.com/d6wlch

 On Mon, Jul 13, 2015 at 2:46 PM, Nicholas Chammas
 nicholas.cham...@gmail.com wrote:

  At a high level I see the spark-ec2 scripts as an effort to provide a
  reference implementation for launching EC2 clusters with Apache Spark

 On a side note, this is precisely how I used spark-ec2 for a personal
 project that does something similar: reference implementation.

 Nick
 On Mon, Jul 13, 2015 at 1:27 PM, Shivaram Venkataraman
 shiva...@eecs.berkeley.edu wrote:

 I think moving the repo-location and re-organizing the python code to
 handle dependencies, testing etc. sounds good to me. However, I think there
 are a couple of things which I am not sure about

 1. I strongly believe that we should preserve the existing command-line in
 ec2/spark-ec2 (i.e. the shell script, not the python file). This could be a
 thin wrapper script that just checks out or downloads something (similar
 to, say, build/mvn). Mainly, I see no reason to break the workflow that users
 are used to right now.

 2. I am also not sure that moving the issue tracker is necessarily
 a good idea. I don't think we get a large number of issues due to the EC2
 stuff, and if we do have a workflow for launching EC2 clusters, the Spark
 JIRA would still be the natural place to report issues related to this.

 At a high level I see the spark-ec2 scripts as an effort to provide a
 reference implementation for launching EC2 clusters with Apache Spark --
 Given this view I am not sure it makes sense to completely decouple this
 from the Apache project.

 Thanks
 Shivaram

 On Sun, Jul 12, 2015 at 1:34 AM, Sean Owen so...@cloudera.com wrote:

 I agree with these points. The ec2 support is substantially a separate
 project, and would likely be better managed as one. People can much
 more rapidly iterate on it and release it.

 I suggest:

 1. Pick a new repo location. amplab/spark-ec2 ? spark-ec2/spark-ec2 ?
 2. Add interested parties as owners/contributors
 3. Reassemble a working clone of the current code from spark/ec2 and
 mesos/spark-ec2 and check it in
 4. Announce the new location on user@, dev@
 5. Triage open JIRAs to the new repo's issue tracker and close them
 elsewhere
 6. Remove the old copies of the code and leave a pointer to the new
 location in their place

 I'd also like to hear a few more nods before pulling the trigger though.

 On Sat, Jul 11, 2015 at 7:07 PM, Matt Goodman meawo...@gmail.com
 wrote:
  I wanted to revive the conversation about the spark-ec2 tools, as it
  seems
  to have been lost in the 1.4.1 release voting spree.
 
  I think that splitting it into its own repository is a really good
  move, and
  I would also be happy to help with this transition, as well as help
  

Re: Should spark-ec2 get its own repo?

2015-07-14 Thread Matt Goodman
I concur with the things Sean said about keeping the same JIRA.  Frankly,
it's a pretty small part of spark, and as mentioned by Nicholas, a reference
implementation of getting Spark running in ec2.

I can see wanting to grow it into a little more general tool that implements
launchers for other compute platforms.  Porting this over to
Google/M$/rackspace offerings would not be too far out of reach.

--Matthew Goodman

=
Check Out My Website: http://craneium.net
Find me on LinkedIn: http://tinyurl.com/d6wlch

On Mon, Jul 13, 2015 at 2:46 PM, Nicholas Chammas 
nicholas.cham...@gmail.com wrote:

  At a high level I see the spark-ec2 scripts as an effort to provide a
 reference implementation for launching EC2 clusters with Apache Spark

 On a side note, this is precisely how I used spark-ec2 for a personal
 project that does something similar: reference implementation.

 Nick
 On Mon, Jul 13, 2015 at 1:27 PM, Shivaram Venkataraman
 shiva...@eecs.berkeley.edu wrote:

 I think moving the repo-location and re-organizing the python code to
 handle dependencies, testing etc. sounds good to me. However, I think there
 are a couple of things which I am not sure about

 1. I strongly believe that we should preserve the existing command-line in
 ec2/spark-ec2 (i.e. the shell script, not the python file). This could be a
 thin wrapper script that just checks out or downloads something
 (similar to, say, build/mvn). Mainly, I see no reason to break the workflow
 that users are used to right now.

 2. I am also not sure that moving the issue tracker is necessarily
 a good idea. I don't think we get a large number of issues due to the EC2
 stuff, and if we do have a workflow for launching EC2 clusters, the Spark
 JIRA would still be the natural place to report issues related to this.

 At a high level I see the spark-ec2 scripts as an effort to provide a
 reference implementation for launching EC2 clusters with Apache Spark --
 Given this view I am not sure it makes sense to completely decouple this
 from the Apache project.

 Thanks
 Shivaram

 On Sun, Jul 12, 2015 at 1:34 AM, Sean Owen so...@cloudera.com wrote:

 I agree with these points. The ec2 support is substantially a separate
 project, and would likely be better managed as one. People can much
 more rapidly iterate on it and release it.

 I suggest:

 1. Pick a new repo location. amplab/spark-ec2 ? spark-ec2/spark-ec2 ?
 2. Add interested parties as owners/contributors
 3. Reassemble a working clone of the current code from spark/ec2 and
 mesos/spark-ec2 and check it in
 4. Announce the new location on user@, dev@
 5. Triage open JIRAs to the new repo's issue tracker and close them
 elsewhere
 6. Remove the old copies of the code and leave a pointer to the new
 location in their place

 I'd also like to hear a few more nods before pulling the trigger though.

 On Sat, Jul 11, 2015 at 7:07 PM, Matt Goodman meawo...@gmail.com
 wrote:
  I wanted to revive the conversation about the spark-ec2 tools, as it
 seems
  to have been lost in the 1.4.1 release voting spree.
 
  I think that splitting it into its own repository is a really good
 move, and
  I would also be happy to help with this transition, as well as help
 maintain
  the resulting repository.  Here is my justification for why we ought
 to do
  this split.
 
  User Facing:
 
  The spark-ec2 launcher doesn't use anything in the parent spark repository
  spark-ec2 version is disjoint from the parent repo.  I consider it confusing
  that the spark-ec2 script doesn't launch the version of spark it is
  checked-out with.
  Someone interested in setting up spark-ec2 with anything but the default
  configuration will have to clone at least 2 repositories at present, and
  probably fork and push changes to 1.
  spark-ec2 has mismatched dependencies wrt. spark itself.  This includes a
  confusing shim in the spark-ec2 script to install boto, which frankly should
  just be a dependency of the script
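  [Editor's note] The boto point above can be made concrete. Instead of a runtime shim that downloads boto from inside the launcher, a standalone repository could declare the dependency in its packaging metadata (e.g. install_requires=["boto"]) and simply verify it at startup. The helper and message below are a hypothetical sketch, not the script's actual code:

```python
# Hypothetical replacement for a runtime "download boto" shim: declare
# boto in the package's install_requires, then just verify the declared
# dependency is importable, failing with an actionable message if not.
import importlib

def require(module_name, install_hint):
    """Import a declared dependency or exit with a clear instruction."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        raise SystemExit(
            "spark-ec2 requires %s; install it with: %s"
            % (module_name, install_hint))

# e.g. at launcher startup:  boto = require("boto", "pip install boto")
```

  This keeps the launcher itself free of installation logic and makes the missing-dependency failure mode explicit rather than a mid-run download.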
 
  Developer Facing:
 
  Support across 2 repos will be worse than across 1.  It's unclear where to
  file issues/PRs, and requires extra communications for even fairly trivial
  stuff.
  Spark-ec2 also depends on a number of binary blobs being in the right place;
  currently the responsibility for these is decentralized, and likely prone to
  various flavors of dumb.
  The current flow of booting a spark-ec2 cluster is _complicated_. I spent the
  better part of a couple days figuring out how to integrate our custom tools
  into this stack.  This is very hard to fix when commits/PRs need to span
  groups/repositories/buckets-o-binary; I am sure there are several other
  problems that are languishing under similar roadblocks
  It makes testing possible.  The spark-ec2 script is a great case for CI
  given the number of permutations of launch criteria there are.  I suspect
  AWS would be happy to foot the bill on spark-ec2 testing (probably ~20 bucks
  a month based on some envelope sketches), 
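[Editor's note] The CI argument above, that launch criteria multiply into many permutations worth smoke-testing, could be sketched as a parametrized matrix along these lines. The option names and values here are illustrative assumptions, not spark_ec2.py's actual flags:

```python
# Hypothetical CI sketch: enumerate permutations of spark-ec2 launch
# criteria so each combination can be smoke-tested in its own CI job.
# The specific options and values are illustrative assumptions.
import itertools

INSTANCE_TYPES = ["m3.large", "r3.xlarge"]
SPOT = [True, False]
HADOOP_MAJOR_VERSIONS = ["1", "2"]

def launch_matrix():
    """Yield one launch configuration per permutation of the criteria."""
    for itype, spot, hadoop in itertools.product(
            INSTANCE_TYPES, SPOT, HADOOP_MAJOR_VERSIONS):
        yield {
            "instance_type": itype,
            "spot": spot,
            "hadoop_major_version": hadoop,
        }
```

A CI job would loop over launch_matrix(), launch and tear down a small cluster per configuration, and report failures per combination, which is exactly the kind of testing the current split across repositories makes hard.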

Re: Should spark-ec2 get its own repo?

2015-07-13 Thread Shivaram Venkataraman
I think moving the repo-location and re-organizing the python code to
handle dependencies, testing etc. sounds good to me. However, I think there
are a couple of things which I am not sure about

1. I strongly believe that we should preserve the existing command-line in
ec2/spark-ec2 (i.e. the shell script, not the python file). This could be a
thin wrapper script that just checks out or downloads something
(similar to, say, build/mvn). Mainly, I see no reason to break the workflow
that users are used to right now.

2. I am also not sure that moving the issue tracker is necessarily a
good idea. I don't think we get a large number of issues due to the EC2
stuff, and if we do have a workflow for launching EC2 clusters, the Spark
JIRA would still be the natural place to report issues related to this.

At a high level I see the spark-ec2 scripts as an effort to provide a
reference implementation for launching EC2 clusters with Apache Spark --
Given this view I am not sure it makes sense to completely decouple this
from the Apache project.

Thanks
Shivaram
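
[Editor's note] The thin-wrapper idea in point 1 above could look roughly like the sketch below. This is an illustration only: the raw-file URL layout, branch name, and driver/spark_ec2.py path are assumptions about a hypothetical standalone repository, not its actual structure.

```python
# Hypothetical sketch of the thin-wrapper idea: keep the existing
# ec2/spark-ec2 command line, but fetch the matching spark_ec2.py for
# this Spark branch on first use and delegate to it afterwards.
import os
import urllib.request

def launcher_url(branch):
    # Assumed raw-file layout of a standalone spark-ec2 repository.
    return ("https://raw.githubusercontent.com/amplab/spark-ec2/"
            + branch + "/driver/spark_ec2.py")

def ensure_launcher(branch, dest):
    # Download spark_ec2.py once (much like build/mvn bootstraps its
    # own copy of Maven), then reuse the cached file on later runs.
    if not os.path.exists(dest):
        urllib.request.urlretrieve(launcher_url(branch), dest)
    return dest
```

A real wrapper would then exec the downloaded script with the original arguments, so the user-facing command line stays exactly as it is today.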

On Sun, Jul 12, 2015 at 1:34 AM, Sean Owen so...@cloudera.com wrote:

 I agree with these points. The ec2 support is substantially a separate
 project, and would likely be better managed as one. People can much
 more rapidly iterate on it and release it.

 I suggest:

 1. Pick a new repo location. amplab/spark-ec2 ? spark-ec2/spark-ec2 ?
 2. Add interested parties as owners/contributors
 3. Reassemble a working clone of the current code from spark/ec2 and
 mesos/spark-ec2 and check it in
 4. Announce the new location on user@, dev@
 5. Triage open JIRAs to the new repo's issue tracker and close them
 elsewhere
 6. Remove the old copies of the code and leave a pointer to the new
 location in their place

 I'd also like to hear a few more nods before pulling the trigger though.

 On Sat, Jul 11, 2015 at 7:07 PM, Matt Goodman meawo...@gmail.com wrote:
  I wanted to revive the conversation about the spark-ec2 tools, as it
 seems
  to have been lost in the 1.4.1 release voting spree.
 
  I think that splitting it into its own repository is a really good move,
 and
  I would also be happy to help with this transition, as well as help
 maintain
  the resulting repository.  Here is my justification for why we ought to
 do
  this split.
 
  User Facing:
 
  The spark-ec2 launcher doesn't use anything in the parent spark repository
  spark-ec2 version is disjoint from the parent repo.  I consider it confusing
  that the spark-ec2 script doesn't launch the version of spark it is
  checked-out with.
  Someone interested in setting up spark-ec2 with anything but the default
  configuration will have to clone at least 2 repositories at present, and
  probably fork and push changes to 1.
  spark-ec2 has mismatched dependencies wrt. spark itself.  This includes a
  confusing shim in the spark-ec2 script to install boto, which frankly should
  just be a dependency of the script
 
  Developer Facing:
 
  Support across 2 repos will be worse than across 1.  It's unclear where to
  file issues/PRs, and requires extra communications for even fairly trivial
  stuff.
  Spark-ec2 also depends on a number of binary blobs being in the right place;
  currently the responsibility for these is decentralized, and likely prone to
  various flavors of dumb.
  better part of a couple days figuring out how to integrate our custom
 tools
  into this stack.  This is very hard to fix when commits/PR's need to span
  groups/repositories/buckets-o-binary, I am sure there are several other
  problems that are languishing under similar roadblocks
  It makes testing possible.  The spark-ec2 script is a great case for CI
  given the number of permutations of launch criteria there are.  I suspect
  AWS would be happy to foot the bill on spark-ec2 testing (probably ~20
 bucks
  a month based on some envelope sketches), as it is a piece of software
 that
  directly impacts other people giving them money.  I have some contacts
  there, and I am pretty sure this would be an easy conversation,
 particularly
  if the repo directly concerned with ec2.  Think also being able to
 assemble
  the binary blobs into s3 bucket dedicated to spark-ec2
 
  Any other thoughts/voices appreciated here.  spark-ec2 is a super-power
 tool
  and deserves a fair bit of attention!
  --Matthew Goodman
 
  =
  Check Out My Website: http://craneium.net
  Find me on LinkedIn: http://tinyurl.com/d6wlch

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org




Re: Should spark-ec2 get its own repo?

2015-07-13 Thread Nicholas Chammas
 At a high level I see the spark-ec2 scripts as an effort to provide a
reference implementation for launching EC2 clusters with Apache Spark

On a side note, this is precisely how I used spark-ec2 for a personal
project that does something similar: reference implementation.

Nick
On Mon, Jul 13, 2015 at 1:27 PM, Shivaram Venkataraman shiva...@eecs.berkeley.edu
wrote:

 I think moving the repo-location and re-organizing the python code to
 handle dependencies, testing etc. sounds good to me. However, I think there
 are a couple of things which I am not sure about

 1. I strongly believe that we should preserve the existing command-line in
 ec2/spark-ec2 (i.e. the shell script, not the python file). This could be a
 thin wrapper script that just checks out or downloads something
 (similar to, say, build/mvn). Mainly, I see no reason to break the workflow
 that users are used to right now.

 2. I am also not sure that moving the issue tracker is necessarily a
 good idea. I don't think we get a large number of issues due to the EC2
 stuff, and if we do have a workflow for launching EC2 clusters, the Spark
 JIRA would still be the natural place to report issues related to this.

 At a high level I see the spark-ec2 scripts as an effort to provide a
 reference implementation for launching EC2 clusters with Apache Spark --
 Given this view I am not sure it makes sense to completely decouple this
 from the Apache project.

 Thanks
 Shivaram

 On Sun, Jul 12, 2015 at 1:34 AM, Sean Owen so...@cloudera.com wrote:

 I agree with these points. The ec2 support is substantially a separate
 project, and would likely be better managed as one. People can much
 more rapidly iterate on it and release it.

 I suggest:

 1. Pick a new repo location. amplab/spark-ec2 ? spark-ec2/spark-ec2 ?
 2. Add interested parties as owners/contributors
 3. Reassemble a working clone of the current code from spark/ec2 and
 mesos/spark-ec2 and check it in
 4. Announce the new location on user@, dev@
 5. Triage open JIRAs to the new repo's issue tracker and close them
 elsewhere
 6. Remove the old copies of the code and leave a pointer to the new
 location in their place

 I'd also like to hear a few more nods before pulling the trigger though.

 On Sat, Jul 11, 2015 at 7:07 PM, Matt Goodman meawo...@gmail.com wrote:
  I wanted to revive the conversation about the spark-ec2 tools, as it
 seems
  to have been lost in the 1.4.1 release voting spree.
 
  I think that splitting it into its own repository is a really good
 move, and
  I would also be happy to help with this transition, as well as help
 maintain
  the resulting repository.  Here is my justification for why we ought to
 do
  this split.
 
  User Facing:
 
 The spark-ec2 launcher doesn't use anything in the parent spark repository
 spark-ec2 version is disjoint from the parent repo.  I consider it confusing
 that the spark-ec2 script doesn't launch the version of spark it is
 checked-out with.
  Someone interested in setting up spark-ec2 with anything but the default
  configuration will have to clone at least 2 repositories at present, and
  probably fork and push changes to 1.
 spark-ec2 has mismatched dependencies wrt. spark itself.  This includes a
 confusing shim in the spark-ec2 script to install boto, which frankly should
 just be a dependency of the script
 
  Developer Facing:
 
 Support across 2 repos will be worse than across 1.  It's unclear where to
 file issues/PRs, and requires extra communications for even fairly trivial
 stuff.
 Spark-ec2 also depends on a number of binary blobs being in the right place;
 currently the responsibility for these is decentralized, and likely prone to
 various flavors of dumb.
 The current flow of booting a spark-ec2 cluster is _complicated_. I spent the
 better part of a couple days figuring out how to integrate our custom tools
 into this stack.  This is very hard to fix when commits/PRs need to span
 groups/repositories/buckets-o-binary; I am sure there are several other
 problems that are languishing under similar roadblocks
 It makes testing possible.  The spark-ec2 script is a great case for CI
 given the number of permutations of launch criteria there are.  I suspect
 AWS would be happy to foot the bill on spark-ec2 testing (probably ~20 bucks
 a month based on some envelope sketches), as it is a piece of software that
 directly impacts other people giving them money.  I have some contacts
 there, and I am pretty sure this would be an easy conversation, particularly
 if the repo is directly concerned with ec2.  Think also being able to assemble
 the binary blobs into an s3 bucket dedicated to spark-ec2
 
  Any other thoughts/voices appreciated here.  spark-ec2 is a super-power
 tool
  and deserves a fair bit of attention!
  --Matthew Goodman
 
  =
  Check Out My Website: http://craneium.net
  Find me on LinkedIn: http://tinyurl.com/d6wlch

 

Re: Should spark-ec2 get its own repo?

2015-07-12 Thread Sean Owen
I agree with these points. The ec2 support is substantially a separate
project, and would likely be better managed as one. People can much
more rapidly iterate on it and release it.

I suggest:

1. Pick a new repo location. amplab/spark-ec2 ? spark-ec2/spark-ec2 ?
2. Add interested parties as owners/contributors
3. Reassemble a working clone of the current code from spark/ec2 and
mesos/spark-ec2 and check it in
4. Announce the new location on user@, dev@
5. Triage open JIRAs to the new repo's issue tracker and close them elsewhere
6. Remove the old copies of the code and leave a pointer to the new
location in their place
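
Step 3 above (reassembling a working clone from spark/ec2 and
mesos/spark-ec2) can be sketched as a history-preserving merge. This is a
rough, hedged sketch: the amplab/spark-ec2 remote, the driver/ prefix, and
the use of git subtree are illustrative assumptions rather than a settled
plan, and nothing touches the network unless DO_MIGRATE=yes is set.

```shell
#!/bin/sh
# Sketch of step 3: combine apache/spark's ec2/ directory and the
# mesos/spark-ec2 repo into one repository, preserving history.
# The target remote and layout below are illustrative assumptions.
set -e

NEW_REMOTE="${NEW_REMOTE:-git@github.com:amplab/spark-ec2.git}"

if [ "${DO_MIGRATE:-no}" = "yes" ]; then
  # Extract only ec2/ (with its history) from the huge apache/spark repo.
  git clone https://github.com/apache/spark launcher
  (cd launcher && git filter-branch --prune-empty --subdirectory-filter ec2 -- --all)

  # Start from the instance-setup scripts and graft the launcher in
  # under a driver/ prefix (requires a git build with the subtree command).
  git clone https://github.com/mesos/spark-ec2 combined
  cd combined
  git subtree add --prefix=driver ../launcher master

  # Push the combined history to its proposed new home.
  git push "$NEW_REMOTE" master
fi

echo "migration target: $NEW_REMOTE"
```

With DO_MIGRATE unset the sketch only prints the target, so it is safe to
run; triaging JIRAs and leaving pointers (steps 5-6) would still be manual.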

I'd also like to hear a few more nods before pulling the trigger though.

On Sat, Jul 11, 2015 at 7:07 PM, Matt Goodman meawo...@gmail.com wrote:
 I wanted to revive the conversation about the spark-ec2 tools, as it seems
 to have been lost in the 1.4.1 release voting spree.

 I think that splitting it into its own repository is a really good move, and
 I would also be happy to help with this transition, as well as help maintain
 the resulting repository.  Here is my justification for why we ought to do
 this split.

 User Facing:

 The spark-ec2 launcher doesn't use anything in the parent Spark repository.
 spark-ec2's version is disjoint from the parent repo.  I consider it confusing
 that the spark-ec2 script doesn't launch the version of Spark it is
 checked out with.
 Someone interested in setting up spark-ec2 with anything but the default
 configuration will have to clone at least 2 repositories at present, and
 probably fork and push changes to 1.
 spark-ec2 has mismatched dependencies with respect to Spark itself.  This
 includes a confusing shim in the spark-ec2 script to install boto, which
 frankly should just be a dependency of the script.

 Developer Facing:

 Support across 2 repos will be worse than across 1.  It's unclear where to
 file issues/PRs, and requires extra communication for even fairly trivial
 stuff.
 spark-ec2 also depends on a number of binary blobs being in the right place;
 currently the responsibility for these is decentralized, and likely prone to
 various flavors of dumb.
 The current flow of booting a spark-ec2 cluster is _complicated_.  I spent the
 better part of a couple of days figuring out how to integrate our custom tools
 into this stack.  This is very hard to fix when commits/PRs need to span
 groups/repositories/buckets-o-binary; I am sure there are several other
 problems that are languishing under similar roadblocks.
 It makes testing possible.  The spark-ec2 script is a great case for CI
 given the number of permutations of launch criteria there are.  I suspect
 AWS would be happy to foot the bill on spark-ec2 testing (probably ~20 bucks
 a month based on some envelope sketches), as it is a piece of software that
 directly impacts other people giving them money.  I have some contacts
 there, and I am pretty sure this would be an easy conversation, particularly
 if the repo is directly concerned with EC2.  Think also being able to
 assemble the binary blobs into an S3 bucket dedicated to spark-ec2.

 Any other thoughts/voices appreciated here.  spark-ec2 is a super-power tool
 and deserves a fair bit of attention!
 --Matthew Goodman

 =
 Check Out My Website: http://craneium.net
 Find me on LinkedIn: http://tinyurl.com/d6wlch

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Should spark-ec2 get its own repo?

2015-07-11 Thread Matt Goodman
I wanted to revive the conversation about the spark-ec2 tools, as it seems
to have been lost in the 1.4.1 release voting spree.

I think that splitting it into its own repository is a really good move,
and I would also be happy to help with this transition, as well as help
maintain the resulting repository.  Here is my justification for why we
ought to do this split.

User Facing:

   - The spark-ec2 launcher doesn't use anything in the parent Spark
   repository.
   - spark-ec2's version is disjoint from the parent repo.  I consider it
   confusing that the spark-ec2 script doesn't launch the version of Spark it
   is checked out with.
   - Someone interested in setting up spark-ec2 with anything but the
   default configuration will have to clone at least 2 repositories at
   present, and probably fork and push changes to 1.
   - spark-ec2 has mismatched dependencies with respect to Spark itself.  This
   includes a confusing shim in the spark-ec2 script to install boto, which
   frankly should just be a dependency of the script.
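
The boto shim mentioned in the last point could be replaced by ordinary
dependency metadata; a minimal sketch (the version pin is illustrative, not
the project's actual constraint):

```
# requirements.txt for the launcher (sketch) — install with:
#   pip install -r requirements.txt
boto>=2.38
```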

Developer Facing:

   - Support across 2 repos will be worse than across 1.  It's unclear where
   to file issues/PRs, and requires extra communication for even fairly
   trivial stuff.
   - spark-ec2 also depends on a number of binary blobs being in the right
   place; currently the responsibility for these is decentralized, and likely
   prone to various flavors of dumb.
   - The current flow of booting a spark-ec2 cluster is _complicated_.  I
   spent the better part of a couple of days figuring out how to integrate our
   custom tools into this stack.  This is very hard to fix when commits/PRs
   need to span groups/repositories/buckets-o-binary; I am sure there are
   several other problems that are languishing under similar roadblocks.
   - It makes testing possible.  The spark-ec2 script is a great case for
   CI given the number of permutations of launch criteria there are.  I
   suspect AWS would be happy to foot the bill on spark-ec2 testing (probably
   ~20 bucks a month based on some envelope sketches), as it is a piece of
   software that directly impacts other people giving them money.  I have some
   contacts there, and I am pretty sure this would be an easy conversation,
   particularly if the repo is directly concerned with EC2.  Think also being
   able to assemble the binary blobs into an S3 bucket dedicated to spark-ec2.

Any other thoughts/voices appreciated here.  spark-ec2 is a super-power
tool and deserves a fair bit of attention!
--Matthew Goodman

=
Check Out My Website: http://craneium.net
Find me on LinkedIn: http://tinyurl.com/d6wlch


Re: Should spark-ec2 get its own repo?

2015-07-03 Thread Sean Owen
I'll render an opinion although I'm only barely qualified by having
just had a small discussion on this --

It does seem like mesos/spark-ec2 is in the wrong place, although
really, that is at best an issue for Mesos. But it does highlight that
the Spark EC2 support doesn't entirely live with and get distributed
with apache/spark.

It does feel like that should move and should not be separate from the
other half of EC2 support. Why not put it in apache/spark? I think the
problem is that the AMI process clones the repo, and the apache/spark
repo is huge. One answer is just to fix that by arranging a different
way of releasing the EC2 files as a downloadable archive.

However, if it is true that the Spark EC2 support doesn't need to live
with and get released with the rest of Spark, it might make more sense
to merge both halves into a new separate repo and run it separately
from apache/spark, like any other third-party repo.

I think that's less radical than it sounds, and has some benefits.
There is not quite the same argument of needing to build and maintain
this together like with language bindings and subprojects.

But is that something that people who use and maintain it agree with
or are advocating for?

On Fri, Jul 3, 2015 at 6:23 PM, Nicholas Chammas
nicholas.cham...@gmail.com wrote:
 spark-ec2 is kind of a mini project within a project.

 It’s composed of a set of EC2 AMIs under someone’s account (maybe
 Patrick’s?) plus the following 2 code bases:

 Main command line tool: https://github.com/apache/spark/tree/master/ec2
 Scripts used to install stuff on launched instances:
 https://github.com/mesos/spark-ec2

 You’ll notice that part of the code lives under the Mesos GitHub
 organization. This is an artifact of history, when Spark itself kinda grew
 out of Mesos before becoming its own project.

 There are a few issues with this state of affairs, none of which are major
 but which nonetheless merit some discussion:

 The spark-ec2 code is split across 2 repositories when it is not technically
 necessary.
 Some of that code is owned by an organization that should technically not be
 owning Spark stuff.
 Spark and spark-ec2 live in the same repo but spark-ec2 issues are often
 completely disjoint from issues with Spark itself. This has led in some
 cases to new Spark RCs being cut because of minor issues with spark-ec2
 (like version strings not being updated).

 I wanted to put up for discussion a few suggestions and see what people
 agreed with.

 The current state of affairs is fine and it is not worth moving stuff
 around.
 spark-ec2 should get its own repo, and should be moved out of the main Spark
 repo. That means both of the code bases linked above would live in one place
 (maybe a spark-ec2/spark-ec2 repo).
 spark-ec2 should stay in the Spark repo, but the stuff under the Mesos
 organization should be moved elsewhere (again, perhaps under a
 spark-ec2/spark-ec2 repo).

 What do you think?

 Nick

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Should spark-ec2 get its own repo?

2015-07-03 Thread Shivaram Venkataraman
As the person maintaining the mesos/spark-ec2 repo, here are my 2 cents

- I don't think it makes sense to put the scripts in the Spark repo itself.
Cloning the scripts on the EC2 instances is an intentional design which
allows us to make minor config changes in EC2 launches without needing a
new Spark release.

- I think having some script to launch EC2 clusters as part of mainline
Spark is a nice feature to have. However, this could be a very thin wrapper
instead of the big Python file we have right now.
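
A minimal sketch of what such a thin wrapper could look like: fetch the
spark_ec2.py matching the current Spark branch and delegate everything to
it. The amplab/spark-ec2 location, the branch-1.4 default, and the
driver/spark_ec2.py path are assumptions for illustration, not a finalized
layout.

```shell
#!/bin/sh
# Sketch of a thin in-tree spark-ec2 wrapper: download the launcher that
# matches this checkout's Spark branch, then hand all arguments to it.
# Repo location, branch name, and file path below are assumed, not final.
SPARK_EC2_BRANCH="${SPARK_EC2_BRANCH:-branch-1.4}"
SCRIPT_URL="https://raw.githubusercontent.com/amplab/spark-ec2/${SPARK_EC2_BRANCH}/driver/spark_ec2.py"

if [ "${DRY_RUN:-yes}" = "yes" ]; then
  # Default to a dry run so the sketch is safe to execute as-is.
  echo "would fetch: ${SCRIPT_URL}"
else
  wget -q -O /tmp/spark_ec2.py "${SCRIPT_URL}"
  exec python /tmp/spark_ec2.py "$@"
fi
```

Run with DRY_RUN=no to actually fetch and launch; the wrapper itself would
rarely need to change across Spark releases.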

- Moving the scripts from the Mesos organization to spark-ec2 or amplab is
fine by me. In fact, one nice way to do this transition would be to move the
existing spark-ec2 repo to a new organization and then move the logic from
the launcher script out of Spark into the new repo.

Thanks
Shivaram




On Fri, Jul 3, 2015 at 10:36 AM, Sean Owen so...@cloudera.com wrote:

 I'll render an opinion although I'm only barely qualified by having
 just had a small discussion on this --

 It does seem like mesos/spark-ec2 is in the wrong place, although
 really, that is at best an issue for Mesos. But it does highlight that
 the Spark EC2 support doesn't entirely live with and get distributed
 with apache/spark.

 It does feel like that should move and should not be separate from the
 other half of EC2 support. Why not put it in apache/spark? I think the
 problem is that the AMI process clones the repo, and the apache/spark
 repo is huge. One answer is just to fix that by arranging a different
 way of releasing the EC2 files as a downloadable archive.

 However, if it is true that the Spark EC2 support doesn't need to live
 with and get released with the rest of Spark, it might make more sense
 to merge both halves into a new separate repo and run it separately
 from apache/spark, like any other third-party repo.

 I think that's less radical than it sounds, and has some benefits.
 There is not quite the same argument of needing to build and maintain
 this together like with language bindings and subprojects.

 But is that something that people who use and maintain it agree with
 or are advocating for?

 On Fri, Jul 3, 2015 at 6:23 PM, Nicholas Chammas
 nicholas.cham...@gmail.com wrote:
  spark-ec2 is kind of a mini project within a project.
 
  It’s composed of a set of EC2 AMIs under someone’s account (maybe
  Patrick’s?) plus the following 2 code bases:
 
  Main command line tool: https://github.com/apache/spark/tree/master/ec2
  Scripts used to install stuff on launched instances:
  https://github.com/mesos/spark-ec2
 
  You’ll notice that part of the code lives under the Mesos GitHub
  organization. This is an artifact of history, when Spark itself kinda grew
  out of Mesos before becoming its own project.
 
  There are a few issues with this state of affairs, none of which are major
  but which nonetheless merit some discussion:
 
  The spark-ec2 code is split across 2 repositories when it is not
  technically necessary.
  Some of that code is owned by an organization that should technically not
  be owning Spark stuff.
  Spark and spark-ec2 live in the same repo but spark-ec2 issues are often
  completely disjoint from issues with Spark itself. This has led in some
  cases to new Spark RCs being cut because of minor issues with spark-ec2
  (like version strings not being updated).
 
  I wanted to put up for discussion a few suggestions and see what people
  agreed with.
 
  The current state of affairs is fine and it is not worth moving stuff
  around.
  spark-ec2 should get its own repo, and should be moved out of the main
  Spark repo. That means both of the code bases linked above would live in
  one place (maybe a spark-ec2/spark-ec2 repo).
  spark-ec2 should stay in the Spark repo, but the stuff under the Mesos
  organization should be moved elsewhere (again, perhaps under a
  spark-ec2/spark-ec2 repo).
 
  What do you think?
 
  Nick
