Moving the repo location and reorganizing the Python code to handle
dependencies, testing, etc. sounds good to me. However, there are a couple
of things I am not sure about:

1. I strongly believe that we should preserve the existing command-line
interface in ec2/spark-ec2 (i.e. the shell script, not the Python file).
This could be a thin wrapper script that just checks out or downloads
something (similar to, say, build/mvn). Mainly, I see no reason to break
the workflow that users are used to right now.
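
As a rough sketch of what such a thin wrapper could do (an illustration
only, not an actual implementation): fetch the tool on first use, then
forward the user's arguments unchanged. It is written in Python here for
brevity, though the real wrapper would stay a shell script. The repository
URL, the cache directory, the function names, and the assumption that the
new repo keeps a spark_ec2.py entry point are all placeholders, since the
new location has not been decided.

```python
import os
import subprocess
import sys

# Placeholders: the thread has not settled on a new repo location or layout.
REPO_URL = "https://example.invalid/spark-ec2.git"  # hypothetical URL
CACHE_DIR = os.path.join(os.path.expanduser("~"), ".spark-ec2-cache")  # hypothetical


def ensure_checkout(repo_url, dest, run=subprocess.check_call):
    """Clone repo_url into dest unless dest already exists; return dest."""
    if not os.path.isdir(dest):
        run(["git", "clone", "--depth", "1", repo_url, dest])
    return dest


def build_command(checkout_dir, user_args):
    """Build the launcher command, passing the user's arguments through as-is."""
    # Assumes the new repo exposes spark_ec2.py at its top level.
    script = os.path.join(checkout_dir, "spark_ec2.py")
    return [sys.executable, script] + list(user_args)


def main(argv, run=subprocess.check_call):
    """Fetch the tool if needed, then run it with the original arguments."""
    checkout = ensure_checkout(REPO_URL, CACHE_DIR, run)
    return run(build_command(checkout, argv))
```

The wrapper's entry point would call main(sys.argv[1:]), so existing
invocations of ec2/spark-ec2 would keep working with the same arguments.
The `run` parameter exists only so the logic can be exercised without
touching the network.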

2. I am also not sure that moving the issue tracker is necessarily a
good idea. I don't think we get a large number of issues due to the EC2
stuff, and if we do have a workflow for launching EC2 clusters, the Spark
JIRA would still be the natural place to report issues related to this.

At a high level, I see the spark-ec2 scripts as an effort to provide a
reference implementation for launching EC2 clusters with Apache Spark.
Given this view, I am not sure it makes sense to completely decouple this
from the Apache project.

Thanks
Shivaram

On Sun, Jul 12, 2015 at 1:34 AM, Sean Owen <so...@cloudera.com> wrote:

> I agree with these points. The ec2 support is substantially a separate
> project, and would likely be better managed as one. People can much
> more rapidly iterate on it and release it.
>
> I suggest:
>
> 1. Pick a new repo location. amplab/spark-ec2 ? spark-ec2/spark-ec2 ?
> 2. Add interested parties as owners/contributors
> 3. Reassemble a working clone of the current code from spark/ec2 and
> mesos/spark-ec2 and check it in
> 4. Announce the new location on user@, dev@
> 5. Triage open JIRAs to the new repo's issue tracker and close them
> elsewhere
> 6. Remove the old copies of the code and leave a pointer to the new
> location in their place
>
> I'd also like to hear a few more nods before pulling the trigger though.
>
> On Sat, Jul 11, 2015 at 7:07 PM, Matt Goodman <meawo...@gmail.com> wrote:
> > I wanted to revive the conversation about the spark-ec2 tools, as it
> > seems to have been lost in the 1.4.1 release voting spree.
> >
> > I think that splitting it into its own repository is a really good
> > move, and I would also be happy to help with this transition, as well
> > as help maintain the resulting repository.  Here is my justification
> > for why we ought to do this split.
> >
> > User Facing:
> >
> > - The spark-ec2 launcher doesn't use anything in the parent spark
> >   repository.
> > - The spark-ec2 version is disjoint from the parent repo.  I consider
> >   it confusing that the spark-ec2 script doesn't launch the version of
> >   spark it is checked out with.
> > - Someone interested in setting up spark-ec2 with anything but the
> >   default configuration will have to clone at least 2 repositories at
> >   present, and probably fork and push changes to 1.
> > - spark-ec2 has mismatched dependencies with respect to spark itself.
> >   This includes a confusing shim in the spark-ec2 script to install
> >   boto, which frankly should just be a dependency of the script.
> >
> > Developer Facing:
> >
> > - Support across 2 repos will be worse than across 1.  It's unclear
> >   where to file issues/PRs, and it requires extra communication for
> >   even fairly trivial stuff.
> > - spark-ec2 also depends on a number of binary blobs being in the
> >   right place.  Currently the responsibility for these is
> >   decentralized, and likely prone to various flavors of dumb.
> > - The current flow of booting a spark-ec2 cluster is _complicated_.  I
> >   spent the better part of a couple of days figuring out how to
> >   integrate our custom tools into this stack.  This is very hard to fix
> >   when commits/PRs need to span groups/repositories/buckets-o-binary.
> >   I am sure there are several other problems that are languishing
> >   under similar roadblocks.
> > - It makes testing possible.  The spark-ec2 script is a great case for
> >   CI given the number of permutations of launch criteria there are.  I
> >   suspect AWS would be happy to foot the bill on spark-ec2 testing
> >   (probably ~20 bucks a month based on some envelope sketches), as it
> >   is a piece of software that directly impacts other people giving
> >   them money.  I have some contacts there, and I am pretty sure this
> >   would be an easy conversation, particularly if the repo is directly
> >   concerned with ec2.  Think also of being able to assemble the binary
> >   blobs into an s3 bucket dedicated to spark-ec2.
> >
> > Any other thoughts/voices appreciated here.  spark-ec2 is a
> > super-power tool and deserves a fair bit of attention!
> > --Matthew Goodman
> >
> > =====================
> > Check Out My Website: http://craneium.net
> > Find me on LinkedIn: http://tinyurl.com/d6wlch
>
