Moving the repo location and reorganizing the Python code to handle dependencies, testing, etc. sounds good to me. However, there are a couple of things I am not sure about:
1. I strongly believe that we should preserve the existing command-line in ec2/spark-ec2 (i.e. the shell script, not the Python file). This could be a thin wrapper script that just checks out or downloads something (similar to, say, build/mvn). Mainly, I see no reason to break the workflow that users are used to right now.

2. I am also not sure that moving the issue tracker is necessarily a good idea. I don't think we get a large number of issues due to the EC2 stuff, and if we do have a workflow for launching EC2 clusters, the Spark JIRA would still be the natural place to report issues related to it.

At a high level I see the spark-ec2 scripts as an effort to provide a reference implementation for launching EC2 clusters with Apache Spark. Given this view, I am not sure it makes sense to completely decouple this from the Apache project.

Thanks
Shivaram

On Sun, Jul 12, 2015 at 1:34 AM, Sean Owen <so...@cloudera.com> wrote:

> I agree with these points. The ec2 support is substantially a separate
> project, and would likely be better managed as one. People can much
> more rapidly iterate on it and release it.
>
> I suggest:
>
> 1. Pick a new repo location. amplab/spark-ec2 ? spark-ec2/spark-ec2 ?
> 2. Add interested parties as owners/contributors
> 3. Reassemble a working clone of the current code from spark/ec2 and
> mesos/spark-ec2 and check it in
> 4. Announce the new location on user@, dev@
> 5. Triage open JIRAs to the new repo's issue tracker and close them
> elsewhere
> 6. Remove the old copies of the code and leave a pointer to the new
> location in their place
>
> I'd also like to hear a few more nods before pulling the trigger though.
>
> On Sat, Jul 11, 2015 at 7:07 PM, Matt Goodman <meawo...@gmail.com> wrote:
> > I wanted to revive the conversation about the spark-ec2 tools, as it
> > seems to have been lost in the 1.4.1 release voting spree.
> >
> > I think that splitting it into its own repository is a really good
> > move, and I would also be happy to help with this transition, as well
> > as help maintain the resulting repository. Here is my justification
> > for why we ought to do this split.
> >
> > User Facing:
> >
> > - The spark-ec2 launcher doesn't use anything in the parent spark
> > repository.
> > - The spark-ec2 version is disjoint from the parent repo. I consider
> > it confusing that the spark-ec2 script doesn't launch the version of
> > spark it is checked out with.
> > - Someone interested in setting up spark-ec2 with anything but the
> > default configuration will have to clone at least 2 repositories at
> > present, and probably fork and push changes to 1.
> > - spark-ec2 has mismatched dependencies with respect to spark itself.
> > This includes a confusing shim in the spark-ec2 script to install
> > boto, which frankly should just be a dependency of the script.
> >
> > Developer Facing:
> >
> > - Support across 2 repos will be worse than across 1. It's unclear
> > where to file issues/PRs, and it requires extra communication for
> > even fairly trivial stuff.
> > - spark-ec2 also depends on a number of binary blobs being in the
> > right place. Currently the responsibility for these is decentralized,
> > and likely prone to various flavors of dumb.
> > - The current flow of booting a spark-ec2 cluster is _complicated_. I
> > spent the better part of a couple of days figuring out how to
> > integrate our custom tools into this stack. This is very hard to fix
> > when commits/PRs need to span groups/repositories/buckets-o-binary.
> > I am sure there are several other problems that are languishing
> > under similar roadblocks.
> > - It makes testing possible. The spark-ec2 script is a great case for
> > CI given the number of permutations of launch criteria there are.
> > I suspect AWS would be happy to foot the bill on spark-ec2 testing
> > (probably ~20 bucks a month based on some envelope sketches), as it
> > is a piece of software that directly impacts other people giving
> > them money. I have some contacts there, and I am pretty sure this
> > would be an easy conversation, particularly if the repo is directly
> > concerned with ec2. Think also of being able to assemble the binary
> > blobs into an s3 bucket dedicated to spark-ec2.
> >
> > Any other thoughts/voices appreciated here. spark-ec2 is a
> > super-power tool and deserves a fair bit of attention!
> > --Matthew Goodman
> >
> > =====================
> > Check Out My Website: http://craneium.net
> > Find me on LinkedIn: http://tinyurl.com/d6wlch
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
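
For what it's worth, here is a rough sketch of the thin-wrapper idea from point 1: fetch the standalone code on first use, then pass the user's arguments through untouched. Nothing here is agreed on in the thread. The repo URL is a placeholder (no location has been picked yet), the assumption that the new repo keeps a spark_ec2.py entry point is mine, and it is written in Python only to keep the sketch short, whereas the real ec2/spark-ec2 wrapper would stay a shell script.

```python
import os
import subprocess

# Hypothetical location: the thread has not settled on a repository yet.
REPO_URL = "https://example.org/spark-ec2.git"


def ensure_checkout(dest, repo_url=REPO_URL, run=subprocess.check_call):
    """Clone the standalone repo into dest on first use; return the script path."""
    if not os.path.isdir(dest):
        # Only the first invocation pays the cost of the clone.
        run(["git", "clone", "--depth", "1", repo_url, dest])
    # Assumption: the new repo keeps a spark_ec2.py entry point.
    return os.path.join(dest, "spark_ec2.py")


def build_command(script, user_args, python="python"):
    """Forward the user's arguments unchanged so existing invocations keep working."""
    return [python, script] + list(user_args)
```

A wrapper along these lines would call ensure_checkout() with some cache directory, then run build_command(script, sys.argv[1:]) via subprocess, so the existing command-line keeps working while the code itself lives elsewhere.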