I agree with these points. The ec2 support is substantially a separate project, and would likely be better managed as one. People can much more rapidly iterate on it and release it.
I suggest:

1. Pick a new repo location: amplab/spark-ec2? spark-ec2/spark-ec2?
2. Add interested parties as owners/contributors.
3. Reassemble a working clone of the current code from spark/ec2 and
   mesos/spark-ec2 and check it in.
4. Announce the new location on user@ and dev@.
5. Triage open JIRAs to the new repo's issue tracker and close them elsewhere.
6. Remove the old copies of the code and leave a pointer to the new location
   in their place.

I'd also like to hear a few more nods before pulling the trigger, though.

On Sat, Jul 11, 2015 at 7:07 PM, Matt Goodman <meawo...@gmail.com> wrote:
> I wanted to revive the conversation about the spark-ec2 tools, as it seems
> to have been lost in the 1.4.1 release voting spree.
>
> I think that splitting it into its own repository is a really good move, and
> I would also be happy to help with this transition, as well as help maintain
> the resulting repository. Here is my justification for why we ought to do
> this split.
>
> User Facing:
>
> - The spark-ec2 launcher doesn't use anything in the parent Spark
>   repository.
> - The spark-ec2 version is disjoint from the parent repo. I find it
>   confusing that the spark-ec2 script doesn't launch the version of Spark
>   it is checked out with.
> - Anyone interested in setting up spark-ec2 with anything but the default
>   configuration currently has to clone at least two repositories, and
>   probably fork and push changes to one of them.
> - spark-ec2 has mismatched dependencies with respect to Spark itself. This
>   includes a confusing shim in the spark-ec2 script to install boto, which
>   frankly should just be a dependency of the script.
>
> Developer Facing:
>
> - Support across two repos will be worse than across one. It's unclear
>   where to file issues/PRs, and even fairly trivial changes require extra
>   communication.
> - spark-ec2 also depends on a number of binary blobs being in the right
>   place. Responsibility for these is currently decentralized and likely
>   prone to various flavors of dumb.
> - The current flow of booting a spark-ec2 cluster is _complicated_. I spent
>   the better part of a couple of days figuring out how to integrate our
>   custom tools into this stack. This is very hard to fix when commits/PRs
>   need to span groups/repositories/buckets-o-binary, and I am sure several
>   other problems are languishing behind similar roadblocks.
> - It makes testing possible. The spark-ec2 script is a great case for CI,
>   given the number of permutations of launch criteria. I suspect AWS would
>   be happy to foot the bill for spark-ec2 testing (probably ~20 bucks a
>   month, based on some back-of-the-envelope sketches), since it is a piece
>   of software that directly leads other people to give them money. I have
>   some contacts there, and I am pretty sure this would be an easy
>   conversation, particularly if the repo were directly concerned with EC2.
>   Think also about being able to assemble the binary blobs into an S3
>   bucket dedicated to spark-ec2.
>
> Any other thoughts/voices appreciated here. spark-ec2 is a super-power tool
> and deserves a fair bit of attention!
> --Matthew Goodman
>
> =====================
> Check Out My Website: http://craneium.net
> Find me on LinkedIn: http://tinyurl.com/d6wlch