I concur with what Sean said about keeping the same JIRA. Frankly, it's a pretty small part of Spark and, as Nicholas mentioned, a reference implementation for getting Spark running on EC2.
I can see wanting to grow it into a somewhat more general tool that implements launchers for other compute platforms. Porting this over to the Google/M$/Rackspace offerings would not be too far out of reach.

--Matthew Goodman

=====================
Check Out My Website: http://craneium.net
Find me on LinkedIn: http://tinyurl.com/d6wlch

On Mon, Jul 13, 2015 at 2:46 PM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:

> > At a high level I see the spark-ec2 scripts as an effort to provide a
> > reference implementation for launching EC2 clusters with Apache Spark
>
> On a side note, this is precisely how I used spark-ec2 for a personal
> project that does something similar: reference implementation.
>
> Nick
>
> On Mon, Jul 13, 2015 at 1:27 PM, Shivaram Venkataraman <shiva...@eecs.berkeley.edu> wrote:
>
>> I think moving the repo location and reorganizing the Python code to
>> handle dependencies, testing, etc. sounds good to me. However, there are
>> a couple of things I am not sure about.
>>
>> 1. I strongly believe that we should preserve the existing command line
>> in ec2/spark-ec2 (i.e. the shell script, not the Python file). This could
>> be a thin wrapper script that just checks out or downloads something
>> (similar to, say, build/mvn). Mainly, I see no reason to break the
>> workflow that users are used to right now.
>>
>> 2. I am also not sure that moving the issue tracker is necessarily a
>> good idea. I don't think we get a large number of issues due to the EC2
>> stuff, and if we do have a workflow for launching EC2 clusters, the Spark
>> JIRA would still be the natural place to report issues related to this.
>>
>> At a high level I see the spark-ec2 scripts as an effort to provide a
>> reference implementation for launching EC2 clusters with Apache Spark --
>> given this view I am not sure it makes sense to completely decouple this
>> from the Apache project.
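A thin ec2/spark-ec2 wrapper along the lines Shivaram describes could be sketched roughly as below. This is only an illustration of the build/mvn pattern: the amplab/spark-ec2 download URL, the version pin, and the spark_ec2.py entry-point name are all assumptions, not settled decisions.

```shell
#!/usr/bin/env bash
# Hypothetical thin ec2/spark-ec2 wrapper, sketched after build/mvn.
# The amplab/spark-ec2 URL, version, and entry-point name are assumptions.
set -e

spark_ec2_tarball_url() {
  # Pin a spark-ec2 release independently of the Spark checkout.
  local version="${1:-1.4.1}"
  echo "https://github.com/amplab/spark-ec2/archive/v${version}.tar.gz"
}

install_spark_ec2() {
  # Download and unpack the pinned release on first use only.
  local version="$1" dir="$2"
  if [ ! -d "$dir" ]; then
    mkdir -p "$dir"
    curl -fsSL "$(spark_ec2_tarball_url "$version")" \
      | tar -xz --strip-components=1 -C "$dir"
  fi
}

# Typical usage: install once, then forward all arguments unchanged so the
# command-line interface users rely on today keeps working.
#   install_spark_ec2 "1.4.1" "$HOME/.spark-ec2/1.4.1"
#   exec python "$HOME/.spark-ec2/1.4.1/spark_ec2.py" "$@"
```

Because the wrapper only pins and fetches a release, the Python code can move to its own repository without breaking the existing ec2/spark-ec2 workflow.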
>>
>> Thanks
>> Shivaram
>>
>> On Sun, Jul 12, 2015 at 1:34 AM, Sean Owen <so...@cloudera.com> wrote:
>>
>>> I agree with these points. The EC2 support is substantially a separate
>>> project, and would likely be better managed as one. People can much
>>> more rapidly iterate on it and release it.
>>>
>>> I suggest:
>>>
>>> 1. Pick a new repo location. amplab/spark-ec2? spark-ec2/spark-ec2?
>>> 2. Add interested parties as owners/contributors
>>> 3. Reassemble a working clone of the current code from spark/ec2 and
>>> mesos/spark-ec2 and check it in
>>> 4. Announce the new location on user@, dev@
>>> 5. Triage open JIRAs to the new repo's issue tracker and close them
>>> elsewhere
>>> 6. Remove the old copies of the code and leave a pointer to the new
>>> location in their place
>>>
>>> I'd also like to hear a few more nods before pulling the trigger, though.
>>>
>>> On Sat, Jul 11, 2015 at 7:07 PM, Matt Goodman <meawo...@gmail.com> wrote:
>>>
>>> > I wanted to revive the conversation about the spark-ec2 tools, as it
>>> > seems to have been lost in the 1.4.1 release voting spree.
>>> >
>>> > I think that splitting it into its own repository is a really good
>>> > move, and I would also be happy to help with this transition, as well
>>> > as help maintain the resulting repository. Here is my justification
>>> > for why we ought to do this split.
>>> >
>>> > User Facing:
>>> >
>>> > - The spark-ec2 launcher doesn't use anything in the parent Spark
>>> > repository; the spark-ec2 version is disjoint from the parent repo. I
>>> > consider it confusing that the spark-ec2 script doesn't launch the
>>> > version of Spark it is checked out with.
>>> > - Someone interested in setting up spark-ec2 with anything but the
>>> > default configuration currently has to clone at least two
>>> > repositories, and probably fork and push changes to one.
>>> > - spark-ec2 has mismatched dependencies with respect to Spark itself.
>>> > This includes a confusing shim in the spark-ec2 script to install
>>> > boto, which frankly should just be a dependency of the script.
>>> >
>>> > Developer Facing:
>>> >
>>> > - Support across two repos will be worse than across one. It's unclear
>>> > where to file issues/PRs, and it requires extra communication for even
>>> > fairly trivial stuff.
>>> > - spark-ec2 also depends on a number of binary blobs being in the
>>> > right place; currently the responsibility for these is decentralized,
>>> > and likely prone to various flavors of dumb.
>>> > - The current flow of booting a spark-ec2 cluster is _complicated_. I
>>> > spent the better part of a couple of days figuring out how to
>>> > integrate our custom tools into this stack. This is very hard to fix
>>> > when commits/PRs need to span groups/repositories/buckets-o-binary,
>>> > and I am sure there are several other problems languishing under
>>> > similar roadblocks.
>>> > - It makes testing possible. The spark-ec2 script is a great case for
>>> > CI given the number of permutations of launch criteria there are. I
>>> > suspect AWS would be happy to foot the bill for spark-ec2 testing
>>> > (probably ~20 bucks a month based on some envelope sketches), as it is
>>> > a piece of software that directly impacts other people giving them
>>> > money. I have some contacts there, and I am pretty sure this would be
>>> > an easy conversation, particularly if the repo is directly concerned
>>> > with EC2. Think also of being able to assemble the binary blobs into
>>> > an S3 bucket dedicated to spark-ec2.
>>> >
>>> > Any other thoughts/voices appreciated here. spark-ec2 is a super-power
>>> > tool and deserves a fair bit of attention!
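The boto shim Matt mentions could be retired by declaring the dependency instead. For instance, a split-out repository could carry a requirements file; the exact version pin below is a guess, not the project's real constraint:

```
# requirements.txt for a hypothetical split-out spark-ec2 repo.
# Declaring boto here would replace the download shim in the launcher
# script: pip installs it automatically alongside the tool.
boto>=2.34.0
```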
>>> > --Matthew Goodman
>>> >
>>> > =====================
>>> > Check Out My Website: http://craneium.net
>>> > Find me on LinkedIn: http://tinyurl.com/d6wlch
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: dev-h...@spark.apache.org