> At a high level I see the spark-ec2 scripts as an effort to provide a reference implementation for launching EC2 clusters with Apache Spark
On a side note, this is precisely how I used spark-ec2 for a personal
project that does something similar: reference implementation.

Nick

On Mon, Jul 13, 2015 at 1:27 PM, Shivaram Venkataraman
<shiva...@eecs.berkeley.edu> wrote:

> I think moving the repo location and reorganizing the Python code to
> handle dependencies, testing, etc. sounds good to me. However, there are
> a couple of things I am not sure about.
>
> 1. I strongly believe that we should preserve the existing command line
> in ec2/spark-ec2 (i.e. the shell script, not the Python file). This
> could be a thin wrapper script that just checks out the repo or
> downloads something (similar to, say, build/mvn). Mainly, I see no
> reason to break the workflow that users are used to right now.
>
> 2. I am also not sure that moving the issue tracker is necessarily a
> good idea. I don't think we get a large number of issues due to the EC2
> stuff, and if we do have a workflow for launching EC2 clusters, the
> Spark JIRA would still be the natural place to report issues related to
> this.
>
> At a high level I see the spark-ec2 scripts as an effort to provide a
> reference implementation for launching EC2 clusters with Apache Spark --
> given this view, I am not sure it makes sense to completely decouple
> this from the Apache project.
>
> Thanks
> Shivaram
>
> On Sun, Jul 12, 2015 at 1:34 AM, Sean Owen <so...@cloudera.com> wrote:
>
>> I agree with these points. The EC2 support is substantially a separate
>> project, and would likely be better managed as one. People can much
>> more rapidly iterate on it and release it.
>>
>> I suggest:
>>
>> 1. Pick a new repo location. amplab/spark-ec2? spark-ec2/spark-ec2?
>> 2. Add interested parties as owners/contributors
>> 3. Reassemble a working clone of the current code from spark/ec2 and
>> mesos/spark-ec2 and check it in
>> 4. Announce the new location on user@, dev@
>> 5. Triage open JIRAs to the new repo's issue tracker and close them
>> elsewhere
>> 6.
>> Remove the old copies of the code and leave a pointer to the new
>> location in their place
>>
>> I'd also like to hear a few more nods before pulling the trigger,
>> though.
>>
>> On Sat, Jul 11, 2015 at 7:07 PM, Matt Goodman <meawo...@gmail.com> wrote:
>>
>> > I wanted to revive the conversation about the spark-ec2 tools, as it
>> > seems to have been lost in the 1.4.1 release voting spree.
>> >
>> > I think that splitting it into its own repository is a really good
>> > move, and I would also be happy to help with this transition, as well
>> > as help maintain the resulting repository. Here is my justification
>> > for why we ought to do this split.
>> >
>> > User Facing:
>> >
>> > - The spark-ec2 launcher doesn't use anything in the parent Spark
>> >   repository; the spark-ec2 version is disjoint from the parent repo.
>> >   I consider it confusing that the spark-ec2 script doesn't launch
>> >   the version of Spark it is checked out with.
>> > - Someone interested in setting up spark-ec2 with anything but the
>> >   default configuration has to clone at least 2 repositories at
>> >   present, and probably fork and push changes to 1.
>> > - spark-ec2 has mismatched dependencies with respect to Spark itself.
>> >   This includes a confusing shim in the spark-ec2 script to install
>> >   boto, which frankly should just be a dependency of the script.
>> >
>> > Developer Facing:
>> >
>> > - Support across 2 repos will be worse than across 1. It's unclear
>> >   where to file issues/PRs, and even fairly trivial changes require
>> >   extra communication.
>> > - spark-ec2 also depends on a number of binary blobs being in the
>> >   right place; currently the responsibility for these is
>> >   decentralized, and likely prone to various flavors of dumb.
>> > - The current flow of booting a spark-ec2 cluster is _complicated_.
>> >   I spent the better part of a couple days figuring out how to
>> >   integrate our custom tools into this stack.
>> >   This is very hard to fix when commits/PRs need to span
>> >   groups/repositories/buckets-o-binary. I am sure there are several
>> >   other problems that are languishing under similar roadblocks.
>> > - It makes testing possible. The spark-ec2 script is a great case for
>> >   CI, given the number of permutations of launch criteria there are.
>> >   I suspect AWS would be happy to foot the bill on spark-ec2 testing
>> >   (probably ~20 bucks a month based on some envelope sketches), as it
>> >   is a piece of software that directly impacts other people giving
>> >   them money. I have some contacts there, and I am pretty sure this
>> >   would be an easy conversation, particularly if the repo is directly
>> >   concerned with EC2. Think also of being able to assemble the binary
>> >   blobs into an S3 bucket dedicated to spark-ec2.
>> >
>> > Any other thoughts/voices appreciated here. spark-ec2 is a
>> > super-power tool and deserves a fair bit of attention!
>> >
>> > --Matthew Goodman
>> >
>> > =====================
>> > Check Out My Website: http://craneium.net
>> > Find me on LinkedIn: http://tinyurl.com/d6wlch
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>> For additional commands, e-mail: dev-h...@spark.apache.org
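[Editor's note] Shivaram's point 1, a thin ec2/spark-ec2 wrapper in the spirit of build/mvn, could be sketched roughly as below. This is a hypothetical sketch only: the repo location, the pinned version, and the cache path are placeholder assumptions, not anything decided in this thread.

```python
# Hypothetical sketch of a thin ec2/spark-ec2 wrapper in the spirit of
# build/mvn: fetch a pinned release of the externalized spark-ec2 tool on
# first use, then delegate to it, so the existing command line keeps
# working. REPO and PINNED_VERSION are placeholders; no location or
# version was decided in this thread.
import os
import subprocess
import sys

REPO = "https://github.com/amplab/spark-ec2"  # candidate location from the thread
PINNED_VERSION = "0.1.0"                      # assumed known-good release


def release_url(repo, version):
    """Build a tarball URL for a pinned release (GitHub archive layout)."""
    return "%s/archive/v%s.tar.gz" % (repo, version)


def ensure_tool(cache_root="~/.spark-ec2"):
    """Return the path of the cached launcher, fetching it if absent."""
    cache_dir = os.path.join(os.path.expanduser(cache_root), PINNED_VERSION)
    if not os.path.isdir(cache_dir):
        # First run: download release_url(REPO, PINNED_VERSION) and unpack
        # it into cache_dir (download/unpack details elided in this sketch).
        pass
    return os.path.join(cache_dir, "spark-ec2")


def main():
    # Delegate to the real tool with the user's original arguments,
    # preserving the ec2/spark-ec2 interface users already know.
    sys.exit(subprocess.call([ensure_tool()] + sys.argv[1:]))
```

Pinning a version in the wrapper also addresses Matt's point that the in-tree script does not match the Spark checkout it ships with: the wrapper makes the pairing explicit.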
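[Editor's note] On Matt's point that boto should simply be a dependency of the script: if spark-ec2 were repackaged as a standalone Python package, the dependency could be declared the usual way instead of installed by a shim. A hypothetical setup.py fragment, where the package name, version, and boto bound are illustrative assumptions:

```python
# Hypothetical setup.py for a standalone spark-ec2 package. The name,
# version, and boto version bound are illustrative assumptions, not
# decisions from this thread. Declaring boto in install_requires would
# replace the download shim in the launcher script.
from setuptools import setup

setup(
    name="spark-ec2",
    version="0.1.0",
    py_modules=["spark_ec2"],
    install_requires=["boto>=2.34.0"],  # ordinary dependency, no shim
    entry_points={
        "console_scripts": ["spark-ec2 = spark_ec2:main"],
    },
)
```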