> At a high level I see the spark-ec2 scripts as an effort to provide a reference implementation for launching EC2 clusters with Apache Spark
On a side note, this is precisely how I used spark-ec2 for a personal
project that does something similar: reference implementation.

Nick

On Mon, Jul 13, 2015 at 1:27 PM, Shivaram Venkataraman
<shiva...@eecs.berkeley.edu> wrote:

> I think moving the repo location and reorganizing the Python code to
> handle dependencies, testing, etc. sounds good to me. However, there are
> a couple of things I am not sure about.
>
> 1. I strongly believe that we should preserve the existing command line
> in ec2/spark-ec2 (i.e. the shell script, not the Python file). This
> could be a thin wrapper script that just checks out the repo or
> downloads something (similar to, say, build/mvn). Mainly, I see no
> reason to break the workflow that users are used to right now.
>
> 2. I am also not sure that moving the issue tracker is necessarily a
> good idea. I don't think we get a large number of issues due to the EC2
> stuff, and if we do have a workflow for launching EC2 clusters, the
> Spark JIRA would still be the natural place to report issues related to
> this.
>
> At a high level I see the spark-ec2 scripts as an effort to provide a
> reference implementation for launching EC2 clusters with Apache Spark --
> given this view, I am not sure it makes sense to completely decouple
> this from the Apache project.
>
> Thanks
> Shivaram
>
> On Sun, Jul 12, 2015 at 1:34 AM, Sean Owen <so...@cloudera.com> wrote:
>
>> I agree with these points. The EC2 support is substantially a separate
>> project, and would likely be better managed as one. People can much
>> more rapidly iterate on it and release it.
>>
>> I suggest:
>>
>> 1. Pick a new repo location. amplab/spark-ec2? spark-ec2/spark-ec2?
>> 2. Add interested parties as owners/contributors
>> 3. Reassemble a working clone of the current code from spark/ec2 and
>> mesos/spark-ec2 and check it in
>> 4. Announce the new location on user@, dev@
>> 5. Triage open JIRAs to the new repo's issue tracker and close them
>> elsewhere
>> 6.
>> Remove the old copies of the code and leave a pointer to the new
>> location in their place
>>
>> I'd also like to hear a few more nods before pulling the trigger,
>> though.
>>
>> On Sat, Jul 11, 2015 at 7:07 PM, Matt Goodman <meawo...@gmail.com> wrote:
>>
>> > I wanted to revive the conversation about the spark-ec2 tools, as it
>> > seems to have been lost in the 1.4.1 release voting spree.
>> >
>> > I think that splitting it into its own repository is a really good
>> > move, and I would also be happy to help with this transition, as well
>> > as help maintain the resulting repository. Here is my justification
>> > for why we ought to do this split.
>> >
>> > User Facing:
>> >
>> > - The spark-ec2 launcher doesn't use anything in the parent Spark
>> >   repository; the spark-ec2 version is disjoint from the parent repo.
>> >   I consider it confusing that the spark-ec2 script doesn't launch
>> >   the version of Spark it is checked out with.
>> > - Someone interested in setting up spark-ec2 with anything but the
>> >   default configuration has to clone at least 2 repositories at
>> >   present, and probably fork and push changes to 1.
>> > - spark-ec2 has mismatched dependencies with respect to Spark itself.
>> >   This includes a confusing shim in the spark-ec2 script to install
>> >   boto, which frankly should just be a dependency of the script.
>> >
>> > Developer Facing:
>> >
>> > - Support across 2 repos will be worse than across 1. It's unclear
>> >   where to file issues/PRs, and even fairly trivial changes require
>> >   extra communication.
>> > - spark-ec2 also depends on a number of binary blobs being in the
>> >   right place; currently the responsibility for these is
>> >   decentralized, and likely prone to various flavors of dumb.
>> > - The current flow of booting a spark-ec2 cluster is _complicated_.
>> >   I spent the better part of a couple days figuring out how to
>> >   integrate our custom tools into this stack.
>> >   This is very hard to fix when commits/PRs need to span
>> >   groups/repositories/buckets-o-binary. I am sure there are several
>> >   other problems that are languishing under similar roadblocks.
>> > - It makes testing possible. The spark-ec2 script is a great case for
>> >   CI, given the number of permutations of launch criteria there are.
>> >   I suspect AWS would be happy to foot the bill on spark-ec2 testing
>> >   (probably ~20 bucks a month based on some envelope sketches), as it
>> >   is a piece of software that directly impacts other people giving
>> >   them money. I have some contacts there, and I am pretty sure this
>> >   would be an easy conversation, particularly if the repo is directly
>> >   concerned with EC2. Think also of being able to assemble the binary
>> >   blobs into an S3 bucket dedicated to spark-ec2.
>> >
>> > Any other thoughts/voices appreciated here. spark-ec2 is a
>> > super-power tool and deserves a fair bit of attention!
>> >
>> > --Matthew Goodman
>> >
>> > =====================
>> > Check Out My Website: http://craneium.net
>> > Find me on LinkedIn: http://tinyurl.com/d6wlch
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>> For additional commands, e-mail: dev-h...@spark.apache.org
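[Editor's note] Shivaram's point 1, a thin ec2/spark-ec2 wrapper in the spirit of build/mvn, could be sketched roughly as below. This is a hypothetical sketch only: the repo location, the pinned version, and the cache path are placeholder assumptions, not anything decided in this thread.

```python
# Hypothetical sketch of a thin ec2/spark-ec2 wrapper in the spirit of
# build/mvn: fetch a pinned release of the externalized spark-ec2 tool on
# first use, then delegate to it, so the existing command line keeps
# working. REPO and PINNED_VERSION are placeholders; no location or
# version was decided in this thread.
import os
import subprocess
import sys

REPO = "https://github.com/amplab/spark-ec2"  # candidate location from the thread
PINNED_VERSION = "0.1.0"                      # assumed known-good release


def release_url(repo, version):
    """Build a tarball URL for a pinned release (GitHub archive layout)."""
    return "%s/archive/v%s.tar.gz" % (repo, version)


def ensure_tool(cache_root="~/.spark-ec2"):
    """Return the path of the cached launcher, fetching it if absent."""
    cache_dir = os.path.join(os.path.expanduser(cache_root), PINNED_VERSION)
    if not os.path.isdir(cache_dir):
        # First run: download release_url(REPO, PINNED_VERSION) and unpack
        # it into cache_dir (download/unpack details elided in this sketch).
        pass
    return os.path.join(cache_dir, "spark-ec2")


def main():
    # Delegate to the real tool with the user's original arguments,
    # preserving the ec2/spark-ec2 interface users already know.
    sys.exit(subprocess.call([ensure_tool()] + sys.argv[1:]))
```

Pinning a version in the wrapper also addresses Matt's point that the in-tree script does not match the Spark checkout it ships with: the wrapper makes the pairing explicit.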
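[Editor's note] On Matt's point that boto should simply be a dependency of the script: if spark-ec2 were repackaged as a standalone Python package, the dependency could be declared the usual way instead of installed by a shim. A hypothetical setup.py fragment, where the package name, version, and boto bound are illustrative assumptions:

```python
# Hypothetical setup.py for a standalone spark-ec2 package. The name,
# version, and boto version bound are illustrative assumptions, not
# decisions from this thread. Declaring boto in install_requires would
# replace the download shim in the launcher script.
from setuptools import setup

setup(
    name="spark-ec2",
    version="0.1.0",
    py_modules=["spark_ec2"],
    install_requires=["boto>=2.34.0"],  # ordinary dependency, no shim
    entry_points={
        "console_scripts": ["spark-ec2 = spark_ec2:main"],
    },
)
```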