I concur with the things Sean said about keeping the same JIRA.  Frankly,
it's a pretty small part of Spark and, as Nicholas mentioned, a reference
implementation of getting Spark running on EC2.

I can see wanting to grow it into a somewhat more general tool that
implements launchers for other compute platforms.  Porting this over to the
Google/Microsoft/Rackspace offerings would not be too far out of reach.

--Matthew Goodman

=====================
Check Out My Website: http://craneium.net
Find me on LinkedIn: http://tinyurl.com/d6wlch

On Mon, Jul 13, 2015 at 2:46 PM, Nicholas Chammas <
nicholas.cham...@gmail.com> wrote:

> > At a high level I see the spark-ec2 scripts as an effort to provide a
> reference implementation for launching EC2 clusters with Apache Spark
>
> On a side note, this is precisely how I used spark-ec2 for a personal
> project that does something similar: reference implementation.
>
> Nick
> On Mon, Jul 13, 2015 at 1:27 PM, Shivaram Venkataraman <
> shiva...@eecs.berkeley.edu> wrote:
>
>> I think moving the repo location and reorganizing the Python code to
>> handle dependencies, testing, etc. sounds good to me. However, there are
>> a couple of things I am not sure about:
>>
>> 1. I strongly believe that we should preserve the existing command-line
>> interface in ec2/spark-ec2 (i.e. the shell script, not the Python file).
>> This could be a thin wrapper script that just checks out or downloads the
>> tool from its new location (similar to, say, build/mvn). Mainly, I see no
>> reason to break the workflow that users are used to right now.
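>> As a rough sketch of what such a wrapper could look like (the repo URL,
>> branch name, and cache path below are placeholders, not decided
>> locations):

```shell
#!/usr/bin/env bash
# Sketch of a thin ec2/spark-ec2 wrapper. The repo URL, branch, and
# cache path are illustrative placeholders only.
set -e

SPARK_EC2_REPO="${SPARK_EC2_REPO:-https://github.com/amplab/spark-ec2}"
SPARK_EC2_BRANCH="${SPARK_EC2_BRANCH:-master}"
CACHE_DIR="${SPARK_EC2_CACHE:-$HOME/.spark-ec2}"

# Where a given branch of the tool gets cached locally.
checkout_dir() {
  echo "$CACHE_DIR/$1"
}

run_spark_ec2() {
  local dir
  dir="$(checkout_dir "$SPARK_EC2_BRANCH")"
  # Fetch the real tool on first use, the way build/mvn fetches Maven.
  if [ ! -d "$dir" ]; then
    git clone --depth 1 --branch "$SPARK_EC2_BRANCH" "$SPARK_EC2_REPO" "$dir"
  fi
  # Forward all arguments unchanged so existing workflows keep working.
  exec python "$dir/spark_ec2.py" "$@"
}

# spark-ec2 is always invoked with a subcommand, so only dispatch when
# arguments were actually given.
if [ "$#" -gt 0 ]; then
  run_spark_ec2 "$@"
fi
```

>> Because all arguments are forwarded verbatim, users' existing
>> invocations of ec2/spark-ec2 would keep working unchanged.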
>>
>> 2. I am also not sure that moving the issue tracker is a good idea. I
>> don't think we get a large number of issues due to the EC2 stuff, and if
>> we do have a workflow for launching EC2 clusters, the Spark JIRA would
>> still be the natural place to report issues related to it.
>>
>> At a high level, I see the spark-ec2 scripts as an effort to provide a
>> reference implementation for launching EC2 clusters with Apache Spark --
>> given this view, I am not sure it makes sense to completely decouple this
>> from the Apache project.
>>
>> Thanks
>> Shivaram
>>
>> On Sun, Jul 12, 2015 at 1:34 AM, Sean Owen <so...@cloudera.com> wrote:
>>
>>> I agree with these points. The ec2 support is substantially a separate
>>> project, and would likely be better managed as one. People can much
>>> more rapidly iterate on it and release it.
>>>
>>> I suggest:
>>>
>>> 1. Pick a new repo location. amplab/spark-ec2 ? spark-ec2/spark-ec2 ?
>>> 2. Add interested parties as owners/contributors
>>> 3. Reassemble a working clone of the current code from spark/ec2 and
>>> mesos/spark-ec2 and check it in
>>> 4. Announce the new location on user@, dev@
>>> 5. Triage open JIRAs to the new repo's issue tracker and close them
>>> elsewhere
>>> 6. Remove the old copies of the code and leave a pointer to the new
>>> location in their place
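>>> For step 3, a local dry run of the reassembly (using throwaway
>>> stand-in directories rather than the real apache/spark and
>>> mesos/spark-ec2 repos) might look roughly like:

```shell
# Local dry run of step 3: reassemble one repo from the two current
# code locations. Directory names are stand-ins for the real repos.
set -e
work=$(mktemp -d)
cd "$work"

# Stand-ins for spark/ec2 (the launcher) and mesos/spark-ec2 (setup scripts).
mkdir -p spark/ec2 mesos-spark-ec2
touch spark/ec2/spark_ec2.py mesos-spark-ec2/setup.sh

# The new repo gets the setup scripts plus the launcher, in one place.
git init -q spark-ec2
cp mesos-spark-ec2/setup.sh spark/ec2/spark_ec2.py spark-ec2/
cd spark-ec2
git add .
git -c user.name=demo -c user.email=demo@example.com \
  commit -qm "Reassemble spark-ec2 from spark/ec2 and mesos/spark-ec2"
git log --oneline | head -1
```

>>> The real import would of course pull the actual files (ideally with
>>> history) from the two existing repos rather than empty stand-ins.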
>>>
>>> I'd also like to hear a few more nods before pulling the trigger though.
>>>
>>> On Sat, Jul 11, 2015 at 7:07 PM, Matt Goodman <meawo...@gmail.com>
>>> wrote:
>>> > I wanted to revive the conversation about the spark-ec2 tools, as it
>>> > seems to have been lost in the 1.4.1 release voting spree.
>>> >
>>> > I think that splitting it into its own repository is a really good
>>> > move, and I would also be happy to help with this transition, as well
>>> > as help maintain the resulting repository.  Here is my justification
>>> > for why we ought to do this split.
>>> >
>>> > User Facing:
>>> >
>>> > - The spark-ec2 launcher doesn't use anything in the parent Spark
>>> > repository.
>>> > - The spark-ec2 version is disjoint from the parent repo.  I consider
>>> > it confusing that the spark-ec2 script doesn't launch the version of
>>> > Spark it is checked out with.
>>> > - Someone interested in setting up spark-ec2 with anything but the
>>> > default configuration will have to clone at least two repositories at
>>> > present, and probably fork and push changes to one.
>>> > - spark-ec2 has mismatched dependencies with respect to Spark itself.
>>> > This includes a confusing shim in the spark-ec2 script to install
>>> > boto, which frankly should just be a declared dependency of the
>>> > script.
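>>> > A minimal sketch of the alternative (the file name and version pin
>>> > here are assumptions, not the project's actual requirement): declare
>>> > boto once in packaging metadata and drop the runtime shim entirely:

```shell
# Sketch: declare boto once in packaging metadata instead of
# installing it from inside the launcher at runtime. The version pin
# is illustrative only.
cat > requirements.txt <<'EOF'
boto>=2.34.0
EOF

# One-time, explicit install step (needs network, so not run here):
# pip install -r requirements.txt

cat requirements.txt
```

>>> > That way pip owns dependency installation, and the launcher script
>>> > can simply assume boto is importable.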
>>> >
>>> > Developer Facing:
>>> >
>>> > - Support across two repos will be worse than across one.  It's
>>> > unclear where to file issues/PRs, and even fairly trivial changes
>>> > require extra communication.
>>> > - spark-ec2 also depends on a number of binary blobs being in the
>>> > right place.  Responsibility for these is currently decentralized,
>>> > and likely prone to various flavors of breakage.
>>> > - The current flow of booting a spark-ec2 cluster is _complicated_.
>>> > I spent the better part of a couple of days figuring out how to
>>> > integrate our custom tools into this stack.  This is very hard to fix
>>> > when commits/PRs need to span groups, repositories, and buckets of
>>> > binaries, and I am sure several other problems are languishing under
>>> > similar roadblocks.
>>> > - It makes testing possible.  The spark-ec2 script is a great case
>>> > for CI given the number of permutations of launch criteria.  I
>>> > suspect AWS would be happy to foot the bill for spark-ec2 testing
>>> > (probably ~20 bucks a month based on some back-of-the-envelope
>>> > sketches), as it is a piece of software that directly leads to people
>>> > giving them money.  I have some contacts there, and I am pretty sure
>>> > this would be an easy conversation, particularly if the repo is
>>> > directly concerned with EC2.  Think also of being able to assemble
>>> > the binary blobs into an S3 bucket dedicated to spark-ec2.
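>>> > As a sketch of what such a CI job could enumerate (the flags follow
>>> > the existing spark-ec2 CLI; the instance types, spot price, and
>>> > cluster name are placeholders, and a real run would launch and then
>>> > destroy each cluster rather than just print the commands):

```shell
# Enumerate launch permutations a CI job could exercise. Commands are
# only printed here; real CI would execute them against AWS and
# destroy each cluster afterwards.
count=0
for itype in m3.large r3.large; do
  for spot in "" "--spot-price=0.10"; do
    echo "./spark-ec2 --instance-type=$itype $spot launch ci-test"
    count=$((count + 1))
  done
done
echo "permutations: $count"
```

>>> > Even this tiny matrix (two instance types, on-demand vs. spot) gives
>>> > four launch configurations, which is exactly the kind of coverage
>>> > that is impractical to check by hand today.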
>>> >
>>> > Any other thoughts/voices are appreciated here.  spark-ec2 is a
>>> > super-powerful tool and deserves a fair bit of attention!
>>> > --Matthew Goodman
>>> >
>>> > =====================
>>> > Check Out My Website: http://craneium.net
>>> > Find me on LinkedIn: http://tinyurl.com/d6wlch
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: dev-h...@spark.apache.org
>>>
>>>
>>
