spark-ec2 is kind of a mini project within a project.

It’s composed of a set of EC2 AMIs
<https://github.com/mesos/spark-ec2/tree/branch-1.4/ami-list> under
someone’s account (maybe Patrick’s?) plus the following 2 code bases:

   - Main command line tool: https://github.com/apache/spark/tree/master/ec2
   - Scripts used to install stuff on launched instances:
   https://github.com/mesos/spark-ec2

You’ll notice that part of the code lives under the Mesos GitHub
organization. This is an artifact of history, from when Spark itself kinda
grew out of Mesos before becoming its own project.

There are a few issues with this state of affairs. None of them are major,
but they nonetheless merit some discussion:

   - The spark-ec2 code is split across 2 repositories when it is not
   technically necessary.
   - Some of that code is owned by an organization that should technically
   not be owning Spark stuff.
   - Spark and spark-ec2 live in the same repo but spark-ec2 issues are
   often completely disjoint from issues with Spark itself. This has led in
   some cases to new Spark RCs being cut because of minor issues with
   spark-ec2 (like version strings not being updated).

I wanted to put a few options up for discussion and see which one people
agree with:

   1. The current state of affairs is fine and it is not worth moving stuff
   around.
   2. spark-ec2 should get its own repo, and should be moved out of the
   main Spark repo. That means both of the code bases linked above would live
   in one place (maybe a spark-ec2/spark-ec2 repo).
   3. spark-ec2 should stay in the Spark repo, but the stuff under the
   Mesos organization should be moved elsewhere (again, perhaps under a
   spark-ec2/spark-ec2 repo).

What do you think?

Nick
