I'll offer an opinion, although I'm only barely qualified, having just had a brief discussion about this --
It does seem like mesos/spark-ec2 is in the wrong place, although, really, that is at best an issue for Mesos. But it does highlight that the Spark EC2 support doesn't entirely live with, and get distributed with, apache/spark.

It does feel like that should move, and should not be separate from the other half of the EC2 support. Why not put it in apache/spark? I think the problem is that the AMI process clones the repo, and the apache/spark repo is huge. One answer is simply to fix that by arranging a different way of releasing the EC2 files, such as a downloadable archive.

However, if it is true that the Spark EC2 support doesn't need to live with and get released with the rest of Spark, it might make more sense to merge both halves into a new, separate repo and run it separately from apache/spark, like any other third-party repo. I think that's less radical than it sounds, and it has some benefits. There isn't quite the same argument for needing to build and maintain this together as there is with language bindings and subprojects. But is that something that the people who use and maintain it agree with, or are advocating for?

On Fri, Jul 3, 2015 at 6:23 PM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:

> spark-ec2 is kind of a mini project within a project.
>
> It’s composed of a set of EC2 AMIs under someone’s account (maybe
> Patrick’s?) plus the following 2 code bases:
>
> 1. Main command line tool: https://github.com/apache/spark/tree/master/ec2
> 2. Scripts used to install stuff on launched instances:
>    https://github.com/mesos/spark-ec2
>
> You’ll notice that part of the code lives under the Mesos GitHub
> organization. This is an artifact of history, when Spark itself kinda
> grew out of Mesos before becoming its own project.
>
> There are a few issues with this state of affairs, none of which are
> major but which nonetheless merit some discussion:
>
> 1. The spark-ec2 code is split across 2 repositories when it is not
>    technically necessary.
> 2. Some of that code is owned by an organization that should
>    technically not be owning Spark stuff.
> 3. Spark and spark-ec2 live in the same repo, but spark-ec2 issues are
>    often completely disjoint from issues with Spark itself. This has
>    led in some cases to new Spark RCs being cut because of minor
>    issues with spark-ec2 (like version strings not being updated).
>
> I wanted to put up for discussion a few suggestions and see what
> people agreed with.
>
> 1. The current state of affairs is fine, and it is not worth moving
>    stuff around.
> 2. spark-ec2 should get its own repo and should be moved out of the
>    main Spark repo. That means both of the code bases linked above
>    would live in one place (maybe a spark-ec2/spark-ec2 repo).
> 3. spark-ec2 should stay in the Spark repo, but the stuff under the
>    Mesos organization should be moved elsewhere (again, perhaps under
>    a spark-ec2/spark-ec2 repo).
>
> What do you think?
>
> Nick

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org