As the person maintaining the mesos/spark-ec2 repo, here are my two cents:

- I don't think it makes sense to put the scripts in the Spark repo itself.
Cloning the scripts onto the EC2 instances is an intentional design choice
that allows us to make minor config changes to EC2 launches without needing
a new Spark release.

- I think having a script to launch EC2 clusters as part of mainline Spark
is a nice feature to have. However, this could be a very thin wrapper
rather than the big Python file we have right now.
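To make the "thin wrapper" idea concrete, here is a minimal sketch of what such an entry point could look like: it shallow-clones the external scripts repo at runtime and delegates all arguments to the launcher there. The repo URL, branch name, and launcher filename below are illustrative assumptions, not a description of the current code.

```python
# Hypothetical sketch of a thin spark-ec2 wrapper: instead of shipping the
# big launcher inside apache/spark, fetch it at launch time and delegate.
# REPO_URL, BRANCH, and the launcher filename are assumptions for illustration.
import os
import subprocess
import sys
import tempfile

REPO_URL = "https://github.com/mesos/spark-ec2"
BRANCH = "branch-1.4"  # pinned per Spark release


def clone_command(workdir):
    # Shallow-clone only the pinned branch; full history is not needed.
    return ["git", "clone", "--depth", "1", "-b", BRANCH, REPO_URL, workdir]


def main(argv):
    workdir = tempfile.mkdtemp(prefix="spark-ec2-")
    subprocess.check_call(clone_command(workdir))
    # Hand every argument through to the real launcher in the external repo.
    launcher = os.path.join(workdir, "spark_ec2.py")
    return subprocess.call([sys.executable, launcher] + argv)


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

A wrapper like this would let config and script fixes ship from the external repo without cutting a new Spark release, while mainline Spark keeps a stable entry point.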

- Moving the scripts from the Mesos organization to spark-ec2 or amplab is
fine by me. In fact, one nice way to do this transition would be to move
the existing spark-ec2 repo to a new organization and then move the logic
from the launcher script out of Spark into the new repo.

Thanks
Shivaram




On Fri, Jul 3, 2015 at 10:36 AM, Sean Owen <so...@cloudera.com> wrote:

> I'll render an opinion although I'm only barely qualified by having
> just had a small discussion on this --
>
> It does seem like mesos/spark-ec2 is in the wrong place, although
> really, that is at best an issue for Mesos. But it does highlight that
> the Spark EC2 support doesn't entirely live with and get distributed
> with apache/spark.
>
> It does feel like that should move and should not be separate from the
> other half of EC2 support. Why not put it in apache/spark? I think the
> problem is that the AMI process clones the repo, and the apache/spark
> repo is huge. One answer is just to fix that by arranging a different
> way of releasing the EC2 files as a downloadable archive.
>
> However, if it is true that the Spark EC2 support doesn't need to live
> with and get released with the rest of Spark, it might make more sense
> to merge both halves into a new separate repo and run it separately
> from apache/spark, like any other third-party repo.
>
> I think that's less radical than it sounds, and has some benefits.
> There is not quite the same argument of needing to build and maintain
> this together like with language bindings and subprojects.
>
> But is that something that people who use and maintain it agree with
> or are advocating for?
>
> On Fri, Jul 3, 2015 at 6:23 PM, Nicholas Chammas
> <nicholas.cham...@gmail.com> wrote:
> > spark-ec2 is kind of a mini project within a project.
> >
> > It’s composed of a set of EC2 AMIs under someone’s account (maybe
> > Patrick’s?) plus the following 2 code bases:
> >
> > 1. Main command line tool: https://github.com/apache/spark/tree/master/ec2
> > 2. Scripts used to install stuff on launched instances:
> >    https://github.com/mesos/spark-ec2
> >
> > You’ll notice that part of the code lives under the Mesos GitHub
> > organization. This is an artifact of history, from when Spark itself
> > kinda grew out of Mesos before becoming its own project.
> >
> > There are a few issues with this state of affairs, none of which are
> > major but which nonetheless merit some discussion:
> >
> > 1. The spark-ec2 code is split across 2 repositories when it is not
> >    technically necessary.
> > 2. Some of that code is owned by an organization that should technically
> >    not be owning Spark stuff.
> > 3. Spark and spark-ec2 live in the same repo, but spark-ec2 issues are
> >    often completely disjoint from issues with Spark itself. This has led
> >    in some cases to new Spark RCs being cut because of minor issues with
> >    spark-ec2 (like version strings not being updated).
> >
> > I wanted to put up for discussion a few suggestions and see what people
> > agree with:
> >
> > 1. The current state of affairs is fine, and it is not worth moving
> >    stuff around.
> > 2. spark-ec2 should get its own repo and should be moved out of the main
> >    Spark repo. That means both of the code bases linked above would live
> >    in one place (maybe a spark-ec2/spark-ec2 repo).
> > 3. spark-ec2 should stay in the Spark repo, but the stuff under the Mesos
> >    organization should be moved elsewhere (again, perhaps under a
> >    spark-ec2/spark-ec2 repo).
> >
> > What do you think?
> >
> > Nick
>