On Sat, Apr 16, 2016 at 5:38 PM, Evan Chan <velvia.git...@gmail.com> wrote:
> Hi folks,
>
> Sorry to join the discussion late. I had a look at the design doc
> earlier in this thread, and it was not mentioned what types of
> projects are the targets of this new "spark extras" ASF umbrella....
>
> Is the desire to have a maintained set of spark-related projects that
> keep pace with the main Spark development schedule? Is it just for
> streaming connectors? What about data sources, and other important
> projects in the Spark ecosystem?
>

The proposal draft below has more details on the types of projects, but
in summary, "Spark-Extras" would be a good place for any of the
components you mentioned.

https://docs.google.com/document/d/1zRFGG4414LhbKlGbYncZ13nyX34Rw4sfWhZRA5YBtIE/edit?usp=sharing

>
> I'm worried that this would relegate spark-packages to third tier
> status,

Owen answered a similar question about spark-packages earlier in this
thread. While "Spark-Extras" would be a place in Apache for collaboration
on the development of these extensions, they might still be published to
spark-packages, as the existing streaming connectors are today.

> and the promotion of a select set of committers, and the
> project itself, to top level ASF status (a la Arrow) would create a
> further split in the community.
>

As for the select set of committers, we have invited all Spark committers
to be committers on the project, and I have updated the project proposal
with the existing set of active Spark committers (those who have
committed in the last year).

>
> -Evan
>
> On Sat, Apr 16, 2016 at 4:46 AM, Steve Loughran <ste...@hortonworks.com>
> wrote:
>
> >
> > On 15/04/2016, 17:41, "Mattmann, Chris A (3980)" <
> > chris.a.mattm...@jpl.nasa.gov> wrote:
> >
> >>Yeah in support of this statement I think that my primary interest in
> >>this Spark Extras and the good work by Luciano here is that anytime we
> >>take bits out of a code base and “move it to GitHub” I see a bad
> >>precedent being set.
> >>
> >>Creating this project at the ASF creates a synergy between *Apache Spark*
> >>which is *at the ASF*.
> >>
> >>We welcome comments and as Luciano said, this is meant to invite and be
> >>open to those in the Apache Spark PMC to join and help.
> >>
> >>Cheers,
> >>Chris
> >
> > As one of the people named, here's my rationale:
> >
> > Throwing stuff into github creates that world of branches, and it's no
> > longer something that could be managed through the ASF, where managed is:
> > governance, participation and a release process that includes auditing
> > dependencies, code-signoff, etc.
> >
> > As an example, there's a mutant hive JAR which spark uses, that's
> > something which currently evolved between my repo and Patrick Wendell's;
> > now that Josh Rosen has taken on the bold task of "trying to move spark and
> > twill to Kryo 3", he's going to own that code, and now the reference branch
> > will move somewhere else.
> >
> > In contrast, if there was an ASF location for this, then it'd be
> > something anyone with commit rights could maintain and publish
> >
> > (actually, I've just realised life is hard here as the hive is a fork of
> > ASF hive; really the spark branch should be a separate branch in Hive's own
> > repo ... But the concept is the same: those bits of the codebase which are
> > core parts of the spark project should really live in or near it)
> >
> > If everyone on the spark commit list gets write access to this extras
> > repo, moving things is straightforward. Release wise, things could/should
> > be in sync.
> >
> > If there's a risk, it's the eternal problem of the contrib/ dir ....
> > Stuff ends up there that never gets maintained. I don't see that being any
> > worse than if things were thrown to the wind of a thousand github repos: at
> > least now there'd be a central issue tracking location.
>

-- 
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/