On 15/04/2016, 17:41, "Mattmann, Chris A (3980)"
<chris.a.mattm...@jpl.nasa.gov> wrote:
>Yeah in support of this statement I think that my primary interest in
>this Spark Extras and the good work by Luciano here is that anytime we
>take bits out of a code base and “move it to GitHub” I see a bad precedent
>being set.
>
>Creating this project at the ASF creates a synergy between *Apache Spark*
>which is *at the ASF*.
>
>We welcome comments and as Luciano said, this is meant to invite and be
>open to those in the Apache Spark PMC to join and help.
>
>Cheers,
>Chris
As one of the people named, here's my rationale:
Throwing stuff onto GitHub creates that world of branches, and it's no longer
something that can be managed through the ASF, where "managed" means governance,
participation and a release process that includes auditing dependencies,
code sign-off, etc.
As an example, there's a mutant Hive JAR which Spark uses. So far it has
evolved between my repo and Patrick Wendell's; now that Josh Rosen has taken
on the bold task of "trying to move spark and twill to Kryo 3", he's going to
own that code, and the reference branch will move somewhere else yet again.
In contrast, if there were an ASF location for this, then it'd be something
anyone with commit rights could maintain and publish.
(Actually, I've just realised life is hard here, as that Hive JAR is a fork of
ASF Hive; really the Spark branch should be a separate branch in Hive's own
repo. But the concept is the same: those bits of the codebase which are core
parts of the Spark project should really live in or near it.)
If everyone on the Spark committer list gets write access to this extras repo,
moving things is straightforward. Release-wise, things could/should be in sync.
If there's a risk, it's the eternal problem of the contrib/ dir: stuff ends
up there that never gets maintained. I don't see that being any worse than if
things were thrown to the wind of a thousand GitHub repos; at least now there'd
be a central issue tracking location.