Hello all,

Recently a lot of the streaming backends were moved to a separate
project on github and removed from the main Spark repo.

While I think the idea is great, I'm a little worried about the
execution. Some concerns were already raised on the bug mentioned
above, but I'd like to have a more explicit discussion about this so
things don't fall through the cracks.

I have three main concerns.

i. Ownership

That code used to be maintained under the ASF, but now it's hosted in a
GitHub repo not owned by the ASF. That sounds a little sub-optimal, if
not problematic.

ii. Governance

Similar to the above: who has commit access to those repos? Will all
Spark committers, present and future, have commit access to all of
them? Are they still going to be considered part of Spark, with
release management done through the Spark community?


For both of the questions above, why are they not turned into
sub-projects of Spark and hosted on the ASF repos? I believe there is
a mechanism to do that, without the need to keep the code in the main
Spark repo, right?

iii. Usability

This is another thing I don't see discussed. For Scala-based code
things don't change much, I guess, if the artifact names don't change
(another reason to keep things in the ASF?), but what about python?
How are pyspark users expected to get that code going forward, since
it's not in Spark's pyspark.zip anymore?


Is there an easy way of keeping these things within the ASF Spark
project? I think that would be better for everybody.

-- 
Marcelo

