Code can be removed from an ASF project, and that code can live on
elsewhere (in accordance with the license).

Once removed, it can't be presented as part of the official ASF
project; it's like any other third-party project. The package name
certainly must change from org.apache.spark.
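
To make that concrete, imports would change roughly like this (the
new namespace below is just a placeholder; whatever the new owners
pick, it can't be org.apache.*):

  // Today, as part of Apache Spark:
  import org.apache.spark.streaming.twitter.TwitterUtils

  // After the move, under some hypothetical external namespace:
  import org.spark_project.streaming.twitter.TwitterUtils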

I don't know of a protocol for this, but common sense dictates a
good-faith effort to offer equivalent access to the code (e.g.
interested committers should probably be repo owners too).

This differs from "any other code deletion" in that there's an intent
to keep working on the code, but outside the project.
More discussion -- like this one -- would have been useful beforehand,
but nothing here is irreversible.

Backwards compatibility is not a good argument here, because we're
talking about Spark 2.x, and we're already talking about distributing
the code differently.

Is the reason for this change decoupling releases, or changing
governance? It seems like the former, but we don't actually need the
latter to achieve that. There's an argument for a new repo, but that
is not an argument for moving a module out of the project per se.

I'm sure doing this within the ASF means more overhead, but if
changing governance is a non-goal, there's no choice. Convenience
can't trump that.

Kafka integration is clearly more important than the others, and it
seems like it needs to stay within the project. However, this still
leaves a packaging problem to solve, which might need a new repo.
That is orthogonal.
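
To illustrate the packaging side: if the Kafka integration were
released as its own artifact (coordinates and versions below are
purely illustrative), users could pin it independently of Spark core
in their build:

  // build.sbt -- hypothetical coordinates and versions, just a sketch
  libraryDependencies ++= Seq(
    "org.apache.spark" %% "spark-core" % "2.0.0" % "provided",
    "org.apache.spark" %% "spark-streaming-kafka" % "2.1.0"
  )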


Here's what I think:

1. Leave the moved modules outside the project entirely
  (why not Kinesis though? that one was not made clear)
2. Change package names and make sure they're clearly presented as external
3. Add any committers that want to be repo owners as owners
4. Keep Kafka within the project
5. Add subprojects within the current project as needed to
accomplish distribution goals (a rough sketch follows this list)
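
On point 5: a minimal sketch of what a distribution-oriented
subproject could look like in an sbt build (module and path names are
made up for illustration):

  // build.sbt -- hypothetical module layout, not Spark's actual build
  lazy val core = (project in file("core"))

  // Kafka integration as its own module, releasable as a separate
  // artifact while staying under the same ASF repo and governance:
  lazy val streamingKafka = (project in file("external/kafka"))
    .dependsOn(core)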

On Thu, Mar 17, 2016 at 6:14 PM, Marcelo Vanzin <van...@cloudera.com> wrote:
> Hello all,
>
> Recently a lot of the streaming backends were moved to a separate
> project on github and removed from the main Spark repo.
>
> While I think the idea is great, I'm a little worried about the
> execution. Some concerns were already raised on the bug mentioned
> above, but I'd like to have a more explicit discussion about this so
> things don't fall through the cracks.
>
> Mainly I have three concerns.
>
> i. Ownership
>
> That code used to be run by the ASF, but now it's hosted in a github
> repo that is not owned by the ASF. That sounds a little sub-optimal,
> if not problematic.
>
> ii. Governance
>
> Similar to the above; who has commit access to the above repos? Will
> all the Spark committers, present and future, have commit access to
> all of those repos? Are they still going to be considered part of
> Spark and have release management done through the Spark community?
>
>
> For both of the questions above, why are they not turned into
> sub-projects of Spark and hosted on the ASF repos? I believe there is
> a mechanism to do that, without the need to keep the code in the main
> Spark repo, right?
>
> iii. Usability
>
> This is another thing I don't see discussed. For Scala-based code
> things don't change much, I guess, if the artifact names don't change
> (another reason to keep things in the ASF?), but what about python?
> How are pyspark users expected to get that code going forward, since
> it's not in Spark's pyspark.zip anymore?
>
>
> Is there an easy way of keeping these things within the ASF Spark
> project? I think that would be better for everybody.
>
> --
> Marcelo
