Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-28 Thread Luciano Resende
Just want to provide a quick update that we have submitted the "Spark Extras" proposal for review by the Apache board (see link below with the contents). https://docs.google.com/document/d/1zRFGG4414LhbKlGbYncZ13nyX34Rw4sfWhZRA5YBtIE/edit?usp=sharing Note that we are in the quest for a project

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-18 Thread Luciano Resende
Evan, As long as you meet the criteria we discussed on this thread, you are welcome to join. Having said that, I have already seen other contributors that are very active on some of the connectors but are not Apache Committers yet, and I wanted to be fair, and also avoid using the project as an

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-17 Thread Luciano Resende
On Sat, Apr 16, 2016 at 11:12 PM, Reynold Xin wrote: > First, really thank you for leading the discussion. > > I am concerned that it'd hurt Spark more than it helps. As many others > have pointed out, this unnecessarily creates a new tier of connectors or > 3rd party libraries

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-17 Thread Reynold Xin
First, really thank you for leading the discussion. I am concerned that it'd hurt Spark more than it helps. As many others have pointed out, this unnecessarily creates a new tier of connectors or 3rd party libraries appearing to be endorsed by the Spark PMC or the ASF. We can alleviate this

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-16 Thread Luciano Resende
On Sat, Apr 16, 2016 at 5:38 PM, Evan Chan wrote: > Hi folks, > > Sorry to join the discussion late. I had a look at the design doc > earlier in this thread, and it was not mentioned what types of > projects are the targets of this new "spark extras" ASF umbrella >

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-16 Thread Evan Chan
Hi folks, Sorry to join the discussion late. I had a look at the design doc earlier in this thread, and it was not mentioned what types of projects are the targets of this new "spark extras" ASF umbrella Is the desire to have a maintained set of spark-related projects that keep pace with

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-16 Thread Steve Loughran
On 15/04/2016, 17:41, "Mattmann, Chris A (3980)" wrote: >Yeah in support of this statement I think that my primary interest in >this Spark Extras and the good work by Luciano here is that anytime we >take bits out of a code base and “move it to GitHub” I see

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-15 Thread Mridul Muralidharan
On Friday, April 15, 2016, Mattmann, Chris A (3980) < chris.a.mattm...@jpl.nasa.gov> wrote: > Yeah in support of this statement I think that my primary interest in > this Spark Extras and the good work by Luciano here is that anytime we > take bits out of a code base and “move it to GitHub” I see

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-15 Thread Cody Koeninger
100% agree with Sean & Reynold's comments on this. Adding this as a TLP would just cause more confusion as to "official" endorsement. On Fri, Apr 15, 2016 at 11:50 AM, Sean Owen wrote: > On Fri, Apr 15, 2016 at 5:34 PM, Luciano Resende wrote: >> I

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-15 Thread Mattmann, Chris A (3980)
Yeah in support of this statement I think that my primary interest in this Spark Extras and the good work by Luciano here is that anytime we take bits out of a code base and “move it to GitHub” I see a bad precedent being set. Creating this project at the ASF creates a synergy between *Apache

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-15 Thread Mattmann, Chris A (3980)
Hey Reynold, Thanks. Getting to the heart of this, I think that this project would be successful if the Apache Spark PMC decided to participate and there was some overlap. As much as I think it would be great to stand up another project, the goal here from Luciano and crew (myself included) would

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-15 Thread Jean-Baptiste Onofré
+1 Regards JB On 04/15/2016 06:41 PM, Mattmann, Chris A (3980) wrote: Yeah in support of this statement I think that my primary interest in this Spark Extras and the good work by Luciano here is that anytime we take bits out of a code base and “move it to GitHub” I see a bad precedent being

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-15 Thread Sean Owen
On Fri, Apr 15, 2016 at 5:34 PM, Luciano Resende wrote: > I know the name might be confusing, but I also think that the projects have > a very big synergy, more like sibling projects, where "Spark Extras" extends > the Spark community and develop/maintain components for, and

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-15 Thread Luciano Resende
On Fri, Apr 15, 2016 at 9:34 AM, Cody Koeninger wrote: > Given that not all of the connectors were removed, I think this > creates a weird / confusing three tier system > > 1. connectors in the official project's spark/extras or spark/external > 2. connectors in "Spark

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-15 Thread Luciano Resende
On Fri, Apr 15, 2016 at 9:18 AM, Sean Owen wrote: > Why would this need to be an ASF project of its own? I don't think > it's possible to have a yet another separate "Spark Extras" TLP (?) > > There is already a project to manage these bits of code on Github. How > about all

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-15 Thread Cody Koeninger
Given that not all of the connectors were removed, I think this creates a weird / confusing three tier system 1. connectors in the official project's spark/extras or spark/external 2. connectors in "Spark Extras" 3. connectors in some random organization's github On Fri, Apr 15, 2016 at 11:18

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-15 Thread Chris Fregly
and how does this all relate to the existing 1-and-a-half-class citizen known as spark-packages.org? support for this citizen is buried deep in the Spark source (which was always a bit odd, in my opinion): https://github.com/apache/spark/search?utf8=%E2%9C%93=spark-packages On Fri, Apr 15,

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-15 Thread Sean Owen
Why would this need to be an ASF project of its own? I don't think it's possible to have a yet another separate "Spark Extras" TLP (?) There is already a project to manage these bits of code on Github. How about all of the interested parties manage the code there, under the same process, under

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-15 Thread Luciano Resende
After some collaboration with other community members, we have created an initial draft for Spark Extras which is available for review at https://docs.google.com/document/d/1zRFGG4414LhbKlGbYncZ13nyX34Rw4sfWhZRA5YBtIE/edit?usp=sharing We would like to invite other community members to participate

Re: SPARK-13843 and future of streaming backends

2016-03-28 Thread Cody Koeninger
Are you talking about group/identifier name, or contained classes? Because there are plenty of org.apache.* classes distributed via maven with non-apache group / identifiers. On Fri, Mar 25, 2016 at 6:54 PM, David Nalley wrote: > >> As far as group / artifact name

Re: SPARK-13843 and future of streaming backends

2016-03-26 Thread Mridul Muralidharan
On Saturday, March 26, 2016, Sean Owen wrote: > This has been resolved; see the JIRA and related PRs but also > > http://apache-spark-developers-list.1001551.n3.nabble.com/SPARK-13843-Next-steps-td16783.html > > This change happened subsequent to current thread (thanks

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-03-26 Thread Jean-Baptiste Onofré
Hi Luciano, I didn't mean Spark proper, but more something like you proposed. Regards JB On 03/26/2016 06:38 PM, Luciano Resende wrote: On Sat, Mar 26, 2016 at 10:20 AM, Jean-Baptiste Onofré > wrote: Hi Luciano, If we take the "pure"

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-03-26 Thread Luciano Resende
On Sat, Mar 26, 2016 at 10:20 AM, Jean-Baptiste Onofré wrote: > Hi Luciano, > > If we take the "pure" technical vision, there's pros and cons of having > spark-extra (or whatever the name we give) still as an Apache project: > > Pro: > - Governance & Quality Insurance: we

Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-03-26 Thread Luciano Resende
I believe some of this has been resolved in the context of some parts that had interest in one extra connector, but we still have a few removed, and as you mentioned, we still don't have a simple way or willingness to manage and be current on new packages like kafka. And based on the fact that

Re: SPARK-13843 and future of streaming backends

2016-03-26 Thread Sean Owen
This has been resolved; see the JIRA and related PRs but also http://apache-spark-developers-list.1001551.n3.nabble.com/SPARK-13843-Next-steps-td16783.html This is not a scenario where a [VOTE] needs to take place, and code changes don't proceed through PMC votes. From the project perspective,

Re: SPARK-13843 and future of streaming backends

2016-03-26 Thread Jacek Laskowski
Hi, Although I'm not a very experienced member of the ASF, I share your concerns. I haven't looked at the issue from this point of view, but after having read the thread I think the PMC should've signed off on the migration of ASF-owned code to a non-ASF repo. At least a vote is required (and this

Re: SPARK-13843 and future of streaming backends

2016-03-25 Thread David Nalley
> As far as group / artifact name compatibility, at least in the case of > Kafka we need different artifact names anyway, and people are going to > have to make changes to their build files for spark 2.0 anyway. As > far as keeping the actual classes in org.apache.spark to not break > code

Re: SPARK-13843 and future of streaming backends

2016-03-20 Thread Marcelo Vanzin
Hi Reynold, thanks for the info. On Thu, Mar 17, 2016 at 2:18 PM, Reynold Xin wrote: > If one really feels strongly that we should go through all the overhead to > setup an ASF subproject for these modules that won't work with the new > structured streaming, and want to

Re: SPARK-13843 and future of streaming backends

2016-03-19 Thread Steve Loughran
Spark has hit one of the eternal problems of OSS projects, one hit by: ant, maven, hadoop, ... anything with a plugin model. Take in the plugin: you're in control, but also down for maintenance Leave out the plugin: other people can maintain it, be more agile, etc. But you've lost control,

Re: SPARK-13843 and future of streaming backends

2016-03-19 Thread Hari Shreedharan
I have worked with various ASF projects for 4+ years now. Sure, ASF projects can delete code as they feel fit. But this is the first time I have really seen code being "moved out" of a project without discussion. I am sure you can do this without violating ASF policy, but the explanation for that

Re: SPARK-13843 and future of streaming backends

2016-03-19 Thread Imran Rashid
On Thu, Mar 17, 2016 at 2:55 PM, Cody Koeninger wrote: > Why would a PMC vote be necessary on every code deletion? > Certainly PMC votes are not necessary on *every* code deletion. I don't think there is a very clear rule on when such discussion is warranted, just a soft

Re: SPARK-13843 and future of streaming backends

2016-03-19 Thread Cody Koeninger
There's a difference between "without discussion" and "without as much discussion as I would have liked to have a chance to notice it". There are plenty of PRs that got merged before I noticed them that I would rather have not gotten merged. As far as group / artifact name compatibility, at least

Re: SPARK-13843 and future of streaming backends

2016-03-19 Thread Marcelo Vanzin
On Thu, Mar 17, 2016 at 12:01 PM, Cody Koeninger wrote: > i. An ASF project can clearly decide that some of its code is no > longer worth maintaining and delete it. This isn't really any > different. It's still apache licensed so ultimately whoever wants the > code can get

Re: SPARK-13843 and future of streaming backends

2016-03-19 Thread Marcelo Vanzin
On Fri, Mar 18, 2016 at 10:09 AM, Jean-Baptiste Onofré wrote: > a project can have multiple repos: it's what we have in ServiceMix, in > Karaf. > For the *-extra on github, if the code has been in the ASF, the PMC members > have to vote to move the code on *-extra. That's

SPARK-13843 and future of streaming backends

2016-03-19 Thread Marcelo Vanzin
Hello all, Recently a lot of the streaming backends were moved to a separate project on github and removed from the main Spark repo. While I think the idea is great, I'm a little worried about the execution. Some concerns were already raised on the bug mentioned above, but I'd like to have a

Re: SPARK-13843 and future of streaming backends

2016-03-19 Thread Mridul Muralidharan
I was not aware of a discussion in Dev list about this - agree with most of the observations. In addition, I did not see PMC signoff on moving (sub-)modules out. Regards Mridul On Thursday, March 17, 2016, Marcelo Vanzin wrote: > Hello all, > > Recently a lot of the

Re: SPARK-13843 and future of streaming backends

2016-03-19 Thread Adam Kocoloski
> On Mar 19, 2016, at 8:32 AM, Steve Loughran wrote: > > >> On 18 Mar 2016, at 17:07, Marcelo Vanzin wrote: >> >> Hi Steve, thanks for the write up. >> >> On Fri, Mar 18, 2016 at 3:12 AM, Steve Loughran >> wrote: >>> If

Re: SPARK-13843 and future of streaming backends

2016-03-19 Thread Shane Curcuru
Marcelo Vanzin wrote earlier: > Recently a lot of the streaming backends were moved to a separate > project on github and removed from the main Spark repo. Question: why was the code removed from the Spark repo? What's the harm in keeping it available here? The ASF is perfectly happy if anyone

Re: SPARK-13843 and future of streaming backends

2016-03-19 Thread Jean-Baptiste Onofré
Hi Marcelo, a project can have multiple repos: it's what we have in ServiceMix, in Karaf. For the *-extra on github, if the code has been in the ASF, the PMC members have to vote to move the code on *-extra. Regards JB On 03/18/2016 06:07 PM, Marcelo Vanzin wrote: Hi Steve, thanks for

Re: SPARK-13843 and future of streaming backends

2016-03-19 Thread Steve Loughran
> On 18 Mar 2016, at 22:24, Marcelo Vanzin wrote: > > On Fri, Mar 18, 2016 at 2:12 PM, chrismattmann wrote: >> So, my comment here is that any code *cannot* be removed from an Apache >> project if there is a VETO issued which so far I haven't seen,

Re: SPARK-13843 and future of streaming backends

2016-03-19 Thread Steve Loughran
> On 18 Mar 2016, at 17:07, Marcelo Vanzin wrote: > > Hi Steve, thanks for the write up. > > On Fri, Mar 18, 2016 at 3:12 AM, Steve Loughran > wrote: >> If you want a separate project, eg. SPARK-EXTRAS, then it *generally* needs >> to go through

Re: SPARK-13843 and future of streaming backends

2016-03-19 Thread Steve Loughran
> On 17 Mar 2016, at 21:33, Marcelo Vanzin wrote: > > Hi Reynold, thanks for the info. > > On Thu, Mar 17, 2016 at 2:18 PM, Reynold Xin wrote: >> If one really feels strongly that we should go through all the overhead to >> setup an ASF subproject for

Re: SPARK-13843 and future of streaming backends

2016-03-19 Thread Sean Owen
Code can be removed from an ASF project. That code can live on elsewhere (in accordance with the license) It can't be presented as part of the official ASF project, like any other 3rd party project The package name certainly must change from org.apache.spark I don't know of a protocol, but

Re: SPARK-13843 and future of streaming backends

2016-03-19 Thread Marcelo Vanzin
Note the non-kafka bug was filed right before the change was pushed. So there really wasn't any discussion before the decision was made to remove that code. I'm just trying to merge both discussions here in the list where it's a little bit more dynamic than bug updates that end up getting lost in

Re: SPARK-13843 and future of streaming backends

2016-03-19 Thread Marcelo Vanzin
Also, just wanted to point out something: On Thu, Mar 17, 2016 at 2:18 PM, Reynold Xin wrote: > Thanks for initiating this discussion. I merged the pull request because it > was unblocking another major piece of work for Spark 2.0: not requiring > assembly jars While I do

Re: SPARK-13843 and future of streaming backends

2016-03-19 Thread Reynold Xin
Thanks for initiating this discussion. I merged the pull request because it was unblocking another major piece of work for Spark 2.0: not requiring assembly jars, which is arguably a lot more important than sources that are less frequently used. I take full responsibility for that. I think it's

Re: SPARK-13843 and future of streaming backends

2016-03-19 Thread Mridul Muralidharan
I am not referring to code edits - but to migrating submodules and code currently in Apache Spark to 'outside' of it. If I understand correctly, assets from Apache Spark are being moved out of it into thirdparty external repositories - not owned by Apache. At a minimum, dev@ discussion (like this

Re: SPARK-13843 and future of streaming backends

2016-03-19 Thread chrismattmann
allows that, but the community itself must steward the code and part of that is hearing everyone's voice within that community before acting. Cheers, Chris

Re: SPARK-13843 and future of streaming backends

2016-03-19 Thread Luciano Resende
If the intention is to actually decouple and give a life of its own to these connectors, I would have expected that they would still be hosted as different git repositories inside Apache even though users will not really see much difference as they would still be mirrored in GitHub. This makes it

Re: SPARK-13843 and future of streaming backends

2016-03-18 Thread Marcelo Vanzin
Hi Steve, thanks for the write up. On Fri, Mar 18, 2016 at 3:12 AM, Steve Loughran wrote: > If you want a separate project, eg. SPARK-EXTRAS, then it *generally* needs > to go through incubation. While normally its the incubator PMC which > sponsors/oversees the

Re: SPARK-13843 and future of streaming backends

2016-03-18 Thread Imran Rashid
On Fri, Mar 18, 2016 at 3:15 PM, Shane Curcuru wrote: > Question: why was the code removed from the Spark repo? What's the harm > in keeping it available here? Assuming the Spark PMC has no plan on releasing the code, why would we keep it in our codebase? It only makes

Re: SPARK-13843 and future of streaming backends

2016-03-18 Thread Jean-Baptiste Onofré
Hi Marcelo, I quickly discussed this with Reynold this morning. I share your concerns. I fully understand that it's painful for users to wait for a Spark release to include a fix in streaming backends, as it's not really related. It makes sense to provide backends "outside" of ASF, especially

Re: SPARK-13843 and future of streaming backends

2016-03-18 Thread Mattmann, Chris A (3980)
>On Fri, Mar 18, 2016 at 2:12 PM, chrismattmann <mattm...@apache.org> >wrote: >> So, my comment here is that any code *cannot* be removed from an Apache >> project if there is a VET

Re: SPARK-13843 and future of streaming backends

2016-03-18 Thread Luciano Resende
On Fri, Mar 18, 2016 at 10:07 AM, Marcelo Vanzin wrote: > Hi Steve, thanks for the write up. > > On Fri, Mar 18, 2016 at 3:12 AM, Steve Loughran > wrote: > > If you want a separate project, eg. SPARK-EXTRAS, then it *generally* > needs to go through

Re: SPARK-13843 and future of streaming backends

2016-03-18 Thread Cody Koeninger
Why would a PMC vote be necessary on every code deletion? There was a Jira and pull request discussion about the submodules that have been removed so far. https://issues.apache.org/jira/browse/SPARK-13843 There's another ongoing one about Kafka specifically

Re: SPARK-13843 and future of streaming backends

2016-03-18 Thread Cody Koeninger
> Or, as Cody Koeninger suggests, having a spark-extras project in the ASF with > a focus on extras with their own support channel. To be clear, I didn't suggest that and don't think that's the best solution. I said to the people who want things done that way, which committer is going to step

Re: SPARK-13843 and future of streaming backends

2016-03-18 Thread Marcelo Vanzin
On Fri, Mar 18, 2016 at 2:12 PM, chrismattmann wrote: > So, my comment here is that any code *cannot* be removed from an Apache > project if there is a VETO issued which so far I haven't seen, though maybe > Marcelo can clarify that. No, my intention was not to veto the