Spark has hit one of the eternal problems of OSS projects, one hit by: ant, maven, hadoop, ... anything with a plugin model.

Take in the plugin: you're in control, but also on the hook for maintenance. Leave out the plugin: other people can maintain it, be more agile, etc. But you've lost control, and you can't even manage the links. Here I think maven suffered the most by keeping stuff in codehaus; migrating off there is still hard. Not only did they lose the links: they lost the JIRA. Maven's relationship with codehaus was very tightly coupled, with lots of committers on both; I don't know how that relationship was handled at a higher level.

On 17 Mar 2016, at 20:51, Hari Shreedharan <hshreedha...@cloudera.com> wrote:

I have worked with various ASF projects for 4+ years now. Sure, ASF projects can delete code as they see fit. But this is the first time I have really seen code being "moved out" of a project without discussion. I am sure you can do this without violating ASF policy, but the explanation for that would be convoluted (someone decided to make a copy and then the ASF project deleted it?).

+1 for discussion. Dev changes should go to the dev list; the PMC handles process in general. Don't think the ASF will overlook stuff like that. Might want to raise this issue in the next board report.

FWIW, it may be better to just see if you can get committers to work on these projects: recruit the people and say "please, only work in this area, for now". That gets developers on your team, which is generally considered a metric of health in a project.

Or, as Cody Koeninger suggests, have a spark-extras project in the ASF, focused on extras, with its own support channel.

Also, moving the code out would break compatibility. AFAIK, there is no way to push org.apache.* artifacts directly to maven central. That happens via mirroring from the ASF maven repos. Even if you could somehow directly push the artifacts to mvn, you really can push to org.apache.* groups only if you are part of the repo and acting as an agent of that project (which in this case would be Apache Spark).
Once you move the code out, even a committer/PMC member would not be representing the ASF when pushing the code. I am not sure if there is a way to fix this issue.

This topic has cropped up in the general context of third party repos publishing artifacts with org.apache names but vendor-specific suffixes (e.g. org.apache.hadoop/hadoop-common.5.3-cdh.jar). Some people were pretty unhappy about this, but the conclusion reached was "maven doesn't let you do anything else and still let downstream people use it". Furthermore, as all ASF releases are nominally the source releases, *not the binaries*, you can look at the POMs and say "we've released source code designed to publish artifacts to repos; this is 'use as intended'".

People are also free to cut their own full project distributions, etc, etc. For example, I stick up the binaries of Windows builds independent of the ASF releases; these were originally just those from HDP on windows installs, now I check out the commit of the specific ASF release on a windows 2012 VM, do the build, copy the binaries. Free for all to use. But I do suspect that the ASF legal protections get a bit blurred here. These aren't ASF binaries, but binaries built directly from unmodified ASF releases.

In contrast to sticking stuff into a github repo, the moved artifacts cannot be published as org.apache artifacts on maven central. That's non-negotiable as far as the ASF are concerned. The process for releasing ASF artifacts there goes downstream of the ASF public release process: you stage the artifacts, they are part of the vote process, everything with org.apache goes through it.

That said: there is nothing to stop a set of shell org.apache artifacts being written which do nothing but contain transitive dependencies on artifacts in different groups, such as org.spark-project. The shells would be released by the ASF; they pull in the new stuff. And, therefore, it'd be possible to build a spark-assembly with the files.
(I'm ignoring a loop in the build DAG here; playing with git submodules would let someone eliminate this by adding the removed libraries under a modified project.)

I think there might be some issues related to package names; you could make a case for having public APIs with the original names: they're the API, after all, and that's exactly what Apache Harmony did with the java.* packages.

Thanks,
Hari

On Thu, Mar 17, 2016 at 1:13 PM, Mridul Muralidharan <mri...@gmail.com> wrote:

I am not referring to code edits - but to migrating submodules and code currently in Apache Spark to 'outside' of it. If I understand correctly, assets from Apache Spark are being moved out of it into third-party external repositories - not owned by Apache.

At a minimum, dev@ discussion (like this one) should be initiated. As the PMC is responsible for the project assets (including code), signoff is required for it IMO.

More experienced Apache members might be able to opine better in case I got it wrong!

Regards,
Mridul

On Thu, Mar 17, 2016 at 12:55 PM, Cody Koeninger <c...@koeninger.org> wrote:
> Why would a PMC vote be necessary on every code deletion?
>
> There was a Jira and pull request discussion about the submodules that
> have been removed so far.
>
> https://issues.apache.org/jira/browse/SPARK-13843
>
> There's another ongoing one about Kafka specifically
>
> https://issues.apache.org/jira/browse/SPARK-13877
>
>
> On Thu, Mar 17, 2016 at 2:49 PM, Mridul Muralidharan
> <mri...@gmail.com> wrote:
>>
>> I was not aware of a discussion on the dev list about this - agree with most of
>> the observations.
>> In addition, I did not see PMC signoff on moving (sub-)modules out.
>>
>> Regards
>> Mridul
>>
>>
>>
>> On Thursday, March 17, 2016, Marcelo Vanzin
>> <van...@cloudera.com> wrote:
>>>
>>> Hello all,
>>>
>>> Recently a lot of the streaming backends were moved to a separate
>>> project on github and removed from the main Spark repo.
>>>
>>> While I think the idea is great, I'm a little worried about the
>>> execution. Some concerns were already raised on the bug mentioned
>>> above, but I'd like to have a more explicit discussion about this so
>>> things don't fall through the cracks.
>>>
>>> Mainly I have three concerns.
>>>
>>> i. Ownership
>>>
>>> That code used to be run by the ASF, but now it's hosted in a github
>>> repo not owned by the ASF. That sounds a little sub-optimal, if not
>>> problematic.
>>>
>>> ii. Governance
>>>
>>> Similar to the above; who has commit access to the above repos? Will
>>> all the Spark committers, present and future, have commit access to
>>> all of those repos? Are they still going to be considered part of
>>> Spark and have release management done through the Spark community?
>>>
>>>
>>> For both of the questions above, why are they not turned into
>>> sub-projects of Spark and hosted on the ASF repos? I believe there is
>>> a mechanism to do that, without the need to keep the code in the main
>>> Spark repo, right?
>>>
>>> iii. Usability
>>>
>>> This is another thing I don't see discussed. For Scala-based code
>>> things don't change much, I guess, if the artifact names don't change
>>> (another reason to keep things in the ASF?), but what about python?
>>> How are pyspark users expected to get that code going forward, since
>>> it's not in Spark's pyspark.zip anymore?
>>>
>>>
>>> Is there an easy way of keeping these things within the ASF Spark
>>> project? I think that would be better for everybody.
>>>
>>> --
>>> Marcelo
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail:
>>> dev-unsubscr...@spark.apache.org
>>> For additional commands, e-mail:
>>> dev-h...@spark.apache.org
>>>
>>