> Or, as Cody Koeninger suggests, having a spark-extras project in the ASF with a focus on extras with their own support channel.
To be clear, I didn't suggest that and don't think that's the best solution. I said to the people who want things done that way, which committer is going to step up and do that organizational work?

I think there are advantages to moving everything currently in extras/ and external/ out of the spark project, but the current Kafka packaging issue can be solved straightforwardly by just adding another artifact and code tree under external/.

On Fri, Mar 18, 2016 at 5:04 AM, Steve Loughran <ste...@hortonworks.com> wrote:
>
> Spark has hit one of the eternal problems of OSS projects, one hit by: ant, maven, hadoop, ... anything with a plugin model.
>
> Take in the plugin: you're in control, but also down for maintenance
>
> Leave out the plugin: other people can maintain it, be more agile, etc.
>
> But you've lost control, and you can't even manage the links. Here I think maven suffered the most by keeping stuff in codehaus; migrating off there is still hard: not only did they lose the links, they lost the JIRA.
>
> Maven's relationship with codehaus was very tightly coupled, lots of committers on both; I don't know how that relationship was handled at a higher level.
>
>
> On 17 Mar 2016, at 20:51, Hari Shreedharan <hshreedha...@cloudera.com> wrote:
>
> I have worked with various ASF projects for 4+ years now. Sure, ASF projects can delete code as they see fit. But this is the first time I have really seen code being "moved out" of a project without discussion. I am sure you can do this without violating ASF policy, but the explanation for that would be convoluted (someone decided to make a copy and then the ASF project deleted it?).
>
>
> +1 for discussion. Dev changes should go to the dev list; PMC for process in general. Don't think the ASF will overlook stuff like that.
>
> Might want to raise this issue on the next board report
>
>
> FWIW, it may be better to just see if you can get committers to work on these projects: recruit the people and say 'please, only work in this area, for now'. That gets developers on your team, which is generally considered a metric of health in a project.
>
> Or, as Cody Koeninger suggests, having a spark-extras project in the ASF with a focus on extras with their own support channel.
>
>
> Also, moving the code out would break compatibility. AFAIK, there is no way to push org.apache.* artifacts directly to maven central. That happens via mirroring from the ASF maven repos. Even if you could somehow directly push the artifacts to mvn, you really can push to org.apache.* groups only if you are part of the repo and acting as an agent of that project (which in this case would be Apache Spark). Once you move the code out, even a committer/PMC member would not be representing the ASF when pushing the code. I am not sure if there is a way to fix this issue.
>
>
> This topic has cropped up in the general context of third party repos publishing artifacts with org.apache names but vendor specific suffixes (e.g. org.apache.hadoop/hadoop-common.5.3-cdh.jar).
>
> Some people were pretty unhappy about this, but the conclusion reached was "maven doesn't let you do anything else and still let downstream people use it". Furthermore, as all ASF releases are nominally the source releases *not the binaries*, you can look at the POMs and say "we've released source code designed to publish artifacts to repos; this is 'use as intended'".
>
> People are also free to cut their own full project distributions, etc, etc. For example, I stick up the binaries of Windows builds independent of the ASF releases; these were originally just those from HDP on windows installs, now I check out the commit of the specific ASF release on a windows 2012 VM, do the build, copy the binaries.
> Free for all to use. But I do suspect that the ASF legal protections get a bit blurred here. These aren't ASF binaries, but binaries built directly from unmodified ASF releases.
>
> In contrast to sticking stuff into a github repo, the moved artifacts cannot be published as org.apache artifacts on maven central. That's non-negotiable as far as the ASF are concerned. The process for releasing ASF artifacts there goes downstream of the ASF public release process: you stage the artifacts, they are part of the vote process, everything with org.apache goes through it.
>
> That said: there is nothing to stop a set of shell org.apache artifacts being written which do nothing but contain transitive dependencies on artifacts in different groups, such as org.spark-project. The shells would be released by the ASF; they pull in the new stuff. And, therefore, it'd be possible to build a spark-assembly with the files. (I'm ignoring a loop in the build DAG here; playing with git submodules would let someone eliminate this by adding the removed libraries under a modified project.)
>
> I think there might be some issues related to package names; you could make a case for having public APIs with the original names: they're the API, after all, and that's exactly what Apache Harmony did with the java.* packages.
>
>
> Thanks,
> Hari
>
> On Thu, Mar 17, 2016 at 1:13 PM, Mridul Muralidharan <mri...@gmail.com> wrote:
>>
>> I am not referring to code edits - but to migrating submodules and code currently in Apache Spark to 'outside' of it.
>> If I understand correctly, assets from Apache Spark are being moved out of it into thirdparty external repositories - not owned by Apache.
>>
>> At a minimum, dev@ discussion (like this one) should be initiated.
>> As PMC is responsible for the project assets (including code), signoff is required for it IMO.
>>
>> More experienced Apache members might opine better in case I got it wrong!
>>
>>
>> Regards,
>> Mridul
>>
>>
>> On Thu, Mar 17, 2016 at 12:55 PM, Cody Koeninger <c...@koeninger.org> wrote:
>> > Why would a PMC vote be necessary on every code deletion?
>> >
>> > There was a Jira and pull request discussion about the submodules that have been removed so far.
>> >
>> > https://issues.apache.org/jira/browse/SPARK-13843
>> >
>> > There's another ongoing one about Kafka specifically
>> >
>> > https://issues.apache.org/jira/browse/SPARK-13877
>> >
>> >
>> > On Thu, Mar 17, 2016 at 2:49 PM, Mridul Muralidharan <mri...@gmail.com> wrote:
>> >>
>> >> I was not aware of a discussion in Dev list about this - agree with most of the observations.
>> >> In addition, I did not see PMC signoff on moving (sub-)modules out.
>> >>
>> >> Regards
>> >> Mridul
>> >>
>> >>
>> >> On Thursday, March 17, 2016, Marcelo Vanzin <van...@cloudera.com> wrote:
>> >>>
>> >>> Hello all,
>> >>>
>> >>> Recently a lot of the streaming backends were moved to a separate project on github and removed from the main Spark repo.
>> >>>
>> >>> While I think the idea is great, I'm a little worried about the execution. Some concerns were already raised on the bug mentioned above, but I'd like to have a more explicit discussion about this so things don't fall through the cracks.
>> >>>
>> >>> Mainly I have three concerns.
>> >>>
>> >>> i. Ownership
>> >>>
>> >>> That code used to be run by the ASF, but now it's hosted in a github repo not owned by the ASF. That sounds a little sub-optimal, if not problematic.
>> >>>
>> >>> ii. Governance
>> >>>
>> >>> Similar to the above; who has commit access to the above repos? Will all the Spark committers, present and future, have commit access to all of those repos? Are they still going to be considered part of Spark and have release management done through the Spark community?
>> >>>
>> >>>
>> >>> For both of the questions above, why are they not turned into sub-projects of Spark and hosted on the ASF repos? I believe there is a mechanism to do that, without the need to keep the code in the main Spark repo, right?
>> >>>
>> >>> iii. Usability
>> >>>
>> >>> This is another thing I don't see discussed. For Scala-based code things don't change much, I guess, if the artifact names don't change (another reason to keep things in the ASF?), but what about python? How are pyspark users expected to get that code going forward, since it's not in Spark's pyspark.zip anymore?
>> >>>
>> >>>
>> >>> Is there an easy way of keeping these things within the ASF Spark project? I think that would be better for everybody.
>> >>>
>> >>> --
>> >>> Marcelo
>> >>>
>> >>> ---------------------------------------------------------------------
>> >>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>> >>> For additional commands, e-mail: dev-h...@spark.apache.org
>> >>>