> Or, as Cody Koeninger suggests, having a spark-extras project in the ASF with
> a focus on extras with their own support channel.

To be clear, I didn't suggest that and don't think that's the best
solution.  I asked the people who want things done that way: which
committer is going to step up and do that organizational work?

I think there are advantages to moving everything currently in extras/
and external/ out of the spark project, but the current Kafka
packaging issue can be solved straightforwardly by just adding another
artifact and code tree under external/.

On Fri, Mar 18, 2016 at 5:04 AM, Steve Loughran <ste...@hortonworks.com> wrote:
>
> Spark has hit one of the eternal problems of OSS projects, one hit by: ant,
> maven, hadoop, ... anything with a plugin model.
>
> Take in the plugin: you're in control, but also down for maintenance
>
> Leave out the plugin: other people can maintain it, be more agile, etc.
>
> But you've lost control, and you can't even manage the links. Here I think
> maven suffered the most by keeping stuff in codehaus; migrating off there is
> still hard —not only did they lose the links: they lost the JIRA.
>
> Maven's relationship with codehaus was very tightly coupled, lots of
> committers on both; I don't know how that relationship was handled at a
> higher level.
>
>
> On 17 Mar 2016, at 20:51, Hari Shreedharan <hshreedha...@cloudera.com>
> wrote:
>
> I have worked with various ASF projects for 4+ years now. Sure, ASF projects
> can delete code as they feel fit. But this is the first time I have really
> seen code being "moved out" of a project without discussion. I am sure you
> can do this without violating ASF policy, but the explanation for that would
> be convoluted (someone decided to make a copy and then the ASF project
> deleted it?).
>
>
> +1 for discussion. Dev changes should -> dev list; PMC for process in
> general. Don't think the ASF will overlook stuff like that.
>
> Might want to raise this issue on the next board report
>
>
> FWIW, it may be better to just see if you can get committers to work on
> these projects: recruit the people and say "please, only work in this area,
> for now". That gets developers on your team, which is generally considered
> a metric of health in a project.
>
> Or, as Cody Koeninger suggests, having a spark-extras project in the ASF with
> a focus on extras with their own support channel.
>
>
> Also, moving the code out would break compatibility. AFAIK, there is no way
> to push org.apache.* artifacts directly to maven central. That happens via
> mirroring from the ASF maven repos. Even if you could somehow directly
> push the artifacts to mvn, you really can push to org.apache.* groups only
> if you are part of the repo and acting as an agent of that project (which in
> this case would be Apache Spark). Once you move the code out, even a
> committer/PMC member would not be representing the ASF when pushing the
> code. I am not sure if there is a way to fix this issue.
>
>
>
>
> This topic has cropped up in the general context of third party repos
> publishing artifacts with org.apache names but vendor specific suffixes (e.g.
> org.apache.hadoop/hadoop-common.5.3-cdh.jar).
>
> Some people were pretty unhappy about this, but the conclusion reached was
> "maven doesn't let you do anything else and still let downstream people use
> it". Furthermore, as all ASF releases are nominally the source releases *not
> the binaries*, you can look at the POMs and say "we've released source code
> designed to publish artifacts to repos; this is 'use as intended'".
>
> People are also free to cut their own full project distributions, etc, etc.
> For example, I stick up the binaries of Windows builds independent of the
> ASF releases; these were originally just those from HDP on windows installs,
> now I check out the commit of the specific ASF release on a windows 2012 VM,
> do the build, copy the binaries. Free for all to use. But I do suspect that
> the ASF legal protections get a bit blurred here. These aren't ASF binaries,
> but binaries built directly from unmodified ASF releases.
>
> In contrast to sticking stuff into a github repo, the moved artifacts cannot
> be published as org.apache artifacts on maven central. That's non-negotiable
> as far as the ASF are concerned. The process for releasing ASF artifacts
> there goes downstream of the ASF public release process: you stage the
> artifacts, they are part of the vote process, everything with org.apache
> goes through it.
>
> That said: there is nothing to stop a set of shell org.apache artifacts
> being written which do nothing but contain transitive dependencies on
> artifacts in different groups, such as org.spark-project. The shells would
> be released by the ASF; they pull in the new stuff. And, therefore, it'd be
> possible to build a spark-assembly with the files. (I'm ignoring a loop in
> the build DAG here; playing with git submodules would let someone eliminate
> this by adding the removed libraries under a modified project.)
>
> I think there might be some issues related to package names; you could make a
> case for having public APIs with the original names —they're the API, after
> all, and that's exactly what Apache Harmony did with the java.* packages.
>
>
> Thanks,
> Hari
>
> On Thu, Mar 17, 2016 at 1:13 PM, Mridul Muralidharan <mri...@gmail.com>
> wrote:
>>
>> I am not referring to code edits - but to migrating submodules and
>> code currently in Apache Spark to 'outside' of it.
>> If I understand correctly, assets from Apache Spark are being moved
>> out of it into thirdparty external repositories - not owned by Apache.
>>
>> At a minimum, dev@ discussion (like this one) should be initiated.
>> As PMC is responsible for the project assets (including code), signoff
>> is required for it IMO.
>>
>> More experienced Apache members might be able to opine better in case I got it
>> wrong !
>>
>>
>> Regards,
>> Mridul
>>
>>
>> On Thu, Mar 17, 2016 at 12:55 PM, Cody Koeninger <c...@koeninger.org>
>> wrote:
>> > Why would a PMC vote be necessary on every code deletion?
>> >
>> > There was a Jira and pull request discussion about the submodules that
>> > have been removed so far.
>> >
>> > https://issues.apache.org/jira/browse/SPARK-13843
>> >
>> > There's another ongoing one about Kafka specifically
>> >
>> > https://issues.apache.org/jira/browse/SPARK-13877
>> >
>> >
>> > On Thu, Mar 17, 2016 at 2:49 PM, Mridul Muralidharan <mri...@gmail.com>
>> > wrote:
>> >>
>> >> I was not aware of a discussion in Dev list about this - agree with
>> >> most of
>> >> the observations.
>> >> In addition, I did not see PMC signoff on moving (sub-)modules out.
>> >>
>> >> Regards
>> >> Mridul
>> >>
>> >>
>> >>
>> >> On Thursday, March 17, 2016, Marcelo Vanzin <van...@cloudera.com>
>> >> wrote:
>> >>>
>> >>> Hello all,
>> >>>
>> >>> Recently a lot of the streaming backends were moved to a separate
>> >>> project on github and removed from the main Spark repo.
>> >>>
>> >>> While I think the idea is great, I'm a little worried about the
>> >>> execution. Some concerns were already raised on the bug mentioned
>> >>> above, but I'd like to have a more explicit discussion about this so
>> >>> things don't fall through the cracks.
>> >>>
>> >>> Mainly I have three concerns.
>> >>>
>> >>> i. Ownership
>> >>>
>> >>> That code used to be run by the ASF, but now it's hosted in a github
>> >>> repo owned not by the ASF. That sounds a little sub-optimal, if not
>> >>> problematic.
>> >>>
>> >>> ii. Governance
>> >>>
>> >>> Similar to the above; who has commit access to the above repos? Will
>> >>> all the Spark committers, present and future, have commit access to
>> >>> all of those repos? Are they still going to be considered part of
>> >>> Spark and have release management done through the Spark community?
>> >>>
>> >>>
>> >>> For both of the questions above, why are they not turned into
>> >>> sub-projects of Spark and hosted on the ASF repos? I believe there is
>> >>> a mechanism to do that, without the need to keep the code in the main
>> >>> Spark repo, right?
>> >>>
>> >>> iii. Usability
>> >>>
>> >>> This is another thing I don't see discussed. For Scala-based code
>> >>> things don't change much, I guess, if the artifact names don't change
>> >>> (another reason to keep things in the ASF?), but what about python?
>> >>> How are pyspark users expected to get that code going forward, since
>> >>> it's not in Spark's pyspark.zip anymore?
>> >>>
>> >>>
>> >>> Is there an easy way of keeping these things within the ASF Spark
>> >>> project? I think that would be better for everybody.
>> >>>
>> >>> --
>> >>> Marcelo
>> >>>
>> >>> ---------------------------------------------------------------------
>> >>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>> >>> For additional commands, e-mail: dev-h...@spark.apache.org
>> >>>
>> >>
>>
>
>
