Thanks Wes for the confirmation. Yes, we only intend to keep
extensions that won't get merged back to datafusion core in the
contrib repo. Any code that we intend to go into the core will
definitely still be developed within the ASF GH org.

On Wed, Nov 17, 2021 at 3:29 PM Wes McKinney <wesmck...@gmail.com> wrote:
>
> Having a "community" contrib GitHub org outside of Apache sounds fine.
> If we want to move any packages into the Apache governance structure
> then we can conduct an IP clearance at that point. Since the term
> "DataFusion" doesn't have ASF trademark issues like "Arrow" does, we
> don't need to be as careful with project names (e.g. "Arrow X" is bad
> but "X for Arrow" or "X powered by Arrow" is OK)
>
> If the contrib repository gets used to "incubate" new ideas for
> mainline DataFusion, we might rather use the "experimental" repo
> policy (discussed in the past on the mailing list) to keep the work
> happening inside Apache.
>
> On Mon, Nov 15, 2021 at 7:32 AM Andrew Lamb <al...@influxdata.com> wrote:
> >
> > Thank you QP
> >
> > Andrew
> >
> > On Sun, Nov 14, 2021 at 5:02 PM QP Hou <houqp....@gmail.com> wrote:
> >
> > > Thanks Jiayu, Benson, Micah and Andrew for your input on this. I have
> > > created an unofficial GitHub org [1] as a quick-and-dirty experiment
> > > for something like spark-packages.org. We should make it clear that
> > > code developed in this org will still need to go through the donation
> > > process in order to get into the ASF org.
> > >
> > > [1]: https://github.com/datafusion-contrib
> > >
> > > On Mon, Nov 8, 2021 at 3:12 AM Andrew Lamb <al...@influxdata.com> wrote:
> > > >
> > > > > I think a separate non-ASF organization, with a central list of
> > > > > extensions like spark-packages.org, sounds like a good idea to me.
> > > >
> > > > On Sun, Nov 7, 2021 at 1:34 PM Micah Kornfield <emkornfi...@gmail.com>
> > > > wrote:
> > > >
> > > > > > I'll preface this by saying I'm not an expert on these matters, but
> > > > > > this is my impression.
> > > > >
> > > > >
> > > > > > > Therefore, I am proposing that we create an unofficial shared GitHub
> > > > > > > organization to host these DataFusion contrib type projects that are
> > > > > > > only maintained by non-PMC community members.
> > > > >
> > > > >
> > > > > > I think as long as this is hosted outside of the Apache GitHub
> > > > > > organization, this seems fine. The things to be careful about are
> > > > > > trademark issues and making it clear that it isn't officially part of
> > > > > > the Apache DataFusion project. FWIW, I seem to recall this type of
> > > > > > model was proposed in Spark, and there was some tension at the time
> > > > > > over branding of the project. It looks like Spark has settled on
> > > > > > having a central site <https://spark-packages.org/> [1][2] for linking
> > > > > > additional modules, and they don't have a common namespace.
> > > > >
> > > > >
> > > > > > > I am curious if this is something that could be done under the Apache
> > > > > > > governance model? My main goal is to create an unofficial incubator
> > > > > > type space for community members to develop and collaborate on
> > > > > > extensions that may or may not be adopted as official extensions in
> > > > > > the future.
> > > > >
> > > > >
> > > > > > My limited understanding is that either something is governed by the
> > > > > > ASF rules (i.e. PMC/committers officially recognized by the Apache
> > > > > > Software Foundation, along with release requirements) or it isn't;
> > > > > > there really isn't a half-way option from the ASF perspective.
> > > > > > Independent projects can choose ASF-like policies and manage
> > > > > > themselves in this manner. The incubator program at the ASF is for
> > > > > > projects that might or might not have sustained interest to continue
> > > > > > (but my understanding is that incubation follows all the processes of
> > > > > > a normal top-level Apache project). Any code developed outside of ASF
> > > > > > governance needs to go through the donation process (IP clearance,
> > > > > > etc.) to be moved into ASF repos, even if it is developed by PMC
> > > > > > members/committers (see prior discussions on Arrow2 in Rust and the
> > > > > > Julia libraries).
> > > > >
> > > > > Cheers,
> > > > > Micah
> > > > >
> > > > > [1] https://spark.apache.org/contributing.html
> > > > > [2] https://spark-packages.org/
> > > > >
> > > > >
> > > > > > On Sun, Nov 7, 2021 at 2:31 AM Benson Muite <benson_mu...@emailplus.org>
> > > > > > wrote:
> > > > >
> > > > > > > A community-owned GitHub organization would be helpful, maybe for all
> > > > > > > other Arrow-related projects, not just DataFusion. This would make
> > > > > > > them easier to find and easier for community members to contribute
> > > > > > > to. It could also include a listing of relevant projects elsewhere.
> > > > > >
> > > > > > On 11/7/21 9:40 AM, Jiayu Liu wrote:
> > > > > > > > FWIW, if there's a way to contribute code pertaining to DataFusion,
> > > > > > > > I can contribute my version of the Java bindings to it.
> > > > > > > >
> > > > > > > > IMO having a central place (instead of linking) for all bindings,
> > > > > > > > third-party libraries, etc. for DataFusion would mean more synergy
> > > > > > > > across different languages, but I won't go as far as a monorepo
> > > > > > > > because the CI/CD process and release process are unlikely to
> > > > > > > > benefit from it. Maybe a community-owned GitHub org?
> > > > > > >
> > > > > > > On 2021/11/07 00:52:49 QP Hou wrote:
> > > > > > >> Hi all,
> > > > > > >>
> > > > > > > >> I would like to propose a new and more community-friendly governance
> > > > > > > >> model for community-contributed and maintained extensions for the
> > > > > > > >> DataFusion project.
> > > > > > >>
> > > > > > > >> Over the last year, many DataFusion extensions have been proposed and
> > > > > > > >> created by the community, including the Java binding, S3 and HDFS[1]
> > > > > > > >> object storage implementations, etc. Right now this code is or will
> > > > > > > >> be hosted in individual GitHub namespaces for the following
> > > > > > > >> reasons:
> > > > > > >>
> > > > > > > >> * Most of these extensions are not considered part of the DataFusion
> > > > > > > >> core, so the current maintainers prefer not to have them managed in
> > > > > > > >> the main repository. The current Python binding and Ballista code
> > > > > > > >> bases are already adding a decent amount of overhead to our
> > > > > > > >> development process. Adding more dependent crates will slow us down
> > > > > > > >> further without much upside.
> > > > > > >>
> > > > > > > >> * Considering the overhead of the official Apache release process,
> > > > > > > >> current DataFusion PMC members don't have the bandwidth to manage
> > > > > > > >> individual releases for these extensions. None of the authors of
> > > > > > > >> these extensions are Arrow PMC members, so they won't have the access
> > > > > > > >> to drive the Apache releases by themselves.
> > > > > > >>
> > > > > > > >> Therefore, I am proposing that we create an unofficial shared GitHub
> > > > > > > >> organization to host these DataFusion contrib type projects that are
> > > > > > > >> only maintained by non-PMC community members. I think this is
> > > > > > > >> strictly better than hosting these extension projects in personal
> > > > > > > >> GitHub namespaces. If any of these extensions end up getting
> > > > > > > >> significant involvement or interest from DataFusion committers, then
> > > > > > > >> we can promote them into official projects and provide official
> > > > > > > >> Apache-style release support.
> > > > > > >>
> > > > > > >> Other alternatives I have considered are:
> > > > > > >>
> > > > > > > >> * Keep these projects under personal namespaces and only link them in
> > > > > > > >> DataFusion's documentation.
> > > > > > >>
> > > > > > > >> * Manage these extensions using experimental repos. But as far as I
> > > > > > > >> know, the code owners still need to be PMC members in order to
> > > > > > > >> perform crates.io releases, and experimental repos are not intended
> > > > > > > >> for long-running projects with no goal of eventual archival.
> > > > > > >>
> > > > > > > >> * Create a dedicated monorepo named apache/datafusion-contrib to host
> > > > > > > >> these extensions. However, this approach also requires PMC members to
> > > > > > > >> get involved for crates.io releases, if I understand it correctly.
> > > > > > >>
> > > > > > > >> I am curious if this is something that could be done under the Apache
> > > > > > > >> governance model? My main goal is to create an unofficial incubator
> > > > > > > >> type space for community members to develop and collaborate on
> > > > > > > >> extensions that may or may not be adopted as official extensions in
> > > > > > > >> the future.
> > > > > > >>
> > > > > > >> [1]: https://github.com/apache/arrow-datafusion/pull/1223
> > > > > > >>
> > > > > > >> Thanks,
> > > > > > >> QP
> > > > > > >>
> > > > > > >
> > > > > >
> > > > > >
> > > > >
> > >
