FWIW, if there's a way to contribute code pertaining to datafusion, I can
contribute my version of the Java bindings to it.

IMO having a central place (instead of just linking) for all bindings,
third-party libraries, etc. for datafusion would mean more synergy across
different languages, but I wouldn't go as far as a monorepo because the
CI/CD and release processes are unlikely to benefit from it. Maybe a
community-owned GitHub org?

On 2021/11/07 00:52:49 QP Hou wrote:
> Hi all,
> 
> I would like to propose a new and more community friendly governance
> model for community contributed and maintained extensions for the
> datafusion project.
> 
> Over the last year, many datafusion extensions have been proposed and
> created by the community including the java binding, s3 and hdfs[1]
> object storage implementations, etc. Right now this code is or will
> be hosted in individual github namespaces due to the following
> reasons:
> 
> * Most of these extensions are not considered part of the Datafusion
> core, so the current maintainers prefer to not have them managed in
> the main repository. The current python binding and ballista code base
> is already adding a decent amount of overhead to our development
> process. Adding more dependent crates will slow us down further
> without much upside.
> 
> * Considering the overhead of the official Apache release process,
> current Datafusion PMC members don't have the bandwidth to manage
> individual releases for these extensions. None of the authors of these
> extensions are Arrow PMC members, so they don't have the access to drive
> the Apache releases by themselves.
> 
> Therefore, I am proposing that we create an unofficial shared Github
> organization to host these Datafusion contrib type projects that are
> only maintained by non-PMC community members. I think this is strictly
> better than hosting these extension projects in personal github
> namespaces. If any of these extensions end up getting significant
> involvement or interest from Datafusion committers, then we can
> promote them into official projects and provide official Apache style
> release support.
> 
> Other alternatives I have considered are:
> 
> * Keep these projects under personal namespaces and only link them in
> Datafusion's documentation.
> 
> * Manage these extensions using experimental repos. But as far as I
> know, the code owners still need to be PMC members in order to
> perform crates.io releases, and experimental repos are not intended
> for long-running projects with no goal of eventual archival.
> 
> * Create a dedicated mono repo named apache/datafusion-contrib to host
> these extensions. However, this approach also requires PMC members to
> get involved for crates.io releases if I understand it correctly.
> 
> I am curious whether this is something that could be done under the
> Apache governance model. My main goal is to create an unofficial incubator
> type space for community members to develop and collaborate on
> extensions that may or may not be adopted as official extensions in
> the future.
> 
> [1]: https://github.com/apache/arrow-datafusion/pull/1223
> 
> Thanks,
> QP
>
