FWIW, if there's a way to contribute DataFusion-related code, I can contribute my version of the Java bindings.
IMO, having a central place (instead of just linking) for all DataFusion bindings, third-party libraries, etc. would mean more synergy across languages. I wouldn't go as far as a monorepo, though, because the CI/CD and release processes are unlikely to benefit from it. Maybe a community-owned GitHub org?

On 2021/11/07 00:52:49 QP Hou wrote:
> Hi all,
>
> I would like to propose a new and more community friendly governance
> model for community contributed and maintained extensions for the
> datafusion project.
>
> Over the last year, many datafusion extensions have been proposed and
> created by the community including the java binding, s3 and hdfs[1]
> object storage implementations, etc. Right now this code is or will
> be hosted in individual github namespaces due to the following
> reasons:
>
> * Most of these extensions are not considered part of the Datafusion
> core, so the current maintainers prefer to not have them managed in
> the main repository. The current python binding and ballista code base
> is already adding a decent amount of overhead to our development
> process. Adding more dependent crates will slow us down further
> without much upside.
>
> * Considering the overhead of the official Apache release process,
> current Datafusion PMCs don't have the bandwidth to manage individual
> releases for these extensions. All of the authors of these extensions
> are not Arrow PMC members, so they won't have the access to drive the
> Apache releases by themselves.
>
> Therefore, I am proposing that we create an unofficial shared Github
> organization to host these Datafusion contrib type projects that are
> only maintained by non-PMC community members. I think this is strictly
> better than hosting these extensions projects in personal github
> namespaces. If any of these extensions end up getting significant
> involvements or interests from Datafusion committers, then we can
> promote them into official projects and provide official Apache style
> release support.
>
> Other alternatives I have considered are:
>
> * Keep these projects under personal namespaces and only link them in
> Datafusion's documentation.
>
> * Manage these extensions using experimental repos. But as far as I
> know, the code owners still need to be a PMC member in order to
> perform crates.io releases and it's not intended for long running
> projects with no goal for eventual archival.
>
> * Create a dedicated mono repo named apache/datafusion-contrib to host
> these extensions. However, this approach also requires PMC members to
> get involved for crates.io releases if I understand it correctly.
>
> I am curious if this is something that could be done under the Apache
> governance model? My main goal is to create an unofficial incubator
> type space for community members to develop and collaborate on
> extensions that may or may not be adopted as official extensions in
> the future.
>
> [1]: https://github.com/apache/arrow-datafusion/pull/1223
>
> Thanks,
> QP