+1 (binding)
> On Mar 2, 2024, at 2:28 PM, Dewey Dunnington <de...@voltrondata.com.invalid> wrote:
>
> +1 (binding)
>
>> On Sat, Mar 2, 2024 at 8:08 AM vin jake <jakevin...@gmail.com> wrote:
>>
>> +1 (binding)
>>
>>> On Fri, Mar 1, 2024 at 7:33 PM Andrew Lamb <al...@influxdata.com> wrote:
>>>
>>> Hello,
>>>
>>> As we have discussed[1][2] I would like to vote on the proposal to
>>> create a new Apache Top Level Project for DataFusion. The text of the
>>> proposed resolution and background document is copy/pasted below
>>>
>>> If the community is in favor of this, we plan to submit the resolution
>>> to the ASF board for approval with the next Arrow report (for the
>>> April 2024 board meeting).
>>>
>>> The vote will be open for at least 7 days.
>>>
>>> [ ] +1 Accept this Proposal
>>> [ ] +0
>>> [ ] -1 Do not accept this proposal because...
>>>
>>> Andrew
>>>
>>> [1] https://lists.apache.org/thread/c150t1s1x0kcb3r03cjyx31kqs5oc341
>>> [2] https://github.com/apache/arrow-datafusion/discussions/6475
>>>
>>> ---------- Proposed Resolution ---------
>>>
>>> Resolution to Create the Apache DataFusion Project from the Apache
>>> Arrow DataFusion Sub Project
>>>
>>> =============================================================
>>>
>>> X. Establish the Apache DataFusion Project
>>>
>>> WHEREAS, the Board of Directors deems it to be in the best
>>> interests of the Foundation and consistent with the
>>> Foundation's purpose to establish a Project Management
>>> Committee charged with the creation and maintenance of
>>> open-source software related to an extensible query engine
>>> for distribution at no charge to the public.
>>>
>>> NOW, THEREFORE, BE IT RESOLVED, that a Project Management
>>> Committee (PMC), to be known as the "Apache DataFusion Project",
>>> be and hereby is established pursuant to Bylaws of the
>>> Foundation; and be it further
>>>
>>> RESOLVED, that the Apache DataFusion Project be and hereby is
>>> responsible for the creation and maintenance of software
>>> related to an extensible query engine; and be it further
>>>
>>> RESOLVED, that the office of "Vice President, Apache DataFusion" be
>>> and hereby is created, the person holding such office to
>>> serve at the direction of the Board of Directors as the chair
>>> of the Apache DataFusion Project, and to have primary responsibility
>>> for management of the projects within the scope of
>>> responsibility of the Apache DataFusion Project; and be it further
>>>
>>> RESOLVED, that the persons listed immediately below be and
>>> hereby are appointed to serve as the initial members of the
>>> Apache DataFusion Project:
>>>
>>> * Andy Grove (agr...@apache.org)
>>> * Andrew Lamb (al...@apache.org)
>>> * Daniël Heres (dhe...@apache.org)
>>> * Jie Wen (jake...@apache.org)
>>> * Kun Liu (liu...@apache.org)
>>> * Liang-Chi Hsieh (vii...@apache.org)
>>> * Qingping Hou: (ho...@apache.org)
>>> * Wes McKinney(w...@apache.org)
>>> * Will Jones (wjones...@apache.org)
>>>
>>> RESOLVED, that the Apache DataFusion Project be and hereby
>>> is tasked with the migration and rationalization of the Apache
>>> Arrow DataFusion sub-project; and be it further
>>>
>>> RESOLVED, that all responsibilities pertaining to the Apache
>>> Arrow DataFusion sub-project encumbered upon the
>>> Apache Arrow Project are hereafter discharged.
>>>
>>> NOW, THEREFORE, BE IT FURTHER RESOLVED, that Andrew Lamb
>>> be appointed to the office of Vice President, Apache DataFusion, to
>>> serve in accordance with and subject to the direction of the
>>> Board of Directors and the Bylaws of the Foundation until
>>> death, resignation, retirement, removal or disqualification,
>>> or until a successor is appointed.
>>> =============================================================
>>>
>>> -------
>>>
>>> Summary:
>>>
>>> We propose creating a new top level project, Apache DataFusion, from
>>> an existing sub project of Apache Arrow to facilitate additional
>>> community and project growth.
>>>
>>> Abstract
>>>
>>> Apache Arrow DataFusion[1] is a very fast, extensible query engine
>>> for building high-quality data-centric systems in Rust, using the
>>> Apache Arrow in-memory format. DataFusion offers SQL and Dataframe
>>> APIs, excellent performance, built-in support for CSV, Parquet, JSON,
>>> and Avro, extensive customization, and a great community.
>>>
>>> [1] https://arrow.apache.org/datafusion/
>>>
>>> Proposal
>>>
>>> We propose creating a new top level ASF project, Apache DataFusion,
>>> governed initially by a subset of the Apache Arrow project’s PMC and
>>> committers. The project’s code is in five existing git repositories,
>>> currently governed by Apache Arrow which would transfer to the new top
>>> level project.
>>>
>>> Background
>>>
>>> When DataFusion was initially donated to the Arrow project, it did not
>>> have a strong enough community to stand on its own. It has since grown
>>> significantly, and benefited immensely from being part of Arrow and
>>> nurturing of the Apache Way, and now has a community strong enough to
>>> stand on its own and that would benefit from focused governance
>>> attention.
>>>
>>> The community has discussed this idea publicly for more than 6 months
>>> https://github.com/apache/arrow-datafusion/discussions/6475 and
>>> briefly on the Arrow PMC mailing list
>>> https://lists.apache.org/thread/thv2jdm6640l6gm88hy8jhk5prjww0cs. As
>>> of the time of this writing both had exclusively positive reactions.
>>>
>>> Several current members of the Arrow PMC are both active contributors
>>> to DataFusion and understand and believe deeply in the Apache Way, and
>>> play active governance roles in the Arrow project as PMC members and
>>> PMC chairs, guiding the community, and releasing software versions.
>>> With this existing governance experience and structure, the new top
>>> level project will be able to function well immediately and
>>> independently.
>>>
>>> Overview of DataFusion
>>>
>>> Current Status
>>>
>>> Meritocracy
>>>
>>> DataFusion has been developed as part of Apache Arrow and thus has
>>> been operating as a meritocracy. Many of the developers of DataFusion
>>> are Arrow PMC members or committers. The DataFusion project plans to
>>> continue adding new PMC and committers as the project matures and
>>> grows.
>>>
>>> Community
>>>
>>> The DataFusion development team seeks to foster the development and
>>> user communities. We hope that becoming a separate project will help
>>> both Arrow and DataFusion communities by being more focused. Focused
>>> governance will make it easier to grow the community of committers and
>>> PMC members and make the organization more clear to others.
>>>
>>> Alignment
>>>
>>> The ASF is a natural host for DataFusion given that it is already the
>>> home of Arrow, Parquet, and other related distributed system, storage
>>> and query execution systems.
>>>
>>> Project Leadership
>>>
>>> Proposed Initial PMC
>>>
>>> We propose the following people as the initial DataFusion PMC members.
>>> This is a subset of the existing Arrow PMC members who contribute to
>>> DataFusion https://people.apache.org/phonebook.html?unix=arrow
>>>
>>> Andy Grove (agrove): Arrow PMC Chair
>>> Andrew Lamb (alamb): Arrow PMC, past Arrow PMC Chair
>>> Daniël Heres (dheres) Arrow PMC
>>> Jie Wen (jakevin): Arrow PMC, Doris Committer
>>> Kun Liu (liukun): Arrow PMC, IoTDB PMC, TSFile PMC
>>> Liang-Chi Hsieh (viirya): Arrow PMC, Spark PMC
>>> Qingping Hou: (houqp): Arrow PMC
>>> Wes McKinney(wesm): Arrow PMC, ASF Member
>>> Will Jones (wjones127): Arrow PMC
>>>
>>> We’d like to propose Andrew Lamb as the initial Chair, (and thus ASF
>>> VP) for the DataFusion project.
>>>
>>> Affiliations
>>>
>>> Andy Grove (agrove): NVidia
>>> Andrew Lamb (alamb): InfluxData
>>> Daniël Heres (dheres): Coralogix
>>> Jie Wen (jakevin): SelectDB
>>> Kun Liu (liukun): Ebay
>>> Liang-Chi Hsieh (viirya): Apple
>>> Qingping Hou: (houqp): Scribd
>>> Wes McKinney(wesm): Posit
>>> Will Jones (wjones127): LanceDB
>>>
>>> Proposed Initial Committers
>>>
>>> In addition to the PMC, we propose the following people as the initial
>>> DataFusion committers. This is a subset of the existing Arrow
>>> committers who contribute to DataFusion
>>> https://people.apache.org/phonebook.html?unix=arrow
>>>
>>> akurmustafa Mustafa Akur (Synnada)
>>> avantgardner Brent Gardner (Coralogix)
>>> comphead Oleks V. (Unaffiliated)
>>> jayzhan Jay Zhan (Unaffiliated)
>>> jeffreyvo Jeffry Vo (Unaffiliated)
>>> jiayuliu Liu Jiayu (Airbnb)
>>> mete Metehan Yildirim (Synnada)
>>> mingmwang Wang Mingming (Ebay)
>>> mneumann Marco Neumann (InfluxData)
>>> nju_yaho Zhong Yanghong (Ebay)
>>> ozankabak Mehmet Ozan Kabak (Synnada)
>>> paddyhoran Paddy Horan (Assured Allies)
>>> rdettai Rémi Dettai (Cloudfuse)
>>> sunchao Chao Sun (Apple)
>>> thinkharderdev Daniel Harris (Coralogix)
>>> tustvold Raphael Taylor-Davies (InfluxData)
>>> wayne Ruihang Xia (Greptime)
>>> xudong963 Xudong Wang (ByteDance)
>>> yjshen Yijie Shen (Space and Time)
>>> yangjiang Yang Jiang (ebay)
>>>
>>> Risk Assessments
>>>
>>> Naming / Trademarks
>>>
>>> As a sub-project of Arrow, the DataFusion name has been used for over
>>> 4 years without any known issues. A podling name search did not turn
>>> up any concerns and was approved:
>>> https://issues.apache.org/jira/browse/PODLINGNAMESEARCH-219
>>>
>>> Legal / IP Clearance
>>>
>>> All DataFusion code has either been donated to the Arrow project with
>>> appropriate IP clearance or has been developed directly under ASF
>>> processes and procedures. Thus creating a new top level project poses
>>> no new Legal or IP risks.
>>>
>>> Code Extraction
>>>
>>> The relevant code is already in 5 separate repositories:
>>> https://github.com/apache/arrow-datafusion/
>>> https://github.com/apache/arrow-datafusion-python
>>> https://github.com/apache/arrow-ballista
>>> https://github.com/apache/arrow-ballista-python
>>> https://github.com/apache/arrow-datafusion-comet
>>>
>>> We foresee no issues with code extraction and propose these
>>> repositories be renamed to reflect top level projects
>>>
>>> Note: https://github.com/apache/arrow-rs, the Rust implementation of
>>> Arrow, would remain part of the Arrow project.
>>>
>>> Orphaned Products
>>>
>>> DataFusion is known to be used in many open source and commercial
>>> projects
>>> https://arrow.apache.org/datafusion/user-guide/introduction.html#known-users
>>> ,
>>> has had multiple commits daily for several years, and its adoption and
>>> number of contributors appears to be growing. We do not foresee the
>>> project being orphaned in the next several years.
>>>
>>> Inexperience with Open Source
>>>
>>> The proposed PMC has extensive experience with Apache Arrow and other
>>> Apache projects, and includes PMC members, PMC chairs and an ASF
>>> Member. The DataFusion PMC and more experienced committers will
>>> continue to coach new community members who may be less familiar with
>>> the Apache Way.
>>>
>>> Homogeneous Developers
>>>
>>> The 9 proposed PMC members are from 9 different employers and the
>>> proposed committers are similarly distributed across affiliations. No
>>> specific entity employs more than 3 total proposed developers.
>>>
>>> Reliance on Salaried Developers
>>>
>>> A substantial amount of work on DataFusion has been by salaried
>>> developers, but it also has a long tradition of attracting
>>> contributions from students and hobbyists and we plan no changes in
>>> contribution structure.
>>>
>>> Relationships with Other Apache Products
>>>
>>> DataFusion will obviously have a strong relationship with the Arrow
>>> project given the overlap in people. We don’t foresee close
>>> collaboration with other projects at this time.
>>>
>>> Cryptography
>>>
>>> DataFusion does not directly support encryption and there are no
>>> near-term plans to add support for encryption. Users who need this
>>> functionality can use the extension APIs.
>>>
>>> Required Resources
>>>
>>> Mailing Lists
>>>
>>> - priv...@datafusion.apache.org for private PMC discussions (with
>>> moderated subscriptions)
>>> - d...@datafusion.apache.org
>>> - comm...@datafusion.apache.org
>>> - u...@datafusion.apache.org
>>>
>>> Version Control
>>>
>>> We propose to continue to use git for source control and github for
>>> hosting and testing resources.
>>>
>>> We also need to rename the github repositories to reflect the new top
>>> level names:
>>>
>>> https://github.com/apache/arrow-datafusion/ → apache/datafusion
>>> https://github.com/apache/arrow-datafusion-python → apache/datafusion-python
>>> https://github.com/apache/arrow-ballista → apache/datafusion-ballista
>>> https://github.com/apache/arrow-ballista-python → apache/datafusion-ballista-python
>>> https://github.com/apache/arrow-datafusion-comet → apache/datafusion-comet
>>>
>>> Issue Tracking
>>>
>>> DataFusion would continue to use github for its issue tracking and
>>> communications
>>>
>>> Other Resources
>>>
>>> The existing repositories already make use of existing Apache
>>> infrastructure, and we expect no change in the initial resource usage.
>>> As the project continues to grow, we expect continued infrastructure
>>> demand growth.
>>>
>>> FAQ: Has a sub project been promoted to a top level project before?
>>>
>>> Yes, and it appears to happen commonly. The Arrow project itself was
>>> created as a top level project from work that started in Apache Drill,
>>> and there are many sub projects of Hadoop that spun out as their own
>>> top level projects such as Mahout, Avro and HBase:
>>>
>>> https://news.apache.org/foundation/entry/the_apache_software_foundation_announces4
>>>
>>> Related material:
>>> Name search request / research for DataFusion:
>>> https://issues.apache.org/jira/browse/PODLINGNAMESEARCH-219
>>> Discussion about this proposal on the arrow mailing list:
>>> https://lists.apache.org/thread/c150t1s1x0kcb3r03cjyx31kqs5oc341
>>> Discussion about which repositories on the arrow mailing list:
>>> https://lists.apache.org/thread/ob3n0d9ky0bgrryl3xn39w9k566bq00q
>>> Discussion about initial PMC on the arrow mailing list:
>>> https://lists.apache.org/thread/pymrzcdw4qdptvby85f69rg3pcckl15b
>>> Discussion in github about creating a new DataFusion top level
>>> project: https://github.com/apache/arrow-datafusion/discussions/6475
>>> Discussion about graduating on incubator list:
>>> https://lists.apache.org/thread/r4n73pmms1lv0jbohyx1o1z13d615t99
>>> Original Proposal for the Arrow project:
>>> https://lists.apache.org/thread/x2qzdwglm8pkqp9gv03bbgw17khl7pq3
>>>