+1 (binding)
On Fri, Mar 1, 2024, at 2:37 PM, Andy Grove wrote:
> +1 (binding)
>
> On Fri, Mar 1, 2024 at 6:20 AM Weston Pace <weston.p...@gmail.com> wrote:
>
>> +1 (binding)
>>
>> On Fri, Mar 1, 2024 at 3:33 AM Andrew Lamb <al...@influxdata.com> wrote:
>>
>> > Hello,
>> >
>> > As we have discussed[1][2] I would like to vote on the proposal to
>> > create a new Apache Top Level Project for DataFusion. The text of the
>> > proposed resolution and background document is copy/pasted below
>> >
>> > If the community is in favor of this, we plan to submit the resolution
>> > to the ASF board for approval with the next Arrow report (for the
>> > April 2024 board meeting).
>> >
>> > The vote will be open for at least 7 days.
>> >
>> > [ ] +1 Accept this Proposal
>> > [ ] +0
>> > [ ] -1 Do not accept this proposal because...
>> >
>> > Andrew
>> >
>> > [1] https://lists.apache.org/thread/c150t1s1x0kcb3r03cjyx31kqs5oc341
>> > [2] https://github.com/apache/arrow-datafusion/discussions/6475
>> >
>> > ---------- Proposed Resolution ---------
>> >
>> > Resolution to Create the Apache DataFusion Project from the Apache
>> > Arrow DataFusion Sub Project
>> >
>> > =============================================================
>> >
>> > X. Establish the Apache DataFusion Project
>> >
>> > WHEREAS, the Board of Directors deems it to be in the best
>> > interests of the Foundation and consistent with the
>> > Foundation's purpose to establish a Project Management
>> > Committee charged with the creation and maintenance of
>> > open-source software related to an extensible query engine
>> > for distribution at no charge to the public.
>> >
>> > NOW, THEREFORE, BE IT RESOLVED, that a Project Management
>> > Committee (PMC), to be known as the "Apache DataFusion Project",
>> > be and hereby is established pursuant to Bylaws of the
>> > Foundation; and be it further
>> >
>> > RESOLVED, that the Apache DataFusion Project be and hereby is
>> > responsible for the creation and maintenance of software
>> > related to an extensible query engine; and be it further
>> >
>> > RESOLVED, that the office of "Vice President, Apache DataFusion" be
>> > and hereby is created, the person holding such office to
>> > serve at the direction of the Board of Directors as the chair
>> > of the Apache DataFusion Project, and to have primary responsibility
>> > for management of the projects within the scope of
>> > responsibility of the Apache DataFusion Project; and be it further
>> >
>> > RESOLVED, that the persons listed immediately below be and
>> > hereby are appointed to serve as the initial members of the
>> > Apache DataFusion Project:
>> >
>> > * Andy Grove (agr...@apache.org)
>> > * Andrew Lamb (al...@apache.org)
>> > * Daniël Heres (dhe...@apache.org)
>> > * Jie Wen (jake...@apache.org)
>> > * Kun Liu (liu...@apache.org)
>> > * Liang-Chi Hsieh (vii...@apache.org)
>> > * Qingping Hou: (ho...@apache.org)
>> > * Wes McKinney(w...@apache.org)
>> > * Will Jones (wjones...@apache.org)
>> >
>> > RESOLVED, that the Apache DataFusion Project be and hereby
>> > is tasked with the migration and rationalization of the Apache
>> > Arrow DataFusion sub-project; and be it further
>> >
>> > RESOLVED, that all responsibilities pertaining to the Apache
>> > Arrow DataFusion sub-project encumbered upon the
>> > Apache Arrow Project are hereafter discharged.
>> >
>> > NOW, THEREFORE, BE IT FURTHER RESOLVED, that Andrew Lamb
>> > be appointed to the office of Vice President, Apache DataFusion, to
>> > serve in accordance with and subject to the direction of the
>> > Board of Directors and the Bylaws of the Foundation until
>> > death, resignation, retirement, removal or disqualification,
>> > or until a successor is appointed.
>> > =============================================================
>> >
>> >
>> > -------
>> >
>> >
>> > Summary:
>> >
>> > We propose creating a new top level project, Apache DataFusion, from
>> > an existing sub project of Apache Arrow to facilitate additional
>> > community and project growth.
>> >
>> > Abstract
>> >
>> > Apache Arrow DataFusion[1] is a very fast, extensible query engine
>> > for building high-quality data-centric systems in Rust, using the
>> > Apache Arrow in-memory format. DataFusion offers SQL and Dataframe
>> > APIs, excellent performance, built-in support for CSV, Parquet, JSON,
>> > and Avro, extensive customization, and a great community.
>> >
>> > [1] https://arrow.apache.org/datafusion/
>> >
>> >
>> > Proposal
>> >
>> > We propose creating a new top level ASF project, Apache DataFusion,
>> > governed initially by a subset of the Apache Arrow project’s PMC and
>> > committers. The project’s code is in five existing git repositories,
>> > currently governed by Apache Arrow which would transfer to the new top
>> > level project.
>> >
>> > Background
>> >
>> > When DataFusion was initially donated to the Arrow project, it did not
>> > have a strong enough community to stand on its own. It has since grown
>> > significantly, and benefited immensely from being part of Arrow and
>> > nurturing of the Apache Way, and now has a community strong enough to
>> > stand on its own and that would benefit from focused governance
>> > attention.
>> >
>> > The community has discussed this idea publicly for more than 6 months
>> > https://github.com/apache/arrow-datafusion/discussions/6475 and
>> > briefly on the Arrow PMC mailing list
>> > https://lists.apache.org/thread/thv2jdm6640l6gm88hy8jhk5prjww0cs. As
>> > of the time of this writing both had exclusively positive reactions.
>> >
>> > Several current members of the Arrow PMC are both active contributors
>> > to DataFusion and understand and believe deeply in the Apache Way, and
>> > play active governance roles in the Arrow project as PMC members and
>> > PMC chairs, guiding the community, and releasing software versions.
>> > With this existing governance experience and structure, the new top
>> > level project will be able to function well immediately and
>> > independently.
>> >
>> > Overview of DataFusion
>> >
>> > Current Status
>> >
>> > Meritocracy
>> >
>> > DataFusion has been developed as part of Apache Arrow and thus has
>> > been operating as a meritocracy. Many of the developers of DataFusion
>> > are Arrow PMC members or committers. The DataFusion project plans to
>> > continue adding new PMC and committers as the project matures and
>> > grows.
>> >
>> > Community
>> >
>> > The DataFusion development team seeks to foster the development and
>> > user communities. We hope that becoming a separate project will help
>> > both Arrow and DataFusion communities by being more focused. Focused
>> > governance will make it easier to grow the community of committers and
>> > PMC members and make the organization more clear to others.
>> >
>> > Alignment
>> >
>> > The ASF is a natural host for DataFusion given that it is already the
>> > home of Arrow, Parquet, and other related distributed system, storage
>> > and query execution systems.
>> >
>> > Project Leadership
>> >
>> > Proposed Initial PMC
>> >
>> > We propose the following people as the initial DataFusion PMC members.
>> > This is a subset of the existing Arrow PMC members who contribute to
>> > DataFusion https://people.apache.org/phonebook.html?unix=arrow
>> >
>> > Andy Grove (agrove): Arrow PMC Chair
>> > Andrew Lamb (alamb): Arrow PMC, past Arrow PMC Chair
>> > Daniël Heres (dheres) Arrow PMC
>> > Jie Wen (jakevin): Arrow PMC, Doris Committer
>> > Kun Liu (liukun): Arrow PMC, IoTDB PMC, TSFile PMC
>> > Liang-Chi Hsieh (viirya): Arrow PMC, Spark PMC
>> > Qingping Hou: (houqp): Arrow PMC
>> > Wes McKinney(wesm): Arrow PMC, ASF Member
>> > Will Jones (wjones127): Arrow PMC
>> >
>> > We’d like to propose Andrew Lamb as the initial Chair, (and thus ASF
>> > VP) for the DataFusion project.
>> >
>> > Affiliations
>> >
>> > Andy Grove (agrove): NVidia
>> > Andrew Lamb (alamb): InfluxData
>> > Daniël Heres (dheres): Coralogix
>> > Jie Wen (jakevin): SelectDB
>> > Kun Liu (liukun): Ebay
>> > Liang-Chi Hsieh (viirya): Apple
>> > Qingping Hou: (houqp): Scribd
>> > Wes McKinney(wesm): Posit
>> > Will Jones (wjones127): LanceDB
>> >
>> > Proposed Initial Committers
>> >
>> > In addition to the PMC, we propose the following people as the initial
>> > DataFusion committers. This is a subset of the existing Arrow
>> > committers who contribute to DataFusion
>> > https://people.apache.org/phonebook.html?unix=arrow
>> >
>> > akurmustafa Mustafa Akur (Synnada)
>> > avantgardner Brent Gardner (Coralogix)
>> > comphead Oleks V. (Unaffiliated)
>> > jayzhan Jay Zhan (Unaffiliated)
>> > jeffreyvo Jeffry Vo (Unaffiliated)
>> > jiayuliu Liu Jiayu (Airbnb)
>> > mete Metehan Yildirim (Synnada)
>> > mingmwang Wang Mingming (Ebay)
>> > mneumann Marco Neumann (InfluxData)
>> > nju_yaho Zhong Yanghong (Ebay)
>> > ozankabak Mehmet Ozan Kabak (Synnada)
>> > paddyhoran Paddy Horan (Assured Allies)
>> > rdettai Rémi Dettai (Cloudfuse)
>> > sunchao Chao Sun (Apple)
>> > thinkharderdev Daniel Harris (Coralogix)
>> > tustvold Raphael Taylor-Davies (InfluxData)
>> > wayne Ruihang Xia (Greptime)
>> > xudong963 Xudong Wang (ByteDance)
>> > yjshen Yijie Shen (Space and Time)
>> > yangjiang Yang Jiang (ebay)
>> >
>> >
>> > Risk Assessments
>> >
>> > Naming / Trademarks
>> >
>> > As a sub-project of Arrow, the DataFusion name has been used for over
>> > 4 years without any known issues. A podling name search did not turn
>> > up any concerns and was approved:
>> > https://issues.apache.org/jira/browse/PODLINGNAMESEARCH-219
>> >
>> > Legal / IP Clearance
>> >
>> > All DataFusion code has either been donated to the Arrow project with
>> > appropriate IP clearance or has been developed directly under ASF
>> > processes and procedures. Thus creating a new top level project poses
>> > no new Legal or IP risks.
>> >
>> > Code Extraction
>> >
>> > The relevant code is already in 5 separate repositories:
>> > https://github.com/apache/arrow-datafusion/
>> > https://github.com/apache/arrow-datafusion-python
>> > https://github.com/apache/arrow-ballista
>> > https://github.com/apache/arrow-ballista-python
>> > https://github.com/apache/arrow-datafusion-comet
>> >
>> > We foresee no issues with code extraction and propose these
>> > repositories be renamed to reflect top level projects
>> >
>> > Note: https://github.com/apache/arrow-rs, the Rust implementation of
>> > Arrow, would remain part of the Arrow project.
>> >
>> > Orphaned Products
>> >
>> > DataFusion is known to be used in many open source and commercial
>> > projects
>> > https://arrow.apache.org/datafusion/user-guide/introduction.html#known-users,
>> > has had multiple commits daily for several years, and its adoption and
>> > number of contributors appears to be growing. We do not foresee the
>> > project being orphaned in the next several years.
>> >
>> > Inexperience with Open Source
>> >
>> > The proposed PMC has extensive experience with Apache Arrow and other
>> > Apache projects, and includes PMC members, PMC chairs and an ASF
>> > Member. The DataFusion PMC and more experienced committers will
>> > continue to coach new community members who may be less familiar with
>> > the Apache Way.
>> >
>> > Homogeneous Developers
>> >
>> > The 9 proposed PMC members are from 9 different employers and the
>> > proposed committers are similarly distributed across affiliations. No
>> > specific entity employs more than 3 total proposed developers.
>> >
>> > Reliance on Salaried Developers
>> >
>> > A substantial amount of work on DataFusion has been by salaried
>> > developers, but it also has a long tradition of attracting
>> > contributions from students and hobbyists and we plan no changes in
>> > contribution structure.
>> >
>> > Relationships with Other Apache Products
>> >
>> > DataFusion will obviously have a strong relationship with the Arrow
>> > project given the overlap in people. We don’t foresee close
>> > collaboration with other projects at this time.
>> >
>> > Cryptography
>> >
>> > DataFusion does not directly support encryption and there are no
>> > near-term plans to add support for encryption. Users who need this
>> > functionality can use the extension APIs.
>> >
>> > Required Resources
>> >
>> > Mailing Lists
>> >
>> > - priv...@datafusion.apache.org for private PMC discussions (with
>> > moderated subscriptions)
>> > - d...@datafusion.apache.org
>> > - comm...@datafusion.apache.org
>> > - u...@datafusion.apache.org
>> >
>> > Version Control
>> >
>> > We propose to continue to use git for source control and github for
>> > hosting and testing resources.
>> >
>> > We also need to rename the github repositories to reflect the new top
>> > level names:
>> >
>> > https://github.com/apache/arrow-datafusion/ → apache/datafusion
>> > https://github.com/apache/arrow-datafusion-python →
>> > apache/datafusion-python
>> > https://github.com/apache/arrow-ballista → apache/datafusion-ballista
>> > https://github.com/apache/arrow-ballista-python →
>> > apache/datafusion-ballista-python
>> > https://github.com/apache/arrow-datafusion-comet →
>> > apache/datafusion-comet
>> >
>> >
>> >
>> > Issue Tracking
>> >
>> > DataFusion would continue to use github for its issue tracking and
>> > communications
>> >
>> > Other Resources
>> >
>> > The existing repositories already make use of existing Apache
>> > infrastructure, and we expect no change in the initial resource usage.
>> > As the project continues to grow, we expect continued infrastructure
>> > demand growth.
>> >
>> >
>> > FAQ: Has a sub project been promoted to a top level project before?
>> >
>> > Yes, and it appears to happen commonly. The Arrow project itself was
>> > created as a top level project from work that started in Apache Drill,
>> > and there are many sub projects of Hadoop that spun out as their own
>> > top level projects such as Mahout, Avro and HBase:
>> >
>> > https://news.apache.org/foundation/entry/the_apache_software_foundation_announces4
>> >
>> >
>> >
>> > Related material:
>> > Name search request / research for DataFusion:
>> > https://issues.apache.org/jira/browse/PODLINGNAMESEARCH-219
>> > Discussion about this proposal on the arrow mailing list:
>> > https://lists.apache.org/thread/c150t1s1x0kcb3r03cjyx31kqs5oc341
>> > Discussion about which repositories on the arrow mailing list:
>> > https://lists.apache.org/thread/ob3n0d9ky0bgrryl3xn39w9k566bq00q
>> > Discussion about initial PMC on the arrow mailing list:
>> > https://lists.apache.org/thread/pymrzcdw4qdptvby85f69rg3pcckl15b
>> > Discussion in github about creating a new DataFusion top level
>> > project: https://github.com/apache/arrow-datafusion/discussions/6475
>> > Discussion about graduating on incubator list:
>> > https://lists.apache.org/thread/r4n73pmms1lv0jbohyx1o1z13d615t99
>> > Original Proposal for the Arrow project:
>> > https://lists.apache.org/thread/x2qzdwglm8pkqp9gv03bbgw17khl7pq3
>> >
>>