+1 (binding)

On Sun, Mar 3, 2024 at 09:43 Wayne Xia <waynest...@gmail.com> wrote:
> +1 (non-binding)
>
> Regards,
> Wayne
>
> Julian Hyde <jhyde.apa...@gmail.com> wrote on Mon, Mar 4, 2024 at 1:38 AM:
>
> > +1 (binding)
> >
> > > On Mar 2, 2024, at 2:28 PM, Dewey Dunnington <de...@voltrondata.com.invalid> wrote:
> > >
> > > +1 (binding)
> > >
> > >> On Sat, Mar 2, 2024 at 8:08 AM vin jake <jakevin...@gmail.com> wrote:
> > >>
> > >> +1 (binding)
> > >>
> > >>> On Fri, Mar 1, 2024 at 7:33 PM Andrew Lamb <al...@influxdata.com> wrote:
> > >>>
> > >>> Hello,
> > >>>
> > >>> As we have discussed[1][2] I would like to vote on the proposal to
> > >>> create a new Apache Top Level Project for DataFusion. The text of the
> > >>> proposed resolution and background document is copy/pasted below
> > >>>
> > >>> If the community is in favor of this, we plan to submit the resolution
> > >>> to the ASF board for approval with the next Arrow report (for the
> > >>> April 2024 board meeting).
> > >>>
> > >>> The vote will be open for at least 7 days.
> > >>>
> > >>> [ ] +1 Accept this Proposal
> > >>> [ ] +0
> > >>> [ ] -1 Do not accept this proposal because...
> > >>>
> > >>> Andrew
> > >>>
> > >>> [1] https://lists.apache.org/thread/c150t1s1x0kcb3r03cjyx31kqs5oc341
> > >>> [2] https://github.com/apache/arrow-datafusion/discussions/6475
> > >>>
> > >>> ---------- Proposed Resolution ---------
> > >>>
> > >>> Resolution to Create the Apache DataFusion Project from the Apache
> > >>> Arrow DataFusion Sub Project
> > >>>
> > >>> =============================================================
> > >>>
> > >>> X. Establish the Apache DataFusion Project
> > >>>
> > >>> WHEREAS, the Board of Directors deems it to be in the best
> > >>> interests of the Foundation and consistent with the
> > >>> Foundation's purpose to establish a Project Management
> > >>> Committee charged with the creation and maintenance of
> > >>> open-source software related to an extensible query engine
> > >>> for distribution at no charge to the public.
> > >>>
> > >>> NOW, THEREFORE, BE IT RESOLVED, that a Project Management
> > >>> Committee (PMC), to be known as the "Apache DataFusion Project",
> > >>> be and hereby is established pursuant to Bylaws of the
> > >>> Foundation; and be it further
> > >>>
> > >>> RESOLVED, that the Apache DataFusion Project be and hereby is
> > >>> responsible for the creation and maintenance of software
> > >>> related to an extensible query engine; and be it further
> > >>>
> > >>> RESOLVED, that the office of "Vice President, Apache DataFusion" be
> > >>> and hereby is created, the person holding such office to
> > >>> serve at the direction of the Board of Directors as the chair
> > >>> of the Apache DataFusion Project, and to have primary responsibility
> > >>> for management of the projects within the scope of
> > >>> responsibility of the Apache DataFusion Project; and be it further
> > >>>
> > >>> RESOLVED, that the persons listed immediately below be and
> > >>> hereby are appointed to serve as the initial members of the
> > >>> Apache DataFusion Project:
> > >>>
> > >>> * Andy Grove (agr...@apache.org)
> > >>> * Andrew Lamb (al...@apache.org)
> > >>> * Daniël Heres (dhe...@apache.org)
> > >>> * Jie Wen (jake...@apache.org)
> > >>> * Kun Liu (liu...@apache.org)
> > >>> * Liang-Chi Hsieh (vii...@apache.org)
> > >>> * Qingping Hou: (ho...@apache.org)
> > >>> * Wes McKinney(w...@apache.org)
> > >>> * Will Jones (wjones...@apache.org)
> > >>>
> > >>> RESOLVED, that the Apache DataFusion Project be and hereby
> > >>> is tasked with the migration and rationalization of the Apache
> > >>> Arrow DataFusion sub-project; and be it further
> > >>>
> > >>> RESOLVED, that all responsibilities pertaining to the Apache
> > >>> Arrow DataFusion sub-project encumbered upon the
> > >>> Apache Arrow Project are hereafter discharged.
> > >>>
> > >>> NOW, THEREFORE, BE IT FURTHER RESOLVED, that Andrew Lamb
> > >>> be appointed to the office of Vice President, Apache DataFusion, to
> > >>> serve in accordance with and subject to the direction of the
> > >>> Board of Directors and the Bylaws of the Foundation until
> > >>> death, resignation, retirement, removal or disqualification,
> > >>> or until a successor is appointed.
> > >>> =============================================================
> > >>>
> > >>>
> > >>> -------
> > >>>
> > >>>
> > >>> Summary:
> > >>>
> > >>> We propose creating a new top level project, Apache DataFusion, from
> > >>> an existing sub project of Apache Arrow to facilitate additional
> > >>> community and project growth.
> > >>>
> > >>> Abstract
> > >>>
> > >>> Apache Arrow DataFusion[1] is a very fast, extensible query engine
> > >>> for building high-quality data-centric systems in Rust, using the
> > >>> Apache Arrow in-memory format. DataFusion offers SQL and Dataframe
> > >>> APIs, excellent performance, built-in support for CSV, Parquet, JSON,
> > >>> and Avro, extensive customization, and a great community.
> > >>>
> > >>> [1] https://arrow.apache.org/datafusion/
> > >>>
> > >>>
> > >>> Proposal
> > >>>
> > >>> We propose creating a new top level ASF project, Apache DataFusion,
> > >>> governed initially by a subset of the Apache Arrow project’s PMC and
> > >>> committers. The project’s code is in five existing git repositories,
> > >>> currently governed by Apache Arrow which would transfer to the new top
> > >>> level project.
> > >>>
> > >>> Background
> > >>>
> > >>> When DataFusion was initially donated to the Arrow project, it did not
> > >>> have a strong enough community to stand on its own. It has since grown
> > >>> significantly, and benefited immensely from being part of Arrow and
> > >>> nurturing of the Apache Way, and now has a community strong enough to
> > >>> stand on its own and that would benefit from focused governance
> > >>> attention.
> > >>>
> > >>> The community has discussed this idea publicly for more than 6 months
> > >>> https://github.com/apache/arrow-datafusion/discussions/6475 and
> > >>> briefly on the Arrow PMC mailing list
> > >>> https://lists.apache.org/thread/thv2jdm6640l6gm88hy8jhk5prjww0cs. As
> > >>> of the time of this writing both had exclusively positive reactions.
> > >>>
> > >>> Several current members of the Arrow PMC are both active contributors
> > >>> to DataFusion and understand and believe deeply in the Apache Way, and
> > >>> play active governance roles in the Arrow project as PMC members and
> > >>> PMC chairs, guiding the community, and releasing software versions.
> > >>> With this existing governance experience and structure, the new top
> > >>> level project will be able to function well immediately and
> > >>> independently.
> > >>>
> > >>> Overview of DataFusion
> > >>>
> > >>> Current Status
> > >>>
> > >>> Meritocracy
> > >>>
> > >>> DataFusion has been developed as part of Apache Arrow and thus has
> > >>> been operating as a meritocracy. Many of the developers of DataFusion
> > >>> are Arrow PMC members or committers. The DataFusion project plans to
> > >>> continue adding new PMC and committers as the project matures and
> > >>> grows.
> > >>>
> > >>> Community
> > >>>
> > >>> The DataFusion development team seeks to foster the development and
> > >>> user communities. We hope that becoming a separate project will help
> > >>> both Arrow and DataFusion communities by being more focused. Focused
> > >>> governance will make it easier to grow the community of committers and
> > >>> PMC members and make the organization more clear to others.
> > >>>
> > >>> Alignment
> > >>>
> > >>> The ASF is a natural host for DataFusion given that it is already the
> > >>> home of Arrow, Parquet, and other related distributed system, storage
> > >>> and query execution systems.
> > >>>
> > >>> Project Leadership
> > >>>
> > >>> Proposed Initial PMC
> > >>>
> > >>> We propose the following people as the initial DataFusion PMC members.
> > >>> This is a subset of the existing Arrow PMC members who contribute to
> > >>> DataFusion https://people.apache.org/phonebook.html?unix=arrow
> > >>>
> > >>> Andy Grove (agrove): Arrow PMC Chair
> > >>> Andrew Lamb (alamb): Arrow PMC, past Arrow PMC Chair
> > >>> Daniël Heres (dheres) Arrow PMC
> > >>> Jie Wen (jakevin): Arrow PMC, Doris Committer
> > >>> Kun Liu (liukun): Arrow PMC, IoTDB PMC, TSFile PMC
> > >>> Liang-Chi Hsieh (viirya): Arrow PMC, Spark PMC
> > >>> Qingping Hou: (houqp): Arrow PMC
> > >>> Wes McKinney(wesm): Arrow PMC, ASF Member
> > >>> Will Jones (wjones127): Arrow PMC
> > >>>
> > >>> We’d like to propose Andrew Lamb as the initial Chair, (and thus ASF
> > >>> VP) for the DataFusion project.
> > >>>
> > >>> Affiliations
> > >>>
> > >>> Andy Grove (agrove): NVidia
> > >>> Andrew Lamb (alamb): InfluxData
> > >>> Daniël Heres (dheres): Coralogix
> > >>> Jie Wen (jakevin): SelectDB
> > >>> Kun Liu (liukun): Ebay
> > >>> Liang-Chi Hsieh (viirya): Apple
> > >>> Qingping Hou: (houqp): Scribd
> > >>> Wes McKinney(wesm): Posit
> > >>> Will Jones (wjones127): LanceDB
> > >>>
> > >>> Proposed Initial Committers
> > >>>
> > >>> In addition to the PMC, we propose the following people as the initial
> > >>> DataFusion committers. This is a subset of the existing Arrow
> > >>> committers who contribute to DataFusion
> > >>> https://people.apache.org/phonebook.html?unix=arrow
> > >>>
> > >>> akurmustafa Mustafa Akur (Synnada)
> > >>> avantgardner Brent Gardner (Coralogix)
> > >>> comphead Oleks V. (Unaffiliated)
> > >>> jayzhan Jay Zhan (Unaffiliated)
> > >>> jeffreyvo Jeffry Vo (Unaffiliated)
> > >>> jiayuliu Liu Jiayu (Airbnb)
> > >>> mete Metehan Yildirim (Synnada)
> > >>> mingmwang Wang Mingming (Ebay)
> > >>> mneumann Marco Neumann (InfluxData)
> > >>> nju_yaho Zhong Yanghong (Ebay)
> > >>> ozankabak Mehmet Ozan Kabak (Synnada)
> > >>> paddyhoran Paddy Horan (Assured Allies)
> > >>> rdettai Rémi Dettai (Cloudfuse)
> > >>> sunchao Chao Sun (Apple)
> > >>> thinkharderdev Daniel Harris (Coralogix)
> > >>> tustvold Raphael Taylor-Davies (InfluxData)
> > >>> wayne Ruihang Xia (Greptime)
> > >>> xudong963 Xudong Wang (ByteDance)
> > >>> yjshen Yijie Shen (Space and Time)
> > >>> yangjiang Yang Jiang (ebay)
> > >>>
> > >>>
> > >>> Risk Assessments
> > >>>
> > >>> Naming / Trademarks
> > >>>
> > >>> As a sub-project of Arrow, the DataFusion name has been used for over
> > >>> 4 years without any known issues. A podling name search did not turn
> > >>> up any concerns and was approved:
> > >>> https://issues.apache.org/jira/browse/PODLINGNAMESEARCH-219
> > >>>
> > >>> Legal / IP Clearance
> > >>>
> > >>> All DataFusion code has either been donated to the Arrow project with
> > >>> appropriate IP clearance or has been developed directly under ASF
> > >>> processes and procedures. Thus creating a new top level project poses
> > >>> no new Legal or IP risks.
> > >>>
> > >>> Code Extraction
> > >>>
> > >>> The relevant code is already in 5 separate repositories:
> > >>> https://github.com/apache/arrow-datafusion/
> > >>> https://github.com/apache/arrow-datafusion-python
> > >>> https://github.com/apache/arrow-ballista
> > >>> https://github.com/apache/arrow-ballista-python
> > >>> https://github.com/apache/arrow-datafusion-comet
> > >>>
> > >>> We foresee no issues with code extraction and propose these
> > >>> repositories be renamed to reflect top level projects
> > >>>
> > >>> Note: https://github.com/apache/arrow-rs, the Rust implementation of
> > >>> Arrow, would remain part of the Arrow project.
> > >>>
> > >>> Orphaned Products
> > >>>
> > >>> DataFusion is known to be used in many open source and commercial
> > >>> projects
> > >>> https://arrow.apache.org/datafusion/user-guide/introduction.html#known-users ,
> > >>> has had multiple commits daily for several years, and its adoption and
> > >>> number of contributors appears to be growing. We do not foresee the
> > >>> project being orphaned in the next several years.
> > >>>
> > >>> Inexperience with Open Source
> > >>>
> > >>> The proposed PMC has extensive experience with Apache Arrow and other
> > >>> Apache projects, and includes PMC members, PMC chairs and an ASF
> > >>> Member. The DataFusion PMC and more experienced committers will
> > >>> continue to coach new community members who may be less familiar with
> > >>> the Apache Way.
> > >>>
> > >>> Homogeneous Developers
> > >>>
> > >>> The 9 proposed PMC members are from 9 different employers and the
> > >>> proposed committers are similarly distributed across affiliations. No
> > >>> specific entity employs more than 3 total proposed developers.
> > >>>
> > >>> Reliance on Salaried Developers
> > >>>
> > >>> A substantial amount of work on DataFusion has been by salaried
> > >>> developers, but it also has a long tradition of attracting
> > >>> contributions from students and hobbyists and we plan no changes in
> > >>> contribution structure.
> > >>>
> > >>> Relationships with Other Apache Products
> > >>>
> > >>> DataFusion will obviously have a strong relationship with the Arrow
> > >>> project given the overlap in people. We don’t foresee close
> > >>> collaboration with other projects at this time.
> > >>>
> > >>> Cryptography
> > >>>
> > >>> DataFusion does not directly support encryption and there are no
> > >>> near-term plans to add support for encryption. Users who need this
> > >>> functionality can use the extension APIs.
> > >>>
> > >>> Required Resources
> > >>>
> > >>> Mailing Lists
> > >>>
> > >>> - priv...@datafusion.apache.org for private PMC discussions (with
> > >>> moderated subscriptions)
> > >>> - d...@datafusion.apache.org
> > >>> - comm...@datafusion.apache.org
> > >>> - u...@datafusion.apache.org
> > >>>
> > >>> Version Control
> > >>>
> > >>> We propose to continue to use git for source control and github for
> > >>> hosting and testing resources.
> > >>>
> > >>> We also need to rename the github repositories to reflect the new top
> > >>> level names:
> > >>>
> > >>> https://github.com/apache/arrow-datafusion/ → apache/datafusion
> > >>> https://github.com/apache/arrow-datafusion-python → apache/datafusion-python
> > >>> https://github.com/apache/arrow-ballista → apache/datafusion-ballista
> > >>> https://github.com/apache/arrow-ballista-python → apache/datafusion-ballista-python
> > >>> https://github.com/apache/arrow-datafusion-comet → apache/datafusion-comet
> > >>>
> > >>>
> > >>>
> > >>> Issue Tracking
> > >>>
> > >>> DataFusion would continue to use github for its issue tracking and
> > >>> communications
> > >>>
> > >>> Other Resources
> > >>>
> > >>> The existing repositories already make use of existing Apache
> > >>> infrastructure, and we expect no change in the initial resource usage.
> > >>> As the project continues to grow, we expect continued infrastructure
> > >>> demand growth.
> > >>>
> > >>>
> > >>> FAQ: Has a sub project been promoted to a top level project before?
> > >>>
> > >>> Yes, and it appears to happen commonly. The Arrow project itself was
> > >>> created as a top level project from work that started in Apache Drill,
> > >>> and there are many sub projects of Hadoop that spun out as their own
> > >>> top level projects such as Mahout, Avro and HBase:
> > >>>
> > >>> https://news.apache.org/foundation/entry/the_apache_software_foundation_announces4
> > >>>
> > >>>
> > >>>
> > >>> Related material:
> > >>> Name search request / research for DataFusion:
> > >>> https://issues.apache.org/jira/browse/PODLINGNAMESEARCH-219
> > >>> Discussion about this proposal on the arrow mailing list:
> > >>> https://lists.apache.org/thread/c150t1s1x0kcb3r03cjyx31kqs5oc341
> > >>> Discussion about which repositories on the arrow mailing list:
> > >>> https://lists.apache.org/thread/ob3n0d9ky0bgrryl3xn39w9k566bq00q
> > >>> Discussion about initial PMC on the arrow mailing list:
> > >>> https://lists.apache.org/thread/pymrzcdw4qdptvby85f69rg3pcckl15b
> > >>> Discussion in github about creating a new DataFusion top level
> > >>> project: https://github.com/apache/arrow-datafusion/discussions/6475
> > >>> Discussion about graduating on incubator list:
> > >>> https://lists.apache.org/thread/r4n73pmms1lv0jbohyx1o1z13d615t99
> > >>> Original Proposal for the Arrow project:
> > >>> https://lists.apache.org/thread/x2qzdwglm8pkqp9gv03bbgw17khl7pq3