+1 (binding)

On Fri, Mar 1, 2024, at 2:37 PM, Andy Grove wrote:
> +1 (binding)
>
> On Fri, Mar 1, 2024 at 6:20 AM Weston Pace <weston.p...@gmail.com> wrote:
>
>> +1 (binding)
>>
>> On Fri, Mar 1, 2024 at 3:33 AM Andrew Lamb <al...@influxdata.com> wrote:
>>
>> > Hello,
>> >
>> > As we have discussed[1][2], I would like to vote on the proposal to
>> > create a new Apache Top Level Project for DataFusion. The text of the
>> > proposed resolution and background document is copied below.
>> >
>> > If the community is in favor of this, we plan to submit the resolution
>> > to the ASF board for approval with the next Arrow report (for the
>> > April 2024 board meeting).
>> >
>> > The vote will be open for at least 7 days.
>> >
>> > [ ] +1 Accept this Proposal
>> > [ ] +0
>> > [ ] -1 Do not accept this proposal because...
>> >
>> > Andrew
>> >
>> > [1] https://lists.apache.org/thread/c150t1s1x0kcb3r03cjyx31kqs5oc341
>> > [2] https://github.com/apache/arrow-datafusion/discussions/6475
>> >
>> > ---------- Proposed Resolution ---------
>> >
>> > Resolution to Create the Apache DataFusion Project from the Apache
>> > Arrow DataFusion Sub Project
>> >
>> > =============================================================
>> >
>> > X. Establish the Apache DataFusion Project
>> >
>> > WHEREAS, the Board of Directors deems it to be in the best
>> > interests of the Foundation and consistent with the
>> > Foundation's purpose to establish a Project Management
>> > Committee charged with the creation and maintenance of
>> > open-source software related to an extensible query engine
>> > for distribution at no charge to the public.
>> >
>> > NOW, THEREFORE, BE IT RESOLVED, that a Project Management
>> > Committee (PMC), to be known as the "Apache DataFusion Project",
>> > be and hereby is established pursuant to Bylaws of the
>> > Foundation; and be it further
>> >
>> > RESOLVED, that the Apache DataFusion Project be and hereby is
>> > responsible for the creation and maintenance of software
>> > related to an extensible query engine; and be it further
>> >
>> > RESOLVED, that the office of "Vice President, Apache DataFusion" be
>> > and hereby is created, the person holding such office to
>> > serve at the direction of the Board of Directors as the chair
>> > of the Apache DataFusion Project, and to have primary responsibility
>> > for management of the projects within the scope of
>> > responsibility of the Apache DataFusion Project; and be it further
>> >
>> > RESOLVED, that the persons listed immediately below be and
>> > hereby are appointed to serve as the initial members of the
>> > Apache DataFusion Project:
>> >
>> > * Andy Grove (agr...@apache.org)
>> > * Andrew Lamb (al...@apache.org)
>> > * Daniël Heres (dhe...@apache.org)
>> > * Jie Wen (jake...@apache.org)
>> > * Kun Liu (liu...@apache.org)
>> > * Liang-Chi Hsieh (vii...@apache.org)
>> > * Qingping Hou (ho...@apache.org)
>> > * Wes McKinney (w...@apache.org)
>> > * Will Jones (wjones...@apache.org)
>> >
>> > RESOLVED, that the Apache DataFusion Project be and hereby
>> > is tasked with the migration and rationalization of the Apache
>> > Arrow DataFusion sub-project; and be it further
>> >
>> > RESOLVED, that all responsibilities pertaining to the Apache
>> > Arrow DataFusion sub-project encumbered upon the
>> > Apache Arrow Project are hereafter discharged.
>> >
>> > NOW, THEREFORE, BE IT FURTHER RESOLVED, that Andrew Lamb
>> > be appointed to the office of Vice President, Apache DataFusion, to
>> > serve in accordance with and subject to the direction of the
>> > Board of Directors and the Bylaws of the Foundation until
>> > death, resignation, retirement, removal or disqualification,
>> > or until a successor is appointed.
>> > =============================================================
>> >
>> >
>> > -------
>> >
>> >
>> > Summary:
>> >
>> > We propose creating a new top level project, Apache DataFusion, from
>> > an existing sub project of Apache Arrow to facilitate additional
>> > community and project growth.
>> >
>> > Abstract
>> >
>> > Apache Arrow DataFusion[1] is a very fast, extensible query engine
>> > for building high-quality data-centric systems in Rust, using the
>> > Apache Arrow in-memory format. DataFusion offers SQL and DataFrame
>> > APIs, excellent performance, built-in support for CSV, Parquet, JSON,
>> > and Avro, extensive customization, and a great community.
>> >
>> > [1] https://arrow.apache.org/datafusion/
>> >
>> >
>> > Proposal
>> >
>> > We propose creating a new top level ASF project, Apache DataFusion,
>> > governed initially by a subset of the Apache Arrow project’s PMC and
>> > committers. The project’s code is in five existing git repositories,
>> > currently governed by Apache Arrow, which would be transferred to the
>> > new top level project.
>> >
>> > Background
>> >
>> > When DataFusion was initially donated to the Arrow project, it did not
>> > have a strong enough community to stand on its own. It has since grown
>> > significantly, has benefited immensely from being part of Arrow and
>> > from the nurturing of the Apache Way, and now has a community that is
>> > strong enough to stand on its own and would benefit from focused
>> > governance attention.
>> >
>> > The community has discussed this idea publicly for more than 6 months
>> > https://github.com/apache/arrow-datafusion/discussions/6475 and
>> > briefly on the Arrow PMC mailing list
>> > https://lists.apache.org/thread/thv2jdm6640l6gm88hy8jhk5prjww0cs. As
>> > of the time of this writing, both discussions had received exclusively
>> > positive reactions.
>> >
>> > Several current members of the Arrow PMC are active contributors to
>> > DataFusion, understand and believe deeply in the Apache Way, and play
>> > active governance roles in the Arrow project as PMC members and PMC
>> > chairs, guiding the community and releasing software versions.
>> > With this existing governance experience and structure, the new top
>> > level project will be able to function well immediately and
>> > independently.
>> >
>> > Overview of DataFusion
>> >
>> > Current Status
>> >
>> > Meritocracy
>> >
>> > DataFusion has been developed as part of Apache Arrow and thus has
>> > been operating as a meritocracy. Many of the developers of DataFusion
>> > are Arrow PMC members or committers. The DataFusion project plans to
>> > continue adding new PMC members and committers as the project matures
>> > and grows.
>> >
>> > Community
>> >
>> > The DataFusion development team seeks to foster the development and
>> > user communities. We hope that becoming a separate project will help
>> > both Arrow and DataFusion communities by being more focused. Focused
>> > governance will make it easier to grow the community of committers and
>> > PMC members and make the organization clearer to others.
>> >
>> > Alignment
>> >
>> > The ASF is a natural host for DataFusion given that it is already the
>> > home of Arrow, Parquet, and other related distributed systems, storage,
>> > and query execution projects.
>> >
>> > Project Leadership
>> >
>> > Proposed Initial PMC
>> >
>> > We propose the following people as the initial DataFusion PMC members.
>> > This is a subset of the existing Arrow PMC members who contribute to
>> > DataFusion (https://people.apache.org/phonebook.html?unix=arrow).
>> >
>> > Andy Grove (agrove): Arrow PMC Chair
>> > Andrew Lamb (alamb): Arrow PMC, past Arrow PMC Chair
>> > Daniël Heres (dheres): Arrow PMC
>> > Jie Wen (jakevin): Arrow PMC, Doris Committer
>> > Kun Liu (liukun): Arrow PMC, IoTDB PMC, TSFile PMC
>> > Liang-Chi Hsieh (viirya): Arrow PMC, Spark PMC
>> > Qingping Hou (houqp): Arrow PMC
>> > Wes McKinney (wesm): Arrow PMC, ASF Member
>> > Will Jones (wjones127): Arrow PMC
>> >
>> > We’d like to propose Andrew Lamb as the initial Chair (and thus ASF
>> > VP) for the DataFusion project.
>> >
>> > Affiliations
>> >
>> > Andy Grove (agrove): NVIDIA
>> > Andrew Lamb (alamb): InfluxData
>> > Daniël Heres (dheres): Coralogix
>> > Jie Wen (jakevin): SelectDB
>> > Kun Liu (liukun): eBay
>> > Liang-Chi Hsieh (viirya): Apple
>> > Qingping Hou (houqp): Scribd
>> > Wes McKinney (wesm): Posit
>> > Will Jones (wjones127): LanceDB
>> >
>> > Proposed Initial Committers
>> >
>> > In addition to the PMC, we propose the following people as the initial
>> > DataFusion committers. This is a subset of the existing Arrow
>> > committers who contribute to DataFusion
>> > (https://people.apache.org/phonebook.html?unix=arrow).
>> >
>> > akurmustafa Mustafa Akur (Synnada)
>> > avantgardner Brent Gardner (Coralogix)
>> > comphead Oleks V. (Unaffiliated)
>> > jayzhan Jay Zhan (Unaffiliated)
>> > jeffreyvo Jeffrey Vo (Unaffiliated)
>> > jiayuliu Liu Jiayu (Airbnb)
>> > mete Metehan Yildirim (Synnada)
>> > mingmwang Wang Mingming (eBay)
>> > mneumann Marco Neumann (InfluxData)
>> > nju_yaho Zhong Yanghong (eBay)
>> > ozankabak Mehmet Ozan Kabak (Synnada)
>> > paddyhoran Paddy Horan (Assured Allies)
>> > rdettai Rémi Dettai (Cloudfuse)
>> > sunchao Chao Sun (Apple)
>> > thinkharderdev Daniel Harris (Coralogix)
>> > tustvold Raphael Taylor-Davies (InfluxData)
>> > wayne Ruihang Xia (Greptime)
>> > xudong963 Xudong Wang (ByteDance)
>> > yjshen Yijie Shen (Space and Time)
>> > yangjiang Yang Jiang (eBay)
>> >
>> >
>> > Risk Assessments
>> >
>> > Naming / Trademarks
>> >
>> > As a sub-project of Arrow, the DataFusion name has been used for over
>> > 4 years without any known issues. A podling name search did not turn
>> > up any concerns and was approved:
>> > https://issues.apache.org/jira/browse/PODLINGNAMESEARCH-219
>> >
>> > Legal / IP Clearance
>> >
>> > All DataFusion code has either been donated to the Arrow project with
>> > appropriate IP clearance or has been developed directly under ASF
>> > processes and procedures. Thus, creating a new top level project poses
>> > no new legal or IP risks.
>> >
>> > Code Extraction
>> >
>> > The relevant code is already in 5 separate repositories:
>> > https://github.com/apache/arrow-datafusion/
>> > https://github.com/apache/arrow-datafusion-python
>> > https://github.com/apache/arrow-ballista
>> > https://github.com/apache/arrow-ballista-python
>> > https://github.com/apache/arrow-datafusion-comet
>> >
>> > We foresee no issues with code extraction and propose that these
>> > repositories be renamed to reflect the new top level project.
>> >
>> > Note: https://github.com/apache/arrow-rs, the Rust implementation of
>> > Arrow, would remain part of the Arrow project.
>> >
>> > Orphaned Products
>> >
>> > DataFusion is known to be used in many open source and commercial
>> > projects
>> > (https://arrow.apache.org/datafusion/user-guide/introduction.html#known-users),
>> > has had multiple commits daily for several years, and its adoption and
>> > number of contributors appear to be growing. We do not foresee the
>> > project being orphaned in the next several years.
>> >
>> > Inexperience with Open Source
>> >
>> > The proposed PMC has extensive experience with Apache Arrow and other
>> > Apache projects, and includes PMC members, PMC chairs and an ASF
>> > Member. The DataFusion PMC and more experienced committers will
>> > continue to coach new community members who may be less familiar with
>> > the Apache Way.
>> >
>> > Homogeneous Developers
>> >
>> > The 9 proposed PMC members are from 9 different employers and the
>> > proposed committers are similarly distributed across affiliations. No
>> > specific entity employs more than 3 total proposed developers.
>> >
>> > Reliance on Salaried Developers
>> >
>> > A substantial amount of work on DataFusion has been done by salaried
>> > developers, but it also has a long tradition of attracting
>> > contributions from students and hobbyists, and we plan no changes in
>> > contribution structure.
>> >
>> > Relationships with Other Apache Products
>> >
>> > DataFusion will obviously have a strong relationship with the Arrow
>> > project given the overlap in people. We don’t foresee close
>> > collaboration with other projects at this time.
>> >
>> > Cryptography
>> >
>> > DataFusion does not directly support encryption and there are no
>> > near-term plans to add support for encryption. Users who need this
>> > functionality can use the extension APIs.
>> >
>> > Required Resources
>> >
>> > Mailing Lists
>> >
>> > - priv...@datafusion.apache.org for private PMC discussions (with
>> > moderated subscriptions)
>> > - d...@datafusion.apache.org
>> > - comm...@datafusion.apache.org
>> > - u...@datafusion.apache.org
>> >
>> > Version Control
>> >
>> > We propose to continue to use git for source control and GitHub for
>> > hosting and testing resources.
>> >
>> > We also need to rename the GitHub repositories to reflect the new top
>> > level names:
>> >
>> > https://github.com/apache/arrow-datafusion/ → apache/datafusion
>> > https://github.com/apache/arrow-datafusion-python →
>> > apache/datafusion-python
>> > https://github.com/apache/arrow-ballista → apache/datafusion-ballista
>> > https://github.com/apache/arrow-ballista-python →
>> > apache/datafusion-ballista-python
>> > https://github.com/apache/arrow-datafusion-comet →
>> > apache/datafusion-comet
>> >
>> >
>> >
>> > Issue Tracking
>> >
>> > DataFusion would continue to use GitHub for its issue tracking and
>> > communications.
>> >
>> > Other Resources
>> >
>> > The existing repositories already use Apache infrastructure, and we
>> > expect no change in the initial resource usage. As the project
>> > continues to grow, we expect its infrastructure demand to grow as
>> > well.
>> >
>> >
>> > FAQ: Has a sub project been promoted to a top level project before?
>> >
>> > Yes, and it appears to happen commonly. The Arrow project itself was
>> > created as a top level project from work that started in Apache Drill,
>> > and there are many sub projects of Hadoop that spun out as their own
>> > top level projects such as Mahout, Avro and HBase:
>> > https://news.apache.org/foundation/entry/the_apache_software_foundation_announces4
>> >
>> >
>> >
>> > Related material:
>> > Name search request / research for DataFusion:
>> > https://issues.apache.org/jira/browse/PODLINGNAMESEARCH-219
>> > Discussion about this proposal on the Arrow mailing list:
>> > https://lists.apache.org/thread/c150t1s1x0kcb3r03cjyx31kqs5oc341
>> > Discussion about which repositories on the Arrow mailing list:
>> > https://lists.apache.org/thread/ob3n0d9ky0bgrryl3xn39w9k566bq00q
>> > Discussion about initial PMC on the Arrow mailing list:
>> > https://lists.apache.org/thread/pymrzcdw4qdptvby85f69rg3pcckl15b
>> > Discussion in github about creating a new DataFusion top level
>> > project: https://github.com/apache/arrow-datafusion/discussions/6475
>> > Discussion about graduating on incubator list:
>> > https://lists.apache.org/thread/r4n73pmms1lv0jbohyx1o1z13d615t99
>> > Original Proposal for the Arrow project:
>> > https://lists.apache.org/thread/x2qzdwglm8pkqp9gv03bbgw17khl7pq3
>> >
>>
