+1 (binding) 

> On Mar 2, 2024, at 2:28 PM, Dewey Dunnington <de...@voltrondata.com.invalid> 
> wrote:
> 
> +1 (binding)
> 
>> On Sat, Mar 2, 2024 at 8:08 AM vin jake <jakevin...@gmail.com> wrote:
>> 
>> +1 (binding)
>> 
>>> On Fri, Mar 1, 2024 at 7:33 PM Andrew Lamb <al...@influxdata.com> wrote:
>>> 
>>> Hello,
>>> 
>>> As we have discussed[1][2] I would like to vote on the proposal to
>>> create a new Apache Top Level Project for DataFusion. The text of the
>>> proposed resolution and background document is copy/pasted below.
>>> 
>>> If the community is in favor of this, we plan to submit the resolution
>>> to the ASF board for approval with the next Arrow report (for the
>>> April 2024 board meeting).
>>> 
>>> The vote will be open for at least 7 days.
>>> 
>>> [ ] +1 Accept this Proposal
>>> [ ] +0
>>> [ ] -1 Do not accept this proposal because...
>>> 
>>> Andrew
>>> 
>>> [1] https://lists.apache.org/thread/c150t1s1x0kcb3r03cjyx31kqs5oc341
>>> [2] https://github.com/apache/arrow-datafusion/discussions/6475
>>> 
>>> ---------- Proposed Resolution ---------
>>> 
>>> Resolution to Create the Apache DataFusion Project from the Apache
>>> Arrow DataFusion Sub Project
>>> 
>>> =============================================================
>>> 
>>> X. Establish the Apache DataFusion Project
>>> 
>>> WHEREAS, the Board of Directors deems it to be in the best
>>> interests of the Foundation and consistent with the
>>> Foundation's purpose to establish a Project Management
>>> Committee charged with the creation and maintenance of
>>> open-source software related to an extensible query engine
>>> for distribution at no charge to the public.
>>> 
>>> NOW, THEREFORE, BE IT RESOLVED, that a Project Management
>>> Committee (PMC), to be known as the "Apache DataFusion Project",
>>> be and hereby is established pursuant to Bylaws of the
>>> Foundation; and be it further
>>> 
>>> RESOLVED, that the Apache DataFusion Project be and hereby is
>>> responsible for the creation and maintenance of software
>>> related to an extensible query engine; and be it further
>>> 
>>> RESOLVED, that the office of "Vice President, Apache DataFusion" be
>>> and hereby is created, the person holding such office to
>>> serve at the direction of the Board of Directors as the chair
>>> of the Apache DataFusion Project, and to have primary responsibility
>>> for management of the projects within the scope of
>>> responsibility of the Apache DataFusion Project; and be it further
>>> 
>>> RESOLVED, that the persons listed immediately below be and
>>> hereby are appointed to serve as the initial members of the
>>> Apache DataFusion Project:
>>> 
>>> * Andy Grove (agr...@apache.org)
>>> * Andrew Lamb (al...@apache.org)
>>> * Daniël Heres (dhe...@apache.org)
>>> * Jie Wen (jake...@apache.org)
>>> * Kun Liu (liu...@apache.org)
>>> * Liang-Chi Hsieh (vii...@apache.org)
>>> * Qingping Hou (ho...@apache.org)
>>> * Wes McKinney (w...@apache.org)
>>> * Will Jones (wjones...@apache.org)
>>> 
>>> RESOLVED, that the Apache DataFusion Project be and hereby
>>> is tasked with the migration and rationalization of the Apache
>>> Arrow DataFusion sub-project; and be it further
>>> 
>>> RESOLVED, that all responsibilities pertaining to the Apache
>>> Arrow DataFusion sub-project encumbered upon the
>>> Apache Arrow Project are hereafter discharged.
>>> 
>>> NOW, THEREFORE, BE IT FURTHER RESOLVED, that Andrew Lamb
>>> be appointed to the office of Vice President, Apache DataFusion, to
>>> serve in accordance with and subject to the direction of the
>>> Board of Directors and the Bylaws of the Foundation until
>>> death, resignation, retirement, removal or disqualification,
>>> or until a successor is appointed.
>>> =============================================================
>>> 
>>> 
>>> -------
>>> 
>>> 
>>> Summary:
>>> 
>>> We propose creating a new top level project, Apache DataFusion, from
>>> an existing sub project of Apache Arrow to facilitate additional
>>> community and project growth.
>>> 
>>> Abstract
>>> 
>>> Apache Arrow DataFusion[1] is a very fast, extensible query engine
>>> for building high-quality data-centric systems in Rust, using the
>>> Apache Arrow in-memory format. DataFusion offers SQL and DataFrame
>>> APIs, excellent performance, built-in support for CSV, Parquet, JSON,
>>> and Avro, extensive customization, and a great community.
>>> 
>>> [1] https://arrow.apache.org/datafusion/
>>> 
>>> 
>>> Proposal
>>> 
>>> We propose creating a new top level ASF project, Apache DataFusion,
>>> governed initially by a subset of the Apache Arrow project’s PMC and
>>> committers. The project’s code is in five existing git repositories,
>>> currently governed by Apache Arrow, which would transfer to the new
>>> top level project.
>>> 
>>> Background
>>> 
>>> When DataFusion was initially donated to the Arrow project, it did not
>>> have a strong enough community to stand on its own. It has since grown
>>> significantly, benefiting immensely from being part of Arrow and from
>>> the nurturing of the Apache Way. It now has a community strong enough
>>> to stand on its own, one that would benefit from focused governance
>>> attention.
>>> 
>>> The community has discussed this idea publicly for more than 6 months
>>> https://github.com/apache/arrow-datafusion/discussions/6475  and
>>> briefly on the Arrow PMC mailing list
>>> https://lists.apache.org/thread/thv2jdm6640l6gm88hy8jhk5prjww0cs. As
>>> of the time of this writing both had exclusively positive reactions.
>>> 
>>> Several current members of the Arrow PMC are active contributors to
>>> DataFusion who understand and believe deeply in the Apache Way. They
>>> play active governance roles in the Arrow project as PMC members and
>>> PMC chairs, guiding the community and releasing software versions.
>>> With this existing governance experience and structure, the new top
>>> level project will be able to function well immediately and
>>> independently.
>>> 
>>> Overview of DataFusion
>>> 
>>> Current Status
>>> 
>>> Meritocracy
>>> 
>>> DataFusion has been developed as part of Apache Arrow and thus has
>>> been operating as a meritocracy. Many of the developers of DataFusion
>>> are Arrow PMC members or committers. The DataFusion project plans to
>>> continue adding new PMC members and committers as the project matures
>>> and grows.
>>> 
>>> Community
>>> 
>>> The DataFusion development team seeks to foster the development and
>>> user communities. We hope that becoming a separate project will help
>>> both the Arrow and DataFusion communities become more focused. Focused
>>> governance will make it easier to grow the community of committers and
>>> PMC members and make the organization clearer to others.
>>> 
>>> Alignment
>>> 
>>> The ASF is a natural host for DataFusion given that it is already the
>>> home of Arrow, Parquet, and other related distributed, storage, and
>>> query execution systems.
>>> 
>>> Project Leadership
>>> 
>>> Proposed Initial PMC
>>> 
>>> We propose the following people as the initial DataFusion PMC members.
>>> This is a subset of the existing Arrow PMC members who contribute to
>>> DataFusion https://people.apache.org/phonebook.html?unix=arrow
>>> 
>>> Andy Grove (agrove): Arrow PMC Chair
>>> Andrew Lamb (alamb): Arrow PMC, past Arrow PMC Chair
>>> Daniël Heres (dheres): Arrow PMC
>>> Jie Wen (jakevin): Arrow PMC, Doris Committer
>>> Kun Liu (liukun): Arrow PMC, IoTDB PMC, TSFile PMC
>>> Liang-Chi Hsieh (viirya): Arrow PMC, Spark PMC
>>> Qingping Hou (houqp): Arrow PMC
>>> Wes McKinney (wesm): Arrow PMC, ASF Member
>>> Will Jones (wjones127): Arrow PMC
>>> 
>>> We’d like to propose Andrew Lamb as the initial Chair (and thus ASF
>>> VP) for the DataFusion project.
>>> 
>>> Affiliations
>>> 
>>> Andy Grove (agrove): NVidia
>>> Andrew Lamb (alamb): InfluxData
>>> Daniël Heres (dheres): Coralogix
>>> Jie Wen (jakevin): SelectDB
>>> Kun Liu (liukun): Ebay
>>> Liang-Chi Hsieh (viirya): Apple
>>> Qingping Hou (houqp): Scribd
>>> Wes McKinney (wesm): Posit
>>> Will Jones (wjones127): LanceDB
>>> 
>>> Proposed Initial Committers
>>> 
>>> In addition to the PMC, we propose the following people as the initial
>>> DataFusion committers. This is a subset of the existing Arrow
>>> committers who contribute to DataFusion
>>> https://people.apache.org/phonebook.html?unix=arrow
>>> 
>>> akurmustafa Mustafa Akur (Synnada)
>>> avantgardner Brent Gardner (Coralogix)
>>> comphead Oleks V. (Unaffiliated)
>>> jayzhan Jay Zhan (Unaffiliated)
>>> jeffreyvo Jeffrey Vo (Unaffiliated)
>>> jiayuliu Liu Jiayu (Airbnb)
>>> mete Metehan Yildirim (Synnada)
>>> mingmwang Wang Mingming (Ebay)
>>> mneumann Marco Neumann (InfluxData)
>>> nju_yaho Zhong Yanghong (Ebay)
>>> ozankabak Mehmet Ozan Kabak (Synnada)
>>> paddyhoran Paddy Horan (Assured Allies)
>>> rdettai Rémi Dettai (Cloudfuse)
>>> sunchao Chao Sun (Apple)
>>> thinkharderdev Daniel Harris (Coralogix)
>>> tustvold Raphael Taylor-Davies (InfluxData)
>>> wayne Ruihang Xia (Greptime)
>>> xudong963 Xudong Wang (ByteDance)
>>> yjshen Yijie Shen (Space and Time)
>>> yangjiang Yang Jiang (Ebay)
>>> 
>>> 
>>> Risk Assessments
>>> 
>>> Naming / Trademarks
>>> 
>>> As a sub-project of Arrow, the DataFusion name has been used for over
>>> 4 years without any known issues. A podling name search did not turn
>>> up any concerns and was approved:
>>> https://issues.apache.org/jira/browse/PODLINGNAMESEARCH-219
>>> 
>>> Legal / IP Clearance
>>> 
>>> All DataFusion code has either been donated to the Arrow project with
>>> appropriate IP clearance or has been developed directly under ASF
>>> processes and procedures. Thus creating a new top level project poses
>>> no new Legal or IP risks.
>>> 
>>> Code Extraction
>>> 
>>> The relevant code is already in 5 separate repositories:
>>> https://github.com/apache/arrow-datafusion/
>>> https://github.com/apache/arrow-datafusion-python
>>> https://github.com/apache/arrow-ballista
>>> https://github.com/apache/arrow-ballista-python
>>> https://github.com/apache/arrow-datafusion-comet
>>> 
>>> We foresee no issues with code extraction and propose that these
>>> repositories be renamed to reflect the new top level project.
>>> 
>>> Note:  https://github.com/apache/arrow-rs, the Rust implementation of
>>> Arrow, would remain part of the Arrow project.
>>> 
>>> Orphaned Products
>>> 
>>> DataFusion is known to be used in many open source and commercial
>>> projects
>>> (https://arrow.apache.org/datafusion/user-guide/introduction.html#known-users),
>>> has had multiple commits daily for several years, and its adoption and
>>> number of contributors appear to be growing. We do not foresee the
>>> project being orphaned in the next several years.
>>> 
>>> Inexperience with Open Source
>>> 
>>> The proposed PMC has extensive experience with Apache Arrow and other
>>> Apache projects, and includes PMC members, PMC chairs and an ASF
>>> Member. The DataFusion PMC and more experienced committers will
>>> continue to coach new community members who may be less familiar with
>>> the Apache Way.
>>> 
>>> Homogeneous Developers
>>> 
>>> The 9 proposed PMC members are from 9 different employers and the
>>> proposed committers are similarly distributed across affiliations. No
>>> specific entity employs more than 3 total proposed developers.
>>> 
>>> Reliance on Salaried Developers
>>> 
>>> A substantial amount of work on DataFusion has been done by salaried
>>> developers, but the project also has a long tradition of attracting
>>> contributions from students and hobbyists, and we plan no changes in
>>> contribution structure.
>>> 
>>> Relationships with Other Apache Products
>>> 
>>> DataFusion will obviously have a strong relationship with the Arrow
>>> project given the overlap in people. We don’t foresee close
>>> collaboration with other projects at this time.
>>> 
>>> Cryptography
>>> 
>>> DataFusion does not directly support encryption and there are no
>>> near-term plans to add support for encryption. Users who need this
>>> functionality can use the extension APIs.
>>> 
>>> Required Resources
>>> 
>>> Mailing Lists
>>> 
>>> - priv...@datafusion.apache.org for private PMC discussions (with
>>> moderated subscriptions)
>>> - d...@datafusion.apache.org
>>> - comm...@datafusion.apache.org
>>> - u...@datafusion.apache.org
>>> 
>>> Version Control
>>> 
>>> We propose to continue to use git for source control and github for
>>> hosting and testing resources.
>>> 
>>> We also need to rename the github repositories to reflect the new top
>>> level names:
>>> 
>>> https://github.com/apache/arrow-datafusion/ → apache/datafusion
>>> https://github.com/apache/arrow-datafusion-python →
>>> apache/datafusion-python
>>> https://github.com/apache/arrow-ballista → apache/datafusion-ballista
>>> https://github.com/apache/arrow-ballista-python  →
>>> apache/datafusion-ballista-python
>>> https://github.com/apache/arrow-datafusion-comet → apache/datafusion-comet
>>> 
>>> 
>>> 
>>> Issue Tracking
>>> 
>>> DataFusion would continue to use github for its issue tracking and
>>> communications.
>>> 
>>> Other Resources
>>> 
>>> The existing repositories already make use of existing Apache
>>> infrastructure, and we expect no change in the initial resource usage.
>>> As the project continues to grow, we expect continued infrastructure
>>> demand growth.
>>> 
>>> 
>>> FAQ: Has a sub project been promoted to a top level project before?
>>> 
>>> Yes, and it appears to happen commonly. The Arrow project itself was
>>> created as a top level project from work that started in Apache Drill,
>>> and there are many sub projects of Hadoop that spun out as their own
>>> top level projects such as Mahout, Avro and HBase:
>>> 
>>> https://news.apache.org/foundation/entry/the_apache_software_foundation_announces4
>>> 
>>> 
>>> 
>>> Related material:
>>> Name search request / research for DataFusion:
>>> https://issues.apache.org/jira/browse/PODLINGNAMESEARCH-219
>>> Discussion about this proposal on the arrow mailing list:
>>> https://lists.apache.org/thread/c150t1s1x0kcb3r03cjyx31kqs5oc341
>>> Discussion about which repositories to transfer, on the arrow mailing list:
>>> https://lists.apache.org/thread/ob3n0d9ky0bgrryl3xn39w9k566bq00q
>>> Discussion about initial PMC on the arrow mailing list:
>>> https://lists.apache.org/thread/pymrzcdw4qdptvby85f69rg3pcckl15b
>>> Discussion in github about creating a new DataFusion top level
>>> project: https://github.com/apache/arrow-datafusion/discussions/6475
>>> Discussion about graduating on incubator list:
>>> https://lists.apache.org/thread/r4n73pmms1lv0jbohyx1o1z13d615t99
>>> Original Proposal for the Arrow project:
>>> https://lists.apache.org/thread/x2qzdwglm8pkqp9gv03bbgw17khl7pq3
>>> 
