Wes brought up a great point on the document[1] that I wanted to discuss
here more broadly:

> Others may point out that (I think) you don't have any ASF Members on
your initial PMC. When we started Arrow, we had several veteran ASF members
on our initial PMC who haven't been very active in the project otherwise.
If you wanted Jacques or I (both Members), for example, to serve on the PMC
in that capacity we would likely be happy to do that.

I personally think having ASF Member(s) [2] on the PMC would be most
helpful to connect us to the larger organization and would like to add Wes
and/or Jacques if they are willing to do so (are you Wes / Jacques)?

If there are no concerns and Wes / Jacques are willing I will add their
names to the proposed initial PMC.

Andrew

[1]
https://docs.google.com/document/d/11WTNYS8KWScOt3ySTX39WVS6krPhUvHsuJRY9PZQx4g/edit?disco=AAABH2b6I88
[2] https://www.apache.org/foundation/members

On Mon, Feb 26, 2024 at 5:10 PM Andrew Lamb <al...@influxdata.com> wrote:

> An update:
>
> I have updated the proposal [1] with additional information (new
> committers Jeffrey Vo and Jay Zhan, and the new datafusion-comet repository)
>
> I plan to:
> 1. Call for a formal vote on this (dev@arrow.apache.org) mailing list
> this Friday March 2
> 2. If the vote passes, submit the proposal to the ASF board as part of the
> April 2024 Arrow report.
>
> This extended timeline is designed to balance the needs of some
> contributors to prepare for the changed structure with their employers.
>
> Full Details can be found on [2].
>
> Thank you,
> Andrew
>
> [1]
> https://docs.google.com/document/d/11WTNYS8KWScOt3ySTX39WVS6krPhUvHsuJRY9PZQx4g/edit
> [2] https://github.com/apache/arrow-datafusion/discussions/6475
>
> On Fri, Jan 5, 2024 at 11:19 AM Andrew Lamb <al...@influxdata.com> wrote:
>
>> Thank you very much
>>
>> On Fri, Jan 5, 2024 at 11:17 AM Jean-Baptiste Onofré <j...@nanthrax.net>
>> wrote:
>>
>>> Hi Andrew,
>>>
>>> The PODLINGNAMESEARCH is not yet completed: the VP Brand Management
>>> (Mark Thomas) should comment in the Jira to approve or not the name.
>>>
>>> I added a comment in the Jira to ping Mark. He should get back to us
>>> soon.
>>>
>>> Regards
>>> JB
>>>
>>> On Fri, Jan 5, 2024 at 3:38 PM Andrew Lamb <al...@influxdata.com> wrote:
>>> >
>>> > Thanks JB,
>>> >
>>> > I did do a name search and posted the results here [1]
>>> >
>>> > However, I am not sure what the next steps for that particular process
>>> is
>>> > (like does someone have to approve it, for example?)
>>> >
>>> > Any insight you could provide would be greatly appreciated
>>> >
>>> > Andrew
>>> >
>>> > [1] https://issues.apache.org/jira/browse/PODLINGNAMESEARCH-219
>>> >
>>> >
>>> > On Fri, Jan 5, 2024 at 7:55 AM Jean-Baptiste Onofré <j...@nanthrax.net>
>>> wrote:
>>> >
>>> > > Hi Andrew,
>>> > >
>>> > > I did a quick review on the doc and it looks good to me. I just added
>>> > > a question about name search (DataFusion will probably work as TLP,
>>> > > but we have to check as we have a new Apache name moving from Arrow
>>> > > DataFusion to DataFusion).
>>> > >
>>> > > Please let me know if I can help on that.
>>> > >
>>> > > Thanks !
>>> > > Regards
>>> > > JB
>>> > >
>>> > > On Fri, Jan 5, 2024 at 12:26 PM Andrew Lamb <al...@influxdata.com>
>>> wrote:
>>> > > >
>>> > > > Upon reviewing the board report template, I am planning on the
>>> following
>>> > > > schedule:
>>> > > > 1. I'll leave this proposal for another few weeks to gather any
>>> > > additional
>>> > > > input
>>> > > > 2. In early February 2024 I'll start a formal vote thread on the
>>> dev@
>>> > > > mailing list for this proposal
>>> > > > 3. If the vote passes, I'll submit a proposed resolution to the
>>> ASF board
>>> > > > for their meeting in April 2024 using the pre-existing template[1]
>>> > > >
>>> > > >
>>> > > > [1]
>>> > > >
>>> > >
>>> https://svn.apache.org/repos/private/committers/board/templates/subproject-tlp-resolution.txt
>>> > > >
>>> > > > On Wed, Dec 27, 2023 at 6:32 PM L. C. Hsieh <vii...@gmail.com>
>>> wrote:
>>> > > >
>>> > > > > Thanks for writing the proposal. It looks great to me too.
>>> > > > > I added a few comments on it.
>>> > > > >
>>> > > > > On Wed, Dec 27, 2023 at 3:05 PM Andy Grove <
>>> andygrov...@gmail.com>
>>> > > wrote:
>>> > > > > >
>>> > > > > > Thank you for creating the draft proposal, Andrew. I have
>>> reviewed
>>> > > this
>>> > > > > and
>>> > > > > > I think it looks great.
>>> > > > > >
>>> > > > > > Andy.
>>> > > > > >
>>> > > > > > On Wed, Dec 27, 2023 at 3:19 PM Andrew Lamb <
>>> al...@influxdata.com>
>>> > > > > wrote:
>>> > > > > >
>>> > > > > > > I have created a draft proposal [1] to break DataFusion out
>>> to its
>>> > > own
>>> > > > > top
>>> > > > > > > level project. Please provide your feedback and suggestions.
>>> > > > > > >
>>> > > > > > > The proposal is included at the end of this email and in this
>>> > > Google
>>> > > > > Doc:
>>> > > > > > >
>>> > > > > > >
>>> > > > >
>>> > >
>>> https://docs.google.com/document/d/11WTNYS8KWScOt3ySTX39WVS6krPhUvHsuJRY9PZQx4g
>>> > > > > > > .
>>> > > > > > >
>>> > > > > > > Feel free to respond to this email or comment / make
>>> suggestions
>>> > > > > directly
>>> > > > > > > on the document.
>>> > > > > > >
>>> > > > > > > I would be especially grateful if people could review and
>>> comment
>>> > > on
>>> > > > > the
>>> > > > > > > proposed list of committers and PMC members.
>>> > > > > > >
>>> > > > > > > I hope everyone is not getting sick of hearing about this,
>>> but I
>>> > > think
>>> > > > > in
>>> > > > > > > this case it is better to over communicate than risk
>>> surprises.
>>> > > > > > >
>>> > > > > > > Andrew
>>> > > > > > >
>>> > > > > > > [1] https://github.com/apache/arrow-datafusion/issues/8491
>>> > > > > > >
>>> > > > > > >
>>> > > > > > > ----------
>>> > > > > > >
>>> > > > > > > DataFusion Top Level Project Proposal
>>> > > > > > > Dec 27, 2023
>>> > > > > > >
>>> > > > > > > [Editor’s note: This document is based on the proposal to
>>> the ASF
>>> > > > > board to
>>> > > > > > > create the Arrow project. Once it has been reviewed, we plan
>>> to send
>>> > > it
>>> > > > > to
>>> > > > > > > the ASF board sometime in January or February 2024 for their
>>> > > > > consideration]
>>> > > > > > >
>>> > > > > > > To: The ASF (bo...@apache.org)
>>> > > > > > >
>>> > > > > > > Summary:
>>> > > > > > >
>>> > > > > > > We propose creating a new top level project, Apache
>>> DataFusion,
>>> > > from an
>>> > > > > > > existing sub project of Apache Arrow to facilitate additional
>>> > > > > community and
>>> > > > > > > project growth.
>>> > > > > > >
>>> > > > > > > ----
>>> > > > > > > Apache DataFusion for Apache Top Level Project
>>> > > > > > >
>>> > > > > > > Abstract
>>> > > > > > >
>>> > > > > > > Apache Arrow DataFusion[1]  is a very fast, extensible query
>>> > > engine for
>>> > > > > > > building high-quality data-centric systems in Rust, using the
>>> > > Apache
>>> > > > > Arrow
>>> > > > > > > in-memory format. DataFusion offers SQL and Dataframe APIs,
>>> > > excellent
>>> > > > > > > performance, built-in support for CSV, Parquet, JSON, and
>>> Avro,
>>> > > > > extensive
>>> > > > > > > customization, and a great community.
>>> > > > > > >
>>> > > > > > > [1] https://arrow.apache.org/datafusion/
>>> > > > > > >
>>> > > > > > >
>>> > > > > > > Proposal
>>> > > > > > >
>>> > > > > > > We propose creating a new top level ASF project, Apache
>>> DataFusion,
>>> > > > > > > governed initially by a subset of the Arrow project’s PMC and
>>> > > > > committers.
>>> > > > > > > The project’s code is in four existing git repositories,
>>> currently
>>> > > > > governed
>>> > > > > > > by Apache Arrow which would transfer to the new top level
>>> project.
>>> > > > > > >
>>> > > > > > > Background
>>> > > > > > >
>>> > > > > > > When DataFusion was initially donated to the Arrow project,
>>> it did
>>> > > not
>>> > > > > have
>>> > > > > > > a strong enough community to stand on its own. It has since
>>> grown
>>> > > > > > > significantly, and benefited immensely from being part of
>>> Arrow and
>>> > > > > > > nurturing of the Apache Way, and now has a community strong
>>> enough
>>> > > to
>>> > > > > stand
>>> > > > > > > on its own and that would benefit from focused governance
>>> > > attention.
>>> > > > > > >
>>> > > > > > > The community has discussed this idea publicly for more than
>>> 6
>>> > > months
>>> > > > > > > https://github.com/apache/arrow-datafusion/discussions/6475
>>> and
>>> > > > > briefly
>>> > > > > > > on
>>> > > > > > > the Arrow PMC mailing list
>>> > > > > > >
>>> https://lists.apache.org/thread/thv2jdm6640l6gm88hy8jhk5prjww0cs.
>>> > > As
>>> > > > > of
>>> > > > > > > the
>>> > > > > > > time of this writing both had exclusively positive reactions.
>>> > > > > > >
>>> > > > > > > Several current members of the Arrow PMC are both active
>>> > > contributors
>>> > > > > to
>>> > > > > > > DataFusion and understand and believe deeply in the Apache
>>> Way, and
>>> > > > > play
>>> > > > > > > active governance roles in the Arrow project as PMC members
>>> and PMC
>>> > > > > chairs,
>>> > > > > > > guiding the community, and releasing software versions. With
>>> this
>>> > > > > existing
>>> > > > > > > governance experience and structure, the new top level
>>> project
>>> > > will be
>>> > > > > able
>>> > > > > > > to function well immediately and independently.
>>> > > > > > >
>>> > > > > > > Overview of DataFusion
>>> > > > > > >
>>> > > > > > > Current Status
>>> > > > > > >
>>> > > > > > > Meritocracy
>>> > > > > > >
>>> > > > > > > DataFusion has been developed as part of Apache Arrow and
>>> thus has
>>> > > been
>>> > > > > > > operating as a meritocracy. Many of the developers of
>>> DataFusion
>>> > > are
>>> > > > > Arrow
>>> > > > > > > PMC members or committers. The DataFusion project plans to
>>> continue
>>> > > > > adding
>>> > > > > > > new PMC members and committers as the project matures and grows.
>>> > > > > > >
>>> > > > > > > Community
>>> > > > > > >
>>> > > > > > > The DataFusion development team seeks to foster the
>>> development and
>>> > > > > user
>>> > > > > > > communities. We hope that becoming a separate project will
>>> help
>>> > > both
>>> > > > > Arrow
>>> > > > > > > and DataFusion communities by being more focused.  Focused
>>> > > governance
>>> > > > > will
>>> > > > > > > make it easier to grow the community of committers and PMC
>>> members
>>> > > and
>>> > > > > make
>>> > > > > > > the organization more clear to others.
>>> > > > > > >
>>> > > > > > > Alignment
>>> > > > > > >
>>> > > > > > > The ASF is a natural host for DataFusion given that it is
>>> already
>>> > > the
>>> > > > > home
>>> > > > > > > of Arrow, Parquet, and other related distributed systems, storage
>>> storage
>>> > > and
>>> > > > > query
>>> > > > > > > execution systems.
>>> > > > > > >
>>> > > > > > > Project Leadership
>>> > > > > > >
>>> > > > > > > Proposed Initial PMC
>>> > > > > > >
>>> > > > > > > We propose the following people as the initial DataFusion PMC
>>> > > members.
>>> > > > > This
>>> > > > > > > is a subset of the existing Arrow PMC members who contribute
>>> to
>>> > > > > DataFusion
>>> > > > > > > https://people.apache.org/phonebook.html?unix=arrow
>>> > > > > > >
>>> > > > > > > Andy Grove (agrove):  Arrow PMC Chair
>>> > > > > > > Andrew Lamb (alamb): Arrow PMC, past Arrow PMC Chair
>>> > > > > > > Daniël Heres (dheres): Arrow PMC
>>> > > > > > > Jie Wen (jakevin):  Arrow PMC, Doris Committer
>>> > > > > > > Kun Liu (liukun): Arrow PMC, IoTDB PMC, TSFile PMC
>>> > > > > > > Liang-Chi Hsieh (viirya): Arrow PMC, Spark PMC
>>> > > > > > > Qingping Hou (houqp): Arrow PMC, Doris Committer
>>> > > > > > > Will Jones (wjones127): Arrow PMC
>>> > > > > > >
>>> > > > > > > We’d like to propose Andrew Lamb as the initial Chair, (and
>>> thus
>>> > > ASF
>>> > > > > VP)
>>> > > > > > > for the DataFusion project.
>>> > > > > > >
>>> > > > > > > Affiliations
>>> > > > > > >
>>> > > > > > > Andy Grove (agrove):  NVidia
>>> > > > > > > Andrew Lamb (alamb): InfluxData
>>> > > > > > > Daniël Heres (dheres): Coralogix
>>> > > > > > > Jie Wen (jakevin): SelectDB
>>> > > > > > > Kun Liu (liukun): Ebay
>>> > > > > > > Liang-Chi Hsieh (viirya): Apple
>>> > > > > > > Qingping Hou (houqp): Scribd
>>> > > > > > > Will Jones (wjones127): VoltronData
>>> > > > > > >
>>> > > > > > > Proposed Initial Committers
>>> > > > > > >
>>> > > > > > > In addition to the PMC, we propose the following people as
>>> the
>>> > > initial
>>> > > > > > > DataFusion committers. This is a subset of the existing Arrow
>>> > > > > committers
>>> > > > > > > who contribute to DataFusion
>>> > > > > > > https://people.apache.org/phonebook.html?unix=arrow
>>> > > > > > >
>>> > > > > > > akurmustafa Mustafa Akur (Synnada)
>>> > > > > > > avantgardner Brent Gardner (Coralogix)
>>> > > > > > > comphead Oleks V. (Unaffiliated)
>>> > > > > > > jiayuliu Liu Jiayu (Airbnb)
>>> > > > > > > mete Metehan Yildirim (Synnada)
>>> > > > > > > mingmwang Wang Mingming (Ebay)
>>> > > > > > > mneumann Marco Neumann (InfluxData)
>>> > > > > > > nju_yaho Zhong Yanghong (Ebay)
>>> > > > > > > ozankabak Mehmet Ozan Kabak (Synnada)
>>> > > > > > > paddyhoran Paddy Horan (Assured Allies)
>>> > > > > > > rdettai Rémi Dettai (Cloudfuse)
>>> > > > > > > sunchao Sun Chao (Apple)
>>> > > > > > > thinkharderdev Daniel Harris (Coralogix)
>>> > > > > > > tustvold Raphael Taylor-Davies (InfluxData)
>>> > > > > > > viirya L. C. Hsieh (Apple)
>>> > > > > > > wayne Ruihang Xia (Greptime)
>>> > > > > > > xudong963 Xudong Wang (ByteDance)
>>> > > > > > > yjshen Yijie Shen (Space and Time)
>>> > > > > > >
>>> > > > > > >
>>> > > > > > > Risk Assessments
>>> > > > > > >
>>> > > > > > > Naming / Trademarks
>>> > > > > > >
>>> > > > > > > As a sub-project of Arrow, the DataFusion name has been used
>>> for
>>> > > over 4
>>> > > > > > > years without any known issues. A podling name search has
>>> thus far
>>> > > not
>>> > > > > > > turned up any concerns:
>>> > > > > > > https://issues.apache.org/jira/browse/PODLINGNAMESEARCH-219
>>> > > > > > >
>>> > > > > > > Legal / IP Clearance
>>> > > > > > >
>>> > > > > > > All DataFusion code has either been donated to the Arrow
>>> project
>>> > > with
>>> > > > > > > appropriate IP clearance or  has been developed directly
>>> under ASF
>>> > > > > > > processes and procedures. Thus creating a new top level
>>> project
>>> > > poses
>>> > > > > no
>>> > > > > > > new Legal or IP risks.
>>> > > > > > >
>>> > > > > > > Code Extraction
>>> > > > > > >
>>> > > > > > > The relevant code is already in 4 separate repositories:
>>> > > > > > > https://github.com/apache/arrow-datafusion/
>>> > > > > > > https://github.com/apache/arrow-datafusion-python
>>> > > > > > > https://github.com/apache/arrow-ballista
>>> > > > > > > https://github.com/apache/arrow-ballista-python
>>> > > > > > >
>>> > > > > > > We foresee no issues with code extraction and propose these
>>> > > > > repositories be
>>> > > > > > > respectively  renamed to reflect top level projects:
>>> > > > > > > https://github.com/apache/datafusion/
>>> > > > > > > https://github.com/apache/datafusion-python
>>> > > > > > > https://github.com/apache/datafusion-ballista
>>> > > > > > > https://github.com/apache/datafusion-ballista-python
>>> > > > > > >
>>> > > > > > > Note:  https://github.com/apache/arrow-rs, the Rust
>>> > > implementation of
>>> > > > > > > Arrow, would remain part of the Arrow project.
>>> > > > > > >
>>> > > > > > > Orphaned Products
>>> > > > > > >
>>> > > > > > > DataFusion is known to be used in many open source and
>>> commercial
>>> > > > > projects
>>> > > > > > >
>>> > > > > > >
>>> > > > >
>>> > >
>>> https://arrow.apache.org/datafusion/user-guide/introduction.html#known-users
>>> > > > > > > ,
>>> > > > > > > has had multiple commits daily for several years, and its
>>> adoption
>>> > > and
>>> > > > > > > number of contributors appear to be growing.
>>> > > > > > >
>>> > > > > > > Inexperience with Open Source
>>> > > > > > >
>>> > > > > > > The proposed PMC has extensive experience with Apache Arrow
>>> and
>>> > > other
>>> > > > > > > Apache projects, and includes PMC members and PMC chairs. The
>>> > > > > DataFusion
>>> > > > > > > PMC and more experienced committers will continue to coach
>>> new
>>> > > > > community
>>> > > > > > > members who may be less familiar with the Apache Way.
>>> > > > > > >
>>> > > > > > > Homogeneous Developers
>>> > > > > > >
>>> > > > > > > The 8 proposed PMC members are from 8 different employers
>>> and the
>>> > > > > proposed
>>> > > > > > > committers are similarly distributed across affiliations. No
>>> > > specific
>>> > > > > > > entity employs more than 3 total proposed developers.
>>> > > > > > >
>>> > > > > > > Reliance on Salaried Developers
>>> > > > > > >
>>> > > > > > > A substantial amount of work on DataFusion has been done by
>>> salaried
>>> > > > > developers,
>>> > > > > > > but it also has a long tradition of attracting contributions
>>> from
>>> > > > > students
>>> > > > > > > and hobbyists and we plan no changes in contribution
>>> structure.
>>> > > > > > >
>>> > > > > > > Relationships with Other Apache Products
>>> > > > > > >
>>> > > > > > > DataFusion will obviously have a strong relationship with
>>> the Arrow
>>> > > > > project
>>> > > > > > > given the overlap in people. We don’t foresee close
>>> collaboration
>>> > > with
>>> > > > > > > other projects at this time.
>>> > > > > > >
>>> > > > > > > Cryptography
>>> > > > > > >
>>> > > > > > > DataFusion does not directly support encryption and there
>>> are no
>>> > > > > near-term
>>> > > > > > > plans to add support for encryption. Users who need this
>>> > > functionality
>>> > > > > can
>>> > > > > > > use the extension APIs.
>>> > > > > > >
>>> > > > > > > Required Resources
>>> > > > > > >
>>> > > > > > > Mailing Lists
>>> > > > > > >
>>> > > > > > > - private@datafusion for private PMC discussions (with
>>> moderated
>>> > > > > > > subscriptions)
>>> > > > > > > - dev@datafusion
>>> > > > > > > - commits@datafusion
>>> > > > > > >
>>> > > > > > > Version Control
>>> > > > > > >
>>> > > > > > > We propose to continue to use git for source control and
>>> GitHub for
>>> > > > > hosting
>>> > > > > > > and testing resources.
>>> > > > > > >
>>> > > > > > > Issue Tracking
>>> > > > > > >
>>> > > > > > > DataFusion would continue to use GitHub for its issue
>>> tracking and
>>> > > > > > > communications
>>> > > > > > >
>>> > > > > > > Other Resources
>>> > > > > > >
>>> > > > > > > The existing repositories already make use of existing Apache
>>> > > > > > > infrastructure, and we expect no change in the initial
>>> resource
>>> > > usage.
>>> > > > > As
>>> > > > > > > the project continues to grow, we expect continued
>>> infrastructure
>>> > > > > demand
>>> > > > > > > growth.
>>> > > > > > >
>>> > > > > > >
>>> > > > > > > FAQ: Has a sub project been promoted to a top level project
>>> before?
>>> > > > > > >
>>> > > > > > > Yes, and it appears to happen commonly. The Arrow project
>>> itself
>>> > > was
>>> > > > > > > created as a top level project from work that started in
>>> Apache
>>> > > Drill,
>>> > > > > and
>>> > > > > > > there are many sub projects of Hadoop that spun out as their
>>> own
>>> > > top
>>> > > > > level
>>> > > > > > > projects such as Mahout, Avro and HBase:
>>> > > > > > >
>>> > > > > > >
>>> > > > >
>>> > >
>>> https://news.apache.org/foundation/entry/the_apache_software_foundation_announces4
>>> > > > > > >
>>> > > > > > >
>>> > > > > > >
>>> > > > > > >
>>> > > > > > > Related material:
>>> > > > > > > Name search request / research for DataFusion:
>>> > > > > > > https://issues.apache.org/jira/browse/PODLINGNAMESEARCH-219
>>> > > > > > > Discussion about which repositories on the arrow mailing
>>> list:
>>> > > > > > >
>>> https://lists.apache.org/thread/ob3n0d9ky0bgrryl3xn39w9k566bq00q
>>> > > > > > > Discussion about initial PMC on the arrow mailing list:
>>> > > > > > >
>>> https://lists.apache.org/thread/pymrzcdw4qdptvby85f69rg3pcckl15b
>>> > > > > > > Discussion about creating a new DataFusion top level project:
>>> > > > > > > https://github.com/apache/arrow-datafusion/discussions/6475
>>> > > > > > > Discussion about graduating on incubator list:
>>> > > > > > >
>>> https://lists.apache.org/thread/r4n73pmms1lv0jbohyx1o1z13d615t99
>>> > > > > > > Original Proposal for the Arrow project:
>>> > > > > > >
>>> https://lists.apache.org/thread/x2qzdwglm8pkqp9gv03bbgw17khl7pq3
>>> > > > > > >
>>> > > > >
>>> > >
>>>
>>
