Re: [ANNOUNCE] New Arrow committer: Brent Gardner
Congratulations! On Thu, Jan 12, 2023 at 10:01, David Li wrote: > Congrats, Brent! > > On Wed, Jan 11, 2023, at 19:07, Jacob Wujciak wrote: > > Congrats! > > > > On Thu, Jan 12, 2023 at 12:06 AM QP Hou wrote: > > > >> Congratulations Brent! > >> > >> On Wed, Jan 11, 2023 at 2:56 PM Andy Grove > wrote: > >> > >> > On behalf of the Arrow PMC, I'm happy to announce that Brent Gardner > >> > has accepted an invitation to become a committer on Apache > >> > Arrow. Welcome, and thank you for your contributions! > >> > > >> > Andy. > >> > > >> >
Re: [ANNOUNCE] New Arrow committer: Brent Gardner
Congrats, Brent! On Wed, Jan 11, 2023, at 19:07, Jacob Wujciak wrote: > Congrats! > > On Thu, Jan 12, 2023 at 12:06 AM QP Hou wrote: > >> Congratulations Brent! >> >> On Wed, Jan 11, 2023 at 2:56 PM Andy Grove wrote: >> >> > On behalf of the Arrow PMC, I'm happy to announce that Brent Gardner >> > has accepted an invitation to become a committer on Apache >> > Arrow. Welcome, and thank you for your contributions! >> > >> > Andy. >> > >>
Re: [ANNOUNCE] New Arrow committer: Brent Gardner
Congrats! On Thu, Jan 12, 2023 at 12:06 AM QP Hou wrote: > Congratulations Brent! > > On Wed, Jan 11, 2023 at 2:56 PM Andy Grove wrote: > > > On behalf of the Arrow PMC, I'm happy to announce that Brent Gardner > > has accepted an invitation to become a committer on Apache > > Arrow. Welcome, and thank you for your contributions! > > > > Andy. > > >
Adding a CODEOWNERS file
Hello Everyone, As discussed in an issue spawned by the state of the project thread [1], I have created a draft PR that adds a CODEOWNERS file to apache/arrow [2]. Adding a CODEOWNERS file will allow committers to be automatically requested for reviews that they are interested in (based on touched files), enabling them to basically "subscribe" to a selection of PRs based on their interests/competence within the monorepo without having to watch all notifications for the repo. The main advantage, in my opinion, is that it removes the burden of finding an (initial) reviewer for a PR from contributors, which is a major blocker in the Arrow dev workflow, especially for new contributors. Note that adding a CODEOWNERS file will not automatically activate the branch protection rules to enforce a codeowner review on the respective code. Please review the PR and add yourself to the file via suggestion or direct push to the branch! Documentation on CODEOWNERS file and syntax: [3] Thanks, Jacob [1]: https://github.com/apache/arrow/issues/15232 [2]: https://github.com/apache/arrow/pull/33622 [3]: https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-code-owners
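To illustrate the "subscribe by touched files" idea described above, here is a minimal Python sketch of how review requests could be derived from a list of changed paths. The patterns and the `@example-...` handles are hypothetical and are not taken from the PR. `fnmatchcase` only approximates GitHub's gitignore-style pattern rules, so treat this as an illustration of the "last matching rule wins" behaviour rather than a faithful reimplementation.

```python
from fnmatch import fnmatchcase

# Hypothetical rules in file order: (pattern, owners).
# Paths and handles are illustrative assumptions, not the real apache/arrow file.
RULES = [
    ("*", ["@example-fallback"]),
    ("r/*", ["@example-r-dev"]),
    ("go/*", ["@example-go-dev"]),
    ("docs/*", ["@example-docs-dev", "@example-r-dev"]),
]


def owners_for(path, rules=RULES):
    """Return the owners of the last rule whose pattern matches the path."""
    matched = []
    for pattern, owners in rules:
        # Simplification: fnmatchcase lets "*" cross "/" boundaries,
        # which differs from GitHub's actual matching rules.
        if fnmatchcase(path, pattern):
            matched = owners
    return matched


def reviewers_for(changed_paths, rules=RULES):
    """Collect the distinct owners for every file touched by a PR."""
    reviewers = set()
    for path in changed_paths:
        reviewers.update(owners_for(path, rules))
    return sorted(reviewers)


if __name__ == "__main__":
    print(reviewers_for(["r/R/dataset.R", "docs/index.rst"]))
```

A committer who adds themselves to a pattern is requested as a reviewer only for PRs that touch matching paths, which is the selective notification behaviour the proposal is after.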
Re: [ANNOUNCE] New Arrow committer: Brent Gardner
Congratulations Brent! On Wed, Jan 11, 2023 at 2:56 PM Andy Grove wrote: > On behalf of the Arrow PMC, I'm happy to announce that Brent Gardner > has accepted an invitation to become a committer on Apache > Arrow. Welcome, and thank you for your contributions! > > Andy. >
[ANNOUNCE] New Arrow committer: Brent Gardner
On behalf of the Arrow PMC, I'm happy to announce that Brent Gardner has accepted an invitation to become a committer on Apache Arrow. Welcome, and thank you for your contributions! Andy.
Re: [VOTE][RUST][DataFusion] Release Apache Arrow DataFusion 16.0.0 RC1
I saw that a PR related to this issue was merged, but the issue is still open. I added a comment on the issue asking whether this issue is resolved. On Mon, Jan 9, 2023 at 11:28 PM Andrew Lamb wrote: > There is a report[1] of a seemingly serious regression. I recommend we hold > up finalizing this vote until we have resolved the issue (either in code or > decided it is not a release blocker) > > Andrew > > [1] https://github.com/apache/arrow-datafusion/issues/4844 > > On Mon, Jan 9, 2023 at 10:25 PM Patrick Horan wrote: > > > +1 verified on Mac M1 > > > > On Mon, Jan 9, 2023, at 9:34 AM, Ian Joiner wrote: > > > +1 (Non-binding) > > > > > > Ian > > > > > > Verified on my System76 / Ubuntu 22.04 / AMD64 > > > > > > On Sat, Jan 7, 2023 at 6:18 PM Andy Grove > wrote: > > > > > > > Hi, > > > > > > > > I would like to propose a release of Apache Arrow DataFusion > > > > Implementation, > > > > version 16.0.0. > > > > > > > > This release candidate is based on commit: > > > > dcd52ee3d87c4dd9e2c176165e9e20644f66988b [1] > > > > The proposed release tarball and signatures are hosted at [2]. > > > > The changelog is located at [3]. > > > > > > > > Please download, verify checksums and signatures, run the unit tests, > > and > > > > vote > > > > on the release. The vote will be open for at least 72 hours. > > > > > > > > Only votes from PMC members are binding, but all members of the > > community > > > > are > > > > encouraged to test the release and vote with "(non-binding)". > > > > > > > > The standard verification procedure is documented at > > > > > > > > > > > https://github.com/apache/arrow-datafusion/blob/master/dev/release/README.md#verifying-release-candidates > > > > . > > > > > > > > [ ] +1 Release this as Apache Arrow DataFusion 16.0.0 > > > > [ ] +0 > > > > [ ] -1 Do not release this as Apache Arrow DataFusion 16.0.0 > because... 
> > > > > > > > Here is my vote: > > > > > > > > +1 > > > > > > > > [1]: > > > > > > > > > > > https://github.com/apache/arrow-datafusion/tree/dcd52ee3d87c4dd9e2c176165e9e20644f66988b > > > > [2]: > > > > > > > > > > > https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-16.0.0-rc1 > > > > [3]: > > > > > > > > > > > https://github.com/apache/arrow-datafusion/blob/dcd52ee3d87c4dd9e2c176165e9e20644f66988b/CHANGELOG.md > > > > > > > > > >
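As a rough illustration of the "verify checksums" step requested in the vote email, the following Python sketch compares a downloaded artifact against a SHA-256 checksum file. It assumes the checksum file holds the hex digest as its first whitespace-separated token (the `sha256sum` layout); the function names are hypothetical, and signature verification is not covered. The verification procedure linked in the vote email remains the authoritative reference.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(artifact_path, checksum_path):
    """Return True if the artifact's digest equals the recorded one.

    Assumption: the first whitespace-separated token in the checksum
    file is the expected hex digest.
    """
    with open(checksum_path, "r", encoding="utf-8") as f:
        expected = f.read().split()[0].lower()
    return sha256_of(artifact_path) == expected
```

A checksum match only shows that the download is intact; the signature check against the release manager's key is a separate step.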
Re: [DISCUSS] Updating what are considered reference implementations?
I think this [1] is the thread where the policy was proposed, but it doesn't look like we ever settled on "Java and C++" vs. "any two implementations", or had a vote. I worry that requiring maintainers to add new format features to two "complete" implementations will just lead to fragmentation. People might opt to maintain a fork rather than unblock themselves by implementing a backlog of features they don't need. [1] https://lists.apache.org/thread/9t0pglrvxjhrt4r4xcsc1zmgmbtr8pxj On Fri, Jan 6, 2023 at 12:33 PM Weston Pace wrote: > I think it would be reasonable to state that a reference > implementation must be a complete implementation (i.e. supports all > existing types) that is not derived from another implementation (e.g. > you can't pick pyarrow and arrow-c++). If an implementation does not > plan on ever supporting a new array type then maintainers of that > implementation should be empowered to vote against it. Given that, it > seems like a reasonable burden to ask maintainers to catch up first > before expanding in new directions. > > > On Fri, Jan 6, 2023 at 10:20 AM Micah Kornfield > wrote: > > > > > > > > Note this wording talks about "two reference implementations" not > "*the* > > > two reference implementations". So there can be more than two reference > > > implementations. > > > > > > Maybe reference implementation is the wrong wording here. My main > concern > > is that we try to maintain two "feature complete" implementations at all > > times. I worry if there is a pick 2 from N reference implementations > that > > potentially leads to fragmentation more quickly. But maybe this is > > premature? 
> > > > Cheers, > > Micah > > > > > > On Fri, Jan 6, 2023 at 10:02 AM Antoine Pitrou > wrote: > > > > > > > > Le 06/01/2023 à 18:58, Micah Kornfield a écrit : > > > > I'm having trouble finding it, but I think we've previously agreed > that > > > new > > > > features needed implementations in 2 reference implementations before > > > > approval (I had thought the community agreed on Java and C++ as the > two > > > > implementations but I can't find the vote thread on it). > > > > > > Note this wording talks about "two reference implementations" not > "*the* > > > two reference implementations". So there can be more than two reference > > > implementations. > > > > > > Regards > > > > > > Antoine. > > > >
Re: DISCUSS: ADBC More Canonical Options
Sorry for the double email. "here [1]" should reference https://github.com/apache/arrow-adbc/milestone/3. On Wed, Jan 11, 2023, at 14:16, David Li wrote: > Thanks for bringing this up. My thought is: > > - We are treating ADBC's APIs as a specification, so we should vote in > general. > - The changes here are minimal and don't introduce any compatibility > concerns - they just add more constant definitions - so I say we vote > and just merge them into main, instead of adding more friction. > > There is a set of more major proposals I have begun collecting here [1] > that would require some work to maintain compatibility. For those, I > think we would want to do development on a branch, then vote and merge > them and bump the specification version. And ideally, bundle these > changes and any others together to avoid introducing a lot of work for > implementations to maintain compatibility. > > -David > > On Wed, Jan 11, 2023, at 11:44, Matt Topol wrote: >> Hey all, >> >> I've filed a PR with ADBC (https://github.com/apache/arrow-adbc/pull/316) >> to add some more explicitly defined canonical options. This then leads the >> an interesting question that should be posed: >> >> For changes like this in general along with other potential updates, should >> we do a series of small votes that are merged into a branch and then >> bundled up into a v1.1.0 release? Or just do votes to merge to main and >> then bump to v1.0.1? Or some other combination of ideas? As this is >> technically a change to the ADBC definitions, it should warrant some kind >> of release, but it might end up spammy to bump versions frequently for >> changes like this for now? >> >> Anyway, I figured it'd be good to open it up for discussion here and see >> what people's opinions on this are. >> >> Thanks all! >> >> --Matt
Re: DISCUSS: ADBC More Canonical Options
Thanks for bringing this up. My thought is: - We are treating ADBC's APIs as a specification, so we should vote in general. - The changes here are minimal and don't introduce any compatibility concerns - they just add more constant definitions - so I say we vote and just merge them into main, instead of adding more friction. There is a set of more major proposals I have begun collecting here [1] that would require some work to maintain compatibility. For those, I think we would want to do development on a branch, then vote and merge them and bump the specification version. And ideally, bundle these changes and any others together to avoid introducing a lot of work for implementations to maintain compatibility. -David On Wed, Jan 11, 2023, at 11:44, Matt Topol wrote: > Hey all, > > I've filed a PR with ADBC (https://github.com/apache/arrow-adbc/pull/316) > to add some more explicitly defined canonical options. This then leads the > an interesting question that should be posed: > > For changes like this in general along with other potential updates, should > we do a series of small votes that are merged into a branch and then > bundled up into a v1.1.0 release? Or just do votes to merge to main and > then bump to v1.0.1? Or some other combination of ideas? As this is > technically a change to the ADBC definitions, it should warrant some kind > of release, but it might end up spammy to bump versions frequently for > changes like this for now? > > Anyway, I figured it'd be good to open it up for discussion here and see > what people's opinions on this are. > > Thanks all! > > --Matt
Arrow R package development sync call - tomorrow (Thurs 12th Jan) at 17:30 UTC
The Arrow R package dev community call is tomorrow at 17:30 UTC. Joining instructions are below. Thursday, 12 January · 17:30 – 18:30 Google Meet joining info Video call link: https://meet.google.com/dbm-ybmv-evb Or dial: (ES) +34 910 48 95 10 PIN: 919 955 818 9233# More phone numbers: https://tel.meet/dbm-ybmv-evb?pin=9199558189233 The notes from the last call can be found at: https://docs.google.com/document/d/1nSIfJw8mfqtvScqvSVqmktpWff80pFmkqiZT7nTtiDo/edit?usp=sharing Thanks, Nic
DISCUSS: ADBC More Canonical Options
Hey all, I've filed a PR with ADBC (https://github.com/apache/arrow-adbc/pull/316) to add some more explicitly defined canonical options. This then leads to an interesting question that should be posed: For changes like this in general along with other potential updates, should we do a series of small votes that are merged into a branch and then bundled up into a v1.1.0 release? Or just do votes to merge to main and then bump to v1.0.1? Or some other combination of ideas? As this is technically a change to the ADBC definitions, it should warrant some kind of release, but it might end up spammy to bump versions frequently for changes like this for now? Anyway, I figured it'd be good to open it up for discussion here and see what people's opinions on this are. Thanks all! --Matt
Re: DISCUSS: ADBC Press Release
Sorry, I didn't mean to imply that Flight SQL was Dremio-specific (and indeed we want to position Flight SQL as a vendor-agnostic protocol). A PR with some tweaks (and a notice about the correction) would be welcome. Possibly something like > ...For example, applications can get Arrow data from BigQuery via the > BigQuery Storage API. Other systems, like Dremio, support Arrow Flight SQL, > an Arrow-native protocol designed to be implemented by multiple vendors. But > not all vendors will implement Arrow Flight SQL, so client applications ... -David On Wed, Jan 11, 2023, at 02:54, Andrew Lamb wrote: > I believe the blog post in question is [1] and the relevant text is > >> Use vendor-specific protocols. For some databases, applications can use a > database-specific protocol or SDK to directly get Arrow data. For example, > applications could use Dremio via Arrow Flight SQL. But client applications > that want to support multiple database vendors would need to integrate with > each of them. (Look at all the connectors that Trino implements.) And > databases like PostgreSQL don’t offer an option supporting Arrow in the > first place. > > I did not read that to mean FlightSQL was a vendor specific protocol, but > if others did so clarifying the wording sounds like a good idea to me > > Perhaps you could propose a specific rephrasing on a PR to [2]. > > Andrew > > [1] https://arrow.apache.org/blog/2023/01/05/introducing-arrow-adbc/ > [2] > https://github.com/apache/arrow-site/blob/master/_posts/2023-01-05-introducing-arrow-adbc.md > > On Wed, Jan 11, 2023 at 8:02 AM James Duong > wrote: > >> Hi, >> >> In the ADBC blog entry that Flight SQL was mentioned as a vendor-specific >> protocol and s Dremio is mentioned in the same sentence. >> >> The intent of the Flight SQL was to be database agnostic and this sort of >> implies Flight SQL as a Dremio-specific protocol which is not really what >> we want. >> >> Perhaps this can be rephrased? 
Maybe highlight that ADBC can help with >> building generic Arrow-based applications that work with both databases >> that have a specific Arrow-interface such as Big Query in addition to any >> Flight SQL-capable sources. >>
Re: Apache Arrow Board Report, by Jan 11 2023
Here is the report that was submitted

## Description:

The mission of Apache Arrow is the creation and maintenance of software related to columnar in-memory processing and data interchange

## Issues:

Lack of ASF sponsored invite-free chat service is a minor source of friction for community building. Most subprojects now use GitHub for tickets to lower the barrier to entry for new / casual contributors, but we still have fragmented stories for group chat. ASF Slack requires an invite and some sub communities use other chat-like services.

## Membership Data:

Apache Arrow was founded 2016-01-20 (7 years ago)

There are currently 89 committers and 45 PMC members in this project. The Committer-to-PMC ratio is roughly 2:1.

Community changes, past quarter:

- Kun Liu was added to the PMC on 2022-11-13
- Jacob Quinn was added to the PMC on 2022-10-25
- Nicola Crane was added to the PMC on 2022-10-25
- Jacob Wujciak was added as committer on 2022-12-19
- Ben Baumgold was added as committer on 2022-10-26
- Bogumił Kamiński was added as committer on 2022-10-24
- Eric Hanson was added as committer on 2022-10-26
- Jie Wen was added as committer on 2023-01-08
- Jarrett Revels was added as committer on 2022-11-02
- Curtis Vogt was added as committer on 2022-11-02
- Raúl Cumplido was added as committer on 2022-12-05
- Will Jones was added as committer on 2022-10-28
- Yang Jiang was added as committer on 2022-11-02

## Project Activity:

* Switching from JIRA to GitHub issues in order to keep the overhead for new contributors low (no need to register for an ASF JIRA account)
* [ADBC] (Arrow Database Connectivity) first release
* Community voted to add RLE to the specification
* Additional subproject updates are below
* We continue to release several different products and releases per quarter

[ADBC]: https://arrow.apache.org/blog/2023/01/05/introducing-arrow-adbc/

Recent releases:

ADBC-0.1.0 was released on 2023-01-10.
RS-30.0.1 was released on 2023-01-08.
RS-OS-0.5.3 was released on 2023-01-08.
RS-30.0.0 was released on 2023-01-03.
RS-29.0.0 was released on 2022-12-12.
RS-OS-0.5.2 was released on 2022-12-07.
RS-DATAFUSION-15.0.0 was released on 2022-12-05.
DATAFUSION-PYTHON-0.7.0 was released on 2022-11-29.
RS-28.0.0 was released on 2022-11-28.
10.0.1 was released on 2022-11-22.
RS-BALLISTA-0.10.0 was released on 2022-11-21.
JULIA-2.4.1 was released on 2022-11-18.
RS-27.0.0 was released on 2022-11-15.
RS-DATAFUSION-14.0.0 was released on 2022-11-07.
RS-26.0.0 was released on 2022-11-03.
10.0.0 was released on 2022-10-26.
JULIA-2.4.0 was released on 2022-10-26.
RS-BALLISTA-0.9.0 was released on 2022-10-26.
RS-25.0.0 was released on 2022-10-17.

## Community Health:

The community health appears good; discussions on the mailing lists and GitHub are productive. We recently had a nice discussion on the State of the Project: https://lists.apache.org/thread/r8gl3wvjgy9k8n2t194r0bbdbxx6ksqc and discussed various ways to keep encouraging the community.

## Language Area Updates

Arrow has at least 12 different language bindings, as explained in https://arrow.apache.org/overview/

Arrow 10.0.0 release: https://arrow.apache.org/blog/2022/10/31/10.0.0-release/

### C++

### C#

### Go

We’re seeing significant increases in interest in and usage of the Arrow Go library, from startups like Spice.AI to its incorporation in Google BigQuery’s quickstart example, and more. 2022 was a big year of updates, fixes, and drumming up interest for the Go module, which we hope to continue for increased adoption and usage. The Go module, along with C++, serves as an initial implementation of the Run-End Encoding array. Future development plans are to continue to expand the compute capabilities of the Go module and extend integration with Substrait.

### Java

### JavaScript

### Julia

We’ve worked again on simplifying and streamlining the administrative side for the Julia implementation: adding additional committers, simplifying the release process, etc. This has increased the rate of contributions, as expected. There’s interest in finishing the C data/stream interfaces for the Julia implementation soon.

### Rust

Rust has several projects:

* arrow-rs (arrow, parquet, arrow-flight, object_store implementations)
* arrow-datafusion: Rust query engine
* arrow-ballista: distributed query engine

We are working to incorporate Substrait into DataFusion.

We are working on external communication, with several blog posts about the technology: "Fast and Memory Efficient Multi-Column Sorts in Apache Arrow Rust, Part 1" (on sorting) and "Querying Parquet with Millisecond Latency".

We also continue the calendar-based release train with good results.

### C (GLib)

We’ve added support for the 16-bit float type.

### MATLAB

1. We have been focusing development efforts on implementing an "object dispatch layer" that uses MEX to "connect" MATLAB objects with corresponding C++ objects. This code is being actively developed at github.com/mathworks/libmexclass.