[VOTE][RUST] Release Apache Arrow Rust 52.2.0 RC1

2024-07-24 Thread Andrew Lamb
Hi,

I would like to propose a release of Apache Arrow Rust Implementation,
version 52.2.0.

This release candidate is based on commit:
49e714de6e951169d0d5e73381af247ad0230fcf [1]

The proposed release tarball and signatures are hosted at [2].

The changelog is located at [3].

Please download, verify checksums and signatures, run the unit tests,
and vote on the release. There is a script [4] that automates some of
the verification.

The vote will be open for at least 72 hours.

[ ] +1 Release this as Apache Arrow Rust
[ ] +0
[ ] -1 Do not release this as Apache Arrow Rust  because...

[1]:
https://github.com/apache/arrow-rs/tree/49e714de6e951169d0d5e73381af247ad0230fcf
[2]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-rs-52.2.0-rc1
[3]:
https://github.com/apache/arrow-rs/blob/49e714de6e951169d0d5e73381af247ad0230fcf/CHANGELOG.md
[4]:
https://github.com/apache/arrow-rs/blob/master/dev/release/verify-release-candidate.sh


[RESULT] [VOTE][RUST] Release Apache Arrow Rust Object Store 0.10.2 RC1

2024-07-21 Thread Andrew Lamb
With 3 +1 binding votes the release is approved

The release is available here:

https://dist.apache.org/repos/dist/release/arrow/arrow-object-store-rs-0.10.2

I have also published it to crates.io:
https://crates.io/crates/object_store/0.10.2

Thank you everyone,
Andrew

On Wed, Jul 17, 2024 at 1:57 PM L. C. Hsieh  wrote:

> +1 (binding)
>
> Verified on M1 Mac.
>
> Thanks Andrew.
>
>
> On Wed, Jul 17, 2024 at 10:37 AM Andrew Lamb 
> wrote:
> >
> > Hi,
> >
> > I would like to propose a release of Apache Arrow Rust Object
> > Store Implementation, version 0.10.2.
> >
> > This release candidate is based on commit:
> > b44497e1cdd84933b49b56dd00506411c040b46c [1]
> >
> > The proposed release tarball and signatures are hosted at [2].
> >
> > The changelog is located at [3].
> >
> > Please download, verify checksums and signatures, run the unit tests,
> > and vote on the release. There is a script [4] that automates some of
> > the verification.
> >
> > The vote will be open for at least 72 hours.
> >
> > [ ] +1 Release this as Apache Arrow Rust Object Store
> > [ ] +0
> > [ ] -1 Do not release this as Apache Arrow Rust Object Store  because...
> >
> > [1]:
> >
> https://github.com/apache/arrow-rs/tree/b44497e1cdd84933b49b56dd00506411c040b46c
> > [2]:
> >
> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-object-store-rs-0.10.2-rc1
> > [3]:
> >
> https://github.com/apache/arrow-rs/blob/b44497e1cdd84933b49b56dd00506411c040b46c/object_store/CHANGELOG.md
> > [4]:
> >
> https://github.com/apache/arrow-rs/blob/master/object_store/dev/release/verify-release-candidate.sh
>


[VOTE][RUST] Release Apache Arrow Rust Object Store 0.10.2 RC1

2024-07-17 Thread Andrew Lamb
Hi,

I would like to propose a release of Apache Arrow Rust Object
Store Implementation, version 0.10.2.

This release candidate is based on commit:
b44497e1cdd84933b49b56dd00506411c040b46c [1]

The proposed release tarball and signatures are hosted at [2].

The changelog is located at [3].

Please download, verify checksums and signatures, run the unit tests,
and vote on the release. There is a script [4] that automates some of
the verification.

The vote will be open for at least 72 hours.

[ ] +1 Release this as Apache Arrow Rust Object Store
[ ] +0
[ ] -1 Do not release this as Apache Arrow Rust Object Store  because...

[1]:
https://github.com/apache/arrow-rs/tree/b44497e1cdd84933b49b56dd00506411c040b46c
[2]:
https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-object-store-rs-0.10.2-rc1
[3]:
https://github.com/apache/arrow-rs/blob/b44497e1cdd84933b49b56dd00506411c040b46c/object_store/CHANGELOG.md
[4]:
https://github.com/apache/arrow-rs/blob/master/object_store/dev/release/verify-release-candidate.sh


Re: Understanding possible synergies between arrow & zarr communities?

2024-07-17 Thread Andrew Lamb
> Has there been any discussion about rewriting parts of zarr in Rust (for
> example, the IO management stack would be a prime candidate for this type of
> treatment)?

One project from the DataFusion community that might be interesting is [1],
which is a native Rust implementation of reading/writing the zarr format
into arrow (from which you could make a Pandas dataframe, for example). I
haven't used it myself, but it might be worth evaluating.

Andrew

[1]: https://github.com/datafusion-contrib/arrow-zarr

On Tue, Jul 16, 2024 at 11:57 AM Antoine Pitrou  wrote:

>
> Hi Carl,
>
> Le 08/07/2024 à 18:43, Carl Boettiger a écrit :
> >
> > As an observer to both communities, I'm interested in if there is or might
> > be more communication between the Pangeo community's focus on Zarr
> > serialization with what the Arrow team has done with Parquet.  I recognize
> > that these are different formats that serve different purposes, but frankly
> > it seems there are a lot of reasonably low-level optimizations in how Arrow
> > handles range request parsing (data type conversion,
> > compression/decompression, streaming, much else) on Parquet that I was
> > wondering might be useful in the Zarr context.
>
> Well, Parquet is a rather sophisticated format and the C++ Parquet
> implementation inside PyArrow is not meant for anything else than
> reading Parquet files :-) In other words, I'm afraid there's not much to
> reuse for other purposes there.
>
> > This discussion on the Pangeo forum may be illustrative:
> >
> https://discourse.pangeo.io/t/why-is-parquet-pandas-faster-than-zarr-xarray-here/2513
> > .  I don't want to get too caught up in that particular example since I
> > know the use cases will usually differ, but I think it illustrates only a
> > slice of the potential differences, mostly at a high level (i.e. overhead
> > from dask and maybe fsspec use in zarr).
>
> PyArrow uses the Arrow C++ filesystems under the hood (*), which might
> be faster than fsspec in some cases (by virtue of being implemented in
> C++). However, it's also possible that being implemented in Python would
> allow fsspec to implement more sophisticated optimizations, so this is
> worth measuring on a case-by-case basis.
>
> (*) https://arrow.apache.org/docs/dev/python/filesystems.html
>
> Feel free to ask any other questions, we're ready to help. If there's
> enough interest, perhaps we could even schedule a call with various
> parties at some time.
>
> Regards
>
> Antoine.
>


Re: [DISCUSS] Donation of a User-Defined Function Framework for Apache Arrow

2024-07-17 Thread Andrew Lamb
An update here is that one of the DataFusion contributors, @xinlifoobar,
did a very neat prototype of using arrow-udf in DataFusion[1] and wrote up
their findings[2].

The major findings are that it would be possible, though it would take some
additional work (e.g. single values, making the function registry optional,
and adding datatype support).

Andrew

[1]: https://github.com/apache/datafusion/pull/11488
[2]:
https://github.com/apache/datafusion/issues/11413#issuecomment-2230502588



On Thu, Jul 4, 2024 at 3:35 PM Felipe Oliveira Carvalho 
wrote:

> Hi Andrew,
>
> During the Arrow Community Meeting I asked Xuanwo many questions trying to
> clarify my understanding of what they mean by "UDF".
>
> To me and you it seems to mean "user defined compute kernels", but in the
> context of these libraries it's *also that* plus the ability to call these
> functions by making calls to Flight services. The ultimate goal being
> federated querying (my words, not theirs) from databases that can talk to
> Flight to run these UDFs running on them.
>
> IMO it's too big of a problem to solve in the context of Arrow, but I
> suggested that they modularize it more (i.e. separate the kernel-writing
> parts from the parts that make the Flight wiring) and try to incorporate
> the kernel-writing parts (like the Rust macros) into the arrow-LANG
> implementation so that their framework for writing Flight services for UDFs
> could be much smaller and specific to the strategy for connecting querying
> engines they see fit.
>
> Others suggested work on standardizing the protocols used to call these
> UDFs, but IMO that is a huge undertaking. There are simply too many aspects
> to calling UDFs across different services. Standardizing the practices
> around this will take a lot of experimenting.
>
> --
> Felipe
>
> On Wed, Jul 3, 2024 at 3:43 PM Andrew Lamb  wrote:
>
> > What does everyone think about renaming this library to something like
> > `arrow-auto-vectorizer` or `arrow-functions` to emphasize its role with
> > codegen of vectorized implementations?
> >
> > In discussing this proposal internally, it took a while to explain what
> the
> > usecase of the library is
> >
> > From my understanding, the use case is "Automatically generate vectorized
> > kernels from scalar functions".
> >
> > While this can be used for User Defined Functions (UDFs), there are many
> > other uses too (like "built in functions" in processing engines)
> >
> > Andrew
> >
> >
> > On Mon, Jul 1, 2024 at 8:10 AM Xuanwo  wrote:
> >
> > > I have cross-posted the proposal to datafusion community to collect
> more
> > > feedback:
> > >
> > > https://github.com/apache/datafusion/discussions/11192
> > >
> > > On Mon, Jul 1, 2024, at 19:31, Andrew Lamb wrote:
> > > > I have been thinking about this project more, and the more I think
> > about
> > > it
> > > > the more I like it.
> > > >
> > > > For example of the kind of leverage a library like this might bring,
> we
> > > > might consider changing the implementation of Arrow UDF to re-use the
> > > > underlying buffers when possible (e.g. via unary_mut[1]). This would
> > > likely
> > > > provide an across the board efficiency improvement for no costs to
> > > > downstream crates.
> > > >
> > > > Andrew
> > > >
> > > > [1]:
> > > >
> > >
> >
> https://docs.rs/arrow/latest/arrow/array/struct.PrimitiveArray.html#method.unary_mut
> > > >
> > > > On Sat, Jun 29, 2024 at 1:47 AM Xuanwo  wrote:
> > > >
> > > >> > That said, wherever it ends up, there should be the agreement of
> > > >> > individuals to accept maintenance of it. Since it's in rust, that
> > > would
> > > >> > generally fall to the arrow-rs contributors and/or the DataFusion
> > > >> > contributors IMO.
> > > >> >
> > > >> > It would be good for it to be part of the community, but only if
> > it's
> > > not
> > > >> > going to end up just bitrotting somewhere.
> > > >>
> > > >> Thanks Matt. This concern does make sense.
> > > >>
> > > >> Arrow UDF is extensively used within RisingWave and Databend. We,
> the
> > > >> initial
> > > >> committers from both RisingWave and Databend, are eager to take
> > > >> responsibility
> > > >> for maintaining these

Re: [DISCUSS] Donation of a User-Defined Function Framework for Apache Arrow

2024-07-07 Thread Andrew Lamb
Thank you for the summary Felipe.  Your description and suggestion sounds
reasonable to me.

In terms of federated querying across services, perhaps that is
something that more naturally fits in with the substrait[1] project.

Andrew

[1]: https://substrait.io/

On Thu, Jul 4, 2024 at 3:35 PM Felipe Oliveira Carvalho 
wrote:

> Hi Andrew,
>
> During the Arrow Community Meeting I asked Xuanwo many questions trying to
> clarify my understanding of what they mean by "UDF".
>
> To me and you it seems to mean "user defined compute kernels", but in the
> context of these libraries it's *also that* plus the ability to call these
> functions by making calls to Flight services. The ultimate goal being
> federated querying (my words, not theirs) from databases that can talk to
> Flight to run these UDFs running on them.
>
> IMO it's too big of a problem to solve in the context of Arrow, but I
> suggested that they modularize it more (i.e. separate the kernel-writing
> parts from the parts that make the Flight wiring) and try to incorporate
> the kernel-writing parts (like the Rust macros) into the arrow-LANG
> implementation so that their framework for writing Flight services for UDFs
> could be much smaller and specific to the strategy for connecting querying
> engines they see fit.
>
> Others suggested work on standardizing the protocols used to call these
> UDFs, but IMO that is a huge undertaking. There are simply too many aspects
> to calling UDFs across different services. Standardizing the practices
> around this will take a lot of experimenting.
>
> --
> Felipe
>
> On Wed, Jul 3, 2024 at 3:43 PM Andrew Lamb  wrote:
>
> > What does everyone think about renaming this library to something like
> > `arrow-auto-vectorizer` or `arrow-functions` to emphasize its role with
> > codegen of vectorized implementations?
> >
> > In discussing this proposal internally, it took a while to explain what
> the
> > usecase of the library is
> >
> > From my understanding, the use case is "Automatically generate vectorized
> > kernels from scalar functions".
> >
> > While this can be used for User Defined Functions (UDFs), there are many
> > other uses too (like "built in functions" in processing engines)
> >
> > Andrew
> >
> >
> > On Mon, Jul 1, 2024 at 8:10 AM Xuanwo  wrote:
> >
> > > I have cross-posted the proposal to datafusion community to collect
> more
> > > feedback:
> > >
> > > https://github.com/apache/datafusion/discussions/11192
> > >
> > > On Mon, Jul 1, 2024, at 19:31, Andrew Lamb wrote:
> > > > I have been thinking about this project more, and the more I think
> > about
> > > it
> > > > the more I like it.
> > > >
> > > > For example of the kind of leverage a library like this might bring,
> we
> > > > might consider changing the implementation of Arrow UDF to re-use the
> > > > underlying buffers when possible (e.g. via unary_mut[1]). This would
> > > likely
> > > > provide an across the board efficiency improvement for no costs to
> > > > downstream crates.
> > > >
> > > > Andrew
> > > >
> > > > [1]:
> > > >
> > >
> >
> https://docs.rs/arrow/latest/arrow/array/struct.PrimitiveArray.html#method.unary_mut
> > > >
> > > > On Sat, Jun 29, 2024 at 1:47 AM Xuanwo  wrote:
> > > >
> > > >> > That said, wherever it ends up, there should be the agreement of
> > > >> > individuals to accept maintenance of it. Since it's in rust, that
> > > would
> > > >> > generally fall to the arrow-rs contributors and/or the DataFusion
> > > >> > contributors IMO.
> > > >> >
> > > >> > It would be good for it to be part of the community, but only if
> > it's
> > > not
> > > >> > going to end up just bitrotting somewhere.
> > > >>
> > > >> Thanks Matt. This concern does make sense.
> > > >>
> > > >> Arrow UDF is extensively used within RisingWave and Databend. We,
> the
> > > >> initial
> > > >> committers from both RisingWave and Databend, are eager to take
> > > >> responsibility
> > > >> for maintaining these crates.
> > > >>
> > > >> Additionally, some of us are involved in other Apache Projects, so
> we
> > > >> understand
> > > >> how the Apache Way func

[RESULT][VOTE][RUST] Release Apache Arrow Rust 52.1.0 RC1

2024-07-06 Thread Andrew Lamb
With 4 votes (3 binding) the release is approved

Thank you everyone who participated

The release is available here:
  https://dist.apache.org/repos/dist/release/arrow/arrow-rs-52.1.0

It has also been published to crates.io
* https://crates.io/crates/arrow/52.1.0
* https://crates.io/crates/parquet/52.1.0

Andrew

On Tue, Jul 2, 2024 at 11:42 PM Wayne Xia  wrote:

> +1 (non-binding)
>
> Verified on Arch Linux
>
> Thanks, Andrew.
>
> On Wed, Jul 3, 2024 at 6:17 AM Andy Grove  wrote:
>
> > +1 (binding)
> >
> > Verified on Ubuntu 22.04.4 LTS
> >
> > Thanks, Andrew.
> >
> > On Tue, Jul 2, 2024 at 1:14 PM L. C. Hsieh  wrote:
> >
> > > +1 (binding)
> > >
> > > Verified on M1 Mac.
> > >
> > > Thanks Andrew.
> > >
> > > On Tue, Jul 2, 2024 at 11:58 AM Andrew Lamb 
> > wrote:
> > > >
> > > > Hi,
> > > >
> > > > I would like to propose a release of Apache Arrow Rust
> Implementation,
> > > > version 52.1.0.
> > > >
> > > > Note this is the first release in our new monthly cadence that limits
> > > > breaking API changes to quarterly releases [0].
> > > >
> > > > This release candidate is based on commit:
> > > > 035b5899f3198cbbdddc772f64c214332e6323fe [1]
> > > >
> > > > The proposed release tarball and signatures are hosted at [2].
> > > >
> > > > The changelog is located at [3].
> > > >
> > > > Please download, verify checksums and signatures, run the unit tests,
> > > > and vote on the release. There is a script [4] that automates some of
> > > > the verification.
> > > >
> > > > The vote will be open for at least 72 hours.
> > > >
> > > > [ ] +1 Release this as Apache Arrow Rust
> > > > [ ] +0
> > > > [ ] -1 Do not release this as Apache Arrow Rust  because...
> > > >
> > > > [0]:
> > > >
> > >
> >
> https://github.com/apache/arrow-rs?tab=readme-ov-file#release-versioning-and-schedule
> > > > [1]:
> > > >
> > >
> >
> https://github.com/apache/arrow-rs/tree/035b5899f3198cbbdddc772f64c214332e6323fe
> > > > [2]:
> > >
> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-rs-52.1.0-rc1
> > > > [3]:
> > > >
> > >
> >
> https://github.com/apache/arrow-rs/blob/035b5899f3198cbbdddc772f64c214332e6323fe/CHANGELOG.md
> > > > [4]:
> > > >
> > >
> >
> https://github.com/apache/arrow-rs/blob/master/dev/release/verify-release-candidate.sh
> > >
> >
>


Re: [DISCUSS] Donation of a User-Defined Function Framework for Apache Arrow

2024-07-03 Thread Andrew Lamb
What does everyone think about renaming this library to something like
`arrow-auto-vectorizer` or `arrow-functions` to emphasize its role with
codegen of vectorized implementations?

In discussing this proposal internally, it took a while to explain what the
use case of the library is.

From my understanding, the use case is "Automatically generate vectorized
kernels from scalar functions".

While this can be used for User Defined Functions (UDFs), there are many
other uses too (like "built in functions" in processing engines)

Andrew


On Mon, Jul 1, 2024 at 8:10 AM Xuanwo  wrote:

> I have cross-posted the proposal to datafusion community to collect more
> feedback:
>
> https://github.com/apache/datafusion/discussions/11192
>
> On Mon, Jul 1, 2024, at 19:31, Andrew Lamb wrote:
> > I have been thinking about this project more, and the more I think about
> it
> > the more I like it.
> >
> > For example of the kind of leverage a library like this might bring, we
> > might consider changing the implementation of Arrow UDF to re-use the
> > underlying buffers when possible (e.g. via unary_mut[1]). This would
> likely
> > provide an across the board efficiency improvement for no costs to
> > downstream crates.
> >
> > Andrew
> >
> > [1]:
> >
> https://docs.rs/arrow/latest/arrow/array/struct.PrimitiveArray.html#method.unary_mut
> >
> > On Sat, Jun 29, 2024 at 1:47 AM Xuanwo  wrote:
> >
> >> > That said, wherever it ends up, there should be the agreement of
> >> > individuals to accept maintenance of it. Since it's in rust, that
> would
> >> > generally fall to the arrow-rs contributors and/or the DataFusion
> >> > contributors IMO.
> >> >
> >> > It would be good for it to be part of the community, but only if it's
> not
> >> > going to end up just bitrotting somewhere.
> >>
> >> Thanks Matt. This concern does make sense.
> >>
> >> Arrow UDF is extensively used within RisingWave and Databend. We, the
> >> initial
> >> committers from both RisingWave and Databend, are eager to take
> >> responsibility
> >> for maintaining these crates.
> >>
> >> Additionally, some of us are involved in other Apache Projects, so we
> >> understand
> >> how the Apache Way functions. We will focus on community growth to
> ensure
> >> this
> >> project remains active.
> >>
> >> On Sat, Jun 29, 2024, at 13:29, Matt Topol wrote:
> >> >> This UDF implementation doesn’t depend on DataFusion. It can work
> with
> >> > any data in the arrow format.
> >> >
> >> > Given this I'm in agreement with Antoine that it would be weird for
> it to
> >> > be maintained within the DataFusion repo as opposed to its own repo
> (as
> >> > we've done in the past for things like nanoarrow and
> arrow-experiments).
> >> >
> >> > That said, wherever it ends up, there should be the agreement of
> >> > individuals to accept maintenance of it. Since it's in rust, that
> would
> >> > generally fall to the arrow-rs contributors and/or the DataFusion
> >> > contributors IMO.
> >> >
> >> > It would be good for it to be part of the community, but only if it's
> not
> >> > going to end up just bitrotting somewhere.
> >> >
> >> > --Matt
> >> >
> >> > On Fri, Jun 28, 2024, 8:49 PM Xuanwo  wrote:
> >> >
> >> >> Hi,
> >> >>
> >> >> This UDF implementation doesn’t depend on DataFusion. It can work
> with
> >> any
> >> >> data in the arrow format.
> >> >>
> >> >> It has the potential power to make users write ONE UDF function that
> >> works
> >> >> for different query engines as we showed up in databend and
> risingwave.
> >> >>
> >> >> So I personally think it should be part of arrow community.
> >> >>
> >> >> On Sat, Jun 29, 2024, at 05:06, Antoine Pitrou wrote:
> >> >> > Is this UDF implementation based on DataFusion? If so, it makes
> sense
> >> >> > for it to be part of the DataFusion project.
> >> >> >
> >> >> > OTOH, if it can work with any data in the Arrow format, then it
> would
> >> >> > sound weird to maintain it in the DataFusion repo IMHO.
> >> >> >
> >> >> > Regards
> >> >> >
> >> >>

[RUST] Propose changing behavior of casting timestamp with timezone to without timezones

2024-07-03 Thread Andrew Lamb
Hello,

In the context of implementing date_bin for timestamps with timezones in
DataFusion[1] (everyone's favorite corner-case riddled area) I would like
to change how casting (and parsing) of timestamps without timezones works
in Arrow [2].

If you have any comments (or information on how other Arrow implementations
handle this), please add them to the discussion on [2].
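
For anyone wondering why this is a corner case, here is a small sketch of two
readings such a cast can have. It uses only the standard library and is an
illustration, not the arrow-rs implementation: `TimestampTz`, `keep_instant`
and `to_local_wall_clock` are hypothetical names, and a fixed UTC offset stands
in for a real timezone. The actual proposal and current behavior are described
in [2].

```rust
/// A timestamp with a timezone: seconds since the Unix epoch (UTC) plus a
/// fixed UTC offset in seconds. Real timezones also have DST rules, which
/// this sketch ignores.
#[derive(Debug, Clone, Copy)]
struct TimestampTz {
    utc_seconds: i64,
    offset_seconds: i32,
}

/// Reading 1: drop the timezone and keep the stored UTC value unchanged.
fn keep_instant(ts: TimestampTz) -> i64 {
    ts.utc_seconds
}

/// Reading 2: drop the timezone and keep the local wall-clock reading,
/// i.e. shift the stored value by the UTC offset.
fn to_local_wall_clock(ts: TimestampTz) -> i64 {
    ts.utc_seconds + i64::from(ts.offset_seconds)
}
```

The two functions return different values whenever the offset is non-zero,
so anything downstream that buckets by wall-clock time depends on which
reading the cast uses.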

Thank you,
Andrew

[1]: https://github.com/apache/datafusion/issues/10602
[2]: https://github.com/apache/arrow-rs/issues/5827


[VOTE][RUST] Release Apache Arrow Rust 52.1.0 RC1

2024-07-02 Thread Andrew Lamb
Hi,

I would like to propose a release of Apache Arrow Rust Implementation,
version 52.1.0.

Note this is the first release in our new monthly cadence that limits
breaking API changes to quarterly releases [0].

This release candidate is based on commit:
035b5899f3198cbbdddc772f64c214332e6323fe [1]

The proposed release tarball and signatures are hosted at [2].

The changelog is located at [3].

Please download, verify checksums and signatures, run the unit tests,
and vote on the release. There is a script [4] that automates some of
the verification.

The vote will be open for at least 72 hours.

[ ] +1 Release this as Apache Arrow Rust
[ ] +0
[ ] -1 Do not release this as Apache Arrow Rust  because...

[0]:
https://github.com/apache/arrow-rs?tab=readme-ov-file#release-versioning-and-schedule
[1]:
https://github.com/apache/arrow-rs/tree/035b5899f3198cbbdddc772f64c214332e6323fe
[2]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-rs-52.1.0-rc1
[3]:
https://github.com/apache/arrow-rs/blob/035b5899f3198cbbdddc772f64c214332e6323fe/CHANGELOG.md
[4]:
https://github.com/apache/arrow-rs/blob/master/dev/release/verify-release-candidate.sh


Re: [DISCUSS] Donation of a User-Defined Function Framework for Apache Arrow

2024-07-01 Thread Andrew Lamb
I have been thinking about this project more, and the more I think about it
the more I like it.

As an example of the kind of leverage a library like this might bring, we
might consider changing the implementation of Arrow UDF to re-use the
underlying buffers when possible (e.g. via unary_mut[1]). This would likely
provide an across-the-board efficiency improvement at no cost to
downstream crates.
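
A minimal sketch of the buffer re-use idea, using only the standard library
rather than the arrow crate. `SharedColumn` and `map_in_place_or_copy` are
hypothetical names, and `Arc<Vec<i64>>` merely stands in for a
reference-counted Arrow buffer. The assumption illustrated is that a kernel can
overwrite the values in place when it holds the only reference, and has to
allocate otherwise.

```rust
use std::sync::Arc;

/// Stand-in for a reference-counted Arrow buffer (hypothetical type).
type SharedColumn = Arc<Vec<i64>>;

/// Apply `op` to every value. When `col` is the only owner of the values,
/// they are overwritten in place; otherwise the shared values are left
/// untouched and the results go into a newly allocated vector.
/// Returns the resulting column and whether the values were re-used.
fn map_in_place_or_copy(col: SharedColumn, op: impl Fn(i64) -> i64) -> (SharedColumn, bool) {
    match Arc::try_unwrap(col) {
        // Unique owner: mutate the existing values.
        Ok(mut values) => {
            for v in values.iter_mut() {
                *v = op(*v);
            }
            (Arc::new(values), true)
        }
        // Shared: someone else can still see the values, so copy.
        Err(shared) => {
            let copied: Vec<i64> = shared.iter().map(|v| op(*v)).collect();
            (Arc::new(copied), false)
        }
    }
}
```

If a library generates the kernels, a decision like this lives in one place,
and every function written against the library picks it up without changes.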

Andrew

[1]:
https://docs.rs/arrow/latest/arrow/array/struct.PrimitiveArray.html#method.unary_mut

On Sat, Jun 29, 2024 at 1:47 AM Xuanwo  wrote:

> > That said, wherever it ends up, there should be the agreement of
> > individuals to accept maintenance of it. Since it's in rust, that would
> > generally fall to the arrow-rs contributors and/or the DataFusion
> > contributors IMO.
> >
> > It would be good for it to be part of the community, but only if it's not
> > going to end up just bitrotting somewhere.
>
> Thanks Matt. This concern does make sense.
>
> Arrow UDF is extensively used within RisingWave and Databend. We, the
> initial
> committers from both RisingWave and Databend, are eager to take
> responsibility
> for maintaining these crates.
>
> Additionally, some of us are involved in other Apache Projects, so we
> understand
> how the Apache Way functions. We will focus on community growth to ensure
> this
> project remains active.
>
> On Sat, Jun 29, 2024, at 13:29, Matt Topol wrote:
> >> This UDF implementation doesn’t depend on DataFusion. It can work with
> > any data in the arrow format.
> >
> > Given this I'm in agreement with Antoine that it would be weird for it to
> > be maintained within the DataFusion repo as opposed to its own repo (as
> > we've done in the past for things like nanoarrow and arrow-experiments).
> >
> > That said, wherever it ends up, there should be the agreement of
> > individuals to accept maintenance of it. Since it's in rust, that would
> > generally fall to the arrow-rs contributors and/or the DataFusion
> > contributors IMO.
> >
> > It would be good for it to be part of the community, but only if it's not
> > going to end up just bitrotting somewhere.
> >
> > --Matt
> >
> > On Fri, Jun 28, 2024, 8:49 PM Xuanwo  wrote:
> >
> >> Hi,
> >>
> >> This UDF implementation doesn’t depend on DataFusion. It can work with
> any
> >> data in the arrow format.
> >>
> >> It has the potential power to make users write ONE UDF function that
> works
> >> for different query engines as we showed up in databend and risingwave.
> >>
> >> So I personally think it should be part of arrow community.
> >>
> >> On Sat, Jun 29, 2024, at 05:06, Antoine Pitrou wrote:
> >> > Is this UDF implementation based on DataFusion? If so, it makes sense
> >> > for it to be part of the DataFusion project.
> >> >
> >> > OTOH, if it can work with any data in the Arrow format, then it would
> >> > sound weird to maintain it in the DataFusion repo IMHO.
> >> >
> >> > Regards
> >> >
> >> > Antoine.
> >> >
> >> >
> >> > Le 28/06/2024 à 21:52, Andrew Lamb a écrit :
> >> >> To be clear, if the arrow community thinks this would be better
> >> organized /
> >> >> administered in the Apache DataFusion project (especially if it is
> >> aligned
> >> >> with Rust) I think it would be good to discuss donating there
> >> >>
> >> >> On Fri, Jun 28, 2024 at 3:17 PM Andrew Lamb 
> >> wrote:
> >> >>
> >> >>> I think there are two aspects:
> >> >>> 1. The actual mechanics of implementing functions
> >> >>> 2. The actual library of udf functions (e.g. sin, cos, nullif, etc)
> >> >>>
> >> >>> I agree 2 is not something that belongs naturally in the arrow
> project
> >> and
> >> >>> is better aligned with query engines
> >> >>>
> >> >>> However I think 1 is worth considering.
> >> >>>
> >> >>> As I understand it, the problem arrow_udf solves is avoiding some of
> >> the
> >> >>> boilerplate  required to make vectorized udfs. So instead of
> writing a
> >> >>> special eval_gcd function like this
> >> >>>
> >> >>> ```
> >> >>> fn gcd(l: i64, r: i64) -> i64 {
> >> >>>   // do gcd calculation
> >> >>> }
> >> >>>
> >>

Re: Re: [DISCUSS] Donation of a User-Defined Function Framework for Apache Arrow

2024-06-28 Thread Andrew Lamb
To be clear, if the arrow community thinks this would be better organized /
administered in the Apache DataFusion project (especially if it is aligned
with Rust) I think it would be good to discuss donating there.

On Fri, Jun 28, 2024 at 3:17 PM Andrew Lamb  wrote:

> I think there are two aspects:
> 1. The actual mechanics of implementing functions
> 2. The actual library of udf functions (e.g. sin, cos, nullif, etc)
>
> I agree 2 is not something that belongs naturally in the arrow project and
> is better aligned with query engines
>
> However I think 1 is worth considering.
>
> As I understand it, the problem arrow_udf solves is avoiding some of the
> boilerplate  required to make vectorized udfs. So instead of writing a
> special eval_gcd function like this
>
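> ```
> use std::sync::Arc;
> use arrow::array::{ArrayRef, AsArray, Int64Array};
> use arrow::compute::binary;
> use arrow::datatypes::Int64Type;
>
> fn gcd(l: i64, r: i64) -> i64 {
>   // do gcd calculation
>   todo!()
> }
>
> // implement vectorized version
> // assumes both inputs are Int64 arrays passed as &ArrayRef
> fn eval_gcd(left: &ArrayRef, right: &ArrayRef) -> ArrayRef {
>   let left = left.as_primitive::<Int64Type>();
>   let right = right.as_primitive::<Int64Type>();
>   // binary() returns an error if the input lengths differ
>   let res: Int64Array = binary(left, right, |l, r| gcd(l, r)).unwrap();
>   Arc::new(res)
> }
> ```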
>
> The user simply annotates the scalar function and has the library
> code-gen the array version:
> ```
> #[function("gcd(int64, int64) -> int64", output = "eval_gcd")]
> fn gcd(l: i64, r: i64) -> i64 {
>  // do gcd calculation
> }
> ```
>
> We have a lot of boilerplate / non-ideal macro stuff in DataFusion that I
> think this would help with a lot.
>
> Andrew
>
>
> On Fri, Jun 28, 2024 at 3:08 PM Raphael Taylor-Davies
>  wrote:
>
>> I wonder if the DataFusion project might be a more natural home for this
>> functionality? UDFs are more of a query engine concept, whereas arrow-rs is
>> more focused on purely physical execution?
>>
>> On 28 June 2024 19:41:39 BST, Runji Wang  wrote:
>> >Hi Felipe,
>> >
>> >Vectorization will be applied whenever possible. When all input and
>> output types of a function are primitive (int16, int32, int64, float32,
>> float64) and do not involve any Option or Result, the macro will
>> automatically generate code based on unary <
>> https://docs.rs/arrow/latest/arrow/compute/fn.unary.html> or binary <
>> https://docs.rs/arrow/latest/arrow/compute/fn.binary.html> kernels,
>> which potentially allows for vectorization.
>> >
>> >Both examples you showed are not vectorized. The `div` function is due
>> to the Result output, while `gcd` is due to the loop in its implementation.
>> However, if the function is simple enough, like an `add` function:
>> >
>> >#[function("add(int, int) -> int")]
>> >fn add(a: i32, b: i32) -> i32 {
>> >    a + b
>> >}
>> >
>> >It can be auto-vectorized by llvm.
>> >
>> >Runji
>> >
>> >
>> >On 2024/06/28 17:13:16 Felipe Oliveira Carvalho wrote:
>> >> On Fri, Jun 28, 2024 at 11:07 AM Andrew Lamb 
>> wrote:
>> >> >
>> >> > Hi Xuanwo,
>> >> >
>> >> > Sorry for the delay in responding. I think  the ability to easily
>> write
>> >> > functions that "feel" like native functions in whatever language and
>> be
>> >> > able to generate arrow / vectorized versions of them is quite
>> valuable.
>> >> > This is my understanding of what this proposal is about.
>> >>
>> >> My understanding is that it's not vectorized. From the examples in
>> >> risingwavelabs/arrow-udf, <https://github.com/risingwavelabs/arrow-udf>
>> it
>> >> looks like the macros generate code that gathers values from columns
>> into
>> >> local scalars that are passed as scalar parameters to user functions.
>> Is
>> >> the hope here that rustc/llvm will auto-vectorize the code?
>> >>
>> >> #[function("gcd(int, int) -> int")]
>> >> fn gcd(mut a: i32, mut b: i32) -> i32 {
>> >>     while b != 0 {
>> >>         (a, b) = (b, a % b);
>> >>     }
>> >>     a
>> >> }
>> >>
>> >> #[function("div(int, int) -> int")]
>> >> fn div(x: i32, y: i32) -> Result<i32, &'static str> {
>> >>     if y == 0 {
>> >>         return Err("division by zero");
>> >>     }
>> >>     Ok(x / y)
>> >> }
>> >>
>> >> > I left some additional comments on the markdown.
>> >> >
>> >> > One thing that might be worth doing is articulate some other
>> p

Re: Re: [DISCUSS] Donation of a User-Defined Function Framework for Apache Arrow

2024-06-28 Thread Andrew Lamb
I think there are two aspects:
1. The actual mechanics of implementing functions
2. The actual library of udf functions (e.g. sin, cos, nullif, etc)

I agree 2 is not something that belongs naturally in the arrow project and
is better aligned with query engines

However I think 1 is worth considering.

As I understand it, the problem arrow_udf solves is avoiding some of the
boilerplate  required to make vectorized udfs. So instead of writing a
special eval_gcd function like this

```
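use std::sync::Arc;
use arrow::array::{ArrayRef, AsArray, Int64Array};
use arrow::{compute::binary, datatypes::Int64Type, error::ArrowError};

fn gcd(mut l: i64, mut r: i64) -> i64 {
    // do gcd calculation (Euclid's algorithm)
    while r != 0 {
        (l, r) = (r, l % r);
    }
    l
}

// implement vectorized version
fn eval_gcd(left: &ArrayRef, right: &ArrayRef) -> Result<ArrayRef, ArrowError> {
    // as_primitive panics if the inputs are not Int64 arrays
    let left = left.as_primitive::<Int64Type>();
    let right = right.as_primitive::<Int64Type>();
    let res: Int64Array = binary(left, right, |l, r| gcd(l, r))?;
    Ok(Arc::new(res))
}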
```

The user simply annotates the scalar function and has the library code-gen
the array version:
```
#[function("gcd(int64, int64) -> int64", output = "eval_gcd")]
fn gcd(l: i64, r: i64) -> i64 {
 // do gcd calculation
}
```

We have a lot of boilerplate / non-ideal macro stuff in DataFusion that I
think this would help with a lot.
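
To make the "code gen the array version" idea concrete without pulling in
Arrow, here is a minimal sketch in plain Rust. It is not what arrow_udf
actually generates: `lift_binary` is a hypothetical stand-in for the generated
wrapper, and `Vec<Option<i64>>` is an assumed stand-in for an Arrow array with
nulls.

```rust
/// Scalar function the user writes (Euclid's algorithm).
fn gcd(mut l: i64, mut r: i64) -> i64 {
    while r != 0 {
        (l, r) = (r, l % r);
    }
    l
}

/// Hypothetical stand-in for the generated wrapper: applies a scalar
/// function element-wise to two columns and propagates nulls (`None`).
fn lift_binary<F>(
    left: &[Option<i64>],
    right: &[Option<i64>],
    f: F,
) -> Result<Vec<Option<i64>>, String>
where
    F: Fn(i64, i64) -> i64,
{
    if left.len() != right.len() {
        return Err(format!("length mismatch: {} vs {}", left.len(), right.len()));
    }
    Ok(left
        .iter()
        .zip(right)
        .map(|(l, r)| match (l, r) {
            // Both inputs present: call the scalar function.
            (Some(l), Some(r)) => Some(f(*l, *r)),
            // Any null input yields a null output.
            _ => None,
        })
        .collect())
}

/// The "array version" a macro could emit for `gcd`.
fn eval_gcd(left: &[Option<i64>], right: &[Option<i64>]) -> Result<Vec<Option<i64>>, String> {
    lift_binary(left, right, gcd)
}
```

The point of the sketch is only that the per-function wrapper is mechanical,
which is why generating it from an annotation is attractive.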

Andrew


On Fri, Jun 28, 2024 at 3:08 PM Raphael Taylor-Davies
 wrote:

> I wonder if the DataFusion project might be a more natural home for this
> functionality? UDFs are more of a query engine concept, whereas arrow-rs is
> more focused on purely physical execution?
>
> On 28 June 2024 19:41:39 BST, Runji Wang  wrote:
> >Hi Felipe,
> >
> >Vectorization will be applied whenever possible. When all input and
> output types of a function are primitive (int16, int32, int64, float32,
> float64) and do not involve any Option or Result, the macro will
> automatically generate code based on unary <
> https://docs.rs/arrow/latest/arrow/compute/fn.unary.html> or binary <
> https://docs.rs/arrow/latest/arrow/compute/fn.binary.html> kernels, which
> potentially allows for vectorization.
> >
> >Neither of the examples you showed is vectorized: `div` because of its
> >Result output, and `gcd` because of the loop in its implementation.
> >However, if the function is simple enough, like an `add` function:
> >
> >#[function("add(int, int) -> int")]
> >fn add(a: i32, b: i32) -> i32 {
> >    a + b
> >}
> >
> >It can be auto-vectorized by llvm.
> >
> >Runji
> >
> >
> >On 2024/06/28 17:13:16 Felipe Oliveira Carvalho wrote:
> >> On Fri, Jun 28, 2024 at 11:07 AM Andrew Lamb 
> wrote:
> >> >
> >> > Hi Xuanwo,
> >> >
> >> > Sorry for the delay in responding. I think  the ability to easily
> write
> >> > functions that "feel" like native functions in whatever language and
> be
> >> > able to generate arrow / vectorized versions of them is quite
> valuable.
> >> > This is my understanding of what this proposal is about.
> >>
> >> My understanding is that it's not vectorized. From the examples in
> >> risingwavelabs/arrow-udf, <https://github.com/risingwavelabs/arrow-udf>
> it
> >> looks like the macros generate code that gathers values from columns
> into
> >> local scalars that are passed as scalar parameters to user functions. Is
> >> the hope here that rustc/llvm will auto-vectorize the code?
> >>
> >> #[function("gcd(int, int) -> int")]
> >> fn gcd(mut a: i32, mut b: i32) -> i32 {
> >>     while b != 0 {
> >>         (a, b) = (b, a % b);
> >>     }
> >>     a
> >> }
> >>
> >> #[function("div(int, int) -> int")]
> >> fn div(x: i32, y: i32) -> Result<i32, &'static str> {
> >>     if y == 0 {
> >>         return Err("division by zero");
> >>     }
> >>     Ok(x / y)
> >> }
> >>
> >> > I left some additional comments on the markdown.
> >> >
> >> > One thing that might be worth doing is articulate some other potential
> >> > locations for where the code might go. One option, as I think you
> propose,
> >> > is to make its own repository.  Another option could be to donate the
> code
> >> > and put the various language bindings in the same repo as the arrow
> >> > language implementations (e.g arrow-rs, arrow for python, etc) which
> would
> >> > likely make it easier to maintain and discover.
> >> >
> >> > I am curious about what other devs / users feel about this?
> >> >
> >> > Andrew
> >> >
> >> >
> >> >
> >> > On Thu, Jun 20, 2024 at 3:04 AM Xuanwo  wrote:
> >> >
> >> > > Hello, e

Re: [DISCUSS] Donation of a User-Defined Function Framework for Apache Arrow

2024-06-28 Thread Andrew Lamb
Hi Xuanwo,

Sorry for the delay in responding. I think  the ability to easily write
functions that "feel" like native functions in whatever language and be
able to generate arrow / vectorized versions of them is quite valuable.
This is my understanding of what this proposal is about.

I left some additional comments on the markdown.

One thing that might be worth doing is articulate some other potential
locations for where the code might go. One option, as I think you propose,
is to make its own repository.  Another option could be to donate the code
and put the various language bindings in the same repo as the arrow
language implementations (e.g arrow-rs, arrow for python, etc) which would
likely make it easier to maintain and discover.

I am curious about what other devs / users feel about this?

Andrew



On Thu, Jun 20, 2024 at 3:04 AM Xuanwo  wrote:

> Hello, everyone.
>
> I am starting this thread to discuss the donation of a User-Defined Function
> Framework for Apache Arrow.
>
> Feel free to review and leave your comments here. For live review, please
> visit:
>
> https://hackmd.io/@xuanwo/apache-arrow-udf
>
> The original content is also pasted here for quick reading:
>
> --
>
> ## Abstract
>
> Arrow UDF is a User-Defined Function Framework for Apache Arrow.
>
> ## Proposal
>
> Arrow UDF allows users to easily create and run user-defined functions
> (UDFs) in Rust, Python, Java or JavaScript based on Apache Arrow. The
> functions can be executed natively, in WebAssembly, or on a remote
> server via Arrow Flight.
>
> Arrow UDF was originally designed to be used by the RisingWave project but
> is now being used by Databend and several database startups.
>
> We believe that the Arrow UDF project will add diversity and value to the
> entire Arrow community.
>
> ## Background
>
> Arrow UDF has been developed by an open-source community from day one and
> is owned by RisingWaveLabs. The project was launched in December 2023.
>
> ## Initial Goals
>
> By transferring ownership of the project to Apache Arrow, Arrow UDF
> expects to ensure its neutrality and further encourage and facilitate its
> adoption by the community.
>
> ## Current Status
>
> Contributors: 5
>
> Users:
>
> -   [RisingWave]: A Distributed SQL Database for Stream Processing.
> -   [Databend]: An open-source cloud data warehouse that serves as a
> cost-effective alternative to Snowflake.
>
> ## Documentation
>
> The documentation for Arrow UDF is hosted at
> https://docs.rs/arrow-udf/latest/arrow_udf/.
>
> ## Initial Source
>
> The project currently holds a GitHub repository and multiple packages:
>
> - https://github.com/risingwavelabs/arrow-udf
>
> Rust:
>
> - https://crates.io/arrow-udf/
> - https://crates.io/arrow-udf-python/
> - https://crates.io/arrow-udf-js/
> - https://crates.io/arrow-udf-js-deno/
> - https://crates.io/arrow-udf-wasm/
>
> Python:
>
> - https://pypi.org/project/arrow-udf/
>
> These packages will retain their names, while the repository will be moved
> to the apache org.
>
> ## Required Resources
>
> ### Mailing Lists
>
> We can reuse the existing mailing lists that Arrow has.
>
> ### Git Repositories
>
> From
>
> - https://github.com/risingwavelabs/arrow-udf
>
> To
>
> - https://gitbox.apache.org/asf/repos/arrow-udf
> - https://github.com/apache/arrow-udf
>
> ### Issue Tracking
>
> The project would like to continue using GitHub Issues.
>
> ### Other Resources
>
> The project has already chosen GitHub Actions as its continuous integration
> tool.
>
> ## Initial Committers
>
> - Runji Wang wangrunji0...@163.com
> - Giovanny Gutiérrez
> - sundy-li sund...@apache.org
> - Xuanwo xua...@apache.org
> - Max Justus Spransy maxjus...@gmail.com
>
> [RisingWave]: https://github.com/risingwavelabs/risingwave
> [Databend]: https://github.com/datafuselabs/databend
>
> Xuanwo
>


Re: [DISCUSS][C++] How about adding arrow::ArrayStatistics?

2024-06-11 Thread Andrew Lamb
> Interesting. A separate but related discussion[10] will use
> a record batch or a map array for statistics and it includes
> multiple statistic items. But arrow-rs (DataFusion) uses an
> Arrow array per statistic item.

The reason for using an array per item (rather than a RecordBatch) is so
the statistics can be extracted only when needed. For example, if only mins
are needed, there is no reason to also extract maxes and null counts. I don't
know how important this is in practice.
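
A rough sketch of the "one array per statistic, extracted on demand" shape, in
plain Rust with no Arrow or Parquet dependency. `RowGroupStats` and
`StatisticsExtractor` are hypothetical names used only for illustration, and
`Vec<Option<_>>` stands in for an Arrow array with nulls; this is not the
DataFusion or arrow-rs API.

```rust
/// Hypothetical, simplified per-row-group statistics for one i64 column.
/// `None` means the writer did not record that statistic.
#[derive(Debug, Clone)]
struct RowGroupStats {
    min: Option<i64>,
    max: Option<i64>,
    null_count: Option<u64>,
}

/// Hypothetical extractor: each statistic is materialized as its own
/// column-like vector (one entry per row group), only when requested.
struct StatisticsExtractor<'a> {
    row_groups: &'a [RowGroupStats],
}

impl<'a> StatisticsExtractor<'a> {
    /// One minimum per row group.
    fn mins(&self) -> Vec<Option<i64>> {
        self.row_groups.iter().map(|rg| rg.min).collect()
    }

    /// One maximum per row group.
    fn maxes(&self) -> Vec<Option<i64>> {
        self.row_groups.iter().map(|rg| rg.max).collect()
    }

    /// One null count per row group.
    fn null_counts(&self) -> Vec<Option<u64>> {
        self.row_groups.iter().map(|rg| rg.null_count).collect()
    }
}
```

A caller that only needs mins calls `mins()` and never pays for building the
other two vectors, which is the trade-off against returning everything in a
single batch.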

> It seems that datafusion::common::Statistics[2] (this is a
> higher level statistics object, right?) doesn't use Arrow
> arrays.

That is correct

> When are extracted Parquet statistics converted to
> datafusion::common::Statistics from Arrow arrays in
> DataFusion? (Is it WIP?)

It is currently done in [1], though I expect this code will improve over
time

Andrew

[1]:
https://github.com/apache/datafusion/blob/47026a2a3dd41a5c87e44ade58d91a89feba147b/datafusion/core/src/datasource/file_format/parquet.rs#L458-L536



On Tue, Jun 11, 2024 at 3:49 AM Sutou Kouhei  wrote:

> Hi,
>
> Thanks for sharing arrow-rs related information!
>
> > 2. Code to extract parquet statistics as Arrow arrays[3] (this is a WIP
> but
> > I plan to propose upstreaming[4] to arrow-rs when complete)
>
> Interesting. A separate but related discussion[10] will use
> a record batch or a map array for statistics and it includes
> multiple statistic items. But arrow-rs (DataFusion) uses an
> Arrow array per statistic item.
>
> It seems that datafusion::common::Statistics[2] (this is a
> higher level statistics object, right?) doesn't use Arrow
> arrays. When are extracted Parquet statistics converted to
> datafusion::common::Statistics from Arrow arrays in
> DataFusion? (Is it WIP?)
>
>
> [10] https://lists.apache.org/thread/z0jz2bnv61j7c6lbk7lympdrs49f69cx
>
>
> Thanks,
> --
> kou
>
> In 
>   "Re: [DISCUSS][C++] How about adding arrow::ArrayStatistics?" on Mon, 10
> Jun 2024 11:26:23 -0400,
>   Andrew Lamb  wrote:
>
> >> (Does arrow-rs compute statistics from in-memory Arrow array?)
> >
> > Not really, though there are kernels[1] to do so for some types
> >
> > We have two related concepts in the Rust ecosystem:
> > 1. Full on statistics in DataFusion [2] (though no great way at the
> moment)
> > 2. Code to extract parquet statistics as Arrow arrays[3] (this is a WIP
> but
> > I plan to propose upstreaming[4] to arrow-rs when complete)
> >
> > I think that code to extract statistics from Parquet files as arrow
> > arrays is a very important feature (and lets query engines do row group
> > and data page pruning).
> >
> > The value of a  higher level Statistics object is a little less clear to
> me
> > -- query engines end up with all sorts of complicated calculations on
> such
> > objects (like predicate selectivity, NDV estimation, boundary analysis,
> > etc) that finding what level makes sense in arrow might be hard.
> >
> > Andrew
> >
> > [1]: https://docs.rs/arrow/latest/arrow/compute/fn.min.html
> > [2]:
> >
> https://docs.rs/datafusion/latest/datafusion/common/struct.Statistics.html
> > [3]:
> >
> https://github.com/apache/datafusion/blob/e094f94d2a3f23128ce782a20982dbf7ac1ebed2/datafusion/core/src/datasource/physical_plan/parquet/statistics.rs#L579
> > [4]: https://github.com/apache/arrow-rs/issues/4328
> >
> > On Sun, Jun 9, 2024 at 3:40 AM Sutou Kouhei  wrote:
> >
> >> Hi,
> >>
> >> Thanks for your comment.
> >>
> >> You may have misunderstood my motivation.
> >>
> >> This proposal doesn't change the Apache Arrow columnar
> >> format. For example, this proposal doesn't save statistics
> >> read from an Apache Parquet file to an Apache Arrow IPC file. This
> >> proposal just attaches statistics read from an Apache Parquet
> >> file to in-memory arrow::Array C++ objects. It is just for
> >> ease of use of in-memory arrow::Array C++ objects.
> >>
> >> This proposal doesn't compute statistics from in-memory
> >> arrow::Array C++ objects. (We may want to do it later but
> >> this proposal doesn't propose it.)
> >>
> >> (Does arrow-rs compute statistics from in-memory Arrow
> >> array?)
> >>
> >>
> >> Thanks,
> >> --
> >> kou
> >>
> >> In 
> >>   "Re: [DISCUSS][C++] How about adding arrow::ArrayStatistics?" on Thu,
> 6
> >> Jun 2024 08:13:11 +0200,
> >>   Jorge Cardoso Leitão  wrote:
> >>
> >> > Hi
> >> >
> >> > This is c++ specific, but imo the questio

Re: [DISCUSS][C++] How about adding arrow::ArrayStatistics?

2024-06-10 Thread Andrew Lamb
>  It could be useful to quantify how much is being saved vs how much
complexity is being added to the format + implementations.

Xiangpeng and I are working on a blog post to quantify this overhead in
parquet-rs -- I'll post it here when ready

Andrew

On Thu, Jun 6, 2024 at 2:13 AM Jorge Cardoso Leitão <
jorgecarlei...@gmail.com> wrote:

> Hi
>
> This is c++ specific, but imo the question applies more broadly.
>
> I understood that the rationale for stats in compressed+encoded formats
> like parquet is that computing those stats has a high cost (io + decompress
> + decode + aggregate). This motivates the materialization of aggregates.
>
> In arrow the data is already in an in-memory format (e.g. IPC+mmap, or in
> the heap) and the cost is thus smaller (aggregate).
>
> It could be useful to quantify how much is being saved vs how much
> complexity is being added to the format + implementations.
>
> Best,
> Jorge
>
>
> On Thu, Jun 6, 2024, 07:55 Micah Kornfield  wrote:
>
> > Generally I think this is a good idea that has been proposed before but I
> > don't think we could ever make progress on design.
> >
> > On Sun, Jun 2, 2024 at 7:17 PM Sutou Kouhei  wrote:
> >
> > > Hi,
> > >
> > > Related GitHub issue:
> > > https://github.com/apache/arrow/issues/41909
> > >
> > > How about adding arrow::ArrayStatistics?
> > >
> > > Motivation:
> > >
> > > Data in the Apache Arrow format doesn't carry statistics. (We can
> > > add statistics as metadata but there isn't any standard way
> > > to do it.)
> > >
> > > But a source of Apache Arrow format data, such as Apache
> > > Parquet format data, may have statistics. We can get the
> > > source statistics via a source reader such as
> > > parquet::ColumnChunkMetaData::statistics() but can't get
> > > them from the Apache Arrow format data that was read. If we
> > > want to use the source statistics, we need to keep the source reader.
> > >
> > > Proposal:
> > >
> > > How about adding arrow::ArrayStatistics or something similar and
> > > attaching source statistics to the arrow::Array that was read? If
> > > source statistics are attached to the arrow::Array, we don't need
> > > to keep a source reader to get source statistics.
> > >
> > > What do you think about this idea?
> > >
> > >
> > > NOTE: I haven't thought about the arrow::ArrayStatistics
> > > details yet. We'll be able to use parquet::Statistics and
> > > its family as a reference.
> > > https://github.com/apache/arrow/blob/main/cpp/src/parquet/statistics.h
> > >
> > >
> > > Thanks,
> > > --
> > > kou
> > >
> >
>


Re: [DISCUSS][C++] How about adding arrow::ArrayStatistics?

2024-06-10 Thread Andrew Lamb
> (Does arrow-rs compute statistics from in-memory Arrow array?)

Not really, though there are kernels[1] to do so for some types

We have two related concepts in the Rust ecosystem:
1. Full on statistics in DataFusion [2] (though no great way at the moment)
2. Code to extract parquet statistics as Arrow arrays[3] (this is a WIP but
I plan to propose upstreaming[4] to arrow-rs when complete)

I think that code to extract statistics from Parquet files as arrow arrays
is a very important feature (and lets query engines do row group and data
page pruning).
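
As a small illustration of why per-row-group min/max arrays enable pruning,
here is a sketch in plain Rust for an equality predicate (`column = value`).
The function name and the `Option<i64>` representation are assumptions made
for the example, not an arrow-rs or DataFusion API.

```rust
/// For each row group, decide whether it may contain `value`, given one
/// min and one max statistic per row group (slices assumed equal length).
/// A missing statistic (`None`) cannot rule anything out, so that row
/// group is kept.
fn row_groups_to_scan(mins: &[Option<i64>], maxes: &[Option<i64>], value: i64) -> Vec<bool> {
    mins.iter()
        .zip(maxes)
        .map(|(min, max)| {
            // The row group can be skipped only if the statistics prove
            // that `value` lies outside [min, max].
            let below_min = matches!(min, Some(min) if value < *min);
            let above_max = matches!(max, Some(max) if value > *max);
            !(below_min || above_max)
        })
        .collect()
}
```

With mins `[0, 10, unknown]` and maxes `[5, 20, unknown]`, a lookup for 12
only needs to read the second and third row groups.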

The value of a  higher level Statistics object is a little less clear to me
-- query engines end up with all sorts of complicated calculations on such
objects (like predicate selectivity, NDV estimation, boundary analysis,
etc) that finding what level makes sense in arrow might be hard.

Andrew

[1]: https://docs.rs/arrow/latest/arrow/compute/fn.min.html
[2]:
https://docs.rs/datafusion/latest/datafusion/common/struct.Statistics.html
[3]:
https://github.com/apache/datafusion/blob/e094f94d2a3f23128ce782a20982dbf7ac1ebed2/datafusion/core/src/datasource/physical_plan/parquet/statistics.rs#L579
[4]: https://github.com/apache/arrow-rs/issues/4328

On Sun, Jun 9, 2024 at 3:40 AM Sutou Kouhei  wrote:

> Hi,
>
> Thanks for your comment.
>
> You may have misunderstood my motivation.
>
> This proposal doesn't change the Apache Arrow columnar
> format. For example, this proposal doesn't save statistics
> read from an Apache Parquet file to an Apache Arrow IPC file. This
> proposal just attaches statistics read from an Apache Parquet
> file to in-memory arrow::Array C++ objects. It is just for
> ease of use of in-memory arrow::Array C++ objects.
>
> This proposal doesn't compute statistics from in-memory
> arrow::Array C++ objects. (We may want to do it later but
> this proposal doesn't propose it.)
>
> (Does arrow-rs compute statistics from in-memory Arrow
> array?)
>
>
> Thanks,
> --
> kou
>
> In 
>   "Re: [DISCUSS][C++] How about adding arrow::ArrayStatistics?" on Thu, 6
> Jun 2024 08:13:11 +0200,
>   Jorge Cardoso Leitão  wrote:
>
> > Hi
> >
> > This is c++ specific, but imo the question applies more broadly.
> >
> > I understood that the rationale for stats in compressed+encoded formats
> > like parquet is that computing those stats has a high cost (io +
> decompress
> > + decode + aggregate). This motivates the materialization of aggregates.
> >
> > In arrow the data is already in an in-memory format (e.g. IPC+mmap, or in
> > the heap) and the cost is thus smaller (aggregate).
> >
> > It could be useful to quantify how much is being saved vs how much
> > complexity is being added to the format + implementations.
> >
> > Best,
> > Jorge
> >
> >
> > On Thu, Jun 6, 2024, 07:55 Micah Kornfield 
> wrote:
> >
> >> Generally I think this is a good idea that has been proposed before but
> I
> >> don't think we could ever make progress on design.
> >>
> >> On Sun, Jun 2, 2024 at 7:17 PM Sutou Kouhei  wrote:
> >>
> >> > Hi,
> >> >
> >> > Related GitHub issue:
> >> > https://github.com/apache/arrow/issues/41909
> >> >
> >> > How about adding arrow::ArrayStatistics?
> >> >
> >> > Motivation:
> >> >
> >> > Data in the Apache Arrow format doesn't carry statistics. (We can
> >> > add statistics as metadata but there isn't any standard way
> >> > to do it.)
> >> >
> >> > But a source of Apache Arrow format data, such as Apache
> >> > Parquet format data, may have statistics. We can get the
> >> > source statistics via a source reader such as
> >> > parquet::ColumnChunkMetaData::statistics() but can't get
> >> > them from the Apache Arrow format data that was read. If we
> >> > want to use the source statistics, we need to keep the source reader.
> >> >
> >> > Proposal:
> >> >
> >> > How about adding arrow::ArrayStatistics or something similar and
> >> > attaching source statistics to the arrow::Array that was read? If
> >> > source statistics are attached to the arrow::Array, we don't need
> >> > to keep a source reader to get source statistics.
> >> >
> >> > What do you think about this idea?
> >> >
> >> >
> >> > NOTE: I haven't thought about the arrow::ArrayStatistics
> >> > details yet. We'll be able to use parquet::Statistics and
> >> > its family as a reference.
> >> >
> https://github.com/apache/arrow/blob/main/cpp/src/parquet/statistics.h
> >> >
> >> >
> >> > Thanks,
> >> > --
> >> > kou
> >> >
> >>
>


Re: [VOTE][RUST] Release Apache Arrow Rust 52.0.0 RC1

2024-06-04 Thread Andrew Lamb
+1 (binding)

Verified on M3 mac.

Thank you - this one is a good one.

Andrew

On Mon, Jun 3, 2024 at 12:18 PM Andy Grove  wrote:

> +1 (binding)
>
> Verified on Ubuntu 22.04.4 LTS.
>
> Thanks, Raphael.
>
> On Mon, Jun 3, 2024 at 10:12 AM L. C. Hsieh  wrote:
>
> > +1 (binding)
> >
> > Verified on M3 Mac.
> >
> > Thanks Raphael.
> >
> > On Mon, Jun 3, 2024 at 9:04 AM Raphael Taylor-Davies
> >  wrote:
> > >
> > > Hi,
> > >
> > > I would like to propose a release of Apache Arrow Rust Implementation,
> > > version 52.0.0.
> > >
> > > This release candidate is based on commit:
> > > f42218ae5d9c9f0b9ea3365f2b1e6025a43b8c71 [1]
> > >
> > > The proposed release tarball and signatures are hosted at [2].
> > >
> > > The changelog is located at [3].
> > >
> > > Please download, verify checksums and signatures, run the unit tests,
> > > and vote on the release. There is a script [4] that automates some of
> > > the verification.
> > >
> > > The vote will be open for at least 72 hours.
> > >
> > > [ ] +1 Release this as Apache Arrow Rust
> > > [ ] +0
> > > [ ] -1 Do not release this as Apache Arrow Rust  because...
> > >
> > > I vote +1 (binding) on this release
> > >
> > > [1]:
> > >
> >
> https://github.com/apache/arrow-rs/tree/f42218ae5d9c9f0b9ea3365f2b1e6025a43b8c71
> > > [2]:
> > https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-rs-52.0.0-rc1
> > > [3]:
> > >
> >
> https://github.com/apache/arrow-rs/blob/f42218ae5d9c9f0b9ea3365f2b1e6025a43b8c71/CHANGELOG.md
> > > [4]:
> > >
> >
> https://github.com/apache/arrow-rs/blob/master/dev/release/verify-release-candidate.sh
> > >
> >
>


Re: [DISCUSS] Move sqlparser-rs back into DataFusion project?

2024-05-30 Thread Andrew Lamb
I started a discussion about moving sqlparser into Apache Software
Foundation governance[1].

Please provide any comments you may have there

[1]: https://github.com/sqlparser-rs/sqlparser-rs/issues/1294

On Thu, Feb 29, 2024 at 5:02 PM Andy Grove  wrote:

> I will put this proposal on hold for now and restart the conversation later
> this year once DataFusion is a top-level ASF project.
>
> Thanks again for all the feedback.
>
> Andy.
>
> On Wed, Feb 28, 2024 at 9:58 AM Andy Grove  wrote:
>
> > Thanks for all the feedback so far.
> >
> > It does seem that the least contentious way to do this would be to follow
> > Andrew's suggestion of having a separate
> > apache/[arrow-]datafusion-sqlparser repository as this will ensure that
> we
> > do not end up adding any DataFusion dependencies to the sqlparser
> project,
> > and that it continues to have its own release process.
> >
> > The main benefit here is that it would bring it under ASF governance and
> > allow those who have permission from their employers to contribute to
> > Apache Arrow/DataFusion to be able to help with the maintenance burden.
> >
> > Andy.
> >
> >
> >
> > On Wed, Feb 28, 2024 at 4:28 AM Andrew Lamb 
> wrote:
> >
> >> One potential way "moving sqlparser-rs into DataFusion" could look is
> that
> >> code/repo is moved from the sqlparser-rs [1] organization to the apache
> >> organization. For example
> >>
> >> https://github.com/sqlparser-rs/sqlparser-rs
> >> to
> >> https://github.com/apache/datafusion-sqlparser
> >>
> >> We could continue development separately from any other code, release it
> >> as
> >> a separate artifact, but use the same overarching governance structure
> >> (voting on releases, committer access, etc)
> >>
> >> To follow this model, I think the largest work item would be to run the
> IP
> >> clearance process, and since sqlparser-rs has many distinct contributors
> >> that may take a while
> >>
> >> Andrew
> >>
> >>
> >>
> >> On Wed, Feb 28, 2024 at 1:45 AM Aldrin 
> >> wrote:
> >>
> >> > Maybe it would be valuable to more explicitly define "moving back into
> >> > DataFusion project".
> >> >
> >> > I assumed it meant absorbing into the datafusion repo, but it occurs
> to
> >> me
> >> > that may not be the case. Then, how would sqlparser-rs be "moved"?
> >> >
> >> >
> >> >
> >> > # --
> >> > # Aldrin
> >> >
> >> >
> >> > https://github.com/drin/
> >> > https://gitlab.com/octalene
> >> > https://keybase.io/octalene
> >> >
> >> >
> >> > On Tuesday, February 27th, 2024 at 16:20, Chak-Pong Chung <
> >> > chakpongch...@gmail.com> wrote:
> >> >
> >> > > There are cases where people need datafusion but not a SQL parser.
> For
> >> > > example, people building a composable query engine for graph or
> other
> >> > data
> >> > > modality may not choose SQL as the DSL. Decoupling them seems to be
> a
> >> > good
> >> > > idea.
> >> > >
> >> >
> >> > > On Tue, Feb 27, 2024, 6:20 AM Mehmet Ozan Kabak o...@synnada.ai
> >> wrote:
> >> > >
> >> >
> >> > > > In this case, maybe we can bring sqlparser-rs into the ASF
> umbrella
> >> > > > following the arrow-datafusion model?
> >> > > >
> >> >
> >> > > > Once DataFusion becomes a top-level project, we could move it to
> >> > > > datafusion-sqlparser-rs — it would be a quasi-independent project
> >> just
> >> > like
> >> > > > how DataFusion is today w.r.t. Arrow. But it would get most
> >> benefits of
> >> > > > having a community behind it.
> >> > > >
> >> >
> >> > > > > On Feb 27, 2024, at 2:11 AM, Andrew Lamb al...@influxdata.com
> >> wrote:
> >> > > > >
> >> >
> >> > > > > Julian, thank you for your insight. I very much agree with it.
> >> > > > >
> >> >
> >> > > > > > I think the ASF is wrong on this. I think it needs to provide
> a
> >> > home
> >> > > > > &g

Re: [DISCUSS] Migration of parquet-cpp issues to GitHub

2024-05-28 Thread Andrew Lamb
I think it is a great idea -- github has served arrow (and datafusion) very
well in my opinion.

Specifically, having to sign up for a JIRA account (which cannot be
created self-service) adds a small but real barrier to engagement and
contribution.

I think removing the barrier encourages more contributions, especially
casual contributions.

Andrew

On Tue, May 28, 2024 at 7:40 PM Rok Mihevc  wrote:

> Hi all,
>
> I'd like to re-raise the idea of migrating parquet-cpp issues from
> Parquet's Jira to Arrow's GitHub issue tracker. Arrow migrated in January
> 2023 [1]. The migration was relatively smooth and the experience since
> seems to be positive.
>
> The reasons we would want to migrate parquet-cpp issues are:
> - Issues of parquet-cpp are effectively already tracked on Github [2] (220
> open, last 20h ago), while Jira [3] is less active (55 open).
> - Arrow's release process could be simplified if Jira was not in the
> release workflow [4], reducing workload of release manager
>
> A reason against this would be that split issue tracking of parquet-java
> and parquet-cpp doesn't help with feature parity of implementations.
>
> Migration was already discussed to some degree in the "[C++] Parquet and
> Arrow overlap" thread [5], but no clear consensus or vote was reached. If
> we can reach an agreement I would proceed and migrate parquet-cpp issues in
> one of the coming weekends.
>
> Additionally, we could migrate other parquet issues with relatively little
> additional effort and I'd be willing to do it if there is interest from
> the community.
>
> Rok
>
> [1] https://github.com/apache/arrow/issues/14546
> [2]
>
> https://github.com/apache/arrow/issues?q=is%3Aissue+label%3A%22Component%3A+Parquet%22+
> [3]
>
> https://issues.apache.org/jira/browse/PARQUET-2200?jql=project%20%3D%20PARQUET%20AND%20status%20%3D%20Open%20AND%20component%20%3D%20parquet-cpp%20ORDER%20BY%20created%20DESC%2C%20due%20DESC
> [4]
>
> https://arrow.apache.org/docs/developers/release.html#create-or-update-the-corresponding-maintenance-branch
> [5] https://lists.apache.org/thread/jf9wos3t6xxk6xdyx2dof1jlkbpkr56p
>


Re: [VOTE][RUST] Release Apache Arrow Rust Object Store 0.10.1 RC1

2024-05-11 Thread Andrew Lamb
+1 (binding)

Thank you for the quick release and response

Verified on M3 Mac



On Fri, May 10, 2024 at 2:00 PM L. C. Hsieh  wrote:

> +1 (binding)
>
> Verified on M3 Mac.
>
> Thanks Raphael.
>
> On Fri, May 10, 2024 at 10:31 AM Raphael Taylor-Davies
>  wrote:
> >
> > Hi,
> >
> > I would like to propose a release of Apache Arrow Rust Object
> > Store Implementation, version 0.10.1.
> >
> > This is primarily motivated by a major bug introduced by 0.10.0 [1]
> >
> > This release candidate is based on commit:
> > 3d3ddb2108502854da98654ada85364d5627ef21 [2]
> >
> > The proposed release tarball and signatures are hosted at [3].
> >
> > The changelog is located at [4]
> >
> > Please download, verify checksums and signatures, run the unit tests,
> > and vote on the release. There is a script [5] that automates some of
> > the verification.
> >
> > The vote will be open for at least 72 hours.
> >
> > [ ] +1 Release this as Apache Arrow Rust Object Store
> > [ ] +0
> > [ ] -1 Do not release this as Apache Arrow Rust Object Store because...
> >
> > [1]: https://github.com/apache/arrow-rs/issues/5743
> > [2]:
> >
> https://github.com/apache/arrow-rs/tree/3d3ddb2108502854da98654ada85364d5627ef21
> > [3]:
> >
> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-object-store-rs-0.10.1-rc1
> > [4]:
> >
> https://github.com/apache/arrow-rs/blob/3d3ddb2108502854da98654ada85364d5627ef21/object_store/CHANGELOG.md
> > [5]:
> >
> https://github.com/apache/arrow-rs/blob/master/object_store/dev/release/verify-release-candidate.sh
> >
>


Re: Fwd: [C++] Parquet and Arrow overlap

2024-05-11 Thread Andrew Lamb
It is great to see some additional enthusiasm and momentum around the
Apache Parquet implementation (congratulations on the release of parquet-mr
1.14[1]!).

As activity picks up, if the desire is to build more community around
Parquet, perhaps the Parquet PMC wants to encourage moving code back to
repositories managed by parquet (and out of arrow, for example). I realize
this would be a technical burden, but it might clarify communities and
committers.

Andrew

[1]: https://lists.apache.org/thread/2gggm938z0x9fx3wtwctfm5htsxlf3z4



On Fri, May 10, 2024 at 11:45 PM Matt Topol  wrote:

> I just wanted to also poke the question of non-Java developers who have
> worked on the other parquet implementations potentially being recognized as
> committers or otherwise on the Parquet project (speaking as the primary
> developer of the Go parquet implementation which also lives in the Arrow
> repository). It would be great to see some active contributors to
> parquet-cpp, parquet-go, or otherwise not just being considered but
> actively becoming committers.
>
> That's just my two cents from a community perspective.
>
> --Matt
>
> On Fri, May 10, 2024, 10:35 PM Jacob Wujciak 
> wrote:
>
> > Thank you, that sounds great! On first glance some seem to be rather old
> > and probably don't apply anymore.
> >
> > > BTW, do we really need to make a full copy of them to have a mirror in
> > the Arrow GitHub issues?
> >
> > I am not sure I understand what you mean? A full copy of the
> > open/closed/all issues? I'd say only the (remaining) open issues would be
> > fine.
> > For reference this is what an imported issue looks like:
> > https://github.com/apache/arrow/issues/30543
> >
> > Am Sa., 11. Mai 2024 um 04:09 Uhr schrieb Gang Wu :
> >
> > > I can initiate the vote. But before the vote, I think we need to
> revisit
> > > the states of all unresolved tickets and close some as needed.
> > >
> > > BTW, do we really need to make a full copy of them to have a mirror
> > > in the Arrow GitHub issues?
> > >
> > > I'd like to seek a consensus here before sending the vote.
> > >
> > > Best,
> > > Gang
> > >
> > > On Sat, May 11, 2024 at 8:46 AM Jacob Wujciak 
> > > wrote:
> > >
> > > > Hello Everyone!
> > > >
> > > > It seems there is general agreement on this topic, it would be great
> > if a
> > > > committer/PMC could start a (lazy consensus) procedural vote.
> > > >
> > > > I will inquire how to handle the parquet-cpp component in jira
> (ideally
> > > > disabling it, not removing).
> > > > There are currently only ~70 open tickets for parquet-cpp, with the
> > > change
> > > > in repo it is probably easier to just move open tickets but I'll
> leave
> > > that
> > > > to Rok who managed the transition of Arrows 20k+ tickets too :D
> > > >
> > > > Thanks,
> > > > Jacob
> > > >
> > > > Arrow committer
> > > >
> > > > On 2024/04/25 05:31:18 Gang Wu wrote:
> > > > > I know we have some non-Java committers and PMC members. But since the
> > > > > parquet-cpp donation, it seems that no one working on Parquet in Arrow
> > > > > (cpp, rust, go, etc.) or in other projects has been promoted to Parquet
> > > > > committer. It is inconvenient for non-Java Parquet developers to work
> > > > > with the apache/parquet-format and apache/parquet-testing repositories.
> > > > > Furthermore, votes from these developers are not binding for a format
> > > > > change on the ML.
> > > > >
> > > > > Best,
> > > > > Gang
> > > > >
> > > > > On Wed, Apr 24, 2024 at 8:42 PM Uwe L. Korn 
> > wrote:
> > > > >
> > > > > > > Should we consider
> > > > > > > Parquet developers from other projects than parquet-mr as
> Parquet
> > > > > > > committers?
> > > > > >
> > > > > > We are doing this (speaking as a Parquet PMC who didn't work on
> > > > > > parquet-mr, but parquet-cpp).
> > > > > >
> > > > > > Best
> > > > > > Uwe
> > > > > >
> > > > > > On Wed, Apr 24, 2024, at 2:38 PM, Gang Wu wrote:
> > > > > > > +1 for moving parquet-cpp issues from Apache Jira to Arrow's
> > GitHub
> > > > > > issue.
> > > > > > >
> > > > > > > Besides, I want to echo Will's question in the thread. Should
> we
> > > > consider
> > > > > > > Parquet developers from other projects than parquet-mr as
> Parquet
> > > > > > > > committers?
> > > > > > > Currently apache/parquet-format and apache/parquet-testing
> > > > repositories
> > > > > > are
> > > > > > > solely governed by Apache Parquet PMC. It would be better for
> the
> > > > entire
> > > > > > > Parquet community if developers with sufficient contributions
> to
> > > open
> > > > > > source
> > > > > > > Parquet projects (including but not limited to parquet-cpp,
> > > arrow-rs,
> > > > > > cudf,
> > > > > > > etc.)
> > > > > > > > can be considered as Parquet committers and PMC members.
> > > > > > >
> > > > > > > Best,
> > > > > > > Gang
> > > > > > >
> > > > > > > On Wed, Apr 24, 2024 at 7:04 PM Uwe L. Korn 
> > > > wrote:
> > > > > > >
> > > > > > >> I would be very supportive of this move. The 

[CROWDSOURCING] May 2024 ASF Board Report

2024-04-30 Thread Andrew Lamb
As part of being a new project, we need to submit reports to the board
every month for the first three months[1].

In the tradition of Apache Arrow, I hope the community can help draft this
report. Please take a look and add anything that might be relevant[2].

Thanks,
Andrew

[1]: https://github.com/apache/datafusion/issues/10281
[2]:
https://docs.google.com/document/d/1knyR2epIOY7WoXZO_DOtlcPNSenb3-V-osCHqPXqSms/edit


[RESULT] [VOTE][RUST][DataFusion] Release Apache Arrow DataFusion 37.1.0 RC2

2024-04-22 Thread Andrew Lamb
With 4 +1 votes (3 binding) the release is approved.

Thank you everyone who helped with this release (likely to be the last on
the arrow list).

Success! The release is available here:
  https://dist.apache.org/repos/dist/release/arrow/arrow-datafusion-37.1.0

It has also been released to crates.io:
https://crates.io/crates/datafusion/37.1.0

Andrew




On Sat, Apr 20, 2024 at 8:21 AM Wayne Xia  wrote:

> +1 (non-binding)
>
> Verified on AMD64 Arch Linux
>
> Thanks, Andrew.
>
> On Sat, Apr 20, 2024 at 8:32 AM Andy Grove  wrote:
>
> > +1 (binding)
> >
> > Verified on Ubuntu 22.04.4 LTS.
> >
> > Thanks, Andrew.
> >
> > On Thu, Apr 18, 2024 at 8:04 PM L. C. Hsieh  wrote:
> >
> > > +1 (binding)
> > >
> > > Verified on M3 Mac.
> > >
> > > Thanks Andrew.
> > >
> > > On Thu, Apr 18, 2024 at 2:20 PM Andrew Lamb 
> > wrote:
> > > >
> > > > I would like to propose a release of Apache Arrow DataFusion
> > > Implementation,
> > > > version 37.1.0.
> > > >
> > > > Note this is the second RC (the first RC[4] did not include the
> change
> > to
> > > > the version numbers[5] :facepalm:). I apologize for the runaround.
> > > >
> > > >
> > > > This release candidate is based on commit:
> > > > aee976aa1a75514c7dbb33ef47527b3ba99081dd [1]
> > > > The proposed release tarball and signatures are hosted at [2].
> > > > The changelog is located at [3].
> > > >
> > > > Please download, verify checksums and signatures, run the unit tests,
> > and
> > > > vote
> > > > on the release. The vote will be open for at least 72 hours.
> > > >
> > > > Only votes from PMC members are binding, but all members of the
> > community
> > > > are
> > > > encouraged to test the release and vote with "(non-binding)".
> > > >
> > > > The standard verification procedure is documented at
> > > >
> > >
> >
> https://github.com/apache/arrow-datafusion/blob/main/dev/release/README.md#verifying-release-candidates
> > > > .
> > > >
> > > > [ ] +1 Release this as Apache Arrow DataFusion 37.1.0
> > > > [ ] +0
> > > > [ ] -1 Do not release this as Apache Arrow DataFusion 37.1.0
> because...
> > > >
> > > > Here is my vote:
> > > >
> > > > +1
> > > >
> > > > [1]:
> > > >
> > >
> >
> https://github.com/apache/arrow-datafusion/tree/aee976aa1a75514c7dbb33ef47527b3ba99081dd
> > > > [2]:
> > > >
> > >
> >
> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-37.1.0-rc2
> > > > [3]:
> > > >
> > >
> >
> https://github.com/apache/arrow-datafusion/blob/aee976aa1a75514c7dbb33ef47527b3ba99081dd/CHANGELOG.md
> > > > [4]:
> https://lists.apache.org/thread/33bkbrlkqv962y0topx9rlqg19g5q2vk
> > > > [5]: https://github.com/apache/arrow-datafusion/pull/10128
> > >
> >
>


Re: [VOTE][RUST][DataFusion] Release Apache Arrow DataFusion 37.1.0 RC1

2024-04-18 Thread Andrew Lamb
Unfortunately it appears I made an error and forgot to update the release
version in this RC.

I have started a new thread[1] for a second RC.

Andrew

[1] https://lists.apache.org/thread/0md6qyhw0hody8p0v9wddvt7vo8r8z2x

On Thu, Apr 18, 2024 at 2:23 PM Andrew Lamb  wrote:

> Thanks Andy, Jake and L.C.
>
> > Note that I still had to set RUST_MIN_STACK to avoid a stack overflow. I
> > don't know if that is still expected.
>
> Yes, I think that is unfortunately still expected and is consistent with
> the 37.0.0 release[1] (we didn't change anything in 37.1.0 RC1 that would
> alter this behavior).
>
> However, I am quite pleased that the overflow *is* fixed on main (what
> will become 38.0.0), due to several improvements including [2].
>
> Andrew
>
> [1]: https://lists.apache.org/thread/lwdflob72q7t7mqqxnqcobjzqjbc218o
> [2]: https://github.com/apache/arrow-datafusion/pull/10023
>
> On Thu, Apr 18, 2024 at 11:25 AM L. C. Hsieh  wrote:
>
>> +1 (binding)
>>
>> Verified on M3 Mac. Tests pass with RUST_MIN_STACK set.
>>
>> Thanks Andrew.
>>
>> On Thu, Apr 18, 2024 at 7:51 AM vin jake  wrote:
>> >
>> > +1(binding)
>> >
>> > Thanks alamb for your work and efforts.
>> >
>> >
>> >
>> > On Thu, Apr 18, 2024 at 10:01 PM Andrew Lamb 
>> wrote:
>> >
>> > > Hi,
>> > >
>> > > I would like to propose a release of Apache Arrow DataFusion
>> > > Implementation,
>> > > version 37.1.0, a patch release with some bug fixes. Please see [4]
>> for
>> > > details.
>> > > There is a failing CI test which only affects development tools [6].
>> > >
>> > > While DataFusion is now officially its own top level Apache project,
>> we do
>> > > not yet have enough infrastructure (email lists) setup to do voting
>> > > there[5], so I would like to do this one last time on the arrow list.
>> > >
>> > > This release candidate is based on commit:
>> > > d4eb72c30d45c0f3f359c64f41a6caed30abe750 [1]
>> > > The proposed release tarball and signatures are hosted at [2].
>> > > The changelog is located at [3].
>> > >
>> > > Please download, verify checksums and signatures, run the unit tests,
>> and
>> > > vote
>> > > on the release. The vote will be open for at least 72 hours.
>> > >
>> > > Only votes from PMC members are binding, but all members of the
>> community
>> > > are
>> > > encouraged to test the release and vote with "(non-binding)".
>> > >
>> > > The standard verification procedure is documented at
>> > >
>> > >
>> https://github.com/apache/arrow-datafusion/blob/main/dev/release/README.md#verifying-release-candidates
>> > > .
>> > >
>> > > [ ] +1 Release this as Apache Arrow DataFusion 37.1.0
>> > > [ ] +0
>> > > [ ] -1 Do not release this as Apache Arrow DataFusion 37.1.0
>> because...
>> > >
>> > > Here is my vote:
>> > >
>> > > +1
>> > >
>> > > [1]:
>> > >
>> > >
>> https://github.com/apache/arrow-datafusion/tree/d4eb72c30d45c0f3f359c64f41a6caed30abe750
>> > > [2]:
>> > >
>> > >
>> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-37.1.0-rc1
>> > > [3]:
>> > >
>> > >
>> https://github.com/apache/arrow-datafusion/blob/d4eb72c30d45c0f3f359c64f41a6caed30abe750/CHANGELOG.md
>> > > [4]: https://github.com/apache/arrow-datafusion/issues/9904
>> > > [5]: https://github.com/apache/arrow-datafusion/issues/9691
>> > > [6]:
>> > >
>> > >
>> https://github.com/apache/arrow-datafusion/pull/10128#issuecomment-2063655318
>> > > .
>> > >
>>
>


[VOTE][RUST][DataFusion] Release Apache Arrow DataFusion 37.1.0 RC2

2024-04-18 Thread Andrew Lamb
I would like to propose a release of Apache Arrow DataFusion Implementation,
version 37.1.0.

Note this is the second RC (the first RC[4] did not include the change to
the version numbers[5] :facepalm:). I apologize for the runaround.


This release candidate is based on commit:
aee976aa1a75514c7dbb33ef47527b3ba99081dd [1]
The proposed release tarball and signatures are hosted at [2].
The changelog is located at [3].

Please download, verify checksums and signatures, run the unit tests, and
vote
on the release. The vote will be open for at least 72 hours.
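
For readers new to this step, the checksum half of the verification can be sketched in a few lines of Python. This is only an illustration, not the project's verification script: the helper names and the file names in the usage note are hypothetical, and it assumes the checksum file uses the common `<hex digest>  <file name>` layout. Checking the GPG signature is a separate step that is not covered here.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm="sha512", chunk_size=1 << 20):
    """Return the hex digest of the file at `path`, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large tarballs are never fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(tarball, checksum_file, algorithm="sha512"):
    """Compare the tarball's digest with the first token of the checksum file.

    Assumption: the checksum file looks like "<hex digest>  <file name>".
    """
    expected = Path(checksum_file).read_text().split()[0].lower()
    return file_digest(tarball, algorithm) == expected
```

With a downloaded tarball and its checksum file side by side, a call such as `checksum_matches("release.tar.gz", "release.tar.gz.sha512")` (hypothetical file names) returns True only when the digests agree.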

Only votes from PMC members are binding, but all members of the community
are
encouraged to test the release and vote with "(non-binding)".

The standard verification procedure is documented at
https://github.com/apache/arrow-datafusion/blob/main/dev/release/README.md#verifying-release-candidates
.

[ ] +1 Release this as Apache Arrow DataFusion 37.1.0
[ ] +0
[ ] -1 Do not release this as Apache Arrow DataFusion 37.1.0 because...

Here is my vote:

+1

[1]:
https://github.com/apache/arrow-datafusion/tree/aee976aa1a75514c7dbb33ef47527b3ba99081dd
[2]:
https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-37.1.0-rc2
[3]:
https://github.com/apache/arrow-datafusion/blob/aee976aa1a75514c7dbb33ef47527b3ba99081dd/CHANGELOG.md
[4]: https://lists.apache.org/thread/33bkbrlkqv962y0topx9rlqg19g5q2vk
[5]: https://github.com/apache/arrow-datafusion/pull/10128


Re: [VOTE][RUST][DataFusion] Release Apache Arrow DataFusion 37.1.0 RC1

2024-04-18 Thread Andrew Lamb
Thanks Andy, Jake and L.C.

> Note that I still had to set RUST_MIN_STACK to avoid a stack overflow. I
> don't know if that is still expected.

Yes, I think that is unfortunately still expected and is consistent with
the 37.0.0 release[1] (we didn't change anything in 37.1.0 RC1 that would
alter this behavior).

However, I am quite pleased that the overflow *is* fixed on main (what will
become 38.0.0), due to several improvements including [2].
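
For anyone unfamiliar with the workaround: RUST_MIN_STACK is an environment variable that the Rust standard library reads to set the default stack size, in bytes, of spawned threads, which includes the threads the test harness uses. A minimal Python sketch of running a command with it set follows. The helper names are hypothetical, and the 16 MiB value in the usage note is an arbitrary assumption, not the value used by the verifiers in this thread.

```python
import os
import subprocess


def env_with_min_stack(stack_bytes, base_env=None):
    """Return a copy of the environment with RUST_MIN_STACK set (in bytes)."""
    env = dict(os.environ if base_env is None else base_env)
    env["RUST_MIN_STACK"] = str(stack_bytes)
    return env


def run_with_min_stack(command, stack_bytes):
    """Run `command` (a list of arguments) with RUST_MIN_STACK set.

    Returns the exit code of the child process.
    """
    completed = subprocess.run(
        command, env=env_with_min_stack(stack_bytes), check=False
    )
    return completed.returncode
```

For example, `run_with_min_stack(["cargo", "test"], 16 * 1024 * 1024)` would run the tests with a 16 MiB default thread stack.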

Andrew

[1]: https://lists.apache.org/thread/lwdflob72q7t7mqqxnqcobjzqjbc218o
[2]: https://github.com/apache/arrow-datafusion/pull/10023

On Thu, Apr 18, 2024 at 11:25 AM L. C. Hsieh  wrote:

> +1 (binding)
>
> Verified on M3 Mac. Tests pass with RUST_MIN_STACK set.
>
> Thanks Andrew.
>
> On Thu, Apr 18, 2024 at 7:51 AM vin jake  wrote:
> >
> > +1(binding)
> >
> > Thanks alamb for your work and efforts.
> >
> >
> >
> > On Thu, Apr 18, 2024 at 10:01 PM Andrew Lamb 
> wrote:
> >
> > > Hi,
> > >
> > > I would like to propose a release of Apache Arrow DataFusion
> > > Implementation,
> > > version 37.1.0, a patch release with some bug fixes. Please see [4] for
> > > details.
> > > There is a failing CI test which only affects development tools [6].
> > >
> > > While DataFusion is now officially its own top level Apache project,
> we do
> > > not yet have enough infrastructure (email lists) setup to do voting
> > > there[5], so I would like to do this one last time on the arrow list.
> > >
> > > This release candidate is based on commit:
> > > d4eb72c30d45c0f3f359c64f41a6caed30abe750 [1]
> > > The proposed release tarball and signatures are hosted at [2].
> > > The changelog is located at [3].
> > >
> > > Please download, verify checksums and signatures, run the unit tests,
> and
> > > vote
> > > on the release. The vote will be open for at least 72 hours.
> > >
> > > Only votes from PMC members are binding, but all members of the
> community
> > > are
> > > encouraged to test the release and vote with "(non-binding)".
> > >
> > > The standard verification procedure is documented at
> > >
> > >
> https://github.com/apache/arrow-datafusion/blob/main/dev/release/README.md#verifying-release-candidates
> > > .
> > >
> > > [ ] +1 Release this as Apache Arrow DataFusion 37.1.0
> > > [ ] +0
> > > [ ] -1 Do not release this as Apache Arrow DataFusion 37.1.0 because...
> > >
> > > Here is my vote:
> > >
> > > +1
> > >
> > > [1]:
> > >
> > >
> https://github.com/apache/arrow-datafusion/tree/d4eb72c30d45c0f3f359c64f41a6caed30abe750
> > > [2]:
> > >
> > >
> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-37.1.0-rc1
> > > [3]:
> > >
> > >
> https://github.com/apache/arrow-datafusion/blob/d4eb72c30d45c0f3f359c64f41a6caed30abe750/CHANGELOG.md
> > > [4]: https://github.com/apache/arrow-datafusion/issues/9904
> > > [5]: https://github.com/apache/arrow-datafusion/issues/9691
> > > [6]:
> > >
> > >
> https://github.com/apache/arrow-datafusion/pull/10128#issuecomment-2063655318
> > > .
> > >
>


Re: [VOTE][RUST] Release Apache Arrow Rust Object Store 0.10.0 RC1

2024-04-18 Thread Andrew Lamb
+1 (binding)

I reviewed the breaking API changes and the changelog and ran the
verification scripts.

Thank you Raphael

Andrew

On Thu, Apr 18, 2024 at 6:55 AM Raphael Taylor-Davies
 wrote:

> Hi,
>
> I would like to propose a release of Apache Arrow Rust Object
> Store Implementation, version 0.10.0.
>
> This release candidate is based on commit:
> cd3331989d65f6d56830f9ffa758b4c96d10f4be [1]
>
> The proposed release tarball and signatures are hosted at [2].
>
> The changelog is located at [3].
>
> Please download, verify checksums and signatures, run the unit tests,
> and vote on the release. There is a script [4] that automates some of
> the verification.
>
> The vote will be open for at least 72 hours.
>
> [ ] +1 Release this as Apache Arrow Rust Object Store
> [ ] +0
> [ ] -1 Do not release this as Apache Arrow Rust Object Store because...
>
> I vote +1 (binding) on this release
>
> [1]:
>
> https://github.com/apache/arrow-rs/tree/cd3331989d65f6d56830f9ffa758b4c96d10f4be
> [2]:
>
> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-object-store-rs-0.10.0-rc1
> [3]:
>
> https://github.com/apache/arrow-rs/blob/cd3331989d65f6d56830f9ffa758b4c96d10f4be/object_store/CHANGELOG.md
> [4]:
>
> https://github.com/apache/arrow-rs/blob/master/object_store/dev/release/verify-release-candidate.sh
>
>


[VOTE][RUST][DataFusion] Release Apache Arrow DataFusion 37.1.0 RC1

2024-04-18 Thread Andrew Lamb
Hi,

I would like to propose a release of Apache Arrow DataFusion Implementation,
version 37.1.0, a patch release with some bug fixes. Please see [4] for
details.
There is a failing CI test which only affects development tools [6].

While DataFusion is now officially its own top-level Apache project, we do
not yet have enough infrastructure (email lists) set up to do voting
there[5], so I would like to do this one last time on the arrow list.

This release candidate is based on commit:
d4eb72c30d45c0f3f359c64f41a6caed30abe750 [1]
The proposed release tarball and signatures are hosted at [2].
The changelog is located at [3].

Please download, verify checksums and signatures, run the unit tests, and
vote
on the release. The vote will be open for at least 72 hours.

Only votes from PMC members are binding, but all members of the community
are
encouraged to test the release and vote with "(non-binding)".
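
To make the binding / non-binding distinction concrete, here is a rough Python sketch of how a tally might be computed. The approval rule it encodes (at least three binding +1 votes, and more binding +1 than binding -1) is my reading of the general ASF release-vote policy and is not stated in this email; the `Vote` type, the `tally` function, and the voter labels in the usage note are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    voter: str     # hypothetical label, not a real participant
    value: int     # +1, 0, or -1
    binding: bool  # True when cast by a PMC member


def tally(votes):
    """Summarize votes and apply the assumed ASF release-approval rule."""
    plus = [v for v in votes if v.value == 1]
    minus = [v for v in votes if v.value == -1]
    binding_plus = sum(1 for v in plus if v.binding)
    binding_minus = sum(1 for v in minus if v.binding)
    return {
        "plus_one": len(plus),
        "binding_plus_one": binding_plus,
        "minus_one": len(minus),
        # Assumption: three binding +1 votes and a binding majority.
        "approved": binding_plus >= 3 and binding_plus > binding_minus,
    }
```

Four +1 votes of which three are binding, for instance `tally([Vote("a", 1, True), Vote("b", 1, True), Vote("c", 1, True), Vote("d", 1, False)])`, would be reported as 4 +1 votes (3 binding) and approved under this rule.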

The standard verification procedure is documented at
https://github.com/apache/arrow-datafusion/blob/main/dev/release/README.md#verifying-release-candidates
.

[ ] +1 Release this as Apache Arrow DataFusion 37.1.0
[ ] +0
[ ] -1 Do not release this as Apache Arrow DataFusion 37.1.0 because...

Here is my vote:

+1

[1]:
https://github.com/apache/arrow-datafusion/tree/d4eb72c30d45c0f3f359c64f41a6caed30abe750
[2]:
https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-37.1.0-rc1
[3]:
https://github.com/apache/arrow-datafusion/blob/d4eb72c30d45c0f3f359c64f41a6caed30abe750/CHANGELOG.md
[4]: https://github.com/apache/arrow-datafusion/issues/9904
[5]: https://github.com/apache/arrow-datafusion/issues/9691
[6]:
https://github.com/apache/arrow-datafusion/pull/10128#issuecomment-2063655318


Re: [RESULT] [VOTE] Move Arrow DataFusion Subproject to new Top Level Apache Project

2024-04-18 Thread Andrew Lamb
As a follow up to this thread, and several others on this mailing list, I
am pleased to announce that the proposal to create the DataFusion Top Level
Project passed unanimously at the April 2024 ASF board meeting.

Thank you to everyone in the Arrow community who has helped nurture this
subproject over the years. It has truly been an amazing experience to see
how it has grown and now "graduated" to its own top level project.

We are tracking subtasks for the transition on [1]. Please feel free to
suggest additional items or comments there.

Andrew


[1]: https://github.com/apache/arrow-datafusion/issues/9691

On Sat, Mar 9, 2024 at 3:23 AM Andrew Lamb  wrote:

> With 30 +1 votes (18 binding by my count), and 0 -1 votes, the proposal is
> approved
>
> Thank you everyone who voted and participated in helping form the proposal
> (and the community over the last few years). I think there are exciting
> times ahead for DataFusion.
>
> We will submit the proposed motion to the ASF board with our next
> quarterly report, for the April 2024 meeting, as previously discussed.
>
> Thanks again
> Andrew
>
>
>
> On Tue, Mar 5, 2024 at 2:00 PM Benjamin Kietzman 
> wrote:
>
>> +1 (binding)
>>
>> On Tue, Mar 5, 2024, 13:03 Peter Toth  wrote:
>>
>> > +1 (non-binding)
>> >
>> > On Tue, Mar 5, 2024 at 18:15, Parth Chandra wrote:
>> >
>> > > +1 (non-binding)
>> > >
>> > > On Sun, Mar 3, 2024 at 8:04 PM Mehmet Ozan Kabak 
>> > wrote:
>> > >
>> > > > +1 (non-binding)
>> > > > --
>> > > > *Mehmet Ozan Kabak*
>> > > > Co-founder & CEO @ Synnada, Inc.
>> > > >
>> > > >
>> > > > On Sun, Mar 3, 2024 at 7:33 PM Jacob Wujciak-Jens
>> > > >  wrote:
>> > > >
>> > > > > +1 (non-binding)
>> > > > >
>> > > > > On Mon, Mar 4, 2024 at 3:39 AM Yang Jiang 
>> > > wrote:
>> > > > >
>> > > > > > +1 (non-binding)
>> > > > > >
>> > > > > > On 2024/03/01 18:08:26 Daniël Heres wrote:
>> > > > > > > +1 (binding)
>> > > > > > >
>> > > > > > > On Fri, Mar 1, 2024, 19:05 Chao Sun 
>> wrote:
>> > > > > > >
>> > > > > > > > +1 (non-binding)
>> > > > > > > >
>> > > > > > > > On Fri, Mar 1, 2024 at 9:59 AM QP Hou 
>> > wrote:
>> > > > > > > >
>> > > > > > > > > +1 (binding)
>> > > > > > > > >
>> > > > > > > > > exciting milestone :)
>> > > > > > > > >
>> > > > > > > > > On Fri, Mar 1, 2024 at 9:49 AM David Li <
>> lidav...@apache.org
>> > >
>> > > > > wrote:
>> > > > > > > > > >
>> > > > > > > > > > +1
>> > > > > > > > > >
>> > > > > > > > > > On Fri, Mar 1, 2024, at 12:06, Jorge Cardoso Leitão
>> wrote:
>> > > > > > > > > > > +1 - great work!!!
>> > > > > > > > > > >
>> > > > > > > > > > > On Fri, Mar 1, 2024 at 5:49 PM Micah Kornfield <
>> > > > > > > > emkornfi...@gmail.com>
>> > > > > > > > > > > wrote:
>> > > > > > > > > > >
>> > > > > > > > > > >> +1 (binding)
>> > > > > > > > > > >>
>> > > > > > > > > > >> On Friday, March 1, 2024, Uwe L. Korn <
>> uw...@xhochy.com
>> > >
>> > > > > wrote:
>> > > > > > > > > > >>
>> > > > > > > > > > >> > +1 (binding)
>> > > > > > > > > > >> >
>> > > > > > > > > > >> > On Fri, Mar 1, 2024, at 2:37 PM, Andy Grove wrote:
>> > > > > > > > > > >> > > +1 (binding)
>> > > > > > > > > > >> > >
>> > > > > > > > > > >> > > On Fri, Mar 1, 2024 at 6:20 AM Weston Pace <
>> > > > > > > > weston.p...@gmail.com
>> &

[DataFusion] 37.1.0 maintenance release

2024-04-16 Thread Andrew Lamb
We are making a 37.1.0 maintenance release[1]. If anyone wants to add
anything or review the PRs please feel free.

Andrew

[1] https://github.com/apache/arrow-datafusion/issues/9904


Re: [ANNOUNCE] New Arrow committer: Sarah Gilmore

2024-04-11 Thread Andrew Lamb
Welcome!

On Thu, Apr 11, 2024 at 2:43 PM Ian Cook  wrote:

> Congrats Sarah!
>
> On Thu, Apr 11, 2024 at 12:31 Bryce Mecum  wrote:
>
> > Congratulations!
> >
> > On Thu, Apr 11, 2024 at 3:13 AM Sutou Kouhei  wrote:
> > >
> > > Hi,
> > >
> > > On behalf of the Arrow PMC, I'm happy to announce that Sarah
> > > Gilmore has accepted an invitation to become a committer on
> > > Apache Arrow. Welcome, and thank you for your contributions!
> > >
> > > Thanks,
> > > --
> > > kou
> >
>


Re: [DISCUSS] Versioning and releases for apache/arrow components

2024-04-07 Thread Andrew Lamb
I agree with all the other comments on this thread

Having smaller releases is key to being able to release more frequently and
finding the relevant expertise in my opinion.

We have had separate releases / votes for Arrow Rust (and Arrow DataFusion)
and it has served us quite well. The version schemes have diverged
substantially from the monorepo (we are on version 51.0.0 in arrow-rs, for
example) and it doesn't seem to have caused any major confusion among users.

Andrew



On Wed, Apr 3, 2024 at 2:11 PM Dewey Dunnington
 wrote:

> Thank you Jacob for bringing this up! I am also in favor of decoupling
> versions (provided that the release managers are also in favor of
> this, since their time is required to implement this and because the
> ongoing consequences of separate releases disproportionately affect
> them).
>
> The vote fatigue is, I think, partly due to the complexity of
> releasing all of the components at the same time. Running the script
> for ADBC, nanoarrow, Rust, and Julia is fairly straightforward
> because those subprojects have a more limited scope. In contrast, I am
> rarely successful running the Arrow verification script without
> running into an error I don't understand and have become hesitant to
> vote (or try) as a cumulative result of many releases worth of this
> happening (and because R has never been a part of verification, which
> is the component that I unofficially verify anyway). Voting on a batch
> of version numbers seems like a good first step.
>
> I am also not concerned about messaging of different versions of
> different components. The fact that integration tests pass at the
> moment of the release may be meaningful for those familiar with the
> repo, but I don't think that many people are aware of which components
> are tested in that way. As Weston noted, even for components that use
> Arrow C++, the implementation of Arrow C++ features may lag behind or
> be completely unrelated (Python being the exception).
>
> On Fri, Mar 29, 2024 at 9:47 AM Weston Pace  wrote:
> >
> > Thank you for bringing this up.  I'm in favor of this.  I think there are
> > several motivations but the main ones are:
> >
> >  1. Decoupling the versions will allow components to have no release, or
> > only a minor release, when there are no breaking changes
> >  2. We do have some vote fatigue I think and we don't want to make that
> > more difficult.
> >  3. Anything we can do to ease the burden of release managers is good
> >
> > If I understand what you are describing then I think it satisfies points
> 1
> > & 2.  I am not familiar enough with the release management process to
> speak
> > to #3.
> >
> > > Voting in one thread on
> > > all components/a subset of components per voter and the surrounding
> > > technicalities is something I would like to hear some opinions on.
> >
> > I am in favor of decoupling the version numbers.  I do think batched
> > quarterly releases are still a good thing to avoid vote fatigue.  Perhaps
> > we can have a single vote on a batch of version numbers (e.g. please vote
> > on the batched release containing CPP version X, Go version Y, JS version
> > Z).
> >
> > > A more meta question is about the messaging that different versioning
> > > schemes carry, as it might no longer be obvious on first glance which
> > > versions are compatible or have the newest features.
> >
> > I am not concerned about this.  One of the advantages of Arrow is that we
> > have a stable C ABI (C Data Interface) and a stable IPC mechanism (IPC
> > serialization) and this means that version compatibility is rarely a
> > difficulty or major concern.  Plus, regarding individual features, our
> > solution already requires a compatibility table (
> > https://arrow.apache.org/docs/status.html).  Changing the versioning
> > strategy will not make this any worse.
> >
> > On Thu, Mar 28, 2024 at 1:42 PM Jacob Wujciak 
> wrote:
> >
> > > Hello Everyone!
> > >
> > > I would like to resurface the discussion of separate
> > > versioning/releases/voting for monorepo components. We have previously
> > > touched on this topic mostly in the community meetings and spread
> across
> > > multiple, only tangentially related threads. I think a focused
> discussion can
> > > be a bit more results oriented, especially now that we almost regularly
> > > deviate from the quarterly release cadence with minor releases. My
> hope is
> > > that discussing this and adapting our process can lower the amount of
> work
> > > required and ease the pressure on our release managers (Thank you Raúl
> and
> > > Kou!).
> > >
> > > I think the base of the topic is the separate versioning for
> components as
> > > otherwise separate releases only have limited value. From a technical
> > > perspective standalone implementations like Go or JS are the easiest to
> > > handle in that regard, they can just follow their ecosystem standards,
> > > which has been requested by users already (major releases in Go require

Re: [VOTE] Bulk ingestion support for Flight SQL (vote #2)

2024-04-06 Thread Andrew Lamb
+1

On Sat, Apr 6, 2024 at 3:48 AM wish maple  wrote:

> +1 (non binding)
>
> Best,
> Xuwei Fu
>
> On Fri, Apr 5, 2024 at 16:38, David Li wrote:
>
> > Hello,
> >
> > Joel Lubinitsky has proposed adding bulk ingestion support to Arrow
> Flight
> > SQL [1]. This provides a path for uploading an Arrow dataset to a Flight
> > SQL server to create or append to a table, without having to know the
> > specifics of the SQL or Substrait support on the server. The
> functionality
> > mimics similar functionality in ADBC. This is the second attempt at a
> vote
> > [3].
> >
> > Joel has provided reference implementations of this for C++ and Go at
> [2],
> > along with an integration test.
> >
> > The vote will be open for at least 72 hours.
> >
> > [ ] +1 Accept this proposal
> > [ ] +0
> > [ ] -1 Do not accept this proposal because...
> >
> > [1]: https://lists.apache.org/thread/mo98rsh20047xljrbfymrks8f2ngn49z
> > [2]: https://github.com/apache/arrow/pull/38256
> > [3]: https://lists.apache.org/thread/c8n3t0452807wm1ol1hvj41rs1vso3tp
> >
> > Thanks,
> > David
>


Re: [VOTE] Add new info codes and options keys to ADBC specification

2024-04-06 Thread Andrew Lamb
+1

On Fri, Apr 5, 2024 at 9:55 PM Jacob Wujciak  wrote:

> + 1 (non-binding)
>
> On Sat, Apr 6, 2024 at 01:57, Joel Lubinitsky <joell...@gmail.com> wrote:
>
> > Yes, just updated both the issue and the PR.
> >
> > Thanks,
> > Joel
> >
> > On Fri, Apr 5, 2024 at 7:51 PM Sutou Kouhei  wrote:
> >
> > > +1
> > >
> > > Could you also update the description of
> > > https://github.com/apache/arrow-adbc/issues/1650 ?
> > >
> > > Thanks,
> > > --
> > > kou
> > >
> > > In 
> > >   "Re: [VOTE] Add new info codes and options keys to ADBC
> specification"
> > > on Fri, 05 Apr 2024 15:39:33 -,
> > >   Joel Lubinitsky  wrote:
> > >
> > > > Update on this:
> > > >
> > > > I've removed ADBC_INFO_VENDOR_READ_ONLY from the proposal. The change
> > is
> > > reflected in this commit [1] on the original PR [2]. The numbers
> > > corresponding to each of the other info codes have been decremented by
> 1
> > to
> > > fill the gap in numbering.
> > > >
> > > > The reason is that a similar option already exists via
> > > ConnectionGet/SetOptions, so defining it on the driver isn't helpful.
> > > >
> > > > [1]:
> > >
> >
> https://github.com/apache/arrow-adbc/pull/1649/commits/a52a4fa16e6b740392d3617751e28f044f1a8325
> > > > [2]: https://github.com/apache/arrow-adbc/pull/1649
> > > >
> > > > Thanks,
> > > > Joel
> > > >
> > > > On 2024/04/03 11:01:13 Joel Lubinitsky wrote:
> > > >> Hello,
> > > >>
> > > >> I would like to propose a change to the ADBC specification that
> > > introduces
> > > >> 5 new standard info codes and formalizes 3 existing option keys.
> > > >>
> > > >> The info codes being introduced are:
> > > >> - ADBC_INFO_VENDOR_READ_ONLY 3
> > > >> - ADBC_INFO_VENDOR_SQL 4
> > > >> - ADBC_INFO_VENDOR_SUBSTRAIT 5
> > > >> - ADBC_INFO_VENDOR_SUBSTRAIT_MIN_VERSION 6
> > > >> - ADBC_INFO_VENDOR_SUBSTRAIT_MAX_VERSION 7
> > > >>
> > > >> The option keys have been in use (defined in options.h) and are
> being
> > > moved
> > > >> to adbc.h:
> > > >> - ADBC_INGEST_OPTION_TARGET_CATALOG "adbc.ingest.target_catalog"
> > > >> - ADBC_INGEST_OPTION_TARGET_DB_SCHEMA "adbc.ingest.target_db_schema"
> > > >> - ADBC_INGEST_OPTION_TEMPORARY "adbc.ingest.temporary"
> > > >>
> > > >> The change is described in this issue [0] and an implementation is
> > > included
> > > >> in this PR [1].
> > > >>
> > > >> The vote will be open for at least 72 hours.
> > > >>
> > > >> [ ] +1 Add these info codes and options keys to the ADBC spec
> > > >> [ ] +0
> > > >> [ ] -1 Do not add these to the ADBC spec because...
> > > >>
> > > >> Thanks,
> > > >> Joel
> > > >>
> > > >> [0]: https://github.com/apache/arrow-adbc/issues/1650
> > > >> [1]: https://github.com/apache/arrow-adbc/pull/1649
> > > >>
> > >
> >
>
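
The renumbering described in the quoted thread (dropping ADBC_INFO_VENDOR_READ_ONLY and decrementing the later codes by 1 to fill the gap) can be sketched as follows. The starting values are the ones listed in the original proposal quoted above; the function name is hypothetical, and the authoritative final values are those in the linked PR, not this sketch.

```python
def drop_and_renumber(codes, removed_name):
    """Remove one entry and shift every higher code down by one."""
    removed_value = codes[removed_name]
    return {
        name: value - 1 if value > removed_value else value
        for name, value in codes.items()
        if name != removed_name
    }


# Values as listed in the original proposal in this thread.
PROPOSED = {
    "ADBC_INFO_VENDOR_READ_ONLY": 3,
    "ADBC_INFO_VENDOR_SQL": 4,
    "ADBC_INFO_VENDOR_SUBSTRAIT": 5,
    "ADBC_INFO_VENDOR_SUBSTRAIT_MIN_VERSION": 6,
    "ADBC_INFO_VENDOR_SUBSTRAIT_MAX_VERSION": 7,
}

REVISED = drop_and_renumber(PROPOSED, "ADBC_INFO_VENDOR_READ_ONLY")
```

Under this reading, the four remaining codes occupy 3 through 6 with their relative order unchanged.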


Re: [RESULT][VOTE][RUST][DataFusion] Release Apache Arrow DataFusion 37.0.0 RC2

2024-04-04 Thread Andrew Lamb
The release is available here:
  https://dist.apache.org/repos/dist/release/arrow/arrow-datafusion-37.0.0

The release has also been published to crates.io:
https://crates.io/crates/datafusion/37.0.0

Thank you all for your help!

Andrew


On Thu, Apr 4, 2024 at 4:31 PM Andrew Lamb  wrote:

> With 4 +1 votes (3 binding) the release is approved. I will now upload to
> crates.io
>
> On Mon, Apr 1, 2024 at 11:59 AM Andrew Lamb  wrote:
>
>> +1
>>
>> I verified on M3 mac
>>
>> Thanks Andy for the quick turnaround on making a new RC
>>
>>
>> Andrew
>>
>>
>> On Mon, Apr 1, 2024 at 1:51 AM Jean-Baptiste Onofré 
>> wrote:
>>
>>> +1 (non binding)
>>>
>>> - Hashes and signatures are OK
>>> - ASF headers are present on expected files
>>> - No binary file found in the source distribution
>>> - Checked on MacOS M3
>>>
>>> Regards
>>> JB
>>>
>>> On Mon, Apr 1, 2024 at 12:07 AM Andy Grove 
>>> wrote:
>>> >
>>> > Subject: [VOTE][RUST][DataFusion] Release Apache Arrow DataFusion
>>> 37.0.0 RC2
>>> > Hi,
>>> >
>>> > I would like to propose a release of Apache Arrow DataFusion
>>> Implementation,
>>> > version 37.0.0.
>>> >
>>> > This release candidate is based on commit:
>>> > 1fa25ae5d50c5f34f17e77e9f635f854ef5e7642 [1]
>>> > The proposed release tarball and signatures are hosted at [2].
>>> > The changelog is located at [3].
>>> >
>>> > Please download, verify checksums and signatures, run the unit tests,
>>> and
>>> > vote
>>> > on the release. The vote will be open for at least 72 hours.
>>> >
>>> > Only votes from PMC members are binding, but all members of the
>>> community
>>> > are
>>> > encouraged to test the release and vote with "(non-binding)".
>>> >
>>> > The standard verification procedure is documented at
>>> >
>>> https://github.com/apache/arrow-datafusion/blob/main/dev/release/README.md#verifying-release-candidates
>>> > .
>>> >
>>> > [ ] +1 Release this as Apache Arrow DataFusion 37.0.0
>>> > [ ] +0
>>> > [ ] -1 Do not release this as Apache Arrow DataFusion 37.0.0 because...
>>> >
>>> > Here is my vote:
>>> >
>>> > +1
>>> >
>>> > [1]:
>>> >
>>> https://github.com/apache/arrow-datafusion/tree/1fa25ae5d50c5f34f17e77e9f635f854ef5e7642
>>> > [2]:
>>> >
>>> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-37.0.0-rc2
>>> > [3]:
>>> >
>>> https://github.com/apache/arrow-datafusion/blob/1fa25ae5d50c5f34f17e77e9f635f854ef5e7642/CHANGELOG.md
>>> > -
>>> > Running rat license checker on
>>> >
>>> /home/andy/git/apache/arrow-datafusion/dev/dist/apache-arrow-datafusion-37.0.0-rc2/apache-arrow-datafusion-37.0.0.tar.gz
>>> > NOT APPROVED: .github/workflows/pr_benchmarks.yml
>>> > (apache-arrow-datafusion-37.0.0/.github/workflows/pr_benchmarks.yml):
>>> false
>>> > NOT APPROVED: .github/workflows/pr_comment.yml
>>> > (apache-arrow-datafusion-37.0.0/.github/workflows/pr_comment.yml):
>>> false
>>> > 2 unapproved licences. Check rat report: rat.txt
>>> > (base) andy@ripper:~/git/apache/arrow-datafusion$
>>> >
>>> GH_TOKEN=[REDACTED]
>>> > ./dev/release/create-tarball.sh 37.0.0 2
>>> > Attempting to create  from tag 37.0.0-rc2
>>> > Draft email for dev@arrow.apache.org mailing list
>>> >
>>> > -
>>> > To: dev@arrow.apache.org
>>> > Hi,
>>> >
>>> > I would like to propose a release of Apache Arrow DataFusion
>>> Implementation,
>>> > version 37.0.0.
>>> >
>>> > This release candidate is based on commit:
>>> > 1fa25ae5d50c5f34f17e77e9f635f854ef5e7642 [1]
>>> > The proposed release tarball and signatures are hosted at [2].
>>> > The changelog is located at [3].
>>> >
>>> > Please download, verify checksums and signatures, run the unit tests,
>>> and
>>> > vote
>>> > on the release. The vote will be open for at least 72 hours.
>>> >
>>> > Only votes from PMC members are binding, but all members of the
>>> community
>>> > are
>>> > encouraged to test the release and vote with "(non-binding)".
>>> >
>>> > The standard verification procedure is documented at
>>> >
>>> https://github.com/apache/arrow-datafusion/blob/main/dev/release/README.md#verifying-release-candidates
>>> > .
>>> >
>>> > [ ] +1 Release this as Apache Arrow DataFusion 37.0.0
>>> > [ ] +0
>>> > [ ] -1 Do not release this as Apache Arrow DataFusion 37.0.0 because...
>>> >
>>> > Here is my vote:
>>> >
>>> > +1 (verified on Ubuntu)
>>> >
>>> > [1]:
>>> >
>>> https://github.com/apache/arrow-datafusion/tree/1fa25ae5d50c5f34f17e77e9f635f854ef5e7642
>>> > [2]:
>>> >
>>> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-37.0.0-rc2
>>> > [3]:
>>> >
>>> https://github.com/apache/arrow-datafusion/blob/1fa25ae5d50c5f34f17e77e9f635f854ef5e7642/CHANGELOG.md
>>>
>>


[RESULT][VOTE][RUST][DataFusion] Release Apache Arrow DataFusion 37.0.0 RC2

2024-04-04 Thread Andrew Lamb
With 4 +1 votes (3 binding), the release is approved. I will now upload to
crates.io.

On Mon, Apr 1, 2024 at 11:59 AM Andrew Lamb  wrote:

> +1
>
> I verified on M3 mac
>
> Thanks Andy for the quick turnaround on making a new RC
>
>
> Andrew
>
>
> On Mon, Apr 1, 2024 at 1:51 AM Jean-Baptiste Onofré 
> wrote:
>
>> +1 (non-binding)
>>
>> - Hashes and signatures are OK
>> - ASF headers are present on expected files
>> - No binary file found in the source distribution
>> - Checked on MacOS M3
>>
>> Regards
>> JB
>>
>> On Mon, Apr 1, 2024 at 12:07 AM Andy Grove  wrote:
>> >
>> > Subject: [VOTE][RUST][DataFusion] Release Apache Arrow DataFusion
>> 37.0.0 RC2
>> > Hi,
>> >
>> > I would like to propose a release of Apache Arrow DataFusion
>> Implementation,
>> > version 37.0.0.
>> >
>> > This release candidate is based on commit:
>> > 1fa25ae5d50c5f34f17e77e9f635f854ef5e7642 [1]
>> > The proposed release tarball and signatures are hosted at [2].
>> > The changelog is located at [3].
>> >
>> > Please download, verify checksums and signatures, run the unit tests,
>> and
>> > vote
>> > on the release. The vote will be open for at least 72 hours.
>> >
>> > Only votes from PMC members are binding, but all members of the
>> community
>> > are
>> > encouraged to test the release and vote with "(non-binding)".
>> >
>> > The standard verification procedure is documented at
>> >
>> https://github.com/apache/arrow-datafusion/blob/main/dev/release/README.md#verifying-release-candidates
>> > .
>> >
>> > [ ] +1 Release this as Apache Arrow DataFusion 37.0.0
>> > [ ] +0
>> > [ ] -1 Do not release this as Apache Arrow DataFusion 37.0.0 because...
>> >
>> > Here is my vote:
>> >
>> > +1
>> >
>> > [1]:
>> >
>> https://github.com/apache/arrow-datafusion/tree/1fa25ae5d50c5f34f17e77e9f635f854ef5e7642
>> > [2]:
>> >
>> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-37.0.0-rc2
>> > [3]:
>> >
>> https://github.com/apache/arrow-datafusion/blob/1fa25ae5d50c5f34f17e77e9f635f854ef5e7642/CHANGELOG.md
>> > -
>> > Running rat license checker on
>> >
>> /home/andy/git/apache/arrow-datafusion/dev/dist/apache-arrow-datafusion-37.0.0-rc2/apache-arrow-datafusion-37.0.0.tar.gz
>> > NOT APPROVED: .github/workflows/pr_benchmarks.yml
>> > (apache-arrow-datafusion-37.0.0/.github/workflows/pr_benchmarks.yml):
>> false
>> > NOT APPROVED: .github/workflows/pr_comment.yml
>> > (apache-arrow-datafusion-37.0.0/.github/workflows/pr_comment.yml): false
>> > 2 unapproved licences. Check rat report: rat.txt
>> > (base) andy@ripper:~/git/apache/arrow-datafusion$
>> >
>> GH_TOKEN=[REDACTED]
>> > ./dev/release/create-tarball.sh 37.0.0 2
>> > Attempting to create  from tag 37.0.0-rc2
>> > Draft email for dev@arrow.apache.org mailing list
>> >
>> > -
>> > To: dev@arrow.apache.org
>> > Hi,
>> >
>> > I would like to propose a release of Apache Arrow DataFusion
>> Implementation,
>> > version 37.0.0.
>> >
>> > This release candidate is based on commit:
>> > 1fa25ae5d50c5f34f17e77e9f635f854ef5e7642 [1]
>> > The proposed release tarball and signatures are hosted at [2].
>> > The changelog is located at [3].
>> >
>> > Please download, verify checksums and signatures, run the unit tests,
>> and
>> > vote
>> > on the release. The vote will be open for at least 72 hours.
>> >
>> > Only votes from PMC members are binding, but all members of the
>> community
>> > are
>> > encouraged to test the release and vote with "(non-binding)".
>> >
>> > The standard verification procedure is documented at
>> >
>> https://github.com/apache/arrow-datafusion/blob/main/dev/release/README.md#verifying-release-candidates
>> > .
>> >
>> > [ ] +1 Release this as Apache Arrow DataFusion 37.0.0
>> > [ ] +0
>> > [ ] -1 Do not release this as Apache Arrow DataFusion 37.0.0 because...
>> >
>> > Here is my vote:
>> >
>> > +1 (verified on Ubuntu)
>> >
>> > [1]:
>> >
>> https://github.com/apache/arrow-datafusion/tree/1fa25ae5d50c5f34f17e77e9f635f854ef5e7642
>> > [2]:
>> >
>> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-37.0.0-rc2
>> > [3]:
>> >
>> https://github.com/apache/arrow-datafusion/blob/1fa25ae5d50c5f34f17e77e9f635f854ef5e7642/CHANGELOG.md
>>
>
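The result reported above (4 +1 votes, 3 binding) can be checked mechanically. Below is a minimal Python sketch of such a tally, assuming the usual ASF release rule as I understand it: at least three binding +1 votes, and more binding +1 than binding -1. The `Vote` type and `tally` helper are hypothetical names for illustration, not part of any Arrow release tooling, and the ballots in the usage note are illustrative rather than a record of who voted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    value: int     # +1, 0, or -1
    binding: bool  # True when cast by a PMC member


def tally(votes):
    """Count ballots and decide whether a release vote passes."""
    plus = [v for v in votes if v.value > 0]
    minus = [v for v in votes if v.value < 0]
    binding_plus = sum(1 for v in plus if v.binding)
    binding_minus = sum(1 for v in minus if v.binding)
    # Assumed rule: at least 3 binding +1, and more binding +1 than binding -1.
    approved = binding_plus >= 3 and binding_plus > binding_minus
    return {
        "plus": len(plus),
        "minus": len(minus),
        "binding_plus": binding_plus,
        "binding_minus": binding_minus,
        "approved": approved,
    }
```

For example, `tally([Vote(1, True)] * 3 + [Vote(1, False)])` reports four +1 votes, three of them binding, and marks the vote as approved.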


Re: [VOTE][RUST][DataFusion] Release Apache Arrow DataFusion 37.0.0 RC2

2024-04-01 Thread Andrew Lamb
+1

I verified on M3 mac

Thanks Andy for the quick turnaround on making a new RC


Andrew


On Mon, Apr 1, 2024 at 1:51 AM Jean-Baptiste Onofré  wrote:

> +1 (non-binding)
>
> - Hashes and signatures are OK
> - ASF headers are present on expected files
> - No binary file found in the source distribution
> - Checked on MacOS M3
>
> Regards
> JB
>
> On Mon, Apr 1, 2024 at 12:07 AM Andy Grove  wrote:
> >
> > Subject: [VOTE][RUST][DataFusion] Release Apache Arrow DataFusion 37.0.0
> RC2
> > Hi,
> >
> > I would like to propose a release of Apache Arrow DataFusion
> Implementation,
> > version 37.0.0.
> >
> > This release candidate is based on commit:
> > 1fa25ae5d50c5f34f17e77e9f635f854ef5e7642 [1]
> > The proposed release tarball and signatures are hosted at [2].
> > The changelog is located at [3].
> >
> > Please download, verify checksums and signatures, run the unit tests, and
> > vote
> > on the release. The vote will be open for at least 72 hours.
> >
> > Only votes from PMC members are binding, but all members of the community
> > are
> > encouraged to test the release and vote with "(non-binding)".
> >
> > The standard verification procedure is documented at
> >
> https://github.com/apache/arrow-datafusion/blob/main/dev/release/README.md#verifying-release-candidates
> > .
> >
> > [ ] +1 Release this as Apache Arrow DataFusion 37.0.0
> > [ ] +0
> > [ ] -1 Do not release this as Apache Arrow DataFusion 37.0.0 because...
> >
> > Here is my vote:
> >
> > +1
> >
> > [1]:
> >
> https://github.com/apache/arrow-datafusion/tree/1fa25ae5d50c5f34f17e77e9f635f854ef5e7642
> > [2]:
> >
> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-37.0.0-rc2
> > [3]:
> >
> https://github.com/apache/arrow-datafusion/blob/1fa25ae5d50c5f34f17e77e9f635f854ef5e7642/CHANGELOG.md
> > -
> > Running rat license checker on
> >
> /home/andy/git/apache/arrow-datafusion/dev/dist/apache-arrow-datafusion-37.0.0-rc2/apache-arrow-datafusion-37.0.0.tar.gz
> > NOT APPROVED: .github/workflows/pr_benchmarks.yml
> > (apache-arrow-datafusion-37.0.0/.github/workflows/pr_benchmarks.yml):
> false
> > NOT APPROVED: .github/workflows/pr_comment.yml
> > (apache-arrow-datafusion-37.0.0/.github/workflows/pr_comment.yml): false
> > 2 unapproved licences. Check rat report: rat.txt
> > (base) andy@ripper:~/git/apache/arrow-datafusion$
> >
> GH_TOKEN=[REDACTED]
> > ./dev/release/create-tarball.sh 37.0.0 2
> > Attempting to create  from tag 37.0.0-rc2
> > Draft email for dev@arrow.apache.org mailing list
> >
> > -
> > To: dev@arrow.apache.org
> > Hi,
> >
> > I would like to propose a release of Apache Arrow DataFusion
> Implementation,
> > version 37.0.0.
> >
> > This release candidate is based on commit:
> > 1fa25ae5d50c5f34f17e77e9f635f854ef5e7642 [1]
> > The proposed release tarball and signatures are hosted at [2].
> > The changelog is located at [3].
> >
> > Please download, verify checksums and signatures, run the unit tests, and
> > vote
> > on the release. The vote will be open for at least 72 hours.
> >
> > Only votes from PMC members are binding, but all members of the community
> > are
> > encouraged to test the release and vote with "(non-binding)".
> >
> > The standard verification procedure is documented at
> >
> https://github.com/apache/arrow-datafusion/blob/main/dev/release/README.md#verifying-release-candidates
> > .
> >
> > [ ] +1 Release this as Apache Arrow DataFusion 37.0.0
> > [ ] +0
> > [ ] -1 Do not release this as Apache Arrow DataFusion 37.0.0 because...
> >
> > Here is my vote:
> >
> > +1 (verified on Ubuntu)
> >
> > [1]:
> >
> https://github.com/apache/arrow-datafusion/tree/1fa25ae5d50c5f34f17e77e9f635f854ef5e7642
> > [2]:
> >
> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-37.0.0-rc2
> > [3]:
> >
> https://github.com/apache/arrow-datafusion/blob/1fa25ae5d50c5f34f17e77e9f635f854ef5e7642/CHANGELOG.md
>


Re: [ANNOUNCE] New Committer Joel Lubinitsky

2024-04-01 Thread Andrew Lamb
Congratulations Joel.

On Mon, Apr 1, 2024 at 11:53 AM Raúl Cumplido 
wrote:

> Congratulations and welcome Joel!
>
>
> On Mon, Apr 1, 2024, 17:18, Kevin Gurney 
> wrote:
>
> > Congratulations, Joel!
> >
> > 
> > From: Jason Z 
> > Sent: Monday, April 1, 2024 11:13 AM
> > To: dev@arrow.apache.org 
> > Subject: Re: [ANNOUNCE] New Committer Joel Lubinitsky
> >
> > Congrats Joel!
> >
> >
> > Thanks,
> > Jiashen
> >
> >
> > On Mon, Apr 1, 2024 at 8:10 AM Ian Cook  wrote:
> >
> > > Congratulations Joel!
> > >
> > > On Mon, Apr 1, 2024 at 11:08 AM wish maple 
> > wrote:
> > >
> > > > Congrats Joel!
> > > >
> > > > Best,
> > > > Xuwei Fu
> > > >
> > > > On Mon, Apr 1, 2024 at 22:59, Matt Topol wrote:
> > > >
> > > > > On behalf of the Arrow PMC, I'm happy to announce that Joel
> > Lubinitsky
> > > > has
> > > > > accepted an invitation to become a committer on Apache Arrow.
> > Welcome,
> > > > and
> > > > > thank you for your contributions!
> > > > >
> > > > > --Matt
> > > > >
> > > >
> > >
> >
>


Re: [VOTE][RUST][DataFusion] Release Apache Arrow DataFusion 37.0.0 RC1

2024-03-30 Thread Andrew Lamb
An update: Denise did in fact find another regression in 37.0.0 [1],
and I verified that the same query works in 36.0.0 [2].

There is a PR [3] up to fix it.

Andrew

[1]: https://github.com/apache/arrow-datafusion/issues/9870
[2]:
https://github.com/apache/arrow-datafusion/issues/9870#issuecomment-2028018095
[3]: https://github.com/apache/arrow-datafusion/pull/9871



On Fri, Mar 29, 2024 at 12:09 PM Wayne Xia  wrote:

> +1 (non-binding)
>
> Same here on x86 Linux.
>
> Thanks.
>
> On Fri, Mar 29, 2024 at 7:22 PM Andrew Lamb  wrote:
>
> > +1 (binding)
> >
> > Thank you Andy. I likewise had to up the min stack size.
> >
> > FYI we (InfluxData) are still chasing down what we think may be a
> > regression introduced by the TreeNode refactor [1] for certain queries.
> If
> > this turns out to be the case I will coordinate making a patch release.
> >
> > Andrew
> >
> > [1] https://github.com/apache/arrow-datafusion/pull/8891
> >
> > On Thu, Mar 28, 2024 at 10:49 PM L. C. Hsieh  wrote:
> >
> > > +1 (binding)
> > >
> > > Verified on M3 after setting RUST_MIN_STACK=300.
> > >
> > > Thanks Andy.
> > >
> > > On Thu, Mar 28, 2024 at 7:39 PM Andy Grove 
> > wrote:
> > > >
> > > > Yes, I also saw this (both on Ubuntu and Mac), and I had to set
> > > > RUST_MIN_STACK=300 for the tests to pass.
> > > >
> > > > I filed https://github.com/apache/arrow-datafusion/issues/9848 to
> > > improve
> > > > the release verification documentation to mention this.
> > > >
> > > >
> > > > On Thu, Mar 28, 2024 at 7:59 PM L. C. Hsieh 
> wrote:
> > > >
> > > > > I got the following error when running verify-release-candidate.sh:
> > > > >
> > > > > thread 'tpcds_physical_q54' has overflowed its stack
> > > > > fatal runtime error: stack overflow
> > > > > error: test failed, to rerun pass `-p datafusion --test
> > tpcds_planning`
> > > > >
> > > > >
> > > > > On Thu, Mar 28, 2024 at 4:22 PM Andy Grove 
> > > wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I would like to propose a release of Apache Arrow DataFusion
> > > > > Implementation,
> > > > > > version 37.0.0.
> > > > > >
> > > > > > This release candidate is based on commit:
> > > > > > 799be5e76bd631608b2357dbbe600afc2cebc359 [1]
> > > > > > The proposed release tarball and signatures are hosted at [2].
> > > > > > The changelog is located at [3].
> > > > > >
> > > > > > Please download, verify checksums and signatures, run the unit
> > > tests, and
> > > > > > vote
> > > > > > on the release. The vote will be open for at least 72 hours.
> > > > > >
> > > > > > Only votes from PMC members are binding, but all members of the
> > > community
> > > > > > are
> > > > > > encouraged to test the release and vote with "(non-binding)".
> > > > > >
> > > > > > The standard verification procedure is documented at
> > > > > >
> > > > >
> > >
> >
> https://github.com/apache/arrow-datafusion/blob/main/dev/release/README.md#verifying-release-candidates
> > > > > > .
> > > > > >
> > > > > > [ ] +1 Release this as Apache Arrow DataFusion 37.0.0
> > > > > > [ ] +0
> > > > > > [ ] -1 Do not release this as Apache Arrow DataFusion 37.0.0
> > > because...
> > > > > >
> > > > > > Here is my vote:
> > > > > >
> > > > > > +1
> > > > > >
> > > > > > *NOTE: I had to set RUST_MIN_STACK=300 for the tests to
> pass.*
> > > > > >
> > > > > > [1]:
> > > > > >
> > > > >
> > >
> >
> https://github.com/apache/arrow-datafusion/tree/799be5e76bd631608b2357dbbe600afc2cebc359
> > > > > > [2]:
> > > > > >
> > > > >
> > >
> >
> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-37.0.0-rc1
> > > > > > [3]:
> > > > > >
> > > > >
> > >
> >
> https://github.com/apache/arrow-datafusion/blob/799be5e76bd631608b2357dbbe600afc2cebc359/CHANGELOG.md
> > > > >
> > >
> >
>


Re: [VOTE][RUST][DataFusion] Release Apache Arrow DataFusion 37.0.0 RC1

2024-03-29 Thread Andrew Lamb
+1 (binding)

Thank you Andy. I likewise had to up the min stack size.

FYI we (InfluxData) are still chasing down what we think may be a
regression introduced by the TreeNode refactor [1] for certain queries. If
this turns out to be the case I will coordinate making a patch release.

Andrew

[1] https://github.com/apache/arrow-datafusion/pull/8891

On Thu, Mar 28, 2024 at 10:49 PM L. C. Hsieh  wrote:

> +1 (binding)
>
> Verified on M3 after setting RUST_MIN_STACK=300.
>
> Thanks Andy.
>
> On Thu, Mar 28, 2024 at 7:39 PM Andy Grove  wrote:
> >
> > Yes, I also saw this (both on Ubuntu and Mac), and I had to set
> > RUST_MIN_STACK=300 for the tests to pass.
> >
> > I filed https://github.com/apache/arrow-datafusion/issues/9848 to
> improve
> > the release verification documentation to mention this.
> >
> >
> > On Thu, Mar 28, 2024 at 7:59 PM L. C. Hsieh  wrote:
> >
> > > I got the following error when running verify-release-candidate.sh:
> > >
> > > thread 'tpcds_physical_q54' has overflowed its stack
> > > fatal runtime error: stack overflow
> > > error: test failed, to rerun pass `-p datafusion --test tpcds_planning`
> > >
> > >
> > > On Thu, Mar 28, 2024 at 4:22 PM Andy Grove 
> wrote:
> > > >
> > > > Hi,
> > > >
> > > > I would like to propose a release of Apache Arrow DataFusion
> > > Implementation,
> > > > version 37.0.0.
> > > >
> > > > This release candidate is based on commit:
> > > > 799be5e76bd631608b2357dbbe600afc2cebc359 [1]
> > > > The proposed release tarball and signatures are hosted at [2].
> > > > The changelog is located at [3].
> > > >
> > > > Please download, verify checksums and signatures, run the unit
> tests, and
> > > > vote
> > > > on the release. The vote will be open for at least 72 hours.
> > > >
> > > > Only votes from PMC members are binding, but all members of the
> community
> > > > are
> > > > encouraged to test the release and vote with "(non-binding)".
> > > >
> > > > The standard verification procedure is documented at
> > > >
> > >
> https://github.com/apache/arrow-datafusion/blob/main/dev/release/README.md#verifying-release-candidates
> > > > .
> > > >
> > > > [ ] +1 Release this as Apache Arrow DataFusion 37.0.0
> > > > [ ] +0
> > > > [ ] -1 Do not release this as Apache Arrow DataFusion 37.0.0
> because...
> > > >
> > > > Here is my vote:
> > > >
> > > > +1
> > > >
> > > > *NOTE: I had to set RUST_MIN_STACK=300 for the tests to pass.*
> > > >
> > > > [1]:
> > > >
> > >
> https://github.com/apache/arrow-datafusion/tree/799be5e76bd631608b2357dbbe600afc2cebc359
> > > > [2]:
> > > >
> > >
> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-37.0.0-rc1
> > > > [3]:
> > > >
> > >
> https://github.com/apache/arrow-datafusion/blob/799be5e76bd631608b2357dbbe600afc2cebc359/CHANGELOG.md
> > >
>
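The workaround discussed in this thread is to raise the minimum thread stack size through the `RUST_MIN_STACK` environment variable before running the tests. A minimal Python sketch of doing that programmatically follows. The helper names are hypothetical, the stack size is left as a parameter because this sketch does not assume which value the voters used, and the `cargo test` arguments are the ones printed in the error message quoted above.

```python
import os
import subprocess


def stack_env(min_stack_bytes, base_env=None):
    """Return a copy of the environment with RUST_MIN_STACK set (in bytes)."""
    if min_stack_bytes <= 0:
        raise ValueError("stack size must be positive")
    env = dict(os.environ if base_env is None else base_env)
    env["RUST_MIN_STACK"] = str(min_stack_bytes)
    return env


def rerun_planning_test(min_stack_bytes):
    """Re-run the test target named in the stack overflow error."""
    cmd = ["cargo", "test", "-p", "datafusion", "--test", "tpcds_planning"]
    # check=False: the caller inspects returncode instead of catching an exception.
    return subprocess.run(cmd, env=stack_env(min_stack_bytes), check=False)
```

As far as I know, `RUST_MIN_STACK` affects threads spawned through the Rust standard library rather than the main thread, which is why it helps here: the test harness runs each test on its own thread.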


Re: [VOTE] Stateless prepared statements in FlightSQL

2024-03-21 Thread Andrew Lamb
+1 (binding)

I reviewed the spec proposal and the Rust implementation and I think they
look good to go. I am not as confident about the Go implementation, but
judging by the comments on the Go PR there are no objections.

Thank you for your work driving this forward.

Andrew

On Wed, Mar 20, 2024 at 10:04 PM Adam C  wrote:

> Hello,
>
> I would like to propose a change to the FlightSQL specification as
> originally described in this Github issue [1] by Andrew Lamb. The
> specification change would allow servers to support prepared
> statements with parameters, without needing to manage state between
> client requests.
>
> There is a draft pull request [2] submitted by me that updates the
> protobuf format and documents the changes made in the FlightSQL
> specification. I have also created draft reference implementations for
> the Go [3] and Rust [4] FlightSQL libraries. These implementations
> will be submitted as pull requests once the proposal is officially
> adopted.
>
> The vote will be open for at least 72 hours.
>
> [ ] +1 Accept this proposal
> [ ] +0
> [ ] -1 Do not accept this proposal because...
>
> Cheers,
> Adam Curtis
>
> [1]:
> https://github.com/apache/arrow/issues/37720
> [2]:
> https://github.com/apache/arrow/pull/40243
> [3]:
> https://github.com/apache/arrow/pull/40311
> [4]:
> https://github.com/apache/arrow-rs/pull/5433
>
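One way to picture a prepared statement that needs no server-side state is an opaque handle that itself carries the query and its bound parameters, so any server instance can execute from the handle alone. The Python sketch below illustrates that idea only. The JSON plus base64 encoding and the function names are hypothetical; they are not the FlightSQL wire format and not the API of either reference implementation.

```python
import base64
import json


def make_handle(query, params=None):
    """Pack the statement and its bound parameters into an opaque handle."""
    payload = {"query": query, "params": list(params or [])}
    raw = json.dumps(payload, sort_keys=True).encode("utf-8")
    return base64.urlsafe_b64encode(raw)


def read_handle(handle):
    """Recover the statement and parameters from a handle, on any server."""
    return json.loads(base64.urlsafe_b64decode(handle).decode("utf-8"))


def bind_params(handle, params):
    """Return a NEW handle carrying the parameters; nothing is stored."""
    state = read_handle(handle)
    return make_handle(state["query"], params)
```

The client would send the handle returned by `bind_params` on its next request instead of the original one. A real server would likely sign or encrypt the handle, since clients can otherwise tamper with its contents.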


Re: [ANNOUNCE] New Arrow committer: Bryce Mecum

2024-03-18 Thread Andrew Lamb
Congratulations Bryce!

On Mon, Mar 18, 2024 at 3:35 AM Alenka Frim 
wrote:

> Congratulations Bryce and thank you for all your contributions!!
>
> On Mon, Mar 18, 2024 at 6:43 AM Raúl Cumplido 
> wrote:
>
> > Congratulations Bryce!!!
> >
> > On Mon, Mar 18, 2024, 5:21, Anja wrote:
> >
> > > Congrats Bryce! =)
> > >
> > > On Sun, 17 Mar 2024 at 22:23, Nic Crane  wrote:
> > >
> > > > On behalf of the Arrow PMC, I'm happy to announce that Bryce Mecum
> has
> > > > accepted an invitation to become a committer on Apache Arrow.
> Welcome,
> > > and
> > > > thank you for your contributions!
> > > >
> > > > Nic
> > > >
> > >
> >
>


Re: [VOTE][RUST] Release Apache Arrow Rust 51.0.0 RC1

2024-03-15 Thread Andrew Lamb
+1 (binding)

I verified on an M3 mac. Thank you Raphael

Looks like a great release to me. I reviewed the changelog. Given there
were so few breaking changes, it seems like we are very close to being able
to do incremental releases (e.g. 51.1.0) which is a sign of maturity in my
mind.

Andrew

On Fri, Mar 15, 2024 at 3:40 AM Raphael Taylor-Davies
 wrote:

> Hi,
>
> I would like to propose a release of Apache Arrow Rust Implementation,
> version 51.0.0.
>
> This release candidate is based on commit:
> ada986c7ec8f8fe4f94235c8aaeba4995392ee72 [1]
>
> The proposed release tarball and signatures are hosted at [2].
>
> The changelog is located at [3].
>
> Please download, verify checksums and signatures, run the unit tests,
> and vote on the release. There is a script [4] that automates some of
> the verification.
>
> The vote will be open for at least 72 hours.
>
> [ ] +1 Release this as Apache Arrow Rust
> [ ] +0
> [ ] -1 Do not release this as Apache Arrow Rust  because...
>
> [1]:
>
> https://github.com/apache/arrow-rs/tree/ada986c7ec8f8fe4f94235c8aaeba4995392ee72
> [2]:
> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-rs-51.0.0-rc1
> [3]:
>
> https://github.com/apache/arrow-rs/blob/ada986c7ec8f8fe4f94235c8aaeba4995392ee72/CHANGELOG.md
> [4]:
>
> https://github.com/apache/arrow-rs/blob/master/dev/release/verify-release-candidate.sh
>
>
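The checksum part of the verification requested above can be sketched in a few lines of Python. This is an illustration under assumptions, not the project's verification script: it assumes the checksum file starts with the hex digest (the layout produced by tools such as `sha512sum`), the helper names are hypothetical, and signature (GPG) verification is not covered.

```python
import hashlib


def file_digest(path, algorithm="sha512", chunk_size=1 << 20):
    """Hash a file in chunks so large tarballs need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def checksum_matches(artifact_path, checksum_path, algorithm="sha512"):
    """Compare the artifact's digest with the first token of the checksum file."""
    with open(checksum_path, "r", encoding="utf-8") as f:
        expected = f.read().split()[0].lower()
    return file_digest(artifact_path, algorithm) == expected
```

Calling `checksum_matches` with the downloaded tarball and its accompanying checksum file returns `True` only when the digests agree.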


Re: [DISCUSS] Conventions for transporting Arrow data over HTTP

2024-03-11 Thread Andrew Lamb
Update: it turns out there was already a Rust client/server. It is now
linked from the ticket.

On Mon, Mar 11, 2024 at 3:07 PM Andrew Lamb  wrote:

> I sadly don't have time to help with this directly; however, I did file a
> ticket requesting help with a Rust prototype [1]. Hopefully we'll
> get a taker.
>
> [1] https://github.com/apache/arrow-rs/issues/5496
>
> On Tue, Mar 5, 2024 at 11:03 PM Ian Cook  wrote:
>
>> Update on recent progress in this Arrow-over-HTTP project:
>>
>> I cleaned up the minimal examples of HTTP clients and servers and
>> moved them into a directory in the Arrow Experiments repo:
>> https://github.com/apache/arrow-experiments/tree/main/http
>>
>> So far there are client examples in six languages and server examples
>> in two languages (Python and Go). They all have READMEs describing how
>> to use them.
>>
>> I have an open PR that adds a third server example in Java. Reviews
>> appreciated:
>> https://github.com/apache/arrow-experiments/pull/4
>>
>> I would like to see minimal client and server examples in a few more
>> languages (especially Rust) before we move on to developing richer
>> types of examples. Is anyone interested in contributing additional
>> minimal examples?
>>
>> Thanks,
>> Ian
>>
>> On Wed, Dec 6, 2023 at 2:29 PM Ian Cook  wrote:
>> >
>> > I just remembered that there is an unused "Arrow Experiments" repo [1]
>> > which Wes created a few years ago [2]. That seems like a more
>> > appropriate place to open PRs like this one. If there are no
>> > objections, I will start using that repo for these Arrow-over-HTTP
>> > PRs.
>> >
>> > [1] https://github.com/apache/arrow-experiments
>> > [2] https://lists.apache.org/thread/cw14s874pwplzf9ycnvfwtwq0xq17npg
>> >
>> > Ian
>> >
>> > On Wed, Dec 6, 2023 at 1:45 PM Ian Cook  wrote:
>> > >
>> > > Antoine,
>> > >
>> > > Thank you for taking a look. I agree—these are basic examples intended
>> > > to prove the concept and answer fundamental questions. Next I intend
>> > > to expand the set of examples to cover more complex cases.
>> > >
>> > > > This might necessitate some kind of framing layer, or a
>> > > > standardized delimiter.
>> > >
>> > > I am interested to hear more perspectives on this. My perspective is
>> > > that we should recommend using HTTP conventions to keep clean
>> > > separation between the Arrow-formatted binary data payloads and the
>> > > various application-specific fields. This can be achieved by encoding
>> > > application-specific fields in URI paths, query parameters, headers,
>> > > or separate parts of multipart/form-data messages.
>> > >
>> > > Ian
>> > >
>> > > On Wed, Dec 6, 2023 at 1:24 PM Antoine Pitrou 
>> wrote:
>> > > >
>> > > >
>> > > > Hi,
>> > > >
>> > > > While this looks like a nice start, I would expect more precise
>> > > > recommendations for writing non-trivial services. Especially, one
>> > > > question is how to send both an application-specific POST request
>> and an
>> > > > Arrow stream, or an application-specific GET response and an Arrow
>> > > > stream. This might necessitate some kind of framing layer, or a
>> > > > standardized delimiter.
>> > > >
>> > > > Regards
>> > > >
>> > > > Antoine.
>> > > >
>> > > >
>> > > >
>> > > > On Dec 5, 2023 at 21:10, Ian Cook wrote:
>> > > > > This is a continuation of the discussion entitled "[DISCUSS]
>> Protocol for
>> > > > > exchanging Arrow data over REST APIs". See the previous messages
>> at
>> > > > > https://lists.apache.org/thread/vfz74gv1knnhjdkro47shzd1z5g5ggnf.
>> > > > >
>> > > > > To inform this discussion, I created some basic Arrow-over-HTTP
>> client and
>> > > > > server examples here:
>> > > > > https://github.com/apache/arrow/pull/39081
>> > > > >
>> > > > > My intention is to expand and improve this set of examples (with
>> your help)
>> > > > > until they reflect a set of conventions that we are comfortable
>> documenting
>> > > > > as recommendations.
>>
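Ian's suggestion quoted above is to keep application-specific fields out of the Arrow payload by carrying them in query parameters and headers. A minimal Python sketch of that separation follows, using only the standard library. The `/data` path, the `limit` parameter, and the `X-Row-Limit` header are hypothetical, and the body is placeholder bytes standing in for an Arrow IPC stream (producing a real one would need an Arrow library). The media type is the one I understand to be registered for Arrow IPC streams.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

ARROW_STREAM = "application/vnd.apache.arrow.stream"
PLACEHOLDER_BODY = b"arrow-ipc-stream-bytes"  # stand-in, not real Arrow data


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Application-specific input arrives in the query string...
        query = parse_qs(urlparse(self.path).query)
        limit = query.get("limit", ["all"])[0]
        self.send_response(200)
        self.send_header("Content-Type", ARROW_STREAM)
        self.send_header("Content-Length", str(len(PLACEHOLDER_BODY)))
        # ...and application-specific output goes in a header, not the body.
        self.send_header("X-Row-Limit", limit)
        self.end_headers()
        self.wfile.write(PLACEHOLDER_BODY)

    def log_message(self, *args):
        pass  # keep the example quiet


def fetch(limit):
    """Start a throwaway local server, issue one GET, and return the parts."""
    server = HTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
        conn.request("GET", f"/data?limit={limit}")
        resp = conn.getresponse()
        result = (resp.getheader("Content-Type"),
                  resp.getheader("X-Row-Limit"),
                  resp.read())
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()
```

The response body stays a pure data payload, so a client can hand it to an Arrow stream reader unchanged while reading everything else from the HTTP envelope.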

Re: [DISCUSS] Conventions for transporting Arrow data over HTTP

2024-03-11 Thread Andrew Lamb
I sadly don't have time to help with this directly; however, I did file a
ticket requesting help with a Rust prototype [1]. Hopefully we'll
get a taker.

[1] https://github.com/apache/arrow-rs/issues/5496

On Tue, Mar 5, 2024 at 11:03 PM Ian Cook  wrote:

> Update on recent progress in this Arrow-over-HTTP project:
>
> I cleaned up the minimal examples of HTTP clients and servers and
> moved them into a directory in the Arrow Experiments repo:
> https://github.com/apache/arrow-experiments/tree/main/http
>
> So far there are client examples in six languages and server examples
> in two languages (Python and Go). They all have READMEs describing how
> to use them.
>
> I have an open PR that adds a third server example in Java. Reviews
> appreciated:
> https://github.com/apache/arrow-experiments/pull/4
>
> I would like to see minimal client and server examples in a few more
> languages (especially Rust) before we move on to developing richer
> types of examples. Is anyone interested in contributing additional
> minimal examples?
>
> Thanks,
> Ian
>
> On Wed, Dec 6, 2023 at 2:29 PM Ian Cook  wrote:
> >
> > I just remembered that there is an unused "Arrow Experiments" repo [1]
> > which Wes created a few years ago [2]. That seems like a more
> > appropriate place to open PRs like this one. If there are no
> > objections, I will start using that repo for these Arrow-over-HTTP
> > PRs.
> >
> > [1] https://github.com/apache/arrow-experiments
> > [2] https://lists.apache.org/thread/cw14s874pwplzf9ycnvfwtwq0xq17npg
> >
> > Ian
> >
> > On Wed, Dec 6, 2023 at 1:45 PM Ian Cook  wrote:
> > >
> > > Antoine,
> > >
> > > Thank you for taking a look. I agree—these are basic examples intended
> > > to prove the concept and answer fundamental questions. Next I intend
> > > to expand the set of examples to cover more complex cases.
> > >
> > > > This might necessitate some kind of framing layer, or a
> > > > standardized delimiter.
> > >
> > > I am interested to hear more perspectives on this. My perspective is
> > > that we should recommend using HTTP conventions to keep clean
> > > separation between the Arrow-formatted binary data payloads and the
> > > various application-specific fields. This can be achieved by encoding
> > > application-specific fields in URI paths, query parameters, headers,
> > > or separate parts of multipart/form-data messages.
> > >
> > > Ian
> > >
> > > On Wed, Dec 6, 2023 at 1:24 PM Antoine Pitrou 
> wrote:
> > > >
> > > >
> > > > Hi,
> > > >
> > > > While this looks like a nice start, I would expect more precise
> > > > recommendations for writing non-trivial services. Especially, one
> > > > question is how to send both an application-specific POST request
> and an
> > > > Arrow stream, or an application-specific GET response and an Arrow
> > > > stream. This might necessitate some kind of framing layer, or a
> > > > standardized delimiter.
> > > >
> > > > Regards
> > > >
> > > > Antoine.
> > > >
> > > >
> > > >
> > > > On Dec 5, 2023 at 21:10, Ian Cook wrote:
> > > > > This is a continuation of the discussion entitled "[DISCUSS]
> Protocol for
> > > > > exchanging Arrow data over REST APIs". See the previous messages at
> > > > > https://lists.apache.org/thread/vfz74gv1knnhjdkro47shzd1z5g5ggnf.
> > > > >
> > > > > To inform this discussion, I created some basic Arrow-over-HTTP
> client and
> > > > > server examples here:
> > > > > https://github.com/apache/arrow/pull/39081
> > > > >
> > > > > My intention is to expand and improve this set of examples (with
> your help)
> > > > > until they reflect a set of conventions that we are comfortable
> documenting
> > > > > as recommendations.
> > > > >
> > > > > Please take a look and add comments / suggestions in the PR.
> > > > >
> > > > > Thanks,
> > > > > Ian
> > > > >
> > > > > On Tue, Nov 21, 2023 at 1:35 PM Dewey Dunnington
> > > > >  wrote:
> > > > >
> > > > >> I also think a set of best practices for Arrow over HTTP would be
> a
> > > > >> valuable resource for the community...even if it never becomes a
> > > > >> specification of its own, it will be beneficial for API
> developers and
> > > > >> consumers of those APIs to have a place to look to understand how
> > > > >> Arrow can help improve throughput/latency/maybe other things.
> Possibly
> > > > >> something like httpbin.org but for requests/responses that use
> Arrow
> > > > >> would be helpful as well. Thank you Ian for leading this effort!
> > > > >>
> > > > >> It has mostly been covered already, but in the (ubiquitous)
> situation
> > > > >> where a response contains some schema/table and some
> non-schema/table
> > > > >> information there is some tension between throughput (best served
> by a
> > > > >> JSON response plus one or more IPC stream responses) and latency
> (best
> > > > >> served by a single HTTP response? JSON? IPC with
> metadata/header?). In
> > > > >> addition to Antoine's list, I would add:
> > > > >>
> > > > >> - How to serve the same table in multiple 

Re: [DISCUSS] [FlightSQL] Supporting parameters and prepared statements with a stateless server

2024-03-11 Thread Andrew Lamb
We are working to incorporate all the feedback we have received so far --
thank you to everyone who commented. If you would like to participate in
this discussion, please leave your comments on GitHub.

I expect we'll call a formal vote later in the week.

Andrew

On Sun, Mar 3, 2024 at 3:26 AM Adam C  wrote:

> Hello,
>
> We would like to support prepared statements with bind parameters with a
> stateless service. This was discussed previously on the mailing list [1].
> The original ticket outlining the proposed design can be found here [2].
>
> I have prepared a specific proposal and would like feedback in preparation
> of calling for a formal vote. Here is a PR with the proposed spec change
> [3].
>
> Here are PRs with implementations in two languages: Go [4] and Rust [5].
>
> Please let me know your thoughts.
>
> [1]:
> https://lists.apache.org/thread/f0xb61z4yw611rw0v8vf9rht0qtq8opc
> [2]:
> https://github.com/apache/arrow/issues/37720
> [3]:
> https://github.com/apache/arrow/pull/40243
> [4]:
> https://github.com/apache/arrow/pull/40311
> [5]:
> https://github.com/apache/arrow-rs/pull/5433
>


[RESULT] [VOTE] Move Arrow DataFusion Subproject to new Top Level Apache Project

2024-03-09 Thread Andrew Lamb
With 30 +1 votes (18 binding by my count), and 0 -1 votes, the proposal is
approved

Thank you everyone who voted and participated in helping form the proposal
(and the community over the last few years). I think there are exciting
times ahead for DataFusion.

We will submit the proposed motion to the ASF board with our next quarterly
report, for the April 2024 meeting, as previously discussed.

Thanks again
Andrew



On Tue, Mar 5, 2024 at 2:00 PM Benjamin Kietzman wrote:

> +1 (binding)
>
> On Tue, Mar 5, 2024, 13:03 Peter Toth  wrote:
>
> > +1 (non-binding)
> >
> > Parth Chandra wrote (on Tue, Mar 5, 2024, 18:15):
> >
> > > +1 (non-binding)
> > >
> > > On Sun, Mar 3, 2024 at 8:04 PM Mehmet Ozan Kabak wrote:
> > >
> > > > +1 (non-binding)
> > > > --
> > > > *Mehmet Ozan Kabak*
> > > > Co-founder & CEO @ Synnada, Inc.
> > > >
> > > >
> > > > On Sun, Mar 3, 2024 at 7:33 PM Jacob Wujciak-Jens wrote:
> > > >
> > > > > +1 (non-binding)
> > > > >
> > > > > On Mon, Mar 4, 2024 at 3:39 AM Yang Jiang wrote:
> > > > >
> > > > > > +1 (non-binding)
> > > > > >
> > > > > > On 2024/03/01 18:08:26 Daniël Heres wrote:
> > > > > > > +1 (binding)
> > > > > > >
> > > > > > > On Fri, Mar 1, 2024, 19:05 Chao Sun wrote:
> > > > > > >
> > > > > > > > +1 (non-binding)
> > > > > > > >
> > > > > > > > On Fri, Mar 1, 2024 at 9:59 AM QP Hou wrote:
> > > > > > > >
> > > > > > > > > +1 (binding)
> > > > > > > > >
> > > > > > > > > exciting milestone :)
> > > > > > > > >
> > > > > > > > > On Fri, Mar 1, 2024 at 9:49 AM David Li <lidav...@apache.org> wrote:
> > > > > > > > > >
> > > > > > > > > > +1
> > > > > > > > > >
> > > > > > > > > > On Fri, Mar 1, 2024, at 12:06, Jorge Cardoso Leitão wrote:
> > > > > > > > > > > +1 - great work!!!
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Mar 1, 2024 at 5:49 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > >> +1 (binding)
> > > > > > > > > > >>
> > > > > > > > > > >> On Friday, March 1, 2024, Uwe L. Korn <uw...@xhochy.com> wrote:
> > > > > > > > > > >>
> > > > > > > > > > >> > +1 (binding)
> > > > > > > > > > >> >
> > > > > > > > > > >> > On Fri, Mar 1, 2024, at 2:37 PM, Andy Grove wrote:
> > > > > > > > > > >> > > +1 (binding)
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > On Fri, Mar 1, 2024 at 6:20 AM Weston Pace <weston.p...@gmail.com> wrote:
> > > > > > > > > > >> > >
> > > > > > > > > > >> > >> +1 (binding)
> > > > > > > > > > >> > >>
> > > > > > > > > > >> > >> On Fri, Mar 1, 2024 at 3:33 AM Andrew Lamb <al...@influxdata.com> wrote:
> > > > > > > > > > >> > >>
> > > > > > > > > > >> > >> > Hello,
> > > > > > > > > > >> > >> >
> > > > > > > > > > >> > >> > As we have discussed[1][2] I would like to vote on the

Re: [DISCUSS] Donation of a Spark native engine based on DataFusion & Arrow

2024-03-06 Thread Andrew Lamb
The blog post is now live at
https://arrow.apache.org/blog/2024/03/06/comet-donation/

On Thu, Feb 29, 2024 at 9:32 AM Andrew Lamb  wrote:

> In case anyone is interested, we are working on a blog post related to
> this donation here [1]. All feedback more than welcome.
>
> [1] https://github.com/apache/arrow-site/pull/479
>
> On Mon, Feb 12, 2024 at 1:37 AM Chao Sun  wrote:
> >
> > Thank you all for the great support and interest on this project!
> >
> > On Sun, Feb 11, 2024 at 12:51 PM Wes McKinney wrote:
> > >
> > > Congrats all! It's great to see the Arrow+DataFusion ecosystem expand in
> > > this way and to bring the work under the ASF umbrella.
> > >
> > > On Sun, Feb 11, 2024 at 5:02 AM Andrew Lamb wrote:
> > >
> > > > As a follow up here the acceptance vote [1] has passed, the IP Clearance
> > > > Process is complete [2] and the code PR is merged[3]!
> > > >
> > > > It is a very exciting time! Congratulations to all involved
> > > >
> > > > Andrew
> > > >
> > > > [1]: https://lists.apache.org/thread/cyfyb96sssmpr73hhm7vh8jcdjbz8rsp
> > > > [2]: https://github.com/apache/arrow-datafusion-comet/pull/2
> > > > [3]: https://github.com/apache/arrow-datafusion-comet/pull/1
> > > >
> > > > On Wed, Jan 24, 2024 at 1:53 PM Jacques Nadeau wrote:
> > > >
> > > > > For those that are interested wrt lang types/lines...
> > > > >
> > > > > ------------------------------------------------------------
> > > > > Language              files      blank    comment       code
> > > > > ------------------------------------------------------------
> > > > > Rust                     69       2701       2548      17154
> > > > > Scala                    69       2098       2595      12991
> > > > > Java                     41        926       1521       5505
> > > > > Maven                     4         71        156       1228
> > > > > Protocol Buffers          3         96         65        417
> > > > > XML                       3         80         99        256
> > > > > Markdown                  5         69         80        190
> > > > > TOML                      2         14         38         90
> > > > > Bourne Shell              1          9         39         65
> > > > > make                      1          5          1         62
> > > > > Bourne Again Shell        1         12         16         56
> > > > > YAML                      2          5         38         34
> > > > > Properties                2          8         42         26
> > > > > SQL                       1          0          0          9
> > > > > ------------------------------------------------------------
> > > > > SUM:                    204       6094       7238      38083
> > > > > ------------------------------------------------------------
> > > > >
> > > > > On Wed, Jan 24, 2024 at 8:30 AM Chao Sun wrote:
> > > > >
> > > > > > Thanks Jacques and everyone here for the feedback! We just created a
> > > > > > PR https://github.com/apache/arrow-datafusion-comet/pull/1 for the
> > > > > > donation vote and IP clearance. Please take a look there and provide
> > > > > > your valuable comments.
> > > > > >
> > > > > > Best,
> > > > > > Chao
> > > > > >
> > > > > > On Thu, Jan 18, 2024 at 5:24 PM Jacques Nadeau <jacq...@apache.org>

Re: [DISCUSS] [FlightSQL] Supporting parameters and prepared statements with a stateless server

2024-03-03 Thread Andrew Lamb
Thanks a lot for pulling this together Adam.

I likewise took a look at the proposal and I think it looks great
(disclaimer, I wrote the original issue, so I am not unbiased). I
reviewed the spec proposal and the Rust implementation. If no one has
time to review the go implementation, I can take a look at that later
in the week

In terms of process, I suggest we leave this thread open for a week to
permit time for comments, and assuming there are no unresolved
concerns, we start a formal vote thread to change the spec.

Andrew

On Sat, Mar 2, 2024 at 1:24 PM David Li  wrote:
>
> The proposal looks good to me. Thanks!
>
> On Fri, Mar 1, 2024, at 20:11, Adam Curtis wrote:
> > Hello,
> >
> > We would like to support prepared statements with bind parameters with
> > a stateless service. This was discussed previously on the mailing list
> > [1]. The original ticket outlining the proposed design can be found
> > here [2].
> >
> > I have prepared a specific proposal and would like feedback in
> > preparation of calling for a formal vote. Here is a PR with the
> > proposed spec change [3].
> >
> > Here are PRs with implementations in two languages: Go [4] and Rust [5].
> >
> > Please let me know your thoughts.
> >
> > [1]:
> > https://lists.apache.org/thread/f0xb61z4yw611rw0v8vf9rht0qtq8opc
> > [2]:
> > https://github.com/apache/arrow/issues/37720
> > [3]:
> > https://github.com/apache/arrow/pull/40243
> > [4]:
> > https://github.com/apache/arrow/pull/40311
> > [5]:
> > https://github.com/apache/arrow-rs/pull/5433


Re: [VOTE][RUST] Release Apache Arrow Rust Object Store 0.9.1 RC1

2024-03-01 Thread Andrew Lamb
+1 (binding)

Verified on M3 mac

Thank you Raphael

Andrew

On Fri, Mar 1, 2024 at 2:45 AM L. C. Hsieh  wrote:
>
> +1 (binding)
>
> Verified on M1 Mac.
>
> Thanks Raphael.
>
>
> On Thu, Feb 29, 2024 at 11:10 PM Raphael Taylor-Davies wrote:
> >
> > Hi,
> >
> > I would like to propose a release of Apache Arrow Rust Object
> > Store Implementation, version 0.9.1.
> >
> > This release candidate is based on commit:
> > 30151220c29fa5e01365c2a4e153de01d5d2c041 [1]
> >
> > The proposed release tarball and signatures are hosted at [2].
> >
> > The changelog is located at [3].
> >
> > Please download, verify checksums and signatures, run the unit tests,
> > and vote on the release. There is a script [4] that automates some of
> > the verification.
> >
> > The vote will be open for at least 72 hours.
> >
> > [ ] +1 Release this as Apache Arrow Rust Object Store
> > [ ] +0
> > [ ] -1 Do not release this as Apache Arrow Rust Object Store because...
> >
> > [1]:
> > https://github.com/apache/arrow-rs/tree/30151220c29fa5e01365c2a4e153de01d5d2c041
> > [2]:
> > https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-object-store-rs-0.9.1-rc1
> > [3]:
> > https://github.com/apache/arrow-rs/blob/30151220c29fa5e01365c2a4e153de01d5d2c041/object_store/CHANGELOG.md
> > [4]:
> > https://github.com/apache/arrow-rs/blob/master/object_store/dev/release/verify-release-candidate.sh
> >


[VOTE] Move Arrow DataFusion Subproject to new Top Level Apache Project

2024-03-01 Thread Andrew Lamb
Hello,

As we have discussed[1][2] I would like to vote on the proposal to
create a new Apache Top Level Project for DataFusion. The text of the
proposed resolution and background document is copy/pasted below

If the community is in favor of this, we plan to submit the resolution
to the ASF board for approval with the next Arrow report (for the
April 2024 board meeting).

The vote will be open for at least 7 days.

[ ] +1 Accept this Proposal
[ ] +0
[ ] -1 Do not accept this proposal because...

Andrew

[1] https://lists.apache.org/thread/c150t1s1x0kcb3r03cjyx31kqs5oc341
[2] https://github.com/apache/arrow-datafusion/discussions/6475

-- Proposed Resolution -

Resolution to Create the Apache DataFusion Project from the Apache
Arrow DataFusion Sub Project

=

X. Establish the Apache DataFusion Project

WHEREAS, the Board of Directors deems it to be in the best
interests of the Foundation and consistent with the
Foundation's purpose to establish a Project Management
Committee charged with the creation and maintenance of
open-source software related to an extensible query engine
for distribution at no charge to the public.

NOW, THEREFORE, BE IT RESOLVED, that a Project Management
Committee (PMC), to be known as the "Apache DataFusion Project",
be and hereby is established pursuant to Bylaws of the
Foundation; and be it further

RESOLVED, that the Apache DataFusion Project be and hereby is
responsible for the creation and maintenance of software
related to an extensible query engine; and be it further

RESOLVED, that the office of "Vice President, Apache DataFusion" be
and hereby is created, the person holding such office to
serve at the direction of the Board of Directors as the chair
of the Apache DataFusion Project, and to have primary responsibility
for management of the projects within the scope of
responsibility of the Apache DataFusion Project; and be it further

RESOLVED, that the persons listed immediately below be and
hereby are appointed to serve as the initial members of the
Apache DataFusion Project:

* Andy Grove (agr...@apache.org)
* Andrew Lamb (al...@apache.org)
* Daniël Heres (dhe...@apache.org)
* Jie Wen (jake...@apache.org)
* Kun Liu (liu...@apache.org)
* Liang-Chi Hsieh (vii...@apache.org)
* Qingping Hou (ho...@apache.org)
* Wes McKinney (w...@apache.org)
* Will Jones (wjones...@apache.org)

RESOLVED, that the Apache DataFusion Project be and hereby
is tasked with the migration and rationalization of the Apache
Arrow DataFusion sub-project; and be it further

RESOLVED, that all responsibilities pertaining to the Apache
Arrow DataFusion sub-project encumbered upon the
Apache Arrow Project are hereafter discharged.

NOW, THEREFORE, BE IT FURTHER RESOLVED, that Andrew Lamb
be appointed to the office of Vice President, Apache DataFusion, to
serve in accordance with and subject to the direction of the
Board of Directors and the Bylaws of the Foundation until
death, resignation, retirement, removal or disqualification,
or until a successor is appointed.
=


---


Summary:

We propose creating a new top level project, Apache DataFusion, from
an existing sub project of Apache Arrow to facilitate additional
community and project growth.

Abstract

Apache Arrow DataFusion[1]  is a very fast, extensible query engine
for building high-quality data-centric systems in Rust, using the
Apache Arrow in-memory format. DataFusion offers SQL and Dataframe
APIs, excellent performance, built-in support for CSV, Parquet, JSON,
and Avro, extensive customization, and a great community.

[1] https://arrow.apache.org/datafusion/


Proposal

We propose creating a new top level ASF project, Apache DataFusion,
governed initially by a subset of the Apache Arrow project’s PMC and
committers. The project’s code is in five existing git repositories,
currently governed by Apache Arrow, which would transfer to the new top
level project.

Background

When DataFusion was initially donated to the Arrow project, it did not
have a strong enough community to stand on its own. It has since grown
significantly, and benefited immensely from being part of Arrow and
nurturing of the Apache Way, and now has a community strong enough to
stand on its own and that would benefit from focused governance
attention.

The community has discussed this idea publicly for more than 6 months
https://github.com/apache/arrow-datafusion/discussions/6475  and
briefly on the Arrow PMC mailing list
https://lists.apache.org/thread/thv2jdm6640l6gm88hy8jhk5prjww0cs. As
of the time of this writing both had exclusively positive reactions.

Several current members of the Arrow PMC are both active contributors
to DataFusion and understand and believe deeply in the Apache Way, and
play active governance roles in the Arrow project as PMC members and
PMC chairs, guiding the community, and releasi

Re: [DISCUSS][RFC] Draft Proposal for new Top Level Project for DataFusion

2024-03-01 Thread Andrew Lamb
Thank you everyone for your help and support on this document. Other
edits to the document were:
1. Added existing Arrow committer and DataFusion / arrow-rs
contributor Yang Jiang (yangjiang, eBay) to the proposed committer list
(inadvertently omitted initially)
2. Added Wes to the proposed PMC

I will now start an official vote thread. Thank you all very much

Andrew

On Thu, Feb 29, 2024 at 5:36 AM Andrew Lamb  wrote:
>
> Thank you Wes. I have added your name to the proposed PMC.
>
> On Wed, Feb 28, 2024 at 4:08 PM Wes McKinney  wrote:
> >
> > I'd be happy to help. I think we will have to participate in PMC matters 
> > infrequently (should there be a difficult issue in the future, we could 
> > offer some perspective from cases in the past).
> >
> > On Wed, Feb 28, 2024 at 2:13 PM Andrew Lamb  wrote:
> >>
> >> Wes brought up a great point on the document[1] that I wanted to discuss 
> >> here more broadly:
> >>
> >> > Others may point out that (I think) you don't have any ASF Members on 
> >> > your initial PMC. When we started Arrow, we had several veteran ASF 
> >> > members on our initial PMC who haven't been very active in the project 
> >> > otherwise. If you wanted Jacques or I (both Members), for example, to 
> >> > serve on the PMC in that capacity we would likely be happy to do that.
> >>
> >> I personally think having ASF Member(s) [2] on the PMC would be most 
> >> helpful to connect us to the larger organization and would like to add Wes 
> >> and/or Jacques if they are willing to do so (are you Wes / Jacques)?
> >>
> >> If there are no concerns and Wes / Jacques are willing I will add their 
> >> names to the proposed initial PMC.
> >>
> >> Andrew
> >>
> >> [1] 
> >> https://docs.google.com/document/d/11WTNYS8KWScOt3ySTX39WVS6krPhUvHsuJRY9PZQx4g/edit?disco=AAABH2b6I88
> >> [2] https://www.apache.org/foundation/members
> >>
> >> On Mon, Feb 26, 2024 at 5:10 PM Andrew Lamb  wrote:
> >>>
> >>> An update:
> >>>
> >>> I have updated the proposal [1] with additional information (new 
> >>> committers Jeffrey Vo and Jay Zhan, and the new datafusion-comet 
> >>> repository)
> >>>
> >>> I plan to:
> >>> 1. Call for a formal vote on this (dev@arrow.apache.org) mailing list 
> >>> this Friday, March 1
> >>> 2. If the vote passes, submit the proposal to the ASF board as part of 
> >>> the April 2024 Arrow report.
> >>>
> >>> This extended timeline is designed to balance the needs of some 
> >>> contributors to prepare for the changed structure with their employers.
> >>>
> >>> Full Details can be found on [2].
> >>>
> >>> Thank you,
> >>> Andrew
> >>>
> >>> [1] 
> >>> https://docs.google.com/document/d/11WTNYS8KWScOt3ySTX39WVS6krPhUvHsuJRY9PZQx4g/edit
> >>> [2] https://github.com/apache/arrow-datafusion/discussions/6475
> >>>
> >>> On Fri, Jan 5, 2024 at 11:19 AM Andrew Lamb  wrote:
> >>>>
> >>>> Thank you very much
> >>>>
> >>>> On Fri, Jan 5, 2024 at 11:17 AM Jean-Baptiste Onofré wrote:
> >>>>>
> >>>>> Hi Andrew,
> >>>>>
> >>>>> The PODLINGNAMESEARCH is not yet completed: the VP Brand Management
> >>>>> (Mark Thomas) should comment in the Jira to approve or not the name.
> >>>>>
> >>>>> I added a comment in the Jira to ping Mark. He should get back to us 
> >>>>> soon.
> >>>>>
> >>>>> Regards
> >>>>> JB
> >>>>>
> >>>>> On Fri, Jan 5, 2024 at 3:38 PM Andrew Lamb  wrote:
> >>>>> >
> >>>>> > Thanks JB,
> >>>>> >
> >>>>> > I did do a name search and posted the results here [1]
> >>>>> >
> >>>>> > However, I am not sure what the next steps for that particular 
> >>>>> > process is
> >>>>> > (like does someone have to approve it, for example?)
> >>>>> >
> >>>>> > Any insight you could provide would be greatly appreciated
> >>>>> >
> >>>>> > Andrew
> >>>>> >
> >>>>> > [1] https://issues.apache.org/jira/brows

Re: [DISCUSS] Donation of a Spark native engine based on DataFusion & Arrow

2024-02-29 Thread Andrew Lamb
In case anyone is interested, we are working on a blog post related to
this donation here [1]. All feedback more than welcome.

[1] https://github.com/apache/arrow-site/pull/479

On Mon, Feb 12, 2024 at 1:37 AM Chao Sun  wrote:
>
> Thank you all for the great support and interest on this project!
>
> On Sun, Feb 11, 2024 at 12:51 PM Wes McKinney  wrote:
> >
> > Congrats all! It's great to see the Arrow+DataFusion ecosystem expand in
> > this way and to bring the work under the ASF umbrella.
> >
> > On Sun, Feb 11, 2024 at 5:02 AM Andrew Lamb  wrote:
> >
> > > As a follow up here the acceptance vote [1] has passed, the IP Clearance
> > > Process is complete [2] and the code PR is merged[3]!
> > >
> > > It is a very exciting time! Congratulations to all involved
> > >
> > > Andrew
> > >
> > > [1]: https://lists.apache.org/thread/cyfyb96sssmpr73hhm7vh8jcdjbz8rsp
> > > [2]: https://github.com/apache/arrow-datafusion-comet/pull/2
> > > [3]: https://github.com/apache/arrow-datafusion-comet/pull/1
> > >
> > > On Wed, Jan 24, 2024 at 1:53 PM Jacques Nadeau  wrote:
> > >
> > > > For those that are interested wrt lang types/lines...
> > > >
> > > > ------------------------------------------------------------
> > > > Language              files      blank    comment       code
> > > > ------------------------------------------------------------
> > > > Rust                     69       2701       2548      17154
> > > > Scala                    69       2098       2595      12991
> > > > Java                     41        926       1521       5505
> > > > Maven                     4         71        156       1228
> > > > Protocol Buffers          3         96         65        417
> > > > XML                       3         80         99        256
> > > > Markdown                  5         69         80        190
> > > > TOML                      2         14         38         90
> > > > Bourne Shell              1          9         39         65
> > > > make                      1          5          1         62
> > > > Bourne Again Shell        1         12         16         56
> > > > YAML                      2          5         38         34
> > > > Properties                2          8         42         26
> > > > SQL                       1          0          0          9
> > > > ------------------------------------------------------------
> > > > SUM:                    204       6094       7238      38083
> > > > ------------------------------------------------------------
> > > >
> > > > On Wed, Jan 24, 2024 at 8:30 AM Chao Sun  wrote:
> > > >
> > > > > Thanks Jacques and everyone here for the feedback! We just created a
> > > > > PR https://github.com/apache/arrow-datafusion-comet/pull/1 for the
> > > > > donation vote and IP clearance. Please take a look there and provide
> > > > > your valuable comments.
> > > > >
> > > > > Best,
> > > > > Chao
> > > > >
> > > > > On Thu, Jan 18, 2024 at 5:24 PM Jacques Nadeau wrote:
> > > > > >
> > > > > > Yes, that was roughly what I was requesting (I was suggesting a single PR
> > > > > > with many commits that would be merged with the history).
> > > > > >
> > > > > > It's hard to provide a more concrete opinion on this without seeing the
> > > > > > quantity and complexity of the code. If it's 5,000 lines of code, it
> > > > > > probably doesn't matter. If it's 500,000, it's probably pretty important.
> > > > > > If 10 active Arrow/Datafu

Re: [DISCUSS][RFC] Draft Proposal for new Top Level Project for DataFusion

2024-02-29 Thread Andrew Lamb
Thank you Wes. I have added your name to the proposed PMC.

On Wed, Feb 28, 2024 at 4:08 PM Wes McKinney  wrote:
>
> I'd be happy to help. I think we will have to participate in PMC matters 
> infrequently (should there be a difficult issue in the future, we could offer 
> some perspective from cases in the past).
>
> On Wed, Feb 28, 2024 at 2:13 PM Andrew Lamb  wrote:
>>
>> Wes brought up a great point on the document[1] that I wanted to discuss 
>> here more broadly:
>>
>> > Others may point out that (I think) you don't have any ASF Members on your 
>> > initial PMC. When we started Arrow, we had several veteran ASF members on 
>> > our initial PMC who haven't been very active in the project otherwise. If 
>> > you wanted Jacques or I (both Members), for example, to serve on the PMC 
>> > in that capacity we would likely be happy to do that.
>>
>> I personally think having ASF Member(s) [2] on the PMC would be most helpful 
>> to connect us to the larger organization and would like to add Wes and/or 
>> Jacques if they are willing to do so (are you Wes / Jacques)?
>>
>> If there are no concerns and Wes / Jacques are willing I will add their 
>> names to the proposed initial PMC.
>>
>> Andrew
>>
>> [1] 
>> https://docs.google.com/document/d/11WTNYS8KWScOt3ySTX39WVS6krPhUvHsuJRY9PZQx4g/edit?disco=AAABH2b6I88
>> [2] https://www.apache.org/foundation/members
>>
>> On Mon, Feb 26, 2024 at 5:10 PM Andrew Lamb  wrote:
>>>
>>> An update:
>>>
>>> I have updated the proposal [1] with additional information (new committers 
>>> Jeffrey Vo and Jay Zhan, and the new datafusion-comet repository)
>>>
>>> I plan to:
>>> 1. Call for a formal vote on this (dev@arrow.apache.org) mailing list this 
>>> Friday, March 1
>>> 2. If the vote passes, submit the proposal to the ASF board as part of the 
>>> April 2024 Arrow report.
>>>
>>> This extended timeline is designed to balance the needs of some 
>>> contributors to prepare for the changed structure with their employers.
>>>
>>> Full Details can be found on [2].
>>>
>>> Thank you,
>>> Andrew
>>>
>>> [1] 
>>> https://docs.google.com/document/d/11WTNYS8KWScOt3ySTX39WVS6krPhUvHsuJRY9PZQx4g/edit
>>> [2] https://github.com/apache/arrow-datafusion/discussions/6475
>>>
>>> On Fri, Jan 5, 2024 at 11:19 AM Andrew Lamb  wrote:
>>>>
>>>> Thank you very much
>>>>
>>>> On Fri, Jan 5, 2024 at 11:17 AM Jean-Baptiste Onofré wrote:
>>>>>
>>>>> Hi Andrew,
>>>>>
>>>>> The PODLINGNAMESEARCH is not yet completed: the VP Brand Management
>>>>> (Mark Thomas) should comment in the Jira to approve or not the name.
>>>>>
>>>>> I added a comment in the Jira to ping Mark. He should get back to us soon.
>>>>>
>>>>> Regards
>>>>> JB
>>>>>
>>>>> On Fri, Jan 5, 2024 at 3:38 PM Andrew Lamb  wrote:
>>>>> >
>>>>> > Thanks JB,
>>>>> >
>>>>> > I did do a name search and posted the results here [1]
>>>>> >
>>>>> > However, I am not sure what the next steps for that particular process 
>>>>> > is
>>>>> > (like does someone have to approve it, for example?)
>>>>> >
>>>>> > Any insight you could provide would be greatly appreciated
>>>>> >
>>>>> > Andrew
>>>>> >
>>>>> > [1] https://issues.apache.org/jira/browse/PODLINGNAMESEARCH-219
>>>>> >
>>>>> >
>>>>> > On Fri, Jan 5, 2024 at 7:55 AM Jean-Baptiste Onofré wrote:
>>>>> >
>>>>> > > Hi Andrew,
>>>>> > >
>>>>> > > I did a quick review on the doc and it looks good to me. I just added
>>>>> > > a question about name search (DataFusion will probably work as TLP,
>>>>> > > but we have to check as we have a new Apache name moving from Arrow
>>>>> > > DataFusion to DataFusion).
>>>>> > >
>>>>> > > Please let me know if I can help on that.
>>>>> > >
>>>>> > > Thanks !
>>>>> > > Regards
>>>>> > > JB
>>>>> > >
>>>&g

Re: [VOTE] Flight RPC: add 'fallback' URI scheme

2024-02-28 Thread Andrew Lamb
+1


On Tue, Feb 27, 2024 at 9:06 AM David Li  wrote:

> I would like to propose a 'reuse connection' URI scheme for Flight RPC.
> This proposal was previously discussed at [1]. A candidate implementation
> for C++, Java, and Go is at [2].
>
> The vote will be open for at least 72 hours.
>
> [ ] +1
> [ ] +0
> [ ] -1 Do not accept this proposal because...
>
> [1]: https://lists.apache.org/thread/pc9fs0hf8t5ylj9os00r9vg8d2xv2npz
> [2]: https://github.com/apache/arrow/pull/40084
>
> On Tue, Feb 20, 2024, at 14:14, David Li wrote:
> > Thanks for the comments - I've updated the implementation [1] and added
> > Go + integration tests. If this all checks out I'd like to start a vote
> > soon.
> >
> > [1]: https://github.com/apache/arrow/pull/40084
> >
> > On Fri, Feb 16, 2024, at 13:43, Andrew Lamb wrote:
> >> Thank you -- I think the use case is great, but agree with the other
> >> reviewers that the name may be confusing. I left some notes on the ticket
> >>
> >> Andrew
> >>
> >> On Wed, Feb 14, 2024 at 3:52 PM David Li  wrote:
> >>
> >>> I've put up a candidate implementation sans integration test [1].
> >>>
> >>> Some caveats:
> >>> - java.net.URI doesn't accept 'scheme://', only 'scheme:/' or 'scheme://?'
> >>> (yes, an empty query string pacifies it). I've chosen the latter since the
> >>> former is technically a URI with a non-empty path but neither is ideal.
> >>> - I've changed the scheme to 'arrow-flight-reuse-connection' to be more
> >>> faithful to the intended use than 'fallback'.
> >>>
> >>> [1]: https://github.com/apache/arrow/pull/40084
> >>>
> >>> On Tue, Feb 13, 2024, at 13:01, Jean-Baptiste Onofré wrote:
> >>> > Hi David,
> >>> >
> >>> > It's reasonable. I think we can start with your initial proposal (it
> >>> > sounds fine to me) and we can always improve step by step.
> >>> >
> >>> > Thanks !
> >>> > Regards
> >>> > JB
> >>> >
> >>> > On Tue, Feb 13, 2024 at 4:53 PM David Li wrote:
> >>> >>
> >>> >> I'm going to keep the proposal as-is then. It can be extended if this
> >>> >> use case comes up.
> >>> >>
> >>> >> I'll start work on candidate implementations now.
> >>> >>
> >>> >> On Tue, Feb 13, 2024, at 03:22, Antoine Pitrou wrote:
> >>> >> > I think the original proposal is sufficient.
> >>> >> >
> >>> >> > Also, it is not obvious to me how one would switch from e.g. grpc+tls to
> >>> >> > http without an explicit server location (unless both Flight servers are
> >>> >> > hosted under the same port?). So the "+" proposal seems a bit weird.
> >>> >> >
> >>> >> >
> >>> >> > On Feb 12, 2024 at 23:39, David Li wrote:
> >>> >> >> The idea is that the client would reuse the existing connection, in
> >>> >> >> which case the protocol and such are implicit. (If the client doesn't
> >>> >> >> have a connection anymore, it can't use the fallback anyways.)
> >>> >> >>
> >>> >> >> I suppose this has the advantage that you could "fall back" to a
> >>> >> >> known hostname with a different protocol, but I'm not sure that always
> >>> >> >> applies anyways. (Correct me if I'm wrong Matt, but as I recall, UCX
> >>> >> >> addresses aren't hostnames but rather opaque byte blobs, for instance.)
> >>> >> >>
> >>> >> >> If we do prefer this, to avoid overloading the hostname, there's
> >>> >> >> also the informal convention of using + in the scheme, so it could be
> >>> >> >> arrow-flight-fallback+grpc+tls://, arrow-flight-fallback+http://, etc.
> >>> >> >>
> >>> >> >> On Mon, Feb 12, 2024, at 17:03, Joel Lubinitsky wrote:
> >>> >> >>> Thanks for clarifying.
> >>> >> >>>
> >>> >> >>> Given the relationship between these two proposals, would it also be
> >>> >> >>> necessary to distinguish the scheme (or schemes) supported by the
> >>> >> >>> originating Flight RPC service?
> >>> >> >>>
> >>> >>
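
As a hedged illustration of the java.net.URI caveat in the quoted message
from David Li above, the following minimal Java sketch checks which of the
three spellings the parser accepts. Only the scheme name
'arrow-flight-reuse-connection' comes from the thread; the class name
ReuseConnectionUriCheck and the helper parses() are hypothetical names
introduced here, and nothing below is taken from the candidate
implementation in the linked PR.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper class, for illustration only.
class ReuseConnectionUriCheck {
    // Scheme name as given in the thread.
    static final String SCHEME = "arrow-flight-reuse-connection";

    /** Returns true if java.net.URI parses the string, false if it throws. */
    static boolean parses(String candidate) {
        try {
            new URI(candidate);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The three spellings discussed: empty authority, path-only,
        // and empty authority followed by an empty query string.
        for (String suffix : new String[] {"://", ":/", "://?"}) {
            String candidate = SCHEME + suffix;
            String verdict = parses(candidate) ? "accepted" : "rejected";
            System.out.println(candidate + " -> " + verdict);
        }
    }
}
```

Running main prints one line per spelling. Per the thread, the bare
'scheme://' form is the one expected to be rejected, while 'scheme:/' and
'scheme://?' parse; that is why the message settles on the trailing '?'.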

Re: [DISCUSS][RFC] Draft Proposal for new Top Level Project for DataFusion

2024-02-28 Thread Andrew Lamb
Wes brought up a great point on the document[1] that I wanted to discuss
here more broadly:

> Others may point out that (I think) you don't have any ASF Members on
> your initial PMC. When we started Arrow, we had several veteran ASF members
> on our initial PMC who haven't been very active in the project otherwise.
> If you wanted Jacques or I (both Members), for example, to serve on the PMC
> in that capacity we would likely be happy to do that.

I personally think having ASF Member(s) [2] on the PMC would be most
helpful to connect us to the larger organization and would like to add Wes
and/or Jacques if they are willing to do so (are you Wes / Jacques)?

If there are no concerns and Wes / Jacques are willing I will add their
names to the proposed initial PMC.

Andrew

[1]
https://docs.google.com/document/d/11WTNYS8KWScOt3ySTX39WVS6krPhUvHsuJRY9PZQx4g/edit?disco=AAABH2b6I88
[2] https://www.apache.org/foundation/members

On Mon, Feb 26, 2024 at 5:10 PM Andrew Lamb  wrote:

> An update:
>
> I have updated the proposal [1] with additional information (new
> committers Jeffrey Vo and Jay Zhan, and the new datafusion-comet repository)
>
> I plan to:
> 1. Call for a formal vote on this (dev@arrow.apache.org) mailing list
> this Friday, March 1
> 2. If the vote passes, submit the proposal to the ASF board as part of the
> April 2024 Arrow report.
>
> This extended timeline is designed to balance the needs of some
> contributors to prepare for the changed structure with their employers.
>
> Full Details can be found on [2].
>
> Thank you,
> Andrew
>
> [1]
> https://docs.google.com/document/d/11WTNYS8KWScOt3ySTX39WVS6krPhUvHsuJRY9PZQx4g/edit
> [2] https://github.com/apache/arrow-datafusion/discussions/6475
>
> On Fri, Jan 5, 2024 at 11:19 AM Andrew Lamb  wrote:
>
>> Thank you very much
>>
>> On Fri, Jan 5, 2024 at 11:17 AM Jean-Baptiste Onofré wrote:
>>
>>> Hi Andrew,
>>>
>>> The PODLINGNAMESEARCH is not yet completed: the VP Brand Management
>>> (Mark Thomas) should comment in the Jira to approve or not the name.
>>>
>>> I added a comment in the Jira to ping Mark. He should get back to us soon.
>>>
>>> Regards
>>> JB
>>>
>>> On Fri, Jan 5, 2024 at 3:38 PM Andrew Lamb  wrote:
>>> >
>>> > Thanks JB,
>>> >
>>> > I did do a name search and posted the results here [1]
>>> >
>>> > However, I am not sure what the next steps for that particular process are
>>> > (like does someone have to approve it, for example?)
>>> >
>>> > Any insight you could provide would be greatly appreciated
>>> >
>>> > Andrew
>>> >
>>> > [1] https://issues.apache.org/jira/browse/PODLINGNAMESEARCH-219
>>> >
>>> >
>>> > On Fri, Jan 5, 2024 at 7:55 AM Jean-Baptiste Onofré wrote:
>>> >
>>> > > Hi Andrew,
>>> > >
>>> > > I did a quick review on the doc and it looks good to me. I just added
>>> > > a question about name search (DataFusion will probably work as TLP,
>>> > > but we have to check as we have a new Apache name moving from Arrow
>>> > > DataFusion to DataFusion).
>>> > >
>>> > > Please let me know if I can help on that.
>>> > >
>>> > > Thanks !
>>> > > Regards
>>> > > JB
>>> > >
>>> > > On Fri, Jan 5, 2024 at 12:26 PM Andrew Lamb 
>>> wrote:
>>> > > >
>>> > > > Upon reviewing the board report template, I am planning on the
>>> following
>>> > > > schedule:
>>> > > > 1. I'll leave this proposal for another few weeks to gather any
>>> > > additional
>>> > > > input
>>> > > > 2. In early February 2024 I'll start a formal vote thread on the
>>> dev@
>>> > > > mailing list for this proposal
>>> > > > 3. If the vote passes, I'll submit a proposed resolution to the
>>> ASF board
>>> > > > for their meeting in April 2024 using the pre-existing template[1]
>>> > > >
>>> > > >
>>> > > > [1]
>>> > > >
>>> > >
>>> https://svn.apache.org/repos/private/committers/board/templates/subproject-tlp-resolution.txt
>>> > > >
>>> > > > On Wed, Dec 27, 2023 at 6:32 PM L. C. Hsieh 
>>> wrote:
>>> > > >
>>> > > > > Thanks for writin

Re: [DISCUSS] Move sqlparser-rs back into DataFusion project?

2024-02-28 Thread Andrew Lamb
One potential way "moving sqlparser-rs into DataFusion" could look is that
the code/repo is moved from the sqlparser-rs [1] organization to the apache
organization. For example:

https://github.com/sqlparser-rs/sqlparser-rs
to
https://github.com/apache/datafusion-sqlparser

We could continue development separately from any other code, release it as
a separate artifact, but use the same overarching governance structure
(voting on releases, committer access, etc)

To follow this model, I think the largest work item would be to run the IP
clearance process, and since sqlparser-rs has many distinct contributors
that may take a while

Andrew



On Wed, Feb 28, 2024 at 1:45 AM Aldrin  wrote:

> Maybe it would be valuable to more explicitly define "moving back into
> DataFusion project".
>
> I assumed it meant absorbing into the datafusion repo, but it occurs to me
> that may not be the case. Then, how would sqlparser-rs be "moved"?
>
>
>
> # --
> # Aldrin
>
>
> https://github.com/drin/
> https://gitlab.com/octalene
> https://keybase.io/octalene
>
>
> On Tuesday, February 27th, 2024 at 16:20, Chak-Pong Chung <
> chakpongch...@gmail.com> wrote:
>
> > There are cases where people need datafusion but not a SQL parser. For
> > example, people building a composable query engine for graph or other
> data
> > modality may not choose SQL as the DSL. Decoupling them seems to be a
> good
> > idea.
> >
>
> > On Tue, Feb 27, 2024, 6:20 AM Mehmet Ozan Kabak o...@synnada.ai wrote:
> >
>
> > > In this case, maybe we can bring sqlparser-rs into the ASF umbrella
> > > following the arrow-datafusion model?
> > >
>
> > > Once DataFusion becomes a top-level project, we could move it to
> > > datafusion-sqlparser-rs — it would be a quasi-independent project just
> like
> > > how DataFusion is today w.r.t. Arrow. But it would get most benefits of
> > > having a community behind it.
> > >
>
> > > > On Feb 27, 2024, at 2:11 AM, Andrew Lamb al...@influxdata.com wrote:
> > > >
>
> > > > Julian, thank you for your insight. I very much agree with it.
> > > >
>
> > > > > I think the ASF is wrong on this. I think it needs to provide a
> home
> > > > > for medium-sized projects such as sqlparser-rs in an existing
> > > > > top-level project;
> > > >
>
> > > > It could be said that DataFusion fits this model -- it isn't really
> an
> > > > "Arrow" project but needed a place to live and grow, and the Arrow
> ASF
> > > > community provided that.
> > > >
>
> > > > Andrew
> > > >
>
> > > > On Mon, Feb 26, 2024 at 1:09 PM Julian Hyde jh...@apache.org wrote:
> > > >
>
> > > > > I am torn on this.
> > > > >
>
> > > > > > On one hand, I am a big fan of components that are standalone -
> have
> > > > > no more dependencies than necessary, and are self-evidently
> > > > > standalone. So, I think that re-absorbing sqlparser-rs back into
> > > > > DataFusion would not be a good step. It would reduce the perception
> > > > > that it is standalone.
> > > > >
>
> > > > > On the other hand, it sounds as if sqlparser-rs would benefit by
> > > > > having an Apache-like community around it. DataFusion isn't a
> perfect
> > > > > fit - there is not much overlap between DataFusion and sqlparser-rs
> > > > > users - but it takes a lot of effort to create and run a top-level
> > > > > project, and DataFusion is already up and running.
> > > > >
>
> > > > > The tension is that people want to consume components that they
> > > > > perceive to be standalone, and yet the ASF wants to create
> communities
> > > > > that produce either a single large component or sets of
> highly-coupled
> > > > > components. The ASF used to do 'umbrella projects' whose
> sub-projects
> > > > > were in the same subject area but had little or no dependencies.
> For
> > > > > example, Apache DB [ https://db.apache.org/ ] has JDO, Derby and
> > > > > Torque. And commons included many useful Java libraries. Umbrella
> > > > > projects caused problems during the Jakarta and Hadoop eras, and
> now
> > > > > are strongly discouraged at the ASF.
> > > > >
>
> > > > > I think the ASF is wrong on this. I think it ne

[DataFusion] Proposed changes to the TreeNode API

2024-02-27 Thread Andrew Lamb
I would like to draw some additional attention to any DataFusion user who
uses the TreeNode API heavily.

There is a PR with a proposed improvement to the API in [1]. Please share
any comments you may have on the PR.

Andrew

[1] https://github.com/apache/arrow-datafusion/pull/8891


Re: [DISCUSS] Move sqlparser-rs back into DataFusion project?

2024-02-27 Thread Andrew Lamb
Julian, thank you for your insight. I very much agree with it.

> I think the ASF is wrong on this. I think it needs to provide a home
> for medium-sized projects such as sqlparser-rs in an existing
> top-level project;

It could be said that DataFusion fits this model  -- it isn't really an
"Arrow" project but needed a place to live and grow, and the Arrow ASF
community provided that.

Andrew




On Mon, Feb 26, 2024 at 1:09 PM Julian Hyde  wrote:

> I am torn on this.
>
> On one hand, I am a big fan of components that are standalone - have
> no more dependencies than necessary, and are self-evidently
> standalone. So, I think that re-absorbing sqlparser-rs back into
> DataFusion would not be a good step. It would reduce the perception
> that it is standalone.
>
> On the other hand, it sounds as if sqlparser-rs would benefit by
> having an Apache-like community around it. DataFusion isn't a perfect
> fit - there is not much overlap between DataFusion and sqlparser-rs
> users - but it takes a lot of effort to create and run a top-level
> project, and DataFusion is already up and running.
>
> The tension is that people want to consume components that they
> perceive to be standalone, and yet the ASF wants to create communities
> that produce either a single large component or sets of highly-coupled
> components. The ASF used to do 'umbrella projects' whose sub-projects
> were in the same subject area but had little or no dependencies. For
> example, Apache DB [ https://db.apache.org/ ] has JDO, Derby and
> Torque. And commons included many useful Java libraries. Umbrella
> projects caused problems during the Jakarta and Hadoop eras, and now
> are strongly discouraged at the ASF.
>
> I think the ASF is wrong on this. I think it needs to provide a home
> for medium-sized projects such as sqlparser-rs in an existing
> top-level project; maybe those projects grow into top-level projects,
> or maybe they remain medium-sized projects. This is especially
> necessary in the Rust community, where there are many exciting
> projects, but they are almost all happening outside ASF. (This is
> exactly where Java was in ~2005. Maybe we need a rust-commons or
> rust-db?)
>
> My conclusion is to leave sqlparser-rs where it is for now, but to
> continue talking about what might be an attractive home for it in ASF.
>
> Julian
>
> On Mon, Feb 26, 2024 at 8:12 AM Andrew Lamb  wrote:
> >
> > Sorry for the late reply,
> >
> > I think sqlparser-rs users are quite a bit more varied than DataFusion
> and
> > there is not a large overlap between the contributors of the two
> projects.
> > I currently seem to be the one reviewing / merging most sqlparser-rs
> > reviews, and I would definitely love some more help.
> >
> > However, given that the project is not an Apache project, I did not have
> > good luck attracting help.  A related discussion is here [1].
> >
> > If the DataFusion community would like to accelerate releases, we can
> also
> > try to do that without bringing it into Apache governance. Specifically,
> it
> > would be great to have help reviewing the PRs -- the actual release
> process
> > is pretty low overhead. The reviews are what take the vast majority of
> the
> > maintenance time.
> >
> > Andrew
> >
> > [1]: https://github.com/sqlparser-rs/sqlparser-rs/issues/818
> >
> >
> >
> > On Sat, Feb 17, 2024 at 4:44 PM Aldrin 
> wrote:
> >
> > > do users of sqlparser-rs mostly use datafusion? I don't know the
> > > community, but it seems like it would be an annoying change for users
> who
> > > use it with a different query engine. Just a thought
> > >
> > > Sent from Proton Mail <https://proton.me/mail/home> for iOS
> > >
> > >
> > > > On Sat, Feb 17, 2024 at 10:26, Andy Grove wrote:
> > >
> > > I agree that it simplifies shipping new SQL features in DataFusion
> since we
> > > can develop the changes in the parser concurrently with the changes in
> > > other DataFusion crates and then release them all together.
> > >
> > > The name of the crate would not need to change, so downstream users
> should
> > > see no impact.
> > >
> > > We would need to decide if we want to keep a separate version number or
> > > bring it in line with DataFusion version numbers (I have no preference
> > > either way).
> > >
> > >
> > >
> > > On Sat, Feb 17, 2024 at 11:09 AM Mehmet Ozan Kabak 
> > > wrote:
> > >
> > > > Doing this will probably reduce the time-to-ship for Da

Re: [DISCUSS][RFC] Draft Proposal for new Top Level Project for DataFusion

2024-02-26 Thread Andrew Lamb
An update:

I have updated the proposal [1] with additional information (new committers
Jeffrey Vo and Jay Zhan, and the new datafusion-comet repository)

I plan to:
1. Call for a formal vote on this (dev@arrow.apache.org) mailing list this
Friday March 2
2. If the vote passes, submit the proposal to the ASF board as part of the
April 2024 Arrow report.

This extended timeline is designed to balance the needs of some
contributors to prepare for the changed structure with their employers.

Full Details can be found on [2].

Thank you,
Andrew

[1]
https://docs.google.com/document/d/11WTNYS8KWScOt3ySTX39WVS6krPhUvHsuJRY9PZQx4g/edit
[2] https://github.com/apache/arrow-datafusion/discussions/6475

On Fri, Jan 5, 2024 at 11:19 AM Andrew Lamb  wrote:

> Thank you very much
>
> On Fri, Jan 5, 2024 at 11:17 AM Jean-Baptiste Onofré 
> wrote:
>
>> Hi Andrew,
>>
>> The PODLINGNAMESEARCH is not yet completed: the VP Brand Management
>> (Mark Thomas) should comment in the Jira to approve or not the name.
>>
>> I added a comment in the Jira to ping Mark. He should get back to us soon.
>>
>> Regards
>> JB
>>
>> On Fri, Jan 5, 2024 at 3:38 PM Andrew Lamb  wrote:
>> >
>> > Thanks JB,
>> >
>> > I did do a name search and posted the results here [1]
>> >
>> > However, I am not sure what the next steps for that particular process
>> is
>> > (like does someone have to approve it, for example?)
>> >
>> > Any insight you could provide would be greatly appreciated
>> >
>> > Andrew
>> >
>> > [1] https://issues.apache.org/jira/browse/PODLINGNAMESEARCH-219
>> >
>> >
>> > On Fri, Jan 5, 2024 at 7:55 AM Jean-Baptiste Onofré 
>> wrote:
>> >
>> > > Hi Andrew,
>> > >
>> > > I did a quick review on the doc and it looks good to me. I just added
>> > > a question about name search (DataFusion will probably work as TLP,
>> > > but we have to check as we have a new Apache name moving from Arrow
>> > > DataFusion to DataFusion).
>> > >
>> > > Please let me know if I can help on that.
>> > >
>> > > Thanks !
>> > > Regards
>> > > JB
>> > >
>> > > On Fri, Jan 5, 2024 at 12:26 PM Andrew Lamb 
>> wrote:
>> > > >
>> > > > Upon reviewing the board report template, I am planning on the
>> following
>> > > > schedule:
>> > > > 1. I'll leave this proposal for another few weeks to gather any
>> > > additional
>> > > > input
>> > > > 2. In early February 2024 I'll start a formal vote thread on the
>> dev@
>> > > > mailing list for this proposal
>> > > > 3. If the vote passes, I'll submit a proposed resolution to the ASF
>> board
>> > > > for their meeting in April 2024 using the pre-existing template[1]
>> > > >
>> > > >
>> > > > [1]
>> > > >
>> > >
>> https://svn.apache.org/repos/private/committers/board/templates/subproject-tlp-resolution.txt
>> > > >
>> > > > On Wed, Dec 27, 2023 at 6:32 PM L. C. Hsieh 
>> wrote:
>> > > >
>> > > > > Thanks for writing the proposal. It looks great to me too.
>> > > > > I added a few comments on it.
>> > > > >
>> > > > > On Wed, Dec 27, 2023 at 3:05 PM Andy Grove wrote:
>> > > > > >
>> > > > > > Thank you for creating the draft proposal, Andrew. I have
>> reviewed
>> > > this
>> > > > > and
>> > > > > > I think it looks great.
>> > > > > >
>> > > > > > Andy.
>> > > > > >
>> > > > > > On Wed, Dec 27, 2023 at 3:19 PM Andrew Lamb <
>> al...@influxdata.com>
>> > > > > wrote:
>> > > > > >
>> > > > > > > I have created a draft proposal [1] to break DataFusion out
>> to its
>> > > own
>> > > > > top
>> > > > > > > level project. Please provide your feedback and suggestions.
>> > > > > > >
>> > > > > > > The proposal is included at the end of this email and in this
>> > > Google
>> > > > > Doc:
>> > > > > > >
>> > > > > > >
>> > > > >
>> > >
>> https:/

Re: [DISCUSS] Move sqlparser-rs back into DataFusion project?

2024-02-26 Thread Andrew Lamb
Sorry for the late reply,

I think sqlparser-rs users are quite a bit more varied than DataFusion and
there is not a large overlap between the contributors of the two projects.
I currently seem to be the one reviewing / merging most sqlparser-rs
reviews, and I would definitely love some more help.

However, given that the project is not an Apache project, I did not have
good luck attracting help.  A related discussion is here [1].

If the DataFusion community would like to accelerate releases, we can also
try to do that without bringing it into Apache governance. Specifically, it
would be great to have help reviewing the PRs -- the actual release process
is pretty low overhead. The reviews are what take the vast majority of the
maintenance time.

Andrew

[1]: https://github.com/sqlparser-rs/sqlparser-rs/issues/818



On Sat, Feb 17, 2024 at 4:44 PM Aldrin  wrote:

> do users of sqlparser-rs mostly use datafusion? I don't know the
> community, but it seems like it would be an annoying change for users who
> use it with a different query engine. Just a thought
>
> Sent from Proton Mail  for iOS
>
>
> On Sat, Feb 17, 2024 at 10:26, Andy Grove wrote:
>
> I agree that it simplifies shipping new SQL features in DataFusion since we
> can develop the changes in the parser concurrently with the changes in
> other DataFusion crates and then release them all together.
>
> The name of the crate would not need to change, so downstream users should
> see no impact.
>
> We would need to decide if we want to keep a separate version number or
> bring it in line with DataFusion version numbers (I have no preference
> either way).
>
>
>
> On Sat, Feb 17, 2024 at 11:09 AM Mehmet Ozan Kabak 
> wrote:
>
> > Doing this will probably reduce the time-to-ship for DataFusion features
> > that need parsing support due to increased convenience, so I’m inclined
> to
> > see it in a positive light.
> >
> > What would be the impact of doing this on people who use only
> > sqlparser-rs, if any?
> >
> > > On Feb 17, 2024, at 7:16 PM, Andy Grove  wrote:
> > >
> > > The sqlparser-rs project [1] seems to have become the de-facto SQL
> parser
> > > for Rust, with almost 4 million downloads so far. This was originally
> > part
> > > of DataFusion very early on, and I moved it into a separate project
> > because
> > > it seemed useful for other projects. This was before DataFusion was
> known
> > > as a composable query engine, and with hindsight, I probably should
> have
> > > left it as part of the DataFusion project.
> > >
> > > Now that DataFusion has a reputation as a composable query engine, I
> > think
> > > it would make sense to move this code back into DataFusion, where it
> > would
> > > benefit from a larger community of maintainers.
> > >
> > > I would like to hear thoughts from the Apache Arrow / DataFusion
> > community.
> > > Does this seem like a good idea?
> > >
> > > Thanks,
> > >
> > > Andy.
> > >
> > > [1] https://github.com/sqlparser-rs/sqlparser-rs
> >
> >
>
>


Re: [VOTE][RUST][DataFusion] Release Apache Arrow DataFusion 36.0.0 RC1

2024-02-17 Thread Andrew Lamb
+1 (binding)

Verified on M3 Mac

Thank you for keeping the release train humming, Andy

Andrew

On Fri, Feb 16, 2024 at 12:23 PM L. C. Hsieh  wrote:

> +1 (binding)
>
> Verified on M3 Mac.
>
> Thanks Andy.
>
>
> On Fri, Feb 16, 2024 at 9:08 AM Andy Grove  wrote:
> >
> > Hi,
> >
> > I would like to propose a release of Apache Arrow DataFusion
> Implementation,
> > version 36.0.0.
> >
> > This release candidate is based on commit:
> > bf6f83b3d228fb386f9b4b20c254fa58e2412660 [1]
> > The proposed release tarball and signatures are hosted at [2].
> > The changelog is located at [3].
> >
> > Please download, verify checksums and signatures, run the unit tests, and
> > vote
> > on the release. The vote will be open for at least 72 hours.
> >
> > Only votes from PMC members are binding, but all members of the community
> > are
> > encouraged to test the release and vote with "(non-binding)".
> >
> > The standard verification procedure is documented at
> >
> https://github.com/apache/arrow-datafusion/blob/main/dev/release/README.md#verifying-release-candidates
> > .
> >
> > [ ] +1 Release this as Apache Arrow DataFusion 36.0.0
> > [ ] +0
> > [ ] -1 Do not release this as Apache Arrow DataFusion 36.0.0 because...
> >
> > Here is my vote:
> >
> > +1
> >
> > [1]:
> >
> https://github.com/apache/arrow-datafusion/tree/bf6f83b3d228fb386f9b4b20c254fa58e2412660
> > [2]:
> >
> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-36.0.0-rc1
> > [3]:
> >
> https://github.com/apache/arrow-datafusion/blob/bf6f83b3d228fb386f9b4b20c254fa58e2412660/CHANGELOG.md
>
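The checksum part of the verification steps requested above can be sketched
in Python. This is only an illustration, not the project's verification
script: the function names are made up here, and it assumes the checksum
file stores the hex SHA-512 digest as its first whitespace-separated token.
Signature (GPG) verification is not covered.

```python
import hashlib
from pathlib import Path


def sha512_hex(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large tarballs are not loaded at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(artifact, checksum_file):
    """Compare an artifact against the digest stored in its checksum file.

    Assumption: the checksum file looks like "<hex digest>  <file name>".
    """
    expected = Path(checksum_file).read_text().split()[0].lower()
    return sha512_hex(artifact) == expected
```

A caller would pass the downloaded tarball and its checksum file, for
example checksum_matches("release.tar.gz", "release.tar.gz.sha512"), where
both file names are placeholders.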


Re: [DISCUSS] Flight RPC: add 'fallback' URI scheme

2024-02-16 Thread Andrew Lamb
Thank you -- I think the use case is great, but I agree with the other
reviewers that the name may be confusing. I left some notes on the ticket.

Andrew

On Wed, Feb 14, 2024 at 3:52 PM David Li  wrote:

> I've put up a candidate implementation sans integration test [1].
>
> Some caveats:
> - java.net.URI doesn't accept 'scheme://', only 'scheme:/' or 'scheme://?'
> (yes, an empty query string pacifies it). I've chosen the latter since the
> former is technically a URI with a non-empty path but neither is ideal.
> - I've changed the scheme to 'arrow-flight-reuse-connection' to be more
> faithful to the intended use than 'fallback'.
>
> [1]: https://github.com/apache/arrow/pull/40084
>
> On Tue, Feb 13, 2024, at 13:01, Jean-Baptiste Onofré wrote:
> > Hi David,
> >
> > It's reasonable. I think we can start with your initial proposal (it
> > sounds fine to me) and we can always improve step by step.
> >
> > Thanks !
> > Regards
> > JB
> >
> > On Tue, Feb 13, 2024 at 4:53 PM David Li  wrote:
> >>
> >> I'm going to keep the proposal as-is then. It can be extended if this
> use case comes up.
> >>
> >> I'll start work on candidate implementations now.
> >>
> >> On Tue, Feb 13, 2024, at 03:22, Antoine Pitrou wrote:
> >> > I think the original proposal is sufficient.
> >> >
> >> > Also, it is not obvious to me how one would switch from e.g. grpc+tls
> to
> >> > http without an explicit server location (unless both Flight servers
> are
> >> > hosted under the same port?). So the "+" proposal seems a bit weird.
> >> >
> >> >
> >> > Le 12/02/2024 à 23:39, David Li a écrit :
> >> >> The idea is that the client would reuse the existing connection, in
> which case the protocol and such are implicit. (If the client doesn't have
> a connection anymore, it can't use the fallback anyways.)
> >> >>
> >> >> I suppose this has the advantage that you could "fall back" to a
> known hostname with a different protocol, but I'm not sure that always
> applies anyways. (Correct me if I'm wrong Matt, but as I recall, UCX
> addresses aren't hostnames but rather opaque byte blobs, for instance.)
> >> >>
> >> >> If we do prefer this, to avoid overloading the hostname, there's
> also the informal convention of using + in the scheme, so it could be
> arrow-flight-fallback+grpc+tls://, arrow-flight-fallback+http://, etc.
> >> >>
> >> >> On Mon, Feb 12, 2024, at 17:03, Joel Lubinitsky wrote:
> >> >>> Thanks for clarifying.
> >> >>>
> >> >>> Given the relationship between these two proposals, would it also be
> >> >>> necessary to distinguish the scheme (or schemes) supported by the
> >> >>> originating Flight RPC service?
> >> >>>
> >> >>> If that is the case, it may be preferred to use the "host" portion
> of the
> >> >>> URI rather than the "scheme" to denote the location of the data. In
> this
> >> >>> scenario, the host "0.0.0.0" could be used. This IP address is
> defined in
> >> >>> IETF RFC1122 [1] as "This host on this network", which seems most
> >> >>> consistent with the intended use-case. There are some caveats to
> this usage
> >> >>> but in my experience it's not uncommon for protocols to extend the
> >> >>> definition of this address in their own usage.
> >> >>>
> >> >>> A benefit of this convention is that the scheme remains available
> in the
> >> >>> URI to specify the transport available. For example, the following
> list of
> >> >>> locations may be included in the response:
> >> >>>
> >> >>> ["grpc://0.0.0.0", "ucx://0.0.0.0", "grpc://1.2.3.4",
> ...]
> >> >>>
> >> >>> This would indicate that grpc and ucx transport is available from
> the
> >> >>> current service, grpc is available at 1.2.3.4, and possibly more
> >> >>> combinations of scheme/host.
> >> >>>
> >> >>> [1] https://datatracker.ietf.org/doc/html/rfc1122#section-3.2.1.3
> >> >>>
> >> >>> On Mon, Feb 12, 2024 at 2:53 PM David Li 
> wrote:
> >> >>>
> >> >>>> Ah, while I was thinking of it as useful for a fallback, I'm not
> >> >>>> specifying it that way.  Better ideas for names would be
> >> >>>> appreciated.
> >> >>>>
> >> >>>> The actual precedence has never been specified. All endpoints are
> >> >>>> equivalent, so clients may use what is "best". For instance, with
> >> >>>> Matt Topol's concurrent proposal, a GPU-enabled client may
> >> >>>> preferentially try UCX endpoints while other clients may choose to
> >> >>>> ignore them entirely (e.g. because they don't have UCX installed).
> >> >>>>
> >> >>>> In practice the ADBC/JDBC drivers just scan the list left to right
> >> >>>> and try each endpoint in turn for lack of a better heuristic.
> >> >>>>
> >> >>>> On Mon, Feb 12, 2024, at 14:28, Joel Lubinitsky wrote:
> >> >>>>> Thanks for proposing this David.
> >> >>>>>
> >> >>>>> I think the ability to include the Flight RPC service itself in
> >> >>>>> the list of endpoints from which data can be fetched is a helpful
> >> >>>>> addition.
> >> >>>>>
> >> >>>>> The current choice of name for the URI (arrow-flight-fallback://)
> >> >>>>> seems to
> >> > 
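The endpoint handling discussed in this thread (scan the location list left
to right, skip transports the client does not support, and treat the
'arrow-flight-reuse-connection' scheme as "use the connection you already
have") can be sketched as follows. This is a rough illustration in Python,
not code from any Flight client: pick_endpoint and its parameters are
hypothetical, and `connect` stands in for whatever function a real client
would use to open a connection.

```python
from urllib.parse import urlsplit

# Scheme name taken from the candidate implementation described above.
REUSE_SCHEME = "arrow-flight-reuse-connection"


def pick_endpoint(locations, supported_schemes, current_connection, connect):
    """Scan `locations` left to right and return the first usable connection.

    `connect` is a caller-supplied (hypothetical) function that takes a URI
    and returns a connection object, raising OSError when it fails.
    Returns None if no location could be used.
    """
    for uri in locations:
        scheme = urlsplit(uri).scheme
        if scheme == REUSE_SCHEME:
            # Only meaningful while the original connection still exists.
            if current_connection is not None:
                return current_connection
            continue
        if scheme not in supported_schemes:
            # e.g. a client without UCX installed ignores ucx:// locations
            continue
        try:
            return connect(uri)
        except OSError:
            continue  # try the next location in the list
    return None
```

Scanning in list order mirrors the "left to right" behaviour mentioned for
the ADBC/JDBC drivers; a client with a better heuristic could sort the list
before calling a function like this.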

Re: [VOTE] Explicit session management for Flight RPC

2024-02-16 Thread Andrew Lamb
+1

On Fri, Feb 16, 2024 at 1:46 AM Jean-Baptiste Onofré 
wrote:

> +1
>
> Regards
> JB
>
> On Wed, Feb 14, 2024 at 5:38 PM David Li  wrote:
> >
> > Paul Nienaber would like to propose explicit session management for
> Flight RPC.  This proposal was previously discussed at [1].  A candidate
> implementation for C++ and Java is at [2].
> >
> > The vote will be open for at least 72 hours.
> >
> > [ ] +1
> > [ ] +0
> > [ ] -1 Do not accept this proposal because...
> >
> > [1]: https://lists.apache.org/thread/fd6r1n7vt91sg2c7fr35wcrsqz6x4645
> > [2]: https://github.com/apache/arrow/pull/34817
>


Re: [Discussion][C++][FlightRPC] What stage to submit a PR for Flight SQL ODBC driver

2024-02-16 Thread Andrew Lamb
Thank you for driving this -- it will be a very valuable addition to
FlightSQL

On Tue, Feb 13, 2024 at 1:15 PM Jean-Baptiste Onofré 
wrote:

> Hi guys,
>
> A quick update about the ODBC driver donation:
> - I finished all required changes (ASF header everywhere, build update,
> etc)
> - I'm now on build & runtime test
>
> I should have the PR open in a couple of days.
>
> Thanks !
> Regards
> JB
>
> On Wed, Feb 7, 2024 at 4:04 PM Jean-Baptiste Onofré 
> wrote:
> >
> > Hi all,
> >
> > FYI, I'm back on Arrow (sorry I have been busy with other stuff).
> >
> > I'm working on the ODBC donation branch, I should have the PR ready by
> > the end of the week.
> >
> > Sorry again for the delay, it's now one of my top priorities ;)
> >
> > Regards
> > JB
> >
> > On Tue, Dec 12, 2023 at 9:45 PM Alina Li 
> wrote:
> > >
> > > David, you bring up a good point. Regarding the IP clearance for
> Timestream ODBC driver, we are still looking to get the necessary paperwork
> from Amazon. We're also considering using the Ignite ODBC Driver seed [1]
> as a replacement to the Timestream seed if it shows that we're unable to
> obtain paperwork from Amazon; we are still discussing this internally and
> will get back to the community afterwards.
> > >
> > > Regarding paperwork for the Dremio code, thank you Laurent for
> offering your help. Please do let us know if there's anything we can do to
> help as well.
> > >
> > > [1]:
> https://github.com/apache/ignite/tree/master/modules/platforms/cpp
> > > 
> > > From: Laurent Goujon 
> > > Sent: Friday, December 8, 2023 11:01 AM
> > > To: dev@arrow.apache.org 
> > > Subject: Re: [Discussion][C++][FlightRPC] What stage to submit a PR
> for Flight SQL ODBC driver
> > >
> > > Am I reading the ticket correctly that this is also about importing
> some of
> > > the Dremio code into Arrow project (namely
> > > https://github.com/dremio/flightsql-odbc/). If it is the case, let me
> check
> > > how my company can provide the documentation for the project?
> > >
> > > On Fri, Dec 8, 2023 at 8:41 AM David Li  wrote:
> > >
> > > > Thanks for the clarification. That does sound like a nontrivial
> amount of
> > > > code.
> > > >
> > > > My worry is that we might not be able to get all the paperwork
> necessary
> > > > from Amazon/Amazon contributors for the Timestream part. The
> > > > document/guidelines are here [1]. Does that look doable from your
> end?
> > > >
> > > > [1]: https://incubator.apache.org/ip-clearance/
> > > >
> > > > On Thu, Dec 7, 2023, at 14:30, Alina Li wrote:
> > > > > > Hi David. To be on the safer side, I suggest going through IP
> > > > > > clearance for [3] the Timestream ODBC driver project, and more code
> > > > > > than entry_points.cpp will be used. We had initially planned to use the
> > > > > Timestream's entry points code, but it includes more than just
> > > > > entry_points.cpp (code such as [5] odbc.cpp, [6] odbc.h and some
> other
> > > > > files are part of the entry points), and besides the entry points,
> > > > > we're planning to use Timestream's installers and DSN window as
> well.
> > > > > Sorry for the confusion.
> > > > >
> > > > > [5]:
> > > > >
> > > >
> https://github.com/awslabs/amazon-timestream-odbc-driver/blob/main/src/odbc/src/odbc.cpp
> > > > > [6]:
> > > > >
> > > >
> https://github.com/awslabs/amazon-timestream-odbc-driver/blob/main/src/odbc/include/timestream/odbc.h
> > > > > 
> > > > > From: David Li 
> > > > > Sent: Wednesday, December 6, 2023 6:09 AM
> > > > > To: dev@arrow.apache.org 
> > > > > Subject: Re: [Discussion][C++][FlightRPC] What stage to submit a
> PR for
> > > > > Flight SQL ODBC driver
> > > > >
> > > > > Thanks for the update, Alina. This sounds good, my only question
> for
> > > > > the broader community is whether there is enough imported code
> that we
> > > > > should go through the IP clearance process [1]. It's never been
> clear
> > > > > to me what exactly the threshold for this is. flightsql-odbc [2] is
> > > > > already quite large on its own and probably we should go through
> > > > > clearance? It's not clear to me how much of the Timestream project
> [3]
> > > > > would be involved here, if you mean literally only
> entry_points.cpp [4]
> > > > > (that's probably OK without clearance?) or more code than that.
> > > > >
> > > > > [1]: https://incubator.apache.org/ip-clearance/
> > > > > [2]: https://github.com/dremio/flightsql-odbc
> > > > > [3]: https://github.com/awslabs/amazon-timestream-odbc-driver
> > > > > [4]:
> > > > >
> > > >
> https://github.com/awslabs/amazon-timestream-odbc-driver/blob/main/src/odbc/src/entry_points.cpp
> > > > >
> > > > > On Tue, Dec 5, 2023, at 18:25, Alina Li wrote:
> > > > >> Hi community,
> > > > >>
> > > > >> I wanted to start a discussion regarding the development of
> Flight SQL
> > > > >> ODBC driver. Regarding the seed usage to my previous email, our
> initial
> > > > >> plan is that flightsql-odbc will be mostly used 

[ANNOUNCE] New Arrow committer: Jay Zhan

2024-02-16 Thread Andrew Lamb
On behalf of the Arrow PMC, I'm happy to announce that Jay Zhan
has accepted an invitation to become a committer on Apache
Arrow. Welcome, and thank you for your contributions!

Andrew


Re: [FlightSQL] Supporting binding parameters to prepared statements with a stateless server

2024-02-12 Thread Andrew Lamb
An update here is that we are beginning development of a specific proposal,
and will keep the ticket and this mailing list thread up to date with our
status.

On Thu, Sep 14, 2023 at 7:56 AM Raphael Taylor-Davies
 wrote:

> Hi,
>
> Thank you for starting this discussion. I think the decision to use gRPC
> and by extension HTTP certainly would encourage a design that explicitly
> doesn't rely on server-side state. Not only is it now uncommon for
> backend servers to have a unique globally-routable identity, but as
> these are request-oriented protocols, as opposed to connection-oriented
> protocols, maintaining session state server-side becomes very
> complicated as there is no unambiguous end-of-session signal.
>
> I would very much encourage following an approach similar to web
> cookies, where state is instead managed by the clients and sent with
> each request. This sort of maps to the ticket notion already present in
> many of the APIs, but could perhaps be formalized.
>
> Kind Regards,
>
> Raphael Taylor-Davies
>
> On 14/09/2023 11:52, Andrew Lamb wrote:
> > Hello,
> >
> > As FlightSQL gets more widely adopted across the ecosystem, we hit some
> > issues trying to implement bind parameters in our stateless service. I
> > filed a ticket [1] that describes the issue as well as a potential
> > solution.
> >
> > Please share your thoughts on the ticket
> >
> > Andrew
> >
> > [1] https://github.com/apache/arrow/issues/37720
> >
>
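The cookie-like approach suggested above, where state is held by the client
and sent back with each request, can be sketched as follows. This is only
an illustration under stated assumptions, not the proposal from the ticket:
encode_handle, decode_handle and SECRET_KEY are hypothetical names, and the
handle layout (HMAC-SHA256 signature followed by a JSON payload) is one
possible choice among many.

```python
import base64
import hashlib
import hmac
import json

# Hypothetical server-side secret; a real service would load this securely.
SECRET_KEY = b"demo-key-not-for-production"


def encode_handle(query, parameters):
    """Pack query text and bound parameters into an opaque, signed handle."""
    payload = json.dumps(
        {"query": query, "params": parameters}, sort_keys=True
    ).encode("utf-8")
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(signature + payload)


def decode_handle(handle):
    """Verify the signature and return (query, parameters)."""
    raw = base64.urlsafe_b64decode(handle)
    # SHA-256 digests are 32 bytes, so the signature is the first 32 bytes.
    signature, payload = raw[:32], raw[32:]
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        raise ValueError("handle was modified or signed with another key")
    state = json.loads(payload)
    return state["query"], state["params"]
```

Because everything needed to run the statement travels inside the handle,
any server instance that knows the key can serve the follow-up request, and
no end-of-session signal is required. The signature is there so the server
can detect a handle that was altered on the client side.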


Re: [DISCUSS] [DataFusion] Unifying BuiltIn and User Defined Functions

2024-02-12 Thread Andrew Lamb
Update here is that we are making good progress towards the goal of
removing the distinction between user / built in functions.

If you have any feedback on this project or how you would like
the functions structured, please join the conversation on [1].

Thanks
Andrew

[1]: https://github.com/apache/arrow-datafusion/issues/9100

On Tue, Jan 23, 2024 at 8:05 AM Andrew Lamb  wrote:

> I would like to bring attention to the project of unifying built in and
> user defined functions [1], and specifically a PR[2] that starts
> implementing that approach.
>
> Please provide any feedback you have on the ticket or PR.
>
> Thank you,
> Andrew
>
> [1]: https://github.com/apache/arrow-datafusion/issues/8045
> [2]: https://github.com/apache/arrow-datafusion/pull/8705
>


Re: [DISCUSS] Donation of a Spark native engine based on DataFusion & Arrow

2024-02-11 Thread Andrew Lamb
eing widely used internally
> > > > already), we'd be happy to help further improve readability &
> > > > maintainability of the codebase and resolving issues raised from the
> > > > community. Will this work for you? really appreciate if you
> understand
> > > > our situation.
> > > >
> > > > Thanks,
> > > > Chao
> > > >
> > > > On Wed, Jan 17, 2024 at 11:30 AM Jacques Nadeau 
> > > > wrote:
> > > > >
> > > > > Thanks for the quick response Chao.
> > > > >
> > > > > My experience on these things is that maintaining commit history
> for
> > > > large
> > > > > codebases can be invaluable for tracking down issues. (Hey, why is
> > this
> > > > > code written this way-- oh, it was part of x patch that was trying
> to
> > > > > achieve y).
> > > > >
> > > > > In the past, I've used git commit replay type tools and filtering
> of
> > > > commit
> > > > > messages, subdirectories, etc. to get something prepped for
> external
> > > > > consumption. My experience is that spending a few days now to do
> this
> > > > kind
> > > > > of thing saves far more days in the future (and leads to higher
> > quality).
> > > > >
> > > > > On Wed, Jan 17, 2024 at 9:18 AM Chao Sun 
> wrote:
> > > > >
> > > > > > Hi Andy and Jacques,
> > > > > >
> > > > > > Thanks for setting the repo up. Yes we are working on cleaning up
> > the
> > > > > > internal repo and preparing to open a PR in the next few days.
> > > > > >
> > > > > > It's a bit difficult to retain the original commit history in the
> > PR
> > > > > > though since some of them contain internal info which we need to
> > > > > > remove upon open sourcing. How about we just add a summary in the
> > PR
> > > > > > itself, and add everyone that has contributed to it as co-author
> to
> > > > > > the PR?
> > > > > >
> > > > > > Chao
> > > > > >
> > > > > > On Wed, Jan 17, 2024 at 11:09 AM Jacques Nadeau <
> > jacq...@apache.org>
> > > > > > wrote:
> > > > > > >
> > > > > > > Hey Chao, it would be great for you to share the code some
> place
> > with
> > > > > > > commit history. (PR to the repo that Andy made or something
> > else.)
> > > > > > >
> > > > > > > On Mon, Jan 15, 2024 at 7:38 AM Andy Grove <
> > andygrov...@gmail.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Chao,
> > > > > > > >
> > > > > > > > I have created
> > https://github.com/apache/arrow-datafusion-comet
> > > > and
> > > > > > you
> > > > > > > > should be able to create a PR against the repo.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Andy.
> > > > > > > >
> > > > > > > > Andy.
> > > > > > > >
> > > > > > > > On Fri, Jan 12, 2024 at 3:45 PM Chao Sun  >
> > > > wrote:
> > > > > > > >
> > > > > > > > > Thanks all for the positive support!
> > > > > > > > >
> > > > > > > > > Andy, we plan to name the project Comet (BTW if you have
> > better
> > > > > > > > > suggestions please let us know). Could you help to create a
> > repo
> > > > > > named
> > > > > > > > > arrow-datafusion-comet or arrow-comet? We'll clean up our
> > > > internal
> > > > > > > > > repo and prepare for the donation in the next few days.
> > Thanks
> > > > for
> > > > > > the
> > > > > > > > > help!
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Chao
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >

Re: [ANNOUNCE] New Arrow committer: Jeffrey Vo

2024-02-07 Thread Andrew Lamb
Congratulations Jeffrey! Well deserved!

On Tue, Feb 6, 2024 at 1:30 PM Raphael Taylor-Davies
 wrote:

> On behalf of the Arrow PMC, I am happy to announce that Jeffrey Vo has
> accepted an invitation to become a committer on Apache Arrow. Welcome,
> and thank you for your contributions!
>
> Raphael Taylor-Davies
>
>


Re: [VOTE][RUST][Ballista] Release Apache Arrow Ballista 0.12.0 RC4

2024-02-04 Thread Andrew Lamb
+1 (binding)

Verified on M3 Mac

Thanks Andy


On Sat, Feb 3, 2024 at 5:35 PM L. C. Hsieh  wrote:

> +1 (binding)
>
> Verified on M1 Mac.
>
> Thanks Andy.
>
> On Sat, Feb 3, 2024 at 2:15 PM Andy Grove  wrote:
> >
> > Hi,
> >
> > I would like to propose a release of Apache Arrow Ballista
> Implementation,
> > version 0.12.0.
> >
> > This release candidate is based on commit:
> > a8ee11e55cfae4b7418f7044580318d33be9669e [1]
> > The proposed release tarball and signatures are hosted at [2].
> > The changelog is located at [3].
> >
> > Please download, verify checksums and signatures, run the unit tests, and
> > vote
> > on the release. The vote will be open for at least 72 hours.
> >
> > Only votes from PMC members are binding, but all members of the community
> > are
> > encouraged to test the release and vote with "(non-binding)".
> >
> > The standard verification procedure is documented at
> >
> https://github.com/apache/arrow-ballista/blob/main/dev/release/README.md#verifying-release-candidates
> > .
> >
> > [ ] +1 Release this as Apache Arrow Ballista 0.12.0
> > [ ] +0
> > [ ] -1 Do not release this as Apache Arrow Ballista 0.12.0 because...
> >
> > [1]:
> >
> https://github.com/apache/arrow-ballista/tree/a8ee11e55cfae4b7418f7044580318d33be9669e
> > [2]:
> >
> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-ballista-0.12.0-rc4
> > [3]:
> >
> https://github.com/apache/arrow-ballista/blob/a8ee11e55cfae4b7418f7044580318d33be9669e/CHANGELOG.md
>
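The "verify checksums" step requested above could look roughly like this. It is a minimal sketch, not the project's verification script: it assumes the checksum file holds the hex digest as its first whitespace-separated token, and the helper names sha256_of and checksum_matches are hypothetical. Signature (.asc) verification is a separate step that is not shown here.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so a large tarball need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(tarball: Path, checksum_file: Path) -> bool:
    """Compare the tarball's digest with the first token of the checksum file."""
    expected = checksum_file.read_text().split()[0].lower()
    return sha256_of(tarball) == expected
```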


Re: [VOTE][RUST][DataFusion] Release DataFusion Python Bindings 35.0.0 RC1

2024-02-02 Thread Andrew Lamb
+1 (binding)

Verified on M3 mac

As before it seems as if python 3.11 isn't supported in the verification
script, only python 3.10. When I used 3.10 everything looks good.

Thanks a lot,
Andrew

On Thu, Feb 1, 2024 at 7:27 PM L. C. Hsieh  wrote:

> +1 (binding)
>
> Verified on M1 Mac.
>
> Thanks Andy.
>
> On Thu, Feb 1, 2024 at 3:53 PM Andy Grove  wrote:
> >
> > Hi,
> >
> > I would like to propose a release of Apache Arrow DataFusion Python
> > Bindings,
> > version 35.0.0.
> >
> > This release candidate is based on commit:
> > bef6cb66599588c096dae59ddfd707053e5741cd [1]
> > The proposed release tarball and signatures are hosted at [2].
> > The changelog is located at [3].
> > The Python wheels are located at [4].
> >
> > Please download, verify checksums and signatures, run the unit tests, and
> > vote
> > on the release. The vote will be open for at least 72 hours.
> >
> > Only votes from PMC members are binding, but all members of the community
> > are
> > encouraged to test the release and vote with "(non-binding)".
> >
> > The standard verification procedure is documented at
> >
> https://github.com/apache/arrow-datafusion-python/blob/main/dev/release/README.md#verifying-release-candidates
> > .
> >
> > [ ] +1 Release this as Apache Arrow DataFusion Python 35.0.0
> > [ ] +0
> > [ ] -1 Do not release this as Apache Arrow DataFusion Python 35.0.0
> > because...
> >
> > Here is my vote:
> >
> > +1
> >
> > [1]:
> >
> https://github.com/apache/arrow-datafusion-python/tree/bef6cb66599588c096dae59ddfd707053e5741cd
> > [2]:
> >
> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-python-35.0.0-rc1
> > [3]:
> >
> https://github.com/apache/arrow-datafusion-python/blob/bef6cb66599588c096dae59ddfd707053e5741cd/CHANGELOG.md
> > [4]: https://test.pypi.org/project/datafusion/35.0.0/
>


[DISCUSS] [DATAFUSION] Minimum Supported Rust Version Policy

2024-01-31 Thread Andrew Lamb
Hello,

We are thinking about a min rust version policy for DataFusion[1]. Please
leave any thoughts you would like to share on the ticket.

Thank you,
Andrew
[1]: https://github.com/apache/arrow-datafusion/issues/9082


Re: [VOTE] Accept donation of Comet Spark native engine

2024-01-27 Thread Andrew Lamb
+1 (binding)

This is super exciting

On Sat, Jan 27, 2024 at 11:00 AM Daniël Heres  wrote:

> +1 (binding). Awesome addition to the DataFusion ecosystem!!!
>
> Daniël
>
>
> On Sat, Jan 27, 2024, 16:57 vin jake  wrote:
>
> > +1 (binding)
> >
> > On Sat, Jan 27, 2024 at 11:43 PM Andy Grove wrote:
> >
> > > Hello,
> > >
> > > This vote is to determine if the Arrow PMC is in favor of accepting the
> > > donation of Comet (a Spark native engine that is powered by DataFusion
> > and
> > > the Rust implementation of Arrow).
> > >
> > > The donation was previously discussed on the mailing list [1].
> > >
> > > The proposed donation is at [2].
> > >
> > > The Arrow PMC will start the IP clearance process if the vote passes.
> > There
> > > is a Google document [3] where the community is working on the draft
> > > contents for the IP clearance form.
> > >
> > > The vote will be open for at least 72 hours.
> > >
> > > [ ] +1 : Accept the donation
> > > [ ] 0 : No opinion
> > > [ ] -1 : Reject donation because...
> > >
> > > My vote: +1
> > >
> > > Thanks,
> > >
> > > Andy.
> > >
> > >
> > > [1] https://lists.apache.org/thread/0q1rb11jtpopc7vt1ffdzro0omblsh0s
> > > [2] https://github.com/apache/arrow-datafusion-comet/pull/1
> > > [3]
> > >
> > >
> >
> https://docs.google.com/document/d/1azmxE1LERNUdnpzqDO5ortKTsPKrhNgQC4oZSmXa8x4/edit?usp=sharing
> > >
> >
>


[DataFusion] New Blog Post -- DataFusion 34.0

2024-01-23 Thread Andrew Lamb
If anyone is interested, here is a new blog post about the last 6 months in
DataFusion[1] and where we are heading this year.

Andrew

[1]: https://arrow.apache.org/blog/2024/01/19/datafusion-34.0.0/


[DISCUSS] [DataFusion] Unifying BuiltIn and User Defined Functions

2024-01-23 Thread Andrew Lamb
I would like to bring attention to the project of unifying built in and
user defined functions [1], and specifically a PR[2] that starts
implementing that approach.

Please provide any feedback you have on the ticket or PR.

Thank you,
Andrew

[1]: https://github.com/apache/arrow-datafusion/issues/8045
[2]: https://github.com/apache/arrow-datafusion/pull/8705
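
As a purely conceptual illustration of what "unifying" built-in and user defined functions means (this is not DataFusion's API; ScalarFunction and FunctionRegistry are hypothetical names), every function can implement one interface and be looked up through one registry, so callers cannot tell which kind they are invoking:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ScalarFunction:
    """One interface for every function, regardless of who defined it."""
    name: str
    call: Callable[[List[float]], List[float]]


class FunctionRegistry:
    def __init__(self) -> None:
        self._functions: Dict[str, ScalarFunction] = {}

    def register(self, func: ScalarFunction) -> None:
        self._functions[func.name] = func

    def invoke(self, name: str, column: List[float]) -> List[float]:
        # The lookup path is identical for shipped and user-supplied functions.
        return self._functions[name].call(column)


registry = FunctionRegistry()
# A function the engine ships with is registered like any other.
registry.register(ScalarFunction("abs", lambda col: [abs(v) for v in col]))
# A user defined function goes through exactly the same path.
registry.register(ScalarFunction("double", lambda col: [v * 2 for v in col]))
```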


Re: [VOTE][RUST][DataFusion] Release Apache Arrow DataFusion 35.0.0 RC1

2024-01-21 Thread Andrew Lamb
+1 (binding)

I verified it on Mac (M3).

I got the same error in test_partial_ord and I agree it looks very much the
same as https://github.com/apache/arrow-datafusion/pull/8908 -- a test-only
issue that should not block the release.

Thanks Andy


On Sat, Jan 20, 2024 at 10:43 AM Andy Grove  wrote:

> Hi,
>
> I would like to propose a release of Apache Arrow DataFusion
> Implementation,
> version 35.0.0.
>
> This release candidate is based on commit:
> e58446bbe9ebe3f5a2aae1abd3c17a694070b0d1 [1]
> The proposed release tarball and signatures are hosted at [2].
> The changelog is located at [3].
>
> Please download, verify checksums and signatures, run the unit tests, and
> vote
> on the release. The vote will be open for at least 72 hours.
>
> Only votes from PMC members are binding, but all members of the community
> are
> encouraged to test the release and vote with "(non-binding)".
>
> The standard verification procedure is documented at
>
> https://github.com/apache/arrow-datafusion/blob/main/dev/release/README.md#verifying-release-candidates
> .
>
> [ ] +1 Release this as Apache Arrow DataFusion 35.0.0
> [ ] +0
> [ ] -1 Do not release this as Apache Arrow DataFusion 35.0.0 because...
>
> Here is my vote:
>
> +1
>
> [1]:
>
> https://github.com/apache/arrow-datafusion/tree/e58446bbe9ebe3f5a2aae1abd3c17a694070b0d1
> [2]:
>
> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-35.0.0-rc1
> [3]:
>
> https://github.com/apache/arrow-datafusion/blob/e58446bbe9ebe3f5a2aae1abd3c17a694070b0d1/CHANGELOG.md
>


Re: [DISC] Improve Arrow Release verification process

2024-01-19 Thread Andrew Lamb
I would second the notion that manually re-running, as part of the release
process, tests that are already covered by CI is of (very) limited
value.

While we do the same thing (compile and run some tests) as part of the Rust
release this has never caught any serious defect I am aware of and we only
run a subset of tests (e.g. not tests for integration with other arrow
versions)

Reducing the burden for releases I think would benefit everyone.

Andrew

On Fri, Jan 19, 2024 at 1:08 PM Antoine Pitrou  wrote:

>
> Well, if the main objective is to just follow the ASF Release
> guidelines, then our verification process can be simplified drastically.
>
> The ASF indeed just requires:
> """
> Every ASF release MUST contain one or more source packages, which MUST
> be sufficient for a user to build and test the release provided they
> have access to the appropriate platform and tools. A source release
> SHOULD not contain compiled code.
> """
>
> So, basically, if the source tarball is enough to compile Arrow on a
> single platform with a single set of tools, then we're ok. :-)
>
> The rest is just an additional burden that we voluntarily inflict on
> ourselves. *Ideally*, our continuous integration should be able to
> detect any build-time or runtime issue on supported platforms. There
> have been rare cases where some issues could be detected at release time
> thanks to the release verification script, but these are a tiny portion
> of all issues routinely detected in the form of CI failures. So, there
> doesn't seem to be a reason to believe that manual release verification
> is bringing significant benefits compared to regular CI.
>
> Regards
>
> Antoine.
>
>
> Le 19/01/2024 à 11:42, Raúl Cumplido a écrit :
> > Hi,
> >
> > One of the challenges we have when doing a release is verification and
> voting.
> >
> > Currently the Arrow verification process is quite long, tedious and
> error prone.
> >
> > I would like to use this email to get feedback and user requests in
> > order to improve the process.
> >
> > Several things already on my mind:
> >
> > One thing that is quite annoying is that any flaky failure makes us
> > restart the process and possibly requires downloading everything
> > again. It would be great to have some kind of retry mechanism that
> > allows us to keep going from where it failed and doesn't have to redo
> > the previous successful jobs.
> >
> > We do have a bunch of flags to do specific parts but that requires
> > knowledge and time to go over the different flags, etcetera so the UX
> > could be improved.
> >
> > Based on the ASF release policy [1] in order to cast a +1 vote we have
> > to validate the source code packages but it is not required to
> > validate binaries locally. Several binaries are currently tested using
> > docker images and they are already tested and validated on CI. Our
> > documentation for release verification points to perform binary
> > validation. I plan to update the documentation and move it to the
> > official docs instead of the wiki [2].
> >
> > I would appreciate input on the topic so we can improve the current
> process.
> >
> > Thanks everyone,
> > Raúl
> >
> > [1] https://www.apache.org/legal/release-policy.html#release-approval
> > [2]
> https://cwiki.apache.org/confluence/display/ARROW/How+to+Verify+Release+Candidates
>
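The retry idea raised above (continue from the step that failed instead of redoing the successful ones) could be sketched like this. It is an assumption-laden illustration rather than anything in the existing verification script: run_resumable and the checkpoint file are hypothetical, and each verification step is modelled as a plain callable.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List


def run_resumable(steps: Dict[str, Callable[[], None]], checkpoint: Path) -> List[str]:
    """Run steps in order, skipping any recorded as done in the checkpoint file.

    Returns the names of the steps that actually ran in this invocation.
    """
    done = set(json.loads(checkpoint.read_text())) if checkpoint.exists() else set()
    ran = []
    for name, step in steps.items():
        if name in done:
            continue  # succeeded in an earlier run; do not redo it
        step()  # an exception here aborts the run but keeps earlier progress
        done.add(name)
        ran.append(name)
        checkpoint.write_text(json.dumps(sorted(done)))
    return ran
```

After a flaky failure, calling run_resumable again with the same checkpoint path skips the steps that already succeeded, such as a large download.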


Re: invite to the #arrow-rust channel

2024-01-17 Thread Andrew Lamb
I sent an invite for Max.

455954986 I am not sure what you are asking -- can you please clarify?

Thanks,
Andrew

On Wed, Jan 17, 2024 at 1:40 AM 455954986 <455954...@qq.com.invalid> wrote:

> Thank you for the invitation. No questions asked. I look forward to
> participating. What should I do next?
>
>
>
>
> -- Original Message --
> From: "dev" <maxd...@gmail.com>
> Sent: Wednesday, January 17, 2024, 12:28 PM
> To: "dev"
> Subject: invite to the #arrow-rust channel
>
>
>
> Hello!
>
> Could I please get an invite to the Slack ASF workspace, specifically to
> the #arrow-rust channel? I would like to discuss making some contributions
> to the project, and I would like to start with asking a few questions
> there.
>
> Thanks!
> Max


Re: [DISCUSS] Donation of a Spark native engine based on DataFusion & Arrow

2024-01-11 Thread Andrew Lamb
I am very supportive of this donation. I know of at least one other
DataFusion-based project, blaze-rs[1], which has the same design goal and
bringing this project into the ASF may help consolidate these efforts

As Andy said, I believe it was very valuable to have a major consumer
project (e.g. DataFusion) to help drive the definition and implementation
of arrow-rs. We never achieved the same synergy with
Ballista and DataFusion but I think it is more likely with a more actively
maintained Spark accelerator.

I am not sure it affects this discussion, but the Gluten project, based on
Velox, was accepted yesterday into the Apache Incubator[2].  While the
functionality may be similar, the technology (Rust vs C/C++) and the
communities are different so having both in the same (big) tent of the ASF
doesn't seem concerning to me.

Also, as Chao says, I think this new sub project would naturally move to a
new DataFusion top level project when we get there (we plan to submit a
proposed resolution for the April ASF board meeting).

Looking forward to seeing more!
Andrew

[1]: https://github.com/blaze-init/blaze
[2]: https://lists.apache.org/thread/6lrozds10jn9gknj9rf74lqbh7j55pq6

On Wed, Jan 10, 2024 at 5:10 PM Andy Grove  wrote:

> Hi Chao,
>
> This sounds like a really interesting project. I am interested in seeing
> how it compares to Spark RAPIDS (the project that I work on at NVIDIA) and
> Intel's Gluten project (that works with Velox).
>
> I can see the following benefits of having this project being under Apache
> Arrow governance:
>
> - Assuming that this is a drop-in replacement that doesn't require users to
> change their code (as I imagine is the case), then it could lead to greater
> adoption of DataFusion, especially for more demanding use cases where
> processing on a single node is not possible.
> - Given that it has a deep integration with the Rust implementation of
> Arrow as well as DataFusion, and given the overlap of committers between
> these projects, having them under the same governance and communication
> channels will generally be more efficient than if this project is separate.
> - Hopefully this leads to more upstream contributions to DataFusion,
> perhaps even allowing other projects such as Ballista to benefit from
> Spark-compatible operators and expressions in the future.
> - Having another project that uses DataFusion as a dependency could help
> with stabilizing the public APIs and generally driving more innovation.
>
> Given these points, I would be supportive of a donation. I see it as being
> similar to the Ballista project, which is already part of Arrow (and we
> plan to move along with DataFusion once it becomes a top-level project).
>
> Thanks,
>
> Andy.
>
> On Wed, Jan 10, 2024 at 2:28 PM Chao Sun  wrote:
>
> > Hi all,
> >
> > We have been working on a native execution engine for Apache Spark
> > that is heavily based on DataFusion and Arrow. Our goal is to
> > accelerate Spark query execution via delegating Spark's physical plan
> > execution to DataFusion's highly modular execution framework, while
> > still maintaining the same semantics to Spark users (i.e., no Spark
> > behavior change from the end users' point of view). Several of us are
> > Spark and/or Arrow committers. At the moment, the project is under
> > active development and not yet feature complete. However, some of the
> > existing functionalities are relatively mature and have been put in
> > production for a while now.
> >
> > Given the current momentum towards accelerating Spark through native
> > vectorized execution, we believe open sourcing this work will benefit
> > other Spark users too. In addition, we think the project itself can
> > also leverage the vibrant and strong community behind Arrow and
> > DataFusion, and grow faster. Because of this, we are exploring the
> > possibility of contributing this project to the Apache Software
> > Foundation (ASF) under the Apache Arrow project umbrella.
> >
> > We'd very much like to hear your opinion on this. Thanks.
> >
> > Best,
> > Chao
> >
>


Re: [VOTE][RUST] Release Apache Arrow Rust 50.0.0 RC1

2024-01-09 Thread Andrew Lamb
+1 (binding)

I verified this on a Mac M3

I hit an error while running the verification scripts[1], but upon
investigation it looks like a test only issue and related to an update in a
dependent library

Thanks Raphael,
Andrew

[1] https://github.com/apache/arrow-rs/issues/5292



On Tue, Jan 9, 2024 at 5:25 AM Raphael Taylor-Davies
 wrote:

> Hi,
>
> I would like to propose a release of Apache Arrow Rust Implementation,
> version 50.0.0.
>
> This release candidate is based on commit:
> db811083669df66992008c9409b743a2e365adb0 [1]
>
> The proposed release tarball and signatures are hosted at [2].
>
> The changelog is located at [3].
>
> Please download, verify checksums and signatures, run the unit tests,
> and vote on the release. There is a script [4] that automates some of
> the verification.
>
> The vote will be open for at least 72 hours.
>
> [ ] +1 Release this as Apache Arrow Rust
> [ ] +0
> [ ] -1 Do not release this as Apache Arrow Rust  because...
>
> [1]:
>
> https://github.com/apache/arrow-rs/tree/db811083669df66992008c9409b743a2e365adb0
> [2]:
> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-rs-50.0.0-rc1
> [3]:
>
> https://github.com/apache/arrow-rs/blob/db811083669df66992008c9409b743a2e365adb0/CHANGELOG.md
> [4]:
>
> https://github.com/apache/arrow-rs/blob/master/dev/release/verify-release-candidate.sh
>
>


Re: [DISCUSS] Linear Formula Types

2024-01-08 Thread Andrew Lamb
> Where I am struggling a little bit is to understand at what level those
compute functions should be implemented. As far as I can tell, when I load
a dictionary encoded arrow into a Pandas data frame or made a query using
DataFusion, the user can then just operate as if they are working directly
with a string array. Is that implemented in the arrow libraries, or does
each "application" (pandas, DataFusion, etc.) have their own implementation?

It is generally implemented by the Arrow libraries (for example, pandas uses
pyarrow, which uses the C++ Apache Arrow implementation, and DataFusion
uses arrow-rs, the Rust Apache Arrow implementation).
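
To make concrete what the library hides from the user, here is a minimal pure-Python sketch of dictionary encoding (it does not use pyarrow or arrow-rs, and the function names are hypothetical): the data is stored as small integer indices plus a dictionary of distinct values, and the string array the user works with is materialized from the two.

```python
from typing import List, Optional, Sequence, Tuple


def dictionary_encode(
    values: Sequence[Optional[str]],
) -> Tuple[List[Optional[int]], List[str]]:
    """Split values into integer indices plus a dictionary of distinct strings."""
    dictionary: List[str] = []
    positions = {}
    indices: List[Optional[int]] = []
    for value in values:
        if value is None:
            indices.append(None)  # nulls stay null in the index array
            continue
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        indices.append(positions[value])
    return indices, dictionary


def dictionary_decode(
    indices: Sequence[Optional[int]], dictionary: Sequence[str]
) -> List[Optional[str]]:
    """Materialize the logical string array the user sees."""
    return [None if i is None else dictionary[i] for i in indices]
```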

On Mon, Jan 8, 2024 at 12:22 PM Morrison-Reed Elliot (BEG/PJ-EDS-NA)
 wrote:

> Thanks for the hint.
>
> After reading through the geoarrow spec, I think I agree that this is
> probably the best approach.
>
> As far as I can tell all that is required is a standardized set of
> metadata tags and then some well implemented compute functions that can
> easily project the raw to physical interpretations.
>
> Where I am struggling a little bit is to understand at what level those
> compute functions should be implemented. As far as I can tell, when I load
> a dictionary encoded arrow into a Pandas data frame or made a query using
> DataFusion, the user can then just operate as if they are working directly
> with a string array. Is that implemented in the arrow libraries, or does
> each "application" (pandas, DataFusion, etc.) have their own implementation?
>
> Best regards,
> Elliot Morrison-Reed
>
> -Original Message-
> From: Andrew Lamb 
> Sent: Saturday, January 6, 2024 8:22 AM
> To: dev@arrow.apache.org
> Subject: Re: [DISCUSS] Linear Formula Types
>
> Hi Elliot,
>
> Given your description, I agree extension types sound like they may be a
> good idea, similar to geoarrow[1] for Geospatial data where there is extra
> metadata[2] needed to interpret underlying types (e.g. factor and offset)
>
> Andrew
>
> [1] https://github.com/geoarrow/geoarrow
> [2] https://arrow.apache.org/docs/format/CanonicalExtensions.html#geoarrow
>
> On Sat, Jan 6, 2024 at 3:20 AM Morrison-Reed Elliot (BEG/PJ-EDS-NA)
>  wrote:
>
> > Background
> >
> > I have been looking into using parquet files for storing and working
> > with automotive data. One interesting thing about automotive data is
> > that most communication happens on the CAN bus where we have extremely
> > limited bandwidth.
> > In order to encode "physical" values in a very space efficient way, we
> > use linear conversion formulas that look like "phys = (raw * factor) +
> > offset".
> > This gives implicit range and resolution limits, but that is often
> > just fine when we are representing a physical property.
> >
> > Example 1:
> >
> > We have a throttle that can be anywhere from 0-100% and we want to fit
> > that value into 1 byte. So we would use a formula like:
> >
> > phys = (raw * 0.39215) + 0
> >
> > Example 2:
> >
> > We want to record ambient temperature of the vehicle. Resolution of 1
> > degree is fine. Also, temperatures below -40 and above 215 degrees C
> > are not particularly useful as they are very rare and out of scope for
> > a useful temperature.
> >
> > phys = (raw * 1.0) - 40
> >
> > So far, I have been converting the raw data into floating point data
> > before writing to arrow format to make it easier for the analysts to
> > use the data. This of course means that I am converting to a less
> > efficient format and I am also losing inherent information about the
> > raw signal. I would rather be able to store the raw data in an
> > appropriately sized unsigned integer and automatically convert to
> > floating point when using the data, similar to dictionary encoding.
> >
> > Discussion
> >
> > - How would people generally deal with this situation using the arrow
> > format?
> > - Is this something that other people are interested in?
> > - If this were to be added to the spec, what would be the best way to
> > do it?
> >
> > While I am coming from an automotive perspective, I think there are
> > many other areas of applicability (reading sensor data through an ADC,
> > industrial automation and monitoring, etc.)
> >
> > I could see this working as either a new primitive type (similar to
> > decimal), or as an extension where we simply put the factor and offset
> > as standard metadata fields.
> >
> > Best regards,
> > Elliot Morrison-Reed
> >
> >
>


Re: [DISCUSS] Linear Formula Types

2024-01-06 Thread Andrew Lamb
Hi Elliot,

Given your description, I agree extension types sound like they may be a
good idea, similar to geoarrow[1] for Geospatial data where there is extra
metadata[2] needed to interpret underlying types (e.g. factor and offset)

Andrew

[1] https://github.com/geoarrow/geoarrow
[2] https://arrow.apache.org/docs/format/CanonicalExtensions.html#geoarrow

On Sat, Jan 6, 2024 at 3:20 AM Morrison-Reed Elliot (BEG/PJ-EDS-NA)
 wrote:

> Background
>
> I have been looking into using parquet files for storing and working with
> automotive data. One interesting thing about automotive data is that most
> communication happens on the CAN bus where we have extremely limited
> bandwidth.
> In order to encode "physical" values in a very space efficient way, we
> use linear conversion formulas that look like "phys = (raw * factor) +
> offset".
> This gives implicit range and resolution limits, but that is often just
> fine
> when we are representing a physical property.
>
> Example 1:
>
> We have a throttle that can be anywhere from 0-100% and we want to fit that
> value into 1 byte. So we would use a formula like:
>
> phys = (raw * 0.39215) + 0
>
> Example 2:
>
> We want to record ambient temperature of the vehicle. Resolution of 1
> degree is
> fine. Also, temperatures below -40 and above 215 degrees C are not
> particularly
> useful as they are very rare and out of scope for a useful temperature.
>
> phys = (raw * 1.0) - 40
>
> So far, I have been converting the raw data into floating point data before
> writing to arrow format to make it easier for the analysts to use the
> data. This of course means that I am converting to a less efficient format
> and I
> am also losing inherent information about the raw signal. I would rather
> be able
> to store the raw data in an appropriately sized unsigned integer and
> automatically convert to floating point when using the data, similar to
> dictionary encoding.
>
> Discussion
>
> - How would people generally deal with this situation using the arrow
> format?
> - Is this something that other people are interested in?
> - If this were to be added to the spec, what would be the best way to do
> it?
>
> While I am coming from an automotive perspective, I think there are many
> other
> areas of applicability (reading sensor data through an ADC, industrial
> automation and monitoring, etc.)
>
> I could see this working as either a new primitive type (similar to
> decimal), or
> as an extension where we simply put the factor and offset as standard
> metadata
> fields.
>
> Best regards,
> Elliot Morrison-Reed
>
>
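The linear formula from the message above, phys = (raw * factor) + offset, can be sketched as follows. This is an illustration only: LinearEncoding is a hypothetical stand-in for the factor and offset metadata an extension type might carry, not an existing Arrow type.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LinearEncoding:
    """Hypothetical extension metadata: phys = raw * factor + offset."""
    factor: float
    offset: float

    def to_physical(self, raw: List[int]) -> List[float]:
        return [r * self.factor + self.offset for r in raw]

    def to_raw(self, phys: List[float]) -> List[int]:
        # Round to the nearest representable raw step.
        return [round((p - self.offset) / self.factor) for p in phys]


# Example 1 from the thread: throttle position, 0-100% in one byte.
throttle = LinearEncoding(factor=0.39215, offset=0.0)
# Example 2 from the thread: one byte covering -40 to 215 degrees C.
temperature = LinearEncoding(factor=1.0, offset=-40.0)
```

Storing the raw unsigned integers and converting on read keeps the column compact and preserves the resolution of the original signal.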


Re: [VOTE] Accept donation of flightsql-odbc

2024-01-06 Thread Andrew Lamb
+1 (binding)

Thank you for helping make this happen

On Sat, Jan 6, 2024 at 2:10 AM Sutou Kouhei  wrote:

> +1
>
> In 
>   "[VOTE] Accept donation of flightsql-odbc" on Fri, 05 Jan 2024 10:41:21
> -0500,
>   "David Li"  wrote:
>
> > Hello,
> >
> > This vote is to determine if the Arrow PMC is in favor of accepting the
> donation of the flightsql-odbc library. This was discussed in a previous ML
> thread [1].
> >
> > The outline of the IP clearance form is at [2][3]. The code to be
> donated is at [4].
> >
> > The vote will be open for at least 72 hours.
> >
> > [ ] +1 : Accept the donation
> > [ ] 0 : No opinion
> > [ ] -1 : Reject donation because...
> >
> > My vote: +1
> >
> > [1]: https://lists.apache.org/thread/p3qyhd7p3o8v0wxgm2jvqf2vbqo92m8k
> > [2]:
> https://svn.apache.org/repos/asf/incubator/public/trunk/content/ip-clearance/arrow-flight-sql-odbc.xml
> > [3]:
> https://incubator.apache.org/ip-clearance/arrow-flight-sql-odbc.html
> > [4]: https://github.com/dremio/flightsql-odbc
> >
> > Best,
> > David
>


Re: [DISCUSS][RFC] Draft Proposal for new Top Level Project for DataFusion

2024-01-05 Thread Andrew Lamb
Thank you very much

On Fri, Jan 5, 2024 at 11:17 AM Jean-Baptiste Onofré 
wrote:

> Hi Andrew,
>
> The PODLINGNAMESEARCH is not yet completed: the VP Brand Management
> (Mark Thomas) should comment in the Jira to approve or not the name.
>
> I added a comment in the Jira to ping Mark. He should get back to us soon.
>
> Regards
> JB
>
> On Fri, Jan 5, 2024 at 3:38 PM Andrew Lamb  wrote:
> >
> > Thanks JB,
> >
> > I did do a name search and posted the results here [1]
> >
> > However, I am not sure what the next steps for that particular process is
> > (like does someone have to approve it, for example?)
> >
> > Any insight you could provide would be greatly appreciated
> >
> > Andrew
> >
> > [1] https://issues.apache.org/jira/browse/PODLINGNAMESEARCH-219
> >
> >
> > On Fri, Jan 5, 2024 at 7:55 AM Jean-Baptiste Onofré 
> wrote:
> >
> > > Hi Andrew,
> > >
> > > I did a quick review on the doc and it looks good to me. I just added
> > > a question about name search (DataFusion will probably work as TLP,
> > > but we have to check as we have a new Apache name moving from Arrow
> > > DataFusion to DataFusion).
> > >
> > > Please let me know if I can help on that.
> > >
> > > Thanks !
> > > Regards
> > > JB
> > >
> > > On Fri, Jan 5, 2024 at 12:26 PM Andrew Lamb 
> wrote:
> > > >
> > > > Upon reviewing the board report template, I am planning on the
> following
> > > > schedule:
> > > > 1. I'll leave this proposal for another few weeks to gather any
> > > additional
> > > > input
> > > > 2. In early February 2024 I'll start a formal vote thread on the dev@
> > > > mailing list for this proposal
> > > > 3. If the vote passes, I'll submit a proposed resolution to the ASF
> board
> > > > for their meeting in April 2024 using the pre-existing template[1]
> > > >
> > > >
> > > > [1]
> > > >
> > >
> https://svn.apache.org/repos/private/committers/board/templates/subproject-tlp-resolution.txt
> > > >
> > > > On Wed, Dec 27, 2023 at 6:32 PM L. C. Hsieh 
> wrote:
> > > >
> > > > > Thanks for writing the proposal. It looks great to me too.
> > > > > I added a few comments on it.
> > > > >
> > > > > On Wed, Dec 27, 2023 at 3:05 PM Andy Grove 
> > > wrote:
> > > > > >
> > > > > > Thank you for creating the draft proposal, Andrew. I have
> reviewed
> > > this
> > > > > and
> > > > > > I think it looks great.
> > > > > >
> > > > > > Andy.
> > > > > >
> > > > > > On Wed, Dec 27, 2023 at 3:19 PM Andrew Lamb <
> al...@influxdata.com>
> > > > > wrote:
> > > > > >
> > > > > > > I have created a draft proposal [1] to break DataFusion out to
> its
> > > own
> > > > > top
> > > > > > > level project. Please provide your feedback and suggestions.
> > > > > > >
> > > > > > > The proposal is included at the end of this email and in this
> > > Google
> > > > > Doc:
> > > > > > >
> > > > > > >
> > > > >
> > >
> https://docs.google.com/document/d/11WTNYS8KWScOt3ySTX39WVS6krPhUvHsuJRY9PZQx4g
> > > > > > > .
> > > > > > >
> > > > > > > Feel free to respond to this email or comment / make
> suggestions
> > > > > directly
> > > > > > > on the document.
> > > > > > >
> > > > > > > I would be especially grateful if people could review and
> comment
> > > on
> > > > > the
> > > > > > > proposed list of committers and PMC members.
> > > > > > >
> > > > > > > I hope everyone is not getting sick of hearing about this, but
> I
> > > think
> > > > > in
> > > > > > > this case it is better to over communicate than risk surprises.
> > > > > > >
> > > > > > > Andrew
> > > > > > >
> > > > > > > [1] https://github.com/apache/arrow-datafusion/issues/8491
> > > > > > >
> > > > > > >
> > > > >

Re: [DISCUSS][RFC] Draft Proposal for new Top Level Project for DataFusion

2024-01-05 Thread Andrew Lamb
Thanks JB,

I did do a name search and posted the results here [1]

However, I am not sure what the next steps for that particular process are
(does someone have to approve it, for example?)

Any insight you could provide would be greatly appreciated

Andrew

[1] https://issues.apache.org/jira/browse/PODLINGNAMESEARCH-219


On Fri, Jan 5, 2024 at 7:55 AM Jean-Baptiste Onofré  wrote:

> Hi Andrew,
>
> I did a quick review on the doc and it looks good to me. I just added
> a question about name search (DataFusion will probably work as TLP,
> but we have to check as we have a new Apache name moving from Arrow
> DataFusion to DataFusion).
>
> Please let me know if I can help on that.
>
> Thanks !
> Regards
> JB
>
> On Fri, Jan 5, 2024 at 12:26 PM Andrew Lamb  wrote:
> >
> > Upon reviewing the board report template, I am planning on the following
> > schedule:
> > 1. I'll leave this proposal for another few weeks to gather any
> additional
> > input
> > 2. In early February 2024 I'll start a formal vote thread on the dev@
> > mailing list for this proposal
> > 3. If the vote passes, I'll submit a proposed resolution to the ASF board
> > for their meeting in April 2024 using the pre-existing template[1]
> >
> >
> > [1]
> >
> https://svn.apache.org/repos/private/committers/board/templates/subproject-tlp-resolution.txt
> >
> > On Wed, Dec 27, 2023 at 6:32 PM L. C. Hsieh  wrote:
> >
> > > Thanks for writing the proposal. It looks great to me too.
> > > I added a few comments on it.
> > >
> > > On Wed, Dec 27, 2023 at 3:05 PM Andy Grove 
> wrote:
> > > >
> > > > Thank you for creating the draft proposal, Andrew. I have reviewed
> this
> > > and
> > > > I think it looks great.
> > > >
> > > > Andy.
> > > >
> > > > On Wed, Dec 27, 2023 at 3:19 PM Andrew Lamb 
> > > wrote:
> > > >
> > > > > I have created a draft proposal [1] to break DataFusion out to its
> own
> > > top
> > > > > level project. Please provide your feedback and suggestions.
> > > > >
> > > > > The proposal is included at the end of this email and in this
> Google
> > > Doc:
> > > > >
> > > > >
> > >
> https://docs.google.com/document/d/11WTNYS8KWScOt3ySTX39WVS6krPhUvHsuJRY9PZQx4g
> > > > > .
> > > > >
> > > > > Feel free to respond to this email or comment / make suggestions
> > > directly
> > > > > on the document.
> > > > >
> > > > > I would be especially grateful if people could review and comment
> on
> > > the
> > > > > proposed list of committers and PMC members.
> > > > >
> > > > > I hope everyone is not getting sick of hearing about this, but I
> think
> > > in
> > > > > this case it is better to over communicate than risk surprises.
> > > > >
> > > > > Andrew
> > > > >
> > > > > [1] https://github.com/apache/arrow-datafusion/issues/8491
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > DataFusion Top Level Project Proposal
> > > > > Dec 27, 2023
> > > > >
> > > > > [Editor’s note: This document is based on the proposal to the ASF
> > > board to
> > > > > > create the Arrow project. Once it has been reviewed, we plan to send
> it
> > > to
> > > > > the ASF board sometime in January or February 2024 for their
> > > consideration]
> > > > >
> > > > > To: The ASF (bo...@apache.org)
> > > > >
> > > > > Summary:
> > > > >
> > > > > We propose creating a new top level project, Apache DataFusion,
> from an
> > > > > existing sub project of Apache Arrow to facilitate additional
> > > community and
> > > > > project growth.
> > > > >
> > > > > 
> > > > > Apache DataFusion for Apache Top Level Project
> > > > >
> > > > > Abstract
> > > > >
> > > > > Apache Arrow DataFusion[1]  is a very fast, extensible query
> engine for
> > > > > building high-quality data-centric systems in Rust, using the
> Apache
> > > Arrow
> > > > > in-memory format. DataFusion offers SQL and Dataframe APIs,
> excellent

Re: [VOTE][RUST] Release Apache Arrow Rust Object Store 0.9.0 RC1

2024-01-05 Thread Andrew Lamb
+1 (binding)

I verified on an M3 Mac, both by running the tests manually via `cargo test
--all-features` and by running the release verification script.
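
For anyone checking a candidate by hand, the checksum part of the
verification can be sketched roughly as below. This is only an illustration
and not what the verification script does internally: the helper names are
hypothetical, and it assumes the checksum file begins with the hex digest
(the layout that `sha256sum` writes).

```python
# Minimal sketch: compare a file's SHA-256 digest with a published checksum.
# Assumption: the checksum file starts with the hex digest, optionally
# followed by whitespace and a file name (sha256sum-style layout).
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hash the file in chunks so large tarballs are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(artifact_path, checksum_path):
    """Return True when the artifact's digest equals the published one."""
    with open(checksum_path, "r", encoding="utf-8") as handle:
        expected = handle.read().split()[0].lower()
    return sha256_of(artifact_path) == expected
```

A matching digest only shows the download is intact; the signature check and
the unit tests are still separate steps.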

Thank you Raphael for all your work on this release.

Andrew

On Fri, Jan 5, 2024 at 8:30 AM Raphael Taylor-Davies
 wrote:

> Hi,
>
> I would like to propose a release of Apache Arrow Rust Object
> Store Implementation, version 0.9.0.
>
> This release candidate is based on commit:
> cb16050ec732872d5995c7420cc6858749bbf743 [1]
>
> The proposed release tarball and signatures are hosted at [2].
>
> The changelog is located at [3].
>
> Please download, verify checksums and signatures, run the unit tests,
> and vote on the release. There is a script [4] that automates some of
> the verification.
>
> The vote will be open for at least 72 hours.
>
> [ ] +1 Release this as Apache Arrow Rust Object Store
> [ ] +0
> [ ] -1 Do not release this as Apache Arrow Rust Object Store because...
>
> [1]:
>
> https://github.com/apache/arrow-rs/tree/cb16050ec732872d5995c7420cc6858749bbf743
> [2]:
>
> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-object-store-rs-0.9.0-rc1
> [3]:
>
> https://github.com/apache/arrow-rs/blob/cb16050ec732872d5995c7420cc6858749bbf743/object_store/CHANGELOG.md
> [4]:
>
> https://github.com/apache/arrow-rs/blob/master/object_store/dev/release/verify-release-candidate.sh
>
>


Re: [DISCUSS][RFC] Draft Proposal for new Top Level Project for DataFusion

2024-01-05 Thread Andrew Lamb
Upon reviewing the board report template, I am planning on the following
schedule:
1. I'll leave this proposal for another few weeks to gather any additional
input
2. In early February 2024 I'll start a formal vote thread on the dev@
mailing list for this proposal
3. If the vote passes, I'll submit a proposed resolution to the ASF board
for their meeting in April 2024 using the pre-existing template[1]


[1]
https://svn.apache.org/repos/private/committers/board/templates/subproject-tlp-resolution.txt

On Wed, Dec 27, 2023 at 6:32 PM L. C. Hsieh  wrote:

> Thanks for writing the proposal. It looks great to me too.
> I added a few comments on it.
>
> On Wed, Dec 27, 2023 at 3:05 PM Andy Grove  wrote:
> >
> > Thank you for creating the draft proposal, Andrew. I have reviewed this
> and
> > I think it looks great.
> >
> > Andy.
> >
> > On Wed, Dec 27, 2023 at 3:19 PM Andrew Lamb 
> wrote:
> >
> > > I have created a draft proposal [1] to break DataFusion out to its own
> top
> > > level project. Please provide your feedback and suggestions.
> > >
> > > The proposal is included at the end of this email and in this Google
> Doc:
> > >
> > >
> https://docs.google.com/document/d/11WTNYS8KWScOt3ySTX39WVS6krPhUvHsuJRY9PZQx4g
> > > .
> > >
> > > Feel free to respond to this email or comment / make suggestions
> directly
> > > on the document.
> > >
> > > I would be especially grateful if people could review and comment on
> the
> > > proposed list of committers and PMC members.
> > >
> > > I hope everyone is not getting sick of hearing about this, but I think
> in
> > > this case it is better to over communicate than risk surprises.
> > >
> > > Andrew
> > >
> > > [1] https://github.com/apache/arrow-datafusion/issues/8491
> > >
> > >
> > > --
> > >
> > > DataFusion Top Level Project Proposal
> > > Dec 27, 2023
> > >
> > > [Editor’s note: This document is based on the proposal to the ASF
> board to
> > > > create the Arrow project. Once it has been reviewed, we plan to send it
> to
> > > the ASF board sometime in January or February 2024 for their
> consideration]
> > >
> > > To: The ASF (bo...@apache.org)
> > >
> > > Summary:
> > >
> > > We propose creating a new top level project, Apache DataFusion, from an
> > > existing sub project of Apache Arrow to facilitate additional
> community and
> > > project growth.
> > >
> > > 
> > > Apache DataFusion for Apache Top Level Project
> > >
> > > Abstract
> > >
> > > Apache Arrow DataFusion[1]  is a very fast, extensible query engine for
> > > building high-quality data-centric systems in Rust, using the Apache
> Arrow
> > > in-memory format. DataFusion offers SQL and Dataframe APIs, excellent
> > > performance, built-in support for CSV, Parquet, JSON, and Avro,
> extensive
> > > customization, and a great community.
> > >
> > > [1] https://arrow.apache.org/datafusion/
> > >
> > >
> > > Proposal
> > >
> > > We propose creating a new top level ASF project, Apache DataFusion,
> > > governed initially by a subset of the Arrow project’s PMC and
> committers.
> > > The project’s code is in four existing git repositories, currently
> governed
> > > by Apache Arrow which would transfer to the new top level project.
> > >
> > > Background
> > >
> > > When DataFusion was initially donated to the Arrow project, it did not
> have
> > > a strong enough community to stand on its own. It has since grown
> > > significantly, and benefited immensely from being part of Arrow and
> > > nurturing of the Apache Way, and now has a community strong enough to
> stand
> > > on its own and that would benefit from focused governance attention.
> > >
> > > The community has discussed this idea publicly for more than 6 months
> > > https://github.com/apache/arrow-datafusion/discussions/6475  and
> briefly
> > > on
> > > the Arrow PMC mailing list
> > > https://lists.apache.org/thread/thv2jdm6640l6gm88hy8jhk5prjww0cs. As
> of
> > > the
> > > time of this writing both had exclusively positive reactions.
> > >
> > > Several current members of the Arrow PMC are both active contributors
> to
> > > DataFusion and understand and believe deeply in the Apache Way, and
> play

Re: [VOTE][RUST][DataFusion] Release DataFusion Python Bindings 34.0.0 RC1

2024-01-03 Thread Andrew Lamb
I think you are right -- it seems I have too new a Python installed (3.11)

andrewlamb@Andrews-MacBook-Pro:~/Software/sqlparser-rs/derive$ python3 --version
Python 3.11.6
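
As a rough illustration of why the install is rejected, the range check
behind `Requires-Python >=3.7,<3.11` can be sketched as below. The helper
`satisfies` is hypothetical and is not pip's actual implementation; it
assumes purely numeric versions and only the operators shown.

```python
# Minimal sketch: evaluate a Requires-Python range such as ">=3.7,<3.11".
# `satisfies` is a hypothetical helper, not pip's real code. It assumes
# purely numeric versions (no "rc1" suffixes) and a small set of operators.
import operator

_OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq,
        ">": operator.gt, "<": operator.lt}


def parse_version(text):
    """Turn "3.11.6" into the tuple (3, 11, 6)."""
    return tuple(int(part) for part in text.strip().split("."))


def _pad(left, right):
    """Zero-pad both tuples to the same length so 3.7 compares equal to 3.7.0."""
    width = max(len(left), len(right))
    return (left + (0,) * (width - len(left)),
            right + (0,) * (width - len(right)))


def satisfies(version, specifier):
    """Return True when `version` meets every comma-separated clause."""
    for clause in specifier.split(","):
        clause = clause.strip()
        # Try two-character operators first so ">=" is not read as ">".
        for symbol in (">=", "<=", "==", ">", "<"):
            if clause.startswith(symbol):
                candidate, bound = _pad(parse_version(version),
                                        parse_version(clause[len(symbol):]))
                if not _OPS[symbol](candidate, bound):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


print(satisfies("3.11.6", ">=3.7,<3.11"))   # the interpreter shown above
print(satisfies("3.10.13", ">=3.7,<3.11"))  # an older interpreter
```

Under these assumptions 3.11.6 falls outside the range because of the upper
bound `<3.11`, which matches the versions pip reported as ignored.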

Thank you for the investigation

Andrew


On Tue, Jan 2, 2024 at 7:39 PM Andy Grove  wrote:

> Hi Andrew,
>
> It looks like the issue is that numpy 1.21.3 requires a different Python
> version:
>
> 1.21.3 Requires-Python>=3.7,<3.11
>
> I am guessing that you have a Python version that is not within that range?
>
> I agree that this should not be a blocker.
>
> Thanks,
>
> Andy.
>
>
>
> On Fri, Dec 29, 2023 at 4:18 AM Andrew Lamb  wrote:
>
> > I had some trouble running the verification script --  got an error that
> a
> > specific version of numpy was not available. I am running on an Apple M3
> > Max.
> >
> > ERROR: No matching distribution found for numpy==1.21.3
> >
> > However that version does appear to be available:
> > https://pypi.org/project/numpy/1.21.3/
> >
> > I was able to verify the release on a Ubuntu 22.04 / x86_64 machine so I
> > don't think this is a release blocker
> >
> > Andrew
> >
> >
> > Here are more details:
> >
> > $ ./dev/release/verify-release-candidate.sh 34.0.0 1
> > + set -o pipefail
> > +++ dirname ./dev/release/verify-release-candidate.sh
> > ++ cd ./dev/release
> > ++ pwd
> > ...
> > Successfully installed pip-23.3.2
> > + python3 -m pip install -r requirements-310.txt
> > Collecting attrs==21.2.0 (from -r requirements-310.txt (line 7))
> >   Using cached attrs-21.2.0-py2.py3-none-any.whl (53 kB)
> > Collecting black==21.9b0 (from -r requirements-310.txt (line 11))
> >   Using cached black-21.9b0-py3-none-any.whl (148 kB)
> > Collecting click==8.0.3 (from -r requirements-310.txt (line 15))
> >   Using cached click-8.0.3-py3-none-any.whl (97 kB)
> > Collecting flake8==4.0.1 (from -r requirements-310.txt (line 19))
> >   Using cached flake8-4.0.1-py2.py3-none-any.whl (64 kB)
> > Collecting iniconfig==1.1.1 (from -r requirements-310.txt (line 23))
> >   Using cached iniconfig-1.1.1-py2.py3-none-any.whl (5.0 kB)
> > Collecting isort==5.9.3 (from -r requirements-310.txt (line 27))
> >   Using cached isort-5.9.3-py3-none-any.whl (106 kB)
> > Collecting maturin==0.15.1 (from -r requirements-310.txt (line 31))
> >   Using cached
> >
> >
> maturin-0.15.1-py3-none-macosx_10_9_x86_64.macosx_11_0_arm64.macosx_10_9_universal2.whl
> > (14.5 MB)
> > Collecting mccabe==0.6.1 (from -r requirements-310.txt (line 46))
> >   Using cached mccabe-0.6.1-py2.py3-none-any.whl (8.6 kB)
> > Collecting mypy==0.910 (from -r requirements-310.txt (line 50))
> >   Using cached mypy-0.910-py3-none-any.whl (2.1 MB)
> > Collecting mypy-extensions==0.4.3 (from -r requirements-310.txt (line
> 75))
> >   Using cached mypy_extensions-0.4.3-py2.py3-none-any.whl (4.5 kB)
> > ERROR: Ignored the following versions that require a different python
> > version: 1.21.2 Requires-Python >=3.7,<3.11; 1.21.3 Requires-Python
> > >=3.7,<3.11; 1.21.4 Requires-Python >=3.7,<3.11; 1.21.5 Requires-Python
> > >=3.7,<3.11; 1.21.6 Requires-Python >=3.7,<3.11
> > ERROR: Could not find a version that satisfies the requirement
> > numpy==1.21.3 (from versions: 1.3.0, 1.4.1, 1.5.0, 1.5.1, 1.6.0, 1.6.1,
> > 1.6.2, 1.7.0, 1.7.1, 1.7.2, 1.8.0, 1.8.1, 1.8.2, 1.9.0, 1.9.1, 1.9.2,
> > 1.9.3, 1.10.0.post2, 1.10.1, 1.10.2, 1.10.4, 1.11.0, 1.11.1, 1.11.2,
> > 1.11.3, 1.12.0, 1.12.1, 1.13.0, 1.13.1, 1.13.3, 1.14.0, 1.14.1, 1.14.2,
> > 1.14.3, 1.14.4, 1.14.5, 1.14.6, 1.15.0, 1.15.1, 1.15.2, 1.15.3, 1.15.4,
> > 1.16.0, 1.16.1, 1.16.2, 1.16.3, 1.16.4, 1.16.5, 1.16.6, 1.17.0, 1.17.1,
> > 1.17.2, 1.17.3, 1.17.4, 1.17.5, 1.18.0, 1.18.1, 1.18.2, 1.18.3, 1.18.4,
> > 1.18.5, 1.19.0, 1.19.1, 1.19.2, 1.19.3, 1.19.4, 1.19.5, 1.20.0, 1.20.1,
> > 1.20.2, 1.20.3, 1.21.0, 1.21.1, 1.22.0, 1.22.1, 1.22.2, 1.22.3, 1.22.4,
> > 1.23.0rc1, 1.23.0rc2, 1.23.0rc3, 1.23.0, 1.23.1, 1.23.2, 1.23.3, 1.23.4,
> > 1.23.5, 1.24.0rc1, 1.24.0rc2, 1.24.0, 1.24.1, 1.24.2, 1.24.3, 1.24.4,
> > 1.25.0rc1, 1.25.0, 1.25.1, 1.25.2, 1.26.0b1, 1.26.0rc1, 1.26.0, 1.26.1,
> > 1.26.2)
> > ERROR: No matching distribution found for numpy==1.21.3
> > + cleanup
> > + '[' no = yes ']'
> > + echo 'Failed to verify release candidate. See
> >
> >
> /var/folders/1l/tg68jc6550gg8xqf1hr4mlwrgn/T/arrow-34.0.0.X.Oo74Ac7ank
> > for details.'
> > Failed to verify release candidate. See
> >
> >
> /var/folders/1l/tg68jc6550gg8xqf1hr4mlwrgn/T/arrow-34.0.0.X.Oo74Ac7ank

Re: [VOTE][RUST][DataFusion] Release DataFusion Python Bindings 34.0.0 RC1

2023-12-29 Thread Andrew Lamb
I had some trouble running the verification script -- I got an error that a
specific version of numpy was not available. I am running on an Apple M3
Max.

ERROR: No matching distribution found for numpy==1.21.3

However that version does appear to be available:
https://pypi.org/project/numpy/1.21.3/

I was able to verify the release on an Ubuntu 22.04 / x86_64 machine so I
don't think this is a release blocker.

Andrew


Here are more details:

$ ./dev/release/verify-release-candidate.sh 34.0.0 1
+ set -o pipefail
+++ dirname ./dev/release/verify-release-candidate.sh
++ cd ./dev/release
++ pwd
...
Successfully installed pip-23.3.2
+ python3 -m pip install -r requirements-310.txt
Collecting attrs==21.2.0 (from -r requirements-310.txt (line 7))
  Using cached attrs-21.2.0-py2.py3-none-any.whl (53 kB)
Collecting black==21.9b0 (from -r requirements-310.txt (line 11))
  Using cached black-21.9b0-py3-none-any.whl (148 kB)
Collecting click==8.0.3 (from -r requirements-310.txt (line 15))
  Using cached click-8.0.3-py3-none-any.whl (97 kB)
Collecting flake8==4.0.1 (from -r requirements-310.txt (line 19))
  Using cached flake8-4.0.1-py2.py3-none-any.whl (64 kB)
Collecting iniconfig==1.1.1 (from -r requirements-310.txt (line 23))
  Using cached iniconfig-1.1.1-py2.py3-none-any.whl (5.0 kB)
Collecting isort==5.9.3 (from -r requirements-310.txt (line 27))
  Using cached isort-5.9.3-py3-none-any.whl (106 kB)
Collecting maturin==0.15.1 (from -r requirements-310.txt (line 31))
  Using cached
maturin-0.15.1-py3-none-macosx_10_9_x86_64.macosx_11_0_arm64.macosx_10_9_universal2.whl
(14.5 MB)
Collecting mccabe==0.6.1 (from -r requirements-310.txt (line 46))
  Using cached mccabe-0.6.1-py2.py3-none-any.whl (8.6 kB)
Collecting mypy==0.910 (from -r requirements-310.txt (line 50))
  Using cached mypy-0.910-py3-none-any.whl (2.1 MB)
Collecting mypy-extensions==0.4.3 (from -r requirements-310.txt (line 75))
  Using cached mypy_extensions-0.4.3-py2.py3-none-any.whl (4.5 kB)
ERROR: Ignored the following versions that require a different python
version: 1.21.2 Requires-Python >=3.7,<3.11; 1.21.3 Requires-Python
>=3.7,<3.11; 1.21.4 Requires-Python >=3.7,<3.11; 1.21.5 Requires-Python
>=3.7,<3.11; 1.21.6 Requires-Python >=3.7,<3.11
ERROR: Could not find a version that satisfies the requirement
numpy==1.21.3 (from versions: 1.3.0, 1.4.1, 1.5.0, 1.5.1, 1.6.0, 1.6.1,
1.6.2, 1.7.0, 1.7.1, 1.7.2, 1.8.0, 1.8.1, 1.8.2, 1.9.0, 1.9.1, 1.9.2,
1.9.3, 1.10.0.post2, 1.10.1, 1.10.2, 1.10.4, 1.11.0, 1.11.1, 1.11.2,
1.11.3, 1.12.0, 1.12.1, 1.13.0, 1.13.1, 1.13.3, 1.14.0, 1.14.1, 1.14.2,
1.14.3, 1.14.4, 1.14.5, 1.14.6, 1.15.0, 1.15.1, 1.15.2, 1.15.3, 1.15.4,
1.16.0, 1.16.1, 1.16.2, 1.16.3, 1.16.4, 1.16.5, 1.16.6, 1.17.0, 1.17.1,
1.17.2, 1.17.3, 1.17.4, 1.17.5, 1.18.0, 1.18.1, 1.18.2, 1.18.3, 1.18.4,
1.18.5, 1.19.0, 1.19.1, 1.19.2, 1.19.3, 1.19.4, 1.19.5, 1.20.0, 1.20.1,
1.20.2, 1.20.3, 1.21.0, 1.21.1, 1.22.0, 1.22.1, 1.22.2, 1.22.3, 1.22.4,
1.23.0rc1, 1.23.0rc2, 1.23.0rc3, 1.23.0, 1.23.1, 1.23.2, 1.23.3, 1.23.4,
1.23.5, 1.24.0rc1, 1.24.0rc2, 1.24.0, 1.24.1, 1.24.2, 1.24.3, 1.24.4,
1.25.0rc1, 1.25.0, 1.25.1, 1.25.2, 1.26.0b1, 1.26.0rc1, 1.26.0, 1.26.1,
1.26.2)
ERROR: No matching distribution found for numpy==1.21.3
+ cleanup
+ '[' no = yes ']'
+ echo 'Failed to verify release candidate. See
/var/folders/1l/tg68jc6550gg8xqf1hr4mlwrgn/T/arrow-34.0.0.X.Oo74Ac7ank
for details.'
Failed to verify release candidate. See
/var/folders/1l/tg68jc6550gg8xqf1hr4mlwrgn/T/arrow-34.0.0.X.Oo74Ac7ank
for details.
(venv) andrewlamb@Andrews-MacBook-Pro:~/Downloads/apache-arrow-datafusion-python-34.0.0$


On Thu, Dec 28, 2023 at 9:18 PM vin jake  wrote:

> +1 (binding) Verified on my M1 Mac.
>
> Thanks Andy!
>
> On Fri, Dec 29, 2023 at 5:42 AM Andy Grove  wrote:
>
> > Hi,
> >
> > I would like to propose a release of Apache Arrow DataFusion Python
> > Bindings,
> > version 34.0.0.
> >
> > This release candidate is based on commit:
> > b22f82f3055941dc3599c9a18458a2de163ff4c0 [1]
> > The proposed release tarball and signatures are hosted at [2].
> > The changelog is located at [3].
> > The Python wheels are located at [4].
> >
> > Please download, verify checksums and signatures, run the unit tests, and
> > vote
> > on the release. The vote will be open for at least 72 hours.
> >
> > Only votes from PMC members are binding, but all members of the community
> > are
> > encouraged to test the release and vote with "(non-binding)".
> >
> > The standard verification procedure is documented at
> >
> >
> https://github.com/apache/arrow-datafusion-python/blob/main/dev/release/README.md#verifying-release-candidates
> > .
> >
> > [ ] +1 Release this as Apache Arrow DataFusion Python 34.0.0
> > [ ] +0
> > [ ] -1 Do not release this as Apache Arrow DataFusion Python 34.0.0
> > because...
> >
> > Here is my vote:
> >
> > +1
> >
> > [1]:
> >
> >
> https://github.com/apache/arrow-datafusion-python/tree/b22f82f3055941dc3599c9a18458a2de163ff4c0
> > [2]:
> >
> >
> 

[DISCUSS][RFC] Draft Proposal for new Top Level Project for DataFusion

2023-12-27 Thread Andrew Lamb
I have created a draft proposal [1] to break DataFusion out to its own top
level project. Please provide your feedback and suggestions.

The proposal is included at the end of this email and in this Google Doc:
https://docs.google.com/document/d/11WTNYS8KWScOt3ySTX39WVS6krPhUvHsuJRY9PZQx4g
.

Feel free to respond to this email or comment / make suggestions directly
on the document.

I would be especially grateful if people could review and comment on the
proposed list of committers and PMC members.

I hope everyone is not getting sick of hearing about this, but I think in
this case it is better to over communicate than risk surprises.

Andrew

[1] https://github.com/apache/arrow-datafusion/issues/8491


--

DataFusion Top Level Project Proposal
Dec 27, 2023

[Editor’s note: This document is based on the proposal to the ASF board to
create the Arrow project. Once it has been reviewed, we plan to send it to
the ASF board sometime in January or February 2024 for their consideration]

To: The ASF (bo...@apache.org)

Summary:

We propose creating a new top level project, Apache DataFusion, from an
existing sub project of Apache Arrow to facilitate additional community and
project growth.


Apache DataFusion for Apache Top Level Project

Abstract

Apache Arrow DataFusion[1]  is a very fast, extensible query engine for
building high-quality data-centric systems in Rust, using the Apache Arrow
in-memory format. DataFusion offers SQL and Dataframe APIs, excellent
performance, built-in support for CSV, Parquet, JSON, and Avro, extensive
customization, and a great community.

[1] https://arrow.apache.org/datafusion/


Proposal

We propose creating a new top level ASF project, Apache DataFusion,
governed initially by a subset of the Arrow project’s PMC and committers.
The project’s code is in four existing git repositories, currently governed
by Apache Arrow, which would transfer to the new top level project.

Background

When DataFusion was initially donated to the Arrow project, it did not have
a strong enough community to stand on its own. It has since grown
significantly and benefited immensely from being part of Arrow and from the
nurturing of the Apache Way. It now has a community that is strong enough to
stand on its own and that would benefit from focused governance attention.

The community has discussed this idea publicly for more than 6 months
(https://github.com/apache/arrow-datafusion/discussions/6475) and briefly on
the Arrow PMC mailing list
(https://lists.apache.org/thread/thv2jdm6640l6gm88hy8jhk5prjww0cs). As of the
time of this writing, both had exclusively positive reactions.

Several current members of the Arrow PMC are active contributors to
DataFusion, understand and believe deeply in the Apache Way, and play
active governance roles in the Arrow project as PMC members and PMC chairs,
guiding the community and releasing software versions. With this existing
governance experience and structure, the new top level project will be able
to function well immediately and independently.

Overview of DataFusion

Current Status

Meritocracy

DataFusion has been developed as part of Apache Arrow and thus has been
operating as a meritocracy. Many of the developers of DataFusion are Arrow
PMC members or committers. The DataFusion project plans to continue adding
new PMC members and committers as the project matures and grows.

Community

The DataFusion development team seeks to foster the development and user
communities. We hope that becoming a separate project will help both Arrow
and DataFusion communities by being more focused.  Focused governance will
make it easier to grow the community of committers and PMC members and make
the organization more clear to others.

Alignment

The ASF is a natural host for DataFusion given that it is already the home
of Arrow, Parquet, and other related distributed, storage, and query
execution systems.

Project Leadership

Proposed Initial PMC

We propose the following people as the initial DataFusion PMC members. This
is a subset of the existing Arrow PMC members who contribute to DataFusion
https://people.apache.org/phonebook.html?unix=arrow

Andy Grove (agrove): Arrow PMC Chair
Andrew Lamb (alamb): Arrow PMC, past Arrow PMC Chair
Daniël Heres (dheres): Arrow PMC
Jie Wen (jakevin): Arrow PMC, Doris Committer
Kun Liu (liukun): Arrow PMC, IoTDB PMC, TSFile PMC
Liang-Chi Hsieh (viirya): Arrow PMC, Spark PMC
Qingping Hou (houqp): Arrow PMC, Doris Committer
Will Jones (wjones127): Arrow PMC

We’d like to propose Andrew Lamb as the initial Chair (and thus ASF VP)
for the DataFusion project.

Affiliations

Andy Grove (agrove): NVidia
Andrew Lamb (alamb): InfluxData
Daniël Heres (dheres): Coralogix
Jie Wen (jakevin): SelectDB
Kun Liu (liukun): Ebay
Liang-Chi Hsieh (viirya): Apple
Qingping Hou (houqp): Scribd
Will Jones (wjones127): VoltronData

Proposed Initial Committers

In addition to the PMC, we propose the following people as the initial
DataFusion committers

Re: Question: Arrow Flight handshake protocol version

2023-12-27 Thread Andrew Lamb
To the best of my knowledge, the Rust crate implements the Handshake
portion of the protocol described in the Flight documentation [1]

Depending on your use case, I think your application needs to decide how to
use the Handshake message, if at all.

Andrew

[1]: https://arrow.apache.org/docs/format/Flight.html#authentication

On Thu, Dec 21, 2023 at 10:04 AM vertexclique vertexclique <
vertexcli...@gmail.com> wrote:

> Hi all;
>
> I am looking to Rust crate for arrow flight. One thing that I am not sure
> right now is that protocol version is dedicated to user-defined protocol or
> that's reserved in a sense for handshake for a future use (or current)?
>
> Best,
> Theo Mahmut
>


Re: [RESULT][VOTE][RUST] Release Apache Arrow Rust 49.0.0 RC1

2023-12-21 Thread Andrew Lamb
Based on a conversation in Slack, I also pushed the 49.0.0 tag to the repo
with these commands:

```
git tag 49.0.0 747dcbf0670aeab2ede474edb3c4f22028d6a7e6
git push apache 49.0.0
```

On Mon, Nov 13, 2023 at 7:11 AM Raphael Taylor-Davies
 wrote:

> With 6 +1 votes (4 binding) the release is approved
>
> The release is available here:
>
> It has also been released to crates.io. I opted to omit arrow-avro as it
> is still a work in progress.
>
> Thank you to everyone who helped verify this release
>
> On 09/11/2023 20:37, Andrew Lamb wrote:
> > +1 (binding) on Mac x86
> >
> > Thank you Raphael
> >
> > Andrew
> >
> > On Thu, Nov 9, 2023 at 11:49 AM Chao Sun  wrote:
> >
> >> +1 (non-binding)
> >>
> >> Verified on M1 Mac. Thanks Raphael.
> >>
> >> On Thu, Nov 9, 2023 at 12:47 AM Wayne Xia  wrote:
> >>> +1 (non-binding)
> >>>
> >>> Verified on Intel Linux
> >>>
> >>> Thanks Raphael
> >>>
> >>> On Wed, Nov 8, 2023 at 6:12 AM L. C. Hsieh  wrote:
> >>>
> >>>> +1 (binding)
> >>>>
> >>>> Verified on Intel Mac.
> >>>>
> >>>> Thanks Raphael.
> >>>>
> >>>> On Tue, Nov 7, 2023 at 1:38 PM Andy Grove 
> >> wrote:
> >>>>> +1 (binding)
> >>>>>
> >>>>> Verified on Ubuntu 22.04.3 LTS.
> >>>>>
> >>>>> Thanks, Raphael.
> >>>>>
> >>>>> On Tue, Nov 7, 2023 at 2:22 PM Raphael Taylor-Davies
> >>>>>  wrote:
> >>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> I would like to propose a release of Apache Arrow Rust
> >> Implementation,
> >>>>>> version 49.0.0.
> >>>>>>
> >>>>>> This release candidate is based on commit:
> >>>>>> 747dcbf0670aeab2ede474edb3c4f22028d6a7e6 [1]
> >>>>>>
> >>>>>> The proposed release tarball and signatures are hosted at [2].
> >>>>>>
> >>>>>> The changelog is located at [3].
> >>>>>>
> >>>>>> Please download, verify checksums and signatures, run the unit
> >> tests,
> >>>>>> and vote on the release. There is a script [4] that automates some
> >> of
> >>>>>> the verification.
> >>>>>>
> >>>>>> The vote will be open for at least 72 hours.
> >>>>>>
> >>>>>> [ ] +1 Release this as Apache Arrow Rust
> >>>>>> [ ] +0
> >>>>>> [ ] -1 Do not release this as Apache Arrow Rust  because...
> >>>>>>
> >>>>>> [1]:
> >>>>>>
> >>>>>>
> >>
> https://github.com/apache/arrow-rs/tree/747dcbf0670aeab2ede474edb3c4f22028d6a7e6
> >>>>>> [2]:
> >>>>>>
> >> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-rs-49.0.0-rc1
> >>>>>> [3]:
> >>>>>>
> >>>>>>
> >>
> https://github.com/apache/arrow-rs/blob/747dcbf0670aeab2ede474edb3c4f22028d6a7e6/CHANGELOG.md
> >>>>>> [4]:
> >>>>>>
> >>>>>>
> >>
> https://github.com/apache/arrow-rs/blob/master/dev/release/verify-release-candidate.sh
> >>>>>>
>


[DISCUSS] [DATAFUSION] Repositories

2023-12-21 Thread Andrew Lamb
This is another chain related to the previous discussion [1], about
planning to propose [2] "graduating" DataFusion to its own top level
Apache project.

In this chain, I would like to discuss which of the current arrow
repositories[3] to propose moving to the new PMC

I propose the following three repositories move in their entirety

https://github.com/apache/arrow-datafusion
https://github.com/apache/arrow-ballista
https://github.com/apache/arrow-datafusion-python

I do not know of any code in any other arrow repositories that would be
appropriate to move to a new Apache Project.

Please let me know your thoughts,
Andrew

[1]: https://github.com/apache/arrow-datafusion/discussions/6475
[2]: https://github.com/apache/arrow-datafusion/issues/8491
[3]:
https://github.com/orgs/apache/repositories?language==arrow==all


[DISCUSS] [DATAFUSION] PMC for new DataFusion top level project

2023-12-20 Thread Andrew Lamb
Hello,

As we have discussed previously [1], we are planning to propose [2]
"graduating" DataFusion to its own top level Apache project.

I would like to discuss the initial PMC members for the new top level
project.  The suggestion in [1] is

> All existing Arrow Committers and PMC members who so desired, would start
as committers or PMC members on the new DataFusion project (assuming this
is allowed by the process)

From what I can tell, this means the PMC would be the following from the
current Arrow PMC [3]:

Andy Grove (NVidia)
Andrew Lamb (InfluxData)
Daniël Heres (Coralogix)
Jie Wen (SelectDB)
Kun Liu (Ebay)
Liang-Chi Hsieh (Apple)
Qingping Hou (Scribd)
Will Jones (VoltronData)

I think Raphael Taylor-Davies has told me offline he would prefer to focus
on Arrow and thus not join the DataFusion PMC, though it would be nice if
he could confirm.

We also need to propose a chair of the new PMC -- I am happy to help anyone
who would like to do this role, or do it myself. Spending a year as the
Arrow PMC chair gave me sufficient experience to make sure the process is
smooth, in my opinion.

If the new project is approved and created then the initial PMC will
invite the relevant existing Arrow committers as DataFusion committers.

Please let me know your thoughts and if there are other existing Arrow PMC
members who should be included in the proposal for initial DataFusion PMC.

Andrew

p.s. As part of this process, I discovered Arrow's origin as a top level
project came from splitting off from the Apache Drill project, which I had
not previously known


[1]: https://github.com/apache/arrow-datafusion/discussions/6475
[2]: https://github.com/apache/arrow-datafusion/issues/8491
[3]: https://arrow.apache.org/committers/


Re: [VOTE][RUST][DataFusion] Release Apache Arrow DataFusion 34.0.0 RC1

2023-12-13 Thread Andrew Lamb
There is now a PR with a proposed fix ready as well[1]

[1]: https://github.com/apache/arrow-datafusion/pull/8533

On Wed, Dec 13, 2023 at 3:35 PM L. C. Hsieh  wrote:

> Thanks Andrew for finding the regression before releasing.
>
> Agreed to hold on the release for now.
>
> On Wed, Dec 13, 2023 at 12:26 PM Andy Grove  wrote:
> >
> > Thanks, Andrew.
> >
> > That does seem quite serious. I agree it would be better to hold off on the
> > release until this is resolved.
> >
> > On Wed, Dec 13, 2023 at 1:09 PM Andrew Lamb wrote:
> >
> > > > I have found a regression [1] that I think is fairly serious and should be
> > > > fixed prior to a release.
> > >
> > > If others think we should release anyways I can plan to make a patch
> > > release shortly with the fix
> > >
> > >
> > > [1] https://github.com/apache/arrow-datafusion/issues/8532
> > >
> > > On Wed, Dec 13, 2023 at 2:36 AM Kun Liu  wrote:
> > >
> > > > +1 (binding)
> > > >
> > > > verified on the M2 mac
> > > >
> > > > Thanks
> > > >
> > > > > On Wed, Dec 13, 2023 at 14:27, Wayne Xia wrote:
> > > >
> > > > > +1 (non-binding)
> > > > >
> > > > > Verified on AMD64 Linux
> > > > >
> > > > > Thanks Andy
> > > > >
> > > > > > On Wed, Dec 13, 2023 at 10:05 AM vin jake wrote:
> > > > >
> > > > > > +1 (binding)
> > > > > >
> > > > > > Verified on Mac M1
> > > > > >
> > > > > > Thanks Andy
> > > > > >
> > > > > > > On Tue, Dec 12, 2023 at 10:17 PM, Andy Grove wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > I would like to propose a release of Apache Arrow DataFusion
> > > > > > > Implementation,
> > > > > > > version 34.0.0.
> > > > > > >
> > > > > > > This release candidate is based on commit:
> > > > > > > 1a02d1456878dcd44159ebaf33e24c28f471aa14 [1]
> > > > > > > The proposed release tarball and signatures are hosted at [2].
> > > > > > > The changelog is located at [3].
> > > > > > >
> > > > > > > > Please download, verify checksums and signatures, run the unit tests, and vote
> > > > > > > > on the release. The vote will be open for at least 72 hours.
> > > > > > > >
> > > > > > > > Only votes from PMC members are binding, but all members of the community are
> > > > > > > > encouraged to test the release and vote with "(non-binding)".
> > > > > > >
> > > > > > > The standard verification procedure is documented at
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> https://github.com/apache/arrow-datafusion/blob/main/dev/release/README.md#verifying-release-candidates
> > > > > > > .
> > > > > > >
> > > > > > > [ ] +1 Release this as Apache Arrow DataFusion 34.0.0
> > > > > > > [ ] +0
> > > > > > > > [ ] -1 Do not release this as Apache Arrow DataFusion 34.0.0 because...
> > > > > > >
> > > > > > > Here is my vote:
> > > > > > >
> > > > > > > +1
> > > > > > >
> > > > > > > [1]:
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> https://github.com/apache/arrow-datafusion/tree/1a02d1456878dcd44159ebaf33e24c28f471aa14
> > > > > > > [2]:
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-34.0.0-rc1
> > > > > > > [3]:
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> https://github.com/apache/arrow-datafusion/blob/1a02d1456878dcd44159ebaf33e24c28f471aa14/CHANGELOG.md
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
>


Re: [VOTE][RUST][DataFusion] Release Apache Arrow DataFusion 34.0.0 RC1

2023-12-13 Thread Andrew Lamb
I have found a regression [1] that I think is fairly serious and should be
fixed prior to a release.

If others think we should release anyways I can plan to make a patch
release shortly with the fix


[1] https://github.com/apache/arrow-datafusion/issues/8532

On Wed, Dec 13, 2023 at 2:36 AM Kun Liu  wrote:

> +1 (binding)
>
> verified on the M2 mac
>
> Thanks
>
> On Wed, Dec 13, 2023 at 14:27, Wayne Xia wrote:
>
> > +1 (non-binding)
> >
> > Verified on AMD64 Linux
> >
> > Thanks Andy
> >
> > On Wed, Dec 13, 2023 at 10:05 AM vin jake  wrote:
> >
> > > +1 (binding)
> > >
> > > Verified on Mac M1
> > >
> > > Thanks Andy
> > >
> > > On Tue, Dec 12, 2023 at 10:17 PM, Andy Grove wrote:
> > >
> > > > Hi,
> > > >
> > > > I would like to propose a release of Apache Arrow DataFusion
> > > > Implementation,
> > > > version 34.0.0.
> > > >
> > > > This release candidate is based on commit:
> > > > 1a02d1456878dcd44159ebaf33e24c28f471aa14 [1]
> > > > The proposed release tarball and signatures are hosted at [2].
> > > > The changelog is located at [3].
> > > >
> > > > Please download, verify checksums and signatures, run the unit tests, and vote
> > > > on the release. The vote will be open for at least 72 hours.
> > > >
> > > > Only votes from PMC members are binding, but all members of the community are
> > > > encouraged to test the release and vote with "(non-binding)".
> > > >
> > > > The standard verification procedure is documented at
> > > > https://github.com/apache/arrow-datafusion/blob/main/dev/release/README.md#verifying-release-candidates
> > > > .
> > > >
> > > > [ ] +1 Release this as Apache Arrow DataFusion 34.0.0
> > > > [ ] +0
> > > > [ ] -1 Do not release this as Apache Arrow DataFusion 34.0.0 because...
> > > >
> > > > Here is my vote:
> > > >
> > > > +1
> > > >
> > > > [1]:
> > > > https://github.com/apache/arrow-datafusion/tree/1a02d1456878dcd44159ebaf33e24c28f471aa14
> > > > [2]:
> > > > https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-34.0.0-rc1
> > > > [3]:
> > > > https://github.com/apache/arrow-datafusion/blob/1a02d1456878dcd44159ebaf33e24c28f471aa14/CHANGELOG.md
> > > >
> > >
> >
>


Re: [VOTE][RUST][DataFusion] Release Apache Arrow DataFusion 34.0.0 RC1

2023-12-12 Thread Andrew Lamb
+1 (binding)

Verified on Mac M3

Thanks for keeping things rolling Andy




On Tue, Dec 12, 2023 at 9:18 AM Andy Grove  wrote:

> Hi,
>
> I would like to propose a release of Apache Arrow DataFusion
> Implementation,
> version 34.0.0.
>
> This release candidate is based on commit:
> 1a02d1456878dcd44159ebaf33e24c28f471aa14 [1]
> The proposed release tarball and signatures are hosted at [2].
> The changelog is located at [3].
>
> Please download, verify checksums and signatures, run the unit tests, and vote
> on the release. The vote will be open for at least 72 hours.
>
> Only votes from PMC members are binding, but all members of the community are
> encouraged to test the release and vote with "(non-binding)".
>
> The standard verification procedure is documented at
>
> https://github.com/apache/arrow-datafusion/blob/main/dev/release/README.md#verifying-release-candidates
> .
>
> [ ] +1 Release this as Apache Arrow DataFusion 34.0.0
> [ ] +0
> [ ] -1 Do not release this as Apache Arrow DataFusion 34.0.0 because...
>
> Here is my vote:
>
> +1
>
> [1]:
>
> https://github.com/apache/arrow-datafusion/tree/1a02d1456878dcd44159ebaf33e24c28f471aa14
> [2]:
>
> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-34.0.0-rc1
> [3]:
>
> https://github.com/apache/arrow-datafusion/blob/1a02d1456878dcd44159ebaf33e24c28f471aa14/CHANGELOG.md
>


Re: [VOTE] Flight SQL as experimental

2023-12-08 Thread Andrew Lamb
+1 (binding)

Sorry for the noise about a new thread, I just checked the archive[1] and
this is in a different thread. Thank you for this David.

Andrew

[1] https://lists.apache.org/thread/9t78clfqzsby08d2ryc83gwrtm3cthq8

On Fri, Dec 8, 2023 at 3:15 PM Joel Lubinitsky  wrote:

> +1 (non-binding)
>
> On Fri, Dec 8, 2023 at 3:11 PM Aldrin  wrote:
>
> > This thread does have [VOTE] for me. does it not for you?
> >
> >
> > On Fri, Dec 8, 2023 at 12:09, Andrew Lamb wrote:
> >
> > Would it be possible to change the thread's subject line to "[VOTE]" so it
> > is more visible that we are proposing a change? I worry that this will be
> > buried at the bottom of something that says "[DISCUSS]"
> >
> > On Fri, Dec 8, 2023 at 2:43 PM David Li  wrote:
> >
> > > Let's start a formal vote just so we're on the same page now that we've
> > > discussed a few things.
> > >
> > > > I would like to propose we remove 'experimental' from Flight SQL and make
> > > > it stable:
> > > >
> > > > - Remove the 'experimental' option from the Protobuf definitions (but
> > > > leave the option definition for future additions)
> > > > - Update specifications/documentation/implementations to no longer refer
> > > > to Flight SQL as experimental, and describe what stable means (no
> > > > backwards-incompatible changes)
> > >
> > > The vote will be open for at least 72 hours.
> > >
> > > [ ] +1
> > > [ ] +0
> > > [ ] -1 Keep Flight SQL experimental because...
> > >
> > > On Fri, Dec 8, 2023, at 13:37, Weston Pace wrote:
> > > > +1
> > > >
> > > > > On Fri, Dec 8, 2023 at 10:33 AM Micah Kornfield <emkornfi...@gmail.com> wrote:
> > > >
> > > >> +1
> > > >>
> > > > >> On Fri, Dec 8, 2023 at 10:29 AM Andrew Lamb wrote:
> > > > >>
> > > > >> > I agree it is time to "promote" ArrowFlightSQL to the same level as other
> > > > >> > standards in Arrow
> > > > >> >
> > > > >> > Now that it is used widely (we use and count on it too at InfluxData) I
> > > > >> > agree it makes sense to remove the experimental label from the overall
> > > > >> > spec.
> > > > >> >
> > > > >> > It would make sense to leave experimental / caveats on any places (like
> > > > >> > extension APIs) that are likely to change
> > > >> >
> > > >> > Andrew
> > > >> >
> > > > >> > On Fri, Dec 8, 2023 at 11:39 AM David Li wrote:
> > > > >> >
> > > > >> > > Yes, I think we can continue marking new features (like the bulk
> > > > >> > > ingest/session proposals) as experimental but remove it from anything
> > > > >> > > currently in the spec.
> > > >> > >
> > > >> > > On Fri, Dec 8, 2023, at 11:36, Laurent Goujon wrote:
> > > > >> > > > I'm the author of the initial pull request which triggered the discussion.
> > > > >> > > > I was focusing first on the comment in Maven pom.xml files which show up in
> > > > >> > > > Maven Central and other places, and which got some people confused about
> > > > >> > > > the state of the driver/code. IMHO this would apply to the current
> > > > >> > > > Flight/Flight SQL protocol and code as it is today. Protocol extensions
> > > > >> > > > should be still deemed experimental if still in their incubating phase?
> > > >> > > >
> > > >> > > > Laurent
> > > >> > > >
> > > > >> > > > On Thu, Dec 7, 2023 at 4:54 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
> > > > >> > > >
> > > > >> > > >> This applies to mostly existing APIs (e.g. recent additions are still
> > > > >> > > >> experimental)? Or wou
