[VOTE][Format] Opaque canonical extension type

2024-07-23 Thread David Li
Hello,

I'd like to propose the 'Opaque' canonical extension type. Prior discussion can 
be found at [1] and the proposal and implementations for C++, Go, Java, and 
Python can be found at [2]. The proposal is additionally reproduced below.

The vote will be open for at least 72 hours.

[ ] +1 Accept this proposal
[ ] +0
[ ] -1 Do not accept this proposal because...

[1]: https://lists.apache.org/thread/8d5ldl5cb7mms21rd15lhpfrv4j9no4n
[2]: https://github.com/apache/arrow/pull/41823

---

Opaque represents a type that an Arrow-based system received from an external
(often non-Arrow) system, but that it cannot interpret.  In this case, it can
pass on Opaque to its clients to at least show that a field exists and
preserve metadata about the type from the other system.

Extension parameters:

* Extension name: ``arrow.opaque``.

* The storage type of this extension is any type.  If there is no underlying
  data, the storage type should be Null.

* Extension type parameters:

  * **type_name** = the name of the unknown type in the external system.
  * **vendor_name** = the name of the external system.

* Description of the serialization:

  A valid JSON object containing the parameters as fields.  In the future,
  additional fields may be added, but no fields, current or future, are ever
  required to interpret the array.

  Developers **should not** attempt to enable public semantic interoperability
  of Opaque by canonicalizing specific values of these parameters.
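
As a concrete sketch of the serialization, here is what the field-level metadata
for an ``arrow.opaque`` field might look like. The ``ARROW:extension:*`` keys are
Arrow's standard extension metadata keys; the ``tsvector``/``PostgreSQL`` values
are purely illustrative stand-ins for an unknown external type.

```python
import json

# Serialize the extension parameters as a JSON object, per the proposal.
serialized = json.dumps({"type_name": "tsvector", "vendor_name": "PostgreSQL"})

# Arrow's standard field-metadata keys for extension types:
field_metadata = {
    "ARROW:extension:name": "arrow.opaque",
    "ARROW:extension:metadata": serialized,
}

# A consumer that does not understand the type can still surface these:
params = json.loads(field_metadata["ARROW:extension:metadata"])
print(params["vendor_name"])  # PostgreSQL
```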


Re: Question about arrow flight authentication

2024-07-23 Thread David Li
Hi Chuan,

The attachment didn't make it through - could you post a Gist or something?

Thanks,
David

On Wed, Jul 24, 2024, at 08:06, Zhao, Chuan wrote:
> Hi,
>  
> I am from Teradata and I am working on a POC related to Arrow Flight. 
> Basically, I want to send metadata requests from the client to an Arrow Flight 
> Server, which fetches the results from the Teradata Database (for example, 
> table types) and returns them to the client. I want to enable username/password 
> authentication. Please see the attached code for the server and client sides.
>  
> I found an issue when I run the code. Here is the output on server side:
>  
> Flight SQL Server listening at grpc+tls://localhost:
>  
> awaiting termination ...
>  
> Metadata(content-type=application/grpc,user-agent=grpc-java-netty/1.63.0,grpc-accept-encoding=gzip,authorization=Basic
>  YXJyb3c6Y2VydA==)
>  
> authenticating user 'arrow' using basic authentication
>  
> Metadata(content-type=application/grpc,user-agent=grpc-java-netty/1.63.0,grpc-accept-encoding=gzip)
>  
> inside method getFlightInfoTableTypes!!
>  
> Metadata(content-type=application/grpc,user-agent=grpc-java-netty/1.63.0,grpc-accept-encoding=gzip,authorization=Basic
>  YXJyb3c6Y2VydA==)
>  
> authenticating user 'arrow' using basic authentication
>  
> Metadata(content-type=application/grpc,user-agent=grpc-java-netty/1.63.0,grpc-accept-encoding=gzip)
>  
> inside method getStreamTableTypes!!
>  
> I printed out incomingHeaders.toString() in authenticate method in my 
> ArrowFlightAuthValidate.java. You can see before going into 
> getFlightInfoTableTypes method, it calls authenticate() twice. The first 
> call comes with authorization info in the header, but the second does not. 
> The same happens after calling getFlightInfoTableTypes. I am using the latest 
> version (17.0.0) of flight-core, arrow-jdbc, and flight-sql on the server side, 
> and the latest flight-sql-jdbc-driver (17.0.0) on the client side.
>  
> Any help would be greatly appreciated. Thanks.
>  
> -Chuan
>  
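
For what it's worth, the `authorization` value in the log above is standard HTTP
basic auth: base64 of `user:password`. A stdlib-only sketch, assuming the
credentials are `arrow`/`cert` (which matches the "authenticating user 'arrow'"
line in the log):

```python
import base64

# Decode the Authorization header value seen in the server log:
header_value = "YXJyb3c6Y2VydA=="
user, password = base64.b64decode(header_value).decode().split(":")
print(user, password)  # arrow cert

# Re-encoding round-trips to the same header value:
assert base64.b64encode(f"{user}:{password}".encode()).decode() == header_value
```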


Re: Unsupported/Other Type

2024-07-22 Thread David Li
Thanks to everyone for the comments so far. Are there any more comments? I'd 
like to start a vote now that we have C++/Java/Go.

Note that after a round of bikeshedding, the proposed type is now called Opaque.

On Fri, Jun 14, 2024, at 15:05, Micah Kornfield wrote:
> I left some comments mostly around bike shedding and organization of
> description.  I agree this is useful.
>
> On Tue, Jun 11, 2024 at 10:22 AM Antoine Pitrou  wrote:
>
>>
>> Sorry, I had forgotten to comment on this. I think this is generally a
>> good idea, but it would obviously need more eyes on it :-)
>>
>> Can other people go and take a look at David's PR below?
>>
>>
>> Le 25/05/2024 à 04:47, David Li a écrit :
>> > I've put up a draft PR here: https://github.com/apache/arrow/pull/41823
>>


Re: [DISCUSS] Deprecate UCX transport for Arrow Flight in favor of Dissociated IPC Protocol

2024-07-17 Thread David Li
Replacing gRPC was not the intent.

The Dissociated IPC Protocol is worded very generically, but it works over UCX and 
libfabric, so it is essentially equivalent. It does not force you to use the 
predefined Flight RPC method names, though, so it is more flexible in that regard.

On Thu, Jul 18, 2024, at 02:20, Adam Lippai wrote:
> Hi Raul,
>
> Finishing an experiment is good; it can help with exploring more in the future
> (if the community doesn’t see it as baggage to carry forever).
>
> Do you have any conclusions, a summary what was learned?
>
> I might be wrong, but my understanding was that the initial goal was
> replacing the TCP+TLS+HTTP/2+gRPC stack. The dissociated protocol handles
> the IPC format. Is there anything low-level focusing on the
> network/transport for HPC users and data centers in the works? Or did we learn
> that gRPC is good enough and not the bottleneck most of the time?
>
> Best regards,
> Adam Lippai
>
>
>
> On Wed, Jul 17, 2024 at 12:30 Raúl Cumplido  wrote:
>
>> Hi,
>>
>> I've followed up with a PR to remove UCX transport for flight [1].
>>
>> Thanks,
>> Raúl
>>
>> [1] https://github.com/apache/arrow/pull/43297
>>
>> El mié, 19 jun 2024 a las 11:29, Raúl Cumplido ()
>> escribió:
>> >
>> > Hi,
>> >
>> > I would like to discuss deprecation of the UCX transport for Arrow
>> > Flight (ARROW_WITH_UCX).
>> >
>> > From conversations I've had with Matt Topol and David Li:
>> > - This was implemented as an experimental PoC in order to run some
>> > benchmarks with flight over UCX [1]
>> > - We should encourage usage of the Dissociated IPC Protocol instead of
>> > that implementation [2]
>> >
>> > Some upstream systems are building flight with UCX and we should
>> > probably not encourage its use.
>> >
>> > Are there any thoughts about it?
>> >
>> > Kind regards,
>> > Raúl
>> > [1] https://github.com/apache/arrow/pull/12442
>> > [2] https://arrow.apache.org/docs/dev/format/DissociatedIPC.html
>>


Re: [VOTE] Release Apache Arrow 17.0.0 - RC2

2024-07-15 Thread David Li
+1 (binding)

Tested on Debian 12/x86_64

On Mon, Jul 15, 2024, at 15:31, Gang Wu wrote:
> +1 (non-binding)
>
> Verified C++ on my M1 Mac by running:
> - TEST_DEFAULT=0 TEST_CPP=1 ./verify-release-candidate.sh 17.0.0 2
>
> BTW, I ran into this issue as well:
> https://github.com/apache/arrow/issues/43167
>
> Best,
> Gang
>
> On Mon, Jul 15, 2024 at 1:39 PM Jean-Baptiste Onofré 
> wrote:
>
>> +1 (non binding)
>>
>> Regards
>> JB
>>
>> On Fri, Jul 12, 2024 at 11:56 AM Raúl Cumplido  wrote:
>> >
>> > Hi,
>> >
>> > I would like to propose the following release candidate (RC2) of Apache
>> > Arrow version 17.0.0. This is a release consisting of 321
>> > resolved GitHub issues [1].
>> >
>> > This release candidate is based on commit:
>> > 6a2e19a852b367c72d7b12da4d104456491ed8b7 [2]
>> >
>> > The source release rc2 is hosted at [3].
>> > The binary artifacts are hosted at [4][5][6][7][8][9][10][11].
>> > The changelog is located at [12].
>> >
>> > Please download, verify checksums and signatures, run the unit tests,
>> > and vote on the release. See [13] for how to validate a release
>> candidate.
>> >
>> > See also a verification result on GitHub pull request [14].
>> >
>> > The vote will be open for at least 72 hours.
>> >
>> > [ ] +1 Release this as Apache Arrow 17.0.0
>> > [ ] +0
>> > [ ] -1 Do not release this as Apache Arrow 17.0.0 because...
>> >
>> > [1]:
>> https://github.com/apache/arrow/issues?q=is%3Aissue+milestone%3A17.0.0+is%3Aclosed
>> > [2]:
>> https://github.com/apache/arrow/tree/6a2e19a852b367c72d7b12da4d104456491ed8b7
>> > [3]:
>> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-17.0.0-rc2
>> > [4]: https://apache.jfrog.io/artifactory/arrow/almalinux-rc/
>> > [5]: https://apache.jfrog.io/artifactory/arrow/amazon-linux-rc/
>> > [6]: https://apache.jfrog.io/artifactory/arrow/centos-rc/
>> > [7]: https://apache.jfrog.io/artifactory/arrow/debian-rc/
>> > [8]: https://apache.jfrog.io/artifactory/arrow/java-rc/17.0.0-rc2
>> > [9]: https://apache.jfrog.io/artifactory/arrow/nuget-rc/17.0.0-rc2
>> > [10]: https://apache.jfrog.io/artifactory/arrow/python-rc/17.0.0-rc2
>> > [11]: https://apache.jfrog.io/artifactory/arrow/ubuntu-rc/
>> > [12]:
>> https://github.com/apache/arrow/blob/6a2e19a852b367c72d7b12da4d104456491ed8b7/CHANGELOG.md
>> > [13]: https://arrow.apache.org/docs/developers/release_verification.html
>> > [14]: https://github.com/apache/arrow/pull/43220
>>
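
The checksum/signature step of the validation guide [13] can be sketched roughly
as below. The tarball here is a stand-in created locally, not a real artifact;
the real files and their `.sha512`/`.asc` companions live at [3].

```shell
# Create a stand-in "artifact" and its checksum file:
echo "stand-in artifact" > apache-arrow-17.0.0.tar.gz
sha512sum apache-arrow-17.0.0.tar.gz > apache-arrow-17.0.0.tar.gz.sha512

# A release's .sha512 files are checked the same way:
sha512sum -c apache-arrow-17.0.0.tar.gz.sha512

# Signatures would be verified against the project KEYS file, e.g.:
#   gpg --import KEYS
#   gpg --verify apache-arrow-17.0.0.tar.gz.asc apache-arrow-17.0.0.tar.gz
```

The `verify-release-candidate.sh` script automates these checks along with
building and testing; the manual commands are shown only to illustrate what it
does under the hood.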


Re: [RESULT][VOTE] Release Apache Arrow ADBC 13 - RC0

2024-07-04 Thread David Li
[x] Close the GitHub milestone/project
[x] Add the new release to the Apache Reporter System
[x] Upload source release artifacts to Subversion
[x] Create the final GitHub release
[x] Update website
[x] Upload wheels/sdist to PyPI
[x] Publish Maven packages
[x] Update tags for Go modules
[x] Deploy APT/Yum repositories
[ ] Update R packages
[x] Upload Ruby packages to RubyGems
[x] Upload C#/.NET packages to NuGet
[IN PROGRESS] Update conda-forge packages [1]
[x] Announce the new release
[x] Remove old artifacts
[x] Bump versions
[IN PROGRESS] Publish release blog post [2]

[1]: https://github.com/conda-forge/arrow-adbc-split-feedstock/pull/24
[2]: https://github.com/apache/arrow-site/pull/533

On Fri, Jul 5, 2024, at 10:22, David Li wrote:
> I've hit a snag with C#: it appears the packages got built with the 
> wrong version number (0.13.0-SNAPSHOT instead of 0.13.0) [1]. This 
> time, I'll rebuild the packages manually with a CI run and upload 
> them by hand.
>
> I will also have to tweak the website deployment since the redirect 
> didn't get updated [2].
>
> [1]: https://github.com/apache/arrow-adbc/issues/1968
> [2]: https://github.com/apache/arrow-adbc/issues/1967
>
> [x] Close the GitHub milestone/project
> [x] Add the new release to the Apache Reporter System
> [x] Upload source release artifacts to Subversion
> [x] Create the final GitHub release
> [x] Update website
> [x] Upload wheels/sdist to PyPI
> [x] Publish Maven packages
> [x] Update tags for Go modules
> [x] Deploy APT/Yum repositories
> [ ] Update R packages
> [x] Upload Ruby packages to RubyGems
> [ ] Upload C#/.NET packages to NuGet
> [ ] Update conda-forge packages
> [ ] Announce the new release
> [ ] Remove old artifacts
> [ ] Bump versions
> [ ] Publish release blog post
>
> On Fri, Jul 5, 2024, at 09:29, David Li wrote:
>> The vote passes with 3 binding, 1 non-binding +1 votes. Thanks all!
>>
>> Binding: Raúl Cumplido, Sutou Kouhei, Matt Topol
>> Non-binding: Dane Pitkin
>>
>> I will take care of the release tasks next.
>>
>> On Thu, Jul 4, 2024, at 00:03, Dane Pitkin wrote:
>>> +1 (non-binding)
>>>
>>> Verified on MacOS 14 aarch64 with:
>>>
>>> DOCKER_DEFAULT_PLATFORM=linux/amd64 USE_CONDA=1
>>> ./dev/release/verify-release-candidate.sh 13 0
>>>
>>> I also had to install arrow-glib-devel in ./dev/release/verify-yum.sh to
>>> fully verify the release:
>>>
>>> +${install_command} --enablerepo=epel arrow-glib-devel
>>>  ${install_command} --enablerepo=epel adbc
>>> -arrow-glib-devel-${package_version}
>>>  ${install_command} --enablerepo=epel adbc-arrow-glib-doc-${package_version}
>>>
>>>
>>> On Mon, Jul 1, 2024 at 8:49 PM Matt Topol  wrote:
>>>
>>>> +1 (binding)
>>>>
>>>> Release candidate validated successfully with:
>>>> USE_CONDA=0 dev/release/verify-release-candidate.sh 13 0
>>>>
>>>> using Pop!_OS 22.04
>>>>
>>>> Same issue as Kou: I needed to install arrow-glib-devel manually to get
>>>> verification to work.
>>>>
>>>> On Mon, Jul 1, 2024 at 8:31 PM Sutou Kouhei  wrote:
>>>>
>>>> > +1 (binding)
>>>> >
>>>> > I ran the following on Debian GNU/Linux sid:
>>>> >
>>>> >   TEST_DEFAULT=0 \
>>>> > TEST_SOURCE=1 \
>>>> > LANG=C \
>>>> > TZ=UTC \
>>>> > JAVA_HOME=/usr/lib/jvm/default-java \
>>>> > dev/release/verify-release-candidate.sh 13 0
>>>> >
>>>> >   TEST_DEFAULT=0 \
>>>> > TEST_APT=1 \
>>>> > LANG=C \
>>>> > dev/release/verify-release-candidate.sh 13 0
>>>> >
>>>> >   TEST_DEFAULT=0 \
>>>> > TEST_BINARY=1 \
>>>> > LANG=C \
>>>> > dev/release/verify-release-candidate.sh 13 0
>>>> >
>>>> >   TEST_DEFAULT=0 \
>>>> > TEST_JARS=1 \
>>>> > LANG=C \
>>>> > dev/release/verify-release-candidate.sh 13 0
>>>> >
>>>> >   TEST_DEFAULT=0 \
>>>> > TEST_WHEELS=1 \
>>>> > TEST_PYTHON_VERSIONS=3.11 \
>>>> > LANG=C \
>>>> > TZ=UTC \
>>>> > dev/release/verify-release-candidate.sh 13 0
>>>> >
>>>> >   TEST_DEFAULT=0 \
>>>> > TEST_YUM=1 \
>>>> > LANG=C \
>>>> > dev/release/verify-release-candidate.sh 13 0

[ANNOUNCE] Apache Arrow ADBC 13 released

2024-07-04 Thread David Li
The Apache Arrow community is pleased to announce the 13th release of the 
Apache Arrow ADBC libraries. It includes 24 resolved GitHub issues ([1]). 
Individual components are versioned separately: some packages are on version 
0.13.0 and others are now version 1.1.0, with the release as a whole on version 
'13'.

The release is available now from [2] and [3].

Release notes are available at: 
https://github.com/apache/arrow-adbc/blob/apache-arrow-adbc-13/CHANGELOG.md#adbc-libraries-13-2024-07-01

What is Apache Arrow?
-
Apache Arrow is a columnar in-memory analytics layer designed to accelerate big 
data. It houses a set of canonical in-memory representations of flat and 
hierarchical data along with multiple language-bindings for structure 
manipulation. It also provides low-overhead streaming and batch messaging, 
zero-copy interprocess communication (IPC), and vectorized in-memory analytics 
libraries. Languages currently supported include C, C++, C#, Go, Java, 
JavaScript, Julia, MATLAB, Python, R, Ruby, and Rust.

What is Apache Arrow ADBC?
--
ADBC is a database access abstraction for Arrow-based applications. It provides 
a cross-language API for working with databases while using Arrow data, 
providing an alternative to APIs like JDBC and ODBC for analytical 
applications. For more, see [4].

Please report any feedback to the mailing lists ([5], [6]).

Regards,
The Apache Arrow Community

[1]: 
https://github.com/apache/arrow-adbc/issues?q=is%3Aissue+milestone%3A%22ADBC+Libraries+13%22+is%3Aclosed
[2]: https://www.apache.org/dyn/closer.cgi/arrow/apache-arrow-adbc-13
[3]: https://apache.jfrog.io/ui/native/arrow
[4]: https://arrow.apache.org/blog/2023/01/05/introducing-arrow-adbc/
[5]: https://lists.apache.org/list.html?u...@arrow.apache.org
[6]: https://lists.apache.org/list.html?dev@arrow.apache.org


Re: [RESULT][VOTE] Release Apache Arrow ADBC 13 - RC0

2024-07-04 Thread David Li
I've hit a snag with C#: it appears the packages got built with the wrong 
version number (0.13.0-SNAPSHOT instead of 0.13.0) [1]. This time, I'll 
rebuild the packages manually with a CI run and upload them by hand.

I will also have to tweak the website deployment since the redirect didn't get 
updated [2].

[1]: https://github.com/apache/arrow-adbc/issues/1968
[2]: https://github.com/apache/arrow-adbc/issues/1967

[x] Close the GitHub milestone/project
[x] Add the new release to the Apache Reporter System
[x] Upload source release artifacts to Subversion
[x] Create the final GitHub release
[x] Update website
[x] Upload wheels/sdist to PyPI
[x] Publish Maven packages
[x] Update tags for Go modules
[x] Deploy APT/Yum repositories
[ ] Update R packages
[x] Upload Ruby packages to RubyGems
[ ] Upload C#/.NET packages to NuGet
[ ] Update conda-forge packages
[ ] Announce the new release
[ ] Remove old artifacts
[ ] Bump versions
[ ] Publish release blog post

On Fri, Jul 5, 2024, at 09:29, David Li wrote:
> The vote passes with 3 binding, 1 non-binding +1 votes. Thanks all!
>
> Binding: Raúl Cumplido, Sutou Kouhei, Matt Topol
> Non-binding: Dane Pitkin
>
> I will take care of the release tasks next.
>
> On Thu, Jul 4, 2024, at 00:03, Dane Pitkin wrote:
>> +1 (non-binding)
>>
>> Verified on MacOS 14 aarch64 with:
>>
>> DOCKER_DEFAULT_PLATFORM=linux/amd64 USE_CONDA=1
>> ./dev/release/verify-release-candidate.sh 13 0
>>
>> I also had to install arrow-glib-devel in ./dev/release/verify-yum.sh to
>> fully verify the release:
>>
>> +${install_command} --enablerepo=epel arrow-glib-devel
>>  ${install_command} --enablerepo=epel adbc
>> -arrow-glib-devel-${package_version}
>>  ${install_command} --enablerepo=epel adbc-arrow-glib-doc-${package_version}
>>
>>
>> On Mon, Jul 1, 2024 at 8:49 PM Matt Topol  wrote:
>>
>>> +1 (binding)
>>>
>>> Release candidate validated successfully with:
>>> USE_CONDA=0 dev/release/verify-release-candidate.sh 13 0
>>>
>>> using Pop!_OS 22.04
>>>
>>> Same issue as Kou: I needed to install arrow-glib-devel manually to get
>>> verification to work.
>>>
>>> On Mon, Jul 1, 2024 at 8:31 PM Sutou Kouhei  wrote:
>>>
>>> > +1 (binding)
>>> >
>>> > I ran the following on Debian GNU/Linux sid:
>>> >
>>> >   TEST_DEFAULT=0 \
>>> > TEST_SOURCE=1 \
>>> > LANG=C \
>>> > TZ=UTC \
>>> > JAVA_HOME=/usr/lib/jvm/default-java \
>>> > dev/release/verify-release-candidate.sh 13 0
>>> >
>>> >   TEST_DEFAULT=0 \
>>> > TEST_APT=1 \
>>> > LANG=C \
>>> > dev/release/verify-release-candidate.sh 13 0
>>> >
>>> >   TEST_DEFAULT=0 \
>>> > TEST_BINARY=1 \
>>> > LANG=C \
>>> > dev/release/verify-release-candidate.sh 13 0
>>> >
>>> >   TEST_DEFAULT=0 \
>>> > TEST_JARS=1 \
>>> > LANG=C \
>>> > dev/release/verify-release-candidate.sh 13 0
>>> >
>>> >   TEST_DEFAULT=0 \
>>> > TEST_WHEELS=1 \
>>> > TEST_PYTHON_VERSIONS=3.11 \
>>> > LANG=C \
>>> > TZ=UTC \
>>> > dev/release/verify-release-candidate.sh 13 0
>>> >
>>> >   TEST_DEFAULT=0 \
>>> > TEST_YUM=1 \
>>> > LANG=C \
>>> > dev/release/verify-release-candidate.sh 13 0
>>> >
>>> > with:
>>> >
>>> >   * g++ (Debian 13.3.0-1) 13.3.0
>>> >   * go version go1.22.4 linux/amd64
>>> >   * openjdk version "17.0.11" 2024-04-16
>>> >   * Python 3.11.9
>>> >   * ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux-gnu]
>>> >   * R version 4.4.1 (2024-06-14) -- "Race for Your Life"
>>> >   * Apache Arrow 17.0.0-SNAPSHOT
>>> >
>>> > Note:
>>> >
>>> > I needed to install arrow-glib-devel explicitly to verify
>>> > Yum repository like I did for ADBC 12:
>>> >
>>> > 
>>> > diff --git a/dev/release/verify-yum.sh b/dev/release/verify-yum.sh
>>> > index f7f023611..ff30176f1 100755
>>> > --- a/dev/release/verify-yum.sh
>>> > +++ b/dev/release/verify-yum.sh
>>> > @@ -170,6 +170,7 @@ echo "::endgroup::"
>>> >
>>> >  echo "::group::Test ADBC Arrow GLib"
>>> >

[RESULT][VOTE] Release Apache Arrow ADBC 13 - RC0

2024-07-04 Thread David Li
The vote passes with 3 binding, 1 non-binding +1 votes. Thanks all!

Binding: Raúl Cumplido, Sutou Kouhei, Matt Topol
Non-binding: Dane Pitkin

I will take care of the release tasks next.

On Thu, Jul 4, 2024, at 00:03, Dane Pitkin wrote:
> +1 (non-binding)
>
> Verified on MacOS 14 aarch64 with:
>
> DOCKER_DEFAULT_PLATFORM=linux/amd64 USE_CONDA=1
> ./dev/release/verify-release-candidate.sh 13 0
>
> I also had to install arrow-glib-devel in ./dev/release/verify-yum.sh to
> fully verify the release:
>
> +${install_command} --enablerepo=epel arrow-glib-devel
>  ${install_command} --enablerepo=epel adbc
> -arrow-glib-devel-${package_version}
>  ${install_command} --enablerepo=epel adbc-arrow-glib-doc-${package_version}
>
>
> On Mon, Jul 1, 2024 at 8:49 PM Matt Topol  wrote:
>
>> +1 (binding)
>>
>> Release candidate validated successfully with:
>> USE_CONDA=0 dev/release/verify-release-candidate.sh 13 0
>>
>> using Pop!_OS 22.04
>>
>> Same issue as Kou: I needed to install arrow-glib-devel manually to get
>> verification to work.
>>
>> On Mon, Jul 1, 2024 at 8:31 PM Sutou Kouhei  wrote:
>>
>> > +1 (binding)
>> >
>> > I ran the following on Debian GNU/Linux sid:
>> >
>> >   TEST_DEFAULT=0 \
>> > TEST_SOURCE=1 \
>> > LANG=C \
>> > TZ=UTC \
>> > JAVA_HOME=/usr/lib/jvm/default-java \
>> > dev/release/verify-release-candidate.sh 13 0
>> >
>> >   TEST_DEFAULT=0 \
>> > TEST_APT=1 \
>> > LANG=C \
>> > dev/release/verify-release-candidate.sh 13 0
>> >
>> >   TEST_DEFAULT=0 \
>> > TEST_BINARY=1 \
>> > LANG=C \
>> > dev/release/verify-release-candidate.sh 13 0
>> >
>> >   TEST_DEFAULT=0 \
>> > TEST_JARS=1 \
>> > LANG=C \
>> > dev/release/verify-release-candidate.sh 13 0
>> >
>> >   TEST_DEFAULT=0 \
>> > TEST_WHEELS=1 \
>> > TEST_PYTHON_VERSIONS=3.11 \
>> > LANG=C \
>> > TZ=UTC \
>> > dev/release/verify-release-candidate.sh 13 0
>> >
>> >   TEST_DEFAULT=0 \
>> > TEST_YUM=1 \
>> > LANG=C \
>> > dev/release/verify-release-candidate.sh 13 0
>> >
>> > with:
>> >
>> >   * g++ (Debian 13.3.0-1) 13.3.0
>> >   * go version go1.22.4 linux/amd64
>> >   * openjdk version "17.0.11" 2024-04-16
>> >   * Python 3.11.9
>> >   * ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux-gnu]
>> >   * R version 4.4.1 (2024-06-14) -- "Race for Your Life"
>> >   * Apache Arrow 17.0.0-SNAPSHOT
>> >
>> > Note:
>> >
>> > I needed to install arrow-glib-devel explicitly to verify
>> > Yum repository like I did for ADBC 12:
>> >
>> > 
>> > diff --git a/dev/release/verify-yum.sh b/dev/release/verify-yum.sh
>> > index f7f023611..ff30176f1 100755
>> > --- a/dev/release/verify-yum.sh
>> > +++ b/dev/release/verify-yum.sh
>> > @@ -170,6 +170,7 @@ echo "::endgroup::"
>> >
>> >  echo "::group::Test ADBC Arrow GLib"
>> >
>> > +${install_command} --enablerepo=epel arrow-glib-devel
>> >  ${install_command} --enablerepo=epel
>> > adbc-arrow-glib-devel-${package_version}
>> >  ${install_command} --enablerepo=epel
>> > adbc-arrow-glib-doc-${package_version}
>> >
>> > 
>> >
>> > This is not a blocker for 13 either. I want to find the
>> > solution for this, but I don't have any idea yet...
>> >
>> >
>> > Thanks,
>> > --
>> > kou
>> >
>> > In 
>> >   "[VOTE] Release Apache Arrow ADBC 13 - RC0" on Mon, 01 Jul 2024
>> 17:01:02
>> > +0900,
>> >   "David Li"  wrote:
>> >
>> > > Hello,
>> > >
>> > > I would like to propose the following release candidate (RC0) of Apache
>> > Arrow ADBC version 13. This is a release consisting of 24 resolved GitHub
>> > issues [1].
>> > >
>> > > The subcomponents are versioned independently:
>> > >
>> > > - C/C++/GLib/Go/Python/Ruby: 1.1.0
>> > > - C#: 0.13.0
>> > > - Java: 0.13.0
>> > > - R: 0.13.0
>> > > - Rust: 0.13.0
>> > >
>> > > This release candidate is based on commit:
>> > 37f79efbcd1641e6906a36e76df57cb896f2bc68 [2]
>> >

[VOTE] Release Apache Arrow ADBC 13 - RC0

2024-07-01 Thread David Li
Hello,

I would like to propose the following release candidate (RC0) of Apache Arrow 
ADBC version 13. This is a release consisting of 24 resolved GitHub issues [1].

The subcomponents are versioned independently:

- C/C++/GLib/Go/Python/Ruby: 1.1.0
- C#: 0.13.0
- Java: 0.13.0
- R: 0.13.0
- Rust: 0.13.0

This release candidate is based on commit: 
37f79efbcd1641e6906a36e76df57cb896f2bc68 [2]

The source release rc0 is hosted at [3].
The binary artifacts are hosted at [4][5][6][7][8].
The changelog is located at [9].

Please download, verify checksums and signatures, run the unit tests, and vote 
on the release. See [10] for how to validate a release candidate.

See also a verification result on GitHub Actions [11].

The vote will be open for at least 72 hours.

[ ] +1 Release this as Apache Arrow ADBC 13
[ ] +0
[ ] -1 Do not release this as Apache Arrow ADBC 13 because...

Note: to verify APT/YUM packages on macOS/AArch64, you must `export 
DOCKER_DEFAULT_PLATFORM=linux/amd64`. (Or skip these steps with `export TEST_APT=0 
TEST_YUM=0`.)

[1]: 
https://github.com/apache/arrow-adbc/issues?q=is%3Aissue+milestone%3A%22ADBC+Libraries+13%22+is%3Aclosed
[2]: 
https://github.com/apache/arrow-adbc/commit/37f79efbcd1641e6906a36e76df57cb896f2bc68
[3]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-adbc-13-rc0/
[4]: https://apache.jfrog.io/artifactory/arrow/almalinux-rc/
[5]: https://apache.jfrog.io/artifactory/arrow/debian-rc/
[6]: https://apache.jfrog.io/artifactory/arrow/ubuntu-rc/
[7]: 
https://repository.apache.org/content/repositories/staging/org/apache/arrow/adbc/
[8]: https://github.com/apache/arrow-adbc/releases/tag/apache-arrow-adbc-13-rc0
[9]: 
https://github.com/apache/arrow-adbc/blob/apache-arrow-adbc-13-rc0/CHANGELOG.md
[10]: 
https://arrow.apache.org/adbc/main/development/releasing.html#how-to-verify-release-candidates
[11]: https://github.com/apache/arrow-adbc/actions/runs/9739916017


Re: Unsupported/Other Type

2024-05-24 Thread David Li
I've put up a draft PR here: https://github.com/apache/arrow/pull/41823

On Wed, Apr 17, 2024, at 23:34, David Li wrote:
> Yes, this would be for an extension type. 
>
> On Wed, Apr 17, 2024, at 23:25, Weston Pace wrote:
>>> people generally find use in Arrow schemas independently of concrete data.
>>
>> This makes sense.  I think we do want to encourage use of Arrow as a "type
>> system" even if there is no data involved.  And, given that we cannot
>> easily change a field's data type property to "optional" it makes sense to
>> use a dedicated type and I so I would be in favor of such a proposal (we
>> may eventually add an "unknown type" concept in Substrait as well, it's
>> come up several times, and so we could use this in that context).
>>
>> I think that I would still prefer a canonical extension type (with storage
>> type null) over a new dedicated type.
>>
>> On Wed, Apr 17, 2024 at 5:39 AM Antoine Pitrou  wrote:
>>
>>>
>>> Ah! Well, I think this could be an interesting proposal, but someone
>>> should put a more formal proposal, perhaps as a draft PR.
>>>
>>> Regards
>>>
>>> Antoine.
>>>
>>>
>>> Le 17/04/2024 à 11:57, David Li a écrit :
>>> > For an unsupported/other extension type.
>>> >
>>> > On Wed, Apr 17, 2024, at 18:32, Antoine Pitrou wrote:
>>> >> What is "this proposal"?
>>> >>
>>> >>
>>> >> Le 17/04/2024 à 10:38, David Li a écrit :
>>> >>> Should I take it that this proposal is dead in the water? While we
>>> could define our own Unknown/Other type for, say, the ADBC PostgreSQL driver,
>>> it might be useful to have a single type for consumers to latch on to.
>>> >>>
>>> >>> On Fri, Apr 12, 2024, at 07:32, David Li wrote:
>>> >>>> I think an "Other" extension type is slightly different than an
>>> >>>> arbitrary extension type, though: the latter may be understood
>>> >>>> downstream but the former represents a point at which a component
>>> >>>> explicitly declares it does not know how to handle a field. In this
>>> >>>> example, the PostgreSQL ADBC driver might be able to provide a
>>> >>>> representation regardless, but a different driver (or say, the JDBC
>>> >>>> adapter, which cannot necessarily get a bytestring for an arbitrary
>>> >>>> JDBC type) may want an Other type to signal that it would fail if
>>> asked
>>> >>>> to provide particular columns.
>>> >>>>
>>> >>>> On Fri, Apr 12, 2024, at 02:30, Dewey Dunnington wrote:
>>> >>>>> Depending where your Arrow-encoded data is used, either extension
>>> >>>>> types or generic field metadata are options. We have this problem in
>>> >>>>> the ADBC Postgres driver, where we can convert *most* Postgres types
>>> >>>>> to an Arrow type but there are some others where we can't or don't
>>> >>>>> know or don't implement a conversion. Currently for these we return
>>> >>>>> opaque binary (the Postgres COPY representation of the value) but put
>>> >>>>> field metadata so that a consumer can implement a workaround for an
>>> >>>>> unsupported type. It would be arguably better to have implemented
>>> this
>>> >>>>> as an extension type; however, field metadata felt like less of a
>>> >>>>> commitment when I first worked on this.
>>> >>>>>
>>> >>>>> Cheers,
>>> >>>>>
>>> >>>>> -dewey
>>> >>>>>
>>> >>>>> On Thu, Apr 11, 2024 at 1:20 PM Norman Jordan
>>> >>>>>  wrote:
>>> >>>>>>
>>> >>>>>> I was using UUID as an example. It looks like extension types
>>> covers my original request.
>>> >>>>>> 
>>> >>>>>> From: Felipe Oliveira Carvalho 
>>> >>>>>> Sent: Thursday, April 11, 2024 7:15 AM
>>> >>>>>> To: dev@arrow.apache.org 
>>> >>>>>> Subject: Re: Unsupported/Other Type
>>> >>>>>>
>>> >>>>>> The OP used UUID as an example. 

Re: [VOTE] Release Apache Arrow nanoarrow 0.5.0

2024-05-22 Thread David Li
+1 (binding)

Tested on Debian 12 'bookworm'

On Thu, May 23, 2024, at 11:03, Sutou Kouhei wrote:
> +1 (binding)
>
> I ran the following command line on Debian GNU/Linux sid:
>
>   dev/release/verify-release-candidate.sh 0.5.0 0
>
> with:
>
>   * Apache Arrow C++ main
>   * gcc (Debian 13.2.0-23) 13.2.0
>   * R version 4.3.3 (2024-02-29) -- "Angel Food Cake"
>   * Python 3.11.9
>
> Thanks,
> -- 
> kou
>
>
> In 
>   "[VOTE] Release Apache Arrow nanoarrow 0.5.0" on Wed, 22 May 2024 
> 15:17:40 -0300,
>   Dewey Dunnington  wrote:
>
>> Hello,
>> 
>> I would like to propose the following release candidate (rc0) of
>> Apache Arrow nanoarrow [0] version 0.5.0. This is an initial release
>> consisting of 79 resolved GitHub issues from 9 contributors [1].
>> 
>> This release candidate is based on commit:
>> c5fb10035c17b598e6fd688ad9eb7b874c7c631b [2]
>> 
>> The source release rc0 is hosted at [3].
>> The changelog is located at [4].
>> 
>> Please download, verify checksums and signatures, run the unit tests,
>> and vote on the release. See [5] for how to validate a release
>> candidate.
>> 
>> The vote will be open for at least 72 hours.
>> 
>> [ ] +1 Release this as Apache Arrow nanoarrow 0.5.0
>> [ ] +0
>> [ ] -1 Do not release this as Apache Arrow nanoarrow 0.5.0 because...
>> 
>> [0] https://github.com/apache/arrow-nanoarrow
>> [1] https://github.com/apache/arrow-nanoarrow/milestone/5?closed=1
>> [2] 
>> https://github.com/apache/arrow-nanoarrow/tree/apache-arrow-nanoarrow-0.5.0-rc0
>> [3] 
>> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-nanoarrow-0.5.0-rc0/
>> [4] 
>> https://github.com/apache/arrow-nanoarrow/blob/apache-arrow-nanoarrow-0.5.0-rc0/CHANGELOG.md
>> [5] https://github.com/apache/arrow-nanoarrow/blob/main/dev/release/README.md


Re: [VOTE] Release Apache Arrow ADBC 12 - RC4

2024-05-21 Thread David Li
[x] Close the GitHub milestone/project
[x] Add the new release to the Apache Reporter System
[x] Upload source release artifacts to Subversion
[x] Create the final GitHub release
[x] Update website
[x] Upload wheels/sdist to PyPI
[x] Publish Maven packages
[x] Update tags for Go modules
[x] Deploy APT/Yum repositories
[ ] Update R packages
[x] Upload Ruby packages to RubyGems
[x] Upload C#/.NET packages to NuGet
[x] Update conda-forge packages
[x] Announce the new release
[x] Remove old artifacts
[x] Bump versions
[IN PROGRESS] Publish release blog post [1]

@Dewey, I'd appreciate your help as always with the R packages :)

[1]: https://github.com/apache/arrow-site/pull/523

On Tue, May 21, 2024, at 09:00, Sutou Kouhei wrote:
> +1 (binding)
>
> I ran the following on Debian GNU/Linux sid:
>
>   TEST_DEFAULT=0 \
> TEST_SOURCE=1 \
> LANG=C \
> TZ=UTC \
> JAVA_HOME=/usr/lib/jvm/default-java \
> dev/release/verify-release-candidate.sh 12 4
>
>   TEST_DEFAULT=0 \
> TEST_APT=1 \
> LANG=C \
> dev/release/verify-release-candidate.sh 12 4
>
>   TEST_DEFAULT=0 \
> TEST_BINARY=1 \
> LANG=C \
> dev/release/verify-release-candidate.sh 12 4
>
>   TEST_DEFAULT=0 \
> TEST_JARS=1 \
> LANG=C \
> dev/release/verify-release-candidate.sh 12 4
>
>   TEST_DEFAULT=0 \
> TEST_WHEELS=1 \
> TEST_PYTHON_VERSIONS=3.11 \
> LANG=C \
> TZ=UTC \
> dev/release/verify-release-candidate.sh 12 4
>
>   TEST_DEFAULT=0 \
> TEST_YUM=1 \
> LANG=C \
> dev/release/verify-release-candidate.sh 12 4
>
> with:
>
>   * g++ (Debian 13.2.0-23) 13.2.0
>   * go version go1.22.2 linux/amd64
>   * openjdk version "17.0.11" 2024-04-16
>   * Python 3.11.9
>   * ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux-gnu]
>   * R version 4.3.3 (2024-02-29) -- "Angel Food Cake"
>   * Apache Arrow 17.0.0-SNAPSHOT
>
> Note:
>
> I needed to install arrow-glib-devel explicitly to verify
> Yum repository:
>
> 
> diff --git a/dev/release/verify-yum.sh b/dev/release/verify-yum.sh
> index f7f023611..ff30176f1 100755
> --- a/dev/release/verify-yum.sh
> +++ b/dev/release/verify-yum.sh
> @@ -170,6 +170,7 @@ echo "::endgroup::"
> 
>  echo "::group::Test ADBC Arrow GLib"
> 
> +${install_command} --enablerepo=epel arrow-glib-devel
>  ${install_command} --enablerepo=epel adbc-arrow-glib-devel-${package_version}
>  ${install_command} --enablerepo=epel adbc-arrow-glib-doc-${package_version}
> 
> 
>
> adbc-arrow-glib-devel depends on "pkgconfig(arrow-glib)" and
> libarrow-glib-devel provided by EPEL also provides it:
>
> $ sudo dnf repoquery --deplist adbc-arrow-glib-devel-12
> Last metadata expiration check: 2:01:21 ago on Mon May 20 21:17:44 2024.
> package: adbc-arrow-glib-devel-12-1.el9.x86_64
> ...
>   dependency: pkgconfig(arrow-glib)
>provider: arrow-glib-devel-16.1.0-1.el9.x86_64
>provider: libarrow-glib-devel-9.0.0-11.el9.x86_64
> ...
>
>
> If I don't install arrow-glib-devel explicitly,
> libarrow-glib-devel may be installed. We may need to add
> "Conflicts: libarrow-glib-devel" to Apache Arrow's
> arrow-glib-devel to resolve this case automatically. Anyway,
> this is not an ADBC problem, so it's not a blocker.
>
>
>
> Thanks,
> -- 
> kou
>
>
> In "[VOTE] Release Apache Arrow ADBC 12 - RC4" on Wed, 15 May 2024
> 14:00:33 +0900, "David Li" wrote:
>
>> Hello,
>> 
>> I would like to propose the following release candidate (RC4) of Apache 
>> Arrow ADBC version 12. This is a release consisting of 56 resolved GitHub 
>> issues [1].
>> 
>> Please note that the versioning scheme has changed.  This is the 12th 
>> release of ADBC, and so is called version "12".  The subcomponents, however, 
>> are versioned independently:
>> 
>> - C/C++/GLib/Go/Python/Ruby: 1.0.0
>> - C#: 0.12.0
>> - Java: 0.12.0
>> - R: 0.12.0
>> - Rust: 0.12.0
>> 
>> These are the versions you will see in the source and in actual packages.  
>> The next release will be "13", and the subcomponents will increment their 
>> versions independently (to either 1.1.0, 0.13.0, or 1.0.0).  At this point, 
>> there is no plan to release subcomponents independently from the project as 
>> a whole. 
>> 
>> Please note that there is a known issue when using the Flight SQL and 
>> Snowflake drivers at the same time on x86_64 macOS [12].
>> 
>> This release candidate is based on commit: 
>> 50cb9de621c4d72f4aefd18237cb4b73b82f4a0e [2]
>> 
>> The s

[ANNOUNCE] Apache Arrow ADBC 12 released

2024-05-20 Thread David Li
The Apache Arrow community is pleased to announce the 12th release of the 
Apache Arrow ADBC libraries. It includes 56 resolved GitHub issues ([1]). 
Individual components are versioned separately: some packages are on version 
0.12.0 and others are now version 1.0.0, with the release as a whole on version 
'12'.

The release is available now from [2] and [3].

Release notes are available at: 
https://github.com/apache/arrow-adbc/blob/apache-arrow-adbc-12/CHANGELOG.md#adbc-libraries-12-2024-05-09

What is Apache Arrow?
---------------------
Apache Arrow is a columnar in-memory analytics layer designed to accelerate big 
data. It houses a set of canonical in-memory representations of flat and 
hierarchical data along with multiple language-bindings for structure 
manipulation. It also provides low-overhead streaming and batch messaging, 
zero-copy interprocess communication (IPC), and vectorized in-memory analytics 
libraries. Languages currently supported include C, C++, C#, Go, Java, 
JavaScript, Julia, MATLAB, Python, R, Ruby, and Rust.

What is Apache Arrow ADBC?
--------------------------
ADBC is a database access abstraction for Arrow-based applications. It provides 
a cross-language API for working with databases while using Arrow data, 
providing an alternative to APIs like JDBC and ODBC for analytical 
applications. For more, see [4].

Please report any feedback to the mailing lists ([5], [6]).

Regards,
The Apache Arrow Community

[1]: 
https://github.com/apache/arrow-adbc/issues?q=is%3Aissue+milestone%3A%22ADBC+Libraries+12%22+is%3Aclosed
[2]: https://www.apache.org/dyn/closer.cgi/arrow/apache-arrow-adbc-12
[3]: https://apache.jfrog.io/ui/native/arrow
[4]: https://arrow.apache.org/blog/2023/01/05/introducing-arrow-adbc/
[5]: https://lists.apache.org/list.html?u...@arrow.apache.org
[6]: https://lists.apache.org/list.html?dev@arrow.apache.org


Re: [VOTE] Release Apache Arrow ADBC 12 - RC4

2024-05-20 Thread David Li
The vote passes with 4 binding, 2 non-binding +1 votes.

+1 (binding): Raúl Cumplido, Dewey Dunnington, Weston Pace, David Li
+1 (non-binding): Jean-Baptiste Onofré, Vibhatha Abeykoon

I will take care of the post-release tasks next.

On Tue, May 21, 2024, at 00:32, Weston Pace wrote:
> +1 (binding)
>
> I also tested on Ubuntu 22.04 with USE_CONDA=1
> dev/release/verify-release-candidate.sh 12 4
>
> On Mon, May 20, 2024 at 5:20 AM David Li  wrote:
>
>> My vote: +1 (binding)
>>
>> Are any other PMC members able to take a look?
>>
>> On Fri, May 17, 2024, at 23:36, Dewey Dunnington wrote:
>> > +1 (binding)
>> >
>> > Tested with MacOS M1 using TEST_YUM=0 TEST_APT=0 USE_CONDA=1
>> > ./verify-release-candidate.sh 12 4
>> >
>> > On Fri, May 17, 2024 at 9:46 AM Jean-Baptiste Onofré 
>> wrote:
>> >>
>> >> +1 (non binding)
>> >>
>> >> Testing on MacOS M2.
>> >>
>> >> Regards
>> >> JB
>> >>
>> >> On Wed, May 15, 2024 at 7:00 AM David Li  wrote:
>> >> >
>> >> > Hello,
>> >> >
>> >> > I would like to propose the following release candidate (RC4) of
>> Apache Arrow ADBC version 12. This is a release consisting of 56 resolved
>> GitHub issues [1].
>> >> >
>> >> > Please note that the versioning scheme has changed.  This is the 12th
>> release of ADBC, and so is called version "12".  The subcomponents,
>> however, are versioned independently:
>> >> >
>> >> > - C/C++/GLib/Go/Python/Ruby: 1.0.0
>> >> > - C#: 0.12.0
>> >> > - Java: 0.12.0
>> >> > - R: 0.12.0
>> >> > - Rust: 0.12.0
>> >> >
>> >> > These are the versions you will see in the source and in actual
>> packages.  The next release will be "13", and the subcomponents will
>> increment their versions independently (to either 1.1.0, 0.13.0, or
>> 1.0.0).  At this point, there is no plan to release subcomponents
>> independently from the project as a whole.
>> >> >
>> >> > Please note that there is a known issue when using the Flight SQL and
>> Snowflake drivers at the same time on x86_64 macOS [12].
>> >> >
>> >> > This release candidate is based on commit:
>> 50cb9de621c4d72f4aefd18237cb4b73b82f4a0e [2]
>> >> >
>> >> > The source release rc4 is hosted at [3].
>> >> > The binary artifacts are hosted at [4][5][6][7][8].
>> >> > The changelog is located at [9].
>> >> >
>> >> > Please download, verify checksums and signatures, run the unit tests,
>> and vote on the release. See [10] for how to validate a release candidate.
>> >> >
>> >> > See also a verification result on GitHub Actions [11].
>> >> >
>> >> > The vote will be open for at least 72 hours.
>> >> >
>> >> > [ ] +1 Release this as Apache Arrow ADBC 12
>> >> > [ ] +0
>> >> > [ ] -1 Do not release this as Apache Arrow ADBC 12 because...
>> >> >
>> >> > Note: to verify APT/YUM packages on macOS/AArch64, you must `export
>> DOCKER_DEFAULT_PLATFORM=linux/amd64`. (Or skip this step by `export
>> TEST_APT=0 TEST_YUM=0`.)
>> >> >
>> >> > [1]:
>> https://github.com/apache/arrow-adbc/issues?q=is%3Aissue+milestone%3A%22ADBC+Libraries+12%22+is%3Aclosed
>> >> > [2]:
>> https://github.com/apache/arrow-adbc/commit/50cb9de621c4d72f4aefd18237cb4b73b82f4a0e
>> >> > [3]:
>> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-adbc-12-rc4/
>> >> > [4]: https://apache.jfrog.io/artifactory/arrow/almalinux-rc/
>> >> > [5]: https://apache.jfrog.io/artifactory/arrow/debian-rc/
>> >> > [6]: https://apache.jfrog.io/artifactory/arrow/ubuntu-rc/
>> >> > [7]:
>> https://repository.apache.org/content/repositories/staging/org/apache/arrow/adbc/
>> >> > [8]:
>> https://github.com/apache/arrow-adbc/releases/tag/apache-arrow-adbc-12-rc4
>> >> > [9]:
>> https://github.com/apache/arrow-adbc/blob/apache-arrow-adbc-12-rc4/CHANGELOG.md
>> >> > [10]:
>> https://arrow.apache.org/adbc/main/development/releasing.html#how-to-verify-release-candidates
>> >> > [11]: https://github.com/apache/arrow-adbc/actions/runs/9089931356
>> >> > [12]: https://github.com/apache/arrow-adbc/issues/1841
>>


Re: [VOTE] Release Apache Arrow ADBC 12 - RC4

2024-05-20 Thread David Li
My vote: +1 (binding)

Are any other PMC members able to take a look?

On Fri, May 17, 2024, at 23:36, Dewey Dunnington wrote:
> +1 (binding)
>
> Tested with MacOS M1 using TEST_YUM=0 TEST_APT=0 USE_CONDA=1
> ./verify-release-candidate.sh 12 4
>
> On Fri, May 17, 2024 at 9:46 AM Jean-Baptiste Onofré  
> wrote:
>>
>> +1 (non binding)
>>
>> Testing on MacOS M2.
>>
>> Regards
>> JB
>>
>> On Wed, May 15, 2024 at 7:00 AM David Li  wrote:
>> >
>> > Hello,
>> >
>> > I would like to propose the following release candidate (RC4) of Apache 
>> > Arrow ADBC version 12. This is a release consisting of 56 resolved GitHub 
>> > issues [1].
>> >
>> > Please note that the versioning scheme has changed.  This is the 12th 
>> > release of ADBC, and so is called version "12".  The subcomponents, 
>> > however, are versioned independently:
>> >
>> > - C/C++/GLib/Go/Python/Ruby: 1.0.0
>> > - C#: 0.12.0
>> > - Java: 0.12.0
>> > - R: 0.12.0
>> > - Rust: 0.12.0
>> >
>> > These are the versions you will see in the source and in actual packages.  
>> > The next release will be "13", and the subcomponents will increment their 
>> > versions independently (to either 1.1.0, 0.13.0, or 1.0.0).  At this 
>> > point, there is no plan to release subcomponents independently from the 
>> > project as a whole.
>> >
>> > Please note that there is a known issue when using the Flight SQL and 
>> > Snowflake drivers at the same time on x86_64 macOS [12].
>> >
>> > This release candidate is based on commit: 
>> > 50cb9de621c4d72f4aefd18237cb4b73b82f4a0e [2]
>> >
>> > The source release rc4 is hosted at [3].
>> > The binary artifacts are hosted at [4][5][6][7][8].
>> > The changelog is located at [9].
>> >
>> > Please download, verify checksums and signatures, run the unit tests, and 
>> > vote on the release. See [10] for how to validate a release candidate.
>> >
>> > See also a verification result on GitHub Actions [11].
>> >
>> > The vote will be open for at least 72 hours.
>> >
>> > [ ] +1 Release this as Apache Arrow ADBC 12
>> > [ ] +0
>> > [ ] -1 Do not release this as Apache Arrow ADBC 12 because...
>> >
>> > Note: to verify APT/YUM packages on macOS/AArch64, you must `export 
>> > DOCKER_DEFAULT_PLATFORM=linux/amd64`. (Or skip this step by `export 
>> > TEST_APT=0 TEST_YUM=0`.)
>> >
>> > [1]: 
>> > https://github.com/apache/arrow-adbc/issues?q=is%3Aissue+milestone%3A%22ADBC+Libraries+12%22+is%3Aclosed
>> > [2]: 
>> > https://github.com/apache/arrow-adbc/commit/50cb9de621c4d72f4aefd18237cb4b73b82f4a0e
>> > [3]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-adbc-12-rc4/
>> > [4]: https://apache.jfrog.io/artifactory/arrow/almalinux-rc/
>> > [5]: https://apache.jfrog.io/artifactory/arrow/debian-rc/
>> > [6]: https://apache.jfrog.io/artifactory/arrow/ubuntu-rc/
>> > [7]: 
>> > https://repository.apache.org/content/repositories/staging/org/apache/arrow/adbc/
>> > [8]: 
>> > https://github.com/apache/arrow-adbc/releases/tag/apache-arrow-adbc-12-rc4
>> > [9]: 
>> > https://github.com/apache/arrow-adbc/blob/apache-arrow-adbc-12-rc4/CHANGELOG.md
>> > [10]: 
>> > https://arrow.apache.org/adbc/main/development/releasing.html#how-to-verify-release-candidates
>> > [11]: https://github.com/apache/arrow-adbc/actions/runs/9089931356
>> > [12]: https://github.com/apache/arrow-adbc/issues/1841


[VOTE] Release Apache Arrow ADBC 12 - RC4

2024-05-14 Thread David Li
Hello,

I would like to propose the following release candidate (RC4) of Apache Arrow 
ADBC version 12. This is a release consisting of 56 resolved GitHub issues [1].

Please note that the versioning scheme has changed.  This is the 12th release 
of ADBC, and so is called version "12".  The subcomponents, however, are 
versioned independently:

- C/C++/GLib/Go/Python/Ruby: 1.0.0
- C#: 0.12.0
- Java: 0.12.0
- R: 0.12.0
- Rust: 0.12.0

These are the versions you will see in the source and in actual packages.  The 
next release will be "13", and the subcomponents will increment their versions 
independently (to either 1.1.0, 0.13.0, or 1.0.0).  At this point, there is no 
plan to release subcomponents independently from the project as a whole. 

Please note that there is a known issue when using the Flight SQL and Snowflake 
drivers at the same time on x86_64 macOS [12].

This release candidate is based on commit: 
50cb9de621c4d72f4aefd18237cb4b73b82f4a0e [2]

The source release rc4 is hosted at [3].
The binary artifacts are hosted at [4][5][6][7][8].
The changelog is located at [9].

Please download, verify checksums and signatures, run the unit tests, and vote 
on the release. See [10] for how to validate a release candidate.

See also a verification result on GitHub Actions [11].

The vote will be open for at least 72 hours.

[ ] +1 Release this as Apache Arrow ADBC 12
[ ] +0
[ ] -1 Do not release this as Apache Arrow ADBC 12 because...

Note: to verify APT/YUM packages on macOS/AArch64, you must `export 
DOCKER_DEFAULT_PLATFORM=linux/amd64`. (Or skip this step by `export TEST_APT=0 
TEST_YUM=0`.)

[1]: 
https://github.com/apache/arrow-adbc/issues?q=is%3Aissue+milestone%3A%22ADBC+Libraries+12%22+is%3Aclosed
[2]: 
https://github.com/apache/arrow-adbc/commit/50cb9de621c4d72f4aefd18237cb4b73b82f4a0e
[3]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-adbc-12-rc4/
[4]: https://apache.jfrog.io/artifactory/arrow/almalinux-rc/
[5]: https://apache.jfrog.io/artifactory/arrow/debian-rc/
[6]: https://apache.jfrog.io/artifactory/arrow/ubuntu-rc/
[7]: 
https://repository.apache.org/content/repositories/staging/org/apache/arrow/adbc/
[8]: https://github.com/apache/arrow-adbc/releases/tag/apache-arrow-adbc-12-rc4
[9]: 
https://github.com/apache/arrow-adbc/blob/apache-arrow-adbc-12-rc4/CHANGELOG.md
[10]: 
https://arrow.apache.org/adbc/main/development/releasing.html#how-to-verify-release-candidates
[11]: https://github.com/apache/arrow-adbc/actions/runs/9089931356
[12]: https://github.com/apache/arrow-adbc/issues/1841


Re: [VOTE] Release Apache Arrow 16.1.0 - RC1

2024-05-09 Thread David Li
+1 (binding)

Tested sources with Conda on Debian 12/x86_64 (binaries failed due to download 
flakiness)

On Fri, May 10, 2024, at 07:02, Rok Mihevc wrote:
> +1 (non-binding)
>
> Ran:
> TEST_DEFAULT=0 TEST_SOURCE=1 ./verify-release-candidate.sh 16.1.0 1
> On Ubuntu 22.04.1 x86_64
>
> Thanks for the hard work Raul!
>
> Rok
>
> On Thu, May 9, 2024 at 6:51 PM Bryce Mecum  wrote:
>
>> +1 (non-binding)
>>
>> I ran TEST_DEFAULT=0 TEST_CPP=1
>> ./dev/release/verify-release-candidate.sh 16.1.0 1 on aarch64 macOS
>> 14.4.1 with Homebrew. I did run into one failing test which I've filed
>> as [1].
>>
>> [1] https://github.com/apache/arrow/issues/41605
>>
>> On Thu, May 9, 2024 at 5:05 AM Raúl Cumplido  wrote:
>> >
>> > Hi,
>> >
>> > I would like to propose the following release candidate (RC1) of Apache
>> > Arrow version 16.1.0. This is a release consisting of 35
>> > resolved GitHub issues[1].
>> >
>> > This release candidate is based on commit:
>> > 7dd1d34074af176d9e861a360e135ae57b21cf96 [2]
>> >
>> > The source release rc1 is hosted at [3].
>> > The binary artifacts are hosted at [4][5][6][7][8][9][10][11].
>> > The changelog is located at [12].
>> >
>> > Please download, verify checksums and signatures, run the unit tests,
>> > and vote on the release. See [13] for how to validate a release
>> candidate.
>> >
>> > See also a verification result on GitHub pull request [14].
>> >
>> > The vote will be open for at least 72 hours.
>> >
>> > [ ] +1 Release this as Apache Arrow 16.1.0
>> > [ ] +0
>> > [ ] -1 Do not release this as Apache Arrow 16.1.0 because...
>> >
>> > [1]:
>> https://github.com/apache/arrow/issues?q=is%3Aissue+milestone%3A16.1.0+is%3Aclosed
>> > [2]:
>> https://github.com/apache/arrow/tree/7dd1d34074af176d9e861a360e135ae57b21cf96
>> > [3]:
>> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-16.1.0-rc1
>> > [4]: https://apache.jfrog.io/artifactory/arrow/almalinux-rc/
>> > [5]: https://apache.jfrog.io/artifactory/arrow/amazon-linux-rc/
>> > [6]: https://apache.jfrog.io/artifactory/arrow/centos-rc/
>> > [7]: https://apache.jfrog.io/artifactory/arrow/debian-rc/
>> > [8]: https://apache.jfrog.io/artifactory/arrow/java-rc/16.1.0-rc1
>> > [9]: https://apache.jfrog.io/artifactory/arrow/nuget-rc/16.1.0-rc1
>> > [10]: https://apache.jfrog.io/artifactory/arrow/python-rc/16.1.0-rc1
>> > [11]: https://apache.jfrog.io/artifactory/arrow/ubuntu-rc/
>> > [12]:
>> https://github.com/apache/arrow/blob/7dd1d34074af176d9e861a360e135ae57b21cf96/CHANGELOG.md
>> > [13]: https://arrow.apache.org/docs/developers/release_verification.html
>> > [14]: https://github.com/apache/arrow/pull/41600
>>


[VOTE] Release Apache Arrow ADBC 12 - RC3

2024-05-09 Thread David Li
Hello,

I would like to propose the following release candidate (RC3) of Apache Arrow 
ADBC version 12. This is a release consisting of 56 resolved GitHub issues [1].

Please note that the versioning scheme has changed.  This is the 12th release 
of ADBC, and so is called version "12".  The subcomponents, however, are 
versioned independently:

- C/C++/GLib/Go/Python/Ruby: 1.0.0
- C#: 0.12.0
- Java: 0.12.0
- R: 0.12.0
- Rust: 0.12.0

These are the versions you will see in the source and in actual packages.  The 
next release will be "13", and the subcomponents will increment their versions 
independently (to either 1.1.0, 0.13.0, or 1.0.0).  At this point, there is no 
plan to release subcomponents independently from the project as a whole. 

Please note that there is a known issue when using the Flight SQL and Snowflake 
drivers at the same time on x86_64 macOS [12].

This release candidate is based on commit: 
91804736a5c478fcd9a69930c4bbef8f7af90a7f [2]

The source release rc3 is hosted at [3].
The binary artifacts are hosted at [4][5][6][7][8].
The changelog is located at [9].

Please download, verify checksums and signatures, run the unit tests, and vote 
on the release. See [10] for how to validate a release candidate.  Please use 
the latest version of the verification script on 'main', and not the 
verification script in the source tarball.

See also a verification result on GitHub Actions [11].

The vote will be open for at least 72 hours.

[ ] +1 Release this as Apache Arrow ADBC 12
[ ] +0
[ ] -1 Do not release this as Apache Arrow ADBC 12 because...

Note: to verify APT/YUM packages on macOS/AArch64, you must `export 
DOCKER_DEFAULT_PLATFORM=linux/amd64`. (Or skip this step by `export TEST_APT=0 
TEST_YUM=0`.)

[1]: 
https://github.com/apache/arrow-adbc/issues?q=is%3Aissue+milestone%3A%22ADBC+Libraries+12%22+is%3Aclosed
[2]: 
https://github.com/apache/arrow-adbc/commit/91804736a5c478fcd9a69930c4bbef8f7af90a7f
[3]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-adbc-12-rc3/
[4]: https://apache.jfrog.io/artifactory/arrow/almalinux-rc/
[5]: https://apache.jfrog.io/artifactory/arrow/debian-rc/
[6]: https://apache.jfrog.io/artifactory/arrow/ubuntu-rc/
[7]: 
https://repository.apache.org/content/repositories/staging/org/apache/arrow/adbc/
[8]: https://github.com/apache/arrow-adbc/releases/tag/apache-arrow-adbc-12-rc3
[9]: 
https://github.com/apache/arrow-adbc/blob/apache-arrow-adbc-12-rc3/CHANGELOG.md
[10]: 
https://arrow.apache.org/adbc/main/development/releasing.html#how-to-verify-release-candidates
[11]: https://github.com/apache/arrow-adbc/actions/runs/9013195701
[12]: https://github.com/apache/arrow-adbc/issues/1841


Re: [VOTE] Release Apache Arrow ADBC 12 - RC0

2024-05-08 Thread David Li
It appears manylinux aarch64 wheels didn't get built, so rc1 will be incoming.

On Wed, May 8, 2024, at 14:44, David Li wrote:
> Hello,
>
> I would like to propose the following release candidate (RC0) of Apache 
> Arrow ADBC version 12. This is a release consisting of 48 resolved 
> GitHub issues [1].
>
> Please note that the versioning scheme has changed.  This is the 12th 
> release of ADBC, and so is called version "12".  The subcomponents, 
> however, are versioned independently:
>
> - C/C++/GLib/Go/Python/Ruby: 1.0.0
> - C#: 0.12.0
> - Java: 0.12.0
> - R: 0.12.0
> - Rust: 0.12.0
>
> These are the versions you will see in the source and in actual 
> packages.  The next release will be "13", and the subcomponents will 
> increment their versions independently (to either 1.1.0, 0.13.0, or 
> 1.0.0).  At this point, there is no plan to release subcomponents 
> independently from the project as a whole. 
>
> This release candidate is based on commit: 
> 5bd32135505b63fb8fb0ec617ae7c1f2c9e66bfb [2]
>
> The source release rc0 is hosted at [3].
> The binary artifacts are hosted at [4][5][6][7][8].
> The changelog is located at [9].
>
> Please download, verify checksums and signatures, run the unit tests, 
> and vote on the release. See [10] for how to validate a release 
> candidate.  Please use the latest version of the verification script on 
> 'main', and not the verification script in the source tarball.
>
> See also a verification result on GitHub Actions [11].
>
> The vote will be open for at least 72 hours.
>
> [ ] +1 Release this as Apache Arrow ADBC 12
> [ ] +0
> [ ] -1 Do not release this as Apache Arrow ADBC 12 because...
>
> Note: to verify APT/YUM packages on macOS/AArch64, you must `export 
> DOCKER_DEFAULT_PLATFORM=linux/amd64`. (Or skip this step by `export 
> TEST_APT=0 TEST_YUM=0`.)
>
> [1]: 
> https://github.com/apache/arrow-adbc/issues?q=is%3Aissue+milestone%3A%22ADBC+Libraries+12%22+is%3Aclosed
> [2]: 
> https://github.com/apache/arrow-adbc/commit/5bd32135505b63fb8fb0ec617ae7c1f2c9e66bfb
> [3]: 
> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-adbc-12-rc0/
> [4]: https://apache.jfrog.io/artifactory/arrow/almalinux-rc/
> [5]: https://apache.jfrog.io/artifactory/arrow/debian-rc/
> [6]: https://apache.jfrog.io/artifactory/arrow/ubuntu-rc/
> [7]: 
> https://repository.apache.org/content/repositories/staging/org/apache/arrow/adbc/
> [8]: 
> https://github.com/apache/arrow-adbc/releases/tag/apache-arrow-adbc-12-rc0
> [9]: 
> https://github.com/apache/arrow-adbc/blob/apache-arrow-adbc-12-rc0/CHANGELOG.md
> [10]: 
> https://arrow.apache.org/adbc/main/development/releasing.html#how-to-verify-release-candidates
> [11]: https://github.com/apache/arrow-adbc/actions/runs/8996431623


[VOTE] Release Apache Arrow ADBC 12 - RC0

2024-05-07 Thread David Li
Hello,

I would like to propose the following release candidate (RC0) of Apache Arrow 
ADBC version 12. This is a release consisting of 48 resolved GitHub issues [1].

Please note that the versioning scheme has changed.  This is the 12th release 
of ADBC, and so is called version "12".  The subcomponents, however, are 
versioned independently:

- C/C++/GLib/Go/Python/Ruby: 1.0.0
- C#: 0.12.0
- Java: 0.12.0
- R: 0.12.0
- Rust: 0.12.0

These are the versions you will see in the source and in actual packages.  The 
next release will be "13", and the subcomponents will increment their versions 
independently (to either 1.1.0, 0.13.0, or 1.0.0).  At this point, there is no 
plan to release subcomponents independently from the project as a whole. 

This release candidate is based on commit: 
5bd32135505b63fb8fb0ec617ae7c1f2c9e66bfb [2]

The source release rc0 is hosted at [3].
The binary artifacts are hosted at [4][5][6][7][8].
The changelog is located at [9].

Please download, verify checksums and signatures, run the unit tests, and vote 
on the release. See [10] for how to validate a release candidate.  Please use 
the latest version of the verification script on 'main', and not the 
verification script in the source tarball.

See also a verification result on GitHub Actions [11].

The vote will be open for at least 72 hours.

[ ] +1 Release this as Apache Arrow ADBC 12
[ ] +0
[ ] -1 Do not release this as Apache Arrow ADBC 12 because...

Note: to verify APT/YUM packages on macOS/AArch64, you must `export 
DOCKER_DEFAULT_PLATFORM=linux/amd64`. (Or skip this step by `export TEST_APT=0 
TEST_YUM=0`.)

[1]: 
https://github.com/apache/arrow-adbc/issues?q=is%3Aissue+milestone%3A%22ADBC+Libraries+12%22+is%3Aclosed
[2]: 
https://github.com/apache/arrow-adbc/commit/5bd32135505b63fb8fb0ec617ae7c1f2c9e66bfb
[3]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-adbc-12-rc0/
[4]: https://apache.jfrog.io/artifactory/arrow/almalinux-rc/
[5]: https://apache.jfrog.io/artifactory/arrow/debian-rc/
[6]: https://apache.jfrog.io/artifactory/arrow/ubuntu-rc/
[7]: 
https://repository.apache.org/content/repositories/staging/org/apache/arrow/adbc/
[8]: https://github.com/apache/arrow-adbc/releases/tag/apache-arrow-adbc-12-rc0
[9]: 
https://github.com/apache/arrow-adbc/blob/apache-arrow-adbc-12-rc0/CHANGELOG.md
[10]: 
https://arrow.apache.org/adbc/main/development/releasing.html#how-to-verify-release-candidates
[11]: https://github.com/apache/arrow-adbc/actions/runs/8996431623


Re: [ANNOUNCE] New Arrow committer: Dane Pitkin

2024-05-07 Thread David Li
Congrats Dane!

On Wed, May 8, 2024, at 09:46, Felipe Oliveira Carvalho wrote:
> Great news. Congratulations Dane!
>
> On Tue, May 7, 2024 at 7:57 PM Vibhatha Abeykoon  wrote:
>>
>> Congratulations Dane!!!
>>
>> Vibhatha Abeykoon
>>
>>
>> On Wed, May 8, 2024 at 4:02 AM Jacob Wujciak  wrote:
>>
>> > Congrats!
>> >
>> > On Tue, 7 May 2024 at 23:19, Bryce Mecum wrote:
>> >
>> > > Congrats Dane!
>> > >
>> > > On Tue, May 7, 2024 at 5:53 AM Joris Van den Bossche
>> > >  wrote:
>> > > >
>> > > > On behalf of the Arrow PMC, I'm happy to announce that Dane Pitkin has
>> > > > accepted an invitation to become a committer on Apache Arrow. Welcome,
>> > > > and thank you for your contributions!
>> > > >
>> > > > Joris
>> > >
>> >


Re: [VOTE][Format] JSON canonical extension type

2024-04-29 Thread David Li
+1 (binding)

assuming we explicitly state RFC-8259

On Tue, Apr 30, 2024, at 08:02, Matt Topol wrote:
> +1 (binding)
>
> On Mon, Apr 29, 2024 at 5:36 PM Ian Cook  wrote:
>
>> +1 (non-binding)
>>
>> I added a comment in the PR suggesting that we explicitly refer to RFC-8259
>> in CanonicalExtensions.rst.
>>
>> On Mon, Apr 29, 2024 at 1:21 PM Micah Kornfield 
>> wrote:
>>
>> > +1, I added a comment to the PR because I think we should recommend
>> > implementations specifically reject parsing Binary arrays with the
>> > annotation in case we want to support non-UTF8 encodings in the future
>> > (even though IIRC these aren't really JSON spec compliant).
>> >
>> > On Fri, Apr 19, 2024 at 1:24 PM Rok Mihevc  wrote:
>> >
>> > > Hi all,
>> > >
>> > > Following discussions [1][2] and preliminary implementation work (by
>> > > Pradeep Gollakota) [3] I would like to propose a vote to add language
>> for
>> > > JSON canonical extension type to CanonicalExtensions.rst as in PR [4]
>> and
>> > > written below.
>> > > A draft C++ implementation PR can be seen here [3].
>> > >
>> > > [1] https://lists.apache.org/thread/p3353oz6lk846pnoq6vk638tjqz2hm1j
>> > > [2] https://lists.apache.org/thread/7xph3476g9rhl9mtqvn804fqf5z8yoo1
>> > > [3] https://github.com/apache/arrow/pull/13901
>> > > [4] https://github.com/apache/arrow/pull/41257 <- proposed change
>> > >
>> > >
>> > > The vote will be open for at least 72 hours.
>> > >
>> > > [ ] +1 Accept this proposal
>> > > [ ] +0
>> > > [ ] -1 Do not accept this proposal because...
>> > >
>> > >
>> > > JSON
>> > > ====
>> > >
>> > > * Extension name: `arrow.json`.
>> > >
>> > > * The storage type of this extension is ``StringArray``,
>> > >   ``LargeStringArray``, or ``StringViewArray``.
>> > >   Only UTF-8 encoded JSON is supported.
>> > >
>> > > * Extension type parameters:
>> > >
>> > >   This type does not have any parameters.
>> > >
>> > > * Description of the serialization:
>> > >
>> > >   Metadata is either an empty string or a JSON string with an empty
>> > object.
>> > >   In the future, additional fields may be added, but they are not
>> > required
>> > >   to interpret the array.
>> > >
>> > >
>> > >
>> > > Rok
>> > >
>> >
>>


Re: [DISCUSSION] New Flags for Arrow C Interface Schema

2024-04-23 Thread David Li
For scalars: Arrow doesn't define scalars. They're an implementation concept. 
(They may be a *useful* one, but if we want to define them more generally, 
that's a separate discussion.)

For UDFs: UDFs are a system-specific interface. Presumably, that interface can 
encode whether an Arrow array is meant to represent a column or scalar (or 
record batch or ...). Again, because Arrow doesn't define scalars (for now...) 
or UDFs, the UDF interface needs to layer its own semantics on top of Arrow. 

In other words, I don't think the C Data Interface was meant to be something 
where you can expect to _only_ pass the ArrowDeviceArray around and have it 
encode all the semantics for a particular system, right? The UDF example is 
something where the engine would pass an ArrowDeviceArray plus additional 
context.

> since we can't determine which a given ArrowArray is on its own. In the
> libcudf situation, it came up with what happens if you pass a non-struct
> column to the from_arrow_device method which returns a cudf::table? Should
> it error, or should it create a table with a single column?

Presumably it should just error? I can see this being ambiguous if there were 
an API that dynamically returned either a table or a column based on the input 
shape (where before it would be less ambiguous since you'd explicitly pass 
pa.RecordBatch or pa.Array, and now it would be ambiguous since you only pass 
ArrowDeviceArray). But it doesn't sound like that's the case?

On Tue, Apr 23, 2024, at 11:15, Weston Pace wrote:
> I tend to agree with Dewey.  Using run-end-encoding to represent a scalar
> is clever and would keep the c data interface more compact.  Also, a struct
> array is a superset of a record batch (assuming the metadata is kept in the
> schema).  Consumers should always be able to deserialize into a struct
> array and then downcast to a record batch if that is what they want to do
> (raising an error if there happen to be nulls).
>
>> Depending on the function in question, it could be valid to pass a struct
>> column vs a record batch with different results.
>
> Are there any concrete examples where this is the case?  The closest
> example I can think of is something like the `drop_nulls` function, which,
> given a record batch, would choose to drop rows where any column is null
> and, given an array, only drops rows where the top-level struct is null.
> However, it might be clearer to just give the two functions different names
> anyways.
>
> On Mon, Apr 22, 2024 at 1:01 PM Dewey Dunnington
>  wrote:
>
>> Thank you for the background!
>>
>> I still wonder if these distinctions are the responsibility of the
>> ArrowSchema to communicate (although perhaps links to the specific
>> discussions would help highlight use-cases that I am not envisioning).
>> I think these distinctions are definitely important in the contexts
>> you mentioned; however, I am not sure that the FFI layer is going to
>> be helpful.
>>
>> > In the libcudf situation, it came up with what happens if you pass a
>> non-struct
>> > column to the from_arrow_device method which returns a cudf::table?
>> Should
>> > it error, or should it create a table with a single column?
>>
>> I suppose that I would have expected two functions (one to create a
>> table and one to create a column). As a consumer I can't envision a
>> situation where I would want to import an ArrowDeviceArray but where I
>> would want some piece of run-time information to decide what the
>> return type of the function would be? (With apologies if I am missing
>> a piece of the discussion).
>>
>> > If A and B have different lengths, this is invalid
>>
>> I believe several array implementations (e.g., numpy, R) are able to
>> broadcast/recycle a length-1 array. Run-end-encoding is also an option
>> that would make that broadcast explicit without expanding the scalar.
>>
>> > Depending on the function in question, it could be valid to pass a
>> struct column vs a record batch with different results.
>>
>> If this is an important distinction for an FFI signature of a UDF,
>> there would probably be a struct definition for the UDF where there
>> would be an opportunity to make this distinction (and perhaps others
>> that are relevant) without loading this concept onto the existing
>> structs.
>>
>> > If no flags are set, then the behavior shouldn't change
>> > from what it is now. If the ARROW_FLAG_RECORD_BATCH flag is set, then it
>> > should error unless calling ImportRecordBatch.
>>
>> I am not sure I would have expected that (since a struct array has an
>> unambiguous interpretation as a record batch and as a user I've very
>> explicitly decided that I want one, since I'm using that function).
>>
>> In the other direction, I am not sure a producer would be able to set
>> these flags without breaking backwards compatibility with earlier
>> producers that did not set them (since earlier threads have suggested
>> that it is good practice to error when an unsupported flag is

Re: ADBC - OS-level driver manager

2024-04-23 Thread David Li
I'd rather not hard code it directly into the manager, both because this may 
surprise applications that don't want it and would be inflexible for 
applications that are looking to use it, but providing an additional list of 
search paths that (say) Excel can configure + some platform-specific guidance 
on a standard list seems reasonable.

On Wed, Apr 24, 2024, at 02:45, Ian Cook wrote:
> I wonder if there is a relatively simple way to solve this problem. The
> ADBC driver manager libraries already make it possible to dynamically load
> drivers, and I believe these libraries already allow the user to specify
> which driver to use by passing either a bare filename or a full file path.
>
> So perhaps we could simply establish an ordered list of standard directory
> locations in which the ADBC driver manager will look for drivers when they
> are specified by bare filename. We would have to specify this differently
> for each mainstream type of OS, but I think that is doable. This could be
> codified in the ADBC docs and implemented in the ADBC driver managers.
> Anyone looking to achieve system-wide ADBC driver "registration" could take
> advantage of this, whereas anyone who prefers application-specific
> implementation could safely ignore it.
>
> I suspect that we would want the driver manager to look first in
> application-specific directories (which might vary depending on which ADBC
> driver language library one is using), then fall back on user-level config
> directories, then finally fall back on system-level config directories.
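A rough sketch of what such an ordered fallback could look like, per platform (all directory names below are illustrative placeholders, not a proposal for the actual standard paths):

```python
import os
import sys
from pathlib import Path


def candidate_dirs():
    """Yield user-level then system-level directories, in search order.

    Paths here are placeholders only; the real list would be codified
    in the ADBC docs per OS.
    """
    if sys.platform == "win32":
        yield Path(os.environ.get("LOCALAPPDATA", "")) / "ADBC" / "drivers"
        yield Path(os.environ.get("PROGRAMDATA", "")) / "ADBC" / "drivers"
    elif sys.platform == "darwin":
        yield Path.home() / "Library" / "Application Support" / "ADBC" / "drivers"
        yield Path("/Library/Application Support/ADBC/drivers")
    else:
        xdg = os.environ.get("XDG_CONFIG_HOME", str(Path.home() / ".config"))
        yield Path(xdg) / "adbc" / "drivers"
        yield Path("/etc/adbc/drivers")


def resolve_driver(name: str) -> str:
    """Resolve a bare driver name to a full path, if one can be found.

    Names that already look like paths are returned unchanged; so is a
    bare name with no match, deferring to the dynamic loader's defaults.
    """
    if os.sep in name or "/" in name:
        return name
    suffix = {"win32": ".dll", "darwin": ".dylib"}.get(sys.platform, ".so")
    for directory in candidate_dirs():
        candidate = directory / (name + suffix)
        if candidate.is_file():
            return str(candidate)
    return name
```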
>
> I believe that Windows, macOS, and Linux distros all have standard
> user-level and system-level config directories that are often used for this
> type of thing.
>
> Does this seem reasonable? Are there any gotchas that would prevent an
> approach like this from working?
>
> Ian
>
> On Mon, Apr 1, 2024 at 5:44 PM Curt Hagenlocher 
> wrote:
>
>> The advantage to system-wide registration of drivers (however that's
>> accomplished) is of course that it allows driver authors to provide a
>> single installer or set of instructions for the driver to be installed
>> without regard for different usage scenarios. So if Tableau and Excel can
>> both use ODBC drivers, then I (as a hypothetical author of a niche driver)
>> don't have to solve N installation problems for N possible use cases. And
>> my spouse (as a non-developer finance user) can just run one installer and
>> know that the data source will be available in multiple tools. Or at least
>> that's the principle.
>>
>> For a real-world example, compare the instructions for installing ODBC
>> drivers into Tableau (
>>
>> https://help.tableau.com/current/pro/desktop/en-us/examples_otherdatabases.htm
>> ) with those for installing JDBC drivers (
>>
>> https://help.tableau.com/current/pro/desktop/en-us/examples_otherdatabases_jdbc.htm
>> ). The JDBC instructions include copying or installing files to a specific
>> directory which possibly needs to be created. The ODBC instructions ...
>> don't.
>>
>> With what I'm most immediately invested in -- database drivers for
>> Microsoft Power BI -- part of the problem actually ends up being that many
>> drivers are closed source and/or not freely redistributable. So for someone
>> to use Power BI with Oracle, they either need a way to install Oracle
>> drivers onto their machine in a standard way which lets us find them or we
>> need to go through a painful and sometimes expensive "biz dev" effort to
>> get the right to redistribute those drivers and install them ourselves.
>>
>> I am of course aware that there can also be significant downsides to such
>> system-wide registration.
>>
>> -Curt
>>
>> On Wed, Mar 20, 2024 at 7:23 AM Antoine Pitrou  wrote:
>>
>> >
>> > Also, with ADBC driver implementations currently in flux (none of them
>> > has reached the "stable" status in
>> > https://arrow.apache.org/adbc/main/driver/status.html), it might be a
>> > disservice to users to implicitly fetch drivers from potentially
>> > outdated DLLs on the current system.
>> >
>> > Regards
>> >
>> > Antoine.
>> >
>> >
>> > On 20/03/2024 at 15:08, Matt Topol wrote:
>> > >> it seems like the current driver manager work has been largely
>> targeting
>> > > an app-specific implementation.
>> > >
>> > > Yup, that was the intention. So far discussions of ADBC having a
>> > > system-wide driver registration paradigm like ODBC have mostly been to
>> > > discuss how much we dislike that paradigm and would prefer ADBC to stay
>> > > with the app-specific approach that we currently have. :)
>> > >
>> > > As of yet, no one has requested such a paradigm so the discussions
>> > haven't
>> > > gotten revived.
>> > >
>> > > On Wed, Mar 20, 2024 at 9:22 AM David Coe
>> > > wrote:
>> > >
>> > >> ODBC has different OS-level driver managers available on their
>> > respective
>> > >> systems. It seems like the current driver manager<
>> > >> https://arrow.apache.org/adbc/main/cpp/driver_manager.html> work 

Re: [DISCUSS] Versioning and releases for apache/arrow components

2024-04-22 Thread David Li
Another possibility I'd like to float is doing this in ADBC first? My primary 
motivation is (1) from Joris's list: I'd like to bump a few components 
(Snowflake, maybe SQLite) to a "stable" version while leaving the others 
behind, and in this context I think it'd be much more helpful to users to 
differentiate between stable/experimental components. Admittedly this would not 
quite address the questions around versioning docs and so on.

On Tue, Apr 9, 2024, at 22:04, Antoine Pitrou wrote:
> It seems that perhaps this discussion should be rebooted for each 
> individual component, one at a time?
>
> Let's start with something simple and obvious, with some frequent 
> contribution activity, such as perhaps Go?
>
>
>
> On 09/04/2024 at 14:27, Joris Van den Bossche wrote:
>> I am also in favor of this idea in general and in the principle, but
>> (somewhat repeating others) I think we should be aware that this will
>> create _more_ work overall for releasing (refactoring release scripts
>> (at least initially), deciding which version to use for which
>> component, etc), and not less, given that easing the burden of the
>> release managers was mentioned as a goal.
>> So if we pursue this, it should be for other benefits that we think it has:
>> 1) because separate versions would be beneficial for users? (have a
>> clearer messaging in the version number (eg no major new version if
>> there were hardly any changes in a certain component, or use a
>> versioning scheme more common in a certain language's ecosystem, ..?)
>> 2) because it would actually allow separate releases, even though when
>> initially always releasing in batch (eg being able to just do a bug
>> fix for go, without having to also add a tag for all others)
>> 3) because it would make the release process more manageable / easier
>> to delegate? (and thus potentially easing the burden for an
>> _individual_ release manager, although requiring more work overall)
>> 4) .. other things I am missing?
>> 
>>> We could simply release C++, R, Python and C/GLib together.
>>> ...
>>>> I think that versioning will require additional thinking for libraries
>>>> like PyArrow
>>> I think we should maybe focus on a few more obvious cases. [i.e. not C++ 
>>> and Python]
>> 
>> Yes, agreed to not focus on those. At least for PyArrow, speaking as
>> one of its maintainers, I am personally (at this moment) not really
>> interested in dealing with the complexity of allowing a decoupled
>> Arrow C++ and PyArrow build.
>> 
>> Related to the docs:
>> 
>>> There is a dedicated documentation page for this... though the
>>> versioning of the docs themselves would become ill-defined:
>>> https://arrow.apache.org/docs/status.html
>> ...
>>> I think it would be best to hold off on Java also, in part because
>>> of how the Java docs are integrated with the C++ and Python docs and
>>> controlled by the version selector menu.
>> 
>> We should indeed consider how to handle the current documentation
>> site. Previously, we actually did some work to split the sphinx docs
>> (used for the format, dev docs, and for the Python/C++/(part of the)
>> Java docs) into multiple sphinx projects that could be built
>> separately (https://github.com/apache/arrow/issues/30627,
>> https://github.com/apache/arrow/pull/11980), but we abandoned that
>> work last year because it did not seem worthwhile. But we could certainly
>> revive that idea, for example to at least split the format docs (and
>> let that have its own versioning based on the Format Version
>> (currently 1.4)? or just only host a single, latest version?)
>> 
>> Joris


Re: Unsupported/Other Type

2024-04-17 Thread David Li
Yes, this would be for an extension type. 

On Wed, Apr 17, 2024, at 23:25, Weston Pace wrote:
>> people generally find use in Arrow schemas independently of concrete data.
>
> This makes sense.  I think we do want to encourage use of Arrow as a "type
> system" even if there is no data involved.  And, given that we cannot
> easily change a field's data type property to "optional" it makes sense to
> use a dedicated type and I so I would be in favor of such a proposal (we
> may eventually add an "unknown type" concept in Substrait as well, it's
> come up several times, and so we could use this in that context).
>
> I think that I would still prefer a canonical extension type (with storage
> type null) over a new dedicated type.
>
> On Wed, Apr 17, 2024 at 5:39 AM Antoine Pitrou  wrote:
>
>>
>> Ah! Well, I think this could be an interesting proposal, but someone
>> should put a more formal proposal, perhaps as a draft PR.
>>
>> Regards
>>
>> Antoine.
>>
>>
>> On 17/04/2024 at 11:57, David Li wrote:
>> > For an unsupported/other extension type.
>> >
>> > On Wed, Apr 17, 2024, at 18:32, Antoine Pitrou wrote:
>> >> What is "this proposal"?
>> >>
>> >>
>> >> On 17/04/2024 at 10:38, David Li wrote:
>> >>> Should I take it that this proposal is dead in the water? While we
>> could define our own Unknown/Other type for say the ADBC PostgreSQL driver
>> it might be useful to have a singular type for consumers to latch on to.
>> >>>
>> >>> On Fri, Apr 12, 2024, at 07:32, David Li wrote:
>> >>>> I think an "Other" extension type is slightly different than an
>> >>>> arbitrary extension type, though: the latter may be understood
>> >>>> downstream but the former represents a point at which a component
>> >>>> explicitly declares it does not know how to handle a field. In this
>> >>>> example, the PostgreSQL ADBC driver might be able to provide a
>> >>>> representation regardless, but a different driver (or say, the JDBC
>> >>>> adapter, which cannot necessarily get a bytestring for an arbitrary
>> >>>> JDBC type) may want an Other type to signal that it would fail if
>> asked
>> >>>> to provide particular columns.
>> >>>>
>> >>>> On Fri, Apr 12, 2024, at 02:30, Dewey Dunnington wrote:
>> >>>>> Depending where your Arrow-encoded data is used, either extension
>> >>>>> types or generic field metadata are options. We have this problem in
>> >>>>> the ADBC Postgres driver, where we can convert *most* Postgres types
>> >>>>> to an Arrow type but there are some others where we can't or don't
>> >>>>> know or don't implement a conversion. Currently for these we return
>> >>>>> opaque binary (the Postgres COPY representation of the value) but put
>> >>>>> field metadata so that a consumer can implement a workaround for an
>> >>>>> unsupported type. It would be arguably better to have implemented
>> this
>> >>>>> as an extension type; however, field metadata felt like less of a
>> >>>>> commitment when I first worked on this.
>> >>>>>
>> >>>>> Cheers,
>> >>>>>
>> >>>>> -dewey
>> >>>>>
>> >>>>> On Thu, Apr 11, 2024 at 1:20 PM Norman Jordan
>> >>>>>  wrote:
>> >>>>>>
>> >>>>>> I was using UUID as an example. It looks like extension types
>> covers my original request.
>> >>>>>> 
>> >>>>>> From: Felipe Oliveira Carvalho 
>> >>>>>> Sent: Thursday, April 11, 2024 7:15 AM
>> >>>>>> To: dev@arrow.apache.org 
>> >>>>>> Subject: Re: Unsupported/Other Type
>> >>>>>>
>> >>>>>> The OP used UUID as an example. Would that be enough or the request
>> is for
>> >>>>>> a flexible mechanism that allows the creation of one-off nominal
>> types for
>> >>>>>> very specific use-cases?
>> >>>>>>
>> >>>>>> —
>> >>>>>> Felipe
>> >>>>>>
>> >>>>>> On Thu, 11 Apr 2024 at 05:06 An

Re: [VOTE] Release Apache Arrow 16.0.0 - RC0

2024-04-17 Thread David Li
+1

tested sources on Debian 12, x86-64

On Wed, Apr 17, 2024, at 18:14, Raúl Cumplido wrote:
> Hi,
>
> Just a minor note, the binary verification for
> verify-rc-binaries-wheels-windows failed with [1].
> This can be avoided by implementing the solution proposed in this
> comment by Kou [2]. See more details there.
>
> As shared in the comment we don't think this is a blocker as it just
> requires to set TZDIR and download the IANA database for the ORC test
> to pass on Windows.
>
> Kind regards,
> Raúl
>
> [1] 
> https://github.com/ursacomputing/crossbow/actions/runs/8715262993/job/23907626092#step:6:5681
> [2] https://github.com/apache/arrow/pull/41235#issuecomment-2060264968
>
> On Wed, Apr 17, 2024 at 11:01, Raúl Cumplido () wrote:
>>
>> Hi,
>>
>> I would like to propose the following release candidate (RC0) of Apache
>> Arrow version 16.0.0. This is a release consisting of 378
>> resolved GitHub issues[1].
>>
>> This release candidate is based on commit:
>> 6a28035c2b49b432dc63f5ee7524d76b4ed2d762 [2]
>>
>> The source release rc0 is hosted at [3].
>> The binary artifacts are hosted at [4][5][6][7][8][9][10][11].
>> The changelog is located at [12].
>>
>> Please download, verify checksums and signatures, run the unit tests,
>> and vote on the release. See [13] for how to validate a release candidate.
>>
>> See also a verification result on GitHub pull request [14].
>>
>> The vote will be open for at least 72 hours.
>>
>> [ ] +1 Release this as Apache Arrow 16.0.0
>> [ ] +0
>> [ ] -1 Do not release this as Apache Arrow 16.0.0 because...
>>
>> [1]: 
>> https://github.com/apache/arrow/issues?q=is%3Aissue+milestone%3A16.0.0+is%3Aclosed
>> [2]: 
>> https://github.com/apache/arrow/tree/6a28035c2b49b432dc63f5ee7524d76b4ed2d762
>> [3]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-16.0.0-rc0
>> [4]: https://apache.jfrog.io/artifactory/arrow/almalinux-rc/
>> [5]: https://apache.jfrog.io/artifactory/arrow/amazon-linux-rc/
>> [6]: https://apache.jfrog.io/artifactory/arrow/centos-rc/
>> [7]: https://apache.jfrog.io/artifactory/arrow/debian-rc/
>> [8]: https://apache.jfrog.io/artifactory/arrow/java-rc/16.0.0-rc0
>> [9]: https://apache.jfrog.io/artifactory/arrow/nuget-rc/16.0.0-rc0
>> [10]: https://apache.jfrog.io/artifactory/arrow/python-rc/16.0.0-rc0
>> [11]: https://apache.jfrog.io/artifactory/arrow/ubuntu-rc/
>> [12]: 
>> https://github.com/apache/arrow/blob/6a28035c2b49b432dc63f5ee7524d76b4ed2d762/CHANGELOG.md
>> [13]: https://arrow.apache.org/docs/developers/release_verification.html
>> [14]: https://github.com/apache/arrow/pull/41235


Re: Unsupported/Other Type

2024-04-17 Thread David Li
I'll see if I can write this out more.

@Weston, indeed this is some sort of "planning stage" but I think a concrete 
type is still useful. For example, wherever we use Arrow and adapt a foreign 
catalog, we may need _something_ to indicate the presence of a column that we 
do not know how to interpret. It would be bad to simply pretend the column does 
not exist, and it would be inconvenient for the user to have a hard error. This 
comes up with the Java JDBC adapter, where currently we just give a hard error 
when we don't know how to convert a type, even if the user is just inquiring 
about the schema of the table, as well as the ADBC Postgres driver, as 
discussed.

Otherwise, we'd have to come up with our own encoding of Arrow schemas that 
allows for Option, and invent our own conventions in each language/in 
ADBC, and so on. Perhaps we could call this an abuse of Arrow schemas given 
that Arrow was meant to describe concrete in-memory data, but I think user 
requests for features like JSON encodings of Arrow schemas (even if we've made 
no progress on them) show that people generally find use in Arrow schemas 
independently of concrete data.

On Wed, Apr 17, 2024, at 20:09, Antoine Pitrou wrote:
> Ah! Well, I think this could be an interesting proposal, but someone 
> should put a more formal proposal, perhaps as a draft PR.
>
> Regards
>
> Antoine.
>
>
> On 17/04/2024 at 11:57, David Li wrote:
>> For an unsupported/other extension type.
>> 
>> On Wed, Apr 17, 2024, at 18:32, Antoine Pitrou wrote:
>>> What is "this proposal"?
>>>
>>>
>>> On 17/04/2024 at 10:38, David Li wrote:
>>>> Should I take it that this proposal is dead in the water? While we could 
>>>> define our own Unknown/Other type for say the ADBC PostgreSQL driver it 
>>>> might be useful to have a singular type for consumers to latch on to.
>>>>
>>>> On Fri, Apr 12, 2024, at 07:32, David Li wrote:
>>>>> I think an "Other" extension type is slightly different than an
>>>>> arbitrary extension type, though: the latter may be understood
>>>>> downstream but the former represents a point at which a component
>>>>> explicitly declares it does not know how to handle a field. In this
>>>>> example, the PostgreSQL ADBC driver might be able to provide a
>>>>> representation regardless, but a different driver (or say, the JDBC
>>>>> adapter, which cannot necessarily get a bytestring for an arbitrary
>>>>> JDBC type) may want an Other type to signal that it would fail if asked
>>>>> to provide particular columns.
>>>>>
>>>>> On Fri, Apr 12, 2024, at 02:30, Dewey Dunnington wrote:
>>>>>> Depending where your Arrow-encoded data is used, either extension
>>>>>> types or generic field metadata are options. We have this problem in
>>>>>> the ADBC Postgres driver, where we can convert *most* Postgres types
>>>>>> to an Arrow type but there are some others where we can't or don't
>>>>>> know or don't implement a conversion. Currently for these we return
>>>>>> opaque binary (the Postgres COPY representation of the value) but put
>>>>>> field metadata so that a consumer can implement a workaround for an
>>>>>> unsupported type. It would be arguably better to have implemented this
>>>>>> as an extension type; however, field metadata felt like less of a
>>>>>> commitment when I first worked on this.
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> -dewey
>>>>>>
>>>>>> On Thu, Apr 11, 2024 at 1:20 PM Norman Jordan
>>>>>>  wrote:
>>>>>>>
>>>>>>> I was using UUID as an example. It looks like extension types covers my 
>>>>>>> original request.
>>>>>>> 
>>>>>>> From: Felipe Oliveira Carvalho 
>>>>>>> Sent: Thursday, April 11, 2024 7:15 AM
>>>>>>> To: dev@arrow.apache.org 
>>>>>>> Subject: Re: Unsupported/Other Type
>>>>>>>
>>>>>>> The OP used UUID as an example. Would that be enough or the request is 
>>>>>>> for
>>>>>>> a flexible mechanism that allows the creation of one-off nominal types 
>>>>>>> for
>>>>>>> very specific use-cases?
>>>>>>>
>>>>>>> —
>>>>>>> Felipe

Re: Unsupported/Other Type

2024-04-17 Thread David Li
For an unsupported/other extension type.

On Wed, Apr 17, 2024, at 18:32, Antoine Pitrou wrote:
> What is "this proposal"?
>
>
> On 17/04/2024 at 10:38, David Li wrote:
>> Should I take it that this proposal is dead in the water? While we could 
>> define our own Unknown/Other type for say the ADBC PostgreSQL driver it 
>> might be useful to have a singular type for consumers to latch on to.
>> 
>> On Fri, Apr 12, 2024, at 07:32, David Li wrote:
>>> I think an "Other" extension type is slightly different than an
>>> arbitrary extension type, though: the latter may be understood
>>> downstream but the former represents a point at which a component
>>> explicitly declares it does not know how to handle a field. In this
>>> example, the PostgreSQL ADBC driver might be able to provide a
>>> representation regardless, but a different driver (or say, the JDBC
>>> adapter, which cannot necessarily get a bytestring for an arbitrary
>>> JDBC type) may want an Other type to signal that it would fail if asked
>>> to provide particular columns.
>>>
>>> On Fri, Apr 12, 2024, at 02:30, Dewey Dunnington wrote:
>>>> Depending where your Arrow-encoded data is used, either extension
>>>> types or generic field metadata are options. We have this problem in
>>>> the ADBC Postgres driver, where we can convert *most* Postgres types
>>>> to an Arrow type but there are some others where we can't or don't
>>>> know or don't implement a conversion. Currently for these we return
>>>> opaque binary (the Postgres COPY representation of the value) but put
>>>> field metadata so that a consumer can implement a workaround for an
>>>> unsupported type. It would be arguably better to have implemented this
>>>> as an extension type; however, field metadata felt like less of a
>>>> commitment when I first worked on this.
>>>>
>>>> Cheers,
>>>>
>>>> -dewey
>>>>
>>>> On Thu, Apr 11, 2024 at 1:20 PM Norman Jordan
>>>>  wrote:
>>>>>
>>>>> I was using UUID as an example. It looks like extension types covers my 
>>>>> original request.
>>>>> 
>>>>> From: Felipe Oliveira Carvalho 
>>>>> Sent: Thursday, April 11, 2024 7:15 AM
>>>>> To: dev@arrow.apache.org 
>>>>> Subject: Re: Unsupported/Other Type
>>>>>
>>>>> The OP used UUID as an example. Would that be enough or the request is for
>>>>> a flexible mechanism that allows the creation of one-off nominal types for
>>>>> very specific use-cases?
>>>>>
>>>>> —
>>>>> Felipe
>>>>>
>>>>> On Thu, 11 Apr 2024 at 05:06 Antoine Pitrou  wrote:
>>>>>
>>>>>>
>>>>>> Yes, JSON and UUID are obvious candidates for new canonical extension
>>>>>> types. XML also comes to mind, but I'm not sure there's much of a use
>>>>>> case for it.
>>>>>>
>>>>>> Regards
>>>>>>
>>>>>> Antoine.
>>>>>>
>>>>>>
>>>>>>> On 10/04/2024 at 22:55, Wes McKinney wrote:
>>>>>>> In the past we have discussed adding a canonical type for UUID and JSON.
>>>>>> I
>>>>>>> still think this is a good idea and could improve ergonomics in
>>>>>> downstream
>>>>>>> language bindings (e.g. by exposing JSON querying function or
>>>>>> automatically
>>>>>>> boxing UUIDs in built-in UUID types, like the Python uuid library). Has
>>>>>>> anyone done any work on this to anyone's knowledge?
>>>>>>>
>>>>>>> On Wed, Apr 10, 2024 at 3:05 PM Micah Kornfield 
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Norman,
>>>>>>>> Arrow has a concept of extension types [1] along with the possibility 
>>>>>>>> of
>>>>>>>> proposing new canonical extension types [2].  This seems to cover the
>>>>>>>> use-cases you mention but I might be misunderstanding?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Micah
>>>>>>>>
>>>>>>>> [1]
>>>>>>>>
>>>

Re: Unsupported/Other Type

2024-04-17 Thread David Li
Should I take it that this proposal is dead in the water? While we could define 
our own Unknown/Other type for say the ADBC PostgreSQL driver it might be 
useful to have a singular type for consumers to latch on to.

On Fri, Apr 12, 2024, at 07:32, David Li wrote:
> I think an "Other" extension type is slightly different than an 
> arbitrary extension type, though: the latter may be understood 
> downstream but the former represents a point at which a component 
> explicitly declares it does not know how to handle a field. In this 
> example, the PostgreSQL ADBC driver might be able to provide a 
> representation regardless, but a different driver (or say, the JDBC 
> adapter, which cannot necessarily get a bytestring for an arbitrary 
> JDBC type) may want an Other type to signal that it would fail if asked 
> to provide particular columns.
>
> On Fri, Apr 12, 2024, at 02:30, Dewey Dunnington wrote:
>> Depending where your Arrow-encoded data is used, either extension
>> types or generic field metadata are options. We have this problem in
>> the ADBC Postgres driver, where we can convert *most* Postgres types
>> to an Arrow type but there are some others where we can't or don't
>> know or don't implement a conversion. Currently for these we return
>> opaque binary (the Postgres COPY representation of the value) but put
>> field metadata so that a consumer can implement a workaround for an
>> unsupported type. It would be arguably better to have implemented this
>> as an extension type; however, field metadata felt like less of a
>> commitment when I first worked on this.
>>
>> Cheers,
>>
>> -dewey
>>
>> On Thu, Apr 11, 2024 at 1:20 PM Norman Jordan
>>  wrote:
>>>
>>> I was using UUID as an example. It looks like extension types covers my 
>>> original request.
>>> 
>>> From: Felipe Oliveira Carvalho 
>>> Sent: Thursday, April 11, 2024 7:15 AM
>>> To: dev@arrow.apache.org 
>>> Subject: Re: Unsupported/Other Type
>>>
>>> The OP used UUID as an example. Would that be enough or the request is for
>>> a flexible mechanism that allows the creation of one-off nominal types for
>>> very specific use-cases?
>>>
>>> —
>>> Felipe
>>>
>>> On Thu, 11 Apr 2024 at 05:06 Antoine Pitrou  wrote:
>>>
>>> >
>>> > Yes, JSON and UUID are obvious candidates for new canonical extension
>>> > types. XML also comes to mind, but I'm not sure there's much of a use
>>> > case for it.
>>> >
>>> > Regards
>>> >
>>> > Antoine.
>>> >
>>> >
>>> > On 10/04/2024 at 22:55, Wes McKinney wrote:
>>> > > In the past we have discussed adding a canonical type for UUID and JSON.
>>> > I
>>> > > still think this is a good idea and could improve ergonomics in
>>> > downstream
>>> > > language bindings (e.g. by exposing JSON querying function or
>>> > automatically
>>> > > boxing UUIDs in built-in UUID types, like the Python uuid library). Has
>>> > > anyone done any work on this to anyone's knowledge?
>>> > >
>>> > > On Wed, Apr 10, 2024 at 3:05 PM Micah Kornfield 
>>> > > wrote:
>>> > >
>>> > >> Hi Norman,
>>> > >> Arrow has a concept of extension types [1] along with the possibility 
>>> > >> of
>>> > >> proposing new canonical extension types [2].  This seems to cover the
>>> > >> use-cases you mention but I might be misunderstanding?
>>> > >>
>>> > >> Thanks,
>>> > >> Micah
>>> > >>
>>> > >> [1]
>>> > >>
>>> > >>
>>> > https://arrow.apache.org/docs/format/Columnar.html#format-metadata-extension-types
>>> > >> [2] https://arrow.apache.org/docs/format/CanonicalExtensions.html
>>> > >>
>>> > >> On Wed, Apr 10, 2024 at 11:44 AM Norman Jordan
>>> > >>  wrote:
>>> > >>
>>> > >>> Problem Description
>>> > >>>
>>> > >>> Currently Arrow schemas can only contain columns of types supported by
>>> > >>> Arrow. In some cases an Arrow schema maps to an external schema. This
>>> > can
>>> > >>> result in the Arrow schema not being able to support all the columns
>>>

Re: Personal feedback on your last release on Apache Arrow ADBC 0.11.0

2024-04-17 Thread David Li
Hi Christofer,

Sutou Kouhei is part of the PMC.

Additionally, there is a result email: 
https://lists.apache.org/thread/gb5k69pd3k6lnbzw978fm7ppx1p9cx15

On Wed, Apr 17, 2024, at 16:52, Christofer Dutz wrote:
> Hi all,
>
> while reviewing your project's activity in the last quarter as part of 
> my preparation for today's board meeting, I came across your last vote 
> on Apache Arrow ADBC 0.11.0 RC0.
>
> Technically I count only 2 binding +1 votes:
> - Matthew Topol
> - Dewey Dunnington
>
> All others are not part of the PMC.
>
> I assume the Release Manager David implicitly counted himself as +1; 
> however, the concept of an implicit vote does not exist at Apache. If you 
> want to save sending an additional email, add something like "this 
> also counts as my +1 vote" to your email, or - even better - send an 
> explicit vote email.
>
> It would also be good to have a RESULT email containing the result of a vote.
>
> So right now we would need a third binding vote as soon as possible 
> (Possibly also for other votes, where we had the release manager 
> provide the missing third vote).
>
> Chris
>
> PS: Please keep me in CC as I'm not subscribed here.


Re: [ANNOUNCE] New Arrow committer: Sarah Gilmore

2024-04-11 Thread David Li
Congrats Sarah!

On Fri, Apr 12, 2024, at 06:04, Joris Van den Bossche wrote:
> Congrats!
>
> On Thu, 11 Apr 2024 at 22:56, Sarah Gilmore
>  wrote:
>>
>> Thank you everyone! It's been awesome working with everyone, and I look 
>> forward to continuing to do so! 
>> 
>> From: Ian Cook 
>> Sent: Thursday, April 11, 2024 2:43 PM
>> To: dev@arrow.apache.org 
>> Subject: Re: [ANNOUNCE] New Arrow committer: Sarah Gilmore
>>
>> Congrats Sarah!
>>
>> On Thu, Apr 11, 2024 at 12:31 Bryce Mecum  wrote:
>>
>> > Congratulations!
>> >
>> > On Thu, Apr 11, 2024 at 3:13 AM Sutou Kouhei  wrote:
>> > >
>> > > Hi,
>> > >
>> > > On behalf of the Arrow PMC, I'm happy to announce that Sarah
>> > > Gilmore has accepted an invitation to become a committer on
>> > > Apache Arrow. Welcome, and thank you for your contributions!
>> > >
>> > > Thanks,
>> > > --
>> > > kou
>> >


Re: Unsupported/Other Type

2024-04-11 Thread David Li
I think an "Other" extension type is slightly different than an arbitrary 
extension type, though: the latter may be understood downstream but the former 
represents a point at which a component explicitly declares it does not know 
how to handle a field. In this example, the PostgreSQL ADBC driver might be 
able to provide a representation regardless, but a different driver (or say, 
the JDBC adapter, which cannot necessarily get a bytestring for an arbitrary 
JDBC type) may want an Other type to signal that it would fail if asked to 
provide particular columns.

On Fri, Apr 12, 2024, at 02:30, Dewey Dunnington wrote:
> Depending where your Arrow-encoded data is used, either extension
> types or generic field metadata are options. We have this problem in
> the ADBC Postgres driver, where we can convert *most* Postgres types
> to an Arrow type but there are some others where we can't or don't
> know or don't implement a conversion. Currently for these we return
> opaque binary (the Postgres COPY representation of the value) but put
> field metadata so that a consumer can implement a workaround for an
> unsupported type. It would be arguably better to have implemented this
> as an extension type; however, field metadata felt like less of a
> commitment when I first worked on this.
>
> Cheers,
>
> -dewey
>
> On Thu, Apr 11, 2024 at 1:20 PM Norman Jordan
>  wrote:
>>
>> > >> I was using UUID as an example. It looks like extension types cover my 
>> > >> original request.
>> 
>> From: Felipe Oliveira Carvalho 
>> Sent: Thursday, April 11, 2024 7:15 AM
>> To: dev@arrow.apache.org 
>> Subject: Re: Unsupported/Other Type
>>
>> The OP used UUID as an example. Would that be enough or the request is for
>> a flexible mechanism that allows the creation of one-off nominal types for
>> very specific use-cases?
>>
>> —
>> Felipe
>>
>> On Thu, 11 Apr 2024 at 05:06 Antoine Pitrou  wrote:
>>
>> >
>> > Yes, JSON and UUID are obvious candidates for new canonical extension
>> > types. XML also comes to mind, but I'm not sure there's much of a use
>> > case for it.
>> >
>> > Regards
>> >
>> > Antoine.
>> >
>> >
>> > Le 10/04/2024 à 22:55, Wes McKinney a écrit :
>> > > In the past we have discussed adding a canonical type for UUID and JSON.
>> > I
>> > > still think this is a good idea and could improve ergonomics in
>> > downstream
>> > > language bindings (e.g. by exposing JSON querying function or
>> > automatically
>> > > boxing UUIDs in built-in UUID types, like the Python uuid library). Has
>> > > anyone done any work on this to anyone's knowledge?
>> > >
>> > > On Wed, Apr 10, 2024 at 3:05 PM Micah Kornfield 
>> > > wrote:
>> > >
>> > >> Hi Norman,
>> > >> Arrow has a concept of extension types [1] along with the possibility of
>> > >> proposing new canonical extension types [2].  This seems to cover the
>> > >> use-cases you mention but I might be misunderstanding?
>> > >>
>> > >> Thanks,
>> > >> Micah
>> > >>
>> > >> [1]
>> > >>
>> > >>
>> > https://arrow.apache.org/docs/format/Columnar.html#format-metadata-extension-types
>> > >> [2] https://arrow.apache.org/docs/format/CanonicalExtensions.html
>> > >>
>> > >> On Wed, Apr 10, 2024 at 11:44 AM Norman Jordan
>> > >>  wrote:
>> > >>
>> > >>> Problem Description
>> > >>>
>> > >>> Currently Arrow schemas can only contain columns of types supported by
>> > >>> Arrow. In some cases an Arrow schema maps to an external schema. This
>> > can
>> > >>> result in the Arrow schema not being able to support all the columns
>> > from
>> > >>> the external schema.
>> > >>>
>> > >>> Consider an external system that contains a column of type UUID. To
>> > model
>> > >>> the schema in Arrow, the user has two choices:
>> > >>>
>> > >>>1.  Do not include the UUID column in the Arrow schema
>> > >>>
>> > >>>2.  Map the column to an existing Arrow type. This will not include
>> > the
>> > >>> original type information. A UUID can be mapped to a FixedSizeBinary,
>> > but
>> > >>> consumers of the Arrow schema will be unable to distinguish a
>> > >>> FixedSizeBinary field from a UUID field.
>> > >>>
>> > >>> Possible Solution
>> > >>>
>> > >>>*   Add a new type code that represents unsupported types
>> > >>>
>> > >>>*   Values for the new type are represented as variable length
>> > binary
>> > >>>
>> > >>> Some drivers can expose data even when they don’t understand the data
>> > >>> type. For example, the PostgreSQL driver will return the raw bytes for
>> > >>> fields of an unknown type. Using an explicit type lets clients know
>> > that
>> > >>> they should convert values if they were able to determine the actual
>> > data
>> > >>> type.
>> > >>>
>> > >>> Questions
>> > >>>
>> > >>>*   What is the impact on existing clients when they encounter
>> > fields
>> > >> of
>> > >>> the unsupported type?
>> > >>>
>> > >>>*   Is it safe to assume that all unsupported values can safely be
>> > >>> converted to a variable length binary?
>> > >>>
>> > >>> 

Re: Unsupported/Other Type

2024-04-10 Thread David Li
I think this should be an extension type, yes.

It could be parametrized on the storage type; the other system might at least 
know that one type is based on another (e.g. a user defined type). Type 
metadata can be preserved in the extension type's metadata.

I think it would be good to have standard UUID and JSON extension types. I 
don't think anyone is actively working on it. 
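For illustration, here is a minimal standard-library sketch (not code from any Arrow implementation; the vendor and type names are hypothetical) of how such a parametrized extension's metadata could be serialized as a JSON object carrying the external system's type information — the shape the Opaque proposal uses with its ``type_name`` and ``vendor_name`` fields:

```python
import json

def serialize_opaque_metadata(type_name: str, vendor_name: str) -> str:
    """Serialize the parameters of an 'unknown type' extension as a JSON
    object, so they can travel in the extension type's metadata."""
    return json.dumps({"type_name": type_name, "vendor_name": vendor_name})

def deserialize_opaque_metadata(serialized: str) -> dict:
    # Fields added in the future are preserved rather than rejected, since
    # none of the fields are required to interpret the array itself.
    return json.loads(serialized)

meta = serialize_opaque_metadata("GEOGRAPHY", "ExampleDB")
params = deserialize_opaque_metadata(meta)
assert params["type_name"] == "GEOGRAPHY"
assert params["vendor_name"] == "ExampleDB"
```

Because the metadata rides on the extension type rather than replacing the storage, a consumer that does not understand the extension can still fall back to the underlying storage type.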

On Thu, Apr 11, 2024, at 05:55, Wes McKinney wrote:
> In the past we have discussed adding a canonical type for UUID and JSON. I
> still think this is a good idea and could improve ergonomics in downstream
> language bindings (e.g. by exposing JSON querying function or automatically
> boxing UUIDs in built-in UUID types, like the Python uuid library). Has
> anyone done any work on this to anyone's knowledge?
>
> On Wed, Apr 10, 2024 at 3:05 PM Micah Kornfield 
> wrote:
>
>> Hi Norman,
>> Arrow has a concept of extension types [1] along with the possibility of
>> proposing new canonical extension types [2].  This seems to cover the
>> use-cases you mention but I might be misunderstanding?
>>
>> Thanks,
>> Micah
>>
>> [1]
>>
>> https://arrow.apache.org/docs/format/Columnar.html#format-metadata-extension-types
>> [2] https://arrow.apache.org/docs/format/CanonicalExtensions.html
>>
>> On Wed, Apr 10, 2024 at 11:44 AM Norman Jordan
>>  wrote:
>>
>> > Problem Description
>> >
>> > Currently Arrow schemas can only contain columns of types supported by
>> > Arrow. In some cases an Arrow schema maps to an external schema. This can
>> > result in the Arrow schema not being able to support all the columns from
>> > the external schema.
>> >
>> > Consider an external system that contains a column of type UUID. To model
>> > the schema in Arrow, the user has two choices:
>> >
>> >   1.  Do not include the UUID column in the Arrow schema
>> >
>> >   2.  Map the column to an existing Arrow type. This will not include the
>> > original type information. A UUID can be mapped to a FixedSizeBinary, but
>> > consumers of the Arrow schema will be unable to distinguish a
>> > FixedSizeBinary field from a UUID field.
>> >
>> > Possible Solution
>> >
>> >   *   Add a new type code that represents unsupported types
>> >
>> >   *   Values for the new type are represented as variable length binary
>> >
>> > Some drivers can expose data even when they don’t understand the data
>> > type. For example, the PostgreSQL driver will return the raw bytes for
>> > fields of an unknown type. Using an explicit type lets clients know that
>> > they should convert values if they were able to determine the actual data
>> > type.
>> >
>> > Questions
>> >
>> >   *   What is the impact on existing clients when they encounter fields
>> of
>> > the unsupported type?
>> >
>> >   *   Is it safe to assume that all unsupported values can safely be
>> > converted to a variable length binary?
>> >
>> >   *   How can we preserve information about the original type?
>> >
>> >
>>
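To make option 2 above concrete, here is a small sketch using only the Python standard library. It shows that the 16-byte payload round-trips to a UUID only if the consumer independently knows the field holds a UUID — exactly the information a plain FixedSizeBinary field drops, and what an extension type would preserve:

```python
import uuid

def uuid_to_fixed_size_binary(value: str) -> bytes:
    """Map a textual UUID to the 16-byte payload a FixedSizeBinary(16)
    column would store."""
    return uuid.UUID(value).bytes

def fixed_size_binary_to_uuid(payload: bytes) -> str:
    """Recover the canonical UUID string -- only possible because we know,
    out of band, that this binary field is actually a UUID."""
    return str(uuid.UUID(bytes=payload))

payload = uuid_to_fixed_size_binary("00112233-4455-6677-8899-aabbccddeeff")
assert len(payload) == 16  # fits FixedSizeBinary(16)
assert fixed_size_binary_to_uuid(payload) == "00112233-4455-6677-8899-aabbccddeeff"
```

Without that out-of-band knowledge, a consumer sees only sixteen opaque bytes, indistinguishable from any other FixedSizeBinary(16) column.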


Re: [DISCUSS] Versioning and releases for apache/arrow components

2024-04-09 Thread David Li
Java has JNI parts, but I think they do not necessarily need to release at the 
same time as C++, especially since the JAR bundles the libraries; Java could 
just pick up the latest version of the C++ library whenever it releases. It 
would make it harder if the next step is to also decouple the repositories, 
though.

JB, I see what you're saying but I think we want to avoid declaring a "core" 
Arrow library as the implication is not fair to the independent and 
fully-featured implementations in Go, Rust, etc. But that is just a matter of 
wording.

On Tue, Apr 9, 2024, at 17:06, Jean-Baptiste Onofré wrote:
> Hi,
>
> Yeah, to be honest, I was more focused on Java versioning.
>
> Maybe, we can "group" Arrow components in two major areas: the "core"
> libs and the components using the "core" libs.
> C++ can have its own versioning, and the rest is decoupled from each
> other but it will depend on the C++ release.
>
> I think it's do-able and probably "cleaner".
>
> Regards
> JB
>
> On Mon, Apr 8, 2024 at 3:55 PM Weston Pace  wrote:
>>
>> > Probably major versions should match between C++ and PyArrow, but I guess
>> > we could have diverging minor and patch versions. Or at least patch
>> > versions given that
>> > a new minor version is usually cut for bug fixes too.
>>
>> I believe even this would be difficult.  Stable ABIs are very finicky in
>> C++.  If the public API surface changes in any way then it can lead to
>> subtle bugs if pyarrow were to link against an older version.  I also am
>> not sure there is much advantage in trying to separate pyarrow from
>> arrow-cpp since they are almost always changing in lockstep (e.g. any
>> change to arrow-cpp enables functionality in pyarrow).
>>
>> I think we should maybe focus on a few more obvious cases.
>>
>> I think C#, JS, Java, and Go are the most obvious candidates to decouple.
>> Even then, we should probably only separate these candidates if they have
>> willing release managers.
>>
>> C/GLib, python, and ruby are all tightly coupled to C++ at the moment and
>> should not be a first priority.  I would have guessed that R is also in
>> this list but Jacob reported in the original email that they are already
>> somewhat decoupled?
>>
>> I don't know anything about swift or matlab.
>>
>> On Mon, Apr 8, 2024 at 6:23 AM Alessandro Molina
>>  wrote:
>>
>> > On Sun, Apr 7, 2024 at 3:06 PM Andrew Lamb  wrote:
>> >
>> > >
>> > > We have had separate releases / votes for Arrow Rust (and Arrow
>> > DataFusion)
>> > > and it has served us quite well. The version schemes have diverged
>> > > substantially from the monorepo (we are on version 51.0.0 in arrow-rs,
>> > for
>> > > example) and it doesn't seem to have caused any large confusion with
>> > users
>> > >
>> > >
>> > I think that versioning will require additional thinking for libraries like
>> > PyArrow, Java etc...
>> > For rust this is a non problem because there is no link to the C++ library,
>> >
>> > PyArrow instead is based on what the C++ library provides,
>> > so there is a direct link between the features provided by C++ in a
>> > specific version
>> > and the features provided in PyArrow at a specific version.
>> >
>> > More or less PyArrow 20 should have the same bug fixes that C++ 20 has,
>> > and diverging the two versions would lead to confusion easily.
>> > Probably major versions should match between C++ and PyArrow, but I guess
>> > we could have diverging minor and patch versions. Or at least patch
>> > versions given that
>> > a new minor version is usually cut for bug fixes too.
>> >


[RESULT][VOTE] Bulk ingestion support for Flight SQL (vote #2)

2024-04-08 Thread David Li
The vote passes with 3 binding, 4 non-binding +1 votes. Thanks Joel!

On Sat, Apr 6, 2024, at 17:56, Andrew Lamb wrote:
> +1
>
> On Sat, Apr 6, 2024 at 3:48 AM wish maple  wrote:
>
>> +1 (non binding)
>>
>> Best,
>> Xuwei Fu
>>
>> David Li  于2024年4月5日周五 16:38写道:
>>
>> > Hello,
>> >
>> > Joel Lubinitsky has proposed adding bulk ingestion support to Arrow
>> Flight
>> > SQL [1]. This provides a path for uploading an Arrow dataset to a Flight
>> > SQL server to create or append to a table, without having to know the
>> > specifics of the SQL or Substrait support on the server. The
>> functionality
>> > mimics similar functionality in ADBC. This is the second attempt at a
>> vote
>> > [3].
>> >
>> > Joel has provided reference implementations of this for C++ and Go at
>> [2],
>> > along with an integration test.
>> >
>> > The vote will be open for at least 72 hours.
>> >
>> > [ ] +1 Accept this proposal
>> > [ ] +0
>> > [ ] -1 Do not accept this proposal because...
>> >
>> > [1]: https://lists.apache.org/thread/mo98rsh20047xljrbfymrks8f2ngn49z
>> > [2]: https://github.com/apache/arrow/pull/38256
>> > [3]: https://lists.apache.org/thread/c8n3t0452807wm1ol1hvj41rs1vso3tp
>> >
>> > Thanks,
>> > David
>>


Re: [VOTE] Add new info codes and options keys to ADBC specification

2024-04-07 Thread David Li
+1

On Sat, Apr 6, 2024, at 22:20, Matt Topol wrote:
> +1
>
> On Sat, Apr 6, 2024, 4:54 AM Andrew Lamb  wrote:
>
>> +1
>>
>> On Fri, Apr 5, 2024 at 9:55 PM Jacob Wujciak 
>> wrote:
>>
>> > + 1 (non-binding)
>> >
>> > Am Sa., 6. Apr. 2024 um 01:57 Uhr schrieb Joel Lubinitsky <
>> > joell...@gmail.com>:
>> >
>> > > Yes, just updated both the issue and the PR.
>> > >
>> > > Thanks,
>> > > Joel
>> > >
>> > > On Fri, Apr 5, 2024 at 7:51 PM Sutou Kouhei 
>> wrote:
>> > >
>> > > > +1
>> > > >
>> > > > Could you also update the description of
>> > > > https://github.com/apache/arrow-adbc/issues/1650 ?
>> > > >
>> > > > Thanks,
>> > > > --
>> > > > kou
>> > > >
>> > > > In 
>> > > >   "Re: [VOTE] Add new info codes and options keys to ADBC
>> > specification"
>> > > > on Fri, 05 Apr 2024 15:39:33 -,
>> > > >   Joel Lubinitsky  wrote:
>> > > >
>> > > > > Update on this:
>> > > > >
>> > > > > I've removed ADBC_INFO_VENDOR_READ_ONLY from the proposal. The
>> change
>> > > is
>> > > > reflected in this commit [1] on the original PR [2]. The numbers
>> > > > corresponding to each of the other info codes have been decremented
>> by
>> > 1
>> > > to
>> > > > fill the gap in numbering.
>> > > > >
>> > > > > The reason is that a similar option already exists via
>> > > > ConnectionGet/SetOptions, so defining it on the driver isn't helpful.
>> > > > >
>> > > > > [1]:
>> > > >
>> > >
>> >
>> https://github.com/apache/arrow-adbc/pull/1649/commits/a52a4fa16e6b740392d3617751e28f044f1a8325
>> > > > > [2]: https://github.com/apache/arrow-adbc/pull/1649
>> > > > >
>> > > > > Thanks,
>> > > > > Joel
>> > > > >
>> > > > > On 2024/04/03 11:01:13 Joel Lubinitsky wrote:
>> > > > >> Hello,
>> > > > >>
>> > > > >> I would like to propose a change to the ADBC specification that
>> > > > introduces
>> > > > >> 5 new standard info codes and formalizes 3 existing option keys.
>> > > > >>
>> > > > >> The info codes being introduced are:
>> > > > >> - ADBC_INFO_VENDOR_READ_ONLY 3
>> > > > >> - ADBC_INFO_VENDOR_SQL 4
>> > > > >> - ADBC_INFO_VENDOR_SUBSTRAIT 5
>> > > > >> - ADBC_INFO_VENDOR_SUBSTRAIT_MIN_VERSION 6
>> > > > >> - ADBC_INFO_VENDOR_SUBSTRAIT_MAX_VERSION 7
>> > > > >>
>> > > > >> The option keys have been in use (defined in options.h) and are
>> > being
>> > > > moved
>> > > > >> to adbc.h:
>> > > > >> - ADBC_INGEST_OPTION_TARGET_CATALOG "adbc.ingest.target_catalog"
>> > > > >> - ADBC_INGEST_OPTION_TARGET_DB_SCHEMA
>> "adbc.ingest.target_db_schema"
>> > > > >> - ADBC_INGEST_OPTION_TEMPORARY "adbc.ingest.temporary"
>> > > > >>
>> > > > >> The change is described in this issue [0] and an implementation is
>> > > > included
>> > > > >> in this PR [1].
>> > > > >>
>> > > > >> The vote will be open for at least 72 hours.
>> > > > >>
>> > > > >> [ ] +1 Add these info codes and options keys to the ADBC spec
>> > > > >> [ ] +0
>> > > > >> [ ] -1 Do not add these to the ADBC spec because...
>> > > > >>
>> > > > >> Thanks,
>> > > > >> Joel
>> > > > >>
>> > > > >> [0]: https://github.com/apache/arrow-adbc/issues/1650
>> > > > >> [1]: https://github.com/apache/arrow-adbc/pull/1649
>> > > > >>
>> > > >
>> > >
>> >
>>
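For reference, the numbering that results from the commit above (a sketch derived from the proposal text in this thread — removing ADBC_INFO_VENDOR_READ_ONLY and decrementing the rest by 1 — not copied from adbc.h):

```python
# Proposed info codes after the renumbering described above.
# READ_ONLY (formerly 3) was dropped, so the remaining codes shift down by 1.
ADBC_INFO_VENDOR_SQL = 3
ADBC_INFO_VENDOR_SUBSTRAIT = 4
ADBC_INFO_VENDOR_SUBSTRAIT_MIN_VERSION = 5
ADBC_INFO_VENDOR_SUBSTRAIT_MAX_VERSION = 6

# A driver would report these via AdbcConnectionGetInfo; the contiguous
# numbering leaves no gap where READ_ONLY used to be.
assert [ADBC_INFO_VENDOR_SQL, ADBC_INFO_VENDOR_SUBSTRAIT,
        ADBC_INFO_VENDOR_SUBSTRAIT_MIN_VERSION,
        ADBC_INFO_VENDOR_SUBSTRAIT_MAX_VERSION] == [3, 4, 5, 6]
```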


[VOTE] Bulk ingestion support for Flight SQL (vote #2)

2024-04-05 Thread David Li
Hello,

Joel Lubinitsky has proposed adding bulk ingestion support to Arrow Flight SQL 
[1]. This provides a path for uploading an Arrow dataset to a Flight SQL server 
to create or append to a table, without having to know the specifics of the SQL 
or Substrait support on the server. The functionality mimics similar 
functionality in ADBC. This is the second attempt at a vote [3].

Joel has provided reference implementations of this for C++ and Go at [2], 
along with an integration test.

The vote will be open for at least 72 hours.

[ ] +1 Accept this proposal
[ ] +0
[ ] -1 Do not accept this proposal because...

[1]: https://lists.apache.org/thread/mo98rsh20047xljrbfymrks8f2ngn49z
[2]: https://github.com/apache/arrow/pull/38256
[3]: https://lists.apache.org/thread/c8n3t0452807wm1ol1hvj41rs1vso3tp

Thanks,
David

Re: [Format][ADBC] GetObjects semantics for system objects

2024-04-02 Thread David Li
I don't see why we should exclude them; I would also caution against treating 
the driver behavior (especially only drivers in one language) as a reference.

On Tue, Apr 2, 2024, at 23:04, Joel Lubinitsky wrote:
> Hi,
>
> The ADBC spec does not currently define whether system
> catalogs/schemas/tables (e.g. information_schema.columns, sqlite_master,
> etc) should be included in the result of ConnectionGetObjects.
>
> A survey of existing driver implementations such as sqlite and postgresql
> indicates that the current convention is to exclude system objects,
> including only objects that have been defined by the user. Shall we
> formally add this to the spec?
>
> I think this would involve just updating a comment in adbc.h and any
> related documentation. The benefit would be better consistency for drivers
> developed by vendors outside the arrow-adbc repo.
>
> For context, this came up in a PR [0] for the DuckDB ADBC implementation.
>
> Thanks,
> Joel
>
> [0]: https://github.com/duckdb/duckdb/pull/11446


Re: [RESULT][VOTE] Release Apache Arrow ADBC 0.11.0 - RC0

2024-04-02 Thread David Li
Update on tasks:

[x] Close the GitHub milestone/project
[x] Add the new release to the Apache Reporter System
[x] Upload source release artifacts to Subversion
[x] Create the final GitHub release
[x] Update website
[x] Upload wheels/sdist to PyPI
[x] Publish Maven packages
[x] Update tags for Go modules
[x] Deploy APT/Yum repositories
[ ] Update R packages
[x] Upload Ruby packages to RubyGems
[x] Upload C#/.NET packages to NuGet
[x] Update conda-forge packages
[x] Announce the new release
[x] Remove old artifacts
[IN PROGRESS] Bump versions https://github.com/apache/arrow-adbc/pull/1693
[x] Publish release blog post 

On Sun, Mar 31, 2024, at 11:18, David Li wrote:
> The vote carries with 3 binding, 3 non-binding +1 votes.
>
> I will handle the post-release tasks. @Dewey I would appreciate help 
> with CRAN as usual!
>
> [x] Close the GitHub milestone/project
> [x] Add the new release to the Apache Reporter System
> [x] Upload source release artifacts to Subversion
> [x] Create the final GitHub release
> [ ] Update website
> [ ] Upload wheels/sdist to PyPI
> [ ] Publish Maven packages
> [ ] Update tags for Go modules
> [ ] Deploy APT/Yum repositories
> [ ] Update R packages
> [ ] Upload Ruby packages to RubyGems
> [ ] Upload C#/.NET packages to NuGet
> [ ] Update conda-forge packages
> [ ] Announce the new release
> [ ] Remove old artifacts
> [ ] Bump versions
> [ ] Publish release blog post
>
> On Sat, Mar 30, 2024, at 02:33, Jean-Baptiste Onofré wrote:
>> +1 (non binding)
>>
>> Regards
>> JB
>>
>> On Thu, Mar 28, 2024 at 4:07 PM David Li  wrote:
>>>
>>> Hello,
>>>
>>> I would like to propose the following release candidate (RC0) of Apache 
>>> Arrow ADBC version 0.11.0. This is a release consisting of 36 resolved 
>>> GitHub issues [1].
>>>
>>> This release candidate is based on commit: 
>>> 3cb5825bf551ae93d0e9ed2f64be226b569b27a7 [2]
>>>
>>> The source release rc0 is hosted at [3].
>>> The binary artifacts are hosted at [4][5][6][7][8].
>>> The changelog is located at [9].
>>>
>>> Please download, verify checksums and signatures, run the unit tests, and 
>>> vote on the release. See [10] for how to validate a release candidate.
>>>
>>> See also a verification result on GitHub Actions [11].
>>>
>>> The vote will be open for at least 72 hours.
>>>
>>> [ ] +1 Release this as Apache Arrow ADBC 0.11.0
>>> [ ] +0
>>> [ ] -1 Do not release this as Apache Arrow ADBC 0.11.0 because...
>>>
>>> Note: to verify APT/YUM packages on macOS/AArch64, you must `export 
>>> DOCKER_DEFAULT_PLATFORM=linux/amd64`. (Or skip this step by `export 
>>> TEST_APT=0 TEST_YUM=0`.)
>>>
>>> [1]: 
>>> https://github.com/apache/arrow-adbc/issues?q=is%3Aissue+milestone%3A%22ADBC+Libraries+0.11.0%22+is%3Aclosed
>>> [2]: 
>>> https://github.com/apache/arrow-adbc/commit/3cb5825bf551ae93d0e9ed2f64be226b569b27a7
>>> [3]: 
>>> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-adbc-0.11.0-rc0/
>>> [4]: https://apache.jfrog.io/artifactory/arrow/almalinux-rc/
>>> [5]: https://apache.jfrog.io/artifactory/arrow/debian-rc/
>>> [6]: https://apache.jfrog.io/artifactory/arrow/ubuntu-rc/
>>> [7]: 
>>> https://repository.apache.org/content/repositories/staging/org/apache/arrow/adbc/
>>> [8]: 
>>> https://github.com/apache/arrow-adbc/releases/tag/apache-arrow-adbc-0.11.0-rc0
>>> [9]: 
>>> https://github.com/apache/arrow-adbc/blob/apache-arrow-adbc-0.11.0-rc0/CHANGELOG.md
>>> [10]: 
>>> https://arrow.apache.org/adbc/main/development/releasing.html#how-to-verify-release-candidates
>>> [11]: https://github.com/apache/arrow-adbc/actions/runs/8468352632


Re: [ANNOUNCE] New Committer Joel Lubinitsky

2024-04-01 Thread David Li
Congrats Joel!

On Tue, Apr 2, 2024, at 05:42, Weston Pace wrote:
> Congratulations Joel!
>
> On Mon, Apr 1, 2024 at 1:16 PM Bryce Mecum  wrote:
>
>> Congrats, Joel!
>>
>> On Mon, Apr 1, 2024 at 6:59 AM Matt Topol  wrote:
>> >
>> > On behalf of the Arrow PMC, I'm happy to announce that Joel Lubinitsky
>> has
>> > accepted an invitation to become a committer on Apache Arrow. Welcome,
>> and
>> > thank you for your contributions!
>> >
>> > --Matt
>>


[ANNOUNCE] Apache Arrow ADBC 0.11.0 released

2024-03-31 Thread David Li
The Apache Arrow community is pleased to announce the 0.11.0 release of the 
Apache Arrow ADBC libraries. It includes 36 resolved GitHub issues ([1]).

The release is available now from [2] and [3].

Release notes are available at: 
https://github.com/apache/arrow-adbc/blob/apache-arrow-adbc-0.11.0/CHANGELOG.md#adbc-libraries-0110-2024-03-28

What is Apache Arrow?
-
Apache Arrow is a columnar in-memory analytics layer designed to accelerate big 
data. It houses a set of canonical in-memory representations of flat and 
hierarchical data along with multiple language-bindings for structure 
manipulation. It also provides low-overhead streaming and batch messaging, 
zero-copy interprocess communication (IPC), and vectorized in-memory analytics 
libraries. Languages currently supported include C, C++, C#, Go, Java, 
JavaScript, Julia, MATLAB, Python, R, Ruby, and Rust.

What is Apache Arrow ADBC?
--
ADBC is a database access abstraction for Arrow-based applications. It provides 
a cross-language API for working with databases while using Arrow data, 
providing an alternative to APIs like JDBC and ODBC for analytical 
applications. For more, see [4].

Please report any feedback to the mailing lists ([5], [6]).

Regards,
The Apache Arrow Community

[1]: 
https://github.com/apache/arrow-adbc/issues?q=is%3Aissue+milestone%3A%22ADBC+Libraries+0.11.0%22+is%3Aclosed
[2]: https://www.apache.org/dyn/closer.cgi/arrow/apache-arrow-adbc-0.11.0 
[3]: https://apache.jfrog.io/ui/native/arrow
[4]: https://arrow.apache.org/blog/2023/01/05/introducing-arrow-adbc/
[5]: https://lists.apache.org/list.html?u...@arrow.apache.org
[6]: https://lists.apache.org/list.html?dev@arrow.apache.org


[RESULT][VOTE] Release Apache Arrow ADBC 0.11.0 - RC0

2024-03-31 Thread David Li
The vote carries with 3 binding, 3 non-binding +1 votes.

I will handle the post-release tasks. @Dewey I would appreciate help with CRAN 
as usual!

[x] Close the GitHub milestone/project
[x] Add the new release to the Apache Reporter System
[x] Upload source release artifacts to Subversion
[x] Create the final GitHub release
[ ] Update website
[ ] Upload wheels/sdist to PyPI
[ ] Publish Maven packages
[ ] Update tags for Go modules
[ ] Deploy APT/Yum repositories
[ ] Update R packages
[ ] Upload Ruby packages to RubyGems
[ ] Upload C#/.NET packages to NuGet
[ ] Update conda-forge packages
[ ] Announce the new release
[ ] Remove old artifacts
[ ] Bump versions
[ ] Publish release blog post

On Sat, Mar 30, 2024, at 02:33, Jean-Baptiste Onofré wrote:
> +1 (non binding)
>
> Regards
> JB
>
> On Thu, Mar 28, 2024 at 4:07 PM David Li  wrote:
>>
>> Hello,
>>
>> I would like to propose the following release candidate (RC0) of Apache 
>> Arrow ADBC version 0.11.0. This is a release consisting of 36 resolved 
>> GitHub issues [1].
>>
>> This release candidate is based on commit: 
>> 3cb5825bf551ae93d0e9ed2f64be226b569b27a7 [2]
>>
>> The source release rc0 is hosted at [3].
>> The binary artifacts are hosted at [4][5][6][7][8].
>> The changelog is located at [9].
>>
>> Please download, verify checksums and signatures, run the unit tests, and 
>> vote on the release. See [10] for how to validate a release candidate.
>>
>> See also a verification result on GitHub Actions [11].
>>
>> The vote will be open for at least 72 hours.
>>
>> [ ] +1 Release this as Apache Arrow ADBC 0.11.0
>> [ ] +0
>> [ ] -1 Do not release this as Apache Arrow ADBC 0.11.0 because...
>>
>> Note: to verify APT/YUM packages on macOS/AArch64, you must `export 
>> DOCKER_DEFAULT_PLATFORM=linux/amd64`. (Or skip this step by `export 
>> TEST_APT=0 TEST_YUM=0`.)
>>
>> [1]: 
>> https://github.com/apache/arrow-adbc/issues?q=is%3Aissue+milestone%3A%22ADBC+Libraries+0.11.0%22+is%3Aclosed
>> [2]: 
>> https://github.com/apache/arrow-adbc/commit/3cb5825bf551ae93d0e9ed2f64be226b569b27a7
>> [3]: 
>> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-adbc-0.11.0-rc0/
>> [4]: https://apache.jfrog.io/artifactory/arrow/almalinux-rc/
>> [5]: https://apache.jfrog.io/artifactory/arrow/debian-rc/
>> [6]: https://apache.jfrog.io/artifactory/arrow/ubuntu-rc/
>> [7]: 
>> https://repository.apache.org/content/repositories/staging/org/apache/arrow/adbc/
>> [8]: 
>> https://github.com/apache/arrow-adbc/releases/tag/apache-arrow-adbc-0.11.0-rc0
>> [9]: 
>> https://github.com/apache/arrow-adbc/blob/apache-arrow-adbc-0.11.0-rc0/CHANGELOG.md
>> [10]: 
>> https://arrow.apache.org/adbc/main/development/releasing.html#how-to-verify-release-candidates
>> [11]: https://github.com/apache/arrow-adbc/actions/runs/8468352632


Re: [VOTE] Protocol for Dissociated Arrow IPC Transports

2024-03-31 Thread David Li
w this was being considered for Flight.
>> This proposal, as I understand it, should make it possible for cloud
>> servers to support a cloud fetch style API.  From the discussion I got the
>> impression that this cloud fetch approach is useful and generally
>> applicable.
>>
>> So a big +1 for the idea of disassociated transports but I'm not sure why
>> we need a vote to start working on it (but I'm not opposed if a vote helps)
>>
>> [1]
>>
>> https://www.databricks.com/blog/2021/08/11/how-we-achieved-high-bandwidth-connectivity-with-bi-tools.html
>>
>> On Thu, Mar 28, 2024 at 1:04 PM Matt Topol  wrote:
>>
>> > I'll keep this new vote open for at least the next 72 hours. As before
>> > please reply with:
>> >
>> > [ ] +1 Accept this Proposal
>> > [ ] +0
>> > [ ] -1 Do not accept this proposal because...
>> >
>> > Thanks everyone!
>> >
>> > On Wed, Mar 27, 2024 at 7:51 PM Benjamin Kietzman 
>> > wrote:
>> >
>> > > +1
>> > >
>> > > On Tue, Mar 26, 2024, 18:36 Matt Topol  wrote:
>> > >
>> > > > Should I start a new thread for a new vote? Or repeat the original
>> vote
>> > > > email here?
>> > > >
>> > > > Just asking since there hasn't been any responses so far.
>> > > >
>> > > > --Matt
>> > > >
>> > > > On Thu, Mar 21, 2024 at 11:46 AM Matt Topol 
>> > > > wrote:
>> > > >
>> > > > > Absolutely, it will be marked experimental until we see some people
>> > > using
>> > > > > it and can get more real-world feedback.
>> > > > >
>> > > > > There's also already a couple things that will be followed-up on
>> > after
>> > > > the
>> > > > > initial adoption for expansion which were discussed in the
>> comments.
>> > > > >
>> > > > > On Thu, Mar 21, 2024, 11:42 AM David Li 
>> wrote:
>> > > > >
>> > > > >> I think let's try again. Would it be reasonable to declare this
>> > > > >> 'experimental' for the time being, just as we did with
>> Flight/Flight
>> > > > >> SQL/etc?
>> > > > >>
>> > > > >> On Tue, Mar 19, 2024, at 15:24, Matt Topol wrote:
>> > > > >> > Hey All, It's been another month and we've gotten a whole bunch
>> of
>> > > > >> feedback
>> > > > >> > and engagement on the document from a variety of individuals.
>> > Myself
>> > > > >> and a
>> > > > >> > few others have proactively attempted to reach out to as many
>> > third
>> > > > >> parties
>> > > > >> > as we could, hoping to pull more engagement also. While it would
>> > be
>> > > > >> great
>> > > > >> > to get even more feedback, the comments have slowed down and we
>> > > > haven't
>> > > > >> > gotten anything in a few days at this point.
>> > > > >> >
>> > > > >> > If there's no objections, I'd like to try to open up for voting
>> > > again
>> > > > to
>> > > > >> > officially adopt this as a protocol to add to our docs.
>> > > > >> >
>> > > > >> > Thanks all!
>> > > > >> >
>> > > > >> > --Matt
>> > > > >> >
>> > > > >> > On Sat, Mar 2, 2024 at 6:43 PM Paul Whalen 
>> > > > wrote:
>> > > > >> >
>> > > > >> >> Agreed that it makes sense not to focus on in-place updating
>> for
>> > > this
>> > > > >> >> proposal.  I’m not even sure it’s a great fit as a “general
>> > > purpose”
>> > > > >> Arrow
>> > > > >> >> protocol, because of all the assumptions and restrictions
>> > required
>> > > as
>> > > > >> you
>> > > > >> >> noted.
>> > > > >> >>
>> > > > >> >> I took another look at the proposal and don’t think there’s
>> > > anything
>> > > > >> >> preventing in-place updating in the future - ultimately the
>> data
>> > 

[VOTE] Release Apache Arrow ADBC 0.11.0 - RC0

2024-03-28 Thread David Li
Hello,

I would like to propose the following release candidate (RC0) of Apache Arrow 
ADBC version 0.11.0. This is a release consisting of 36 resolved GitHub issues 
[1].

This release candidate is based on commit: 
3cb5825bf551ae93d0e9ed2f64be226b569b27a7 [2]

The source release rc0 is hosted at [3].
The binary artifacts are hosted at [4][5][6][7][8].
The changelog is located at [9].

Please download, verify checksums and signatures, run the unit tests, and vote 
on the release. See [10] for how to validate a release candidate.

See also a verification result on GitHub Actions [11].

The vote will be open for at least 72 hours.

[ ] +1 Release this as Apache Arrow ADBC 0.11.0
[ ] +0
[ ] -1 Do not release this as Apache Arrow ADBC 0.11.0 because...

Note: to verify APT/YUM packages on macOS/AArch64, you must `export 
DOCKER_DEFAULT_PLATFORM=linux/amd64`. (Or skip this step by `export TEST_APT=0 
TEST_YUM=0`.)

[1]: 
https://github.com/apache/arrow-adbc/issues?q=is%3Aissue+milestone%3A%22ADBC+Libraries+0.11.0%22+is%3Aclosed
[2]: 
https://github.com/apache/arrow-adbc/commit/3cb5825bf551ae93d0e9ed2f64be226b569b27a7
[3]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-adbc-0.11.0-rc0/
[4]: https://apache.jfrog.io/artifactory/arrow/almalinux-rc/
[5]: https://apache.jfrog.io/artifactory/arrow/debian-rc/
[6]: https://apache.jfrog.io/artifactory/arrow/ubuntu-rc/
[7]: 
https://repository.apache.org/content/repositories/staging/org/apache/arrow/adbc/
[8]: 
https://github.com/apache/arrow-adbc/releases/tag/apache-arrow-adbc-0.11.0-rc0
[9]: 
https://github.com/apache/arrow-adbc/blob/apache-arrow-adbc-0.11.0-rc0/CHANGELOG.md
[10]: 
https://arrow.apache.org/adbc/main/development/releasing.html#how-to-verify-release-candidates
[11]: https://github.com/apache/arrow-adbc/actions/runs/8468352632


Re: [DISCUSS] Flight SQL Experimental status in docs

2024-03-21 Thread David Li
There's a ticket for it already if anyone would like to help

On Thu, Mar 21, 2024, at 12:02, James Duong wrote:
> Hi,
>
> We fairly recently merged a PR to remove the experimental status for 
> Flight SQL from maven: https://github.com/apache/arrow/pull/39040
>
> However our public documentation still has a warning saying Flight SQL 
> is experimental:
> https://arrow.apache.org/docs/format/FlightSql.html
>
> “Flight SQL is experimental and changes to the protocol may still be made.”
>
> Should we rephrase this? This being a warning and being experimental 
> can lead potential users into thinking the protocol is unstable.
>
> Perhaps take it out of the warning note and say:
>
> “Flight SQL is continuing to evolve with new functionality. However 
> clients only need to be updated to take advantage of new features and a 
> Flight SQL version update will not break existing clients”. Perhaps 
> there’s better wording to indicate that Flight SQL won’t get breaking 
> changes.


Re: [VOTE] Stateless prepared statements in FlightSQL

2024-03-21 Thread David Li
+1 

Thank you Adam!

On Thu, Mar 21, 2024, at 10:07, Andrew Lamb wrote:
> +1 (binding)
>
> I reviewed the spec proposal and the rust implementation and I think they
> look good to go. I am not as confident on the golang implementation, but
> the comments on the Go PR look like there are no objections.
>
> Thank you for your work driving this forward
>
> Andrew
>
> On Wed, Mar 20, 2024 at 10:04 PM Adam C  wrote:
>
>> Hello,
>>
>> I would like to propose a change to the FlightSQL specification as
>> originally described in this Github issue [1] by Andrew Lamb. The
>> specification change would allow servers to support prepared
>> statements with parameters, without needing to manage state between
>> client requests.
>>
>> There is a draft pull request [2] submitted by me that updates the
>> protobuf format and documents the changes made in the FlightSQL
>> specification. I have also created draft reference implementations for
>> the Go [3] and Rust [4] FlightSQL libraries. These implementations
>> will be submitted as pull requests once the proposal is officially
>> adopted.
>>
>> The vote will be open for at least 72 hours.
>>
>> [ ] +1 Accept this proposal
>> [ ] +0
>> [ ] -1 Do not accept this proposal because...
>>
>> Cheers,
>> Adam Curtis
>>
>> [1]:
>> https://github.com/apache/arrow/issues/37720
>> [2]:
>> https://github.com/apache/arrow/pull/40243
>> [3]:
>> https://github.com/apache/arrow/pull/40311
>> [4]:
>> https://github.com/apache/arrow-rs/pull/5433
>>


Re: [VOTE] Protocol for Dissociated Arrow IPC Transports

2024-03-21 Thread David Li
I think let's try again. Would it be reasonable to declare this 'experimental' 
for the time being, just as we did with Flight/Flight SQL/etc?

On Tue, Mar 19, 2024, at 15:24, Matt Topol wrote:
> Hey All, It's been another month and we've gotten a whole bunch of feedback
> and engagement on the document from a variety of individuals. Myself and a
> few others have proactively attempted to reach out to as many third parties
> as we could, hoping to pull more engagement also. While it would be great
> to get even more feedback, the comments have slowed down and we haven't
> gotten anything in a few days at this point.
>
> If there's no objections, I'd like to try to open up for voting again to
> officially adopt this as a protocol to add to our docs.
>
> Thanks all!
>
> --Matt
>
> On Sat, Mar 2, 2024 at 6:43 PM Paul Whalen  wrote:
>
>> Agreed that it makes sense not to focus on in-place updating for this
>> proposal.  I’m not even sure it’s a great fit as a “general purpose” Arrow
>> protocol, because of all the assumptions and restrictions required as you
>> noted.
>>
>> I took another look at the proposal and don’t think there’s anything
>> preventing in-place updating in the future - ultimately the data body could
>> just be in the same location for subsequent messages.
>>
>> Thanks!
>> Paul
>>
>> On Fri, Mar 1, 2024 at 5:28 PM Matt Topol  wrote:
>>
>> > > @pgwhalen: As a potential "end user developer," (and aspiring
>> > contributor) this
>> > immediately excited me when I first saw it.
>> >
>> > Yay! Good to hear that!
>> >
>> > > @pgwhalen: And it wasn't clear to me whether updating batches in
>> > place (and the producer/consumer coordination that comes with that) was
>> > supported or encouraged as part of the proposal.
>> >
>> > So, updating batches in place was not a particular use-case we were
>> > targeting with this approach. Instead using shared memory to produce and
>> > consume the buffers/batches without having to physically copy the data.
>> > Trying to update a batch in place is a dangerous prospect for a number of
>> > reasons, but as you've mentioned it can technically be made safe if the
>> > shape is staying the same and you're only modifying fixed-width data
>> types
>> > (i.e. not only is the *shape* unchanged but the sizes of the underlying
>> > data buffers are also remaining unchanged). The producer/consumer
>> > coordination that would be needed for updating batches in place is not
>> part
>> > of this proposal but is definitely something we can look into as a
>> > follow-up to this for extending it. There's a number of discussions that
>> > would need to be had around that so I don't want to add on another
>> > complexity to this already complex proposal.
>> >
>> > That said, if you or anyone see something in this proposal that would
>> > hinder or prevent being able to use it for your use case please let me
>> know
>> > so we can address it. Even though the proposal as it currently exists
>> > doesn't fully support the in-place updating of batches, I don't want to
>> > make things harder for us in such a follow-up where we'd end up requiring
>> > an entirely new protocol to support that.
>> >
>> > > @octalene.dev: I know of a third party that is interested in Arrow for
>> > HPC environments that could be interested in the proposal and I can see
>> if
>> > they're interested in providing feedback.
>> >
>> > Awesome! Thanks much!
>> >
>> >
>> > For reference to anyone who hasn't looked at the document in a while,
>> since
>> > the original discussion thread on this I have added a full "Background
>> > Context" page to the beginning of the proposal to help anyone who isn't
>> > already familiar with the issues this protocol is trying to solve or
>> isn't
>> > already familiar with ucx or libfabric transports to better understand
>> > *why* I'm
>> > proposing this and what it is trying to solve. The point of this
>> background
>> > information is to help ensure that anyone who might have thoughts on
>> > protocols in general or APIs should still be able to understand the base
>> > reasons and goals that we're trying to achieve with this protocol
>> proposal.
>> > You don't need to already understand managing GPU/device memory or ucx to
>> > be able to have meaningful input on the document.
>> >
>> > Thanks again to all who have contributed so far and please spread to any
>> > contacts that you think might be interested in this for their particular
>> > use cases.
>> >
>> > --Matt
>> >
>> > On Wed, Feb 28, 2024 at 1:39 AM Aldrin 
>> wrote:
>> >
>> > > I am interested in this as well, but I haven't gotten to a point where
>> I
>> > > can have valuable input (I haven't tried other transports). I know of a
>> > > third party that is interested in Arrow for HPC environments that could
>> > be
>> > > interested in the proposal and I can see if they're interested in
>> > providing
>> > > feedback.
>> > >
>> > > I glanced at the document before but I'll go through again to see if
>> > 

Re: [ANNOUNCE] New Arrow committer: Bryce Mecum

2024-03-18 Thread David Li
Congrats Bryce!

On Mon, Mar 18, 2024, at 08:52, Ian Cook wrote:
> Congratulations Bryce!
>
> Ian
>
> On Sun, Mar 17, 2024 at 22:24 Nic Crane  wrote:
>
>> On behalf of the Arrow PMC, I'm happy to announce that Bryce Mecum has
>> accepted an invitation to become a committer on Apache Arrow. Welcome, and
>> thank you for your contributions!
>>
>> Nic
>>


Re: [VOTE] Release Apache Arrow 15.0.2 - RC3

2024-03-16 Thread David Li
+1

tested on Debian Linux/Conda/x86_64

On Sat, Mar 16, 2024, at 13:43, Ruoxi Sun wrote:
> +1 (non-binding)
>
> On my Intel Mac, OS version Sonoma 14.2.1 (23C71).
>
> TEST_DEFAULT=0 TEST_GO=1 TEST_CPP=1 ./verify-release-candidate.sh 15.0.2 3
>
> I also tried to verify Python
>
> TEST_DEFAULT=0 TEST_PYTHON=1 ./verify-release-candidate.sh 15.0.2 3
>
> It succeeded except for [1] (as for 15.0.0 and 15.0.1), which should be
> trivial.
>
> [1] https://github.com/apache/arrow/issues/39679
>
> *Regards,*
> *Rossi SUN*
>
>
> Sutou Kouhei wrote on Friday, March 15, 2024 at 18:56:
>
>> +1
>>
>> I ran the followings on Debian GNU/Linux sid:
>>
>>   * TEST_DEFAULT=0 \
>>   TEST_SOURCE=1 \
>>   LANG=C \
>>   TZ=UTC \
>>   CUDAToolkit_ROOT=/usr \
>>   ARROW_CMAKE_OPTIONS="-DBoost_NO_BOOST_CMAKE=ON
>> -Dxsimd_SOURCE=BUNDLED" \
>>   dev/release/verify-release-candidate.sh 15.0.2 3
>>
>>   * TEST_DEFAULT=0 \
>>   TEST_APT=1 \
>>   LANG=C \
>>   dev/release/verify-release-candidate.sh 15.0.2 3
>>
>>   * TEST_DEFAULT=0 \
>>   TEST_BINARY=1 \
>>   LANG=C \
>>   dev/release/verify-release-candidate.sh 15.0.2 3
>>
>>   * TEST_DEFAULT=0 \
>>   TEST_JARS=1 \
>>   LANG=C \
>>   dev/release/verify-release-candidate.sh 15.0.2 3
>>
>>   * TEST_DEFAULT=0 \
>>   TEST_PYTHON_VERSIONS=3.11 \
>>   TEST_WHEELS=1 \
>>   LANG=C \
>>   dev/release/verify-release-candidate.sh 15.0.2 3
>>
>>   * TEST_DEFAULT=0 \
>>   TEST_YUM=1 \
>>   LANG=C \
>>   dev/release/verify-release-candidate.sh 15.0.2 3
>>
>> with:
>>
>>   * .NET SDK (7.0.406)
>>   * Python 3.11.8
>>   * gcc (Debian 13.2.0-13) 13.2.0
>>   * nvidia-cuda-dev 12.0.146~12.0.1-4
>>   * openjdk version "17.0.10" 2024-01-16
>>   * ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux-gnu]
>>
>>
>> Thanks,
>> --
>> kou
>>
>>
>> In 
>>   "[VOTE] Release Apache Arrow 15.0.2 - RC3" on Thu, 14 Mar 2024 12:22:13
>> +0100,
>>   Raúl Cumplido  wrote:
>>
>> > Hi,
>> >
>> > I would like to propose the following release candidate (RC3) of Apache
>> > Arrow version 15.0.2. This is a release consisting of 8
>> > resolved GitHub issues[1].
>> >
>> > This release candidate is based on commit:
>> > e03105efc38edca4ca429bf967a17b4d0fbebe40 [2]
>> >
>> > The source release rc3 is hosted at [3].
>> > The binary artifacts are hosted at [4][5][6][7][8][9][10][11].
>> > The changelog is located at [12].
>> >
>> > Please download, verify checksums and signatures, run the unit tests,
>> > and vote on the release. See [13] for how to validate a release
>> candidate.
>> >
>> > See also a verification result on GitHub pull request [14].
>> >
>> > The vote will be open for at least 72 hours.
>> >
>> > [ ] +1 Release this as Apache Arrow 15.0.2
>> > [ ] +0
>> > [ ] -1 Do not release this as Apache Arrow 15.0.2 because...
>> >
>> > [1]:
>> https://github.com/apache/arrow/issues?q=is%3Aissue+milestone%3A15.0.2+is%3Aclosed
>> > [2]:
>> https://github.com/apache/arrow/tree/e03105efc38edca4ca429bf967a17b4d0fbebe40
>> > [3]:
>> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-15.0.2-rc3
>> > [4]: https://apache.jfrog.io/artifactory/arrow/almalinux-rc/
>> > [5]: https://apache.jfrog.io/artifactory/arrow/amazon-linux-rc/
>> > [6]: https://apache.jfrog.io/artifactory/arrow/centos-rc/
>> > [7]: https://apache.jfrog.io/artifactory/arrow/debian-rc/
>> > [8]: https://apache.jfrog.io/artifactory/arrow/java-rc/15.0.2-rc3
>> > [9]: https://apache.jfrog.io/artifactory/arrow/nuget-rc/15.0.2-rc3
>> > [10]: https://apache.jfrog.io/artifactory/arrow/python-rc/15.0.2-rc3
>> > [11]: https://apache.jfrog.io/artifactory/arrow/ubuntu-rc/
>> > [12]:
>> https://github.com/apache/arrow/blob/e03105efc38edca4ca429bf967a17b4d0fbebe40/CHANGELOG.md
>> > [13]:
>> https://cwiki.apache.org/confluence/display/ARROW/How+to+Verify+Release+Candidates
>> > [14]: https://github.com/apache/arrow/pull/40504
>>


Re: [VOTE] Release Apache Arrow 15.0.1 - RC0

2024-03-05 Thread David Li
+1

Tested on Debian 12/x86_64 with Conda

I had some issues with APT/YUM packages but figure it's due to my local setup

On Tue, Mar 5, 2024, at 01:46, Jean-Baptiste Onofré wrote:
> +1 (non binding)
>
> I checked:
> - signature and hashes look good
> - build OK
> - ASF headers present
> - No problematic binaries found in the source distribution
> - Tested on MacOS M3
>
> Regards
> JB
>
> On Mon, Mar 4, 2024 at 10:05 AM Raúl Cumplido  wrote:
>>
>> Hi,
>>
>> I would like to propose the following release candidate (RC0) of Apache
>> Arrow version 15.0.1. This is a release consisting of 37
>> resolved GitHub issues[1].
>>
>> This release candidate is based on commit:
>> 5ce6ff434c1e7daaa2d7f134349f3ce4c22683da [2]
>>
>> The source release rc0 is hosted at [3].
>> The binary artifacts are hosted at [4][5][6][7][8][9][10][11].
>> The changelog is located at [12].
>>
>> Please download, verify checksums and signatures, run the unit tests,
>> and vote on the release. See [13] for how to validate a release candidate.
>>
>> See also a verification result on GitHub pull request [14].
>>
>> The vote will be open for at least 72 hours.
>>
>> [ ] +1 Release this as Apache Arrow 15.0.1
>> [ ] +0
>> [ ] -1 Do not release this as Apache Arrow 15.0.1 because...
>>
>> [1]: 
>> https://github.com/apache/arrow/issues?q=is%3Aissue+milestone%3A15.0.1+is%3Aclosed
>> [2]: 
>> https://github.com/apache/arrow/tree/5ce6ff434c1e7daaa2d7f134349f3ce4c22683da
>> [3]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-15.0.1-rc0
>> [4]: https://apache.jfrog.io/artifactory/arrow/almalinux-rc/
>> [5]: https://apache.jfrog.io/artifactory/arrow/amazon-linux-rc/
>> [6]: https://apache.jfrog.io/artifactory/arrow/centos-rc/
>> [7]: https://apache.jfrog.io/artifactory/arrow/debian-rc/
>> [8]: https://apache.jfrog.io/artifactory/arrow/java-rc/15.0.1-rc0
>> [9]: https://apache.jfrog.io/artifactory/arrow/nuget-rc/15.0.1-rc0
>> [10]: https://apache.jfrog.io/artifactory/arrow/python-rc/15.0.1-rc0
>> [11]: https://apache.jfrog.io/artifactory/arrow/ubuntu-rc/
>> [12]: 
>> https://github.com/apache/arrow/blob/5ce6ff434c1e7daaa2d7f134349f3ce4c22683da/CHANGELOG.md
>> [13]: 
>> https://cwiki.apache.org/confluence/display/ARROW/How+to+Verify+Release+Candidates
>> [14]: https://github.com/apache/arrow/pull/40211


[RESULT][VOTE] Flight RPC: add 'fallback' URI scheme

2024-03-04 Thread David Li
With 3 binding, 1 non-binding +1 votes the proposal passes. 

I will clean up the PR and move it out of draft.

On Fri, Mar 1, 2024, at 16:40, Sutou Kouhei wrote:
> +1
>
> In <421fbc7b-f441-4b0a-8626-a8d2dfff0...@app.fastmail.com>
>   "[VOTE] Flight RPC: add 'fallback' URI scheme" on Tue, 27 Feb 2024 
> 09:01:36 -0500,
>   "David Li"  wrote:
>
>> I would like to propose a 'reuse connection' URI scheme for Flight RPC. This 
>> proposal was previously discussed at [1]. A candidate implementation for 
>> C++, Java, and Go is at [2].
>> 
>> The vote will be open for at least 72 hours.
>> 
>> [ ] +1 
>> [ ] +0
>> [ ] -1 Do not accept this proposal because...
>> 
>> [1]: https://lists.apache.org/thread/pc9fs0hf8t5ylj9os00r9vg8d2xv2npz
>> [2]: https://github.com/apache/arrow/pull/40084
>> 
>> On Tue, Feb 20, 2024, at 14:14, David Li wrote:
>>> Thanks for the comments - I've updated the implementation [1] and added 
>>> Go + integration tests. If this all checks out I'd like to start a vote 
>>> soon.
>>>
>>> [1]: https://github.com/apache/arrow/pull/40084
>>>
>>> On Fri, Feb 16, 2024, at 13:43, Andrew Lamb wrote:
>>>> Thank you -- I think the usecase is great, but agree with the other
>>>> reviewers that the name may be confusing. I left some notes on the ticket
>>>>
>>>> Andrew
>>>>
>>>> On Wed, Feb 14, 2024 at 3:52 PM David Li  wrote:
>>>>
>>>>> I've put up a candidate implementation sans integration test [1].
>>>>>
>>>>> Some caveats:
>>>>> - java.net.URI doesn't accept 'scheme://', only 'scheme:/' or 'scheme://?'
>>>>> (yes, an empty query string pacifies it). I've chosen the latter since the
>>>>> former is technically a URI with a non-empty path but neither are ideal.
>>>>> - I've changed the scheme to 'arrow-flight-reuse-connection' to be more
>>>>> faithful to the intended use than 'fallback'.
>>>>>
>>>>> [1]: https://github.com/apache/arrow/pull/40084
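
The java.net.URI quirk described above is easy to reproduce; a minimal sketch (the class name is illustrative, not part of the proposal):

```java
import java.net.URI;
import java.net.URISyntaxException;

// Illustrative check of the java.net.URI parsing quirk noted above.
public class ReuseConnectionUriCheck {
    public static void main(String[] args) throws URISyntaxException {
        // 'scheme://' alone is rejected: the parser expects an authority.
        try {
            new URI("arrow-flight-reuse-connection://");
            System.out.println("bare '//' accepted");
        } catch (URISyntaxException e) {
            System.out.println("bare '//' rejected: " + e.getReason());
        }

        // An empty query string pacifies the parser ('scheme://?'),
        // as does a single slash ('scheme:/').
        URI withQuery = new URI("arrow-flight-reuse-connection://?");
        URI withPath = new URI("arrow-flight-reuse-connection:/");
        System.out.println("scheme: " + withQuery.getScheme());
        System.out.println("path form: " + withPath);
    }
}
```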
>>>>>
>>>>> On Tue, Feb 13, 2024, at 13:01, Jean-Baptiste Onofré wrote:
>>>>> > Hi David,
>>>>> >
>>>>> > It's reasonable. I think we can start with your initial proposal (it
>>>>> > sounds fine to me) and we can always improve step by step.
>>>>> >
>>>>> > Thanks !
>>>>> > Regards
>>>>> > JB
>>>>> >
>>>>> > On Tue, Feb 13, 2024 at 4:53 PM David Li  wrote:
>>>>> >>
>>>>> >> I'm going to keep the proposal as-is then. It can be extended if this
>>>>> use case comes up.
>>>>> >>
>>>>> >> I'll start work on candidate implementations now.
>>>>> >>
>>>>> >> On Tue, Feb 13, 2024, at 03:22, Antoine Pitrou wrote:
>>>>> >> > I think the original proposal is sufficient.
>>>>> >> >
>>>>> >> > Also, it is not obvious to me how one would switch from e.g. grpc+tls
>>>>> to
>>>>> >> > http without an explicit server location (unless both Flight servers
>>>>> are
>>>>> >> > hosted under the same port?). So the "+" proposal seems a bit weird.
>>>>> >> >
>>>>> >> >
>>>>> >> >> > On 12/02/2024 at 23:39, David Li wrote:
>>>>> >> >> The idea is that the client would reuse the existing connection, in
>>>>> which case the protocol and such are implicit. (If the client doesn't have
>>>>> a connection anymore, it can't use the fallback anyways.)
>>>>> >> >>
>>>>> >> >> I suppose this has the advantage that you could "fall back" to a
>>>>> known hostname with a different protocol, but I'm not sure that always
>>>>> applies anyways. (Correct me if I'm wrong Matt, but as I recall, UCX
>>>>> addresses aren't hostnames but rather opaque byte blobs, for instance.)
>>>>> >> >>
>>>>> >> >> If we do prefer this, to avoid overloading the hostname, there's
>>>>> also the informal convention of using + in the scheme, so it could be
>>>>> arrow-flight-fallback+grpc+tls://, arrow-flight-fallback+http://, etc.
>>>>> >

Re: [DISCUSS] [FlightSQL] Supporting parameters and prepared statements with a stateless server

2024-03-02 Thread David Li
The proposal looks good to me. Thanks!

On Fri, Mar 1, 2024, at 20:11, Adam Curtis wrote:
> Hello,
>
> We would like to support prepared statements with bind parameters with
> a stateless service. This was discussed previously on the mailing list
> [1]. The original ticket outlining the proposed design can be found
> here [2].
>
> I have prepared a specific proposal and would like feedback in
> preparation of calling for a formal vote. Here is a PR with the
> proposed spec change [3].
>
> Here are PRs with implementations in two languages: Go [4] and Rust [5].
>
> Please let me know your thoughts.
>
> [1]:
> https://lists.apache.org/thread/f0xb61z4yw611rw0v8vf9rht0qtq8opc
> [2]:
> https://github.com/apache/arrow/issues/37720
> [3]:
> https://github.com/apache/arrow/pull/40243
> [4]:
> https://github.com/apache/arrow/pull/40311
> [5]:
> https://github.com/apache/arrow-rs/pull/5433


Re: [VOTE] Flight RPC: add 'fallback' URI scheme

2024-03-01 Thread David Li
My vote: +1

On Wed, Feb 28, 2024, at 15:50, Joel Lubinitsky wrote:
> +1
>
> On Wed, Feb 28, 2024 at 3:22 PM Andrew Lamb  wrote:
>
>> +1
>>
>>
>> On Tue, Feb 27, 2024 at 9:06 AM David Li  wrote:
>>
>> > I would like to propose a 'reuse connection' URI scheme for Flight RPC.
>> > This proposal was previously discussed at [1]. A candidate implementation
>> > for C++, Java, and Go is at [2].
>> >
>> > The vote will be open for at least 72 hours.
>> >
>> > [ ] +1
>> > [ ] +0
>> > [ ] -1 Do not accept this proposal because...
>> >
>> > [1]: https://lists.apache.org/thread/pc9fs0hf8t5ylj9os00r9vg8d2xv2npz
>> > [2]: https://github.com/apache/arrow/pull/40084
>> >
>> > On Tue, Feb 20, 2024, at 14:14, David Li wrote:
>> > > Thanks for the comments - I've updated the implementation [1] and added
>> > > Go + integration tests. If this all checks out I'd like to start a vote
>> > > soon.
>> > >
>> > > [1]: https://github.com/apache/arrow/pull/40084
>> > >
>> > > On Fri, Feb 16, 2024, at 13:43, Andrew Lamb wrote:
>> > >> Thank you -- I think the usecase is great, but agree with the other
>> > >> reviewers that the name may be confusing. I left some notes on the
>> > ticket
>> > >>
>> > >> Andrew
>> > >>
>> > >> On Wed, Feb 14, 2024 at 3:52 PM David Li  wrote:
>> > >>
>> > >>> I've put up a candidate implementation sans integration test [1].
>> > >>>
>> > >>> Some caveats:
>> > >>> - java.net.URI doesn't accept 'scheme://', only 'scheme:/' or
>> > 'scheme://?'
>> > >>> (yes, an empty query string pacifies it). I've chosen the latter
>> since
>> > the
>> > >>> former is technically a URI with a non-empty path but neither are
>> > ideal.
>> > >>> - I've changed the scheme to 'arrow-flight-reuse-connection' to be
>> more
>> > >>> faithful to the intended use than 'fallback'.
>> > >>>
>> > >>> [1]: https://github.com/apache/arrow/pull/40084
>> > >>>
>> > >>> On Tue, Feb 13, 2024, at 13:01, Jean-Baptiste Onofré wrote:
>> > >>> > Hi David,
>> > >>> >
>> > >>> > It's reasonable. I think we can start with your initial proposal
>> (it
>> > >>> > sounds fine to me) and we can always improve step by step.
>> > >>> >
>> > >>> > Thanks !
>> > >>> > Regards
>> > >>> > JB
>> > >>> >
>> > >>> > On Tue, Feb 13, 2024 at 4:53 PM David Li 
>> > wrote:
>> > >>> >>
>> > >>> >> I'm going to keep the proposal as-is then. It can be extended if
>> > this
>> > >>> use case comes up.
>> > >>> >>
>> > >>> >> I'll start work on candidate implementations now.
>> > >>> >>
>> > >>> >> On Tue, Feb 13, 2024, at 03:22, Antoine Pitrou wrote:
>> > >>> >> > I think the original proposal is sufficient.
>> > >>> >> >
>> > >>> >> > Also, it is not obvious to me how one would switch from e.g.
>> > grpc+tls
>> > >>> to
>> > >>> >> > http without an explicit server location (unless both Flight
>> > servers
>> > >>> are
>> > >>> >> > hosted under the same port?). So the "+" proposal seems a bit
>> > weird.
>> > >>> >> >
>> > >>> >> >
>> > >>> >> > On 12/02/2024 at 23:39, David Li wrote:
>> > >>> >> >> The idea is that the client would reuse the existing
>> connection,
>> > in
>> > >>> which case the protocol and such are implicit. (If the client doesn't
>> > have
>> > >>> a connection anymore, it can't use the fallback anyways.)
>> > >>> >> >>
>> > >>> >> >> I suppose this has the advantage that you could "fall back" to
>> a
>> > >>> known hostname with a different protocol, but I'm not sure that
>> always
>>> applies anyways. (Correct me if I'm wrong Matt, but as I recall, UCX
>>> addresses aren't hostnames but rather opaque byte blobs, for instance.)

Re: [VOTE] Move Arrow DataFusion Subproject to new Top Level Apache Project

2024-03-01 Thread David Li
+1

On Fri, Mar 1, 2024, at 12:06, Jorge Cardoso Leitão wrote:
> +1 - great work!!!
>
> On Fri, Mar 1, 2024 at 5:49 PM Micah Kornfield 
> wrote:
>
>> +1 (binding)
>>
>> On Friday, March 1, 2024, Uwe L. Korn  wrote:
>>
>> > +1 (binding)
>> >
>> > On Fri, Mar 1, 2024, at 2:37 PM, Andy Grove wrote:
>> > > +1 (binding)
>> > >
>> > > On Fri, Mar 1, 2024 at 6:20 AM Weston Pace 
>> > wrote:
>> > >
>> > >> +1 (binding)
>> > >>
>> > >> On Fri, Mar 1, 2024 at 3:33 AM Andrew Lamb 
>> > wrote:
>> > >>
>> > >> > Hello,
>> > >> >
>> > >> > As we have discussed[1][2] I would like to vote on the proposal to
>> > >> > create a new Apache Top Level Project for DataFusion. The text of
>> the
>> > >> > proposed resolution and background document is copy/pasted below
>> > >> >
>> > >> > If the community is in favor of this, we plan to submit the
>> resolution
>> > >> > to the ASF board for approval with the next Arrow report (for the
>> > >> > April 2024 board meeting).
>> > >> >
>> > >> > The vote will be open for at least 7 days.
>> > >> >
>> > >> > [ ] +1 Accept this Proposal
>> > >> > [ ] +0
>> > >> > [ ] -1 Do not accept this proposal because...
>> > >> >
>> > >> > Andrew
>> > >> >
>> > >> > [1]
>> https://lists.apache.org/thread/c150t1s1x0kcb3r03cjyx31kqs5oc341
>> > >> > [2] https://github.com/apache/arrow-datafusion/discussions/6475
>> > >> >
>> > >> > -- Proposed Resolution -
>> > >> >
>> > >> > Resolution to Create the Apache DataFusion Project from the Apache
>> > >> > Arrow DataFusion Sub Project
>> > >> >
>> > >> > =
>> > >> >
>> > >> > X. Establish the Apache DataFusion Project
>> > >> >
>> > >> > WHEREAS, the Board of Directors deems it to be in the best
>> > >> > interests of the Foundation and consistent with the
>> > >> > Foundation's purpose to establish a Project Management
>> > >> > Committee charged with the creation and maintenance of
>> > >> > open-source software related to an extensible query engine
>> > >> > for distribution at no charge to the public.
>> > >> >
>> > >> > NOW, THEREFORE, BE IT RESOLVED, that a Project Management
>> > >> > Committee (PMC), to be known as the "Apache DataFusion Project",
>> > >> > be and hereby is established pursuant to Bylaws of the
>> > >> > Foundation; and be it further
>> > >> >
>> > >> > RESOLVED, that the Apache DataFusion Project be and hereby is
>> > >> > responsible for the creation and maintenance of software
>> > >> > related to an extensible query engine; and be it further
>> > >> >
>> > >> > RESOLVED, that the office of "Vice President, Apache DataFusion" be
>> > >> > and hereby is created, the person holding such office to
>> > >> > serve at the direction of the Board of Directors as the chair
>> > >> > of the Apache DataFusion Project, and to have primary responsibility
>> > >> > for management of the projects within the scope of
>> > >> > responsibility of the Apache DataFusion Project; and be it further
>> > >> >
>> > >> > RESOLVED, that the persons listed immediately below be and
>> > >> > hereby are appointed to serve as the initial members of the
>> > >> > Apache DataFusion Project:
>> > >> >
>> > >> > * Andy Grove (agr...@apache.org)
>> > >> > * Andrew Lamb (al...@apache.org)
>> > >> > * Daniël Heres (dhe...@apache.org)
>> > >> > * Jie Wen (jake...@apache.org)
>> > >> > * Kun Liu (liu...@apache.org)
>> > >> > * Liang-Chi Hsieh (vii...@apache.org)
>> > >> > * Qingping Hou: (ho...@apache.org)
>> > >> > * Wes McKinney(w...@apache.org)
>> > >> > * Will Jones (wjones...@apache.org)
>> > >> >
>> > >> > RESOLVED, that the Apache DataFusion Project be and hereby
>> > >> > is tasked with the migration and rationalization of the Apache
>> > >> > Arrow DataFusion sub-project; and be it further
>> > >> >
>> > >> > RESOLVED, that all responsibilities pertaining to the Apache
>> > >> > Arrow DataFusion sub-project encumbered upon the
>> > >> > Apache Arrow Project are hereafter discharged.
>> > >> >
>> > >> > NOW, THEREFORE, BE IT FURTHER RESOLVED, that Andrew Lamb
>> > >> > be appointed to the office of Vice President, Apache DataFusion, to
>> > >> > serve in accordance with and subject to the direction of the
>> > >> > Board of Directors and the Bylaws of the Foundation until
>> > >> > death, resignation, retirement, removal or disqualification,
>> > >> > or until a successor is appointed.
>> > >> > =
>> > >> >
>> > >> >
>> > >> > ---
>> > >> >
>> > >> >
>> > >> > Summary:
>> > >> >
>> > >> > We propose creating a new top level project, Apache DataFusion, from
>> > >> > an existing sub project of Apache Arrow to facilitate additional
>> > >> > community and project growth.
>> > >> >
>> > >> > Abstract
>> > >> >
>> > >> > Apache Arrow DataFusion[1]  is a very fast, extensible query engine
>> > >> > for building high-quality data-centric systems in Rust, using the
>> > >> > Apache Arrow in-memory format. DataFusion offers 

[VOTE] Flight RPC: add 'fallback' URI scheme

2024-02-27 Thread David Li
I would like to propose a 'reuse connection' URI scheme for Flight RPC. This 
proposal was previously discussed at [1]. A candidate implementation for C++, 
Java, and Go is at [2].

The vote will be open for at least 72 hours.

[ ] +1 
[ ] +0
[ ] -1 Do not accept this proposal because...

[1]: https://lists.apache.org/thread/pc9fs0hf8t5ylj9os00r9vg8d2xv2npz
[2]: https://github.com/apache/arrow/pull/40084

On Tue, Feb 20, 2024, at 14:14, David Li wrote:
> Thanks for the comments - I've updated the implementation [1] and added 
> Go + integration tests. If this all checks out I'd like to start a vote 
> soon.
>
> [1]: https://github.com/apache/arrow/pull/40084
>
> On Fri, Feb 16, 2024, at 13:43, Andrew Lamb wrote:
>> Thank you -- I think the usecase is great, but agree with the other
>> reviewers that the name may be confusing. I left some notes on the ticket
>>
>> Andrew
>>
>> On Wed, Feb 14, 2024 at 3:52 PM David Li  wrote:
>>
>>> I've put up a candidate implementation sans integration test [1].
>>>
>>> Some caveats:
>>> - java.net.URI doesn't accept 'scheme://', only 'scheme:/' or 'scheme://?'
>>> (yes, an empty query string pacifies it). I've chosen the latter since the
>>> former is technically a URI with a non-empty path but neither are ideal.
>>> - I've changed the scheme to 'arrow-flight-reuse-connection' to be more
>>> faithful to the intended use than 'fallback'.
>>>
>>> [1]: https://github.com/apache/arrow/pull/40084
>>>
>>> On Tue, Feb 13, 2024, at 13:01, Jean-Baptiste Onofré wrote:
>>> > Hi David,
>>> >
>>> > It's reasonable. I think we can start with your initial proposal (it
>>> > sounds fine to me) and we can always improve step by step.
>>> >
>>> > Thanks !
>>> > Regards
>>> > JB
>>> >
>>> > On Tue, Feb 13, 2024 at 4:53 PM David Li  wrote:
>>> >>
>>> >> I'm going to keep the proposal as-is then. It can be extended if this
>>> use case comes up.
>>> >>
>>> >> I'll start work on candidate implementations now.
>>> >>
>>> >> On Tue, Feb 13, 2024, at 03:22, Antoine Pitrou wrote:
>>> >> > I think the original proposal is sufficient.
>>> >> >
>>> >> > Also, it is not obvious to me how one would switch from e.g. grpc+tls
>>> to
>>> >> > http without an explicit server location (unless both Flight servers
>>> are
>>> >> > hosted under the same port?). So the "+" proposal seems a bit weird.
>>> >> >
>>> >> >
>>> >> > On 12/02/2024 at 23:39, David Li wrote:
>>> >> >> The idea is that the client would reuse the existing connection, in
>>> which case the protocol and such are implicit. (If the client doesn't have
>>> a connection anymore, it can't use the fallback anyways.)
>>> >> >>
>>> >> >> I suppose this has the advantage that you could "fall back" to a
>>> known hostname with a different protocol, but I'm not sure that always
>>> applies anyways. (Correct me if I'm wrong Matt, but as I recall, UCX
>>> addresses aren't hostnames but rather opaque byte blobs, for instance.)
>>> >> >>
>>> >> >> If we do prefer this, to avoid overloading the hostname, there's
>>> also the informal convention of using + in the scheme, so it could be
>>> arrow-flight-fallback+grpc+tls://, arrow-flight-fallback+http://, etc.
>>> >> >>
>>> >> >> On Mon, Feb 12, 2024, at 17:03, Joel Lubinitsky wrote:
>>> >> >>> Thanks for clarifying.
>>> >> >>>
>>> >> >>> Given the relationship between these two proposals, would it also be
>>> >> >>> necessary to distinguish the scheme (or schemes) supported by the
>>> >> >>> originating Flight RPC service?
>>> >> >>>
>>> >> >>> If that is the case, it may be preferred to use the "host" portion
>>> of the
>>> >> >>> URI rather than the "scheme" to denote the location of the data. In
>>> this
>>> >> >>> scenario, the host "0.0.0.0" could be used. This IP address is
>>> defined in
>>> >> >>> IETF RFC1122 [1] as "This host on this network", which seems most
>>> >> >>> consistent with the intended 

Re: [RESULT][VOTE] Release Apache Arrow ADBC 0.10.0 - RC1

2024-02-22 Thread David Li
[x] Close the GitHub milestone/project
[x] Add the new release to the Apache Reporter System
[x] Upload source release artifacts to Subversion
[x] Create the final GitHub release
[x] Update website
[x] Upload wheels/sdist to PyPI
[x] Publish Maven packages
[x] Update tags for Go modules
[x] Deploy APT/Yum repositories
[x] Upload Ruby packages to RubyGems
[IN PROGRESS] Update conda-forge packages [1]
[x] Announce the new release
[x] Remove old artifacts
[IN PROGRESS] Bump versions [2]
[IN PROGRESS] Publish release blog post [3]

[1]: https://github.com/conda-forge/arrow-adbc-split-feedstock/pull/21
[2]: https://github.com/apache/arrow-adbc/pull/1560
[3]: https://github.com/apache/arrow-site/pull/477

On Thu, Feb 22, 2024, at 09:12, David Li wrote:
> The vote passes with 4 binding, 2 non-binding +1 votes.
>
> I'll take care of the release tasks.
>
> On Wed, Feb 21, 2024, at 19:02, Dane Pitkin wrote:
>> +1 (non-binding)
>>
>> Verified on Mac M1 using conda.
>>
>> On Tue, Feb 20, 2024 at 11:27 PM Dewey Dunnington
>>  wrote:
>>
>>> +1!
>>>
>>> I ran USE_CONDA=1 dev/release/verify-release-candidate.sh 0.10.0 1 on
>>> MacOS Sonoma (M1).
>>>
>>> On Tue, Feb 20, 2024 at 9:43 AM Jean-Baptiste Onofré 
>>> wrote:
>>> >
>>> > +1 (non binding)
>>> >
>>> > I quickly tested on MacOS arm64.
>>> >
>>> > Regards
>>> > JB
>>> >
>>> > On Sun, Feb 18, 2024 at 9:47 PM David Li  wrote:
>>> > >
>>> > > Hello,
>>> > >
>>> > > I would like to propose the following release candidate (RC1) of
>>> Apache Arrow ADBC version 0.10.0. This is a release consisting of 30
>>> resolved GitHub issues [1].
>>> > >
>>> > > This release candidate is based on commit:
>>> 9a8e44cc62f23a68ffc0d3d4c7362214b221bea0 [2]
>>> > >
>>> > > The source release rc1 is hosted at [3].
>>> > > The binary artifacts are hosted at [4][5][6][7][8].
>>> > > The changelog is located at [9].
>>> > >
>>> > > Please download, verify checksums and signatures, run the unit tests,
>>> and vote on the release. See [10] for how to validate a release candidate.
>>> > >
>>> > > See also a verification result on GitHub Actions [11].
>>> > >
>>> > > The vote will be open for at least 72 hours.
>>> > >
>>> > > [ ] +1 Release this as Apache Arrow ADBC 0.10.0
>>> > > [ ] +0
>>> > > [ ] -1 Do not release this as Apache Arrow ADBC 0.10.0 because...
>>> > >
>>> > > Note: to verify APT/YUM packages on macOS/AArch64, you must `export
>>> DOCKER_DEFAULT_PLATFORM=linux/amd64`. (Or skip this step by `export
>>> TEST_APT=0 TEST_YUM=0`.)
>>> > >
>>> > > [1]:
>>> https://github.com/apache/arrow-adbc/issues?q=is%3Aissue+milestone%3A%22ADBC+Libraries+0.10.0%22+is%3Aclosed
>>> > > [2]:
>>> https://github.com/apache/arrow-adbc/commit/9a8e44cc62f23a68ffc0d3d4c7362214b221bea0
>>> > > [3]:
>>> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-adbc-0.10.0-rc1/
>>> > > [4]: https://apache.jfrog.io/artifactory/arrow/almalinux-rc/
>>> > > [5]: https://apache.jfrog.io/artifactory/arrow/debian-rc/
>>> > > [6]: https://apache.jfrog.io/artifactory/arrow/ubuntu-rc/
>>> > > [7]:
>>> https://repository.apache.org/content/repositories/staging/org/apache/arrow/adbc/
>>> > > [8]:
>>> https://github.com/apache/arrow-adbc/releases/tag/apache-arrow-adbc-0.10.0-rc1
>>> > > [9]:
>>> https://github.com/apache/arrow-adbc/blob/apache-arrow-adbc-0.10.0-rc1/CHANGELOG.md
>>> > > [10]:
>>> https://arrow.apache.org/adbc/main/development/releasing.html#how-to-verify-release-candidates
>>> > > [11]: https://github.com/apache/arrow-adbc/actions/runs/7951302316
>>>


[ANNOUNCE] Apache Arrow ADBC 0.10.0 released

2024-02-22 Thread David Li
The Apache Arrow community is pleased to announce the 0.10.0 release of the 
Apache Arrow ADBC libraries. It includes 31 resolved GitHub issues ([1]).

The release is available now from [2] and [3].

Release notes are available at: 
https://github.com/apache/arrow-adbc/blob/apache-arrow-adbc-0.10.0/CHANGELOG.md#adbc-libraries-0100-2024-02-18

What is Apache Arrow?
---------------------
Apache Arrow is a columnar in-memory analytics layer designed to accelerate big 
data. It houses a set of canonical in-memory representations of flat and 
hierarchical data along with multiple language-bindings for structure 
manipulation. It also provides low-overhead streaming and batch messaging, 
zero-copy interprocess communication (IPC), and vectorized in-memory analytics 
libraries. Languages currently supported include C, C++, C#, Go, Java, 
JavaScript, Julia, MATLAB, Python, R, Ruby, and Rust.

What is Apache Arrow ADBC?
--------------------------
ADBC is a database access abstraction for Arrow-based applications. It provides 
a cross-language API for working with databases while using Arrow data, 
providing an alternative to APIs like JDBC and ODBC for analytical 
applications. For more, see [4].

Please report any feedback to the mailing lists ([5], [6]).

Regards,
The Apache Arrow Community

[1]: 
https://github.com/apache/arrow-adbc/issues?q=is%3Aissue+milestone%3A%22ADBC+Libraries+0.10.0%22+is%3Aclosed
[2]: https://www.apache.org/dyn/closer.cgi/arrow/apache-arrow-adbc-0.10.0 
[3]: https://apache.jfrog.io/ui/native/arrow
[4]: https://arrow.apache.org/blog/2023/01/05/introducing-arrow-adbc/
[5]: https://lists.apache.org/list.html?u...@arrow.apache.org
[6]: https://lists.apache.org/list.html?dev@arrow.apache.org


[RESULT][VOTE] Release Apache Arrow ADBC 0.10.0 - RC1

2024-02-22 Thread David Li
The vote passes with 4 binding, 2 non-binding +1 votes.

I'll take care of the release tasks.

On Wed, Feb 21, 2024, at 19:02, Dane Pitkin wrote:
> +1 (non-binding)
>
> Verified on Mac M1 using conda.
>
> On Tue, Feb 20, 2024 at 11:27 PM Dewey Dunnington
>  wrote:
>
>> +1!
>>
>> I ran USE_CONDA=1 dev/release/verify-release-candidate.sh 0.10.0 1 on
>> MacOS Sonoma (M1).
>>
>> On Tue, Feb 20, 2024 at 9:43 AM Jean-Baptiste Onofré 
>> wrote:
>> >
>> > +1 (non binding)
>> >
>> > I quickly tested on MacOS arm64.
>> >
>> > Regards
>> > JB
>> >
>> > On Sun, Feb 18, 2024 at 9:47 PM David Li  wrote:
>> > >
>> > > Hello,
>> > >
>> > > I would like to propose the following release candidate (RC1) of
>> Apache Arrow ADBC version 0.10.0. This is a release consisting of 30
>> resolved GitHub issues [1].
>> > >
>> > > This release candidate is based on commit:
>> 9a8e44cc62f23a68ffc0d3d4c7362214b221bea0 [2]
>> > >
>> > > The source release rc1 is hosted at [3].
>> > > The binary artifacts are hosted at [4][5][6][7][8].
>> > > The changelog is located at [9].
>> > >
>> > > Please download, verify checksums and signatures, run the unit tests,
>> and vote on the release. See [10] for how to validate a release candidate.
>> > >
>> > > See also a verification result on GitHub Actions [11].
>> > >
>> > > The vote will be open for at least 72 hours.
>> > >
>> > > [ ] +1 Release this as Apache Arrow ADBC 0.10.0
>> > > [ ] +0
>> > > [ ] -1 Do not release this as Apache Arrow ADBC 0.10.0 because...
>> > >
>> > > Note: to verify APT/YUM packages on macOS/AArch64, you must `export
>> DOCKER_DEFAULT_PLATFORM=linux/amd64`. (Or skip this step by `export
>> TEST_APT=0 TEST_YUM=0`.)
>> > >
>> > > [1]:
>> https://github.com/apache/arrow-adbc/issues?q=is%3Aissue+milestone%3A%22ADBC+Libraries+0.10.0%22+is%3Aclosed
>> > > [2]:
>> https://github.com/apache/arrow-adbc/commit/9a8e44cc62f23a68ffc0d3d4c7362214b221bea0
>> > > [3]:
>> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-adbc-0.10.0-rc1/
>> > > [4]: https://apache.jfrog.io/artifactory/arrow/almalinux-rc/
>> > > [5]: https://apache.jfrog.io/artifactory/arrow/debian-rc/
>> > > [6]: https://apache.jfrog.io/artifactory/arrow/ubuntu-rc/
>> > > [7]:
>> https://repository.apache.org/content/repositories/staging/org/apache/arrow/adbc/
>> > > [8]:
>> https://github.com/apache/arrow-adbc/releases/tag/apache-arrow-adbc-0.10.0-rc1
>> > > [9]:
>> https://github.com/apache/arrow-adbc/blob/apache-arrow-adbc-0.10.0-rc1/CHANGELOG.md
>> > > [10]:
>> https://arrow.apache.org/adbc/main/development/releasing.html#how-to-verify-release-candidates
>> > > [11]: https://github.com/apache/arrow-adbc/actions/runs/7951302316
>>


Re: [DISCUSS] Flight RPC: add 'fallback' URI scheme

2024-02-20 Thread David Li
Thanks for the comments - I've updated the implementation [1] and added Go + 
integration tests. If this all checks out I'd like to start a vote soon.

[1]: https://github.com/apache/arrow/pull/40084

On Fri, Feb 16, 2024, at 13:43, Andrew Lamb wrote:
> Thank you -- I think the usecase is great, but agree with the other
> reviewers that the name may be confusing. I left some notes on the ticket
>
> Andrew
>
> On Wed, Feb 14, 2024 at 3:52 PM David Li  wrote:
>
>> I've put up a candidate implementation sans integration test [1].
>>
>> Some caveats:
>> - java.net.URI doesn't accept 'scheme://', only 'scheme:/' or 'scheme://?'
>> (yes, an empty query string pacifies it). I've chosen the latter since the
>> former is technically a URI with a non-empty path but neither are ideal.
>> - I've changed the scheme to 'arrow-flight-reuse-connection' to be more
>> faithful to the intended use than 'fallback'.
>>
>> [1]: https://github.com/apache/arrow/pull/40084
>>
>> On Tue, Feb 13, 2024, at 13:01, Jean-Baptiste Onofré wrote:
>> > Hi David,
>> >
>> > It's reasonable. I think we can start with your initial proposal (it
>> > sounds fine to me) and we can always improve step by step.
>> >
>> > Thanks !
>> > Regards
>> > JB
>> >
>> > On Tue, Feb 13, 2024 at 4:53 PM David Li  wrote:
>> >>
>> >> I'm going to keep the proposal as-is then. It can be extended if this
>> use case comes up.
>> >>
>> >> I'll start work on candidate implementations now.
>> >>
>> >> On Tue, Feb 13, 2024, at 03:22, Antoine Pitrou wrote:
>> >> > I think the original proposal is sufficient.
>> >> >
>> >> > Also, it is not obvious to me how one would switch from e.g. grpc+tls
>> to
>> >> > http without an explicit server location (unless both Flight servers
>> are
>> >> > hosted under the same port?). So the "+" proposal seems a bit weird.
>> >> >
>> >> >
>> >> > Le 12/02/2024 à 23:39, David Li a écrit :
>> >> >> The idea is that the client would reuse the existing connection, in
>> which case the protocol and such are implicit. (If the client doesn't have
>> a connection anymore, it can't use the fallback anyways.)
>> >> >>
>> >> >> I suppose this has the advantage that you could "fall back" to a
>> known hostname with a different protocol, but I'm not sure that always
>> applies anyways. (Correct me if I'm wrong Matt, but as I recall, UCX
>> addresses aren't hostnames but rather opaque byte blobs, for instance.)
>> >> >>
>> >> >> If we do prefer this, to avoid overloading the hostname, there's
>> also the informal convention of using + in the scheme, so it could be
>> arrow-flight-fallback+grpc+tls://, arrow-flight-fallback+http://, etc.
>> >> >>
>> >> >> On Mon, Feb 12, 2024, at 17:03, Joel Lubinitsky wrote:
>> >> >>> Thanks for clarifying.
>> >> >>>
>> >> >>> Given the relationship between these two proposals, would it also be
>> >> >>> necessary to distinguish the scheme (or schemes) supported by the
>> >> >>> originating Flight RPC service?
>> >> >>>
>> >> >>> If that is the case, it may be preferred to use the "host" portion
>> of the
>> >> >>> URI rather than the "scheme" to denote the location of the data. In
>> this
>> >> >>> scenario, the host "0.0.0.0" could be used. This IP address is
>> defined in
>> >> >>> IETF RFC1122 [1] as "This host on this network", which seems most
>> >> >>> consistent with the intended use-case. There are some caveats to
>> this usage
>> >> >>> but in my experience it's not uncommon for protocols to extend the
>> >> >>> definition of this address in their own usage.
>> >> >>>
>> >> >>> A benefit of this convention is that the scheme remains available
>> in the
>> >> >>> URI to specify the transport available. For example, the following
>> list of
>> >> >>> locations may be included in the response:
>> >> >>>
>> >> >>> ["grpc://0.0.0.0", "ucx://0.0.0.0", "grpc://1.2.3.4",
>> ...]
>> >> >>>
>> >> >>>

[RESULT][VOTE] Explicit session management for Flight RPC

2024-02-20 Thread David Li
The vote passes with 3 binding, 3 non-binding +1 votes. Thanks all!

On Mon, Feb 19, 2024, at 09:16, David Li wrote:
> My vote: +1
>
> On Sun, Feb 18, 2024, at 07:06, Joel Lubinitsky wrote:
>> +1
>>
>> On Fri, Feb 16, 2024 at 1:07 PM Andrew Lamb  wrote:
>>
>>> +1
>>>
>>> On Fri, Feb 16, 2024 at 1:46 AM Jean-Baptiste Onofré 
>>> wrote:
>>>
>>> > +1
>>> >
>>> > Regards
>>> > JB
>>> >
>>> > On Wed, Feb 14, 2024 at 5:38 PM David Li  wrote:
>>> > >
>>> > > Paul Nienaber would like to propose explicit session management for
>>> > Flight RPC.  This proposal was previously discussed at [1].  A candidate
>>> > implementation for C++ and Java is at [2].
>>> > >
>>> > > The vote will be open for at least 72 hours.
>>> > >
>>> > > [ ] +1
>>> > > [ ] +0
>>> > > [ ] -1 Do not accept this proposal because...
>>> > >
>>> > > [1]: https://lists.apache.org/thread/fd6r1n7vt91sg2c7fr35wcrsqz6x4645
>>> > > [2]: https://github.com/apache/arrow/pull/34817
>>> >
>>>


Re: [VOTE] Explicit session management for Flight RPC

2024-02-19 Thread David Li
My vote: +1

On Sun, Feb 18, 2024, at 07:06, Joel Lubinitsky wrote:
> +1
>
> On Fri, Feb 16, 2024 at 1:07 PM Andrew Lamb  wrote:
>
>> +1
>>
>> On Fri, Feb 16, 2024 at 1:46 AM Jean-Baptiste Onofré 
>> wrote:
>>
>> > +1
>> >
>> > Regards
>> > JB
>> >
>> > On Wed, Feb 14, 2024 at 5:38 PM David Li  wrote:
>> > >
>> > > Paul Nienaber would like to propose explicit session management for
>> > Flight RPC.  This proposal was previously discussed at [1].  A candidate
>> > implementation for C++ and Java is at [2].
>> > >
>> > > The vote will be open for at least 72 hours.
>> > >
>> > > [ ] +1
>> > > [ ] +0
>> > > [ ] -1 Do not accept this proposal because...
>> > >
>> > > [1]: https://lists.apache.org/thread/fd6r1n7vt91sg2c7fr35wcrsqz6x4645
>> > > [2]: https://github.com/apache/arrow/pull/34817
>> >
>>


Re: [VOTE] Release Apache Arrow ADBC 0.10.0 - RC1

2024-02-19 Thread David Li
My vote: +1

Tested on macOS/AArch64

On Mon, Feb 19, 2024, at 08:24, Raúl Cumplido wrote:
> +1
>
> I have verified successfully on Ubuntu 22.04 with:
>  USE_CONDA=1 dev/release/verify-release-candidate.sh 0.10.0 1
>
> El dom, 18 feb 2024 a las 21:48, David Li () escribió:
>>
>> Hello,
>>
>> I would like to propose the following release candidate (RC1) of Apache 
>> Arrow ADBC version 0.10.0. This is a release consisting of 30 resolved 
>> GitHub issues [1].
>>
>> This release candidate is based on commit: 
>> 9a8e44cc62f23a68ffc0d3d4c7362214b221bea0 [2]
>>
>> The source release rc1 is hosted at [3].
>> The binary artifacts are hosted at [4][5][6][7][8].
>> The changelog is located at [9].
>>
>> Please download, verify checksums and signatures, run the unit tests, and 
>> vote on the release. See [10] for how to validate a release candidate.
>>
>> See also a verification result on GitHub Actions [11].
>>
>> The vote will be open for at least 72 hours.
>>
>> [ ] +1 Release this as Apache Arrow ADBC 0.10.0
>> [ ] +0
>> [ ] -1 Do not release this as Apache Arrow ADBC 0.10.0 because...
>>
>> Note: to verify APT/YUM packages on macOS/AArch64, you must `export 
>> DOCKER_DEFAULT_PLATFORM=linux/amd64`. (Or skip this step by `export 
>> TEST_APT=0 TEST_YUM=0`.)
>>
>> [1]: 
>> https://github.com/apache/arrow-adbc/issues?q=is%3Aissue+milestone%3A%22ADBC+Libraries+0.10.0%22+is%3Aclosed
>> [2]: 
>> https://github.com/apache/arrow-adbc/commit/9a8e44cc62f23a68ffc0d3d4c7362214b221bea0
>> [3]: 
>> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-adbc-0.10.0-rc1/
>> [4]: https://apache.jfrog.io/artifactory/arrow/almalinux-rc/
>> [5]: https://apache.jfrog.io/artifactory/arrow/debian-rc/
>> [6]: https://apache.jfrog.io/artifactory/arrow/ubuntu-rc/
>> [7]: 
>> https://repository.apache.org/content/repositories/staging/org/apache/arrow/adbc/
>> [8]: 
>> https://github.com/apache/arrow-adbc/releases/tag/apache-arrow-adbc-0.10.0-rc1
>> [9]: 
>> https://github.com/apache/arrow-adbc/blob/apache-arrow-adbc-0.10.0-rc1/CHANGELOG.md
>> [10]: 
>> https://arrow.apache.org/adbc/main/development/releasing.html#how-to-verify-release-candidates
>> [11]: https://github.com/apache/arrow-adbc/actions/runs/7951302316


[VOTE] Release Apache Arrow ADBC 0.10.0 - RC1

2024-02-18 Thread David Li
Hello,

I would like to propose the following release candidate (RC1) of Apache Arrow 
ADBC version 0.10.0. This is a release consisting of 30 resolved GitHub issues 
[1].

This release candidate is based on commit: 
9a8e44cc62f23a68ffc0d3d4c7362214b221bea0 [2]

The source release rc1 is hosted at [3].
The binary artifacts are hosted at [4][5][6][7][8].
The changelog is located at [9].

Please download, verify checksums and signatures, run the unit tests, and vote 
on the release. See [10] for how to validate a release candidate.

See also a verification result on GitHub Actions [11].

The vote will be open for at least 72 hours.

[ ] +1 Release this as Apache Arrow ADBC 0.10.0
[ ] +0
[ ] -1 Do not release this as Apache Arrow ADBC 0.10.0 because...

Note: to verify APT/YUM packages on macOS/AArch64, you must `export 
DOCKER_DEFAULT_PLATFORM=linux/amd64`. (Or skip this step by `export TEST_APT=0 
TEST_YUM=0`.)

[1]: 
https://github.com/apache/arrow-adbc/issues?q=is%3Aissue+milestone%3A%22ADBC+Libraries+0.10.0%22+is%3Aclosed
[2]: 
https://github.com/apache/arrow-adbc/commit/9a8e44cc62f23a68ffc0d3d4c7362214b221bea0
[3]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-adbc-0.10.0-rc1/
[4]: https://apache.jfrog.io/artifactory/arrow/almalinux-rc/
[5]: https://apache.jfrog.io/artifactory/arrow/debian-rc/
[6]: https://apache.jfrog.io/artifactory/arrow/ubuntu-rc/
[7]: 
https://repository.apache.org/content/repositories/staging/org/apache/arrow/adbc/
[8]: 
https://github.com/apache/arrow-adbc/releases/tag/apache-arrow-adbc-0.10.0-rc1
[9]: 
https://github.com/apache/arrow-adbc/blob/apache-arrow-adbc-0.10.0-rc1/CHANGELOG.md
[10]: 
https://arrow.apache.org/adbc/main/development/releasing.html#how-to-verify-release-candidates
[11]: https://github.com/apache/arrow-adbc/actions/runs/7951302316


Re: Flight RPC: Add total_records and total_bytes to FlightEndpoint

2024-02-15 Thread David Li
Hi Taeyun,

Is this related to the previous thread about fetching a part of a result set? 

I think it's reasonable to have these fields.

If others agree the next step would be to create a PR with an implementation 
for review/voting.

Best,
David

On Wed, Feb 14, 2024, at 18:58, 김태연 (Taeyun Kim) wrote:
> Hi,
>
> Currently, FlightInfo has total_records and total_bytes fields, but 
> FlightEndpoint does not.
> It would be great if FlightEndpoint also had these fields.
> These fields can be used for:
> - identifying the endpoint containing a specific record by offset among 
> multiple endpoints.
> - efficiently distributing the processing load of endpoints based on 
> their record count or size.
> Since FlightInfo already has these fields, it might not be technically 
> difficult to add them to individual FlightEndpoints.
>
> If creating an issue in GitHub is more appropriate, I will do so.
>
> Thank you.
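
As a sketch of the first use case above — locating the endpoint that holds a given global record offset — here is a small illustrative helper. It is not from the thread and assumes the *proposed* per-endpoint total_records values are available (they are not part of FlightEndpoint at the time of this discussion):

```python
def endpoint_for_offset(record_counts, offset):
    """Given each endpoint's (proposed) total_records value, return
    (endpoint_index, local_offset) for a global record offset."""
    start = 0
    for i, count in enumerate(record_counts):
        if offset < start + count:
            return i, offset - start
        start += count
    raise IndexError(f"offset {offset} beyond total of {start} records")

# Three endpoints holding 100, 250, and 50 records:
print(endpoint_for_offset([100, 250, 50], 120))  # -> (1, 20)
```

The same counts would also let a consumer split endpoints across workers proportionally to their record count or byte size, the second use case mentioned above.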


Re: [DISCUSS] Flight RPC: add 'fallback' URI scheme

2024-02-14 Thread David Li
I've put up a candidate implementation sans integration test [1].

Some caveats:
- java.net.URI doesn't accept 'scheme://', only 'scheme:/' or 'scheme://?' 
(yes, an empty query string pacifies it). I've chosen the latter since the 
former is technically a URI with a non-empty path but neither are ideal. 
- I've changed the scheme to 'arrow-flight-reuse-connection' to be more 
faithful to the intended use than 'fallback'.

[1]: https://github.com/apache/arrow/pull/40084
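
To make the proposed semantics concrete, here is an illustrative sketch (not part of the candidate implementation) of how a client might map an endpoint's locations to connection targets. Treating both URI spellings as equivalent is an assumption of this sketch, following the java.net.URI caveat above:

```python
# Per the caveat above, Java emits 'scheme://?'; this sketch accepts
# both spellings of the special URI when matching.
REUSE_URIS = {"arrow-flight-reuse-connection://?",
              "arrow-flight-reuse-connection:/"}

def resolve_targets(location_uris, original_connection):
    """An empty location list already means 'fetch from the service you
    connected to'; the special URI expresses the same thing alongside
    concrete locations, so it resolves to the existing connection."""
    if not location_uris:
        return [original_connection]
    return [original_connection if uri in REUSE_URIS else uri
            for uri in location_uris]

print(resolve_targets([], "original"))
# -> ['original']
print(resolve_targets(["grpc+tls://replica.example:443",
                       "arrow-flight-reuse-connection://?"], "original"))
# -> ['grpc+tls://replica.example:443', 'original']
```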

On Tue, Feb 13, 2024, at 13:01, Jean-Baptiste Onofré wrote:
> Hi David,
>
> It's reasonable. I think we can start with your initial proposal (it
> sounds fine to me) and we can always improve step by step.
>
> Thanks !
> Regards
> JB
>
> On Tue, Feb 13, 2024 at 4:53 PM David Li  wrote:
>>
>> I'm going to keep the proposal as-is then. It can be extended if this use 
>> case comes up.
>>
>> I'll start work on candidate implementations now.
>>
>> On Tue, Feb 13, 2024, at 03:22, Antoine Pitrou wrote:
>> > I think the original proposal is sufficient.
>> >
>> > Also, it is not obvious to me how one would switch from e.g. grpc+tls to
>> > http without an explicit server location (unless both Flight servers are
>> > hosted under the same port?). So the "+" proposal seems a bit weird.
>> >
>> >
>> > Le 12/02/2024 à 23:39, David Li a écrit :
>> >> The idea is that the client would reuse the existing connection, in which 
>> >> case the protocol and such are implicit. (If the client doesn't have a 
>> >> connection anymore, it can't use the fallback anyways.)
>> >>
>> >> I suppose this has the advantage that you could "fall back" to a known 
>> >> hostname with a different protocol, but I'm not sure that always applies 
>> >> anyways. (Correct me if I'm wrong Matt, but as I recall, UCX addresses 
>> >> aren't hostnames but rather opaque byte blobs, for instance.)
>> >>
>> >> If we do prefer this, to avoid overloading the hostname, there's also the 
>> >> informal convention of using + in the scheme, so it could be 
>> >> arrow-flight-fallback+grpc+tls://, arrow-flight-fallback+http://, etc.
>> >>
>> >> On Mon, Feb 12, 2024, at 17:03, Joel Lubinitsky wrote:
>> >>> Thanks for clarifying.
>> >>>
>> >>> Given the relationship between these two proposals, would it also be
>> >>> necessary to distinguish the scheme (or schemes) supported by the
>> >>> originating Flight RPC service?
>> >>>
>> >>> If that is the case, it may be preferred to use the "host" portion of the
>> >>> URI rather than the "scheme" to denote the location of the data. In this
>> >>> scenario, the host "0.0.0.0" could be used. This IP address is defined in
>> >>> IETF RFC1122 [1] as "This host on this network", which seems most
>> >>> consistent with the intended use-case. There are some caveats to this 
>> >>> usage
>> >>> but in my experience it's not uncommon for protocols to extend the
>> >>> definition of this address in their own usage.
>> >>>
>> >>> A benefit of this convention is that the scheme remains available in the
>> >>> URI to specify the transport available. For example, the following list 
>> >>> of
>> >>> locations may be included in the response:
>> >>>
>> >>> ["grpc://0.0.0.0", "ucx://0.0.0.0", "grpc://1.2.3.4", 
>> >>> ...]
>> >>>
>> >>> This would indicate that grpc and ucx transport is available from the
>> >>> current service, grpc is available at 1.2.3.4, and possibly more
>> >>> combinations of scheme/host.
>> >>>
>> >>> [1] https://datatracker.ietf.org/doc/html/rfc1122#section-3.2.1.3
>> >>>
>> >>> On Mon, Feb 12, 2024 at 2:53 PM David Li  wrote:
>> >>>
>> >>>> Ah, while I was thinking of it as useful for a fallback, I'm not
>> >>>> specifying it that way.  Better ideas for names would be appreciated.
>> >>>>
>> >>>> The actual precedence has never been specified. All endpoints are
>> >>>> equivalent, so clients may use what is "best". For instance, with Matt
>> >>>> Topol's concurrent proposal, a GPU-enabled client may preferentially try
>> >>>> UCX endpoints while other clients may choose to ignore them entirely
>> >>>> (e.g. because they don't have UCX installed).

[VOTE] Explicit session management for Flight RPC

2024-02-14 Thread David Li
Paul Nienaber would like to propose explicit session management for Flight RPC. 
 This proposal was previously discussed at [1].  A candidate implementation for 
C++ and Java is at [2].

The vote will be open for at least 72 hours.

[ ] +1 
[ ] +0
[ ] -1 Do not accept this proposal because...

[1]: https://lists.apache.org/thread/fd6r1n7vt91sg2c7fr35wcrsqz6x4645
[2]: https://github.com/apache/arrow/pull/34817


Re: [DISCUSS] Flight RPC: add 'fallback' URI scheme

2024-02-13 Thread David Li
I'm going to keep the proposal as-is then. It can be extended if this use case 
comes up.

I'll start work on candidate implementations now.

On Tue, Feb 13, 2024, at 03:22, Antoine Pitrou wrote:
> I think the original proposal is sufficient.
>
> Also, it is not obvious to me how one would switch from e.g. grpc+tls to 
> http without an explicit server location (unless both Flight servers are 
> hosted under the same port?). So the "+" proposal seems a bit weird.
>
>
> Le 12/02/2024 à 23:39, David Li a écrit :
>> The idea is that the client would reuse the existing connection, in which 
>> case the protocol and such are implicit. (If the client doesn't have a 
>> connection anymore, it can't use the fallback anyways.)
>> 
>> I suppose this has the advantage that you could "fall back" to a known 
>> hostname with a different protocol, but I'm not sure that always applies 
>> anyways. (Correct me if I'm wrong Matt, but as I recall, UCX addresses 
>> aren't hostnames but rather opaque byte blobs, for instance.)
>> 
>> If we do prefer this, to avoid overloading the hostname, there's also the 
>> informal convention of using + in the scheme, so it could be 
>> arrow-flight-fallback+grpc+tls://, arrow-flight-fallback+http://, etc.
>> 
>> On Mon, Feb 12, 2024, at 17:03, Joel Lubinitsky wrote:
>>> Thanks for clarifying.
>>>
>>> Given the relationship between these two proposals, would it also be
>>> necessary to distinguish the scheme (or schemes) supported by the
>>> originating Flight RPC service?
>>>
>>> If that is the case, it may be preferred to use the "host" portion of the
>>> URI rather than the "scheme" to denote the location of the data. In this
>>> scenario, the host "0.0.0.0" could be used. This IP address is defined in
>>> IETF RFC1122 [1] as "This host on this network", which seems most
>>> consistent with the intended use-case. There are some caveats to this usage
>>> but in my experience it's not uncommon for protocols to extend the
>>> definition of this address in their own usage.
>>>
>>> A benefit of this convention is that the scheme remains available in the
>>> URI to specify the transport available. For example, the following list of
>>> locations may be included in the response:
>>>
>>> ["grpc://0.0.0.0", "ucx://0.0.0.0", "grpc://1.2.3.4", ...]
>>>
>>> This would indicate that grpc and ucx transport is available from the
>>> current service, grpc is available at 1.2.3.4, and possibly more
>>> combinations of scheme/host.
>>>
>>> [1] https://datatracker.ietf.org/doc/html/rfc1122#section-3.2.1.3
>>>
>>> On Mon, Feb 12, 2024 at 2:53 PM David Li  wrote:
>>>
>>>> Ah, while I was thinking of it as useful for a fallback, I'm not
>>>> specifying it that way.  Better ideas for names would be appreciated.
>>>>
>>>> The actual precedence has never been specified. All endpoints are
>>>> equivalent, so clients may use what is "best". For instance, with Matt
>>>> Topol's concurrent proposal, a GPU-enabled client may preferentially try
>>>> UCX endpoints while other clients may choose to ignore them entirely (e.g.
>>>> because they don't have UCX installed).
>>>>
>>>> In practice the ADBC/JDBC drivers just scan the list left to right and try
>>>> each endpoint in turn for lack of a better heuristic.
>>>>
>>>> On Mon, Feb 12, 2024, at 14:28, Joel Lubinitsky wrote:
>>>>> Thanks for proposing this David.
>>>>>
>>>>> I think the ability to include the Flight RPC service itself in the list
>>>> of
>>>>> endpoints from which data can be fetched is a helpful addition.
>>>>>
>>>>> The current choice of name for the URI (arrow-flight-fallback://) seems
>>>> to
>>>>> imply that there is an order of precedence that should be considered in
>>>> the
>>>>> list of URI’s. Specifically, as a developer receiving the list of
>>>> locations
>>>>> I might assume that I should try fetching from other locations first. If
>>>>> those do not succeed, I may try the original service as a fallback.
>>>>>
>>>>> Are these the intended semantics? If so, is there a way to include the
>>>>> original service in the list of locations without the implied precedence?
>>>>>
>>>>> Thanks,
>>>>> Joel

Re: [DISCUSS] Flight RPC: add 'fallback' URI scheme

2024-02-12 Thread David Li
The idea is that the client would reuse the existing connection, in which case 
the protocol and such are implicit. (If the client doesn't have a connection 
anymore, it can't use the fallback anyways.) 

I suppose this has the advantage that you could "fall back" to a known hostname 
with a different protocol, but I'm not sure that always applies anyways. 
(Correct me if I'm wrong Matt, but as I recall, UCX addresses aren't hostnames 
but rather opaque byte blobs, for instance.)

If we do prefer this, to avoid overloading the hostname, there's also the 
informal convention of using + in the scheme, so it could be 
arrow-flight-fallback+grpc+tls://, arrow-flight-fallback+http://, etc.

On Mon, Feb 12, 2024, at 17:03, Joel Lubinitsky wrote:
> Thanks for clarifying.
>
> Given the relationship between these two proposals, would it also be
> necessary to distinguish the scheme (or schemes) supported by the
> originating Flight RPC service?
>
> If that is the case, it may be preferred to use the "host" portion of the
> URI rather than the "scheme" to denote the location of the data. In this
> scenario, the host "0.0.0.0" could be used. This IP address is defined in
> IETF RFC1122 [1] as "This host on this network", which seems most
> consistent with the intended use-case. There are some caveats to this usage
> but in my experience it's not uncommon for protocols to extend the
> definition of this address in their own usage.
>
> A benefit of this convention is that the scheme remains available in the
> URI to specify the transport available. For example, the following list of
> locations may be included in the response:
>
> ["grpc://0.0.0.0", "ucx://0.0.0.0", "grpc://1.2.3.4", ...]
>
> This would indicate that grpc and ucx transport is available from the
> current service, grpc is available at 1.2.3.4, and possibly more
> combinations of scheme/host.
>
> [1] https://datatracker.ietf.org/doc/html/rfc1122#section-3.2.1.3
>
> On Mon, Feb 12, 2024 at 2:53 PM David Li  wrote:
>
>> Ah, while I was thinking of it as useful for a fallback, I'm not
>> specifying it that way.  Better ideas for names would be appreciated.
>>
>> The actual precedence has never been specified. All endpoints are
>> equivalent, so clients may use what is "best". For instance, with Matt
>> Topol's concurrent proposal, a GPU-enabled client may preferentially try
>> UCX endpoints while other clients may choose to ignore them entirely (e.g.
>> because they don't have UCX installed).
>>
>> In practice the ADBC/JDBC drivers just scan the list left to right and try
>> each endpoint in turn for lack of a better heuristic.
>>
>> On Mon, Feb 12, 2024, at 14:28, Joel Lubinitsky wrote:
>> > Thanks for proposing this David.
>> >
>> > I think the ability to include the Flight RPC service itself in the list
>> of
>> > endpoints from which data can be fetched is a helpful addition.
>> >
>> > The current choice of name for the URI (arrow-flight-fallback://) seems
>> to
>> > imply that there is an order of precedence that should be considered in
>> the
>> > list of URI’s. Specifically, as a developer receiving the list of
>> locations
>> > I might assume that I should try fetching from other locations first. If
>> > those do not succeed, I may try the original service as a fallback.
>> >
>> > Are these the intended semantics? If so, is there a way to include the
>> > original service in the list of locations without the implied precedence?
>> >
>> > Thanks,
>> > Joel
>> >
>> > On Mon, Feb 12, 2024 at 11:52 James Duong > .invalid>
>> > wrote:
>> >
>> >> This seems like a good idea, and also improves consistency with clients
>> >> that erroneously assumed that the service endpoint was always in the
>> list
>> >> of endpoints.
>> >>
>> >> From: Antoine Pitrou 
>> >> Date: Monday, February 12, 2024 at 6:05 AM
>> >> To: dev@arrow.apache.org 
>> >> Subject: Re: [DISCUSS] Flight RPC: add 'fallback' URI scheme
>> >>
>> >> Hello,
>> >>
>> >> This looks fine to me.
>> >>
>> >> Regards
>> >>
>> >> Antoine.
>> >>
>> >>
>> >> Le 12/02/2024 à 14:46, David Li a écrit :
>> >> > Hello,
>> >> >
>> >> > I'd like to propose a slight update to Flight RPC to make Flight SQL
>> >> work better in different deployment scenarios.  Comments on the doc
>> would
>> >> be appreciated:
>> >> >
>> >> >
>> >>
>> https://docs.google.com/document/d/1g9M9FmsZhkewlT1mLibuceQO8ugI0-fqumVAXKFjVGg/edit?usp=sharing
>> >> >
>> >> > The gist is that FlightEndpoint allows specifying either (1) a list of
>> >> concrete URIs to fetch data from or (2) no URIs, meaning to fetch from
>> the
>> >> Flight RPC service itself; but it would be useful to combine both
>> behaviors
>> >> (try these concrete URIs and fall back to the Flight RPC service itself)
>> >> without requiring the service to know its own public address.
>> >> >
>> >> > Best,
>> >> > David
>> >>
>>


Re: [DISCUSS] Flight RPC: add 'fallback' URI scheme

2024-02-12 Thread David Li
Ah, while I was thinking of it as useful for a fallback, I'm not specifying it 
that way.  Better ideas for names would be appreciated.

The actual precedence has never been specified. All endpoints are equivalent, 
so clients may use what is "best". For instance, with Matt Topol's concurrent 
proposal, a GPU-enabled client may preferentially try UCX endpoints while other 
clients may choose to ignore them entirely (e.g. because they don't have UCX 
installed).

In practice the ADBC/JDBC drivers just scan the list left to right and try each 
endpoint in turn for lack of a better heuristic. 

On Mon, Feb 12, 2024, at 14:28, Joel Lubinitsky wrote:
> Thanks for proposing this David.
>
> I think the ability to include the Flight RPC service itself in the list of
> endpoints from which data can be fetched is a helpful addition.
>
> The current choice of name for the URI (arrow-flight-fallback://) seems to
> imply that there is an order of precedence that should be considered in the
> list of URI’s. Specifically, as a developer receiving the list of locations
> I might assume that I should try fetching from other locations first. If
> those do not succeed, I may try the original service as a fallback.
>
> Are these the intended semantics? If so, is there a way to include the
> original service in the list of locations without the implied precedence?
>
> Thanks,
> Joel
>
> On Mon, Feb 12, 2024 at 11:52 James Duong 
> wrote:
>
>> This seems like a good idea, and also improves consistency with clients
>> that erroneously assumed that the service endpoint was always in the list
>> of endpoints.
>>
>> From: Antoine Pitrou 
>> Date: Monday, February 12, 2024 at 6:05 AM
>> To: dev@arrow.apache.org 
>> Subject: Re: [DISCUSS] Flight RPC: add 'fallback' URI scheme
>>
>> Hello,
>>
>> This looks fine to me.
>>
>> Regards
>>
>> Antoine.
>>
>>
>> Le 12/02/2024 à 14:46, David Li a écrit :
>> > Hello,
>> >
>> > I'd like to propose a slight update to Flight RPC to make Flight SQL
>> work better in different deployment scenarios.  Comments on the doc would
>> be appreciated:
>> >
>> >
>> https://docs.google.com/document/d/1g9M9FmsZhkewlT1mLibuceQO8ugI0-fqumVAXKFjVGg/edit?usp=sharing
>> >
>> > The gist is that FlightEndpoint allows specifying either (1) a list of
>> concrete URIs to fetch data from or (2) no URIs, meaning to fetch from the
>> Flight RPC service itself; but it would be useful to combine both behaviors
>> (try these concrete URIs and fall back to the Flight RPC service itself)
>> without requiring the service to know its own public address.
>> >
>> > Best,
>> > David
>>
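The endpoint-scanning behavior discussed in this thread — try each location left to right, treating the proposed fallback URI as "reuse the original connection" — can be sketched as follows. This is illustrative Python, not Arrow library code; the `connect` callable, `fetch_endpoint` helper, and URI strings are hypothetical, and `arrow-flight-fallback://` is the scheme name as proposed in the thread.

```python
# Illustrative sketch, not Arrow library code: scan a FlightEndpoint's
# locations left to right, treating the proposed
# "arrow-flight-fallback://" URI as "reuse the original connection".
# The `connect` callable and URI strings are hypothetical.

FALLBACK_SCHEME = "arrow-flight-fallback"

def fetch_endpoint(locations, original_service, connect):
    """Try each location in turn, falling back to the service the
    FlightInfo came from when the fallback URI (or an empty location
    list, per existing Flight RPC semantics) is encountered."""
    if not locations:
        # No URIs at all already means: fetch from the Flight RPC
        # service itself.
        return connect(original_service)
    errors = []
    for uri in locations:
        target = original_service if uri.startswith(FALLBACK_SCHEME) else uri
        try:
            return connect(target)
        except ConnectionError as exc:
            errors.append((uri, exc))
    raise ConnectionError(f"all locations failed: {errors}")
```

Note that, per David's reply, the specification imposes no left-to-right precedence; the scan order here is simply the heuristic the ADBC/JDBC drivers happen to use.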


Re: DISCUSS: [FlightSQL] Catalog support

2024-02-12 Thread David Li
The proposal for session support/explicit catalogs is ready for review [1]. 
Absent any objections I will start the vote this week, but comments would be 
appreciated as I'd like to avoid lots of revisions during the vote itself.

[1]: https://github.com/apache/arrow/pull/34817

On Wed, Nov 15, 2023, at 15:31, David Li wrote:
> Hi all,
>
> Paul and company have been working on this feature for Flight SQL via 
> support for explicit sessions in Flight SQL. Feedback would be much 
> appreciated on the PR, especially before tackling the second 
> implementation and eventual vote [1]. The current PR implements 
> client/server support for C++.
>
> [1]: https://github.com/apache/arrow/pull/34817
>
> On Wed, Feb 15, 2023, at 01:10, Sutou Kouhei wrote:
>> Hi James,
>>
>> Thanks for sharing your plan! I will wait for an update.
>>
>> -- 
>> kou
>>
>> In 
>>  
>> 
>>   "Re: DISCUSS: [FlightSQL] Catalog support" on Tue, 14 Feb 2023 
>> 04:53:07 +,
>>   James Duong  wrote:
>>
>>> Hi Sutou,
>>> 
>>> I saw your PostgreSQL project and thought it was quite interesting, 
>>> especially given the number of PostgreSQL-compatible databases.
>>> 
>>> Paul Nienaber will be picking up implementation of the catalog feature 
>>> going forward and can provide an update
>>> 
>>> Get Outlook for Android<https://aka.ms/AAb9ysg>
>>> 
>>> 
>>> From: Sutou Kouhei 
>>> Sent: Thursday, February 9, 2023, 22:25
>>> To: dev@arrow.apache.org 
>>> Subject: Re: DISCUSS: [FlightSQL] Catalog support
>>> 
>>> Hi James
>>> 
>>> Is there any progress of this?
>>> 
>>> I'm developing a Flight SQL adapter for PostgreSQL:
>>> https://github.com/apache/arrow-flight-sql-postgresql
>>> 
>>> I want to implement session feature for it because opening
>>> a session in PostgreSQL is expensive. PostgreSQL uses one
>>> process per session. If we open and close a session for
>>> each Flight SQL call, we need to start one process for each
>>> Flight SQL call.
>>> 
>>> I noticed that the current Flight SQL specification doesn't
>>> provide the standard session support. So I'm interesting in
>>> this discussion.
>>> 
>>> Background: https://github.com/apache/arrow-flight-sql-postgresql/issues/13
>>> 
>>> 
>>> Thanks,
>>> --
>>> kou
>>> 
>>> In
>>>  
>>> 
>>>   "Re: DISCUSS: [FlightSQL] Catalog support" on Mon, 12 Dec 2022 18:12:06 
>>> +,
>>>   James Duong  wrote:
>>> 
>>>> Hi David,
>>>>
>>>> I've written up the URI parsing in C++ and started adding session 
>>>> management messages. I'm also planning on having the 
>>>> ClientCookieMiddlewareFactory be able to report if sessions are enabled on 
>>>> the server.
>>>>
>>>> I (or another developer) will send an update once those features are ready 
>>>> for demo.
>>>> 
>>>> From: David Li 
>>>> Sent: December 12, 2022 10:07 AM
>>>> To: dev@arrow.apache.org 
>>>> Subject: Re: DISCUSS: [FlightSQL] Catalog support
>>>>
>>>> Following up here, James are you interested in putting up a draft PR for 
>>>> the Flight SQL URI format and for session management?
>>>>
>>>> The Flight SQL URI format would then also cover Andrew's use case. And if 
>>>> someone wants to draw up a PR to the JDBC driver to enable arbitrary 
>>>> properties, I can review that too.
>>>>
>>>> On Sat, Dec 3, 2022, at 05:38, Andrew Lamb wrote:
>>>>>> Andrew, do we need to look into adding more metadata to indicate
>>>>> different query languages? (It's quite a shame that we named this Flight
>>>>> SQL at this point...)
>>>>>
>>>>> TDLR is I don't think trying to explicitly support languages other than 
>>>>> SQL
>>>>> in FlightSQL is a good idea. Among other reasons, the JDBC / ODBC drivers,
>>>>> which mostly assume SQL, are one of the key features of FlightSQL, and 
>>>>> they
>>>>> are likely not as useful for non SQL. I can see the argument to support 
>>>>> for
>>>>> substrait plans, and it will be interesting to see what use cases benefit
>>>>> from tha

[DISCUSS] Flight RPC: add 'fallback' URI scheme

2024-02-12 Thread David Li
Hello,

I'd like to propose a slight update to Flight RPC to make Flight SQL work 
better in different deployment scenarios.  Comments on the doc would be 
appreciated:

https://docs.google.com/document/d/1g9M9FmsZhkewlT1mLibuceQO8ugI0-fqumVAXKFjVGg/edit?usp=sharing

The gist is that FlightEndpoint allows specifying either (1) a list of concrete 
URIs to fetch data from or (2) no URIs, meaning to fetch from the Flight RPC 
service itself; but it would be useful to combine both behaviors (try these 
concrete URIs and fall back to the Flight RPC service itself) without requiring 
the service to know its own public address.

Best,
David


Re: [DISCUSS] Proposal to expand Arrow Communications

2024-02-02 Thread David Li
I like this new direction, and I think it'll actually be viable, unlike the 
Flight-UCX work that was attempted a couple years ago. I think the hardcoded 
endpoints in Flight RPC are difficult for other projects (including Flight 
SQL!) to build on top of, and we would serve users better by 
- standardizing the encoding of Arrow data across different transports like UCX 
(and as with Ian's HTTP proposal),
- focusing on the part of Flight that can unify those transports as described 
here (rather than the whole "action"/"command" structure imposed on Flight 
users).

On Fri, Feb 2, 2024, at 18:22, Matt Topol wrote:
> Hey all,
>
> In my current work I've been experimenting and playing around with
> utilizing Arrow and non-cpu memory data. While the creation of the
> ArrowDeviceArray struct and the enhancements to the Arrow library Device
> abstractions were necessary, there is also a need to extend the
> communications specs we utilize, i.e. Flight.
>
> Currently there is no real way to utilize Arrow Flight with shared memory
> or with non-CPU memory (without an expensive Device -> Host copy first). To
> this end I've done a bunch of research and toying around and came up with a
> protocol to propose and a reference implementation using UCX[1]. Attached
> to the proposal is also a couple extensions for Flight itself to make it
> easier for users to still use Flight for metadata / dataset information and
> then point consumers elsewhere to actually retrieve the data. The idea here
> is that this would be a new specification for how to transport Arrow data
> across these high-performance transports such as UCX / libfabric / shared
> memory / etc. We wouldn't necessarily expose / directly add implementations
> of the spec to the Arrow libraries, just provide reference/example
> implementations.
>
> I've written the proposal up on a google doc[2] that everyone should be
> able to comment on. Once we get some community discussion on there, if
> everyone is okay with it I'd like eventually do a vote on adopting this
> spec and if we do, I'll then make a PR to start adding it to the Arrow
> documentation, etc.
>
> Anyways, thank you everyone in advance for your feedback and comments!
>
> --Matt
>
> [1]: https://github.com/openucx/ucx/
> [2]:
> https://docs.google.com/document/d/1zHbnyK1r6KHpMOtEdIg1EZKNzHx-MVgUMOzB87GuXyk/edit?usp=sharing


Arrow community meeting January 31 at 17:00 UTC

2024-01-31 Thread David Li
Hello all,

Apologies for the late notice. I believe Ian is busy this week, but we will 
still be having our usual call at our usual time (i.e., right when I'm sending 
this).

Zoom meeting URL:
https://zoom.us/j/87649033008?pwd=SitsRHluQStlREM0TjJVYkRibVZsUT09
Meeting ID: 876 4903 3008
Passcode: 958092

Meeting notes will be captured in this Google Doc:
https://docs.google.com/document/d/1xrji8fc6_24TVmKiHJB4ECX1Zy2sy2eRbBjpVJMnPmk/
If you plan to attend this meeting, you are welcome to edit the
document to add the topics that you would like to discuss.

Best,
David

Re: [VOTE] Release Apache Arrow nanoarrow 0.4.0 - RC0

2024-01-29 Thread David Li
+1 (binding)

Tested on Debian Linux 'bookworm' 

On Mon, Jan 29, 2024, at 10:45, Dane Pitkin wrote:
> +1 (non-binding)
>
> Verified on MacOS 14 using conda.
>
> On Mon, Jan 29, 2024 at 10:11 AM Dewey Dunnington
>  wrote:
>
>> Hello,
>>
>> I would like to propose the following release candidate (rc0) of
>> Apache Arrow nanoarrow [0] version 0.4.0. This release consists of 46
>> resolved GitHub issues from 5 contributors [1].
>>
>> This release candidate is based on commit:
>> 3f83f4c48959f7a51053074672b7a330888385b1 [2]
>>
>> The source release rc0 is hosted at [3].
>> The changelog is located at [4].
>>
>> Please download, verify checksums and signatures, run the unit tests,
>> and vote on the release. See [5] for how to validate a release
>> candidate. Note also a successful verification CI run at [6].
>>
>> This release contains experimental Python bindings to the nanoarrow C
>> library. This vote is on the source tarball only; however, wheels have
>> also been prepared and tested for convenience and are available from
>> [7].
>>
>> The vote will be open for at least 72 hours.
>>
>> [ ] +1 Release this as Apache Arrow nanoarrow 0.4.0
>> [ ] +0
>> [ ] -1 Do not release this as Apache Arrow nanoarrow 0.4.0 because...
>>
>> [0] https://github.com/apache/arrow-nanoarrow
>> [1] https://github.com/apache/arrow-nanoarrow/milestone/4?closed=1
>> [2]
>> https://github.com/apache/arrow-nanoarrow/tree/apache-arrow-nanoarrow-0.4.0-rc0
>> [3]
>> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-nanoarrow-0.4.0-rc0/
>> [4]
>> https://github.com/apache/arrow-nanoarrow/blob/apache-arrow-nanoarrow-0.4.0-rc0/CHANGELOG.md
>> [5]
>> https://github.com/apache/arrow-nanoarrow/blob/main/dev/release/README.md
>> [6] https://github.com/apache/arrow-nanoarrow/actions/runs/7697719271
>> [7] https://github.com/apache/arrow-nanoarrow/actions/runs/7697710625
>>


Re: [VOTE] Accept donation of Comet Spark native engine

2024-01-27 Thread David Li
+1 (binding)

On Sat, Jan 27, 2024, at 13:03, L. C. Hsieh wrote:
> +1 (binding)
>
> On Sat, Jan 27, 2024 at 8:10 AM Andrew Lamb  wrote:
>>
>> +1 (binding)
>>
>> This is super exciting
>>
>> On Sat, Jan 27, 2024 at 11:00 AM Daniël Heres  wrote:
>>
>> > +1 (binding). Awesome addition to the DataFusion ecosystem!!!
>> >
>> > Daniël
>> >
>> >
>> > On Sat, Jan 27, 2024, 16:57 vin jake  wrote:
>> >
>> > > +1 (binding)
>> > >
>> > > Andy Grove wrote on Sat, Jan 27, 2024 at 11:43 PM:
>> > >
>> > > > Hello,
>> > > >
>> > > > This vote is to determine if the Arrow PMC is in favor of accepting the
>> > > > donation of Comet (a Spark native engine that is powered by DataFusion
>> > > and
>> > > > the Rust implementation of Arrow).
>> > > >
>> > > > The donation was previously discussed on the mailing list [1].
>> > > >
>> > > > The proposed donation is at [2].
>> > > >
>> > > > The Arrow PMC will start the IP clearance process if the vote passes.
>> > > There
>> > > > is a Google document [3] where the community is working on the draft
>> > > > contents for the IP clearance form.
>> > > >
>> > > > The vote will be open for at least 72 hours.
>> > > >
>> > > > [ ] +1 : Accept the donation
>> > > > [ ] 0 : No opinion
>> > > > [ ] -1 : Reject donation because...
>> > > >
>> > > > My vote: +1
>> > > >
>> > > > Thanks,
>> > > >
>> > > > Andy.
>> > > >
>> > > >
>> > > > [1] https://lists.apache.org/thread/0q1rb11jtpopc7vt1ffdzro0omblsh0s
>> > > > [2] https://github.com/apache/arrow-datafusion-comet/pull/1
>> > > > [3]
>> > > >
>> > > >
>> > >
>> > https://docs.google.com/document/d/1azmxE1LERNUdnpzqDO5ortKTsPKrhNgQC4oZSmXa8x4/edit?usp=sharing
>> > > >
>> > >
>> >


Re: [VOTE] Release Apache Arrow 15.0.0 - RC1

2024-01-19 Thread David Li
+1 (binding)

Verified sources on Debian 12 'bookworm'. I had issues with binaries but that 
was because of AlmaLinux failing to verify its own GPG key for some reason.

On Thu, Jan 18, 2024, at 04:40, Raúl Cumplido wrote:
> On Wed, Jan 17, 2024 at 23:37, Matt Topol () wrote:
>>
>> Yea, I confirmed you're correct. I have a fix for this now and I've put up
>> a PR[1] for it that Raul can cherry-pick into a new RC.
>
> As the issue seems to have been around for 1.5 or 2 years I don't
> think this issue requires a new RC. It can be cherry-picked if we want
> to do a patch release in the future.
>
>>
>> --Matt
>>
>> [1]: https://github.com/apache/arrow/pull/39674
>>
>> On Wed, Jan 17, 2024 at 2:15 PM Ruoxi Sun  wrote:
>>
>> > My gut feeling is this has something to do with timezone; I'm in UTC+8.
>> >
>> > If I change this line [1] to:
>> >   testTime := time.Now().UTC()
>> > test passed.
>> >
>> > So for anyone who can't reproduce, maybe you can try changing [1] to:
>> >   loc, _ := time.LoadLocation("Asia/Shanghai")
>> >   testTime := time.Now().In(loc)
>> > to increase the chance?
>> >
>> > [1]
>> >
>> > https://github.com/apache/arrow/blob/c170af41ba0c30b80aa4172da0b3637206368cf2/go/arrow/flight/flightsql/driver/utils_test.go#L90
>> >
>> > *Regards,*
>> > *Rossi*
>> >
>> >
>> > Matt Topol wrote on Thu, Jan 18, 2024 at 02:55:
>> >
>> > > @pitrou Looks like the verification issue is the script just taking the
>> > > system Go if it finds it rather than verifying it's at least Go 1.19+ so
>> > it
>> > > should be pretty easy to fix that and not a release blocker.
>> > >
>> > > Regarding the unit test failure, it looks like I can't replicate it on my
>> > > linux machine and I don't have access to a mac at the moment. Would
>> > anyone
>> > > happen to have a mac they can try to dig into and check out that unit
>> > test
>> > > on? Otherwise I can spin up an AWS instance and try replicating and
>> > > debugging on that if necessary.
>> > >
>> > > --Matt
>> > >
>> > > On Wed, Jan 17, 2024 at 1:30 PM Matt Topol 
>> > wrote:
>> > >
>> > > > I'll take a look at that Go test failure in a bit.
>> > > >
>> > > > As for the ubuntu 22.04 verification failure, I'll double check that
>> > > we're
>> > > > installing Go 1.19 for the verification and using the right PATH to
>> > it, I
>> > > > thought we addressed this but I guess something must have been
>> > > overlooked.
>> > > >
>> > > > --Matt
>> > > >
>> > > > On Wed, Jan 17, 2024 at 1:26 PM Ruoxi Sun 
>> > wrote:
>> > > >
>> > > >> Thanks Raúl, and sorry about the confusion. I should've used RC1.
>> > > >>
>> > > >> Now I just verified RC1 but the failure remains. What's more I tried
>> > on
>> > > my
>> > > >> two laptops (one Intel Mac and one M1 Mac), and got the same failure.
>> > > >>
>> > > >>
>> > > >> *Regards,*
>> > > >> *Rossi*
>> > > >>
>> > > >>
>> > > >> Raúl Cumplido wrote on Thu, Jan 18, 2024 at 02:14:
>> > > >>
>> > > >> > Not sure what the issue is with the test but could you use RC1
>> > instead
>> > > >> of
>> > > >> > RC0:
>> > > >> > TEST_DEFAULT=0 TEST_GO=1 TEST_CPP=1 TEST_PYTHON=1
>> > > >> > ./verify-release-candidate.sh 15.0.0 1
>> > > >> >
>> > > >> > instead of:
>> > > >> > TEST_DEFAULT=0 TEST_GO=1 TEST_CPP=1 TEST_PYTHON=1
>> > > >> > ./verify-release-candidate.sh 15.0.0 0
>> > > >> >
>> > >> > On Wed, Jan 17, 2024 at 19:07, Ruoxi Sun () wrote:
>> > > >> > >
>> > > >> > > Tried:
>> > > >> > >
>> > > >> > > TEST_DEFAULT=0 TEST_GO=1 TEST_CPP=1 TEST_PYTHON=1
>> > > >> > > ./verify-release-candidate.sh 15.0.0 0
>> > > >> > >
>> > > >> > > But one of go test failed:
>> > > >> > >
>> > > >> > > ok  github.com/apache/arrow/go/v15/arrow/flight/flightsql
>> > > >>  4.990s
>> > > >> > > 1970-01-01 12:00:00 + UTC
>> > > >> > > --- FAIL: Test_fromArrowType (0.00s)
>> > > >> > > --- FAIL: Test_fromArrowType/fromArrowType_date64_f16-d64
>> > > (0.00s)
>> > > >> > > utils_test.go:104: test failed, wanted time.Time
>> > 2024-01-17
>> > > >> > > 00:00:00 + UTC got time.Time 2024-01-18 00:00:00 + UTC
>> > > >> > > FAIL
>> > > >> > > FAIL
>> > > github.com/apache/arrow/go/v15/arrow/flight/flightsql/driver
>> > > >> > >  5.821s
>> > > >> > >
>> > > >> > > *Regards,*
>> > > >> > > *Rossi*
>> > > >> > >
>> > > >> > >
>> > > >> > > Raúl Cumplido wrote on Thu, Jan 18, 2024 at 01:29:
>> > > >> > >
>> > > >> > > > +1 (binding)
>> > > >> > > >
>> > > >> > > > I've verified successfully the sources and binaries with:
>> > > >> > > >
>> > > >> > > > TEST_DEFAULT=0 TEST_SOURCE=1
>> > > dev/release/verify-release-candidate.sh
>> > > >> > > > 15.0.0 1
>> > > >> > > > TEST_DEFAULT=0 TEST_BINARIES=1
>> > > >> dev/release/verify-release-candidate.sh
>> > > >> > > > 15.0.0 1
>> > > >> > > > with:
>> > > >> > > >   * Python 3.10.12
>> > > >> > > >   * gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
>> > > >> > > >   * NVIDIA CUDA Build cuda_11.5.r11.5/compiler.30672275_0
>> > > >> > > >   * openjdk version "17.0.9" 2023-10-17
>> > > >> > > >   * ruby 3.0.2p107 
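The date64 failure Rossi reports is a timezone truncation artifact: truncating a local-time "now" to a calendar day can give a different day than truncating the same instant in UTC, which only bites testers in offsets far from UTC (such as UTC+8). A minimal Python sketch of the mismatch (illustrative, not the Go test itself):

```python
from datetime import datetime, timezone, timedelta

# An instant shortly after midnight in UTC+8: it is still the
# previous calendar day in UTC.
tz_shanghai = timezone(timedelta(hours=8))
instant = datetime(2024, 1, 18, 1, 30, tzinfo=tz_shanghai)

local_day = instant.date()                         # day in local time
utc_day = instant.astimezone(timezone.utc).date()  # day in UTC

print(local_day)  # 2024-01-18
print(utc_day)    # 2024-01-17
```

A test that derives the expected day from local time but converts through UTC (or vice versa) therefore fails only in some timezones, which is why pinning the test input to `time.Now().UTC()` makes it pass everywhere.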

Re: ADBC: xdbc_data_type and xdbc_sql_data_type

2024-01-11 Thread David Li
Those values are inherited from Flight SQL [1] which effectively borrowed types 
from JDBC/ODBC.

xdbc_sql_data_type [2] is defined by an enum [3]. This is the database's type 
in its SQL dialect, not the Arrow type. Arrow types are always represented in 
Arrow schemas. (This field is a little contradictory to JDBC, which specifies 
sql_data_type is unused/reserved.) 

xdbc_data_type [4] is ill-defined I think. James Duong, do you have a 
clarification about Dremio's original intent here? In JDBC this is a 
java.sql.Types value but it is not explained in Flight SQL. In fact it seems 
the proto interchanged the definitions of the two fields, since the enum above 
is java.sql.Types.


[1]: 
https://github.com/apache/arrow-adbc/blob/6b73e529ced2f057aa463e7599c6e1227104b025/adbc.h#L1520-L1522
[2]: 
https://github.com/apache/arrow/blob/2b4a70320232647f730b19d2fea5746c3baec752/format/FlightSql.proto#L1098-L1102
[3]: 
https://github.com/apache/arrow/blob/2b4a70320232647f730b19d2fea5746c3baec752/format/FlightSql.proto#L944-L973
[4]: 
https://github.com/apache/arrow/blob/2b4a70320232647f730b19d2fea5746c3baec752/format/FlightSql.proto#L1067

On Thu, Jan 11, 2024, at 12:37, David Coe wrote:
> I recently raised apache/arrow issue #39568 
> ("csharp/src/Apache.Arrow/Types/ArrowType: There are different type IDs for 
> values after 21, including Decimal128 and Decimal256, than for Python") 
> because I have a downstream system that is interpreting the XDBC_DATA_TYPE 
> as the ArrowTypeId, and those are different values in different 
> languages.
>
> For ADBC, what is the intended distinction between xdbc_data_type and  
> xdbc_sql_data_type? Is the xdbc_data_type intended to mimic the C types 
> in ODBC? Or is there a different interpretation? And if there are docs 
> I don't seem to be finding, please refer me to those.
>
> Thanks,
>
>   *   David
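As noted, the xdbc_sql_data_type enum in FlightSql.proto mirrors the java.sql.Types constants. The sketch below lists a plausible subset of those values for illustration (the `XDBC_SQL_DATA_TYPES` dict and `describe_xdbc_sql_type` helper are hypothetical names; FlightSql.proto is the authoritative source) and shows why the field must not be read as an Arrow type id:

```python
# A subset of the XdbcDataType values from FlightSql.proto, which
# mirror java.sql.Types constants. Illustrative only; consult
# FlightSql.proto for the authoritative list.
XDBC_SQL_DATA_TYPES = {
    "XDBC_UNKNOWN_TYPE": 0,
    "XDBC_CHAR": 1,
    "XDBC_NUMERIC": 2,
    "XDBC_DECIMAL": 3,
    "XDBC_INTEGER": 4,
    "XDBC_SMALLINT": 5,
    "XDBC_FLOAT": 6,
    "XDBC_REAL": 7,
    "XDBC_DOUBLE": 8,
    "XDBC_VARCHAR": 12,
    "XDBC_DATE": 91,
    "XDBC_TIME": 92,
    "XDBC_TIMESTAMP": 93,
}

def describe_xdbc_sql_type(value):
    """Map a raw xdbc_sql_data_type integer back to its enum name.

    These are SQL-dialect type codes, not ArrowTypeId values: the
    Arrow type of a column is carried by the Arrow schema instead.
    """
    by_value = {v: k for k, v in XDBC_SQL_DATA_TYPES.items()}
    return by_value.get(value, "XDBC_UNKNOWN_TYPE")

print(describe_xdbc_sql_type(12))  # XDBC_VARCHAR
```

Interpreting this field as an ArrowTypeId, as in the downstream system David Coe describes, breaks precisely because the two numbering schemes are unrelated.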


Re: [Discussion][C++][FlightRPC] What stage to submit a PR for Flight SQL ODBC driver

2024-01-10 Thread David Li
I've closed the PMC vote. Once I see the PR to apache/arrow with the license 
headers attached, I will start the incubator vote to finalize this.

On Sat, Jan 6, 2024, at 01:25, Jean-Baptiste Onofré wrote:
> Correct.
>
> Regards
> JB
>
> On Fri, Jan 5, 2024 at 8:02 PM David Li  wrote:
>>
>> My understanding is that if the employment contract for all developers 
>> involved stated that Dremio holds the rights, then only Dremio needs to 
>> grant rights to the ASF. Laurent pointed out that this is how Gandiva was 
>> donated [1].
>>
>> [1]: https://incubator.apache.org/ip-clearance/arrow-gandiva.html
>>
>> On Fri, Jan 5, 2024, at 13:03, Alina Li wrote:
>> > Thank you for the update, David. JB / his colleagues would be working
>> > on the PR to import the code to Arrow.
>> >
>> > Regarding ICLAs, I believe Joy and Alex have filled out the ICLA
>> > already. Just to confirm, since no individuals hold copyright over the
>> > code, ICLAs are no longer required from the contributors who worked on
>> > flightsql-odbc?
>> >
>> > Cheers,
>> > Alina


[RESULT][VOTE] Accept donation of flightsql-odbc

2024-01-10 Thread David Li
The vote passes with 5 binding, 3 non-binding +1 votes.

On Sat, Jan 6, 2024, at 12:13, L. C. Hsieh wrote:
> +1 (binding)
>
> Thanks.
>
> On Sat, Jan 6, 2024 at 5:05 AM Andrew Lamb  wrote:
>>
>> +1 (binding)
>>
>> Thank you for helping make this happen
>>
>> On Sat, Jan 6, 2024 at 2:10 AM Sutou Kouhei  wrote:
>>
>> > +1
>> >
>> > In 
>> >   "[VOTE] Accept donation of flightsql-odbc" on Fri, 05 Jan 2024 10:41:21
>> > -0500,
>> >   "David Li"  wrote:
>> >
>> > > Hello,
>> > >
>> > > This vote is to determine if the Arrow PMC is in favor of accepting the
>> > donation of the flightsql-odbc library. This was discussed in a previous ML
>> > thread [1].
>> > >
>> > > The outline of the IP clearance form is at [2][3]. The code to be
>> > donated is at [4].
>> > >
>> > > The vote will be open for at least 72 hours.
>> > >
>> > > [ ] +1 : Accept the donation
>> > > [ ] 0 : No opinion
>> > > [ ] -1 : Reject donation because...
>> > >
>> > > My vote: +1
>> > >
>> > > [1]: https://lists.apache.org/thread/p3qyhd7p3o8v0wxgm2jvqf2vbqo92m8k
>> > > [2]:
>> > https://svn.apache.org/repos/asf/incubator/public/trunk/content/ip-clearance/arrow-flight-sql-odbc.xml
>> > > [3]:
>> > https://incubator.apache.org/ip-clearance/arrow-flight-sql-odbc.html
>> > > [4]: https://github.com/dremio/flightsql-odbc
>> > >
>> > > Best,
>> > > David
>> >


Re: [RESULT][VOTE] Release Apache Arrow ADBC 0.9.0 - RC0

2024-01-08 Thread David Li
[x] Close the GitHub milestone/project
[x] Add the new release to the Apache Reporter System
[x] Upload source release artifacts to Subversion
[x] Create the final GitHub release
[x] Update website
[x] Upload wheels/sdist to PyPI
[x] Publish Maven packages
[x] Update tags for Go modules
[x] Deploy APT/Yum repositories
[x] Upload Ruby packages to RubyGems
[x] Update conda-forge packages
[x] Announce the new release
[x] Remove old artifacts
[x] Bump versions
[IN PROGRESS] Publish release blog post [1]

[1]: https://github.com/apache/arrow-site/pull/462

On Mon, Jan 8, 2024, at 10:28, David Li wrote:
> The vote passes with 4 binding +1 votes, 1 non-binding. Thanks everyone!
>
> I'll take care of the release tasks.
>
> On Sat, Jan 6, 2024, at 19:06, Sutou Kouhei wrote:
>> +1
>>
>> I ran the following on Debian GNU/Linux sid:
>>
>>   JAVA_HOME=/usr/lib/jvm/default-java \
>> dev/release/verify-release-candidate.sh 0.9.0 0
>>
>> with:
>>
>>   * g++ (Debian 13.2.0-4) 13.2.0
>>   * go version go1.21.5 linux/amd64
>>   * openjdk version "17.0.9-ea" 2023-10-17
>>   * Python 3.11.7
>>   * ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux-gnu]
>>   * R version 4.3.2 (2023-10-31) -- "Eye Holes"
>>
>> Note that I need the following fix:
>>
>>   * https://github.com/apache/arrow-adbc/pull/1436
>>
>> but it's a verification script problem. It's not a problem
>> of ADBC 0.9.0.
>>
>>
>> Thanks,
>> -- 
>> kou
>>
>>
>> In <1889d6b6-a5b4-425f-b5a8-6c3b18bfc...@app.fastmail.com>
>>   "[VOTE] Release Apache Arrow ADBC 0.9.0 - RC0" on Wed, 03 Jan 2024 
>> 15:54:54 -0500,
>>   "David Li"  wrote:
>>
>>> Hello,
>>> 
>>> I would like to propose the following release candidate (RC0) of Apache 
>>> Arrow ADBC version 0.9.0. This is a release consisting of 34 resolved 
>>> GitHub issues [1].
>>> 
>>> This release candidate is based on commit: 
>>> 37a27717fb94fb84211f1b17486cc8f0be7df59c [2]
>>> 
>>> The source release rc0 is hosted at [3].
>>> The binary artifacts are hosted at [4][5][6][7][8].
>>> The changelog is located at [9].
>>> 
>>> Please download, verify checksums and signatures, run the unit tests, and 
>>> vote on the release. See [10] for how to validate a release candidate.
>>> 
>>> See also a verification result on GitHub Actions [11].
>>> 
>>> The vote will be open for at least 72 hours.
>>> 
>>> [ ] +1 Release this as Apache Arrow ADBC 0.9.0
>>> [ ] +0
>>> [ ] -1 Do not release this as Apache Arrow ADBC 0.9.0 because...
>>> 
>>> Note: to verify APT/YUM packages on macOS/AArch64, you must `export 
>>> DOCKER_DEFAULT_PLATFORM=linux/amd64`. (Or skip this step by `export 
>>> TEST_APT=0 TEST_YUM=0`.)
>>> 
>>> [1]: 
>>> https://github.com/apache/arrow-adbc/issues?q=is%3Aissue+milestone%3A%22ADBC+Libraries+0.9.0%22+is%3Aclosed
>>> [2]: 
>>> https://github.com/apache/arrow-adbc/commit/37a27717fb94fb84211f1b17486cc8f0be7df59c
>>> [3]: 
>>> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-adbc-0.9.0-rc0/
>>> [4]: https://apache.jfrog.io/artifactory/arrow/almalinux-rc/
>>> [5]: https://apache.jfrog.io/artifactory/arrow/debian-rc/
>>> [6]: https://apache.jfrog.io/artifactory/arrow/ubuntu-rc/
>>> [7]: 
>>> https://repository.apache.org/content/repositories/staging/org/apache/arrow/adbc/
>>> [8]: 
>>> https://github.com/apache/arrow-adbc/releases/tag/apache-arrow-adbc-0.9.0-rc0
>>> [9]: 
>>> https://github.com/apache/arrow-adbc/blob/apache-arrow-adbc-0.9.0-rc0/CHANGELOG.md
>>> [10]: 
>>> https://arrow.apache.org/adbc/main/development/releasing.html#how-to-verify-release-candidates
>>> [11]: https://github.com/apache/arrow-adbc/actions/runs/7401997231


[ANNOUNCE] Apache Arrow ADBC 0.9.0 released

2024-01-08 Thread David Li
The Apache Arrow community is pleased to announce the 0.9.0 release of the 
Apache Arrow ADBC libraries. It includes 34 resolved GitHub issues ([1]).

The release is available now from [2] and [3].

Release notes are available at: 
https://github.com/apache/arrow-adbc/blob/apache-arrow-adbc-0.9.0/CHANGELOG.md#adbc-libraries-090-2024-01-03

What is Apache Arrow?
-
Apache Arrow is a columnar in-memory analytics layer designed to accelerate big 
data. It houses a set of canonical in-memory representations of flat and 
hierarchical data along with multiple language-bindings for structure 
manipulation. It also provides low-overhead streaming and batch messaging, 
zero-copy interprocess communication (IPC), and vectorized in-memory analytics 
libraries. Languages currently supported include C, C++, C#, Go, Java, 
JavaScript, Julia, MATLAB, Python, R, Ruby, and Rust.

What is Apache Arrow ADBC?
--
ADBC is a database access abstraction for Arrow-based applications. It provides 
a cross-language API for working with databases while using Arrow data, 
providing an alternative to APIs like JDBC and ODBC for analytical 
applications. For more, see [4].

Please report any feedback to the mailing lists ([5], [6]).

Regards,
The Apache Arrow Community

[1]: 
https://github.com/apache/arrow-adbc/issues?q=is%3Aissue+milestone%3A%22ADBC+Libraries+0.9.0%22+is%3Aclosed
[2]: https://www.apache.org/dyn/closer.cgi/arrow/apache-arrow-adbc-0.9.0 
[3]: https://apache.jfrog.io/ui/native/arrow
[4]: https://arrow.apache.org/blog/2023/01/05/introducing-arrow-adbc/
[5]: https://lists.apache.org/list.html?u...@arrow.apache.org
[6]: https://lists.apache.org/list.html?dev@arrow.apache.org


[RESULT][VOTE] Release Apache Arrow ADBC 0.9.0 - RC0

2024-01-08 Thread David Li
The vote passes with 4 binding +1 votes, 1 non-binding. Thanks everyone!

I'll take care of the release tasks.

On Sat, Jan 6, 2024, at 19:06, Sutou Kouhei wrote:
> +1
>
> I ran the following on Debian GNU/Linux sid:
>
>   JAVA_HOME=/usr/lib/jvm/default-java \
> dev/release/verify-release-candidate.sh 0.9.0 0
>
> with:
>
>   * g++ (Debian 13.2.0-4) 13.2.0
>   * go version go1.21.5 linux/amd64
>   * openjdk version "17.0.9-ea" 2023-10-17
>   * Python 3.11.7
>   * ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux-gnu]
>   * R version 4.3.2 (2023-10-31) -- "Eye Holes"
>
> Note that I need the following fix:
>
>   * https://github.com/apache/arrow-adbc/pull/1436
>
> but it's a verification script problem. It's not a problem
> of ADBC 0.9.0.
>
>
> Thanks,
> -- 
> kou
>
>
> In <1889d6b6-a5b4-425f-b5a8-6c3b18bfc...@app.fastmail.com>
>   "[VOTE] Release Apache Arrow ADBC 0.9.0 - RC0" on Wed, 03 Jan 2024 
> 15:54:54 -0500,
>   "David Li"  wrote:
>
>> Hello,
>> 
>> I would like to propose the following release candidate (RC0) of Apache 
>> Arrow ADBC version 0.9.0. This is a release consisting of 34 resolved GitHub 
>> issues [1].
>> 
>> This release candidate is based on commit: 
>> 37a27717fb94fb84211f1b17486cc8f0be7df59c [2]
>> 
>> The source release rc0 is hosted at [3].
>> The binary artifacts are hosted at [4][5][6][7][8].
>> The changelog is located at [9].
>> 
>> Please download, verify checksums and signatures, run the unit tests, and 
>> vote on the release. See [10] for how to validate a release candidate.
>> 
>> See also a verification result on GitHub Actions [11].
>> 
>> The vote will be open for at least 72 hours.
>> 
>> [ ] +1 Release this as Apache Arrow ADBC 0.9.0
>> [ ] +0
>> [ ] -1 Do not release this as Apache Arrow ADBC 0.9.0 because...
>> 
>> Note: to verify APT/YUM packages on macOS/AArch64, you must `export 
>> DOCKER_DEFAULT_PLATFORM=linux/amd64`. (Or skip this step by `export 
>> TEST_APT=0 TEST_YUM=0`.)
>> 
>> [1]: 
>> https://github.com/apache/arrow-adbc/issues?q=is%3Aissue+milestone%3A%22ADBC+Libraries+0.9.0%22+is%3Aclosed
>> [2]: 
>> https://github.com/apache/arrow-adbc/commit/37a27717fb94fb84211f1b17486cc8f0be7df59c
>> [3]: 
>> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-adbc-0.9.0-rc0/
>> [4]: https://apache.jfrog.io/artifactory/arrow/almalinux-rc/
>> [5]: https://apache.jfrog.io/artifactory/arrow/debian-rc/
>> [6]: https://apache.jfrog.io/artifactory/arrow/ubuntu-rc/
>> [7]: 
>> https://repository.apache.org/content/repositories/staging/org/apache/arrow/adbc/
>> [8]: 
>> https://github.com/apache/arrow-adbc/releases/tag/apache-arrow-adbc-0.9.0-rc0
>> [9]: 
>> https://github.com/apache/arrow-adbc/blob/apache-arrow-adbc-0.9.0-rc0/CHANGELOG.md
>> [10]: 
>> https://arrow.apache.org/adbc/main/development/releasing.html#how-to-verify-release-candidates
>> [11]: https://github.com/apache/arrow-adbc/actions/runs/7401997231


Re: [Discussion][C++][FlightRPC] What stage to submit a PR for Flight SQL ODBC driver

2024-01-05 Thread David Li
My understanding is that if the employment contract for all developers involved 
stated that Dremio holds the rights, then only Dremio needs to grant rights to 
the ASF. Laurent pointed out that this is how Gandiva was donated [1].

[1]: https://incubator.apache.org/ip-clearance/arrow-gandiva.html

On Fri, Jan 5, 2024, at 13:03, Alina Li wrote:
> Thank you for the update, David. JB / his colleagues would be working 
> on the PR to import the code to Arrow.
>
> Regarding ICLAs, I believe Joy and Alex have filled out the ICLA 
> already. Just to confirm, since no individuals hold copyright over the 
> code, ICLAs are no longer required from the contributors who worked on 
> flightsql-odbc?
>
> Cheers,
> Alina


Re: [Discussion][C++][FlightRPC] What stage to submit a PR for Flight SQL ODBC driver

2024-01-05 Thread David Li
(Apologies if this gets delivered twice; I used the wrong email the first time 
around and I _think_ it didn't send as a result.)

I have updated the IP clearance form.

Dremio has informed me that no individuals hold copyright over the code. The 
Software Grant has been filed. I will start a vote for the Arrow PMC to accept 
the code. 

Alina, could you/your colleagues open a PR to the Arrow project importing the 
code, where the code has been updated to add the Apache license header? That 
appears to be the last step ("Check and make sure that the files that have been 
donated have been updated to reflect the new ASF copyright"). It doesn't have 
to build or anything, just move the files into the right place with the license 
header.

On Tue, Dec 19, 2023, at 12:17, Alina Li wrote:
> Thank you both, David and Jean-B. I had informed the contributors to put 
> "arrow" in the notify project field. Please update this thread after 
> you're notified regarding the receipt of the ICLAs.
>
> Thanks,
> Alina


[VOTE] Accept donation of flightsql-odbc

2024-01-05 Thread David Li
Hello,

This vote is to determine if the Arrow PMC is in favor of accepting the 
donation of the flightsql-odbc library. This was discussed in a previous ML 
thread [1].

The outline of the IP clearance form is at [2][3]. The code to be donated is at 
[4].

The vote will be open for at least 72 hours.

[ ] +1 : Accept the donation
[ ] 0 : No opinion
[ ] -1 : Reject donation because...

My vote: +1

[1]: https://lists.apache.org/thread/p3qyhd7p3o8v0wxgm2jvqf2vbqo92m8k
[2]: 
https://svn.apache.org/repos/asf/incubator/public/trunk/content/ip-clearance/arrow-flight-sql-odbc.xml
[3]: https://incubator.apache.org/ip-clearance/arrow-flight-sql-odbc.html
[4]: https://github.com/dremio/flightsql-odbc

Best,
David

Re: [VOTE] Release Apache Arrow ADBC 0.9.0 - RC0

2024-01-03 Thread David Li
My vote: +1

Tested on macOS/AArch64, Debian Linux/x86_64

On Wed, Jan 3, 2024, at 15:54, David Li wrote:
> Hello,
>
> I would like to propose the following release candidate (RC0) of Apache 
> Arrow ADBC version 0.9.0. This is a release consisting of 34 resolved 
> GitHub issues [1].
>
> This release candidate is based on commit: 
> 37a27717fb94fb84211f1b17486cc8f0be7df59c [2]
>
> The source release rc0 is hosted at [3].
> The binary artifacts are hosted at [4][5][6][7][8].
> The changelog is located at [9].
>
> Please download, verify checksums and signatures, run the unit tests, 
> and vote on the release. See [10] for how to validate a release 
> candidate.
>
> See also a verification result on GitHub Actions [11].
>
> The vote will be open for at least 72 hours.
>
> [ ] +1 Release this as Apache Arrow ADBC 0.9.0
> [ ] +0
> [ ] -1 Do not release this as Apache Arrow ADBC 0.9.0 because...
>
> Note: to verify APT/YUM packages on macOS/AArch64, you must `export 
> DOCKER_DEFAULT_PLATFORM=linux/amd64`. (Or skip this step by `export 
> TEST_APT=0 TEST_YUM=0`.)
>
> [1]: 
> https://github.com/apache/arrow-adbc/issues?q=is%3Aissue+milestone%3A%22ADBC+Libraries+0.9.0%22+is%3Aclosed
> [2]: 
> https://github.com/apache/arrow-adbc/commit/37a27717fb94fb84211f1b17486cc8f0be7df59c
> [3]: 
> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-adbc-0.9.0-rc0/
> [4]: https://apache.jfrog.io/artifactory/arrow/almalinux-rc/
> [5]: https://apache.jfrog.io/artifactory/arrow/debian-rc/
> [6]: https://apache.jfrog.io/artifactory/arrow/ubuntu-rc/
> [7]: 
> https://repository.apache.org/content/repositories/staging/org/apache/arrow/adbc/
> [8]: 
> https://github.com/apache/arrow-adbc/releases/tag/apache-arrow-adbc-0.9.0-rc0
> [9]: 
> https://github.com/apache/arrow-adbc/blob/apache-arrow-adbc-0.9.0-rc0/CHANGELOG.md
> [10]: 
> https://arrow.apache.org/adbc/main/development/releasing.html#how-to-verify-release-candidates
> [11]: https://github.com/apache/arrow-adbc/actions/runs/7401997231


[VOTE] Release Apache Arrow ADBC 0.9.0 - RC0

2024-01-03 Thread David Li
Hello,

I would like to propose the following release candidate (RC0) of Apache Arrow 
ADBC version 0.9.0. This is a release consisting of 34 resolved GitHub issues 
[1].

This release candidate is based on commit: 
37a27717fb94fb84211f1b17486cc8f0be7df59c [2]

The source release rc0 is hosted at [3].
The binary artifacts are hosted at [4][5][6][7][8].
The changelog is located at [9].

Please download, verify checksums and signatures, run the unit tests, and vote 
on the release. See [10] for how to validate a release candidate.

See also a verification result on GitHub Actions [11].

The vote will be open for at least 72 hours.

[ ] +1 Release this as Apache Arrow ADBC 0.9.0
[ ] +0
[ ] -1 Do not release this as Apache Arrow ADBC 0.9.0 because...

Note: to verify APT/YUM packages on macOS/AArch64, you must `export 
DOCKER_DEFAULT_PLATFORM=linux/amd64`. (Or skip this step by `export TEST_APT=0 
TEST_YUM=0`.)

[1]: 
https://github.com/apache/arrow-adbc/issues?q=is%3Aissue+milestone%3A%22ADBC+Libraries+0.9.0%22+is%3Aclosed
[2]: 
https://github.com/apache/arrow-adbc/commit/37a27717fb94fb84211f1b17486cc8f0be7df59c
[3]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-adbc-0.9.0-rc0/
[4]: https://apache.jfrog.io/artifactory/arrow/almalinux-rc/
[5]: https://apache.jfrog.io/artifactory/arrow/debian-rc/
[6]: https://apache.jfrog.io/artifactory/arrow/ubuntu-rc/
[7]: 
https://repository.apache.org/content/repositories/staging/org/apache/arrow/adbc/
[8]: 
https://github.com/apache/arrow-adbc/releases/tag/apache-arrow-adbc-0.9.0-rc0
[9]: 
https://github.com/apache/arrow-adbc/blob/apache-arrow-adbc-0.9.0-rc0/CHANGELOG.md
[10]: 
https://arrow.apache.org/adbc/main/development/releasing.html#how-to-verify-release-candidates
[11]: https://github.com/apache/arrow-adbc/actions/runs/7401997231


Re: What's wrong with my TLS reasoning and FlightServerBase ?

2023-12-30 Thread David Li
Just to be clear - the server never supports both TLS and plaintext connections 
at the same time. (I don't believe this is possible in gRPC.) The URI scheme 
determines how the server listens, so if you don't use grpc+tls:// it will use 
plaintext regardless of whether you pass certificates. The code could do more 
input validation in this case, but the server was never listening with TLS in 
the first place.

On Sat, Dec 30, 2023, at 18:57, Bryce Mecum wrote:
> Hi Rick,
>
> You're right that TLS support is built into PyArrow Flight [1]. I
> think the issue with your code is that your client isn't attempting to
> connect over TLS and that the default behavior of the FlightServerBase
> must be to allow both TLS and non-TLS connections. This seems to be
> similar to how web servers might choose to accept connections over
> HTTP and HTTPS (though many may not).
>
> To make your code fail as you expect, see [1] and, in your client
> code, either change server_location to use
> pyarrow.flight.Location.for_grpc_tls to construct the Location object
> or change your URI to "grpc+tls://localhost:8081" instead of just
> "grpc://localhost:8081". Once you change this, your client should fail
> with an SSL handshake error.
>
> [1] https://arrow.apache.org/docs/python/flight.html#enabling-tls
>
> On Sat, Dec 30, 2023 at 2:20 PM Rick Spencer
>  wrote:
>>
>> I am working on supporting TLS, and it looks like everything that I need is
>> built into FlightServerBase.
>>
>> However, I am struggling to understand how it works, or how to test that it
>> is working. For example, I don't understand why I can pass garbage in for
>> the tls_certs, and still get results when called from a client. Here is a
>> minimal example I put together to show where I am confused.
>>
>> Server that I think should not work:
>> ```python
>> from pyarrow import flight, Table
>>
>> class SampleServer(flight.FlightServerBase):
>>     def __init__(self, *args, **kwargs):
>>         tls_certificates = [("garbage", "garbage")]
>>         location = flight.Location.for_grpc_tcp("localhost", 8081)
>>         super(SampleServer, self).__init__(location,
>>                                            None,
>>                                            tls_certificates,
>>                                            False,
>>                                            None,
>>                                            *args, **kwargs)
>>
>>     def do_get(self, context, ticket):
>>         data = {'col': [1]}
>>         table = Table.from_pydict(data)
>>         return flight.RecordBatchStream(table)
>>
>> if __name__ == "__main__":
>>     server = SampleServer()
>>     server.serve()
>> ```
>>
>> Client code that I think should not work:
>> ```python
>> import pyarrow.flight as fl
>> import json
>>
>> def main():
>>     server_location = "grpc://localhost:8081"
>>
>>     client = fl.FlightClient(server_location)
>>     ticket = fl.Ticket(json.dumps({}))
>>     reader = client.do_get(ticket)
>>     print(reader.read_all().to_pandas())
>>
>> if __name__ == "__main__":
>>     main()
>> ```
>>
>> But when I run the server, and then the client, I get a result:
>> ```
>> % python3 client.py
>>    col
>> 0    1
>> ```
>> I would expect some kind of TLS error.
>>
>> I am sure that I am confused about something, but if someone could help me
>> with my reasoning, I would appreciate it.


Re: [Discussion][C++][FlightRPC] What stage to submit a PR for Flight SQL ODBC driver

2023-12-18 Thread David Li
Yes, but just "arrow" should be fine.

On Mon, Dec 18, 2023, at 12:38, Alina Li wrote:
> Thanks for the confirmation, David. So, the only optional field they 
> need to fill in on the ICLA form is the notify project field, and they 
> need to put "arrow-pmc" in it?


Re: [Discussion][C++][FlightRPC] What stage to submit a PR for Flight SQL ODBC driver

2023-12-15 Thread David Li
You would omit the Apache ID. That is for new committers.

On Fri, Dec 15, 2023, at 12:59, Alina Li wrote:
> Hi David, thanks for the heads up. I'm still learning about the ICLA 
> process, so I have some questions. According to the Contributor agreements 
> page [1], I see that an Apache ID is also needed if a contributor is 
> submitting an ICLA in response to an invitation from a PMC. So, in order 
> for the contributors to check the option to notify the Arrow PMC on the 
> ICLA, should they enter "arrow-pmc" in the notify project field as well 
> as a chosen Apache ID in the preferred Apache ID field?
>
> [1]: https://www.apache.org/licenses/contributor-agreements.html#clas


Re: [Discussion][C++][FlightRPC] What stage to submit a PR for Flight SQL ODBC driver

2023-12-15 Thread David Li
Thanks JB for the clarification!

@Alina when your colleagues file their ICLAs, please make sure to check the 
option to notify the Arrow PMC on the paperwork as otherwise I have no way of 
knowing whether it was filed or not. Thanks!

On Fri, Dec 15, 2023, at 04:21, Jean-Baptiste Onofré wrote:
> For example, I did several code donations on different Apache projects
> (CarbonData, Beam, Karaf, etc) with different companies: we used only
> a software grant and often proposed a few of the original contributors as
> committers (to maintain the code in the ASF project).
> That works for code made as part of a company project.
>
> For code made by individuals, we need an ICLA for each contributor.
>
> Let me know if I can help on this.
>
> Thanks !
> Regards
> JB
>
>
> On Fri, Dec 15, 2023 at 10:18 AM Jean-Baptiste Onofré  
> wrote:
>>
>> Actually, we would need both: CCLA + IP approval from all original
>> contributors (that's the theory).
>> However, as sometimes it's hard to get an ICLA from everyone, when the
>> original contributors worked on the code while employed by a company, a
>> CCLA is enough.
>>
>> Regards
>> JB
>>
>> On Thu, Dec 14, 2023 at 10:20 PM Laurent Goujon
>>  wrote:
>> >
>> > All the persons mentioned in that list were Dremio employees/contractors
>> > at that time, working on the project as part of their contract. Shouldn't
>> > Dremio's CCLA be enough?
>> >
>> > Laurent
>> >
>> > On Thu, Dec 14, 2023 at 8:24 AM David Li  wrote:
>> >
>> > > Ok, I've started to fill out the paperwork [1].
>> > >
>> > > Dremio will need to fill out a Software Grant [2]. When submitting, 
>> > > please
>> > > make a note to notify the Apache Arrow PMC.
>> > >
>> > > Going by the contributors to flightsql-odbc, the following people will
>> > > need to file an ICLA [3]:
>> > > - Alex McRae
>> > > - João Victor Huguenin
>> > > - "joyc-bq" (I was unable to identify them)
>> > >
>> > > *it's possible they already have done this, please let me know if that's
>> > > the case.
>> > >
>> > > These people already have filed an ICLA, either as committers or when we
>> > > did the JDBC driver donation:
>> > > - James Duong
>> > > - Jay Ho
>> > > - Jose Almeida
>> > > - Rafael Telles
>> > >
>> > > [1]:
>> > > https://svn.apache.org/repos/asf/incubator/public/trunk/content/ip-clearance/arrow-flight-sql-odbc.xml
>> > > [2]: https://www.apache.org/licenses/contributor-agreements.html#grants
>> > > [3]: https://www.apache.org/licenses/contributor-agreements.html#clas
>> > >
>> > > On Wed, Dec 13, 2023, at 17:56, David Li wrote:
>> > > > Yes, I can handle this process on the PMC side. Thanks for the help.
>> > > >
>> > > > On Wed, Dec 13, 2023, at 16:55, Laurent Goujon wrote:
>> > > >> David, would you be the PMC contact for the flightsql-odbc driver? If
>> > > yes,
>> > > >> and if there's no prior objection for bringing flightsql-odbc code to
>> > > Arrow
>> > > >> project, I can start the internal discussion to get the legal approval
>> > > from
>> > > >> Dremio and work on the ip clearance with you (JB can be my backup on
>> > > that
>> > > >> one).
>> > > >>
>> > > >> Laurent
>> > > >>
>> > > >> On Tue, Dec 12, 2023 at 12:45 PM Alina Li > > > .invalid>
>> > > >> wrote:
>> > > >>
>> > > >>> David you bring a good point. Regarding the IP clearance for 
>> > > >>> Timestream
>> > > >>> ODBC driver, we are still looking to get the necessary paperwork from
>> > > >>> Amazon. We're also considering using the Ignite ODBC Driver seed [1]
>> > > as a
>> > > >>> replacement for the Timestream seed if it turns out that we're unable to
>> > > obtain
>> > > >>> paperwork from Amazon; we are still discussing this internally and
>> > > will get
>> > > >>> back to the community afterwards.
>> > > >>>
>> > > >>> Regarding paperwork for the Dremio code, thank you Laurent for 
>> > > >>> offering
>> > > >>> your help. Please do let us know if there's anything we can do to help as well.

Re: [Discussion][C++][FlightRPC] What stage to submit a PR for Flight SQL ODBC driver

2023-12-14 Thread David Li
Hmm, the template states "Check that all active committers have a signed CLA on 
record." But I see that other projects have been satisfied with only a software 
grant from the employer. I think it should be OK then. It might be useful to 
have documentation of the Dremio CLA on hand if needed (I see [1] cites the 
InfluxDB CLA explicitly and omits Apache ICLA, for example).

[1]: https://incubator.apache.org/ip-clearance/arrow-rust-object-store.html

On Thu, Dec 14, 2023, at 16:20, Laurent Goujon wrote:
> All the persons mentioned in that list were Dremio employees/contractors
> at that time, working on the project as part of their contract. Shouldn't
> Dremio's CCLA be enough?
>
> Laurent
>
> On Thu, Dec 14, 2023 at 8:24 AM David Li  wrote:
>
>> Ok, I've started to fill out the paperwork [1].
>>
>> Dremio will need to fill out a Software Grant [2]. When submitting, please
>> make a note to notify the Apache Arrow PMC.
>>
>> Going by the contributors to flightsql-odbc, the following people will
>> need to file an ICLA [3]:
>> - Alex McRae
>> - João Victor Huguenin
>> - "joyc-bq" (I was unable to identify them)
>>
>> *it's possible they already have done this, please let me know if that's
>> the case.
>>
>> These people already have filed an ICLA, either as committers or when we
>> did the JDBC driver donation:
>> - James Duong
>> - Jay Ho
>> - Jose Almeida
>> - Rafael Telles
>>
>> [1]:
>> https://svn.apache.org/repos/asf/incubator/public/trunk/content/ip-clearance/arrow-flight-sql-odbc.xml
>> [2]: https://www.apache.org/licenses/contributor-agreements.html#grants
>> [3]: https://www.apache.org/licenses/contributor-agreements.html#clas
>>
>> On Wed, Dec 13, 2023, at 17:56, David Li wrote:
>> > Yes, I can handle this process on the PMC side. Thanks for the help.
>> >
>> > On Wed, Dec 13, 2023, at 16:55, Laurent Goujon wrote:
>> >> David, would you be the PMC contact for the flightsql-odbc driver? If
>> yes,
>> >> and if there's no prior objection for bringing flightsql-odbc code to
>> Arrow
>> >> project, I can start the internal discussion to get the legal approval
>> from
>> >> Dremio and work on the ip clearance with you (JB can be my backup on
>> that
>> >> one).
>> >>
>> >> Laurent
>> >>
>> >> On Tue, Dec 12, 2023 at 12:45 PM Alina Li > .invalid>
>> >> wrote:
>> >>
>> >>> David you bring a good point. Regarding the IP clearance for Timestream
>> >>> ODBC driver, we are still looking to get the necessary paperwork from
>> >>> Amazon. We're also considering using the Ignite ODBC Driver seed [1]
>> as a
>> >>> replacement to the Timestream seed if it shows that we're unable to
>> obtain
>> >>> paperwork from Amazon; we are still discussing this internally and
>> will get
>> >>> back to the community afterwards.
>> >>>
>> >>> Regarding paperwork for the Dremio code, thank you Laurent for offering
>> >>> your help. Please do let us know if there's anything we can do to help
>> as
>> >>> well.
>> >>>
>> >>> [1]:
>> https://github.com/apache/ignite/tree/master/modules/platforms/cpp
>> >>> 
>> >>> From: Laurent Goujon 
>> >>> Sent: Friday, December 8, 2023 11:01 AM
>> >>> To: dev@arrow.apache.org 
>> >>> Subject: Re: [Discussion][C++][FlightRPC] What stage to submit a PR for
>> >>> Flight SQL ODBC driver
>> >>>
>> >>> Am I reading the ticket correctly that this is also about importing
>> some of
>> >>> the Dremio code into Arrow project (namely
>> >>> https://github.com/dremio/flightsql-odbc/). If it is the case, let me
>> >>> check
>> >>> how my company can provide the documentation for the project?
>> >>>
>> >>> On Fri, Dec 8, 2023 at 8:41 AM David Li  wrote:
>> >>>
>> >>> > Thanks for the clarification. That does sound like a nontrivial
>> amount of
>> >>> > code.
>> >>> >
>> >>> > My worry is that we might not be able to get all the paperwork
>> necessary
>> >>> > from Amazon/Amazon contributors for the Timestream part. The
>> >>> > document/guidelines are here [1]. Does that look doable from your
>> end?
>> >>>

Re: [Discussion][C++][FlightRPC] What stage to submit a PR for Flight SQL ODBC driver

2023-12-14 Thread David Li
Ok, I've started to fill out the paperwork [1].

Dremio will need to fill out a Software Grant [2]. When submitting, please make 
a note to notify the Apache Arrow PMC.

Going by the contributors to flightsql-odbc, the following people will need to 
file an ICLA [3]:
- Alex McRae
- João Victor Huguenin
- "joyc-bq" (I was unable to identify them)

*it's possible they already have done this, please let me know if that's the 
case. 

These people already have filed an ICLA, either as committers or when we did 
the JDBC driver donation:
- James Duong
- Jay Ho
- Jose Almeida
- Rafael Telles

[1]: 
https://svn.apache.org/repos/asf/incubator/public/trunk/content/ip-clearance/arrow-flight-sql-odbc.xml
[2]: https://www.apache.org/licenses/contributor-agreements.html#grants
[3]: https://www.apache.org/licenses/contributor-agreements.html#clas

On Wed, Dec 13, 2023, at 17:56, David Li wrote:
> Yes, I can handle this process on the PMC side. Thanks for the help.
>
> On Wed, Dec 13, 2023, at 16:55, Laurent Goujon wrote:
>> David, would you be the PMC contact for the flightsql-odbc driver? If yes,
>> and if there's no prior objection for bringing flightsql-odbc code to Arrow
>> project, I can start the internal discussion to get the legal approval from
>> Dremio and work on the ip clearance with you (JB can be my backup on that
>> one).
>>
>> Laurent
>>
>> On Tue, Dec 12, 2023 at 12:45 PM Alina Li 
>> wrote:
>>
>>> David you bring a good point. Regarding the IP clearance for Timestream
>>> ODBC driver, we are still looking to get the necessary paperwork from
>>> Amazon. We're also considering using the Ignite ODBC Driver seed [1] as a
>>> replacement for the Timestream seed if it turns out that we're unable to obtain
>>> paperwork from Amazon; we are still discussing this internally and will get
>>> back to the community afterwards.
>>>
>>> Regarding paperwork for the Dremio code, thank you Laurent for offering
>>> your help. Please do let us know if there's anything we can do to help as
>>> well.
>>>
>>> [1]: https://github.com/apache/ignite/tree/master/modules/platforms/cpp
>>> 
>>> From: Laurent Goujon 
>>> Sent: Friday, December 8, 2023 11:01 AM
>>> To: dev@arrow.apache.org 
>>> Subject: Re: [Discussion][C++][FlightRPC] What stage to submit a PR for
>>> Flight SQL ODBC driver
>>>
>>> Am I reading the ticket correctly that this is also about importing some of
>>> the Dremio code into Arrow project (namely
>>> https://github.com/dremio/flightsql-odbc/). If it is the case, let me
>>> check
>>> how my company can provide the documentation for the project?
>>>
>>> On Fri, Dec 8, 2023 at 8:41 AM David Li  wrote:
>>>
>>> > Thanks for the clarification. That does sound like a nontrivial amount of
>>> > code.
>>> >
>>> > My worry is that we might not be able to get all the paperwork necessary
>>> > from Amazon/Amazon contributors for the Timestream part. The
>>> > document/guidelines are here [1]. Does that look doable from your end?
>>> >
>>> > [1]: https://incubator.apache.org/ip-clearance/
>>> >
>>> > On Thu, Dec 7, 2023, at 14:30, Alina Li wrote:
>>> > > Hi David. To be on the safer side, I suggest going through IP
>>> > > clearance for [3] the Timestream ODBC driver project, and more code
>>> > > than entry_points.cpp will be used. We had initially planned to use
>>> > > Timestream's entry points code, but it includes more than just
>>> > > entry_points.cpp (code such as [5] odbc.cpp, [6] odbc.h and some other
>>> > > files are part of the entry points), and besides the entry points,
>>> > > we're planning to use Timestream's installers and DSN window as well.
>>> > > Sorry for the confusion.
>>> > >
>>> > > [5]:
>>> > >
>>> >
>>> https://github.com/awslabs/amazon-timestream-odbc-driver/blob/main/src/odbc/src/odbc.cpp
>>> > > [6]:
>>> > >
>>> >
>>> https://github.com/awslabs/amazon-timestream-odbc-driver/blob/main/src/odbc/include/timestream/odbc.h
>>> > > 
>>> > > From: David Li 
>>> > > Sent: Wednesday, December 6, 2023 6:09 AM
>>> > > To: dev@arrow.apache.org 
>>> > > Subject: Re: [Discussion][C++][FlightRPC] What stage to submit a PR for
>>> > > Flight SQL ODBC driver
>>> > >

Re: [Discussion][C++][FlightRPC] What stage to submit a PR for Flight SQL ODBC driver

2023-12-13 Thread David Li
Yes, I can handle this process on the PMC side. Thanks for the help.

On Wed, Dec 13, 2023, at 16:55, Laurent Goujon wrote:
> David, would you be the PMC contact for the flightsql-odbc driver? If yes,
> and if there's no prior objection for bringing flightsql-odbc code to Arrow
> project, I can start the internal discussion to get the legal approval from
> Dremio and work on the ip clearance with you (JB can be my backup on that
> one).
>
> Laurent
>
> On Tue, Dec 12, 2023 at 12:45 PM Alina Li 
> wrote:
>
>> David you bring a good point. Regarding the IP clearance for Timestream
>> ODBC driver, we are still looking to get the necessary paperwork from
>> Amazon. We're also considering using the Ignite ODBC Driver seed [1] as a
>> replacement for the Timestream seed if it turns out that we're unable to obtain
>> paperwork from Amazon; we are still discussing this internally and will get
>> back to the community afterwards.
>>
>> Regarding paperwork for the Dremio code, thank you Laurent for offering
>> your help. Please do let us know if there's anything we can do to help as
>> well.
>>
>> [1]: https://github.com/apache/ignite/tree/master/modules/platforms/cpp
>> 
>> From: Laurent Goujon 
>> Sent: Friday, December 8, 2023 11:01 AM
>> To: dev@arrow.apache.org 
>> Subject: Re: [Discussion][C++][FlightRPC] What stage to submit a PR for
>> Flight SQL ODBC driver
>>
>> Am I reading the ticket correctly that this is also about importing some of
>> the Dremio code into Arrow project (namely
>> https://github.com/dremio/flightsql-odbc/). If it is the case, let me
>> check
>> how my company can provide the documentation for the project?
>>
>> On Fri, Dec 8, 2023 at 8:41 AM David Li  wrote:
>>
>> > Thanks for the clarification. That does sound like a nontrivial amount of
>> > code.
>> >
>> > My worry is that we might not be able to get all the paperwork necessary
>> > from Amazon/Amazon contributors for the Timestream part. The
>> > document/guidelines are here [1]. Does that look doable from your end?
>> >
>> > [1]: https://incubator.apache.org/ip-clearance/
>> >
>> > On Thu, Dec 7, 2023, at 14:30, Alina Li wrote:
>> > > Hi David. To be on the safer side, I suggest going through IP
>> > > clearance for [3] the Timestream ODBC driver project, and more code
>> > > than entry_points.cpp will be used. We had initially planned to use
>> > > Timestream's entry points code, but it includes more than just
>> > > entry_points.cpp (code such as [5] odbc.cpp, [6] odbc.h and some other
>> > > files are part of the entry points), and besides the entry points,
>> > > we're planning to use Timestream's installers and DSN window as well.
>> > > Sorry for the confusion.
>> > >
>> > > [5]:
>> > >
>> >
>> https://github.com/awslabs/amazon-timestream-odbc-driver/blob/main/src/odbc/src/odbc.cpp
>> > > [6]:
>> > >
>> >
>> https://github.com/awslabs/amazon-timestream-odbc-driver/blob/main/src/odbc/include/timestream/odbc.h
>> > > 
>> > > From: David Li 
>> > > Sent: Wednesday, December 6, 2023 6:09 AM
>> > > To: dev@arrow.apache.org 
>> > > Subject: Re: [Discussion][C++][FlightRPC] What stage to submit a PR for
>> > > Flight SQL ODBC driver
>> > >
>> > > Thanks for the update, Alina. This sounds good, my only question for
>> > > the broader community is whether there is enough imported code that we
>> > > should go through the IP clearance process [1]. It's never been clear
>> > > to me what exactly the threshold for this is. flightsql-odbc [2] is
>> > > already quite large on its own and probably we should go through
>> > > clearance? It's not clear to me how much of the Timestream project [3]
>> > > would be involved here, if you mean literally only entry_points.cpp [4]
>> > > (that's probably OK without clearance?) or more code than that.
>> > >
>> > > [1]: https://incubator.apache.org/ip-clearance/
>> > > [2]: https://github.com/dremio/flightsql-odbc
>> > > [3]: https://github.com/awslabs/amazon-timestream-odbc-driver
>> > > [4]:
>> > >
>> >
>> https://github.com/awslabs/amazon-timestream-odbc-driver/blob/main/src/odbc/src/entry_points.cpp
>> > >
>> > > On Tue, Dec 5, 2023, at 18:25, Alina Li wrote:
>> > >>

Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform Module System

2023-12-13 Thread David Li
That seems fine. Is there somewhere more canonical we should be putting the 
libraries? For instance, I poked at netty-tcnative, and it puts its libraries 
under META-INF/native.

On Wed, Dec 13, 2023, at 17:52, James Duong wrote:
> Another issue popped up while doing modules with native libraries.
>
> We currently put native libraries in the arch folder in the root of 
> each built JAR. JPMS is interpreting this as a package name (e.g. x86_64).
>
> This is breaking when a user tries to use multiple modules with native 
> libs (or in the case of dataset, have a module with a native lib depend 
> on another module with a native lib). JPMS sees the arch package in two 
> modules and treats this as a package collision.
>
> I’m thinking we prefix the arch directory with another directory 
> indicating which module the binary came from (e.g. arrow-c-data/x86_64). 
> This will probably require code changes, CMake changes, and test 
> changes.
>
>
> From: Jean-Baptiste Onofré 
> Date: Wednesday, December 13, 2023 at 11:45 AM
> To: dev@arrow.apache.org 
> Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform 
> Module System
> Agree with David on that.
>
> Regards
> JB
>
> On Wed, Dec 13, 2023 at 8:17 PM David Li  wrote:
>>
>> I think we will just have to live with requiring different flags for 
>> different modules. If we can detect the lack of the flag at runtime somehow 
>> and present an appropriate error message (the way we do for the memory 
>> module) that would be best.
>>
>> On Wed, Dec 13, 2023, at 12:33, James Duong wrote:
>> > For the work on modularization, I’ve merged the flight-grpc and
>> > flight-core modules. I haven’t moved any other packages.
>> >
>> > I also encountered this problem with flight-core. Flight-core derives
>> > from io.netty.buffer (specifically CompositeByteBuf), but we cannot use
>> > Netty buffer with JPMS without requiring users to monkey-patch it with
>> > patch-module. I have flight-core using Netty buffer on the classpath,
>> > but deriving from CompositeByteBuf requires adding
>> > --add-reads=org.apache.arrow.flight.core=ALL-UNNAMED to the command
>> > line since Netty’s being treated as part of the unnamed module in this
>> > context.
>> >
>> > This isn’t terribly onerous from the user’s perspective (especially
>> > compared to monkey-patching), but we start getting into the business of
>> > having different command-line arguments for different modules the user
>> > might be using.
>> >
>> > We could tell users to always add
>> > –add-reads=org.apache.arrow.flight.core=ALL-UNNAMED, but if they aren’t
>> > using Flight, they’ll get a warning about an unrecognized module. Any
>> > thoughts about the best way to present these command-line changes?
>> >
>> >
>> > From: Jean-Baptiste Onofré 
>> > Date: Wednesday, December 13, 2023 at 6:58 AM
>> > To: dev@arrow.apache.org 
>> > Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform
>> > Module System
>> > Hi,
>> >
>> > We have a "split packages" between flight-core and flight-grpc. I
>> > don't think anyone is using only flight-core to create a new
>> > "transport".
>> > I think it's acceptable to combine flight-core and flight-grpc, and
>> > maybe reshape a bit (imho some classes from flight-grpc should be
>> > actually in flight-core).
>> > Users can always reshade (even if it's painful).
>> >
>> > Regards
>> > JB
>> >
>> > On Tue, Dec 12, 2023 at 8:52 PM David Li  wrote:
>> >>
>> >> I would be OK with combining flight-core and flight-grpc. It's not clear 
>> >> to me anyways that the current split would actually be helpful if we do 
>> >> ever want to support Flight without gRPC.
>> >>
>> >> On Tue, Dec 12, 2023, at 14:21, James Duong wrote:
>> >> > An update on this work.
>> >> >
>> >> > I’ve been able to update the following to JPMS modules now:
>> >> >
>> >> >   *   arrow-memory-core
>> >> >   *   arrow-memory-unsafe
>> >> >   *   arrow-format
>> >> >   *   arrow-vector
>> >> >   *   arrow-compression
>> >> >   *   arrow-tools
>> >> >   *   arrow-avro
>> >> >   *   arrow-jdbc
>> >> >   *   arrow-algorithm
>> >> >
>> >> > I don’t think the following should be modularized (but would appreciate
>>
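
To make the flag discussed above concrete, an application on the module path that uses flight-core would be launched with something like the following (the application module name and paths are illustrative placeholders, not taken from the thread):

```
java \
  --module-path target/modules \
  --add-reads=org.apache.arrow.flight.core=ALL-UNNAMED \
  --module com.example.app/com.example.app.Main
```

Here --add-reads adds a reads edge from the org.apache.arrow.flight.core module to the unnamed module, i.e. to whatever (such as Netty's buffer classes) ended up on the classpath.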

Re: [DISCUSS] Fix Arrow Flight Core tests

2023-12-13 Thread David Li
Could we include submodules in the source distribution instead? 

Also, you can still build from the source distribution; you just have to pass 
-DskipTests.
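For reference, the two build paths being discussed can be sketched as follows (assumes git, Maven, and network access are available; commands are illustrative, not a verbatim transcript):

```shell
# From a git checkout: fetch the testing submodule so TestTls can find
# its certificates, then run the full build.
git clone https://github.com/apache/arrow.git
cd arrow
git submodule update --init   # populates testing/data (cert0.pem, etc.)
mvn clean install

# From the release source tarball: no git metadata, so skip the tests
# that depend on the submodule.
mvn clean install -DskipTests
```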

On Wed, Dec 13, 2023, at 09:54, Jean-Baptiste Onofré wrote:
> Hi David and folks,
>
> I checked the release source distribution and I think we should maybe
> add a profile to include the TestTls-related tests.
>
> As an Apache project, we are supposed to be able to build from the
> source distribution without any external requirements (in the case the
> project goes to the attic or someone wants to create a new release on
> an old branch).
>
> Downloading 
> https://dist.apache.org/repos/dist/release/arrow/arrow-14.0.1/apache-arrow-14.0.1.tar.gz
> and trying to build with:
>
>   mvn clean install
>
> fails as git submodule is required.
>
> So, I would propose to:
> 1. Exclude TestTls related tests by default
> 2. Add a new profile (tlsTests) including TestTls tests
> 3. Activate the profile on CI (-PtlsTests)
>
> Thoughts ?
>
> Regards
> JB
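A sketch of what such a profile might look like in the relevant pom.xml (the profile id and -PtlsTests activation follow the proposal above; the surefire wiring is an assumption, and the default build would correspondingly exclude **/TestTls* in its own surefire configuration):

```xml
<!-- Excluded by default; activated on CI with -PtlsTests -->
<profile>
  <id>tlsTests</id>
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
        <configuration>
          <includes>
            <include>**/TestTls*.java</include>
          </includes>
        </configuration>
      </plugin>
    </plugins>
  </build>
</profile>
```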
>
>
> On Mon, Dec 11, 2023 at 3:00 PM David Li  wrote:
>>
>> You can `git submodule update --init` to get the files. This is documented 
>> in the environment setup [1], though the failing assertion could be more 
>> helpful about pointing this out.
>>
>> [1]: https://arrow.apache.org/docs/dev/developers/java/building.html#building
>>
>> On Mon, Dec 11, 2023, at 08:49, Jean-Baptiste Onofré wrote:
>> > Hi guys,
>> >
>> > I noticed that Arrow Flight Core doesn't build "out of the box" due to
>> > the TestTls failing.
> The reason TestTls is failing is that it tries to read
>> > cert0.pem from the testing/data folder (at project root), but testing
>> > is empty by default.
> If I create a cert0.pem by hand (with a self-signed key), it works.
>> >
>> > So, I propose three options:
>> > 1. We document the required tests to build Arrow java cleanly
>> > 2. We create a default self signed cert0.pem in testing/data that we 
>> > commit.
>> > 3. We add a before step to the test to create the pem file
>> >
>> > My preference would be for 3.
>> >
>> > Thoughts ?
>> >
>> > Regards
>> > JB
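As a sketch of option 3, a test setup step could generate a throwaway self-signed certificate before TestTls runs (file names follow the thread; the exact openssl invocation is an assumption):

```shell
# Generate a passphrase-free self-signed cert/key pair for localhost
# into testing/data, valid for one day, before running TestTls.
mkdir -p testing/data
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout testing/data/key0.pem \
  -out testing/data/cert0.pem \
  -subj "/CN=localhost" -days 1
```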


Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform Module System

2023-12-13 Thread David Li
I think we will just have to live with requiring different flags for different 
modules. If we can detect the lack of the flag at runtime somehow and present 
an appropriate error message (the way we do for the memory module) that would 
be best.
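A minimal sketch of such a runtime check (class name and error message are hypothetical; the real Arrow memory-module check may differ): a named module that needs classes from the classpath can ask whether it reads the unnamed module and fail fast with a message naming the missing flag.

```java
// Sketch: fail fast with a helpful message if the current module cannot
// read the unnamed module (i.e. --add-reads was not supplied).
public class ModuleFlagCheck {
  static boolean canReadUnnamed(Class<?> anchor) {
    Module self = anchor.getModule();
    Module unnamed = anchor.getClassLoader().getUnnamedModule();
    return self.canRead(unnamed);
  }

  public static void main(String[] args) {
    if (!canReadUnnamed(ModuleFlagCheck.class)) {
      throw new IllegalStateException(
          "Run with --add-reads=org.apache.arrow.flight.core=ALL-UNNAMED");
    }
    System.out.println("module can read unnamed module");
  }
}
```

Run from the classpath the check always passes (the unnamed module reads every module); it only fires when the class is loaded as a named module without the flag.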

On Wed, Dec 13, 2023, at 12:33, James Duong wrote:
> For the work on modularization, I’ve merged the flight-grpc and 
> flight-core modules. I haven’t moved any other packages.
>
> I also encountered this problem with flight-core. Flight-core derives 
> from io.netty.buffer (specifically CompositeByteBuf), but we cannot use 
> Netty buffer with JPMS without requiring users to monkey-patch it with 
> patch-module. I have flight-core using Netty buffer on the classpath, 
> but deriving from CompositeByteBuf requires adding 
> --add-reads=org.apache.arrow.flight.core=ALL-UNNAMED to the command 
> line since Netty’s being treated as part of the unnamed module in this 
> context.
>
> This isn’t terribly onerous from the user’s perspective (especially 
> compared to monkey-patching), but we start getting into the business of 
> having different command-line arguments for different modules the user 
> might be using.
>
> We could tell users to always add 
> --add-reads=org.apache.arrow.flight.core=ALL-UNNAMED, but if they aren’t 
> using Flight, they’ll get a warning about an unrecognized module. Any 
> thoughts about the best way to present these command-line changes?
>
>
> From: Jean-Baptiste Onofré 
> Date: Wednesday, December 13, 2023 at 6:58 AM
> To: dev@arrow.apache.org 
> Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform 
> Module System
> Hi,
>
> We have a "split packages" between flight-core and flight-grpc. I
> don't think anyone is using only flight-core to create a new
> "transport".
> I think it's acceptable to combine flight-core and flight-grpc, and
> maybe reshape a bit (imho some classes from flight-grpc should be
> actually in flight-core).
> Users can always reshade (even if it's painful).
>
> Regards
> JB
>
> On Tue, Dec 12, 2023 at 8:52 PM David Li  wrote:
>>
>> I would be OK with combining flight-core and flight-grpc. It's not clear to 
>> me anyways that the current split would actually be helpful if we do ever 
>> want to support Flight without gRPC.
>>
>> On Tue, Dec 12, 2023, at 14:21, James Duong wrote:
>> > An update on this work.
>> >
>> > I’ve been able to update the following to JPMS modules now:
>> >
>> >   *   arrow-memory-core
>> >   *   arrow-memory-unsafe
>> >   *   arrow-format
>> >   *   arrow-vector
>> >   *   arrow-compression
>> >   *   arrow-tools
>> >   *   arrow-avro
>> >   *   arrow-jdbc
>> >   *   arrow-algorithm
>> >
>> > I don’t think the following should be modularized (but would appreciate
>> > feedback as well):
>> >
>> >   *   arrow-performance (test-only module)
>> >   *   flight-integration-tests (test-only module)
>> >   *   flight-jdbc-core
>> >   *   flight-jdbc-driver (Need to think about this. MySQL suggests they
>> > want to modularize their JDBC driver:
>> > https://bugs.mysql.com/bug.php?id=97683 . How does this interact with
>> > the ServiceLoader changes in JPMS as well).
>> >
>> > I’m starting the flight-related modules now. The first issue I’ve
>> > noticed is that flight-core and flight-grpc reuse packages. I’d like to
>> > combine flight-core and flight-grpc because of this (currently we only
>> > have Flight using grpc in Java).
>> >
>> > After Flight I’d take a look at the modules that have native code:
>> >
>> >   *   arrow-c-data
>> >   *   arrow-orc
>> >   *   arrow-dataset
>> >   *   arrow-gandiva
>> >
>> >
>> > From: James Duong 
>> > Date: Tuesday, December 5, 2023 at 1:39 PM
>> > To: dev@arrow.apache.org 
>> > Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform
>> > Module System
>> > I expect that we’d still have to change CLI arguments in JDK17.
>> >
>> > The need for changing the CLI arguments is due to memory-core now being
>> > a named module while requiring access to Unsafe and using reflection
>> > for accessing internals of java.nio.Buffer.
>> >
>> > We’re using JDK8 code for doing both currently. It might be worth
>> > looking into if there’s a JDK9+ way of doing this without needing
>> > reflection and compiling memory-core as a multi-release JAR.
>> >
>> > I don’t expect more CLI argument

Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform Module System

2023-12-12 Thread David Li
I would be OK with combining flight-core and flight-grpc. It's not clear to me 
anyways that the current split would actually be helpful if we do ever want to 
support Flight without gRPC.

On Tue, Dec 12, 2023, at 14:21, James Duong wrote:
> An update on this work.
>
> I’ve been able to update the following to JPMS modules now:
>
>   *   arrow-memory-core
>   *   arrow-memory-unsafe
>   *   arrow-format
>   *   arrow-vector
>   *   arrow-compression
>   *   arrow-tools
>   *   arrow-avro
>   *   arrow-jdbc
>   *   arrow-algorithm
>
> I don’t think the following should be modularized (but would appreciate 
> feedback as well):
>
>   *   arrow-performance (test-only module)
>   *   flight-integration-tests (test-only module)
>   *   flight-jdbc-core
>   *   flight-jdbc-driver (Need to think about this. MySQL suggests they 
> want to modularize their JDBC driver: 
> https://bugs.mysql.com/bug.php?id=97683 . How does this interact with 
> the ServiceLoader changes in JPMS as well).
>
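On the ServiceLoader question: under JPMS, a modularized JDBC driver typically declares its registration with a `provides` directive in module-info.java, while the classic META-INF/services file keeps serving classpath users. A sketch, with module and class names as illustrative assumptions rather than Arrow's actual names:

```java
// Sketch of a JPMS module descriptor for a JDBC driver (names hypothetical).
// ServiceLoader on the module path picks up the 'provides' directive.
module org.apache.arrow.flight.jdbc.driver {
    requires java.sql;
    provides java.sql.Driver
        with org.apache.arrow.driver.jdbc.ArrowFlightJdbcDriver;
}
```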
> I’m starting the flight-related modules now. The first issue I’ve 
> noticed is that flight-core and flight-grpc reuse packages. I’d like to 
> combine flight-core and flight-grpc because of this (currently we only 
> have Flight using grpc in Java).
>
> After Flight I’d take a look at the modules that have native code:
>
>   *   arrow-c-data
>   *   arrow-orc
>   *   arrow-dataset
>   *   arrow-gandiva
>
>
> From: James Duong 
> Date: Tuesday, December 5, 2023 at 1:39 PM
> To: dev@arrow.apache.org 
> Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform 
> Module System
> I expect that we’d still have to change CLI arguments in JDK17.
>
> The need for changing the CLI arguments is due to memory-core now being 
> a named module while requiring access to Unsafe and using reflection 
> for accessing internals of java.nio.Buffer.
>
> We’re using JDK8 code for doing both currently. It might be worth 
> looking into if there’s a JDK9+ way of doing this without needing 
> reflection and compiling memory-core as a multi-release JAR.
>
> I don’t expect more CLI argument changes unless we use reflection on 
> NIO classes in other modules.
>
>
> From: Adam Lippai 
> Date: Tuesday, December 5, 2023 at 9:28 AM
> To: dev@arrow.apache.org 
> Subject: Re: [DISC][Java]: Migrate Arrow Java to JPMS Java Platform 
> Module System
> I believe Spark 4.0 was mentioned before. It’ll require Java 17 and 
> will be
> released in a few months (June?).
>
> Best regards,
> Adam Lippai
>
> On Tue, Dec 5, 2023 at 12:05 David Li  wrote:
>
>> Thanks James for delving into this mess.
>>
>> It looks like this change is unavoidable if we want to modularize? I think
>> this is OK. Will the CLI argument change as we continue modularizing, or is
>> this the only change that will be needed?
>>
>> On Mon, Dec 4, 2023, at 20:07, James Duong wrote:
>> > Hello,
>> >
>> > I did some work to separate the below PR into smaller PRs.
>> >
>> >
>> >   *   Updating the versions of dependencies and maven plugins is done
>> > and merged into master.
>> >   *   I separated out the work modularizing arrow-vector,
>> > arrow-memory-core/unsafe, and arrow-memory-netty.
>> >
>> > Modularizing arrow-memory-core requires a smaller change to user
>> > command-line arguments. Instead of:
>> > --add-opens=java.base/java.nio=ALL-UNNAMED
>> >
>> > The user needs to add:
>> > --add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED
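Put together with the flight-core flag discussed elsewhere in this thread, a launch command might look like this (the application jar name is hypothetical):

```shell
java \
  --add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED \
  --add-reads=org.apache.arrow.flight.core=ALL-UNNAMED \
  -jar my-arrow-app.jar
```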
>> >
>> > I initially tried to modularize arrow-vector separately from
>> > arrow-memory-core but found that any meaningful operation in
>> > arrow-vector would trigger an illegal access in memory-core if it
>> > wasn’t modularized.
>> >
>> > I was able to run the tests for arrow-compression and arrow-tools
>> > successfully after modularizing memory-core, memory-unsafe-, and
>> > arrow-vector. Note that I had more success by making memory-core and
>> > memory-unsafe automatic modules.
>> >
>> > I think we should make a decision here on if we want to bite the bullet
>> > and introduce a breaking user-facing change around command-line
>> > options. The other option is to wait for JDK 21 to modularize. That’s
>> > farther down the line and requires refactoring much of the memory
>> > module code and implementing a module using the foreign memory
>> > interface.
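For context on that last option, a minimal sketch of off-heap allocation through the foreign memory API finalized in JDK 21 (java.lang.foreign), as an alternative to Unsafe plus reflection; this is illustrative only, not Arrow code:

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public class OffHeapSketch {
  // Allocate 16 bytes off-heap, write an int, and read it back.
  static int roundTrip() {
    try (Arena arena = Arena.ofConfined()) {
      MemorySegment segment = arena.allocate(16);
      segment.set(ValueLayout.JAVA_INT, 0, 42);
      return segment.get(ValueLayout.JAVA_INT, 0);
    } // memory is freed deterministically when the arena closes
  }

  public static void main(String[] args) {
    System.out.println(roundTrip()); // prints 42
  }
}
```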
>> >
>> > From: James Duong 
>> > Date: Tuesday, November 28, 2023 at 6:48 PM
>> > To
