Community Over Code Asia 2024 Travel Assistance Applications now open!

2024-02-20 Thread Gavin McDonald
Hello to all users, contributors and Committers!

The Travel Assistance Committee (TAC) are pleased to announce that
travel assistance applications for Community over Code Asia 2024 are now
open!

We will be supporting Community over Code Asia, Hangzhou, China
July 26th - 28th, 2024.

TAC exists to help those that would like to attend Community over Code
events, but are unable to do so for financial reasons. For more info
on this year's applications and qualifying criteria, please visit the
TAC website at < https://tac.apache.org/ >. Applications are already
open on https://tac-apply.apache.org/, so don't delay!

The Apache Travel Assistance Committee will only be accepting
applications from those people that are able to attend the full event.

Important: Applications close on Friday, May 10th, 2024.

Applicants have until the closing date above to submit their
applications, which should contain as much supporting material as
required to process the request efficiently and accurately. This
will enable TAC to announce successful applications shortly
afterwards.

As usual, TAC expects to deal with a range of applications from a
diverse range of backgrounds; therefore, we encourage (as always)
anyone thinking about sending in an application to do so ASAP.

For those who will need a visa to enter the country: we advise you to
apply now so that you have enough time in case of interview delays. Do
not wait until you know whether you have been accepted.

We look forward to greeting many of you in Hangzhou, China in July, 2024!

Kind Regards,

Gavin

(On behalf of the Travel Assistance Committee)


Re: [VOTE] Release Apache Arrow ADBC 0.10.0 - RC1

2024-02-20 Thread Jean-Baptiste Onofré
+1 (non-binding)

I quickly tested on macOS arm64.

Regards
JB

On Sun, Feb 18, 2024 at 9:47 PM David Li wrote:
>
> Hello,
>
> I would like to propose the following release candidate (RC1) of Apache Arrow 
> ADBC version 0.10.0. This is a release consisting of 30 resolved GitHub 
> issues [1].
>
> This release candidate is based on commit: 
> 9a8e44cc62f23a68ffc0d3d4c7362214b221bea0 [2]
>
> The source release rc1 is hosted at [3].
> The binary artifacts are hosted at [4][5][6][7][8].
> The changelog is located at [9].
>
> Please download, verify checksums and signatures, run the unit tests, and 
> vote on the release. See [10] for how to validate a release candidate.
>
> See also a verification result on GitHub Actions [11].
>
> The vote will be open for at least 72 hours.
>
> [ ] +1 Release this as Apache Arrow ADBC 0.10.0
> [ ] +0
> [ ] -1 Do not release this as Apache Arrow ADBC 0.10.0 because...
>
> Note: to verify APT/YUM packages on macOS/AArch64, you must `export 
> DOCKER_DEFAULT_PLATFORM=linux/amd64`. (Or skip this step by `export 
> TEST_APT=0 TEST_YUM=0`.)
>
> [1]: 
> https://github.com/apache/arrow-adbc/issues?q=is%3Aissue+milestone%3A%22ADBC+Libraries+0.10.0%22+is%3Aclosed
> [2]: 
> https://github.com/apache/arrow-adbc/commit/9a8e44cc62f23a68ffc0d3d4c7362214b221bea0
> [3]: 
> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-adbc-0.10.0-rc1/
> [4]: https://apache.jfrog.io/artifactory/arrow/almalinux-rc/
> [5]: https://apache.jfrog.io/artifactory/arrow/debian-rc/
> [6]: https://apache.jfrog.io/artifactory/arrow/ubuntu-rc/
> [7]: 
> https://repository.apache.org/content/repositories/staging/org/apache/arrow/adbc/
> [8]: 
> https://github.com/apache/arrow-adbc/releases/tag/apache-arrow-adbc-0.10.0-rc1
> [9]: 
> https://github.com/apache/arrow-adbc/blob/apache-arrow-adbc-0.10.0-rc1/CHANGELOG.md
> [10]: 
> https://arrow.apache.org/adbc/main/development/releasing.html#how-to-verify-release-candidates
> [11]: https://github.com/apache/arrow-adbc/actions/runs/7951302316
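
The "verify checksums" step requested above can be sketched in Python as
follows. This is a minimal illustration using only the standard library, not
the project's verification script (see [10] for that). The function names are
hypothetical, and the code assumes the checksum file starts with the hex
SHA-512 digest, as in the common "<digest>  <file name>" layout. Signature
verification is a separate step and is not covered here.

```python
import hashlib
from pathlib import Path


def sha512_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-512 digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(artifact: Path, checksum_file: Path) -> bool:
    """Compare an artifact's digest with the first token of its checksum file.

    Assumes the checksum file has the "<hex digest>  <file name>" layout.
    """
    expected = checksum_file.read_text().split()[0].lower()
    return sha512_of(artifact) == expected
```

A caller would pass the downloaded source archive and the checksum file that
sits next to it, and treat a False result as a reason to vote -1.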


[RESULT][VOTE] Explicit session management for Flight RPC

2024-02-20 Thread David Li
The vote passes with 3 binding, 3 non-binding +1 votes. Thanks all!

On Mon, Feb 19, 2024, at 09:16, David Li wrote:
> My vote: +1
>
> On Sun, Feb 18, 2024, at 07:06, Joel Lubinitsky wrote:
>> +1
>>
>> On Fri, Feb 16, 2024 at 1:07 PM Andrew Lamb  wrote:
>>
>>> +1
>>>
>>> On Fri, Feb 16, 2024 at 1:46 AM Jean-Baptiste Onofré 
>>> wrote:
>>>
>>> > +1
>>> >
>>> > Regards
>>> > JB
>>> >
>>> > On Wed, Feb 14, 2024 at 5:38 PM David Li  wrote:
>>> > >
>>> > > Paul Nienaber would like to propose explicit session management for
>>> > Flight RPC.  This proposal was previously discussed at [1].  A candidate
>>> > implementation for C++ and Java is at [2].
>>> > >
>>> > > The vote will be open for at least 72 hours.
>>> > >
>>> > > [ ] +1
>>> > > [ ] +0
>>> > > [ ] -1 Do not accept this proposal because...
>>> > >
>>> > > [1]: https://lists.apache.org/thread/fd6r1n7vt91sg2c7fr35wcrsqz6x4645
>>> > > [2]: https://github.com/apache/arrow/pull/34817
>>> >
>>>


Re: [VOTE][RUST][DataFusion] Release Apache Arrow DataFusion 36.0.0 RC1

2024-02-20 Thread Andy Grove
I went ahead and published with --no-verify, and I started a vote on 36.0.1
to resolve the circular dependency issue.

On Mon, Feb 19, 2024 at 12:16 PM Andy Grove wrote:

> The vote passes with three binding votes. Thank you all.
>
> The source release has been published, but unfortunately, I cannot publish
> it to crates.io because a circular dependency has been introduced between
> crates. I have filed an issue to track this.
>
> https://github.com/apache/arrow-datafusion/issues/9277
>
> I propose that we invest in a CI check for this somehow, as we have seen
> this at least once before, and this kind of change can potentially be
> disruptive to undo. I have filed an issue for this as well:
>
> https://github.com/apache/arrow-datafusion/issues/9278
>
> Thanks,
>
> Andy.
>
> On Sat, Feb 17, 2024 at 2:25 AM Andrew Lamb  wrote:
>
>> +1 (binding)
>>
>> Verified on M3 Mac
>>
>> Thank you for keeping the release train humming, Andy
>>
>> Andrew
>>
>> On Fri, Feb 16, 2024 at 12:23 PM L. C. Hsieh  wrote:
>>
>> > +1 (binding)
>> >
>> > Verified on M3 Mac.
>> >
>> > Thanks Andy.
>> >
>> >
>> > On Fri, Feb 16, 2024 at 9:08 AM Andy Grove 
>> wrote:
>> > >
>> > > Hi,
>> > >
>> > > I would like to propose a release of Apache Arrow DataFusion
>> > Implementation,
>> > > version 36.0.0.
>> > >
>> > > This release candidate is based on commit:
>> > > bf6f83b3d228fb386f9b4b20c254fa58e2412660 [1]
>> > > The proposed release tarball and signatures are hosted at [2].
>> > > The changelog is located at [3].
>> > >
>> > > Please download, verify checksums and signatures, run the unit tests,
>> and
>> > > vote
>> > > on the release. The vote will be open for at least 72 hours.
>> > >
>> > > Only votes from PMC members are binding, but all members of the
>> community
>> > > are
>> > > encouraged to test the release and vote with "(non-binding)".
>> > >
>> > > The standard verification procedure is documented at
>> > >
>> >
>> https://github.com/apache/arrow-datafusion/blob/main/dev/release/README.md#verifying-release-candidates
>> > > .
>> > >
>> > > [ ] +1 Release this as Apache Arrow DataFusion 36.0.0
>> > > [ ] +0
>> > > [ ] -1 Do not release this as Apache Arrow DataFusion 36.0.0
>> because...
>> > >
>> > > Here is my vote:
>> > >
>> > > +1
>> > >
>> > > [1]:
>> > >
>> >
>> https://github.com/apache/arrow-datafusion/tree/bf6f83b3d228fb386f9b4b20c254fa58e2412660
>> > > [2]:
>> > >
>> >
>> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-36.0.0-rc1
>> > > [3]:
>> > >
>> >
>> https://github.com/apache/arrow-datafusion/blob/bf6f83b3d228fb386f9b4b20c254fa58e2412660/CHANGELOG.md
>> >
>>
>
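
The CI check proposed in this thread amounts to detecting a cycle in the
crate dependency graph. The sketch below shows one way to do that with a
depth-first search. It is only an illustration: the crate names are
hypothetical, and how the graph would be extracted from the workspace (for
example from the output of `cargo metadata`) is left out.

```python
from typing import Dict, List, Optional


def find_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of names, or None if acyclic."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {name: WHITE for name in deps}
    stack: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GRAY
        stack.append(node)
        for dep in deps.get(node, []):
            state = color.get(dep, WHITE)
            if state == GRAY:
                # dep is already on the current path, so we closed a loop.
                return stack[stack.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for name in list(deps):
        if color[name] == WHITE:
            found = visit(name)
            if found:
                return found
    return None


# Hypothetical crates: "crate-c" depends back on "crate-a".
graph = {
    "crate-a": ["crate-b"],
    "crate-b": ["crate-c"],
    "crate-c": ["crate-a"],
}
cycle = find_cycle(graph)
```

A CI job built on this idea would fail the build whenever `find_cycle`
returns a non-empty list and print the offending path.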


[RESULT][VOTE][RUST][DataFusion] Release Apache Arrow DataFusion 36.0.0 RC1

2024-02-20 Thread Andy Grove
On Tue, Feb 20, 2024 at 7:08 AM Andy Grove wrote:

> I went ahead and published with --no-verify, and I started a vote on
> 36.0.1 to resolve the circular dependency issue
>
> On Mon, Feb 19, 2024 at 12:16 PM Andy Grove  wrote:
>
>> The vote passes with three binding votes. Thank you all.
>>
>> The source release has been published, but unfortunately, I cannot
>> publish it to crates.io because a circular dependency has been
>> introduced between crates. I have filed an issue to track this.
>>
>> https://github.com/apache/arrow-datafusion/issues/9277
>>
>> I propose that we invest in a CI check for this somehow, as we have seen
>> this at least once before, and this kind of change can potentially be
>> disruptive to undo. I have filed an issue for this as well:
>>
>> https://github.com/apache/arrow-datafusion/issues/9278
>>
>> Thanks,
>>
>> Andy.
>>
>> On Sat, Feb 17, 2024 at 2:25 AM Andrew Lamb  wrote:
>>
>>> +1 (binding)
>>>
>>> Verified on M3 Mac
>>>
>>> Thank you for keeping the release train humming, Andy
>>>
>>> Andrew
>>>
>>> On Fri, Feb 16, 2024 at 12:23 PM L. C. Hsieh  wrote:
>>>
>>> > +1 (binding)
>>> >
>>> > Verified on M3 Mac.
>>> >
>>> > Thanks Andy.
>>> >
>>> >
>>> > On Fri, Feb 16, 2024 at 9:08 AM Andy Grove 
>>> wrote:
>>> > >
>>> > > Hi,
>>> > >
>>> > > I would like to propose a release of Apache Arrow DataFusion
>>> > Implementation,
>>> > > version 36.0.0.
>>> > >
>>> > > This release candidate is based on commit:
>>> > > bf6f83b3d228fb386f9b4b20c254fa58e2412660 [1]
>>> > > The proposed release tarball and signatures are hosted at [2].
>>> > > The changelog is located at [3].
>>> > >
>>> > > Please download, verify checksums and signatures, run the unit
>>> tests, and
>>> > > vote
>>> > > on the release. The vote will be open for at least 72 hours.
>>> > >
>>> > > Only votes from PMC members are binding, but all members of the
>>> community
>>> > > are
>>> > > encouraged to test the release and vote with "(non-binding)".
>>> > >
>>> > > The standard verification procedure is documented at
>>> > >
>>> >
>>> https://github.com/apache/arrow-datafusion/blob/main/dev/release/README.md#verifying-release-candidates
>>> > > .
>>> > >
>>> > > [ ] +1 Release this as Apache Arrow DataFusion 36.0.0
>>> > > [ ] +0
>>> > > [ ] -1 Do not release this as Apache Arrow DataFusion 36.0.0
>>> because...
>>> > >
>>> > > Here is my vote:
>>> > >
>>> > > +1
>>> > >
>>> > > [1]:
>>> > >
>>> >
>>> https://github.com/apache/arrow-datafusion/tree/bf6f83b3d228fb386f9b4b20c254fa58e2412660
>>> > > [2]:
>>> > >
>>> >
>>> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-36.0.0-rc1
>>> > > [3]:
>>> > >
>>> >
>>> https://github.com/apache/arrow-datafusion/blob/bf6f83b3d228fb386f9b4b20c254fa58e2412660/CHANGELOG.md
>>> >
>>>
>>


Re: [DISCUSS] Flight RPC: add 'fallback' URI scheme

2024-02-20 Thread David Li
Thanks for the comments - I've updated the implementation [1] and added Go + 
integration tests. If this all checks out I'd like to start a vote soon.

[1]: https://github.com/apache/arrow/pull/40084
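
As a rough illustration of how a client might treat the
'arrow-flight-reuse-connection' scheme mentioned in the quoted messages
below, here is a minimal Python sketch. It is not the implementation in [1]
and does not use the Flight API: `fetch_existing` and `fetch_remote` are
hypothetical callbacks standing in for "read over the connection I already
have" and "connect to this location and read". The left-to-right scan mirrors
what the thread describes for the ADBC/JDBC drivers.

```python
from typing import Callable, Iterable, Optional, TypeVar
from urllib.parse import urlsplit

# Scheme name taken from the thread; everything else here is illustrative.
REUSE_SCHEME = "arrow-flight-reuse-connection"

T = TypeVar("T")


def is_reuse_connection(location: str) -> bool:
    """True if the location asks the client to reuse its current connection."""
    return urlsplit(location).scheme == REUSE_SCHEME


def fetch_from_first_working(
    locations: Iterable[str],
    fetch_existing: Callable[[], T],
    fetch_remote: Callable[[str], T],
) -> T:
    """Try each location in order and return the first successful result."""
    last_error: Optional[Exception] = None
    for location in locations:
        try:
            if is_reuse_connection(location):
                return fetch_existing()
            return fetch_remote(location)
        except ConnectionError as exc:
            last_error = exc  # remember the failure and try the next one
    raise ConnectionError("no usable location") from last_error
```

Python's `urlsplit` accepts both 'scheme://' and 'scheme://?', so the
java.net.URI caveat discussed below does not arise in this sketch.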

On Fri, Feb 16, 2024, at 13:43, Andrew Lamb wrote:
> Thank you -- I think the use case is great, but I agree with the other
> reviewers that the name may be confusing. I left some notes on the ticket.
>
> Andrew
>
> On Wed, Feb 14, 2024 at 3:52 PM David Li  wrote:
>
>> I've put up a candidate implementation sans integration test [1].
>>
>> Some caveats:
>> - java.net.URI doesn't accept 'scheme://', only 'scheme:/' or 'scheme://?'
>> (yes, an empty query string pacifies it). I've chosen the latter since the
>> former is technically a URI with a non-empty path, but neither is ideal.
>> - I've changed the scheme to 'arrow-flight-reuse-connection' to be more
>> faithful to the intended use than 'fallback'.
>>
>> [1]: https://github.com/apache/arrow/pull/40084
>>
>> On Tue, Feb 13, 2024, at 13:01, Jean-Baptiste Onofré wrote:
>> > Hi David,
>> >
>> > It's reasonable. I think we can start with your initial proposal (it
>> > sounds fine to me) and we can always improve step by step.
>> >
>> > Thanks !
>> > Regards
>> > JB
>> >
>> > On Tue, Feb 13, 2024 at 4:53 PM David Li  wrote:
>> >>
>> >> I'm going to keep the proposal as-is then. It can be extended if this
>> use case comes up.
>> >>
>> >> I'll start work on candidate implementations now.
>> >>
>> >> On Tue, Feb 13, 2024, at 03:22, Antoine Pitrou wrote:
>> >> > I think the original proposal is sufficient.
>> >> >
>> >> > Also, it is not obvious to me how one would switch from e.g. grpc+tls
>> to
>> >> > http without an explicit server location (unless both Flight servers
>> are
>> >> > hosted under the same port?). So the "+" proposal seems a bit weird.
>> >> >
>> >> >
>> >> > Le 12/02/2024 à 23:39, David Li a écrit :
>> >> >> The idea is that the client would reuse the existing connection, in
>> which case the protocol and such are implicit. (If the client doesn't have
>> a connection anymore, it can't use the fallback anyways.)
>> >> >>
>> >> >> I suppose this has the advantage that you could "fall back" to a
>> known hostname with a different protocol, but I'm not sure that always
>> applies anyways. (Correct me if I'm wrong Matt, but as I recall, UCX
>> addresses aren't hostnames but rather opaque byte blobs, for instance.)
>> >> >>
>> >> >> If we do prefer this, to avoid overloading the hostname, there's
>> also the informal convention of using + in the scheme, so it could be
>> arrow-flight-fallback+grpc+tls://, arrow-flight-fallback+http://, etc.
>> >> >>
>> >> >> On Mon, Feb 12, 2024, at 17:03, Joel Lubinitsky wrote:
>> >> >>> Thanks for clarifying.
>> >> >>>
>> >> >>> Given the relationship between these two proposals, would it also be
>> >> >>> necessary to distinguish the scheme (or schemes) supported by the
>> >> >>> originating Flight RPC service?
>> >> >>>
>> >> >>> If that is the case, it may be preferred to use the "host" portion
>> of the
>> >> >>> URI rather than the "scheme" to denote the location of the data. In
>> this
>> >> >>> scenario, the host "0.0.0.0" could be used. This IP address is
>> defined in
>> >> >>> IETF RFC1122 [1] as "This host on this network", which seems most
>> >> >>> consistent with the intended use-case. There are some caveats to
>> this usage
>> >> >>> but in my experience it's not uncommon for protocols to extend the
>> >> >>> definition of this address in their own usage.
>> >> >>>
>> >> >>> A benefit of this convention is that the scheme remains available
>> in the
>> >> >>> URI to specify the transport available. For example, the following
>> list of
>> >> >>> locations may be included in the response:
>> >> >>>
>> >> >>> ["grpc://0.0.0.0", "ucx://0.0.0.0", "grpc://1.2.3.4",
>> ...]
>> >> >>>
>> >> >>> This would indicate that grpc and ucx transport is available from
>> the
>> >> >>> current service, grpc is available at 1.2.3.4, and possibly more
>> >> >>> combinations of scheme/host.
>> >> >>>
>> >> >>> [1] https://datatracker.ietf.org/doc/html/rfc1122#section-3.2.1.3
>> >> >>>
>> >> >>> On Mon, Feb 12, 2024 at 2:53 PM David Li 
>> wrote:
>> >> >>>
>> >> >>>> Ah, while I was thinking of it as useful for a fallback, I'm not
>> >> >>>> specifying it that way.  Better ideas for names would be
>> appreciated.
>> >> >>>>
>> >> >>>> The actual precedence has never been specified. All endpoints are
>> >> >>>> equivalent, so clients may use what is "best". For instance, with
>> Matt
>> >> >>>> Topol's concurrent proposal, a GPU-enabled client may
>> preferentially try
>> >> >>>> UCX endpoints while other clients may choose to ignore them
>> entirely (e.g.
>> >> >>>> because they don't have UCX installed).
>> >> >>>>
>> >> >>>> In practice the ADBC/JDBC drivers just scan the list left to right
>> and try
>> >> >>>> each endpoint in turn for lack of a better heuristic.
>> >> >>>>
>> >> >>>> On Mon, Feb 12, 2024, at 14:28, Joel

Filtering on arrow::DictionaryArray column

2024-02-20 Thread Vaneet Dadra
Hello,

I have an arrow::RecordBatch in which one of the columns, called "id", is an
arrow::DictionaryArray. I want to read only the rows that have a given value,
say "specialValue", in the "id" column. I have some ad hoc code that reads
the rows one by one and applies the filter manually, but I am wondering if
there is a way to filter a RecordBatch by a column value and get a filtered
RecordBatch back.

Thanks,
Vaneet
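
For readers unfamiliar with dictionary encoding, the idea behind such a
filter can be sketched in plain Python. This does not use the Arrow API: the
function name and the list-based representation are assumptions made for
illustration. A dictionary-encoded column stores a small dictionary of
distinct values plus one integer index per row, so the value only needs to be
looked up once and the rest is an integer comparison. The sketch assumes each
value appears at most once in the dictionary and represents nulls as None.

```python
from typing import List, Optional, Sequence


def rows_matching(
    dictionary: Sequence[str],
    indices: Sequence[Optional[int]],
    wanted: str,
) -> List[int]:
    """Return positions of rows whose dictionary-encoded value equals wanted."""
    try:
        code = dictionary.index(wanted)  # one lookup in the dictionary
    except ValueError:
        return []  # value absent from the dictionary: no row can match
    # Null rows (None) never compare equal to an integer code.
    return [row for row, idx in enumerate(indices) if idx == code]


# Hypothetical "id" column with five rows, one of them null.
id_dictionary = ["a", "specialValue", "b"]
id_indices = [1, 0, None, 1, 2]
selected = rows_matching(id_dictionary, id_indices, "specialValue")
```

The resulting positions (or an equivalent boolean mask) are what one would
then hand to the library's take or filter operation to obtain the filtered
batch.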


Re: [ANNOUNCE] New Arrow committer: Jay Zhan

2024-02-20 Thread Jacob Wujciak-Jens
Congrats and welcome!

On Sat, Feb 17, 2024 at 9:45 AM Christiano Anderson wrote:

> Congrats!
>
> On 16/02/2024 11:25, Andrew Lamb wrote:
> > On behalf of the Arrow PMC, I'm happy to announce that Jay Zhan
> > has accepted an invitation to become a committer on Apache
> > Arrow. Welcome, and thank you for your contributions!
> >
> > Andrew
> >
>


Re: [VOTE] Release Apache Arrow ADBC 0.10.0 - RC1

2024-02-20 Thread Dewey Dunnington
+1!

I ran USE_CONDA=1 dev/release/verify-release-candidate.sh 0.10.0 1 on
macOS Sonoma (M1).
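
For anyone scripting this, the invocation above and the macOS/AArch64 note in
the quoted vote email can be combined in a small helper. The sketch below is
only an illustration: `build_verify_invocation` and its parameters are
hypothetical, while the script path, its two arguments, and the environment
variables come from this thread. It assumes `platform.machine()` reports
"arm64" on Apple Silicon.

```python
import os
import platform
from typing import Dict, List, Mapping, Optional, Tuple


def build_verify_invocation(
    version: str,
    rc: int,
    use_conda: bool = False,
    skip_packages: bool = False,
    base_env: Optional[Mapping[str, str]] = None,
    system: Optional[str] = None,
    machine: Optional[str] = None,
) -> Tuple[Dict[str, str], List[str]]:
    """Return (environment, argv) for running the verification script."""
    env = dict(os.environ if base_env is None else base_env)
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    if use_conda:
        env["USE_CONDA"] = "1"
    if skip_packages:
        # Skip the APT/YUM package checks entirely.
        env["TEST_APT"] = "0"
        env["TEST_YUM"] = "0"
    elif system == "Darwin" and machine == "arm64":
        # Needed to verify APT/YUM packages on macOS/AArch64.
        env["DOCKER_DEFAULT_PLATFORM"] = "linux/amd64"
    argv = ["dev/release/verify-release-candidate.sh", version, str(rc)]
    return env, argv
```

The result can be passed to `subprocess.run(argv, env=env, check=True)` from
the root of a source checkout.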

On Tue, Feb 20, 2024 at 9:43 AM Jean-Baptiste Onofré wrote:
>
> +1 (non-binding)
>
> I quickly tested on macOS arm64.
>
> Regards
> JB
>
> On Sun, Feb 18, 2024 at 9:47 PM David Li  wrote:
> >
> > Hello,
> >
> > I would like to propose the following release candidate (RC1) of Apache 
> > Arrow ADBC version 0.10.0. This is a release consisting of 30 resolved 
> > GitHub issues [1].
> >
> > This release candidate is based on commit: 
> > 9a8e44cc62f23a68ffc0d3d4c7362214b221bea0 [2]
> >
> > The source release rc1 is hosted at [3].
> > The binary artifacts are hosted at [4][5][6][7][8].
> > The changelog is located at [9].
> >
> > Please download, verify checksums and signatures, run the unit tests, and 
> > vote on the release. See [10] for how to validate a release candidate.
> >
> > See also a verification result on GitHub Actions [11].
> >
> > The vote will be open for at least 72 hours.
> >
> > [ ] +1 Release this as Apache Arrow ADBC 0.10.0
> > [ ] +0
> > [ ] -1 Do not release this as Apache Arrow ADBC 0.10.0 because...
> >
> > Note: to verify APT/YUM packages on macOS/AArch64, you must `export 
> > DOCKER_DEFAULT_PLATFORM=linux/amd64`. (Or skip this step by `export 
> > TEST_APT=0 TEST_YUM=0`.)
> >
> > [1]: 
> > https://github.com/apache/arrow-adbc/issues?q=is%3Aissue+milestone%3A%22ADBC+Libraries+0.10.0%22+is%3Aclosed
> > [2]: 
> > https://github.com/apache/arrow-adbc/commit/9a8e44cc62f23a68ffc0d3d4c7362214b221bea0
> > [3]: 
> > https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-adbc-0.10.0-rc1/
> > [4]: https://apache.jfrog.io/artifactory/arrow/almalinux-rc/
> > [5]: https://apache.jfrog.io/artifactory/arrow/debian-rc/
> > [6]: https://apache.jfrog.io/artifactory/arrow/ubuntu-rc/
> > [7]: 
> > https://repository.apache.org/content/repositories/staging/org/apache/arrow/adbc/
> > [8]: 
> > https://github.com/apache/arrow-adbc/releases/tag/apache-arrow-adbc-0.10.0-rc1
> > [9]: 
> > https://github.com/apache/arrow-adbc/blob/apache-arrow-adbc-0.10.0-rc1/CHANGELOG.md
> > [10]: 
> > https://arrow.apache.org/adbc/main/development/releasing.html#how-to-verify-release-candidates
> > [11]: https://github.com/apache/arrow-adbc/actions/runs/7951302316