[ANNOUNCE] Apache Arrow 12.0.1 released

2023-06-19 Thread Raúl Cumplido
The Apache Arrow community is pleased to announce the 12.0.1 release.
It includes 38 resolved issues ([1]) since the 12.0.0 release.

The release is available now from our website and [2]:
http://arrow.apache.org/install/

Read about what's new in the release
https://arrow.apache.org/blog/2023/06/13/12.0.1-release/

Changelog
https://arrow.apache.org/release/12.0.1.html

What is Apache Arrow?
---------------------

Apache Arrow is a columnar in-memory analytics layer designed to accelerate big
data. It houses a set of canonical in-memory representations of flat and
hierarchical data along with multiple language-bindings for structure
manipulation. It also provides low-overhead streaming and batch messaging,
zero-copy interprocess communication (IPC), and vectorized in-memory analytics
libraries.

Please report any feedback to the mailing lists ([3])

Regards,
The Apache Arrow community

[1]: https://github.com/apache/arrow/milestone/54?closed=1
[2]: https://www.apache.org/dyn/closer.cgi/arrow/arrow-12.0.1/
[3]: https://lists.apache.org/list.html?dev@arrow.apache.org


Re: [VOTE] Release Apache Arrow ADBC 0.5.0 - RC0

2023-06-19 Thread David Li
My vote: +1 (Ubuntu Linux 20.04/x86_64)

On Fri, Jun 16, 2023, at 05:24, Raúl Cumplido wrote:
> +1 (non-binding)
>
> I ran the following on Ubuntu 22.04:
>
> USE_CONDA=1 dev/release/verify-release-candidate.sh 0.5.0 0
>
> Thanks!
> Raúl
>
> On Fri, Jun 16, 2023 at 9:10, Sutou Kouhei wrote:
>>
>> +1
>>
>> I ran the following on Debian GNU/Linux sid:
>>
>>   JAVA_HOME=/usr/lib/jvm/default-java \
>> dev/release/verify-release-candidate.sh 0.5.0 0
>>
>> with:
>>
>>   * Python 3.11.2
>>   * g++ (Debian 12.2.0-14) 12.2.0
>>   * go version go1.19.8 linux/amd64
>>   * openjdk version "17.0.6" 2023-01-17
>>   * ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux-gnu]
>>   * R version 4.3.0 (2023-04-21) -- "Already Tomorrow"
>>
>>
>> Note: I needed https://github.com/apache/arrow-adbc/pull/810 .
>>
>>
>> Thanks,
>> --
>> kou
>>
>>
>> In <74f01cb9-5c76-4745-b357-4deca0bbd...@app.fastmail.com>
>>   "[VOTE] Release Apache Arrow ADBC 0.5.0 - RC0" on Thu, 15 Jun 2023 
>> 20:06:46 -0400,
>>   "David Li"  wrote:
>>
>> > Hello,
>> >
>> > I would like to propose the following release candidate (RC0) of Apache 
>> > Arrow ADBC version 0.5.0. This is a release consisting of 36 resolved 
>> > GitHub issues [1].
>> >
>> > This release candidate is based on commit: 
>> > ac0e0ef8bd83787f65e53d421fce6ad490d9a37d [2]
>> >
>> > The source release rc0 is hosted at [3].
>> > The binary artifacts are hosted at [4][5][6][7][8].
>> > The changelog is located at [9].
>> >
>> > Please download, verify checksums and signatures, run the unit tests, and 
>> > vote on the release. See [10] for how to validate a release candidate.
>> >
>> > See also a verification result on GitHub Actions [11].
>> >
>> > The vote will be open for at least 72 hours.
>> >
>> > [ ] +1 Release this as Apache Arrow ADBC 0.5.0
>> > [ ] +0
>> > [ ] -1 Do not release this as Apache Arrow ADBC 0.5.0 because...
>> >
>> > Note: to verify APT/YUM packages on macOS/AArch64, you must `export 
>> > DOCKER_DEFAULT_ARCHITECTURE=linux/amd64`. (Or skip this step by `export 
>> > TEST_APT=0 TEST_YUM=0`.)
>> >
>> > [1]: 
>> > https://github.com/apache/arrow-adbc/issues?q=is%3Aissue+milestone%3A%22ADBC+Libraries+0.5.0%22+is%3Aclosed
>> > [2]: 
>> > https://github.com/apache/arrow-adbc/commit/ac0e0ef8bd83787f65e53d421fce6ad490d9a37d
>> > [3]: 
>> > https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-adbc-0.5.0-rc0/
>> > [4]: https://apache.jfrog.io/artifactory/arrow/almalinux-rc/
>> > [5]: https://apache.jfrog.io/artifactory/arrow/debian-rc/
>> > [6]: https://apache.jfrog.io/artifactory/arrow/ubuntu-rc/
>> > [7]: 
>> > https://repository.apache.org/content/repositories/staging/org/apache/arrow/adbc/
>> > [8]: 
>> > https://github.com/apache/arrow-adbc/releases/tag/apache-arrow-adbc-0.5.0-rc0
>> > [9]: 
>> > https://github.com/apache/arrow-adbc/blob/apache-arrow-adbc-0.5.0-rc0/CHANGELOG.md
>> > [10]: 
>> > https://arrow.apache.org/adbc/main/development/releasing.html#how-to-verify-release-candidates
>> > [11]: https://github.com/apache/arrow-adbc/actions/runs/5284608862


[Parquet][C++] Loading from in-memory buffers

2023-06-19 Thread Kohei Yoshida

Hello there,

I would like to get some guidance on how to load Parquet files from 
in-memory buffers.  I have already managed to load from files by 
following this tutorial:


https://arrow.apache.org/docs/cpp/parquet.html

and I did spend some time looking around the Arrow API to figure out a 
way to load from in-memory buffers.  But so far no luck.


Is there a way to achieve this using the existing Arrow API?  Any help 
or guidance would be appreciated.


A little background on why I'm doing this.  I'm currently working on 
implementing an import filter for the Parquet file format for LibreOffice 
Calc, and I'm doing so via the orcus library [1], which specializes in 
providing spreadsheet-related file format filters as an external 
library.  The orcus library itself provides APIs for both loading 
from files and loading from in-memory buffers for all file formats it 
supports.  LibreOffice itself uses orcus's in-memory buffer API to 
achieve file loading due to the way its file loading mechanism works.  
Currently, I'm saving the incoming buffer to a temporary 
file and loading from it, but that's far from ideal...


Thanks,

Kohei Yoshida

[1] https://gitlab.com/orcus/orcus


Re: [Parquet][C++] Loading from in-memory buffers

2023-06-19 Thread Antoine Pitrou



Hello Kohei,

You can create an arrow::io::BufferReader to wrap your in-memory buffer:
https://arrow.apache.org/docs/cpp/api/io.html#in-memory-streams

and then pass it to parquet::arrow::FileReaderBuilder:
https://arrow.apache.org/docs/cpp/api/formats.html#_CPPv4N7parquet5arrow17FileReaderBuilderE

(BufferReader subclasses RandomAccessFile)

Regards

Antoine.


On 19/06/2023 at 14:57, Kohei Yoshida wrote:

Hello there,

I would like to get some guidance on how to load Parquet files from
in-memory buffers.  I have already managed to load from files by
following this tutorial:

https://arrow.apache.org/docs/cpp/parquet.html

and I did spend some time looking around the Arrow API to figure out a
way to load from in-memory buffers.  But so far no luck.

Is there a way to achieve this using the existing Arrow API?  Any help
or guidance would be appreciated.

A little background on why I'm doing this.  I'm currently working on
implementing an import filter for Parquet file format for LibreOffice
Calc, and I'm doing so via orcus library[1] which specializes in
providing spreadsheet-related file format filters as an external
library.  The orcus library API itself provides API for both loading
from files and loading from in-memory buffers for all file formats it
supports.  LibreOffice itself uses orcus's in-memory buffer API to
achieve file loading due to the way its file loading mechanism works.
Currently, I'm temporarily saving the incoming buffer to a temporary
file and loading from it, but that's far from ideal...

Thanks,

Kohei Yoshida

[1] https://gitlab.com/orcus/orcus


Re: [DISCUSS][Format] Draft implementation of string view array format

2023-06-19 Thread Benjamin Kietzman
Hi Gang,

I'm not sure what you mean, sorry if my answers are off base:

Parquet's ByteArray will be unaffected by the addition of the string view
type; all arrow strings (arrow::Type::STRING, arrow::Type::LARGE_STRING, and
with this patch arrow::Type::STRING_VIEW) are converted to ByteArrays
during serialization to parquet [1].

If you mean that encoding of arrow::Type::STRING_VIEW will not be as fast
as encoding of equivalent arrow::Type::STRING, that's something I haven't
benchmarked so I can't answer definitively. I would expect it to be faster
than first converting STRING_VIEW->STRING then encoding to parquet; direct
encoding avoids allocating and populating temporary buffers. Of course this
only applies to cases where you need to encode an array of STRING_VIEW to
parquet; encoding of STRING to parquet will be unaffected.
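
To make the short-form/long-form distinction raised in this thread concrete,
here is a minimal, self-contained sketch. It assumes a 16-byte view layout: a
4-byte length followed by either up to 12 inline bytes, or a 4-byte prefix
plus a buffer index and an offset. The draft under discussion may differ in
detail. View, ByteSpan, MakeView and Resolve are hypothetical names, and
ByteSpan is only a stand-in for a length-plus-pointer type such as
parquet::ByteArray.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

constexpr int32_t kInlineSize = 12;

// Hypothetical 16-byte view.
// Short form: data holds the string itself, zero padded.
// Long form:  data holds prefix (4 bytes) | buffer index (4) | offset (4).
struct View {
  int32_t length;
  char data[12];
};
static_assert(sizeof(View) == 16, "a view is expected to be 16 bytes");

// Stand-in for a (length, pointer) pair; no ownership.
struct ByteSpan {
  uint32_t len;
  const uint8_t* ptr;
};

// Builds a view, appending long strings to the last data buffer.
View MakeView(const std::string& s, std::vector<std::string>& buffers) {
  View v;
  v.length = static_cast<int32_t>(s.size());
  std::memset(v.data, 0, sizeof(v.data));
  if (v.length <= kInlineSize) {
    std::memcpy(v.data, s.data(), s.size());
    return v;
  }
  if (buffers.empty()) buffers.emplace_back();
  const int32_t index = static_cast<int32_t>(buffers.size() - 1);
  const int32_t offset = static_cast<int32_t>(buffers.back().size());
  std::memcpy(v.data, s.data(), 4);  // prefix, handy for fast comparisons
  std::memcpy(v.data + 4, &index, sizeof(index));
  std::memcpy(v.data + 8, &offset, sizeof(offset));
  buffers.back() += s;
  return v;
}

// Resolves a view to its bytes without copying. The result points into the
// view (short form) or into a data buffer (long form), so both must outlive
// it, and the buffers must not be appended to afterwards.
ByteSpan Resolve(const View& v, const std::vector<std::string>& buffers) {
  const uint32_t len = static_cast<uint32_t>(v.length);
  if (v.length <= kInlineSize) {
    return {len, reinterpret_cast<const uint8_t*>(v.data)};
  }
  int32_t index = 0;
  int32_t offset = 0;
  std::memcpy(&index, v.data + 4, sizeof(index));
  std::memcpy(&offset, v.data + 8, sizeof(offset));
  assert(index >= 0 && static_cast<std::size_t>(index) < buffers.size());
  const std::string& buf = buffers[static_cast<std::size_t>(index)];
  return {len, reinterpret_cast<const uint8_t*>(buf.data()) + offset};
}
```

In this sketch both forms resolve to a pointer and a length with one branch
and no allocation, which is the kind of direct conversion that avoids
temporary buffers. Whether a real encoder shows a measurable difference
between the two forms would still need benchmarking.
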

Sincerely,
Ben

[1]
https://github.com/bkietz/arrow/blob/46cf7e67766f0646760acefa4d2d01cdfead2d5d/cpp/src/parquet/encoding.cc#L166-L179

On Thu, Jun 15, 2023 at 10:34 PM Gang Wu  wrote:

> Hi Ben,
>
> The posted benchmark [1] looks pretty good to me. However, I want to
> raise a possible issue from the perspective of parquet-cpp. Parquet-cpp
> uses a customized parquet::ByteArray type [2] for string/binary, I would
> expect some regression of conversions between parquet reader/writer
> and the proposed string view array, especially when some strings use
> short form and others use long form.
>
> [1] https://github.com/apache/arrow/pull/35628#issuecomment-1583218617
> [2]
> https://github.com/apache/arrow/blob/41309de8dd91a9821873fc5f94339f0542ca0108/cpp/src/parquet/types.h#L575
>
> Best,
> Gang
>
> On Fri, Jun 16, 2023 at 3:58 AM Will Jones 
> wrote:
>
> > Cool. Thanks for doing that!
> >
> > On Thu, Jun 15, 2023 at 12:40 Benjamin Kietzman 
> > wrote:
> >
> > > I've added https://github.com/apache/arrow/issues/36112 to track
> > > deduplication of buffers on write.
> > > I don't think it would require modification of the IPC format.
> > >
> > > Ben
> > >
> > > > On Thu, Jun 15, 2023 at 1:30 PM Matt Topol wrote:
> > > >
> > > > > Based on my understanding, in theory a buffer *could* be shared within a
> > > > > batch since the flatbuffers message just uses an offset and length to
> > > > > identify the buffers.
> > > > >
> > > > > That said, I don't believe any current implementation actually does this or
> > > > > takes advantage of this in any meaningful way.
> > > >
> > > > --Matt
> > > >
> > > > On Thu, Jun 15, 2023 at 1:00 PM Will Jones 
> > > > wrote:
> > > >
> > > > > Hi Ben,
> > > > >
> > > > > It's exciting to see this move along.
> > > > >
> > > > > > > The buffers will be duplicated. If buffer duplication becomes a
> > > > > > > concern, I'd prefer to handle that in the ipc writer. Then buffers
> > > > > > > which are duplicated could be detected by checking pointer identity
> > > > > > > and written only once.
> > > > >
> > > > >
> > > > > > Question: to be able to write a buffer only once and reference it in
> > > > > > multiple arrays, does that require a change to the IPC format? Or is
> > > > > > sharing buffers within the same batch already allowed in the IPC format?
> > > > >
> > > > > Best,
> > > > >
> > > > > Will Jones
> > > > >
> > > > > > On Thu, Jun 15, 2023 at 9:03 AM Benjamin Kietzman <bengil...@gmail.com> wrote:
> > > > >
> > > > > > Hello again all,
> > > > > >
> > > > > > > The PR [1] to add string view to the format and the C++ implementation is
> > > > > > > hovering around passing CI and has been undrafted. Furthermore, there is
> > > > > > > now also a PR [2] to add string view to the Go implementation. Code review
> > > > > > > is underway for each PR and I'd like to move toward a vote for acceptance.
> > > > > > > Are there any other preliminaries which I've neglected?
> > > > > >
> > > > > > To reiterate the answers to some past questions:
> > > > > > > - Benchmarks are added in the C++ PR [1] to demonstrate the performance of
> > > > > > >   conversion between the various string formats. In addition, there are
> > > > > > >   some benchmarks which demonstrate the performance gains available with
> > > > > > >   the new format [3].
> > > > > > > - Adding string view to the C ABI is a natural follow up, but should be
> > > > > > >   handled independently. An issue has been added to track that
> > > > > > >   enhancement [4].
> > > > > >
> > > > > > Sincerely,
> > > > > > Ben Kietzman
> > > > > >
> > > > > > [1] https://github.com/apache/arrow/pull/35628
> > > > > > [2] https://github.com/apache/arrow/pull/35769
> > > > > > [3]
> > > https://github.com/apache/arrow/pull/35628#issuecomment-1583218617
> > > > > > [4] https://github.com/apache/arrow/issues/36099
> > > > > >
> > > > > > > On Wed, May 17, 2023 at 12:53 PM Benjamin Kietzman <bengil...@gmail.com> wrote:
> > > > > >
> > > > > > > @Jacob
> > > > > > > >

Re: [VOTE] Release Apache Arrow ADBC 0.5.0 - RC0

2023-06-19 Thread Jacob Wujciak-Jens
+1 (nb) with conda on ubuntu

On Mon, Jun 19, 2023 at 2:18 PM David Li  wrote:

> My vote: +1 (Ubuntu Linux 20.04/x86_64)
>
> On Fri, Jun 16, 2023, at 05:24, Raúl Cumplido wrote:
> > +1 (non-binding)
> >
> > I ran the following on Ubuntu 22.04:
> >
> > USE_CONDA=1 dev/release/verify-release-candidate.sh 0.5.0 0
> >
> > Thanks!
> > Raúl
> >
> > On Fri, Jun 16, 2023 at 9:10, Sutou Kouhei wrote:
> >>
> >> +1
> >>
> >> I ran the following on Debian GNU/Linux sid:
> >>
> >>   JAVA_HOME=/usr/lib/jvm/default-java \
> >> dev/release/verify-release-candidate.sh 0.5.0 0
> >>
> >> with:
> >>
> >>   * Python 3.11.2
> >>   * g++ (Debian 12.2.0-14) 12.2.0
> >>   * go version go1.19.8 linux/amd64
> >>   * openjdk version "17.0.6" 2023-01-17
> >>   * ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux-gnu]
> >>   * R version 4.3.0 (2023-04-21) -- "Already Tomorrow"
> >>
> >>
> >> Note: I needed https://github.com/apache/arrow-adbc/pull/810 .
> >>
> >>
> >> Thanks,
> >> --
> >> kou
> >>
> >>
> >> In <74f01cb9-5c76-4745-b357-4deca0bbd...@app.fastmail.com>
> >>   "[VOTE] Release Apache Arrow ADBC 0.5.0 - RC0" on Thu, 15 Jun 2023
> >> 20:06:46 -0400,
> >>   "David Li"  wrote:
> >>
> >> > Hello,
> >> >
> >> > I would like to propose the following release candidate (RC0) of
> >> > Apache Arrow ADBC version 0.5.0. This is a release consisting of 36
> >> > resolved GitHub issues [1].
> >> >
> >> > This release candidate is based on commit:
> >> > ac0e0ef8bd83787f65e53d421fce6ad490d9a37d [2]
> >> >
> >> > The source release rc0 is hosted at [3].
> >> > The binary artifacts are hosted at [4][5][6][7][8].
> >> > The changelog is located at [9].
> >> >
> >> > Please download, verify checksums and signatures, run the unit tests,
> >> > and vote on the release. See [10] for how to validate a release candidate.
> >> >
> >> > See also a verification result on GitHub Actions [11].
> >> >
> >> > The vote will be open for at least 72 hours.
> >> >
> >> > [ ] +1 Release this as Apache Arrow ADBC 0.5.0
> >> > [ ] +0
> >> > [ ] -1 Do not release this as Apache Arrow ADBC 0.5.0 because...
> >> >
> >> > Note: to verify APT/YUM packages on macOS/AArch64, you must `export
> >> > DOCKER_DEFAULT_ARCHITECTURE=linux/amd64`. (Or skip this step by `export
> >> > TEST_APT=0 TEST_YUM=0`.)
> >> >
> >> > [1]:
> >> > https://github.com/apache/arrow-adbc/issues?q=is%3Aissue+milestone%3A%22ADBC+Libraries+0.5.0%22+is%3Aclosed
> >> > [2]:
> >> > https://github.com/apache/arrow-adbc/commit/ac0e0ef8bd83787f65e53d421fce6ad490d9a37d
> >> > [3]:
> >> > https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-adbc-0.5.0-rc0/
> >> > [4]: https://apache.jfrog.io/artifactory/arrow/almalinux-rc/
> >> > [5]: https://apache.jfrog.io/artifactory/arrow/debian-rc/
> >> > [6]: https://apache.jfrog.io/artifactory/arrow/ubuntu-rc/
> >> > [7]:
> >> > https://repository.apache.org/content/repositories/staging/org/apache/arrow/adbc/
> >> > [8]:
> >> > https://github.com/apache/arrow-adbc/releases/tag/apache-arrow-adbc-0.5.0-rc0
> >> > [9]:
> >> > https://github.com/apache/arrow-adbc/blob/apache-arrow-adbc-0.5.0-rc0/CHANGELOG.md
> >> > [10]:
> >> > https://arrow.apache.org/adbc/main/development/releasing.html#how-to-verify-release-candidates
> >> > [11]: https://github.com/apache/arrow-adbc/actions/runs/5284608862
>


Re: [VOTE] Release Apache Arrow ADBC 0.5.0 - RC0

2023-06-19 Thread Matt Topol
+1

Tested on Pop!_OS (Ubuntu 22.04) x86_64

On Mon, Jun 19, 2023, 10:55 AM Jacob Wujciak-Jens
 wrote:

> +1 (nb) with conda on ubuntu
>
> On Mon, Jun 19, 2023 at 2:18 PM David Li  wrote:
>
> > My vote: +1 (Ubuntu Linux 20.04/x86_64)
> >
> > On Fri, Jun 16, 2023, at 05:24, Raúl Cumplido wrote:
> > > +1 (non-binding)
> > >
> > > I ran the following on Ubuntu 22.04:
> > >
> > > USE_CONDA=1 dev/release/verify-release-candidate.sh 0.5.0 0
> > >
> > > Thanks!
> > > Raúl
> > >
> > > On Fri, Jun 16, 2023 at 9:10, Sutou Kouhei wrote:
> > >>
> > >> +1
> > >>
> > >> I ran the following on Debian GNU/Linux sid:
> > >>
> > >>   JAVA_HOME=/usr/lib/jvm/default-java \
> > >> dev/release/verify-release-candidate.sh 0.5.0 0
> > >>
> > >> with:
> > >>
> > >>   * Python 3.11.2
> > >>   * g++ (Debian 12.2.0-14) 12.2.0
> > >>   * go version go1.19.8 linux/amd64
> > >>   * openjdk version "17.0.6" 2023-01-17
> > >>   * ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux-gnu]
> > >>   * R version 4.3.0 (2023-04-21) -- "Already Tomorrow"
> > >>
> > >>
> > >> Note: I needed https://github.com/apache/arrow-adbc/pull/810 .
> > >>
> > >>
> > >> Thanks,
> > >> --
> > >> kou
> > >>
> > >>
> > >> In <74f01cb9-5c76-4745-b357-4deca0bbd...@app.fastmail.com>
> > >>   "[VOTE] Release Apache Arrow ADBC 0.5.0 - RC0" on Thu, 15 Jun 2023
> > 20:06:46 -0400,
> > >>   "David Li"  wrote:
> > >>
> > >> > Hello,
> > >> >
> > > >> > I would like to propose the following release candidate (RC0) of
> > > >> > Apache Arrow ADBC version 0.5.0. This is a release consisting of 36
> > > >> > resolved GitHub issues [1].
> > > >> >
> > > >> > This release candidate is based on commit:
> > > >> > ac0e0ef8bd83787f65e53d421fce6ad490d9a37d [2]
> > >> >
> > >> > The source release rc0 is hosted at [3].
> > >> > The binary artifacts are hosted at [4][5][6][7][8].
> > >> > The changelog is located at [9].
> > >> >
> > > >> > Please download, verify checksums and signatures, run the unit tests,
> > > >> > and vote on the release. See [10] for how to validate a release candidate.
> > >> >
> > >> > See also a verification result on GitHub Actions [11].
> > >> >
> > >> > The vote will be open for at least 72 hours.
> > >> >
> > >> > [ ] +1 Release this as Apache Arrow ADBC 0.5.0
> > >> > [ ] +0
> > >> > [ ] -1 Do not release this as Apache Arrow ADBC 0.5.0 because...
> > >> >
> > > >> > Note: to verify APT/YUM packages on macOS/AArch64, you must `export
> > > >> > DOCKER_DEFAULT_ARCHITECTURE=linux/amd64`. (Or skip this step by `export
> > > >> > TEST_APT=0 TEST_YUM=0`.)
> > >> >
> > >> > [1]:
> >
> https://github.com/apache/arrow-adbc/issues?q=is%3Aissue+milestone%3A%22ADBC+Libraries+0.5.0%22+is%3Aclosed
> > >> > [2]:
> >
> https://github.com/apache/arrow-adbc/commit/ac0e0ef8bd83787f65e53d421fce6ad490d9a37d
> > >> > [3]:
> >
> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-adbc-0.5.0-rc0/
> > >> > [4]: https://apache.jfrog.io/artifactory/arrow/almalinux-rc/
> > >> > [5]: https://apache.jfrog.io/artifactory/arrow/debian-rc/
> > >> > [6]: https://apache.jfrog.io/artifactory/arrow/ubuntu-rc/
> > >> > [7]:
> >
> https://repository.apache.org/content/repositories/staging/org/apache/arrow/adbc/
> > >> > [8]:
> >
> https://github.com/apache/arrow-adbc/releases/tag/apache-arrow-adbc-0.5.0-rc0
> > >> > [9]:
> >
> https://github.com/apache/arrow-adbc/blob/apache-arrow-adbc-0.5.0-rc0/CHANGELOG.md
> > >> > [10]:
> >
> https://arrow.apache.org/adbc/main/development/releasing.html#how-to-verify-release-candidates
> > >> > [11]: https://github.com/apache/arrow-adbc/actions/runs/5284608862
> >
>


Re: [VOTE] Release Apache Arrow ADBC 0.5.0 - RC0

2023-06-19 Thread Jean-Baptiste Onofré
+1 (non-binding)

Regards
JB

On Fri, Jun 16, 2023 at 2:06 AM David Li  wrote:
>
> Hello,
>
> I would like to propose the following release candidate (RC0) of Apache Arrow 
> ADBC version 0.5.0. This is a release consisting of 36 resolved GitHub issues 
> [1].
>
> This release candidate is based on commit: 
> ac0e0ef8bd83787f65e53d421fce6ad490d9a37d [2]
>
> The source release rc0 is hosted at [3].
> The binary artifacts are hosted at [4][5][6][7][8].
> The changelog is located at [9].
>
> Please download, verify checksums and signatures, run the unit tests, and 
> vote on the release. See [10] for how to validate a release candidate.
>
> See also a verification result on GitHub Actions [11].
>
> The vote will be open for at least 72 hours.
>
> [ ] +1 Release this as Apache Arrow ADBC 0.5.0
> [ ] +0
> [ ] -1 Do not release this as Apache Arrow ADBC 0.5.0 because...
>
> Note: to verify APT/YUM packages on macOS/AArch64, you must `export 
> DOCKER_DEFAULT_ARCHITECTURE=linux/amd64`. (Or skip this step by `export 
> TEST_APT=0 TEST_YUM=0`.)
>
> [1]: 
> https://github.com/apache/arrow-adbc/issues?q=is%3Aissue+milestone%3A%22ADBC+Libraries+0.5.0%22+is%3Aclosed
> [2]: 
> https://github.com/apache/arrow-adbc/commit/ac0e0ef8bd83787f65e53d421fce6ad490d9a37d
> [3]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-adbc-0.5.0-rc0/
> [4]: https://apache.jfrog.io/artifactory/arrow/almalinux-rc/
> [5]: https://apache.jfrog.io/artifactory/arrow/debian-rc/
> [6]: https://apache.jfrog.io/artifactory/arrow/ubuntu-rc/
> [7]: 
> https://repository.apache.org/content/repositories/staging/org/apache/arrow/adbc/
> [8]: 
> https://github.com/apache/arrow-adbc/releases/tag/apache-arrow-adbc-0.5.0-rc0
> [9]: 
> https://github.com/apache/arrow-adbc/blob/apache-arrow-adbc-0.5.0-rc0/CHANGELOG.md
> [10]: 
> https://arrow.apache.org/adbc/main/development/releasing.html#how-to-verify-release-candidates
> [11]: https://github.com/apache/arrow-adbc/actions/runs/5284608862


[VOTE] Release Apache Arrow nanoarrow 0.2.0 - RC1

2023-06-19 Thread Dewey Dunnington
Hello,

I would like to propose the following release candidate (RC1) of
Apache Arrow nanoarrow version 0.2.0. This release consists of 17
resolved GitHub issues [1].

This release candidate is based on commit:
f71063605e288d9a8dd73cfdd9578773519b6743 [2]

The source release rc1 is hosted at [3].
The changelog is located at [4].
The draft release post is located at [5].

Please download, verify checksums and signatures, run the unit tests,
and vote on the release. See [6] for how to validate a release
candidate.

The vote will be open for at least 72 hours.

[ ] +1 Release this as Apache Arrow nanoarrow 0.2.0
[ ] +0
[ ] -1 Do not release this as Apache Arrow nanoarrow 0.2.0 because...

[0] https://github.com/apache/arrow-nanoarrow
[1] https://github.com/apache/arrow-nanoarrow/milestone/2?closed=1
[2] 
https://github.com/apache/arrow-nanoarrow/tree/apache-arrow-nanoarrow-0.2.0-rc1
[3] 
https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-nanoarrow-0.2.0-rc1/
[4] 
https://github.com/apache/arrow-nanoarrow/blob/apache-arrow-nanoarrow-0.2.0-rc1/CHANGELOG.md
[5] https://github.com/apache/arrow-site/pull/364
[6] https://github.com/apache/arrow-nanoarrow/blob/main/dev/release/README.md


Re: [VOTE] Release Apache Arrow nanoarrow 0.2.0 - RC0

2023-06-19 Thread Dewey Dunnington
Hi all,

Thank you all for verifying!

The issue uncovered during verification on MacOS M1/Conda (Thanks!)
has been fixed [1] and a new release candidate has been issued [2].

Cheers,

-dewey

[1] https://github.com/apache/arrow-nanoarrow/pull/242
[2] https://lists.apache.org/thread/027xxw9vfv7dnf6lhhyzqofoyqnnbnf2

On Sun, Jun 18, 2023 at 6:05 PM Sutou Kouhei  wrote:
>
> +1
>
> I ran the following command line on Debian GNU/Linux sid:
>
>   CMAKE_PREFIX_PATH=/tmp/local \
> dev/release/verify-release-candidate.sh 0.2.0 0
>
> with:
>
>   * Apache Arrow C++ main
>   * gcc (Debian 12.2.0-14) 12.2.0
>   * R version 4.3.0 (2023-04-21) -- "Already Tomorrow"
>
>
> Thanks,
> --
> kou
>
> In 
>   "[VOTE] Release Apache Arrow nanoarrow 0.2.0 - RC0" on Fri, 16 Jun 2023 
> 17:15:41 -0300,
>   Dewey Dunnington  wrote:
>
> > Hello,
> >
> > I would like to propose the following release candidate (RC0) of
> > Apache Arrow nanoarrow version 0.2.0. This release consists of 17
> > resolved GitHub issues [1].
> >
> > This release candidate is based on commit:
> > a7b824de6cb99ce458e1a5cd311d69588ceb0570 [2]
> >
> > The source release rc0 is hosted at [3].
> > The changelog is located at [4].
> >
> > Please download, verify checksums and signatures, run the unit tests,
> > and vote on the release. See [5] for how to validate a release
> > candidate.
> >
> > The vote will be open for at least 72 hours.
> >
> > [ ] +1 Release this as Apache Arrow nanoarrow 0.2.0
> > [ ] +0
> > [ ] -1 Do not release this as Apache Arrow nanoarrow 0.2.0 because...
> >
> > [0] https://github.com/apache/arrow-nanoarrow
> > [1] https://github.com/apache/arrow-nanoarrow/milestone/2?closed=1
> > [2] 
> > https://github.com/apache/arrow-nanoarrow/tree/apache-arrow-nanoarrow-0.2.0-rc0
> > [3] 
> > https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-nanoarrow-0.2.0-rc0/
> > [4] 
> > https://github.com/apache/arrow-nanoarrow/blob/apache-arrow-nanoarrow-0.2.0-rc0/CHANGELOG.md
> > [5] 
> > https://github.com/apache/arrow-nanoarrow/blob/main/dev/release/README.md


Re: [VOTE] Release Apache Arrow nanoarrow 0.2.0 - RC1

2023-06-19 Thread Dewey Dunnington
My vote is +1 (non-binding). Verified on MacOS M1 (both Homebrew and Conda).

On Mon, Jun 19, 2023 at 3:58 PM Dewey Dunnington  wrote:
>
> Hello,
>
> I would like to propose the following release candidate (RC1) of
> Apache Arrow nanoarrow version 0.2.0. This release consists of 17
> resolved GitHub issues [1].
>
> This release candidate is based on commit:
> f71063605e288d9a8dd73cfdd9578773519b6743 [2]
>
> The source release rc1 is hosted at [3].
> The changelog is located at [4].
> The draft release post is located at [5].
>
> Please download, verify checksums and signatures, run the unit tests,
> and vote on the release. See [6] for how to validate a release
> candidate.
>
> The vote will be open for at least 72 hours.
>
> [ ] +1 Release this as Apache Arrow nanoarrow 0.2.0
> [ ] +0
> [ ] -1 Do not release this as Apache Arrow nanoarrow 0.2.0 because...
>
> [0] https://github.com/apache/arrow-nanoarrow
> [1] https://github.com/apache/arrow-nanoarrow/milestone/2?closed=1
> [2] 
> https://github.com/apache/arrow-nanoarrow/tree/apache-arrow-nanoarrow-0.2.0-rc1
> [3] 
> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-nanoarrow-0.2.0-rc1/
> [4] 
> https://github.com/apache/arrow-nanoarrow/blob/apache-arrow-nanoarrow-0.2.0-rc1/CHANGELOG.md
> [5] https://github.com/apache/arrow-site/pull/364
> [6] https://github.com/apache/arrow-nanoarrow/blob/main/dev/release/README.md


Re: [VOTE] Release Apache Arrow nanoarrow 0.2.0 - RC1

2023-06-19 Thread Will Jones
Thanks for fixing that issue. I can now successfully verify the release on
M1 Mac with Conda.

My vote: +1 (binding)

On Mon, Jun 19, 2023 at 12:10 PM Dewey Dunnington
 wrote:

> My vote is +1 (non-binding). Verified on MacOS M1 (both Homebrew and
> Conda).
>
> On Mon, Jun 19, 2023 at 3:58 PM Dewey Dunnington 
> wrote:
> >
> > Hello,
> >
> > I would like to propose the following release candidate (RC1) of
> > Apache Arrow nanoarrow version 0.2.0. This release consists of 17
> > resolved GitHub issues [1].
> >
> > This release candidate is based on commit:
> > f71063605e288d9a8dd73cfdd9578773519b6743 [2]
> >
> > The source release rc1 is hosted at [3].
> > The changelog is located at [4].
> > The draft release post is located at [5].
> >
> > Please download, verify checksums and signatures, run the unit tests,
> > and vote on the release. See [6] for how to validate a release
> > candidate.
> >
> > The vote will be open for at least 72 hours.
> >
> > [ ] +1 Release this as Apache Arrow nanoarrow 0.2.0
> > [ ] +0
> > [ ] -1 Do not release this as Apache Arrow nanoarrow 0.2.0 because...
> >
> > [0] https://github.com/apache/arrow-nanoarrow
> > [1] https://github.com/apache/arrow-nanoarrow/milestone/2?closed=1
> > [2]
> https://github.com/apache/arrow-nanoarrow/tree/apache-arrow-nanoarrow-0.2.0-rc1
> > [3]
> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-nanoarrow-0.2.0-rc1/
> > [4]
> https://github.com/apache/arrow-nanoarrow/blob/apache-arrow-nanoarrow-0.2.0-rc1/CHANGELOG.md
> > [5] https://github.com/apache/arrow-site/pull/364
> > [6]
> https://github.com/apache/arrow-nanoarrow/blob/main/dev/release/README.md
>


Re: [VOTE] Release Apache Arrow nanoarrow 0.2.0 - RC1

2023-06-19 Thread David Li
+1 (Ubuntu 20.04/x86_64)

On Mon, Jun 19, 2023, at 16:34, Will Jones wrote:
> Thanks for fixing that issue. I can now successfully verify the release on
> M1 Mac with Conda.
>
> My vote: +1 (binding)
>
> On Mon, Jun 19, 2023 at 12:10 PM Dewey Dunnington
>  wrote:
>
>> My vote is +1 (non-binding). Verified on MacOS M1 (both Homebrew and
>> Conda).
>>
>> On Mon, Jun 19, 2023 at 3:58 PM Dewey Dunnington 
>> wrote:
>> >
>> > Hello,
>> >
>> > I would like to propose the following release candidate (RC1) of
>> > Apache Arrow nanoarrow version 0.2.0. This release consists of 17
>> > resolved GitHub issues [1].
>> >
>> > This release candidate is based on commit:
>> > f71063605e288d9a8dd73cfdd9578773519b6743 [2]
>> >
>> > The source release rc1 is hosted at [3].
>> > The changelog is located at [4].
>> > The draft release post is located at [5].
>> >
>> > Please download, verify checksums and signatures, run the unit tests,
>> > and vote on the release. See [6] for how to validate a release
>> > candidate.
>> >
>> > The vote will be open for at least 72 hours.
>> >
>> > [ ] +1 Release this as Apache Arrow nanoarrow 0.2.0
>> > [ ] +0
>> > [ ] -1 Do not release this as Apache Arrow nanoarrow 0.2.0 because...
>> >
>> > [0] https://github.com/apache/arrow-nanoarrow
>> > [1] https://github.com/apache/arrow-nanoarrow/milestone/2?closed=1
>> > [2]
>> https://github.com/apache/arrow-nanoarrow/tree/apache-arrow-nanoarrow-0.2.0-rc1
>> > [3]
>> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-nanoarrow-0.2.0-rc1/
>> > [4]
>> https://github.com/apache/arrow-nanoarrow/blob/apache-arrow-nanoarrow-0.2.0-rc1/CHANGELOG.md
>> > [5] https://github.com/apache/arrow-site/pull/364
>> > [6]
>> https://github.com/apache/arrow-nanoarrow/blob/main/dev/release/README.md
>>


Re: [Parquet][C++] Loading from in-memory buffers

2023-06-19 Thread Kohei Yoshida

Hi Antoine,

I just went ahead and tried arrow::io::BufferReader, which, as it turned 
out, can be passed directly to parquet::ParquetFileReader as a 
RandomAccessFile.  The rest just worked. :-)


Thanks for your help!

Kohei

On 6/19/23 09:03, Antoine Pitrou wrote:


Hello Kohei,

You can create an arrow::io::BufferReader to wrap your in-memory buffer:
https://arrow.apache.org/docs/cpp/api/io.html#in-memory-streams

and then pass it to parquet::arrow::FileReaderBuilder:
https://arrow.apache.org/docs/cpp/api/formats.html#_CPPv4N7parquet5arrow17FileReaderBuilderE 



(BufferReader subclasses RandomAccessFile)

Regards

Antoine.


On 19/06/2023 at 14:57, Kohei Yoshida wrote:

Hello there,

I would like to get some guidance on how to load Parquet files from
in-memory buffers.  I have already managed to load from files by
following this tutorial:

https://arrow.apache.org/docs/cpp/parquet.html

and I did spend some time looking around the Arrow API to figure out a
way to load from in-memory buffers.  But so far no luck.

Is there a way to achieve this using the existing Arrow API? Any help
or guidance would be appreciated.

A little background on why I'm doing this.  I'm currently working on
implementing an import filter for Parquet file format for LibreOffice
Calc, and I'm doing so via orcus library[1] which specializes in
providing spreadsheet-related file format filters as an external
library.  The orcus library API itself provides API for both loading
from files and loading from in-memory buffers for all file formats it
supports.  LibreOffice itself uses orcus's in-memory buffer API to
achieve file loading due to the way its file loading mechanism works.
Currently, I'm temporarily saving the incoming buffer to a temporary
file and loading from it, but that's far from ideal...

Thanks,

Kohei Yoshida

[1] https://gitlab.com/orcus/orcus


Re: [VOTE] Release Apache Arrow nanoarrow 0.2.0 - RC1

2023-06-19 Thread Sutou Kouhei
+1

I ran the following command line on Debian GNU/Linux sid:

  CMAKE_PREFIX_PATH=/tmp/local \
dev/release/verify-release-candidate.sh 0.2.0 1

with:

  * Apache Arrow C++ main
  * gcc (Debian 12.2.0-14) 12.2.0
  * R version 4.3.0 (2023-04-21) -- "Already Tomorrow"


Thanks,
-- 
kou

In "[VOTE] Release Apache Arrow nanoarrow 0.2.0 - RC1"
on Mon, 19 Jun 2023 15:58:45 -0300, Dewey Dunnington wrote:

> Hello,
> 
> I would like to propose the following release candidate (RC1) of
> Apache Arrow nanoarrow version 0.2.0. This release consists of 17
> resolved GitHub issues [1].
> 
> This release candidate is based on commit:
> f71063605e288d9a8dd73cfdd9578773519b6743 [2]
> 
> The source release rc1 is hosted at [3].
> The changelog is located at [4].
> The draft release post is located at [5].
> 
> Please download, verify checksums and signatures, run the unit tests,
> and vote on the release. See [6] for how to validate a release
> candidate.
> 
> The vote will be open for at least 72 hours.
> 
> [ ] +1 Release this as Apache Arrow nanoarrow 0.2.0
> [ ] +0
> [ ] -1 Do not release this as Apache Arrow nanoarrow 0.2.0 because...
> 
> [0] https://github.com/apache/arrow-nanoarrow
> [1] https://github.com/apache/arrow-nanoarrow/milestone/2?closed=1
> [2] 
> https://github.com/apache/arrow-nanoarrow/tree/apache-arrow-nanoarrow-0.2.0-rc1
> [3] 
> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-nanoarrow-0.2.0-rc1/
> [4] 
> https://github.com/apache/arrow-nanoarrow/blob/apache-arrow-nanoarrow-0.2.0-rc1/CHANGELOG.md
> [5] https://github.com/apache/arrow-site/pull/364
> [6] https://github.com/apache/arrow-nanoarrow/blob/main/dev/release/README.md


Re: [VOTE] Release Apache Arrow ADBC 0.5.0 - RC0

2023-06-19 Thread Dewey Dunnington
+1 (non-binding)

I ran the following on macOS (M1):

USE_CONDA=1 TEST_APT=0 TEST_YUM=0 ./verify-release-candidate.sh 0.5.0 0

On Mon, Jun 19, 2023 at 12:12 PM Jean-Baptiste Onofré  wrote:
>
> +1 (non-binding)
>
> Regards
> JB
>
> On Fri, Jun 16, 2023 at 2:06 AM David Li  wrote:
> >
> > Hello,
> >
> > I would like to propose the following release candidate (RC0) of Apache 
> > Arrow ADBC version 0.5.0. This is a release consisting of 36 resolved 
> > GitHub issues [1].
> >
> > This release candidate is based on commit: 
> > ac0e0ef8bd83787f65e53d421fce6ad490d9a37d [2]
> >
> > The source release rc0 is hosted at [3].
> > The binary artifacts are hosted at [4][5][6][7][8].
> > The changelog is located at [9].
> >
> > Please download, verify checksums and signatures, run the unit tests, and 
> > vote on the release. See [10] for how to validate a release candidate.
> >
> > See also a verification result on GitHub Actions [11].
> >
> > The vote will be open for at least 72 hours.
> >
> > [ ] +1 Release this as Apache Arrow ADBC 0.5.0
> > [ ] +0
> > [ ] -1 Do not release this as Apache Arrow ADBC 0.5.0 because...
> >
> > Note: to verify APT/YUM packages on macOS/AArch64, you must `export 
> > DOCKER_DEFAULT_ARCHITECTURE=linux/amd64`. (Or skip this step by `export 
> > TEST_APT=0 TEST_YUM=0`.)
> >
> > [1]: 
> > https://github.com/apache/arrow-adbc/issues?q=is%3Aissue+milestone%3A%22ADBC+Libraries+0.5.0%22+is%3Aclosed
> > [2]: 
> > https://github.com/apache/arrow-adbc/commit/ac0e0ef8bd83787f65e53d421fce6ad490d9a37d
> > [3]: 
> > https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-adbc-0.5.0-rc0/
> > [4]: https://apache.jfrog.io/artifactory/arrow/almalinux-rc/
> > [5]: https://apache.jfrog.io/artifactory/arrow/debian-rc/
> > [6]: https://apache.jfrog.io/artifactory/arrow/ubuntu-rc/
> > [7]: 
> > https://repository.apache.org/content/repositories/staging/org/apache/arrow/adbc/
> > [8]: 
> > https://github.com/apache/arrow-adbc/releases/tag/apache-arrow-adbc-0.5.0-rc0
> > [9]: 
> > https://github.com/apache/arrow-adbc/blob/apache-arrow-adbc-0.5.0-rc0/CHANGELOG.md
> > [10]: 
> > https://arrow.apache.org/adbc/main/development/releasing.html#how-to-verify-release-candidates
> > [11]: https://github.com/apache/arrow-adbc/actions/runs/5284608862