Re: [VOTE][RUST][DataFusion] Release DataFusion Python Bindings 36.0.0 RC1

2024-03-10 Thread QP Hou
+1 (binding)

On Sun, Mar 10, 2024 at 10:18 AM Andy Grove  wrote:
>
> Bumping this email thread. We need one more +1 PMC vote.
>
> Thanks,
>
> Andy.
>
> On Sun, Mar 3, 2024 at 8:31 PM L. C. Hsieh  wrote:
>
> > +1 (binding)
> >
> > Verified on M3 Mac.
> >
> > Thanks Andy.
> >
> > On Sun, Mar 3, 2024 at 6:53 PM Andy Grove  wrote:
> > >
> > > Hi,
> > >
> > > > I would like to propose a release of Apache Arrow DataFusion Python
> > > > Bindings, version 36.0.0.
> > >
> > > This release candidate is based on commit:
> > > 3a82be08c458358a3c07587c2b4d9ffbaf646ca2 [1]
> > > The proposed release tarball and signatures are hosted at [2].
> > > The changelog is located at [3].
> > > The Python wheels are located at [4].
> > >
> > > > Please download, verify checksums and signatures, run the unit tests, and
> > > > vote on the release. The vote will be open for at least 72 hours.
> > > >
> > > > Only votes from PMC members are binding, but all members of the community
> > > > are encouraged to test the release and vote with "(non-binding)".
> > > >
> > > > The standard verification procedure is documented at
> > > > https://github.com/apache/arrow-datafusion-python/blob/main/dev/release/README.md#verifying-release-candidates
> > > >
> > > > [ ] +1 Release this as Apache Arrow DataFusion Python 36.0.0
> > > > [ ] +0
> > > > [ ] -1 Do not release this as Apache Arrow DataFusion Python 36.0.0 because...
> > >
> > > Here is my vote:
> > >
> > > +1
> > >
> > > [1]:
> > >
> > https://github.com/apache/arrow-datafusion-python/tree/3a82be08c458358a3c07587c2b4d9ffbaf646ca2
> > > [2]:
> > >
> > https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-python-36.0.0-rc1
> > > [3]:
> > >
> > https://github.com/apache/arrow-datafusion-python/blob/3a82be08c458358a3c07587c2b4d9ffbaf646ca2/CHANGELOG.md
> > > [4]: https://test.pypi.org/project/datafusion/36.0.0/
> >
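
The remark in this thread that one more +1 PMC vote was needed reflects how release votes are tallied. A minimal sketch in Python, assuming the usual ASF rule that a release passes with at least three binding +1 votes and more binding +1 than binding -1 votes. The `Vote` type and `release_passes` function are hypothetical names used only for illustration; they are not part of any Arrow tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    voter: str
    value: int      # +1, 0, or -1
    binding: bool   # True only for PMC members


def release_passes(votes, min_binding=3):
    """Return True if the binding votes satisfy the assumed release rule."""
    plus = sum(1 for v in votes if v.binding and v.value > 0)
    minus = sum(1 for v in votes if v.binding and v.value < 0)
    return plus >= min_binding and plus > minus


# Votes as they appear in this thread. Andy Grove's "+1" carries no label in
# the original mail; treating it as binding is an assumption.
votes = [
    Vote("Andy Grove", +1, True),
    Vote("L. C. Hsieh", +1, True),
]
before = release_passes(votes)   # two binding +1 votes: not enough yet
votes.append(Vote("QP Hou", +1, True))
after = release_passes(votes)    # the third binding +1 arrives
print(before, after)
```

Non-binding votes are deliberately ignored by the tally; they still signal that community members tested the candidate.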


Re: [VOTE] Move Arrow DataFusion Subproject to new Top Level Apache Project

2024-03-01 Thread QP Hou
+1 (binding)

exciting milestone :)

On Fri, Mar 1, 2024 at 9:49 AM David Li  wrote:
>
> +1
>
> On Fri, Mar 1, 2024, at 12:06, Jorge Cardoso Leitão wrote:
> > +1 - great work!!!
> >
> > On Fri, Mar 1, 2024 at 5:49 PM Micah Kornfield wrote:
> >
> >> +1 (binding)
> >>
> >> On Friday, March 1, 2024, Uwe L. Korn  wrote:
> >>
> >> > +1 (binding)
> >> >
> >> > On Fri, Mar 1, 2024, at 2:37 PM, Andy Grove wrote:
> >> > > +1 (binding)
> >> > >
> >> > > On Fri, Mar 1, 2024 at 6:20 AM Weston Pace wrote:
> >> > >
> >> > >> +1 (binding)
> >> > >>
> >> > >> On Fri, Mar 1, 2024 at 3:33 AM Andrew Lamb wrote:
> >> > >>
> >> > >> > Hello,
> >> > >> >
> >> > >> > As we have discussed [1][2] I would like to vote on the proposal to
> >> > >> > create a new Apache Top Level Project for DataFusion. The text of the
> >> > >> > proposed resolution and background document is copy/pasted below.
> >> > >> >
> >> > >> > If the community is in favor of this, we plan to submit the resolution
> >> > >> > to the ASF board for approval with the next Arrow report (for the
> >> > >> > April 2024 board meeting).
> >> > >> >
> >> > >> > The vote will be open for at least 7 days.
> >> > >> >
> >> > >> > [ ] +1 Accept this Proposal
> >> > >> > [ ] +0
> >> > >> > [ ] -1 Do not accept this proposal because...
> >> > >> >
> >> > >> > Andrew
> >> > >> >
> >> > >> > [1] https://lists.apache.org/thread/c150t1s1x0kcb3r03cjyx31kqs5oc341
> >> > >> > [2] https://github.com/apache/arrow-datafusion/discussions/6475
> >> > >> >
> >> > >> > -- Proposed Resolution -
> >> > >> >
> >> > >> > Resolution to Create the Apache DataFusion Project from the Apache
> >> > >> > Arrow DataFusion Sub Project
> >> > >> >
> >> > >> > =
> >> > >> >
> >> > >> > X. Establish the Apache DataFusion Project
> >> > >> >
> >> > >> > WHEREAS, the Board of Directors deems it to be in the best
> >> > >> > interests of the Foundation and consistent with the
> >> > >> > Foundation's purpose to establish a Project Management
> >> > >> > Committee charged with the creation and maintenance of
> >> > >> > open-source software related to an extensible query engine
> >> > >> > for distribution at no charge to the public.
> >> > >> >
> >> > >> > NOW, THEREFORE, BE IT RESOLVED, that a Project Management
> >> > >> > Committee (PMC), to be known as the "Apache DataFusion Project",
> >> > >> > be and hereby is established pursuant to Bylaws of the
> >> > >> > Foundation; and be it further
> >> > >> >
> >> > >> > RESOLVED, that the Apache DataFusion Project be and hereby is
> >> > >> > responsible for the creation and maintenance of software
> >> > >> > related to an extensible query engine; and be it further
> >> > >> >
> >> > >> > RESOLVED, that the office of "Vice President, Apache DataFusion" be
> >> > >> > and hereby is created, the person holding such office to
> >> > >> > serve at the direction of the Board of Directors as the chair
> >> > >> > of the Apache DataFusion Project, and to have primary responsibility
> >> > >> > for management of the projects within the scope of
> >> > >> > responsibility of the Apache DataFusion Project; and be it further
> >> > >> >
> >> > >> > RESOLVED, that the persons listed immediately below be and
> >> > >> > hereby are appointed to serve as the initial members of the
> >> > >> > Apache DataFusion Project:
> >> > >> >
> >> > >> > * Andy Grove (agr...@apache.org)
> >> > >> > * Andrew Lamb (al...@apache.org)
> >> > >> > * Daniël Heres (dhe...@apache.org)
> >> > >> > * Jie Wen (jake...@apache.org)
> >> > >> > * Kun Liu (liu...@apache.org)
> >> > >> > * Liang-Chi Hsieh (vii...@apache.org)
> >> > >> > * Qingping Hou (ho...@apache.org)
> >> > >> > * Wes McKinney (w...@apache.org)
> >> > >> > * Will Jones (wjones...@apache.org)
> >> > >> >
> >> > >> > RESOLVED, that the Apache DataFusion Project be and hereby
> >> > >> > is tasked with the migration and rationalization of the Apache
> >> > >> > Arrow DataFusion sub-project; and be it further
> >> > >> >
> >> > >> > RESOLVED, that all responsibilities pertaining to the Apache
> >> > >> > Arrow DataFusion sub-project encumbered upon the
> >> > >> > Apache Arrow Project are hereafter discharged.
> >> > >> >
> >> > >> > NOW, THEREFORE, BE IT FURTHER RESOLVED, that Andrew Lamb
> >> > >> > be appointed to the office of Vice President, Apache DataFusion, to
> >> > >> > serve in accordance with and subject to the direction of the
> >> > >> > Board of Directors and the Bylaws of the Foundation until
> >> > >> > death, resignation, retirement, removal or disqualification,
> >> > >> > or until a successor is appointed.
> >> > >> > =
> >> > >> >
> >> > >> >
> >> > >> > ---
> >> > >> >
> >> > >> >
> >> > >> > Summary:
> >> > >> >
> >> > >> > We propose creating a new top level project, Apache DataFusion, from
> >> > >> > an existing sub project of Apache 

Re: [ANNOUNCE] New Arrow committer: Brent Gardner

2023-01-11 Thread QP Hou
Congratulations Brent!

On Wed, Jan 11, 2023 at 2:56 PM Andy Grove  wrote:

> On behalf of the Arrow PMC, I'm happy to announce that Brent Gardner
> has accepted an invitation to become a committer on Apache
> Arrow. Welcome, and thank you for your contributions!
>
> Andy.
>


Re: [ANNOUNCE] New Arrow PMC chair: Andrew Lamb

2022-12-28 Thread QP Hou
Congrats Andrew and thank you for your continued service to the community!

On Wed, Dec 28, 2022 at 6:53 AM Antoine Pitrou  wrote:
>
>
> Thank you Andrew for accepting this responsibility!
>
>
> On 26/12/2022 at 05:53, Sutou Kouhei wrote:
> > I am pleased to announce that we have a new PMC chair and VP as per
> > our newly started tradition of rotating the chair once a year. I have
> > resigned and Andrew Lamb was duly elected by the PMC and approved
> > unanimously by the board. Please join me in congratulating Andrew Lamb!
> >
> > Thanks,


Re: [ANNOUNCE] New Arrow PMC member: Kun Liu

2022-11-14 Thread QP Hou
Congrats Kun!

On Mon, Nov 14, 2022 at 10:50 AM David Li  wrote:
>
> Welcome, Kun!
>
> On Mon, Nov 14, 2022, at 13:27, Ian Joiner wrote:
> > Congrats!
> >
> > On Sun, Nov 13, 2022 at 3:22 PM Sutou Kouhei  wrote:
> >
> >> The Project Management Committee (PMC) for Apache Arrow has invited
> >> Kun Liu to become a PMC member and we are pleased to announce
> >> that Kun Liu has accepted.
> >>
> >> Congratulations and welcome!
> >>


Re: [VOTE][RUST] Release Apache Arrow Rust 26.0.0 RC1

2022-10-28 Thread QP Hou
+1 (binding)

On Fri, Oct 28, 2022 at 10:11 PM L. C. Hsieh  wrote:
>
> +1 (binding)
>
> Verified on M1 Mac.
>
> Thanks Andrew!
>
> On Fri, Oct 28, 2022 at 9:18 PM Raphael Taylor-Davies
>  wrote:
> >
> > +1 (binding)
> >
> > On 29/10/2022 09:13, Andrew Lamb wrote:
> > > Hi,
> > >
> > > I would like to propose a release of Apache Arrow Rust Implementation,
> > > version 26.0.0.
> > >
> > > This release candidate is based on commit:
> > > 779804317d9c9d80e72a955deb8594eb45a8308a [1]
> > >
> > > The proposed release tarball and signatures are hosted at [2].
> > >
> > > The changelog is located at [3].
> > >
> > > Please download, verify checksums and signatures, run the unit tests,
> > > and vote on the release. There is a script [4] that automates some of
> > > the verification.
> > >
> > > The vote will be open for at least 72 hours.
> > >
> > > [ ] +1 Release this as Apache Arrow Rust
> > > [ ] +0
> > > [ ] -1 Do not release this as Apache Arrow Rust  because...
> > >
> > > [1]:
> > > https://github.com/apache/arrow-rs/tree/779804317d9c9d80e72a955deb8594eb45a8308a
> > > [2]: 
> > > https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-rs-26.0.0-rc1
> > > [3]:
> > > https://github.com/apache/arrow-rs/blob/779804317d9c9d80e72a955deb8594eb45a8308a/CHANGELOG.md
> > > [4]:
> > > https://github.com/apache/arrow-rs/blob/master/dev/release/verify-release-candidate.sh
> > >


Re: [ANNOUNCE] New Arrow committer: Will Jones

2022-10-27 Thread QP Hou
Congratulations Will!

On Thu, Oct 27, 2022 at 6:51 PM Vibhatha Abeykoon wrote:

> Congratulations Will!
>
> On Fri, Oct 28, 2022 at 6:40 AM Li Jin  wrote:
>
> > congrats!
> >
> > On Thu, Oct 27, 2022 at 9:03 PM Matt Topol wrote:
> >
> > > Congrats Will!
> > >
> > > On Thu, Oct 27, 2022 at 9:02 PM Ian Cook wrote:
> > >
> > > > Congratulations Will!
> > > >
> > > > On Thu, Oct 27, 2022 at 19:56 Sutou Kouhei wrote:
> > > >
> > > > > On behalf of the Arrow PMC, I'm happy to announce that Will Jones
> > > > > has accepted an invitation to become a committer on Apache
> > > > > Arrow. Welcome, and thank you for your contributions!
> > > > >
> > > > > kou
> > > > >
> > > >
> > >
> >
>


Re: [VOTE][RUST][DataFusion] Release Apache Arrow DataFusion 13.0.0 RC1

2022-10-07 Thread QP Hou
+1 (binding)

On Fri, Oct 7, 2022 at 3:11 PM Ian Joiner  wrote:

> +1 (non-binding)
>
> Verified on my Ubuntu 22.04 / AMD
>
> On Fri, Oct 7, 2022 at 7:26 AM Andy Grove  wrote:
>
> > Hi,
> >
> > I would like to propose a release of Apache Arrow DataFusion
> > Implementation, version 13.0.0.
> >
> > This release candidate is based on commit:
> > 807a0c1d2963f6ca327d316badb4ed0fa77e9f21 [1]
> > The proposed release tarball and signatures are hosted at [2].
> > The changelog is located at [3].
> >
> > Please download, verify checksums and signatures, run the unit tests, and
> > vote on the release. The vote will be open for at least 72 hours.
> >
> > Only votes from PMC members are binding, but all members of the community
> > are encouraged to test the release and vote with "(non-binding)".
> >
> > The standard verification procedure is documented at
> > https://github.com/apache/arrow-datafusion/blob/master/dev/release/README.md#verifying-release-candidates
> >
> > [ ] +1 Release this as Apache Arrow DataFusion 13.0.0
> > [ ] +0
> > [ ] -1 Do not release this as Apache Arrow DataFusion 13.0.0 because...
> >
> > Here is my vote:
> >
> > +1
> >
> > [1]: https://github.com/apache/arrow-datafusion/tree/807a0c1d2963f6ca327d316badb4ed0fa77e9f21
> > [2]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-13.0.0-rc1
> > [3]: https://github.com/apache/arrow-datafusion/blob/807a0c1d2963f6ca327d316badb4ed0fa77e9f21/CHANGELOG.md
> >
>


Re: [VOTE][RUST] Release Apache Arrow Rust Object Store 0.5.1 RC1

2022-10-07 Thread QP Hou
+1 (binding)

On Fri, Oct 7, 2022 at 3:31 PM Ian Joiner  wrote:

> +1 (Non-binding)
>
> Never mind. It is slow but it still works.
>
> On Fri, Oct 7, 2022 at 11:32 AM Andrew Lamb  wrote:
>
> > Hi,
> >
> > I would like to propose a release of Apache Arrow Rust Object
> > Store Implementation, version 0.5.1.
> >
> > This release candidate is based on commit:
> > 8a54e95850fe27ac5865a02ef4be2de0937de5b3 [1]
> >
> > The proposed release tarball and signatures are hosted at [2].
> >
> > The changelog is located at [3].
> >
> > Please download, verify checksums and signatures, run the unit tests,
> > and vote on the release. There is a script [4] that automates some of
> > the verification.
> >
> > The vote will be open for at least 72 hours.
> >
> > [ ] +1 Release this as Apache Arrow Rust Object Store
> > [ ] +0
> > [ ] -1 Do not release this as Apache Arrow Rust Object Store  because...
> >
> > [1]: https://github.com/apache/arrow-rs/tree/8a54e95850fe27ac5865a02ef4be2de0937de5b3
> > [2]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-object-store-rs-0.5.1-rc1
> > [3]: https://github.com/apache/arrow-rs/blob/8a54e95850fe27ac5865a02ef4be2de0937de5b3/object_store/CHANGELOG.md
> > [4]: https://github.com/apache/arrow-rs/blob/master/object_store/dev/release/verify-release-candidate.sh
> >
>


Re: [VOTE][RUST] Release Apache Arrow Rust 23.0.0 RC2

2022-09-17 Thread QP Hou
+1 binding

On Sat, Sep 17, 2022 at 11:39 AM Ashish  wrote:

> +1 (non-binding)
>
> validated on M1 Mac
>
> thanks
> Ashish
>
> On Sat, Sep 17, 2022 at 11:13 AM L. C. Hsieh  wrote:
>
> > +1 (binding)
> >
> > Verified on M1 Mac.
> >
> > Thanks, Andrew!
> >
> > On Sat, Sep 17, 2022 at 11:12 AM Andy Grove 
> wrote:
> > >
> > > +1 (binding)
> > >
> > > > I used the updated verification script from the repo rather than the
> > > > one in the release.
> > >
> > > Verified on Ubuntu 20.04.4 LTS.
> > >
> > > Thanks, Andrew!
> > >
> > > > On Sat, Sep 17, 2022 at 12:01 PM Andrew Lamb wrote:
> > > >
> > > > > We have resolved the issue (it was a verification script issue). I
> > > > > hope the voting on this release candidate can continue.
> > > > >
> > > > > On Fri, Sep 16, 2022, 16:53 Andrew Lamb wrote:
> > > > >
> > > > > > Thank you for the report Ian. I am sorry -- I think there is a bug in
> > > > > > the verification script; I have filed an issue to track this problem [1]
> > > > > > and will report back here when it is resolved
> > > > >
> > > > >
> > > > >
> > > > > > On Fri, Sep 16, 2022 at 4:35 PM Ian Joiner wrote:
> > > > >
> > > > >> 0 (Non-binding)
> > > > >>
> > > > >> I found a problem related to the new arrow-buffer crate.
> > > > >>
> > > > > >> + cargo publish --dry-run
> > > > > >>     Updating crates.io index
> > > > > >>    Packaging arrow v23.0.0 (/private/var/folders/cl/ycxd_6916zlf50f8mpthd9qwgn/T/arrow-23.0.0.X.yNgAGJ0B/apache-arrow-rs-23.0.0/arrow)
> > > > > >> error: failed to prepare local package for uploading
> > > > > >>
> > > > > >> Caused by:
> > > > > >>   no matching package named `arrow-buffer` found
> > > > > >>   location searched: registry `crates-io`
> > > > > >>   required by package `arrow v23.0.0 (/private/var/folders/cl/ycxd_6916zlf50f8mpthd9qwgn/T/arrow-23.0.0.X.yNgAGJ0B/apache-arrow-rs-23.0.0/arrow)`
> > > > > >>
> > > > >>
> > > > >> Ian
> > > > >>
> > > > > >> On Fri, Sep 16, 2022 at 2:05 PM Andrew Lamb wrote:
> > > > >>
> > > > >> > Hi,
> > > > >> >
> > > > > >> > I would like to propose a release of Apache Arrow Rust Implementation,
> > > > > >> > version 23.0.0.
> > > > >> >
> > > > >> > This release candidate is based on commit:
> > > > >> > 5a55406cf24171600a143a83a95046c7513fd92c [1]
> > > > >> >
> > > > >> > The proposed release tarball and signatures are hosted at [2].
> > > > >> >
> > > > >> > The changelog is located at [3].
> > > > >> >
> > > > > >> > Note there is a known issue causing the docs CI to fail [5] that is
> > > > > >> > not related to the arrow code.
> > > > > >> >
> > > > > >> > Note this is RC2 (RC1 had a problem [6]).
> > > > > >> >
> > > > > >> > Please download, verify checksums and signatures, run the unit tests,
> > > > > >> > and vote on the release. There is a script [4] that automates some of
> > > > > >> > the verification.
> > > > >> >
> > > > >> > The vote will be open for at least 72 hours.
> > > > >> >
> > > > >> > [ ] +1 Release this as Apache Arrow Rust
> > > > >> > [ ] +0
> > > > >> > [ ] -1 Do not release this as Apache Arrow Rust  because...
> > > > >> >
> > > > >> > [1]:
> > > > >> >
> > > > >> >
> > > > >>
> > > >
> >
> https://github.com/apache/arrow-rs/tree/5a55406cf24171600a143a83a95046c7513fd92c
> > > > >> > [2]:
> > > > >> >
> > > >
> > https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-rs-23.0.0-rc2
> > > > >> > [3]:
> > > > >> >
> > > > >> >
> > > > >>
> > > >
> >
> https://github.com/apache/arrow-rs/blob/5a55406cf24171600a143a83a95046c7513fd92c/CHANGELOG.md
> > > > >> > [4]:
> > > > >> >
> > > > >> >
> > > > >>
> > > >
> >
> https://github.com/apache/arrow-rs/blob/master/dev/release/verify-release-candidate.sh
> > > > >> > [5]: https://github.com/apache/arrow-rs/issues/2733
> > > > >> > [6]: https://github.com/apache/arrow-rs/pull/2748
> > > > >> >
> > > > >>
> > > > >
> > > >
> >
>
>
> --
> thanks
> ashish
>


Re: [VOTE][RUST] Release Apache Arrow Rust Object Store 0.5.0 RC1

2022-09-08 Thread QP Hou
+1 (binding)

On Thu, Sep 8, 2022 at 11:16 AM Ashish  wrote:

> +1 (non-binding)
>
> validated on M1 Mac
>
> On Thu, Sep 8, 2022 at 11:03 AM Andrew Lamb  wrote:
>
> > Hi,
> >
> > I would like to propose a release of Apache Arrow Rust Object
> > Store Implementation, version 0.5.0.
> >
> > This release candidate is based on commit:
> > dd58805b1c46691fcbe5b46412b2581ae3bd2a58 [1]
> >
> > The proposed release tarball and signatures are hosted at [2].
> >
> > The changelog is located at [3].
> >
> > Please download, verify checksums and signatures, run the unit tests,
> > and vote on the release. There is a script [4] that automates some of
> > the verification.
> >
> > You can see additional details on the release tracking ticket [5]
> >
> > The vote will be open for at least 72 hours.
> >
> > [ ] +1 Release this as Apache Arrow Rust Object Store
> > [ ] +0
> > [ ] -1 Do not release this as Apache Arrow Rust Object Store  because...
> >
> > [1]: https://github.com/apache/arrow-rs/tree/dd58805b1c46691fcbe5b46412b2581ae3bd2a58
> > [2]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-object-store-rs-0.5.0-rc1
> > [3]: https://github.com/apache/arrow-rs/blob/dd58805b1c46691fcbe5b46412b2581ae3bd2a58/object_store/CHANGELOG.md
> > [4]: https://github.com/apache/arrow-rs/blob/master/object_store/dev/release/verify-release-candidate.sh
> > [5]: https://github.com/apache/arrow-rs/issues/2620
> >
>
>
> --
> thanks
> ashish
>


Re: [ANNOUNCE] New Arrow PMC member: Weston Pace

2022-09-05 Thread QP Hou
Congrats Weston!

On Mon, Sep 5, 2022 at 9:38 AM Yaron Gvili  wrote:

> Congratulations Weston!
> 
> From: Raul Cumplido Dominguez 
> Sent: Monday, September 5, 2022 10:04 AM
> To: dev@arrow.apache.org 
> Subject: Re: [ANNOUNCE] New Arrow PMC member: Weston Pace
>
> Congratulations Weston!
>
> On Mon, Sep 5, 2022 at 3:37 PM Niranda Perera 
> wrote:
>
> > Congrats Weston! :-)
> >
> > On Mon, Sep 5, 2022 at 9:36 AM Vibhatha Abeykoon 
> > wrote:
> >
> > > Congratulations Weston!
> > >
> > > On Mon, Sep 5, 2022 at 7:04 PM Antoine Pitrou 
> > wrote:
> > >
> > > >
> > > > Congratulations Weston!
> > > >
> > > > > On 05/09/2022 at 15:30, Ian Cook wrote:
> > > > > Congratulations Weston!
> > > > >
> > > > > On Mon, Sep 5, 2022 at 01:56 Sutou Kouhei 
> > wrote:
> > > > >
> > > > > >> The Project Management Committee (PMC) for Apache Arrow has invited
> > > > > >> Weston Pace to become a PMC member and we are pleased to announce
> > > > > >> that Weston Pace has accepted.
> > > > >>
> > > > >> Congratulations and welcome!
> > > > >>
> > > > >
> > > >
> > > --
> > > Vibhatha Abeykoon
> > >
> >
> >
> > --
> > Niranda Perera
> > https://niranda.dev/
> > @n1r44 
> >
>


Re: [ANNOUNCE] New Arrow PMC member: L. C. Hsieh

2022-09-03 Thread QP Hou
Congrats Liang-Chi!

On Sat, Sep 3, 2022 at 8:25 PM Remzi Yang <1371656737...@gmail.com> wrote:

> Congratulations Liang-Chi!
>
> On Sun, 4 Sept 2022 at 05:39, Sutou Kouhei  wrote:
>
> > The Project Management Committee (PMC) for Apache Arrow has invited
> > L. C. Hsieh to become a PMC member and we are pleased to announce
> > that L. C. Hsieh has accepted.
> >
> > Congratulations and welcome!
> >
>


Re: [VOTE][RUST] Release Apache Arrow Rust 22.0.0 RC1

2022-09-03 Thread QP Hou
+1 (binding)

On Fri, Sep 2, 2022 at 8:31 PM Chao Sun  wrote:

> +1 (non-binding). Verified on Intel Mac
>
> Thanks Andrew.
>
> Chao
>
> On Fri, Sep 2, 2022 at 6:23 PM Remzi Yang <1371656737...@gmail.com> wrote:
> >
> > +1 (non-binding) Verified on M1 Mac.
> > Thank you, Andrew.
> >
> > On Sat, 3 Sept 2022 at 08:10, Ashish  wrote:
> >
> > > +1 (non-binding)
> > >
> > > Validated on M1 Mac/ Macos 12.5.1
> > >
> > > On Fri, Sep 2, 2022 at 3:45 PM Andy Grove 
> wrote:
> > >
> > > > +1 (binding)
> > > >
> > > > Verified on Ubuntu 20.04.4 LTS
> > > >
> > > > Thanks, Andrew.
> > > >
> > > > On Fri, Sep 2, 2022 at 12:25 PM Ian Joiner 
> > > wrote:
> > > >
> > > > > +1 (Non-binding)
> > > > >
> > > > > Tested on macOS 12.2.1 / Apple M1
> > > > >
> > > > > On Fri, Sep 2, 2022 at 2:11 PM Andrew Lamb 
> > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > > I would like to propose a release of Apache Arrow Rust Implementation,
> > > > > > > version 22.0.0.
> > > > > >
> > > > > > This release candidate is based on commit:
> > > > > > e5b9d05ec50807666efe401729708d53216d79fc [1]
> > > > > >
> > > > > > The proposed release tarball and signatures are hosted at [2].
> > > > > >
> > > > > > The changelog is located at [3].
> > > > > >
> > > > > > > Please download, verify checksums and signatures, run the unit tests,
> > > > > > > and vote on the release. There is a script [4] that automates some of
> > > > > > > the verification.
> > > > > >
> > > > > > The vote will be open for at least 72 hours.
> > > > > >
> > > > > > [ ] +1 Release this as Apache Arrow Rust
> > > > > > [ ] +0
> > > > > > [ ] -1 Do not release this as Apache Arrow Rust  because...
> > > > > >
> > > > > > [1]:
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> https://github.com/apache/arrow-rs/tree/e5b9d05ec50807666efe401729708d53216d79fc
> > > > > > [2]:
> > > > > >
> > > >
> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-rs-22.0.0-rc1
> > > > > > [3]:
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> https://github.com/apache/arrow-rs/blob/e5b9d05ec50807666efe401729708d53216d79fc/CHANGELOG.md
> > > > > > [4]:
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> https://github.com/apache/arrow-rs/blob/master/dev/release/verify-release-candidate.sh
> > > > > >
> > > > >
> > > >
> > >
> > >
> > > --
> > > thanks
> > > ashish
> > >
>


Re: [VOTE][RUST] Release Apache Arrow Rust 21.0.0 RC2

2022-08-22 Thread QP Hou
+1 (binding)

On Mon, Aug 22, 2022 at 7:49 AM Ashish  wrote:
>
> +1 (non-binding)
>
> Hit the same json issue, already discussed.
>
> On Mon, Aug 22, 2022 at 6:45 AM Andy Grove  wrote:
>
> > That sounds fine to me.
> >
> > +1 (binding)
> >
> > Thanks, Andrew and LC.
> >
> > On Mon, Aug 22, 2022 at 3:11 AM Andrew Lamb  wrote:
> >
> > > Thank you for looking into this LC.
> > >
> > > > It is my opinion that as long as we think the code to be published is
> > > > correct (I am not including the tests), we don't need to make an RC3 and
> > > > could publish RC2 as is. If others feel differently, I am happy to make an
> > > > RC3 but I likely won't have time until next week (I will be away this
> > > > week).
> > >
> > > Andrew
> > >
> > >
> > > On Sun, Aug 21, 2022 at 8:24 PM L. C. Hsieh  wrote:
> > >
> > > > I'm sure that I didn't hit this error when I verified the release.
> > > >
> > > > However, I can reproduce the error now.
> > > >
> > > > It seems due to serde_json 1.0.84+. When I specify versions like
> > > > 1.0.83 or 1.0.82, it can pass.
> > > >
> > > > https://github.com/serde-rs/json/releases/tag/v1.0.84 makes some
> > > > changes to Debug of serde_json::Value.
> > > > As the test failure is the difference of error message, I think it
> > > > won't affect the functionalities.
> > > >
> > > > Submitted a PR to fix the test:
> > > > https://github.com/apache/arrow-rs/pull/2546
> > > >
> > > > Not sure if we should cut RC3 for that.
> > > >
> > > > On Sun, Aug 21, 2022 at 3:11 PM Andy Grove 
> > > wrote:
> > > > >
> > > > > I tried verifying on mac and ubuntu and both failed with:
> > > > >
> > > > > failures:
> > > > >
> > > > > > ---- json::reader::tests::test_row_type_validation stdout ----
> > > > > > thread 'json::reader::tests::test_row_type_validation' panicked at
> > > > > > 'assertion failed: `(left == right)`
> > > > > >   left: `"Json error: Expected JSON record to be an object, found Array [Number(1), String(\"hello\")]"`,
> > > > > >  right: `"Json error: Expected JSON record to be an object, found Array([Number(1), String(\"hello\")])"`', src/json/reader.rs:2624:9
> > > > >
> > > > >
> > > > > failures:
> > > > > json::reader::tests::test_row_type_validation
> > > > >
> > > > >
> > > > >
> > > > > On Thu, Aug 18, 2022 at 3:23 PM Andrew Lamb 
> > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > > I would like to propose a release of Apache Arrow Rust Implementation,
> > > > > > > version 21.0.0.
> > > > > >
> > > > > > This release candidate is based on commit:
> > > > > > 68934f0fa5da62ae9fcbb753d1dfd8f672a4eb1e [1]
> > > > > >
> > > > > > The proposed release tarball and signatures are hosted at [2].
> > > > > >
> > > > > > The changelog is located at [3].
> > > > > >
> > > > > > > Please download, verify checksums and signatures, run the unit tests,
> > > > > > > and vote on the release. There is a script [4] that automates some of
> > > > > > > the verification.
> > > > > > >
> > > > > > > Please note
> > > > > > > 1. The verification script has changed since last release [5][6]
> > > > > > > 2. This is RC 2 (the first release candidate has problems in the
> > > > > > > verification script -- see [6])
> > > > > >
> > > > > > The vote will be open for at least 72 hours.
> > > > > >
> > > > > > [ ] +1 Release this as Apache Arrow Rust
> > > > > > [ ] +0
> > > > > > [ ] -1 Do not release this as Apache Arrow Rust  because...
> > > > > >
> > > > > > [1]:
> > > > > >
> > > > > >
> > > >
> > >
> > https://github.com/apache/arrow-rs/tree/68934f0fa5da62ae9fcbb753d1dfd8f672a4eb1e
> > > > > > [2]:
> > > > > >
> > > >
> > https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-rs-21.0.0-rc2
> > > > > > [3]:
> > > > > >
> > > > > >
> > > >
> > >
> > https://github.com/apache/arrow-rs/blob/68934f0fa5da62ae9fcbb753d1dfd8f672a4eb1e/CHANGELOG.md
> > > > > > [4]:
> > > > > >
> > > > > >
> > > >
> > >
> > https://github.com/apache/arrow-rs/blob/master/dev/release/verify-release-candidate.sh
> > > > > > [5]: https://github.com/apache/arrow-rs/pull/2339
> > > > > > [6]: https://github.com/apache/arrow-rs/pull/2506
> > > > > >
> > > >
> > >
> >
>
>
> --
> thanks
> ashish
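
The failure discussed in this thread comes from a test that compares a complete error string, including the debug rendering of the JSON value. A small sketch in Python (used here only for illustration; the actual test is Rust) with the two messages copied from the report shows that they agree up to the word "Array", so an assertion on that stable prefix would be unaffected by the formatting change. The variable names are hypothetical.

```python
import os

# The two messages from the failed assertion quoted in this thread.
left = ('Json error: Expected JSON record to be an object, found '
        'Array [Number(1), String("hello")]')
right = ('Json error: Expected JSON record to be an object, found '
         'Array([Number(1), String("hello")])')

# os.path.commonprefix compares character by character, so it works on
# arbitrary strings, not only on file paths.
stable = os.path.commonprefix([left, right])

exact_match = left == right   # brittle: depends on the debug formatting
prefix_match = left.startswith(stable) and right.startswith(stable)
print(repr(stable))
print(exact_match, prefix_match)
```

Matching on a prefix trades some strictness for stability across dependency upgrades; whether that trade is acceptable is a judgment call for the test author.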


Re: [VOTE][RUST][DataFusion] Release Apache Arrow DataFusion 11.0.0 RC1

2022-08-17 Thread QP Hou
+1 (binding)

On Wed, Aug 17, 2022 at 9:26 AM Francis Du  wrote:

> +1 (Non-binding)
>
> Verified on MacOS 12.5 / MBA M2
>
> On Wed, 17 Aug 2022 at 16:47, Andrew Lamb  wrote:
>
> > +1 (binding)
> >
> > Thanks Andy and everyone else who contributed to this release -- this
> > looks like it will be great!
> >
> > On Tue, Aug 16, 2022 at 1:17 PM Ian Joiner 
> wrote:
> >
> > > +1 (Non-binding)
> > >
> > > Verified on macOS 12.2.1 / Apple M1 Pro
> > >
> > > P.S. When verifying with zsh instead of bash, we get a "command not
> > > found" error for "shasum -a 256 -c" at verify_dir_artifact_signatures:10
> > > unless shasums are disabled. This has been happening for a while. I am
> > > not sure whether we want to fix this. If so, I will file a PR for that.
> > >
> > > On Tue, Aug 16, 2022 at 12:15 PM Andy Grove 
> > wrote:
> > >
> > > > Hi,
> > > >
> > > > I would like to propose a release of Apache Arrow DataFusion
> > > > Implementation, version 11.0.0.
> > > >
> > > > This release candidate is based on commit:
> > > > 8ee31cc69f43a4de0c0678d18a57f27cb4d0ead1 [1]
> > > > The proposed release tarball and signatures are hosted at [2].
> > > > The changelog is located at [3].
> > > >
> > > > Please download, verify checksums and signatures, run the unit tests, and
> > > > vote on the release. The vote will be open for at least 72 hours.
> > > >
> > > > Only votes from PMC members are binding, but all members of the community
> > > > are encouraged to test the release and vote with "(non-binding)".
> > > >
> > > > The standard verification procedure is documented at
> > > > https://github.com/apache/arrow-datafusion/blob/master/dev/release/README.md#verifying-release-candidates
> > > >
> > > > [ ] +1 Release this as Apache Arrow DataFusion 11.0.0
> > > > [ ] +0
> > > > [ ] -1 Do not release this as Apache Arrow DataFusion 11.0.0 because...
> > > >
> > > > Here is my vote:
> > > >
> > > > +1
> > > >
> > > > [1]: https://github.com/apache/arrow-datafusion/tree/8ee31cc69f43a4de0c0678d18a57f27cb4d0ead1
> > > > [2]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-11.0.0-rc1
> > > > [3]: https://github.com/apache/arrow-datafusion/blob/8ee31cc69f43a4de0c0678d18a57f27cb4d0ead1/CHANGELOG.md
> > > >
> > >
> >
>
>
> --
> Name : Francis Du
> Email : francisdu...@gmail.com
>


Re: [VOTE][RUST] Release Apache Arrow Rust Object Store 0.4.0 RC2

2022-08-10 Thread QP Hou
+1 binding!

On Wed, Aug 10, 2022 at 12:31 PM L. C. Hsieh  wrote:

> +1 (non-binding). Verified on Intel Mac.
>
> Thank you, Andrew!
>
> On Wed, Aug 10, 2022 at 7:36 AM Andy Grove  wrote:
> >
> > +1 (binding)
> >
> > Verified on Ubuntu 20.04.4 LTS
> >
> > Thanks, Andrew!
> >
> > On Wed, Aug 10, 2022 at 8:28 AM Andrew Lamb 
> wrote:
> >
> > > Hi,
> > >
> > > I would like to propose a release of Apache Arrow Rust Object
> > > Store Implementation, version 0.4.0.
> > >
> > > This is my first attempt to do a release so please let me know if you
> > > encounter any issues or have any suggestions. Note that the verification
> > > script is in a different (but similar) location
> > >
> > > This release candidate is based on commit:
> > > 195d9c5e7aac9f8d88b281af314ff4f822fe [1]
> > >
> > > The proposed release tarball and signatures are hosted at [2].
> > >
> > > The changelog is located at [3].
> > >
> > > Please download, verify checksums and signatures, run the unit tests,
> > > and vote on the release. There is a script [4] that automates some of
> > > the verification.
> > >
> > > The vote will be open for at least 72 hours.
> > >
> > > [ ] +1 Release this as Apache Arrow Rust Object Store
> > > [ ] +0
> > > [ ] -1 Do not release this as Apache Arrow Rust Object Store because...
> > >
> > > [1]: https://github.com/apache/arrow-rs/tree/195d9c5e7aac9f8d88b281af314ff4f822fe
> > > [2]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-object-store-rs-0.4.0-rc2
> > > [3]: https://github.com/apache/arrow-rs/blob/195d9c5e7aac9f8d88b281af314ff4f822fe/object_store/CHANGELOG.md
> > > [4]: https://github.com/apache/arrow-rs/blob/master/object_store/dev/release/verify-release-candidate.sh
> > >
>
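
The "verify checksums" step requested in the thread above can be sketched in Python. This is only an illustration, not the project's verification script [4]: it assumes a hypothetical `.sha512` file stored next to the tarball whose first whitespace-separated token is the hex digest, and the function names are made up for this example.

```python
import hashlib
from pathlib import Path


def sha512_hex(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large tarballs are not loaded into memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(tarball, checksum_file):
    """Compare the tarball's digest with the first token of the checksum file.

    Assumption: the checksum file holds "<hex digest>  <file name>".
    """
    expected = Path(checksum_file).read_text().split()[0].lower()
    return sha512_hex(tarball) == expected
```

A matching checksum only shows that the download is intact; the signature check is a separate step that ties the tarball to the release manager's key.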


Re: [VOTE][RUST] Release Apache Arrow Rust 20.0.0 RC2

2022-08-06 Thread QP Hou
+1 (binding)

On Sat, Aug 6, 2022 at 8:01 AM Andy Grove  wrote:

> +1 (binding).
> Verified on Ubuntu 20.04 LTS
>
> Thanks, Andrew!
>
> On Sat, Aug 6, 2022 at 8:00 AM Remzi Yang <1371656737...@gmail.com> wrote:
>
> > +1 (non-binding)
> > Verified on M1 Mac. Thank you, Andrew!
> >
> > Best Regards,
> > Remzi
> >
> > On Sat, 6 Aug 2022 at 16:50, Kun Liu  wrote:
> >
> > > Hi all,
> > >   Verified on my Intel Mac.
> > >   But I have a question about the public API from this issue:
> > >   https://github.com/apache/arrow-rs/issues/2343
> > >
> > > Thanks,
> > > Kun
> > >
> > >
> > >
> > > > On Sat, Aug 6, 2022 at 10:30, L. C. Hsieh wrote:
> > >
> > > > +1 (non-binding). Verified on Intel Mac.
> > > >
> > > > Thanks, Andrew!
> > > >
> > > > > On Fri, Aug 5, 2022 at 2:08 PM Andrew Lamb wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > > I would like to propose a release of Apache Arrow Rust Implementation,
> > > > > > version 20.0.0. Note it includes the new object_store source as described
> > > > > > in [5]. I don't think it is a release blocker but would welcome other
> > > > > > opinions.
> > > > >
> > > > > Also note this is RC2 (RC1 had an issue [6])
> > > > >
> > > > > This release candidate is based on commit:
> > > > > 30c94dbf1c422f81f8520b9956e96ab7b53c3f47 [1]
> > > > >
> > > > > The proposed release tarball and signatures are hosted at [2].
> > > > >
> > > > > The changelog is located at [3].
> > > > >
> > > > > > Please download, verify checksums and signatures, run the unit tests,
> > > > > > and vote on the release. There is a script [4] that automates some of
> > > > > > the verification.
> > > > >
> > > > > The vote will be open for at least 72 hours.
> > > > >
> > > > > [ ] +1 Release this as Apache Arrow Rust
> > > > > [ ] +0
> > > > > [ ] -1 Do not release this as Apache Arrow Rust  because...
> > > > >
> > > > > [1]:
> > > > >
> > > >
> > >
> >
> https://github.com/apache/arrow-rs/tree/30c94dbf1c422f81f8520b9956e96ab7b53c3f47
> > > > > [2]:
> > > >
> > https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-rs-20.0.0-rc2
> > > > > [3]:
> > > > >
> > > >
> > >
> >
> https://github.com/apache/arrow-rs/blob/30c94dbf1c422f81f8520b9956e96ab7b53c3f47/CHANGELOG.md
> > > > > [4]:
> > > > >
> > > >
> > >
> >
> https://github.com/apache/arrow-rs/blob/master/dev/release/verify-release-candidate.sh
> > > > > [5]: https://github.com/apache/arrow-rs/issues/2338
> > > > > [6]: https://github.com/apache/arrow-rs/pull/2340
> > > >
> > >
> >
>


Re: [VOTE][RUST] Release Apache Arrow Rust 19.0.0 RC1

2022-07-23 Thread QP Hou
+1 (binding)

On Fri, Jul 22, 2022 at 5:01 PM L. C. Hsieh  wrote:
>
> +1 (non-binding)
>
> Tested on Intel Macbook.
>
> Thanks Andrew!
>
> On Fri, Jul 22, 2022 at 12:34 PM Ian Joiner  wrote:
> >
> > +1 (Non-binding)
> >
> > Tested on a MacBook Pro with macOS 12.2.1 & Apple M1 Pro chip
> >
> > On Fri, Jul 22, 2022 at 12:56 PM Andrew Lamb  wrote:
> >
> > > Hi,
> > >
> > > I would like to propose a release of Apache Arrow Rust Implementation,
> > > version 19.0.0.
> > >
> > > This release candidate is based on commit:
> > > c3e019f3011a902a344758969b1cfc3604f3c2d7 [1]
> > >
> > > The proposed release tarball and signatures are hosted at [2].
> > >
> > > The changelog is located at [3].
> > >
> > > Please download, verify checksums and signatures, run the unit tests,
> > > and vote on the release. There is a script [4] that automates some of
> > > the verification.
> > >
> > > The vote will be open for at least 72 hours.
> > >
> > > [ ] +1 Release this as Apache Arrow Rust
> > > [ ] +0
> > > [ ] -1 Do not release this as Apache Arrow Rust  because...
> > >
> > > [1]:
> > >
> > > https://github.com/apache/arrow-rs/tree/c3e019f3011a902a344758969b1cfc3604f3c2d7
> > > [2]:
> > > https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-rs-19.0.0-rc1
> > > [3]:
> > >
> > > https://github.com/apache/arrow-rs/blob/c3e019f3011a902a344758969b1cfc3604f3c2d7/CHANGELOG.md
> > > [4]:
> > >
> > > https://github.com/apache/arrow-rs/blob/master/dev/release/verify-release-candidate.sh
> > > -
> > >


Re: [VOTE] Accept donation of Rust Object Store donation

2022-07-18 Thread QP Hou
+1 (binding)

On Mon, Jul 18, 2022 at 8:55 AM Andrew Lamb  wrote:
>
> Hello,
>
> This vote is to determine if the Arrow PMC is in favor of accepting the
> donation of the Rust Object Store crate to arrow-rs.
>
> We have previously discussed this topic [1] and [2].
>
> The proposed donation is at [3]. IP clearance work is underway in [4] and
> [5]
>
> The vote will be open for at least 72 hours.
>
> [ ] +1 : Accept the donation
> [ ] 0 : No opinion
> [ ] -1 : Reject donation because...
>
> My vote: +1
>
> Thank you
> Andrew
>
> [1] https://lists.apache.org/thread/22mptsrm9kjkjyrt5sz8dby8rroj87o2
> [2] https://github.com/apache/arrow-rs/issues/2030
> [3] https://github.com/apache/arrow-rs/pull/2081
> [4] https://github.com/apache/arrow-rs/issues/2096
> [5]
> https://svn.apache.org/repos/asf/incubator/public/trunk/content/ip-clearance/arrow-rust-object-store.xml


Re: [VOTE][RUST][DataFusion] Release Apache Arrow DataFusion 10.0.0 RC1

2022-07-13 Thread QP Hou
+1 (binding)

On Wed, Jul 13, 2022 at 1:24 PM Andrew Lamb  wrote:
>
> +1 (binding)
>
> Thanks Andy! I know releases are a significant amount of work.
>
> Andrew
>
> On Tue, Jul 12, 2022 at 11:45 AM Andy Grove  wrote:
>
> > Hi,
> >
> > I would like to propose a release of Apache Arrow DataFusion
> > Implementation,
> > version 10.0.0.
> >
> > This release candidate is based on commit:
> > d25e822c1ef85ee7c0297b4b38d05a51b0d2e46f [1]
> > The proposed release tarball and signatures are hosted at [2].
> > The changelog is located at [3].
> >
> > Please download, verify checksums and signatures, run the unit tests, and
> > vote
> > on the release. The vote will be open for at least 72 hours.
> >
> > Only votes from PMC members are binding, but all members of the community
> > are
> > encouraged to test the release and vote with "(non-binding)".
> >
> > The standard verification procedure is documented at
> >
> > https://github.com/apache/arrow-datafusion/blob/master/dev/release/README.md#verifying-release-candidates
> > .
> >
> > [ ] +1 Release this as Apache Arrow DataFusion 10.0.0
> > [ ] +0
> > [ ] -1 Do not release this as Apache Arrow DataFusion 10.0.0 because...
> >
> > Here is my vote:
> >
> > +1
> >
> > [1]:
> >
> > https://github.com/apache/arrow-datafusion/tree/d25e822c1ef85ee7c0297b4b38d05a51b0d2e46f
> > [2]:
> >
> > https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-10.0.0-rc1
> > [3]:
> >
> > https://github.com/apache/arrow-datafusion/blob/d25e822c1ef85ee7c0297b4b38d05a51b0d2e46f/CHANGELOG.md
> >


Re: [VOTE][RUST] Release Apache Arrow Rust 18.0.0 RC1

2022-07-09 Thread QP Hou
+1 (binding)

On Sat, Jul 9, 2022 at 12:55 PM Chao Sun  wrote:
>
> +1 (non-binding) verified on Intel Mac.
>
> Thanks Andrew.
>
> Chao
>
> On Sat, Jul 9, 2022 at 10:36 AM L. C. Hsieh  wrote:
> >
> > +1 (non-binding)
> >
> > Verified on Intel Mac.
> >
> > Thank you, Andrew!
> >
> > On Sat, Jul 9, 2022 at 9:36 AM Andy Grove  wrote:
> > >
> > > +1 (binding)
> > > Verified on Ubuntu 20.04.4 LTS
> > >
> > > Thanks, Andrew!
> > >
> > >
> > > On Fri, Jul 8, 2022 at 7:11 PM Remzi Yang <1371656737...@gmail.com> wrote:
> > >
> > > > + 1 (non-binding)
> > > > Verified on Mac M1.
> > > >
> > > > Thanks, Andrew!
> > > >
> > > > On Sat, 9 Jul 2022 at 02:55, Andrew Lamb  wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I would like to propose a release of Apache Arrow Rust Implementation,
> > > > > version 18.0.0.
> > > > >
> > > > > This release candidate is based on commit:
> > > > > 330505c98c9db4d103b4965e8bd55065961dada7 [1]
> > > > >
> > > > > The proposed release tarball and signatures are hosted at [2].
> > > > >
> > > > > The changelog is located at [3].
> > > > >
> > > > > Please download, verify checksums and signatures, run the unit tests,
> > > > > and vote on the release. There is a script [4] that automates some of
> > > > > the verification.
> > > > >
> > > > > The vote will be open for at least 72 hours.
> > > > >
> > > > > [ ] +1 Release this as Apache Arrow Rust
> > > > > [ ] +0
> > > > > [ ] -1 Do not release this as Apache Arrow Rust  because...
> > > > >
> > > > > [1]:
> > > > >
> > > > >
> > > > https://github.com/apache/arrow-rs/tree/330505c98c9db4d103b4965e8bd55065961dada7
> > > > > [2]:
> > > > > https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-rs-18.0.0-rc1
> > > > > [3]:
> > > > >
> > > > >
> > > > https://github.com/apache/arrow-rs/blob/330505c98c9db4d103b4965e8bd55065961dada7/CHANGELOG.md
> > > > > [4]:
> > > > >
> > > > >
> > > > https://github.com/apache/arrow-rs/blob/master/dev/release/verify-release-candidate.sh
> > > > >
> > > >


Re: [VOTE][RUST] Release Apache Arrow Rust 17.0.0 RC1

2022-06-24 Thread QP Hou
+1 (binding)

On Fri, Jun 24, 2022 at 7:36 PM Remzi Yang <1371656737...@gmail.com> wrote:
>
> +1 (non-binding). Verified on Mac M1.
> Thanks Andrew.
>
> Remzi
>
> On Sat, 25 Jun 2022 at 09:33, Chao Sun  wrote:
>
> > +1 (non-binding). Verified on Intel Mac.
> >
> > Thanks Andrew.
> >
> > On Fri, Jun 24, 2022 at 5:17 PM L. C. Hsieh  wrote:
> > >
> > > +1 (non-binding)
> > >
> > > Verified on Intel Mac.
> > >
> > > Thank you, Andrew.
> > >
> > > > On Fri, Jun 24, 2022 at 5:00 PM Andy Grove wrote:
> > > >
> > > > +1 (binding)
> > > >
> > > > Verified on Ubuntu 20.04.4 LTS.
> > > >
> > > > Thanks, Andrew.
> > > >
> > > > > On Fri, Jun 24, 2022 at 2:45 PM Andrew Lamb wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > > I would like to propose a release of Apache Arrow Rust Implementation,
> > > > > > version 17.0.0.
> > > > >
> > > > > This release candidate is based on commit:
> > > > > 9f7b6004d365b0c0bac8e30170b49bdd66cc7df0 [1]
> > > > >
> > > > > The proposed release tarball and signatures are hosted at [2].
> > > > >
> > > > > The changelog is located at [3].
> > > > >
> > > > > Please download, verify checksums and signatures, run the unit tests,
> > > > > and vote on the release. There is a script [4] that automates some of
> > > > > the verification.
> > > > >
> > > > > The vote will be open for at least 72 hours.
> > > > >
> > > > > [ ] +1 Release this as Apache Arrow Rust
> > > > > [ ] +0
> > > > > [ ] -1 Do not release this as Apache Arrow Rust  because...
> > > > >
> > > > > [1]:
> > > > >
> > > > >
> > https://github.com/apache/arrow-rs/tree/9f7b6004d365b0c0bac8e30170b49bdd66cc7df0
> > > > > [2]:
> > > > >
> > https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-rs-17.0.0-rc1
> > > > > [3]:
> > > > >
> > > > >
> > https://github.com/apache/arrow-rs/blob/9f7b6004d365b0c0bac8e30170b49bdd66cc7df0/CHANGELOG.md
> > > > > [4]:
> > > > >
> > > > >
> > https://github.com/apache/arrow-rs/blob/master/dev/release/verify-release-candidate.sh
> > > > > -
> > > > >
> >


Re: [VOTE][RUST][DataFusion] Release Apache Arrow DataFusion 9.0.0 RC1

2022-06-13 Thread QP Hou
Sorry, I was under the impression that you wanted to propose another RC
including a fix for 2719.

+1 (binding)

On Mon, Jun 13, 2022 at 2:23 PM Andy Grove  wrote:
>
> We need one more PMC vote to be able to release. It is possible that some
> of the PMC members who normally help with DataFusion votes are out on
> vacation so if any other PMC members have time to help with this it would
> be appreciated.
>
> Thanks,
>
> Andy.
>
> On Sat, Jun 11, 2022 at 9:04 AM Andy Grove  wrote:
>
> > The workaround for the test failures is to remove the trailing slash from
> > the paths in the ARROW_TEST_DATA and PARQUET_TEST_DATA environment
> > variables.
> >
> > On Sat, Jun 11, 2022 at 12:16 AM Remzi Yang <1371656737...@gmail.com> wrote:
> >
> >> 0 (non-binding)
> >>
> >> Met the same errors as Andrew.
> >>
> >> Regards,
> >> Remzi
> >>
> >>
> >>
> >> On Sat, 11 Jun 2022 at 02:10, Andrew Lamb  wrote:
> >>
> >> > +1 (binding)
> >> >
> >> > The verification script did not complete cleanly [1], but the errors appear
> >> > to be bugs in the test normalization rather than anything in the code
> >> > itself.
> >> >
> >> > Thanks for preparing the release
> >> >
> >> > Andrew
> >> >
> >> > [1] https://github.com/apache/arrow-datafusion/issues/2719
> >> >
> >> >
> >> > On Fri, Jun 10, 2022 at 11:34 AM Andy Grove wrote:
> >> >
> >> > > Hi,
> >> > >
> >> > > I would like to propose a release of Apache Arrow DataFusion
> >> > > Implementation,
> >> > > version 9.0.0.
> >> > >
> >> > > This release candidate is based on commit:
> >> > > 10058f658fe72c1811b5074ad61a723ec4e60abd [1]
> >> > > The proposed release tarball and signatures are hosted at [2].
> >> > > The changelog is located at [3].
> >> > >
> >> > > Please download, verify checksums and signatures, run the unit tests, and
> >> > > vote on the release. The vote will be open for at least 72 hours.
> >> > >
> >> > > Only votes from PMC members are binding, but all members of the community
> >> > > are encouraged to test the release and vote with "(non-binding)".
> >> > >
> >> > > The standard verification procedure is documented at
> >> > > https://github.com/apache/arrow-datafusion/blob/master/dev/release/README.md#verifying-release-candidates
> >> > > .
> >> > >
> >> > > [ ] +1 Release this as Apache Arrow DataFusion 9.0.0
> >> > > [ ] +0
> >> > > [ ] -1 Do not release this as Apache Arrow DataFusion 9.0.0 because...
> >> > >
> >> > > Here is my vote:
> >> > >
> >> > > +1
> >> > >
> >> > > [1]: https://github.com/apache/arrow-datafusion/tree/10058f658fe72c1811b5074ad61a723ec4e60abd
> >> > > [2]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-9.0.0-rc1
> >> > > [3]: https://github.com/apache/arrow-datafusion/blob/10058f658fe72c1811b5074ad61a723ec4e60abd/CHANGELOG.md
> >> > >
> >> >
> >>
> >
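
The workaround Andy describes in this thread (removing the trailing slash from the paths in ARROW_TEST_DATA and PARQUET_TEST_DATA) could be applied programmatically before launching the tests. The following is a minimal sketch under that assumption; the helper names and the example path are hypothetical and are not part of the DataFusion verification tooling.

```python
def strip_trailing_slash(path):
    """Remove trailing slashes from `path`, leaving a bare root path alone."""
    stripped = path.rstrip("/")
    # If stripping removes everything (e.g. "/"), keep the original value.
    return stripped if stripped else path


def normalized_test_env(env, names=("ARROW_TEST_DATA", "PARQUET_TEST_DATA")):
    """Return a copy of `env` with trailing slashes removed from `names`."""
    fixed = dict(env)
    for name in names:
        if name in fixed:
            fixed[name] = strip_trailing_slash(fixed[name])
    return fixed
```

For example, `normalized_test_env(dict(os.environ))` (after `import os`) would produce an environment mapping that could be passed to a subprocess running the test suite.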


Re: [VOTE][RUST] Release Apache Arrow Rust 16.0.0 RC1

2022-06-10 Thread QP Hou
+1 (binding)

On Fri, Jun 10, 2022 at 11:27 AM Andy Grove  wrote:
>
> +1 (binding)
>
> Verified on Ubuntu 20.04.4 LTS
>
> Thanks, Andrew.
>
> On Fri, Jun 10, 2022 at 11:54 AM Andrew Lamb  wrote:
>
> > Hi,
> >
> > I would like to propose a release of Apache Arrow Rust Implementation,
> > version 16.0.0.
> >
> > This release candidate is based on commit:
> > c396dfb5035d22e57717b6dd365486b76eb611bc [1]
> >
> > The proposed release tarball and signatures are hosted at [2].
> >
> > The changelog is located at [3].
> >
> > Please download, verify checksums and signatures, run the unit tests,
> > and vote on the release. There is a script [4] that automates some of
> > the verification.
> >
> > The vote will be open for at least 72 hours.
> >
> > [ ] +1 Release this as Apache Arrow Rust
> > [ ] +0
> > [ ] -1 Do not release this as Apache Arrow Rust  because...
> >
> > [1]:
> >
> > https://github.com/apache/arrow-rs/tree/c396dfb5035d22e57717b6dd365486b76eb611bc
> > [2]:
> > https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-rs-16.0.0-rc1
> > [3]:
> >
> > https://github.com/apache/arrow-rs/blob/c396dfb5035d22e57717b6dd365486b76eb611bc/CHANGELOG.md
> > [4]:
> >
> > https://github.com/apache/arrow-rs/blob/master/dev/release/verify-release-candidate.sh
> >


Re: [VOTE][RUST] Release Apache Arrow Rust 15.0.0 RC1

2022-05-27 Thread QP Hou
+1 (binding)

Thank you Andrew!

On Fri, May 27, 2022 at 10:51 AM L. C. Hsieh  wrote:
>
> +1 (non-binding)
> Verified on Intel Mac.
>
> Thank you, Andrew!
>
> On Fri, May 27, 2022 at 9:41 AM Chao Sun  wrote:
> >
> > +1 (non-binding) verified on Intel Mac.
> >
> > Thanks Andrew!
> >
> > On Fri, May 27, 2022 at 6:35 AM Andy Grove  wrote:
> > >
> > > +1 (binding)
> > >
> > > Verified on Ubuntu 20.04.4 LTS
> > >
> > > Thanks, Andrew!
> > >
> > > > On Fri, May 27, 2022 at 7:10 AM Remzi Yang <1371656737...@gmail.com> wrote:
> > >
> > > > +1 (non-binding)
> > > >
> > > > Verified on MacOS 12.2 M1 pro.
> > > >
> > > > Thank you, Andrew.
> > > >
> > > >
> > > >
> > > > Best regards,
> > > >
> > > > Remzi
> > > >
> > > > On Fri, 27 May 2022 at 19:36, Andrew Lamb  wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I would like to propose a release of Apache Arrow Rust Implementation,
> > > > > version 15.0.0.
> > > > >
> > > > > This release candidate is based on commit:
> > > > > 6e6a9e148c0f50ed1e0eeebd667245e67d7826b6 [1]
> > > > >
> > > > > The proposed release tarball and signatures are hosted at [2].
> > > > >
> > > > > The changelog is located at [3].
> > > > >
> > > > > Please download, verify checksums and signatures, run the unit tests,
> > > > > and vote on the release. There is a script [4] that automates some of
> > > > > the verification.
> > > > >
> > > > > The vote will be open for at least 72 hours.
> > > > >
> > > > > [ ] +1 Release this as Apache Arrow Rust
> > > > > [ ] +0
> > > > > [ ] -1 Do not release this as Apache Arrow Rust  because...
> > > > >
> > > > > [1]:
> > > > >
> > > > >
> > > > https://github.com/apache/arrow-rs/tree/6e6a9e148c0f50ed1e0eeebd667245e67d7826b6
> > > > > [2]:
> > > > > https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-rs-15.0.0-rc1
> > > > > [3]:
> > > > >
> > > > >
> > > > https://github.com/apache/arrow-rs/blob/6e6a9e148c0f50ed1e0eeebd667245e67d7826b6/CHANGELOG.md
> > > > > [4]:
> > > > >
> > > > >
> > > > https://github.com/apache/arrow-rs/blob/master/dev/release/verify-release-candidate.sh
> > > > >
> > > >


Re: [VOTE] [Rust] Move Ballista to new arrow-ballista repository

2022-05-17 Thread QP Hou
+1 (binding)

On Tue, May 17, 2022 at 1:27 PM David Li  wrote:
>
> +1 (binding)
>
> On Tue, May 17, 2022, at 16:00, Neal Richardson wrote:
> > +1
> >
> > On Tue, May 17, 2022 at 12:46 PM Andrew Lamb  wrote:
> >
> >> +1 (binding)
> >>
> >> On Mon, May 16, 2022 at 9:56 AM Andy Grove  wrote:
> >>
> >> > I would like to propose that we move the Ballista project to a new
> >> > top-level *arrow-ballista* repository.
> >> >
> >> > The rationale for this (copied from the GitHub issue [1]) is:
> >> >
> >> >    - Decouple release process for DataFusion and Ballista
> >> >    - Allow each project to have top-level documentation and user guides that
> >> >      are targeting the appropriate audience
> >> >    - Reduce issue tracking and PR review burden for DataFusion maintainers
> >> >      who are not as interested in Ballista
> >> >    - Help avoid accidental circular dependencies being introduced between the
> >> >      projects (such as [2])
> >> >    - Helps formalize the public API for DataFusion that other query engines
> >> >      should be using
> >> >
> >> > There is a design document [3] that outlines the plan for implementing this.
> >> >
> >> > Only votes from PMC members are binding, but all members of the community
> >> > are encouraged to test the release and vote with "(non-binding)". The vote
> >> > will run for at least 72 hours.
> >> >
> >> > [ ] +1 Proceed with moving Ballista to a new arrow-ballista repository
> >> > [ ] +0
> >> > [ ] -1 Do not proceed with moving Ballista to a new arrow-ballista
> >> > repository because ...
> >> >
> >> > Here is my vote:
> >> >
> >> > +1 (binding)
> >> >
> >> > [1] https://github.com/apache/arrow-datafusion/issues/2502
> >> >
> >> > [2] https://github.com/apache/arrow-datafusion/issues/2433
> >> >
> >> > [3] https://docs.google.com/document/d/1jNRbadyStSrV5kifwn0khufAwq6OnzGczG4z8oTQJP4/edit?usp=sharing
> >> >
> >>


Re: [Rust] Issues with signing release

2022-05-16 Thread QP Hou
Hi Andy,

You are correct that those are alternative options. You actually had
your key correctly added to the KEYS file. I believe the issue is that
your key is only self-signed, so it cannot be verified through
Andrew's web of trust. See key signing party instructions at:
https://infra.apache.org/release-signing.html#key-signing-party.

Thanks,
QP

On Fri, May 13, 2022 at 6:47 AM Andy Grove  wrote:
>
> As Andrew notes in the current VOTE thread for DataFusion 8.0.0-rc2, there
> is an issue with the key I used to sign the release:
>
> gpg: Good signature from "Andy Grove " [unknown]
> gpg: WARNING: This key is not certified with a trusted signature!
> gpg:  There is no indication that the signature belongs to the
>
> I found the current documentation a little lacking so could use some
> guidance on what I need to do, and I can then better document this in the
> repo.
>
> The KEYS file has this header:
>
> Users: pgp < KEYS
>   gpg --import KEYS
> Developers:
>   pgp -kxa  and append it to this file.
>   (pgpk -ll  && pgpk -xa ) >> this file.
>   (gpg --list-sigs 
> && gpg --armor --export ) >> this file.
>
> Was I supposed to run both the pgp and gpg commands in the developer
> section? I perhaps naively assumed these were alternate options and I just
> ran the following:
>
> (gpg --list-sigs "Andy Grove" && gpg --armor --export "Andy Grove") >> KEYS
> svn commit KEYS -m "Add key for Andy Grove"
>
> Also, it wasn't immediately obvious to me how to install "pgpk" on Ubuntu.
>
> Were there other steps that I have missed?
>
> Thanks,
>
> Andy.
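
The gpg output quoted in this thread combines two separate facts: the signature itself is good, but the signing key is not certified by anyone the verifier trusts, which is the self-signed situation QP describes. A small sketch of how a script might tell these cases apart is shown below. It is an illustration only: it matches the English message text quoted in the thread, the function name and category labels are made up, and gpg's machine-readable status output would be a more robust basis in real tooling.

```python
def classify_gpg_verify_output(output):
    """Classify text printed by `gpg --verify` into three coarse states.

    Assumption: `output` is the human-readable English text gpg prints,
    as quoted in the email above.
    """
    if "Good signature" not in output:
        # Covers bad signatures, missing public keys, and other failures.
        return "not-verified"
    if "not certified with a trusted signature" in output:
        # Cryptographically valid, but the key is outside the web of trust.
        return "valid-but-untrusted-key"
    return "valid"
```

With the output quoted by Andy, the function returns "valid-but-untrusted-key": the tarball matches the signature, and the remaining step is for other key holders to sign the release manager's key.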


Re: [VOTE][RUST][DataFusion] Release Apache Arrow DataFusion 8.0.0 RC2

2022-05-15 Thread QP Hou
+1 binding.

Thanks Andy for driving this release.

On Fri, May 13, 2022 at 4:57 AM Andy Grove  wrote:
>
> Hi,
>
> I would like to propose a release of Apache Arrow DataFusion Implementation,
> version 8.0.0.
>
> This release candidate is based on commit:
> b9f6e6b7c353c1109bd7b306008e006db29b46f8 [1]
> The proposed release tarball and signatures are hosted at [2].
> The changelog is located at [3].
>
> Please download, verify checksums and signatures, run the unit tests, and
> vote
> on the release. The vote will be open for at least 72 hours.
>
> Only votes from PMC members are binding, but all members of the community
> are
> encouraged to test the release and vote with "(non-binding)".
>
> The standard verification procedure is documented at
> https://github.com/apache/arrow-datafusion/blob/master/dev/release/README.md#verifying-release-candidates
> .
>
> [ ] +1 Release this as Apache Arrow DataFusion 8.0.0
> [ ] +0
> [ ] -1 Do not release this as Apache Arrow DataFusion 8.0.0 because...
>
> Here is my vote:
>
> +1 (binding)
>
> [1]:
> https://github.com/apache/arrow-datafusion/tree/b9f6e6b7c353c1109bd7b306008e006db29b46f8
> [2]:
> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-8.0.0-rc2
> [3]:
> https://github.com/apache/arrow-datafusion/blob/b9f6e6b7c353c1109bd7b306008e006db29b46f8/CHANGELOG.md


Re: [VOTE][RUST] Release Apache Arrow Rust 14.0.0 RC1

2022-05-14 Thread QP Hou
+1 (binding)

On Fri, May 13, 2022 at 11:12 AM Andrew Lamb  wrote:
>
> Hi,
>
> I would like to propose a release of Apache Arrow Rust Implementation,
> version 14.0.0.
>
> This release candidate is based on commit:
> 33e298444f251258dd289c8377c68a80925ab0b4 [1]
>
> The proposed release tarball and signatures are hosted at [2].
>
> The changelog is located at [3].
>
> Please download, verify checksums and signatures, run the unit tests,
> and vote on the release. There is a script [4] that automates some of
> the verification.
>
> The vote will be open for at least 72 hours.
>
> [ ] +1 Release this as Apache Arrow Rust
> [ ] +0
> [ ] -1 Do not release this as Apache Arrow Rust  because...
>
> [1]:
> https://github.com/apache/arrow-rs/tree/33e298444f251258dd289c8377c68a80925ab0b4
> [2]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-rs-14.0.0-rc1
> [3]:
> https://github.com/apache/arrow-rs/blob/33e298444f251258dd289c8377c68a80925ab0b4/CHANGELOG.md
> [4]:
> https://github.com/apache/arrow-rs/blob/master/dev/release/verify-release-candidate.sh


Re: [VOTE][RUST][Datafusion] Release Apache Arrow Datafusion 7.1.0 RC1

2022-04-17 Thread QP Hou
+1 (binding)

On Sun, Apr 17, 2022 at 5:13 AM Rich  wrote:
>
> +1 (non-binding)
> Verified on macOS 12.2.1, Intel i7
>
> Thanks,
> Rich (jychen7)
>
>
> --- Original Message ---
> On Tuesday, April 12th, 2022 at 4:44 PM, Andrew Lamb wrote:
>
>
> > Hi,
> >
> > I would like to propose a release of Apache Arrow Datafusion Implementation,
> > version 7.1.0.
> >
> > You can find additional information about this release at [4]. It is
> > primarily intended to address a small issue in [5], and will only result in
> > a `datafusion` release to crates.io (no change to `datafusion-expr` or
> > `datafusion-common`).
> >
> > This release candidate is based on commit:
> > f2fcb80f8727c6b9620e2a84629d3e45d8c0e8f7 [1]
> > The proposed release tarball and signatures are hosted at [2].
> > The changelog is located at [3].
> >
> > Please download, verify checksums and signatures, run the unit tests, and
> > vote
> > on the release. The vote will be open for at least 72 hours.
> >
> > Only votes from PMC members are binding, but all members of the community
> > are
> > encouraged to test the release and vote with "(non-binding)".
> >
> > The standard verification procedure is documented at
> > https://github.com/apache/arrow-datafusion/blob/master/dev/release/README.md#verifying-release-candidates
> > .
> >
> > [ ] +1 Release this as Apache Arrow Datafusion 7.1.0
> > [ ] +0
> > [ ] -1 Do not release this as Apache Arrow Datafusion 7.1.0 because...
> >
> > [1]:
> > https://github.com/apache/arrow-datafusion/tree/f2fcb80f8727c6b9620e2a84629d3e45d8c0e8f7
> > [2]:
> > https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-7.1.0-rc1
> > [3]:
> > https://github.com/apache/arrow-datafusion/blob/f2fcb80f8727c6b9620e2a84629d3e45d8c0e8f7/CHANGELOG.md
> > [4] https://github.com/apache/arrow-datafusion/issues/2095
> > [5] https://github.com/apache/arrow-datafusion/pull/2159


Re: [VOTE][RUST] Release Apache Arrow Rust 11.1.0 RC1

2022-04-02 Thread QP Hou
+1 (binding), thanks Andrew!

On Fri, Apr 1, 2022 at 8:26 AM Andrew Lamb  wrote:
>
> Hi,
>
> I would like to propose a release of Apache Arrow Rust Implementation,
> version 11.1.0.
>
> This release candidate is based on commit:
> eb6b7c65f49794f54d9b11a632172ffa13783ff2 [1]
>
> The proposed release tarball and signatures are hosted at [2].
>
> The changelog is located at [3] and it has a nice set of features.
>
> Please download, verify checksums and signatures, run the unit tests,
> and vote on the release. There is a script [4] that automates some of
> the verification.
>
> The vote will be open for at least 72 hours.
>
> [ ] +1 Release this as Apache Arrow Rust
> [ ] +0
> [ ] -1 Do not release this as Apache Arrow Rust  because...
>
> [1]:
> https://github.com/apache/arrow-rs/tree/eb6b7c65f49794f54d9b11a632172ffa13783ff2
> [2]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-rs-11.1.0-rc1
> [3]:
> https://github.com/apache/arrow-rs/blob/eb6b7c65f49794f54d9b11a632172ffa13783ff2/CHANGELOG.md
> [4]:
> https://github.com/apache/arrow-rs/blob/master/dev/release/verify-release-candidate.sh


Re: [VOTE][RUST] Release Apache Arrow Rust 11.0.0 RC1

2022-03-18 Thread QP Hou
+1 (binding)
Thanks,
QP Hou

On Fri, Mar 18, 2022 at 1:01 AM Andrew Lamb  wrote:
>
> Hi,
>
> I would like to propose a release of Apache Arrow Rust Implementation,
> version 11.0.0.
>
> This release candidate is based on commit:
> 5d6b638111e3f9c72dc8504ea98e46914fc93af5 [1]
>
> The proposed release tarball and signatures are hosted at [2].
>
> The changelog is located at [3].
>
> Please download, verify checksums and signatures, run the unit tests,
> and vote on the release. There is a script [4] that automates some of
> the verification.
>
> The vote will be open for at least 72 hours.
>
> [ ] +1 Release this as Apache Arrow Rust
> [ ] +0
> [ ] -1 Do not release this as Apache Arrow Rust  because...
>
> [1]:
> https://github.com/apache/arrow-rs/tree/5d6b638111e3f9c72dc8504ea98e46914fc93af5
> [2]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-rs-11.0.0-rc1
> [3]:
> https://github.com/apache/arrow-rs/blob/5d6b638111e3f9c72dc8504ea98e46914fc93af5/CHANGELOG.md
> [4]:
> https://github.com/apache/arrow-rs/blob/master/dev/release/verify-release-candidate.sh


Re: [ANNOUNCE] New Arrow committers: Raphael Taylor-Davies, Wang Xudong, Yijie Shen, and Kun Liu

2022-03-09 Thread QP Hou
Congratulations to all, well deserved!

On Wed, Mar 9, 2022 at 9:37 AM Daniël Heres  wrote:
>
> Congratulations!
>
> On Wed, Mar 9, 2022, 18:26 LM  wrote:
>
> > Congrats to you all!
> >
> > On Wed, Mar 9, 2022 at 9:19 AM Chao Sun  wrote:
> >
> > > Congrats all!
> > >
> > > On Wed, Mar 9, 2022 at 9:16 AM Micah Kornfield 
> > > wrote:
> > > >
> > > > Congrats!
> > > >
> > > > On Wed, Mar 9, 2022 at 8:36 AM Weston Pace 
> > > wrote:
> > > >
> > > > > Congratulations to all of you!
> > > > >
> > > > > On Wed, Mar 9, 2022, 4:52 AM Matthew Turner <
> > > matthew.m.tur...@outlook.com>
> > > > > wrote:
> > > > >
> > > > > > Congrats all and thank you for your contributions! It's been great
> > to
> > > > > work
> > > > > > with and learn from you all.
> > > > > >
> > > > > > -Original Message-
> > > > > > From: Andrew Lamb 
> > > > > > Sent: Wednesday, March 9, 2022 8:59 AM
> > > > > > To: dev 
> > > > > > Subject: [ANNOUNCE] New Arrow committers: Raphael Taylor-Davies,
> > Wang
> > > > > > Xudong, Yijie Shen, and Kun Liu
> > > > > >
> > > > > > On behalf of the Arrow PMC, I'm happy to announce that
> > > > > >
> > > > > > Raphael Taylor-Davies
> > > > > > Wang Xudong
> > > > > > Yijie Shen
> > > > > > Kun Liu
> > > > > >
> > > > > > Have all accepted invitations to become committers on Apache Arrow!
> > > > > > Welcome, thank you for all your contributions so far, and we look
> > > forward
> > > > > > to continuing to drive Apache Arrow forward to an even better place
> > > in
> > > > > the
> > > > > > future.
> > > > > >
> > > > > > This exciting growth in committers mirrors the growth of the Arrow
> > > Rust
> > > > > > community.
> > > > > >
> > > > > > Andrew
> > > > > >
> > > > > > p.s. sorry for the somewhat impersonal email; I was trying to avoid
> > > > > > several very similar emails. I am truly excited for each of these
> > > > > > individuals.
> > > > > >
> > > > >
> > >
> >


Re: [Rust] DataFusion + Substrait

2022-03-07 Thread QP Hou
I am also very excited for this, especially the possibility of
leveraging it in Ballista. Great work Andy!

On Mon, Mar 7, 2022 at 8:31 AM Andy Grove  wrote:
>
> I created a new repo in the datafusion-contrib GitHub org over the weekend
> with a starting point for supporting DataFusion as both a producer and
> consumer of Substrait plans.
>
> https://github.com/datafusion-contrib/datafusion-substrait
>
> I am hopeful that we can eventually use Substrait in Ballista as a
> replacement for the current query plan protobuf format, meaning that the
> Ballista scheduler could potentially be used with engines other than
> DataFusion.
>
> I also think it could be helpful with in-memory language interoperability,
> such as passing query plans between Python and Rust.
>
> I plan on continuing to merge my own PRs here as I flesh out more of this,
> at least until there are other contributors.
>
> Thanks,
>
> Andy.


Re: [VOTE][RUST] Release Apache Arrow Rust 10.0.0 RC1

2022-03-07 Thread QP Hou
+1 (binding). Thanks Andrew.

On Mon, Mar 7, 2022 at 9:17 AM Chao Sun  wrote:
>
> +1 (non-binding) verified on Mac. Thanks Andrew!
>
> On Mon, Mar 7, 2022 at 7:47 AM Matthew Turner wrote:
> >
> > +1 (non-binding) after running release verification script on M1 Mac.
> >
> > Thanks, Andrew.
> >
> > From: Andy Grove 
> > Date: Monday, March 7, 2022 at 10:00 AM
> > To: dev 
> > Subject: Re: [VOTE][RUST] Release Apache Arrow Rust 10.0.0 RC1
> > +1 (binding)
> >
> > Verified on Ubuntu 20.04.3 LTS
> >
> > On Mon, Mar 7, 2022 at 6:52 AM Kun Liu  wrote:
> >
> > > > I tested it on my Mac and got the "Release candidate looks good!"
> > > > message.
> > > > The unit tests passed on my Mac.
> > >
> > > +1 non-binding.
> > >
> > > Thanks,
> > > Kun
> > >
> > > > On Sat, Mar 5, 2022 at 10:00 PM Wang Xudong wrote:
> > >
> > > > +1 non-binding
> > > >
> > > > Test on macOS, "Release candidate looks good!"
> > > > Thank you alamb!
> > > >
> > > > ---
> > > > xudong
> > > >
> > > >
> > > >
> > > > > On Sat, Mar 5, 2022 at 8:06 PM Andrew Lamb wrote:
> > > >
> > > > > Salutations Arrow Rust Community,
> > > > >
> > > > > I would like to propose a release of Apache Arrow Rust Implementation,
> > > > > > version 10.0.0. As previously discussed [5], the "Integration Test" CI is
> > > > > > failing [6], but we have determined it is a bug in the test, not in the
> > > > > > code itself, and have a fix ready [7].
> > > > >
> > > > > This release candidate is based on commit:
> > > > > a7bd09abde0010a58d0cd0557384df5aadba83ac [1]
> > > > >
> > > > > The proposed release tarball and signatures are hosted at [2].
> > > > >
> > > > > The changelog is located at [3].
> > > > >
> > > > > Please download, verify checksums and signatures, run the unit tests,
> > > > > and vote on the release. There is a script [4] that automates some of
> > > > > the verification.
> > > > >
> > > > > The vote will be open for at least 72 hours.
> > > > >
> > > > > [ ] +1 Release this as Apache Arrow Rust
> > > > > [ ] +0
> > > > > [ ] -1 Do not release this as Apache Arrow Rust  because...
> > > > >
> > > > > [1]:
> > > > >
> > > > >
> > > >
> > > https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Farrow-rs%2Ftree%2Fa7bd09abde0010a58d0cd0557384df5aadba83acdata=04%7C01%7C%7C8db71e50f13d468fcd2b08da004b4654%7C84df9e7fe9f640afb435%7C1%7C0%7C637822620540523542%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=66NubVTy23HM%2B8coRtaWHAvQWYPQDMEvfNeBTHyc40E%3Dreserved=0
> > > > > [2]:
> > > > >
> > > https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdist.apache.org%2Frepos%2Fdist%2Fdev%2Farrow%2Fapache-arrow-rs-10.0.0-rc1data=04%7C01%7C%7C8db71e50f13d468fcd2b08da004b4654%7C84df9e7fe9f640afb435%7C1%7C0%7C637822620540523542%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=fcoy2Ee7qx0UdQo504491hgeIL9%2Fekbnz35J4CgduuQ%3Dreserved=0
> > > > > [3]:
> > > > >
> > > > >
> > > >
> > > https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Farrow-rs%2Fblob%2Fa7bd09abde0010a58d0cd0557384df5aadba83ac%2FCHANGELOG.mddata=04%7C01%7C%7C8db71e50f13d468fcd2b08da004b4654%7C84df9e7fe9f640afb435%7C1%7C0%7C637822620540523542%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=BTjlJ7euXemTh%2BV0Bw90iCGdhvBJCKWS50dQm8AhXjQ%3Dreserved=0
> > > > > [4]:
> > > > >
> > > > >
> > > >
> > > https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Farrow-rs%2Fblob%2Fmaster%2Fdev%2Frelease%2Fverify-release-candidate.shdata=04%7C01%7C%7C8db71e50f13d468fcd2b08da004b4654%7C84df9e7fe9f640afb435%7C1%7C0%7C637822620540523542%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=QXf6z4OLE0cqbOJpzHCB2JYe5AULep4PgABZFbBCjFs%3Dreserved=0
> > > > > [5]: 
> > > > > https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.apache.org%2Fthread%2Fkrpmkmpj979xwfyglzpqzdj8m7bxo8sddata=04%7C01%7C%7C8db71e50f13d468fcd2b08da004b4654%7C84df9e7fe9f640afb435%7C1%7C0%7C637822620540523542%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=A3MWRMK%2BiyYkrXLpRmsfqPuB54A6c0XRBRafsbXRlSo%3Dreserved=0
> > > > > [6]: 
> > > > > https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Farrow-rs%2Fissues%2F1398data=04%7C01%7C%7C8db71e50f13d468fcd2b08da004b4654%7C84df9e7fe9f640afb435%7C1%7C0%7C637822620540523542%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=%2F58HIgfIvXL5Fk9YJrCFl%2FsCBvXkvJU1BNzOT2c6Ifw%3Dreserved=0
> > > > > [7]: 
> > > > > 

Re: [ANNOUNCE] New Arrow committer: Jacob Quinn

2022-02-24 Thread QP Hou
Congratulations and welcome Jacob!

Thanks,
QP Hou

On Thu, Feb 24, 2022 at 4:06 PM Ian Joiner  wrote:
>
> Congrats Jacob!
>
> On Thursday, February 24, 2022, David Li  wrote:
>
> > Congrats & welcome Jacob!
> >
> > On Thu, Feb 24, 2022, at 17:39, Rok Mihevc wrote:
> > > Congratulations Jacob!
> > >
> > > Rok
> > >
> > > On Thu, Feb 24, 2022 at 11:13 PM Micah Kornfield 
> > wrote:
> > >>
> > >> Congrats Jacob!
> > >>
> > >> On Thu, Feb 24, 2022 at 1:50 PM Wes McKinney 
> > wrote:
> > >>
> > >> > On behalf of the Arrow PMC, I'm happy to announce that Jacob Quinn has
> > >> > accepted an invitation to become a committer on Apache Arrow. Welcome,
> > >> > and thank you for your contributions!
> > >> >
> > >> > Wes
> > >> >
> >


Re: [VOTE][RUST][Datafusion] Release Apache Arrow Datafusion 7.0.0 RC2

2022-02-14 Thread QP Hou
+1 (non-binding), ran the release verification script on Linux arm64.

It failed at the end when executing the `cargo publish --dry-run`
command, but I think this is expected because the datafusion core is
now depending on the `datafusion-common` crate. We should probably
update the verification script to run publish dry run on
datafusion-common instead.
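
To illustrate the ordering issue described above, here is a minimal sketch (not part of the actual verification script) of deriving an order in which crates would need to be checked so that each crate comes after the crates it depends on. The dependency map is hypothetical; only the `datafusion` -> `datafusion-common` edge comes from this message.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency map: crate -> set of crates it depends on.
# Only the datafusion -> datafusion-common edge is taken from the thread.
CRATE_DEPS = {
    "datafusion-common": set(),
    "datafusion": {"datafusion-common"},
}


def publish_order(deps):
    """Return crate names ordered so that dependencies come first."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for crate in publish_order(CRATE_DEPS):
        print(crate)
```

With this map, `datafusion-common` is listed before `datafusion`, which matches the suggestion to run the publish dry run on `datafusion-common` first.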


On Mon, Feb 14, 2022 at 6:53 AM Andrew Lamb  wrote:
>
> Greetings,
>
> I would like to propose a release of Apache Arrow Datafusion Implementation,
> version 7.0.0.
>
> This release candidate is based on commit [1]
> The proposed release tarball and signatures are hosted at [2].
> The changelog is located at [3].
>
> Note this release does NOT include python bindings or ballista (which
> can hopefully be released separately). More detail on release
> coordination can be found at [8].
>
> In particular, I believe it would be beneficial for downstream projects
> (such as datafusion-python [4], and datafusion-objectstore-s3 [5]) to
> test that this release candidate works for their needs
>
> Please download, verify checksums and signatures, run your tests, and
> vote on the release. The vote will be open for at least 72 hours.
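
As a rough illustration of the "verify checksums" step, the sketch below recomputes a SHA-512 digest and compares it with the digest stored in a checksum file. It is not the project's verification script: the function names are made up for this example, the checksum file is assumed to start with the hex digest, and signature (GPG) verification is not covered.

```python
import hashlib
import hmac
from pathlib import Path


def sha512_of(path, chunk_size=1 << 20):
    """Compute the SHA-512 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(tarball, checksum_file):
    """Compare a file's digest with the first token of a checksum file.

    Assumption: the checksum file looks like "<hex digest>  <file name>".
    """
    expected = Path(checksum_file).read_text().split()[0].lower()
    return hmac.compare_digest(sha512_of(tarball), expected)
```

A caller would pass the downloaded tarball and its accompanying checksum file, and treat a `False` result as a reason not to vote +1.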
>
> Only votes from PMC members are binding, but all members of the community are
> encouraged to test the release and vote with "(non-binding)".
>
> The standard verification procedure is documented at [6]. Note there
> were changes to the verification scripts for this release [7] related
> to no longer publishing the python bindings and breaking datafusion
> into multiple smaller crates.
>
>
> [ ] +1 Release this as Apache Arrow Datafusion 7.0.0
> [ ] +0
> [ ] -1 Do not release this as Apache Arrow Datafusion 7.0.0 because...
>
> [1]:
> https://github.com/apache/arrow-datafusion/tree/ca765d54dda6114da55ece8d876c042eca3ea870
> [2]:
> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-7.0.0-rc2
> [3]:
> https://github.com/apache/arrow-datafusion/blob/ca765d54dda6114da55ece8d876c042eca3ea870/CHANGELOG.md
> [4]: https://github.com/datafusion-contrib/datafusion-python
> [5]: https://github.com/datafusion-contrib/datafusion-objectstore-s3
> [6]:
> https://github.com/apache/arrow-datafusion/blob/master/dev/release/README.md#verifying-release-candidates
> [7]: https://github.com/apache/arrow-datafusion/pull/1830
> [8]: https://github.com/apache/arrow-datafusion/issues/1587


Re: Managing usage of the @ApacheArrow Twitter handle and other social media

2022-02-01 Thread QP Hou
To be clear I haven't used that twitter-together action myself, was
just using it as an example for how such a workflow could be set up. I
imagine it won't be too much work for us to write our own action if
needed.

Thanks,
QP Hou

On Tue, Feb 1, 2022 at 5:31 AM Neal Richardson
 wrote:
>
> I like the idea too. If we were to use the github action to send tweets, we
> would need to get INFRA to add it to the allowlist since they restrict
> which actions we can use. I don't think it's a blocker, just would add some
> extra time in the process of getting it set up.
>
> Neal
>
>
> On Tue, Feb 1, 2022 at 5:25 AM Alessandro Molina <
> alessan...@ursacomputing.com> wrote:
>
> > I never used https://github.com/gr2m/twitter-together previously, in the
> > past I used Hootsuite to set up approval workflows, but I think that the
> > idea of setting up a workflow through github PRs looks like a good idea. It
> > would be able to leverage committer/pmc membership to merge the PRs and
> > would allow anyone to contribute with social media content.
> >
> > On Tue, Feb 1, 2022 at 12:43 AM QP Hou  wrote:
> >
> > > I don't know how other projects manage this, but one solution we could
> > > evaluate is using github PRs to manage the twitter account. For
> > > example, here is a github action that does exactly this
> > > https://github.com/gr2m/twitter-together.
> > >
> > > On Mon, Jan 31, 2022 at 3:14 PM Wes McKinney 
> > wrote:
> > > >
> > > > hi all,
> > > >
> > > > > The project is approaching its 6th birthday and we have come a long
> > way!
> > > >
> > > > We have a relatively seldom-used Twitter handle
> > > > twitter.com/ApacheArrow and only a handful of people in the community
> > > > have access to it. I know that Jacques and I do, but I am not sure who
> > > > else.
> > > >
> > > > I wanted to discuss a few things:
> > > >
> > > > * Giving more committers/PMC members access to the Twitter handle — I
> > > > think clearly there should be more people with access (I tweet through
> > > > TweetDeck, e.g. I just posted about a newly posted blog post)
> > > > * Consider if there are any other social media channels where we might
> > > > want to promote Arrow content
> > > > * Discuss a social media policy more broadly for the project
> > > >
> > > > On the latter point, my feelings are:
> > > >
> > > > * Promote content and usage of Apache Arrow, but not companies or
> > > > products (Apache projects are independent)
> > > > * Provide a way for the community to submit ideas/materials for social
> > > media
> > > >
> > > > Does anyone know if other ASF projects have policies/conventions about
> > > > how they decide how to use their social media properties to best serve
> > > > the community?
> > > >
> > > > Thanks,
> > > > Wes
> > >
> >


Re: Managing usage of the @ApacheArrow Twitter handle and other social media

2022-01-31 Thread QP Hou
I don't know how other projects manage this, but one solution we could
evaluate is using github PRs to manage the twitter account. For
example, here is a github action that does exactly this
https://github.com/gr2m/twitter-together.

On Mon, Jan 31, 2022 at 3:14 PM Wes McKinney  wrote:
>
> hi all,
>
> The project is approaching its 6th birthday and we have come a long way!
>
> We have a relatively seldom-used Twitter handle
> twitter.com/ApacheArrow and only a handful of people in the community
> have access to it. I know that Jacques and I do, but I am not sure who
> else.
>
> I wanted to discuss a few things:
>
> * Giving more committers/PMC members access to the Twitter handle — I
> think clearly there should be more people with access (I tweet through
> TweetDeck, e.g. I just posted about a newly posted blog post)
> * Consider if there are any other social media channels where we might
> want to promote Arrow content
> * Discuss a social media policy more broadly for the project
>
> On the latter point, my feelings are:
>
> * Promote content and usage of Apache Arrow, but not companies or
> products (Apache projects are independent)
> * Provide a way for the community to submit ideas/materials for social media
>
> Does anyone know if other ASF projects have policies/conventions about
> how they decide how to use their social media properties to best serve
> the community?
>
> Thanks,
> Wes


Re: [ANNOUNCE] New Arrow PMC chair: Kouhei Sutou

2022-01-25 Thread QP Hou
Congrats Kou, very well deserved.

On Tue, Jan 25, 2022 at 9:53 AM Benson Muite  wrote:
>
> Congratulations Kou!
> On 1/25/22 8:44 PM, Vibhatha Abeykoon wrote:
> > Congrats Kou!
> >
> >
> > On Tue, Jan 25, 2022 at 11:13 PM Ian Joiner  wrote:
> >
> >> Congrats Kou!
> >>
> >> On Tuesday, January 25, 2022, Wes McKinney  wrote:
> >>
> >>> I am pleased to announce that we have a new PMC chair and VP as per
> >>> our newly started tradition of rotating the chair once a year. I have
> >>> resigned and Kouhei was duly elected by the PMC and approved
> >>> unanimously by the board. Please join me in congratulating Kou!
> >>>
> >>> Thanks,
> >>> Wes
> >>>
> >>
> >
>


Re: [RUST][DataFusion][Arrow] Switching DataFusion to use arrow2 implementation and the future of arrow

2022-01-19 Thread QP Hou
I suggest we respect Jorge's decision and give him more time/space.
Considering he spent 6 months last year trying to donate arrow2 to ASF
and couldn't get it through, I can see why he doesn't want to retry
the same effort in a short period of time. Especially when he is
working on all of these in his limited free time. I trust him to make
decisions that will best push the project forward on a technical
front.

As for arrow-rs, I think Andy made a good comment in the github issue
suggesting that if there are community members who are motivated to
maintain it, we can keep it going but probably with a less frequent
release cadence. I agree with Micah that if no one is willing to take
up the work, then a clear communication of maintenance only mode would
better serve the existing arrow-rs users.

On Wed, Jan 19, 2022 at 9:25 PM Micah Kornfield  wrote:
>
> Hi Jorge,
>
> Thanks for your response.
>
> My current hypothesis is that arrow2 will be donated to Apache Arrow, I
> > just don't feel comfortable and have the energy doing so right now.
>
> Aside from the burden of going through the donation process (maybe other
> PMC members are willing to help here?), what would make you comfortable
> with the donation?  Are there specific milestones you are waiting for?  If
> the docs on the arrow2 repo [1] are up-to-date it seems like the main
> concern is versioning, and sticking faithfully to sem-ver.  It sounds like
> Rust is going to move out of lock-step versioning with the
> other implementations. So maybe the gaps are small?
>
> There are advantages and disadvantages to having a project hosted by the
> ASF, so I can understand that it might not be an easy decision.  However,
> as more development is done outside the ASF additional work for IP
> clearance accumulates if the project ends up getting donated.  This seems
> like it might lead to a cycle where the timing is never quite right.
>
> Separately, from an outsider perspective on Rust development, I think a
> community over code approach applies here.  If arrow2 is decided to be the
> best path forward for DataFusion, continuing to expend effort on
> maintaining arrow-rs doesn't seem like a good use of people's time given
> the high degree of overlap.  I think it is a bad result if arrow-rs ends up
> bit-rotting due to lack of maintainers (i.e. I think it would be better to
> intentionally deprecate it).
>
> Hopefully consensus can be reached among the Rust maintainers and we can
> have clear messaging for users over the future of both projects.
>
> Cheers,
> Micah
>
>  [1]
> https://github.com/jorgecarleitao/arrow2#any-plans-to-merge-with-the-apache-arrow-project
>
> On Wed, Jan 19, 2022 at 10:19 AM Jorge Cardoso Leitão <
> jorgecarlei...@gmail.com> wrote:
>
> > Hi,
> >
> > Thank you for raising this here and for your comments. I am very humbled by
> > the feedback and adoption that arrow2 got so far.
> >
> > My current hypothesis is that arrow2 will be donated to Apache Arrow, I
> > just don't feel comfortable and have the energy doing so right now.
> >
> > Thank you for your understanding,
> > Jorge
> >
> >
> > On Mon, Jan 17, 2022 at 6:21 PM Wes McKinney  wrote:
> >
> > > Sounds good, thanks all and look forward to hearing more about this.
> > >
> > > To second what Micah said, a reminder to please engage with civility.
> > > The ASF's code of conduct is found here [1]. We are all volunteering
> > > our time to try to do what is best for the developer and user
> > > communities' long-term health and success.
> > >
> > > [1]: https://www.apache.org/foundation/policies/conduct.html
> > >
> > >
> > > On Mon, Jan 17, 2022 at 6:07 AM Andrew Lamb 
> > wrote:
> > > >
> > > > For what it is worth, I personally would likely spend much less time
> > > > maintaining arrow-rs if datafusion switched to arrow2. That discussion
> > is
> > > > happening independently here [1].
> > > >
> > > > [1] https://github.com/apache/arrow-datafusion/issues/1532
> > > >
> > > > On Sun, Jan 16, 2022 at 11:17 PM Micah Kornfield <
> > emkornfi...@gmail.com>
> > > > wrote:
> > > >
> > > > > I agree, Jorge's point of view (and anyone else who has contributed
> > to
> > > > > arrow2) is important here.
> > > > >
> > > > > One thing that isn't exactly clear to me from the linked issue is how
> > > much
> > > > > interest there is in the community for maintaining arrow-rs?  How
> > much
> > > is a

Re: [RUST][DataFusion][Arrow] Switching DataFusion to use arrow2 implementation and the future of arrow

2022-01-16 Thread QP Hou
Hi Wes,

I believe what you mentioned is the plan, i.e. move arrow2 to ASF in
the long run once its design/API stabilizes and it could benefit
from a more rigorous release process. From what I have seen, the
project is still undergoing major API changes on a monthly basis, so
quick releases and fast user feedback are quite valuable. But let's
hear Jorge's point of view on this first.

On Sun, Jan 16, 2022 at 2:42 PM Wes McKinney  wrote:
>
> Is there a possibility of donating arrow2 to the Arrow project (at
> some point)? The main impact to development would be holding votes on
> releases, but this is probably a good thing long term from a
> governance standpoint. The answer may be "not right now" and that's
> fine. Having many of the same people split across projects with
> different governance structures is less than ideal.
>
> On Fri, Jan 14, 2022 at 1:15 PM Andrew Lamb  wrote:
> >
> > Hello,
> >
> > I wanted to draw your attention to two issues of significance to the Rust
> > Arrow implementations
> >
> > Discussion for switching DataFusion to use arrow2:
> > https://github.com/apache/arrow-datafusion/issues/1532
> >
> > Discussion for what to do with arrow if DataFusion switches to use arrow2:
> > https://github.com/apache/arrow-rs/issues/1176
> >
> > The second is likely the most pertinent to people on this mailing list, but
> > the first is the reason why the second has become important.
> >
> > Andrew


Re: Arrow streaming computation engine

2022-01-11 Thread QP Hou
For datafusion (the Rust engine that Weston mentioned), the community
is about to start building a PoC for a streaming engine. The discussion
is happening at
https://github.com/apache/arrow-datafusion/issues/1544.

On Tue, Jan 11, 2022 at 3:29 PM Weston Pace  wrote:
>
> First, note that there are different computation engines in different
> languages.  The Rust implementation has datafusion[1] for example.
> For the rest of this email, I will speak in more detail specifically
> about the C++ computation engine (which I am more familiar with) that
> is in place today.  The C++ engine is documented here[2] although that
> documentation is a little scarce and we are working on an updated
> version[3].
>
> Also note that the docs describe a "Streaming execution engine"
> because it operates on the data in a batch-oriented fashion.  However,
> this doesn't guarantee that it will use a small amount of memory.  For
> example, if you were to request that the engine sort the data then the
> engine may need to cache the entire dataset into memory (in the future
> this may mean spilling into temporary tables as memory runs out) in
> order to fulfill that query (because the very last row you read might
> be the very first row you need to emit).  However, for properly
> constructed queries, the engine should be able to operate as you are
> describing.  The queries you are describing sound to me like what one
> might expect to find in a "time series database" which is another term
> I've heard thrown around.
>
> I am not an expert in time series databases so I don't know the extent
> of the computation required.  However, the example you give (7 day
> rolling mean of daily US stock prices) is not something that could be
> efficiently computed today.  It is something that could be efficiently
> computed once "window functions" are supported.  Window functions[4]
> are a query engine feature that enables the sliding window needed for
> a rolling average.  I believe there are people at Voltron Data that
> are hoping to add support for these window functions to the C++
> streaming execution engine but that is future work that is not
> currently in progress.  That being said, a time series execution
> engine would probably also need to know about indices, statistics,
> whether the data on disk is sorted or not (and by what columns),
> downsampling functions, interpolation functions, etc.  In addition,
> beyond execution / computation there are concerns such as retention
> policies, streaming / appending data to disk, etc.
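
To make the rolling-average idea concrete, here is a minimal sketch in plain Python (not Arrow or DataFusion code) of a rolling mean that holds only the current window in memory. It assumes one value per day arriving in time order, so a window of 7 values corresponds to 7 days; the function name is made up for this example.

```python
from collections import deque


def rolling_mean(values, window):
    """Yield the mean of the most recent `window` values for each input value.

    Only `window` values are kept in memory, however long the stream is.
    Until the window fills up, the mean of the values seen so far is emitted.
    """
    buf = deque()
    total = 0.0
    for v in values:
        buf.append(v)
        total += v
        if len(buf) > window:
            # Drop the oldest value once the window is full.
            total -= buf.popleft()
        yield total / len(buf)


if __name__ == "__main__":
    daily_prices = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0]
    for mean in rolling_mean(daily_prices, window=7):
        print(mean)
```

Because the function is a generator over an arbitrary iterable, the same code works whether the values come from a file read batch by batch or from a live feed.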
>
> > So I am wondering if there is
> > a way to design an engine that can satisfy both streaming and batch mode of
> > processing. Or maybe it needs to be separate engines but we can minimize
> > the amount of duplication?
>
> Regardless of the timeline and plans for window functions the answer
> to this specific question is probably "yes" but I'm not enough of an
> expert in time series processing to answer with certainty.  The
> streaming execution engine in Arrow today is quite generic.  A graph
> of "exec nodes" is constructed.  Data is passed through these exec
> nodes starting from one or more sources and then ending at a sink.
> The sources could be live data to satisfy your request for (3).  The
> plan is currently run very similar to an actor model where batches are
> pushed from one node to another.  I'm hoping to add more support for
> scheduling and backpressure at some point.  Given what I know of the
> types of queries you are describing I think this model should suffice
> to run those queries efficiently.
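
The push-based model described above can be sketched in a few lines of plain Python. This is only a toy illustration of batches being pushed from a source through nodes to a sink; the class and function names are hypothetical and do not correspond to the Arrow C++ API, and scheduling and backpressure are left out.

```python
class ExecNode:
    """Hypothetical node: transforms a batch and pushes it downstream."""

    def __init__(self, fn, downstream):
        self.fn = fn
        self.downstream = downstream

    def push(self, batch):
        self.downstream.push(self.fn(batch))


class Sink:
    """Terminal node that collects every batch it receives."""

    def __init__(self):
        self.batches = []

    def push(self, batch):
        self.batches.append(batch)


def run(source, first_node):
    """Drive the plan by pushing each source batch into the first node."""
    for batch in source:
        first_node.push(batch)


if __name__ == "__main__":
    sink = Sink()
    double = ExecNode(lambda b: [x * 2 for x in b], sink)
    keep_even = ExecNode(lambda b: [x for x in b if x % 2 == 0], double)
    run([[1, 2, 3], [4, 5, 6]], keep_even)
    print(sink.batches)
```

The source here is just an iterable of batches, so it could equally be a generator that yields batches as live data arrives.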
>
> So, summarizing, I think some of the work we are doing will be useful
> to you (though possibly not sufficient) and it would be a good idea to
> reuse & share where possible.
>
> [1] https://docs.rs/datafusion/latest/datafusion/
> [2] https://arrow.apache.org/docs/cpp/streaming_execution.html
> [3] https://github.com/apache/arrow/pull/12033
> [4] 
> https://medium.com/an-idea/rolling-sum-and-average-window-functions-mysql-7509d1d576e6
>
> On Tue, Jan 11, 2022 at 11:19 AM Li Jin  wrote:
> >
> > Hi,
> >
> > This is a somewhat lengthy email about thoughts around a streaming
> > computation engine for Arrow dataset that I would like to hear feedback
> > from Arrow devs.
> >
> > The main use cases that we are thinking for the streaming engine are time
> > series data, i.e., data arrives in time order (e.g. daily US stock prices)
> > and the query often follows the time order of the data (e.g., compute 7 day
> > rolling mean of daily US stock prices).
> >
> > The main motivations for a streaming engine are (1) performance: it always
> > keeps a small amount of hot data in memory and cache; (2)
> > memory efficiency: the engine only needs to keep small amounts of data in
> > memory, e.g., for the 7 day rolling mean case, the engine never needs to
> > keep more than 7 days' worth of stock price data, even if it is computing this
> > for a stream of 20 years of data; (3) live data applications: data arrives

Re: [ANNOUNCE] New Arrow PMC member: Daniël Heres

2021-12-21 Thread QP Hou
Congrats Daniël! Thank you for all your awesome work on the rust
implementation and datafusion!

On Tue, Dec 21, 2021 at 9:49 PM Eduardo Ponce  wrote:
>
> Congrats!
>
> > On Dec 21, 2021, at 12:18 PM, Wes McKinney  wrote:
> >
> > The Project Management Committee (PMC) for Apache Arrow has invited
> > Daniël Heres to become a PMC member and we are pleased to announce
> > that Daniël has accepted.
> >
> > Congratulations and welcome!
>


Re: [ANNOUNCE] New Arrow committer: Rémi Dattai

2021-12-08 Thread QP Hou
Congrats Rémi, thank you for your epic work on datafusion :)

On Wed, Dec 8, 2021 at 9:00 AM Andrew Lamb  wrote:
>
> Congratulations Rémi!
>
> On Wed, Dec 8, 2021 at 10:56 AM Nic  wrote:
>
> > Congratulations! :)
> >
> > On Wed, 8 Dec 2021 at 07:20, Jorge Cardoso Leitão <
> > jorgecarlei...@gmail.com>
> > wrote:
> >
> > > Congrats!
> > >
> > > On Wed, Dec 8, 2021 at 8:14 AM Daniël Heres 
> > wrote:
> > >
> > > > Congrats Rémi!
> > > >
> > > > On Wed, Dec 8, 2021, 04:27 Ian Joiner  wrote:
> > > >
> > > > > Congrats!
> > > > >
> > > > > On Tuesday, December 7, 2021, Wes McKinney 
> > > wrote:
> > > > >
> > > > > > On behalf of the Arrow PMC, I'm happy to announce that Rémi Dattai
> > > has
> > > > > > accepted an invitation to become a committer on Apache Arrow.
> > > Welcome,
> > > > > > and thank you for your contributions!
> > > > > >
> > > > > > Wes
> > > > > >
> > > > >
> > > >
> > >
> >


Re: [DISCUSS] Community maintained extension repos for Datafusion

2021-11-18 Thread QP Hou
Thanks Wes for the confirmation. Yes, we only intend to keep
extensions that won't get merged back to datafusion core in the
contrib repo. Any code that we intend to go into the core will
definitely still be developed within the ASF GH org.

On Wed, Nov 17, 2021 at 3:29 PM Wes McKinney  wrote:
>
> Having a "community" contrib GitHub org outside of Apache sounds fine.
> If we want to move any packages into the Apache governance structure
> then we can conduct an IP clearance at that point. Since the term
> "DataFusion" doesn't have ASF trademark issues like "Arrow" does, we
> don't need to be as careful with project names (e.g. "Arrow X" is bad
> but "X for Arrow" or "X powered by Arrow" is OK)
>
> If the contrib repository gets used to "incubate" new ideas for
> mainline DataFusion, we might rather use the "experimental" repo
> policy (discussed in the past on the mailing list) to keep the work
> happening inside Apache.
>
> On Mon, Nov 15, 2021 at 7:32 AM Andrew Lamb  wrote:
> >
> > Thank you QP
> >
> > Andrew
> >
> > On Sun, Nov 14, 2021 at 5:02 PM QP Hou  wrote:
> >
> > > Thanks Jiayu, Benson, Micah and Andrew for your input on this. I have
> > > created an unofficial Github org [1] as a quick and dirty experiment
> > > for something like spark-packages.org. We should make it clear that
> > > code developed in this org will still need to go through the donation
> > > process in order to get into the ASF org.
> > >
> > > [1]: https://github.com/datafusion-contrib
> > >
> > > On Mon, Nov 8, 2021 at 3:12 AM Andrew Lamb  wrote:
> > > >
> > > > > I think a separate non-ASF organization, with a central list of extensions
> > > > > like spark-packages.org sounds like a good idea to me.
> > > >
> > > > On Sun, Nov 7, 2021 at 1:34 PM Micah Kornfield 
> > > > wrote:
> > > >
> > > > > > I'll preface this with not being an expert on these matters but this is my
> > > > > > impression.
> > > > >
> > > > >
> > > > > > Therefore, I am proposing that we create an unofficial shared Github
> > > > > > organization to host these Datafusion contrib type projects that are
> > > > > > only maintained by non-PMC community members.
> > > > >
> > > > >
> > > > > > I think as long as this is hosted outside of the Apache github
> > > > > > organization, this seems fine.  I think being careful around trade-mark
> > > > > > issues and making it clear it isn't officially part of the Apache
> > > > > > DataFusion project are the things to be careful about.  FWIW, I seem to
> > > > > > recall this type of model was something proposed in Spark and there was
> > > > > > some tension at the time with branding of the project.  It looks like Spark
> > > > > > has settled on having a central site <https://spark-packages.org/> [1][2]
> > > > > > for linking additional modules and they don't have a common namespace.
> > > > >
> > > > >
> > > > > > > I am curious if this is something that could be done under the Apache
> > > > > > > governance model? My main goal is to create an unofficial incubator
> > > > > > > type space for community members to develop and collaborate on
> > > > > > > extensions that may or may not be adopted as official extensions in
> > > > > > > the future.
> > > > >
> > > > >
> > > > > > My limited understanding is either something is governed by the ASF rules
> > > > > > (i.e. PMC/Committers officially recognized by the apache foundation, along
> > > > > > with release requirements) or it isn't, there really isn't a half-way thing
> > > > > > here from the ASF perspective.  Independent projects can choose ASF-like
> > > > > > policies and manage themselves in this manner. The incubator program at the
> > > > > > ASF is for projects that might or might not have sustained interest to
> > > > > > continue (but my understanding is incubation follows all the process of a
> > > > > > normal top-level Apache project).  Any code developed outside of ASF
> > >

[RESULT][VOTE][RUST][Datafusion] Release Apache Arrow Datafusion 6.0.0 RC0

2021-11-16 Thread QP Hou
The vote has passed with 3 +1 binding votes and 2 non-binding votes.
Thank you to all who helped with the release verification.
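
For readers unfamiliar with how such a result is tallied, the sketch below counts votes under the usual ASF release-vote rule, which is assumed here rather than quoted from this thread: at least three binding +1 votes, and more binding +1 than binding -1 votes. The function name and the vote representation are made up for this example.

```python
def release_vote_passes(votes, min_binding_plus_ones=3):
    """Decide whether a release vote passes.

    `votes` is a list of (value, is_binding) pairs, where value is +1, 0 or -1.
    Assumed rule: non-binding votes are advisory and are not counted.
    """
    binding = [value for value, is_binding in votes if is_binding]
    plus = binding.count(1)
    minus = binding.count(-1)
    return plus >= min_binding_plus_ones and plus > minus


if __name__ == "__main__":
    # Tally reported in this message: 3 binding +1 and 2 non-binding votes.
    votes = [(1, True)] * 3 + [(1, False)] * 2
    print(release_vote_passes(votes))
```

With only two binding +1 votes the function returns `False`, which corresponds to the earlier request in this thread for one more binding vote.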

We will proceed to finalize the release by following
https://github.com/apache/arrow-datafusion/tree/master/dev/release#finalize-the-release.

On Tue, Nov 16, 2021 at 8:50 PM Sutou Kouhei  wrote:
>
> +1
>
> I ran the following on Debian GNU/Linux sid:
>
>   dev/release/verify-release-candidate.sh 6.0.0 0
>
>
> Thanks,
> --
> kou
>
> In 
>   "Re: [VOTE][RUST][Datafusion] Release Apache Arrow Datafusion 6.0.0 RC0" on 
> Tue, 16 Nov 2021 18:20:37 -0800,
>   QP Hou  wrote:
>
> > It looks like we just need one more binding vote to get this release
> > out, would appreciate a +1 from another PMC member :)
> >
> > On Sun, Nov 14, 2021 at 4:54 PM Andy Grove  wrote:
> >>
> >> +1 (binding)
> >>
> >> I ran the release verification script on Ubuntu 20.04 and also ran the
> >> Ballista integration tests.
> >>
> >> Thanks,
> >>
> >> Andy.
> >>
> >>
> >> On Sun, Nov 14, 2021 at 10:10 AM Andrew Lamb  wrote:
> >>
> >> > +1 (binding)
> >> >
> >> > I reviewed the CHANGELOG and ran the release verification script on MacOS
> >> > Big Sur (11.6)
> >> >
> >> > Thank you QP for doing the work to create the release -- this one is pretty
> >> > awesome
> >> >
> >> > Andrew
> >> >


Re: [VOTE][RUST][Datafusion] Release Apache Arrow Datafusion 6.0.0 RC0

2021-11-16 Thread QP Hou
It looks like we just need one more binding vote to get this release
out, would appreciate a +1 from another PMC member :)

On Sun, Nov 14, 2021 at 4:54 PM Andy Grove  wrote:
>
> +1 (binding)
>
> I ran the release verification script on Ubuntu 20.04 and also ran the
> Ballista integration tests.
>
> Thanks,
>
> Andy.
>
>
> On Sun, Nov 14, 2021 at 10:10 AM Andrew Lamb  wrote:
>
> > +1 (binding)
> >
> > I reviewed the CHANGELOG and ran the release verification script on MacOS
> > Big Sur (11.6)
> >
> > Thank you QP for doing the work to create the release -- this one is pretty
> > awesome
> >
> > Andrew
> >


Re: [DISCUSS] Community maintained extension repos for Datafusion

2021-11-14 Thread QP Hou
Thanks Jiayu, Benson, Micah and Andrew for your input on this. I have
created an unofficial Github org [1] as a quick and dirty experiment
for something like spark-packages.org. We should make it clear that
code developed in this org will still need to go through the donation
process in order to get into the ASF org.

[1]: https://github.com/datafusion-contrib

On Mon, Nov 8, 2021 at 3:12 AM Andrew Lamb  wrote:
>
> I think a separate non-ASF organization, with a central list of extensions
> like spark-packages.org sounds like a good idea to me.
>
> On Sun, Nov 7, 2021 at 1:34 PM Micah Kornfield 
> wrote:
>
> > I'll preface this with not being an expert on these matters but this is my
> > impression.
> >
> >
> > > Therefore, I am proposing that we create an unofficial shared Github
> > > organization to host these Datafusion contrib type projects that are
> > > only maintained by non-PMC community members.
> >
> >
> > I think as long as this is hosted outside of the Apache github
> > organization, this seems fine.  I think being careful around trade-mark
> > issues and making it clear it isn't officially part of the Apache
> > DataFusion project are the things to be careful about.  FWIW, I seem to
> > recall this type of model was something proposed in Spark and there was
> > some tension at the time with branding of the project.  It looks like Spark
> > has settled on having a central site <https://spark-packages.org/> [1][2]
> > for linking additional modules and they don't have a common namespace.
> >
> >
> > > > I am curious if this is something that could be done under the Apache
> > > governance model? My main goal is to create an unofficial incubator
> > > type space for community members to develop and collaborate on
> > > extensions that may or may not be adopted as official extensions in
> > > the future.
> >
> >
> > My limited understanding is either something is governed by the ASF rules
> > (i.e. PMC/Committers officially recognized by the apache foundation, along
> > with release requirements) or it isn't, there really isn't a half-way thing
> > here from the ASF perspective.  Independent projects can choose ASF-like
> > policies and manage themselves in this manner. The incubator program at the
> > ASF is for projects that might or might not have sustained interest to
> > continue (but my understanding is incubation follows all the process of a
> > normal top-level Apache project).  Any code developed outside of ASF
> > governance needs to go through the donation process (IP Clearance, etc) to
> > be moved into ASF repos, even if it is developed by PMC members/committers
> > (see prior discussions on Arrow2 in Rust and the Julia libraries).
> >
> > Cheers,
> > Micah
> >
> > [1] https://spark.apache.org/contributing.html
> > [2] https://spark-packages.org/
> >
> >
> > On Sun, Nov 7, 2021 at 2:31 AM Benson Muite 
> > wrote:
> >
> > > A community owned GitHub organization would be helpful. Maybe for all
> > > other Arrow related projects not just Datafusion. This would make them
> > > easier to find, and for community members to contribute. It could also
> > > include a listing of relevant projects elsewhere.
> > >
> > > On 11/7/21 9:40 AM, Jiayu Liu wrote:
> > > > FWIW if there's a way to contribute code pertaining to datafusion I can
> > > > contribute my version of Java bindings to it.
> > > >
> > > > IMO having a central place (instead of linking) for all bindings, 3rd
> > > > libraries, etc. for datafusion would mean more synergy across different
> > > > languages but I won't go as far as a monorepo because the CI/CD process
> > > > and release process are unlikely to benefit from it. Maybe a community
> > > > owned GitHub org?
> > > >
> > > > On 2021/11/07 00:52:49 QP Hou wrote:
> > > >> Hi all,
> > > >>
> > > >> I would like to propose a new and more community friendly governance
> > > >> model for community contributed and maintained extensions for the
> > > >> datafusion project.
> > > >>
> > > >> Over the last year, many datafusion extensions have been proposed and
> > > >> created by the community including the java binding, s3 and hdfs[1]
> > > > >> object storage implementations, etc. Right now this code is or will
> > > >> be hosted in individual github namespaces due to the following
> > > >> reasons:

[VOTE][RUST][Datafusion] Release Apache Arrow Datafusion 6.0.0 RC0

2021-11-14 Thread QP Hou
Hi,

I would like to propose a release of Apache Arrow Datafusion Implementation,
version 6.0.0.

This release candidate is based on commit:
7824a8d74093374da8a4f040d23a81b8436b7380 [1]
The proposed release tarball and signatures are hosted at [2].
The changelog is located at [3].

Please download, verify checksums and signatures, run the unit tests, and vote
on the release. The vote will be open for at least 72 hours.

Only votes from PMC members are binding, but all members of the community are
encouraged to test the release and vote with "(non-binding)".

The standard verification procedure is documented at
https://github.com/apache/arrow-datafusion/blob/master/dev/release/README.md#verifying-release-candidates.

[ ] +1 Release this as Apache Arrow Datafusion 6.0.0
[ ] +0
[ ] -1 Do not release this as Apache Arrow Datafusion 6.0.0 because...

[1]: 
https://github.com/apache/arrow-datafusion/tree/7824a8d74093374da8a4f040d23a81b8436b7380
[2]: 
https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-6.0.0-rc0
[3]: 
https://github.com/apache/arrow-datafusion/blob/7824a8d74093374da8a4f040d23a81b8436b7380/CHANGELOG.md

Thanks,
QP
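
For anyone new to release verification, the checksum part of "verify
checksums and signatures" can be sketched in a few lines of Python. This
is only an illustration under assumptions: it assumes the checksum file
uses the `sha256sum` layout (hex digest first, then the file name), the
helper names are hypothetical, and it does not cover the GPG signature
check. The documented verification procedure linked above remains the
reference.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large tarballs do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(tarball: Path, checksum_file: Path) -> bool:
    """Compare the tarball digest with the first token of the checksum file.

    Assumes a `sha256sum`-style file: "<hex digest>  <file name>".
    """
    expected = checksum_file.read_text().split()[0].lower()
    return sha256_of(tarball) == expected
```

Calling `checksum_matches` with the downloaded tarball and its `.sha256`
companion returns True only when the two digests agree.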


[DISCUSS] Community maintained extension repos for Datafusion

2021-11-06 Thread QP Hou
Hi all,

I would like to propose a new and more community friendly governance
model for community contributed and maintained extensions for the
datafusion project.

Over the last year, many datafusion extensions have been proposed and
created by the community including the java binding, s3 and hdfs[1]
object storage implementations, etc. Right now this code is or will
be hosted in individual github namespaces due to the following
reasons:

* Most of these extensions are not considered part of the Datafusion
core, so the current maintainers prefer to not have them managed in
the main repository. The current python binding and ballista code base
is already adding a decent amount of overhead to our development
process. Adding more dependent crates will slow us down further
without much upside.

* Considering the overhead of the official Apache release process,
current Datafusion PMC members don't have the bandwidth to manage
individual releases for these extensions. None of the authors of these
extensions are Arrow PMC members, so they won't have the access to
drive the Apache releases by themselves.

Therefore, I am proposing that we create an unofficial shared Github
organization to host these Datafusion contrib type projects that are
only maintained by non-PMC community members. I think this is strictly
better than hosting these extensions projects in personal github
namespaces. If any of these extensions end up getting significant
involvement or interest from Datafusion committers, then we can
promote them into official projects and provide official Apache style
release support.

Other alternatives I have considered are:

* Keep these projects under personal namespaces and only link them in
Datafusion's documentation.

* Manage these extensions using experimental repos. But as far as I
know, the code owners still need to be PMC members in order to
perform crates.io releases, and experimental repos are not intended
for long running projects with no goal of eventual archival.

* Create a dedicated mono repo named apache/datafusion-contrib to host
these extensions. However, this approach also requires PMC members to
get involved for crates.io releases if I understand it correctly.

I am curious if this is something that could be done under the Apache
governance model? My main goal is to create an unofficial incubator
type space for community members to develop and collaborate on
extensions that may or may not be adopted as official extensions in
the future.

[1]: https://github.com/apache/arrow-datafusion/pull/1223

Thanks,
QP


Re: [DISCUSS][Rust] Biweekly sync call for arrow/datafusion again?

2021-10-14 Thread QP Hou
Hi Ben,

Anyone is welcome to join the party.

--QP

On Thu, Oct 14, 2021 at 7:02 AM Ben Chen  wrote:
>
> Hi,
>
> I'm new to Arrow. Is such a call specific to a small group of people or
> welcome to anyone?
>
> Thanks,
> Ben
>
> On Thu, Oct 14, 2021 at 9:17 AM Andrew Lamb  wrote:
>
> > I agree -- since no one has added anything to the agenda, I will cancel it
> > (and leave a note in the agenda document to that effect)
> >
> > On Thu, Oct 14, 2021 at 1:23 AM QP Hou  wrote:
> >
> > > I recommend skipping the meeting if there is no proposed item before
> > > the meeting starts. Changing to monthly cadence also sounds like a
> > > > good idea. We can always change it back to bi-weekly or call ad-hoc
> > > meetings if there are more items that need to be discussed.
> > >
> > >
> > > On Wed, Oct 13, 2021 at 12:43 PM Andrew Lamb 
> > wrote:
> > > >
> > > > Tomorrow October 14 would be the next 16:00 Rust sync up.
> > > >
> > > > However, we don't seem to have anything on the agenda[1]
> > > >
> > > > > Would anyone like to propose topics? Or shall we skip this time? Maybe
> > > > monthly would be a better cadence?
> > > >
> > > > Andrew
> > > >
> > > >
> > > > [1]
> > > >
> > >
> > https://docs.google.com/document/d/1atCVnoff5SR4eM4Lwf2M1BBJTY6g3_HUNR6qswYJW_U/edit#
> > > >
> > > > On Thu, Sep 30, 2021 at 12:46 PM Jiayu Liu  wrote:
> > > >
> > > > > Thanks Andrew for facilitating this meeting and very happy to "meet"
> > > > > everyone on the call. Hope you have a great day / evening.
> > > > >
> > > > > On Fri, Oct 1, 2021 at 12:38 AM Andrew Lamb 
> > > wrote:
> > > > >
> > > > > > Notes from the 16:00 UTC Call:
> > > > > >
> > > > > > Attendees:
> > > > > > Andrew Lamb
> > > > > > Shen Yi Jie
> > > > > > Matt Turner
> > > > > > Zied BF
> > > > > > Remi Dettai
> > > > > > Rich
> > > > > > Ruihang
> > > > > > > Jiayu Liu
> > > > > > QP
> > > > > > Jorn Horstmann
> > > > > > Benson Muite
> > > > > >
> > > > > > Introductions (20 minutes)
> > > > > >
> > > > > > Discussion Items (10 minutes):
> > > > > > * Interest in python binding, though it is lagging behind
> > > > > > * Thoughts on boundaries between Arrow, DataFusion and Ballista
> > > > > >
> > > > > > On Thu, Sep 30, 2021 at 12:29 AM Benson Muite <
> > > > > benson_mu...@emailplus.org>
> > > > > > wrote:
> > > > > >
> > > > > > > Attendees:
> > > > > > >
> > > > > > > Ruihang
> > > > > > > Benson
> > > > > > >
> > > > > > > Discussion items:
> > > > > > >
> > > > > > > Self-introduction
> > > > > > > OpenVidu seemed to work
> > > > > > > Data Fusion introduction
> > > > > > > Speed of Arrow development process and intended use cases
> > > > > > > Maybe get time zones of attendees?
> > > > > > >
> > > > > > > On 9/30/21 6:58 AM, Benson Muite wrote:
> > > > > > > > Join link:
> > > > > > > >
> > > > > > > > https://mkutano.nairuby.org/#/soft-amaranth-alpaca
> > > > > > > >
> > > > > > > >
> > > > > > > > Sorry it is late. Meeting should be short, as it seems there
> > is a
> > > > > > > > preference for one meeting.
> > > > > > > >
> > > > > > > > On 9/29/21 10:59 AM, Benson Muite wrote:
> > > > > > > >> Hi,
> > > > > > > >>
> > > > > > > >> Will send a link to a BigBlueButton/OpenVidu instance at 3:45
> > > UTC
> > > > > > > >> tomorrow.
> > > > > > > >>
> > > > > > > >> Update the google doc [1]
> > > > > > > >>
> > > > > > > >>

Re: [VOTE][RUST] Release Apache Arrow Rust 6.0.0 RC1

2021-10-13 Thread QP Hou
+1 (non-binding)

Tested with verification script on Linux 5.4.0-80 Ubuntu x86_64.

On Wed, Oct 13, 2021 at 1:23 PM Andy Grove  wrote:
>
> +1 (binding)
>
> I checked signatures and ran the release verification script on Ubuntu 20.04
>
> On Wed, Oct 13, 2021 at 1:35 PM Andrew Lamb  wrote:
>
> > Hi,
> >
> > I would like to propose a release of Apache Arrow Rust Implementation,
> > version 6.0.0.
> >
> > This release candidate is based on commit:
> > 9e522699982fa554a41f0c67fb003e9ba0c3becb [1]
> >
> > The proposed release tarball and signatures are hosted at [2].
> >
> > The changelog is located at [3].
> >
> > Please download, verify checksums and signatures, run the unit tests,
> > and vote on the release. There is a script [4] that automates some of
> > the verification.
> >
> > The vote will be open for at least 72 hours.
> >
> > [ ] +1 Release this as Apache Arrow Rust
> > [ ] +0
> > [ ] -1 Do not release this as Apache Arrow Rust  because...
> >
> > [1]:
> >
> > https://github.com/apache/arrow-rs/tree/9e522699982fa554a41f0c67fb003e9ba0c3becb
> > [2]:
> > https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-rs-6.0.0-rc1
> > [3]:
> >
> > https://github.com/apache/arrow-rs/blob/9e522699982fa554a41f0c67fb003e9ba0c3becb/CHANGELOG.md
> > [4]:
> >
> > https://github.com/apache/arrow-rs/blob/master/dev/release/verify-release-candidate.sh
> >


Re: [DISCUSS][Rust] Biweekly sync call for arrow/datafusion again?

2021-10-13 Thread QP Hou
I recommend skipping the meeting if there is no proposed item before
the meeting starts. Changing to monthly cadence also sounds like a
good idea. We can always change it back to bi-weekly or call ad-hoc
meetings if there are more items that need to be discussed.


On Wed, Oct 13, 2021 at 12:43 PM Andrew Lamb  wrote:
>
> Tomorrow October 14 would be the next 16:00 Rust sync up.
>
> However, we don't seem to have anything on the agenda[1]
>
> Would anyone like to propose topics? Or shall we skip this time? Maybe
> monthly would be a better cadence?
>
> Andrew
>
>
> [1]
> https://docs.google.com/document/d/1atCVnoff5SR4eM4Lwf2M1BBJTY6g3_HUNR6qswYJW_U/edit#
>
> On Thu, Sep 30, 2021 at 12:46 PM Jiayu Liu  wrote:
>
> > Thanks Andrew for facilitating this meeting and very happy to "meet"
> > everyone on the call. Hope you have a great day / evening.
> >
> > On Fri, Oct 1, 2021 at 12:38 AM Andrew Lamb  wrote:
> >
> > > Notes from the 16:00 UTC Call:
> > >
> > > Attendees:
> > > Andrew Lamb
> > > Shen Yi Jie
> > > Matt Turner
> > > Zied BF
> > > Remi Dettai
> > > Rich
> > > Ruihang
> > > > Jiayu Liu
> > > QP
> > > Jorn Horstmann
> > > Benson Muite
> > >
> > > Introductions (20 minutes)
> > >
> > > Discussion Items (10 minutes):
> > > * Interest in python binding, though it is lagging behind
> > > * Thoughts on boundaries between Arrow, DataFusion and Ballista
> > >
> > > On Thu, Sep 30, 2021 at 12:29 AM Benson Muite <
> > benson_mu...@emailplus.org>
> > > wrote:
> > >
> > > > Attendees:
> > > >
> > > > Ruihang
> > > > Benson
> > > >
> > > > Discussion items:
> > > >
> > > > Self-introduction
> > > > OpenVidu seemed to work
> > > > Data Fusion introduction
> > > > Speed of Arrow development process and intended use cases
> > > > Maybe get time zones of attendees?
> > > >
> > > > On 9/30/21 6:58 AM, Benson Muite wrote:
> > > > > Join link:
> > > > >
> > > > > https://mkutano.nairuby.org/#/soft-amaranth-alpaca
> > > > >
> > > > >
> > > > > Sorry it is late. Meeting should be short, as it seems there is a
> > > > > preference for one meeting.
> > > > >
> > > > > On 9/29/21 10:59 AM, Benson Muite wrote:
> > > > >> Hi,
> > > > >>
> > > > >> Will send a link to a BigBlueButton/OpenVidu instance at 3:45 UTC
> > > > >> tomorrow.
> > > > >>
> > > > >> Update the google doc [1]
> > > > >>
> > > > >> Would be helpful to know if having 2 meetings on the same day, or
> > > > >> alternating the meeting time will work best for most people.
> > > > >>
> > > > >> Regards,
> > > > >> Benson
> > > > >>
> > > > >> [1]
> > > > >>
> > > >
> > >
> > https://docs.google.com/document/d/1atCVnoff5SR4eM4Lwf2M1BBJTY6g3_HUNR6qswYJW_U/edit
> > > > >>
> > > > >>
> > > > >> On 9/24/21 4:26 PM, Andrew Lamb wrote:
> > > > >>> Thank you!
> > > > >>>
> > > > >>>
> > > > >>> On Thu, Sep 23, 2021 at 4:17 PM Benson Muite
> > > > >>> 
> > > > >>> wrote:
> > > > >>>
> > > >  Can host 4:00 UTC, will likely use a self-hosted video
> > conferencing
> > > >  solution that should just work in the browser.
> > > > 
> > > >  Benson
> > > > 
> > > > 
> > > >  On 9/22/21 11:15 PM, Andrew Lamb wrote:
> > > > > The idea of time variation sounds great. As I am not typically
> > > > > available
> > > >  at
> > > > > 4:00 UTC I would appreciate it if someone else could please
> > arrange
> > > > > that.
> > > > > Thus, I propose the following as an initial call and we can
> > adjust
> > > > > schedules or technology as needed:
> > > > >
> > > > > Date/Time: Alternating Thursdays at 16:00 UTC starting September
> > > > > 30, 2021
> > > > > Location: Zoom [2]
> > > > > Agenda: Google docs [1]
> > > > >
> > > > > We will send a summary of all sync ups to the mailing list. I
> > have
> > > > > also
> > > > > proposed adding this information to the website [3]
> > > > >
> > > > > Thanks,
> > > > > Andrew
> > > > >
> > > > > [1]
> > > > >
> > > > 
> > > >
> > >
> > https://docs.google.com/document/d/1atCVnoff5SR4eM4Lwf2M1BBJTY6g3_HUNR6qswYJW_U/edit
> > > > 
> > > > >
> > > > > [2]
> > > > > Topic: Apache Arrow Rust Syncup (arrow, datafusion ballista, etc)
> > > > > Time:
> > > >  Sep
> > > > > 30, 2021 04:00 PM Universal Time UTC
> > > > > https://influxdata.zoom.us/j/94666921249
> > > > >
> > > > > [3] https://github.com/apache/arrow-datafusion/pull/1042/files
> > > > >
> > > > 
> > > > >>>
> > > > >>
> > > > >
> > > >
> > > >
> > >
> >


Re: [ANNOUNCE] New Arrow committer: Jiayu Liu

2021-10-07 Thread QP Hou
Congrats Jiayu, welcome to the party!

--QP

On Thu, Oct 7, 2021 at 3:56 AM Andrew Lamb  wrote:
>
> Hi,
>
> On behalf of the Arrow PMC, I'm happy to announce that
> Jiayu Liu has accepted an invitation to become a
> committer on Apache Arrow. Welcome, and thank you for your
> contributions!
>
>
> Andrew


Re: [DISCUSS][Rust] Biweekly sync call for arrow/datafusion again?

2021-09-29 Thread QP Hou
I would prefer alternating meeting time to optimize for more
attendance and reduce meeting fatigue for those who can attend both.

On Wed, Sep 29, 2021 at 1:00 AM Benson Muite  wrote:
>
> Hi,
>
> Will send a link to a BigBlueButton/OpenVidu instance at 3:45 UTC tomorrow.
>
> Update the google doc [1]
>
> Would be helpful to know if having 2 meetings on the same day, or
> alternating the meeting time will work best for most people.
>
> Regards,
> Benson
>
> [1]
> https://docs.google.com/document/d/1atCVnoff5SR4eM4Lwf2M1BBJTY6g3_HUNR6qswYJW_U/edit
>
> On 9/24/21 4:26 PM, Andrew Lamb wrote:
> > Thank you!
> >
> >
> > On Thu, Sep 23, 2021 at 4:17 PM Benson Muite 
> > wrote:
> >
> >> Can host 4:00 UTC, will likely use a self-hosted video conferencing
> >> solution that should just work in the browser.
> >>
> >> Benson
> >>
> >>
> >> On 9/22/21 11:15 PM, Andrew Lamb wrote:
> >>> The idea of time variation sounds great. As I am not typically available
> >> at
> >>> 4:00 UTC I would appreciate it if someone else could please arrange that.
> >>> Thus, I propose the following as an initial call and we can adjust
> >>> schedules or technology as needed:
> >>>
> >>> Date/Time: Alternating Thursdays at 16:00 UTC starting September 30, 2021
> >>> Location: Zoom [2]
> >>> Agenda: Google docs [1]
> >>>
> >>> We will send a summary of all sync ups to the mailing list. I have also
> >>> proposed adding this information to the website [3]
> >>>
> >>> Thanks,
> >>> Andrew
> >>>
> >>> [1]
> >>>
> >> https://docs.google.com/document/d/1atCVnoff5SR4eM4Lwf2M1BBJTY6g3_HUNR6qswYJW_U/edit
> >>>
> >>> [2]
> >>> Topic: Apache Arrow Rust Syncup (arrow, datafusion ballista, etc) Time:
> >> Sep
> >>> 30, 2021 04:00 PM Universal Time UTC
> >>> https://influxdata.zoom.us/j/94666921249
> >>>
> >>> [3] https://github.com/apache/arrow-datafusion/pull/1042/files
> >>>
> >>
> >
>


Re: [DISCUSS][Julia] How to restart at apache/arrow-julia?

2021-09-21 Thread QP Hou
To expedite the donation, perhaps we could move on with the decoupled
version scheme for now to reduce workload and disruption to the
existing users. The julia maintainers can always decide to change the
versioning scheme later after the donation has been completed. This
doesn't seem like a blocker issue to me.

On Mon, Sep 20, 2021 at 8:09 PM Sutou Kouhei  wrote:
>
> Hi Jacob,
>
> Thanks for confirming this.
>
> For major release:
>
> As far as I know:
>
> We chose this style because we expect active development for
> at least a few more years. Active development will need API
> breaking changes, so we release a major version every 3-4 months.
>
> Our release process already released all implementations at
> once before we chose this style. We just didn't change it. Some
> implementations don't have API breaking changes between
> major releases, but we just don't worry about that.
>
> Aligned versions for all implementations may have merit
> for users. Users can assume that it's safe to use
> Apache Arrow C++ 6.0.0 together with Apache Arrow Rust 6.0.0.
> (We have integration tests for implementations with the same
> version.)
>
> References:
>
>   * Discussion: [Discuss] Compatibility Guarantees and Versioning Post "1.0.0"
> 
> https://lists.apache.org/thread.html/5715a4d402c835d22d929a8069c5c0cf232077a660ee98639d544af8%40%3Cdev.arrow.apache.org%3E
>
>   * Vote: [VOTE] Adopt FORMAT and LIBRARY SemVer-based version schemes for 
> Arrow 1.0.0 and beyond
> 
> https://lists.apache.org/thread.html/2a630234214e590eb184c24bbf9dac4a8d8f7677d85a75fa49d70ba8%40%3Cdev.arrow.apache.org%3E
>
>   * Follow-up thread: Versioning of arrow
> 
> https://lists.apache.org/thread.html/rb11c0839a7167c2f1d82b0b77134c53abc5487e9165c3493b55db12b%40%3Cdev.arrow.apache.org%3E
>
>
> My opinion:
>
> I have no opinion on this. I don't object to the Julia
> implementation using a separate version.
>
>
> Thanks,
> --
> kou
>
> In 
>   "Re: [DISCUSS][Julia] How to restart at apache/arrow-julia?" on Thu, 16 Sep 
> 2021 23:47:45 -0600,
>   Jacob Quinn  wrote:
>
> > Good question.
> >
> > In my mind, I was imagining the arrow-julia repo would have a fully
> > decoupled versioning from the main arrow project. This comes from my
> > understanding that the julia implementation is its own "project" that
> > implements the arrow spec/format, and we may need a breaking major release
> > at different cadences than the main spec version. Indeed, while the arrow
> > project has gone from 2.0 -> 6.0 since the julia implementation was first
> > released, we're just now releasing our own 2.0.0 version after a change in
> > API for how metadata is set/retrieved on table/column objects.
> >
> > I'll admit that it's not entirely clear to me how to best signal/implement
> > coordination between the main arrow project versions and the julia version
> > though. I'm just guessing here, but is that why the main arrow project does
> > so frequent major version releases? To account for any child
> > implementations happening to have breaking changes? I think I remember
> > discussion recently around moving the actual spec/format document out as a
> > separate repo or at least versioning it separately from all the various
> > implementations, and that seems like it would be a good idea, though I
> > guess the format itself has versioning built in. It's certainly
> > something we can clarify in the Julia package itself; i.e. which version of
> > the spec a given Julia package version is compatible with. Typically with
> > other julia package dependencies, just a minor version increment is
> > required when a new breaking dependency version is upgraded, so I would
> > think we could follow something similar by treating the arrow format as a
> > "dependency".
> >
> > I'll clarify that I don't feel very strongly on these points, so if there's
> > something I'm missing or gaps in my understanding of how the rest of the
> > web of projects are coordinating things, I'm all ears.
> >
> > -Jacob
> >
> > On Thu, Sep 16, 2021 at 11:24 PM Sutou Kouhei  wrote:
> >
> >> Hi,
> >>
> >> Good point! Jacob, could you confirm this?
> >>
> >>
> >> Thanks,
> >> --
> >> kou
> >>
> >> In 
> >>   "Re: [DISCUSS][Julia] How to restart at apache/arrow-julia?" on Sat, 11
> >> Sep 2021 16:57:17 -0700,
> >>   QP Hou  wrote:
> >>
> >> > Just one minor point to confirm and clarify. It looks like Julia arrow
> >> only
> >> > wants 

Re: Re: Re: [DISCUSS][Rust] Biweekly sync call for arrow/datafusion again?

2021-09-19 Thread QP Hou
16 UTC works for me too.

On Sun, Sep 19, 2021 at 10:00 AM zied bf  wrote:
>
> Hi everyone,
>
> I am still new to the stack and, as Remi mentioned, a few documents and
> discussions on the internal design of DataFusion could help newcomers
> understand how to contribute, or at least where to start in order to
> grasp the full implementation details.
>
> I would vote for 16:00 UTC.
>
> Best
>
> On Sun, Sep 19, 2021 at 5:50 PM Rémi Dettai  wrote:
>
> > I would also vote for 16:00 UTC.
> >
> > Remi
> >
> > On Sun, Sep 19, 2021, 2:01 PM Yijie Shen 
> > wrote:
> >
> > > 4:00, 10:00, and 16:00 in UTC works for me as well :D
> > >
> > > On 2021/09/19 10:32:03 Wayne Xia wrote:
> > > > I vote for 4:00, 10:00 and 16:00 (in UTC)
> > > >
> > > > On Sun, Sep 19, 2021 at 6:27 PM Andrew Lamb 
> > > wrote:
> > > >
> > > > > It sounds like there are enough people to make it worth organizing at
> > > least
> > > > > a few.
> > > > >
> > > > > I would be happy to organize a zoom meeting to facilitate this.
> > > Previously
> > > > > the schedule was every other Wednesday at 12 Noon Eastern Time,
> > which I
> > > > > realize may be a tough time for some, especially those in Asia.
> > > > >
> > > > > Could people please respond with their preference:
> > > > > A) 4:00 UTC
> > > > > > B) 10:00 UTC
> > > > > > C) 16:00 UTC
> > > > > D) 22:00 UTC
> > > > >
> > > > > Thanks,
> > > > > Andrew
> > > > >
> > > > >
> > > > > On Fri, Sep 17, 2021 at 6:07 AM Yijie Shen <
> > henry.yijies...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > I have received a lot of great help since I started working on
> > > > > DataFusion.
> > > > > > It will be fantastic to have an opportunity to communicate with
> > > community
> > > > > > members "face to face".
> > > > > >
> > > > > > Best,
> > > > > > Yijie
> > > > > >
> > > > > > On 2021/09/17 03:58:10 QP Hou wrote:
> > > > > > > I would be interested in meeting with more contributors "face to
> > > face"
> > > > > > > and chime in to help move these major initiatives forward in any
> > > way I
> > > > > > > can :)
> > > > > > >
> > > > > > > --
> > > > > > > QP
> > > > > > >
> > > > > > > On Thu, Sep 16, 2021 at 6:55 AM Rémi Dettai 
> > > wrote:
> > > > > > > >
> > > > > > > > > I am also very interested in reinstating these events, at
> > least
> > > > > > > > occasionally.
> > > > > > > >
> > > > > > > > I do think that sharing some higher level goals and ideas in
> > more
> > > > > > *informal
> > > > > > > > *discussions could help us understand each other better in our
> > > > > > asynchronous
> > > > > > > > work (design documents, issues, PRs).
> > > > > > > >
> > > > > > > > I also agree that no decision should be taken during these
> > > calls. An
> > > > > > > > interesting format could be that each time, one or two
> > > participants,
> > > > > > on a
> > > > > > > > volunteering basis, share a small presentation of their work
> > > > > on/around
> > > > > > > > Arrow/Datafusion, the time they have available to spend on it,
> > > maybe
> > > > > > their
> > > > > > > > overall vision of what they would like the project to become...
> > > > > > > >
> > > > > > > > Remi
> > > > > > > >
> > > > > > > > Le jeu. 16 sept. 2021 à 15:37, Andrew Lamb <
> > al...@influxdata.com>
> > > a
> > > > > > écrit :
> > > > > > > >
> > > > > > > > >  A lot has been happening in DataFusion and Arrow  since  we
> > > > > stopped
> > > > > > the

Re: [DISCUSS][Rust] Biweekly sync call for arrow/datafusion again?

2021-09-16 Thread QP Hou
I would be interested in meeting with more contributors "face to face"
and chime in to help move these major initiatives forward in any way I
can :)

--
QP

On Thu, Sep 16, 2021 at 6:55 AM Rémi Dettai  wrote:
>
> I am also very interested in reinstating these events, at least
> occasionally.
>
> I do think that sharing some higher level goals and ideas in more *informal
> *discussions could help us understand each other better in our asynchronous
> work (design documents, issues, PRs).
>
> I also agree that no decision should be taken during these calls. An
> interesting format could be that each time, one or two participants, on a
> volunteering basis, share a small presentation of their work on/around
> Arrow/Datafusion, the time they have available to spend on it, maybe their
> overall vision of what they would like the project to become...
>
> Remi
>
> Le jeu. 16 sept. 2021 à 15:37, Andrew Lamb  a écrit :
>
> > A lot has been happening in DataFusion and Arrow since we stopped the
> > Rust specific sync calls (see mailing list thread [1] on the topic).
> >
> > I would like to gauge interest in restarting the calls
> >
> > I think a call could be valuable to:
> > 1. Help "put a face to the name" of some of other contributors we are
> > working with
> > 2. Discuss / synchronize on the goals and major initiatives from different
> > stakeholders to identify areas where more alignment is needed
> >
> > Recent areas I am thinking about that might benefit from some in person
> > discussion are the object store API [2] and table provider splits [3].
> >
> > As always, we would ensure that minutes are sent out, no decisions are
> > made on the call, and anything of substance is discussed on this mailing
> > list or in github issues / google docs.
> >
> > Andrew
> >
> > [1]
> >
> > https://lists.apache.org/thread.html/rbeadc3b11bce8731c69617c8e0fe780a97055de0fcd739c378d9c0e1%40%3Cdev.arrow.apache.org%3E
> >
> > [2] https://github.com/apache/arrow-datafusion/pull/950
> >
> > [3] https://github.com/apache/arrow-datafusion/issues/1009
> >


Re: [DISCUSS] Leap seconds/days and day light saving for Duration types

2021-09-16 Thread QP Hou
Thank you for your feedback, Weston and Antoine. I agree that the
ordering discussion should be out of scope for the Arrow format spec. I
have removed the reference to ordering in the PR, so now the only change
is mentioning leap seconds to keep it consistent with other temporal
types.

I would like to add that even though we are not explicitly discussing
ordering in the spec, any kind of restriction we assign to a type
would still implicitly impact ordering in downstream compute kernels.
This is why I also took out the discussion of leap days in my PR.

Thanks,
QP
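
For readers who want to see what "no leap seconds" means in practice, here
is a minimal Python sketch of the ingestion-time correction that the
Schema.fbs comment on the Time type (quoted further down in this thread)
describes: a time-of-day value of 86400 seconds is replaced with 86399.
The function name and the error handling are my own assumptions for
illustration; they are not part of the Arrow spec or of any Arrow library.

```python
SECONDS_PER_DAY = 86_400


def clamp_leap_second(seconds_since_midnight: int) -> int:
    """Fold a leap-second reading (23:59:60, i.e. 86400) onto 86399.

    Hypothetical helper: values already in [0, 86399] pass through
    unchanged, and anything outside [0, 86400] is rejected.
    """
    if not 0 <= seconds_since_midnight <= SECONDS_PER_DAY:
        raise ValueError(
            f"time of day out of range: {seconds_since_midnight}"
        )
    return min(seconds_since_midnight, SECONDS_PER_DAY - 1)
```

A producer would apply something like this before writing Time values, so
that consumers never see 86400.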

On Tue, Sep 14, 2021 at 12:46 AM Antoine Pitrou  wrote:
>
>
> I agree with Weston that ordering isn't in the scope for the Arrow
> format spec (*).  For example, implementations are free to define UTF8
> comparisons and ordering as they wish (some may want to invest in the
> complexity of the official Unicode collation algorithm, others may be
> content with a simple codepoint-wise lexicographic comparison).  It
> doesn't prevent them from exchanging UTF8 data unambiguously using Arrow.
>
> (*) It may be in the scope for a hypothetical Compute IR spec, however.
>
> Regards
>
> Antoine.
>
>
> Le 14/09/2021 à 07:16, QP Hou a écrit :
> > Good point Weston. My proposal was written with the impression that
> > Arrow does want to define semantics for some of these temporal types
> > based on the existing comments in the Schema.fbs file.
> >
> > For example, here is a quote taken from the comments for the Time type:
> >
> > /// This definition doesn't allow for leap seconds. Time values from
> > /// measurements with leap seconds will need to be corrected when ingesting
> > /// into Arrow (for example by replacing the value 86400 with 86399).
> >
> > Here is another quote for the Date type:
> >
> > /// * Milliseconds (64 bits) indicating UNIX time elapsed since the epoch (no
> > /// leap seconds), where the values are evenly divisible by 86400000
> >
> > For the interval type, we have:
> >
> > // A "calendar" interval which models types that don't necessarily
> > // have a precise duration without the context of a base timestamp (e.g.
> > // days can differ in length during day light savings time transitions).
> >
> > I think pushing the responsibility to define these semantics to the
> > data producer side is also a perfectly fine design with its own
> > trade-offs. It would make data exchange between two different systems
> > a little bit harder because consumers need to be aware of the
> > semantics defined by the producer. On the other hand, it does make the
> > producer implementation easier. It also makes data exchange within the
> > same system more efficient if that system's temporal type semantics
> > differ from what's defined in Arrow's spec.
> >
> > Either way, I think it would be good if we can be consistent on our
> > temporal type semantics in the spec. If we are making the claim that
> > leap seconds should not be taken into account for Time, Timestamp and
> > Date types, then it seems natural to make this claim for Interval type
> > as well. Alternatively, we could update the spec to make all temporal
> > types leap-second agnostic.
> >
> > On Mon, Sep 13, 2021 at 12:03 PM Weston Pace  wrote:
> >>
> >> One could define a sorting based on 30 days months, 365 day years, and
> >> 24 hour days.  It would be consistent but can lead to some surprising
> >> results.  It appears that this is what postgres does as I got the
> >> following ordering for an interval:
> >>
> >> 359 days, 12 months, 360 days, 1 year, 365 days, 366 days
> >>
> >> On the other hand, Joda time forbids comparison of periods (their
> >> version of what we call an interval) and offers three ways to convert
> >> to a duration.  There is toDurationFrom(instant),
> >> toDurationTo(instant) which give durations from specific calendar
> >> ranges and then there is toStandardDuration() which converts to a
> >> duration based on 24 hour days.  However, toStandardDuration will
> >> still fail if the period has >0 months or years (presumably because
> >> months and years are too inconsistent).
> >>
> >> I'm not sure though that this is something that Arrow needs to define.
> >> We aren't specifying any invalid ranges of values.  I don't foresee
> >> any interoperability concerns.  A system that treated intervals as
> >> comparable (and didn't factor in DST, leap years, etc.) will read and
> >> write intervals the same way as a system that considers intervals
> >> incompa

Re: [DISCUSS] Leap seconds/days and day light saving for Duration types

2021-09-13 Thread QP Hou
Good point Weston. My proposal was written with the impression that
Arrow does want to define semantics for some of these temporal types
based on the existing comments in the Schema.fbs file.

For example, here is a quote taken from the comments for the Time type:

/// This definition doesn't allow for leap seconds. Time values from
/// measurements with leap seconds will need to be corrected when ingesting
/// into Arrow (for example by replacing the value 86400 with 86399).

Here is another quote for the Date type:

/// * Milliseconds (64 bits) indicating UNIX time elapsed since the epoch (no
/// leap seconds), where the values are evenly divisible by 86400000

For the interval type, we have:

// A "calendar" interval which models types that don't necessarily
// have a precise duration without the context of a base timestamp (e.g.
// days can differ in length during day light savings time transitions).

I think pushing the responsibility to define these semantics to the
data producer side is also a perfectly fine design with its own
trade-offs. It would make data exchange between two different systems
a little bit harder because consumers need to be aware of the
semantics defined by the producer. On the other hand, it does make the
producer implementation easier. It also makes data exchange within the
same system more efficient if that system's temporal type semantics
differ from what's defined in Arrow's spec.

Either way, I think it would be good if we can be consistent on our
temporal type semantics in the spec. If we are making the claim that
leap seconds should not be taken into account for Time, Timestamp and
Date types, then it seems natural to make this claim for Interval type
as well. Alternatively, we could update the spec to make all temporal
types leap-second agnostic.
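
To make the quoted Schema.fbs rules concrete, here is a minimal Python sketch of what a producer might do when ingesting values. The helper names `clamp_leap_second` and `is_valid_date_ms` are made up for illustration and are not part of any Arrow implementation; the constants follow the Time comment (86400 replaced with 86399) and the Date comment (milliseconds evenly divisible by 86400000).

```python
SECONDS_PER_DAY = 86_400
MS_PER_DAY = 86_400_000


def clamp_leap_second(seconds_of_day: int) -> int:
    """Map a leap second (23:59:60, i.e. 86400) onto 86399, as the Time comment suggests."""
    if seconds_of_day == SECONDS_PER_DAY:
        return SECONDS_PER_DAY - 1
    if not 0 <= seconds_of_day < SECONDS_PER_DAY:
        raise ValueError(f"not a valid time of day: {seconds_of_day}")
    return seconds_of_day


def is_valid_date_ms(ms_since_epoch: int) -> bool:
    """A millisecond Date value must fall exactly on a day boundary."""
    return ms_since_epoch % MS_PER_DAY == 0
```

A leap-second agnostic spec would drop the clamping step and leave the interpretation of 86400 to the producer.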

On Mon, Sep 13, 2021 at 12:03 PM Weston Pace  wrote:
>
> One could define a sorting based on 30 days months, 365 day years, and
> 24 hour days.  It would be consistent but can lead to some surprising
> results.  It appears that this is what postgres does as I got the
> following ordering for an interval:
>
> 359 days, 12 months, 360 days, 1 year, 365 days, 366 days
>
> On the other hand, Joda time forbids comparison of periods (their
> version of what we call an interval) and offers three ways to convert
> to a duration.  There is toDurationFrom(instant),
> toDurationTo(instant) which give durations from specific calendar
> ranges and then there is toStandardDuration() which converts to a
> duration based on 24 hour days.  However, toStandardDuration will
> still fail if the period has >0 months or years (presumably because
> months and years are too inconsistent).
>
> I'm not sure though that this is something that Arrow needs to define.
> We aren't specifying any invalid ranges of values.  I don't foresee
> any interoperability concerns.  A system that treated intervals as
> comparable (and didn't factor in DST, leap years, etc.) will read and
> write intervals the same way as a system that considers intervals
> incomparable.
>
> This question seems to fall into the "compute" space inhabited by
> topics like "is 'false && null' a false value or a null value" and
> "should addition overflow or throw an exception".
>
> On Mon, Sep 13, 2021 at 6:23 AM QP Hou  wrote:
> >
> > On Mon, Sep 13, 2021 at 6:18 AM Antoine Pitrou  wrote:
> > > The Duration type is defined with a TimeUnit.  You are probably thinking
> > > about the Interval type.
> > >
> >
> > Oops, my bad, yes, it should be Interval type not Duration.
> >
> > > Ok.  How about daylight savings? I suppose they are taken into account
> > > as well.
> > >
> >
> > Yes, the day components in both DAY_TIME and MONTH_DAY_NANO take
> > daylight saving into account.


Re: [DISCUSS] Leap seconds/days and day light saving for Duration types

2021-09-13 Thread QP Hou
On Mon, Sep 13, 2021 at 6:18 AM Antoine Pitrou  wrote:
> The Duration type is defined with a TimeUnit.  You are probably thinking
> about the Interval type.
>

Oops, my bad, yes, it should be Interval type not Duration.

> Ok.  How about daylight savings? I suppose they are taken into account
> as well.
>

Yes, the day components in both DAY_TIME and MONTH_DAY_NANO take
daylight saving into account.


Re: [DataFusion] Question about async/await?

2021-09-12 Thread QP Hou
Hi Renjie,

If by datafusion benchmarks, you are referring to the code in the
datafusion/benches folder, then those benchmarks are executed with
tokio runtime.

You are correct that one should schedule compute bound tasks into a
separate task managed by a dedicated thread to avoid blocking the
async runtime's main thread. This practice applies not just to tokio,
but to async runtimes in general.

The tokio runtime used in the benchmark is initiated with
`tokio::runtime::Runtime::new()`. Tokio in datafusion/Cargo.toml is
pulled in with the `rt-multi-thread` feature flag. So I believe by
default it creates the runtime with a multi-thread scheduler. I don't
think it matters that much for the benchmarks though, because the
benchmark code calls `Runtime::block_on` when executing the async
query code.
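
Since the practice is not tokio-specific, here is a rough analogy in Python's asyncio. This is not DataFusion or tokio code, and `cpu_bound_sum` is a made-up workload; it only sketches the shape of handing compute-bound work to a dedicated pool while synchronous code drives the future to completion.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def cpu_bound_sum(n: int) -> int:
    """Stand-in for a compute-heavy task: sum of squares below n."""
    return sum(i * i for i in range(n))


async def run_offloaded(n: int, pool: ThreadPoolExecutor) -> int:
    # Hand the blocking call to the pool; the event loop stays free meanwhile.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(pool, cpu_bound_sum, n)


def main() -> int:
    # Dedicated pool for compute-bound work, separate from the event loop thread.
    with ThreadPoolExecutor(max_workers=2) as pool:
        # asyncio.run is loosely comparable to Runtime::block_on here: it
        # blocks the calling thread until the async work has finished.
        return asyncio.run(run_offloaded(10, pool))


if __name__ == "__main__":
    print(main())
```

In CPython the threads still share the interpreter lock, so a process pool would be needed for real parallelism; the sketch only shows how the offloading is wired.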

On Sat, Sep 11, 2021 at 7:38 PM Renjie Liu  wrote:
>
> Hi, all:
> I see that the executor trait is marked as async/await in method
> definition. I have several questions:
> 1. What async/await runtime is used in benchmarking?
> 2. Tokio is the most popular async/await runtime, and they suggest to put
> long running tasks in separate thread pool rather than using tokio runtime
> directly, and you can find this here 
>
> > If your code is CPU-bound and you wish to limit the number of threads used
> > to run it, you should run it on another thread pool such as rayon
> > .
> >
> So my second question is did you test against thread pool execution mode?
>
> It would be highly appreciated if you can answer my question.
> --
> Renjie Liu
> Software Engineer, MVAD


[DISCUSS] Leap seconds/days and day light saving for Duration types

2021-09-12 Thread QP Hou
Hi,

I would like to draw some attention to a format PR aiming to clarify
leap seconds, leap days and daylight saving handling semantics for
duration types: https://github.com/apache/arrow/pull/11138.

This came out of the effort [1] trying to implement Partial and Total
order for duration type DAY_TIME and MONTH_DAY_NANO.

In short, I am proposing we clarify the following in the spec:

* For DAY_TIME duration, similar to Time and Timestamp, we do not take
leap seconds into account. But we take daylight saving into account.
As a result, days=1,ms=86400000 does not equal days=2,ms=0.
* For MONTH_DAY_NANO, we do not take leap seconds into account. But we
take leap days into account. Whether we take leap days into account
doesn't really have a big impact here because the number of days in a
month already varies even without leap days.

A consequence of this is we will not be able to define total order for
both DAY_TIME and MONTH_DAY_NANO durations. Similar to floating point
values, we will only be able to define partial order for these two
types. This impacts downstream sorting compute kernels because we
can't simply sort these values by raw int tuples lexicographically.
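
As a minimal sketch of what a partial order could look like for DAY_TIME, the Python below compares two intervals component by component and reports "incomparable" when the components disagree. This is one conservative choice made up for illustration, not necessarily the definition used in the PR; it assumes that adding a day or a millisecond never moves a timestamp backwards. `DayTime` and `partial_cmp` are hypothetical names.

```python
from typing import NamedTuple, Optional


class DayTime(NamedTuple):
    days: int
    ms: int


def partial_cmp(a: DayTime, b: DayTime) -> Optional[int]:
    """Return -1, 0 or 1, or None when the two intervals are incomparable."""
    if a == b:
        return 0
    if a.days <= b.days and a.ms <= b.ms:
        return -1
    if a.days >= b.days and a.ms >= b.ms:
        return 1
    # Components disagree: which one is longer depends on the base timestamp.
    return None
```

With this relation, one day plus 86,400,000 ms and two days plus 0 ms come out incomparable (`None`), which is why a sort kernel cannot rely on it alone.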

Another consequence of this is normalization cannot be applied to both
types, i.e. we can't normalize days=1,ms=86400000 into days=2 or
months=1,days=30 into months=2. This could simplify downstream hash
aggregate/join compute kernels because we can just hash the raw int
tuples to generate the hash keys.
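
A small Python sketch of the hashing consequence: with no normalization, the raw (days, ms) tuple can serve directly as the grouping key. `group_by_interval` is a hypothetical helper, not an Arrow kernel.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[int, int]  # (days, ms), kept exactly as stored


def group_by_interval(rows: Iterable[Tuple[Interval, str]]) -> Dict[Interval, List[str]]:
    """Group payloads by the raw interval tuple, without normalizing it first."""
    groups: Dict[Interval, List[str]] = defaultdict(list)
    for interval, payload in rows:
        groups[interval].append(payload)
    return dict(groups)
```

Under this scheme (1, 86400000) and (2, 0) land in different groups, consistent with treating them as unequal.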

[1]: https://github.com/jorgecarleitao/arrow2/pull/398

Thanks,
QP


Re: [DISCUSS] Use conbench.ursa.dev for arrow-rs and arrow-datafusion

2021-09-12 Thread QP Hou
Thank you Diana for the quick turnaround! The trial run looks great.

You are right that `sqrt_20_12, sqrt_20_9, sqrt_22_12, and sqrt_22_14`
are just the same type of test with different parameters, so it makes
sense to batch them. We can name these benchmarks however we want to
make it easier for conbench to parse the metadata.
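
As a rough sketch of the batching idea, the Python below splits names such as `sqrt_20_12` into a base name and its numeric parameters and groups them by base name. The helper names are made up, and the meaning of each parameter is not encoded in the name, so it is left unlabeled here.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# A base name made of letters, followed by one or more "_<number>" parts.
_PARAM_SUFFIX = re.compile(r"([A-Za-z]+)((?:_\d+)+)")


def split_benchmark_name(name: str) -> Tuple[str, List[int]]:
    """Split 'sqrt_20_12' into ('sqrt', [20, 12]); leave other names untouched."""
    match = _PARAM_SUFFIX.fullmatch(name)
    if match is None:
        return name, []
    base, suffix = match.groups()
    return base, [int(part) for part in suffix.strip("_").split("_")]


def batch_by_base(names: List[str]) -> Dict[str, List[List[int]]]:
    """Collect the parameter lists of all benchmarks sharing a base name."""
    batches: Dict[str, List[List[int]]] = defaultdict(list)
    for name in names:
        base, params = split_benchmark_name(name)
        batches[base].append(params)
    return dict(batches)
```

Renaming the benchmarks with explicit parameter labels would remove the need for this kind of guessing.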

I am able to reproduce the error you got with arrow-rs. I have filed
https://github.com/apache/arrow-rs/issues/770 to track the build
issue. In the meantime, you can work around it by running the `cargo
bench` command within the arrow folder to avoid building benchmarks
from the parquet crate.

> - We probably want to add some additional context, like the
> arrow-rs/arrow-datafusion version, rust version, any compiler flags,
> etc.

I agree. The Rust version can be obtained through `rustc --version`.
We are not setting any special compiler flags at the moment, but they
would definitely be useful to include if we add any in the future.
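
A minimal Python sketch of capturing that context, assuming `rustc` is on the PATH. The function names and dictionary keys are made up for illustration and are not Conbench fields; the parser expects the usual `rustc <version> (<commit> <date>)` output line.

```python
import re
import subprocess
from typing import Dict

_RUSTC_LINE = re.compile(r"rustc (\S+)(?: \((\w+) (\d{4}-\d{2}-\d{2})\))?")


def parse_rustc_version(line: str) -> Dict[str, str]:
    """Parse one line of `rustc --version` output into a small context dict."""
    match = _RUSTC_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"unexpected rustc output: {line!r}")
    version, commit, date = match.groups()
    return {
        "rust_version": version,
        "rust_commit": commit or "",
        "rust_commit_date": date or "",
    }


def collect_rust_context() -> Dict[str, str]:
    """Run `rustc --version` and parse its output (requires a Rust toolchain)."""
    result = subprocess.run(
        ["rustc", "--version"], check=True, capture_output=True, text=True
    )
    return parse_rustc_version(result.stdout)
```

Compiler flags could be added to the same dictionary later if the projects start setting any.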

Please don't hesitate to reach out if there is anything we can do to
help unblock you. For synchronous communication, most of the Rust
developers are active in the Apache slack workspace's #arrow-rust
channel.

On Sun, Sep 12, 2021 at 12:11 PM Diana Clarke
 wrote:
>
> Thanks for those great instructions, QP!
>
> I've spiked adding the arrow-rs and arrow-datafusion benchmark results
> to the Arrow Conbench server.
>
> https://github.com/ursacomputing/benchmarks/pull/79/files
>
> And I've published one example run for arrow-datafusion:
>
>  - Example run: 
> https://conbench.ursa.dev/runs/62098d2f86314339a696f8c48c4ce2e7/
>  - Example benchmark from run:
> https://conbench.ursa.dev/benchmarks/aa841cdf78ef4e38832e075b8485ec59/
>
> Some observations along the way:
>
>  - Criterion results are in nanoseconds, but the smallest unit
> Conbench currently speaks is seconds (because Conbench was initially
> for macro not micro benchmarking). I suspect most places in Conbench
> would work just fine if nanoseconds were passed in, but I need to
> audit the code for any places that assume seconds if it isn't a
> throughput benchmark.
>
> - If the Criterion benchmarks were named better, I could tag them
> better in Conbench. For example, I suspect sqrt_20_12, sqrt_20_9,
> sqrt_22_12, and sqrt_22_14 are parameterized variations of the same
> benchmark, and if they were named something like "sqrt, foo=20,
> bar=12", I could batch them together & tag their parameters so that
> Conbench would automatically graph them in relation to each other. I
> was sort of able to do this with the following benchmarks (because
> there was a machine readable pattern). Anyhoo, that's easy enough to
> do down the road as a last integration step, and it does appear from
> the Criterion docs that they have their own recommendations for how to
> do this.
>
> - window partition by, u64_narrow, aggregate functions
> - window partition by, u64_narrow, built-in functions
> - window partition by, u64_wide, aggregate functions
> - window partition by, u64_wide, built-in functions
>
> - While Criterion benchmarks can also measure throughput in some
> cases, all the arrow-datafusion benchmarks were in elapsed time (not
> sure about the arrow-rs benchmarks), so I didn't bother writing code
> to support potential throughput results from
> arrow-datafusion/arrow-rs, but we may need to revisit that.
>
> - We probably want to add some additional context, like the
> arrow-rs/arrow-datafusion version, rust version, any compiler flags,
> etc.
>
> PS. I wasn't able to get the arrow-rs benchmarks to run, but it sounds
> like they are very similar to arrow-datafusion benchmarks. I don't
> know Rust, so I may reach out on zulip (or wherever Arrow Rust folks
> talk) for help building arrow-rs.
>
> $ cargo bench
> unresolved imports `parquet::util::DataPageBuilder`,
> `parquet::util::DataPageBuilderImpl`,
> `parquet::util::InMemoryPageIterator`
>
> There are still the orchestration steps that need to be worked on, but
> all-in-all this seems very doable. I just need to negotiate some time
> with my day-job.
>
> Cheers,
>
> --diana
>
> On Sat, Sep 11, 2021 at 4:35 PM QP Hou  wrote:
> >
> > Thanks a lot, Diana, for offering to help. Please see my replies inline below.
> >
> > On Sat, Sep 11, 2021 at 8:37 AM Diana Clarke
> >  wrote:
> > > If you point me to the existing benchmarks for each project and
> > > instructions on how to execute them, I can let you know the easiest
> > > integration path.
> > >
> >
> > For arrow-datafusion, you just need to install the rust toolchain
> > using `rustup` [1], then run the `cargo bench` command within the
> > project root. Ben

Re: [DISCUSS][Julia] How to restart at apache/arrow-julia?

2021-09-11 Thread QP Hou
Just one minor point to confirm and clarify. It looks like Julia Arrow only
wants to do on-demand minor and patch releases. Major version releases still
need to be aligned with the main Arrow release schedule, is that correct?
In other words, breaking changes should be avoided in on-demand releases
(assuming they are using semantic versioning).

From the original Julia donation thread, I got the impression that the
julia maintainers wanted to have their own versioning scheme. Maybe that’s
not the case anymore. So I wanted to make sure we set the right expectation
for Julia maintainers.

FWIW, arrow-rs today aligns the major version with the main Arrow release,
so Andrew spends quite a bit of time maintaining an active release branch to
backport backwards-compatible commits for minor and patch releases.
DataFusion and Ballista, on the other hand, have a versioning scheme that’s
fully decoupled from the main Arrow version, including the major version.

On Thu, Sep 9, 2021 at 1:38 PM Sutou Kouhei  wrote:

> Hi,
>
> Thanks for all comments about release schedule.
>
> Let's use release-on-demand approach based on
> arrow-datafusion's flow for the Julia Arrow implementation.
>
> Do we have more items to be discussed? Can we start voting?
>
>
> Thanks,
> --
> kou
>
> In 
>   "Re: [DISCUSS][Julia] How to restart at apache/arrow-julia?" on Thu, 9
> Sep 2021 09:48:57 -0400,
>   Andrew Lamb  wrote:
>
> > I also think release on demand is a good strategy.
> >
> > The primary reasons to do an arrow-rs release every 2 weeks were:
> > 1. To have predictable cadence into downstream projects (e.g. datafusion
> > and others)
> > 2. Amortize the overhead associated with each release (the process is non
> > trivial and the current 72 hour voting window adds some backpressure as
> > well -- I remember Wes may have said windows shorter than 72 hours
> > might be fine too)
> >
> >
> > On Wed, Sep 8, 2021 at 12:19 AM QP Hou  wrote:
> >
> >> A minor note on the Rust side of things. arrow-rs has a 2-week
> >> release cycle, but arrow-datafusion mostly does release on demand at
> >> the moment. Our most up-to-date release processes are documented at [1]
> >> and [2].
> >>
> >> [1]: https://github.com/apache/arrow-rs/blob/master/dev/release/README.md
> >> [2]: https://github.com/apache/arrow-datafusion/blob/master/dev/release/README.md
> >>
> >> On Tue, Sep 7, 2021 at 4:01 PM Jacob Quinn wrote:
> >> >
> >> > Thanks kou.
> >> >
> >> > I think the TODO action list looks good.
> >> >
> >> > The one point I think could use some additional discussion is around the
> >> > release cadence: it IS desirable to be able to release more frequently than
> >> > the parent repo 3-4 month cadence. But we also haven't had the frequency of
> >> > commits to necessarily warrant a release every 2 weeks. I can think of two
> >> > possible options, not sure if one or the other would be more compatible
> >> > with the apache release process:
> >> >
> >> > 1) Allow for release-on-demand; this is idiomatic for most Julia packages
> >> > I'm aware of. When a particular bug is fixed, or feature added, a user can
> >> > request a release, a little discussion happens, and a new release is made.
> >> > This approach would work well for the "bursty" kind of contributions we've
> >> > seen to Arrow.jl where development by certain people will happen frequently
> >> > for a while, then take a break for other things. This also avoids having
> >> > "scheduled" releases (every 2 weeks, 3 months, etc.) where there hasn't
> >> > been significant updates to necessarily warrant a new release. This
> >> > approach may also facilitate differentiating between bugfix (patch)
> >> > releases vs. new functionality releases (minor), since when a release is
> >> > requested, it could be specified whether it should be patch or minor (or
> >> > major).
> >> >
> >> > 2) Commit to a scheduled release pattern like every 2 weeks, once a month,
> >> > etc. This has the advantage of consistency and clearer expectations for
> >> > users/devs involved. A release also doesn't need to be requested, because
> >> > we can just wait for the scheduled time to release. In terms of the
> >> > "unnecessary r

Re: [DISCUSS] Use conbench.ursa.dev for arrow-rs and arrow-datafusion

2021-09-11 Thread QP Hou
Thanks a lot, Diana, for offering to help. Please see my replies inline below.

On Sat, Sep 11, 2021 at 8:37 AM Diana Clarke
 wrote:
> If you point me to the existing benchmarks for each project and
> instructions on how to execute them, I can let you know the easiest
> integration path.
>

For arrow-datafusion, you just need to install the rust toolchain
using `rustup` [1], then run the `cargo bench` command within the
project root. Benchmark results will be saved under the
`/target/criterion/BENCH_NAME/new` folder as a raw.csv file. You can
read more about the convention at
https://bheisler.github.io/criterion.rs/book/user_guide/csv_output.html.
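
As a starting point for consuming those files, here is a minimal Python sketch that averages the per-iteration time from the rows of a raw.csv file and converts it to seconds. The column names (`sample_measured_value`, `unit`, `iteration_count`) are my understanding of the Criterion CSV layout and should be checked against the page linked above; `mean_seconds_per_iteration` is a made-up helper.

```python
import csv
from statistics import mean
from typing import Iterable

NANOS_PER_SECOND = 1_000_000_000


def mean_seconds_per_iteration(raw_csv_lines: Iterable[str]) -> float:
    """Average time per iteration, in seconds, over all samples in a raw.csv."""
    per_iteration_ns = []
    for row in csv.DictReader(raw_csv_lines):
        if row["unit"] != "ns":
            raise ValueError(f"unexpected unit: {row['unit']}")
        # Each sample records the total time for `iteration_count` iterations.
        per_iteration_ns.append(
            float(row["sample_measured_value"]) / int(row["iteration_count"])
        )
    return mean(per_iteration_ns) / NANOS_PER_SECOND
```

An open file object can be passed directly, for example `open("target/criterion/BENCH_NAME/new/raw.csv", newline="")`, with BENCH_NAME replaced by the benchmark's folder.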

For arrow-rs, it's the exact same setup.

We have some extra TPCH integration benchmarks in datafusion, but I
think we can work on integrating them later. Getting the basic
criterion benchmarks into conbench would already be a huge win for us.

[1]: https://rustup.rs

> If the arrow-rs benchmarks are executable from command line and return
> parsable results (like json), it should be pretty easy to publish the
> results.
>

By default, results are saved as csv files, but you can pass in a
`--message-format=json` argument to save the results as JSON files
instead.

> - The arrow-rs and arrow-datafusion GitHub repositories must use
> squash merges (or Conbench would have to be extended to understand the
> 2 other GitHub merge methods).

Yes, we are using squash merge for both repos.

> - I'm not sure what the security implications are with respect to
> adding our ursabot integration and buildkite hooks to other
> repositories.
>

Are you concerned about security from ursabot and buildkite's point of
view? If so, who should we reach out to in order to discuss this matter?

Thanks,
QP


[DISCUSS] Use conbench.ursa.dev for arrow-rs and arrow-datafusion

2021-09-10 Thread QP Hou
Hi,

I think conbench.ursa.dev works really well for the main arrow repo,
especially the ability to request on demand benchmarks during PR
reviews by mentioning ursabot in a comment.

I am wondering if it is something that arrow-rs and arrow-datafusion
could leverage as well to help speed up PR reviews and guard against
performance degradation? If so, what's the process for setting it up?

Thanks,
QP


Re: [ANNOUNCE] New Arrow committer: Nic Crane

2021-09-09 Thread QP Hou
Congrats Nic!

On Thu, Sep 9, 2021 at 8:47 AM Neal Richardson
 wrote:
>
> On behalf of the Apache Arrow PMC, I'm happy to announce that Nic Crane
> has accepted an invitation to become a committer on Apache Arrow.
>
> Welcome and thank you for your contributions!
>
> Neal


Re: [DISCUSS][Julia] How to restart at apache/arrow-julia?

2021-09-07 Thread QP Hou
A minor note on the Rust side of things. arrow-rs has a 2-week
release cycle, but arrow-datafusion mostly does release on demand at
the moment. Our most up-to-date release processes are documented at [1]
and [2].

[1]: https://github.com/apache/arrow-rs/blob/master/dev/release/README.md
[2]: https://github.com/apache/arrow-datafusion/blob/master/dev/release/README.md

On Tue, Sep 7, 2021 at 4:01 PM Jacob Quinn  wrote:
>
> Thanks kou.
>
> I think the TODO action list looks good.
>
> The one point I think could use some additional discussion is around the
> release cadence: it IS desirable to be able to release more frequently than
> the parent repo 3-4 month cadence. But we also haven't had the frequency of
> commits to necessarily warrant a release every 2 weeks. I can think of two
> possible options, not sure if one or the other would be more compatible
> with the apache release process:
>
> 1) Allow for release-on-demand; this is idiomatic for most Julia packages
> I'm aware of. When a particular bug is fixed, or feature added, a user can
> request a release, a little discussion happens, and a new release is made.
> This approach would work well for the "bursty" kind of contributions we've
> seen to Arrow.jl where development by certain people will happen frequently
> for a while, then take a break for other things. This also avoids having
> "scheduled" releases (every 2 weeks, 3 months, etc.) where there hasn't
> been significant updates to necessarily warrant a new release. This
> approach may also facilitate differentiating between bugfix (patch)
> releases vs. new functionality releases (minor), since when a release is
> requested, it could be specified whether it should be patch or minor (or
> major).
>
> 2) Commit to a scheduled release pattern like every 2 weeks, once a month,
> etc. This has the advantage of consistency and clearer expectations for
> users/devs involved. A release also doesn't need to be requested, because
> we can just wait for the scheduled time to release. In terms of the
> "unnecessary releases" mentioned above, it could be as simple as
> "cancelling" a release if there hasn't been significant updates in the
> elapsed time period.
>
> My preference would be for 1), but that's influenced from what I'm familiar
> with in the Julia package ecosystem. It seems like it would still fit in
> the apache way since we would formally request a new release, wait the
> elapsed amount of time for voting (24 hours would be preferable), then at
> the end of the voting period, a new release could be made.
>
> Thanks again kou for helping support the Julia implementation here.
>
> -Jacob
>
> 2)
>
> On Sun, Sep 5, 2021 at 3:25 PM Sutou Kouhei  wrote:
>
> > Hi,
> >
> > Sorry for the delay. This is a continuation of the "Status
> > of Arrow Julia implementation?" thread:
> >
> >
> > https://lists.apache.org/x/thread.html/r6d91286686d92837fbe21dd042801a57e3a7b00b5903ea90a754ac7b%40%3Cdev.arrow.apache.org%3E
> >
> > I summarize the current status, the next actions and items
> > to be discussed.
> >
> > The current status:
> >
> >   * The Julia Arrow implementation uses
> > https://github.com/JuliaData/Arrow.jl as a "dev branch"
> > instead of creating a branch in
> > https://github.com/apache/arrow
> >   * The Julia Arrow implementation wants to use GitHub
> > for the main issue management platform
> >   * The Julia Arrow implementation wants to release
> > more frequently than 1 release per 3-4 months
> >   * The current workflow of the Rust Arrow implementation
> > will also fit the Julia Arrow implementation
> >
> > The current workflow of the Rust Arrow implementation:
> >
> >
> > https://docs.google.com/document/d/1TyrUP8_UWXqk97a8Hvb1d0UYWigch0HAephIjW7soSI/edit#heading=h.kv1hwbhi3cmi
> >
> > * Uses apache/arrow-rs and apache/arrow-datafusion instead
> >   of apache/arrow for repository
> >
> > * Uses GitHub instead of JIRA for issue management
> >   platform
> >
> >
> > https://docs.google.com/document/d/1tMQ67iu8XyGGZuj--h9WQYB9inCk6c2sL_4xMTwENGc/edit
> >
> > * Releases a new minor and patch version every 2 weeks
> >   in addition to the quarterly release of the other releases
> >
> > The next actions after we get a consensus about this
> > discussion:
> >
> >   1. Start voting the Julia Arrow implementation move like
> >  the Rust's one:
> >
> >
> > https://lists.apache.org/x/thread.html/r44390a18b3fbb08ddb68aa4d12f37245d948984fae11a41494e5fc1d@%3Cdev.arrow.apache.org%3E
> >
> >   2. Create apache/arrow-julia
> >
> >   3. Start IP clearance process to import JuliaData/Arrow.jl
> >  to apache/arrow-julia
> >
> >  (We don't use julia/Arrow/ in apache/arrow.)
> >
> >   4. Import JuliaData/Arrow.jl to apache/arrow-julia
> >
> >   5. Prepare integration tests CI in apache/arrow-julia and apache/arrow
> >
> >   6. Prepare releasing tools in apache/arrow-julia and apache/arrow
> >
> >   7. Remove julia/... from apache/arrow and leave
> 

Re: RESULT Re: [VOTE][RUST][Datafusion] Release Apache Arrow Datafusion 5.0.0 RC3

2021-08-15 Thread QP Hou
Thank you Wes, I will keep that in mind for the next release. I also
noticed I was adding the result prefix incorrectly while I was going
through older results threads yesterday. I have since added a calling
votes section in our release procedure documentation to reflect this.

On Sun, Aug 15, 2021 at 1:46 AM Wes McKinney  wrote:
>
> A couple of nitpicks with the result thread:
>
> * I counted 4, not 3 binding votes
> * the usual result subject line starts with “[RESULT][VOTE]” which makes
> vote results easier to locate when searching.
>
> Thank you for being the release manager!
>
> On Sat, Aug 14, 2021 at 9:56 PM QP Hou  wrote:
>
> > The vote has passed with three +1 votes. Thank you to all who helped
> > with the release verification.
> >
> > We have published new versions of datafusion and ballista to crates.io as
> > well:
> >
> > https://crates.io/crates/datafusion/5.0.0
> > https://crates.io/crates/ballista/0.5.0
> > https://crates.io/crates/ballista-core/0.5.0
> > https://crates.io/crates/ballista-executor/0.5.0
> > https://crates.io/crates/ballista-scheduler/0.5.0
> >
> > We didn't release the python binding to PyPI this time because it
> > requires a binary release, which was not included in this vote. We are
> > working on improving the process to include the python binary in the
> > release process.
> >
> > Thanks,
> > QP
> >
> > On Fri, Aug 13, 2021 at 2:34 PM Neville Dipale wrote:
> > >
> > > +1 (binding)
> > >
> > > i verified the RC on aarch64-macos
> > >
> > > On Fri, 13 Aug 2021 at 23:29, Jorge Cardoso Leitão <jorgecarlei...@gmail.com> wrote:
> > >
> > > > +1
> > > >
> > > > Great work everyone!
> > > >
> > > >
> > > > On Fri, Aug 13, 2021, 22:19 Daniël Heres wrote:
> > > >
> > > > > +1 (non binding). Looking good.
> > > > >
> > > > >
> > > > > On Fri, Aug 13, 2021, 07:49 QP Hou  wrote:
> > > > >
> > > > > > Good call Ruihang. I remember we used to have this toolchain file
> > > > > > when we were still in the main arrow repo. I will take a look into
> > > > > > that.
> > > > > >
> > > > > > On Wed, Aug 11, 2021 at 5:36 PM Wayne Xia wrote:
> > > > > > >
> > > > > > > Hi QP,
> > > > > > >
> > > > > > > When running this script I noticed that this might be because I
> > > > > > > was not using a stable toolchain when testing.
> > > > > > > Those failures occur with nightly (which is my default toolchain).
> > > > > > > And everything works fine after switching to stable 1.54.
> > > > > > > So I think it's ok from my side to vote +1.
> > > > > > >
> > > > > > > BTW, I think we can add a toolchain file [1] to datafusion repo.
> > > > > > >
> > > > > > > [1]: https://rust-lang.github.io/rustup/overrides.html#the-toolchain-file
> > > > > > >
> > > > > > > On Thu, Aug 12, 2021 at 2:14 AM QP Hou wrote:
> > > > > > >
> > > > > > > > Hi Ruihang,
> > > > > > > >
> > > > > > > > Thanks for helping with the validation. It would certainly be
> > > > > > > > helpful if you could share the error log with me.
> > > > > > > >
> > > > > > > > I have also prepared an updated version of the verification
> > > > > > > > script at
> > > > > > > > https://github.com/houqp/arrow-datafusion/blob/qp_release/dev/release/verify-release-candidate.sh
> > > > > > > > .
> > > > > > > > This script does a clean checkout of everything before running
> > > > > > > > tests and linting tools. Could you give that a try to see if you
> > > > > > > > are getting the same results?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > QP
> > > > > 

RESULT Re: [VOTE][RUST][Datafusion] Release Apache Arrow Datafusion 5.0.0 RC3

2021-08-14 Thread QP Hou
The vote has passed with three +1 votes. Thank you to all who helped
with the release verification.

We have published new versions of datafusion and ballista to crates.io as well:

https://crates.io/crates/datafusion/5.0.0
https://crates.io/crates/ballista/0.5.0
https://crates.io/crates/ballista-core/0.5.0
https://crates.io/crates/ballista-executor/0.5.0
https://crates.io/crates/ballista-scheduler/0.5.0

We didn't release the python binding to PyPI this time because it
requires a binary release, which was not included in this vote. We are
working on improving the process to include the python binary in the
release process.

Thanks,
QP

On Fri, Aug 13, 2021 at 2:34 PM Neville Dipale  wrote:
>
> +1 (binding)
>
> i verified the RC on aarch64-macos
>
> On Fri, 13 Aug 2021 at 23:29, Jorge Cardoso Leitão wrote:
>
> > +1
> >
> > Great work everyone!
> >
> >
> > On Fri, Aug 13, 2021, 22:19 Daniël Heres  wrote:
> >
> > > +1 (non binding). Looking good.
> > >
> > >
> > > On Fri, Aug 13, 2021, 07:49 QP Hou  wrote:
> > >
> > > > Good call Ruihang. I remember we used to have this toolchain file when
> > > > we were still in the main arrow repo. I will take a look into that.
> > > >
> > > > On Wed, Aug 11, 2021 at 5:36 PM Wayne Xia wrote:
> > > > >
> > > > > Hi QP,
> > > > >
> > > > > When running this script I noticed that this might be because I was
> > > > > not using a stable toolchain when testing.
> > > > > Those failures occur with nightly (which is my default toolchain).
> > > > > And everything works fine after switching to stable 1.54.
> > > > >
> > > > > BTW, I think we can add a toolchain file [1] to datafusion repo.
> > > > >
> > > > > [1]:
> > > > https://rust-lang.github.io/rustup/overrides.html#the-toolchain-file
> > > > >
> > > > > On Thu, Aug 12, 2021 at 2:14 AM QP Hou  wrote:
> > > > >
> > > > > > Hi Ruihang,
> > > > > >
> > > > > > Thanks for helping with the validation. It would certainly be
> > helpful
> > > > > > if you could share the error log with me.
> > > > > >
> > > > > > I have also prepared an updated version of the verification script
> > at
> > > > > >
> > > > > >
> > > >
> > >
> > https://github.com/houqp/arrow-datafusion/blob/qp_release/dev/release/verify-release-candidate.sh
> > > > > > .
> > > > > > This script does a clean checkout of everything before running
> > tests
> > > > > > and linting tools. Could you give that a try to see if you are
> > > getting
> > > > > > the same results?
> > > > > >
> > > > > > Thanks,
> > > > > > QP
> > > > > >
> > > > > > On Wed, Aug 11, 2021 at 6:38 AM Wayne Xia 
> > > > wrote:
> > > > > > >
> > > > > > > Thanks, QP!
> > > > > > >
> > > > > > > I verified the signature and checked shasum, but got 3 failed
> > > cases
> > > > while
> > > > > > > testing:
> > > > > > >
> > > > > > > - execution_plans::shuffle_writer::tests::test
> > > > > > > - execution_plans::shuffle_writer::tests::test_partitioned
> > > > > > > -
> > > > > >
> > > >
> > >
> > physical_plan::repartition::tests::repartition_with_dropping_output_stream
> > > > > > >
> > > > > > > > I set up env `ARROW_TEST_DATA` and `PARQUET_TEST_DATA`, then ran
> > > the
> > > > test
> > > > > > > with
> > > > > > > "cargo test --all --no-fail-fast" on Linux 5.13.6 with x86_64
> > chip.
> > > > > > >
> > > > > > > Did I miss something? I can paste the log here or file an issue
> > if
> > > > > > needed.
> > > > > > >
> > > > > > > Ruihang
> > > > > > >
> > > > > > > QP Hou :
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >

Re: [VOTE][RUST][Datafusion] Release Apache Arrow Datafusion 5.0.0 RC3

2021-08-12 Thread QP Hou
Good call Ruihang. I remember we used to have this toolchain file when
we were still in the main arrow repo. I will take a look into that.

On Wed, Aug 11, 2021 at 5:36 PM Wayne Xia  wrote:
>
> Hi QP,
>
> When running this script I noticed that this might be because I was not
> using a stable toolchain when testing.
> Those failures occur with nightly (which is my default toolchain). And
> everything works fine after switching to stable 1.54.
> So I think it's ok from my side to vote +1.
>
> BTW, I think we can add a toolchain file [1] to datafusion repo.
>
> [1]: https://rust-lang.github.io/rustup/overrides.html#the-toolchain-file
>
> On Thu, Aug 12, 2021 at 2:14 AM QP Hou  wrote:
>
> > Hi Ruihang,
> >
> > Thanks for helping with the validation. It would certainly be helpful
> > if you could share the error log with me.
> >
> > I have also prepared an updated version of the verification script at
> >
> > https://github.com/houqp/arrow-datafusion/blob/qp_release/dev/release/verify-release-candidate.sh
> > .
> > This script does a clean checkout of everything before running tests
> > and linting tools. Could you give that a try to see if you are getting
> > the same results?
> >
> > Thanks,
> > QP
> >
> > On Wed, Aug 11, 2021 at 6:38 AM Wayne Xia  wrote:
> > >
> > > Thanks, QP!
> > >
> > > I verified the signature and checked shasum, but got 3 failed cases while
> > > testing:
> > >
> > > - execution_plans::shuffle_writer::tests::test
> > > - execution_plans::shuffle_writer::tests::test_partitioned
> > > -
> > physical_plan::repartition::tests::repartition_with_dropping_output_stream
> > >
> > > I set up env `ARROW_TEST_DATA` and `PARQUET_TEST_DATA`, then ran the test
> > > with
> > > "cargo test --all --no-fail-fast" on Linux 5.13.6 with x86_64 chip.
> > >
> > > Did I miss something? I can paste the log here or file an issue if
> > needed.
> > >
> > > Ruihang
> > >
> > > QP Hou :
> > >
> > > > Hi,
> > > >
> > > > I would like to propose a release of Apache Arrow Datafusion
> > > > Implementation,
> > > > version 5.0.0.
> > > >
> > > > RC3 fixed a cargo publish issue discovered in RC1.
> > > >
> > > > This release candidate is based on commit:
> > > > deb929369c9aaba728ae0c2c49dcd05bfecc8bf8 [1]
> > > > The proposed release tarball and signatures are hosted at [2].
> > > > The changelog is located at [3].
> > > >
> > > > Please download, verify checksums and signatures, run the unit tests,
> > and
> > > > vote
> > > > on the release. The vote will be open for at least 72 hours.
> > > >
> > > > [ ] +1 Release this as Apache Arrow Datafusion 5.0.0
> > > > [ ] +0
> > > > [ ] -1 Do not release this as Apache Arrow Datafusion 5.0.0 because...
> > > >
> > > > [1]:
> > > >
> > https://github.com/apache/arrow-datafusion/tree/deb929369c9aaba728ae0c2c49dcd05bfecc8bf8
> > > > [2]:
> > > >
> > https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-5.0.0-rc3
> > > > [3]:
> > > >
> > https://github.com/apache/arrow-datafusion/blob/deb929369c9aaba728ae0c2c49dcd05bfecc8bf8/CHANGELOG.md
> > > >
> > > > Thanks,
> > > > QP
> > > >
> >


Re: [VOTE][RUST] Release Apache Arrow Rust 5.2.0 RC1

2021-08-12 Thread QP Hou
+1 (non-binding)

ran the verification script on Linux 5.4.0 x86_64

On Thu, Aug 12, 2021 at 12:44 PM Andrew Lamb  wrote:
>
> Hi,
>
> I would like to propose a release of Apache Arrow Rust Implementation,
> version 5.2.0.
>
> This release candidate is based on commit:
> 7c98c4c60bc776acd09bd3568c6630d360e8d652 [1]
>
> The proposed release tarball and signatures are hosted at [2].
>
> The changelog is located at [3].
>
> Please download, verify checksums and signatures, run the unit tests,
> and vote on the release. There is a script [4] that automates some of
> the verification.
>
> The vote will be open for at least 72 hours.
>
> [ ] +1 Release this as Apache Arrow Rust
> [ ] +0
> [ ] -1 Do not release this as Apache Arrow Rust  because...
>
> [1]:
> https://github.com/apache/arrow-rs/tree/7c98c4c60bc776acd09bd3568c6630d360e8d652
> [2]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-rs-5.2.0-rc1
> [3]:
> https://github.com/apache/arrow-rs/blob/7c98c4c60bc776acd09bd3568c6630d360e8d652/CHANGELOG.md
> [4]:
> https://github.com/apache/arrow-rs/blob/master/dev/release/verify-release-candidate.sh


Re: [VOTE][RUST][Datafusion] Release Apache Arrow Datafusion 5.0.0 RC3

2021-08-11 Thread QP Hou
Hi Ruihang,

Thanks for helping with the validation. It would certainly be helpful
if you could share the error log with me.

I have also prepared an updated version of the verification script at
https://github.com/houqp/arrow-datafusion/blob/qp_release/dev/release/verify-release-candidate.sh.
This script does a clean checkout of everything before running tests
and linting tools. Could you give that a try to see if you are getting
the same results?

Thanks,
QP

On Wed, Aug 11, 2021 at 6:38 AM Wayne Xia  wrote:
>
> Thanks, QP!
>
> I verified the signature and checked shasum, but got 3 failed cases while
> testing:
>
> - execution_plans::shuffle_writer::tests::test
> - execution_plans::shuffle_writer::tests::test_partitioned
> - physical_plan::repartition::tests::repartition_with_dropping_output_stream
>
> I set up env `ARROW_TEST_DATA` and `PARQUET_TEST_DATA`, then ran the test
> with
> "cargo test --all --no-fail-fast" on Linux 5.13.6 with x86_64 chip.
>
> Did I miss something? I can paste the log here or file an issue if needed.
>
> Ruihang
>
> QP Hou :
>
> > Hi,
> >
> > I would like to propose a release of Apache Arrow Datafusion
> > Implementation,
> > version 5.0.0.
> >
> > RC3 fixed a cargo publish issue discovered in RC1.
> >
> > This release candidate is based on commit:
> > deb929369c9aaba728ae0c2c49dcd05bfecc8bf8 [1]
> > The proposed release tarball and signatures are hosted at [2].
> > The changelog is located at [3].
> >
> > Please download, verify checksums and signatures, run the unit tests, and
> > vote
> > on the release. The vote will be open for at least 72 hours.
> >
> > [ ] +1 Release this as Apache Arrow Datafusion 5.0.0
> > [ ] +0
> > [ ] -1 Do not release this as Apache Arrow Datafusion 5.0.0 because...
> >
> > [1]:
> > https://github.com/apache/arrow-datafusion/tree/deb929369c9aaba728ae0c2c49dcd05bfecc8bf8
> > [2]:
> > https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-5.0.0-rc3
> > [3]:
> > https://github.com/apache/arrow-datafusion/blob/deb929369c9aaba728ae0c2c49dcd05bfecc8bf8/CHANGELOG.md
> >
> > Thanks,
> > QP
> >


[VOTE][RUST][Datafusion] Release Apache Arrow Datafusion 5.0.0 RC3

2021-08-11 Thread QP Hou
Hi,

I would like to propose a release of Apache Arrow Datafusion Implementation,
version 5.0.0.

RC3 fixed a cargo publish issue discovered in RC1.

This release candidate is based on commit:
deb929369c9aaba728ae0c2c49dcd05bfecc8bf8 [1]
The proposed release tarball and signatures are hosted at [2].
The changelog is located at [3].

Please download, verify checksums and signatures, run the unit tests, and vote
on the release. The vote will be open for at least 72 hours.
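
For reviewers new to the process, the checksum part of the verification can be
sketched in Python as below. This is only an illustration under assumptions:
the helper names `digest_hex` and `checksum_matches` are hypothetical, and it
assumes the checksum file holds a hex digest as its first whitespace-separated
token (the layout written by tools such as shasum). Signature verification with
GPG is not covered here, and the project's verification script remains the
reference.

```python
import hashlib
import io


def digest_hex(stream, algo="sha256", chunk_size=1 << 20):
    """Hash a binary file-like object in chunks and return the hex digest."""
    h = hashlib.new(algo)
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()


def checksum_matches(stream, checksum_line, algo="sha256"):
    """Compare the stream's digest with the first token of a checksum line."""
    tokens = checksum_line.split()
    if not tokens:
        # An empty checksum file can never match.
        return False
    return digest_hex(stream, algo) == tokens[0].lower()


if __name__ == "__main__":
    # Stand-in data; a real check would pass open(tarball_path, "rb") instead.
    data = b"example tarball bytes"
    line = hashlib.sha256(data).hexdigest() + "  example.tar.gz"
    print(checksum_matches(io.BytesIO(data), line))
```

Reading in chunks keeps memory use flat for large tarballs; the same function
works for a sha512 checksum by passing `algo="sha512"`.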

[ ] +1 Release this as Apache Arrow Datafusion 5.0.0
[ ] +0
[ ] -1 Do not release this as Apache Arrow Datafusion 5.0.0 because...

[1]: 
https://github.com/apache/arrow-datafusion/tree/deb929369c9aaba728ae0c2c49dcd05bfecc8bf8
[2]: 
https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-5.0.0-rc3
[3]: 
https://github.com/apache/arrow-datafusion/blob/deb929369c9aaba728ae0c2c49dcd05bfecc8bf8/CHANGELOG.md

Thanks,
QP


Re: [VOTE][RUST][Datafusion] Release Apache Arrow Datafusion 5.0.0 RC2

2021-08-10 Thread QP Hou
Looks like the commit is off, please ignore this vote while I prepare
for a new one.

On Tue, Aug 10, 2021 at 10:28 PM QP Hou  wrote:
>
> Hi,
>
> I would like to propose a release of Apache Arrow Datafusion Implementation,
> version 5.0.0.
>
> Compared to RC1, RC2 fixed a cargo publish issue for ballista crates.
>
> This release candidate is based on commit:
> 96658eb100436c47601ed10095d74299d2229020 [1]
> The proposed release tarball and signatures are hosted at [2].
> The changelog is located at [3].
>
> Please download, verify checksums and signatures, run the unit tests, and vote
> on the release. The vote will be open for at least 72 hours.
>
> [ ] +1 Release this as Apache Arrow Datafusion
> [ ] +0
> [ ] -1 Do not release this as Apache Arrow Datafusion  because...
>
> [1]: 
> https://github.com/apache/arrow-datafusion/tree/96658eb100436c47601ed10095d74299d2229020
> [2]: 
> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-5.0.0-rc2
> [3]: 
> https://github.com/apache/arrow-datafusion/blob/96658eb100436c47601ed10095d74299d2229020/CHANGELOG.md


[VOTE][RUST][Datafusion] Release Apache Arrow Datafusion 5.0.0 RC2

2021-08-10 Thread QP Hou
Hi,

I would like to propose a release of Apache Arrow Datafusion Implementation,
version 5.0.0.

Compared to RC1, RC2 fixed a cargo publish issue for ballista crates.

This release candidate is based on commit:
96658eb100436c47601ed10095d74299d2229020 [1]
The proposed release tarball and signatures are hosted at [2].
The changelog is located at [3].

Please download, verify checksums and signatures, run the unit tests, and vote
on the release. The vote will be open for at least 72 hours.

[ ] +1 Release this as Apache Arrow Datafusion
[ ] +0
[ ] -1 Do not release this as Apache Arrow Datafusion  because...

[1]: 
https://github.com/apache/arrow-datafusion/tree/96658eb100436c47601ed10095d74299d2229020
[2]: 
https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-5.0.0-rc2
[3]: 
https://github.com/apache/arrow-datafusion/blob/96658eb100436c47601ed10095d74299d2229020/CHANGELOG.md


Re: [VOTE][RUST][Datafusion] Release Apache Arrow Datafusion 5.0.0 RC1

2021-08-10 Thread QP Hou
> 4. Verified the signatures using the commands below (not sure if the
> WARNING is something we should fix)

I believe the warning is caused by my key not being signed by others
in the network, I will get that fixed.

Thank you Andy for the quick fix, I will send a voting thread for rc2
later tonight. Will also add `cargo publish --dry-run` into the
release test automation script.
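
As a complement to the dry run, the path-only dependency problem Andy found
with find/grep (quoted below) could also be caught by a small check. The
following is a rough Python sketch, not part of the actual release scripts: the
function name `path_only_dependencies` is hypothetical, and it only understands
single-line inline tables such as `name = { path = "../core" }`.

```python
import re

# Matches an inline-table dependency line: name = { key = value, ... }
DEP_LINE = re.compile(r'^\s*([A-Za-z0-9_-]+)\s*=\s*\{(.*)\}\s*$')


def path_only_dependencies(manifest_text):
    """Return names of dependencies that set `path` but no `version`."""
    offenders = []
    for line in manifest_text.splitlines():
        match = DEP_LINE.match(line)
        if not match:
            continue
        name, body = match.groups()
        # Collect the keys of the inline table (rough split, no full TOML parsing).
        keys = {part.split("=", 1)[0].strip() for part in body.split(",") if "=" in part}
        if "path" in keys and "version" not in keys:
            offenders.append(name)
    return offenders


if __name__ == "__main__":
    sample = (
        'ballista-core = { path = "../core" }\n'
        'datafusion = { path = "../../../datafusion", version = "5.0.0" }\n'
    )
    print(path_only_dependencies(sample))
```

A dependency that carries both `path` and `version` is left alone, since that
form can be published while still building against the local checkout.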

Thanks,
QP Hou

On Tue, Aug 10, 2021 at 4:13 PM Andy Grove  wrote:
>
> Here is a PR to resolve this
>
> https://github.com/apache/arrow-datafusion/pull/852
>
>
> On Tue, Aug 10, 2021 at 4:24 PM Andy Grove  wrote:
>
> > Hi QP,
> >
> > This looks great overall but I did run into one issue. The Ballista crates
> > have relative paths for dependencies rather than depending on versioned
> > crates, so we would not be able to publish them to crates.io
> >
> > *$ find ballista -name Cargo.toml -exec grep -H "path" {} \;*
> > ballista/rust/scheduler/Cargo.toml:ballista-core = { path = "../core" }
> > ballista/rust/scheduler/Cargo.toml:datafusion = { path =
> > "../../../datafusion" }
> > ballista/rust/scheduler/Cargo.toml:ballista-core = { path = "../core" }
> > ballista/rust/executor/Cargo.toml:ballista-core = { path = "../core" }
> > ballista/rust/executor/Cargo.toml:datafusion = { path =
> > "../../../datafusion" }
> > ballista/rust/core/Cargo.toml:datafusion = { path = "../../../datafusion" }
> > ballista/rust/client/Cargo.toml:ballista-core = { path = "../core" }
> > ballista/rust/client/Cargo.toml:ballista-executor = { path =
> > "../executor", optional = true }
> > ballista/rust/client/Cargo.toml:ballista-scheduler = { path =
> > "../scheduler", optional = true }
> > ballista/rust/client/Cargo.toml:datafusion = { path =
> > "../../../datafusion" }
> >
> > I assume this is relatively simple to fix. I can look into this more later
> > today.
> >
> > Thanks,
> >
> > Andy.
> >
> > On Tue, Aug 10, 2021 at 2:48 PM QP Hou  wrote:
> >
> >> Hi,
> >>
> >> I would like to propose a release of Apache Arrow Datafusion
> >> Implementation,
> >> version 5.0.0.
> >>
> >> This release candidate is based on commit:
> >> 96658eb100436c47601ed10095d74299d2229020 [1]
> >> The proposed release tarball and signatures are hosted at [2].
> >> The changelog is located at [3].
> >>
> >> Please download, verify checksums and signatures, run the unit tests, and
> >> vote
> >> on the release. The vote will be open for at least 72 hours.
> >>
> >> [ ] +1 Release this as Apache Arrow Datafusion
> >> [ ] +0
> >> [ ] -1 Do not release this as Apache Arrow Datafusion  because...
> >>
> >> [1]:
> >> https://github.com/apache/arrow-datafusion/tree/96658eb100436c47601ed10095d74299d2229020
> >> [2]:
> >> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-5.0.0-rc1
> >> [3]:
> >> https://github.com/apache/arrow-datafusion/blob/96658eb100436c47601ed10095d74299d2229020/CHANGELOG.md
> >>
> >> Thanks,
> >> QP
> >>
> >


[VOTE][RUST][Datafusion] Release Apache Arrow Datafusion 5.0.0 RC1

2021-08-10 Thread QP Hou
Hi,

I would like to propose a release of Apache Arrow Datafusion Implementation,
version 5.0.0.

This release candidate is based on commit:
96658eb100436c47601ed10095d74299d2229020 [1]
The proposed release tarball and signatures are hosted at [2].
The changelog is located at [3].

Please download, verify checksums and signatures, run the unit tests, and vote
on the release. The vote will be open for at least 72 hours.

[ ] +1 Release this as Apache Arrow Datafusion
[ ] +0
[ ] -1 Do not release this as Apache Arrow Datafusion  because...

[1]: 
https://github.com/apache/arrow-datafusion/tree/96658eb100436c47601ed10095d74299d2229020
[2]: 
https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-5.0.0-rc1
[3]: 
https://github.com/apache/arrow-datafusion/blob/96658eb100436c47601ed10095d74299d2229020/CHANGELOG.md

Thanks,
QP


Re: [Rust] Ballista status and benchmarks

2021-08-08 Thread QP Hou
I am not able to load the benchmark graph png, so reattaching it here
in case other people have the same issue.

On Sun, Aug 8, 2021 at 1:31 PM Andy Grove  wrote:
>
> I wanted to share a quick update on the status of Ballista.
>
> Ballista is now capable of running some of the TPC-H benchmark queries 
> against the 1TB data set. I documented the benchmark results for DataFusion, 
> Ballista, and Apache Spark for reference, here:
>
> https://github.com/andygrove/ballista-research/wiki/Ballista-benchmarks-results
>
> Here is a chart showing these initial results. As noted on the wiki page, I 
> didn't put too much thought into the configurations used but they are at 
> least documented. I now plan on iterating on my benchmark process to measure 
> memory usage and show scalability as executors/cores are added for each 
> solution.
> Note that I hit failures with all three solutions and this is why there are 
> columns missing from the chart.
>
> I'll try and get something more compelling written up for a blog post to 
> coincide with the upcoming release of Ballista 0.5.0 but I figured folks 
> might be interested in these informal interim results.
>
> Thanks,
>
> Andy.
>
>
>


Re: [Discuss] [Rust] Arrow2/parquet2 going foward

2021-08-04 Thread QP Hou
Just my two cents.

I think we all have the same goal here, which is to accelerate the
transitioning of arrow to arrow2 as the official arrow rust
implementation.

In my opinion, the biggest gain we can get from merging two projects
into one repo is to have some kind of a policy to enforce that every
new feature/test added to the current arrow implementation also needs
to be added to the arrow2 implementation. This way, we can make sure
the gap between arrow and arrow2 is closing on every iteration.
Without this, I tend to agree with Jorge that merging two repos would
add more overhead to his work and slow him down.

For those who want to contribute to arrow2 to accelerate the
transition, I don't think they would have a problem sending PRs to the
arrow2 repo. For those who are not interested in contributing to
arrow2, merging the arrow2 code base into the current arrow-rs repo
won't incentivize them to contribute. Merging arrow2 into current
arrow-rs repo could help with discovery. But I think this can be
achieved by adding a big note in the current arrow-rs README to
encourage contributions to the arrow2 repo as well.

At the end of the day, Jorge is currently the sole active contributor
to the arrow2 implementation, so I think he would have the most say on
what's the most productive way to push arrow2 forward. The only
concern I have with regards to merging arrow2 into arrow-rs right now
is that Jorge spends all the effort to do the merge, and it then turns
out that he is still the only active contributor to arrow2 within
arrow-rs, but with more overhead to deal with.

As for maintaining semantic versioning for arrow2, Andy had a good
point that we could still release arrow2 with its own versioning even
if we merge it into the arrow-rs repo. So I don't think we should
worry/focus too much about versioning in our discussion. Velocity to
close the gap between arrow-rs and arrow2 is the most important thing.

Lastly, I do agree with Andrew that it would be good to only maintain
a single arrow crate in crates.io in the long run. As he mentioned,
when the current arrow2 code base becomes stable, we could still
release it under the arrow namespace in crates.io with a major version
bump. The absolute value in the major version doesn't really matter as
long as we stick to the convention that a breaking change results in
a major version bump.

Thanks,
QP



On Tue, Aug 3, 2021 at 5:31 PM paddy horan  wrote:
>
> Hi Jorge,
>
> I see value in consolidating development in a single repo and releasing under 
> the existing arrow crate.  Regarding versioning, I think once we follow 
> semantic versioning we are fine.  I don't think it's worth migrating to a 
> different repo and crate to comply with the de-facto standard you mention.
>
> Just one person's opinion though,
> Paddy
>
>
> -Original Message-
> From: Jorge Cardoso Leitão 
> Sent: Tuesday, August 3, 2021 5:23 PM
> To: dev@arrow.apache.org
> Subject: Re: [Discuss] [Rust] Arrow2/parquet2 going foward
>
> Hi Paddy,
>
> > What do you think about moving Arrow2 into the main Arrow repo where
> > it
> is only enabled via an "experimental" feature flag?
>
> AFAIK this is already possible:
> * add `arrow2 = { version = "0.2.0", optional = true }` to Cargo.toml
> * add `#[cfg(feature = "arrow2")]\npub mod arrow2;\n` to lib.rs
>
> We do this kind of thing to expose APIs from non-arrow crates such as parts 
> of the parquet-format-rs crate, and is generally the way to go when a crate 
> wants to expose a third-party API.
>
> I would not recommend doing this, though: by exposing arrow2 from arrow, we 
> double the compilation time and binary size of all dependencies that activate 
> the flag. Furthermore, there are users of arrow2 that do not need the arrow 
> crate, which this model would not support.
>
> AFAIK where development happens is unrelated to this aspect, Rust enables 
> this by design.
>
> > but also this would be a clear signal that Arrow2 is <1.0.
> > the experimental flag will be a clear signal to the existing Arrow
> community that Arrow2 is the future but that it is <1.0
>
> arrow2 is already <1.0.
>  My argument is that the arrow/arrow-flight/parquet are not versioned 
> according to the Rust community standards: It is a de facto practice in Rust 
> to delay major releases until the API is stable. Tokio's blog post about 
> their 1.0 
> 

Re: [Rust][DataFusion] [DISCUSS] Next DataFusion / Ballista official release

2021-08-01 Thread QP Hou
Summarizing the discussed proposal in our Github issue [1] for broader
discussion and review on the dev list.

The current arrow-datafusion repo contains the following high level
subprojects: datafusion, datafusion python binding and ballista.

In order to be able to release ballista and datafusion python binding
with semantic versioning, I propose we decouple subproject versions
from each other. As a result, we will be able to release a breaking
change in datafusion without forcing a major version bump in ballista
or python binding if that breaking change is not visible to their
consumers.

To reduce release overhead, we will still vote on the whole
arrow-datafusion repo on every release. From the same release tarball,
we can then release these sub-projects to their language specific
registries (crates.io and pypi) with their own versions.

Take the upcoming datafusion 5.0.0 release as an example. Within the
same source release, we also have the code for ballista-0.5.0 and
datafusion-python-0.3.0. We only need to vote on a signed
apache-arrow-datafusion-5.0.0.tar.gz tarball.

A consequence of this process is that every time we need to release a
new version of the python binding or ballista, we need to trigger a new
datafusion release as well. However, a datafusion release won't require
a new release from the other two subprojects. For example, a datafusion
5.1.0 release can just include a datafusion python release 0.4.0
without a ballista release. In that case, we will just skip the
crates.io publish for ballista.

Here is what the release process will look like:

* Send a PR with the following changes to prepare the source tree for
  a new release:
  - Update versions in Cargo.toml files
  - Run automation script to generate
    {datafusion,python,ballista}/CHANGELOG.md
* After the PR gets merged, push git tag x.y.z to Github
* Run dev/release/create-tarball.sh to create and upload a signed
  tarball for voting in the dev list
* After the vote passes, run ./dev/release/release-tarball.sh to move
  the approved tarball to the release location in SVN
* Unpack the released tarball and release subprojects to language
  specific registries:
  - run `cargo publish` in datafusion to release datafusion to crates.io
  - if there is a new ballista release
    - run `cargo publish` in
      ballista/rust/{client,core,executor,scheduler} folders to release
      ballista to crates.io
    - push `ballista-x.y.z` tag to Github
  - if there is a new datafusion python release
    - run `maturin publish` in python folder to release datafusion
      python binding to pypi
    - release python documentation
    - push `python-x.y.z` tag to Github
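
To make the conditional part of the last step concrete, here is a small Python
sketch of how the registry-publishing steps might be selected. It is only an
illustration: `publish_plan` is a hypothetical helper, not an existing script,
and the order of the ballista crates (core first) is an assumption based on the
other ballista crates depending on ballista-core.

```python
def publish_plan(new_ballista, new_python):
    """Return (directory, action) steps to run after the vote passes."""
    # datafusion itself is published on every release.
    steps = [("datafusion", "cargo publish")]
    if new_ballista:
        # Assumed order: core first, since the other ballista crates depend on it.
        for crate in ("core", "executor", "scheduler", "client"):
            steps.append((f"ballista/rust/{crate}", "cargo publish"))
        steps.append((".", "push tag ballista-x.y.z"))
    if new_python:
        steps.append(("python", "maturin publish"))
        steps.append(("python", "release python documentation"))
        steps.append((".", "push tag python-x.y.z"))
    return steps


if __name__ == "__main__":
    # Example: a datafusion release that ships a new python binding only.
    for directory, action in publish_plan(new_ballista=False, new_python=True):
        print(f"{directory}: {action}")
```

The datafusion 5.1.0 scenario described earlier corresponds to
`publish_plan(new_ballista=False, new_python=True)`: ballista is skipped
entirely and only the datafusion and python steps remain.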

I would like to get some feedback on this proposal since it is a
little bit different from other Arrow projects. But I do think this
will provide a better dependency pinning experience and changelog
tracking for those sub-projects' downstream consumers.

[1]: https://github.com/apache/arrow-datafusion/issues/771


On Tue, Jul 27, 2021 at 4:18 PM Andrew Lamb  wrote:
>
> Thanks to you both -- this sounds great.
>
> On Tue, Jul 27, 2021 at 8:37 AM Jiayu Liu  wrote:
>
> > Not sure it's necessarily bundled together but I believe a Python,
> > documentation, etc. release can also be helpful. I can volunteer to help if
> > somehow these works can be parallelized.
> >
> > On Tue, Jul 27, 2021 at 3:29 PM QP Hou  wrote:
> >
> > > Following up on this, since delta-rs could really benefit from this
> > > release, I have started some initial work with
> > > https://github.com/apache/arrow-datafusion/pull/780 to move things
> > > forward. Others are welcome to join the party.
> > >
> > > On Fri, Jul 23, 2021 at 12:58 PM Andrew Lamb 
> > wrote:
> > > >
> > > > Does anyone want to make a DataFusion / Ballista official release (and
> > > then
> > > > subsequent release to crates.io)?  There is now a ticket [1] to track
> > > this
> > > > work. I think it would be great to do if someone has time. There are
> > all
> > > > sorts of great features that have gone in since 4.0.0
> > > >
> > > > I don't have much time to devote to the release management of
> > DataFusion
> > > /
> > > > Ballista in the near term (as my project uses DataFusion master and my
> > > > release management budget is already spent on managing arrow-rs
> > > releases).
> > > >
> > > > Andrew
> > > >
> > > > [1] https://github.com/apache/arrow-datafusion/issues/771
> > >
> >


Re: [ANNOUNCE] New Arrow PMC member: Neville Dipale

2021-07-30 Thread QP Hou
Well deserved, congratulations Neville!

On Thu, Jul 29, 2021 at 3:20 PM Wes McKinney  wrote:
>
> The Project Management Committee (PMC) for Apache Arrow has invited
> Neville Dipale to become a PMC member and we are pleased to announce
> that Neville has accepted.
>
> Congratulations and welcome!


Re: [Rust][DataFusion] [DISCUSS] Next DataFusion / Ballista official release

2021-07-27 Thread QP Hou
Following up on this, since delta-rs could really benefit from this
release, I have started some initial work with
https://github.com/apache/arrow-datafusion/pull/780 to move things
forward. Others are welcome to join the party.

On Fri, Jul 23, 2021 at 12:58 PM Andrew Lamb  wrote:
>
> Does anyone want to make a DataFusion / Ballista official release (and then
> subsequent release to crates.io)?  There is now a ticket [1] to track this
> work. I think it would be great to do if someone has time. There are all
> sorts of great features that have gone in since 4.0.0
>
> I don't have much time to devote to the release management of DataFusion /
> Ballista in the near term (as my project uses DataFusion master and my
> release management budget is already spent on managing arrow-rs releases).
>
> Andrew
>
> [1] https://github.com/apache/arrow-datafusion/issues/771


Re: [ANNOUNCE] New Arrow committer: QP Hou

2021-07-27 Thread QP Hou
Thank you all for the warm welcome! It's been a lot of fun hacking on
Arrow together with so many talented engineers :)


On Mon, Jul 26, 2021 at 10:37 PM Jorge Cardoso Leitão
 wrote:
>
> Congratulations and thank you for all the great work! It is a pleasure to
> work with you.
>
> Best,
> Jorge
>
>
> On Mon, Jul 26, 2021 at 7:38 PM Niranda Perera 
> wrote:
>
> > Congrats QP! :-)
> >
> > On Mon, Jul 26, 2021 at 1:24 PM Micah Kornfield 
> > wrote:
> >
> > > Congrats QP!
> > >
> > > On Mon, Jul 26, 2021 at 10:02 AM Andrew Lamb 
> > wrote:
> > >
> > > > Congratulations QP!
> > > >
> > > > On Mon, Jul 26, 2021 at 10:41 AM Jarek Potiuk 
> > wrote:
> > > >
> > > > > Congrats QP :). I see you are in more Apache projects now :).
> > > > >
> > > > > J.
> > > > >
> > > > > On Mon, Jul 26, 2021 at 4:10 PM Daniël Heres 
> > > > > wrote:
> > > > >
> > > > > > Welcome QP!
> > > > > >
> > > > > > Thanks for the work you are doing on DataFusion / arrow-rs and
> > > > delta-rs!
> > > > > >
> > > > > > Daniël
> > > > > >
> > > > > > Op ma 26 jul. 2021 om 16:05 schreef Wes McKinney <
> > > wesmck...@gmail.com
> > > > >:
> > > > > >
> > > > > > > On behalf of the Arrow PMC, I'm happy to announce that QP has
> > > > accepted
> > > > > an
> > > > > > > invitation to become a committer on Apache Arrow. Welcome, and
> > > thank
> > > > > you
> > > > > > > for your contributions!
> > > > > > >
> > > > > > > Wes
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Daniël Heres
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > +48 660 796 129
> > > > >
> > > >
> > >
> >
> >
> > --
> > Niranda Perera
> > https://niranda.dev/
> > @n1r44 
> >


Re: [VOTE] Donation of rust arrow2 and parquet2

2021-06-28 Thread QP Hou
+1 (non binding)

Really exciting stuff, amazing work Jorge.

On Mon, Jun 28, 2021 at 8:32 AM Antoine Pitrou  wrote:
>
> +1 as well (binding)
>
>
> Le 28/06/2021 à 17:28, Ben Kietzman a écrit :
> > +1 (binding)
> >
> > On Mon, Jun 28, 2021 at 5:35 AM Wes McKinney  wrote:
> >
> >> +1 (binding)
> >>
> >> On Mon, Jun 28, 2021 at 11:08 AM Daniël Heres 
> >> wrote:
> >>>
> >>> +1 (non binding)
> >>>
> >>> Great work Jorge!
> >>>
> >>> On Mon, Jun 28, 2021, 10:26 Weston Steimel 
> >> wrote:
> >>>
>  +1
> 
>  On Sun, 27 Jun 2021, 07:41 Jorge Cardoso Leitão, <
> >> jorgecarlei...@gmail.com
> >
>  wrote:
> 
> > Hi,
> >
> > I would like to bring to this mailing list a proposal to donate the
>  source
> > code of arrow2 [2] and parquet2 [3] as experimental repositories [1]
>  within
> > Apache Arrow, conditional on IP clearance.
> >
> > The specific PRs are:
> >
> > * https://github.com/apache/arrow-experimental-rs-arrow2/pull/1
> > * https://github.com/apache/arrow-experimental-rs-parquet2/pull/1
> >
> > The source code contains rewrites of the arrow and parquet crates
> >> with
> > safety and security in mind. In particular,
> >
> > * no buffer transmutes
> > * no unsafe APIs marked as safe
> > * parquet's implementation is unsafe free
> >
> > There are many other important features, such as big endian support
> >> and
>  IPC
> > 2.0 support. There is one regression over latest: support nested
> >> types in
> > parquet read and write. I observe no negative impact on performance.
> >
> > See a longer discussion in [4] over the reasons why the current rust
> > implementation is susceptible to safety violations. In particular,
> >> many
> > core APIs of the crate are considered security vulnerabilities under
> > RustSec's [5] definitions, and are difficult to address on its
> >> current
> > design.
> >
> > I validated that it is possible to migrate DataFusion [6] and Polars
> >> [7]
> > without further code changes.
> >
> > The vote will be open for at least 72 hours.
> >
> > [ ] +1 Accept the code donation as experimental repos.
> > [ ] +0
> > [ ] -1 Do not accept the code donation as experimental repos
> >> because...
> >
> > [1]
> >
> >
> 
> >> https://github.com/apache/arrow/blob/master/docs/source/developers/experimental_repos.rst
> > [2] https://github.com/jorgecarleitao/arrow2
> > [3] https://github.com/jorgecarleitao/parquet2
> > [4] https://github.com/jorgecarleitao/arrow2#faq
> > [5] https://rustsec.org/
> > [6] https://github.com/apache/arrow-datafusion/pull/68
> > [7] https://github.com/pola-rs/polars
> >
> 
> >>
> >


Re: Delta Lake support for DataFusion

2021-06-10 Thread QP Hou
Thanks Daniël for starting the discussion!

Looks like we are on the same page to take this as an opportunity to
make datafusion more extensible :)

I think Neville and Daniël nailed the biggest missing piece at the
moment: being able to extend SQL parser and planner with new syntaxes
and map them to custom plan/expression nodes.
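To make the idea concrete, here is a minimal sketch of a front end that
recognizes extension statements first and hands everything else to the
regular SQL parser. The `Statement` enum and `parse_statement` function are
hypothetical names for illustration only; they are not part of DataFusion
or delta-rs, and a real implementation would hook into the tokenizer rather
than split on whitespace.

```rust
/// Hypothetical statement type; none of these names come from DataFusion.
#[derive(Debug, PartialEq)]
enum Statement {
    /// `OPTIMIZE <table>`
    Optimize { table: String },
    /// `VACUUM <table>`
    Vacuum { table: String },
    /// Anything else is passed through for the standard SQL parser.
    Standard(String),
}

/// Recognize extension keywords first and fall back to standard SQL otherwise.
fn parse_statement(sql: &str) -> Result<Statement, String> {
    // Drop surrounding whitespace and an optional trailing semicolon.
    let trimmed = sql.trim().trim_end_matches(';').trim_end();
    let mut tokens = trimmed.split_whitespace();
    let keyword = match tokens.next() {
        Some(k) => k.to_ascii_uppercase(),
        None => return Err("empty statement".to_string()),
    };
    match keyword.as_str() {
        "OPTIMIZE" | "VACUUM" => {
            // In this sketch, extension statements take exactly one table name.
            let table = match (tokens.next(), tokens.next()) {
                (Some(t), None) => t.to_string(),
                _ => return Err(format!("expected: {} <table>", keyword)),
            };
            if keyword == "OPTIMIZE" {
                Ok(Statement::Optimize { table })
            } else {
                Ok(Statement::Vacuum { table })
            }
        }
        // Not an extension keyword: leave the text for the regular parser.
        _ => Ok(Statement::Standard(trimmed.to_string())),
    }
}
```

A planner could then map `Optimize` and `Vacuum` to custom plan nodes while
`Standard` goes through the existing SQL planning path.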

Another thing that I think we should do is to come up with a way to
better surface these datafusion extensions to help with discoverability.
For example, pandas has a dedicated section [1] in their official doc
for this. Perhaps we could start by adding a list of extensions to
the readme.

After thinking more on this, I feel like it's better to keep the
extension within delta-rs for now. In the future, delta-rs will likely
need to depend on ballista for processing delta table metadata using
distributed compute. So if we move the extension code into
arrow-datafusion, it might result in circular dependency. I don't see
a lot of benefits in creating a dedicated datafusion-delta-rs repo at
the moment. But I am happy to go that route if there are compelling
reasons. My main goal is just to make sure we have a single officially
maintained datafusion extension for delta lake.

[1]: https://pandas.pydata.org/docs/ecosystem.html#io

Thanks,
QP Hou

On Wed, Jun 9, 2021 at 11:30 AM Daniël Heres  wrote:
>
> Thanks all for the valuable input!
>
> I agree following the plugin / model makes a lot of sense for now (either
> in arrow-datafusion repo or somewhere external, for example in delta-rs if
> we're OK with it not being part of Apache right now).
>
> In order to support certain Delta Lake features including SQL syntax we
> probably need to make DataFusion a bit more extensible besides what is
> currently possible with the TableProvider, for example:
>
> * Allow registering a custom data format (for supporting things like *create
> external table t stored as parquet*)
> * Allow parsing and/or handling custom SQL syntax like *optimize*  /
> *vacuum* / *select * from t version as of n* , etc.
>
> And probably some more that I can't think of right now. I think this is useful
> work as it also would enable other "extensions" to work in a similar way
> (e.g. Apache Iceberg and other formats / readers / writers / syntax) and
> make DataFusion a more flexible engine.
>
> Best, Daniël
>
> On Wed, 9 Jun 2021 at 20:07, Neville Dipale wrote:
>
> > The correct approach might be to improve DataFusion support in
> > delta-rs. TableProvider is already implemented here:
> > https://github.com/delta-io/delta-rs/blob/main/rust/src/delta_datafusion.rs
> >
> > I've pinged QP to ask for their advice.
> >
> > Neville
> >
> > On Wed, 9 Jun 2021 at 19:58, Andrew Lamb  wrote:
> >
> > > I think the idea of DataFusion + DeltaLake is quite compelling and likely
> > > useful.
> > >
> > > However, I think DataFusion is ideally an "embeddable query engine" rather
> > > than a database system in itself, so in that mental model Delta Lake
> > > integration belongs somewhere other than the core DataFusion crate.
> > >
> > > My ideal structure would be a new crate (maybe not even part of the Apache
> > > Arrow Project), perhaps called `datafusion-delta-rs`, that contained the
> > > TableProvider and whatever else was needed to integrate DataFusion with
> > > DeltaLake.
> > >
> > > This structure could also start a pattern of publishing plugins for
> > > DataFusion separately from the core.
> > >
> > > Andrew
> > > p.s. now that Arrow is publishing more incrementally (e.g. 4.1.0, 4.2.0,
> > > etc), I think delta-rs[1] and datafusion both only specify `4.x` so they
> > > should work together nicely
> > >
> > > [1] https://github.com/delta-io/delta-rs/blame/main/rust/Cargo.toml
> > >
> > > On Wed, Jun 9, 2021 at 2:29 AM Daniël Heres wrote:
> > >
> > > > Hi all,
> > > >
> > > > I would like to receive some feedback about adding Delta Lake support to
> > > > DataFusion (https://github.com/apache/arrow-datafusion/issues/525).
> > > > As you might know, Delta Lake <https://delta.io/> is a format adding
> > > > features like ACID transactions, statistics, and storage optimization to
> > > > Parquet and is getting quite some traction for managing data lakes.
> > > > It seems a great feature to have in DataFusion as well.
> > > >
> > > > The delta-rs <https://github.com/delta-io/delta-rs> project provides a
> > > > native, Apache licensed, Rust implementation of Delta Lake, already
> > > > supporting a large part of the format and operations.
> > > >
> > > > The first integration I would like to propose is adding read support via a
> > > > new TableProvider. There might be some work to do around dependencies as
> > > > both DataFusion and delta-rs rely on (certain versions of) Arrow and
> > > > Parquet.
> > > >
> > > > Let me know if you have any further ideas or concerns.
> > > >
> > > > Best regards,
> > > >
> > > > Daniël Heres
> > > >
> > >
> >
>
>
> --
> Daniël Heres


Re: [DataFusion] [Discuss] Output Schema for queries with multiple relations

2021-05-19 Thread QP Hou
Hi all,

Following up on this.

We have updated the output schema doc [1] and updated invariant doc
[2] for the final round of review.

In the updated invariant doc, the main change we introduced compared
to the previous version is as follows:

We now enforce strict schema equality in all plan optimization
invariants. As a result, optimizations like reordering join sides need
to add an extra projection to maintain the schema field order. We
believe the extra projection should have minimal overhead. The upside
is that it keeps the field order semantics simple and easy for end
users to understand.
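
As a rough illustration of the extra projection, the sketch below computes
the column indices that restore the original field order after the two join
sides are swapped. All function names here are hypothetical and the schema
is modelled as a plain list of field names; this is not the DataFusion
optimizer API.

```rust
/// Output field names of a join: left fields followed by right fields.
fn join_schema(left: &[&str], right: &[&str]) -> Vec<String> {
    left.iter().chain(right.iter()).map(|f| f.to_string()).collect()
}

/// Column indices for a projection that, placed on top of the swapped join
/// (`right JOIN left`), reproduces the field order of `left JOIN right`.
fn restoring_projection(left_len: usize, right_len: usize) -> Vec<usize> {
    // In the swapped join, right fields occupy 0..right_len and left fields
    // occupy right_len..right_len + left_len.
    let left_fields = right_len..right_len + left_len;
    let right_fields = 0..right_len;
    left_fields.chain(right_fields).collect()
}

/// Apply a projection, given as column indices, to a schema.
fn project(schema: &[String], indices: &[usize]) -> Vec<String> {
    indices.iter().map(|&i| schema[i].clone()).collect()
}
```

With a two-field left side and a one-field right side, the projection is
`[1, 2, 0]`, and applying it to the swapped join's schema yields exactly the
schema of the original join, so strict schema equality still holds.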

In the draft PR [3], Andy raised a concern that by referring to
physical columns using indices instead of names, it might limit our
ability to support schemaless data sources in the future. After
thinking more on this, I think the current design can be extended to
support schemaless data sources in the future by going one of the
following two routes:

* Make the index field in physical columns optional. During physical
plan execution, we could fall back to the name field for schemaless
data sources while continuing to use indices for data sources that have
static schemas.
* Introduce a new type of physical column expression to refer to columns
in schemaless data sources.
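
A minimal sketch of the first route, assuming a hypothetical `Column` type
with an optional index and a toy `Row` type standing in for a record batch
(neither is the actual DataFusion definition):

```rust
/// Hypothetical physical column reference: the name is always present,
/// the index only when the source has a static schema.
struct Column {
    name: String,
    index: Option<usize>,
}

/// Toy stand-in for a batch row: (field name, value) pairs in batch order.
struct Row {
    fields: Vec<(String, i64)>,
}

impl Column {
    /// Resolve this column against a row, preferring the index when present.
    fn evaluate(&self, row: &Row) -> Option<i64> {
        match self.index {
            // Static schema: positional lookup resolved at planning time.
            Some(i) => row.fields.get(i).map(|(_, v)| *v),
            // Schemaless source: fall back to a lookup by name at execution time.
            None => row
                .fields
                .iter()
                .find(|field| field.0 == self.name)
                .map(|(_, v)| *v),
        }
    }
}
```

The index path stays a constant-time lookup for sources with static schemas,
and only schemaless sources pay for the name search.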

I intentionally left out discussion of schemaless data sources in the
updated invariant doc to keep the scope manageable for smaller
incremental deliverables and ease of review. My main goal here is to
make sure whatever design change we propose for multi-relations
support won't prevent us from supporting schemaless use-cases in the
future.

If you have any feedback on or concerns about the current design, now is a
good time to raise them :)

I am aiming to get the implementation PR out of draft mode in a week or so.

[1]: https://docs.google.com/document/d/1uviWavwEGD3qxwMk2AGkOgp6ENrvKGiMWQhHNbqPwhg/
[2]: https://docs.google.com/document/d/1dbK-3eaTHlzZcHzpTk1h-LA3b7dcxsVBcoZeVKYIPwI/
[3]: https://github.com/apache/arrow-datafusion/pull/55#issuecomment-829296665

Thanks,
QP Hou

On Wed, May 5, 2021 at 3:52 AM Andrew Lamb  wrote:
>
> I wanted to bring some additional attention to some discussion occurring on
> a PR [1], specifically the proposal of how to construct output field names
> from queries that have multiple relations (that may have the same input
> field).
>
> The documents are:
> * Document for output schema field name semantics with examples: [2]
> * Proposed change to @jorgecarleitao 's invariant doc [3]
> * Updated invariant doc with proposed changes applied [4]
>
> Please comment on the PR / in the docs if you are interested.
>
> Andrew
>
> [1] https://github.com/apache/arrow-datafusion/pull/55#issuecomment-831405269
> [2] https://docs.google.com/document/d/1uviWavwEGD3qxwMk2AGkOgp6ENrvKGiMWQhHNbqPwhg/edit?usp=sharing
> [3] https://docs.google.com/document/d/158gbfDp8pcakfriT2l7dHChwJB43_RV7lcWfxEC73ng/edit?usp=sharing
> [4] https://docs.google.com/document/d/1dbK-3eaTHlzZcHzpTk1h-LA3b7dcxsVBcoZeVKYIPwI/edit?usp=sharing


Re: [ANNOUNCE] New Arrow committer: Daniël Heres

2021-04-28 Thread QP Hou
Congrats Daniël, well deserved!

Thanks,
QP Hou

On Wed, Apr 28, 2021 at 6:25 AM Andy Grove  wrote:
>
> On behalf of the Arrow PMC, I'm happy to announce that Daniël has
> accepted an invitation to become a committer on Apache Arrow.
>
> Welcome, and thank you for your contributions!


Re: [DISCUSS] [Rust] Move Rust components to new repos and process

2021-04-12 Thread QP Hou
On Mon, Apr 12, 2021 at 6:55 AM Andy Grove  wrote:
>
> Hi Krisztian,
>
> When you say that using GitHub issues is "not the apache way of issue
> tracking", are you referring to any particular ASF rules? I see no mention
> of JIRA in https://www.apache.org/theapacheway/ and there are other Apache
> projects (such as Airflow) that use GitHub issues.
>

As an Airflow committer, I can confirm that switching from JIRA to
GitHub issues resulted in a very positive productivity gain for the
community. This experience actually made me think that more ASF projects
that use GitHub for code review should give GitHub issues a try.


> > Cons:
> > - not the apache way of issue tracking
> > - doubtful outcome for large number of issues
> >

In general, as long as it's within the ASF rules and not hurting
productivity for other language implementations, we should encourage
developers to explore new tools and management structures. Doing
things just because that's how it was done in the past is usually not
a good argument :) That said, I understand Krisztian's concern about this
turning into something worse than JIRA. In the worst case, we can
always declare this a failed experiment and go back to JIRA. But the
potential upside is much greater. In either case, I believe there will be
no extra work for the non-Rust developers.


> > I don't like JIRA either but I can live with it, though I understand
> > the frustration around it.
> > Since GH issues vs. JIRA seems like a hot topic lately we could try to
> > experiment with a less radical change: enable github issues for the
> > whole project and sync them to JIRA (either by using an existing
> > service or by developing a github action for it). We may end up
> > preferring github issues eventually.
> >
> > All in all, I find this proposal way too invasive. It sounds more like
> > starting a new project with its own governance rather than making
> > releases more accessible to users.
> >
> > Thanks, Krisztian
> >
> >
> > On Fri, Apr 9, 2021 at 5:18 PM Andy Grove  wrote:
> > >
> > > Following on from the email thread "Rust sync meeting" I would like to
> > > start a new discussion about moving the Rust components out to new GitHub
> > > repositories and using a new process for issues and release management.
> > >
> > > I have started a Google document [1] with details and to track the work
> > > required for this effort but I will summarize the key points of the
> > > proposal here:
> > >
> > >
> > > - Move existing Rust code into two new repositories
> > >   - apache/arrow-rs
> > >     - Arrow + Parquet crates
> > >   - apache/datafusion
> > >     - DataFusion + Ballista crates (which are expected to merge to some
> > >       degree over time)
> > >     - TPC-H benchmarks
> > >   - Use GitHub issues for issue tracking
> > > - Decouple release process
> > >   - Crates are released individually
> > >   - A vote on the source release of the released crate is held over the
> > >     mailing list as usual.
> > >   - Rust does not need to release a new version when the rest of Arrow
> > >     releases; we bundle our latest released crates to the signed tar.
> > >   - Crates can depend on GitHub commit hashes between releases
> > >
> > >
> > > The Google document may be the best place to collaborate on the proposal
> > > but I can update the document based on any comments in this email thread as
> > > well.
> > >
> > > Note that I have excluded discussion about arrow2/parquet2 from this
> > > proposal and I believe we should discuss that separately as a follow-on
> > > discussion.
> > >
> > > I look forward to hearing opinions on this both from current Rust
> > > maintainers and contributors and also from the wider Arrow community.
> > >
> > > Thanks,
> > >
> > > Andy.
> > >
> > > [1] https://docs.google.com/document/d/1TyrUP8_UWXqk97a8Hvb1d0UYWigch0HAephIjW7soSI/edit?usp=sharing
> >


Re: [DISCUSS] [Rust] Move Rust components to new repos and process

2021-04-09 Thread QP Hou
On Fri, Apr 9, 2021 at 4:57 PM Weston Pace  wrote:
> Note, these problems technically exist now with the concept that any
> language can release a patch at any time.  Also, since Rust isn't
> directly compiling against other Arrow libs and we are only talking
> about interoperability it's probably not going to be too big of a
> deal.  Still, worth giving some thought ahead of time.

It looks like the requirements we have here are:

* Catch integration issues between the development trunks of the two
repos as soon as we can
* Prevent the development trunk of one repo from blocking releases of the other

How about we always run the integration tests against the latest
releases in both repos by default? On top of that, we run integration
tests in the Rust repo against nightly Arrow to help catch integration
issues between development trunks. In the Rust repo, we could enforce
the rule that a Rust release will only be cut if the release candidate
passes the integration tests for both the latest released and nightly
Arrow code base.
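
The release rule could be expressed roughly as below. This is only a sketch
of the gating logic; `ArrowTarget`, `IntegrationResult` and `can_cut_release`
are made-up names, not part of any existing CI tooling.

```rust
/// Which Arrow build the Rust release candidate was tested against.
#[derive(Clone, Copy, Debug, PartialEq)]
enum ArrowTarget {
    LatestRelease,
    Nightly,
}

/// Outcome of one integration test run (hypothetical record type).
struct IntegrationResult {
    target: ArrowTarget,
    passed: bool,
}

/// A release may be cut only if every required target has at least one run
/// and all runs against that target passed.
fn can_cut_release(results: &[IntegrationResult]) -> bool {
    let required = [ArrowTarget::LatestRelease, ArrowTarget::Nightly];
    required.iter().all(|target| {
        let mut runs = results.iter().filter(|r| r.target == *target).peekable();
        runs.peek().is_some() && runs.all(|r| r.passed)
    })
}
```

A missing nightly run counts the same as a failed one here, so a release
cannot slip through just because the nightly job did not execute.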

This way, we will still be able to catch integration issues between
development code trunks on a daily basis. It would be Rust developers'
responsibility to report these integration errors back to the main
repo.

From the main Arrow repo's point of view, there is less work since it
only needs to care about the last stable Rust release.

From the Rust repo's point of view, we are effectively trading extra
maintenance burden for more release flexibility.

Thanks,
QP


[jira] [Created] (ARROW-9124) DFParser should consume sql query as instead of String

2020-06-13 Thread QP Hou (Jira)
QP Hou created ARROW-9124:
-

 Summary: DFParser should consume sql query as  instead of 
String
 Key: ARROW-9124
 URL: https://issues.apache.org/jira/browse/ARROW-9124
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: QP Hou
Assignee: QP Hou


It's more efficient to use  instead of String



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-9057) Projection should work on InMemoryScan without error

2020-06-07 Thread QP Hou (Jira)
QP Hou created ARROW-9057:
-

 Summary: Projection should work on InMemoryScan without error
 Key: ARROW-9057
 URL: https://issues.apache.org/jira/browse/ARROW-9057
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust - DataFusion
Reporter: QP Hou
Assignee: QP Hou






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-9005) Support sort expression

2020-06-01 Thread QP Hou (Jira)
QP Hou created ARROW-9005:
-

 Summary: Support sort expression
 Key: ARROW-9005
 URL: https://issues.apache.org/jira/browse/ARROW-9005
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust - DataFusion
Reporter: QP Hou
Assignee: QP Hou






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8931) [Rust] Support lexical sort in arrow compute kernel

2020-05-24 Thread QP Hou (Jira)
QP Hou created ARROW-8931:
-

 Summary: [Rust] Support lexical sort in arrow compute kernel
 Key: ARROW-8931
 URL: https://issues.apache.org/jira/browse/ARROW-8931
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: QP Hou
Assignee: QP Hou






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8906) [Rust] Support reading multiple CSV files for schema inference

2020-05-22 Thread QP Hou (Jira)
QP Hou created ARROW-8906:
-

 Summary: [Rust] Support reading multiple CSV files for schema 
inference
 Key: ARROW-8906
 URL: https://issues.apache.org/jira/browse/ARROW-8906
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: QP Hou
Assignee: QP Hou






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8877) [Rust] add CSV read option struct to simplify datafusion interface

2020-05-20 Thread QP Hou (Jira)
QP Hou created ARROW-8877:
-

 Summary: [Rust] add CSV read option struct to simplify datafusion 
interface
 Key: ARROW-8877
 URL: https://issues.apache.org/jira/browse/ARROW-8877
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust, Rust - DataFusion
Reporter: QP Hou
Assignee: QP Hou






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8840) [Rust] datafusion ExecutionError should implement std::error::Error trait

2020-05-17 Thread QP Hou (Jira)
QP Hou created ARROW-8840:
-

 Summary: [Rust] datafusion ExecutionError should implement 
std::error::Error trait
 Key: ARROW-8840
 URL: https://issues.apache.org/jira/browse/ARROW-8840
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust - DataFusion
Reporter: QP Hou
Assignee: QP Hou






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

