Re: [DISCUSS] Reviewing Arrow commit/code review policy

2019-10-12 Thread Andy Grove
Wes,

Thanks for clarifying this. This will be very helpful for me while I work
on the Rust DataFusion crate, since we have a small number of committers
today. I will still generally make PRs available for review (unless they
are trivial changes), but being able to merge without review when the other
committers are busy will help maintain momentum on some of the features
that I would like to see in the 1.0.0 release.

Thanks,

Andy.

On Sat, Oct 12, 2019 at 2:51 PM Wes McKinney  wrote:

> hi folks,
>
> We've added many new committers to Apache Arrow over the last 3+
> years, so I thought it would be worthwhile to review our commit and
> code review policy for everyone's benefit.
>
> Since the beginning of the project, Arrow has been in "Commit Then
> Review" mode (aka CTR).
>
> https://www.apache.org/foundation/glossary.html#CommitThenReview
>
> The idea of CTR is that committers can make changes at will with the
> understanding that if there is some disagreement or if work is vetoed,
> then changes may be reverted.
>
> In particular, in CTR if a committer submits a patch, they are able to
> +1 and merge their own patch. Generally, though, as a matter of
> courtesy to the community, for non-trivial patches it is a good idea
> to allow time for code review.
>
> More mature projects, or ones with potentially contentious governance
> / political issues, sometimes adopt "Review-Then-Commit" (RTC) which
> requires a more structured sign-off process from other committers.
> While Apache Arrow is more mature now, the diversity of the project
> has resulted in a lot of spread-out code ownership. I think that RTC
> at this stage would cause hardship for contributors on some components
> where there are not a lot of active code reviewers.
>
> Personally, I am OK to stick with CTR until we start experiencing
> problems. Overall I think we have a healthy dynamic amongst the
> project's nearly 50 committers and we have had to revert patches
> relatively rarely.
>
> Any thoughts from others?
>
> Thanks
> Wes
>


[DISCUSS] Reviewing Arrow commit/code review policy

2019-10-12 Thread Wes McKinney
hi folks,

We've added many new committers to Apache Arrow over the last 3+
years, so I thought it would be worthwhile to review our commit and
code review policy for everyone's benefit.

Since the beginning of the project, Arrow has been in "Commit Then
Review" mode (aka CTR).

https://www.apache.org/foundation/glossary.html#CommitThenReview

The idea of CTR is that committers can make changes at will with the
understanding that if there is some disagreement or if work is vetoed,
then changes may be reverted.

In particular, in CTR if a committer submits a patch, they are able to
+1 and merge their own patch. Generally, though, as a matter of
courtesy to the community, for non-trivial patches it is a good idea
to allow time for code review.

More mature projects, or ones with potentially contentious governance
/ political issues, sometimes adopt "Review-Then-Commit" (RTC) which
requires a more structured sign-off process from other committers.
While Apache Arrow is more mature now, the diversity of the project
has resulted in a lot of spread-out code ownership. I think that RTC
at this stage would cause hardship for contributors on some components
where there are not a lot of active code reviewers.

Personally, I am OK to stick with CTR until we start experiencing
problems. Overall I think we have a healthy dynamic amongst the
project's nearly 50 committers and we have had to revert patches
relatively rarely.

Any thoughts from others?

Thanks
Wes


Re: [VOTE] Release Apache Arrow 0.15.0 - RC2

2019-10-12 Thread Wes McKinney
That's great.

We should be careful not to deploy any changes to the documentation
for unreleased features or API changes.

At some point it would be nice to make old versions of the docs
available, as well as a nightly/tip version.

On Sat, Oct 12, 2019 at 3:18 PM Krisztián Szűcs
 wrote:
>
> Thanks Wes!
>
> I'm actually updating the Docker containers and scripts required
> to update both the apidocs and the prose Sphinx documentation.
> I'd like to run that regularly on master, and preferably also deploy it.
>
> On Sat, Oct 12, 2019 at 10:09 PM Wes McKinney  wrote:
>
> > I just updated them. In future releases feel free to ping me to update
> > them. I just added a bash script to my toolchain to automate it
> >
> >
> > https://github.com/wesm/dev-toolchain/commit/97b9dd8a7a3b04f93da9d49bb6d476cc1c48f1d4#diff-3110d4bcfc0b68bbc695e5f54179550bR466
> >
> > On Fri, Oct 11, 2019 at 1:36 PM Krisztián Szűcs
> >  wrote:
> > >
> > > I'm not sure that I can find time for it during the weekend, but I'll try
> > > to.
> > >
> > >
> > > On Fri, Oct 11, 2019 at 3:32 PM Wes McKinney 
> > wrote:
> > >
> > > > That work isn't done. We can update the docs later when it's written.
> > > >
> > > > On Fri, Oct 11, 2019, 3:30 AM Krisztián Szűcs <
> > szucs.kriszt...@gmail.com>
> > > > wrote:
> > > >
> > > > > There is still an open issue about the flight docs [1], once it is
> > done
> > > > > I can update the docs.
> > > > >
> > > > > [1]: https://issues.apache.org/jira/browse/ARROW-6390
> > > > >
> > > > > On Thu, Oct 10, 2019 at 11:32 PM Wes McKinney 
> > > > wrote:
> > > > >
> > > > > > @Neal -- that's fine, I just want to make sure that in the next
> > > > > > release there is a responsible party (the RM?) to seek out someone
> > to
> > > > > > help build the documentation rather than let it sit silently
> > > > > > unpublished for a week or two. So we may just want to amend the RM
> > > > > > guide to include "Find someone to update the docs if you cannot do
> > it
> > > > > > yourself"
> > > > > >
> > > > > > On Thu, Oct 10, 2019 at 3:01 PM Joris Van den Bossche
> > > > > >  wrote:
> > > > > > >
> > > > > > > Wes, if you don't get to it today, I can try to update the docs
> > > > > tomorrow.
> > > > > > >
> > > > > > > Joris
> > > > > > >
> > > > > > > On Thu, 10 Oct 2019 at 21:51, Neal Richardson <
> > > > > > neal.p.richard...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > I updated the R docs because I had everything I needed to do
> > that
> > > > > > > > locally: https://github.com/apache/arrow-site/pull/30 Doing
> > the
> > > > > others
> > > > > > > > wasn't feasible for me on my computer (I don't have CUDA, and
> > the
> > > > > case
> > > > > > > > insensitivity of the macOS file system always bites me with the
> > > > > > > > pyarrow docs anyway).
> > > > > > > >
> > > > > > > > IMO improving our CI/CD around documentation should be a
> > priority
> > > > for
> > > > > > 1.0.
> > > > > > > >
> > > > > > > > Neal
> > > > > > > >
> > > > > > > > On Thu, Oct 10, 2019 at 12:03 PM Wes McKinney <
> > wesmck...@gmail.com
> > > > >
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > > The docs on http://arrow.apache.org/docs/ haven't been
> > updated
> > > > > yet.
> > > > > > > > > This happened the last release, too -- I ended up updating
> > the
> > > > docs
> > > > > > > > > manually after a week or two. Is this included in the release
> > > > > > > > > management guide? If no one beats me to it, I can update the
> > docs
> > > > > by
> > > > > > > > > hand again later today
> > > > > > > > >
> > > > > > > > > On Mon, Oct 7, 2019 at 6:20 PM Wes McKinney <
> > wesmck...@gmail.com
> > > > >
> > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > I think we might be a little aggressive at removing
> > artifacts
> > > > > from
> > > > > > the
> > > > > > > > > > dist system
> > > > > > > > > >
> > > > > > > > > > Can we change our process to only remove old dist artifacts
> > > > when
> > > > > we
> > > > > > > > > > are about to upload a new RC? Otherwise it's harder to make
> > > > > > > > > > improvements to the release verification scripts without
> > any
> > > > old
> > > > > > RC to
> > > > > > > > > > test against
> > > > > > > > > >
> > > > > > > > > > On Mon, Oct 7, 2019 at 5:17 PM Neal Richardson
> > > > > > > > > >  wrote:
> > > > > > > > > > >
> > > > > > > > > > > The R package has been accepted by CRAN. Binaries for
> > macOS
> > > > and
> > > > > > > > > > > Windows should become available in the next few days.
> > > > > > > > > > >
> > > > > > > > > > > Neal
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Oct 7, 2019 at 1:41 AM Krisztián Szűcs
> > > > > > > > > > >  wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks Andy!
> > > > > > > > > > > >
> > > > > > > > > > > > I've just removed the RC source artefacts from SVN.
> > > > > > > > > > > >
> > > > > > > > > > > > We have two remaining post release tasks:
> > > > > > > > > > > > - homebrew
> > > > > > > > > > > > - api

Re: [DISCUSS] Proposal about integration test of arrow parquet reader

2019-10-12 Thread Wes McKinney
I think the ideal scenario is to have a mix of "endogenous" unit
testing and functional testing against real files to test for
regressions or cross-compatibility. To criticize the work we've done
in the C++ project, we have not done enough systematic integration
testing IMHO, but we do test against some "bad files" that have
accumulated.

In any case, I think it's bad practice for a file format reader to
rely exclusively on functional testing against static binary files.

This could be a good opportunity to devise a language-agnostic Parquet
integration testing strategy. Given that we're looking to add nested
data support in C++ hopefully by the end of 2020, it would be good
timing.
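
As a minimal sketch of the "endogenous", seed-driven testing idea discussed in
this thread (illustrative only, not code from any Arrow implementation; the
class name and seed value are made up), a test can regenerate its input
deterministically instead of depending on checked-in binary files:

    import java.util.Random;

    import org.apache.arrow.memory.RootAllocator;
    import org.apache.arrow.vector.IntVector;

    public class SeededVectorExample {
      public static void main(String[] args) {
        // Fixed seed: every run (and every CI build) sees the same values,
        // so failures are reproducible without a checked-in binary file.
        Random random = new Random(42L);
        try (RootAllocator allocator = new RootAllocator(Long.MAX_VALUE);
             IntVector vector = new IntVector("values", allocator)) {
          vector.allocateNew(1024);
          for (int i = 0; i < 1024; i++) {
            vector.set(i, random.nextInt());
          }
          vector.setValueCount(1024);
          // A round-trip test would write `vector` out (e.g. to Parquet) and
          // read it back, comparing against the regenerated input.
          System.out.println("first value: " + vector.get(0));
        }
      }
    }

The same seeded generator can feed a write/read round trip, so any failure is
reproducible from the seed alone.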

On Sat, Oct 12, 2019 at 11:12 AM Andy Grove  wrote:
>
> I also think that there are valid use cases for checking in binary files,
> but we have to be careful not to abuse this. For example, we might want to
> check in a Parquet file created by a particular version of Apache Spark to
> ensure that Arrow implementations can read it successfully (hypothetical
> example).
>
> It would also be good to have a small set of Parquet files using every
> possible data type that all implementations can use in their tests. I
> suppose we might want one set per Arrow format version as well.
>
> The problem we have now, in my opinion, is that we're proposing adding
> files on a pretty ad-hoc basis, driven by the needs of individual
> contributors in one language implementation, and this is perhaps happening
> because we don't already have a good set of standard test files.
>
> Renjie - perhaps you could comment on this. If we had these standard files
> covering all data types, would that have worked for you in this instance?
>
> Thanks,
>
> Andy.
>
> On Sat, Oct 12, 2019 at 12:03 AM Micah Kornfield 
> wrote:
>
> > Hi Wes,
> > >
> > > I additionally would prefer generating the test corpus at test time
> > > rather than checking in binary files.
> >
> >
> > Can you elaborate on this? I think both generated on the fly and example
> > files are useful.
> >
> > The checked in files catch regressions even when readers/writers can read
> > their own data but they have either incorrect or undefined behavior in
> > regards to the specification (for example I would imagine checking in a
> > file as part of the fix for ARROW-6844
> > ).
> >
> > Thanks,
> > Micah
> >
> > On Thu, Oct 10, 2019 at 5:30 PM Renjie Liu 
> > wrote:
> >
> > > Thanks Wes. Sure, I'll fix it.
> > >
> > > Wes McKinney  wrote on Fri, Oct 11, 2019 at 6:10 AM:
> > >
> > > > I just merged the PR https://github.com/apache/arrow-testing/pull/11
> > > >
> > > > Various aspects of this make me uncomfortable so I hope they can be
> > > > addressed in follow up work
> > > >
> > > > On Thu, Oct 10, 2019 at 5:41 AM Renjie Liu 
> > > > wrote:
> > > > >
> > > > > I've created a ticket to track this here:
> > > > > https://issues.apache.org/jira/browse/ARROW-6845
> > > > >
> > > > > For the moment, can we check in the pregenerated data to unblock the
> > > > > Rust version's Arrow reader?
> > > > >
> > > > > On Thu, Oct 10, 2019 at 1:20 PM Renjie Liu 
> > > > wrote:
> > > > >
> > > > > > It would be fine in that case.
> > > > > >
> > > > > > Wes McKinney  wrote on Thu, Oct 10, 2019 at 12:58 PM:
> > > > > >
> > > > > >> On Wed, Oct 9, 2019 at 10:16 PM Renjie Liu <
> > liurenjie2...@gmail.com
> > > >
> > > > > >> wrote:
> > > > > >> >
> > > > > >> > 1. There already exists a low-level Parquet writer which can
> > > > > >> > produce Parquet files, so unit tests should be fine. But the writer
> > > > > >> > from Arrow to Parquet doesn't exist yet, and it may take some time
> > > > > >> > to finish it.
> > > > > >> > 2. In fact my data are randomly generated and it's definitely
> > > > > >> > reproducible. However, I don't think it would be a good idea to
> > > > > >> > randomly generate data every time we run CI because it would be
> > > > > >> > difficult to debug. For example, if PR A introduced a bug which is
> > > > > >> > triggered in another PR's build, it would be confusing for
> > > > > >> > contributors.
> > > > > >>
> > > > > >> Presumably any random data generation would use a fixed seed
> > > > > >> precisely to be reproducible.
> > > > > >>
> > > > > >> > 3. I think it would be a good idea to spend effort on integration
> > > > > >> > tests with Parquet because it's an important use case of Arrow.
> > > > > >> > Also, a similar approach could be extended to other languages and
> > > > > >> > other file formats (Avro, ORC).
> > > > > >> >
> > > > > >> >
> > > > > >> > On Wed, Oct 9, 2019 at 11:08 PM Wes McKinney <
> > wesmck...@gmail.com
> > > >
> > > > > >> wrote:
> > > > > >> >
> > > > > >> > > There are a number of issues worth discussion.
> > > > > >> > >
> > > > > >> > > 1. What is the timeline/plan for Rust implementing a Parquet
> > > > _writer_?
> > > > > >> > > It's OK to be reliant on

Re: [VOTE] Release Apache Arrow 0.15.0 - RC2

2019-10-12 Thread Krisztián Szűcs
Thanks Wes!

I'm actually updating the Docker containers and scripts required
to update both the apidocs and the prose Sphinx documentation.
I'd like to run that regularly on master, and preferably also deploy it.

On Sat, Oct 12, 2019 at 10:09 PM Wes McKinney  wrote:

> I just updated them. In future releases feel free to ping me to update
> them. I just added a bash script to my toolchain to automate it
>
>
> https://github.com/wesm/dev-toolchain/commit/97b9dd8a7a3b04f93da9d49bb6d476cc1c48f1d4#diff-3110d4bcfc0b68bbc695e5f54179550bR466
>
> On Fri, Oct 11, 2019 at 1:36 PM Krisztián Szűcs
>  wrote:
> >
> > I'm not sure that I can find time for it during the weekend, but I'll try
> > to.
> >
> >
> > On Fri, Oct 11, 2019 at 3:32 PM Wes McKinney 
> wrote:
> >
> > > That work isn't done. We can update the docs later when it's written.
> > >
> > > On Fri, Oct 11, 2019, 3:30 AM Krisztián Szűcs <
> szucs.kriszt...@gmail.com>
> > > wrote:
> > >
> > > > There is still an open issue about the flight docs [1], once it is
> done
> > > > I can update the docs.
> > > >
> > > > [1]: https://issues.apache.org/jira/browse/ARROW-6390
> > > >
> > > > On Thu, Oct 10, 2019 at 11:32 PM Wes McKinney 
> > > wrote:
> > > >
> > > > > @Neal -- that's fine, I just want to make sure that in the next
> > > > > release there is a responsible party (the RM?) to seek out someone
> to
> > > > > help build the documentation rather than let it sit silently
> > > > > unpublished for a week or two. So we may just want to amend the RM
> > > > > guide to include "Find someone to update the docs if you cannot do
> it
> > > > > yourself"
> > > > >
> > > > > On Thu, Oct 10, 2019 at 3:01 PM Joris Van den Bossche
> > > > >  wrote:
> > > > > >
> > > > > > Wes, if you don't get to it today, I can try to update the docs
> > > > tomorrow.
> > > > > >
> > > > > > Joris
> > > > > >
> > > > > > On Thu, 10 Oct 2019 at 21:51, Neal Richardson <
> > > > > neal.p.richard...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > I updated the R docs because I had everything I needed to do
> that
> > > > > > > locally: https://github.com/apache/arrow-site/pull/30 Doing
> the
> > > > others
> > > > > > > wasn't feasible for me on my computer (I don't have CUDA, and
> the
> > > > case
> > > > > > > insensitivity of the macOS file system always bites me with the
> > > > > > > pyarrow docs anyway).
> > > > > > >
> > > > > > > IMO improving our CI/CD around documentation should be a
> priority
> > > for
> > > > > 1.0.
> > > > > > >
> > > > > > > Neal
> > > > > > >
> > > > > > > On Thu, Oct 10, 2019 at 12:03 PM Wes McKinney <
> wesmck...@gmail.com
> > > >
> > > > > wrote:
> > > > > > > >
> > > > > > > > The docs on http://arrow.apache.org/docs/ haven't been
> updated
> > > > yet.
> > > > > > > > This happened the last release, too -- I ended up updating
> the
> > > docs
> > > > > > > > manually after a week or two. Is this included in the release
> > > > > > > > management guide? If no one beats me to it, I can update the
> docs
> > > > by
> > > > > > > > hand again later today
> > > > > > > >
> > > > > > > > On Mon, Oct 7, 2019 at 6:20 PM Wes McKinney <
> wesmck...@gmail.com
> > > >
> > > > > wrote:
> > > > > > > > >
> > > > > > > > > I think we might be a little aggressive at removing
> artifacts
> > > > from
> > > > > the
> > > > > > > > > dist system
> > > > > > > > >
> > > > > > > > > Can we change our process to only remove old dist artifacts
> > > when
> > > > we
> > > > > > > > > are about to upload a new RC? Otherwise it's harder to make
> > > > > > > > > improvements to the release verification scripts without
> any
> > > old
> > > > > RC to
> > > > > > > > > test against
> > > > > > > > >
> > > > > > > > > On Mon, Oct 7, 2019 at 5:17 PM Neal Richardson
> > > > > > > > >  wrote:
> > > > > > > > > >
> > > > > > > > > > The R package has been accepted by CRAN. Binaries for
> macOS
> > > and
> > > > > > > > > > Windows should become available in the next few days.
> > > > > > > > > >
> > > > > > > > > > Neal
> > > > > > > > > >
> > > > > > > > > > On Mon, Oct 7, 2019 at 1:41 AM Krisztián Szűcs
> > > > > > > > > >  wrote:
> > > > > > > > > > >
> > > > > > > > > > > Thanks Andy!
> > > > > > > > > > >
> > > > > > > > > > > I've just removed the RC source artefacts from SVN.
> > > > > > > > > > >
> > > > > > > > > > > We have two remaining post release tasks:
> > > > > > > > > > > - homebrew
> > > > > > > > > > > - apidocs
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Oct 7, 2019 at 1:47 AM Andy Grove <
> > > > > andygrov...@gmail.com>
> > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > I released the Rust crates from the RC2 source
> tarball. I
> > > > > had to
> > > > > > > comment
> > > > > > > > > > > > out the benchmark references in the Cargo.toml first
> > > since
> > > > > the
> > > > > > > tarball does
> > > > > > > > > > > > not include the benchmark source code. I filed
> > > > > > > > > > > > https://issues.apache.org/jira/browse/ARROW-

Re: [VOTE] Release Apache Arrow 0.15.0 - RC2

2019-10-12 Thread Wes McKinney
I just updated them. In future releases feel free to ping me to update
them. I just added a bash script to my toolchain to automate it

https://github.com/wesm/dev-toolchain/commit/97b9dd8a7a3b04f93da9d49bb6d476cc1c48f1d4#diff-3110d4bcfc0b68bbc695e5f54179550bR466

On Fri, Oct 11, 2019 at 1:36 PM Krisztián Szűcs
 wrote:
>
> I'm not sure that I can find time for it during the weekend, but I'll try
> to.
>
>
> On Fri, Oct 11, 2019 at 3:32 PM Wes McKinney  wrote:
>
> > That work isn't done. We can update the docs later when it's written.
> >
> > On Fri, Oct 11, 2019, 3:30 AM Krisztián Szűcs 
> > wrote:
> >
> > > There is still an open issue about the flight docs [1], once it is done
> > > I can update the docs.
> > >
> > > [1]: https://issues.apache.org/jira/browse/ARROW-6390
> > >
> > > On Thu, Oct 10, 2019 at 11:32 PM Wes McKinney 
> > wrote:
> > >
> > > > @Neal -- that's fine, I just want to make sure that in the next
> > > > release there is a responsible party (the RM?) to seek out someone to
> > > > help build the documentation rather than let it sit silently
> > > > unpublished for a week or two. So we may just want to amend the RM
> > > > guide to include "Find someone to update the docs if you cannot do it
> > > > yourself"
> > > >
> > > > On Thu, Oct 10, 2019 at 3:01 PM Joris Van den Bossche
> > > >  wrote:
> > > > >
> > > > > Wes, if you don't get to it today, I can try to update the docs
> > > tomorrow.
> > > > >
> > > > > Joris
> > > > >
> > > > > On Thu, 10 Oct 2019 at 21:51, Neal Richardson <
> > > > neal.p.richard...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > I updated the R docs because I had everything I needed to do that
> > > > > > locally: https://github.com/apache/arrow-site/pull/30 Doing the
> > > others
> > > > > > wasn't feasible for me on my computer (I don't have CUDA, and the
> > > case
> > > > > > insensitivity of the macOS file system always bites me with the
> > > > > > pyarrow docs anyway).
> > > > > >
> > > > > > IMO improving our CI/CD around documentation should be a priority
> > for
> > > > 1.0.
> > > > > >
> > > > > > Neal
> > > > > >
> > > > > > On Thu, Oct 10, 2019 at 12:03 PM Wes McKinney  > >
> > > > wrote:
> > > > > > >
> > > > > > > The docs on http://arrow.apache.org/docs/ haven't been updated
> > > yet.
> > > > > > > This happened the last release, too -- I ended up updating the
> > docs
> > > > > > > manually after a week or two. Is this included in the release
> > > > > > > management guide? If no one beats me to it, I can update the docs
> > > by
> > > > > > > hand again later today
> > > > > > >
> > > > > > > On Mon, Oct 7, 2019 at 6:20 PM Wes McKinney  > >
> > > > wrote:
> > > > > > > >
> > > > > > > > I think we might be a little aggressive at removing artifacts
> > > from
> > > > the
> > > > > > > > dist system
> > > > > > > >
> > > > > > > > Can we change our process to only remove old dist artifacts
> > when
> > > we
> > > > > > > > are about to upload a new RC? Otherwise it's harder to make
> > > > > > > > improvements to the release verification scripts without any
> > old
> > > > RC to
> > > > > > > > test against
> > > > > > > >
> > > > > > > > On Mon, Oct 7, 2019 at 5:17 PM Neal Richardson
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > The R package has been accepted by CRAN. Binaries for macOS
> > and
> > > > > > > > > Windows should become available in the next few days.
> > > > > > > > >
> > > > > > > > > Neal
> > > > > > > > >
> > > > > > > > > On Mon, Oct 7, 2019 at 1:41 AM Krisztián Szűcs
> > > > > > > > >  wrote:
> > > > > > > > > >
> > > > > > > > > > Thanks Andy!
> > > > > > > > > >
> > > > > > > > > > I've just removed the RC source artefacts from SVN.
> > > > > > > > > >
> > > > > > > > > > We have two remaining post release tasks:
> > > > > > > > > > - homebrew
> > > > > > > > > > - apidocs
> > > > > > > > > >
> > > > > > > > > > On Mon, Oct 7, 2019 at 1:47 AM Andy Grove <
> > > > andygrov...@gmail.com>
> > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > I released the Rust crates from the RC2 source tarball. I
> > > > had to
> > > > > > comment
> > > > > > > > > > > out the benchmark references in the Cargo.toml first
> > since
> > > > the
> > > > > > tarball does
> > > > > > > > > > > not include the benchmark source code. I filed
> > > > > > > > > > > https://issues.apache.org/jira/browse/ARROW-6801 for
> > this
> > > > bug
> > > > > > and will
> > > > > > > > > > > fix the packaging before the 1.0.0 release.
> > > > > > > > > > >
> > > > > > > > > > > On Sun, Oct 6, 2019 at 2:01 AM Krisztián Szűcs <
> > > > > > szucs.kriszt...@gmail.com>
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > >> The rust publishing script fails because it cannot find
> > > the
> > > > > > benchmarks.
> > > > > > > > > > >> Seems to be related to cargo changes.
> > > > > > > > > > >> I cannot investigate it right now, @Andy could you take
> > a
> > > > look?
> > > > > > > > > > >>
> > > > > > > > > > >> On Sun,

Re: [DISCUSS] Proposal about integration test of arrow parquet reader

2019-10-12 Thread Andy Grove
I also think that there are valid use cases for checking in binary files,
but we have to be careful not to abuse this. For example, we might want to
check in a Parquet file created by a particular version of Apache Spark to
ensure that Arrow implementations can read it successfully (hypothetical
example).

It would also be good to have a small set of Parquet files using every
possible data type that all implementations can use in their tests. I
suppose we might want one set per Arrow format version as well.

The problem we have now, in my opinion, is that we're proposing adding
files on a pretty ad-hoc basis, driven by the needs of individual
contributors in one language implementation, and this is perhaps happening
because we don't already have a good set of standard test files.

Renjie - perhaps you could comment on this. If we had these standard files
covering all data types, would that have worked for you in this instance?

Thanks,

Andy.

On Sat, Oct 12, 2019 at 12:03 AM Micah Kornfield 
wrote:

> Hi Wes,
> >
> > I additionally would prefer generating the test corpus at test time
> > rather than checking in binary files.
>
>
> Can you elaborate on this? I think both generated on the fly and example
> files are useful.
>
> The checked in files catch regressions even when readers/writers can read
> their own data but they have either incorrect or undefined behavior in
> regards to the specification (for example I would imagine checking in a
> file as part of the fix for ARROW-6844
> ).
>
> Thanks,
> Micah
>
> On Thu, Oct 10, 2019 at 5:30 PM Renjie Liu 
> wrote:
>
> > Thanks Wes. Sure, I'll fix it.
> >
> > Wes McKinney  wrote on Fri, Oct 11, 2019 at 6:10 AM:
> >
> > > I just merged the PR https://github.com/apache/arrow-testing/pull/11
> > >
> > > Various aspects of this make me uncomfortable so I hope they can be
> > > addressed in follow up work
> > >
> > > On Thu, Oct 10, 2019 at 5:41 AM Renjie Liu 
> > > wrote:
> > > >
> > > > I've created a ticket to track this here:
> > > > https://issues.apache.org/jira/browse/ARROW-6845
> > > >
> > > > For the moment, can we check in the pregenerated data to unblock the
> > > > Rust version's Arrow reader?
> > > >
> > > > On Thu, Oct 10, 2019 at 1:20 PM Renjie Liu 
> > > wrote:
> > > >
> > > > > It would be fine in that case.
> > > > >
> > > > > Wes McKinney  wrote on Thu, Oct 10, 2019 at 12:58 PM:
> > > > >
> > > > >> On Wed, Oct 9, 2019 at 10:16 PM Renjie Liu <
> liurenjie2...@gmail.com
> > >
> > > > >> wrote:
> > > > >> >
> > > > >> > 1. There already exists a low-level Parquet writer which can
> > > > >> > produce Parquet files, so unit tests should be fine. But the writer
> > > > >> > from Arrow to Parquet doesn't exist yet, and it may take some time
> > > > >> > to finish it.
> > > > >> > 2. In fact my data are randomly generated and it's definitely
> > > > >> > reproducible. However, I don't think it would be a good idea to
> > > > >> > randomly generate data every time we run CI because it would be
> > > > >> > difficult to debug. For example, if PR A introduced a bug which is
> > > > >> > triggered in another PR's build, it would be confusing for
> > > > >> > contributors.
> > > > >>
> > > > >> Presumably any random data generation would use a fixed seed
> > > > >> precisely to be reproducible.
> > > > >>
> > > > >> > 3. I think it would be a good idea to spend effort on integration
> > > > >> > tests with Parquet because it's an important use case of Arrow.
> > > > >> > Also, a similar approach could be extended to other languages and
> > > > >> > other file formats (Avro, ORC).
> > > > >> >
> > > > >> >
> > > > >> > On Wed, Oct 9, 2019 at 11:08 PM Wes McKinney <
> wesmck...@gmail.com
> > >
> > > > >> wrote:
> > > > >> >
> > > > >> > > There are a number of issues worth discussion.
> > > > >> > >
> > > > >> > > 1. What is the timeline/plan for Rust implementing a Parquet
> > > _writer_?
> > > > >> > > It's OK to be reliant on other libraries in the short term to
> > > produce
> > > > >> > > files to test against, but does not strike me as a sustainable
> > > > >> > > long-term plan. Fixing bugs can be a lot more difficult than
> it
> > > needs
> > > > >> > > to be if you can't write targeted "endogenous" unit tests
> > > > >> > >
> > > > >> > > 2. Reproducible data generation
> > > > >> > >
> > > > >> > > I think if you're going to test against a pre-generated
> corpus,
> > > you
> > > > >> > > should make sure that generating the corpus is reproducible
> for
> > > other
> > > > >> > > developers (i.e. with a Dockerfile), and can be extended by
> > > adding new
> > > > >> > > files or random data generation.
> > > > >> > >
> > > > >> > > I additionally would prefer generating the test corpus at test
> > > time
> > > > >> > > rather than checking in binary files. If this isn't viable
> right
> > > now
> > > > >> > > we can create an "arrow-rust-crutch" git repository for you to
> > > st

Re: [DISCUSS] Understanding Arrow's CI problems and needs

2019-10-12 Thread Andy Grove
I've started a new section to discuss proposals and current initiatives. I
know some of us have been working on some things but without much
coordination so far. It would be good to track these efforts so everyone
can comment on them.

On Fri, Oct 11, 2019 at 11:11 AM Wes McKinney  wrote:

> It seems some time has passed here. Would some others like to read the
> document and comment? This is important stuff.
>
> On Wed, Oct 2, 2019 at 2:20 PM Krisztián Szűcs
>  wrote:
> >
> > The current document summarizes the current situation well, but in
> > order to properly compare and eventually select a solution we need a
> > detailed list of explicit features with some sort of classification,
> > like should-have/must-have. For example, our future CI system must support
> > "PRs from forks". After filling in this table for the alternatives we can
> > have a much clearer picture.
> >
> > On Wed, Oct 2, 2019 at 4:06 PM Wes McKinney  wrote:
> >
> > > I reviewed the document, thanks for putting it together! I think it
> > > captures most of the requirements and the challenges that we are
> > > currently facing. I think that anyone who is actively contributing to
> > > the project or merging pull requests should read this document since
> > > this affects all of us.
> > >
> > > On Tue, Oct 1, 2019 at 1:55 PM Wes McKinney 
> wrote:
> > > >
> > > > Thanks Neal for starting this discussion. I will review and comment.
> > > >
> > > > I will say that as a maintainer the current situation is very nearly
> > > > intolerable. As by far and away the most prolific merger-of-PRs [1],
> > > > I've been negatively affected by the long queueing times and delayed
> > > > feedback cycles. The project would not be able to accommodate 2x or
> 5x
> > > > the volume of PRs that we have now, and so it is urgent that we
> > > > develop a scalable cross-platform CI solution that is under this
> > > > community's control and does not require a high maintenance burden,
> so
> > > > if we need to increase the amount of resources dedicated to CI we can
> > > > unilaterally do so.
> > > >
> > > > [1]: https://gist.github.com/wesm/78bfda4cef3b23a5193cf4fb8a6540fb
> > > >
> > > > On Tue, Oct 1, 2019 at 1:38 PM Neal Richardson
> > > >  wrote:
> > > > >
> > > > > Hi all,
> > > > > Over the last few months, I've seen a lot of frustration and
> > > > > discussion around the shortcomings of our current CI. I'm also
> seeing
> > > > > debate over a few possible solutions; unfortunately, the debates
> tend
> > > > > not to resolve in a clear, decisive way, and we end up having the
> same
> > > > > debates repeatedly.
> > > > >
> > > > > In my experience, this pattern often happens when there's not a
> shared
> > > > > understanding of the problems we're trying to solve--it's hard to
> > > > > agree on a solution if we don't agree on the problem. To help us
> reach
> > > > > consensus on the problems, I've started a document:
> > > > >
> > >
> https://docs.google.com/document/d/1fToW48TO-B9T8VRi0_Z30fDJkjOrBisc-Fr8Epl50s4/edit#
> > > > >
> > > > > Please have a look and add/edit freely. I've tried to capture the
> > > > > arguments I've seen go by the mailing list, as well as some from my
> > > > > own experience, but if I've mischaracterized anything, please
> rectify.
> > > > >
> > > > > I know several people have been exploring some potential solutions,
> > > > > and I hope this document can help us begin to discuss their
> relative
> > > > > merits more objectively and practically.
> > > > >
> > > > > Neal
> > >
>


Re: [NIGHTLY] Arrow Build Report for Job nightly-2019-10-12-0

2019-10-12 Thread Neal Richardson
I just merged https://github.com/apache/arrow/pull/5617 so hopefully
that will address the Docker failures on CircleCI.

Neal

On Sat, Oct 12, 2019 at 5:01 AM Crossbow  wrote:
>
>
> Arrow Build Report for Job nightly-2019-10-12-0
>
> All tasks: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0
>
> Failed Tasks:
> - wheel-manylinux2010-cp35m:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-travis-wheel-manylinux2010-cp35m
> - ubuntu-disco:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-azure-ubuntu-disco
> - docker-clang-format:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-circle-docker-clang-format
> - docker-python-3.6-nopandas:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-circle-docker-python-3.6-nopandas
> - docker-go:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-circle-docker-go
> - docker-js:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-circle-docker-js
> - docker-docs:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-circle-docker-docs
> - docker-iwyu:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-circle-docker-iwyu
> - docker-pandas-master:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-circle-docker-pandas-master
> - docker-cpp-static-only:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-circle-docker-cpp-static-only
> - docker-c_glib:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-circle-docker-c_glib
> - docker-python-2.7-nopandas:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-circle-docker-python-2.7-nopandas
> - gandiva-jar-trusty:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-travis-gandiva-jar-trusty
> - docker-java:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-circle-docker-java
> - docker-cpp-release:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-circle-docker-cpp-release
> - wheel-win-cp37m:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-appveyor-wheel-win-cp37m
> - docker-r-sanitizer:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-circle-docker-r-sanitizer
> - docker-rust:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-circle-docker-rust
> - docker-lint:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-circle-docker-lint
> - docker-turbodbc-integration:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-circle-docker-turbodbc-integration
> - wheel-win-cp36m:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-appveyor-wheel-win-cp36m
> - wheel-manylinux1-cp35m:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-travis-wheel-manylinux1-cp35m
> - wheel-osx-cp37m:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-travis-wheel-osx-cp37m
> - docker-python-3.6:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-circle-docker-python-3.6
> - debian-buster:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-azure-debian-buster
> - docker-hdfs-integration:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-circle-docker-hdfs-integration
> - wheel-osx-cp36m:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-travis-wheel-osx-cp36m
> - wheel-osx-cp35m:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-travis-wheel-osx-cp35m
> - wheel-osx-cp27m:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-travis-wheel-osx-cp27m
> - docker-cpp:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-circle-docker-cpp
> - docker-python-3.7:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-circle-docker-python-3.7
> - debian-stretch:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-azure-debian-stretch
> - docker-spark-integration:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-circle-docker-spark-integration
> - docker-cpp-cmake32:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-circle-docker-

[jira] [Created] (ARROW-6867) [FlightRPC][Java] Flight server can hang JVM on shutdown

2019-10-12 Thread David Li (Jira)
David Li created ARROW-6867:
---

 Summary: [FlightRPC][Java] Flight server can hang JVM on shutdown
 Key: ARROW-6867
 URL: https://issues.apache.org/jira/browse/ARROW-6867
 Project: Apache Arrow
  Issue Type: Bug
  Components: FlightRPC, Java
Affects Versions: 0.15.0
Reporter: David Li
Assignee: David Li
 Fix For: 1.0.0


I noticed this while working on Flight integration tests. FlightService keeps 
an executor, which can hang the JVM on shutdown if the executor itself is not 
shut down.

It's used by Handshake and DoPut.

I think this surfaced because I wrote an AuthHandler that threw an exception.
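
For context, a JDK-only sketch of the failure mode (illustrative, not the
FlightService code itself): a pool of non-daemon worker threads keeps the JVM
alive after main() returns unless the pool is shut down explicitly.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class ExecutorShutdownSketch {
      public static void main(String[] args) throws InterruptedException {
        // Fixed-size pools use non-daemon worker threads that never exit on
        // their own, so the JVM cannot terminate while the pool is alive.
        ExecutorService executor = Executors.newFixedThreadPool(4);
        executor.submit(() -> System.out.println("handling an RPC"));
        // Without the two lines below, the process hangs at shutdown.
        executor.shutdown();
        executor.awaitTermination(5, TimeUnit.SECONDS);
      }
    }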





[jira] [Created] (ARROW-6866) [Java] Improve the performance of calculating hash code for struct vector

2019-10-12 Thread Liya Fan (Jira)
Liya Fan created ARROW-6866:
---

 Summary: [Java] Improve the performance of calculating hash code 
for struct vector
 Key: ARROW-6866
 URL: https://issues.apache.org/jira/browse/ARROW-6866
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: Liya Fan
Assignee: Liya Fan


Improve the performance of the hashCode(int) method for StructVector:
1. We can get the child vectors directly, so there is no need to get the name
from each child vector and then use the name to look up the vector again.
2. The child vectors cannot be null, so there is no need to check for that.

The performance improvement depends on the complexity of the hash algorithm.
For computationally intensive hash algorithms, the improvement can be small,
while for simple hash algorithms, the improvement can be notable.
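
A rough sketch of the shape of the optimization (illustrative only; the helper
class is made up, and it assumes a per-row ValueVector#hashCode(int) like the
one referenced in the summary): iterate the child vectors directly rather than
resolving each child by name for every row.

    import java.util.List;

    import org.apache.arrow.vector.ValueVector;

    final class StructHashSketch {
      // The children are obtained once from the struct vector and reused for
      // every row, avoiding a by-name lookup (and a null check) per child.
      static int rowHashCode(List<ValueVector> children, int index) {
        int hash = 0;
        for (ValueVector child : children) {
          hash = 31 * hash + child.hashCode(index);  // per-row hash of the child
        }
        return hash;
      }
    }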





[NIGHTLY] Arrow Build Report for Job nightly-2019-10-12-0

2019-10-12 Thread Crossbow


Arrow Build Report for Job nightly-2019-10-12-0

All tasks: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0

Failed Tasks:
- wheel-manylinux2010-cp35m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-travis-wheel-manylinux2010-cp35m
- ubuntu-disco:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-azure-ubuntu-disco
- docker-clang-format:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-circle-docker-clang-format
- docker-python-3.6-nopandas:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-circle-docker-python-3.6-nopandas
- docker-go:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-circle-docker-go
- docker-js:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-circle-docker-js
- docker-docs:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-circle-docker-docs
- docker-iwyu:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-circle-docker-iwyu
- docker-pandas-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-circle-docker-pandas-master
- docker-cpp-static-only:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-circle-docker-cpp-static-only
- docker-c_glib:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-circle-docker-c_glib
- docker-python-2.7-nopandas:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-circle-docker-python-2.7-nopandas
- gandiva-jar-trusty:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-travis-gandiva-jar-trusty
- docker-java:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-circle-docker-java
- docker-cpp-release:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-circle-docker-cpp-release
- wheel-win-cp37m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-appveyor-wheel-win-cp37m
- docker-r-sanitizer:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-circle-docker-r-sanitizer
- docker-rust:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-circle-docker-rust
- docker-lint:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-circle-docker-lint
- docker-turbodbc-integration:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-circle-docker-turbodbc-integration
- wheel-win-cp36m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-appveyor-wheel-win-cp36m
- wheel-manylinux1-cp35m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-travis-wheel-manylinux1-cp35m
- wheel-osx-cp37m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-travis-wheel-osx-cp37m
- docker-python-3.6:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-circle-docker-python-3.6
- debian-buster:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-azure-debian-buster
- docker-hdfs-integration:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-circle-docker-hdfs-integration
- wheel-osx-cp36m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-travis-wheel-osx-cp36m
- wheel-osx-cp35m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-travis-wheel-osx-cp35m
- wheel-osx-cp27m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-travis-wheel-osx-cp27m
- docker-cpp:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-circle-docker-cpp
- docker-python-3.7:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-circle-docker-python-3.7
- debian-stretch:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-azure-debian-stretch
- docker-spark-integration:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-circle-docker-spark-integration
- docker-cpp-cmake32:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-circle-docker-cpp-cmake32
- docker-r-conda:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-circle-docker-r-conda
- docker-dask-integration:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-circle-docker-dask-integration
- gandiva-jar-osx:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-12-0-travi

[jira] [Created] (ARROW-6865) [Java] Improve the performance of comparing an ArrowBuf against a byte array

2019-10-12 Thread Liya Fan (Jira)
Liya Fan created ARROW-6865:
---

 Summary: [Java] Improve the performance of comparing an ArrowBuf 
against a byte array
 Key: ARROW-6865
 URL: https://issues.apache.org/jira/browse/ARROW-6865
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: Liya Fan
Assignee: Liya Fan


We change the way of comparing an ArrowBuf against a byte array from byte-wise
comparison to comparison by long/int/byte.

Benchmarks show a 6.7x performance improvement.
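
The idea, sketched below with plain java.nio.ByteBuffer rather than the actual
ArrowBuf API (the helper and its offset parameter are illustrative, and the
real change also compares int-sized chunks), is to compare eight bytes at a
time as longs and fall back to byte-wise comparison only for the tail:

    import java.nio.ByteBuffer;

    final class WordWiseEquals {
      // Compare a buffer region against a byte array, 8 bytes at a time.
      // Both sides are read with the same (default, big-endian) byte order,
      // so the long comparison is equivalent to comparing the raw bytes.
      static boolean equalsWordWise(ByteBuffer buf, int offset, byte[] expected) {
        ByteBuffer other = ByteBuffer.wrap(expected);
        int i = 0;
        for (; i + Long.BYTES <= expected.length; i += Long.BYTES) {
          if (buf.getLong(offset + i) != other.getLong(i)) {
            return false;  // 8-byte chunks differ
          }
        }
        for (; i < expected.length; i++) {
          if (buf.get(offset + i) != other.get(i)) {
            return false;  // tail bytes differ
          }
        }
        return true;
      }
    }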





[jira] [Created] (ARROW-6864) [C++] bz2 / zstd tests not enabled

2019-10-12 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-6864:
-

 Summary: [C++] bz2 / zstd tests not enabled
 Key: ARROW-6864
 URL: https://issues.apache.org/jira/browse/ARROW-6864
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 0.15.0
Reporter: Antoine Pitrou


When passing {{-DARROW_WITH_ZSTD=on}} and {{-DARROW_WITH_BZ2=on}}, the relevant 
tests in {{arrow-compression-test}} and {{arrow-io-compressed-test}} are still 
not enabled.





[jira] [Created] (ARROW-6863) [Java] Provide parallel searcher

2019-10-12 Thread Liya Fan (Jira)
Liya Fan created ARROW-6863:
---

 Summary: [Java] Provide parallel searcher
 Key: ARROW-6863
 URL: https://issues.apache.org/jira/browse/ARROW-6863
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Java
Reporter: Liya Fan
Assignee: Liya Fan


For scenarios where the vector is large and a low response time is required, we 
need to search the vector in parallel to improve responsiveness.

This issue tries to provide a parallel searcher for equality semantics (support 
for ordering semantics is not ready yet, as we need a way to distribute the 
comparator).

The implementation is based on multi-threading.
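
As an illustration of the multi-threaded approach (a self-contained sketch over
a plain int[] as a stand-in for a vector, not the actual searcher API): split
the index range into slices, search each slice on its own thread, and take the
first match in slice order.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    final class ParallelEqualitySearch {
      // Returns the index of the first element equal to `target`, or -1.
      static int parallelIndexOf(int[] values, int target, int numThreads)
          throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
          int chunk = (values.length + numThreads - 1) / numThreads;
          List<Future<Integer>> futures = new ArrayList<>();
          for (int t = 0; t < numThreads; t++) {
            final int start = t * chunk;
            final int end = Math.min(values.length, start + chunk);
            futures.add(pool.submit(() -> {
              for (int i = start; i < end; i++) {
                if (values[i] == target) {
                  return i;       // first match inside this slice
                }
              }
              return -1;          // no match in this slice
            }));
          }
          // Slices are scanned in index order, so the first non-negative
          // result is the overall first match.
          for (Future<Integer> future : futures) {
            int index = future.get();
            if (index >= 0) {
              return index;
            }
          }
          return -1;
        } finally {
          pool.shutdown();
        }
      }
    }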


