Re: Coordinating / scheduling C++ Parquet-Arrow nested data work (ARROW-1644 and others)

2020-04-16 Thread Micah Kornfield
Hi Wes,
Thanks, that seems like a good characterization.  I opened some JIRA
subtasks on ARROW-1644 which go into a little more detail on tasks that can
probably be worked on in parallel (I've only assigned to myself the ones
I'm actively working on; happy to discuss/collaborate on the finer
points on the JIRAs).  There will probably be a few more JIRAs to open for
the final integration work (e.g. a flag to switch between the old and new
engines).

For unit tests (Item B): as noted earlier in the thread, there is already a
disabled unit test trying to verify the basic ability to round-trip, but that
probably isn't sufficient.

Thanks,
Micah

On Wed, Apr 15, 2020 at 9:32 AM Wes McKinney  wrote:

> hi Micah,
>
> Sounds good. It seems like there are a few projects where people might
> be able to work without stepping on each other's toes
>
> A. Array reassembly from raw repetition/definition levels (I would
> guess this would be your focus)
> B. Schema and data generation for round-trip correctness and
> performance testing (I reckon that the unit tests for A will largely
> be hand-written examples like you did for the write path)
> C. Benchmarks, particularly to be able to assess performance changes
> going from the old incomplete implementations to the new ones
>
> Some of us should be able to pitch in to help with this. Might also be
> a good opportunity to do some cleanup of the test code in
> cpp/src/parquet/arrow
>
> - Wes
>
> On Tue, Apr 14, 2020 at 11:19 PM Micah Kornfield 
> wrote:
> >
> > Hi Wes,
> > Yes, I'm making progress and at this point I anticipate being able to
> finish it off by next release, possibly without support for round tripping
> fixed size lists.  I've been spending some time thinking about different
> approaches and have started coding some of the building blocks, which I
> think in the common case (relatively low nesting levels) should be fairly
> performant (I'm also going to write some benchmarks to sanity check this).
> One caveat to this is that my schedule is going to change slightly next week and
> it's possible my bandwidth might be more limited; I'll update the list if
> this happens.
> >
> > I think there are at least two areas that I'm not working on that could
> be parallelized if you or your team has bandwidth.
> >
> > 1. It would be good to have some parquet files representing real world
> datasets available to benchmark against.
> > 2. The higher-level bookkeeping of tracking which def-levels/rep-levels
> need to be compared against for any particular column (i.e. the preceding
> repeated parent).  I'm currently working on the code that takes these and
> converts them to offsets/null fields.
> >
> > I can go into more details if you or your team would like to collaborate.
> >
> > Thanks,
> > Micah
> >
> > On Tue, Apr 14, 2020 at 7:48 AM Wes McKinney 
> wrote:
> >>
> >> hi Micah,
> >>
> >> I'm glad that we have the write side of nested data completed for 0.17.0.
> >>
> >> As far as completing the read side and then implementing sufficient
> >> testing to exercise corner cases in end-to-end reads/writes, do you
> >> anticipate being able to work on this in the next 4-6 weeks (obviously
> >> the state of the world has affected everyone's availability /
> >> bandwidth)? I ask because someone from my team (or me also) may be
> >> able to get involved and help this move along. It'd be great to have
> >> this 100% completed and checked off our list for the next release
> >> (i.e. 0.18.0 or 1.0.0 depending on whether the Java/C++ integration
> >> tests get completed also)
> >>
> >> thanks
> >> Wes
> >>
> >> On Wed, Feb 5, 2020 at 12:12 AM Micah Kornfield 
> wrote:
> >> >>
> >> >> Glad to hear about the progress. As I mentioned on #2, what do you
> >> >> think about setting up a feature branch for you to merge PRs into?
> >> >> Then the branch can be iterated on and we can merge it back when it's
> >> >> feature complete and does not have perf regressions for the flat
> >> >> read/write path.
> >> >>
> >> > I'd like to avoid a separate branch if possible.  I'm willing to
> close the open PR till I'm sure it is needed, but I'm hoping that keeping PRs as
> small and focused as possible, with performance testing along the way, will be a
> better reviewer and developer experience here.
> >> >
> >> >> The earliest I'd have time to work on this myself would likely be
> >> >> sometime in March. Others are welcome to jump in as well (and it'd be
> >> >> great to increase the overall level of knowledge of the Parquet
> >> >> codebase)
> >> >
> >> > Hopefully, Igor can help out otherwise I'll take up the read path
> after I finish the write path.
> >> >
> >> > -Micah
> >> >
> >> > On Tue, Feb 4, 2020 at 3:31 PM Wes McKinney 
> wrote:
> >> >>
> >> >> hi Micah
> >> >>
> >> >> On Mon, Feb 3, 2020 at 12:01 AM Micah Kornfield <
> emkornfi...@gmail.com> wrote:
> >> >> >
> >> >> > Just to give an update.  I've been a little bit delayed, but my
> progress is
> >> >> > as follows:
> >> >> > 1.  Had 1 PR merged that will

[jira] [Created] (ARROW-8495) [C++] Implement non-vectorized array reconstruction logic.

2020-04-16 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-8495:
--

 Summary: [C++] Implement non-vectorized array reconstruction logic.
 Key: ARROW-8495
 URL: https://issues.apache.org/jira/browse/ARROW-8495
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Micah Kornfield


In contrast to the "vectorized" reassembly, this would scan:

```
for each rep/def level entry:
    for each field:
        update null bitmask and offsets
```
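The per-entry scan can be sketched as follows. This is a hypothetical illustration (here in Python, not the actual Arrow C++ code) for the simplest case of a required `List<int32>` column; the level-encoding conventions assumed are: rep level 0 starts a new list, def level 1 means an element is present, def level 0 means the list is empty.

```python
def decode_list_levels(rep_levels, def_levels, values):
    """Reconstruct list offsets and values from rep/def levels,
    one level entry at a time (the non-vectorized approach)."""
    offsets = [0]          # Arrow list offsets: len(offsets) == n_lists + 1
    out_values = []
    it = iter(values)
    for rep, d in zip(rep_levels, def_levels):
        if rep == 0:                 # a new top-level list begins
            offsets.append(offsets[-1])
        if d == 1:                   # element present at the max def level
            offsets[-1] += 1
            out_values.append(next(it))
    return offsets, out_values

# Lists [[1, 2], [], [3]] encode to these levels:
rep = [0, 1, 0, 0]
defs = [1, 1, 0, 1]
offsets, out = decode_list_levels(rep, defs, [1, 2, 3])
assert offsets == [0, 2, 2, 3]
assert out == [1, 2, 3]
```

Nullable fields and deeper nesting add more definition-level thresholds per field, but the shape of the loop stays the same.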



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8494) [C++] Implement vectorized array reassembly logic

2020-04-16 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-8494:
--

 Summary: [C++] Implement vectorized array reassembly logic
 Key: ARROW-8494
 URL: https://issues.apache.org/jira/browse/ARROW-8494
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: C++
Reporter: Micah Kornfield
Assignee: Micah Kornfield


This logic would attempt to create the data necessary for each field by passing 
through the levels once per field.  It is expected that due to SIMD this 
will perform better for nested data with shallow nesting, but due to repetitive 
computation it might perform worse for deeply nested data that includes List types.
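The per-field pass is amenable to vectorization because each level entry is an independent comparison. A minimal scalar sketch (Python for illustration; the function name and the assumption that a leaf is valid at or above a per-field definition-level threshold are hypothetical, and a real implementation would use SIMD over the level arrays):

```python
def validity_from_def_levels(def_levels, valid_def_threshold):
    """One pass over all def levels for a single field: each element is
    an independent comparison, so many levels can be processed per
    SIMD instruction in a native implementation."""
    return [1 if d >= valid_def_threshold else 0 for d in def_levels]

# Optional int32 leaf whose max def level is 2: def >= 2 means "present".
assert validity_from_def_levels([2, 1, 2, 0], 2) == [1, 0, 1, 0]
```

Since the same level array is re-scanned once per field, the work grows with nesting depth, which is why deeply nested schemas may favor the single-pass approach.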





[jira] [Created] (ARROW-8493) [C++] Create unified schema resolution code for Array reconstruction.

2020-04-16 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-8493:
--

 Summary: [C++] Create unified schema resolution code for Array 
reconstruction.
 Key: ARROW-8493
 URL: https://issues.apache.org/jira/browse/ARROW-8493
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: C++
Reporter: Micah Kornfield


We need a function/class that can take the Parquet schema and a proposed Arrow 
schema (potentially retrieved from Parquet metadata) and output a data 
structure that contains all of the information in "SchemaField" plus the 
following additions:

1.  The corresponding definition level for nullability (wouldn't be populated for 
non-null arrays).

2.  The corresponding repetition level for lists (wouldn't be populated for 
non-lists).

One option is to augment and populate these on the SchemaField.
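The level bookkeeping that resolution has to produce follows the standard Parquet rules: each optional node adds one definition level, and each repeated node adds one definition level and one repetition level. A toy sketch of that walk (Python, with the function name and the simplified node kinds being illustrative assumptions, not the actual C++ API):

```python
def resolve_levels(path_from_root):
    """Compute the max definition and repetition levels for a leaf,
    given the repetition kinds of its ancestors (root first).

    Parquet rules assumed here:
      optional -> +1 def level
      repeated -> +1 def level, +1 rep level
      required -> contributes nothing
    """
    max_def = 0
    max_rep = 0
    for kind in path_from_root:
        if kind == "optional":
            max_def += 1
        elif kind == "repeated":
            max_def += 1
            max_rep += 1
    return max_def, max_rep

# optional List<optional int32> in 3-level encoding:
# optional group -> repeated element group -> optional leaf
assert resolve_levels(["optional", "repeated", "optional"]) == (3, 1)
```

The resolution output would attach these per-field thresholds (plus the list/nullability distinctions above) to each SchemaField so reconstruction never has to re-derive them.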





[jira] [Created] (ARROW-8492) [C++] Create randomized nested data generation round trip read/write unit tests

2020-04-16 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-8492:
--

 Summary: [C++] Create randomized nested data generation round trip 
read/write unit tests
 Key: ARROW-8492
 URL: https://issues.apache.org/jira/browse/ARROW-8492
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: C++
Reporter: Micah Kornfield


This will hopefully help catch edge cases not caught by hand-coded 
reconstruction unit tests.
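The shape of such a randomized round-trip test can be sketched as a property test: generate random nested data, shred it to rep/def levels, reassemble, and check equality. This is a hypothetical, simplified sketch (Python, one level of required lists only; the real tests would go through the actual Parquet writer/reader):

```python
import random

def encode(lists):
    """Shred a list-of-lists into rep/def levels and packed values
    (rep 0 starts a list; def 1 = element present, def 0 = empty list)."""
    rep, defs, vals = [], [], []
    for lst in lists:
        if not lst:
            rep.append(0); defs.append(0)
        else:
            for i, v in enumerate(lst):
                rep.append(0 if i == 0 else 1)
                defs.append(1)
                vals.append(v)
    return rep, defs, vals

def decode(rep, defs, vals):
    """Reassemble the list-of-lists from levels and packed values."""
    offsets, out, it = [0], [], iter(vals)
    for r, d in zip(rep, defs):
        if r == 0:
            offsets.append(offsets[-1])
        if d == 1:
            offsets[-1] += 1
            out.append(next(it))
    return [out[offsets[i]:offsets[i + 1]] for i in range(len(offsets) - 1)]

random.seed(0)
for _ in range(100):
    data = [[random.randrange(100) for _ in range(random.randrange(4))]
            for _ in range(random.randrange(6))]
    assert decode(*encode(data)) == data
```

Randomizing schema shape (nesting depth, nullability) in addition to the data is what makes this catch edge cases hand-coded tests miss.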





[jira] [Created] (ARROW-8491) [C++][Parquet] Add benchmarks for rep/def level decoding at multiple levels

2020-04-16 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-8491:
--

 Summary: [C++][Parquet] Add benchmarks for rep/def level decoding 
at multiple levels
 Key: ARROW-8491
 URL: https://issues.apache.org/jira/browse/ARROW-8491
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: C++
Reporter: Micah Kornfield


Using synthetic data would be a good start, but building a small corpus 
of 1-10 MB real-world files to run against would also be a good idea.





[jira] [Created] (ARROW-8490) [C++] Expose a ReadValuesSpaced method that accepts a validity bitmap.

2020-04-16 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-8490:
--

 Summary: [C++] Expose a ReadValuesSpaced method that accepts a 
validity bitmap.
 Key: ARROW-8490
 URL: https://issues.apache.org/jira/browse/ARROW-8490
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Micah Kornfield


The current logic for determining spacing internally converts def levels for 
variable-sized lists.  This logic won't work for fixed-size lists.
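For illustration, a "spaced" read can be sketched as expanding densely packed values according to a caller-supplied validity bitmap, leaving holes at null slots. The function name and signature here are hypothetical illustrations, not the actual Parquet C++ API:

```python
def read_values_spaced(packed_values, validity, fill=None):
    """Expand packed non-null values into output slots, placing a fill
    value wherever the validity bitmap marks a null."""
    it = iter(packed_values)
    return [next(it) if bit else fill for bit in validity]

# 3 non-null values spaced out over 5 slots:
assert read_values_spaced([1, 2, 3], [1, 0, 1, 1, 0]) == [1, None, 2, 3, None]
```

Accepting the bitmap directly lets the caller derive it from def levels in whatever way the column requires (including fixed-size lists), instead of the reader assuming variable-sized-list semantics.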





Re: [VOTE] Release Apache Arrow 0.17.0 - RC0

2020-04-16 Thread Andy Grove
+1 (binding) based on testing the Rust implementation.

On Thu, Apr 16, 2020 at 8:12 PM Krisztián Szűcs 
wrote:

> My vote: +1 (binding)
>
> Tested on macOS Catalina.
>
> Java: OK
> `ARROW_TMPDIR=/tmp/arrow-test TEST_DEFAULT=0 TEST_JAVA=1
> dev/release/verify-release-candidate.sh source 0.17.0 0`
>
> C++: OK
> `ARROW_TMPDIR=/tmp/arrow-test TEST_DEFAULT=0 TEST_CPP=1
> dev/release/verify-release-candidate.sh source 0.17.0 0`
>
> Glib, Ruby: OK, OK
> `ARROW_TMPDIR=/tmp/arrow-test TEST_DEFAULT=0 TEST_GLIB=1 TEST_RUBY=1
> dev/release/verify-release-candidate.sh source 0.17.0 0`
>
> Python: OK
> `ARROW_TMPDIR=/tmp/arrow-test TEST_DEFAULT=0 TEST_PYTHON=1
> dev/release/verify-release-candidate.sh source 0.17.0 0`
>
> Go: OK
> `ARROW_TMPDIR=/tmp/arrow-test TEST_DEFAULT=0 TEST_GO=1
> dev/release/verify-release-candidate.sh source 0.17.0 0`
>
> Rust: OK
> `ARROW_TMPDIR=/tmp/arrow-test TEST_DEFAULT=0 TEST_RUST=1
> dev/release/verify-release-candidate.sh source 0.17.0 0`
>
> JS: OK
> `ARROW_TMPDIR=/tmp/arrow-test TEST_DEFAULT=0 TEST_JS=1
> dev/release/verify-release-candidate.sh source 0.17.0 0`
>
> C#: OK with sourcelink errors [1]
> `TEST_DEFAULT=0 TEST_CSHARP=1 dev/release/verify-release-candidate.sh
> source 0.17.0 0`
>
> Integration: OK
> `ARROW_TMPDIR=/tmp/arrow-test TEST_DEFAULT=0 TEST_INTEGRATION=1
> dev/release/verify-release-candidate.sh source 0.17.0 0`
>
> Binaries: OK
> `BINTRAY_REPOSITORY=apache/arrow ARROW_TMPDIR=/tmp/arrow-test
> dev/release/verify-release-candidate.sh binaries 0.17.0 0`
>
> Wheels: OK
> `BINTRAY_REPOSITORY=apache/arrow TMPDIR=/tmp/arrow
> dev/release/verify-release-candidate.sh wheels 0.17.0 0`
>
> I'll try to validate the wheels on linux and windows in the upcoming days.
>
> [1]: https://gist.github.com/kszucs/64ed7b8e188d08da5e6913cafe104212
>
> On Fri, Apr 17, 2020 at 2:26 AM Krisztián Szűcs
>  wrote:
> >
> > Hi,
> >
> > I would like to propose the following release candidate (RC0) of Apache
> > Arrow version 0.17.0. This is a release consisting of 582
> > resolved JIRA issues[1].
> >
> > This release candidate is based on commit:
> > 3cbcb7b62c2f2d02851bff837758637eb592a64b [2]
> >
> > The source release rc0 is hosted at [3].
> > The binary artifacts are hosted at [4][5][6][7].
> > The changelog is located at [8].
> >
> > Please download, verify checksums and signatures, run the unit tests,
> > and vote on the release. See [9] for how to validate a release candidate.
> >
> > The vote will be open for at least 72 hours.
> >
> > [ ] +1 Release this as Apache Arrow 0.17.0
> > [ ] +0
> > [ ] -1 Do not release this as Apache Arrow 0.17.0 because...
> >
> > [1]:
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20%28Resolved%2C%20Closed%29%20AND%20fixVersion%20%3D%200.17.0
> > [2]:
> https://github.com/apache/arrow/tree/3cbcb7b62c2f2d02851bff837758637eb592a64b
> > [3]:
> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-0.17.0-rc0
> > [4]: https://bintray.com/apache/arrow/centos-rc/0.17.0-rc0
> > [5]: https://bintray.com/apache/arrow/debian-rc/0.17.0-rc0
> > [6]: https://bintray.com/apache/arrow/python-rc/0.17.0-rc0
> > [7]: https://bintray.com/apache/arrow/ubuntu-rc/0.17.0-rc0
> > [8]:
> https://github.com/apache/arrow/blob/3cbcb7b62c2f2d02851bff837758637eb592a64b/CHANGELOG.md
> > [9]:
> https://cwiki.apache.org/confluence/display/ARROW/How+to+Verify+Release+Candidates
>


Re: [VOTE] Release Apache Arrow 0.17.0 - RC0

2020-04-16 Thread Krisztián Szűcs
My vote: +1 (binding)

Tested on macOS Catalina.

Java: OK
`ARROW_TMPDIR=/tmp/arrow-test TEST_DEFAULT=0 TEST_JAVA=1
dev/release/verify-release-candidate.sh source 0.17.0 0`

C++: OK
`ARROW_TMPDIR=/tmp/arrow-test TEST_DEFAULT=0 TEST_CPP=1
dev/release/verify-release-candidate.sh source 0.17.0 0`

Glib, Ruby: OK, OK
`ARROW_TMPDIR=/tmp/arrow-test TEST_DEFAULT=0 TEST_GLIB=1 TEST_RUBY=1
dev/release/verify-release-candidate.sh source 0.17.0 0`

Python: OK
`ARROW_TMPDIR=/tmp/arrow-test TEST_DEFAULT=0 TEST_PYTHON=1
dev/release/verify-release-candidate.sh source 0.17.0 0`

Go: OK
`ARROW_TMPDIR=/tmp/arrow-test TEST_DEFAULT=0 TEST_GO=1
dev/release/verify-release-candidate.sh source 0.17.0 0`

Rust: OK
`ARROW_TMPDIR=/tmp/arrow-test TEST_DEFAULT=0 TEST_RUST=1
dev/release/verify-release-candidate.sh source 0.17.0 0`

JS: OK
`ARROW_TMPDIR=/tmp/arrow-test TEST_DEFAULT=0 TEST_JS=1
dev/release/verify-release-candidate.sh source 0.17.0 0`

C#: OK with sourcelink errors [1]
`TEST_DEFAULT=0 TEST_CSHARP=1 dev/release/verify-release-candidate.sh
source 0.17.0 0`

Integration: OK
`ARROW_TMPDIR=/tmp/arrow-test TEST_DEFAULT=0 TEST_INTEGRATION=1
dev/release/verify-release-candidate.sh source 0.17.0 0`

Binaries: OK
`BINTRAY_REPOSITORY=apache/arrow ARROW_TMPDIR=/tmp/arrow-test
dev/release/verify-release-candidate.sh binaries 0.17.0 0`

Wheels: OK
`BINTRAY_REPOSITORY=apache/arrow TMPDIR=/tmp/arrow
dev/release/verify-release-candidate.sh wheels 0.17.0 0`

I'll try to validate the wheels on linux and windows in the upcoming days.

[1]: https://gist.github.com/kszucs/64ed7b8e188d08da5e6913cafe104212

On Fri, Apr 17, 2020 at 2:26 AM Krisztián Szűcs
 wrote:
>
> Hi,
>
> I would like to propose the following release candidate (RC0) of Apache
> Arrow version 0.17.0. This is a release consisting of 582
> resolved JIRA issues[1].
>
> This release candidate is based on commit:
> 3cbcb7b62c2f2d02851bff837758637eb592a64b [2]
>
> The source release rc0 is hosted at [3].
> The binary artifacts are hosted at [4][5][6][7].
> The changelog is located at [8].
>
> Please download, verify checksums and signatures, run the unit tests,
> and vote on the release. See [9] for how to validate a release candidate.
>
> The vote will be open for at least 72 hours.
>
> [ ] +1 Release this as Apache Arrow 0.17.0
> [ ] +0
> [ ] -1 Do not release this as Apache Arrow 0.17.0 because...
>
> [1]: 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20%28Resolved%2C%20Closed%29%20AND%20fixVersion%20%3D%200.17.0
> [2]: 
> https://github.com/apache/arrow/tree/3cbcb7b62c2f2d02851bff837758637eb592a64b
> [3]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-0.17.0-rc0
> [4]: https://bintray.com/apache/arrow/centos-rc/0.17.0-rc0
> [5]: https://bintray.com/apache/arrow/debian-rc/0.17.0-rc0
> [6]: https://bintray.com/apache/arrow/python-rc/0.17.0-rc0
> [7]: https://bintray.com/apache/arrow/ubuntu-rc/0.17.0-rc0
> [8]: 
> https://github.com/apache/arrow/blob/3cbcb7b62c2f2d02851bff837758637eb592a64b/CHANGELOG.md
> [9]: 
> https://cwiki.apache.org/confluence/display/ARROW/How+to+Verify+Release+Candidates


Re: ORC JNI wrapper bugs [Re: 0.17 release procedure]

2020-04-16 Thread Fan Liya
One way to skip a test class is to place a "@Ignore" annotation in front of
the class declaration.

Best,
Liya Fan

On Thu, Apr 16, 2020 at 7:29 PM Krisztián Szűcs 
wrote:

> On Thu, Apr 16, 2020 at 11:47 AM Antoine Pitrou 
> wrote:
> >
> >
> > The ORC JNI wrapper is currently crashing on these lines:
> >
> https://github.com/apache/arrow/blob/master/cpp/src/jni/orc/jni_wrapper.cpp#L279-L281
> >
> > because in C++, certain buffers can be omitted by passing a null
> > pointer (specifically the null bitmap, if there are no nulls).
> > Therefore `buffer` in the lines above is a null pointer.
> >
> > (I tried replacing the null buffer with a 0-byte buffer: it crashes
> > further down the road...)
> >
> > Since this code has been there since ARROW-4714 was committed, my
> > intuition is that the JNI ORC wrapper was only exercised in very
> > specific use cases where C++ buffers are never null.
> >
> > My opinion is therefore that the ORC JNI tests should be ignored for
> > this release, and fixed later by some motivated developer.
> Sounds good to me. Does anyone know a way to skip certain tests with
> Maven during the Maven release? By commenting them out?
> >
> > Regards
> >
> > Antoine.
> >
> >
> > Le 16/04/2020 à 02:17, Krisztián Szűcs a écrit :
> > > Hi,
> > >
> > > We've merged the last required pull requests late today[/yesterday],
> > > so I started to cut RC0.
> > > The release process didn't go smoothly; among other smaller problems
> > > I discovered a crash with the ORC Java JNI bindings (local error [1]). It
> > > turned out that we don't run the orc-jni tests on CI. I put up a PR
> > > to enable them [2], but it has not reproduced the exact issue yet.
> > >
> > > Any help from the JNI developers would be appreciated. I can also cut
> > > RC0 with JNI disabled.
> > >
> > > [1] https://gist.github.com/kszucs/67205eda6cd19e3cd08c86894f5b4c2d
> > > [2] https://github.com/apache/arrow/pull/6953
> > >
> > > Regards, Krisztian
> > >
>


Re: [NIGHTLY] Arrow Build Report for Job nightly-2020-04-16-2

2020-04-16 Thread Krisztián Szűcs
On Fri, Apr 17, 2020 at 1:04 AM Crossbow  wrote:
>
>
> Arrow Build Report for Job nightly-2020-04-16-2
>
> All tasks: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-2
>
> Failed Tasks:
> - centos-6-amd64:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-2-github-centos-6-amd64
> - conda-win-vs2015-py36:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-2-azure-conda-win-vs2015-py36
> - conda-win-vs2015-py38:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-2-azure-conda-win-vs2015-py38
> - test-conda-cpp:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-2-github-test-conda-cpp
> - wheel-osx-cp36m:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-2-travis-wheel-osx-cp36m
> - wheel-osx-cp37m:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-2-travis-wheel-osx-cp37m
> - wheel-win-cp35m:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-2-appveyor-wheel-win-cp35m
> - wheel-win-cp36m:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-2-appveyor-wheel-win-cp36m
> - wheel-win-cp37m:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-2-appveyor-wheel-win-cp37m
> - wheel-win-cp38:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-2-appveyor-wheel-win-cp38

I manually stopped these builds in favor of the binary release
artifacts. These should turn green from tomorrow.


Thoughts and issues regarding the release procedure

2020-04-16 Thread Krisztián Szűcs
Hi,

While our release scripts have improved a lot lately, cutting the first release
candidate still takes multiple days. I wouldn't consider the overall experience
bad - especially given the complexity of the project and the number of
artifacts we produce - but we definitely need to develop more automation and
tests supporting it.
I'm not sure what the right way is to put together an action plan, but having more
manpower here would be great.

If you don't mind, I'd like to specially thank Kou for maintaining most of the
release scripts and (when not being the RM) always helping out with the
upcoming issues, I really appreciate it.

I tried to collect the problems, inconveniences I had with 0.17.0-RC0:

00-prepare.sh
-

*PREPARE_CHANGELOG* phase:

- need to set ARROW_HOME because changelog.py requires it
- changelog.py stopped working since adding support for parquet tickets [1],
because it requires the actual version to have a git tag, which is not yet
available during the release procedure (called from prepare.sh)

Extra *PREPARE_DEB_PACKAGE_NAMES* phase:

This is usually not required but the previous `so` versions were set to .100,
so I had to downgrade them:

```bash
PREPARE_DEFAULT=0 PREPARE_DEB_PACKAGE_NAMES=1 \
dev/release/00-prepare.sh 1.0.0 0.17.0
```

We should add this step to ensure that the so versions in the linux packages
are properly set, and also consider removing the previous version from the
pattern.

*PREPARE_TAG* phase:

The outstanding issue was the JNI ORC crash we have discussed on the mailing
list, for which I have a reproducer PR available [6]. I had to `@Ignore` the
crashing test to be able to release.

Minor issues:

- on OSX the brew-installed ORC doesn't work, so we need to use the bundled
source; passing `-DORC_SOURCE=BUNDLED` fixes it
- need to update the maven versions to match -SNAPSHOT with the command
`mvn versions:set -DnewVersion=0.17.0-SNAPSHOT`
- gandiva has deprecation warnings, which complicated the debugging of
the java jni orc problem [2]
- for me only the OpenJDK 8 and Maven 3.5 combination works; with newer
versions the build fails, once with a javadoc error and another time with an
unknown error
- if I rerun the script without removing the version tag then the maven
process raises an error *after* compiling everything again, effectively
losing 20 minutes.

I'm generally frustrated by the compile time required to iterate on the maven
build issues - I suppose there are better ways to invoke maven which I'm not
aware of. It would be nice to have a Java developer guide listing the
recommended commands for certain scenarios.

Another thing I dislike about the release procedure is that the source code
tagging is done by / centered around maven. I would prefer to fire the
`git tag` command explicitly rather than letting one of the many
package managers do it implicitly.

02-source.sh


The previous step produces a directory with the same name as the tag,
apache-arrow-0.17.0, which makes the script fail [3]

Binary Packaging


I had to apply two patches to fix the linux packaging builds.

[Packaging][deb] Support RC version numbers for apache-arrow-archive-keyring [4]
The packaging scripts were not properly supporting the -RC0-postfixed version
number, which is a special case because the linux binaries are built against
the apache source release rather than a git tag. While I managed to fix it,
we probably need a follow-up after the rebase.

[Packaging][rpm] Fix CentOS 6 build [5]
This issue surfaced today with the nightly builds as well; it seems that
devtoolset-6 is no longer available for CentOS 6, so I had to update it.

Building the 4 windows wheels on AppVeyor takes 4 hours because we don't have
any parallelism there. We should port the windows scripts to either Azure or
GitHub Actions.


Looking forward to the improvement ideas!

Thanks Everyone!


[1] 
https://github.com/apache/arrow/commit/636a912c4bef6803fe3fede8a050d82124b18136#diff-fc9c73b2cf4e254206ac116714cfdbf4
[2] https://gist.github.com/kszucs/08b1582ca60a86c8dd8a1ab50bb6faad
[3] https://gist.github.com/kszucs/3337e475ce751cfbf11ea45a5a8817d2
[4] 
https://github.com/kszucs/arrow/commit/2c4cb4576a04b930a295ce6838179a8cf5a16058
[5] 
https://github.com/kszucs/arrow/commit/0b245aa3404bf016488e36e22a0140813b661f40
[6] https://github.com/apache/arrow/pull/6953


[VOTE] Release Apache Arrow 0.17.0 - RC0

2020-04-16 Thread Krisztián Szűcs
Hi,

I would like to propose the following release candidate (RC0) of Apache
Arrow version 0.17.0. This is a release consisting of 582
resolved JIRA issues[1].

This release candidate is based on commit:
3cbcb7b62c2f2d02851bff837758637eb592a64b [2]

The source release rc0 is hosted at [3].
The binary artifacts are hosted at [4][5][6][7].
The changelog is located at [8].

Please download, verify checksums and signatures, run the unit tests,
and vote on the release. See [9] for how to validate a release candidate.

The vote will be open for at least 72 hours.

[ ] +1 Release this as Apache Arrow 0.17.0
[ ] +0
[ ] -1 Do not release this as Apache Arrow 0.17.0 because...

[1]: 
https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20%28Resolved%2C%20Closed%29%20AND%20fixVersion%20%3D%200.17.0
[2]: 
https://github.com/apache/arrow/tree/3cbcb7b62c2f2d02851bff837758637eb592a64b
[3]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-0.17.0-rc0
[4]: https://bintray.com/apache/arrow/centos-rc/0.17.0-rc0
[5]: https://bintray.com/apache/arrow/debian-rc/0.17.0-rc0
[6]: https://bintray.com/apache/arrow/python-rc/0.17.0-rc0
[7]: https://bintray.com/apache/arrow/ubuntu-rc/0.17.0-rc0
[8]: 
https://github.com/apache/arrow/blob/3cbcb7b62c2f2d02851bff837758637eb592a64b/CHANGELOG.md
[9]: 
https://cwiki.apache.org/confluence/display/ARROW/How+to+Verify+Release+Candidates


[NIGHTLY] Arrow Build Report for Job nightly-2020-04-16-2

2020-04-16 Thread Crossbow


Arrow Build Report for Job nightly-2020-04-16-2

All tasks: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-2

Failed Tasks:
- centos-6-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-2-github-centos-6-amd64
- conda-win-vs2015-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-2-azure-conda-win-vs2015-py36
- conda-win-vs2015-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-2-azure-conda-win-vs2015-py38
- test-conda-cpp:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-2-github-test-conda-cpp
- wheel-osx-cp36m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-2-travis-wheel-osx-cp36m
- wheel-osx-cp37m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-2-travis-wheel-osx-cp37m
- wheel-win-cp35m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-2-appveyor-wheel-win-cp35m
- wheel-win-cp36m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-2-appveyor-wheel-win-cp36m
- wheel-win-cp37m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-2-appveyor-wheel-win-cp37m
- wheel-win-cp38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-2-appveyor-wheel-win-cp38

Succeeded Tasks:
- centos-7-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-2-github-centos-7-amd64
- centos-8-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-2-github-centos-8-amd64
- conda-linux-gcc-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-2-azure-conda-linux-gcc-py36
- conda-linux-gcc-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-2-azure-conda-linux-gcc-py37
- conda-linux-gcc-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-2-azure-conda-linux-gcc-py38
- conda-osx-clang-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-2-azure-conda-osx-clang-py36
- conda-osx-clang-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-2-azure-conda-osx-clang-py37
- conda-osx-clang-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-2-azure-conda-osx-clang-py38
- conda-win-vs2015-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-2-azure-conda-win-vs2015-py37
- debian-buster-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-2-github-debian-buster-amd64
- debian-stretch-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-2-github-debian-stretch-amd64
- gandiva-jar-osx:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-2-travis-gandiva-jar-osx
- gandiva-jar-xenial:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-2-travis-gandiva-jar-xenial
- homebrew-cpp:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-2-travis-homebrew-cpp
- homebrew-r-autobrew:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-2-travis-homebrew-r-autobrew
- test-conda-cpp-valgrind:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-2-github-test-conda-cpp-valgrind
- test-conda-python-3.6:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-2-azure-test-conda-python-3.6
- test-conda-python-3.7-dask-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-2-circle-test-conda-python-3.7-dask-latest
- test-conda-python-3.7-hdfs-2.9.2:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-2-circle-test-conda-python-3.7-hdfs-2.9.2
- test-conda-python-3.7-kartothek-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-2-circle-test-conda-python-3.7-kartothek-latest
- test-conda-python-3.7-kartothek-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-2-circle-test-conda-python-3.7-kartothek-master
- test-conda-python-3.7-pandas-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-2-circle-test-conda-python-3.7-pandas-latest
- test-conda-python-3.7-pandas-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-2-circle-test-conda-python-3.7-pandas-master
- test-conda-python-3.7-spark-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-2-circle-test-conda-python-3.7-spark-master
- test-conda-python-3.7-turbodbc-latest:
  URL: 
https:

[DISCUSS] Reducing scope of work for Arrow 1.0.0 release

2020-04-16 Thread Wes McKinney
hi folks,

Previously we had discussed a plan for making a 1.0.0 release based on
completeness of columnar format integration tests and making
forward/backward compatibility guarantees as formalized in

https://github.com/apache/arrow/blob/master/docs/source/format/Versioning.rst

In particular, we wanted to demonstrate comprehensive Java/C++ interoperability.

As time has passed we have stalled out a bit on completing integration
tests for the "long tail" of data types and columnar format features.

https://docs.google.com/spreadsheets/d/1Yu68rn2XMBpAArUfCOP9LC7uHb06CQrtqKE5vQ4bQx4/edit?usp=sharing

As such I wanted to propose a reduction in scope so that we can make a
1.0.0 release sooner. The plan would be as follows:

* Endeavor to have integration tests implemented and working in at
least one reference implementation (likely to be the C++ library). It
seems important to verify that what's in Columnar.rst can be
implemented unambiguously.
* Indicate in Versioning.rst or another place in the documentation the
list of data types or advanced columnar format features (like
delta/replacement dictionaries) that are not yet fully integration
tested.

Some of the essential protocol stability details and all of the most
commonly used data types have been stable for a long time now,
particularly after the recent alignment change. The current list of
features that aren't being tested for cross-implementation
compatibility should not pose risk to downstream users.

Thoughts about this? The 1.0.0 release is an important milestone for
the project and will help build continued momentum in developer and
user community growth.

Thanks
Wes


[jira] [Created] (ARROW-8489) [Developer] Autotune more things

2020-04-16 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8489:
--

 Summary: [Developer] Autotune more things
 Key: ARROW-8489
 URL: https://issues.apache.org/jira/browse/ARROW-8489
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Developer Tools, Python
Reporter: Neal Richardson


ARROW-7801 added the "autotune" comment bot to fix linting errors and rebuild 
some generated files. cmake-format was left off because of Python problems (see 
description on https://github.com/apache/arrow/pull/6932). And there are probably 
other things we want to add (autopep8 for Python, and similar tools for other 
languages?)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [NIGHTLY] Arrow Build Report for Job nightly-2020-04-16-1

2020-04-16 Thread Neal Richardson
The patch to skip homebrew-cpp-autobrew and hiveserver2 was merged so
they'll be gone in the next run. conda-r is a flake ('Connection broken:
OSError("(104, \'ECONNRESET\')")', OSError("(104, 'ECONNRESET')")). I don't
know how to read the centos-6 failure.

Neal

On Thu, Apr 16, 2020 at 10:29 AM Crossbow  wrote:

>
> Arrow Build Report for Job nightly-2020-04-16-1
>
> All tasks:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1
>
> Failed Tasks:
> - centos-6-amd64:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-github-centos-6-amd64
> - homebrew-cpp-autobrew:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-travis-homebrew-cpp-autobrew
> - test-conda-cpp-hiveserver2:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-circle-test-conda-cpp-hiveserver2
> - test-conda-r-3.6:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-circle-test-conda-r-3.6
>
> Succeeded Tasks:
> - centos-7-amd64:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-github-centos-7-amd64
> - centos-8-amd64:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-github-centos-8-amd64
> - conda-linux-gcc-py36:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-azure-conda-linux-gcc-py36
> - conda-linux-gcc-py37:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-azure-conda-linux-gcc-py37
> - conda-linux-gcc-py38:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-azure-conda-linux-gcc-py38
> - conda-osx-clang-py36:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-azure-conda-osx-clang-py36
> - conda-osx-clang-py37:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-azure-conda-osx-clang-py37
> - conda-osx-clang-py38:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-azure-conda-osx-clang-py38
> - conda-win-vs2015-py36:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-azure-conda-win-vs2015-py36
> - conda-win-vs2015-py37:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-azure-conda-win-vs2015-py37
> - conda-win-vs2015-py38:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-azure-conda-win-vs2015-py38
> - debian-buster-amd64:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-github-debian-buster-amd64
> - debian-stretch-amd64:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-github-debian-stretch-amd64
> - gandiva-jar-osx:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-travis-gandiva-jar-osx
> - gandiva-jar-xenial:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-travis-gandiva-jar-xenial
> - homebrew-cpp:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-travis-homebrew-cpp
> - homebrew-r-autobrew:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-travis-homebrew-r-autobrew
> - test-conda-cpp-valgrind:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-github-test-conda-cpp-valgrind
> - test-conda-cpp:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-github-test-conda-cpp
> - test-conda-python-3.6:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-azure-test-conda-python-3.6
> - test-conda-python-3.7-dask-latest:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-circle-test-conda-python-3.7-dask-latest
> - test-conda-python-3.7-hdfs-2.9.2:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-circle-test-conda-python-3.7-hdfs-2.9.2
> - test-conda-python-3.7-kartothek-latest:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-circle-test-conda-python-3.7-kartothek-latest
> - test-conda-python-3.7-kartothek-master:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-circle-test-conda-python-3.7-kartothek-master
> - test-conda-python-3.7-pandas-latest:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-circle-test-conda-python-3.7-pandas-latest
> - test-conda-python-3.7-pandas-master:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-circle-test-conda-python-3.7-pandas-master
> - test-conda-python-3.7-spark-master:
>   URL:
> https://github.com/ursa-labs/crossbow/branches

Re: 0.17 release procedure

2020-04-16 Thread Krisztián Szűcs
On Thu, Apr 16, 2020 at 7:07 PM Micah Kornfield  wrote:
>
> I think we should remove ORC from the build altogether if tests are
> failing. If no one steps up to re-enable it by next release we should delete
> the code.
I did. The source release is already available, but I had to fix some packaging
issues, so currently I'm waiting for the binaries to finish.
>
> On Thursday, April 16, 2020, Krisztián Szűcs  
> wrote:
>>
>> On Thu, Apr 16, 2020 at 7:17 AM Micah Kornfield  
>> wrote:
>> >
>> > Hi Wes,
>> > I agree, I made a mistake, I've opened
>> > https://github.com/apache/arrow/pull/6955 to revert it and will merge once
>> > it turns green.
>> Don't worry about it Micah, that commit was applied after the last commit
>> required for the release, so I was able to release from the previous 
>> commit.
>> >
>> > In terms of unblocking the release, I don't think reverting will fix the
>> > issue (offline Krisztián mentioned he tried an RC on the previous commit).
>> Yes, this is unrelated to that commit.
>> >
>> > Given this hasn't been running in CI and is "contrib package", I'd advocate
>> > excluding the package from this release if we don't find a solution quickly
>> > (If memory serves I believe I asked the original contributor to CI and it
>> > appears that never happened, so for all intents and purposes I think we
>> > should treat this code as unmaintained/dead).
>> Most likely I'll skip the orc-test.
>> >
>> > Thanks,
>> > Micah
>> >
>> > On Wed, Apr 15, 2020 at 5:43 PM Wes McKinney  wrote:
>> >
>> > > FTR I don't think that ARROW-7534 should have been merged right around
>> > > the time that we are trying to produce a release candidate. Any
>> > > changes that impact packaging or codebase structures should be
>> > > approached with significant caution close to releases.
>> > >
>> > > On Wed, Apr 15, 2020 at 7:25 PM Krisztián Szűcs
>> > >  wrote:
>> > > >
>> > > > Hi,
>> > > >
>> > > > We've merged the last required pull requests later today[/yesterday],
>> > > > so I started to cut RC0.
>> > > > The release process didn't go smoothly; among other smaller problems
>> > > > I discovered a crash with the ORC Java JNI bindings (local error [1]).
>> > > > It turned out that we don't run the orc-jni tests on CI. I put up a PR
>> > > > to enable them [2], but it has not reproduced the exact issue yet.
>> > > >
>> > > > Any help from the JNI developers would be appreciated. I can also cut
>> > > > RC0 with JNI disabled.
>> > > >
>> > > > [1] https://gist.github.com/kszucs/67205eda6cd19e3cd08c86894f5b4c2d
>> > > > [2] https://github.com/apache/arrow/pull/6953
>> > > >
>> > > > Regards, Krisztian
>> > >


Re: Dictionary Memo serialization for CUDA IPC

2020-04-16 Thread Wes McKinney
hi Alex,

I haven't looked at the details of your code, but having APIs that
"collapse" the process of writing a single record batch along with its
dictionaries as a sequence of end-to-end IPC messages (and then having
a function to reverse that process to reconstruct the record batch)
and making that work for writing to GPU memory (using the new device
API) seems reasonable to me. There's a bit of refactoring that would
need to take place to be able to reuse certain code paths relating to
dictionary batch handling. Note also that we're due to implement delta
dictionaries and dictionary replacements so we might want to take all
of these needs into account to reduce the amount of code churn that
takes place.

- Wes

On Thu, Apr 16, 2020 at 1:44 PM Alex Baden  wrote:
>
> Hi all,
>
> OmniSci (formerly MapD) has been a long-time user of Arrow for IPC
> serialization and memory sharing of query results, primarily through our
> Python connector. We recently upgraded from Arrow 0.13 to Arrow 0.16.
> This required us to change our Arrow conversion routines to handle the
> new DictionaryMemo for serializing dictionaries. For CPU, this was
> fairly easy as I was able to just write the record batch stream using
> `arrow::ipc::WriteRecordBatchStream` (and read it using
> `RecordBatchStreamReader` on the client). For GPU/CUDA, however, I did
> not see a way to serialize the dictionary alongside the CUDA data and
> wrap that in a single "object" (the semantics of which probably need
> to be broken down, which I will do in a second). So, I came up with
> our own: 
> https://github.com/omnisci/omniscidb/blob/4ab6622bd0ee15e478bff4263f083ab761fc965c/QueryEngine/ArrowResultSetConverter.cpp#L219
>
> Essentially, I assemble a RecordBatch with the dictionaries I want to
> serialize and call WriteRecordBatchStream to serialize into a CPU IPC
> stream, which I copy to CPU shared memory. I then serialize the GPU
> record batch using SerializeRecordBatch into a CUDABuffer. The
> CudaBuffer is exported for IPC sharing, and I send both memory handles
> (CPU and GPU) over to the client. The client then has to read the
> RecordBatch containing the dictionaries and place the dictionaries
> into a DictionaryMemo, which is used to read the record batches from
> GPU. The process of building the DictionaryMemo on the client is here:
> https://github.com/omnisci/omniscidb/blob/master/Tests/ArrowIpcIntegrationTest.cpp#L380
>
> This seems to work ok, at least for C++, but I am interested in making
> it more compact and possibly contributing some or all to mainline
> Arrow. Therefore, I have two questions:
> 1) Does this look like a reasonable way to go about handling a
> serialized RecordBatch in CUDA (that is, separate the dictionaries and
> return two objects, or a single object holding two handles)?
> 2) Is this something that the Arrow community would be interested in
> seeing contributed in whatever form we agree upon for (1)?
>
> Thanks,
> Alex


Re: [DISCUSS] Adding "trivial" buffer compression option to IPC protocol (ARROW-300)

2020-04-16 Thread Wes McKinney
It seems like there is reasonable consensus in the PR. If there are no
further comments I'll start a vote about this within the next several
days

On Mon, Apr 6, 2020 at 10:55 PM Wes McKinney  wrote:
>
> I updated the Format proposal again, please have a look
>
> https://github.com/apache/arrow/pull/6707
>
> On Wed, Apr 1, 2020 at 10:15 AM Wes McKinney  wrote:
> >
> > For uncompressed, memory mapping is disabled, so all of the bytes are
> > being read into RAM. I wanted to show that even when your IO pipe is
> > very fast (in the case of an NVMe SSD like mine, > 1 GB/s for reads
> > from disk) you can still load faster with compressed files.
> >
> > Here were the prior Read results with
> >
> > * Single threaded decompression
> > * Memory mapping enabled
> >
> > https://ibb.co/4ZncdF8
> >
> > You can see for larger chunksizes, because the IPC reconstruction
> > overhead is about 60 microseconds per batch, that read time is very
> > low (10s of milliseconds).
> >
> > On Wed, Apr 1, 2020 at 10:10 AM Antoine Pitrou  wrote:
> > >
> > >
> > > The read times are still with memory mapping for the uncompressed case?
> > >  If so, impressive!
> > >
> > > Regards
> > >
> > > Antoine.
> > >
> > >
> > > Le 01/04/2020 à 16:44, Wes McKinney a écrit :
> > > > Several pieces of work got done in the last few days:
> > > >
> > > > * Changing from LZ4 raw to LZ4 frame format (what is recommended for
> > > > interoperability)
> > > > * Parallelizing both compression and decompression at the field level
> > > >
> > > > Here are the results (using 8 threads on an 8-core laptop). I disabled
> > > > the "memory map" feature so that in the uncompressed case all of the
> > > > data must be read off disk into memory. This helps illustrate the
> > > > compression/IO tradeoff to wall clock load times
> > > >
> > > > File size (only LZ4 may be different): https://ibb.co/CP3VQkp
> > > > Read time: https://ibb.co/vz9JZMx
> > > > Write time: https://ibb.co/H7bb68T
> > > >
> > > > In summary, now with multicore compression and decompression,
> > > > LZ4-compressed files are faster both to read and write even on a very
> > > > fast SSD, as are ZSTD-compressed files with a low ZSTD compression
> > > > level. I didn't notice a major difference between LZ4 raw and LZ4
> > > > frame formats. The reads and writes could be made faster still by
> > > > pipelining / making concurrent the disk read/write and
> > > > compression/decompression steps -- the current implementation performs
> > > > these tasks serially. We can improve this in the near future
> > > >
> > > > I'll update the Format proposal this week so we can move toward
> > > > something we can vote on. I would recommend that we await
> > > > implementations and integration tests for this before releasing this
> > > > as stable, in line with prior discussions about adding stuff to the
> > > > IPC protocol
> > > >
> > > > On Thu, Mar 26, 2020 at 4:57 PM Wes McKinney  
> > > > wrote:
> > > >>
> > > >> Here are the results:
> > > >>
> > > >> File size: https://ibb.co/71sBsg3
> > > >> Read time: https://ibb.co/4ZncdF8
> > > >> Write time: https://ibb.co/xhNkRS2
> > > >>
> > > >> Code: 
> > > >> https://github.com/wesm/notebooks/blob/master/20190919file_benchmarks/FeatherCompression.ipynb
> > > >> (based on https://github.com/apache/arrow/pull/6694)
> > > >>
> > > >> High level summary:
> > > >>
> > > >> * Chunksize 1024 vs 64K has relatively limited impact on file sizes
> > > >>
> > > >> * Wall clock read time is impacted by chunksize, maybe 30-40%
> > > >> difference between 1K row chunks versus 16K row chunks. One notable
> > > >> thing is that you can see clearly the overhead associated with IPC
> > > >> reconstruction even when the data is memory mapped. For example, in
> > > >> the Fannie Mae dataset there are 21,661 batches (each batch has 31
> > > >> fields) when the chunksize is 1024. So a read time of 1.3 seconds
> > > >> indicates ~60 microseconds of overhead for each record batch. When you
> > > >> consider the amount of business logic involved with reconstructing a
> > > >> record batch, 60 microseconds is pretty good. This also shows that
> > > >> every microsecond counts and we need to be carefully tracking
> > > >> microperformance in this critical operation.
> > > >>
> > > >> * Small chunksize results in higher write times for "expensive" codecs
> > > >> like ZSTD with a high compression ratio. For "cheap" codecs like LZ4
> > > >> it doesn't make as much of a difference
> > > >>
> > > >> * Note that LZ4 compressor results in faster wall clock time to disk
> > > >> presumably because the compression speed is faster than my SSD's write
> > > >> speed
> > > >>
> > > >> Implementation notes:
> > > >> * There is no parallelization or pipelining of reads or writes. For
> > > >> example, on write, all of the buffers are compressed with a single
> > > >> thread and then compression stops until the write to disk completes.
> > > >> On read, buffers are decompressed serially
> >
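
The ~60 microsecond per-batch figure in the quoted benchmark follows
directly from the numbers Wes gives; a quick sanity check (values taken
from the message above):

```python
# Per-batch IPC reconstruction overhead implied by the quoted numbers:
# 21,661 record batches read in ~1.3 seconds.
num_batches = 21661        # Fannie Mae dataset at chunksize 1024
read_time_s = 1.3          # total wall clock read time
overhead_us = read_time_s / num_batches * 1e6
print(round(overhead_us))  # -> 60 (microseconds per batch)
```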

Dictionary Memo serialization for CUDA IPC

2020-04-16 Thread Alex Baden
Hi all,

OmniSci (formerly MapD) has been a long-time user of Arrow for IPC
serialization and memory sharing of query results, primarily through our
Python connector. We recently upgraded from Arrow 0.13 to Arrow 0.16.
This required us to change our Arrow conversion routines to handle the
new DictionaryMemo for serializing dictionaries. For CPU, this was
fairly easy as I was able to just write the record batch stream using
`arrow::ipc::WriteRecordBatchStream` (and read it using
`RecordBatchStreamReader` on the client). For GPU/CUDA, however, I did
not see a way to serialize the dictionary alongside the CUDA data and
wrap that in a single "object" (the semantics of which probably need
to be broken down, which I will do in a second). So, I came up with
our own: 
https://github.com/omnisci/omniscidb/blob/4ab6622bd0ee15e478bff4263f083ab761fc965c/QueryEngine/ArrowResultSetConverter.cpp#L219

Essentially, I assemble a RecordBatch with the dictionaries I want to
serialize and call WriteRecordBatchStream to serialize into a CPU IPC
stream, which I copy to CPU shared memory. I then serialize the GPU
record batch using SerializeRecordBatch into a CUDABuffer. The
CudaBuffer is exported for IPC sharing, and I send both memory handles
(CPU and GPU) over to the client. The client then has to read the
RecordBatch containing the dictionaries and place the dictionaries
into a DictionaryMemo, which is used to read the record batches from
GPU. The process of building the DictionaryMemo on the client is here:
https://github.com/omnisci/omniscidb/blob/master/Tests/ArrowIpcIntegrationTest.cpp#L380

This seems to work ok, at least for C++, but I am interested in making
it more compact and possibly contributing some or all to mainline
Arrow. Therefore, I have two questions:
1) Does this look like a reasonable way to go about handling a
serialized RecordBatch in CUDA (that is, separate the dictionaries and
return two objects, or a single object holding two handles)?
2) Is this something that the Arrow community would be interested in
seeing contributed in whatever form we agree upon for (1)?

Thanks,
Alex
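
Alex's two-handle scheme could also be wrapped in a small envelope so the
client receives a single blob. The sketch below is a hypothetical Python
illustration (the thread concerns the C++ API; `pack_handles`,
`unpack_handles`, and the length-prefixed layout are invented for this
example, not Arrow or OmniSci APIs):

```python
import struct

def pack_handles(cpu_handle: bytes, gpu_handle: bytes) -> bytes:
    """Length-prefix and concatenate the CPU shared-memory handle
    (the dictionary record batch stream) and the CUDA IPC handle."""
    header = struct.pack("<II", len(cpu_handle), len(gpu_handle))
    return header + cpu_handle + gpu_handle

def unpack_handles(payload: bytes):
    """Reverse of pack_handles: recover the two handles."""
    cpu_len, gpu_len = struct.unpack_from("<II", payload)
    body = payload[8:]
    return body[:cpu_len], body[cpu_len:cpu_len + gpu_len]
```

Whether two objects or one envelope like this is preferable is exactly
question (1); the envelope only changes how the handles travel, not the
DictionaryMemo reconstruction the client still has to perform.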


[jira] [Created] (ARROW-8488) [R] Replace VALUE_OR_STOP with ValueOrStop

2020-04-16 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-8488:
-

 Summary: [R] Replace VALUE_OR_STOP with ValueOrStop
 Key: ARROW-8488
 URL: https://issues.apache.org/jira/browse/ARROW-8488
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Francois Saint-Jacques


We should avoid macros as much as possible, as per the style guide.





[jira] [Created] (ARROW-8487) [FlightRPC][C++] Make it possible to target a specific payload size

2020-04-16 Thread David Li (Jira)
David Li created ARROW-8487:
---

 Summary: [FlightRPC][C++] Make it possible to target a specific 
payload size
 Key: ARROW-8487
 URL: https://issues.apache.org/jira/browse/ARROW-8487
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, FlightRPC
Reporter: David Li


gRPC by default limits message sizes on the wire. While Flight in turn disables 
these by default, they're still useful for controlling memory 
consumption. A well-behaved client/server may wish to split up writes to 
respect these limits. However, right now, there's no way to measure the memory 
usage of what you're about to write without serializing it.

With ARROW-5377, we can in theory avoid this by having the writer take control 
of serialization, producing the IpcPayload, then measuring the size and writing 
the payload if the size is as desired. However, Flight doesn't provide such a 
low-level mechanism yet - we'd need to open that up as well.
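
Until Flight exposes such an IpcPayload-level hook, a writer can only
approximate the limit by serializing and measuring before each send. A
rough sketch of that measure-then-split strategy follows (Python, with
JSON standing in for IPC serialization; `split_for_limit` is illustrative,
not a Flight API):

```python
import json

def split_for_limit(rows, limit_bytes):
    """Greedily group rows so each serialized chunk stays under limit_bytes
    (a lone oversized row still becomes its own chunk)."""
    chunks, current = [], []
    for row in rows:
        candidate = current + [row]
        if current and len(json.dumps(candidate).encode()) > limit_bytes:
            chunks.append(current)   # flush the chunk that still fit
            current = [row]
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Note the drawback: every candidate chunk is re-serialized just to measure
it, which is the cost the ticket wants to avoid by measuring the
IpcPayload once before writing.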





[jira] [Created] (ARROW-8486) [C++] arrow-utility-test causes failures on a big-endian platform

2020-04-16 Thread Kazuaki Ishizaki (Jira)
Kazuaki Ishizaki created ARROW-8486:
---

 Summary: [C++] arrow-utility-test causes failures on a big-endian 
platform
 Key: ARROW-8486
 URL: https://issues.apache.org/jira/browse/ARROW-8486
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Kazuaki Ishizaki


The current master causes the following failures in arrow-utility-test.

```
 Start 17: arrow-utility-test
...
17: [ PASSED ] 253 tests.
17: [ FAILED ] 11 tests, listed below:
17: [ FAILED ] Bitmap.ShiftingWordsOptimization
17: [ FAILED ] Bitmap.VisitWordsAnd
17: [ FAILED ] BitArray.TestBool
17: [ FAILED ] BitArray.TestValues
17: [ FAILED ] Rle.SpecificSequences
17: [ FAILED ] Rle.TestValues
17: [ FAILED ] BitRle.Flush
17: [ FAILED ] BitRle.Random
17: [ FAILED ] BitRle.RepeatedPattern
17: [ FAILED ] BitRle.Overflow
17: [ FAILED ] RleDecoder.GetBatchSpaced
```
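
The failing tests all exercise bit-packing/RLE utilities, which
reinterpret raw bytes as words, so the failures are presumably code
written against a little-endian layout reading different values on
big-endian hardware. A minimal Python illustration of that class of bug
(not the Arrow code itself):

```python
import struct

value = 1
little = struct.pack("<I", value)  # b'\x01\x00\x00\x00'
big = struct.pack(">I", value)     # b'\x00\x00\x00\x01'
assert little != big

# Bytes written little-endian but read back big-endian decode to a
# different value -- the shape of failure that stays hidden until the
# tests run on a big-endian platform.
assert struct.unpack(">I", little)[0] == 0x01000000
```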





[jira] [Created] (ARROW-8485) [Integration][Java] Implement extension types integration

2020-04-16 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8485:
-

 Summary: [Integration][Java] Implement extension types integration
 Key: ARROW-8485
 URL: https://issues.apache.org/jira/browse/ARROW-8485
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Integration, Java
Reporter: Antoine Pitrou
 Fix For: 1.0.0








[NIGHTLY] Arrow Build Report for Job nightly-2020-04-16-1

2020-04-16 Thread Crossbow


Arrow Build Report for Job nightly-2020-04-16-1

All tasks: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1

Failed Tasks:
- centos-6-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-github-centos-6-amd64
- homebrew-cpp-autobrew:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-travis-homebrew-cpp-autobrew
- test-conda-cpp-hiveserver2:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-circle-test-conda-cpp-hiveserver2
- test-conda-r-3.6:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-circle-test-conda-r-3.6

Succeeded Tasks:
- centos-7-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-github-centos-7-amd64
- centos-8-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-github-centos-8-amd64
- conda-linux-gcc-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-azure-conda-linux-gcc-py36
- conda-linux-gcc-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-azure-conda-linux-gcc-py37
- conda-linux-gcc-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-azure-conda-linux-gcc-py38
- conda-osx-clang-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-azure-conda-osx-clang-py36
- conda-osx-clang-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-azure-conda-osx-clang-py37
- conda-osx-clang-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-azure-conda-osx-clang-py38
- conda-win-vs2015-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-azure-conda-win-vs2015-py36
- conda-win-vs2015-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-azure-conda-win-vs2015-py37
- conda-win-vs2015-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-azure-conda-win-vs2015-py38
- debian-buster-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-github-debian-buster-amd64
- debian-stretch-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-github-debian-stretch-amd64
- gandiva-jar-osx:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-travis-gandiva-jar-osx
- gandiva-jar-xenial:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-travis-gandiva-jar-xenial
- homebrew-cpp:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-travis-homebrew-cpp
- homebrew-r-autobrew:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-travis-homebrew-r-autobrew
- test-conda-cpp-valgrind:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-github-test-conda-cpp-valgrind
- test-conda-cpp:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-github-test-conda-cpp
- test-conda-python-3.6:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-azure-test-conda-python-3.6
- test-conda-python-3.7-dask-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-circle-test-conda-python-3.7-dask-latest
- test-conda-python-3.7-hdfs-2.9.2:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-circle-test-conda-python-3.7-hdfs-2.9.2
- test-conda-python-3.7-kartothek-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-circle-test-conda-python-3.7-kartothek-latest
- test-conda-python-3.7-kartothek-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-circle-test-conda-python-3.7-kartothek-master
- test-conda-python-3.7-pandas-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-circle-test-conda-python-3.7-pandas-latest
- test-conda-python-3.7-pandas-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-circle-test-conda-python-3.7-pandas-master
- test-conda-python-3.7-spark-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-circle-test-conda-python-3.7-spark-master
- test-conda-python-3.7-turbodbc-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-circle-test-conda-python-3.7-turbodbc-latest
- test-conda-python-3.7-turbodbc-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-1-circle-test-conda-python-3.7-turbodbc-master
- test-conda-python-3.7:
  URL: 
https://github.com/ursa-labs/cro

Re: 0.17 release procedure

2020-04-16 Thread Micah Kornfield
I think we should remove ORC from the build altogether if tests are
failing. If no one steps up to re-enable it by next release we should
delete the code.

On Thursday, April 16, 2020, Krisztián Szűcs 
wrote:

> On Thu, Apr 16, 2020 at 7:17 AM Micah Kornfield 
> wrote:
> >
> > Hi Wes,
> > I agree, I made a mistake, I've opened
> > https://github.com/apache/arrow/pull/6955 to revert it and will merge
> once
> > it turns green.
> Don't worry about it Micah, that commit was applied after the last commit
> required for the release, so I was able to release from the previous
> commit.
> >
> > In terms of unblocking the release, I don't think reverting will fix the
> > issue (offline Krisztián mentioned he tried an RC on the previous
> commit).
> Yes, this is unrelated to that commit.
> >
> > Given this hasn't been running in CI and is a "contrib package", I'd
> > advocate excluding the package from this release if we don't find a
> > solution quickly. (If memory serves, I asked the original contributor to
> > add CI coverage and it appears that never happened, so for all intents
> > and purposes I think we should treat this code as unmaintained/dead.)
> Most likely I'll skip the orc-test.
> >
> > Thanks,
> > Micah
> >
> > On Wed, Apr 15, 2020 at 5:43 PM Wes McKinney 
> wrote:
> >
> > > FTR I don't think that ARROW-7534 should have been merged right around
> > > the time that we are trying to produce a release candidate. Any
> > > changes that impact packaging or codebase structures should be
> > > approached with significant caution close to releases.
> > >
> > > On Wed, Apr 15, 2020 at 7:25 PM Krisztián Szűcs
> > >  wrote:
> > > >
> > > > Hi,
> > > >
> > > > We've merged the last required pull requests late today[/yesterday],
> > > > so I started to cut RC0.
> > > > The release process isn't going smoothly: among other smaller problems,
> > > > I discovered a crash with the ORC Java JNI bindings (local error [1]);
> > > > it turned out that we don't run the orc-jni tests on CI. I put up a PR
> > > > to enable them [2], but it has not reproduced the exact issue yet.
> > > >
> > > > Any help from the JNI developers would be appreciated. I can also cut
> > > > RC0 with JNI disabled.
> > > >
> > > > [1] https://gist.github.com/kszucs/67205eda6cd19e3cd08c86894f5b4c2d
> > > > [2] https://github.com/apache/arrow/pull/6953
> > > >
> > > > Regards, Krisztian
> > >
>


[jira] [Created] (ARROW-8484) [C++] TestArrayImport tests cause failures on a big-endian platform

2020-04-16 Thread Kazuaki Ishizaki (Jira)
Kazuaki Ishizaki created ARROW-8484:
---

 Summary: [C++] TestArrayImport tests cause failures on a 
big-endian platform
 Key: ARROW-8484
 URL: https://issues.apache.org/jira/browse/ARROW-8484
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Kazuaki Ishizaki


The current code causes two types of failures of TestArrayImport on a
big-endian platform, as follows:

```
12: [ RUN  ] TestSchemaImport.Struct
12: /home/ishizaki/Arrow/arrow/cpp/build-support/run-test.sh: line 92: 19528 
Segmentation fault  (core dumped) $TEST_EXECUTABLE "$@" 2>&1
12:  19529 Done| $ROOT/build-support/asan_symbolize.py
12:  19530 Done| ${CXXFILT:-c++filt}
12:  19531 Done| 
$ROOT/build-support/stacktrace_addr2line.pl $TEST_EXECUTABLE
12:  19532 Done| $pipe_cmd 2>&1
12:  19533 Done| tee $LOGFILE
```

```
12: [ RUN  ] TestArrayImport.PrimitiveWithOffset
12: /home/ishizaki/Arrow/arrow/cpp/src/arrow/testing/gtest_util.cc:77: Failure
12: Failed
12: 
12: @@ -0, +0 @@
12: -1027
12: -1541
12: -2055
12: +772
12: +1286
12: +1800
12: Expected:
12:   [
12: 1027,
12: 1541,
12: 2055
12:   ]
12: Actual:
12:   [
12: 772,
12: 1286,
12: 1800
12:   ]
```
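For what it's worth, the mismatched values are consistent with a missing byte-order conversion: each "Actual" value is the 16-bit byte swap of the corresponding "Expected" value. A small illustrative sketch in pure Python, using the values from the output above:

```python
# Each actual value equals the expected value with its two bytes swapped,
# e.g. 1027 == 0x0403 while 772 == 0x0304 -- a classic endianness symptom.
expected = [1027, 1541, 2055]
actual = [772, 1286, 1800]

# Reinterpret each 16-bit value with the opposite byte order.
swapped = [int.from_bytes(v.to_bytes(2, "little"), "big") for v in expected]
print(swapped)  # [772, 1286, 1800]
assert swapped == actual
```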




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [NIGHTLY] Arrow Build Report for Job nightly-2020-04-16-0

2020-04-16 Thread Neal Richardson
Of the three remaining failures, two are resolved by
https://github.com/apache/arrow/pull/6952 (they're not things we should be
running nightly).

Neal

On Thu, Apr 16, 2020 at 12:04 AM Crossbow  wrote:

>
> Arrow Build Report for Job nightly-2020-04-16-0
>
> All tasks:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0
>
> Failed Tasks:
> - homebrew-cpp-autobrew:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-travis-homebrew-cpp-autobrew
> - test-conda-cpp-hiveserver2:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-circle-test-conda-cpp-hiveserver2
> - test-conda-cpp:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-github-test-conda-cpp
>
> Succeeded Tasks:
> - centos-6-amd64:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-github-centos-6-amd64
> - centos-7-amd64:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-github-centos-7-amd64
> - centos-8-amd64:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-github-centos-8-amd64
> - conda-linux-gcc-py36:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-azure-conda-linux-gcc-py36
> - conda-linux-gcc-py37:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-azure-conda-linux-gcc-py37
> - conda-linux-gcc-py38:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-azure-conda-linux-gcc-py38
> - conda-osx-clang-py36:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-azure-conda-osx-clang-py36
> - conda-osx-clang-py37:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-azure-conda-osx-clang-py37
> - conda-osx-clang-py38:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-azure-conda-osx-clang-py38
> - conda-win-vs2015-py36:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-azure-conda-win-vs2015-py36
> - conda-win-vs2015-py37:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-azure-conda-win-vs2015-py37
> - conda-win-vs2015-py38:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-azure-conda-win-vs2015-py38
> - debian-buster-amd64:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-github-debian-buster-amd64
> - debian-stretch-amd64:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-github-debian-stretch-amd64
> - gandiva-jar-osx:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-travis-gandiva-jar-osx
> - gandiva-jar-xenial:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-travis-gandiva-jar-xenial
> - homebrew-cpp:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-travis-homebrew-cpp
> - homebrew-r-autobrew:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-travis-homebrew-r-autobrew
> - test-conda-cpp-valgrind:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-github-test-conda-cpp-valgrind
> - test-conda-python-3.6:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-azure-test-conda-python-3.6
> - test-conda-python-3.7-dask-latest:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-circle-test-conda-python-3.7-dask-latest
> - test-conda-python-3.7-hdfs-2.9.2:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-circle-test-conda-python-3.7-hdfs-2.9.2
> - test-conda-python-3.7-kartothek-latest:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-circle-test-conda-python-3.7-kartothek-latest
> - test-conda-python-3.7-kartothek-master:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-circle-test-conda-python-3.7-kartothek-master
> - test-conda-python-3.7-pandas-latest:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-circle-test-conda-python-3.7-pandas-latest
> - test-conda-python-3.7-pandas-master:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-circle-test-conda-python-3.7-pandas-master
> - test-conda-python-3.7-spark-master:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-circle-test-conda-python-3.7-spark-master
> - test-conda-python-3.7-turbodbc-latest:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-circle-test-conda-python-3.7-turbodbc-lat

[jira] [Created] (ARROW-8483) [Ruby] Arrow::Table documentation improvement

2020-04-16 Thread Robert Borkowski (Jira)
Robert Borkowski created ARROW-8483:
---

 Summary: [Ruby] Arrow::Table documentation improvement
 Key: ARROW-8483
 URL: https://issues.apache.org/jira/browse/ARROW-8483
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Ruby
Reporter: Robert Borkowski


There is a redundant and confusing copy of example from another usage of 
initialize:

[https://github.com/apache/arrow/blob/master/ruby/red-arrow/lib/arrow/table.rb#L66]

[https://github.com/apache/arrow/blob/master/ruby/red-arrow/lib/arrow/table.rb#L84]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [Format] Ambiguity with extension and dictionary

2020-04-16 Thread Wes McKinney
On Thu, Apr 16, 2020 at 8:52 AM Antoine Pitrou  wrote:
>
>
> Hello,
>
> Let's say an IPC Schema message contains the following Field (in pseudo-JSON
> representation):
>
> {
>   "name" : "...",
>   "nullable" : true,
>   "type" : Utf8,
>   "dictionary": {
> "id": 0,
> "indexType": Int32,
> "isOrdered": true
>   },
>   "children" : [],
>   "metadata" : [
>  {"key": "ARROW:extension:name", "value": "MyExtType"},
>  {"key": "ARROW:extension:metadata", "value": "..."}
>   ]
> }
>
> Which of the following two logical types does it represent?
>
> - MyExtType<dictionary<values=utf8, indices=int32>>

This one.

> - dictionary<values=MyExtType, indices=int32>

I do not believe this is representable in the IPC metadata protocol as
it currently stands. So in C++ it would not be possible to round-trip
dictionary<values=MyExtType, indices=int32>; this would have to be
written as MyExtType<dictionary<values=utf8, indices=int32>>.

> Regards
>
> Antoine.


[Format] Ambiguity with extension and dictionary

2020-04-16 Thread Antoine Pitrou


Hello,

Let's say an IPC Schema message contains the following Field (in pseudo-JSON
representation):

{
  "name" : "...",
  "nullable" : true,
  "type" : Utf8,
  "dictionary": {
"id": 0,
"indexType": Int32,
"isOrdered": true
  },
  "children" : [],
  "metadata" : [
 {"key": "ARROW:extension:name", "value": "MyExtType"},
 {"key": "ARROW:extension:metadata", "value": "..."}
  ]
}

Which of the following two logical types does it represent?

- MyExtType<dictionary<values=utf8, indices=int32>>
- dictionary<values=MyExtType, indices=int32>

Regards

Antoine.


[jira] [Created] (ARROW-8482) critical timestamp bug!

2020-04-16 Thread Olaf (Jira)
Olaf created ARROW-8482:
---

 Summary: critical timestamp bug!
 Key: ARROW-8482
 URL: https://issues.apache.org/jira/browse/ARROW-8482
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python, R
Reporter: Olaf


Hello there!

 

First of all, thanks for making parquet files a reality in *R* and *Python*. 
This is really great.

I found a very nasty bug when exchanging parquet files between the two 
platforms. Consider this.

 

 
{code:python}
import pandas as pd
import pyarrow.parquet as pq
import numpy as np

df = pd.DataFrame({'string_time_utc': [pd.to_datetime('2018-02-01 14:00:00.531'),
                                       pd.to_datetime('2018-02-01 14:01:00.456'),
                                       pd.to_datetime('2018-03-05 14:01:02.200')]})
df['timestamp_est'] = (pd.to_datetime(df.string_time_utc)
                       .dt.tz_localize('UTC')
                       .dt.tz_convert('US/Eastern')
                       .dt.tz_localize(None))
df
Out[5]: 
 string_time_utc timestamp_est
0 2018-02-01 14:00:00.531 2018-02-01 09:00:00.531
1 2018-02-01 14:01:00.456 2018-02-01 09:01:00.456
2 2018-03-05 14:01:02.200 2018-03-05 09:01:02.200
{code}
 

Now I simply write to disk

 
{code:python}
df.to_parquet('myparquet.pq')
{code}
 

And the use *R* to load it.

 
{code}
test <- read_parquet('myparquet.pq')
> test
# A tibble: 3 x 2
  string_time_utc            timestamp_est
  <dttm>                     <dttm>
1 2018-02-01 09:00:00.530999 2018-02-01 04:00:00.530999
2 2018-02-01 09:01:00.456000 2018-02-01 04:01:00.456000
3 2018-03-05 09:01:02.20 2018-03-05 04:01:02.20
{code}
 

 

As you can see, the timestamps have been converted in the process. I first 
reported this bug against Feather, but it is still there. This is a very 
dangerous, silent bug.
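The observed 5-hour shift is consistent with the tz-naive timestamp_est column being stored as a UTC instant and then rendered in US/Eastern on the R side. A minimal stdlib sketch of that suspected mechanism (a fixed UTC-5 offset stands in for US/Eastern in February; this is an illustration of the arithmetic, not a claim about where the conversion happens):

```python
from datetime import datetime, timezone, timedelta

# tz-naive wall-clock value from the timestamp_est column
naive = datetime(2018, 2, 1, 9, 0, 0, 531000)

# Suppose the writer stores it as if it were a UTC instant...
as_utc_instant = naive.replace(tzinfo=timezone.utc)

# ...and the reader renders that instant in US/Eastern (UTC-5 in February).
eastern = timezone(timedelta(hours=-5))
rendered = as_utc_instant.astimezone(eastern)
print(rendered)  # 2018-02-01 04:00:00.531000-05:00, the shift seen in R
```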

 

What do you think?

Thanks



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: 0.17 release procedure

2020-04-16 Thread Krisztián Szűcs
On Thu, Apr 16, 2020 at 7:17 AM Micah Kornfield  wrote:
>
> Hi Wes,
> I agree, I made a mistake. I've opened
> https://github.com/apache/arrow/pull/6955 to revert it and will merge once
> it turns green.
Don't worry about it, Micah; that commit was applied after the last commit
required for the release, so I could choose to release from the previous commit.
>
> In terms of unblocking the release, I don't think reverting will fix the
> issue (offline Krisztián mentioned he tried an RC on the previous commit).
Yes, this is unrelated to that commit.
>
> Given this hasn't been running in CI and is a "contrib package", I'd advocate
> excluding the package from this release if we don't find a solution quickly.
> (If memory serves, I asked the original contributor to add CI coverage and it
> appears that never happened, so for all intents and purposes I think we
> should treat this code as unmaintained/dead.)
Most likely I'll skip the orc-test.
>
> Thanks,
> Micah
>
> On Wed, Apr 15, 2020 at 5:43 PM Wes McKinney  wrote:
>
> > FTR I don't think that ARROW-7534 should have been merged right around
> > the time that we are trying to produce a release candidate. Any
> > changes that impact packaging or codebase structures should be
> > approached with significant caution close to releases.
> >
> > On Wed, Apr 15, 2020 at 7:25 PM Krisztián Szűcs
> >  wrote:
> > >
> > > Hi,
> > >
> > > We've merged the last required pull requests late today[/yesterday],
> > > so I started to cut RC0.
> > > The release process isn't going smoothly: among other smaller problems,
> > > I discovered a crash with the ORC Java JNI bindings (local error [1]);
> > > it turned out that we don't run the orc-jni tests on CI. I put up a PR
> > > to enable them [2], but it has not reproduced the exact issue yet.
> > >
> > > Any help from the JNI developers would be appreciated. I can also cut
> > > RC0 with JNI disabled.
> > >
> > > [1] https://gist.github.com/kszucs/67205eda6cd19e3cd08c86894f5b4c2d
> > > [2] https://github.com/apache/arrow/pull/6953
> > >
> > > Regards, Krisztian
> >


Re: ORC JNI wrapper bugs [Re: 0.17 release procedure]

2020-04-16 Thread Krisztián Szűcs
On Thu, Apr 16, 2020 at 11:47 AM Antoine Pitrou  wrote:
>
>
> The ORC JNI wrapper is currently crashing on these lines:
> https://github.com/apache/arrow/blob/master/cpp/src/jni/orc/jni_wrapper.cpp#L279-L281
>
> because in C++, certain buffers can be omitted by passing a null
> pointer (specifically the null bitmap, if there are no nulls).
> Therefore `buffer` in the lines above is a null pointer.
>
> (I tried replacing the null buffer with a 0-byte buffer: it crashes
> further down the road...)
>
> Since this code has been there since ARROW-4714 was committed, my
> intuition is that the JNI ORC wrapper was only exercised in very
> specific use cases where C++ buffers are never null.
>
> My opinion is therefore that the ORC JNI tests should be ignored for
> this release, and fixed later by some motivated developer.
Sounds good to me. Does anyone know a way to skip certain tests with
Maven during the release? By commenting them out?
>
> Regards
>
> Antoine.
>
>
> On 16/04/2020 at 02:17, Krisztián Szűcs wrote:
> > Hi,
> >
> > We've merged the last required pull requests late today[/yesterday],
> > so I started to cut RC0.
> > The release process isn't going smoothly: among other smaller problems,
> > I discovered a crash with the ORC Java JNI bindings (local error [1]);
> > it turned out that we don't run the orc-jni tests on CI. I put up a PR
> > to enable them [2], but it has not reproduced the exact issue yet.
> >
> > Any help from the JNI developers would be appreciated. I can also cut
> > RC0 with JNI disabled.
> >
> > [1] https://gist.github.com/kszucs/67205eda6cd19e3cd08c86894f5b4c2d
> > [2] https://github.com/apache/arrow/pull/6953
> >
> > Regards, Krisztian
> >


[jira] [Created] (ARROW-8481) [Java] Provide an allocation manager based on Unsafe API

2020-04-16 Thread Liya Fan (Jira)
Liya Fan created ARROW-8481:
---

 Summary: [Java] Provide an allocation manager based on Unsafe API
 Key: ARROW-8481
 URL: https://issues.apache.org/jira/browse/ARROW-8481
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Java
Reporter: Liya Fan
Assignee: Liya Fan


This is in response to the discussion in 
https://github.com/apache/arrow/pull/6323#issuecomment-614195070

In this issue, we provide an allocation manager that is capable of allocating 
large (> 2GB) buffers. In addition, it does not depend on the netty library, 
which aligns with the general trend of removing netty dependencies. In the 
future, we are going to make it the default allocation manager. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8480) [Rust] There is no check for allocation failure

2020-04-16 Thread Paddy Horan (Jira)
Paddy Horan created ARROW-8480:
--

 Summary: [Rust] There is no check for allocation failure
 Key: ARROW-8480
 URL: https://issues.apache.org/jira/browse/ARROW-8480
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Paddy Horan


Reported by bluss on Github:

[https://github.com/rust-ndarray/ndarray/issues/771]

 

"What I can see, there is no check for allocation success, so any buffer can be 
created with a null pointer, which leads to soundness problems in most methods. 
Best look into using {{std::alloc::handle_alloc_error}} or alternatives. (This 
problem means that the mutablebuffer is not a safe abstraction, and it should 
preferably not be exposed as public API like this.)"



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8479) [Rust] Use "raw_data_mut" in "Buffer::typed_data_mut"

2020-04-16 Thread Paddy Horan (Jira)
Paddy Horan created ARROW-8479:
--

 Summary: [Rust] Use "raw_data_mut" in "Buffer::typed_data_mut"
 Key: ARROW-8479
 URL: https://issues.apache.org/jira/browse/ARROW-8479
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Paddy Horan


See: [https://github.com/apache/arrow/pull/6395/files#r408699014]

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


ORC JNI wrapper bugs [Re: 0.17 release procedure]

2020-04-16 Thread Antoine Pitrou


The ORC JNI wrapper is currently crashing on these lines:
https://github.com/apache/arrow/blob/master/cpp/src/jni/orc/jni_wrapper.cpp#L279-L281

because in C++, certain buffers can be omitted by passing a null
pointer (specifically the null bitmap, if there are no nulls).
Therefore `buffer` in the lines above is a null pointer.

(I tried replacing the null buffer with a 0-byte buffer: it crashes
further down the road...)

Since this code has been there since ARROW-4714 was committed, my
intuition is that the JNI ORC wrapper was only exercised in very
specific use cases where C++ buffers are never null.

My opinion is therefore that the ORC JNI tests should be ignored for
this release, and fixed later by some motivated developer.

Regards

Antoine.


On 16/04/2020 at 02:17, Krisztián Szűcs wrote:
> Hi,
> 
> We've merged the last required pull requests late today[/yesterday],
> so I started to cut RC0.
> The release process isn't going smoothly: among other smaller problems,
> I discovered a crash with the ORC Java JNI bindings (local error [1]);
> it turned out that we don't run the orc-jni tests on CI. I put up a PR
> to enable them [2], but it has not reproduced the exact issue yet.
> 
> Any help from the JNI developers would be appreciated. I can also cut
> RC0 with JNI disabled.
> 
> [1] https://gist.github.com/kszucs/67205eda6cd19e3cd08c86894f5b4c2d
> [2] https://github.com/apache/arrow/pull/6953
> 
> Regards, Krisztian
> 


[NIGHTLY] Arrow Build Report for Job nightly-2020-04-16-0

2020-04-16 Thread Crossbow


Arrow Build Report for Job nightly-2020-04-16-0

All tasks: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0

Failed Tasks:
- homebrew-cpp-autobrew:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-travis-homebrew-cpp-autobrew
- test-conda-cpp-hiveserver2:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-circle-test-conda-cpp-hiveserver2
- test-conda-cpp:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-github-test-conda-cpp

Succeeded Tasks:
- centos-6-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-github-centos-6-amd64
- centos-7-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-github-centos-7-amd64
- centos-8-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-github-centos-8-amd64
- conda-linux-gcc-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-azure-conda-linux-gcc-py36
- conda-linux-gcc-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-azure-conda-linux-gcc-py37
- conda-linux-gcc-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-azure-conda-linux-gcc-py38
- conda-osx-clang-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-azure-conda-osx-clang-py36
- conda-osx-clang-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-azure-conda-osx-clang-py37
- conda-osx-clang-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-azure-conda-osx-clang-py38
- conda-win-vs2015-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-azure-conda-win-vs2015-py36
- conda-win-vs2015-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-azure-conda-win-vs2015-py37
- conda-win-vs2015-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-azure-conda-win-vs2015-py38
- debian-buster-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-github-debian-buster-amd64
- debian-stretch-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-github-debian-stretch-amd64
- gandiva-jar-osx:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-travis-gandiva-jar-osx
- gandiva-jar-xenial:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-travis-gandiva-jar-xenial
- homebrew-cpp:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-travis-homebrew-cpp
- homebrew-r-autobrew:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-travis-homebrew-r-autobrew
- test-conda-cpp-valgrind:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-github-test-conda-cpp-valgrind
- test-conda-python-3.6:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-azure-test-conda-python-3.6
- test-conda-python-3.7-dask-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-circle-test-conda-python-3.7-dask-latest
- test-conda-python-3.7-hdfs-2.9.2:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-circle-test-conda-python-3.7-hdfs-2.9.2
- test-conda-python-3.7-kartothek-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-circle-test-conda-python-3.7-kartothek-latest
- test-conda-python-3.7-kartothek-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-circle-test-conda-python-3.7-kartothek-master
- test-conda-python-3.7-pandas-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-circle-test-conda-python-3.7-pandas-latest
- test-conda-python-3.7-pandas-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-circle-test-conda-python-3.7-pandas-master
- test-conda-python-3.7-spark-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-circle-test-conda-python-3.7-spark-master
- test-conda-python-3.7-turbodbc-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-circle-test-conda-python-3.7-turbodbc-latest
- test-conda-python-3.7-turbodbc-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-circle-test-conda-python-3.7-turbodbc-master
- test-conda-python-3.7:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-16-0-azure-test-conda-python-3.7
- test-conda-python-3.8-dask-master:
  URL: 
https://git