Re: 0.17 release procedure

2020-04-15 Thread Micah Kornfield
Hi Wes,
I agree; I made a mistake. I've opened
https://github.com/apache/arrow/pull/6955 to revert it and will merge once
it turns green.

In terms of unblocking the release, I don't think reverting will fix the
issue (offline Krisztián mentioned he tried an RC on the previous commit).

Given this hasn't been running in CI and is a "contrib" package, I'd advocate
excluding the package from this release if we don't find a solution quickly.
(If memory serves, I asked the original contributor to add CI and it
appears that never happened, so for all intents and purposes I think we
should treat this code as unmaintained/dead.)

Thanks,
Micah

On Wed, Apr 15, 2020 at 5:43 PM Wes McKinney  wrote:

> FTR I don't think that ARROW-7534 should have been merged right around
> the time that we are trying to produce a release candidate. Any
> changes that impact packaging or codebase structures should be
> approached with significant caution close to releases.
>
> On Wed, Apr 15, 2020 at 7:25 PM Krisztián Szűcs
>  wrote:
> >
> > Hi,
> >
> > We merged the last required pull requests earlier today[/yesterday],
> > so I started to cut RC0.
> > The release process isn't going smoothly: among other smaller problems,
> > I discovered a crash in the ORC Java JNI bindings (local error [1]), and
> > it turned out that we don't run the orc-jni tests in CI. I put up a PR
> > to enable them [2]; it has not reproduced the exact issue yet.
> >
> > Any help from the JNI developers would be appreciated. I can also cut
> > RC0 with JNI disabled.
> >
> > [1] https://gist.github.com/kszucs/67205eda6cd19e3cd08c86894f5b4c2d
> > [2] https://github.com/apache/arrow/pull/6953
> >
> > Regards, Krisztian
>


[jira] [Created] (ARROW-8478) [Java] Rollback contrib package changes.

2020-04-15 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-8478:
--

 Summary: [Java] Rollback contrib package changes.
 Key: ARROW-8478
 URL: https://issues.apache.org/jira/browse/ARROW-8478
 Project: Apache Arrow
  Issue Type: Bug
  Components: Java
Reporter: Micah Kornfield
Assignee: Micah Kornfield






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: 0.17 release procedure

2020-04-15 Thread Wes McKinney
FTR I don't think that ARROW-7534 should have been merged right around
the time that we are trying to produce a release candidate. Any
changes that impact packaging or codebase structures should be
approached with significant caution close to releases.

On Wed, Apr 15, 2020 at 7:25 PM Krisztián Szűcs
 wrote:
>
> Hi,
>
> We merged the last required pull requests earlier today[/yesterday],
> so I started to cut RC0.
> The release process isn't going smoothly: among other smaller problems,
> I discovered a crash in the ORC Java JNI bindings (local error [1]), and
> it turned out that we don't run the orc-jni tests in CI. I put up a PR
> to enable them [2]; it has not reproduced the exact issue yet.
>
> Any help from the JNI developers would be appreciated. I can also cut
> RC0 with JNI disabled.
>
> [1] https://gist.github.com/kszucs/67205eda6cd19e3cd08c86894f5b4c2d
> [2] https://github.com/apache/arrow/pull/6953
>
> Regards, Krisztian


0.17 release procedure

2020-04-15 Thread Krisztián Szűcs
Hi,

We merged the last required pull requests earlier today[/yesterday],
so I started to cut RC0.
The release process isn't going smoothly: among other smaller problems,
I discovered a crash in the ORC Java JNI bindings (local error [1]), and
it turned out that we don't run the orc-jni tests in CI. I put up a PR
to enable them [2]; it has not reproduced the exact issue yet.

Any help from the JNI developers would be appreciated. I can also cut
RC0 with JNI disabled.

[1] https://gist.github.com/kszucs/67205eda6cd19e3cd08c86894f5b4c2d
[2] https://github.com/apache/arrow/pull/6953

Regards, Krisztian


[NIGHTLY] Arrow Build Report for Job nightly-2020-04-15-2

2020-04-15 Thread Crossbow


Arrow Build Report for Job nightly-2020-04-15-2

All tasks: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2

Failed Tasks:
- homebrew-cpp-autobrew:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-travis-homebrew-cpp-autobrew
- test-conda-cpp-hiveserver2:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-circle-test-conda-cpp-hiveserver2
- test-r-linux-as-cran:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-github-test-r-linux-as-cran
- test-ubuntu-18.04-cpp:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-github-test-ubuntu-18.04-cpp
- wheel-manylinux1-cp37m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-azure-wheel-manylinux1-cp37m
- wheel-osx-cp37m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-travis-wheel-osx-cp37m

Succeeded Tasks:
- centos-6-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-github-centos-6-amd64
- centos-7-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-github-centos-7-amd64
- centos-8-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-github-centos-8-amd64
- conda-linux-gcc-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-azure-conda-linux-gcc-py36
- conda-linux-gcc-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-azure-conda-linux-gcc-py37
- conda-linux-gcc-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-azure-conda-linux-gcc-py38
- conda-osx-clang-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-azure-conda-osx-clang-py36
- conda-osx-clang-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-azure-conda-osx-clang-py37
- conda-osx-clang-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-azure-conda-osx-clang-py38
- conda-win-vs2015-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-azure-conda-win-vs2015-py36
- conda-win-vs2015-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-azure-conda-win-vs2015-py37
- conda-win-vs2015-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-azure-conda-win-vs2015-py38
- debian-buster-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-github-debian-buster-amd64
- debian-stretch-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-github-debian-stretch-amd64
- gandiva-jar-osx:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-travis-gandiva-jar-osx
- gandiva-jar-xenial:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-travis-gandiva-jar-xenial
- homebrew-cpp:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-travis-homebrew-cpp
- homebrew-r-autobrew:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-travis-homebrew-r-autobrew
- test-conda-cpp-valgrind:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-github-test-conda-cpp-valgrind
- test-conda-cpp:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-github-test-conda-cpp
- test-conda-python-3.6:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-azure-test-conda-python-3.6
- test-conda-python-3.7-dask-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-circle-test-conda-python-3.7-dask-latest
- test-conda-python-3.7-hdfs-2.9.2:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-circle-test-conda-python-3.7-hdfs-2.9.2
- test-conda-python-3.7-kartothek-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-circle-test-conda-python-3.7-kartothek-latest
- test-conda-python-3.7-kartothek-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-circle-test-conda-python-3.7-kartothek-master
- test-conda-python-3.7-pandas-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-circle-test-conda-python-3.7-pandas-latest
- test-conda-python-3.7-pandas-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-circle-test-conda-python-3.7-pandas-master
- test-conda-python-3.7-spark-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-circle-test-conda-python-3.7-spark

[jira] [Created] (ARROW-8477) [C++] Enable reading and writing of long filenames for Windows

2020-04-15 Thread TP Boudreau (Jira)
TP Boudreau created ARROW-8477:
--

 Summary: [C++] Enable reading and writing of long filenames for 
Windows
 Key: ARROW-8477
 URL: https://issues.apache.org/jira/browse/ARROW-8477
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: TP Boudreau
Assignee: TP Boudreau
 Attachments: long_path.cc

Parquet file path lengths are limited to ~260 characters in Windows 
environments.  For example, the attached program runs successfully on Linux 
(provided the nested directories already exist), but fails on Windows.

Replacing the currently used _wsopen() functions with their Win32 analogues 
fixes this.  A patch is forthcoming.
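
As background on the ~260-character limit: Win32 file APIs accept an extended-length form of an absolute path, prefixed with \\?\ (or \\?\UNC\ for network shares), which lifts the limit. The following is only a minimal Python sketch of that path rewriting, not the forthcoming patch; the helper name to_extended_length_path is hypothetical, and ntpath is used so the string handling can be tried on any platform.

```python
import ntpath

# Prefixes for Windows extended-length paths.
EXTENDED_PREFIX = "\\\\?\\"
UNC_PREFIX = "\\\\?\\UNC\\"


def to_extended_length_path(path):
    """Return the extended-length form of an absolute Windows path.

    Hypothetical helper for illustration only. Extended-length paths are
    not normalized by Win32, so the input is normalized here first.
    """
    if path.startswith(EXTENDED_PREFIX):
        return path  # already in extended-length form
    if not ntpath.isabs(path):
        raise ValueError("extended-length paths must be absolute")
    normalized = ntpath.normpath(path)  # converts '/' to '\\', drops '.'
    if normalized.startswith("\\\\"):
        # \\server\share\... becomes \\?\UNC\server\share\...
        return UNC_PREFIX + normalized[2:]
    return EXTENDED_PREFIX + normalized


print(to_extended_length_path("C:/data/./file.parquet"))
```

The result would then be passed to a wide-character Win32 call; the CRT-level functions mentioned above are a separate matter that this sketch does not address.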





[jira] [Created] (ARROW-8476) [C++] Create "libarrow_thrift" containing all code requiring the Thrift libraries

2020-04-15 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-8476:
---

 Summary: [C++] Create "libarrow_thrift" containing all code 
requiring the Thrift libraries
 Key: ARROW-8476
 URL: https://issues.apache.org/jira/browse/ARROW-8476
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Wes McKinney


The purpose of this is to avoid ever having to statically link 
libthrift.a into more than one shared library. Currently we statically 
link it into libparquet.so, but there are other efforts (e.g. 
libarrow_hiveserver2, which I'd eventually like to become production-worthy) 
where Thrift symbols are required. By factoring out the serialization code into 
a helper library, we avoid the linking conundrum.





[jira] [Created] (ARROW-8475) [CI][Crossbow] Rehabilitate (or delete) hiveserver2 nightly job

2020-04-15 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8475:
--

 Summary: [CI][Crossbow] Rehabilitate (or delete) hiveserver2 
nightly job
 Key: ARROW-8475
 URL: https://issues.apache.org/jira/browse/ARROW-8475
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration
Reporter: Neal Richardson


Disabled in ARROW-8474 cc [~wesm]





[jira] [Created] (ARROW-8474) [CI][Crossbow] Skip some nightlies we don't need to run

2020-04-15 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8474:
--

 Summary: [CI][Crossbow] Skip some nightlies we don't need to run
 Key: ARROW-8474
 URL: https://issues.apache.org/jira/browse/ARROW-8474
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration
Reporter: Neal Richardson
Assignee: Neal Richardson








Re: Arrow sync call April 15 at 12:00 US/Eastern, 16:00 UTC

2020-04-15 Thread Neal Richardson
Attendees:

Projjal Chanda
Ben Kietzman
Uwe Korn
Micah Kornfield
Rok Mihevc
Antoine Pitrou
Neal Richardson
François Saint-Jacques

Discussion:

* Nested parquet: Micah restated what he emailed to the ML. Will make some
Jiras to let us parallelize the work. Bigger question about measuring read
performance on real world data.
* Integration testing: Antoine working on C++ side but it's going to need
someone to pick up on Java side. See current status at
https://docs.google.com/spreadsheets/d/1Yu68rn2XMBpAArUfCOP9LC7uHb06CQrtqKE5vQ4bQx4/edit#gid=782909347
* 0.17/nightly build failures: brainstorming ways to keep the nightly
builds greener

On Wed, Apr 15, 2020 at 8:28 AM Neal Richardson 
wrote:

> Hi all,
> Last minute reminder that our biweekly call is coming up in a half hour at
> https://meet.google.com/vtm-teks-phx. All are welcome to join. Notes will
> be sent out to the mailing list afterward.
>
> Neal
>


[jira] [Created] (ARROW-8473) "Statistics support" in rust/parquet readme is incorrect

2020-04-15 Thread Jira
Krzysztof Stanisławek created ARROW-8473:


 Summary: "Statistics support" in rust/parquet readme is incorrect
 Key: ARROW-8473
 URL: https://issues.apache.org/jira/browse/ARROW-8473
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Krzysztof Stanisławek


Statistics are not actually supported in the Rust implementation of Parquet. See 
[https://github.com/apache/arrow/blob/3e3712a14a3242d70145fb9d3d6f0f4b8c374e68/rust/parquet/src/column/writer.rs#L522]
 or similar lines in this file, or writer.rs.

https://github.com/apache/arrow/pull/6951





Re: [NIGHTLY] Arrow Build Report for Job nightly-2020-04-15-1

2020-04-15 Thread Krisztián Szűcs
On Wed, Apr 15, 2020 at 7:04 PM Crossbow  wrote:
>
>
> Arrow Build Report for Job nightly-2020-04-15-1
>
> All tasks: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1
>
> Failed Tasks:
> - homebrew-cpp-autobrew:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-travis-homebrew-cpp-autobrew
> - test-conda-cpp-hiveserver2:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-circle-test-conda-cpp-hiveserver2
> - test-conda-python-3.6:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-test-conda-python-3.6
> - test-conda-python-3.7:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-test-conda-python-3.7
> - test-conda-python-3.8:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-test-conda-python-3.8
> - test-debian-10-python-3:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-test-debian-10-python-3
> - test-debian-ruby:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-test-debian-ruby
> - test-fedora-30-python-3:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-test-fedora-30-python-3
> - test-ubuntu-18.04-python-3:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-test-ubuntu-18.04-python-3
> - test-ubuntu-ruby:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-test-ubuntu-ruby
test-conda-* are fixed on master
> - wheel-osx-cp37m:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-travis-wheel-osx-cp37m
should be fixed by https://github.com/apache/arrow/pull/6950
> - wheel-win-cp35m:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-appveyor-wheel-win-cp35m
> - wheel-win-cp36m:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-appveyor-wheel-win-cp36m
> - wheel-win-cp37m:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-appveyor-wheel-win-cp37m
> - wheel-win-cp38:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-appveyor-wheel-win-cp38
I stopped the windows wheel builds.
>
> Succeeded Tasks:
> - centos-6-amd64:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-github-centos-6-amd64
> - centos-7-amd64:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-github-centos-7-amd64
> - centos-8-amd64:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-github-centos-8-amd64
> - conda-linux-gcc-py36:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-conda-linux-gcc-py36
> - conda-linux-gcc-py37:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-conda-linux-gcc-py37
> - conda-linux-gcc-py38:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-conda-linux-gcc-py38
> - conda-osx-clang-py36:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-conda-osx-clang-py36
> - conda-osx-clang-py37:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-conda-osx-clang-py37
> - conda-osx-clang-py38:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-conda-osx-clang-py38
> - conda-win-vs2015-py36:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-conda-win-vs2015-py36
> - conda-win-vs2015-py37:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-conda-win-vs2015-py37
> - conda-win-vs2015-py38:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-conda-win-vs2015-py38
> - debian-buster-amd64:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-github-debian-buster-amd64
> - debian-stretch-amd64:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-github-debian-stretch-amd64
> - gandiva-jar-osx:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-travis-gandiva-jar-osx
> - gandiva-jar-xenial:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-travis-gandiva-jar-xenial
> - homebrew-cpp:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-travis-homebrew-cpp
> - homebrew-r-autobrew:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-tra

[NIGHTLY] Arrow Build Report for Job nightly-2020-04-15-1

2020-04-15 Thread Crossbow


Arrow Build Report for Job nightly-2020-04-15-1

All tasks: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1

Failed Tasks:
- homebrew-cpp-autobrew:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-travis-homebrew-cpp-autobrew
- test-conda-cpp-hiveserver2:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-circle-test-conda-cpp-hiveserver2
- test-conda-python-3.6:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-test-conda-python-3.6
- test-conda-python-3.7:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-test-conda-python-3.7
- test-conda-python-3.8:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-test-conda-python-3.8
- test-debian-10-python-3:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-test-debian-10-python-3
- test-debian-ruby:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-test-debian-ruby
- test-fedora-30-python-3:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-test-fedora-30-python-3
- test-ubuntu-18.04-python-3:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-test-ubuntu-18.04-python-3
- test-ubuntu-ruby:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-test-ubuntu-ruby
- wheel-osx-cp37m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-travis-wheel-osx-cp37m
- wheel-win-cp35m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-appveyor-wheel-win-cp35m
- wheel-win-cp36m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-appveyor-wheel-win-cp36m
- wheel-win-cp37m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-appveyor-wheel-win-cp37m
- wheel-win-cp38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-appveyor-wheel-win-cp38

Succeeded Tasks:
- centos-6-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-github-centos-6-amd64
- centos-7-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-github-centos-7-amd64
- centos-8-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-github-centos-8-amd64
- conda-linux-gcc-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-conda-linux-gcc-py36
- conda-linux-gcc-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-conda-linux-gcc-py37
- conda-linux-gcc-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-conda-linux-gcc-py38
- conda-osx-clang-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-conda-osx-clang-py36
- conda-osx-clang-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-conda-osx-clang-py37
- conda-osx-clang-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-conda-osx-clang-py38
- conda-win-vs2015-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-conda-win-vs2015-py36
- conda-win-vs2015-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-conda-win-vs2015-py37
- conda-win-vs2015-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-conda-win-vs2015-py38
- debian-buster-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-github-debian-buster-amd64
- debian-stretch-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-github-debian-stretch-amd64
- gandiva-jar-osx:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-travis-gandiva-jar-osx
- gandiva-jar-xenial:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-travis-gandiva-jar-xenial
- homebrew-cpp:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-travis-homebrew-cpp
- homebrew-r-autobrew:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-travis-homebrew-r-autobrew
- test-conda-cpp-valgrind:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-github-test-conda-cpp-valgrind
- test-conda-cpp:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-github-test-conda-cpp
- test-conda-python-3.7-dask-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?qu

Re: Coordinating / scheduling C++ Parquet-Arrow nested data work (ARROW-1644 and others)

2020-04-15 Thread Wes McKinney
hi Micah,

Sounds good. It seems like there are a few projects where people might
be able to work without stepping on each other's toes

A. Array reassembly from raw repetition/definition levels (I would
guess this would be your focus)
B. Schema and data generation for round-trip correctness and
performance testing (I reckon that the unit tests for A will largely
be hand-written examples like you did for the write path)
C. Benchmarks, particularly to be able to assess performance changes
going from the old incomplete implementations to the new ones

Some of us should be able to pitch in to help with this. Might also be
a good opportunity to do some cleanup of the test code in
cpp/src/parquet/arrow
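
To make item A concrete, here is a minimal Python sketch of reassembly for the simplest nested case only: a nullable list column with required elements, where the maximum definition level is 2 and the maximum repetition level is 1. The function name and the level meanings spelled out in the comments are assumptions for this illustration; this is not the parquet-cpp implementation, which has to handle arbitrary nesting.

```python
def levels_to_list_layout(def_levels, rep_levels):
    """Convert definition/repetition levels to list offsets and validity.

    Assumed schema (illustrative): optional list of required values.
      def level 0 -> the list itself is null
      def level 1 -> the list is present but empty
      def level 2 -> one element is present
      rep level 0 -> this entry starts a new row
      rep level 1 -> this entry continues the current list
    """
    offsets = [0]
    validity = []
    for d, r in zip(def_levels, rep_levels):
        if r == 0:
            # New row: record whether the list is non-null and open a
            # new slot that starts where the previous list ended.
            validity.append(d >= 1)
            offsets.append(offsets[-1])
        if d == 2:
            # A real element: extend the end offset of the current list.
            offsets[-1] += 1
    return offsets, validity


# Rows: [1, 2], null, [], [3]
offsets, validity = levels_to_list_layout(
    def_levels=[2, 2, 0, 1, 2],
    rep_levels=[0, 1, 0, 0, 0],
)
print(offsets, validity)
```

Hand-written cases of this size are the kind of unit test item B refers to; generated schemas and data would cover the deeper nestings.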

- Wes

On Tue, Apr 14, 2020 at 11:19 PM Micah Kornfield  wrote:
>
> Hi Wes,
> Yes, I'm making progress and at this point I anticipate being able to finish 
> it off by next release, possibly without support for round tripping fixed 
> size lists.  I've been spending some time thinking about different approaches 
> and have started coding some of the building blocks, which I think in the 
> common case (relatively low nesting levels) should be fairly performant (I'm 
> also going to write some benchmarks to sanity check this).  One caveat is 
> that my schedule is going to change slightly next week and it's possible my 
> bandwidth might be more limited; I'll update the list if this happens.
>
> I think there are at least two areas that I'm not working on that could be 
> parallelized if you or your team has bandwidth.
>
> 1. It would be good to have some parquet files representing real world 
> datasets available to benchmark against.
> 2. The higher level book keeping of tracking which def-levels/rep-levels are 
> needed to compare against for any particular column (i.e. preceding repeated 
> parent).  I'm currently working on the code that takes these and converts 
> them to offsets/null fields.
>
> I can go into more details if you or your team would like to collaborate.
>
> Thanks,
> Micah
>
> On Tue, Apr 14, 2020 at 7:48 AM Wes McKinney  wrote:
>>
>> hi Micah,
>>
>> I'm glad that we have the write side of nested completed for 0.17.0.
>>
>> As far as completing the read side and then implementing sufficient
>> testing to exercise corner cases in end-to-end reads/writes, do you
>> anticipate being able to work on this in the next 4-6 weeks (obviously
>> the state of the world has affected everyone's availability /
>> bandwidth)? I ask because someone from my team (or me also) may be
>> able to get involved and help this move along. It'd be great to have
>> this 100% completed and checked off our list for the next release
>> (i.e. 0.18.0 or 1.0.0 depending on whether the Java/C++ integration
>> tests get completed also)
>>
>> thanks
>> Wes
>>
>> On Wed, Feb 5, 2020 at 12:12 AM Micah Kornfield  
>> wrote:
>> >>
>> >> Glad to hear about the progress. As I mentioned on #2, what do you
>> >> think about setting up a feature branch for you to merge PRs into?
>> >> Then the branch can be iterated on and we can merge it back when it's
>> >> feature complete and does not have perf regressions for the flat
>> >> read/write path.
>> >>
>> > I'd like to avoid a separate branch if possible.  I'm willing to close the 
>> > open PR until I'm sure it is needed, but I'm hoping that keeping PRs as small 
>> > and focused as possible, with performance testing along the way, will be a 
>> > better reviewer and developer experience here.
>> >
>> >> The earliest I'd have time to work on this myself would likely be
>> >> sometime in March. Others are welcome to jump in as well (and it'd be
>> >> great to increase the overall level of knowledge of the Parquet
>> >> codebase)
>> >
>> > Hopefully, Igor can help out otherwise I'll take up the read path after I 
>> > finish the write path.
>> >
>> > -Micah
>> >
>> > On Tue, Feb 4, 2020 at 3:31 PM Wes McKinney  wrote:
>> >>
>> >> hi Micah
>> >>
>> >> On Mon, Feb 3, 2020 at 12:01 AM Micah Kornfield  
>> >> wrote:
>> >> >
>> >> > Just to give an update.  I've been a little bit delayed, but my 
>> >> > progress is
>> >> > as follows:
>> >> > 1.  Had 1 PR merged that will exercise basic end-to-end tests.
>> >> > 2.  Have another PR open that adds a configuration option in C++ to
>> >> > determine which algorithm version to use for reading/writing: the
>> >> > existing version or the new version that supports complex-nested
>> >> > arrays.  I think a large amount of code will be reused/delegated to,
>> >> > but I will err on the side of not touching the existing
>> >> > code/algorithms so that any errors in the implementation or
>> >> > performance regressions can hopefully be mitigated at runtime.  I
>> >> > expect that in later releases (once the code has "baked") the option
>> >> > will become a no-op.
>> >>
>> >> Glad to hear about the progress. As I mentioned on #2, what do you
>> >> think about setting up a feature branch for you to merge PRs into?
>> >> Then the branch can be iterated on and we can mer

Arrow sync call April 15 at 12:00 US/Eastern, 16:00 UTC

2020-04-15 Thread Neal Richardson
Hi all,
Last minute reminder that our biweekly call is coming up in a half hour at
https://meet.google.com/vtm-teks-phx. All are welcome to join. Notes will
be sent out to the mailing list afterward.

Neal


[jira] [Created] (ARROW-8472) [Go][Integration] Represent 64 bit integers as JSON::string

2020-04-15 Thread Ben Kietzman (Jira)
Ben Kietzman created ARROW-8472:
---

 Summary: [Go][Integration] Represent 64 bit integers as 
JSON::string
 Key: ARROW-8472
 URL: https://issues.apache.org/jira/browse/ARROW-8472
 Project: Apache Arrow
  Issue Type: Bug
  Components: Go, Integration
Affects Versions: 0.16.0
Reporter: Ben Kietzman
 Fix For: 1.0.0


see ARROW-6407





[jira] [Created] (ARROW-8471) [C++][Integration] Regression to /u?int64/ as JSON::number

2020-04-15 Thread Ben Kietzman (Jira)
Ben Kietzman created ARROW-8471:
---

 Summary: [C++][Integration] Regression to /u?int64/ as JSON::number
 Key: ARROW-8471
 URL: https://issues.apache.org/jira/browse/ARROW-8471
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Integration
Affects Versions: 0.16.0
Reporter: Ben Kietzman
Assignee: Ben Kietzman
 Fix For: 1.0.0


In moving datagen.py under archery, the fix for ARROW-6310 was clobbered, 
which results in 64 bit integers being represented as numbers in integration JSON.
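
For context on why the string form matters: many JSON readers parse numbers as IEEE doubles, which have a 53-bit significand and so cannot hold every 64-bit integer exactly. The Python sketch below illustrates the round trip; the helper names and the "DATA" key are illustrative assumptions, not the actual datagen.py code.

```python
import json

INT64_MAX = 2**63 - 1


def encode_int64_column(values):
    """Encode 64-bit integers as decimal strings (illustrative helper)."""
    return [str(v) for v in values]


def decode_int64_column(strings):
    """Parse decimal strings back into Python integers."""
    return [int(s) for s in strings]


# Going through a double rounds INT64_MAX up to 2**63.
lossy = int(float(INT64_MAX))

# The string encoding survives a JSON round trip unchanged.
payload = json.dumps({"DATA": encode_int64_column([INT64_MAX, -1])})
restored = decode_int64_column(json.loads(payload)["DATA"])

print(lossy == INT64_MAX, restored == [INT64_MAX, -1])
```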





[jira] [Created] (ARROW-8470) [Python][R] Expose incremental write API for Feather files

2020-04-15 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-8470:
---

 Summary: [Python][R] Expose incremental write API for Feather files
 Key: ARROW-8470
 URL: https://issues.apache.org/jira/browse/ARROW-8470
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python, R
Reporter: Wes McKinney


This is already available for writing IPC files, so this would mostly be an 
interface to that with the addition of logic to handle conversions from Python 
or R data frames and splitting the inputs based on the configured Feather 
chunksize
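
The splitting step could look roughly like the Python sketch below, which only computes the row ranges for each chunk. The function name iter_chunk_bounds is hypothetical, and the conversion of each slice to a record batch and the write through the IPC file writer are left out.

```python
def iter_chunk_bounds(num_rows, chunksize):
    """Yield (offset, length) pairs covering num_rows in chunksize steps.

    Hypothetical helper: each pair describes one slice of the input data
    frame that would be converted and written as a single chunk.
    """
    if chunksize <= 0:
        raise ValueError("chunksize must be positive")
    for start in range(0, num_rows, chunksize):
        yield start, min(chunksize, num_rows - start)


bounds = list(iter_chunk_bounds(10, 4))
print(bounds)
```

An incremental API would call the writer once per pair instead of materializing the whole table first.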





[jira] [Created] (ARROW-8469) [Dev] Fix nightly docker tests on azure

2020-04-15 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-8469:
--

 Summary: [Dev] Fix nightly docker tests on azure
 Key: ARROW-8469
 URL: https://issues.apache.org/jira/browse/ARROW-8469
 Project: Apache Arrow
  Issue Type: Bug
  Components: Developer Tools
Reporter: Krisztian Szucs
Assignee: Krisztian Szucs
 Fix For: 0.17.0


Need to remove pushd/popd from the azure template, see build log 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-azure-test-conda-python-3.7





[jira] [Created] (ARROW-8468) [Document] Fix the incorrect null bits description

2020-04-15 Thread Liya Fan (Jira)
Liya Fan created ARROW-8468:
---

 Summary: [Document] Fix the incorrect null bits description
 Key: ARROW-8468
 URL: https://issues.apache.org/jira/browse/ARROW-8468
 Project: Apache Arrow
  Issue Type: Bug
  Components: Documentation
Reporter: Liya Fan
Assignee: Liya Fan


The description of the null bits in arrays.rst is incorrect.





[jira] [Created] (ARROW-8467) [C++] Test cases using ArrayFromJSON assume only a little-endian platform

2020-04-15 Thread Kazuaki Ishizaki (Jira)
Kazuaki Ishizaki created ARROW-8467:
---

 Summary: [C++] Test cases using ArrayFromJSON assume only a 
little-endian platform
 Key: ARROW-8467
 URL: https://issues.apache.org/jira/browse/ARROW-8467
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Kazuaki Ishizaki


Test cases using ArrayFromJSON assume a little-endian platform.

The following test cases seem to assume a little-endian platform:
TEST_F(TestChunkedArray, View) at 
https://github.com/apache/arrow/blob/master/cpp/src/arrow/table_test.cc#L175
TEST(TestArrayView, PrimitiveAsFixedSizeBinary) at 
https://github.com/apache/arrow/blob/master/cpp/src/arrow/array_view_test.cc#L105
TEST(TestArrayView, StructAsStructSimple) at 
https://github.com/apache/arrow/blob/master/cpp/src/arrow/array_view_test.cc#L126

One example is {{PrimitiveAsFixedSizeBinary}}.

{code}
TEST(TestArrayView, PrimitiveAsFixedSizeBinary) {
  auto arr = ArrayFromJSON(int32(), "[2020568934, 2054316386, null]");
  auto expected =
      ArrayFromJSON(fixed_size_binary(4), R"(["foox", "barz", null])");
  CheckView(arr, expected);
  CheckView(expected, arr);
}
{code}


The expected strings are represented in binary as follows:
{code}
"foox" = [0x66 0x6f 0x6f 0x78]
"barz" = [0x62 0x61 0x72 0x7a]
{code}

The test supplies its input as raw int32 values. The current values produce 
the expected byte sequence only on a little-endian platform:
{code}
2020568934 = 0x786f6f66
2054316386 = 0x7a726162
{code}
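
The byte-order dependence can be checked outside of Arrow. The following is a 
small Python sketch using only the standard struct module; it is not part of 
the Arrow test suite. It packs the two int32 values with an explicit byte 
order and shows that only the little-endian layout yields "foox" and "barz". 
The last lines show one possible way to derive the integers from the strings 
for the host byte order, which is an assumption about how a test could be 
made portable, not necessarily the fix that should be adopted.

```python
import struct
import sys

values = [2020568934, 2054316386]

# Pack each value as a 4-byte signed integer with an explicit byte order.
little = [struct.pack("<i", v) for v in values]
big = [struct.pack(">i", v) for v in values]

# "=" means native byte order with standard sizes, i.e. what the host uses.
native = [struct.pack("=i", v) for v in values]

print(sys.byteorder, little, big)

# Deriving the integers from the expected strings for the host byte order
# gives values whose in-memory bytes match on either kind of platform.
portable = [int.from_bytes(s, sys.byteorder) for s in (b"foox", b"barz")]
print(portable)
```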





[NIGHTLY] Arrow Build Report for Job nightly-2020-04-15-0

2020-04-15 Thread Crossbow


Arrow Build Report for Job nightly-2020-04-15-0

All tasks: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0

Failed Tasks:
- homebrew-cpp-autobrew:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-travis-homebrew-cpp-autobrew
- test-conda-cpp-hiveserver2:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-circle-test-conda-cpp-hiveserver2
- test-conda-python-3.6:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-azure-test-conda-python-3.6
- test-conda-python-3.7:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-azure-test-conda-python-3.7
- test-conda-python-3.8:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-azure-test-conda-python-3.8
- test-debian-10-python-3:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-azure-test-debian-10-python-3
- test-debian-ruby:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-azure-test-debian-ruby
- test-fedora-30-python-3:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-azure-test-fedora-30-python-3
- test-ubuntu-18.04-python-3:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-azure-test-ubuntu-18.04-python-3
- test-ubuntu-ruby:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-azure-test-ubuntu-ruby

Pending Tasks:
- test-conda-python-3.7-spark-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-circle-test-conda-python-3.7-spark-master

Succeeded Tasks:
- centos-6-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-github-centos-6-amd64
- centos-7-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-github-centos-7-amd64
- centos-8-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-github-centos-8-amd64
- conda-linux-gcc-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-azure-conda-linux-gcc-py36
- conda-linux-gcc-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-azure-conda-linux-gcc-py37
- conda-linux-gcc-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-azure-conda-linux-gcc-py38
- conda-osx-clang-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-azure-conda-osx-clang-py36
- conda-osx-clang-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-azure-conda-osx-clang-py37
- conda-osx-clang-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-azure-conda-osx-clang-py38
- conda-win-vs2015-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-azure-conda-win-vs2015-py36
- conda-win-vs2015-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-azure-conda-win-vs2015-py37
- conda-win-vs2015-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-azure-conda-win-vs2015-py38
- debian-buster-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-github-debian-buster-amd64
- debian-stretch-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-github-debian-stretch-amd64
- gandiva-jar-osx:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-travis-gandiva-jar-osx
- gandiva-jar-xenial:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-travis-gandiva-jar-xenial
- homebrew-cpp:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-travis-homebrew-cpp
- homebrew-r-autobrew:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-travis-homebrew-r-autobrew
- test-conda-cpp-valgrind:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-github-test-conda-cpp-valgrind
- test-conda-cpp:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-github-test-conda-cpp
- test-conda-python-3.7-dask-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-circle-test-conda-python-3.7-dask-latest
- test-conda-python-3.7-hdfs-2.9.2:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-circle-test-conda-python-3.7-hdfs-2.9.2
- test-conda-python-3.7-kartothek-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-circle-test-conda-python-3.7-kartothek-latest
- test-conda-python-3.7-kartothek-master:
  URL: 
https://gith