Re: 0.17 release procedure
Hi Wes,

I agree, I made a mistake. I've opened https://github.com/apache/arrow/pull/6955 to revert it and will merge once it turns green.

In terms of unblocking the release, I don't think reverting will fix the issue (offline, Krisztián mentioned he tried an RC on the previous commit). Given this hasn't been running in CI and is a "contrib package", I'd advocate excluding the package from this release if we don't find a solution quickly. (If memory serves, I asked the original contributor to add it to CI and it appears that never happened, so for all intents and purposes I think we should treat this code as unmaintained/dead.)

Thanks,
Micah

On Wed, Apr 15, 2020 at 5:43 PM Wes McKinney wrote:
> FTR I don't think that ARROW-7534 should have been merged right around
> the time that we are trying to produce a release candidate. Any
> changes that impact packaging or codebase structures should be
> approached with significant caution close to releases.
>
> On Wed, Apr 15, 2020 at 7:25 PM Krisztián Szűcs wrote:
> >
> > Hi,
> >
> > We've merged the last required pull requests later today[/yesterday],
> > so I started to cut RC0.
> > The release process doesn't go smoothly, among other smaller problems
> > I discovered a crash with the ORC Java JNI bindings (local error [1]),
> > turned out that we don't run the orc-jni tests on the CI. I put up a PR
> > to enable them [2], it has not reproduced the exact issue yet.
> >
> > Any help from the JNI developers would be appreciated. I can also cut
> > RC0 with JNI disabled.
> >
> > [1] https://gist.github.com/kszucs/67205eda6cd19e3cd08c86894f5b4c2d
> > [2] https://github.com/apache/arrow/pull/6953
> >
> > Regards, Krisztian
[jira] [Created] (ARROW-8478) [Java] Rollback contrib package changes.
Micah Kornfield created ARROW-8478:

Summary: [Java] Rollback contrib package changes.
Key: ARROW-8478
URL: https://issues.apache.org/jira/browse/ARROW-8478
Project: Apache Arrow
Issue Type: Bug
Components: Java
Reporter: Micah Kornfield
Assignee: Micah Kornfield

--
This message was sent by Atlassian Jira (v8.3.4#803005)
Re: 0.17 release procedure
FTR I don't think that ARROW-7534 should have been merged right around the time that we are trying to produce a release candidate. Any changes that impact packaging or codebase structures should be approached with significant caution close to releases.

On Wed, Apr 15, 2020 at 7:25 PM Krisztián Szűcs wrote:
>
> Hi,
>
> We've merged the last required pull requests later today[/yesterday],
> so I started to cut RC0.
> The release process doesn't go smoothly, among other smaller problems
> I discovered a crash with the ORC Java JNI bindings (local error [1]),
> turned out that we don't run the orc-jni tests on the CI. I put up a PR
> to enable them [2], it has not reproduced the exact issue yet.
>
> Any help from the JNI developers would be appreciated. I can also cut
> RC0 with JNI disabled.
>
> [1] https://gist.github.com/kszucs/67205eda6cd19e3cd08c86894f5b4c2d
> [2] https://github.com/apache/arrow/pull/6953
>
> Regards, Krisztian
0.17 release procedure
Hi,

We've merged the last required pull requests later today[/yesterday], so I started to cut RC0.
The release process doesn't go smoothly, among other smaller problems I discovered a crash with the ORC Java JNI bindings (local error [1]), turned out that we don't run the orc-jni tests on the CI. I put up a PR to enable them [2], it has not reproduced the exact issue yet.

Any help from the JNI developers would be appreciated. I can also cut RC0 with JNI disabled.

[1] https://gist.github.com/kszucs/67205eda6cd19e3cd08c86894f5b4c2d
[2] https://github.com/apache/arrow/pull/6953

Regards, Krisztian
[NIGHTLY] Arrow Build Report for Job nightly-2020-04-15-2
Arrow Build Report for Job nightly-2020-04-15-2

All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2

Failed Tasks:
- homebrew-cpp-autobrew:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-travis-homebrew-cpp-autobrew
- test-conda-cpp-hiveserver2:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-circle-test-conda-cpp-hiveserver2
- test-r-linux-as-cran:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-github-test-r-linux-as-cran
- test-ubuntu-18.04-cpp:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-github-test-ubuntu-18.04-cpp
- wheel-manylinux1-cp37m:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-azure-wheel-manylinux1-cp37m
- wheel-osx-cp37m:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-travis-wheel-osx-cp37m

Succeeded Tasks:
- centos-6-amd64:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-github-centos-6-amd64
- centos-7-amd64:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-github-centos-7-amd64
- centos-8-amd64:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-github-centos-8-amd64
- conda-linux-gcc-py36:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-azure-conda-linux-gcc-py36
- conda-linux-gcc-py37:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-azure-conda-linux-gcc-py37
- conda-linux-gcc-py38:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-azure-conda-linux-gcc-py38
- conda-osx-clang-py36:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-azure-conda-osx-clang-py36
- conda-osx-clang-py37:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-azure-conda-osx-clang-py37
- conda-osx-clang-py38:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-azure-conda-osx-clang-py38
- conda-win-vs2015-py36:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-azure-conda-win-vs2015-py36
- conda-win-vs2015-py37:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-azure-conda-win-vs2015-py37
- conda-win-vs2015-py38:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-azure-conda-win-vs2015-py38
- debian-buster-amd64:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-github-debian-buster-amd64
- debian-stretch-amd64:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-github-debian-stretch-amd64
- gandiva-jar-osx:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-travis-gandiva-jar-osx
- gandiva-jar-xenial:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-travis-gandiva-jar-xenial
- homebrew-cpp:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-travis-homebrew-cpp
- homebrew-r-autobrew:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-travis-homebrew-r-autobrew
- test-conda-cpp-valgrind:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-github-test-conda-cpp-valgrind
- test-conda-cpp:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-github-test-conda-cpp
- test-conda-python-3.6:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-azure-test-conda-python-3.6
- test-conda-python-3.7-dask-latest:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-circle-test-conda-python-3.7-dask-latest
- test-conda-python-3.7-hdfs-2.9.2:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-circle-test-conda-python-3.7-hdfs-2.9.2
- test-conda-python-3.7-kartothek-latest:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-circle-test-conda-python-3.7-kartothek-latest
- test-conda-python-3.7-kartothek-master:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-circle-test-conda-python-3.7-kartothek-master
- test-conda-python-3.7-pandas-latest:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-circle-test-conda-python-3.7-pandas-latest
- test-conda-python-3.7-pandas-master:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-circle-test-conda-python-3.7-pandas-master
- test-conda-python-3.7-spark-master:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-2-circle-test-conda-python-3.7-spark
[jira] [Created] (ARROW-8477) [C++] Enable reading and writing of long filenames for Windows
TP Boudreau created ARROW-8477:

Summary: [C++] Enable reading and writing of long filenames for Windows
Key: ARROW-8477
URL: https://issues.apache.org/jira/browse/ARROW-8477
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: TP Boudreau
Assignee: TP Boudreau
Attachments: long_path.cc

Parquet file path lengths are limited to ~260 characters in Windows environments. For example, the attached program runs successfully on Linux (provided the nested directories already exist), but fails on Windows.

Replacing the currently used _wsopen() functions with their Win32 analogues fixes this. A patch is forthcoming.
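For background on the ~260-character figure: classic Windows path handling is capped at MAX_PATH (260 characters, counting the terminating NUL), while the wide-character Win32 file APIs accept much longer paths when the path carries the `\\?\` extended-length prefix. The sketch below only illustrates that limit and prefix. It is not the forthcoming patch, the helper names `exceeds_max_path` and `to_extended_length` are hypothetical, and it handles absolute drive-letter paths only.

```python
MAX_PATH = 260  # documented Windows constant; includes the terminating NUL
EXTENDED_PREFIX = "\\\\?\\"  # the four literal characters \\?\


def exceeds_max_path(path: str) -> bool:
    """Return True if `path` plus its NUL terminator does not fit in MAX_PATH."""
    return len(path) + 1 > MAX_PATH


def to_extended_length(path: str) -> str:
    """Add the extended-length prefix to an absolute drive-letter path.

    Hypothetical helper for illustration: UNC and relative paths are rejected.
    Forward slashes are converted because prefixed paths are not normalized.
    """
    if path.startswith(EXTENDED_PREFIX):
        return path  # already prefixed, nothing to do
    normalized = path.replace("/", "\\")
    if len(normalized) < 3 or not normalized[0].isalpha() or normalized[1:3] != ":\\":
        raise ValueError("expected an absolute drive-letter path")
    return EXTENDED_PREFIX + normalized
```

A deeply nested directory tree, like the one the attached program is described as using, crosses the limit quickly: six 50-character directory names under `C:\` already exceed it.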
[jira] [Created] (ARROW-8476) [C++] Create "libarrow_thrift" containing all code requiring the Thrift libraries
Wes McKinney created ARROW-8476:

Summary: [C++] Create "libarrow_thrift" containing all code requiring the Thrift libraries
Key: ARROW-8476
URL: https://issues.apache.org/jira/browse/ARROW-8476
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: Wes McKinney

The purpose of this is to avoid ever having to statically link libthrift.a into more than one shared library. Currently we are statically linking into libparquet.so, but there are some other efforts (e.g. libarrow_hiveserver2, which I'd eventually like to become production-worthy) where Thrift symbols are required. By factoring out the serialization code into a helper library we avoid the linking conundrum.
[jira] [Created] (ARROW-8475) [CI][Crossbow] Rehabilitate (or delete) hiveserver2 nightly job
Neal Richardson created ARROW-8475:

Summary: [CI][Crossbow] Rehabilitate (or delete) hiveserver2 nightly job
Key: ARROW-8475
URL: https://issues.apache.org/jira/browse/ARROW-8475
Project: Apache Arrow
Issue Type: Improvement
Components: Continuous Integration
Reporter: Neal Richardson

Disabled in ARROW-8474.

cc [~wesm]
[jira] [Created] (ARROW-8474) [CI][Crossbow] Skip some nightlies we don't need to run
Neal Richardson created ARROW-8474:

Summary: [CI][Crossbow] Skip some nightlies we don't need to run
Key: ARROW-8474
URL: https://issues.apache.org/jira/browse/ARROW-8474
Project: Apache Arrow
Issue Type: Improvement
Components: Continuous Integration
Reporter: Neal Richardson
Assignee: Neal Richardson
Re: Arrow sync call April 15 at 12:00 US/Eastern, 16:00 UTC
Attendees:

Projjal Chanda
Ben Kietzman
Uwe Korn
Micah Kornfield
Rok Mihevc
Antoine Pitrou
Neal Richardson
François Saint-Jacques

Discussion:

* Nested parquet: Micah restated what he emailed to the ML. Will make some Jiras to let us parallelize the work. Bigger question about measuring read performance on real world data.
* Integration testing: Antoine working on C++ side but it's going to need someone to pick up on Java side. See current status at https://docs.google.com/spreadsheets/d/1Yu68rn2XMBpAArUfCOP9LC7uHb06CQrtqKE5vQ4bQx4/edit#gid=782909347
* 0.17/nightly build failures: brainstorming ways to keep the nightly builds greener

On Wed, Apr 15, 2020 at 8:28 AM Neal Richardson wrote:
> Hi all,
> Last minute reminder that our biweekly call is coming up in a half hour at
> https://meet.google.com/vtm-teks-phx. All are welcome to join. Notes will
> be sent out to the mailing list afterward.
>
> Neal
[jira] [Created] (ARROW-8473) "Statistics support" in rust/parquet readme is incorrect
Krzysztof Stanisławek created ARROW-8473:

Summary: "Statistics support" in rust/parquet readme is incorrect
Key: ARROW-8473
URL: https://issues.apache.org/jira/browse/ARROW-8473
Project: Apache Arrow
Issue Type: Bug
Components: Rust
Reporter: Krzysztof Stanisławek

Statistics are not actually supported in the Rust implementation of Parquet. See [https://github.com/apache/arrow/blob/3e3712a14a3242d70145fb9d3d6f0f4b8c374e68/rust/parquet/src/column/writer.rs#L522] or similar lines in this file, or writer.rs.

https://github.com/apache/arrow/pull/6951
Re: [NIGHTLY] Arrow Build Report for Job nightly-2020-04-15-1
On Wed, Apr 15, 2020 at 7:04 PM Crossbow wrote:
>
>
> Arrow Build Report for Job nightly-2020-04-15-1
>
> All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1
>
> Failed Tasks:
> - homebrew-cpp-autobrew:
>   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-travis-homebrew-cpp-autobrew
> - test-conda-cpp-hiveserver2:
>   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-circle-test-conda-cpp-hiveserver2
> - test-conda-python-3.6:
>   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-test-conda-python-3.6
> - test-conda-python-3.7:
>   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-test-conda-python-3.7
> - test-conda-python-3.8:
>   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-test-conda-python-3.8
> - test-debian-10-python-3:
>   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-test-debian-10-python-3
> - test-debian-ruby:
>   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-test-debian-ruby
> - test-fedora-30-python-3:
>   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-test-fedora-30-python-3
> - test-ubuntu-18.04-python-3:
>   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-test-ubuntu-18.04-python-3
> - test-ubuntu-ruby:
>   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-test-ubuntu-ruby

test-conda-* are fixed on master

> - wheel-osx-cp37m:
>   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-travis-wheel-osx-cp37m

should be fixed by https://github.com/apache/arrow/pull/6950

> - wheel-win-cp35m:
>   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-appveyor-wheel-win-cp35m
> - wheel-win-cp36m:
>   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-appveyor-wheel-win-cp36m
> - wheel-win-cp37m:
>   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-appveyor-wheel-win-cp37m
> - wheel-win-cp38:
>   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-appveyor-wheel-win-cp38

I stopped the windows wheel builds.

>
> Succeeded Tasks:
> - centos-6-amd64:
>   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-github-centos-6-amd64
> - centos-7-amd64:
>   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-github-centos-7-amd64
> - centos-8-amd64:
>   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-github-centos-8-amd64
> - conda-linux-gcc-py36:
>   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-conda-linux-gcc-py36
> - conda-linux-gcc-py37:
>   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-conda-linux-gcc-py37
> - conda-linux-gcc-py38:
>   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-conda-linux-gcc-py38
> - conda-osx-clang-py36:
>   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-conda-osx-clang-py36
> - conda-osx-clang-py37:
>   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-conda-osx-clang-py37
> - conda-osx-clang-py38:
>   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-conda-osx-clang-py38
> - conda-win-vs2015-py36:
>   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-conda-win-vs2015-py36
> - conda-win-vs2015-py37:
>   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-conda-win-vs2015-py37
> - conda-win-vs2015-py38:
>   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-conda-win-vs2015-py38
> - debian-buster-amd64:
>   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-github-debian-buster-amd64
> - debian-stretch-amd64:
>   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-github-debian-stretch-amd64
> - gandiva-jar-osx:
>   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-travis-gandiva-jar-osx
> - gandiva-jar-xenial:
>   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-travis-gandiva-jar-xenial
> - homebrew-cpp:
>   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-travis-homebrew-cpp
> - homebrew-r-autobrew:
>   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-tra
[NIGHTLY] Arrow Build Report for Job nightly-2020-04-15-1
Arrow Build Report for Job nightly-2020-04-15-1

All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1

Failed Tasks:
- homebrew-cpp-autobrew:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-travis-homebrew-cpp-autobrew
- test-conda-cpp-hiveserver2:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-circle-test-conda-cpp-hiveserver2
- test-conda-python-3.6:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-test-conda-python-3.6
- test-conda-python-3.7:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-test-conda-python-3.7
- test-conda-python-3.8:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-test-conda-python-3.8
- test-debian-10-python-3:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-test-debian-10-python-3
- test-debian-ruby:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-test-debian-ruby
- test-fedora-30-python-3:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-test-fedora-30-python-3
- test-ubuntu-18.04-python-3:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-test-ubuntu-18.04-python-3
- test-ubuntu-ruby:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-test-ubuntu-ruby
- wheel-osx-cp37m:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-travis-wheel-osx-cp37m
- wheel-win-cp35m:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-appveyor-wheel-win-cp35m
- wheel-win-cp36m:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-appveyor-wheel-win-cp36m
- wheel-win-cp37m:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-appveyor-wheel-win-cp37m
- wheel-win-cp38:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-appveyor-wheel-win-cp38

Succeeded Tasks:
- centos-6-amd64:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-github-centos-6-amd64
- centos-7-amd64:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-github-centos-7-amd64
- centos-8-amd64:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-github-centos-8-amd64
- conda-linux-gcc-py36:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-conda-linux-gcc-py36
- conda-linux-gcc-py37:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-conda-linux-gcc-py37
- conda-linux-gcc-py38:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-conda-linux-gcc-py38
- conda-osx-clang-py36:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-conda-osx-clang-py36
- conda-osx-clang-py37:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-conda-osx-clang-py37
- conda-osx-clang-py38:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-conda-osx-clang-py38
- conda-win-vs2015-py36:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-conda-win-vs2015-py36
- conda-win-vs2015-py37:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-conda-win-vs2015-py37
- conda-win-vs2015-py38:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-azure-conda-win-vs2015-py38
- debian-buster-amd64:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-github-debian-buster-amd64
- debian-stretch-amd64:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-github-debian-stretch-amd64
- gandiva-jar-osx:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-travis-gandiva-jar-osx
- gandiva-jar-xenial:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-travis-gandiva-jar-xenial
- homebrew-cpp:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-travis-homebrew-cpp
- homebrew-r-autobrew:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-travis-homebrew-r-autobrew
- test-conda-cpp-valgrind:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-github-test-conda-cpp-valgrind
- test-conda-cpp:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-1-github-test-conda-cpp
- test-conda-python-3.7-dask-latest:
  URL: https://github.com/ursa-labs/crossbow/branches/all?qu
Re: Coordinating / scheduling C++ Parquet-Arrow nested data work (ARROW-1644 and others)
hi Micah,

Sounds good. It seems like there are a few projects where people might be able to work without stepping on each other's toes:

A. Array reassembly from raw repetition/definition levels (I would guess this would be your focus)

B. Schema and data generation for round-trip correctness and performance testing (I reckon that the unit tests for A will largely be hand-written examples like you did for the write path)

C. Benchmarks, particularly to be able to assess performance changes going from the old incomplete implementations to the new ones

Some of us should be able to pitch in to help with this. Might also be a good opportunity to do some cleanup of the test code in cpp/src/parquet/arrow

- Wes

On Tue, Apr 14, 2020 at 11:19 PM Micah Kornfield wrote:
>
> Hi Wes,
> Yes, I'm making progress and at this point I anticipate being able to finish
> it off by next release, possibly without support for round tripping fixed
> size lists. I've been spending some time thinking about different approaches
> and have started coding some of the building blocks, which I think in the
> common case (relatively low nesting levels) should be fairly performant (I'm
> also going to write some benchmarks to sanity check this). One caveat to
> this is my schedule is going to change slightly next week and it's possible my
> bandwidth might be more limited, I'll update the list if this happens.
>
> I think there are at least two areas that I'm not working on that could be
> parallelized if you or your team has bandwidth.
>
> 1. It would be good to have some parquet files representing real world
> datasets available to benchmark against.
> 2. The higher level book keeping of tracking which def-levels/rep-levels are
> needed to compare against for any particular column (i.e. preceding repeated
> parent). I'm currently working on the code that takes these and converts
> them to offsets/null fields.
>
> I can go into more details if you or your team would like to collaborate.
>
> Thanks,
> Micah
>
> On Tue, Apr 14, 2020 at 7:48 AM Wes McKinney wrote:
>>
>> hi Micah,
>>
>> I'm glad that we have the write side of nested completed for 0.17.0.
>>
>> As far as completing the read side and then implementing sufficient
>> testing to exercise corner cases in end-to-end reads/writes, do you
>> anticipate being able to work on this in the next 4-6 weeks (obviously
>> the state of the world has affected everyone's availability /
>> bandwidth)? I ask because someone from my team (or me also) may be
>> able to get involved and help this move along. It'd be great to have
>> this 100% completed and checked off our list for the next release
>> (i.e. 0.18.0 or 1.0.0 depending on whether the Java/C++ integration
>> tests get completed also)
>>
>> thanks
>> Wes
>>
>> On Wed, Feb 5, 2020 at 12:12 AM Micah Kornfield
>> wrote:
>> >>
>> >> Glad to hear about the progress. As I mentioned on #2, what do you
>> >> think about setting up a feature branch for you to merge PRs into?
>> >> Then the branch can be iterated on and we can merge it back when it's
>> >> feature complete and does not have perf regressions for the flat
>> >> read/write path.
>> >>
>> > I'd like to avoid a separate branch if possible. I'm willing to close the
>> > open PR till I'm sure it is needed but I'm hoping keeping PRs as small and
>> > focused as possible with performance testing along the way will be a
>> > better reviewer and developer experience here.
>> >
>> >> The earliest I'd have time to work on this myself would likely be
>> >> sometime in March. Others are welcome to jump in as well (and it'd be
>> >> great to increase the overall level of knowledge of the Parquet
>> >> codebase)
>> >
>> > Hopefully, Igor can help out otherwise I'll take up the read path after I
>> > finish the write path.
>> >
>> > -Micah
>> >
>> > On Tue, Feb 4, 2020 at 3:31 PM Wes McKinney wrote:
>> >>
>> >> hi Micah
>> >>
>> >> On Mon, Feb 3, 2020 at 12:01 AM Micah Kornfield
>> >> wrote:
>> >> >
>> >> > Just to give an update. I've been a little bit delayed, but my
>> >> > progress is
>> >> > as follows:
>> >> > 1. Had 1 PR merged that will exercise basic end-to-end tests.
>> >> > 2. Have another PR open that allows a configuration option in C++ to
>> >> > determine which algorithm version to use for reading/writing, the
>> >> > existing
>> >> > version and the new version supported complex-nested arrays. I think a
>> >> > large amount of code will be reused/delegated to but I will err on the
>> >> > side
>> >> > of not touching the existing code/algorithms so that any errors in the
>> >> > implementation or performance regressions can hopefully be mitigated at
>> >> > runtime. I expect in later releases (once the code has "baked") will
>> >> > become a no-op.
>> >>
>> >> Glad to hear about the progress. As I mentioned on #2, what do you
>> >> think about setting up a feature branch for you to merge PRs into?
>> >> Then the branch can be iterated on and we can mer
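As a rough illustration of the step described in this thread (turning definition/repetition levels into offsets and null fields), here is a minimal sketch for the simplest nested case: a nullable list whose elements are required. It is not the Arrow C++ implementation. The function name and the level conventions in the docstring (maximum definition level 2, maximum repetition level 1) are assumptions chosen for the example.

```python
def levels_to_list_layout(def_levels, rep_levels):
    """Rebuild list offsets and a validity mask from Parquet-style levels.

    Assumed schema: an optional list with required elements, so
      def level 0 -> the list itself is null
      def level 1 -> the list is present but empty
      def level 2 -> one element is present
      rep level 0 -> the entry starts a new row
      rep level 1 -> the entry continues the current row
    """
    if len(def_levels) != len(rep_levels):
        raise ValueError("level arrays must have the same length")
    offsets = [0]   # offsets[i]..offsets[i + 1] delimit the values of row i
    validity = []   # validity[i] is False when row i is a null list
    for d, r in zip(def_levels, rep_levels):
        if r == 0:
            # A new row begins: record nullness and open an empty slot.
            validity.append(d >= 1)
            offsets.append(offsets[-1])
        elif not validity:
            raise ValueError("the first entry must start a row (rep level 0)")
        if d == 2:
            # One more element belongs to the row currently being built.
            offsets[-1] += 1
    return offsets, validity


# Rows [[1, 2], None, [], [3]] are encoded with these levels:
offsets, validity = levels_to_list_layout([2, 2, 0, 1, 2], [0, 1, 0, 0, 0])
print(offsets)   # [0, 2, 2, 2, 3]
print(validity)  # [True, False, True, True]
```

Deeper nesting needs one offsets array per repeated ancestor and a comparison against each ancestor's own level thresholds, which is the bookkeeping mentioned in item 2 of the quoted message.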
Arrow sync call April 15 at 12:00 US/Eastern, 16:00 UTC
Hi all,

Last minute reminder that our biweekly call is coming up in a half hour at https://meet.google.com/vtm-teks-phx. All are welcome to join. Notes will be sent out to the mailing list afterward.

Neal
[jira] [Created] (ARROW-8472) [Go][Integration] Represent 64 bit integers as JSON::string
Ben Kietzman created ARROW-8472:

Summary: [Go][Integration] Represent 64 bit integers as JSON::string
Key: ARROW-8472
URL: https://issues.apache.org/jira/browse/ARROW-8472
Project: Apache Arrow
Issue Type: Bug
Components: Go, Integration
Affects Versions: 0.16.0
Reporter: Ben Kietzman
Fix For: 1.0.0

See ARROW-6407.
[jira] [Created] (ARROW-8471) [C++][Integration] Regression to /u?int64/ as JSON::number
Ben Kietzman created ARROW-8471:

Summary: [C++][Integration] Regression to /u?int64/ as JSON::number
Key: ARROW-8471
URL: https://issues.apache.org/jira/browse/ARROW-8471
Project: Apache Arrow
Issue Type: Bug
Components: C++, Integration
Affects Versions: 0.16.0
Reporter: Ben Kietzman
Assignee: Ben Kietzman
Fix For: 1.0.0

In moving datagen.py under archery, the fix for ARROW-6310 was clobbered, so 64-bit integers are again represented as numbers in the integration JSON.
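Some background on why 64-bit integers are written as JSON strings: many JSON parsers store every number as an IEEE 754 double, which represents integers exactly only up to 2^53. The sketch below shows the precision loss and a string-based round trip. The `"DATA"` key and the helper names are illustrative assumptions, not a statement of the integration format's exact schema.

```python
import json


def encode_int64_column(values):
    """Serialize 64-bit integers as JSON strings so that no digits are lost."""
    return json.dumps({"DATA": [str(v) for v in values]})


def decode_int64_column(text):
    """Parse the strings back into Python integers."""
    return [int(s) for s in json.loads(text)["DATA"]]


def through_double(value):
    """Mimic a parser that keeps every JSON number as a double."""
    return int(float(value))


big = 2**53 + 1
print(through_double(big) == big)                              # False
print(decode_int64_column(encode_int64_column([big])) == [big])  # True
```

Python's own `json` module parses integer literals exactly, so the loss only appears with consumers that go through doubles; `through_double` stands in for those.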
[jira] [Created] (ARROW-8470) [Python][R] Expose incremental write API for Feather files
Wes McKinney created ARROW-8470:

Summary: [Python][R] Expose incremental write API for Feather files
Key: ARROW-8470
URL: https://issues.apache.org/jira/browse/ARROW-8470
Project: Apache Arrow
Issue Type: Improvement
Components: Python, R
Reporter: Wes McKinney

This is already available for writing IPC files, so this would mostly be an interface to that with the addition of logic to handle conversions from Python or R data frames and splitting the inputs based on the configured Feather chunksize.
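The splitting step mentioned in this issue can be sketched without any Arrow API: given a row count and a chunk size, yield the row ranges that would each become one written batch. The function name `chunk_ranges` is hypothetical, and the chunk size is a required argument here because this sketch makes no claim about Feather's default.

```python
def chunk_ranges(num_rows, chunksize):
    """Yield (start, stop) row ranges of at most `chunksize` rows each."""
    if chunksize <= 0:
        raise ValueError("chunksize must be positive")
    for start in range(0, num_rows, chunksize):
        # The last range is shorter when num_rows is not a multiple of chunksize.
        yield start, min(start + chunksize, num_rows)


print(list(chunk_ranges(10, 4)))  # [(0, 4), (4, 8), (8, 10)]
```

An incremental writer would slice the input data frame with each range, convert the slice, and append it to the open file, so only one chunk has to be converted at a time.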
[jira] [Created] (ARROW-8469) [Dev] Fix nightly docker tests on azure
Krisztian Szucs created ARROW-8469:

Summary: [Dev] Fix nightly docker tests on azure
Key: ARROW-8469
URL: https://issues.apache.org/jira/browse/ARROW-8469
Project: Apache Arrow
Issue Type: Bug
Components: Developer Tools
Reporter: Krisztian Szucs
Assignee: Krisztian Szucs
Fix For: 0.17.0

Need to remove pushd/popd from the Azure template; see the build log at https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-azure-test-conda-python-3.7
[jira] [Created] (ARROW-8468) [Document] Fix the incorrect null bits description
Liya Fan created ARROW-8468:

Summary: [Document] Fix the incorrect null bits description
Key: ARROW-8468
URL: https://issues.apache.org/jira/browse/ARROW-8468
Project: Apache Arrow
Issue Type: Bug
Components: Documentation
Reporter: Liya Fan
Assignee: Liya Fan

The description of the null bits in arrays.rst is incorrect.
[jira] [Created] (ARROW-8467) [C++] Test cases using ArrayFromJSON assume only a little-endian platform
Kazuaki Ishizaki created ARROW-8467:

Summary: [C++] Test cases using ArrayFromJSON assume only a little-endian platform
Key: ARROW-8467
URL: https://issues.apache.org/jira/browse/ARROW-8467
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: Kazuaki Ishizaki

The following test cases using ArrayFromJSON seem to assume a little-endian platform:

TEST_F(TestChunkedArray, View) at https://github.com/apache/arrow/blob/master/cpp/src/arrow/table_test.cc#L175
TEST(TestArrayView, PrimitiveAsFixedSizeBinary) at https://github.com/apache/arrow/blob/master/cpp/src/arrow/array_view_test.cc#L105
TEST(TestArrayView, StructAsStructSimple) at https://github.com/apache/arrow/blob/master/cpp/src/arrow/array_view_test.cc#L126

One example is {{PrimitiveAsFixedSizeBinary}}.

{code}
TEST(TestArrayView, PrimitiveAsFixedSizeBinary) {
  auto arr = ArrayFromJSON(int32(), "[2020568934, 2054316386, null]");
  auto expected = ArrayFromJSON(fixed_size_binary(4), R"(["foox", "barz", null])");
  CheckView(arr, expected);
  CheckView(expected, arr);
}
{code}

The expected strings are represented in binary as follows:

{code}
"foox" = [0x66 0x6f 0x6f 0x78]
"barz" = [0x62 0x61 0x72 0x7a]
{code}

The test supplies the values as raw int32. The current values produce the expected byte sequences only on a little-endian platform:

{code}
2020568934 = 0x786f6f66
2054316386 = 0x7a726162
{code}
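The byte-order dependence is easy to check outside Arrow. This sketch uses Python's standard `struct` module, not the Arrow test code, to show that the int32 values from the test spell the expected strings only when laid out little-endian. The helper name `int32_bytes` is made up for the example.

```python
import struct


def int32_bytes(value, byteorder):
    """Return the 4-byte representation of a signed int32 in the given order."""
    if byteorder not in ("little", "big"):
        raise ValueError("byteorder must be 'little' or 'big'")
    fmt = "<i" if byteorder == "little" else ">i"
    return struct.pack(fmt, value)


print(int32_bytes(2020568934, "little"))  # b'foox'
print(int32_bytes(2020568934, "big"))     # b'xoof'
print(int32_bytes(2054316386, "little"))  # b'barz'
print(int32_bytes(2054316386, "big"))     # b'zrab'
```

On a big-endian machine the same integers view as "xoof" and "zrab", so a portable test would need either byte-swapped input values or expectations chosen per platform.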
[NIGHTLY] Arrow Build Report for Job nightly-2020-04-15-0
Arrow Build Report for Job nightly-2020-04-15-0

All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0

Failed Tasks:
- homebrew-cpp-autobrew:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-travis-homebrew-cpp-autobrew
- test-conda-cpp-hiveserver2:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-circle-test-conda-cpp-hiveserver2
- test-conda-python-3.6:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-azure-test-conda-python-3.6
- test-conda-python-3.7:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-azure-test-conda-python-3.7
- test-conda-python-3.8:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-azure-test-conda-python-3.8
- test-debian-10-python-3:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-azure-test-debian-10-python-3
- test-debian-ruby:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-azure-test-debian-ruby
- test-fedora-30-python-3:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-azure-test-fedora-30-python-3
- test-ubuntu-18.04-python-3:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-azure-test-ubuntu-18.04-python-3
- test-ubuntu-ruby:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-azure-test-ubuntu-ruby

Pending Tasks:
- test-conda-python-3.7-spark-master:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-circle-test-conda-python-3.7-spark-master

Succeeded Tasks:
- centos-6-amd64:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-github-centos-6-amd64
- centos-7-amd64:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-github-centos-7-amd64
- centos-8-amd64:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-github-centos-8-amd64
- conda-linux-gcc-py36:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-azure-conda-linux-gcc-py36
- conda-linux-gcc-py37:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-azure-conda-linux-gcc-py37
- conda-linux-gcc-py38:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-azure-conda-linux-gcc-py38
- conda-osx-clang-py36:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-azure-conda-osx-clang-py36
- conda-osx-clang-py37:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-azure-conda-osx-clang-py37
- conda-osx-clang-py38:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-azure-conda-osx-clang-py38
- conda-win-vs2015-py36:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-azure-conda-win-vs2015-py36
- conda-win-vs2015-py37:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-azure-conda-win-vs2015-py37
- conda-win-vs2015-py38:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-azure-conda-win-vs2015-py38
- debian-buster-amd64:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-github-debian-buster-amd64
- debian-stretch-amd64:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-github-debian-stretch-amd64
- gandiva-jar-osx:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-travis-gandiva-jar-osx
- gandiva-jar-xenial:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-travis-gandiva-jar-xenial
- homebrew-cpp:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-travis-homebrew-cpp
- homebrew-r-autobrew:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-travis-homebrew-r-autobrew
- test-conda-cpp-valgrind:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-github-test-conda-cpp-valgrind
- test-conda-cpp:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-github-test-conda-cpp
- test-conda-python-3.7-dask-latest:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-circle-test-conda-python-3.7-dask-latest
- test-conda-python-3.7-hdfs-2.9.2:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-circle-test-conda-python-3.7-hdfs-2.9.2
- test-conda-python-3.7-kartothek-latest:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-15-0-circle-test-conda-python-3.7-kartothek-latest
- test-conda-python-3.7-kartothek-master:
  URL: https://gith