Re: [ANNOUNCE] New Arrow committer: Kazuaki Ishizaki
Congrats!

On Sun, Jun 6, 2021 at 9:28 PM Sutou Kouhei wrote:
> Hi,
>
> On behalf of the Arrow PMC, I'm happy to announce that
> Kazuaki Ishizaki has accepted an invitation to become a
> committer on Apache Arrow. Welcome, and thank you for your
> contributions!
>
> Thanks,
> --
> kou
[ANNOUNCE] New Arrow committer: Kazuaki Ishizaki
Hi,

On behalf of the Arrow PMC, I'm happy to announce that Kazuaki Ishizaki has accepted an invitation to become a committer on Apache Arrow. Welcome, and thank you for your contributions!

Thanks,
--
kou
Re: [NIGHTLY] Arrow Build Report for Job nightly-2021-06-06-0
Folks,

I count 28 failing nightly builds. This is not good. Has moving the nightly build report to a separate mailing list made it easier for us to ignore the failures?

Leaving aside any questions of improving our nightly build monitoring, which I know are ongoing: could you please take a look at the failures, particularly the newer ones (I know there are some persistent ones here that already have open JIRA issues), and see if they can be fixed?

Thanks,
Neal

On Sun, Jun 6, 2021 at 3:17 AM Crossbow wrote:
>
> Arrow Build Report for Job nightly-2021-06-06-0
>
> All tasks:
> https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-06-06-0
>
> Failed Tasks:
> - centos-8-amd64:
>   URL: https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-06-06-0-github-centos-8-amd64
> - centos-8-arm64:
>   URL: https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-06-06-0-travis-centos-8-arm64
> - conda-osx-clang-py36-r36:
>   URL: https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-06-06-0-azure-conda-osx-clang-py36-r36
> - conda-osx-clang-py37-r40:
>   URL: https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-06-06-0-azure-conda-osx-clang-py37-r40
> - conda-osx-clang-py38:
>   URL: https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-06-06-0-azure-conda-osx-clang-py38
> - conda-osx-clang-py39:
>   URL: https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-06-06-0-azure-conda-osx-clang-py39
> - debian-bullseye-arm64:
>   URL: https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-06-06-0-travis-debian-bullseye-arm64
> - debian-buster-arm64:
>   URL: https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-06-06-0-travis-debian-buster-arm64
> - java-jars:
>   URL: https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-06-06-0-github-java-jars
> - test-conda-python-3.7-kartothek-latest:
>   URL: https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-06-06-0-github-test-conda-python-3.7-kartothek-latest
> - test-conda-python-3.7-kartothek-master:
>   URL: https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-06-06-0-github-test-conda-python-3.7-kartothek-master
> - test-conda-python-3.7-turbodbc-latest:
>   URL: https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-06-06-0-github-test-conda-python-3.7-turbodbc-latest
> - test-conda-python-3.7-turbodbc-master:
>   URL: https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-06-06-0-github-test-conda-python-3.7-turbodbc-master
> - test-conda-python-3.8-spark-master:
>   URL: https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-06-06-0-github-test-conda-python-3.8-spark-master
> - test-r-linux-valgrind:
>   URL: https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-06-06-0-azure-test-r-linux-valgrind
> - test-r-without-arrow:
>   URL: https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-06-06-0-azure-test-r-without-arrow
> - test-ubuntu-18.04-cpp-release:
>   URL: https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-06-06-0-github-test-ubuntu-18.04-cpp-release
> - test-ubuntu-18.04-cpp-static:
>   URL: https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-06-06-0-github-test-ubuntu-18.04-cpp-static
> - test-ubuntu-18.04-cpp:
>   URL: https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-06-06-0-github-test-ubuntu-18.04-cpp
> - test-ubuntu-18.04-r-sanitizer:
>   URL: https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-06-06-0-azure-test-ubuntu-18.04-r-sanitizer
> - ubuntu-bionic-arm64:
>   URL: https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-06-06-0-travis-ubuntu-bionic-arm64
> - ubuntu-focal-arm64:
>   URL: https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-06-06-0-travis-ubuntu-focal-arm64
> - ubuntu-groovy-arm64:
>   URL: https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-06-06-0-travis-ubuntu-groovy-arm64
> - ubuntu-hirsute-arm64:
>   URL: https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-06-06-0-travis-ubuntu-hirsute-arm64
> - wheel-manylinux2014-cp36-arm64:
>   URL: https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-06-06-0-travis-wheel-manylinux2014-cp36-arm64
> - wheel-manylinux2014-cp37-arm64:
>   URL: https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-06-06-0-travis-wheel-manylinux2014-cp37-arm64
> - wheel-manylinux2014-cp38-arm64:
>   URL: https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-06-06-0-travis-wheel-manylinux2014-cp38-arm64
> - wheel-manylinux2014-cp39-arm64:
>   URL: >
Re: [Rust] [Discuss] proposal to redesign Arrow crate to resolve safety violations
Hi,

Thanks a lot for your feedback. I agree with all the arguments put forward, including Andrew's point about the size of the change. I tried a gradual migration 4 months ago, but it was really difficult and I gave up. I estimate that the work involved is half the work of writing parquet2 and arrow2 in the first place. The internal dependency on ArrayData (the main culprit of the unsafe) in arrow-rs is so prevalent that all core components would need to be rewritten from scratch (IPC, FFI, IO, array/transform/*, compute, SIMD). I personally do not have the motivation to do that, though.

Jed, the public API changes are small for end users. A typical migration is [1]. I agree that we can further reduce the change-set by keeping legacy interfaces available.

Andy, on my machine, the current benchmarks on query 1 yield:

type: master (ms), PR [2] for arrow2+parquet2 (ms)
memory (-m): 332.9, 239.6
load (the initial time in -m with --format parquet): 5286.0, 3043.0
parquet format: 1316.1, 930.7
tbl format: 5297.3, 5383.1

i.e. I am observing some improvements. Queries with joins are still slower. The pruning of parquet groups and pages based on stats is not there yet; I am working on it.

I agree that this should go through IP clearance; I will start that process. My thinking would be to create two empty repos under apache/*, open two PRs from the main branches of each of my repos to those repos, and merge them only once IP is cleared. Would that be a reasonable process, Wes? Names: arrow-experimental-rs2 and arrow-experimental-rs-parquet2, or?

Best,
Jorge

[1] https://github.com/apache/arrow-datafusion/pull/68/files#diff-2ec0d66fd16c73ff72a23d40186944591e040507c731228ad70b4e168e2a4660
[2] https://github.com/apache/arrow-datafusion/pull/68

On Fri, May 28, 2021 at 5:22 AM Josh Taylor wrote:
> I played around with it; for my use case I really like the new way of
> writing CSVs, it's much more obvious. I love the `read_stream_metadata`
> function as well.
>
> I'm seeing a very slight speed (~8ms) improvement on my end, but I read a
> bunch of files in a directory and spit out a CSV; the bottleneck is the
> parsing of lots of files, but it's pretty quick per file.
>
> old:
> /home/josh/staging/019c4715-3200-48fa--4105000cd71e/data_0_0_0 120224 bytes took 1ms
> /home/josh/staging/019c4715-3200-48fa--4105000cd71e/data_0_0_1 123144 bytes took 1ms
> /home/josh/staging/019c4715-3200-48fa--4105000cd71e/data_0_0_10 17127928 bytes took 159ms
> /home/josh/staging/019c4715-3200-48fa--4105000cd71e/data_0_0_11 17127144 bytes took 160ms
> /home/josh/staging/019c4715-3200-48fa--4105000cd71e/data_0_0_12 17130352 bytes took 158ms
> /home/josh/staging/019c4715-3200-48fa--4105000cd71e/data_0_0_13 17128544 bytes took 158ms
> /home/josh/staging/019c4715-3200-48fa--4105000cd71e/data_0_0_14 17128664 bytes took 158ms
> /home/josh/staging/019c4715-3200-48fa--4105000cd71e/data_0_0_15 17128328 bytes took 158ms
> /home/josh/staging/019c4715-3200-48fa--4105000cd71e/data_0_0_16 17129288 bytes took 158ms
> /home/josh/staging/019c4715-3200-48fa--4105000cd71e/data_0_0_17 17131056 bytes took 158ms
> /home/josh/staging/019c4715-3200-48fa--4105000cd71e/data_0_0_18 17130344 bytes took 158ms
> /home/josh/staging/019c4715-3200-48fa--4105000cd71e/data_0_0_19 17128432 bytes took 160ms
>
> new:
> /home/josh/staging/019c4715-3200-48fa--4105000cd71e/data_0_0_0 120224 bytes took 1ms
> /home/josh/staging/019c4715-3200-48fa--4105000cd71e/data_0_0_1 123144 bytes took 1ms
> /home/josh/staging/019c4715-3200-48fa--4105000cd71e/data_0_0_10 17127928 bytes took 157ms
> /home/josh/staging/019c4715-3200-48fa--4105000cd71e/data_0_0_11 17127144 bytes took 152ms
> /home/josh/staging/019c4715-3200-48fa--4105000cd71e/data_0_0_12 17130352 bytes took 154ms
> /home/josh/staging/019c4715-3200-48fa--4105000cd71e/data_0_0_13 17128544 bytes took 153ms
> /home/josh/staging/019c4715-3200-48fa--4105000cd71e/data_0_0_14 17128664 bytes took 154ms
> /home/josh/staging/019c4715-3200-48fa--4105000cd71e/data_0_0_15 17128328 bytes took 153ms
> /home/josh/staging/019c4715-3200-48fa--4105000cd71e/data_0_0_16 17129288 bytes took 152ms
> /home/josh/staging/019c4715-3200-48fa--4105000cd71e/data_0_0_17 17131056 bytes took 153ms
> /home/josh/staging/019c4715-3200-48fa--4105000cd71e/data_0_0_18 17130344 bytes took 155ms
> /home/josh/staging/019c4715-3200-48fa--4105000cd71e/data_0_0_19 17128432 bytes took 153ms
>
> I'm going to chunk the dirs to speed up the reads and throw it into a par
> iter.
>
> On Fri, 28 May 2021 at 09:09, Josh Taylor wrote:
> > Hi!
> >
> > I've been using arrow/arrow-rs for a while now; my use case is to parse
> > Arrow streaming files and convert them into CSV.
> >
> > Rust has been an absolutely fantastic tool for this; the performance is
> > outstanding and I have had no issues using it for my use case.
> >
> > I would be happy to test out
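[Editor's note] Jorge's point about ArrayData being "the main culprit of the unsafe" can be illustrated with a small, self-contained sketch. The types below (`UntypedData`, `TypedArray`) are hypothetical stand-ins, not the actual arrow-rs code: they only show why a type-erased byte container forces `unsafe` onto every consumer, whereas a typed array makes the same access safe at compile time.

```rust
/// Hypothetical type-erased container: raw bytes plus a runtime type tag,
/// loosely in the spirit of an ArrayData-style design (NOT arrow-rs code).
struct UntypedData {
    type_name: &'static str,
    bytes: Vec<u8>,
}

impl UntypedData {
    /// Safety: the caller must ensure `T` matches `type_name`. The
    /// invariant lives in documentation, not in the type system, which is
    /// why every consumer of such a container ends up writing `unsafe`.
    unsafe fn values<T>(&self) -> &[T] {
        let (prefix, mid, suffix) = self.bytes.align_to::<T>();
        assert!(prefix.is_empty() && suffix.is_empty(), "bad length/alignment");
        mid
    }
}

/// Typed container: the element type is part of the type, so access is safe.
struct TypedArray<T> {
    values: Vec<T>,
}

impl<T> TypedArray<T> {
    fn values(&self) -> &[T] {
        &self.values
    }
}

fn main() {
    // Build the byte buffer in native endianness so the reinterpretation
    // below is portable.
    let mut bytes = Vec::new();
    bytes.extend_from_slice(&1i32.to_ne_bytes());
    bytes.extend_from_slice(&2i32.to_ne_bytes());
    let untyped = UntypedData { type_name: "i32", bytes };

    // Sound only because we checked the runtime tag by hand first.
    assert_eq!(untyped.type_name, "i32");
    let ints: &[i32] = unsafe { untyped.values::<i32>() };
    assert_eq!(ints, &[1, 2]);

    // The typed version needs no unsafe and no tag check.
    let typed = TypedArray { values: vec![1i32, 2] };
    assert_eq!(typed.values(), &[1, 2]);
}
```

The contrast is the crux of the redesign discussion: moving the type information from a runtime tag into the array's type removes the soundness obligations from downstream code (IPC, FFI, compute, and so on).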
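[Editor's note] Josh's plan to "chunk the dirs ... and throw it into a par iter" presumably refers to rayon's `par_iter`; as a rough std-only sketch of the same idea, the file list can be split into chunks and each chunk handed to a scoped thread. The `process` function here is a hypothetical stand-in (the real work would be reading an Arrow stream and writing CSV), so only the chunking pattern is meant literally.

```rust
use std::thread;

// Hypothetical stand-in for the per-file work (read Arrow stream, write
// CSV). Here it just returns the path length so the sketch is runnable.
fn process(path: &str) -> usize {
    path.len()
}

fn main() {
    // Pretend file names, mirroring the data_0_0_N naming in the logs.
    let files: Vec<String> = (0..8).map(|i| format!("data_0_0_{i}")).collect();

    let workers = 4;
    // Ceiling division so every file lands in some chunk.
    let chunk_size = (files.len() + workers - 1) / workers;

    let mut per_chunk: Vec<usize> = Vec::new();
    // Scoped threads may borrow `files` without Arc or cloning.
    thread::scope(|s| {
        let handles: Vec<_> = files
            .chunks(chunk_size)
            .map(|chunk| s.spawn(move || chunk.iter().map(|f| process(f)).sum::<usize>()))
            .collect();
        for h in handles {
            per_chunk.push(h.join().unwrap());
        }
    });

    let total: usize = per_chunk.iter().sum();
    println!("processed {} files, total work {total}", files.len());
}
```

With rayon, the body of `main` collapses to something like `files.par_iter().map(|f| process(f)).sum::<usize>()`, which is likely the shape Josh had in mind.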