Re: [ANNOUNCE] New Arrow committer: Kazuaki Ishizaki

2021-06-06 Thread Wes McKinney
Congrats!

On Sun, Jun 6, 2021 at 9:28 PM Sutou Kouhei wrote:

> Hi,
>
> On behalf of the Arrow PMC, I'm happy to announce that
> Kazuaki Ishizaki has accepted an invitation to become a
> committer on Apache Arrow. Welcome, and thank you for your
> contributions!
>
>
> Thanks,
> --
> kou
>


[ANNOUNCE] New Arrow committer: Kazuaki Ishizaki

2021-06-06 Thread Sutou Kouhei
Hi,

On behalf of the Arrow PMC, I'm happy to announce that
Kazuaki Ishizaki has accepted an invitation to become a
committer on Apache Arrow. Welcome, and thank you for your
contributions!


Thanks,
--
kou


Re: [NIGHTLY] Arrow Build Report for Job nightly-2021-06-06-0

2021-06-06 Thread Neal Richardson
Folks, I count 28 failing nightly builds. This is not good. Has moving the
nightly build report to a separate mailing list allowed us to ignore the
failures more easily?

Leaving aside any questions of improving our nightly build monitoring,
which I know are ongoing: could you please take a look at the failures,
particularly the newer ones (I know there are some persistent ones here
that have open JIRA issues already) and see if they can be fixed?

Thanks,
Neal


On Sun, Jun 6, 2021 at 3:17 AM Crossbow wrote:

>
> Arrow Build Report for Job nightly-2021-06-06-0
>
> All tasks:
> https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-06-06-0
>
> Failed Tasks:
> - centos-8-amd64:
>   URL:
> https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-06-06-0-github-centos-8-amd64
> - centos-8-arm64:
>   URL:
> https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-06-06-0-travis-centos-8-arm64
> - conda-osx-clang-py36-r36:
>   URL:
> https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-06-06-0-azure-conda-osx-clang-py36-r36
> - conda-osx-clang-py37-r40:
>   URL:
> https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-06-06-0-azure-conda-osx-clang-py37-r40
> - conda-osx-clang-py38:
>   URL:
> https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-06-06-0-azure-conda-osx-clang-py38
> - conda-osx-clang-py39:
>   URL:
> https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-06-06-0-azure-conda-osx-clang-py39
> - debian-bullseye-arm64:
>   URL:
> https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-06-06-0-travis-debian-bullseye-arm64
> - debian-buster-arm64:
>   URL:
> https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-06-06-0-travis-debian-buster-arm64
> - java-jars:
>   URL:
> https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-06-06-0-github-java-jars
> - test-conda-python-3.7-kartothek-latest:
>   URL:
> https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-06-06-0-github-test-conda-python-3.7-kartothek-latest
> - test-conda-python-3.7-kartothek-master:
>   URL:
> https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-06-06-0-github-test-conda-python-3.7-kartothek-master
> - test-conda-python-3.7-turbodbc-latest:
>   URL:
> https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-06-06-0-github-test-conda-python-3.7-turbodbc-latest
> - test-conda-python-3.7-turbodbc-master:
>   URL:
> https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-06-06-0-github-test-conda-python-3.7-turbodbc-master
> - test-conda-python-3.8-spark-master:
>   URL:
> https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-06-06-0-github-test-conda-python-3.8-spark-master
> - test-r-linux-valgrind:
>   URL:
> https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-06-06-0-azure-test-r-linux-valgrind
> - test-r-without-arrow:
>   URL:
> https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-06-06-0-azure-test-r-without-arrow
> - test-ubuntu-18.04-cpp-release:
>   URL:
> https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-06-06-0-github-test-ubuntu-18.04-cpp-release
> - test-ubuntu-18.04-cpp-static:
>   URL:
> https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-06-06-0-github-test-ubuntu-18.04-cpp-static
> - test-ubuntu-18.04-cpp:
>   URL:
> https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-06-06-0-github-test-ubuntu-18.04-cpp
> - test-ubuntu-18.04-r-sanitizer:
>   URL:
> https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-06-06-0-azure-test-ubuntu-18.04-r-sanitizer
> - ubuntu-bionic-arm64:
>   URL:
> https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-06-06-0-travis-ubuntu-bionic-arm64
> - ubuntu-focal-arm64:
>   URL:
> https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-06-06-0-travis-ubuntu-focal-arm64
> - ubuntu-groovy-arm64:
>   URL:
> https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-06-06-0-travis-ubuntu-groovy-arm64
> - ubuntu-hirsute-arm64:
>   URL:
> https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-06-06-0-travis-ubuntu-hirsute-arm64
> - wheel-manylinux2014-cp36-arm64:
>   URL:
> https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-06-06-0-travis-wheel-manylinux2014-cp36-arm64
> - wheel-manylinux2014-cp37-arm64:
>   URL:
> https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-06-06-0-travis-wheel-manylinux2014-cp37-arm64
> - wheel-manylinux2014-cp38-arm64:
>   URL:
> https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-06-06-0-travis-wheel-manylinux2014-cp38-arm64
> - wheel-manylinux2014-cp39-arm64:
>   URL:
> 

Re: [Rust] [Discuss] proposal to redesign Arrow crate to resolve safety violations

2021-06-06 Thread Jorge Cardoso Leitão
Hi,

Thanks a lot for your feedback. I agree with all the arguments put forward,
including Andrew's point about the large change.

I tried a gradual approach 4 months ago, but it was really difficult and I
gave up. I estimate that the work involved is half the work of writing
parquet2 and arrow2 in the first place. The internal dependency on ArrayData
(the main source of the unsafe code) in arrow-rs is so prevalent that all
core components need to be rewritten from scratch (IPC, FFI, IO,
array/transform/*, compute, SIMD). I personally do not have the motivation
to do it, though.

Jed, the public API changes are small for end users. A typical migration is
[1]. I agree that we can further reduce the change-set by keeping legacy
interfaces available.

Andy, on my machine, the current benchmarks on query 1 yield:

type                                                 master (ms)  PR [2] for arrow2+parquet2 (ms)
memory (-m)                                          332.9        239.6
load (the initial time in -m with --format parquet)  5286.0       3043.0
parquet format                                       1316.1       930.7
tbl format                                           5297.3       5383.1

i.e. I am observing some improvements. Queries with joins are still slower.
The pruning of parquet groups and pages based on stats is not yet there; I
am working on it.
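
As a rough aid to reading the timings above, the relative change for each
row can be worked out as (PR - master) / master. The Rust sketch below is
illustrative only and is not part of the benchmark in [2]: the helper name
`percent_change` is hypothetical, and the numbers are simply copied from the
timings quoted in this message.

```rust
/// Relative change of the PR timing with respect to master, in percent.
/// Negative values mean the PR is faster (lower time); positive means slower.
/// Hypothetical helper, introduced here for illustration only.
fn percent_change(master_ms: f64, pr_ms: f64) -> f64 {
    (pr_ms - master_ms) / master_ms * 100.0
}

fn main() {
    // (label, master ms, arrow2+parquet2 PR ms), copied from the message above.
    let rows = [
        ("memory (-m)", 332.9, 239.6),
        ("load", 5286.0, 3043.0),
        ("parquet format", 1316.1, 930.7),
        ("tbl format", 5297.3, 5383.1),
    ];

    for (label, master, pr) in rows.iter() {
        // `{:+.1}` prints an explicit sign and one decimal place.
        println!("{}: {:+.1}%", label, percent_change(*master, *pr));
    }
}
```

With these inputs the PR comes out roughly 28%, 42% and 29% faster for the
memory, load and parquet rows, and about 1.6% slower for the tbl row. These
are timings from a single machine, so the ratios are only indicative.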

I agree that this should go through IP clearance. I will start this
process. My thinking would be to create two empty repos on apache/*, and
create 2 PRs from the main branches of each of my repos to those repos, and
only merge them once IP is cleared. Would that be a reasonable process, Wes?

Names: arrow-experimental-rs2 and arrow-experimental-rs-parquet2, or?

Best,
Jorge

[1]
https://github.com/apache/arrow-datafusion/pull/68/files#diff-2ec0d66fd16c73ff72a23d40186944591e040507c731228ad70b4e168e2a4660
[2] https://github.com/apache/arrow-datafusion/pull/68


On Fri, May 28, 2021 at 5:22 AM Josh Taylor wrote:

> I played around with it. For my use case I really like the new way of
> writing CSVs; it's much more obvious. I love the `read_stream_metadata`
> function as well.
>
> I'm seeing a very slight speed improvement (~8ms) on my end, but I read a
> bunch of files in a directory and spit out a CSV. The bottleneck is the
> parsing of lots of files, but it's pretty quick per file.
>
> old:
> /home/josh/staging/019c4715-3200-48fa--4105000cd71e/data_0_0_0 120224 bytes took 1ms
> /home/josh/staging/019c4715-3200-48fa--4105000cd71e/data_0_0_1 123144 bytes took 1ms
> /home/josh/staging/019c4715-3200-48fa--4105000cd71e/data_0_0_10 17127928 bytes took 159ms
> /home/josh/staging/019c4715-3200-48fa--4105000cd71e/data_0_0_11 17127144 bytes took 160ms
> /home/josh/staging/019c4715-3200-48fa--4105000cd71e/data_0_0_12 17130352 bytes took 158ms
> /home/josh/staging/019c4715-3200-48fa--4105000cd71e/data_0_0_13 17128544 bytes took 158ms
> /home/josh/staging/019c4715-3200-48fa--4105000cd71e/data_0_0_14 17128664 bytes took 158ms
> /home/josh/staging/019c4715-3200-48fa--4105000cd71e/data_0_0_15 17128328 bytes took 158ms
> /home/josh/staging/019c4715-3200-48fa--4105000cd71e/data_0_0_16 17129288 bytes took 158ms
> /home/josh/staging/019c4715-3200-48fa--4105000cd71e/data_0_0_17 17131056 bytes took 158ms
> /home/josh/staging/019c4715-3200-48fa--4105000cd71e/data_0_0_18 17130344 bytes took 158ms
> /home/josh/staging/019c4715-3200-48fa--4105000cd71e/data_0_0_19 17128432 bytes took 160ms
>
> new:
> /home/josh/staging/019c4715-3200-48fa--4105000cd71e/data_0_0_0 120224 bytes took 1ms
> /home/josh/staging/019c4715-3200-48fa--4105000cd71e/data_0_0_1 123144 bytes took 1ms
> /home/josh/staging/019c4715-3200-48fa--4105000cd71e/data_0_0_10 17127928 bytes took 157ms
> /home/josh/staging/019c4715-3200-48fa--4105000cd71e/data_0_0_11 17127144 bytes took 152ms
> /home/josh/staging/019c4715-3200-48fa--4105000cd71e/data_0_0_12 17130352 bytes took 154ms
> /home/josh/staging/019c4715-3200-48fa--4105000cd71e/data_0_0_13 17128544 bytes took 153ms
> /home/josh/staging/019c4715-3200-48fa--4105000cd71e/data_0_0_14 17128664 bytes took 154ms
> /home/josh/staging/019c4715-3200-48fa--4105000cd71e/data_0_0_15 17128328 bytes took 153ms
> /home/josh/staging/019c4715-3200-48fa--4105000cd71e/data_0_0_16 17129288 bytes took 152ms
> /home/josh/staging/019c4715-3200-48fa--4105000cd71e/data_0_0_17 17131056 bytes took 153ms
> /home/josh/staging/019c4715-3200-48fa--4105000cd71e/data_0_0_18 17130344 bytes took 155ms
> /home/josh/staging/019c4715-3200-48fa--4105000cd71e/data_0_0_19 17128432 bytes took 153ms
>
> I'm going to chunk the dirs to speed up the reads and throw it into a par
> iter.
>
> On Fri, 28 May 2021 at 09:09, Josh Taylor wrote:
>
> > Hi!
> >
> > I've been using arrow/arrow-rs for a while now; my use case is to parse
> > Arrow streaming files and convert them into CSV.
> >
> > Rust has been an absolutely fantastic tool for this; the performance is
> > outstanding and I have had no issues using it for my use case.
> >
> > I would be happy to test out