Re: Thank you
This is an awesome sentiment. Thank you, release orchestrators and contributors!

Cheers,
Lucas

On Thu, Aug 27, 2020 at 1:26 PM Jorge Cardoso Leitão < jorgecarlei...@gmail.com> wrote:
> Hi,
>
> I am writing to just thank all those involved in the release process.
> Sometimes the work of releases is not fully appreciated within development
> (where are the PRs ^_^?), but I find it impressive that the release is so
> smooth for such a complex project, and IMO that is to a large extent due to
> the team orchestrating the release.
>
> Best,
> Jorge
Thank you
Hi,

I am writing to just thank all those involved in the release process. Sometimes the work of releases is not fully appreciated within development (where are the PRs ^_^?), but I find it impressive that the release is so smooth for such a complex project, and IMO that is to a large extent due to the team orchestrating the release.

Best,
Jorge
Re: Arrow Dataset API on Ceph
Hi Antoine,

> > Our main concern is that this new arrow::dataset::RadosFormat class will be
> > deriving from the arrow::dataset::FileFormat class, which seems to raise a
> > conceptual mismatch as there isn't really a RADOS format but rather a
> > formatting/serialization deferral that will be taking place, effectively
> > introducing a new client-server layer in the Dataset API.
>
> So, RadosFormat would ultimately redirect to another dataset format
> (e.g. ParquetFormat) when it comes to actually understanding the data?

Yes, that is our plan. Since this is going to be done on the storage (server) side, it would be transparent to the client. So our main question is whether this would be OK from the design perspective, and whether it could eventually be merged upstream?

Thanks!
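To make the deferral idea above concrete, here is a minimal sketch of the delegation pattern being described. All class and method names here are illustrative stand-ins, not the actual Arrow C++ Dataset API: a "RADOS-format"-style wrapper holds an inner format (e.g. Parquet) and forwards the actual deserialization to it, so the redirect stays invisible to the caller.

```python
class ParquetLikeFormat:
    """Stand-in for a concrete format that actually understands the bytes."""
    name = "parquet"

    def deserialize(self, raw_bytes):
        # Pretend decoding: turn the payload into a list of values.
        return list(raw_bytes)


class RadosLikeFormat:
    """Sketch of the proposed wrapper: it has no format of its own and
    defers formatting/serialization to the wrapped inner format, which is
    what would happen server-side in the Ceph/RADOS design."""

    def __init__(self, inner_format):
        self.inner = inner_format
        self.name = "rados+" + inner_format.name

    def deserialize(self, raw_bytes):
        # The redirect: hand decoding off to the real format.
        return self.inner.deserialize(raw_bytes)


fmt = RadosLikeFormat(ParquetLikeFormat())
print(fmt.name)                 # rados+parquet
print(fmt.deserialize(b"abc"))  # [97, 98, 99]
```

The design question in the thread is whether this wrapper should subclass FileFormat (as the composition above mimics) or live as a separate client-server layer in the Dataset API.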
Re: Ursabot Benchmark framework for other languages
Hello Kazuaki!

I recommend you read and take a look at the benchmark sub-library [1] of archery and how it's glued together [2]. You will need to implement:

- A runner for the framework you intend to use ([3] and [4]); this also implies capturing the output into a class that implements the "Benchmark" interface.
- Tweaks to the glue code so that it doesn't assume there's only a CppBenchmarkRunner [5], and probably some changes to the command-line UX, e.g. `--language=` defaulting to cpp so that it doesn't break the current workflow.

Once you have that working locally, you can ask Krisztian for help on how to integrate it with ursabot.

François

[1] https://github.com/apache/arrow/tree/master/dev/archery/archery/benchmark
[2] https://github.com/apache/arrow/blob/master/dev/archery/archery/cli.py#L349-L577
[3] https://github.com/apache/arrow/blob/master/dev/archery/archery/benchmark/runner.py#L133-L207
[4] https://github.com/apache/arrow/blob/master/dev/archery/archery/benchmark/google.py#L33-L175
[5] https://github.com/apache/arrow/blob/master/dev/archery/archery/benchmark/runner.py#L51-L86

On Thu, Aug 27, 2020 at 1:59 PM Kazuaki Ishizaki wrote:
> I am new to the Ursabot benchmark framework like
> https://github.com/apache/arrow/pull/7940#issuecomment-673183390.
>
> How can we run benchmark programs written in other languages (e.g. Java)
> beyond C++? If it is not supported yet, what do we need to support to run
> benchmark programs in other languages?
>
> Best Regards,
> Kazuaki Ishizaki
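As a rough illustration of the two items above, here is a hedged sketch of what a runner for a Java harness might look like. The class names and the JMH-style JSON shape are assumptions for illustration; the real interfaces to conform to are in the files linked at [3], [4] and [5], not reproduced here.

```python
import json
import subprocess


class Benchmark:
    """Minimal stand-in for archery's benchmark result object: a name,
    a measured value, and the unit of that value."""

    def __init__(self, name, value, unit):
        self.name = name
        self.value = value
        self.unit = unit


class JavaBenchmarkRunner:
    """Hypothetical runner: invoke an external harness (e.g. JMH with
    `-rf json`) and capture its output into Benchmark objects, mirroring
    what CppBenchmarkRunner does for Google Benchmark."""

    def __init__(self, command):
        # command is the full argv to launch the harness, e.g.
        # ["java", "-jar", "benchmarks.jar", "-rf", "json", "-rff", "/dev/stdout"]
        self.command = command

    def run(self):
        out = subprocess.run(
            self.command, capture_output=True, text=True, check=True
        ).stdout
        return [self.parse_record(r) for r in json.loads(out)]

    @staticmethod
    def parse_record(record):
        # JMH's JSON reporter emits one record per benchmark; the fields
        # used here match its layout (benchmark name + primary metric).
        return Benchmark(
            record["benchmark"],
            record["primaryMetric"]["score"],
            record["primaryMetric"]["scoreUnit"],
        )
```

The glue code change would then dispatch on a `--language=` flag to pick this runner instead of the C++ one, leaving cpp as the default.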
Ursabot Benchmark framework for other languages
I am new to the Ursabot benchmark framework like https://github.com/apache/arrow/pull/7940#issuecomment-673183390. How can we run benchmark programs written in other languages (e.g. Java) beyond C++? If it is not supported yet, what do we need to support to run benchmark programs in other languages? Best Regards, Kazuaki Ishizaki,
[NIGHTLY] Arrow Build Report for Job nightly-2020-08-27-0
Arrow Build Report for Job nightly-2020-08-27-0

All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0

Failed Tasks:
- conda-osx-clang-py36:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-azure-conda-osx-clang-py36
- conda-osx-clang-py37:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-azure-conda-osx-clang-py37
- conda-osx-clang-py38:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-azure-conda-osx-clang-py38
- conda-win-vs2017-py36:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-azure-conda-win-vs2017-py36
- conda-win-vs2017-py37:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-azure-conda-win-vs2017-py37
- conda-win-vs2017-py38:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-azure-conda-win-vs2017-py38
- homebrew-cpp:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-travis-homebrew-cpp
- homebrew-r-autobrew:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-travis-homebrew-r-autobrew
- test-conda-cpp-valgrind:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-github-test-conda-cpp-valgrind
- test-conda-python-3.6-pandas-0.23:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-github-test-conda-python-3.6-pandas-0.23
- test-conda-python-3.7-hdfs-2.9.2:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-github-test-conda-python-3.7-hdfs-2.9.2
- test-conda-python-3.7-kartothek-latest:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-github-test-conda-python-3.7-kartothek-latest
- test-conda-python-3.7-kartothek-master:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-github-test-conda-python-3.7-kartothek-master

Succeeded Tasks:
- centos-6-amd64:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-github-centos-6-amd64
- centos-7-aarch64:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-travis-centos-7-aarch64
- centos-7-amd64:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-github-centos-7-amd64
- centos-8-aarch64:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-travis-centos-8-aarch64
- centos-8-amd64:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-github-centos-8-amd64
- conda-clean:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-azure-conda-clean
- conda-linux-gcc-py36-cpu:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-azure-conda-linux-gcc-py36-cpu
- conda-linux-gcc-py36-cuda:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-azure-conda-linux-gcc-py36-cuda
- conda-linux-gcc-py37-cpu:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-azure-conda-linux-gcc-py37-cpu
- conda-linux-gcc-py37-cuda:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-azure-conda-linux-gcc-py37-cuda
- conda-linux-gcc-py38-cpu:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-azure-conda-linux-gcc-py38-cpu
- conda-linux-gcc-py38-cuda:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-azure-conda-linux-gcc-py38-cuda
- debian-buster-amd64:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-github-debian-buster-amd64
- debian-buster-arm64:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-travis-debian-buster-arm64
- debian-stretch-amd64:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-github-debian-stretch-amd64
- debian-stretch-arm64:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-travis-debian-stretch-arm64
- example-cpp-minimal-build-static-system-dependency:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-github-example-cpp-minimal-build-static-system-dependency
- example-cpp-minimal-build-static:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-github-example-cpp-minimal-build-static
- gandiva-jar-osx:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-travis-gandiva-jar-osx
- gandiva-jar-xenial:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-travis-gandiva-jar-xenial
- nuget:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-github-nuget
-
Re: Writing parquet to new filesystem API
Small correction: I of course meant to use "write_table" and not "write_to_dataset" in the code snippet (as the latter won't work that way). Corrected example below:

> But, if you already want to use the new filesystems for writing as well,
> there is one workaround: create an output stream manually and pass that
> instead of the path. So in your example, you could replace
>
>     pq.write_table(table, out_path, filesystem=subtree_filesystem)
>
> with
>
>     with subtree_filesystem.open_output_stream(out_path) as f:
>         pq.write_table(table, f)
>
> However, this only works with single files (and not yet with
> write_to_dataset for partitioned datasets).
Re: Writing parquet to new filesystem API
Hi Weston,

You are not missing something obvious, but this is a bit of an unfortunate "transitional phase" where we have the new filesystems, but they are not yet fully supported. (On the reading side they are supported in pyarrow 1.0, but for the writing side we are actively working on it, and that will only land in the next release. I actually have an open PR to add support for the new filesystems to pq.write_table: https://github.com/apache/arrow/pull/7991.)

But, if you already want to use the new filesystems for writing as well, there is one workaround: create an output stream manually and pass that instead of the path. So in your example, you could replace

    pq.write_to_dataset(table, out_path, filesystem=subtree_filesystem)

with

    with subtree_filesystem.open_output_stream(out_path) as f:
        pq.write_table(table, f)

However, this only works with single files (and not yet with write_to_dataset for partitioned datasets).

Best,
Joris

On Thu, 27 Aug 2020 at 00:58, Weston Pace wrote:
> Forgive me if I am missing something obvious but I am unable to write
> parquet files using the new filesystem API.
>
> Here is what I am trying:
> https://gist.github.com/westonpace/0c5ef01e21a40de5d16608b7f12de80d
>
> I receive an error:
> OSError: Unrecognized filesystem:
Re: Arrow Dataset API on Ceph
Hello Ivo,

Le 27/08/2020 à 07:02, Ivo Jimenez a écrit :
> Our main concern is that this new arrow::dataset::RadosFormat class will be
> deriving from the arrow::dataset::FileFormat class, which seems to raise a
> conceptual mismatch as there isn't really a RADOS format but rather a
> formatting/serialization deferral that will be taking place, effectively
> introducing a new client-server layer in the Dataset API.

So, RadosFormat would ultimately redirect to another dataset format (e.g. ParquetFormat) when it comes to actually understanding the data?

Regards
Antoine.