Re: Thank you

2020-08-27 Thread Lucas Pickup
This is an awesome sentiment. Thank you, release orchestrators and
contributors!

Cheers,
Lucas

On Thu, Aug 27, 2020 at 1:26 PM Jorge Cardoso Leitão <
jorgecarlei...@gmail.com> wrote:

> Hi,
>
> I am writing to just thank all those involved in the release process.
> Sometimes the work of releases is not fully appreciated within development
> (where are the PRs ^_^?), but I find it impressive that the release is so
> smooth for such a complex project, and IMO that is to a large extent due to
> the team orchestrating the release.
>
> Best,
> Jorge


Thank you

2020-08-27 Thread Jorge Cardoso Leitão
Hi,

I am writing to just thank all those involved in the release process.
Sometimes the work of releases is not fully appreciated within development
(where are the PRs ^_^?), but I find it impressive that the release is so
smooth for such a complex project, and IMO that is to a large extent due to
the team orchestrating the release.

Best,
Jorge


Re: Arrow Dataset API on Ceph

2020-08-27 Thread Ivo Jimenez
Hi Antoine,

> > Our main concern is that this new arrow::dataset::RadosFormat class
> > will be deriving from the arrow::dataset::FileFormat class, which seems
> > to raise a conceptual mismatch as there isn’t really a RADOS format but
> > rather a formatting/serialization deferral that will be taking place,
> > effectively introducing a new client-server layer in the Dataset API.
>
> So, RadosFormat would ultimately redirect to another dataset format
> (e.g. ParquetFormat) when it comes to actually understanding the data?
>

Yes, that is our plan. Since this is going to be done on the storage
(server) side, it would be transparent to the client. So our main question
is whether this is OK from a design perspective, and whether it could
eventually be merged upstream?
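For illustration only, the delegation we have in mind looks roughly like the
following toy Python sketch (not the actual C++ Dataset API; all class names
here are made up): the RADOS-side class holds no serialization logic of its
own and defers the actual decoding to an inner format such as Parquet.

```python
# Toy illustration of the "formatting/serialization deferral" idea.
# These classes are stand-ins, not real Arrow classes.
class ParquetLikeFormat:
    name = "parquet"

    def read(self, blob):
        # Pretend decoding: the real ParquetFormat would parse the bytes.
        return {"format": self.name, "num_bytes": len(blob)}


class RadosLikeFormat:
    """Decides where the bytes come from (the RADOS object store) but
    forwards the actual decoding to the wrapped format."""

    def __init__(self, inner_format):
        self.inner = inner_format

    def read(self, blob):
        # In the real design this fetch-and-decode would happen server-side,
        # transparently to the client.
        return self.inner.read(blob)


fmt = RadosLikeFormat(ParquetLikeFormat())
result = fmt.read(b"\x00" * 16)
```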

thanks!


Re: Ursabot Benchmark framework for other languages

2020-08-27 Thread Francois Saint-Jacques
Hello Kazuaki!

I recommend taking a look at the benchmark sub-library [1] of archery
and at how it's glued together [2]. You will need to implement:

- A runner for the framework you intend to use ([3] and [4]); this also
implies capturing the output into a class that implements the
"Benchmark" interface.
- Tweaks to the glue code so that it doesn't assume there's only a
CppBenchmarkRunner [5], and probably some changes to the command-line
UX, e.g. a `--language=` option defaulting to cpp so that it doesn't
break the current workflow.
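As a rough sketch of those two pieces (the JavaBenchmarkRunner name and the
JMH-style JSON layout are assumptions for illustration, not archery's real
API; see the linked files for the actual interfaces):

```python
# Hypothetical sketch: adapt a Java (JMH-like) benchmark run into objects
# matching a minimal "Benchmark" interface.
import json


class Benchmark:
    """Minimal stand-in for archery's Benchmark interface: a name plus a
    list of numeric observations."""

    def __init__(self, name, values):
        self.name = name
        self.values = values

    @property
    def median(self):
        ordered = sorted(self.values)
        return ordered[len(ordered) // 2]


class JavaBenchmarkRunner:
    """Would invoke e.g. JMH with JSON output, then adapt each result
    into a Benchmark instance."""

    def parse(self, raw_json):
        # JMH-style JSON: a list of results, each with raw observations
        # under primaryMetric.rawData (a list of per-fork runs).
        for result in json.loads(raw_json):
            values = [v for run in result["primaryMetric"]["rawData"]
                      for v in run]
            yield Benchmark(result["benchmark"], values)


# Fabricated payload standing in for real JMH output:
payload = json.dumps([
    {"benchmark": "VectorBench.sum",
     "primaryMetric": {"rawData": [[1.0, 2.0, 3.0]]}}
])
benchmarks = list(JavaBenchmarkRunner().parse(payload))
```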

Once you have that working locally, you can ask Krisztian for help on how
to integrate it with ursabot.

François

[1] https://github.com/apache/arrow/tree/master/dev/archery/archery/benchmark
[2] 
https://github.com/apache/arrow/blob/master/dev/archery/archery/cli.py#L349-L577
[3] 
https://github.com/apache/arrow/blob/master/dev/archery/archery/benchmark/runner.py#L133-L207
[4] 
https://github.com/apache/arrow/blob/master/dev/archery/archery/benchmark/google.py#L33-L175
[5] 
https://github.com/apache/arrow/blob/master/dev/archery/archery/benchmark/runner.py#L51-L86

On Thu, Aug 27, 2020 at 1:59 PM Kazuaki Ishizaki  wrote:
>
> I am new to the Ursabot benchmark framework like
> https://github.com/apache/arrow/pull/7940#issuecomment-673183390.
>
> How can we run benchmark programs written in other languages (e.g. Java)
> beyond C++?  If it is not supported yet, what do we need to support to run
> benchmark programs in other languages?
>
> Best Regards,
> Kazuaki Ishizaki,
>


Ursabot Benchmark framework for other languages

2020-08-27 Thread Kazuaki Ishizaki
I am new to the Ursabot benchmark framework like 
https://github.com/apache/arrow/pull/7940#issuecomment-673183390.

How can we run benchmark programs written in other languages (e.g. Java) 
beyond C++?  If it is not supported yet, what do we need to support to run 
benchmark programs in other languages?

Best Regards,
Kazuaki Ishizaki,



[NIGHTLY] Arrow Build Report for Job nightly-2020-08-27-0

2020-08-27 Thread Crossbow


Arrow Build Report for Job nightly-2020-08-27-0

All tasks: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0

Failed Tasks:
- conda-osx-clang-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-azure-conda-osx-clang-py36
- conda-osx-clang-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-azure-conda-osx-clang-py37
- conda-osx-clang-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-azure-conda-osx-clang-py38
- conda-win-vs2017-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-azure-conda-win-vs2017-py36
- conda-win-vs2017-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-azure-conda-win-vs2017-py37
- conda-win-vs2017-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-azure-conda-win-vs2017-py38
- homebrew-cpp:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-travis-homebrew-cpp
- homebrew-r-autobrew:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-travis-homebrew-r-autobrew
- test-conda-cpp-valgrind:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-github-test-conda-cpp-valgrind
- test-conda-python-3.6-pandas-0.23:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-github-test-conda-python-3.6-pandas-0.23
- test-conda-python-3.7-hdfs-2.9.2:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-github-test-conda-python-3.7-hdfs-2.9.2
- test-conda-python-3.7-kartothek-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-github-test-conda-python-3.7-kartothek-latest
- test-conda-python-3.7-kartothek-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-github-test-conda-python-3.7-kartothek-master

Succeeded Tasks:
- centos-6-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-github-centos-6-amd64
- centos-7-aarch64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-travis-centos-7-aarch64
- centos-7-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-github-centos-7-amd64
- centos-8-aarch64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-travis-centos-8-aarch64
- centos-8-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-github-centos-8-amd64
- conda-clean:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-azure-conda-clean
- conda-linux-gcc-py36-cpu:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-azure-conda-linux-gcc-py36-cpu
- conda-linux-gcc-py36-cuda:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-azure-conda-linux-gcc-py36-cuda
- conda-linux-gcc-py37-cpu:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-azure-conda-linux-gcc-py37-cpu
- conda-linux-gcc-py37-cuda:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-azure-conda-linux-gcc-py37-cuda
- conda-linux-gcc-py38-cpu:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-azure-conda-linux-gcc-py38-cpu
- conda-linux-gcc-py38-cuda:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-azure-conda-linux-gcc-py38-cuda
- debian-buster-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-github-debian-buster-amd64
- debian-buster-arm64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-travis-debian-buster-arm64
- debian-stretch-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-github-debian-stretch-amd64
- debian-stretch-arm64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-travis-debian-stretch-arm64
- example-cpp-minimal-build-static-system-dependency:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-github-example-cpp-minimal-build-static-system-dependency
- example-cpp-minimal-build-static:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-github-example-cpp-minimal-build-static
- gandiva-jar-osx:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-travis-gandiva-jar-osx
- gandiva-jar-xenial:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-travis-gandiva-jar-xenial
- nuget:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-27-0-github-nuget

Re: Writing parquet to new filesystem API

2020-08-27 Thread Joris Van den Bossche
Small correction, I of course meant to use "write_table" and not
"write_to_dataset" in the code snippet (as the latter won't work that way).
Corrected example below:

But, if you already want to use the new filesystems for writing as well,
> there is one workaround to create an output stream manually and pass that
> instead of the path.
> So in your example, you could replace
>
> pq.write_table(table, out_path, filesystem=subtree_filesystem)
>
> with
>
> with subtree_filesystem.open_output_stream(out_path) as f:
>     pq.write_table(table, f)
>
> However, this only works with single files (and not yet with
> write_to_dataset for partitioned datasets).
>
>


Re: Writing parquet to new filesystem API

2020-08-27 Thread Joris Van den Bossche
Hi Weston,

You are not missing something obvious; this is a bit of an unfortunate
"transitional phase" where we have new filesystems, but they are not yet
fully supported (on the reading side they are supported in pyarrow 1.0, but
we are actively working on the writing side, which will only land in the
next release; I actually have an open PR to add support for the new
filesystems to pq.write_table: https://github.com/apache/arrow/pull/7991).

But if you already want to use the new filesystems for writing as well,
there is one workaround: create an output stream manually and pass it
instead of the path.
So in your example, you could replace

pq.write_to_dataset(table, out_path, filesystem=subtree_filesystem)

with

with subtree_filesystem.open_output_stream(out_path) as f:
    pq.write_table(table, f)

However, this only works with single files (and not yet with
write_to_dataset for partitioned datasets).

Best,
Joris

On Thu, 27 Aug 2020 at 00:58, Weston Pace  wrote:
>
> Forgive me if I am missing something obvious but I am unable to write
> parquet files using the new filesystem API.
>
> Here is what I am trying:
>
> https://gist.github.com/westonpace/0c5ef01e21a40de5d16608b7f12de80d
>
> I receive an error:
>
> OSError: Unrecognized filesystem: 


Re: Arrow Dataset API on Ceph

2020-08-27 Thread Antoine Pitrou


Hello Ivo,

Le 27/08/2020 à 07:02, Ivo Jimenez a écrit :
> 
> Our main concern is that this new arrow::dataset::RadosFormat class will be
> deriving from the arrow::dataset::FileFormat class, which seems to raise a
> conceptual mismatch as there isn’t really a RADOS format but rather a
> formatting/serialization deferral that will be taking place, effectively
> introducing a new client-server layer in the Dataset API.

So, RadosFormat would ultimately redirect to another dataset format
(e.g. ParquetFormat) when it comes to actually understanding the data?

Regards

Antoine.