Re: Writing parquet to new filesystem API

Joris Van den Bossche Thu, 27 Aug 2020 03:05:13 -0700

Hi Weston,

You are not missing anything obvious. This is an unfortunate
"transitional phase": the new filesystems exist, but they are not yet
fully supported everywhere. Reading with them is supported in pyarrow 1.0;
writing is something we are actively working on, and that will only land
in the next release. I have an open PR that adds support for the new
filesystems to pq.write_table: https://github.com/apache/arrow/pull/7991

But if you already want to use the new filesystems for writing as well,
there is a workaround: create an output stream manually and pass that
instead of the path.
So in your example, you could replace

pq.write_to_dataset(table, out_path, filesystem=subtree_filesystem)

with

with subtree_filesystem.open_output_stream(out_path) as f:
    pq.write_table(table, f)

However, this only works with single files (and not yet with
write_to_dataset for partitioned datasets).

Best,
Joris

On Thu, 27 Aug 2020 at 00:58, Weston Pace <[email protected]> wrote:
>
> Forgive me if I am missing something obvious but I am unable to write
> parquet files using the new filesystem API.
>
> Here is what I am trying:
>
> https://gist.github.com/westonpace/0c5ef01e21a40de5d16608b7f12de80d
>
> I receive an error:
>
> OSError: Unrecognized filesystem: <class 'pyarrow._fs.SubTreeFileSystem'>
