[ https://issues.apache.org/jira/browse/ARROW-9682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175034#comment-17175034 ]
Joris Van den Bossche commented on ARROW-9682:
----------------------------------------------

[~ldacey] the writing functionality in the Datasets API was not yet implemented for pyarrow 1.0. It has recently been added to C++, and I am actually right now creating the Python bindings for it -> ARROW-9658 / https://github.com/apache/arrow/pull/7921

That PR enables you to write a partitioned dataset with directory-style partitioning (right now this is through a new `pyarrow.dataset.write_dataset` function, but it will probably also be exposed in `pyarrow.parquet.write_to_dataset`).

> [Python] Unable to specify the partition style with pq.write_to_dataset
> -----------------------------------------------------------------------
>
>                 Key: ARROW-9682
>                 URL: https://issues.apache.org/jira/browse/ARROW-9682
>             Project: Apache Arrow
>          Issue Type: Improvement
>    Affects Versions: 1.0.0
>         Environment: Ubuntu 18.04
>                      Python 3.7
>            Reporter: Lance Dacey
>            Priority: Major
>              Labels: parquet, parquetWriter
>
> I am able to import and test DirectoryPartitioning, but I am not able to
> figure out a way to write a dataset using this feature. It seems like
> write_to_dataset defaults to the "hive" style. Is there a way to test this?
> {code:python}
> import pyarrow as pa
> from pyarrow.dataset import DirectoryPartitioning
>
> partitioning = DirectoryPartitioning(pa.schema([("year", pa.int16()),
>                                                 ("month", pa.int8()),
>                                                 ("day", pa.int8())]))
> print(partitioning.parse("/2009/11/3"))
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
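As a follow-up illustration: a minimal sketch of writing a directory-partitioned dataset with the `pyarrow.dataset.write_dataset` function mentioned above. This assumes the API as merged via ARROW-9658 / apache/arrow#7921, so it requires a pyarrow release newer than 1.0; the table contents and the temporary output directory are made up for the example.

{code:python}
# Sketch of directory-style partitioned writing via the new
# pyarrow.dataset.write_dataset (ARROW-9658); needs pyarrow > 1.0.
import tempfile
import pyarrow as pa
import pyarrow.dataset as ds

# Example data: the partition columns must be present in the table.
table = pa.table({
    "year": pa.array([2009, 2009], pa.int16()),
    "month": pa.array([11, 12], pa.int8()),
    "day": pa.array([3, 3], pa.int8()),
    "value": [1.0, 2.0],
})

base_dir = tempfile.mkdtemp()

# ds.partitioning() with a schema gives directory-style (non-hive)
# partitioning, producing paths like base_dir/2009/11/3/... instead
# of base_dir/year=2009/month=11/day=3/...
part = ds.partitioning(pa.schema([("year", pa.int16()),
                                  ("month", pa.int8()),
                                  ("day", pa.int8())]))

ds.write_dataset(table, base_dir, format="parquet", partitioning=part)
{code}

Reading the result back with `ds.dataset(base_dir, partitioning=part)` should recover the partition columns from the directory names.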