[jira] [Commented] (ARROW-9682) [Python] Unable to specify the partition style with pq.write_to_dataset

Joris Van den Bossche (Jira) Mon, 10 Aug 2020 13:04:22 -0700


    [ 
https://issues.apache.org/jira/browse/ARROW-9682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175034#comment-17175034
 ]


Joris Van den Bossche commented on ARROW-9682:
----------------------------------------------

[~ldacey] the writing functionality in the Datasets API was not yet implemented 
in pyarrow 1.0. It was recently added to the C++ library, and I am currently 
creating the Python bindings for it: ARROW-9658 / 
https://github.com/apache/arrow/pull/7921

That PR enables you to write a partitioned dataset with directory-style 
partitioning. For now this is done through a new `pyarrow.dataset.write_dataset` 
function, but it will probably also be exposed in 
`pyarrow.parquet.write_to_dataset`.
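
To make the difference between the two layouts concrete, here is a small sketch in plain Python. It does not call pyarrow, and `partition_path` is a hypothetical helper written only for illustration, not part of any library. It assumes the hive style writes `key=value` directory segments, while the directory style writes bare values in schema order:

```python
from pathlib import PurePosixPath


def partition_path(keys, values, flavor="directory"):
    """Build the relative directory for one partition (illustrative only).

    flavor="hive"      -> "key=value" segments
    flavor="directory" -> bare values, in the order of the partition schema
    """
    if len(keys) != len(values):
        raise ValueError("keys and values must have the same length")
    if flavor == "hive":
        segments = [f"{key}={value}" for key, value in zip(keys, values)]
    elif flavor == "directory":
        segments = [str(value) for value in values]
    else:
        raise ValueError(f"unknown flavor: {flavor!r}")
    return str(PurePosixPath(*segments))


keys = ["year", "month", "day"]
values = [2009, 11, 3]
print(partition_path(keys, values, "hive"))       # year=2009/month=11/day=3
print(partition_path(keys, values, "directory"))  # 2009/11/3
```

Because the directory style drops the field names from the path, a reader needs the partition schema (as in the `DirectoryPartitioning` example in the issue description) to map each segment back to a field.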

> [Python] Unable to specify the partition style with pq.write_to_dataset
> -----------------------------------------------------------------------
>
>                 Key: ARROW-9682
>                 URL: https://issues.apache.org/jira/browse/ARROW-9682
>             Project: Apache Arrow
>          Issue Type: Improvement
>    Affects Versions: 1.0.0
>         Environment: Ubuntu 18.04
> Python 3.7
>            Reporter: Lance Dacey
>            Priority: Major
>              Labels: parquet, parquetWriter
>
> I am able to import and test DirectoryPartitioning but I am not able to 
> figure out a way to write a dataset using this feature. It seems like 
> write_to_dataset defaults to the "hive" style. Is there a way to test this?
> {code:python}
> import pyarrow as pa
> from pyarrow.dataset import DirectoryPartitioning
> # Path segments map to the schema fields in order: year/month/day
> partitioning = DirectoryPartitioning(pa.schema([("year", pa.int16()), 
> ("month", pa.int8()), ("day", pa.int8())]))
> print(partitioning.parse("/2009/11/3"))
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
