[ https://issues.apache.org/jira/browse/ARROW-2728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joris Van den Bossche updated ARROW-2728:
-----------------------------------------
    Component/s: C++ - Dataset

> [Python][C++][Dataset] Support partitioned Parquet datasets using glob-style file paths
> ---------------------------------------------------------------------------------------
>
>                 Key: ARROW-2728
>                 URL: https://issues.apache.org/jira/browse/ARROW-2728
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++ - Dataset, Python
>    Affects Versions: 0.9.0
>         Environment: pyarrow : 0.9.0.post1
>                      dask : 0.17.1
>                      Mac OS
>            Reporter: pranav kohli
>            Priority: Minor
>              Labels: dataset, parquet
>
> I am saving a dask dataframe to Parquet with two partition columns using the
> pyarrow engine. The problem arises when scanning the partition columns. When I
> scan using the directory path, I get the partition columns in the output
> dataframe, whereas if I scan using a glob path, I don't get these columns.
>
> https://github.com/apache/arrow/issues/2147



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
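
To illustrate the behaviour described in the report: in a partitioned dataset, the partition values are stored in the directory names (key=value) and not inside the Parquet files, so a reader that is handed a glob of individual files has to parse the paths to recover them. The following is only a minimal sketch using the Python standard library. It is not pyarrow's or dask's implementation; the helper names (partition_values, build_example, discover) and the partition columns "year" and "country" are hypothetical and chosen for illustration.

```python
import glob
import os
import tempfile


def partition_values(path, root):
    """Parse key=value directory names between root and the file into a dict."""
    rel_dir = os.path.dirname(os.path.relpath(path, root))
    values = {}
    for segment in rel_dir.split(os.sep):
        if "=" in segment:
            key, value = segment.split("=", 1)
            values[key] = value
    return values


def build_example(root):
    """Create an empty directory layout with two hypothetical partition columns."""
    for year in ("2017", "2018"):
        for country in ("IN", "US"):
            directory = os.path.join(root, f"year={year}", f"country={country}")
            os.makedirs(directory)
            # Placeholder file: only the path matters for this sketch.
            open(os.path.join(directory, "part.0.parquet"), "wb").close()


def discover(root, pattern):
    """Expand a glob under root and pair each file with its partition values."""
    files = sorted(glob.glob(os.path.join(root, pattern)))
    return [(os.path.relpath(f, root), partition_values(f, root)) for f in files]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        build_example(root)
        for rel_path, values in discover(root, "*/*/*.parquet"):
            print(rel_path, values)
```

A reader that looks only at the file contents matched by the glob would not see "year" or "country" at all, which would be consistent with the missing columns reported above. Recovering them requires a step like partition_values applied to each matched path.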