[ https://issues.apache.org/jira/browse/ARROW-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Uwe L. Korn resolved ARROW-3020.
--------------------------------
    Resolution: Fixed

Issue resolved by pull request 3269
[https://github.com/apache/arrow/pull/3269]

> [Python] Addition of option to allow empty Parquet row groups
> -------------------------------------------------------------
>
>                 Key: ARROW-3020
>                 URL: https://issues.apache.org/jira/browse/ARROW-3020
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++, Python
>            Reporter: Alex Mendelson
>            Assignee: Wes McKinney
>            Priority: Major
>              Labels: parquet, pull-request-available
>             Fix For: 0.12.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> While our use case is not common, I was able to find one related request from
> roughly a year ago. Could this be added as a feature?
> https://issues.apache.org/jira/browse/PARQUET-1047
> *Motivation*
> We have an application where each row is associated with one of N contexts,
> though a minority of contexts may have no associated rows. When encountering
> the Nth context, we will wish to retrieve all the associated rows. Row groups
> would provide a natural way to index the data, as the nth context could
> naturally relate to the nth row group.
> Unfortunately, this is not possible at the present time, as pyarrow does not
> support writing empty row groups. If one writes a pyarrow.Table containing
> zero rows using pyarrow.parquet.ParquetWriter, it is omitted from the final
> file, and this distorts the indexing.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)