[jira] [Resolved] (ARROW-3020) [Python] Addition of option to allow empty Parquet row groups

Uwe L. Korn (JIRA) Fri, 28 Dec 2018 06:58:55 -0800


     [ https://issues.apache.org/jira/browse/ARROW-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Uwe L. Korn resolved ARROW-3020.
--------------------------------
    Resolution: Fixed

Issue resolved by pull request 3269
[https://github.com/apache/arrow/pull/3269]

> [Python] Addition of option to allow empty Parquet row groups
> -------------------------------------------------------------
>
>                 Key: ARROW-3020
>                 URL: https://issues.apache.org/jira/browse/ARROW-3020
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++, Python
>            Reporter: Alex Mendelson
>            Assignee: Wes McKinney
>            Priority: Major
>              Labels: parquet, pull-request-available
>             Fix For: 0.12.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> While our use case is not common, I was able to find one related request from 
> roughly a year ago. Could this be added as a feature?
> https://issues.apache.org/jira/browse/PARQUET-1047
> *Motivation*
> We have an application in which each row is associated with one of N contexts, 
> although a minority of contexts may have no associated rows. On encountering 
> the nth context, we want to retrieve all of its associated rows. Row groups 
> would provide a natural way to index the data, since the nth context could 
> map directly to the nth row group.
> Unfortunately, this is not currently possible, because pyarrow does not 
> support writing empty row groups. When a pyarrow.Table containing zero rows 
> is written with pyarrow.parquet.ParquetWriter, it is omitted from the final 
> file, which shifts every later row group and distorts the indexing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
