[ 
https://issues.apache.org/jira/browse/PARQUET-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17648342#comment-17648342
 ] 

Gang Wu commented on PARQUET-2219:
----------------------------------

According to the error message, it seems that an empty row group is deemed 
malformed. Correct me if I'm wrong, but some downstream readers may use the 
parquet reader as an iterator to check whether there are more rows to read. 
An empty row group may then terminate the iterator early, which is incorrect 
behavior. So I suggest not writing the parquet file at all if there are no 
rows of data.
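
To illustrate the early-termination concern, here is a minimal, self-contained 
sketch. It does not use parquet-mr: FakeReader and its readNextRowGroup() 
method are hypothetical stand-ins that model a file as a list of row groups, 
with null marking the end of the file. The naive loop treats an empty row 
group as the end of the data, while the tolerant loop stops only on null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EmptyRowGroupDemo {

    // Hypothetical stand-in for a file reader (NOT the parquet-mr API).
    // Each inner list is one row group; null signals the end of the file.
    static final class FakeReader {
        private final List<List<String>> rowGroups;
        private int next = 0;

        FakeReader(List<List<String>> rowGroups) {
            this.rowGroups = rowGroups;
        }

        List<String> readNextRowGroup() {
            return next < rowGroups.size() ? rowGroups.get(next++) : null;
        }
    }

    // Naive loop: stops at the first empty row group, so any rows in
    // later row groups are silently dropped.
    static List<String> readNaive(FakeReader reader) {
        List<String> rows = new ArrayList<>();
        List<String> group;
        while ((group = reader.readNextRowGroup()) != null && !group.isEmpty()) {
            rows.addAll(group);
        }
        return rows;
    }

    // Tolerant loop: only null ends the iteration; an empty row group
    // simply contributes no rows.
    static List<String> readSkippingEmpty(FakeReader reader) {
        List<String> rows = new ArrayList<>();
        List<String> group;
        while ((group = reader.readNextRowGroup()) != null) {
            rows.addAll(group);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<List<String>> groups = Arrays.asList(
                Arrays.asList("a", "b"),
                new ArrayList<String>(),   // empty row group in the middle
                Arrays.asList("c"));

        System.out.println(readNaive(new FakeReader(groups)));         // prints [a, b]
        System.out.println(readSkippingEmpty(new FakeReader(groups))); // prints [a, b, c]
    }
}
```

With a file that contains only an empty row group, both loops return an empty 
list, which is the outcome the reporter expects instead of an exception.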

> ParquetFileReader throws a runtime exception when a file contains only 
> headers and no row data
> -----------------------------------------------------------------------------------------------
>
>                 Key: PARQUET-2219
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2219
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>    Affects Versions: 1.12.1
>            Reporter: chris stockton
>            Priority: Minor
>
> Google BigQuery has an option to export table data to Parquet-formatted 
> files, but some of these files are written with header data only.  When this 
> happens and these files are opened with the ParquetFileReader, an exception 
> is thrown:
> {{RuntimeException("Illegal row group of 0 rows");}}
> It seems like the ParquetFileReader should not throw an exception when it 
> encounters such a file.
> https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java#L949



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
