Xianyang Liu created PARQUET-2365:
-------------------------------------
Summary: Fixes NPE when rewriting column without column index
Key: PARQUET-2365
URL: https://issues.apache.org/jira/browse/PARQUET-2365
Project: Parquet
Issue Type: Bug
Reporter: Xianyang Liu
The ColumnIndex can be null in some cases, for example when a float/double
column contains NaN or when its size exceeds the expected value. In addition,
page header statistics are no longer written since ColumnIndex support was
added. As a result, rewriting a column that has no ColumnIndex throws an NPE:
the page statistics are converted from either the ColumnIndex (null) or the
page header statistics (null), so the rewriter ends up with null page
statistics. For example:
```java
java.lang.NullPointerException
    at org.apache.parquet.hadoop.ParquetFileWriter.writeDataPage(ParquetFileWriter.java:727)
    at org.apache.parquet.hadoop.ParquetFileWriter.innerWriteDataPage(ParquetFileWriter.java:663)
    at org.apache.parquet.hadoop.ParquetFileWriter.writeDataPage(ParquetFileWriter.java:650)
    at org.apache.parquet.hadoop.rewrite.ParquetRewriter.processChunk(ParquetRewriter.java:453)
    at org.apache.parquet.hadoop.rewrite.ParquetRewriter.processBlocksFromReader(ParquetRewriter.java:317)
    at org.apache.parquet.hadoop.rewrite.ParquetRewriter.processBlocks(ParquetRewriter.java:250)
```
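The failure mode described above could be guarded against roughly as follows.
This is only a minimal sketch and not the actual patch: `PageStatsFallback`,
`PageStats`, and `resolve` are hypothetical stand-ins, not Parquet classes or
methods. It assumes that falling back to empty statistics is acceptable when
neither source provides them.

```java
public class PageStatsFallback {

    /** Hypothetical stand-in for per-page statistics; not Parquet's Statistics class. */
    public static final class PageStats {
        public final long min;
        public final long max;
        private final boolean empty;

        private PageStats(long min, long max, boolean empty) {
            this.min = min;
            this.max = max;
            this.empty = empty;
        }

        public static PageStats of(long min, long max) {
            return new PageStats(min, max, false);
        }

        /** Placeholder carrying no min/max, used when no statistics exist. */
        public static PageStats empty() {
            return new PageStats(0L, 0L, true);
        }

        public boolean isEmpty() {
            return empty;
        }
    }

    /**
     * Picks page statistics without ever returning null (assumed policy):
     * prefer the value derived from the column index, then the page header,
     * and finally an empty placeholder.
     */
    public static PageStats resolve(PageStats fromColumnIndex, PageStats fromPageHeader) {
        if (fromColumnIndex != null) {
            return fromColumnIndex;
        }
        if (fromPageHeader != null) {
            return fromPageHeader;
        }
        return PageStats.empty();
    }

    public static void main(String[] args) {
        // Both sources missing, as in the reported scenario.
        PageStats stats = resolve(null, null);
        System.out.println(stats.isEmpty()); // prints "true"
    }
}
```

With a guard of this kind, the writer would always receive a non-null
statistics object, so the dereference in the write path could not fail on null.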
--
This message was sent by Atlassian Jira
(v8.20.10#820010)