[ https://issues.apache.org/jira/browse/PARQUET-2343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gang Wu reassigned PARQUET-2343:
--------------------------------
Assignee: Xianyang Liu
> Fixes NPE when rewriting file with multiple rowgroups
> -----------------------------------------------------
>
> Key: PARQUET-2343
> URL: https://issues.apache.org/jira/browse/PARQUET-2343
> Project: Parquet
> Issue Type: Bug
> Reporter: Xianyang Liu
> Assignee: Xianyang Liu
> Priority: Major
> Fix For: 1.14.0
>
>
> Currently, the ParquetRewriter creates the `ColumnReadStoreImpl crStore` once and
> reuses it for every block it rewrites. This is incorrect: the `crStore` should be
> created for each block that needs to be rewritten. Otherwise, rewriting a file
> with multiple row groups fails with the following exception:
> ```java
> java.lang.NullPointerException
>     at org.apache.parquet.column.impl.ColumnReaderBase.readPage(ColumnReaderBase.java:620)
>     at org.apache.parquet.column.impl.ColumnReaderBase.checkRead(ColumnReaderBase.java:594)
>     at org.apache.parquet.column.impl.ColumnReaderBase.consume(ColumnReaderBase.java:735)
>     at org.apache.parquet.column.impl.ColumnReaderImpl.consume(ColumnReaderImpl.java:30)
>     at org.apache.parquet.column.impl.ColumnReaderImpl.<init>(ColumnReaderImpl.java:47)
>     at org.apache.parquet.column.impl.ColumnReadStoreImpl.getColumnReader(ColumnReadStoreImpl.java:82)
>     at org.apache.parquet.hadoop.rewrite.ParquetRewriter.processBlocksFromReader(ParquetRewriter.java:316)
>     at org.apache.parquet.hadoop.rewrite.ParquetRewriter.processBlocks(ParquetRewriter.java:250)
> ```
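
The failure pattern described above can be illustrated with a small toy model. This is only a sketch and does not use the Parquet API: `Block`, `ReadStore`, `rewriteReusingStore`, and `rewritePerBlockStore` are hypothetical names invented for illustration, and it assumes the NPE arises because a store bound to the first block has no pages left when the next block is processed.

```java
import java.util.ArrayList;
import java.util.List;

public class PerBlockStoreSketch {

    /** Hypothetical stand-in for one row group holding a single page. */
    static final class Block {
        private int[] page;
        Block(int[] page) { this.page = page; }
        /** Hands out the page once, then returns null (exhausted). */
        int[] nextPage() {
            int[] p = page;
            page = null;
            return p;
        }
    }

    /** Hypothetical stand-in for a read store bound to exactly one block. */
    static final class ReadStore {
        private final Block block;
        ReadStore(Block block) { this.block = block; }
        int sumNextPage() {
            int sum = 0;
            // Throws NullPointerException if the bound block is exhausted.
            for (int v : block.nextPage()) {
                sum += v;
            }
            return sum;
        }
    }

    /** Buggy pattern: one store, created once, reused for every block. */
    static List<Integer> rewriteReusingStore(List<Block> blocks) {
        ReadStore store = new ReadStore(blocks.get(0));
        List<Integer> out = new ArrayList<>();
        for (Block ignored : blocks) {
            out.add(store.sumNextPage());
        }
        return out;
    }

    /** Fixed pattern: a fresh store for each block being rewritten. */
    static List<Integer> rewritePerBlockStore(List<Block> blocks) {
        List<Integer> out = new ArrayList<>();
        for (Block block : blocks) {
            ReadStore store = new ReadStore(block);
            out.add(store.sumNextPage());
        }
        return out;
    }

    public static void main(String[] args) {
        List<Block> blocks = List.of(new Block(new int[] {1, 2}), new Block(new int[] {3, 4}));
        System.out.println(rewritePerBlockStore(blocks)); // prints [3, 7]
    }
}
```

With a single block both variants behave the same, which would be consistent with the issue only showing up for files with multiple row groups.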
--
This message was sent by Atlassian Jira
(v8.20.10#820010)