[ 
https://issues.apache.org/jira/browse/PARQUET-2343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Wu resolved PARQUET-2343.
------------------------------
    Resolution: Fixed

> Fixes NPE when rewriting file with multiple rowgroups
> -----------------------------------------------------
>
>                 Key: PARQUET-2343
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2343
>             Project: Parquet
>          Issue Type: Bug
>            Reporter: Xianyang Liu
>            Assignee: Xianyang Liu
>            Priority: Major
>             Fix For: 1.14.0
>
>
> Currently, the ParquetRewriter creates the `ColumnReadStoreImpl crStore` once 
> and reuses it for every block it rewrites. This is incorrect: the `crStore` 
> should be created for each block that needs to be rewritten. Otherwise, 
> rewriting a file with multiple row groups fails as follows:
> ```java
> java.lang.NullPointerException
>       at org.apache.parquet.column.impl.ColumnReaderBase.readPage(ColumnReaderBase.java:620)
>       at org.apache.parquet.column.impl.ColumnReaderBase.checkRead(ColumnReaderBase.java:594)
>       at org.apache.parquet.column.impl.ColumnReaderBase.consume(ColumnReaderBase.java:735)
>       at org.apache.parquet.column.impl.ColumnReaderImpl.consume(ColumnReaderImpl.java:30)
>       at org.apache.parquet.column.impl.ColumnReaderImpl.<init>(ColumnReaderImpl.java:47)
>       at org.apache.parquet.column.impl.ColumnReadStoreImpl.getColumnReader(ColumnReadStoreImpl.java:82)
>       at org.apache.parquet.hadoop.rewrite.ParquetRewriter.processBlocksFromReader(ParquetRewriter.java:316)
>       at org.apache.parquet.hadoop.rewrite.ParquetRewriter.processBlocks(ParquetRewriter.java:250)
> ```
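
The failure pattern described above can be illustrated with a small, self-contained sketch. This is not the Parquet code or the actual patch: `BlockStore`, `sumNextPage`, `rewriteWithSharedStore`, and `rewriteWithPerBlockStore` are hypothetical names, and a block is modeled simply as a list of `int[]` pages. The sketch assumes the store is bound to the pages of the block it was created from, so reusing it for a later block runs past the end of those pages.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

public class PerBlockStoreSketch {

    /** Hypothetical stand-in for a read store bound to one block's pages. */
    static final class BlockStore {
        private final Deque<int[]> pages;

        BlockStore(List<int[]> blockPages) {
            this.pages = new ArrayDeque<>(blockPages);
        }

        /** Sums the next page; poll() returns null once the pages are exhausted. */
        int sumNextPage() {
            int[] page = pages.poll();
            int sum = 0;
            for (int v : page) { // throws NullPointerException when page is null
                sum += v;
            }
            return sum;
        }
    }

    /** Buggy pattern: one store, built from the first block, reused for all blocks. */
    static List<Integer> rewriteWithSharedStore(List<List<int[]>> blocks) {
        List<Integer> out = new ArrayList<>();
        BlockStore store = new BlockStore(blocks.get(0));
        for (List<int[]> block : blocks) {
            for (int i = 0; i < block.size(); i++) {
                out.add(store.sumNextPage());
            }
        }
        return out;
    }

    /** Pattern described in the issue: create a fresh store for each block. */
    static List<Integer> rewriteWithPerBlockStore(List<List<int[]>> blocks) {
        List<Integer> out = new ArrayList<>();
        for (List<int[]> block : blocks) {
            BlockStore store = new BlockStore(block);
            for (int i = 0; i < block.size(); i++) {
                out.add(store.sumNextPage());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Two blocks with one page each, standing in for a file with two row groups.
        List<List<int[]>> blocks = Arrays.asList(
                Collections.singletonList(new int[] {1, 2}),
                Collections.singletonList(new int[] {3, 4}));

        System.out.println(rewriteWithPerBlockStore(blocks));
        try {
            rewriteWithSharedStore(blocks);
        } catch (NullPointerException e) {
            System.out.println("shared store failed on the second block");
        }
    }
}
```

With a single block both variants behave the same, which is consistent with the issue only appearing for files with multiple row groups.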



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
