ConeyLiu commented on code in PR #1173:
URL: https://github.com/apache/parquet-mr/pull/1173#discussion_r1365223686
##########
parquet-column/src/main/java/org/apache/parquet/internal/column/columnindex/ColumnIndexBuilder.java:
##########
@@ -543,6 +546,11 @@ public static ColumnIndex build(
* the statistics to be added
*/
public void add(Statistics<?> stats) {
+ if (stats.isEmpty()) {
Review Comment:
The problem occurs when both the `ColumnIndex` and the page header
`Statistics` are null, because `convertStatistics` then returns `null`.
However, `ParquetFileWriter.writeDataPage` requires page statistics. So here
we pass invalid page statistics to avoid the NPE and overwrite the column
statistics at the end. Otherwise, we would need to add methods that don't
require page statistics.
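
To illustrate the pattern described above, here is a minimal, self-contained
sketch. It does not use the parquet-mr API: `PageStats`, `Writer`, and
`copyPages` are hypothetical stand-ins, and only the idea is taken from the
comment (substitute an invalid placeholder for missing page statistics, skip it
when building the index, and overwrite the column statistics once all pages
are written).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PlaceholderStatsSketch {

    /** Hypothetical stand-in for page statistics; NOT parquet-mr's Statistics. */
    static final class PageStats {
        final Long min;
        final Long max;
        final boolean valid;

        PageStats(Long min, Long max, boolean valid) {
            this.min = min;
            this.max = max;
            this.valid = valid;
        }

        static PageStats of(long min, long max) {
            return new PageStats(min, max, true);
        }

        /** Placeholder used when the source page carried no statistics. */
        static PageStats invalid() {
            return new PageStats(null, null, false);
        }
    }

    /** Hypothetical writer whose page-writing method requires non-null statistics. */
    static final class Writer {
        final List<PageStats> indexedPages = new ArrayList<>();
        PageStats columnStats = PageStats.invalid();
        int pagesWritten = 0;

        void writeDataPage(PageStats stats) {
            // Dereferences stats, so a null argument would throw an NPE here.
            if (stats.valid) {
                indexedPages.add(stats); // only valid statistics reach the index
            }
            pagesWritten++;
        }

        void overwriteColumnStats(PageStats stats) {
            columnStats = stats;
        }
    }

    /** Copies pages whose statistics may be null, then restores the column statistics. */
    static Writer copyPages(List<PageStats> pageStatsOrNull, PageStats originalColumnStats) {
        Writer writer = new Writer();
        for (PageStats stats : pageStatsOrNull) {
            // Replace missing statistics with an invalid placeholder to avoid the NPE.
            writer.writeDataPage(stats != null ? stats : PageStats.invalid());
        }
        // Overwrite whatever was accumulated with the original column statistics.
        writer.overwriteColumnStats(originalColumnStats);
        return writer;
    }

    public static void main(String[] args) {
        Writer writer = copyPages(
                Arrays.asList(null, PageStats.of(1, 5)),
                PageStats.of(0, 9));
        System.out.println("pages written: " + writer.pagesWritten);
        System.out.println("pages indexed: " + writer.indexedPages.size());
        System.out.println("column max: " + writer.columnStats.max);
    }
}
```

The alternative mentioned in the comment would be a separate write method that
takes no statistics at all, which avoids the placeholder but adds to the
writer's public surface.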
##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileWriter.java:
##########
@@ -612,13 +612,13 @@ public void writeDataPage(
* @throws IOException if any I/O error occurs during writing the file
*/
public void writeDataPage(
- int valueCount, int uncompressedPageSize,
- BytesInput bytes,
- Statistics<?> statistics,
- long rowCount,
- Encoding rlEncoding,
- Encoding dlEncoding,
- Encoding valuesEncoding) throws IOException {
+ int valueCount, int uncompressedPageSize,
+ BytesInput bytes,
+ Statistics<?> statistics,
+ long rowCount,
+ Encoding rlEncoding,
+ Encoding dlEncoding,
+ Encoding valuesEncoding) throws IOException {
Review Comment:
Sure, will revert it.
--
This is an automated message from the Apache Git Service.