wgtmac commented on code in PR #1173:
URL: https://github.com/apache/parquet-mr/pull/1173#discussion_r1364833487
##########
parquet-column/src/main/java/org/apache/parquet/internal/column/columnindex/ColumnIndexBuilder.java:
##########
@@ -543,6 +546,11 @@ public static ColumnIndex build(
* the statistics to be added
*/
public void add(Statistics<?> stats) {
+ if (stats.isEmpty()) {
Review Comment:
Let me check my understanding. `convertStatistics` recovers page statistics
from the ColumnIndex, or from the original page header when the ColumnIndex
is unavailable. The problem emerges when the ColumnIndex is unavailable. Is
that correct? If so, why do we need these changes in `ColumnIndexBuilder`?
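To make the question concrete, here is a minimal sketch of the fallback as I understand it. Every name in it (`PageStatsFallback`, `PageStats`, `recover`) is a hypothetical stand-in, not the actual parquet-mr API, and it assumes that empty statistics can only come from the page-header path.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT parquet-mr classes.
final class PageStatsFallback {

  /** Simplified min/max holder; "empty" means the writer recorded nothing. */
  static final class PageStats {
    final Long min;
    final Long max;

    PageStats(Long min, Long max) {
      this.min = min;
      this.max = max;
    }

    static PageStats empty() {
      return new PageStats(null, null);
    }

    boolean isEmpty() {
      return min == null && max == null;
    }
  }

  /**
   * Prefer the per-page entry from the column index. If there is no column
   * index (null), fall back to the statistics from the page header, which
   * may be missing and therefore yield empty statistics.
   */
  static PageStats recover(List<PageStats> columnIndex, int pageOrdinal, PageStats headerStats) {
    if (columnIndex != null) {
      return columnIndex.get(pageOrdinal);
    }
    return headerStats != null ? headerStats : PageStats.empty();
  }

  public static void main(String[] args) {
    List<PageStats> index = Arrays.asList(new PageStats(1L, 5L), new PageStats(6L, 9L));
    // Column index available: the page entry is used, the header is ignored.
    System.out.println(recover(index, 1, null).isEmpty());
    // No column index and no header statistics: the result is empty.
    System.out.println(recover(null, 0, null).isEmpty());
  }
}
```

If that reading is right, the empty case only appears on the fallback path, so it seems it could be handled where the statistics are recovered.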
##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/rewrite/ParquetRewriter.java:
##########
@@ -335,10 +340,15 @@ private void processBlocksFromReader() throws IOException {
}
}
- private void processChunk(ColumnChunkMetaData chunk,
- CompressionCodecName newCodecName,
- ColumnChunkEncryptorRunTime columnChunkEncryptorRunTime,
- boolean encryptColumn) throws IOException {
+ /**
+ * Rewrite a single column with the given new compression codec or new encryptor
Review Comment:
```suggestion
* Rewrite a single column with the given new compression codec and/or new encryptor
```
##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileWriter.java:
##########
@@ -612,13 +612,13 @@ public void writeDataPage(
* @throws IOException if any I/O error occurs during writing the file
*/
public void writeDataPage(
- int valueCount, int uncompressedPageSize,
- BytesInput bytes,
- Statistics<?> statistics,
- long rowCount,
- Encoding rlEncoding,
- Encoding dlEncoding,
- Encoding valuesEncoding) throws IOException {
+ int valueCount, int uncompressedPageSize,
+ BytesInput bytes,
+ Statistics<?> statistics,
+ long rowCount,
+ Encoding rlEncoding,
+ Encoding dlEncoding,
+ Encoding valuesEncoding) throws IOException {
Review Comment:
Could you avoid these style changes? They are unrelated.