[
https://issues.apache.org/jira/browse/PARQUET-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17770034#comment-17770034
]
ASF GitHub Bot commented on PARQUET-2348:
-----------------------------------------
ConeyLiu commented on code in PR #1143:
URL: https://github.com/apache/parquet-mr/pull/1143#discussion_r1340067800
##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/rewrite/ParquetRewriter.java:
##########
@@ -366,6 +366,10 @@ private void processChunk(ColumnChunkMetaData chunk,
ColumnIndex columnIndex = reader.readColumnIndex(chunk);
OffsetIndex offsetIndex = reader.readOffsetIndex(chunk);
+ BloomFilter bloomFilter = reader.readBloomFilter(chunk);
+ if (bloomFilter != null) {
+ writer.addBloomFilter(chunk.getPath().toDotString(), bloomFilter);
Review Comment:
Sorry for the late response. I have added the related unit tests.
> Recompression/Re-encrypt should rewrite bloomfilter
> ---------------------------------------------------
>
> Key: PARQUET-2348
> URL: https://issues.apache.org/jira/browse/PARQUET-2348
> Project: Parquet
> Issue Type: Bug
> Reporter: Xianyang Liu
> Priority: Major
>
> The Bloom filter data is lost after rewriting a file with recompression or
> re-encryption. The rewriter should write the Bloom filter data as well.
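
A minimal, self-contained sketch of the pattern in the diff hunk above: when a chunk is rewritten, its Bloom filter (if it has one) is passed to the writer under the column's dotted path. The `Chunk` and `SketchWriter` types here are hypothetical stand-ins, not the parquet-mr classes, and the filter is treated as an opaque byte array on the assumption that it depends only on the column values and not on how the pages are compressed or encrypted.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: Chunk and SketchWriter are hypothetical stand-ins,
// not the parquet-mr ColumnChunkMetaData / ParquetFileWriter classes.
public class BloomFilterCopySketch {

    /** A column chunk reduced to its dotted path, payload, and optional Bloom filter. */
    static final class Chunk {
        final String dotPath;
        final byte[] data;
        final byte[] bloomFilter; // null when the column has no Bloom filter

        Chunk(String dotPath, byte[] data, byte[] bloomFilter) {
            this.dotPath = dotPath;
            this.data = data;
            this.bloomFilter = bloomFilter;
        }
    }

    /** Records what the rewritten file would contain. */
    static final class SketchWriter {
        final Map<String, byte[]> chunks = new LinkedHashMap<>();
        final Map<String, byte[]> bloomFilters = new LinkedHashMap<>();

        void addChunk(String dotPath, byte[] data) {
            chunks.put(dotPath, data);
        }

        void addBloomFilter(String dotPath, byte[] bloomFilter) {
            bloomFilters.put(dotPath, bloomFilter);
        }
    }

    /** Rewrites one chunk and carries its Bloom filter over when present. */
    static void processChunk(Chunk chunk, SketchWriter writer) {
        writer.addChunk(chunk.dotPath, chunk.data);
        byte[] bloomFilter = chunk.bloomFilter;
        if (bloomFilter != null) {
            // Without this step the rewritten file silently loses the filter.
            writer.addBloomFilter(chunk.dotPath, bloomFilter);
        }
    }

    public static void main(String[] args) {
        SketchWriter writer = new SketchWriter();
        processChunk(new Chunk("user.id", new byte[] {1, 2}, new byte[] {42}), writer);
        processChunk(new Chunk("user.name", new byte[] {3}, null), writer);
        System.out.println("Chunks written: " + writer.chunks.keySet());
        System.out.println("Bloom filters written: " + writer.bloomFilters.keySet());
    }
}
```

The null check mirrors the one in the diff: columns without a Bloom filter are rewritten as before, and only columns that have one get an extra entry.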
--
This message was sent by Atlassian Jira
(v8.20.10#820010)