[jira] [Commented] (PARQUET-2348) Recompression/Re-encrypt should rewrite bloomfilter

ASF GitHub Bot (Jira) Thu, 28 Sep 2023 05:29:05 -0700


    [ 
https://issues.apache.org/jira/browse/PARQUET-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17770034#comment-17770034
 ]


ASF GitHub Bot commented on PARQUET-2348:
-----------------------------------------

ConeyLiu commented on code in PR #1143:
URL: https://github.com/apache/parquet-mr/pull/1143#discussion_r1340067800


##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/rewrite/ParquetRewriter.java:
##########
@@ -366,6 +366,10 @@ private void processChunk(ColumnChunkMetaData chunk,
 
     ColumnIndex columnIndex = reader.readColumnIndex(chunk);
     OffsetIndex offsetIndex = reader.readOffsetIndex(chunk);
+    BloomFilter bloomFilter = reader.readBloomFilter(chunk);
+    if (bloomFilter != null) {
+      writer.addBloomFilter(chunk.getPath().toDotString(), bloomFilter);

Review Comment:
   Sorry for the late response. I have added the related unit tests.





> Recompression/Re-encrypt should rewrite bloomfilter
> ---------------------------------------------------
>
>                 Key: PARQUET-2348
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2348
>             Project: Parquet
>          Issue Type: Bug
>            Reporter: Xianyang Liu
>            Priority: Major
>
> The bloom filter data is lost after a file is rewritten with recompression or 
> re-encryption. We should rewrite the bloom filter data as well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
