[jira] [Commented] (PARQUET-2373) Improve I/O performance with bloom_filter_length

ASF GitHub Bot (Jira) Thu, 16 Nov 2023 21:51:22 -0800


    [ 
https://issues.apache.org/jira/browse/PARQUET-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17787034#comment-17787034
 ]


ASF GitHub Bot commented on PARQUET-2373:
-----------------------------------------

zhangjiashen commented on code in PR #1184:
URL: https://github.com/apache/parquet-mr/pull/1184#discussion_r1396684158


##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/metadata/ColumnChunkMetaData.java:
##########
@@ -341,6 +351,15 @@ public long getBloomFilterOffset() {
     return bloomFilterOffset;
   }
 
+  /**
+   * @return the length to the Bloom filter or {@code -1} if there is no bloom filter for this column chunk
+   */
+  @Private
+  public int getBloomFilterLength() {

Review Comment:
   It will be -1 by default. If the bloom filter length is not set, the 
length is then loaded from the bloom filter header.
   There are a bunch of tests, including E2E tests, in TestParquetFileWriter, 
TestParquetMetadataConverter, and ParquetRewriterTest. Could you please take 
a look?
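
   To illustrate the default described above, here is a minimal sketch. 
ChunkMetaSketch and resolveLength are hypothetical names used only for this 
example and are not part of parquet-mr; the sketch assumes the total length 
is the header size plus the bitset size reported by the header.

```java
// Hypothetical stand-in for the column chunk metadata discussed above.
// This is NOT the parquet-mr ColumnChunkMetaData class.
final class ChunkMetaSketch {
    // -1 is the sentinel meaning "no bloom filter length was recorded".
    private int bloomFilterLength = -1;

    void setBloomFilterLength(int length) {
        this.bloomFilterLength = length;
    }

    int getBloomFilterLength() {
        return bloomFilterLength;
    }

    // Assumed helper: prefer the recorded length; otherwise fall back to
    // the sizes obtained by reading the bloom filter header.
    int resolveLength(int headerSize, int bitsetSizeFromHeader) {
        if (bloomFilterLength >= 0) {
            return bloomFilterLength;
        }
        return headerSize + bitsetSizeFromHeader;
    }
}
```

   A file written without the field keeps the -1 default, so a caller would 
take the header-based path in that case.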








> Improve I/O performance with bloom_filter_length
> ------------------------------------------------
>
>                 Key: PARQUET-2373
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2373
>             Project: Parquet
>          Issue Type: Improvement
>            Reporter: Jiashen Zhang
>            Priority: Minor
>
> The spec change PARQUET-2257 added bloom_filter_length so that a reader can 
> load the bloom filter in a single read. This implementation uses the 
> 'bloom_filter_length' field to load the bloom filter (header and bitset 
> together), which improves I/O scheduling.
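
The I/O difference described in the issue can be sketched as follows. This is 
an illustration under stated assumptions, not the parquet-mr reader: the class 
BloomReadSketch, its methods, and the fixed 4-byte header are all hypothetical 
(the real bloom filter header is Thrift-encoded and variable in size). It only 
shows why a known length allows one read where an unknown length needs two.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class BloomReadSketch {
    // Assumption for this sketch only: a fixed 4-byte header that stores
    // the bitset size. The real header is Thrift-encoded.
    static final int HEADER_SIZE = 4;

    // Counts simulated I/O requests against the in-memory "file".
    static int reads = 0;

    static byte[] read(byte[] file, long offset, int length) {
        reads++;
        int start = Math.toIntExact(offset);
        return Arrays.copyOfRange(file, start, start + length);
    }

    // length >= 0: header and bitset are fetched in a single read.
    // length == -1: read the header first, then the bitset it describes.
    static byte[] loadBitset(byte[] file, long offset, int length) {
        if (length >= 0) {
            byte[] all = read(file, offset, length);
            return Arrays.copyOfRange(all, HEADER_SIZE, all.length);
        }
        byte[] header = read(file, offset, HEADER_SIZE);
        int bitsetSize = ByteBuffer.wrap(header).getInt();
        return read(file, offset + HEADER_SIZE, bitsetSize);
    }

    public static void main(String[] args) {
        // Build a toy "file": 4-byte header (bitset size 8) plus 8 bitset bytes.
        byte[] file = ByteBuffer.allocate(HEADER_SIZE + 8)
                .putInt(8)
                .put(new byte[] {1, 2, 3, 4, 5, 6, 7, 8})
                .array();

        reads = 0;
        loadBitset(file, 0, file.length);
        System.out.println("reads with known length: " + reads);

        reads = 0;
        loadBitset(file, 0, -1);
        System.out.println("reads with unknown length: " + reads);
    }
}
```

Both paths return the same bitset; only the number of read requests differs.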



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
