[jira] [Commented] (PARQUET-2257) [Format] Add bloom_filter_length to ColumnMetaData

ASF GitHub Bot (Jira) Mon, 13 Mar 2023 08:24:14 -0700


    [ 
https://issues.apache.org/jira/browse/PARQUET-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17699700#comment-17699700
 ]


ASF GitHub Bot commented on PARQUET-2257:
-----------------------------------------

mapleFU commented on code in PR #194:
URL: https://github.com/apache/parquet-format/pull/194#discussion_r1134158075


##########
src/main/thrift/parquet.thrift:
##########
@@ -753,6 +753,9 @@ struct ColumnMetaData {
 
   /** Byte offset from beginning of file to Bloom filter data. **/
   14: optional i64 bloom_filter_offset;
+
+  /** Size of Bloom filter data, in bytes. **/
+  15: optional i32 bloom_filter_length;

Review Comment:
   It seems that if the length is present, the offset must be present too, and 
if the offset is absent, the length should be absent as well?
   
   The reader should then check:
   1. Is bloom_filter_offset present?
   2. If so, can it take the fast path and use bloom_filter_length to read the filter?
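
The two checks above could be sketched roughly as follows. This is only an illustration, not code from parquet-format or any reader: the `ColumnMetaData` dataclass is a hypothetical stand-in for the Thrift-generated struct, `plan_bloom_filter_read` and its strategy names are made up, and rejecting a length without an offset is an assumption about how the question above would be resolved.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ColumnMetaData:
    # Hypothetical stand-in for the Thrift-generated struct; only the two
    # optional fields discussed in this review are modeled.
    bloom_filter_offset: Optional[int] = None
    bloom_filter_length: Optional[int] = None


def plan_bloom_filter_read(
    meta: ColumnMetaData,
) -> Optional[Tuple[str, int, Optional[int]]]:
    """Return None if there is no filter, else (strategy, offset, length)."""
    if meta.bloom_filter_offset is None:
        if meta.bloom_filter_length is not None:
            # Assumption: a length without an offset is treated as malformed.
            raise ValueError(
                "bloom_filter_length is set but bloom_filter_offset is not"
            )
        return None  # Check 1 failed: no Bloom filter for this column chunk.
    if meta.bloom_filter_length is not None:
        # Check 2 passed: the size is known up front, so one read suffices.
        return ("single_read", meta.bloom_filter_offset, meta.bloom_filter_length)
    # Older files: only the offset is known, so parse the header first.
    return ("header_first", meta.bloom_filter_offset, None)
```

With this shape, a file written before the field existed still works through the `header_first` branch, so the new field stays purely an optimization.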





> [Format] Add bloom_filter_length to ColumnMetaData
> --------------------------------------------------
>
>                 Key: PARQUET-2257
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2257
>             Project: Parquet
>          Issue Type: New Feature
>          Components: parquet-format
>            Reporter: Gang Wu
>            Assignee: Gang Wu
>            Priority: Major
>
> The spec currently provides only bloom_filter_offset to locate the Bloom filter. The 
> reader cannot load the Bloom filter in a single read; it must first parse the 
> Bloom filter header to get the total size.
> This issue proposes adding an optional bloom_filter_length field that records the 
> size of the Bloom filter, to facilitate I/O scheduling. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
