[ https://issues.apache.org/jira/browse/PARQUET-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17699707#comment-17699707 ]
ASF GitHub Bot commented on PARQUET-2257:
-----------------------------------------
wgtmac commented on code in PR #194:
URL: https://github.com/apache/parquet-format/pull/194#discussion_r1134163134
##########
src/main/thrift/parquet.thrift:
##########
@@ -753,6 +753,9 @@ struct ColumnMetaData {
/** Byte offset from beginning of file to Bloom filter data. **/
14: optional i64 bloom_filter_offset;
+
+ /** Size of Bloom filter data, in bytes. **/
+ 15: optional i32 bloom_filter_length;
Review Comment:
On the writer side:
- Old writer only writes offset.
- New writer should write length as well.
On the reader side:
- Old reader only checks offset.
- New reader checks offset and then checks if length exists.
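The reader-side rules above can be sketched roughly as follows. This is only an illustration, not parquet-mr or any real reader's code: the `ColumnMetaData` dataclass is a hypothetical stand-in for the Thrift-generated struct, and `plan_bloom_filter_read` and its return values are made-up names. It also assumes that, when present, bloom_filter_length covers everything the reader needs to fetch starting at bloom_filter_offset.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ColumnMetaData:
    """Hypothetical stand-in for the Thrift struct; only the two fields discussed."""
    bloom_filter_offset: Optional[int] = None  # field 14, written by old and new writers
    bloom_filter_length: Optional[int] = None  # field 15, written by new writers only


def plan_bloom_filter_read(meta: ColumnMetaData) -> Optional[Tuple[str, int, Optional[int]]]:
    """Return (strategy, offset, length), or None if the column has no bloom filter."""
    # Old and new readers both start from the offset.
    if meta.bloom_filter_offset is None:
        return None
    # New reader: if the writer recorded a length, fetch the filter in one read.
    if meta.bloom_filter_length is not None:
        return ("single_read", meta.bloom_filter_offset, meta.bloom_filter_length)
    # File from an old writer: fall back to parsing the header first to learn the size.
    return ("header_then_bitset", meta.bloom_filter_offset, None)


# File written by an old writer (offset only).
old_file = plan_bloom_filter_read(ColumnMetaData(bloom_filter_offset=4096))
# File written by a new writer (offset and length).
new_file = plan_bloom_filter_read(
    ColumnMetaData(bloom_filter_offset=4096, bloom_filter_length=1040)
)
# Column without a bloom filter.
no_filter = plan_bloom_filter_read(ColumnMetaData())
```

Because the new field is optional, an old reader simply ignores it, and a new reader keeps working on files from old writers through the fallback branch.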
> [Format] Add bloom_filter_length to ColumnMetaData
> --------------------------------------------------
>
> Key: PARQUET-2257
> URL: https://issues.apache.org/jira/browse/PARQUET-2257
> Project: Parquet
> Issue Type: New Feature
> Components: parquet-format
> Reporter: Gang Wu
> Assignee: Gang Wu
> Priority: Major
>
> The spec currently provides only bloom_filter_offset to locate the bloom filter.
> The reader cannot load the bloom filter in a single read; it must first parse
> the bloom filter header to get the total size.
> This issue proposes adding an optional bloom_filter_length field that records
> the size of the bloom filter, to facilitate I/O scheduling.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)