[
https://issues.apache.org/jira/browse/PARQUET-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17684133#comment-17684133
]
ASF GitHub Bot commented on PARQUET-2237:
-----------------------------------------
yabola opened a new pull request, #1023:
URL: https://github.com/apache/parquet-mr/pull/1023
A Bloom filter has to be loaded from the filesystem, which may cost time
and space. If the other filters can determine exactly whether the value
exists, we can skip the Bloom filter and improve performance.
When the min and max values in StatisticsFilter are the same, we can
determine exactly whether the value exists.
When we have page dictionaries, we can also determine exactly whether the
value exists.
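
The idea above can be sketched roughly as follows. This is only an
illustration and not the code in the pull request: the class
`ExactMatchSketch`, the `Verdict` enum, and every method name are
hypothetical and do not come from parquet-mr. It assumes an equality
predicate on a `long` column, statistics that are present and cover at
least one non-null value, and a dictionary that is passed in only when
every page of the column chunk is dictionary-encoded (`null` otherwise).

```java
import java.util.Set;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch; none of these names are parquet-mr APIs. */
final class ExactMatchSketch {

  /** Three-valued answer of one filter for the predicate `column == value`. */
  enum Verdict { DEFINITELY_ABSENT, DEFINITELY_PRESENT, MIGHT_MATCH }

  /** Min/max check: exact when the value is out of range or when min == max. */
  static Verdict byStatistics(long min, long max, long value) {
    if (value < min || value > max) {
      return Verdict.DEFINITELY_ABSENT;
    }
    if (min == max) {
      // Assumption: at least one non-null value exists, and all of them equal min.
      return Verdict.DEFINITELY_PRESENT;
    }
    return Verdict.MIGHT_MATCH;
  }

  /** Dictionary check: exact whenever a complete dictionary is available. */
  static Verdict byDictionary(Set<Long> dictionary, long value) {
    if (dictionary == null) {
      return Verdict.MIGHT_MATCH; // not fully dictionary-encoded, nothing to conclude
    }
    return dictionary.contains(value)
        ? Verdict.DEFINITELY_PRESENT
        : Verdict.DEFINITELY_ABSENT;
  }

  /**
   * Returns true if the row group can be dropped. The Bloom filter is modelled
   * as a lazy supplier and is consulted only when no cheaper filter is exact.
   */
  static boolean canDrop(long min, long max, Set<Long> dictionary,
                         BooleanSupplier bloomMightContain, long value) {
    Verdict verdict = byStatistics(min, max, value);
    if (verdict == Verdict.MIGHT_MATCH) {
      verdict = byDictionary(dictionary, value);
    }
    if (verdict != Verdict.MIGHT_MATCH) {
      return verdict == Verdict.DEFINITELY_ABSENT; // exact answer, Bloom filter skipped
    }
    return !bloomMightContain.getAsBoolean(); // the expensive load happens only here
  }
}
```

In this sketch `canDrop(5, 5, null, bloom, 5)` returns false without ever
calling `bloom`, because the statistics alone prove that the value is
present.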
> Improve performance when filters in RowGroupFilter can match exactly
> --------------------------------------------------------------------
>
> Key: PARQUET-2237
> URL: https://issues.apache.org/jira/browse/PARQUET-2237
> Project: Parquet
> Issue Type: Improvement
> Reporter: Mars
> Priority: Major
>
> A Bloom filter has to be loaded from the filesystem, which may cost time
> and space. If the other filters can determine exactly whether the value
> exists, we can skip the Bloom filter and improve performance.
>
> When the min and max values in StatisticsFilter are the same, we can
> determine exactly whether the value exists.
> When we have page dictionaries, we can also determine exactly whether the
> value exists.