[
https://issues.apache.org/jira/browse/PHOENIX-6710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17534654#comment-17534654
]
ASF GitHub Bot commented on PHOENIX-6710:
-----------------------------------------
chrajeshbabu commented on PR #1436:
URL: https://github.com/apache/phoenix/pull/1436#issuecomment-1123128220
+1 LGTM @ankitsinghal
> Revert PHOENIX-3842 Turn on back default bloomFilter for Phoenix Tables
> -----------------------------------------------------------------------
>
> Key: PHOENIX-6710
> URL: https://issues.apache.org/jira/browse/PHOENIX-6710
> Project: Phoenix
> Issue Type: Bug
> Components: core
> Affects Versions: 4.11.0
> Reporter: Ankit Singhal
> Assignee: Ankit Singhal
> Priority: Major
>
> PHOENIX-3842 was done to work around PHOENIX-3797 and unblock a release,
> on the assumption that Phoenix is not used for GETs.
>
> At one of our users, we saw heavy GETs being issued from their custom
> coprocessor to check whether a key is already present. Around 99% of the
> time the key is not expected to be present during the initial load, because
> keys are expected to be random, but about 1% of keys may still be
> duplicates. In the absence of a bloom filter, HBase has to seek into the
> HFile to confirm that the key is not present, which results in a
> performance regression of about 2x.
>
> Use cases like index maintenance and "ON DUPLICATE KEY" queries are also
> impacted without bloom filters.
>
> Phoenix is still used for GETs by users (SELECT queries with the key as a
> filter), and we also have constructs that intrinsically do GETs, like index
> maintenance and "ON DUPLICATE KEY". So I believe the bloom filter should be
> "ON" by default, as I don't see any downside to it even when it is not
> being used.
>
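To illustrate the effect described above, here is a minimal Python sketch of why a bloom filter helps when most looked-up keys are absent. It is a toy model, not HBase's or Phoenix's implementation: the BloomFilter and ToyStoreFile classes, the filter sizing, and the 99-absent / 1-present probe mix are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Toy bloom filter (hypothetical; not HBase's implementation)."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


class ToyStoreFile:
    """Hypothetical stand-in for an HFile: a key set plus an optional filter."""

    def __init__(self, keys, use_bloom):
        self.keys = set(keys)
        self.seeks = 0
        self.bloom = None
        if use_bloom:
            self.bloom = BloomFilter()
            for key in self.keys:
                self.bloom.add(key)

    def contains(self, key):
        if self.bloom is not None and not self.bloom.might_contain(key):
            return False  # filter rules the key out, so no seek is needed
        self.seeks += 1  # models the seek into the file
        return key in self.keys


def count_seeks(use_bloom):
    """Probe 99 absent keys and 1 present key; return (found, seeks)."""
    stored = [f"row-{i}".encode() for i in range(1000)]
    store = ToyStoreFile(stored, use_bloom)
    probes = [f"new-{i}".encode() for i in range(99)] + [b"row-0"]
    found = sum(store.contains(key) for key in probes)
    return found, store.seeks
```

Calling count_seeks(False) performs one seek per probe, while count_seeks(True) skips the seek for every absent key the filter rules out. The filter can return false positives, so the number of seeks with the filter enabled is at least the number of keys actually present.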
--
This message was sent by Atlassian Jira
(v8.20.7#820007)