[jira] [Commented] (PHOENIX-6710) Revert PHOENIX-3842 Turn on back default bloomFilter for Phoenix Tables

Ankit Singhal (Jira) Mon, 09 May 2022 23:41:40 -0700


    [ 
https://issues.apache.org/jira/browse/PHOENIX-6710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17534171#comment-17534171
 ]


Ankit Singhal commented on PHOENIX-6710:
----------------------------------------

Thanks [~gjacoby] for the feedback.

{quote}Seems like we'd want it to be configurable for both a default and 
table-specific override in a CREATE statement.{quote}

It is already configurable at the CREATE statement level, like any other column 
family attribute, but we don't have a cluster-wide property to set it (even in 
HBase).
{code}
CREATE TABLE TEST (
    id CHAR(1) NOT NULL,
    col1 INTEGER NOT NULL,
    CONSTRAINT NAME_PK PRIMARY KEY (id)
) BLOOMFILTER = 'ROW';
{code}

{quote}Also based on PHOENIX-3797, do we need an exception if you try to create 
a local index on a table with a bloom filter? (Or an exception if you try to 
turn bloom filters on on a table that already has a local index?){quote}
The PHOENIX-3797 issue was different: we were not generating the local index 
file correctly. That is why we were getting the bloom filter out-of-order 
error, and disabling the bloom filter only masked it. Now that PHOENIX-3797 is 
fixed, we shouldn't see any errors related to the bloom filter. I'll see if I 
can confirm it with a test similar to the one [~mujtaba] ran for PHOENIX-3797.


> Revert PHOENIX-3842 Turn on back default bloomFilter for Phoenix Tables
> -----------------------------------------------------------------------
>
>                 Key: PHOENIX-6710
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-6710
>             Project: Phoenix
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 4.11.0
>            Reporter: Ankit Singhal
>            Assignee: Ankit Singhal
>            Priority: Major
>
> PHOENIX-3842 was done to work around PHOENIX-3797 and unblock a release, on 
> the assumption that Phoenix is not used for GETs.
>  
> At one of our users, we saw that they have been doing heavy GETs in their 
> custom coprocessor to check whether the key is present in the current. 
> Roughly 99% of the time, the key is not expected to be present during the 
> initial load, since keys are expected to be random, but there is still a 
> chance that about 1% of keys are duplicates. Without a bloom filter, HBase 
> has to seek into the HFile to confirm that the key is absent, which makes 
> performance about 2x slower.
>  
> Use cases like index maintenance and "ON DUPLICATE KEY" queries will also 
> be impacted without bloom filters.
>  
> Phoenix is still used for GETs by users (SELECT queries with the key as a 
> filter), and we also have constructs that intrinsically do GETs, like index 
> maintenance and "ON DUPLICATE KEY". So I believe it is better to have the 
> bloom filter "ON" by default, as I don't see any downside to it even when it 
> is not being used.
>  
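
The reasoning in the description above (a negative bloom filter answer lets a 
GET skip the HFile seek) can be illustrated with a small toy model. This is 
only a sketch under stated assumptions: it is not HBase or Phoenix code, and 
the class RowBloomSketch, its methods, the filter size (10,007 bits), and the 
number of hashes (4) are all hypothetical choices made for illustration. A 
HashSet stands in for the HFile, and a counter stands in for seeks.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

/** Toy model of one store file guarded by an optional row bloom filter. Not HBase code. */
final class RowBloomSketch {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;
    private final boolean bloomEnabled;
    private final Set<String> fileKeys = new HashSet<>(); // stands in for the HFile contents
    private int seeks = 0;                                // counts simulated HFile seeks

    RowBloomSketch(int numBits, int numHashes, boolean bloomEnabled) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
        this.bloomEnabled = bloomEnabled;
    }

    /** Writes a row key to the "file" and records it in the filter. */
    void put(String rowKey) {
        fileKeys.add(rowKey);
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(rowKey, i));
        }
    }

    /** Existence check: a negative filter answer is definitive, so the seek is skipped. */
    boolean exists(String rowKey) {
        if (bloomEnabled && !mightContain(rowKey)) {
            return false;
        }
        seeks++; // filter disabled, or a (possibly false) positive: the file must be read
        return fileKeys.contains(rowKey);
    }

    int seeks() {
        return seeks;
    }

    private boolean mightContain(String rowKey) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(rowKey, i))) {
                return false;
            }
        }
        return true;
    }

    // Double hashing: the i-th probe position is (h1 + i * h2) mod numBits.
    private int index(String rowKey, int i) {
        byte[] b = rowKey.getBytes(StandardCharsets.UTF_8);
        int h1 = fnv1a(b, 0x811C9DC5);
        int h2 = fnv1a(b, 0x01000193) | 1;
        return Math.floorMod(h1 + i * h2, numBits);
    }

    // FNV-1a style hash over the key bytes, starting from the given seed.
    private static int fnv1a(byte[] data, int seed) {
        int h = seed;
        for (byte x : data) {
            h ^= (x & 0xff);
            h *= 16777619;
        }
        return h;
    }

    /** Loads n keys, then probes n keys that were never written; returns the seek count. */
    static int seeksForMisses(int n, boolean bloomEnabled) {
        RowBloomSketch file = new RowBloomSketch(10_007, 4, bloomEnabled);
        for (int i = 0; i < n; i++) {
            file.put("loaded-" + i);
        }
        for (int i = 0; i < n; i++) {
            file.exists("absent-" + i);
        }
        return file.seeks();
    }

    public static void main(String[] args) {
        System.out.println("seeks without bloom filter: " + seeksForMisses(1000, false));
        System.out.println("seeks with bloom filter:    " + seeksForMisses(1000, true));
    }
}
```

In this model, every lookup of an absent key costs a seek when the filter is 
off, while with the filter on only false positives reach the file. The model 
says nothing about the actual 2x figure reported above, which depends on the 
real workload and storage layout.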



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
