[jira] [Updated] (PHOENIX-6710) Revert PHOENIX-3842 Turn on back default bloomFilter for Phoenix Tables

2022-05-10 Thread Ankit Singhal (Jira)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-6710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Singhal updated PHOENIX-6710:
---
Fix Version/s: 5.1.3

> Revert PHOENIX-3842 Turn on back default bloomFilter for Phoenix Tables
> ---
>
> Key: PHOENIX-6710
> URL: https://issues.apache.org/jira/browse/PHOENIX-6710
> Project: Phoenix
> Issue Type: Bug
> Components: core
> Affects Versions: 4.11.0
> Reporter: Ankit Singhal
> Assignee: Ankit Singhal
> Priority: Major
> Fix For: 5.2.0, 5.1.3
>
>
> PHOENIX-3842 was done as a workaround for PHOENIX-3797 to unblock a release,
> on the assumption that Phoenix is not used for GETs.
>  
> At one of our users, we saw that they have been doing heavy GETs in their 
> custom coprocessor to check whether the key is present in the current. Up to 
> 99% of the time, the key is not expected to be present during the initial 
> load, as keys are expected to be random, but there is still a chance that 
> about 1% of keys are duplicates. Without a Bloom filter, HBase has to seek 
> into the HFile to confirm that the key is not present, which results in a 
> performance regression of about 2x.
>  
> Use cases like index maintenance and "ON DUPLICATE KEY" queries will also be 
> impacted without Bloom filters.
>  
> Phoenix is still used for GETs by users (SELECT queries with the key as a 
> filter), and we also have constructs that intrinsically do GETs, like index 
> maintenance and "ON DUPLICATE KEY". So I believe it is always better to have 
> the Bloom filter "ON" by default, as I also don't see any negative 
> implication of it, even if it is not getting used.
>  
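
The cost described in the issue text above can be illustrated with a small, self-contained sketch. It is not HBase or Phoenix code: `BloomFilter`, `ToyStoreFile`, and `count_seeks` are hypothetical names, the three toy "store files" stand in for HFiles, and the 1% present / 99% absent workload mirrors the ratio mentioned in the description. The sketch only shows why a lookup for an absent key can skip the expensive seek when a Bloom filter is available.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=8192, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


class ToyStoreFile:
    """Hypothetical stand-in for one store file: keys plus an optional filter."""

    def __init__(self, keys, use_bloom):
        self.keys = set(keys)
        self.seeks = 0
        self.bloom = None
        if use_bloom:
            self.bloom = BloomFilter()
            for key in self.keys:
                self.bloom.add(key)

    def get(self, key):
        if self.bloom is not None and not self.bloom.might_contain(key):
            return False  # definitely absent, so the seek is skipped
        self.seeks += 1  # models the expensive seek into the file
        return key in self.keys


def count_seeks(use_bloom):
    """Run 1000 point lookups (10 present, 990 absent) over 3 toy files."""
    files = [
        ToyStoreFile((f"row-{f}-{i}" for i in range(100)), use_bloom)
        for f in range(3)
    ]
    present = [f"row-0-{i}" for i in range(10)]   # about 1% duplicates
    absent = [f"new-{i}" for i in range(990)]     # about 99% new keys
    found = sum(1 for key in present + absent if any(f.get(key) for f in files))
    return found, sum(f.seeks for f in files)


if __name__ == "__main__":
    print(count_seeks(use_bloom=False))  # (10, 2980): every miss seeks all files
    print(count_seeks(use_bloom=True))   # same 10 hits, far fewer seeks
```

Without the filter, each absent key costs one seek per file; with it, almost all of those seeks are answered from the in-memory bit array, apart from rare false positives.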



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (PHOENIX-6710) Revert PHOENIX-3842 Turn on back default bloomFilter for Phoenix Tables

2022-05-10 Thread Ankit Singhal (Jira)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-6710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Singhal updated PHOENIX-6710:
---
Fix Version/s: 5.2.0

> Revert PHOENIX-3842 Turn on back default bloomFilter for Phoenix Tables
> ---
>
> Key: PHOENIX-6710
> URL: https://issues.apache.org/jira/browse/PHOENIX-6710
> Project: Phoenix
> Issue Type: Bug
> Components: core
> Affects Versions: 4.11.0
> Reporter: Ankit Singhal
> Assignee: Ankit Singhal
> Priority: Major
> Fix For: 5.2.0
>
>





[jira] [Updated] (PHOENIX-6710) Revert PHOENIX-3842 Turn on back default bloomFilter for Phoenix Tables

2022-05-09 Thread Ankit Singhal (Jira)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-6710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Singhal updated PHOENIX-6710:
---
Description: 
PHOENIX-3842 was done as a workaround for PHOENIX-3797 to unblock a release, 
on the assumption that Phoenix is not used for GETs.

At one of our users, we saw that they have been doing heavy GETs in their 
custom coprocessor to check whether the key is present in the current. Up to 
99% of the time, the key is not expected to be present during the initial 
load, as keys are expected to be random, but there is still a chance that 
about 1% of keys are duplicates. Without a Bloom filter, HBase has to seek 
into the HFile to confirm that the key is not present, which results in a 
performance regression of about 2x.

Use cases like index maintenance and "ON DUPLICATE KEY" queries will also be 
impacted without Bloom filters.

Phoenix is still used for GETs by users (SELECT queries with the key as a 
filter), and we also have constructs that intrinsically do GETs, like index 
maintenance and "ON DUPLICATE KEY". So I believe it is always better to have 
the Bloom filter "ON" by default, as I also don't see any negative 
implication of it, even if it is not getting used.

 

  was:
It looks like PHOENIX-3842 was done as a workaround for PHOENIX-3797 in order 
to unblock a release, and it was assumed that Phoenix is not used for GETs.

At one of our users, we saw that they have been doing heavy GETs in their 
custom coprocessor to check whether the key is present in the current. Up to 
99% of the time, the key is not expected to be present, as this is the 
initial load and keys are expected to be random, but there is still a chance 
that about 1% of keys are duplicates. Without a Bloom filter, HBase has to 
seek into the HFile to confirm that the key is not present, which results in 
a performance regression of about 2x.

Use cases like index maintenance and "ON DUPLICATE KEY" queries will also be 
impacted without Bloom filters.

Phoenix is still used for GETs by users, and we also have constructs that 
intrinsically do GETs, like index maintenance and others. So I believe it is 
always better to have the Bloom filter "ON" by default, as I don't see any 
negative implication of keeping it ON, even if it is not getting used.

 


> Revert PHOENIX-3842 Turn on back default bloomFilter for Phoenix Tables
> ---
>
> Key: PHOENIX-6710
> URL: https://issues.apache.org/jira/browse/PHOENIX-6710
> Project: Phoenix
> Issue Type: Bug
> Components: core
> Affects Versions: 4.11.0
> Reporter: Ankit Singhal
> Assignee: Ankit Singhal
> Priority: Major
>





[jira] [Updated] (PHOENIX-6710) Revert PHOENIX-3842 Turn on back default bloomFilter for Phoenix Tables

2022-05-09 Thread Ankit Singhal (Jira)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-6710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Singhal updated PHOENIX-6710:
---
Description: 
It looks like PHOENIX-3842 was done as a workaround for PHOENIX-3797 in order 
to unblock a release, and it was assumed that Phoenix is not used for GETs.

At one of our users, we saw that they have been doing heavy GETs in their 
custom coprocessor to check whether the key is present in the current. Up to 
99% of the time, the key is not expected to be present, as this is the 
initial load and keys are expected to be random, but there is still a chance 
that about 1% of keys are duplicates. Without a Bloom filter, HBase has to 
seek into the HFile to confirm that the key is not present, which results in 
a performance regression of about 2x.

Use cases like index maintenance and "ON DUPLICATE KEY" queries will also be 
impacted without Bloom filters.

Phoenix is still used for GETs by users, and we also have constructs that 
intrinsically do GETs, like index maintenance and others. So I believe it is 
always better to have the Bloom filter "ON" by default, as I don't see any 
negative implication of keeping it ON, even if it is not getting used.

 

  was:
It looks like PHOENIX-3842 was done as a workaround for PHOENIX-3797 in order 
to unblock a release, and it was assumed that Phoenix is not used for GETs.

At one of our users, we saw that they have been doing heavy GETs in their 
custom coprocessor to check whether the key is present in the current. Up to 
99% of the time, the key is not expected to be present, as this is the 
initial load and keys are expected to be random, but there is still a chance 
that about 1% of keys are duplicates. Without a Bloom filter, HBase has to 
seek into the HFile to confirm that the key is not present, which results in 
a performance regression of about 2x.

Use cases like index maintenance and "ON DUPLICATE KEY" queries will also be 
impacted without Bloom filters.

Phoenix is still used for GETs by users, and we also have constructs that 
intrinsically do GETs, like index maintenance and others. So I believe it is 
always better to have the Bloom filter "ON" by default, as I don't see any 
negative implication of it being on even if it is not getting used.

 


> Revert PHOENIX-3842 Turn on back default bloomFilter for Phoenix Tables
> ---
>
> Key: PHOENIX-6710
> URL: https://issues.apache.org/jira/browse/PHOENIX-6710
> Project: Phoenix
> Issue Type: Bug
> Components: core
> Affects Versions: 4.11.0
> Reporter: Ankit Singhal
> Assignee: Ankit Singhal
> Priority: Major
>


