[jira] [Commented] (HBASE-20636) Introduce two bloom filter type : ROWPREFIX and ROWPREFIX_DELIMITED

2018-05-24 Thread Guangxu Cheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16488804#comment-16488804
 ] 

Guangxu Cheng commented on HBASE-20636:
---

1. ROWCOL: specify the length of the prefix
2. ROWPREFIX_DELIMITED: specify the delimiter of the prefix

> Introduce two bloom filter type : ROWPREFIX and ROWPREFIX_DELIMITED
> ---
>
> Key: HBASE-20636
> URL: https://issues.apache.org/jira/browse/HBASE-20636
> Project: HBase
>  Issue Type: New Feature
>  Components: regionserver
>Reporter: Guangxu Cheng
>Assignee: Guangxu Cheng
>Priority: Major
> Attachments: HBASE-20636.master.001.patch
>
>
> As we all know, HBase uses BloomFilter(ROW and ROWCOL) to filter unnecessary 
> files to improve read performance. But they only support Get and do not 
> support Scan.
> In our company(Tencent), many users need to scan all rows with the same 
> prefix, such as Tencent Game. Game user's some operational record will be 
> written into HBase, each game user will have a lot of records, the rowkey is 
> constructed as userid+'#'+timestamps. So we can scan all records for a given 
> user for a specified period.
> For this scenario, we designed the prefix Bloom filter. If the startRow and 
> stopRow of the Scan has a valid common prefix, the scan will be allowed to 
> use BloomFilter to filter files which will enhance the performance of the 
> scan.
> Now, this feature has been running on our cluster over a year, and scan 
> performance for this scenario has been improved by more than one times than 
> before.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20636) Introduce two bloom filter type : ROWPREFIX and ROWPREFIX_DELIMITED

2018-05-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16488946#comment-16488946
 ] 

Hadoop QA commented on HBASE-20636:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
14s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m 
16s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
14s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
59s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  1m 
59s{color} | {color:blue} hbase-server in master has 2 extant Findbugs 
warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
3s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
14s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m 
45s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m 
14s{color} | {color:red} hbase-server: The patch generated 37 new + 223 
unchanged - 8 fixed = 260 total (was 231) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
48s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
14m 29s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.5 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
15s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
15s{color} | {color:green} hbase-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 20m 14s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 11m 42s{color} 
| {color:red} hbase-mapreduce in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
42s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 91m 29s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.regionserver.TestScanWithBloomError |
|   | hadoop.hbase.mapreduce.TestCopyTable |
|   | hadoop.hbase.mapreduce.TestImportTSVWithVisibilityLabels |
|   | hadoop.hbase.mapreduce.TestHFileOutputFormat2 |
|   | hadoop.hbase.mapreduce.TestImportTsv |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:d8b550f |
| JIRA Issue | HBASE-20636 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12924927/HBASE-20636.master.001.patch
 |
| Opt

[jira] [Commented] (HBASE-20636) Introduce two bloom filter type : ROWPREFIX and ROWPREFIX_DELIMITED

2018-05-24 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16489011#comment-16489011
 ] 

Duo Zhang commented on HBASE-20636:
---

I think the failed UTs are related? And mind posting a simple design doc for 
this feature?

Thanks.

> Introduce two bloom filter type : ROWPREFIX and ROWPREFIX_DELIMITED
> ---
>
> Key: HBASE-20636
> URL: https://issues.apache.org/jira/browse/HBASE-20636
> Project: HBase
>  Issue Type: New Feature
>  Components: regionserver
>Reporter: Guangxu Cheng
>Assignee: Guangxu Cheng
>Priority: Major
> Attachments: HBASE-20636.master.001.patch
>
>
> As we all know, HBase uses BloomFilter(ROW and ROWCOL) to filter unnecessary 
> files to improve read performance. But they only support Get and do not 
> support Scan.
> In our company(Tencent), many users need to scan all rows with the same 
> prefix, such as Tencent Game. Game user's some operational record will be 
> written into HBase, each game user will have a lot of records, the rowkey is 
> constructed as userid+'#'+timestamps. So we can scan all records for a given 
> user for a specified period.
> For this scenario, we designed the prefix Bloom filter. If the startRow and 
> stopRow of the Scan has a valid common prefix, the scan will be allowed to 
> use BloomFilter to filter files which will enhance the performance of the 
> scan.
> Now, this feature has been running on our cluster over a year, and scan 
> performance for this scenario has been improved by more than one times than 
> before.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20636) Introduce two bloom filter type : ROWPREFIX and ROWPREFIX_DELIMITED

2018-05-24 Thread Guangxu Cheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16489292#comment-16489292
 ] 

Guangxu Cheng commented on HBASE-20636:
---

bq. I think the failed UTs are related? And mind posting a simple design doc 
for this feature?

Yes, the failed UTs are related.I will fix it.
I will supply a design doc as soon as soon as possible.Thanks [~Apache9]

> Introduce two bloom filter type : ROWPREFIX and ROWPREFIX_DELIMITED
> ---
>
> Key: HBASE-20636
> URL: https://issues.apache.org/jira/browse/HBASE-20636
> Project: HBase
>  Issue Type: New Feature
>  Components: regionserver
>Reporter: Guangxu Cheng
>Assignee: Guangxu Cheng
>Priority: Major
> Attachments: HBASE-20636.master.001.patch
>
>
> As we all know, HBase uses BloomFilter(ROW and ROWCOL) to filter unnecessary 
> files to improve read performance. But they only support Get and do not 
> support Scan.
> In our company(Tencent), many users need to scan all rows with the same 
> prefix, such as Tencent Game. Game user's some operational record will be 
> written into HBase, each game user will have a lot of records, the rowkey is 
> constructed as userid+'#'+timestamps. So we can scan all records for a given 
> user for a specified period.
> For this scenario, we designed the prefix Bloom filter. If the startRow and 
> stopRow of the Scan has a valid common prefix, the scan will be allowed to 
> use BloomFilter to filter files which will enhance the performance of the 
> scan.
> Now, this feature has been running on our cluster over a year, and scan 
> performance for this scenario has been improved by more than one times than 
> before.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20636) Introduce two bloom filter type : ROWPREFIX and ROWPREFIX_DELIMITED

2018-05-24 Thread Guangxu Cheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16489381#comment-16489381
 ] 

Guangxu Cheng commented on HBASE-20636:
---

Attach 002 patch to fix failed UTs and checkstyle warnnings.

Thanks

> Introduce two bloom filter type : ROWPREFIX and ROWPREFIX_DELIMITED
> ---
>
> Key: HBASE-20636
> URL: https://issues.apache.org/jira/browse/HBASE-20636
> Project: HBase
>  Issue Type: New Feature
>  Components: regionserver
>Reporter: Guangxu Cheng
>Assignee: Guangxu Cheng
>Priority: Major
> Attachments: HBASE-20636.master.001.patch, 
> HBASE-20636.master.002.patch
>
>
> As we all know, HBase uses BloomFilter(ROW and ROWCOL) to filter unnecessary 
> files to improve read performance. But they only support Get and do not 
> support Scan.
> In our company(Tencent), many users need to scan all rows with the same 
> prefix, such as Tencent Game. Game user's some operational record will be 
> written into HBase, each game user will have a lot of records, the rowkey is 
> constructed as userid+'#'+timestamps. So we can scan all records for a given 
> user for a specified period.
> For this scenario, we designed the prefix Bloom filter. If the startRow and 
> stopRow of the Scan has a valid common prefix, the scan will be allowed to 
> use BloomFilter to filter files which will enhance the performance of the 
> scan.
> Now, this feature has been running on our cluster over a year, and scan 
> performance for this scenario has been improved by more than one times than 
> before.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20636) Introduce two bloom filter type : ROWPREFIX and ROWPREFIX_DELIMITED

2018-05-24 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16489586#comment-16489586
 ] 

Andrew Purtell commented on HBASE-20636:


I think this needs backwards compatibility consideration. It should be possible 
to upgrade to a version with these new BF types and then downgrade. It is fine 
if the downgraded installation will have suboptimal performance until the 
HFiles are rewritten after downgrade via compaction with an older BF type. 

We should never be in a position where we can't read an older HFile version. 
Does this require a HFile version increment? I would argue yes. There should 
also be a fallback. We can at least process the HFile without blooms if we 
don't recognize the bloom filter type.

Likewise, the unknown BF type in the schema should not cause the affected 
regions not to be deployed. 

> Introduce two bloom filter type : ROWPREFIX and ROWPREFIX_DELIMITED
> ---
>
> Key: HBASE-20636
> URL: https://issues.apache.org/jira/browse/HBASE-20636
> Project: HBase
>  Issue Type: New Feature
>  Components: regionserver
>Reporter: Guangxu Cheng
>Assignee: Guangxu Cheng
>Priority: Major
> Attachments: HBASE-20636.master.001.patch, 
> HBASE-20636.master.002.patch
>
>
> As we all know, HBase uses BloomFilter(ROW and ROWCOL) to filter unnecessary 
> files to improve read performance. But they only support Get and do not 
> support Scan.
> In our company(Tencent), many users need to scan all rows with the same 
> prefix, such as Tencent Game. Game user's some operational record will be 
> written into HBase, each game user will have a lot of records, the rowkey is 
> constructed as userid+'#'+timestamps. So we can scan all records for a given 
> user for a specified period.
> For this scenario, we designed the prefix Bloom filter. If the startRow and 
> stopRow of the Scan has a valid common prefix, the scan will be allowed to 
> use BloomFilter to filter files which will enhance the performance of the 
> scan.
> Now, this feature has been running on our cluster over a year, and scan 
> performance for this scenario has been improved by more than one times than 
> before.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20636) Introduce two bloom filter type : ROWPREFIX and ROWPREFIX_DELIMITED

2018-05-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16489834#comment-16489834
 ] 

Hadoop QA commented on HBASE-20636:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
23s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
26s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
27s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m 
16s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
 5s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  5m 
17s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  2m 
13s{color} | {color:blue} hbase-server in master has 2 extant Findbugs 
warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
11s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
16s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  3m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
33s{color} | {color:green} The patch hbase-client passed checkstyle {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
20s{color} | {color:green} hbase-server: The patch generated 0 new + 224 
unchanged - 9 fixed = 224 total (was 233) {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
18s{color} | {color:green} The patch hbase-mapreduce passed checkstyle {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
56s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
16m 11s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.5 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
11s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
26s{color} | {color:green} hbase-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}184m 25s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m 
13s{color} | {color:green} hbase-mapreduce in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  1m 
 2s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}268m  0s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.io.hfile.TestSeekBeforeWithInlineBlocks |
|   | hadoop.hbase.regionserver.TestSeekOptimizations |
|   | hadoop.hbase.regionserver.TestMultiColumnScanner |
\\
\\
|| Subs

[jira] [Commented] (HBASE-20636) Introduce two bloom filter type : ROWPREFIX and ROWPREFIX_DELIMITED

2018-05-24 Thread Allan Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16490078#comment-16490078
 ] 

Allan Yang commented on HBASE-20636:


{quote}
Does this require a HFile version increment?
{quote}
[~apurte4ll], just reviewed the patch,  this does not need a HFile version 
increment.
{quote}
There should also be a fallback. We can at least process the HFile without 
blooms if we don't recognize the bloom filter type.
{quote}
Well, I think it is hard to fallback, since we read bloom filter type like this 
in StoreFileReader.loadFileInfo():
{code:java}
byte[] b = fi.get(BLOOM_FILTER_TYPE_KEY);
if (b != null) {
  bloomFilterType = BloomType.valueOf(Bytes.toString(b));
}
{code}
It the old version of RegionServer can't recognize the bloom filter type, it 
will throw a IllegalArgumentException here.

> Introduce two bloom filter type : ROWPREFIX and ROWPREFIX_DELIMITED
> ---
>
> Key: HBASE-20636
> URL: https://issues.apache.org/jira/browse/HBASE-20636
> Project: HBase
>  Issue Type: New Feature
>  Components: regionserver
>Reporter: Guangxu Cheng
>Assignee: Guangxu Cheng
>Priority: Major
> Attachments: HBASE-20636.master.001.patch, 
> HBASE-20636.master.002.patch
>
>
> As we all know, HBase uses BloomFilter(ROW and ROWCOL) to filter unnecessary 
> files to improve read performance. But they only support Get and do not 
> support Scan.
> In our company(Tencent), many users need to scan all rows with the same 
> prefix, such as Tencent Game. Game user's some operational record will be 
> written into HBase, each game user will have a lot of records, the rowkey is 
> constructed as userid+'#'+timestamps. So we can scan all records for a given 
> user for a specified period.
> For this scenario, we designed the prefix Bloom filter. If the startRow and 
> stopRow of the Scan has a valid common prefix, the scan will be allowed to 
> use BloomFilter to filter files which will enhance the performance of the 
> scan.
> Now, this feature has been running on our cluster over a year, and scan 
> performance for this scenario has been improved by more than one times than 
> before.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20636) Introduce two bloom filter type : ROWPREFIX and ROWPREFIX_DELIMITED

2018-05-24 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16490084#comment-16490084
 ] 

Andrew Purtell commented on HBASE-20636:


bq. t reviewed the patch, this does not need a HFile version increment.

If we end up writing a HFile that cannot be read by an earlier version of 
HBase, HFile needs a version increment. 

bq. It the old version of RegionServer can't recognize the bloom filter type, 
it will throw a IllegalArgumentException here.

And this is why.


> Introduce two bloom filter type : ROWPREFIX and ROWPREFIX_DELIMITED
> ---
>
> Key: HBASE-20636
> URL: https://issues.apache.org/jira/browse/HBASE-20636
> Project: HBase
>  Issue Type: New Feature
>  Components: regionserver
>Reporter: Guangxu Cheng
>Assignee: Guangxu Cheng
>Priority: Major
> Attachments: HBASE-20636.master.001.patch, 
> HBASE-20636.master.002.patch
>
>
> As we all know, HBase uses BloomFilter(ROW and ROWCOL) to filter unnecessary 
> files to improve read performance. But they only support Get and do not 
> support Scan.
> In our company(Tencent), many users need to scan all rows with the same 
> prefix, such as Tencent Game. Game user's some operational record will be 
> written into HBase, each game user will have a lot of records, the rowkey is 
> constructed as userid+'#'+timestamps. So we can scan all records for a given 
> user for a specified period.
> For this scenario, we designed the prefix Bloom filter. If the startRow and 
> stopRow of the Scan has a valid common prefix, the scan will be allowed to 
> use BloomFilter to filter files which will enhance the performance of the 
> scan.
> Now, this feature has been running on our cluster over a year, and scan 
> performance for this scenario has been improved by more than one times than 
> before.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20636) Introduce two bloom filter type : ROWPREFIX and ROWPREFIX_DELIMITED

2018-05-24 Thread Allan Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16490097#comment-16490097
 ] 

Allan Yang commented on HBASE-20636:


I think bump up the HFile version  for a new type of bloom filter is too much. 
Maybe we can go around, like:
Set the BLOOM_FILTER_TYPE_KEY to NONE if it is a prefix filter. Add another 
file info e.g. PREFIX_BLOOM_FILTER_FLAG. The type of the new BF can be set 
here. The old version of RegionServer will regards the  BF as NONE while the 
new server has enough information. 
I don't know if it is a good idea, [~apurtell], [~andrewcheng] 

> Introduce two bloom filter type : ROWPREFIX and ROWPREFIX_DELIMITED
> ---
>
> Key: HBASE-20636
> URL: https://issues.apache.org/jira/browse/HBASE-20636
> Project: HBase
>  Issue Type: New Feature
>  Components: regionserver
>Reporter: Guangxu Cheng
>Assignee: Guangxu Cheng
>Priority: Major
> Attachments: HBASE-20636.master.001.patch, 
> HBASE-20636.master.002.patch
>
>
> As we all know, HBase uses BloomFilter(ROW and ROWCOL) to filter unnecessary 
> files to improve read performance. But they only support Get and do not 
> support Scan.
> In our company(Tencent), many users need to scan all rows with the same 
> prefix, such as Tencent Game. Game user's some operational record will be 
> written into HBase, each game user will have a lot of records, the rowkey is 
> constructed as userid+'#'+timestamps. So we can scan all records for a given 
> user for a specified period.
> For this scenario, we designed the prefix Bloom filter. If the startRow and 
> stopRow of the Scan has a valid common prefix, the scan will be allowed to 
> use BloomFilter to filter files which will enhance the performance of the 
> scan.
> Now, this feature has been running on our cluster over a year, and scan 
> performance for this scenario has been improved by more than one times than 
> before.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20636) Introduce two bloom filter type : ROWPREFIX and ROWPREFIX_DELIMITED

2018-05-24 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16490184#comment-16490184
 ] 

Andrew Purtell commented on HBASE-20636:


If the file with the new filter cannot be read by an older version of HBase 
then we are lying about compatibility if the hfile version is not incremented. 
If you bump only the minor then a fallback will be necessary

> Introduce two bloom filter type : ROWPREFIX and ROWPREFIX_DELIMITED
> ---
>
> Key: HBASE-20636
> URL: https://issues.apache.org/jira/browse/HBASE-20636
> Project: HBase
>  Issue Type: New Feature
>  Components: HFile, regionserver, scan
>Reporter: Guangxu Cheng
>Assignee: Guangxu Cheng
>Priority: Major
> Attachments: HBASE-20636.master.001.patch, 
> HBASE-20636.master.002.patch
>
>
> As we all know, HBase uses BloomFilter(ROW and ROWCOL) to filter unnecessary 
> files to improve read performance. But they only support Get and do not 
> support Scan.
> In our company(Tencent), many users need to scan all rows with the same 
> prefix, such as Tencent Game. Game user's some operational record will be 
> written into HBase, each game user will have a lot of records, the rowkey is 
> constructed as userid+'#'+timestamps. So we can scan all records for a given 
> user for a specified period.
> For this scenario, we designed the prefix Bloom filter. If the startRow and 
> stopRow of the Scan has a valid common prefix, the scan will be allowed to 
> use BloomFilter to filter files which will enhance the performance of the 
> scan.
> Now, this feature has been running on our cluster over a year, and scan 
> performance for this scenario has been improved by more than one times than 
> before.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20636) Introduce two bloom filter type : ROWPREFIX and ROWPREFIX_DELIMITED

2018-05-26 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16491700#comment-16491700
 ] 

Ted Yu commented on HBASE-20636:


I ran failed tests for patch v2 with patch v1. They also failed.
e.g.
{code}
testMultipleTimestampRanges[3](org.apache.hadoop.hbase.regionserver.TestSeekOptimizations)
  Time elapsed: 0.067 s  <<< ERROR!
org.apache.hadoop.hbase.DroppedSnapshotException: region: 
testMultipleTimestampRanges,,1527348574853.1d8afb02c3e80b1a6ecbd065190b254a.
  at 
org.apache.hadoop.hbase.regionserver.TestSeekOptimizations.createTimestampRange(TestSeekOptimizations.java:442)
  at 
org.apache.hadoop.hbase.regionserver.TestSeekOptimizations.testMultipleTimestampRanges(TestSeekOptimizations.java:162)
Caused by: java.lang.IllegalArgumentException: Bloom filter type is ROWPREFIX, 
RowPrefixBloomFilter.prefix_length not specified.
  at 
org.apache.hadoop.hbase.regionserver.TestSeekOptimizations.createTimestampRange(TestSeekOptimizations.java:442)
  at 
org.apache.hadoop.hbase.regionserver.TestSeekOptimizations.testMultipleTimestampRanges(TestSeekOptimizations.java:162)
{code}
TestSeekOptimizations is parameterized:
{code}
  public static final Collection parameters() {
return HBaseTestingUtility.BLOOM_AND_COMPRESSION_COMBINATIONS;
{code}
For the new row prefix filter, filter parameter is needed. Please take this 
into account in generating unit test parameter combinations.

> Introduce two bloom filter type : ROWPREFIX and ROWPREFIX_DELIMITED
> ---
>
> Key: HBASE-20636
> URL: https://issues.apache.org/jira/browse/HBASE-20636
> Project: HBase
>  Issue Type: New Feature
>  Components: HFile, regionserver, scan
>Reporter: Guangxu Cheng
>Assignee: Guangxu Cheng
>Priority: Major
> Attachments: HBASE-20636.master.001.patch, 
> HBASE-20636.master.002.patch
>
>
> As we all know, HBase uses BloomFilter(ROW and ROWCOL) to filter unnecessary 
> files to improve read performance. But they only support Get and do not 
> support Scan.
> In our company(Tencent), many users need to scan all rows with the same 
> prefix, such as Tencent Game. Game user's some operational record will be 
> written into HBase, each game user will have a lot of records, the rowkey is 
> constructed as userid+'#'+timestamps. So we can scan all records for a given 
> user for a specified period.
> For this scenario, we designed the prefix Bloom filter. If the startRow and 
> stopRow of the Scan has a valid common prefix, the scan will be allowed to 
> use BloomFilter to filter files which will enhance the performance of the 
> scan.
> Now, this feature has been running on our cluster over a year, and scan 
> performance for this scenario has been improved by more than one times than 
> before.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20636) Introduce two bloom filter type : ROWPREFIX and ROWPREFIX_DELIMITED

2018-05-29 Thread Guangxu Cheng (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16493454#comment-16493454
 ] 

Guangxu Cheng commented on HBASE-20636:
---

{quote}I think this needs backwards compatibility consideration.
{quote}
[~apurtell] [~allan163] Sorry for late reply. I also think that It is necessary 
to keep compatibility. But, even if the HFile has a version increment, the old 
version of HBase still unable to read the HFile. Maybe we can update the old 
version. When HBase does not recognize the BF type, the Bloom filter is ignored.
{quote}I ran failed tests for patch v2 with patch v1. They also failed.
{quote}
[~yuzhih...@gmail.com] Sorry for late reply.  Attach 003 patch to fix uts. And 
this patch also contains the suggestions as you mentioned on the ReviewBoard.

> Introduce two bloom filter type : ROWPREFIX and ROWPREFIX_DELIMITED
> ---
>
> Key: HBASE-20636
> URL: https://issues.apache.org/jira/browse/HBASE-20636
> Project: HBase
>  Issue Type: New Feature
>  Components: HFile, regionserver, scan
>Reporter: Guangxu Cheng
>Assignee: Guangxu Cheng
>Priority: Major
> Attachments: HBASE-20636.master.001.patch, 
> HBASE-20636.master.002.patch
>
>
> As we all know, HBase uses BloomFilter(ROW and ROWCOL) to filter unnecessary 
> files to improve read performance. But they only support Get and do not 
> support Scan.
> In our company(Tencent), many users need to scan all rows with the same 
> prefix, such as Tencent Game. Game user's some operational record will be 
> written into HBase, each game user will have a lot of records, the rowkey is 
> constructed as userid+'#'+timestamps. So we can scan all records for a given 
> user for a specified period.
> For this scenario, we designed the prefix Bloom filter. If the startRow and 
> stopRow of the Scan has a valid common prefix, the scan will be allowed to 
> use BloomFilter to filter files which will enhance the performance of the 
> scan.
> Now, this feature has been running on our cluster over a year, and scan 
> performance for this scenario has been improved by more than one times than 
> before.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20636) Introduce two bloom filter type : ROWPREFIX and ROWPREFIX_DELIMITED

2018-05-29 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16493714#comment-16493714
 ] 

Hadoop QA commented on HBASE-20636:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 9 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
40s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m 
33s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
26s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
54s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  2m  
3s{color} | {color:blue} hbase-server in master has 2 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
20s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
 5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  3m 
14s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m 
14s{color} | {color:red} hbase-server: The patch generated 1 new + 235 
unchanged - 9 fixed = 236 total (was 244) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
49s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
14m 26s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.5 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
52s{color} | {color:green} hbase-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}118m 58s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 12m 
49s{color} | {color:green} hbase-mapreduce in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
55s{color} | {color:green} hbase-it in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  1m 
27s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}194m 10s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hbase.master.TestMasterFailoverBalancerPersistence |
|   | 
hadoop.hbase.replication.regionserver.TestRegionReplicaReplicationEndpointNoMaster
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:d8b550f |
| JIRA Issue | HBASE-20636 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12

[jira] [Commented] (HBASE-20636) Introduce two bloom filter type : ROWPREFIX and ROWPREFIX_DELIMITED

2018-05-29 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16493835#comment-16493835
 ] 

Andrew Purtell commented on HBASE-20636:


bq.  I also think that It is necessary to keep compatibility. But, even if the 
HFile has a version increment, the old version of HBase still unable to read 
the HFile. 

Yes, this is why I said there needs to be a fallback so the read can proceed 
even when the BF type is not recognized.

At we need to at least increment the minor HFile version. I think that would be 
enough.

> Introduce two bloom filter type : ROWPREFIX and ROWPREFIX_DELIMITED
> ---
>
> Key: HBASE-20636
> URL: https://issues.apache.org/jira/browse/HBASE-20636
> Project: HBase
>  Issue Type: New Feature
>  Components: HFile, regionserver, scan
>Reporter: Guangxu Cheng
>Assignee: Guangxu Cheng
>Priority: Major
> Attachments: HBASE-20636.master.001.patch, 
> HBASE-20636.master.002.patch, HBASE-20636.master.003.patch
>
>
> As we all know, HBase uses BloomFilter(ROW and ROWCOL) to filter unnecessary 
> files to improve read performance. But they only support Get and do not 
> support Scan.
> In our company(Tencent), many users need to scan all rows with the same 
> prefix, such as Tencent Game. Game user's some operational record will be 
> written into HBase, each game user will have a lot of records, the rowkey is 
> constructed as userid+'#'+timestamps. So we can scan all records for a given 
> user for a specified period.
> For this scenario, we designed the prefix Bloom filter. If the startRow and 
> stopRow of the Scan has a valid common prefix, the scan will be allowed to 
> use BloomFilter to filter files which will enhance the performance of the 
> scan.
> Now, this feature has been running on our cluster over a year, and scan 
> performance for this scenario has been improved by more than one times than 
> before.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20636) Introduce two bloom filter type : ROWPREFIX and ROWPREFIX_DELIMITED

2018-09-20 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16622764#comment-16622764
 ] 

Andrew Purtell commented on HBASE-20636:


bq. At we need to at least increment the minor HFile version. I think that 
would be enough.

I had a look at HBASE-16213 and I didn't raise this there, so I rescind my 
comment. We already put in a row encoding change without a HFile version 
change. 

> Introduce two bloom filter type : ROWPREFIX and ROWPREFIX_DELIMITED
> ---
>
> Key: HBASE-20636
> URL: https://issues.apache.org/jira/browse/HBASE-20636
> Project: HBase
>  Issue Type: New Feature
>  Components: HFile, regionserver, scan
>Reporter: Guangxu Cheng
>Assignee: Guangxu Cheng
>Priority: Major
> Attachments: HBASE-20636.master.001.patch, 
> HBASE-20636.master.002.patch, HBASE-20636.master.003.patch
>
>
> As we all know, HBase uses BloomFilter(ROW and ROWCOL) to filter unnecessary 
> files to improve read performance. But they only support Get and do not 
> support Scan.
> In our company(Tencent), many users need to scan all rows with the same 
> prefix, such as Tencent Game. Game user's some operational record will be 
> written into HBase, each game user will have a lot of records, the rowkey is 
> constructed as userid+'#'+timestamps. So we can scan all records for a given 
> user for a specified period.
> For this scenario, we designed the prefix Bloom filter. If the startRow and 
> stopRow of the Scan has a valid common prefix, the scan will be allowed to 
> use BloomFilter to filter files which will enhance the performance of the 
> scan.
> Now, this feature has been running on our cluster over a year, and scan 
> performance for this scenario has been improved by more than one times than 
> before.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20636) Introduce two bloom filter type : ROWPREFIX and ROWPREFIX_DELIMITED

2018-09-20 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16622773#comment-16622773
 ] 

Hadoop QA commented on HBASE-20636:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  4s{color} 
| {color:red} HBASE-20636 does not apply to master. Rebase required? Wrong 
Branch? See https://yetus.apache.org/documentation/0.7.0/precommit-patchnames 
for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HBASE-20636 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12925536/HBASE-20636.master.003.patch
 |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14464/console |
| Powered by | Apache Yetus 0.7.0   http://yetus.apache.org |


This message was automatically generated.



> Introduce two bloom filter type : ROWPREFIX and ROWPREFIX_DELIMITED
> ---
>
> Key: HBASE-20636
> URL: https://issues.apache.org/jira/browse/HBASE-20636
> Project: HBase
>  Issue Type: New Feature
>  Components: HFile, regionserver, scan
>Reporter: Guangxu Cheng
>Assignee: Guangxu Cheng
>Priority: Major
> Attachments: HBASE-20636.master.001.patch, 
> HBASE-20636.master.002.patch, HBASE-20636.master.003.patch
>
>
> As we all know, HBase uses BloomFilter(ROW and ROWCOL) to filter unnecessary 
> files to improve read performance. But they only support Get and do not 
> support Scan.
> In our company(Tencent), many users need to scan all rows with the same 
> prefix, such as Tencent Game. Game user's some operational record will be 
> written into HBase, each game user will have a lot of records, the rowkey is 
> constructed as userid+'#'+timestamps. So we can scan all records for a given 
> user for a specified period.
> For this scenario, we designed the prefix Bloom filter. If the startRow and 
> stopRow of the Scan has a valid common prefix, the scan will be allowed to 
> use BloomFilter to filter files which will enhance the performance of the 
> scan.
> Now, this feature has been running on our cluster over a year, and scan 
> performance for this scenario has been improved by more than one times than 
> before.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20636) Introduce two bloom filter type : ROWPREFIX and ROWPREFIX_DELIMITED

2018-09-20 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16622901#comment-16622901
 ] 

Andrew Purtell commented on HBASE-20636:


Attached a rebased patch for master as 004. No changes needed except for one 
minor fix up for changed import ordering.

> Introduce two bloom filter type : ROWPREFIX and ROWPREFIX_DELIMITED
> ---
>
> Key: HBASE-20636
> URL: https://issues.apache.org/jira/browse/HBASE-20636
> Project: HBase
>  Issue Type: New Feature
>  Components: HFile, regionserver, scan
>Reporter: Guangxu Cheng
>Assignee: Guangxu Cheng
>Priority: Major
> Attachments: HBASE-20636.master.001.patch, 
> HBASE-20636.master.002.patch, HBASE-20636.master.003.patch, 
> HBASE-20636.master.004.patch
>
>
> As we all know, HBase uses BloomFilter(ROW and ROWCOL) to filter unnecessary 
> files to improve read performance. But they only support Get and do not 
> support Scan.
> In our company(Tencent), many users need to scan all rows with the same 
> prefix, such as Tencent Game. Game user's some operational record will be 
> written into HBase, each game user will have a lot of records, the rowkey is 
> constructed as userid+'#'+timestamps. So we can scan all records for a given 
> user for a specified period.
> For this scenario, we designed the prefix Bloom filter. If the startRow and 
> stopRow of the Scan has a valid common prefix, the scan will be allowed to 
> use BloomFilter to filter files which will enhance the performance of the 
> scan.
> Now, this feature has been running on our cluster over a year, and scan 
> performance for this scenario has been improved by more than one times than 
> before.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20636) Introduce two bloom filter type : ROWPREFIX and ROWPREFIX_DELIMITED

2018-09-20 Thread Guangxu Cheng (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16622943#comment-16622943
 ] 

Guangxu Cheng commented on HBASE-20636:
---

bq.Attached a rebased patch for master as 004. No changes needed except for one 
minor fix up for changed import ordering.
Thanks [~apurtell]. Pending QA

> Introduce two bloom filter type : ROWPREFIX and ROWPREFIX_DELIMITED
> ---
>
> Key: HBASE-20636
> URL: https://issues.apache.org/jira/browse/HBASE-20636
> Project: HBase
>  Issue Type: New Feature
>  Components: HFile, regionserver, scan
>Reporter: Guangxu Cheng
>Assignee: Guangxu Cheng
>Priority: Major
> Attachments: HBASE-20636.master.001.patch, 
> HBASE-20636.master.002.patch, HBASE-20636.master.003.patch, 
> HBASE-20636.master.004.patch
>
>
> As we all know, HBase uses BloomFilter(ROW and ROWCOL) to filter unnecessary 
> files to improve read performance. But they only support Get and do not 
> support Scan.
> In our company(Tencent), many users need to scan all rows with the same 
> prefix, such as Tencent Game. Game user's some operational record will be 
> written into HBase, each game user will have a lot of records, the rowkey is 
> constructed as userid+'#'+timestamps. So we can scan all records for a given 
> user for a specified period.
> For this scenario, we designed the prefix Bloom filter. If the startRow and 
> stopRow of the Scan has a valid common prefix, the scan will be allowed to 
> use BloomFilter to filter files which will enhance the performance of the 
> scan.
> Now, this feature has been running on our cluster over a year, and scan 
> performance for this scenario has been improved by more than one times than 
> before.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20636) Introduce two bloom filter type : ROWPREFIX and ROWPREFIX_DELIMITED

2018-09-20 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16623029#comment-16623029
 ] 

Hadoop QA commented on HBASE-20636:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
12s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 9 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
25s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
11s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m 
37s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
22s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
16s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m  
9s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
23s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
15s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
 7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  3m 
35s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m 
14s{color} | {color:red} hbase-server: The patch generated 1 new + 227 
unchanged - 8 fixed = 228 total (was 235) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
13s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
11m 10s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
22s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
10s{color} | {color:green} hbase-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}134m 
21s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 16m 
57s{color} | {color:green} hbase-mapreduce in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m  
4s{color} | {color:green} hbase-it in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  1m 
34s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}211m 51s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-20636 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12940699/HBASE-20636.master.004.patch
 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 3f98dac15d12 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 
1

[jira] [Commented] (HBASE-20636) Introduce two bloom filter type : ROWPREFIX and ROWPREFIX_DELIMITED

2018-09-21 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16624282#comment-16624282
 ] 

Andrew Purtell commented on HBASE-20636:


005 patch fixes checkstyle nit. Ready for commit as far as I'm concerned, going 
to proceed. 

> Introduce two bloom filter type : ROWPREFIX and ROWPREFIX_DELIMITED
> ---
>
> Key: HBASE-20636
> URL: https://issues.apache.org/jira/browse/HBASE-20636
> Project: HBase
>  Issue Type: New Feature
>  Components: HFile, regionserver, scan
>Reporter: Guangxu Cheng
>Assignee: Guangxu Cheng
>Priority: Major
> Attachments: HBASE-20636.master.001.patch, 
> HBASE-20636.master.002.patch, HBASE-20636.master.003.patch, 
> HBASE-20636.master.004.patch, HBASE-20636.master.005.patch
>
>
> As we all know, HBase uses BloomFilter(ROW and ROWCOL) to filter unnecessary 
> files to improve read performance. But they only support Get and do not 
> support Scan.
> In our company(Tencent), many users need to scan all rows with the same 
> prefix, such as Tencent Game. Game user's some operational record will be 
> written into HBase, each game user will have a lot of records, the rowkey is 
> constructed as userid+'#'+timestamps. So we can scan all records for a given 
> user for a specified period.
> For this scenario, we designed the prefix Bloom filter. If the startRow and 
> stopRow of the Scan has a valid common prefix, the scan will be allowed to 
> use BloomFilter to filter files which will enhance the performance of the 
> scan.
> Now, this feature has been running on our cluster over a year, and scan 
> performance for this scenario has been improved by more than one times than 
> before.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20636) Introduce two bloom filter type : ROWPREFIX and ROWPREFIX_DELIMITED

2018-09-21 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16624289#comment-16624289
 ] 

Hadoop QA commented on HBASE-20636:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
10s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  3s{color} 
| {color:red} HBASE-20636 does not apply to master. Rebase required? Wrong 
Branch? See https://yetus.apache.org/documentation/0.7.0/precommit-patchnames 
for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-20636 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12940861/HBASE-20636.master.005.patch
 |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/14475/console |
| Powered by | Apache Yetus 0.7.0   http://yetus.apache.org |


This message was automatically generated.



> Introduce two bloom filter type : ROWPREFIX and ROWPREFIX_DELIMITED
> ---
>
> Key: HBASE-20636
> URL: https://issues.apache.org/jira/browse/HBASE-20636
> Project: HBase
>  Issue Type: New Feature
>  Components: HFile, regionserver, scan
>Reporter: Guangxu Cheng
>Assignee: Guangxu Cheng
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-20636.master.001.patch, 
> HBASE-20636.master.002.patch, HBASE-20636.master.003.patch, 
> HBASE-20636.master.004.patch, HBASE-20636.master.005.patch
>
>
> As we all know, HBase uses BloomFilter(ROW and ROWCOL) to filter unnecessary 
> files to improve read performance. But they only support Get and do not 
> support Scan.
> In our company(Tencent), many users need to scan all rows with the same 
> prefix, such as Tencent Game. Game user's some operational record will be 
> written into HBase, each game user will have a lot of records, the rowkey is 
> constructed as userid+'#'+timestamps. So we can scan all records for a given 
> user for a specified period.
> For this scenario, we designed the prefix Bloom filter. If the startRow and 
> stopRow of the Scan has a valid common prefix, the scan will be allowed to 
> use BloomFilter to filter files which will enhance the performance of the 
> scan.
> Now, this feature has been running on our cluster over a year, and scan 
> performance for this scenario has been improved by more than one times than 
> before.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20636) Introduce two bloom filter type : ROWPREFIX and ROWPREFIX_DELIMITED

2018-09-21 Thread Anoop Sam John (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16624484#comment-16624484
 ] 

Anoop Sam John commented on HBASE-20636:


[~andrewcheng]  Can you add RN here pls?

> Introduce two bloom filter type : ROWPREFIX and ROWPREFIX_DELIMITED
> ---
>
> Key: HBASE-20636
> URL: https://issues.apache.org/jira/browse/HBASE-20636
> Project: HBase
>  Issue Type: New Feature
>  Components: HFile, regionserver, scan
>Reporter: Guangxu Cheng
>Assignee: Guangxu Cheng
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-20636.master.001.patch, 
> HBASE-20636.master.002.patch, HBASE-20636.master.003.patch, 
> HBASE-20636.master.004.patch, HBASE-20636.master.005.patch
>
>
> As we all know, HBase uses BloomFilter(ROW and ROWCOL) to filter unnecessary 
> files to improve read performance. But they only support Get and do not 
> support Scan.
> In our company(Tencent), many users need to scan all rows with the same 
> prefix, such as Tencent Game. Game user's some operational record will be 
> written into HBase, each game user will have a lot of records, the rowkey is 
> constructed as userid+'#'+timestamps. So we can scan all records for a given 
> user for a specified period.
> For this scenario, we designed the prefix Bloom filter. If the startRow and 
> stopRow of the Scan has a valid common prefix, the scan will be allowed to 
> use BloomFilter to filter files which will enhance the performance of the 
> scan.
> Now, this feature has been running on our cluster over a year, and scan 
> performance for this scenario has been improved by more than one times than 
> before.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20636) Introduce two bloom filter type : ROWPREFIX and ROWPREFIX_DELIMITED

2018-09-21 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16624509#comment-16624509
 ] 

Hudson commented on HBASE-20636:


Results for branch branch-2
[build #1284 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1284/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1284//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1284//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1284//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Introduce two bloom filter type : ROWPREFIX and ROWPREFIX_DELIMITED
> ---
>
> Key: HBASE-20636
> URL: https://issues.apache.org/jira/browse/HBASE-20636
> Project: HBase
>  Issue Type: New Feature
>  Components: HFile, regionserver, scan
>Reporter: Guangxu Cheng
>Assignee: Guangxu Cheng
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-20636.master.001.patch, 
> HBASE-20636.master.002.patch, HBASE-20636.master.003.patch, 
> HBASE-20636.master.004.patch, HBASE-20636.master.005.patch
>
>
> As we all know, HBase uses BloomFilter(ROW and ROWCOL) to filter unnecessary 
> files to improve read performance. But they only support Get and do not 
> support Scan.
> In our company(Tencent), many users need to scan all rows with the same 
> prefix, such as Tencent Game. Game user's some operational record will be 
> written into HBase, each game user will have a lot of records, the rowkey is 
> constructed as userid+'#'+timestamps. So we can scan all records for a given 
> user for a specified period.
> For this scenario, we designed the prefix Bloom filter. If the startRow and 
> stopRow of the Scan has a valid common prefix, the scan will be allowed to 
> use BloomFilter to filter files which will enhance the performance of the 
> scan.
> Now, this feature has been running on our cluster over a year, and scan 
> performance for this scenario has been improved by more than one times than 
> before.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20636) Introduce two bloom filter type : ROWPREFIX and ROWPREFIX_DELIMITED

2018-09-22 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16624654#comment-16624654
 ] 

Hudson commented on HBASE-20636:


Results for branch master
[build #505 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/505/]: (x) 
*{color:red}-1 overall{color}*

details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/505//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/505//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/505//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Introduce two bloom filter type : ROWPREFIX and ROWPREFIX_DELIMITED
> ---
>
> Key: HBASE-20636
> URL: https://issues.apache.org/jira/browse/HBASE-20636
> Project: HBase
>  Issue Type: New Feature
>  Components: HFile, regionserver, scan
>Reporter: Guangxu Cheng
>Assignee: Guangxu Cheng
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-20636.master.001.patch, 
> HBASE-20636.master.002.patch, HBASE-20636.master.003.patch, 
> HBASE-20636.master.004.patch, HBASE-20636.master.005.patch
>
>
> As we all know, HBase uses BloomFilter(ROW and ROWCOL) to filter unnecessary 
> files to improve read performance. But they only support Get and do not 
> support Scan.
> In our company(Tencent), many users need to scan all rows with the same 
> prefix, such as Tencent Game. Game user's some operational record will be 
> written into HBase, each game user will have a lot of records, the rowkey is 
> constructed as userid+'#'+timestamps. So we can scan all records for a given 
> user for a specified period.
> For this scenario, we designed the prefix Bloom filter. If the startRow and 
> stopRow of the Scan has a valid common prefix, the scan will be allowed to 
> use BloomFilter to filter files which will enhance the performance of the 
> scan.
> Now, this feature has been running on our cluster over a year, and scan 
> performance for this scenario has been improved by more than one times than 
> before.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)