[ https://issues.apache.org/jira/browse/HBASE-9428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758181#comment-13758181 ]
Hudson commented on HBASE-9428:
-------------------------------

FAILURE: Integrated in hbase-0.96 #8 (See [https://builds.apache.org/job/hbase-0.96/8/])
HBASE-9428 Regex filters are at least an order of magnitude slower since 0.94.3 (larsh: rev 1520100)
* /hbase/branches/0.96/hbase-client/src/main/java/org/apache/hadoop/hbase/filter/RegexStringComparator.java

> Regex filters are at least an order of magnitude slower since 0.94.3
> --------------------------------------------------------------------
>
>                 Key: HBASE-9428
>                 URL: https://issues.apache.org/jira/browse/HBASE-9428
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Jean-Daniel Cryans
>            Assignee: Lars Hofhansl
>             Fix For: 0.98.0, 0.94.12, 0.96.1
>
>         Attachments: 9428-0.94.txt, 9428-trunk.txt
>
>
> I found this issue while debugging a performance problem on an OpenTSDB
> cluster; it was basically unusable after an upgrade from 0.94.2 to 0.94.6. It
> was caused by HBASE-7279 (ping [~lhofhansl]).
> The easiest way to see it is to run a simple 1-client PE:
> {noformat}
> $ ./bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 1
> {noformat}
> Then in the shell do a filter scan (flush the table first and make sure it
> fits in your blockcache if you want stable numbers).
> Pre HBASE-7279:
> {noformat}
> hbase(main):028:0> scan 'TestTable', {FILTER => "(RowFilter (=, 'regexstring:0000055872') )"}
> ROW                          COLUMN+CELL
>  0000055872                  column=info:data, timestamp=1378248850191, value=(blanked)
> 1 row(s) in 1.2780 seconds
> {noformat}
> Post HBASE-7279:
> {noformat}
> hbase(main):037:0* scan 'TestTable', {FILTER => "(RowFilter (=, 'regexstring:0000055872') )"}
> ROW                          COLUMN+CELL
>  0000055872                  column=info:data, timestamp=1378248850191, value=(blanked)
> 1 row(s) in 24.2940 seconds
> {noformat}
> I tried a bunch of 0.94 releases, up to 0.94.11, and the tip of 0.96. They are
> all slow like this.
> It seems that since that jira went in we do a lot more row matching, and
> running the regex gets super expensive.
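
The per-row cost described above can be approximated outside HBase with a small standalone sketch. It is not the HBase code path and not the fix that was committed. It only shows that a regex row filter has to decode and match every candidate row key, so the total cost grows with the number of rows checked. The class name, the helper methods, the zero-padded 10-digit key format, the ISO-8859-1 decoding, and the use of find() semantics are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical standalone sketch; none of these names come from HBase.
public class RegexRowScanSketch {

    // Counts the row keys in which the pattern is found anywhere in the key.
    public static int countMatches(byte[][] rowKeys, Pattern pattern) {
        int hits = 0;
        for (byte[] key : rowKeys) {
            // One byte-to-String decode and one matcher run per row key.
            String decoded = new String(key, StandardCharsets.ISO_8859_1);
            if (pattern.matcher(decoded).find()) {
                hits++;
            }
        }
        return hits;
    }

    // Builds zero-padded 10-digit keys (assumed format), e.g. "0000055872".
    public static byte[][] makeKeys(int n) {
        byte[][] keys = new byte[n][];
        for (int i = 0; i < n; i++) {
            keys[i] = String.format("%010d", i).getBytes(StandardCharsets.ISO_8859_1);
        }
        return keys;
    }

    public static void main(String[] args) {
        byte[][] keys = makeKeys(100_000);
        // Compile once and reuse; only the per-row matching is timed.
        Pattern pattern = Pattern.compile("0000055872");
        long start = System.nanoTime();
        int hits = countMatches(keys, pattern);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(hits + " row(s) matched in " + elapsedMs + " ms");
    }
}
```

Timings from this sketch say nothing about the 1.2780 s versus 24.2940 s figures in the report; it is only a way to see how the match count and the per-row work relate.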