[
https://issues.apache.org/jira/browse/PHOENIX-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385351#comment-14385351
]
Shuxiong Ye commented on PHOENIX-1287:
--------------------------------------
I set up a test environment on my laptop.
I used performance.py to generate 10M rows and ran the following queries five
times each, once with the ByteBased and once with the StringBased regex
implementation.
{code}
Query # 6 - Like + Count
SELECT COUNT(1) FROM PERFORMANCE_10000000 WHERE DOMAIN LIKE '%o%e%';

Query # 7 - Replace + Count
SELECT COUNT(1) FROM PERFORMANCE_10000000 WHERE REGEXP_REPLACE(DOMAIN, '[a-z]+')='G.';

Query # 8 - Substr + Count
SELECT COUNT(1) FROM PERFORMANCE_10000000 WHERE REGEXP_SUBSTR(DOMAIN, '[a-z]+')='oogle';
{code}
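As a rough illustration of what the three predicates test, here is a minimal Python sketch. It rests on assumptions: the sample value "Google.com" is hypothetical and not taken from the benchmark data, the two-argument REGEXP_REPLACE is assumed to replace each match with an empty string, REGEXP_SUBSTR is assumed to return the first match, and Python's re module stands in for both regex engines.

```python
import re

# Hypothetical sample value; the real DOMAIN values come from performance.py.
domain = "Google.com"


def sql_like(value, pattern):
    """Emulate SQL LIKE: '%' matches any run of characters, '_' exactly one."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


# Query 6: DOMAIN LIKE '%o%e%' (an 'o' followed somewhere later by an 'e')
like_hit = sql_like(domain, "%o%e%")

# Query 7: REGEXP_REPLACE(DOMAIN, '[a-z]+'), assumed to delete every match
replaced = re.sub(r"[a-z]+", "", domain)

# Query 8: REGEXP_SUBSTR(DOMAIN, '[a-z]+'), assumed to return the first match
substr = re.search(r"[a-z]+", domain).group(0)

print(like_hit, repr(replaced), repr(substr))
```

Under these assumptions a row with this value would be counted by all three queries, so each query exercises the regex code path on every row.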
|| Query || ByteBased || StringBased || SpeedUp ||
| Like | 8.644 / 7.995 / 7.868 / 7.865 / 7.763 | 9.803 / 9.497 / 8.706 / 8.796 / 8.805 | 1.136 |
| Replace | 11.725 / 11.071 / 11.199 / 10.988 / 10.970 | 10.576 / 10.495 / 10.271 / 10.354 / 10.178 | 0.927 |
| Substr | 8.380 / 8.107 / 8.248 / 8.319 / 8.302 | 9.478 / 9.227 / 9.294 / 9.024 / 9.158 | 1.116 |
LIKE and REGEXP_SUBSTR show a slight speedup with the byte-based
implementation, while for REGEXP_REPLACE the byte-based implementation is
slower than the string-based one.
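The SpeedUp figures appear consistent with dividing the mean StringBased time by the mean ByteBased time over the five runs. The comment does not state the formula, so the sketch below is only an assumed reconstruction, and the names timings and speedup are illustrative. The timings are copied from the table above.

```python
from statistics import mean

# (ByteBased runs, StringBased runs) per query, as reported in the table.
timings = {
    "Like": (
        [8.644, 7.995, 7.868, 7.865, 7.763],
        [9.803, 9.497, 8.706, 8.796, 8.805],
    ),
    "Replace": (
        [11.725, 11.071, 11.199, 10.988, 10.970],
        [10.576, 10.495, 10.271, 10.354, 10.178],
    ),
    "Substr": (
        [8.380, 8.107, 8.248, 8.319, 8.302],
        [9.478, 9.227, 9.294, 9.024, 9.158],
    ),
}


def speedup(byte_based, string_based):
    """Assumed definition: mean StringBased time / mean ByteBased time.

    A value above 1 means the ByteBased implementation was faster.
    """
    return mean(string_based) / mean(byte_based)


for name, (byte_runs, string_runs) in timings.items():
    print(f"{name:8s} {speedup(byte_runs, string_runs):.3f}")
```

The computed ratios match the reported column to within about 0.001. With only five runs per query on a single laptop, differences of this size may be within run-to-run noise.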
> Use the joni byte[] regex engine in place of j.u.regex
> ------------------------------------------------------
>
> Key: PHOENIX-1287
> URL: https://issues.apache.org/jira/browse/PHOENIX-1287
> Project: Phoenix
> Issue Type: Bug
> Reporter: James Taylor
> Assignee: Shuxiong Ye
> Labels: gsoc2015
>
> See HBASE-11907. We'd get a 2x perf benefit plus it's driven off of byte[]
> instead of strings. Thanks for the pointer, [~apurtell].