[ 
https://issues.apache.org/jira/browse/PHOENIX-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385351#comment-14385351
 ] 

Shuxiong Ye commented on PHOENIX-1287:
--------------------------------------

I set up a test environment on my laptop.

I used performance.py to generate 10M rows and ran the following queries 5 
times each, once with the ByteBased regex implementation and once with the 
StringBased one.

{code}
Query # 6 - Like + Count - SELECT COUNT(1) FROM PERFORMANCE_10000000 WHERE DOMAIN LIKE '%o%e%';
Query # 7 - Replace + Count - SELECT COUNT(1) FROM PERFORMANCE_10000000 WHERE REGEXP_REPLACE(DOMAIN, '[a-z]+')='G.';
Query # 8 - Substr + Count - SELECT COUNT(1) FROM PERFORMANCE_10000000 WHERE REGEXP_SUBSTR(DOMAIN, '[a-z]+')='oogle';
{code}
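
To make clear what each predicate selects, here is a minimal Python sketch of the three filters. It is only an illustration, not the Phoenix implementation: the sample DOMAIN values are hypothetical stand-ins for what performance.py generates, and the helper names (like_o_e, regexp_replace, regexp_substr) are made up here. It assumes the two-argument REGEXP_REPLACE replaces every match with the empty string and that REGEXP_SUBSTR returns the first match.

```python
import re

# Hypothetical sample values; the real DOMAIN contents come from performance.py.
domains = ["Google.com", "Apple.com", "Salesforce.com"]


def like_o_e(value):
    # LIKE '%o%e%': an 'o' followed, anywhere later in the value, by an 'e'.
    return re.fullmatch(r".*o.*e.*", value, flags=re.DOTALL) is not None


def regexp_replace(value, pattern, replacement=""):
    # Assumption: with no replacement argument, every match is replaced by ''.
    return re.sub(pattern, replacement, value)


def regexp_substr(value, pattern):
    # Assumption: returns the first match, or None when nothing matches.
    match = re.search(pattern, value)
    return match.group(0) if match else None


# Rows that each query's WHERE clause would count, on the sample values.
like_hits = [d for d in domains if like_o_e(d)]
replace_hits = [d for d in domains if regexp_replace(d, "[a-z]+") == "G."]
substr_hits = [d for d in domains if regexp_substr(d, "[a-z]+") == "oogle"]

print(like_hits, replace_hits, substr_hits)
```

On these sample values, the Replace and Substr predicates both single out a value shaped like "Google.com", while the Like predicate also accepts any other value with an 'o' somewhere before an 'e'.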

|| || ByteBased || StringBased || SpeedUp ||
| Like | 8.644 / 7.995 / 7.868 / 7.865 / 7.763 | 9.803 / 9.497 / 8.706 / 8.796 / 8.805 | 1.136 |
| Replace | 11.725 / 11.071 / 11.199 / 10.988 / 10.970 | 10.576 / 10.495 / 10.271 / 10.354 / 10.178 | 0.927 |
| Substr | 8.380 / 8.107 / 8.248 / 8.319 / 8.302 | 9.478 / 9.227 / 9.294 / 9.024 / 9.158 | 1.116 |

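SpeedUp is the mean StringBased time divided by the mean ByteBased time, so 
values above 1 mean the ByteBased implementation is faster. Like and Substr 
show a slight speedup (1.136 and 1.116), while for Replace the ByteBased 
implementation is slower than the StringBased one (0.927).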
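
For anyone who wants to re-check the SpeedUp column, the following Python sketch recomputes it from the five timings per cell, taking the ratio of the mean StringBased time to the mean ByteBased time. The dictionary layout and the speedup helper are just illustrative names; the numbers are copied from the table above. The results agree with the table to within rounding of the last digit.

```python
from statistics import mean

# Timings copied from the table: (ByteBased runs, StringBased runs).
timings = {
    "Like": (
        [8.644, 7.995, 7.868, 7.865, 7.763],
        [9.803, 9.497, 8.706, 8.796, 8.805],
    ),
    "Replace": (
        [11.725, 11.071, 11.199, 10.988, 10.970],
        [10.576, 10.495, 10.271, 10.354, 10.178],
    ),
    "Substr": (
        [8.380, 8.107, 8.248, 8.319, 8.302],
        [9.478, 9.227, 9.294, 9.024, 9.158],
    ),
}


def speedup(byte_based, string_based):
    # Ratio of mean times; above 1 means the ByteBased runs were faster.
    return mean(string_based) / mean(byte_based)


speedups = {name: speedup(b, s) for name, (b, s) in timings.items()}

for name, value in speedups.items():
    print(f"{name}: {value:.3f}")
```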

> Use the joni byte[] regex engine in place of j.u.regex
> ------------------------------------------------------
>
>                 Key: PHOENIX-1287
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1287
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>            Assignee: Shuxiong Ye
>              Labels: gsoc2015
>
> See HBASE-11907. We'd get a 2x perf benefit plus it's driven off of byte[] 
> instead of strings. Thanks for the pointer, [~apurtell].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
