[ 
https://issues.apache.org/jira/browse/PHOENIX-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14492821#comment-14492821
 ] 

Shuxiong Ye commented on PHOENIX-1287:
--------------------------------------

[~jamestaylor] - I added StringUtil.calculateUTF8Offset, which converts a 
character offset in a UTF-8 encoded string into the corresponding byte 
offset[\[1\]|https://github.com/shuxiong/phoenix/commit/6685633d4fe77fbc2fb023e722c662b91205d997].
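
To illustrate the idea, here is a minimal sketch of mapping a character offset to a byte offset by walking UTF-8 lead bytes. It is not the code from the commit: the class name, method names, and signature are hypothetical, and it assumes well-formed UTF-8 and counts characters as Unicode code points.

```java
import java.nio.charset.StandardCharsets;

public class Utf8OffsetSketch {
    // Hypothetical helper (not the Phoenix implementation): returns the byte
    // offset, relative to `offset`, of the code point at index `charIndex`.
    // If charIndex is past the end, the result is clamped to `length`.
    static int byteOffsetOfChar(byte[] bytes, int offset, int length, int charIndex) {
        int pos = offset;
        int end = offset + length;
        for (int i = 0; i < charIndex && pos < end; i++) {
            pos += utf8CharLength(bytes[pos]); // skip one whole encoded character
        }
        return Math.min(pos, end) - offset;
    }

    // Number of bytes in a UTF-8 sequence, derived from its lead byte.
    static int utf8CharLength(byte lead) {
        int b = lead & 0xFF;
        if (b < 0x80) return 1;          // 0xxxxxxx: ASCII
        if ((b >> 5) == 0x06) return 2;  // 110xxxxx
        if ((b >> 4) == 0x0E) return 3;  // 1110xxxx
        if ((b >> 3) == 0x1E) return 4;  // 11110xxx
        throw new IllegalArgumentException("not a UTF-8 lead byte: " + b);
    }

    public static void main(String[] args) {
        // 'a' is 1 byte, U+00E9 is 2 bytes, U+4E2D is 3 bytes, 'b' is 1 byte.
        byte[] bytes = "a\u00e9\u4e2db".getBytes(StandardCharsets.UTF_8);
        System.out.println(byteOffsetOfChar(bytes, 0, bytes.length, 3)); // prints 6
    }
}
```

With such a helper, a substring by character position can be taken directly on the byte array without decoding it into a String first.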

I also wrote performance 
tests[\[2\]|https://github.com/shuxiong/phoenix/commit/0a14497ca7c487fe4e2efa3448fec33c5ef77289],
 which run the related functions of JavaPattern and JONIPattern about 10 
million times.

The tests work as follows:
1) construct the parameters (some ImmutableBytesWritable instances),
2) call the related functions in JavaPattern and JONIPattern with those 
ImmutableBytesWritable parameters about 10 million times,
3) print the elapsed time.
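
The three steps above can be sketched roughly as follows. This is not the test from the commit: it uses only java.util.regex on a plain byte[] (no ImmutableBytesWritable, no JONI), all names are hypothetical, and a simple System.nanoTime loop like this gives only a coarse number compared with a proper benchmark harness.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

public class RegexTimingSketch {
    // Counter that keeps the JIT from discarding the work as dead code.
    static long matches = 0;

    // String-based matching: decode the bytes, then match the whole value.
    static boolean likeViaString(Pattern pattern, byte[] buf, int offset, int length) {
        String s = new String(buf, offset, length, StandardCharsets.UTF_8);
        return pattern.matcher(s).matches();
    }

    // Runs `body` the given number of times and returns elapsed seconds.
    static double timeSeconds(int iterations, Runnable body) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            body.run();
        }
        return (System.nanoTime() - start) / 1e9;
    }

    public static void main(String[] args) {
        // 1) construct the parameters once, outside the timed loop
        byte[] value = "phoenix-1287".getBytes(StandardCharsets.UTF_8);
        Pattern pattern = Pattern.compile("phoenix-\\d+");
        int iterations = 1_000_000; // illustrative; the tests above use about 10 million

        // 2) call the function under test repeatedly
        double seconds = timeSeconds(iterations, () -> {
            if (likeViaString(pattern, value, 0, value.length)) {
                matches++;
            }
        });

        // 3) print the elapsed time
        System.out.println("Java Like Time=" + seconds + " (matches=" + matches + ")");
    }
}
```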

The results are:
{code}
Java Like Time=5.115
JONI Like Time=3.567
Java replaceAll Time=8.843
JONI replaceAll Time=8.512
Java Substr Time=5.183
JONI Substr Time=3.399
GuavaSplit Time=22.737
JONI Split Time=9.675
{code}

In this case, the functions in JavaPattern have to convert the bytes in the 
ImmutableBytesWritable to strings, compute, and then convert the resulting 
strings back to bytes in an ImmutableBytesWritable, while the functions in 
JONIPattern operate on the bytes directly. So JONIPattern is faster.
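
As a rough illustration of that round trip, here is a sketch of a String-based replaceAll over a byte range. The class and method are hypothetical and are not taken from JavaPattern; the point is only the extra decode and encode steps that a byte[]-based engine would not need.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

public class RoundTripSketch {
    // Hypothetical helper showing the three steps a String-based engine needs.
    static byte[] replaceAllViaString(Pattern pattern, byte[] buf, int offset, int length,
                                      String replacement) {
        // 1) bytes -> String (allocates and decodes)
        String input = new String(buf, offset, length, StandardCharsets.UTF_8);
        // 2) compute on the String
        String output = pattern.matcher(input).replaceAll(replacement);
        // 3) String -> bytes (allocates and encodes again)
        return output.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] value = "a1b22c".getBytes(StandardCharsets.UTF_8);
        byte[] result = replaceAllViaString(Pattern.compile("\\d+"), value, 0, value.length, "#");
        System.out.println(new String(result, StandardCharsets.UTF_8)); // prints a#b#c
    }
}
```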

\[1\] 
https://github.com/shuxiong/phoenix/commit/6685633d4fe77fbc2fb023e722c662b91205d997
\[2\] 
https://github.com/shuxiong/phoenix/commit/0a14497ca7c487fe4e2efa3448fec33c5ef77289

> Use the joni byte[] regex engine in place of j.u.regex
> ------------------------------------------------------
>
>                 Key: PHOENIX-1287
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1287
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>            Assignee: Shuxiong Ye
>              Labels: gsoc2015
>         Attachments: add_varchar_to_performance_script.patch
>
>
> See HBASE-11907. We'd get a 2x perf benefit plus it's driven off of byte[] 
> instead of strings. Thanks for the pointer, [~apurtell].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
