[
https://issues.apache.org/jira/browse/PHOENIX-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14492821#comment-14492821
]
Shuxiong Ye commented on PHOENIX-1287:
--------------------------------------
[~jamestaylor] - I added StringUtil.calculateUTF8Offset, which converts a
character offset in a UTF-8 encoded string into the corresponding byte
offset[\[1\]|https://github.com/shuxiong/phoenix/commit/6685633d4fe77fbc2fb023e722c662b91205d997].
I also wrote another set of performance
tests[\[2\]|https://github.com/shuxiong/phoenix/commit/0a14497ca7c487fe4e2efa3448fec33c5ef77289],
which run the related functions of JavaPattern and JONIPattern about 10
million times.
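To make the offset calculation concrete, here is a minimal sketch of how a character offset can be mapped to a byte offset in UTF-8 data. It is not the code from commit \[1\]: the class name Utf8OffsetSketch, the method names, and the parameter order are hypothetical, and offsets are assumed to count Unicode code points.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; NOT the actual StringUtil.calculateUTF8Offset from the commit.
final class Utf8OffsetSketch {

    /** Length in bytes of the UTF-8 sequence that starts with lead byte b. */
    static int sequenceLength(byte b) {
        int v = b & 0xFF;
        if (v < 0x80) return 1;          // 0xxxxxxx: ASCII
        if ((v >> 5) == 0x6) return 2;   // 110xxxxx
        if ((v >> 4) == 0xE) return 3;   // 1110xxxx
        if ((v >> 3) == 0x1E) return 4;  // 11110xxx
        throw new IllegalArgumentException("not a UTF-8 lead byte: " + v);
    }

    /**
     * Returns the index into bytes of the character that lies charOffset
     * code points after position start, scanning at most length bytes.
     */
    static int byteOffset(byte[] bytes, int start, int length, int charOffset) {
        int pos = start;
        int end = start + length;
        for (int i = 0; i < charOffset; i++) {
            if (pos >= end) {
                throw new IndexOutOfBoundsException("charOffset " + charOffset);
            }
            pos += sequenceLength(bytes[pos]); // skip one whole character
        }
        return pos;
    }

    public static void main(String[] args) {
        // 'a' (1 byte), U+00E9 (2 bytes), U+20AC (3 bytes), 'z' (1 byte)
        byte[] utf8 = "a\u00e9\u20acz".getBytes(StandardCharsets.UTF_8);
        System.out.println(byteOffset(utf8, 0, utf8.length, 3)); // prints 6
    }
}
```

The scan never decodes the characters; it only reads each lead byte to learn how many bytes to skip, so a byte[]-based function can honor a character offset without building a String.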
The tests:
1) construct the parameters (some ImmutableBytesWritable objects),
2) call the related functions in JavaPattern and JONIPattern with those
ImmutableBytesWritable parameters about 10 million times,
3) print the elapsed time.
The results are:
{code}
Java Like Time=5.115
JONI Like Time=3.567
Java replaceAll Time=8.843
JONI replaceAll Time=8.512
Java Substr Time=5.183
JONI Substr Time=3.399
GuavaSplit Time=22.737
JONI Split Time=9.675
{code}
In this case, the functions in JavaPattern have to convert the bytes in an
ImmutableBytesWritable to a String, compute, and then convert the resulting
String back to bytes in an ImmutableBytesWritable, while the functions in
JONIPattern operate on the bytes directly. This is why JONIPattern is faster.
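As a rough illustration of the test procedure and of the conversion overhead described above, the following sketch times the String-based path only. It is not the benchmark from commit \[2\]: the class RegexTimingSketch and its methods are hypothetical, a plain byte[] stands in for ImmutableBytesWritable, and the joni side is omitted so that the example needs only the JDK.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical sketch of the timing loop; NOT the benchmark from the commit.
final class RegexTimingSketch {

    /** Runs task the given number of times and returns the elapsed seconds. */
    static double timeSeconds(int iterations, Runnable task) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / 1e9;
    }

    /** String-based replaceAll on a byte[] value: decode, match, re-encode. */
    static byte[] replaceAllViaString(Pattern pattern, byte[] value, String replacement) {
        String decoded = new String(value, StandardCharsets.UTF_8);      // bytes -> String
        String replaced = pattern.matcher(decoded).replaceAll(replacement);
        return replaced.getBytes(StandardCharsets.UTF_8);                // String -> bytes
    }

    public static void main(String[] args) {
        // 1) construct the parameters
        byte[] value = "ab12cd34".getBytes(StandardCharsets.UTF_8);
        Pattern digits = Pattern.compile("[0-9]+");
        int iterations = 1_000_000; // smaller than the ~10 million in the comment
        long[] sink = new long[1];  // keeps the results live so the work is not optimized away

        // 2) call the function repeatedly
        double seconds = timeSeconds(iterations,
                () -> sink[0] += replaceAllViaString(digits, value, "#").length);

        // 3) print the elapsed time
        System.out.println("Java replaceAll Time=" + seconds);
    }
}
```

Every call on this path allocates a String and a new byte[] around the actual match, which is the extra work a byte[]-driven engine avoids. A loop like this gives only a coarse comparison, since it does no JIT warm-up.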
\[1\]
https://github.com/shuxiong/phoenix/commit/6685633d4fe77fbc2fb023e722c662b91205d997
\[2\]
https://github.com/shuxiong/phoenix/commit/0a14497ca7c487fe4e2efa3448fec33c5ef77289
> Use the joni byte[] regex engine in place of j.u.regex
> ------------------------------------------------------
>
> Key: PHOENIX-1287
> URL: https://issues.apache.org/jira/browse/PHOENIX-1287
> Project: Phoenix
> Issue Type: Bug
> Reporter: James Taylor
> Assignee: Shuxiong Ye
> Labels: gsoc2015
> Attachments: add_varchar_to_performance_script.patch
>
>
> See HBASE-11907. We'd get a 2x perf benefit plus it's driven off of byte[]
> instead of strings. Thanks for the pointer, [~apurtell].