[ https://issues.apache.org/jira/browse/PHOENIX-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14492821#comment-14492821 ]
Shuxiong Ye commented on PHOENIX-1287:
--------------------------------------

[~jamestaylor] - 

I added StringUtil.calculateUTF8Offset, which computes the offset in bytes that corresponds to a given offset in the string, for a UTF-8 encoded string[\[1\]|https://github.com/shuxiong/phoenix/commit/6685633d4fe77fbc2fb023e722c662b91205d997].

I also wrote another set of performance tests[\[2\]|https://github.com/shuxiong/phoenix/commit/0a14497ca7c487fe4e2efa3448fec33c5ef77289], which run the related functions of JavaPattern and JONIPattern about 10 million times.

The tests:
1) first construct the parameters, some ImmutableBytesWritable instances,
2) call the related functions in JavaPattern and JONIPattern with those ImmutableBytesWritable parameters about 10 million times,
3) print the time consumed.

The results are:
{code}
Java Like Time=5.115
JONI Like Time=3.567
Java replaceAll Time=8.843
JONI replaceAll Time=8.512
Java Substr Time=5.183
JONI Substr Time=3.399
GuavaSplit Time=22.737
JONI Split Time=9.675
{code}

In this case, the functions in JavaPattern have to convert the bytes in ImmutableBytesWritable to strings, compute, and convert the resulting strings back to bytes in ImmutableBytesWritable, while the functions in JONIPattern compute directly on the bytes. So JONIPattern is faster.

\[1\] https://github.com/shuxiong/phoenix/commit/6685633d4fe77fbc2fb023e722c662b91205d997
\[2\] https://github.com/shuxiong/phoenix/commit/0a14497ca7c487fe4e2efa3448fec33c5ef77289

> Use the joni byte[] regex engine in place of j.u.regex
> ------------------------------------------------------
>
>                 Key: PHOENIX-1287
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1287
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>            Assignee: Shuxiong Ye
>              Labels: gsoc2015
>         Attachments: add_varchar_to_performance_script.patch
>
>
> See HBASE-11907. We'd get a 2x perf benefit plus it's driven off of byte[]
> instead of strings. Thanks for the pointer, [~apurtell].

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
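
As an illustration of the kind of computation the comment attributes to StringUtil.calculateUTF8Offset, here is a minimal sketch of mapping a character offset to a byte offset in UTF-8 encoded bytes. It is not the code from the linked commit: the class name Utf8Offset, the method names, and the parameter list are hypothetical, and the sketch assumes well-formed UTF-8 and counts offsets in Unicode code points rather than Java chars.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not Phoenix's StringUtil.
final class Utf8Offset {

    /** Length in bytes of the UTF-8 sequence that starts with the given lead byte. */
    static int sequenceLength(byte lead) {
        int b = lead & 0xFF;
        if (b < 0x80) return 1;            // 0xxxxxxx: single-byte (ASCII)
        if ((b & 0xE0) == 0xC0) return 2;  // 110xxxxx
        if ((b & 0xF0) == 0xE0) return 3;  // 1110xxxx
        if ((b & 0xF8) == 0xF0) return 4;  // 11110xxx
        throw new IllegalArgumentException("Not a UTF-8 lead byte: " + b);
    }

    /**
     * Returns the byte offset, relative to start, of the code point at index
     * charOffset within bytes[start, start + length), or -1 if charOffset is
     * negative or lies beyond the end of the range. An offset equal to the
     * number of code points maps to the end of the range.
     */
    static int byteOffset(byte[] bytes, int start, int length, int charOffset) {
        if (charOffset < 0) return -1;
        int pos = start;
        int end = start + length;
        for (int i = 0; i < charOffset; i++) {
            if (pos >= end) return -1;             // ran out of bytes
            pos += sequenceLength(bytes[pos]);     // skip one whole code point
        }
        return pos <= end ? pos - start : -1;
    }

    public static void main(String[] args) {
        // 'a' (1 byte), U+00E9 (2 bytes), U+20AC (3 bytes), 'b' (1 byte)
        byte[] utf8 = "a\u00e9\u20acb".getBytes(StandardCharsets.UTF_8);
        System.out.println(byteOffset(utf8, 0, utf8.length, 3)); // prints 6
    }
}
```

Working on the byte array this way avoids decoding to a String just to locate a position, which is the same reason the comment gives for JONIPattern being faster than JavaPattern.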