
Xueming Shen Thu, 28 Apr 2011 12:48:56 -0700

On 04/28/2011 05:44 AM, Ulf Zibis wrote:

In malformed(byte[] src, int sp, int nb) I think you could cache the ByteBuffer bb instead of instantiating a new one every time. For this, the method should not be static, to ensure thread safety.

I was assuming that, in a scenario where you have malformed byte(s) in your input bytes during String.toCharArray()/getBytes() coding, performance is probably no longer your top priority. That said, you do have a point: we should do better even in the malformed case, and wrapping the input bytes every time there is a malformed byte is definitely not preferred. The webrev has been updated to "cache" a ByteBuffer wrapper object for each round of decode/encode() operation, when necessary (meaning only if a malformed byte is detected).
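The idea discussed above (wrap lazily, reuse within one call, avoid static shared state) can be illustrated with a small sketch. Everything here is hypothetical: the class LazyWrapDecoder, its malformed(int) helper, and the simplification that every byte >= 0x80 counts as one malformed byte are assumptions for illustration, not the code in the webrev. What it shows is that the ByteBuffer wrapper is created at most once per decode call, and only if a malformed byte actually appears; keeping it in a per-call instance rather than a static field means it is never shared between threads.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch, not the JDK's UTF_8 code: one instance per decode
// call, so the cached wrapper is never shared between threads.
final class LazyWrapDecoder {
    private final byte[] src;
    private ByteBuffer bb;   // null until the first malformed byte is seen
    int wrapCount;           // number of ByteBuffer.wrap calls, for illustration

    LazyWrapDecoder(byte[] src) {
        this.src = src;
    }

    // Slow path: wrap the input once, then only reposition the same wrapper.
    private int malformed(int sp) {
        if (bb == null) {
            bb = ByteBuffer.wrap(src);
            wrapCount++;
        }
        bb.position(sp);
        // Simplified: a real decoder would inspect the following bytes
        // through bb to compute the malformed length; here it is always 1.
        return 1;
    }

    // Decodes ASCII; every byte >= 0x80 is treated as malformed and
    // replaced with U+FFFD (a deliberate simplification of UTF-8).
    String decode() {
        StringBuilder sb = new StringBuilder(src.length);
        int sp = 0;
        while (sp < src.length) {
            byte b = src[sp];
            if (b >= 0) {
                sb.append((char) b);   // fast path: no wrapper needed
                sp++;
            } else {
                sb.append('\uFFFD');
                sp += malformed(sp);
            }
        }
        return sb.toString();
    }
}
```

With input bytes `a b 0xFF c 0xFE`, decode() returns "ab\uFFFDc\uFFFD" while calling ByteBuffer.wrap only once; pure ASCII input never wraps at all.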

http://cr.openjdk.java.net/~sherman/7040220/webrev

(the previous one is at http://cr.openjdk.java.net/~sherman/7040220/webrev.00)


Thanks,
-Sherman

On 28.04.2011 08:34, Xueming Shen wrote:
 Hi
This is motivated by Neil's request to optimize the common-case UTF8 path for native ZipFile.getEntry calls [1]. As I said in my reply [2], I believe a better approach might be to "patch" the UTF8 charset directly to implement the sun.nio.cs.ArrayDecoder/Encoder interfaces, to speed up array-based encoding/decoding under certain circumstances, as we did for all single-byte charsets in #6636323 [3]. I have an old blog entry [4] that has some data for this optimization.
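A minimal sketch of the array-based idea, assuming a hypothetical interface ArrayDecoderSketch whose decode(byte[], int, int, char[]) shape is modeled loosely on sun.nio.cs.ArrayDecoder (it is a stand-in, not the internal JDK type). It decodes a leading run of ASCII bytes straight from the byte[] into the char[], and only falls back to a regular CharsetDecoder, with its ByteBuffer/CharBuffer wrappers, for the remainder. A fuller implementation would presumably also handle multi-byte sequences inside the array loop; this sketch does not.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for an array-based decoding interface.
interface ArrayDecoderSketch {
    // Decodes src[off, off + len) into dst; returns the number of chars written.
    int decode(byte[] src, int off, int len, char[] dst);
}

final class Utf8ArrayDecoderSketch implements ArrayDecoderSketch {
    // dst must hold at least len chars (UTF-8 never yields more chars than bytes).
    @Override
    public int decode(byte[] src, int off, int len, char[] dst) {
        int sp = off;
        int sl = off + len;
        int dp = 0;
        // Fast path: ASCII bytes map 1:1 to chars, no buffer objects involved.
        while (sp < sl && src[sp] >= 0) {
            dst[dp++] = (char) src[sp++];
        }
        if (sp == sl) {
            return dp;
        }
        // Slow path: hand the remainder to the regular CharsetDecoder.
        CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        CharBuffer out = CharBuffer.wrap(dst, dp, dst.length - dp);
        dec.decode(ByteBuffer.wrap(src, sp, sl - sp), out, true);
        dec.flush(out);
        return out.position();   // dp chars from the fast path plus those just decoded
    }
}
```

The design choice being illustrated is that the common all-ASCII case never allocates a decoder or any NIO buffers.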
The original plan was to do the same thing for our new UTF8 [5] as well in JDK7, but then (excuse, excuse) I was just too busy to come back to this topic until 2 days ago. After two days of small tweaking here and there, and testing the corner cases I can think of, I'm happy with the result and think it might be worth sending it out for a code review for JDK7, knowing we only have a couple of days left.
The webrev is at

http://cr.openjdk.java.net/~sherman/7040220/webrev
Those tests are supposed to make sure the coding result from the new paths for String.getBytes()/toCharArray() matches the result from the existing implementation.
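One way to write such a consistency check, sketched under assumptions: the hand-written encode method below stands in for a hypothetical "new path" (it is not the webrev's code, and it assumes well-formed input with no unpaired surrogates), and String.getBytes(StandardCharsets.UTF_8) serves as the existing implementation to compare against. Random strings are drawn so that 1-, 2-, 3- and 4-byte sequences all occur.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Random;

final class Utf8ConsistencyCheck {
    // Hypothetical "new path": a direct array-based UTF-8 encoder.
    // Assumes well-formed input (no unpaired surrogates).
    static byte[] encode(String s) {
        byte[] out = new byte[s.length() * 3];   // worst case: 3 bytes per char
        int dp = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            if (cp < 0x80) {
                out[dp++] = (byte) cp;
            } else if (cp < 0x800) {
                out[dp++] = (byte) (0xC0 | (cp >> 6));
                out[dp++] = (byte) (0x80 | (cp & 0x3F));
            } else if (cp < 0x10000) {
                out[dp++] = (byte) (0xE0 | (cp >> 12));
                out[dp++] = (byte) (0x80 | ((cp >> 6) & 0x3F));
                out[dp++] = (byte) (0x80 | (cp & 0x3F));
            } else {
                out[dp++] = (byte) (0xF0 | (cp >> 18));
                out[dp++] = (byte) (0x80 | ((cp >> 12) & 0x3F));
                out[dp++] = (byte) (0x80 | ((cp >> 6) & 0x3F));
                out[dp++] = (byte) (0x80 | (cp & 0x3F));
            }
        }
        return Arrays.copyOf(out, dp);
    }

    // Counts random inputs where the new path disagrees with String.getBytes.
    static int countMismatches(long seed, int rounds) {
        Random rnd = new Random(seed);
        // Picking a limit first biases the draw toward each sequence width.
        int[] limits = {0x80, 0x800, 0x10000, Character.MAX_CODE_POINT + 1};
        int bad = 0;
        for (int r = 0; r < rounds; r++) {
            StringBuilder sb = new StringBuilder();
            int len = rnd.nextInt(64);
            while (sb.length() < len) {
                int cp = rnd.nextInt(limits[rnd.nextInt(limits.length)]);
                if (cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE) {
                    continue;   // skip lone surrogates, which this encoder does not handle
                }
                sb.appendCodePoint(cp);
            }
            String s = sb.toString();
            if (!Arrays.equals(encode(s), s.getBytes(StandardCharsets.UTF_8))) {
                bad++;
            }
        }
        return bad;
    }
}
```

A fixed seed keeps any failure reproducible; malformed and unpaired-surrogate input would need a separate set of cases.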
The performance results of running StrCodingBenchmarkUTF8 (included in the webrev) on my Linux box in -client and -server mode, respectively, are included at

http://cr.openjdk.java.net/~sherman/7040220/client
http://cr.openjdk.java.net/~sherman/7040220/server
The microbenchmark measures 1-byte, 2-byte, 3-byte and 4-byte UTF-8 sequences separately, with different data lengths (from 12 bytes to thousands).
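A sketch of how inputs for each sequence width might be built, assuming one representative code point per width. The WidthInputs class and the choices of U+0041, U+00E9, U+20AC and U+1F600 are illustrative, not taken from StrCodingBenchmarkUTF8. Because 12 is a multiple of 1, 2, 3 and 4, each width can produce an input of exactly 12 encoded bytes.

```java
// Hypothetical helper for building width-specific benchmark inputs.
final class WidthInputs {
    // A representative code point for each UTF-8 sequence width.
    static int sampleCodePoint(int width) {
        switch (width) {
            case 1: return 'A';       // U+0041, 1 byte
            case 2: return 0x00E9;    // U+00E9, 2 bytes
            case 3: return 0x20AC;    // U+20AC, 3 bytes
            case 4: return 0x1F600;   // U+1F600, 4 bytes (a surrogate pair in a String)
            default: throw new IllegalArgumentException("width must be 1..4");
        }
    }

    // Builds a string whose UTF-8 encoding is exactly `bytes` long,
    // using only code points of the given width.
    static String build(int width, int bytes) {
        if (bytes % width != 0) {
            throw new IllegalArgumentException("bytes must be a multiple of width");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes / width; i++) {
            sb.appendCodePoint(sampleCodePoint(width));
        }
        return sb.toString();
    }
}
```

Note that the char count differs from the byte count: build(4, 12) holds 3 code points but 6 chars, which matters when comparing per-char throughput across widths.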

Thanks!
-Sherman
[1] http://mail.openjdk.java.net/pipermail/core-libs-dev/2011-April/006710.html
[2] http://mail.openjdk.java.net/pipermail/core-libs-dev/2011-April/006726.html
[3] http://cr.openjdk.java.net/~sherman/6636323_6636319/webrev
[4] http://blogs.sun.com/xuemingshen/entry/faster_new_string_bytes_cs
[5] http://blogs.sun.com/xuemingshen/entry/the_big_overhaul_of_java

Re: Codereview request: CR 7040220 java/char_encodin Optimize UTF-8 charset for String.getBytes()/toCharArray()
