On 04/28/2011 05:44 AM, Ulf Zibis wrote:

In malformed(byte[] src, int sp, int nb) I think you could cache the ByteBuffer bb instead of instantiating a new one every time. For this, the method should not be static, to ensure thread safety.

I was assuming that if you have malformed bytes in your input during String.toCharArray()/getBytes() coding, performance is probably no longer your top priority. That said, you do have a point: we should do better even in the malformed case, and wrapping the input bytes every time a malformed byte is encountered is definitely not preferable. The webrev has been updated to "cache" a ByteBuffer wrapper object for each round of the decode()/encode() operation, when necessary (that is, only if a malformed byte is detected).
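
To make the idea concrete, here is a minimal sketch of lazily creating one ByteBuffer wrapper per decode round. It is not the code in the webrev: the class LazyWrapSketch, the wraps counter and the malformedLength() helper are hypothetical names, and the sketch treats every non-ASCII byte as malformed purely for illustration.

```java
import java.nio.ByteBuffer;

public class LazyWrapSketch {
    // Hypothetical per-instance cache. It is an instance field (not static),
    // so each decoder object owns its own wrapper.
    private ByteBuffer bb;

    // Counts how many wrappers were allocated (illustration only).
    public int wraps;

    // Returns the wrapper for this round, creating it on first use only.
    private ByteBuffer wrapper(byte[] src) {
        if (bb == null) {
            bb = ByteBuffer.wrap(src);
            wraps++;
        }
        return bb;
    }

    // Hypothetical stand-in for a ByteBuffer-based routine that reports
    // the length of a malformed sequence starting at sp.
    private static int malformedLength(ByteBuffer bb, int sp) {
        bb.position(sp);
        return 1;
    }

    // Decodes ASCII bytes; any other byte is treated as malformed (an
    // assumption of this sketch) and replaced with U+FFFD.
    public int decode(byte[] src, int sp, int len, char[] dst) {
        bb = null; // a new round starts without a wrapper
        int dp = 0;
        int sl = sp + len;
        while (sp < sl) {
            byte b = src[sp];
            if (b >= 0) {
                dst[dp++] = (char) b;
                sp++;
            } else {
                // The wrapper is allocated only once a malformed byte shows up.
                sp += malformedLength(wrapper(src), sp);
                dst[dp++] = '\uFFFD';
            }
        }
        return dp;
    }
}
```

With this shape, well-formed input never allocates a wrapper, and input with many malformed bytes allocates exactly one per round.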

http://cr.openjdk.java.net/~sherman/7040220/webrev

(the previous one is at http://cr.openjdk.java.net/~sherman/7040220/webrev.00)

Thanks,
-Sherman




On 28.04.2011 08:34, Xueming Shen wrote:
 Hi

This is motivated by Neil's request to optimize the common-case UTF8 path for native ZipFile.getEntry calls [1]. As I said in my reply [2], I believe a better approach might be to "patch" the UTF8 charset directly to implement the sun.nio.cs.ArrayDecoder/Encoder interfaces, to speed up array-based encoding/decoding under certain circumstances, as we did for all single-byte charsets in #6636323 [3]. I have an old blog entry [4] with some data for this optimization.
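
As a rough illustration of what an array-based decoding path looks like, the sketch below works directly on byte[] and char[] without wrapping them in buffers. It is not the webrev code: ArrayDecoderSketch is a hypothetical interface defined here only for the example (it does not use the internal sun.nio.cs classes), and anything past the leading ASCII run is simply handed to the standard String constructor.

```java
import java.nio.charset.StandardCharsets;

public class ArrayDecodeSketch {
    // Hypothetical array-based decoding interface, defined for this sketch.
    interface ArrayDecoderSketch {
        int decode(byte[] src, int off, int len, char[] dst);
    }

    static final ArrayDecoderSketch UTF8 = (src, off, len, dst) -> {
        int sp = off;
        int sl = off + len;
        int dp = 0;
        // Fast path: a tight loop over the leading ASCII bytes.
        while (sp < sl && src[sp] >= 0) {
            dst[dp++] = (char) src[sp++];
        }
        // Remainder: fall back to the standard decoder (sketch shortcut).
        if (sp < sl) {
            String rest = new String(src, sp, sl - sp, StandardCharsets.UTF_8);
            rest.getChars(0, rest.length(), dst, dp);
            dp += rest.length();
        }
        return dp;
    };

    static String decode(byte[] src) {
        // UTF-8 decoding never produces more chars than input bytes.
        char[] dst = new char[src.length];
        int n = UTF8.decode(src, 0, src.length, dst);
        return new String(dst, 0, n);
    }
}
```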

The original plan was to do the same thing for our new UTF8 [5] as well in JDK7, but then (excuses, excuses) I was just too busy to come back to this topic until two days ago. After two days of small tweaks here and there, and testing the corner cases I can think of, I'm happy with the result and think it is worth sending out for a code review for JDK7, knowing we only have a couple of days left.

The webrev is at

http://cr.openjdk.java.net/~sherman/7040220/webrev

Those tests are supposed to make sure that the coding results from the new paths for String.getBytes()/toCharArray() match the results from the existing implementation.
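
A consistency check of this kind can be sketched as follows. This is not the test in the webrev: the class RoundTripCheck and its mismatches() method are hypothetical, and the sketch only covers well-formed input, comparing the String-based path against the CharsetEncoder/CharsetDecoder path for every non-surrogate code point.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RoundTripCheck {
    // Returns the number of code points for which the two paths disagree.
    public static int mismatches() {
        CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder();
        CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder();
        int bad = 0;
        try {
            for (int cp = 0; cp <= Character.MAX_CODE_POINT; cp++) {
                // Lone surrogates cannot be encoded, so skip that range.
                if (cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE) {
                    continue;
                }
                String s = new String(Character.toChars(cp));

                // Encoding: String.getBytes() versus CharsetEncoder.
                byte[] viaString = s.getBytes(StandardCharsets.UTF_8);
                ByteBuffer out = enc.encode(CharBuffer.wrap(s));
                byte[] viaEncoder = new byte[out.remaining()];
                out.get(viaEncoder);

                // Decoding: new String(bytes, cs) versus CharsetDecoder.
                String viaCtor = new String(viaString, StandardCharsets.UTF_8);
                String viaDecoder = dec.decode(ByteBuffer.wrap(viaString)).toString();

                if (!Arrays.equals(viaString, viaEncoder) || !viaCtor.equals(viaDecoder)) {
                    bad++;
                }
            }
        } catch (CharacterCodingException e) {
            throw new IllegalStateException("unexpected coding failure", e);
        }
        return bad;
    }
}
```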

The performance results of running StrCodingBenchmarkUTF8 (included in the webrev) on my Linux
box in -client and -server mode, respectively, are at

http://cr.openjdk.java.net/~sherman/7040220/client
http://cr.openjdk.java.net/~sherman/7040220/server

The microbenchmark measures 1-byte, 2-byte, 3-byte and 4-byte UTF-8 sequences separately, with data of different lengths (from 12 bytes to thousands).
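
For readers who want to try something similar, here is a naive sketch of building inputs by encoded width and timing the decode. It is not StrCodingBenchmarkUTF8: the class Utf8WidthBench, the sample code points and the timing loop are all assumptions of this example, and plain System.nanoTime() timing without warmup control gives only a rough indication.

```java
import java.nio.charset.StandardCharsets;

public class Utf8WidthBench {
    // One sample code point per encoded width: 'a', U+00E9, U+20AC, U+1F600.
    private static final int[] SAMPLE = {0x61, 0xE9, 0x20AC, 0x1F600};

    // Builds UTF-8 data consisting of `count` code points of the given width (1..4).
    public static byte[] sample(int width, int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            sb.appendCodePoint(SAMPLE[width - 1]);
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Times repeated decoding of the data; returns elapsed nanoseconds.
    public static long timeDecode(byte[] data, int iterations) {
        long sink = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += new String(data, StandardCharsets.UTF_8).length();
        }
        long elapsed = System.nanoTime() - start;
        // Use the accumulated value so the loop is not trivially dead code.
        if (sink < 0) {
            throw new IllegalStateException("unexpected negative length sum");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        int[] counts = {12, 128, 1024};
        for (int width = 1; width <= 4; width++) {
            for (int count : counts) {
                byte[] data = sample(width, count);
                long ns = timeDecode(data, 100_000);
                System.out.println(width + "-byte, " + data.length + " bytes: " + ns + " ns");
            }
        }
    }
}
```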

Thanks!
-Sherman

[1] http://mail.openjdk.java.net/pipermail/core-libs-dev/2011-April/006710.html
[2] http://mail.openjdk.java.net/pipermail/core-libs-dev/2011-April/006726.html
[3] http://cr.openjdk.java.net/~sherman/6636323_6636319/webrev
[4] http://blogs.sun.com/xuemingshen/entry/faster_new_string_bytes_cs
[5] http://blogs.sun.com/xuemingshen/entry/the_big_overhaul_of_java

