On 04/28/2011 05:44 AM, Ulf Zibis wrote:
In malformed(byte[] src, int sp, int nb) I think you could cache the
ByteBuffer bb instead of instantiating a new one every time. For this,
the method should not be static, to ensure thread-safety.
I was assuming that if you have malformed byte(s) in your input bytes
during String.toCharArray()/getBytes() coding, performance is probably
no longer your top priority. That said, you do have a point: we should do
better even in the malformed case, and wrapping the input bytes every time
a malformed byte is found is definitely not preferred. The webrev has been
updated to "cache" a ByteBuffer wrapper object for each round of
decode/encode() operation, when necessary (that is, if a malformed byte is
detected).
http://cr.openjdk.java.net/~sherman/7040220/webrev
(the previous one is at
http://cr.openjdk.java.net/~sherman/7040220/webrev.00)
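To illustrate the idea of wrapping at most once per decode round, here is a
minimal sketch. It is not the code in the webrev: the class LazyWrapDecoder,
the helper wrapOnce() and the wrapCount counter are hypothetical names, and
the "decoder" is deliberately simplified to treat every non-ASCII byte as
malformed. Keeping the ByteBuffer as a local of the decode call, rather than
a field, also avoids sharing state between threads.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch, not the webrev code: one decode round that wraps the
// source array in a ByteBuffer at most once, and only if a malformed byte
// is actually seen.
final class LazyWrapDecoder {
    private int wrapCount = 0;   // counts ByteBuffer.wrap calls, for illustration only

    // Returns the existing wrapper, or creates it on first use.
    private ByteBuffer wrapOnce(ByteBuffer bb, byte[] src) {
        if (bb == null) {
            bb = ByteBuffer.wrap(src);
            wrapCount++;
        }
        return bb;
    }

    // Simplified assumption: decodes ASCII only; every non-ASCII byte is
    // treated as malformed and replaced with U+FFFD.
    // Returns the number of chars written to dst.
    int decode(byte[] src, int off, int len, char[] dst) {
        ByteBuffer bb = null;    // local to this call, so nothing is shared across threads
        int dp = 0;
        for (int sp = off; sp < off + len; sp++) {
            byte b = src[sp];
            if (b >= 0) {
                dst[dp++] = (char) b;
            } else {
                bb = wrapOnce(bb, src);  // created on the first malformed byte, reused afterwards
                bb.position(sp);         // point the wrapper at the offending byte
                dst[dp++] = '\uFFFD';
            }
        }
        return dp;
    }

    int wrapCount() {
        return wrapCount;
    }
}
```

With two malformed bytes in the input, the wrapper is still created only
once for the whole call.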
Thanks,
-Sherman
On 28.04.2011 08:34, Xueming Shen wrote:
Hi
This is motivated by Neil's request to optimize the common-case UTF8 path
for native ZipFile.getEntry calls [1].
As I said in my reply [2], I believe a better approach might be to "patch"
the UTF8 charset directly to implement the sun.nio.cs.ArrayDecoder/Encoder
interfaces, to speed up coding for array-based encoding/decoding under
certain circumstances, as we did for all single-byte charsets in
#6636323 [3]. I have an old blog [4] that has some data for this
optimization.
The original plan was to do the same thing for our new UTF8 [5] as well in
JDK7, but then (excuses, excuses) I was just too busy to come back to this
topic until 2 days ago. After two days of small tweaks here and there and
testing the corner cases I can think of, I'm happy with the result and
think it might be worth sending it out for a code review for JDK7, knowing
we only have a couple of days left.
The webrev is at
http://cr.openjdk.java.net/~sherman/7040220/webrev
The tests are supposed to make sure that the coding results from the new
paths for String.getBytes()/toCharArray() match the results from the
existing implementation.
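The kind of check described above could look roughly like the following. This
is only a sketch under my own assumptions, not the test in the webrev: the
class StrCodingCheck and its methods are hypothetical names, and it compares
the String-level paths against the generic CharsetDecoder/CharsetEncoder
paths using only public java.nio.charset API.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch: compares String.getBytes()/new String(bytes, cs)
// with the results of the generic CharsetEncoder/CharsetDecoder paths.
final class StrCodingCheck {

    // Decodes through a CharsetDecoder, replacing malformed input.
    static String decodeViaDecoder(byte[] bytes) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPLACE)
                    .onUnmappableCharacter(CodingErrorAction.REPLACE)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Encodes through a CharsetEncoder, replacing unencodable input.
    static byte[] encodeViaEncoder(String s) {
        try {
            ByteBuffer bb = StandardCharsets.UTF_8.newEncoder()
                    .onMalformedInput(CodingErrorAction.REPLACE)
                    .onUnmappableCharacter(CodingErrorAction.REPLACE)
                    .encode(CharBuffer.wrap(s));
            byte[] out = new byte[bb.remaining()];
            bb.get(out);
            return out;
        } catch (CharacterCodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // True if both encode paths and both decode paths agree for s.
    static boolean consistent(String s) {
        byte[] viaString = s.getBytes(StandardCharsets.UTF_8);
        byte[] viaEncoder = encodeViaEncoder(s);
        String backViaString = new String(viaString, StandardCharsets.UTF_8);
        String backViaDecoder = decodeViaDecoder(viaString);
        return Arrays.equals(viaString, viaEncoder)
                && backViaString.equals(backViaDecoder);
    }
}
```

A caller would run consistent() over a range of inputs of different lengths
and byte widths and fail on the first mismatch.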
The performance results of running StrCodingBenchmarkUTF8 (included in the
webrev) on my Linux box in -client and -server mode, respectively, are at
http://cr.openjdk.java.net/~sherman/7040220/client
http://cr.openjdk.java.net/~sherman/7040220/server
The microbenchmark measures 1-byte, 2-byte, 3-byte and 4-byte UTF-8
sequences separately, with different lengths of data (from 12 bytes to
thousands).
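For readers unfamiliar with the four byte widths, here is a small sketch of
how input data of each width could be built. It is not taken from
StrCodingBenchmarkUTF8: the class Utf8Samples, its methods and the sample
code points (U+0041, U+00E9, U+20AC, U+1F600) are my own assumptions, picked
so that each one encodes to 1, 2, 3 and 4 bytes in UTF-8.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds strings whose code points all have the same
// UTF-8 width, so each width can be measured separately.
final class Utf8Samples {

    // Returns a string made of n copies of the given code point.
    static String repeat(int codePoint, int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.appendCodePoint(codePoint);
        }
        return sb.toString();
    }

    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }
}
```

For example, 12 bytes of input correspond to 12 one-byte, 6 two-byte,
4 three-byte or 3 four-byte code points.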
Thanks!
-Sherman
[1] http://mail.openjdk.java.net/pipermail/core-libs-dev/2011-April/006710.html
[2] http://mail.openjdk.java.net/pipermail/core-libs-dev/2011-April/006726.html
[3] http://cr.openjdk.java.net/~sherman/6636323_6636319/webrev
[4] http://blogs.sun.com/xuemingshen/entry/faster_new_string_bytes_cs
[5] http://blogs.sun.com/xuemingshen/entry/the_big_overhaul_of_java