RFR: JDK-8184947: ZipCoder performance improvements

Xueming Shen Fri, 08 Dec 2017 15:14:54 -0800

Hi,

Please help review the changes for j.u.z.ZipCoder/JDK-8184947 (which also
includes cleanup/improvement work in java.lang.StringCoding.java to speed up
general String coding performance, especially for UTF8).


issue: https://bugs.openjdk.java.net/browse/JDK-8184947
webrev: http://cr.openjdk.java.net/~sherman/8184947/webrev

jmh benchmark:
http://cr.openjdk.java.net/~sherman/8184947/ZipCodingBM.java
http://cr.openjdk.java.net/~sherman/8184947/StringCodingBM.java

Notes:

(1) StringCoding.de/encode() for new String()/String.getBytes() with default charset.

For historical reasons, the existing SC.decode(byte[], off, len)/encode(coder, val)
implementation has code to handle any "possible" UnsupportedEncodingException
situation and fall back to the slow "charset name" version of de/encode() for the
real work. Given that Charset.defaultCharset() now returns UTF8 as the fallback
default charset if anything goes wrong while obtaining the default charset (we did
that in jdk7 or 8?), there is actually no need to handle the UEE. This also provides
the opportunity to use the fast path for stateless UTF8/8859-1/ASCII de/encode(). The
benchmark data for newString_xxx/getBytes_xxx (which uses the default encoding,
UTF8 in this case) suggests a big speedup for ascii-only Strings.

StringCodingBM    (size)  Mode  Cnt NEW Score     Error  OLD Score     Error  Units

getBytes_ASCII        16  avgt    5    21.155 ±   5.586     63.777 ±  54.262  ns/op
getBytes_ASCII        64  avgt    5    20.854 ±   6.237     98.988 ±  62.932  ns/op
getBytes_ASCII       256  avgt    5    38.291 ±   8.494    272.306 ±  77.951  ns/op
getBytes_Latin        16  avgt    5    80.968 ±  15.814     76.769 ±  38.512  ns/op
getBytes_Latin        64  avgt    5   163.078 ±  51.993    219.085 ±  42.665  ns/op
getBytes_Latin       256  avgt    5   759.548 ±  99.386    824.594 ± 763.735  ns/op
getBytes_Unicode      16  avgt    5    94.311 ±  22.189    124.185 ±  32.751  ns/op
getBytes_Unicode      64  avgt    5   289.603 ± 152.056    321.541 ± 103.703  ns/op
getBytes_Unicode     256  avgt    5  1253.098 ± 216.243   1201.667 ± 512.532  ns/op

newString_ASCII       16  avgt    5    33.273 ±  13.780     50.402 ±  17.574  ns/op
newString_ASCII       64  avgt    5    30.420 ±   6.207     84.989 ±  43.355  ns/op
newString_ASCII      256  avgt    5    54.391 ±  10.451    208.096 ± 102.716  ns/op
newString_Latin       16  avgt    5   115.606 ±   7.181    114.186 ±  36.310  ns/op
newString_Latin       64  avgt    5   393.710 ±  73.478    414.286 ± 176.837  ns/op
newString_Latin      256  avgt    5  1618.967 ± 289.044   1551.499 ± 487.904  ns/op
newString_Unicode     16  avgt    5   104.848 ±  32.694    127.558 ±  12.029  ns/op
newString_Unicode     64  avgt    5   377.894 ± 147.731    374.779 ±  53.028  ns/op
newString_Unicode    256  avgt    5  1557.977 ± 318.652   1457.236 ± 284.424  ns/op

(2) updated to "fast path" UTF8/8859-1/ASCII in all de/coding operations, which
are all implemented in static/stateless methods. (benchmark for MS932 [4] provided
to make sure there is no regression for "other" charsets)
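
A minimal sketch of the kind of dispatch described in (2), written against public
API only. The class and method names (FastPathEncode, encodeSingleByte) are
hypothetical and not taken from the webrev; the patch itself lives inside java.lang
and is not reproduced here. The idea shown is just: recognize the common stateless
charsets, encode them in a plain static loop, and leave everything else (MS932 and
friends) on the generic path.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration, not the StringCoding implementation.
final class FastPathEncode {

    // Dispatch on the charset: static loops for the stateless
    // single-byte charsets, generic encoder for everything else.
    static byte[] encode(Charset cs, String s) {
        if (cs.equals(StandardCharsets.ISO_8859_1)) {
            return encodeSingleByte(s, 0xFF);
        }
        if (cs.equals(StandardCharsets.US_ASCII)) {
            return encodeSingleByte(s, 0x7F);
        }
        return s.getBytes(cs); // generic path for "other" charsets
    }

    // Stateless loop: chars up to maxChar map to one byte, anything
    // else is replaced by '?' (one '?' per unmappable code point).
    static byte[] encodeSingleByte(String s, int maxChar) {
        byte[] out = new byte[s.length()];
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c <= maxChar) {
                out[n++] = (byte) c;
                continue;
            }
            // A valid surrogate pair is a single code point: skip its second half.
            if (Character.isHighSurrogate(c) && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                i++;
            }
            out[n++] = (byte) '?';
        }
        return n == out.length ? out : Arrays.copyOf(out, n);
    }
}
```

No encoder object is created and no state is carried between calls on the fast
path, which is what makes it suitable for static methods.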

(3) added "fast path" for "ascii-only" bytes for utf8 encoding/getBytes(). The
benchmark [1] suggests a big speedup for ascii-only getBytes() with limited cost
to non-ascii-only cases. (this helps a lot for (4), the ZipCoder situation, which
mainly uses ascii only).
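
To make the trade-off in (3) concrete, here is a small sketch using public API
only. The names AsciiOnlyUtf8, isAsciiOnly and encodeUtf8 are made up for this
example; the actual change works on String's internal storage and may look quite
different. The point is only that a cheap scan decides whether the bytes can be
copied one-to-one, and non-ascii input pays for that scan before taking the
regular UTF8 path.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of an "ascii-only" fast path for UTF8 encoding.
final class AsciiOnlyUtf8 {

    // True if every char is in the 7-bit ASCII range.
    static boolean isAsciiOnly(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) >= 0x80) {
                return false;
            }
        }
        return true;
    }

    static byte[] encodeUtf8(String s) {
        if (isAsciiOnly(s)) {
            // ASCII chars encode to exactly one identical byte in UTF8.
            byte[] out = new byte[s.length()];
            for (int i = 0; i < out.length; i++) {
                out[i] = (byte) s.charAt(i);
            }
            return out;
        }
        // Non-ascii input: the scan above was the only extra cost.
        return s.getBytes(StandardCharsets.UTF_8);
    }
}
```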


(4) java.util.zip.ZipCoder

This is where this patch actually started from. As the rfe suggested, we are now
using byte[] as the internal storage for the String class, so the optimization we
put in ZipCoder for UTF8 (which uses the byte[]/char[] interface of our UTF8
implementation to help avoid the relatively heavy ByteBuffer/CharBuffer coding
interface) now appears to be not that "optimized". The to/from char[] copy/paste
has become a waste.

ZipCoder implementation can't use new String/String.getBytes() directly because of
the different malformed/unmappable character handling requirement. The proposed
change here is to add a pair of special new String()/String.getBytes() in the
StringCoding class to throw IAE instead of silent replacement, via (yet another)
SharedSecrets interface. This brings us much faster de/encoding (30%-50% speedup)
and much less memory usage (no more unnecessary byte[]/char[] allocation and, in
default mode, there is only ONE utf8 ZipCoder), on all "Jar/ZipEntry" related
access operations.
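
The difference in error handling can be shown with public API. The sketch below is
NOT the proposed StringCoding/SharedSecrets code; it goes through the
ByteBuffer/CharBuffer coder interface that the patch is trying to avoid, and the
names StrictUtf8, decodeStrict and encodeStrict are hypothetical. It only
illustrates the required behavior: malformed or unmappable input raises
IllegalArgumentException, where new String(bytes, UTF_8) would silently substitute
a replacement character.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of "throw IAE instead of silent replacement".
final class StrictUtf8 {

    static String decodeStrict(byte[] bytes) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("malformed input", e);
        }
    }

    static byte[] encodeStrict(String s) {
        try {
            ByteBuffer bb = StandardCharsets.UTF_8.newEncoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .encode(CharBuffer.wrap(s));
            byte[] out = new byte[bb.remaining()];
            bb.get(out);
            return out;
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("unmappable input", e);
        }
    }
}
```

For example, the byte pair 0xC3 0x28 is not valid UTF8: decodeStrict() rejects it,
while new String(bytes, StandardCharsets.UTF_8) returns a String containing U+FFFD.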


ZipCodeBenchMark [latest]
    * "New Score" is with the patch
    * getEntry() is mainly String.getBytes(), entries()/stream() is mainly
      new String(bytes).


               Mode  Cnt New Score   Error  Old Score   Error  Units
jf_entries     avgt   20     0.582 ± 0.036      0.953 ± 0.108  ms/op
jf_getEntry    avgt   20     1.506 ± 0.158      2.052 ± 0.171  ms/op
jf_stream      avgt   20     0.698 ± 0.060      0.940 ± 0.067  ms/op
zf_entries     avgt   20     0.691 ± 0.057      0.917 ± 0.080  ms/op
zf_getEntry    avgt   20     1.459 ± 0.180      2.081 ± 0.161  ms/op
zf_stream      avgt   20     0.626 ± 0.074      0.909 ± 0.075  ms/op



Thanks,
Sherman

[1] http://cr.openjdk.java.net/~sherman/8184947/StringCoding.utf8
[2] http://cr.openjdk.java.net/~sherman/8184947/StringCoding.8859_1
[3] http://cr.openjdk.java.net/~sherman/8184947/StringCoding.ascii
[4] http://cr.openjdk.java.net/~sherman/8184947/StringCoding.ms932
[5] http://cr.openjdk.java.net/~sherman/8184947/ZipCoding.bm
