Re: Reducing Garbage Generated by URLClassLoader

Xueming Shen Sun, 04 Dec 2016 21:59:18 -0800

On 12/4/16, 1:21 PM, Scott Palmer wrote:

Excuse me if this is the wrong list for this discussion.  Please direct me to 
the right place if this isn’t it.


When doing an analysis of garbage generation in our application we discovered a 
significant number of redundant strings generated by the class loader.  In my 
case there are hundreds of jars on the classpath - everything in the 
application is a plugin.  I figured on average 10kB of useless garbage char[]s 
were generated per findResource call for plugin resources.

This is caused mostly by the ZipFile implementation.  What is the purpose of 
java.util.zip.ZipCoder’s byte[] getBytes(String s) method?  It seems to simply 
be a custom implementation of String.getBytes(Charset cs) and as such needs to 
first make a copy of the char[] to work on.

The "entry name" stored in the zip/jar file is not encoded as a UTF-16 char
sequence but as bytes in some "native" encoding. UTF-8 is one of the encodings
ZipFile supports, and it is the default for a jar file. So when you want to
look up a resource from the jar file with a name given as a String object, we
have to convert/encode this "name" from String into the corresponding byte[] in
UTF-8 and do a hash table lookup to find the resource. Here are some
implementation details:

(1) Why do we need a "custom" version in ZipFile? Because String.getBytes(cs)
silently replaces unmappable/malformed chars with "?", whereas the ZipFile API
needs to throw a corresponding exception in this scenario. So we have to have a
"custom" version to do it.

(2) For performance reasons we don't want to convert all jar entry names in all
open jar files into either String or char[] in advance. They are kept as byte[]
in their original form, and we don't even have a separate byte[] copy for each
entry name: all names are kept in their original "cen" table form in byte[],
and we only keep an offset to each entry. We are talking about hundreds of
jars, and each jar has hundreds if not thousands of entries. Arguably we could
do it the other way around: always convert the entry names in each open jar
file to String, and then we wouldn't have to do the String->byte[] conversion
during lookup. It's a design decision. If there is enough evidence suggesting
otherwise, it can be changed, given that we now have all of the implementation
at the Java level in jdk9.
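
To make the layout in point (2) concrete, here is a minimal sketch of a name table that keeps every encoded name inside one shared byte[] and finds entries by offset, so no per-entry String or byte[] exists. The class name PackedNameTable, its fields, and the hash function are assumptions made for illustration; this is not the JDK's actual "cen" format or ZipFile code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative only: a tiny stand-in for the idea of keeping every entry name
// inside one shared byte[] and indexing it by offset. Not the JDK's layout.
class PackedNameTable {
    private final byte[] table;   // all encoded names, stored back to back
    private final int[] offsets;  // offsets[i] = start of name i in table
    private final int[] lengths;  // lengths[i] = byte length of name i
    private final int[] buckets;  // first entry index per hash bucket, or -1
    private final int[] next;     // next entry in the same bucket, or -1

    PackedNameTable(String... names) {
        int n = names.length;
        offsets = new int[n];
        lengths = new int[n];
        next = new int[n];
        buckets = new int[Math.max(1, 2 * n)];
        Arrays.fill(buckets, -1);

        // Encode once at build time, then pack everything into one array.
        byte[][] encoded = new byte[n][];
        int total = 0;
        for (int i = 0; i < n; i++) {
            encoded[i] = names[i].getBytes(StandardCharsets.UTF_8);
            total += encoded[i].length;
        }
        table = new byte[total];
        int pos = 0;
        for (int i = 0; i < n; i++) {
            System.arraycopy(encoded[i], 0, table, pos, encoded[i].length);
            offsets[i] = pos;
            lengths[i] = encoded[i].length;
            pos += lengths[i];
            int b = bucketOf(hash(table, offsets[i], lengths[i]));
            next[i] = buckets[b];   // chain onto the bucket's current head
            buckets[b] = i;
        }
    }

    private static int hash(byte[] a, int off, int len) {
        int h = 1;
        for (int i = off; i < off + len; i++) {
            h = 31 * h + a[i];
        }
        return h;
    }

    private int bucketOf(int h) {
        return (h & 0x7fffffff) % buckets.length;
    }

    /** Returns the index of the entry whose bytes equal name, or -1. */
    int find(byte[] name) {
        int i = buckets[bucketOf(hash(name, 0, name.length))];
        while (i != -1) {
            if (lengths[i] == name.length && sameBytes(name, offsets[i])) {
                return i;
            }
            i = next[i];
        }
        return -1;
    }

    // Compares the query against the packed table in place, without copying.
    private boolean sameBytes(byte[] name, int off) {
        for (int k = 0; k < name.length; k++) {
            if (table[off + k] != name[k]) {
                return false;
            }
        }
        return true;
    }
}
```

The lookup itself allocates nothing; the only per-call cost left is producing the byte[] form of the name being looked up, which is the conversion discussed in this thread.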

That said, given the optimization we have done for String in jdk9, it might be
worth considering a fast path for ascii-only entry names (I would assume 99.9%+
of entry names are ascii-only in the real world); it should then take only a
simple byte[] copy to convert/encode those entry names from String to byte[].
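
A rough sketch of how such a fast path could sit in front of a strict encoder, written against public APIs only. The class and method names (EntryNameEncoder, encode, encodeStrict) are made up for illustration and this is not the ZipCoder source; code inside the JDK could presumably go further and copy the String's internal byte[] directly, which code outside java.lang cannot do. The strict fallback also shows the behaviour from point (1): it throws where String.getBytes would silently substitute "?".

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the JDK.
class EntryNameEncoder {

    /** ASCII-only names are narrowed char by char; anything else goes strict. */
    static byte[] encode(String name) {
        int len = name.length();
        byte[] ascii = new byte[len];
        for (int i = 0; i < len; i++) {
            char c = name.charAt(i);
            if (c >= 0x80) {
                // Non-ASCII char found: abandon the fast path.
                return encodeStrict(name);
            }
            ascii[i] = (byte) c; // for ASCII, the UTF-8 byte equals the char value
        }
        return ascii;
    }

    /** Encodes to UTF-8 and throws instead of substituting '?'. */
    static byte[] encodeStrict(String name) {
        try {
            ByteBuffer bb = StandardCharsets.UTF_8.newEncoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .encode(CharBuffer.wrap(name));
            byte[] out = new byte[bb.remaining()];
            bb.get(out);
            return out;
        } catch (CharacterCodingException e) {
            // The exception type here is an arbitrary choice for the sketch.
            throw new IllegalArgumentException("cannot encode entry name", e);
        }
    }
}
```

With this sketch, a name containing an unpaired surrogate such as "a\uD800b" is rejected by encode, while "a\uD800b".getBytes(StandardCharsets.UTF_8) quietly returns the bytes of "a?b".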

sherman

  This combined with the need to operate on byte[] path names internally in the 
ZipFile implementation means that URLClassLoader generates a lot of unnecessary 
garbage in a findResource call - proportional to the number of jars on the 
classpath.

Since JarFile forces the ZipFile to be open with UTF-8 always, if there was 
some API exposed that took a byte[] for the resource name, all of that extra 
string copying and encoding could be hoisted out of the loop in 
sun.misc.URLClassPath. Would it be worth creating an internal class, something 
like a ‘ClasspathJarFile’, and tweaking ZipFile so that the byte[]-based 
method is protected instead of private?
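
The hoisting idea could look roughly like the sketch below. EncodedNameIndex is a hypothetical interface standing in for whatever byte[]-based lookup would be exposed; no such API exists on JarFile or ZipFile today, and the loop is only a simplified stand-in for the one in URLClassPath.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

class HoistedLookup {

    /** Hypothetical: a jar that can be queried with an already-encoded name. */
    interface EncodedNameIndex {
        boolean contains(byte[] utf8Name);
    }

    /** Returns the index of the first jar holding the name, or -1. */
    static int findFirst(List<EncodedNameIndex> jars, String name) {
        // Encode once, outside the loop, instead of once per jar.
        byte[] encoded = name.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < jars.size(); i++) {
            if (jars.get(i).contains(encoded)) {
                return i;
            }
        }
        return -1;
    }
}
```

With hundreds of jars on the classpath, this turns one encode per jar into one encode per lookup. The implementations must treat the shared byte[] as read-only.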

I also noticed that sun.net.www.ParseUtil.encodePath(String, boolean) usually 
had nothing useful to do but still made three copies of the string passed in 
anyway (two char arrays to work on, and the String returned).
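
As an illustration of avoiding those copies, a path encoder can scan first and return the original String untouched when nothing needs escaping. This is only a sketch: the class name, the isSafe set, and the escaping rules below are simplifying assumptions and do not reproduce what sun.net.www.ParseUtil actually does.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not the ParseUtil implementation.
class EncodePathSketch {
    private static final String HEX = "0123456789ABCDEF";

    // Assumed set of characters that pass through unescaped.
    private static boolean isSafe(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9') || "/-_.!~*'()".indexOf(c) >= 0;
    }

    static String encodePath(String path) {
        int len = path.length();
        int i = 0;
        while (i < len && isSafe(path.charAt(i))) {
            i++;
        }
        if (i == len) {
            return path; // common case: nothing to escape, nothing allocated
        }
        // Slow path: copy the safe prefix, then percent-encode the rest
        // byte by byte from its UTF-8 form.
        StringBuilder sb = new StringBuilder(len + 16);
        sb.append(path, 0, i);
        for (byte b : path.substring(i).getBytes(StandardCharsets.UTF_8)) {
            if (b >= 0 && isSafe((char) b)) {
                sb.append((char) b);
            } else {
                sb.append('%')
                  .append(HEX.charAt((b >> 4) & 0xF))
                  .append(HEX.charAt(b & 0xF));
            }
        }
        return sb.toString();
    }
}
```

For a typical resource name such as "com/example/A.class" the method returns the very same String instance it was given.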



Cheers,

Scott
