Hi Scott,
On 12/04/2016 01:21 PM, Scott Palmer wrote:
Excuse me if this is the wrong list for this discussion. Please direct me to
the right place if this isn’t it.
I think this is a good place based on the aspects you're addressing.
When doing an analysis of garbage generation in our application we discovered a
significant number of redundant strings generated by the class loader. In my
case there are hundreds of jars on the classpath - everything in the
application is a plugin. I estimated that, on average, 10kB of useless char[]
garbage was generated per findResource call for plugin resources.
This is caused mostly by the ZipFile implementation. What is the purpose of
java.util.zip.ZipCoder’s byte[] getBytes(String s) method? It seems to simply
be a custom implementation of String.getBytes(Charset cs), and as such needs to
first make a copy of the char[] to work on. This combined with the need to
operate on byte[] path names internally in the ZipFile implementation means
that URLClassLoader generates a lot of unnecessary garbage in a findResource
call - proportional to the number of jars on the classpath.
Since JarFile always forces the ZipFile to be opened with UTF-8, if there were
some API exposed that took a byte[] for the resource name, all of that extra
string copying and encoding could be hoisted out of the loop in
sun.misc.URLClassPath. Would it be worth creating an internal class, something
like a ‘ClasspathJarFile’, and tweaking ZipFile so the byte[]-based
method is protected instead of private?
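To make the hoisting idea concrete, here is a minimal sketch, not the JDK's code: the resource name is encoded to UTF-8 once, and the same bytes are used to probe every jar on the classpath. The `HoistedLookup` and `JarIndex` types are hypothetical; `JarIndex` is a map-based stand-in for a ZipFile's internal byte[]-keyed entry table.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class HoistedLookup {
    // Hypothetical stand-in for a jar whose entry names are stored as UTF-8 bytes.
    static final class JarIndex {
        final String jarName;
        // ByteBuffer has content-based equals/hashCode, so it can serve as a byte[] key.
        final Map<ByteBuffer, Integer> entries = new HashMap<>();

        JarIndex(String jarName, String... names) {
            this.jarName = jarName;
            for (int i = 0; i < names.length; i++) {
                entries.put(ByteBuffer.wrap(names[i].getBytes(StandardCharsets.UTF_8)), i);
            }
        }

        // Lookup by an already-encoded name: no per-jar String-to-byte[] conversion.
        boolean contains(ByteBuffer encodedName) {
            return entries.containsKey(encodedName);
        }
    }

    // Encode once, outside the loop, then probe each jar with the same bytes.
    static List<String> findResource(List<JarIndex> classpath, String name) {
        ByteBuffer key = ByteBuffer.wrap(name.getBytes(StandardCharsets.UTF_8));
        List<String> hits = new ArrayList<>();
        for (JarIndex jar : classpath) {
            if (jar.contains(key)) {
                hits.add(jar.jarName);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<JarIndex> cp = List.of(
            new JarIndex("a.jar", "META-INF/MANIFEST.MF", "com/example/A.class"),
            new JarIndex("b.jar", "META-INF/MANIFEST.MF", "plugin.properties"));
        System.out.println(findResource(cp, "plugin.properties"));    // [b.jar]
        System.out.println(findResource(cp, "META-INF/MANIFEST.MF")); // [a.jar, b.jar]
    }
}
```

With N jars, the per-call encoding cost here is one String-to-byte[] conversion instead of N.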
I can't answer for ZipCoder, but there have been some recent attempts to
address some of this, the latest of which was ultimately abandoned since
the added complexity was deemed too high:
http://mail.openjdk.java.net/pipermail/core-libs-dev/2015-June/034397.html
Is there some way for you to test that patch on your application on an
OpenJDK build? If it gives you more than a 1-2% improvement, it might be reason
to re-open it.
Exposing methods that take byte[]s as input is probably not a good idea,
however, since it can lead to various unwanted side effects, such as
time-of-check-to-time-of-use races.[1]
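The time-of-check-to-time-of-use concern can be shown with a small, self-contained sketch. The `TocTouDemo` class and its `isSafeName` rule are hypothetical, not any JDK API: if a method validated a caller-supplied byte[] and later used it, the caller (or another thread) still holding that array could change its contents in between. Taking a private copy before checking closes the window, but that copy is itself an allocation, which eats into the savings a byte[]-based API was meant to provide.

```java
import java.nio.charset.StandardCharsets;

class TocTouDemo {
    // Hypothetical check: reject names that try to escape the archive root.
    static boolean isSafeName(byte[] name) {
        String s = new String(name, StandardCharsets.UTF_8);
        return !s.startsWith("/") && !s.contains("..");
    }

    // Unsafe: checks the caller's array, then uses it; the caller keeps a reference
    // and can change the contents between the check and the use.
    static String openUnsafe(byte[] name, Runnable betweenCheckAndUse) {
        if (!isSafeName(name)) throw new IllegalArgumentException("rejected");
        betweenCheckAndUse.run(); // simulates a concurrent mutation of the array
        return new String(name, StandardCharsets.UTF_8);
    }

    // Safer: copy first, so the bytes that were checked are the bytes that are used.
    static String openCopied(byte[] name, Runnable betweenCheckAndUse) {
        byte[] copy = name.clone();
        if (!isSafeName(copy)) throw new IllegalArgumentException("rejected");
        betweenCheckAndUse.run();
        return new String(copy, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] a = "ab/c.txt".getBytes(StandardCharsets.UTF_8);
        // After the check passes, turn "ab/c.txt" into "../c.txt".
        System.out.println(openUnsafe(a, () -> { a[0] = '.'; a[1] = '.'; })); // ../c.txt

        byte[] b = "ab/c.txt".getBytes(StandardCharsets.UTF_8);
        System.out.println(openCopied(b, () -> { b[0] = '.'; b[1] = '.'; })); // ab/c.txt
    }
}
```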
I also noticed that sun.net.www.ParseUtil.encodePath(String, boolean) usually
had nothing useful to do but still made three copies of the string passed in
anyway (two char arrays to work on, and the String returned).
I've seen this in some startup profiles before, and it looked like
low-hanging fruit at that time too, but I recall having issues with
regressions on strings that actually need to be encoded. It could be rather
straightforward to resolve: if someone has time to attempt a solution
that avoids allocation when there's nothing to encode, and that avoids a
throughput regression when encoding *does* happen, I'd be happy to review it.
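A minimal sketch of the kind of fast path being asked for: scan the string once, and return the input String itself when no character needs escaping, so the common case allocates nothing. The `PathEncoder` class and the set of characters treated as safe are assumptions for illustration; they are not taken from sun.net.www.ParseUtil, whose actual rules (and boolean parameter) may differ. Characters outside the assumed safe set, including non-ASCII ones, are percent-encoded as UTF-8.

```java
import java.nio.charset.StandardCharsets;

class PathEncoder {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Assumed "safe" set: ASCII letters, digits, "-_.~" and '/'; everything else is escaped.
    private static boolean isSafe(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')
            || c == '-' || c == '_' || c == '.' || c == '~' || c == '/';
    }

    static String encodePath(String path) {
        int n = path.length();
        int i = 0;
        // Fast path: find the first character that needs escaping, without allocating.
        while (i < n && isSafe(path.charAt(i))) {
            i++;
        }
        if (i == n) {
            return path; // nothing to encode: hand back the original String
        }
        // Slow path: copy the safe prefix, then escape the remainder byte by byte.
        StringBuilder sb = new StringBuilder(n + 16);
        sb.append(path, 0, i);
        for (byte b : path.substring(i).getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            if (v < 0x80 && isSafe((char) v)) {
                sb.append((char) v);
            } else {
                sb.append('%').append(HEX[v >> 4]).append(HEX[v & 0xF]);
            }
        }
        return sb.toString();
    }
}
```

For example, `encodePath("com/example/A.class")` returns the very same String instance, while `encodePath("a b")` returns `"a%20b"`. Whether the slow path here avoids the throughput regression mentioned above would still have to be measured.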
Thanks!
/Claes
[1] https://en.wikipedia.org/wiki/Time_of_check_to_time_of_use
Cheers,
Scott