Hi Volker,
On 2020-04-24 20:37, Volker Simonis wrote:
On Thu, Apr 23, 2020 at 2:35 PM Claes Redestad
<[email protected]> wrote:
Hi,
the current implementation of ZipFile.getEntryPos takes the encoded
byte[] of the String we're looking up. This means that when looking up
entries across multiple jar files, we allocate the byte[] over and over
for each jar file searched.
If we instead refactor the internal hash table to use a normalized
String-based hash value, we can almost always avoid both the repeated
encoding and (most of) the hash calculation when the entry is not found
in the jar/zip.
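To illustrate the idea (a toy sketch with made-up names, not the actual
ZipFile internals): hash the entry name String once, then probe each
jar's table with that precomputed hash, so a miss in a given jar costs
neither a re-encode nor a re-hash:

```java
import java.util.HashMap;
import java.util.Map;

public class EntryLookupSketch {
    // Per-jar index keyed by a String-derived hash. Illustrative only:
    // the real implementation uses an open-addressed int[] table, not
    // a HashMap.
    static class JarIndex {
        private final Map<Integer, String> byHash = new HashMap<>();

        void add(String name) {
            byHash.put(normalizedHash(name), name);
        }

        // Probe with a caller-supplied hash; only on a hash hit do we
        // need to compare the actual name.
        boolean contains(String name, int hash) {
            String found = byHash.get(hash);
            return found != null && found.equals(name);
        }
    }

    // Computed once per lookup and reused for every jar searched.
    static int normalizedHash(String name) {
        return name.hashCode();
    }

    public static void main(String[] args) {
        JarIndex first = new JarIndex();
        first.add("META-INF/MANIFEST.MF");
        JarIndex second = new JarIndex(); // a jar without the entry

        String lookup = "META-INF/MANIFEST.MF";
        int h = normalizedHash(lookup);     // hash once...
        // ...then probe many jars without re-encoding the name:
        System.out.println(first.contains(lookup, h));  // true
        System.out.println(second.contains(lookup, h)); // false
    }
}
```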
This realizes a significant startup improvement on applications with
several or many jar files, which is a rather typical case. For example,
I see a 25% speedup of ZipFile.getEntry calls on Spring PetClinic,
which reduces total startup time by ~120ms, or ~2.5% of the total, on
my setup, while remaining neutral on single-jar apps.
Webrev: http://cr.openjdk.java.net/~redestad/8243469/open.00/
Bug: https://bugs.openjdk.java.net/browse/JDK-8243469
Testing: tier1-2
Hi Claes,
that's a slightly puzzling change, but also a nice enhancement and cleanup.
First: thanks for reviewing!
Yes, I've tried to simplify and explain everything clearly, since it's a
rather delicate area and we're already stretching the complexity budget
thin.
I think it looks good. I have only two minor comments:
There's no check for "end > 0" here:
93 @Override
94 boolean hasTrailingSlash(byte[] a, int end) {
95 return a[end - 1] == '/';
96 }
I think that's currently not a real problem, but maybe add a check just in case?
Will do.
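For the record, the guarded variant I have in mind would look roughly
like this (a sketch, not the exact patch):

```java
public class TrailingSlashSketch {
    // Guarded variant: an empty range (end == 0) now answers false
    // instead of throwing ArrayIndexOutOfBoundsException.
    static boolean hasTrailingSlash(byte[] a, int end) {
        return end > 0 && a[end - 1] == '/';
    }

    public static void main(String[] args) {
        System.out.println(hasTrailingSlash("dir/".getBytes(), 4)); // true
        System.out.println(hasTrailingSlash("file".getBytes(), 4)); // false
        System.out.println(hasTrailingSlash(new byte[0], 0));       // false
    }
}
```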
And while you're at it, I think the following comment should be updated from:
641 /* Checks ensureOpen() before invoke this method */
to something like:
641 /* Check ensureOpen() before invoking this method */
Will do.
I've also had a quick look at the microbenchmark which you've
apparently only added today :)
It seems that 'getEntryHitUncached' is getting slightly slower with
your change while all the other variants get significantly faster. I
don't think that's a problem, but do you have an explanation why
that's the case?
I've noticed it swing a bit either way, and have been asking myself the
same thing. After a little analysis I think it's actually a bug in my
microbenchmark: I'm always looking up the same entry, and thus hitting
the same bucket in the hash table. If that bucket happens to have a
collision, we'll do a few extra passes; if not, we won't. Either way,
this can show up as a significant swing in one direction or the other.
I'm going to try rewriting it to consider more (if not all) entries in
the zip file. That should mean the cost averages out a bit.
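The rewrite would follow roughly this pattern (sketched here without
the JMH scaffolding, names made up): rotate through all entry names
between invocations so no single bucket dominates the measurement:

```java
public class BenchIterationSketch {
    // Stand-ins for the entry names of the zip under test.
    static final String[] ENTRIES = { "a.class", "b.class", "c.class" };
    static int next; // benchmark state, advanced on every invocation

    // Each call looks up a different entry, so per-bucket collision
    // costs average out across the whole table instead of hitting
    // one (possibly colliding, possibly not) bucket every time.
    static String nextEntry() {
        String e = ENTRIES[next];
        next = (next + 1) % ENTRIES.length;
        return e;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 4; i++) {
            System.out.println(nextEntry()); // a, b, c, then a again
        }
    }
}
```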
/Claes
Thanks for this nice improvement,
Volker
Thanks!
/Claes