Hi Volker,

On 2020-04-24 20:37, Volker Simonis wrote:
On Thu, Apr 23, 2020 at 2:35 PM Claes Redestad
<[email protected]> wrote:

Hi,

the current implementation of ZipFile.getEntryPos takes the encoded byte[]
of the String we're looking up, which means that when looking up entries
across multiple jar files, we allocate the byte[] anew for each jar file
searched.

If we instead refactor the internal hash table to use a normalized
String-based hash value, we can almost always avoid both the repeated
encoding and (most of) the hash calculation when the entry is not found
in the jar/zip.
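The key observation can be sketched as follows (illustrative code, not the actual patch): for ASCII-only entry names, a 31-based polynomial hash over the String's chars equals the same hash over its UTF-8 encoded bytes, so a table probe can often be done from the String alone, without encoding it first.

```java
import java.nio.charset.StandardCharsets;

public class HashSketch {
    // Polynomial hash over raw bytes, as a byte-based table might use.
    static int hash(byte[] a) {
        int h = 0;
        for (byte b : a) h = 31 * h + (b & 0xff);
        return h;
    }

    // The same polynomial over the String's chars. For ASCII names each
    // char encodes to a single UTF-8 byte of equal value, so the two
    // hashes coincide and no encoding is needed to probe the table.
    static int hash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) h = 31 * h + s.charAt(i);
        return h;
    }

    public static void main(String[] args) {
        String name = "META-INF/MANIFEST.MF";
        byte[] encoded = name.getBytes(StandardCharsets.UTF_8);
        System.out.println(hash(name) == hash(encoded));
    }
}
```

Only when the hashes match does a lookup need to fall back to encoding the name and comparing bytes, which is why a miss across many jar files becomes cheap.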

This realizes a significant startup improvement on applications with
several or many jar files, which is a rather typical case. For example, I
see a 25% speedup of ZipEntry.getEntry calls on Spring PetClinic, which
reduces total startup time by ~120ms, or ~2.5% of total, on my setup,
while remaining neutral on single-jar apps.

Webrev: http://cr.openjdk.java.net/~redestad/8243469/open.00/
Bug:    https://bugs.openjdk.java.net/browse/JDK-8243469

Testing: tier1-2

Hi Claes,

that's a somewhat puzzling change, but also a nice enhancement and cleanup.

first: thanks for reviewing!

Yes, I've tried to simplify and explain everything clearly, since it's a
rather delicate area and we're already stretching the complexity budget
thin.


I think it looks good. I have only two minor comments:

There's no check for "end > 0" here:
   93         @Override
   94         boolean hasTrailingSlash(byte[] a, int end) {
   95             return a[end - 1] == '/';
   96         }
I think that's currently not a real problem, but maybe add a check just in case?

Will do.
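For reference, the guarded version might look like this (a standalone sketch: the enclosing class from the webrev is omitted, and a main method is added here only to exercise the check):

```java
public class TrailingSlash {
    // Sketch of the suggested fix: "end > 0" guards against an
    // ArrayIndexOutOfBoundsException when the range is empty.
    static boolean hasTrailingSlash(byte[] a, int end) {
        return end > 0 && a[end - 1] == '/';
    }

    public static void main(String[] args) {
        System.out.println(hasTrailingSlash("dir/".getBytes(), 4));
        System.out.println(hasTrailingSlash("file".getBytes(), 4));
        System.out.println(hasTrailingSlash(new byte[0], 0)); // no exception
    }
}
```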


And while you're at it, I think the following comment should be updated from:

  641     /* Checks ensureOpen() before invoke this method */

to something like:

  641     /* Check ensureOpen() before invoking this method */

Will do.



I've also had a quick look at the microbenchmark which you've
apparently only added today :)

It seems that 'getEntryHitUncached' is getting slightly slower with
your change while all the other variants get significantly faster. I
don't think that's a problem, but do you have an explanation why
that's the case?

I've noticed it swing a bit either way, and have been asking myself the
same thing. After a little analysis I think it's actually a bug in my
microbenchmark: I'm always looking up the same entry, and thus hitting
the same bucket in the hash table. If that one has a collision, we'll do
a few extra passes. If not, we won't. This might be reflected as a
significant swing in either direction.

I'm going to try rewriting it to consider more (if not all) entries in
the zip file. That should mean the cost averages out a bit.
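The planned rewrite could be sketched like this (illustrative only, not the actual JMH microbenchmark): a counter rotates through all entry names, so successive lookups hit different buckets and a single collision-prone bucket no longer dominates the score.

```java
import java.util.List;

public class LookupRotation {
    private final List<String> names;
    private int next;

    LookupRotation(List<String> names) {
        this.names = names;
    }

    // Each call returns the next entry name round-robin, spreading
    // lookups evenly over the hash table instead of pinning the
    // benchmark to one bucket.
    String nextName() {
        String n = names.get(next);
        next = (next + 1) % names.size();
        return n;
    }

    public static void main(String[] args) {
        LookupRotation r =
            new LookupRotation(List.of("a.class", "b.class", "c.class"));
        for (int i = 0; i < 4; i++) {
            System.out.println(r.nextName());
        }
    }
}
```

In a real JMH benchmark the rotation state would live in the benchmark's @State object so each invocation looks up a different entry.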

/Claes


Thanks for this nice improvement,
Volker


Thanks!

/Claes
