On 05/22/2015 01:15 PM, Staffan Friberg wrote:
On 05/22/2015 11:51 AM, Xueming Shen wrote:
On 05/22/2015 11:41 AM, Staffan Friberg wrote:
On 05/21/2015 11:00 AM, Staffan Friberg wrote:
On 05/21/2015 09:48 AM, Staffan Friberg wrote:
On 05/20/2015 10:57 AM, Xueming Shen wrote:
On 05/18/2015 06:44 PM, Staffan Friberg wrote:
Hi,
Wanted to get reviews and feedback on this performance improvement for
reading from JAR/ZIP files during class loading. It reduces unnecessary
copying and reads each entry in one go instead of in small portions.
This shows a significant improvement when reading a single entry, and
for a large application with 10k classes and 500+ JAR files it improved
the startup time by 4%.
For more details on the background and performance results
please see the RFE entry.
RFE - https://bugs.openjdk.java.net/browse/JDK-8080640
WEBREV - http://cr.openjdk.java.net/~sfriberg/JDK-8080640/webrev.0
Cheers,
Staffan
Hi Staffan,
If I did not miss something here, from your use scenario it appears to
me that the only thing you really need to boost your performance is
byte[] ZipFile.getAllBytes(ZipEntry ze);
You are allocating a byte[] at the use side and wrapping it with a
ByteBuffer if the size is small enough; otherwise, you are letting the
ZipFile allocate a big enough one for you. It does not look like you
can re-use that byte[] (it has to be wrapped in the
ByteArrayInputStream and returned), so why do you need two different
methods here? The logic would be much simpler if you let the ZipFile
allocate a buffer of the appropriate size, fill in the bytes and return
it, with an "OOME" if the entry size is bigger than 2g.
The only thing we use from the input ze is its name; the size/csize
come from the jzentry. I don't think jzentry.csize/size can be
"unknown", as they come from the "cen" table.
If the real/final use of the bytes is to wrap them in a
ByteArrayInputStream, why bother using a ByteBuffer here? Shouldn't a
direct byte[] with exactly the size of the entry serve better?
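Behavior-wise, something like the sketch below is what I have in mind
(public API only, the name getAllBytes is just a placeholder, assuming
the usual java.io and java.util.zip imports); the real method would of
course fill the array straight from the native entry data:
    static byte[] getAllBytes(ZipFile zf, ZipEntry ze) throws IOException {
        long size = ze.getSize();             // uncompressed size, from the CEN table, never -1
        if (size > Integer.MAX_VALUE - 8)     // an entry bigger than 2g cannot fit in one byte[]
            throw new OutOfMemoryError("Required array size too large");
        byte[] buf = new byte[(int) size];
        try (DataInputStream in = new DataInputStream(zf.getInputStream(ze))) {
            in.readFully(buf);                // read the whole entry in one go
        }
        return buf;
    }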
-Sherman
Hi Sherman,
Thanks for the comments. I agree; I started out with a ByteBuffer
because I was hoping to be able to cache things where the buffer was
being used, but since the buffer is passed along further I couldn't
figure out a clean way to do it.
Will rewrite it to simply return a buffer, and only wrap it in the
Resource class getByteBuffer.
What would be your thoughts on updating ZipFile.getInputStream to
return a ByteArrayInputStream for small entries? Currently I do that
work outside in two places, and moving it would potentially speed up
other code reading small entries as well.
Thanks,
Staffan
Just realized that my use of ByteArrayInputStream would miss JAR
verification if it is enabled, so the way to go here would be to add
it, if possible, to ZipFile.getInputStream.
//Staffan
Hi,
Here is an updated webrev which uses a byte[] directly and also uses
ByteArrayInputStream in ZipFile for small entries below 128k.
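The getInputStream part is roughly shaped like the sketch below (the
threshold and the fall-back helper name are only illustrative here, the
actual code is in the webrev):
    public InputStream getInputStream(ZipEntry ze) throws IOException {
        long size = ze.getSize();
        if (size >= 0 && size < 128 * 1024) {
            // Small entry: read and inflate it fully up front and return a
            // ByteArrayInputStream, so no native inflater stays alive.
            return new ByteArrayInputStream(getBytes(ze));  // getBytes is the new private helper
        }
        // Large (or unknown size) entry: keep the existing lazy stream.
        return openStreamAsToday(ze);                       // placeholder for the current code path
    }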
I'm not sure about the benefit of doing the ByteArrayInputStream in
ZipFile.getInputStream. It has the consequence of changing the
"expected" behavior of getInputStream() (instead of returning an input
stream waiting to be read, it now reads all the bytes in advance),
something we might not want to do in a performance tuning. Though it
might be reasonable to guess that everyone who gets an input stream
will read all the bytes from it later.
-Sherman
http://cr.openjdk.java.net/~sfriberg/JDK-8080640/webrev.1
//Staffan
Agree that it will change the behavior slightly, but as you said it is
probably expected that someone will read the stream eventually.
We could reduce the threshold further if that makes a difference; if
the size is below 65k we would not use more memory than the buffer
allocated for the InflaterInputStream today.
The total allocation would be slightly larger for deflated entries, as
we would allocate a byte[] for the compressed bytes, but it would be
GC'able and not kept alive. So from a memory perspective the difference
is very limited.
//Staffan
Hi,
Bumping this thread to get some more comments on the concern about
changing the ZipFile.getInputStream behavior. The benefit of the change
is that any read of small entries from ZIP and JAR files will be much
faster and fewer resources will be held, including the native resources
normally held by the returned input stream.
The behavior change is that the full entry will be read as part of
creating the stream, and not lazily as might be expected. However, even
today getting an InputStream accesses the zip file to read information
about the size of the entry, so the zip file is already touched when
getting an InputStream, just not the compressed bytes.
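For reference, the consumers this targets (class loading) drain the
stream right away anyway, so the eager read only moves the same work a
little earlier; roughly:
    try (DataInputStream in = new DataInputStream(jar.getInputStream(entry))) {
        byte[] classBytes = new byte[(int) entry.getSize()];
        in.readFully(classBytes);   // the whole entry is consumed immediately
        // ... defineClass(name, classBytes, 0, classBytes.length) ...
    }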
I'm fine with removing this part of the change and just pushing the
private getBytes function and the updates to the JDK libraries to use it.
Thanks,
Staffan