Re: RFR 8080640: Reduce copying when reading JAR/ZIP entries

Staffan Friberg Tue, 23 Jun 2015 12:49:33 -0700

Hi Sherman,

Removed the check, http://cr.openjdk.java.net/~sfriberg/JDK-8080640/webrev.4


Cheers,
Staffan

On 06/23/2015 11:22 AM, Xueming Shen wrote:

Hi Staffan,
The #527 check is probably unnecessary. The size and csize are 32-bit unsigned integers, so
they should never be < 0.

The rest looks good.

Thanks,
-Sherman
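
To illustrate the point about size and csize, here is a minimal Java sketch. It assumes the field is stored as a little-endian 32-bit value (as ZIP headers do) and widened to a long when read. The class and method names are hypothetical and are not taken from the webrev.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, for illustration only.
final class UnsignedFieldSketch {
    // Reads a little-endian 32-bit field at the given offset and widens it
    // to a long without sign extension.
    static long readUnsignedInt(byte[] buf, int offset) {
        int raw = ByteBuffer.wrap(buf).order(ByteOrder.LITTLE_ENDIAN).getInt(offset);
        return Integer.toUnsignedLong(raw);
    }
}
```

Once widened this way the value always lies in the range 0 to 4294967295, so a `< 0` test on it can never fire.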

On 06/23/2015 10:54 AM, Staffan Friberg wrote:
Hi Sherman,
Thanks for the review. I removed the unused import and the changes to reduce the lock region.
New webrev, http://cr.openjdk.java.net/~sfriberg/JDK-8080640/webrev.3

Thanks,
Staffan


On 06/08/2015 02:37 PM, Xueming Shen wrote:
Staffan,

(1) ByteArrayInputStream is no longer needed in ZipFile.
(2) You've changed the lock region in ZipFile.getInputStream. Given we are not doing ByteArrayInputStream for this method, can we just not touch this method
     and the class ZipFileInputStream?
The concern is that we made some changes in that area back in 2010 and triggered a complicated race condition regression [1]; it was finally fixed after a lot of rounds of discussion. I still have all those emails in my inbox. It would be better to keep whatever works for now, instead of refreshing all that memory (reading all those emails)
     to figure out if the latest change might have a negative impact.

The getBytes() implementation looks good to me.

Thanks!
-Sherman
[1] http://mail.openjdk.java.net/pipermail/core-libs-dev/2011-March/006355.html
On 06/05/2015 11:09 AM, Staffan Friberg wrote:
Hi Sherman,
I have a new webrev which reverts that part, http://cr.openjdk.java.net/~sfriberg/JDK-8080640/webrev.2/
Summary of changes
    Reduce lock region in ZipFile.getInputStream
    Add private ZipFile.getBytes that can be used in select places in the JDK where all bytes will be read
Could you sponsor this change once it has been reviewed?

Thanks,
Staffan

On 06/03/2015 10:45 AM, Xueming Shen wrote:
Staffan,
I'm not convinced that the benefit here is significant enough to change getInputStream() to return a ByteArrayInputStream, given this can be easily achieved by wrapping the returned byte[] from getBytes(ZipEntry) at the user's site. I would suggest filing a separate RFE on this disagreement and moving on
with the agreed getBytes() for now.

Thanks,
-Sherman
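
The call-site wrapping Sherman mentions could look roughly like the sketch below. The class and method names are hypothetical; the byte[] stands in for whatever the private getBytes(ZipEntry) would return, since that method is not part of the public API.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;

// Hypothetical helper, for illustration only.
final class CallSiteWrapSketch {
    // Wraps already-read entry bytes in a stream. ByteArrayInputStream reads
    // directly from the given array, so no further copy is made here.
    static InputStream asStream(byte[] entryBytes) {
        return new ByteArrayInputStream(entryBytes);
    }
}
```

Because the stream is backed by the caller's array, the caller should not modify the array while the stream is in use.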

On 06/02/2015 10:27 AM, Staffan Friberg wrote:
On 05/22/2015 01:15 PM, Staffan Friberg wrote:
On 05/22/2015 11:51 AM, Xueming Shen wrote:
On 05/22/2015 11:41 AM, Staffan Friberg wrote:
On 05/21/2015 11:00 AM, Staffan Friberg wrote:
On 05/21/2015 09:48 AM, Staffan Friberg wrote:
On 05/20/2015 10:57 AM, Xueming Shen wrote:
On 05/18/2015 06:44 PM, Staffan Friberg wrote:
Hi,
Wanted to get reviews and feedback on this performance improvement for reading from JAR/ZIP files during classloading by reducing unnecessary copying and reading the entry in one go instead of in small portions. This shows a significant improvement when reading a single entry, and for a large application with 10k classes and 500+ JAR files it improved the startup time by 4%.
For more details on the background and performance results please see the RFE entry.
RFE - https://bugs.openjdk.java.net/browse/JDK-8080640
WEBREV - http://cr.openjdk.java.net/~sfriberg/JDK-8080640/webrev.0
Cheers,
Staffan
Hi Staffan,
If I did not miss something here, from your use scenario it appears to me the only thing you really
need here to help boost your performance is

    byte[] ZipFile.getAllBytes(ZipEntry ze);
You are allocating a byte[] at the use site and wrapping it with a ByteBuffer if the size is small enough; otherwise, you are letting the ZipFile allocate a big enough one for you. It does not look like you can re-use that byte[] (it has to be wrapped by the ByteArrayInputStream and returned), so why do you need two different methods here? The logic would be much easier if you simply let the ZipFile allocate the needed buffer with the appropriate size, fill the bytes and return, with an "OOME" if the entry size
is bigger than 2g.
The only thing we use from the input ze is its name; we get the size/csize from the jzentry. I don't think jzentry.csize/size can be "unknown", they are from the "cen" table.
If the real/final use of the bytes is to wrap them with a ByteArrayInputStream, why bother using ByteBuffer here? Shouldn't a direct byte[] with exactly the size of the entry serve better?
-Sherman
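
The shape Sherman describes (allocate a byte[] of exactly the entry size, fill it, return it, and fail for entries over 2g) can be sketched with the public ZipFile API as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical, and the actual webrev works inside ZipFile against the native jzentry rather than through getInputStream.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, for illustration only.
final class EntryBytesSketch {
    // Reads a whole entry into a byte[] sized exactly to the entry.
    static byte[] readEntryFully(ZipFile zf, ZipEntry ze) throws IOException {
        long size = ze.getSize(); // -1 if the size is not known
        if (size < 0) {
            throw new IOException("unknown size for entry " + ze.getName());
        }
        if (size > Integer.MAX_VALUE) {
            // A Java array cannot hold more than 2g elements.
            throw new OutOfMemoryError("entry too large: " + ze.getName());
        }
        byte[] bytes = new byte[(int) size];
        try (InputStream in = zf.getInputStream(ze)) {
            int off = 0;
            while (off < bytes.length) {
                int n = in.read(bytes, off, bytes.length - off);
                if (n < 0) {
                    throw new EOFException("entry shorter than declared: " + ze.getName());
                }
                off += n;
            }
        }
        return bytes;
    }
}
```

Entries obtained from ZipFile.getEntry take their sizes from the central directory, which is why the "unknown size" branch is not expected to be hit in that case.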
Hi Sherman,
Thanks for the comments. I agree; I was starting out with ByteBuffer because I was hoping to be able to cache things where the buffer was being used, but since the buffer is passed along further I couldn't figure out a clean way to do it. Will rewrite it to simply return a buffer, and only wrap it in the Resource class getByteBuffer.
What would be your thoughts on updating ZipFile.getInputStream to return a ByteArrayInputStream for small entries? Currently I do that work outside in two places, and moving it would potentially speed up others reading small entries as well.
Thanks,
Staffan
Just realized that my use of ByteArrayInputStream would miss Jar verification if enabled, so the way to go here would be to add it, if possible, to ZipFile.getInputStream.
//Staffan
Hi,
Here is an updated webrev which uses a byte[] directly and also uses ByteArrayInputStream in ZipFile for small entries below 128k.
I'm not sure about the benefit of doing the ByteArrayInputStream in ZipFile.getInputStream. It has the consequence of changing the "expected" behavior of getInputStream() (instead of returning an input stream waiting for reading, it now reads all bytes in advance), something we might not want to do in a performance tuning. Though it might be reasonable to guess that everyone who gets an input stream
is going to read all bytes from it later.

-Sherman
http://cr.openjdk.java.net/~sfriberg/JDK-8080640/webrev.1

//Staffan
Agree that it will change the behavior slightly, but as you said it is probably expected that someone will read the stream eventually. We could reduce the size further if that makes a difference; if the size is below 65k we would not use more memory than the buffer allocated for the InflaterStream today. The total allocation would be slightly larger for deflated entries, as we would allocate a byte[] for the compressed bytes, but it would be GC:able and not kept alive. So from a memory perspective the difference is very limited.
//Staffan
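
The behavior under discussion (eagerly reading small entries and handing back a ByteArrayInputStream, while larger entries keep the lazy stream) can be sketched outside ZipFile as below. The class name, method name and constant are hypothetical; the 128k limit is the one mentioned for webrev.1. Unlike the webrev, this sketch goes through the public getInputStream and uses InputStream.readAllBytes (Java 9 or later), so it shows the behavior change rather than the copy reduction.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, for illustration only.
final class SmallEntryStreamSketch {
    // Threshold below which an entry is read in one go (assumed from the thread).
    static final long SMALL_ENTRY_LIMIT = 128 * 1024;

    static InputStream open(ZipFile zf, ZipEntry ze) throws IOException {
        long size = ze.getSize(); // -1 if the size is not known
        if (size >= 0 && size < SMALL_ENTRY_LIMIT) {
            // Eager path: all bytes are read before the caller sees the stream,
            // and the underlying zip stream is closed right away.
            try (InputStream in = zf.getInputStream(ze)) {
                return new ByteArrayInputStream(in.readAllBytes());
            }
        }
        // Lazy path: unchanged behavior, bytes are read on demand.
        return zf.getInputStream(ze);
    }
}
```

The eager path is where the "expected" behavior changes: any I/O error surfaces when the stream is opened instead of on a later read.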
Hi,
Bumping this thread to get some more comments on the concern about changing the ZipFile.getInputStream behavior. The benefit of doing this change is that any read of small entries from ZIP and JAR files will be much faster and fewer resources will be held, including native resources normally held by the ZipInputStream.
The behavior change that will occur is that the full entry will be read as part of creating the stream and not lazily as might be expected. However, when getting an InputStream today, the zip file will be accessed to read information about the size of the entry, so the zip file is already touched when getting an InputStream, but not the compressed bytes.
I'm fine with removing this part of the change and just pushing the private getBytes function and the updates to the JDK libraries to use it.
Thanks,
Staffan
