Re: RFR 8080640: Reduce copying when reading JAR/ZIP entries

Staffan Friberg Tue, 23 Jun 2015 10:56:59 -0700

Hi Sherman,

Thanks for the review. I removed the unused import and the changes to reduce the lock region.


New webrev, http://cr.openjdk.java.net/~sfriberg/JDK-8080640/webrev.3

Thanks,
Staffan


On 06/08/2015 02:37 PM, Xueming Shen wrote:

Staffan,

(1) ByteArrayInputStream is no longer needed in ZipFile.
(2) You've changed the lock region in ZipFile.getInputStream. Given we are not doing ByteArrayInputStream for this method, can we just not touch this method
     and the class ZipFileInputStream?
The concern is that we made some changes in that area back in 2010 and triggered a complicated race condition regression [1]; it was finally fixed after a lot of rounds of discussion. I still have all those emails in my inbox. It would be better to keep whatever works for now, instead of refreshing all that memory (rereading all those emails)
     to figure out whether the latest change might have a negative impact.

The getBytes() implementation looks good to me.

Thanks!
-Sherman
[1] http://mail.openjdk.java.net/pipermail/core-libs-dev/2011-March/006355.html

On 06/05/2015 11:09 AM, Staffan Friberg wrote:
Hi Sherman,
I have a new webrev which reverts that part, http://cr.openjdk.java.net/~sfriberg/JDK-8080640/webrev.2/
Summary of changes
    Reduce lock region in ZipFile.getInputStream
    Add private ZipFile.getBytes that can be used in select places in the JDK where all bytes will be read
Could you sponsor this change once it has been reviewed?

Thanks,
Staffan

On 06/03/2015 10:45 AM, Xueming Shen wrote:
Staffan,
I'm not convinced that the benefit here is significant enough to change getInputStream() to return a ByteArrayInputStream, given that this can be easily achieved by wrapping the byte[] returned from getBytes(ZipEntry) at the user's site. I would suggest filing a separate RFE on this disagreement and moving on
with the agreed getBytes() for now.

Thanks,
-Sherman

On 06/02/2015 10:27 AM, Staffan Friberg wrote:
On 05/22/2015 01:15 PM, Staffan Friberg wrote:
On 05/22/2015 11:51 AM, Xueming Shen wrote:
On 05/22/2015 11:41 AM, Staffan Friberg wrote:
On 05/21/2015 11:00 AM, Staffan Friberg wrote:
On 05/21/2015 09:48 AM, Staffan Friberg wrote:
On 05/20/2015 10:57 AM, Xueming Shen wrote:
On 05/18/2015 06:44 PM, Staffan Friberg wrote:
Hi,
Wanted to get reviews and feedback on this performance improvement for reading from JAR/ZIP files during classloading by reducing unnecessary copying and reading the entry in one go instead of in small portions. This shows a significant improvement when reading a single entry, and for a large application with 10k classes and 500+ JAR files it improved the startup time by 4%.
For more details on the background and performance results please see the RFE entry.
RFE - https://bugs.openjdk.java.net/browse/JDK-8080640
WEBREV - http://cr.openjdk.java.net/~sfriberg/JDK-8080640/webrev.0
Cheers,
Staffan
Hi Staffan,
If I did not miss something here, from your use scenario it appears to me the only thing you really
need here to help boost your performance is

    byte[] ZipFile.getAllBytes(ZipEntry ze);
You are allocating a byte[] at the use side and wrapping it with a ByteBuffer if the size is small enough; otherwise, you are letting the ZipFile allocate a big enough one for you. It does not look like you can re-use that byte[] (it has to be wrapped by the ByteArrayInputStream and returned), so why do you need two different methods here? The logic would be much easier if the ZipFile simply allocated the needed buffer with the appropriate size, filled in the bytes and returned it, with an "OOME" if the entry size
is bigger than 2g.
The only thing we use from the input ze is its name; we get the size/csize from the jzentry. I don't think jzentry.csize/size can be "unknown", since they are from the "cen" table.
If the real/final use of the bytes is to wrap them with a ByteArrayInputStream, why bother using a ByteBuffer here? Shouldn't a direct byte[] with exactly the size of the entry serve better?
-Sherman
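
The suggestion above (one exactly sized byte[] per entry, sizes taken from the central directory, OOME for oversized entries) could be sketched with public APIs only. This is not the private ZipFile.getBytes from the webrev; the class name EntryBytes and the helper getAllBytes are hypothetical, and the internal jzentry lookup is approximated here with ZipFile.getEntry.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class EntryBytes {
    // Hypothetical helper: reads a whole entry into one exactly sized array.
    static byte[] getAllBytes(ZipFile zf, ZipEntry ze) throws IOException {
        // Only the name of the caller's entry is used; the size comes from
        // the entry that ZipFile looks up itself.
        ZipEntry e = zf.getEntry(ze.getName());
        if (e == null) {
            throw new IOException("no such entry: " + ze.getName());
        }
        long size = e.getSize();
        if (size < 0) {
            throw new IOException("unknown size for entry: " + e.getName());
        }
        if (size > Integer.MAX_VALUE - 8) {
            // Assumption: mirror the "OOME if bigger than 2g" idea.
            throw new OutOfMemoryError("entry too large for a byte[]: " + size);
        }
        byte[] buf = new byte[(int) size];
        try (InputStream in = zf.getInputStream(e)) {
            int off = 0;
            while (off < buf.length) {
                int n = in.read(buf, off, buf.length - off);
                if (n < 0) {
                    throw new EOFException("entry shorter than declared size");
                }
                off += n;
            }
        }
        return buf;
    }
}
```

A caller that needs a stream can still wrap the result with new ByteArrayInputStream(bytes), which does not copy the array.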
Hi Sherman,
Thanks for the comments. I agree. I was starting out with a ByteBuffer because I was hoping to be able to cache things where the buffer was being used, but since the buffer is passed along further I couldn't figure out a clean way to do it. I will rewrite it to simply return a buffer, and only wrap it in the Resource class getByteBuffer.
What would be your thoughts on updating ZipFile.getInputStream to return a ByteArrayInputStream for small entries? Currently I do that work outside in two places, and moving it would potentially speed up others reading small entries as well.
Thanks,
Staffan
Just realized that my use of ByteArrayInputStream would miss Jar verification if enabled, so the way to go here would be to add it, if possible, to ZipFile.getInputStream.
//Staffan
Hi,
Here is an updated webrev which uses a byte[] directly and also uses ByteArrayInputStream in ZipFile for small entries below 128k.
I'm not sure about the benefit of doing the ByteArrayInputStream in ZipFile.getInputStream. It has the consequence of changing the "expected" behavior of getInputStream() (instead of returning an input stream waiting to be read, it now reads all bytes in advance), something we might not want to do in a performance tuning. Though it might be reasonable to guess that everyone who gets an input stream
is going to read all bytes from it later.

-Sherman
http://cr.openjdk.java.net/~sfriberg/JDK-8080640/webrev.1

//Staffan
Agree that it will change the behavior slightly, but as you said it is probably expected that someone will read the stream eventually. We could reduce the size further if that makes a difference; if the size is below 65k we would not use more memory than the buffer allocated for the InflaterStream today. The total allocation would be slightly larger for deflated entries as we would allocate a byte[] for the compressed bytes, but it would be GC-able and not kept alive. So from a memory perspective the difference is very limited.
//Staffan
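
The behavior change under discussion can be illustrated with a small wrapper around the public API. This is only a sketch, not the code from webrev.1: the class SmallEntryStreams, the method open and the constant name are hypothetical, and only the 128k threshold is taken from the thread.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class SmallEntryStreams {
    // Threshold mentioned in the thread; the name is an assumption.
    static final int SMALL_ENTRY_LIMIT = 128 * 1024;

    // Hypothetical wrapper: eager read for small entries, lazy stream otherwise.
    // The entry is expected to come from zf.getEntry(...), so its size is known.
    static InputStream open(ZipFile zf, ZipEntry ze) throws IOException {
        long size = ze.getSize();
        if (size < 0 || size >= SMALL_ENTRY_LIMIT) {
            // Large or unknown size: keep the usual lazy stream.
            return zf.getInputStream(ze);
        }
        // Small entry: all bytes are read before the stream is returned,
        // which is the behavior change being debated.
        byte[] buf = new byte[(int) size];
        try (InputStream in = zf.getInputStream(ze)) {
            int off = 0;
            while (off < buf.length) {
                int n = in.read(buf, off, buf.length - off);
                if (n < 0) {
                    throw new EOFException("entry shorter than declared size");
                }
                off += n;
            }
        }
        return new ByteArrayInputStream(buf);
    }
}
```

With this shape the I/O cost moves from the first read() call to the open() call, and the returned ByteArrayInputStream holds no reference to the zip file.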
Hi,
Bumping this thread to get some more comments on the concern about changing the ZipFile.getInputStream behavior. The benefit of doing this change is that any read of small entries from ZIP and JAR files will be much faster and fewer resources will be held, including native resources normally held by the ZipInputStream.
The behavior change that will occur is that the full entry will be read as part of creating the stream and not lazily as might be expected. However, getting an InputStream today already accesses the zip file to read information about the size of the entry, so the zip file is already touched when getting an InputStream, although the compressed bytes are not.
I'm fine with removing this part of the change and just pushing the private getBytes function and the updates to the JDK libraries to use it.
Thanks,
Staffan
