Re: RFR 8080640: Reduce copying when reading JAR/ZIP entries

Xueming Shen Tue, 23 Jun 2015 11:24:12 -0700

Hi Staffan,

#527 check is probably unnecessary. The size and csize are 32-bit unsigned 
integer, they
should never be < 0.


The rest looks good.

Thanks,
-Sherman

On 06/23/2015 10:54 AM, Staffan Friberg wrote:

Hi Sherman,

Thanks for the review. I removed the unused import and the changes to reduce 
the lock region.

New webrev, http://cr.openjdk.java.net/~sfriberg/JDK-8080640/webrev.3

Thanks,
Staffan


On 06/08/2015 02:37 PM, Xueming Shen wrote:

Staffan,

(1) ByteArrayInputSteram is no longer needed in ZipFile
(2) You've changed the lock region in ZipFile.getInputSteram. Given we are not
     doing ByteArrayInpusteram for this method, can we just not touch this 
method
     and the class ZipFileInputSteram()?
     The concern is that we did some changes in that area back to 2010 and 
triggered
     a complicated race condition regression [1], it was finally fixed after 
lot of rounds
     of discussion. I still have all those emails in my inbox. It would be 
better to keep
     whatever works for now, instead of re-fresh all the memory (read all those 
emails)
     to figure out if the latest change might have a negative impact.

The getBytes() implementation looks good to me.

Thanks!
-Sherman

[1] http://mail.openjdk.java.net/pipermail/core-libs-dev/2011-March/006355.html

On 06/05/2015 11:09 AM, Staffan Friberg wrote:

Hi Sherman,

I have a new webrev which reverts that part, 
http://cr.openjdk.java.net/~sfriberg/JDK-8080640/webrev.2/

Summary of changes
    Reduce lock region in ZipFile.getInputstream
    Add private ZipFile.getBytes that can be used in select places in the JDK 
where all bytes will be read

Could you sponsor this change once it has been reviewed?

Thanks,
Staffan

On 06/03/2015 10:45 AM, Xueming Shen wrote:

Staffan,

I'm not convinced that the benefit here is significant enough to change the
getInputStream() to return a ByteArrayInputStream, given this can be easily
achieved by wrapping the returned byte[] from getBytes(ZipEntry) at user's
site. I would suggest to file a separate rfe on this disagreement and move on
with the agreed getBytes() for now.

Thanks,
-Sherman

On 06/02/2015 10:27 AM, Staffan Friberg wrote:


On 05/22/2015 01:15 PM, Staffan Friberg wrote:

On 05/22/2015 11:51 AM, Xueming Shen wrote:

On 05/22/2015 11:41 AM, Staffan Friberg wrote:


On 05/21/2015 11:00 AM, Staffan Friberg wrote:


On 05/21/2015 09:48 AM, Staffan Friberg wrote:


On 05/20/2015 10:57 AM, Xueming Shen wrote:

On 05/18/2015 06:44 PM, Staffan Friberg wrote:

Hi,

Wanted to get reviews and feedback on this performance improvement for reading 
from JAR/ZIP files during classloading by reducing unnecessary copying and 
reading the entry in one go instead of in small portions. This shows a 
significant improvement when reading a single entry and for a large application 
with 10k classes and 500+ JAR files it improved the startup time by 4%.

For more details on the background and performance results please see the RFE 
entry.

RFE - https://bugs.openjdk.java.net/browse/JDK-8080640
WEBREV - http://cr.openjdk.java.net/~sfriberg/JDK-8080640/webrev.0

Cheers,
Staffan


Hi Staffan,

If I did not miss something here, from your use scenario it appears to me the 
only thing you really
need here to help boost your performance is

    byte[] ZipFile.getAllBytes(ZipEntry ze);

You are allocating a byte[] at use side and wrapping it with a ByteBuffer if 
the size is small enough,
otherwise, you letting the ZipFile to allocate a big enough one for you. It 
does not look like you
can re-use that byte[] (has to be wrapped by the ByteArrayInputStream and 
return), why do you
need two different methods here? The logic would be much easier to simply let 
the ZipFile to allocate
the needed buffer with appropriate size, fill the bytes and return, with a 
"OOME" if the entry size
is bigger than 2g.

The only thing we use from the input ze is its name, get the size/csize from 
the jzentry, I don't think
jzentry.csize/size can be "unknown", they are from the "cen" table.

If the real/final use of the bytes is to wrap it with a 
ByteArrayInputStream,why bother using ByteBuffer
here? Shouldn't a direct byte[] with exactly the size of the entry server 
better.

-Sherman

Hi Sherman,

Thanks for the comments. I agree, was starting out with bytebuffer because I 
was hoping to be able to cache things where the buffer was being used, but 
since the buffer is past along further I couldn't figure out a clean way to do 
it.
Will rewrite it to simply just return a buffer, and only wrap it in the 
Resource class getByteBuffer.

What would be your thought on updating the ZipFile.getInputStream to return 
ByteArrayInputStream for small entries? Currently I do that work outside in two 
places and moving it would potentially speed up others reading small entries as 
well.

Thanks,
Staffan

Just realized that my use of ByteArrayInputStream would miss Jar verification 
if enabled so the way to go hear would be to add it if possible to the 
ZipFile.getInputStream.

//Staffan

Hi,

Here is an updated webrev which uses a byte[] directly and also uses 
ByteArrayInputStream in ZipFile for small entries below 128k.


I'm not sure about the benefit of doing the ByteArrayInputStream in 
ZipFile.getInputStream. It has
the consequence of changing the "expected" behavior of getInputStream() 
(instead of return an
input stream waiting for reading, it now reads all bytes in advance), something 
we might not want
to do in a performance tuning. Though it might be reasonable to guess everyone 
get an input stream
is to read all bytes from it later.

-Sherman

http://cr.openjdk.java.net/~sfriberg/JDK-8080640/webrev.1

//Staffan

Agree that it will change the behavior slightly, but as you said it is probably 
expected that some one will read the stream eventually.
We could reduce the size further if that makes a difference, if the size is 
below 65k we would not use more memory than the buffer allocated for the 
InflaterStream today.
The total allocation would be slightly larger for deflated entries as we would 
allocate a byte[] for the compressed bytes, but it would be GC:able and not 
kept alive. So from a memory perspective the difference is very limited.

//Staffan

Hi,

Bumping this thread to get some more comments on the concern about changing the 
ZipFile.getInputStream behavior. The benefit of doing this change is that any 
read of small entries from ZIP and JAR files will be much faster and less 
resources will be held, including native resources normally held by the 
ZipInputStream.

The behavior change that will occur is that the full entry will be read as part 
of creating the stream and not lazily as might be expected. However when 
getting a today InputStream zip file will be accessed to read information about 
the size of the entry, so the zip file is already touched when getting an 
InputStream, but not the compressed bytes.

I'm fine with removing this part of the change and just push the private 
getBytes function and the updates to the JDK libraries to use it.

Thanks,
Staffan

Re: RFR 8080640: Reduce copying when reading JAR/ZIP entries

Reply via email to