On 05/22/2015 01:15 PM, Staffan Friberg wrote:
On 05/22/2015 11:51 AM, Xueming Shen wrote:
On 05/22/2015 11:41 AM, Staffan Friberg wrote:
On 05/21/2015 11:00 AM, Staffan Friberg wrote:
On 05/21/2015 09:48 AM, Staffan Friberg wrote:
On 05/20/2015 10:57 AM, Xueming Shen wrote:
On 05/18/2015 06:44 PM, Staffan Friberg wrote:
Hi,
Wanted to get reviews and feedback on this performance improvement for
reading from JAR/ZIP files during class loading. It reduces unnecessary
copying and reads each entry in one go instead of in small portions.
This shows a significant improvement when reading a single entry, and
for a large application with 10k classes and 500+ JAR files it improved
the startup time by 4%.
For more details on the background and performance results
please see the RFE entry.
RFE - https://bugs.openjdk.java.net/browse/JDK-8080640
WEBREV - http://cr.openjdk.java.net/~sfriberg/JDK-8080640/webrev.0
Cheers,
Staffan
Hi Staffan,
If I did not miss something here, from your use scenario it appears to
me that the only thing you really need to boost your performance is
byte[] ZipFile.getAllBytes(ZipEntry ze);
You are allocating a byte[] at the use side and wrapping it with a
ByteBuffer if the size is small enough; otherwise, you are letting the
ZipFile allocate a big enough one for you. It does not look like you
can re-use that byte[] (it has to be wrapped in the
ByteArrayInputStream and returned), so why do you need two different
methods here? The logic would be much simpler if you let the ZipFile
allocate a buffer of the appropriate size, fill in the bytes and return
it, with an "OOME" if the entry size is bigger than 2g.
The only thing we use from the input ze is its name; the size/csize
come from the jzentry. I don't think jzentry.csize/size can be
"unknown", as they come from the "cen" table.
If the real/final use of the bytes is to wrap them in a
ByteArrayInputStream, why bother using a ByteBuffer here? Shouldn't a
direct byte[] with exactly the size of the entry serve better?
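Behavior-wise, something like the sketch below is what I have in mind
(public API only, the name getAllBytes is just a placeholder, assuming
the usual java.io and java.util.zip imports); the real method would of
course fill the array straight from the native entry data:
    static byte[] getAllBytes(ZipFile zf, ZipEntry ze) throws IOException {
        long size = ze.getSize();             // uncompressed size, from the CEN table, never -1
        if (size > Integer.MAX_VALUE - 8)     // an entry bigger than 2g cannot fit in one byte[]
            throw new OutOfMemoryError("Required array size too large");
        byte[] buf = new byte[(int) size];
        try (DataInputStream in = new DataInputStream(zf.getInputStream(ze))) {
            in.readFully(buf);                // read the whole entry in one go
        }
        return buf;
    }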
-Sherman
Hi Sherman,
Thanks for the comments. I agree; I started out with a ByteBuffer
because I was hoping to be able to cache things where the buffer was
being used, but since the buffer is passed along further I couldn't
figure out a clean way to do it.
Will rewrite it to simply return a buffer, and only wrap it in the
Resource class getByteBuffer.
What would be your thoughts on updating ZipFile.getInputStream to
return a ByteArrayInputStream for small entries? Currently I do that
work outside in two places, and moving it would potentially speed up
other code reading small entries as well.
Thanks,
Staffan
Just realized that my use of ByteArrayInputStream would miss JAR
verification if it is enabled, so the way to go here would be to add
it, if possible, to ZipFile.getInputStream.
//Staffan
Hi,
Here is an updated webrev which uses a byte[] directly and also uses
ByteArrayInputStream in ZipFile for small entries below 128k.
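The getInputStream part is roughly shaped like the sketch below (the
threshold and the fall-back helper name are only illustrative here, the
actual code is in the webrev):
    public InputStream getInputStream(ZipEntry ze) throws IOException {
        long size = ze.getSize();
        if (size >= 0 && size < 128 * 1024) {
            // Small entry: read and inflate it fully up front and return a
            // ByteArrayInputStream, so no native inflater stays alive.
            return new ByteArrayInputStream(getBytes(ze));  // getBytes is the new private helper
        }
        // Large (or unknown size) entry: keep the existing lazy stream.
        return openStreamAsToday(ze);                       // placeholder for the current code path
    }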
I'm not sure about the benefit of doing the ByteArrayInputStream in
ZipFile.getInputStream. It has the consequence of changing the
"expected" behavior of getInputStream() (instead of returning an input
stream waiting to be read, it now reads all the bytes in advance),
something we might not want to do in a performance tuning. Though it
might be reasonable to guess that everyone who gets an input stream
will read all the bytes from it later.
-Sherman
http://cr.openjdk.java.net/~sfriberg/JDK-8080640/webrev.1
//Staffan
Agree that it will change the behavior slightly, but as you said it is
probably expected that someone will read the stream eventually.
We could reduce the threshold further if that makes a difference; if
the size is below 65k we would not use more memory than the buffer
allocated for the InflaterInputStream today.
The total allocation would be slightly larger for deflated entries, as
we would allocate a byte[] for the compressed bytes, but it would be
GC'able and not kept alive. So from a memory perspective the difference
is very limited.
//Staffan
Hi,
Bumping this thread to get some more comments on the concern about
changing the ZipFile.getInputStream behavior. The benefit of the change
is that any read of small entries from ZIP and JAR files will be much
faster and fewer resources will be held, including the native resources
normally held by the returned input stream.
The behavior change is that the full entry will be read as part of
creating the stream, and not lazily as might be expected. However, even
today getting an InputStream accesses the zip file to read information
about the size of the entry, so the zip file is already touched when
getting an InputStream, just not the compressed bytes.
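For reference, the consumers this targets (class loading) drain the
stream right away anyway, so the eager read only moves the same work a
little earlier; roughly:
    try (DataInputStream in = new DataInputStream(jar.getInputStream(entry))) {
        byte[] classBytes = new byte[(int) entry.getSize()];
        in.readFully(classBytes);   // the whole entry is consumed immediately
        // ... defineClass(name, classBytes, 0, classBytes.length) ...
    }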
I'm fine with removing this part of the change and just pushing the
private getBytes function and the updates to the JDK libraries to use it.
Thanks,
Staffan