Re: ZipFileSystem performance regression

2019-04-23 Thread Alan Bateman

On 23/04/2019 08:19, Lennart Börjeson wrote:

Any chance we might get this repaired in time for the Java 13 ramp down?



Under discussion/review here:
https://mail.openjdk.java.net/pipermail/nio-dev/2019-April/006043.html




Re: ZipFileSystem performance regression

2019-04-23 Thread Lennart Börjeson
Any chance we might get this repaired in time for the Java 13 ramp down?

Best regards,

/Lennart Börjeson

> On 16 Apr 2019, at 23:02, Langer, Christoph wrote:
> 
> Hi,
> 
> I also think the regression should be repaired. Maybe we can have an 
> option like "lazy compress" to avoid compression on write and defer it 
> to zipfs closing time.
> 
> It should also be possible to parallelize deflation during close, shouldn't 
> it?
> 
> Best regards
> Christoph
> 

RE: ZipFileSystem performance regression

2019-04-18 Thread Anthony Vanelverdinghe
Hi



Alan already proposed the following RFEs when NIO2 was introduced: a memory 
FileSystemProvider [1] and a virtual FileSystemProvider [2].

With these, the javac use case could work as follows:

1) create a VirtualFileSystem, with all sources mounted under “/src”, a 
MemoryFileSystem mounted at “/target”, and a ZipFileSystem mounted at “/dist”

2) from then on, simply work with the VirtualFileSystem: read from /src, write 
compilation artifacts to /target, and finally assemble a .jar file by copying 
from /target to /dist



I think these providers would elegantly solve most use cases. And those 
providers could then be enhanced with further features as desired. For example, 
the memory FileSystem could allow on-the-fly compression, encryption, etc.



[1] https://bugs.openjdk.java.net/browse/JDK-8002315

[2] https://bugs.openjdk.java.net/browse/JDK-8002316
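Until such providers exist, the javac-style pipeline above can only be approximated with today's APIs. In this sketch a temporary directory stands in for the proposed MemoryFileSystem ("/target"), zipfs plays the "/dist" role, and the virtual mount layer is precisely the part the RFEs would add (all paths and names below are illustrative):

```java
import java.net.URI;
import java.nio.file.DirectoryStream;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

public class LayeredBuild {
    public static void main(String[] args) throws Exception {
        // "/target": a temp directory stands in for the proposed MemoryFileSystem
        Path target = Files.createTempDirectory("target");
        Files.writeString(target.resolve("A.class"), "compiled artifact");

        // "/dist": a zip file system holding the final jar
        Path jar = Files.createTempFile("dist", ".jar");
        Files.delete(jar); // let zipfs create it
        try (FileSystem dist = FileSystems.newFileSystem(
                URI.create("jar:" + jar.toUri()), Map.of("create", "true"))) {
            // "copy from /target to /dist"
            try (DirectoryStream<Path> ds = Files.newDirectoryStream(target)) {
                for (Path p : ds)
                    Files.copy(p, dist.getPath("/" + p.getFileName()));
            }
        } // closing the zip file system writes out the jar
        System.out.println(Files.size(jar) > 0);
    }
}
```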



Kind regards

Anthony





Re: ZipFileSystem performance regression

2019-04-17 Thread Peter Levart

Just a thought...

Would it be feasible to create a brand new "Generic Caching Filesystem" 
implementation that would delegate to another filesystem for persistent 
storage (be it ZipFilesystem or any other) and implement interesting 
caching strategies (lazy flushing, concurrent flushing, etc.)?


So instead of parameterizing a concrete filesystem (e.g. ZipFilesystem), 
the filesystems could be layered to achieve the desired performance 
characteristics.


What do you think? Is the existing filesystem API suitable for such 
layering?


Regards, Peter

On 4/16/19 11:50 PM, Claes Redestad wrote:

It sounds reasonable to have that as an option, but I'd like to see it
requested by some user first. And at least one (micro-)benchmark where
keeping entries uncompressed in memory actually shows significant
positive impact.

I can see it might have the opposite effect depending on how often that
memory is inflated/deflated and whether or not the inflated entries
cause high enough memory pressure to make GC activity spike. javac might
be one application where it could be negative as it's tuned to run with
a rather small max heap and already spends significant time doing GCs.

/Claes


Re: ZipFileSystem performance regression

2019-04-16 Thread Claes Redestad

It sounds reasonable to have that as an option, but I'd like to see it
requested by some user first. And at least one (micro-)benchmark where
keeping entries uncompressed in memory actually shows significant
positive impact.

I can see it might have the opposite effect depending on how often that
memory is inflated/deflated and whether or not the inflated entries
cause high enough memory pressure to make GC activity spike. javac might
be one application where it could be negative as it's tuned to run with
a rather small max heap and already spends significant time doing GCs.

/Claes

On 2019-04-16 23:20, Lance Andersen wrote:

Would it be worth adding a ZIP File System property, similar to createNew, 
which enables/disables the change that Claes has made, with the default being 
the pre-JDK 12 functionality?





Re: ZipFileSystem performance regression

2019-04-16 Thread Lance Andersen
Would it be worth adding a ZIP File System property, similar to createNew, 
which enables/disables the change that Claes has made, with the default being 
the pre-JDK 12 functionality?
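For context, zipfs properties are passed in the env map when the file system is created, so a new toggle would presumably slot in alongside the existing ones. A sketch using the documented "create" and "useTempFile" properties; the "eagerDeflate" name is purely hypothetical:

```java
import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

public class ZipFsEnvDemo {
    public static void main(String[] args) throws Exception {
        Path zip = Files.createTempFile("demo", ".zip");
        Files.delete(zip); // let zipfs create the archive itself
        Map<String, Object> env = Map.of(
                "create", "true",            // create the archive if absent
                "useTempFile", Boolean.TRUE  // stage entries in temp files, not on-heap
                // a toggle like the one proposed here might look like:
                // "eagerDeflate", "true"    (hypothetical property name)
        );
        try (FileSystem zfs = FileSystems.newFileSystem(
                URI.create("jar:" + zip.toUri()), env)) {
            Files.writeString(zfs.getPath("/entry.txt"), "hello");
        }
        System.out.println(Files.exists(zip));
    }
}
```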



RE: ZipFileSystem performance regression

2019-04-16 Thread Langer, Christoph
Hi,

I also think the regression should be repaired. Maybe we can have an option 
like "lazy compress" to avoid compression on write and defer it to zipfs 
closing time.

It should also be possible to parallelize deflation during close, shouldn't it?

Best regards
Christoph
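A rough sketch of the parallel-deflation-at-close idea: fan the compression out to a thread pool, then collect and write the results on a single thread. The class and method names are illustrative only, not the actual zipfs implementation:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.Deflater;

public class ParallelDeflate {
    // Deflate one entry's bytes (raw deflate, as used inside zip entries).
    static byte[] deflate(byte[] data) {
        Deflater def = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        def.setInput(data);
        def.finish();
        byte[] buf = new byte[Math.max(64, data.length)];
        int n = 0;
        while (!def.finished()) {
            if (n == buf.length) buf = Arrays.copyOf(buf, buf.length * 2);
            n += def.deflate(buf, n, buf.length - n);
        }
        def.end();
        return Arrays.copyOf(buf, n);
    }

    public static void main(String[] args) throws Exception {
        List<byte[]> entries = List.of(
                "first log file\n".repeat(200).getBytes(),
                "second log file\n".repeat(200).getBytes());
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        // Deflate concurrently...
        List<Future<byte[]>> compressed = new ArrayList<>();
        for (byte[] e : entries)
            compressed.add(pool.submit(() -> deflate(e)));
        // ...but collect and write sequentially: entry offsets and the
        // central directory must still be produced in archive order.
        for (Future<byte[]> f : compressed)
            System.out.println("compressed " + f.get().length + " bytes");
        pool.shutdown();
    }
}
```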



Re: ZipFileSystem performance regression

2019-04-16 Thread Xueming Shen
Well, I have to admit I didn't expect your usage scenario when I made the 
change. My thinking was that, for a filesystem, runtime access performance 
carries more weight than shutdown performance...

Basically you are not using zipfs as a filesystem, but as another jar tool 
that happens to have better concurrent I/O performance. Yes, back then I was 
working on using zipfs as a memory filesystem. One possible usage is for 
javac to use it as its (temporary?) filesystem to write out compiled class 
files... so I thought I could get better performance by keeping those 
classes uncompressed until the zip/jar filesystem is closed and written out 
to a "jar" file.

That said, a regression is a regression; we probably want to get the 
performance back for your usage scenario. I just wanted to give you some 
background on what happened back then.


-Sherman


On 4/16/19 12:54 PM, Lennart Börjeson wrote:

I’m using the tool I wrote to compress directories with thousands of log 
files. The standard zip utility (as well as my utility when run with JDK 12) 
takes up to an hour of user time to create the archive; on our server-class 
40+ core machines this is reduced to 1–2 minutes.

So while I understand the motivation for the change, I don’t get why you would 
want to use ZipFs for what in essence is a RAM disk, *unless* you want it 
compressed in memory?

Oh well. Do we need a new option for this?

/Lennart Börjeson

Sent from my iPhone [in Latin in the original]


On 16 Apr 2019, at 21:44, Xueming Shen wrote:

One of the motivations back then was to speed up access to those entries: you 
don't have to deflate/inflate new or updated entries during the lifetime of 
the zip filesystem, since updated entries only get compressed when they go to 
storage. So the regression is more of a performance trade-off between 
different usages. (It also simplifies the logic for handling different types 
of entries...)

One idea I experimented with long ago for the jar tool was to write out 
entries concurrently when they need compression... it does gain some 
performance on multi-core machines, but not a lot, as it ends up coming back 
to the main thread to write out to the underlying filesystem.


-Sherman


On 4/16/19 5:21 AM, Claes Redestad wrote:
Both before and after this regression, it seems the default behavior is
not to use a temporary file (until ZFS.sync(), which writes to a temp
file and then moves it in place, but that's different from what happens
with the useTempFile option enabled). Instead entries (and the backing
zip file system) are kept in-memory.

The cause of the issue here is instead that no deflation happens until
sync(), even when writing to entries in-memory. Previously, the
deflation happened eagerly, then the result of that was copied into
the zip file during sync().

I've written a proof-of-concept patch that restores the behavior of
eagerly compressing entries when the method is METHOD_DEFLATED and the
target is to store byte[]s in-memory (the default scenario):

http://cr.openjdk.java.net/~redestad/scratch/zfs.eager_deflation.00/

This restores performance of parallel zip to that of 11.0.2 for the
default case. It still has a similar regression for the case where
useTempFile is enabled, but that should be easily addressed if this
looks like a way forward?

(I've not yet created a bug as I got too caught up in trying to figure
out what was going on here...)

Thanks!

/Claes


On 2019-04-16 09:29, Alan Bateman wrote:

On 15/04/2019 14:32, Lennart Börjeson wrote:
:

Previously, the deflation was done when in the call to Files.copy, thus 
executed in parallel, and the final ZipFileSystem.close() didn't do anything 
much.


Can you submit a bug? When creating/updating a zip file with zipfs then the 
closing the file system creates the zip file. Someone needs to check but it may 
have been that the temporary files (on the file system hosting the zip file) 
were deflated when writing (which is surprising but may have been the case).

-Alan


Re: ZipFileSystem performance regression

2019-04-16 Thread Claes Redestad

I think this behavior should be reverted, and the new behavior made
opt-in via an option, if it is kept at all.

Intrusive behavior changes like this should at the very least have been
signalled via a clear, standalone CSR and not buried in what looked like
a bug fix.

/Claes

On 2019-04-16 21:54, Lennart Börjeson wrote:

:


Re: ZipFileSystem performance regression

2019-04-16 Thread Lennart Börjeson
I’m using the tool I wrote to compress directories with thousands of log files. 
The standard zip utility (as well as my utility when run with JDK 12) takes up 
to an hour of user time to create the archive; on our server-class 40+ core 
machines, the parallel version reduces this to 1–2 minutes.

So while I understand the motivation for the change, I don’t get why you would 
want to use ZipFs for what in essence is a RAM disk, *unless* you want it 
compressed in memory?

Oh well. Do we need a new option for this?

/Lennart Börjeson

Sent from my iPhone

> 16 apr. 2019 kl. 21:44 skrev Xueming Shen :
> :


Re: ZipFileSystem performance regression

2019-04-16 Thread Xueming Shen

One of the motivations back then was to speed up access to those entries: you 
don't have to deflate/inflate the new/updated entries during the lifetime of 
that zip file system. Updated entries only get compressed when they go to 
storage. So the regression is really a trade-off between the performance of 
different usages. (It also simplifies the logic of handling different types of 
entries ...)

One idea I experimented with long ago for jartool was to write out entries 
concurrently when they need compression ... it does gain some performance 
improvement on multi-core machines, but not a lot, as it ends up coming back 
to the main thread to write out to the underlying filesystem.


-Sherman
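[Editorial aside: the single-writer bottleneck described above can be sketched in a few lines: deflate entry contents in parallel, and serialize only the final write. This is a hypothetical illustration of the idea, not the actual jartool or zipfs code; all names are made up.]

```java
import java.io.ByteArrayOutputStream;
import java.util.List;
import java.util.zip.Deflater;

public class ParallelDeflate {

    // Deflate every entry on the common fork/join pool; the caller's thread
    // then drains the (order-preserving) result list and alone touches the
    // output, which is where the sequential bottleneck remains.
    static List<byte[]> deflateAll(List<byte[]> entries) {
        return entries.parallelStream()
                      .map(ParallelDeflate::deflate)
                      .toList();
    }

    static byte[] deflate(byte[] raw) {
        // nowrap=true gives raw deflate data, as stored in zip entries
        Deflater d = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        try {
            d.setInput(raw);
            d.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            while (!d.finished())
                out.write(buf, 0, d.deflate(buf));
            return out.toByteArray();
        } finally {
            d.end(); // release native zlib resources
        }
    }

    public static void main(String[] args) {
        byte[] raw = "some highly repetitive log line\n".repeat(1000).getBytes();
        List<byte[]> compressed = deflateAll(List.of(raw, raw));
        // The sequential "write out" step would now just copy these buffers.
        System.out.println(compressed.get(0).length < raw.length); // prints "true"
    }
}
```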

On 4/16/19 5:21 AM, Claes Redestad wrote:

:


Re: ZipFileSystem performance regression

2019-04-16 Thread Claes Redestad

Filed: https://bugs.openjdk.java.net/browse/JDK-8222532

I've marked the bug as potentially affecting the upcoming 11.0.3
release, since the issue that caused this regression is purportedly
being backported to that version.

/Claes

On 2019-04-16 14:21, Claes Redestad wrote:

:


Re: ZipFileSystem performance regression

2019-04-16 Thread Claes Redestad

Both before and after this regression, it seems the default behavior is
not to use a temporary file (until ZFS.sync(), which writes to a temp
file and then moves it in place, but that's different from what happens
with the useTempFile option enabled). Instead entries (and the backing
zip file system) are kept in-memory.
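[Editorial aside: both modes are selected through the env map passed when opening the file system. A minimal sketch, assuming the documented zipfs properties "create" and "useTempFile"; the class name and paths are made up:]

```java
import java.net.URI;
import java.nio.file.*;
import java.util.Map;

public class ZipFsOptions {
    public static void main(String[] args) throws Exception {
        Path zip = Files.createTempFile("demo", ".zip");
        Files.delete(zip); // "create" expects the archive not to exist yet
        URI uri = URI.create("jar:" + zip.toUri());

        // "create" builds a new archive; "useTempFile" asks zipfs to buffer
        // new entry contents in temporary files rather than byte[]s on the heap.
        Map<String, Object> env = Map.of("create", "true", "useTempFile", Boolean.TRUE);

        try (FileSystem zfs = FileSystems.newFileSystem(uri, env)) {
            Files.write(zfs.getPath("/hello.txt"), "hello, zipfs".getBytes());
        } // sync() runs here: entries are deflated and the archive is written out

        System.out.println(Files.size(zip) > 0); // prints "true"
    }
}
```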

The cause of the issue here is instead that no deflation happens until
sync(), even when writing to entries in-memory. Previously, the
deflation happened eagerly, then the result of that was copied into
the zip file during sync().

I've written a proof-of-concept patch that restores the behavior of
eagerly compressing entries when the method is METHOD_DEFLATED and the
target is to store byte[]s in-memory (the default scenario):

http://cr.openjdk.java.net/~redestad/scratch/zfs.eager_deflation.00/

This restores performance of parallel zip to that of 11.0.2 for the
default case. It still has a similar regression for the case where
useTempFile is enabled, but that should be easily addressed if this
looks like a way forward?

(I've not yet created a bug as I got too caught up in trying to figure
out what was going on here...)

Thanks!

/Claes

On 2019-04-16 09:29, Alan Bateman wrote:

:


Re: ZipFileSystem performance regression

2019-04-16 Thread Alan Bateman

On 15/04/2019 14:32, Lennart Börjeson wrote:

:

Previously, the deflation was done in the call to Files.copy, thus 
executed in parallel, and the final ZipFileSystem.close() didn't do anything 
much.

Can you submit a bug? When creating/updating a zip file with zipfs, 
closing the file system creates the zip file. Someone needs to check, 
but it may have been that the temporary files (on the file system 
hosting the zip file) were deflated when writing (which is surprising 
but may have been the case).


-Alan


RE: ZipFileSystem performance regression

2019-04-16 Thread Langer, Christoph
Hi Claes,

will you open a bug for this?

Thanks
Christoph

> -Original Message-
> From: core-libs-dev  On Behalf
> Of Lennart Börjeson
> Sent: Dienstag, 16. April 2019 09:05
> To: Claes Redestad 
> Cc: core-libs-dev@openjdk.java.net
> Subject: Re: ZipFileSystem performance regression
> 
> :



Re: ZipFileSystem performance regression

2019-04-16 Thread Lennart Börjeson
Very good, thank you!

Also note that the "new" implementation requires *a lot* more heap, since 
all *uncompressed* file content is copied to the heap before deflating.

Best regards,

/Lennart Börjeson

> 15 apr. 2019 kl. 18:34 skrev Claes Redestad :
> :



Re: ZipFileSystem performance regression

2019-04-15 Thread Claes Redestad

Hi Lennart,

I can reproduce this locally, and have narrowed it down to
https://bugs.openjdk.java.net/browse/JDK-8034802 as the cause.

As you say the compression is deferred to ZipFileSystem.close() now,
whereas previously it happened eagerly. We will have to analyze the
changes more in-depth to try and see why this is the case.

Thanks!

/Claes

On 2019-04-15 15:32, Lennart Börjeson wrote:

:




Re: ZipFileSystem performance regression

2019-04-15 Thread Lennart Börjeson
I have made a small command-line utility which creates zip archives by 
compressing the input files in parallel.

I do this by calling Files.copy(input, zipOutputStream) in a parallel Stream 
over all input files.
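[Editorial aside: a condensed, hypothetical version of that approach — each input file is copied into the zip file system from a parallel stream. This is not the actual gist; all names are made up. Whether deflation happens during the copies (pre-JDK 12) or inside close() (JDK 12) is exactly the regression under discussion.]

```java
import java.net.URI;
import java.nio.file.*;
import java.util.List;
import java.util.Map;

public class ParallelZip {

    // Copy every input file into the zip file system from a parallel stream.
    public static void zipAll(Path zip, List<Path> inputs) throws Exception {
        URI uri = URI.create("jar:" + zip.toUri());
        try (FileSystem zfs = FileSystems.newFileSystem(uri, Map.of("create", "true"))) {
            inputs.parallelStream().forEach(src -> {
                try {
                    Files.copy(src, zfs.getPath("/" + src.getFileName()));
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
        } // ZipFileSystem.sync() runs on close and writes out the archive
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("logs");
        for (int i = 0; i < 4; i++)
            Files.writeString(dir.resolve("f" + i + ".log"), "line\n".repeat(1000));
        Path zip = dir.resolve("out.zip");
        try (var files = Files.list(dir)) {
            zipAll(zip, files.filter(p -> p.toString().endsWith(".log")).toList());
        }
        System.out.println(Files.size(zip) > 0); // prints "true"
    }
}
```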

I have run this with Java 1.8, 9, 10, and 11, on both my local laptop and on 
server-class machines with up to 40 cores.

It scales rather well and I get the expected speedup, i.e. roughly proportional 
to the number of cores.

With OpenJDK 12, however, I get no speedup whatsoever. My understanding is that 
in JDK 12, the copy method simply
transfers the contents of the input stream to a ByteArrayOutputStream, which is 
then deflated when I close the ZipFileSystem (by the ZipFileSystem.sync() 
method).

Previously, the deflation was done in the call to Files.copy, thus 
executed in parallel, and the final ZipFileSystem.close() didn't do anything 
much.

(But I may of course be wrong. In case there's a simpler explanation and 
something I can remedy in my code, please let me know.)

I have a small GitHub gist for the utility here: 
https://gist.github.com/lenborje/6d2f92430abe4ba881e3c5ff83736923 



Steps to reproduce (timings on my 8-core MacBook Pro):

1) Get the contents of the gist as a single file, Zip.java
2) Compile it using Java 8.  

$ export JAVA_HOME=
$ javac -encoding utf8 Zip.java

3) run on a sufficiently large number of files to exercise the parallelism: 
(I've used about 70 text files, ca 60 MB each)

$ time java -Xmx6g Zip -p /tmp/test.zip /*.log.
Working on ZIP FileSystem jar:file:/tmp/test.zip, using the options [PARALLEL]
...
completed zip archive, now closing... done!

real    0m35.558s
user    3m58.134s
sys     0m5.543s



As is evident from the ratio between "user time" and "real time", all cores 
have been busy most of the time.

(Running with JDK 9, 10, and 11 produces similar timings.)


But running with JDK 12 defeats the parallelism:

$ export JAVA_HOME=
$ rm /tmp/test.zip # From previous run
$ time java -Xmx6g Zip -p /tmp/test.zip /*.log.
Working on ZIP FileSystem jar:file:/tmp/test.zip, using the options [PARALLEL]
...
completed zip archive, now closing... done!

real    3m1.187s
user    3m5.422s
sys     0m12.396s



Now there's almost no speedup. When observing the output, note that the 
ZipFileSystem.close() method is called immediately after the "now closing..." 
output, and "done!" is written when it returns; when running with JDK 12, 
almost all running time is apparently spent there.

I'm hoping the previous behaviour could somehow be restored, i.e. that 
deflation actually happens when I'm copying the input files to the 
ZipFileSystem, and not when I close it.


Best regards,

/Lennart Börjeson


> 12 apr. 2019 kl. 14:25 skrev Lennart Börjeson :
> 
> I've found what I believe is a rather severe performance regression in 
> ZipFileSystem. 1.8 and 11 runs OK, 12 does not.
> 
> Is this the right forum to report such issues?
> 
> Best regards,
> 
> /Lennart Börjeson



RE: ZipFileSystem performance regression

2019-04-12 Thread Langer, Christoph
Hi Lennart,

sure, excited to hear it... 😉

Thanks
Christoph

> -Original Message-
> From: core-libs-dev  On Behalf
> Of Lennart Börjeson
> Sent: Freitag, 12. April 2019 14:26
> To: core-libs-dev@openjdk.java.net
> Subject: ZipFileSystem performance regression
> 
> I've found what I believe is a rather severe performance regression in
> ZipFileSystem. 1.8 and 11 runs OK, 12 does not.
> 
> Is this the right forum to report such issues?
> 
> Best regards,
> 
> /Lennart Börjeson


Re: ZipFileSystem performance regression

2019-04-12 Thread Claes Redestad

On 2019-04-12 14:25, Lennart Börjeson wrote:

Is this the right forum to report such issues?


I'm sure there are some friendly OpenJDK contributors around
these parts who'd be interested in a description and a reproducer.

/Claes