Any chance we might get this repaired in time for the Java 13 ramp down?

Best regards,

/Lennart Börjeson

> 16 apr. 2019 kl. 23:02 skrev Langer, Christoph <christoph.lan...@sap.com>:
> 
> Hi,
> 
> I also think the regression should be repaired and maybe we can have an 
> option like "lazy compress" to avoid compression on write but defer it to 
> zipfs closing time.
> 
> It should also be possible to parallelize deflation during close, shouldn't 
> it?
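[Editor's note: the parallel-deflation idea above can be sketched with plain java.util.zip primitives. The class name and structure below are hypothetical illustrations, not ZipFileSystem internals.]

```java
import java.io.ByteArrayOutputStream;
import java.util.List;
import java.util.stream.Collectors;
import java.util.zip.Deflater;

// Hypothetical sketch: compress entry payloads in parallel at close time,
// then let the caller write the results out sequentially in entry order.
public class ParallelDeflate {

    // Deflate every entry on the common fork/join pool; the result list
    // keeps the input order, so the sequential write-out that follows can
    // preserve the central directory order.
    static List<byte[]> deflateAll(List<byte[]> entries) {
        return entries.parallelStream()
                      .map(ParallelDeflate::deflate)
                      .collect(Collectors.toList());
    }

    // Raw (nowrap) deflate of a single byte[], as used inside zip entries.
    static byte[] deflate(byte[] data) {
        Deflater def = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        def.setInput(data);
        def.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        while (!def.finished()) {
            out.write(buf, 0, def.deflate(buf));
        }
        def.end();
        return out.toByteArray();
    }
}
```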
> 
> Best regards
> Christoph
> 
>> -----Original Message-----
>> From: core-libs-dev <core-libs-dev-boun...@openjdk.java.net> On Behalf
>> Of Xueming Shen
>> Sent: Dienstag, 16. April 2019 22:50
>> To: Lennart Börjeson <lenbo...@gmail.com>
>> Cc: core-libs-dev@openjdk.java.net
>> Subject: Re: ZipFileSystem performance regression
>> 
>> Well, I have to admit I didn't expect your use scenario when I made the
>> change. I assumed that for a filesystem, runtime access performance carries
>> more weight than shutdown performance... basically you are not using zipfs
>> as a filesystem, but as another jar tool that happens to have better
>> concurrent in/out performance. Yes, back then I was working on using zipfs
>> as a memory filesystem. One possible usage is for javac to use it as its
>> (temporary?) filesystem to write out compiled class files ... so I thought
>> I could get better performance if I kept those classes uncompressed until
>> the zip/jar filesystem is closed and written out to a "jar" file.
>> 
>> That said, a regression is a regression, and we probably want to get the
>> performance back for your use scenario. I just wanted to give you some
>> background on what happened back then.
>> 
>> 
>> -Sherman
>> 
>> 
>> On 4/16/19 12:54 PM, Lennart Börjeson wrote:
>>> I’m using the tool I wrote to compress directories with thousands of log
>>> files. The standard zip utility (as well as my utility when run with
>>> JDK 12) takes up to an hour of user time to create the archive; on our
>>> server-class 40+ core servers this is reduced to 1–2 minutes.
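[Editor's note: the usage pattern described above — copying many files into a zipfs archive in parallel and relying on close() to produce the zip — might look roughly like the sketch below. Class name, paths, and structure are invented for illustration; this is not the poster's actual tool.]

```java
import java.nio.file.*;
import java.util.Map;
import java.util.stream.Stream;

// Hypothetical sketch: walk a source tree and copy every regular file
// into a zip file system in parallel, then close the file system, which
// writes out the archive.
public class ParallelZip {
    public static void main(String[] args) throws Exception {
        Path src = Paths.get(args[0]);
        Path zip = Paths.get(args[1]);
        Map<String, String> env = Map.of("create", "true");
        try (FileSystem zfs = FileSystems.newFileSystem(
                java.net.URI.create("jar:" + zip.toUri()), env)) {
            try (Stream<Path> files = Files.walk(src)) {
                files.filter(Files::isRegularFile)
                     .parallel()
                     .forEach(p -> {
                         try {
                             Path dst = zfs.getPath(src.relativize(p).toString());
                             if (dst.getParent() != null)
                                 Files.createDirectories(dst.getParent());
                             Files.copy(p, dst, StandardCopyOption.REPLACE_EXISTING);
                         } catch (java.io.IOException e) {
                             throw new java.io.UncheckedIOException(e);
                         }
                     });
            }
        } // close() produces the archive; under JDK 12 all deflation happens here
    }
}
```

With eager deflation (pre-JDK 12 behavior), the compression cost is paid inside the parallel `Files.copy` calls; with deferred deflation it all lands in the single-threaded close.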
>>> 
>>> So while I understand the motivation for the change, I don’t get why you
>>> would want to use ZipFs for what is in essence a RAM disk, *unless* you
>>> want it compressed in memory?
>>> 
>>> Oh well. Do we need a new option for this?
>>> 
>>> /Lennart Börjeson
>>> 
>>> Electrogramma ab iPhono meo missum est
>>> 
>>>> 16 apr. 2019 kl. 21:44 skrev Xueming Shen <xueming.s...@gmail.com>:
>>>> 
>>>> One of the motivations back then was to speed up access to those
>>>> entries: you don't have to deflate/inflate new/updated entries during
>>>> the lifetime of the zip filesystem. Those updated entries only get
>>>> compressed when they go to storage. So the regression is more of a
>>>> performance trade-off between different usages. (It also simplifies
>>>> the logic for handling different types of entries ...)
>>>> 
>>>> One idea I experimented with long ago for the jar tool was to
>>>> concurrently write out entries that need compression ... it does gain
>>>> some performance improvement on multi-core machines, but not a lot, as
>>>> it ends up coming back to the main thread to write out to the
>>>> underlying filesystem.
>>>> 
>>>> 
>>>> -Sherman
>>>> 
>>>>> On 4/16/19 5:21 AM, Claes Redestad wrote:
>>>>> Both before and after this regression, it seems the default behavior is
>>>>> not to use a temporary file (until ZFS.sync(), which writes to a temp
>>>>> file and then moves it in place, but that's different from what happens
>>>>> with the useTempFile option enabled). Instead entries (and the backing
>>>>> zip file system) are kept in-memory.
>>>>> 
>>>>> The cause of the issue here is instead that no deflation happens until
>>>>> sync(), even when writing to entries in-memory. Previously, the
>>>>> deflation happened eagerly, then the result of that was copied into
>>>>> the zip file during sync().
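[Editor's note: the eager-vs-deferred distinction explained above can be illustrated with a toy sketch. This is not the actual ZipFileSystem code; the class and method names are invented.]

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.DeflaterOutputStream;

// Hypothetical sketch of the two strategies: eager deflation compresses
// on the writing thread (so callers writing entries in parallel spread
// the cost), while deferred deflation keeps raw bytes and compresses
// everything at sync()/close() time.
public class EntryBuffer {
    private byte[] stored;        // deflated (eager) or raw (deferred)
    private final boolean eager;

    EntryBuffer(boolean eager) { this.eager = eager; }

    void write(byte[] data) throws IOException {
        stored = eager ? deflate(data) : data;   // pre-12: eager; 12: deferred
    }

    byte[] bytesForSync() throws IOException {
        return eager ? stored : deflate(stored); // deferred pays the cost here
    }

    private static byte[] deflate(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DeflaterOutputStream dos = new DeflaterOutputStream(bos)) {
            dos.write(data);
        }
        return bos.toByteArray();
    }
}
```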
>>>>> 
>>>>> I've written a proof-of-concept patch that restores the behavior of
>>>>> eagerly compressing entries when the method is METHOD_DEFLATED and the
>>>>> target is to store byte[]s in-memory (the default scenario):
>>>>> 
>>>>> http://cr.openjdk.java.net/~redestad/scratch/zfs.eager_deflation.00/
>>>>> 
>>>>> This restores performance of parallel zip to that of 11.0.2 for the
>>>>> default case. It still has a similar regression for the case where
>>>>> useTempFile is enabled, but that should be easily addressed if this
>>>>> looks like a way forward?
>>>>> 
>>>>> (I've not yet created a bug as I got too caught up in trying to figure
>>>>> out what was going on here...)
>>>>> 
>>>>> Thanks!
>>>>> 
>>>>> /Claes
>>>>> 
>>>>>> On 2019-04-16 09:29, Alan Bateman wrote:
>>>>>>> On 15/04/2019 14:32, Lennart Börjeson wrote:
>>>>>>> :
>>>>>>> 
>>>>>>> Previously, the deflation was done in the call to Files.copy, thus
>>>>>>> executed in parallel, and the final ZipFileSystem.close() didn't do
>>>>>>> much.
>>>>>>> 
>>>>>> Can you submit a bug? When creating/updating a zip file with zipfs,
>>>>>> closing the file system creates the zip file. Someone needs to check,
>>>>>> but it may have been that the temporary files (on the file system
>>>>>> hosting the zip file) were deflated when writing (which is surprising
>>>>>> but may have been the case).
>>>>>> 
>>>>>> -Alan
