On Wed, 30 Jun 2021 16:05:40 GMT, Jaikiran Pai <j...@openjdk.org> wrote:

>> Can I please get a review for this proposed fix for the issue reported in 
>> https://bugs.openjdk.java.net/browse/JDK-8190753?
>> 
>> The commit here checks for the size of the zip entry before trying to create 
>> a `ByteArrayOutputStream` for that entry's content. A new jtreg test has 
>> been included in this commit to reproduce the issue and verify the fix.
>> 
>> P.S: It's still a bit arguable whether it's a good idea to create the 
>> `ByteArrayOutputStream` with an initial size potentially as large as the 
>> `MAX_ARRAY_SIZE` or whether it's better to just use some smaller default 
>> value. However, I think that can be addressed separately while dealing with 
>> https://bugs.openjdk.java.net/browse/JDK-8011146
>
> Jaikiran Pai has updated the pull request incrementally with two additional 
> commits since the last revision:
> 
>  - an initial proposed test for testing compressed size entry greater than 2gb
>  - minor summary update on test

Based on the inputs received, I've now updated this PR to enhance the ZipFS to 
allow for rolling over the outputstream data, into a temporary file after 
reaching a threshold. Details follow:

- A new "tempFileThreshold" property has been introduced for ZipFileSystem. 
This property can be passed like other existing properties when creating the 
filesystem. The value of this property is expected to be a size in bytes and 
represents the threshold which will be used to decide whether and when to use a 
temporary file for outputstreams of zip file entries returned by the 
ZipFileSystem.

- The "tempFileThreshold" property is optional and if not set to any explicit 
value will default to 10MB. In other words, this feature is by default enabled, 
without the user having to do any specific configuration (not even the existing 
"useTempFile" property is requried to be set).

- To disable the threshold based temp file creation feature, a value of 0 or a 
negative value can be passed.

- A new (internal) `FileRolloverOutputStream` has been introduced in 
`ZipFileSystem` and forms the core of this enhancement. It extends the 
`ByteArrayOutputStream` and its write methods have the additional ability to 
rollover the current and subsequently written data into a temporary file 
representing that zip file entry.

- A new `ZipFSOutputStreamTest` has been added with the sole focus of verifying 
the usage of this new "tempFileThreshold" property.

- The previously added `LargeEntrySizeTest` and the manual 
`LargeCompressedEntrySizeTest` continue to stay and are mainly there to test 
the large file sizes and deflated sizes of the zip entries and verify that the 
original reported JBS issue is fixed by this enhancement. One important thing 
to note about these tests is that I've now removed the explicit "@requires" 
(memory) requirements, since after this enhancement (with "tempFileThreshold" 
by default enabled as in those tests), it no longer should require any specific 
high memory systems to run these tests.

There are still some decisions to be made:

1. Should we introduce this new property or should we enhance the existing 
"useTempFile" property to allow a size to be passed? Specifically, can 
we/should we use that existing property to allow users to set the following 
values:

        - "true", this would imply the temp file feature is enabled always 
irrespective of the zip entry size. This is how the value of "true" is 
currently handled before the patch in this PR. So no change in behaviour.

        - a byte size, represented as a `String` or an integer. This would 
imply that the user wants to enable the temp file feature, but only when the 
size or compressed size of the zip entry reaches a threshold specified by this 
value. This would mean that for sizes lesser than this the ZipFS implementation 
would use a ByteArrayOutputStream and would only rollover to a temp file when 
the threshold is reached.

        - "false", this would disable the temp file feature completely and 
outputstreams for zip entries of the ZipFS instance will always use 
ByteArrayOutputStream

        - value of "0" or "-1". This would be the same as specifying "false" 
value.

Using the existing property and just one property to control the temp file 
creation semantics will help avoid having to deal with 2 different properties 
("useTempFile" and the "tempFileThreshold"). Plus it would also prevent any 
potentially conflicting user specified values for them. For example, we won't 
have to decide what to do when a user sets useTempFile=false and 
tempFileThreshold=1234.

If we do decide to introduce this new property then some small amount of 
additional work needs to be done in this implementation to make sure 
semantically conflicting values for these 2 properties are handled correctly. I 
decided to skip that part in this round of the PR till we reached a decision 
about the properties.

2. Given that this is a new enhancement, I believe this requires a CSR.

3. Should this PR be now linked to 
https://bugs.openjdk.java.net/browse/JDK-8011146 instead of 
https://bugs.openjdk.java.net/browse/JDK-8190753?

4. I've never previously created a manual test case. The 
`LargeCompressedEntrySizeTest` in this PR is expected to be a manual test case 
(given how long it might take to run on various different systems). The only 
difference between this test case and other jtreg automated tests is the 
absence of a `@test` on this one. Is this how manual tests are written or is 
there some other way?

-------------

PR: https://git.openjdk.java.net/jdk/pull/4607

Reply via email to