On Mon, 30 Oct 2023 15:50:49 GMT, Eirik Bjorsnos wrote:
>> Please review this PR which speeds up TestTooManyEntries and clarifies its
>> purpose:
>>
>> - The name 'TestTooManyEntries' does not clearly convey the purpose of the
>> test. What is tested is the validation that the total CEN size fits in a
>> Java byte array. Suggested rename: CenSizeTooLarge
>> - The test creates DEFLATED entries, which incur zlib costs and File Data /
>> Data Descriptors for no additional benefit. We can use STORED instead.
>> - By creating a single LocalDateTime and setting it with
>> `ZipEntry.setTimeLocal`, we can avoid repeated time zone calculations.
>> - Entry names are generated by calling UUID.randomUUID; we could use a
>> simple counter instead.
>> - The produced file is unnecessarily large. We know how large a CEN entry
>> is; let's take advantage of that to create a file of minimal size.
>> - By adding a maximally large extra field to the CEN entries, we get away
>> with fewer CEN records and save memory.
>> - The summary and comments of the test can be improved to help explain the
>> purpose of the test and how we reach the limit being tested.
>> - By writing sparse 'holes' until the last CEN entry, we can reduce required
>> disk space.
>>
>> These speedups reduced the runtime from 4 min 17 sec to 3 seconds on my
>> MacBook Pro. The produced ZIP size was reduced from 5.7 GB to ~4 KB. Memory
>> consumption is down from 8 GB to roughly 12 MB.
>
> Eirik Bjorsnos has updated the pull request incrementally with one additional
> commit since the last revision:
>
> Replace the 'afterLastCEN' boolean with a 'sparse' state variable
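A rough sketch of several techniques from the PR description above, assuming nothing about how `CenSizeTooLarge` actually implements them: empty STORED entries (size and CRC set up front, so no deflater and no data descriptor), one shared `LocalDateTime` passed to `ZipEntry.setTimeLocal`, counter-based names instead of `UUID.randomUUID`, and a single extra field that fills the 16-bit extra-length field. The class name, the `0x9902` header ID and all helper methods are hypothetical. Note that `ZipEntry.setExtra` data ends up in both the LOC and CEN headers. The arithmetic helpers show how knowing the CEN record size (a 46-byte fixed header plus name, extra and comment) gives the minimum number of records; the limit passed in is an assumption, e.g. something near `Integer.MAX_VALUE` for a CEN that must fit in a Java byte array.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.time.LocalDateTime;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class CenEntrySketch {
    // Fixed part of a central directory (CEN) header, per the ZIP APPNOTE.
    static final int CEN_HEADER_SIZE = 46;
    // The extra field length is a 16-bit value, so 0xFFFF is the maximum.
    static final int MAX_EXTRA_SIZE = 0xFFFF;
    // Hypothetical header ID for an otherwise meaningless extra block.
    static final int DUMMY_EXTRA_ID = 0x9902;

    /** Size in bytes of one CEN record with the given variable-length parts. */
    static long cenRecordSize(int nameLen, int extraLen, int commentLen) {
        return CEN_HEADER_SIZE + nameLen + extraLen + commentLen;
    }

    /** Smallest number of equally sized records whose total exceeds limit. */
    static long recordsToExceed(long limit, long recordSize) {
        return limit / recordSize + 1;
    }

    /** One extra block whose 4-byte header plus data fills all 0xFFFF bytes. */
    static byte[] maxExtraField() {
        byte[] extra = new byte[MAX_EXTRA_SIZE];
        int dataSize = MAX_EXTRA_SIZE - 4; // 2-byte header ID + 2-byte data size
        extra[0] = (byte) DUMMY_EXTRA_ID;  // little-endian header ID
        extra[1] = (byte) (DUMMY_EXTRA_ID >>> 8);
        extra[2] = (byte) dataSize;        // little-endian data size
        extra[3] = (byte) (dataSize >>> 8);
        return extra;
    }

    /** Writes count empty STORED entries, reusing one timestamp and extra field. */
    static byte[] writeEntries(int count) throws IOException {
        LocalDateTime time = LocalDateTime.now(); // computed once, reused below
        byte[] extra = maxExtraField();
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (int i = 0; i < count; i++) {
                ZipEntry entry = new ZipEntry(Integer.toString(i)); // counter name
                entry.setMethod(ZipEntry.STORED);
                entry.setSize(0); // STORED needs size and CRC before putNextEntry
                entry.setCrc(0);  // CRC-32 of empty data is 0
                entry.setTimeLocal(time);
                entry.setExtra(extra);
                zip.putNextEntry(entry);
                zip.closeEntry();
            }
        }
        return bytes.toByteArray();
    }
}
```

For example, if every name were one byte and there were no comment, each record would be 46 + 1 + 65535 = 65582 bytes, and `recordsToExceed(Integer.MAX_VALUE, 65582)` returns 32746.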
test/jdk/java/util/zip/ZipFile/CenSizeTooLarge.java line 204:
> 202: channel.position(position);
> 203: // Check for last CEN record
> 204: if (Arrays.equals(LAST_CEN_COMMENT_BYTES, 0, LAST_CEN_COMMENT_BYTES.length, b, off, len)) {
The way an instance of `SparseOutputStream` is used in this test, it gets
passed to the constructor of `ZipOutputStream`. That means there is no
buffering involved when `ZipOutputStream` internally writes bytes to the
output stream. The implementation of `ZipOutputStream` writes out the
`ZipEntry`'s comment (if any) in a single `write(byte[], int, int)` call on the
output stream, so the `byte[] b` passed in here is guaranteed to hold the
entire comment (the `len` bytes starting at `off`). This `Arrays.equals(...)`
check is therefore guaranteed to pass when that specific entry is written out.
So this check looks good to me.
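To illustrate the reasoning above, here is a hedged, self-contained sketch (not the code under review) of a sparse stream passed directly to a `ZipOutputStream`: it only advances a logical position until one `write(byte[], int, int)` call carries exactly the marker comment, then writes the remaining bytes through a `FileChannel`. `SparseZipSketch`, `writeSparseZip` and the marker text are hypothetical. If `ZipOutputStream` ever split the comment across calls or merged it with other bytes, `markerSeen` would stay false. Note that the range overload `Arrays.equals(a, aFrom, aTo, b, bFrom, bTo)` takes exclusive end indices, so the sketch passes `off + len`; passing `len` as in the quoted line is equivalent whenever `off` is 0.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class SparseZipSketch {
    // Hypothetical marker: the comment of the last entry ends the last CEN record
    static final String LAST_CEN_COMMENT = "last-cen-entry";
    static final byte[] LAST_CEN_COMMENT_BYTES =
            LAST_CEN_COMMENT.getBytes(StandardCharsets.UTF_8);

    /** Leaves a hole for every byte up to and including the marker, then writes. */
    static class SparseOutputStream extends OutputStream {
        private final FileChannel channel;
        private long pos;              // logical position in the ZIP stream
        private boolean sparse = true; // true until the marker has been seen
        boolean markerSeen;

        SparseOutputStream(FileChannel channel) {
            this.channel = channel;
        }

        long position() {
            return pos;
        }

        @Override
        public void write(int b) throws IOException {
            write(new byte[] {(byte) b}, 0, 1);
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            if (sparse) {
                // To-indices are exclusive; matches only if one call holds the whole comment
                if (Arrays.equals(LAST_CEN_COMMENT_BYTES, 0, LAST_CEN_COMMENT_BYTES.length,
                        b, off, off + len)) {
                    sparse = false;
                    markerSeen = true;
                }
                pos += len; // nothing reaches the disk: these bytes become a hole
            } else {
                channel.position(pos); // may lie past EOF; the write grows the file
                ByteBuffer buf = ByteBuffer.wrap(b, off, len);
                while (buf.hasRemaining()) {
                    channel.write(buf);
                }
                pos += len;
            }
        }
    }

    /** Writes two empty STORED entries; only the last one carries the marker. */
    static SparseOutputStream writeSparseZip(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.WRITE)) {
            SparseOutputStream out = new SparseOutputStream(channel);
            try (ZipOutputStream zip = new ZipOutputStream(out)) { // no buffering in between
                for (String name : new String[] {"first", "last"}) {
                    ZipEntry entry = new ZipEntry(name);
                    entry.setMethod(ZipEntry.STORED);
                    entry.setSize(0);
                    entry.setCrc(0);
                    if (name.equals("last")) {
                        entry.setComment(LAST_CEN_COMMENT);
                    }
                    zip.putNextEntry(entry);
                    zip.closeEntry();
                }
            } // close() writes the CEN and END records through the sparse stream
            return out;
        }
    }
}
```

The skipped region becomes a hole; per the `FileChannel` documentation its contents are unspecified, and file systems that support sparse files do not allocate blocks for it.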
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/12991#discussion_r1393760810