On Sun, 12 Mar 2023 10:00:50 GMT, Eirik Bjorsnos <[email protected]> wrote:
>> Please review this PR which speeds up TestTooManyEntries and clarifies its
>> purpose:
>>
>> - The name 'TestTooManyEntries' does not clearly convey the purpose of the
>> test. What is tested is the validation that the total CEN size fits in a
>> Java byte array. Suggested rename: CenSizeTooLarge
>> - The test creates DEFLATED entries, which incurs zlib costs and produces File
>> Data / Data Descriptors for no additional benefit. We can use STORED instead.
>> - By creating a single LocalDateTime and setting it with
>> `ZipEntry.setTimeLocal`, we can avoid repeated time zone calculations.
>> - The names of entries are generated by calling UUID.randomUUID; we could use a
>> simple counter instead.
>> - The produced file is unnecessarily large. We know how large a CEN entry
>> is, so let's take advantage of that to create a file with the minimal size.
>> - The summary and comments of the test can be improved to help explain the
>> purpose of the test and how we reach the limit being tested.
>>
>> These speedups reduced the runtime from 4 min 17 sec to 1 min 54 sec on my
>> Macbook Pro. The produced ZIP size was reduced from 5.7 GB to 3.5 GB.
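
For context, the validation being exercised here boils down to opening the
produced file and expecting it to be rejected. A minimal sketch of that check,
assuming TestNG's `Assert.assertThrows` and that a `ZipException` is what
surfaces (the method name is just a placeholder):

```java
@Test
public void centralDirectoryTooLargeShouldBeRejected() {
    // Opening a ZIP whose total CEN size cannot fit in a Java byte array
    // should fail during the initial central directory validation
    assertThrows(ZipException.class, () -> new ZipFile(hugeZipFile));
}
```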
>
> Eirik Bjorsnos has updated the pull request incrementally with one additional
> commit since the last revision:
>
> Use Integer.toString instead of Long.toString
>
> Here's a wild idea:
>
> We could inject large extra fields into the CEN entries to inflate the size of
> each CEN entry. This means we get away with far fewer CEN entries, which in
> turn means much less memory. Also, writing mostly empty extra data seems to be
> really fast, at least on my Macbook Pro.
>
> For me, the test now runs without the `@requires` and completes in less than 5
> seconds.
>
> The cost is slightly more complicated code. But perhaps still worth the added
> complexity?
>
> ```java
> @BeforeTest
> public void setup() throws IOException {
>     hugeZipFile = new File("cen-too-large.zip");
>     hugeZipFile.deleteOnExit();
>
>     // Length of generated entry names
>     int nameLength = 10;
>
>     // We use a large extra field to get away with fewer CEN headers
>     byte[] extra = makeLargeExtra();
>
>     // The size of a single CEN header, including name and extra field
>     int cenHeaderSize = ZipFile.CENHDR + nameLength + extra.length;
>
>     // Determine the number of entries needed to exceed MAX_CEN_SIZE
>     int numEntries = (MAX_CEN_SIZE / cenHeaderSize) + 1;
>
>     ZipEntry[] entries = new ZipEntry[numEntries];
>     try (ZipOutputStream zip = new ZipOutputStream(
>             new BufferedOutputStream(new FileOutputStream(hugeZipFile)))) {
>         // Creating the LocalDateTime once allows faster processing
>         LocalDateTime time = LocalDateTime.now();
>         // Add entries until MAX_CEN_SIZE is reached
>         for (int i = 0; i < numEntries; i++) {
>             String name = Integer.toString(i);
>             name = "0".repeat(nameLength - name.length()) + name;
>             ZipEntry entry = entries[i] = new ZipEntry(name);
>             // Use STORED for faster processing
>             entry.setMethod(ZipEntry.STORED);
>             entry.setSize(0);
>             entry.setCrc(0);
>             entry.setTimeLocal(time);
>             zip.putNextEntry(entry);
>         }
>         zip.closeEntry();
>         // Set the extra field after the LOC headers have been written,
>         // so it ends up only in the CEN headers
>         for (ZipEntry entry : entries) {
>             entry.setExtra(extra);
>         }
>     }
> }
>
> /**
>  * Make an extra field with an 'unknown' tag and the maximum possible data size
>  *
>  * @return an extra field of Short.MAX_VALUE bytes
>  */
> private static byte[] makeLargeExtra() {
>     byte[] extra = new byte[Short.MAX_VALUE];
>     ByteBuffer buffer = ByteBuffer.wrap(extra).order(ByteOrder.LITTLE_ENDIAN);
>     buffer.putShort((short) 0x9902); // 'unknown' tag
>     buffer.putShort((short) (extra.length - 2 * Short.BYTES)); // Data size
>     return extra;
> }
> ```
I think this is probably OK. I would probably create a constant for the Data
Size to make it clearer, and add a more detailed comment to this method
(laying out the structure of the extra field), along with a pointer to section
4.6.1 of the APPNOTE.
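
Roughly along these lines (just a sketch; the constant name `EXTRA_DATA_SIZE`
and the comment wording are placeholders):

```java
// Data Size of the extra field: the maximum extra field length minus the
// four bytes taken up by the Header ID and Data Size fields themselves
private static final int EXTRA_DATA_SIZE = Short.MAX_VALUE - 2 * Short.BYTES;

/**
 * Makes an extra field with the maximum possible data size, so that each
 * CEN header becomes as large as possible and far fewer entries are needed
 * to exceed the maximum CEN size.
 *
 * The extra field has the following structure:
 *
 *   Header ID (2 bytes) | Data Size (2 bytes) | Data (EXTRA_DATA_SIZE bytes)
 *
 * The Header ID 0x9902 is an 'unknown' tag and the data is left as zeros;
 * see the APPNOTE (section 4.6.1, as mentioned above) for the extra field
 * mappings.
 *
 * @return an extra field of Short.MAX_VALUE bytes
 */
private static byte[] makeLargeExtra() {
    byte[] extra = new byte[2 * Short.BYTES + EXTRA_DATA_SIZE];
    ByteBuffer buffer = ByteBuffer.wrap(extra).order(ByteOrder.LITTLE_ENDIAN);
    buffer.putShort((short) 0x9902);          // Header ID: 'unknown' tag
    buffer.putShort((short) EXTRA_DATA_SIZE); // Data Size
    return extra;
}
```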
-------------
PR: https://git.openjdk.org/jdk/pull/12991