On Tue, 7 Mar 2023 07:46:25 GMT, Eirik Bjorsnos <d...@openjdk.org> wrote:
> ZipOutputStream currently writes directory entries using the DEFLATED > compression method. This does not strictly comply with the APPNOTE.TXT > specification and is also about 10x slower than using the STORED compression > method. > > Because of these concerns, `ZipOutputStream.putNextEntry` should be updated > with an `@apiNote` recommending > the use of the STORED compression method for directory entries. > > Suggested CSR in the first comment. Suggested CSR: ### Compatibility kind none ### Compatibility risk minimal ### Compatibility description This is a documentation-only change ### Summary Add an `@apiNote`to `ZipOutputStream.putNextEntry` recommending that directory entries should be added using the `STORED` compression method. ### Problem ZipOutputStream currently writes directory entries using the default DEFLATE method. This causes file data for a two-byte 'final empty' DEFLATE block to be written, followed by a 16-byte data descriptor. This is in violation of the APPNOTE.txt specification, which mandates that directory entries `MUST NOT` have file data: 4.3.8 File data Immediately following the local header for a file SHOULD be placed the compressed or stored data for the file. If the file is encrypted, the encryption header for the file SHOULD be placed after the local header and before the file data. The series of [local file header][encryption header] [file data][data descriptor] repeats for each file in the .ZIP archive. Zero-byte files, directories, and other file types that contain no content MUST NOT include file data. ``` Additionally, benchmarks show that the writing of these empty DEFLATED directory entries are ~10X slower compared to an empty STORED entry. While the `jar` command uses the STORED method for directory entries, the DEFLATE method still seems to be prevalent: An analysis of the 109 dependency jars of the Spring Petclinic project shows that 65 files had DEFLATE directories while 34 files has STORED directories. ### Solution Add an `@apiNote` to `ZipOutputStream.putNextEntry` recommending the use of theĀ STORED compression method for directory entries. The note should include a snippet which shows the recommended configuration of a directory `ZipEntry`. (As an alternative solution, `putNextEntry` could be updated to change the default compression method to STORED for directory entires. This was deemed as having a too high risk, since users may be depending of the ability to attach arbitrary data to directory entries.) ### Specification - Add the following `@apiNote` to `ZipOutputStream.putNextEntry` Index: src/java.base/share/classes/java/util/zip/ZipOutputStream.java IDEA additional info: Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP <+>UTF-8 =================================================================== diff --git a/src/java.base/share/classes/java/util/zip/ZipOutputStream.java b/src/java.base/share/classes/java/util/zip/ZipOutputStream.java --- a/src/java.base/share/classes/java/util/zip/ZipOutputStream.java (revision f2b03f9a2c0fca853211e41a1ddf46195dd56698) +++ b/src/java.base/share/classes/java/util/zip/ZipOutputStream.java (revision f57735cf134469b49cd19472680aee778c245771) @@ -191,6 +191,22 @@ * <p> * The current time will be used if the entry has no set modification time. * + * @apiNote When writing a directory entry, the STORED compression method + * should be used and the size and CRC-32 values should be set to 0: + * + * {@snippet lang="java" : + * ZipEntry e = ...; + * if (e.isDirectory()) { + * e.setMethod(ZipEntry.STORED); + * e.setSize(0); + * e.setCrc(0); + * } + * stream.putNextEntry(e); + * } + * + * This ensures strict compliance with the ZIP specification and + * allows optimal performance when processing directory entries. + * * @param e the ZIP entry to be written * @throws ZipException if a ZIP format error has occurred * @throws IOException if an I/O error has occurred ### Here's what the generated API note looks like: <img width="838" alt="image" src="https://user-images.githubusercontent.com/300291/223421803-8541eafb-298f-4064-af67-2d93ee061114.png"> Looking for reviewers for this CSR which adds an `@apiNote` recommending the use of the STORED compression method when writing directory entries in ZipOutputStream: https://bugs.openjdk.org/browse/JDK-8303925 The CSR was initially written by me, then edited by Lance. ------------- PR: https://git.openjdk.org/jdk/pull/12899