On Tue, 7 Mar 2023 07:46:25 GMT, Eirik Bjorsnos <d...@openjdk.org> wrote:

> ZipOutputStream currently writes directory entries using the DEFLATED 
> compression method. This does not strictly comply with the APPNOTE.TXT 
> specification and is also about 10x slower than using the STORED compression 
> method.
> 
> Because of these concerns, `ZipOutputStream.putNextEntry` should be updated 
> with an `@apiNote` recommending
> the use of the STORED compression method for directory entries.
> 
> Suggested CSR in the first comment.

Suggested CSR:

### Compatibility kind
none

### Compatibility risk
minimal

### Compatibility description
This is a documentation-only change

### Summary
Add an `@apiNote`to `ZipOutputStream.putNextEntry` recommending that directory 
entries should be added using the `STORED` compression method.

### Problem

ZipOutputStream currently writes directory entries using the default DEFLATE 
method. This causes file data for a two-byte 'final empty' DEFLATE block to be 
written, followed by a 16-byte data descriptor.

This is in violation of the APPNOTE.txt specification, which mandates that 
directory entries `MUST NOT` have file data:


 4.3.8  File data

      Immediately following the local header for a file
      SHOULD be placed the compressed or stored data for the file.
      If the file is encrypted, the encryption header for the file 
      SHOULD be placed after the local header and before the file 
      data. The series of [local file header][encryption header]
      [file data][data descriptor] repeats for each file in the 
      .ZIP archive. 

      Zero-byte files, directories, and other file types that 
      contain no content MUST NOT include file data.
``` 

Additionally, benchmarks show that the writing of these empty DEFLATED 
directory entries are ~10X slower compared to an empty STORED entry. 

While the `jar` command uses the STORED method for directory entries, the 
DEFLATE method still seems to be prevalent: An analysis of the 109 dependency 
jars of the Spring Petclinic project shows that 65 files had DEFLATE directories
while 34 files has STORED directories.  

### Solution
Add an `@apiNote` to `ZipOutputStream.putNextEntry` recommending the use of theĀ 
STORED compression method for directory entries. The note should include a 
snippet which shows the recommended configuration of a directory `ZipEntry`.

(As an alternative solution, `putNextEntry` could be updated to change the 
default compression method to STORED for directory entires. This was deemed as 
having a too high risk, since users may be depending of the ability to attach 
arbitrary data to directory entries.)

###  Specification

- Add the following `@apiNote` to `ZipOutputStream.putNextEntry`


Index: src/java.base/share/classes/java/util/zip/ZipOutputStream.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/src/java.base/share/classes/java/util/zip/ZipOutputStream.java 
b/src/java.base/share/classes/java/util/zip/ZipOutputStream.java
--- a/src/java.base/share/classes/java/util/zip/ZipOutputStream.java    
(revision f2b03f9a2c0fca853211e41a1ddf46195dd56698)
+++ b/src/java.base/share/classes/java/util/zip/ZipOutputStream.java    
(revision f57735cf134469b49cd19472680aee778c245771)
@@ -191,6 +191,22 @@
      * <p>
      * The current time will be used if the entry has no set modification time.
      *
+     * @apiNote When writing a directory entry, the STORED compression method
+     * should be used and the size and CRC-32 values should be set to 0:
+     *
+     * {@snippet lang="java" :
+     *     ZipEntry e = ...;
+     *     if (e.isDirectory()) {
+     *         e.setMethod(ZipEntry.STORED);
+     *         e.setSize(0);
+     *         e.setCrc(0);
+     *     }
+     *     stream.putNextEntry(e);
+     * }
+     *
+     * This ensures strict compliance with the ZIP specification and
+     * allows optimal performance when processing directory entries.
+     *
      * @param     e the ZIP entry to be written
      * @throws    ZipException if a ZIP format error has occurred
      * @throws    IOException if an I/O error has occurred


###

Here's what the generated API note looks like:

<img width="838" alt="image" 
src="https://user-images.githubusercontent.com/300291/223421803-8541eafb-298f-4064-af67-2d93ee061114.png";>

Looking for reviewers for this CSR  which adds an `@apiNote` recommending the 
use of the STORED compression method when writing directory entries in 
ZipOutputStream:
   
https://bugs.openjdk.org/browse/JDK-8303925

The CSR was initially written by me, then edited by Lance.

-------------

PR: https://git.openjdk.org/jdk/pull/12899

Reply via email to