Re: RFR: 8253952: Work around wrong usage of ZipOutputStream.putNextEntry() in user code

2020-10-07 Thread Lance Andersen
On Tue, 6 Oct 2020 10:02:09 GMT, Volker Simonis  wrote:

> ### Summary
> 
> Work around wrong usage of `ZipOutputStream.putNextEntry()` in user code 
> which can lead to the `ZipException "invalid
> entry compressed size"`.
> ### Motivation
> 
> In general it is not safe to directly write a ZipEntry obtained from 
> `ZipInputStream.getNextEntry()`,
> `ZipFile.entries()`, `ZipFile.getEntry()` or `ZipFile.stream()` with 
> `ZipOutputStream.putNextEntry()` to a
> `ZipOutputStream` and then read the entries data from the `ZipInputStream` 
> and write it to the `ZipOutputStream` as
> follows:
>  ZipEntry entry;
>  ZipInputStream zis = new ZipInputStream(...);
>  ZipOutputStream zos = new ZipOutputStream(...);
>  while((entry = zis.getNextEntry()) != null) {
>  zos.putNextEntry(entry);
>  zis.transferTo(zos);
>  }
> The problem with this code is that the zip file format does not record the 
> compression level used for deflation in its
> entries. In general, it doesn't even mandate a predefined compression ratio 
> per compression level. Therefore the
> compressed size recorded in a `ZipEntry` read from a zip file might differ 
> from the new compressed size produced by the
> receiving `ZipOutputStream`. Such a difference will result in a 
> `ZipException` with the following message:
>  java.util.zip.ZipException: invalid entry compressed size (expected 12 but 
> got 7 bytes)
>  
> The correct way of copying all entries from one zip file into another 
> requires the creation of a new `ZipEntry` or at
> least resetting of the compressed size field. E.g.:
>  while((entry = zis.getNextEntry()) != null) {
>  ZipEntry newEntry = new ZipEntry(entry.getName());
>  zos.putNextEntry(newEntry);
>  zis.transferTo(zos);
>  }
> or:
>  while((entry = zis.getNextEntry()) != null) {
>  entry.setCompressedSize(-1);
>  zos.putNextEntry(entry);
>  zis.transferTo(zos);
>  }
> Unfortunately, there's a lot of user code out there which gets this wrong and 
> uses the bad coding pattern described
> before. Searching for `"java.util.zip.ZipException: invalid entry compressed 
> size (expected 12 but got 7 bytes)"` gives
> ~2500 hits (~100 on StackOverflow). It's also no hard to find plenty of 
> instances of this anti-pattern on GitHub when
> doing a code search for `ZipEntry` and `putNextEntry()`. E.g. [Gradle 4.x 
> wrapper task][1] is affected as well as the
> latest version of the [mockableAndroidJar task][2]. I've recently fixed two 
> occurrences of this pattern in OpenJDK (see
> [JDK-8240333][3] and [JDK-8240235][4]) but there still exist more of them 
> (e.g.
> [`test/jdk/java/util/zip/ZipFile/CopyJar.java`][5] which is there since 1999 
> :).  ### Description  So while this has
> clearly been a problem before, it apparently wasn't painful enough to trigger 
> any action from the side of the JDK.
> However, recently quite some zlib forks with [superior deflate/inflate 
> performance have evolved][6]. Using them with
> OpenJDK is quite straight-forward: one just has to configure the alternative 
> implementations by setting
> `LD_LIBRARY_PATH` or `LD_PRELOAD` correspondingly. We've seen big saving by 
> using these new zlib implementations for
> selected services in production and the only reason why we haven't enabled 
> them by default until now is the problem
> I've just described. The reason why these new libraries uncover the described 
> anti-pattern much more often is because
> their compression ratio is slightly different from that of the default zlib 
> library. This can easily trigger a
> `ZipException` even if an application is not using a different compression 
> levels but just a zip file created with
> another zlib version.  I'd therefore like to propose the following workaround 
> for the wrong
> `ZipOutputStream.putNextEntry()` usage in user code:
> -  ignore the compressed size if it was implicitly determined from the zip 
> file and not explicitly set by calling
>`ZipEntry.setCompressedSize()`.
> 
> - Change the API-documentation of `ZipOutputStream.putNextEntry()` and 
> `JarOutputStream.putNextEntry()` to explain the
>   problem and why `putNextEntry()` will ignore the compressed size of a 
> `ZipEntry` if that was set implicitely when
>   reading that entry from a `ZipFile` or `ZipInputStream`.
> 
> 
> ### Technical Details
> 
> A zip file consists of a stream of File Entries followed by a Central 
> Directory (see [here for a more detailed
> specification][7]). Each File Entry is composed of a Local File Header (LFH) 
> followed by the compressed Data and an
> optional Data Descriptor. The LFH contains the File Name and among other 
> attributes the Compressed and Uncompressed
> size and CRC of the Data. In the case where the latter three attributes are 
> not available at the time when the LFH is
> created, this fact will be recorded in a flag of the LFH and will trigger the 
> creation of a Data Descriptor with the
> corresponding information right after the Data section

Re: RFR: 8253952: Work around wrong usage of ZipOutputStream.putNextEntry() in user code

2020-10-07 Thread Lance Andersen
On Wed, 7 Oct 2020 15:10:06 GMT, Alan Bateman  wrote:

>> ### Summary
>> 
>> Work around wrong usage of `ZipOutputStream.putNextEntry()` in user code 
>> which can lead to the `ZipException "invalid
>> entry compressed size"`.
>> ### Motivation
>> 
>> In general it is not safe to directly write a ZipEntry obtained from 
>> `ZipInputStream.getNextEntry()`,
>> `ZipFile.entries()`, `ZipFile.getEntry()` or `ZipFile.stream()` with 
>> `ZipOutputStream.putNextEntry()` to a
>> `ZipOutputStream` and then read the entries data from the `ZipInputStream` 
>> and write it to the `ZipOutputStream` as
>> follows:
>>  ZipEntry entry;
>>  ZipInputStream zis = new ZipInputStream(...);
>>  ZipOutputStream zos = new ZipOutputStream(...);
>>  while((entry = zis.getNextEntry()) != null) {
>>  zos.putNextEntry(entry);
>>  zis.transferTo(zos);
>>  }
>> The problem with this code is that the zip file format does not record the 
>> compression level used for deflation in its
>> entries. In general, it doesn't even mandate a predefined compression ratio 
>> per compression level. Therefore the
>> compressed size recorded in a `ZipEntry` read from a zip file might differ 
>> from the new compressed size produced by the
>> receiving `ZipOutputStream`. Such a difference will result in a 
>> `ZipException` with the following message:
>>  java.util.zip.ZipException: invalid entry compressed size (expected 12 but 
>> got 7 bytes)
>>  
>> The correct way of copying all entries from one zip file into another 
>> requires the creation of a new `ZipEntry` or at
>> least resetting of the compressed size field. E.g.:
>>  while((entry = zis.getNextEntry()) != null) {
>>  ZipEntry newEntry = new ZipEntry(entry.getName());
>>  zos.putNextEntry(newEntry);
>>  zis.transferTo(zos);
>>  }
>> or:
>>  while((entry = zis.getNextEntry()) != null) {
>>  entry.setCompressedSize(-1);
>>  zos.putNextEntry(entry);
>>  zis.transferTo(zos);
>>  }
>> Unfortunately, there's a lot of user code out there which gets this wrong 
>> and uses the bad coding pattern described
>> before. Searching for `"java.util.zip.ZipException: invalid entry compressed 
>> size (expected 12 but got 7 bytes)"` gives
>> ~2500 hits (~100 on StackOverflow). It's also no hard to find plenty of 
>> instances of this anti-pattern on GitHub when
>> doing a code search for `ZipEntry` and `putNextEntry()`. E.g. [Gradle 4.x 
>> wrapper task][1] is affected as well as the
>> latest version of the [mockableAndroidJar task][2]. I've recently fixed two 
>> occurrences of this pattern in OpenJDK (see
>> [JDK-8240333][3] and [JDK-8240235][4]) but there still exist more of them 
>> (e.g.
>> [`test/jdk/java/util/zip/ZipFile/CopyJar.java`][5] which is there since 1999 
>> :).  ### Description  So while this has
>> clearly been a problem before, it apparently wasn't painful enough to 
>> trigger any action from the side of the JDK.
>> However, recently quite some zlib forks with [superior deflate/inflate 
>> performance have evolved][6]. Using them with
>> OpenJDK is quite straight-forward: one just has to configure the alternative 
>> implementations by setting
>> `LD_LIBRARY_PATH` or `LD_PRELOAD` correspondingly. We've seen big saving by 
>> using these new zlib implementations for
>> selected services in production and the only reason why we haven't enabled 
>> them by default until now is the problem
>> I've just described. The reason why these new libraries uncover the 
>> described anti-pattern much more often is because
>> their compression ratio is slightly different from that of the default zlib 
>> library. This can easily trigger a
>> `ZipException` even if an application is not using a different compression 
>> levels but just a zip file created with
>> another zlib version.  I'd therefore like to propose the following 
>> workaround for the wrong
>> `ZipOutputStream.putNextEntry()` usage in user code:
>> -  ignore the compressed size if it was implicitly determined from the zip 
>> file and not explicitly set by calling
>>`ZipEntry.setCompressedSize()`.
>> 
>> - Change the API-documentation of `ZipOutputStream.putNextEntry()` and 
>> `JarOutputStream.putNextEntry()` to explain the
>>   problem and why `putNextEntry()` will ignore the compressed size of a 
>> `ZipEntry` if that was set implicitely when
>>   reading that entry from a `ZipFile` or `ZipInputStream`.
>> 
>> 
>> ### Technical Details
>> 
>> A zip file consists of a stream of File Entries followed by a Central 
>> Directory (see [here for a more detailed
>> specification][7]). Each File Entry is composed of a Local File Header (LFH) 
>> followed by the compressed Data and an
>> optional Data Descriptor. The LFH contains the File Name and among other 
>> attributes the Compressed and Uncompressed
>> size and CRC of the Data. In the case where the latter three attributes are 
>> not available at the time when the LFH is
>> created, this fact will be recorded in a flag of the LFH and will trigger 
>>

Re: RFR: 8253952: Work around wrong usage of ZipOutputStream.putNextEntry() in user code

2020-10-07 Thread Alan Bateman
On Tue, 6 Oct 2020 10:02:09 GMT, Volker Simonis  wrote:

> ### Summary
> 
> Work around wrong usage of `ZipOutputStream.putNextEntry()` in user code 
> which can lead to the `ZipException "invalid
> entry compressed size"`.
> ### Motivation
> 
> In general it is not safe to directly write a ZipEntry obtained from 
> `ZipInputStream.getNextEntry()`,
> `ZipFile.entries()`, `ZipFile.getEntry()` or `ZipFile.stream()` with 
> `ZipOutputStream.putNextEntry()` to a
> `ZipOutputStream` and then read the entries data from the `ZipInputStream` 
> and write it to the `ZipOutputStream` as
> follows:
>  ZipEntry entry;
>  ZipInputStream zis = new ZipInputStream(...);
>  ZipOutputStream zos = new ZipOutputStream(...);
>  while((entry = zis.getNextEntry()) != null) {
>  zos.putNextEntry(entry);
>  zis.transferTo(zos);
>  }
> The problem with this code is that the zip file format does not record the 
> compression level used for deflation in its
> entries. In general, it doesn't even mandate a predefined compression ratio 
> per compression level. Therefore the
> compressed size recorded in a `ZipEntry` read from a zip file might differ 
> from the new compressed size produced by the
> receiving `ZipOutputStream`. Such a difference will result in a 
> `ZipException` with the following message:
>  java.util.zip.ZipException: invalid entry compressed size (expected 12 but 
> got 7 bytes)
>  
> The correct way of copying all entries from one zip file into another 
> requires the creation of a new `ZipEntry` or at
> least resetting of the compressed size field. E.g.:
>  while((entry = zis.getNextEntry()) != null) {
>  ZipEntry newEntry = new ZipEntry(entry.getName());
>  zos.putNextEntry(newEntry);
>  zis.transferTo(zos);
>  }
> or:
>  while((entry = zis.getNextEntry()) != null) {
>  entry.setCompressedSize(-1);
>  zos.putNextEntry(entry);
>  zis.transferTo(zos);
>  }
> Unfortunately, there's a lot of user code out there which gets this wrong and 
> uses the bad coding pattern described
> before. Searching for `"java.util.zip.ZipException: invalid entry compressed 
> size (expected 12 but got 7 bytes)"` gives
> ~2500 hits (~100 on StackOverflow). It's also no hard to find plenty of 
> instances of this anti-pattern on GitHub when
> doing a code search for `ZipEntry` and `putNextEntry()`. E.g. [Gradle 4.x 
> wrapper task][1] is affected as well as the
> latest version of the [mockableAndroidJar task][2]. I've recently fixed two 
> occurrences of this pattern in OpenJDK (see
> [JDK-8240333][3] and [JDK-8240235][4]) but there still exist more of them 
> (e.g.
> [`test/jdk/java/util/zip/ZipFile/CopyJar.java`][5] which is there since 1999 
> :).  ### Description  So while this has
> clearly been a problem before, it apparently wasn't painful enough to trigger 
> any action from the side of the JDK.
> However, recently quite some zlib forks with [superior deflate/inflate 
> performance have evolved][6]. Using them with
> OpenJDK is quite straight-forward: one just has to configure the alternative 
> implementations by setting
> `LD_LIBRARY_PATH` or `LD_PRELOAD` correspondingly. We've seen big saving by 
> using these new zlib implementations for
> selected services in production and the only reason why we haven't enabled 
> them by default until now is the problem
> I've just described. The reason why these new libraries uncover the described 
> anti-pattern much more often is because
> their compression ratio is slightly different from that of the default zlib 
> library. This can easily trigger a
> `ZipException` even if an application is not using a different compression 
> levels but just a zip file created with
> another zlib version.  I'd therefore like to propose the following workaround 
> for the wrong
> `ZipOutputStream.putNextEntry()` usage in user code:
> -  ignore the compressed size if it was implicitly determined from the zip 
> file and not explicitly set by calling
>`ZipEntry.setCompressedSize()`.
> 
> - Change the API-documentation of `ZipOutputStream.putNextEntry()` and 
> `JarOutputStream.putNextEntry()` to explain the
>   problem and why `putNextEntry()` will ignore the compressed size of a 
> `ZipEntry` if that was set implicitely when
>   reading that entry from a `ZipFile` or `ZipInputStream`.
> 
> 
> ### Technical Details
> 
> A zip file consists of a stream of File Entries followed by a Central 
> Directory (see [here for a more detailed
> specification][7]). Each File Entry is composed of a Local File Header (LFH) 
> followed by the compressed Data and an
> optional Data Descriptor. The LFH contains the File Name and among other 
> attributes the Compressed and Uncompressed
> size and CRC of the Data. In the case where the latter three attributes are 
> not available at the time when the LFH is
> created, this fact will be recorded in a flag of the LFH and will trigger the 
> creation of a Data Descriptor with the
> corresponding information right after the Data section

Re: RFR: 8253952: Work around wrong usage of ZipOutputStream.putNextEntry() in user code

2020-10-07 Thread Lance Andersen
On Tue, 6 Oct 2020 10:02:09 GMT, Volker Simonis  wrote:

> ### Summary
> 
> Work around wrong usage of `ZipOutputStream.putNextEntry()` in user code 
> which can lead to the `ZipException "invalid
> entry compressed size"`.
> ### Motivation
> 
> In general it is not safe to directly write a ZipEntry obtained from 
> `ZipInputStream.getNextEntry()`,
> `ZipFile.entries()`, `ZipFile.getEntry()` or `ZipFile.stream()` with 
> `ZipOutputStream.putNextEntry()` to a
> `ZipOutputStream` and then read the entries data from the `ZipInputStream` 
> and write it to the `ZipOutputStream` as
> follows:
>  ZipEntry entry;
>  ZipInputStream zis = new ZipInputStream(...);
>  ZipOutputStream zos = new ZipOutputStream(...);
>  while((entry = zis.getNextEntry()) != null) {
>  zos.putNextEntry(entry);
>  zis.transferTo(zos);
>  }
> The problem with this code is that the zip file format does not record the 
> compression level used for deflation in its
> entries. In general, it doesn't even mandate a predefined compression ratio 
> per compression level. Therefore the
> compressed size recorded in a `ZipEntry` read from a zip file might differ 
> from the new compressed size produced by the
> receiving `ZipOutputStream`. Such a difference will result in a 
> `ZipException` with the following message:
>  java.util.zip.ZipException: invalid entry compressed size (expected 12 but 
> got 7 bytes)
>  
> The correct way of copying all entries from one zip file into another 
> requires the creation of a new `ZipEntry` or at
> least resetting of the compressed size field. E.g.:
>  while((entry = zis.getNextEntry()) != null) {
>  ZipEntry newEntry = new ZipEntry(entry.getName());
>  zos.putNextEntry(newEntry);
>  zis.transferTo(zos);
>  }
> or:
>  while((entry = zis.getNextEntry()) != null) {
>  entry.setCompressedSize(-1);
>  zos.putNextEntry(entry);
>  zis.transferTo(zos);
>  }
> Unfortunately, there's a lot of user code out there which gets this wrong and 
> uses the bad coding pattern described
> before. Searching for `"java.util.zip.ZipException: invalid entry compressed 
> size (expected 12 but got 7 bytes)"` gives
> ~2500 hits (~100 on StackOverflow). It's also no hard to find plenty of 
> instances of this anti-pattern on GitHub when
> doing a code search for `ZipEntry` and `putNextEntry()`. E.g. [Gradle 4.x 
> wrapper task][1] is affected as well as the
> latest version of the [mockableAndroidJar task][2]. I've recently fixed two 
> occurrences of this pattern in OpenJDK (see
> [JDK-8240333][3] and [JDK-8240235][4]) but there still exist more of them 
> (e.g.
> [`test/jdk/java/util/zip/ZipFile/CopyJar.java`][5] which is there since 1999 
> :).  ### Description  So while this has
> clearly been a problem before, it apparently wasn't painful enough to trigger 
> any action from the side of the JDK.
> However, recently quite some zlib forks with [superior deflate/inflate 
> performance have evolved][6]. Using them with
> OpenJDK is quite straight-forward: one just has to configure the alternative 
> implementations by setting
> `LD_LIBRARY_PATH` or `LD_PRELOAD` correspondingly. We've seen big saving by 
> using these new zlib implementations for
> selected services in production and the only reason why we haven't enabled 
> them by default until now is the problem
> I've just described. The reason why these new libraries uncover the described 
> anti-pattern much more often is because
> their compression ratio is slightly different from that of the default zlib 
> library. This can easily trigger a
> `ZipException` even if an application is not using a different compression 
> levels but just a zip file created with
> another zlib version.  I'd therefore like to propose the following workaround 
> for the wrong
> `ZipOutputStream.putNextEntry()` usage in user code:
> -  ignore the compressed size if it was implicitly determined from the zip 
> file and not explicitly set by calling
>`ZipEntry.setCompressedSize()`.
> 
> - Change the API-documentation of `ZipOutputStream.putNextEntry()` and 
> `JarOutputStream.putNextEntry()` to explain the
>   problem and why `putNextEntry()` will ignore the compressed size of a 
> `ZipEntry` if that was set implicitely when
>   reading that entry from a `ZipFile` or `ZipInputStream`.
> 
> 
> ### Technical Details
> 
> A zip file consists of a stream of File Entries followed by a Central 
> Directory (see [here for a more detailed
> specification][7]). Each File Entry is composed of a Local File Header (LFH) 
> followed by the compressed Data and an
> optional Data Descriptor. The LFH contains the File Name and among other 
> attributes the Compressed and Uncompressed
> size and CRC of the Data. In the case where the latter three attributes are 
> not available at the time when the LFH is
> created, this fact will be recorded in a flag of the LFH and will trigger the 
> creation of a Data Descriptor with the
> corresponding information right after the Data section

Re: RFR: 8253952: Work around wrong usage of ZipOutputStream.putNextEntry() in user code

2020-10-07 Thread Lance Andersen
On Tue, 6 Oct 2020 10:02:09 GMT, Volker Simonis  wrote:

> ### Summary
> 
> Work around wrong usage of `ZipOutputStream.putNextEntry()` in user code 
> which can lead to the `ZipException "invalid
> entry compressed size"`.
> ### Motivation
> 
> In general it is not safe to directly write a ZipEntry obtained from 
> `ZipInputStream.getNextEntry()`,
> `ZipFile.entries()`, `ZipFile.getEntry()` or `ZipFile.stream()` with 
> `ZipOutputStream.putNextEntry()` to a
> `ZipOutputStream` and then read the entries data from the `ZipInputStream` 
> and write it to the `ZipOutputStream` as
> follows:
>  ZipEntry entry;
>  ZipInputStream zis = new ZipInputStream(...);
>  ZipOutputStream zos = new ZipOutputStream(...);
>  while((entry = zis.getNextEntry()) != null) {
>  zos.putNextEntry(entry);
>  zis.transferTo(zos);
>  }
> The problem with this code is that the zip file format does not record the 
> compression level used for deflation in its
> entries. In general, it doesn't even mandate a predefined compression ratio 
> per compression level. Therefore the
> compressed size recorded in a `ZipEntry` read from a zip file might differ 
> from the new compressed size produced by the
> receiving `ZipOutputStream`. Such a difference will result in a 
> `ZipException` with the following message:
>  java.util.zip.ZipException: invalid entry compressed size (expected 12 but 
> got 7 bytes)
>  
> The correct way of copying all entries from one zip file into another 
> requires the creation of a new `ZipEntry` or at
> least resetting of the compressed size field. E.g.:
>  while((entry = zis.getNextEntry()) != null) {
>  ZipEntry newEntry = new ZipEntry(entry.getName());
>  zos.putNextEntry(newEntry);
>  zis.transferTo(zos);
>  }
> or:
>  while((entry = zis.getNextEntry()) != null) {
>  entry.setCompressedSize(-1);
>  zos.putNextEntry(entry);
>  zis.transferTo(zos);
>  }
> Unfortunately, there's a lot of user code out there which gets this wrong and 
> uses the bad coding pattern described
> before. Searching for `"java.util.zip.ZipException: invalid entry compressed 
> size (expected 12 but got 7 bytes)"` gives
> ~2500 hits (~100 on StackOverflow). It's also no hard to find plenty of 
> instances of this anti-pattern on GitHub when
> doing a code search for `ZipEntry` and `putNextEntry()`. E.g. [Gradle 4.x 
> wrapper task][1] is affected as well as the
> latest version of the [mockableAndroidJar task][2]. I've recently fixed two 
> occurrences of this pattern in OpenJDK (see
> [JDK-8240333][3] and [JDK-8240235][4]) but there still exist more of them 
> (e.g.
> [`test/jdk/java/util/zip/ZipFile/CopyJar.java`][5] which is there since 1999 
> :).  ### Description  So while this has
> clearly been a problem before, it apparently wasn't painful enough to trigger 
> any action from the side of the JDK.
> However, recently quite some zlib forks with [superior deflate/inflate 
> performance have evolved][6]. Using them with
> OpenJDK is quite straight-forward: one just has to configure the alternative 
> implementations by setting
> `LD_LIBRARY_PATH` or `LD_PRELOAD` correspondingly. We've seen big saving by 
> using these new zlib implementations for
> selected services in production and the only reason why we haven't enabled 
> them by default until now is the problem
> I've just described. The reason why these new libraries uncover the described 
> anti-pattern much more often is because
> their compression ratio is slightly different from that of the default zlib 
> library. This can easily trigger a
> `ZipException` even if an application is not using a different compression 
> levels but just a zip file created with
> another zlib version.  I'd therefore like to propose the following workaround 
> for the wrong
> `ZipOutputStream.putNextEntry()` usage in user code:
> -  ignore the compressed size if it was implicitly determined from the zip 
> file and not explicitly set by calling
>`ZipEntry.setCompressedSize()`.
> 
> - Change the API-documentation of `ZipOutputStream.putNextEntry()` and 
> `JarOutputStream.putNextEntry()` to explain the
>   problem and why `putNextEntry()` will ignore the compressed size of a 
> `ZipEntry` if that was set implicitely when
>   reading that entry from a `ZipFile` or `ZipInputStream`.
> 
> 
> ### Technical Details
> 
> A zip file consists of a stream of File Entries followed by a Central 
> Directory (see [here for a more detailed
> specification][7]). Each File Entry is composed of a Local File Header (LFH) 
> followed by the compressed Data and an
> optional Data Descriptor. The LFH contains the File Name and among other 
> attributes the Compressed and Uncompressed
> size and CRC of the Data. In the case where the latter three attributes are 
> not available at the time when the LFH is
> created, this fact will be recorded in a flag of the LFH and will trigger the 
> creation of a Data Descriptor with the
> corresponding information right after the Data section

RFR: 8253952: Work around wrong usage of ZipOutputStream.putNextEntry() in user code

2020-10-06 Thread Volker Simonis
### Summary

Work around wrong usage of `ZipOutputStream.putNextEntry()` in user code which 
can lead to the `ZipException "invalid
entry compressed size"`.

### Motivation

In general it is not safe to directly write a ZipEntry obtained from 
`ZipInputStream.getNextEntry()`,
`ZipFile.entries()`, `ZipFile.getEntry()` or `ZipFile.stream()` with 
`ZipOutputStream.putNextEntry()` to a
`ZipOutputStream` and then read the entries data from the `ZipInputStream` and 
write it to the `ZipOutputStream` as
follows:
 ZipEntry entry;
 ZipInputStream zis = new ZipInputStream(...);
 ZipOutputStream zos = new ZipOutputStream(...);
 while((entry = zis.getNextEntry()) != null) {
 zos.putNextEntry(entry);
 zis.transferTo(zos);
 }
The problem with this code is that the zip file format does not record the 
compression level used for deflation in its
entries. In general, it doesn't even mandate a predefined compression ratio per 
compression level. Therefore the
compressed size recorded in a `ZipEntry` read from a zip file might differ from 
the new compressed size produced by the
receiving `ZipOutputStream`. Such a difference will result in a `ZipException` 
with the following message:
 java.util.zip.ZipException: invalid entry compressed size (expected 12 but got 
7 bytes)
 
The correct way of copying all entries from one zip file into another requires 
the creation of a new `ZipEntry` or at
least resetting of the compressed size field. E.g.:
 while((entry = zis.getNextEntry()) != null) {
 ZipEntry newEntry = new ZipEntry(entry.getName());
 zos.putNextEntry(newEntry);
 zis.transferTo(zos);
 }
or:
 while((entry = zis.getNextEntry()) != null) {
 entry.setCompressedSize(-1);
 zos.putNextEntry(entry);
 zis.transferTo(zos);
 }
Unfortunately, there's a lot of user code out there which gets this wrong and 
uses the bad coding pattern described
before. Searching for `"java.util.zip.ZipException: invalid entry compressed 
size (expected 12 but got 7 bytes)"` gives
~2500 hits (~100 on StackOverflow). It's also no hard to find plenty of 
instances of this anti-pattern on GitHub when
doing a code search for `ZipEntry` and `putNextEntry()`. E.g. [Gradle 4.x 
wrapper task][1] is affected as well as the
latest version of the [mockableAndroidJar task][2]. I've recently fixed two 
occurrences of this pattern in OpenJDK (see
[JDK-8240333][3] and [JDK-8240235][4]) but there still exist more of them (e.g.
[`test/jdk/java/util/zip/ZipFile/CopyJar.java`][5] which is there since 1999 :).

### Description

So while this has clearly been a problem before, it apparently wasn't painful 
enough to trigger any action from the
side of the JDK. However, recently quite some zlib forks with [superior 
deflate/inflate performance have evolved][6].
Using them with OpenJDK is quite straight-forward: one just has to configure 
the alternative implementations by setting
`LD_LIBRARY_PATH` or `LD_PRELOAD` correspondingly. We've seen big saving by 
using these new zlib implementations for
selected services in production and the only reason why we haven't enabled them 
by default until now is the problem
I've just described. The reason why these new libraries uncover the described 
anti-pattern much more often is because
their compression ratio is slightly different from that of the default zlib 
library. This can easily trigger a
`ZipException` even if an application is not using a different compression 
levels but just a zip file created with
another zlib version.

I'd therefore like to propose the following workaround for the wrong 
`ZipOutputStream.putNextEntry()` usage in user
code:

-  ignore the compressed size if it was implicitly determined from the zip file 
and not explicitly set by calling
   `ZipEntry.setCompressedSize()`.

- Change the API-documentation of `ZipOutputStream.putNextEntry()` and 
`JarOutputStream.putNextEntry()` to explain the
  problem and why `putNextEntry()` will ignore the compressed size of a 
`ZipEntry` if that was set implicitely when
  reading that entry from a `ZipFile` or `ZipInputStream`.


### Technical Details

A zip file consists of a stream of File Entries followed by a Central Directory 
(see [here for a more detailed
specification][7]). Each File Entry is composed of a Local File Header (LFH) 
followed by the compressed Data and an
optional Data Descriptor. The LFH contains the File Name and among other 
attributes the Compressed and Uncompressed
size and CRC of the Data. In the case where the latter three attributes are not 
available at the time when the LFH is
created, this fact will be recorded in a flag of the LFH and will trigger the 
creation of a Data Descriptor with the
corresponding information right after the Data section. Finally, the Central 
Directory contains one Central Directory
File Header (CDFH) for each entry of the  zip archive. The CDFH is an extended 
version of the LFH and the ultimate
reference for the contents of the zip archive. The redundancy between LFH and 
CDF