Re: RFR: 8253952: Work around wrong usage of ZipOutputStream.putNextEntry() in user code

2020-10-07 Thread Lance Andersen
On Tue, 6 Oct 2020 10:02:09 GMT, Volker Simonis  wrote:

> ### Summary
> 
> Work around wrong usage of `ZipOutputStream.putNextEntry()` in user code 
> which can lead to the `ZipException "invalid
> entry compressed size"`.
> ### Motivation
> 
> In general it is not safe to pass a `ZipEntry` obtained from 
> `ZipInputStream.getNextEntry()`, `ZipFile.entries()`, `ZipFile.getEntry()` 
> or `ZipFile.stream()` directly to `ZipOutputStream.putNextEntry()` and then 
> copy the entry's data from the `ZipInputStream` to the `ZipOutputStream` as 
> follows:
> 
>     ZipEntry entry;
>     ZipInputStream zis = new ZipInputStream(...);
>     ZipOutputStream zos = new ZipOutputStream(...);
>     while ((entry = zis.getNextEntry()) != null) {
>         zos.putNextEntry(entry);
>         zis.transferTo(zos);
>     }
> 
> The problem with this code is that the zip file format does not record the 
> compression level used for deflation in its
> entries. In general, it doesn't even mandate a predefined compression ratio 
> per compression level. Therefore the
> compressed size recorded in a `ZipEntry` read from a zip file might differ 
> from the new compressed size produced by the
> receiving `ZipOutputStream`. Such a difference will result in a 
> `ZipException` with the following message:
> 
>     java.util.zip.ZipException: invalid entry compressed size (expected 12 but got 7 bytes)
> 
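To make the point about compression levels concrete, here is a minimal sketch that deflates the same bytes at two levels and compares the resulting sizes. The class name `DeflateSizeSketch` and its helper are made up for illustration; only `java.util.zip.Deflater` is a real JDK API. How large the difference is depends on the data and on the zlib build in use, so no specific numbers are claimed here.

```java
import java.util.zip.Deflater;

// Hypothetical helper for illustration; not part of the JDK or the patch.
final class DeflateSizeSketch {

    // Returns the number of bytes produced when deflating 'data' at 'level'
    // (0-9). 'nowrap = true' selects raw deflate, as used inside zip entries.
    static int compressedSize(byte[] data, int level) {
        Deflater deflater = new Deflater(level, true);
        deflater.setInput(data);
        deflater.finish();
        byte[] buf = new byte[8192];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buf);
        }
        deflater.end(); // release native zlib resources
        return total;
    }
}
```

Calling `compressedSize(data, 1)` and `compressedSize(data, 9)` on the same input can return different values, which is why a compressed size read from an existing zip file says little about what the receiving stream will produce.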
> The correct way of copying all entries from one zip file into another 
> requires the creation of a new `ZipEntry` or at
> least resetting of the compressed size field. E.g.:
> 
>     while ((entry = zis.getNextEntry()) != null) {
>         ZipEntry newEntry = new ZipEntry(entry.getName());
>         zos.putNextEntry(newEntry);
>         zis.transferTo(zos);
>     }
> 
> or:
> 
>     while ((entry = zis.getNextEntry()) != null) {
>         entry.setCompressedSize(-1);
>         zos.putNextEntry(entry);
>         zis.transferTo(zos);
>     }
> 
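For anyone who wants to try the first variant end to end, the following is a self-contained sketch that works on in-memory zip data. The class `ZipCopySketch` and its three methods are hypothetical names chosen for this example; it assumes Java 9 or later for `InputStream.transferTo()` and `readAllBytes()`, and it wraps `IOException` in `UncheckedIOException` purely to keep the call sites short.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper for illustration; not part of the JDK or the patch.
final class ZipCopySketch {

    // Copies every entry of an in-memory zip into a new zip.
    static byte[] copy(byte[] sourceZip) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(sourceZip));
             ZipOutputStream zos = new ZipOutputStream(out)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                // Fresh entry: only the name is carried over, so no stale
                // compressed size can conflict with the new deflater output.
                zos.putNextEntry(new ZipEntry(entry.getName()));
                zis.transferTo(zos);
                zos.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Builds a one-entry zip at the given deflate level (0-9).
    static byte[] makeZip(String name, byte[] data, int level) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(out)) {
            zos.setLevel(level);
            zos.putNextEntry(new ZipEntry(name));
            zos.write(data);
            zos.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Returns the uncompressed bytes of the first entry in the zip.
    static byte[] readFirst(byte[] zip) {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zip))) {
            if (zis.getNextEntry() == null) {
                throw new IOException("zip contains no entries");
            }
            return zis.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note that creating a fresh `ZipEntry` from the name alone drops other metadata such as timestamps and comments. Code that needs to preserve those would have to copy them over explicitly, or use the `setCompressedSize(-1)` variant instead.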
> Unfortunately, a lot of user code gets this wrong and uses the coding 
> pattern described above. Searching for `"java.util.zip.ZipException: invalid 
> entry compressed size (expected 12 but got 7 bytes)"` gives ~2500 hits (~100 
> on StackOverflow). It's also not hard to find plenty of instances of this 
> anti-pattern on GitHub when doing a code search for `ZipEntry` and 
> `putNextEntry()`. E.g. the [Gradle 4.x wrapper task][1] is affected, as is 
> the latest version of the [mockableAndroidJar task][2]. I've recently fixed 
> two occurrences of this pattern in OpenJDK (see [JDK-8240333][3] and 
> [JDK-8240235][4]) but more of them still exist (e.g. 
> [`test/jdk/java/util/zip/ZipFile/CopyJar.java`][5], which has been there 
> since 1999 :).
> 
> ### Description
> 
> So while this has clearly been a problem before, it apparently wasn't 
> painful enough to trigger any action on the JDK side. Recently, however, 
> several zlib forks with [superior deflate/inflate performance have 
> evolved][6]. Using them with OpenJDK is quite straightforward: one just has 
> to configure the alternative implementations by setting `LD_LIBRARY_PATH` or 
> `LD_PRELOAD` correspondingly. We've seen big savings from using these new 
> zlib implementations for selected services in production, and the only 
> reason why we haven't enabled them by default until now is the problem I've 
> just described. These new libraries uncover the described anti-pattern much 
> more often because their compression ratio is slightly different from that 
> of the default zlib library. This can easily trigger a `ZipException` even 
> if an application is not using a different compression level but just a zip 
> file created with another zlib version.
> 
> I'd therefore like to propose the following workaround for the wrong 
> `ZipOutputStream.putNextEntry()` usage in user code:
> - Ignore the compressed size if it was implicitly determined from the zip 
>   file and not explicitly set by calling `ZipEntry.setCompressedSize()`.
> 
> - Change the API documentation of `ZipOutputStream.putNextEntry()` and 
>   `JarOutputStream.putNextEntry()` to explain the problem and why 
>   `putNextEntry()` will ignore the compressed size of a `ZipEntry` if that 
>   was set implicitly when reading that entry from a `ZipFile` or 
>   `ZipInputStream`.
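The first bullet can be pictured with a toy model along the following lines. This is only a sketch of the idea as described in the text: every name in it (`EntrySketch`, `setCompressedSizeFromFile`, `sizeToEnforce`) is invented for illustration and is not meant to mirror the actual patch or any `ZipEntry` internals.

```java
// Toy model only: all names are hypothetical and do not mirror JDK internals.
final class EntrySketch {
    private long compressedSize = -1;      // -1 means "unknown"
    private boolean explicitlySet = false; // true only after the public setter

    // Stands in for a user calling the public ZipEntry.setCompressedSize(long).
    void setCompressedSize(long size) {
        compressedSize = size;
        explicitlySet = (size != -1);
    }

    // Stands in for the zip reader filling in the size found in the file.
    void setCompressedSizeFromFile(long size) {
        compressedSize = size;
        explicitlySet = false;
    }

    long getCompressedSize() {
        return compressedSize;
    }

    // Size a writer would verify against, or -1 to skip verification
    // because the value merely came along with the entry that was read.
    long sizeToEnforce() {
        return explicitlySet ? compressedSize : -1;
    }
}
```

In this model, an entry whose size was filled in while reading still reports that size through `getCompressedSize()`, but a writer consulting `sizeToEnforce()` would not fail on a mismatch unless the user set the size on purpose.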
> 
> 
> ### Technical Details
> 
> A zip file consists of a stream of File Entries followed by a Central 
> Directory (see [here for a more detailed
> specification][7]). Each File Entry is composed of a Local File Header (LFH) 
> followed by the compressed Data and an
> optional Data Descriptor. The LFH contains the File Name and among other 
> attributes the Compressed and Uncompressed
> size and CRC of the Data. In the case where the latter three attributes are 
> not available at the time when the LFH is
> created, this fact will be recorded in a flag of the LFH and will trigger the 
> creation of a Data Descriptor with the
> corresponding information right after the Data 

Re: RFR: 8253952: Work around wrong usage of ZipOutputStream.putNextEntry() in user code

2020-10-07 Thread Lance Andersen
On Wed, 7 Oct 2020 15:10:06 GMT, Alan Bateman  wrote:

Re: RFR: 8253952: Work around wrong usage of ZipOutputStream.putNextEntry() in user code

2020-10-07 Thread Alan Bateman
On Tue, 6 Oct 2020 10:02:09 GMT, Volker Simonis  wrote:

