On Tue, 2 Jan 2024 09:31:16 GMT, Eirik Bjørsnøs <eir...@openjdk.org> wrote:

>> Please review this test-only PR which adds test coverage for 
>> `ZipFile.getEntry` under certain charset conditions. 
>> 
>> When `ZipFile.getEntry` is called for an entry which has the `Language 
>> encoding flag` general purpose bit flag set, then `ZipCoder.UTF8` is used 
>> unconditionally, even when a different charset was supplied to the `ZipFile` 
>> constructor.
>> 
>> It turns out we do not have any testing for this particular case, as can be 
>> verified by commenting out the following line of code in 
>> `ZipFile.Source.getEntryPos`:
>> 
>> ```
>> //ZipCoder zc = zipCoderForPos(pos);
>> ```
>> 
>> and then running `make test TEST="test/jdk/java/util/zip"`
>> 
>> The current test verifies that the correct ZipCoder is used by 
>> `ZipFile.entries()`, but does not exercise `ZipFile.getEntry` the same way.
>> 
>> Seeing that [JDK-7009069](https://bugs.openjdk.org/browse/JDK-7009069) was 
>> (accidentally?) fixed by 
>> [JDK-8243469](https://bugs.openjdk.org/browse/JDK-8243469), I think it is 
>> worthwhile to add explicit testing for this case to avoid regressions.
>> 
>> While visiting `ZipCoding.java`, I took the opportunity to convert it to 
>> JUnit 5. The conversion and modernization of the code is done in the first 
>> commit 1384850ed51ec845af06dd6d13616f20f8bbaa6a in this PR, while the second 
>> commit 1776b258b0fe8383709ae0c095f2631a4e6237f6 actually adds the code 
>> required to verify the `Language encoding flag` condition for 
>> `ZipFile.getEntry`.
>> 
>> Testing: Verified that the test indeed fails when 
>> `ZipFile.Source.getEntryPos` is updated to use the ZipFile's ZipCoder as 
>> suggested above.
>
> Eirik Bjørsnøs has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Add more cases for 'language encoding' bit set, opened with a different 
> encoding

> The change to allow user/application-specific arbitrary charsets to be passed 
> to the `ZipFile` constructor seems to have been made long ago, in the Java 1.7 
> days, as part of JDK-4244499.

There is a lot of history in this area. ZIP dates from the days of MS-DOS, where 
IBM 437 was used to encode the names of entries. Java is different: it uses 
UTF-8 for JAR files and also for non-JAR ZIP files. Up to that point (JDK 7) 
there were also issues with UTF-8 decoding of some forms of supplementary 
characters. Sherman got things to a good place in JDK 7 and also added the 
constructors that let you specify the encoding when you learn it through some 
out-of-band means.
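
For anyone following along, the case described in the quoted PR text can be sketched roughly as below. This is only an illustration, not the test added by the PR: the class name `LanguageEncodingFlagDemo` and the helper `foundWithOtherCharset` are made up for this example. It assumes a JDK that includes JDK-8243469; going by the quoted description, older JDKs may fail the lookup.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class, not part of the JDK or of the PR under review.
public class LanguageEncodingFlagDemo {

    // Writes a one-entry ZIP with UTF-8 names, reopens it with ISO-8859-1,
    // and reports whether getEntry still finds the entry by name.
    public static boolean foundWithOtherCharset(String name) throws IOException {
        Path zip = Files.createTempFile("efs-demo", ".zip");
        try {
            // ZipOutputStream with UTF-8 marks each entry with the
            // "language encoding" general purpose bit flag.
            try (OutputStream out = Files.newOutputStream(zip);
                 ZipOutputStream zos = new ZipOutputStream(out, StandardCharsets.UTF_8)) {
                zos.putNextEntry(new ZipEntry(name));
                zos.write("hello".getBytes(StandardCharsets.UTF_8));
                zos.closeEntry();
            }
            // Open with a different charset than the one used for writing.
            try (ZipFile zf = new ZipFile(zip.toFile(), StandardCharsets.ISO_8859_1)) {
                return zf.getEntry(name) != null;
            }
        } finally {
            Files.deleteIfExists(zip);
        }
    }

    public static void main(String[] args) throws IOException {
        // Non-ASCII name written with unicode escapes so the source file
        // encoding does not matter.
        System.out.println(foundWithOtherCharset("sm\u00f6rg\u00e5sbord.txt"));
    }
}
```

The name is chosen so that its UTF-8 and ISO-8859-1 byte encodings differ, which is what makes the lookup depend on the coder picked for the entry rather than the one passed to the constructor.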

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17207#issuecomment-1873950701
