On Tue, 2 Jan 2024 09:31:16 GMT, Eirik Bjørsnøs <eir...@openjdk.org> wrote:
>> Please review this test-only PR which adds test coverage for
>> `ZipFile.getEntry` under certain charset conditions.
>>
>> When `ZipFile.getEntry` is called for an entry which has the `Language
>> encoding flag` general purpose bit flag set, then `ZipCoder.UTF8` is used
>> unconditionally, even when a different charset was supplied to the `ZipFile`
>> constructor.
>>
>> It turns out we do not have any testing for this particular case, as can be
>> verified by commenting out the following line of code in
>> `ZipFile.Source.getEntryPos`:
>>
>> ```
>> //ZipCoder zc = zipCoderForPos(pos);
>> ```
>>
>> and then running `make test TEST="test/jdk/java/util/zip"`
>>
>> The current test verifies that the correct ZipCoder is used by
>> `ZipFile.entries()`, but does not exercise `ZipFile.getEntry` the same way.
>>
>> Seeing that [JDK-7009069](https://bugs.openjdk.org/browse/JDK-7009069) was
>> (accidentally?) fixed by
>> [JDK-8243469](https://bugs.openjdk.org/browse/JDK-8243469), I think it is
>> worthwhile to add explicit testing for this case to avoid regressions.
>>
>> While visiting `ZipCoding.java`, I took the opportunity to convert it to
>> JUnit 5. The conversion and modernization of the code is done in the first
>> commit 1384850ed51ec845af06dd6d13616f20f8bbaa6a in this PR, while the second
>> commit 1776b258b0fe8383709ae0c095f2631a4e6237f6 actually adds the code
>> required to verify the `Language encoding flag` condition for
>> `ZipFile.getEntry`.
>>
>> Testing: Verified that the test indeed fails when
>> `ZipFile.Source.getEntryPos` is updated to use the ZipFile's ZipCoder as
>> suggested above.
>
> Eirik Bjørsnøs has updated the pull request incrementally with one additional
> commit since the last revision:
>
>   Add more cases for 'language encoding' bit set, opened with a different
>   encoding

> The change to allow user/application specific arbitrary charsets to the
> `ZipFile` constructor seems to have been done long back in Java 1.7 days as
> part of JDK-4244499.
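For anyone following along, the case described in the quoted PR text can be sketched roughly as below. This is only an illustration, not the test in the PR: the class name `LanguageFlagLookup` and the helper `foundWithOtherCharset` are made up for this example, and the choice of ISO-8859-1 as the "different" charset is an assumption. It relies on `ZipOutputStream` marking entries with the language encoding flag when it writes names as UTF-8, which is my understanding of its behaviour. On JDKs that include the JDK-8243469 change, the lookup is expected to succeed even though the `ZipFile` was opened with another charset.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

class LanguageFlagLookup {

    // Hypothetical helper: writes a one-entry ZIP with a UTF-8 encoded name,
    // then looks the entry up through a ZipFile opened with ISO-8859-1.
    static boolean foundWithOtherCharset(String entryName) {
        try {
            Path zip = Files.createTempFile("language-flag", ".zip");
            try {
                // Names are written as UTF-8; the entry is expected to carry
                // the language encoding flag (assumption, see lead-in).
                try (OutputStream out = Files.newOutputStream(zip);
                     ZipOutputStream zos =
                             new ZipOutputStream(out, StandardCharsets.UTF_8)) {
                    zos.putNextEntry(new ZipEntry(entryName));
                    zos.write("hello".getBytes(StandardCharsets.UTF_8));
                    zos.closeEntry();
                }
                // Deliberately open with a charset other than UTF-8.
                try (ZipFile zf =
                             new ZipFile(zip.toFile(), StandardCharsets.ISO_8859_1)) {
                    return zf.getEntry(entryName) != null;
                }
            } finally {
                Files.deleteIfExists(zip);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Non-ASCII name written with Unicode escapes to avoid
        // source-encoding issues.
        String name = "sm\u00f6rg\u00e5sbord.txt";
        System.out.println("getEntry found entry: " + foundWithOtherCharset(name));
    }
}
```

A non-ASCII name is the interesting input here, since an ASCII-only name encodes to the same bytes in both charsets and would be found either way.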
There is a lot of history in this area. ZIP dates from the days of MS-DOS, when IBM 437 was used to encode the names of entries. That differs from Java, which uses UTF-8 for JAR files and for non-JAR ZIP files too. Up to that point (JDK 7) there were also issues with UTF-8 decoding of some forms of supplementary characters. Sherman got things to a good place in JDK 7 and also added the constructors so you can specify the encoding when you obtain it by some out-of-band means.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17207#issuecomment-1873950701