On Sat, 11 Apr 2026 08:51:35 GMT, Eirik Bjørsnøs <[email protected]> wrote:
>> This PR reorganizes `java.util.zip.ZipCoder` class hierarchies to handle >> UTF-8 in the base class and move support for general charsets down to a >> subclass. This better reflects reality where most ZIP files are encoded >> using UTF-8. >> >> The motivation can be summarized as: >> >> * Supporting the common case in the base class and the exceptional case in a >> subclass seems more natural and intuitive for maintainers >> * We avoid loading both classes for most JVM instances >> * A small performance improvement is observed, probably caused by unlocking >> effecively-monomorphic JVM optimizations >> >> See core-libs-dev discussion for details: >> https://mail.openjdk.org/pipermail/core-libs-dev/2026-January/158281.html >> >> I have tried to not move methods around too much, making the side-by-side >> diff when ignoring whitespace easier to review. We can discuss further >> cleanups and reorganizaton if so desired. >> >> It may be useful to review commits individually. The first commit fe4a627 >> moves some methods out the way for the second commit 2926398. >> >> This PR also includes the similar change to the `ZipFileSystem` >> implementation of ZipCoder to keep these aligned. Could be pulled out to a >> separate PR if reviewers want to focus on a single area per PR. >> >> - [x] I confirm that I make this contribution in accordance with the >> [OpenJDK Interim AI Policy](https://openjdk.org/legal/ai). > > Eirik Bjørsnøs has updated the pull request incrementally with one additional > commit since the last revision: > > Make fields 'UTF8_INSTANCE' and 'cs' fields final The rationale for making UTF-8 the default/base implementation makes sense, but the resulting hierarchy is a bit non-obvious on first read: ZipCoder now represents the common UTF-8 **_fast path_**, while CharsetZipCoder handles the more **_general_** arbitrary-charset case. It may be worth making that design/implementation choice explicit in a short implementation note somewhere, so readers do not instinctively read the base class as the generic charset implementation. Separately, did you consider keeping ZipCoder abstract and providing two concrete implementations, one for UTF-8 and one for arbitrary charsets? I’m not suggesting that would necessarily be better, but I was curious whether that alternative was explored and whether it would preserve the same performance benefits while reading a bit more naturally. ------------- PR Comment: https://git.openjdk.org/jdk/pull/30640#issuecomment-4240489891
