On Sat, 11 Apr 2026 08:51:35 GMT, Eirik Bjørsnøs <[email protected]> wrote:

>> This PR reorganizes `java.util.zip.ZipCoder` class hierarchies to handle 
>> UTF-8 in the base class and move support for general charsets down to a 
>> subclass. This better reflects reality where most ZIP files are encoded 
>> using UTF-8. 
>> 
>> The motivation can be summarized as:
>> 
>> * Supporting the common case in the base class and the exceptional case in a 
>> subclass seems more natural and intuitive for maintainers
>> * We avoid loading both classes for most JVM instances
>> * A small performance improvement is observed, probably caused by unlocking 
>> effecively-monomorphic JVM optimizations
>> 
>> See core-libs-dev discussion for details: 
>> https://mail.openjdk.org/pipermail/core-libs-dev/2026-January/158281.html
>> 
>> I have tried to not move methods around too much,  making the side-by-side 
>> diff when ignoring whitespace easier to review. We can discuss further 
>> cleanups and reorganizaton if so desired. 
>> 
>> It may be useful to review commits individually. The first commit fe4a627 
>> moves some methods out the way for the second commit 2926398. 
>> 
>> This PR also includes the similar change to the `ZipFileSystem` 
>> implementation of ZipCoder to keep these aligned. Could be pulled out to a 
>> separate PR if reviewers want to focus on a single area per PR.
>> 
>> - [x] I confirm that I make this contribution in accordance with the 
>> [OpenJDK Interim AI Policy](https://openjdk.org/legal/ai).
>
> Eirik Bjørsnøs has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Make fields 'UTF8_INSTANCE' and 'cs' fields final

The rationale for making UTF-8 the default/base implementation makes sense, but 
the resulting hierarchy is a bit non-obvious on first read: ZipCoder now 
represents the common UTF-8 **_fast path_**, while CharsetZipCoder handles the 
more **_general_** arbitrary-charset case. It may be worth making that 
design/implementation choice explicit in a short implementation note somewhere, 
so readers do not instinctively read the base class as the generic charset 
implementation.

Separately, did you consider keeping ZipCoder abstract and providing two 
concrete implementations, one for UTF-8 and one for arbitrary charsets? I’m not 
suggesting that would necessarily be better, but I was curious whether that 
alternative was explored and whether it would preserve the same performance 
benefits while reading a bit more naturally.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/30640#issuecomment-4240489891

Reply via email to