Hi Eirik,

While I think there may be some merit/benefit to what your suggesting, I think 
we should take a step back and also look at ZipFS to see what we might be able 
to share as part of a refactoring as there is a somewhat similar flow to ZipFS 
initCEN

The benefit would be reducing the need to have to maintain somewhat duplicate 
code across both APIs.

Best,
Lance

On Feb 10, 2026, at 6:38 AM, Eirik Bjørsnøs <[email protected]> wrote:

Hi,

Summary

Relieve ZipFile.Source.initCEN from non-parsing related responsibilities, such 
that it can become a pure function for parsing and validating CEN directories. 
Make it return Source, such that instance fields of this class may become 
final, immutable and accessible without synchronization.

Motivation

The method ZipFile.Source.initCEN has been the cause of a good amount of 
head-scratching over the years. It was brought over from the native 
implementation when this was transitioned to Java in Java 9.

Risk assessments at that time probably constrained the code to closely align 
with the native code, which together with interpreter-friendliness in early 
startup have affected the current shape of this code.

It is hard to summarize the responsibilities of this method, but I'll try:

o Call findEND() to read metadata from the (Zip64) END headers in the ZIP file, 
validate fields and reject invalid values
o Determine the position of the first LOC header
o Read the CEN byte array from the ZIP file
o Construct hash table data structures
o Assign these to instance fields in Source
o Parse and validate the CEN structure and populate hash table structures
o Detect manifest file(s), signature related files and multi-release JAR 
versions
o Detect when a ZIP has more CEN entries than declared by the END header and 
restart parsing by invoking itself recursively after counting the actual number 
of CEN entries
o Detect when it is being invoked recursively to avoid reading the END and CEN 
headers twice

It's been obvious for a while that this method performs an excessive number of 
unrelated tasks, knows too much about the state of the object being constructed 
and handles a disproportionate amount of funky control flow. It calls out to a 
number of helper methods to perform parsing and validation, some of them 
static, some relying on instance state.

The amount of synchronization in ZipFile today is probably excessive for the 
current implementation. It made more sense back when ZipFile was backed by 
native code and resources. In particular, lookup misses pay a high relative 
cost to synchronization which is probably not needed.

Making Source fields final moves us in a direction which may allow us to 
eventually remove synchronization for operations that only touch a parsed, 
immutable CEN.

Ideas and plan for improvements

I think we can improve the current situation by:

o Moving END header parsing and validation out of initCEN
o Moving CEN byte array reading out of initCEN
o Making all helper methods static (it's a mix today)

At this point, initCEN will almost, but not quite be a pure function for 
parsing the CEN byte array. So the next natural step is to actually make it a 
pure function by:

o Making it static
o Having it return a Source instance instead of assigning to the instance being 
constructed
o Making Source (mostly) immutable by marking parsed instance fields as 'final'

Rather than a big bang change, I see this effort as a step-by-step refactoring, 
probably over several PRs, each focusing on a smaller aspect of refactoring, 
hopefully making them easier to review. This is not an all-or-nothing effort, 
most of these changes will be incremental improvements on their own merit. If 
we don't get all the way, that's also okay.

Feedback?

Input to this idea/plan is welcome. If someone sees reasons this may not be a 
good idea, it's better to raise such concern here and now.

Eirik.



[oracle_sig_logo.gif]

Lance Andersen | Principal Member of Technical Staff | +1.781.442.2037
Oracle Java Engineering
1 Network Drive
Burlington, MA 01803
[email protected]



Reply via email to