Re: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v17]

Eirik Bjørsnøs Mon, 09 Feb 2026 13:36:49 -0800

On Mon, 9 Feb 2026 16:26:42 GMT, Liam Miller-Cushon <[email protected]> wrote:


> What is the use-case for `decodedLength` in `ZipFile`? Does 'efficient 
> rejection of strings without decoding' require knowing the decoded length, or 
> just whether the data is a valid encoding?

The ZIP file CEN header format only includes the length of the name in encoded
form. Knowing the length of the decoded string could potentially let us quickly
reject candidate entries for a lookup String based only on comparing string
lengths (ZipFile supports returning "directory/" as a result for "directory",
so we know a match would be 9 or 10 chars long).
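
As a rough sketch of that length check (the class and method names are hypothetical and not part of `ZipFile`; it assumes the decoded char count of the entry name is already known without decoding it):

```java
final class LengthPrefilter {
    // Hypothetical helper, not ZipFile code. decodedLength is assumed to be the
    // char count of the CEN entry name, obtained without allocating a String.
    static boolean mayMatch(String lookup, int decodedLength) {
        int n = lookup.length();
        // Either an exact match, or the entry is the lookup name plus "/".
        return decodedLength == n || decodedLength == n + 1;
    }
}
```

For the lookup "directory" this accepts only entries whose decoded name is 9 or 10 chars long; anything else can be skipped without a string comparison.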

In practice, we compare hash codes before comparing strings. So this would only
be useful for hash collisions. These are rare, so not worth optimizing. Perhaps
there are other opportunities though: it would certainly be possible to reject
lookups based on the min/max of occurring entry lengths (or perhaps a bitset of
occurring string lengths).
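
A minimal sketch of the bitset idea, assuming the decoded name lengths are recorded once when the entries are indexed (the class `EntryLengthFilter` is hypothetical and only illustrates the filter, not how `ZipFile` stores its entries):

```java
import java.util.BitSet;

final class EntryLengthFilter {
    // Bit n is set if some entry name has a decoded length of n chars.
    private final BitSet lengths = new BitSet();

    EntryLengthFilter(Iterable<String> entryNames) {
        for (String name : entryNames) {
            lengths.set(name.length());
        }
    }

    // False means no entry can match; true means a normal lookup is still needed.
    boolean mayContain(String lookup) {
        int n = lookup.length();
        // n + 1 covers the "directory" -> "directory/" case.
        return lengths.get(n) || lengths.get(n + 1);
    }
}
```

The filter is conservative: it can return true for a name that is not present, but it never rejects a name that is.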

But I'm sure there are other use cases where a `java.lang.String` is compared
to its encoded form and knowing the length without String allocation could be
useful. Input validation is another, possibly combined with rejection of
malformed encoded data.
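
To illustrate what computing a decoded length without String allocation could look like, here is a sketch for UTF-8 only. It is not the API proposed in this PR; the class name is made up, and it assumes the input is well-formed UTF-8 (it does no validation):

```java
final class Utf8DecodedLength {
    // Returns the number of UTF-16 chars the given UTF-8 bytes would decode to.
    // Assumes well-formed UTF-8; malformed input gives an unspecified count.
    static int of(byte[] utf8, int off, int len) {
        int chars = 0;
        for (int i = off; i < off + len; i++) {
            int b = utf8[i];
            if ((b & 0xC0) != 0x80) {
                chars++;            // lead byte or ASCII: starts a new code point
            }
            if ((b & 0xF8) == 0xF0) {
                chars++;            // 4-byte sequence decodes to a surrogate pair
            }
        }
        return chars;
    }
}
```

A real implementation would also have to decide how malformed sequences are counted, which is where the validation question above comes in.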

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28454#issuecomment-3872836764
