On Thu, 4 Sep 2025 18:19:40 GMT, Roger Riggs <[email protected]> wrote:
>> src/java.base/share/classes/jdk/internal/util/ModifiedUtf.java line 37:
>>
>>> 35: public abstract class ModifiedUtf {
>>> 36: //Max length in Modified UTF-8 bytes for class names.(see
>>> max_symbol_length in symbol.hpp)
>>> 37: public static final int JAVA_CLASSNAME_MAX_LEN = 65535;
>>
>> max_symbol_length is not just class names - it is presumably the limit for
>> modified UTF-8, as seen in `java.io.DataOutput::writeUTF`. We can just use a
>> more generic name like `MAX_ENCODED_LENGTH`.
>
> There is no maximum length of an encoded UTF-8 string. The "modified UTF-8"
> is modified because it encodes a zero byte using the 2-byte form, so the
> result never contains a NUL byte. This allows some use cases to terminate
> the encoded UTF-8 bytes with a NUL byte.
> In the DataOutput case, it was desirable to provide the length of the encoded
> bytes to make it easy to read or skip the encoded UTF-8. It improved some
> stream decoding but increased the cost of writing because the encoded length
> was needed before writing. It also prevented an exact size allocation before
> decoding. In retrospect, it could have provided both the encoded and decoded
> lengths, saving some allocations.
> In ObjectOutputStream, the stream protocol had both long and short forms
> because Strings can be much longer.
> The method names and constants are specific to the encoding of **Class**
> names and that should be reflected in their names.
These limits also apply to the encoding of all UTF-8 constants in a class
file, not only to Class names.
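
To make the 65535 limit discussed above concrete, here is a minimal sketch (not code from the PR). It computes the modified UTF-8 length of a string and checks whether `DataOutputStream.writeUTF` accepts it, since that method writes the encoded length as an unsigned 16-bit prefix. The class name `ModifiedUtfSketch`, the constant `MAX_ENCODED_LENGTH`, and the helpers `encodedLength` and `writeUtfAccepts` are hypothetical names chosen for illustration; only `writeUTF` and `UTFDataFormatException` are existing JDK API.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.UTFDataFormatException;

final class ModifiedUtfSketch {
    // Hypothetical name: largest value that fits in an unsigned 16-bit length.
    static final int MAX_ENCODED_LENGTH = 65535;

    // Counts the bytes needed to encode s in modified UTF-8.
    static int encodedLength(String s) {
        int len = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                len += 1;            // plain ASCII, except U+0000
            } else if (c <= 0x07FF) {
                len += 2;            // includes U+0000, written as 0xC0 0x80
            } else {
                len += 3;            // each surrogate is encoded separately
            }
        }
        return len;
    }

    // Returns false if writeUTF rejects s because its encoding is too long.
    static boolean writeUtfAccepts(String s) {
        try (DataOutputStream out =
                 new DataOutputStream(new ByteArrayOutputStream())) {
            out.writeUTF(s);
            return true;
        } catch (UTFDataFormatException tooLong) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encodedLength("A"));       // 1
        System.out.println(encodedLength("\0"));      // 2
        System.out.println(encodedLength("\u20AC"));  // 3
        System.out.println(writeUtfAccepts("a".repeat(MAX_ENCODED_LENGTH)));     // true
        System.out.println(writeUtfAccepts("a".repeat(MAX_ENCODED_LENGTH + 1))); // false
    }
}
```

The limit in this sketch comes from the 2-byte length prefix rather than from the encoding itself, which is consistent with the point that modified UTF-8 has no inherent maximum length.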
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/26802#discussion_r2323100636