Re: RFR: 8366421: ModifiedUtf.utfLen may overflow for giant string

Guanqiang Han Thu, 18 Sep 2025 08:30:12 -0700

On Wed, 17 Sep 2025 13:32:01 GMT, Roger Riggs <[email protected]> wrote:


>> Please review this patch.
>> 
>> **Description:**
>> 
>> Currently, ModifiedUtf.utfLen returns a signed int. For very large strings, 
>> this may overflow and produce negative values, leading to incorrect behavior 
>> in code that relies on the UTF length. This patch changes the return type to 
>> long, which fully resolves the issue and allows safe handling of giant 
>> strings.
>> 
>> **Test:**
>> 
>> GHA
>
> Can you add a test of the maximum length UTF-8 encoded string. 
> That would be a string of Integer.MAX_VALUE/2 characters that were > 0xff.
> It will likely have to write it to a file and read it back, 
> ByteArrayIn/OutStream wouldn't be big enough.
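For context, here is a minimal sketch of how the modified UTF-8 length is counted and why an int total overflows for the maximum-length case. It uses a hypothetical helper, `modifiedUtfLength`, and is not the JDK's internal ModifiedUtf.utfLen. If Integer.MAX_VALUE/2 chars each need 3 bytes (U+0800..U+FFFF), the encoded length is about 3.2 billion bytes. Chars in U+0100..U+07FF take only 2 bytes, giving 2,147,483,646, which still fits in an int. The worst-case size is computed arithmetically instead of by building the string.

```java
// Hypothetical demo class; modifiedUtfLength is an illustrative helper, not a JDK API.
public class Utf8LenOverflowDemo {

    // Counts bytes in modified UTF-8 (the encoding used by DataOutput.writeUTF):
    // U+0001..U+007F -> 1 byte, U+0000 and U+0080..U+07FF -> 2 bytes,
    // U+0800..U+FFFF (including each surrogate char) -> 3 bytes.
    // Returns long so the running sum cannot wrap around.
    static long modifiedUtfLength(CharSequence s) {
        long len = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                len += 1;
            } else if (c <= 0x07FF) {
                len += 2;
            } else {
                len += 3;
            }
        }
        return len;
    }

    public static void main(String[] args) {
        System.out.println(modifiedUtfLength("a\u00e9\u4e2d")); // 6 (1 + 2 + 3)

        int chars = Integer.MAX_VALUE / 2; // 1073741823 chars
        // Worst case: every char needs 3 bytes. Computed without allocating the string.
        long asLong = 3L * chars;
        int asInt = 3 * chars; // int multiplication wraps around
        System.out.println(asLong); // 3221225469
        System.out.println(asInt);  // -1073741827
    }
}
```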

Hi @RogerRiggs @liach 
Thanks for the suggestion. 
Creating a string of Integer.MAX_VALUE/2 characters would require an enormous
amount of memory even if the data went through a file, because the JVM still
has to hold the whole string in memory once it is read back (for characters
above 0xff, the string's backing array alone is roughly 2 GB).
Instead, I used a small chunk containing 1-, 2-, and 3-byte UTF-8 characters
and called ModifiedUtf.utfLen() on it repeatedly in a loop, accumulating the
total in a long. This simulates a total length exceeding Integer.MAX_VALUE
without allocating a giant string, and verifies that the change to long
prevents overflow.
Could you please take another look when you have time? Thanks!
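A minimal sketch of the chunked approach described here follows. It assumes a hypothetical `utfLen` stand-in with the same modified UTF-8 byte counts (the real ModifiedUtf class is JDK-internal). The names `ChunkedUtfLenCheck` and `simulatedTotal` are illustrative and are not taken from the PR. The chunk encodes to 6 bytes, so 357,913,942 repetitions push the long total just past Integer.MAX_VALUE. Narrowing that total to int shows the negative value an int accumulator would wrap to.

```java
// Hypothetical sketch of the chunked test idea; names are illustrative, not from the PR.
public class ChunkedUtfLenCheck {

    // Stand-in for the JDK-internal ModifiedUtf.utfLen, using the same
    // modified UTF-8 byte counts: 1 byte for U+0001..U+007F, 2 bytes for
    // U+0000 and U+0080..U+07FF, 3 bytes for U+0800..U+FFFF.
    static long utfLen(String s) {
        long len = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                len += 1;
            } else if (c <= 0x07FF) {
                len += 2;
            } else {
                len += 3;
            }
        }
        return len;
    }

    // Adds the chunk's encoded length once per repetition, as if the chunk
    // had been concatenated that many times, without building the big string.
    static long simulatedTotal(String chunk, long repetitions) {
        long total = 0;
        for (long i = 0; i < repetitions; i++) {
            total += utfLen(chunk);
        }
        return total;
    }

    public static void main(String[] args) {
        String chunk = "A\u00e9\u4e2d";         // 1-, 2-, and 3-byte chars: 6 bytes
        long reps = Integer.MAX_VALUE / 6 + 1;  // 357913942 repetitions
        long total = simulatedTotal(chunk, reps);
        System.out.println(total);                     // 2147483652
        System.out.println(total > Integer.MAX_VALUE); // true
        System.out.println((int) total);               // -2147483644, what an int total wraps to
    }
}
```

Computing the chunk's length once and multiplying would give the same total. The loop simply mirrors the repeated-call structure of the test.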

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27285#issuecomment-3307322322
