On Wed, 17 Sep 2025 13:32:01 GMT, Roger Riggs <[email protected]> wrote:

>> Please review this patch.
>> 
>> **Description:**
>> 
>> Currently, ModifiedUtf.utfLen returns a signed int. For very large strings,
>> this may overflow and produce negative values, leading to incorrect behavior
>> in code that relies on the UTF length. This patch changes the return type to
>> long, which fully resolves the issue and allows safe handling of giant
>> strings.
>> 
>> **Test:**
>> 
>> GHA
> 
> Can you add a test of the maximum length UTF-8 encoded string.
> That would be a string of Integer.MAX_VALUE/2 characters that were
> 0xff.
> It will likely have to write it to a file and read it back,
> ByteArrayIn/OutStream wouldn't be big enough.

Hi @RogerRiggs @liach

Thanks for the suggestion.

Creating a string of Integer.MAX_VALUE/2 characters would require enormous memory, even when using a file, since the JVM still needs to hold the string content in memory when reading it back.

Instead, I used a small string chunk containing 1-, 2-, and 3-byte UTF-8 characters and repeatedly called ModifiedUtf.utfLen() in a loop, accumulating the total in a long. This safely simulates a total length exceeding Integer.MAX_VALUE and verifies that the change to long prevents overflow.

Could you please take another look when you have time?

Thanks!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27285#issuecomment-3307322322
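
For readers following the thread, here is a minimal sketch of the chunked approach described in the reply above. It is not the test from the PR. `ModifiedUtf` is a JDK-internal class, so the `utfLen` method below is a hypothetical stand-in that applies the modified UTF-8 sizing rules documented for `java.io.DataInput`; the class name, chunk contents, and iteration count are likewise assumptions chosen for illustration.

```java
public class UtfLenOverflowSketch {

    // Hypothetical stand-in for the JDK-internal ModifiedUtf.utfLen:
    // returns the number of modified UTF-8 bytes needed to encode s.
    static long utfLen(String s) {
        long len = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                len += 1;          // 1-byte form
            } else if (c <= 0x07FF) {
                len += 2;          // 2-byte form (also used for U+0000)
            } else {
                len += 3;          // 3-byte form (surrogates are encoded individually)
            }
        }
        return len;
    }

    // Calls utfLen on the same chunk repeatedly and sums the results in a long,
    // so no string larger than the chunk is ever held in memory.
    static long accumulate(String chunk, long iterations) {
        long total = 0;
        for (long i = 0; i < iterations; i++) {
            total += utfLen(chunk);
        }
        return total;
    }

    public static void main(String[] args) {
        // One 1-byte, one 2-byte, and one 3-byte character: 6 bytes per unit.
        String unit = "A\u00e9\u20ac";
        String chunk = unit.repeat(1000);   // 6000 encoded bytes per chunk

        // Enough iterations for the running total to pass Integer.MAX_VALUE.
        long iterations = Integer.MAX_VALUE / utfLen(chunk) + 1;
        long total = accumulate(chunk, iterations);

        System.out.println("total = " + total);
        System.out.println("exceeds Integer.MAX_VALUE: " + (total > Integer.MAX_VALUE));
        // Narrowing the same total to int shows the wraparound the patch avoids.
        System.out.println("as int = " + (int) total);
    }
}
```

Note that this exercises the accumulated total rather than a single string whose encoded length exceeds `Integer.MAX_VALUE`, which is the trade-off the reply describes.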
