Re: RFR: 8329623: NegativeArraySizeException encoding large String to UTF-8

2024-04-08 Thread Raffaello Giulietti
On Fri, 5 Apr 2024 17:44:38 GMT, Roger Riggs  wrote:

> When encoding a vary large string in String.getBytes(StandardCharset.UTF_8) 
> computation of the buffer size may exceed the range of a positive 32-bit 
> Integer.
> If the estimated size for the result byte array is too large, pre-compute the 
> exact buffer size. 
> If that exceeds the range, then throw OutOfMemoryError.

Marked as reviewed by rgiulietti (Reviewer).

-

PR Review: https://git.openjdk.org/jdk/pull/18663#pullrequestreview-1986623356


Re: RFR: 8329623: NegativeArraySizeException encoding large String to UTF-8

2024-04-08 Thread Raffaello Giulietti
On Mon, 8 Apr 2024 13:46:03 GMT, Roger Riggs  wrote:

>> Indeed, different OOMEs are thrown in the two cases triggered by different 
>> limits, min +2 is due to integer overflow, while min +1  is due a VM limit 
>> on the size of byte[Integer.MAX_VALUE]. Different VM implementations may 
>> have different limits on the max size of a byte array.
>
> There might be some merit in lowering the threshold at which an exact size 
> computation is triggered.
> The oversized allocation "wastes" quite a bit of memory and causes extra GC 
> work and usually triggers a second copy of the final size.
> However, some guess or heuristic would be needed to choose the threshold at 
> which extra cpu work is needed to compute the exact size vs some metric as to 
> the "cost" of wasted memory (and saving on the copy).
> Most guesses would be somewhat arbitrary; bigger than 1Mb, 1GB, etc? 
> Choosing that number would be out of scope for the issue raised by this bug.

What I mean is that the VM limit might change and suddenly accept `MAX_VALUE` 
as an allowed array size (very unlikely, I guess). The test would then fail on 
`min + 1` because it expects a OOME which will not be thrown.

But that is very remote.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18663#discussion_r1555944305


Re: RFR: 8329623: NegativeArraySizeException encoding large String to UTF-8

2024-04-08 Thread Roger Riggs
On Mon, 8 Apr 2024 13:39:34 GMT, Roger Riggs  wrote:

>> test/jdk/java/lang/String/CompactString/MaxSizeUTF16String.java line 143:
>> 
>>> 141: // Strings of size min+1...min+2, throw OOME
>>> 142: // The resulting byte array would exceed implementation limits
>>> 143: for (int count = min + 1; count < max; count++) {
>> 
>> The case `min + 1` cannot lead to a `NegativeArraySizeException` in the 
>> current code, since `3 * (min + 1) <= MAX_VALUE`. In theory, it should 
>> succeed by returning the encoded `byte[]`, although It throws `OOME` for 
>> exceeding VM limits. That is, this case does not trigger the invocation of 
>> `computeSizeUTF8_UTF16()` in the proposed fix.
>> 
>> Only `min + 2` throws `NegativeArraySizeException` in the current code, and 
>> thus the invocation of `computeSizeUTF8_UTF16()` in the proposed fix.
>
> Indeed, different OOMEs are thrown in the two cases triggered by different 
> limits, min +2 is due to integer overflow, while min +1  is due a VM limit on 
> the size of byte[Integer.MAX_VALUE]. Different VM implementations may have 
> different limits on the max size of a byte array.

There might be some merit in lowering the threshold at which an exact size 
computation is triggered.
The oversized allocation "wastes" quite a bit of memory and causes extra GC 
work and usually triggers a second copy of the final size.
However, some guess or heuristic would be needed to choose the threshold at 
which extra cpu work is needed to compute the exact size vs some metric as to 
the "cost" of wasted memory (and saving on the copy).
Most guesses would be somewhat arbitrary; bigger than 1Mb, 1GB, etc? 
Choosing that number would be out of scope for the issue raised by this bug.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18663#discussion_r1555875973


Re: RFR: 8329623: NegativeArraySizeException encoding large String to UTF-8

2024-04-08 Thread Roger Riggs
On Mon, 8 Apr 2024 08:54:21 GMT, Raffaello Giulietti  
wrote:

>> When encoding a vary large string in String.getBytes(StandardCharset.UTF_8) 
>> computation of the buffer size may exceed the range of a positive 32-bit 
>> Integer.
>> If the estimated size for the result byte array is too large, pre-compute 
>> the exact buffer size. 
>> If that exceeds the range, then throw OutOfMemoryError.
>
> test/jdk/java/lang/String/CompactString/MaxSizeUTF16String.java line 143:
> 
>> 141: // Strings of size min+1...min+2, throw OOME
>> 142: // The resulting byte array would exceed implementation limits
>> 143: for (int count = min + 1; count < max; count++) {
> 
> The case `min + 1` cannot lead to a `NegativeArraySizeException` in the 
> current code, since `3 * (min + 1) <= MAX_VALUE`. In theory, it should 
> succeed by returning the encoded `byte[]`, although It throws `OOME` for 
> exceeding VM limits. That is, this case does not trigger the invocation of 
> `computeSizeUTF8_UTF16()` in the proposed fix.
> 
> Only `min + 2` throws `NegativeArraySizeException` in the current code, and 
> thus the invocation of `computeSizeUTF8_UTF16()` in the proposed fix.

Indeed, different OOMEs are thrown in the two cases triggered by different 
limits, min +2 is due to integer overflow, while min +1  is due a VM limit on 
the size of byte[Integer.MAX_VALUE]. Different VM implementations may have 
different limits. on the max size of a byte array.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18663#discussion_r1555866610


Re: RFR: 8329623: NegativeArraySizeException encoding large String to UTF-8

2024-04-08 Thread Raffaello Giulietti
On Fri, 5 Apr 2024 17:44:38 GMT, Roger Riggs  wrote:

> When encoding a vary large string in String.getBytes(StandardCharset.UTF_8) 
> computation of the buffer size may exceed the range of a positive 32-bit 
> Integer.
> If the estimated size for the result byte array is too large, pre-compute the 
> exact buffer size. 
> If that exceeds the range, then throw OutOfMemoryError.

Code fix looks good, but see the comment in the test.

test/jdk/java/lang/String/CompactString/MaxSizeUTF16String.java line 143:

> 141: // Strings of size min+1...min+2, throw OOME
> 142: // The resulting byte array would exceed implementation limits
> 143: for (int count = min + 1; count < max; count++) {

The case `min + 1` cannot lead to a `NegativeArraySizeException` in the current 
code, since `3 * (min + 1) <= MAX_VALUE`. In theory, it should succeed by 
returning the encoded `byte[]`, although It throws `OOME` for exceeding VM 
limits. That is, this case does not trigger the invocation of 
`computeSizeUTF8_UTF16()` in the proposed fix.

Only `min + 2` throws `NegativeArraySizeException` in the current code, and 
thus the invocation of `computeSizeUTF8_UTF16()` in the proposed fix.

-

PR Review: https://git.openjdk.org/jdk/pull/18663#pullrequestreview-1985830221
PR Review Comment: https://git.openjdk.org/jdk/pull/18663#discussion_r1555466467


Re: RFR: 8329623: NegativeArraySizeException encoding large String to UTF-8

2024-04-05 Thread Naoto Sato
On Fri, 5 Apr 2024 21:57:15 GMT, Roger Riggs  wrote:

> The test doesn't run quickly already due to the large chunks of memory used.

OK, never mind then, if it would take considerable time.

-

PR Comment: https://git.openjdk.org/jdk/pull/18663#issuecomment-2040706486


Re: RFR: 8329623: NegativeArraySizeException encoding large String to UTF-8

2024-04-05 Thread Roger Riggs
On Fri, 5 Apr 2024 20:17:39 GMT, Naoto Sato  wrote:

> LGTM. The test case could be more thorough if it tests strings with 
> supplementary codepoints, as the new method computes them exclusively.

I considered that, but the worst case is the x3 expansion. 
A 2 character high/low surrogate pair would result in 4 bytes of UTF-8, less 
than the 6 bytes needed for a 2 16 bit characters. The test doesn't run quickly 
already due to the large chunks of memory used.

-

PR Comment: https://git.openjdk.org/jdk/pull/18663#issuecomment-2040681452


Re: RFR: 8329623: NegativeArraySizeException encoding large String to UTF-8

2024-04-05 Thread Naoto Sato
On Fri, 5 Apr 2024 17:44:38 GMT, Roger Riggs  wrote:

> When encoding a vary large string in String.getBytes(StandardCharset.UTF_8) 
> computation of the buffer size may exceed the range of a positive 32-bit 
> Integer.
> If the estimated size for the result byte array is too large, pre-compute the 
> exact buffer size. 
> If that exceeds the range, then throw OutOfMemoryError.

LGTM. The test case could be more thorough if it tests strings with 
supplementary codepoints, as the new method computes them exclusively.

-

Marked as reviewed by naoto (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/18663#pullrequestreview-1984117426


RFR: 8329623: NegativeArraySizeException encoding large String to UTF-8

2024-04-05 Thread Roger Riggs
When encoding a vary large string in String.getBytes(StandardCharset.UTF_8) 
computation of the buffer size may exceed the range of a positive 32-bit 
Integer.
If the estimated size for the result byte array is too large, pre-compute the 
exact buffer size. 
If that exceeds the range, then throw OutOfMemoryError.

-

Commit messages:
 - 8329623: NegativeArraySizeException encoding large String to UTF-8

Changes: https://git.openjdk.org/jdk/pull/18663/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk=18663=00
  Issue: https://bugs.openjdk.org/browse/JDK-8329623
  Stats: 78 lines in 2 files changed: 76 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/18663.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18663/head:pull/18663

PR: https://git.openjdk.org/jdk/pull/18663