Re: RFR: 8329623: NegativeArraySizeException encoding large String to UTF-8
On Fri, 5 Apr 2024 17:44:38 GMT, Roger Riggs wrote: > When encoding a vary large string in String.getBytes(StandardCharset.UTF_8) > computation of the buffer size may exceed the range of a positive 32-bit > Integer. > If the estimated size for the result byte array is too large, pre-compute the > exact buffer size. > If that exceeds the range, then throw OutOfMemoryError. Marked as reviewed by rgiulietti (Reviewer). - PR Review: https://git.openjdk.org/jdk/pull/18663#pullrequestreview-1986623356
Re: RFR: 8329623: NegativeArraySizeException encoding large String to UTF-8
On Mon, 8 Apr 2024 13:46:03 GMT, Roger Riggs wrote: >> Indeed, different OOMEs are thrown in the two cases triggered by different >> limits, min +2 is due to integer overflow, while min +1 is due a VM limit >> on the size of byte[Integer.MAX_VALUE]. Different VM implementations may >> have different limits on the max size of a byte array. > > There might be some merit in lowering the threshold at which an exact size > computation is triggered. > The oversized allocation "wastes" quite a bit of memory and causes extra GC > work and usually triggers a second copy of the final size. > However, some guess or heuristic would be needed to choose the threshold at > which extra cpu work is needed to compute the exact size vs some metric as to > the "cost" of wasted memory (and saving on the copy). > Most guesses would be somewhat arbitrary; bigger than 1Mb, 1GB, etc? > Choosing that number would be out of scope for the issue raised by this bug. What I mean is that the VM limit might change and suddenly accept `MAX_VALUE` as an allowed array size (very unlikely, I guess). The test would then fail on `min + 1` because it expects a OOME which will not be thrown. But that is very remote. - PR Review Comment: https://git.openjdk.org/jdk/pull/18663#discussion_r1555944305
Re: RFR: 8329623: NegativeArraySizeException encoding large String to UTF-8
On Mon, 8 Apr 2024 13:39:34 GMT, Roger Riggs wrote: >> test/jdk/java/lang/String/CompactString/MaxSizeUTF16String.java line 143: >> >>> 141: // Strings of size min+1...min+2, throw OOME >>> 142: // The resulting byte array would exceed implementation limits >>> 143: for (int count = min + 1; count < max; count++) { >> >> The case `min + 1` cannot lead to a `NegativeArraySizeException` in the >> current code, since `3 * (min + 1) <= MAX_VALUE`. In theory, it should >> succeed by returning the encoded `byte[]`, although It throws `OOME` for >> exceeding VM limits. That is, this case does not trigger the invocation of >> `computeSizeUTF8_UTF16()` in the proposed fix. >> >> Only `min + 2` throws `NegativeArraySizeException` in the current code, and >> thus the invocation of `computeSizeUTF8_UTF16()` in the proposed fix. > > Indeed, different OOMEs are thrown in the two cases triggered by different > limits, min +2 is due to integer overflow, while min +1 is due a VM limit on > the size of byte[Integer.MAX_VALUE]. Different VM implementations may have > different limits on the max size of a byte array. There might be some merit in lowering the threshold at which an exact size computation is triggered. The oversized allocation "wastes" quite a bit of memory and causes extra GC work and usually triggers a second copy of the final size. However, some guess or heuristic would be needed to choose the threshold at which extra cpu work is needed to compute the exact size vs some metric as to the "cost" of wasted memory (and saving on the copy). Most guesses would be somewhat arbitrary; bigger than 1Mb, 1GB, etc? Choosing that number would be out of scope for the issue raised by this bug. - PR Review Comment: https://git.openjdk.org/jdk/pull/18663#discussion_r1555875973
Re: RFR: 8329623: NegativeArraySizeException encoding large String to UTF-8
On Mon, 8 Apr 2024 08:54:21 GMT, Raffaello Giulietti wrote: >> When encoding a vary large string in String.getBytes(StandardCharset.UTF_8) >> computation of the buffer size may exceed the range of a positive 32-bit >> Integer. >> If the estimated size for the result byte array is too large, pre-compute >> the exact buffer size. >> If that exceeds the range, then throw OutOfMemoryError. > > test/jdk/java/lang/String/CompactString/MaxSizeUTF16String.java line 143: > >> 141: // Strings of size min+1...min+2, throw OOME >> 142: // The resulting byte array would exceed implementation limits >> 143: for (int count = min + 1; count < max; count++) { > > The case `min + 1` cannot lead to a `NegativeArraySizeException` in the > current code, since `3 * (min + 1) <= MAX_VALUE`. In theory, it should > succeed by returning the encoded `byte[]`, although It throws `OOME` for > exceeding VM limits. That is, this case does not trigger the invocation of > `computeSizeUTF8_UTF16()` in the proposed fix. > > Only `min + 2` throws `NegativeArraySizeException` in the current code, and > thus the invocation of `computeSizeUTF8_UTF16()` in the proposed fix. Indeed, different OOMEs are thrown in the two cases triggered by different limits, min +2 is due to integer overflow, while min +1 is due a VM limit on the size of byte[Integer.MAX_VALUE]. Different VM implementations may have different limits. on the max size of a byte array. - PR Review Comment: https://git.openjdk.org/jdk/pull/18663#discussion_r1555866610
Re: RFR: 8329623: NegativeArraySizeException encoding large String to UTF-8
On Fri, 5 Apr 2024 17:44:38 GMT, Roger Riggs wrote: > When encoding a vary large string in String.getBytes(StandardCharset.UTF_8) > computation of the buffer size may exceed the range of a positive 32-bit > Integer. > If the estimated size for the result byte array is too large, pre-compute the > exact buffer size. > If that exceeds the range, then throw OutOfMemoryError. Code fix looks good, but see the comment in the test. test/jdk/java/lang/String/CompactString/MaxSizeUTF16String.java line 143: > 141: // Strings of size min+1...min+2, throw OOME > 142: // The resulting byte array would exceed implementation limits > 143: for (int count = min + 1; count < max; count++) { The case `min + 1` cannot lead to a `NegativeArraySizeException` in the current code, since `3 * (min + 1) <= MAX_VALUE`. In theory, it should succeed by returning the encoded `byte[]`, although It throws `OOME` for exceeding VM limits. That is, this case does not trigger the invocation of `computeSizeUTF8_UTF16()` in the proposed fix. Only `min + 2` throws `NegativeArraySizeException` in the current code, and thus the invocation of `computeSizeUTF8_UTF16()` in the proposed fix. - PR Review: https://git.openjdk.org/jdk/pull/18663#pullrequestreview-1985830221 PR Review Comment: https://git.openjdk.org/jdk/pull/18663#discussion_r1555466467
Re: RFR: 8329623: NegativeArraySizeException encoding large String to UTF-8
On Fri, 5 Apr 2024 21:57:15 GMT, Roger Riggs wrote: > The test doesn't run quickly already due to the large chunks of memory used. OK, never mind then, if it would take considerable time. - PR Comment: https://git.openjdk.org/jdk/pull/18663#issuecomment-2040706486
Re: RFR: 8329623: NegativeArraySizeException encoding large String to UTF-8
On Fri, 5 Apr 2024 20:17:39 GMT, Naoto Sato wrote: > LGTM. The test case could be more thorough if it tests strings with > supplementary codepoints, as the new method computes them exclusively. I considered that, but the worst case is the x3 expansion. A 2 character high/low surrogate pair would result in 4 bytes of UTF-8, less than the 6 bytes needed for a 2 16 bit characters. The test doesn't run quickly already due to the large chunks of memory used. - PR Comment: https://git.openjdk.org/jdk/pull/18663#issuecomment-2040681452
Re: RFR: 8329623: NegativeArraySizeException encoding large String to UTF-8
On Fri, 5 Apr 2024 17:44:38 GMT, Roger Riggs wrote: > When encoding a vary large string in String.getBytes(StandardCharset.UTF_8) > computation of the buffer size may exceed the range of a positive 32-bit > Integer. > If the estimated size for the result byte array is too large, pre-compute the > exact buffer size. > If that exceeds the range, then throw OutOfMemoryError. LGTM. The test case could be more thorough if it tests strings with supplementary codepoints, as the new method computes them exclusively. - Marked as reviewed by naoto (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18663#pullrequestreview-1984117426