On Mon, 30 Oct 2023 18:34:44 GMT, Roger Riggs wrote:
> Strings, after construction, are immutable but may be constructed from
> mutable arrays of bytes, characters, or integers.
> The string constructors should guard against the effects of mutating the
> arrays during construction that might invalidate internal invariants for the
> correct behavior of operations on the resulting strings. In particular, a
> number of operations have optimizations for operations on pairs of latin1
> strings and pairs of non-latin1 strings, while operations between latin1 and
> non-latin1 strings use a more general implementation.
>
> The changes include:
>
> - Adding a warning to each constructor with an array as an argument to
> indicate that the results are indeterminate
> if the input array is modified before the constructor returns.
> The resulting string may contain any combination of characters sampled from
> the input array.
>
> - Ensuring that strings represented as non-latin1 contain at least one
> non-latin1 character.
> For latin1 inputs, whether the arrays contain ASCII, ISO-8859-1, UTF-8, or
> another encoding decoded to latin1, the scanning and compression are unchanged.
> If a non-latin1 character is found, the string is represented as non-latin1,
> with the added verification that a non-latin1 character is present at the
> same index in the copy.
> If that character is found to be latin1, then the input array has been
> modified and the result of the scan may be incorrect.
> Though a ConcurrentModificationException could be thrown, an unexpected
> exception would be a risk to existing applications and should be avoided.
> Instead, the non-latin1 copy of the input is re-scanned and compressed;
> that scan determines whether the latin1 or the non-latin1 representation is
> returned.
>
> - The methods that scan for non-latin1 characters and their intrinsic
> implementations are updated to return the index of the non-latin1 character.
>
> - String construction from StringBuilder and CharSequence must also be
> guarded as their contents may be modified during construction.
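The second and third bullets describe a scan, copy, verify, and re-scan sequence. Below is a minimal single-threaded sketch of that sequence for a `char[]` input, assuming a simplified model with exactly two representations. `Latin1Sketch`, `Repr`, `compress`, and `construct` are hypothetical names for illustration, not the JDK's internal `StringUTF16` code.

```java
import java.util.Arrays;

// Hypothetical, simplified model of guarded String construction; not JDK code.
final class Latin1Sketch {

    // Exactly one of the two arrays is non-null.
    static final class Repr {
        final byte[] latin1;
        final char[] utf16;

        Repr(byte[] latin1, char[] utf16) {
            this.latin1 = latin1;
            this.utf16 = utf16;
        }

        boolean isLatin1() {
            return latin1 != null;
        }
    }

    // Copies src[off .. off+len) into dst while each char fits in latin1.
    // Returns the index (relative to off) of the first non-latin1 char, or len.
    static int compress(char[] src, int off, byte[] dst, int len) {
        for (int i = 0; i < len; i++) {
            char c = src[off + i];
            if (c > 0xFF) {
                return i;
            }
            dst[i] = (byte) c;
        }
        return len;
    }

    static Repr construct(char[] value, int off, int len) {
        byte[] latin1 = new byte[len];
        // First scan reads the caller's array, which another thread may be mutating.
        int np = compress(value, off, latin1, len);
        if (np == len) {
            return new Repr(latin1, null); // latin1 is already a private copy
        }
        // Take a private UTF-16 copy that no other thread can see or modify.
        char[] utf16 = Arrays.copyOfRange(value, off, off + len);
        if (utf16[np] > 0xFF) {
            return new Repr(null, utf16); // non-latin1 char confirmed in the copy
        }
        // value changed between the scan and the copy: re-scan the stable copy,
        // which alone decides the representation.
        np = compress(utf16, 0, latin1, len);
        return np == len ? new Repr(latin1, null) : new Repr(null, utf16);
    }

    public static void main(String[] args) {
        Repr ascii = construct("hello".toCharArray(), 0, 5);
        Repr euro = construct("h\u20ACllo".toCharArray(), 0, 5);
        System.out.println(ascii.isLatin1() + " " + euro.isLatin1()); // prints "true false"
    }
}
```

In this sketch the UTF-16 array is returned only after a non-latin1 character has been seen in the private copy, so a concurrent writer can change which representation is chosen but cannot produce a non-latin1 result that holds only latin1 characters.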
src/java.base/share/classes/java/lang/String.java line 566:
> 564: }
> 565: // Decode with a stable copy, to be the result if the decoded length is the same
> 566: byte[] latin1 = Arrays.copyOfRange(bytes, offset, offset + length);
This has to be moved before the `if (dp == length) { … }` check, as that branch
also makes a copy; copying first lets one stable copy serve as both the scanned
input and the result:
    // Decode with a stable copy, to be the result if the decoded length is the same
    byte[] latin1 = Arrays.copyOfRange(bytes, offset, offset + length);
    // The copy starts at index 0, so scan it from 0 rather than from offset
    int dp = StringCoding.countPositives(latin1, 0, length);
    if (dp == length) {
        this.value = latin1;
        this.coder = LATIN1;
        return;
    }
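The copy-first ordering in this review comment can be tried in isolation. The sketch below assumes a plain-loop stand-in for the JDK-internal `StringCoding.countPositives`, which is not accessible outside `java.base`. `CopyFirstSketch` and `asciiFastPath` are hypothetical names, and only the all-ASCII fast path is modeled.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of the copy-first ordering; not the JDK's String code.
final class CopyFirstSketch {

    // Plain-loop stand-in for the JDK-internal StringCoding.countPositives:
    // returns the number of leading non-negative bytes in ba[off .. off+len).
    static int countPositives(byte[] ba, int off, int len) {
        int limit = off + len;
        for (int i = off; i < limit; i++) {
            if (ba[i] < 0) {
                return i - off;
            }
        }
        return len;
    }

    // Returns a private copy of bytes[offset .. offset+length) if it is all
    // ASCII, or null when a non-ASCII byte is found (a full decoder would
    // continue decoding from the same copy).
    static byte[] asciiFastPath(byte[] bytes, int offset, int length) {
        // Copy first, so the scan and the returned array see the same contents.
        byte[] latin1 = Arrays.copyOfRange(bytes, offset, offset + length);
        // The copy starts at index 0, not at offset.
        int dp = countPositives(latin1, 0, length);
        return dp == length ? latin1 : null;
    }

    public static void main(String[] args) {
        byte[] input = "xhello".getBytes(StandardCharsets.US_ASCII);
        byte[] result = asciiFastPath(input, 1, 5);
        System.out.println(new String(result, StandardCharsets.US_ASCII)); // prints "hello"
    }
}
```

Because the scan runs on the private copy rather than on `bytes`, a concurrent writer to `bytes` cannot make the returned array disagree with the outcome of the scan, and the fast path needs only one copy.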
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1382576891