On Mon, 30 Oct 2023 18:34:44 GMT, Roger Riggs <rri...@openjdk.org> wrote:

> Strings are immutable after construction, but they may be constructed from
> mutable arrays of bytes, characters, or integers.
> The string constructors should guard against mutation of those arrays during
> construction, which could break the internal invariants that operations on
> the resulting strings rely on. In particular, a number of operations are
> optimized for pairs of latin1 strings and for pairs of non-latin1 strings,
> while operations between a latin1 and a non-latin1 string use a more general
> implementation.
> 
> The changes include:
> 
> - Add a warning to each constructor that takes an array argument, stating
>   that the result is indeterminate if the input array is modified before the
>   constructor returns.
>   The resulting string may contain any combination of characters sampled from
>   the input array.
> 
> - Ensure that strings represented as non-latin1 contain at least one
>   non-latin1 character.
>   For latin1 inputs, whether the arrays contain ASCII, ISO-8859-1, UTF-8, or
>   another encoding decoded to latin1, the scanning and compression are
>   unchanged.
>   If a non-latin1 character is found, the string is represented as non-latin1,
>   with the added verification that a non-latin1 character is present at the
>   same index.
>   If that character is found to be latin1, then the input array has been
>   modified and the result of the scan may be incorrect.
>   Although a ConcurrentModificationException could be thrown, an unexpected
>   exception would put existing applications at risk and should be avoided.
>   Instead, the non-latin1 copy of the input is re-scanned and compressed;
>   that scan determines whether the latin1 or the non-latin1 representation is
>   returned.
> 
> - Update the methods that scan for non-latin1 characters, and their intrinsic
>   implementations, to return the index of the first non-latin1 character.
> 
> - Guard string construction from StringBuilder and CharSequence as well,
>   since their contents may be modified during construction.
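
The scan, verify, and re-scan steps described in the quoted list can be sketched roughly as below. This is an illustration only, not the code from the changeset: the class `StringCompressSketch`, the methods `compress` and `encode`, the `Encoded` holder, and the `betweenScanAndCopy` hook are all hypothetical names. The hook exists only so that a concurrent writer can be simulated deterministically on a single thread.

```java
import java.util.Arrays;

// Hypothetical sketch; none of these names come from the JDK sources.
final class StringCompressSketch {

    /** Holds either a latin1 byte[] or a private UTF-16 char[] copy. */
    static final class Encoded {
        final boolean latin1;
        final byte[] bytes;   // set when latin1 is true
        final char[] chars;   // set when latin1 is false
        Encoded(boolean latin1, byte[] bytes, char[] chars) {
            this.latin1 = latin1;
            this.bytes = bytes;
            this.chars = chars;
        }
    }

    /**
     * Copies chars into dst while they fit in one byte.
     * Returns the index of the first char above 0xFF, or src.length if none.
     */
    static int compress(char[] src, byte[] dst) {
        for (int i = 0; i < src.length; i++) {
            char c = src[i];
            if (c > 0xFF) {
                return i;             // index of the first non-latin1 char
            }
            dst[i] = (byte) c;
        }
        return src.length;
    }

    /**
     * Encodes input, tolerating modification of the array during the call.
     * betweenScanAndCopy is a test hook standing in for a concurrent writer.
     */
    static Encoded encode(char[] input, Runnable betweenScanAndCopy) {
        byte[] latin1 = new byte[input.length];
        int idx = compress(input, latin1);
        if (idx == input.length) {
            return new Encoded(true, latin1, null);
        }
        betweenScanAndCopy.run();
        // A non-latin1 char was seen at idx: take a private copy of the input.
        char[] copy = Arrays.copyOf(input, input.length);
        if (copy[idx] > 0xFF) {
            // The copy still has a non-latin1 char at the same index.
            return new Encoded(false, null, copy);
        }
        // The input changed after the scan. Re-scan the private copy, which
        // can no longer change, and let that scan pick the representation.
        byte[] rescanned = new byte[copy.length];
        if (compress(copy, rescanned) == copy.length) {
            return new Encoded(true, rescanned, null);
        }
        return new Encoded(false, null, copy);
    }

    public static void main(String[] args) {
        char[] racy = {'a', '\u20AC', 'c'};
        // Simulate a writer replacing the euro sign right after the first scan.
        Encoded e = encode(racy, () -> racy[1] = 'x');
        System.out.println("latin1=" + e.latin1);
    }
}
```

The point of the design is that the final decision is always made on data the caller can no longer change, so a result reported as non-latin1 really contains a non-latin1 character, and no exception is thrown for a racing writer.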

This pull request has now been integrated.

Changeset: 155abc57
Author:    Roger Riggs <rri...@openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/155abc576a0212932825485380d4e2a9c7dd2fdc
Stats:     1415 lines in 15 files changed: 1162 ins; 110 del; 143 mod

8311906: Improve robustness of String constructors with mutable array inputs

Co-authored-by: Damon Fenacci <dfena...@openjdk.org>
Co-authored-by: Claes Redestad <redes...@openjdk.org>
Co-authored-by: Amit Kumar <amitku...@openjdk.org>
Co-authored-by: Martin Doerr <mdo...@openjdk.org>
Reviewed-by: rgiulietti, thartmann, redestad, dfenacci

-------------

PR: https://git.openjdk.org/jdk/pull/16425
