On Tue, 14 Nov 2023 16:05:51 GMT, Roger Riggs <rri...@openjdk.org> wrote:

>> Strings, after construction, are immutable but may be constructed from 
>> mutable arrays of bytes, characters, or integers.
>> The string constructors should guard against the effects of mutating the 
>> arrays during construction that might invalidate internal invariants for the 
>> correct behavior of operations on the resulting strings. In particular, a 
>> number of operations have optimizations for operations on pairs of latin1 
>> strings and pairs of non-latin1 strings, while operations between latin1 and 
>> non-latin1 strings use a more general implementation. 
>> 
>> The changes include:
>> 
>> - Adding a warning to each constructor with an array as an argument to 
>> indicate that the results are indeterminate 
>>   if the input array is modified before the constructor returns. 
>>   The resulting string may contain any combination of characters sampled 
>> from the input array.
>> 
>> - Ensure that strings that are represented as non-latin1 contain at least 
>> one non-latin1 character.
>>   For latin1 inputs, whether the arrays contain ASCII, ISO-8859-1, UTF8, or 
>> another encoding decoded to latin1 the scanning and compression is unchanged.
>>   If a non-latin1 character is found, the string is represented as 
>> non-latin1 with the added verification that a non-latin1 character is 
>> present at the same index.
>>   If that character is found to be latin1, then the input array has been 
>> modified and the result of the scan may be incorrect.
>>   Though a ConcurrentModificationException could be thrown, the risk to an 
>> existing application of an unexpected exception should be avoided.
>>   Instead, the non-latin1 copy of the input is re-scanned and compressed; 
>> that scan determines whether the latin1 or the non-latin1 representation is 
>> returned.
>> 
>> - The methods that scan for non-latin1 characters and their intrinsic 
>> implementations are updated to return the index of the non-latin1 character.
>> 
>> - String construction from StringBuilder and CharSequence must also be 
>> guarded as their contents may be modified during construction.
>
> Roger Riggs has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Update PPC implementation of string_compress to return the index of the 
> non-latin1 char
>   Patch supplied by TheRealMDoerr

test/jdk/java/lang/String/StringRacyConstructor.java line 110:

> 108:         for (int i = 0; i < orig.length(); i++)
> 109:             accum |= orig.charAt(i);
> 110:         byte expectedCoder = (accum < 256) ? LATIN1 : UTF16;

I think this assumes that compact strings are enabled during this test.

test/jdk/java/lang/String/StringRacyConstructor.java line 119:

> 117:         for (int i = 0; i < orig.length(); i++)
> 118:             accum |= orig.charAt(i);
> 119:         byte expectedCoder = (accum < 256) ? LATIN1 : UTF16;

Same as above.

test/jdk/java/lang/String/StringRacyConstructor.java line 190:

> 188:                 if (printWarningCount == 0) {
> 189:                     printWarningCount = 1;
> 190:                     System.out.println("StringUTF16.compress returned 0, 
> may not be intrinsic");

It seems to me that the Java code for `StringUTF16.compress` also returns the 
index of the non-latin1 char, so I'm not sure I understand this. Just caution?

test/jdk/java/lang/String/StringRacyConstructor.java line 199:

> 197:         // Exhaustively check compress returning the correct index of 
> the non-latin1 char.
> 198:         final int SIZE = 48;
> 199:         final byte FILL_BYTE = 0x52;

Suggestion:

        final byte FILL_BYTE = 'R';

This makes it more clear that `FILL_BYTE != 'A'` for the logic below.

test/jdk/java/lang/String/StringRacyConstructor.java line 258:

> 256:      */
> 257:     public static String racyStringConstruction(String original) throws 
> ConcurrentModificationException {
> 258:         if (original.chars().max().orElseThrow() > 256) {

Suggestion:

        if (original.chars().max().getAsInt() >= 256) {

test/jdk/java/lang/String/StringRacyConstructor.java line 288:

> 286:                 }
> 287:                 if (i >= 1_000_000) {
> 288:                     System.err.printf("Unable to produce a UTF16 string 
> in %d iterations: %s%n", i, original);

AFAIU, this writes to `System.err` on "success". Is this the intent?

test/jdk/java/lang/String/StringRacyConstructor.java line 300:

> 298:      */
> 299:     public static String racyStringConstructionCodepoints(String 
> original) throws ConcurrentModificationException {
> 300:         if (original.chars().max().orElseThrow() > 256) {

Suggestion:

        if (original.chars().max().getAsInt() >= 256) {

test/jdk/java/lang/String/StringRacyConstructor.java line 347:

> 345:      */
> 346:     public static String 
> racyStringConstructionCodepointsSurrogates(String original) throws 
> ConcurrentModificationException {
> 347:         if (original.chars().max().orElseThrow() > 256) {

Suggestion:

        if (original.chars().max().getAsInt() >= 256) {

test/jdk/java/lang/String/StringRacyConstructor.java line 400:

> 398:         @Override
> 399:         public int length() {
> 400:             return aString.length() + 1;

Not sure why ` + 1`.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1394362549
PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1394363301
PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1394378639
PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1394384444
PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1394364212
PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1394411147
PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1394364741
PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1394367209
PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1394368361

Reply via email to