On Tue, 14 Nov 2023 16:05:51 GMT, Roger Riggs <rri...@openjdk.org> wrote:
>> Strings, after construction, are immutable but may be constructed from >> mutable arrays of bytes, characters, or integers. >> The string constructors should guard against the effects of mutating the >> arrays during construction that might invalidate internal invariants for the >> correct behavior of operations on the resulting strings. In particular, a >> number of operations have optimizations for operations on pairs of latin1 >> strings and pairs of non-latin1 strings, while operations between latin1 and >> non-latin1 strings use a more general implementation. >> >> The changes include: >> >> - Adding a warning to each constructor with an array as an argument to >> indicate that the results are indeterminate >> if the input array is modified before the constructor returns. >> The resulting string may contain any combination of characters sampled >> from the input array. >> >> - Ensure that strings that are represented as non-latin1 contain at least >> one non-latin1 character. >> For latin1 inputs, whether the arrays contain ASCII, ISO-8859-1, UTF8, or >> another encoding decoded to latin1 the scanning and compression is unchanged. >> If a non-latin1 character is found, the string is represented as >> non-latin1 with the added verification that a non-latin1 character is >> present at the same index. >> If that character is found to be latin1, then the input array has been >> modified and the result of the scan may be incorrect. >> Though a ConcurrentModificationException could be thrown, the risk to an >> existing application of an unexpected exception should be avoided. >> Instead, the non-latin1 copy of the input is re-scanned and compressed; >> that scan determines whether the latin1 or the non-latin1 representation is >> returned. >> >> - The methods that scan for non-latin1 characters and their intrinsic >> implementations are updated to return the index of the non-latin1 character. >> >> - String construction from StringBuilder and CharSequence must also be >> guarded as their contents may be modified during construction. > > Roger Riggs has updated the pull request incrementally with one additional > commit since the last revision: > > Update PPC implementation of string_compress to return the index of the > non-latin1 char > Patch supplied by TheRealMDoerr test/jdk/java/lang/String/StringRacyConstructor.java line 110: > 108: for (int i = 0; i < orig.length(); i++) > 109: accum |= orig.charAt(i); > 110: byte expectedCoder = (accum < 256) ? LATIN1 : UTF16; I think this assumes that compact strings are enabled during this test. test/jdk/java/lang/String/StringRacyConstructor.java line 119: > 117: for (int i = 0; i < orig.length(); i++) > 118: accum |= orig.charAt(i); > 119: byte expectedCoder = (accum < 256) ? LATIN1 : UTF16; Same as above. test/jdk/java/lang/String/StringRacyConstructor.java line 190: > 188: if (printWarningCount == 0) { > 189: printWarningCount = 1; > 190: System.out.println("StringUTF16.compress returned 0, > may not be intrinsic"); It seems to me that the Java code for `StringUTF16.compress` also returns the index of the non-latin1 char, so I'm not sure I understand this. Just caution? test/jdk/java/lang/String/StringRacyConstructor.java line 199: > 197: // Exhaustively check compress returning the correct index of > the non-latin1 char. > 198: final int SIZE = 48; > 199: final byte FILL_BYTE = 0x52; Suggestion: final byte FILL_BYTE = 'R'; This makes it more clear that `FILL_BYTE != 'A'` for the logic below. test/jdk/java/lang/String/StringRacyConstructor.java line 258: > 256: */ > 257: public static String racyStringConstruction(String original) throws > ConcurrentModificationException { > 258: if (original.chars().max().orElseThrow() > 256) { Suggestion: if (original.chars().max().getAsInt() >= 256) { test/jdk/java/lang/String/StringRacyConstructor.java line 288: > 286: } > 287: if (i >= 1_000_000) { > 288: System.err.printf("Unable to produce a UTF16 string > in %d iterations: %s%n", i, original); AFAIU, this writes to `System.err` on "success". Is this the intent? test/jdk/java/lang/String/StringRacyConstructor.java line 300: > 298: */ > 299: public static String racyStringConstructionCodepoints(String > original) throws ConcurrentModificationException { > 300: if (original.chars().max().orElseThrow() > 256) { Suggestion: if (original.chars().max().getAsInt() >= 256) { test/jdk/java/lang/String/StringRacyConstructor.java line 347: > 345: */ > 346: public static String > racyStringConstructionCodepointsSurrogates(String original) throws > ConcurrentModificationException { > 347: if (original.chars().max().orElseThrow() > 256) { Suggestion: if (original.chars().max().getAsInt() >= 256) { test/jdk/java/lang/String/StringRacyConstructor.java line 400: > 398: @Override > 399: public int length() { > 400: return aString.length() + 1; Not sure why ` + 1`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1394362549 PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1394363301 PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1394378639 PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1394384444 PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1394364212 PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1394411147 PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1394364741 PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1394367209 PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1394368361