On Wed, 22 Nov 2023 05:03:41 GMT, Roger Riggs <rri...@openjdk.org> wrote:
>> Strings, after construction, are immutable but may be constructed from >> mutable arrays of bytes, characters, or integers. >> The string constructors should guard against the effects of mutating the >> arrays during construction that might invalidate internal invariants for the >> correct behavior of operations on the resulting strings. In particular, a >> number of operations have optimizations for operations on pairs of latin1 >> strings and pairs of non-latin1 strings, while operations between latin1 and >> non-latin1 strings use a more general implementation. >> >> The changes include: >> >> - Adding a warning to each constructor with an array as an argument to >> indicate that the results are indeterminate >> if the input array is modified before the constructor returns. >> The resulting string may contain any combination of characters sampled >> from the input array. >> >> - Ensure that strings that are represented as non-latin1 contain at least >> one non-latin1 character. >> For latin1 inputs, whether the arrays contain ASCII, ISO-8859-1, UTF8, or >> another encoding decoded to latin1 the scanning and compression is unchanged. >> If a non-latin1 character is found, the string is represented as >> non-latin1 with the added verification that a non-latin1 character is >> present at the same index. >> If that character is found to be latin1, then the input array has been >> modified and the result of the scan may be incorrect. >> Though a ConcurrentModificationException could be thrown, the risk to an >> existing application of an unexpected exception should be avoided. >> Instead, the non-latin1 copy of the input is re-scanned and compressed; >> that scan determines whether the latin1 or the non-latin1 representation is >> returned. >> >> - The methods that scan for non-latin1 characters and their intrinsic >> implementations are updated to return the index of the non-latin1 character. >> >> - String construction from StringBuilder and CharSequence must also be >> guarded as their contents may be modified during construction. > > Roger Riggs has updated the pull request incrementally with one additional > commit since the last revision: > > Apply StringUTF16.coderFromArrayLen src/hotspot/cpu/x86/macroAssembler_x86.cpp line 8538: > 8536: > 8537: testl(len, -64); > 8538: jcc(Assembler::zero, post_alignment); @cl4es since some of the `jcc` instructions have been changed into `jccb` in the rest of the intrinsic, I was wondering if it would make sense do the same for the rest of them (where the jump is short): Suggestion: jccb(Assembler::zero, post_alignment); src/hotspot/cpu/x86/macroAssembler_x86.cpp line 8547: > 8545: // bail out when there is nothing to be done > 8546: testl(tmp5, 0xFFFFFFFF); > 8547: jcc(Assembler::zero, post_alignment); Suggestion: jccb(Assembler::zero, post_alignment); and here: https://github.com/openjdk/jdk/blob/d201344b631bf2cc9fb1990874fc7d42d523eeab/src/hotspot/cpu/x86/macroAssembler_x86.cpp#L8574 src/hotspot/cpu/x86/macroAssembler_x86.cpp line 8584: > 8582: evpcmpuw(mask1, tmp1Reg, tmp2Reg, Assembler::le, > Assembler::AVX_512bit); > 8583: kortestdl(mask1, mask1); > 8584: jcc(Assembler::carryClear, reset_for_copy_tail); Suggestion: jccb(Assembler::carryClear, reset_for_copy_tail); and here: https://github.com/openjdk/jdk/blob/d201344b631bf2cc9fb1990874fc7d42d523eeab/src/hotspot/cpu/x86/macroAssembler_x86.cpp#L8590 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1406147105 PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1406147739 PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1406178794