On Wed, 22 Nov 2023 05:03:41 GMT, Roger Riggs <[email protected]> wrote:
>> Strings, after construction, are immutable but may be constructed from
>> mutable arrays of bytes, characters, or integers.
>> The string constructors should guard against the effects of mutating the
>> arrays during construction that might invalidate internal invariants for the
>> correct behavior of operations on the resulting strings. In particular, a
>> number of operations have optimizations for operations on pairs of latin1
>> strings and pairs of non-latin1 strings, while operations between latin1 and
>> non-latin1 strings use a more general implementation.
>>
>> The changes include:
>>
>> - Adding a warning to each constructor with an array as an argument to
>> indicate that the results are indeterminate
>> if the input array is modified before the constructor returns.
>> The resulting string may contain any combination of characters sampled
>> from the input array.
>>
>> - Ensure that strings that are represented as non-latin1 contain at least
>> one non-latin1 character.
>> For latin1 inputs, whether the arrays contain ASCII, ISO-8859-1, UTF8, or
>> another encoding decoded to latin1 the scanning and compression is unchanged.
>> If a non-latin1 character is found, the string is represented as
>> non-latin1 with the added verification that a non-latin1 character is
>> present at the same index.
>> If that character is found to be latin1, then the input array has been
>> modified and the result of the scan may be incorrect.
>> Though a ConcurrentModificationException could be thrown, the risk to an
>> existing application of an unexpected exception should be avoided.
>> Instead, the non-latin1 copy of the input is re-scanned and compressed;
>> that scan determines whether the latin1 or the non-latin1 representation is
>> returned.
>>
>> - The methods that scan for non-latin1 characters and their intrinsic
>> implementations are updated to return the index of the non-latin1 character.
>>
>> - String construction from StringBuilder and CharSequence must also be
>> guarded as their contents may be modified during construction.
>
> Roger Riggs has updated the pull request incrementally with one additional
> commit since the last revision:
>
> Apply StringUTF16.coderFromArrayLen
src/hotspot/cpu/x86/macroAssembler_x86.cpp line 8538:
> 8536:
> 8537: testl(len, -64);
> 8538: jcc(Assembler::zero, post_alignment);
@cl4es since some of the `jcc` instructions have been changed into `jccb` in
the rest of the intrinsic, I was wondering if it would make sense do the same
for the rest of them (where the jump is short):
Suggestion:
jccb(Assembler::zero, post_alignment);
src/hotspot/cpu/x86/macroAssembler_x86.cpp line 8547:
> 8545: // bail out when there is nothing to be done
> 8546: testl(tmp5, 0xFFFFFFFF);
> 8547: jcc(Assembler::zero, post_alignment);
Suggestion:
jccb(Assembler::zero, post_alignment);
and here:
https://github.com/openjdk/jdk/blob/d201344b631bf2cc9fb1990874fc7d42d523eeab/src/hotspot/cpu/x86/macroAssembler_x86.cpp#L8574
src/hotspot/cpu/x86/macroAssembler_x86.cpp line 8584:
> 8582: evpcmpuw(mask1, tmp1Reg, tmp2Reg, Assembler::le,
> Assembler::AVX_512bit);
> 8583: kortestdl(mask1, mask1);
> 8584: jcc(Assembler::carryClear, reset_for_copy_tail);
Suggestion:
jccb(Assembler::carryClear, reset_for_copy_tail);
and here:
https://github.com/openjdk/jdk/blob/d201344b631bf2cc9fb1990874fc7d42d523eeab/src/hotspot/cpu/x86/macroAssembler_x86.cpp#L8590
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1406147105
PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1406147739
PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1406178794