On Thu, 9 Nov 2023 04:16:25 GMT, Roger Riggs wrote:
>> Strings are immutable after construction, but they may be constructed from
>> mutable arrays of bytes, characters, or integers.
>> The string constructors should guard against the effects of the arrays
>> being mutated during construction, which might invalidate internal
>> invariants that operations on the resulting strings rely on. In particular,
>> a number of operations are optimized for pairs of latin1 strings and for
>> pairs of non-latin1 strings, while operations between a latin1 and a
>> non-latin1 string use a more general implementation.
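A minimal sketch of why the invariant matters, using a hypothetical `MiniString` model rather than the real `java.lang.String` (the big-endian UTF16 layout is also an assumption): when equality takes a fast path keyed on the coder, as `String.equals` does with compact strings enabled, a non-latin1 string that holds only latin1 characters compares unequal to a latin1 string with the same text.

```java
import java.util.Arrays;

/**
 * Hypothetical model of a compact string (not java.lang.String): LATIN1 stores
 * one byte per char, UTF16 stores two bytes per char (big-endian, an assumption).
 */
class CoderInvariantDemo {
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    record MiniString(byte coder, byte[] value) {
        @Override
        public boolean equals(Object o) {
            // Fast path keyed on the coder: different coders are assumed
            // to mean different contents, so the bytes are never compared.
            return o instanceof MiniString other
                    && coder == other.coder
                    && Arrays.equals(value, other.value);
        }

        @Override
        public int hashCode() {
            return 31 * coder + Arrays.hashCode(value);
        }
    }

    /** Encodes chars that are all <= 0xFF, one byte each. */
    static MiniString latin1(char[] chars) {
        byte[] v = new byte[chars.length];
        for (int i = 0; i < chars.length; i++) {
            v[i] = (byte) chars[i];
        }
        return new MiniString(LATIN1, v);
    }

    /** Encodes as UTF16 WITHOUT checking that a non-latin1 char is present. */
    static MiniString utf16Unchecked(char[] chars) {
        byte[] v = new byte[chars.length * 2];
        for (int i = 0; i < chars.length; i++) {
            v[2 * i] = (byte) (chars[i] >> 8);
            v[2 * i + 1] = (byte) chars[i];
        }
        return new MiniString(UTF16, v);
    }

    public static void main(String[] args) {
        char[] text = "abc".toCharArray();
        MiniString a = latin1(text);
        MiniString b = utf16Unchecked(text); // breaks the invariant
        System.out.println(a.equals(b));     // false, although the text is the same
    }
}
```

The constructor-side guarantee removes the need for every such fast path to re-check contents.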
>>
>> The changes include:
>>
>> - Add a warning to each constructor that takes an array argument,
>> indicating that the result is indeterminate if the input array is
>> modified before the constructor returns.
>> The resulting string may contain any combination of characters sampled
>> from the input array.
>>
>> - Ensure that strings represented as non-latin1 contain at least one
>> non-latin1 character.
>> For latin1 inputs, the scanning and compression are unchanged, whether the
>> arrays contain ASCII, ISO-8859-1, UTF8, or another encoding decoded to
>> latin1.
>> If a non-latin1 character is found, the string is represented as
>> non-latin1, with the added verification that a non-latin1 character is
>> present at the same index.
>> If that character is found to be latin1, the input array has been
>> modified and the result of the scan may be incorrect.
>> Although a ConcurrentModificationException could be thrown, an unexpected
>> exception would put existing applications at risk, so it is avoided.
>> Instead, the non-latin1 copy of the input is re-scanned and compressed;
>> that scan determines whether the latin1 or the non-latin1 representation is
>> returned.
>>
>> - The methods that scan for non-latin1 characters, and their intrinsic
>> implementations, are updated to return the index of the first non-latin1
>> character.
>>
>> - String construction from StringBuilder and CharSequence must also be
>> guarded as their contents may be modified during construction.
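The scan, verify and re-scan steps described above might look roughly like the following for a `String(char[])`-style constructor. This is a hypothetical sketch, not the JDK source: `GuardedCompress`, `Result`, `compress`, `toUtf16` and `construct` are illustrative names, and a big-endian UTF16 layout is assumed.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not the JDK source) of the scan, verify and re-scan
 * strategy for a String(char[]) style constructor.
 */
class GuardedCompress {
    /** Compact representation: latin1 flag plus the encoded bytes. */
    record Result(boolean latin1, byte[] value) {}

    /**
     * Copies src[0..len) into dst as latin1 bytes, stopping at the first char
     * above 0xFF. Returns the index of that char, or len if every char fits.
     */
    static int compress(char[] src, byte[] dst, int len) {
        for (int i = 0; i < len; i++) {
            char c = src[i];
            if (c > 0xFF) {
                return i;
            }
            dst[i] = (byte) c;
        }
        return len;
    }

    /** Two bytes per char, big-endian (an assumption made for this sketch). */
    static byte[] toUtf16(char[] chars) {
        byte[] v = new byte[chars.length * 2];
        for (int i = 0; i < chars.length; i++) {
            v[2 * i] = (byte) (chars[i] >> 8);
            v[2 * i + 1] = (byte) chars[i];
        }
        return v;
    }

    static Result construct(char[] input) {
        int len = input.length;
        byte[] latin1 = new byte[len];
        int ndx = compress(input, latin1, len);  // scan the caller's mutable array
        if (ndx == len) {
            return new Result(true, latin1);
        }
        char[] copy = Arrays.copyOf(input, len); // private snapshot; input is not read again
        if (copy[ndx] > 0xFF) {
            // Verified: the snapshot holds a non-latin1 char at ndx.
            return new Result(false, toUtf16(copy));
        }
        // The char at ndx changed between the scan and the copy, so the input
        // was modified concurrently. Re-scan the snapshot, which cannot change.
        if (compress(copy, latin1, len) == len) {
            return new Result(true, latin1);
        }
        return new Result(false, toUtf16(copy));
    }

    public static void main(String[] args) {
        System.out.println(construct("abc".toCharArray()).latin1());      // true
        System.out.println(construct("a\u0100c".toCharArray()).latin1()); // false
    }
}
```

The design point is that once the private copy exists, the caller's array is never read again, so whichever representation is returned is derived from data that can no longer change, and no exception is needed.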
>
> Roger Riggs has updated the pull request incrementally with three additional
> commits since the last revision:
>
> - Refactored extractCodePoints to avoid multiple resizes if the array was
> modified
> - Replaced isLatin1 implementation with `getChar(buf, ndx) <= 0xff`.
> It performs better than the single byte array access by avoiding the
> bounds check.
> - Misc updates for review comments, javadoc cleanup.
> Extra checking on maximum string lengths when calling toBytes().
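For illustration, the two forms of the latin1 check mentioned in the commit list compute the same predicate on a UTF16 buffer. The sketch below uses hypothetical names (`IsLatin1Forms`, `isLatin1HighByte`) and assumes a big-endian layout (the JDK uses the platform byte order); the reported performance difference comes from how the JIT compiles the JDK's own code and is not modeled here.

```java
/**
 * Hypothetical sketch comparing two ways to test whether the UTF16 char at
 * index ndx of a byte buffer is latin1. Big-endian layout is assumed.
 */
class IsLatin1Forms {
    /** Reads the char at index ndx from a big-endian UTF16 byte buffer. */
    static char getChar(byte[] buf, int ndx) {
        int i = ndx << 1;
        return (char) (((buf[i] & 0xFF) << 8) | (buf[i + 1] & 0xFF));
    }

    /** Form from the commit message: compare the whole char against 0xFF. */
    static boolean isLatin1(byte[] buf, int ndx) {
        return getChar(buf, ndx) <= 0xFF;
    }

    /** Single byte array access: a latin1 char has a zero high byte. */
    static boolean isLatin1HighByte(byte[] buf, int ndx) {
        return buf[ndx << 1] == 0;
    }

    public static void main(String[] args) {
        byte[] buf = {0x00, 0x41, 0x01, 0x00}; // 'A', then '\u0100'
        for (int ndx = 0; ndx < 2; ndx++) {
            // prints "true true", then "false false"
            System.out.println(isLatin1(buf, ndx) + " " + isLatin1HighByte(buf, ndx));
        }
    }
}
```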
Can you include PPC64, please?
diff --git a/src/hotspot/cpu/ppc/ppc.ad b/src/hotspot/cpu/ppc/ppc.ad
index 89ce51e997e..102701e4969 100644
--- a/src/hotspot/cpu/ppc/ppc.ad
+++ b/src/hotspot/cpu/ppc/ppc.ad
@@ -12727,16 +12727,8 @@ instruct string_compress(rarg1RegP src, rarg2RegP dst, iRegIsrc len, iRegIdst re
   ins_cost(300);
   format %{ "String Compress $src,$dst,$len -> $result \t// KILL $tmp1, $tmp2, $tmp3, $tmp4, $tmp5" %}
   ins_encode %{
-    Label Lskip, Ldone;
-    __ li($result$$Register, 0);
-    __ string_compress_16($src$$Register, $dst$$Register, $len$$Register, $tmp1$$Register,
-                          $tmp2$$Register, $tmp3$$Register, $tmp4$$Register, $tmp5$$Register, Ldone);
-    __ rldicl_($tmp1$$Register, $len$$Register, 0, 64-3); // Remaining characters.
-    __ beq(CCR0, Lskip);
-    __ string_compress($src$$Register, $dst$$Register, $tmp1$$Register, $tmp2$$Register, Ldone);
-    __ bind(Lskip);
-    __ mr($result$$Register, $len$$Register);
-    __ bind(Ldone);
+    __ encode_iso_array($src$$Register, $dst$$Register, $len$$Register, $tmp1$$Register, $tmp2$$Register,
+                        $tmp3$$Register, $tmp4$$Register, $tmp5$$Register, $result$$Register, false);
   %}
   ins_pipe(pipe_class_default);
 %}
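A plausible reading of the diff, sketched as a hypothetical Java reference model rather than HotSpot code: the old PPC64 sequence returns 0 as soon as a non-latin1 char is found and `len` otherwise, while the PR needs the index of the first non-latin1 char, which is what `encode_iso_array` computes. It is assumed here that its final `false` argument selects the ISO-8859-1 limit (0xFF) rather than the ASCII one (0x7F); `CompressContract` and its methods are illustrative names.

```java
/**
 * Hypothetical Java reference model (not HotSpot code) of the result contract
 * of the string_compress node, before and after the change.
 */
class CompressContract {
    /** Old contract: len if every char fits in latin1, otherwise 0. */
    static int compressOld(char[] src, byte[] dst, int len) {
        for (int i = 0; i < len; i++) {
            if (src[i] > 0xFF) {
                return 0;
            }
            dst[i] = (byte) src[i];
        }
        return len;
    }

    /**
     * New contract, modeled on encode_iso_array: the number of chars copied
     * before the first one above the limit, i.e. that char's index.
     * ascii == false selects the ISO-8859-1 limit 0xFF (assumed); true selects 0x7F.
     */
    static int encodeIsoArray(char[] src, byte[] dst, int len, boolean ascii) {
        int limit = ascii ? 0x7F : 0xFF;
        for (int i = 0; i < len; i++) {
            if (src[i] > limit) {
                return i;
            }
            dst[i] = (byte) src[i];
        }
        return len;
    }

    public static void main(String[] args) {
        char[] s = "ab\u0100d".toCharArray();
        byte[] dst = new byte[s.length];
        System.out.println(compressOld(s, dst, s.length));           // 0
        System.out.println(encodeIsoArray(s, dst, s.length, false)); // 2
    }
}
```

Under the old contract the caller learns only that compression failed; under the new one it also learns where, which is the index the Java side verifies.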
@offamitkumar: I guess s390 also needs an adaptation.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/16425#issuecomment-1806526012