Hello,
I've reworked the patch so that it decides between char[] and byte[] lazily.
This avoids the performance impact of reflective calls to String.isLatin1()
for non-Latin-1 Strings, while keeping the benefits of choosing between
char[] and byte[].
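To illustrate the lazy choice outside the JDK, here is a minimal standalone sketch that uses only public APIs. The names `LazyLatin1Join`, `join`, `fitsLatin1`, `putLatin1` and `putChars` are hypothetical. Because `String.isLatin1()` is package-private, `fitsLatin1` substitutes a character scan for the coder check that the patch performs through SharedSecrets, so this sketch shows the control flow, not the patch's cost model.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

public final class LazyLatin1Join {

    // Hypothetical stand-in for the package-private String.isLatin1():
    // public code cannot read a String's coder, so this scans the chars instead.
    static boolean fitsLatin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    // Picks byte[] or char[] storage only when the joined result is requested.
    static String join(String delimiter, String prefix, String suffix, List<String> elts) {
        boolean latin1 = fitsLatin1(prefix) && fitsLatin1(delimiter) && fitsLatin1(suffix);
        int len = prefix.length() + suffix.length();
        for (int i = 0; i < elts.size(); i++) {
            String e = elts.get(i);
            latin1 = latin1 && fitsLatin1(e); // skips the scan once a non-Latin-1 String was seen
            len += e.length() + (i > 0 ? delimiter.length() : 0);
        }
        if (latin1) {
            byte[] bytes = new byte[len];
            int k = putLatin1(prefix, bytes, 0);
            for (int i = 0; i < elts.size(); i++) {
                if (i > 0) {
                    k = putLatin1(delimiter, bytes, k);
                }
                k = putLatin1(elts.get(i), bytes, k);
            }
            putLatin1(suffix, bytes, k);
            // Decode explicitly as ISO-8859-1, never with the platform default charset.
            return new String(bytes, StandardCharsets.ISO_8859_1);
        }
        char[] chars = new char[len];
        int k = putChars(prefix, chars, 0);
        for (int i = 0; i < elts.size(); i++) {
            if (i > 0) {
                k = putChars(delimiter, chars, k);
            }
            k = putChars(elts.get(i), chars, k);
        }
        putChars(suffix, chars, k);
        return new String(chars);
    }

    // Copies a Latin-1 String into bytes at offset k; returns the offset after it.
    private static int putLatin1(String s, byte[] bytes, int k) {
        for (int i = 0; i < s.length(); i++) {
            bytes[k + i] = (byte) s.charAt(i);
        }
        return k + s.length();
    }

    // Copies any String into chars at offset k; returns the offset after it.
    private static int putChars(String s, char[] chars, int k) {
        s.getChars(0, s.length(), chars, k);
        return k + s.length();
    }

    public static void main(String[] args) {
        System.out.println(join(", ", "[", "]", List.of("a", "b", "c")));
        System.out.println(join(", ", "[", "]", List.of("a", "\u0416", "c"))
                .equals("[a, \u0416, c]"));
    }
}
```

Note that the sketch also checks the prefix, delimiter and suffix: they are copied into the same buffer, so a single non-Latin-1 delimiter is enough to require the char[] path.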
I've collected the results for all versions of my patch in the table below.
As you can see, the StringBuilder-based version (SB) reduces complexity but
significantly slows down the case when non-Latin-1 Strings are joined.
Exclamation marks denote the cases where the latest implementation (Patched2)
is clearly slower than the original StringJoiner.
Regards,
Sergey Tsypanov
P.S.
I also think further improvement is possible here. Tagir mentioned the
decoding of byte[] that happens in the String(byte[]) constructor. That
constructor doesn't know how the bytes are encoded, so it has to decode them.
In our case, however, by the time we reach `new String(byte[])` we know that
the bytes are Latin-1, so we could skip decoding by calling
StringLatin1.newString(). However, that method copies the incoming byte[],
which is again redundant in our particular case.
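Related to Tagir's remark about `new String(bytes)`, a minimal standalone sketch (the class and method names are hypothetical) of why the byte[] path should name its charset explicitly. It narrows every Latin-1 character to one byte, the same narrowing the deprecated `String.getBytes(int, int, byte[], int)` used in the patch performs, then decodes the bytes back once with `StandardCharsets.ISO_8859_1` and once with the JVM's default charset, which is what `new String(byte[])` uses.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public final class Latin1Decode {

    // Builds a String containing every Latin-1 character, U+0000 through U+00FF.
    static String allLatin1Chars() {
        StringBuilder sb = new StringBuilder(256);
        for (char c = 0; c <= 0xFF; c++) {
            sb.append(c);
        }
        return sb.toString();
    }

    // Keeps the 8 low-order bits of each char, like the deprecated
    // String.getBytes(int, int, byte[], int) used by the patch.
    static byte[] narrow(String s) {
        byte[] bytes = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            bytes[i] = (byte) s.charAt(i);
        }
        return bytes;
    }

    public static void main(String[] args) {
        String original = allLatin1Chars();
        byte[] bytes = narrow(original);

        // Explicit ISO-8859-1 maps each byte straight back to the same char.
        String explicit = new String(bytes, StandardCharsets.ISO_8859_1);
        // new String(bytes) decodes with the default charset, so this outcome
        // depends on how the JVM was configured.
        String platform = new String(bytes, Charset.defaultCharset());

        System.out.println("ISO-8859-1 round trip: " + explicit.equals(original));
        System.out.println(Charset.defaultCharset() + " round trip: " + platform.equals(original));
    }
}
```

The ISO-8859-1 round trip holds on every JVM; the second line is only true when the default charset happens to be Latin-1-compatible.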
stringJoiner, average time per operation (ns/op); "!" marks the rows where
Patched2 is clearly slower than Original:

count latin length        Original        Patched1              SB        Patched2
    1  true      1      26.9 ± 0.7      48.8 ± 2.2      33.4 ± 0.2      42.6 ± 1.0  !
    1  true      5      30.5 ± 1.0      46.1 ± 2.1      33.0 ± 0.1      42.3 ± 1.1  !
    1  true     10      31.2 ± 0.6      47.3 ± 1.3      34.3 ± 0.3      46.6 ± 1.4  !
    1  true    100      62.5 ± 3.3      79.9 ± 4.8      44.4 ± 0.1      63.5 ± 2.1
    5  true      1      78.2 ± 1.6     110.3 ± 2.9      87.8 ± 0.8      93.4 ± 1.7  !
    5  true      5      94.2 ± 8.7     116.6 ± 0.7      88.2 ± 0.9      91.4 ± 0.9
    5  true     10      95.3 ± 6.9     100.1 ± 0.4      91.6 ± 0.6      93.7 ± 0.4
    5  true    100    188.0 ± 10.2     136.0 ± 0.4     126.1 ± 0.7     135.3 ± 0.9
   10  true      1     160.3 ± 4.5     172.9 ± 0.8     177.6 ± 0.8     172.1 ± 0.6  !
   10  true      5     169.0 ± 4.7     180.2 ± 9.1     179.4 ± 1.0     171.7 ± 0.6
   10  true     10    205.7 ± 16.4     182.7 ± 1.1     189.5 ± 1.2     178.3 ± 0.5
   10  true    100    366.5 ± 17.0     284.5 ± 3.1     290.0 ± 0.8     282.1 ± 0.9
  100  true      1   1117.6 ± 11.1   2123.7 ± 11.1    1563.8 ± 2.8    1379.5 ± 4.4  !
  100  true      5   1270.7 ± 40.2   2163.6 ± 12.4    1592.4 ± 4.0    1426.1 ± 4.4  !
  100  true     10   1364.4 ± 14.0   2283.8 ± 16.1   1773.7 ± 57.7    1525.0 ± 3.2  !
  100  true    100  3592.9 ± 164.8   3535.2 ± 29.9   2899.2 ± 51.0   3043.9 ± 85.6
    1 false      1      35.6 ± 1.2      59.1 ± 3.0      52.7 ± 1.2      44.9 ± 0.9  !
    1 false      5      39.3 ± 1.2      52.6 ± 2.5      54.4 ± 1.6      37.4 ± 0.9
    1 false     10      42.2 ± 1.6      53.6 ± 0.3      52.2 ± 1.0      43.2 ± 1.0
    1 false    100      70.5 ± 1.8      86.4 ± 0.4      78.6 ± 1.2      75.8 ± 2.2  !
    5 false      1      89.0 ± 3.5     102.2 ± 1.0     116.3 ± 3.8      85.3 ± 0.1
    5 false      5      87.6 ± 0.7     106.5 ± 1.2     115.2 ± 2.9      90.7 ± 0.6  !
    5 false     10     109.0 ± 5.6     116.5 ± 1.2     126.5 ± 0.5      97.2 ± 0.3
    5 false    100    324.0 ± 16.5     221.9 ± 0.5     288.9 ± 0.5     227.3 ± 0.7
   10 false      1     183.9 ± 5.9     204.7 ± 5.5     261.2 ± 7.7     168.4 ± 0.7
   10 false      5     198.7 ± 9.7     202.4 ± 1.5     253.3 ± 6.7     178.1 ± 0.6
   10 false     10     196.7 ± 6.9     226.7 ± 6.4     274.3 ± 7.0     191.8 ± 0.5
   10 false    100     535.8 ± 2.3     553.0 ± 5.6     677.3 ± 6.4     538.8 ± 2.3
  100 false      1  1674.6 ± 122.1   1940.8 ± 16.2   2212.5 ± 32.2    1418.9 ± 2.9
  100 false      5   1791.9 ± 58.1   2158.1 ± 12.0   2492.8 ± 30.9    1583.9 ± 3.9
  100 false     10  2124.1 ± 193.3   2364.0 ± 25.2   2611.7 ± 17.5    1861.8 ± 5.9
  100 false    100   4323.4 ± 29.2   4675.5 ± 11.8   5501.3 ± 21.1   4172.3 ± 12.7

stringJoiner:·gc.alloc.rate.norm, normalized allocation per operation (B/op):

count latin length        Original        Patched1              SB        Patched2
    1  true      1     120.0 ± 0.0     120.0 ± 0.0     144.0 ± 0.0     120.0 ± 0.0
    1  true      5     128.0 ± 0.0     120.0 ± 0.0     144.0 ± 0.0     120.0 ± 0.0
    1  true     10     144.0 ± 0.0     136.0 ± 0.0     160.0 ± 0.0     136.0 ± 0.0
    1  true    100     416.0 ± 0.0     312.0 ± 0.0     336.0 ± 0.0     312.0 ± 0.0
    5  true      1     144.0 ± 0.0     136.0 ± 0.0     160.0 ± 0.0     136.0 ± 0.0
    5  true      5     200.0 ± 0.0     168.0 ± 0.0     192.0 ± 0.0     168.0 ± 0.0
    5  true     10     272.0 ± 0.0     216.0 ± 0.0     240.0 ± 0.0     216.0 ± 0.0
    5  true    100    1632.0 ± 0.0    1128.0 ± 0.0    1152.0 ± 0.0    1128.0 ± 0.0
   10  true      1     256.0 ± 0.0     232.0 ± 0.0     256.0 ± 0.0     232.0 ± 0.0
   10  true      5     376.0 ± 0.0     316.8 ± 4.9     336.0 ± 0.0     312.0 ± 0.0
   10  true     10     520.0 ± 0.0     408.0 ± 0.0     432.0 ± 0.0     408.0 ± 0.0
   10  true    100    3224.1 ± 0.0   2236.9 ± 21.2    2240.1 ± 0.0    2216.1 ± 0.0
  100  true      1    1748.1 ± 4.0    1592.2 ± 0.0    1568.2 ± 0.0    1544.2 ± 0.0
  100  true      5    2948.2 ± 4.0    2392.3 ± 0.0    2368.2 ± 0.0    2344.2 ± 0.0
  100  true     10    4444.3 ± 4.0    3384.3 ± 0.0    3364.3 ± 4.0    3336.3 ± 0.0
  100  true    100   31441.4 ± 0.0   21385.4 ± 0.0   21365.1 ± 4.1   21353.2 ± 6.6
    1 false      1     144.0 ± 0.0     144.0 ± 0.0     192.0 ± 0.0     144.0 ± 0.0
    1 false      5     160.0 ± 0.0     160.0 ± 0.0     208.0 ± 0.0     160.0 ± 0.0
    1 false     10     184.0 ± 0.0     184.0 ± 0.0     240.0 ± 0.0     184.0 ± 0.0
    1 false    100     640.0 ± 0.0     640.0 ± 0.0     784.0 ± 0.0     640.0 ± 0.0
    5 false      1     184.0 ± 0.0     184.0 ± 0.0     240.0 ± 0.0     184.0 ± 0.0
    5 false      5     280.0 ± 0.0     280.0 ± 0.0     349.6 ± 2.4     280.0 ± 0.0
    5 false     10     400.0 ± 0.0     400.0 ± 0.0     496.0 ± 0.0     400.0 ± 0.0
    5 false    100    2664.1 ± 0.0    2664.1 ± 0.0    3216.1 ± 0.0    2664.1 ± 0.0
   10 false      1     320.0 ± 0.0     334.4 ± 7.4     384.0 ± 0.0     320.0 ± 0.0
   10 false      5     520.0 ± 0.0     520.0 ± 0.0     624.0 ± 0.0     520.0 ± 0.0
   10 false     10     760.0 ± 0.0     769.6 ± 6.5     912.0 ± 0.0     760.0 ± 0.0
   10 false    100    5264.2 ± 0.0    5273.8 ± 6.5    6320.3 ± 0.0    5264.2 ± 0.0
  100 false      1    2204.2 ± 4.0    2216.3 ± 0.0    2436.3 ± 6.8    2168.2 ± 0.0
  100 false      5    4196.3 ± 6.2    4216.4 ± 0.0    4832.4 ± 6.6    4168.3 ± 0.0
  100 false     10    6696.5 ± 5.4    6712.6 ± 0.0    7844.7 ± 4.0    6664.5 ± 0.0
  100 false    100   51702.0 ± 4.1   51714.4 ± 0.0   61838.6 ± 6.2   51666.1 ± 0.0
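The SB column corresponds to Tagir's suggestion of a pre-sized StringBuilder inside compactElts. The code behind that column is not included in this thread, so the following is only a minimal standalone sketch of the idea. `PresizedCompaction` and `compact` are hypothetical names, and `len` is assumed to be the joined length (elements plus delimiters) that StringJoiner already tracks in its `len` field.

```java
public final class PresizedCompaction {

    // Hypothetical helper mirroring the shape of StringJoiner.compactElts():
    // joins elts[0..size) with delimiter into elts[0] and returns the new size.
    // len must be the total length of the joined result (elements plus delimiters).
    static int compact(String[] elts, int size, String delimiter, int len) {
        if (size <= 1) {
            return size;
        }
        // Pre-sized so append() never needs to grow the buffer; with compact
        // strings enabled, the builder stays Latin-1 while all input is Latin-1.
        StringBuilder sb = new StringBuilder(len);
        sb.append(elts[0]);
        for (int i = 1; i < size; i++) {
            sb.append(delimiter).append(elts[i]);
            elts[i] = null; // drop the reference, as compactElts() does
        }
        elts[0] = sb.toString();
        return 1;
    }

    public static void main(String[] args) {
        String[] elts = {"alpha", "beta", "gamma"};
        int len = 5 + 4 + 5 + 2 * ", ".length(); // element lengths plus two delimiters
        int size = compact(elts, elts.length, ", ", len);
        System.out.println(size + " -> " + elts[0]);
    }
}
```

With compact strings enabled, a builder created this way starts with a Latin-1 buffer and has to inflate it to UTF-16 on the first non-Latin-1 append; that extra buffer is one plausible explanation for the higher SB allocation in the `latin = false` rows.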
10.02.2020, 14:08, "Tagir Valeev" <[email protected]>:
> Hello!
>
> In many tests, I see little or no performance improvements. E.g.:
> stringJoiner 100 10 1768.8 ± 160.6 1760.8 ± 111.4 ns/op
>
> How would you explain that this code change doesn't improve the
> performance for the given count and length?
>
> Also, you are using `new String(bytes)` assuming that the platform
> charset is latin1-compatible. This is not always true, so your code
> would produce incorrect results depending on this setting. In general,
> decoding via charset is tons of extra work. Instead, I would
> experiment with a pre-sized StringBuilder inside compactElts instead of
> a char[] array. While it does some extra checks, StringBuilder can
> already use the compact string representation, so if it's properly
> pre-sized, no extra memory will be allocated.
>
> Finally, if you optimize for the case when X is true you should always
> benchmark what happens when X is false. It's great that in some cases
> we see a speedup for latin1 strings. But what if not all of the
> strings are latin1? Is there any performance degradation? If yes, can
> we tolerate it?
>
> With best regards,
> Tagir Valeev.
>
> On Mon, Feb 10, 2020 at 3:22 PM Сергей Цыпанов
> <[email protected]> wrote:
>> Hello,
>>
>> I've reworked the code; the patch is attached. Could you please review my
>> solution regarding the usage of SharedSecrets?
>>
>> P.S. After I created the patch, it occurred to me that instead of
>> checking all Strings when calling StringJoiner.add(),
>> we could check them in the toString() method and fail fast as soon as
>> at least one of them is non-Latin-1. This is likely to reduce the
>> regression related to the use of reflection.
>>
>> Regards,
>> Sergey Tsypanov
>>
>> 05.02.2020, 23:21, "[email protected]" <[email protected]>:
>> > ----- Original Message -----
>> >> From: "Сергей Цыпанов" <[email protected]>
>> >> To: "Remi Forax" <[email protected]>, "core-libs-dev"
>> >> <[email protected]>
>> >> Sent: Wednesday, 5 February 2020 22:12:34
>> >> Subject: Re: [PATCH] Enhancement proposal for java.util.StringJoiner
>> >
>> >> Hello,
>> >>
>> >>> If you want to optimize StringJoiner, the best way to do it is to use the shared
>> >>> secret mechanism so a java.util class can see implementation details of a
>> >>> java.lang class without exposing those details publicly.
>> >>> As an example, take a look at EnumSet and its implementations.
>> >>
>> >> I've looked into SharedSecrets; it seems there's no ready-to-use method
>> >> for accessing a package-private method. Do you mean it's necessary to add
>> >> the particular functionality to JavaLangReflectAccess, as was done for
>> >> JavaLangAccess in order to deal with EnumSet?
>> >
>> > Yes!
>> > Crossing a package boundary in a non-public way is not free,
>> > but given that StringJoiner is used quite often (directly or indirectly
>> > via Collectors.joining()), it may be worth the cost.
>> >
>> >> Regards,
>> >> Sergey
>> >
>> > Regards,
>> > Rémi
>> >
>> >> 04.02.2020, 12:12, "Remi Forax" <[email protected]>:
>> >>> ----- Original Message -----
>> >>>> From: "Сергей Цыпанов" <[email protected]>
>> >>>> To: "jonathan gibbons" <[email protected]>, "core-libs-dev"
>> >>>> <[email protected]>
>> >>>> Sent: Tuesday, 4 February 2020 08:53:31
>> >>>> Subject: Re: [PATCH] Enhancement proposal for java.util.StringJoiner
>> >>>
>> >>>> Hello,
>> >>>
>> >>> Hi Sergey,
>> >>>
>> >>>> I'd probably agree about a new class in java.lang, but what is wrong
>> >>>> with exposing a package-private method
>> >>>> which doesn't modify the state of the object and has no side effects?
>> >>>
>> >>> You cannot change the implementation anymore,
>> >>> for example if, instead of having a split between latin1 and non-latin1,
>> >>> we decide in the future to split between utf8 and non-utf8.
>> >>>
>> >>> If you want to optimize StringJoiner, the best way to do it is to use the shared
>> >>> secret mechanism so a java.util class can see implementation details of a
>> >>> java.lang class without exposing those details publicly.
>> >>> As an example, take a look at EnumSet and its implementations.
>> >>>
>> >>> regards,
>> >>> Rémi
>> >>>
>> >>>> 04.02.2020, 00:58, "Jonathan Gibbons" <[email protected]>:
>> >>>>> Sergey,
>> >>>>>
>> >>>>> It is equally bad to create a new class in the java.lang package as it
>> >>>>> is to add a new public method to java.lang.String.
>> >>>>>
>> >>>>> -- Jon
>> >>>>>
>> >>>>> On 2/3/20 2:38 PM, Сергей Цыпанов wrote:
>> >>>>>> Hello,
>> >>>>>>
>> >>>>>> As of JDK 14, java.util.StringJoiner still uses char[] as storage for
>> >>>>>> the joined Strings.
>> >>>>>>
>> >>>>>> This holds even when all joined Strings, as well as the delimiter,
>> >>>>>> prefix and suffix, contain only ASCII symbols.
>> >>>>>>
>> >>>>>> As a result, when StringJoiner.toString() is invoked, the byte[] stored
>> >>>>>> in each String is inflated to fill in the char[], and finally the
>> >>>>>> char[] is compressed again when the String constructor is called:
>> >>>>>>
>> >>>>>> String delimiter = this.delimiter;
>> >>>>>> char[] chars = new char[this.len + addLen];
>> >>>>>> int k = getChars(this.prefix, chars, 0);
>> >>>>>> if (size > 0) {
>> >>>>>>     k += getChars(elts[0], chars, k); // inflate byte[] -> char[]
>> >>>>>>
>> >>>>>>     for (int i = 1; i < size; ++i) {
>> >>>>>>         k += getChars(delimiter, chars, k);
>> >>>>>>         k += getChars(elts[i], chars, k);
>> >>>>>>     }
>> >>>>>> }
>> >>>>>>
>> >>>>>> k += getChars(this.suffix, chars, k);
>> >>>>>> return new String(chars); // compress char[] -> byte[]
>> >>>>>>
>> >>>>>> This can be improved by detecting the cases when String.isLatin1()
>> >>>>>> returns true for all involved Strings.
>> >>>>>>
>> >>>>>> I've prepared a patch along with a benchmark showing that this change
>> >>>>>> is correct and brings an improvement.
>> >>>>>> The only concern I have is about String.isLatin1(): since String
>> >>>>>> belongs to java.lang and StringJoiner to java.util, the package-private
>> >>>>>> String.isLatin1() cannot be accessed directly, so we would need to make
>> >>>>>> it public for the code to compile.
>> >>>>>>
>> >>>>>> Another solution is to create an intermediate utility class located in
>> >>>>>> java.lang which delegates the call to String.isLatin1():
>> >>>>>>
>> >>>>>> package java.lang;
>> >>>>>>
>> >>>>>> public class StringHelper {
>> >>>>>>     public static boolean isLatin1(String str) {
>> >>>>>>         return str.isLatin1();
>> >>>>>>     }
>> >>>>>> }
>> >>>>>>
>> >>>>>> This allows us to keep java.lang.String intact and still access its
>> >>>>>> package-private method from outside of the java.lang package.
>> >>>>>>
>> >>>>>> Below I've added the benchmark results for the specified case (all
>> >>>>>> Strings are Latin-1). The other case (at least one String is not
>> >>>>>> Latin-1, i.e. stored as UTF-16) uses the existing code, so there will
>> >>>>>> be only a tiny regression due to several if-checks.
>> >>>>>>
>> >>>>>> With best regards,
>> >>>>>> Sergey Tsypanov
>> >>>>>>
>> >>>>>> Benchmark    (count)  (length)       Original        Patched  Units
>> >>>>>> stringJoiner       1         1     26.7 ± 1.3     38.2 ± 1.1  ns/op
>> >>>>>> stringJoiner       1         5     27.4 ± 0.0     40.5 ± 2.2  ns/op
>> >>>>>> stringJoiner       1        10     29.6 ± 1.9     38.4 ± 1.9  ns/op
>> >>>>>> stringJoiner       1       100     61.1 ± 6.9     47.6 ± 0.6  ns/op
>> >>>>>> stringJoiner       5         1     91.1 ± 6.7     83.6 ± 2.0  ns/op
>> >>>>>> stringJoiner       5         5    96.1 ± 10.7     85.6 ± 1.1  ns/op
>> >>>>>> stringJoiner       5        10   105.5 ± 14.3     84.7 ± 1.1  ns/op
>> >>>>>> stringJoiner       5       100   266.6 ± 30.1   139.6 ± 14.0  ns/op
>> >>>>>> stringJoiner      10         1   190.7 ± 23.0    162.0 ± 2.9  ns/op
>> >>>>>> stringJoiner      10         5   200.0 ± 16.9   167.5 ± 11.0  ns/op
>> >>>>>> stringJoiner      10        10   216.4 ± 12.4    164.8 ± 1.7  ns/op
>> >>>>>> stringJoiner      10       100   545.3 ± 49.7   282.2 ± 12.0  ns/op
>> >>>>>> stringJoiner     100         1  1467.0 ± 90.3  1302.0 ± 18.5  ns/op
>> >>>>>> stringJoiner     100         5 1491.8 ± 166.2 1493.0 ± 135.4  ns/op
>> >>>>>> stringJoiner     100        10 1768.8 ± 160.6 1760.8 ± 111.4  ns/op
>> >>>>>> stringJoiner     100       100 3654.3 ± 113.1 3120.9 ± 175.9  ns/op
>> >>>>>>
>> >>>>>> stringJoiner:·gc.alloc.rate.norm       1         1    120.0 ± 0.0    120.0 ± 0.0  B/op
>> >>>>>> stringJoiner:·gc.alloc.rate.norm       1         5    128.0 ± 0.0    120.0 ± 0.0  B/op
>> >>>>>> stringJoiner:·gc.alloc.rate.norm       1        10    144.0 ± 0.0    136.0 ± 0.0  B/op
>> >>>>>> stringJoiner:·gc.alloc.rate.norm       1       100    416.0 ± 0.0    312.0 ± 0.0  B/op
>> >>>>>> stringJoiner:·gc.alloc.rate.norm       5         1    144.0 ± 0.0    136.0 ± 0.0  B/op
>> >>>>>> stringJoiner:·gc.alloc.rate.norm       5         5    200.0 ± 0.0    168.0 ± 0.0  B/op
>> >>>>>> stringJoiner:·gc.alloc.rate.norm       5        10    272.0 ± 0.0    216.0 ± 0.0  B/op
>> >>>>>> stringJoiner:·gc.alloc.rate.norm       5       100   1632.0 ± 0.0   1128.0 ± 0.0  B/op
>> >>>>>> stringJoiner:·gc.alloc.rate.norm      10         1    256.0 ± 0.0    232.0 ± 0.0  B/op
>> >>>>>> stringJoiner:·gc.alloc.rate.norm      10         5    376.0 ± 0.0    312.0 ± 0.0  B/op
>> >>>>>> stringJoiner:·gc.alloc.rate.norm      10        10    520.0 ± 0.0    408.0 ± 0.0  B/op
>> >>>>>> stringJoiner:·gc.alloc.rate.norm      10       100   3224.1 ± 0.0   2216.1 ± 0.0  B/op
>> >>>>>> stringJoiner:·gc.alloc.rate.norm     100         1  1760.2 ± 14.9   1544.2 ± 0.0  B/op
>> >>>>>> stringJoiner:·gc.alloc.rate.norm     100         5  2960.3 ± 14.9   2344.2 ± 0.0  B/op
>> >>>>>> stringJoiner:·gc.alloc.rate.norm     100        10   4440.4 ± 0.0   3336.3 ± 0.0  B/op
>> >>>>>> stringJoiner:·gc.alloc.rate.norm     100       100 31449.3 ± 12.2 21346.7 ± 14.7  B/op

diff --git a/src/java.base/share/classes/java/util/StringJoiner.java b/src/java.base/share/classes/java/util/StringJoiner.java
--- a/src/java.base/share/classes/java/util/StringJoiner.java
+++ b/src/java.base/share/classes/java/util/StringJoiner.java
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2013, 2018, Oracle and/or its affiliates. All rights reserved.
+ * Copyright (c) 2013, 2020, Oracle and/or its affiliates. All rights reserved.
  * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
  *
  * This code is free software; you can redistribute it and/or modify it
@@ -24,6 +24,8 @@
  */
 package java.util;
 
+import jdk.internal.access.SharedSecrets;
+
 /**
  * {@code StringJoiner} is used to construct a sequence of characters separated
  * by a delimiter and optionally starting with a supplied prefix
@@ -60,9 +62,12 @@
  *
  * @see java.util.stream.Collectors#joining(CharSequence)
  * @see java.util.stream.Collectors#joining(CharSequence, CharSequence, CharSequence)
- * @since 1.8
-*/
+ * @since 1.8
+ */
 public final class StringJoiner {
+
+    private static final boolean isLatin1Available = SharedSecrets.getJavaLangStringAccess().isLatin1("");
+
     private final String prefix;
     private final String delimiter;
     private final String suffix;
@@ -77,7 +82,7 @@
     private int len;
 
     /**
-     * When overridden by the user to be non-null via {@link setEmptyValue}, the
+     * When overridden by the user to be non-null via {@link #setEmptyValue}, the
      * string returned by toString() when no elements have yet been added.
      * When null, prefix + suffix is used as the empty value.
      */
@@ -153,6 +158,13 @@
         return len;
     }
 
+    @SuppressWarnings("deprecation")
+    private static int getBytes(String s, byte[] bytes, int start) {
+        int len = s.length();
+        s.getBytes(0, len, bytes, start);
+        return len;
+    }
+
     /**
      * Returns the current value, consisting of the {@code prefix}, the values
      * added so far separated by the {@code delimiter}, and the {@code suffix},
@@ -173,20 +185,42 @@
             compactElts();
             return size == 0 ? "" : elts[0];
         }
-        final String delimiter = this.delimiter;
+        if (isLatin1Available && allLatin1(elts, size)) {
+            return bytesToString(elts, size, addLen);
+        }
+        return charsToString(elts, size, addLen);
+    }
+
+    private String charsToString(String[] elts, int size, int addLen) {
         final char[] chars = new char[len + addLen];
         int k = getChars(prefix, chars, 0);
         if (size > 0) {
+            final String delimiter = this.delimiter;
             k += getChars(elts[0], chars, k);
             for (int i = 1; i < size; i++) {
                 k += getChars(delimiter, chars, k);
                 k += getChars(elts[i], chars, k);
             }
         }
-        k += getChars(suffix, chars, k);
+        getChars(suffix, chars, k);
         return new String(chars);
     }
 
+    private String bytesToString(String[] elts, int size, int addLen) {
+        final byte[] bytes = new byte[len + addLen];
+        int k = getBytes(prefix, bytes, 0);
+        if (size > 0) {
+            final String delimiter = this.delimiter;
+            k += getBytes(elts[0], bytes, k);
+            for (int i = 1; i < size; i++) {
+                k += getBytes(delimiter, bytes, k);
+                k += getBytes(elts[i], bytes, k);
+            }
+        }
+        getBytes(suffix, bytes, k);
+        return new String(bytes, java.nio.charset.StandardCharsets.ISO_8859_1);
+    }
+
     /**
      * Adds a copy of the given {@code CharSequence} value as the next
      * element of the {@code StringJoiner} value. If {@code newElement} is
@@ -239,18 +273,39 @@
 
     private void compactElts() {
         if (size > 1) {
-            final char[] chars = new char[len];
-            int i = 1, k = getChars(elts[0], chars, 0);
-            do {
-                k += getChars(delimiter, chars, k);
-                k += getChars(elts[i], chars, k);
-                elts[i] = null;
-            } while (++i < size);
-            size = 1;
-            elts[0] = new String(chars);
+            if (isLatin1Available && allLatin1(elts, size)) {
+                compactBytes();
+            } else {
+                compactChars();
+            }
         }
     }
 
+    private void compactChars() {
+        final char[] chars = new char[len];
+        int i = 1, k = getChars(elts[0], chars, 0);
+        do {
+            k += getChars(delimiter, chars, k);
+            k += getChars(elts[i], chars, k);
+            elts[i] = null;
+        } while (++i < size);
+        size = 1;
+        elts[0] = new String(chars);
+    }
+
+    private void compactBytes() {
+        final byte[] bytes = new byte[len];
+        int i = 1;
+        int k = getBytes(elts[0], bytes, 0);
+        do {
+            k += getBytes(delimiter, bytes, k);
+            k += getBytes(elts[i], bytes, k);
+            elts[i] = null;
+        } while (++i < size);
+        size = 1;
+        elts[0] = new String(bytes, java.nio.charset.StandardCharsets.ISO_8859_1);
+    }
+
     /**
      * Returns the length of the {@code String} representation
      * of this {@code StringJoiner}. Note that if
@@ -265,4 +320,15 @@
         return (size == 0 && emptyValue != null) ? emptyValue.length() :
             len + prefix.length() + suffix.length();
     }
+
+    private boolean allLatin1(String[] strings, int size) {
+        var access = SharedSecrets.getJavaLangStringAccess();
+        boolean latin1 = access.isLatin1(prefix) && access.isLatin1(delimiter)
+                && access.isLatin1(suffix);
+        for (int i = 0; latin1 && i < size; i++) {
+            latin1 = access.isLatin1(strings[i]);
+        }
+        return latin1;
+    }
+
 }
diff --git a/src/java.base/share/classes/jdk/internal/access/JavaLangStringAccess.java b/src/java.base/share/classes/jdk/internal/access/JavaLangStringAccess.java
new file mode 100644
--- /dev/null
+++ b/src/java.base/share/classes/jdk/internal/access/JavaLangStringAccess.java
@@ -0,0 +1,41 @@
+/*
+ * Copyright (c) 2020, Oracle and/or its affiliates. All rights reserved.
+ * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
+ *
+ * This code is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 only, as
+ * published by the Free Software Foundation. Oracle designates this
+ * particular file as subject to the "Classpath" exception as provided
+ * by Oracle in the LICENSE file that accompanied this code.
+ *
+ * This code is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * version 2 for more details (a copy is included in the LICENSE file that
+ * accompanied this code).
+ *
+ * You should have received a copy of the GNU General Public License version
+ * 2 along with this work; if not, write to the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
+ *
+ * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA
+ * or visit www.oracle.com if you need additional information or have any
+ * questions.
+ */
+
+package jdk.internal.access;
+
+/**
+ * An interface which gives access to internals of {@link String}.
+ */
+public interface JavaLangStringAccess {
+
+    /**
+     * Returns {@code true} if the given {@link String} uses the compact Latin-1 coder.
+     *
+     * @param str the {@link String} to be checked
+     * @return whether {@code str} is stored as Latin-1 bytes
+     */
+    boolean isLatin1(String str);
+
+}
diff --git a/src/java.base/share/classes/jdk/internal/access/SharedSecrets.java b/src/java.base/share/classes/jdk/internal/access/SharedSecrets.java
--- a/src/java.base/share/classes/jdk/internal/access/SharedSecrets.java
+++ b/src/java.base/share/classes/jdk/internal/access/SharedSecrets.java
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2002, 2019, Oracle and/or its affiliates. All rights reserved.
+ * Copyright (c) 2002, 2020, Oracle and/or its affiliates. All rights reserved.
  * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
  *
  * This code is free software; you can redistribute it and/or modify it
@@ -58,6 +58,7 @@
     private static JavaLangModuleAccess javaLangModuleAccess;
     private static JavaLangRefAccess javaLangRefAccess;
     private static JavaLangReflectAccess javaLangReflectAccess;
+    private static JavaLangStringAccess javaLangStringAccess;
     private static JavaIOAccess javaIOAccess;
     private static JavaIOFileDescriptorAccess javaIOFileDescriptorAccess;
     private static JavaIOFilePermissionAccess javaIOFilePermissionAccess;
@@ -139,6 +140,17 @@
         return javaLangReflectAccess;
     }
 
+    public static JavaLangStringAccess getJavaLangStringAccess() {
+        if (javaLangStringAccess == null) {
+            unsafe.ensureClassInitialized(StringAccess.class);
+        }
+        return javaLangStringAccess;
+    }
+
+    public static void setJavaLangStringAccess(JavaLangStringAccess stringAccess) {
+        javaLangStringAccess = stringAccess;
+    }
+
     public static void setJavaNetUriAccess(JavaNetUriAccess jnua) {
         javaNetUriAccess = jnua;
     }
diff --git a/src/java.base/share/classes/jdk/internal/access/StringAccess.java b/src/java.base/share/classes/jdk/internal/access/StringAccess.java
new file mode 100644
--- /dev/null
+++ b/src/java.base/share/classes/jdk/internal/access/StringAccess.java
@@ -0,0 +1,68 @@
+/*
+ * Copyright (c) 2020, Oracle and/or its affiliates. All rights reserved.
+ * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
+ *
+ * This code is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 only, as
+ * published by the Free Software Foundation. Oracle designates this
+ * particular file as subject to the "Classpath" exception as provided
+ * by Oracle in the LICENSE file that accompanied this code.
+ *
+ * This code is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * version 2 for more details (a copy is included in the LICENSE file that
+ * accompanied this code).
+ *
+ * You should have received a copy of the GNU General Public License version
+ * 2 along with this work; if not, write to the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
+ *
+ * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA
+ * or visit www.oracle.com if you need additional information or have any
+ * questions.
+ */
+
+package jdk.internal.access;
+
+import java.lang.reflect.InvocationTargetException;
+import java.lang.reflect.Method;
+import java.util.Objects;
+
+/**
+ * Package-private class implementing the
+ * jdk.internal.access.JavaLangStringAccess interface,
+ * allowing non-members of java.lang package
+ * to access internals of {@link java.lang.String}.
+ */
+class StringAccess implements JavaLangStringAccess {
+
+    static {
+        SharedSecrets.setJavaLangStringAccess(new StringAccess());
+    }
+
+    private final Method isLatin1 = initIsLatin1Method();
+
+    private StringAccess() {
+    }
+
+    @Override
+    public boolean isLatin1(String str) {
+        Objects.requireNonNull(str);
+        try {
+            return (boolean) isLatin1.invoke(str);
+        } catch (IllegalAccessException | InvocationTargetException e) {
+            throw new RuntimeException(e);
+        }
+    }
+
+    private static Method initIsLatin1Method() {
+        try {
+            final Method isLatin1 = String.class.getDeclaredMethod("isLatin1");
+            isLatin1.setAccessible(true);
+            return isLatin1;
+        } catch (NoSuchMethodException e) {
+            throw new Error(e);
+        }
+    }
+}