Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)
On 02.06.2015 12:15, Paul Sandoz wrote: On Jun 2, 2015, at 10:56 AM, Ivan Gerasimov wrote: You could simplify the data provider sourceTargetReplacementWithNull, there is no point testing with a null source. String.replace accepts two arguments, each can be either null or non-null. Thats 2 bits of state, so one can simply be explicit and presumably it should not matter what the non-null argument value (there are enough non-null values supplied by the other data providers): {null, null} {null, "foo"} {"foo", null} I agree there is not much value in testing the case source == null, so it can be removed. However, I would leave other test cases. Earlier in this thread Rémi spotted a bug, which had been observed in a call "bar".replace("foo", null), but not in "foo".replace("foo", null). Ok, presumably for the same identity rather than value? For any valid substring, so "foobar".replace("bar", null) wouldn't have thrown OOM, even though it should have. Surely, the current testcases often duplicate each other, but it seems to be easier to process them all then to filter the duplicates out. Sincerely yours, Ivan Paul.
Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)
On Jun 2, 2015, at 10:56 AM, Ivan Gerasimov wrote: >> >> You could simplify the data provider sourceTargetReplacementWithNull, there >> is no point testing with a null source. String.replace accepts two >> arguments, each can be either null or non-null. Thats 2 bits of state, so >> one can simply be explicit and presumably it should not matter what the >> non-null argument value (there are enough non-null values supplied by the >> other data providers): >> >> {null, null} >> {null, "foo"} >> {"foo", null} > I agree there is not much value in testing the case source == null, so it can > be removed. > However, I would leave other test cases. > Earlier in this thread Rémi spotted a bug, which had been observed in a call > "bar".replace("foo", null), but not in "foo".replace("foo", null). > Ok, presumably for the same identity rather than value? Paul.
Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)
Thanks! On 02.06.2015 11:54, Remi Forax wrote: Looks Ok to me, thumb up :) Rémi On 06/01/2015 08:53 PM, Ivan Gerasimov wrote: On 01.06.2015 11:33, Paul Sandoz wrote: On May 31, 2015, at 6:03 PM, Ivan Gerasimov wrote: Which is right here: http://cr.openjdk.java.net/~igerasim/8058779/05/webrev/ Much better. For the test can you use RandomFactory recently added to the test library? Sure. Here the updated webrev with this change and a few other minor changes. http://cr.openjdk.java.net/~igerasim/8058779/06/webrev/ The changes are: - move declaration of i below, - indent .append(), - use RandomFactory in the test, - extend number of test cases with null input. Do you think it's ready to go? Sincerely yours, Ivan Paul.
Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)
On 02.06.2015 11:20, Paul Sandoz wrote: On Jun 1, 2015, at 8:53 PM, Ivan Gerasimov wrote: On 01.06.2015 11:33, Paul Sandoz wrote: On May 31, 2015, at 6:03 PM, Ivan Gerasimov wrote: Which is right here: http://cr.openjdk.java.net/~igerasim/8058779/05/webrev/ Much better. For the test can you use RandomFactory recently added to the test library? Sure. Here the updated webrev with this change and a few other minor changes. http://cr.openjdk.java.net/~igerasim/8058779/06/webrev/ The changes are: - move declaration of i below, - indent .append(), - use RandomFactory in the test, - extend number of test cases with null input. Do you think it's ready to go? Almost :-) Like Sherman i think you should get rid of JavaLangAccess.newStringUnsafe. 78 @Test(dataProvider="sourceTargetReplacementWithNull") You can use the "expectedExceptions" attribute instead of an explicit try/fail/catch. 79 public void testNPE(String source, String target, String replacement) { 80 try { 81 source.replace(target, replacement); 82 fail("Expected to throw NPE: " + source + ".replace(" + target + 83 "," + replacement + ")"); 84 } catch (NullPointerException npe) { 85 } 86 } Alright, I will use expectedExceptions here. You could simplify the data provider sourceTargetReplacementWithNull, there is no point testing with a null source. String.replace accepts two arguments, each can be either null or non-null. Thats 2 bits of state, so one can simply be explicit and presumably it should not matter what the non-null argument value (there are enough non-null values supplied by the other data providers): {null, null} {null, "foo"} {"foo", null} I agree there is not much value in testing the case source == null, so it can be removed. However, I would leave other test cases. Earlier in this thread Rémi spotted a bug, which had been observed in a call "bar".replace("foo", null), but not in "foo".replace("foo", null). That was my motivation to extend the number of test cases for null inputs. Sincerely yours, Ivan No need for another review for any of these. Paul.
Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)
Looks Ok to me, thumb up :) Rémi On 06/01/2015 08:53 PM, Ivan Gerasimov wrote: On 01.06.2015 11:33, Paul Sandoz wrote: On May 31, 2015, at 6:03 PM, Ivan Gerasimov wrote: Which is right here: http://cr.openjdk.java.net/~igerasim/8058779/05/webrev/ Much better. For the test can you use RandomFactory recently added to the test library? Sure. Here the updated webrev with this change and a few other minor changes. http://cr.openjdk.java.net/~igerasim/8058779/06/webrev/ The changes are: - move declaration of i below, - indent .append(), - use RandomFactory in the test, - extend number of test cases with null input. Do you think it's ready to go? Sincerely yours, Ivan Paul.
Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)
On 01.06.2015 22:10, Xueming Shen wrote: Ivan, The code looks fine for me. Just wonder what's the motivation of using the newStringUnsafe() in the test case. Simply to save the char[] copy to speed up the test? I don't think we really care about it here, right? Yes, right. I'll replace it with new String(char[]). Sincerely yours, Ivan -Sherman On 06/01/2015 11:53 AM, Ivan Gerasimov wrote: On 01.06.2015 11:33, Paul Sandoz wrote: On May 31, 2015, at 6:03 PM, Ivan Gerasimov wrote: Which is right here: http://cr.openjdk.java.net/~igerasim/8058779/05/webrev/ Much better. For the test can you use RandomFactory recently added to the test library? Sure. Here the updated webrev with this change and a few other minor changes. http://cr.openjdk.java.net/~igerasim/8058779/06/webrev/ The changes are: - move declaration of i below, - indent .append(), - use RandomFactory in the test, - extend number of test cases with null input. Do you think it's ready to go? Sincerely yours, Ivan Paul.
Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)
On Jun 1, 2015, at 8:53 PM, Ivan Gerasimov wrote: > > > On 01.06.2015 11:33, Paul Sandoz wrote: >> On May 31, 2015, at 6:03 PM, Ivan Gerasimov >> wrote: >>> Which is right here: >>> http://cr.openjdk.java.net/~igerasim/8058779/05/webrev/ >>> >> Much better. >> >> For the test can you use RandomFactory recently added to the test library? > > Sure. > Here the updated webrev with this change and a few other minor changes. > http://cr.openjdk.java.net/~igerasim/8058779/06/webrev/ > > The changes are: > - move declaration of i below, > - indent .append(), > - use RandomFactory in the test, > - extend number of test cases with null input. > > Do you think it's ready to go? > Almost :-) Like Sherman i think you should get rid of JavaLangAccess.newStringUnsafe. 78 @Test(dataProvider="sourceTargetReplacementWithNull") You can use the "expectedExceptions" attribute instead of an explicit try/fail/catch. 79 public void testNPE(String source, String target, String replacement) { 80 try { 81 source.replace(target, replacement); 82 fail("Expected to throw NPE: " + source + ".replace(" + target + 83 "," + replacement + ")"); 84 } catch (NullPointerException npe) { 85 } 86 } You could simplify the data provider sourceTargetReplacementWithNull, there is no point testing with a null source. String.replace accepts two arguments, each can be either null or non-null. Thats 2 bits of state, so one can simply be explicit and presumably it should not matter what the non-null argument value (there are enough non-null values supplied by the other data providers): {null, null} {null, "foo"} {"foo", null} No need for another review for any of these. Paul.
Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)
Ivan, The code looks fine for me. Just wonder what's the motivation of using the newStringUnsafe() in the test case. Simply to save the char[] copy to speed up the test? I don't think we really care about it here, right? -Sherman On 06/01/2015 11:53 AM, Ivan Gerasimov wrote: On 01.06.2015 11:33, Paul Sandoz wrote: On May 31, 2015, at 6:03 PM, Ivan Gerasimov wrote: Which is right here: http://cr.openjdk.java.net/~igerasim/8058779/05/webrev/ Much better. For the test can you use RandomFactory recently added to the test library? Sure. Here the updated webrev with this change and a few other minor changes. http://cr.openjdk.java.net/~igerasim/8058779/06/webrev/ The changes are: - move declaration of i below, - indent .append(), - use RandomFactory in the test, - extend number of test cases with null input. Do you think it's ready to go? Sincerely yours, Ivan Paul.
Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)
On 01.06.2015 11:33, Paul Sandoz wrote: On May 31, 2015, at 6:03 PM, Ivan Gerasimov wrote: Which is right here: http://cr.openjdk.java.net/~igerasim/8058779/05/webrev/ Much better. For the test can you use RandomFactory recently added to the test library? Sure. Here the updated webrev with this change and a few other minor changes. http://cr.openjdk.java.net/~igerasim/8058779/06/webrev/ The changes are: - move declaration of i below, - indent .append(), - use RandomFactory in the test, - extend number of test cases with null input. Do you think it's ready to go? Sincerely yours, Ivan Paul.
Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)
On 06/01/2015 10:32 AM, Ulf Zibis wrote: Hi, Am 31.05.2015 um 18:03 schrieb Ivan Gerasimov: On 31.05.2015 18:54, Ivan Gerasimov wrote: I fixed the code and added the case to the regression test in the new webrev. Which is right here: http://cr.openjdk.java.net/~igerasim/8058779/05/webrev/ Shoudn't the user be informed via javadoc about the risk of an OutOfMemoryError, just from illegal arguments of this method? Example: this.value.length() = 100 target.value.length() = 200 replacement.value.length() = 50 results in: ...no replacements made, since target.value.length() > this.value.length() and 1st indexOf returns -1... if 1st indexOf returns >= 0, then this.value.length() >= target.value.length() and the only way to get negative newLenHint is via an overflow indicating that a String with length() > Integer.MAX_VALUE would be required to hold the result, which is impossible. We don't have a special StringTooBigException. OOME is the best approximation here. IllegalArgumentException is another option, but seems to be to general in this case. What happens when regexp based replace is fed with such huge strings? I had to set the max. heap size to 14Gbytes to get the answer that was not an OOME caused by not enough heap space: char[] chars = new char[1 + (1 << 30)]; chars[0] = 'a'; String big = new String(chars); big.replace("a", big); Exception in thread "main" java.lang.OutOfMemoryError at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137) at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124) at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:517) at java.lang.StringBuilder.append(StringBuilder.java:175) at java.util.regex.Matcher.appendTail(Matcher.java:1122) at java.util.regex.Matcher.replaceAll(Matcher.java:1169) at TestReplace.replace(TestReplace.java:11) at TestReplace.main(TestReplace.java:18) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:502) at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140) Regards, Peter newLenHint = -500 --> OutOfMemoryError In other words, is this a good reason to throw such an Error? Little nit: Indentation for line continuation should be 8 spaces. -Ulf
Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)
Hi Ivan, Le 31 mai 2015 17:54:35 CEST, Ivan Gerasimov a écrit : >Hi Remi! > >On 31.05.2015 11:43, Remi Forax wrote: >> I agree with the others the code is more readable. >> >> There is a slightly difference between the current behavior and the >> behavior of the proposed code, >> "foo".replace("bar", null) >> should throw a NPE because replacement is null but the proposed code >> will return "foo" >> because replacement.toString() is called after the short-circuit test > >> (j < 0). >> >Yes, you're right, thanks for catching it! >And the regression test should have caught that, but I only had >"a".replace("a", null) there, which passed. > >I fixed the code and added the case to the regression test in the new >webrev. > >> so the first part of the code should be: >> String starget = target.toString(); // implict nullcheck >> String srepl = replacement.toString(); // implicit nullcheck >> int j = indexOf(starget); >> if (j < 0) { >> return this; >> } >> ... >> >> also 'i' can be initialized just before the 'do', there is no point >to >> initialize it before. >> >> To finish, the two 'final' are useless here but i suppose it's a >> matter of style :) >> >I moved declaration of i just to save a line. I don't think it >decreased performance. I don't think so too. It decrease readability, when you read the code of the loop you have to scroll up to find the initialization of 'i' like in plain old good C. > >Declaring the 'value' variable final was suggested by Martin, and I >think it is reasonable (see >http://mail.openjdk.java.net/pipermail/core-libs-dev/2015-March/032601.html). >The other variable is final just for symmetry :) Either i read the code in an IDE and local variables and fields do not have the same color or i read diff and in that case fields are usually not visible. For me, final on local variable is just noise. But as i said it's a matter of taste. > >Sincerely yours, >Ivan cheers, Rémi > > >> cheers, >> Rémi >> >> >> On 05/31/2015 04:19 AM, Ivan Gerasimov wrote: >>> Hi everyone! >>> >>> Here's another webrev, in which replace() is implemented with >>> StringBuilder. >>> On my benchmark it is almost as fast as the version backed with >>> arrays, but this variant is much shorter. >>> >>> Credits to Sherman for combining the general algorithm with the case > >>> of empty target. >>> >>> Comments, further suggestions are welcome! >>> >>> BUGURL: https://bugs.openjdk.java.net/browse/JDK-8058779 >>> WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/04/webrev/ >>> >>> Sincerely yours, >>> Ivan >>> >> >> >>
Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)
On May 31, 2015, at 6:03 PM, Ivan Gerasimov wrote: >> > Which is right here: > http://cr.openjdk.java.net/~igerasim/8058779/05/webrev/ > Much better. For the test can you use RandomFactory recently added to the test library? Paul.
Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)
Hi, Am 31.05.2015 um 18:03 schrieb Ivan Gerasimov: On 31.05.2015 18:54, Ivan Gerasimov wrote: I fixed the code and added the case to the regression test in the new webrev. Which is right here: http://cr.openjdk.java.net/~igerasim/8058779/05/webrev/ Shoudn't the user be informed via javadoc about the risk of an OutOfMemoryError, just from illegal arguments of this method? Example: this.value.length() = 100 target.value.length() = 200 replacement.value.length() = 50 results in: newLenHint = -500 --> OutOfMemoryError In other words, is this a good reason to throw such an Error? Little nit: Indentation for line continuation should be 8 spaces. -Ulf
Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)
On 31.05.2015 18:54, Ivan Gerasimov wrote: Hi Remi! On 31.05.2015 11:43, Remi Forax wrote: I agree with the others the code is more readable. There is a slightly difference between the current behavior and the behavior of the proposed code, "foo".replace("bar", null) should throw a NPE because replacement is null but the proposed code will return "foo" because replacement.toString() is called after the short-circuit test (j < 0). Yes, you're right, thanks for catching it! And the regression test should have caught that, but I only had "a".replace("a", null) there, which passed. I fixed the code and added the case to the regression test in the new webrev. Which is right here: http://cr.openjdk.java.net/~igerasim/8058779/05/webrev/ so the first part of the code should be: String starget = target.toString(); // implict nullcheck String srepl = replacement.toString(); // implicit nullcheck int j = indexOf(starget); if (j < 0) { return this; } ... also 'i' can be initialized just before the 'do', there is no point to initialize it before. To finish, the two 'final' are useless here but i suppose it's a matter of style :) I moved declaration of i just to save a line. I don't think it decreased performance. Declaring the 'value' variable final was suggested by Martin, and I think it is reasonable (see http://mail.openjdk.java.net/pipermail/core-libs-dev/2015-March/032601.html). The other variable is final just for symmetry :) Sincerely yours, Ivan cheers, Rémi On 05/31/2015 04:19 AM, Ivan Gerasimov wrote: Hi everyone! Here's another webrev, in which replace() is implemented with StringBuilder. On my benchmark it is almost as fast as the version backed with arrays, but this variant is much shorter. Credits to Sherman for combining the general algorithm with the case of empty target. Comments, further suggestions are welcome! BUGURL: https://bugs.openjdk.java.net/browse/JDK-8058779 WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/04/webrev/ Sincerely yours, Ivan
Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)
Hi Remi! On 31.05.2015 11:43, Remi Forax wrote: I agree with the others the code is more readable. There is a slightly difference between the current behavior and the behavior of the proposed code, "foo".replace("bar", null) should throw a NPE because replacement is null but the proposed code will return "foo" because replacement.toString() is called after the short-circuit test (j < 0). Yes, you're right, thanks for catching it! And the regression test should have caught that, but I only had "a".replace("a", null) there, which passed. I fixed the code and added the case to the regression test in the new webrev. so the first part of the code should be: String starget = target.toString(); // implict nullcheck String srepl = replacement.toString(); // implicit nullcheck int j = indexOf(starget); if (j < 0) { return this; } ... also 'i' can be initialized just before the 'do', there is no point to initialize it before. To finish, the two 'final' are useless here but i suppose it's a matter of style :) I moved declaration of i just to save a line. I don't think it decreased performance. Declaring the 'value' variable final was suggested by Martin, and I think it is reasonable (see http://mail.openjdk.java.net/pipermail/core-libs-dev/2015-March/032601.html). The other variable is final just for symmetry :) Sincerely yours, Ivan cheers, Rémi On 05/31/2015 04:19 AM, Ivan Gerasimov wrote: Hi everyone! Here's another webrev, in which replace() is implemented with StringBuilder. On my benchmark it is almost as fast as the version backed with arrays, but this variant is much shorter. Credits to Sherman for combining the general algorithm with the case of empty target. Comments, further suggestions are welcome! BUGURL: https://bugs.openjdk.java.net/browse/JDK-8058779 WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/04/webrev/ Sincerely yours, Ivan
Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)
On 31.05.2015 13:28, Ulf Zibis wrote: Am 31.05.2015 um 08:38 schrieb Peter Levart: Hi, Yes, this one is much easier to grasp. As I understand the check is to avoid overflow in calculation of StringBuilder initial capacity (newLenHint). If overflow happened, newLenHint would be negative and StringBuilder construction would fail with NegativeArraySizeException. But the calculation of newLenHint: int newLenHint = value.length - targLen + replValue.length; ...considering the following: value.length >= 0 targLength >= 0 replValue.length >= 0 targLength <= value.length in case of overflow, it can only produce a negative value. So the check could simply be: if (newLenHint < 0) { throw new OutOfMemoryError(); } Hm, what has this situation to do with Out-Of-Memory ? That should indicate that we were about to try to allocate an array of size > Integer.MAX_VALUE. E.g. java.util.AbstractCollection.java: private static int hugeCapacity(int minCapacity) { if (minCapacity < 0) // overflow throw new OutOfMemoryError ("Required array size too large"); In other words, IMHO NegativeArraySizeException is much better here and additionally saves performance. Additionally you could propagate it to InvalidArgumentException. I don't think it is InvalidArgument case. It just happens that the result is larger then allowed, so let's throw OOM! Sincerely yours, Ivan -Ulf
Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)
Thanks! I'll simplify this check for overflow, as you're suggesting. Sincerely yours, Ivan On 31.05.2015 10:01, Xueming Shen wrote: On 5/30/2015 11:38 PM, Peter Levart wrote: Hi, Yes, this one is much easier to grasp. As I understand the check is to avoid overflow in calculation of StringBuilder initial capacity (newLenHint). If overflow happened, newLenHint would be negative and StringBuilder construction would fail with NegativeArraySizeException. But the calculation of newLenHint: int newLenHint = value.length - targLen + replValue.length; ...considering the following: value.length >= 0 targLength >= 0 replValue.length >= 0 targLength <= value.length in case of overflow, it can only produce a negative value. So the check could simply be: if (newLenHint < 0) { throw new OutOfMemoryError(); } Right? Regards, Peter agreed.
Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)
Am 31.05.2015 um 08:38 schrieb Peter Levart: Hi, Yes, this one is much easier to grasp. As I understand the check is to avoid overflow in calculation of StringBuilder initial capacity (newLenHint). If overflow happened, newLenHint would be negative and StringBuilder construction would fail with NegativeArraySizeException. But the calculation of newLenHint: int newLenHint = value.length - targLen + replValue.length; ...considering the following: value.length >= 0 targLength >= 0 replValue.length >= 0 targLength <= value.length in case of overflow, it can only produce a negative value. So the check could simply be: if (newLenHint < 0) { throw new OutOfMemoryError(); } Hm, what has this situation to do with Out-Of-Memory ? In other words, IMHO NegativeArraySizeException is much better here and additionally saves performance. Additionally you could propagate it to InvalidArgumentException. -Ulf
Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)
I agree with the others the code is more readable. There is a slightly difference between the current behavior and the behavior of the proposed code, "foo".replace("bar", null) should throw a NPE because replacement is null but the proposed code will return "foo" because replacement.toString() is called after the short-circuit test (j < 0). so the first part of the code should be: String starget = target.toString(); // implict nullcheck String srepl = replacement.toString(); // implicit nullcheck int j = indexOf(starget); if (j < 0) { return this; } ... also 'i' can be initialized just before the 'do', there is no point to initialize it before. To finish, the two 'final' are useless here but i suppose it's a matter of style :) cheers, Rémi On 05/31/2015 04:19 AM, Ivan Gerasimov wrote: Hi everyone! Here's another webrev, in which replace() is implemented with StringBuilder. On my benchmark it is almost as fast as the version backed with arrays, but this variant is much shorter. Credits to Sherman for combining the general algorithm with the case of empty target. Comments, further suggestions are welcome! BUGURL: https://bugs.openjdk.java.net/browse/JDK-8058779 WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/04/webrev/ Sincerely yours, Ivan
Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)
On 5/30/2015 11:38 PM, Peter Levart wrote: Hi, Yes, this one is much easier to grasp. As I understand the check is to avoid overflow in calculation of StringBuilder initial capacity (newLenHint). If overflow happened, newLenHint would be negative and StringBuilder construction would fail with NegativeArraySizeException. But the calculation of newLenHint: int newLenHint = value.length - targLen + replValue.length; ...considering the following: value.length >= 0 targLength >= 0 replValue.length >= 0 targLength <= value.length in case of overflow, it can only produce a negative value. So the check could simply be: if (newLenHint < 0) { throw new OutOfMemoryError(); } Right? Regards, Peter agreed.
Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)
On 05/31/2015 06:34 AM, Xueming Shen wrote: On 5/30/15 7:19 PM, Ivan Gerasimov wrote: Hi everyone! Here's another webrev, in which replace() is implemented with StringBuilder. On my benchmark it is almost as fast as the version backed with arrays, but this variant is much shorter. Credits to Sherman for combining the general algorithm with the case of empty target. Comments, further suggestions are welcome! BUGURL: https://bugs.openjdk.java.net/browse/JDK-8058779 WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/04/webrev/ Sincerely yours, Ivan This is one is much better:-) I would suggest to leave the "overflow-conscious" code to the StringBuilder. The same range check is being done inside ABS every time the repl string is appended into the buffer, it does not appear to be very helpful to have a redundant check here. And it seems this check only works for the single appearance of target string. 2260 // overflow-conscious code 2261 if (value.length - targLen > Integer.MAX_VALUE - replValue.length) { 2262 throw new OutOfMemoryError(); 2263 } Hi, Yes, this one is much easier to grasp. As I understand the check is to avoid overflow in calculation of StringBuilder initial capacity (newLenHint). If overflow happened, newLenHint would be negative and StringBuilder construction would fail with NegativeArraySizeException. But the calculation of newLenHint: int newLenHint = value.length - targLen + replValue.length; ...considering the following: value.length >= 0 targLength >= 0 replValue.length >= 0 targLength <= value.length in case of overflow, it can only produce a negative value. So the check could simply be: if (newLenHint < 0) { throw new OutOfMemoryError(); } Right? Regards, Peter
Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)
On 5/30/15 7:19 PM, Ivan Gerasimov wrote: Hi everyone! Here's another webrev, in which replace() is implemented with StringBuilder. On my benchmark it is almost as fast as the version backed with arrays, but this variant is much shorter. Credits to Sherman for combining the general algorithm with the case of empty target. Comments, further suggestions are welcome! BUGURL: https://bugs.openjdk.java.net/browse/JDK-8058779 WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/04/webrev/ Sincerely yours, Ivan This is one is much better:-) I would suggest to leave the "overflow-conscious" code to the StringBuilder. The same range check is being done inside ABS every time the repl string is appended into the buffer, it does not appear to be very helpful to have a redundant check here. And it seems this check only works for the single appearance of target string. 2260 // overflow-conscious code 2261 if (value.length - targLen > Integer.MAX_VALUE - replValue.length) { 2262 throw new OutOfMemoryError(); 2263 }
Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)
Hi everyone! Here's another webrev, in which replace() is implemented with StringBuilder. On my benchmark it is almost as fast as the version backed with arrays, but this variant is much shorter. Credits to Sherman for combining the general algorithm with the case of empty target. Comments, further suggestions are welcome! BUGURL: https://bugs.openjdk.java.net/browse/JDK-8058779 WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/04/webrev/ Sincerely yours, Ivan
Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)
On 05/27/2015 01:10 PM, Xueming Shen wrote: On 05/27/2015 12:43 PM, Ivan Gerasimov wrote: On 27.05.2015 21:08, Xueming Shen wrote: targLen = max(1, tagLen); ? Well, almost :) With such targLen, (i = j + targLen) would result in i == length() + 1, which will cause IOOBE in the following append(). I'm sure the algorithms can be adopted to run correctly with empty target, it just needs some accurate checking. But why can't we consider the first variant of replace()? http://cr.openjdk.java.net/~igerasim/8058779/02/webrev/ It's still faster and it can handle larger strings without OOME. I think this is important. I haven't heard any critics of it except for its relative complexity, comparing to the original code. And we have a backup variant with StringBuilder, if problems are found. Personally I prefer not do "buffer management" issue in replace(), StringBuilder is just doing the job fine. If possible I would avoid to have a special method() to just work a corner case. "is relative complexity" is a critics. We are trying to see if we can achieve most of the gain without such complexity. -Sherman
Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)
On 05/27/2015 12:43 PM, Ivan Gerasimov wrote: On 27.05.2015 21:08, Xueming Shen wrote: targLen = max(1, tagLen); ? Well, almost :) With such targLen, (i = j + targLen) would result in i == length() + 1, which will cause IOOBE in the following append(). I'm sure the algorithms can be adopted to run correctly with empty target, it just needs some accurate checking. But why can't we consider the first variant of replace()? http://cr.openjdk.java.net/~igerasim/8058779/02/webrev/ It's still faster and it can handle larger strings without OOME. I think this is important. I haven't heard any critics of it except for its relative complexity, comparing to the original code. And we have a backup variant with StringBuilder, if problems are found.
Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)
On 27.05.2015 21:08, Xueming Shen wrote: targLen = max(1, tagLen); ? Well, almost :) With such targLen, (i = j + targLen) would result in i == length() + 1, which will cause IOOBE in the following append(). I'm sure the algorithms can be adopted to run correctly with empty target, it just needs some accurate checking. But why can't we consider the first variant of replace()? http://cr.openjdk.java.net/~igerasim/8058779/02/webrev/ It's still faster and it can handle larger strings without OOME. I think this is important. I haven't heard any critics of it except for its relative complexity, comparing to the original code. And we have a backup variant with StringBuilder, if problems are found. Sincerely yours, Ivan On 05/27/2015 10:59 AM, Ivan Gerasimov wrote: On 27.05.2015 20:40, Xueming Shen wrote: The .append(srepl) -> append(srep.value) And I would simply remove the "fastpath" for the target.length() = 0 special case ... And maybe pick an appropriate initial size (maybe this.length + 16 * diff) for the sb to further reduce its internal expanding ... Sorry, I forgot to comment this. This is not really a fastpath. This is a special case, which isn't handled by the general algorithm. We can modify the general algorithm, of course to handle empty target, but I thought it would be more clear to have a separate branch. I think your variant below would loop forever in the loop with an empty target: i = j + targLen; // targLen == 0 j = indexOf(starget, i) // j == i Sincerely yours, Ivan Personally this code is straightforward and I would be happy with the 800+ vs 1000+ score diff. public String replace(CharSequence target, CharSequence replacement) { String starget = target.toString(); int targLen = starget.length(); String srepl = replacement.toString(); int j = indexOf(starget); // special case: nothing to replace if (j< 0) { return this; } final char[] value = this.value; StringBuilder sb = new StringBuilder(); int i = 0; do { sb.append(value, i, j - i) .append(srepl.value); i = j + targLen; } while ((j = indexOf(starget, i))> 0); return sb.append(this, i, length()).toString(); } -Sherman On 05/27/2015 10:06 AM, Ivan Gerasimov wrote: On 27.05.2015 18:44, Xueming Shen wrote: You might want to use directly sb.append.(char[], off, len), instead of append(str, off, len)/append(str). Ah, yes sure! It would improve things. I updated the webrev in place: http://cr.openjdk.java.net/~igerasim/8058779/03/webrev/ However the general picture is almost the same: performance: Baseline: MyBenchmark.test thrpt 40 257'051.948 ± 4537.484 ops/s StringBuilder: MyBenchmark.test thrpt 40 808'033.808 ± 22152.166 ops/s Arrays:MyBenchmark.test thrpt 40 1'049'235.602 ± 15501.803 ops/s Memory efficiency is still less then for array-version: $ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C java version "1.9.0-internal" Java(TM) SE Runtime Environment (build 1.9.0-internal-igerasim_2015_05_27_14_47-b00) Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-internal-igerasim_2015_05_27_14_47-b00, mixed mode) 25) 100663296 real0m2.429s user0m2.112s sys0m0.792s Sincerely yours, Ivan Btw, is the target.lenght==0 the popular use case/scenario? Just wonder why do we need a special case here for it. thanks, -Sherman On 5/27/15 8:29 AM, Ivan Gerasimov wrote: For completeness, here's another webrev, which uses StringBuilder: http://cr.openjdk.java.net/~igerasim/8058779/03/webrev/ Its performance is somewhere in between the current implementation and the array-based implementation: MyBenchmark.test thrpt 40 796'059.192 ± 12455.970 ops/s Memory efficiency is less then the one of array-based implementation: $ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C java version "1.9.0-internal" Java(TM) SE Runtime Environment (build 1.9.0-internal-igerasim_2015_05_27_14_47-b00) Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-internal-igerasim_2015_05_27_14_47-b00, mixed mode) 25) 100663296 real0m2.585s user0m1.875s sys0m0.887s Sincerely yours, Ivan On 27.05.2015 2:38, Xueming Shen wrote: Ivan, It might be worth trying String.index + StringBuilder, instead of writing/handling everything yourself. Yes, it inevitably adds an arraycopy at the end to convert the StrinbBuilder to String, but it might have better balance between performance and code complexity. The regex is probably a little heavy for literal string replacement, but StringBuilder should not be that bad ... -Sherman On 5/26/15 4:11 PM, Ivan Gerasimov wrote: I updated the webrev: http://cr.openjdk.java.net/~igerasim/8058779/02/webrev/ In the check at 2300-2301 and 2351-2352 I replaced MAX_ARRAY_SIZE with Integer.MAX_VALUE, which seems to be more accurate here. And I wan
Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)
targLen = max(1, tagLen); ? On 05/27/2015 10:59 AM, Ivan Gerasimov wrote: On 27.05.2015 20:40, Xueming Shen wrote: The .append(srepl) -> append(srep.value) And I would simply remove the "fastpath" for the target.length() = 0 special case ... And maybe pick an appropriate initial size (maybe this.length + 16 * diff) for the sb to further reduce its internal expanding ... Sorry, I forgot to comment this. This is not really a fastpath. This is a special case, which isn't handled by the general algorithm. We can modify the general algorithm, of course to handle empty target, but I thought it would be more clear to have a separate branch. I think your variant below would loop forever in the loop with an empty target: i = j + targLen; // targLen == 0 j = indexOf(starget, i) // j == i Sincerely yours, Ivan Personally this code is straightforward and I would be happy with the 800+ vs 1000+ score diff. public String replace(CharSequence target, CharSequence replacement) { String starget = target.toString(); int targLen = starget.length(); String srepl = replacement.toString(); int j = indexOf(starget); // special case: nothing to replace if (j< 0) { return this; } final char[] value = this.value; StringBuilder sb = new StringBuilder(); int i = 0; do { sb.append(value, i, j - i) .append(srepl.value); i = j + targLen; } while ((j = indexOf(starget, i))> 0); return sb.append(this, i, length()).toString(); } -Sherman On 05/27/2015 10:06 AM, Ivan Gerasimov wrote: On 27.05.2015 18:44, Xueming Shen wrote: You might want to use directly sb.append.(char[], off, len), instead of append(str, off, len)/append(str). Ah, yes sure! It would improve things. I updated the webrev in place: http://cr.openjdk.java.net/~igerasim/8058779/03/webrev/ However the general picture is almost the same: performance: Baseline: MyBenchmark.test thrpt 40 257'051.948 ± 4537.484 ops/s StringBuilder: MyBenchmark.test thrpt 40 808'033.808 ± 22152.166 ops/s Arrays:MyBenchmark.test thrpt 40 1'049'235.602 ± 15501.803 ops/s Memory efficiency is still less then for array-version: $ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C java version "1.9.0-internal" Java(TM) SE Runtime Environment (build 1.9.0-internal-igerasim_2015_05_27_14_47-b00) Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-internal-igerasim_2015_05_27_14_47-b00, mixed mode) 25) 100663296 real0m2.429s user0m2.112s sys0m0.792s Sincerely yours, Ivan Btw, is the target.lenght==0 the popular use case/scenario? Just wonder why do we need a special case here for it. thanks, -Sherman On 5/27/15 8:29 AM, Ivan Gerasimov wrote: For completeness, here's another webrev, which uses StringBuilder: http://cr.openjdk.java.net/~igerasim/8058779/03/webrev/ Its performance is somewhere in between the current implementation and the array-based implementation: MyBenchmark.test thrpt 40 796'059.192 ± 12455.970 ops/s Memory efficiency is less then the one of array-based implementation: $ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C java version "1.9.0-internal" Java(TM) SE Runtime Environment (build 1.9.0-internal-igerasim_2015_05_27_14_47-b00) Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-internal-igerasim_2015_05_27_14_47-b00, mixed mode) 25) 100663296 real0m2.585s user0m1.875s sys0m0.887s Sincerely yours, Ivan On 27.05.2015 2:38, Xueming Shen wrote: Ivan, It might be worth trying String.index + StringBuilder, instead of writing/handling everything yourself. Yes, it inevitably adds an arraycopy at the end to convert the StrinbBuilder to String, but it might have better balance between performance and code complexity. The regex is probably a little heavy for literal string replacement, but StringBuilder should not be that bad ... -Sherman On 5/26/15 4:11 PM, Ivan Gerasimov wrote: I updated the webrev: http://cr.openjdk.java.net/~igerasim/8058779/02/webrev/ In the check at 2300-2301 and 2351-2352 I replaced MAX_ARRAY_SIZE with Integer.MAX_VALUE, which seems to be more accurate here. And I want to add that this proposed implementation is not only faster, but also more memory efficient. The following simple stress-test shows that the proposed version is able to handle twice larger strings, comparing to the current implementation. public class C { public static void main(String[] args) throws Throwable { String s = "string"; for (int i = 1; i < Integer.MAX_VALUE; ++i) { try { s = s.replace("string", "stringstring"); } catch (OutOfMemoryError o) { System.out.println(i + ") " + s.length()); break; } } } } $ time ~/j
Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)
On 27.05.2015 20:40, Xueming Shen wrote: The .append(srepl) -> append(srep.value) And I would simply remove the "fastpath" for the target.length() = 0 special case ... And maybe pick an appropriate initial size (maybe this.length + 16 * diff) for the sb to further reduce its internal expanding ... Sorry, I forgot to comment this. This is not really a fastpath. This is a special case, which isn't handled by the general algorithm. We can modify the general algorithm, of course to handle empty target, but I thought it would be more clear to have a separate branch. I think your variant below would loop forever in the loop with an empty target: i = j + targLen; // targLen == 0 j = indexOf(starget, i) // j == i Sincerely yours, Ivan Personally this code is straightforward and I would be happy with the 800+ vs 1000+ score diff. public String replace(CharSequence target, CharSequence replacement) { String starget = target.toString(); int targLen = starget.length(); String srepl = replacement.toString(); int j = indexOf(starget); // special case: nothing to replace if (j< 0) { return this; } final char[] value = this.value; StringBuilder sb = new StringBuilder(); int i = 0; do { sb.append(value, i, j - i) .append(srepl.value); i = j + targLen; } while ((j = indexOf(starget, i))> 0); return sb.append(this, i, length()).toString(); } -Sherman On 05/27/2015 10:06 AM, Ivan Gerasimov wrote: On 27.05.2015 18:44, Xueming Shen wrote: You might want to use directly sb.append.(char[], off, len), instead of append(str, off, len)/append(str). Ah, yes sure! It would improve things. I updated the webrev in place: http://cr.openjdk.java.net/~igerasim/8058779/03/webrev/ However the general picture is almost the same: performance: Baseline: MyBenchmark.test thrpt 40 257'051.948 ± 4537.484 ops/s StringBuilder: MyBenchmark.test thrpt 40 808'033.808 ± 22152.166 ops/s Arrays:MyBenchmark.test thrpt 40 1'049'235.602 ± 15501.803 ops/s Memory efficiency is still less then for array-version: $ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C java version "1.9.0-internal" Java(TM) SE Runtime Environment (build 1.9.0-internal-igerasim_2015_05_27_14_47-b00) Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-internal-igerasim_2015_05_27_14_47-b00, mixed mode) 25) 100663296 real0m2.429s user0m2.112s sys0m0.792s Sincerely yours, Ivan Btw, is the target.lenght==0 the popular use case/scenario? Just wonder why do we need a special case here for it. thanks, -Sherman On 5/27/15 8:29 AM, Ivan Gerasimov wrote: For completeness, here's another webrev, which uses StringBuilder: http://cr.openjdk.java.net/~igerasim/8058779/03/webrev/ Its performance is somewhere in between the current implementation and the array-based implementation: MyBenchmark.test thrpt 40 796'059.192 ± 12455.970 ops/s Memory efficiency is less then the one of array-based implementation: $ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C java version "1.9.0-internal" Java(TM) SE Runtime Environment (build 1.9.0-internal-igerasim_2015_05_27_14_47-b00) Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-internal-igerasim_2015_05_27_14_47-b00, mixed mode) 25) 100663296 real0m2.585s user0m1.875s sys0m0.887s Sincerely yours, Ivan On 27.05.2015 2:38, Xueming Shen wrote: Ivan, It might be worth trying String.index + StringBuilder, instead of writing/handling everything yourself. Yes, it inevitably adds an arraycopy at the end to convert the StrinbBuilder to String, but it might have better balance between performance and code complexity. The regex is probably a little heavy for literal string replacement, but StringBuilder should not be that bad ... -Sherman On 5/26/15 4:11 PM, Ivan Gerasimov wrote: I updated the webrev: http://cr.openjdk.java.net/~igerasim/8058779/02/webrev/ In the check at 2300-2301 and 2351-2352 I replaced MAX_ARRAY_SIZE with Integer.MAX_VALUE, which seems to be more accurate here. And I want to add that this proposed implementation is not only faster, but also more memory efficient. The following simple stress-test shows that the proposed version is able to handle twice larger strings, comparing to the current implementation. public class C { public static void main(String[] args) throws Throwable { String s = "string"; for (int i = 1; i < Integer.MAX_VALUE; ++i) { try { s = s.replace("string", "stringstring"); } catch (OutOfMemoryError o) { System.out.println(i + ") " + s.length()); break; } } } } $ time ~/java9/jdk/bin/java -showversion -Xmx1g C java version "1.9.0-e
Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)
The .append(srepl) -> append(srep.value) And I would simply remove the "fastpath" for the target.length() = 0 special case ... And maybe pick an appropriate initial size (maybe this.length + 16 * diff) for the sb to further reduce its internal expanding ... Personally this code is straightforward and I would be happy with the 800+ vs 1000+ score diff. public String replace(CharSequence target, CharSequence replacement) { String starget = target.toString(); int targLen = starget.length(); String srepl = replacement.toString(); int j = indexOf(starget); // special case: nothing to replace if (j< 0) { return this; } final char[] value = this.value; StringBuilder sb = new StringBuilder(); int i = 0; do { sb.append(value, i, j - i) .append(srepl.value); i = j + targLen; } while ((j = indexOf(starget, i))> 0); return sb.append(this, i, length()).toString(); } -Sherman On 05/27/2015 10:06 AM, Ivan Gerasimov wrote: On 27.05.2015 18:44, Xueming Shen wrote: You might want to use directly sb.append.(char[], off, len), instead of append(str, off, len)/append(str). Ah, yes sure! It would improve things. I updated the webrev in place: http://cr.openjdk.java.net/~igerasim/8058779/03/webrev/ However the general picture is almost the same: performance: Baseline: MyBenchmark.test thrpt 40 257'051.948 ± 4537.484 ops/s StringBuilder: MyBenchmark.test thrpt 40 808'033.808 ± 22152.166 ops/s Arrays:MyBenchmark.test thrpt 40 1'049'235.602 ± 15501.803 ops/s Memory efficiency is still less then for array-version: $ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C java version "1.9.0-internal" Java(TM) SE Runtime Environment (build 1.9.0-internal-igerasim_2015_05_27_14_47-b00) Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-internal-igerasim_2015_05_27_14_47-b00, mixed mode) 25) 100663296 real0m2.429s user0m2.112s sys0m0.792s Sincerely yours, Ivan Btw, is the target.lenght==0 the popular use case/scenario? Just wonder why do we need a special case here for it. thanks, -Sherman On 5/27/15 8:29 AM, Ivan Gerasimov wrote: For completeness, here's another webrev, which uses StringBuilder: http://cr.openjdk.java.net/~igerasim/8058779/03/webrev/ Its performance is somewhere in between the current implementation and the array-based implementation: MyBenchmark.test thrpt 40 796'059.192 ± 12455.970 ops/s Memory efficiency is less then the one of array-based implementation: $ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C java version "1.9.0-internal" Java(TM) SE Runtime Environment (build 1.9.0-internal-igerasim_2015_05_27_14_47-b00) Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-internal-igerasim_2015_05_27_14_47-b00, mixed mode) 25) 100663296 real0m2.585s user0m1.875s sys0m0.887s Sincerely yours, Ivan On 27.05.2015 2:38, Xueming Shen wrote: Ivan, It might be worth trying String.index + StringBuilder, instead of writing/handling everything yourself. Yes, it inevitably adds an arraycopy at the end to convert the StrinbBuilder to String, but it might have better balance between performance and code complexity. The regex is probably a little heavy for literal string replacement, but StringBuilder should not be that bad ... -Sherman On 5/26/15 4:11 PM, Ivan Gerasimov wrote: I updated the webrev: http://cr.openjdk.java.net/~igerasim/8058779/02/webrev/ In the check at 2300-2301 and 2351-2352 I replaced MAX_ARRAY_SIZE with Integer.MAX_VALUE, which seems to be more accurate here. And I want to add that this proposed implementation is not only faster, but also more memory efficient. The following simple stress-test shows that the proposed version is able to handle twice larger strings, comparing to the current implementation. public class C { public static void main(String[] args) throws Throwable { String s = "string"; for (int i = 1; i < Integer.MAX_VALUE; ++i) { try { s = s.replace("string", "stringstring"); } catch (OutOfMemoryError o) { System.out.println(i + ") " + s.length()); break; } } } } $ time ~/java9/jdk/bin/java -showversion -Xmx1g C java version "1.9.0-ea" Java(TM) SE Runtime Environment (build 1.9.0-ea-b63) Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-ea-b63, mixed mode) 25) 100663296 real0m4.525s user0m4.402s sys0m1.189s $ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C java version "1.9.0-internal" Java(TM) SE Runtime Environment (build 1.9.0-internal-igerasim_2015_05_23_19_25-b00) Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-internal-igerasim_2015_05_23_19_25-b00, mixed mode) 26) 201326592 real0m2.139s user0m1.960
Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)
On 27.05.2015 18:44, Xueming Shen wrote: You might want to use directly sb.append.(char[], off, len), instead of append(str, off, len)/append(str). Ah, yes sure! It would improve things. I updated the webrev in place: http://cr.openjdk.java.net/~igerasim/8058779/03/webrev/ However the general picture is almost the same: performance: Baseline: MyBenchmark.test thrpt 40 257'051.948 ± 4537.484 ops/s StringBuilder: MyBenchmark.test thrpt 40 808'033.808 ± 22152.166 ops/s Arrays:MyBenchmark.test thrpt 40 1'049'235.602 ± 15501.803 ops/s Memory efficiency is still less then for array-version: $ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C java version "1.9.0-internal" Java(TM) SE Runtime Environment (build 1.9.0-internal-igerasim_2015_05_27_14_47-b00) Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-internal-igerasim_2015_05_27_14_47-b00, mixed mode) 25) 100663296 real0m2.429s user0m2.112s sys0m0.792s Sincerely yours, Ivan Btw, is the target.lenght==0 the popular use case/scenario? Just wonder why do we need a special case here for it. thanks, -Sherman On 5/27/15 8:29 AM, Ivan Gerasimov wrote: For completeness, here's another webrev, which uses StringBuilder: http://cr.openjdk.java.net/~igerasim/8058779/03/webrev/ Its performance is somewhere in between the current implementation and the array-based implementation: MyBenchmark.test thrpt 40 796'059.192 ± 12455.970 ops/s Memory efficiency is less then the one of array-based implementation: $ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C java version "1.9.0-internal" Java(TM) SE Runtime Environment (build 1.9.0-internal-igerasim_2015_05_27_14_47-b00) Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-internal-igerasim_2015_05_27_14_47-b00, mixed mode) 25) 100663296 real0m2.585s user0m1.875s sys0m0.887s Sincerely yours, Ivan On 27.05.2015 2:38, Xueming Shen wrote: Ivan, It might be worth trying String.index + StringBuilder, instead of writing/handling everything yourself. Yes, it inevitably adds an arraycopy at the end to convert the StrinbBuilder to String, but it might have better balance between performance and code complexity. The regex is probably a little heavy for literal string replacement, but StringBuilder should not be that bad ... -Sherman On 5/26/15 4:11 PM, Ivan Gerasimov wrote: I updated the webrev: http://cr.openjdk.java.net/~igerasim/8058779/02/webrev/ In the check at 2300-2301 and 2351-2352 I replaced MAX_ARRAY_SIZE with Integer.MAX_VALUE, which seems to be more accurate here. And I want to add that this proposed implementation is not only faster, but also more memory efficient. The following simple stress-test shows that the proposed version is able to handle twice larger strings, comparing to the current implementation. public class C { public static void main(String[] args) throws Throwable { String s = "string"; for (int i = 1; i < Integer.MAX_VALUE; ++i) { try { s = s.replace("string", "stringstring"); } catch (OutOfMemoryError o) { System.out.println(i + ") " + s.length()); break; } } } } $ time ~/java9/jdk/bin/java -showversion -Xmx1g C java version "1.9.0-ea" Java(TM) SE Runtime Environment (build 1.9.0-ea-b63) Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-ea-b63, mixed mode) 25) 100663296 real0m4.525s user0m4.402s sys0m1.189s $ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C java version "1.9.0-internal" Java(TM) SE Runtime Environment (build 1.9.0-internal-igerasim_2015_05_23_19_25-b00) Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-internal-igerasim_2015_05_23_19_25-b00, mixed mode) 26) 201326592 real0m2.139s user0m1.960s sys0m0.461s Sincerely yours, Ivan On 24.05.2015 23:17, Ivan Gerasimov wrote: Hello everybody! I know many people here like it when the performance is getting better. It was suggested to make the literal variant of String.replace() faster. Currently, this method is implemented as a few calls to regexp API, so that the whole implementation takes only two lines of code. I've created two versions of the fix. In the first one, we scan the string and store indices of the found substrings in an array. Then, we allocate the precisely sized char array and fill it it. The case with the empty target has to be handled separately. BUGURL: https://bugs.openjdk.java.net/browse/JDK-8058779 WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/00/webrev/ The second variant is much less verbose, however it's less efficient too. Here the StringJoiner is used as an intermediate storage. WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/01/webrev/ Here are the micro-benchmark results (in a string of ~300 chars do ~15 replacements). 0) Baseline MyBenchmark.test thrpt
Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)
You might want to use directly sb.append.(char[], off, len), instead of append(str, off, len)/append(str). Btw, is the target.lenght==0 the popular use case/scenario? Just wonder why do we need a special case here for it. thanks, -Sherman On 5/27/15 8:29 AM, Ivan Gerasimov wrote: For completeness, here's another webrev, which uses StringBuilder: http://cr.openjdk.java.net/~igerasim/8058779/03/webrev/ Its performance is somewhere in between the current implementation and the array-based implementation: MyBenchmark.test thrpt 40 796'059.192 ± 12455.970 ops/s Memory efficiency is less then the one of array-based implementation: $ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C java version "1.9.0-internal" Java(TM) SE Runtime Environment (build 1.9.0-internal-igerasim_2015_05_27_14_47-b00) Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-internal-igerasim_2015_05_27_14_47-b00, mixed mode) 25) 100663296 real0m2.585s user0m1.875s sys0m0.887s Sincerely yours, Ivan On 27.05.2015 2:38, Xueming Shen wrote: Ivan, It might be worth trying String.index + StringBuilder, instead of writing/handling everything yourself. Yes, it inevitably adds an arraycopy at the end to convert the StrinbBuilder to String, but it might have better balance between performance and code complexity. The regex is probably a little heavy for literal string replacement, but StringBuilder should not be that bad ... -Sherman On 5/26/15 4:11 PM, Ivan Gerasimov wrote: I updated the webrev: http://cr.openjdk.java.net/~igerasim/8058779/02/webrev/ In the check at 2300-2301 and 2351-2352 I replaced MAX_ARRAY_SIZE with Integer.MAX_VALUE, which seems to be more accurate here. And I want to add that this proposed implementation is not only faster, but also more memory efficient. The following simple stress-test shows that the proposed version is able to handle twice larger strings, comparing to the current implementation. public class C { public static void main(String[] args) throws Throwable { String s = "string"; for (int i = 1; i < Integer.MAX_VALUE; ++i) { try { s = s.replace("string", "stringstring"); } catch (OutOfMemoryError o) { System.out.println(i + ") " + s.length()); break; } } } } $ time ~/java9/jdk/bin/java -showversion -Xmx1g C java version "1.9.0-ea" Java(TM) SE Runtime Environment (build 1.9.0-ea-b63) Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-ea-b63, mixed mode) 25) 100663296 real0m4.525s user0m4.402s sys0m1.189s $ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C java version "1.9.0-internal" Java(TM) SE Runtime Environment (build 1.9.0-internal-igerasim_2015_05_23_19_25-b00) Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-internal-igerasim_2015_05_23_19_25-b00, mixed mode) 26) 201326592 real0m2.139s user0m1.960s sys0m0.461s Sincerely yours, Ivan On 24.05.2015 23:17, Ivan Gerasimov wrote: Hello everybody! I know many people here like it when the performance is getting better. It was suggested to make the literal variant of String.replace() faster. Currently, this method is implemented as a few calls to regexp API, so that the whole implementation takes only two lines of code. I've created two versions of the fix. In the first one, we scan the string and store indices of the found substrings in an array. Then, we allocate the precisely sized char array and fill it it. The case with the empty target has to be handled separately. BUGURL: https://bugs.openjdk.java.net/browse/JDK-8058779 WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/00/webrev/ The second variant is much less verbose, however it's less efficient too. Here the StringJoiner is used as an intermediate storage. WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/01/webrev/ Here are the micro-benchmark results (in a string of ~300 chars do ~15 replacements). 0) Baseline MyBenchmark.test thrpt 40 257'051.948 ± 4537.484 ops/s 1) Heavy-duty +308% MyBenchmark.test thrpt 40 1'049'235.602 ± 15501.803 ops/s 2) StringJoiner +190% MyBenchmark.test thrpt 40 746'000.629 ± 15387.036 ops/s Personally, I like my first variant better, even though it adds almost 300 lines of code. But I'd like to hear what people think of it. Sincerely yours, Ivan
Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)
For completeness, here's another webrev, which uses StringBuilder: http://cr.openjdk.java.net/~igerasim/8058779/03/webrev/ Its performance is somewhere in between the current implementation and the array-based implementation: MyBenchmark.test thrpt 40 796'059.192 ± 12455.970 ops/s Memory efficiency is less then the one of array-based implementation: $ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C java version "1.9.0-internal" Java(TM) SE Runtime Environment (build 1.9.0-internal-igerasim_2015_05_27_14_47-b00) Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-internal-igerasim_2015_05_27_14_47-b00, mixed mode) 25) 100663296 real0m2.585s user0m1.875s sys0m0.887s Sincerely yours, Ivan On 27.05.2015 2:38, Xueming Shen wrote: Ivan, It might be worth trying String.index + StringBuilder, instead of writing/handling everything yourself. Yes, it inevitably adds an arraycopy at the end to convert the StrinbBuilder to String, but it might have better balance between performance and code complexity. The regex is probably a little heavy for literal string replacement, but StringBuilder should not be that bad ... -Sherman On 5/26/15 4:11 PM, Ivan Gerasimov wrote: I updated the webrev: http://cr.openjdk.java.net/~igerasim/8058779/02/webrev/ In the check at 2300-2301 and 2351-2352 I replaced MAX_ARRAY_SIZE with Integer.MAX_VALUE, which seems to be more accurate here. And I want to add that this proposed implementation is not only faster, but also more memory efficient. The following simple stress-test shows that the proposed version is able to handle twice larger strings, comparing to the current implementation. public class C { public static void main(String[] args) throws Throwable { String s = "string"; for (int i = 1; i < Integer.MAX_VALUE; ++i) { try { s = s.replace("string", "stringstring"); } catch (OutOfMemoryError o) { System.out.println(i + ") " + s.length()); break; } } } } $ time ~/java9/jdk/bin/java -showversion -Xmx1g C java version "1.9.0-ea" Java(TM) SE Runtime Environment (build 1.9.0-ea-b63) Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-ea-b63, mixed mode) 25) 100663296 real0m4.525s user0m4.402s sys0m1.189s $ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C java version "1.9.0-internal" Java(TM) SE Runtime Environment (build 1.9.0-internal-igerasim_2015_05_23_19_25-b00) Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-internal-igerasim_2015_05_23_19_25-b00, mixed mode) 26) 201326592 real0m2.139s user0m1.960s sys0m0.461s Sincerely yours, Ivan On 24.05.2015 23:17, Ivan Gerasimov wrote: Hello everybody! I know many people here like it when the performance is getting better. It was suggested to make the literal variant of String.replace() faster. Currently, this method is implemented as a few calls to regexp API, so that the whole implementation takes only two lines of code. I've created two versions of the fix. In the first one, we scan the string and store indices of the found substrings in an array. Then, we allocate the precisely sized char array and fill it it. The case with the empty target has to be handled separately. BUGURL: https://bugs.openjdk.java.net/browse/JDK-8058779 WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/00/webrev/ The second variant is much less verbose, however it's less efficient too. Here the StringJoiner is used as an intermediate storage. WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/01/webrev/ Here are the micro-benchmark results (in a string of ~300 chars do ~15 replacements). 0) Baseline MyBenchmark.test thrpt 40 257'051.948 ± 4537.484 ops/s 1) Heavy-duty +308% MyBenchmark.test thrpt 40 1'049'235.602 ± 15501.803 ops/s 2) StringJoiner +190% MyBenchmark.test thrpt 40 746'000.629 ± 15387.036 ops/s Personally, I like my first variant better, even though it adds almost 300 lines of code. But I'd like to hear what people think of it. Sincerely yours, Ivan
Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)
Thank you Peter for looking at this! On 27.05.2015 16:15, Peter Levart wrote: Hi Ivan, How does this implementation compare to your's two regarding speed: http://cr.openjdk.java.net/~plevart/jdk9-dev/String.replace/webrev.01/ You need to check for possible overflow here: 2287 resLen += repLen + ss.length(); It is not much shorter than your's first, but more object oriented ;-) Well, the main difference is that you store the positions of substrings in a linked-list instead of an array. But arrays are more efficient. On small data the performance is almost the same as for the 'array' version: MyBenchmark.test thrpt 40 1'046'137.437 ± 13961.121 ops/s But on huge data the performance is much worse, comparing to even current implementation. Here are the results of running the stress test, I had posted earlier: - time ~/java9/jdk-build/bin/java -showversion -Xmx1g C java version "1.9.0-internal" Java(TM) SE Runtime Environment (build 1.9.0-internal-igerasim_2015_05_27_14_47-b00) Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-internal-igerasim_2015_05_27_14_47-b00, mixed mode) 25) 100663296 real0m12.945s user0m15.670s sys0m7.189s - It took 6 times more time to complete, comparing to the current implementation, which uses regexp engine. Sincerely yours, Ivan Regards, Peter On 05/27/2015 11:36 AM, Ivan Gerasimov wrote: Hi Sherman! Please take a look at my other webrev, that I provided in the first message. WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/01/webrev/ I used StringJoiner there, which in some aspects seems to fit better here, comparing to StringBuilder. It does reduce the code, but of course runs slower. In that version every part of the source string had to be converted to a separate String, which add another redundant copying. I still prefer the version, which deals with arrays. I don't think it's too complex. It surely adds some complexity to the original code, but it's only 70 lines of code in total. Everything else are comments and empty lines. Sincerely yours, Ivan On 27.05.2015 2:38, Xueming Shen wrote: Ivan, It might be worth trying String.index + StringBuilder, instead of writing/handling everything yourself. Yes, it inevitably adds an arraycopy at the end to convert the StrinbBuilder to String, but it might have better balance between performance and code complexity. The regex is probably a little heavy for literal string replacement, but StringBuilder should not be that bad ... -Sherman On 5/26/15 4:11 PM, Ivan Gerasimov wrote: I updated the webrev: http://cr.openjdk.java.net/~igerasim/8058779/02/webrev/ In the check at 2300-2301 and 2351-2352 I replaced MAX_ARRAY_SIZE with Integer.MAX_VALUE, which seems to be more accurate here. And I want to add that this proposed implementation is not only faster, but also more memory efficient. The following simple stress-test shows that the proposed version is able to handle twice larger strings, comparing to the current implementation. public class C { public static void main(String[] args) throws Throwable { String s = "string"; for (int i = 1; i < Integer.MAX_VALUE; ++i) { try { s = s.replace("string", "stringstring"); } catch (OutOfMemoryError o) { System.out.println(i + ") " + s.length()); break; } } } } $ time ~/java9/jdk/bin/java -showversion -Xmx1g C java version "1.9.0-ea" Java(TM) SE Runtime Environment (build 1.9.0-ea-b63) Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-ea-b63, mixed mode) 25) 100663296 real0m4.525s user0m4.402s sys0m1.189s $ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C java version "1.9.0-internal" Java(TM) SE Runtime Environment (build 1.9.0-internal-igerasim_2015_05_23_19_25-b00) Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-internal-igerasim_2015_05_23_19_25-b00, mixed mode) 26) 201326592 real0m2.139s user0m1.960s sys0m0.461s Sincerely yours, Ivan On 24.05.2015 23:17, Ivan Gerasimov wrote: Hello everybody! I know many people here like it when the performance is getting better. It was suggested to make the literal variant of String.replace() faster. Currently, this method is implemented as a few calls to regexp API, so that the whole implementation takes only two lines of code. I've created two versions of the fix. In the first one, we scan the string and store indices of the found substrings in an array. Then, we allocate the precisely sized char array and fill it it. The case with the empty target has to be handled separately. BUGURL: https://bugs.openjdk.java.net/browse/JDK-8058779 WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/00/webrev/ The second variant is much less verbose, however it's less efficient too. Here the
Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)
Hi Ivan, How does this implementation compare to your's two regarding speed: http://cr.openjdk.java.net/~plevart/jdk9-dev/String.replace/webrev.01/ It is not much shorter than your's first, but more object oriented ;-) Regards, Peter On 05/27/2015 11:36 AM, Ivan Gerasimov wrote: Hi Sherman! Please take a look at my other webrev, that I provided in the first message. WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/01/webrev/ I used StringJoiner there, which in some aspects seems to fit better here, comparing to StringBuilder. It does reduce the code, but of course runs slower. In that version every part of the source string had to be converted to a separate String, which add another redundant copying. I still prefer the version, which deals with arrays. I don't think it's too complex. It surely adds some complexity to the original code, but it's only 70 lines of code in total. Everything else are comments and empty lines. Sincerely yours, Ivan On 27.05.2015 2:38, Xueming Shen wrote: Ivan, It might be worth trying String.index + StringBuilder, instead of writing/handling everything yourself. Yes, it inevitably adds an arraycopy at the end to convert the StrinbBuilder to String, but it might have better balance between performance and code complexity. The regex is probably a little heavy for literal string replacement, but StringBuilder should not be that bad ... -Sherman On 5/26/15 4:11 PM, Ivan Gerasimov wrote: I updated the webrev: http://cr.openjdk.java.net/~igerasim/8058779/02/webrev/ In the check at 2300-2301 and 2351-2352 I replaced MAX_ARRAY_SIZE with Integer.MAX_VALUE, which seems to be more accurate here. And I want to add that this proposed implementation is not only faster, but also more memory efficient. The following simple stress-test shows that the proposed version is able to handle twice larger strings, comparing to the current implementation. public class C { public static void main(String[] args) throws Throwable { String s = "string"; for (int i = 1; i < Integer.MAX_VALUE; ++i) { try { s = s.replace("string", "stringstring"); } catch (OutOfMemoryError o) { System.out.println(i + ") " + s.length()); break; } } } } $ time ~/java9/jdk/bin/java -showversion -Xmx1g C java version "1.9.0-ea" Java(TM) SE Runtime Environment (build 1.9.0-ea-b63) Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-ea-b63, mixed mode) 25) 100663296 real0m4.525s user0m4.402s sys0m1.189s $ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C java version "1.9.0-internal" Java(TM) SE Runtime Environment (build 1.9.0-internal-igerasim_2015_05_23_19_25-b00) Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-internal-igerasim_2015_05_23_19_25-b00, mixed mode) 26) 201326592 real0m2.139s user0m1.960s sys0m0.461s Sincerely yours, Ivan On 24.05.2015 23:17, Ivan Gerasimov wrote: Hello everybody! I know many people here like it when the performance is getting better. It was suggested to make the literal variant of String.replace() faster. Currently, this method is implemented as a few calls to regexp API, so that the whole implementation takes only two lines of code. I've created two versions of the fix. In the first one, we scan the string and store indices of the found substrings in an array. Then, we allocate the precisely sized char array and fill it it. The case with the empty target has to be handled separately. BUGURL: https://bugs.openjdk.java.net/browse/JDK-8058779 WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/00/webrev/ The second variant is much less verbose, however it's less efficient too. Here the StringJoiner is used as an intermediate storage. WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/01/webrev/ Here are the micro-benchmark results (in a string of ~300 chars do ~15 replacements). 0) Baseline MyBenchmark.test thrpt 40 257'051.948 ± 4537.484 ops/s 1) Heavy-duty +308% MyBenchmark.test thrpt 40 1'049'235.602 ± 15501.803 ops/s 2) StringJoiner +190% MyBenchmark.test thrpt 40 746'000.629 ± 15387.036 ops/s Personally, I like my first variant better, even though it adds almost 300 lines of code. But I'd like to hear what people think of it. Sincerely yours, Ivan
Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)
On 05/27/2015 11:36 AM, Ivan Gerasimov wrote: Hi Sherman! Please take a look at my other webrev, that I provided in the first message. WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/01/webrev/ I used StringJoiner there, which in some aspects seems to fit better here, comparing to StringBuilder. It does reduce the code, but of course runs slower. In that version every part of the source string had to be converted to a separate String, which add another redundant copying. I still prefer the version, which deals with arrays. I don't think it's too complex. It surely adds some complexity to the original code, but it's only 70 lines of code in total. Everything else are comments and empty lines. Sincerely yours, Ivan Hi Ivan, If I was to choose, I would take the 1st variant (with index arrays) too. The speed does matter! But anyway, every now and then one would like to write an algorithm that gathers sub-strings and then either joins them or dumps them to some buffer that eventually becomes another string or array. Since String.substring() changed when offset/length fields were removed, there is no official way to handle sub-sequences as "vews" of underlying String. There was some debate in the past about using String's implementation of CharSequence.subSequence() for that purpose - to return a private CharSequence implementation that would represent a view of underlying String's array, but that was not pursued because of limiting API specification that says: "Returns a character sequence that is a subsequence of this sequence. An invocation of this method of the form str.subSequence(begin, end) behaves in exactly the same way as the invocation str.substring(begin, end)" My opinion is that this part of text did matter at the time when substring() did return a view over underlying array, but when the substring method changed, there was no reason to keep the specification bound to that text. It would have been wiser to keep the behavior unchanged and change the specification. Anyway, what's done is done. I don't know if String.subSequence can be changed now. Maybe? What could be done is introduce some new API. It could be in the form of a new class or just a static method (on CharSequence perhaps?) that would return a private implementation of CharSequence as a sub-sequence view over underlying CharSequence or String's array in case the passed-in argument is a String. With such API one could gather sub-sequence view objects into a LinkedList and then dump them into the correctly pre-sized StringBuilder. There could even be a String counstructor with the following signature: public String(Iterable sequences, int estimatedLength) { ... ..which would eliminate the last copying step too. Hopefully all temporary view objects would be found as method-local and would be allocated on stack / in registers by JIT (wishful thinking)... Regards, Peter On 27.05.2015 2:38, Xueming Shen wrote: Ivan, It might be worth trying String.index + StringBuilder, instead of writing/handling everything yourself. Yes, it inevitably adds an arraycopy at the end to convert the StrinbBuilder to String, but it might have better balance between performance and code complexity. The regex is probably a little heavy for literal string replacement, but StringBuilder should not be that bad ... -Sherman On 5/26/15 4:11 PM, Ivan Gerasimov wrote: I updated the webrev: http://cr.openjdk.java.net/~igerasim/8058779/02/webrev/ In the check at 2300-2301 and 2351-2352 I replaced MAX_ARRAY_SIZE with Integer.MAX_VALUE, which seems to be more accurate here. And I want to add that this proposed implementation is not only faster, but also more memory efficient. The following simple stress-test shows that the proposed version is able to handle twice larger strings, comparing to the current implementation. public class C { public static void main(String[] args) throws Throwable { String s = "string"; for (int i = 1; i < Integer.MAX_VALUE; ++i) { try { s = s.replace("string", "stringstring"); } catch (OutOfMemoryError o) { System.out.println(i + ") " + s.length()); break; } } } } $ time ~/java9/jdk/bin/java -showversion -Xmx1g C java version "1.9.0-ea" Java(TM) SE Runtime Environment (build 1.9.0-ea-b63) Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-ea-b63, mixed mode) 25) 100663296 real0m4.525s user0m4.402s sys0m1.189s $ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C java version "1.9.0-internal" Java(TM) SE Runtime Environment (build 1.9.0-internal-igerasim_2015_05_23_19_25-b00) Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-internal-igerasim_2015_05_23_19_25-b00, mixed mode) 26) 201326592 real0m2.139s user0m1.960s sys0m0.461s S
Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)
Hi Sherman! Please take a look at my other webrev, that I provided in the first message. WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/01/webrev/ I used StringJoiner there, which in some aspects seems to fit better here, comparing to StringBuilder. It does reduce the code, but of course runs slower. In that version every part of the source string had to be converted to a separate String, which add another redundant copying. I still prefer the version, which deals with arrays. I don't think it's too complex. It surely adds some complexity to the original code, but it's only 70 lines of code in total. Everything else are comments and empty lines. Sincerely yours, Ivan On 27.05.2015 2:38, Xueming Shen wrote: Ivan, It might be worth trying String.index + StringBuilder, instead of writing/handling everything yourself. Yes, it inevitably adds an arraycopy at the end to convert the StrinbBuilder to String, but it might have better balance between performance and code complexity. The regex is probably a little heavy for literal string replacement, but StringBuilder should not be that bad ... -Sherman On 5/26/15 4:11 PM, Ivan Gerasimov wrote: I updated the webrev: http://cr.openjdk.java.net/~igerasim/8058779/02/webrev/ In the check at 2300-2301 and 2351-2352 I replaced MAX_ARRAY_SIZE with Integer.MAX_VALUE, which seems to be more accurate here. And I want to add that this proposed implementation is not only faster, but also more memory efficient. The following simple stress-test shows that the proposed version is able to handle twice larger strings, comparing to the current implementation. public class C { public static void main(String[] args) throws Throwable { String s = "string"; for (int i = 1; i < Integer.MAX_VALUE; ++i) { try { s = s.replace("string", "stringstring"); } catch (OutOfMemoryError o) { System.out.println(i + ") " + s.length()); break; } } } } $ time ~/java9/jdk/bin/java -showversion -Xmx1g C java version "1.9.0-ea" Java(TM) SE Runtime Environment (build 1.9.0-ea-b63) Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-ea-b63, mixed mode) 25) 100663296 real0m4.525s user0m4.402s sys0m1.189s $ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C java version "1.9.0-internal" Java(TM) SE Runtime Environment (build 1.9.0-internal-igerasim_2015_05_23_19_25-b00) Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-internal-igerasim_2015_05_23_19_25-b00, mixed mode) 26) 201326592 real0m2.139s user0m1.960s sys0m0.461s Sincerely yours, Ivan On 24.05.2015 23:17, Ivan Gerasimov wrote: Hello everybody! I know many people here like it when the performance is getting better. It was suggested to make the literal variant of String.replace() faster. Currently, this method is implemented as a few calls to regexp API, so that the whole implementation takes only two lines of code. I've created two versions of the fix. In the first one, we scan the string and store indices of the found substrings in an array. Then, we allocate the precisely sized char array and fill it it. The case with the empty target has to be handled separately. BUGURL: https://bugs.openjdk.java.net/browse/JDK-8058779 WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/00/webrev/ The second variant is much less verbose, however it's less efficient too. Here the StringJoiner is used as an intermediate storage. WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/01/webrev/ Here are the micro-benchmark results (in a string of ~300 chars do ~15 replacements). 0) Baseline MyBenchmark.test thrpt 40 257'051.948 ± 4537.484 ops/s 1) Heavy-duty +308% MyBenchmark.test thrpt 40 1'049'235.602 ± 15501.803 ops/s 2) StringJoiner +190% MyBenchmark.test thrpt 40 746'000.629 ± 15387.036 ops/s Personally, I like my first variant better, even though it adds almost 300 lines of code. But I'd like to hear what people think of it. Sincerely yours, Ivan
Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)
Ivan, It might be worth trying String.index + StringBuilder, instead of writing/handling everything yourself. Yes, it inevitably adds an arraycopy at the end to convert the StrinbBuilder to String, but it might have better balance between performance and code complexity. The regex is probably a little heavy for literal string replacement, but StringBuilder should not be that bad ... -Sherman On 5/26/15 4:11 PM, Ivan Gerasimov wrote: I updated the webrev: http://cr.openjdk.java.net/~igerasim/8058779/02/webrev/ In the check at 2300-2301 and 2351-2352 I replaced MAX_ARRAY_SIZE with Integer.MAX_VALUE, which seems to be more accurate here. And I want to add that this proposed implementation is not only faster, but also more memory efficient. The following simple stress-test shows that the proposed version is able to handle twice larger strings, comparing to the current implementation. public class C { public static void main(String[] args) throws Throwable { String s = "string"; for (int i = 1; i < Integer.MAX_VALUE; ++i) { try { s = s.replace("string", "stringstring"); } catch (OutOfMemoryError o) { System.out.println(i + ") " + s.length()); break; } } } } $ time ~/java9/jdk/bin/java -showversion -Xmx1g C java version "1.9.0-ea" Java(TM) SE Runtime Environment (build 1.9.0-ea-b63) Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-ea-b63, mixed mode) 25) 100663296 real0m4.525s user0m4.402s sys0m1.189s $ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C java version "1.9.0-internal" Java(TM) SE Runtime Environment (build 1.9.0-internal-igerasim_2015_05_23_19_25-b00) Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-internal-igerasim_2015_05_23_19_25-b00, mixed mode) 26) 201326592 real0m2.139s user0m1.960s sys0m0.461s Sincerely yours, Ivan On 24.05.2015 23:17, Ivan Gerasimov wrote: Hello everybody! I know many people here like it when the performance is getting better. It was suggested to make the literal variant of String.replace() faster. Currently, this method is implemented as a few calls to regexp API, so that the whole implementation takes only two lines of code. I've created two versions of the fix. In the first one, we scan the string and store indices of the found substrings in an array. Then, we allocate the precisely sized char array and fill it it. The case with the empty target has to be handled separately. BUGURL: https://bugs.openjdk.java.net/browse/JDK-8058779 WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/00/webrev/ The second variant is much less verbose, however it's less efficient too. Here the StringJoiner is used as an intermediate storage. WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/01/webrev/ Here are the micro-benchmark results (in a string of ~300 chars do ~15 replacements). 0) Baseline MyBenchmark.test thrpt 40 257'051.948 ± 4537.484 ops/s 1) Heavy-duty +308% MyBenchmark.test thrpt 40 1'049'235.602 ± 15501.803 ops/s 2) StringJoiner +190% MyBenchmark.test thrpt 40 746'000.629 ± 15387.036 ops/s Personally, I like my first variant better, even though it adds almost 300 lines of code. But I'd like to hear what people think of it. Sincerely yours, Ivan
Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)
I updated the webrev: http://cr.openjdk.java.net/~igerasim/8058779/02/webrev/ In the check at 2300-2301 and 2351-2352 I replaced MAX_ARRAY_SIZE with Integer.MAX_VALUE, which seems to be more accurate here. And I want to add that this proposed implementation is not only faster, but also more memory efficient. The following simple stress-test shows that the proposed version is able to handle twice larger strings, comparing to the current implementation. public class C { public static void main(String[] args) throws Throwable { String s = "string"; for (int i = 1; i < Integer.MAX_VALUE; ++i) { try { s = s.replace("string", "stringstring"); } catch (OutOfMemoryError o) { System.out.println(i + ") " + s.length()); break; } } } } $ time ~/java9/jdk/bin/java -showversion -Xmx1g C java version "1.9.0-ea" Java(TM) SE Runtime Environment (build 1.9.0-ea-b63) Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-ea-b63, mixed mode) 25) 100663296 real0m4.525s user0m4.402s sys0m1.189s $ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C java version "1.9.0-internal" Java(TM) SE Runtime Environment (build 1.9.0-internal-igerasim_2015_05_23_19_25-b00) Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-internal-igerasim_2015_05_23_19_25-b00, mixed mode) 26) 201326592 real0m2.139s user0m1.960s sys0m0.461s Sincerely yours, Ivan On 24.05.2015 23:17, Ivan Gerasimov wrote: Hello everybody! I know many people here like it when the performance is getting better. It was suggested to make the literal variant of String.replace() faster. Currently, this method is implemented as a few calls to regexp API, so that the whole implementation takes only two lines of code. I've created two versions of the fix. In the first one, we scan the string and store indices of the found substrings in an array. Then, we allocate the precisely sized char array and fill it it. The case with the empty target has to be handled separately. BUGURL: https://bugs.openjdk.java.net/browse/JDK-8058779 WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/00/webrev/ The second variant is much less verbose, however it's less efficient too. Here the StringJoiner is used as an intermediate storage. WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/01/webrev/ Here are the micro-benchmark results (in a string of ~300 chars do ~15 replacements). 0) Baseline MyBenchmark.test thrpt 40 257'051.948 ± 4537.484 ops/s 1) Heavy-duty +308% MyBenchmark.test thrpt 40 1'049'235.602 ± 15501.803 ops/s 2) StringJoiner +190% MyBenchmark.test thrpt 40 746'000.629 ± 15387.036 ops/s Personally, I like my first variant better, even though it adds almost 300 lines of code. But I'd like to hear what people think of it. Sincerely yours, Ivan
Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)
I know it was like that before :) I asked similar question about fixing String.contains to not call toString on it's CharSequence argument ( http://mail.openjdk.java.net/pipermail/core-libs-dev/2015-March/032394.html). Responses were generally encouraging. Regards, Tomasz On Tue, May 26, 2015 at 7:39 PM, Ivan Gerasimov wrote: > But the current implementation converts the argument to String in the > first place too: > > return Pattern.compile(*target.toString()*, > Pattern.LITERAL).matcher( > this).replaceAll(Matcher.quoteReplacement( > *replacement.toString()*)); > > So I didn't introduce this conversion :-) > > When browsing the source code, looking for usages of this method, I only > saw Strings passed as the parameters. > Thus, I guess it is reasonable to have the variant which works best in > this case. > > Sincerely yours, > Ivan > > > On 26.05.2015 20:26, Tomasz Kowalczewski wrote: > > I second that. API accepts CharSequence for a reason. The benchmarks might > look a little different if you pass StringBuilder in a call to replace. It > looks like removing toString() call on CharSequences will be easy with one > drawback of loosing the benefit of System.arraycopy call. Note that > String.indexOf requires String argument but that can also be fixed. > > -- > Tomasz > > On Tue, May 26, 2015 at 7:17 PM, Xueming Shen > > wrote: > > > On 5/26/15 10:08 AM, Ivan Gerasimov wrote: > > > Thank you Roger for looking at this! > > On 26.05.2015 19:40, Roger Riggs wrote: > > > Hi Ivan, > > Did you consider doing the optimization inside of > Pattern.compile/replaceAll. > That would have a broader impact and benefit. > > > > In Pattern.compile the flag LITERAL can be combined with several other > flags, which makes things more complex. > However, in String.replace we've got a special case, which can be > optimized in a straight-forward (though a little bit verbose) way. > > > It probably makes big difference to access those characters via > CharSequence.charAt() > and the direct internal char[] access. Though avoiding allocating those > objects created > by Pattern/Matcher/SB definitely helps. > > -Sherman > > > > > Sincerely yours, > Ivan > > Roger > > > On 5/26/2015 12:36 PM, Ivan Gerasimov wrote: > > > Thank you Mark for looking at this! > > On 26.05.2015 18:39, mark.reinh...@oracle.com wrote: > > > Your micro-benchmark improvements are significant, but do you have > evidence to suggest that the performance of this method is actually > critical to real applications? > > In other words, is the added code complexity really worth it? > > > The enhancement request contains a few links to the discussions of this > method's performance at open forums. > The most frequent suggestion is to use alternatives from 3rd party > libraries. > > That should prove the benefits of this fix -- by improving performance > we can keep some users from moving away from JDK :) > > grep shows that langtools would also benefit from making replace() > faster. > > Sincerely yours, > Ivan > > - Mark > > > > > -- Tomasz Kowalczewski
Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)
But the current implementation converts the argument to String in the first place too: return Pattern.compile(*target.toString()*, Pattern.LITERAL).matcher( this).replaceAll(Matcher.quoteReplacement(*replacement.toString()*)); So I didn't introduce this conversion :-) When browsing the source code, looking for usages of this method, I only saw Strings passed as the parameters. Thus, I guess it is reasonable to have the variant which works best in this case. Sincerely yours, Ivan On 26.05.2015 20:26, Tomasz Kowalczewski wrote: I second that. API accepts CharSequence for a reason. The benchmarks might look a little different if you pass StringBuilder in a call to replace. It looks like removing toString() call on CharSequences will be easy with one drawback of loosing the benefit of System.arraycopy call. Note that String.indexOf requires String argument but that can also be fixed. -- Tomasz On Tue, May 26, 2015 at 7:17 PM, Xueming Shen wrote: On 5/26/15 10:08 AM, Ivan Gerasimov wrote: Thank you Roger for looking at this! On 26.05.2015 19:40, Roger Riggs wrote: Hi Ivan, Did you consider doing the optimization inside of Pattern.compile/replaceAll. That would have a broader impact and benefit. In Pattern.compile the flag LITERAL can be combined with several other flags, which makes things more complex. However, in String.replace we've got a special case, which can be optimized in a straight-forward (though a little bit verbose) way. It probably makes big difference to access those characters via CharSequence.charAt() and the direct internal char[] access. Though avoiding allocating those objects created by Pattern/Matcher/SB definitely helps. -Sherman Sincerely yours, Ivan Roger On 5/26/2015 12:36 PM, Ivan Gerasimov wrote: Thank you Mark for looking at this! On 26.05.2015 18:39, mark.reinh...@oracle.com wrote: Your micro-benchmark improvements are significant, but do you have evidence to suggest that the performance of this method is actually critical to real applications? In other words, is the added code complexity really worth it? The enhancement request contains a few links to the discussions of this method's performance at open forums. The most frequent suggestion is to use alternatives from 3rd party libraries. That should prove the benefits of this fix -- by improving performance we can keep some users from moving away from JDK :) grep shows that langtools would also benefit from making replace() faster. Sincerely yours, Ivan - Mark
Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)
I second that. API accepts CharSequence for a reason. The benchmarks might look a little different if you pass StringBuilder in a call to replace. It looks like removing toString() call on CharSequences will be easy with one drawback of loosing the benefit of System.arraycopy call. Note that String.indexOf requires String argument but that can also be fixed. -- Tomasz On Tue, May 26, 2015 at 7:17 PM, Xueming Shen wrote: > On 5/26/15 10:08 AM, Ivan Gerasimov wrote: > >> Thank you Roger for looking at this! >> >> On 26.05.2015 19:40, Roger Riggs wrote: >> >>> Hi Ivan, >>> >>> Did you consider doing the optimization inside of >>> Pattern.compile/replaceAll. >>> That would have a broader impact and benefit. >>> >>> >> In Pattern.compile the flag LITERAL can be combined with several other >> flags, which makes things more complex. >> However, in String.replace we've got a special case, which can be >> optimized in a straight-forward (though a little bit verbose) way. >> > > It probably makes big difference to access those characters via > CharSequence.charAt() > and the direct internal char[] access. Though avoiding allocating those > objects created > by Pattern/Matcher/SB definitely helps. > > -Sherman > > > >> Sincerely yours, >> Ivan >> >> Roger >>> >>> >>> On 5/26/2015 12:36 PM, Ivan Gerasimov wrote: >>> Thank you Mark for looking at this! On 26.05.2015 18:39, mark.reinh...@oracle.com wrote: > Your micro-benchmark improvements are significant, but do you have > evidence to suggest that the performance of this method is actually > critical to real applications? > > In other words, is the added code complexity really worth it? > The enhancement request contains a few links to the discussions of this method's performance at open forums. The most frequent suggestion is to use alternatives from 3rd party libraries. That should prove the benefits of this fix -- by improving performance we can keep some users from moving away from JDK :) grep shows that langtools would also benefit from making replace() faster. Sincerely yours, Ivan - Mark > > > >>> >>> >>> >> > -- Tomasz Kowalczewski
Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)
On 5/26/15 10:08 AM, Ivan Gerasimov wrote: Thank you Roger for looking at this! On 26.05.2015 19:40, Roger Riggs wrote: Hi Ivan, Did you consider doing the optimization inside of Pattern.compile/replaceAll. That would have a broader impact and benefit. In Pattern.compile the flag LITERAL can be combined with several other flags, which makes things more complex. However, in String.replace we've got a special case, which can be optimized in a straight-forward (though a little bit verbose) way. It probably makes big difference to access those characters via CharSequence.charAt() and the direct internal char[] access. Though avoiding allocating those objects created by Pattern/Matcher/SB definitely helps. -Sherman Sincerely yours, Ivan Roger On 5/26/2015 12:36 PM, Ivan Gerasimov wrote: Thank you Mark for looking at this! On 26.05.2015 18:39, mark.reinh...@oracle.com wrote: Your micro-benchmark improvements are significant, but do you have evidence to suggest that the performance of this method is actually critical to real applications? In other words, is the added code complexity really worth it? The enhancement request contains a few links to the discussions of this method's performance at open forums. The most frequent suggestion is to use alternatives from 3rd party libraries. That should prove the benefits of this fix -- by improving performance we can keep some users from moving away from JDK :) grep shows that langtools would also benefit from making replace() faster. Sincerely yours, Ivan - Mark
Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)
Thank you Roger for looking at this! On 26.05.2015 19:40, Roger Riggs wrote: Hi Ivan, Did you consider doing the optimization inside of Pattern.compile/replaceAll. That would have a broader impact and benefit. In Pattern.compile the flag LITERAL can be combined with several other flags, which makes things more complex. However, in String.replace we've got a special case, which can be optimized in a straight-forward (though a little bit verbose) way. Sincerely yours, Ivan Roger On 5/26/2015 12:36 PM, Ivan Gerasimov wrote: Thank you Mark for looking at this! On 26.05.2015 18:39, mark.reinh...@oracle.com wrote: Your micro-benchmark improvements are significant, but do you have evidence to suggest that the performance of this method is actually critical to real applications? In other words, is the added code complexity really worth it? The enhancement request contains a few links to the discussions of this method's performance at open forums. The most frequent suggestion is to use alternatives from 3rd party libraries. That should prove the benefits of this fix -- by improving performance we can keep some users from moving away from JDK :) grep shows that langtools would also benefit from making replace() faster. Sincerely yours, Ivan - Mark
Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)
Hi Ivan, Did you consider doing the optimization inside of Pattern.compile/replaceAll. That would have a broader impact and benefit. Roger On 5/26/2015 12:36 PM, Ivan Gerasimov wrote: Thank you Mark for looking at this! On 26.05.2015 18:39, mark.reinh...@oracle.com wrote: Your micro-benchmark improvements are significant, but do you have evidence to suggest that the performance of this method is actually critical to real applications? In other words, is the added code complexity really worth it? The enhancement request contains a few links to the discussions of this method's performance at open forums. The most frequent suggestion is to use alternatives from 3rd party libraries. That should prove the benefits of this fix -- by improving performance we can keep some users from moving away from JDK :) grep shows that langtools would also benefit from making replace() faster. Sincerely yours, Ivan - Mark
Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)
Thank you Mark for looking at this! On 26.05.2015 18:39, mark.reinh...@oracle.com wrote: Your micro-benchmark improvements are significant, but do you have evidence to suggest that the performance of this method is actually critical to real applications? In other words, is the added code complexity really worth it? The enhancement request contains a few links to the discussions of this method's performance at open forums. The most frequent suggestion is to use alternatives from 3rd party libraries. That should prove the benefits of this fix -- by improving performance we can keep some users from moving away from JDK :) grep shows that langtools would also benefit from making replace() faster. Sincerely yours, Ivan - Mark
Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)
Your micro-benchmark improvements are significant, but do you have evidence to suggest that the performance of this method is actually critical to real applications? In other words, is the added code complexity really worth it? - Mark
Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)
Hi Rémi! On 25.05.2015 0:23, Remi Forax wrote: Hi Ivan, I will let people to comment on your first version but your second version uses a Stream and some lambdas which is in my opinion not a good idea in term of dependencies (Stream will be loaded too soon by example) and it may create some trouble in the future because lambdas depends on some classes of java.lang.invoke that are initialized late in the boot process while java.lang.String is initialized early. Good point. It could be rewritten using anonymous class instead. Though, I still like the first version better :-) Sincerely yours, Ivan cheers, Rémi On 05/24/2015 10:17 PM, Ivan Gerasimov wrote: Hello everybody! I know many people here like it when the performance is getting better. It was suggested to make the literal variant of String.replace() faster. Currently, this method is implemented as a few calls to regexp API, so that the whole implementation takes only two lines of code. I've created two versions of the fix. In the first one, we scan the string and store indices of the found substrings in an array. Then, we allocate the precisely sized char array and fill it it. The case with the empty target has to be handled separately. BUGURL: https://bugs.openjdk.java.net/browse/JDK-8058779 WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/00/webrev/ The second variant is much less verbose, however it's less efficient too. Here the StringJoiner is used as an intermediate storage. WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/01/webrev/ Here are the micro-benchmark results (in a string of ~300 chars do ~15 replacements). 0) Baseline MyBenchmark.test thrpt 40 257'051.948 ± 4537.484 ops/s 1) Heavy-duty +308% MyBenchmark.test thrpt 40 1'049'235.602 ± 15501.803 ops/s 2) StringJoiner +190% MyBenchmark.test thrpt 40 746'000.629 ± 15387.036 ops/s Personally, I like my first variant better, even though it adds almost 300 lines of code. But I'd like to hear what people think of it. Sincerely yours, Ivan
Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)
Hi Ivan, I will let people to comment on your first version but your second version uses a Stream and some lambdas which is in my opinion not a good idea in term of dependencies (Stream will be loaded too soon by example) and it may create some trouble in the future because lambdas depends on some classes of java.lang.invoke that are initialized late in the boot process while java.lang.String is initialized early. cheers, Rémi On 05/24/2015 10:17 PM, Ivan Gerasimov wrote: Hello everybody! I know many people here like it when the performance is getting better. It was suggested to make the literal variant of String.replace() faster. Currently, this method is implemented as a few calls to regexp API, so that the whole implementation takes only two lines of code. I've created two versions of the fix. In the first one, we scan the string and store indices of the found substrings in an array. Then, we allocate the precisely sized char array and fill it it. The case with the empty target has to be handled separately. BUGURL: https://bugs.openjdk.java.net/browse/JDK-8058779 WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/00/webrev/ The second variant is much less verbose, however it's less efficient too. Here the StringJoiner is used as an intermediate storage. WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/01/webrev/ Here are the micro-benchmark results (in a string of ~300 chars do ~15 replacements). 0) Baseline MyBenchmark.test thrpt 40 257'051.948 ± 4537.484 ops/s 1) Heavy-duty +308% MyBenchmark.test thrpt 40 1'049'235.602 ± 15501.803 ops/s 2) StringJoiner +190% MyBenchmark.test thrpt 40 746'000.629 ± 15387.036 ops/s Personally, I like my first variant better, even though it adds almost 300 lines of code. But I'd like to hear what people think of it. Sincerely yours, Ivan