Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)

2015-06-02 Thread Ivan Gerasimov



On 02.06.2015 12:15, Paul Sandoz wrote:

On Jun 2, 2015, at 10:56 AM, Ivan Gerasimov  wrote:

You could simplify the data provider sourceTargetReplacementWithNull, there is 
no point testing with a null source. String.replace accepts two arguments, each 
can be either null or non-null. Thats 2 bits of state, so one can simply be 
explicit and presumably it should not matter what the non-null argument value 
(there are enough non-null values supplied by the other data providers):

   {null, null}
   {null, "foo"}
   {"foo", null}

I agree there is not much value in testing the case source == null, so it can 
be removed.
However, I would leave other test cases.
Earlier in this thread Rémi spotted a bug, which had been observed in a call "bar".replace("foo", 
null), but not in "foo".replace("foo", null).


Ok, presumably for the same identity rather than value?


For any valid substring, so "foobar".replace("bar", null) wouldn't have 
thrown OOM, even though it should have.
Surely, the current testcases often duplicate each other, but it seems 
to be easier to process them all then to filter the duplicates out.


Sincerely yours,
Ivan


Paul.








Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)

2015-06-02 Thread Paul Sandoz
On Jun 2, 2015, at 10:56 AM, Ivan Gerasimov  wrote:
>> 
>> You could simplify the data provider sourceTargetReplacementWithNull, there 
>> is no point testing with a null source. String.replace accepts two 
>> arguments, each can be either null or non-null. Thats 2 bits of state, so 
>> one can simply be explicit and presumably it should not matter what the 
>> non-null argument value (there are enough non-null values supplied by the 
>> other data providers):
>> 
>>   {null, null}
>>   {null, "foo"}
>>   {"foo", null}
> I agree there is not much value in testing the case source == null, so it can 
> be removed.
> However, I would leave other test cases.
> Earlier in this thread Rémi spotted a bug, which had been observed in a call 
> "bar".replace("foo", null), but not in "foo".replace("foo", null).
> 

Ok, presumably for the same identity rather than value?

Paul.




Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)

2015-06-02 Thread Ivan Gerasimov

Thanks!

On 02.06.2015 11:54, Remi Forax wrote:

Looks Ok to me,
thumb up :)

Rémi

On 06/01/2015 08:53 PM, Ivan Gerasimov wrote:



On 01.06.2015 11:33, Paul Sandoz wrote:
On May 31, 2015, at 6:03 PM, Ivan Gerasimov 
 wrote:

Which is right here:
http://cr.openjdk.java.net/~igerasim/8058779/05/webrev/


Much better.

For the test can you use RandomFactory recently added to the test 
library?


Sure.
Here the updated webrev with this change and a few other minor changes.
http://cr.openjdk.java.net/~igerasim/8058779/06/webrev/

The changes are:
- move declaration of i below,
- indent .append(),
- use RandomFactory in the test,
- extend number of test cases with null input.

Do you think it's ready to go?

Sincerely yours,
Ivan


Paul.













Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)

2015-06-02 Thread Ivan Gerasimov



On 02.06.2015 11:20, Paul Sandoz wrote:

On Jun 1, 2015, at 8:53 PM, Ivan Gerasimov  wrote:



On 01.06.2015 11:33, Paul Sandoz wrote:

On May 31, 2015, at 6:03 PM, Ivan Gerasimov  wrote:

Which is right here:
http://cr.openjdk.java.net/~igerasim/8058779/05/webrev/


Much better.

For the test can you use RandomFactory recently added to the test library?

Sure.
Here the updated webrev with this change and a few other minor changes.
http://cr.openjdk.java.net/~igerasim/8058779/06/webrev/

The changes are:
- move declaration of i below,
- indent .append(),
- use RandomFactory in the test,
- extend number of test cases with null input.

Do you think it's ready to go?


Almost :-)


Like Sherman i think you should get rid of JavaLangAccess.newStringUnsafe.


   78 @Test(dataProvider="sourceTargetReplacementWithNull")

You can use the "expectedExceptions" attribute instead of an explicit 
try/fail/catch.

   79 public void testNPE(String source, String target, String replacement) 
{
   80 try {
   81 source.replace(target, replacement);
   82 fail("Expected to throw NPE: " + source + ".replace(" + 
target +
   83 "," + replacement + ")");
   84 } catch (NullPointerException npe) {
   85 }
   86 }


Alright, I will use expectedExceptions here.


You could simplify the data provider sourceTargetReplacementWithNull, there is 
no point testing with a null source. String.replace accepts two arguments, each 
can be either null or non-null. Thats 2 bits of state, so one can simply be 
explicit and presumably it should not matter what the non-null argument value 
(there are enough non-null values supplied by the other data providers):

   {null, null}
   {null, "foo"}
   {"foo", null}
I agree there is not much value in testing the case source == null, so 
it can be removed.

However, I would leave other test cases.
Earlier in this thread Rémi spotted a bug, which had been observed in a 
call "bar".replace("foo", null), but not in "foo".replace("foo", null).


That was my motivation to extend the number of test cases for null inputs.

Sincerely yours,
Ivan


No need for another review for any of these.

Paul.






Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)

2015-06-02 Thread Remi Forax

Looks Ok to me,
thumb up :)

Rémi

On 06/01/2015 08:53 PM, Ivan Gerasimov wrote:



On 01.06.2015 11:33, Paul Sandoz wrote:
On May 31, 2015, at 6:03 PM, Ivan Gerasimov 
 wrote:

Which is right here:
http://cr.openjdk.java.net/~igerasim/8058779/05/webrev/


Much better.

For the test can you use RandomFactory recently added to the test 
library?


Sure.
Here the updated webrev with this change and a few other minor changes.
http://cr.openjdk.java.net/~igerasim/8058779/06/webrev/

The changes are:
- move declaration of i below,
- indent .append(),
- use RandomFactory in the test,
- extend number of test cases with null input.

Do you think it's ready to go?

Sincerely yours,
Ivan


Paul.









Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)

2015-06-02 Thread Ivan Gerasimov



On 01.06.2015 22:10, Xueming Shen wrote:

Ivan,

The code looks fine for me.

Just wonder what's the motivation of using the newStringUnsafe() in 
the test case. Simply
to save the char[] copy to speed up the test? I don't think we really 
care about it here,

right?


Yes, right. I'll replace it with new String(char[]).

Sincerely yours,
Ivan


-Sherman

On 06/01/2015 11:53 AM, Ivan Gerasimov wrote:



On 01.06.2015 11:33, Paul Sandoz wrote:
On May 31, 2015, at 6:03 PM, Ivan Gerasimov 
 wrote:

Which is right here:
http://cr.openjdk.java.net/~igerasim/8058779/05/webrev/


Much better.

For the test can you use RandomFactory recently added to the test 
library?


Sure.
Here the updated webrev with this change and a few other minor changes.
http://cr.openjdk.java.net/~igerasim/8058779/06/webrev/

The changes are:
- move declaration of i below,
- indent .append(),
- use RandomFactory in the test,
- extend number of test cases with null input.

Do you think it's ready to go?

Sincerely yours,
Ivan


Paul.













Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)

2015-06-02 Thread Paul Sandoz

On Jun 1, 2015, at 8:53 PM, Ivan Gerasimov  wrote:

> 
> 
> On 01.06.2015 11:33, Paul Sandoz wrote:
>> On May 31, 2015, at 6:03 PM, Ivan Gerasimov  
>> wrote:
>>> Which is right here:
>>> http://cr.openjdk.java.net/~igerasim/8058779/05/webrev/
>>> 
>> Much better.
>> 
>> For the test can you use RandomFactory recently added to the test library?
> 
> Sure.
> Here the updated webrev with this change and a few other minor changes.
> http://cr.openjdk.java.net/~igerasim/8058779/06/webrev/
> 
> The changes are:
> - move declaration of i below,
> - indent .append(),
> - use RandomFactory in the test,
> - extend number of test cases with null input.
> 
> Do you think it's ready to go?
> 

Almost :-)


Like Sherman i think you should get rid of JavaLangAccess.newStringUnsafe. 


  78 @Test(dataProvider="sourceTargetReplacementWithNull")

You can use the "expectedExceptions" attribute instead of an explicit 
try/fail/catch.

  79 public void testNPE(String source, String target, String replacement) {
  80 try {
  81 source.replace(target, replacement);
  82 fail("Expected to throw NPE: " + source + ".replace(" + target 
+
  83 "," + replacement + ")");
  84 } catch (NullPointerException npe) {
  85 }
  86 }


You could simplify the data provider sourceTargetReplacementWithNull, there is 
no point testing with a null source. String.replace accepts two arguments, each 
can be either null or non-null. Thats 2 bits of state, so one can simply be 
explicit and presumably it should not matter what the non-null argument value 
(there are enough non-null values supplied by the other data providers):

  {null, null}
  {null, "foo"}
  {"foo", null}

No need for another review for any of these.

Paul.


Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)

2015-06-01 Thread Xueming Shen

Ivan,

The code looks fine for me.

Just wonder what's the motivation of using the newStringUnsafe() in the test 
case. Simply
to save the char[] copy to speed up the test? I don't think we really care 
about it here,
right?

-Sherman

On 06/01/2015 11:53 AM, Ivan Gerasimov wrote:



On 01.06.2015 11:33, Paul Sandoz wrote:

On May 31, 2015, at 6:03 PM, Ivan Gerasimov  wrote:

Which is right here:
http://cr.openjdk.java.net/~igerasim/8058779/05/webrev/


Much better.

For the test can you use RandomFactory recently added to the test library?


Sure.
Here the updated webrev with this change and a few other minor changes.
http://cr.openjdk.java.net/~igerasim/8058779/06/webrev/

The changes are:
- move declaration of i below,
- indent .append(),
- use RandomFactory in the test,
- extend number of test cases with null input.

Do you think it's ready to go?

Sincerely yours,
Ivan


Paul.









Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)

2015-06-01 Thread Ivan Gerasimov



On 01.06.2015 11:33, Paul Sandoz wrote:

On May 31, 2015, at 6:03 PM, Ivan Gerasimov  wrote:

Which is right here:
http://cr.openjdk.java.net/~igerasim/8058779/05/webrev/


Much better.

For the test can you use RandomFactory recently added to the test library?


Sure.
Here the updated webrev with this change and a few other minor changes.
http://cr.openjdk.java.net/~igerasim/8058779/06/webrev/

The changes are:
- move declaration of i below,
- indent .append(),
- use RandomFactory in the test,
- extend number of test cases with null input.

Do you think it's ready to go?

Sincerely yours,
Ivan


Paul.







Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)

2015-06-01 Thread Peter Levart



On 06/01/2015 10:32 AM, Ulf Zibis wrote:

Hi,

Am 31.05.2015 um 18:03 schrieb Ivan Gerasimov:


On 31.05.2015 18:54, Ivan Gerasimov wrote:


I fixed the code and added the case to the regression test in the 
new webrev.



Which is right here:
http://cr.openjdk.java.net/~igerasim/8058779/05/webrev/


Shoudn't the user be informed via javadoc about the risk of an 
OutOfMemoryError, just from illegal arguments of this method?

Example:
this.value.length() = 100
target.value.length() = 200
replacement.value.length() = 50
results in:


...no replacements made, since target.value.length() > 
this.value.length() and 1st indexOf returns -1...


if 1st indexOf returns >= 0, then this.value.length() >= 
target.value.length() and the only way to get negative newLenHint is via 
an overflow indicating that a String with length() > Integer.MAX_VALUE 
would be required to hold the result, which is impossible. We don't have 
a special StringTooBigException. OOME is the best approximation here. 
IllegalArgumentException is another option, but seems to be to general 
in this case. What happens when regexp based replace is fed with such 
huge strings?


I had to set the max. heap size to 14Gbytes to get the answer that was 
not an OOME caused by not enough heap space:


char[] chars = new char[1 + (1 << 30)];
chars[0] = 'a';
String big = new String(chars);
big.replace("a", big);

Exception in thread "main" java.lang.OutOfMemoryError
at 
java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137)
at 
java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
at 
java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:517)

at java.lang.StringBuilder.append(StringBuilder.java:175)
at java.util.regex.Matcher.appendTail(Matcher.java:1122)
at java.util.regex.Matcher.replaceAll(Matcher.java:1169)
at TestReplace.replace(TestReplace.java:11)
at TestReplace.main(TestReplace.java:18)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:502)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)


Regards, Peter


newLenHint = -500 --> OutOfMemoryError
In other words, is this a good reason to throw such an Error?

Little nit:
Indentation for line continuation should be 8 spaces.

-Ulf





Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)

2015-06-01 Thread Rémi Forax
Hi Ivan,


Le 31 mai 2015 17:54:35 CEST, Ivan Gerasimov  a 
écrit :
>Hi Remi!
>
>On 31.05.2015 11:43, Remi Forax wrote:
>> I agree with the others the code is more readable.
>>
>> There is a slightly difference between the current behavior and the 
>> behavior of the proposed code,
>>   "foo".replace("bar", null)
>> should throw a NPE because replacement is null but the proposed code 
>> will return "foo"
>> because replacement.toString() is called after the short-circuit test
>
>> (j < 0).
>>
>Yes, you're right, thanks for catching it!
>And the regression test should have caught that, but I only had 
>"a".replace("a", null) there, which passed.
>
>I fixed the code and added the case to the regression test in the new 
>webrev.
>
>> so the first part of the code should be:
>>   String starget = target.toString();  // implict nullcheck
>>   String srepl = replacement.toString();  // implicit nullcheck
>>   int j = indexOf(starget);
>>   if (j < 0) {
>> return this;
>>   }
>>   ...
>>
>> also 'i' can be initialized just before the 'do', there is no point
>to 
>> initialize it before.
>>
>> To finish, the two 'final' are useless here but i suppose it's a 
>> matter of style :)
>>
>I moved declaration of i just to save a line.  I don't think it 
>decreased performance.

I don't think so too.
It decrease readability, when you read the code of the loop you have to scroll 
up to find the initialization of 'i' like in plain old good C.

>
>Declaring the 'value' variable final was suggested by Martin, and I 
>think it is reasonable (see 
>http://mail.openjdk.java.net/pipermail/core-libs-dev/2015-March/032601.html).
>The other variable is final just for symmetry :)

Either i read the code in an IDE and local variables and fields do not have the 
same color or i read diff and in that case fields are usually not visible. 
For me, final on local variable is just noise. But as i said it's a matter of 
taste.

>
>Sincerely yours,
>Ivan

cheers,
Rémi 

>
>
>> cheers,
>> Rémi
>>
>>
>> On 05/31/2015 04:19 AM, Ivan Gerasimov wrote:
>>> Hi everyone!
>>>
>>> Here's another webrev, in which replace() is implemented with 
>>> StringBuilder.
>>> On my benchmark it is almost as fast as the version backed with 
>>> arrays, but this variant is much shorter.
>>>
>>> Credits to Sherman for combining the general algorithm with the case
>
>>> of empty target.
>>>
>>> Comments, further suggestions are welcome!
>>>
>>> BUGURL: https://bugs.openjdk.java.net/browse/JDK-8058779
>>> WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/04/webrev/
>>>
>>> Sincerely yours,
>>> Ivan
>>>
>>
>>
>>



Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)

2015-06-01 Thread Paul Sandoz
On May 31, 2015, at 6:03 PM, Ivan Gerasimov  wrote:
>> 
> Which is right here:
> http://cr.openjdk.java.net/~igerasim/8058779/05/webrev/
> 

Much better.

For the test can you use RandomFactory recently added to the test library?

Paul.



Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)

2015-06-01 Thread Ulf Zibis

Hi,

Am 31.05.2015 um 18:03 schrieb Ivan Gerasimov:


On 31.05.2015 18:54, Ivan Gerasimov wrote:


I fixed the code and added the case to the regression test in the new webrev.


Which is right here:
http://cr.openjdk.java.net/~igerasim/8058779/05/webrev/


Shoudn't the user be informed via javadoc about the risk of an OutOfMemoryError, just from illegal 
arguments of this method?

Example:
this.value.length() = 100
target.value.length() = 200
replacement.value.length() = 50
results in:
newLenHint = -500 --> OutOfMemoryError
In other words, is this a good reason to throw such an Error?

Little nit:
Indentation for line continuation should be 8 spaces.

-Ulf



Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)

2015-05-31 Thread Ivan Gerasimov



On 31.05.2015 18:54, Ivan Gerasimov wrote:

Hi Remi!

On 31.05.2015 11:43, Remi Forax wrote:

I agree with the others the code is more readable.

There is a slightly difference between the current behavior and the 
behavior of the proposed code,

  "foo".replace("bar", null)
should throw a NPE because replacement is null but the proposed code 
will return "foo"
because replacement.toString() is called after the short-circuit test 
(j < 0).



Yes, you're right, thanks for catching it!
And the regression test should have caught that, but I only had 
"a".replace("a", null) there, which passed.


I fixed the code and added the case to the regression test in the new 
webrev.



Which is right here:
http://cr.openjdk.java.net/~igerasim/8058779/05/webrev/


so the first part of the code should be:
  String starget = target.toString();  // implict nullcheck
  String srepl = replacement.toString();  // implicit nullcheck
  int j = indexOf(starget);
  if (j < 0) {
return this;
  }
  ...

also 'i' can be initialized just before the 'do', there is no point 
to initialize it before.


To finish, the two 'final' are useless here but i suppose it's a 
matter of style :)


I moved declaration of i just to save a line.  I don't think it 
decreased performance.


Declaring the 'value' variable final was suggested by Martin, and I 
think it is reasonable (see 
http://mail.openjdk.java.net/pipermail/core-libs-dev/2015-March/032601.html).

The other variable is final just for symmetry :)

Sincerely yours,
Ivan



cheers,
Rémi


On 05/31/2015 04:19 AM, Ivan Gerasimov wrote:

Hi everyone!

Here's another webrev, in which replace() is implemented with 
StringBuilder.
On my benchmark it is almost as fast as the version backed with 
arrays, but this variant is much shorter.


Credits to Sherman for combining the general algorithm with the case 
of empty target.


Comments, further suggestions are welcome!

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8058779
WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/04/webrev/

Sincerely yours,
Ivan













Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)

2015-05-31 Thread Ivan Gerasimov

Hi Remi!

On 31.05.2015 11:43, Remi Forax wrote:

I agree with the others the code is more readable.

There is a slightly difference between the current behavior and the 
behavior of the proposed code,

  "foo".replace("bar", null)
should throw a NPE because replacement is null but the proposed code 
will return "foo"
because replacement.toString() is called after the short-circuit test 
(j < 0).



Yes, you're right, thanks for catching it!
And the regression test should have caught that, but I only had 
"a".replace("a", null) there, which passed.


I fixed the code and added the case to the regression test in the new 
webrev.



so the first part of the code should be:
  String starget = target.toString();  // implict nullcheck
  String srepl = replacement.toString();  // implicit nullcheck
  int j = indexOf(starget);
  if (j < 0) {
return this;
  }
  ...

also 'i' can be initialized just before the 'do', there is no point to 
initialize it before.


To finish, the two 'final' are useless here but i suppose it's a 
matter of style :)


I moved declaration of i just to save a line.  I don't think it 
decreased performance.


Declaring the 'value' variable final was suggested by Martin, and I 
think it is reasonable (see 
http://mail.openjdk.java.net/pipermail/core-libs-dev/2015-March/032601.html).

The other variable is final just for symmetry :)

Sincerely yours,
Ivan



cheers,
Rémi


On 05/31/2015 04:19 AM, Ivan Gerasimov wrote:

Hi everyone!

Here's another webrev, in which replace() is implemented with 
StringBuilder.
On my benchmark it is almost as fast as the version backed with 
arrays, but this variant is much shorter.


Credits to Sherman for combining the general algorithm with the case 
of empty target.


Comments, further suggestions are welcome!

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8058779
WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/04/webrev/

Sincerely yours,
Ivan









Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)

2015-05-31 Thread Ivan Gerasimov



On 31.05.2015 13:28, Ulf Zibis wrote:


Am 31.05.2015 um 08:38 schrieb Peter Levart:

Hi,

Yes, this one is much easier to grasp.

As I understand the check is to avoid overflow in calculation of 
StringBuilder initial capacity (newLenHint). If overflow happened, 
newLenHint would be negative and StringBuilder construction would 
fail with NegativeArraySizeException. But the calculation of newLenHint:


int newLenHint = value.length - targLen + replValue.length;

...considering the following:

value.length >= 0
targLength >= 0
replValue.length >= 0
targLength <= value.length

in case of overflow, it can only produce a negative value. So the 
check could simply be:


if (newLenHint < 0) {
throw new OutOfMemoryError();
}


Hm, what has this situation to do with Out-Of-Memory ?

That should indicate that we were about to try to allocate an array of 
size > Integer.MAX_VALUE.

E.g. java.util.AbstractCollection.java:

   private static int hugeCapacity(int minCapacity) {
if (minCapacity < 0) // overflow
throw new OutOfMemoryError
("Required array size too large");


In other words, IMHO NegativeArraySizeException is much better here 
and additionally saves performance.

Additionally you could propagate it to InvalidArgumentException.


I don't think it is InvalidArgument case.
It just happens that the result is larger then allowed, so let's throw OOM!

Sincerely yours,
Ivan


-Ulf







Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)

2015-05-31 Thread Ivan Gerasimov

Thanks! I'll simplify this check for overflow, as you're suggesting.

Sincerely yours,
Ivan

On 31.05.2015 10:01, Xueming Shen wrote:


On 5/30/2015 11:38 PM, Peter Levart wrote:



Hi,

Yes, this one is much easier to grasp.

As I understand the check is to avoid overflow in calculation of 
StringBuilder initial capacity (newLenHint). If overflow happened, 
newLenHint would be negative and StringBuilder construction would 
fail with NegativeArraySizeException. But the calculation of newLenHint:


int newLenHint = value.length - targLen + replValue.length;

...considering the following:

value.length >= 0
targLength >= 0
replValue.length >= 0
targLength <= value.length

in case of overflow, it can only produce a negative value. So the 
check could simply be:


if (newLenHint < 0) {
throw new OutOfMemoryError();
}

Right?

Regards, Peter



agreed.





Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)

2015-05-31 Thread Ulf Zibis


Am 31.05.2015 um 08:38 schrieb Peter Levart:

Hi,

Yes, this one is much easier to grasp.

As I understand the check is to avoid overflow in calculation of StringBuilder initial capacity 
(newLenHint). If overflow happened, newLenHint would be negative and StringBuilder construction 
would fail with NegativeArraySizeException. But the calculation of newLenHint:


int newLenHint = value.length - targLen + replValue.length;

...considering the following:

value.length >= 0
targLength >= 0
replValue.length >= 0
targLength <= value.length

in case of overflow, it can only produce a negative value. So the check could 
simply be:

if (newLenHint < 0) {
throw new OutOfMemoryError();
}


Hm, what has this situation to do with Out-Of-Memory ?

In other words, IMHO NegativeArraySizeException is much better here and 
additionally saves performance.
Additionally you could propagate it to InvalidArgumentException.

-Ulf



Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)

2015-05-31 Thread Remi Forax

I agree with the others the code is more readable.

There is a slightly difference between the current behavior and the 
behavior of the proposed code,

  "foo".replace("bar", null)
should throw a NPE because replacement is null but the proposed code 
will return "foo"
because replacement.toString() is called after the short-circuit test (j 
< 0).


so the first part of the code should be:
  String starget = target.toString();  // implict nullcheck
  String srepl = replacement.toString();  // implicit nullcheck
  int j = indexOf(starget);
  if (j < 0) {
return this;
  }
  ...

also 'i' can be initialized just before the 'do', there is no point to 
initialize it before.


To finish, the two 'final' are useless here but i suppose it's a matter 
of style :)


cheers,
Rémi


On 05/31/2015 04:19 AM, Ivan Gerasimov wrote:

Hi everyone!

Here's another webrev, in which replace() is implemented with 
StringBuilder.
On my benchmark it is almost as fast as the version backed with 
arrays, but this variant is much shorter.


Credits to Sherman for combining the general algorithm with the case 
of empty target.


Comments, further suggestions are welcome!

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8058779
WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/04/webrev/

Sincerely yours,
Ivan





Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)

2015-05-31 Thread Xueming Shen


On 5/30/2015 11:38 PM, Peter Levart wrote:



Hi,

Yes, this one is much easier to grasp.

As I understand the check is to avoid overflow in calculation of 
StringBuilder initial capacity (newLenHint). If overflow happened, 
newLenHint would be negative and StringBuilder construction would fail 
with NegativeArraySizeException. But the calculation of newLenHint:


int newLenHint = value.length - targLen + replValue.length;

...considering the following:

value.length >= 0
targLength >= 0
replValue.length >= 0
targLength <= value.length

in case of overflow, it can only produce a negative value. So the 
check could simply be:


if (newLenHint < 0) {
throw new OutOfMemoryError();
}

Right?

Regards, Peter



agreed.



Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)

2015-05-30 Thread Peter Levart



On 05/31/2015 06:34 AM, Xueming Shen wrote:

On 5/30/15 7:19 PM, Ivan Gerasimov wrote:

Hi everyone!

Here's another webrev, in which replace() is implemented with 
StringBuilder.
On my benchmark it is almost as fast as the version backed with 
arrays, but this variant is much shorter.


Credits to Sherman for combining the general algorithm with the case 
of empty target.


Comments, further suggestions are welcome!

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8058779
WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/04/webrev/

Sincerely yours,
Ivan



This is one is much better:-) I would suggest to leave the 
"overflow-conscious" code to the StringBuilder.
The same range check is being done inside ABS every time the repl 
string is appended into the buffer,
it does not appear to be very helpful to have a redundant check here. 
And it seems this check only works

for the single appearance of target string.

2260 // overflow-conscious code
2261 if (value.length - targLen > Integer.MAX_VALUE - 
replValue.length) {

2262 throw new OutOfMemoryError();
2263 }



Hi,

Yes, this one is much easier to grasp.

As I understand the check is to avoid overflow in calculation of 
StringBuilder initial capacity (newLenHint). If overflow happened, 
newLenHint would be negative and StringBuilder construction would fail 
with NegativeArraySizeException. But the calculation of newLenHint:


int newLenHint = value.length - targLen + replValue.length;

...considering the following:

value.length >= 0
targLength >= 0
replValue.length >= 0
targLength <= value.length

in case of overflow, it can only produce a negative value. So the check 
could simply be:


if (newLenHint < 0) {
throw new OutOfMemoryError();
}

Right?

Regards, Peter



Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)

2015-05-30 Thread Xueming Shen

On 5/30/15 7:19 PM, Ivan Gerasimov wrote:

Hi everyone!

Here's another webrev, in which replace() is implemented with 
StringBuilder.
On my benchmark it is almost as fast as the version backed with 
arrays, but this variant is much shorter.


Credits to Sherman for combining the general algorithm with the case 
of empty target.


Comments, further suggestions are welcome!

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8058779
WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/04/webrev/

Sincerely yours,
Ivan



This is one is much better:-) I would suggest to leave the 
"overflow-conscious" code to the StringBuilder.
The same range check is being done inside ABS every time the repl string 
is appended into the buffer,
it does not appear to be very helpful to have a redundant check here. 
And it seems this check only works

for the single appearance of target string.

2260 // overflow-conscious code
2261 if (value.length - targLen > Integer.MAX_VALUE - replValue.length) 
{
2262 throw new OutOfMemoryError();
2263 }



Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)

2015-05-30 Thread Ivan Gerasimov

Hi everyone!

Here's another webrev, in which replace() is implemented with StringBuilder.
On my benchmark it is almost as fast as the version backed with arrays, 
but this variant is much shorter.


Credits to Sherman for combining the general algorithm with the case of 
empty target.


Comments, further suggestions are welcome!

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8058779
WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/04/webrev/

Sincerely yours,
Ivan



Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)

2015-05-27 Thread Xueming Shen

On 05/27/2015 01:10 PM, Xueming Shen wrote:

On 05/27/2015 12:43 PM, Ivan Gerasimov wrote:



On 27.05.2015 21:08, Xueming Shen wrote:

targLen = max(1, tagLen);   ?


Well, almost :)
With such targLen, (i = j + targLen) would result in i == length() + 1, which 
will cause IOOBE in the following append().

I'm sure the algorithms can be adopted to run correctly with empty target, it 
just needs some accurate checking.

But why can't we consider the first variant of replace()?
http://cr.openjdk.java.net/~igerasim/8058779/02/webrev/

It's still faster and it can handle larger strings without OOME.  I think this 
is important.

I haven't heard any critics of it except for its relative complexity, comparing 
to the original code.
And we have a backup variant with StringBuilder, if problems are found.






Personally I prefer not do "buffer management" issue in replace(), 
StringBuilder is just doing
the job fine. If possible I would avoid to have a special method() to just work 
a corner case.
"is relative complexity" is a critics. We are trying to see if we can achieve 
most of the gain
without such complexity.

-Sherman


Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)

2015-05-27 Thread Xueming Shen

On 05/27/2015 12:43 PM, Ivan Gerasimov wrote:



On 27.05.2015 21:08, Xueming Shen wrote:

targLen = max(1, tagLen);   ?


Well, almost :)
With such targLen, (i = j + targLen) would result in i == length() + 1, which 
will cause IOOBE in the following append().

I'm sure the algorithms can be adopted to run correctly with empty target, it 
just needs some accurate checking.

But why can't we consider the first variant of replace()?
http://cr.openjdk.java.net/~igerasim/8058779/02/webrev/

It's still faster and it can handle larger strings without OOME.  I think this 
is important.

I haven't heard any critics of it except for its relative complexity, comparing 
to the original code.
And we have a backup variant with StringBuilder, if problems are found.






Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)

2015-05-27 Thread Ivan Gerasimov



On 27.05.2015 21:08, Xueming Shen wrote:

targLen = max(1, tagLen);   ?


Well, almost :)
With such targLen, (i = j + targLen) would result in i == length() + 1, 
which will cause IOOBE in the following append().


I'm sure the algorithms can be adopted to run correctly with empty 
target, it just needs some accurate checking.


But why can't we consider the first variant of replace()?
http://cr.openjdk.java.net/~igerasim/8058779/02/webrev/

It's still faster and it can handle larger strings without OOME.  I 
think this is important.


I haven't heard any critics of it except for its relative complexity, 
comparing to the original code.

And we have a backup variant with StringBuilder, if problems are found.

Sincerely yours,
Ivan


On 05/27/2015 10:59 AM, Ivan Gerasimov wrote:



On 27.05.2015 20:40, Xueming Shen wrote:


The .append(srepl) -> append(srep.value)

And I would simply remove the "fastpath" for the target.length() = 0 
special case ...
And maybe pick an appropriate initial size (maybe this.length + 16 * 
diff) for the sb

to further reduce its internal expanding ...


Sorry, I forgot to comment this.  This is not really a fastpath.
This is a special case, which isn't handled by the general algorithm.
We can modify the general algorithm, of course to handle empty 
target, but I thought it would be more clear to have a separate branch.


I think your variant below would loop forever in the loop with an 
empty target:

i = j + targLen; // targLen == 0
j = indexOf(starget, i) // j == i

Sincerely yours,
Ivan

Personally this code is straightforward and I would be happy with 
the 800+ vs 1000+

score diff.

public String replace(CharSequence target, CharSequence 
replacement) {

String starget = target.toString();
int targLen = starget.length();
String srepl = replacement.toString();

int j = indexOf(starget);
// special case: nothing to replace
if (j<  0) {
return this;
}

final char[] value = this.value;
StringBuilder sb = new StringBuilder();
int i = 0;
do {
sb.append(value, i, j - i)
.append(srepl.value);
i = j + targLen;
} while ((j = indexOf(starget, i))>  0);

return sb.append(this, i, length()).toString();
}

-Sherman


On 05/27/2015 10:06 AM, Ivan Gerasimov wrote:


On 27.05.2015 18:44, Xueming Shen wrote:
You might want to use directly sb.append.(char[], off, len), 
instead of append(str, off, len)/append(str).


Ah, yes sure!  It would improve things.
I updated the webrev in place:
http://cr.openjdk.java.net/~igerasim/8058779/03/webrev/

However the general picture is almost the same:
performance:
Baseline:  MyBenchmark.test  thrpt   40  257'051.948 ± 
4537.484  ops/s
StringBuilder: MyBenchmark.test  thrpt   40  808'033.808 ± 
22152.166  ops/s
Arrays:MyBenchmark.test  thrpt   40  1'049'235.602 ± 
15501.803  ops/s


Memory efficiency is still less then for array-version:
$ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C
java version "1.9.0-internal"
Java(TM) SE Runtime Environment (build 
1.9.0-internal-igerasim_2015_05_27_14_47-b00)
Java HotSpot(TM) 64-Bit Server VM (build 
1.9.0-internal-igerasim_2015_05_27_14_47-b00, mixed mode)


25) 100663296

real0m2.429s
user0m2.112s
sys0m0.792s

Sincerely yours,
Ivan

Btw, is the target.lenght==0 the popular use case/scenario? Just 
wonder why do we need a special case

here for it.

thanks,
-Sherman

On 5/27/15 8:29 AM, Ivan Gerasimov wrote:

For completeness, here's another webrev, which uses StringBuilder:
http://cr.openjdk.java.net/~igerasim/8058779/03/webrev/

Its performance is somewhere in between the current 
implementation and the array-based implementation:

MyBenchmark.test  thrpt   40  796'059.192 ± 12455.970 ops/s

Memory efficiency is less then the one of array-based 
implementation:

$ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C
java version "1.9.0-internal"
Java(TM) SE Runtime Environment (build 
1.9.0-internal-igerasim_2015_05_27_14_47-b00)
Java HotSpot(TM) 64-Bit Server VM (build 
1.9.0-internal-igerasim_2015_05_27_14_47-b00, mixed mode)


25) 100663296

real0m2.585s
user0m1.875s
sys0m0.887s

Sincerely yours,
Ivan

On 27.05.2015 2:38, Xueming Shen wrote:

Ivan,

It might be worth trying String.index + StringBuilder, instead 
of writing/handling everything yourself.
Yes, it inevitably adds an arraycopy at the end to convert the 
StrinbBuilder to String, but it might have
better balance between performance and code complexity. The 
regex is probably a little heavy for
literal string replacement, but StringBuilder should not be that 
bad ...


-Sherman

On 5/26/15 4:11 PM, Ivan Gerasimov wrote:

I updated the webrev:
http://cr.openjdk.java.net/~igerasim/8058779/02/webrev/

In the check at 2300-2301 and 2351-2352 I replaced 
MAX_ARRAY_SIZE with Integer.MAX_VALUE, which seems to be more 
accurate here.


And  I wan

Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)

2015-05-27 Thread Xueming Shen

targLen = max(1, tagLen);   ?

On 05/27/2015 10:59 AM, Ivan Gerasimov wrote:



On 27.05.2015 20:40, Xueming Shen wrote:


The .append(srepl) -> append(srep.value)

And I would simply remove the "fastpath" for the target.length() = 0 special 
case ...
And maybe pick an appropriate initial size (maybe this.length + 16 * diff) for 
the sb
to further reduce its internal expanding ...


Sorry, I forgot to comment this.  This is not really a fastpath.
This is a special case, which isn't handled by the general algorithm.
We can modify the general algorithm, of course to handle empty target, but I 
thought it would be more clear to have a separate branch.

I think your variant below would loop forever in the loop with an empty target:
i = j + targLen; // targLen == 0
j = indexOf(starget, i) // j == i

Sincerely yours,
Ivan


Personally this code is straightforward and I would be happy with the 800+ vs 
1000+
score diff.

public String replace(CharSequence target, CharSequence replacement) {
String starget = target.toString();
int targLen = starget.length();
String srepl = replacement.toString();

int j = indexOf(starget);
// special case: nothing to replace
if (j<  0) {
return this;
}

final char[] value = this.value;
StringBuilder sb = new StringBuilder();
int i = 0;
do {
sb.append(value, i, j - i)
.append(srepl.value);
i = j + targLen;
} while ((j = indexOf(starget, i))>  0);

return sb.append(this, i, length()).toString();
}

-Sherman


On 05/27/2015 10:06 AM, Ivan Gerasimov wrote:


On 27.05.2015 18:44, Xueming Shen wrote:

You might want to use directly sb.append.(char[], off, len), instead of 
append(str, off, len)/append(str).


Ah, yes sure!  It would improve things.
I updated the webrev in place:
http://cr.openjdk.java.net/~igerasim/8058779/03/webrev/

However the general picture is almost the same:
performance:
Baseline:  MyBenchmark.test  thrpt   40  257'051.948 ± 4537.484  ops/s
StringBuilder: MyBenchmark.test  thrpt   40  808'033.808 ± 22152.166  ops/s
Arrays:MyBenchmark.test  thrpt   40  1'049'235.602 ± 15501.803  ops/s

Memory efficiency is still less then for array-version:
$ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C
java version "1.9.0-internal"
Java(TM) SE Runtime Environment (build 
1.9.0-internal-igerasim_2015_05_27_14_47-b00)
Java HotSpot(TM) 64-Bit Server VM (build 
1.9.0-internal-igerasim_2015_05_27_14_47-b00, mixed mode)

25) 100663296

real0m2.429s
user0m2.112s
sys0m0.792s

Sincerely yours,
Ivan


Btw, is the target.lenght==0 the popular use case/scenario? Just wonder why do 
we need a special case
here for it.

thanks,
-Sherman

On 5/27/15 8:29 AM, Ivan Gerasimov wrote:

For completeness, here's another webrev, which uses StringBuilder:
http://cr.openjdk.java.net/~igerasim/8058779/03/webrev/

Its performance is somewhere in between the current implementation and the 
array-based implementation:
MyBenchmark.test  thrpt   40  796'059.192 ± 12455.970  ops/s

Memory efficiency is less then the one of array-based implementation:
$ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C
java version "1.9.0-internal"
Java(TM) SE Runtime Environment (build 
1.9.0-internal-igerasim_2015_05_27_14_47-b00)
Java HotSpot(TM) 64-Bit Server VM (build 
1.9.0-internal-igerasim_2015_05_27_14_47-b00, mixed mode)

25) 100663296

real0m2.585s
user0m1.875s
sys0m0.887s

Sincerely yours,
Ivan

On 27.05.2015 2:38, Xueming Shen wrote:

Ivan,

It might be worth trying String.index + StringBuilder, instead of 
writing/handling everything yourself.
Yes, it inevitably adds an arraycopy at the end to convert the StrinbBuilder to 
String, but it might have
better balance between performance and code complexity. The regex is probably a 
little heavy for
literal string replacement, but StringBuilder should not be that bad ...

-Sherman

On 5/26/15 4:11 PM, Ivan Gerasimov wrote:

I updated the webrev:
http://cr.openjdk.java.net/~igerasim/8058779/02/webrev/

In the check at 2300-2301 and 2351-2352 I replaced MAX_ARRAY_SIZE with 
Integer.MAX_VALUE, which seems to be more accurate here.

And  I want to add that this proposed implementation is not only faster, but 
also more memory efficient.
The following simple stress-test shows that the proposed version is able to 
handle twice larger strings, comparing to the current implementation.


public class C {
public static void main(String[] args) throws Throwable {
String s = "string";
for (int i = 1; i < Integer.MAX_VALUE; ++i) {
try {
s = s.replace("string", "stringstring");
} catch (OutOfMemoryError o) {
System.out.println(i + ") " + s.length());
break;
}
}
}
}


$ time ~/j

Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)

2015-05-27 Thread Ivan Gerasimov



On 27.05.2015 20:40, Xueming Shen wrote:


The .append(srepl) -> append(srep.value)

And I would simply remove the "fastpath" for the target.length() = 0 
special case ...
And maybe pick an appropriate initial size (maybe this.length + 16 * 
diff) for the sb

to further reduce its internal expanding ...


Sorry, I forgot to comment this.  This is not really a fastpath.
This is a special case, which isn't handled by the general algorithm.
We can modify the general algorithm, of course to handle empty target, 
but I thought it would be more clear to have a separate branch.


I think your variant below would loop forever in the loop with an empty 
target:

i = j + targLen; // targLen == 0
j = indexOf(starget, i) // j == i

Sincerely yours,
Ivan

Personally this code is straightforward and I would be happy with the 
800+ vs 1000+

score diff.

public String replace(CharSequence target, CharSequence 
replacement) {

String starget = target.toString();
int targLen = starget.length();
String srepl = replacement.toString();

int j = indexOf(starget);
// special case: nothing to replace
if (j<  0) {
return this;
}

final char[] value = this.value;
StringBuilder sb = new StringBuilder();
int i = 0;
do {
sb.append(value, i, j - i)
.append(srepl.value);
i = j + targLen;
} while ((j = indexOf(starget, i))>  0);

return sb.append(this, i, length()).toString();
}

-Sherman


On 05/27/2015 10:06 AM, Ivan Gerasimov wrote:


On 27.05.2015 18:44, Xueming Shen wrote:
You might want to use directly sb.append.(char[], off, len), instead 
of append(str, off, len)/append(str).


Ah, yes sure!  It would improve things.
I updated the webrev in place:
http://cr.openjdk.java.net/~igerasim/8058779/03/webrev/

However the general picture is almost the same:
performance:
Baseline:  MyBenchmark.test  thrpt   40  257'051.948 ± 4537.484  
ops/s
StringBuilder: MyBenchmark.test  thrpt   40  808'033.808 ± 22152.166  
ops/s
Arrays:MyBenchmark.test  thrpt   40  1'049'235.602 ± 
15501.803  ops/s


Memory efficiency is still less then for array-version:
$ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C
java version "1.9.0-internal"
Java(TM) SE Runtime Environment (build 
1.9.0-internal-igerasim_2015_05_27_14_47-b00)
Java HotSpot(TM) 64-Bit Server VM (build 
1.9.0-internal-igerasim_2015_05_27_14_47-b00, mixed mode)


25) 100663296

real0m2.429s
user0m2.112s
sys0m0.792s

Sincerely yours,
Ivan

Btw, is the target.lenght==0 the popular use case/scenario? Just 
wonder why do we need a special case

here for it.

thanks,
-Sherman

On 5/27/15 8:29 AM, Ivan Gerasimov wrote:

For completeness, here's another webrev, which uses StringBuilder:
http://cr.openjdk.java.net/~igerasim/8058779/03/webrev/

Its performance is somewhere in between the current implementation 
and the array-based implementation:

MyBenchmark.test  thrpt   40  796'059.192 ± 12455.970  ops/s

Memory efficiency is less then the one of array-based implementation:
$ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C
java version "1.9.0-internal"
Java(TM) SE Runtime Environment (build 
1.9.0-internal-igerasim_2015_05_27_14_47-b00)
Java HotSpot(TM) 64-Bit Server VM (build 
1.9.0-internal-igerasim_2015_05_27_14_47-b00, mixed mode)


25) 100663296

real0m2.585s
user0m1.875s
sys0m0.887s

Sincerely yours,
Ivan

On 27.05.2015 2:38, Xueming Shen wrote:

Ivan,

It might be worth trying String.index + StringBuilder, instead of 
writing/handling everything yourself.
Yes, it inevitably adds an arraycopy at the end to convert the 
StrinbBuilder to String, but it might have
better balance between performance and code complexity. The regex 
is probably a little heavy for
literal string replacement, but StringBuilder should not be that 
bad ...


-Sherman

On 5/26/15 4:11 PM, Ivan Gerasimov wrote:

I updated the webrev:
http://cr.openjdk.java.net/~igerasim/8058779/02/webrev/

In the check at 2300-2301 and 2351-2352 I replaced MAX_ARRAY_SIZE 
with Integer.MAX_VALUE, which seems to be more accurate here.


And  I want to add that this proposed implementation is not only 
faster, but also more memory efficient.
The following simple stress-test shows that the proposed version 
is able to handle twice larger strings, comparing to the current 
implementation.



public class C {
public static void main(String[] args) throws Throwable {
String s = "string";
for (int i = 1; i < Integer.MAX_VALUE; ++i) {
try {
s = s.replace("string", "stringstring");
} catch (OutOfMemoryError o) {
System.out.println(i + ") " + s.length());
break;
}
}
}
}


$ time ~/java9/jdk/bin/java -showversion -Xmx1g C
java version "1.9.0-e

Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)

2015-05-27 Thread Xueming Shen


The .append(srepl) -> append(srep.value)

And I would simply remove the "fastpath" for the target.length() = 0 special 
case ...
And maybe pick an appropriate initial size (maybe this.length + 16 * diff) for 
the sb
to further reduce its internal expanding ...

Personally this code is straightforward and I would be happy with the 800+ vs 
1000+
score diff.

public String replace(CharSequence target, CharSequence replacement) {
String starget = target.toString();
int targLen = starget.length();
String srepl = replacement.toString();

int j = indexOf(starget);
// special case: nothing to replace
if (j<  0) {
return this;
}

final char[] value = this.value;
StringBuilder sb = new StringBuilder();
int i = 0;
do {
sb.append(value, i, j - i)
.append(srepl.value);
i = j + targLen;
} while ((j = indexOf(starget, i))>  0);

return sb.append(this, i, length()).toString();
}

-Sherman


On 05/27/2015 10:06 AM, Ivan Gerasimov wrote:


On 27.05.2015 18:44, Xueming Shen wrote:

You might want to use directly sb.append.(char[], off, len), instead of 
append(str, off, len)/append(str).


Ah, yes sure!  It would improve things.
I updated the webrev in place:
http://cr.openjdk.java.net/~igerasim/8058779/03/webrev/

However the general picture is almost the same:
performance:
Baseline:  MyBenchmark.test  thrpt   40  257'051.948 ± 4537.484  ops/s
StringBuilder: MyBenchmark.test  thrpt   40  808'033.808 ± 22152.166  ops/s
Arrays:MyBenchmark.test  thrpt   40  1'049'235.602 ± 15501.803  ops/s

Memory efficiency is still less then for array-version:
$ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C
java version "1.9.0-internal"
Java(TM) SE Runtime Environment (build 
1.9.0-internal-igerasim_2015_05_27_14_47-b00)
Java HotSpot(TM) 64-Bit Server VM (build 
1.9.0-internal-igerasim_2015_05_27_14_47-b00, mixed mode)

25) 100663296

real0m2.429s
user0m2.112s
sys0m0.792s

Sincerely yours,
Ivan


Btw, is the target.lenght==0 the popular use case/scenario? Just wonder why do 
we need a special case
here for it.

thanks,
-Sherman

On 5/27/15 8:29 AM, Ivan Gerasimov wrote:

For completeness, here's another webrev, which uses StringBuilder:
http://cr.openjdk.java.net/~igerasim/8058779/03/webrev/

Its performance is somewhere in between the current implementation and the 
array-based implementation:
MyBenchmark.test  thrpt   40  796'059.192 ± 12455.970  ops/s

Memory efficiency is less then the one of array-based implementation:
$ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C
java version "1.9.0-internal"
Java(TM) SE Runtime Environment (build 
1.9.0-internal-igerasim_2015_05_27_14_47-b00)
Java HotSpot(TM) 64-Bit Server VM (build 
1.9.0-internal-igerasim_2015_05_27_14_47-b00, mixed mode)

25) 100663296

real0m2.585s
user0m1.875s
sys0m0.887s

Sincerely yours,
Ivan

On 27.05.2015 2:38, Xueming Shen wrote:

Ivan,

It might be worth trying String.index + StringBuilder, instead of 
writing/handling everything yourself.
Yes, it inevitably adds an arraycopy at the end to convert the StrinbBuilder to 
String, but it might have
better balance between performance and code complexity. The regex is probably a 
little heavy for
literal string replacement, but StringBuilder should not be that bad ...

-Sherman

On 5/26/15 4:11 PM, Ivan Gerasimov wrote:

I updated the webrev:
http://cr.openjdk.java.net/~igerasim/8058779/02/webrev/

In the check at 2300-2301 and 2351-2352 I replaced MAX_ARRAY_SIZE with 
Integer.MAX_VALUE, which seems to be more accurate here.

And  I want to add that this proposed implementation is not only faster, but 
also more memory efficient.
The following simple stress-test shows that the proposed version is able to 
handle twice larger strings, comparing to the current implementation.


public class C {
public static void main(String[] args) throws Throwable {
String s = "string";
for (int i = 1; i < Integer.MAX_VALUE; ++i) {
try {
s = s.replace("string", "stringstring");
} catch (OutOfMemoryError o) {
System.out.println(i + ") " + s.length());
break;
}
}
}
}


$ time ~/java9/jdk/bin/java -showversion -Xmx1g C
java version "1.9.0-ea"
Java(TM) SE Runtime Environment (build 1.9.0-ea-b63)
Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-ea-b63, mixed mode)

25) 100663296

real0m4.525s
user0m4.402s
sys0m1.189s

$ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C
java version "1.9.0-internal"
Java(TM) SE Runtime Environment (build 
1.9.0-internal-igerasim_2015_05_23_19_25-b00)
Java HotSpot(TM) 64-Bit Server VM (build 
1.9.0-internal-igerasim_2015_05_23_19_25-b00, mixed mode)

26) 201326592

real0m2.139s
user0m1.960

Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)

2015-05-27 Thread Ivan Gerasimov


On 27.05.2015 18:44, Xueming Shen wrote:
You might want to use directly sb.append.(char[], off, len), instead 
of append(str, off, len)/append(str).


Ah, yes sure!  It would improve things.
I updated the webrev in place:
http://cr.openjdk.java.net/~igerasim/8058779/03/webrev/

However the general picture is almost the same:
performance:
Baseline:  MyBenchmark.test  thrpt   40  257'051.948 ± 4537.484  ops/s
StringBuilder: MyBenchmark.test  thrpt   40  808'033.808 ± 22152.166  ops/s
Arrays:MyBenchmark.test  thrpt   40  1'049'235.602 ± 15501.803  
ops/s


Memory efficiency is still less then for array-version:
$ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C
java version "1.9.0-internal"
Java(TM) SE Runtime Environment (build 
1.9.0-internal-igerasim_2015_05_27_14_47-b00)
Java HotSpot(TM) 64-Bit Server VM (build 
1.9.0-internal-igerasim_2015_05_27_14_47-b00, mixed mode)


25) 100663296

real0m2.429s
user0m2.112s
sys0m0.792s

Sincerely yours,
Ivan

Btw, is the target.lenght==0 the popular use case/scenario? Just 
wonder why do we need a special case

here for it.

thanks,
-Sherman

On 5/27/15 8:29 AM, Ivan Gerasimov wrote:

For completeness, here's another webrev, which uses StringBuilder:
http://cr.openjdk.java.net/~igerasim/8058779/03/webrev/

Its performance is somewhere in between the current implementation 
and the array-based implementation:

MyBenchmark.test  thrpt   40  796'059.192 ± 12455.970  ops/s

Memory efficiency is less then the one of array-based implementation:
$ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C
java version "1.9.0-internal"
Java(TM) SE Runtime Environment (build 
1.9.0-internal-igerasim_2015_05_27_14_47-b00)
Java HotSpot(TM) 64-Bit Server VM (build 
1.9.0-internal-igerasim_2015_05_27_14_47-b00, mixed mode)


25) 100663296

real0m2.585s
user0m1.875s
sys0m0.887s

Sincerely yours,
Ivan

On 27.05.2015 2:38, Xueming Shen wrote:

Ivan,

It might be worth trying String.index + StringBuilder, instead of 
writing/handling everything yourself.
Yes, it inevitably adds an arraycopy at the end to convert the 
StrinbBuilder to String, but it might have
better balance between performance and code complexity. The regex is 
probably a little heavy for
literal string replacement, but StringBuilder should not be that bad 
...


-Sherman

On 5/26/15 4:11 PM, Ivan Gerasimov wrote:

I updated the webrev:
http://cr.openjdk.java.net/~igerasim/8058779/02/webrev/

In the check at 2300-2301 and 2351-2352 I replaced MAX_ARRAY_SIZE 
with Integer.MAX_VALUE, which seems to be more accurate here.


And  I want to add that this proposed implementation is not only 
faster, but also more memory efficient.
The following simple stress-test shows that the proposed version is 
able to handle twice larger strings, comparing to the current 
implementation.



public class C {
public static void main(String[] args) throws Throwable {
String s = "string";
for (int i = 1; i < Integer.MAX_VALUE; ++i) {
try {
s = s.replace("string", "stringstring");
} catch (OutOfMemoryError o) {
System.out.println(i + ") " + s.length());
break;
}
}
}
}


$ time ~/java9/jdk/bin/java -showversion -Xmx1g C
java version "1.9.0-ea"
Java(TM) SE Runtime Environment (build 1.9.0-ea-b63)
Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-ea-b63, mixed mode)

25) 100663296

real0m4.525s
user0m4.402s
sys0m1.189s

$ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C
java version "1.9.0-internal"
Java(TM) SE Runtime Environment (build 
1.9.0-internal-igerasim_2015_05_23_19_25-b00)
Java HotSpot(TM) 64-Bit Server VM (build 
1.9.0-internal-igerasim_2015_05_23_19_25-b00, mixed mode)


26) 201326592

real0m2.139s
user0m1.960s
sys0m0.461s

Sincerely yours,
Ivan

On 24.05.2015 23:17, Ivan Gerasimov wrote:

Hello everybody!

I know many people here like it when the performance is getting 
better.
It was suggested to make the literal variant of String.replace() 
faster.
Currently, this method is implemented as a few calls to regexp 
API, so that the whole implementation takes only two lines of code.


I've created two versions of the fix.
In the first one, we scan the string and store indices of the 
found substrings in an array.

Then, we allocate the precisely sized char array and fill it it.
The case with the empty target has to be handled separately.

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8058779
WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/00/webrev/

The second variant is much less verbose, however it's less 
efficient too.

Here the StringJoiner is used as an intermediate storage.

WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/01/webrev/


Here are the micro-benchmark results (in a string of ~300 chars do 
~15 replacements).

0) Baseline
MyBenchmark.test  thrpt 

Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)

2015-05-27 Thread Xueming Shen
You might want to use directly sb.append.(char[], off, len), instead of 
append(str, off, len)/append(str).
Btw, is the target.lenght==0 the popular use case/scenario? Just wonder 
why do we need a special case

here for it.

thanks,
-Sherman

On 5/27/15 8:29 AM, Ivan Gerasimov wrote:

For completeness, here's another webrev, which uses StringBuilder:
http://cr.openjdk.java.net/~igerasim/8058779/03/webrev/

Its performance is somewhere in between the current implementation and 
the array-based implementation:

MyBenchmark.test  thrpt   40  796'059.192 ± 12455.970  ops/s

Memory efficiency is less then the one of array-based implementation:
$ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C
java version "1.9.0-internal"
Java(TM) SE Runtime Environment (build 
1.9.0-internal-igerasim_2015_05_27_14_47-b00)
Java HotSpot(TM) 64-Bit Server VM (build 
1.9.0-internal-igerasim_2015_05_27_14_47-b00, mixed mode)


25) 100663296

real0m2.585s
user0m1.875s
sys0m0.887s

Sincerely yours,
Ivan

On 27.05.2015 2:38, Xueming Shen wrote:

Ivan,

It might be worth trying String.index + StringBuilder, instead of 
writing/handling everything yourself.
Yes, it inevitably adds an arraycopy at the end to convert the 
StrinbBuilder to String, but it might have
better balance between performance and code complexity. The regex is 
probably a little heavy for

literal string replacement, but StringBuilder should not be that bad ...

-Sherman

On 5/26/15 4:11 PM, Ivan Gerasimov wrote:

I updated the webrev:
http://cr.openjdk.java.net/~igerasim/8058779/02/webrev/

In the check at 2300-2301 and 2351-2352 I replaced MAX_ARRAY_SIZE 
with Integer.MAX_VALUE, which seems to be more accurate here.


And  I want to add that this proposed implementation is not only 
faster, but also more memory efficient.
The following simple stress-test shows that the proposed version is 
able to handle twice larger strings, comparing to the current 
implementation.



public class C {
public static void main(String[] args) throws Throwable {
String s = "string";
for (int i = 1; i < Integer.MAX_VALUE; ++i) {
try {
s = s.replace("string", "stringstring");
} catch (OutOfMemoryError o) {
System.out.println(i + ") " + s.length());
break;
}
}
}
}


$ time ~/java9/jdk/bin/java -showversion -Xmx1g C
java version "1.9.0-ea"
Java(TM) SE Runtime Environment (build 1.9.0-ea-b63)
Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-ea-b63, mixed mode)

25) 100663296

real0m4.525s
user0m4.402s
sys0m1.189s

$ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C
java version "1.9.0-internal"
Java(TM) SE Runtime Environment (build 
1.9.0-internal-igerasim_2015_05_23_19_25-b00)
Java HotSpot(TM) 64-Bit Server VM (build 
1.9.0-internal-igerasim_2015_05_23_19_25-b00, mixed mode)


26) 201326592

real0m2.139s
user0m1.960s
sys0m0.461s

Sincerely yours,
Ivan

On 24.05.2015 23:17, Ivan Gerasimov wrote:

Hello everybody!

I know many people here like it when the performance is getting 
better.
It was suggested to make the literal variant of String.replace() 
faster.
Currently, this method is implemented as a few calls to regexp API, 
so that the whole implementation takes only two lines of code.


I've created two versions of the fix.
In the first one, we scan the string and store indices of the found 
substrings in an array.

Then, we allocate the precisely sized char array and fill it it.
The case with the empty target has to be handled separately.

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8058779
WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/00/webrev/

The second variant is much less verbose, however it's less 
efficient too.

Here the StringJoiner is used as an intermediate storage.

WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/01/webrev/


Here are the micro-benchmark results (in a string of ~300 chars do 
~15 replacements).

0) Baseline
MyBenchmark.test  thrpt   40  257'051.948 ± 4537.484  ops/s

1) Heavy-duty +308%
MyBenchmark.test  thrpt   40  1'049'235.602 ± 15501.803 ops/s

2) StringJoiner +190%
MyBenchmark.test  thrpt   40  746'000.629 ± 15387.036  ops/s


Personally, I like my first variant better, even though it adds 
almost 300 lines of code.

But I'd like to hear what people think of it.

Sincerely yours,
Ivan














Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)

2015-05-27 Thread Ivan Gerasimov

For completeness, here's another webrev, which uses StringBuilder:
http://cr.openjdk.java.net/~igerasim/8058779/03/webrev/

Its performance is somewhere in between the current implementation and 
the array-based implementation:

MyBenchmark.test  thrpt   40  796'059.192 ± 12455.970  ops/s

Memory efficiency is less then the one of array-based implementation:
$ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C
java version "1.9.0-internal"
Java(TM) SE Runtime Environment (build 
1.9.0-internal-igerasim_2015_05_27_14_47-b00)
Java HotSpot(TM) 64-Bit Server VM (build 
1.9.0-internal-igerasim_2015_05_27_14_47-b00, mixed mode)


25) 100663296

real0m2.585s
user0m1.875s
sys0m0.887s

Sincerely yours,
Ivan

On 27.05.2015 2:38, Xueming Shen wrote:

Ivan,

It might be worth trying String.index + StringBuilder, instead of 
writing/handling everything yourself.
Yes, it inevitably adds an arraycopy at the end to convert the 
StrinbBuilder to String, but it might have
better balance between performance and code complexity. The regex is 
probably a little heavy for

literal string replacement, but StringBuilder should not be that bad ...

-Sherman

On 5/26/15 4:11 PM, Ivan Gerasimov wrote:

I updated the webrev:
http://cr.openjdk.java.net/~igerasim/8058779/02/webrev/

In the check at 2300-2301 and 2351-2352 I replaced MAX_ARRAY_SIZE 
with Integer.MAX_VALUE, which seems to be more accurate here.


And  I want to add that this proposed implementation is not only 
faster, but also more memory efficient.
The following simple stress-test shows that the proposed version is 
able to handle twice larger strings, comparing to the current 
implementation.



public class C {
public static void main(String[] args) throws Throwable {
String s = "string";
for (int i = 1; i < Integer.MAX_VALUE; ++i) {
try {
s = s.replace("string", "stringstring");
} catch (OutOfMemoryError o) {
System.out.println(i + ") " + s.length());
break;
}
}
}
}


$ time ~/java9/jdk/bin/java -showversion -Xmx1g C
java version "1.9.0-ea"
Java(TM) SE Runtime Environment (build 1.9.0-ea-b63)
Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-ea-b63, mixed mode)

25) 100663296

real0m4.525s
user0m4.402s
sys0m1.189s

$ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C
java version "1.9.0-internal"
Java(TM) SE Runtime Environment (build 
1.9.0-internal-igerasim_2015_05_23_19_25-b00)
Java HotSpot(TM) 64-Bit Server VM (build 
1.9.0-internal-igerasim_2015_05_23_19_25-b00, mixed mode)


26) 201326592

real0m2.139s
user0m1.960s
sys0m0.461s

Sincerely yours,
Ivan

On 24.05.2015 23:17, Ivan Gerasimov wrote:

Hello everybody!

I know many people here like it when the performance is getting better.
It was suggested to make the literal variant of String.replace() 
faster.
Currently, this method is implemented as a few calls to regexp API, 
so that the whole implementation takes only two lines of code.


I've created two versions of the fix.
In the first one, we scan the string and store indices of the found 
substrings in an array.

Then, we allocate the precisely sized char array and fill it it.
The case with the empty target has to be handled separately.

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8058779
WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/00/webrev/

The second variant is much less verbose, however it's less efficient 
too.

Here the StringJoiner is used as an intermediate storage.

WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/01/webrev/


Here are the micro-benchmark results (in a string of ~300 chars do 
~15 replacements).

0) Baseline
MyBenchmark.test  thrpt   40  257'051.948 ± 4537.484  ops/s

1) Heavy-duty +308%
MyBenchmark.test  thrpt   40  1'049'235.602 ± 15501.803  ops/s

2) StringJoiner +190%
MyBenchmark.test  thrpt   40  746'000.629 ± 15387.036  ops/s


Personally, I like my first variant better, even though it adds 
almost 300 lines of code.

But I'd like to hear what people think of it.

Sincerely yours,
Ivan












Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)

2015-05-27 Thread Ivan Gerasimov

Thank you Peter for looking at this!

On 27.05.2015 16:15, Peter Levart wrote:

Hi Ivan,

How does this implementation compare to your's two regarding speed:

http://cr.openjdk.java.net/~plevart/jdk9-dev/String.replace/webrev.01/



You need to check for possible overflow here:
2287 resLen += repLen + ss.length();


It is not much shorter than your's first, but more object oriented ;-)

Well, the main difference is that you store the positions of substrings 
in a linked-list instead of an array.

But arrays are more efficient.

On small data the performance is almost the same as for the 'array' version:
MyBenchmark.test  thrpt   40  1'046'137.437 ± 13961.121  ops/s

But on huge data the performance is much worse, comparing to even 
current implementation.

Here are the results of running the stress test, I had posted earlier:
-
time ~/java9/jdk-build/bin/java -showversion -Xmx1g C
java version "1.9.0-internal"
Java(TM) SE Runtime Environment (build 
1.9.0-internal-igerasim_2015_05_27_14_47-b00)
Java HotSpot(TM) 64-Bit Server VM (build 
1.9.0-internal-igerasim_2015_05_27_14_47-b00, mixed mode)


25) 100663296

real0m12.945s
user0m15.670s
sys0m7.189s
-

It took 6 times more time to complete, comparing to the current 
implementation, which uses regexp engine.


Sincerely yours,
Ivan


Regards, Peter

On 05/27/2015 11:36 AM, Ivan Gerasimov wrote:

Hi Sherman!

Please take a look at my other webrev, that I provided in the first 
message.

WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/01/webrev/

I used StringJoiner there, which in some aspects seems to fit better 
here, comparing to StringBuilder.

It does reduce the code, but of course runs slower.
In that version every part of the source string had to be converted 
to a separate String, which add another redundant copying.


I still prefer the version, which deals with arrays.  I don't think 
it's too complex.
It surely adds some complexity to the original code, but it's only 70 
lines of code in total.

Everything else are comments and empty lines.

Sincerely yours,
Ivan

On 27.05.2015 2:38, Xueming Shen wrote:

Ivan,

It might be worth trying String.index + StringBuilder, instead of 
writing/handling everything yourself.
Yes, it inevitably adds an arraycopy at the end to convert the 
StrinbBuilder to String, but it might have
better balance between performance and code complexity. The regex is 
probably a little heavy for
literal string replacement, but StringBuilder should not be that bad 
...


-Sherman

On 5/26/15 4:11 PM, Ivan Gerasimov wrote:

I updated the webrev:
http://cr.openjdk.java.net/~igerasim/8058779/02/webrev/

In the check at 2300-2301 and 2351-2352 I replaced MAX_ARRAY_SIZE 
with Integer.MAX_VALUE, which seems to be more accurate here.


And  I want to add that this proposed implementation is not only 
faster, but also more memory efficient.
The following simple stress-test shows that the proposed version is 
able to handle twice larger strings, comparing to the current 
implementation.



public class C {
public static void main(String[] args) throws Throwable {
String s = "string";
for (int i = 1; i < Integer.MAX_VALUE; ++i) {
try {
s = s.replace("string", "stringstring");
} catch (OutOfMemoryError o) {
System.out.println(i + ") " + s.length());
break;
}
}
}
}


$ time ~/java9/jdk/bin/java -showversion -Xmx1g C
java version "1.9.0-ea"
Java(TM) SE Runtime Environment (build 1.9.0-ea-b63)
Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-ea-b63, mixed mode)

25) 100663296

real0m4.525s
user0m4.402s
sys0m1.189s

$ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C
java version "1.9.0-internal"
Java(TM) SE Runtime Environment (build 
1.9.0-internal-igerasim_2015_05_23_19_25-b00)
Java HotSpot(TM) 64-Bit Server VM (build 
1.9.0-internal-igerasim_2015_05_23_19_25-b00, mixed mode)


26) 201326592

real0m2.139s
user0m1.960s
sys0m0.461s

Sincerely yours,
Ivan

On 24.05.2015 23:17, Ivan Gerasimov wrote:

Hello everybody!

I know many people here like it when the performance is getting 
better.
It was suggested to make the literal variant of String.replace() 
faster.
Currently, this method is implemented as a few calls to regexp 
API, so that the whole implementation takes only two lines of code.


I've created two versions of the fix.
In the first one, we scan the string and store indices of the 
found substrings in an array.

Then, we allocate the precisely sized char array and fill it it.
The case with the empty target has to be handled separately.

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8058779
WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/00/webrev/

The second variant is much less verbose, however it's less 
efficient too.

Here the 

Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)

2015-05-27 Thread Peter Levart

Hi Ivan,

How does this implementation compare to your's two regarding speed:

http://cr.openjdk.java.net/~plevart/jdk9-dev/String.replace/webrev.01/

It is not much shorter than your's first, but more object oriented ;-)

Regards, Peter

On 05/27/2015 11:36 AM, Ivan Gerasimov wrote:

Hi Sherman!

Please take a look at my other webrev, that I provided in the first 
message.

WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/01/webrev/

I used StringJoiner there, which in some aspects seems to fit better 
here, comparing to StringBuilder.

It does reduce the code, but of course runs slower.
In that version every part of the source string had to be converted to 
a separate String, which add another redundant copying.


I still prefer the version, which deals with arrays.  I don't think 
it's too complex.
It surely adds some complexity to the original code, but it's only 70 
lines of code in total.

Everything else are comments and empty lines.

Sincerely yours,
Ivan

On 27.05.2015 2:38, Xueming Shen wrote:

Ivan,

It might be worth trying String.index + StringBuilder, instead of 
writing/handling everything yourself.
Yes, it inevitably adds an arraycopy at the end to convert the 
StrinbBuilder to String, but it might have
better balance between performance and code complexity. The regex is 
probably a little heavy for

literal string replacement, but StringBuilder should not be that bad ...

-Sherman

On 5/26/15 4:11 PM, Ivan Gerasimov wrote:

I updated the webrev:
http://cr.openjdk.java.net/~igerasim/8058779/02/webrev/

In the check at 2300-2301 and 2351-2352 I replaced MAX_ARRAY_SIZE 
with Integer.MAX_VALUE, which seems to be more accurate here.


And  I want to add that this proposed implementation is not only 
faster, but also more memory efficient.
The following simple stress-test shows that the proposed version is 
able to handle twice larger strings, comparing to the current 
implementation.



public class C {
public static void main(String[] args) throws Throwable {
String s = "string";
for (int i = 1; i < Integer.MAX_VALUE; ++i) {
try {
s = s.replace("string", "stringstring");
} catch (OutOfMemoryError o) {
System.out.println(i + ") " + s.length());
break;
}
}
}
}


$ time ~/java9/jdk/bin/java -showversion -Xmx1g C
java version "1.9.0-ea"
Java(TM) SE Runtime Environment (build 1.9.0-ea-b63)
Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-ea-b63, mixed mode)

25) 100663296

real0m4.525s
user0m4.402s
sys0m1.189s

$ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C
java version "1.9.0-internal"
Java(TM) SE Runtime Environment (build 
1.9.0-internal-igerasim_2015_05_23_19_25-b00)
Java HotSpot(TM) 64-Bit Server VM (build 
1.9.0-internal-igerasim_2015_05_23_19_25-b00, mixed mode)


26) 201326592

real0m2.139s
user0m1.960s
sys0m0.461s

Sincerely yours,
Ivan

On 24.05.2015 23:17, Ivan Gerasimov wrote:

Hello everybody!

I know many people here like it when the performance is getting 
better.
It was suggested to make the literal variant of String.replace() 
faster.
Currently, this method is implemented as a few calls to regexp API, 
so that the whole implementation takes only two lines of code.


I've created two versions of the fix.
In the first one, we scan the string and store indices of the found 
substrings in an array.

Then, we allocate the precisely sized char array and fill it it.
The case with the empty target has to be handled separately.

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8058779
WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/00/webrev/

The second variant is much less verbose, however it's less 
efficient too.

Here the StringJoiner is used as an intermediate storage.

WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/01/webrev/


Here are the micro-benchmark results (in a string of ~300 chars do 
~15 replacements).

0) Baseline
MyBenchmark.test  thrpt   40  257'051.948 ± 4537.484  ops/s

1) Heavy-duty +308%
MyBenchmark.test  thrpt   40  1'049'235.602 ± 15501.803 ops/s

2) StringJoiner +190%
MyBenchmark.test  thrpt   40  746'000.629 ± 15387.036  ops/s


Personally, I like my first variant better, even though it adds 
almost 300 lines of code.

But I'd like to hear what people think of it.

Sincerely yours,
Ivan














Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)

2015-05-27 Thread Peter Levart



On 05/27/2015 11:36 AM, Ivan Gerasimov wrote:

Hi Sherman!

Please take a look at my other webrev, that I provided in the first 
message.

WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/01/webrev/

I used StringJoiner there, which in some aspects seems to fit better 
here, comparing to StringBuilder.

It does reduce the code, but of course runs slower.
In that version every part of the source string had to be converted to 
a separate String, which add another redundant copying.


I still prefer the version, which deals with arrays.  I don't think 
it's too complex.
It surely adds some complexity to the original code, but it's only 70 
lines of code in total.

Everything else are comments and empty lines.

Sincerely yours,
Ivan


Hi Ivan,

If I was to choose, I would take the 1st variant (with index arrays) 
too. The speed does matter!


But anyway, every now and then one would like to write an algorithm that 
gathers sub-strings and then either joins them or dumps them to some 
buffer that eventually becomes another string or array. Since 
String.substring() changed when offset/length fields were removed, there 
is no official way to handle sub-sequences as "vews" of underlying 
String. There was some debate in the past about using String's 
implementation of CharSequence.subSequence() for that purpose - to 
return a private CharSequence implementation that would represent a view 
of underlying String's array, but that was not pursued because of 
limiting API specification that says:


"Returns a character sequence that is a subsequence of this sequence.

An invocation of this method of the form
   str.subSequence(begin, end)
behaves in exactly the same way as the invocation
   str.substring(begin, end)"

My opinion is that this part of text did matter at the time when 
substring() did return a view over underlying array, but when the 
substring method changed, there was no reason to keep the specification 
bound to that text. It would have been wiser to keep the behavior 
unchanged and change the specification. Anyway, what's done is done. I 
don't know if String.subSequence can be changed now. Maybe?


What could be done is introduce some new API. It could be in the form of 
a new class or just a static method (on CharSequence perhaps?) that 
would return a private implementation of CharSequence as a sub-sequence 
view over underlying CharSequence or String's array in case the 
passed-in argument is a String.


With such API one could gather sub-sequence view objects into a 
LinkedList and then dump them into the correctly pre-sized 
StringBuilder. There could even be a String counstructor with the 
following signature:


public String(Iterable sequences, int estimatedLength) { ...

..which would eliminate the last copying step too. Hopefully all 
temporary view objects would be found as method-local and would be 
allocated on stack / in registers by JIT (wishful thinking)...


Regards, Peter



On 27.05.2015 2:38, Xueming Shen wrote:

Ivan,

It might be worth trying String.index + StringBuilder, instead of 
writing/handling everything yourself.
Yes, it inevitably adds an arraycopy at the end to convert the 
StrinbBuilder to String, but it might have
better balance between performance and code complexity. The regex is 
probably a little heavy for

literal string replacement, but StringBuilder should not be that bad ...

-Sherman

On 5/26/15 4:11 PM, Ivan Gerasimov wrote:

I updated the webrev:
http://cr.openjdk.java.net/~igerasim/8058779/02/webrev/

In the check at 2300-2301 and 2351-2352 I replaced MAX_ARRAY_SIZE 
with Integer.MAX_VALUE, which seems to be more accurate here.


And  I want to add that this proposed implementation is not only 
faster, but also more memory efficient.
The following simple stress-test shows that the proposed version is 
able to handle twice larger strings, comparing to the current 
implementation.



public class C {
public static void main(String[] args) throws Throwable {
String s = "string";
for (int i = 1; i < Integer.MAX_VALUE; ++i) {
try {
s = s.replace("string", "stringstring");
} catch (OutOfMemoryError o) {
System.out.println(i + ") " + s.length());
break;
}
}
}
}


$ time ~/java9/jdk/bin/java -showversion -Xmx1g C
java version "1.9.0-ea"
Java(TM) SE Runtime Environment (build 1.9.0-ea-b63)
Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-ea-b63, mixed mode)

25) 100663296

real0m4.525s
user0m4.402s
sys0m1.189s

$ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C
java version "1.9.0-internal"
Java(TM) SE Runtime Environment (build 
1.9.0-internal-igerasim_2015_05_23_19_25-b00)
Java HotSpot(TM) 64-Bit Server VM (build 
1.9.0-internal-igerasim_2015_05_23_19_25-b00, mixed mode)


26) 201326592

real0m2.139s
user0m1.960s
sys0m0.461s

S

Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)

2015-05-27 Thread Ivan Gerasimov

Hi Sherman!

Please take a look at my other webrev, that I provided in the first message.
WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/01/webrev/

I used StringJoiner there, which in some aspects seems to fit better 
here, comparing to StringBuilder.

It does reduce the code, but of course runs slower.
In that version every part of the source string had to be converted to a 
separate String, which add another redundant copying.


I still prefer the version, which deals with arrays.  I don't think it's 
too complex.
It surely adds some complexity to the original code, but it's only 70 
lines of code in total.

Everything else are comments and empty lines.

Sincerely yours,
Ivan

On 27.05.2015 2:38, Xueming Shen wrote:

Ivan,

It might be worth trying String.index + StringBuilder, instead of 
writing/handling everything yourself.
Yes, it inevitably adds an arraycopy at the end to convert the 
StrinbBuilder to String, but it might have
better balance between performance and code complexity. The regex is 
probably a little heavy for

literal string replacement, but StringBuilder should not be that bad ...

-Sherman

On 5/26/15 4:11 PM, Ivan Gerasimov wrote:

I updated the webrev:
http://cr.openjdk.java.net/~igerasim/8058779/02/webrev/

In the check at 2300-2301 and 2351-2352 I replaced MAX_ARRAY_SIZE 
with Integer.MAX_VALUE, which seems to be more accurate here.


And  I want to add that this proposed implementation is not only 
faster, but also more memory efficient.
The following simple stress-test shows that the proposed version is 
able to handle twice larger strings, comparing to the current 
implementation.



public class C {
public static void main(String[] args) throws Throwable {
String s = "string";
for (int i = 1; i < Integer.MAX_VALUE; ++i) {
try {
s = s.replace("string", "stringstring");
} catch (OutOfMemoryError o) {
System.out.println(i + ") " + s.length());
break;
}
}
}
}


$ time ~/java9/jdk/bin/java -showversion -Xmx1g C
java version "1.9.0-ea"
Java(TM) SE Runtime Environment (build 1.9.0-ea-b63)
Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-ea-b63, mixed mode)

25) 100663296

real0m4.525s
user0m4.402s
sys0m1.189s

$ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C
java version "1.9.0-internal"
Java(TM) SE Runtime Environment (build 
1.9.0-internal-igerasim_2015_05_23_19_25-b00)
Java HotSpot(TM) 64-Bit Server VM (build 
1.9.0-internal-igerasim_2015_05_23_19_25-b00, mixed mode)


26) 201326592

real0m2.139s
user0m1.960s
sys0m0.461s

Sincerely yours,
Ivan

On 24.05.2015 23:17, Ivan Gerasimov wrote:

Hello everybody!

I know many people here like it when the performance is getting better.
It was suggested to make the literal variant of String.replace() 
faster.
Currently, this method is implemented as a few calls to regexp API, 
so that the whole implementation takes only two lines of code.


I've created two versions of the fix.
In the first one, we scan the string and store indices of the found 
substrings in an array.

Then, we allocate the precisely sized char array and fill it it.
The case with the empty target has to be handled separately.

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8058779
WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/00/webrev/

The second variant is much less verbose, however it's less efficient 
too.

Here the StringJoiner is used as an intermediate storage.

WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/01/webrev/


Here are the micro-benchmark results (in a string of ~300 chars do 
~15 replacements).

0) Baseline
MyBenchmark.test  thrpt   40  257'051.948 ± 4537.484  ops/s

1) Heavy-duty +308%
MyBenchmark.test  thrpt   40  1'049'235.602 ± 15501.803  ops/s

2) StringJoiner +190%
MyBenchmark.test  thrpt   40  746'000.629 ± 15387.036  ops/s


Personally, I like my first variant better, even though it adds 
almost 300 lines of code.

But I'd like to hear what people think of it.

Sincerely yours,
Ivan












Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)

2015-05-26 Thread Xueming Shen

Ivan,

It might be worth trying String.index + StringBuilder, instead of 
writing/handling everything yourself.
Yes, it inevitably adds an arraycopy at the end to convert the 
StrinbBuilder to String, but it might have
better balance between performance and code complexity. The regex is 
probably a little heavy for

literal string replacement, but StringBuilder should not be that bad ...

-Sherman

On 5/26/15 4:11 PM, Ivan Gerasimov wrote:

I updated the webrev:
http://cr.openjdk.java.net/~igerasim/8058779/02/webrev/

In the check at 2300-2301 and 2351-2352 I replaced MAX_ARRAY_SIZE with 
Integer.MAX_VALUE, which seems to be more accurate here.


And  I want to add that this proposed implementation is not only 
faster, but also more memory efficient.
The following simple stress-test shows that the proposed version is 
able to handle twice larger strings, comparing to the current 
implementation.



public class C {
public static void main(String[] args) throws Throwable {
String s = "string";
for (int i = 1; i < Integer.MAX_VALUE; ++i) {
try {
s = s.replace("string", "stringstring");
} catch (OutOfMemoryError o) {
System.out.println(i + ") " + s.length());
break;
}
}
}
}


$ time ~/java9/jdk/bin/java -showversion -Xmx1g C
java version "1.9.0-ea"
Java(TM) SE Runtime Environment (build 1.9.0-ea-b63)
Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-ea-b63, mixed mode)

25) 100663296

real0m4.525s
user0m4.402s
sys0m1.189s

$ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C
java version "1.9.0-internal"
Java(TM) SE Runtime Environment (build 
1.9.0-internal-igerasim_2015_05_23_19_25-b00)
Java HotSpot(TM) 64-Bit Server VM (build 
1.9.0-internal-igerasim_2015_05_23_19_25-b00, mixed mode)


26) 201326592

real0m2.139s
user0m1.960s
sys0m0.461s

Sincerely yours,
Ivan

On 24.05.2015 23:17, Ivan Gerasimov wrote:

Hello everybody!

I know many people here like it when the performance is getting better.
It was suggested to make the literal variant of String.replace() faster.
Currently, this method is implemented as a few calls to regexp API, 
so that the whole implementation takes only two lines of code.


I've created two versions of the fix.
In the first one, we scan the string and store indices of the found 
substrings in an array.

Then, we allocate the precisely sized char array and fill it it.
The case with the empty target has to be handled separately.

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8058779
WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/00/webrev/

The second variant is much less verbose, however it's less efficient 
too.

Here the StringJoiner is used as an intermediate storage.

WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/01/webrev/


Here are the micro-benchmark results (in a string of ~300 chars do 
~15 replacements).

0) Baseline
MyBenchmark.test  thrpt   40  257'051.948 ± 4537.484  ops/s

1) Heavy-duty +308%
MyBenchmark.test  thrpt   40  1'049'235.602 ± 15501.803  ops/s

2) StringJoiner +190%
MyBenchmark.test  thrpt   40  746'000.629 ± 15387.036  ops/s


Personally, I like my first variant better, even though it adds 
almost 300 lines of code.

But I'd like to hear what people think of it.

Sincerely yours,
Ivan








Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)

2015-05-26 Thread Ivan Gerasimov

I updated the webrev:
http://cr.openjdk.java.net/~igerasim/8058779/02/webrev/

In the check at 2300-2301 and 2351-2352 I replaced MAX_ARRAY_SIZE with 
Integer.MAX_VALUE, which seems to be more accurate here.


And  I want to add that this proposed implementation is not only faster, 
but also more memory efficient.
The following simple stress-test shows that the proposed version is able 
to handle twice larger strings, comparing to the current implementation.



public class C {
public static void main(String[] args) throws Throwable {
String s = "string";
for (int i = 1; i < Integer.MAX_VALUE; ++i) {
try {
s = s.replace("string", "stringstring");
} catch (OutOfMemoryError o) {
System.out.println(i + ") " + s.length());
break;
}
}
}
}


$ time ~/java9/jdk/bin/java -showversion -Xmx1g C
java version "1.9.0-ea"
Java(TM) SE Runtime Environment (build 1.9.0-ea-b63)
Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-ea-b63, mixed mode)

25) 100663296

real0m4.525s
user0m4.402s
sys0m1.189s

$ time ~/java9/jdk-build/bin/java -showversion -Xmx1g C
java version "1.9.0-internal"
Java(TM) SE Runtime Environment (build 
1.9.0-internal-igerasim_2015_05_23_19_25-b00)
Java HotSpot(TM) 64-Bit Server VM (build 
1.9.0-internal-igerasim_2015_05_23_19_25-b00, mixed mode)


26) 201326592

real0m2.139s
user0m1.960s
sys0m0.461s

Sincerely yours,
Ivan

On 24.05.2015 23:17, Ivan Gerasimov wrote:

Hello everybody!

I know many people here like it when the performance is getting better.
It was suggested to make the literal variant of String.replace() faster.
Currently, this method is implemented as a few calls to regexp API, so 
that the whole implementation takes only two lines of code.


I've created two versions of the fix.
In the first one, we scan the string and store indices of the found 
substrings in an array.

Then, we allocate the precisely sized char array and fill it it.
The case with the empty target has to be handled separately.

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8058779
WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/00/webrev/

The second variant is much less verbose, however it's less efficient too.
Here the StringJoiner is used as an intermediate storage.

WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/01/webrev/


Here are the micro-benchmark results (in a string of ~300 chars do ~15 
replacements).

0) Baseline
MyBenchmark.test  thrpt   40  257'051.948 ± 4537.484  ops/s

1) Heavy-duty +308%
MyBenchmark.test  thrpt   40  1'049'235.602 ± 15501.803  ops/s

2) StringJoiner +190%
MyBenchmark.test  thrpt   40  746'000.629 ± 15387.036  ops/s


Personally, I like my first variant better, even though it adds almost 
300 lines of code.

But I'd like to hear what people think of it.

Sincerely yours,
Ivan






Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)

2015-05-26 Thread Tomasz Kowalczewski
I know it was like that before :) I asked similar question about fixing
String.contains to not call toString on it's CharSequence argument (
http://mail.openjdk.java.net/pipermail/core-libs-dev/2015-March/032394.html).
Responses were generally encouraging.

Regards,
Tomasz

On Tue, May 26, 2015 at 7:39 PM, Ivan Gerasimov 
wrote:

>  But the current implementation converts the argument to String in the
> first place too:
>
> return Pattern.compile(*target.toString()*,
> Pattern.LITERAL).matcher(
> this).replaceAll(Matcher.quoteReplacement(
> *replacement.toString()*));
>
> So I didn't introduce this conversion :-)
>
> When browsing the source code, looking for usages of this method, I only
> saw Strings passed as the parameters.
> Thus, I guess it is reasonable to have the variant which works best in
> this case.
>
> Sincerely yours,
> Ivan
>
>
> On 26.05.2015 20:26, Tomasz Kowalczewski wrote:
>
> I second that. API accepts CharSequence for a reason. The benchmarks might
> look a little different if you pass StringBuilder in a call to replace. It
> looks like removing toString() call on CharSequences will be easy with one
> drawback of loosing the benefit of System.arraycopy call. Note that
> String.indexOf requires String argument but that can also be fixed.
>
> --
> Tomasz
>
> On Tue, May 26, 2015 at 7:17 PM, Xueming Shen  
> 
> wrote:
>
>
>  On 5/26/15 10:08 AM, Ivan Gerasimov wrote:
>
>
>  Thank you Roger for looking at this!
>
> On 26.05.2015 19:40, Roger Riggs wrote:
>
>
>  Hi Ivan,
>
> Did you consider doing the optimization inside of
> Pattern.compile/replaceAll.
> That would have a broader impact and benefit.
>
>
>
>  In Pattern.compile the flag LITERAL can be combined with several other
> flags, which makes things more complex.
> However, in String.replace we've got a special case, which can be
> optimized in a straight-forward (though a little bit verbose) way.
>
>
>  It probably makes big difference to access those characters via
> CharSequence.charAt()
> and the direct internal char[] access. Though avoiding allocating those
> objects created
> by Pattern/Matcher/SB definitely helps.
>
> -Sherman
>
>
>
>
>  Sincerely yours,
> Ivan
>
>  Roger
>
>
> On 5/26/2015 12:36 PM, Ivan Gerasimov wrote:
>
>
>  Thank you Mark for looking at this!
>
> On 26.05.2015 18:39, mark.reinh...@oracle.com wrote:
>
>
>  Your micro-benchmark improvements are significant, but do you have
> evidence to suggest that the performance of this method is actually
> critical to real applications?
>
> In other words, is the added code complexity really worth it?
>
>
>  The enhancement request contains a few links to the discussions of this
> method's performance at open forums.
> The most frequent suggestion is to use alternatives from 3rd party
> libraries.
>
> That should prove the benefits of this fix -- by improving performance
> we can keep some users from moving away from JDK :)
>
> grep shows that langtools would also benefit from making replace()
> faster.
>
> Sincerely yours,
> Ivan
>
>  - Mark
>
>
>
>
>


-- 
Tomasz Kowalczewski


Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)

2015-05-26 Thread Ivan Gerasimov
But the current implementation converts the argument to String in the 
first place too:


return Pattern.compile(*target.toString()*, 
Pattern.LITERAL).matcher(

this).replaceAll(Matcher.quoteReplacement(*replacement.toString()*));


So I didn't introduce this conversion :-)

When browsing the source code, looking for usages of this method, I only 
saw Strings passed as the parameters.
Thus, I guess it is reasonable to have the variant which works best in 
this case.


Sincerely yours,
Ivan

On 26.05.2015 20:26, Tomasz Kowalczewski wrote:

I second that. API accepts CharSequence for a reason. The benchmarks might
look a little different if you pass StringBuilder in a call to replace. It
looks like removing toString() call on CharSequences will be easy with one
drawback of loosing the benefit of System.arraycopy call. Note that
String.indexOf requires String argument but that can also be fixed.

--
Tomasz

On Tue, May 26, 2015 at 7:17 PM, Xueming Shen 
wrote:


On 5/26/15 10:08 AM, Ivan Gerasimov wrote:


Thank you Roger for looking at this!

On 26.05.2015 19:40, Roger Riggs wrote:


Hi Ivan,

Did you consider doing the optimization inside of
Pattern.compile/replaceAll.
That would have a broader impact and benefit.



In Pattern.compile the flag LITERAL can be combined with several other
flags, which makes things more complex.
However, in String.replace we've got a special case, which can be
optimized in a straight-forward (though a little bit verbose) way.


It probably makes big difference to access those characters via
CharSequence.charAt()
and the direct internal char[] access. Though avoiding allocating those
objects created
by Pattern/Matcher/SB definitely helps.

-Sherman




Sincerely yours,
Ivan

  Roger


On 5/26/2015 12:36 PM, Ivan Gerasimov wrote:


Thank you Mark for looking at this!

On 26.05.2015 18:39, mark.reinh...@oracle.com wrote:


Your micro-benchmark improvements are significant, but do you have
evidence to suggest that the performance of this method is actually
critical to real applications?

In other words, is the added code complexity really worth it?


The enhancement request contains a few links to the discussions of this
method's performance at open forums.
The most frequent suggestion is to use alternatives from 3rd party
libraries.

That should prove the benefits of this fix -- by improving performance
we can keep some users from moving away from JDK :)

grep shows that langtools would also benefit from making replace()
faster.

Sincerely yours,
Ivan

  - Mark












Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)

2015-05-26 Thread Tomasz Kowalczewski
I second that. API accepts CharSequence for a reason. The benchmarks might
look a little different if you pass StringBuilder in a call to replace. It
looks like removing toString() call on CharSequences will be easy with one
drawback of loosing the benefit of System.arraycopy call. Note that
String.indexOf requires String argument but that can also be fixed.

--
Tomasz

On Tue, May 26, 2015 at 7:17 PM, Xueming Shen 
wrote:

> On 5/26/15 10:08 AM, Ivan Gerasimov wrote:
>
>> Thank you Roger for looking at this!
>>
>> On 26.05.2015 19:40, Roger Riggs wrote:
>>
>>> Hi Ivan,
>>>
>>> Did you consider doing the optimization inside of
>>> Pattern.compile/replaceAll.
>>> That would have a broader impact and benefit.
>>>
>>>
>> In Pattern.compile the flag LITERAL can be combined with several other
>> flags, which makes things more complex.
>> However, in String.replace we've got a special case, which can be
>> optimized in a straight-forward (though a little bit verbose) way.
>>
>
> It probably makes big difference to access those characters via
> CharSequence.charAt()
> and the direct internal char[] access. Though avoiding allocating those
> objects created
> by Pattern/Matcher/SB definitely helps.
>
> -Sherman
>
>
>
>> Sincerely yours,
>> Ivan
>>
>>  Roger
>>>
>>>
>>> On 5/26/2015 12:36 PM, Ivan Gerasimov wrote:
>>>
 Thank you Mark for looking at this!

 On 26.05.2015 18:39, mark.reinh...@oracle.com wrote:

> Your micro-benchmark improvements are significant, but do you have
> evidence to suggest that the performance of this method is actually
> critical to real applications?
>
> In other words, is the added code complexity really worth it?
>

 The enhancement request contains a few links to the discussions of this
 method's performance at open forums.
 The most frequent suggestion is to use alternatives from 3rd party
 libraries.

 That should prove the benefits of this fix -- by improving performance
 we can keep some users from moving away from JDK :)

 grep shows that langtools would also benefit from making replace()
 faster.

 Sincerely yours,
 Ivan

  - Mark
>
>
>

>>>
>>>
>>>
>>
>


-- 
Tomasz Kowalczewski


Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)

2015-05-26 Thread Xueming Shen

On 5/26/15 10:08 AM, Ivan Gerasimov wrote:

Thank you Roger for looking at this!

On 26.05.2015 19:40, Roger Riggs wrote:

Hi Ivan,

Did you consider doing the optimization inside of 
Pattern.compile/replaceAll.

That would have a broader impact and benefit.



In Pattern.compile the flag LITERAL can be combined with several other 
flags, which makes things more complex.
However, in String.replace we've got a special case, which can be 
optimized in a straight-forward (though a little bit verbose) way.


It probably makes big difference to access those characters via 
CharSequence.charAt()
and the direct internal char[] access. Though avoiding allocating those 
objects created

by Pattern/Matcher/SB definitely helps.

-Sherman



Sincerely yours,
Ivan


Roger


On 5/26/2015 12:36 PM, Ivan Gerasimov wrote:

Thank you Mark for looking at this!

On 26.05.2015 18:39, mark.reinh...@oracle.com wrote:

Your micro-benchmark improvements are significant, but do you have
evidence to suggest that the performance of this method is actually
critical to real applications?

In other words, is the added code complexity really worth it?


The enhancement request contains a few links to the discussions of 
this method's performance at open forums.
The most frequent suggestion is to use alternatives from 3rd party 
libraries.


That should prove the benefits of this fix -- by improving 
performance we can keep some users from moving away from JDK :)


grep shows that langtools would also benefit from making replace() 
faster.


Sincerely yours,
Ivan


- Mark














Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)

2015-05-26 Thread Ivan Gerasimov

Thank you Roger for looking at this!

On 26.05.2015 19:40, Roger Riggs wrote:

Hi Ivan,

Did you consider doing the optimization inside of 
Pattern.compile/replaceAll.

That would have a broader impact and benefit.



In Pattern.compile the flag LITERAL can be combined with several other 
flags, which makes things more complex.
However, in String.replace we've got a special case, which can be 
optimized in a straight-forward (though a little bit verbose) way.


Sincerely yours,
Ivan


Roger


On 5/26/2015 12:36 PM, Ivan Gerasimov wrote:

Thank you Mark for looking at this!

On 26.05.2015 18:39, mark.reinh...@oracle.com wrote:

Your micro-benchmark improvements are significant, but do you have
evidence to suggest that the performance of this method is actually
critical to real applications?

In other words, is the added code complexity really worth it?


The enhancement request contains a few links to the discussions of 
this method's performance at open forums.
The most frequent suggestion is to use alternatives from 3rd party 
libraries.


That should prove the benefits of this fix -- by improving 
performance we can keep some users from moving away from JDK :)


grep shows that langtools would also benefit from making replace() 
faster.


Sincerely yours,
Ivan


- Mark












Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)

2015-05-26 Thread Roger Riggs

Hi Ivan,

Did you consider doing the optimization inside of 
Pattern.compile/replaceAll.

That would have a broader impact and benefit.

Roger


On 5/26/2015 12:36 PM, Ivan Gerasimov wrote:

Thank you Mark for looking at this!

On 26.05.2015 18:39, mark.reinh...@oracle.com wrote:

Your micro-benchmark improvements are significant, but do you have
evidence to suggest that the performance of this method is actually
critical to real applications?

In other words, is the added code complexity really worth it?


The enhancement request contains a few links to the discussions of 
this method's performance at open forums.
The most frequent suggestion is to use alternatives from 3rd party 
libraries.


That should prove the benefits of this fix -- by improving performance 
we can keep some users from moving away from JDK :)


grep shows that langtools would also benefit from making replace() 
faster.


Sincerely yours,
Ivan


- Mark








Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)

2015-05-26 Thread Ivan Gerasimov

Thank you Mark for looking at this!

On 26.05.2015 18:39, mark.reinh...@oracle.com wrote:

Your micro-benchmark improvements are significant, but do you have
evidence to suggest that the performance of this method is actually
critical to real applications?

In other words, is the added code complexity really worth it?


The enhancement request contains a few links to the discussions of this 
method's performance at open forums.
The most frequent suggestion is to use alternatives from 3rd party 
libraries.


That should prove the benefits of this fix -- by improving performance 
we can keep some users from moving away from JDK :)


grep shows that langtools would also benefit from making replace() faster.

Sincerely yours,
Ivan


- Mark






Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)

2015-05-26 Thread mark . reinhold
Your micro-benchmark improvements are significant, but do you have
evidence to suggest that the performance of this method is actually
critical to real applications?

In other words, is the added code complexity really worth it?

- Mark


Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)

2015-05-24 Thread Ivan Gerasimov

Hi Rémi!

On 25.05.2015 0:23, Remi Forax wrote:

Hi Ivan,
I will let people to comment on your first version but your second 
version uses a Stream and some lambdas which is in my opinion not a 
good idea in term of dependencies (Stream will be loaded too soon by 
example) and it may create some trouble in the future because lambdas 
depends on some classes of java.lang.invoke that are initialized late 
in the boot process while java.lang.String is initialized early.



Good point. It could be rewritten using anonymous class instead.
Though, I still like the first version better :-)

Sincerely yours,
Ivan


cheers,
Rémi

On 05/24/2015 10:17 PM, Ivan Gerasimov wrote:

Hello everybody!

I know many people here like it when the performance is getting better.
It was suggested to make the literal variant of String.replace() faster.
Currently, this method is implemented as a few calls to regexp API, 
so that the whole implementation takes only two lines of code.


I've created two versions of the fix.
In the first one, we scan the string and store indices of the found 
substrings in an array.

Then, we allocate the precisely sized char array and fill it it.
The case with the empty target has to be handled separately.

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8058779
WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/00/webrev/

The second variant is much less verbose, however it's less efficient 
too.

Here the StringJoiner is used as an intermediate storage.

WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/01/webrev/


Here are the micro-benchmark results (in a string of ~300 chars do 
~15 replacements).

0) Baseline
MyBenchmark.test  thrpt   40  257'051.948 ± 4537.484  ops/s

1) Heavy-duty +308%
MyBenchmark.test  thrpt   40  1'049'235.602 ± 15501.803  ops/s

2) StringJoiner +190%
MyBenchmark.test  thrpt   40  746'000.629 ± 15387.036  ops/s


Personally, I like my first variant better, even though it adds 
almost 300 lines of code.

But I'd like to hear what people think of it.

Sincerely yours,
Ivan








Re: 8058779: Faster implementation of String.replace(CharSequence, CharSequence)

2015-05-24 Thread Remi Forax

Hi Ivan,
I will let people to comment on your first version but your second 
version uses a Stream and some lambdas which is in my opinion not a good 
idea in term of dependencies (Stream will be loaded too soon by example) 
and it may create some trouble in the future because lambdas depends on 
some classes of java.lang.invoke that are initialized late in the boot 
process while java.lang.String is initialized early.


cheers,
Rémi

On 05/24/2015 10:17 PM, Ivan Gerasimov wrote:

Hello everybody!

I know many people here like it when the performance is getting better.
It was suggested to make the literal variant of String.replace() faster.
Currently, this method is implemented as a few calls to regexp API, so 
that the whole implementation takes only two lines of code.


I've created two versions of the fix.
In the first one, we scan the string and store indices of the found 
substrings in an array.

Then, we allocate the precisely sized char array and fill it it.
The case with the empty target has to be handled separately.

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8058779
WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/00/webrev/

The second variant is much less verbose, however it's less efficient too.
Here the StringJoiner is used as an intermediate storage.

WEBREV: http://cr.openjdk.java.net/~igerasim/8058779/01/webrev/


Here are the micro-benchmark results (in a string of ~300 chars do ~15 
replacements).

0) Baseline
MyBenchmark.test  thrpt   40  257'051.948 ± 4537.484  ops/s

1) Heavy-duty +308%
MyBenchmark.test  thrpt   40  1'049'235.602 ± 15501.803  ops/s

2) StringJoiner +190%
MyBenchmark.test  thrpt   40  746'000.629 ± 15387.036  ops/s


Personally, I like my first variant better, even though it adds almost 
300 lines of code.

But I'd like to hear what people think of it.

Sincerely yours,
Ivan