Re: RFR 8170155: StringBuffer and StringBuilder stream methods are not late-binding

2016-12-01 Thread Xueming Shen

On 12/01/2016 12:05 PM, Paul Sandoz wrote:

On 1 Dec 2016, at 11:17, Xueming Shen  wrote:

On 11/28/2016 11:27 AM, Paul Sandoz wrote:

On 25 Nov 2016, at 02:47, Tobias Hartmann   wrote:


I'm not sure if it is still desired to do the same boundary check in case of 
LATIN1
for the benefit of consistency.  Assume there might be concurrent 
access/operation
between val = this.value and count = this.count; (for StringBuilder) for 
example,
the this.value got doubled and the this.count got increased. One will end up 
with
StringIndexOutOfBoundsException() from checkOffset, but the other will be ioobe
from vm?

You mean when hitting a check in an intrinsic compared to when hitting a check 
in the Java wrapper?


Not quite. There is a spliterator implementation for each coder, in the case of 
the LATIN1 coder there are no associated intrinsics. I think it’s ok just to 
perform the explicit bounds check for the UTF16 coder, since for the LATIN1 
bounds checks will be performed by baloads.

However, i think there is bug. The coder could change from LATIN1 to UTF16, 
while holding a reference to a byte[] value as LATIN1.

For StringBuilder i could fix that by atomically reading the value/count/coder, 
but there is still an issue with StringBuffer. Thus, in the UTF16 coder 
spliterator we need to perform the explicit bounds before *every* 
StringUTF16.getChar().

Webrev updated:

   
http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8170155-string-buffer-builder-late-binding/webrev/

Paul.


Hi Paul,

Seems like we do have a bug here. But I think to use "charAt()" might not be the
desired/correct approach here?


I think the patch is correct for both StringBuffer and StringBuilder.

It is specified that buffer must remain constant during execution of the stream 
terminal operation otherwise the results are undefined. Such an undefined 
result might be the throwing of an IndexOutOfBounds.



Ok, if the spec does not require to guarantee the correctness in , then the
patch is fine. It might be a small surprise  for some of the StringBuffer users
with the false assumption that StringBuffer is thread-safe so the returned
stream is kinda thread-safe as well.

I'm fine with either way you think is appropriate from "stream" point of
view.

Sherman





For StringBuffer, since we have to guarantee the correctness in concurrent use 
scenario,
I would assume we need a synchronized version of getSupplier(...) to return a 
late binding
and thread-safe Supplier in StringBuffer, similar to "getBytes(...)" (we have a 
un-synced
version of getBytes() in AbstractStringBuilder and a synced version at the 
bottom of the
StringBuffer.java). This should be for the latin1 case as well.


I did ponder that, and moving the stream returning methods into the concrete 
builder classes. I opted for the sneakier approach :-) what do you prefer?



For StringBuilder, since the only concern is the bounds check, I think the 
existing
checkOffset(...) should be good enough, even in case the coder changes from 
latin1
to utf16, as the checkOffset(...) checks the "count" against "val.length>>  
coder" on
the local copy,  the getChar() access after that should be safe (though might 
not be
correct, if it's a utf16 coder on a latin1 byte[]).


Yes, i was being very conservative (I mistakenly thought the original code read 
the coder field twice).

If we go with the single wider bounds check i think we should place it in the a 
root spliterator constructor if possible.

Paul.




Re: RFR 8170155: StringBuffer and StringBuilder stream methods are not late-binding

2016-12-01 Thread Paul Sandoz

> On 1 Dec 2016, at 11:17, Xueming Shen  wrote:
> 
> On 11/28/2016 11:27 AM, Paul Sandoz wrote:
>>> On 25 Nov 2016, at 02:47, Tobias Hartmann  
>>> wrote:
>>> 
 I'm not sure if it is still desired to do the same boundary check in case 
 of LATIN1
 for the benefit of consistency.  Assume there might be concurrent 
 access/operation
 between val = this.value and count = this.count; (for StringBuilder) for 
 example,
 the this.value got doubled and the this.count got increased. One will end 
 up with
 StringIndexOutOfBoundsException() from checkOffset, but the other will be 
 ioobe
 from vm?
>>> You mean when hitting a check in an intrinsic compared to when hitting a 
>>> check in the Java wrapper?
>>> 
>> Not quite. There is a spliterator implementation for each coder, in the case 
>> of the LATIN1 coder there are no associated intrinsics. I think it’s ok just 
>> to perform the explicit bounds check for the UTF16 coder, since for the 
>> LATIN1 bounds checks will be performed by baloads.
>> 
>> However, i think there is bug. The coder could change from LATIN1 to UTF16, 
>> while holding a reference to a byte[] value as LATIN1.
>> 
>> For StringBuilder i could fix that by atomically reading the 
>> value/count/coder, but there is still an issue with StringBuffer. Thus, in 
>> the UTF16 coder spliterator we need to perform the explicit bounds before 
>> *every* StringUTF16.getChar().
>> 
>> Webrev updated:
>> 
>>   
>> http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8170155-string-buffer-builder-late-binding/webrev/
>> 
>> Paul.
>> 
> 
> Hi Paul,
> 
> Seems like we do have a bug here. But I think to use "charAt()" might not be 
> the
> desired/correct approach here?
> 

I think the patch is correct for both StringBuffer and StringBuilder.

It is specified that buffer must remain constant during execution of the stream 
terminal operation otherwise the results are undefined. Such an undefined 
result might be the throwing of an IndexOutOfBounds.


> For StringBuffer, since we have to guarantee the correctness in concurrent 
> use scenario,
> I would assume we need a synchronized version of getSupplier(...) to return a 
> late binding
> and thread-safe Supplier in StringBuffer, similar to "getBytes(...)" (we have 
> a un-synced
> version of getBytes() in AbstractStringBuilder and a synced version at the 
> bottom of the
> StringBuffer.java). This should be for the latin1 case as well.
> 

I did ponder that, and moving the stream returning methods into the concrete 
builder classes. I opted for the sneakier approach :-) what do you prefer?


> For StringBuilder, since the only concern is the bounds check, I think the 
> existing
> checkOffset(...) should be good enough, even in case the coder changes from 
> latin1
> to utf16, as the checkOffset(...) checks the "count" against "val.length >> 
> coder" on
> the local copy,  the getChar() access after that should be safe (though might 
> not be
> correct, if it's a utf16 coder on a latin1 byte[]).
> 

Yes, i was being very conservative (I mistakenly thought the original code read 
the coder field twice).

If we go with the single wider bounds check i think we should place it in the a 
root spliterator constructor if possible.

Paul.


Re: RFR 8170155: StringBuffer and StringBuilder stream methods are not late-binding

2016-12-01 Thread Xueming Shen

On 11/28/2016 11:27 AM, Paul Sandoz wrote:

On 25 Nov 2016, at 02:47, Tobias Hartmann  wrote:


I'm not sure if it is still desired to do the same boundary check in case of 
LATIN1
for the benefit of consistency.  Assume there might be concurrent 
access/operation
between val = this.value and count = this.count; (for StringBuilder) for 
example,
the this.value got doubled and the this.count got increased. One will end up 
with
StringIndexOutOfBoundsException() from checkOffset, but the other will be ioobe
from vm?

You mean when hitting a check in an intrinsic compared to when hitting a check 
in the Java wrapper?


Not quite. There is a spliterator implementation for each coder, in the case of 
the LATIN1 coder there are no associated intrinsics. I think it’s ok just to 
perform the explicit bounds check for the UTF16 coder, since for the LATIN1 
bounds checks will be performed by baloads.

However, i think there is bug. The coder could change from LATIN1 to UTF16, 
while holding a reference to a byte[] value as LATIN1.

For StringBuilder i could fix that by atomically reading the value/count/coder, 
but there is still an issue with StringBuffer. Thus, in the UTF16 coder 
spliterator we need to perform the explicit bounds before *every* 
StringUTF16.getChar().

Webrev updated:

   
http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8170155-string-buffer-builder-late-binding/webrev/

Paul.



Hi Paul,

Seems like we do have a bug here. But I think to use "charAt()" might not be the
desired/correct approach here?

For StringBuffer, since we have to guarantee the correctness in concurrent use 
scenario,
I would assume we need a synchronized version of getSupplier(...) to return a 
late binding
and thread-safe Supplier in StringBuffer, similar to "getBytes(...)" (we have a 
un-synced
version of getBytes() in AbstractStringBuilder and a synced version at the 
bottom of the
StringBuffer.java). This should be for the latin1 case as well.

For StringBuilder, since the only concern is the bounds check, I think the 
existing
checkOffset(...) should be good enough, even in case the coder changes from 
latin1
to utf16, as the checkOffset(...) checks the "count" against "val.length >> 
coder" on
the local copy,  the getChar() access after that should be safe (though might 
not be
correct, if it's a utf16 coder on a latin1 byte[]).

Sherman



Actually, bound checks should always be done in the Java wrapper method that 
calls the (unchecked) intrinsic. In general, the unchecked intrinsic should not 
be called directly. StringUTF16.putChar/getChar() is an exception because there 
are places where we we need to omit the check for performance reasons.

I'm planning to revisit this with JDK-8156534 in JDK 10.

Best regards,
Tobias


Sherman



If so i propose the following to make this clearer:

return StreamSupport.intStream(
 () ->   {
 byte[] val = this.value; int count = this.count;
 if (coder == LATIN1) {
 return new StringLatin1.CharsSpliterator(val, 0, count, 0);
 } else {
 // Perform an explicit bounds check since HotSpot
 // intrinsics to access UTF-16 characters in the byte[]
 // array will not perform bounds checks
 checkOffset(count, val.length>>   coder);
 return new StringUTF16.CharsSpliterator(val, 0, count, 0);
 }
 },
 Spliterator.ORDERED | Spliterator.SIZED | Spliterator.SUBSIZED,
 false);

Paul.




Re: RFR 8170155: StringBuffer and StringBuilder stream methods are not late-binding

2016-11-28 Thread Paul Sandoz

> On 25 Nov 2016, at 02:47, Tobias Hartmann  wrote:
> 
>> I'm not sure if it is still desired to do the same boundary check in case of 
>> LATIN1
>> for the benefit of consistency.  Assume there might be concurrent 
>> access/operation
>> between val = this.value and count = this.count; (for StringBuilder) for 
>> example,
>> the this.value got doubled and the this.count got increased. One will end up 
>> with
>> StringIndexOutOfBoundsException() from checkOffset, but the other will be 
>> ioobe
>> from vm?
> 
> You mean when hitting a check in an intrinsic compared to when hitting a 
> check in the Java wrapper?
> 

Not quite. There is a spliterator implementation for each coder, in the case of 
the LATIN1 coder there are no associated intrinsics. I think it’s ok just to 
perform the explicit bounds check for the UTF16 coder, since for the LATIN1 
bounds checks will be performed by baloads.

However, i think there is bug. The coder could change from LATIN1 to UTF16, 
while holding a reference to a byte[] value as LATIN1.

For StringBuilder i could fix that by atomically reading the value/count/coder, 
but there is still an issue with StringBuffer. Thus, in the UTF16 coder 
spliterator we need to perform the explicit bounds before *every* 
StringUTF16.getChar().

Webrev updated:

  
http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8170155-string-buffer-builder-late-binding/webrev/

Paul.

> Actually, bound checks should always be done in the Java wrapper method that 
> calls the (unchecked) intrinsic. In general, the unchecked intrinsic should 
> not be called directly. StringUTF16.putChar/getChar() is an exception because 
> there are places where we we need to omit the check for performance reasons.
> 
> I'm planning to revisit this with JDK-8156534 in JDK 10.
> 
> Best regards,
> Tobias
> 
>> 
>> Sherman
>> 
>> 
>>> If so i propose the following to make this clearer:
>>> 
>>> return StreamSupport.intStream(
>>> () ->  {
>>> byte[] val = this.value; int count = this.count;
>>> if (coder == LATIN1) {
>>> return new StringLatin1.CharsSpliterator(val, 0, count, 0);
>>> } else {
>>> // Perform an explicit bounds check since HotSpot
>>> // intrinsics to access UTF-16 characters in the byte[]
>>> // array will not perform bounds checks
>>> checkOffset(count, val.length>>  coder);
>>> return new StringUTF16.CharsSpliterator(val, 0, count, 0);
>>> }
>>> },
>>> Spliterator.ORDERED | Spliterator.SIZED | Spliterator.SUBSIZED,
>>> false);
>>> 
>>> Paul.
>> 



Re: RFR 8170155: StringBuffer and StringBuilder stream methods are not late-binding

2016-11-25 Thread Tobias Hartmann
Hi,

On 24.11.2016 00:56, Xueming Shen wrote:
> On 11/23/2016 02:39 PM, Paul Sandoz wrote:
>> Hi,
>>
>> Please review:
>>
>>
>> http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8170155-string-buffer-builder-late-binding/webrev/
>>
>> This patch:
>>
>> 1) updates the StringBuilder/StringBuffer.chars()/codePoints() so they are 
>> late binding.
>>
>> 2) refactors the spliterator late binding and fail fast tests, separating 
>> them out, and to the former additional tests are added for CharSequence 
>> implementations and BitSet.
>>
>> The code in AbstractStringBuilder has the following bounds check call to 
>> checkOffset:
>>
>> 1558 return StreamSupport.intStream(
>> 1559 () ->  {
>> 1560 byte[] val = this.value; int count = this.count;
>> 1561 checkOffset(count, val.length>>  coder);
>> 1562 return coder == LATIN1
>> 1563? new StringLatin1.CharsSpliterator(val, 0, 
>> count, 0)
>> 1564: new StringUTF16.CharsSpliterator(val, 0, 
>> count, 0);
>> 1565 },
>> 1566 Spliterator.ORDERED | Spliterator.SIZED | 
>> Spliterator.SUBSIZED,
>> 1567 false);
>>
>> (Separately checkOffset could be replaced with the internal 
>> Preconditions.checkIndex.)

Yes, if possible, bound checks should be replaced by the intrinsified 
Preconditions.check() (see https://bugs.openjdk.java.net/browse/JDK-8042997).

>> On initial inspection that should be a no-op, but talking to Sherman we 
>> believe it is required for UTF-16 access, since the intrinsics do not 
>> perform bounds checks, so they need to be guarded (for conformance purposes 
>> if a no-op).

If an out of bounds access is possible when using 
StringUTF16.putChar/getChar(), the check in the Java code is required because 
the intrinsic does not implement any bound checks (see below).

> One of the purposes of having StringUTF16.putChar() (other than the charAt())
> is to avoid the expensive boundary check on each/every char access. I'm 
> forwarding
> the email to Tobias to make sure it's still the case on hotspot side.

Yes, this is still correct. For performance reasons, the intrinsic for 
StringUTF16.putChar/getChar() has no boundary checks (see 
LibraryCallKit::inline_string_char_access()). It should *only* be used without 
a bounds check in the Java code if the index is guaranteed to be always in 
bounds.

> I'm not sure if it is still desired to do the same boundary check in case of 
> LATIN1
> for the benefit of consistency.  Assume there might be concurrent 
> access/operation
> between val = this.value and count = this.count; (for StringBuilder) for 
> example,
> the this.value got doubled and the this.count got increased. One will end up 
> with
> StringIndexOutOfBoundsException() from checkOffset, but the other will be 
> ioobe
> from vm?

You mean when hitting a check in an intrinsic compared to when hitting a check 
in the Java wrapper?

Actually, bound checks should always be done in the Java wrapper method that 
calls the (unchecked) intrinsic. In general, the unchecked intrinsic should not 
be called directly. StringUTF16.putChar/getChar() is an exception because there 
are places where we we need to omit the check for performance reasons.

I'm planning to revisit this with JDK-8156534 in JDK 10.

Best regards,
Tobias

> 
> Sherman
> 
> 
>> If so i propose the following to make this clearer:
>>
>> return StreamSupport.intStream(
>>  () ->  {
>>  byte[] val = this.value; int count = this.count;
>>  if (coder == LATIN1) {
>>  return new StringLatin1.CharsSpliterator(val, 0, count, 0);
>>  } else {
>>  // Perform an explicit bounds check since HotSpot
>>  // intrinsics to access UTF-16 characters in the byte[]
>>  // array will not perform bounds checks
>>  checkOffset(count, val.length>>  coder);
>>  return new StringUTF16.CharsSpliterator(val, 0, count, 0);
>>  }
>>  },
>>  Spliterator.ORDERED | Spliterator.SIZED | Spliterator.SUBSIZED,
>>  false);
>>
>> Paul.
> 


Re: RFR 8170155: StringBuffer and StringBuilder stream methods are not late-binding

2016-11-23 Thread Xueming Shen

On 11/23/2016 02:39 PM, Paul Sandoz wrote:

Hi,

Please review:

   
http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8170155-string-buffer-builder-late-binding/webrev/

This patch:

1) updates the StringBuilder/StringBuffer.chars()/codePoints() so they are late 
binding.

2) refactors the spliterator late binding and fail fast tests, separating them 
out, and to the former additional tests are added for CharSequence 
implementations and BitSet.

The code in AbstractStringBuilder has the following bounds check call to 
checkOffset:

1558 return StreamSupport.intStream(
1559 () ->  {
1560 byte[] val = this.value; int count = this.count;
1561 checkOffset(count, val.length>>  coder);
1562 return coder == LATIN1
1563? new StringLatin1.CharsSpliterator(val, 0, 
count, 0)
1564: new StringUTF16.CharsSpliterator(val, 0, 
count, 0);
1565 },
1566 Spliterator.ORDERED | Spliterator.SIZED | 
Spliterator.SUBSIZED,
1567 false);

(Separately checkOffset could be replaced with the internal 
Preconditions.checkIndex.)

On initial inspection that should be a no-op, but talking to Sherman we believe 
it is required for UTF-16 access, since the intrinsics do not perform bounds 
checks, so they need to be guarded (for conformance purposes if a no-op).


One of the purposes of having StringUTF16.putChar() (other than the charAt())
is to avoid the expensive boundary check on each/every char access. I'm 
forwarding
the email to Tobias to make sure it's still the case on hotspot side.

I'm not sure if it is still desired to do the same boundary check in case of 
LATIN1
for the benefit of consistency.  Assume there might be concurrent 
access/operation
between val = this.value and count = this.count; (for StringBuilder) for 
example,
the this.value got doubled and the this.count got increased. One will end up 
with
StringIndexOutOfBoundsException() from checkOffset, but the other will be ioobe
from vm?

Sherman



If so i propose the following to make this clearer:

return StreamSupport.intStream(
 () ->  {
 byte[] val = this.value; int count = this.count;
 if (coder == LATIN1) {
 return new StringLatin1.CharsSpliterator(val, 0, count, 0);
 } else {
 // Perform an explicit bounds check since HotSpot
 // intrinsics to access UTF-16 characters in the byte[]
 // array will not perform bounds checks
 checkOffset(count, val.length>>  coder);
 return new StringUTF16.CharsSpliterator(val, 0, count, 0);
 }
 },
 Spliterator.ORDERED | Spliterator.SIZED | Spliterator.SUBSIZED,
 false);

Paul.




RFR 8170155: StringBuffer and StringBuilder stream methods are not late-binding

2016-11-23 Thread Paul Sandoz
Hi,

Please review:

  
http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8170155-string-buffer-builder-late-binding/webrev/

This patch:

1) updates the StringBuilder/StringBuffer.chars()/codePoints() so they are late 
binding.

2) refactors the spliterator late binding and fail fast tests, separating them 
out, and to the former additional tests are added for CharSequence 
implementations and BitSet.

The code in AbstractStringBuilder has the following bounds check call to 
checkOffset:

1558 return StreamSupport.intStream(
1559 () -> {
1560 byte[] val = this.value; int count = this.count;
1561 checkOffset(count, val.length >> coder);
1562 return coder == LATIN1
1563? new StringLatin1.CharsSpliterator(val, 0, 
count, 0)
1564: new StringUTF16.CharsSpliterator(val, 0, 
count, 0);
1565 },
1566 Spliterator.ORDERED | Spliterator.SIZED | 
Spliterator.SUBSIZED,
1567 false);

(Separately checkOffset could be replaced with the internal 
Preconditions.checkIndex.)

On initial inspection that should be a no-op, but talking to Sherman we believe 
it is required for UTF-16 access, since the intrinsics do not perform bounds 
checks, so they need to be guarded (for conformance purposes if a no-op).

If so i propose the following to make this clearer:

return StreamSupport.intStream(
() -> {
byte[] val = this.value; int count = this.count;
if (coder == LATIN1) {
return new StringLatin1.CharsSpliterator(val, 0, count, 0);
} else {
// Perform an explicit bounds check since HotSpot
// intrinsics to access UTF-16 characters in the byte[]
// array will not perform bounds checks
checkOffset(count, val.length >> coder);
return new StringUTF16.CharsSpliterator(val, 0, count, 0);
}
},
Spliterator.ORDERED | Spliterator.SIZED | Spliterator.SUBSIZED,
false);

Paul.