On 11/28/2016 11:27 AM, Paul Sandoz wrote:
On 25 Nov 2016, at 02:47, Tobias Hartmann<tobias.hartm...@oracle.com>  wrote:

I'm not sure if it is still desired to do the same boundary check in case of 
LATIN1
for the benefit of consistency.  Assume there might be concurrent 
access/operation
between val = this.value and count = this.count; (for StringBuilder) for 
example,
the this.value got doubled and the this.count got increased. One will end up 
with
StringIndexOutOfBoundsException() from checkOffset, but the other will be ioobe
from vm?
You mean when hitting a check in an intrinsic compared to when hitting a check 
in the Java wrapper?

Not quite. There is a spliterator implementation for each coder, in the case of 
the LATIN1 coder there are no associated intrinsics. I think it’s ok just to 
perform the explicit bounds check for the UTF16 coder, since for the LATIN1 
bounds checks will be performed by baloads.

However, i think there is bug. The coder could change from LATIN1 to UTF16, 
while holding a reference to a byte[] value as LATIN1.

For StringBuilder i could fix that by atomically reading the value/count/coder, 
but there is still an issue with StringBuffer. Thus, in the UTF16 coder 
spliterator we need to perform the explicit bounds before *every* 
StringUTF16.getChar().

Webrev updated:

   
http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8170155-string-buffer-builder-late-binding/webrev/

Paul.


Hi Paul,

Seems like we do have a bug here. But I think to use "charAt()" might not be the
desired/correct approach here?

For StringBuffer, since we have to guarantee the correctness in concurrent use 
scenario,
I would assume we need a synchronized version of getSupplier(...) to return a 
late binding
and thread-safe Supplier in StringBuffer, similar to "getBytes(...)" (we have a 
un-synced
version of getBytes() in AbstractStringBuilder and a synced version at the 
bottom of the
StringBuffer.java). This should be for the latin1 case as well.

For StringBuilder, since the only concern is the bounds check, I think the 
existing
checkOffset(...) should be good enough, even in case the coder changes from 
latin1
to utf16, as the checkOffset(...) checks the "count" against "val.length >> 
coder" on
the local copy,  the getChar() access after that should be safe (though might 
not be
correct, if it's a utf16 coder on a latin1 byte[]).

Sherman


Actually, bound checks should always be done in the Java wrapper method that 
calls the (unchecked) intrinsic. In general, the unchecked intrinsic should not 
be called directly. StringUTF16.putChar/getChar() is an exception because there 
are places where we we need to omit the check for performance reasons.

I'm planning to revisit this with JDK-8156534 in JDK 10.

Best regards,
Tobias

Sherman


If so i propose the following to make this clearer:

return StreamSupport.intStream(
         () ->   {
             byte[] val = this.value; int count = this.count;
             if (coder == LATIN1) {
                 return new StringLatin1.CharsSpliterator(val, 0, count, 0);
             } else {
                 // Perform an explicit bounds check since HotSpot
                 // intrinsics to access UTF-16 characters in the byte[]
                 // array will not perform bounds checks
                 checkOffset(count, val.length>>   coder);
                 return new StringUTF16.CharsSpliterator(val, 0, count, 0);
             }
         },
         Spliterator.ORDERED | Spliterator.SIZED | Spliterator.SUBSIZED,
         false);

Paul.

Reply via email to