Re: String.subSequence and CR#6924259: Remove offset and count fields from java.lang.String

Mike Duigou Tue, 26 Jun 2012 11:11:52 -0700

On Jun 26 2012, at 07:13 , Martin Desruisseaux wrote:

> If String.substring(int, int) now performs a copy of the underlying char[] 
> array and if there is no String.subSequence(int, int) providing the old 
> functionality, maybe the following implications should be investigated?
> 
> 
> StringBuilder.append(...)
> --------------------
> Since users may be advised, in order to avoid a needless array copy, to
> replace the following pattern:
> 
>      StringBuilder.append(string.substring(lower, upper));
> by:
>      StringBuilder.append(string, lower, upper);


This would seem to be a good refactoring regardless of the substring 
implementation, as it avoids creating a temporary object.
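A minimal sketch of the two patterns side by side. The class name AppendRangeDemo and the sample input are illustrative only. Both methods produce the same text; the second passes the range straight to StringBuilder.append(CharSequence, int, int) (end index exclusive) instead of first creating a temporary substring.

```java
final class AppendRangeDemo {
    // Old pattern: substring() creates a temporary String (with the new
    // String implementation, a copy of the selected chars) before appending.
    static String viaSubstring(String source, int lower, int upper) {
        StringBuilder sb = new StringBuilder();
        sb.append(source.substring(lower, upper));
        return sb.toString();
    }

    // Suggested pattern: append the range directly; no temporary String.
    static String viaRange(String source, int lower, int upper) {
        StringBuilder sb = new StringBuilder();
        sb.append(source, lower, upper);
        return sb.toString();
    }

    public static void main(String[] args) {
        String line = "key=value";
        System.out.println(viaSubstring(line, 4, 9)); // prints value
        System.out.println(viaRange(line, 4, 9));     // prints value
    }
}
```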

> 
> would it be worth adding a special case for String in the
> AbstractStringBuilder.append(CharSequence, int, int) implementation, in order
> to match the efficiency of the AbstractStringBuilder.append(String) method? The
> latter copies the data with a single call to System.arraycopy, whereas the
> former invokes CharSequence.charAt(int) in a loop.
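The special case being proposed could look roughly like the sketch below. It uses a hypothetical ToyBuilder class with its own char[] buffer, not the real AbstractStringBuilder, and it omits details such as appending "null" for a null argument. When the argument is a String, String.getChars copies the whole range in one bulk call; any other CharSequence falls back to the charAt(int) loop.

```java
import java.util.Arrays;

// Hypothetical stand-in for AbstractStringBuilder, for illustration only.
final class ToyBuilder {
    private char[] value = new char[16];
    private int count;

    private void ensureCapacity(int minCapacity) {
        if (minCapacity > value.length) {
            value = Arrays.copyOf(value, Math.max(minCapacity, value.length * 2));
        }
    }

    // Appends s[start, end).
    ToyBuilder append(CharSequence s, int start, int end) {
        if (start < 0 || start > end || end > s.length()) {
            throw new IndexOutOfBoundsException(
                "start " + start + ", end " + end + ", length " + s.length());
        }
        int len = end - start;
        ensureCapacity(count + len);
        if (s instanceof String) {
            // Fast path: one bulk copy of the requested range.
            ((String) s).getChars(start, end, value, count);
        } else {
            // General path: one charAt(int) call per character.
            for (int i = start; i < end; i++) {
                value[count + (i - start)] = s.charAt(i);
            }
        }
        count += len;
        return this;
    }

    @Override
    public String toString() {
        return new String(value, 0, count);
    }
}
```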

I think a microbenchmark comparing 
StringBuilder.append(string.substring(lower, upper)) with 
AbstractStringBuilder.append(CharSequence, int, int) would help. I 
wouldn't be surprised if the latter is faster when a substring would otherwise 
have to be created, but slower than append(String) when an existing string is 
appended as a whole.
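A rough timing harness for that comparison might look like the sketch below. The class name, input size and iteration count are arbitrary assumptions, and a naive System.nanoTime loop is easily distorted by JIT warm-up and dead-code elimination, so a dedicated benchmark harness should be used before drawing conclusions. The last two cases cover the "existing string" scenario.

```java
import java.util.function.IntSupplier;

final class AppendBenchmark {
    static final int ITERATIONS = 50_000;

    // Appends source[lower, upper) via a temporary substring.
    static int substringThenAppend(String source, int lower, int upper) {
        int total = 0;
        for (int i = 0; i < ITERATIONS; i++) {
            StringBuilder sb = new StringBuilder();
            sb.append(source.substring(lower, upper));
            total += sb.length(); // consume the result so the work is not dead code
        }
        return total;
    }

    // Appends source[lower, upper) through append(CharSequence, int, int).
    static int appendRange(String source, int lower, int upper) {
        int total = 0;
        for (int i = 0; i < ITERATIONS; i++) {
            StringBuilder sb = new StringBuilder();
            sb.append(source, lower, upper);
            total += sb.length();
        }
        return total;
    }

    // Appends a whole existing String through append(String).
    static int appendWhole(String source) {
        int total = 0;
        for (int i = 0; i < ITERATIONS; i++) {
            StringBuilder sb = new StringBuilder();
            sb.append(source);
            total += sb.length();
        }
        return total;
    }

    static void time(String label, IntSupplier body) {
        for (int warmup = 0; warmup < 5; warmup++) {
            body.getAsInt(); // give the JIT a chance to compile the loop first
        }
        long start = System.nanoTime();
        int chars = body.getAsInt();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.printf("%-32s %6d ms (%d chars)%n", label, elapsedMs, chars);
    }

    public static void main(String[] args) {
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < 20; i++) {
            text.append("0123456789");
        }
        String source = text.toString(); // 200 chars
        time("substring + append(String)", () -> substringThenAppend(source, 20, 180));
        time("append(CharSequence, int, int)", () -> appendRange(source, 20, 180));
        time("append(String), whole string", () -> appendWhole(source));
        time("append(CharSequence, 0, length)", () -> appendRange(source, 0, source.length()));
    }
}
```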

> 
> Integer.parseInt(...)
> ----------------
> There was an earlier thread about allowing Integer.parseInt(String) to 
> accept a CharSequence.
> 
> http://mail.openjdk.java.net/pipermail/core-libs-dev/2012-April/thread.html#9801
> 
> One reason given was performance: when reading large ASCII files, the cost of
> calling CharSequence.toString() (assuming that the CharSequence is not already
> a String) was measured with the NetBeans profiler as significant. If the new
> String.substring(...) implementation copies the internal array, we may expect
> a performance cost similar to StringBuilder.toString(). Would it be worth
> revisiting the Integer.parseInt(String) case, and similar methods in other
> wrapper classes, to allow CharSequence input?

Probably. 
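A sketch of what CharSequence input could mean in practice, assuming a hypothetical helper CharSequenceParsing.parseInt(CharSequence, int, int) rather than any existing JDK method. It parses a decimal range directly via charAt(int), with no toString() or substring copy, and uses the same negative-accumulation overflow check as the JDK's Integer.parseInt(String) source so that Integer.MIN_VALUE parses correctly.

```java
final class CharSequenceParsing {
    // Hypothetical helper: parses the decimal int in s[start, end) without
    // calling s.toString() or creating a substring.
    static int parseInt(CharSequence s, int start, int end) {
        if (start < 0 || start >= end || end > s.length()) {
            throw new NumberFormatException("invalid range [" + start + ", " + end + ")");
        }
        int i = start;
        boolean negative = false;
        // Accumulate as a negative number so Integer.MIN_VALUE is representable.
        int limit = -Integer.MAX_VALUE;
        char first = s.charAt(i);
        if (first == '-' || first == '+') {
            if (first == '-') {
                negative = true;
                limit = Integer.MIN_VALUE;
            }
            i++;
            if (i == end) {
                throw new NumberFormatException("sign without digits");
            }
        }
        int multmin = limit / 10;
        int result = 0;
        for (; i < end; i++) {
            int digit = Character.digit(s.charAt(i), 10);
            if (digit < 0) {
                throw new NumberFormatException("not a digit at index " + i);
            }
            if (result < multmin) {
                throw new NumberFormatException("overflow");
            }
            result *= 10;
            if (result < limit + digit) {
                throw new NumberFormatException("overflow");
            }
            result -= digit;
        }
        return negative ? result : -result;
    }

    public static void main(String[] args) {
        StringBuilder line = new StringBuilder("x=-1234;");
        System.out.println(parseInt(line, 2, 7)); // prints -1234
    }
}
```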

>    Martin
> 
> 
> 
> On 23/06/12 00:15, Mike Duigou wrote:
>> I've made a test implementation of subSequence() utilizing an inner class 
>> with offset and count fields to try to understand all the parts that would 
>> be impacted. My observations thus far:
>> 
>> - The specification of the subSequence() method is currently too specific. 
>> It says that the result behaves exactly like substring(). This would no longer
>> be true. Hopefully nobody assumed that this meant they could cast the result
>> to String. I know, why would you, if you can just call substring() instead?
>> I've learned to assume that somebody, somewhere always does the most
>> unexpected thing.
>> - The CharSequences returned by subSequence() would follow only the general
>> CharSequence rules for equals()/hashCode(), which do not refine Object's
>> identity-based contract. Any current code that uses the result of
>> subSequence() with equals() or for hashing, even though that is not advised,
>> would break. We could add equals() and hashCode() implementations to the
>> CharSequence returned, but they would probably be expensive.
>> - In general, I wonder whether parsers will be satisfied with a CharSequence
>> that only implements identity equals().
>> - I also worry about applications that currently use subSequence() and
>> which will fail when the result is not a String instance, as String.equals()
>> returns false for every CharSequence that isn't a String, e.g.
>> CharSequence token = line.subSequence(start, end); if
>> (keyword.equals(token)) ... This would now fail.
>> 
>> At this point I wonder if this is a feature worth pursuing.
>> 
>> Mike
>
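The offset/count approach described in the quoted message can be sketched as below. StringSlice and SliceDemo are hypothetical names; the view wraps a String rather than String's internal char[], since that array is not accessible outside java.lang. Because equals()/hashCode() are inherited from Object, the sketch also reproduces the keyword.equals(token) failure, while String.contentEquals(CharSequence) still compares the characters.

```java
// Hypothetical offset/count view over a String; nothing is copied until toString().
final class StringSlice implements CharSequence {
    private final String value;
    private final int offset;
    private final int count;

    StringSlice(String value, int offset, int count) {
        if (offset < 0 || count < 0 || offset + count > value.length()) {
            throw new IndexOutOfBoundsException("offset " + offset + ", count " + count);
        }
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    @Override
    public int length() {
        return count;
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return value.charAt(offset + index);
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        if (start < 0 || start > end || end > count) {
            throw new IndexOutOfBoundsException("start " + start + ", end " + end);
        }
        // Shares the same backing String; only offset and count change.
        return new StringSlice(value, offset + start, end - start);
    }

    @Override
    public String toString() {
        return value.substring(offset, offset + count); // copy only on demand
    }
    // equals() and hashCode() are inherited from Object: identity only.
}

final class SliceDemo {
    public static void main(String[] args) {
        String line = "if (x) return;";
        CharSequence token = new StringSlice(line, 0, 2); // the characters "if"
        String keyword = "if";
        System.out.println(keyword.equals(token));        // false: argument is not a String
        System.out.println(keyword.contentEquals(token)); // true: compares characters
        System.out.println(token instanceof String);      // false: a cast would throw
    }
}
```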
