Hello all,

encouraged by your responses I did a simple implementation and performed
some JMH experiments. Details below.

Webrev [1] contains my changes. I deliberately wanted to do minimal code
changes. The gist of it is implementing indexOf that works on CharSequence
instead of String's internal char array. Both versions are almost identical
(I did not change existing implementation in place to not impact all other
String-only paths that use it).

In my first attempt I just inlined (and simplified) indexOf implementation
into String::contains, but in the end decided to create general purpose
(package private) version. This might prove useful for others[2]

JMH test project is here [3]. All benchmarks were run using "java -jar
benchmarks.jar -wi 10 -i 10 -f -prof gc" on Amazon EC2 instance (c4.xlarge,
4 virtual cores). I have run it against three java builds:

a. stock java 8u40
b. my own build from clean jdk9 sources
c. my own build from modified jdk9 sources

Results with gc profiler enabled are included in benchmark repo. This gist
contains summary results [4].

I know that this test covers only very narrow space of possible performance
regressions (ideas form more comprehensive tests are welcome). The results
show that String only path is not impacted by my changes. When using actual
CharSequence (StringBuilder in this case) we are mostly winning, exception
is case when CharSequence is of medium size and has many partial matches in
searched string.

I hope this exercise will be useful to someone and that this change might
be considered for inclusion in OpenJDK. If not then maybe at least tests
for String::indexOf?

[1] http://openjdk.s3-website-us-east-1.amazonaws.com/
[2] Changeset from
http://mail.openjdk.java.net/pipermail/core-libs-dev/2015-May/033678.html uses
indexOf that could accept CharSequence instead of String.
[3] https://github.com/tkowalcz/openjdk-jmh
[4] https://gist.github.com/tkowalcz/d8f5b9435e184f65fd2a

Regards,
Tomasz Kowalczewski

On Sat, Mar 21, 2015 at 9:21 AM, Tomasz Kowalczewski <
tomasz.kowalczew...@gmail.com> wrote:

> There are other places which accept CharSequence and iterate over it
> not caring about possible concurrent modification (the only sane way
> IMHO). Examples are String.contentEquals and String.join that uses
> StringJoiner which iterates its argument char by char.
>
> It looks like contains is the exception in this regard.
>
> On 20 mar 2015, at 17:05, Vitaly Davidovich <vita...@gmail.com> wrote:
>
> >>
> >> If CharSequence is mutable and thread-safe, I would expect toString()
> >> implementation to provide the atomic snapshot (e.g. do the
> >> synchronization). Therefore, I find Sherman's argument interesting.
> >
> >
> > That's my point about "it ought to be up to the caller" -- they're in a
> > position to make this call, but String.contains() is playing defensive
> > because it has no clue of the context.
> >
> > On Fri, Mar 20, 2015 at 11:55 AM, Aleksey Shipilev <
> > aleksey.shipi...@oracle.com> wrote:
> >
> >> If CharSequence is mutable and thread-safe, I would expect toString()
> >> implementation to provide the atomic snapshot (e.g. do the
> >> synchronization). Therefore, I find Sherman's argument interesting.
> >>
> >> On the other hand, we don't seem to be having any details/requirements
> >> for contains(CharSequence) w.r.t. it's own argument. What if
> >> String.contains pulls the values one charAt at a time? The concurrency
> >> requirements are not enforceable in CharSequence alone then, unless you
> >> call the supposed-to-be-atomic toString() first.
> >>
> >> -Aleksey.
> >>
> >>> On 03/20/2015 06:48 PM, Vitaly Davidovich wrote:
> >>> Yes, but that ought to be for the caller to decide.  Also, although the
> >>> resulting String is immutable, toString() itself may observe mutation.
> >>>
> >>> On Fri, Mar 20, 2015 at 11:40 AM, Xueming Shen <
> xueming.s...@oracle.com>
> >>> wrote:
> >>>
> >>>>> On 03/20/2015 02:34 AM, Tomasz Kowalczewski wrote:
> >>>>>
> >>>>> Hello!
> >>>>>
> >>>>> Current implementation of String.contains that accepts CharSequence
> >> calls
> >>>>> toString on it and passes resulting string to indexOf(String). This
> IMO
> >>>>> defeats the purpose of using CharSequences (that is to have a mutable
> >>>>> character buffer and not allocate unnecessary objects).
> >>>> It is arguable that cs.toString() may serve the purpose of taking a
> >>>> snapshot of an otherwise
> >>>> "mutable" character buffer?
> >>>>
> >>>> -Sherman
> >>>>
> >>>>
> >>>> Is changing this a desirable development? It seems pretty
> >> straightforward
> >>>>> to port indexOf(String) to use CharSequence.
> >>>>>
> >>>>> If all you need is patch then I can work on it (I have signed OCA)
> just
> >>>>> wanted to make sure it is not a futile work.
> >>
> >>
> >>
>



-- 
Tomasz Kowalczewski

Reply via email to