subject:"RFR 8225466 \: Optimize matching BMP Slice nodes"

Re: RFR 8225466 : Optimize matching BMP Slice nodes

2019-12-18 Thread Ivan Gerasimov


Hi Roger.

Thank you for taking a look!

The variant with a single loop was the first thing I tried (should have 
mentioned that in the review request).


Unfortunately, that showed performance decrease.

I suspect that hitting the end of the input should be a less common 
thing, that's why it it beneficial to separate it as a slow path.


With kind regards,

Ivan


On 12/18/19 7:48 AM, Roger Riggs wrote:

Hi Ivan,

Though the new code has a good effect, the asymmetry and duplication 
seems unnecessary.
Can it be structured to have a single copy of the loop comparing the 
available range

and still get the desired performance improvement.

Like:

boolean match(Matcher matcher,int i, CharSequence seq) {
    int[] buf =buffer;
    int len = buf.length;
    for (int j =0; j < Math.min(len, matcher.to); j++) {
    if (buf[j] != seq.charAt(i+j))
    return false;
    }
    if (len >= matcher.to) {
    matcher.hitEnd =true;
    return false;
    }
    return next.match(matcher, i+len, seq);
}

Regards, Roger


On 10/28/19 9:03 PM, Ivan Gerasimov wrote:

Hello!

When building a Pattern object, the regex parser recognizes "slices" 
- continuous char subsequences, which all have to be matched 
case-sensitively/case-insensitively.  Matching with such a slice is 
implemented as a simple loop over a portion of the input.


In the current implementation, on each iteration of the loop it is 
checked if we have hit the end of the input (which is an uncommon case).


This check can be done only once, before the loop, which will make 
the loop lighter.


Benchmark shows up to +4% to the throughput for the case-insensitive 
matching.


Would you please help review the enhancement?

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8225466
WEBREV: http://cr.openjdk.java.net/~igerasim/8225466/00/webrev/


--- benchmark results ---

UNFIXED
Benchmark    Mode  Cnt    Score   Error  Units
PatternBench.sliceIFind  avgt   16  190.612 ? 0.336  ns/op

FIXED
Benchmark    Mode  Cnt    Score   Error  Units
PatternBench.sliceIFind  avgt   16  182.954 ? 0.493  ns/op
---




--
With kind regards,
Ivan Gerasimov

Re: RFR 8225466 : Optimize matching BMP Slice nodes

2019-12-18 Thread Roger Riggs


Hi Ivan,

Though the new code has a good effect, the asymmetry and duplication 
seems unnecessary.
Can it be structured to have a single copy of the loop comparing the 
available range

and still get the desired performance improvement.

Like:

boolean match(Matcher matcher,int i, CharSequence seq) {
int[] buf =buffer;
int len = buf.length;
for (int j =0; j < Math.min(len, matcher.to); j++) {
if (buf[j] != seq.charAt(i+j))
return false;
}
if (len >= matcher.to) {
matcher.hitEnd =true;
return false;
}
return next.match(matcher, i+len, seq);
}

Regards, Roger


On 10/28/19 9:03 PM, Ivan Gerasimov wrote:

Hello!

When building a Pattern object, the regex parser recognizes "slices" - 
continuous char subsequences, which all have to be matched 
case-sensitively/case-insensitively.  Matching with such a slice is 
implemented as a simple loop over a portion of the input.


In the current implementation, on each iteration of the loop it is 
checked if we have hit the end of the input (which is an uncommon case).


This check can be done only once, before the loop, which will make the 
loop lighter.


Benchmark shows up to +4% to the throughput for the case-insensitive 
matching.


Would you please help review the enhancement?

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8225466
WEBREV: http://cr.openjdk.java.net/~igerasim/8225466/00/webrev/


--- benchmark results ---

UNFIXED
Benchmark    Mode  Cnt    Score   Error  Units
PatternBench.sliceIFind  avgt   16  190.612 ? 0.336  ns/op

FIXED
Benchmark    Mode  Cnt    Score   Error  Units
PatternBench.sliceIFind  avgt   16  182.954 ? 0.493  ns/op
---

RFR 8225466 : Optimize matching BMP Slice nodes

2019-10-28 Thread Ivan Gerasimov


Hello!

When building a Pattern object, the regex parser recognizes "slices" - 
continuous char subsequences, which all have to be matched 
case-sensitively/case-insensitively.  Matching with such a slice is 
implemented as a simple loop over a portion of the input.


In the current implementation, on each iteration of the loop it is 
checked if we have hit the end of the input (which is an uncommon case).


This check can be done only once, before the loop, which will make the 
loop lighter.


Benchmark shows up to +4% to the throughput for the case-insensitive 
matching.


Would you please help review the enhancement?

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8225466
WEBREV: http://cr.openjdk.java.net/~igerasim/8225466/00/webrev/


--- benchmark results ---

UNFIXED
Benchmark    Mode  Cnt    Score   Error  Units
PatternBench.sliceIFind  avgt   16  190.612 ? 0.336  ns/op

FIXED
Benchmark    Mode  Cnt    Score   Error  Units
PatternBench.sliceIFind  avgt   16  182.954 ? 0.493  ns/op
---

--
With kind regards,
Ivan Gerasimov

Re: RFR 8225466 : Optimize matching BMP Slice nodes

Re: RFR 8225466 : Optimize matching BMP Slice nodes

RFR 8225466 : Optimize matching BMP Slice nodes

3 matches

Site Navigation

Mail list logo

Footer information