Thanks Roger!

webrev has been updated accordingly. See comments below.

http://cr.openjdk.java.net/~sherman/regexBackTrack.Lamnda.CanonEQ/webrev/

On 03/22/2016 12:48 PM, Roger Riggs wrote:
Hi Sherman,

A few more comments,

Pattern.java:
 - 1782: typo: "deterministci"

fixed.

 - 2176: commented out code?

Yes, commented out for now. It helps the case below, which has "lots"
of ".*" lined up in a single patter. But I have not decided if it's worth the
potential cost of adding a "backtracking stopper" for each ".*". In most
this kind of cases, it gets slow, very slow, but still come back alive,
instead of a complete hangup.

            // 5014450    -> top level single greedy ...
            { "^\\s*sites\\[sites\\.length\\+\\+\\]\\s*=\\s*new\\s+Array\\(.*" +
              "\\s*//\\s*(\\d+).*" +
              "\\s*//\\s*([^-]+).*" +
              "\\s*//\\s*([^-]+).*" +
              "\\s*//\\s*([^-]+).*" +
              "/(?sui).*$",
              "\tsites[sites.length++] = new Array(\n" +
              "// 1079193366647 - creation time\n" +
              "// 1078902678663 1078852539723 1078753482632 0 0 0 0 0 0 0 0 0 0 0 - 
creation time last 14 days\n" +
              "// 0 1 0 0 0 0 0 0 0 0 0 0 0 0 - bad\n" +
              "// 0.0030 0.0080 0.01 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 
-\n\n",
              false
           },

I would leave it for now. There is a bugid for it. So we can address it 
separately,
might with better tradeoff.

 - 2667: indentation


fixed.

 - 5660,5655: lambda syntax  use of simply "ch" is preferred over "(ch)" for 
single parameter lambdas
   and for consistency within the file.

fixed.

PrintPattern.java needs the standard copyright text if it is going to be in the 
repo.

fixed.

- 29: The Print(fmt, args...) method should follow the method naming 
convention. (initial lowercase)

fixed.

IntHashSet:  does performance matter enough to warrant adding this extra code.


The measurement I did suggests it's still worth adding such a small piece code, 
given
this one probably will be used for most of the greedy {}, with lots of raw 
"int" in and
out, without boxing, and much smaller footprint.

Thanks again,

Sherman





On 3/18/2016 4:05 PM, Xueming Shen wrote:
Hi,

There are couple regex related changes waiting for review. I have pull them
together here (with the notes) to make it easy to review.

http://cr.openjdk.java.net/~sherman/regexBackTrack.Lamnda.CanonEQ/webrev/

(1) Exponential backtracking

Note: 
http://cr.openjdk.java.net/~sherman/regexBackTrack.Lamnda.CanonEQ/backtracking

https://bugs.openjdk.java.net/browse/JDK-6328855
https://bugs.openjdk.java.net/browse/JDK-6192895
https://bugs.openjdk.java.net/browse/JDK-6345469
https://bugs.openjdk.java.net/browse/JDK-6988218
https://bugs.openjdk.java.net/browse/JDK-6693451
https://bugs.openjdk.java.net/browse/JDK-7006761
https://bugs.openjdk.java.net/browse/JDK-8140212

(2) Anonymous class to lambda function cleanup

Note: 
http://cr.openjdk.java.net/~sherman/regexBackTrack.Lamnda.CanonEQ/lambdafunction

https://bugs.openjdk.java.net/browse/JDK-8151481
https://bugs.openjdk.java.net/browse/JDK-6609854

(3) Canonical Equivalents

Note: http://cr.openjdk.java.net/~sherman/regexBackTrack.Lamnda.CanonEQ/canonEQ

https://bugs.openjdk.java.net/browse/JDK-4916384
https://bugs.openjdk.java.net/browse/JDK-4867170
https://bugs.openjdk.java.net/browse/JDK-6995635
https://bugs.openjdk.java.net/browse/JDK-6728861
https://bugs.openjdk.java.net/browse/JDK-6736245
https://bugs.openjdk.java.net/browse/JDK-7080302

Thanks
Sherman


Reply via email to