[ 
https://issues.apache.org/jira/browse/TEXT-188?focusedWorklogId=494265&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-494265
 ]

ASF GitHub Bot logged work on TEXT-188:
---------------------------------------

                Author: ASF GitHub Bot
            Created on: 03/Oct/20 09:43
            Start Date: 03/Oct/20 09:43
    Worklog Time Spent: 10m 
      Work Description: vesterstroem commented on a change in pull request #174:
URL: https://github.com/apache/commons-text/pull/174#discussion_r499133321



##########
File path: 
src/main/java/org/apache/commons/text/similarity/LevenshteinDistance.java
##########
@@ -285,6 +286,11 @@ private static int limitedCompare(CharSequence left, 
CharSequence right, final i
                     // left and up
                     d[i] = 1 + Math.min(Math.min(d[i - 1], p[i]), p[i - 1]);
                 }
+                lowerBound = Math.min(lowerBound, d[i]);
+            }
+            // if the lower bound is greater than the threshold, then exit 
early
+            if (lowerBound > threshold) {
+                return -1;

Review comment:
       The improvement introduces the need for an extra test, which I forgot to 
add.
   
   The assertion:
   
           assertThat(new LevenshteinDistance(1).apply("abc", 
"acb")).isEqualTo(-1);
   
   should be inserted into e.g. testGetLevenshteinDistance_StringStringInt of 
LevenshteinDistanceTest.
   
   That will give coverage to the uncovered line - and it will fail, if we just 
always return p[n].
   
   Should I make a new pull request or update the existing one?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 494265)
    Time Spent: 40m  (was: 0.5h)

> Speed up LevenshteinDistance with threshold
> -------------------------------------------
>
>                 Key: TEXT-188
>                 URL: https://issues.apache.org/jira/browse/TEXT-188
>             Project: Commons Text
>          Issue Type: Improvement
>    Affects Versions: 1.9.1
>            Reporter: Jakob Vesterstrøm
>            Priority: Major
>         Attachments: improvement.patch
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> The calculation made by the LevenshteinDistance class can often be made 
> faster, when the class in initialized with a threshold, and when the distance 
> is found to be larger than the threshold. In those cases, it is often not 
> necessary to iterate through the whole string, since a lower bound for the 
> result can be established after each iteration. If that lower bound is larger 
> than the threshold, the method can simply exit early with the same result as 
> without this improvement. 
> A patch with the proposed change is attached to this issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to