[
https://issues.apache.org/jira/browse/TEXT-103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18065910#comment-18065910
]
Ron Ladin commented on TEXT-103:
--------------------------------
Hi [~ggregory],
The idea is to allow custom costs for insert, delete, and substitute
operations, which are currently hardcoded to 1. This enables Weighted
Levenshtein Distance.
In practice, this is useful when some typos are more likely than others. For
example, in OCR, confusing '0' with 'O' should cost less than a random change.
The same goes for keyboard proximity: hitting an adjacent key is a common error
that shouldn't carry the same weight as typing a completely different character.
The implementation is backward compatible (all three costs default to 1) and
keeps the original O(min(n, m)) memory usage.
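
To make the idea concrete, here is a minimal sketch of a weighted distance that
keeps a single row of length min(n, m) + 1. It is not the patch itself: the
class name WeightedLevenshteinSketch, the method name and the parameter names
are hypothetical, and it assumes fixed non-negative integer costs per operation.

```java
public final class WeightedLevenshteinSketch {

    private WeightedLevenshteinSketch() {
    }

    /**
     * Cost of transforming {@code left} into {@code right} with fixed,
     * non-negative per-operation costs (hypothetical signature).
     */
    public static int distance(CharSequence left, CharSequence right,
                               int insertCost, int deleteCost, int substituteCost) {
        // Keep the shorter input along the row so memory stays O(min(n, m)).
        // Reversing the direction turns insertions into deletions, so the
        // two costs have to be swapped together with the inputs.
        if (right.length() > left.length()) {
            return distance(right, left, deleteCost, insertCost, substituteCost);
        }

        int m = right.length();
        int[] row = new int[m + 1];

        // Transforming the empty prefix of left into right[0..j) takes j insertions.
        for (int j = 0; j <= m; j++) {
            row[j] = j * insertCost;
        }

        for (int i = 1; i <= left.length(); i++) {
            int diagonal = row[0];       // value for (i - 1, 0)
            row[0] = i * deleteCost;     // left[0..i) to empty: i deletions
            for (int j = 1; j <= m; j++) {
                int above = row[j];      // value for (i - 1, j)
                boolean same = left.charAt(i - 1) == right.charAt(j - 1);
                int viaSubstitute = diagonal + (same ? 0 : substituteCost);
                int viaDelete = above + deleteCost;
                int viaInsert = row[j - 1] + insertCost;
                row[j] = Math.min(viaSubstitute, Math.min(viaDelete, viaInsert));
                diagonal = above;
            }
        }
        return row[m];
    }
}
```

With all three costs set to 1 this reduces to the classic distance, so
distance("kitten", "sitting", 1, 1, 1) gives 3. The OCR and keyboard cases
above would need a cost that depends on the character pair rather than one
fixed substituteCost; that lookup is left out of this sketch.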
> Add provision to change the cost for insert, delete and replace operation in
> levenshtein distance
> -------------------------------------------------------------------------------------------------
>
> Key: TEXT-103
> URL: https://issues.apache.org/jira/browse/TEXT-103
> Project: Commons Text
> Issue Type: Improvement
> Reporter: Rohit Agarwal
> Priority: Minor
> Labels: newbie, patch
> Fix For: 1.x
>
> Original Estimate: 48h
> Remaining Estimate: 48h
>
> There are two implementations of Levenshtein distance, unlimitedCompare and
> limitedCompare.
> I propose to generalise the Levenshtein distance by adding an option to
> change the cost of
> 1) Addition of a character.
> 2) Deletion of a character.
> 3) Substitution of a character.
> Currently they are all set to 1. For backward compatibility this will remain
> the default.