[ 
https://issues.apache.org/jira/browse/LANG-944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rekha Joshi updated LANG-944:
-----------------------------

    Attachment: LANG-944.1.patch

Thanks, Benedikt.
Currently StringUtils offers getLevenshteinDistance for string distance, but in 
many cases we need a similarity score instead.
The attached patch adds a Jaro-Winkler similarity implementation. A higher score 
indicates greater similarity, which is very useful in data mining.

StringUtils.getLevenshteinDistance("PENNSYLVANIA", "PENCILVANYA") = 4, which 
does not clearly convey a similarity ratio.
With the patch, StringUtils.getSimilarityScore("PENNSYLVANIA", "PENCILVANYA") = 0.87.
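
For readers without the patch at hand, here is a minimal sketch of a Jaro-Winkler 
similarity in plain Java. It is not the code from LANG-944.1.patch: the class name 
JaroWinklerSketch, the method name similarity, the 0.1 prefix scaling factor, and 
the 4-character prefix cap are assumptions taken from the textbook formulation. 
Variants differ in how they count transpositions and apply the prefix bonus, so 
the scores may not match the 0.87 quoted above.

```java
public final class JaroWinklerSketch {

    private JaroWinklerSketch() {
    }

    /** Returns a score in [0, 1]; 1 means the strings are identical. */
    public static double similarity(String first, String second) {
        if (first == null || second == null) {
            throw new IllegalArgumentException("Strings must not be null");
        }
        if (first.equals(second)) {
            return 1.0;
        }
        int len1 = first.length();
        int len2 = second.length();
        if (len1 == 0 || len2 == 0) {
            return 0.0;
        }

        // Characters count as matching only within this distance of each other.
        int window = Math.max(0, Math.max(len1, len2) / 2 - 1);
        boolean[] matched1 = new boolean[len1];
        boolean[] matched2 = new boolean[len2];
        int matches = 0;
        for (int i = 0; i < len1; i++) {
            int lo = Math.max(0, i - window);
            int hi = Math.min(len2 - 1, i + window);
            for (int j = lo; j <= hi; j++) {
                if (!matched2[j] && first.charAt(i) == second.charAt(j)) {
                    matched1[i] = true;
                    matched2[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) {
            return 0.0;
        }

        // Walk the matched characters in order; each out-of-order pair
        // contributes half a transposition.
        int halfTranspositions = 0;
        int j = 0;
        for (int i = 0; i < len1; i++) {
            if (!matched1[i]) {
                continue;
            }
            while (!matched2[j]) {
                j++;
            }
            if (first.charAt(i) != second.charAt(j)) {
                halfTranspositions++;
            }
            j++;
        }

        double m = matches;
        double jaro = (m / len1 + m / len2 + (m - halfTranspositions / 2.0) / m) / 3.0;

        // Winkler bonus: reward a shared prefix of up to 4 characters.
        int prefix = 0;
        int maxPrefix = Math.min(4, Math.min(len1, len2));
        while (prefix < maxPrefix && first.charAt(prefix) == second.charAt(prefix)) {
            prefix++;
        }
        return jaro + prefix * 0.1 * (1.0 - jaro);
    }
}
```

Unlike the Levenshtein distance, the result is normalized, so scores for string 
pairs of different lengths can be compared directly, e.g. 
JaroWinklerSketch.similarity("PENNSYLVANIA", "PENCILVANYA").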

> Add a feature of SimilarityMatch in StringUtils 
> ------------------------------------------------
>
>                 Key: LANG-944
>                 URL: https://issues.apache.org/jira/browse/LANG-944
>             Project: Commons Lang
>          Issue Type: New Feature
>          Components: lang.*
>    Affects Versions: 3.3
>            Reporter: Rekha Joshi
>             Fix For: Patch Needed, Discussion
>
>         Attachments: LANG-944.1.patch
>
>
> Add a SimilarityMatch algorithm to evaluate a similarity matching ratio between 
> two strings.
> double matchscore = StringUtils.calculateSimilarityMatching(String s1, String s2)
> I have a patch ready with an implementation of SimilarityMatch.
> This is a common need in scientific algorithms, and using the commons lang3 
> library directly for these string operations would be neat.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
