[jira] [Comment Edited] (TEXT-21) Have a clear distinction between Edit Distance, String Similarity, Score, Metric, etc
[ https://issues.apache.org/jira/browse/TEXT-21?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15567970#comment-15567970 ]

Bruno P. Kinoshita edited comment on TEXT-21 at 10/12/16 7:59 AM:
------------------------------------------------------------------

Some useful papers on the topic, and other resources too.

* [A Comparison of Personal Name Matching: Techniques and Practical Issues (PDF)|http://users.cecs.anu.edu.au/~Peter.Christen/publications/tr-cs-06-02.pdf]
* [A Comparison of String Distance Metrics for Name-Matching Tasks (PDF)|http://www.cs.cmu.edu/~wcohen/postscript/ijcai-ws-2003.pdf]
* [Edit distance on Wikipedia|https://en.wikipedia.org/wiki/Edit_distance]

was (Author: kinow):
Some useful papers on the topic, and other resources too.

* [A Comparison of Personal Name Matching: Techniques and Practical Issues (PDF)|http://users.cecs.anu.edu.au/~Peter.Christen/publications/tr-cs-06-02.pdf]
* [A Comparison of String Distance Metrics for Name-Matching Tasks (PDF)|http://www.cs.cmu.edu/~wcohen/postscript/ijcai-ws-2003.pdf]
* [Edit distance on Wikipedia](https://en.wikipedia.org/wiki/Edit_distance)

> Have a clear distinction between Edit Distance, String Similarity, Score, Metric, etc
> -------------------------------------------------------------------------------------
>
>                 Key: TEXT-21
>                 URL: https://issues.apache.org/jira/browse/TEXT-21
>             Project: Commons Text
>          Issue Type: Improvement
>            Reporter: Bruno P. Kinoshita
>            Assignee: Bruno P. Kinoshita
>
> From LANG-1269.
> A user reported a nomenclature issue in [lang], which occurs in [text] as well.
> Currently we have an interface called EditDistance, with the following implementations:
> * CosineDistance
> * HammingDistance
> * JaroWinklerDistance
> * and LevenshteinDistance
> JaroWinkler is actually a similarity score, and not a distance. We have other classes in the oact.similarity package too.
> * CosineSimilarity
> * FuzzyScore
> We need to provide users a clear distinction between what we call an edit distance, a similarity, and a score.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Comment Edited] (TEXT-21) Have a clear distinction between Edit Distance, String Similarity, Score, Metric, etc
[ https://issues.apache.org/jira/browse/TEXT-21?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15567970#comment-15567970 ]

Bruno P. Kinoshita edited comment on TEXT-21 at 10/12/16 8:00 AM:
------------------------------------------------------------------

Some useful papers on the topic, and other resources too.

* [A Comparison of Personal Name Matching: Techniques and Practical Issues (PDF)|http://users.cecs.anu.edu.au/~Peter.Christen/publications/tr-cs-06-02.pdf]
* [A Comparison of String Distance Metrics for Name-Matching Tasks (PDF)|http://www.cs.cmu.edu/~wcohen/postscript/ijcai-ws-2003.pdf]
* [A New Edit Distance Method for Finding Similarity in Dna Sequence (PDF)|http://waset.org/publications/7178/a-new-edit-distance-method-for-finding-similarity-in-dna-sequence]
* [Edit distance on Wikipedia|https://en.wikipedia.org/wiki/Edit_distance]

was (Author: kinow):
Some useful papers on the topic, and other resources too.

* [A Comparison of Personal Name Matching: Techniques and Practical Issues (PDF)|http://users.cecs.anu.edu.au/~Peter.Christen/publications/tr-cs-06-02.pdf]
* [A Comparison of String Distance Metrics for Name-Matching Tasks (PDF)|http://www.cs.cmu.edu/~wcohen/postscript/ijcai-ws-2003.pdf]
* [A New Edit Distance Method for Finding Similarity in Dna Sequence |http://waset.org/publications/7178/a-new-edit-distance-method-for-finding-similarity-in-dna-sequence]
* [Edit distance on Wikipedia|https://en.wikipedia.org/wiki/Edit_distance]

> Have a clear distinction between Edit Distance, String Similarity, Score, Metric, etc
> -------------------------------------------------------------------------------------
>
>                 Key: TEXT-21
>                 URL: https://issues.apache.org/jira/browse/TEXT-21
>             Project: Commons Text
>          Issue Type: Improvement
>            Reporter: Bruno P. Kinoshita
>            Assignee: Bruno P. Kinoshita
>
> From LANG-1269.
> A user reported a nomenclature issue in [lang], which occurs in [text] as well.
> Currently we have an interface called EditDistance, with the following implementations:
> * CosineDistance
> * HammingDistance
> * JaroWinklerDistance
> * and LevenshteinDistance
> JaroWinkler is actually a similarity score, and not a distance. We have other classes in the oact.similarity package too.
> * CosineSimilarity
> * FuzzyScore
> We need to provide users a clear distinction between what we call an edit distance, a similarity, and a score.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Comment Edited] (TEXT-21) Have a clear distinction between Edit Distance, String Similarity, Score, Metric, etc
[ https://issues.apache.org/jira/browse/TEXT-21?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15567970#comment-15567970 ]

Bruno P. Kinoshita edited comment on TEXT-21 at 10/12/16 7:59 AM:
------------------------------------------------------------------

Some useful papers on the topic, and other resources too.

* [A Comparison of Personal Name Matching: Techniques and Practical Issues (PDF)|http://users.cecs.anu.edu.au/~Peter.Christen/publications/tr-cs-06-02.pdf]
* [A Comparison of String Distance Metrics for Name-Matching Tasks (PDF)|http://www.cs.cmu.edu/~wcohen/postscript/ijcai-ws-2003.pdf]
* [A New Edit Distance Method for Finding Similarity in Dna Sequence |http://waset.org/publications/7178/a-new-edit-distance-method-for-finding-similarity-in-dna-sequence]
* [Edit distance on Wikipedia|https://en.wikipedia.org/wiki/Edit_distance]

was (Author: kinow):
Some useful papers on the topic, and other resources too.

* [A Comparison of Personal Name Matching: Techniques and Practical Issues (PDF)|http://users.cecs.anu.edu.au/~Peter.Christen/publications/tr-cs-06-02.pdf]
* [A Comparison of String Distance Metrics for Name-Matching Tasks (PDF)|http://www.cs.cmu.edu/~wcohen/postscript/ijcai-ws-2003.pdf]
* [Edit distance on Wikipedia|https://en.wikipedia.org/wiki/Edit_distance]

> Have a clear distinction between Edit Distance, String Similarity, Score, Metric, etc
> -------------------------------------------------------------------------------------
>
>                 Key: TEXT-21
>                 URL: https://issues.apache.org/jira/browse/TEXT-21
>             Project: Commons Text
>          Issue Type: Improvement
>            Reporter: Bruno P. Kinoshita
>            Assignee: Bruno P. Kinoshita
>
> From LANG-1269.
> A user reported a nomenclature issue in [lang], which occurs in [text] as well.
> Currently we have an interface called EditDistance, with the following implementations:
> * CosineDistance
> * HammingDistance
> * JaroWinklerDistance
> * and LevenshteinDistance
> JaroWinkler is actually a similarity score, and not a distance. We have other classes in the oact.similarity package too.
> * CosineSimilarity
> * FuzzyScore
> We need to provide users a clear distinction between what we call an edit distance, a similarity, and a score.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
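
To make the nomenclature point in the issue description concrete, here is a minimal sketch of how an edit distance and a similarity score can differ in the direction of their result. It is not the Commons Text API: the class name `DistanceVsSimilarity`, both method names, and the normalization used for the similarity (one minus the distance divided by the longer length) are assumptions chosen only for illustration.

```java
// Illustrative sketch only; names and the normalization are hypothetical,
// not taken from Commons Text.
final class DistanceVsSimilarity {

    private DistanceVsSimilarity() { }

    /**
     * Levenshtein edit distance: the minimum number of single-character
     * insertions, deletions, and substitutions turning left into right.
     * 0 means identical; larger values mean the strings are further apart.
     */
    static int levenshtein(CharSequence left, CharSequence right) {
        int[] prev = new int[right.length() + 1];
        int[] curr = new int[right.length() + 1];
        for (int j = 0; j <= right.length(); j++) {
            prev[j] = j; // distance from the empty prefix of left
        }
        for (int i = 1; i <= left.length(); i++) {
            curr[0] = i; // distance to the empty prefix of right
            for (int j = 1; j <= right.length(); j++) {
                int cost = left.charAt(i - 1) == right.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows instead of allocating
            prev = curr;
            curr = tmp;
        }
        return prev[right.length()];
    }

    /**
     * A similarity score derived from the distance (an assumed
     * normalization): 1.0 means identical, values near 0.0 mean
     * very different. Note the direction is the opposite of a distance.
     */
    static double similarity(CharSequence left, CharSequence right) {
        int longest = Math.max(left.length(), right.length());
        if (longest == 0) {
            return 1.0; // two empty strings are identical
        }
        return 1.0 - (double) levenshtein(left, right) / longest;
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("kitten", "sitting")); // prints 3
        System.out.println(similarity("kitten", "kitten"));   // prints 1.0
    }
}
```

A caller ranking candidates has to sort ascending for the distance and descending for the similarity, which is why placing a similarity score behind an interface named EditDistance can mislead users.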