[ https://issues.apache.org/jira/browse/TEXT-158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gary Gregory updated TEXT-158:
------------------------------
    Fix Version/s:     (was: 1.8)
                   1.8.1

> Incorrect values for Jaccard similarity with empty strings
> ----------------------------------------------------------
>
>                 Key: TEXT-158
>                 URL: https://issues.apache.org/jira/browse/TEXT-158
>             Project: Commons Text
>          Issue Type: Bug
>    Affects Versions: 1.6
>            Reporter: Bruno P. Kinoshita
>            Priority: Minor
>             Fix For: 1.8.1
>
>
> In a discussion that was part of TEXT-126, it was
> [pointed out|https://github.com/apache/commons-text/pull/103#discussion_r263988298]
> that for two empty strings the commons-text Jaccard similarity returns 0.0
> and the distance 1.0, while other libraries return the opposite for each.
> {code:java}
> package br.eti.kinoshita.tests.text;
>
> import java.util.Collections;
>
> public class EditDistances {
>     public static void main(String[] args) {
>         System.out.println("Testing jaccard sim/dis with empty strings");
>         System.out.println("---");
>         org.simmetrics.metrics.Jaccard<String> j1 = new org.simmetrics.metrics.Jaccard<>();
>         float s1 = j1.compare(Collections.emptySet(), Collections.emptySet());
>         System.out.println("Simmetrics Jaccard similarity: " + s1);
>         float d1 = j1.distance(Collections.emptySet(), Collections.emptySet());
>         System.out.println("Simmetrics Jaccard distance: " + d1);
>
>         System.out.println("---");
>
>         info.debatty.java.stringsimilarity.Jaccard j2 = new info.debatty.java.stringsimilarity.Jaccard();
>         double s2 = j2.similarity("", "");
>         System.out.println("javastringsimilarity Jaccard similarity: " + s2);
>         double d2 = j2.distance("", "");
>         System.out.println("javastringsimilarity Jaccard distance: " + d2);
>
>         System.out.println("---");
>
>         org.apache.commons.text.similarity.JaccardSimilarity j3_1 = new org.apache.commons.text.similarity.JaccardSimilarity();
>         double s3 = j3_1.apply("", "");
>         System.out.println("commons-text Jaccard similarity: " + s3);
>         org.apache.commons.text.similarity.JaccardDistance j3_2 = new org.apache.commons.text.similarity.JaccardDistance();
>         double d3 = j3_2.apply("", "");
>         System.out.println("commons-text Jaccard distance: " + d3);
>     }
> }{code}
> Produces:
> {noformat}
> Testing jaccard sim/dis with empty strings
> ---
> Simmetrics Jaccard similarity: 1.0
> Simmetrics Jaccard distance: 0.0
> ---
> javastringsimilarity Jaccard similarity: 1.0
> javastringsimilarity Jaccard distance: 0.0
> ---
> commons-text Jaccard similarity: 0.0
> commons-text Jaccard distance: 1.0{noformat}
> We need to confirm the correct output for similarity and distance with
> empty strings, and then either document why we return what we return
> or fix it as a bug in the next release.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
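
For reference, the convention reported for the other two libraries can be sketched as below. This is a minimal illustration, not the commons-text implementation: the class `JaccardEmptySketch` and its methods are hypothetical names, and the sketch assumes that similarity is computed over the sets of distinct characters and that the 0/0 case for two empty inputs is defined as similarity 1.0 (distance 0.0).

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch for discussion; not the commons-text implementation.
final class JaccardEmptySketch {

    // Collect the distinct characters of the input into a set.
    static Set<Character> chars(CharSequence s) {
        Set<Character> set = new HashSet<>();
        for (int i = 0; i < s.length(); i++) {
            set.add(s.charAt(i));
        }
        return set;
    }

    // Jaccard similarity |A ∩ B| / |A ∪ B| over character sets.
    static double similarity(CharSequence left, CharSequence right) {
        Set<Character> a = chars(left);
        Set<Character> b = chars(right);
        // Assumed convention: two empty sets are identical, so avoid 0/0
        // and report full similarity.
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0;
        }
        Set<Character> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<Character> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }

    // Distance is defined as the complement of similarity.
    static double distance(CharSequence left, CharSequence right) {
        return 1.0 - similarity(left, right);
    }

    public static void main(String[] args) {
        System.out.println(similarity("", ""));    // 1.0
        System.out.println(distance("", ""));      // 0.0
        System.out.println(similarity("abc", "")); // 0.0
    }
}
```

With this convention only the both-empty case changes; an empty string compared with a non-empty one still has similarity 0.0 and distance 1.0.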