Martin,

Is there a probability value returned for the matching string(s)? I just came across a blog post [1] that does something similar to what you are working towards. They use the term "best partial" for matching strings of noticeably different lengths. This appears similar to using a Jaccard index [2] for string comparison, but applied to smaller bodies of text such as the titles of said aliases. Would this be a use case for a Lucene index, which already has all the information-retrieval goodness built in?
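To make the comparison concrete, here is a minimal Java sketch. The class `FuzzyNames` and its methods are hypothetical and are not part of Apache SIS or fuzzywuzzy. It computes a Jaccard index over character bigrams and a "best partial" variant that slides the shorter name across the longer one and keeps the best window score. This is only in the spirit of the blog post: fuzzywuzzy bases its scores on sequence matching rather than Jaccard, so the numbers will differ. Both scores fall in [0, 1], but they are similarity scores, not calibrated probabilities.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper sketching two lenient name-similarity scores (not an SIS API). */
public final class FuzzyNames {

    private FuzzyNames() {}

    /** Returns the set of overlapping, lower-cased character bigrams of s. */
    static Set<String> bigrams(String s) {
        String t = s.toLowerCase(Locale.ROOT);
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + 2 <= t.length(); i++) {
            grams.add(t.substring(i, i + 2));
        }
        return grams;
    }

    /** Jaccard index |A intersect B| / |A union B| of the two bigram sets, in [0, 1]. */
    public static double jaccard(String a, String b) {
        Set<String> x = bigrams(a);
        Set<String> y = bigrams(b);
        if (x.isEmpty() && y.isEmpty()) {
            // Strings shorter than 2 chars have no bigrams: fall back to a plain comparison.
            return a.equalsIgnoreCase(b) ? 1.0 : 0.0;
        }
        Set<String> intersection = new HashSet<>(x);
        intersection.retainAll(y);
        Set<String> union = new HashSet<>(x);
        union.addAll(y);
        return (double) intersection.size() / union.size();
    }

    /**
     * "Best partial" style score: compares the shorter string against every
     * window of the same length in the longer string and keeps the best Jaccard score.
     */
    public static double bestPartial(String a, String b) {
        String shorter = a.length() <= b.length() ? a : b;
        String longer = a.length() <= b.length() ? b : a;
        int n = shorter.length();
        if (n == 0) {
            return longer.isEmpty() ? 1.0 : 0.0;
        }
        double best = 0.0;
        for (int start = 0; start + n <= longer.length(); start++) {
            best = Math.max(best, jaccard(shorter, longer.substring(start, start + n)));
        }
        return best;
    }

    public static void main(String[] args) {
        String alias = "WGS 84";
        String title = "World Geodetic System 1984 (WGS 84)";
        System.out.printf("jaccard     = %.3f%n", jaccard(alias, title));
        System.out.printf("bestPartial = %.3f%n", bestPartial(alias, title));
    }
}
```

With a short alias such as "WGS 84" against a long title, the whole-string Jaccard score is diluted by the many extra bigrams in the title, while the sliding-window score finds the exact "WGS 84" substring and reaches 1.0. That is the kind of length mismatch the "best partial" idea targets.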
Adam

[1] http://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/
[2] http://en.wikipedia.org/wiki/Jaccard_index

On Tue, Nov 26, 2013 at 4:11 PM, Martin Desruisseaux <[email protected]> wrote:
> On 25/11/13 23:51, Martin Desruisseaux wrote:
>
>> I would like a better method name for "nameMatches". If possible, I would
>> like something that contains the word "heuristic" or "lenient" in it, or
>> anything else which conveys the heuristic nature of this method. Does anyone
>> have suggestions? I do not know if "nameMatchesHeuristically" or
>> "heuristicNameMatches" would be correct English.
>
>
> I'm trying "isHeuristicMatchForName(String)" [1]. A search on the internet
> found a few hits for "is heuristic match". If anyone has other ideas, please
> let us know.
>
> This particular method may need to be revisited as we try to handle data
> from a larger range of data producers, so I think it is worth making its
> purpose easy to spot.
>
> Martin
>
>
> [1]
> https://builds.apache.org/job/sis-jdk7/site/apidocs/org/apache/sis/referencing/AbstractIdentifiedObject.html#isHeuristicMatchForName%28java.lang.String%29
