Hi Adam,

Lucene would be way overkill for the type of matching that Martin is talking about. The work he's doing now corresponds both to the alias lookups in the EPSG database and to the morphFromESRI call in OGRSpatialReference. Unfortunately, datum matching is limited to the name, as Martin said, and everyone seems to have their own name for each datum.

Joe

Sent from my iPad

> On Nov 26, 2013, at 7:37 PM, Adam Estrada <[email protected]> wrote:
>
> Understood! Thanks Martin and I think isHeuristicMatchForName() sounds great!
>
> Adam
>
> On Tue, Nov 26, 2013 at 7:18 PM, Martin Desruisseaux
> <[email protected]> wrote:
>> Hello Adam
>>
>> Thanks for the links, I was not aware of them. There is currently no
>> probability value for matching string(s). The current heuristic rules are
>> based on known practices, like ESRI adding the "D_" prefix for datum, spaces
>> replaced by '_' and non-alphanumeric characters ignored. I have not yet
>> found a need to match strings that are only similar. For now I have seen
>> either exact match with above rules, or completely different names (e.g.
>> "International 1924" and "Hayford 1909" are the same ellipsoid).
>>
>> Lucene of course has a role, and actually we do use it, but rather in some
>> layers on top of metadata. I think it will come to SIS later, presumably in
>> a separate module...
>>
>> Martin
>>
>>
>> On 26/11/13 18:49, Adam Estrada wrote:
>>
>>> Martin,
>>>
>>> Is there a probability value that is returned for the matching
>>> string(s)? I actually just came across a blog post[1] that does
>>> something similar to what you are working towards. They use the
>>> verbiage "best partial" for determining strings of noticeably
>>> different lengths. This appears to be similar to using a Jaccard
>>> index[2] for string comparison but on smaller bodies of text like the
>>> titles of said aliases. Would this be an application for using a
>>> Lucene index that already has all the info retrieval goodness built in
>>> to it?
>>>
>>> Adam
>>>
>>> [1]
>>> http://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/
>>> [2] http://en.wikipedia.org/wiki/Jaccard_index
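
For readers following the thread, the heuristic rules Martin lists (ESRI's "D_" prefix on datum names, spaces replaced by '_', non-alphanumeric characters ignored) can be sketched roughly as below. This is only an illustration in Python, not the SIS code behind isHeuristicMatchForName(): the function names normalize_datum_name and is_heuristic_match are hypothetical, and the case-insensitive comparison is an added assumption that is not stated in the thread.

```python
def normalize_datum_name(name: str) -> str:
    """Reduce a datum name to a comparison key (hypothetical helper)."""
    # Drop the "D_" prefix that ESRI puts in front of datum names.
    if name.startswith("D_"):
        name = name[2:]
    # Keep only letters and digits. This discards spaces, underscores and
    # punctuation alike, so "WGS 1984" and "WGS_1984" yield the same key.
    # Case folding is an assumption of this sketch, not a rule from the thread.
    return "".join(ch for ch in name if ch.isalnum()).casefold()


def is_heuristic_match(first: str, second: str) -> bool:
    """Return True if both names are equal once the rules above are applied."""
    return normalize_datum_name(first) == normalize_datum_name(second)


if __name__ == "__main__":
    # An ESRI-style name against a plain name for the same datum.
    print(is_heuristic_match("D_WGS_1984", "WGS 1984"))
    # Same ellipsoid, but the names are unrelated, so name rules cannot help.
    print(is_heuristic_match("International 1924", "Hayford 1909"))
```

The comparison is all-or-nothing on purpose: as Martin notes, the names seen so far either match exactly under these rules or are completely different, so a similarity score would add little here.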
