Hello Adam and Joe
Thanks for the feedback. I was not aware of the OGRSpatialReference API; it
is interesting to know.
Martin
On 26/11/13 20:09, Joe White wrote:
Hi, Adam,
Lucene would be way overkill for the type of matching that Martin is talking
about. The work he's doing now corresponds to both doing alias lookups in the
EPSG database and the morphFromEsri call in OGRSpatialReference.
Unfortunately, datum matching is limited to the name, as Martin said, and
everyone seems to have their own.
Joe
On Nov 26, 2013, at 7:37 PM, Adam Estrada <[email protected]> wrote:
Understood! Thanks Martin and I think isHeuristicMatchForName() sounds great!
Adam
On Tue, Nov 26, 2013 at 7:18 PM, Martin Desruisseaux
<[email protected]> wrote:
Hello Adam
Thanks for the links, I was not aware of them. There is currently no
probability value for matching string(s). The current heuristic rules are
based on known practices, like ESRI adding the "D_" prefix for datum, spaces
replaced by '_' and non-alphanumeric characters ignored. I have not yet
found a need to match strings that are only similar. So far I have seen
either an exact match under the above rules, or completely different names
(e.g. "International 1924" and "Hayford 1909" are the same ellipsoid).
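
The rules described above can be sketched roughly as follows. This is only an
illustration in Python, not the SIS implementation: the function names
normalize_name and is_heuristic_match are hypothetical, and the
case-insensitive comparison is an assumption that the message does not state.

```python
def normalize_name(name: str) -> str:
    """Reduce a datum or ellipsoid name to a canonical form for comparison."""
    # Drop the "D_" prefix that ESRI adds to datum names.
    if name.startswith("D_"):
        name = name[2:]
    # Ignore every character that is not a letter or a digit. Spaces and
    # underscores are both dropped, which makes them equivalent.
    kept = "".join(ch for ch in name if ch.isalnum())
    # Assumption: letter case is not significant.
    return kept.casefold()


def is_heuristic_match(a: str, b: str) -> bool:
    """Return True if the two names are equal once normalized."""
    return normalize_name(a) == normalize_name(b)


if __name__ == "__main__":
    print(is_heuristic_match("D_North_American_1983", "North American 1983"))
    print(is_heuristic_match("International 1924", "Hayford 1909"))
```

Note that the second pair names the same ellipsoid but shares no characters
worth comparing, so a rule of this kind cannot relate them; that case would
presumably need an explicit alias table.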
Lucene of course has a role, and actually we do use it, but rather in some
layers on top of metadata. I think it will come to SIS later, presumably in
a separate module...
Martin
On 26/11/13 18:49, Adam Estrada wrote:
Martin,
Is there a probability value that is returned for the matching
string(s)? I actually just came across a blog post[1] that does
something similar to what you are working towards. They use the
term "best partial" for comparing strings of noticeably
different lengths. This appears to be similar to using a Jaccard
index[2] for string comparison, but on smaller bodies of text like the
titles of said aliases. Would this be an application for a
Lucene index, which already has all the information retrieval goodness
built in?
Adam
[1] http://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/
[2] http://en.wikipedia.org/wiki/Jaccard_index
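
For reference, a Jaccard index over the words of two names could look like
the sketch below. It is a minimal illustration only: the tokenization (runs
of ASCII letters and digits, lower-cased) and the function names are
assumptions, and it is neither what the blog post [1] nor what SIS does.

```python
import re


def tokens(name: str) -> set:
    """Split a name into a set of lower-case alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", name.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard index: size of the intersection over size of the union."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        # Two empty token sets are treated as identical (a convention).
        return 1.0
    return len(ta & tb) / len(ta | tb)


if __name__ == "__main__":
    print(jaccard("D_North_American_1983", "North American 1983"))
    print(jaccard("International 1924", "Hayford 1909"))
```

With this tokenization the first pair scores 3/4, because the ESRI "D"
prefix counts as an extra word, and the second pair scores 0 even though
both names designate the same ellipsoid.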